Status (23 September 2020): A first pass through the bioinformatics for this library is complete, and otutables are available for review (Paul Ayayee, Erin Bentley, John Calder, Joshua Harrison, Seifeddine Ben Tekaya). Alex Buerkle used the same bioinformatic steps as were used for runs 1A and 1B.
Status (1 October 2020): Based on Gordon Custer and Joshua Harrison's work to understand why chimeras were not being found and screened out, Alex Buerkle made the suggested change in a setting in …
Status (20 March 2021): We are revisiting the trim, merge, join, and otutable generation steps so that they match the methods we have now applied to NovaSeq 1A, 1B, 1C, 3, and 4.
Demultiplexing and splitting
...
Merge reads, filter, and (optionally) trim primer regions
– Alex Buerkle is rerunning the following steps, with updated code to correspond to new options we’re using across libraries.
In /project/microbiome/data/seq/psomagen_17sep20_novaseq2/tfmergedreads, we used run_slurm_mergereads.pl to crawl all of the project folders and sample files (created in the splitting step above) to merge read pairs (or join them when the amplicon is too long for the reads to overlap sufficiently), filter based on base quality, and optionally trim primer regions from the reads. We are now not trimming primer regions, even though we found that we could not readily eliminate low-frequency, potentially spurious OTUs downstream by masking with vsearch: it would not properly mask these regions [by marking them with lower-case bases] and then soft-mask them when we ran vsearch --cluster_size.
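The merge, join, and filtering code itself is not shown here, and run_slurm_mergereads.pl may well work differently (dedicated mergers such as vsearch also recompute quality scores in the overlap). Purely as a rough illustration, the toy Python sketch below merges a read pair when a suffix of the forward read matches a prefix of the reverse-complemented reverse read with few mismatches, and otherwise joins the two reads with a run of N padding. It then applies an expected-error filter (the sum of 10^(-Q/10) over all bases), one common base-quality criterion similar in spirit to vsearch's --fastq_maxee; whether this pipeline uses that criterion is not stated. The function names, thresholds (min_overlap, max_diffs, max_ee), and padding quality are all hypothetical.

```python
# Toy sketch (hypothetical, not run_slurm_mergereads.pl): merge an overlapping
# read pair, otherwise join it with N padding, then filter on expected errors.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def decode_quals(qual: str, offset: int = 33) -> list[int]:
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(c) - offset for c in qual]


def merge_or_join(fwd, fwd_q, rev, rev_q, min_overlap=10, max_diffs=2,
                  pad="NNNNNNNN", pad_q=40):
    """Return (sequence, qualities, "merged" or "joined") for one read pair."""
    # Put the reverse read (and its qualities) on the forward strand.
    rc, rc_q = revcomp(rev), list(rev_q)[::-1]
    # Try the longest overlap first: a suffix of fwd against a prefix of rc.
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        start = len(fwd) - olen
        diffs = sum(a != b for a, b in zip(fwd[start:], rc[:olen]))
        if diffs <= max_diffs:
            seq, quals = list(fwd[:start]), list(fwd_q[:start])
            for i in range(olen):
                # Take the base from whichever read has the higher score
                # (the bases are identical where the two reads agree).
                if fwd_q[start + i] >= rc_q[i]:
                    seq.append(fwd[start + i])
                else:
                    seq.append(rc[i])
                quals.append(max(fwd_q[start + i], rc_q[i]))
            seq.extend(rc[olen:])
            quals.extend(rc_q[olen:])
            return "".join(seq), quals, "merged"
    # No acceptable overlap (e.g. amplicon too long): concatenate with padding.
    # pad_q is a hypothetical quality assigned to the padding bases.
    joined_q = list(fwd_q) + [pad_q] * len(pad) + rc_q
    return fwd + pad + rc, joined_q, "joined"


def expected_errors(quals: list[int]) -> float:
    """Expected number of errors: sum of 10^(-Q/10) over all bases."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(quals: list[int], max_ee: float = 1.0) -> bool:
    """Keep a read only if its expected number of errors is at most max_ee."""
    return expected_errors(quals) <= max_ee
```

Scanning from the longest possible overlap downward means a long, nearly exact overlap is preferred over a short chance match near the read ends.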
This writes a new set of fasta files (rather than fastq) for each sample and project, to be used in subsequent steps. These files are found in the 16S/ and ITS/ folders in tfmergedreads/. Statistics on the initial number of reads, the number of reads that merged, and the number of reads that remain after filtering are in filtermergestats.csv in each project folder. For the full lane, these summaries were concatenated in tfmergedreads/ with
...
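The concatenation command is elided above. A minimal Python sketch of the same idea, assuming each filtermergestats.csv has a single header row and that all files share the same columns: it searches everywhere below tfmergedreads/ for filtermergestats.csv files, writes one combined CSV that keeps the header once, and prepends a project column taken from each file's folder path relative to tfmergedreads/. The function name concat_stats and the output name filtermergestats_all.csv are hypothetical.

```python
import csv
from pathlib import Path


def concat_stats(root, out_name="filtermergestats_all.csv"):
    """Concatenate every filtermergestats.csv below root into one CSV.

    Hypothetical helper: keeps the shared header once and prepends a
    'project' column holding each file's folder path relative to root.
    """
    root = Path(root)
    header, rows = None, []
    for f in sorted(root.rglob("filtermergestats.csv")):
        with f.open(newline="") as fh:
            reader = csv.reader(fh)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"header mismatch in {f}")
            project = f.parent.relative_to(root).as_posix()
            rows.extend([project] + row for row in reader)
    out = root / out_name
    with out.open("w", newline="") as fh:
        writer = csv.writer(fh)
        if header is not None:
            writer.writerow(["project"] + header)
            writer.writerows(rows)
    return out
```

Using the folder path as the project label keeps the 16S and ITS summaries distinguishable after concatenation, which also makes the per-project plots mentioned below easier.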
I used commands.R in that folder to plot the number of reads per sample (horizontal axis) against the number of reads that were removed (vertical axis), either because they did not merge (this does not account for reads that were joined/concatenated because the amplicon region was so long that there was insufficient overlap to merge) or because they did not meet quality criteria and were filtered out. Purple is for 16S and orange is for ITS. It might be interesting to make this plot for each of the projects in the library (TODO), and possibly to add marginal histograms (or put quantiles on the plots).
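commands.R is not reproduced here. As a stdlib-only Python sketch of the quantities behind that plot, assuming the concatenated CSV has hypothetical columns n_initial (reads in) and n_filtered (reads remaining after merging and filtering), the function below returns one (reads in, reads removed) point per sample plus the quartiles of the fraction removed, which could be drawn on the plot as the TODO suggests. The function and column names are illustrative only.

```python
import csv
import statistics


def removal_summary(csv_path, initial_col="n_initial", kept_col="n_filtered"):
    """Per-sample (reads in, reads removed) points and quartiles of fraction removed.

    Column names are hypothetical; adjust them to the actual header of the
    concatenated filtermergestats file.
    """
    points = []
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            n_in = int(row[initial_col])
            removed = n_in - int(row[kept_col])
            points.append((n_in, removed))
    # Fraction of each sample's reads lost to failed merging or filtering.
    fractions = [removed / n_in for n_in, removed in points if n_in > 0]
    quartiles = (
        statistics.quantiles(fractions, n=4, method="inclusive")
        if len(fractions) >= 2
        else []
    )
    return points, quartiles
```

Splitting the summary by marker (16S vs ITS) or by project would only require grouping rows by the project column before calling it.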
...