Info |
---|
Status (3 March 2021)
|
...
In a sequence library’s rawdata/ directory (e.g., /project/microbiome/data/seq/cu_24feb21novaseq4/rawdata), I made run_aggregate.sh to run aggregate_usearch_fastx_info.pl with a slurm job.
...
Trim, merge and filter reads
Using the current steps in Bioinformatics v3.0, we trimmed primers from reads with cutadapt, and merged and filtered them with vsearch. Output is in /project/microbiome/data/seq/cu_24feb21novaseq4/tfmergedreads. We used run_slurm_mergereads.pl to crawl the project folders and sample files (created in the splitting step above), merge read pairs, and filter based on base quality. We are now not trimming primer regions. This is true even though we found with previous libraries that we could not readily eliminate low-frequency, potentially spurious OTUs downstream by masking with vsearch; it would not properly mask these regions (by marking them with lower-case bases) and then soft-mask them when we ran vsearch cluster_size. This step writes a new set of fasta files (rather than fastq) for each sample and project, to be used in subsequent steps. These files are found in the 16S/ and ITS/ folders in tfmergedreads/, which are further divided by project name.
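To make the soft-masking idea mentioned above concrete, here is a minimal sketch of what "marking regions with lower-case bases" means. It is only an illustration, not how vsearch does its masking; the function name and the region lengths are hypothetical.

```python
def soft_mask_ends(seq: str, left: int, right: int) -> str:
    """Lower-case the first `left` and last `right` bases of `seq`.

    Soft masking keeps the bases in the sequence but flags them by case,
    so that a downstream tool can choose to ignore them.
    The lengths are illustrative, not real primer lengths.
    """
    seq = seq.upper()
    if left + right >= len(seq):
        # The masked regions cover the whole read.
        return seq.lower()
    middle_end = len(seq) - right
    return seq[:left].lower() + seq[left:middle_end] + seq[middle_end:].lower()


masked = soft_mask_ends("ACGTACGTACGT", left=3, right=2)
print(masked)  # acgTACGTACgt
```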
Within each of these project directories are files for the trimmed, merged, and filtered reads, along with the subfolders trimmed/, joined/, and unmerged/. The last is used as a working directory and should be empty: unmerged reads are filtered, joined, and put in joined/ if they can be joined. The joined/ directory can also be empty (if all unmerged reads were coligos, for example). See, for instance, the contents of /project/microbiome/data/seq/cu_24feb21novaseq4/tfmergedreads/16S/ayayee/
...
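The layout described above (locus, then project, then trimmed/, joined/, and unmerged/) can be checked with a short script. This is a sketch under the assumption that the folders are nested exactly as described; the function names are hypothetical and not part of the pipeline.

```python
from pathlib import Path

LOCI = ("16S", "ITS")
SUBFOLDERS = ("trimmed", "joined", "unmerged")


def summarize_layout(root):
    """Return {(locus, project, subfolder): file_count} for folders that exist."""
    counts = {}
    root = Path(root)
    for locus in LOCI:
        locus_dir = root / locus
        if not locus_dir.is_dir():
            continue
        for project_dir in sorted(p for p in locus_dir.iterdir() if p.is_dir()):
            for sub in SUBFOLDERS:
                sub_dir = project_dir / sub
                if sub_dir.is_dir():
                    counts[(locus, project_dir.name, sub)] = sum(
                        1 for f in sub_dir.iterdir() if f.is_file()
                    )
    return counts


def nonempty_unmerged(counts):
    """List (locus, project) pairs whose unmerged/ working folder is not empty."""
    return sorted(
        (locus, project)
        for (locus, project, sub), n in counts.items()
        if sub == "unmerged" and n > 0
    )


if __name__ == "__main__":
    root = "/project/microbiome/data/seq/cu_24feb21novaseq4/tfmergedreads"
    for locus, project in nonempty_unmerged(summarize_layout(root)):
        print(f"unmerged/ is not empty for {locus}/{project}")
```

Since unmerged/ is only a working directory, any project reported here would be worth a second look.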
This is where I am as of (steps above have been launched). Alex Buerkle will continue from here. The steps below have not yet been started; they contain notes for what I will do.
...
Statistics on the initial number of reads, the number of reads that merged, and the number of reads that remain after filtering are in filtermergestats.csv in each project folder. For the full lane, these summaries were concatenated in tfmergedreads/ with
...
I used commands.R in that folder to plot the number of reads per sample (horizontal axis) against the number of reads that were removed because they did not merge, or did not meet quality criteria and were filtered out (vertical axis). Purple is for 16S and orange is for ITS. It might be interesting to make that plot for each of the projects in the library (TODO), and possibly to add marginal histograms (or put quantiles on the plots).
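The quantities on the two axes can be derived from the three counts per sample. The sketch below is in Python rather than R, and it is not the content of commands.R. The column names (sample, initial, merged, filtered) and the values are made up for illustration, since the actual layout of filtermergestats.csv is not shown here. It also ignores reads recovered through joined/.

```python
import csv
import io

# Made-up example rows; real column names and values may differ.
EXAMPLE_CSV = """sample,initial,merged,filtered
sampleA,1000,800,700
sampleB,500,450,400
"""


def removed_reads(rows):
    """Compute reads lost at merging and at filtering for each sample."""
    out = []
    for row in rows:
        initial = int(row["initial"])
        merged = int(row["merged"])
        filtered = int(row["filtered"])
        out.append(
            {
                "sample": row["sample"],
                "initial": initial,                  # horizontal axis
                "not_merged": initial - merged,      # lost at merging
                "failed_filter": merged - filtered,  # lost at quality filtering
                "removed": initial - filtered,       # vertical axis (total lost)
            }
        )
    return out


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
for rec in removed_reads(rows):
    print(rec["sample"], rec["initial"], rec["removed"])
```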
Make OTU table
In /project/microbiome/data/seq/cu_24feb21novaseq4/otu, I ran run_slurm_mkotu.pl, which I modified to also pick up the joined reads (in addition to the merged reads).