
Status (... 10 March 2021)

Data arrived by mail on 24 February 2021.

...

The otutables should be ready for users to work with.


Demultiplexing and splitting

...

In a sequence library’s rawdata/ directory (e.g., /project/microbiome/data/seq/cu_24feb21novaseq4/rawdata), I made run_aggregate.sh to run aggregate_usearch_fastx_info.pl as a slurm job. Summaries are written to summary_sample_fastq.csv.
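For readers who want to reproduce a similar per-sample summary without usearch, here is a minimal Python sketch. It is not what aggregate_usearch_fastx_info.pl does internally: it parses FASTQ files directly instead of aggregating usearch fastx_info output, and it assumes uncompressed files named *.fastq (one per sample). The output columns (sample, reads, mean_length) are hypothetical and may not match the real summary_sample_fastq.csv.

```python
import csv
from pathlib import Path
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an uncompressed 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield header.rstrip("\n"), seq


def summarize_dir(rawdata: Path, out_csv: Path) -> None:
    """Write one row per *.fastq file in rawdata: sample, read count, mean length."""
    rows = []
    for fq in sorted(rawdata.glob("*.fastq")):
        n_reads, total_len = 0, 0
        with fq.open() as fh:
            for _, seq in fastq_records(fh):
                n_reads += 1
                total_len += len(seq)
        mean_len = total_len / n_reads if n_reads else 0.0
        rows.append([fq.stem, n_reads, round(mean_len, 1)])
    with out_csv.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["sample", "reads", "mean_length"])  # hypothetical columns
        writer.writerows(rows)
```

Usage would look like `summarize_dir(Path("rawdata"), Path("summary_sample_fastq.csv"))`. On the cluster this would still be wrapped in a slurm job, as run_aggregate.sh does for the Perl script.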

Trim, merge and filter reads

...

Within each of these directories are files for the trimmed, merged, and filtered reads, in the subfolders trimmed/, joined/, and unmerged/. The unmerged/ folder is used as a working directory and should end up empty: unmerged reads are filtered and, if they can be joined, are joined and put in joined/. The joined/ directory can itself be empty, for example if all unmerged reads were coligos.
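A quick way to confirm a project folder ended up in the expected state is to check that the three subfolders exist and that unmerged/ is empty, while allowing joined/ to be empty. This is a minimal Python sketch under that assumption; check_project_dir is a hypothetical helper, not part of the existing pipeline scripts.

```python
from pathlib import Path
from typing import Dict, List

EXPECTED_SUBDIRS = ("trimmed", "joined", "unmerged")


def check_project_dir(project: Path) -> Dict[str, List[str]]:
    """Return problems keyed by subfolder; an empty dict means the layout looks as expected.

    joined/ may legitimately be empty, so only unmerged/ is checked for leftover files.
    """
    problems: Dict[str, List[str]] = {}
    for name in EXPECTED_SUBDIRS:
        if not (project / name).is_dir():
            problems.setdefault(name, []).append("missing directory")
    unmerged = project / "unmerged"
    if unmerged.is_dir():
        leftovers = sorted(p.name for p in unmerged.iterdir())
        if leftovers:
            problems.setdefault("unmerged", []).extend(
                f"leftover file: {f}" for f in leftovers
            )
    return problems
```

Running this over every project folder in the library and printing any non-empty result would flag runs where the working directory was not cleaned up.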

This is where I am as of 03 Mar 2021 (the steps above have been launched). Alex Buerkle will continue from here. The steps below have not yet been started; they contain notes on what I will do.

Statistics on the initial number of reads, the number of reads that merged, and the number of reads that remain after filtering are in filtermergestats.csv in each project folder. Note that these statistics do not include reads that failed to merge but that we were able to join. This category is likely to include ITS sequences whose amplicons were long enough that our 2x250 bp reads could not span the whole length. The greater number of reads removed for ITS (orange) in the plot below is consistent with this idea. For the full lane, these summaries were concatenated in tfmergedreads/ with

...
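The command actually used for tfmergedreads/ is the one given above; the following is only an illustrative Python sketch of the same idea. It assumes each project folder holds a filtermergestats.csv with a header row, keeps a single header in the combined file, and prepends a hypothetical "project" column so each row stays traceable to its folder. The column names in the usage test are invented for illustration.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Optional


def concat_stats(project_dirs: Iterable[Path], out_csv: Path) -> int:
    """Concatenate per-project filtermergestats.csv files, keeping one header.

    A 'project' column (the folder name) is prepended to every row.
    Returns the number of data rows written.
    """
    header: Optional[List[str]] = None
    n_rows = 0
    with out_csv.open("w", newline="") as out:
        writer = csv.writer(out)
        for d in project_dirs:
            stats = d / "filtermergestats.csv"
            if not stats.exists():
                continue  # project without a stats file
            with stats.open(newline="") as fh:
                reader = csv.reader(fh)
                this_header = next(reader, None)
                if this_header is None:
                    continue  # empty file
                if header is None:
                    header = this_header
                    writer.writerow(["project"] + header)
                elif this_header != header:
                    raise ValueError(f"header mismatch in {stats}")
                for row in reader:
                    writer.writerow([d.name] + row)
                    n_rows += 1
    return n_rows
```

Checking that headers match before appending catches the case where one project's stats were produced by a different version of the pipeline.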

I used commands.R in that folder to plot the number of reads per sample (horizontal axis) against the number of reads that were removed because they did not merge or did not meet quality criteria and were filtered out (vertical axis). Purple is for 16S and orange is for ITS. It might be interesting to make this plot for each of the projects in the library (TODO), and possibly to add marginal histograms (or put quantiles on the plots).

...
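For the quantile TODO, the numbers behind the vertical axis can be summarized per marker before any plotting. This is a minimal Python sketch, not the contents of commands.R: it assumes removed reads are simply initial minus retained (so, as noted above, joined reads are not accounted for), and SampleStats is a hypothetical record type. It uses statistics.quantiles (Python 3.8+) with its default "exclusive" method.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Dict, List


@dataclass
class SampleStats:
    sample: str
    marker: str    # "16S" or "ITS"
    initial: int   # reads before merging and filtering
    retained: int  # reads left after merging and filtering


def removed_by_marker(rows: List[SampleStats]) -> Dict[str, List[int]]:
    """Group the per-sample number of removed reads (initial - retained) by marker."""
    grouped: Dict[str, List[int]] = {}
    for r in rows:
        grouped.setdefault(r.marker, []).append(r.initial - r.retained)
    return grouped


def quartiles(values: List[int]) -> List[float]:
    """Three quartile cut points (statistics.quantiles, default 'exclusive' method).

    Needs at least two values.
    """
    return quantiles(values, n=4)
```

Comparing `quartiles(grouped["16S"])` with `quartiles(grouped["ITS"])` would give a numeric check on the visual impression that ITS loses more reads.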

Make OTU table

In /project/microbiome/data/seq/cu_24feb21novaseq4/otu, I ran run_slurm_mkotu.pl, which I modified to also pick up the joined reads (in addition to the merged reads). I reran this on 05 Apr 2021 to address an error in how we were using vsearch, whereby unique sequences were incorrectly merged in the otutable because they did not have unique names (runs should be done late on 05 Apr 2021).
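The failure mode described above can be illustrated in a few lines of Python. This is a generic sketch, not a model of vsearch internals or of the fix in run_slurm_mkotu.pl: any step that keys records by label silently collapses distinct sequences that share a label, and giving every read a unique label first avoids that. The "sample;index" label scheme is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Read = Tuple[str, str]  # (label, sequence)


def table_keyed_by_label(reads: List[Read]) -> Dict[str, str]:
    """Name-keyed lookup: later reads silently overwrite earlier ones with the same label."""
    return {label: seq for label, seq in reads}


def relabel_unique(reads: List[Read], sample: str) -> List[Read]:
    """Give every read a label unique within the sample, e.g. 'S1;1', 'S1;2', ..."""
    return [(f"{sample};{i}", seq) for i, (_, seq) in enumerate(reads, start=1)]


def count_unique_sequences(reads: List[Read]) -> Counter:
    """Dereplicate by sequence content, not by label."""
    return Counter(seq for _, seq in reads)
```

With three reads all labeled "read", the name-keyed table keeps only one entry; after relabeling, all three survive and dereplication by content gives the correct counts.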

Make coligo table

In /project/microbiome/data/seq/psomagen_6mar20/coligoISD, /project/microbiome/data/seq/psomagen_26may20/coligoISD, and /project/microbiome/data/seq/psomagen_29jan21novaseq1c/coligoISD, there are 16S and ITS directories for all projects. These contain a file named coligoISDtable.txt with counts of the coligos and the ISD found in the trimmed forward reads, per sample. The file run_slurm_mkcoligoISDtable.pl has the code that passes over all of the projects and uses vsearch for making the table.
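To show the shape of coligoISDtable.txt, here is a minimal Python sketch of a sample-by-reference count table. Note that it swaps in exact substring matching for the vsearch search that run_slurm_mkcoligoISDtable.pl uses, so it will miss reads with sequencing errors and does not consider reverse complements (consistent with counting in trimmed forward reads only). The reference names and sequences in the test are placeholders, not the real coligo or ISD sequences.

```python
from typing import Dict, Iterable, List


def count_references(reads: Iterable[str], references: Dict[str, str]) -> Dict[str, int]:
    """Count reads containing each reference sequence as an exact substring.

    references maps a name (a coligo or the ISD) to its sequence.
    A read matching several references is counted once for each of them.
    """
    counts = {name: 0 for name in references}
    for read in reads:
        for name, ref in references.items():
            if ref in read:
                counts[name] += 1
    return counts


def coligo_table(samples: Dict[str, List[str]], references: Dict[str, str]) -> List[List[str]]:
    """Build rows of a sample-by-reference count table, header row first."""
    names = sorted(references)
    rows = [["sample"] + names]
    for sample in sorted(samples):
        counts = count_references(samples[sample], references)
        rows.append([sample] + [str(counts[n]) for n in names])
    return rows
```

Writing the rows out tab-separated would give a table of the same general form as coligoISDtable.txt, though the real file's column layout may differ.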
