Status (3 March 2021)

  • Data arrived by mail on 24 February 2021. Alex Buerkle has started processing the data into merged reads and OTU tables.

...

Trim, merge and filter reads

Following the current steps in Bioinformatics v3.0, we trimmed primers from reads with cutadapt, and merged and filtered them with vsearch. In /project/microbiome/data/seq/cu_24feb21novaseq4/tfmergedreads, we used run_slurm_mergereads.pl to crawl the project folders and sample files (created in the splitting step above), merge read pairs, and filter them based on base quality. This script conforms to the steps in https://microcollaborative.atlassian.net/wiki/spaces/MICLAB/pages/1123778569/Bioinformatics+v3.0?focusedCommentId=1280377080#comment-1280377080, including trimming primers and joining unmerged reads. It writes a new set of fasta files (rather than fastq) for each sample and project, to be used in subsequent steps. These files are in the 16S/ and ITS/ folders in tfmergedreads/, which are further divided by project name (for example, 16S/ayayee/).
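As a rough illustration of the trim, merge, filter, and join sequence described above, the sketch below builds (but does not run) one command line per step for a single sample. It is not the content of run_slurm_mergereads.pl: the function name build_commands, the output file names, the primer arguments, and the --fastq_maxee value of 1.0 are all assumptions made for the example, and the real parameters are those documented in Bioinformatics v3.0.

```python
from pathlib import Path


def build_commands(sample, r1, r2, fwd_primer, rev_primer, outdir, maxee=1.0):
    """Return one argv list per step for a single sample (nothing is executed).

    All file names and the maxee default are hypothetical placeholders.
    """
    out = Path(outdir)
    trimmed_r1 = out / "trimmed" / f"{sample}.R1.fq"
    trimmed_r2 = out / "trimmed" / f"{sample}.R2.fq"
    merged = out / f"{sample}.merged.fq"
    unmerged_r1 = out / "unmerged" / f"{sample}.R1.fq"
    unmerged_r2 = out / "unmerged" / f"{sample}.R2.fq"
    joined = out / "joined" / f"{sample}.joined.fq"
    filtered = out / f"{sample}.filtered.fa"
    return [
        # 1. Trim the forward primer from R1 and the reverse primer from R2.
        ["cutadapt", "-g", fwd_primer, "-G", rev_primer,
         "-o", str(trimmed_r1), "-p", str(trimmed_r2), str(r1), str(r2)],
        # 2. Merge read pairs; pairs that fail to merge go to unmerged/.
        ["vsearch", "--fastq_mergepairs", str(trimmed_r1),
         "--reverse", str(trimmed_r2), "--fastqout", str(merged),
         "--fastqout_notmerged_fwd", str(unmerged_r1),
         "--fastqout_notmerged_rev", str(unmerged_r2)],
        # 3. Filter merged reads on expected errors and write fasta.
        ["vsearch", "--fastq_filter", str(merged),
         "--fastq_maxee", str(maxee), "--fastaout", str(filtered)],
        # 4. Join the pairs that could not be merged.
        ["vsearch", "--fastq_join", str(unmerged_r1),
         "--reverse", str(unmerged_r2), "--fastqout", str(joined)],
    ]
```

Each returned list can be inspected or passed to subprocess.run once the paths and parameters have been checked against the actual pipeline.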

Within each of these directories are files for the trimmed, merged, and filtered reads, in subfolders trimmed/, joined/, and unmerged/. The last one is used as a working directory and should be empty: unmerged reads are filtered and joined, and are put in joined/ if they can be joined. The joined/ directory can be empty, for example if all unmerged reads were coligos. For example, see the contents of /project/microbiome/data/seq/cu_24feb21novaseq4/tfmergedreads/16S/ayayee/
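A quick way to check that a project folder has the layout described above is to count the files in each subfolder. The following is a minimal sketch; the function names summarize_project_dir and leftover_unmerged are hypothetical helpers, not part of the pipeline, and the only assumption about the layout is the three subfolder names given above.

```python
from pathlib import Path

# Subfolders expected inside each project directory, as described above.
SUBFOLDERS = ("trimmed", "joined", "unmerged")


def summarize_project_dir(project_dir):
    """Count regular files at the top level and in each expected subfolder.

    A missing subfolder is reported as None rather than 0.
    """
    project = Path(project_dir)
    counts = {"top_level": sum(1 for p in project.iterdir() if p.is_file())}
    for name in SUBFOLDERS:
        sub = project / name
        if sub.is_dir():
            counts[name] = sum(1 for p in sub.iterdir() if p.is_file())
        else:
            counts[name] = None
    return counts


def leftover_unmerged(counts):
    """True if unmerged/ still holds files, i.e. the working directory was not emptied."""
    return bool(counts.get("unmerged"))
```

For example, summarize_project_dir("/project/microbiome/data/seq/cu_24feb21novaseq4/tfmergedreads/16S/ayayee") would return the counts for that project, and leftover_unmerged on the result flags a working directory that still contains files.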

...

This is where I am (steps above have been launched). Alex Buerkle will continue from here. The steps below have not yet been started; they contain notes for what I will do.

...

Statistics on the initial number of reads, the number of reads that merged, and the number of reads that remain after filtering are in filtermergestats.csv in each project folder. Please note that these counts do not include reads that failed to merge but that we were able to join. This category is likely to include ITS sequences for which the amplicon was too long for our 2x250 bp reads to span. The greater number removed in ITS (orange) in the plot below is consistent with this idea. For the full lane, these summaries were concatenated in tfmergedreads/ with

...
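For readers who want to combine the per-project filtermergestats.csv files themselves, the sketch below shows one way to do it in Python. It is not the command used for this lane. It assumes that each file starts with a header row, that the files sit two levels below tfmergedreads/ (locus, then project), and the function name concatenate_stats and the added locus and project columns are hypothetical.

```python
import csv
from pathlib import Path


def concatenate_stats(root, out_path, filename="filtermergestats.csv"):
    """Combine <root>/<locus>/<project>/<filename> files into one CSV.

    Assumes every input file has the same header row. Returns the number
    of data rows written.
    """
    root = Path(root)
    n_rows = 0
    header = None
    with Path(out_path).open("w", newline="") as out:
        writer = csv.writer(out)
        for stats in sorted(root.glob(f"*/*/{filename}")):
            with stats.open(newline="") as handle:
                rows = list(csv.reader(handle))
            if not rows:
                continue  # skip empty files
            if header is None:
                # Write the header once, with two extra identifying columns.
                header = rows[0]
                writer.writerow(["locus", "project"] + header)
            locus = stats.parent.parent.name
            project = stats.parent.name
            for row in rows[1:]:
                writer.writerow([locus, project] + row)
                n_rows += 1
    return n_rows
```

Tagging each row with its locus and project keeps the combined table usable for per-locus comparisons such as the 16S versus ITS difference noted above.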