
...

run_splitFastq_fwd.sh and run_splitFastq_rev.sh run splitFastq_manyInputfiles.pl, which steps through the many pairs of files, splits reads by sample and project, and places them in /project/gtl/data/raw/ALF1/16S/rawdata/sample_fastq/

Stopped here on 9-07-22

Calculate summary statistics on reads

...

cd /project/gtl/data/raw/ALF1/ITS/rawdata/sample_fastq

mv * /project/gtl/data/raw/ALF1/sample_fastq/

Fixing the tfmergedreads Perl issue

cd /project/gtl/data/raw/ALF1/

mkdir rawdata

cd rawdata

cp -r /project/gtl/data/raw/ALF1/16S/rawdata/sample_fastq ./

cd 16S/ALF16S21

for f in *-*; do mv -i "$f" "${f//-/_}"; done

sed -i 's/-/_/g' *

cd ../../ITS/ALF16S21

for f in *-*; do mv -i "$f" "${f//-/_}"; done

sed -i 's/-/_/g' *

for f in *EMPTY*; do mv -i "$f" "${f//EMPTY/Emp}"; done

for f in *Unknown*; do mv -i "$f" "${f//Unknown/Unk}"; done

for f in *Soil*; do mv -i "$f" "${f//Soil/S}"; done
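The rename loops above change file names in place, so it can help to preview them first. The following is a minimal sketch, not part of the pipeline scripts: rename_substring is a hypothetical helper that performs the same substring replacement on file names, with an optional --dry-run mode that only prints what would be renamed. It never overwrites an existing target.

```shell
#!/usr/bin/env bash
# Sketch only: rename_substring is a hypothetical helper, not a pipeline script.
# Usage: rename_substring DIR OLD NEW [--dry-run]
rename_substring() {
    local dir=$1 old=$2 new=$3 mode=${4:-}
    local f base target
    for f in "$dir"/*"$old"*; do
        [ -e "$f" ] || continue              # glob matched nothing
        base=${f##*/}                        # file name without the directory
        target=$dir/${base//"$old"/$new}     # replace every OLD in the name
        if [ "$mode" = "--dry-run" ]; then
            printf '%s -> %s\n' "$f" "$target"
        elif [ -e "$target" ]; then
            printf 'skip (exists): %s\n' "$target" >&2
        else
            mv -- "$f" "$target"
        fi
    done
}
```

For example, `rename_substring . - _ --dry-run` lists the dash-to-underscore renames for the current directory, and `rename_substring . EMPTY Emp` applies the EMPTY-to-Emp replacement. Unlike the sed step, this touches file names only, not file contents.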

Trim, merge and filter reads

In /project/gtl/data/raw/ALF1/16S/rawdata/tfmergedreads, we used run_slurm_mergereads.pl to crawl the project folders and sample files (created in the splitting step above), merge read pairs, and filter based on base quality. This script conforms to the steps in https://microcollaborative.atlassian.net/wiki/spaces/MICLAB/pages/1123778569/Bioinformatics+v3.0?focusedCommentId=1280377080#comment-1280377080, including trimming primers and joining unmerged reads. It writes a new set of fasta files (rather than fastq) for each sample and project, to be used in subsequent steps. These files are found in the 16S/ and ITS/ folders in tfmergedreads/. For example, see the contents of /project/gtl/data/raw/ALF1/16S/rawdata/tfmergedreads/

Within each of these directories are files for the trimmed, merged, and filtered reads, in subfolders trimmed/, joined/, and unmerged/. The last one is used as a working directory and should be empty: unmerged reads are filtered and, if they can be joined, are put in joined/. The joined/ directory can also be empty, for example if all unmerged reads were coligos.
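The expected layout described above can be checked with a short script. This is a sketch under assumptions: check_tfmerged_dir is a hypothetical helper (not one of the pipeline scripts), and it assumes the three subfolders sit directly inside the directory passed to it. It reports a file count per subfolder and returns a non-zero status if a subfolder is missing or unmerged/ still contains files.

```shell
#!/usr/bin/env bash
# Sketch only: check_tfmerged_dir is a hypothetical helper.
# Usage: check_tfmerged_dir PROJECT_DIR
check_tfmerged_dir() {
    local dir=$1 sub n status=0
    for sub in trimmed joined unmerged; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub"
            status=1
            continue
        fi
        # Count regular files directly inside the subfolder.
        n=$(find "$dir/$sub" -maxdepth 1 -type f | wc -l | tr -d ' ')
        echo "$sub: $n files"
        # unmerged/ is a working directory and is expected to be empty.
        if [ "$sub" = unmerged ] && [ "$n" -gt 0 ]; then
            echo "warning: unmerged/ is not empty"
            status=1
        fi
    done
    return "$status"
}
```

An empty joined/ is reported but not treated as an error, since that can legitimately happen.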

...

I used commands.R in that folder to make a plot of the number of reads per sample (horizontal axis) and the number of reads that were removed because they did not merge, or did not meet quality criteria and were filtered out (vertical axis). Purple is for 16S and orange is for ITS. It might be interesting to do that plot for each of the projects in the library (TODO), and possibly to have marginal histograms (or put quantiles on the plots).
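The two quantities on the plot axes can also be tallied from the shell. The sketch below is not what commands.R does (that script is not shown here). It assumes uncompressed files, four-line fastq records, and that the retained reads for a sample are the fasta files passed as arguments. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, assuming uncompressed input files.

# Number of reads in a fastq file, assuming 4 lines per record.
count_fastq_reads() {
    local n
    n=$(wc -l < "$1")
    echo $(( n / 4 ))
}

# Number of sequences in a fasta file (one '>' header line per sequence).
count_fasta_reads() {
    grep -c '^>' "$1" || true    # grep exits 1 when the count is 0
}

# Reads removed = reads in the input fastq minus reads kept in the fasta files.
# Usage: reads_removed INPUT.fastq KEPT1.fasta [KEPT2.fasta ...]
reads_removed() {
    local fastq=$1 kept=0 f
    shift
    for f in "$@"; do
        kept=$(( kept + $(count_fasta_reads "$f") ))
    done
    echo $(( $(count_fastq_reads "$fastq") - kept ))
}
```

Counting fastq records by line count avoids miscounting quality lines that happen to start with `@`.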

Attached file: AlfReadCounts.pdf

Make OTU table

In /project/gtl/data/raw/ALF1/16S/rawdata/otu, I ran run_slurm_mkotu.pl, which I modified to also pick up the joined reads (in addition to the merged reads).

...

In /project/gtl/data/raw/ALF1/16S/coligoISD and /project/gtl/data/raw/ALF1/16S/otu there are 16S and ITS directories for all projects. These contain a file named coligoISDtable.txt with per-sample counts of the coligos and the ISD found in the trimmed forward reads. The script run_slurm_mkcoligoISDtable.pl passes over all of the projects and uses vsearch to make the table.

5-2-2022 Gregg Randolph transferred all of this to `/project/gtl/data/distribution/ALF1/`

...
