LRIII Bioinformatics
The newest Mock Community is not showing up at all when the sequence run is demultiplexed. So, either it was not added to the pool or the demux key is incorrect for it.
Assign reads and OTUs to samples:
Assign Reads:
/project/microbiome/data_queue/seq/LRIII/rawdata
salloc --account=microbiome -t 0-06:00
mkdir -p /gscratch/grandol1/LRIII/rawdata
cd /gscratch/grandol1/LRIII/rawdata
unpigz --to-stdout /project/microbiome/data_queue/seq/LRIII/rawdata/LRIII-MC-MB_S1_L001_R1_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LRIII_R1_ ;unpigz --to-stdout /project/microbiome/data_queue/seq/LRIII/rawdata/LRIII-MC-MB_S1_L001_R2_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LRIII_R2_
/project/microbiome/data_queue/seq/LRIII/rawdata/run_parse_count_onSplitInput.pl
cd /project/microbiome/data_queue/seq/LRIII/rawdata
./run_splitFastq_fwd.sh
./run_splitFastq_rev.sh
cd /project/microbiome/data_queue/seq/LRIII/rawdata
./run_aggregate.sh
wc -l *R1* > LRIII.txt
mv LRIII.txt /project/microbiome/data_queue/seq/LRIII/rawdata
Process through to OTUs:
salloc --account=microbiome -t 0-06:00
cd /project/microbiome/data_queue/seq/LRIII/tfmergedreads
./run_slurm_mergereads.pl
cd /project/microbiome/data_queue/seq/LRIII/otu
./run_slurm_mkotu.pl
Exploration of Data created so far:
From my understanding, Line 9 above (the unpigz | split command) simply splits the raw data into equal-sized files, so the total number of reads should remain constant.
cd /gscratch/grandol1/LRIII/rawdata
wc -l LRII*
This should return 8x the number of read pairs (2x for R1 and R2, and 4x for the four lines per FASTQ record).
This returns: 33809712 total
Divided by 8: 4226214
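As a cross-check on the divide-by-8 arithmetic, R1 and R2 can be counted separately and each divided by 4; both should give the same read-pair count. The sketch below assumes every FASTQ record is exactly four lines. count_fastq_records is a hypothetical helper name, not part of the pipeline scripts.

```shell
# count_fastq_records [FILE...] : number of FASTQ records (lines / 4).
# Hypothetical helper; assumes strict four-line FASTQ records.
# With no FILE arguments it reads standard input.
count_fastq_records() {
    # cat merges the inputs so wc prints a single number,
    # not per-file rows plus a "total" line
    lines=$(cat "$@" | wc -l)
    echo $(( lines / 4 ))
}

# Usage, from /gscratch/grandol1/LRIII/rawdata:
#   count_fastq_records LRIII_R1_*.fastq    # R1 reads in the split files
#   count_fastq_records LRIII_R2_*.fastq    # R2 reads; should equal R1
# Compare with the unsplit input to confirm the split lost nothing:
#   unpigz --to-stdout /project/microbiome/data_queue/seq/LRIII/rawdata/LRIII-MC-MB_S1_L001_R1_001.fastq.gz | count_fastq_records
```

If the R1 and R2 counts differ, or either differs from the unsplit file, the problem is upstream of the parsing step.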
Line 11 (the run_parse_count_onSplitInput.pl step, as I understand it) then reads through all of the split files and assigns each read to a sample (parsed), to PhiX or non-target (phixOther), or to a MID error (truemiderrors). The reads assigned to these three categories should add up to the totals above.
wc -l parsed*
Returns: 25644064 total
Divided by 8: 3205508 assigned to samples.
Assigned/Total (*100) = percent assigned: ~76%
The target for samples was 80%.
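The percentage above, and the number of read pairs left over for phixOther and truemiderrors, can be reproduced from the two counts already reported. This is only a sketch of the arithmetic; percent_assigned is a hypothetical helper name.

```shell
# percent_assigned ASSIGNED TOTAL : percentage, one decimal place.
# Hypothetical helper; awk is used because shell arithmetic is integer-only.
percent_assigned() {
    awk -v a="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}

total=4226214      # read pairs in the split raw files (33809712 / 8)
assigned=3205508   # read pairs assigned to samples (25644064 / 8)

percent_assigned "$assigned" "$total"   # prints 75.8
echo $(( total - assigned ))            # prints 1020706
```

If the parsing step accounts for every read, those 1020706 read pairs are what the phixOther and truemiderrors files should contain between them.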
Things get more confusing with the phixOther and truemiderrors files, because they do not appear to be true FASTQ files, nor do they appear to be FASTA. So, I do not know how to count the reads in them.
BLASTing random lines from phixOther returns a mix of PhiX and ‘uncultured bacterium 16S’ hits. I see no way of disentangling these.
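One way to get a handle on an unknown file layout is to tally lines by their first character. This is a diagnostic sketch only: it makes no assumption about what these files actually contain, and first_char_tally is a hypothetical helper name.

```shell
# first_char_tally FILE... : count lines by their first character.
# Hypothetical diagnostic helper; it does not parse any particular format.
first_char_tally() {
    awk '{ n[substr($0, 1, 1)]++ } END { for (c in n) print c, n[c] }' "$@" | sort
}

# Usage (file name pattern is an assumption):
#   first_char_tally phixOther*
```

A count of lines starting with ">" would suggest FASTA-style headers, while roughly equal counts of "@" and "+" lines would suggest FASTQ-like records. Note that "@" can also begin a quality line in FASTQ, so the tally is a hint rather than an exact read count.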
So, let us explore the results of lines 13 to 17 (the splitFastq and aggregate steps). These should be found in /project/microbiome/data_queue/seq/LRIII/rawdata/sample_fastq/16S/LRIII
and
/project/microbiome/data_queue/seq/LRIII/rawdata/sample_fastq/16S/LRIII
For locad2:
wc -l locad2*
Returns: 18304944 total
Divided by 8: 2288118
The file formats appear to be the same as the “parsed*” files above.
For LRII:
wc -l LRII*
Returns: 7339120 total
Divided by 8: 917390
LRII + locad2:
2288118 + 917390 = 3205508
Same as the parsed read count above.
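The reconciliation can be scripted so it is easy to repeat on another run. The sketch below uses only the wc -l totals reported above; check_sum is a hypothetical helper name.

```shell
# check_sum EXPECTED PART... : report whether the parts add up to EXPECTED.
# Hypothetical helper, named here only for illustration.
check_sum() {
    expected=$1
    shift
    sum=0
    for part in "$@"; do
        sum=$(( sum + part ))
    done
    if [ "$sum" -eq "$expected" ]; then
        echo "match: $sum"
    else
        echo "mismatch: parts sum to $sum, expected $expected"
    fi
}

# wc -l totals from above, each divided by 8 to get read pairs
parsed=$(( 25644064 / 8 ))
locad2=$(( 18304944 / 8 ))
lrii=$(( 7339120 / 8 ))

check_sum "$parsed" "$locad2" "$lrii"   # prints "match: 3205508"
```

A match here means that no reads were lost or duplicated between the parsed files and the per-sample files.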