LRIII Bioinformatics

The newest Mock Community does not show up at all when the sequencing run is demultiplexed. So either it was not added to the pool, or the demux key for it is incorrect.

Assign reads and OTUs to samples:

Assign Reads:

/project/microbiome/data_queue/seq/LRIII/rawdata
salloc --account=microbiome -t 0-06:00
mkdir -p /gscratch/grandol1/LRIII/rawdata
cd /gscratch/grandol1/LRIII/rawdata
unpigz --to-stdout /project/microbiome/data_queue/seq/LRIII/rawdata/LRIII-MC-MB_S1_L001_R1_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LRIII_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/LRIII/rawdata/LRIII-MC-MB_S1_L001_R2_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LRIII_R2_
/project/microbiome/data_queue/seq/LRIII/rawdata/run_parse_count_onSplitInput.pl
cd /project/microbiome/data_queue/seq/LRIII/rawdata
./run_splitFastq_fwd.sh
./run_splitFastq_rev.sh
cd /project/microbiome/data_queue/seq/LRIII/rawdata
./run_aggregate.sh
wc -l *R1* > LRIII.txt
mv LRIII.txt /project/microbiome/data_queue/seq/LRIII/rawdata

Process through to OTUs:

salloc --account=microbiome -t 0-06:00
cd /project/microbiome/data_queue/seq/LRIII/tfmergedreads
./run_slurm_mergereads.pl
cd /project/microbiome/data_queue/seq/LRIII/otu
./run_slurm_mkotu.pl

Exploration of Data created so far:

As I understand it, Line 9 above (the unpigz | split commands) simply splits the raw data into files of at most 1,000,000 lines (250,000 reads) each, so the total number of reads should remain constant.

cd /gscratch/grandol1/LRIII/rawdata

wc -l LRII*

This should return 8x the number of paired-end reads (2x for R1 plus R2, and 4x because each FASTQ record spans four lines).

This returns: 33809712 total

Divided by 8: 4226214
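The division by 8 can be wrapped in a small sanity check, sketched here in POSIX shell. The function name pairs_from_lines is hypothetical. It assumes the line total covers both the R1 and R2 files, that every FASTQ record has exactly four lines, and that R1 and R2 hold the same number of records; it reports an error if the total is not a multiple of 8.

```shell
# Hypothetical helper: convert a combined R1+R2 FASTQ line total into read pairs.
# Assumes 4 lines per record and equal R1/R2 record counts.
pairs_from_lines() {
  if [ $(( $1 % 8 )) -ne 0 ]; then
    echo "line total $1 is not a multiple of 8" >&2
    return 1
  fi
  echo $(( $1 / 8 ))
}

# Total reported by wc -l for the split files in this run.
pairs_from_lines 33809712   # prints 4226214
```

A total that is not divisible by 8 would suggest a truncated file or unequal R1 and R2 counts, which is worth ruling out before comparing against the parsed counts.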

 

Line 11 (run_parse_count_onSplitInput.pl) then reads through all of the split files and assigns each read to a sample (parsed), to PhiX or non-target (phixOther), or to a MID error (truemiderrors). The read counts in these three categories should add up to the total above.

wc -l parsed*

Returns: 25644064 total

Divided by 8: 3205508 assigned to samples.

Assigned / Total * 100 = 3205508 / 4226214 * 100 = ~76% assigned.

The target for samples was 80%.
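The percentage can be reproduced without a calculator. The sketch below uses only shell integer arithmetic and rounds to the nearest tenth of a percent; the function name percent_assigned is hypothetical, and the two arguments are the assigned and total read-pair counts from above.

```shell
# Hypothetical helper: percent of read pairs assigned, to one decimal place.
# Uses integer arithmetic only: computes tenths of a percent, rounded to nearest.
percent_assigned() {
  tenths=$(( (1000 * $1 + $2 / 2) / $2 ))
  printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

# Assigned pairs and total pairs for this run.
percent_assigned 3205508 4226214   # prints 75.8
```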

 

Things get more confusing with the phixOther and truemiderrors files, because they appear to be neither true FASTQ nor FASTA files. So I do not know how to count the reads in them.
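One way to make the "not true FASTQ" impression testable is a structural check, sketched below. It is only a heuristic under the assumption of plain four-line FASTQ records: it passes if the input has a multiple of four lines, every first line of a record starts with "@", and every third line starts with "+". The function name looks_like_fastq is hypothetical, and the inputs shown are small in-line examples rather than real files from this run.

```shell
# Hypothetical check: succeed only if stdin looks like 4-line FASTQ records
# (header line starts with "@", separator line starts with "+").
looks_like_fastq() {
  n=0
  while IFS= read -r line; do
    n=$(( n + 1 ))
    case $(( n % 4 )) in
      1) case "$line" in "@"*) ;; *) return 1 ;; esac ;;
      3) case "$line" in "+"*) ;; *) return 1 ;; esac ;;
    esac
  done
  # Require at least one record and no partial record at the end.
  [ "$n" -gt 0 ] && [ $(( n % 4 )) -eq 0 ]
}

# Small in-line examples rather than real run files.
printf '@read1\nACGT\n+\nIIII\n' | looks_like_fastq && echo "FASTQ-like"
printf '>seq1\nACGT\n' | looks_like_fastq || echo "not FASTQ-like"
```

To sample a large file, the first few thousand lines can be piped in with head (using a line count that is a multiple of 4) instead of reading the whole file.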

BLASTing random lines from phixOther returns a mix of PhiX and ‘uncultured bacterium 16S’ hits. I see no way of disentangling these.

So, let us explore the results of lines 13 to 17 (the run_splitFastq_fwd.sh, run_splitFastq_rev.sh, and run_aggregate.sh steps). These should be found in /project/microbiome/data_queue/seq/LRIII/rawdata/sample_fastq/16S/LRIII

and

/project/microbiome/data_queue/seq/LRIII/rawdata/sample_fastq/16S/LRIII

For locad2:

wc -l locad2*

Returns: 18304944 total

The file formats appear the same as the “parsed*” files above.

Divided by 8: 2288118

For LRII:

wc -l LRII*

Returns: 7339120 total

Divided by 8: 917390

LRII + locad2:

2288118+917390= 3205508

Same as the parsed read count above.
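The reconciliation of per-project counts against the parsed total can be sketched as a reusable check. The function name counts_reconcile is hypothetical; it takes the expected total first, followed by any number of per-project read-pair counts, and succeeds only if they sum to the expected value.

```shell
# Hypothetical check: do per-project pair counts sum to the expected total?
# Usage: counts_reconcile EXPECTED COUNT [COUNT ...]
counts_reconcile() {
  expected=$1
  shift
  sum=0
  for c in "$@"; do
    sum=$(( sum + c ))
  done
  echo "sum=$sum expected=$expected"
  [ "$sum" -eq "$expected" ]
}

# Parsed total, then the locad2 and LRII counts from above.
counts_reconcile 3205508 2288118 917390   # prints sum=3205508 expected=3205508
```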