
LRII.6 Bioinformatics

 

Assign reads and OTUs to samples:

Assign Reads:

/project/microbiome/data_queue/seq/LRII_LocAd2_3_16_22/rawdata

salloc --account=microbiome -t 0-06:00

mkdir -p /gscratch/grandol1/LRII_LocAd2_3_16_22/rawdata
cd /gscratch/grandol1/LRII_LocAd2_3_16_22/rawdata

unpigz --to-stdout /project/microbiome/data_queue/seq/LRII_LocAd2_3_16_22/rawdata/LRII-5-RMJan22_S1_L001_R1_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LRII_LocAd2_3_16_22_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/LRII_LocAd2_3_16_22/rawdata/LRII-5-RMJan22_S1_L001_R2_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LRII_LocAd2_3_16_22_R2_

/project/microbiome/data_queue/seq/LRII_LocAd2_3_16_22/rawdata/run_parse_count_onSplitInput.pl

cd /project/microbiome/data_queue/seq/LRII_LocAd2_3_16_22/rawdata
./run_splitFastq_fwd.sh
./run_splitFastq_rev.sh

cd /project/microbiome/data_queue/seq/LRII_LocAd2_3_16_22/rawdata
./run_aggregate.sh

Process through to OTUs:

salloc --account=microbiome -t 0-06:00

cd /project/microbiome/data_queue/seq/LRII_LocAd2_3_16_22/tfmergedreads
./run_slurm_mergereads.pl

cd /project/microbiome/data_queue/seq/LRII_LocAd2_3_16_22/otu
./run_slurm_mkotu.pl

Exploration of data created so far:

As I understand it, the unpigz | split step above only splits the raw data into files of 1,000,000 lines (250,000 reads) each, so the total number of reads should remain constant.

cd /gscratch/grandol1/LRII_LocAd2_3_16_22/rawdata

wc -l LRII*

This should return 8x the number of paired-end reads: 2x because R1 and R2 files are both counted, and 4x because each FASTQ record spans four lines.

This returns: 33809712 total

Divided by 8: 4226214
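The divide-by-8 arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name lines_to_pairs is hypothetical, and it assumes the count covers both R1 and R2 files and that every FASTQ record is exactly four lines.

```shell
# Convert a combined R1 + R2 FASTQ line count into a number of read pairs.
# Assumes 4 lines per record and that both R1 and R2 files were counted.
lines_to_pairs() {
    total_lines=$1
    # A count that is not a multiple of 8 suggests a truncated or unpaired file.
    if [ $(( total_lines % 8 )) -ne 0 ]; then
        echo "warning: $total_lines is not a multiple of 8" >&2
    fi
    echo $(( total_lines / 8 ))
}

lines_to_pairs 33809712   # total reported by wc -l LRII* above; prints 4226214
```

To avoid copying the total by hand, the same helper could be fed directly, for example with lines_to_pairs "$(cat LRII* | wc -l)" run from the directory holding the split files.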

 

The run_parse_count_onSplitInput.pl step then reads through all of the split files and assigns each read to a sample (parsed), to PhiX or non-target sequence (phixOther), or to a MID error (truemiderrors). The read counts across these three categories should add up to the total above.

wc -l parsed*

Returns: 25644064 total

Divided by 8: 3205508 assigned to samples.

Percent assigned = assigned / total x 100 = 3205508 / 4226214 x 100 = ~76%

The target for samples was 80%.
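As a quick check of the percentage above, here is a minimal sketch using awk for the floating-point division. The helper name percent_assigned is hypothetical; the two inputs are the read-pair counts derived earlier on this page.

```shell
# Print assigned / total as a percentage with one decimal place.
# $1 = read pairs assigned to samples, $2 = total read pairs.
percent_assigned() {
    awk -v assigned="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * assigned / total }'
}

percent_assigned 3205508 4226214   # prints 75.8
```

This puts the run about four percentage points under the 80% target.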

 

Things get more confusing with the phixOther and truemiderrors files, because they appear to be neither true FASTQ nor FASTA files, so I do not know how to count the reads in them.

BLASTing random lines from phixOther returns a mix of PhiX and ‘uncultured bacterium 16S’ hits. I see no way of disentangling these.
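One way to start working out the structure of these files is to tally the first character of every line. This is only an exploratory sketch, not a way to count reads: the helper name first_char_tally is hypothetical, and nothing here is known about the actual phixOther or truemiderrors layout. If header lines turn out to share a prefix (such as @ or >), their count would give a rough number of records.

```shell
# Tally lines by their first character, most frequent first.
# Reads from standard input so it can be pointed at any file.
first_char_tally() {
    awk '{ c = substr($0, 1, 1); n[c]++ }
         END { for (c in n) print n[c], c }' | sort -k1,1nr
}

# Example on a small in-line FASTA-like sample (3 headers, 3 sequences):
printf '>a\nACGT\n>b\nACGT\n>c\nTTTT\n' | first_char_tally

# Hypothetical usage on a real file (the path is a placeholder):
# first_char_tally < path/to/phixOther_file
```

Note that in FASTQ a quality line can also begin with @, so a tally like this is a hint about the format rather than an exact read count.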

So, let us instead explore the results of the run_splitFastq_fwd.sh, run_splitFastq_rev.sh, and run_aggregate.sh steps. These should be found in /project/microbiome/data_queue/seq/loc_ad2/rawdata/sample_fastq/16S/locad2

and

/project/microbiome/data_queue/seq/loc_ad2/rawdata/sample_fastq/16S/LRII

For locad2:

wc -l locad2*

Returns: 18304944 total

The file formats appear the same as the “parsed*” files above.

Divided by 8: 2288118

For LRII:

wc -l LRII*

Returns: 7339120 total

Divided by 8: 917390

LRII + locad2:

2288118 + 917390 = 3205508

This matches the parsed read count above.
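The reconciliation above can be written as a small reusable check. This is a sketch under the assumption that the per-project counts are meant to partition the parsed reads exactly; the function name check_partition is hypothetical.

```shell
# Verify that a set of per-project read-pair counts sums to an expected total.
# $1 = expected total; remaining arguments = the per-project counts.
check_partition() {
    expected=$1
    shift
    sum=0
    for n in "$@"; do
        sum=$(( sum + n ))
    done
    if [ "$sum" -eq "$expected" ]; then
        echo "OK: $sum"
    else
        echo "MISMATCH: parts sum to $sum, expected $expected"
        return 1
    fi
}

check_partition 3205508 2288118 917390   # prints OK: 3205508
```

The non-zero return status on a mismatch lets the check stop a script early if reads go missing between steps.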

