Assign reads and OTUs to samples:

...

As I understand it, Line 9 above simply splits the raw data into equal-sized files, so the total number of reads should remain constant.

cd /gscratch/grandol1/loc_ad2LRII_LocAd2_3_16_22/rawdata

wc -l LowReadLRII*

This should return 8x the number of paired-end reads (2x for R1 and R2, and 4x because each fastq record spans four lines).

This returns: 21045424 total (previous version), 33809712 total (current version)

Divided by 8: 2630678 (previous version), 4226214 (current version)
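The divide-by-8 step above can be sketched as a small shell helper. This is only an illustration: the function name `pairs_from_lines` is hypothetical, and it assumes uncompressed fastq files with four lines per record and separate R1 and R2 files.

```shell
# Convert a `wc -l` line total into a paired-end read count.
# Assumption: plain-text fastq, 4 lines per record, R1 and R2 both counted.
pairs_from_lines() {
    lines=$1
    # A total that is not a multiple of 8 hints at a truncated or non-fastq file.
    if [ $((lines % 8)) -ne 0 ]; then
        echo "warning: $lines is not a multiple of 8" >&2
    fi
    echo $((lines / 8))
}

# Example with a total reported above:
pairs_from_lines 33809712
```

To apply it to files directly, something like `pairs_from_lines "$(cat LowReadLRII* | wc -l)"` should work, since piping through `cat` makes `wc -l` print only the grand total.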

Line 11 then reads through all of the split files and assigns each read to a sample (parsed), to PhiX or non-target (phixOther), or to a MID error (truemiderrors). The reads assigned to these three categories should add up to the totals above.

wc -l parsed*

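Returns: 15371232 total (previous version), 25644064 total (current version)

Divided by 8: 1921404 (previous version), 3205508 (current version) assigned to samples.

Assigned/Total (*100) = percent assigned: ~73% (previous version), ~76% (current version)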
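The percentage above is a single division. A minimal sketch using awk for the floating-point arithmetic (the function name `percent_assigned` is hypothetical):

```shell
# Percent of read pairs assigned to samples.
# $1 = assigned read pairs, $2 = total read pairs
percent_assigned() {
    awk -v a="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}

# Example with the counts reported above:
percent_assigned 3205508 4226214
```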

The target for samples was 83% (off target by 12%) in the previous version; 80% in the current version.

Things get more confusing with the phixOther and truemiderror files, because they appear to be neither true fastq nor fasta files. So, I do not know how to count the reads in them.
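One way to start on the format question is to look at the first character of each file. The sketch below is a rough heuristic, not a validator: the function name `guess_format` is hypothetical, and it only assumes that fastq records begin with `@` and fasta records begin with `>`.

```shell
# Guess a file's format from the first character of its first line.
# Heuristic only: "@" suggests fastq, ">" suggests fasta, anything else is unknown.
guess_format() {
    first=$(head -n 1 "$1" | cut -c1)
    case "$first" in
        "@") echo "fastq-like" ;;
        ">") echo "fasta-like" ;;
        *)   echo "unknown" ;;
    esac
}
```

It could be run as `for f in phixOther* truemiderror*; do echo "$f: $(guess_format "$f")"; done`, where the globs are assumptions about how the files are named. For files reported as unknown, `head` on one of them is probably the quickest way to see how many lines make up a record.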

...

For locad2:

wc -l locad2*

Returns: 4071280 total (previous version), 18304944 total (current version)

The file formats appear the same as the “parsed*” files above.

Divided by 8: 508910 (previous version), 2288118 (current version)

For LRII:

wc -l LRII*

Returns: 11299952 total (previous version), 7339120 total (current version)

Divided by 8: 1412494 (previous version), 917390 (current version)

LRII + locad2 (previous version): 508910 + 1412494 = 1921404

Even if all the unassigned reads are from locad2, this does not fix the expected ratio of 2lo:1LR (locad2 to LRII).

[508910 + (2630678 - 1921404)] = 1218184 total possible locad2 reads (previous version)

LRII + locad2 (current version): 2288118 + 917390 = 3205508

Same as the parsed read count above.
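The sum check and the locad2-to-LRII ratio can be computed together. A minimal sketch, with a hypothetical function name `check_split`:

```shell
# Compare per-project read-pair counts against the parsed total
# and report the locad2:LRII ratio.
# $1 = locad2 pairs, $2 = LRII pairs, $3 = parsed pairs
check_split() {
    awk -v lo="$1" -v lr="$2" -v parsed="$3" 'BEGIN {
        printf "sum=%d matches_parsed=%s ratio=%.2f:1\n",
            lo + lr, (lo + lr == parsed ? "yes" : "no"), lo / lr
    }'
}

# Example with the counts reported above:
check_split 2288118 917390 3205508
```

The printed ratio can then be compared against the expected 2lo:1LR.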

cd /project/microbiome/data_queue/seq/loc_ad2/tfmergedreads

...