The red Low Read II samples were renormalized via pooling, and the new MC samples were added to that same pool at 1 ul per sample. This pool was then adjusted to 1 nM. The pool of the repeated RMJan22 samples was also adjusted to 1 nM, and 50 ul of the LowRead pool was added to 100 ul of the RMJan22 pool (a 1:2 ratio). However, when the LowRead pool was qPCRed, one replicate was much higher than the other two, so the 1:2 ratio may be off. We ran an RNaseP test plate to recheck the recalibration of the 7500 qPCR machine, and it looks good. Because the percentage of reads passing filter was so low (35%), we are going to re-qPCR new dilutions of the pools and try this again.
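The intended 1:2 mix can be checked with a little arithmetic: each pool's share of the final library is proportional to volume times concentration (ul x nM = fmol). This is a minimal sketch assuming both pools really are at 1 nM; `pool_share` is a hypothetical helper, and the second call (LowRead pool actually at 2 nM) only illustrates how a mis-quantified pool would shift the ratio. It is not a measured value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fraction of library molecules contributed by two pools.
# usage: pool_share VOL_A_ul CONC_A_nM VOL_B_ul CONC_B_nM
pool_share() {
  awk -v va="$1" -v ca="$2" -v vb="$3" -v cb="$4" 'BEGIN {
    a = va * ca                     # fmol from pool A (ul x nM = fmol)
    b = vb * cb                     # fmol from pool B
    printf "%.4f %.4f\n", a / (a + b), b / (a + b)
  }'
}

pool_share 50 1 100 1   # LowRead vs RMJan22 as pooled: 0.3333 0.6667
pool_share 50 2 100 1   # if the LowRead pool were really 2 nM: 0.5000 0.5000
```

A higher-than-expected qPCR replicate for the LowRead pool is exactly the kind of error that would push the mix from 1:2 toward 1:1.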
RNaseP Check of qPCR machine recalibration:
Recalibration looks good. The 5,000-copy samples all came back close to 5,000 copies (mean 4,935) and the 10,000-copy samples all came back at roughly 10,000 copies (mean 9,923), with no real outliers. A snapshot is below:
Sample Name | Detector | Task | Ct | StdDev Ct | Qty (copies) | Mean Qty (copies) | StdDev Qty (copies) |
---|---|---|---|---|---|---|---|
5K | RNase P | Unknown | 27.80 | 0.052 | 5170.54 | 4935.10 | 168.205 |
5K | RNase P | Unknown | 27.80 | 0.052 | 5164.36 | 4935.10 | 168.205 |
10K | RNase P | Unknown | 26.77 | 0.058 | 10193.14 | 9923.36 | 380.987 |
10K | RNase P | Unknown | 26.82 | 0.058 | 9863.28 | 9923.36 | 380.987 |
NTC | RNase P | NTC | Undetermined | ||||
NTC | RNase P | NTC | Undetermined | ||||
NTC | RNase P | NTC | Undetermined | ||||
NTC | RNase P | NTC | Undetermined | ||||
Standard1 | RNase P | Standard | 29.94 | 0.039 | 1250.00 | ||
Standard1 | RNase P | Standard | 29.92 | 0.039 | 1250.00 | ||
Standard1 | RNase P | Standard | 30.00 | 0.039 | 1250.00 | ||
Standard1 | RNase P | Standard | 30.00 | 0.039 | 1250.00 | ||
Standard2 | RNase P | Standard | 28.92 | 0.059 | 2500.00 | ||
Standard2 | RNase P | Standard | 28.87 | 0.059 | 2500.00 | ||
Standard2 | RNase P | Standard | 28.82 | 0.059 | 2500.00 | ||
Standard2 | RNase P | Standard | 28.78 | 0.059 | 2500.00 | ||
Standard3 | RNase P | Standard | 27.81 | 0.034 | 5000.00 | ||
Standard3 | RNase P | Standard | 27.82 | 0.034 | 5000.00 | ||
Standard3 | RNase P | Standard | 27.86 | 0.034 | 5000.00 | ||
Standard3 | RNase P | Standard | 27.88 | 0.034 | 5000.00 | ||
Standard4 | RNase P | Standard | 26.91 | 0.013 | 10000.00 | ||
Standard4 | RNase P | Standard | 26.93 | 0.013 | 10000.00 | ||
Standard4 | RNase P | Standard | 26.90 | 0.013 | 10000.00 | ||
Standard4 | RNase P | Standard | 26.89 | 0.013 | 10000.00 | ||
Standard5 | RNase P | Standard | 25.76 | 0.062 | 20000.00 | ||
Standard5 | RNase P | Standard | 25.69 | 0.062 | 20000.00 | ||
Standard5 | RNase P | Standard | 25.67 | 0.062 | 20000.00 | ||
Standard5 | RNase P | Standard | 25.61 | 0.062 | 20000.00 | ||
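One way to see why the recalibration looks good is to refit the standard curve from the Standard rows above. This is a minimal sketch assuming the usual log-linear model, Ct = slope x log10(copies) + intercept, fit by ordinary least squares; `fit_curve` and `copies_from_ct` are hypothetical helpers, not part of the 7500 software. With these standards the slope comes out near -3.49 (about 93% efficiency), and a Ct of 27.80 back-calculates to about 5,160 copies, close to the values the instrument reported for the 5K wells.

```bash
#!/usr/bin/env bash
# Hypothetical helper: least-squares fit of Ct = slope * log10(copies) + intercept.
# Reads "Ct copies" pairs on stdin; prints slope, intercept and efficiency.
fit_curve() {
  awk '
    {
      x = log($2) / log(10)          # log10(copies)
      y = $1                         # Ct
      n++; sx += x; sy += y; sxx += x * x; sxy += x * y
    }
    END {
      slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
      intercept = (sy - slope * sx) / n
      eff = 10 ^ (-1 / slope) - 1    # 1.0 would be perfect doubling per cycle
      printf "%.6f %.6f %.4f\n", slope, intercept, eff
    }'
}

# Hypothetical helper: copies implied by a Ct under the fitted curve.
copies_from_ct() {  # usage: copies_from_ct CT SLOPE INTERCEPT
  awk -v ct="$1" -v m="$2" -v b="$3" 'BEGIN { printf "%.1f\n", 10 ^ ((ct - b) / m) }'
}

# Ct and nominal copies for the Standard1-Standard5 rows in the table above
standards='29.94 1250
29.92 1250
30.00 1250
30.00 1250
28.92 2500
28.87 2500
28.82 2500
28.78 2500
27.81 5000
27.82 5000
27.86 5000
27.88 5000
26.91 10000
26.93 10000
26.90 10000
26.89 10000
25.76 20000
25.69 20000
25.67 20000
25.61 20000'

read -r slope intercept eff < <(printf '%s\n' "$standards" | fit_curve)
echo "slope=$slope intercept=$intercept efficiency=$eff"
copies_from_ct 27.80 "$slope" "$intercept"   # one of the 5K unknowns
```

A slope between roughly -3.3 and -3.6 (efficiency around 90 to 100%) is the usual sanity range for a qPCR standard curve.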
Assign reads and OTUs to samples:
/project/microbiome/data_queue/seq/LowReadII/rawdata
salloc --account=microbiome -t 0-06:00
mkdir -p /gscratch/grandol1/loc_ad2/rawdata
cd /gscratch/grandol1/loc_ad2/rawdata
unpigz --to-stdout /project/microbiome/data_queue/seq/loc_ad2/rawdata/LRII-RMJAN22_S1_L001_R1_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LowReadII_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/loc_ad2/rawdata/LRII-RMJAN22_S1_L001_R2_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LowReadII_R2_
/project/microbiome/data_queue/seq/loc_ad2/rawdata/run_parse_count_onSplitInput.pl
cd /project/microbiome/data_queue/seq/loc_ad2/rawdata
./run_splitFastq_fwd.sh
./run_splitFastq_rev.sh
cd /project/microbiome/data_queue/seq/loc_ad2/rawdata
./run_aggregate.sh
Exploration of Data created so far:
From my understanding, the unpigz | split step above simply splits the raw R1 and R2 data into files of 1,000,000 lines (250,000 reads) each, so the total number of reads should remain constant.
cd /gscratch/grandol1/loc_ad2/rawdata
wc -l LowRead*
This should return 8x the number of read pairs: 2x because each pair has an R1 and an R2 read, and 4x because each FASTQ record spans four lines.
This returns: 21045424 total
Divided by 8: 2630678 read pairs
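The "lines / 8 = read pairs" rule, and the fact that splitting does not lose reads, can be demonstrated on a toy example. This sketch uses synthetic reads and hypothetical file names (`toy_R1.fastq`, `Toy_R1_*`), and assumes GNU coreutils `split` (needed for `--additional-suffix`); the chunk size of 8 lines stands in for the real 1,000,000, and both are multiples of 4, so no FASTQ record is cut in half.

```bash
#!/usr/bin/env bash
set -euo pipefail
# Toy check with synthetic data: split -l keeps every read when the chunk
# size is a multiple of 4 lines, so wc -l over all R1 + R2 chunks / 8 = pairs.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

for r in R1 R2; do
  # 10 fake reads; each FASTQ record is 4 lines (header, seq, +, quality)
  for i in {1..10}; do
    printf '@read%d\nACGT\n+\nIIII\n' "$i"
  done > "toy_${r}.fastq"
  # Same split options as the real run, but 8-line chunks instead of 1000000
  split -l 8 -d --suffix-length=3 --additional-suffix=.fastq "toy_${r}.fastq" "Toy_${r}_"
done

total=$(( $(cat Toy_*.fastq | wc -l) ))
echo "lines in chunks: $total  read pairs: $(( total / 8 ))"
```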
run_parse_count_onSplitInput.pl then reads through all of the split files and assigns each read to a sample (parsed), to PhiX or other non-target sequence (phixOther), or to a MID error (truemiderrors). The reads assigned to these three categories should add up to the total above.
wc -l parsed*
Returns: 15371232
Divided by 8: 1921404 assigned to samples.
Percent assigned = 100 x assigned / total = 100 x 1921404 / 2630678 = ~73%
The target for samples was 83%, so this is about 10 percentage points (12% in relative terms) below target.
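The percent-assigned arithmetic, including both readings of "off target" (percentage points vs. relative shortfall), can be reproduced directly from the `wc -l` totals. `pct_assigned` is a hypothetical helper; the dividing by 8 cancels out but is kept to mirror the steps above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percent of read pairs assigned to samples and the gap
# to a target, from wc -l totals (8 lines per read pair across R1 + R2).
# usage: pct_assigned PARSED_LINES TOTAL_LINES TARGET_PCT
pct_assigned() {
  awk -v p="$1" -v t="$2" -v target="$3" 'BEGIN {
    pct = 100 * (p / 8) / (t / 8)
    printf "assigned: %.1f%%  shortfall: %.1f points (%.1f%% of target)\n",
           pct, target - pct, 100 * (target - pct) / target
  }'
}

pct_assigned 15371232 21045424 83
# assigned: 73.0%  shortfall: 10.0 points (12.0% of target)
```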
The phixOther and truemiderrors files are harder to interpret: they do not appear to be true FASTQ files, nor do they appear to be FASTA, so I do not know how to count the reads in them.
BLASTing random lines from phixOther returns a mix of phiX and ‘uncultured bacterium 16S’, and I see no way of disentangling the two.
Next, let us explore the output of run_splitFastq_fwd.sh and run_splitFastq_rev.sh. It should be found in /project/microbiome/data_queue/seq/loc_ad2/rawdata/sample_fastq/16S/locad2
and
/project/microbiome/data_queue/seq/loc_ad2/rawdata/sample_fastq/16S/LRII
For locad2:
wc -l locad2*
Returns: 4071280
The file formats appear the same as the “parsed*” files above.
Divided by 8: 508910
For LRII:
wc -l LRII*
Returns: 11299952
Divided by 8: 1412494
LRII + locad2: 1921404
Even if all of the unassigned reads were from locad2, this would not restore the expected 2:1 ratio of locad2 to LRII reads:
[508910 + (2630678 - 1921404)] = 1218184 total possible locad2 reads, still fewer than the 1412494 LRII reads.
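The same best-case argument expressed as ratios: crediting every unassigned read pair to locad2 still leaves the locad2:LRII ratio well below the expected 2:1, and the observed ratio is closer to the reverse. `ratio_check` is a hypothetical helper; the inputs are the read-pair counts above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: locad2:LRII read ratio as observed, and in the best case
# where every unassigned read pair is credited to locad2.
# usage: ratio_check LOCAD2_PAIRS LRII_PAIRS TOTAL_PAIRS ASSIGNED_PAIRS
ratio_check() {
  awk -v lo="$1" -v lr="$2" -v tot="$3" -v asg="$4" 'BEGIN {
    best = lo + (tot - asg)          # locad2 plus all unassigned pairs
    printf "observed %.2f:1  best case %.2f:1  expected 2.00:1\n", lo / lr, best / lr
  }'
}

ratio_check 508910 1412494 2630678 1921404
# observed 0.36:1  best case 0.86:1  expected 2.00:1
```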
cd /project/microbiome/data_queue/seq/loc_ad2/tfmergedreads
./run_slurm_mergereads.pl
cd /project/microbiome/data_queue/seq/LowReadII/otu
./run_slurm_mkotu.pl
Assign taxonomy:
salloc --account=microbiome -t 0-02:00 --mem=500G
module load swset/2018.05 gcc/7.3.0
module load vsearch/2.15.1
vsearch --sintax zotus.fa --db /project/microbiome/users/grandol1/ref_db/gg_16s_13.5.fa -tabbedout LRII.sintax -sintax_cutoff 0.8
Output:
Reading file /project/microbiome/users/grandol1/ref_db/gg_16s_13.5.fa 100%
1769520677 nt in 1262986 seqs, min 1111, max 2368, avg 1401
Counting k-mers 100%
Creating k-mer index 100%
Classifying sequences 100%
Classified 4038 of 4042 sequences (99.90%)
Convert the SINTAX output into a CSV table:
awk -F "\t" '
  BEGIN {OFS=","}
  # Header row, printed once before the first record
  NR==1 {print "OTU_ID","SEQS","SIZE","DOMAIN","KINGDOM","PHYLUM","CLASS","ORDER","FAMILY","GENUS","SPECIES"}
  {
    # Turn ";" separators in the OTU label into commas and strip the centroid=, seqs= and size= prefixes
    gsub(";", ","); gsub("centroid=", ""); gsub("seqs=", ""); gsub("size=", "")
    # Column 4 holds the ranks that passed -sintax_cutoff; three-argument match() needs gawk
    match($4, /d:[^,]+/, d); match($4, /k:[^,]+/, k); match($4, /p:[^,]+/, p); match($4, /c:[^,]+/, c)
    match($4, /o:[^,]+/, o); match($4, /f:[^,]+/, f); match($4, /g:[^,]+/, g); match($4, /s:[^,]+/, s)
    # Write "NA" for any rank that was not assigned
    print $1, d[0]=="" ? "NA" : d[0], k[0]=="" ? "NA" : k[0], p[0]=="" ? "NA" : p[0], c[0]=="" ? "NA" : c[0],
          o[0]=="" ? "NA" : o[0], f[0]=="" ? "NA" : f[0], g[0]=="" ? "NA" : g[0], s[0]=="" ? "NA" : s[0]
  }' LRII.sintax > LRIItaxonomy.csv
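The conversion above relies on gawk's three-argument `match()`, which other awks (for example mawk) do not support. A possible portable alternative, sketched here as a hypothetical `sintax_to_csv` helper, splits column 4 of the `-tabbedout` file (assumed, per the vsearch documentation, to hold the ranks that passed `-sintax_cutoff`) on commas instead. This is a swapped-in technique, not the command that was run, and it assumes plain OTU labels with no centroid=/seqs=/size= fields, so it writes 9 columns rather than 11.

```bash
#!/usr/bin/env bash
# Hypothetical portable alternative: POSIX awk, no three-argument match().
# Assumes plain OTU labels, so the output has 9 columns (no SEQS/SIZE).
sintax_to_csv() {
  awk -F '\t' '
    BEGIN {
      OFS = ","
      nrank = split("d k p c o f g s", rank, " ")
      print "OTU_ID", "DOMAIN", "KINGDOM", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS", "SPECIES"
    }
    {
      for (i = 1; i <= nrank; i++) val[rank[i]] = "NA"   # default: rank not assigned
      n = split($4, part, ",")                            # e.g. "k:Bacteria,p:Firmicutes"
      for (i = 1; i <= n; i++) {
        key = substr(part[i], 1, 1)
        if (substr(part[i], 2, 1) == ":" && (key in val)) val[key] = part[i]
      }
      line = $1
      for (i = 1; i <= nrank; i++) line = line OFS val[rank[i]]
      print line
    }'
}

# Usage with a made-up SINTAX line: Zotu1 confidently classified to phylum only
printf 'Zotu1\tk:Bacteria(1.00),p:Firmicutes(0.95),c:Bacilli(0.40)\t+\tk:Bacteria,p:Firmicutes\n' | sintax_to_csv
# OTU_ID,DOMAIN,KINGDOM,PHYLUM,CLASS,ORDER,FAMILY,GENUS,SPECIES
# Zotu1,NA,k:Bacteria,p:Firmicutes,NA,NA,NA,NA,NA
```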