
The red Low Read II samples were renormalized by pooling, and the new MC samples were added to that same pool at 1 ul per sample. This pool was then adjusted to 1 nM. The pool of the repeated RMJan22 samples was also adjusted to 1 nM. 50 ul of the Low Read pool was added to 100 ul of the RMJan22 pool. However, when the Low Read pool was run on qPCR, one replicate was much higher than the other two, so the 1:2 ratio might be off. We ran an RNaseP test plate to recheck the recalibration of the 7500 qPCR machine, and it looks good. Because the reads passing filter was so low (35%), we are going to re-qPCR new dilutions of the pools and try this again.

RNaseP Check of qPCR machine recalibration:

Recalibration looks good. The 5,000 copy samples all reported back as close to 5,000 copies and the 10,000 copy samples all reported back as roughly 10,000 copies with no real outliers. A snapshot is below:

| Sample Name | Detector | Task | Ct | StdDev Ct | Qty | Mean Qty | StdDev Qty |
|---|---|---|---|---|---|---|---|
| 5K | RNase P | Unknown | 27.80 | 0.052 | 5170.54 | 4935.10 | 168.205 |
| 5K | RNase P | Unknown | 27.80 | 0.052 | 5164.36 | 4935.10 | 168.205 |
| 10K | RNase P | Unknown | 26.77 | 0.058 | 10193.14 | 9923.36 | 380.987 |
| 10K | RNase P | Unknown | 26.82 | 0.058 | 9863.28 | 9923.36 | 380.987 |
| NTC | RNase P | NTC | Undetermined | | | | |
| NTC | RNase P | NTC | Undetermined | | | | |
| NTC | RNase P | NTC | Undetermined | | | | |
| NTC | RNase P | NTC | Undetermined | | | | |
| Standard1 | RNase P | Standard | 29.94 | 0.039 | 1250.00 | | |
| Standard1 | RNase P | Standard | 29.92 | 0.039 | 1250.00 | | |
| Standard1 | RNase P | Standard | 30.00 | 0.039 | 1250.00 | | |
| Standard1 | RNase P | Standard | 30.00 | 0.039 | 1250.00 | | |
| Standard2 | RNase P | Standard | 28.92 | 0.059 | 2500.00 | | |
| Standard2 | RNase P | Standard | 28.87 | 0.059 | 2500.00 | | |
| Standard2 | RNase P | Standard | 28.82 | 0.059 | 2500.00 | | |
| Standard2 | RNase P | Standard | 28.78 | 0.059 | 2500.00 | | |
| Standard3 | RNase P | Standard | 27.81 | 0.034 | 5000.00 | | |
| Standard3 | RNase P | Standard | 27.82 | 0.034 | 5000.00 | | |
| Standard3 | RNase P | Standard | 27.86 | 0.034 | 5000.00 | | |
| Standard3 | RNase P | Standard | 27.88 | 0.034 | 5000.00 | | |
| Standard4 | RNase P | Standard | 26.91 | 0.013 | 10000.00 | | |
| Standard4 | RNase P | Standard | 26.93 | 0.013 | 10000.00 | | |
| Standard4 | RNase P | Standard | 26.90 | 0.013 | 10000.00 | | |
| Standard4 | RNase P | Standard | 26.89 | 0.013 | 10000.00 | | |
| Standard5 | RNase P | Standard | 25.76 | 0.062 | 20000.00 | | |
| Standard5 | RNase P | Standard | 25.69 | 0.062 | 20000.00 | | |
| Standard5 | RNase P | Standard | 25.67 | 0.062 | 20000.00 | | |
| Standard5 | RNase P | Standard | 25.61 | 0.062 | 20000.00 | | |
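One way to put a number on "looks good" is to fit the standards as Ct against log10(quantity) and look at the slope. The sketch below is not part of the original check; `std_curve_fit` is a hypothetical helper name, and the efficiency formula E = 10^(-1/slope) - 1 is the usual qPCR convention rather than something reported by the 7500 software here. It reads "quantity Ct" pairs from standard input.

```shell
# Least-squares fit of Ct against log10(quantity), read as "qty ct" pairs.
# Prints the slope, the intercept, and the implied amplification efficiency.
std_curve_fit() {
  awk '
    { x = log($1) / log(10); y = $2
      n++; sx += x; sy += y; sxx += x * x; sxy += x * y }
    END {
      slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
      intercept = (sy - slope * sx) / n
      eff = 10 ^ (-1 / slope) - 1
      printf "slope=%.3f intercept=%.3f efficiency=%.3f\n", slope, intercept, eff
    }'
}

# The 20 standard wells from the snapshot above (quantity, Ct).
std_curve_fit <<'EOF'
1250 29.94
1250 29.92
1250 30.00
1250 30.00
2500 28.92
2500 28.87
2500 28.82
2500 28.78
5000 27.81
5000 27.82
5000 27.86
5000 27.88
10000 26.91
10000 26.93
10000 26.90
10000 26.89
20000 25.76
20000 25.69
20000 25.67
20000 25.61
EOF
```

A slope near -3.32 corresponds to perfect doubling per cycle, so the further the fitted slope is from that value, the further the efficiency is from 100%.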

Assign reads and OTUs to samples:

/project/microbiome/data_queue/seq/LowReadII/rawdata

salloc --account=microbiome -t 0-06:00

mkdir -p /gscratch/grandol1/loc_ad2/rawdata

cd /gscratch/grandol1/loc_ad2/rawdata

unpigz --to-stdout /project/microbiome/data_queue/seq/loc_ad2/rawdata/LRII-RMJAN22_S1_L001_R1_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LowReadII_R1_ ;unpigz --to-stdout /project/microbiome/data_queue/seq/loc_ad2/rawdata/LRII-RMJAN22_S1_L001_R2_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - LowReadII_R2_

/project/microbiome/data_queue/seq/loc_ad2/rawdata/run_parse_count_onSplitInput.pl

cd /project/microbiome/data_queue/seq/loc_ad2/rawdata

./run_splitFastq_fwd.sh

./run_splitFastq_rev.sh

cd /project/microbiome/data_queue/seq/loc_ad2/rawdata

./run_aggregate.sh

Exploration of Data created so far:

From my understanding, line 9 from above (the unpigz | split command) simply splits the raw data into equal-sized files, so the total number of reads should remain constant.

cd /gscratch/grandol1/loc_ad2/rawdata

wc -l LowRead*

This should return 8x the number of paired-end reads (2x because R1 and R2 are both counted, and 4x because each FASTQ record spans four lines).

This returns: 21045424 total

Divided by 8: 2630678
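The divide-by-8 step can be wrapped in two small helpers. This is a minimal sketch, not part of the original workflow: `fastq_records` and `pairs_from_lines` are hypothetical names, and the sketch assumes well-formed four-line FASTQ records with equal numbers of R1 and R2 reads.

```shell
# Count FASTQ records in one or more files: every record occupies 4 lines.
fastq_records() {
  echo $(( $(cat "$@" | wc -l) / 4 ))
}

# Convert a combined R1 + R2 line count into read pairs:
# 4 lines per record x 2 files per pair = 8 lines per pair.
pairs_from_lines() {
  echo $(( $1 / 8 ))
}

pairs_from_lines 21045424   # line total reported by wc -l above; prints 2630678
```

Running `fastq_records LowReadII_R1_*` and `fastq_records LowReadII_R2_*` separately would also show whether the two read directions hold the same number of records.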

Line 11 (run_parse_count_onSplitInput.pl) then reads through all of the split files and assigns each read to a sample (parsed), to PhiX or non-target (phixOther), or to a MID error (truemiderrors). The reads assigned to these three categories should add up to the total above.

wc -l parsed*

Returns: 15371232

Divided by 8: 1921404 assigned to samples.

Assigned/Total (*100) = percent assigned: ~73%

The target for samples was 83%, so we are about 10 percentage points short (roughly 12% below target in relative terms).
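The percentages above can be reproduced with a short calculation. This is only a sketch of the arithmetic; `percent_of` is a hypothetical helper name, and the counts are the ones reported above.

```shell
# Percentage of a part relative to a whole, to one decimal place.
percent_of() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

assigned=1921404   # read pairs assigned to samples (parsed*)
total=2630678      # read pairs in the split raw data
target=83          # target percentage for samples

pct=$(percent_of "$assigned" "$total")
echo "assigned: ${pct}%"
# Shortfall in percentage points, and relative to the target itself.
awk -v p="$pct" -v t="$target" 'BEGIN {
  printf "shortfall: %.1f percentage points (%.1f%% of target)\n", t - p, 100 * (t - p) / t
}'
```

This prints an assigned fraction of 73.0% and a shortfall of 10.0 percentage points, which is 12.0% of the target.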

Things get more confusing with the phixOther and truemiderror files: they do not appear to be true FASTQ files, nor do they appear to be FASTA, so I do not know how to count the reads in them.

Blasting random lines from phixOther returns a mix of phiX and ‘uncultured bacterium 16S’. I see no way of disentangling this.
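A quick structural test can at least confirm whether a given file follows the four-line FASTQ layout before any attempt to count reads in it. The sketch below is a heuristic only, and `looks_like_fastq` is a hypothetical helper name; it checks the line count and the leading characters of the header and separator lines, not the sequence or quality content.

```shell
# Return 0 if the file has a non-zero line count divisible by 4, every
# record starts with "@", and every third line of a record starts with "+".
looks_like_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
    END { if (bad || NR == 0 || NR % 4 != 0) exit 1 }
  ' "$1"
}
```

It could be called on each phixOther or truemiderror file in turn, for example `looks_like_fastq "$f" || echo "$f is not FASTQ-like"`, where `$f` stands for one of those files.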

So, let us explore the results of lines 13 to 17 (the run_splitFastq_fwd.sh and run_splitFastq_rev.sh steps). These should be found in /project/microbiome/data_queue/seq/loc_ad2/rawdata/sample_fastq/16S/locad2

and

/project/microbiome/data_queue/seq/loc_ad2/rawdata/sample_fastq/16S/LRII

For locad2:

wc -l locad2*

Returns: 4071280

Divided by 8: 508910

The file formats appear the same as the “parsed*” files above.

For LRII:

wc -l LRII*

Returns: 11299952

Divided by 8: 1412494

LRII + locad2: 1921404, which matches the number of reads assigned to samples above.

Even if all the unassigned reads are from locad2, this does not recover the expected ratio of 2 locad2 : 1 LRII.

[508910 + (2630678 - 1921404)] = 1218184 total possible locad2 reads, which is still fewer than the 1412494 LRII reads.
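The ratio comparison can be made explicit with the counts reported above. This is a sketch of the arithmetic only; `ratio` is a hypothetical helper name, and the "best case" assumes every unassigned read belongs to locad2.

```shell
# Ratio of two counts, to two decimal places.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

locad2=508910      # read pairs assigned to locad2
lrii=1412494       # read pairs assigned to LRII
total=2630678      # read pairs in the split raw data
assigned=1921404   # read pairs assigned to any sample

unassigned=$(( total - assigned ))
locad2_max=$(( locad2 + unassigned ))   # upper bound on locad2 reads

echo "observed  locad2:LRII = $(ratio "$locad2" "$lrii"):1"
echo "best case locad2:LRII = $(ratio "$locad2_max" "$lrii"):1"
```

The observed ratio comes out at 0.36:1 and the best case at 0.86:1, both well below the expected 2:1.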

cd /project/microbiome/data_queue/seq/loc_ad2/tfmergedreads

./run_slurm_mergereads.pl

cd /project/microbiome/data_queue/seq/LowReadII/otu

./run_slurm_mkotu.pl

Assign taxonomy:

salloc --account=microbiome -t 0-02:00 --mem=500G

module load swset/2018.05  gcc/7.3.0

module load vsearch/2.15.1

vsearch --sintax zotus.fa --db /project/microbiome/users/grandol1/ref_db/gg_16s_13.5.fa -tabbedout LRII.sintax -sintax_cutoff 0.8

Output:

Reading file /project/microbiome/users/grandol1/ref_db/gg_16s_13.5.fa 100%  

1769520677 nt in 1262986 seqs, min 1111, max 2368, avg 1401

Counting k-mers 100% 

Creating k-mer index 100% 

Classifying sequences 100%   

Classified 4038 of 4042 sequences (99.90%)

Convert into a useful form (this needs gawk, because the three-argument form of match() is a GNU extension):

awk -F "\t" '{OFS=","} NR==1 {print "OTU_ID","SEQS","SIZE","DOMAIN","KINGDOM","PHYLUM","CLASS","ORDER","FAMILY","GENUS","SPECIES"} {gsub(";", ","); gsub("centroid=", ""); gsub("seqs=", ""); gsub("size=", ""); match($4, /d:[^,]+/, d); match($4, /k:[^,]+/, k); match($4, /p:[^,]+/, p); match($4, /c:[^,]+/, c); match($4, /o:[^,]+/, o); match($4, /f:[^,]+/, f); match($4, /g:[^,]+/, g); match($4, /s:[^,]+/, s); print $1, d[0]=="" ? "NA" : d[0], k[0]=="" ? "NA" : k[0], p[0]=="" ? "NA" : p[0], c[0]=="" ? "NA" : c[0], o[0]=="" ? "NA" : o[0], f[0]=="" ? "NA" : f[0], g[0]=="" ? "NA" : g[0], s[0]=="" ? "NA" : s[0] }' LRII.sintax > LRIItaxonomy.csv
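As a quick sanity check on the resulting CSV, the OTUs can be tallied by phylum. This is a minimal sketch, assuming every data row has the eleven columns named in the header so that PHYLUM is column 6; `phylum_counts` is a hypothetical helper name and is not part of the original workflow.

```shell
# Tally rows per value of column 6 (PHYLUM), skipping the header line.
phylum_counts() {
  awk -F, 'NR > 1 { n[$6]++ } END { for (p in n) print p, n[p] }' "$1" | sort
}
```

Calling `phylum_counts LRIItaxonomy.csv` would list each phylum label with its OTU count, with unclassified OTUs grouped under NA. If the counts look misaligned, the row layout probably does not match the header and the column number needs adjusting.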

