Info:
The new Zymo Mock Community samples (cleaned via the max tip use protocol) did not show up in the demultiplexed reads. Either they were not added, or the demux key is wrong for them. I believe the results are inconclusive, and our best approach is to pool NS5 as normal and use it as the data set for deciding on pooling standards moving forward. There is at least a plate effect in the relationship between reads and qPCR quantity (molar concentration). We have 7 partial plates totaling 409 samples that used the tip conservative method and 18 full plates (1728 samples) that were cleaned without tip reuse on the 96 channel pipette. I suggest using 288 samples from each category and repeating this experiment, running single qPCRs for <73 samples from each plate and using the sequencing data from NS5. Any low read plates or samples can be included in NS6. There is a relationship between reads and qPCR quantification. We will take absorbance readings of these same samples; if there is a relationship between qPCR, absorbance, and reads, we will use absorbance from all samples to normalize prior to pooling. A sketch of that three-way check appears below.
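As a concrete version of that decision rule, here is a minimal sketch, assuming the LRIIIReads data frame assembled in the code blocks below (columns reads, mean, NgPerUl, and plate, available once the absorbance join is done); the model form is my own illustration, not an established step of our pipeline.

Code Block:

library(tidyverse)

#Regress reads on both quantification methods at once. If absorbance
#(NgPerUl) explains reads about as well as qPCR quantity (mean) does,
#absorbance can stand in for qPCR when normalizing prior to pooling.
fit <- lm(reads ~ mean + NgPerUl, data = LRIIIReads)
summary(fit)

#Per-plate correlations of reads with each method, as a quick look at
#the plate effect.
LRIIIReads %>%
  group_by(plate) %>%
  summarise(
    r_qPCR       = cor(reads, mean, use = "complete.obs"),
    r_absorbance = cor(reads, NgPerUl, use = "complete.obs")
  )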
...
Code Block:

library(tidyverse)
library(ggplot2)
library(ggpubr)

#Read in sequencing count data
LRIIIReads <- read.table("/Volumes/Macintosh HD/Users/gregg/Downloads/LRIII.txt", header=FALSE, stringsAsFactors = FALSE, na.strings = "")

#Adjust names back to simple sample names
LRIIIReads <- separate(LRIIIReads, 2, sep="[.]", into = c(NA, "samplename", NA, NA, NA, NA, NA, NA, NA))

#Remove the total line and rename columns
LRIIIReads <- LRIIIReads[(1:144),]
colnames(LRIIIReads) <- c("readsx4", "samplename")

#Divide the count by 4 to get the proper read number
LRIIIReads$readsk <- LRIIIReads$readsx4 / 4

#Add the replicates together
LRIIIReads <- LRIIIReads %>%
  group_by(samplename) %>%
  summarise(reads = sum(readsk))

#Read in the qPCR data; subset columns to sample name, mean qty, and standard deviation
DGqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/5DG5_COL456.csv", header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
DGqPCR <- DGqPCR[ , c(2, 8, 9)]
DGqPCR$plate <- "5DG5"

LVDqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/5LVD5_Col123.csv", header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
LVDqPCR <- LVDqPCR[ , c(2, 8, 9)]
LVDqPCR$plate <- "5LVD5"

MCqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/MC_MB_Comparison.csv", header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
MCqPCR <- MCqPCR[ , c(2, 8, 9)]
MCqPCR$plate <- "MC"

#Adjust to match naming conventions between files
MCqPCR[,1] <- gsub("MC_A", "MCA_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_B", "MCB_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_C", "MCC_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_D", "MCD_", MCqPCR[,1])

#Combine data
qPCR <- rbind(DGqPCR, LVDqPCR, MCqPCR)

#Reduce to one incidence of each sample
qPCR <- qPCR %>% group_by(Sample.Name) %>% slice(1)
colnames(qPCR) <- c("samplename", "mean", "stddev", "plate")

#Combine qPCR and read data together
LRIIIReads <- left_join(LRIIIReads, qPCR, by = "samplename")

#Plot reads vs qPCR qty mean, with horizontal error bars of one standard deviation
ggplot(LRIIIReads, aes(mean, reads, color=plate, shape=plate))+
  geom_point(show.legend = TRUE)+
  geom_errorbar(aes(xmin=mean-stddev, xmax=mean+stddev), width=.2)+
  geom_smooth(method='lm', formula= y~x)+
  stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 150000, aes(label = ..rr.label..))+
  facet_wrap(~plate)
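stat_regline_equation prints each panel's fit on the plot; as a numeric cross-check (my addition, using the same LRIIIReads data frame), the per-plate slopes and R² can also be pulled out directly:

Code Block:

library(tidyverse)

#Numeric version of the per-panel fits above: slope and R-squared of
#reads ~ qPCR mean qty, computed within each plate.
LRIIIReads %>%
  group_by(plate) %>%
  group_modify(~ {
    fit <- lm(reads ~ mean, data = .x)
    tibble(slope = coef(fit)[["mean"]], r_squared = summary(fit)$r.squared)
  })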
This initial graph points toward a plate effect more than a MagBead treatment effect. But some of the standard deviations are large, and we have seen poor qPCR results from this machine and our prep before, so removing outliers even from such a small group does not seem unreasonable. There are two outliers in the 5DG5 data that appear to be reducing the fit relative to the 5LVD5 data. A compounding factor is that 5LVD5's reads and qPCR numbers are all largely the same, so any carryover from tip sharing could only really have impacted one low-read, low-qPCR sample.
Code Block:

#Remove the 2 outliers
adjLRIIIReads <- LRIIIReads[c(1:6, 8:13, 15:72),]

ggplot(adjLRIIIReads, aes(mean, reads, color=plate, shape=plate))+
  geom_point(show.legend = TRUE)+
  geom_errorbar(aes(xmin=mean-stddev, xmax=mean+stddev), width=.2)+
  geom_smooth(method='lm', formula= y~x)+
  stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 150000, aes(label = ..rr.label..))+
  facet_wrap(~plate)
Removing the 2 outliers from the 5DG5 (max tip use) data drastically improves the line fit, bringing it above 5LVD5's fit.
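Dropping rows by hard-coded position works here, but it breaks silently if the data frame is ever re-sorted. A more reproducible alternative (a sketch of my own, not part of the original analysis; the 2.5-standard-deviation cutoff is an arbitrary choice, and every sample is assumed to have a qPCR mean) flags outliers by their residuals from each plate's fit:

Code Block:

library(tidyverse)

#Flag points whose residual from their own plate's reads ~ mean fit is
#more than 2.5 standard deviations out, then drop them.
adjLRIIIReads <- LRIIIReads %>%
  group_by(plate) %>%
  group_modify(~ {
    fit <- lm(reads ~ mean, data = .x)
    mutate(.x, resid = reads - predict(fit, newdata = .x))
  }) %>%
  ungroup() %>%
  filter(abs(resid) <= 2.5 * sd(resid)) %>%
  select(-resid)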
...
I repeated the same analyses with the filtermergestats.csv data. Nothing changed.
I think we need to either run a larger experiment prior to sending out NovaSeq5 (option A), or pool NovaSeq5 as normal, qPCR the first 72 samples singly from 10? plates, and then use its sequencing data together with the qPCR data to decide the pooling standard moving forward (option B). I would advocate for option B.
We added absorbance into the mix as a cheaper and quicker tool for normalized pooling. We got absorbance readings for these same products and did the same comparison to reads.
Code Block:
#Read in the absorbance data (read_xlsx comes from the readxl package, which is not attached by library(tidyverse))
library(readxl)
AbsLVD <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/5LVD5_16S_ITS.xlsx", sheet = "Summary", skip = 2)
AbsLVD <- AbsLVD[, c(3,11)]
AbsDG <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/5DG5_16S_ITS.xlsx", sheet = "Summary", skip = 2)
AbsDG <- AbsDG[, c(3,11)]
AbsMC <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/LRII_LRIII_MC.xlsx", sheet = "Summary", skip = 2)
AbsMC <- AbsMC[, c(3,11)]
AbsLRIII <- rbind(AbsLVD, AbsDG, AbsMC)
colnames(AbsLRIII) <- c("samplename", "NgPerUl")
LRIIIReads <- left_join(LRIIIReads, AbsLRIII, by = "samplename")
#First let's check the relationship to reads
ggplot(LRIIIReads, aes(NgPerUl, reads, color=plate, shape=plate))+
geom_point(show.legend = TRUE)+
geom_smooth(method='lm', formula= y~x)+
stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
stat_regline_equation(label.y = 150000, aes(label = ..rr.label..))+
facet_wrap(~plate)
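As before, a numeric per-plate summary (my addition, same assumptions as the earlier sketch) backs up the panels:

Code Block:

library(tidyverse)

#Per-plate slope and R-squared of reads ~ absorbance (NgPerUl).
LRIIIReads %>%
  group_by(plate) %>%
  group_modify(~ {
    fit <- lm(reads ~ NgPerUl, data = .x)
    tibble(slope = coef(fit)[["NgPerUl"]], r_squared = summary(fit)$r.squared)
  })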
There is still a relationship here. We will proceed with 1 column of qPCR as a check that libraries are in a sequence-able state, and use absorbance to normalize samples and reduce the read disparity.
We will perform absorbance checks, compile the data, and then sort out the best program for normalization. We will probably normalize to around 10 ng/ul. 1 nM is the minimum library concentration for a NovaSeq run, and from the data here, 1.44 nM ~ 4 ng/ul, so we would like to operate with a larger margin of error.
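A sketch of what that normalization could look like in practice (my own illustration: the 10 ng/ul target comes from the paragraph above, the no-dilution floor for weak samples is an assumption, and volumes would be scaled to the pooling protocol):

Code Block:

library(tidyverse)

#Target concentration for normalized pooling (from the text above)
target <- 10   # ng/ul

#For each sample, how much to dilute to reach the target; samples already
#at or below the target are left neat (dilution factor of 1).
pooling <- LRIIIReads %>%
  mutate(
    dilution_factor = pmax(NgPerUl / target, 1),
    ul_water_per_ul_sample = dilution_factor - 1
  ) %>%
  select(samplename, plate, NgPerUl, dilution_factor, ul_water_per_ul_sample)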
Is there a relationship between absorbance and qPCR?
Code Block:
ggplot(LRIIIReads, aes(NgPerUl, mean, color=plate, shape=plate))+
geom_point(show.legend = TRUE)+
geom_smooth(method='lm', formula= y~x)+
stat_regline_equation(label.y = 125, aes(label = ..eq.label..)) +
stat_regline_equation(label.y = 100, aes(label = ..rr.label..))+
facet_wrap(~plate)
Excluding the Mock Community samples, there is a strong relationship between absorbance and qPCR. The odd MC results may come from the small size of the data set, the MC samples' larger-than-average contribution to it, and their very low relative complexity.
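To put a number on that exclusion (my addition, using the same LRIIIReads data frame):

Code Block:

library(tidyverse)

#Correlation between absorbance and qPCR quantity, with and without the
#Mock Community samples.
LRIIIReads %>%
  summarise(r_all = cor(NgPerUl, mean, use = "complete.obs"))

LRIIIReads %>%
  filter(plate != "MC") %>%
  summarise(r_no_MC = cor(NgPerUl, mean, use = "complete.obs"))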