LRIII Analysis

The new Zymo Mock Community samples (cleaned via the max tip protocol) did not show up in the demultiplexed reads. They were either not added, or somehow the demux key is wrong for them.

There is a relationship between reads and qPCR quantification. We will take absorbance readings of these same samples: if absorbance agrees with both qPCR and reads, we will use absorbance from all samples to normalize prior to pooling; if not, we will consider further qPCR. Absorbance did show a relationship, so the plan is one column of qPCR as a check on sequence-able state, with absorbance used to reduce read disparity. We will run the absorbance checks, compile the data, and then sort out the best program for normalization.

@Alex Buerkle @Linda van Diepen

LowReadIII was meant to explore the question: does MagBead treatment impact the connection between reads and qPCR? We chose 3 columns from each of 2 sample plates from NovaSeq5 (which is awaiting this decision for its pooling parameters) and 2 sets of similarly PCRed Mock Community. The test variable was a tip-conservative method of AxyPrep MagBead cleanup versus one avoiding any reuse of tips while DNA was bound to beads and pelleted. 5DG5 was cleaned with maximum tip usage and 5LVD5 was cleaned conservatively. The MC samples that came through with reads were cleaned conservatively; the maximum-tip-usage MC samples were likely not added to the pool, because they demultiplexed with 0 reads each.

First I read the read count data into R, then the qPCR results; then I combined them and graphed the mean quantity from the qPCR against the reads.

 

library(tidyverse)
library(ggplot2)
library(ggpubr)

#Read in sequencing count data
LRIIIReads <- read.table("/Volumes/Macintosh HD/Users/gregg/Downloads/LRIII.txt",
                         header=FALSE, stringsAsFactors = FALSE, na.strings = "")
#adjust names back to simple sample names
LRIIIReads <- separate(LRIIIReads, 2, sep="[.]",
                       into = c(NA, "samplename", NA, NA, NA, NA, NA, NA, NA))
#remove the total line and rename columns
LRIIIReads <- LRIIIReads[(1:144),]
colnames(LRIIIReads) <- c("readsx4", "samplename")
#Divide read count by 4 to get the proper number
LRIIIReads$readsk <- LRIIIReads$readsx4 / 4
#Add the replicates together
LRIIIReads <- LRIIIReads %>%
  group_by(samplename) %>%
  summarise(reads = sum(readsk))
#read in the qPCR data, subset columns to sample name, mean qty, and standard deviation
DGqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/5DG5_COL456.csv",
                   header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
DGqPCR <- DGqPCR[ , c(2, 8, 9)]
DGqPCR$plate <- "5DG5"
LVDqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/5LVD5_Col123.csv",
                    header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
LVDqPCR <- LVDqPCR[ , c(2, 8, 9)]
LVDqPCR$plate <- "5LVD5"
MCqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/MC_MB_Comparison.csv",
                   header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
MCqPCR <- MCqPCR[ , c(2, 8, 9)]
MCqPCR$plate <- "MC"
#adjust to match naming conventions between files
MCqPCR[,1] <- gsub("MC_A", "MCA_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_B", "MCB_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_C", "MCC_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_D", "MCD_", MCqPCR[,1])
#combine data
qPCR <- rbind(DGqPCR, LVDqPCR, MCqPCR)
#reduce to one incidence of each sample
qPCR <- qPCR %>%
  group_by(Sample.Name) %>%
  slice(1)
colnames(qPCR) <- c("samplename", "mean", "stddev", "plate")
#combine qPCR and read data together
LRIIIReads <- left_join(LRIIIReads, qPCR, by = "samplename")
#plot reads vs qPCR qty mean
ggplot(LRIIIReads, aes(mean, reads, color=plate, shape=plate)) +
  geom_point(show.legend = TRUE) +
  geom_errorbar(aes(xmin=mean-stddev, xmax=mean+stddev), width=.2) +
  geom_smooth(method='lm', formula= y~x) +
  stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 150000, aes(label = ..rr.label..)) +
  facet_wrap(~plate)

 

This initial graph points toward a plate effect more than a MagBead treatment effect. But some of the standard deviations are large, and we have seen poor qPCR results from this machine, or from our prep, before; so removing outliers, even from such a small group, does not seem unreasonable. Two outliers in the 5DG5 data appear to be reducing its fit relative to the 5LVD5 data. A compounding factor is that 5LVD5’s reads and qPCR numbers are all largely the same, so any carryover from tip sharing could only really have impacted one low-read, low-qPCR sample.

#Remove the 2 outliers
adjLRIIIReads <- LRIIIReads[c(1:6, 8:13, 15:72),]
ggplot(adjLRIIIReads, aes(mean, reads, color=plate, shape=plate)) +
  geom_point(show.legend = TRUE) +
  geom_errorbar(aes(xmin=mean-stddev, xmax=mean+stddev), width=.2) +
  geom_smooth(method='lm', formula= y~x) +
  stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 150000, aes(label = ..rr.label..)) +
  facet_wrap(~plate)

Removing the 2 outliers from the 5DG5 (max tip use) data drastically improves the line fit, to better than 5LVD5’s.

 

I repeated the same analyses with the filtermergestats.csv data. Nothing changed.

I think we need to either do a larger experiment prior to sending out NovaSeq5 (option A), or pool NovaSeq5 as normal, qPCR the first 72 samples singly from 10? plates, and then use that run's sequencing data together with the qPCR data to decide the pooling standard moving forward (option B). I would advocate for option B.

We added absorbance into the mix as a cheaper and quicker tool for normalized pooling. We got absorbance readings for these same products and did the same comparison to reads.

 

library(readxl)  #read_xlsx is not attached by library(tidyverse)

#Read in the absorbance data
AbsLVD <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/5LVD5_16S_ITS.xlsx",
                    sheet = "Summary", skip = 2)
AbsLVD <- AbsLVD[, c(3, 11)]
AbsDG <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/5DG5_16S_ITS.xlsx",
                   sheet = "Summary", skip = 2)
AbsDG <- AbsDG[, c(3, 11)]
AbsMC <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/LRII_LRIII_MC.xlsx",
                   sheet = "Summary", skip = 2)
AbsMC <- AbsMC[, c(3, 11)]
AbsLRIII <- rbind(AbsLVD, AbsDG, AbsMC)
colnames(AbsLRIII) <- c("samplename", "NgPerUl")
LRIIIReads <- left_join(LRIIIReads, AbsLRIII, by = "samplename")
#First let's check the relationship to reads
ggplot(LRIIIReads, aes(NgPerUl, reads, color=plate, shape=plate)) +
  geom_point(show.legend = TRUE) +
  geom_smooth(method='lm', formula= y~x) +
  stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 150000, aes(label = ..rr.label..)) +
  facet_wrap(~plate)

There is still a relationship here. We will proceed with 1 column of qPCR as a check on sequence-able state, and use absorbance to adjust for less read disparity.

We will perform absorbance checks, compile the data, and then sort out the best program for normalization. We will probably normalize to around 10 ng/ul. 1 nM is the minimum library concentration for a NovaSeq run, and from the data here, 1.44 nM ~ 4 ng/ul; we would like to operate with a larger margin of error.
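For concreteness, the arithmetic behind these targets can be sketched as follows. This is only an illustration, not the finalized pooling program: the 10 ng/ul target comes from the plan above, but the 660 g/mol-per-bp factor is the generic dsDNA average, and the 600 bp fragment length and 20 ul final volume are placeholder assumptions (the empirical 1.44 nM ~ 4 ng/ul figure above is the better guide for these actual libraries).

```python
# Sketch of the normalization arithmetic. Fragment length and final volume
# are placeholder assumptions, not measured values from this run.

def ng_per_ul_to_nM(ng_per_ul, fragment_bp, g_per_mol_per_bp=660):
    """Convert a dsDNA concentration in ng/ul to nM (molar mass ~ 660 * bp)."""
    # ng/ul -> g/L is x1e-3; dividing by g/mol gives mol/L; x1e9 gives nM.
    return ng_per_ul * 1e6 / (g_per_mol_per_bp * fragment_bp)

def dilution_volumes(conc_ng_ul, target_ng_ul=10.0, final_ul=20.0):
    """Return (sample_ul, diluent_ul) to reach target_ng_ul in final_ul,
    via C1*V1 = C2*V2. Samples at or below the target are used neat."""
    if conc_ng_ul <= target_ng_ul:
        return final_ul, 0.0  # too dilute to concentrate; use as-is
    sample_ul = target_ng_ul * final_ul / conc_ng_ul
    return sample_ul, final_ul - sample_ul

print(round(ng_per_ul_to_nM(10, 600), 2))  # ~25 nM for an assumed 600 bp fragment
print(dilution_volumes(40.0))              # (5.0, 15.0)
```

Under these assumptions a 10 ng/ul pool sits well above the 1 nM floor, which is the margin-of-error point made above.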

Is there a relationship between absorbance and qPCR?

 

Excluding the Mock Community samples, there is a strong relationship between absorbance and qPCR. The odd MC results may be due to the small data set, the MC samples' larger-than-average contribution to it, and their very low relative complexity.

Files: