Info:
The new Zymo Mock Community samples (cleaned via the max tip use protocol) did not show up in the demultiplexed reads. Either they were not added, or the demux key is wrong for them. I believe the results are inconclusive, and our best approach is to pool NS5 as normal and use it as the data set for deciding on pooling standards moving forward. There is at least a plate effect in the relationship between reads and qPCR quantity (molar concentration). We have 7 partial plates totaling 409 samples that used the tip conservative method and 18 full plates (1728 samples) that were cleaned without tip reuse on the 96 channel pipette. I suggest using 288 samples from each category and repeating this experiment, running single qPCRs for <73 samples from each plate and using the sequencing data from NS5. Any low read plates or samples can be included in NS6. There is a relationship between reads and qPCR quantification. We will take absorbance readings of these same samples; if there is a relationship between qPCR, absorbance, and reads, we will use absorbance from all samples to normalize prior to pooling. A sketch of that three-way check appears below.
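As a concrete version of that decision rule, here is a minimal sketch, assuming the LRIIIReads data frame assembled in the code blocks below (columns reads, mean, NgPerUl, and plate, available once the absorbance join is done); the model form is my own illustration, not an established step of our pipeline.

Code Block:

library(tidyverse)

#Regress reads on both quantification methods at once. If absorbance
#(NgPerUl) explains reads about as well as qPCR quantity (mean) does,
#absorbance can stand in for qPCR when normalizing prior to pooling.
fit <- lm(reads ~ mean + NgPerUl, data = LRIIIReads)
summary(fit)

#Per-plate correlations of reads with each method, as a quick look at
#the plate effect.
LRIIIReads %>%
  group_by(plate) %>%
  summarise(
    r_qPCR       = cor(reads, mean, use = "complete.obs"),
    r_absorbance = cor(reads, NgPerUl, use = "complete.obs")
  )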
...
Code Block:

library(tidyverse)
library(ggplot2)
library(ggpubr)

#Read in sequencing count data
LRIIIReads <- read.table("/Volumes/Macintosh HD/Users/gregg/Downloads/LRIII.txt", header=FALSE, stringsAsFactors = FALSE, na.strings = "")

#Adjust names back to simple sample names
LRIIIReads <- separate(LRIIIReads, 2, sep="[.]", into = c(NA, "samplename", NA, NA, NA, NA, NA, NA, NA))

#Remove the total line and rename columns
LRIIIReads <- LRIIIReads[(1:144),]
colnames(LRIIIReads) <- c("readsx4", "samplename")

#Divide the count by 4 to get the proper read number
LRIIIReads$readsk <- LRIIIReads$readsx4 / 4

#Add the replicates together
LRIIIReads <- LRIIIReads %>%
  group_by(samplename) %>%
  summarise(reads = sum(readsk))

#Read in the qPCR data; subset columns to sample name, mean qty, and standard deviation
DGqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/5DG5_COL456.csv", header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
DGqPCR <- DGqPCR[ , c(2, 8, 9)]
DGqPCR$plate <- "5DG5"

LVDqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/5LVD5_Col123.csv", header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
LVDqPCR <- LVDqPCR[ , c(2, 8, 9)]
LVDqPCR$plate <- "5LVD5"

MCqPCR <- read.csv("/Volumes/Macintosh HD/Users/gregg/Downloads/MC_MB_Comparison.csv", header=TRUE, stringsAsFactors = FALSE, na.strings = "", skip = 14)
MCqPCR <- MCqPCR[ , c(2, 8, 9)]
MCqPCR$plate <- "MC"

#Adjust to match naming conventions between files
MCqPCR[,1] <- gsub("MC_A", "MCA_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_B", "MCB_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_C", "MCC_", MCqPCR[,1])
MCqPCR[,1] <- gsub("MC_D", "MCD_", MCqPCR[,1])

#Combine data
qPCR <- rbind(DGqPCR, LVDqPCR, MCqPCR)

#Reduce to one incidence of each sample
qPCR <- qPCR %>% group_by(Sample.Name) %>% slice(1)
colnames(qPCR) <- c("samplename", "mean", "stddev", "plate")

#Combine qPCR and read data together
LRIIIReads <- left_join(LRIIIReads, qPCR, by = "samplename")

#Plot reads vs qPCR qty mean, with horizontal error bars of one standard deviation
ggplot(LRIIIReads, aes(mean, reads, color=plate, shape=plate))+
  geom_point(show.legend = TRUE)+
  geom_errorbar(aes(xmin=mean-stddev, xmax=mean+stddev), width=.2)+
  geom_smooth(method='lm', formula= y~x)+
  stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 150000, aes(label = ..rr.label..))+
  facet_wrap(~plate)
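stat_regline_equation prints each panel's fit on the plot; as a numeric cross-check (my addition, using the same LRIIIReads data frame), the per-plate slopes and R² can also be pulled out directly:

Code Block:

library(tidyverse)

#Numeric version of the per-panel fits above: slope and R-squared of
#reads ~ qPCR mean qty, computed within each plate.
LRIIIReads %>%
  group_by(plate) %>%
  group_modify(~ {
    fit <- lm(reads ~ mean, data = .x)
    tibble(slope = coef(fit)[["mean"]], r_squared = summary(fit)$r.squared)
  })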
This initial graph points toward a plate effect more than a MagBead treatment effect. But some of the standard deviations are large, and we have seen poor qPCR results from this machine and our prep before, so removing outliers even from such a small group does not seem unreasonable. There are two outliers in the 5DG5 data that appear to be reducing the fit relative to the 5LVD5 data. A compounding factor is that 5LVD5's reads and qPCR numbers are all largely the same, so any carryover from tip sharing could only really have impacted one low-read, low-qPCR sample.
Code Block:

#Remove the 2 outliers
adjLRIIIReads <- LRIIIReads[c(1:6, 8:13, 15:72),]

ggplot(adjLRIIIReads, aes(mean, reads, color=plate, shape=plate))+
  geom_point(show.legend = TRUE)+
  geom_errorbar(aes(xmin=mean-stddev, xmax=mean+stddev), width=.2)+
  geom_smooth(method='lm', formula= y~x)+
  stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
  stat_regline_equation(label.y = 150000, aes(label = ..rr.label..))+
  facet_wrap(~plate)
Removing the 2 outliers from the 5DG5 (max tip use) data drastically improves the line fit, bringing it above 5LVD5's fit.
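Dropping rows by hard-coded position works here, but it breaks silently if the data frame is ever re-sorted. A more reproducible alternative (a sketch of my own, not part of the original analysis; the 2.5-standard-deviation cutoff is an arbitrary choice, and every sample is assumed to have a qPCR mean) flags outliers by their residuals from each plate's fit:

Code Block:

library(tidyverse)

#Flag points whose residual from their own plate's reads ~ mean fit is
#more than 2.5 standard deviations out, then drop them.
adjLRIIIReads <- LRIIIReads %>%
  group_by(plate) %>%
  group_modify(~ {
    fit <- lm(reads ~ mean, data = .x)
    mutate(.x, resid = reads - predict(fit, newdata = .x))
  }) %>%
  ungroup() %>%
  filter(abs(resid) <= 2.5 * sd(resid)) %>%
  select(-resid)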
...
I repeated the same analyses with the filtermergestats.csv data. Nothing changed.
I think we need to either run a larger experiment prior to sending out NovaSeq5 (option A), or pool NovaSeq5 as normal, qPCR the first 72 samples singly from 10? plates, and then use its sequencing data together with the qPCR data to decide the pooling standard moving forward (option B). I would advocate for option B.
We added absorbance into the mix as a cheaper and quicker tool for normalized pooling. We got absorbance readings for these same products and did the same comparison to reads.
Code Block:
#Read in the absorbance data (read_xlsx comes from the readxl package, which is not attached by library(tidyverse))
library(readxl)
AbsLVD <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/5LVD5_16S_ITS.xlsx", sheet = "Summary", skip = 2)
AbsLVD <- AbsLVD[, c(3,11)]
AbsDG <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/5DG5_16S_ITS.xlsx", sheet = "Summary", skip = 2)
AbsDG <- AbsDG[, c(3,11)]
AbsMC <- read_xlsx("/Volumes/Macintosh HD/Users/gregg/Downloads/LRII_LRIII_MC.xlsx", sheet = "Summary", skip = 2)
AbsMC <- AbsMC[, c(3,11)]
AbsLRIII <- rbind(AbsLVD, AbsDG, AbsMC)
colnames(AbsLRIII) <- c("samplename", "NgPerUl")
LRIIIReads <- left_join(LRIIIReads, AbsLRIII, by = "samplename")
#First let's check the relationship to reads
ggplot(LRIIIReads, aes(NgPerUl, reads, color=plate, shape=plate))+
geom_point(show.legend = TRUE)+
geom_smooth(method='lm', formula= y~x)+
stat_regline_equation(label.y = 200000, aes(label = ..eq.label..)) +
stat_regline_equation(label.y = 150000, aes(label = ..rr.label..))+
facet_wrap(~plate)
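As before, a numeric per-plate summary (my addition, same assumptions as the earlier sketch) backs up the panels:

Code Block:

library(tidyverse)

#Per-plate slope and R-squared of reads ~ absorbance (NgPerUl).
LRIIIReads %>%
  group_by(plate) %>%
  group_modify(~ {
    fit <- lm(reads ~ NgPerUl, data = .x)
    tibble(slope = coef(fit)[["NgPerUl"]], r_squared = summary(fit)$r.squared)
  })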
There is still a relationship here. We will proceed with 1 column of qPCR as a check that libraries are in a sequence-able state, and use absorbance to normalize samples and reduce the read disparity.
We will perform absorbance checks, compile the data, and then sort out the best program for normalization. We will probably normalize to around 10 ng/ul. 1 nM is the minimum library concentration for a NovaSeq run, and from the data here, 1.44 nM ~ 4 ng/ul, so we would like to operate with a larger margin of error.
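A sketch of what that normalization could look like in practice (my own illustration: the 10 ng/ul target comes from the paragraph above, the no-dilution floor for weak samples is an assumption, and volumes would be scaled to the pooling protocol):

Code Block:

library(tidyverse)

#Target concentration for normalized pooling (from the text above)
target <- 10   # ng/ul

#For each sample, how much to dilute to reach the target; samples already
#at or below the target are left neat (dilution factor of 1).
pooling <- LRIIIReads %>%
  mutate(
    dilution_factor = pmax(NgPerUl / target, 1),
    ul_water_per_ul_sample = dilution_factor - 1
  ) %>%
  select(samplename, plate, NgPerUl, dilution_factor, ul_water_per_ul_sample)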
Is there a relationship between absorbance and qPCR?
Code Block:
ggplot(LRIIIReads, aes(NgPerUl, mean, color=plate, shape=plate))+
geom_point(show.legend = TRUE)+
geom_smooth(method='lm', formula= y~x)+
stat_regline_equation(label.y = 125, aes(label = ..eq.label..)) +
stat_regline_equation(label.y = 100, aes(label = ..rr.label..))+
facet_wrap(~plate)
Excluding the Mock Community samples, there is a strong relationship between absorbance and qPCR. The odd MC results may come from the small size of the data set, the MC samples' larger-than-average contribution to it, and their very low relative complexity.
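To put a number on that exclusion (my addition, using the same LRIIIReads data frame):

Code Block:

library(tidyverse)

#Correlation between absorbance and qPCR quantity, with and without the
#Mock Community samples.
LRIIIReads %>%
  summarise(r_all = cor(NgPerUl, mean, use = "complete.obs"))

LRIIIReads %>%
  filter(plate != "MC") %>%
  summarise(r_no_MC = cor(NgPerUl, mean, use = "complete.obs"))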