Boilerplate lab and bioinformatics methods

@Gordon Custer @Reilly Dibner @Macy Ricketts @Seifeddine Ben Tekaya @Ella DeWolf @John Calder @Erin Bentley @Paul Ayayee @Alessandra Ceretto @Abby Hoffman @Alex Buerkle @Gregg Randolph @Shannon Harris

Please be sure to paraphrase this as I plan on using what is written here in one of my own papers and we don’t want to plagiarize one another accidentally. Also, we should have a manuscript soon that can be cited to support these methods. I reference that manuscript a few times below as Harrison et al. XX.

Current as of Oct 5, 2020

Sampling and sample processing

This is likely gonna differ by project. Just as a reminder to us all, we should include which kits we used to do extractions, that we used an Integra AssistPlus robot, that we ground the hell out of stuff with a tissue lyser, and that we included a bunch of extraction blanks. Of course, we will all also have a bunch of other sample prep stuff to mention here too.

Library preparation

Prior to library preparation, a synthetically designed internal standard (ISD) was added to extracted DNA. This ISD is described in Harrison et al. 2020 and allows for conversion of the relative abundance data obtained from the sequencer into estimates of actual abundances. To account for cross-contamination, ‘coligo’ sequences were also added to each well (Harrison et al. XX). Coligos are synthetically designed DNAs. By adding a unique coligo to each well, it is possible to track instances of cross-contamination. We included negative controls within our library to account for contamination of PCR reagents. We also performed library preparation on a ZymoBIOMICS mock community as a positive control.
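
As a rough illustration of how an ISD supports that conversion, the sketch below divides each taxon's read count by the ISD read count from the same sample, so that abundances are expressed relative to a constant spike-in. This is only a minimal sketch of the general idea, not the procedure from Harrison et al. 2020; the function name, the "ISD" key, and the example counts are all hypothetical.

```python
def isd_normalize(counts, isd_key="ISD"):
    """Express each taxon's reads as a ratio to the internal standard's reads.

    counts: dict mapping taxon name -> read count for ONE sample.
    Returns a dict without the ISD entry. Raises if the ISD was not detected,
    because the ratio would be undefined.
    """
    isd_reads = counts.get(isd_key, 0)
    if isd_reads <= 0:
        raise ValueError("no ISD reads in this sample; cannot normalize")
    return {taxon: n / isd_reads for taxon, n in counts.items() if taxon != isd_key}


# Hypothetical samples: the two OTUs have identical read counts in both,
# but the ISD makes up a much smaller share of the reads in sample_b.
sample_a = {"OTU_1": 500, "OTU_2": 500, "ISD": 1000}
sample_b = {"OTU_1": 500, "OTU_2": 500, "ISD": 100}
```

With these made-up counts, the two OTUs look the same in both samples on a proportional basis, but the ratio to the ISD is ten times higher in sample_b, which is the kind of difference that relative abundances alone would hide.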

After coligos and the ISD were added, all DNAs were normalized to a standardized concentration of 10 ng/µl (samples, such as blanks, that had less DNA than this were included in the library as is, without concentration). The same library preparation approach was used for both focal loci. The 515–806 (Walters et al. 2016) primer pair was used to amplify the V4 region of the 16S locus, and the ITS1f-ITS2 (Gardes and Bruns 1993, White et al. 1990) primer pair was used to amplify the ITS1 locus.

A two-step PCR approach was used, in which molecular identifiers (MIDs) were added to both ends of template molecules during an initial round of PCR, along with a portion of the Illumina flow cell adaptors. In a subsequent round of PCR, the remaining portion of the flow cell adaptor was added (see Harrison et al. XX). All MIDs were a Levenshtein distance of two or more apart and varied in length from 8–10 bases (Fadrosh et al. 2014, Kozich et al. 2013, Parchman et al. 2012). Variable length MIDs increase heterogeneity in the early portions of the template, which can prevent cluster loss during sequencing (Fadrosh et al. 2014).

KAPA HiFi HotStart polymerase, KAPA HiFi HotStart buffer and reagents, and HPLC grade water were used during PCR. PCR conditions for the first round were: 95 °C for 3 min; followed by 15 cycles of 98 °C for 30 s, 62 °C for 30 s, and 72 °C for 30 s; with a final 72 °C elongation step for 5 min and a 4 °C hold. PCR products were cleaned using AxyPrep MagBead magnetic beads (Axygen; Union City, CA, USA; see Harrison et al. XX for details). PCR conditions for the second round were: 95 °C for 3 min; followed by 19 cycles of 98 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s; with a final 72 °C elongation step for 5 min and a 4 °C hold. Products from the second round of PCR were also cleaned using AxyPrep MagBead magnetic beads. Library success was confirmed using a Bioanalyzer fragment analyzer (Agilent; Santa Clara, CA, USA).
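
For anyone designing or checking a new MID set, the Levenshtein distance requirement mentioned above can be verified with a few lines of code. The sketch below is a plain dynamic-programming implementation plus a helper that lists any pairs that are too similar. The function names and the example MID sequences are hypothetical and are not our actual MIDs.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def too_close(mids, min_dist=2):
    """Return every pair of MIDs that is fewer than min_dist edits apart."""
    return [(x, y) for x, y in combinations(mids, 2) if levenshtein(x, y) < min_dist]
```

An empty list from too_close means that every pair in the set is at least min_dist edits apart, so a single sequencing error cannot turn one MID into another.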

Libraries were sequenced by Psomagen (Rockville, Maryland, USA) on an Illumina NovaSeq 6000 using 2x250 paired-end sequencing.

Bioinformatics

Sequence data were demultiplexed using a custom Perl script (created by C. Alex Buerkle). Unique reads were identified (‘dereplicated’) using vsearch v.2.9.0 (Edgar 2010, Rognes et al. 2016). Dereplicated reads were clustered using the ‘cluster_unoise’ algorithm (Edgar 2016) and a 99% similarity threshold. We stipulated that a sequence must occur 12 or more times to be considered a potential OTU. Because we obtained a very large number of reads, this threshold helped us avoid analyzing variants caused by technical error. Chimeric sequences were removed using the ‘uchime3_denovo’ algorithm (Edgar et al. 2011), and the resulting OTUs were used to make an OTU table using the ‘usearch_global’ algorithm.
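
To make the dereplication and minimum-abundance steps concrete, here is a small pure-Python illustration of the logic. It is not how vsearch implements these steps (vsearch is far more efficient and also tracks abundance annotations in sequence headers); the function name and the example reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads, min_size=12):
    """Collapse identical reads and keep those seen at least min_size times.

    Returns (sequence, count) pairs sorted from most to least abundant,
    with ties broken alphabetically so the order is reproducible.
    """
    counts = Counter(reads)
    kept = [(seq, n) for seq, n in counts.items() if n >= min_size]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))
```

Sequences that fall below the threshold are simply excluded from consideration as potential OTUs, which is the behavior described above for the 12-read cutoff.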

OTUs that corresponded with the ISD were identified using ‘usearch_global’ with the ISD sequence as the database. Similarly, coligo sequences were identified using the ‘search_exact’ algorithm of vsearch with coligo sequences as the database. Exact matching was required for coligos because they are so short that the heuristics of the ‘usearch_global’ algorithm caused occasional mismatches during testing. Computing was performed using the Teton Computing Environment at the Advanced Research Computing Center, University of Wyoming, Laramie (https://doi.org/10.15786/M2FY47).
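
The reasoning behind exact matching, and how coligo counts can flag cross-contamination, can be sketched as follows. This is only an illustration that assumes each read has already been trimmed to the coligo region; it is not the vsearch implementation, and the function names, coligo names, and sequences are hypothetical.

```python
from collections import Counter


def tally_coligos(reads, coligos):
    """Count reads that exactly match a known coligo sequence.

    reads: iterable of read sequences from one well.
    coligos: dict mapping coligo sequence -> coligo name.
    Reads that differ from every coligo by even one base are ignored.
    """
    return Counter(coligos[r] for r in reads if r in coligos)


def unexpected_coligos(tally, expected_name):
    """Coligo names seen in a well other than the one that was added to it."""
    return {name: n for name, n in tally.items() if name != expected_name}
```

Because a dictionary lookup only succeeds on an identical sequence, a read cannot be assigned to a near-identical coligo by mistake. Any coligo returned by unexpected_coligos for a well is a candidate instance of cross-contamination to follow up on.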

Citations:

Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460-2461.

Edgar, R. C., Haas, B. J., Clemente, J. C., Quince, C., & Knight, R. (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27(16), 2194-2200.

Edgar, R. C. (2016). UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. bioRxiv, 081257.

Fadrosh, D. W., et al. (2014). An improved dual-indexing approach for multiplexed 16S rRNA gene sequencing on the Illumina MiSeq platform. Microbiome, 2, 6.

Gardes, M., & Bruns, T. D. (1993). ITS primers with enhanced specificity for basidiomycetes – application to the identification of mycorrhizae and rusts. Molecular Ecology, 2, 113-118.

Harrison et al. XX - forthcoming methods paper

Harrison, J. G., Calder, W. J., Shuman, B., & Buerkle, C. A. (2020). The quest for absolute abundance: The use of internal standards for DNA-based community ecology. Molecular Ecology Resources.

Kozich, J. J., et al. (2013). Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Applied and Environmental Microbiology, 79(17), 5112-5120.

Parchman, T. L., Gompert, Z., Mudge, J., Schilkey, F. D., Benkman, C. W., & Buerkle, C. A. (2012). Genome-wide association genetics of an adaptive trait in lodgepole pine. Molecular Ecology, 21(12), 2991-3005.

Rognes, T., Flouri, T., Nichols, B., Quince, C., & Mahé, F. (2016). VSEARCH: a versatile open source tool for metagenomics. PeerJ, 4, e2584.

Walters, W., Hyde, E. R., Berg-Lyons, D., Ackermann, G., Humphrey, G., Parada, A., ... & Apprill, A. (2016). Improved bacterial 16S rRNA gene (V4 and V4-5) and fungal internal transcribed spacer marker gene primers for microbial community surveys. mSystems, 1(1), e00009-15.

White, T. J., Bruns, T. D., Lee, S. B., & Taylor, J. W. (1990). Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In M. A. Innis, D. H. Gelfand, J. J. Sninsky, & T. J. White (Eds.), PCR protocols: a guide to methods and applications (pp. 315-322). United States: Academic Press.