Sampling and sample processing

This is likely gonna differ by project. Just as a reminder to us all, we should include which kits we used to do extractions, that we used an Integra AssistPlus robot, that we ground the hell out of stuff with a tissue lyser, and that we included a bunch of extraction blanks. Of course, we will all also have a bunch of other sample prep stuff to mention here too.

Library preparation

Prior to library preparation, a synthetic internal standard (ISD) was added to the extracted DNA. The ISD is described in Harrison et al. 2020 and allows the relative abundance data obtained from the sequencer to be converted into estimates of actual abundance. To account for cross-contamination, 'coligo' sequences were also added to each well (Harrison et al. xx). Coligos are synthetically designed DNA sequences. Because each well receives a unique coligo, a coligo detected in any other well indicates cross-contamination. We included negative controls within our library to account for contamination of PCR reagents. As a positive control, we also performed library preparation on a ZymoBIOMICS mock community.
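As a rough illustration of how an internal standard can put samples on a common scale, the sketch below divides each taxon's read count by the ISD read count from the same sample. It assumes the same amount of ISD was added to every sample; the function name, the "ISD" key, and the example counts are hypothetical, and this simple ratio is not necessarily the model used in Harrison et al. 2020.

```python
from typing import Dict


def isd_normalize(counts: Dict[str, int], isd_key: str = "ISD") -> Dict[str, float]:
    """Express each taxon's reads as a ratio to the ISD reads in the same sample."""
    isd_reads = counts.get(isd_key, 0)
    if isd_reads <= 0:
        # Without ISD reads there is no common scale to divide by.
        raise ValueError("no ISD reads in this sample; cannot normalize")
    return {taxon: n / isd_reads for taxon, n in counts.items() if taxon != isd_key}


# Hypothetical samples with identical taxon read counts but different ISD counts.
sample_a = {"OTU1": 500, "OTU2": 500, "ISD": 100}
sample_b = {"OTU1": 500, "OTU2": 500, "ISD": 1000}

norm_a = isd_normalize(sample_a)
norm_b = isd_normalize(sample_b)
print(norm_a["OTU1"], norm_b["OTU1"])
```

The two samples have the same relative abundances, but the ISD ratio suggests that sample_a held roughly ten times more template per unit of ISD than sample_b.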

After coligos and the ISD were added, all DNA samples were normalized to a standard concentration of 10 ng/µl (samples, such as blanks, that had less DNA than this were included in the library as is, without concentration). The same library preparation approach was used for both focal loci. The 515–806 (Walters et al. 2016) primer pair was used to amplify the V4 region of the 16S locus, and the ITS1f-ITS2 (Gardes and Bruns 1993, White et al. 1990) primer pair was used to amplify the ITS1 locus.

A two-step PCR approach was used, where molecular identifiers (MIDs) were added to both ends of template molecules during an initial round of PCR, along with a portion of the Illumina flow cell adaptors. In a subsequent round of PCR, the remaining portion of the flow cell adaptor was added (see Harrison et al. XX). All MIDs differed from one another by a Levenshtein distance of two or more and varied in length from 8–10 bases (Fadrosh et al. 2014, Kozich et al. 2013, Parchman et al. 2012). Variable-length MIDs increase heterogeneity in the early portions of the template, which can prevent cluster loss during sequencing (Fadrosh et al. 2014).

Kapa HiFi Hot Start polymerase, Kapa HiFi Hot Start buffer and reagents, and HPLC-grade water were used during PCR. PCR conditions for the first round were: 95 °C for 3 min; followed by 15 cycles of 98 °C for 30 s, 62 °C for 30 s, and 72 °C for 30 s; with a final 72 °C elongation step for 5 min and a 4 °C hold. PCR products were cleaned using AxyPrep MagBead magnetic beads (Axygen; Union City, CA, USA; see Harrison et al. XX for details). PCR conditions for the second round were: 95 °C for 3 min; followed by 19 cycles of 98 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s; with a final 72 °C elongation step for 5 min and a 4 °C hold. Products from the second round of PCR were also cleaned using AxyPrep MagBead magnetic beads. Library success was confirmed using a Bioanalyzer fragment analyzer (Agilent; Santa Clara, CA, USA).
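The MID design constraint described above (every pair of MIDs at least a Levenshtein distance of two apart) can be checked with a short script. The following is a minimal sketch, not the procedure used to design the MIDs; the function names and the example MID sequences are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def too_close_pairs(mids: Iterable[str], min_distance: int = 2) -> List[Tuple[str, str, int]]:
    """Return every pair of MIDs closer together than min_distance."""
    flagged = []
    for x, y in combinations(list(mids), 2):
        distance = levenshtein(x, y)
        if distance < min_distance:
            flagged.append((x, y, distance))
    return flagged


# Hypothetical MIDs of 8-10 bases; the first two differ by a single substitution.
example_mids = ["ACGTACGT", "ACGTACGA", "TTGCAATGC", "GGATCCTAGA"]
flagged = too_close_pairs(example_mids)
print(flagged)
```

With a minimum distance of two, a single sequencing error in a MID cannot turn it into another valid MID, which is why any flagged pair would need to be redesigned.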

Libraries were sequenced by Psomagen (Rockville, Maryland, USA) on an Illumina NovaSeq 6000 using 2 × 250 bp paired-end sequencing.


Sequence data were demultiplexed using a custom Perl script (created by C. Alex Buerkle). Unique reads were identified ('dereplicated') using vsearch v.2.9.0 (Edgar 2010, Rognes et al. 2016). Dereplicated reads were clustered using the 'cluster_unoise' algorithm (Edgar 2016) and a 99% similarity threshold. We stipulated that a sequence must occur 12 or more times to be considered a potential OTU. We made this choice because of the very large number of reads we obtained, as a way to avoid analyzing variants caused by technical error. Chimeric sequences were removed using the 'uchime3_denovo' algorithm (Edgar et al. 2011), and the resulting OTUs were used to make an OTU table using the 'usearch_global' algorithm.
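The logic of dereplication followed by the minimum-abundance rule (12 or more occurrences) can be sketched in a few lines. This illustrates the idea only and is not how vsearch implements it; the function name and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def dereplicate(reads: Iterable[str], min_size: int = 12) -> Dict[str, int]:
    """Collapse identical reads and keep those seen at least min_size times."""
    counts = Counter(reads)
    # most_common() orders sequences from most to least abundant.
    return {seq: n for seq, n in counts.most_common() if n >= min_size}


# Toy reads: one abundant sequence, one rare variant, one exactly at the threshold.
reads = ["ACGT"] * 15 + ["ACGA"] * 3 + ["TTTT"] * 12
candidates = dereplicate(reads)
print(candidates)
```

The rare variant "ACGA" is dropped, which mirrors the intent of excluding low-abundance sequences that are more likely to reflect technical error.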

OTUs that corresponded with the ISD were identified using 'usearch_global' with the ISD sequence as the database. Similarly, coligo sequences were identified using the 'search_exact' algorithm of vsearch with coligo sequences as the database. Identification of coligos required the 'search_exact' algorithm because, during testing, the heuristics of the 'usearch_global' algorithm caused occasional mismatches, owing to how short the coligo sequences are. Computing was performed using the Teton Computing Environment at the Advanced Research Computing Center, University of Wyoming, Laramie (https://doi.org/10.15786/M2FY47).
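To show why exact matching suits short coligo sequences, and how coligos can reveal cross-contamination, the sketch below matches reads to coligos only when they are identical over their full length, then counts reads in a well that match a coligo assigned to a different well. It is an illustration under assumed inputs, not the vsearch implementation; the coligo names, sequences, and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def match_coligo(read: str, coligos: Dict[str, str]) -> Optional[str]:
    """Return the name of the coligo identical to the read, or None."""
    by_sequence = {seq: name for name, seq in coligos.items()}
    return by_sequence.get(read)


def unexpected_coligo_counts(reads: Iterable[str],
                             coligos: Dict[str, str],
                             expected: str) -> Counter:
    """Count reads in one well that match a coligo other than the expected one."""
    unexpected = Counter()
    for read in reads:
        name = match_coligo(read, coligos)
        if name is not None and name != expected:
            unexpected[name] += 1
    return unexpected


# Hypothetical coligos that differ by one base; a tolerant search could confuse them.
coligos = {"coligo_A1": "ACGTTGCAAT", "coligo_B7": "ACGTTGCAAA"}
well_a1_reads = ["ACGTTGCAAT"] * 5 + ["ACGTTGCAAA"] + ["GGGGCCCC"] * 2

unexpected = unexpected_coligo_counts(well_a1_reads, coligos, expected="coligo_A1")
print(unexpected)
```

Here the single read matching coligo_B7 in well A1 would be evidence of cross-contamination from well B7, while reads that match no coligo are ignored.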


