...

See below for a series of QC analyses that are worth considering.

Note that when reading OTU tables into R, vsearch inserts a “#” and a space into the first entry of the first field (the header). To get around this, you can use this solution when reading the data into R:

read.table("otutable", strip.white=TRUE, comment.char="", sep="\t", header=TRUE)

Note that R will place an “X” in front of field names that start with a digit, because such names are not syntactically valid. Not a big deal, but be aware of this when doing data wrangling.
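
As a sketch of the above (the toy table and sample names are made up; in practice replace `text = txt` with `file = "otutable"`), `check.names = FALSE` keeps the original sample names instead of letting R prepend an “X”:

```r
# Toy vsearch-style OTU table (tab-separated; header starts with "#OTU ID").
txt <- "#OTU ID\t1A\t1B\tblank1
Otu1\t120\t98\t2
Otu2\t0\t5\t300"
otus <- read.table(text = txt, sep = "\t", header = TRUE,
                   comment.char = "",    # do not treat the leading "#" as a comment
                   strip.white = TRUE,
                   check.names = FALSE)  # keep sample names like "1A" intact
counts <- as.matrix(otus[, -1])          # taxa as rows, replicates as columns
rownames(counts) <- otus[[1]]
```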

  1. Count reads per replicate and get a feel for those samples that failed. Perhaps a treatment or a sampling location just didn’t pan out.
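
With taxa as rows and replicates as columns, read counts per replicate are just column sums; the toy matrix below is a stand-in for your OTU table:

```r
# Minimal example: total reads per replicate via colSums().
counts <- matrix(c(120, 98, 2,
                   0,   5, 300),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Otu1", "Otu2"),
                                 c("1A", "1B", "blank1")))
reads_per_rep <- sort(colSums(counts))  # failed replicates sort to the front
```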

  2. Check out technical replicates and make sure they look similar. Combine read counts from them, if desired. If you don't want to combine them, then be careful to avoid pseudoreplication in downstream analyses.
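
One way to combine technical replicates, assuming column names encode the sample as “<sample>_<replicate>” (that naming scheme is an assumption; adjust the regex to yours):

```r
# Toy table with two technical replicates of sample s1 and one of s2.
counts <- matrix(c(10, 12, 5,
                   0,  1, 9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Otu1", "Otu2"),
                                 c("s1_A", "s1_B", "s2_A")))
sample_id <- sub("_[^_]+$", "", colnames(counts))   # "s1" "s1" "s2"
# Sum read counts across replicates of the same sample.
combined <- t(rowsum(t(counts), group = sample_id))
```

Before combining, a quick similarity check such as `cor(counts[, "s1_A"], counts[, "s1_B"])` can flag replicates that disagree.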

  3. Check for cross-contamination using coligos. Decide what to do about cross-contamination if you find it. See THIS link for instructions on how to determine which OTUs are the ISD and coligo.

  4. Check out your negative controls and see what is in there, then decide how to deal with contaminants. Note that deleting everything that shows up in a blank from the whole dataset is not a good approach: often one sees very minor contamination from some ubiquitous microbe that really is present out in nature. Currently, I recommend deleting those taxa that are very abundant in blanks but not abundant in replicates. For instance, if 5% or more of the total reads from a taxon are in a blank, then I might consider deleting that taxon. I may also delete a taxon if it is a known contaminant; see Salter et al. 2014 and other papers exploring the ‘kitome’. One may also want to redo analyses with and without the putative contaminants to see if anything changes.
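
The 5% heuristic above can be sketched as follows (identifying blanks by a “blank” label in the column name is an assumption; use however your negative controls are named):

```r
# Flag taxa with >= 5% of their total reads occurring in blank samples.
counts <- matrix(c(120, 98, 2,
                   0,   5, 300),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Otu1", "Otu2"),
                                 c("1A", "1B", "blank1")))
is_blank <- grepl("blank", colnames(counts))
frac_in_blanks <- rowSums(counts[, is_blank, drop = FALSE]) / rowSums(counts)
suspects <- names(frac_in_blanks)[frac_in_blanks >= 0.05]  # here: "Otu2"
cleaned <- counts[!(rownames(counts) %in% suspects), , drop = FALSE]
```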

  5. Double-check that you do not have multiple OTUs that represent the same taxon but differ by an indel or other variant. Combine such OTUs if desired.
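
If you do decide two OTUs are the same taxon, one option is to sum their rows; `merge_otus()` below is a hypothetical helper, not an existing function:

```r
# Sum the rows of the OTUs in `ids` into a single new row named `new_id`.
merge_otus <- function(tab, ids, new_id) {
  merged <- colSums(tab[ids, , drop = FALSE])
  keep <- tab[!(rownames(tab) %in% ids), , drop = FALSE]
  rbind(keep, matrix(merged, nrow = 1,
                     dimnames = list(new_id, colnames(tab))))
}
```

For example, `merge_otus(counts, c("Otu1", "Otu2"), "Otu1_merged")` would replace the two variant rows with one combined row.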

  6. Generate taxonomic hypotheses for each OTU. Typically this is done using a classifier such as SINTAX and a database of curated sequences, such as SILVA or UNITE. If you do not know what I am talking about, then please talk to Josh, Paul, John, Gordon, or someone else who has done this before. A new tool called AutoTax may be useful as well (see: https://mbio.asm.org/content/11/5/e01557-20#sec-1). Another possible tool we could use for both ITS and 16S classification is IDTAXA (see: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0521-5). When generating taxonomic hypotheses, make sure to report the date accessed or the version of the database used, because the results will change depending on database version.
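
As a sketch of this step using vsearch's SINTAX implementation (the file names and the 0.8 confidence cutoff are placeholders; point --db at your SILVA/UNITE reference and record its version):

```
vsearch --sintax otus.fasta --db reference_db.fasta \
    --sintax_cutoff 0.8 --tabbedout taxonomy.txt
```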

  7. Convert from relative to absolute abundances if desired. It can be instructive to compare results between relative and absolute datasets; they will almost certainly differ substantially.
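
One conversion sketch, assuming an internal standard: divide each replicate's counts by the ISD reads in that replicate, so abundances are on a per-ISD scale rather than compositional. The row name "ISD" is an assumption; use however your ISD OTU is labelled:

```r
# Toy table with an ISD row; taxa as rows, replicates as columns.
counts <- matrix(c(120, 98,
                   30,  60,
                   10,  20),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Otu1", "Otu2", "ISD"),
                                 c("1A", "1B")))
isd <- counts["ISD", ]
# Divide every column by that replicate's ISD reads, dropping the ISD row.
absolute <- sweep(counts[rownames(counts) != "ISD", ], 2, isd, "/")
```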

  8. Remove nontarget OTUs. These are sequences from hosts, non-target eukaryotes, and so on.
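
A sketch of dropping OTUs whose taxonomic hypothesis marks them as nontarget; the taxonomy strings and search terms below are made up, so adjust them to your classifier's output and your focal group:

```r
# Hypothetical taxonomy strings keyed by OTU.
tax <- c(Otu1 = "d:Bacteria,p:Proteobacteria",
         Otu2 = "d:Eukaryota,k:Fungi")
# Flag common nontarget groups for a 16S survey.
nontarget <- grepl("Eukaryota|Chloroplast|Mitochondria", tax)
target_otus <- names(tax)[!nontarget]  # keep only "Otu1"
```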

  9. While not an analysis, most journals will require that your data be deposited online. Consider depositing the raw sequences, the OTU table, all code used, and the consensus OTU sequences. The university can host data with a DOI for free; contact Josh if you need to do that soon, otherwise we will eventually post instructions here. Post your data and code at the latest possible date, or keep the code in a version-controlled repo, else you will need to repost after the many rounds of revision that are inevitable when publishing.

...