GTLCONtrols Bioinformatics


Assign taxonomy

salloc --account=microbiome -t 0-02:00 --mem=500G
module load swset/2018.05
module load gcc/7.3.0
module load vsearch/2.15.1
vsearch --sintax zotus.fa --db /project/microbiome/users/grandol1/ref_db/gg_16s_13.5.fa -tabbedout GTLCON_16S_NS5.sintax -sintax_cutoff 0.8

Output:  

Reading file /project/microbiome/users/grandol1/ref_db/gg_16s_13.5.fa 100%

1769520677 nt in 1262986 seqs, min 1111, max 2368, avg 1401

Counting k-mers 100%

Creating k-mer index 100%

Classifying sequences 100%

Classified 631 of 631 sequences (100.00%)
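To see how many sequences keep an assignment at a given rank after the 0.8 cutoff, a small helper along these lines could be used. This is only a sketch: the function name count_assigned is hypothetical, and it assumes that column 4 of the tabbed output holds the cutoff-filtered taxonomy as comma-separated rank:name pairs (the same column the conversion step reads).

```shell
# Hypothetical helper: count records whose cutoff-filtered taxonomy
# (assumed to be column 4) contains the given rank prefix, e.g. "g:".
# usage: count_assigned FILE RANK_PREFIX
count_assigned() {
    awk -F "\t" -v r="$2" '$4 ~ ("(^|,)" r) {n++} END {print n+0}' "$1"
}
```

For example, count_assigned GTLCON_16S_NS5.sintax "g:" would print the number of sequences that retain a genus-level assignment.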

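Convert the SINTAX output into a CSV taxonomy table (this needs gawk, because the three-argument match() is a gawk extension):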

 

awk -F "\t" '{OFS=","}
NR==1 {print "OTU_ID","SEQS","SIZE","DOMAIN","KINGDOM","PHYLUM","CLASS","ORDER","FAMILY","GENUS","SPECIES"}
{
  gsub(";", ","); gsub("centroid=", ""); gsub("seqs=", ""); gsub("size=", "");
  match($4, /d:[^,]+/, d); match($4, /k:[^,]+/, k); match($4, /p:[^,]+/, p); match($4, /c:[^,]+/, c);
  match($4, /o:[^,]+/, o); match($4, /f:[^,]+/, f); match($4, /g:[^,]+/, g); match($4, /s:[^,]+/, s);
  print $1,
    d[0]=="" ? "NA" : d[0], k[0]=="" ? "NA" : k[0],
    p[0]=="" ? "NA" : p[0], c[0]=="" ? "NA" : c[0],
    o[0]=="" ? "NA" : o[0], f[0]=="" ? "NA" : f[0],
    g[0]=="" ? "NA" : g[0], s[0]=="" ? "NA" : s[0]
}' GTLCON_16S_NS5.sintax > GTLCON_16S_NS5taxonomy.csv
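The header row above is hard-coded with 11 columns, while the number of fields in each data row depends on how the sequence labels expand once their semicolons become commas. A quick consistency check may therefore be worthwhile. The following is a minimal sketch; the function name count_ragged_rows is hypothetical and not part of the original workflow.

```shell
# Hypothetical helper: print the number of data rows whose comma-separated
# field count differs from that of the header row.
# usage: count_ragged_rows FILE.csv
count_ragged_rows() {
    awk -F "," 'NR==1 {n=NF; next} NF!=n {bad++} END {print bad+0}' "$1"
}
```

Running count_ragged_rows GTLCON_16S_NS5taxonomy.csv should print 0 if every row lines up with the header; any other value suggests the labels do not have the expected centroid/seqs/size form.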

 

 
