GTLCONtrols Bioinformatics
Assign taxonomy
salloc --account=microbiome -t 0-02:00 --mem=500G
module load swset/2018.05
module load gcc/7.3.0
module load vsearch/2.15.1
vsearch --sintax zotus.fa --db /project/microbiome/users/grandol1/ref_db/gg_16s_13.5.fa --tabbedout GTLCON_16S_NS5.sintax --sintax_cutoff 0.8
Output:
Reading file /project/microbiome/users/grandol1/ref_db/gg_16s_13.5.fa 100%
1769520677 nt in 1262986 seqs, min 1111, max 2368, avg 1401
Counting k-mers 100%
Creating k-mer index 100%
Classifying sequences 100%
Classified 631 of 631 sequences (100.00%)
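"Classified" in the log only means that every ZOTU got a prediction. To see how much the 0.8 cutoff trimmed the assignments, you can count, for each rank, how many ZOTUs still carry a label in column 4 of the .sintax table. Column 4 holds the prediction filtered by the cutoff, and it is the same column the conversion step below reads. This is a minimal sketch. It assumes that four-column layout and the d:/k:/p:/c:/o:/f:/g:/s: rank prefixes. `count_ranks` is a hypothetical helper name, not part of vsearch.

```shell
# count_ranks FILE: for each rank prefix, print how many rows of a SINTAX
# table still have that rank in column 4 (the cutoff-filtered prediction).
# Hypothetical helper; assumes tab-separated, 4-column SINTAX output.
count_ranks() {
  awk -F '\t' '
    BEGIN { n = split("d k p c o f g s", ranks, " ") }
    {
      # A rank counts only if "x:" starts a comma-separated field in column 4
      for (i = 1; i <= n; i++)
        if ($4 ~ ("(^|,)" ranks[i] ":")) count[ranks[i]]++
    }
    # One line per rank, in fixed order; +0 prints 0 for ranks never seen
    END { for (i = 1; i <= n; i++) print ranks[i] "\t" count[ranks[i]] + 0 }
  ' "$1"
}
```

For example, `count_ranks GTLCON_16S_NS5.sintax` prints one line per rank. A sharp drop between family and genus would show where the 0.8 cutoff starts discarding predictions. The function reads the file once rather than once per rank.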
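Convert the SINTAX output into a CSV with one row per ZOTU and one column per rank. This requires GNU awk (gawk), because it uses the three-argument form of match():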
awk -F "\t" '
BEGIN {
  OFS = ","
  print "OTU_ID","SEQS","SIZE","DOMAIN","KINGDOM","PHYLUM","CLASS","ORDER","FAMILY","GENUS","SPECIES"
}
{
  # Drop the centroid=/seqs=/size= tags and turn ; into , so the label in $1 splits into OTU_ID, SEQS, SIZE
  gsub(";", ","); gsub("centroid=", ""); gsub("seqs=", ""); gsub("size=", "")
  # Column 4 holds only the ranks that passed the cutoff; capture each rank (prefix included)
  match($4, /d:[^,]+/, d); match($4, /k:[^,]+/, k); match($4, /p:[^,]+/, p); match($4, /c:[^,]+/, c)
  match($4, /o:[^,]+/, o); match($4, /f:[^,]+/, f); match($4, /g:[^,]+/, g); match($4, /s:[^,]+/, s)
  # Write NA for any rank without an assignment
  print $1, d[0]=="" ? "NA" : d[0], k[0]=="" ? "NA" : k[0], p[0]=="" ? "NA" : p[0],
        c[0]=="" ? "NA" : c[0], o[0]=="" ? "NA" : o[0], f[0]=="" ? "NA" : f[0],
        g[0]=="" ? "NA" : g[0], s[0]=="" ? "NA" : s[0]
}' GTLCON_16S_NS5.sintax > GTLCON_16S_NS5taxonomy.csv
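The conversion splits the ZOTU label purely on the semicolons it replaces. If a label had a trailing or extra ";", every taxonomy column would shift one place to the right with no error. You can catch this by flagging rows whose field count differs from the header's. The sketch below assumes no taxon name contains a comma, which holds for SINTAX output because SINTAX uses commas as rank separators. `check_csv` is a hypothetical helper name.

```shell
# check_csv FILE: report rows whose comma-separated field count differs from
# the header row's; return non-zero if any such rows exist.
# Hypothetical helper; assumes no field contains an embedded comma.
check_csv() {
  awk -F ',' '
    NR == 1 { want = NF; next }   # header row defines the expected width
    NF != want {
      printf "line %d: %d fields (expected %d)\n", NR, NF, want
      bad++
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Usage: `check_csv GTLCON_16S_NS5taxonomy.csv && echo "all rows match the header"`. With the 11-column header above, any reported line points to a label that did not follow the expected format.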