...

cutadapt -g ^GTGYCAGCMGCCGCGGTAA -o trimmed_5RM1-16S_S1_L001_R1_001.fastq parsed_5RM1-16S_S1_L001_R1_001.fastq -e 0.25

Remove forward 16S primer

cutadapt -g ^GTGYCAGCMGCCGCGGTAA -o trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq parsed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq -e 0.25

...

cutadapt -g ^GGACTACHVGGGTWTCTAAT -o trimmed_5RM1-16S_S1_L001_R2_001.fastq parsed_5RM1-16S_S1_L001_R2_001.fastq -e 0.25

Remove reverse 16S primer

cutadapt -g ^GGACTACHVGGGTWTCTAAT -o trimmed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq parsed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq -e 0.25
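
Since the same two primer-trimming commands are repeated for every sample, they could be generated from the sample prefix. The following is only a dry-run sketch, not the script used here: primer_trim_cmds is a hypothetical helper that prints the commands rather than running them, reusing the primers, the -e 0.25 setting, and the file-name pattern shown above.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the primer-trimming commands for one sample prefix.
primer_trim_cmds() {
    sample=$1
    fwd='^GTGYCAGCMGCCGCGGTAA'   # forward 16S primer, anchored to the read start
    rev='^GGACTACHVGGGTWTCTAAT'  # reverse 16S primer, anchored to the read start
    printf 'cutadapt -g %s -o trimmed_%s_R1_001.fastq parsed_%s_R1_001.fastq -e 0.25\n' "$fwd" "$sample" "$sample"
    printf 'cutadapt -g %s -o trimmed_%s_R2_001.fastq parsed_%s_R2_001.fastq -e 0.25\n' "$rev" "$sample" "$sample"
}

# Print the four commands for the two samples in this walkthrough.
for s in 5RM1-16S_S1_L001 5RM1-16S-TAKE2_S1_L001; do
    primer_trim_cmds "$s"
done
```

Piping the printed lines to sh would execute them, which keeps the review step (reading the commands) separate from the run.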

Clean up the headers of all files to remove problematic characters.

...

sed 's/\s/_/g' trimmed_5RM1-16S_S1_L001_R2_001.fastq | sed -e 's/^@16/@rna16/' - | sed -e 's/-/_/g' > cleaned_trimmed_5RM1-16S_S1_L001_R2_001.fastq

Clean up the headers of all files to remove problematic characters.

sed 's/\s/_/g' trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq | sed -e 's/^@16/@rna16/' - | sed -e 's/-/_/g' > cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq

sed 's/\s/_/g' trimmed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq | sed -e 's/^@16/@rna16/' - | sed -e 's/-/_/g' > cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq
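
One thing to be aware of: the sed pipeline rewrites every line of the file, so a "-" inside a quality string is also turned into "_". If that is not wanted, the same three substitutions can be limited to header lines. This is a sketch under the assumption of strict four-line FASTQ records; clean_fastq_headers is a hypothetical helper, not part of the pipeline above.

```shell
#!/bin/sh
# Hypothetical helper: apply the three substitutions to header lines only
# (line 1 of each four-line FASTQ record) and print the result to stdout.
clean_fastq_headers() {
    awk 'NR % 4 == 1 {
             gsub(/[ \t]/, "_")       # whitespace -> underscore
             sub(/^@16/, "@rna16")    # same prefix rewrite as the sed call
             gsub(/-/, "_")           # hyphen -> underscore
         }
         { print }' "$1"
}

# Demo on a made-up record whose quality string contains "-".
demo=$(mktemp)
printf '@16S-run read1\nACGT\n+\nII-I\n' > "$demo"
clean_fastq_headers "$demo"
rm -f "$demo"
```

In the demo, only the header changes; the quality line is printed exactly as it was written.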

Step 3. Make coligo table

Note: I changed my mind and now think it makes sense to include coligo processing in any comprehensive bash script, as long as we continue to use coligos.

The input to this call to awk is the output from step 2 (primers removed and headers cleaned up with sed).

awk '{if ($1 ~ /@/) {print}else{print substr ($0, 0, 13)}}' cleaned_trimmed_5RM1-16S_S1_L001_R1_001.fastq > coligo_5RM1-16S_S1_L001_R1_001.fastq

vsearch --fastq_filter coligo_5RM1-16S_S1_L001_R1_001.fastq --fastaout filter_coligo_5RM1-16S_S1_L001_R1_001.fasta

The output of the following command is our coligo table. It is left to the client to inspect the coligo table; we don’t process it further here.

vsearch -search_exact filter_coligo_5RM1-16S_S1_L001_R1_001.fasta -db /project/microbiome/ref_db/coligos_and_abbreviatedISD.fa -strand plus -otutabout final_coligo_5RM1-16S_S1_L001_R1_001.fasta -minseqlen 5

The input to this call to awk is the output from step 2 (primers removed and headers cleaned up with sed).

awk '{if ($1 ~ /@/) {print}else{print substr ($0, 0, 13)}}' cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq > coligo_5RM1-16S-TAKE2_S1_L001_R1_001.fastq

vsearch --fastq_filter coligo_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --fastaout filter_coligo_5RM1-16S-TAKE2_S1_L001_R1_001.fasta

The output of the following command is our coligo table. It is left to the client to inspect the coligo table; we don’t process it further here.

vsearch -search_exact filter_coligo_5RM1-16S-TAKE2_S1_L001_R1_001.fasta -db /project/microbiome/ref_db/coligos_and_abbreviatedISD.fa -strand plus -otutabout final_coligo_5RM1-16S-TAKE2_S1_L001_R1_001.fasta -minseqlen 5
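
Two details of the awk calls in this step are worth checking. awk strings are indexed from 1, and a start position of 0 in substr may be handled differently depending on the awk implementation, so the number of characters kept may not be 13. Also, the test $1 ~ /@/ matches any line containing "@", which can include quality lines. The sketch below assumes strict four-line FASTQ records and that the first 13 characters are what is wanted; truncate_fastq is a hypothetical helper, not the command used above.

```shell
#!/bin/sh
# Hypothetical helper: keep the first N characters of each sequence and quality line.
truncate_fastq() {
    awk -v n="$2" '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, n); next }  # sequence, quality
        { print }                                                     # header, "+" line
    ' "$1"
}

# Demo on a made-up 20-base record whose quality string contains "@".
demo=$(mktemp)
printf '@read1\nACGTACGTACGTACGTACGT\n+\nIIII@IIIIIIIIIIIIIII\n' > "$demo"
truncate_fastq "$demo" 13
rm -f "$demo"
```

Selecting lines by position in the record rather than by content keeps the sequence and quality lines the same length, which FASTQ readers generally require.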

Step 4. Remove low complexity reads (this gets rid of coligos)

usearch -filter_lowc cleaned_trimmed_5RM1-16S_S1_L001_R1_001.fastq -reverse cleaned_trimmed_5RM1-16S_S1_L001_R2_001.fastq -output NoLow_5RM1-16S_S1_L001_R1_001.fastq -output2 NoLow_5RM1-16S_S1_L001_R2_001.fastq

And Again:

usearch -filter_lowc cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq -reverse cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq -output NoLow_5RM1-16S-TAKE2_S1_L001_R1_001.fastq -output2 NoLow_5RM1-16S-TAKE2_S1_L001_R2_001.fastq

...

vsearch --fastq_mergepairs NoLow_5RM1-16S_S1_L001_R1_001.fastq --reverse NoLow_5RM1-16S_S1_L001_R2_001.fastq --fastqout Merged_5RM1-16S_S1_L001_R1_001.fastq --fastq_maxdiffs 12 --fastq_allowmergestagger --fastq_minovlen 10 --fastq_minmergelen 60 --fastqout_notmerged_fwd UnMerged_5RM1-16S_S1_L001_R1_001.fastq --fastqout_notmerged_rev UnMerged_5RM1-16S_S1_L001_R2_001.fastq

And Again:

vsearch --fastq_mergepairs NoLow_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --reverse NoLow_5RM1-16S-TAKE2_S1_L001_R2_001.fastq --fastqout Merged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --fastq_maxdiffs 12 --fastq_allowmergestagger --fastq_minovlen 10 --fastq_minmergelen 60 --fastqout_notmerged_fwd UnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --fastqout_notmerged_rev UnMerged_5RM1-16S-TAKE2_S1_L001_R2_001.fastq

...

vsearch -fastq_filter Merged_5RM1-16S_S1_L001_R1_001.fastq -fastq_maxee 1 -fastaout filteredMerged_5RM1-16S_S1_L001_R1_001.fastq

Filter merged reads using:

vsearch -fastq_filter Merged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq -fastq_maxee 1 -fastaout filteredMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq
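
For context, -fastq_maxee 1 discards reads whose expected number of errors is greater than 1. The expected number of errors is the sum over bases of the error probability 10^(-Q/10), where Q is the Phred score. The sketch below computes that sum for a single quality string, assuming Phred+33 encoding; expected_errors is a hypothetical helper for illustration and is not how vsearch is invoked.

```shell
#!/bin/sh
# Hypothetical helper: print the expected number of errors for one Phred+33 quality string.
expected_errors() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            # Build a character -> Phred score lookup for ASCII 33..126.
            for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
        }
        {
            ee = 0
            for (j = 1; j <= length($0); j++)
                ee += 10 ^ (-phred[substr($0, j, 1)] / 10)
            printf "%.4f\n", ee
        }'
}

expected_errors 'IIII'          # four bases at Q40
expected_errors '++++++++++++'  # twelve bases at Q10
```

The first call gives 0.0004, which passes a maxee of 1; the second gives 1.2000, which would be discarded.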

...

/project/microbiome/bin/fastp --in1 UnMerged_5RM1-16S_S1_L001_R1_001.fastq --in2 UnMerged_5RM1-16S_S1_L001_R2_001.fastq -q 15 -u 40 -l 107 --out1 FilteredUnMerged_5RM1-16S_S1_L001_R1_001.fastq --out2 FilteredUnMerged_5RM1-16S_S1_L001_R2_001.fastq

Filter reads that didn’t merge using:

/project/microbiome/bin/fastp --in1 UnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --in2 UnMerged_5RM1-16S-TAKE2_S1_L001_R2_001.fastq -q 15 -u 40 -l 107 --out1 FilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --out2 FilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R2_001.fastq

...

vsearch -fastx_filter FilteredUnMerged_5RM1-16S_S1_L001_R2_001.fastq --fastq_trunclen 215 -fastqout TruncFilteredUnMerged_5RM1-16S_S1_L001_R2_001.fastq

And Again:

vsearch -fastx_filter FilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --fastq_trunclen 215 -fastqout TruncFilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq

...

vsearch -derep_fulllength Merged_5RM1-16S_S1_L001_R1_001.fastq --output DeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --sizeout --sizein

And Again:

vsearch -derep_fulllength Merged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --output DeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --sizeout --sizein

...

vsearch --uchime3_denovo DenoisedDeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --nonchimeras NoChiDenoisedDeRepMerged_5RM1-16S_S1_L001_R1_001.fastq

And Again:

vsearch --cluster_unoise DeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --centroids DenoisedDeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --minsize 8 --relabel 'otu' --sizein --sizeout

...

vsearch --usearch_global DeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --db NoChiDenoisedDeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --otutabout 'OTU_5RM1-16S_S1_L001_R1_001.fastq' --id 0.99

And Again:

vsearch --usearch_global DeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --db NoChiDenoisedDeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --otutabout 'OTU_5RM1-16S-TAKE2_S1_L001_R1_001.fastq' --id 0.99

...

sed -i 's/^#OTU ID/OTUID/' OTU_5RM1-16S_S1_L001_R1_001.fastq

And Again:

sed -i 's/^#OTU ID/OTUID/' OTU_5RM1-16S-TAKE2_S1_L001_R1_001.fastq
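
To see what this substitution does without touching a real file, the same sed expression can be run on a made-up table. This is a sketch only: the table contents are invented, and rename_otu_header is a hypothetical wrapper that reads standard input instead of editing in place.

```shell
#!/bin/sh
# Hypothetical wrapper: same substitution as above, but stdin -> stdout (no -i).
rename_otu_header() {
    sed 's/^#OTU ID/OTUID/'
}

# Made-up two-line OTU table; only the header line starts with "#OTU ID".
printf '#OTU ID\tsampleA\notu1\t42\n' | rename_otu_header
```

Only the first column name of the header changes; the data rows pass through unmodified.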