Step 0. Demultiplex
module load swset/2018.05
module load gcc/7.3.0
module load usearch/10.0.240
module load perl-text-levenshtein-xs
perl //project/microbiome/data/seq/cu_24feb21novaseq4/rawdata/parse_count.pl loc_ad1_Demux.csv 5RM1-16S_S1_L001_R1_001.fastq 5RM1-16S_S1_L001_R2_001.fastq FS10000611
perl //project/microbiome/data/seq/cu_24feb21novaseq4/rawdata/parse_count.pl ../loc_ad1_Demux.csv 5RM1-16S-TAKE2_S1_L001_R1_001.fastq 5RM1-16S-TAKE2_S1_L001_R2_001.fastq FS10000611
Step 1. load modules
module load cutadapt/2.10 usearch/10.0.240 vsearch/2.15.1
The only other software that will be used is fastp, which is located here: /project/microbiome/bin/fastp
Step 2. Remove primers using cutadapt and clean up headers
Remove forward 16s primer
cutadapt -g ^GTGYCAGCMGCCGCGGTAA -o trimmed_5RM1-16S_S1_L001_R1_001.fastq parsed_5RM1-16S_S1_L001_R1_001.fastq -e 0.25
cutadapt -g ^GTGYCAGCMGCCGCGGTAA -o trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq parsed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq -e 0.25
Remove reverse 16s primer
cutadapt -g ^GGACTACHVGGGTWTCTAAT -o trimmed_5RM1-16S_S1_L001_R2_001.fastq parsed_5RM1-16S_S1_L001_R2_001.fastq -e 0.25
cutadapt -g ^GGACTACHVGGGTWTCTAAT -o trimmed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq parsed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq -e 0.25
Clean up the headers of all files to remove problematic characters.
sed 's/\s/_/g' trimmed_5RM1-16S_S1_L001_R1_001.fastq | sed -e 's/^@16/@rna16/' - | sed -e 's/-/_/g' > cleaned_trimmed_5RM1-16S_S1_L001_R1_001.fastq
sed 's/\s/_/g' trimmed_5RM1-16S_S1_L001_R2_001.fastq | sed -e 's/^@16/@rna16/' - | sed -e 's/-/_/g' > cleaned_trimmed_5RM1-16S_S1_L001_R2_001.fastq
sed 's/\s/_/g' trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq | sed -e 's/^@16/@rna16/' - | sed -e 's/-/_/g' > cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq
sed 's/\s/_/g' trimmed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq | sed -e 's/^@16/@rna16/' - | sed -e 's/-/_/g' > cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq
Step 4. Remove low complexity reads (this gets rid of coligos)
usearch -filter_lowc cleaned_trimmed_5RM1-16S_S1_L001_R1_001.fastq -reverse cleaned_trimmed_5RM1-16S_S1_L001_R2_001.fastq -output NoLow_5RM1-16S_S1_L001_R1_001.fastq -output2 NoLow_5RM1-16S_S1_L001_R2_001.fastq
usearch -filter_lowc cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R1_001.fastq -reverse cleaned_trimmed_5RM1-16S-TAKE2_S1_L001_R2_001.fastq -output NoLow_5RM1-16S-TAKE2_S1_L001_R1_001.fastq -output2 NoLow_5RM1-16S-TAKE2_S1_L001_R2_001.fastq
There doesn’t appear to be a similar command in vsearch.
Step 5. Merge reads (note we are saving both merged and unmerged reads so there are multiple outputs to this command)
vsearch --fastq_mergepairs NoLow_5RM1-16S_S1_L001_R1_001.fastq --reverse NoLow_5RM1-16S_S1_L001_R2_001.fastq --fastqout Merged_5RM1-16S_S1_L001_R1_001.fastq --fastq_maxdiffs 12 --fastq_allowmergestagger --fastq_minovlen 10 --fastq_minmergelen 60 --fastqout_notmerged_fwd UnMerged_5RM1-16S_S1_L001_R1_001.fastq --fastqout_notmerged_rev UnMerged_5RM1-16S_S1_L001_R2_001.fastq
vsearch --fastq_mergepairs NoLow_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --reverse NoLow_5RM1-16S-TAKE2_S1_L001_R2_001.fastq --fastqout Merged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --fastq_maxdiffs 12 --fastq_allowmergestagger --fastq_minovlen 10 --fastq_minmergelen 60 --fastqout_notmerged_fwd UnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --fastqout_notmerged_rev UnMerged_5RM1-16S-TAKE2_S1_L001_R2_001.fastq
Step 6. Filter reads
Filter merged reads using:
vsearch -fastq_filter Merged_5RM1-16S_S1_L001_R1_001.fastq -fastq_maxee 1 -fastaout filteredMerged_5RM1-16S_S1_L001_R1_001.fastq
vsearch -fastq_filter Merged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq -fastq_maxee 1 -fastaout filteredMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq
Filter reads that didn’t merge using:
/project/microbiome/bin/fastp --in1 UnMerged_5RM1-16S_S1_L001_R1_001.fastq --in2 UnMerged_5RM1-16S_S1_L001_R2_001.fastq -q 15 -u 40 -l 107 --out1 FilteredUnMerged_5RM1-16S_S1_L001_R1_001.fastq --out2 FilteredUnMerged_5RM1-16S_S1_L001_R2_001.fastq
/project/microbiome/bin/fastp --in1 UnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --in2 UnMerged_5RM1-16S-TAKE2_S1_L001_R2_001.fastq -q 15 -u 40 -l 107 --out1 FilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --out2 FilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R2_001.fastq
Step 7. Truncate and join reads that didn’t merge
vsearch -fastx_filter FilteredUnMerged_5RM1-16S_S1_L001_R1_001.fastq --fastq_trunclen 215 -fastqout TruncFilteredUnMerged_5RM1-16S_S1_L001_R1_001.fastq
vsearch -fastx_filter FilteredUnMerged_5RM1-16S_S1_L001_R2_001.fastq --fastq_trunclen 215 -fastqout TruncFilteredUnMerged_5RM1-16S_S1_L001_R2_001.fastq
vsearch -fastx_filter FilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --fastq_trunclen 215 -fastqout TruncFilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq
vsearch -fastx_filter FilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R2_001.fastq --fastq_trunclen 215 -fastqout TruncFilteredUnMerged_5RM1-16S-TAKE2_S1_L001_R2_001.fastq
Step 8. Dereplicate reads
I think it makes sense to cat both merged (output from step 5 above) and joined reads (output from step 7 above) and pipe them into this command. Otherwise the command must be run twice, once on the merged reads then again on the joined reads and then the output of both of those commands should be concatenated.
vsearch -derep_fulllength Merged_5RM1-16S_S1_L001_R1_001.fastq --output DeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --sizeout --sizein
vsearch -derep_fulllength Merged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --output DeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --sizeout --sizein
Step 9. Make OTUs and remove chimeras
vsearch --cluster_unoise DeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --centroids DenoisedDeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --minsize 8 --relabel 'otu' --sizein --sizeout
vsearch --uchime3_denovo DenoisedDeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --nonchimeras NoChiDenoisedDeRepMerged_5RM1-16S_S1_L001_R1_001.fastq
vsearch --cluster_unoise DeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --centroids DenoisedDeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --minsize 8 --relabel 'otu' --sizein --sizeout
vsearch --uchime3_denovo DenoisedDeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --nonchimeras NoChiDenoisedDeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq
Step 10. Make OTU table
The input to this is concatenated merged and joined reads (reads should be filtered and in fasta format as output from steps 7 and 5 above).
The “-db” argument below stands for database and should be the fasta sequence for the OTUs/ESVs (from step 9) also note that the OUTPUT here is in single quotes.
vsearch --usearch_global DeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --db NoChiDenoisedDeRepMerged_5RM1-16S_S1_L001_R1_001.fastq --otutabout 'OTU_5RM1-16S_S1_L001_R1_001.fastq' --id 0.99
vsearch --usearch_global DeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --db NoChiDenoisedDeRepMerged_5RM1-16S-TAKE2_S1_L001_R1_001.fastq --otutabout 'OTU_5RM1-16S-TAKE2_S1_L001_R1_001.fastq' --id 0.99
Remove the pesky “#OTU ID” that vsearch puts in the first cell of the table:
sed -i 's/^#OTU ID/OTUID/' OTU_5RM1-16S_S1_L001_R1_001.fastq
sed -i 's/^#OTU ID/OTUID/' OTU_5RM1-16S-TAKE2_S1_L001_R1_001.fastq