...
Info |
---|
Status (09 Nov 2022)
|
Nothing below here has been done yet
Demultiplexing and splitting
...
RG_SP_500_S1_R1_001.fastq.gz
RG_SP_500_S1_R1_001.fastq.gz
Demultiplexing
The work is done by run_parse_count_onSplitInput.pl. As the name implies, we split the raw data into many files (492) so that many nodes can parse them in parallel. The approximate string matching requires ~140 hours of CPU time; spread across many jobs, the parsing takes less than one hour.
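As a rough check on those numbers, the sketch below works out the average CPU time per split file. It assumes the work divides evenly across the files, which the real jobs may not do; the variable names are only illustrative.

```shell
cpu_hours=140   # approximate total CPU time quoted above
n_files=492     # number of split files quoted above

# Average CPU minutes per file, assuming an even split of the work.
per_file=$(awk -v h="$cpu_hours" -v n="$n_files" 'BEGIN { printf "%.1f", h * 60 / n }')
echo "about ${per_file} CPU minutes per file"
```

At roughly 17 CPU minutes per file, the whole set can finish in under an hour of wall time, provided enough jobs run at once.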
...
Code Block |
---|
mkdir -p /gscratch/grandol1/NS55FB1_ContamTest/rawdata
cd /gscratch/grandol1/NS55FB1_ContamTest/rawdata
unpigz --to-stdout /project/microbiome/data_queue/seq/NS55FB1_ContamTest/rawdata/RG_SP_500_S17-5FB1-take3_S1_L001_R1_001.fastq.gz | split -l 160000001000000 -d --suffix-length=3 --additional-suffix=.fastq - NS55FB1_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/NS55FB1_ContamTest/rawdata/RG_SP_500_S1_7-5FB1-take3_S1_L001_R2_001.fastq.gz | split -l 160000001000000 -d --suffix-length=3 --additional-suffix=.fastq - NS55FB1_R2_ |
making 240 10 R1 files and 240 10 R2 files, with structured names (e.g., for the R1 set):
/gscratch/grandol1/NS55FB1_ContamTest/rawdata/NS55FB1_R1_000.fastq
/gscratch/grandol1/NS55FB1_ContamTest/rawdata/NS55FB1_R1_001.fastq
etc.
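One thing worth checking with this kind of split is that the line count given to `split -l` is a multiple of 4, so that no four-line FASTQ record is cut across two files. The toy example below is a sketch of that check on made-up data: the file names (`toy_R1.fastq`, `toy_R1_`) and the scratch directory are hypothetical, and it assumes GNU `split`, which provides the `-d`, `--suffix-length`, and `--additional-suffix` options used in the command above.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a toy FASTQ with 6 records (4 lines per record).
for i in 1 2 3 4 5 6; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > toy_R1.fastq

# Split into chunks of 2 records (8 lines); 8 is a multiple of 4.
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq toy_R1.fastq toy_R1_

# Each chunk should hold whole records and start with a header line.
for f in toy_R1_*.fastq; do
    lines=$(wc -l < "$f")
    first=$(head -c 1 "$f")
    if [ $((lines % 4)) -eq 0 ] && [ "$first" = "@" ]; then
        echo "$f ok ($lines lines)"
    else
        echo "$f BROKEN ($lines lines, starts with '$first')"
    fi
done
```

Because R1 and R2 are split with the same line count, chunk `000` of R1 holds the same reads as chunk `000` of R2, so the files can be paired by their numeric suffix.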
run_parse_count_onSplitInput.pl also writes to /gscratch.
...
In /project/microbiome/data_queue/seq/NS5/otu, I ran run_slurm_mkotu.pl, which I modified to also pick up the joined reads (in addition to the merged reads).
Nothing below here has been done yet
Make coligo table
In /project/microbiome/data_queue/seq/NS5/coligoISD and /project/microbiome/data/seq/NS5/coligoISD, there are 16S and ITS directories for all projects. Each contains a file named coligoISDtable.txt with per-sample counts of the coligos and the ISD found in the trimmed forward reads. The script run_slurm_mkcoligoISDtable.pl passes over all of the projects and uses vsearch to make the table.
...