...
Code Block | ||
---|---|---|
| ||
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\3,\1,\2/' Hmax1Demux.csv > Hmax1Demux_fixed.csv |
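To check what this substitution does before running it on the real file, here is a minimal sketch on a single made-up line. The field values (a-1, b-2, c-3, extra) are hypothetical and do not come from Hmax1Demux.csv. Only the sed expression is taken from the command above.

```shell
# Hypothetical sample line; the real column contents of Hmax1Demux.csv are not shown here.
sample='a-1,b-2,c-3,extra'

# Same expression as above: capture the first three comma-separated fields
# (letters, digits, hyphens) and print them in the order 3,1,2.
out=$(printf '%s\n' "$sample" | sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\3,\1,\2/')

echo "$out"   # c-3,a-1,b-2
```

Because the pattern ends in .*, anything after the third field is dropped from matching lines. Lines whose first three fields contain other characters (for example an underscore) do not match and pass through unchanged.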
I started parse_barcodes_slurm_L1.sh and parse_barcodes_slurm_L2.sh. Demultiplexing the two files in parallel took more than the two days I initially allocated to it (in part because ~10% of the data do not match our MIDs, because we did not filter contaminants). So I broke the data into 228 parts (each with up to 16 million lines, matching the -l 16000000 passed to split below) and ran 228 jobs in parallel.
Code Block | ||
---|---|---|
| ||
mkdir /gscratch/buerkle/data/HMAX1
cd /gscratch/buerkle/data/HMAX1
cat /project/microbiome/data/seq/HMAX1/rawreads/WyomingPool* | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - WyomingPool_HMAX1_
mkdir rawreads
mv WyomingPool_HMAX1_* rawreads/
/project/microbiome/analyses/gtl/HMAX1/demultiplex/run_parsebarcodes_onSplitInput.pl |
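The split step above can be tried on a scaled-down input first. The sketch below uses the same split options with a much smaller line count. The temporary directory, the demo_ prefix, and the three toy reads are all hypothetical. It assumes GNU coreutils split, which provides --additional-suffix.

```shell
# Scaled-down illustration; directory, prefix, and reads are made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three 4-line FASTQ records (12 lines) stand in for the concatenated raw reads.
for i in 1 2 3; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | split -l 8 -d --suffix-length=3 --additional-suffix=.fastq - demo_

# With 8 lines per part this yields demo_000.fastq (2 records)
# and demo_001.fastq (the remaining record).
ls demo_*.fastq
n_parts=$(ls demo_*.fastq | wc -l)
```

The line count given to -l should be a multiple of 4 so that no FASTQ record is cut across two parts. Both 8 here and 16000000 above satisfy this.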
Note that I did not do separate contaminant filtering (which I did for Penstemon), because the parsing code and other downstream steps should knock out contaminants. I can double-check this.
...