Content Comparison

...

Code Block

language	bash

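# Reorder the first three comma-separated fields (1,2,3 -> 3,1,2) and drop anything after the third field
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\3,\1,\2/' Hmax1Demux.csv > Hmax1Demux_fixed.csv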
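
To see what the sed expression does before running it on the real file, it can be tried on a couple of made-up rows. This is only a sketch: the sample values below are hypothetical, since the actual columns of Hmax1Demux.csv are not shown here.

```bash
# Hypothetical sample rows; the real contents of Hmax1Demux.csv are not shown on this page.
sample='ACGT-1,TTGA-2,sampleA,extra
GGCC-3,AATT-4,sampleB'

# Same expression as above: capture the first three fields, emit them as 3,1,2,
# and discard whatever follows the third field.
reordered=$(printf '%s\n' "$sample" | sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\3,\1,\2/')
printf '%s\n' "$reordered"
```

Lines whose first three fields contain characters other than letters, digits, or hyphens do not match the pattern and pass through unchanged, so it may be worth checking the output for rows that were not reordered.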

On 06 Aug 2021 I started parse_barcodes_slurm_L1.sh and parse_barcodes_slurm_L2.sh. Demultiplexing the two files in parallel took more than the two days I had initially allocated to it (in part because ~10% of the data do not match our MIDs, since we did not filter contaminants). So I broke the data into 228 parts (each with 16 million lines, matching the split command below) and ran 228 jobs in parallel.

Code Block

language	bash

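# Working directory on scratch for the split input
mkdir -p /gscratch/buerkle/data/HMAX1
cd /gscratch/buerkle/data/HMAX1
# Concatenate the raw reads and split them into 16,000,000-line pieces
# (4,000,000 reads each at 4 lines per FASTQ record), named WyomingPool_HMAX1_000.fastq, WyomingPool_HMAX1_001.fastq, ...
cat /project/microbiome/data/seq/HMAX1/rawreads/WyomingPool* | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - WyomingPool_HMAX1_
mkdir rawreads
mv WyomingPool_HMAX1_* rawreads/
# Run barcode parsing on the split input
/project/microbiome/analyses/gtl/HMAX1/demultiplex/run_parsebarcodes_onSplitInput.pl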
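
Because 16,000,000 is divisible by 4, the split above should never cut a read in half, assuming standard four-line FASTQ records (no wrapped sequence lines). A minimal sketch of how that could be checked, shown here on toy data in a temporary directory; the file names and sizes are made up and this is not part of the actual pipeline.

```bash
# Toy demonstration in a temporary directory; names and sizes are hypothetical.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Build a 5-record FASTQ (20 lines).
for i in 1 2 3 4 5; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > toy.fastq

# Split into pieces of 8 lines (2 records), using the same GNU split flags as above.
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq toy.fastq toy_

# Each piece should have a line count divisible by 4 and begin with a '@' header line.
bad=0
total=0
for f in toy_*.fastq; do
    n=$(wc -l < "$f")
    total=$((total + n))
    if [ $((n % 4)) -ne 0 ] || [ "$(head -c 1 "$f")" != "@" ]; then
        echo "problem in $f ($n lines)"
        bad=$((bad + 1))
    fi
done
pieces=$(ls toy_*.fastq | wc -l)
echo "pieces=$pieces total_lines=$total bad=$bad"
```

On the real data the same loop could be pointed at rawreads/WyomingPool_HMAX1_*.fastq, and the summed line count compared against the line count of the original input.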

Note that I did not do separate contaminant filtering (which I did for Penstemon), because the parsing code and other downstream steps should knock out contaminants. I can double-check this.

...

Version	Old Version 3	New Version 4
Changes made by	Alex Buerkle	Alex Buerkle
Saved on	Aug 06, 2021	Aug 09, 2021

Versions Compared
