Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagebash
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\3,\1,\2/' Hmax1Demux.csv > Hmax1Demux_fixed.csv

On I started parse_barcodes_slurm_L1.sh and parse_barcodes_slurm_L2.sh . Demultiplexing on the two files in parallel took more than the two days I initially allocated to it (in part because of the ~10% of the data that do not match our MIDS, because we did not filter contaminants). So I broke the data into 228 parts (each with 160 million lines) and ran 228 jobs in parallel.

Code Block
languagebash
mkdir /gscratch/buerkle/data/HMAX1
cd /gscratch/buerkle/data/HMAX1
cat /project/microbiome/data/seq/HMAX1/rawreads/WyomingPool* | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - WyomingPool_HMAX1_
mkdir rawreads
mv WyomingPool_HMAX1_* rawreads/
/project/microbiome/analyses/gtl/HMAX1/demultiplex/run_parsebarcodes_onSplitInput.pl

Note that I did not do separate contaminant filtering (which I did for Penstemon), because the parsing code and other downstream steps should knock out contaminants. I can double-check this.

...