...

To speed up the demultiplexing (which would have taken several days on each input file), I split the concatenated raw files into 435 files with 16 million lines each. I still needed to respect the different libraries, because these use overlapping MIDs. I did that at the split step, by splitting each pool separately, but then I also needed to modify run_parsebarcodes_onSplitInput.pl to work with different pools/demux keys, which I did.

Code Block
languagebash
mkdir -p /gscratch/buerkle/data/alfalfa/rawreads
cd /gscratch/buerkle/data/alfalfa/rawreads
# Split each pool separately so that libraries with overlapping MIDs stay apart.
# 16,000,000 lines is a multiple of 4, so no 4-line FASTQ record is cut in half.
split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /project/evolgen/data/local/alfalfa/alf1GBS_NS1_mar21/Pool1_S1_L001_R1_001.fastq alf1_pool1_
split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /project/evolgen/data/local/alfalfa/alf1GBS_NS1_mar21/Pool2_S2_L002_R1_001.fastq alf1_pool2_
split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /project/evolgen/data/local/alfalfa/alf1GBS_NS1_mar21/Pool3_S1_L001_R1_001.fastq alf1_pool3_
split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /project/evolgen/data/local/alfalfa/alf1GBS_NS1_mar21/Pool4_S2_L002_R1_001.fastq alf1_pool4_
# Demultiplex the split files with the version of the script modified for per-pool demux keys.
/project/evolgen/assem/alf1GBS_NS1_mar21/demultiplex/run_parsebarcodes_onSplitInput.pl
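
Before running the demultiplexing on the split files, it may be worth checking that the split kept FASTQ records intact. The following is a minimal sketch of such a check on a toy file, not part of the original workflow. It assumes GNU coreutils and standard 4-line FASTQ records; the names toy.fastq and toy_ are hypothetical and only used for illustration.

```shell
#!/usr/bin/env bash
# Sketch: confirm that splitting on a multiple of 4 lines keeps FASTQ records whole.
# toy.fastq and the toy_ prefix are hypothetical names for this illustration.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Build a toy FASTQ with 5 records (4 lines each, 20 lines total).
for i in 1 2 3 4 5; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > toy.fastq

# Same options as the real split, but with 8 lines (2 records) per chunk.
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq toy.fastq toy_

# Every chunk should hold a multiple of 4 lines and start with a header line.
for f in toy_*.fastq; do
    n=$(wc -l < "$f")
    first=$(head -n 1 "$f")
    if (( n % 4 != 0 )) || [[ $first != @* ]]; then
        echo "broken chunk: $f" >&2
        exit 1
    fi
done

# The chunks together should contain exactly as many lines as the input.
total=$(cat toy_*.fastq | wc -l)
orig=$(wc -l < toy.fastq)
echo "chunks=$(ls toy_*.fastq | wc -l) lines_in=$orig lines_out=$total"
```

The same loop could be pointed at the alf1_pool*_ files in the rawreads directory. The last chunk of each pool may be shorter than 16 million lines, but its line count should still be divisible by 4.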

...