...
To speed up the demultiplexing (which would have taken several days on each input file), I split the concatenated raw files into 435 files with 16 million lines each. I still needed to respect the different libraries, because these use overlapping MIDs. I did that by splitting each pool separately with its own output prefix, and I also modified run_parsebarcodes_onSplitInput.pl
to work with the different pools/demux keys.

    mkdir -p /gscratch/buerkle/data/alfalfa/rawreads
    cd /gscratch/buerkle/data/alfalfa/rawreads
    split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /project/evolgen/data/local/alfalfa/alf1GBS_NS1_mar21/Pool1_S1_L001_R1_001.fastq alf1_pool1_
    split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /project/evolgen/data/local/alfalfa/alf1GBS_NS1_mar21/Pool2_S2_L002_R1_001.fastq alf1_pool2_
    split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /project/evolgen/data/local/alfalfa/alf1GBS_NS1_mar21/Pool3_S1_L001_R1_001.fastq alf1_pool3_
    split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /project/evolgen/data/local/alfalfa/alf1GBS_NS1_mar21/Pool4_S2_L002_R1_001.fastq alf1_pool4_
    /project/evolgen/assem/alf1GBS_NS1_mar21/demultiplex/run_parsebarcodes_onSplitInput.pl
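
A note on the chunk size used above: a FASTQ record is 4 lines, and 16,000,000 is a multiple of 4, so every chunk should start on a read header and hold 4,000,000 reads. The sketch below checks that arithmetic and repeats the same `split` options on a toy file. The toy file, its contents, and the `toy_pool1_` prefix are made up for illustration, and the `--additional-suffix` option assumes GNU coreutils `split`.

```shell
# Check that the chunk size is a whole number of 4-line FASTQ records.
lines_per_chunk=16000000
if [ $((lines_per_chunk % 4)) -eq 0 ]; then
    reads_per_chunk=$((lines_per_chunk / 4))
    echo "OK: ${reads_per_chunk} reads per chunk"
else
    echo "chunk size would split a FASTQ record across files" >&2
fi

# Toy demonstration in a temporary directory (hypothetical file names).
demo_dir=$(mktemp -d)

# Write 3 fake reads (12 lines) to a toy FASTQ file.
for i in 1 2 3; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > "$demo_dir/toy.fastq"

# Same options as the real commands, but 8 lines (2 reads) per chunk.
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq \
    "$demo_dir/toy.fastq" "$demo_dir/toy_pool1_"

# Each chunk's first line should be a read header beginning with "@".
for f in "$demo_dir"/toy_pool1_*.fastq; do
    head -n 1 "$f"
done
```

Because each pool gets its own prefix, the pool can be recovered from the chunk file name later, which is what lets the demultiplexing step pick the matching demux key.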
...