...
Splitting the raw (uncompressed) data was accomplished with the program split (run in an interactive SLURM job), with 16×10^6 lines (4×10^6 reads, at four lines per FASTQ record) written to each file and the remainder in the final file. These files were written to /gscratch and, as intermediate files that can be readily reconstructed, will not be retained long-term.
Code Block |
---|
mkdir -p /gscratch/buerkle/psomagen_9oct20_novaseq3/rawdata
cd /gscratch/buerkle/psomagen_9oct20_novaseq3/rawdata
split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /pfs/tsfs1/project/microbiome/data/seq/psomagen_9oct20_novaseq3/rawdata/NovaSeq3_R1.fastq novaseq3_R1_ ;
split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq /pfs/tsfs1/project/microbiome/data/seq/psomagen_9oct20_novaseq3/rawdata/NovaSeq3_R2.fastq novaseq3_R2_ |
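
Because each FASTQ record occupies four lines, every chunk produced above should contain a line count divisible by four, and the chunk totals should add up to the line count of the source file. The following is a minimal sketch of such a check, not part of the original workflow; the function name `check_chunks` is hypothetical, and it assumes the chunks sit in the current directory and end in `.fastq` as in the split commands above.

```shell
# Hypothetical helper (not from the original workflow): confirm that every
# chunk matching PREFIX*.fastq holds only whole FASTQ records, then print
# the total number of lines across all chunks.
check_chunks() {
    prefix=$1
    total=0
    for f in "${prefix}"*.fastq; do
        # If the glob matched nothing, "$f" is the literal pattern.
        [ -e "$f" ] || { echo "no chunks match ${prefix}*.fastq" >&2; return 1; }
        n=$(wc -l < "$f")
        # A FASTQ record is 4 lines, so any remainder means a record was cut.
        if [ $((n % 4)) -ne 0 ]; then
            echo "partial record in $f ($n lines)" >&2
            return 1
        fi
        total=$((total + n))
    done
    echo "$total"
}
```

Run from the directory that holds the chunks, `check_chunks novaseq3_R1_` prints the combined line count, which can then be compared with the output of `wc -l` on the original NovaSeq3_R1.fastq; the same applies to `novaseq3_R2_`. A non-zero exit status indicates either that no chunks were found or that a chunk ends partway through a record.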
...