...
The directory for the raw sequence data (typically gzip-compressed; use run_pigz.sh and run_unpigz.sh to compress and decompress with multithreaded pigz via SLURM) and the parsed and split reads is /project/microbiome/data_queue/seq/5ALAReRun2/rawdata. Files for individual samples will be in /project/microbiome/data_queue/seq/5ALAReRun2/rawdata/sample_fastq/.
Demultiplexing
...
Code Block |
---|
mkdir -p /gscratch/grandol1/IllTest6-17ReRun2/rawdata && cd /gscratch/grandol1/IllTest6-17ReRun2/rawdata && unpigz --to-stdout /project/microbiome/data_queue/seq/IllTest6-17ReRun2/rawdata/IllTestReRun2EPSCoR_S1_R1_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - TOADReRun2_AIR_R1_ ; unpigz --to-stdout /project/microbiome/data_queue/seq/IllTest6-17ReRun2/rawdata/IllTestReRun2EPSCoR_S1_R2_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - TOAD_AIRReRun2_R2_ |
This makes 257 R1 files and 257 R2 files, each holding up to 16,000,000 lines (4,000,000 reads at four lines per FASTQ record), with structured names (e.g., for the R1 set):
...
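The split step works because the chunk size passed to `split -l` is a multiple of four, so no four-line FASTQ record is cut across two files. The following is a minimal sketch of that idea on a toy file, not the commands used for the actual run: the file name `toy.fastq`, the prefix `toy_R1_`, and the chunk size of 8 lines are all hypothetical, and it assumes GNU coreutils `split` (for `-d` and `--additional-suffix`).

```shell
# Toy demonstration (hypothetical names and sizes): split a small FASTQ file
# into chunks whose line count is a multiple of 4, then check each chunk.
set -e
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Build 5 made-up reads (4 lines each, 20 lines total).
for i in 1 2 3 4 5; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > toy.fastq

# 8 lines per chunk = 2 reads per file; the real run uses 16000000 lines.
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq toy.fastq toy_R1_

# Every chunk should have a line count divisible by 4 and start with '@'.
bad=0
nfiles=0
for f in toy_R1_*.fastq; do
  nfiles=$((nfiles + 1))
  n=$(wc -l < "$f")
  if [ $((n % 4)) -ne 0 ] || [ "$(head -c 1 "$f")" != "@" ]; then
    bad=$((bad + 1))
  fi
done
echo "chunks: $nfiles, malformed: $bad"
```

The same loop, pointed at the real split files, could serve as a quick sanity check before parsing; comparing the number of R1 and R2 chunks is another inexpensive check.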
run_parse_count_onSplitInput.pl also writes to /gscratch.
TOAD_AIRReRun2_Demux.csv is used to map MIDs to sample names and projects.
...