Info

Status (02 May 2022)

...

The raw sequence data (typically gz compressed) and the parsed and split reads are in /project/microbiome/data_queue/seq/5ALAReRun2/rawdata. Use run_pigz.sh and run_unpigz.sh to compress and decompress with multithreaded pigz via SLURM. Files for individual samples will be in /project/microbiome/data_queue/seq/5ALAReRun2/rawdata/sample_fastq/.
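For readers unfamiliar with pigz, the following is a minimal sketch of a multithreaded compress and decompress round trip of the kind a SLURM job might run. It is not the content of run_pigz.sh or run_unpigz.sh, which is not shown on this page. The job name, thread count, time limit, and file names are all hypothetical, and the script falls back to plain gzip when pigz is not installed.

```shell
#!/bin/bash
#SBATCH --job-name=pigz_demo      # hypothetical job name
#SBATCH --cpus-per-task=4         # hypothetical thread count
#SBATCH --time=00:10:00           # hypothetical time limit

# Thread count comes from SLURM when run as a job; default to 2 otherwise.
threads="${SLURM_CPUS_PER_TASK:-2}"

# Use pigz when available; fall back to single-threaded gzip otherwise.
if command -v pigz >/dev/null 2>&1; then
    compress()   { pigz -p "$threads" "$1"; }
    decompress() { pigz -d -p "$threads" "$1"; }
else
    compress()   { gzip "$1"; }
    decompress() { gzip -d "$1"; }
fi

# Work on a tiny synthetic FASTQ file in a scratch directory (hypothetical names).
workdir="$(mktemp -d)"
printf '@read1\nACGT\n+\nIIII\n' > "$workdir/demo.fastq"
cp "$workdir/demo.fastq" "$workdir/original.fastq"

compress "$workdir/demo.fastq"        # replaces demo.fastq with demo.fastq.gz
decompress "$workdir/demo.fastq.gz"   # restores demo.fastq

# Confirm the round trip left the data unchanged.
cmp -s "$workdir/demo.fastq" "$workdir/original.fastq" && echo "round trip OK"
```

Because the #SBATCH lines are ordinary comments to bash, the same file can be run directly for a quick check or submitted with sbatch.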

Demultiplexing

...

Code Block
mkdir -p /gscratch/grandol1/IllTest6-17ReRun2/rawdata
cd /gscratch/grandol1/IllTest6-17ReRun2/rawdata
unpigz --to-stdout /project/microbiome/data_queue/seq/IllTest6-17ReRun2/rawdata/IllTestReRun2EPSCoR_S1_R1_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - TOAD_AIRReRun2_R1_ ;
unpigz --to-stdout /project/microbiome/data_queue/seq/IllTest6-17ReRun2/rawdata/IllTestReRun2EPSCoR_S1_R2_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - TOAD_AIRReRun2_R2_

making 257 R1 files and 257 R2 files, with structured names (e.g., for the R1 set):

...

run_parse_count_onSplitInput.pl also writes to /gscratch.

TOAD_AIRReRun2_Demux.csv is used to map MIDS to sample names and projects.
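A note on the split step used earlier: assuming standard four-line FASTQ records, 16,000,000 lines per chunk corresponds to 4,000,000 reads per file. Because the line count is a multiple of four, no read is cut across two files. The toy example below sketches the same split options on a small synthetic file, using 8 lines (2 reads) per chunk. The file names are hypothetical, and the options require GNU split.

```shell
#!/bin/bash
# Toy demonstration of the split options; all file names are hypothetical.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Build a 5-read FASTQ file (5 reads x 4 lines = 20 lines).
for i in 1 2 3 4 5; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > demo_R1.fastq

# 8 lines per chunk = 2 reads per chunk; numeric 3-digit suffixes plus .fastq.
cat demo_R1.fastq | split -l 8 -d --suffix-length=3 --additional-suffix=.fastq - demo_R1_

# Lists demo_R1_000.fastq, demo_R1_001.fastq, demo_R1_002.fastq.
ls demo_R1_*.fastq

# Every chunk should start on a header line, since 8 is a multiple of 4.
for f in demo_R1_*.fastq; do
    head -n 1 "$f" | grep -q '^@read' || echo "chunk $f starts mid-record"
done
```

The last chunk holds the single remaining read, just as the final file of a real split holds whatever is left after the full 4,000,000-read chunks.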

...