Status (02 May 2022)

Demultiplexing and splitting

...

Code Block
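# Create a scratch working directory for the NS6 raw reads and move into it
mkdir -p /gscratch/grandol1/NS6/rawdata
cd /gscratch/grandol1/NS6/rawdata
# Decompress each read file and split it into chunks of 16,000,000 lines
# (4,000,000 reads, since each FASTQ record spans 4 lines), numbered 000, 001, ...
unpigz --to-stdout /project/microbiome/data_queue/seq/NS6/rawdata/NovaSeq6_pool_1.fq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - NS6_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/NS6/rawdata/NovaSeq6_pool_2.fq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - NS6_R2_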

making 94 257 R1 files and 94 257 R2 files, with structured names (e.g., for the R1 set):

/gscratch/grandol1/NS6/rawdata/NS6_R1_000.fastq
/gscratch/grandol1/NS6/rawdata/NS6_R1_001.fastq
etc.

Stopped at the above step on 1/31/23 2:41pm
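Before moving on, it can help to confirm that the two split commands produced matching R1/R2 sets. The sketch below is a hypothetical check, not part of the original pipeline: the function name `check_split_pairs` is made up, and it assumes all chunks sit in one directory and follow the PREFIX_R1_NNN.fastq / PREFIX_R2_NNN.fastq naming shown above. It reports chunks without a partner, R1/R2 pairs whose line counts differ, and chunks whose line count is not a multiple of 4.

```shell
# Hypothetical sanity check, not part of the original pipeline: every R1 chunk
# should have an R2 chunk with the same suffix and the same number of lines,
# and each chunk should contain only whole FASTQ records (4 lines per record).
# Usage: check_split_pairs DIR PREFIX, e.g. check_split_pairs . NS6
check_split_pairs() {
  local dir=$1 prefix=$2 status=0 r1 r2 n1 n2
  for r1 in "$dir/${prefix}"_R1_*.fastq; do
    # An unmatched glob stays literal, so a missing file means no R1 chunks
    [ -e "$r1" ] || { echo "no ${prefix}_R1_*.fastq files in $dir" >&2; return 1; }
    r2="$dir/${prefix}_R2_${r1##*_R1_}"   # same numeric suffix, R2 side
    if [ ! -e "$r2" ]; then
      echo "missing R2 partner for $r1" >&2; status=1; continue
    fi
    n1=$(wc -l < "$r1"); n2=$(wc -l < "$r2")
    if [ "$n1" -ne "$n2" ]; then
      echo "line counts differ: $r1 ($n1) vs $r2 ($n2)" >&2; status=1
    fi
    if [ $((n1 % 4)) -ne 0 ]; then
      echo "line count of $r1 is not a multiple of 4" >&2; status=1
    fi
  done
  # Catch R2 chunks that have no R1 partner
  for r2 in "$dir/${prefix}"_R2_*.fastq; do
    [ -e "$r2" ] || continue
    r1="$dir/${prefix}_R1_${r2##*_R2_}"
    [ -e "$r1" ] || { echo "missing R1 partner for $r2" >&2; status=1; }
  done
  return "$status"
}
```

For example, `check_split_pairs /gscratch/grandol1/NS6/rawdata NS6` after both split commands finish; a non-zero exit status means at least one problem was printed to stderr. Counting lines in every chunk reads all of the data once, so it takes a while on a full lane.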

run_parse_count_onSplitInput.pl also writes to /gscratch.

NS5NS6_Demux.csv is used to map MIDs to sample names and projects.
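As a rough illustration of the mapping this key provides, the hypothetical `lookup_sample` function below prints the sample name and project for one forward/reverse MID pair. The column order (forward MID, reverse MID, sample, project), the header row, and the use of MID pairs are all assumptions for this sketch, not the documented layout of NS5NS6_Demux.csv; adjust the field numbers to the real file.

```shell
# Hypothetical helper: print "sample<TAB>project" for one MID pair.
# ASSUMPTION: plain comma-separated key with a header row and columns in the
# order forward_MID,reverse_MID,sample,project (the real layout may differ).
# Usage: lookup_sample KEY_CSV FORWARD_MID REVERSE_MID
lookup_sample() {
  local key=$1 fwd=$2 rev=$3
  awk -F',' -v f="$fwd" -v r="$rev" '
    NR == 1 { next }                  # skip the header row
    { sub(/\r$/, "") }                # tolerate Windows line endings
    $1 == f && $2 == r { print $3 "\t" $4; found = 1; exit }
    END { if (!found) exit 1 }        # non-zero status when no row matches
  ' "$key"
}
```

A non-zero exit status signals an MID pair that is missing from the key, which is one quick way to spot a sample that would otherwise go unassigned.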

...

splitFastq.pl and splitFastq_manyInputfiles.pl will need tweaking in the future whenever the sample names or the format of the demultiplexing key and metadata change. The number of columns in the key has differed among some of the early sequencing lanes, which required changes to these parsing scripts.
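Because column-count changes have required edits to the parsing scripts, a quick pre-flight check can flag a key whose layout differs from what the scripts expect before a long run starts. The hypothetical `check_key_columns` function below is a sketch that assumes a plain comma-separated file with no commas inside quoted fields; the expected count is whatever the current version of the parsing scripts was written for.

```shell
# Hypothetical pre-flight check, not part of splitFastq.pl: report every row of
# a demultiplexing key whose column count differs from the expected value.
# ASSUMPTION: plain comma-separated values with no commas inside quoted fields.
# Usage: check_key_columns KEY_CSV EXPECTED_COLUMNS
check_key_columns() {
  local key=$1 expected=$2
  awk -F',' -v want="$expected" '
    { sub(/\r$/, "") }          # strip a trailing carriage return, if present
    NF == 0 { next }            # ignore blank lines
    NF != want {
      printf "line %d has %d columns (expected %d)\n", NR, NF, want
      bad = 1
    }
    END { exit bad }            # status 1 if any row was reported, else 0
  ' "$key"
}
```

For example, `check_key_columns NS5NS6_Demux.csv EXPECTED`, with EXPECTED replaced by the column count the scripts parse; any reported line points to a row (or a whole key) that would need a script change.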

Stopped at the above step on 2/01/23 6:05pm

Calculate summary statistics on reads

...