Page Comparison

...

Status (09 Nov 2022)

Sequenced and data transferred to Teton

Nothing below here has been done yet

Demultiplexing and splitting

...

`RG_SP_500_S1_R1_001.fastq.gz`

Demultiplexing

Demultiplexing is done by run_parse_count_onSplitInput.pl. As the name implies, the raw data are first split into many files (492) so that parsing can run in parallel across many nodes. The approximate string matching requires ~140 hours of CPU time in total; spread across many jobs, the parsing finishes in less than one hour.
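As a rough sanity check on the numbers above, and to illustrate the fan-out idea, here is a minimal sketch. It assumes one job per split file and evenly sized files. `parse_chunk` is a hypothetical stand-in that only counts lines; it is not the interface of run_parse_count_onSplitInput.pl, and the toy chunk files are made up.

```shell
# Back-of-envelope: CPU time per job, assuming one job per split file
# and evenly sized files (both are assumptions).
cpu_hours=140
n_files=492
minutes_per_job=$(( cpu_hours * 60 / n_files ))
echo "about ${minutes_per_job} CPU minutes per job"

# Generic fan-out pattern: one background job per chunk, then wait for all.
# parse_chunk is a hypothetical stand-in, not the real parser.
parse_chunk() {
    wc -l < "$1" > "$1.count"
}

# Three tiny made-up chunks, one read each.
tmp=$(mktemp -d)
for i in 000 001 002; do
    printf '@r\nACGT\n+\nIIII\n' > "$tmp/chunk_$i.fastq"
done

for f in "$tmp"/chunk_*.fastq; do
    parse_chunk "$f" &
done
wait    # block until every background job has finished

cat "$tmp"/chunk_*.count
```

Under those assumptions each job needs about 17 CPU minutes, which is consistent with the whole parse finishing in under an hour when the jobs run concurrently.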

...

Code Block

mkdir -p /gscratch/grandol1/NS55FB1_ContamTest/rawdata
cd /gscratch/grandol1/NS55FB1_ContamTest/rawdata
unpigz --to-stdout /project/microbiome/data_queue/seq/NS55FB1_ContamTest/rawdata/RG_SP_500_S17-5FB1-take3_S1_L001_R1_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - NS55FB1_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/NS55FB1_ContamTest/rawdata/RG_SP_500_S1_7-5FB1-take3_S1_L001_R2_001.fastq.gz | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - NS55FB1_R2_

This makes 10 R1 files and 10 R2 files, with structured names (e.g., for the R1 set):

/gscratch/grandol1/NS55FB1_ContamTest/rawdata/NS55FB1_R1_000.fastq
/gscratch/grandol1/NS55FB1_ContamTest/rawdata/NS55FB1_R1_001.fastq
etc.
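The naming scheme and the choice of line count can be checked on a toy input. The sketch below uses a tiny made-up FASTQ file and the same `split` options as above (GNU coreutils `split` is assumed, since `--additional-suffix` is a GNU option). The file names `toy.fastq` and `toy_R1_` are illustrative only.

```shell
# Toy demonstration of the split step on a tiny, made-up FASTQ file.
workdir=$(mktemp -d)
cd "$workdir"

# Write 5 fake reads (4 lines each = 20 lines).
for i in 1 2 3 4 5; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > toy.fastq

# 8 lines per chunk = 2 reads per chunk. The line count must be a
# multiple of 4 so that no FASTQ record is cut in half.
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq toy.fastq toy_R1_

# Numeric, zero-padded suffixes: toy_R1_000.fastq, toy_R1_001.fastq, ...
ls toy_R1_*.fastq

# Each chunk should hold a whole number of reads.
for f in toy_R1_*.fastq; do
    n=$(wc -l < "$f")
    echo "$f: $(( n / 4 )) reads"
done
```

Any line count passed to `split -l` for FASTQ data should be divisible by 4; otherwise a read's header, sequence, and quality lines can end up in different files.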

run_parse_count_onSplitInput.pl also writes to /gscratch.

NS5_Demux.csv is used to map MIDs to sample names and projects.
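To illustrate what such a mapping file does, here is a minimal lookup sketch. The column layout (`mid,sample,project`), the MID sequences, and the sample and project names are all hypothetical; the actual structure of NS5_Demux.csv is not described here. The lookup is an exact match, whereas the real parser uses approximate string matching.

```shell
tmp=$(mktemp -d)

# Hypothetical layout: mid,sample,project (not the real NS5_Demux.csv format).
cat > "$tmp/toy_demux.csv" <<'EOF'
mid,sample,project
ACGTACGT,sampleA,projX
TTGGCCAA,sampleB,projY
EOF

# Print "sample<TAB>project" for an exactly matching MID; print nothing otherwise.
lookup_mid() {
    awk -F',' -v mid="$1" 'NR > 1 && $1 == mid { print $2 "\t" $3 }' "$tmp/toy_demux.csv"
}

lookup_mid TTGGCCAA
```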

Nothing below here has been done yet

Splitting to fastq for individuals

...
