Status (02 May 2022)

Data arrived by Globus on 01 10 2023. Everything below is modified from Bioinformatics for Novaseq run 4. Data processing ~~finished~~.

...

The raw sequence data and the parsed and split reads are in /project/microbiome/data_queue/seq/5ALAReRun2/rawdata. The raw files are typically gz compressed; use run_pigz.sh and run_unpigz.sh to compress and decompress them with multithreaded pigz, using SLURM. Files for individual samples will be in /project/microbiome/data_queue/seq/5ALAReRun2/rawdata/sample_fastq/.

Demultiplexing

...

Code Block

mkdir -p /gscratch/grandol1/IllTest6-17ReRun2/rawdata
cd /gscratch/grandol1/IllTest6-17ReRun2/rawdata
unpigz --to-stdout /project/microbiome/data_queue/seq/IllTest6-17ReRun2/rawdata/IllTestReRun2EPSCoR_S1_R1_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - TOAD_AIRReRun2_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/IllTest6-17ReRun2/rawdata/IllTestReRun2EPSCoR_S1_R2_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - TOAD_AIRReRun2_R2_
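The chunk size in the commands above is a multiple of four because each FASTQ record takes four lines, so 16000000 lines is 4 million reads per file and no read is cut across two files. The sketch below shows the same pattern on toy data. It assumes GNU split. The toy_R1 file names are hypothetical, the chunk size is reduced to 8 lines, and gzip -dc stands in for unpigz --to-stdout so the example does not depend on pigz being installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Build a toy FASTQ with 5 reads (4 lines per read = 20 lines) and compress it.
for i in 1 2 3 4 5; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | gzip > toy_R1.fastq.gz

# Same pattern as the real command, but with 8 lines (2 reads) per chunk
# instead of 16000000 lines (4 million reads) per chunk.
# gzip -dc is used here in place of unpigz --to-stdout (assumption for portability).
gzip -dc toy_R1.fastq.gz \
  | split -l 8 -d --suffix-length=3 --additional-suffix=.fastq - toy_R1_

# Report the line and read count of every chunk; each should be a whole number of reads.
for f in toy_R1_*.fastq; do
  n=$(wc -l < "$f")
  echo "$f: $((n)) lines, $((n / 4)) reads"
done
```

The R1 and R2 files must be split with the same line count so that chunk numbers stay paired between the two read sets.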

~~making 257 R1 files and 257 R2 files, with structured names (e.g., for the R1 set):~~

...

run_parse_count_onSplitInput.pl also writes to /gscratch.

TOAD_AIRReRun2_Demux.csv is used to map MIDs to sample names and projects.

...
