

Status (02 May 2022)

Data arrived via Globus on 01 10 2023. Everything below is modified from Bioinformatics for Novaseq run 4. Data processing is not yet finished.


Demultiplexing and splitting

...

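```bash
mkdir -p /gscratch/grandol1/NS6/rawdata
cd /gscratch/grandol1/NS6/rawdata || exit 1
# Decompress each pool and split it into chunks of 16,000,000 lines.
# A FASTQ record is 4 lines, so each chunk holds 4,000,000 reads.
# -d with --suffix-length=3 gives numeric suffixes 000, 001, 002, ...
unpigz --to-stdout /project/microbiome/data_queue/seq/NS6/rawdata/NovaSeq6_pool_1.fq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - NS6_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/NS6/rawdata/NovaSeq6_pool_2.fq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - NS6_R2_
```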

This makes 94 257 R1 files and 94 257 R2 files, with structured names (e.g., for the R1 set):

/gscratch/grandol1/NS6/rawdata/NS6_R1_000.fastq
/gscratch/grandol1/NS6/rawdata/NS6_R1_001.fastq
etc.

Stopped at above step on 1/31/23 2:41pm
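Because `split -l 16000000` cuts on line boundaries and a FASTQ record is 4 lines, every chunk should contain whole records (4,000,000 reads, with the last chunk smaller), and each R1 chunk should have an R2 chunk with the same suffix and the same line count. The following is a minimal sanity-check sketch, not part of the original pipeline; the function name `check_chunks` is hypothetical, and it assumes both read sets sit in one directory with the NS6_R1_NNN.fastq / NS6_R2_NNN.fastq naming.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for chunks written by `split -l 16000000`.
# Assumes NS6_R1_NNN.fastq and NS6_R2_NNN.fastq live in the same directory.
set -uo pipefail

# check_chunks DIR: return 0 only if every R1 chunk has an R2 partner with the
# same line count, every line count is a multiple of 4 (whole FASTQ records),
# and the number of R1 chunks equals the number of R2 chunks.
check_chunks() {
  local dir="$1" status=0 r1 r2 name n1 n2 c1 c2
  for r1 in "$dir"/NS6_R1_*.fastq; do
    if [ ! -e "$r1" ]; then
      echo "no R1 chunks found in $dir" >&2
      return 1
    fi
    name=$(basename "$r1")
    r2="$dir/${name/NS6_R1_/NS6_R2_}"   # matching R2 chunk, same numeric suffix
    if [ ! -e "$r2" ]; then
      echo "missing R2 partner for $name" >&2
      status=1
      continue
    fi
    n1=$(wc -l < "$r1")
    n2=$(wc -l < "$r2")
    if (( n1 % 4 != 0 || n2 % 4 != 0 )); then
      echo "$name: line count is not a multiple of 4" >&2
      status=1
    fi
    if (( n1 != n2 )); then
      echo "$name: R1 has $n1 lines, R2 has $n2" >&2
      status=1
    fi
  done
  # Catch R2 chunks that have no R1 partner.
  c1=$(find "$dir" -maxdepth 1 -name 'NS6_R1_*.fastq' | wc -l)
  c2=$(find "$dir" -maxdepth 1 -name 'NS6_R2_*.fastq' | wc -l)
  if (( c1 != c2 )); then
    echo "R1 chunk count ($c1) differs from R2 chunk count ($c2)" >&2
    status=1
  fi
  return "$status"
}

# Usage: check_chunks /gscratch/grandol1/NS6/rawdata && echo "chunks OK"
```

Counting lines with `wc -l` rereads every chunk, so on a full run this takes a while; it is meant as a one-off check before demultiplexing rather than a per-step gate.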

run_parse_count_onSplitInput.pl also writes to /gscratch.

NS5NS6_Demux.csv maps MIDs (multiplex identifiers) to sample names and projects.
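A minimal sketch of what such a lookup does, assuming (hypothetically) that each sample is keyed by a forward/reverse MID pair and that the key is a plain CSV with a header row and the columns forward MID, reverse MID, sample name, project. The real NS5NS6_Demux.csv layout is not documented here and may differ; `lookup_sample` is an illustrative name, not one of the pipeline's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical MID-pair -> sample/project lookup.
# ASSUMED key layout: header row, then forward_MID,reverse_MID,sample_name,project
# (the actual NS5NS6_Demux.csv columns may differ between lanes).

# lookup_sample KEY FWD_MID REV_MID: print "sample<TAB>project" for the first
# matching row; exit status is 1 if the MID pair is not in the key.
lookup_sample() {
  local key="$1" fwd="$2" rev="$3"
  awk -F',' -v f="$fwd" -v r="$rev" '
    { sub(/\r$/, "") }                        # tolerate Windows line endings
    NR > 1 && $1 == f && $2 == r { print $3 "\t" $4; found = 1; exit }
    END { exit (found ? 0 : 1) }
  ' "$key"
}

# Usage: lookup_sample NS5NS6_Demux.csv <forward MID> <reverse MID>
```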

...

splitFastq.pl and splitFastq_manyInputfiles.pl will need tweaking whenever sample names or the format of the demultiplexing and metadata key change. The number of columns has differed among some of the early sequencing lanes, which required changes to these parsing scripts.
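Since the column count of the key has changed between lanes, a quick check before running the parsers can catch a format change early. This is a hedged sketch, not part of splitFastq.pl; the helpers `column_profile` and `expect_columns` are hypothetical, and they assume a plain comma-separated key with no quoted fields containing commas.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks on a demultiplexing key's column layout.
# Assumes plain CSV: no quoted fields containing commas.

# column_profile KEY: print "<columns><TAB><rows>" for each distinct column
# count, sorted by column count. More than one output line means rows disagree.
column_profile() {
  awk -F',' '{ n[NF]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort -n
}

# expect_columns KEY N: succeed only if every row (including blank lines,
# which have 0 columns) has exactly N columns.
expect_columns() {
  local key="$1" want="$2"
  awk -F',' -v w="$want" 'NF != w { bad++ } END { exit (bad ? 1 : 0) }' "$key"
}

# Usage: column_profile NS5NS6_Demux.csv
#        expect_columns NS5NS6_Demux.csv <expected column count> || echo "key format changed"
```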

Stopped at above step on 2/01/23 6:05pm

Calculate summary statistics on reads

...
