Info |
---|
Status (02 May 2022)
|
Demultiplexing and splitting
...
Code Block |
---|
mkdir -p /gscratch/grandol1/NS6/rawdata
cd /gscratch/grandol1/NS6/rawdata
unpigz --to-stdout /project/microbiome/data_queue/seq/NS6/rawdata/NovaSeq6_pool_1.fq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - NS6_R1_
unpigz --to-stdout /project/microbiome/data_queue/seq/NS6/rawdata/NovaSeq6_pool_2.fq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - NS6_R2_ |
This makes 257 R1 files and 257 R2 files, with structured names (e.g., for the R1 set):
/gscratch/grandol1/NS6/rawdata/NS6_R1_000.fastq
/gscratch/grandol1/NS6/rawdata/NS6_R1_001.fastq
etc.
Stopped at above step on 1/31/23 2:41pm
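A quick sanity check on the split output can be sketched as follows. A FASTQ record is 4 lines, so splitting at 16000000 lines (a multiple of 4) should leave 4000000 complete reads in every chunk except possibly the last. The function name `check_chunks` is hypothetical and not part of the existing pipeline; the sketch only assumes the chunk naming produced by the `split` commands above.

```shell
#!/usr/bin/env bash
# Sketch only: check_chunks is a hypothetical helper, not an existing pipeline script.
# Usage: check_chunks DIR PREFIX
# Prints the read count of every chunk named DIR/PREFIX*.fastq and returns
# non-zero if any chunk has a line count that is not a multiple of 4
# (i.e. a FASTQ record was cut in half).
check_chunks() {
    local dir=$1 prefix=$2 status=0 f lines
    for f in "$dir"/"$prefix"*.fastq; do
        # If the glob matched nothing, $f is the literal pattern.
        [ -e "$f" ] || { echo "no chunks matching $dir/$prefix*.fastq" >&2; return 2; }
        lines=$(wc -l < "$f")
        lines=$((lines + 0))    # drop any padding that some wc versions add
        if [ $((lines % 4)) -ne 0 ]; then
            echo "BAD  $f: $lines lines (not a multiple of 4)" >&2
            status=1
        else
            echo "OK   $f: $((lines / 4)) reads"
        fi
    done
    return "$status"
}

# Example (paths taken from the commands above):
#   check_chunks /gscratch/grandol1/NS6/rawdata NS6_R1_
#   check_chunks /gscratch/grandol1/NS6/rawdata NS6_R2_
```

Running it once per read direction and comparing the two listings also confirms that the R1 and R2 sets have the same number of chunks and the same read count per chunk.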
run_parse_count_onSplitInput.pl also writes to /gscratch.
NS5NS6_Demux.csv is used to map MIDs to sample names and projects.
...
splitFastq.pl and splitFastq_manyInputfiles.pl will need tweaking in the future, whenever the sample names or the format of the demultiplexing and metadata key change. The number of columns has differed among some of the early sequencing lanes, which necessitated changes to this parsing script.
Stopped at above step on 2/01/23 6:05pm
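Because the column count of the key has varied between lanes, it may be worth checking a key file before running the split scripts. The sketch below is one way to do that, assuming the key is a plain comma-delimited file with no quoted commas inside fields. The function names `column_counts` and `consistent_columns` are hypothetical, and nothing here reflects how splitFastq.pl itself parses the key.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, assuming a simple comma-delimited key
# with no quoted fields containing commas.

# column_counts FILE
# Prints one line per distinct field count: "<number of fields> <number of rows>".
# Blank lines are ignored.
column_counts() {
    awk -F',' 'NF { n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# consistent_columns FILE
# Succeeds (exit 0) only if every non-blank row has the same number of fields.
consistent_columns() {
    awk -F',' '
        NF == 0 { next }                  # skip blank lines
        !seen   { n = NF; seen = 1 }      # remember the first row width
        NF != n { bad = 1 }               # flag any row that differs
        END     { exit bad + 0 }
    ' "$1"
}

# Example:
#   column_counts NS5NS6_Demux.csv
#   consistent_columns NS5NS6_Demux.csv || echo "key has rows of differing width" >&2
```

If `column_counts` reports more than one field count, the offending rows need to be fixed, or the parsing script adjusted, before demultiplexing.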
Calculate summary statistics on reads
...