Raw reads
We retrieved 12 files from the NovaSeq runs. Four of these contain the 1x100 bp reads. I am not sure why we received two index reads for each run. The files with the raw reads of interest will be in:
/project/gtl/data/raw/2Trout/1and2/rawreads
Replace “distribution” with “raw” in the starting directory path.
Unzip and split
cd /gscratch/grandol1/
mkdir 2Trout1and2
cd 2Trout1and2
unpigz --to-stdout /project/gtl/data/raw/2Trout/1and2/rawreads/2Trout1and2_S1_L001_R1_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - 2Trout1and2_
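The line count of 16000000 is a multiple of 4, so each chunk holds 4,000,000 complete four-line FASTQ records and no read is cut across two files. A minimal sketch of that check on a toy file follows; the toy file names and the temporary directory are hypothetical, and the split flags assume GNU coreutils, as in the command above.

```shell
# Toy demonstration in a temporary directory; all file names here are hypothetical.
tmpdir=$(mktemp -d)

# Build a 3-record FASTQ (4 lines per record, 12 lines total).
for i in 1 2 3; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > "$tmpdir/toy.fastq"

# Split every 8 lines (2 records), using the same flags as the real command.
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq "$tmpdir/toy.fastq" "$tmpdir/toy_"

# Count chunks whose line count is not a multiple of 4 (should be none).
bad_chunks=0
for f in "$tmpdir"/toy_*.fastq; do
  n=$(wc -l < "$f")
  if [ $((n % 4)) -ne 0 ]; then
    bad_chunks=$((bad_chunks + 1))
  fi
done
echo "chunks with partial records: $bad_chunks"
```

The same loop can be pointed at the real 2Trout1and2_*.fastq chunks after splitting; only the last chunk is expected to be shorter than 16000000 lines.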
Demultiplexing
Remove underscores and extraneous spaces
cd /project/gtl/data/raw/2Trout/1and2/rawreads
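# Edit in place with -i: redirecting sed output onto its own input file empties the file. The g flag replaces every underscore on a line, not just the first.
sed -i 's/_/-/g' 2TroutRedo1and2_Demux.csv
# Keep only the first three comma-separated fields, dropping trailing spaces and extra columns.
sed -i -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' 2TroutRedo1and2_Demux.csv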
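The effect of the two cleanup steps can be tried on a toy file before touching the real demux sheet. This is only a sketch: the column contents below (plate, barcode, sample ID) are invented for illustration and may not match the real layout of 2TroutRedo1and2_Demux.csv. It writes to a new file so the input is left untouched.

```shell
# Toy input; the rows and file names are hypothetical examples.
tmpdir=$(mktemp -d)
printf 'plate_1,ACGTACGT,EGM_18_0001  \nplate_1,TTGCAATG,EGM_18_0002,extra\n' > "$tmpdir/toy_Demux.csv"

# Step 1: replace every underscore with a hyphen.
# Step 2: keep the first three comma-separated fields and drop anything after them.
sed 's/_/-/g' "$tmpdir/toy_Demux.csv" \
  | sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' \
  > "$tmpdir/toy_Demux.clean.csv"

cat "$tmpdir/toy_Demux.clean.csv"
# plate-1,ACGTACGT,EGM-18-0001
# plate-1,TTGCAATG,EGM-18-0002
```

Lines that do not start with three alphanumeric-or-hyphen fields (for example, a line with a leading space) do not match the second expression and pass through unchanged, so it is worth inspecting the cleaned file before parsing.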
cd /gscratch/grandol1/2Trout1and2
Parse split files
/project/gtl/data/raw/2Trout/1and2/demultiplex/run_parsebarcodes_onSplitInput.pl
Recombine by sample name and mid
/project/gtl/data/raw/2Trout/1and2/demultiplex/run_splitFastq_gbs.sh