2TroutRedo Bioinformatics

Raw reads

We retrieved 12 files from the NovaSeq runs. Four of these contain the 1x100bp reads. I am not sure why we received 2 index reads for each run. The files with the raw reads of interest will be in: /project/gtl/data/raw/2Trout/1and2/rawreads.
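One quick way to tell read files from index files is to look at the length of the first sequence in each file. The following is a minimal sketch on toy files, not a record of what was run: the file names and contents are invented, and it uses gzip rather than unpigz so that it runs anywhere. On the real data the loop would presumably be pointed at the rawreads directory.

```shell
#!/usr/bin/env bash
set -eu

tmp=$(mktemp -d)

# Toy stand-ins (invented): one 100 bp read file and one 8 bp index file.
seq100=$(printf 'A%.0s' $(seq 1 100))
printf '@r1\n%s\n+\n%s\n' "$seq100" "$seq100" | gzip > "$tmp/toy_R1.fastq.gz"
printf '@r1\nACGTACGT\n+\nIIIIIIII\n' | gzip > "$tmp/toy_I1.fastq.gz"

# Print the length of the first sequence line (line 2) of each file.
for f in "$tmp"/*.fastq.gz; do
  seq_line=$(gzip -dc "$f" | sed -n '2{p;q;}')
  echo "$(basename "$f") ${#seq_line}"
done
```

Files reporting a length of 100 would be the 1x100bp reads; much shorter lengths would point to index reads.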

For the starting directory, swap out “distribution” for “raw” in the path.

Unzip and split

cd /gscratch/grandol1/

mkdir 2Trout1and2

cd 2Trout1and2

unpigz --to-stdout /project/gtl/data/raw/2Trout/1and2/rawreads/2Trout1and2_S1_L001_R1_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - 2Trout1and2_
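The chunk size of 16000000 lines is a multiple of 4, so each output file holds whole FASTQ records (4 lines per read, 4,000,000 reads per file). A minimal sketch of the same split on a toy input, assuming GNU split; the file names and the small chunk size are illustrative only:

```shell
#!/usr/bin/env bash
set -eu

# Toy demonstration: names and sizes are invented for illustration.
workdir=$(mktemp -d)
cd "$workdir"

# Build a 10-read FASTQ (4 lines per read = 40 lines).
for i in $(seq 1 10); do
  printf '@read%d\nACGT\n+\nIIII\n' "$i"
done > toy.fastq

# Split into 16-line chunks (4 reads each), using the same flags as above.
split -l 16 -d --suffix-length=3 --additional-suffix=.fastq toy.fastq toy_

# Each chunk should hold a multiple of 4 lines, so no read is cut in half.
for f in toy_*.fastq; do
  echo "$f $(wc -l < "$f")"
done
```

The toy input yields toy_000.fastq, toy_001.fastq and toy_002.fastq, matching the numeric three-digit suffix pattern that the real split produces for 2Trout1and2_.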

Demultiplexing

Remove underscores and extraneous spaces

cd /project/gtl/data/raw/2Trout/1and2/rawreads

sed -i 's/_/-/g' 2TroutRedo1and2_Demux.csv

sed -i -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' 2TroutRedo1and2_Demux.csv
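To see what the two sed passes do, here is a minimal sketch on a made-up barcode file. The rows below are hypothetical and not taken from the real Demux file. The sketch assumes GNU sed, using -i to edit in place (redirecting a command's output onto its own input file would empty the file before it is read) and the g flag so that every underscore on a line is replaced.

```shell
#!/usr/bin/env bash
set -eu

demux=$(mktemp)

# Hypothetical rows: barcodes and sample names are invented for illustration.
printf '%s\n' \
  'ACGTACGT,mid_01,TR_001_A , ' \
  'TTGCAAGC,mid_02,TR_002_B,,extra' > "$demux"

# Pass 1: turn every underscore into a hyphen.
sed -i 's/_/-/g' "$demux"

# Pass 2: keep only the first three comma-separated fields and drop
# anything after them (trailing spaces, empty fields, extra columns).
sed -i -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' "$demux"

cat "$demux"
```

Note that the second pattern only matches lines whose first three fields consist of letters, digits and hyphens, so any line still containing an underscore in those fields would pass through unchanged.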

cd /gscratch/grandol1/2Trout1and2

Parse split files

/project/gtl/data/raw/2Trout/1and2/rawreads/demultiplex/run_parsebarcodes_onSplitInput.pl

Recombine by sample name and mid

/project/gtl/data/raw/2Trout/1and2/rawreads/demultiplex/run_splitFastq_gbs.sh