5TroutExp bioinformatics

I split the raw FASTQ file into chunks of 16,000,000 lines (4,000,000 reads, at four lines per read) for faster processing:

mkdir -p /gscratch/grandol1/5Trout1Exp/rawreads
cd /gscratch/grandol1/5Trout1Exp/rawreads
unpigz --to-stdout /project/gtl/data/distribution/Wagner/Rosenthall/5Trout/1/Exp/Exp5Trout1_S1_R1_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - 5Trout1Exp_

This produced 236 files.
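Because 16,000,000 is divisible by 4, every chunk should end on a complete FASTQ record, provided the input uses standard four-line records. The sketch below is one way to confirm that after splitting. It is not part of the original pipeline: check_chunks is a hypothetical helper name, and the demonstration runs on a tiny synthetic file in a temporary directory rather than on the real data. It assumes bash and GNU split.

```shell
#!/usr/bin/env bash
# Sketch: confirm that split chunks end on FASTQ record boundaries.
# Assumes standard 4-line FASTQ records; check_chunks is a hypothetical helper.
check_chunks() {
    local prefix=$1 bad=0 n f
    for f in "${prefix}"*.fastq; do
        [ -e "$f" ] || continue
        n=$(wc -l < "$f")
        if [ $((n % 4)) -ne 0 ]; then
            echo "incomplete record in $f ($n lines)" >&2
            bad=1
        fi
    done
    return "$bad"
}

# Toy demonstration: 3 synthetic reads, split into chunks of 2 reads (8 lines).
demo_dir=$(mktemp -d)
printf '@r%s\nACGT\n+\nIIII\n' 1 2 3 > "$demo_dir/toy.fastq"
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq "$demo_dir/toy.fastq" "$demo_dir/toy_"

n_chunks=$(ls "$demo_dir"/toy_*.fastq | wc -l)
if check_chunks "$demo_dir/toy_"; then chunks_ok=yes; else chunks_ok=no; fi
echo "$n_chunks chunks, record boundaries intact: $chunks_ok"
```

On the real data the equivalent call would be check_chunks 5Trout1Exp_ from inside the rawreads directory, and the chunk count can be compared against the 236 files noted above.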

I split the 5Trout3 Std file the same way:

mkdir -p /gscratch/grandol1/5Trout3/rawreads
cd /gscratch/grandol1/5Trout3/rawreads
unpigz --to-stdout /project/gtl/data/distribution/Wagner/Rosenthall/5Trout/3/Std/5Trout3_S1_R1_001.fastq.gz | split -l 16000000 -d --suffix-length=3 --additional-suffix=.fastq - 5Trout1Std_

This also produced 236 files.

Demultiplexing

In /project/gtl/data/RHIR1/rawreads/, I removed extraneous spaces from the file that maps MIDs to individual identifiers (RHIR1_Demux.csv) and saved the result as RHIR1Demux_fixed.csv:

sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' RHIR1_Demux.csv > RHIR1Demux_fixed.csv
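The sed expression keeps the first three comma-separated fields and discards everything after the third, which is how trailing spaces get removed. The sketch below runs the same expression on two made-up rows to show the effect. The barcode, MID, and sample values are invented for illustration, and the real column layout of RHIR1_Demux.csv may differ.

```shell
#!/usr/bin/env bash
# Toy input (invented values): one row with trailing spaces,
# one row with an extra column after the third field.
fixed=$(printf 'ACGTAC,MID-01,fish-001   \nTTGACA,MID-02,fish-002,extra\n' |
    sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/')
printf '%s\n' "$fixed"

# Count rows that still contain whitespace or do not have exactly 3 fields.
n_bad=$(printf '%s\n' "$fixed" | awk -F, 'NF != 3 || /[[:space:]]/' | wc -l)
echo "rows still malformed: $n_bad"
```

Two caveats follow from the pattern itself. Each field may contain only letters, digits, and hyphens, so a third field such as fish_001 would be cut at the underscore. A row with a space inside one of the first three fields does not match at all and passes through unchanged. Running the awk check against RHIR1Demux_fixed.csv is one way to catch rows of the second kind.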

cd /gscratch/grandol1/RHIR1/rawreads/

Parse split files

/project/gtl/data/raw/5Trout1Exp/demultiplex/run_parsebarcodes_onSplitInput.pl

Recombine by sample name and mid

/project/gtl/data/raw/RHIR1/demultiplex/run_splitFastq_gbs.sh
