Trout1 iSeq Test bioinformatics

Raw reads

We retrieved four files from the iSeq. One of these contains the 1x150 bp reads. The file with the raw reads of interest is in /project/microbiome/data_queue/seq/trout1/rawreads:

Trout-Pool3_S1_L001_R1_001.fastq.gz (273M) – 457,726,974 reads (1.5 GBytes uncompressed)

gunzip Trout-Pool3_S1_L001_R1_001.fastq.gz
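
A quick way to confirm the read count of an uncompressed FASTQ file is to divide its line count by four, since each record spans four lines. The sketch below is illustrative only: `count_reads` is a hypothetical helper, and the two-read demo file is made up so the example can run anywhere.

```shell
# Hypothetical helper: number of reads in an uncompressed FASTQ file
# (assumes the standard 4-lines-per-record layout, no wrapped sequences).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Demonstration on a made-up two-read file in a scratch directory.
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$tmpdir/demo.fastq"
n_reads=$(count_reads "$tmpdir/demo.fastq")
echo "$n_reads"   # prints 2
```

Pointing the same function at the uncompressed raw reads file would give the count to compare against the figure listed above.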

Demultiplexing

Split into 1000000-line files

mkdir /gscratch/grandol1/trout1
cd /gscratch/grandol1/trout1
cat /project/microbiome/data_queue/seq/trout1/Trout1_Pool3_S1_L001_R1_001.fastq | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool3_
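
Because FASTQ records span four lines, the chunk size passed to `split -l` needs to be a multiple of 4 (1000000 in the command above is), otherwise a record would be cut across two files. The following sketch checks this on a made-up three-read file, using the same GNU `split` flags with a smaller chunk size. The file names and the `bad_chunks` counter are assumptions for illustration.

```shell
# Build a made-up 3-read FASTQ (12 lines) in a scratch directory.
tmpdir=$(mktemp -d)
for i in 1 2 3; do
    printf '@r%s\nACGT\n+\nIIII\n' "$i"
done > "$tmpdir/demo.fastq"

# Same flags as the real command, but 8 lines (2 reads) per chunk.
split -l 8 -d --suffix-length=3 --additional-suffix=.fastq \
    "$tmpdir/demo.fastq" "$tmpdir/demo_"

# Count chunks, and count any chunk whose line count is not a multiple of 4.
n_chunks=0
bad_chunks=0
for f in "$tmpdir"/demo_*.fastq; do
    n_chunks=$((n_chunks + 1))
    lines=$(wc -l < "$f")
    if [ $((lines % 4)) -ne 0 ]; then
        bad_chunks=$((bad_chunks + 1))
    fi
done
echo "$n_chunks chunks, $bad_chunks with partial records"
```

Running the same loop over the Trout1_Pool3_*.fastq chunks would flag any file that ends mid-record.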

Replace underscores with hyphens and strip extraneous spaces

sed 's/_/-/' Trout1Pool3_Demux.csv > Trout1Pool3_Demux1.csv

sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool3_Demux1.csv > Trout1Pool3_Demux_fixed.csv
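
The effect of the two sed commands can be seen on a single line. The example below uses a made-up demux line (the field contents are assumptions, not taken from Trout1Pool3_Demux.csv). Without the `g` flag, `s/_/-/` replaces only the first underscore on each line, and the second command then keeps each field only up to the first character that is not alphanumeric or a hyphen. A variant with `g` is shown for comparison.

```shell
# Made-up demux line: three fields, a name with several underscores,
# trailing spaces and an extra column.
line='AAGCT,mid_12,sample_A_1  ,notes'

# As written above: only the first underscore on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/_/-/' \
    | sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/')

# Variant with the g flag: every underscore is replaced before trimming.
all_replaced=$(printf '%s\n' "$line" | sed 's/_/-/g' \
    | sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/')

echo "$first_only"     # prints AAGCT,mid-12,sample
echo "$all_replaced"   # prints AAGCT,mid-12,sample-A-1
```

If any line of the real file contains more than one underscore, the `g` variant would preserve the full sample name. Whether that applies here depends on the contents of the demux file.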

Parse split files

/project/microbiome/analyses/gtl/HMAX1/demultiplex/run_parsebarcodes_onSplitInput.pl

Recombine by sample name and mid

./run_splitFastq_gbs.sh