Raw reads
We retrieved four files from the iSeq. One of these contains the 1x100bp reads. The files with the raw reads of interest (in /project/gtl/seq/distribution/trout1/Trout1_*/rawreads) are:
Trout1_P1_S1_L001_R1_001.fastq.gz (24G) – 457,726,974 reads (1.5 GBytes uncompressed)
Trout1_P2_S2_L002_R1_001.fastq.gz (25G) – 457,726,974 reads (1.5 GBytes uncompressed)
Trout1_P3_S1_L001_R1_001.fastq.gz (39G) – 457,726,974 reads (1.5 GBytes uncompressed)
Trout1_P4_S2_L002_R1_001.fastq.gz (40G) – 457,726,974 reads (1.5 GBytes uncompressed)
gunzip /project/gtl/seq/raw/trout1/Trout1_1/rawreads/Trout1_P1_S1_L001_R1_001.fastq.gz
gunzip /project/gtl/seq/raw/trout1/Trout1_2/rawreads/Trout1_P2_S2_L002_R1_001.fastq.gz
gunzip /project/gtl/seq/raw/trout1/Trout1_3/rawreads/Trout1_P3_S1_L001_R1_001.fastq.gz
gunzip /project/gtl/seq/raw/trout1/Trout1_4/rawreads/Trout1_P4_S2_L002_R1_001.fastq.gz
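The read counts listed above can be checked without unpacking a file in place, since gunzip replaces each .gz file with its uncompressed version. The following is a minimal sketch on a toy file; demo.fastq.gz and the count_reads helper are hypothetical names used only for illustration, and the sketch assumes standard four-line FASTQ records.

```shell
# Build a tiny gzipped FASTQ (two records) as a hypothetical stand-in
# for one of the Trout1 *.fastq.gz files.
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$tmpdir/demo.fastq.gz"

# Count reads by streaming the decompressed data: four lines per record.
# gzip -dc writes to stdout and leaves the .gz file untouched.
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

n_reads=$(count_reads "$tmpdir/demo.fastq.gz")
echo "$n_reads"
```

To check a real file, pass its path to count_reads in place of the demo file.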
Needs updating below 4-26-22
Demultiplexing
Split into 1000000 line files
mkdir /gscratch/grandol1/trout1
cd /gscratch/grandol1/trout1
cat /project/microbiome/data_queue/seq/trout1/Trout1_Pool3_S1_L001_R1_001.fastq | split -l 1000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool3_
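A FASTQ record spans four lines, so the value given to split -l should be a multiple of 4, otherwise a record would be cut across two chunk files. The sketch below runs the same GNU split options on a toy file; demo.fastq, the demo_ prefix, and the chunk size of 8 are assumptions chosen for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Three toy FASTQ records (12 lines); hypothetical stand-in for the real file.
for i in 1 2 3; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > demo.fastq

# Chunk size in lines; a multiple of 4 keeps every record whole.
lines_per_chunk=8
remainder=$((lines_per_chunk % 4))
echo "remainder after dividing by 4: $remainder"

# Same options as the command above (GNU coreutils split), on the toy file.
split -l "$lines_per_chunk" -d --suffix-length=3 --additional-suffix=.fastq demo.fastq demo_

# Count the chunks and collect the first character of each one.
n_chunks=$(ls demo_*.fastq | wc -l)
first_chars=$(for f in demo_*.fastq; do head -c 1 "$f"; done)
echo "$n_chunks chunks, first characters: $first_chars"
```

With numeric suffixes of length 3 the chunks are named demo_000.fastq, demo_001.fastq, and so on, which caps the output at 1000 files per prefix.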
Replace underscores with hyphens and remove extraneous spaces
sed 's/_/-/' Trout1Pool3_Demux.csv > Trout1Pool3_Demux1.csv
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool3_Demux1.csv > Trout1Pool3_Demux_fixed.csv
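To see what the two sed commands do, they can be run on a small made-up file. The column layout and values in demo.csv below are hypothetical (the real Demux CSV may differ); only the sed expressions are taken from the commands above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical demux rows: an underscore in field 1, a trailing space
# after field 3 on the first row, and extra trailing fields.
printf 'Pool_3,ACGTACGT,TroutA01 ,notes\nPool_3,TTGCAAGC,TroutB02,\n' > demo.csv

# Step 1: replace the first underscore on each line with a hyphen.
sed 's/_/-/' demo.csv > demo1.csv

# Step 2: keep the first three fields (letters, digits, hyphens only)
# and drop everything after the third field.
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' demo1.csv > demo_fixed.csv

cat demo_fixed.csv
```

Because the first expression has no g flag, only the first underscore on each line is replaced. If a line contained a second underscore inside one of the first three fields, the second expression would cut that field off at the underscore (or fail to match and leave the line unchanged), so s/_/-/g may be worth considering if names can contain several underscores.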
Parse split files
/project/gtl/analyses/gtl/HMAX1/demultiplex/run_parsebarcodes_onSplitInput.pl
Recombine by sample name and MID
./run_splitFastq_gbs.sh