Raw reads
We retrieved 12 files from the NovaSeq runs. Four of these contain the 1x100bp reads. I am not sure why we received 2 index reads for each run. The files with the raw reads of interest are in:
/project/gtl/seq/distribution/trout1/Trout1_*/rawreads
Swap out “distribution” for “raw” to get the starting directory used in the commands below.
Trout1_P1_S1_L001_R1_001.fastq.gz (24G)
Trout1_P2_S2_L002_R1_001.fastq.gz (25G)
Trout1_P3_S1_L001_R1_001.fastq.gz (39G) – 457,726,974 reads
Trout1_P4_S2_L002_R1_001.fastq.gz (40G)
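A read count for a compressed FASTQ file can be checked without unzipping it on disk. The following is a minimal sketch, assuming standard 4-line FASTQ records; the file name demo_R1.fastq.gz and its contents are made up for illustration and are not part of the Trout1 data.

```shell
# Build a tiny gzipped FASTQ (3 reads) so the example is self-contained.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' | gzip > demo_R1.fastq.gz

# Each FASTQ record is 4 lines, so reads = lines / 4.
lines=$(gunzip -c demo_R1.fastq.gz | wc -l)
reads=$((lines / 4))
echo "$reads"
```

On the real files the same pipeline would take a while, since each file is tens of gigabytes compressed.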
Unzip and split
cd /gscratch/grandol1/
mkdir Trout1
cd Trout1
gunzip /project/gtl/seq/raw/trout1/Trout1_1/rawreads/Trout1_P1_S1_L001_R1_001.fastq.gz
gunzip /project/gtl/seq/raw/trout1/Trout1_2/rawreads/Trout1_P2_S2_L002_R1_001.fastq.gz
gunzip /project/gtl/seq/raw/trout1/Trout1_3/rawreads/Trout1_P3_S1_L001_R1_001.fastq.gz
gunzip /project/gtl/seq/raw/trout1/Trout1_4/rawreads/Trout1_P4_S2_L002_R1_001.fastq.gz
Split into 10,000,000-line files (2,500,000 reads each)
cat /project/gtl/seq/raw/trout1/Trout1_1/rawreads/Trout1_P1_S1_L001_R1_001.fastq | split -l 10000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool1_
cat /project/gtl/seq/raw/trout1/Trout1_2/rawreads/Trout1_P2_S2_L002_R1_001.fastq | split -l 10000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool2_
cat /project/gtl/seq/raw/trout1/Trout1_3/rawreads/Trout1_P3_S1_L001_R1_001.fastq | split -l 10000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool3_
cat /project/gtl/seq/raw/trout1/Trout1_4/rawreads/Trout1_P4_S2_L002_R1_001.fastq | split -l 10000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool4_
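To see how the split step names and sizes its output, here is a small sketch on made-up data. It streams the decompressed reads straight into split rather than unzipping first, which is one possible variation and not what the commands above do. The names demo_pool.fastq.gz and Demo_Pool1_ are hypothetical, and the long options require GNU split.

```shell
# Tiny gzipped FASTQ with 3 reads (12 lines); hypothetical file name.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' | gzip > demo_pool.fastq.gz

# Stream-decompress and split into chunks of 8 lines (2 reads each).
# The chunk size must be a multiple of 4 so no FASTQ record is cut in half.
gunzip -c demo_pool.fastq.gz | split -l 8 -d --suffix-length=3 --additional-suffix=.fastq - Demo_Pool1_

# List the numbered chunks that were written.
ls Demo_Pool1_*.fastq
```

The chunks are numbered from 000, so the first file holds reads r1 and r2 and the second holds r3. The 10000000-line chunk size used for the real data is also a multiple of 4.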
Demultiplexing
Remove underscores and extraneous spaces
cd /project/gtl/seq/raw/trout1/Trout1_1/rawreads
sed 's/_/-/' Trout1Pool1_Demux.csv > Trout1Pool1_Demux1.csv
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool1_Demux1.csv > Trout1Pool1_Demux_fixed.csv
Remove underscores and extraneous spaces
cd /project/gtl/seq/raw/trout1/Trout1_2/rawreads
sed 's/_/-/' Trout1Pool2_Demux.csv > Trout1Pool2_Demux1.csv
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool2_Demux1.csv > Trout1Pool2_Demux_fixed.csv
Remove underscores and extraneous spaces
cd /project/gtl/seq/raw/trout1/Trout1_3/rawreads
sed 's/_/-/' Trout1Pool3_Demux.csv > Trout1Pool3_Demux1.csv
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool3_Demux1.csv > Trout1Pool3_Demux_fixed.csv
Remove underscores and extraneous spaces
cd /project/gtl/seq/raw/trout1/Trout1_4/rawreads
sed 's/_/-/' Trout1Pool4_Demux.csv > Trout1Pool4_Demux1.csv
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool4_Demux1.csv > Trout1Pool4_Demux_fixed.csv
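The effect of the two sed commands can be checked on a single line before running them on the real demux tables. This is a sketch only: the line below is invented, and the actual column layout of the Trout1Pool*_Demux.csv files is an assumption.

```shell
# Hypothetical demux line: one underscore in the sample name,
# followed by a trailing space and an empty fourth column.
printf 'MID01,ACGTACGT,Trout_A12 ,\n' > demo_Demux.csv

# Step 1: replace the first underscore on each line with a hyphen.
sed 's/_/-/' demo_Demux.csv > demo_Demux1.csv

# Step 2: keep the first three comma-separated fields and drop everything after them.
sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' demo_Demux1.csv > demo_Demux_fixed.csv

cat demo_Demux_fixed.csv
```

Note that 's/_/-/' without the g flag changes only the first underscore on each line. If a line held more than one underscore, the third field would be cut at the next underscore by step 2, so 's/_/-/g' may be the safer form in that case.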
cd /gscratch/grandol1/Trout1
Parse split files
/project/gtl/seq/raw/trout1/Trout1_1/demultiplex/run_parsebarcodes_onSplitInput.pl
/project/gtl/seq/raw/trout1/Trout1_2/demultiplex/run_parsebarcodes_onSplitInput.pl
/project/gtl/seq/raw/trout1/Trout1_3/demultiplex/run_parsebarcodes_onSplitInput.pl
/project/gtl/seq/raw/trout1/Trout1_4/demultiplex/run_parsebarcodes_onSplitInput.pl
Recombine by sample name and MID
mkdir Pool1 Pool2 Pool3 Pool4
mv *_Pool1* ./Pool1
mv *_Pool2* ./Pool2
mv *_Pool3* ./Pool3
mv *_Pool4* ./Pool4
/project/gtl/seq/raw/trout1/Trout1_1/demultiplex/run_splitFastq_gbs.sh
/project/gtl/seq/raw/trout1/Trout1_2/demultiplex/run_splitFastq_gbs.sh
/project/gtl/seq/raw/trout1/Trout1_3/demultiplex/run_splitFastq_gbs.sh
/project/gtl/seq/raw/trout1/Trout1_4/demultiplex/run_splitFastq_gbs.sh