Raw reads

We retrieved 12 files from the NovaSeq runs. Four of these contain the 1x100bp reads; I am not sure why we received 2 index reads for each run. The files with the raw reads of interest will be in /project/gtl/seq/distribution/trout1/Trout1_*/rawreads.

Replace “distribution” with “raw” in this path to get the starting directory: /project/gtl/seq/raw/trout1/Trout1_*/rawreads.
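The path swap is a single text substitution. A minimal sketch, assuming "/distribution/" occurs once in the path; the helper name `to_raw_path` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a distribution path into the matching raw-data path
# by replacing the first "/distribution/" component with "/raw/".
to_raw_path() {
    printf '%s\n' "$1" | sed 's#/distribution/#/raw/#'
}

# Example: prints /project/gtl/seq/raw/trout1/Trout1_1/rawreads
to_raw_path /project/gtl/seq/distribution/trout1/Trout1_1/rawreads
```

Using `#` as the sed delimiter avoids having to escape the slashes in the path.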

  1. Trout1_P1_S1_L001_R1_001.fastq.gz (24G) – 457,726,974 reads (1.5 GBytes uncompressed)

  2. Trout1_P2_S2_L002_R1_001.fastq.gz (25G) – 457,726,974 reads (1.5 GBytes uncompressed)

  3. Trout1_P3_S1_L001_R1_001.fastq.gz (39G) – 457,726,974 reads (1.5 GBytes uncompressed)

  4. Trout1_P4_S2_L002_R1_001.fastq.gz (40G) – 457,726,974 reads (1.5 GBytes uncompressed)
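The read counts listed above can be checked directly against the compressed files. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); `count_fastq_reads` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a FASTQ file, gzipped or not.
# Assumes every record is exactly 4 lines (header, sequence, '+', qualities).
count_fastq_reads() {
    local f="$1" lines
    if [[ "$f" == *.gz ]]; then
        # Stream-decompress so the .gz does not have to be unpacked on disk
        lines=$(gzip -cd "$f" | wc -l)
    else
        lines=$(wc -l < "$f")
    fi
    echo $(( lines / 4 ))
}
```

For example, `count_fastq_reads /project/gtl/seq/raw/trout1/Trout1_1/rawreads/Trout1_P1_S1_L001_R1_001.fastq.gz`. On files this size the decompression dominates the run time; `pigz -dc` can be substituted for `gzip -cd` if pigz is installed.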

Unzip and split

cd /gscratch/grandol1/

mkdir Trout1

cd Trout1

gunzip /project/gtl/seq/raw/trout1/Trout1_1/rawreads/Trout1_P1_S1_L001_R1_001.fastq.gz

...

gunzip /project/gtl/seq/raw/trout1/Trout1_4/rawreads/Trout1_P4_S2_L002_R1_001.fastq.gz
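The per-run gunzip commands can also be written as one loop. A minimal sketch, assuming every run sits under `Trout1_*/rawreads/` and that only the `*_R1_001.fastq.gz` files (not the index reads) should be decompressed; `decompress_r1_reads` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: gunzip every R1 FASTQ under <base>/Trout1_*/rawreads/.
# Like the commands above, gunzip replaces each .fastq.gz with its .fastq in place.
decompress_r1_reads() {
    local base="$1" f
    local -a files
    shopt -s nullglob   # an unmatched pattern yields an empty list, not the literal pattern
    files=("$base"/Trout1_*/rawreads/*_R1_001.fastq.gz)
    shopt -u nullglob
    for f in "${files[@]}"; do
        echo "Decompressing $f" >&2
        gunzip "$f" || return 1
    done
}
```

Usage: `decompress_r1_reads /project/gtl/seq/raw/trout1`. If the compressed originals should be kept, GNU gzip's `-k` (`--keep`) option does that, at the cost of the extra disk space.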

Needs updating below 4-26-22

Demultiplexing

Split into 10,000,000-line files (2,500,000 reads per file)

...

cat /project/gtl/seq/raw/trout1/Trout1_1/rawreads/Trout1_P1_S1_L001_R1_001.fastq | split -l 10000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool1_

cat /project/gtl/seq/raw/trout1/Trout1_2/rawreads/Trout1_P2_S2_L002_R1_001.fastq | split -l 10000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool2_

cat /project/gtl/seq/raw/trout1/Trout1_3/rawreads/Trout1_P3_S1_L001_R1_001.fastq | split -l 10000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool3_

cat /project/gtl/seq/raw/trout1/Trout1_4/rawreads/Trout1_P4_S2_L002_R1_001.fastq | split -l 10000000 -d --suffix-length=3 --additional-suffix=.fastq - Trout1_Pool4_
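The split step above can be wrapped so that the chunk size is checked before anything is written. A minimal sketch, assuming GNU `split` (for `--additional-suffix`); `split_fastq` is a hypothetical name, and it passes the file to `split` directly instead of piping it through `cat`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a FASTQ into numbered chunks <prefix>000.fastq, <prefix>001.fastq, ...
# Default chunk size matches the commands above (10,000,000 lines = 2,500,000 reads).
split_fastq() {
    local in="$1" prefix="$2" lines="${3:-10000000}"
    # The chunk size must be a multiple of 4 so no FASTQ record is cut across two files
    if (( lines % 4 != 0 )); then
        echo "error: line count $lines is not a multiple of 4" >&2
        return 1
    fi
    split -l "$lines" -d --suffix-length=3 --additional-suffix=.fastq "$in" "$prefix"
}
```

Usage: `split_fastq /project/gtl/seq/raw/trout1/Trout1_1/rawreads/Trout1_P1_S1_L001_R1_001.fastq Trout1_Pool1_`. At the listed 457,726,974 reads (1,830,907,896 lines) this gives 184 chunks (000 to 183), well within the 1,000 names that `--suffix-length=3` allows.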

Demultiplexing

Replace underscores with hyphens and remove extraneous spaces (Pool 1)

cd /project/gtl/seq/raw/trout1/Trout1_1/rawreads

sed 's/_/-/g' Trout1Pool1_Demux.csv > Trout1Pool1_Demux1.csv

sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool1_Demux1.csv > Trout1Pool1_Demux_fixed.csv

Replace underscores with hyphens and remove extraneous spaces (Pool 2)

cd /project/gtl/seq/raw/trout1/Trout1_2/rawreads

sed 's/_/-/g' Trout1Pool2_Demux.csv > Trout1Pool2_Demux1.csv

sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool2_Demux1.csv > Trout1Pool2_Demux_fixed.csv

Replace underscores with hyphens and remove extraneous spaces (Pool 3)

cd /project/gtl/seq/raw/trout1/Trout1_3/rawreads

sed 's/_/-/g' Trout1Pool3_Demux.csv > Trout1Pool3_Demux1.csv

sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool3_Demux1.csv > Trout1Pool3_Demux_fixed.csv

Replace underscores with hyphens and remove extraneous spaces (Pool 4)

cd /project/gtl/seq/raw/trout1/Trout1_4/rawreads

sed 's/_/-/g' Trout1Pool4_Demux.csv > Trout1Pool4_Demux1.csv

sed -E 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' Trout1Pool4_Demux1.csv > Trout1Pool4_Demux_fixed.csv
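The two sed passes per pool can be combined into a single call and looped over all four pools. A minimal sketch, assuming the `Trout1_N/rawreads/Trout1PoolN_Demux.csv` layout used above; `clean_demux_csv` and `clean_all_pools` are hypothetical names, and the intermediate `_Demux1.csv` file is skipped. Lines whose first three fields contain anything other than letters, digits, or hyphens (for example an embedded space) do not match the second expression and pass through unchanged, exactly as with the original commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: both sed passes in one call.
# 1) replace every underscore with a hyphen;
# 2) keep only the first three comma-separated fields, dropping trailing
#    spaces, extra columns, or carriage returns after the third field.
clean_demux_csv() {
    local in="$1" out="$2"
    sed -E \
        -e 's/_/-/g' \
        -e 's/^([[:alnum:]-]+),([[:alnum:]-]+),([[:alnum:]-]+).*/\1,\2,\3/' \
        "$in" > "$out"
}

# Hypothetical helper: apply clean_demux_csv to pools 1-4 under <base>.
clean_all_pools() {
    local base="$1" n d
    for n in 1 2 3 4; do
        d="$base/Trout1_$n/rawreads"
        clean_demux_csv "$d/Trout1Pool${n}_Demux.csv" "$d/Trout1Pool${n}_Demux_fixed.csv" || return 1
    done
}
```

Usage: `clean_all_pools /project/gtl/seq/raw/trout1`.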

cd /gscratch/grandol1/Trout1

Parse split files

/project/gtl/seq/raw/trout1/Trout1_1/demultiplex/run_parsebarcodes_onSplitInput.pl

/project/gtl/seq/raw/trout1/Trout1_2/demultiplex/run_parsebarcodes_onSplitInput.pl

/project/gtl/seq/raw/trout1/Trout1_3/demultiplex/run_parsebarcodes_onSplitInput.pl

/project/gtl/seq/raw/trout1/Trout1_4/demultiplex/run_parsebarcodes_onSplitInput.pl
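Both the parsing scripts here and the recombination scripts in the next step follow the same per-pool pattern, so they can be driven by one loop that stops at the first failure. A minimal sketch, assuming the scripts are executable and live in `Trout1_N/demultiplex/`; `run_pool_scripts` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the same per-pool script for pools 1-4 in order,
# stopping at the first one that exits non-zero.
run_pool_scripts() {
    local base="$1" script="$2" n path
    for n in 1 2 3 4; do
        path="$base/Trout1_$n/demultiplex/$script"
        echo "Pool $n: $path" >&2
        "$path" || { echo "Pool $n failed: $path" >&2; return 1; }
    done
}
```

Usage, from /gscratch/grandol1/Trout1: `run_pool_scripts /project/gtl/seq/raw/trout1 run_parsebarcodes_onSplitInput.pl`, and later `run_pool_scripts /project/gtl/seq/raw/trout1 run_splitFastq_gbs.sh`. Running the pools one at a time keeps the order predictable; whether the scripts can safely run in parallel depends on what they write, which is not shown here.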

Recombine by sample name and MID

mkdir Pool1 Pool2 Pool3 Pool4

mv *_Pool1* ./Pool1

mv *_Pool2* ./Pool2

mv *_Pool3* ./Pool3

mv *_Pool4* ./Pool4
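The mkdir/mv steps can be expressed as a loop that also copes with a pool that has no matching files. A minimal sketch, meant to be run from /gscratch/grandol1/Trout1 and using the same `*_PoolN*` globs as above; `sort_into_pools` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move every file in the current directory whose name
# contains "_PoolN" into ./PoolN, for pools 1-4.
sort_into_pools() {
    local n
    local -a files
    for n in 1 2 3 4; do
        mkdir -p "Pool$n"
        shopt -s nullglob   # no match -> empty array instead of the literal pattern
        files=(*_Pool"$n"*)
        shopt -u nullglob
        if (( ${#files[@]} )); then
            mv -- "${files[@]}" "Pool$n"/
        fi
    done
}
```

The glob `*_Pool1*` would also match names like `_Pool10` or `_Pool14`, so a stray or mistyped prefix ends up in the wrong directory; with four correctly named pools this does not arise.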

/project/gtl/seq/raw/trout1/Trout1_1/demultiplex/run_splitFastq_gbs.sh

/project/gtl/seq/raw/trout1/Trout1_2/demultiplex/run_splitFastq_gbs.sh

/project/gtl/seq/raw/trout1/Trout1_3/demultiplex/run_splitFastq_gbs.sh

/project/gtl/seq/raw/trout1/Trout1_4/demultiplex/run_splitFastq_gbs.sh