...
I modified the script from the 16S/ITS work that splits fastq files into separate files based on the information in each read's info line. It is /project/microbiome/analyses/gtl/HMAX1/demultiplex/splitFastq_manyInputfiles_gbs.pl and is run with run_splitFastq_gbs.sh, in the same directory. I started it running with 12 hours of wall time (tomorrow is a system maintenance day, so I am seeing whether I can finish before that starts). Output is in /project/microbiome/analyses/gtl/HMAX1/demultiplex/sample_fastq
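For reference, the general idea of splitting on the info line can be sketched roughly as below. This is a Python illustration only, not the Perl script itself. It assumes (hypothetically) that the sample ID is the first whitespace-delimited token of the info line after the leading '@'; the function name split_fastq_by_sample and the <sample_id>.fastq output naming are made up for the example.

```python
import os
from collections import defaultdict


def split_fastq_by_sample(fastq_paths, out_dir):
    """Write each 4-line FASTQ record to <out_dir>/<sample_id>.fastq.

    ASSUMPTION (hypothetical format): the sample ID is the first
    whitespace-delimited token of the info line, after the leading '@'.
    Returns a dict mapping sample ID to the number of reads written.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = defaultdict(int)
    handles = {}  # one open output handle per sample ID
    try:
        for path in fastq_paths:
            with open(path) as fh:
                while True:
                    # A FASTQ record is info line, sequence, '+', qualities.
                    record = [fh.readline() for _ in range(4)]
                    if not record[0].strip():
                        break  # end of file (or trailing blank line)
                    sample_id = record[0][1:].split()[0]
                    if sample_id not in handles:
                        out_path = os.path.join(out_dir, sample_id + ".fastq")
                        handles[sample_id] = open(out_path, "w")
                    handles[sample_id].writelines(record)
                    counts[sample_id] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)
```

Keeping one handle open per sample is simple but can run into the open-file limit when there are many samples; reopening files in append mode is a slower alternative.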
To do:
Still want to go back and summarize the parse report files in /gscratch
It looks like a few individuals were duplicated, with different MIDs. Was this planned? I didn't account for this in the parsing script (the info line only has the individual sample ID, not the MID). I could add the MID back in, but then the replicates would need to be merged. As it is now, all reads for an individual go into one file.
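If the MID were added back in and replicates were written to separate files, merging them afterwards could look roughly like the sketch below. This is a Python illustration under assumptions, not something the current script does: it assumes a hypothetical naming scheme <sample_id>__<MID>.fastq for the per-replicate files, and the function name merge_replicates is made up.

```python
import os
import shutil
from collections import defaultdict


def merge_replicates(split_dir, merged_dir):
    """Concatenate per-replicate FASTQ files into one file per sample.

    ASSUMPTION (hypothetical naming): inputs are <sample_id>__<MID>.fastq.
    Files without the '__' separator are treated as a single replicate.
    merged_dir should differ from split_dir so inputs are not overwritten.
    Returns a dict mapping sample ID to the number of files merged.
    """
    os.makedirs(merged_dir, exist_ok=True)
    groups = defaultdict(list)
    for name in sorted(os.listdir(split_dir)):
        if not name.endswith(".fastq"):
            continue
        stem = name[: -len(".fastq")]
        # Split on the last '__' so the MID is the trailing part.
        sample_id, _, _mid = stem.rpartition("__")
        if not sample_id:
            sample_id = stem  # no separator: whole stem is the sample ID
        groups[sample_id].append(os.path.join(split_dir, name))
    for sample_id, paths in groups.items():
        merged_path = os.path.join(merged_dir, sample_id + ".fastq")
        with open(merged_path, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
    return {sample_id: len(paths) for sample_id, paths in groups.items()}
```

Keeping the replicates separate until this step would make it possible to compare them before deciding whether to merge.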