...
I modified the script from the 16S/ITS work that splits fastq files into separate files based on information in each read's info line. It is /project/microbiome/analyses/gtl/HMAX1/demultiplex/splitFastq_manyInputfiles_gbs.pl and is run with run_splitFastq_gbs.sh, in the same directory. Output was initially written to /project/microbiome/analyses/gtl/HMAX1/demultiplex/sample_fastq.
It looks like eight . All of this is now in /project/microbiome/data/seq/HMAX1/demultiplex, so that it is reachable through Globus at /project/microbiome/data/seq/HMAX1/.
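For reference, the splitting step amounts to reading four-line FASTQ records and routing each one to a per-sample file keyed on a field of its info line. This Python sketch is not the Perl script itself. It assumes, hypothetically, that the sample ID is the second whitespace-delimited field of the header (`@<read_name> <sample_id>`) and that outputs are named `<sample_id>.fq`; both need checking against the real info line and file naming.

```python
import os
from typing import Dict, Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, separator, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, sep, qual


def sample_id(header: str) -> str:
    # HYPOTHETICAL layout: "@<read_name> <sample_id> ..."; adjust to the real info line.
    return header.split()[1]


def split_fastq(handle: TextIO, out_dir: str) -> Dict[str, int]:
    """Append each record to <out_dir>/<sample_id>.fq; return reads written per sample."""
    os.makedirs(out_dir, exist_ok=True)
    outputs: Dict[str, TextIO] = {}
    counts: Dict[str, int] = {}
    try:
        for header, seq, sep, qual in read_fastq(handle):
            sid = sample_id(header)
            if sid not in outputs:
                # Append mode, so repeated calls (one per input file) accumulate reads.
                outputs[sid] = open(os.path.join(out_dir, f"{sid}.fq"), "a")
            outputs[sid].write(f"{header}\n{seq}\n{sep}\n{qual}\n")
            counts[sid] = counts.get(sid, 0) + 1
    finally:
        for out in outputs.values():
            out.close()
    return counts
```

Because outputs are opened in append mode, calling `split_fastq` once per input file collects reads from many inputs into the same per-sample files. With thousands of samples, holding one open handle per sample can hit the operating system's open-file limit.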
Eight individuals were duplicated, with different MIDs. Was this planned? I didn’t account for this in the parsing script: the info line carries only the individual sample ID, not the MID. I could add the MID back in, but then the replicates would need to be merged afterwards. As it stands, all reads for an individual go into one file. There are also four tubes labeled ‘BLANK’ whose reads will all have been merged (all of them went into BLANK.GGATCCTT.fq).
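If the MID is added back into the info line, output could be keyed on (individual, MID) so replicates stay separate and are merged only deliberately. A minimal sketch of the two bookkeeping steps, assuming a hypothetical list of (individual ID, MID) pairs taken from the barcode key: find individuals that appear with more than one MID, and collapse per-replicate read counts into per-individual totals.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def find_replicates(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each individual ID to its MIDs, keeping only IDs seen with more than one MID."""
    mids: Dict[str, Set[str]] = defaultdict(set)
    for individual, mid in pairs:
        mids[individual].add(mid)
    return {ind: sorted(m) for ind, m in mids.items() if len(m) > 1}


def merge_counts(counts: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Collapse per-(individual, MID) read counts to per-individual totals."""
    merged: Dict[str, int] = defaultdict(int)
    for (individual, _mid), n in counts.items():
        merged[individual] += n
    return dict(merged)
```

The BLANK tubes would be reported by `find_replicates` as well, since they share one ID across several MIDs.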
To do:
Still want to go back and summarize
Do de novo assemblies
Summarize the parse report files in /gscratch with some code to iterate over all the individual reports and get an overall count.
Variant calling
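A possible starting point for summarizing the parse reports. Their format isn't described here, so this sketch assumes, hypothetically, that each report line reads `<label>: <integer>` and that report files match a glob such as `*parsereport*`; both the pattern and the line format need adjusting to the actual files in /gscratch. It sums each label's counts across every matching report in a directory and skips lines that don't fit the pattern.

```python
import glob
import os
import re
from collections import Counter
from typing import Dict

# HYPOTHETICAL report line format: "<label>: <integer>".
LINE = re.compile(r"^\s*(.+?)\s*:\s*(\d+)\s*$")


def parse_report(path: str) -> Counter:
    """Return label -> count for one report, ignoring non-matching lines."""
    totals: Counter = Counter()
    with open(path) as fh:
        for line in fh:
            m = LINE.match(line)
            if m:
                totals[m.group(1)] += int(m.group(2))
    return totals


def summarize_reports(report_dir: str, pattern: str = "*parsereport*") -> Dict[str, int]:
    """Sum counts per label over all reports in report_dir matching the (hypothetical) pattern."""
    overall: Counter = Counter()
    for path in sorted(glob.glob(os.path.join(report_dir, pattern))):
        overall.update(parse_report(path))  # Counter.update adds counts
    return dict(overall)
```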