...
compressed all sample_fastq/ files with pigz, using sbatch /project/microbiome/data/seq/HMAX1/demultiplex/run_pigz.sh
moved fastq for all four blank samples (data are all in one file because names are collapsed; noted above) to a subfolder (/project/microbiome/data/seq/HMAX1/demultiplex/sample_fastq/blanks), to get them out of the way.
started denovo assembly in /gscratch/buerkle/data/HMAX1/denovo
Completed the first step of dDocent and am running cd-hit at 92%, 96%, and 98% minimum match. Initially I didn’t give these jobs enough wall time; in the reruns I bumped the number of cores up to 16.
summarized denovo assemblies: 98% (46,971,194 contigs), 96% (27,839,279 contigs), 92% (18,493,729 contigs). Um, that’s a lot. Previously, the authors of "Testing for evolutionary change in restoration: A genomic comparison between ex situ, native, and commercial seed sources of Helianthus maximiliani" aligned to the Helianthus annuus genome (v1.0), so we will try that. Fetching the annuus genome now.
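For reference, contig counts like the ones above can be tallied by counting FASTA header lines (lines starting with ">"). This is a minimal sketch, assuming each assembly is a plain (uncompressed) FASTA file; the function names are hypothetical and this is not necessarily how the numbers above were produced.

```python
from pathlib import Path


def count_contigs(fasta_path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    n = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                n += 1
    return n


def summarize_assemblies(fasta_paths):
    """Map each FASTA file name to its contig count (hypothetical helper)."""
    return {Path(p).name: count_contigs(p) for p in fasta_paths}
```

Calling summarize_assemblies() on a list of the cd-hit output files (whatever they are named in the denovo directory) returns a dict of file name to contig count.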
To do:
Summarize the parse report files in /gscratch with some code to iterate over all the individual reports and get an overall count.
variant calling
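A possible starting point for the parse report to-do item: a minimal sketch that iterates over all report files in a directory and sums the counts. The report format is an assumption here (one "label: integer" pair per line, files matching *.txt), and total_counts is a hypothetical name; the regular expression and glob pattern would need adjusting to the real reports.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed line format: "<label>: <integer>"; anything else is skipped.
LINE_RE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<count>\d+)\s*$")


def total_counts(report_dir, pattern="*.txt"):
    """Sum per-label counts over every report file in report_dir."""
    totals = Counter()
    for report in sorted(Path(report_dir).glob(pattern)):
        with open(report) as handle:
            for line in handle:
                match = LINE_RE.match(line)
                if match:
                    totals[match.group("label")] += int(match.group("count"))
    return totals
```

Lines that do not match the assumed format are ignored rather than raising an error, so headers or free text in the reports would not break the overall count.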
...