...
Eight individuals were duplicated, with different MIDs. Was this planned? I didn’t account for this in the parsing script (the info line only has the individual sample ID, not the MID). I could add the MID back in, but then the replicates would need to be merged. As it stands, all reads for an individual go into one file. There are also four tubes labeled ‘BLANK’ whose reads have all been merged (all the reads went into BLANK.GGATCCTT.fq).
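If the MID were added back to the file names, the replicates could be merged afterwards with something like the sketch below. This is only an illustration and not part of the parsing script: it assumes per-replicate files named `<sample>.<MID>.fq`, and the function name `merge_replicates` and both directory arguments are hypothetical.

```shell
# Sketch: concatenate per-MID replicate files into one file per sample.
# Assumes input files are named <sample>.<MID>.fq (an assumption, not the
# current output of the parsing script). The output directory should start
# out empty and differ from the input directory, because files are appended.
merge_replicates() (
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.fq; do
        [ -e "$f" ] || continue          # glob matched nothing
        base=$(basename "$f" .fq)        # e.g. sampleA.ACGTACGT
        sample=${base%.*}                # strip the trailing .<MID>
        cat -- "$f" >> "$out_dir/$sample.fq"
    done
)
```

Usage would be along the lines of `merge_replicates by_mid/ merged/`, where both directory names are placeholders.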
I moved 8 files that contained no lines into empty/
with wc -l *.gz | awk '{if ($1 == "0") print $2 }' | xargs mv -t empty/
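One caveat on the command above: on gzip-compressed files, `wc -l` counts newline bytes in the compressed stream, not lines of fastq. A possible alternative that checks the decompressed content is sketched below. The function name `move_empty_gz` is made up for illustration, and a file that fails to decompress would also be treated as empty here.

```shell
# Sketch: move .gz files whose decompressed content is empty into empty/.
# Reads at most one byte of decompressed output per file, so it is cheap
# even for large files.
move_empty_gz() (
    dir="${1:-.}"
    mkdir -p "$dir/empty"
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue          # glob matched nothing
        n=$(gzip -cd -- "$f" 2>/dev/null | head -c 1 | wc -c)
        if [ "$n" -eq 0 ]; then
            mv -- "$f" "$dir/empty/"
        fi
    done
)
```

Called as `move_empty_gz .` from the directory holding the files, or with a directory path as its argument.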
Compressed all sample_fastq/ files with pigz, using sbatch /project/microbiome/data/seq/HMAX1/demultiplex/run_pigz.sh
Moved fastq for all four blank samples (data are all in one file because names are collapsed; noted above) to a subfolder (/project/microbiome/data/seq/HMAX1/demultiplex/sample_fastq/blanks) to get them out of the way.
Started denovo assembly in /gscratch/buerkle/data/HMAX1/denovo
Completed first step for dDocent and am running cd-hit for 92%, 96% and 98% minimum match. Initially didn’t give these enough wall time and in reruns I bumped up the number of cores to 16.
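For the record, the three clustering runs could be generated with a small helper like the one below. It is a sketch, not the actual job script: the function name `cdhit_commands`, the input file name, and the `clustered_<pct>.fasta` output names are all placeholders, and it only prints the `cd-hit-est` command lines (using the `-i`, `-o`, `-c` and `-T` options) rather than running or submitting them.

```shell
# Sketch: print one cd-hit-est command per minimum-match percentage.
# Arguments: input fasta, thread count, then two-digit percentages (e.g. 92).
# File names are placeholders and must not contain spaces.
cdhit_commands() (
    input="$1"
    threads="$2"
    shift 2
    for pct in "$@"; do
        # -c takes the identity threshold as a fraction, so 92 becomes 0.92
        printf 'cd-hit-est -i %s -o clustered_%s.fasta -c 0.%s -T %s\n' \
            "$input" "$pct" "$pct" "$threads"
    done
)
```

For example, `cdhit_commands reads.fasta 16 92 96 98` prints three command lines, which could then be pasted into a batch script with a longer wall time.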
...