...
Working in /project/microbiome/data/seq/HMAX1/assem
and assembling all reads in /project/microbiome/data/seq/HMAX1/demultiplex/sample_fastq/
against /project/evolgen/data/public/genomes/helianthus/GCF_002127325.2_HanXRQr2.0-SUNRISE_genomic.fna
. gz.
Ran
bwa index -a bwtsw GCF_002127325.2_HanXRQr2.0-SUNRISE_genomic.fna.gz
by hand in an interactive node (took roughly one hour)Commands are in
0_assem.nf
. Run this withnextflow run -bg 0_assem.nf -c teton.config
. These are jobs are using:module load swset/2018.05 gcc/7.3.0 bwa/0.7.17 samtools/1.12
as specified inteton.config
in this directory (bwa is version 0.7.17-r1188). Output is in/project/microbiome/data/seq/HMAX1/assem/sambam/
. Gave each job 60 minutes, which was unnecessarily long, but conservative. Longest running jobs I could see were less than 20 minutes. Moved all 476 468 inputs files through in about 30 minutes total.
...
Following steps from https://github.com/zgompert/DimensionsExperiment.
Built bcftools version 1.16 and installed in
/project/evolgen/bin/
.bcftools
needed reference genome in bzip2 format, not gzip. So I now simply have an unzipped reference genome, which I have reindexed.Completed this step with something like:
sbatch --account=evolgen --time=1-00:00 --nodes=1 --mem=120G --mail-type=ENDÂ 0_call_variants.sh
Filtering vcf:
Started notes in
1_filter_variants.sh
...
Variant filtering
Used the following filters: 2X coverage (2302 reads), 10 alt. reads, not fixed, Man-Whitney P for BQB = 0.01, Man-Whitney P for RPB = 0.01, minimum mapping quality 30, missing data for fewer than 230 (80% with data), biallelic SNPs only.
Ended up with 64,061 SNPs in /uufs/chpc.utah.edu/common/home/gompert-group2/data/dimension_lyc_gbs/Variants/filtered2x_lmel_variants.vcf.
...
To do:
Summarize the parse report files in /gscratch with some code to iterate over all the individual reports and get an overall count.