
Raw reads

We received four files with sequence reads. Two of these contain the 1x100bp reads (two files because two lanes were used on the instrument). The other two exist only because CU unnecessarily ran index reads on the fragments; I deleted these unneeded files. The two files with the raw reads of interest are in /project/microbiome/data/seq/HMAX1/rawreads:

...
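As a quick check on the two remaining files, here is a minimal sketch that counts reads and reports the shortest and longest read. For 1x100bp data both should be 100. The function name fastq_summary is hypothetical. The sketch assumes gzipped FASTQ with standard four-line records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read count and read-length range for one gzipped FASTQ.
# Assumes standard 4-line FASTQ records (header, sequence, '+', qualities).
fastq_summary() {
  local fq=$1
  gzip -cd "$fq" | awk -v name="$fq" '
    NR % 4 == 2 {                     # the sequence line of each record
      n++
      len = length($0)
      if (n == 1 || len < min) min = len
      if (len > max) max = len
    }
    END { printf "%s\treads=%.0f\tminlen=%d\tmaxlen=%d\n", name, n, min, max }'
}
```

Run it once per file in the rawreads directory, e.g. fastq_summary <file>.fastq.gz.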

Summarized de novo assemblies: 98 (46,971,194 contigs), 96 (27,839,279 contigs), 92 (18,493,729 contigs). Um, that's a lot. In the earlier study (Testing for evolutionary change in restoration: A genomic comparison between ex situ, native, and commercial seed sources of Helianthus maximiliani), reads were aligned to the Helianthus annuus genome (v1.0). We will use the H. annuus v2.0 genome instead. It is in /project/evolgen/data/public/genomes/helianthus/GCF_002127325.2_HanXRQr2.0-SUNRISE_genomic.fna.gz
Next up: use bwa to map the reads against the reference genome.

Assembly

10 Sep 2022: Working in /project/microbiome/data/seq/HMAX1/assem and assembling all reads in /project/microbiome/data/seq/HMAX1/demultiplex/sample_fastq/ against /project/evolgen/data/public/genomes/helianthus/GCF_002127325.2_HanXRQr2.0-SUNRISE_genomic.fna.gz.

Ran bwa index -a bwtsw GCF_002127325.2_HanXRQr2.0-SUNRISE_genomic.fna.gz by hand on an interactive node.
Commands are in 0_assem.nf. Run this with nextflow run -bg 0_assem.nf -c teton.config. These jobs use module load swset/2018.05 gcc/7.3.0 bwa/0.7.17 samtools/1.12, as specified in teton.config in this directory (bwa is version 0.7.17-r1188).
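The real commands live in 0_assem.nf. What follows is only a minimal sketch of a typical single-end bwa mem + samtools sort step for one sample, not the contents of that file. Several details are assumptions: the function name map_sample, the <id>.fastq.gz input naming, the <id>.sorted.bam output naming, and the 4 threads. REF must be the exact path that bwa index was run on, because the index files sit next to it. With DRY_RUN=1 the function prints the pipeline instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical per-sample mapping step (NOT the contents of 0_assem.nf).
# Assumptions: single-end reads in <id>.fastq.gz, output <id>.sorted.bam,
# 4 threads, and REF is the path that `bwa index` was run on.
set -euo pipefail

REF=/project/evolgen/data/public/genomes/helianthus/GCF_002127325.2_HanXRQr2.0-SUNRISE_genomic.fna.gz

map_sample() {
  local fq=$1 id rg
  id=$(basename "$fq" .fastq.gz)
  # Read group: bcftools mpileup takes sample names from the SM tag.
  # bwa mem -R expects the literal two-character "\t" separators.
  rg="@RG\tID:${id}\tSM:${id}"
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the pipeline instead of running it.
    printf '%s\n' "bwa mem -t 4 -R '${rg}' ${REF} ${fq} | samtools sort -o ${id}.sorted.bam -"
    return 0
  fi
  bwa mem -t 4 -R "$rg" "$REF" "$fq" | samtools sort -o "${id}.sorted.bam" -
  samtools index "${id}.sorted.bam"
}
```

Looping map_sample over the files in sample_fastq/ would map every sample serially. In the actual setup, 0_assem.nf under nextflow handles the per-sample jobs.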

… working here …

Variant calling

Copying steps from https://github.com/zgompert/DimensionsExperiment

Variant calling with bcftools version 1.9

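```shell
# Paths are from the DimensionsExperiment (Lycaeides melissa) project being copied.
# mpileup: adjust MQ (-C 50), max 250 reads per file per position, min mapping quality 30,
# min base quality 20, skip indels (-I), BAM list from file (-b), uncompressed BCF, add FORMAT/AD and FORMAT/DP
bcftools mpileup -C 50 -d 250 -f /uufs/chpc.utah.edu/common/home/gompert-group3/data/LmelGenome/Lmel_dovetailPacBio_genome.fasta -q 30 -Q 20 -I -b lmel_bams.txt -o lmel_variants.bcf -O u -a FORMAT/AD,FORMAT/DP
# call: variant sites only (-v), consensus caller (-c), variant if P(ref|D) < 0.01, prior -P 0.001, plain VCF out
bcftools call -v -c -p 0.01 -P 0.001 -O v -o lmel_variants.vcf lmel_variants.bcf
```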
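The -b option of bcftools mpileup reads a plain-text list of alignment files, one path per line. Below is a minimal sketch for building that list from a directory of sorted BAMs, plus a record count for the VCF that bcftools call writes. The function names and the *.sorted.bam naming are hypothetical, so match the pattern to whatever 0_assem.nf actually produces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the two bcftools commands.

# Write one BAM path per line (the format mpileup -b expects), sorted for reproducibility.
make_bam_list() {
  local bam_dir=$1 out=$2
  find "$bam_dir" -maxdepth 1 -name '*.sorted.bam' | sort > "$out"
}

# Print the number of non-header records in a plain-text VCF.
count_vcf_records() {
  grep -vc '^#' "$1"
}
```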

Variant filtering with vcfFilter.pl and filterSomeMore.pl

Used the following filters: minimum 2X coverage (2302 reads), at least 10 alternative-allele reads, not fixed, Mann-Whitney P for base quality bias (BQB) = 0.01, Mann-Whitney P for read position bias (RPB) = 0.01, minimum mapping quality 30, missing data in fewer than 230 individuals (80% with data), biallelic SNPs only.
Ended up with 64,061 SNPs in /uufs/chpc.utah.edu/common/home/gompert-group2/data/dimension_lyc_gbs/Variants/filtered2x_lmel_variants.vcf.
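Two of these filters (biallelic SNPs only, and the missing-data cutoff) are simple enough to sketch in awk. This is not vcfFilter.pl, just an illustration built on one assumption: a sample counts as missing at a site when its FORMAT/DP is 0, ".", or absent (e.g., a bare "./." genotype). The function name filter_missing_biallelic is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of two filters (NOT vcfFilter.pl):
#   - biallelic SNPs only: single-base REF and a single single-base ALT
#   - keep a site only if fewer than max_missing samples lack data
# Assumption: "lacks data" = FORMAT/DP is 0, ".", or absent for that sample.
filter_missing_biallelic() {
  local vcf=$1 max_missing=${2:-230}
  awk -F'\t' -v maxmiss="$max_missing" '
    /^#/ { print; next }                           # keep all header lines
    length($4) != 1 || length($5) != 1 || $5 == "." { next }   # indels, "A,G", no ALT
    {
      ndp = 0
      nf = split($9, fmt, ":")                     # locate DP in the FORMAT column
      for (i = 1; i <= nf; i++) if (fmt[i] == "DP") ndp = i
      if (ndp == 0) next                           # no per-sample depth: drop the site
      miss = 0
      for (s = 10; s <= NF; s++) {
        split($s, v, ":")                          # truncated fields (e.g. "./.") give ""
        if (v[ndp] == "" || v[ndp] == "." || v[ndp] == 0) miss++
      }
      if (miss < maxmiss) print
    }' "$vcf"
}
```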
Next, I dropped SNPs with coverage above the mean + 3 SD (possible repeats). This left 63,194 SNPs in morefilter_filtered2x_lmel_variants.vcf.
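The repeat filter can be reproduced as a two-pass awk sketch. The first pass computes the mean and SD of coverage across all SNPs, and the second keeps SNPs at or below mean + 3 SD. It assumes coverage means the total depth in INFO/DP, which may differ from what filterSomeMore.pl uses. The name drop_high_depth is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical mean + 3 SD depth filter (NOT filterSomeMore.pl).
# Assumption: per-SNP coverage = INFO/DP; SNPs without INFO/DP are kept.
# Uses the population SD over all SNPs in the file.
drop_high_depth() {
  local vcf=$1
  # The file is read twice: pass 1 accumulates sum and sum of squares of DP,
  # pass 2 prints headers and every SNP with DP <= mean + 3 SD.
  awk -F'\t' '
    function info_dp(info,   m, kv, i) {
      m = split(info, kv, ";")
      for (i = 1; i <= m; i++) if (kv[i] ~ /^DP=/) return substr(kv[i], 4) + 0
      return -1                                  # no DP in INFO
    }
    NR == FNR {
      if ($0 !~ /^#/ && (d = info_dp($8)) >= 0) { n++; s += d; ss += d * d }
      next
    }
    FNR == 1 && n > 0 {
      mean = s / n
      var = ss / n - mean * mean
      if (var < 0) var = 0                       # guard against rounding error
      cut = mean + 3 * sqrt(var)
    }
    /^#/ { print; next }
    info_dp($8) <= cut
  ' "$vcf" "$vcf"
}
```

Using the population rather than sample SD makes no practical difference with tens of thousands of SNPs.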

To do:

Summarize the parse report files in /gscratch with some code that iterates over all the individual reports and gets an overall count.
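Here is a starting point for this to-do item. It assumes each parse report line looks like "label: count", which is hypothetical and should be checked against an actual report. The sketch sums each label's count across all report files and prints labels in first-seen order. Lines whose last field is not an integer are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical summarizer: assumes report lines of the form "label: count".
sum_parse_reports() {
  awk -F': *' '
    NF >= 2 && $NF ~ /^[0-9]+$/ {
      total[$1] += $NF
      if (!($1 in seen)) { seen[$1] = 1; order[++k] = $1 }   # remember first-seen order
    }
    END { for (i = 1; i <= k; i++) printf "%s: %.0f\n", order[i], total[order[i]] }
  ' "$@"
}
```

Pass it the report files as arguments, e.g. with a glob over the /gscratch report directory.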
