...

It appears that a large number of reads are going into truemiderrors_Brew20Nov_S1_L001_R1_001.fastq, many of which look like they could be ITS coligos. They have long terminal runs of default G base calls, and the sequence before the G run looks pretty consistent. It is possible that they had MIDs, but that the primer sequence was too far off to be recognized.
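One way to check how common these reads are is to count the sequences that end in a long run of G. The following is a minimal sketch, not part of the existing pipeline: the function names and the 20-base threshold are assumptions chosen for illustration, and it assumes standard four-line FASTQ records.

```python
import re


def terminal_g_run(seq, min_run=20):
    """Return the length of the G run at the 3' end if it is >= min_run, else 0.

    min_run=20 is an arbitrary illustrative threshold, not a pipeline setting.
    """
    match = re.search(r"G+$", seq)
    length = len(match.group(0)) if match else 0
    return length if length >= min_run else 0


def count_poly_g_reads(fastq_lines, min_run=20):
    """Return (total reads, reads with a long terminal G run).

    fastq_lines is any iterable of lines (an open file handle works) and is
    assumed to hold four-line FASTQ records.
    """
    total = 0
    flagged = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record is the sequence
            total += 1
            if terminal_g_run(line.strip(), min_run):
                flagged += 1
    return total, flagged
```

Passing an open handle for truemiderrors_Brew20Nov_S1_L001_R1_001.fastq to count_poly_g_reads would give the total and flagged read counts for that file.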

Splitting

The info lines for each read in parsed_*_R1.fastq and parsed_*_R2.fastq have the locus, the forward MID, the reverse MID, and the sample name. These can be used with the demux key to separate reads into loci, projects, and samples, with the output placed in the folder sample_fastq/. The reads are in separate files for each sequenced sample, including replicates. The unique combination of forward and reverse MIDs (for a locus) is part of the filename and allows replicates to be distinguished and subsequently merged.
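The role of the MID combination in the filename can be illustrated with a short sketch. It is not how splitFastq.pl works internally: the filename pattern and function names are hypothetical, and the input is assumed to be tuples already extracted from the info lines.

```python
from collections import defaultdict


def sample_filename(locus, fwd_mid, rev_mid, sample):
    """Build a per-replicate filename. This naming pattern is hypothetical."""
    return f"{locus}_{fwd_mid}_{rev_mid}_{sample}.fastq"


def group_replicates(parsed_reads):
    """Group per-replicate filenames under (locus, sample).

    parsed_reads: iterable of (locus, fwd_mid, rev_mid, sample) tuples, one per
    read. Each distinct MID pair yields its own file, so replicates of the same
    sample stay separate but can be found together for a later merge.
    """
    groups = defaultdict(set)
    for locus, fwd_mid, rev_mid, sample in parsed_reads:
        groups[(locus, sample)].add(sample_filename(locus, fwd_mid, rev_mid, sample))
    return {key: sorted(names) for key, names in groups.items()}
```

Two replicates of one sample at one locus end up as two filenames under a single (locus, sample) key, which is the grouping needed to merge them later.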

...

splitFastq.pl will need tweaking until sample names and the format of the key for demultiplexing and metadata stabilize. The number of columns in the demux key has differed among some of our completed sequence lanes. Brew_20_DEMUX.csv has 10 columns: forward_barcode,reverse_barcode,locus,samplename,project,wellposition,plate,midplate,substrate,client_name.
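Because the column count varies between lanes, reading the key by column name rather than by position is one way to make a parser less fragile. The sketch below is an illustration only, not the logic of splitFastq.pl: it assumes the key file has a header row with the column names listed above, and the function name and the choice of required columns are hypothetical.

```python
import csv

# Columns assumed necessary for splitting; names follow Brew_20_DEMUX.csv.
REQUIRED = ("forward_barcode", "reverse_barcode", "locus", "samplename", "project")


def read_demux_key(handle, required=REQUIRED):
    """Map (locus, forward_barcode, reverse_barcode) to (project, samplename).

    Rows are read by header name, so extra or reordered columns are ignored.
    Assumes the first row of the key is a header (an assumption, not verified).
    """
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [col for col in required if col not in fieldnames]
    if missing:
        raise ValueError(f"demux key is missing columns: {missing}")
    key = {}
    for row in reader:
        combo = (row["locus"], row["forward_barcode"], row["reverse_barcode"])
        key[combo] = (row["project"], row["samplename"])
    return key
```

Called on an open handle for a key such as Brew_20_DEMUX.csv, this returns a dictionary that can be queried with the locus and MID pair from a read's info line. A key with fewer or more metadata columns still parses as long as the required ones are present.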

Calculate summary statistics on reads

In a sequence library’s rawdata/ directory (e.g., /project/microbiome/data/seq/gtl_tests/iSeq100Pilot1_brew_30nov20/rawdata), on an interactive node, run:

module load swset/2018.05 gcc/7.3.0 usearch/10.0.240

aggregate_usearch_fastx_info.pl