...
I uncompressed the forward and reverse read files with
unpigz
on an interactive node (single-threaded gunzip would also have worked). I fixed the line endings (from MS-DOS line endings) in
Brew_20_DEMUX.csv
with dos2unix Brew_20_DEMUX.csv
I ran
sbatch run_slurm_parse_count.sh
after editing run_slurm_parse_count.sh
to have the correct filenames and string for the sequencer id.
It appears that a large number of reads are going into truemiderrors_Brew20Nov_S1_L001_R1_001.fastq,
many of which look like they could be ITS coligos (they have long terminal runs of default G base calls, and the sequence before these runs looks fairly consistent).
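A quick way to put a number on this is to count the reads that end in a long run of G. The sketch below is only a rough check, not part of the parsing scripts. The function name count_polyg_tails and the minimum run length of 20 are assumptions chosen for illustration, and it expects an uncompressed FASTQ with four lines per record.

```python
import re
from itertools import islice


def count_polyg_tails(fastq_path, min_run=20):
    """Return (reads ending in >= min_run G's, total reads).

    ASSUMPTION: min_run=20 is an arbitrary cutoff, and the file is an
    uncompressed FASTQ with exactly four lines per record.
    """
    tail = re.compile(r"G{%d,}$" % min_run)
    hits = 0
    total = 0
    with open(fastq_path) as handle:
        # The sequence is the second line of every four-line record.
        for seq in islice(handle, 1, None, 4):
            total += 1
            if tail.search(seq.rstrip("\n")):
                hits += 1
    return hits, total
```

Calling count_polyg_tails("truemiderrors_Brew20Nov_S1_L001_R1_001.fastq") returns the two counts, which gives the fraction of reads in that file with a poly-G tail.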
Splitting
The info lines for each read in parsed_*_R1.fastq
and parsed_*_R2.fastq
have the locus, the forward MID, the reverse MID, and the sample name. These can be used with the demux key to separate reads into loci, projects, and samples in the folder sample_fastq/.
The reads are in separate files for each sequenced sample, including replicates. The unique combination of forward and reverse MIDs (for a locus) is part of the filename and allows replicates to be distinguished and subsequently merged.
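The idea of naming each output file after the locus, MID pair, and sample can be sketched as follows. This is an illustration, not the splitting script itself. It assumes the info line is whitespace-separated as "@read_id locus forward_MID reverse_MID sample", which may not match the real layout. The functions sample_filename and split_by_sample and the filename pattern are hypothetical, and the demux key lookup (for projects) is left out.

```python
import os


def sample_filename(info_line, read="R1"):
    """Build a per-sample filename from one info line.

    ASSUMPTION: the info line looks like
    '@<read id> <locus> <forward MID> <reverse MID> <sample>';
    the real layout and filename pattern may differ.
    """
    _read_id, locus, fwd_mid, rev_mid, sample = info_line.strip().split()[:5]
    return f"{locus}.{fwd_mid}.{rev_mid}.{sample}.{read}.fq"


def split_by_sample(parsed_fastq, out_dir="sample_fastq", read="R1"):
    """Append each four-line record to the file named after its info line."""
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    with open(parsed_fastq) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:  # an empty string means end of file
                break
            name = sample_filename(record[0], read)
            # Reopening per record is slow but keeps the sketch simple.
            with open(os.path.join(out_dir, name), "a") as out:
                out.writelines(record)
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Because the MID pair is part of the name, two replicates of the same sample end up in different files, and they can later be merged by matching on the locus and sample fields.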
...