...
splitFastq.pl
and splitFastq_manyInputfiles.pl
will need tweaking in the future, whenever sample names and the format of the key for demultiplexing and metadata changes. The number of columns has differed among some of early sequence lanes, which necessitated changes to this parsing script.
Below This Point is yet to be done
Calculate summary statistics on reads
...
Statistics on the initial number reads, the number of reads that merged, and the number of reads that remain after filtering are in filtermergestats.csv
in each project folder. Please note that this will not include the number of reads that failed to merge, but we were able to join. This category is likely to include ITS sequences for which the amplicon was large enough that our 2x250bp reads could not span the whole length. The greater number removed in ITS (orange) in the plot below is consistent with this idea. For the full lane these summaries were concatenated in tfmergedreads/
with
Below This Point is yet to be done
cat */*/filtermergestats.csv > all_filtermergestats.csv
...