...
The work is done by run_parse_count_onSplitInput.pl. As the name implies, we split the raw data into many files, so that the parsing can be done in parallel by many nodes. The approximate string matching that we are doing requires ~140 hours of CPU time, so we are splitting the task across ~220 191 jobs. By doing so, the parsing takes less than one hour.
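
The split, parse, and merge pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic of run_parse_count_onSplitInput.pl itself: the function names (`split_into_chunks`, `parse_count`, `merge_counts`), the `min_ratio` threshold, and the use of `difflib.SequenceMatcher` as the approximate matcher are all hypothetical stand-ins.

```python
from collections import Counter
from difflib import SequenceMatcher


def split_into_chunks(records, n_chunks):
    """Split records into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional record.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def similarity(a, b):
    """Similarity ratio in [0, 1]; a stand-in for the real approximate matcher."""
    return SequenceMatcher(None, a, b).ratio()


def parse_count(chunk, references, min_ratio=0.8):
    """Count, per reference, the records in one chunk that approximately match it.

    min_ratio is an assumed threshold chosen for this example only.
    """
    counts = Counter()
    for record in chunk:
        best = max(references, key=lambda ref: similarity(record, ref))
        if similarity(record, best) >= min_ratio:
            counts[best] += 1
    return counts


def merge_counts(per_chunk_counts):
    """Sum the per-chunk counters into one overall count."""
    total = Counter()
    for counts in per_chunk_counts:
        total.update(counts)
    return total


records = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC", "CCCCCCCC"]
references = ["ACGTACGT", "TTTTGGGG"]

# On a cluster, each chunk would be written to its own file and handled by a
# separate job; here the chunks are simply processed one after another.
chunks = split_into_chunks(records, 2)
merged = merge_counts(parse_count(chunk, references) for chunk in chunks)
```

Because each record is counted independently of the others, the merged result is the same as a single pass over the unsplit input, which is what makes it safe to hand the chunks to separate jobs.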
...