Nextflow is a powerful workflow language that can run large sets of processes through SLURM and chain multiple steps together. It is commonly used for bioinformatics, but it is also relatively straightforward to use for other high-throughput computing tasks.
📘 Instructions for two examples
Simple example
Log in to teton
Load the nextflow module:
module load nextflow
Create the first text file below and call it minimal_wrapper.nf. You can then run it on the local computer with the command
nextflow run minimal_wrapper.nf
If you want to run the same workflow using SLURM, rather than on the local computer, create the second file below and call it teton.config. Then you can run the same workflow with SLURM distributing the jobs across the cluster:
nextflow run minimal_wrapper.nf -c teton.config
Please adjust the account name (the --account value in clusterOptions) to match yours, otherwise the job will not run.
minimal_wrapper.nf:
#!/usr/bin/env nextflow
// author: Alex Buerkle <buerkle@uwyo.edu>
nextflow.enable.dsl=2

// invoke as, to run locally: nextflow run minimal_wrapper.nf
// or as, to use SLURM: nextflow run minimal_wrapper.nf -c teton.config
// output will be in work/*/*/.command.out and neighboring files

// several arbitrary files in the /etc/ directory
simsRds = Channel.fromPath( "/etc/*d.conf" )

process iterateFiles {
    input:
    path x

    output:
    stdout

    """
    echo 'made it here $x $HOSTNAME'
    """
}

workflow {
    // workflow uses channels as input by default
    iterateFiles(simsRds)
}
teton.config:
// for more on building a config for an HPC see: https://nf-co.re/usage/tutorials/step_by_step_institutional_profile
params {
    config_profile_description = 'Teton cluster profile'
    max_memory = 4.TB
    max_time = 168.h
    max_cpus = 72
}

process {
    executor = 'slurm'
    queue = 'teton'
    // gsl and openblas are needed by gemma, called from R
    beforeScript = 'module load swset/2018.05 gcc/7.3.0 r/4.0.5-py27 gsl/2.5 openblas'
    // Put in whatever options are accepted by our system,
    // including things like --partition=teton-hugemem --mem=10000 (10 GB)
    clusterOptions = '--account=modelscape --time=00:05:00 --cpus-per-task=1 --ntasks-per-node=1 --mem-per-cpu=1G'
}

singularity {
    enabled = true
    autoMounts = true
    //cacheDir = '/project/evolgen/singularity_containers'
}

executor {
    queueSize = 400
    submitRateLimit = '200/2min'
}
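Because the job will not run unless the --account value matches your own project, it can be convenient to set it from the command line instead of editing the file by hand. The following is only a sketch: the function name set_slurm_account and the account name myproject are hypothetical, and it assumes the config contains an --account=... entry inside clusterOptions, as in the file above.

```shell
# Hypothetical helper: rewrite the --account value inside a Nextflow config file.
# Usage: set_slurm_account <account> [config_file]   (config_file defaults to teton.config)
set_slurm_account() {
    account="$1"
    config="${2:-teton.config}"
    # Replace whatever follows --account= up to the next space or single quote,
    # writing to a temporary file first so a failed sed leaves the original intact.
    sed "s/--account=[^ ']*/--account=${account}/" "$config" > "${config}.tmp" &&
        mv "${config}.tmp" "$config"
}
```

After pasting the function into your shell, something like set_slurm_account myproject teton.config would update the file in place; check the result with grep account teton.config before submitting.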
Example of submitting many R jobs
Log in to teton
Load the nextflow module:
module load nextflow
Create a file from the first text window below and save it as minimal_wrapper_Rwork.nf.
Create a second file containing the simple R command rnorm(5), to give R something to do, and call it Rtest.R. Save it in the same directory as minimal_wrapper_Rwork.nf.
To run the collection of jobs, use the teton.config file from the simple example:
nextflow run minimal_wrapper_Rwork.nf -c teton.config
minimal_wrapper_Rwork.nf:
#!/usr/bin/env nextflow
// author: Alex Buerkle <buerkle@uwyo.edu>
nextflow.enable.dsl=2

// to run locally: nextflow run minimal_wrapper_Rwork.nf
// or using SLURM: nextflow run minimal_wrapper_Rwork.nf -c teton.config
// output will be in work/*/*/.command.out and neighboring files

// several arbitrary files in the /etc directory
simsRds = Channel.fromPath( "/etc/*d.conf" )

process iterateFiles {
    input:
    path x

    output:
    stdout

    // each task runs in its own work/ subdirectory, so Rtest.R is referenced
    // through projectDir (the directory that holds this .nf file)
    """
    echo 'Working on $x'
    Rscript --vanilla $projectDir/Rtest.R $x out_$x
    """
}

workflow {
    // workflow uses simsRds channel as input
    iterateFiles(simsRds)
}

Rtest.R:
rnorm(5)
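With many jobs, the printed results end up scattered over one .command.out file per task under the work directory. A small shell function can gather them for a quick look. This is a sketch rather than part of Nextflow itself: the name show_task_output is hypothetical, and it assumes the default work directory name, work, unless another path is given.

```shell
# Hypothetical helper: print the captured stdout of every task under a Nextflow work directory.
# Usage: show_task_output [work_dir]   (work_dir defaults to ./work)
show_task_output() {
    workdir="${1:-work}"
    # Each task directory holds a .command.out file with that task's stdout.
    find "$workdir" -type f -name '.command.out' | sort | while IFS= read -r f; do
        printf '== %s\n' "$f"   # header line naming the task's output file
        cat "$f"
    done
}
```

Run show_task_output from the directory where you launched nextflow. Each task's output is preceded by a line starting with == and the path of the file it came from, so the random numbers from each R job can be traced back to its work subdirectory.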
📋 Further reading
This short post is meant to get you started with one-step “workflows”. I might post more in the future; meanwhile, you can find more information at the following sources.
Nextflow’s official documentation with some examples: https://www.nextflow.io/docs/latest/index.html