How to get started with Nextflow for workflows on teton
Nextflow is a powerful workflow language that can run large sets of processes through SLURM and chain multiple steps together. It is commonly used for bioinformatics, but it is relatively straightforward to use for other high-throughput computing tasks as well.
Instructions for two examples
Simple example
Log in to teton.
Load the Nextflow module: module load nextflow
Create the first text file below and call it minimal_wrapper.nf. You can then run it with the command nextflow run minimal_wrapper.nf.
If you want to run the same workflow through SLURM, rather than on the local computer, create the second file below and call it teton.config. Then you can distribute the jobs to the cluster with nextflow run minimal_wrapper.nf -c teton.config. Please adjust the account name in teton.config to match yours, otherwise the jobs will not run.
Output will be in the folder work/. This includes hidden files for standard output and standard error from the jobs; you can see these with ls -al work/*/*/
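For reference, each task's hash-named directory inside work/ holds hidden files like these (directory names will differ from run to run; the listing is abbreviated):

$ ls -al work/*/*/
.command.sh    # the shell script Nextflow generated for the task
.command.out   # the task's standard output
.command.err   # the task's standard error
.exitcode      # the task's exit status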
#!/usr/bin/env nextflow
// author: Alex Buerkle <buerkle@uwyo.edu>
nextflow.enable.dsl=2
// to run locally: nextflow run minimal_wrapper.nf
// to use SLURM:   nextflow run minimal_wrapper.nf -c teton.config
// output will be in work/*/*/.command.out and neighboring files

simsRds = Channel.fromPath( "/etc/*d.conf" ) // several arbitrary files in the /etc/ directory

process iterateFiles {
    input:
    path x

    output:
    stdout

    // \$HOSTNAME is escaped so that bash, not Nextflow, expands it
    """
    echo 'made it here $x \$HOSTNAME'
    """
}

workflow {
    // workflow uses channels as input by default
    iterateFiles(simsRds)
}
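If you want to see each task's standard output in the terminal rather than digging through work/, a small variation on the workflow block above appends .view() to the process call:

workflow {
    // print each task's stdout as it completes
    iterateFiles(simsRds).view()
}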
// for more on building a config for an HPC, see: https://nf-co.re/usage/tutorials/step_by_step_institutional_profile

params {
    config_profile_description = 'Teton cluster profile'
    max_memory = 4.TB
    max_time   = 168.h
    max_cpus   = 72
}

process {
    executor = 'slurm'
    queue = 'teton'
    beforeScript = 'module load swset/2018.05 gcc/7.3.0 r/4.0.5-py27 gsl/2.5 openblas' // gsl and openblas are needed by gemma, called from R
    // put in whatever options are accepted by the system,
    // including things like: --partition=teton-hugemem --mem=10000 # 10 GB
    clusterOptions = '--account=modelscape --time=00:05:00 --cpus-per-task=1 --ntasks-per-node=1 --mem-per-cpu=1G'
}

singularity {
    enabled = true
    autoMounts = true
    //cacheDir = '/project/evolgen/singularity_containers'
}

executor {
    queueSize = 400
    submitRateLimit = '200/2min'
}
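The settings in the process scope above apply to every process. If one process later needs different resources, Nextflow's process selectors can override the defaults; here is a sketch using the iterateFiles process from the example (the larger time and memory values are arbitrary placeholders):

process {
    // override options only for the named process; other processes keep the defaults
    withName: 'iterateFiles' {
        clusterOptions = '--account=modelscape --time=00:30:00 --cpus-per-task=1 --mem-per-cpu=4G'
    }
}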
Example of submitting many R jobs
Log in to teton.
Load the Nextflow module: module load nextflow
Create a file from the first text window below and save it as minimal_wrapper_Rwork.nf.
Create a second file containing the simple R command rnorm(5), to give R something to do, and call it Rtest.R.
To run the collection of jobs, reusing teton.config from the simple example above: nextflow run minimal_wrapper_Rwork.nf -c teton.config
Output will be in the folder work/. This includes hidden files for standard output and standard error from the jobs; you can see these with ls -al work/*/*/
#!/usr/bin/env nextflow
// author: Alex Buerkle <buerkle@uwyo.edu>
nextflow.enable.dsl=2
// to run locally: nextflow run minimal_wrapper_Rwork.nf
// or using SLURM: nextflow run minimal_wrapper_Rwork.nf -c teton.config
// output will be in work/*/*/.command.out and neighboring files

simsRds = Channel.fromPath( "/etc/*d.conf" ) // several arbitrary files in the /etc directory

process iterateFiles {
    input:
    path x

    output:
    stdout

    // tasks run in their own work directories, so refer to Rtest.R by its
    // location next to this script ($projectDir), not by a bare relative path
    """
    echo 'Working on $x'
    Rscript --vanilla $projectDir/Rtest.R $x out_$x
    """
}

workflow {
    // workflow uses simsRds channel as input
    iterateFiles(simsRds)
}
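Both examples above are single-step, but chaining steps is just a matter of feeding one process's output channel into the next process. Here is a minimal sketch with hypothetical processes stepOne and stepTwo (not part of the files above):

process stepOne {
    input:
    path x

    output:
    path "one_${x}"

    """
    cp $x one_${x}
    """
}

process stepTwo {
    input:
    path y

    output:
    stdout

    """
    wc -l $y
    """
}

workflow {
    files = Channel.fromPath( "/etc/*d.conf" )
    // stepOne's output channel becomes stepTwo's input
    stepTwo(stepOne(files)).view()
}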
Further reading
This short post was meant to get you started with one-step “workflows”. I might post more in the future, but meanwhile you can find more information at the following sources.
Nextflow’s official documentation with some examples: https://www.nextflow.io/docs/latest/
Introduction to Bioinformatics workflows with Nextflow and nf-core, a Carpentries incubator lesson