How to get started with Nextflow for workflows on teton

Nextflow is a powerful workflow language that can run large sets of processes through SLURM and chain multiple steps together. It is commonly used for bioinformatics, but it is relatively straightforward to use for other high-throughput computing tasks.
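The examples below are single-step workflows, but to preview the chaining, here is a minimal, hypothetical sketch in the same DSL2 style, in which the standard output of one process feeds the input of the next. The process names and messages are placeholders, not part of the examples that follow.

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// hypothetical two-step chain: stepOne's captured stdout feeds stepTwo
process stepOne {
    input:
    val x

    output:
    stdout

    """
    echo 'step one saw $x'
    """
}

process stepTwo {
    input:
    val y

    output:
    stdout

    """
    echo 'step two received: $y'
    """
}

workflow {
    // pipe a channel of values through both processes, in order
    Channel.of(1, 2, 3) | stepOne | stepTwo
}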

Instructions for two examples

Simple example

  1. Log in to teton

  2. Load the Nextflow module: module load nextflow

  3. Create the text file below and call it minimal_wrapper.nf. You can then run it with the command nextflow run minimal_wrapper.nf.

  4. If you want to run the same command using SLURM rather than on the local machine, create the second file below and call it teton.config. You can then run the same workflow through SLURM with nextflow run minimal_wrapper.nf -c teton.config, and SLURM will distribute the jobs across the cluster. Adjust the account name in teton.config to match your own; otherwise the job will not run.

  5. Output will be in the folder work/. This includes hidden files holding the standard output and error from each job; you can see them with ls -al work/*/*/

#!/usr/bin/env nextflow
// author: Alex Buerkle <buerkle@uwyo.edu>
nextflow.enable.dsl=2
// to run locally, invoke as: nextflow run minimal_wrapper.nf
// or, to use SLURM: nextflow run minimal_wrapper.nf -c teton.config
// output will be in work/*/*/.command.out and neighboring files

simsRds = Channel.fromPath("/etc/*d.conf") // several arbitrary files in the /etc/ directory

process iterateFiles{
    input:
    file x

    output:
    stdout

    // the dollar sign is escaped so that bash, not Nextflow, expands HOSTNAME
    """
    echo 'made it here $x \$HOSTNAME'
    """
}

workflow{
    // workflow uses channels as input by default
    iterateFiles(simsRds)
}
// for more on building a config for an HPC, see:
// https://nf-co.re/usage/tutorials/step_by_step_institutional_profile

params {
    config_profile_description = 'Teton cluster profile'
    max_memory = 4.TB
    max_time = 168.h
    max_cpus = 72
}

process {
    executor = 'slurm'
    queue = 'teton'
    // gsl and openblas are needed by gemma, called from R
    beforeScript = 'module load swset/2018.05 gcc/7.3.0 r/4.0.5-py27 gsl/2.5 openblas'
    // put in whatever options are accepted by our system,
    // including things like: --partition=teton-hugemem --mem=10000 (10 GB)
    clusterOptions = '--account=modelscape --time=00:05:00 --cpus-per-task=1 --ntasks-per-node=1 --mem-per-cpu=1G'
}

singularity {
    enabled = true
    autoMounts = true
    // cacheDir = '/project/evolgen/singularity_containers'
}

executor {
    queueSize = 400
    submitRateLimit = '200/2min'
}
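Putting the steps together, a complete session for the simple example might look like the following shell sketch (assuming the two files above are in the current directory, and that the account name in teton.config has been changed from modelscape to your own):

# after logging in to teton
module load nextflow

# run locally
nextflow run minimal_wrapper.nf

# or run the same workflow through SLURM
nextflow run minimal_wrapper.nf -c teton.config

# inspect the hidden per-task files, including standard output and error
ls -al work/*/*/
cat work/*/*/.command.out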

Example of submitting many R jobs

  1. Log in to teton

  2. Load the Nextflow module: module load nextflow

  3. Create a file from the text window below and save it as minimal_wrapper_Rwork.nf

  4. Create a second file containing the simple R command rnorm(5), to give R something to do, and call it Rtest.R. (A slightly fuller, hypothetical version of Rtest.R that uses the arguments passed by the workflow is sketched after the script below.)

  5. To run the collection of jobs: nextflow run minimal_wrapper_Rwork.nf -c teton.config

  6. Output will be in the folder work/. This includes hidden files holding the standard output and error from each job; you can see them with ls -al work/*/*/

#!/usr/bin/env nextflow
// author: Alex Buerkle <buerkle@uwyo.edu>
nextflow.enable.dsl=2
// to run locally: nextflow run minimal_wrapper_Rwork.nf
// or using SLURM: nextflow run minimal_wrapper_Rwork.nf -c teton.config
// output will be in work/*/*/.command.out and neighboring files

simsRds = Channel.fromPath("/etc/*d.conf") // several arbitrary files in the /etc directory

process iterateFiles{
    input:
    path x

    output:
    stdout

    // each task runs in its own work/ directory, so Rtest.R is
    // located through the implicit projectDir variable
    """
    echo 'Working on $x'
    Rscript --vanilla $projectDir/Rtest.R $x out_$x
    """
}

workflow{
    // workflow uses simsRds channel as input
    iterateFiles(simsRds)
}
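The one-line rnorm(5) file from step 4 simply ignores the two arguments the process passes to it (the input file $x and an output name out_$x). As a purely illustrative sketch, a fuller Rtest.R that actually uses those arguments might look like this:

## Rtest.R -- hypothetical fuller version; the original one-liner was just rnorm(5)
## the Nextflow process calls: Rscript --vanilla Rtest.R <input file> <output file>
args <- commandArgs(trailingOnly = TRUE)
infile  <- args[1]
outfile <- args[2]

## trivial work, as in the original
x <- rnorm(5)

## record which input this task was handed, along with the random draws
writeLines(c(paste("input was:", infile), format(x)), con = outfile)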

Further reading

This short post was meant to get you started with one-step “workflows”. I might post more in the future; meanwhile, you can find more information at the following sources.

  1. Nextflow’s official documentation, with some examples: https://www.nextflow.io/docs/latest/

  2. The lesson “Introduction to Bioinformatics workflows with Nextflow and nf-core”, starting with its Summary and Setup page