How to get started with Nextflow for workflows on teton
Nextflow is a powerful workflow language that can run large sets of processes through SLURM and chain multiple steps together. It is commonly used for bioinformatics, but it is relatively straightforward to use for other high-throughput computing tasks as well.
Instructions for two examples
Simple example
Log in to teton.
Load the Nextflow module: module load nextflow
Create the first text file below and call it minimal_wrapper.nf. You can then run it with the command nextflow run minimal_wrapper.nf.
If you want to run the same workflow through SLURM, rather than on the local computer, create the second file below and call it teton.config. Then you can distribute the jobs to the cluster with nextflow run minimal_wrapper.nf -c teton.config. Please adjust the account name in teton.config to match yours, otherwise the jobs will not run.
Output will be in the folder work/. This includes hidden files for standard output and standard error from the jobs; you can see these with ls -al work/*/*/
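For reference, each task's hash-named directory inside work/ holds hidden files like these (directory names will differ from run to run; the listing is abbreviated):

$ ls -al work/*/*/
.command.sh    # the shell script Nextflow generated for the task
.command.out   # the task's standard output
.command.err   # the task's standard error
.exitcode      # the task's exit status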
#!/usr/bin/env nextflow
// author: Alex Buerkle <buerkle@uwyo.edu>
nextflow.enable.dsl=2
// to run locally: nextflow run minimal_wrapper.nf
// to use SLURM:   nextflow run minimal_wrapper.nf -c teton.config
// output will be in work/*/*/.command.out and neighboring files

simsRds = Channel.fromPath( "/etc/*d.conf" ) // several arbitrary files in the /etc/ directory

process iterateFiles {
    input:
    path x

    output:
    stdout

    // \$HOSTNAME is escaped so that bash, not Nextflow, expands it
    """
    echo 'made it here $x \$HOSTNAME'
    """
}

workflow {
    // workflow uses channels as input by default
    iterateFiles(simsRds)
}
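If you want to see each task's standard output in the terminal rather than digging through work/, a small variation on the workflow block above appends .view() to the process call:

workflow {
    // print each task's stdout as it completes
    iterateFiles(simsRds).view()
}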
// for more on building a config for an HPC, see: https://nf-co.re/usage/tutorials/step_by_step_institutional_profile

params {
    config_profile_description = 'Teton cluster profile'
    max_memory = 4.TB
    max_time   = 168.h
    max_cpus   = 72
}

process {
    executor = 'slurm'
    queue = 'teton'
    beforeScript = 'module load swset/2018.05 gcc/7.3.0 r/4.0.5-py27 gsl/2.5 openblas' // gsl and openblas are needed by gemma, called from R
    // put in whatever options are accepted by the system,
    // including things like: --partition=teton-hugemem --mem=10000 # 10 GB
    clusterOptions = '--account=modelscape --time=00:05:00 --cpus-per-task=1 --ntasks-per-node=1 --mem-per-cpu=1G'
}

singularity {
    enabled = true
    autoMounts = true
    //cacheDir = '/project/evolgen/singularity_containers'
}

executor {
    queueSize = 400
    submitRateLimit = '200/2min'
}
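The settings in the process scope above apply to every process. If one process later needs different resources, Nextflow's process selectors can override the defaults; here is a sketch using the iterateFiles process from the example (the larger time and memory values are arbitrary placeholders):

process {
    // override options only for the named process; other processes keep the defaults
    withName: 'iterateFiles' {
        clusterOptions = '--account=modelscape --time=00:30:00 --cpus-per-task=1 --mem-per-cpu=4G'
    }
}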
Example of submitting many R jobs
Log in to teton.
Load the Nextflow module: module load nextflow
Create a file from the first text window below and save it as minimal_wrapper_Rwork.nf.
Create a second file containing the simple R command rnorm(5), to give R something to do, and call it Rtest.R.
To run the collection of jobs, reusing teton.config from the simple example above: nextflow run minimal_wrapper_Rwork.nf -c teton.config
Output will be in the folder work/. This includes hidden files for standard output and standard error from the jobs; you can see these with ls -al work/*/*/
#!/usr/bin/env nextflow
// author: Alex Buerkle <buerkle@uwyo.edu>
nextflow.enable.dsl=2
// to run locally: nextflow run minimal_wrapper_Rwork.nf
// or using SLURM: nextflow run minimal_wrapper_Rwork.nf -c teton.config
// output will be in work/*/*/.command.out and neighboring files

simsRds = Channel.fromPath( "/etc/*d.conf" ) // several arbitrary files in the /etc directory

process iterateFiles {
    input:
    path x

    output:
    stdout

    // tasks run in their own work directories, so refer to Rtest.R by its
    // location next to this script ($projectDir), not by a bare relative path
    """
    echo 'Working on $x'
    Rscript --vanilla $projectDir/Rtest.R $x out_$x
    """
}

workflow {
    // workflow uses simsRds channel as input
    iterateFiles(simsRds)
}
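Both examples above are single-step, but chaining steps is just a matter of feeding one process's output channel into the next process. Here is a minimal sketch with hypothetical processes stepOne and stepTwo (not part of the files above):

process stepOne {
    input:
    path x

    output:
    path "one_${x}"

    """
    cp $x one_${x}
    """
}

process stepTwo {
    input:
    path y

    output:
    stdout

    """
    wc -l $y
    """
}

workflow {
    files = Channel.fromPath( "/etc/*d.conf" )
    // stepOne's output channel becomes stepTwo's input
    stepTwo(stepOne(files)).view()
}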
Further reading
This short post was meant to get you started with one-step “workflows”. I might post more in the future, but meanwhile you can find more information at the following sources.
Nextflow’s official documentation with some examples: https://www.nextflow.io/docs/latest/
Introduction to Bioinformatics workflows with Nextflow and nf-core, a Carpentries incubator lesson