How to cancel a lot of jobs with Slurm

Say you dispatch thousands of jobs with Slurm, but goofed something up and want to cancel some of those jobs.

  • If you want to cancel all of your jobs, use scancel -u username, where username is your system username (e.g. mine is jharri62).

  • Often you will want to be selective, keeping some jobs running while cancelling others. This situation can be handled with the script in the step-by-step guide below (source: cas on Stack Exchange).
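Separately from the script in the guide, scancel can filter on its own by job state, partition, or job name. The following is a minimal sketch rather than part of the original recipe: cancel_my_pending is a hypothetical helper name, and it assumes that cancelling only your pending jobs (leaving running ones alone) is the selection you want.

```shell
# Hypothetical helper (not part of Slurm): cancel only your own PENDING jobs.
# Any extra scancel filters, e.g. --partition=NAME or --name=JOBNAME,
# are passed straight through to scancel.
cancel_my_pending() {
  scancel --user="$(id -u -n)" --state=PENDING "$@"
}
```

For example, cancel_my_pending --partition=debug would restrict the cancellation to your pending jobs in a partition called debug (an assumed partition name).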

For more see:

Step-by-step guide

  1. Make a file to hold the script. Put it somewhere convenient, such as your home directory or a directory of common scripts inside your home directory. One way to create the file is with the nano editor.

  2. Open the file with nano or another editor and paste the following into it:

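    #!/bin/bash
    # Cancel all of your own jobs whose ID is greater than the given job number.
    declare -a jobs=()
    if [ -z "$1" ] ; then
      echo "Minimum Job Number argument is required. Run as '$0 jobnum'"
      exit 1
    fi
    minjobnum="$1"
    myself="$(id -u -n)"
    # List only your own job IDs, one per line, without the header row.
    for j in $(squeue --user="$myself" --noheader --format=%i) ; do
      if [ "$j" -gt "$minjobnum" ] ; then
        jobs+=("$j")
      fi
    done
    # Call scancel only if at least one job matched.
    if [ "${#jobs[@]}" -gt 0 ] ; then
      scancel "${jobs[@]}"
    fi

  3. Make the file executable: chmod u+x followed by the name of the file you created.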

  4. Usage: run the script with one argument, for example 300000, which is the job ID that marks where the jobs you want to keep stop. Any of your jobs with an ID less than or equal to this number will be retained, and your jobs with larger IDs will be cancelled. This will not touch anybody else's jobs. If the directory holding the script is not in your PATH, run it through bash instead: bash, then the path to the script, then 300000.
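Before cancelling anything, it can help to preview which job IDs would be selected. The sketch below reproduces only the comparison step of the script above as a small function that reads IDs from standard input, so it can be tried without touching the queue. The name filter_jobs_above is hypothetical, and skipping non-numeric IDs (such as array tasks written like 123_4) is an assumption added here, not something the original script does.

```shell
# Hypothetical helper: print the job IDs read from stdin that are
# strictly greater than the threshold given as the first argument.
filter_jobs_above() {
  min="$1"
  while read -r id; do
    case "$id" in
      ''|*[!0-9]*) continue ;;  # assumption: skip non-numeric IDs such as 123_4
    esac
    if [ "$id" -gt "$min" ]; then
      printf '%s\n' "$id"
    fi
  done
}

# Dry run against the live queue (commented out so nothing runs by accident):
# squeue --user="$(id -u -n)" --noheader --format=%i | filter_jobs_above 300000
```

If the printed list looks right, the same threshold can then be passed to the cancellation script.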

You can also give an explicit range.

    #!/bin/bash
    for j in $(seq 14423305 14423724) ; do
      scancel "$j"
      echo "$j"
    done
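Because scancel accepts several job IDs in one invocation, the range can also be cancelled with a single call instead of one call per job. This is a sketch under that assumption; cancel_range is a hypothetical function name, and the check on the bounds is an added safeguard rather than part of the loop above.

```shell
# Hypothetical helper: cancel every job ID from $1 to $2 (inclusive)
# with a single scancel call.
cancel_range() {
  first="$1"
  last="$2"
  if [ "$first" -gt "$last" ]; then
    echo "First job ID must not be greater than the last" >&2
    return 1
  fi
  # seq prints one ID per line; the unquoted expansion turns them
  # into separate arguments for scancel.
  scancel $(seq "$first" "$last")
}
```

With this defined, cancel_range 14423305 14423724 would cover the same range as the loop. Note that it also targets any IDs in the range that belong to other users, which scancel will refuse to cancel without the right permissions.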