How to cancel a lot of jobs with Slurm

Say you dispatched thousands of jobs with Slurm, but goofed something up and want to cancel some of those jobs.

  • If you want to cancel all of your jobs, use scancel -u username, where username is your system username (e.g. jharri62 is my username).

  • Often you will want to be selective, keeping some jobs running while cancelling others. The script below handles this case (source: user cas on Stack Exchange).

For more see: https://www.rc.fas.harvard.edu/resources/documentation/convenient-slurm-commands/
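For simple selections you may not need a script at all: besides -u, scancel can itself restrict cancellation to jobs in a given state (--state) or with a given job name (--name). The sketch below shows both. The run helper and the DRY_RUN variable are hypothetical additions, there only so the commands are printed instead of executed unless you set DRY_RUN=0, and myjob is a placeholder job name.

```shell
#!/bin/bash
# Hypothetical safety switch: only print commands unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "0" ] ; then
        "$@"            # really execute the command
    else
        echo "$*"       # only show what would be executed
    fi
}

me="$(id -u -n)"

# Cancel only your jobs that are still waiting in the queue.
run scancel --user="$me" --state=PENDING

# Cancel only your jobs with a given name ("myjob" is a placeholder).
run scancel --user="$me" --name=myjob
```

Run it once as is to check the printed commands, then again as DRY_RUN=0 bash yourscript.sh (yourscript.sh being whatever you named the file) to actually cancel.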

Step-by-step guide

  1. Create a file for the script. Put it somewhere convenient, such as your home directory or a directory of common scripts inside it. One way to create the file is to run nano yourfilename.sh

  2. Open the file with nano or another editor and paste the following into it:

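    #!/bin/bash
    # Cancel all of your own jobs whose ID is greater than the given job number.
    declare -a jobs=()

    # The cutoff job number is required and must be a plain integer.
    if ! [[ "$1" =~ ^[0-9]+$ ]] ; then
        echo "Minimum Job Number argument is required. Run as '$0 jobnum'" >&2
        exit 1
    fi

    minjobnum="$1"
    myself="$(id -u -n)"

    # Read your job IDs (no header). Array tasks look like 1234_7, so compare
    # only the base ID before the "_".
    while read -r j ; do
        if [ "${j%%_*}" -gt "$minjobnum" ] ; then
            jobs+=("$j")
        fi
    done < <(squeue --user="$myself" --noheader --format=%i)

    # scancel errors out when given no job IDs, so only call it if something matched.
    if [ "${#jobs[@]}" -gt 0 ] ; then
        scancel "${jobs[@]}"
    else
        echo "No jobs above $minjobnum to cancel."
    fi
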
  3. Make the file executable: chmod u+x yourfilename.sh

  4. Run it as ./yourfilename.sh 300000, where 300000 is the job ID that marks the cutoff. Any of your jobs with an ID less than or equal to this number are kept; your jobs with a larger ID are cancelled. It will not touch anybody else's jobs, because it only looks at jobs owned by you. If the script's directory is in your PATH you can drop the ./ and run yourfilename.sh 300000; bash yourfilename.sh 300000 also works, even without the executable bit from step 3.
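Before cancelling anything, it can help to see which jobs a given cutoff would select. A minimal sketch, assuming the same squeue --noheader --format=%i listing the script reads: the hypothetical helper jobs_above applies the same "greater than the cutoff" test to IDs read on stdin and just prints the matches. Array tasks such as 1234_7 are compared by their base ID but printed unchanged, since scancel accepts that form.

```shell
#!/bin/bash
# Hypothetical helper: read job IDs (one per line) on stdin and print those whose
# ID is greater than the cutoff in $1, i.e. the jobs that would be cancelled.
# Array tasks such as 1234_7 are compared by their base ID (the part before "_")
# but printed unchanged, because scancel accepts the 1234_7 form directly.
jobs_above() {
    local minjobnum="$1" j
    while read -r j || [ -n "$j" ] ; do
        [ -z "$j" ] && continue          # skip blank lines
        if [ "${j%%_*}" -gt "$minjobnum" ] ; then
            printf '%s\n' "$j"
        fi
    done
}

# Example (requires Slurm): preview first, then cancel only if something matched.
#   squeue --user="$(id -u -n)" --noheader --format=%i | jobs_above 300000
#   mapfile -t ids < <(squeue --user="$(id -u -n)" --noheader --format=%i | jobs_above 300000)
#   [ "${#ids[@]}" -gt 0 ] && scancel "${ids[@]}"
```

Splitting the selection from the scancel call means the same filter can be checked by eye (or tested) without cancelling anything.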


You can also cancel every job in an explicit range of IDs by looping over them:

#!/bin/bash
# Cancel every job ID from 14423305 to 14423724 (inclusive), printing each one
for j in $(seq 14423305 14423724) ; do
    scancel "$j"
    echo "$j"
done
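The loop above runs scancel once per number in the range, including IDs that have already finished or belong to someone else, and those calls typically just print an error. A sketch of an alternative, assuming you only want to touch your own jobs that are still in the queue: keep the IDs from squeue whose base ID falls in the range, then pass them all to a single scancel call (scancel accepts several job IDs at once). jobs_in_range is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read job IDs (one per line) on stdin and print those whose
# base ID lies in the inclusive range [$1, $2]; 1234_7 is tested as 1234.
jobs_in_range() {
    local lo="$1" hi="$2" j base
    while read -r j || [ -n "$j" ] ; do
        [ -z "$j" ] && continue          # skip blank lines
        base="${j%%_*}"
        if [ "$base" -ge "$lo" ] && [ "$base" -le "$hi" ] ; then
            printf '%s\n' "$j"
        fi
    done
}

# Example (requires Slurm): cancel your own jobs in the range with one scancel call.
#   mapfile -t ids < <(squeue --user="$(id -u -n)" --noheader --format=%i | jobs_in_range 14423305 14423724)
#   [ "${#ids[@]}" -gt 0 ] && scancel "${ids[@]}"
```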