Software: Slurm

From arccwiki

The Slurm Workload Manager (https://slurm.schedmd.com) is a powerful and flexible workload manager used to schedule jobs on HPC clusters. ARCC uses Slurm on Teton, Mount Moran, and Loren. All jobs, both batch and interactive, are submitted through Slurm. Slurm provides several user-facing commands, each of which has a Unix man page that should be consulted.

Commands

  • sacct
- Query detailed accounting information about running or completed jobs.
  • salloc
- Request an interactive job for debugging and/or interactive computing. ARCC configures the salloc command to launch an interactive shell on an individual compute node, with your current environment carried over from the current session (except in the dgx partition, where the environment is reinitialized for Ubuntu). This command requires specifying a project account (-A, --account=) and a walltime (-t, --time=).
  • sbatch
- Submit a batch job consisting of a single job or a job array. The usual method is to pass a script file as an argument on the command line. Less commonly, the job can be supplied on standard input and created interactively. We recommend writing the batch job as a script so that it can be referenced later.
  • scancel
- Cancel jobs after submission. Works on pending and running jobs. Normally you provide one or more job IDs to cancel. Alternatively, flags can select the jobs to cancel by account, name, partition, QOS, reservation, or node list. To cancel all tasks of a job array, specify the parent job ID.
  • sinfo
- View the status of the Slurm partitions or nodes. The -R flag lists nodes that are down, drained, or failing, along with the reason.
  • squeue
- View what is running or waiting to run in the job queue. Several options control which jobs are shown and how the output is formatted. ARCC's arccq command is an alternative, and arccjobs provides a summary.
  • sreport
- Obtain usage information as of the last database roll-up (usually around midnight each day). sreport can be used as an interactive tool to see usage of the clusters.
  • srun
- A front-end launcher for job steps, both serial and parallel. srun can be considered an equivalent to mpirun or mpiexec when launching MPI jobs. Each use of srun inside a job creates a job step, and Slurm records accounting information for every step, such as memory use and CPU time. This information is valuable when a job terminates unexpectedly or when historical information is needed.
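Because each srun invocation is recorded as its own job step, sacct can report figures per step. The sketch below wraps one such query in a hypothetical helper named job_steps; the field list is only an example, and `sacct --helpformat` lists every available field.

```bash
#!/bin/bash
# Hypothetical helper: show per-step accounting for one job.
# Usage: job_steps <jobid>
job_steps() {
    if [ -z "${1:-}" ]; then
        echo "usage: job_steps <jobid>" >&2
        return 1
    fi
    # Each srun inside the job appears as its own row (e.g. 12345.0, 12345.1).
    sacct -j "$1" --format=JobID,JobName,Elapsed,MaxRSS,State,ExitCode
}
```

Running `job_steps 12345` on a login node would then print one row for the job and one for each of its steps.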

There are additional commands, but they are not covered here because they are of limited use to general users on our system. Reading the man pages for the Slurm commands can be highly beneficial, and if you have questions about submitting jobs, ARCC encourages you to contact arcc-help@uwyo.edu.


Batch Jobs

Batch jobs are submitted with the sbatch command, either as a job script or as commands entered interactively on its standard input. They then enter the queueing system, wait for resources, and execute when possible. Execution may start immediately if the queue is not full, after a short wait if preemption was opted for, or after an extended wait if the queue is full or running limits have already been reached.

A simple sbatch script to submit a simple "Hello World!" type problem follows:

#!/bin/bash

### Assume this file is named hello.sh

#SBATCH --account=arcc
#SBATCH --time=24:00:00

echo "Hello World!"

The two '#SBATCH' directives above (--account and --time) are required for all job submissions, whether interactive or batch. Change the value of --account to the appropriate project account and the value of --time to an appropriate walltime limit. This is a wall-clock limit, not CPU time. These values can also be supplied directly on the command line at submission. Unless told otherwise, Slurm gives a job one node, one task, and one CPU per task.

Submitting jobs

 $ sbatch hello.sh 

or, with account and time on the command line directly rather than as directives in the shell script:

 $ sbatch --account=arcc --time=24:00:00 hello.sh 
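When the job ID is needed later (for squeue, scancel, or sacct), sbatch's --parsable option prints only the ID, optionally followed by ";cluster", instead of "Submitted batch job <jobid>". A minimal sketch wrapping this in a hypothetical helper named submit_job:

```bash
#!/bin/bash
# Hypothetical helper: submit a job and print only its numeric job ID.
# --parsable makes sbatch print "jobid" or "jobid;cluster".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    echo "${out%%;*}"   # drop the optional ";cluster" suffix
}

# Example use on a login node:
#   jobid=$(submit_job --account=arcc --time=24:00:00 hello.sh)
#   squeue -j "$jobid"
```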

Single Node, Multi-Core Jobs

Slurm allocates resources according to what each job requests, so a batch job that needs multiple cores can be laid out in several ways depending on the application. For a multi-threaded application (for example, one using OpenMP or pthreads), it is best to request a single task and give that task multiple CPUs. The script below requests one node with one task and 4 cores for that task. Assuming an OpenMP application, it sets the number of threads to the value of SLURM_CPUS_PER_TASK, an environment variable that Slurm provides inside the job.

#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./application
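If the same script is sometimes run outside Slurm (for example, for a quick test on a workstation), SLURM_CPUS_PER_TASK will be unset, because Slurm only sets it when --cpus-per-task is requested. A minimal sketch, using a hypothetical helper name, that falls back to one thread in that case:

```bash
#!/bin/bash
# Hypothetical helper: derive the OpenMP thread count from the allocation.
# SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested,
# so fall back to a single thread otherwise.
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "Using $OMP_NUM_THREADS OpenMP thread(s)"
```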

Single Node, Multi-Tasks

This layout suits a multi-task job whose application has its own parallel processing engine or uses MPI, but scales poorly across multiple nodes. The script below runs 4 tasks on a single node, each with one CPU.

#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1

### Assuming MPI application
srun ./application
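To confirm that srun really started 4 separate tasks, each task can print its own rank: Slurm gives every task a distinct SLURM_PROCID (0 to SLURM_NTASKS-1). A small sketch with a hypothetical helper name; outside srun the variables are unset, so defaults are used:

```bash
#!/bin/bash
# Hypothetical helper: label the current task using Slurm's per-task variables.
# Defaults apply when run outside srun.
task_label() {
    echo "task ${SLURM_PROCID:-0} of ${SLURM_NTASKS:-1} on $(hostname)"
}

task_label

# Inside the batch script above, one line per task could be printed with:
#   srun bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on $(hostname)"'
```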

Multi Node, Non-Multithreaded

An application that uses only MPI can often span multiple nodes. Many such programs do not implement multithreading, so the number of CPUs per task should be set to 1. The script below requests 4 nodes with 4 tasks each, for 16 MPI ranks in total.

#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1

### Assuming 'application' is on your $PATH environment variable
srun application
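For a pure-MPI run, a common safeguard is to limit OpenMP-aware libraries (for example, a threaded BLAS) to one thread, so that each rank does not start extra threads on cores assigned to other ranks. The sketch below sets this and shows the rank arithmetic (nodes × tasks per node) with a hypothetical helper name; whether your application's libraries honor OMP_NUM_THREADS is an assumption to check.

```bash
#!/bin/bash
# Pure MPI: keep OpenMP-aware libraries to one thread per rank.
export OMP_NUM_THREADS=1

# Hypothetical helper: total MPI ranks for a nodes x tasks-per-node layout.
total_ranks() {
    local nodes=$1 tasks_per_node=$2
    echo $(( nodes * tasks_per_node ))
}

total_ranks 4 4   # the layout requested above; prints 16
```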

Multi Node, Multithreaded

Some applications are designed to exploit both distributed-memory and shared-memory parallelism, using MPI and threading together. Such jobs often require finding the right balance based on other resource needs, such as memory per task, network bandwidth, and the number of cores per node. The example below requests 4 nodes, each running 4 MPI ranks, with each rank running 4 threads. The total CPU request is therefore 64 (i.e., 4 x 4 x 4).

#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun application -arg1 -arg2
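A quick way to check that the layout Slurm granted matches the 4 x 4 x 4 request is to print it from inside the job. In this sketch (the helper name is hypothetical), SLURM_JOB_NUM_NODES is set in every job, while SLURM_NTASKS_PER_NODE and SLURM_CPUS_PER_TASK are only set when the matching option was requested; none exist outside a job, so each falls back to 1.

```bash
#!/bin/bash
# Hypothetical sanity check for a hybrid MPI + OpenMP job.
report_layout() {
    local nodes=${SLURM_JOB_NUM_NODES:-1}
    local tasks_per_node=${SLURM_NTASKS_PER_NODE:-1}
    local cpus_per_task=${SLURM_CPUS_PER_TASK:-1}
    local total=$(( nodes * tasks_per_node * cpus_per_task ))
    echo "nodes=$nodes ranks_per_node=$tasks_per_node threads_per_rank=$cpus_per_task total_cpus=$total"
}

report_layout
```

Placed before the srun line in the script above, it would be expected to report total_cpus=64.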

Checking status and canceling

You can use the squeue command to display the status of all your jobs:

$ squeue -u $USER

and scancel to delete a particular job from the queue:

$ scancel <jobid>
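Both commands accept filters beyond a single job ID. The sketch below wraps two standard options in hypothetical helper functions: squeue --start reports the expected start time of pending jobs, and scancel -t PENDING restricts cancellation to jobs that have not started yet.

```bash
#!/bin/bash
# Hypothetical helpers built on standard squeue/scancel options.

# Expected start times for your pending jobs.
my_pending() {
    squeue -u "$USER" --start
}

# Cancel all of your pending jobs, leaving running jobs alone.
cancel_my_pending() {
    scancel -u "$USER" -t PENDING
}
```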

Viewing the results

Once your job has completed, its output will be in the directory from which you submitted it. By default, Slurm writes both standard output and standard error to a single file named slurm-<jobid>.out, where <jobid> is the numerical job identifier returned by sbatch. In the Hello World example, the text "Hello World!" will appear in that file. Separate files, or other names, can be requested with the --output and --error options of sbatch.
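Output file names can be set explicitly with sbatch's --output and --error options, in which Slurm expands patterns such as %x (job name) and %j (job ID). A sketch of the Hello World script that writes separate hello.o<jobid> and hello.e<jobid> files; the stderr line is only an illustration:

```bash
#!/bin/bash
#SBATCH --account=arcc
#SBATCH --time=24:00:00
#SBATCH --job-name=hello
# %x is replaced by the job name and %j by the job ID, e.g. hello.o12345
#SBATCH --output=%x.o%j
#SBATCH --error=%x.e%j

echo "Hello World!"              # goes to the --output file
echo "example error message" >&2 # goes to the --error file
```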


Interactive Jobs

Interactive jobs provide shell access to compute nodes for running applications interactively, processing large files, or compiling large applications. They are requested with arguments similar to those for batch jobs. ARCC has configured the clusters so that an interactive allocation gives you a shell on the compute node itself rather than leaving the shell on the login node. Use the salloc command to launch interactive jobs. The following requests one node, one task, and 8 CPUs for 40 minutes (a --time value of 40:00 means minutes:seconds):

 $ salloc --account=arcc --time=40:00 --nodes=1 --ntasks-per-node=1 --cpus-per-task=8 

Interactive jobs are useful for working interactively at the command line and for interactive use of debuggers (ddt, gdb), profilers (map, gprof), or language interpreters such as Python, R, or Julia.
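Because ARCC's salloc places the shell on a compute node, it is easy to lose track of whether a terminal is inside an allocation. A sketch of a check based on SLURM_JOB_ID, which Slurm exports in batch jobs and salloc sessions (the helper name is hypothetical):

```bash
#!/bin/bash
# Hypothetical helper: report whether this shell is inside a Slurm allocation.
in_allocation() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "inside job $SLURM_JOB_ID on $(hostname)"
        return 0
    fi
    echo "not inside a Slurm allocation"
    return 1
}

# When finished, type 'exit' to leave the salloc shell and release the allocation.
```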


Special Hardware / Configuration Requests

Slurm has been configured on ARCC clusters so that jobs can request specific node features and specialized hardware. Some of these, such as GPUs, are requested as Generic Resources (GRES) with the --gres option, while others are requested as node features with the --constraint (-C) option.
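Feature and GRES names are defined by each site, so it helps to see what the nodes advertise before requesting them. A sketch using sinfo format fields (%N node names, %f features, %G generic resources) in a hypothetical helper; the <feature> below is a placeholder, not a real ARCC feature name.

```bash
#!/bin/bash
# Hypothetical helper: list nodes with their features and generic resources.
list_features() {
    sinfo --noheader --format="%N %f %G"
}

# A feature reported this way could then be requested with:
#   #SBATCH --constraint=<feature>
```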

GPU Requests

Request 16 CPUs and 2 GPUs on one node for a 40-minute interactive session:

 $ salloc -A arcc --time=40:00 -N 1 --ntasks-per-node=1 --cpus-per-task=16 --gres=gpu:2 

Request 16 CPUs and 1 GPU of type P100 in a batch script:

#!/bin/bash

#SBATCH --account=arcc
#SBATCH --time=1-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:P100:1

srun gpu_application
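To confirm how many GPUs a job actually received, the job can inspect CUDA_VISIBLE_DEVICES. This sketch assumes the site's GRES configuration exports that variable as a comma-separated list (such as "0,1") for --gres=gpu requests; the helper name is hypothetical.

```bash
#!/bin/bash
# Hypothetical check: count the GPUs visible to this job.
# Assumes CUDA_VISIBLE_DEVICES holds a comma-separated list of device IDs.
count_gpus() {
    local devices=${CUDA_VISIBLE_DEVICES:-}
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    local IFS=,
    local -a ids
    read -ra ids <<< "$devices"
    echo "${#ids[@]}"
}

echo "GPUs visible: $(count_gpus)"
```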