HPC System: Job Scheduling with Slurm
The Teton cluster uses the Slurm Workload Manager to schedule jobs, control resource access, provide fairshare scheduling, implement preemption, and keep accounting records. All compute activity should be performed from within a Slurm resource allocation (i.e., a job). Teton is a condominium resource, and as such, investors have priority on the resources they have invested in. This is implemented through preemption: jobs not associated with an investment may be requeued when the investor submits jobs. However, if an investor chooses not to enable preemption on their resources, ARCC can disable it and offer next-in-line access instead.
- There are default concurrent-usage limits in place to prevent individual project accounts and users from saturating the cluster at the expense of others. The default limits are listed below. To incentivize investment in the condo system, investors have their limits increased.
- The system uses a fairshare mechanism, which gives projects that run jobs only occasionally priority over those that run jobs continuously. To incentivize investment in the condo system, investors also have their fairshare value increased.
- Finally, individual jobs incur runtime limits based on a study performed around 2014; the maximum walltime for a compute job is 7 days. ARCC is currently evaluating whether independent limits on CPU count and walltime are the optimal operational mode, and is considering concurrent-usage limits based on a relational combination of CPU count, memory, and walltime that would allow more flexibility for different areas of science. There will likely still be an upper limit on individual compute job walltime, since ARCC will not allow unlimited walltime, in part because of possible hardware faults.
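On the cluster itself, a project's current fairshare standing and the priority factors applied to pending jobs can be inspected with standard Slurm tools. A brief sketch (the account name `myproject` is a placeholder; these commands only work on a system running Slurm):

```shell
# Show fairshare usage and share values for a project account
# (replace "myproject" with your actual project account name).
sshare -A myproject

# Show the per-factor priority breakdown (including fairshare)
# for pending jobs in long format.
sprio -l
```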
Required Inputs, Default Values, and Limits
There are default limits set for Slurm jobs. By default, the following are required for submission:
- Walltime limit
- Project account
Additionally, a default submission has the following characteristics:
- Node count: one node (`-N 1`, `--nodes=1`)
- Task count: one task (`-n 1`, `--ntasks=1`)
- Memory amount: 1000 MB RAM per CPU (`--mem-per-cpu=1000`)
These defaults can be changed by requesting a different allocation scheme with the appropriate flags. Please reference our Slurm documentation.
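For example, a minimal batch script that supplies the two required inputs (walltime and project account) and overrides some of the defaults above might look like the following sketch; the account name and program are placeholders:

```shell
#!/bin/bash
#SBATCH --time=1-00:00:00      # walltime limit (required): 1 day
#SBATCH --account=myproject    # project account (required; placeholder name)
#SBATCH --nodes=1              # default: one node
#SBATCH --ntasks=4             # override default of one task
#SBATCH --mem-per-cpu=2000     # override default of 1000 MB per CPU

srun ./my_program              # placeholder; launch work inside the allocation
```

Submit the script with `sbatch myscript.sh`; Slurm reads the `#SBATCH` directives as if they were command-line flags.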
On Mount Moran, the default limits were expressed as the number of cores each project account could use concurrently, and investors received an increased concurrent-core limit. To facilitate more flexible scheduling for all research groups, ARCC is looking at implementing limits based on concurrent usage of cores, memory, and job walltime. This will be defined in the near future and will be subject to FAC review.
The Slurm configuration on Teton is fairly involved in order to accommodate the layout of hardware, investors, and runtime limits. The following tables represent the partitions on Teton. Some require a QoS, which is auto-assigned during job submission. The tables list Slurm allocatable units rather than hardware units.
| Partition | Max Walltime | Node Cnt | Core Cnt | Thds / Core | CPUs | Mem (MB) / Node | Req'd QoS |
|-----------|--------------|----------|----------|-------------|------|-----------------|-----------|
| Moran     | 7-00:00:00   | 284      | 4544     | 1           | 4544 | 64000 or 128000 | N/A       |
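The values in the table can also be checked directly on the cluster. A sketch using `sinfo` with a custom format string (assuming the partition name is `moran` as listed above; this only runs on a system with Slurm installed):

```shell
# Show partition name, time limit, node count, CPUs per node,
# and memory per node for the moran partition.
sinfo -p moran -o "%P %l %D %c %m"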
Investor partitions are likely to be quite heterogeneous and may contain a mix of hardware; this is indicated below where appropriate. They require a special QoS for access.
| Partition | Max Walltime | Node Cnt | Core Cnt | Thds / Core | Mem (MB) / Node | Req'd QoS | Preemption | Owner |
|-----------|--------------|----------|----------|-------------|-----------------|-----------|------------|-------|
Special partitions require access to be given directly to user accounts or project accounts and likely require additional approval for access.
| Partition | Max Walltime | Node Cnt | Core Cnt | Thds / Core | Mem (MB) / Node | Owner          | Notes                                  |
|-----------|--------------|----------|----------|-------------|-----------------|----------------|----------------------------------------|
| dgx       | 7-00:00:00   | 2        | 40       | 2           | 512000          | EvolvingAI Lab | NVIDIA V100 with NVLink, Ubuntu 16.04  |
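Once access has been granted, targeting a special or investor partition is a matter of naming the partition and, where required, the QoS in the job script. A sketch (the account name and QoS name are placeholders; use the QoS actually granted to your account):

```shell
#!/bin/bash
#SBATCH --time=2-00:00:00      # walltime, within the 7-day maximum
#SBATCH --account=myproject    # placeholder project account
#SBATCH --partition=dgx        # special partition from the table above
#SBATCH --qos=dgx-qos          # placeholder QoS name

srun ./gpu_program             # placeholder program
```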
Generally, to run a job on the cluster you will need a walltime limit, a project account, and a resource request (nodes, tasks, and memory), as described above.
A handy migration reference comparing MOAB/Torque commands to Slurm commands can be found on the Slurm home site: Batch System Rosetta Stone.
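As a quick illustration of the most common mappings, the following sketch wraps a few well-known Torque-to-Slurm command equivalents in a small helper function (the function name is hypothetical; the full Rosetta Stone covers many more commands and options):

```shell
# Hypothetical helper: map a few common Torque commands to Slurm equivalents.
torque_to_slurm() {
    case "$1" in
        qsub)  echo "sbatch"  ;;  # submit a batch job
        qstat) echo "squeue"  ;;  # view the job queue
        qdel)  echo "scancel" ;;  # cancel a job
        *)     echo "unknown" ;;
    esac
}

torque_to_slurm qsub   # prints "sbatch"
```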
For further details on using Slurm, see Slurm.