When logging into an HPC system, you are placed on a login node. From there, you can manage your data, set up your workflow, and prepare and submit jobs. The compute nodes cannot be accessed directly; they run under the control of a batch system. The batch system queues jobs into different partitions (depending on the required resources, e.g. runtime) and sorts them according to a priority scheme. A job will run when the required resources become available.
The login nodes are not suitable for computational work, since they are shared among all users. We do not allow MPI-parallel applications on the frontends; short parallel test runs must be performed using batch jobs. It is also possible to submit interactive batch jobs that, when started, open a shell on one of the assigned compute nodes and let you run interactive programs there. On most clusters, a number of nodes are reserved during working hours for short test runs with less than one hour of runtime.
The basic usage of the Slurm batch system is outlined on this page. Information on the following topics is given below:
- Batch Job Submission
- Interactive Jobs
- Options for sbatch/salloc/srun
- Environment Variables
- Manage and Control Jobs
- Job Scripts – General structure
Apart from short test runs and interactive work, it is recommended to submit your jobs using the command `sbatch`. It queues the job for later execution when the specified resources become available. You can either specify the resources via command-line options or, more conveniently, directly in your job file using the script directive `#SBATCH`. The job file is basically a script stating the resource requirements, environment settings, and commands for executing the application. Examples are given below.

The job file is submitted with

```
sbatch [options] job_file
```

`sbatch` will output the job ID of your job. It can later be used for identification purposes and is also available as the environment variable `$SLURM_JOBID` in job scripts.
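As an illustration, the submission step could look like the following sketch. The script name `job.sh` is a placeholder; `--parsable` makes `sbatch` print only the job ID, which is convenient for use in shell scripts:

```shell
# Submit a job script and capture its ID (job.sh is a placeholder)
jobid=$(sbatch --parsable job.sh)   # --parsable prints just the job ID
echo "Submitted job $jobid"
squeue --job "$jobid"               # check the status of this job
```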
For TinyFat and TinyGPU, use the respective `sbatch` command wrapper instead.
For interactive work and test runs, the command `salloc` can be used to get an interactive shell on a compute node. After issuing `salloc`, do not close your terminal session; wait until the resources become available. You will then automatically be logged into the first granted compute node. When you close your terminal, the allocation is automatically revoked. There is currently no way to request X11 forwarding to an interactive Slurm job.
To run an interactive job with Slurm on Meggie, Alex, and Fritz:

```
salloc [options for number of nodes, walltime, etc.]
```
For TinyFat and TinyGPU, use the respective `salloc` command wrapper instead.
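A typical interactive session might look like the following sketch; the node count, walltime, and application name are examples, not cluster-specific values:

```shell
# Request one node for 30 minutes interactively (values are examples)
salloc --nodes=1 --time=00:30:00
# ... wait; once the allocation is granted, a shell opens on the
# first compute node ...
srun ./application        # run a parallel program inside the allocation
exit                      # leave the node and release the allocation
```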
The following parameters can be specified as options for `sbatch`, `salloc`, and `srun`, or included in the job script using the script directive `#SBATCH`:

| Option | Description |
|---|---|
| `--job-name=<name>` | Specifies the job name, which is shown, e.g., in the `squeue` output. |
| `--nodes=<count>` | Specifies the number of nodes requested. Default value is 1. |
| `--ntasks=<count>` | Overall number of tasks (MPI processes). Can be omitted if `--nodes` and `--ntasks-per-node` are given. Default value is 1. |
| `--ntasks-per-node=<count>` | Number of tasks (MPI processes) per node. |
| `--cpus-per-task=<count>` | Number of threads (logical cores) per task. Used for OpenMP or hybrid jobs. NOTE: Beginning with Slurm 22.05, `srun` will not inherit the `--cpus-per-task` value requested by `salloc` or `sbatch`. It must be requested again with the call to `srun` or set with the `SRUN_CPUS_PER_TASK` environment variable if desired for the task(s). |
| `--time=HH:MM:SS` | Specifies the required wall clock time (runtime). When the job reaches the walltime given here, it will be terminated by Slurm. |
| `--mail-user=<address>` | You will get an e-mail to the given address on job events (see also `--mail-type`). |
| `--output=<file>` | Filename for the standard output stream. This should not be used, since a suitable name is automatically compiled from the job name and the job ID. |
| `--error=<file>` | Filename for the standard error stream. By default, stderr is merged with stdout. |
| `--partition=<partition>` | Specifies the partition/queue to which the job is submitted. If no partition is given, the default partition of the respective cluster is used (see cluster documentation). |
| `--constraint=hwperf` | Access to hardware performance counters (e.g. using likwid). Only available for … |
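To illustrate how these options combine, here is a sketch of a hybrid MPI+OpenMP request; all values and the program name `hybrid_app` are examples. Note the explicit handling of `--cpus-per-task` for Slurm 22.05 and newer:

```shell
#!/bin/bash -l
#SBATCH --job-name=hybrid_test      # example name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4         # 4 MPI processes per node
#SBATCH --cpus-per-task=8           # 8 OpenMP threads per process
#SBATCH --time=02:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Since Slurm 22.05, srun no longer inherits --cpus-per-task from sbatch:
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun ./hybrid_app                   # placeholder application
```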
The scheduler typically sets environment variables that tell the job which resources were allocated to it. These can also be used in batch scripts. A complete list can be found in the official Slurm documentation. The most useful ones are given below:
| Variable | Description |
|---|---|
| `SLURM_SUBMIT_DIR` | Directory from which the job was submitted. |
| `SLURM_JOB_NODELIST` | List of nodes on which the job runs. |
| `SLURM_JOB_NUM_NODES` | Number of nodes allocated to the job. |
| `SLURM_CPUS_PER_TASK` | Number of cores per task; only set if `--cpus-per-task` is specified. |
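These variables can be used directly in a job script, for example to log the allocation. In the sketch below, the fallback values after `:-` are purely illustrative for running outside a real job; inside a job, Slurm sets the variables itself:

```shell
# Inside a job, Slurm sets these; the fallbacks are illustrative only
SLURM_SUBMIT_DIR=${SLURM_SUBMIT_DIR:-/home/user/project}
SLURM_JOB_NUM_NODES=${SLURM_JOB_NUM_NODES:-2}
echo "Submitted from: $SLURM_SUBMIT_DIR"
echo "Nodes allocated: $SLURM_JOB_NUM_NODES"
cd "$SLURM_SUBMIT_DIR" 2>/dev/null || true   # batch jobs already start here
```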
Slurm automatically propagates environment variables that are set in the shell at the time of submission into the Slurm job. This includes currently loaded module files. To have a clean environment in job scripts, it is recommended to add `#SBATCH --export=NONE` and `unset SLURM_EXPORT_ENV` to the job script. Otherwise, the job will inherit some settings from the submitting shell. The additional unsetting of `SLURM_EXPORT_ENV` inside the job script ensures propagation of all Slurm-specific variables and loaded modules to the `srun` call. Specifying `export SLURM_EXPORT_ENV=ALL` is equivalent to `unset SLURM_EXPORT_ENV` and can be used interchangeably.
Job and cluster status
| Command | Description |
|---|---|
| `squeue` | Displays status information on queued jobs. Only the user's own jobs are displayed. |
| `scontrol show job <jobid>` | Displays very detailed information on jobs. |
| `sinfo` | Overview of cluster status. Shows available partitions and availability of nodes. |
If your job is not running yet, it is possible to change details of the resource allocation, e.g. the runtime, with `scontrol update timelimit=4:00:00 jobid=<jobid>`. For more details and available options, see the official documentation.

To cancel a job and remove it from the queue, use `scancel <jobid>`. It removes queued as well as running jobs. To cancel all your jobs at once, use `scancel -u <your_username>`.
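Putting the management commands together, a typical monitoring workflow might look like this sketch, where `<jobid>` is a placeholder for a real job ID:

```shell
squeue -u "$USER"                                # list your own jobs
scontrol show job <jobid>                        # detailed info on one job
scontrol update timelimit=4:00:00 jobid=<jobid>  # extend runtime of a queued job
scancel <jobid>                                  # cancel a single job
scancel -u "$USER"                               # cancel all of your jobs
```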
A batch or job file is generally a script holding information like resource allocations, environment specifications, and commands to execute an application during the runtime of the job. The following example shows the general structure of a job script. More detailed examples are available in Job Script Examples.
```bash
#!/bin/bash -l
# Batch script starts with shebang line
# -l is necessary to initialize modules correctly!

#SBATCH --ntasks=20           # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00       # comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp
#SBATCH --export=NONE         # do not export environment from submitting shell

# first non-empty non-comment line ends SBATCH options
unset SLURM_EXPORT_ENV        # enable export of environment from this script to srun

module load <modules>         # Load necessary modules

srun ./application [options]  # Execute parallel application with srun
```