Batch Processing

When logging into an HPC system, you are placed on a login node. From there, you can manage your data, set up your workflow, and prepare and submit jobs. The compute nodes cannot be accessed directly; they run under the control of a batch system. The batch system queues jobs into different partitions (depending on the required resources, e.g. runtime) and sorts them according to a priority scheme. A job will run when the required resources become available.

The login nodes are not suitable for computational work, since they are shared among all users. MPI-parallel applications are not allowed on the login nodes (frontends); short parallel test runs must be performed using batch jobs. It is also possible to submit interactive batch jobs that, when started, open a shell on one of the assigned compute nodes and let you run interactive programs there. On most clusters, a number of nodes are reserved during working hours for short test runs with a runtime of less than one hour.

This documentation gives you a general overview of how to use the Slurm batch system and is applicable to all clusters. For more cluster-specific information, consult the respective cluster documentation!

The basic usage of the Slurm batch system is outlined on this page. Information on the following topics is given below:

  • Batch Job Submission
  • Interactive Jobs
  • Options for sbatch/salloc/srun
  • Environment Variables
  • Manage and Control Jobs
  • Job Scripts – General structure

For more detailed information, please refer to the official Slurm documentation and the official Slurm tutorials.

Batch job submission

Apart from short test runs and interactive work, it is recommended to submit your jobs using the command sbatch, which queues the job for later execution once the specified resources become available. You can specify the resources either via command-line options or, more conveniently, directly in your job file using the script directive #SBATCH. The job file is basically a script stating the resource requirements, environment settings, and commands for executing the application. Examples are given below.

The batch file is submitted by using

sbatch [options] job_file

After submission, sbatch outputs the job ID of your job. It can later be used to identify the job and is also available inside job scripts as the environment variable $SLURM_JOB_ID (or its older alias $SLURM_JOBID).
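If you submit jobs from a script, it can be handy to capture the job ID directly. The following is a minimal sketch, assuming bash: the sbatch option --parsable makes sbatch print only the job ID (followed by a semicolon and the cluster name, if present) instead of the usual message. The function name submit_job is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: submit a job file and print only its numeric job ID.
# With --parsable, sbatch prints "<jobid>" or "<jobid>;<cluster>".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    printf '%s\n' "${out%%;*}"   # strip an optional ";<cluster>" suffix
}

# Usage (on a login node):
#   jobid=$(submit_job job_file) && squeue -j "$jobid"
```

The captured ID can then be passed to commands such as squeue -j or scancel. For TinyFat and TinyGPU, sbatch would have to be replaced by the respective wrapper; this sketch assumes, without having verified it, that the wrappers pass --parsable on to sbatch unchanged.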

For TinyFat and TinyGPU, use the respective command wrapper sbatch.tinyfat/sbatch.tinygpu.

Interactive jobs

For interactive work and test runs, the command salloc can be used to get an interactive shell on a compute node. After issuing salloc, do not close your terminal session; wait until the resources become available. You are then logged directly into the first granted compute node. When you close your terminal, the allocation is automatically revoked. There is currently no way to request X11 forwarding to an interactive Slurm job.

To run an interactive job with Slurm on Meggie, Alex and Fritz:

salloc [Options for number of nodes, walltime, etc.]

For TinyFat and TinyGPU, use the respective command wrapper salloc.tinyfat/salloc.tinygpu.
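A minimal sketch of requesting an interactive session through a small shell function, assuming bash. The helper interactive_job and the variable SALLOC_CMD are hypothetical; setting SALLOC_CMD=salloc.tinyfat or SALLOC_CMD=salloc.tinygpu would switch to the wrappers for TinyFat and TinyGPU. Only --nodes and --time are passed here; further options from the list below can be added in the same way.

```shell
#!/bin/bash
# Hypothetical helper: request an interactive allocation.
#   $1 = number of nodes (default 1), $2 = walltime HH:MM:SS (default 01:00:00)
# SALLOC_CMD (hypothetical) selects salloc or a wrapper such as salloc.tinygpu.
interactive_job() {
    local nodes=${1:-1} walltime=${2:-01:00:00}
    "${SALLOC_CMD:-salloc}" --nodes="$nodes" --time="$walltime"
}

# Usage: interactive_job 1 00:30:00
# Keep the terminal open until the allocation is granted; closing it
# revokes the allocation.
```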

Options for sbatch/salloc/srun

The following parameters can be specified as options for sbatch, salloc, and srun, or included in the job script using the script directive #SBATCH:

  • --job-name=<name>: Specifies the name shown by squeue. If the option is omitted, the name of the batch script file is used.
  • --nodes=<number>: Specifies the number of nodes requested. The default value is 1.
  • --ntasks=<number>: Overall number of tasks (MPI processes). Can be omitted if --nodes and --ntasks-per-node are given. The default value is 1.
  • --ntasks-per-node=<number>: Number of tasks (MPI processes) per node.
  • --cpus-per-task=<number>: Number of threads (logical cores) per task. Used for OpenMP or hybrid jobs. NOTE: Beginning with Slurm 22.05, srun does not inherit the --cpus-per-task value requested by salloc or sbatch. If the task(s) need it, it must be requested again in the srun call or set via the SRUN_CPUS_PER_TASK environment variable.
  • --time=HH:MM:SS: Specifies the required wall-clock time (runtime). When the job reaches this walltime, it is sent a TERM signal; if it has not ended a few seconds later, it is sent a KILL signal. If you omit the walltime option, a very short default time is used. Please specify a reasonable runtime, since the scheduler also bases its decisions on this value (shorter jobs are preferred).
  • --mail-user=<address> --mail-type=<type>: You will get an e-mail to <address> depending on the type you have specified. As a type, you can choose BEGIN, END, FAIL, TIME_LIMIT, or ALL. Several types can be combined as a comma-separated list, e.g. --mail-type=END,FAIL.
  • --output=<file_name>: Filename for the standard output stream. This option should normally not be used, since a suitable name is automatically compiled from the job name and the job ID.
  • --error=<file_name>: Filename for the standard error stream. By default, stderr is merged with stdout.
  • --partition=<partition>: Specifies the partition/queue to which the job is submitted. If no partition is given, the default partition of the respective cluster is used (see cluster documentation).
  • --constraint=hwperf: Access to hardware performance counters (e.g. using likwid-perfctr). Only request this feature if you really want to access the hardware performance counters! It is not required for, e.g., likwid-pin or likwid-mpirun.
  • --export=NONE: Only available for sbatch. Environment variables of the submission environment (e.g. PATH set by modules) are not exported to the submitted job. Must be combined with unset SLURM_EXPORT_ENV inside the job script to ensure proper execution of the application (see notes below).
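These options can also be given on the sbatch command line, where they take precedence over the corresponding #SBATCH lines in the job file. The following is a minimal sketch for a hybrid MPI/OpenMP job, assuming bash; the helper submit_hybrid and its argument order are hypothetical. Because of --export=NONE, the job file itself should still contain unset SLURM_EXPORT_ENV and, with Slurm 22.05 or newer, pass the thread count on to srun, e.g. via SRUN_CPUS_PER_TASK.

```shell
#!/bin/bash
# Hypothetical helper: submit a hybrid MPI/OpenMP job with the resources given
# on the command line (these override #SBATCH lines in the job file).
#   $1 = job file, $2 = nodes, $3 = MPI tasks per node,
#   $4 = OpenMP threads per task, $5 = walltime HH:MM:SS
submit_hybrid() {
    local job_file=$1 nodes=$2 tasks_per_node=$3 threads=$4 walltime=$5
    sbatch --job-name="hybrid-${nodes}x${tasks_per_node}x${threads}" \
           --nodes="$nodes" \
           --ntasks-per-node="$tasks_per_node" \
           --cpus-per-task="$threads" \
           --time="$walltime" \
           --export=NONE \
           "$job_file"
}
```

For example, submit_hybrid job_file 2 4 18 02:00:00 would request 2 nodes with 4 MPI processes per node and 18 threads per process for two hours; the numbers should be chosen to match the nodes of the target cluster.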

Many more options are available. For details, refer to the official Slurm documentation for sbatch, salloc or srun.

Environment variables

The scheduler sets environment variables to tell the job which resources were allocated to it. These can also be used in batch scripts. A complete list can be found in the official Slurm documentation. The most useful ones are listed below:

  • $SLURM_JOB_ID: Job ID
  • $SLURM_SUBMIT_DIR: Directory from which the job was submitted
  • $SLURM_JOB_NODELIST: List of nodes on which the job runs
  • $SLURM_JOB_NUM_NODES: Number of nodes allocated to the job
  • $SLURM_CPUS_PER_TASK: Number of cores per task (only set if --cpus-per-task was specified); set $OMP_NUM_THREADS to this value for OpenMP/hybrid applications
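Inside a job script, these variables are typically used to configure the application. A minimal sketch, assuming bash; the function names setup_threads and job_summary are hypothetical, and the fallback values after :- only take effect outside a Slurm job (e.g. when testing the script locally) or when --cpus-per-task was not specified. setup_threads also forwards the per-task CPU count to srun via SRUN_CPUS_PER_TASK, as required since Slurm 22.05 (see the --cpus-per-task option above).

```shell
#!/bin/bash
# Sketch of job-script helpers based on the variables Slurm sets for a job.
# Values after ":-" are fallbacks for runs outside Slurm or when
# --cpus-per-task was not specified (SLURM_CPUS_PER_TASK is then unset).

# One OpenMP thread per CPU allocated to each task.
setup_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
    # Slurm >= 22.05: srun no longer inherits --cpus-per-task from sbatch/salloc,
    # so pass it on explicitly if it was requested.
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        export SRUN_CPUS_PER_TASK="$SLURM_CPUS_PER_TASK"
    fi
}

# One-line summary of the allocation, e.g. for the job's output file.
job_summary() {
    echo "job=${SLURM_JOB_ID:-none} nodes=${SLURM_JOB_NUM_NODES:-0}" \
         "nodelist=${SLURM_JOB_NODELIST:-none} submitdir=${SLURM_SUBMIT_DIR:-$PWD}"
}
```

In a job script, setup_threads would be called before srun ./application, and job_summary writes a single line describing the allocation into the job's output.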

Slurm automatically propagates environment variables that are set in the shell at the time of submission into the Slurm job. This includes the settings of currently loaded modules. To get a clean environment in job scripts, it is recommended to add #SBATCH --export=NONE and unset SLURM_EXPORT_ENV to the job script. Otherwise, the job will inherit some settings from the submitting shell. The additional unsetting of SLURM_EXPORT_ENV inside the job script ensures that all Slurm-specific variables and loaded modules are propagated to the srun call. Specifying export SLURM_EXPORT_ENV=ALL is equivalent to unset SLURM_EXPORT_ENV, and the two can be used interchangeably.

Manage and control jobs

Job and cluster status

  • squeue <options>: Displays status information on queued jobs. Only the user's own jobs are displayed. Useful options:
    • -t running: display currently running jobs
    • -j <JobID>: display information on job <JobID>
  • scontrol show job <JobID>: Displays very detailed information on a job.
  • sinfo: Overview of the cluster status. Shows the available partitions and the availability of nodes.
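For use in scripts, squeue can print a single field without the header line. A minimal sketch, assuming bash: -h (--noheader) suppresses the header, -o %T prints only the job state in long form (e.g. PENDING or RUNNING), and -u restricts the output to your own jobs. The helper names job_state and count_running and the marker NOT_IN_QUEUE are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the state of one job, e.g. PENDING or RUNNING.
#   -h     suppress the header line
#   -o %T  print only the job state (long form)
job_state() {
    local state
    state=$(squeue -h -j "$1" -o %T 2>/dev/null)
    # Jobs that have left the queue produce no output (or an error).
    printf '%s\n' "${state:-NOT_IN_QUEUE}"
}

# Hypothetical helper: count your own currently running jobs.
count_running() {
    squeue -h -u "$USER" -t running | wc -l
}
```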

Editing jobs

If your job is not running yet, it is possible to change details of the resource allocation, e.g. the runtime, using scontrol update timelimit=4:00:00 jobid=<jobid>. For more details and available options, see the official documentation.

Cancelling jobs

To cancel a job and remove it from the queue, use scancel <JobID>. This removes queued as well as running jobs. To cancel all your jobs at once, use scancel -u <your_username>.
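scancel can also select jobs by state or by name instead of by job ID. A minimal sketch, assuming bash: --state=PENDING restricts the cancellation to jobs still waiting in the queue, and --name to jobs with the given job name (as set with --job-name). The helper names cancel_pending and cancel_by_name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: cancel only your jobs that are still waiting in the
# queue; running jobs are left untouched.
cancel_pending() {
    scancel --user="$USER" --state=PENDING
}

# Hypothetical helper: cancel all of your jobs (queued or running) that carry
# the given job name, i.e. the name set with --job-name.
cancel_by_name() {
    scancel --user="$USER" --name="$1"
}
```

For example, cancel_by_name fancyExp would cancel all of your jobs named fancyExp, whether they are queued or already running.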

Job scripts – general structure

A batch or job file is generally a script holding information like resource allocations, environment specifications, and commands to execute an application during the runtime of the job.  The following example shows the general structure of a job script. More detailed examples are available in Job Script Examples.

#!/bin/bash -l                     # Batch script starts with shebang line
#                                  # -l is necessary to initialize modules correctly!
#SBATCH --ntasks=20                # All #SBATCH lines have to follow uninterrupted
#SBATCH --time=01:00:00            # comments start with # and do not count as interruptions
#SBATCH --job-name=fancyExp 
#SBATCH --export=NONE              # do not export environment from submitting shell
                                   # first non-empty non-comment line ends SBATCH options
unset SLURM_EXPORT_ENV             # enable export of environment from this script to srun

module load <modules>              # Load necessary modules

srun ./application [options]       # Execute parallel application with srun

 

Erlangen National High Performance Computing Center (NHR@FAU)
Martensstraße 1
91058 Erlangen
Germany