Job script examples – Slurm

In the following, example batch scripts for different types of jobs are given. Note that all of them have to be adapted to your specific application and the target system. You can find more examples in the cluster-specific documentation.

  • Parallel jobs
    • Jobs with OpenMP
    • Jobs with MPI
    • Hybrid MPI/OpenMP jobs, single-node
    • Hybrid MPI/OpenMP jobs, multi-node
  • GPU jobs
    • Single-node jobs

Parallel jobs

When submitting parallel jobs, a few rules have to be understood and followed. In general, they depend on the type of parallelization and on the target architecture.

Job script with OpenMP

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.

OpenMP applications can only make use of a single node; therefore, --nodes=1 and --ntasks-per-node=1 are required. The number of allocated CPUs (--cpus-per-task), and thus the number of OpenMP threads, depends on the system and has to be adjusted accordingly.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.
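
If you want to enable this pinning, a minimal sketch of the corresponding exports (to be added to the job script, e.g. right after loading the modules and before starting the application):

# pin each OpenMP thread to one physical core
export OMP_PLACES=cores
export OMP_PROC_BIND=true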

#!/bin/bash -l

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load <modules>

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./openmp_application

Job script with MPI

For pure MPI jobs, you have two different options for defining the allocated resources, depending on the needs of your application and the intended distribution of tasks across the compute nodes.

Option 1: define the total number of MPI processes to be started via --ntasks. Adjust the value according to the number of available CPUs per node on the respective cluster. Beware that this may lead to load imbalance if --ntasks does not correspond to “full” nodes!

#!/bin/bash -l

#SBATCH --ntasks=64
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load <modules>

srun ./mpi_application
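
Assuming the script above is saved under a name of your choice, for example mpi_job.sh (hypothetical file name), it is submitted to the batch system with sbatch:

sbatch mpi_job.sh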

Option 2: define the number of MPI processes to be started via the number of nodes (--nodes) and the number of tasks per node (--ntasks-per-node); the total number of MPI processes is the product of the two (in the example below, 4 × 20 = 80). Use this formulation to control the distribution of MPI processes across the nodes more precisely, especially if you are not using one process per physical core. --ntasks-per-node has to be adapted to the hardware of the specific cluster.

#!/bin/bash -l

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=20
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load <modules>

srun ./mpi_application

Job script for hybrid MPI/OpenMP, single-node

Adjust the number of MPI processes via --ntasks and the corresponding number of OpenMP threads per MPI process via --cpus-per-task according to the available hardware configuration and the needs of your application.

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l

#SBATCH --ntasks=2
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# for Slurm version >22.05: cpus-per-task has to be set again for srun
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun ./hybrid_application
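
With only --ntasks and --cpus-per-task given, Slurm is in principle free to spread the tasks over several nodes. If you want to enforce the single-node placement explicitly, you can additionally request it (a sketch; depending on the cluster defaults this may be redundant):

#SBATCH --nodes=1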

Job script for hybrid MPI/OpenMP, multi-node

Adjust the number of MPI processes via --nodes and --ntasks-per-node and the corresponding number of OpenMP threads per MPI process via --cpus-per-task according to the available hardware configuration and the needs of your application.

OpenMP is not Slurm-aware, so you need to specify OMP_NUM_THREADS in your script. It should match the number of cores requested via --cpus-per-task.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# for Slurm version >22.05: cpus-per-task has to be set again for srun
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun ./hybrid_application
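
To check how the MPI processes are actually placed and bound, you can ask srun to report the binding at job start; --cpu-bind=verbose is a standard Slurm option and does not change the placement itself:

srun --cpu-bind=verbose ./hybrid_application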

GPU jobs

GPUs are only available in TinyGPU and Alex. To submit a job to one of those clusters, you have to specify the number of GPUs that should be allocated to your job. This is done via the Slurm option --gres=gpu:<number_gpus_per_node>. The number of available GPUs per node varies between 4 and 8, depending on the cluster and node configuration.

Single-node job

The number of requested GPUs must be less than or equal to the number of available GPUs per node. In this case, the compute nodes are not allocated exclusively but are shared among several jobs; the GPUs themselves are always granted exclusively. The corresponding share of the host system's resources (CPU cores, RAM) is allocated automatically.

If you are using hybrid OpenMP/MPI or pure MPI code, adjust --cpus-per-task or --ntasks according to the above examples. Do not specify more cores/tasks than available for the number of requested GPUs!

#!/bin/bash -l
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
srun ./cuda_application
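
As a sketch of the combination mentioned above (pure MPI with one task per GPU; the values below and the binary name cuda_mpi_application are illustrative and must match the node type you actually request):

#!/bin/bash -l
#SBATCH --gres=gpu:4
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
srun ./cuda_mpi_application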
