Transition from Woody with Ubuntu 18.04 and Torque to Woody-NG with AlmaLinux8 and Slurm

Valued Tier3 HPC users of NHR@FAU,

as briefly announced at the HPC Cafe in June, we have now started the transition from Woody with Ubuntu 18.04 and Torque to Woody-NG (“Woody Next Generation”) with AlmaLinux8 as the operating system and Slurm as the batch system.

Woody-NG uses the same operating system as the large NHR systems Alex and Fritz and will provide a similar set of modules.

Woody-NG is a throughput resource for single-core / single-node jobs, with a total of almost 3,000 cores available to our Tier3 (i.e. non-NHR) users. Woody-NG consists of two node types: thin nodes with 4 cores per node (w12xx, w13xx, w14xx, w15xx – already known from Woody) and larger ones with 32 cores per node (w22xx, w23xx). In contrast to Woody, there are no longer any restrictions on single-core jobs. Allocations are now generally granted at the granularity of cores. Cores are assigned exclusively, and you always get 7.75 GB of main memory per allocated core on both the thin and the larger nodes. Network bandwidth and local HDD/SSD storage have to be shared.
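
To put the per-core granularity into numbers: a request for 8 cores, for example, automatically comes with 8 x 7.75 GB = 62 GB of main memory. A hypothetical submission (walltime and script name are placeholders) could look like this:

sbatch --ntasks=8 --time=01:00:00 jobscript.slurm   # 8 exclusive cores => 8 x 7.75 GB = 62 GB memory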

The first nodes are already available in Woody-NG (together with 70 new Intel Icelake-based nodes). The remaining nodes will follow gradually until the end of August to allow for a smooth transition. The w11xx nodes with only 8 GB of main memory will not be moved to Woody-NG but will be switched off at the end of August.

Woody-NG has new login nodes: woody.nhr.fau.de (note the “nhr.fau.de” instead of “rrze.uni-erlangen.de”!) with new SSH host keys.
NHR accounts are not enabled by default, but all FAU/Tier3 accounts are.
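
For example, a first login to one of the new frontends could look like this (the account name is a placeholder); since the host keys are new, your SSH client will ask you to confirm them on first contact:

ssh myaccount@woody.nhr.fau.de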

TinyGPU and TinyFAT are no longer served by the Woody frontend nodes; use tinyx.nhr.fau.de instead to submit jobs to TinyGPU/TinyFAT!

Besides the new batch system, the AlmaLinux8 nodes also use a new /apps directory; thus, the available modules and software versions change. As a consequence, batch scripts have to be updated not only with regard to the batch pragmas but also in the commands section. Recompilation of software is usually required when moving from Woody to Woody-NG because the dependencies have changed too much. If modules are missing, let us know; however, we won't install outdated versions of software.
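
As a rough sketch of the first steps after moving a workflow (the gcc module is just an assumption; check what is actually installed):

module avail          # inspect the modules provided under the new /apps
module load gcc       # assumption: load whatever toolchain your code needs
make clean && make    # rebuild, since the dependencies have changed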

Kind regards
NHR@FAU

A quick primer for the transition from Torque to Slurm

See also https://hpc.fau.de/systems-services/documentation-instructions/batch-processing/ for a more detailed description.

Batch commands (Torque vs. Slurm):

qsub  jobscript.pbs      => sbatch  jobscript.slurm
qstat                    => squeue
qstat -f JOBID           => scontrol show job=JOBID
qdel  JOBID              => scancel  JOBID
qsub -I                  => salloc
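
A typical interaction with Slurm on Woody-NG thus looks like this (JOBID is a placeholder for the ID printed by sbatch):

sbatch jobscript.slurm       # submit the job script
squeue -u $USER              # list your own pending and running jobs
scontrol show job=JOBID      # detailed information about one job
scancel JOBID                # cancel the job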

Batch scripts (Torque vs. Slurm):

#!/bin/bash -l           => #!/bin/bash -l                ## no change
#PBS -lnodes=1:ppn=4     => #SBATCH --ntasks=4
#PBS -lnodes=...:sl/:kl  => #SBATCH --constraint=sl / kl  # specific node type (usage discouraged)
#PBS -lnodes=...:likwid  => #SBATCH --constraint=hwperf --exclusive  # measuring counters with LIKWID
#PBS -lwalltime=12:0:0   => #SBATCH --time=12:0:0
#PBS -N myjob            => #SBATCH --job-name=myjob
--n/a--                  => to get a clean environment, add the following 2 lines
                            #SBATCH --export=NONE
                            unset SLURM_EXPORT_ENV
cd $PBS_O_WORKDIR        => usually not required
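
Putting the lines above together, a minimal Woody-NG job script could look like the following sketch (core count, walltime, job name, module, and program are placeholders, not recommendations):

#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --time=12:0:0
#SBATCH --job-name=myjob
#SBATCH --export=NONE        # together with the next line: clean environment
unset SLURM_EXPORT_ENV

module load intel            # placeholder: load the modules your application needs
./myprog                     # placeholder for your actual command(s)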

Environment variables (Torque vs. Slurm):

$PBS_O_WORKDIR           => $SLURM_SUBMIT_DIR
$PBS_JOBID               => $SLURM_JOB_ID
cat $PBS_NODEFILE        => scontrol show hostnames $SLURM_JOB_NODELIST
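
Inside a job script, the Slurm counterparts are typically used like this (a sketch; the output file name is a placeholder):

cd "$SLURM_SUBMIT_DIR"                                     # directory the job was submitted from (usually not needed)
echo "Job $SLURM_JOB_ID runs on the following hosts:"      # the Slurm job ID
scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts.txt  # one allocated host name per line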

Major module changes

Special behavior that was only available at RRZE has been dropped:

  1. The intel64 module has been renamed to intel and no longer automatically loads intel-mpi and mkl.
  2. intel-mpi/VERSION-intel and intel-mpi/VERSION-gcc have been unified into intel-mpi/VERSION. The compiler is selected via the wrapper name, e.g. mpicc = GCC, mpiicc = Intel, mpif90 = GFortran, mpiifort = Intel (see the sketch below).
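
For example, building an MPI code after these changes might look like the following sketch (module versions are omitted and the program name is a placeholder):

module load intel intel-mpi           # intel no longer pulls in intel-mpi and mkl automatically
mpiicc -O2 -o myprog_intel myprog.c   # Intel compiler via the mpiicc wrapper
mpicc  -O2 -o myprog_gcc   myprog.c   # GCC via the mpicc wrapper of the same intel-mpi module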