Transition from Woody with Ubuntu 18.04 and Torque to Woody-NG with AlmaLinux8 and Slurm
Valued Tier3 HPC users of NHR@FAU,
as briefly announced in the HPC Cafe in June, we have now started the transition from Woody with Ubuntu 18.04 and Torque to Woody-NG ("Woody Next Generation") with AlmaLinux8 as the operating system and Slurm as the batch system.
Woody-NG uses the same operating system as the large NHR systems Alex and Fritz and will offer similar modules.
Woody-NG is a throughput resource for single-core / single-node jobs with a total of almost 3,000 cores available to our Tier3 (i.e. non-NHR) users. Woody-NG consists of two node types: thin nodes with 4 cores per node (w12xx, w13xx, w14xx, w15xx – already known from Woody) and larger ones with 32 cores per node (w22xx, w23xx). In contrast to Woody, there are no longer any restrictions for single-core jobs. Allocations are now generally granted at the granularity of cores. Cores are assigned exclusively, and you always get 7.75 GB of main memory per allocated core, on both the thin and the larger nodes. Network bandwidth and local HDD/SSD storage still have to be shared.
The first nodes are already available in Woody-NG (together with 70 new Intel Ice Lake-based nodes). The remaining nodes will follow gradually until the end of August to allow a smooth transition. The w11xx nodes with only 8 GB of main memory will not be moved to Woody-NG but will be turned off at the end of August.
Woody-NG has new login nodes:
woody.nhr.fau.de (keep the “nhr.fau.de” instead of “rrze.uni-erlangen.de” in mind!) with new SSH host keys.
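Because both the host name and the SSH host keys changed, a stale cached key for the old frontend name can be removed before the first login. A sketch (replace "user" with your HPC account name):

```shell
# Drop the cached host key of the old Woody frontend, if present:
ssh-keygen -R woody.rrze.uni-erlangen.de
# Connect to the new login node; verify and accept the new host key on first login:
ssh user@woody.nhr.fau.de
```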
NHR accounts are not enabled by default, but all FAU/Tier3 accounts are.
TinyGPU and TinyFAT are no longer served by the Woody frontend nodes; use tinyx.nhr.fau.de instead to submit jobs to TinyGPU/TinyFAT!
Besides the new batch system, the AlmaLinux8 nodes also use a new /apps directory; thus, the available modules and software versions change. As a consequence, batch scripts not only have to be updated with regard to the batch pragmas but also in the commands section. Recompilation of software is usually required when moving from Woody to Woody-NG, as the dependencies have changed too much. If modules are missing, let us know; however, we won't install outdated versions of software.
A quick primer for the transition from Torque to Slurm
See also https://hpc.fau.de/systems-services/documentation-instructions/batch-processing/ for a more detailed description.
Batch commands (Torque vs. Slurm):
qsub jobscript.pbs   =>  sbatch jobscript.slurm
qstat                =>  squeue
qstat -f JOBID       =>  scontrol show job=JOBID
qdel JOBID           =>  scancel JOBID
qsub -I              =>  salloc
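In day-to-day use, the job ID printed by sbatch can be captured directly for the follow-up commands. A sketch (the jobscript name is a placeholder):

```shell
# --parsable makes sbatch print only the job ID, which is handy in scripts:
JOBID=$(sbatch --parsable jobscript.slurm)
squeue -j "$JOBID"               # queue status of this job only
scontrol show job="$JOBID"       # detailed job information
# scancel "$JOBID"               # uncomment to cancel the job
```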
Batch scripts (Torque vs. Slurm):
#!/bin/bash -l           =>  #!/bin/bash -l   ## no change
#PBS -lnodes=1:ppn=4     =>  #SBATCH --ntasks=4
#PBS -lnodes=...:sl/:kl  =>  #SBATCH --constraint=sl / kl             # specific node type (usage discouraged)
#PBS -lnodes=...:likwid  =>  #SBATCH --constraint=hwperf --exclusive  # measuring counters with LIKWID
#PBS -lwalltime=12:0:0   =>  #SBATCH --time=12:0:0
#PBS -N myjob            =>  #SBATCH --job-name=myjob
--n/a--                  =>  to get a clean environment, add the following 2 lines:
                             #SBATCH --export=NONE
                             unset SLURM_EXPORT_ENV
cd $PBS_O_WORKDIR        =>  usually not required
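Put together, a minimal Slurm job script for Woody-NG might look as follows (a sketch: job name, walltime, module, and program are placeholders to adapt to your own job):

```shell
#!/bin/bash -l
#SBATCH --ntasks=4           # 4 cores; grants 4 x 7.75 GB = 31 GB of main memory
#SBATCH --time=12:00:00      # walltime, formerly #PBS -lwalltime=12:0:0
#SBATCH --job-name=myjob     # formerly #PBS -N myjob
#SBATCH --export=NONE        # start with a clean environment ...
unset SLURM_EXPORT_ENV       # ... and do not pass the empty export list to srun

module load gcc              # example module; load whatever your job needs
./my_app                     # placeholder for the actual program
```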
Environment variables (Torque vs. Slurm):
$PBS_O_WORKDIR      =>  $SLURM_SUBMIT_DIR
$PBS_JOBID          =>  $SLURM_JOB_ID
cat $PBS_NODEFILE   =>  scontrol show hostnames $SLURM_JOB_NODELIST
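Within a job script, these replacements can be used like this (a sketch; the echo line only illustrates the variable):

```shell
cd "$SLURM_SUBMIT_DIR"                          # was: cd $PBS_O_WORKDIR
echo "This is job $SLURM_JOB_ID"                # was: $PBS_JOBID
scontrol show hostnames "$SLURM_JOB_NODELIST"   # was: cat $PBS_NODEFILE
```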
Major module changes
Special behavior which was only available at RRZE has been given up:
The intel64 module has been renamed to intel and no longer automatically loads an MPI module. The compiler-specific MPI modules such as intel-mpi/VERSION-gcc have been unified into intel-mpi/VERSION. The selection of the compiler occurs by the wrapper name, e.g. mpicc for GCC and mpiicc for the Intel compilers.
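With the unified module, the compiler is thus chosen via the wrapper you call, not via the module variant. A sketch (the module name and source file are examples; mpicc and mpiicc are the standard Intel MPI wrapper names for the GNU and Intel compilers):

```shell
module load intel-mpi                 # one unified module, no -gcc variant
mpicc  -O2 hello.c -o hello_gcc       # wrapper around the GNU compiler
mpiicc -O2 hello.c -o hello_intel     # wrapper around the Intel compiler
```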