
Test cluster

The test and benchmark cluster is an environment for porting software to new CPU architectures and running benchmark tests. It comprises a variety of nodes with different processors, clock speeds, memory speeds, memory capacity, number of CPU sockets, etc. There is no high-speed network, and MPI parallelization is restricted to one node. The usual NFS filesystems are available.

This is a testing ground. Any job may be canceled without prior notice.

System overview

This is a quick overview of the systems, including their host names (frequencies are nominal values); NDA systems are not listed:

| Hostname | CPU | RAM | Accelerators |
| --- | --- | --- | --- |
| applem1studio | Apple M1 Ultra 20-Core Processor | 64 GiB RAM | 64-Core GPU |
| aurora1 | Single Intel Xeon "Skylake" Gold 6126 CPU (12 cores + SMT) @ 2.60 GHz | | 2x NEC Aurora TSUBASA 10B (48 GiB RAM) |
| broadep2 | Dual Intel Xeon "Broadwell" CPU E5-2697 v4 (2x 18 cores + SMT) @ 2.30 GHz | 128 GiB RAM | |
| casclakesp2 | Dual Intel Xeon "Cascade Lake" Gold 6248 CPU (2x 20 cores + SMT) @ 2.50 GHz | 384 GiB RAM | |
| euryale | Dual Intel Xeon "Broadwell" CPU E5-2620 v4 (2x 8 cores) @ 2.10 GHz | 64 GiB RAM | AMD RX 6900 XT (16 GB) |
| genoa1 | Dual AMD EPYC 9654 "Genoa" CPU (2x 96 cores + SMT) @ 2.40 GHz | 768 GiB RAM | |
| genoa2 | Dual AMD EPYC 9354 "Genoa" CPU (2x 32 cores + SMT) @ 3.25 GHz | 768 GiB RAM | Nvidia A40 (48 GiB GDDR6), Nvidia L40s (48 GiB GDDR6) |
| gracehop1 | Nvidia Grace Hopper GH200 (72 cores) | 480 GiB RAM | Nvidia H100 (96 GiB HBM3) |
| gracesup1 | Nvidia Grace Superchip (2x 72 cores) | 480 GiB RAM | |
| hasep1 | Dual Intel Xeon "Haswell" E5-2695 v3 CPU (2x 14 cores + SMT) @ 2.30 GHz | 64 GiB RAM | |
| icx32 | Dual Intel Xeon "Ice Lake" Platinum 8358 CPU (2x 32 cores + SMT) @ 2.60 GHz | 256 GiB RAM | Nvidia L4 (24 GB) |
| icx36 | Dual Intel Xeon "Ice Lake" Platinum 8360Y CPU (2x 36 cores + SMT) @ 2.40 GHz | 256 GiB RAM | |
| ivyep1 | Dual Intel Xeon "Ivy Bridge" E5-2690 v2 CPU (2x 10 cores + SMT) @ 3.00 GHz | 64 GiB RAM | |
| lukewarm | Dual Ampere Altra Max M128-30 (ARM aarch64) (2x 128 cores) @ 2.80 GHz | 512 GB RAM (DDR4-3200) | |
| medusa | Dual Intel Xeon "Cascade Lake" Gold 6246 CPU (2x 12 cores + SMT) @ 3.30 GHz | 192 GiB RAM | Nvidia GeForce RTX 2070 SUPER (8 GiB GDDR6), Nvidia GeForce RTX 2080 SUPER (8 GiB GDDR6), Nvidia Quadro RTX 5000 (16 GiB GDDR6), Nvidia Quadro RTX 6000 (24 GiB GDDR6) |
| milan1 | Dual AMD EPYC 7543 "Milan" CPU (2x 32 cores + SMT) @ 2.80 GHz | 256 GiB RAM | AMD MI210 (64 GiB HBM2e) |
| naples1 | Dual AMD EPYC 7451 "Naples" CPU (2x 24 cores + SMT) @ 2.30 GHz | 128 GiB RAM | |
| optane1 | Dual Intel Xeon "Ice Lake" Platinum 8362 CPU (2x 32 cores + SMT) @ 2.80 GHz | 256 GiB RAM + 1024 GiB Optane memory | |
| rome1 | Single AMD EPYC 7452 "Rome" CPU (32 cores + SMT) @ 2.35 GHz | 128 GiB RAM | |
| rome2 | Dual AMD EPYC 7352 "Rome" CPU (2x 24 cores + SMT) @ 2.30 GHz | 256 GiB RAM | AMD MI100 (32 GiB HBM2), AMD MI210 (64 GiB HBM2e) |
| skylakesp2 | Dual Intel Xeon "Skylake" Gold 6148 CPU (2x 20 cores + SMT) @ 2.40 GHz | 96 GiB RAM | |
| warmup | Dual Cavium/Marvell "ThunderX2" CN9980 (ARMv8 aarch64) (2x 32 cores + 4-way SMT) @ 2.20 GHz | 128 GiB RAM | |

GPU availability

Technical specifications of all reasonably recent GPUs available at NHR@FAU (either in the test cluster or in TinyGPU):

| GPU | RAM | BW [GB/s] | Ref. Clock [GHz] | Cores (Shader/TMUs/ROPs) | TDP [W] | SP [TFlop/s] | DP [TFlop/s] | Host | Host CPU (base clock frequency) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Nvidia GeForce GTX 1080 | 8 GB GDDR5 | 320 | 1.607 | 2560/160/64 | 180 | 8.87 | 0.277 | tg03x (JupyterHub) | Intel Xeon Broadwell E5-2620 v4 (8 cores, 2.10 GHz) |
| Nvidia GeForce GTX 1080 Ti | 11 GB GDDR5 | 484 | 1.48 | 3584/224/88 | 250 | 11.34 | 0.354 | tg04x (JupyterHub) | Intel Xeon Broadwell E5-2620 v4 (2x 8 cores, 2.10 GHz) |
| Nvidia GeForce RTX 2070 SUPER | 8 GB GDDR6 | 448 | 1.605 | 2560/160/64 | 215 | 9.06 | 0.283 | medusa | Intel Xeon Cascade Lake Gold 6246 (2x 12 cores, 3.30 GHz) |
| Nvidia Quadro RTX 5000 | 16 GB GDDR6 | 448 | 1.62 | 3072/192/64 | 230 | 11.15 | 0.348 | medusa | Intel Xeon Cascade Lake Gold 6246 (2x 12 cores, 3.30 GHz) |
| Nvidia GeForce RTX 2080 SUPER | 8 GB GDDR6 | 496 | 1.65 | 3072/192/64 | 250 | 11.15 | 0.348 | medusa | Intel Xeon Cascade Lake Gold 6246 (2x 12 cores, 3.30 GHz) |
| Nvidia GeForce RTX 2080 Ti | 11 GB GDDR6 | 616 | 1.35 | 4352/272/88 | 250 | 13.45 | 0.42 | tg06x (TinyGPU) | Intel Xeon Skylake Gold 6134 (2x 8 cores + SMT, 3.20 GHz) |
| Nvidia Quadro RTX 6000 | 24 GB GDDR6 | 672 | 1.44 | 4608/288/96 | 260 | 16.31 | 0.51 | medusa | Intel Xeon Cascade Lake Gold 6246 (2x 12 cores, 3.30 GHz) |
| Nvidia GeForce RTX 3080 | 10 GB GDDR6X | 760 | 1.44 | 8704 shaders | 320 | 29.77 | 0.465 | tg08x (TinyGPU) | Intel Xeon Cascade Lake Gold 6226R (2x 16 cores + SMT, 2.90 GHz) |
| Nvidia Tesla V100 (PCIe, passive) | 32 GB HBM2 | 900 | 1.245 | 5120 shaders | 250 | 14.13 | 7.066 | tg07x (TinyGPU) | Intel Xeon Skylake Gold 6134 (2x 8 cores + SMT, 3.20 GHz) |
| Nvidia A40 (passive) | 48 GB GDDR6 | 696 | 1.305 | 10752 shaders | 300 | 37.42 | 1.169 | genoa2 | AMD Genoa 9354 (2x 32 cores + SMT, 3.25 GHz) |
| Nvidia A100 (SXM4/NVLink, passive) | 40 GB HBM2 | 1555 | 1.41 | 6912 shaders | 400 | 19.5 | 9.7 | tg09x (TinyGPU) | AMD Rome 7662 (2x 64 cores, 2.00 GHz) |
| Nvidia L40 (passive) | 48 GB GDDR6 | 864 | 0.735 | 18176 shaders | 300 | 90.52 | 1.414 | genoa2 | AMD Genoa 9354 (2x 32 cores + SMT, 3.25 GHz) |
| AMD Instinct MI100 (PCIe Gen4, passive) | 32 GB HBM2 | 1229 | 1.502 | 120 CUs / 7680 cores | 300 | 21.1 | 11.5 | rome2 | AMD Rome 7352 (2x 24 cores + SMT, 2.30 GHz) |
| AMD Radeon VII | 16 GB HBM2 | 1024 | 1.4 | 3840/240/64 | 300 | 13.44 | 3.36 | interlagos1 | AMD Interlagos Opteron 6276 |
| AMD Instinct MI210 (PCIe Gen4, passive) | 64 GB HBM2e | 1638 | 1.0 | 104 CUs / 6656 cores | 300 | 22.6 | 22.6 | milan1, rome2 | AMD Milan 7543 (2x 32 cores + SMT, 2.80 GHz), AMD Rome 7352 (2x 24 cores + SMT, 2.30 GHz) |

Accessing the Test cluster

Access to the test cluster is restricted. Contact hpc-support@fau.de to request access.

See the documentation on configuring connection settings, or on SSH in general, for setting up your SSH connection.

You might need to configure a proxy jump via csnhr.nhr.fau.de and the use of an SSH private key to reach the test cluster's frontend node testfront.rrze.fau.de.
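
A minimal ~/.ssh/config sketch, assuming your HPC account is called <hpc-account> and your key file is ~/.ssh/id_ed25519 (adjust both to your setup):

# excerpt from ~/.ssh/config
Host testfront.rrze.fau.de
    User <hpc-account>
    IdentityFile ~/.ssh/id_ed25519
    # jump via the dialog server first
    ProxyJump <hpc-account>@csnhr.nhr.fau.de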

If successfully configured, the test cluster can be accessed via SSH by:

ssh testfront.rrze.fau.de

Software

The frontend runs Ubuntu 22.04 LTS.

All software on NHR@FAU systems, e.g. (commercial) applications, compilers, and libraries, is provided via environment modules. These modules are used to set up a custom environment when working interactively or inside batch jobs.

For available software, check the environment modules provided on the respective system.

Most software is centrally installed using Spack. By default, only a subset of the packages installed via Spack is shown. To see all installed packages, load the 000-all-spack-pkgs module. You can also install software yourself using the user-spack functionality.
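
For example, a possible module workflow might look like this (the gcc module name is just an illustration; check which modules actually exist on the node):

module avail                      # list the modules that are visible by default
module load 000-all-spack-pkgs    # make all Spack-installed packages visible
module load gcc                   # load a module, e.g. a compiler
module list                       # show the currently loaded modules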

Containers, e.g. Docker images, can be run via Apptainer.
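
As a sketch, running a Docker image through Apptainer could look like this (the image name is only an example):

apptainer pull ubuntu.sif docker://ubuntu:22.04     # convert the Docker image to a SIF file
apptainer exec ubuntu.sif cat /etc/os-release       # run a command inside the container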

Filesystems

On all front ends and nodes the filesystems $HOME, $HPCVAULT, and $WORK are mounted. For details see the filesystems documentation.

The nodes have local hard disks of very different capacities and speeds. Do not expect a production environment.

Batch processing

Resources are controlled through the batch system Slurm.

Access to the nda partition is restricted and benchmark results must not be published without further consideration.

Maximum job runtime is 24 hours.

The currently available nodes can be listed using:

sinfo -o "%.14N %.9P %.11T %.4c %.8z %.6m %.35f"

To select a node, you can either use the host name or a feature name from sinfo with salloc or sbatch:

  • --nodes=1 --constraint=featurename ...
  • -w hostname ...
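
For example, to request the node icx36 by host name, or any node providing the hwperf feature (job.sh stands in for your job script):

sbatch -w icx36 job.sh
sbatch --nodes=1 --constraint=hwperf job.sh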

Specific constraints

| Target | Slurm constraint | Description |
| --- | --- | --- |
| hardware performance counters | hwperf | Enables access to hardware performance counters; required for likwid-perfctr. |
| NUMA balancing | numa_off | Disables NUMA balancing (enabled by default). |
| transparent huge pages | thp_always | Always use transparent huge pages (default: madvise). |

Specify a constraint with -C <CONSTRAINT> or --constraint=<CONSTRAINTS> when using salloc or sbatch. Multiple constraints require quoting and are concatenated via &, e.g. -C "hwperf&thp_always".

Interactive job

The environment from the calling shell, like loaded modules, will be inherited by the interactive job.

Interactive jobs can be requested by using salloc instead of sbatch and specifying the respective options on the command line:

salloc --nodes=1 -w hostname --time=hh:mm:ss

This will give you a shell on the node hostname for the specified amount of time.

Batch job

The following job script will allocate node icx36 for 6 hours:

#!/bin/bash -l
#
#SBATCH --nodes=1
#SBATCH -w icx36
#SBATCH --time=06:00:00
#SBATCH --export=NONE

# --export=NONE keeps the submitting shell's environment out of the job;
# unsetting SLURM_EXPORT_ENV lets srun inherit the environment set up below.
unset SLURM_EXPORT_ENV

module load ...

./a.out
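
Assuming the script is saved as job.sh (the file name is arbitrary), it can be submitted and monitored with:

sbatch job.sh          # submit the job script
squeue -u $USER        # check the status of your jobs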

Attach to a running job

See the general documentation on batch processing.
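
As a quick sketch, one common way to get a shell inside an already running job (replace <jobid> with the ID of your job; requires a sufficiently recent Slurm) is:

srun --jobid=<jobid> --overlap --pty /bin/bash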