Woody throughput cluster (Tier3)

FAU’s Woody cluster is a high-performance compute resource for throughput workloads. Woody is a Tier3 resource serving FAU’s basic needs. Therefore, NHR accounts are not enabled by default.

TinyGPU and TinyFAT are no longer served by the Woody frontend nodes; use tinyx.nhr.fau.de instead to submit jobs to TinyGPU/TinyFAT!

Woody is the next generation throughput cluster and became operational in summer 2022.

2 public (and 2 dedicated) front end nodes, each with two Intel Xeon Gold 6342 CPUs (“Icelake”, 2x 24 cores, HT disabled, 2.80 GHz base frequency), 512 GB RAM, and a 100 GbE connection to RRZE’s network backbone.
64 compute nodes (w12xx/w13xx nodes) with one Intel Xeon E3-1240 v5 CPU (“Skylake”, 4 cores, HT disabled, 3.50 GHz base frequency), 32 GB RAM, 1 TB HDD
a total of 256 cores from 04/2016 and 01/2017
112 compute nodes (w14xx/w15xx nodes) with one Intel Xeon E3-1240 v6 CPU (“Kaby Lake”, 4 cores, HT disabled, 3.70 GHz base frequency), 32 GB RAM, 960 GB SSD
a total of 448 cores from Q3/2019
110 compute nodes (w22xx/w23xx/w24xx/w25xx) with two Intel Xeon Gold 6326 CPUs (“Icelake”, 2x 16 cores, HT disabled, 2.90 GHz base frequency), 256 GB RAM, 1.8 TB SSD
a total of 3,520 cores from Q2/2022 and Q2/2023.
40 nodes are financed by ECAP and dedicated to them. A further 40 nodes are financed by and belong to a project from the Physics department.

This website shows information regarding the following topics:

Access, User Environment, File Systems
Further Information

Access, User Environment, and File Systems

Access to the machine

Users can connect to woody.nhr.fau.de (keep the “nhr” instead of “rrze” in mind!) and will be randomly routed to one of the two front ends. All systems in the cluster, including the front ends, have private IPv4 addresses in the 10.28.244.0/22 and IPv6 addresses in the 2001:638:a000:4801::/64 range. They can normally only be accessed directly from within the FAU networks. There is one exception: If your internet connection supports IPv6, you can directly ssh to the front ends (but not to the compute nodes). Otherwise, if you need access from outside of FAU, you usually have to connect for example to the dialog server cshpc.rrze.fau.de first and then ssh to woody.nhr.fau.de from there.
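The two-step login from outside FAU described above can usually be done in one command with OpenSSH's jump host option (`-J`, available since OpenSSH 7.3). The following is a minimal sketch: the helper name `woody_jump_cmd` and the account name are placeholders, and it only prints the command instead of opening a connection.

```shell
# Print the ssh command that reaches Woody through the dialog server.
# $1: your HPC account name (the value used below is a placeholder).
woody_jump_cmd() {
    printf 'ssh -J %s@cshpc.rrze.fau.de %s@woody.nhr.fau.de\n' "$1" "$1"
}

woody_jump_cmd "your_account"
```

Copy the printed command into your terminal once the account name is filled in. From within the FAU networks, or over IPv6, the jump host is not needed.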

SSH public host keys of woody.nhr.fau.de (as of 07/2022)

ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFBe6F2hyQkJp19HVlZJnc3d/SiSfMRx4LDXAGnhFKXi4sBMWbMxsAFQPT5ZnlTRsPbhMJJeJLGxycQg/pGboyI= woody.nhr.fau.de
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIw6ALXTLpIx+JJG3D917bkTlA2J4vXZGXDIysOV+ZiP woody.nhr.fau.de
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDL6ZBgdcFMU1RjtDpBGjtTovgQZqT/Bd0QWfxzgYvqzUJohiTIAk5YPNq5zygi5yVfO7NphURKSCYn/9XPVa1bBEoVpVAFa5hX1aN1nKynL0Ao2FIlfp/eO9jN0sH4FFdkEoMyIPdYrbsaXULQafREZi2J3dZhhx4+hItP9euAGQCGbdpmZ0f6o+PBleQ6jdILpGhz8Iw/+eMLkqul7okGhiUP9IZqFtJnUW2Smqlgty3qI5eK4TdXDUV17Uvqk4YJFpwyLjXuOCantL+jadtYtEpYtlHW5ZLz33LKkMnnpb4k3ZSJ9QprI8i9/RFRtQlarz7KCnRjbVaI94k51wzV3JIFasdEx2NXDxBC8ZVoi/fWYyt2gsMV+4UB6BSo31xsI3RhsTfpAtUGF0Mt4XV/XuSyMTcX1u6qaSZKCOOVWkp75h+1FROHKlTPtMnLLc71WohDKk/snLpOS736SKSLVoxA+t+21+SiMkFcZa+mX7GySQtHz42ahTS3nMfQsMc= woody-ng woody.nhr.fau.de

fingerprints of the SSH public host keys of woody.nhr.fau.de (as of 07/2022)

256  SHA256:sf45uAzebm6at31zW3/e/T6nyd+sdCSBR9wgeu8ZOuc woody.nhr.fau.de (ECDSA)
256  SHA256:GUA916wyrc1WFjUT9FcqoJU3aNaHyb40QwJfbtXG5UI woody.nhr.fau.de (ED25519)
3072 SHA256:jpQpiCp3FAVLOap+f1QNdL52NskusxEdhowyUru5gM0 woody.nhr.fau.de (RSA)
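When ssh asks you to confirm an unknown host key on the first connection, the fingerprint it shows should be one of the three listed above. A minimal sketch of such a check follows; the function name `is_published_woody_fingerprint` is hypothetical, and the values are copied from the list above (as of 07/2022).

```shell
# Return 0 if the given fingerprint is one of the published Woody host key
# fingerprints (as of 07/2022), 1 otherwise.
is_published_woody_fingerprint() {
    case "$1" in
        "SHA256:sf45uAzebm6at31zW3/e/T6nyd+sdCSBR9wgeu8ZOuc") return 0 ;;  # ECDSA
        "SHA256:GUA916wyrc1WFjUT9FcqoJU3aNaHyb40QwJfbtXG5UI") return 0 ;;  # ED25519
        "SHA256:jpQpiCp3FAVLOap+f1QNdL52NskusxEdhowyUru5gM0") return 0 ;;  # RSA
        *) return 1 ;;
    esac
}
```

To obtain the fingerprints a host currently presents, `ssh-keyscan woody.nhr.fau.de | ssh-keygen -lf -` can be used from a machine that is able to reach the front ends.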

SSH public host keys of cshpc.rrze.fau.de (as of 11/2021)

ssh-dss AAAAB3NzaC1kc3MAAACBAO2L8+7bhJm7OvvJMcdGSJ5/EaxvX5RRzE9RrB8fx5H69ObkqC6Baope4rOS9/+2gtnm8Q3gZ5QkostCiKT/Wex0kQQUmKn3fx6bmtExLq8YwqoRXRmNTjBIuyZuZH9w/XFK36MP63p/8h7KZXvkAzSRmNVKWzlsAg5AcTpLSs3ZAAAAFQCD0574+lRlF0WONMSuWeQDRFM4vwAAAIEAz1nRhBHZY+bFMZKMjuRnVzEddOWB/3iWEpJyOuyQWDEWYhAOEjB2hAId5Qsf+bNhscAyeKgJRNwn2KQMA2kX3O2zcfSdpSAGEgtTONX93XKkfh6JseTiFWos9Glyd04jlWzMbwjdpWvwlZjmvPI3ATsv7bcwHji3uA75PznVUikAAACBANjcvCxlW1Rjo92s7KwpismWfcpVqY7n5LxHfKRVqhr7vg/TIhs+rAK1XF/AWxyn8MHt0qlWxnEkbBoKIO5EFTvxCpHUR4TcHCx/Xkmtgeq5jWZ3Ja2bGBC3b47bHHNdDJLU2ttXysWorTXCoSYH82jr7kgP5EV+nPgwDhIMscpk cshpc.rrze.fau.de
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNVzp97t3CxlHtUiJ5ULqc/KLLH+Zw85RhmyZqCGXwxBroT+iK1Quo1jmG6kCgjeIMit9xQAHWjS/rxrlI10GIw= cshpc.rrze.fau.de
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPSIFF3lv2wTa2IQqmLZs+5Onz1DEug8krSrWM3aCDRU cshpc.rrze.fau.de
1024 35 135989634870042614980757742097308821255254102542653975453162649702179684202242220882431712465065778248253859082063925854525619976733650686605102826383502107993967196649405937335020370409719760342694143074628619457902426899384188195801203193251135968431827547590638365453993548743041030790174687920459410070371 cshpc.rrze.fau.de
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAs0wFVn1PN3DGcUtd/JHsa6s1DFOAu+Djc1ARQklFSYmxdx5GNQMvS2+SZFFa5Rcw+foAP9Ks46hWLo9mOjTV9AwJdOcSu/YWAhh+TUOLMNowpAEKj1i7L1Iz9M1yrUQsXcqDscwepB9TSSO0pSJAyrbuGMY7cK8m6//2mf7WSxc= cshpc.rrze.fau.de

fingerprints of the SSH public host keys of cshpc.rrze.fau.de (as of 11/2021)

1024 SHA256:A82eA7py46zE/TrSTCRYnJSW7LZXY16oOBxstJF3jxU cshpc.rrze.fau.de (DSA)
256  SHA256:wFaDywle3yJvygQ4ZAPDsi/iSBTaF6Uoo0i0z727aJU cshpc.rrze.fau.de (ECDSA)
256  SHA256:is52MRsxMgxHFn58o0ZUh8vCzIuE2gYanmhrxdy0rC4 cshpc.rrze.fau.de (ED25519)
1024 SHA256:Za1mKhTRFDXUwn7nhPsWc7py9a6OHqS2jin01LJC3ro cshpc.rrze.fau.de (RSA)

While it is possible to ssh directly to a compute node, users are only allowed to do this while they have a batch job running there. When all batch jobs of a user on a node have ended, all of their processes, including any open shells, will be killed automatically.

Software environment

The front end and compute nodes run AlmaLinux 8 (which is basically Red Hat Enterprise Linux 8 without the support).

The login shell for all users on Woody is always bash and cannot be changed.

As on many other HPC systems, environment modules are used to facilitate access to software packages. Type “module avail” to get a list of available packages.

Even more packages become visible once one of the 000-all-spack-pkgs modules has been loaded. Most of the software is installed using “Spack”, an enhanced HPC package manager.

General notes on how to use certain software on our systems (including, in some cases, sample job scripts) can be found on the Special applications, and tips & tricks pages. Specific notes on how some of the software provided via modules on the Woody cluster has been compiled are given below.

Intel tools (compiler, MPI, MKL, TBB)

Intel oneAPI is installed in the “Free User” edition via Spack.

The module intel (and the Spack internal intel-oneapi-compilers) provides the legacy Intel compilers icc, icpc, and ifort as well as the new LLVM-based ones (icx, icpx, dpcpp, ifx).

Recommended compiler flags are: -O3 -xHost

If you want to enable full-width AVX512 SIMD support (only available on the w22xx/w23xx Icelake nodes) you have to additionally set the flag: -qopt-zmm-usage=high

The front end nodes have Intel Icelake processors, which support AVX512, while the w1xxx compute nodes do not. Thus, software compiled on the front ends with flags that explicitly or implicitly enable AVX512 (e.g. -xHost) may fail on the w1xxx compute nodes with an “illegal instruction” error.
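One way to keep the flag choices above apart is a small helper that returns the Intel compiler flags for a given target. This is only a sketch: the function name and the target labels `icx` and `any` are hypothetical, and `-xCORE-AVX2` is an assumption that does not come from this page. It is an Intel compiler option that limits code generation to AVX2, which all three node types support according to the performance table under Further Information.

```shell
# Print Intel compiler flags for a target node class.
#   icx: binary runs on the Icelake nodes only (built on an Icelake front end)
#   any: binary should run on every Woody node type (assumption: AVX2 baseline)
intel_flags_for() {
    case "$1" in
        icx) echo "-O3 -xHost -qopt-zmm-usage=high" ;;
        any) echo "-O3 -xCORE-AVX2" ;;
        *)   echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Example: show the compile line for a portable build (app.c is a placeholder).
echo "icx $(intel_flags_for any) -o app app.c"
```

A binary built for the `icx` target should be submitted with a matching node constraint so that it never lands on a w1xxx node.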

The module intelmpi (and the Spack internal intel-oneapi-mpi) provides Intel MPI. To use the legacy Intel compilers with Intel MPI, just use the appropriate wrappers with the Intel compiler names, i.e. mpiicc, mpiicpc, mpiifort. To use the new LLVM-based Intel compilers with Intel MPI, you have to specify them explicitly, i.e. use mpiicc -cc=icx, mpiicpc -cxx=icpx, or mpiifort -fc=ifx. Running mpicc, mpicxx, or mpif90 results in using the GNU compilers.

The modules mkl and tbb (and the Spack internal intel-oneapi-mkl and intel-oneapi-tbb) provide Intel MKL and TBB. Use Intel’s MKL link line advisor to figure out the appropriate command line for linking with MKL. Intel MKL also includes drop-in wrappers for FFTW3.

Further Intel tools may be added in the future.

The Intel modules on Fritz, Alex, Woody, and the Slurm-TinyGPU/TinyFAT behave differently than on the older RRZE systems: (1) The intel64 module has been renamed to intel and no longer automatically loads intel-mpi and mkl. (2) intel-mpi/VERSION-intel and intel-mpi/VERSION-gcc have been unified into intel-mpi/VERSION. The compiler is selected by the wrapper name, e.g. mpicc = GCC, mpiicc = Intel; mpif90 = GFortran; mpiifort = Intel.

GNU compiler (gcc/g++/gfortran)

The GNU compilers are available in the version coming with the operating system (currently 8.5.0) as well as via modules (currently versions 11.2.0 and 12.1.0).

Recommended compiler flags are: -O3 -march=native

If you want to enable full-width AVX512 SIMD support (only available on the w22xx/w23xx Icelake nodes) you have to additionally set the flag: -mprefer-vector-width=512

The front end nodes have Intel Icelake processors, which support AVX512, while the w1xxx compute nodes do not. Thus, software compiled on the front ends with flags that explicitly or implicitly enable AVX512 may fail on the w1xxx compute nodes with an “illegal instruction” error.

Feel free to compile software yourself in the versions and with the options you need. This is perfectly fine, but we cannot provide support for self-installed software. We can only provide software centrally if it is of importance for multiple groups. If you want to use Spack for compiling additional software, you can load our user-spack module to make use of the packages we have already built with Spack (if the concretization matches) instead of starting from scratch. Once user-spack is loaded, the command spack will be available (as an alias) and you will inherit the presets we defined for certain packages (e.g. Open MPI to work with Slurm), but you will install everything into your own directories ($WORK/USER-SPACK).

You can also bring your own environment in a container using Singularity/Apptainer.

File Systems

The following table summarizes the available file systems and their features. It is only an excerpt from the description of the HPC file system.

Further details will follow.

Quota in $HOME is very limited as snapshots are made every 30 minutes. Put simulation data in $WORK! Do not rely on the specific path of $WORK, as it may change over time when your work directory is relocated to a different NFS server.

Use $TMPDIR for the (limited) local storage which gets cleaned up once the job is finished.
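A common pattern for the node-local storage is to copy input into a scratch directory below $TMPDIR, run there, and copy the results back before the job ends. The sketch below illustrates this under stated assumptions: the function name `run_in_tmpdir` is hypothetical, and the fallback to /tmp only exists so that the snippet also works outside a batch job.

```shell
# Run a command in a private scratch directory below $TMPDIR and copy
# everything in that directory back afterwards.
# $1: input file, $2: result directory (e.g. below $WORK), rest: command to run
run_in_tmpdir() {
    input="$1"; results="$2"; shift 2
    scratch="$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")" || return 1
    cp "$input" "$scratch/" || return 1
    # The command runs inside the scratch directory and sees the input by name.
    ( cd "$scratch" && "$@" ) || return 1
    mkdir -p "$results" && cp -r "$scratch"/. "$results"/
    status=$?
    rm -rf "$scratch"
    return $status
}
```

Inside a job script this could be called as `run_in_tmpdir "$WORK/case1/input.dat" "$WORK/case1/results" ./my_application input.dat`, where the paths and the application name are placeholders.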

Batch processing

As with all production clusters at RRZE, resources are controlled through a batch system. The front ends can be used for compiling and very short serial test runs, but everything else has to go through the batch system to the cluster.

Woody uses SLURM as a batch system. Please see our general batch system description for further details.

The granularity of batch allocations is individual cores, not complete nodes, i.e.
- nodes can be shared, but your cores and your share of the main memory are exclusive
- memory bandwidth, network bandwidth, and the local HDD/SSD storage are shared.
You always get 7.75 GB of main memory per requested core.
Multi-node jobs are not possible as Woody is a throughput resource.
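Because memory is tied to the number of requested cores, a job that needs more than 7.75 GB has to request additional cores. The following sketch computes the smallest core count for a given memory demand in whole GB; the function name `cores_for_memory` is hypothetical.

```shell
# Print the minimum number of cores to request for a memory demand given in
# whole GB, assuming 7.75 GB per core. Uses integer ceiling division in
# hundredths of a GB (7.75 GB = 775).
cores_for_memory() {
    mem_gb="$1"
    echo $(( (mem_gb * 100 + 774) / 775 ))
}

cores_for_memory 20   # 20 GB do not fit into 2 cores (15.5 GB), so 3 are needed
```

The result can be passed on when submitting, e.g. `sbatch --ntasks=1 --cpus-per-task="$(cores_for_memory 20)" job.sh` with `job.sh` as a placeholder. Requests above 4 cores can only be satisfied by the Icelake nodes, and 32 cores (248 GB) is the upper limit.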

Partition	min – max walltime	min – max cores	availability	comments
work	0 – 2:00:00	1 – 32 cores; always on 1 node	a few nodes will be reserved for short jobs once the cluster is more loaded	cores & main memory (7.75 GB per requested core) are exclusive; short jobs will automatically benefit from the reserved nodes
work	0 – 24:00:00	1 – 32 cores; always on 1 node	always	cores & main memory (7.75 GB per requested core) are exclusive

If you submit jobs, then by default you can get any type of node: Icelake, Skylake, or Kabylake based. They differ in their total number of cores and main memory. However, with every type of node you will automatically get 7.75 GB per requested core. Due to differences in processor generation, the speed of the CPUs can differ, which means that job runtimes may vary.

It is also possible to request certain types of nodes from the batch system. This has two major use cases besides the obvious “benchmarking”. Some applications can benefit from AVX512, which is only available on the Icelake nodes. Moreover, the Icelake nodes have more main memory and CPU cores in total. You request a certain node type by adding the option --constraint=... to the sbatch/salloc command. In general, the following node types are available:

Constraint	Matching nodes	Number of cores	RAM per node	Comments
(none specified)	all	1-32	32-256 GB	Can run on any node, that is all the Icelake, Skylake, and Kabylake nodes.
`--constraint=sl`	w12xx and w13xx	1-4	32 GB	Can run on the Skylake nodes only.
`--constraint=kl`	w14xx and w15xx	1-4	32 GB	Can run on the Kabylake nodes only.
`--constraint=icx`	w22xx and w23xx	1-32	256 GB	Can run on the Icelake nodes only.
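The constraint can also be set inside the job script. The sketch below requests eight cores on an Icelake node and reports whether the CPU it runs on advertises AVX512. The helper `has_avx512` is hypothetical and reads the Linux CPU flags from /proc/cpuinfo; the core count and walltime are arbitrary example values.

```shell
#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00
#SBATCH --constraint=icx
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV

# Return 0 if the CPU flags in the given file (default: /proc/cpuinfo)
# contain the AVX512 foundation flag.
has_avx512() {
    grep -qw avx512f "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx512; then
    echo "AVX512 available on $(hostname)"
else
    echo "no AVX512 on $(hostname)"
fi
```

Such a check can help to catch a missing constraint before an AVX512 binary fails with an illegal instruction.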

Interactive job (single-core)

Interactive jobs can be requested by using salloc instead of sbatch and specifying the respective options on the command line.

The following will give you an interactive shell on one node with one core dedicated to you for one hour:

salloc -n 1 --time=01:00:00

Settings from the calling shell (e.g. loaded module paths) will be inherited by the interactive job!

MPI parallel job (single-node)

In this example, the executable will be run on one node, using 4 MPI processes, i.e. one per physical core.

#!/bin/bash -l
#SBATCH --nodes=1 
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV
module load XXX 

srun ./mpi_application

OpenMP job (single-node)

In this example, the executable will be run using 4 OpenMP threads for a total job walltime of 1 hour.

For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.

#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --export=NONE

unset SLURM_EXPORT_ENV 
module load XXX 

# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK 
./openmp_application
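The pinning variables mentioned before the script can be exported in the job script right before the application is started. A minimal sketch follows; the fallback value of 1 for the thread count is an assumption that only matters when the lines are run outside a batch job.

```shell
# Thread count from Slurm; falls back to 1 outside a batch job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
# Place one thread per physical core and keep it there.
export OMP_PLACES=cores
export OMP_PROC_BIND=true
```

With these settings the OpenMP runtime binds each thread to its own core instead of letting the operating system migrate threads between cores.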

Further Information

performance characteristics (depending on used instruction set for Linpack)	w12xx/w13xx Intel Xeon E3-1240 v5 CPU (“Skylake”)	w14xx/w15xx Intel Xeon E3-1240 v6 CPU (“Kaby Lake”)	w22xx/w23xx Intel Xeon Gold 6326 CPUs (“Icelake”)
only one core in use; single core HPL with AVX512	n/a	n/a	92.8 GFlop/s
only one core in use; single core HPL with AVX2	58.6 GFlop/s	61.4 GFlop/s	49.2 GFlop/s
only one core in use; single core HPL with AVX	30.2 GFlop/s	31.6 GFlop/s	25.6 GFlop/s
only one core in use; single core HPL with SSE4.2	15.3 GFlop/s	16.0 GFlop/s	13.7 GFlop/s
only one core in use; single core memory bandwidth (stream triad)	20.2 GB/s (25.8 GB/s with NT stores)	21.9 GB/s (27.6 GB/s with NT stores)	15.8 GB/s (21.0 GB/s with NT stores)
throughput HPL per 4 cores with AVX512	n/a	n/a	231 GFlop/s
throughput HPL per 4 cores with AVX2	206 GFlop/s	219 GFlop/s	145 GFlop/s
throughput HPL per 4 cores with AVX	112 GFlop/s	118 GFlop/s	90 GFlop/s
throughput HPL per 4 cores with SSE4.2	57 GFlop/s	60 GFlop/s	48 GFlop/s
throughput memory bandwidth per 4 cores (stream triad)	21.2 GB/s (28.7 GB/s with NT stores)	21.9 GB/s (27.6 GB/s with NT stores)	29 GB/s (31 GB/s with NT stores)
theoretical Ethernet throughput per 4 cores	1 Gbit/s	1 Gbit/s	3 Gbit/s
HDD / SSD space per 4 cores	900 GB	870 GB	210 GB

The new Icelake nodes only perform better in the High Performance Linpack (HPL) if AVX512 instructions are used!

When a node is saturated in throughput mode, the memory bandwidth per four cores is only slightly higher on the Icelake nodes.
