Woody throughput cluster (Tier3)
FAU’s Woody cluster is a high-performance compute resource for throughput workloads. Woody is a Tier3 resource serving FAU’s basic needs. Therefore, NHR accounts are not enabled by default.
TinyGPU and TinyFAT are no longer served by the Woody frontend nodes; use tinyx.nhr.fau.de instead to submit jobs to TinyGPU/TinyFAT!
Woody is the next generation throughput cluster and became operational in summer 2022.
- 2 public (and 2 dedicated) front end nodes, each with two Intel Xeon Gold 6342 CPUs (“Icelake”, 2x 24 cores, HT disabled, 2.80 GHz base frequency), 512 GB RAM, and a 100 GbE connection to RRZE’s network backbone.
- 64 compute nodes (w12xx/w13xx nodes) with one Intel Xeon E3-1240 v5 CPU (“Skylake”, 4 cores, HT disabled, 3.50 GHz base frequency), 32 GB RAM, and a 1 TB HDD; a total of 256 cores, from 04/2016 and 01/2017.
- 112 compute nodes (w14xx/w15xx nodes) with one Intel Xeon E3-1240 v6 CPU (“Kaby Lake”, 4 cores, HT disabled, 3.70 GHz base frequency), 32 GB RAM, and a 960 GB SSD; a total of 448 cores, from Q3/2019.
- 110 compute nodes (w22xx/w23xx/w24xx/w25xx nodes) with two Intel Xeon Gold 6326 CPUs (“Icelake”, 2x 16 cores, HT disabled, 2.90 GHz base frequency), 256 GB RAM, and a 1.8 TB SSD; a total of 3,520 cores, from Q2/2022 and Q2/2023.

40 nodes are financed by ECAP and dedicated to them; an additional 40 nodes are financed by and belong to a project from the Physics department.
This page covers access, the software environment, file systems, batch processing, and performance characteristics of the cluster.
Access, User Environment, and File Systems
Access to the machine
Users can connect to woody.nhr.fau.de (note the “nhr” instead of “rrze”!) and will be randomly routed to one of the two front ends. All systems in the cluster, including the front ends, have private IPv4 addresses in the 10.28.244.0/22 range and IPv6 addresses in the 2001:638:a000:4801::/64 range. They can normally only be accessed directly from within the FAU networks. There is one exception: if your internet connection supports IPv6, you can ssh directly to the front ends (but not to the compute nodes). Otherwise, if you need access from outside of FAU, you usually have to connect to a dialog server such as cshpc.rrze.fau.de first and then ssh to woody.nhr.fau.de from there.
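As a minimal sketch of such a two-hop login with OpenSSH’s ProxyJump option (replace username with your own HPC account name; the host names are the ones given above):

```bash
# hop over the dialog server to reach the Woody front ends from outside of FAU
ssh -J username@cshpc.rrze.fau.de username@woody.nhr.fau.de

# with IPv6 connectivity, the front ends can be reached directly
ssh username@woody.nhr.fau.de
```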
SSH public host keys of woody.nhr.fau.de (as of 07/2022)
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFBe6F2hyQkJp19HVlZJnc3d/SiSfMRx4LDXAGnhFKXi4sBMWbMxsAFQPT5ZnlTRsPbhMJJeJLGxycQg/pGboyI= woody.nhr.fau.de
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIw6ALXTLpIx+JJG3D917bkTlA2J4vXZGXDIysOV+ZiP woody.nhr.fau.de
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDL6ZBgdcFMU1RjtDpBGjtTovgQZqT/Bd0QWfxzgYvqzUJohiTIAk5YPNq5zygi5yVfO7NphURKSCYn/9XPVa1bBEoVpVAFa5hX1aN1nKynL0Ao2FIlfp/eO9jN0sH4FFdkEoMyIPdYrbsaXULQafREZi2J3dZhhx4+hItP9euAGQCGbdpmZ0f6o+PBleQ6jdILpGhz8Iw/+eMLkqul7okGhiUP9IZqFtJnUW2Smqlgty3qI5eK4TdXDUV17Uvqk4YJFpwyLjXuOCantL+jadtYtEpYtlHW5ZLz33LKkMnnpb4k3ZSJ9QprI8i9/RFRtQlarz7KCnRjbVaI94k51wzV3JIFasdEx2NXDxBC8ZVoi/fWYyt2gsMV+4UB6BSo31xsI3RhsTfpAtUGF0Mt4XV/XuSyMTcX1u6qaSZKCOOVWkp75h+1FROHKlTPtMnLLc71WohDKk/snLpOS736SKSLVoxA+t+21+SiMkFcZa+mX7GySQtHz42ahTS3nMfQsMc= woody-ng woody.nhr.fau.de
fingerprints of the SSH public host keys of woody.nhr.fau.de (as of 07/2022)
256 SHA256:sf45uAzebm6at31zW3/e/T6nyd+sdCSBR9wgeu8ZOuc woody.nhr.fau.de (ECDSA)
256 SHA256:GUA916wyrc1WFjUT9FcqoJU3aNaHyb40QwJfbtXG5UI woody.nhr.fau.de (ED25519)
3072 SHA256:jpQpiCp3FAVLOap+f1QNdL52NskusxEdhowyUru5gM0 woody.nhr.fau.de (RSA)
SSH public host keys of cshpc.rrze.fau.de (as of 11/2021)
ssh-dss AAAAB3NzaC1kc3MAAACBAO2L8+7bhJm7OvvJMcdGSJ5/EaxvX5RRzE9RrB8fx5H69ObkqC6Baope4rOS9/+2gtnm8Q3gZ5QkostCiKT/Wex0kQQUmKn3fx6bmtExLq8YwqoRXRmNTjBIuyZuZH9w/XFK36MP63p/8h7KZXvkAzSRmNVKWzlsAg5AcTpLSs3ZAAAAFQCD0574+lRlF0WONMSuWeQDRFM4vwAAAIEAz1nRhBHZY+bFMZKMjuRnVzEddOWB/3iWEpJyOuyQWDEWYhAOEjB2hAId5Qsf+bNhscAyeKgJRNwn2KQMA2kX3O2zcfSdpSAGEgtTONX93XKkfh6JseTiFWos9Glyd04jlWzMbwjdpWvwlZjmvPI3ATsv7bcwHji3uA75PznVUikAAACBANjcvCxlW1Rjo92s7KwpismWfcpVqY7n5LxHfKRVqhr7vg/TIhs+rAK1XF/AWxyn8MHt0qlWxnEkbBoKIO5EFTvxCpHUR4TcHCx/Xkmtgeq5jWZ3Ja2bGBC3b47bHHNdDJLU2ttXysWorTXCoSYH82jr7kgP5EV+nPgwDhIMscpk cshpc.rrze.fau.de
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNVzp97t3CxlHtUiJ5ULqc/KLLH+Zw85RhmyZqCGXwxBroT+iK1Quo1jmG6kCgjeIMit9xQAHWjS/rxrlI10GIw= cshpc.rrze.fau.de
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPSIFF3lv2wTa2IQqmLZs+5Onz1DEug8krSrWM3aCDRU cshpc.rrze.fau.de
1024 35 135989634870042614980757742097308821255254102542653975453162649702179684202242220882431712465065778248253859082063925854525619976733650686605102826383502107993967196649405937335020370409719760342694143074628619457902426899384188195801203193251135968431827547590638365453993548743041030790174687920459410070371 cshpc.rrze.fau.de
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAs0wFVn1PN3DGcUtd/JHsa6s1DFOAu+Djc1ARQklFSYmxdx5GNQMvS2+SZFFa5Rcw+foAP9Ks46hWLo9mOjTV9AwJdOcSu/YWAhh+TUOLMNowpAEKj1i7L1Iz9M1yrUQsXcqDscwepB9TSSO0pSJAyrbuGMY7cK8m6//2mf7WSxc= cshpc.rrze.fau.de
fingerprints of the SSH public host keys of cshpc.rrze.fau.de (as of 11/2021)
1024 SHA256:A82eA7py46zE/TrSTCRYnJSW7LZXY16oOBxstJF3jxU cshpc.rrze.fau.de (DSA)
256 SHA256:wFaDywle3yJvygQ4ZAPDsi/iSBTaF6Uoo0i0z727aJU cshpc.rrze.fau.de (ECDSA)
256 SHA256:is52MRsxMgxHFn58o0ZUh8vCzIuE2gYanmhrxdy0rC4 cshpc.rrze.fau.de (ED25519)
1024 SHA256:Za1mKhTRFDXUwn7nhPsWc7py9a6OHqS2jin01LJC3ro cshpc.rrze.fau.de (RSA)
While it is possible to ssh directly to a compute node, users are only allowed to do this while they have a batch job running there. When all batch jobs of a user on a node have ended, all of their processes, including any open shells, will be killed automatically.
Software environment
The frontend and compute nodes run AlmaLinux 8 (which is basically Red Hat Enterprise Linux 8 without the support).
The login shell for all users on Woody is always bash and cannot be changed.

As on many other HPC systems, environment modules are used to facilitate access to software packages. Type “module avail” to get a list of available packages. Even more packages become visible once one of the 000-all-spack-pkgs modules has been loaded. Most of the software is installed using Spack as an enhanced HPC package manager.
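As an illustration (a sketch only; the exact version suffix of the 000-all-spack-pkgs module may differ on the system), the typical workflow looks like this:

```bash
module avail                    # list the software packages visible by default
module load 000-all-spack-pkgs  # expose the full set of Spack-installed packages
module avail                    # the list now also contains the additional packages
```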
General notes on how to use certain software on our systems (including, in some cases, sample job scripts) can be found on the Special applications, and tips & tricks pages. Specific notes on how some software provided via modules on the Woody cluster has been compiled can be found in the following accordion:
The module intel (and the Spack internal intel-oneapi-compilers) provides the legacy Intel compilers icc, icpc, and ifort as well as the new LLVM-based ones (icx, icpx, dpcpp, ifx).
Recommended compiler flags are: -O3 -xHost
If you want to enable full-width AVX512 SIMD support (only available on the w22xx/w23xx Icelake nodes) you have to additionally set the flag: -qopt-zmm-usage=high
The frontend nodes have Intel Icelake processors which support AVX512, while the w1xxx compute nodes do not. Thus, software that was compiled, explicitly or implicitly, with these advanced instruction-set flags may fail on the w1xxx compute nodes with “illegal instruction”.
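As a hedged example (my_app.f90 is a placeholder source file, not something provided on the cluster), a build with the recommended flags could look like this:

```bash
module load intel
# -xHost targets the CPU of the machine you compile on: a binary built this way on
# the Icelake front ends may abort with "illegal instruction" on the w1xxx nodes
ifx -O3 -xHost -qopt-zmm-usage=high -o my_app my_app.f90
```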
The module intelmpi (and the Spack internal intel-oneapi-mpi) provides Intel MPI. To use the legacy Intel compilers with Intel MPI, just use the wrappers with the Intel compiler names, i.e. mpiicc, mpiicpc, mpiifort. To use the new LLVM-based Intel compilers with Intel MPI, you have to specify them explicitly, i.e. use mpiicc -cc=icx, mpiicpc -cxx=icpx, or mpiifort -fc=ifx. The execution of mpicc, mpicxx, and mpif90 results in using the GNU compilers.
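A short sketch of the wrapper usage described above (mpi_app.c and mpi_app.f90 are placeholder file names):

```bash
module load intel intelmpi
# the classic Intel compilers are selected simply by the wrapper name
mpiicc -O3 -xHost -o mpi_app mpi_app.c
# the LLVM-based compilers have to be requested explicitly
mpiicc -cc=icx -O3 -xHost -o mpi_app mpi_app.c
mpiifort -fc=ifx -O3 -xHost -o mpi_app_f mpi_app.f90
# mpicc / mpicxx / mpif90 would fall back to the GNU compilers
```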
The modules mkl and tbb (and the Spack internal intel-oneapi-mkl and intel-oneapi-tbb) provide Intel MKL and TBB. Use Intel’s MKL link line advisor to figure out the appropriate command line for linking with MKL. The Intel MKL also includes drop-in wrappers for FFTW3.
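One possible link line is sketched below (blas_app.c is a placeholder; the -qmkl shorthand assumes a recent oneAPI compiler, and the MKL link line advisor remains the authoritative source for your specific combination):

```bash
module load intel mkl
# link against sequential MKL; use -qmkl=parallel for the threaded variant
icx -O3 -xHost -qmkl=sequential -o blas_app blas_app.c
```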
Further Intel tools may be added in the future.
The Intel modules on Fritz, Alex, Woody, and the Slurm-based TinyGPU/TinyFAT behave differently than on the older RRZE systems: (1) The intel64 module has been renamed to intel and no longer automatically loads intel-mpi and mkl. (2) intel-mpi/VERSION-intel and intel-mpi/VERSION-gcc have been unified into intel-mpi/VERSION. The compiler is selected by the wrapper name, e.g. mpicc = GCC, mpiicc = Intel; mpif90 = GFortran, mpiifort = Intel.
The GNU compilers are available in the version coming with the operating system (currently 8.5.0) as well as via modules (currently versions 11.2.0 and 12.1.0).
Recommended compiler flags are: -O3 -march=native
If you want to enable full-width AVX512 SIMD support with the GNU compilers (only useful on the w22xx/w23xx Icelake nodes), you have to additionally set the flag: -mprefer-vector-width=512
The frontend nodes have Intel Icelake processors which support AVX512, while the w1xxx compute nodes do not. Thus, software that was compiled, explicitly or implicitly, for the full instruction set of the front ends may fail on the w1xxx compute nodes with “illegal instruction”.
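A corresponding sketch for the GNU compilers (my_app.c is a placeholder source file, and the module name gcc is an assumption; the OS-provided compiler works without loading any module):

```bash
module load gcc
# -march=native targets the build host, so a binary built on the Icelake
# front ends with AVX512 enabled may not run on the w1xxx nodes
gcc -O3 -march=native -mprefer-vector-width=512 -o my_app my_app.c
```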
Feel free to compile software yourself in the versions and with the options you need. This is perfectly fine, yet support for self-installed software cannot be granted. We can only provide software centrally which is of importance for multiple groups. If you want to use Spack for compiling additional software, you can load our user-spack module to make use of the packages we have already built with Spack (if the concretization matches) instead of starting from scratch. Once user-spack is loaded, the command spack will be available (as an alias); you will inherit the presets we defined for certain packages (e.g. Open MPI to work with Slurm), but you will install everything into your own directories ($WORK/USER-SPACK).
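A minimal sketch of this workflow (fftw is just an example package name):

```bash
module load user-spack
# check how the package would be concretized and what is already installed
spack spec -I fftw
# build it; the result ends up in your own $WORK/USER-SPACK directories
spack install fftw
# make the freshly built package available in the current shell
spack load fftw
```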
You can also bring your own environment in a container using Singularity/Apptainer.
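For illustration only (image and application names are placeholders), running a containerized application could look like this:

```bash
# pull/convert an image once, e.g. from a registry
apptainer pull my_container.sif docker://ubuntu:22.04
# run your application inside the container
apptainer exec my_container.sif ./my_application
```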
File Systems
The following table summarizes the available file systems and their features. It is only an excerpt from the description of the HPC file system.
Further details will follow.
Quota in $HOME is very limited as snapshots are made every 30 minutes. Put simulation data into $WORK! Do not rely on the specific path of $WORK, as this may change over time when your work directory is relocated to a different NFS server.

Use $TMPDIR for the (limited) local storage, which gets cleaned up once the job is finished.
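A sketch of a job script that stages data through the node-local $TMPDIR (file and directory names are placeholders):

```bash
#!/bin/bash -l
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV

# copy the input to the fast node-local scratch space
cp "$WORK/my_project/input.dat" "$TMPDIR"
cd "$TMPDIR"
./my_application input.dat
# save the results before the job ends, because $TMPDIR is cleaned up afterwards
cp results.dat "$WORK/my_project/"
```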
Batch processing
As with all production clusters at RRZE, resources are controlled through a batch system. The front ends can be used for compiling and very short serial test runs, but everything else has to go through the batch system to the cluster.
TinyGPU and TinyFAT are no longer served by the Woody frontend nodes; use tinyx.nhr.fau.de instead to submit jobs to TinyGPU/TinyFAT!
Woody uses SLURM as a batch system. Please see our general batch system description for further details.
- The granularity of batch allocations is individual cores – not complete nodes, i.e.
- nodes can be shared but your cores and your share of the main memory are exclusive
- memory bandwidth, network bandwidth, and the local HDD/storage are shared.
- You always get 7.75 GB of main memory per requested core.
- Multi-node jobs are not possible as Woody is a throughput resource.
Partition | min – max walltime | min – max cores | availability | comments |
---|---|---|---|---|
work | 0 – 2:00:00 | 1 – 32 cores; always on 1 node | a few nodes will be reserved for short jobs once the cluster is more loaded | cores & main memory (7.75 GB per requested core) are exclusive; short jobs will automatically benefit from the reserved nodes |
work | 0 – 24:00:00 | 1 – 32 cores; always on 1 node | always | cores & main memory (7.75 GB per requested core) are exclusive |
If you submit jobs, then by default you can get any type of node: Icelake, Skylake, or Kaby Lake based. They differ in their total number of cores and main memory. However, with every type of node you will automatically get 7.75 GB per requested core. Due to differences in processor generation, the speed of the CPUs can be different, which means that job runtimes may vary.
It is also possible to request certain types of nodes from the batch system. This has two major use cases besides the obvious “benchmarking”: some applications can benefit from AVX512, which is only available on the Icelake nodes, and the Icelake nodes have more main memory and CPU cores in total. You request a certain node type by adding the option --constraint=... to the sbatch/salloc command. In general, the following node types are available:
Constraint | Matching nodes | Number of cores | RAM per node | Comments |
---|---|---|---|---|
(none specified) | all | 1-32 | 32-256 GB | Can run on any node, that is all the Icelake, Skylake, and Kabylake nodes. |
--constraint=sl | w12xx and w13xx | 1-4 | 32 GB | Can run on the Skylake nodes only. |
--constraint=kl | w14xx and w15xx | 1-4 | 32 GB | Can run on the Kabylake nodes only. |
--constraint=icx | w22xx and w23xx | 1-32 | 256 GB | Can run on the Icelake nodes only. |
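For example (my_job_script.sh is a placeholder), Icelake nodes can be requested either on the command line or inside the job script:

```bash
# on the command line
sbatch --constraint=icx my_job_script.sh

# or as a directive in the job script itself
#SBATCH --constraint=icx
```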
Interactive jobs can be requested by using salloc instead of sbatch and specifying the respective options on the command line.
The following will give you an interactive shell on one node with one core dedicated to you for one hour:
salloc -n 1 --time=01:00:00
Settings from the calling shell (e.g. loaded module paths) will be inherited by the interactive job!
In this example, the executable will be run on one node, using 4 MPI processes, i.e. one per physical core.
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV
module load XXX
srun ./mpi_application
In this example, the executable will be run using 4 OpenMP threads for a total job walltime of 1 hour.
For more efficient computation, OpenMP threads should be pinned to the compute cores. This can be achieved by setting the following environment variables: OMP_PLACES=cores, OMP_PROC_BIND=true. For more information, see e.g. the HPC Wiki.
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00
#SBATCH --export=NONE
unset SLURM_EXPORT_ENV
module load XXX
# set number of threads to requested cpus-per-task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./openmpi_application
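If you want the thread pinning described above, a minimal extension of this script would set the two environment variables before starting the application:

```bash
# pin OpenMP threads to physical cores
export OMP_PLACES=cores
export OMP_PROC_BIND=true
```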
Further Information
performance characteristics (depending on used instruction set for Linpack) | w12xx/w13xx Intel Xeon E3-1240 v5 CPU (“Skylake”) | w14xx/w15xx Intel Xeon E3-1240 v6 CPU (“Kaby Lake”) | w22xx/w23xx Intel Xeon Gold 6326 CPUs (“Icelake”) |
---|---|---|---|
only one core in use; single core HPL with AVX512 | n/a | n/a | 92.8 GFlop/s |
only one core in use; single core HPL with AVX2 | 58.6 GFlop/s | 61.4 GFlop/s | 49.2 GFlop/s |
only one core in use; single core HPL with AVX | 30.2 GFlop/s | 31.6 GFlop/s | 25.6 GFlop/s |
only one core in use; single core HPL with SSE4.2 | 15.3 GFlop/s | 16.0 GFlop/s | 13.7 GFlop/s |
only one core in use; single core memory bandwidth (stream triad) | 20.2 GB/s ( 25.8 GB/s with NT stores) | 21.9 GB/s (27.6 GB/s with NT stores) | 15.8 GB/s (21.0 GB/s with NT stores) |
throughput HPL per 4 cores with AVX512 | n/a | n/a | 231 GFlop/s |
throughput HPL per 4 cores with AVX2 | 206 GFlop/s | 219 GFlop/s | 145 GFlop/s |
throughput HPL per 4 cores with AVX | 112 GFlop/s | 118 GFlop/s | 90 GFlop/s |
throughput HPL per 4 cores with SSE4.2 | 57 GFlop/s | 60 GFlop/s | 48 GFlop/s |
throughput memory bandwidth per 4 cores (stream triad) | 21.2 GB/s ( 28.7 GB/s with NT stores) | 21.9 GB/s ( 27.6 GB/s with NT stores) | 29 GB/s (31 GB/s with NT stores) |
theoretical Ethernet throughput per 4 cores | 1 Gbit/s | 1 Gbit/s | 3 Gbit/s |
HDD / SSD space per 4 cores | 900 GB | 870 GB | 210 GB |
The performance of the new Icelake nodes is only better in the High Performance Linpack (HPL) if AVX512 instructions are used!
When the nodes are saturated in throughput mode, the memory bandwidth per (four) cores is only slightly higher.