HPC clusters & systems
NHR@FAU operates several HPC systems that target different application areas. Some systems provide basic Tier3 service for FAU only, while others are operated jointly for NHR and Tier3 FAU access. Tier3 systems/parts are financed by FAU or as a DFG Forschungsgroßgerät (major research instrumentation grant), while NHR systems/parts are funded by federal and state authorities (BMBF and the Bavarian State Ministry of Science and the Arts, respectively).
Overview
| Cluster name | # nodes | Target applications | Parallel filesystem | Local hard disks | Description |
|---|---|---|---|---|---|
| Fritz (NHR+Tier3) | 992 | high-end massively parallel | Yes | No | Open for NHR and Tier3 after application |
| Alex (NHR+Tier3) | 82 nodes with 304 Nvidia A100 and 352 Nvidia A40 GPGPUs | high-end GPGPU | Yes (but only via Ethernet) | Yes (NVMe SSDs) | Open for NHR and Tier3 after application |
| Meggie (Tier3) | 728 | parallel | No longer available | No | RRZE's main workhorse, intended for parallel jobs |
| Woody (Tier3) | 288 | serial throughput | No | Yes | Cluster with fast (single- and dual-socket) CPUs for serial throughput workloads |
| TinyGPU (Tier3) | 35 nodes, 1638 GPUs | GPU | No | Yes (SSDs) | Nodes are equipped with NVIDIA GPUs (mostly 4 GPUs per node) |
| TinyFat (Tier3) | 47 | large memory requirements | No | Yes (SSDs) | For applications that require large amounts of memory; each node has 256 or 512 gigabytes of main memory |
Alex (installed 2021/2022; extended 2023)
| | |
|---|---|
| Nodes | 20 GPGPU nodes, each with two AMD EPYC 7713 "Milan" processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip and 1,024 GB of DDR4 RAM, eight Nvidia A100 (each 40 GB HBM2 @ 1,555 GB/s; HGX board with NVLink; 9.7 TFlop/s in FP64 or 19.5 TFlop/s in FP32), two HDR200 InfiniBand HCAs, 25 GbE, and 14 TB on local NVMe SSDs |
| | 18 GPGPU nodes, each with two AMD EPYC 7713 "Milan" processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip and 2,048 GB of DDR4 RAM, eight Nvidia A100 (each 80 GB HBM2e @ 2,039 GB/s; HGX board with NVLink; 9.7 TFlop/s in FP64 or 19.5 TFlop/s in FP32), two HDR200 InfiniBand HCAs, 25 GbE, and 14 TB on local NVMe SSDs |
| | 44 GPGPU nodes, each with two AMD EPYC 7713 "Milan" processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip and 512 GB of DDR4 RAM, eight Nvidia A40 (each 48 GB GDDR6 @ 696 GB/s; 37.42 TFlop/s in FP32), 25 GbE, and 7 TB on local NVMe SSDs |
| Linpack Performance | 4.030 PFlop/s |
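
For a quick sense of scale, the per-GPU figures above can be multiplied out into aggregate peaks. The following minimal sketch does nothing more than that; it uses only the numbers quoted in the node table and ignores the host CPUs:

```python
# Aggregate GPU peak for Alex, computed only from the per-GPU figures
# quoted in the node table above (host CPUs ignored).
a100_gpus = (20 + 18) * 8   # 40 GB and 80 GB A100 nodes, 8 GPUs each
a40_gpus = 44 * 8           # A40 nodes, 8 GPUs each

a100_fp64 = 9.7             # TFlop/s per A100 (FP64)
a100_fp32 = 19.5            # TFlop/s per A100 (FP32)
a40_fp32 = 37.42            # TFlop/s per A40 (FP32)

print(f"A100 aggregate FP64 peak: {a100_gpus * a100_fp64 / 1000:.2f} PFlop/s")  # ~2.95
print(f"A100 aggregate FP32 peak: {a100_gpus * a100_fp32 / 1000:.2f} PFlop/s")  # ~5.93
print(f"A40 aggregate FP32 peak:  {a40_gpus * a40_fp32 / 1000:.2f} PFlop/s")    # ~13.17
```

Note that the measured Linpack value of 4.030 PFlop/s exceeds the plain FP64 vector peak of the A100 partition; this is plausible because the A100's FP64 tensor cores (19.5 TFlop/s per GPU, not listed in the table) can be used for the matrix multiplications in HPL.
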
Fritz (installed 2021/2022; extended 2023)
| | |
|---|---|
| Nodes | 992 compute nodes, each with two Intel Xeon Platinum 8360Y "Ice Lake" chips (36 cores per chip) running at 2.4 GHz with 54 MB shared L3 cache per chip and 256 GB of DDR4 RAM |
| | 64 huge-memory compute nodes, each with two Intel Xeon Platinum 8470 "Sapphire Rapids" processors (52 cores per chip) running at a base frequency of 2.0 GHz with 105 MB shared L3 cache per chip; 48 of these nodes have 1 TB of DDR5 RAM, the remaining 16 have 2 TB of DDR5 RAM |
| Parallel file system | Lustre-based parallel filesystem with a capacity of about 3.5 PB and an aggregated parallel I/O bandwidth of > 20 GB/s |
| Network | Blocking HDR100 InfiniBand with up to 100 GBit/s bandwidth per link and direction |
| Linpack Performance | 3.578 PFlop/s |
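
As a rough plausibility check (a back-of-the-envelope estimate, not an official figure), the nominal FP64 peak of the 992 standard nodes follows from the core count and clock rate listed above, assuming the two AVX-512 FMA units per core of Ice Lake Xeon Platinum processors:

```python
# Nominal FP64 peak of the 992 standard Fritz nodes (the 64 huge-memory
# nodes are ignored). Assumes 2 AVX-512 FMA units per core, i.e.
# 2 units * 8 doubles * 2 flops (FMA) = 32 FP64 flops per core and cycle,
# and the nominal 2.4 GHz clock (the sustained AVX-512 clock is lower).
nodes = 992
cores_per_node = 2 * 36
flops_per_cycle = 32
clock_hz = 2.4e9

peak_pflops = nodes * cores_per_node * flops_per_cycle * clock_hz / 1e15
print(f"Nominal FP64 peak: {peak_pflops:.2f} PFlop/s")       # ~5.49 PFlop/s
print(f"Linpack / nominal peak: {3.578 / peak_pflops:.0%}")  # ~65 %
```

By this crude measure the Linpack result corresponds to roughly 65 % of the nominal peak; relative to the lower AVX-512 clock that is actually sustained, the efficiency is higher.
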
Meggie (installed 2017)
| | |
|---|---|
| Nodes | 728 compute nodes, each with two Intel Xeon E5-2630 v4 "Broadwell" chips (10 cores per chip) running at 2.2 GHz with 25 MB shared cache per chip and 64 GB of RAM |
| Parallel file system | Lustre-based parallel filesystem with a capacity of almost 1 PB and an aggregated parallel I/O bandwidth of > 9,000 MB/s |
| Network | Intel OmniPath interconnect with up to 100 GBit/s bandwidth per link and direction |
| Linpack Performance | 481 TFlop/s |
Emmy (EOL; 2013-2022)
| | |
|---|---|
| Nodes | 560 compute nodes, each with two Intel Xeon E5-2660 v2 "Ivy Bridge" chips (10 cores per chip + SMT) running at 2.2 GHz with 25 MB shared cache per chip and 64 GB of RAM |
| Parallel file system | LXFS with a capacity of 400 TB and an aggregated parallel I/O bandwidth of > 7,000 MB/s |
| Network | Fat-tree InfiniBand interconnect fabric with 40 GBit/s bandwidth per link and direction |
| Linpack Performance | 191 TFlop/s |
Testcluster
For the evaluation of microarchitectures and for research purposes, we also maintain a cluster of test machines. We aim to always have at least one machine of every architecture relevant to HPC; currently, all recent Intel processor generations are available. We also frequently receive early-access prototypes for benchmarking.