HPC clusters & systems
NHR@FAU operates several HPC systems that target different application areas. Some systems provide basic Tier3 service for FAU only, while others are operated jointly for NHR and Tier3 FAU access. Tier3 systems/parts are financed by FAU or as a DFG Forschungsgroßgerät (major research instrumentation grant), while NHR systems/parts are funded by federal and state authorities (BMBF and the Bavarian State Ministry of Science and the Arts, respectively).
Overview
| Cluster name | # nodes | Target applications | Parallel filesystem | Local hard disks | Description |
|---|---|---|---|---|---|
| Fritz (NHR+Tier3) | 992 | high-end massively parallel | Yes | No | Open for NHR and Tier3 after application |
| Alex (NHR+Tier3) | 82 nodes with 304 Nvidia A100 and 352 Nvidia A40 GPGPUs | high-end GPGPU | Yes (but only via Ethernet) | Yes (NVMe SSDs) | Open for NHR and Tier3 after application |
| Meggie (Tier3) | 728 | parallel | No longer available | No | RRZE's main workhorse, intended for parallel jobs |
| Woody (Tier3) | 288 | serial throughput | No | Yes | Cluster with fast (single- and dual-socket) CPUs for serial throughput workloads |
| TinyGPU (Tier3) | 35 nodes, 1638 GPUs | GPU | No | Yes (SSDs) | Nodes are equipped with NVIDIA GPUs (mostly 4 GPUs per node) |
| TinyFat (Tier3) | 47 | large memory requirements | No | Yes (SSDs) | For applications that require large amounts of memory; each node has 256 or 512 gigabytes of main memory |
Alex (installed 2021/2022; extended 2023)
| | |
|---|---|
| Nodes | 20 GPGPU nodes, each with two AMD EPYC 7713 "Milan" processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip and 1,024 GB of DDR4 RAM, eight Nvidia A100 (each 40 GB HBM2 @ 1,555 GB/s; HGX board with NVLink; 9.7 TFlop/s in FP64 or 19.5 TFlop/s in FP32), two HDR200 InfiniBand HCAs, 25 GbE, and 14 TB on local NVMe SSDs |
| | 18 GPGPU nodes, each with two AMD EPYC 7713 "Milan" processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip and 2,048 GB of DDR4 RAM, eight Nvidia A100 (each 80 GB HBM2e @ 2,039 GB/s; HGX board with NVLink; 9.7 TFlop/s in FP64 or 19.5 TFlop/s in FP32), two HDR200 InfiniBand HCAs, 25 GbE, and 14 TB on local NVMe SSDs |
| | 44 GPGPU nodes, each with two AMD EPYC 7713 "Milan" processors (64 cores per chip) running at 2.0 GHz with 256 MB shared L3 cache per chip and 512 GB of DDR4 RAM, eight Nvidia A40 (each 48 GB GDDR6 @ 696 GB/s; 37.42 TFlop/s in FP32), 25 GbE, and 7 TB on local NVMe SSDs |
| Linpack Performance | 4.030 PFlop/s |
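
For a quick sense of scale, the per-GPU figures above can be multiplied out into aggregate peaks. The following minimal sketch does nothing more than that; it uses only the numbers quoted in the node table and ignores the host CPUs:

```python
# Aggregate GPU peak for Alex, computed only from the per-GPU figures
# quoted in the node table above (host CPUs ignored).
a100_gpus = (20 + 18) * 8   # 40 GB and 80 GB A100 nodes, 8 GPUs each
a40_gpus = 44 * 8           # A40 nodes, 8 GPUs each

a100_fp64 = 9.7             # TFlop/s per A100 (FP64)
a100_fp32 = 19.5            # TFlop/s per A100 (FP32)
a40_fp32 = 37.42            # TFlop/s per A40 (FP32)

print(f"A100 aggregate FP64 peak: {a100_gpus * a100_fp64 / 1000:.2f} PFlop/s")  # ~2.95
print(f"A100 aggregate FP32 peak: {a100_gpus * a100_fp32 / 1000:.2f} PFlop/s")  # ~5.93
print(f"A40 aggregate FP32 peak:  {a40_gpus * a40_fp32 / 1000:.2f} PFlop/s")    # ~13.17
```

Note that the measured Linpack value of 4.030 PFlop/s exceeds the plain FP64 vector peak of the A100 partition; this is plausible because the A100's FP64 tensor cores (19.5 TFlop/s per GPU, not listed in the table) can be used for the matrix multiplications in HPL.
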
Fritz (installed 2021/2022; extended 2023)
| | |
|---|---|
| Nodes | 992 compute nodes, each with two Intel Xeon Platinum 8360Y "Ice Lake" chips (36 cores per chip) running at 2.4 GHz with 54 MB shared L3 cache per chip and 256 GB of DDR4 RAM |
| | 64 huge-memory compute nodes, each with two Intel Xeon Platinum 8470 "Sapphire Rapids" processors (52 cores per chip) running at a base frequency of 2.0 GHz with 105 MB shared L3 cache per chip; 48 of these nodes have 1 TB of DDR5 RAM, the remaining 16 have 2 TB of DDR5 RAM |
| Parallel file system | Lustre-based parallel filesystem with a capacity of about 3.5 PB and an aggregated parallel I/O bandwidth of > 20 GB/s |
| Network | Blocking HDR100 InfiniBand with up to 100 GBit/s bandwidth per link and direction |
| Linpack Performance | 3.578 PFlop/s |
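
As a rough plausibility check (a back-of-the-envelope estimate, not an official figure), the nominal FP64 peak of the 992 standard nodes follows from the core count and clock rate listed above, assuming the two AVX-512 FMA units per core of Ice Lake Xeon Platinum processors:

```python
# Nominal FP64 peak of the 992 standard Fritz nodes (the 64 huge-memory
# nodes are ignored). Assumes 2 AVX-512 FMA units per core, i.e.
# 2 units * 8 doubles * 2 flops (FMA) = 32 FP64 flops per core and cycle,
# and the nominal 2.4 GHz clock (the sustained AVX-512 clock is lower).
nodes = 992
cores_per_node = 2 * 36
flops_per_cycle = 32
clock_hz = 2.4e9

peak_pflops = nodes * cores_per_node * flops_per_cycle * clock_hz / 1e15
print(f"Nominal FP64 peak: {peak_pflops:.2f} PFlop/s")       # ~5.49 PFlop/s
print(f"Linpack / nominal peak: {3.578 / peak_pflops:.0%}")  # ~65 %
```

By this crude measure the Linpack result corresponds to roughly 65 % of the nominal peak; relative to the lower AVX-512 clock that is actually sustained, the efficiency is higher.
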
Meggie (installed 2017)
| | |
|---|---|
| Nodes | 728 compute nodes, each with two Intel Xeon E5-2630 v4 "Broadwell" chips (10 cores per chip) running at 2.2 GHz with 25 MB shared cache per chip and 64 GB of RAM |
| Parallel file system | Lustre-based parallel filesystem with a capacity of almost 1 PB and an aggregated parallel I/O bandwidth of > 9,000 MB/s |
| Network | Intel OmniPath interconnect with up to 100 GBit/s bandwidth per link and direction |
| Linpack Performance | 481 TFlop/s |
Emmy (EOL; 2013-2022)
| | |
|---|---|
| Nodes | 560 compute nodes, each with two Intel Xeon E5-2660 v2 "Ivy Bridge" chips (10 cores per chip + SMT) running at 2.2 GHz with 25 MB shared cache per chip and 64 GB of RAM |
| Parallel file system | LXFS with a capacity of 400 TB and an aggregated parallel I/O bandwidth of > 7,000 MB/s |
| Network | Fat-tree InfiniBand interconnect fabric with 40 GBit/s bandwidth per link and direction |
| Linpack Performance | 191 TFlop/s |
Testcluster
For the evaluation of microarchitectures and for research purposes, we also maintain a cluster of test machines. We aim to always have at least one machine of every architecture relevant to HPC; currently, all recent Intel processor generations are available. We also frequently receive early-access prototypes for benchmarking.