Node-Level Performance Engineering

Even application developers who are fluent in OpenMP and MPI often lack a good grasp of the best performance their code could achieve. This is because parallelism only takes us halfway to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the knowledge required to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed.

We introduce the basic architectural features and bottlenecks of modern processors and compute nodes. Pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, and related topics are covered. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations with a scientific process.
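To give a flavor of the kind of reasoning involved, the basic Roofline bound can be sketched in a few lines of Python. This is only an illustration and not course material: the peak performance and memory bandwidth values are made-up numbers for a hypothetical node, the function name `roofline` is chosen here for convenience, and the traffic count for the vector triad ignores effects such as write-allocate transfers.

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Upper performance bound in Gflop/s: min(P_peak, I * b_S)."""
    return min(peak_gflops, intensity_flop_per_byte * bandwidth_gbs)


# Hypothetical node (assumed values, not measurements of any real system).
PEAK_GFLOPS = 1000.0    # peak floating-point performance in Gflop/s
BANDWIDTH_GBS = 100.0   # attainable memory bandwidth in GB/s

# Vector triad in double precision: a[i] = b[i] + s * c[i]
# 2 flops per iteration; 3 arrays * 8 bytes of memory traffic per iteration
# (simplified: the extra write-allocate transfer for a[] is ignored).
flops_per_iter = 2
bytes_per_iter = 3 * 8
triad_intensity = flops_per_iter / bytes_per_iter   # flop/byte

triad_bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, triad_intensity)

# A kernel with high computational intensity hits the compute ceiling instead.
dense_bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, 50.0)

print(f"triad: I = {triad_intensity:.4f} flop/byte, bound = {triad_bound:.2f} Gflop/s")
print(f"dense: bound = {dense_bound:.2f} Gflop/s")
```

Under these assumptions the triad is limited by memory bandwidth to a small fraction of the peak, no matter how well the loop is vectorized. This is the type of prediction that can then be checked against measurements.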

Course Details

Level: Intermediate

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).

Prerequisites

Knowledge

Programming experience in C, C++, or Fortran at a level sufficient to read simple loop kernels
Some exposure to parallel programming (OpenMP or MPI) is helpful but not required

Technical

A modern web browser (for JupyterHub access to NHR@FAU’s HPC clusters)

Upcoming Events

2026, Dec 1-3: three-day online course in collaboration with LRZ (Register)

Past Events

2026, Jun 16: full-day on-site tutorial in Durham, England, as part of Durham HPC Days 2026
2026, Jun 9-12: four-day online course in collaboration with HLRS, ZIH (TU Dresden)
2026, Mar 17: full-day on-site tutorial in Bonn, Germany, as part of 2nd Anniversary of the Supercomputer Marvin
2025, Dec 2-4: three-day online course in collaboration with LRZ
2025, Sep 10-12: three-day on-site course at NHR@FAU
2025, Jun 3-6: four-day on-site course at HLRS in collaboration with HLRS, ZIH (TU Dresden)
2024, Dec 3-5: three-day on-site course at LRZ in collaboration with LRZ
2024, Jun 18-21: four-day on-site course at HLRS in collaboration with HLRS, ZIH (TU Dresden)
2023, Dec 4-6: three-day on-site course at LRZ in collaboration with LRZ
2023, Nov 12: full-day on-site tutorial in Denver, CO, USA, as part of SC23
2023, Oct 4-6: three-day on-site course at NHR@FAU
2023, Jun 27-30: four-day on-site course at HLRS in collaboration with HLRS, ZIH (TU Dresden)
2023, May 11: half-day on-site tutorial in Hamburg, Germany, as part of ISC High Performance
2022, Dec 5-7: three-day on-site course at LRZ in collaboration with LRZ, PRACE
2022, Nov 13: full-day on-site tutorial in Dallas, TX, USA, as part of SC22
2022, Jun 28 – Jul 1: four-day on-site course at HLRS in collaboration with HLRS, PRACE, ZIH (TU Dresden)

For an overview of all NHR@FAU courses, visit the course overview page.
