Even application developers who are fluent in OpenMP and MPI often lack a good grasp of the best performance their code could achieve. This is because parallelism takes us only halfway to good performance. Even worse, slow serial code tends to scale very well, which hides the fact that resources are being wasted. This course conveys the knowledge required to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed.
We introduce the basic architectural features and bottlenecks of modern processors and compute nodes. Pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, etc., are covered. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations with a scientific process.
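As a rough illustration of the kind of estimate the Roofline model gives, the sketch below computes the upper performance bound as the minimum of peak floating-point performance and the product of computational intensity and memory bandwidth. This is not course material. The machine numbers and the function name `roofline` are hypothetical, chosen only for illustration, and the triad kernel's data traffic is simplified (write-allocate transfers are ignored).

```python
def roofline(peak_gflops, mem_bw_gbs, intensity_flop_per_byte):
    """Upper performance bound in GFlop/s: min(P_peak, I * b_S)."""
    return min(peak_gflops, intensity_flop_per_byte * mem_bw_gbs)


# Hypothetical machine parameters (assumptions, not measured values).
PEAK_GFLOPS = 1000.0  # assumed peak floating-point performance
MEM_BW_GBS = 100.0    # assumed saturated memory bandwidth

# Vector triad a[i] = b[i] + s * c[i] in double precision:
# 2 flops per iteration, 3 * 8 bytes of memory traffic
# (two loads and one store; write-allocate ignored for simplicity).
triad_intensity = 2.0 / 24.0

bound = roofline(PEAK_GFLOPS, MEM_BW_GBS, triad_intensity)
print(f"Triad bound: {bound:.2f} GFlop/s")
```

With these assumed numbers the triad is limited by memory bandwidth, far below the peak. A measured performance close to the bound would indicate that the loop already uses the memory interface efficiently.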
Level: Intermediate
Language: English (German upon request for bespoke courses)
Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).
Knowledge
- Programming experience in C, C++, or Fortran at a level sufficient to read simple loop kernels
- Some exposure to parallel programming (OpenMP or MPI) is helpful but not required
Technical
- A modern web browser (for JupyterHub access to NHR@FAU’s HPC clusters)
Dates
- 2026, Dec 1-3: three-day online course in collaboration with LRZ
- 2026, Jun 16: full-day on-site tutorial in Durham, England, as part of Durham HPC Days 2026
- 2026, Jun 9-12: four-day online course in collaboration with HLRS, ZIH (TU Dresden)
- 2026, Mar 17: full-day on-site tutorial in Bonn, Germany, as part of the 2nd Anniversary of the Supercomputer Marvin
- 2025, Dec 2-4: three-day online course in collaboration with LRZ
- 2025, Sep 10-12: three-day on-site course at NHR@FAU
- 2025, Jun 3-6: four-day on-site course at HLRS in collaboration with HLRS, ZIH (TU Dresden)
- 2024, Dec 3-5: three-day on-site course at LRZ in collaboration with LRZ
- 2024, Jun 18-21: four-day on-site course at HLRS in collaboration with HLRS, ZIH (TU Dresden)
- 2023, Dec 4-6: three-day on-site course at LRZ in collaboration with LRZ
- 2023, Nov 12: full-day on-site tutorial in Denver, CO, USA, as part of SC23
- 2023, Oct 4-6: three-day on-site course at NHR@FAU
- 2023, Jun 27-30: four-day on-site course at HLRS in collaboration with HLRS, ZIH (TU Dresden)
- 2023, May 11: half-day on-site tutorial in Hamburg, Germany, as part of ISC High Performance
- 2022, Dec 5-7: three-day on-site course at LRZ in collaboration with LRZ, PRACE
- 2022, Nov 13: full-day on-site tutorial in Dallas, TX, USA, as part of SC22
- 2022, Jun 28 – Jul 1: four-day on-site course at HLRS in collaboration with HLRS, PRACE, ZIH (TU Dresden)
For an overview of all NHR@FAU courses, visit the course overview page.