Tutorials & Courses
Apart from regular teaching activities, we are known for our “Node-Level Performance Engineering” tutorials and courses, which we provide regularly for the PRACE course program at the German Gauss Centre for Supercomputing (GCS) sites at Garching (LRZ) and Stuttgart (HLRS), and at the Vienna Scientific Cluster (VSC) at TU Wien. At these sites, we are also actively involved in “MPI+X” hybrid programming tutorials in close collaboration with lecturers from HLRS and VSC.
We also give tutorials on node-level performance engineering and hybrid programming at top-ranked conferences. Our full-day tutorial “Node-Level Performance Engineering” has been a regular event at the IEEE/ACM Supercomputing conference series since 2012. Upon request, we also offer our tutorial and course program to interested computing centers, research institutions, and industry.
Beyond these signature events, we offer courses on parallel programming, GPU programming, code optimization, modern C++, and more.
To see upcoming dates for our courses, please click on the name of the course you are interested in.
If you want to participate in one of our courses, please find the link to the registration in the respective accordion section.
Overview of the entire course program
HPC Introduction
Parallel Programming of High-Performance Systems
This online course, a collaboration between the Erlangen National High Performance Computing Center (NHR@FAU) and the Leibniz Supercomputing Center (LRZ), is targeted at students and scientists with an interest in programming modern HPC hardware, specifically the large-scale parallel computing systems available in Jülich, Stuttgart, and Munich, but also smaller clusters in Tier-2/3 centers and departments.
Upcoming
- Three-day online course (PPHPS23), March 7–9, 2023 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff).
Past
- Three-day online course (PPHPS22), March 8–10, 2022 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff).
- Three-day online course (PPHPS21), April 13–15, 2021 (together with LRZ staff).
- Annual course at RRZE, March 9–13, 2020 (together with LRZ staff).
Introduction to parallel programming with OpenMP
Upcoming
- Introduction to OpenMP: part 1 (online), March 21, 2023
- Introduction to OpenMP: part 2 (online), March 28, 2023
Past
- Full-day online course, October 4, 2022.
Fundamentals of Accelerated Computing with CUDA C/C++
This course covers the fundamental tools and techniques for accelerating C/C++ applications to run on massively parallel GPUs with CUDA. You’ll learn how to write code, configure code parallelization with CUDA, optimize memory migration between the CPU and GPU accelerator, and implement the workflow that you’ve learned on a new task – accelerating a fully functional, but CPU-only, particle simulator for observable massive performance gains. At the end of the workshop, you’ll have access to additional resources to create new GPU-accelerated applications on your own.
Upcoming
- Full-day in-person course, March 23, 2023
Past
- Full-day online course, November 28, 2022 (in collaboration with LRZ Garching).
- Online course in two half-day sessions, April 21–22, 2022.
Fundamentals of Accelerated Computing with CUDA Python
This course conveys the fundamental tools and techniques for running GPU-accelerated Python applications using CUDA GPUs and the Numba compiler.
Upcoming
- Full-day in-person course, March 16, 2023
Past
- Online course in two half-day sessions, September 22–23, 2022.
- Online course in two half-day sessions, August 2–3, 2022.
Advanced HPC
Core-Level Performance Engineering
This course covers performance engineering approaches on the CPU core level.
While many developers put a lot of effort into optimizing parallelism, they often lose sight of the importance of efficient serial code. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys a thorough understanding of the interactions between software and hardware on the level of a single CPU core and the lowest memory hierarchy level, the L1 cache. It covers general computer architecture for x86 and ARM processors, an introduction to assembly code (x86 in AT&T syntax and AArch64), and performance analysis and engineering using the Open Source Architecture Code Analyzer (OSACA) tool in combination with the Compiler Explorer.
Upcoming
- Full-day tutorial at ICPE 2023, the 14th ACM/SPEC International Conference on Performance Engineering, April 15–19, 2023, Coimbra, Portugal.
Node-Level Performance Engineering
This course covers performance engineering approaches on the compute node level.
Even application developers who are fluent in OpenMP and MPI often lack a good grasp of how much performance could at best be achieved by their code. This is because parallelism takes us only half the way to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the required knowledge to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code gets executed that does the actual computational work. We introduce the basic architectural features and bottlenecks of modern processors and compute nodes. Pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, etc., are covered. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in due detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations with a scientific process.
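As a rough illustration of the Roofline idea mentioned above, the following minimal sketch computes the basic bound P = min(P_peak, I · b_S) and uses a simplified multicore variant to show why slow serial code can appear to scale well. This is not course material: the function names and all machine numbers (memory bandwidth, per-core performance, core count) are assumptions chosen only for illustration.

```python
def roofline(peak_flops, mem_bandwidth, intensity):
    """Basic Roofline bound in flop/s: min(P_peak, I * b_S)."""
    return min(peak_flops, intensity * mem_bandwidth)


def scaled_performance(cores, per_core_flops, mem_bandwidth, intensity):
    """Simplified multicore model (an assumption of this sketch):
    performance grows linearly with the core count until it hits
    the memory bandwidth roof."""
    return min(cores * per_core_flops, intensity * mem_bandwidth)


def speedup(cores, per_core_flops, mem_bandwidth, intensity):
    """Speedup of the modeled code on `cores` cores relative to one core."""
    single = scaled_performance(1, per_core_flops, mem_bandwidth, intensity)
    multi = scaled_performance(cores, per_core_flops, mem_bandwidth, intensity)
    return multi / single


# Hypothetical compute node (illustrative numbers, not measurements).
BANDWIDTH = 100e9   # memory bandwidth in byte/s
INTENSITY = 0.125   # computational intensity in flop/byte
CORES = 8

# Memory-bound limit for this intensity: 0.125 flop/byte * 100 GB/s.
ROOF = roofline(1e12, BANDWIDTH, INTENSITY)

# Same loop with well-optimized serial code (5 Gflop/s per core)
# and with poorly optimized serial code (1 Gflop/s per core).
fast = speedup(CORES, 5e9, BANDWIDTH, INTENSITY)
slow = speedup(CORES, 1e9, BANDWIDTH, INTENSITY)

print(f"roof: {ROOF / 1e9:.1f} Gflop/s")
print(f"speedup on {CORES} cores, fast serial code: {fast:.1f}")
print(f"speedup on {CORES} cores, slow serial code: {slow:.1f}")
```

In this toy model, the slow variant shows the larger speedup only because it never reaches the bandwidth limit. The fast variant saturates the memory interface with a few cores and still delivers higher absolute performance.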
Upcoming
- Four-day online tutorial at the High Performance Computing Center Stuttgart (HLRS), June 27–30, 2023 (with ZIH staff)
Past
- Three-day online PRACE tutorial at the Leibniz Supercomputing Center (LRZ), December 5–7, 2022.
- Full-day tutorial at Supercomputing 2022 (SC22), Nov 13–18, 2022, Dallas, TX.
- Four-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), June 28–July 1, 2022 (with ZIH staff)
- Three-day online PRACE tutorial at the Leibniz Supercomputing Center (LRZ), December 1–3, 2021.
- Full-day tutorial at Supercomputing 2021 (SC21), Nov 14–19, 2021, St. Louis, MO.
- Three-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), July 12–14, 2021.
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, March 10–12, 2021.
- Full-day tutorial at the virtual Supercomputing 2020 (SC20), Nov 9–20, 2020, Atlanta, GA.
- Three-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), June 29–July 1, 2020.
- Three-day short course at the University of Cologne, January 20–22, 2020.
Introduction to Hybrid Programming in HPC
Most HPC systems are clusters of shared memory nodes. To use such systems efficiently, both memory consumption and communication time have to be optimized. Therefore, hybrid programming may combine the distributed memory parallelization on the node interconnect (e.g., with MPI) with the shared memory parallelization inside each node (e.g., with OpenMP or MPI-3.0 shared memory). This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket-multi-core systems in highly parallel environments are given special consideration. MPI-3.0 has introduced a new shared memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and pure MPI. Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming.
Hands-on sessions are included on all days. Tools for hybrid programming such as thread/process placement support and performance analysis are presented in a “how-to” section. This course provides scientific training in computational science and, in addition, fosters scientific exchange among the participants.
Past
- Three-day online PRACE tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, December 12–14, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online PRACE tutorial at LRZ Garching, Germany, June 22–24, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online PRACE tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, April 5–7, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, June 15–17, 2021 (with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, June 17–19, 2020 (with Rolf Rabenseifner [HLRS], Irene Reichl, and Claudia Blaas-Schenner [TU Wien]).
- Two-day tutorial at High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany, January 27–28, 2020 (with Rolf Rabenseifner [HLRS], Irene Reichl, and Claudia Blaas-Schenner [TU Wien]).
HPC Tools
Performance analysis on GPUs with NVIDIA tools
Upcoming
- Half-day online course at NHR@FAU, April 4, 2023
Past
- Half-day online course at NHR@FAU, September 29, 2022.
- GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS), online, July 18–30, 2021.
VI-HPS Tuning Workshop
This workshop, organized by VI-HPS and the Erlangen National High Performance Computing Center, will give an overview of the VI-HPS programming tools suite, explain the functionality of the individual tools and how to use them effectively, and offer hands-on experience and expert assistance in using the tools.
On completion, participants should be familiar with common performance analysis and diagnosis techniques and how they can be employed in practice (on a range of HPC systems). Those who prepared their own application test cases will have been coached in the tuning of their measurement and analysis, and provided with optimization suggestions.
Past
- Three-day online workshop at NHR@FAU, March 1–3, 2021.
- Three-day online workshop at CSC Frankfurt, December 7–11, 2020.
- Three-day online workshop at CINECA, Italy, September 30–October 2, 2020.
LIKWID
LIKWID stands for “Like I Knew What I’m Doing.” It is an easy-to-use yet powerful command line performance tool suite for the GNU/Linux operating system. While the focus of LIKWID is on x86 processors, some of the tools are portable and not limited to any specific architecture. For the upcoming release, LIKWID has been ported to ARMv7/v8 and POWER8/9 architectures as well as to NVIDIA GPU co-processors.
Past
- LIKWID, OSACA, and Sparse MVM on A64FX. Webinar for Stony Brook University, July 27, 2021. Video recording
- Webinar: Using the LIKWID and OSACA tools on A64FX. June 2, 2021. Video
Performance Evaluation
Past
- 2021 Code Performance Series: From analysis to insight. Online session on “Single-Node optimization,” July 15, 2021. Video recording
- EXA2PRO-EoCoE joint workshop, afternoon online session “Performance Engineering and code generation techniques”, February 23, 2021. Slides
Programming
Introduction to C++ for beginners
The focus of this course is on the introduction of the essential language features and the syntax of C++. Additionally, it introduces many C++ software development principles, concepts, idioms, and best practices, which enable programmers to create professional, high-quality code from the very beginning.
The course aims at understanding the core of the C++ programming language, teaches guidelines to develop mature, robust, maintainable, and efficient C++ software, and helps to avoid the most common pitfalls. Attendees should have a grasp of general programming (in any language).
Upcoming
- The next course is planned for October 2023.
Past
- Five-day online course at NHR@FAU, October 10–14, 2022.
Modern C++ Software Design
This advanced C++ training is a course on software development with the C++ programming language. The training focuses on the essential C++ software development principles, concepts, idioms, and best practices, which enable programmers to create professional, high-quality code.
The course will give insight into the different aspects of C++ (object-oriented programming, functional programming, generic programming) and will teach guidelines to develop mature, robust, maintainable, and efficient C++ code.
Upcoming
- The next course is planned for October 2023.
Past
- Three-day online course at NHR@FAU, October 5–7, 2022.
Tutorials on Molecular Dynamics Simulations
GROMACS Course
December 12–16, 2022
This course introduces the molecular dynamics engine GROMACS, including fundamental commands and applications. Over five days, the participants will learn how to prepare and run simulations of biomolecular systems (e.g., membranes and proteins) at atomistic and coarse-grained levels of resolution. Post-processing and analysis of simulation trajectories are a large part of the tutorial.
The course is usually embedded in the Bachelor programs of Biology and Integrated Life Sciences. There are five places available for people from NHR. The course will be held in person and takes place in the CIP of the Biology Department.
Interested candidates should send a short note about their background and motivation to rainer.boeckmann@fau.de.