Tutorials & Courses
Apart from regular teaching activities, we are known for our “Node-Level Performance Engineering” tutorials and courses, which we regularly provide for the PRACE course program at the German Gauss Centre for Supercomputing (GCS) sites in Garching (LRZ) and Stuttgart (HLRS), and at the Vienna Scientific Cluster (VSC) at TU Wien. At these sites, we are also actively involved in “MPI+X” hybrid programming tutorials, in close collaboration with lecturers from HLRS and VSC.
We also give tutorials on node-level performance engineering and hybrid programming at top-ranked conferences. Our full-day tutorial “Node-Level Performance Engineering” has been a regular event in the IEEE/ACM Supercomputing conference series since 2012. Upon request, we also offer our tutorial and course program to interested computing centers, research institutions, and industry.
Beyond these signature events, we offer courses on parallel programming, GPU programming, code optimization, modern C++, and more.
To see upcoming dates for our courses, please click on the name of the course you are interested in. To register for a course, use the registration link in the respective accordion section.
Overview of the entire course program
Parallel Programming of High-Performance Systems
This online course, a collaboration of the Erlangen National High Performance Computing Center (NHR@FAU) and the Leibniz Supercomputing Center (LRZ), is targeted at students and scientists with an interest in programming modern HPC hardware, specifically the large-scale parallel computing systems available in Jülich, Stuttgart, and Munich, but also smaller clusters in Tier-2/3 centers and departments.
- Three-day online course (PPHPS23), March 7–9, 2023 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff).
- Three-day online course (PPHPS22), March 8–10, 2022 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff).
- Three-day online course (PPHPS21), April 13–15, 2021 (together with LRZ staff).
- Annual course at RRZE, March 9–13, 2020 (together with LRZ staff).
Introduction to parallel programming with OpenMP
- Full-day online course, October 4, 2022.
Fundamentals of Accelerated Computing with CUDA C/C++
This workshop teaches the fundamental tools and techniques for accelerating C/C++ applications to run on massively parallel GPUs with CUDA. You’ll learn how to write code, configure code parallelization with CUDA, optimize memory migration between the CPU and GPU accelerator, and implement the workflow that you’ve learned on a new task – accelerating a fully functional, but CPU-only, particle simulator for observable massive performance gains. At the end of the workshop, you’ll have access to additional resources to create new GPU-accelerated applications on your own.
- The next course is planned for the last quarter of 2022.
- Two half-day online sessions, April 21–22, 2022.
Fundamentals of Accelerated Computing with CUDA Python
Node-Level Performance Engineering
This course covers performance engineering approaches on the compute node level. Even application developers who are fluent in OpenMP and MPI often lack a good grasp of how much performance their code could achieve at best. This is because parallelism takes us only halfway to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the knowledge required to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work gets executed. We introduce the basic architectural features and bottlenecks of modern processors and compute nodes. Pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, etc., are covered. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in due detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations with a scientific process.
- Four-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), June 28–July 1, 2022 (with ZIH staff).
- Three-day online PRACE tutorial at the Leibniz Supercomputing Center (LRZ), December 1–3, 2021.
- Full-day tutorial at Supercomputing 2021 (SC21), Nov 14–19, 2021, St Louis, MO.
- Three-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), July 12–14, 2021.
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, March 10–12, 2021.
- Full-day tutorial at the virtual Supercomputing 2020 (SC20), Nov 9–20, 2020, Atlanta, GA.
- Three-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), June 29–July 1, 2020.
- Three-day short course at the University of Cologne, January 20–22, 2020.
Introduction to Hybrid Programming in HPC
Most HPC systems are clusters of shared-memory nodes. To use such systems efficiently, both memory consumption and communication time have to be optimized. Therefore, hybrid programming may combine distributed-memory parallelization across the node interconnect (e.g., with MPI) with shared-memory parallelization inside each node (e.g., with OpenMP or MPI-3.0 shared memory). This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket multi-core systems in highly parallel environments are given special consideration. MPI-3.0 introduced a new shared-memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and it enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and with pure MPI. Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming.
Hands-on sessions are included on all days. Tools for hybrid programming, such as thread/process placement support and performance analysis, are presented in a “how-to” section. This course provides scientific training in Computational Science and, in addition, fosters scientific exchange among the participants.
- Three-day online PRACE tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, April 5–7, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, June 15–17, 2021 (with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, June 17–19, 2020 (with Rolf Rabenseifner [HLRS], Irene Reichl, and Claudia Blaas-Schenner [TU Wien]).
- Two-day tutorial at High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany, January 27–28, 2020 (with Rolf Rabenseifner [HLRS], Irene Reichl, and Claudia Blaas-Schenner [TU Wien]).
Performance analysis on GPUs with NVIDIA tools
VI-HPS Tuning Workshop
This workshop, organized by VI-HPS and the Erlangen National High Performance Computing Center, will give an overview of the VI-HPS programming tools suite, explain the functionality of individual tools and how to use them effectively, and offer hands-on experience and expert assistance in using the tools.
On completion, participants should be familiar with common performance analysis and diagnosis techniques and with how they can be employed in practice (on a range of HPC systems). Those who prepared their own application test cases will have been coached in the tuning of their measurement and analysis and will have received optimization suggestions.
LIKWID

LIKWID stands for “Like I Knew What I’m Doing.” It is an easy-to-use yet powerful command-line performance tool suite for the GNU/Linux operating system. While the focus of LIKWID is on x86 processors, some of the tools are portable and not limited to any specific architecture. With the upcoming release, LIKWID has also been ported to the ARMv7/v8 and POWER8/9 architectures as well as to Nvidia GPUs.
Introduction to C++ for beginners
This course focuses on the essential language features and the syntax of C++. Additionally, it introduces many C++ software development principles, concepts, idioms, and best practices, which enable programmers to create professional, high-quality code from the very beginning.
The course aims at understanding the core of the C++ programming language, teaches guidelines to develop mature, robust, maintainable, and efficient C++ software, and helps to avoid the most common pitfalls. Attendees should have a grasp of general programming (in any language).
- Five-day online course at NHR@FAU, October 10–14, 2022.
Modern C++ Software Design
This advanced C++ training is a course on software development with the C++ programming language. The focus of the training is on the essential C++ software development principles, concepts, idioms, and best practices, which enable programmers to create professional, high-quality code.
The course will give insight into the different aspects of C++ (object-oriented programming, functional programming, generic programming) and will teach guidelines to develop mature, robust, maintainable, and efficient C++ code.
- Three-day online course at NHR@FAU, October 5–7, 2022.