Tutorials & Courses
Apart from regular teaching activities, we are known for our “Node-Level Performance Engineering” tutorials and courses, which we provide regularly at the IEEE/ACM Supercomputing conference series, at the German Gauss Centre for Supercomputing (GCS) sites at Garching (LRZ) and Stuttgart (HLRS), and at the Vienna Scientific Cluster (VSC) at TU Wien. At these sites, we are also actively involved in “MPI+X” hybrid programming tutorials in close collaboration with lecturers from HLRS and VSC.
Upon request, we also offer our tutorial and course program to other interested computing centers, research institutions, and industry. Beyond these signature events, we offer courses on parallel programming, GPU programming, code optimization, modern C++, and more.
To see upcoming dates for our courses, please click on the name of the course you are interested in.
If you want to participate in one of our courses, please find the link to the registration in the respective accordion section.
Overview of the entire course program
HPC Introduction
Parallel Programming of High-Performance Systems
This online course, a collaboration of Erlangen National High Performance Computing Center (NHR@FAU) and Leibniz Supercomputing Center (LRZ), is targeted at students and scientists with an interest in programming modern HPC hardware, specifically the large-scale parallel computing systems available in Jülich, Stuttgart, and Munich, but also smaller clusters in Tier-2/3 centers and departments.
Past
- Three-day online course (PPHPS23), March 7–9, 2023 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff).
- Three-day online course (PPHPS22), March 8–10, 2022 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff).
- Three-day online course (PPHPS21), April 13–15, 2021 (together with LRZ staff).
- Annual course at RRZE, March 9–13, 2020 (together with LRZ staff).
Introduction to parallel programming with OpenMP
Upcoming
- Introduction to OpenMP: part 2 (online), September 27, 2023
Past
- Introduction to OpenMP: part 1 (online), September 20, 2023
- Introduction to OpenMP: part 2 (online), March 28, 2023
- Introduction to OpenMP: part 1 (online), March 21, 2023
- Full-day online course, October 4, 2022.
Fundamentals of Accelerated Computing with CUDA C/C++
This course covers the fundamental tools and techniques for accelerating C/C++ applications to run on massively parallel GPUs with CUDA. You’ll learn how to write code, configure code parallelization with CUDA, optimize memory migration between the CPU and GPU accelerator, and implement the workflow that you’ve learned on a new task – accelerating a fully functional, but CPU-only, particle simulator for observable massive performance gains. At the end of the workshop, you’ll have access to additional resources to create new GPU-accelerated applications on your own.
Past
- Full-day in-person course, July 28, 2023
- Full-day in-person course, March 23, 2023
- Full-day online course, November 28, 2022 (in collaboration with LRZ Garching).
- Online course over two half-days, April 21–22, 2022.
Fundamentals of Accelerated Computing with CUDA Python
This course conveys the fundamental tools and techniques for running GPU-accelerated Python applications using CUDA GPUs and the Numba compiler.
Past
- Full-day on-site course, September 18, 2023
- Full-day in-person course, March 16, 2023
- Online course over two half-days, September 22–23, 2022
- Online course over two half-days, August 2–3, 2022
Python for HPC
The Python programming language has become very popular in scientific computing for various reasons. Users not only implement prototypes for numerical experiments on small scales, but also develop parallel production codes, thereby partly replacing compiled languages such as C, C++, and Fortran. However, when following this approach it is crucial to pay special attention to performance. This course teaches approaches to using Python efficiently and sensibly in an HPC environment. The first lecture gives a whirlwind tour of the Python programming language and the standard library. The subsequent lectures focus on performance-related topics such as NumPy, Cython, Numba, compiled C and Fortran extensions, profiling of Python and compiled code, parallelism using multiprocessing and mpi4py, parallel frameworks such as Dask, and efficient I/O with HDF5. In addition, we cover software engineering topics such as packaging, publishing, testing, and the semi-automated generation of documentation. Finally, basic visualization tasks using matplotlib and similar packages are discussed.
Past
- Three-day online course (conducted by MPCDF), July 25–27, 2023
Advanced HPC
Core-Level Performance Engineering
This course covers performance engineering approaches on the CPU core level.
While many developers put a lot of effort into optimizing parallelism, they often lose sight of the importance of efficient serial code as the first step. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys a thorough understanding of the interactions between software and hardware on the level of a single CPU core and the lowest memory hierarchy level, the L1 cache. It covers general computer architecture for x86 and ARM processors, an introduction to (AT&T and AArch64) assembly code, and performance analysis and engineering using the Open Source Architecture Code Analyzer (OSACA) tool in combination with the Compiler Explorer.
Upcoming
- Full-day on-site tutorial at PACT 2023, the 32nd International Conference on Parallel Architectures and Compilation Techniques, Vienna, Austria, October 21–25, 2023.
- Full-day online tutorial at NHR@FAU, October 12, 2023
Past
- Full-day tutorial at ICPE 2023, the 14th ACM/SPEC International Conference on Performance Engineering, April 15–19, 2023, Coimbra, Portugal.
Node-Level Performance Engineering
This course covers performance engineering approaches on the compute node level.
Even application developers who are fluent in OpenMP and MPI often lack a good grasp of how much performance could at best be achieved by their code. This is because parallelism takes us only half the way to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the required knowledge to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code gets executed that does the actual computational work. We introduce the basic architectural features and bottlenecks of modern processors and compute nodes. Pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, etc., are covered. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in due detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations by a scientific process.
Upcoming
- Three-day online tutorial at the Leibniz Supercomputing Center (LRZ), December 4–6, 2023.
- Full-day tutorial at Supercomputing 2023 (SC23), Nov 12–17, 2023, Denver, CO (with Gerhard Wellein and Thomas Gruber).
- Three-day on-site tutorial at NHR@FAU, October 4–6, 2023.
Past
- Half-day online tutorial at ISC High Performance 2023, May 11, 2023
- Four-day online tutorial at the High Performance Computing Center Stuttgart (HLRS), June 27–30, 2023 (with ZIH staff).
- Three-day online PRACE tutorial at the Leibniz Supercomputing Center (LRZ), December 5–7, 2022.
- Full-day tutorial at Supercomputing 2022 (SC22), Nov 13–18, 2022, Dallas, TX.
- Four-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), June 28–July 1, 2022 (with ZIH staff)
- Three-day online PRACE tutorial at the Leibniz Supercomputing Center (LRZ), December 1–3, 2021.
- Full-day tutorial at Supercomputing 2021 (SC21), Nov 14–19, 2021, St Louis, MO.
- Three-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), July 12–14, 2021.
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, March 10–12, 2021.
- Full-day tutorial at the virtual Supercomputing 2020 (SC20), Nov 9–20, 2020, Atlanta, GA.
- Three-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), June 29–July 1, 2020.
- Three-day short course at the University of Cologne, January 20–22, 2020.
Introduction to Hybrid Programming in HPC
Most HPC systems are clusters of shared memory nodes. To use such systems efficiently, both memory consumption and communication time have to be optimized. Therefore, hybrid programming may combine the distributed memory parallelization on the node interconnect (e.g., with MPI) with the shared memory parallelization inside of each node (e.g., with OpenMP or MPI-3.0 shared memory). This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket-multi-core systems in highly parallel environments are given special consideration. MPI-3.0 has introduced a new shared memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and pure MPI. Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming.
Hands-on sessions are included on all days. Tools for hybrid programming such as thread/process placement support and performance analysis are presented in a “how-to” section. This course provides scientific training in Computational Science and, in addition, fosters scientific exchange among the participants.
Upcoming
- Three-day online tutorial at High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany, January 23–25, 2024 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
Past
- Three-day online PRACE tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, December 12–14, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online PRACE tutorial at LRZ Garching, Germany, June 22–24, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online PRACE tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, April 5–7, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, June 15–17, 2021 (with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, June 17–19, 2020 (with Rolf Rabenseifner [HLRS], Irene Reichl, and Claudia Blaas-Schenner [TU Wien]).
- Two-day tutorial at High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany, January 27–28, 2020 (with Rolf Rabenseifner [HLRS], Irene Reichl, and Claudia Blaas-Schenner [TU Wien]).
HPC Tools
Performance Analysis on GPUs with NVIDIA tools
Porting code to the GPU promises large speedups, but there can be pitfalls on the way to realizing its potential. This course introduces NVIDIA’s profilers as tools to spot common performance bugs that arise when porting code to the GPU.
A practical demonstration introduces the basics of GPU performance analysis. NVIDIA’s profiling tool Nsight Systems is used to analyze GPU utilization and to spot performance anomalies. The complementary tool Nsight Compute is then used to gain more insight into the performance of individual GPU kernels. The performance analysis is guided by simple, resource-based performance models, which enable the developer to judge how far the performance is from the “target.”
Attendees will be able to follow along with the demos and conduct their own experiments on the NHR@FAU GPU cluster.
Upcoming
- Performance Analysis on GPUs with NVIDIA tools, online, October 10, 2023
Past
- GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS), Atlanta, GA, July 9–14, 2023
- Performance Analysis on GPUs with NVIDIA tools, half-day online course at NHR@FAU, April 4, 2023
- Performance Analysis on GPUs with NVIDIA tools, half-day online course at NHR@FAU, September 29, 2022
- GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS), online, June 19–24, 2022
- GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS), online, July 18–30, 2021
VI-HPS Tuning Workshop
This workshop, organized by VI-HPS and Erlangen National High Performance Computing Center, gives an overview of the VI-HPS programming tools suite, explains the functionality of the individual tools and how to use them effectively, and offers hands-on experience and expert assistance in using the tools.
On completion, participants should be familiar with common performance analysis and diagnosis techniques and how they can be employed in practice (on a range of HPC systems). Those who prepared their own application test cases will have been coached in tuning their measurement and analysis, and will have been provided with optimization suggestions.
Past
- Three-day online workshop at NHR@FAU, March 1–3, 2021.
- Three-day online workshop at CSC Frankfurt, December 7–11, 2020.
- Three-day online workshop at CINECA, Italy, September 30–October 2, 2020.
LIKWID
LIKWID stands for “Like I Knew What I’m Doing.” It is an easy-to-use yet powerful command line performance tool suite for the GNU/Linux operating system. While the focus of LIKWID is on x86 processors, some of the tools are portable and not limited to any specific architecture. For the upcoming release, LIKWID has been ported to ARMv7/v8 and POWER8/9 architectures as well as to NVIDIA GPU co-processors.
Past
- Introduction to the LIKWID Tool Suite. Full-day online tutorial, July 24, 2023.
- LIKWID, OSACA, and Sparse MVM on A64FX. Webinar for Stony Brook University, July 27, 2021. Video recording
- Webinar: Using the LIKWID and OSACA tools on A64FX. June 2, 2021. Video
Performance Evaluation
Past
- 2021 Code Performance Series: From analysis to insight. Online session on “Single-Node optimization,” July 15, 2021. Video recording
- EXA2PRO-EoCoE joint workshop, afternoon online session “Performance Engineering and code generation techniques”, February 23, 2021. Slides
Programming
Introduction to C++ for beginners
The focus of this course is on the introduction of the essential language features and the syntax of C++. Additionally, it introduces many C++ software development principles, concepts, idioms, and best practices, which enable programmers to create professional, high-quality code from the very beginning.
The course aims at understanding the core of the C++ programming language, teaches guidelines to develop mature, robust, maintainable, and efficient C++ software, and helps to avoid the most common pitfalls. Attendees should have a grasp of general programming (in any language).
Upcoming
- Six-day online course at NHR@FAU, September 14/15, 21/22, and 28/29, 2023.
Past
- Five-day online course at NHR@FAU, October 10–14, 2022.
Modern C++ Software Design
This advanced C++ training is a course on software development with the C++ programming language. The training focuses on the essential C++ software development principles, concepts, idioms, and best practices, which enable programmers to create professional, high-quality code.
The course will give insight into the different aspects of C++ (object-oriented programming, functional programming, generic programming) and will teach guidelines to develop mature, robust, maintainable, and efficient C++ code.
Upcoming
- Three-day online course at NHR@FAU, October 11–13, 2023.
Past
- Three-day online course at NHR@FAU, October 5–7, 2022.
Tutorials on Molecular Dynamics Simulations
GROMACS Course
This course provides an introduction to the molecular dynamics engine GROMACS, including fundamental commands and applications. Over five days, the participants will learn how to prepare and run simulations of biomolecular systems (e.g., membranes and proteins) at an atomistic and coarse-grained level of resolution. Post-processing and analysis of simulation trajectories are a large part of the tutorial.
The course is usually embedded in the Bachelor programs of Biology and Integrated Life Sciences. There are five places available for people from NHR. The course will be held in person and takes place in the CIP of the Biology Department.
Interested candidates should send a short note about their background and motivation to rainer.boeckmann@fau.de.
Upcoming
- Three-day online course at the Biology Department of FAU, October 10–12, 2023
Past
- Three-day online course at the Biology Department of FAU, December 12–16, 2022.