Tutorials & Courses
Apart from regular teaching activities, we are known for our “Node-Level Performance Engineering” tutorials and courses, which we provide regularly at the IEEE/ACM Supercomputing conference series, at the German Gauss Centre for Supercomputing (GCS) sites at Garching (LRZ) and Stuttgart (HLRS), and at the Vienna Scientific Cluster (VSC) at TU Wien. At these sites, we are also actively involved in “MPI+X” hybrid programming tutorials in close collaboration with lecturers from HLRS and VSC.
Upon request, we also offer our tutorial and course program to other interested computing centers, research institutions, and industry. Beyond these signature events, we offer courses on parallel programming, GPU programming, code optimization, modern C++, and more.
To see upcoming dates for our courses, please click on the name of the course you are interested in.
If you want to participate in one of our courses, please find the link to the registration in the respective accordion section.
Overview of the entire course program
NHR@FAU course program for Spring 2024 at a glance.
HPC Introduction
Parallel Programming of High-Performance Systems (PPHPS)
This online course, a collaboration between the Erlangen National High Performance Computing Center (NHR@FAU) and the Leibniz Supercomputing Center (LRZ), is targeted at students and scientists with an interest in programming modern HPC hardware, specifically the large-scale parallel computing systems available in Jülich, Stuttgart, and Munich, but also smaller clusters in Tier-2/3 centers and departments.
Past
- PPHPS24, three-day on-site course at NHR@FAU, February 20-22, 2024 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff)
- Three-day online course (PPHPS23), March 7-9, 2023 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff)
- Three-day online course (PPHPS22), March 8–10, 2022 (Ayesha Afzal, Markus Wittmann, Georg Hager together with LRZ staff)
- Three-day online course (PPHPS21), April 13–15, 2021 (together with LRZ staff)
- Annual course at RRZE, March 9–13, 2020 (together with LRZ staff)
Introduction to parallel programming with MPI
This course gives an introduction to the Message Passing Interface (MPI), the dominant distributed-memory programming paradigm in high-performance computing.
Upcoming
- Introduction to MPI, two-day online course at NHR@FAU, April 11-12, 2024.
Introduction to parallel programming with OpenMP
OpenMP is a standard for parallelizing shared-memory C/C++ and Fortran applications. It is supported by major compilers and provides a simple approach to thread-based parallelization with a low barrier to entry. This course gives an introduction to the basic workings and constructs used for parallelizing applications with OpenMP.
Past
- Introduction to OpenMP: part 2 (online), March 12, 2024
- Introduction to OpenMP: part 1 (online), March 5, 2024
- Introduction to OpenMP: part 2 (online), September 27, 2023
- Introduction to OpenMP: part 1 (online), September 20, 2023
- Introduction to OpenMP: part 2 (online), March 28, 2023
- Introduction to OpenMP: part 1 (online), March 21, 2023
- Full-day online course, October 4, 2022.
Fundamentals of Accelerated Computing with CUDA C/C++
At the conclusion of this workshop, participants will possess a robust understanding of the essential tools and techniques required for GPU-accelerating C/C++ applications with CUDA. Key takeaways include the ability to write GPU-executable code, harness data parallelism, optimize memory migration with asynchronous prefetching, employ command-line and visual profilers for guidance, utilize concurrent streams for enhanced parallelism, and apply a profile-driven approach to develop or refactor CUDA C/C++ applications for optimal performance.
Past
- Online course over two half days, March 4-5, 2024 (in collaboration with EUMaster4HPC)
- Full-day online course, February 29, 2024
- Full-day in-person course, July 28, 2023
- Full-day in-person course, March 23, 2023
- Online course over two half days, March 8-9, 2023 (in collaboration with EUMaster4HPC)
- Online course over two half days, December 9 & 16, 2022 (in collaboration with EUMaster4HPC)
- Full-day online course, November 28, 2022 (in collaboration with LRZ Garching).
- Online course over two half days, April 21–22, 2022.
Fundamentals of Accelerated Computing with CUDA Python
By the end of this workshop, participants will gain proficiency in fundamental tools and techniques for GPU-accelerated Python applications using CUDA and Numba. Highlights include the ability to GPU-accelerate NumPy ufuncs, configure code parallelization via the CUDA thread hierarchy, implement custom device kernels for optimal performance and flexibility, and employ memory coalescing and on-device shared memory to enhance the performance of CUDA kernels.
Upcoming
- Accelerated Computing with CUDA Python, full-day online course, March 14, 2024
Past
- Online course over two half days, March 6-7, 2024 (in collaboration with EUMaster4HPC)
- Full-day on-site course, September 18, 2023
- Full-day in-person course, March 16, 2023
- Online course over two half days, September 22–23, 2022
- Online course over two half days, August 2–3, 2022
Python for HPC
The Python programming language has become very popular in scientific computing for various reasons. Users not only implement prototypes for numerical experiments on small scales, but also develop parallel production codes, thereby partly replacing compiled languages such as C, C++, and Fortran. However, when following this approach it is crucial to pay special attention to performance. This course teaches approaches to using Python efficiently and sensibly in an HPC environment. The first lecture gives a whirlwind tour through the Python programming language and the standard library. The subsequent lectures focus strongly on performance-related topics such as NumPy, Cython, Numba, compiled C and Fortran extensions, profiling of Python and compiled code, parallelism using multiprocessing and mpi4py, parallel frameworks such as Dask, and efficient I/O with HDF5. In addition, we will cover topics more closely related to software engineering, such as packaging, publishing, testing, and the semi-automated generation of documentation. Finally, basic visualization tasks using matplotlib and similar packages are discussed.
Past
- Three-day online course (conducted by MPCDF), July 25-27, 2023
Advanced HPC
Core-Level Performance Engineering
This course covers performance engineering approaches on the CPU core level.
While many developers put a lot of effort into optimizing parallelism, they often lose sight of the importance of efficient serial code. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys a thorough understanding of the interactions between software and hardware on the level of a single CPU core and the lowest memory hierarchy level, the L1 cache. It covers general computer architecture for x86 and ARM processors, an introduction to (AT&T and AArch64) assembly code, and performance analysis and engineering using the Open Source Architecture Code Analyzer (OSACA) tool in combination with the Compiler Explorer.
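The remark that slow serial code scales well can be illustrated with a toy runtime model. The following is only a sketch under stated assumptions, not course material: the runtime on n workers is assumed to be the perfectly divisible work time divided by n, plus a fixed overhead (for example, communication or synchronization) that does not shrink with n. The function names and all numbers are hypothetical.

```python
def runtime(t_work, t_overhead, n_workers):
    """Toy model (assumption): work is perfectly divisible, overhead is fixed."""
    return t_work / n_workers + t_overhead


def speedup(t_work, t_overhead, n_workers):
    """Speedup of n_workers over a single worker in the toy model."""
    return runtime(t_work, t_overhead, 1) / runtime(t_work, t_overhead, n_workers)


# Hypothetical numbers: the "slow" code needs 10x more time for the same work.
N = 16
OVERHEAD = 1.0
fast_speedup = speedup(t_work=10.0, t_overhead=OVERHEAD, n_workers=N)
slow_speedup = speedup(t_work=100.0, t_overhead=OVERHEAD, n_workers=N)
fast_runtime = runtime(10.0, OVERHEAD, N)
slow_runtime = runtime(100.0, OVERHEAD, N)

print(f"fast code: speedup {fast_speedup:.2f}, runtime {fast_runtime:.3f}")
print(f"slow code: speedup {slow_speedup:.2f}, runtime {slow_runtime:.3f}")
```

In this model the slow code shows the better speedup on 16 workers, yet it still takes longer in absolute terms, which is why speedup alone can hide wasted resources.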
Past
- Full-day on-site tutorial at PACT 2023, the 32nd International Conference on Parallel Architectures and Compilation Techniques, Vienna, Austria, October 21-25, 2023.
- Full-day online tutorial at NHR@FAU, October 12, 2023
- Full-day tutorial at ICPE 2023, the 14th ACM/SPEC International Conference on Performance Engineering, April 15-19, 2023, Coimbra, Portugal.
Node-Level Performance Engineering
This course covers performance engineering approaches on the compute node level.
Even application developers who are fluent in OpenMP and MPI often lack a good grasp of how much performance could at best be achieved by their code. This is because parallelism takes us only halfway to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the required knowledge to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed. We introduce the basic architectural features and bottlenecks of modern processors and compute nodes. Pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, etc., are covered. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in due detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations with a scientific process.
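For readers unfamiliar with the Roofline model mentioned above, its basic form bounds the attainable performance by the minimum of the peak performance and the product of computational intensity and memory bandwidth. The sketch below is a minimal illustration of that formula only. The machine parameters are hypothetical, and the vector triad is chosen here as an assumed example with write-allocate traffic ignored; it is not taken from the course material.

```python
def roofline(peak_flops, mem_bandwidth, intensity):
    """Basic Roofline bound in flop/s: min(P_peak, I * b_S).

    peak_flops    -- peak performance in flop/s
    mem_bandwidth -- memory bandwidth in byte/s
    intensity     -- computational intensity in flop/byte
    """
    return min(peak_flops, intensity * mem_bandwidth)


# Hypothetical machine (assumption): 1000 Gflop/s peak, 100 GB/s memory bandwidth.
PEAK = 1000e9
BANDWIDTH = 100e9

# Double-precision vector triad a[i] = b[i] + s * c[i]:
# 2 flops per iteration, 3 * 8 bytes of memory traffic (write-allocate ignored).
triad_intensity = 2 / 24

triad_bound = roofline(PEAK, BANDWIDTH, triad_intensity)
print(f"triad bound: {triad_bound / 1e9:.2f} Gflop/s")

# A loop with very high intensity is limited by the peak performance instead.
compute_bound = roofline(PEAK, BANDWIDTH, intensity=100.0)
print(f"high-intensity bound: {compute_bound / 1e9:.2f} Gflop/s")
```

With these assumed numbers the triad is memory bound at a small fraction of the peak, which is the kind of statement the model is meant to support before any optimization is attempted.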
Upcoming
- Four-day online tutorial at the High Performance Computing Center Stuttgart (HLRS), June 18–21, 2024 (with ZIH staff).
- Three-day online tutorial at the Leibniz Supercomputing Center (LRZ), December 3–5, 2024.
Past
- Three-day online tutorial at the Leibniz Supercomputing Center (LRZ), December 4–6, 2023.
- Full-day tutorial at Supercomputing 2023 (SC23), Nov 12–17, 2023, Denver, CO (with Gerhard Wellein and Thomas Gruber).
- Three-day on-site tutorial at NHR@FAU, October 4-6, 2023.
- Half-day online tutorial at ISC High Performance 2023, May 11, 2023.
- Four-day online tutorial at the High Performance Computing Center Stuttgart (HLRS), June 27–30, 2023 (with ZIH staff).
- Three-day online PRACE tutorial at the Leibniz Supercomputing Center (LRZ), December 5–7, 2022.
- Full-day tutorial at Supercomputing 2022 (SC22), Nov 13–18, 2022, Dallas, TX.
- Four-day online PRACE tutorial at the High Performance Computing Center Stuttgart (HLRS), June 28–July 1, 2022 (with ZIH staff).
Introduction to Hybrid Programming in HPC
Most HPC systems are clusters of shared-memory nodes. To use such systems efficiently, both memory consumption and communication time have to be optimized. Therefore, hybrid programming may combine the distributed-memory parallelization on the node interconnect (e.g., with MPI) with the shared-memory parallelization inside each node (e.g., with OpenMP or MPI-3.0 shared memory). This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket, multi-core systems in highly parallel environments are given special consideration. MPI-3.0 has introduced a new shared-memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and pure MPI. Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming.
Hands-on sessions are included on all days. Tools for hybrid programming such as thread/process placement support and performance analysis are presented in a “how-to” section. This course provides scientific training in computational science and, in addition, fosters scientific exchange among the participants.
Past
- Three-day hybrid tutorial at High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany, January 23-25, 2024 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online PRACE tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, December 12-14, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online PRACE tutorial at LRZ Garching, Germany, June 22-24, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online PRACE tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, April 5–7, 2022 (Georg Hager, with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, June 15–17, 2021 (with Rolf Rabenseifner [HLRS] and Claudia Blaas-Schenner [TU Wien]).
- Three-day online tutorial at Vienna Scientific Cluster (VSC), TU Wien, Austria, June 17–19, 2020 (with Rolf Rabenseifner [HLRS], Irene Reichl, and Claudia Blaas-Schenner [TU Wien]).
- Two-day tutorial at High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany, January 27–28, 2020 (with Rolf Rabenseifner [HLRS], Irene Reichl, and Claudia Blaas-Schenner [TU Wien]).
Multi-GPU Programming with CUDA C++
This two-day tutorial covers different approaches to extending single-GPU programs to utilize multiple GPUs within a single compute node as well as across multiple compute nodes.
Part one focuses on the single-node use case, acceleration techniques such as overlapping computation and CPU-GPU data transfers, and using Nsight Systems to analyze execution behavior and performance. Part two builds on that methodology by introducing techniques for multiple nodes as well as more advanced application examples.
Upcoming
- Part 1: Accelerating CUDA C++ Applications with Multiple GPUs, full-day online course, April 5, 2024
- Part 2: Scaling CUDA C++ Applications to Multiple Nodes, full-day online course, April 10, 2024
Past
- Two-day online course, February 8-9, 2024
HPC Tools
Performance Analysis on GPUs with NVIDIA tools
Porting code to the GPU promises large speedups, but there can be pitfalls on the way to realizing its potential. This course will introduce NVIDIA’s profiler as a tool to spot common performance bugs that arise when porting code to the GPU.
A practical demonstration introduces the basics of GPU performance analysis. NVIDIA’s profiling tool Nsight Systems is used to analyze GPU utilization and to spot performance anomalies. The complementary tool Nsight Compute is then used to gain more insight into the performance of individual GPU kernels. The performance analysis will be guided by simple, resource-based performance models, which will enable the developer to judge how far the performance is from the “target.”
Attendees will be able to follow along with the demos and conduct their own experiments on the NHR@FAU GPU cluster.
Upcoming
- Performance Analysis on GPUs with NVIDIA tools, online course, March 19, 2024.
Past
- Performance Analysis on GPUs with NVIDIA tools, online course, October 10, 2023.
- GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS), Atlanta, GA, July 9–14, 2023.
- Performance Analysis on GPUs with NVIDIA tools, half-day online course, April 4, 2023.
- Performance Analysis on GPUs with NVIDIA tools, half-day online course, September 29, 2022.
- GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS), online, June 19–24, 2022.
- GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS), online, July 18–30, 2021.
VI-HPS Tuning Workshop
This workshop, organized by VI-HPS and the Erlangen National High Performance Computing Center, will give an overview of the VI-HPS programming tools suite, explain the functionality of individual tools and how to use them effectively, and offer hands-on experience and expert assistance in using the tools.
On completion, participants should be familiar with common performance analysis and diagnosis techniques and how they can be employed in practice (on a range of HPC systems). Those who prepared their own application test cases will have been coached in the tuning of their measurement and analysis, and provided with optimization suggestions.
Past
- Three-day online workshop at NHR@FAU, March 1–3, 2021.
- Three-day online workshop at CSC Frankfurt, December 7–11, 2020.
- Three-day online workshop at CINECA, Italy, September 30–October 2, 2020.
LIKWID
LIKWID stands for “Like I Knew What I’m Doing.” It is an easy-to-use yet powerful command-line performance tool suite for the GNU/Linux operating system. While the focus of LIKWID is on x86 processors, some of the tools are portable and not limited to any specific architecture. For the upcoming release, LIKWID has been ported to ARMv7/v8 and POWER8/9 architectures as well as to NVIDIA GPU coprocessors.
Upcoming
- Introduction to the LIKWID Tool Suite. Full-day online tutorial, July 23, 2024.
Past
- Introduction to the LIKWID Tool Suite. Full-day online tutorial, July 24, 2023.
- LIKWID, OSACA, and Sparse MVM on A64FX. Webinar for Stony Brook University, July 27, 2021 (video recording).
- Webinar: Using the LIKWID and OSACA tools on A64FX. June 2, 2021 (video).
Performance Evaluation
Past
- 2021 Code Performance Series: From analysis to insight. Online session on “Single-Node optimization,” July 15, 2021 (video recording).
- EXA2PRO-EoCoE joint workshop, afternoon online session “Performance Engineering and code generation techniques”, February 23, 2021 (slides).
Programming
C++ for beginners
The focus of this course is on the introduction of the essential language features and the syntax of C++. Additionally, it introduces many C++ software development principles, concepts, idioms, and best practices, which enable programmers to create professional, high-quality code from the very beginning.
The course aims at understanding the core of the C++ programming language, teaches guidelines to develop mature, robust, maintainable, and efficient C++ software, and helps to avoid the most common pitfalls. Attendees should have a grasp of general programming (in any language).
Upcoming
- Six-day online course at NHR@FAU, September 12/13, 19/20, and 26/27, 2024.
Past
- Six-day online course at NHR@FAU, September 14/15, 21/22, and 28/29, 2023.
- Five-day online course at NHR@FAU, October 10–14, 2022.
Modern C++ Software Design
This advanced C++ training is a course on software development with the C++ programming language. The training focuses on the essential C++ software development principles, concepts, idioms, and best practices, which enable programmers to create professional, high-quality code.
The course will give insight into the different aspects of C++ (object-oriented programming, functional programming, generic programming) and will teach guidelines to develop mature, robust, maintainable, and efficient C++ code.
Upcoming
- Three-day online course at NHR@FAU, September 30-October 2, 2024.
Past
- Three-day online course at NHR@FAU, October 11-13, 2023.
- Three-day online course at NHR@FAU, October 5–7, 2022.
Tutorials on Molecular Dynamics Simulations
GROMACS Course
This course provides an introduction to the molecular dynamics engine GROMACS, including fundamental commands and applications. Over five days, the participants will learn how to prepare and run simulations of biomolecular systems (e.g., membranes and proteins) at atomistic and coarse-grained levels of resolution. Post-processing and analysis of simulation trajectories are a large part of the tutorial.
The course is usually embedded in the Bachelor programs of Biology and Integrated Life Sciences. There are five places available for people from NHR. The course will be held in person and takes place in the CIP of the Biology Department.
Interested candidates should send a short note about their background and motivation to rainer.boeckmann@fau.de.
Past
- Three-day online course at the Biology Department of FAU, October 10–12, 2023.
- Three-day online course at the Biology Department of FAU, December 12–16, 2022.