Tutorials & Courses

Erlangen National High Performance Computing Center (NHR@FAU) offers a wide range of HPC-related courses, covering topics such as modern C++, parallel programming, GPU programming, performance engineering, and domain-specific applications like molecular dynamics simulations.

We regularly present our flagship events Core-Level Performance Engineering and Node-Level Performance Engineering at leading conferences such as SC and ISC, as well as at high-performance computing centers. Many of our courses are conducted in collaboration with educators from Leibniz Supercomputing Centre (LRZ), High Performance Computing Center Stuttgart (HLRS), Vienna Scientific Cluster (VSC) at TU Wien, and NHR@TUD/ZIH at TU Dresden. Several of our GPU programming courses are offered in partnership with the Nvidia Deep Learning Institute (DLI), and we regularly contribute workshops to the European Master for High Performance Computing (EUMaster4HPC) program.

Upon request, we also conduct customized course sessions for interested computing centers, research institutions, and industry partners. Feel free to reach out to our head of training, Sebastian Kuckuk, or send a general inquiry to hpc-support@fau.de.

New users of the NHR@FAU computing resources are also encouraged to attend our beginner’s introduction “HPC in a nutshell”, which is offered online each month as a one-hour general introduction plus an additional one-hour introduction for AI users.

If you are an FAU student, we also encourage you to explore the curricular courses offered by the Professorship of High Performance Computing.

Please register for each course individually using the links provided.

Course Program Overview

Frequently Asked Questions (FAQ)

Check out our FAQ section here.

Performance Engineering

This course covers performance engineering approaches on the CPU core level. While many developers put a lot of effort into optimizing parallelism, they often lose sight of the importance of efficient serial code. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys a thorough understanding of the interactions between software and hardware on the level of a single CPU core and the lowest memory hierarchy level, the L1 cache. It covers general computer architecture for x86 and ARM processors, an introduction to (AT&T and AArch64) assembly code, and performance analysis and engineering using the Open Source Architecture Code Analyzer (OSACA) in combination with the Compiler Explorer.
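
As a toy illustration of the kind of in-core throughput analysis OSACA performs, the sketch below predicts the cycles per loop iteration from a port model. All port names, instruction assignments, and cycle counts are made up for illustration and are not taken from any real CPU or from OSACA itself:

```python
# Toy port-throughput model in the spirit of OSACA: each instruction of a
# loop body is assigned to the execution ports that can issue it, and the
# predicted throughput is set by the load on the busiest port.
# All port names and cycle counts below are hypothetical.

from collections import defaultdict

def throughput_bound(instructions):
    """Predict cycles per loop iteration as the load of the busiest port.

    `instructions` maps an instruction name to a list of (port, cycles)
    pairs, assuming uops are spread evenly over the eligible ports.
    """
    load = defaultdict(float)
    for ports in instructions.values():
        share = 1.0 / len(ports)          # even split across eligible ports
        for port, cycles in ports:
            load[port] += share * cycles
    return max(load.values())

# Hypothetical loop body: one load, one FMA, one store per iteration.
loop = {
    "load":  [("P2", 1.0), ("P3", 1.0)],  # can issue on port 2 or 3
    "fma":   [("P0", 1.0), ("P1", 1.0)],  # can issue on port 0 or 1
    "store": [("P4", 1.0)],               # only port 4
}
print(throughput_bound(loop))  # the store port is the bottleneck -> 1.0
```

The real tool works on actual assembly and machine models, but the underlying idea, finding the most heavily loaded execution resource, is the same.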

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, Jan 31-Feb 4: Full-Day Training in Sydney, Australia
    Event: CGO26
  • 2025, Nov 16-21: Half-Day Training in St. Louis, MO, USA
    Event: SC25
  • 2025, Oct 6: Full-Day Online Training
  • 2025, Jun 10-13: Half-Day Training in Hamburg, Germany
    Event: ISC High Performance
  • 2024, Nov 17-22: Half-Day Training in Atlanta, GA, USA
    Event: SC24
  • 2024, Oct 8: Full-Day Training at NHR@FAU
  • 2024, Sep 8-11: Full-Day Training in Ostrava, Czech Republic
    Event: PPAM 2024
  • 2023, Oct 21-25: Full-Day Training in Vienna, Austria
    Event: PACT 2023
  • 2023, Oct 12: Full-Day Training at NHR@FAU
  • 2023, Apr 15-19: Full-Day Training in Coimbra, Portugal
    Event: ICPE 2023

This course covers performance engineering approaches on the compute node level. Even application developers who are fluent in OpenMP and MPI often lack a good grasp of how much performance could at best be achieved by their code. This is because parallelism takes us only halfway to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the required knowledge to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code gets executed that does the actual computational work. We introduce the basic architectural features and bottlenecks of modern processors and compute nodes. Pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, etc., are covered. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in due detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations with a scientific process.
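
The Roofline model mentioned above fits in a few lines: attainable performance is the minimum of peak compute performance and memory bandwidth times arithmetic intensity. The machine numbers below are illustrative, not measurements of any specific system:

```python
def roofline(peak_flops, mem_bw, intensity):
    """Attainable performance (flop/s) under the basic Roofline model:
    min(peak compute, memory bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bw * intensity)

# Illustrative machine: 2 Tflop/s peak, 200 GB/s memory bandwidth.
peak, bw = 2e12, 200e9
# A stream-like kernel with 0.1 flop/byte is memory bound (2e10 flop/s),
print(roofline(peak, bw, 0.1))
# while a dense kernel with 50 flop/byte hits the compute roof (2e12 flop/s).
print(roofline(peak, bw, 50.0))
```

Comparing a measured performance against this bound immediately tells you whether a kernel is limited by data transfer or by computation.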

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Upcoming Events

Past Events

  • 2026, Mar 17-18: Two-Day Training in Bonn, Germany
    Event: 2nd Anniversary of the Supercomputer Marvin
  • 2025, Dec 2-4: Three-Day Online @ LRZ Training
  • 2025, Sep 10-12: Three-Day Training at NHR@FAU
  • 2025, Jun 3-6: Four-Day Training at HLRS in Collaboration with ZIH (TU Dresden)
  • 2024, Dec 3-5: Three-Day Training at LRZ
  • 2024, Jun 18-21: Four-Day Training at HLRS in Collaboration with ZIH (TU Dresden)
  • 2023, Dec 4-6: Three-Day Training at LRZ
  • 2023, Nov 12-17: Full-Day Training in Denver, CO, USA
    Event: SC23
  • 2023, Oct 4-6: Three-Day Training at NHR@FAU
  • 2023, Jun 27-30: Four-Day Training at HLRS in Collaboration with ZIH (TU Dresden)
  • 2023, May 11: Half-Day Training in Hamburg, Germany
    Event: ISC High Performance
  • 2022, Dec 5-7: Three-Day Training at LRZ in Collaboration with PRACE
  • 2022, Nov 13-18: Full-Day Training in Dallas, TX, USA
    Event: SC22
  • 2022, Jun 28-Jul 1: Four-Day Training at HLRS in Collaboration with PRACE and ZIH (TU Dresden)

This tutorial covers code analysis, performance modeling, and optimization for linear solvers on CPU and GPU nodes. Performance engineering is often taught using simple loops as instructive examples for performance models and how they can guide optimization; however, full, preconditioned linear solvers comprise multiple back-to-back loops enclosed in an iteration scheme that is executed until convergence is achieved. Consequently, the concept of “optimal performance” has to account for both hardware resource efficiency and iterative solver convergence. We convey a performance engineering process that is geared towards linear iterative solvers. After introducing basic notions of hardware organization and storage for dense and sparse data structures, we show how the Roofline performance model can be applied to such solvers in predictive and diagnostic ways and how it can be used to assess the hardware efficiency of a solver, covering important corner cases such as pure memory boundedness. Then we advance to the structure of preconditioned solvers, using the Conjugate Gradient (CG) algorithm as a leading example. Hotspots and bottlenecks of the complete solver are identified, followed by the introduction of advanced performance optimization techniques like preconditioning and cache blocking.
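
To make the solver structure concrete, here is a bare-bones, unpreconditioned CG iteration in plain Python on a dense matrix; it is a didactic sketch only, while real solvers (and this course) use optimized sparse data structures:

```python
def cg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned Conjugate Gradient for a dense SPD matrix A
    (list of rows) and right-hand side b. Each iteration contains the
    back-to-back loops discussed above: one matrix-vector product,
    two dot products, and three vector updates."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual r = b - A x for x = 0
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:          # convergence check couples the loops
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

# Small SPD system: [[4, 1], [1, 3]] x = [1, 2]  ->  x = [1/11, 7/11]
x = cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x)
```

Even this sketch shows why "optimal performance" is subtle here: time to solution depends on the cost of each loop nest and on how many iterations the convergence check demands.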

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Upcoming Events

Past Events

  • 2025, Nov 16-21: Half-Day Training in St. Louis, MO, USA in Collaboration with TU Delft and TU Munich
    Event: SC25
  • 2025, Jun 10-13: Half-Day Training in Hamburg, Germany in Collaboration with TU Delft and TU Munich
    Event: ISC High Performance
  • 2024, Nov 17-22: Half-Day Training in Atlanta, GA, USA in Collaboration with TU Delft and TU Munich
    Event: SC24
  • 2024, May 12-16: Half-Day Training in Hamburg, Germany in Collaboration with TU Delft and TU Munich
    Event: ISC High Performance

LIKWID stands for “Like I Knew What I’m Doing.” It is an easy-to-use yet powerful command-line performance tool suite for the GNU/Linux operating system. While the focus of LIKWID is on x86 processors, some of the tools are portable and not limited to any specific architecture. For the upcoming release, LIKWID has also been ported to the ARMv7/v8 and POWER8/9 architectures and to Nvidia GPU co-processors.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2025, Jul 31: Full-Day Online Training
  • 2024, Jul 23: Full-Day Online Training
  • 2023, Jul 24: Full-Day Online Training

Artificial Intelligence (AI)

By the end of this workshop, participants will have experienced how deep learning works through hands-on exercises in computer vision and natural language processing. They will be able to train deep learning models from scratch and will know tools and tricks for achieving highly accurate results. They will also be able to leverage freely available, state-of-the-art pre-trained models to save time and get their deep learning applications up and running quickly.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Past Events

  • 2026, Mar 25: Full-Day Online Training
  • 2026, Jan 9: Full-Day Training at NHR@FAU

GPU Programming

By the end of the workshop, participants will understand the fundamental concepts and techniques for accelerating C++ code with CUDA. They will be able to write and compile code that runs on the GPU, optimize memory transfers between CPU and GPU, and leverage parallel algorithms to simplify adding GPU acceleration.

Additionally, participants will learn to implement custom parallel algorithms through CUDA kernels, utilize concurrent CUDA streams to overlap computation with memory operations, and identify the best opportunities to integrate CUDA acceleration into existing CPU-only applications.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Past Events

  • 2026, Apr 8-10: Three Half-Day Online Training
  • 2026, Jan 14: Full-Day Online @ NVIDIA Training
    Event: NVIDIA DLI Virtual Workshop Series for Higher Education
  • 2025, Oct 28: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
  • 2025, Sep 10-12: Three Half-Day Online Training
  • 2025, May 27: Full-Day Online @ NVIDIA Training
    Event: NVIDIA DLI Virtual Workshop Series for Higher Education

By the end of this workshop, participants will have a solid grasp of the essential tools and techniques for GPU-accelerating C/C++ applications using CUDA. They will be able to write GPU-executable code, leverage data parallelism, optimize memory transfers with asynchronous prefetching, and use both command-line and visual profilers to guide performance tuning. Additionally, they will know how to employ concurrent streams to increase parallelism and apply a profile-driven approach to develop or refactor CUDA applications for maximum performance.

Until March 2025, this course was offered as an official NVIDIA Deep Learning Institute (DLI) program. Its successor, Fundamentals of Accelerated Computing with Modern CUDA C++, is also offered by NHR@FAU. Due to the original course’s popularity, we continue to offer a custom, updated version that builds upon the original material.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Upcoming Events

  • 2026, Jun 10-11: Three Half-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
    Registration

Past Events

  • 2026, Mar 9: Full-Day Online @ TUD Training in Collaboration with NHR@TUD
    Event: From Zero to Multi-Node GPU Programming
  • 2025, Sep 8-9: Two Half-Day Online @ TUD Training in Collaboration with NHR@TUD
    Event: From Zero to Multi-Node GPU Programming
  • 2025, Mar 12: Full-Day Online @ TUD Training in Collaboration with NHR@TUD
    Event: From Zero to Multi-Node GPU Programming
  • 2025, Feb 4: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
  • 2024, Sep 18: Full-Day Online @ TUD Training in Collaboration with NHR@TUD
    Event: From Zero to Multi-Node GPU Programming
  • 2024, Mar 4-5: Two Half-Day Online Training in Collaboration with EUMaster4HPC
  • 2024, Feb 29: Full-Day Online Training
  • 2023, Jul 28: Full-Day Training at NHR@FAU
  • 2023, Mar 23: Full-Day Training at NHR@FAU
  • 2023, Mar 8-9: Two Half-Day Online Training in Collaboration with EUMaster4HPC
  • 2022, Dec 9 & 16: Two Half-Day Online Training in Collaboration with EUMaster4HPC
  • 2022, Nov 28: Full-Day Online @ LRZ Training in Collaboration with LRZ
  • 2022, Apr 21-22: Two Half-Day Online Training

By the end of this workshop, participants will be proficient in the core tools and techniques for GPU-accelerating Python applications using CUDA and Numba. They will learn how to accelerate NumPy ufuncs on the GPU, configure parallel execution using CUDA’s thread hierarchy, implement custom device kernels for greater performance and flexibility, and optimize memory access through coalescing and shared memory to enhance kernel efficiency.
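
The thread-hierarchy configuration mentioned above can be previewed without a GPU: every CUDA thread derives a unique global index from its block and thread coordinates (in Numba kernels this is what `cuda.grid(1)` returns). The simulation below is purely illustrative plain Python, not Numba code:

```python
# Plain-Python simulation of CUDA's 1D thread indexing: every "thread"
# derives a unique global index from its block and thread coordinates.

def global_thread_index(block_idx, block_dim, thread_idx):
    """Global index of a thread in a 1D grid."""
    return block_idx * block_dim + thread_idx

def simulate_kernel_launch(grid_dim, block_dim, n, kernel):
    """Run kernel(i) once per thread whose global index i is < n,
    mimicking the usual out-of-bounds guard in CUDA kernels."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = global_thread_index(block_idx, block_dim, thread_idx)
            if i < n:            # guard: the grid may overshoot the data size
                kernel(i)

# Element-wise doubling of a 10-element array with 4 blocks of 3 threads
# (12 threads in total, so 2 of them are masked out by the guard).
data = list(range(10))
out = [0] * 10
simulate_kernel_launch(4, 3, 10, lambda i: out.__setitem__(i, 2 * data[i]))
print(out)
```

On a real GPU all of these index computations run concurrently, which is exactly why the guard against out-of-range indices matters.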

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Past Events

  • 2026, Mar 27: Full-Day Online Training
  • 2025, Oct 29: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
  • 2025, Sep 29: Full-Day Online Training
  • 2025, Apr 2: Full-Day Online Training in Collaboration with EUMaster4HPC
  • 2025, Feb 5: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
  • 2025, Jan 16: Full-Day Online @ NVIDIA Training
    Event: NVIDIA DLI Virtual Workshop Series for Higher Education
  • 2024, Oct 24: Full-Day Online @ NVIDIA Training
    Event: NVIDIA DLI Virtual Workshop Series for Higher Education
  • 2024, Oct 7: Full-Day Online Training
  • 2024, Mar 14: Full-Day Online Training
  • 2024, Mar 6-7: Two Half-Day Online Training in Collaboration with EUMaster4HPC
  • 2023, Sep 18: Full-Day Training at NHR@FAU
  • 2023, Mar 16: Full-Day Training at NHR@FAU
  • 2022, Sep 22-23: Two Half-Day Online Training
  • 2022, Aug 2-3: Two Half-Day Online Training

This course has been discontinued by the DLI, and no alternative has been announced so far.

By the end of this workshop, participants will have a foundational understanding of OpenACC, a high-level programming model for parallel computing on CPUs and GPUs. The workshop covers profiling and optimizing applications to identify performance hotspots, using OpenACC directives to offload computations to the GPU, and improving data movement between the CPU and GPU to maximize efficiency.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Upcoming Events

  • 2026, Jun 8: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
    Registration

Past Events

  • 2025, Oct 27: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
  • 2025, Apr 16: Full-Day Online Training in Collaboration with EUMaster4HPC
  • 2025, Feb 3: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop

Morning session: Introduction to GPU programming using OpenMP directives. The lecture covers the OpenMP offloading model, including parallelism, data mapping, and memory management.

Afternoon session: GPU programming with Kokkos, a modern C++ library for performance-portable parallel computing. Participants will learn to implement algorithms efficiently on GPUs, leverage parallelism, and optimize memory access to maximize performance.

Upcoming Events

  • 2026, Jun 9: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
    Registration

This course is currently on hold. NHR@FAU offers the two-day course Scaling CUDA-Accelerated Applications as an alternative.

This advanced course explores techniques for extending single-GPU applications to utilize multiple GPUs within a single compute node. It focuses on distributing workloads across multiple accelerators, optimizing performance through overlapping computation and data transfers, and using NVIDIA Nsight Systems to analyze execution behavior and identify performance bottlenecks.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Past Events

  • 2025, Sep 15-16: Two Half-Day Online Training in Collaboration with NHR@TUD
    Event: From Zero to Multi-Node GPU Programming
  • 2025, Mar 19: Full-Day Online Training in Collaboration with NHR@TUD and EUMaster4HPC
    Event: From Zero to Multi-Node GPU Programming
  • 2025, Feb 6: Full-Day Online @ LRZ Training in Collaboration with LRZ
    Event: GPU Programming Workshop
  • 2024, Sep 25: Full-Day Online Training in Collaboration with NHR@TUD
    Event: From Zero to Multi-Node GPU Programming
  • 2024, Apr 5: Full-Day Online Training
    Event: Multi-GPU Programming with CUDA C++
  • 2024, Feb 8: Full-Day Online Training

This course is currently on hold. NHR@FAU offers the two-day course Scaling CUDA-Accelerated Applications as an alternative.

This advanced course covers multi-node programming techniques for GPU-accelerated applications and examines advanced examples, with a special emphasis on using MPI and NVSHMEM to distribute workloads efficiently.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Past Events

  • 2025, Sep 17-18: Two Half-Day Online Training in Collaboration with NHR@TUD
    Event: From Zero to Multi-Node GPU Programming
  • 2025, Mar 26: Full-Day Online Training in Collaboration with NHR@TUD and EUMaster4HPC
    Event: From Zero to Multi-Node GPU Programming
  • 2024, Oct 2: Full-Day Online Training in Collaboration with NHR@TUD
    Event: From Zero to Multi-Node GPU Programming
  • 2024, Apr 10: Full-Day Online Training
    Event: Multi-GPU Programming with CUDA C++
  • 2024, Feb 9: Full-Day Online Training

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, Mar 10-11: Two-Day Online Training in Collaboration with NHR@TUD
    Event: Part 2 and 3 of From Zero to Multi-Node GPU Programming

Porting code to the GPU can yield significant speedups but often presents challenges. This advanced course introduces NVIDIA’s profiling tools to identify common performance issues during the porting process. Performance analysis is guided by straightforward, resource-based models that help developers evaluate how close their code is to the optimal performance target.

The course was previously called Performance Analysis on GPUs with NVIDIA Tools and was restructured and extended at the beginning of 2025. We now offer a comprehensive GPU Performance Engineering course, along with a condensed GPU Performance Analysis module that can be incorporated into larger events.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Upcoming Events

Past Events

This workshop series bundles three of our most popular GPU programming courses: Fundamentals of Accelerated Computing with CUDA C/C++, Accelerating CUDA C++ Applications with Multiple GPUs, and Scaling CUDA C++ Applications to Multiple Nodes. Their delivery is augmented with additional material connecting the individual courses, their key concepts, and the overall workflow of GPU-accelerated applications.

Note that parts two and three are currently on hold at the DLI and have been replaced with our new two-day course Scaling CUDA-Accelerated Applications.

Please register separately for each part you want to attend.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, Mar 9-11: Three-Day Online Training in Collaboration with NHR@TUD
  • 2025, Sep 8-9, 15-18: Six Half-Day Online Training in Collaboration with NHR@TUD
  • 2025, Mar 12, 19, 26: Three-Day Online Training in Collaboration with NHR@TUD
  • 2024, Sep 18, Sep 25, Oct 2: Three-Day Online Training in Collaboration with NHR@TUD

This course provides an overview of the most common GPU programming approaches, including CUDA/HIP, SYCL, modern C++, Thrust, OpenACC, OpenMP, and Kokkos. It helps participants understand the strengths and weaknesses of each approach, enabling them to make informed decisions about which one to use for their specific applications.

Participants will get the most out of this course if they already have prior experience in at least one GPU programming approach, but participation without any prior knowledge is also possible.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, Mar 4-5: Two Half-Day Online Training
  • 2025, Sep 4-5: Two Half-Day Online Training

Parallel Programming

This long-standing course is a collaboration between the Erlangen National High Performance Computing Center (NHR@FAU) and the Leibniz Supercomputing Centre (LRZ). It is designed for students and researchers interested in programming modern HPC hardware, with a focus on large-scale parallel computing systems available in Jülich, Stuttgart, and Munich, as well as smaller clusters at Tier-2/3 centers and departmental facilities.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, Feb 24-26: Three-Day Training at NHR@FAU in Collaboration with LRZ
    Event: PPHPS26
  • 2025, Feb 18-20: Three-Day Training at LRZ in Collaboration with LRZ
    Event: PPHPS25
  • 2024, Feb 20-22: Three-Day Training at NHR@FAU in Collaboration with LRZ
    Event: PPHPS24
  • 2023, Mar 7-9: Three-Day Online Training in Collaboration with LRZ
    Event: PPHPS23
  • 2022, Mar 8-10: Three-Day Online Training in Collaboration with LRZ
    Event: PPHPS22
  • 2021, Apr 13-15: Three-Day Online Training in Collaboration with LRZ
    Event: PPHPS21
  • 2020, Mar 9-13: Four-Day Training at FAU in Collaboration with LRZ

This course provides an introduction to the Message Passing Interface (MPI), the dominant distributed-memory programming paradigm in High Performance Computing.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, May 7-8: Two-Day Online Training
  • 2025, Apr 9-10: Two-Day Online Training
  • 2024, Apr 11-12: Two-Day Online Training

OpenMP is a widely supported standard for parallelizing shared-memory C/C++ and Fortran applications. It offers a simple, low-barrier entry to thread-based parallelization. This course introduces the fundamental concepts and constructs of OpenMP, as well as advanced topics like tasking and accelerator offloading.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, May 4-6: Three-Day Online Training
  • 2025, Feb 26-28: Three-Day Online Training
  • 2024, Sep 4-6: Three Half-Day Online Training
  • 2024, Mar 12: Part 2 Online Training
  • 2024, Mar 5: Part 1 Online Training
  • 2023, Sep 27: Part 2 Online Training
  • 2023, Sep 20: Part 1 Online Training
  • 2023, Mar 28: Part 2 Online Training
  • 2023, Mar 21: Part 1 Online Training
  • 2022, Oct 4: Full-Day Online Training

Most HPC systems consist of clusters of shared-memory nodes. Efficient use of such systems requires optimizing both memory consumption and communication time. Hybrid programming combines distributed-memory parallelization across nodes (e.g., using MPI) with shared-memory parallelization within each node (e.g., using OpenMP or MPI-3.0 shared memory).

This course examines the strengths and weaknesses of various parallel programming models on clusters of shared-memory nodes, with special focus on multi-socket, multi-core systems in highly parallel environments. MPI-3.0 introduces a shared memory programming interface that complements inter-node MPI communication. This interface supports direct neighbor accesses, similar to OpenMP, and enables direct halo copies, paving the way for innovative hybrid programming models. These models are compared against hybrid MPI+OpenMP approaches and pure MPI implementations. Additionally, the course covers MPI+OpenMP offloading with GPUs. Through numerous case studies and micro-benchmarks, the course highlights performance aspects of hybrid programming. Hands-on sessions are included daily. Tools for hybrid programming such as thread and process placement support and performance analysis are demonstrated in practical “how-to” sections.

This course is a joint training event of EuroCC@GCS and EuroCC-Austria, the German and Austrian National Competence Centres for High-Performance Computing. It is organized by the HLRS in cooperation with the VSC Research Center at TU Wien and NHR@FAU.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, Feb 10-12: Three-Day Training in Hybrid @ HLRS in Collaboration with HLRS and ASC
  • 2025, Jan 21-23: Three-Day Training in Hybrid @ HLRS in Collaboration with HLRS and VSC
  • 2024, Jan 23-25: Three-Day Training in Hybrid @ HLRS in Collaboration with HLRS and VSC
  • 2022, Dec 12-14: Three-Day Online @ VSC Training in Collaboration with PRACE, HLRS, and VSC
  • 2022, Jun 22-24: Three-Day Online @ LRZ Training in Collaboration with PRACE, HLRS, and VSC
  • 2022, Apr 5-7: Three-Day Online @ VSC Training in Collaboration with PRACE, HLRS, and VSC
  • 2021, Jun 15-17: Three-Day Online @ VSC Training in Collaboration with HLRS and VSC
  • 2020, Jun 17-19: Three-Day Online @ VSC Training in Collaboration with HLRS and VSC
  • 2020, Jan 27-28: Two-Day Training at HLRS in Collaboration with HLRS and VSC

C++ Programming and Software Engineering

This course introduces the core features and syntax of C++, along with key principles, idioms, and best practices for professional software development. It is designed to help programmers write high-quality, maintainable code from the start.

Participants will learn how to develop robust, efficient, and mature C++ applications while avoiding common pitfalls. A basic understanding of programming in any language is assumed.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Upcoming Events

  • 2026, Sep 17-18, 24-25, and Oct 1-2: Six-Day Online Training
    Registration

Past Events

  • 2025, Sep 18-19, 25-26, and Oct 1-2: Six-Day Online Training
  • 2024, Sep 12-13, 19-20, and 26-27: Six-Day Online Training
  • 2023, Sep 14-15, 21-22, and 28-29: Six-Day Online Training
  • 2022, Oct 10-14: Five-Day Online Training

This advanced course focuses on software development using the C++ programming language. It emphasizes essential principles, concepts, idioms, and best practices that enable developers to write professional, high-quality code.

Participants will gain insight into the key C++ paradigms (object-oriented, functional, and generic programming) and learn guidelines for developing robust, efficient, maintainable, and mature C++ applications.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Upcoming Events

Past Events

  • 2025, Sep 30-Oct 2: Three-Day Online Training
  • 2024, Sep 30-Oct 2: Three-Day Online Training
  • 2023, Oct 11-13: Three-Day Online Training
  • 2022, Oct 5-7: Three-Day Online Training

Keeping track of how software changes over time is essential in almost all development workflows today. This holds true regardless of whether developers are working on a project alone or in a team of any size. Software versioning aids in many tasks, such as quickly recovering deleted code, identifying when and where a bug was introduced, collaborating with other people, and deploying software in production.

This course introduces the basics of the Git version control system. It covers the concepts of Git, how to use it with different workflows (command line, VS Code, etc.), and many other practical essentials.

Additional information, such as learning objectives, prerequisites, and certification, can be found on the course page.

Past Events

  • 2026, Mar 12-13: Two Half-Day Online Training in Collaboration with NHR@KIT

Molecular Dynamics Simulations

This course provides an introduction to the molecular dynamics engine GROMACS, including fundamental commands and applications. Over five days, participants will learn how to prepare and run simulations of biomolecular systems (e.g., including membranes and proteins) at an atomistic and coarse-grained level of resolution. Post-processing and analysis of simulation trajectories make up a large part of the tutorial.

The course is usually embedded in the Bachelor programs of Biology and Integrated Life Sciences. There are five places available for people from NHR. The course will be held in person and takes place in the CIP of the Biology Department.

Interested candidates should send a short note about their background and motivation to rainer.boeckmann@fau.de.

Past Events

  • 2023, Oct 10-12: Three-Day Training at FAU, Department of Biology
  • 2022, Dec 12-16: Three-Day Training at FAU, Department of Biology

Protonation states of biologically relevant molecules (proteins, lipids, etc.) are fundamental to their function and are influenced by pH and electrostatic interactions with neighboring molecules. Accurately capturing protonation shifts with molecular dynamics (MD) may considerably improve the understanding of pH-dependent processes [1, 2]. However, classical MD employs fixed protonation states, impeding dynamic modeling of protonation changes.

This two-day advanced course introduces a recent extension of the GROMACS engine that enables constant-pH molecular dynamics (cpHMD) using λ-dynamics [3]. Participants will learn how to prepare, run, and analyze cpHMD simulations of a soluble protein, including trajectory post-processing. This is an advanced course; intermediate knowledge of GROMACS and the Linux environment is required.

The course is offered in person at the Department of Biology at Friedrich-Alexander-Universität Erlangen-Nürnberg and is limited to eight participants. Interested candidates should send a short note about their background and motivation to rainer.boeckmann@fau.de (deadline: March 20, 2026). Options for travel grants are available for participants. We welcome applicants of all backgrounds and especially encourage those from underrepresented groups in the natural sciences to apply.

Past Events

  • 2026, Apr 13-14: Two-Day Training at FAU, Department of Biology

This course provides a short introduction to the AMBER molecular dynamics simulation suite: the general workflow, system setup, simulation on NHR@FAU cluster systems (including GPU acceleration), and common analysis tasks. The following topics are covered:

1. System Setup: Model building (structure, protonation states, choice of force field/parameters), solvation and simulation box, constraints, minimization/relaxation

2. Simulation: Heating, equilibration, production run

3. Analysis: Imaging, RMSD and fluctuations, time series of quantities (e.g., distances), probabilities (hydrogen bonds)

Intermittent Course Offerings

This workshop, organized by VI-HPS and the Erlangen National High Performance Computing Center (NHR@FAU), gives an overview of the VI-HPS programming tools suite, explains the functionality of individual tools and how to use them effectively, and offers hands-on experience and expert assistance in using the tools.

On completion, participants should be familiar with common performance analysis and diagnosis techniques and how they can be employed in practice on a range of HPC systems. Those who prepared their own application test cases will have been coached in tuning their measurement and analysis and will have received optimization suggestions.

Past Events

The Python programming language has become very popular in scientific computing for various reasons. Users not only implement prototypes for numerical experiments on small scales, but also develop parallel production codes, thereby partly replacing compiled languages such as C, C++, and Fortran. When following this approach, however, it is crucial to pay special attention to performance. This course teaches approaches to using Python efficiently and reasonably in an HPC environment.

The first lecture gives a whirlwind tour through the Python programming language and the standard library. The following lectures focus strongly on performance-related topics such as NumPy, Cython, Numba, compiled C and Fortran extensions, profiling of Python and compiled code, parallelism using multiprocessing and mpi4py, parallel frameworks such as Dask, and efficient I/O with HDF5. In addition, we cover topics related to software engineering such as packaging, publishing, testing, and the semi-automated generation of documentation. Finally, basic visualization tasks using matplotlib and similar packages are discussed.

Past Events

  • 2023, Jul 25-27: Three-Day Online Training in Collaboration with MPCDF

Our Trainers and Collaborators

We sincerely thank all our trainers and collaborators for their valuable contributions to our courses, trainings, and events. Their support in teaching, organizing, and sharing expertise has been essential to our work. The lists below, ordered alphabetically, acknowledge those who have been part of these (joint) efforts.

HLRS

LRZ

  • Allalen, Momme
  • Azizi, Sajjad
  • Weinberg, Volker

NHR@KIT

  • Tuteja, Keshvi

TU Delft

TU Dresden

TU München

TU Wien (VSC, ASC)