Tutorials & Courses

The Erlangen National High Performance Computing Center (NHR@FAU) offers a wide range of HPC-related courses, covering topics such as modern C++, parallel programming, GPU programming, performance engineering, and domain-specific applications like molecular dynamics simulations.

We regularly present our flagship events Core-Level Performance Engineering and Node-Level Performance Engineering at leading conferences such as SC and ISC, as well as at high-performance computing centers. Many of our courses are conducted in collaboration with educators from Leibniz Supercomputing Centre (LRZ), High Performance Computing Center Stuttgart (HLRS), Vienna Scientific Cluster (VSC) at TU Wien, and NHR@TUD/ZIH at TU Dresden. Several of our GPU programming courses are offered in partnership with the NVIDIA Deep Learning Institute (DLI), and we regularly contribute workshops to the European Master for High Performance Computing (EUMaster4HPC) program.

Upon request, we conduct customized course sessions for interested computing centers, research institutions, and industry partners.

If you are an FAU student, we also encourage you to explore the curricular courses offered by the Professorship of High Performance Computing.

Please register for each course individually using the links provided.

Course Program Overview

Join Us	Where	When	What	How
Registration	Online	2025, Jul 31	Introduction to the LIKWID Tool Suite	Full-Day
Registration	Online	2025, Sep 4-5	Choosing GPU Programming Approaches	Two Half-Day
Registration	Online	2025, Sep 8-9	Fundamentals of Accelerated Computing with CUDA C/C++	Two Half-Day
	Online	2025, Sep 8-9, 15-18	From Zero to Multi-Node GPU Programming	Six Half-Day
Registration	Online	2025, Sep 10-12	Fundamentals of Accelerated Computing with Modern CUDA C++	Three Half-Day
Registration	NHR@FAU	2025, Sep 10-12	Node-Level Performance Engineering	Three-Day
Registration	Online	2025, Sep 15-16	Accelerating CUDA C++ Applications with Multiple GPUs	Two Half-Day
Registration	Online	2025, Sep 17-18	Scaling CUDA C++ Applications to Multiple Nodes	Two Half-Day
Registration	Online	2025, Sep 18-19, 25-26, and Oct 1-2	C++ for Beginners	Six-Day
Registration	Online	2025, Sep 29	Fundamentals of Accelerated Computing with CUDA Python	Full-Day
Registration	Online	2025, Sep 30-Oct 2	Modern C++ Software Design	Three-Day
Registration	Online	2025, Oct 6	Core-Level Performance Engineering	Full-Day
Registration	Online	2025, Oct 8-10	GPU Performance Engineering	Three Half-Day
Registration	Online	2025, Oct 27	Fundamentals of Accelerated Computing with OpenACC	Full-Day
Registration	Online	2025, Oct 28	Fundamentals of Accelerated Computing with Modern CUDA C++	Full-Day
Registration	Online	2025, Oct 29	Fundamentals of Accelerated Computing with CUDA Python	Full-Day
Registration	Online	2025, Oct 30	Accelerating CUDA C++ Applications with Multiple GPUs	Full-Day
Registration	Online	2025, Dec 2-4	Node-Level Performance Engineering	Three-Day

C++ Programming

C++ for Beginners

This course introduces the core features and syntax of C++, along with key principles, idioms, and best practices for professional software development. It is designed to help programmers write high-quality, maintainable code from the start.

Participants will learn how to develop robust, efficient, and mature C++ applications while avoiding common pitfalls. A basic understanding of programming in any language is assumed.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Six-Day	Online	2025, Sep 18-19, 25-26, and Oct 1-2	registration

Past Events

Format	Location	Date	Event	In Collaboration with
Six-Day	Online	2024, Sep 12-13, 19-20, and 26-27
Six-Day	Online	2023, Sep 14-15, 21-22, and 28-29
Five-Day	Online	2022, Oct 10-14

Modern C++ Software Design

This advanced course focuses on software development using the C++ programming language. It emphasizes essential principles, concepts, idioms, and best practices that enable developers to write professional, high-quality code.

Participants will gain insight into key C++ paradigms (object-oriented, functional, and generic programming) and learn guidelines for developing robust, efficient, maintainable, and mature C++ applications.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Three-Day	Online	2025, Sep 30-Oct 2	registration

Past Events

Format	Location	Date	Event	In Collaboration with
Three-Day	Online	2024, Sep 30-Oct 2
Three-Day	Online	2023, Oct 11-13
Three-Day	Online	2022, Oct 5-7

GPU Programming

Fundamentals of Accelerated Computing with Modern CUDA C++

By the end of the workshop, participants will understand the fundamental concepts and techniques for accelerating C++ code with CUDA. They will be able to write and compile code that runs on the GPU, optimize memory transfers between CPU and GPU, and leverage parallel algorithms to simplify adding GPU acceleration.

Additionally, participants will learn to implement custom parallel algorithms through CUDA kernels, utilize concurrent CUDA streams to overlap computation with memory operations, and identify the best opportunities to integrate CUDA acceleration into existing CPU-only applications.

Additional information is available on the NVIDIA DLI course homepage and the NHR@FAU course homepage.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Full-Day	Online	2025, Oct 28	registration	Part 2 of GPU Programming Workshop	LRZ
Three Half-Day	Online	2025, Sep 10-12	registration

Past Events

Format	Location	Date	Event	In Collaboration with
Full-Day	Online	2025, May 27	NVIDIA DLI Virtual Workshops for Higher Education

Fundamentals of Accelerated Computing with CUDA C/C++

By the end of this workshop, participants will have a solid grasp of the essential tools and techniques for GPU-accelerating C/C++ applications using CUDA. They will be able to write GPU-executable code, leverage data parallelism, optimize memory transfers with asynchronous prefetching, and use both command-line and visual profilers to guide performance tuning. Additionally, they will know how to employ concurrent streams to increase parallelism and apply a profile-driven approach to develop or refactor CUDA applications for maximum performance.

Until March 2025, this course was offered as an official NVIDIA Deep Learning Institute (DLI) program. Its successor, Fundamentals of Accelerated Computing with Modern CUDA C++, is also offered by NHR@FAU. Due to the original course’s popularity, we continue to offer a custom, updated version that builds upon the original material.

Additional information is available on the NVIDIA DLI course homepage and the NHR@FAU course homepage.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Two Half-Day	Online	2025, Sep 8-9	registration	Part 1 of From Zero to Multi-Node GPU Programming	NHR@TUD

Past Events

Format	Location	Date	Event	In Collaboration with
Full-Day	Online	2025, Mar 12	Part 1 of From Zero to Multi-Node GPU Programming	NHR@TUD
Full-Day	Online	2025, Feb 4	Part 2 of GPU Programming Workshop	LRZ
Full-Day	Online	2024, Sep 18	Part 1 of From Zero to Multi-Node GPU Programming	NHR@TUD
Two Half-Day	Online	2024, Mar 4-5		EUMaster4HPC
Full-Day	Online	2024, Feb 29
Full-Day	NHR@FAU	2023, Jul 28
Full-Day	NHR@FAU	2023, Mar 23
Two Half-Day	Online	2023, Mar 8-9		EUMaster4HPC
Two Half-Day	Online	2022, Dec 9 & 16		EUMaster4HPC
Full-Day	Online	2022, Nov 28		LRZ
Two Half-Day	Online	2022, Apr 21-22

Fundamentals of Accelerated Computing with CUDA Python

By the end of this workshop, participants will be proficient in the core tools and techniques for GPU-accelerating Python applications using CUDA and Numba. They will learn how to accelerate NumPy ufuncs on the GPU, configure parallel execution using CUDA’s thread hierarchy, implement custom device kernels for greater performance and flexibility, and optimize memory access through coalescing and shared memory to enhance kernel efficiency.
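To give a flavor of what configuring CUDA's thread hierarchy involves, here is a minimal sketch in plain Python that emulates the index arithmetic on the CPU. It is not Numba code and not taken from the course material; the function names `launch_config` and `emulate_kernel` and the block size of 128 are assumptions chosen for illustration. In a real Numba kernel the global index would typically come from `cuda.grid(1)` instead of the nested loops.

```python
import math


def launch_config(n_elements, threads_per_block=128):
    """Return (blocks_per_grid, threads_per_block) covering n_elements."""
    # Round up so that the last, partially filled block is included.
    blocks_per_grid = math.ceil(n_elements / threads_per_block)
    return blocks_per_grid, threads_per_block


def emulate_kernel(data, blocks_per_grid, threads_per_block):
    """Serially emulate a 1D kernel that doubles every element."""
    out = [0.0] * len(data)
    for block_idx in range(blocks_per_grid):
        for thread_idx in range(threads_per_block):
            # Global index, as computed from block and thread indices on a GPU.
            i = block_idx * threads_per_block + thread_idx
            # Guard: surplus threads in the last block must not touch memory.
            if i < len(data):
                out[i] = 2.0 * data[i]
    return out


data = [float(i) for i in range(1000)]
blocks, threads = launch_config(len(data))
result = emulate_kernel(data, blocks, threads)
print(f"{blocks} blocks of {threads} threads cover {len(data)} elements")
```

The bounds check matters because the grid usually contains more threads than there are elements: here 8 blocks of 128 threads provide 1024 threads for 1000 elements.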

Additional information is available on the NVIDIA DLI course homepage and the NHR@FAU course homepage.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Full-Day	Online	2025, Oct 29	registration	Part 3 of GPU Programming Workshop	LRZ
Full-Day	Online	2025, Sep 29	registration

Past Events

Format	Location	Date	Event	In Collaboration with
Full-Day	Online	2025, Apr 2		EUMaster4HPC
Full-Day	Online	2025, Feb 5	Part 3 of GPU Programming Workshop	LRZ
Full-Day	Online	2025, Jan 16	NVIDIA DLI Virtual Workshops for Higher Education
Full-Day	Online	2024, Oct 24	NVIDIA DLI Virtual Workshops for Higher Education
Full-Day	Online	2024, Oct 7
Full-Day	Online	2024, Mar 14
Two Half-Day	Online	2024, Mar 6-7		EUMaster4HPC
Full-Day	On-site	2023, Sep 18
Full-Day	In-person	2023, Mar 16
Two Half-Day	Online	2022, Sep 22-23
Two Half-Day	Online	2022, Aug 02-03

Fundamentals of Accelerated Computing with OpenACC

By the end of this workshop, participants will have a foundational understanding of OpenACC, a high-level programming model for parallel computing on CPUs and GPUs. The workshop covers profiling and optimizing applications to identify performance hotspots, using OpenACC directives to offload computations to the GPU, and improving data movement between the CPU and GPU to maximize efficiency.

Additional information is available on the NVIDIA DLI course homepage and the NHR@FAU course homepage.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
TBA		Spring 2026
Full-Day	Online	2025, Oct 27	registration	Part 1 of GPU Programming Workshop	LRZ

Past Events

Format	Location	Date	Event	In Collaboration with
Full-Day	Online	2025, Apr 16		EUMaster4HPC
Full-Day	Online	2025, Feb 3	Part 1 of GPU Programming Workshop	LRZ

Accelerating CUDA C++ Applications with Multiple GPUs

This advanced course explores techniques for extending single-GPU applications to utilize multiple GPUs within a single compute node. It focuses on distributing workloads across multiple accelerators, optimizing performance through overlapping computation and data transfers, and using NVIDIA Nsight Systems to analyze execution behavior and identify performance bottlenecks.

Additional information is available on the NVIDIA DLI course homepage and the NHR@FAU course homepage.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Full-Day	Online	2025, Oct 30	registration	Part 4 of GPU Programming Workshop	LRZ
Two Half-Day	Online	2025, Sep 15-16	registration	Part 2 of From Zero to Multi-Node GPU Programming	NHR@TUD

Past Events

Format	Location	Date	Event	In Collaboration with
Full-Day	Online	2025, Mar 19	Part 2 of From Zero to Multi-Node GPU Programming	NHR@TUD, EUMaster4HPC
Full-Day	Online	2025, Feb 6	Part 4 of GPU Programming Workshop	LRZ
Full-Day	Online	2024, Sep 25	Part 2 of From Zero to Multi-Node GPU Programming	NHR@TUD
Full-Day	Online	2024, Apr 5	Part 1 of Multi-GPU Programming with CUDA C++
Full-Day	Online	2024, Feb 8

Scaling CUDA C++ Applications to Multiple Nodes

This advanced course covers multi-node programming techniques for GPU-accelerated applications and examines advanced examples, with a special emphasis on using MPI and NVSHMEM to distribute workloads efficiently.

Additional information is available on the NVIDIA DLI course homepage and the NHR@FAU course homepage.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Two Half-Day	Online	2025, Sep 17-18	registration	Part 3 of From Zero to Multi-Node GPU Programming	NHR@TUD

Past Events

Format	Location	Date	Event	In Collaboration with
Full-Day	Online	2025, Mar 26	Part 3 of From Zero to Multi-Node GPU Programming	NHR@TUD, EUMaster4HPC
Full-Day	Online	2024, Oct 2	Part 3 of From Zero to Multi-Node GPU Programming	NHR@TUD
Full-Day	Online	2024, Apr 10	Part 2 of Multi-GPU Programming with CUDA C++
Full-Day	Online	2024, Feb 9

GPU Performance Engineering

Porting code to the GPU can yield significant speedups but often presents challenges. This advanced course introduces NVIDIA’s profiling tools to identify common performance issues during the porting process. Performance analysis is guided by straightforward, resource-based models that help developers evaluate how close their code is to the optimal performance target.

The course was previously called Performance Analysis on GPUs with NVIDIA Tools and was restructured and extended at the beginning of 2025. We offer a comprehensive GPU Performance Engineering course, along with a condensed GPU Performance Analysis module that can be incorporated into larger events.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Three Half-Day	Online	2025, Oct 8-10	registration

Past Events

Format	Location	Date	Event	In Collaboration with
GPU Performance Analysis	Lisbon, Portugal	2025, Jul 6-11	International HPC Summer School (IHPCSS)
Full-Day	Online	2025, Apr 11
Half-Day	Online	2024, Oct 9
GPU Performance Analysis	Kobe, Japan	2024, Jul 7-12	GPU Performance Analysis. Tutorial at the International HPC Summer School (IHPCSS)
Half-Day	Online	2024, Mar 19
Half-Day	Online	2023, Oct 10
GPU Performance Analysis	Atlanta, GA, USA	2023, Jul 9-14	GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS)
Half-Day	Online	2023, Apr 4
Half-Day	Online	2022, Sep 29
GPU Performance Analysis	Online	2022, Jun 19-24	International HPC Summer School (IHPCSS)
GPU Performance Analysis	Online	2021, Jul 18-30	International HPC Summer School (IHPCSS)

From Zero to Multi-Node GPU Programming

This workshop series bundles three of our most popular GPU programming courses: Fundamentals of Accelerated Computing with CUDA C/C++, Accelerating CUDA C++ Applications with Multiple GPUs, and Scaling CUDA C++ Applications to Multiple Nodes. Their delivery is augmented with additional material connecting the individual courses, their key concepts, and the overall workflow of GPU-accelerated applications.

Please register for each part you want to attend separately.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Six Half-Day	Online	2025, Sep 8-9, 15-18			NHR@TUD

Past Events

Format	Location	Date	Event	In Collaboration with
Three-Day	Online	2025, Mar 12, 19, 26		NHR@TUD
Three-Day	Online	2024, Sep 18, Sep 25, Oct 2		NHR@TUD

Choosing GPU Programming Approaches

This course provides an overview of the most common GPU programming approaches, including CUDA/HIP, SYCL, modern C++, Thrust, OpenACC, OpenMP, and Kokkos. It helps participants understand the strengths and weaknesses of each approach, enabling them to make informed decisions about which one to use for their specific applications.

Participants will get the most out of this course if they already have experience with at least one GPU programming approach, but participation without any prior knowledge is also possible.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Two Half-Day	Online	2025, Sep 4-5	registration

Parallel Programming

Parallel Programming of High-Performance Systems (PPHPS)

This long-standing course is a collaboration between the Erlangen National High Performance Computing Center (NHR@FAU) and the Leibniz Supercomputing Centre (LRZ). It is designed for students and researchers interested in programming modern HPC hardware, with a focus on large-scale parallel computing systems available in Jülich, Stuttgart, and Munich, as well as smaller clusters at Tier-2/3 centers and departmental facilities.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
TBA	TBA	Spring 2026			LRZ

Past Events

Format	Location	Date	Event	In Collaboration with
Three-Day	LRZ	2025, Feb 18-20	PPHPS25	LRZ
Three-Day	NHR@FAU	2024, Feb 20-22	PPHPS24	LRZ
Three-Day	Online	2023, Mar 7-9	PPHPS23	LRZ
Three-Day	Online	2022, Mar 8-10	PPHPS22	LRZ
Three-Day	Online	2021, Apr 13-15	PPHPS21	LRZ
Four-Day	FAU	2020, Mar 9-13		LRZ

Introduction to Parallel Programming with MPI

This course provides an introduction to the Message Passing Interface (MPI), the dominant distributed-memory programming paradigm in High Performance Computing.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
TBA		Spring 2026

Past Events

Format	Location	Date	Event	In Collaboration with
Two-Day	Online	2025, Apr 9-10
Two-Day	Online	2024, Apr 11-12

Introduction to Parallel Programming with OpenMP

OpenMP is a widely supported standard for parallelizing shared-memory C/C++ and Fortran applications. It offers a simple, low-barrier entry to thread-based parallelization. This course introduces the fundamental concepts and constructs of OpenMP, as well as advanced topics like tasking and accelerator offloading.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
TBA		Spring 2026

Past Events

Format	Location	Date	Event	In Collaboration with
Three-Day	Online	2025, Feb 26-28
Three Half-Day	Online	2024, Sep 4-6
Part 1	Online	2024, Mar 12
Part 2	Online	2024, Mar 5
Part 2	Online	2023, Sep 27
Part 1	Online	2023, Sep 20
Part 2	Online	2023, Mar 28
Part 1	Online	2023, Mar 21
Full-Day	Online	2022, Oct 4

Hybrid Programming in HPC - MPI+X

Most HPC systems consist of clusters of shared-memory nodes. Efficient use of such systems requires optimizing both memory consumption and communication time. Hybrid programming combines distributed-memory parallelization across nodes (e.g., using MPI) with shared-memory parallelization within each node (e.g., using OpenMP or MPI-3.0 shared memory).

This course examines the strengths and weaknesses of various parallel programming models on clusters of shared-memory nodes, with special focus on multi-socket, multi-core systems in highly parallel environments. MPI-3.0 introduces a shared-memory programming interface that complements inter-node MPI communication. This interface supports direct neighbor accesses, similar to OpenMP, and enables direct halo copies, paving the way for innovative hybrid programming models. These models are compared against hybrid MPI+OpenMP approaches and pure MPI implementations. Additionally, the course covers MPI+OpenMP offloading with GPUs. Through numerous case studies and micro-benchmarks, the course highlights performance aspects of hybrid programming. Hands-on sessions are included daily. Tools for hybrid programming, such as thread and process placement support and performance analysis, are demonstrated in practical “how-to” sections.

This course is a joint training event of EuroCC@GCS and EuroCC-Austria, the German and Austrian National Competence Centres for High-Performance Computing. It is organized by the HLRS in cooperation with the VSC Research Center at TU Wien and NHR@FAU.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
TBA		2026

Past Events

Format	Location	Date	Event	In Collaboration with
Three-Day	Hybrid @ HLRS	2025, Jan 21-23		HLRS, VSC
Three-Day	Hybrid @ HLRS	2024, Jan 23-25		HLRS, VSC
Three-Day	Online @ VSC	2022, Dec 12-14		PRACE, HLRS, VSC
Three-Day	Online @ LRZ	2022, Jun 22-24		PRACE, HLRS, VSC
Three-Day	Online @ VSC	2022, Apr 5-7		PRACE, HLRS, VSC
Three-Day	Online @ VSC	2021, Jun 15-17		HLRS, VSC
Three-Day	Online @ VSC	2020, Jun 17-19		HLRS, VSC
Two-Day	HLRS	2020, Jan 27-28		HLRS, VSC

Performance Engineering

Core-Level Performance Engineering

This course covers performance engineering approaches at the CPU core level. While many developers put a lot of effort into optimizing parallelism, they often lose sight of the importance of efficient serial code. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys a thorough understanding of the interactions between software and hardware at the level of a single CPU core and the lowest memory hierarchy level, the L1 cache. It covers general computer architecture for x86 and ARM processors, an introduction to assembly code (AT&T syntax for x86, and AArch64), and performance analysis and engineering using the Open Source Architecture Code Analyzer (OSACA) tool in combination with the Compiler Explorer.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Full-Day	Online	2025, Oct 6	registration

Past Events

Format	Location	Date	Event	In Collaboration with
Half-Day	Hamburg, Germany	2025, Jun 10-13	ISC High Performance
Half-Day	Atlanta, GA, USA	2024, Nov 17-22	SC24
Full-Day	NHR@FAU	2024, Oct 8
Full-Day	Ostrava, Czech Republic	2024, Sep 8-11	PPAM 2024
Full-Day	Vienna, Austria	2023, Oct 21-25	PACT 2023
Full-Day	NHR@FAU	2023, Oct 12
Full-Day	Coimbra, Portugal	2023, Apr 15-19	ICPE 2023

Node-Level Performance Engineering

This course covers performance engineering approaches on the compute node level. Even application developers who are fluent in OpenMP and MPI often lack a good grasp of how much performance could at best be achieved by their code. This is because parallelism takes us only half the way to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the required knowledge to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code gets executed that does the actual computational work. We introduce the basic architectural features and bottlenecks of modern processors and compute nodes. Pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, etc., are covered. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in due detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations by a scientific process.
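As a rough illustration of the Roofline model mentioned above, the following sketch computes the simple performance bound min(peak performance, computational intensity × memory bandwidth) for a loop kernel. The function name `roofline_bound` and the machine numbers are made-up assumptions for illustration, not measurements of any real system or part of the course material.

```python
def roofline_bound(peak_gflops, mem_bandwidth_gbs, intensity_flop_per_byte):
    """Upper performance bound in GFlop/s according to the simple Roofline model."""
    # A kernel is limited either by in-core execution (peak_gflops) or by the
    # rate at which data can be delivered (bandwidth times intensity).
    memory_limit = mem_bandwidth_gbs * intensity_flop_per_byte
    return min(peak_gflops, memory_limit)


# Hypothetical machine: 1000 GFlop/s peak, 100 GB/s memory bandwidth.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Example kernel: a[i] = b[i] + s * c[i] in double precision.
# 2 flops per iteration; 2 loads + 1 store of 8 bytes each = 24 bytes
# (write-allocate traffic is ignored in this sketch).
triad_intensity = 2 / 24
triad_bound = roofline_bound(PEAK_GFLOPS, BANDWIDTH_GBS, triad_intensity)

print(f"Intensity: {triad_intensity:.3f} flop/byte")
print(f"Roofline bound: {triad_bound:.2f} GFlop/s of {PEAK_GFLOPS:.0f} GFlop/s peak")
```

With these assumed numbers the kernel is strongly memory bound: the bound is below 1% of the peak, so optimizing in-core execution alone would not help.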

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Three-Day	NHR@FAU	2025, Sep 10-12	registration
Three-Day	Online	2025, Dec 2-4	registration		LRZ

Past Events

Format	Location	Date	Event	In Collaboration with
Four-Day	HLRS	2025, Jun 3-6		ZIH (TU Dresden)
Three-Day	LRZ	2024, Dec 3-5
Four-Day	HLRS	2024, Jun 18-21		ZIH (TU Dresden)
Three-Day	LRZ	2023, Dec 4-6
Full-Day	Denver, CO, USA	2023, Nov 12-17	SC23
Three-Day	NHR@FAU	2023, Oct 4-6
Half-Day	Hamburg, Germany	2023, May 11	ISC High Performance
Four-Day	HLRS	2023, Jun 27-30		ZIH (TU Dresden)
Three-Day	LRZ	2022, Dec 5-7		PRACE
Full-Day	Dallas, TX, USA	2022, Nov 13-18	SC22
Four-Day	HLRS	2022, Jun 28-Jul 1		PRACE, ZIH (TU Dresden)

Performance Engineering for Linear Solvers

This tutorial covers code analysis, performance modeling, and optimization for linear solvers on CPU and GPU nodes. Performance engineering is often taught using simple loops as instructive examples for performance models and how they can guide optimization; however, full, preconditioned linear solvers comprise multiple back-to-back loops enclosed in an iteration scheme that is executed until convergence is achieved. Consequently, the concept of “optimal performance” has to account for both hardware resource efficiency and iterative solver convergence. We convey a performance engineering process that is geared towards linear iterative solvers. After introducing basic notions of hardware organization and storage for dense and sparse data structures, we show how the Roofline performance model can be applied to such solvers in predictive and diagnostic ways and how it can be used to assess the hardware efficiency of a solver, covering important corner cases such as pure memory boundedness. Then we advance to the structure of preconditioned solvers, using the Conjugate Gradient (CG) method as a leading example. Hotspots and bottlenecks of the complete solver are identified, followed by the introduction of advanced performance optimization techniques like preconditioning and cache blocking.
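To make the "multiple back-to-back loops enclosed in an iteration scheme" concrete, here is a minimal sketch of the unpreconditioned Conjugate Gradient method in plain Python. It uses a dense list-of-lists matrix and a tiny made-up test system; the helper names `matvec` and `dot` are assumptions for illustration, and the tutorial's own codes (sparse storage, preconditioning, cache blocking) are not reproduced here.

```python
def matvec(A, x):
    """Dense matrix-vector product; in real solvers this is a sparse kernel."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A (no preconditioner)."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    if rs_old == 0.0:
        return x, 0          # b is zero, so x = 0 already solves the system
    for iteration in range(1, max_iter + 1):
        # Each iteration is a sequence of separate loops over the data.
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, iteration
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Small symmetric positive definite test system (made up for illustration).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = conjugate_gradient(A, b)
print(f"x = {x} after {iterations} iterations")
```

The matrix-vector product, the scalar products, and the vector updates each stream through the data once per iteration, which is why the total cost depends on both the speed of these loops and the number of iterations needed to converge.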

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Past Events

Format	Location	Date	Event	In Collaboration with
Half-Day	Hamburg, Germany	2025, Jun 10-13	ISC High Performance	TU Delft, TU Munich
Half-Day	Atlanta, GA, USA	2024, Nov 17-22	SC24	TU Delft, TU Munich
Half-Day	Hamburg, Germany	2024, May 12-16	ISC High Performance	TU Delft, TU Munich

Introduction to the LIKWID Tool Suite

LIKWID stands for “Like I Knew What I’m Doing.” It is an easy-to-use yet powerful command-line performance tool suite for the GNU/Linux operating system. While the focus of LIKWID is on x86 processors, some of the tools are portable and not limited to any specific architecture. For the upcoming release, LIKWID has been ported to ARMv7/v8 and POWER8/9 architectures as well as to NVIDIA GPU coprocessors.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
Full-Day	Online	2025, Jul 31	registration

Past Events

Format	Location	Date	Event	In Collaboration with
Full-Day	Online	2024, Jul 23
Full-Day	Online	2023, Jul 24
Webinar	Stony Brook University	2021, Jul 27	LIKWID, OSACA, and Sparse MVM on A64FX (video)
Webinar	Online	2021, Jun 2	Using the LIKWID and OSACA tools on A64FX (video)

Molecular Dynamics Simulations

Introduction to GROMACS

This course provides an introduction to the molecular dynamics engine GROMACS, including fundamental commands and applications. Over five days, participants will learn how to prepare and run simulations of biomolecular systems (e.g., membranes and proteins) at atomistic and coarse-grained levels of resolution. Post-processing and analysis of simulation trajectories are a large part of the tutorial.

The course is usually embedded in the Bachelor programs of Biology and Integrated Life Sciences. There are five places available for people from NHR. The course will be held in person and takes place in the CIP of the Biology Department.

Interested candidates should send a short note about their background and motivation to rainer.boeckmann@fau.de.

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
TBA		TBA

Past Events

Format	Location	Date	Event	In Collaboration with
Three-Day	Department of Biology, FAU	2023, Oct 10-12
Three-Day	Department of Biology, FAU	2022, Dec 12-16

Introduction to Amber

This course provides a short introduction to the AMBER molecular dynamics simulation suite: the general workflow, system setup, simulation on NHR@FAU cluster systems (including GPU acceleration), and common analysis tasks. The following topics are covered:

1. System Setup: Model building (structure, protonation states, choice of force field/parameters), solvation + simulation box, constraints, minimisation/relaxation

2. Simulation: heating, equilibration, production run

3. Analysis: Imaging, RMSD and fluctuations, time series of quantities (e.g. distances), probabilities (hydrogen bonds)

Upcoming Events

Format	Location	Date	Registration	Event	In Collaboration with
TBA		TBA

Intermittent Course Offerings

VI-HPS Tuning Workshop

This workshop, organized by VI-HPS and the Erlangen National High Performance Computing Center, will give an overview of the VI-HPS programming tools suite, explain the functionality of the individual tools and how to use them effectively, and offer hands-on experience and expert assistance in using the tools.

On completion, participants should be familiar with common performance analysis and diagnosis techniques and how they can be employed in practice (on a range of HPC systems). Those who prepared their own application test cases will have been coached in tuning their measurement and analysis and provided with optimization suggestions.

Past Events

Format	Location	Date	Event	In Collaboration with
Three-Day	NHR@FAU	2021, Mar 1-3	38th VI-HPS Tuning Workshop
Three-Day	CSC Frankfurt	2020, Dec 7-11	37th VI-HPS Tuning Workshop
Three-Day	CINECA, Italy	2020, Sep 30-Oct 2	36th VI-HPS Tuning Workshop

Performance Evaluation

Past Events

Format	Location	Date	Event	In Collaboration with
Contributed Session	Online	2021, Jul 15	2021 Code Performance Series: From analysis to insight
Contributed Session	Online	2021, Feb 23	EXA2PRO-EoCoE joint workshop

Python for HPC

The Python programming language has become very popular in scientific computing for various reasons. Users not only implement prototypes for numerical experiments on small scales, but also develop parallel production codes, thereby partly replacing compiled languages such as C, C++, and Fortran. However, when following this approach, it is crucial to pay special attention to performance. This course teaches approaches to using Python efficiently and sensibly in an HPC environment. The first lecture gives a whirlwind tour through the Python programming language and the standard library. The subsequent lectures focus strongly on performance-related topics such as NumPy, Cython, Numba, compiled C and Fortran extensions, profiling of Python and compiled code, parallelism using multiprocessing and mpi4py, parallel frameworks such as Dask, and efficient I/O with HDF5. In addition, we will cover topics more closely related to software engineering, such as packaging, publishing, testing, and the semi-automated generation of documentation. Finally, basic visualization tasks using matplotlib and similar packages are discussed.

Past Events

Format	Location	Date	Event	In Collaboration with
Three-Day	Online	2023, Jul 25-27		MPCDF

Tutorials & Courses

Course Program Overview

All Our Upcoming Courses At a Glance

C++ Programming

C++ for Beginners

Upcoming Events

Past Events

Modern C++ Software Design

Upcoming Events

Past Events

GPU Programming

Fundamentals of Accelerated Computing with Modern CUDA C++

Upcoming Events

Past Events

Fundamentals of Accelerated Computing with CUDA C/C++

Upcoming Events

Past Events

Fundamentals of Accelerated Computing with CUDA Python

Upcoming Events

Past Events

Fundamentals of Accelerated Computing with OpenACC

Upcoming Events

Past Events

Accelerating CUDA C++ Applications with Multiple GPUs

Upcoming Events

Past Events

Scaling CUDA C++ Applications to Multiple Nodes

Upcoming Events

Past Events

GPU Performance Engineering

Upcoming Events

Past Events

From Zero to Multi-Node GPU Programming

Upcoming Events

Past Events

Choosing GPU Programming Approaches

Upcoming Events

Parallel Programming

Parallel Programming of High-Performance Systems (PPHPS)

Upcoming Events

Past Events

Introduction to Parallel Programming with MPI

Upcoming Events

Past Events

Introduction to Parallel Programming with OpenMP

Upcoming Events

Past Events

Hybrid Programming in HPC - MPI+X

Upcoming Events

Past Events

Performance Engineering

Core-Level Performance Engineering

Upcoming Events

Past Events

Node-Level Performance Engineering

Upcoming Events

Past Events

Performance Engineering for Linear Solvers

Past Events

Introduction to the LIKWID Tool Suite

Upcoming Events

Past Events

Molecular Dynamics Simulations

Introduction to GROMACS

Upcoming Events

Past Events

Introduction to Amber

Upcoming Events

Intermittent Course Offerings

VI-HPS Tuning Workshop

Past Events

Performance Evaluation

Past Events

Python for HPC

Past Events