Tutorials & Courses

Erlangen National High Performance Computing Center (NHR@FAU) offers a wide range of HPC-related courses, covering topics such as modern C++, parallel programming, GPU programming, performance engineering, and domain-specific applications like molecular dynamics simulations.

We regularly present our flagship events Core-Level Performance Engineering and Node-Level Performance Engineering at leading conferences such as SC and ISC, as well as at high-performance computing centers. Many of our courses are conducted in collaboration with educators from Leibniz Supercomputing Centre (LRZ), High Performance Computing Center Stuttgart (HLRS), Vienna Scientific Cluster (VSC) at TU Wien, and NHR@TUD/ZIH at TU Dresden. Several of our GPU programming courses are offered in partnership with the NVIDIA Deep Learning Institute (DLI), and we regularly contribute workshops to the European Master for High Performance Computing (EUMaster4HPC) program.

Upon request, we conduct customized course sessions for interested computing centers, research institutions, and industry partners.

If you are an FAU student, we also encourage you to explore the curricular courses offered by the Professorship of High Performance Computing.

Please register for each course individually using the links provided.

Course Program Overview

What | How | Where | When | Join
GPU Performance Engineering | GPU Performance Analysis | Lisbon, Portugal | 2025, Jul 6-11 |
Introduction to the LIKWID Tool Suite | Full-Day | Online | 2025, Jul 31 | Register
Choosing GPU Programming Approaches | Two Half-Day | Online | 2025, Sep 4-5 | Register
Fundamentals of Accelerated Computing with CUDA C/C++ | Two Half-Day | Online | 2025, Sep 8-9 | Register
From Zero to Multi-Node GPU Programming | Six Half-Day | Online | 2025, Sep 8-9, 15-18 |
Fundamentals of Accelerated Computing with Modern CUDA C++ | Three Half-Day | Online | 2025, Sep 10-12 | Register
Accelerating CUDA C++ Applications with Multiple GPUs | Two Half-Day | Online | 2025, Sep 15-16 | Register
Scaling CUDA C++ Applications to Multiple Nodes | Two Half-Day | Online | 2025, Sep 17-18 | Register
C++ for Beginners | Six-Day | Online | 2025, Sep 18-19, 25-26, and Oct 1-2 | Register
Fundamentals of Accelerated Computing with CUDA Python | Full-Day | Online | 2025, Sep 29 | Register
Modern C++ Software Design | Three-Day | Online | 2025, Sep 30-Oct 2 | Register
Core-Level Performance Engineering | Full-Day | Online | 2025, Oct 6 | Register
GPU Performance Engineering | Three Half-Day | Online | 2025, Oct 8-10 | Register
Fundamentals of Accelerated Computing with OpenACC | Full-Day | Online | 2025, Oct 27 | Register
Fundamentals of Accelerated Computing with Modern CUDA C++ | Full-Day | Online | 2025, Oct 28 | Register
Fundamentals of Accelerated Computing with CUDA Python | Full-Day | Online | 2025, Oct 29 | Register
Accelerating CUDA C++ Applications with Multiple GPUs | Full-Day | Online | 2025, Oct 30 | Register
Node-Level Performance Engineering | Three-Day | LRZ | 2025, Dec 2-4 | Register

C++ Programming

C++ for Beginners

This course introduces the core features and syntax of C++, along with key principles, idioms, and best practices for professional software development. It is designed to help programmers write high-quality, maintainable code from the start.

Participants will learn how to develop robust, efficient, and mature C++ applications while avoiding common pitfalls. A basic understanding of programming in any language is assumed.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Six-Day Online 2025, Sep 18-19, 25-26, and Oct 1-2 registration
Past Events
Format Location Date Event In Collaboration with
Six-Day Online 2024, Sep 12-13, 19-20, and 26-27
Six-Day Online 2023, Sep 14-15, 21-22, and 28-29
Five-Day Online 2022, Oct 10-14

Modern C++ Software Design

This advanced course focuses on software development using the C++ programming language. It emphasizes essential principles, concepts, idioms, and best practices that enable developers to write professional, high-quality code.

Participants will gain insight into key C++ paradigms (object-oriented, functional, and generic programming) and learn guidelines for developing robust, efficient, maintainable, and mature C++ applications.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Three-Day Online 2025, Sep 30-Oct 2 registration
Past Events
Format Location Date Event In Collaboration with
Three-Day Online 2024, Sep 30-Oct 2
Three-Day Online 2023, Oct 11-13
Three-Day Online 2022, Oct 5-7

GPU Programming

Fundamentals of Accelerated Computing with Modern CUDA C++

By the end of the workshop, participants will understand the fundamental concepts and techniques for accelerating C++ code with CUDA. They will be able to write and compile code that runs on the GPU, optimize memory transfers between CPU and GPU, and leverage parallel algorithms to simplify adding GPU acceleration.

Additionally, participants will learn to implement custom parallel algorithms through CUDA kernels, utilize concurrent CUDA streams to overlap computation with memory operations, and identify the best opportunities to integrate CUDA acceleration into existing CPU-only applications.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Full-Day Online 2025, Oct 28 registration Part 2 of GPU Programming Workshop LRZ
Three Half-Day Online 2025, Sep 10-12 registration
Past Events
Format Location Date Event In Collaboration with
Full-Day Online 2025, May 27 NVIDIA DLI Virtual Workshops for Higher Education

Fundamentals of Accelerated Computing with CUDA C/C++

By the end of this workshop, participants will have a solid grasp of the essential tools and techniques for GPU-accelerating C/C++ applications using CUDA. They will be able to write GPU-executable code, leverage data parallelism, optimize memory transfers with asynchronous prefetching, and use both command-line and visual profilers to guide performance tuning. Additionally, they will know how to employ concurrent streams to increase parallelism and apply a profile-driven approach to develop or refactor CUDA applications for maximum performance.

Until March 2025, this course was offered as an official NVIDIA Deep Learning Institute (DLI) program. Its successor, Fundamentals of Accelerated Computing with Modern CUDA C++, is also offered by NHR@FAU. Due to the original course’s popularity, we continue to offer a custom, updated version that builds upon the original material.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Two Half-Day Online 2025, Sep 8-9 registration Part 1 of From Zero to Multi-Node GPU Programming NHR@TUD
Past Events
Format Location Date Event In Collaboration with
Full-Day Online 2025, Mar 12 Part 1 of From Zero to Multi-Node GPU Programming NHR@TUD
Full-Day Online 2025, Feb 4 Part 2 of GPU Programming Workshop LRZ
Full-Day Online 2024, Sep 18 Part 1 of From Zero to Multi-Node GPU Programming NHR@TUD
Two Half-Day Online 2024, Mar 4-5 EUMaster4HPC
Full-Day Online 2024, Feb 29
Full-Day NHR@FAU 2023, Jul 28
Full-Day NHR@FAU 2023, Mar 23
Two Half-Day Online 2023, Mar 8-9 EUMaster4HPC
Two Half-Day Online 2022, Dec 9 & 16 EUMaster4HPC
Full-Day Online 2022, Nov 28 LRZ
Two Half-Day Online 2022, Apr 21-22

Fundamentals of Accelerated Computing with CUDA Python

By the end of this workshop, participants will be proficient in the core tools and techniques for GPU-accelerating Python applications using CUDA and Numba. They will learn how to accelerate NumPy ufuncs on the GPU, configure parallel execution using CUDA’s thread hierarchy, implement custom device kernels for greater performance and flexibility, and optimize memory access through coalescing and shared memory to enhance kernel efficiency.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Full-Day Online 2025, Oct 29 registration Part 3 of GPU Programming Workshop LRZ
Full-Day Online 2025, Sep 29 registration
Past Events
Format Location Date Event In Collaboration with
Full-Day Online 2025, Apr 2 EUMaster4HPC
Full-Day Online 2025, Feb 5 Part 3 of GPU Programming Workshop LRZ
Full-Day Online 2025, Jan 16 NVIDIA DLI Virtual Workshops for Higher Education
Full-Day Online 2024, Oct 24 NVIDIA DLI Virtual Workshops for Higher Education
Full-Day Online 2024, Oct 7
Full-Day Online 2024, Mar 14
Two Half-Day Online 2024, Mar 6-7 EUMaster4HPC
Full-Day On-site 2023, Sep 18
Full-Day In-person 2023, Mar 16
Two Half-Day Online 2022, Sep 22-23
Two Half-Day Online 2022, Aug 2-3

Fundamentals of Accelerated Computing with OpenACC

By the end of this workshop, participants will have a foundational understanding of OpenACC, a high-level programming model for parallel computing on CPUs and GPUs. The workshop covers profiling and optimizing applications to identify performance hotspots, using OpenACC directives to offload computations to the GPU, and improving data movement between the CPU and GPU to maximize efficiency.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Upcoming Events
Format Location Date Registration Event In Collaboration with
TBA Spring 2026
Full-Day Online 2025, Oct 27 registration Part 1 of GPU Programming Workshop LRZ
Past Events
Format Location Date Event In Collaboration with
Full-Day Online 2025, Apr 16 EUMaster4HPC
Full-Day Online 2025, Feb 3 Part 1 of GPU Programming Workshop LRZ

Accelerating CUDA C++ Applications with Multiple GPUs

This advanced course explores techniques for extending single-GPU applications to utilize multiple GPUs within a single compute node. It focuses on distributing workloads across multiple accelerators, optimizing performance through overlapping computation and data transfers, and using NVIDIA Nsight Systems to analyze execution behavior and identify performance bottlenecks.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Full-Day Online 2025, Oct 30 registration Part 4 of GPU Programming Workshop LRZ
Two Half-Day Online 2025, Sep 15-16 registration Part 2 of From Zero to Multi-Node GPU Programming NHR@TUD
Past Events
Format Location Date Event In Collaboration with
Full-Day Online 2025, Mar 19 Part 2 of From Zero to Multi-Node GPU Programming NHR@TUD, EUMaster4HPC
Full-Day Online 2025, Feb 6 Part 4 of GPU Programming Workshop LRZ
Full-Day Online 2024, Sep 25 Part 2 of From Zero to Multi-Node GPU Programming NHR@TUD
Full-Day Online 2024, Apr 5 Part 1 of Multi-GPU Programming with CUDA C++
Full-Day Online 2024, Feb 8

Scaling CUDA C++ Applications to Multiple Nodes

This advanced course covers multi-node programming techniques for GPU-accelerated applications and examines advanced examples, with a special emphasis on using MPI and NVSHMEM to distribute workloads efficiently.

Additional information is available on the Nvidia DLI course homepage and the NHR@FAU course homepage.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Two Half-Day Online 2025, Sep 17-18 registration Part 3 of From Zero to Multi-Node GPU Programming NHR@TUD
Past Events
Format Location Date Event In Collaboration with
Full-Day Online 2025, Mar 26 Part 3 of From Zero to Multi-Node GPU Programming NHR@TUD, EUMaster4HPC
Full-Day Online 2024, Oct 2 Part 3 of From Zero to Multi-Node GPU Programming NHR@TUD
Full-Day Online 2024, Apr 10 Part 2 of Multi-GPU Programming with CUDA C++
Full-Day Online 2024, Feb 9

GPU Performance Engineering

Porting code to the GPU can yield significant speedups but often presents challenges. This advanced course introduces NVIDIA’s profiling tools to identify common performance issues during the porting process. Performance analysis is guided by straightforward, resource-based models that help developers evaluate how close their code is to the optimal performance target.

The course was previously called Performance Analysis on GPUs with NVIDIA Tools and was restructured and extended at the beginning of 2025. We offer a comprehensive GPU Performance Engineering course, along with a condensed GPU Performance Analysis module that can be incorporated into larger events.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Three Half-Day Online 2025, Oct 8-10 registration
GPU Performance Analysis Lisbon, Portugal 2025, Jul 6-11 International HPC Summer School (IHPCSS)
Past Events
Format Location Date Event In Collaboration with
Full-Day Online 2025, Apr 11
Half-Day Online 2024, Oct 9
GPU Performance Analysis Kobe, Japan 2024, Jul 7-12 GPU Performance Analysis. Tutorial at the International HPC Summer School (IHPCSS)
Half-Day Online 2024, Mar 19
Half-Day Online 2023, Oct 10
GPU Performance Analysis Atlanta, GA, USA 2023, Jul 9-14 GPU Performance Analysis. Lecture at the International HPC Summer School (IHPCSS)
Half-Day Online 2023, Apr 4
Half-Day Online 2022, Sep 29
GPU Performance Analysis Online 2022, Jun 19-24 International HPC Summer School (IHPCSS)
GPU Performance Analysis Online 2021, Jul 18-30 International HPC Summer School (IHPCSS)

From Zero to Multi-Node GPU Programming

This workshop series bundles three of our most popular GPU programming courses: Fundamentals of Accelerated Computing with CUDA C/C++, Accelerating CUDA C++ Applications with Multiple GPUs, and Scaling CUDA C++ Applications to Multiple Nodes. Their delivery is augmented with additional material connecting the individual courses, their key concepts, and the overall workflow of GPU-accelerated applications.

Please register for each part you want to attend separately.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Six Half-Day Online 2025, Sep 8-9, 15-18 NHR@TUD
Past Events
Format Location Date Event In Collaboration with
Three-Day Online 2025, Mar 12, 19, 26 NHR@TUD
Three-Day Online 2024, Sep 18, Sep 25, Oct 2 NHR@TUD

Choosing GPU Programming Approaches

This course provides an overview of the most common GPU programming approaches, including CUDA/HIP, SYCL, modern C++, Thrust, OpenACC, OpenMP, and Kokkos. It helps participants understand the strengths and weaknesses of each approach, enabling them to make informed decisions about which one to use for their specific applications.

Participants will get the most out of this course if they already have experience with at least one GPU programming approach, but participation without any prior knowledge is also possible.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Two Half-Day Online 2025, Sep 4-5 registration

Parallel Programming

Parallel Programming of High-Performance Systems (PPHPS)

This long-standing course is a collaboration between the Erlangen National High Performance Computing Center (NHR@FAU) and the Leibniz Supercomputing Centre (LRZ). It is designed for students and researchers interested in programming modern HPC hardware, with a focus on large-scale parallel computing systems available in Jülich, Stuttgart, and Munich, as well as smaller clusters at Tier-2/3 centers and departmental facilities.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
TBA TBA Spring 2026 LRZ
Past Events
Format Location Date Event In Collaboration with
Three-Day LRZ 2025, Feb 18-20 PPHPS25 LRZ
Three-Day NHR@FAU 2024, Feb 20-22 PPHPS24 LRZ
Three-Day Online 2023, Mar 7-9 PPHPS23 LRZ
Three-Day Online 2022, Mar 8-10 PPHPS22 LRZ
Three-Day Online 2021, Apr 13-15 PPHPS21 LRZ
Four-Day FAU 2020, Mar 9-13 LRZ

This course provides an introduction to the Message Passing Interface (MPI), the dominant distributed-memory programming paradigm in High Performance Computing.

Upcoming Events
Format Location Date Registration Event In Collaboration with
TBA Spring 2026
Past Events
Format Location Date Event In Collaboration with
Two-Day Online 2025, Apr 9-10
Two-Day Online 2024, Apr 11-12

Introduction to OpenMP

OpenMP is a widely supported standard for parallelizing shared-memory C/C++ and Fortran applications. It offers a simple, low-barrier entry to thread-based parallelization. This course introduces the fundamental concepts and constructs of OpenMP, as well as advanced topics like tasking and accelerator offloading.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
TBA Spring 2026
Past Events
Format Location Date Event In Collaboration with
Three-Day Online 2025, Feb 26-28
Three Half-Day Online 2024, Sep 4-6
Part 1 Online 2024, Mar 12
Part 2 Online 2024, Mar 5
Part 2 Online 2023, Sep 27
Part 1 Online 2023, Sep 20
Part 2 Online 2023, Mar 28
Part 1 Online 2023, Mar 21
Full-Day Online 2022, Oct 4

Hybrid Programming in HPC - MPI+X

Most HPC systems consist of clusters of shared-memory nodes. Efficient use of such systems requires optimizing both memory consumption and communication time. Hybrid programming combines distributed-memory parallelization across nodes (e.g., using MPI) with shared-memory parallelization within each node (e.g., using OpenMP or MPI-3.0 shared memory).

This course examines the strengths and weaknesses of various parallel programming models on clusters of shared-memory nodes, with special focus on multi-socket, multi-core systems in highly parallel environments. MPI-3.0 introduces a shared memory programming interface that complements inter-node MPI communication. This interface supports direct neighbor accesses, similar to OpenMP, and enables direct halo copies, paving the way for innovative hybrid programming models. These models are compared against hybrid MPI+OpenMP approaches and pure MPI implementations. Additionally, the course covers MPI+OpenMP offloading with GPUs. Through numerous case studies and micro-benchmarks, the course highlights performance aspects of hybrid programming. Hands-on sessions are included daily. Tools for hybrid programming such as thread and process placement support and performance analysis are demonstrated in practical “how-to” sections.

This course is a joint training event of EuroCC@GCS and EuroCC-Austria, the German and Austrian National Competence Centres for High-Performance Computing. It is organized by the HLRS in cooperation with the VSC Research Center at TU Wien and NHR@FAU.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
TBA 2026
Past Events
Format Location Date Event In Collaboration with
Three-Day Hybrid @ HLRS 2025, Jan 21-23 HLRS, VSC
Three-Day Hybrid @ HLRS 2024, Jan 23-25 HLRS, VSC
Three-Day Online @ VSC 2022, Dec 12-14 PRACE, HLRS, VSC
Three-Day Online @ LRZ 2022, Jun 22-24 PRACE, HLRS, VSC
Three-Day Online @ VSC 2022, Apr 5-7 PRACE, HLRS, VSC
Three-Day Online @ VSC 2021, Jun 15-17 HLRS, VSC
Three-Day Online @ VSC 2020, Jun 17-19 HLRS, VSC
Two-Day HLRS 2020, Jan 27-28 HLRS, VSC

Performance Engineering

Core-Level Performance Engineering

This course covers performance engineering approaches at the CPU core level. While many developers put a lot of effort into optimizing parallelism, they often lose sight of the importance of efficient serial code. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys a thorough understanding of the interactions between software and hardware at the level of a single CPU core and the lowest memory hierarchy level, the L1 cache. It covers general computer architecture for x86 and ARM processors, an introduction to assembly code (AT&T syntax for x86, and AArch64), and performance analysis and engineering using the Open Source Architecture Code Analyzer (OSACA) tool in combination with the Compiler Explorer.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Full-Day Online 2025, Oct 6 registration
Past Events
Format Location Date Event In Collaboration with
Half-Day Hamburg, Germany 2025, Jun 10-13 ISC High Performance
Half-Day Atlanta, GA, USA 2024, Nov 17-22 SC24
Full-Day NHR@FAU 2024, Oct 8
Full-Day Ostrava, Czech Republic 2024, Sep 8-11 PPAM 2024
Full-Day Vienna, Austria 2023, Oct 21-25 PACT 2023
Full-Day NHR@FAU 2023, Oct 12
Full-Day Coimbra, Portugal 2023, Apr 15-19 ICPE 2023

Node-Level Performance Engineering

This course covers performance engineering approaches at the compute node level. Even application developers who are fluent in OpenMP and MPI often lack a good grasp of how much performance their code could achieve at best. This is because parallelism takes us only halfway to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the knowledge required to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed.

We introduce the basic architectural features and bottlenecks of modern processors and compute nodes, covering pipelining, SIMD, superscalarity, caches, memory interfaces, ccNUMA, and more. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in due detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations with a scientific process.
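As a small illustration of the Roofline model mentioned above, the following Python sketch computes the naive Roofline bound, i.e. the minimum of the peak compute performance and the product of memory bandwidth and computational intensity. It is not taken from the course material; the function name `roofline_gflops` and the machine numbers (1000 GFlop/s peak, 100 GB/s memory bandwidth) are hypothetical values chosen only for illustration.

```python
def roofline_gflops(peak_gflops, mem_bw_gbs, intensity_flop_per_byte):
    """Naive Roofline bound in GFlop/s: min(peak, bandwidth * intensity)."""
    return min(peak_gflops, mem_bw_gbs * intensity_flop_per_byte)


# Hypothetical machine parameters (assumptions, not measurements).
PEAK_GFLOPS = 1000.0  # peak floating-point performance in GFlop/s
MEM_BW_GBS = 100.0    # sustained memory bandwidth in GB/s

# Vector triad a[i] = b[i] + s * c[i] in double precision:
# 2 flops per iteration and 3 * 8 = 24 bytes of memory traffic
# (write-allocate traffic is ignored in this simplified count).
triad_intensity = 2.0 / 24.0

triad_bound = roofline_gflops(PEAK_GFLOPS, MEM_BW_GBS, triad_intensity)

# Intensity at which the kernel stops being memory bound on this machine.
ridge_point = PEAK_GFLOPS / MEM_BW_GBS

print(f"triad bound: {triad_bound:.2f} GFlop/s (ridge at {ridge_point:.1f} flop/byte)")
```

With these assumed numbers, the triad sits far to the left of the ridge point, so the bound is set by memory bandwidth rather than by the peak performance of the cores.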

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Three-Day LRZ 2025, Dec 2-4 registration
Past Events
Format Location Date Event In Collaboration with
Four-Day HLRS 2025, Jun 3-6 ZIH (TU Dresden)
Three-Day LRZ 2024, Dec 3-5
Four-Day HLRS 2024, Jun 18-21 ZIH (TU Dresden)
Three-Day LRZ 2023, Dec 4-6
Full-Day Denver, CO, USA 2023, Nov 12-17 SC23
Three-Day NHR@FAU 2023, Oct 4-6
Half-Day Hamburg, Germany 2023, May 11 ISC High Performance
Four-Day HLRS 2023, Jun 27-30 ZIH (TU Dresden)
Three-Day LRZ 2022, Dec 5-7 PRACE
Full-Day Dallas, TX, USA 2022, Nov 13-18 SC22
Four-Day HLRS 2022, Jun 28-Jul 1 PRACE, ZIH (TU Dresden)

Performance Engineering for Linear Solvers

This tutorial covers code analysis, performance modeling, and optimization for linear solvers on CPU and GPU nodes. Performance engineering is often taught using simple loops as instructive examples for performance models and how they can guide optimization; however, full, preconditioned linear solvers comprise multiple back-to-back loops enclosed in an iteration scheme that is executed until convergence is achieved. Consequently, the concept of “optimal performance” has to account for both hardware resource efficiency and iterative solver convergence. We convey a performance engineering process that is geared towards linear iterative solvers. After introducing basic notions of hardware organization and storage for dense and sparse data structures, we show how the Roofline performance model can be applied to such solvers in predictive and diagnostic ways and how it can be used to assess the hardware efficiency of a solver, covering important corner cases such as pure memory boundedness. Then we advance to the structure of preconditioned solvers, using the Conjugate Gradient (CG) method as a leading example. Hotspots and bottlenecks of the complete solver are identified, followed by the introduction of advanced performance optimization techniques like preconditioning and cache blocking.
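To make the phrase "multiple back-to-back loops enclosed in an iteration scheme" concrete, here is a minimal sketch of the plain (unpreconditioned) Conjugate Gradient method in pure Python. It is an illustration only, not the tutorial's code: the helper names `dot`, `matvec`, and `conjugate_gradient` are our own, the matrix is a small dense example, and a real solver would use sparse storage and a preconditioner.

```python
def dot(x, y):
    """Dot product: one streaming loop over two vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product; sparse formats would be used in practice."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the zero start vector
    p = list(r)              # initial search direction
    rs_old = dot(r, r)
    if rs_old ** 0.5 < tol:  # right-hand side is (numerically) zero
        return x, 0
    iterations = 0
    for iterations in range(1, max_iter + 1):
        Ap = matvec(A, p)                                  # loop 1: matrix-vector product
        alpha = rs_old / dot(p, Ap)                        # loop 2: dot product
        x = [xi + alpha * pi for xi, pi in zip(x, p)]      # loop 3: vector update
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # loop 4: vector update
        rs_new = dot(r, r)                                 # loop 5: dot product
        if rs_new ** 0.5 < tol:
            break
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]       # loop 6: vector update
        rs_old = rs_new
    return x, iterations


# Small symmetric positive definite test system (values chosen for illustration).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution, iterations = conjugate_gradient(A, b)
print(solution, iterations)
```

Each iteration consists of one matrix-vector product, two dot products, and three vector updates, so a performance model has to consider the whole sequence of loops together with the number of iterations needed for convergence.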

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Past Events
Format Location Date Event In Collaboration with
Half-Day Hamburg, Germany 2025, Jun 10-13 ISC High Performance TU Delft, TU Munich
Half-Day Atlanta, GA, USA 2024, Nov 17-22 SC24 TU Delft, TU Munich
Half-Day Hamburg, Germany 2024, May 12-16 ISC High Performance TU Delft, TU Munich

Introduction to the LIKWID Tool Suite

LIKWID stands for “Like I Knew What I’m Doing.” It is an easy-to-use yet powerful command-line performance tool suite for the GNU/Linux operating system. While the focus of LIKWID is on x86 processors, some of the tools are portable and not limited to any specific architecture. For the upcoming release, LIKWID has been ported to ARMv7/v8 and POWER8/9 architectures as well as to NVIDIA GPU coprocessors.

Additional information such as learning objectives, prerequisites, certification and more can be found on the course page.

Upcoming Events
Format Location Date Registration Event In Collaboration with
Full-Day Online 2025, Jul 31 registration
Past Events
Format Location Date Event In Collaboration with
Full-Day Online 2024, Jul 23
Full-Day Online 2023, Jul 24
Webinar Stony Brook University 2021, Jul 27 LIKWID, OSACA, and Sparse MVM on A64FX (video)
Webinar Online 2021, Jun 2 Using the LIKWID and OSACA tools on A64FX (video)

Molecular Dynamics Simulations

This course introduces the molecular dynamics engine GROMACS, including fundamental commands and applications. Over five days, participants will learn how to prepare and run simulations of biomolecular systems (e.g., membranes and proteins) at atomistic and coarse-grained levels of resolution. Post-processing and analysis of simulation trajectories are a large part of the tutorial.

The course is usually embedded in the Bachelor programs of Biology and Integrated Life Sciences. There are five places available for people from NHR. The course will be held in person and takes place in the CIP of the Biology Department.

Interested candidates should send a short note about their background and motivation to rainer.boeckmann@fau.de.

Upcoming Events
Format Location Date Registration Event In Collaboration with
TBA TBA
Past Events
Format Location Date Event In Collaboration with
Three-Day Department of Biology, FAU 2023, Oct 10-12
Three-Day Department of Biology, FAU 2022, Dec 12-16

This course provides a short introduction to the AMBER molecular dynamics simulation suite: general workflow, system setup, simulation on NHR@FAU cluster systems (including GPU acceleration), and common analysis tasks. The following topics are covered:

1. System Setup: Model building (structure, protonation states, choice of force field/parameters), solvation + simulation box, constraints, minimisation/relaxation

2. Simulation: Heating, equilibration, production run

3. Analysis: Imaging, RMSD and fluctuations, time series of quantities (e.g. distances), probabilities (hydrogen bonds)
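As a rough illustration of the RMSD analysis named in the topic list, the following Python sketch computes the root-mean-square deviation between two sets of atomic coordinates. It is a simplified stand-in and not AMBER's implementation: the function name `rmsd` is our own, the coordinates are made up, and no structural superposition (fitting) is performed before the distances are measured, which analysis tools typically do.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) tuples.

    No superposition is applied; the structures are compared as given.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Made-up coordinates: the second frame is the first one shifted by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(reference, frame)
print(value)
```

Applied frame by frame along a trajectory, such a value yields the kind of time series that the analysis part of the course refers to.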

Upcoming Events
Format Location Date Registration Event In Collaboration with
TBA TBA

Intermittent Course Offerings

This workshop, organized by VI-HPS and the Erlangen National High Performance Computing Center, gives an overview of the VI-HPS programming tools suite, explains the functionality of the individual tools and how to use them effectively, and offers hands-on experience and expert assistance in using the tools.

On completion, participants should be familiar with common performance analysis and diagnosis techniques and how they can be employed in practice (on a range of HPC systems). Those who prepared their own application test cases will have been coached in the tuning of their measurement and analysis, and provided with optimization suggestions.

Past Events
Format Location Date Event In Collaboration with
Three-Day NHR@FAU 2021, Mar 1-3 38th VI-HPS Tuning Workshop
Three-Day CSC Frankfurt 2020, Dec 7-11 37th VI-HPS Tuning Workshop
Three-Day CINECA, Italy 2020, Sep 30-Oct 2 36th VI-HPS Tuning Workshop

Past Events
Format Location Date Event In Collaboration with
Contributed Session Online 2021, Jul 15 2021 Code Performance Series: From analysis to insight
Contributed Session Online 2021, Feb 23 EXA2PRO-EoCoE joint workshop

The Python programming language has become very popular in scientific computing for various reasons. Users not only implement prototypes for small-scale numerical experiments but also develop parallel production codes, thereby partly replacing compiled languages such as C, C++, and Fortran. However, when following this approach, it is crucial to pay special attention to performance. This course teaches approaches to using Python efficiently and appropriately in an HPC environment. The first lecture gives a whirlwind tour of the Python programming language and the standard library. The subsequent lectures focus strongly on performance-related topics such as NumPy, Cython, Numba, compiled C and Fortran extensions, profiling of Python and compiled code, parallelism using multiprocessing and mpi4py, parallel frameworks such as Dask, and efficient I/O with HDF5. In addition, we cover topics more closely related to software engineering, such as packaging, publishing, testing, and the semi-automated generation of documentation. Finally, basic visualization tasks using matplotlib and similar packages are discussed.

Past Events
Format Location Date Event In Collaboration with
Three-Day Online 2023, Jul 25-27 MPCDF
Erlangen National High Performance Computing Center (NHR@FAU)
Martensstraße 1
91058 Erlangen
Germany