GPU Performance Engineering

Course Description

Porting code to the GPU can yield significant speedups but often presents challenges. This advanced course introduces NVIDIA’s profiling tools to identify common performance issues during the porting process. Performance analysis is guided by straightforward, resource-based models that help developers evaluate how close their code is to the optimal performance target.

The course was restructured and extended at the beginning of 2025.

We offer a comprehensive GPU Performance Engineering course, along with a condensed GPU Performance Analysis module that can be incorporated into larger events.

Learning Objectives

This course focuses on assessing the performance of GPU-accelerated applications with NVIDIA’s profiling tools. Topics include:

GPU architecture review
Using NVTX markers to instrument GPU-accelerated applications
The Nsight Systems command line interface for summarizing application-level behavior
The Nsight Systems GUI for visualizing a timeline of the entire application
The Nsight Compute command line interface for focusing on performance aspects of individual kernels
The Nsight Compute GUI for obtaining a comprehensive view of kernel performance

Participants will follow live demonstrations and conduct hands-on exercises using the NHR@FAU clusters, gaining practical experience to reinforce the concepts learned.

Certification

A certificate of participation will be awarded to all participants who actively engage in the course.

Prerequisites

Participants should meet the following requirements:

A basic understanding of programming in C++
Experience with GPU programming using one or more of the following: CUDA, OpenMP, OpenACC
Familiarity with compiling applications using a command-line compiler

Upcoming Iterations and Additional Courses

You can find dates and registration links for this and other upcoming NHR@FAU courses at https://hpc.fau.de/teaching/tutorials-and-courses/.