GPU Performance Analysis

This condensed module covers the essentials of GPU performance analysis in approximately two hours. It introduces NVIDIA Nsight Systems and Nsight Compute alongside resource-based performance models, giving participants the conceptual and practical foundation needed to identify GPU bottlenecks and reason about optimization potential. The module has been delivered annually at the International HPC Summer School (IHPCSS) since 2019.

For a comprehensive treatment, see the full-length GPU Performance Engineering course.

Level: Intermediate

Language: English

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).

Prerequisite Knowledge

  • Experience with GPU programming in CUDA or OpenMP offloading using C/C++

Technical Requirements

  • Optional: a local installation of NVIDIA Nsight Systems and Nsight Compute (no local GPU required) for participants who wish to follow along when exploring the provided profile files

After completing this module, you will be able to:

  • Understand how GPU resource limits translate into theoretical performance ceilings
  • Use Nsight Systems and Nsight Compute to identify the dominant bottleneck of a GPU kernel
  • Apply the roofline model to quantify the gap to peak performance
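As an illustration of the last outcome, the roofline model bounds attainable performance by the minimum of the compute peak and the product of memory bandwidth and arithmetic intensity. The following is a minimal sketch only: the device limits, the kernel's arithmetic intensity, and the measured performance are hypothetical values chosen for readability, not figures from the module or from any real GPU.

```python
def roofline_ceiling(peak_flops, peak_bandwidth, intensity):
    """Return the attainable FLOP/s under the basic roofline model.

    peak_flops:     compute peak in FLOP/s
    peak_bandwidth: memory bandwidth in byte/s
    intensity:      arithmetic intensity of the kernel in FLOP/byte
    """
    return min(peak_flops, peak_bandwidth * intensity)


# Hypothetical device limits (assumed for illustration, not a real GPU).
PEAK_FLOPS = 10.0e12      # 10 TFLOP/s
PEAK_BANDWIDTH = 1.0e12   # 1 TB/s

# Hypothetical kernel: 2 FLOP per 16 bytes of memory traffic.
intensity = 2.0 / 16.0    # FLOP/byte

# Hypothetical measured performance, e.g. derived from profiler counters.
measured_flops = 100.0e9  # 100 GFLOP/s

ceiling = roofline_ceiling(PEAK_FLOPS, PEAK_BANDWIDTH, intensity)

# The kernel is memory bound if the bandwidth roof lies below the compute roof.
bound = "memory" if PEAK_BANDWIDTH * intensity < PEAK_FLOPS else "compute"

# Fraction of the roofline ceiling that the kernel actually reaches.
fraction = measured_flops / ceiling

print(f"ceiling: {ceiling / 1e9:.1f} GFLOP/s ({bound} bound)")
print(f"measured: {measured_flops / 1e9:.1f} GFLOP/s ({fraction:.0%} of ceiling)")
```

With these assumed numbers the bandwidth roof (125 GFLOP/s) lies far below the compute roof, so the gap to close is the remaining 20% of the memory-bound ceiling rather than the distance to the compute peak.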

Topics covered:

  • GPU architecture and resource-based performance models (roofline)
  • Timeline analysis with Nsight Systems
  • Kernel profiling with Nsight Compute

For an overview of all NHR@FAU courses, visit the course overview page.