Choosing GPU Programming Approaches

The GPU programming landscape has grown from a single dominant framework (CUDA) into a diverse ecosystem of vendor-neutral and performance-portable alternatives. Choosing the right approach for a given application – considering hardware targets, portability requirements, team expertise, and performance goals – is a non-trivial decision. This course surveys the most widely used GPU programming models: CUDA/HIP, SYCL, modern C++ parallel algorithms, Thrust, OpenACC, OpenMP offloading, and Kokkos. For each approach, participants see representative code, learn the key abstractions, and assess the trade-offs in portability, expressiveness, and performance.

Course Details

Level: Intermediate

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).

Prerequisites

Knowledge

Familiarity with modern C++ programming (templates, lambdas, and the STL)
Prior experience with at least one GPU programming approach is recommended but not required

Technical

A modern web browser (exercises run on NHR@FAU’s HPC clusters via JupyterHub – no local installation required)

Learning Outcomes

After completing this course, you will be able to:

Describe the key abstractions and execution model of each major GPU programming framework: CUDA/HIP, SYCL, OpenACC, OpenMP offloading, Kokkos, Thrust, and standard C++ parallel algorithms
Implement a representative kernel in multiple frameworks and compare the resulting code
Evaluate each approach across the dimensions of portability, performance, and programming effort
Select the most appropriate GPU programming model for a given combination of hardware targets and application requirements
Profile a GPU application with Nsight Systems and Nsight Compute and relate observed performance to the choice of approach
Identify NHR@FAU courses and resources for deepening expertise in any specific framework

Course Outline

GPU programming landscape: hardware diversity, portability challenges, and framework taxonomy
Low-level, vendor-specific approaches: CUDA and HIP
Open standard directives: OpenACC and OpenMP target offloading
Performance portability libraries: Kokkos
C++ abstraction layers: SYCL, Thrust, and standard library parallel algorithms
Performance analysis: profiling with Nsight Systems and Nsight Compute, and common optimization patterns
Hands-on programming challenge: porting STREAM, a 2D stencil, and a conjugate-gradient solver across multiple approaches
Comparative evaluation: portability, performance, and practical considerations for framework selection

Upcoming Events

2026, Nov 9-10: online course, two half days

Past Events

2026, Mar 4-5: online course, two half days
2025, Sep 4-5: online course, two half days

For an overview of all NHR@FAU courses, visit the course overview page.
