The GPU programming landscape has grown from a single dominant framework (CUDA) into a diverse ecosystem of vendor-neutral and performance-portable alternatives. Choosing the right approach for a given application – considering hardware targets, portability requirements, team expertise, and performance goals – is a non-trivial decision. This course surveys the most widely used GPU programming models: CUDA/HIP, SYCL, modern C++ parallel algorithms, Thrust, OpenACC, OpenMP offloading, and Kokkos. For each approach, participants see representative code, learn the key abstractions, and assess the trade-offs in portability, expressiveness, and performance.
Level: Intermediate
Language: English (German upon request for bespoke courses)
Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of European academic institutions).
Knowledge
- Familiarity with modern C++ programming (templates, lambdas, and the STL)
- Prior experience with at least one GPU programming approach is recommended but not required
Technical
- A modern web browser (exercises run on NHR@FAU’s HPC clusters via JupyterHub – no local installation required)
After completing this course, you will be able to:
- Describe the key abstractions and execution model of each major GPU programming framework: CUDA/HIP, SYCL, OpenACC, OpenMP offloading, Kokkos, Thrust, and standard C++ parallel algorithms
- Implement a representative kernel in multiple frameworks and compare the resulting code
- Evaluate each approach across the dimensions of portability, performance, and programming effort
- Select the most appropriate GPU programming model for a given combination of hardware targets and application requirements
- Profile a GPU application with Nsight Systems and Nsight Compute and relate observed performance to the choice of approach
- Identify NHR@FAU courses and resources for deepening expertise in any specific framework
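To illustrate the kind of representative kernel these goals refer to, here is the STREAM triad, `a[i] = b[i] + scalar * c[i]`, written with OpenMP target offloading. It is a minimal sketch, not the course's exercise code. The function name `stream_triad` is hypothetical, and the sketch assumes a compiler with OpenMP offloading enabled (for example `nvc++ -mp=gpu` or `clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`). If offloading support is missing, the directive is ignored or the region runs on the host, and the result is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// STREAM "triad" kernel: a[i] = b[i] + scalar * c[i].
// The OpenMP directive asks the compiler to offload the loop to a device:
//   target                        -> run the region on the default device
//   teams distribute parallel for -> spread iterations over teams and threads
//   map(to: ...) / map(from: ...) -> copy b and c to the device, a back to the host
// Without offload support (or without -fopenmp) the pragma is ignored or the
// region falls back to the host, so the result is the same either way.
std::vector<double> stream_triad(const std::vector<double>& b,
                                 const std::vector<double>& c,
                                 double scalar) {
    assert(b.size() == c.size());
    const std::size_t n = b.size();
    std::vector<double> a(n);

    // OpenMP array sections are expressed on plain pointers.
    double* pa = a.data();
    const double* pb = b.data();
    const double* pc = c.data();

    #pragma omp target teams distribute parallel for \
        map(to: pb[0:n], pc[0:n]) map(from: pa[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        pa[i] = pb[i] + scalar * pc[i];
    }
    return a;
}
```

Porting the same loop to CUDA, SYCL, or Kokkos mostly changes how the parallel loop and the host-device data transfers are expressed. The arithmetic stays the same, which makes such a kernel convenient for side-by-side comparison.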
- GPU programming landscape: hardware diversity, portability challenges, and framework taxonomy
- Low-level, vendor-specific approaches: CUDA and HIP
- Open standard directives: OpenACC and OpenMP target offloading
- Performance portability libraries: Kokkos
- C++ abstraction layers: SYCL, Thrust, and standard library parallel algorithms
- Performance analysis: profiling with Nsight Systems and Nsight Compute, and common optimization patterns
- Hands-on programming challenge: porting STREAM, a 2D stencil, and a conjugate-gradient solver across multiple approaches
- Comparative evaluation: portability, performance, and practical considerations for framework selection
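A 2D stencil is a step up from STREAM because each output point reads several neighboring inputs. The minimal sketch below performs one Jacobi sweep of a 5-point stencil with OpenMP target offloading. The function name `jacobi_sweep`, the row-major layout, and the fixed-boundary treatment are illustrative assumptions, not the course's exercise code. As above, it assumes a compiler with OpenMP offloading enabled and otherwise runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One Jacobi sweep of a 5-point 2D stencil on an nx-by-ny grid stored
// row-major (index k = j * nx + i). Interior points become the average of
// their four neighbors; boundary values are copied unchanged.
// collapse(2) merges the two nested loops into one iteration space so that
// enough parallelism is exposed to occupy a GPU.
std::vector<double> jacobi_sweep(const std::vector<double>& in,
                                 std::size_t nx, std::size_t ny) {
    assert(in.size() == nx * ny);
    std::vector<double> out(in);       // boundary values carried over
    if (nx < 3 || ny < 3) return out;  // grid has no interior points

    const std::size_t n = nx * ny;
    const double* u = in.data();
    double* v = out.data();

    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: u[0:n]) map(tofrom: v[0:n])
    for (std::size_t j = 1; j < ny - 1; ++j) {
        for (std::size_t i = 1; i < nx - 1; ++i) {
            const std::size_t k = j * nx + i;
            v[k] = 0.25 * (u[k - 1] + u[k + 1] + u[k - nx] + u[k + nx]);
        }
    }
    return out;
}
```

The output uses `map(tofrom: ...)` rather than `map(from: ...)` so that the boundary values copied into `out` on the host are also present on the device. With `map(from: ...)`, uninitialized device memory would overwrite them when the array is copied back.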
- 2026, Nov 9-10: online course, two half days (Register)
- 2026, Mar 4-5: online course, two half days
- 2025, Sep 4: full-day online course
For an overview of all NHR@FAU courses, visit the course overview page.