Choosing GPU Programming Approaches

The GPU programming landscape has grown from a single dominant framework (CUDA) into a diverse ecosystem of vendor-neutral and performance-portable alternatives. Choosing the right approach for a given application – considering hardware targets, portability requirements, team expertise, and performance goals – is a non-trivial decision. This course surveys the most widely used GPU programming models: CUDA/HIP, SYCL, modern C++ parallel algorithms, Thrust, OpenACC, OpenMP offloading, and Kokkos. For each approach, participants see representative code, learn the key abstractions, and assess the trade-offs in portability, expressiveness, and performance.

Level: Intermediate

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for participants from European academic institutions).

Knowledge

  • Familiarity with modern C++ programming (templates, lambdas, and the STL)
  • Prior experience with at least one GPU programming approach is recommended but not required

Technical

  • A modern web browser (exercises run on NHR@FAU’s HPC clusters via JupyterHub – no local installation required)

After completing this course, you will be able to:

  • Describe the key abstractions and execution model of each major GPU programming framework: CUDA/HIP, SYCL, OpenACC, OpenMP offloading, Kokkos, Thrust, and standard C++ parallel algorithms
  • Implement a representative kernel in multiple frameworks and compare the resulting code
  • Evaluate each approach across the dimensions of portability, performance, and programming effort
  • Select the most appropriate GPU programming model for a given combination of hardware targets and application requirements
  • Profile a GPU application with Nsight Systems and Nsight Compute and relate observed performance to the choice of approach
  • Identify NHR@FAU courses and resources for deepening expertise in any specific framework

  • GPU programming landscape: hardware diversity, portability challenges, and framework taxonomy
  • Low-level, vendor-specific approaches: CUDA and HIP
  • Open standard directives: OpenACC and OpenMP target offloading
  • Performance portability libraries: Kokkos
  • C++ abstraction layers: SYCL, Thrust, and standard library parallel algorithms
  • Performance analysis: profiling with Nsight Systems and Nsight Compute, and common optimization patterns
  • Hands-on programming challenge: porting STREAM, a 2D stencil, and a conjugate-gradient solver across multiple approaches
  • Comparative evaluation: portability, performance, and practical considerations for framework selection
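To give a feel for what porting a kernel across approaches looks like, the following minimal sketch writes the STREAM triad (a[i] = b[i] + scalar * c[i]) in two of the styles listed above: an OpenMP target offloading directive and a standard C++ algorithm. It is an illustration only and is not taken from the course material; the function names triad_openmp and triad_stdalgo are hypothetical. Both variants also run on the host with an ordinary C++ compiler, in which case the OpenMP directive is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// STREAM triad: a[i] = b[i] + scalar * c[i]

// Variant 1: OpenMP target offloading (hypothetical helper name).
// When built with an offload-capable OpenMP compiler, the loop is mapped
// to the device; the map clauses describe the data transfers.
// Without OpenMP enabled, the pragma is ignored and the loop runs serially.
void triad_openmp(double* a, const double* b, const double* c,
                  double scalar, std::size_t n) {
    #pragma omp target teams distribute parallel for \
        map(from: a[0:n]) map(to: b[0:n], c[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] + scalar * c[i];
    }
}

// Variant 2: standard C++ algorithm (hypothetical helper name).
// This sequential form runs anywhere. Passing an execution policy such as
// std::execution::par_unseq as the first argument of std::transform lets
// compilers with standard-parallelism offload support (for example
// nvc++ with -stdpar=gpu) run it on a GPU.
void triad_stdalgo(std::vector<double>& a, const std::vector<double>& b,
                   const std::vector<double>& c, double scalar) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::transform(b.begin(), b.end(), c.begin(), a.begin(),
                   [scalar](double bi, double ci) { return bi + scalar * ci; });
}
```

The directive version keeps the original loop and annotates it, while the algorithm version expresses the same operation without an explicit loop or explicit data movement. This contrast in programming effort and control is the kind of trade-off the comparative evaluation considers.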

  • 2026, Nov 9-10: online course over two half days (Register)

  • 2026, Mar 4-5: online course over two half days
  • 2025, Sep 4: full-day online course

For an overview of all NHR@FAU courses, visit the course overview page.