Multi-GPU Programming with CUDA C++ (two-part online course)

This brand-new two-part course provides in-depth coverage of how to put multiple NVIDIA GPUs to work in your code.
Please note that successful participation in a CUDA introductory course such as “Fundamentals of Accelerated Computing with CUDA C/C++” is a prerequisite for benefiting from this course.
Course dates and times:
Part 1: Friday, April 5, 2024
Part 2: Wednesday, April 10, 2024
9:00 a.m. – 5:00 p.m.
Part 1 objectives:
  • Use concurrent CUDA streams to overlap memory transfers with GPU computation,
  • Utilize all GPUs on a single node to scale workloads across available GPUs,
  • Combine the use of copy/compute overlap with multiple GPUs, and
  • Use the NVIDIA Nsight Systems timeline to identify optimization opportunities and observe the impact of the techniques covered in the workshop.
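To give a flavor of the first objective, the following is a minimal sketch of copy/compute overlap with concurrent CUDA streams. The kernel name, chunk count, and problem size are illustrative, and error checking is omitted for brevity:

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: doubles every element of the array.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int N = 1 << 24, CHUNKS = 4, CHUNK = N / CHUNKS;
    float *h, *d;
    cudaMallocHost(&h, N * sizeof(float));  // pinned host memory enables async copies
    cudaMalloc(&d, N * sizeof(float));

    cudaStream_t streams[CHUNKS];
    for (int s = 0; s < CHUNKS; ++s) cudaStreamCreate(&streams[s]);

    for (int s = 0; s < CHUNKS; ++s) {
        int off = s * CHUNK;
        // H2D copy, kernel, and D2H copy are ordered within one stream;
        // chunks in different streams overlap transfers with computation.
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(d + off, CHUNK);
        cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < CHUNKS; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h); cudaFree(d);
    return 0;
}
```

The same chunking pattern extends to multiple GPUs by selecting a device with cudaSetDevice before creating each stream; the Nsight Systems timeline makes the resulting overlap directly visible.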
Part 2 objectives:
  • Use several methods for writing multi-GPU CUDA C++ applications,
  • Use a variety of multi-GPU communication patterns and understand their tradeoffs,
  • Write portable, scalable CUDA code with the single-program multiple data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM,
  • Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers, and
  • Get practice with common multi-GPU coding paradigms like domain decomposition and halo exchanges.
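As a taste of the SPMD paradigm from the second objective list, here is a hedged sketch of a one-dimensional halo exchange with CUDA-aware MPI: one MPI rank per GPU, with device pointers passed directly to MPI so the library moves data GPU-to-GPU. Array sizes, neighbor topology (periodic), and variable names are assumptions for illustration:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int ndev; cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);            // one GPU per MPI rank

    const int NX = 1024, HALO = 1;
    double *d_field;                       // local subdomain plus halo cells
    cudaMalloc(&d_field, (NX + 2 * HALO) * sizeof(double));

    int up   = (rank + size - 1) % size;   // periodic neighbor ranks
    int down = (rank + 1) % size;

    // CUDA-aware MPI accepts device pointers directly; the MPI library
    // handles the GPU-to-GPU transfer (e.g. via GPUDirect where available).
    MPI_Sendrecv(d_field + HALO,      HALO, MPI_DOUBLE, up,   0,
                 d_field + NX + HALO, HALO, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(d_field + NX,        HALO, MPI_DOUBLE, down, 1,
                 d_field,             HALO, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_field);
    MPI_Finalize();
    return 0;
}
```

With NVSHMEM the same exchange can instead be initiated from inside a kernel against symmetric memory, removing the host from the communication path; both approaches are covered in the course.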
Please see the course pages (separate pages for each day!) for more details and registration:
For a full overview of the NHR@FAU course program, please see

Dr. Georg Hager

Head of Training & Support

Erlangen National High Performance Computing Center (NHR@FAU)
Training & Support Division