Introduction to CUDA C/C++

CUDA is NVIDIA’s parallel computing platform for GPU-accelerated C/C++ applications. The course covers the full arc from writing and running a first GPU kernel, through systematically porting existing CPU code to the GPU, to understanding the execution model and optimizing with shared memory, atomics, and reductions. Short teaching segments alternate with exercises in interactive Jupyter notebooks; optional modules extend the course to debugging techniques and CUDA streams.

This course was developed as an updated replacement for the discontinued NVIDIA DLI course Fundamentals of Accelerated Computing with CUDA C/C++. NVIDIA DLI’s own successor in this space, Fundamentals of Accelerated Computing with Modern CUDA C++, is also offered by NHR@FAU and covers similar fundamentals, but with a strong focus on modern C++ and Thrust.

Course Details

Level: Beginner

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).

Prerequisites

Knowledge

C or C++ programming experience, including variables, loops, conditionals, functions, and arrays

Technical

A modern web browser (for JupyterHub access to NHR@FAU’s HPC clusters)
A local installation of NVIDIA Nsight Systems (no local GPU required)

Learning Outcomes

After completing this course, you will be able to:

Write, compile, and run GPU-accelerated C/C++ code using CUDA
Map computations to CUDA’s thread hierarchy and launch kernels with suitable execution configurations
Systematically port existing CPU applications to the GPU following a structured porting workflow
Understand the GPU execution model and its implications for choosing thread block sizes and grid dimensions
Use shared memory and atomic operations, and optimize reduction patterns
Profile GPU applications with NVIDIA Nsight Systems to identify and address performance bottlenecks
[Optional] Use CUDA streams for concurrent kernel execution and copy/compute overlap

Course Outline

Introduction: GPU computing overview, course tooling, and interactive Jupyter notebook workflow
First GPU application: CUDA kernel syntax, compilation, and the thread hierarchy
Porting applications to GPU: systematic porting approach and GPU memory management
GPU architecture: the thread execution model and selecting execution configurations
Reductions, atomics, and shared memory: avoiding race conditions and optimizing reduction patterns on GPUs
[Optional] Debugging CUDA applications: techniques for debugging GPU applications and writing robust code
[Optional] CUDA streams: concurrent kernel execution and copy/compute overlap

Upcoming Events

2026, Sep 3-4: online course held over two half days in collaboration with NHR@TUD (Register); part 1 of From Zero to Multi-Node GPU Programming

Past Events

2026, Jun 10-11: one-and-a-half-day online course in collaboration with LRZ; part 3 of GPU Programming Workshop
2026, Mar 9: full-day online course in collaboration with NHR@TUD; part 1 of From Zero to Multi-Node GPU Programming

For an overview of all NHR@FAU courses, visit the course overview page.
