Fundamentals of Accelerated Computing with CUDA C/C++

CUDA is NVIDIA’s parallel computing platform and programming model for GPU-accelerated C/C++ applications. This course introduces the essential tools and techniques for writing GPU-accelerated code: expressing data parallelism through CUDA’s thread hierarchy, managing CPU-GPU memory with unified memory and asynchronous prefetching, and using concurrent streams to increase throughput. No prior CUDA or GPU programming experience is required.

Further information about this tutorial can be found on the NVIDIA DLI course page.

This course was discontinued in 2025 and replaced by Fundamentals of Accelerated Computing with Modern CUDA C++. It is complemented by the NHR@FAU course Introduction to CUDA C/C++, which covers GPU programming from a lower-level perspective.

Course Details

Level: Beginner

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of European academic institutions).

Prerequisites

Knowledge

C or C++ programming experience, including variables, loops, conditionals, functions, and arrays

Technical

A free NVIDIA developer account
A local installation of NVIDIA Nsight Systems is recommended

Learning Outcomes

After completing this course, you will be able to:

Write, compile, and run GPU-accelerated C/C++ code using CUDA
Express data parallelism by mapping computation across CUDA thread hierarchies
Manage CPU-GPU memory using unified memory and asynchronous prefetching
Profile GPU applications with command-line and visual profilers to identify bottlenecks
Increase throughput by overlapping computation and data transfers with concurrent CUDA streams
Apply a profile-driven workflow to iteratively optimize CUDA applications for maximum performance

Course Outline

Accelerating applications with CUDA C/C++: GPU code compilation, parallel thread hierarchy, and GPU memory allocation
Managing accelerated application memory: command-line profiling, unified memory, and asynchronous optimization techniques
Asynchronous streaming and visual profiling: NVIDIA Nsight Systems, concurrent CUDA streams, and performance analysis

Past Events

2025, Sep 8-9: full-day online tutorial in collaboration with NHR@TUD; part 1 of From Zero to Multi-Node GPU Programming
2025, Mar 12: full-day online tutorial in collaboration with NHR@TUD; part 1 of From Zero to Multi-Node GPU Programming
2025, Feb 4: full-day online tutorial; part 2 of the GPU Programming Workshop
2024, Sep 18: full-day online tutorial in collaboration with NHR@TUD; part 1 of From Zero to Multi-Node GPU Programming
2024, Mar 4-5: online course in two half-day sessions, in collaboration with EUMaster4HPC
2024, Feb 29: full-day online course
2023, Jul 28: full-day on-site course at NHR@FAU
2023, Mar 23: full-day on-site course at NHR@FAU
2023, Mar 8-9: online course in two half-day sessions, in collaboration with EUMaster4HPC
2022, Dec 9, Dec 16: online course in two half-day sessions, in collaboration with EUMaster4HPC
2022, Nov 28: full-day online course in collaboration with LRZ
2022, Apr 21-22: online course in two half-day sessions

For an overview of all NHR@FAU courses, visit the course overview page.