CUDA is NVIDIA’s parallel computing platform and programming model for GPU-accelerated C/C++ applications. This course introduces the essential tools and techniques for writing GPU-accelerated code: expressing data parallelism through CUDA’s thread hierarchy, managing CPU-GPU memory with unified memory and asynchronous prefetching, and using concurrent streams to increase throughput. No prior CUDA or GPU programming experience is required.
Further information about this tutorial can be found on the NVIDIA DLI course page.
This course was discontinued in 2025 and replaced by Fundamentals of Accelerated Computing with Modern CUDA C++. It is complemented by the NHR@FAU course Introduction to CUDA C/C++, which covers GPU programming from a lower-level perspective.
Level: Beginner
Language: English (German upon request for bespoke courses)
Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).
Prerequisites
Knowledge
- C or C++ programming experience, including variables, loops, conditionals, functions, and arrays
Technical
- A free NVIDIA developer account
- A local installation of NVIDIA Nsight Systems is recommended
After completing this course, you will be able to:
- Write, compile, and run GPU-accelerated C/C++ code using CUDA
- Express data parallelism by mapping computation across CUDA thread hierarchies
- Manage CPU-GPU memory using unified memory and asynchronous prefetching
- Profile GPU applications with command-line and visual profilers to identify bottlenecks
- Increase throughput by overlapping computation and data transfers with concurrent CUDA streams
- Apply a profile-driven workflow to iteratively optimize CUDA applications for maximum performance
The course covers the following topics:
- Accelerating applications with CUDA C/C++: GPU code compilation, parallel thread hierarchy, and GPU memory allocation
- Managing accelerated application memory: command-line profiling, unified memory, and asynchronous optimization techniques
- Asynchronous streaming and visual profiling: NVIDIA Nsight Systems, concurrent CUDA streams, and performance analysis
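As a rough illustration of how the topics listed above fit together, the following minimal sketch combines a grid-stride kernel, unified memory with asynchronous prefetching, and two CUDA streams. It is not taken from the course material. The names `scale` and `scale_on_gpu` are hypothetical, and the code assumes the four-argument form of `cudaMemPrefetchAsync` (pointer, size, device, stream) found in CUDA 12.x and earlier toolkits; newer toolkits may expect a different signature.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and advances by the
// total number of threads in the grid, so any launch size covers all n elements.
__global__ void scale(float *data, int n, float factor)
{
    int first  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = first; i < n; i += stride)
        data[i] *= factor;
}

// Hypothetical helper: fills n elements with 1.0f, scales them on the GPU in
// two streams, and returns the number of elements with an unexpected value
// (or -1 if the unified memory allocation fails).
int scale_on_gpu(int n, float factor)
{
    const int numStreams = 2;
    const int chunk = (n + numStreams - 1) / numStreams;

    int device = 0;
    cudaGetDevice(&device);

    // Unified memory: one pointer that is valid on both the CPU and the GPU.
    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess)
        return -1;
    for (int i = 0; i < n; ++i)
        data[i] = 1.0f;  // first touched on the CPU

    cudaStream_t streams[numStreams];
    for (int s = 0; s < numStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        int offset = s * chunk;
        int count  = (offset + chunk <= n) ? chunk : n - offset;
        if (count <= 0)
            continue;
        size_t bytes = count * sizeof(float);

        // Work queued in different streams may overlap; within one stream the
        // prefetch, the kernel, and the prefetch back run in order.
        cudaMemPrefetchAsync(data + offset, bytes, device, streams[s]);
        scale<<<256, 256, 0, streams[s]>>>(data + offset, count, factor);
        cudaMemPrefetchAsync(data + offset, bytes, cudaCpuDeviceId, streams[s]);
    }
    cudaDeviceSynchronize();  // wait for all streams before reading on the CPU

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (data[i] != factor)
            ++mismatches;

    for (int s = 0; s < numStreams; ++s)
        cudaStreamDestroy(streams[s]);
    cudaFree(data);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", scale_on_gpu(1 << 20, 2.0f));
    return 0;
}
```

Assuming the file is saved as `scale.cu` (a placeholder name), it can be compiled with `nvcc -o scale scale.cu`. Profiling the resulting binary with NVIDIA Nsight Systems is one way to check whether the two streams actually overlap on a given GPU.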
Course dates:
- 2025, Sep 8-9: full-day online tutorial in collaboration with NHR@TUD, part 1 of From Zero to Multi-Node GPU Programming
- 2025, Mar 12: full-day online tutorial in collaboration with NHR@TUD, part 1 of From Zero to Multi-Node GPU Programming
- 2025, Feb 4: full-day online tutorial, part 2 of GPU Programming Workshop
- 2024, Sep 18: full-day online tutorial in collaboration with NHR@TUD, part 1 of From Zero to Multi-Node GPU Programming
- 2024, Mar 4-5: online course over two half-days in collaboration with EUMaster4HPC
- 2024, Feb 29: full-day online course
- 2023, Jul 28: full-day on-site course at NHR@FAU
- 2023, Mar 23: full-day on-site course at NHR@FAU
- 2023, Mar 8-9: online course over two half-days in collaboration with EUMaster4HPC
- 2022, Dec 9-16: online course over two half-days in collaboration with EUMaster4HPC
- 2022, Nov 28: full-day online course in collaboration with LRZ
- 2022, Apr 21: full-day online course
For an overview of all NHR@FAU courses, visit the course overview page.