Fundamentals of Accelerated Computing with CUDA Python

CUDA Python with Numba brings GPU acceleration to Python data workflows without requiring C/C++ code. This course covers the full range from accelerating NumPy ufuncs on the GPU with minimal code changes, to configuring parallel execution using CUDA’s thread hierarchy, to writing custom CUDA device kernels in pure Python. Participants also learn how to optimize memory access patterns through coalescing and on-device shared memory. No prior CUDA or GPU programming experience is required.

Further information about this course can be found on the NVIDIA DLI course page.

This course was discontinued in 2026.

Level: Beginner

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).

Knowledge

  • Python programming experience, including variable types, loops, conditionals, functions, and array manipulations
  • NumPy experience, including ndarrays and ufuncs

Technical

  • A free NVIDIA developer account

After completing this course, you will be able to:

  • Accelerate NumPy ufuncs on the GPU using Numba decorators with minimal code changes
  • Configure parallel execution by mapping work across CUDA’s thread and grid hierarchy
  • Launch massively parallel custom CUDA device kernels written in Python
  • Avoid race conditions in parallel kernels using CUDA atomic operations
  • Work with multidimensional grids to parallelize operations on 2D matrices
  • Optimize kernel memory bandwidth through memory coalescing and on-device shared memory
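
As a rough illustration of the thread and grid hierarchy objective above, the following pure-Python sketch emulates how a 1D launch configuration maps threads to array elements. It is not course material and needs no GPU: the function names `launch_config` and `emulate_kernel`, the block size of 128, and the doubling operation are all assumptions chosen for illustration. In an actual Numba kernel the same global index is what `cuda.grid(1)` returns, and the launch would be written as `kernel[blocks_per_grid, threads_per_block](...)`.

```python
import math


def launch_config(n_elements, threads_per_block=128):
    """Return (blocks_per_grid, threads_per_block) so that every element gets a thread.

    The default block size of 128 is an illustrative assumption, not a recommendation.
    """
    blocks_per_grid = math.ceil(n_elements / threads_per_block)
    return blocks_per_grid, threads_per_block


def emulate_kernel(data, blocks_per_grid, threads_per_block):
    """Sequentially emulate a 1D kernel in which each thread doubles one element."""
    out = list(data)
    for block_idx in range(blocks_per_grid):
        for thread_idx in range(threads_per_block):
            # Global index: block offset plus position within the block.
            i = block_idx * threads_per_block + thread_idx
            # Bounds guard: the last block may contain threads with no element to process.
            if i < len(data):
                out[i] = 2 * data[i]
    return out


data = list(range(1000))
blocks, threads = launch_config(len(data))
result = emulate_kernel(data, blocks, threads)
```

Rounding the block count up means the grid usually contains a few more threads than elements, which is why the bounds check inside the kernel body matters. On a GPU the two loops disappear, because every (block, thread) pair runs concurrently.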

  • Introduction to CUDA Python with Numba: GPU-accelerating NumPy functions and managing host-device memory transfers
  • Custom CUDA kernels in Python: thread hierarchy, massively parallel kernel launches, and atomic operations
  • Multidimensional grids and shared memory: parallelizing 2D matrix operations and improving memory coalescing

  • 2026, Mar 27: full-day online course
  • 2025, Oct 29: full-day online course in collaboration with LRZ; part 3 of the GPU Programming Workshop
  • 2025, Sep 29: full-day online course
  • 2025, Apr 2: full-day online course in collaboration with EUMaster4HPC
  • 2025, Feb 5: full-day online course in collaboration with LRZ; part 3 of the GPU Programming Workshop
  • 2025, Jan 16: full-day online course in collaboration with NVIDIA (NVIDIA DLI Virtual Workshop Series for Higher Education)
  • 2024, Oct 24: full-day online course in collaboration with NVIDIA (NVIDIA DLI Virtual Workshop Series for Higher Education)
  • 2024, Oct 7: full-day online course
  • 2024, Mar 14: full-day online course
  • 2024, Mar 6-7: online course over two half-days in collaboration with EUMaster4HPC
  • 2023, Sep 18: full-day on-site course at NHR@FAU
  • 2023, Mar 16: full-day on-site course at NHR@FAU
  • 2022, Sep 22-23: online course over two half-days
  • 2022, Aug 2: full-day online course

For an overview of all NHR@FAU courses, visit the course overview page.