Fundamentals of Accelerated Computing with CUDA Python
Course Description
By the end of this workshop, participants will be proficient in the core tools and techniques for GPU-accelerating Python applications using CUDA and Numba. They will learn how to accelerate NumPy ufuncs on the GPU, configure parallel execution using CUDA’s thread hierarchy, implement custom device kernels for greater performance and flexibility, and optimize memory access through coalescing and shared memory to enhance kernel efficiency.
Additional information is available on the NVIDIA DLI course homepage.
Learning Objectives
At the conclusion of the workshop, you will have an understanding of the fundamental tools and techniques for GPU-accelerating Python applications with CUDA and Numba, including:
- GPU-accelerating NumPy ufuncs with just a few lines of code
- Configuring code parallelization using the CUDA thread hierarchy
- Writing custom CUDA device kernels for maximum performance and flexibility
- Using memory coalescing and on-device shared memory to increase CUDA kernel bandwidth
Course Structure
Introduction to CUDA Python with Numba
- Begin working with the Numba compiler and CUDA programming in Python.
- Use Numba decorators to GPU-accelerate numerical Python functions.
- Optimize host-to-device and device-to-host memory transfers.
Custom CUDA Kernels in Python with Numba
- Learn CUDA’s parallel thread hierarchy and how to extend parallel program possibilities.
- Launch massively parallel custom CUDA kernels on the GPU.
- Utilize CUDA atomic operations to avoid race conditions during parallel execution.
Multidimensional Grids and Shared Memory for CUDA Python with Numba
- Learn multidimensional grid creation and how to work in parallel on 2D matrices.
- Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.
Certification
Upon successfully completing the course assessments, participants will receive an NVIDIA DLI Certificate, recognizing their subject matter expertise and supporting their professional career growth.
Prerequisites
A free NVIDIA developer account is required to access the course material. Please register before the training at https://learn.nvidia.com/join.
Participants should additionally meet the following requirements:
- Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
- NumPy competency, including the use of ndarrays and ufuncs
- No previous knowledge of CUDA programming is required
Upcoming Iterations and Additional Courses
You can find dates and registration links for this and other upcoming NHR@FAU courses at https://hpc.fau.de/teaching/tutorials-and-courses/.