Fundamentals of Accelerated Computing with CUDA Python

Course Description

By the end of this workshop, participants will be proficient in the core tools and techniques for GPU-accelerating Python applications using CUDA and Numba. They will learn how to accelerate NumPy ufuncs on the GPU, configure parallel execution using CUDA’s thread hierarchy, implement custom device kernels for greater performance and flexibility, and optimize memory access through coalescing and shared memory to enhance kernel efficiency.

Additional information is available on the NVIDIA DLI course homepage.

Learning Objectives

At the conclusion of the workshop, you will have an understanding of the fundamental tools and techniques for GPU-accelerating Python applications with CUDA and Numba, including:

GPU-accelerating NumPy ufuncs with just a few lines of code
Configuring code parallelization using the CUDA thread hierarchy
Writing custom CUDA device kernels for maximum performance and flexibility
Using memory coalescing and on-device shared memory to increase the effective memory bandwidth of CUDA kernels

Course Structure

Introduction to CUDA Python with Numba

Begin working with the Numba compiler and CUDA programming in Python.
Use Numba decorators to GPU-accelerate numerical Python functions.
Optimize host-to-device and device-to-host memory transfers.

Custom CUDA Kernels in Python with Numba

Learn CUDA’s parallel thread hierarchy of threads, blocks, and grids, and how it broadens what parallel programs can do.
Launch massively parallel custom CUDA kernels on the GPU.
Utilize CUDA atomic operations to avoid race conditions during parallel execution.
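
As a rough preview of the thread hierarchy covered in this module, the sketch below emulates a one-dimensional kernel launch in plain Python, so it runs without a GPU. The helper names blocks_needed and global_index, the problem size, and the block size are illustrative assumptions, not part of the course material. In Numba, cuda.grid(1) returns the same flat index inside a real kernel.

```python
import math


def blocks_needed(n_elements, threads_per_block):
    """Smallest number of blocks whose threads cover n_elements (hypothetical helper)."""
    return math.ceil(n_elements / threads_per_block)


def global_index(block_idx, block_dim, thread_idx):
    """Flat 1D index of a thread within the whole grid (hypothetical helper)."""
    return block_idx * block_dim + thread_idx


n = 1000                 # assumed problem size
threads_per_block = 128  # assumed block size
blocks_per_grid = blocks_needed(n, threads_per_block)

# Emulate the grid sequentially: each "thread" computes its own index
# and skips the work if that index falls past the end of the data.
out = [0] * n
for block_idx in range(blocks_per_grid):
    for thread_idx in range(threads_per_block):
        i = global_index(block_idx, threads_per_block, thread_idx)
        if i < n:  # bounds guard: the grid has more threads than elements
            out[i] = 2 * i
```

The bounds guard matters because the grid is rounded up to a whole number of blocks, so the last block usually contains threads with no element to process.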

Multidimensional Grids and Shared Memory for CUDA Python with Numba

Learn multidimensional grid creation and how to work in parallel on 2D matrices.
Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.

Certification

Upon successfully completing the course assessments, participants will receive an NVIDIA DLI Certificate, recognizing their subject matter expertise and supporting their professional career growth.

Prerequisites

A free NVIDIA developer account is required to access the course material. Please register before the training at https://learn.nvidia.com/join.

Participants should additionally meet the following requirements:

Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
NumPy competency, including the use of ndarrays and ufuncs
No previous knowledge of CUDA programming is required

Upcoming Iterations and Additional Courses

You can find dates and registration links for this and other upcoming NHR@FAU courses at https://go-nhr.de/trainings.