NHR@FAU
Fundamentals of Accelerated Computing with CUDA C/C++

Course Description

By the end of this workshop, participants will have a solid grasp of the essential tools and techniques for GPU-accelerating C/C++ applications using CUDA. They will be able to write GPU-executable code, leverage data parallelism, optimize memory transfers with asynchronous prefetching, and use both command-line and visual profilers to guide performance tuning. Additionally, they will know how to employ concurrent streams to increase parallelism and apply a profile-driven approach to develop or refactor CUDA applications for maximum performance.

Additional information is available on the NVIDIA DLI course homepage.

Until March 2025, this course was offered as an official NVIDIA Deep Learning Institute (DLI) program. Its successor, Fundamentals of Accelerated Computing with Modern CUDA C++, is also offered by NHR@FAU. Due to the original course’s popularity, we continue to offer a custom, updated version that builds upon the original material.

Learning Objectives

At the conclusion of the workshop, participants will have a solid understanding of the fundamental tools and techniques for GPU-accelerating C/C++ applications with CUDA. Participants will be able to:

  • Write code that can be executed by a GPU accelerator
  • Identify and express data and instruction-level parallelism in C/C++ applications using CUDA
  • Utilize CUDA-managed memory and optimize memory migration through asynchronous prefetching
  • Use command-line and visual profilers to guide optimization efforts
  • Leverage concurrent streams to overlap kernel execution and data transfers
  • Write GPU-accelerated CUDA C/C++ applications or refactor existing CPU-only applications using a profile-driven approach

Course Structure

Accelerating Applications with CUDA C/C++

  • Writing, compiling, and running GPU code
  • Controlling the parallel thread hierarchy
  • Allocating and freeing memory for the GPU
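The topics above come together in a minimal "hello GPU" program. The sketch below (illustrative, not official course material) shows a kernel definition, the `<<<blocks, threads>>>` launch syntax that controls the thread hierarchy, and allocation/deallocation of CUDA-managed memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: a grid-stride loop lets any launch configuration cover all N
// elements, with each thread striding by the total thread count.
__global__ void addVectors(float *a, const float *b, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = idx; i < n; i += stride)
        a[i] += b[i];
}

int main() {
    const int N = 1 << 20;
    float *a, *b;

    // Managed memory is accessible from both host and device.
    cudaMallocManaged(&a, N * sizeof(float));
    cudaMallocManaged(&b, N * sizeof(float));
    for (int i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // <<<blocks, threads-per-block>>> configures the thread hierarchy.
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    addVectors<<<blocks, threads>>>(a, b, N);
    cudaDeviceSynchronize();   // wait for the kernel to finish

    printf("a[0] = %f\n", a[0]);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Such a program is compiled and run with NVIDIA's compiler driver, e.g. `nvcc -o add add.cu && ./add`.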

Managing Accelerated Application Memory with CUDA C/C++

  • Profiling CUDA code with the command-line profiler
  • Details on unified memory
  • Optimizing unified memory management
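One common unified-memory optimization covered here is asynchronous prefetching: migrating pages to the processor that is about to use them, instead of paying demand-paging faults mid-kernel. A hypothetical sketch (not taken from the course material) using `cudaMemPrefetchAsync`:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int N = 1 << 24;
    float *x;
    cudaMallocManaged(&x, N * sizeof(float));
    for (int i = 0; i < N; ++i) x[i] = 1.0f;

    int device;
    cudaGetDevice(&device);

    // Migrate pages to the GPU before the launch, so the kernel does not
    // stall on demand-paging faults.
    cudaMemPrefetchAsync(x, N * sizeof(float), device);
    scale<<<(N + 255) / 256, 256>>>(x, N);

    // Prefetch back to the host before the CPU reads the results.
    cudaMemPrefetchAsync(x, N * sizeof(float), cudaCpuDeviceId);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```

The effect of prefetching is visible with the command-line profiler, e.g. `nsys profile --stats=true ./app`, where on-demand page migrations disappear from the memory-operation statistics.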

Asynchronous Streaming and Visual Profiling for Accelerated Applications with CUDA C/C++

  • Profiling CUDA code with NVIDIA Nsight Systems
  • Using concurrent CUDA streams
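Work submitted to different non-default streams may execute concurrently on the GPU, which a timeline view in Nsight Systems makes visible as overlapping kernels. A minimal sketch of this pattern (an illustration, with hypothetical kernel and sizes):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void initChunk(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main() {
    const int nStreams = 4;
    const int chunk = 1 << 20;
    float *data;
    cudaMallocManaged(&data, (size_t)nStreams * chunk * sizeof(float));

    cudaStream_t streams[nStreams];
    for (int i = 0; i < nStreams; ++i)
        cudaStreamCreate(&streams[i]);

    // One kernel per stream; kernels in different non-default streams
    // may run concurrently if resources allow.
    for (int i = 0; i < nStreams; ++i)
        initChunk<<<(chunk + 255) / 256, 256, 0, streams[i]>>>(
            data + (size_t)i * chunk, chunk, (float)i);

    cudaDeviceSynchronize();
    for (int i = 0; i < nStreams; ++i)
        cudaStreamDestroy(streams[i]);

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```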

Certification

A certificate of participation will be awarded to all participants who actively engage in the course.

Prerequisites

Participants should meet the following requirements:

  • Basic C/C++ competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
  • No previous knowledge of CUDA programming is assumed

Upcoming Iterations and Additional Courses

You can find dates and registration links for this and other upcoming NHR@FAU courses at https://hpc.fau.de/teaching/tutorials-and-courses/.

Erlangen National High Performance Computing Center (NHR@FAU)
Martensstraße 1
91058 Erlangen
Germany