Fundamentals of Accelerated Computing with Modern CUDA C++


Course Description

By the end of the workshop, participants will understand the fundamental concepts and techniques for accelerating C++ code with CUDA. They will be able to write and compile code that runs on the GPU, optimize memory transfers between CPU and GPU, and leverage parallel algorithms to simplify adding GPU acceleration.

Additionally, participants will learn to implement custom parallel algorithms through CUDA kernels, utilize concurrent CUDA streams to overlap computation with memory operations, and identify the best opportunities to integrate CUDA acceleration into existing CPU-only applications.

Additional information is available on the NVIDIA DLI course homepage.

Learning Objectives

At the conclusion of the workshop, participants will have a fundamental understanding of the concepts and techniques for accelerating C++ code with CUDA and be able to:
  • Write and compile code that runs on the GPU
  • Optimize memory migration between CPU and GPU
  • Leverage powerful parallel algorithms that simplify adding GPU acceleration to your code
  • Implement your own parallel algorithms by directly programming GPUs with CUDA kernels
  • Utilize concurrent CUDA streams to overlap memory traffic with compute
  • Know where, when, and how to best add CUDA acceleration to existing CPU-only applications

Course Structure

CUDA Made Easy: Accelerating Applications with Parallel Algorithms

  • Write, compile, and run GPU code
  • Refactor standard algorithms to execute on GPU
  • Extend standard algorithms to fit your unique use cases

Unlocking the GPU’s Full Potential: Harnessing Asynchrony with CUDA Streams

  • Use CUDA streams to overlap execution and memory transfers
  • Use CUDA events for asynchronous dependency management
  • Profile CUDA code with NVIDIA Nsight Systems
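The copy/compute overlap this module teaches can be sketched with the CUDA runtime API as follows. This is a minimal sketch, not course material: the kernel, chunk sizes, and names are illustrative. Compile with nvcc; error checking is omitted for brevity:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20, chunks = 4, chunk = n / chunks;
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float)); // pinned host memory, needed for async copies
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s[chunks];
    for (int c = 0; c < chunks; ++c) cudaStreamCreate(&s[c]);

    // Each chunk's copy-in, kernel, and copy-out are issued to its own
    // stream, so the transfers of one chunk can overlap with compute on
    // another. (CUDA events, via cudaEventRecord/cudaStreamWaitEvent,
    // express dependencies between streams without a full sync.)
    for (int c = 0; c < chunks; ++c) {
        float* hp = h + c * chunk;
        float* dp = d + c * chunk;
        cudaMemcpyAsync(dp, hp, chunk * sizeof(float), cudaMemcpyHostToDevice, s[c]);
        scale<<<(chunk + 255) / 256, 256, 0, s[c]>>>(dp, chunk);
        cudaMemcpyAsync(hp, dp, chunk * sizeof(float), cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();
    printf("h[0] = %f\n", h[0]); // expect 2.0 after the round trip

    for (int c = 0; c < chunks; ++c) cudaStreamDestroy(s[c]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

A timeline view in NVIDIA Nsight Systems makes the resulting overlap (or lack of it) directly visible.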

Implementing New Algorithms with CUDA Kernels

  • Write and launch custom CUDA kernels
  • Control thread hierarchy
  • Leverage shared memory
  • Use cooperative algorithms
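A custom kernel touching the thread-hierarchy and shared-memory topics above might look like the following block-level reduction. This is a sketch, not course material; cooperative groups offer a more flexible alternative to the bare `__syncthreads()` used here. Compile with nvcc:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Block-level sum reduction: each block loads its tile into shared
// memory, then halves the number of active threads each step.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float tile[256];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads(); // all loads visible before reducing

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0]; // one partial sum per block
}

int main() {
    const int n = 1 << 16, threads = 256, blocks = n / threads;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += out[b];
    printf("sum = %f\n", total); // expect 65536
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```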

Certification

Upon successfully completing the course assessments, participants will receive an NVIDIA DLI Certificate, recognizing their subject matter expertise and supporting their professional career growth.

Prerequisites

A free NVIDIA developer account is required to access the course material. Please register before the training at https://learn.nvidia.com/join.

Participants should additionally meet the following requirements:

  • Sound C++ competency, including familiarity with lambda expressions, loops, conditional statements, functions, and standard algorithms and containers

No previous knowledge of CUDA programming is assumed.

Upcoming Iterations and Additional Courses

You can find dates and registration links for this and other upcoming NHR@FAU courses at https://hpc.fau.de/teaching/tutorials-and-courses/.

Erlangen National High Performance Computing Center (NHR@FAU)
Martensstraße 1
91058 Erlangen
Germany