Fundamentals of Accelerated Computing with OpenACC

OpenACC is a high-level, directive-based programming model that enables parallel computing on CPUs and GPUs for C/C++ and Fortran applications. Rather than rewriting code from scratch, developers annotate existing loops and data regions with OpenACC directives to offload work to GPU accelerators. This course covers the full workflow – from profiling applications to identify hotspots, to offloading computation with parallel, kernels, and loop directives, to fine-tuning data movement between CPU and GPU with structured and unstructured data regions. No prior GPU programming experience is required.

Further information about this tutorial can be found on the NVIDIA DLI course page.

This tutorial has been removed from the DLI course catalog and can no longer be offered as an official DLI event. It can still be offered upon request, based on the official OpenACC training materials.

Level: Beginner

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).

Prerequisites

Knowledge

C, C++, or Fortran programming experience, including variables, loops, conditionals, functions, and arrays

Technical

A free NVIDIA developer account for DLI events
A local installation of NVIDIA Nsight Systems for non-DLI events

After completing this course, you will be able to:

Profile C/C++ and Fortran applications to identify performance hotspots suitable for GPU acceleration
Offload computation to the GPU using OpenACC parallel, kernels, and loop directives
Control data movement between CPU and GPU with structured and unstructured data regions
Apply loop optimization clauses – including reduction, collapse, tile, and gang/worker/vector – to maximize GPU throughput
Use CUDA Unified Memory alongside OpenACC to simplify memory management
Validate and profile OpenACC-accelerated applications to guide iterative optimization
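
To give a flavor of the offloading and reduction outcomes listed above, here is a minimal sketch in C. The function names `saxpy` and `dot` are illustrative and are not taken from the course materials. It assumes an OpenACC-capable compiler (for example `nvc -acc` or `gcc -fopenacc`); without such a flag the pragmas are ignored and the loops simply run serially on the CPU with the same result.

```c
#include <assert.h>

/* Illustrative helper (not from the course): y = a*x + y.
 * copyin: x is only read on the device.
 * copy:   y is read and written, so it is copied in and back out. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Illustrative helper (not from the course): dot product.
 * The reduction clause lets each parallel iteration accumulate
 * into a private copy of sum, which is combined at the end. */
double dot(int n, const double *x, const double *y)
{
    assert(n >= 0);
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Both loops have independent iterations apart from the accumulation into `sum`, which is why a single `parallel loop` directive per loop is enough in this sketch.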

Course Outline

OpenACC programming model: parallelism basics, goals, and fundamental code parallelization
Profiling with OpenACC: compilation, multicore profiling, and identifying acceleration opportunities
OpenACC directives: parallel, kernels, and loop constructs for GPU offloading
Data management: data directives and clauses, structured and unstructured data regions, and the update directive
Loop optimizations: seq/auto, independent, reduction, collapse, tile, and gang/worker/vector concepts
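
The data management and loop optimization topics in the outline can be sketched together as follows. This is a hedged illustration, not code from the course: `scale_and_sum` is a hypothetical function, and the array is assumed to be stored in row-major order. A structured `data` region keeps the array on the device across several kernels, `collapse(2)` merges the two tightly nested loops into one iteration space, and `present` states that the data is already on the device. Compiled without OpenACC support, the pragmas are ignored and the code runs serially.

```c
#include <assert.h>

/* Hypothetical example: multiply every element of a rows x cols array
 * by factor, repeated steps times, then return the sum of all elements. */
double scale_and_sum(int rows, int cols, double *a, double factor, int steps)
{
    assert(rows > 0 && cols > 0 && steps >= 0);
    int n = rows * cols;
    double total = 0.0;

    /* One transfer to the device on entry, one transfer back on exit. */
    #pragma acc data copy(a[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            /* No data movement here: a is already present on the device. */
            #pragma acc parallel loop collapse(2) present(a[0:n])
            for (int i = 0; i < rows; ++i) {
                for (int j = 0; j < cols; ++j) {
                    a[i * cols + j] *= factor;
                }
            }
        }

        #pragma acc parallel loop collapse(2) reduction(+:total) present(a[0:n])
        for (int i = 0; i < rows; ++i) {
            for (int j = 0; j < cols; ++j) {
                total += a[i * cols + j];
            }
        }
    }
    return total;
}
```

Without the enclosing `data` region, each `parallel loop` would typically copy the array to and from the device on its own, which is the kind of redundant transfer the data directives are meant to avoid.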

Past Events

2026, Jun 8: full-day online course in collaboration with LRZ; part 1 of the GPU Programming Workshop
2025, Oct 27: full-day online course in collaboration with LRZ; part 1 of the GPU Programming Workshop
2025, Apr 16: full-day online course in collaboration with EUMaster4HPC
2025, Feb 3: full-day online course in collaboration with LRZ; part 1 of the GPU Programming Workshop

For an overview of all NHR@FAU courses, visit the course overview page.
