Fundamentals of Accelerated Computing with OpenACC

OpenACC is a high-level, directive-based programming model that enables parallel computing on CPUs and GPUs for C/C++ and Fortran applications. Rather than rewriting code from scratch, developers annotate existing loops and data regions with OpenACC directives to offload work to GPU accelerators. This course covers the full workflow – from profiling applications to identify hotspots, to offloading computation with parallel, kernels, and loop directives, to fine-tuning data movement between CPU and GPU with structured and unstructured data regions. No prior GPU programming experience is required.
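The function below is a minimal sketch of what "annotating an existing loop" means in practice. The function name `saxpy` is our own illustration and is not taken from the course materials. The loop body is plain C, and a single `parallel loop` directive asks an OpenACC-capable compiler (for example `nvc -acc` from the NVIDIA HPC SDK, or GCC with `-fopenacc`) to parallelize it and, on a GPU target, offload it. A compiler without OpenACC support ignores the pragma and runs the same loop serially, so the result is unchanged.

```c
#include <stddef.h>

/* y = a*x + y (illustrative example, not from the course materials).
   The directive asks the compiler to parallelize the loop. copyin(x[0:n])
   moves x to the device (read-only); copy(y[0:n]) moves y to the device
   and back again after the loop, because y is both read and written. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(4, 2.0f, x, y)` with `x = {1, 2, 3, 4}` and `y = {1, 1, 1, 1}` leaves `y = {3, 5, 7, 9}`, whether or not the loop was offloaded.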

Further information about this tutorial can be found on the NVIDIA DLI course page.

This tutorial has been removed from the DLI course catalog and can no longer be offered as an official DLI event. It can still be offered upon request, based on the official OpenACC training materials.

Level: Beginner

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).

Prerequisites

Knowledge

  • C, C++, or Fortran programming experience, including variables, loops, conditionals, functions, and arrays

Technical

  • A free NVIDIA developer account for DLI events
  • A local installation of NVIDIA Nsight Systems for non-DLI events

After completing this course, you will be able to:

  • Profile C/C++ and Fortran applications to identify performance hotspots suitable for GPU acceleration
  • Offload computation to the GPU using OpenACC parallel, kernels, and loop directives
  • Control data movement between CPU and GPU with structured and unstructured data regions
  • Apply loop optimization clauses – including reduction, collapse, tile, and gang/worker/vector – to maximize GPU throughput
  • Use CUDA Unified Memory alongside OpenACC to simplify memory management
  • Validate and profile OpenACC-accelerated applications to guide iterative optimization
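The sketch below combines two of these objectives: a structured data region and a reduction. The function name `scale_and_sum` is hypothetical and does not come from the course. The `data` region keeps `v` on the device across both loops, so it is transferred to the GPU once and back once instead of once per loop. The `present` clauses state that the inner loops expect `v` to already be on the device. The `reduction(+:sum)` clause gives each parallel thread of execution a private partial sum and combines the partial sums at the end of the loop.

```c
#include <stddef.h>

/* Hypothetical example: scale v in place, then return the sum of its elements. */
double scale_and_sum(size_t n, double *v, double factor)
{
    double sum = 0.0;

    /* Structured data region: v is copied to the device on entry
       and copied back to the host on exit. */
    #pragma acc data copy(v[0:n])
    {
        #pragma acc parallel loop present(v[0:n])
        for (size_t i = 0; i < n; ++i)
            v[i] *= factor;

        /* Each thread of execution accumulates a private partial sum;
           the partial sums are combined into sum when the loop ends. */
        #pragma acc parallel loop present(v[0:n]) reduction(+:sum)
        for (size_t i = 0; i < n; ++i)
            sum += v[i];
    }
    return sum;
}
```

Without the enclosing `data` region, each `parallel loop` would need its own data clauses, and `v` could be moved between host and device twice. Removing that kind of redundant transfer is the data-management optimization this course focuses on.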

Course Outline

  • OpenACC programming model: parallelism basics, goals, and fundamental code parallelization
  • Profiling with OpenACC: compilation, multicore profiling, and identifying acceleration opportunities
  • OpenACC directives: parallel, kernels, and loop constructs for GPU offloading
  • Data management: data directives and clauses, structured and unstructured data regions, and the update directive
  • Loop optimizations: seq/auto, independent, reduction, collapse, tile, and gang/worker/vector concepts
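The function below is a small sketch of the `collapse` clause combined with explicit `gang vector` parallelism. The function name `matrix_add` and the row-major flat-array layout are assumptions made for this illustration. `collapse(2)` merges the two tightly nested loops into one iteration space of `rows * cols` iterations, which gives the compiler more parallelism to distribute across gangs and vector lanes than the outer loop alone would, especially when `rows` is small.

```c
#include <stddef.h>

/* Hypothetical example: c = a + b for rows x cols matrices stored
   row-major in flat arrays. collapse(2) fuses the i and j loops into a
   single parallel iteration space; gang vector spreads that space over
   both levels of GPU parallelism. */
void matrix_add(size_t rows, size_t cols,
                const double *restrict a, const double *restrict b,
                double *restrict c)
{
    #pragma acc parallel loop collapse(2) gang vector \
        copyin(a[0:rows*cols], b[0:rows*cols]) copyout(c[0:rows*cols])
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            c[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}
```

`collapse(2)` requires the two loops to be tightly nested, with no statements between the `for` headers. That requirement is why the index arithmetic sits entirely in the inner loop body.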

Course Dates

  • 2026, Jun 8: full-day online course in collaboration with LRZ; part 1 of the GPU Programming Workshop
  • 2025, Oct 27: full-day online course in collaboration with LRZ; part 1 of the GPU Programming Workshop
  • 2025, Apr 16: full-day online course in collaboration with EUMaster4HPC
  • 2025, Feb 3: full-day online course in collaboration with LRZ; part 1 of the GPU Programming Workshop

For an overview of all NHR@FAU courses, visit the course overview page.