Hybrid Programming in HPC - MPI+X

Modern HPC clusters are hierarchical: nodes communicate across a network using distributed-memory parallelism, while each node exposes multiple sockets and cores for shared-memory parallelism. Exploiting this hierarchy efficiently requires combining programming models – typically MPI for inter-node communication and OpenMP (or MPI-3.0 shared memory) within each node. This course examines the motivations, design choices, and performance trade-offs of such hybrid approaches, comparing MPI+OpenMP, MPI-3.0 shared memory, and MPI+OpenMP GPU offloading.

Through case studies and targeted micro-benchmarks, participants explore the performance implications of process and thread placement, intra-node communication strategies, and halo exchange patterns on multi-socket, multi-core systems.

This is an advanced course intended for participants who are already proficient in both MPI and OpenMP. It pairs naturally with the NHR@FAU Introduction to Parallel Programming with MPI and Introduction to Parallel Programming with OpenMP courses.

Level: Advanced

Language: English (German upon request for bespoke courses)

Price and Eligibility: Refer to the registration page for each event (generally free of charge for members of academia from Europe).

Prerequisites

Knowledge

Solid experience with MPI programming (point-to-point and collective communication)
Solid experience with OpenMP programming (parallel regions, loop parallelism, synchronization)

Technical

A modern web browser or SSH client for accessing the HLRS cluster environment provided for the course

Learning Outcomes

After completing this course, you will be able to:

Explain the performance motivation for hybrid MPI+OpenMP programming on hierarchical HPC systems
Implement hybrid parallel programs that combine MPI for inter-node communication with OpenMP for intra-node threading
Exploit MPI-3.0 shared memory windows for direct neighbor access and efficient halo copies within a node
Compare hybrid MPI+OpenMP, MPI-3.0 shared memory, and pure MPI implementations in terms of performance and programmability
Optimize process and thread placement for multi-socket, multi-core architectures
Apply hybrid MPI+OpenMP offloading strategies for GPU-accelerated workloads
Use performance analysis tools to diagnose bottlenecks in hybrid parallel programs
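
To make the placement outcome more concrete, here is a minimal sketch of a compact placement, written as a pure-Python model rather than real MPI or OpenMP code. The `Node` class and the `compact_placement` function are hypothetical names chosen for illustration, and the model assumes that core IDs are numbered contiguously within each socket, which is not true on every system and should be checked with a topology tool on the actual machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node description used only for this illustration.
    sockets: int
    cores_per_socket: int


def compact_placement(node, ranks_per_node, threads_per_rank):
    """Give each MPI rank a contiguous block of core IDs, one thread per core.

    Assumes cores are numbered socket by socket (socket 0 first).
    """
    total_cores = node.sockets * node.cores_per_socket
    if ranks_per_node * threads_per_rank > total_cores:
        raise ValueError("more threads requested than cores available")
    placement = []
    for rank in range(ranks_per_node):
        first = rank * threads_per_rank
        cores = list(range(first, first + threads_per_rank))
        # Sockets touched by this rank. More than one socket means the
        # rank's threads span a NUMA boundary.
        sockets = sorted({core // node.cores_per_socket for core in cores})
        placement.append((rank, cores, sockets))
    return placement


if __name__ == "__main__":
    node = Node(sockets=2, cores_per_socket=8)
    for rank, cores, sockets in compact_placement(node, 4, 4):
        print(f"rank {rank}: cores {cores[0]}-{cores[-1]}, socket(s) {sockets}")
```

In this model, a layout such as 3 ranks with 5 threads each on a 2x8-core node leaves one rank straddling both sockets, which is the kind of placement the course teaches you to detect and avoid.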

Course Outline

Motivation for hybrid programming: memory consumption, communication overhead, and the node hierarchy
Hybrid MPI+OpenMP: programming model, synchronization strategies, and thread-safety levels
MPI-3.0 shared memory: windows, direct neighbor access, and halo exchange without message passing
Process and thread placement on multi-socket, multi-core systems
Performance comparison: hybrid vs. pure MPI on representative benchmarks and case studies
MPI+OpenMP offloading to GPU accelerators
Performance analysis tools for hybrid programs
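
The memory-consumption motivation in the first outline item can be illustrated with a small back-of-the-envelope model. The sketch below is not course material: `halo_overhead` is a hypothetical helper that assumes a cubic grid of `n` cells per dimension, split into equal cubic subdomains, with every subdomain allocating a halo of `halo_width` cells on all six faces (including faces on the global boundary).

```python
def halo_overhead(n, parts_per_dim, halo_width=1):
    """Return stored cells divided by owned cells for one subdomain.

    n              -- global grid size per dimension (assumed cubic)
    parts_per_dim  -- number of subdomains per dimension
    halo_width     -- halo layers allocated on every face (assumption)
    """
    if n % parts_per_dim != 0:
        raise ValueError("n must be divisible by parts_per_dim")
    local = n // parts_per_dim                # owned cells per dimension
    owned = local ** 3
    stored = (local + 2 * halo_width) ** 3    # owned cells plus halo layers
    return stored / owned


if __name__ == "__main__":
    # 64 cores in total: pure MPI uses 4x4x4 ranks, while the hybrid
    # variant uses 2x2x2 ranks with 8 threads each.
    pure_mpi = halo_overhead(64, 4)
    hybrid = halo_overhead(64, 2)
    print(f"pure MPI: {pure_mpi:.3f}x, hybrid: {hybrid:.3f}x")
```

Under these assumptions, fewer and larger subdomains store proportionally fewer halo cells, which is one reason a hybrid decomposition can reduce memory consumption and communication volume compared with one MPI rank per core.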

Past Events (9)

2026, Feb 10: full-day on-site course in Hybrid @ HLRS in collaboration with HLRS, ASC
2025, Jan 21: full-day on-site course in Hybrid @ HLRS in collaboration with HLRS, VSC
2024, Jan 23: full-day on-site course in Hybrid @ HLRS in collaboration with HLRS, VSC
2022, Dec 12: full-day online course in collaboration with VSC, PRACE, HLRS
2022, Jun 22: full-day online course in collaboration with LRZ, PRACE, HLRS, VSC
2022, Apr 5: full-day online course in collaboration with VSC, PRACE, HLRS
2021, Jun 15: full-day online course in collaboration with VSC, HLRS
2020, Jun 17: full-day online course in collaboration with VSC, HLRS
2020, Jan 27: full-day on-site course at HLRS in collaboration with HLRS, VSC

For an overview of all NHR@FAU courses, visit the course overview page.