NHR PerfLab Seminar: The Role of Idle Waves in Modeling and Optimization of Parallel Programs


Speaker: Ayesha Afzal, NHR@FAU

Title: The Role of Idle Waves in Modeling and Optimization of Parallel Programs

Date and time: Tuesday, April 26, 2:00 p.m. – 3:00 p.m.



A wide spectrum of disturbances emerging from the system and the applications poses challenges to the analytic “white-box” performance modeling of distributed-memory parallel programs. Much research has been devoted to characterizing “noise” and mitigating it via explicit techniques, but there is very little work on incorporating noise effects into analytic performance models. As a first step in this direction, we have developed and validated an analytic model of the propagation speed of idle waves. These waves emerge from delays in execution or communication on specific processes and propagate through the parallel program, much like a train delay that forces other trains to wait and thus “ripples” through the schedule. Idle wave speed depends on the execution and communication properties of the program [1]. Using a variety of HPC platforms and diverse application scenarios, we further explored how these idle waves interact nonlinearly within a parallel code on a cluster and how they decay through different mechanisms: interaction with bottlenecks, system noise, system topology, and application load imbalance [2]. One important consequence of idle waves, or even of fine-grained noise, is that they foster desynchronization, i.e., processes drift out of their “natural lockstep” over time. This can lead to the counterintuitive effect that communication overhead overlaps with useful work, so that the disturbances eventually make the code run faster. This effect can be observed when the program is limited by a bottleneck such as memory or network bandwidth [3, 4]. The first part of my talk will highlight the relevant phenomenology and modeling approaches for “out-of-lockstep” execution.
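The propagation mechanism behind idle waves can be illustrated with a toy simulation. The sketch below is a hypothetical, heavily simplified model and not the analytic model from [1]: it assumes an open chain of ranks in which, in every iteration, each rank computes for `t_exec`, communicates for `t_comm`, and cannot start the next iteration until its direct neighbors (distance 1) have finished the current one. A one-off delay injected on a single rank then travels outward as an idle wave. All function and parameter names are illustrative, and the model ignores most of the communication properties on which, per [1], the real propagation speed depends.

```python
from typing import List, Optional, Tuple

# Toy model (assumption): open chain of ranks, blocking next-neighbor exchange,
# no noise, no bottlenecks. Matrices are indexed as [iteration][rank].
Matrix = List[List[float]]


def simulate(n_ranks: int, n_iters: int, t_exec: float, t_comm: float,
             delay_rank: int, delay_iter: int, delay: float) -> Tuple[Matrix, Matrix]:
    """Return (finish, idle): completion time and waiting time per iteration and rank."""
    step = t_exec + t_comm
    finish = [[0.0] * n_ranks for _ in range(n_iters)]
    idle = [[0.0] * n_ranks for _ in range(n_iters)]
    for it in range(n_iters):
        for r in range(n_ranks):
            if it == 0:
                start = 0.0
            else:
                own = finish[it - 1][r]
                # Blocking exchange: wait until both direct neighbors have
                # completed the previous iteration (no wrap-around).
                deps = [finish[it - 1][q] for q in (r - 1, r + 1) if 0 <= q < n_ranks]
                start = max([own] + deps)
                idle[it][r] = start - own  # time spent waiting, not working
            # One-off delay injected into a single (rank, iteration) pair.
            extra = delay if (r, it) == (delay_rank, delay_iter) else 0.0
            finish[it][r] = start + step + extra
    return finish, idle


def wave_arrival(idle: Matrix) -> List[Optional[int]]:
    """First iteration in which each rank has to wait, or None if it never does."""
    arrival: List[Optional[int]] = []
    for r in range(len(idle[0])):
        hits = [it for it, row in enumerate(idle) if row[r] > 0.0]
        arrival.append(hits[0] if hits else None)
    return arrival


if __name__ == "__main__":
    finish, idle = simulate(n_ranks=9, n_iters=12, t_exec=1.0, t_comm=0.25,
                            delay_rank=4, delay_iter=2, delay=5.0)
    print(wave_arrival(idle))  # [6, 5, 4, 3, None, 3, 4, 5, 6]
```

In this noise-free toy model the wave front advances one rank per iteration in each direction, i.e., at 1/(`t_exec` + `t_comm`) ranks per unit time, and every rank it reaches waits for the full injected delay. The wave therefore does not decay before it hits the ends of the chain; the damping mechanisms mentioned above (noise, bottlenecks, topology, load imbalance) are deliberately left out.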

The second part of the talk will focus on performance optimization. I will describe how the above findings on the impact of automatic asynchronous communication can guide the selection of code or parameter changes that improve program performance. To this end, a wide spectrum of codes is studied: dynamic/adaptive programs (miniAMR), collective-avoiding algorithms (ChebFD), programs that strictly require collectives (HPCG), programs with avoidable collectives of adjustable frequency (LULESH, LBM), and programs without collectives (spMVM).
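Why such automatic asynchronous communication can pay off under a bottleneck can be seen from a back-of-the-envelope bound. This is an illustrative, idealized estimate under stated assumptions, not a model taken from [3, 4]: assume N ranks share a memory bandwidth B, each rank must transfer a data volume V per iteration at a single-rank bandwidth of at most b, and communication takes T_comm per iteration without using memory bandwidth. In lockstep with a saturated bottleneck (B/N < b), all ranks compute at the same time and then communicate at the same time, whereas desynchronized ranks can at best overlap one rank's communication with other ranks' computation:

```latex
T_{\text{lock}} = \frac{N V}{B} + T_{\text{comm}},
\qquad
T_{\text{desync}} \ge \max\!\left( \frac{N V}{B},\; \frac{V}{b} + T_{\text{comm}} \right)
```

The lower bound for T_desync holds because the aggregate transfer rate can never exceed B, and a single rank can neither compute faster than V/b nor skip its communication. If NV/B ≥ V/b + T_comm, an ideally staggered schedule could hide the communication completely, so under these assumptions the largest possible gain over lockstep execution is T_comm per iteration.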

This research work is supported by KONWIHR, the Competence Network for Scientific High Performance Computing in Bavaria, under the project name “OMI4papps”, and recently received the First Place ISC PhD Forum Award 2021.

[1] https://doi.org/10.1109/CLUSTER.2019.8890995

[2] https://doi.org/10.1007/978-3-030-78713-4_19

[3] https://doi.org/10.1007/978-3-030-50743-5_20

[4] https://doi.org/10.1002/cpe.6816

Speaker bio:

Ayesha Afzal is a PhD student at the Professorship for High Performance Computing at the Erlangen National High Performance Computing Center (NHR@FAU), Germany. She holds a Master’s degree in Computational Engineering from Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, and a Bachelor’s degree in Electrical Engineering from the University of Engineering and Technology, Lahore, Pakistan. Her PhD research lies at the intersection of analytic performance models, performance tools, and parallel simulation frameworks, with a focus on first-principles performance modeling of distributed-memory parallel programs in high-performance computing. She also conducts research in multi-core and parallel architectures, parallel computing and algorithms, parallel programming models, modern C++, and domain-specific languages. Ayesha is the recipient of the First Place ISC PhD Forum Award 2021, which recognizes the most outstanding PhD work.