It is our pleasure to announce a talk by Prof. Martin Schulz (TU Munich).
Title: A Case for More Adaptivity in HPC
Time: Monday, February 5, 2018, 16:15
Place: RRZE e-Studio (Room 02.037)
Current HPC environments and applications are rather rigid and inflexible: applications run only with fixed numbers of processors, scheduling is done without considering the context of other jobs and their impact, network and I/O congestion are often simply accepted as a given, and power is in most cases still fully provisioned to avoid the need for any adaptivity. This leads to inefficient use of HPC systems, with resource fragmentation, high variability and suboptimal performance. In this talk, I will discuss several examples of such inefficient resource usage and show why adaptivity must be an important part of the solution to this challenge. In particular, I will focus on efficient power usage at both the application and the system level, as well as on approaches that support malleable applications using new programming abstractions. These efforts are first steps towards more adaptive systems, which will enable us to exploit HPC systems to their full capabilities.
Martin Schulz is a Full Professor and Chair for Computer Architecture and Computer Organization at the Technische Universität München (TUM), which he joined in 2017. Prior to that, he held positions at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL) and Cornell University. He earned his Doctorate in Computer Science in 2001 from TUM and a Master of Science in Computer Science from UIUC. Martin has published over 200 peer-reviewed papers and currently serves as the chair of the MPI Forum, the standardization body for the Message Passing Interface. His research interests include parallel and distributed architectures and applications; performance monitoring, modeling and analysis; memory system optimization; parallel programming paradigms; tool support for parallel programming; power-aware parallel computing; and fault tolerance at the application and system level. Martin was a recipient of the IEEE/ACM Gordon Bell Award in 2006 and an R&D 100 award in 2011.