Research Focus

Our activities are in the following research fields:

Performance Engineering

Performance Engineering (PE) is a structured, model-based process for optimizing and parallelizing basic operations, algorithms, and application codes on modern compute architectures. The process is divided into analysis, modeling, and optimization phases, which are iterated for each homogeneous code section until optimal or satisfactory performance is achieved. During the analysis, the first step is to formulate a hypothesis about which aspect of the architecture (the bottleneck) limits the execution speed of the software. Typical bottlenecks can be identified qualitatively using application-independent performance patterns; a concrete performance pattern is described by a set of observable runtime characteristics. Using suitable performance models, the interaction of the application with the given hardware architecture is then described analytically and quantitatively.

The model thus indicates the maximum expected performance and the potential runtime improvement achievable through appropriate modifications. If the model predictions cannot be validated by measurements, the underlying model assumptions are revisited and refined or adjusted as necessary. Based on the model, optimizations can be planned and their performance gain assessed a priori. The PE approach is not limited to standard microprocessor architectures and can also be used for projections onto future computer architectures. The group's main focus is on the compute node, where analytic performance models such as the Roofline model or the Execution-Cache-Memory (ECM) model are used.
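As an illustration of the analytic approach, a minimal Roofline estimate can be sketched in a few lines; the peak performance and bandwidth numbers below are hypothetical, not measurements of any particular machine:

```python
# Minimal Roofline model sketch: predicted performance is limited either
# by peak floating-point throughput or by memory bandwidth times the
# arithmetic intensity of the loop kernel.

def roofline_gflops(peak_gflops, mem_bw_gbs, intensity_flop_per_byte):
    """Return the Roofline performance bound in GFLOP/s."""
    return min(peak_gflops, mem_bw_gbs * intensity_flop_per_byte)

# Hypothetical machine: 48 GFLOP/s peak per core, 20 GB/s memory bandwidth.
# A STREAM-triad-like kernel (a[i] = b[i] + s*c[i]) performs 2 flops while
# moving 32 bytes (two loads, one store, one write-allocate), i.e.
# intensity = 2/32 = 0.0625 flop/byte.
bound = roofline_gflops(48.0, 20.0, 2.0 / 32.0)
print(f"Roofline bound: {bound:.2f} GFLOP/s")  # memory-bound: 1.25 GFLOP/s
```

The model immediately shows that this kernel is memory-bound: raising clock frequency or vector width would not help, while reducing data traffic would.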

Projects:

Term: 2017-01-01 - 2019-12-31
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

The ProPE project will deploy a prototype HPC user support infrastructure as a distributed, cross-site collaborative effort of several tier-2/3 centers with complementary HPC expertise. Within ProPE, code optimization and parallelization of scientific software is seen as a structured, well-defined process with a sustainable outcome. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering…


Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.

There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous…


Term: 2017-03-01 - 2020-02-29
Funding source: Bundesministerium für Forschung, Technologie und Raumfahrt (BMFTR)
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

The SeASiTe research project undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox that enables programmers to equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation both with respect to relevant system and program parameters and…


Term: 2019-01-01 - 2021-12-31
Funding source: Europäische Union (EU)
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU


Term: 2022-09-01 - 2025-08-31
Funding source: BMFTR / Verbundprojekt
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

Upcoming exascale computer architectures will be characterized by a very large number of heterogeneous hardware components, including special-purpose processors and accelerators. Realizing CFD application software, a central core component of today's industrial flow simulations, on such systems requires highly scalable methods on the algorithmic side, above all for solving the high-dimensional, time-dependent (non)linear systems of equations, which additionally…


Term: 2024-01-01 - 2026-12-31
Funding source: EU / Cluster 4: Digital, Industry and Space
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

The Energy-oriented Centre of Excellence for exascale HPC applications (EoCoE-III) applies cutting-edge computational methods in its mission to foster the transition to decarbonized energy in Europe. EoCoE-III is anchored both in the High Performance Computing (HPC) community and in the energy field. It will demonstrate the benefit of HPC for the net-zero energy transition, both for research institutes and for key industry in the energy sector. The present project will draw on the experience of two…


Term: 2022-09-01 - 2025-08-31
Funding source: BMFTR / Verbundprojekt
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU


Term: 2022-01-01 - 2022-12-31
Funding source: andere Förderorganisation
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU


Publications:

2019

  • Code generation for massively parallel phase-field simulations
    2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 (Denver, CO, 2019-11-17 - 2019-11-22)
    In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
    DOI: 10.1145/3295500.3356186


Performance Modeling

Performance models describe the interaction between an application and the hardware, forming the basis for a deep understanding of the runtime behavior of an application. The group pursues an analytic approach whose essential components are application models and machine models. These components are initially created independently, but their combination and interaction finally provide insights into the bottlenecks and the expected performance. In particular, the creation of accurate machine models requires thorough microarchitecture analysis.

The Execution-Cache-Memory (ECM) model developed by the group allows predictions of single-core performance as well as scaling within a multi-core processor or compute node. In combination with analytic models of electrical power consumption, it can also be used to derive estimates for the energy consumption of an application. The ECM model is a generalization of the well-known Roofline model.
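In the spirit of the ECM model's multicore scaling prediction, the following sketch shows the characteristic saturation behavior: performance scales linearly with the number of cores until the memory bandwidth ceiling is hit. The single-core and ceiling numbers are made up for illustration:

```python
def ecm_scaling(perf_1core, perf_mem_roof, cores):
    """ECM-style saturation: performance grows linearly with the core
    count until it is capped by the memory-bandwidth ceiling."""
    return min(cores * perf_1core, perf_mem_roof)

# Hypothetical kernel: 1.2 GFLOP/s on one core, 4.0 GFLOP/s bandwidth
# ceiling; the kernel saturates at the fourth core.
for n in range(1, 7):
    print(n, "cores:", ecm_scaling(1.2, 4.0, n), "GFLOP/s")
```

The saturation point (here between three and four cores) is exactly the kind of prediction that can be validated with hardware-counter measurements.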

Beyond the node level, the group investigates the performance of highly parallel MPI and hybrid applications, especially those without frequent synchronizing operations. Applications show highly dynamic behavior due to their interaction with the system's hardware bottlenecks, such as memory and network bandwidth. As a consequence, a simple additive combination of runtime models for the different phases of an application is often inaccurate. We extend existing node-level and communication models to describe effects like desynchronization, resynchronization, and idle wave propagation.


Performance Tools

The group develops open-source software in the areas of performance tools, cluster monitoring, and benchmarking.
In the area of “performance tools,” the well-known LIKWID tool collection (https://github.com/RRZE-HPC/likwid) is being developed. It contains various tools for the controlled execution of applications on modern compute nodes with complex topologies and adaptive runtime parameters. By measuring appropriate hardware metrics, LIKWID enables a detailed analysis of how an application uses the hardware and is therefore of central importance for validating performance models and identifying performance patterns. The output of derived metrics, such as the main memory bandwidth actually used, requires continuous adaptation and validation of the tool for new computer architectures.
The static code analysis tool OSACA (Open Source Architecture Code Analyzer, https://github.com/RRZE-HPC/OSACA) analyzes assembly code and provides an in-core runtime prediction.
With ClusterCockpit (https://clustercockpit.org/), the group is developing a comprehensive HPC cluster monitoring solution. ClusterCockpit comprises the following components: cc-metric-collector (node agent on the compute nodes), cc-backend (REST API and web server backend, including the web-based user interface), cc-metric-store (in-memory metric database), cc-energy-manager (job-specific control of power-capping settings and global power capping for a cluster), and cc-node-controller (setting system parameters at the node level). ClusterCockpit offers both job-centric and node-centric views and is accessible to regular HPC users, support staff, and administrators. It is in production use at a large number of HPC centers.
Benchmark applications are an important tool for understanding performance-limiting factors and exploring new optimization opportunities. They are used to characterize hardware platforms as well as in research and teaching. The group develops “The Bandwidth Benchmark” (https://github.com/RRZE-HPC/TheBandwidthBenchmark), an application for measuring the maximum achievable bandwidth on all levels of the memory hierarchy. MD-Bench (https://github.com/RRZE-HPC/MD-Bench) implements state-of-the-art algorithms from molecular dynamics for CPUs and GPUs, including scalable MPI parallelization. SparseBench implements MPI-parallel solvers for sparse systems of equations and supports different sparse matrix storage formats. MachineState (https://github.com/RRZE-HPC/MachineState) collects and stores all performance-related information at the node level, thus making an important contribution to reproducible benchmark results.
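The measurement principle behind a bandwidth benchmark can be sketched in a few lines of NumPy. A real benchmark such as The Bandwidth Benchmark uses carefully tuned compiled kernels and controlled thread placement; this Python sketch only illustrates the idea of timing a streaming kernel and reporting the best of several repetitions:

```python
import time
import numpy as np

def copy_bandwidth_gbs(n=50_000_000, reps=5):
    """Estimate the effective bandwidth of a copy kernel b[:] = a[:]
    in GB/s. Counts 16 bytes of traffic per element (one load plus one
    store of a float64), ignoring write-allocate transfers."""
    a = np.ones(n)
    b = np.empty_like(a)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        np.copyto(b, a)
        best = min(best, time.perf_counter() - t0)
    return 16.0 * n / best / 1e9

print(f"copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

Taking the minimum over repetitions filters out timer noise and first-touch effects, which is standard practice in streaming benchmarks.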

Projects:

Term: 2017-03-01 - 2020-02-29
Funding source: Bundesministerium für Forschung, Technologie und Raumfahrt (BMFTR)
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

The SeASiTe research project undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox that enables programmers to equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation both with respect to relevant system and program parameters and…


Term: 2022-01-01 - 2022-12-31
Funding source: andere Förderorganisation
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU


Publications:


2017

  • Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
    10th International Workshop on Parallel Tools for High Performance Computing (Stuttgart, Germany, 2016-10-04 - 2016-10-05)
    In: Tools for High Performance Computing 2016, Cham


Hardware-efficient building blocks for sparse linear algebra and stencil solvers

The solution of large sparse systems of equations and eigenvalue problems is typically carried out with iterative methods. This research area deals with the efficient implementation, optimization, and parallelization of the most important building blocks of such iterative solvers. The focus is on the multiplication of a large sparse matrix with one or more vectors (SpMV). We consider both matrix-free representations of regular matrices, such as those arising from the discretization of partial differential equations (“stencils”), and the generic case of a general SpMV with a stored matrix. Our work on optimized building blocks for SpMV-based solvers includes hardware-efficient algorithms, data access optimizations (spatial and temporal blocking), and efficient, portable data structures. Our structured performance engineering process is employed throughout.
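The central building block, SpMV with a stored matrix, can be illustrated with the widely used CSR (compressed row storage) format. This is a generic textbook sketch, not the group's optimized implementation:

```python
import numpy as np

def spmv_csr(val, col, row_ptr, x):
    """y = A*x for a CSR matrix: val holds the nonzeros row by row,
    col the column index of each nonzero, and row_ptr the offset at
    which each row starts in val/col."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[j] * x[col[j]]
    return y

# 3x3 example:  [[4, 0, 1],
#                [0, 3, 0],
#                [2, 0, 5]]
val = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
col = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(val, col, row_ptr, np.array([1.0, 1.0, 1.0])))  # [5. 3. 7.]
```

The indirect access x[col[j]] is the performance-critical feature of this kernel: it makes the memory access pattern depend on the sparsity structure, which is why data layout and blocking optimizations matter so much here.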

Projects:

Term: 2012-11-01 - 2019-06-30
Funding source: DFG / Schwerpunktprogramm (SPP)
Project leader:

The ESSEX project investigates the computational issues arising for large scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers which comprise building blocks, algorithms and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault…


Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction…


Term: 2017-03-01 - 2020-02-29
Funding source: Bundesministerium für Forschung, Technologie und Raumfahrt (BMFTR)
Project leader:

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

The SeASiTe research project undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox that enables programmers to equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation both with respect to relevant system and program parameters and…



HPC/HPDA Research Software Engineering

Increasing computing power and growing amounts of data enable us to significantly improve the mathematical models used in many applications. We investigate the fusion of classical model-driven and data-driven approaches and their implementation in modern, open-source research software. Our focus is on code generation technology for numerical solvers and machine learning methods on (block-)structured data.
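A typical target for such code generation is a stencil update on a structured grid. The hand-written 2D five-point Jacobi sweep below is the kind of kernel that generator frameworks emit and optimize; the sketch is purely illustrative, not generated code:

```python
import numpy as np

def jacobi_sweep(u):
    """One Jacobi sweep of the 2D five-point stencil on the interior
    of the grid; boundary values are kept fixed."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                            + u[1:-1, :-2] + u[1:-1, 2:])
    return v

# 4x4 grid with boundary = 1 and interior = 0: after one sweep each
# interior point is the average of its four neighbours, i.e. 0.5.
u = np.ones((4, 4))
u[1:-1, 1:-1] = 0.0
print(jacobi_sweep(u)[1:-1, 1:-1])
```

Because the access pattern and arithmetic of such kernels are fully known at generation time, a code generator can apply vectorization, blocking, and target-specific layouts automatically, which is precisely what makes (block-)structured grids attractive for this approach.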
