Research
Our activities are in the following research fields:
- Performance Engineering
- Performance Modeling
- Performance Tools
- Hardware-efficient building blocks for sparse linear algebra and stencil solvers
- HPC/HPDA Research Software Engineering
Performance Engineering
Performance Engineering (PE) is a structured, model-based process for the optimization and parallelization of basic operations, algorithms, and application codes on modern compute architectures. The process is divided into analysis, modeling, and optimization phases, which are iterated for each homogeneous code section until optimal or satisfactory performance is achieved. During the analysis, the first step is to develop a hypothesis about which aspect of the architecture (the bottleneck) limits the execution speed of the software. The qualitative identification of typical bottlenecks can be done with application-independent performance patterns; a concrete performance pattern is described by a set of observable runtime characteristics. Using suitable performance models, the interaction of the application with the given hardware architecture is then described analytically and quantitatively.
The model thus indicates the maximum expected performance and the potential runtime improvements attainable through appropriate modifications. If the model predictions cannot be validated by measurements, the underlying model assumptions are revisited and refined or adjusted as necessary. Based on the model, optimizations can be planned and their performance gains assessed a priori. The PE approach is not limited to standard microprocessor architectures and can also be used for projections to future computer architectures. The main focus of the group is on the compute node, where analytic performance models such as the Roofline model or the Execution-Cache-Memory (ECM) model are used.
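The Roofline model mentioned above can be illustrated with a minimal sketch: attainable performance is the minimum of the peak arithmetic throughput and the product of computational intensity and memory bandwidth. The machine numbers and the per-iteration traffic below are hypothetical, chosen only for illustration:

```python
def roofline_estimate(flops, bytes_moved, peak_flops, mem_bw):
    """Roofline bound in flop/s: the minimum of peak arithmetic
    throughput and the memory-bandwidth-limited rate I * b_mem,
    where I = flops / bytes_moved is the computational intensity."""
    intensity = flops / bytes_moved            # flop/byte
    return min(peak_flops, intensity * mem_bw)

# Hypothetical machine: 3 Tflop/s peak, 200 GB/s memory bandwidth.
# A triad-like kernel a[i] = b[i] + s * c[i] performs 2 flops per
# iteration; we assume 32 bytes of memory traffic per iteration
# (including the write-allocate transfer).
perf = roofline_estimate(flops=2, bytes_moved=32,
                         peak_flops=3.0e12, mem_bw=200e9)
# perf is 1.25e10 flop/s -> the kernel is strongly memory-bound
```

A measured performance well below this bound would trigger the hypothesis-refinement step of the PE cycle described above.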
Publications:
Analytical performance estimation during code generation on modern GPUs
In: Journal of Parallel and Distributed Computing 173 (2023), p. 152-167
ISSN: 0743-7315
DOI: 10.1016/j.jpdc.2022.11.003
MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms
In: Future Generation Computer Systems (2023)
ISSN: 0167-739X
DOI: 10.1016/j.future.2023.06.023
SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study
14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) (Denver, CO, USA, 2023-11-12 - 2023-11-17)
In: 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) 2023
DOI: 10.1145/3624062.3624197
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3223512
Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs
38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, 2024-05-27 - 2024-05-31)
In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
DOI: 10.1109/IPDPS57955.2024.00043
Performance Models
Performance models describe the interaction between application and hardware, forming the basis for a profound understanding of the runtime behavior of an application. The group pursues an analytic approach whose essential components are application models and machine models. These components are initially created independently, but their combination and interaction finally provide insights about the bottlenecks and the expected performance. Especially the creation of accurate machine models requires a thorough microarchitecture analysis. The Execution-Cache-Memory (ECM) model, which was developed by the group, allows predictions of single-core performance as well as scaling within a multi-core processor or compute node. In combination with analytic models of electrical power consumption, it can also be used to derive estimates for the energy consumption of an application. The ECM model is a generalization of the well-known Roofline model.
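The structure of an ECM-style prediction can be sketched as follows. This is a deliberate simplification, not the full model: the real ECM model works in cycles per cache line and distinguishes overlapping and non-overlapping transfer contributions per architecture, whereas the sketch below assumes full overlap of in-core execution with data transfers. All input numbers are hypothetical:

```python
def ecm_single_core_time(t_core, t_data):
    """Time per unit of work on one core: in-core execution is assumed
    to overlap fully with the summed data-transfer times through the
    cache/memory hierarchy (L1<->L2, L2<->L3, L3<->memory)."""
    return max(t_core, sum(t_data))

def ecm_multicore_perf(t_single, t_mem, n_cores):
    """Work units per time unit with n cores: performance scales
    linearly across cores until the shared memory interface
    (throughput 1 / t_mem) saturates the chip."""
    return min(n_cores / t_single, 1.0 / t_mem)

# Hypothetical contributions (e.g., cycles per cache line of work):
t_single = ecm_single_core_time(t_core=4, t_data=[3, 5, 8])   # -> 16
saturated = ecm_multicore_perf(t_single, t_mem=8, n_cores=4)  # memory-bound
```

The saturation behavior in `ecm_multicore_perf` is what distinguishes the ECM model from a pure Roofline view: it predicts how many cores are needed before the memory bandwidth bottleneck takes over.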
Publications:
Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact
36th International Conference on High Performance Computing, ISC High Performance 2021 (Virtual, Online, 2021-06-24 - 2021-07-02)
In: Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek (ed.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2021
DOI: 10.1007/978-3-030-78713-4_19
Addressing White-box Modeling and Simulation Challenges in Parallel Computing
ACM SIGSIM-PADS '22 (GA, Atlanta, USA, 2022-06-08 - 2022-06-10)
In: SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation 2022
DOI: 10.1145/3518997.3534986
MD-Bench: A Generic Proxy-App Toolbox for State-of-the-Art Molecular Dynamics Algorithms
In: Parallel Processing and Applied Mathematics (PPAM 2022), Springer, Cham, 2023, p. 321-332 (Lecture Notes in Computer Science (LNCS), Vol. 13826)
ISBN: 978-3-031-30441-5
DOI: 10.1007/978-3-031-30442-2_24
Performance Tools
The group develops, validates, and maintains simple open-source tools that support performance analysis, the creation of performance models, and the performance engineering process on the compute-node level.
The well-known tool collection LIKWID (http://tiny.cc/LIKWID) comprises various tools for the controlled execution of applications on modern compute nodes with complex topologies and adaptive runtime parameters. By measuring suitable hardware metrics, LIKWID enables a detailed analysis of the hardware usage of application programs and is thus pivotal for the validation of performance models and the identification of performance patterns. Support for derived metrics such as attained main memory bandwidth requires continuous adaptation and validation of this tool for new computer architectures.
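The idea behind a derived metric can be sketched as follows: raw event counts are combined with the runtime into a human-readable quantity. The counter names below follow the Intel-uncore DRAM CAS-counter convention and are an assumed example for illustration, not LIKWID's actual internal code:

```python
CACHE_LINE_BYTES = 64  # one DRAM CAS event transfers one cache line

def derived_mem_bandwidth(cas_reads, cas_writes, runtime_s):
    """Derived metric: attained main memory bandwidth in GB/s,
    computed from raw DRAM read/write counter values and runtime."""
    bytes_total = (cas_reads + cas_writes) * CACHE_LINE_BYTES
    return bytes_total / runtime_s / 1e9

# 1.5e9 cache-line transfers in one second -> 96 GB/s
bw = derived_mem_bandwidth(cas_reads=1_000_000_000,
                           cas_writes=500_000_000,
                           runtime_s=1.0)
```

Because the underlying event names and their semantics change between processor generations, formulas of this kind must be re-validated for every new architecture, which is exactly the maintenance effort described above.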
The automatic generation of Roofline and ECM models for simple kernels is the purpose of the Kerncraft Tool (http://tiny.cc/kerncraft). An important component of Kerncraft is the OSACA tool (Open Source Architecture Code Analyzer), which is responsible for the single core analysis and runtime prediction of an existing assembly code (http://tiny.cc/OSACA). For all the tools mentioned above, we aim to support as many relevant hardware architectures as possible (Intel/AMD x86, ARM-based processors, IBM Power, NVIDIA GPU).
Based on LIKWID and the existing experience in performance analysis, the group is also pushing forward work on job-specific performance monitoring. The goal is to develop web-based administrative tools such as ClusterCockpit (http://tiny.cc/ClusterCockpit), which will make it much easier for users and administrators to identify bottlenecks in cluster jobs. ClusterCockpit is currently being tested at RRZE and other centers.
Hardware-efficient building blocks for sparse linear algebra and stencil solvers
The solution of large sparse systems of equations and eigenvalue problems is typically carried out with iterative methods. This research area deals with the efficient implementation, optimization, and parallelization of the most important basic building blocks of such iterative solvers. The focus is on the multiplication of a large sparse matrix with one or more vectors (SpMV). Both matrix-free representations of regular matrices, such as those occurring in the discretization of partial differential equations ("stencils"), and the generic case of a general SpMV with a stored matrix are considered. Our work on the development and implementation of optimized building blocks for SpMV-based solvers includes hardware-efficient algorithms, data access optimizations (spatial and temporal blocking), and efficient and portable data structures. Our structured performance engineering process is employed in this context.
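The general SpMV with a stored matrix can be sketched with the common CSR (compressed sparse row) layout. This is a plain reference version for illustration, not the group's optimized implementation:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A stored in CSR format:
    values  - nonzero entries, stored row by row
    col_idx - column index of each nonzero entry
    row_ptr - offset of each row's first entry (length n_rows + 1)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[j] * x[col_idx[j]]
    return y

# 3x3 example matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
values  = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0])
```

The irregular, indirect access to `x` via `col_idx` is what makes SpMV memory-bound and motivates the blocking and data-layout optimizations mentioned above.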
Publications:
Performance engineering for tall & skinny matrix multiplication kernels on GPUs
13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 2019-09-08 - 2019-09-11)
In: Lecture Notes in Computer Science (LNCS), Vol. 12043, Cham, 2020
DOI: 10.1007/978-3-030-43229-4_43
YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures
19th IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2021 (Virtual, Korea, KOR, 2021-02-27 - 2021-03-03)
In: Jae W. Lee, Mary Lou Soffa, Ayal Zaks (ed.): CGO 2021 - Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization 2021
DOI: 10.1109/CGO51591.2021.9370316
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3223512
HPC/HPDA Research Software Engineering
Increasing computing power and growing data volumes enable significant improvements of the mathematical models used in various applications. We investigate the fusion of classical model-driven and data-driven approaches and their implementation in modern, open-source research software. Our focus lies on code generation technology for numerical solvers and machine learning methods on (block-)structured data.
Completed Projects
-
Weiterentwicklung des Hochleistungsrechnens (Advancement of High-Performance Computing)
(Third Party Funds Single)
Term: 2022-01-01 - 2022-12-31
Funding source: other funding organization
-
Energy Oriented Center of Excellence: toward exascale for energy (Performance evaluation, modelling and optimization)
(Third Party Funds Group – Sub project)
Overall project: Energy Oriented Center of Excellence: toward exascale for energy
Term: 2019-01-01 - 2021-12-31
Funding source: European Union (EU)
-
Metaprogrammierung für Beschleunigerarchitekturen (Metaprogramming for Accelerator Architectures)
(Third Party Funds Single)
Term: 2017-01-01 - 2019-12-31
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
In Metacca, the AnyDSL framework is extended into a homogeneous programming environment for heterogeneous single- and multi-node systems. UdS will extend the AnyDSL compiler and type system to enable productive programming of accelerators. Building on this, LSS will develop suitable abstractions for distribution and synchronization on single- and multi-node machines in the form of a DSL within AnyDSL. All components are supported by performance models (RRZE). A runtime environment with built-in performance profiling handles resource management and system configuration. The resulting framework will be evaluated with two applications: ray tracing (DFKI) and bioinformatics (JGU). Target platforms are single nodes and clusters with multiple accelerators (CPUs, GPUs, Xeon Phi).
The University of Erlangen-Nürnberg is mainly responsible for the support of distributed programming (LSS) as well as for the development and implementation of supporting performance models and an integrated profiling component (RRZE). In both areas, a requirements analysis will be carried out at the beginning to plan further steps and coordinate them with the partners.
In the first year, LSS will implement the distribution of data structures. Subsequently, the work will focus on the implementation of synchronization mechanisms. In the final year, code transformations will be designed to adapt the distribution and synchronization concepts in AnyDSL to the chosen applications. RRZE will first integrate the kerncraft framework into the partial evaluation; kerncraft will be extended to support current accelerator architectures as well as models for distributed-memory parallelization. In two further work packages, a resource manager and a LIKWID-based profiling component will be implemented.
-
Process-Oriented Performance Engineering Service Infrastructure for Scientific Software at German HPC Centers
(Third Party Funds Single)
Term: 2017-01-01 - 2019-12-31
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
URL: https://blogs.fau.de/prope/
The ProPE project will deploy a prototype HPC user support infrastructure as a distributed, cross-site collaborative effort of several tier-2/3 centers with complementary HPC expertise. Within ProPE, code optimization and parallelization of scientific software is seen as a structured, well-defined process with sustainable outcome. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering (PE) process. This PE process defines and drives code optimization and parallelization as a target-oriented, structured process. Application hot spots are identified first and then optimized/parallelized in an iterative cycle: starting with an analysis of the algorithm, the code, and the target hardware, a hypothesis of the performance-limiting factors is proposed based on performance patterns and models. Performance measurements validate or guide the iterative adaptation of the hypothesis. After validation of the hardware bottleneck, appropriate code changes are deployed and the PE cycle restarts. The level of detail of the PE process can be adapted to the complexity of the underlying problem and the experience of the HPC analyst. Currently this process is applied by experts and at the prototype level. ProPE will formalize and document the PE process and apply it to various scenarios (single core/node optimization, distributed parallelization, I/O-intensive problems). Different abstraction levels of the PE process will be implemented and disseminated to HPC analysts and application developers via user support projects, teaching activities, and web documentation.
The integration of the PE process into modern IT infrastructure across several centers with different HPC support expertise will be the second project focus. All components of the PE process will be coordinated and standardized across the partnering sites. This way the complete HPC expertise within ProPE can be offered as a coherent service on a nationwide scale, and ongoing support projects can be transferred easily between participating centers. In order to identify low-performing applications, characterize application loads, and quantify benefits of the PE activities at a system level, ProPE will employ a system monitoring infrastructure for HPC clusters. This tool will be tailored to the requirements of the PE process and designed for easy deployment and usage at tier-2/3 centers. The associated ProPE partners will ensure the embedding into the German HPC infrastructure and provide basic PE expertise in terms of algorithmic choices, perfectly complementing the code optimization and parallelization efforts of ProPE.
-
Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen (Self-Adaptation for Time-Step-Based Simulation Techniques on Heterogeneous HPC Systems)
(Third Party Funds Single)
Term: 2017-03-01 - 2020-02-29
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
The SeASiTe research project takes on the task of systematically investigating self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox with which programmers can equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation with respect to relevant system and program parameters as well as possible program transformations.
The optimization of program execution for several non-functional goals (e.g., runtime or energy consumption) is to build on performance modeling to narrow down the search space of efficient program variants. Application-independent methods and strategies for self-adaptation are to be encapsulated in an autotuning navigator. The Erlangen subproject initially addresses the model-based understanding of autotuning methods for regular simulation algorithms, using several common stencil classes as examples. With the help of extended performance models, structured guidelines and recommendations for the autotuning process regarding relevant code transformations and the restriction of the search space for optimization parameters will be developed and prepared exemplarily for the autotuning navigator.
The second focus of the work is the extension of existing analytic performance models and software tools to new computer architectures and their integration into the autotuning navigator. In addition, the Erlangen group maintains the demonstrator for stencil codes and contributes to the design of the AT navigator and the definition of interfaces.
-
SPP EXA 1648
(Third Party Funds Group – Sub project)
Overall project: SPP EXA 1648
Term: 2016-01-01 - 2019-12-31
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
-
EXASTEEL II - Bridging Scales for Multiphase Steels
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
URL: http://www.numerik.uni-koeln.de/14079.html
In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand-challenge problem from computational materials science.
There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous (dual-phase) steels, attempting to predict the macroscopic properties of new materials from those on the microscopic level. The goal is to develop algorithms and software towards a virtual laboratory for predictive material testing in silico. A bottleneck is the computational complexity of the multiscale models needed to describe the new materials, involving sufficiently accurate micromechanically motivated models on the crystalline scale. Therefore, new ultra-scalable nonlinear implicit solvers will be developed and combined with a highly parallel computational scale-bridging approach (FE^2), intertwined with consequent and permanent performance engineering, to bring the challenging engineering application of a virtual laboratory for material testing and design to extreme-scale computing. We envisage a continuously increasing transition from descriptive to predictive macroscopic simulations and take into account, to the best of our knowledge for the first time within a computational scale-bridging approach, the polycrystalline nature of dual-phase steels including grain-boundary effects at the microscale.
Our goals could not be reached without building on the algorithm and software infrastructure from EXASTEEL-1. We will complete the paradigm shift, begun in the EXASTEEL-1 project, from Newton-Krylov solvers to nonlinear methods (and their composition) with improved concurrency and reduced communication. By combining nonlinear domain decomposition with multigrid methods we plan to leverage the scalability of both implicit solver approaches for nonlinear methods.
Although our application is specific, the algorithms and optimized software will have an impact well beyond the particular application. Nonlinear implicit solvers are at the heart of many simulation codes, and our software building blocks PETSc, BoomerAMG, PARDISO, and FEAP are all software packages with a large user base. The advancement of these software packages is explicitly planned for in the work packages of this project.
The project thus addresses computational algorithms (nonlinear implicit solvers and scale bridging), application software, and programming (PE, hybrid programming, accelerators).
-
Equipping Sparse Solvers for Exascale II (ESSEX-II)
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
URL: https://blogs.fau.de/essex/activities
The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction boundaries separating these layers are broken in ESSEX-II by strongly integrating objectives: scalability, numerical reliability, fault tolerance, and holistic performance and power engineering. Driven by Moore's Law and power dissipation constraints, computer systems will become more parallel and heterogeneous even on the node level in upcoming years, further increasing overall system parallelism. MPI+X programming models can be adapted in flexible ways to the underlying hardware structure and are widely expected to be able to address the challenges of the massively multi-level parallel heterogeneous architectures of the next decade. Consequently, the parallel building blocks layer supports MPI+X, with X being a combination of node-level programming models able to fully exploit hardware heterogeneity, functional parallelism, and data parallelism. In addition, facilities for fully asynchronous checkpointing, silent data corruption detection and correction, performance assessment, performance model validation, and energy measurements will be provided. The algorithms layer will leverage the components in the building blocks layer to deliver fully heterogeneous, automatically fault-tolerant, and state-of-the-art implementations of Jacobi-Davidson eigensolvers, the Kernel Polynomial Method (KPM), and Chebyshev Time Propagation (ChebTP) that are ready to use for production on modern heterogeneous compute nodes with best performance and numerical accuracy. Chebyshev filter diagonalization (ChebFD) and a Krylov eigensolver complement these implementations, and the recent FEAST method will be investigated and further developed for improved scalability. The applications layer will deliver scalable solutions for conservative (Hermitian) and dissipative (non-Hermitian) quantum systems with strong links to optics and biology and to novel materials such as graphene and topological insulators. Extending its predecessor project, ESSEX-II adopts an additional focus on production-grade software. Although the selection of algorithms is strictly motivated by quantum physics application scenarios, the underlying research directions of algorithmic and hardware efficiency, accuracy, and resilience will radiate into many fields of computational science. Most importantly, all developments will be accompanied by an uncompromising performance engineering process that will rigorously expose any discrepancy between expected and observed resource efficiency.
-
ESSEX - Equipping Sparse Solvers for Exascale
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2012-11-01 - 2019-06-30
Funding source: DFG / Schwerpunktprogramm (SPP)
The ESSEX project investigates the computational issues arising for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers, which comprise building blocks, algorithms, and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault tolerance mechanisms are the driving forces behind all developments. Classic Krylov, Jacobi-Davidson, and recent FEAST methods will be enabled for exascale computing and equipped with advanced, scalable preconditioners. New implementations of domain-specific iterative schemes in physics and chemistry, namely the established Chebyshev expansion techniques for the computation of spectral properties and their novel extension to the time evolution of driven quantum systems, complement these algorithms. The software solutions of the ESSEX project will be combined into an Exascale Sparse Solver Repository ("ESSR"), where the specific demands of the quantum physics users are recognized by integration of quantum state encoding techniques at the fundamental level. The relevance of this project can then be demonstrated through application of the ESSR algorithms to graphene-based structures, topological insulators, and quantum Hall effect devices. Such studies require exascale resources together with modern numerical methods to determine many eigenstates at a given point of the spectrum of extremely large matrices or to compute an approximation to their full spectrum. The concepts, methods, and software building blocks developed in the ESSEX project serve as general blueprints for other scientific application areas that depend on sparse iterative algorithms.
The strong vertical interaction between all three project layers ensures that the user can quickly utilize any progress on the lower layers and immediately use the power of exascale machines once they become available.
-
ESSEX - Equipping Sparse Solvers for Exascale
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2012-11-01 - 2015-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
The ESSEX project investigates the computational issues arising for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers, which comprise building blocks, algorithms, and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault tolerance mechanisms are the driving forces behind all developments. Classic Krylov, Jacobi-Davidson, and recent FEAST methods will be enabled for exascale computing and equipped with advanced, scalable preconditioners. New implementations of domain-specific iterative schemes in physics and chemistry, namely the established Chebyshev expansion techniques for the computation of spectral properties and their novel extension to the time evolution of driven quantum systems, complement these algorithms. The software solutions of the ESSEX project will be combined into an Exascale Sparse Solver Repository ("ESSR"), where the specific demands of the quantum physics users are recognized by integration of quantum state encoding techniques at the fundamental level. The relevance of this project can then be demonstrated through application of the ESSR algorithms to graphene-based structures, topological insulators, and quantum Hall effect devices. Such studies require exascale resources together with modern numerical methods to determine many eigenstates at a given point of the spectrum of extremely large matrices or to compute an approximation to their full spectrum. The concepts, methods, and software building blocks developed in the ESSEX project serve as general blueprints for other scientific application areas that depend on sparse iterative algorithms.
The strong vertical interaction between all three project layers ensures that the user can quickly utilize any progress on the lower layers and immediately use the power of exascale machines once they become available.
-
EXASTEEL - Bridging Scales for Multiphase Steels
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2012-11-01 - 2015-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
This project addresses algorithms and software for the simulation of three-dimensional multiscale materials-science problems on the future supercomputers developed for exascale computing. The performance of modern high-strength steels is governed by the complex interaction of the individual constituents on the microscale. Direct computational homogenization schemes such as the FE^2 method allow for high-fidelity material design and analysis of modern steels. Using this approach, fluctuations of the local field equations (balance laws) can be resolved to a high accuracy, which is needed for the prediction of failure of such micro-heterogeneous materials. Performing the scale bridging within the FE^2 method for realistic problems in 3D still requires new ultra-scalable, robust algorithms and solvers, which have to be developed and incorporated into new application software. Such algorithms must be specifically designed to allow the efficient use of the future hardware. Here, the direct multiscale approach (FE^2) will be combined with new, highly efficient, parallel solver algorithms. For the latter, a hybrid algorithmic approach will be taken, combining non-overlapping parallel domain decomposition (FETI) methods with efficient parallel multigrid preconditioners. A comprehensive performance engineering approach will be implemented, guided by the PI Wellein, to ensure a systematic optimization and parallelization process across all software layers. This project builds on parallel simulation software developed for the solution of complex nonlinear structural mechanics problems by the PIs Schröder, Balzani and Klawonn, Rheinbach. It is based on the application software package FEAP (Finite Element Analysis Program, R. Taylor, UC Berkeley).
Within a new software environment, FEAP has been combined with a FETI-DP domain decomposition solver based on PETSc (Argonne National Laboratory) and hypre (Lawrence Livermore National Laboratory), e.g., to perform parallel simulations in nonlinear biomechanics. The optimization, performance modeling, and performance engineering will be guided by the PI Wellein. The PIs Schröder and Balzani have performed FE^2 simulations in the past using an extended version of FEAP. The envisioned scale bridging for realistic, advanced engineering problems in three dimensions will require a computational power which will only be obtainable when exascale computing becomes available.
-
TERRA-NEO - Integrated Co-Design of an Exascale Earth Mantle Modeling Framework
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2012-11-01 - 2015-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)Much of what one refers to as geological activity of the Earth is due to the fact that heat is transported from the interior of our planet to the surface in a planetwide solid-state convection in the Earth’s mantle. For this reason, the study of the dynamics of the mantle is critical to our understanding of how the entire planet works. Processes from earthquakes, plate tectonics, crustal evolution to the geodynamo are governed by convection in the mantle. Without a detailed knowledge of Earth‘s internal dynamic processes, we cannot hope to deduce the many interactions between shallow and deep Earth processes that dominate the Earth system. The vast forces associated with mantle convection cells drive horizontal movement of Earth’s surface in the form of plate tectonics, which is well known albeit poorly understood. They also induce substantial vertical motion in the form of dynamically maintained topography that manifests itself prominently in the geologic record through sea level variations and their profound impact on the ocean and climate system. Linking mantle processes to their surface manifestations is seen widely today as one of the most fundamental problems in the Earth sciences, while being at the same time a matter of direct practical relevance through the evolution of sedimentary basins and their paramount economical importance.Simulating Earth mantle dynamics requires a resolution in space and time that makes it one of the grand challenge applications in the computational sciences. With exascale systems of the future it will be possible to advance beyond the deterministic forward problem to a stochastic uncertainty analysis for the inverse problem. 
In fact, fluid-dynamic inverse theory is now at hand that will allow us to track mantle motion back into the past, exploiting the rich constraints available from the geologic record, subject to the availability of powerful geodynamical simulation software that can take advantage of these future supercomputers.
The new community code TERRA-NEO will be based on a carefully designed multi-scale space-time discretization using hybridized Discontinuous Galerkin elements on an icosahedral mesh with block-wise refinement. This advanced finite element technique promises better stability and higher accuracy for the non-linear transport processes in the Earth mantle while requiring less communication in a massively parallel setting. The resulting algebraic systems, eventually with more than 10^12 unknowns per time step, will be solved by a new class of communication-avoiding, asynchronous multigrid preconditioners that will achieve maximal scalability and resource-optimized computational performance. A non-deterministic control flow and a lazy evaluation strategy will alleviate the traditional over-synchronization of hierarchical iterative methods and will support advanced resiliency techniques on the algorithmic level.
The software framework of TERRA-NEO will be developed specifically for the upcoming heterogeneous exascale computers by using an advanced architecture-aware design process. Special white-box performance models will guide the software development, leading to a holistic co-design of the data structures and the algorithms on all levels. With this systematic performance-engineering methodology we will also optimize a balanced compromise between minimal energy consumption and shortest run time.
This consortium is fully committed to the interdisciplinary collaboration that is necessary for creating TERRA-NEO as a new exascale simulation framework.
To this end, TERRA-NEO brings together top experts covering all aspects of CS&E, from modeling via discretization to solvers and software engineering for exascale architectures.
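The core idea of the multigrid preconditioners mentioned above is to combine cheap smoothing on a fine grid with a recursively solved coarse-grid correction. The following pure-Python sketch shows a textbook V-cycle for the 1D Poisson problem; it is only an illustration of the principle, not TERRA-NEO's communication-avoiding asynchronous multigrid.

```python
import math

def smooth(u, f, h, sweeps):
    """Damped-Jacobi sweeps: cheap, and very effective on high-frequency error."""
    n = len(u)
    for _ in range(sweeps):
        new = [0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]) for i in range(1, n - 1)]
        for i in range(1, n - 1):
            u[i] += (2.0 / 3.0) * (new[i - 1] - u[i])   # classic 2/3 damping in 1D
    return u

def residual(u, f, h):
    n = len(u)
    r = [0.0] * n
    for i in range(1, n - 1):
        r[i] = f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full-weighting restriction of a fine-grid residual to the coarse grid."""
    m = (len(r) + 1) // 2
    return [0.0] + [0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
                    for i in range(1, m - 1)] + [0.0]

def prolong(e, n_fine):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    out = [0.0] * n_fine
    for i in range(1, len(e) - 1):
        out[2 * i] = e[i]
    for i in range(1, n_fine - 1, 2):
        out[i] = 0.5 * (out[i - 1] + out[i + 1])
    return out

def v_cycle(u, f, h):
    """Recursive V-cycle: smooth, restrict the residual, recurse, correct, smooth."""
    n = len(u)
    if n <= 3:                                   # coarsest grid: solve directly
        u[1] = 0.5 * h * h * f[1]
        return u
    u = smooth(u, f, h, 3)
    coarse_e = v_cycle([0.0] * ((n + 1) // 2), restrict(residual(u, f, h)), 2 * h)
    e = prolong(coarse_e, n)
    for i in range(n):
        u[i] += e[i]
    return smooth(u, f, h, 3)

# model problem: -u'' = pi^2 sin(pi x), u(0) = u(1) = 0, exact u = sin(pi x)
n = 65
h = 1.0 / (n - 1)
f = [math.pi ** 2 * math.sin(math.pi * i * h) for i in range(n)]
u = [0.0] * n
for _ in range(20):
    u = v_cycle(u, f, h)
err = max(abs(u[i] - math.sin(math.pi * i * h)) for i in range(n))
res = max(abs(x) for x in residual(u, f, h))
```

The attraction for extreme scales is that each V-cycle costs O(n) work and reduces the error by a grid-independent factor; the communication bottleneck on the coarse grids is precisely what the asynchronous variants developed in TERRA-NEO target.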
-
A Fault-Tolerant Environment for Peta-Scale MPI Solvers
(Third Party Funds Group – Sub project)
Overall project: FEToL
Term: 2011-06-01 - 2014-05-31
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
-
SKALB: Lattice-Boltzmann-Methoden für skalierbare Multi-Physik-Anwendungen
(Third Party Funds Group – Overall project)
Term: 2009-01-01 - 2011-12-31
Funding source: BMBF / Verbundprojekt, Bundesministerium für Bildung und Forschung (BMBF)
The goal of the BMBF-funded project SKALB (lattice-Boltzmann methods for scalable multi-physics applications) is the efficient implementation and further development of lattice-Boltzmann-based flow solvers for the simulation of complex multi-physics applications on petascale-class computers. The lattice-Boltzmann method is an accepted solution technique in computational fluid dynamics. A central advantage of the method is the inherent simplicity of the numerical scheme, which allows both complex flow geometries such as porous media or metal foams and direct numerical simulations (DNS) of turbulent flows to be computed efficiently. In the SKALB project, lattice-Boltzmann applications are to be advanced methodologically and technically for the new classes of massively parallel heterogeneous and homogeneous supercomputers. RRZE contributes its long-standing experience in performance modeling and efficient implementation of lattice-Boltzmann methods on a broad spectrum of modern computers, and additionally investigates new programming approaches for multi-/many-core processors. The application code further developed at RRZE will be used together with the group of Prof. Schwieger for massively parallel simulations of flows in porous media.
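The simplicity referred to above is the collide-and-stream structure of the lattice-Boltzmann kernel, which is also what makes such codes amenable to performance modeling. A minimal pure-Python sketch for the simplest possible case, a D1Q3 lattice with BGK collision simulating pure diffusion on a periodic domain; production codes like those in SKALB use D3Q19-type stencils and propagation-optimized data layouts.

```python
W = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]   # lattice weights for velocities -1, 0, +1
OMEGA = 1.0                              # BGK relaxation rate (sets the diffusivity)

def step(f):
    """One collide-and-stream update on a periodic 1D lattice.
    f[i][x] holds the distribution for velocity direction i at cell x."""
    n = len(f[0])
    # collision: relax every distribution toward its local equilibrium W[i]*rho
    rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
    post = [[f[i][x] - OMEGA * (f[i][x] - W[i] * rho[x]) for x in range(n)]
            for i in range(3)]
    # streaming: shift each distribution one cell along its lattice velocity
    return [
        [post[0][(x + 1) % n] for x in range(n)],   # velocity -1: pull from the right
        post[1],                                    # velocity  0: stays put
        [post[2][(x - 1) % n] for x in range(n)],   # velocity +1: pull from the left
    ]

# all mass initially concentrated in one cell; it should diffuse outward
n = 64
f = [[W[i] * (1.0 if x == n // 2 else 0.0) for x in range(n)] for i in range(3)]
for _ in range(100):
    f = step(f)
rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
```

Because the update touches every cell with a fixed, local access pattern, kernels of this kind are memory-bandwidth bound on most architectures, which is exactly where the performance-modeling expertise of RRZE applies.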