Research Focus

Our activities are in the following research fields:

Performance Engineering
Performance Modeling
Performance Tools
Hardware-efficient building blocks for sparse linear algebra and stencil solvers
HPC/HPDA Research Software Engineering

Performance Engineering

Performance Engineering (PE) is a structured, model-based process for optimizing and parallelizing basic operations, algorithms, and application codes on modern compute architectures. The process is divided into analysis, modeling, and optimization phases, which are iterated for each homogeneous code section until optimal or satisfactory performance is achieved. The first step of the analysis is to form a hypothesis about which aspect of the architecture (the bottleneck) limits the execution speed of the software. Typical bottlenecks can be identified qualitatively with application-independent performance patterns, each of which is described by a set of observable runtime characteristics. Suitable performance models then describe the interaction of the application with the given hardware architecture analytically and quantitatively.

The model thus indicates the maximum expected performance and the potential runtime improvement from appropriate code modifications. If measurements do not validate the model predictions, the underlying model assumptions are revisited and refined or adjusted as necessary. Based on the model, optimizations can be planned and their performance gain assessed a priori. The PE approach is not limited to standard microprocessor architectures; it can also be used for projections to future computer architectures. The group focuses mainly on the compute node, where analytic performance models such as the Roofline model and the Execution-Cache-Memory (ECM) model are used.
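As a minimal sketch of the simplest of these node-level models, the Roofline model bounds the attainable performance of a loop by P = min(P_peak, I * b_S). Here I is the computational intensity in flop/byte and b_S is the achievable memory bandwidth. The triad kernel, the write-allocate accounting, and the machine numbers below are illustrative assumptions, not measurements of any particular system.

```python
def roofline(peak_flops: float, mem_bw: float, intensity: float) -> float:
    """Roofline upper bound on performance in flop/s.

    peak_flops: peak arithmetic performance [flop/s]
    mem_bw:     achievable memory bandwidth [byte/s]
    intensity:  computational intensity of the loop [flop/byte]
    """
    return min(peak_flops, intensity * mem_bw)


def triad_intensity(write_allocate: bool = True) -> float:
    """Intensity of the double-precision triad a[i] = b[i] + s * c[i].

    Two flops per iteration; 3 x 8 bytes of explicit traffic, plus 8 bytes
    if the store to a[] triggers a write-allocate read of its cache line.
    """
    bytes_per_it = 3 * 8 + (8 if write_allocate else 0)
    return 2 / bytes_per_it


if __name__ == "__main__":
    # Hypothetical node: 2 Tflop/s peak, 200 GB/s memory bandwidth.
    i = triad_intensity()
    p = roofline(2.0e12, 200e9, i)
    print(f"I = {i:.4f} flop/byte, P_max = {p / 1e9:.1f} Gflop/s")
```

With these numbers the triad is strongly memory-bound (12.5 Gflop/s against a 2 Tflop/s peak). Avoiding the write-allocate transfer would raise the bound by a factor of 4/3.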

Projects:

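ProPE: Process-Oriented Performance Engineering Service Infrastructure for Scientific Software at German HPC Centers
Term: 2017-01-01 - 2019-12-31
Funding source: DFG individual funding / research grant (EIN-SBH)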
Project leader: Gerhard Wellein

Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

The ProPE project will deploy a prototype HPC user support infrastructure as a distributed, cross-site collaborative effort of several tier-2/3 centers with complementary HPC expertise. Within ProPE, code optimization and parallelization of scientific software is seen as a structured, well-defined process with sustainable outcomes. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering…

SPPEXA: EXASTEEL II - Bridging Scales for Multiphase Steels
Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Priority Programme (SPP)
Project leader: Gerhard Wellein

In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.

There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous…

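SeASiTe: Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen (Self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems)
Term: 2017-03-01 - 2020-02-29
Funding source: Federal Ministry of Research, Technology and Space (BMFTR)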
Project leader: Gerhard Wellein

The SeASiTe research project undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox that programmers can use to equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation both with respect to relevant system and program parameters and…

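EoCoE-II: Energy Oriented Center of Excellence: toward exascale for energy (Performance evaluation, modelling and optimization)
Term: 2019-01-01 - 2021-12-31
Funding source: European Union (EU)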
Project leader: Gerhard Wellein

StroemungsRaum: Der skalierbare Strömungsraum (The scalable flow space)
Term: 2022-09-01 - 2025-08-31
Funding source: BMFTR / joint project
Project leader: Gerhard Wellein

Upcoming exascale computer architectures will be characterized by a very large number of heterogeneous hardware components, which will also include special-purpose processors or accelerators. Realizing CFD application software, the central core component of today's flow simulations in industrial settings, on such systems requires highly scalable methods on the methodological side, above all for solving the high-dimensional and transient (non)linear systems of equations, which additionally…

EoCoE-III: Fostering the European Energy Transition with Exascale
Term: 2024-01-01 - 2026-12-31
Funding source: EU / Cluster 4: Digital, Industry and Space
Project leader: Gerhard Wellein

The Energy-oriented Centre of Excellence for exascale HPC applications (EoCoE-III) applies cutting-edge computational methods in its mission to foster the transition to decarbonized energy in Europe. EoCoE-III is anchored both in the High Performance Computing (HPC) community and in the energy field. It will demonstrate the benefit of HPC for the net-zero energy transition to research institutes and to key industries in the energy sector. The present project will draw on the experience of two…

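DaREXA-F: DatenREduktion für Exascale-Anwendungen in der Fusionsforschung (Data reduction for exascale applications in fusion research)
Term: 2022-09-01 - 2025-08-31
Funding source: BMFTR / joint project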
Project leader: Gerhard Wellein

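NHR22-04–11-ER: Weiterentwicklung des Hochleistungsrechnens (Advancement of high-performance computing)
Term: 2022-01-01 - 2022-12-31
Funding source: other funding organization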
Project leader: Gerhard Wellein

Publications:

2024

Laukemann J., Gruber T., Hager G., Oryspayev D., Wellein G.:
CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion
38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, CA, 2024-05-27 - 2024-05-31)
In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
DOI: 10.1109/IPDPS57955.2024.00038
Owen H., Ernst D., Gruber T., Lehmkuhl O., Houzeaux G., Gasparino L., Wellein G.:
Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs
38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, CA, 2024-05-27 - 2024-05-31)
In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
DOI: 10.1109/IPDPS57955.2024.00043

2023

Afzal A., Hager G., Wellein G.:
SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study
14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) (Denver, CO, USA, 2023-11-12 - 2023-11-17)
In: 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) 2023
DOI: 10.1145/3624062.3624197
Ernst D., Holzer M., Hager G., Knorr M., Wellein G.:
Analytical performance estimation during code generation on modern GPUs
In: Journal of Parallel and Distributed Computing 173 (2023), p. 152-167
ISSN: 0743-7315
DOI: 10.1016/j.jpdc.2022.11.003
Ravedutti Lucio Machado R., Eitzinger J., Laukemann J., Hager G., Köstler H., Wellein G.:
MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms
In: Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications (2023)
ISSN: 0167-739X
DOI: 10.1016/j.future.2023.06.023

2022

Alappat C., Hager G., Schenk O., Wellein G.:
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3223512

2021

Alappat C., Meyer N., Laukemann J., Gruber T., Hager G., Wellein G., Wettig T.:
Execution-Cache-Memory modeling and performance tuning of sparse matrix-vector multiplication and Lattice quantum chromodynamics on A64FX
In: Concurrency and Computation-Practice & Experience (2021)
ISSN: 1532-0626
DOI: 10.1002/cpe.6512
URL: https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.6512

2020

Ernst D., Hager G., Thies J., Wellein G.:
Performance engineering for a tall & skinny matrix multiplication kernel on GPUs
13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 2019-09-08 - 2019-09-11)
In: Lecture Notes in Computer Science book series (LNCS, volume 12043), Cham: 2020
DOI: 10.1007/978-3-030-43229-4_43
Klawonn A., Lanser M., Rheinbach O., Wellein G., Wittmann M.:
Energy efficiency of nonlinear domain decomposition methods
In: International Journal of High Performance Computing Applications (2020)
ISSN: 1094-3420
DOI: 10.1177/1094342020953891

2019

Bauer M., Hötzer J., Ernst D., Hammer J., Seiz M., Hierl H., Hönig J., Köstler H., Wellein G., Nestler B., Rüde U.:
Code generation for massively parallel phase-field simulations
2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 (Denver, CO, 2019-11-17 - 2019-11-22)
In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019
DOI: 10.1145/3295500.3356186

2018

Anzt H., Kreutzer M., Ponce E., Peterson GD., Wellein G., Dongarra J.:
Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
In: International Journal of High Performance Computing Applications 32 (2018), p. 220-230
ISSN: 1094-3420
DOI: 10.1177/1094342016646844
Kreutzer M., Ernst D., Bishop AR., Fehske H., Hager G., Nakajima K., Wellein G.:
Chebyshev filter diagonalization on modern manycore processors and GPGPUs
Springer Verlag, 2018
ISBN: 9783319920399
DOI: 10.1007/978-3-319-92040-5_17
Laukemann J., Hammer J., Hofmann J., Hager G., Wellein G.:
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
2018 ACM/IEEE Supercomputing Conference (Dallas, TX, 2018-11-12 - 2018-11-12)
In: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
DOI: 10.1109/PMBS.2018.8641578
URL: https://ieeexplore.ieee.org/document/8641578

Performance Modeling

Performance models describe the interaction between an application and the hardware and form the basis for a deep understanding of the application's runtime behavior. The group pursues an analytic approach whose essential components are application models and machine models. These components are initially created independently; combining them yields insight into the bottlenecks and the expected performance. Creating accurate machine models in particular requires in-depth microarchitectural analysis.

The Execution-Cache-Memory (ECM) model developed by the group predicts single-core performance as well as performance scaling within a multicore processor or compute node. Combined with analytic models of electrical power consumption, it can also be used to estimate the energy consumption of an application. The ECM model is a generalization of the well-known Roofline model.
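The arithmetic of the ECM model can be sketched in a few lines, assuming the fully non-overlapping variant. In that variant all inter-cache and memory transfer times add up and overlap only with the in-core time T_OL, so T_ECM = max(T_OL, T_nOL + T_L1L2 + T_L2L3 + T_L3Mem). Multicore performance then scales linearly until the memory bottleneck is hit, at n_s = ceil(T_ECM / T_L3Mem) cores. Times are in cycles per unit of work (for example, one cache line, that is, 8 double-precision iterations). The input values are made up for illustration, and real machines may show partial overlap of transfers.

```python
import math
from typing import List


def ecm_predictions(t_ol: float, t_nol: float, t_data: List[float]) -> List[float]:
    """Single-core ECM runtime predictions (cy per unit of work) for data
    residing in L1, L2, L3, and main memory.

    t_ol:   in-core time that overlaps with data transfers (e.g. arithmetic)
    t_nol:  in-core time that does not overlap (e.g. L1 loads/stores)
    t_data: transfer times [T_L1L2, T_L2L3, T_L3Mem]; in this
            non-overlapping variant they simply add up
    """
    predictions = [max(t_ol, t_nol)]
    acc = t_nol
    for t in t_data:
        acc += t
        predictions.append(max(t_ol, acc))
    return predictions


def multicore_time(t_ecm_mem: float, t_l3mem: float, n_cores: int) -> float:
    """Cy per unit of work on n_cores: linear scaling until the memory
    interface (t_l3mem per unit of work) becomes the bottleneck."""
    return max(t_ecm_mem / n_cores, t_l3mem)


def saturation_cores(t_ecm_mem: float, t_l3mem: float) -> int:
    """Smallest number of cores that saturates the memory bandwidth."""
    return math.ceil(t_ecm_mem / t_l3mem)


if __name__ == "__main__":
    # Hypothetical input {T_OL || T_nOL | T_L1L2 | T_L2L3 | T_L3Mem} = {4 || 2 | 3 | 3 | 8} cy
    preds = ecm_predictions(4, 2, [3, 3, 8])
    print("ECM predictions (L1, L2, L3, Mem) in cy:", preds)        # [4, 5, 8, 16]
    print("saturation at", saturation_cores(preds[-1], 8), "cores")  # 2
```

The same machine-independent application description (the in-core and transfer contributions) can be combined with different machine models simply by changing these inputs.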

Beyond the node level, the group investigates the performance of highly parallel MPI and hybrid applications, especially those without frequent synchronizing operations. Such applications can show highly dynamic behavior because of their interaction with the system's hardware bottlenecks, such as memory and network bandwidth. As a consequence, simply adding up runtime models for the different phases of an application is often inaccurate. We extend existing node-level and communication models to describe effects such as desynchronization, resynchronization, and idle wave propagation.
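A toy simulation can show what an idle wave is. The sketch assumes a chain of processes with next-neighbor communication and no global synchronization. Each rank computes for a fixed time per step and can start the next step only when it and both neighbors have finished the previous one. Communication costs nothing, and there is no noise. A one-off delay on one rank then travels through the chain as a wave of waiting time, one rank per step. This model and its parameters are illustrative assumptions. The group's analytic models additionally account for communication characteristics, cluster topology, and noise, which in reality make such waves decay.

```python
def simulate_idle_wave(n_ranks, n_steps, t_exec, delay_rank, delay_step, delay):
    """Noise-free toy model of a code with next-neighbor communication.

    In every step each rank computes for t_exec and then exchanges data with
    its left and right neighbors (open chain, zero-cost communication
    assumed). Rank i can start step k only after ranks i-1, i, i+1 have
    finished step k-1. A one-off delay is injected on delay_rank in
    delay_step. Returns idle[k][i], the waiting time of rank i at the start
    of step k.
    """
    end = [0.0] * n_ranks  # finish time of the previous step, per rank
    idle = []
    for k in range(n_steps):
        if k == 0:
            start = [0.0] * n_ranks
        else:
            # Wait for own and neighbors' data from the previous step.
            start = [max(end[max(i - 1, 0):i + 2]) for i in range(n_ranks)]
        idle.append([start[i] - end[i] for i in range(n_ranks)])
        end = [start[i] + t_exec + (delay if (i == delay_rank and k == delay_step) else 0.0)
               for i in range(n_ranks)]
    return idle


if __name__ == "__main__":
    for row in simulate_idle_wave(8, 8, 1.0, 0, 0, 5.0):
        # '#' marks a waiting rank: the idle period moves one rank per step.
        print("".join("#" if t > 0 else "." for t in row))
```

Without noise the wave keeps its full amplitude, so this toy cannot reproduce the decay observed on real clusters.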

Publications:

2023

Ravedutti Lucio Machado R., Eitzinger J., Köstler H., Wellein G.:
MD-Bench: A Generic Proxy-App Toolbox for State-of-the-Art Molecular Dynamics Algorithms
In: Parallel Processing and Applied Mathematics. PPAM 2022., Springer, Cham, 2023, p. 321-332 (Lecture Notes in Computer Science (LNCS), Vol.13826)
ISBN: 978-3-031-30441-5
DOI: 10.1007/978-3-031-30442-2_24

2022

Afzal A., Hager G., Wellein G.:
Addressing White-box Modeling and Simulation Challenges in Parallel Computing
ACM SIGSIM-PADS '22 (Atlanta, GA, USA, 2022-06-08 - 2022-06-10)
In: SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation 2022
DOI: 10.1145/3518997.3534986

2021

Afzal A., Hager G., Wellein G.:
Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact
36th International Conference on High Performance Computing, ISC High Performance 2021 (Virtual, Online, 2021-06-24 - 2021-07-02)
In: Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek (ed.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2021
DOI: 10.1007/978-3-030-78713-4_19

2019

Afzal A., Hager G., Wellein G.:
Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 2019-09-23 - 2019-09-26)
In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
DOI: 10.1109/CLUSTER.2019.8890995

Performance Tools

The group develops open-source software in the areas of performance tools, cluster monitoring, and benchmarking.
In the area of performance tools, the group develops the well-known LIKWID tool collection (https://github.com/RRZE-HPC/likwid). It contains tools for the controlled execution of applications on modern compute nodes with complex topologies and adaptive runtime parameters. By measuring appropriate hardware metrics, LIKWID enables a detailed analysis of how application programs use the hardware, which makes it central to validating performance models and identifying performance patterns. Derived metrics, such as the main memory bandwidth used, require that the tool be continuously adapted to and validated on new computer architectures.
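A derived metric such as memory bandwidth is computed from raw hardware event counts. In LIKWID, such formulas are defined in its performance groups (for example the MEM group used with `likwid-perfctr -g MEM`). The sketch below reproduces only the arithmetic of such a metric, assuming event counts of cache lines read from and written to DRAM and a 64-byte cache line. It is not LIKWID's code, and the counts are idealized values for a triad. Comparing measured bytes per iteration with a model prediction is how such a metric helps validate a model, for example to detect a write-allocate transfer.

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines, as on current x86 CPUs


def memory_data_volume(lines_read: int, lines_written: int) -> int:
    """Bytes moved between the memory controller and DRAM."""
    return (lines_read + lines_written) * CACHE_LINE_BYTES


def memory_bandwidth(lines_read: int, lines_written: int, runtime_s: float) -> float:
    """Derived metric: main memory bandwidth in byte/s."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return memory_data_volume(lines_read, lines_written) / runtime_s


def bytes_per_iteration(lines_read: int, lines_written: int, iterations: int) -> float:
    """Measured traffic per loop iteration, to compare with a model prediction."""
    return memory_data_volume(lines_read, lines_written) / iterations


if __name__ == "__main__":
    # Idealized (hypothetical) counts for a double-precision triad
    # a[i] = b[i] + s * c[i] with n = 10^7: b, c, and (via write-allocate)
    # a are read, and a is written back.
    n = 10_000_000
    rd, wr = 3 * n * 8 // 64, n * 8 // 64
    print(bytes_per_iteration(rd, wr, n))         # 32.0 (24 without write-allocate)
    print(memory_bandwidth(rd, wr, 0.002) / 1e9)  # about 160 GB/s
```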
The static code analysis tool OSACA (Open Source Architecture Code Analyzer, https://github.com/RRZE-HPC/OSACA) analyzes assembly code and predicts its runtime within a single compute core.
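The kind of in-core analysis such a tool performs can be sketched with two bounds. The first is the throughput bound, given by the most heavily loaded execution port. The second is the critical path, the longest chain of register dependencies. The machine model here is hypothetical and heavily simplified: the instruction forms, ports, and latencies are made up. Unlike OSACA's per-microarchitecture data and port-assignment strategy, this sketch simply splits each instruction evenly across its eligible ports. Roughly speaking, the two numbers correspond to perfect overlap and to no overlap of the instructions in one iteration.

```python
from collections import defaultdict

# Hypothetical, simplified machine model (not OSACA's data files):
# instruction form -> (eligible execution ports, latency in cycles)
MACHINE_MODEL = {
    "load":  (("2", "3"), 5),
    "fma":   (("0", "1"), 4),
    "store": (("4",), 1),
    "add":   (("0", "1", "5", "6"), 1),
}

# One iteration of a vectorized triad-like loop: (form, destination, sources)
KERNEL = [
    ("load", "ymm1", ["rax"]),                  # ymm1 <- b[i]
    ("load", "ymm2", ["rax"]),                  # ymm2 <- c[i]
    ("fma", "ymm1", ["ymm1", "ymm2", "ymm0"]),  # ymm1 <- ymm1 + ymm2 * s
    ("store", None, ["ymm1", "rax"]),           # a[i] <- ymm1
    ("add", "rax", ["rax"]),                    # advance loop index
]


def port_pressure(kernel):
    """Spread each instruction's one cycle of port occupancy evenly over its ports."""
    pressure = defaultdict(float)
    for form, _dst, _srcs in kernel:
        ports, _latency = MACHINE_MODEL[form]
        for p in ports:
            pressure[p] += 1.0 / len(ports)
    return dict(pressure)


def throughput_cycles(kernel):
    """Throughput bound: the most heavily loaded port limits the loop."""
    return max(port_pressure(kernel).values())


def critical_path(kernel):
    """Longest chain of register dependencies within one iteration (cycles)."""
    ready = {}  # register -> cycle at which its value becomes available
    for form, dst, srcs in kernel:
        _ports, latency = MACHINE_MODEL[form]
        start = max((ready.get(r, 0) for r in srcs), default=0)
        if dst is not None:
            ready[dst] = start + latency
    return max(ready.values(), default=0)


if __name__ == "__main__":
    print("port pressure:", port_pressure(KERNEL))
    print("throughput bound [cy/it]:", throughput_cycles(KERNEL))  # 1.0
    print("critical path    [cy/it]:", critical_path(KERNEL))      # 9
```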
With ClusterCockpit (https://clustercockpit.org/), the group is developing a comprehensive HPC cluster monitoring solution. ClusterCockpit comprises the following components: cc-metric-collector (node agent on the compute nodes), cc-backend (REST API and web server backend, including the web-based user interface), cc-metric-store (in-memory metric database), cc-energy-manager (job-specific control of power capping settings and global power capping for a cluster), and cc-node-controller (setting system parameters at the node level). ClusterCockpit offers both job-centric and node-centric views and is accessible to regular HPC users, support staff, and administrators. ClusterCockpit is in production use at a large number of HPC centers.
Benchmark applications are an important tool for understanding performance-limiting factors and exploring new optimization opportunities. They are used to characterize hardware platforms and in research and teaching. The group is developing “The Bandwidth Benchmark” (https://github.com/RRZE-HPC/TheBandwidthBenchmark), an application for measuring the maximum achievable bandwidth on all levels of the memory hierarchy. MD-Bench (https://github.com/RRZE-HPC/MD-Bench) implements state-of-the-art molecular dynamics algorithms for CPUs and GPUs, including scalable MPI parallelization. SparseBench implements MPI-parallel solvers for sparse systems of equations and supports different sparse matrix storage formats. MachineState (https://github.com/RRZE-HPC/MachineState) collects and stores all performance-related information at the node level and thus contributes to reproducible benchmark results.
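Whether write-allocate transfers are counted changes the bandwidth a benchmark reports for the same runtime. The following sketch shows this accounting for a few streaming kernels. The kernel table is an illustrative assumption, not the actual kernel list or reporting convention of The Bandwidth Benchmark. Double-precision words are assumed.

```python
# Illustrative kernel table (an assumption, not the benchmark's own list):
# name -> (arrays loaded, arrays stored, is the stored array also loaded?)
KERNELS = {
    "copy":   (1, 1, False),  # a[i] = b[i]
    "update": (1, 1, True),   # a[i] = s * a[i]
    "triad":  (2, 1, False),  # a[i] = b[i] + s * c[i]
}


def bytes_transferred(kernel: str, n: int, word_bytes: int = 8,
                      count_write_allocate: bool = True) -> int:
    """Data volume of one sweep over arrays of length n.

    If the stored array is not also read, a write-allocate cache first reads
    its lines from memory before overwriting them; this extra traffic is
    added when count_write_allocate is True.
    """
    loads, stores, store_is_loaded = KERNELS[kernel]
    volume = (loads + stores) * n * word_bytes
    if count_write_allocate and not store_is_loaded:
        volume += stores * n * word_bytes
    return volume


def bandwidth(kernel: str, n: int, runtime_s: float, **kwargs) -> float:
    """Effective bandwidth in byte/s for a measured sweep runtime."""
    return bytes_transferred(kernel, n, **kwargs) / runtime_s
```

For the triad, counting the write-allocate transfer gives 32 instead of 24 bytes per iteration, so the same runtime yields a bandwidth one third higher. A bandwidth figure is therefore only meaningful together with the convention used to compute it.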

Projects:

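SeASiTe: Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen (Self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems)
Term: 2017-03-01 - 2020-02-29
Funding source: Federal Ministry of Research, Technology and Space (BMFTR)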
Project leader: Gerhard Wellein

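NHR22-04–11-ER: Weiterentwicklung des Hochleistungsrechnens (Advancement of high-performance computing)
Term: 2022-01-01 - 2022-12-31
Funding source: other funding organization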
Project leader: Gerhard Wellein

Publications:

2019

Eitzinger J., Gruber T., Afzal A., Zeiser T., Wellein G.:
ClusterCockpit - A web application for job-specific performance monitoring
2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 2019-09-23 - 2019-09-26)
In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
DOI: 10.1109/CLUSTER.2019.8891017
Laukemann J., Hammer J., Hager G., Wellein G.:
Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels
10th IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2019
DOI: 10.1109/PMBS49563.2019.00006

2018

Laukemann J., Hammer J., Hofmann J., Hager G., Wellein G.:
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
2018 ACM/IEEE Supercomputing Conference (Dallas, TX, 2018-11-12 - 2018-11-12)
In: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
DOI: 10.1109/PMBS.2018.8641578
URL: https://ieeexplore.ieee.org/document/8641578

2017

Hammer J., Eitzinger J., Hager G., Wellein G.:
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
10th International Workshop on Parallel Tools for High Performance Computing (Stuttgart, Germany, 2016-10-04 - 2016-10-05)
In: Niethammer C, Gracia J, Hilbrich T, Knüpfer A, Resch MM, Nagel WE (ed.): Tools for High Performance Computing 2016, Cham: 2017

2015

Hammer J., Hager G., Eitzinger J., Wellein G.:
Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft
6th International Workshop in Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2015, held as part of SC15, the 27th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, TX, USA, 2015-11-15)
In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, New York, NY, USA: 2015
DOI: 10.1145/2832087.2832092
URL: http://dl.acm.org/citation.cfm?id=2832087&preflayout=flat
Wellein G., Eitzinger J., Hager G., Röhl T.:
Overhead Analysis of Performance Counter Measurements
43rd International Conference on Parallel Processing Workshops, ICPPW 2014
DOI: 10.1109/ICPPW.2014.34

Hardware-efficient building blocks for sparse linear algebra and stencil solvers

Large sparse systems of equations and eigenvalue problems are typically solved with iterative methods. This research area deals with the efficient implementation, optimization, and parallelization of the most important building blocks of such iterative solvers. The focus is on the multiplication of a large sparse matrix with one or more vectors (SpMV). We consider both matrix-free representations of regular matrices, such as those arising from the discretization of partial differential equations ("stencils"), and the generic case of SpMV with an explicitly stored matrix. Our work on optimized building blocks for SpMV-based solvers includes hardware-efficient algorithms, data access optimizations (spatial and temporal blocking), and efficient, portable data structures. All of this work follows our structured performance engineering process.
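The two representations can be contrasted with a small example. The sketch applies the 1D Laplacian tridiag(-1, 2, -1) (Dirichlet boundaries, chosen here only for illustration) once as a generic SpMV with the matrix stored in Compressed Row Storage (CRS/CSR) and once matrix-free as a stencil. Both give the same result, but the stored variant must also stream the matrix. In a compiled implementation with 8-byte values and 4-byte indices, that is at least 12 bytes of matrix data per nonzero, which the stencil avoids entirely. Plain Python is used for clarity only, not for performance.

```python
from typing import List, Tuple


def crs_spmv(val: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """y = A x for A stored in Compressed Row Storage (CRS/CSR)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            acc += val[j] * x[col_idx[j]]
        y[i] = acc
    return y


def laplace_1d_crs(n: int) -> Tuple[List[float], List[int], List[int]]:
    """Assemble the 1D Laplacian tridiag(-1, 2, -1) in CRS format."""
    val, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, a in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                val.append(a)
                col_idx.append(j)
        row_ptr.append(len(val))
    return val, col_idx, row_ptr


def laplace_1d_stencil(x: List[float]) -> List[float]:
    """Matrix-free application of the same operator: no matrix is stored."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


if __name__ == "__main__":
    x = [float(i * i) for i in range(6)]
    print(crs_spmv(*laplace_1d_crs(6), x))  # [-1.0, -2.0, -2.0, -2.0, -2.0, 34.0]
    print(laplace_1d_stencil(x))           # same result
```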

Projects:

ESSEX - Equipping Sparse Solvers for Exascale
Term: 2012-11-01 - 2019-06-30
Funding source: DFG / Priority Programme (SPP)
Project leader: Gerhard Wellein, Georg Hager

Dr. Georg Hager

Head of Research

The ESSEX project investigates the computational issues arising for large scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers which comprise building blocks, algorithms and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault…

SPPEXA: Equipping Sparse Solvers for Exascale II (ESSEX-II)
Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Priority Programme (SPP)
Project leader: Gerhard Wellein

The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction…

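SeASiTe: Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen (Self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems)
Term: 2017-03-01 - 2020-02-29
Funding source: Federal Ministry of Research, Technology and Space (BMFTR)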
Project leader: Gerhard Wellein

Publications:

2022

Alappat C., Hager G., Schenk O., Wellein G.:
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3223512

2021

Alappat C., Seiferth J., Hager G., Korch M., Rauber T., Wellein G.:
YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures
19th IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2021 (Virtual, Korea, KOR, 2021-02-27 - 2021-03-03)
In: Jae W. Lee, Mary Lou Soffa, Ayal Zaks (ed.): CGO 2021 - Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization 2021
DOI: 10.1109/CGO51591.2021.9370316

2020

Ernst D., Hager G., Thies J., Wellein G.:
Performance engineering for a tall & skinny matrix multiplication kernel on GPUs
13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 2019-09-08 - 2019-09-11)
In: Lecture Notes in Computer Science book series (LNCS, volume 12043), Cham: 2020
DOI: 10.1007/978-3-030-43229-4_43

2019

Alvermann A., Basermann A., Bungartz HJ., Carbogno C., Ernst D., Fehske H., Futamura Y., Galgon M., Hager G., Huber S., Huckle T., Ida A., Imakura A., Kawai M., Köcher S., Kreutzer M., Kus P., Lang B., Lederer H., Manin V., Marek A., Nakajima K., Nemec L., Reuter K., Rippl M., Röhrig-Zöllner M., Sakurai T., Scheffler M., Scheurer C., Shahzad F., Simoes Brambila D., Thies J., Wellein G.:
Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects
In: Japan Journal of Industrial and Applied Mathematics (2019)
ISSN: 0916-7005
DOI: 10.1007/s13160-019-00360-8

2018

Anzt H., Kreutzer M., Ponce E., Peterson GD., Wellein G., Dongarra J.:
Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
In: International Journal of High Performance Computing Applications 32 (2018), p. 220-230
ISSN: 1094-3420
DOI: 10.1177/1094342016646844

2015

Kreutzer M., Hager G., Wellein G., Alvermann A., Fehske H., Pieper A.:
Performance Engineering of the Kernel Polynomal Method on Large-Scale CPU-GPU Systems
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International (Hyderabad, India, 2015-05-25 - 2015-05-29)
In: IEEE (ed.): Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2015
DOI: 10.1109/IPDPS.2015.76
URL: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7161530
Malas T., Hager G., Ltaief H., Stengel H., Wellein G., Keyes D.:
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
In: SIAM Journal on Scientific Computing 37 (2015), p. C439-C464
ISSN: 1064-8275
DOI: 10.1137/140991133

2014

Kreutzer M., Hager G., Wellein G., Fehske H., Bishop AR.:
A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units
In: SIAM Journal on Scientific Computing 36 (2014), p. C401-C423
ISSN: 1064-8275
DOI: 10.1137/130930352
URL: http://epubs.siam.org/doi/abs/10.1137/130930352

2011

Hager G., Wellein G., Schubert G., Fehske H.:
Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems.
In: Parallel Processing Letters 21 (2011), p. 339-358
ISSN: 0129-6264
DOI: 10.1142/S0129626411000254
Schubert G., Hager G., Fehske H., Wellein G.:
Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming
25th IEEE International Parallel and Distributed Processing Symposium, Workshops and PhD Forum, IPDPSW 2011 (Anchorage, AK)
DOI: 10.1109/IPDPS.2011.332

2009

Wellein G., Hager G., Zeiser T., Wittmann M., Fehske H.:
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
COMPSAC 2009 (Seattle, USA, 2009-07-20 - 2009-07-24)
In: Proceedings of 2009 33rd Annual IEEE International Computer Software and Applications Conference, IEEE Computer Society: 2009
DOI: 10.1109/COMPSAC.2009.82

HPC/HPDA Research Software Engineering

Growing computing power and increasing amounts of data enable us to significantly improve the mathematical models for various applications. We investigate the fusion of classical model-driven and data-driven approaches and their implementation in modern, open-source research software. Our focus is on code generation technology for numerical solvers or machine learning methods on (block-)structured data.
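The principle behind code generation for numerical solvers can be sketched with a toy generator. It takes a declarative description of a 1D stencil (a map from offsets to coefficients) and emits a specialized C kernel whose loop bounds are derived from the stencil radius. The function name and the emitted kernel signature are hypothetical. The group's code generation frameworks are far more capable (multiple dimensions, vectorization, parallelization, communication), and this sketch only shows the idea of generating a kernel from a high-level specification.

```python
from typing import Dict


def generate_stencil_kernel(name: str, weights: Dict[int, float]) -> str:
    """Emit C source for a 1D stencil y[i] = sum_k w_k * x[i + k].

    weights maps an integer offset k to its coefficient w_k. The loop bounds
    are derived from the stencil radius, so the generated code never reads
    outside x[0..n-1].
    """
    radius = max(abs(k) for k in weights)
    terms = []
    for k in sorted(weights):
        idx = "i" if k == 0 else f"i {'+' if k > 0 else '-'} {abs(k)}"
        terms.append(f"{weights[k]!r} * x[{idx}]")
    body = " + ".join(terms)
    return (
        f"void {name}(int n, const double *restrict x, double *restrict y) {{\n"
        f"  for (int i = {radius}; i < n - {radius}; ++i)\n"
        f"    y[i] = {body};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    print(generate_stencil_kernel("laplace1d", {-1: -1.0, 0: 2.0, 1: -1.0}))
```

Because the coefficients and offsets are known at generation time, they end up as constants in the emitted code, and the compiler can optimize the kernel for exactly this stencil.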

Publications:

2024

Alt C., Lanser M., Plewinski J., Janki A., Klawonn A., Köstler H., Selzer M., Rüde U.:
A continuous benchmarking infrastructure for high-performance computing applications
In: International Journal of Parallel, Emergent and Distributed Systems (2024)
ISSN: 1744-5760
DOI: 10.1080/17445760.2024.2360190
Angersbach R., Köstler H., Kuckuk S.:
Code Generation for Octree-Based Multigrid Solvers with Fused Higher-Order Interpolation and Communication
Euro-Par 2024 (Madrid, 2024-08-26 - 2024-08-30)
DOI: 10.1007/978-3-031-69583-4_17
Büttner M., Alt C., Kenter T., Köstler H., Plessl C., Aizinger V.:
Enabling Performance Portability for Shallow Water Equations on CPUs, GPUs, and FPGAs with SYCL
PASC '24: Platform for Advanced Scientific Computing Conference (Zürich, 2024-06-03 - 2024-06-05)
In: PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, New York, NY: 2024
DOI: 10.1145/3659914.3659925
Schottenhamml H., Anciaux Sedrakian A., Blondel F., Köstler H., Rüde U.:
waLBerla-wind: A lattice-Boltzmann-based high-performance flow solver for wind energy applications
In: Concurrency and Computation-Practice & Experience 36 (2024)
ISSN: 1532-0626
DOI: 10.1002/cpe.8117

2023

Dahmardeh M., Mirzaalian Dastjerdi H., Mazal H., Köstler H., Sandoghdar V.:
Self-supervised machine learning pushes the sensitivity limit in label-free detection of single proteins below 10 kDa
In: Nature Methods (2023)
ISSN: 1548-7105
DOI: 10.1038/s41592-023-01778-2

2022

Faghih-Naini S., Aizinger V.:
p-adaptive discontinuous Galerkin method for the shallow water equations with a parameter-free error indicator.
In: GEM - International Journal on Geomathematics 13 (2022)
ISSN: 1869-2672
DOI: 10.1007/s13137-022-00208-3
Maier A., Köstler H., Heisig M., Krauß P., Yang SH.:
Known operator learning and hybrid machine learning in medical imaging - A review of the past, the present, and the future
In: Progress in Biomedical Engineering 4 (2022), Article No.: 022002
ISSN: 2516-1091
DOI: 10.1088/2516-1091/ac5b13

Research Focus

Performance Engineering

Projects:

ProPE: Process-Oriented Performance Engineering Service Infrastructure for Scientific Software at German HPC Centers

SPPEXA: EXASTEEL II - Bridging Scales for Multiphase Steels

SeASiTe: Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen (Self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems)

EoCoE-II: Energy Oriented Center of Excellence: toward exascale for energy (Performance evaluation, modelling and optimization)

StroemungsRaum: Der skalierbare Strömungsraum (The scalable flow space)

EoCoE-III: Fostering the European Energy Transition with Exascale

DaREXA-F: DatenREduktion für Exascale-Anwendungen in der Fusionsforschung (Data reduction for exascale applications in fusion research)

NHR22-04–11-ER: Weiterentwicklung des Hochleistungsrechnens (Advancement of high-performance computing)

Performance Tools

Projects:

SeASiTe: Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen (Self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems)

NHR22-04–11-ER: Weiterentwicklung des Hochleistungsrechnens (Advancement of high-performance computing)

Hardware-efficient building blocks for sparse linear algebra and stencil solvers

Projects:

ESSEX - Equipping Sparse Solvers for Exascale

SPPEXA: Equipping Sparse Solvers for Exascale II (ESSEX-II)

SeASiTe: Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen (Self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems)
