NHR@FAU

Research Focus

Our activities are in the following research fields:

  • Performance Engineering
  • Performance Modeling
  • Performance Tools
  • Hardware-efficient building blocks for sparse linear algebra and stencil solvers
  • HPC/HPDA Research Software Engineering

Performance Engineering

Performance Engineering (PE) is a structured, model-based process for the optimization and parallelization of basic operations, algorithms and application codes for modern compute architectures. The process is divided into analysis, modeling and optimization phases, which are iterated for each homogeneous code section until optimal or satisfactory performance is achieved. In the analysis phase, the first step is to develop a hypothesis about which aspect of the architecture (the bottleneck) limits the execution speed of the software. Typical bottlenecks can be identified qualitatively with application-independent performance patterns, each of which is described by a set of observable runtime characteristics. Using suitable performance models, the interaction of the application with the given hardware architecture is then described analytically and quantitatively.

The model thus indicates the maximum expected performance and the potential runtime improvements achievable through appropriate modifications. If the model predictions cannot be validated by measurements, the underlying model assumptions are revisited and refined or adjusted as necessary. Based on the model, optimizations can be planned and their performance gain assessed a priori. The PE approach is not limited to standard microprocessor architectures and can also be used for projections to future computer architectures. The main focus of the group is on the compute node, where analytic performance models such as the Roofline model or the Execution-Cache-Memory (ECM) model are used.
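As a minimal illustration of the kind of analytic bound mentioned above, the basic Roofline estimate can be written in a few lines. The sketch assumes the common textbook formulation P = min(P_peak, I * b_S). The function names and the machine numbers are hypothetical placeholders, not measurements of any NHR@FAU system.

```python
def roofline_bound(peak_gflops, mem_bw_gbs, intensity_flop_per_byte):
    """Upper performance bound in GFlop/s from the basic Roofline model."""
    if peak_gflops <= 0 or mem_bw_gbs <= 0 or intensity_flop_per_byte <= 0:
        raise ValueError("all inputs must be positive")
    # Bandwidth ceiling: deliverable bytes per second times flops per byte.
    bandwidth_ceiling = mem_bw_gbs * intensity_flop_per_byte
    return min(peak_gflops, bandwidth_ceiling)


def triad_intensity(bytes_per_word=8, write_allocate=True):
    """Computational intensity of a[i] = b[i] + s * c[i] (2 flops per iteration)."""
    # Loads of b and c and the store of a, plus the write-allocate
    # read of a if the hardware performs one.
    words = 4 if write_allocate else 3
    return 2.0 / (words * bytes_per_word)


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFlop/s peak, 100 GB/s memory bandwidth.
    print(roofline_bound(1000.0, 100.0, triad_intensity()))
```

With these assumed numbers the kernel is bound by memory bandwidth: the estimate is 6.25 GFlop/s, far below the assumed peak. Avoiding the write-allocate transfer raises the intensity and therefore the bound, which is the type of a priori assessment the PE process relies on.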

Projects:

Term: 2017-01-01 - 2019-12-31
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
Project leader: Gerhard Wellein

The ProPE project will deploy a prototype HPC user support infrastructure as a distributed, cross-site collaborative effort of several tier-2/3 centers with complementary HPC expertise. Within ProPE, code optimization and parallelization of scientific software are seen as a structured, well-defined process with sustainable outcomes. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering…

→ More information

Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
Project leader: Gerhard Wellein

In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.

There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous…

→ More information

Term: 2017-03-01 - 2020-02-29
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
Project leader: Gerhard Wellein

The SeASiTe research project sets out to conduct a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox that programmers can use to equip their applications with efficient self-adaptation techniques. The approach includes self-adaptation both with respect to relevant system and program parameters and…

→ More information

Term: 2019-01-01 - 2021-12-31
Funding source: European Union (EU)
Project leader: Gerhard Wellein

→ More information

Term: 2022-09-01 - 2025-08-31
Funding source: BMBF / collaborative project
Project leader: Gerhard Wellein

Upcoming exascale computer architectures will be characterized by a very large number of heterogeneous hardware components, which will also include special-purpose processors and accelerators. Implementing CFD application software, the central core component of today's flow simulations in industrial settings, on such systems requires, on the methodological side, highly scalable methods, above all for solving the high-dimensional and unsteady (non)linear systems of equations, which additionally…

→ More information

Term: 2024-01-01 - 2026-12-31
Funding source: EU / Cluster 4: Digital, Industry and Space
Project leader: Gerhard Wellein

The Energy-oriented Centre of Excellence for exascale HPC applications (EoCoE-III) applies cutting-edge computational methods in its mission to foster the transition to decarbonized energy in Europe. EoCoE-III is anchored both in the High Performance Computing (HPC) community and in the energy field. It will demonstrate the benefit of HPC for the net-zero energy transition for research institutes and also for key industry in the energy sector. The present project will draw on the experience of two…

→ More information

Term: 2022-09-01 - 2025-08-31
Funding source: BMBF / collaborative project
Project leader: Gerhard Wellein

→ More information

Term: 2022-01-01 - 2022-12-31
Funding source: other funding organization
Project leader: Gerhard Wellein

→ More information

Publications:

  • Laukemann J., Gruber T., Hager G., Oryspayev D., Wellein G.:
    CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion
    38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, CA, 2024-05-27 - 2024-05-31)
    In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
    DOI: 10.1109/IPDPS57955.2024.00038
  • Owen H., Ernst D., Gruber T., Lemkuhl O., Houzeaux G., Gasparino L., Wellein G.:
    Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs
    38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, 2024-05-27 - 2024-05-31)
    In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
    DOI: 10.1109/IPDPS57955.2024.00043

  • Afzal A., Hager G., Wellein G.:
    SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study
    14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) (Denver, CO, USA, 2023-11-12 - 2023-11-17)
    In: 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) 2023
    DOI: 10.1145/3624062.3624197
  • Ernst D., Holzer M., Hager G., Knorr M., Wellein G.:
    Analytical performance estimation during code generation on modern GPUs
    In: Journal of Parallel and Distributed Computing 173 (2023), p. 152-167
    ISSN: 0743-7315
    DOI: 10.1016/j.jpdc.2022.11.003
  • Ravedutti Lucio Machado R., Eitzinger J., Laukemann J., Hager G., Köstler H., Wellein G.:
    MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms
    In: Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications (2023)
    ISSN: 0167-739X
    DOI: 10.1016/j.future.2023.06.023

  • Alappat C., Hager G., Schenk O., Wellein G.:
    Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
    In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
    ISSN: 1045-9219
    DOI: 10.1109/TPDS.2022.3223512

  • Alappat C., Meyer N., Laukemann J., Gruber T., Hager G., Wellein G., Wettig T.:
    Execution-Cache-Memory modeling and performance tuning of sparse matrix-vector multiplication and Lattice quantum chromodynamics on A64FX
    In: Concurrency and Computation-Practice & Experience (2021)
    ISSN: 1532-0626
    DOI: 10.1002/cpe.6512
    URL: https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.6512

  • Ernst D., Hager G., Thies J., Wellein G.:
    Performance engineering for a tall & skinny matrix multiplication kernels on GPUs
    13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 2019-09-08 - 2019-09-11)
    In: Lecture Notes in Computer Science book series (LNCS, volume 12043), Cham: 2020
    DOI: 10.1007/978-3-030-43229-4_43
  • Klawonn A., Lanser M., Rheinbach O., Wellein G., Wittmann M.:
    Energy efficiency of nonlinear domain decomposition methods
    In: International Journal of High Performance Computing Applications (2020)
    ISSN: 1094-3420
    DOI: 10.1177/1094342020953891

  • Bauer M., Hötzer J., Ernst D., Hammer J., Seiz M., Hierl H., Hönig J., Köstler H., Wellein G., Nestler B., Rüde U.:
    Code generation for massively parallel phase-field simulations
    2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 (Denver, CO, 2019-11-17 - 2019-11-22)
    In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019
    DOI: 10.1145/3295500.3356186

  • Anzt H., Kreutzer M., Ponce E., Peterson GD., Wellein G., Dongarra J.:
    Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
    In: International Journal of High Performance Computing Applications 32 (2018), p. 220-230
    ISSN: 1094-3420
    DOI: 10.1177/1094342016646844
  • Kreutzer M., Ernst D., Bishop AR., Fehske H., Hager G., Nakajima K., Wellein G.:
    Chebyshev filter diagonalization on modern manycore processors and GPGPUs
    Springer Verlag, 2018
    ISBN: 9783319920399
    DOI: 10.1007/978-3-319-92040-5_17
  • Laukemann J., Hammer J., Hofmann J., Hager G., Wellein G.:
    Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
    2018 ACM/IEEE Supercomputing Conference (Dallas, TX, 2018-11-12 - 2018-11-12)
    In: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
    DOI: 10.1007/978-3-319-92040-5_2
    URL: https://ieeexplore.ieee.org/document/8641578

Performance Modeling

Performance models describe the interaction between application and hardware, forming the basis for a profound understanding of the runtime behavior of an application. The group pursues an analytic approach, the essential components of which are application models and machine models. These components are initially created independently, but their combination and interaction ultimately provide insights into the bottlenecks and the expected performance. Creating accurate machine models, in particular, requires a thorough microarchitecture analysis. The Execution-Cache-Memory (ECM) model, which was developed by the group, allows predictions of single-core performance as well as of scaling within a multi-core processor or compute node. In combination with analytic models of electrical power consumption, it can also be used to derive estimates for the energy consumption of an application. The ECM model is a generalization of the well-known Roofline model.
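To make the single-core and scaling predictions more concrete, the following sketch evaluates one commonly cited variant of the ECM model, in which data transfers between memory levels do not overlap with each other. This non-overlap assumption, the function names, and all cycle counts are illustrative assumptions, and other machine models combine the contributions differently.

```python
import math


def ecm_single_core(t_ol, t_nol, t_l1l2, t_l2l3, t_l3mem):
    """Predicted cycles per unit of work with data coming from main memory.

    A unit of work is, for example, the loop iterations that touch one
    cache line per data stream.
    """
    # Assumption: transfer times add up with the non-overlapping core
    # time t_nol, while the overlapping core time t_ol runs concurrently.
    return max(t_ol, t_nol + t_l1l2 + t_l2l3 + t_l3mem)


def ecm_multicore(t_single, t_l3mem, n_cores):
    """Cycles per unit of work on n_cores, limited by the memory interface."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    # Perfect scaling until the shared memory bottleneck is reached.
    return max(t_single / n_cores, t_l3mem)


def saturation_cores(t_single, t_l3mem):
    """Smallest core count at which the memory bottleneck is saturated."""
    return math.ceil(t_single / t_l3mem)


if __name__ == "__main__":
    # Hypothetical inputs in cycles, not taken from any real machine.
    t1 = ecm_single_core(t_ol=4.0, t_nol=6.0, t_l1l2=6.0, t_l2l3=6.0, t_l3mem=12.0)
    for n in range(1, 5):
        print(n, ecm_multicore(t1, 12.0, n))
    print("saturation at", saturation_cores(t1, 12.0), "cores")
```

For the assumed inputs the single-core prediction is 30 cycles and the model saturates at 3 cores, after which adding cores no longer reduces the runtime.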

Publications:

  • Ravedutti Lucio Machado R., Eitzinger J., Köstler H., Wellein G.:
    MD-Bench: A Generic Proxy-App Toolbox for State-of-the-Art Molecular Dynamics Algorithms
    In: Parallel Processing and Applied Mathematics. PPAM 2022., Springer, Cham, 2023, p. 321-332 (Lecture Notes in Computer Science (LNCS), Vol.13826)
    ISBN: 978-3-031-30441-5
    DOI: 10.1007/978-3-031-30442-2_24

  • Afzal A., Hager G., Wellein G.:
    Addressing White-box Modeling and Simulation Challenges in Parallel Computing
    ACM SIGSIM-PADS '22 (Atlanta, GA, USA, 2022-06-08 - 2022-06-10)
    In: SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation 2022
    DOI: 10.1145/3518997.3534986

  • Afzal A., Hager G., Wellein G.:
    Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact
    36th International Conference on High Performance Computing, ISC High Performance 2021 (Virtual, Online, 2021-06-24 - 2021-07-02)
    In: Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek (ed.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2021
    DOI: 10.1007/978-3-030-78713-4_19

  • Afzal A., Hager G., Wellein G.:
    Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
    2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 2019-09-23 - 2019-09-26)
    In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
    DOI: 10.1109/CLUSTER.2019.8890995

Performance Tools

The group develops, validates and maintains simple open source tools, which support performance analysis, the creation of performance models and the performance engineering process on the compute node level. 

The well-known tool collection LIKWID (http://tiny.cc/LIKWID) comprises various tools for the controlled execution of applications on modern compute nodes with complex topologies and adaptive runtime parameters. By measuring suitable hardware metrics, LIKWID enables a detailed analysis of the hardware usage of application programs and is thus pivotal for the validation of performance models and the identification of performance patterns. Supporting derived metrics such as attained main memory bandwidth requires continuous adaptation of the tool to new computer architectures and validation on them.
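A derived metric is computed from raw hardware event counts and the measured runtime. The sketch below shows the arithmetic for attained main memory bandwidth. It does not use the LIKWID API; the function name, the event counts, and the 64-byte cache line size are assumptions made for illustration only.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size; architecture dependent


def memory_bandwidth_gbs(lines_read, lines_written, runtime_s,
                         line_bytes=CACHE_LINE_BYTES):
    """Attained main memory bandwidth in GB/s from raw transfer counts."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    # Every counted event is assumed to move one full cache line.
    total_bytes = (lines_read + lines_written) * line_bytes
    return total_bytes / runtime_s / 1e9


if __name__ == "__main__":
    # Hypothetical counts from a 2-second measurement region.
    print(memory_bandwidth_gbs(1_000_000_000, 500_000_000, 2.0))
```

With the hypothetical counts above the result is 48 GB/s. Which raw events correspond to memory reads and writes differs between processor generations, which is why such metrics have to be revalidated for each new architecture.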

The automatic generation of Roofline and ECM models for simple kernels is the purpose of the Kerncraft Tool (http://tiny.cc/kerncraft). An important component of Kerncraft is the OSACA tool (Open Source Architecture Code Analyzer), which is responsible for the single core analysis and runtime prediction of an existing assembly code (http://tiny.cc/OSACA). For all the tools mentioned above, we aim to support as many relevant hardware architectures as possible (Intel/AMD x86, ARM-based processors, IBM Power, NVIDIA GPU).

Based on LIKWID and the existing experience in performance analysis, the group is also pushing forward work on job-specific performance monitoring. The goal is to develop web-based administrative tools such as ClusterCockpit (http://tiny.cc/ClusterCockpit), which will make it much easier for users and administrators to identify bottlenecks in cluster jobs. ClusterCockpit is currently being tested at RRZE and other centers.

Projects:

Term: 2017-03-01 - 2020-02-29
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
Project leader: Gerhard Wellein

The SeASiTe research project sets out to conduct a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox that programmers can use to equip their applications with efficient self-adaptation techniques. The approach includes self-adaptation both with respect to relevant system and program parameters and…

→ More information

Term: 2022-01-01 - 2022-12-31
Funding source: other funding organization
Project leader: Gerhard Wellein

→ More information

Publications:

  • Eitzinger J., Gruber T., Afzal A., Zeiser T., Wellein G.:
    ClusterCockpit-A web application for job-specific performance monitoring
    2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 2019-09-23 - 2019-09-26)
    In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
    DOI: 10.1109/CLUSTER.2019.8891017
  • Laukemann J., Hammer J., Hager G., Wellein G.:
    Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels
    10th IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2019
    DOI: 10.1109/PMBS49563.2019.00006

  • Laukemann J., Hammer J., Hofmann J., Hager G., Wellein G.:
    Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
    2018 ACM/IEEE Supercomputing Conference (Dallas, TX, 2018-11-12 - 2018-11-12)
    In: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
    DOI: 10.1007/978-3-319-92040-5_2
    URL: https://ieeexplore.ieee.org/document/8641578

  • Hammer J., Eitzinger J., Hager G., Wellein G.:
    Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
    10th International Workshop on Parallel Tools for High Performance Computing (Stuttgart, Germany, 2016-10-04 - 2016-10-05)
    In: Niethammer C, Gracia J, Hilbrich T, Knüpfer A, Resch MM, Nagel WE (ed.): Tools for High Performance Computing 2016, Cham: 2017

  • Hammer J., Hager G., Eitzinger J., Wellein G.:
    Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft
    SC15 The International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, TX, USA, 2015-11-15)
    In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, New York, NY, USA: 2015
    DOI: 10.1145/2832087.2832092
    URL: http://dl.acm.org/citation.cfm?id=2832087&preflayout=flat
  • Wellein G., Eitzinger J., Hager G., Röhl T.:
    Overhead Analysis of Performance Counter Measurements
    43rd International Conference on Parallel Processing Workshops, ICPPW 2014
    DOI: 10.1109/ICPPW.2014.34

Hardware-efficient building blocks for sparse linear algebra and stencil solvers

Large, sparse systems of equations and eigenvalue problems are typically solved by iterative methods. This research area deals with the efficient implementation, optimization and parallelization of the most important basic building blocks of such iterative solvers. The focus is on the multiplication of a large sparse matrix with one or more vectors (SpMV). Both matrix-free representations for regular matrices, such as those occurring in the discretization of partial differential equations ("stencils"), and the generic case of a general SpMV with a stored matrix are considered. Our work on the development and implementation of optimized building blocks for SpMV-based solvers includes hardware-efficient algorithms, data access optimizations (spatial and temporal blocking), and efficient and portable data structures. Our structured performance engineering process is employed in this context.
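The difference between a stored-matrix SpMV and a matrix-free stencil can be sketched with a small example. The code below uses the CSR (compressed sparse row) format as a common textbook choice, which is an assumption and not necessarily the data format used by the group, and a 1D three-point stencil as the regular operator. All function names are hypothetical, and plain Python lists are used for clarity rather than performance.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix stored in CSR (compressed sparse row) format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect access to x
        y[row] = acc
    return y


def stencil_1d(x):
    """Matrix-free y = A @ x for the 1D three-point stencil (-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        # Neighbors outside the domain are treated as zero.
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = 2.0 * x[i] - left - right
    return y


def laplace_1d_csr(n):
    """Build the same three-point operator as an explicit CSR matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        if i > 0:
            values.append(-1.0)
            col_idx.append(i - 1)
        values.append(2.0)
        col_idx.append(i)
        if i < n - 1:
            values.append(-1.0)
            col_idx.append(i + 1)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(spmv_csr(*laplace_1d_csr(len(x)), x))
    print(stencil_1d(x))
```

Both variants compute the same result, but the stored-matrix version has to load the matrix values and column indices from memory, whereas the stencil version only streams the vectors. This difference in data traffic is what the data access optimizations mentioned above target.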

Projects:

Term: 2012-11-01 - 2019-06-30
Funding source: DFG / Schwerpunktprogramm (SPP)
Project leader: Gerhard Wellein, Georg Hager

The ESSEX project investigates the computational issues arising for large scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers which comprise building blocks, algorithms and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault…

→ More information

Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
Project leader: Gerhard Wellein

The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction…

→ More information

Term: 2017-03-01 - 2020-02-29
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
Project leader: Gerhard Wellein

The SeASiTe research project sets out to conduct a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox that programmers can use to equip their applications with efficient self-adaptation techniques. The approach includes self-adaptation both with respect to relevant system and program parameters and…

→ More information

Publications:

  • Alappat C., Hager G., Schenk O., Wellein G.:
    Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
    In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
    ISSN: 1045-9219
    DOI: 10.1109/TPDS.2022.3223512

  • Alappat C., Seiferth J., Hager G., Korch M., Rauber T., Wellein G.:
    YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures
    19th IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2021 (Virtual, Korea, KOR, 2021-02-27 - 2021-03-03)
    In: Jae W. Lee, Mary Lou Soffa, Ayal Zaks (ed.): CGO 2021 - Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization 2021
    DOI: 10.1109/CGO51591.2021.9370316

  • Ernst D., Hager G., Thies J., Wellein G.:
    Performance engineering for a tall & skinny matrix multiplication kernels on GPUs
    13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 2019-09-08 - 2019-09-11)
    In: Lecture Notes in Computer Science book series (LNCS, volume 12043), Cham: 2020
    DOI: 10.1007/978-3-030-43229-4_43

  • Alvermann A., Basermann A., Bungartz HJ., Carbogno C., Ernst D., Fehske H., Futamura Y., Galgon M., Hager G., Huber S., Huckle T., Ida A., Imakura A., Kawai M., Köcher S., Kreutzer M., Kus P., Lang B., Lederer H., Manin V., Marek A., Nakajima K., Nemec L., Reuter K., Rippl M., Röhrig-Zöllner M., Sakurai T., Scheffler M., Scheurer C., Shahzad F., Simoes Brambila D., Thies J., Wellein G.:
    Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects
    In: Japan Journal of Industrial and Applied Mathematics (2019)
    ISSN: 0916-7005
    DOI: 10.1007/s13160-019-00360-8

  • Anzt H., Kreutzer M., Ponce E., Peterson GD., Wellein G., Dongarra J.:
    Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
    In: International Journal of High Performance Computing Applications 32 (2018), p. 220-230
    ISSN: 1094-3420
    DOI: 10.1177/1094342016646844

  • Kreutzer M., Hager G., Wellein G., Alvermann A., Fehske H., Pieper A.:
    Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems
    Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International (Hyderabad, India, 2015-05-25 - 2015-05-29)
    In: IEEE (ed.): Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2015
    DOI: 10.1109/IPDPS.2015.76
    URL: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7161530
  • Malas T., Hager G., Ltaief H., Stengel H., Wellein G., Keyes D.:
    Multicore-optimized wavefront diamond blocking for optimizing stencil updates
    In: SIAM Journal on Scientific Computing 37 (2015), p. C439-C464
    ISSN: 1064-8275
    DOI: 10.1137/140991133

  • Kreutzer M., Hager G., Wellein G., Fehske H., Bishop AR.:
    A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units
    In: SIAM Journal on Scientific Computing 36 (2014), p. C401–C423
    ISSN: 1064-8275
    DOI: 10.1137/130930352
    URL: http://epubs.siam.org/doi/abs/10.1137/130930352

  • Hager G., Wellein G., Schubert G., Fehske H.:
    Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems.
    In: Parallel Processing Letters 21 (2011), p. 339-358
    ISSN: 0129-6264
    DOI: 10.1142/S0129626411000254
  • Schubert G., Hager G., Fehske H., Wellein G.:
    Parallel sparse matrix-vector multiplication as a test case for hybrid MPI OpenMP programming
    25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011 (Anchorage, AK)
    DOI: 10.1109/IPDPS.2011.332

  • Wellein G., Hager G., Zeiser T., Wittmann M., Fehske H.:
    Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
    COMPSAC 2009 (Seattle, USA, 2009-07-20 - 2009-07-24)
    In: Proceedings of 2009 33rd Annual IEEE International Computer Software and Applications Conference, IEEE Computer Society: 2009
    DOI: 10.1109/COMPSAC.2009.82

HPC/HPDA Research Software Engineering

Increasing computing power and growing amounts of data enable us to significantly improve the mathematical models for various applications. We investigate the fusion of classical model-driven and data-driven approaches and their implementation in modern, open source research software. Our focus lies on code generation technology for numerical solvers and machine learning methods on (block-)structured data.

Publications:

  • Alt C., Lanser M., Plewinski J., Janki A., Klawonn A., Köstler H., Selzer M., Rüde U.:
    A continuous benchmarking infrastructure for high-performance computing applications
    In: International Journal of Parallel, Emergent and Distributed Systems (2024)
    ISSN: 1744-5760
    DOI: 10.1080/17445760.2024.2360190
  • Angersbach R., Köstler H., Kuckuk S.:
    Code Generation for Octree-Based Multigrid Solvers with Fused Higher-Order Interpolation and Communication
    Euro-Par 2024 (Madrid, 2024-08-26 - 2024-08-30)
    DOI: 10.1007/978-3-031-69583-4_17
  • Büttner M., Alt C., Kenter T., Köstler H., Plessl C., Aizinger V.:
    Enabling Performance Portability for Shallow Water Equations on CPUs, GPUs, and FPGAs with SYCL
    PASC '24: Platform for Advanced Scientific Computing Conference (Zürich, 2024-06-03 - 2024-06-05)
    In: PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, New York, NY: 2024
    DOI: 10.1145/3659914.3659925
  • Schottenhamml H., Anciaux Sedrakian A., Blondel F., Köstler H., Rüde U.:
    waLBerla-wind: A lattice-Boltzmann-based high-performance flow solver for wind energy applications
    In: Concurrency and Computation-Practice & Experience 36 (2024)
    ISSN: 1532-0626
    DOI: 10.1002/cpe.8117

  • Dahmardeh M., Mirzaalian Dastjerdi H., Mazal H., Köstler H., Sandoghdar V.:
    Self-supervised machine learning pushes the sensitivity limit in label-free detection of single proteins below 10 kDa
    In: Nature methods (2023)
    ISSN: 1548-7105
    DOI: 10.1038/s41592-023-01778-2

  • Faghih-Naini S., Aizinger V.:
    p-adaptive discontinuous Galerkin method for the shallow water equations with a parameter-free error indicator.
    In: GEM - International Journal on Geomathematics 13 (2022)
    ISSN: 1869-2672
    DOI: 10.1007/s13137-022-00208-3
  • Maier A., Köstler H., Heisig M., Krauß P., Yang SH.:
    Known operator learning and hybrid machine learning in medical imaging - A review of the past, the present, and the future
    In: Progress in Biomedical Engineering 4 (2022), Article No.: 022002
    ISSN: 2516-1091
    DOI: 10.1088/2516-1091/ac5b13
Erlangen National High Performance Computing Center (NHR@FAU)
Martensstraße 1
91058 Erlangen
Germany