Research Focus
Our activities cover the following research fields:
- Performance Engineering
- Performance Modeling
- Performance Tools
- Hardware-efficient building blocks for sparse linear algebra and stencil solvers
- HPC/HPDA Research Software Engineering
Performance Engineering
Performance Engineering (PE) is a structured, model-based process for optimizing and parallelizing basic operations, algorithms, and application codes on modern compute architectures. The process is divided into analysis, modeling, and optimization phases, which are iterated for each homogeneous code section until optimal or satisfactory performance is achieved. The first step of the analysis is to form a hypothesis about which aspect of the architecture (the bottleneck) limits the execution speed of the software. Typical bottlenecks can be identified qualitatively with so-called application-independent performance patterns, each of which is described by a set of observable runtime characteristics. Suitable performance models then describe the interaction of the application with the given hardware architecture analytically and quantitatively.
The model thus indicates the maximum expected performance and the potential runtime improvements from appropriate code modifications. If measurements do not validate the model predictions, the underlying model assumptions are revisited and refined or adjusted as necessary. Based on the model, optimizations can be planned and their performance gain assessed a priori. The PE approach is not limited to standard microprocessor architectures and can also be used for projections to future computer architectures. The group focuses mainly on the compute node, where analytic performance models such as the Roofline model or the Execution-Cache-Memory (ECM) model are used.
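As a rough illustration of the kind of upper limit such a model yields, the sketch below evaluates the Roofline model, P = min(P_peak, I · b_S), where I is the arithmetic intensity (flop per byte of main-memory traffic) and b_S is the attainable memory bandwidth. The machine numbers are hypothetical placeholders, not measurements of any particular system; the byte count assumes double precision and a write-allocate transfer for the stored array.

```python
def roofline_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline upper limit in Gflop/s: min(P_peak, I * b_S).

    intensity: arithmetic intensity in flop/byte of main-memory traffic
    bandwidth_gbs: attainable memory bandwidth in GB/s (hypothetical here)
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


def is_memory_bound(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> bool:
    """True if the bandwidth ceiling, not the compute peak, is the active limit."""
    return intensity * bandwidth_gbs < peak_gflops


# Hypothetical node: 2000 Gflop/s double-precision peak, 200 GB/s memory bandwidth
PEAK_GFLOPS, MEM_BW_GBS = 2000.0, 200.0

# Schoenauer triad a[i] = b[i] + c[i] * d[i] in double precision:
# 2 flop per iteration; 3 loads (24 B) + 1 store (8 B) + 8 B write-allocate for a[i]
triad_intensity = 2.0 / 40.0  # flop/byte

print(roofline_gflops(PEAK_GFLOPS, MEM_BW_GBS, triad_intensity))  # 10.0
print(is_memory_bound(PEAK_GFLOPS, MEM_BW_GBS, triad_intensity))  # True
```

In this hypothetical setting, avoiding the write-allocate transfer would cut the traffic to 32 bytes per iteration and raise the limit to 12.5 Gflop/s, which is the kind of a-priori estimate the model is meant to provide.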
Projects:
Funding source: DFG individual funding / Research grant (EIN-SBH)
The ProPE project will deploy a prototype HPC user support infrastructure as a distributed, cross-site collaborative effort of several tier-2/3 centers with complementary HPC expertise. Within ProPE, code optimization and parallelization of scientific software is seen as a structured, well-defined process with sustainable outcome. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering…
Funding source: DFG / Priority Programme (SPP)
In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.
There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous…
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
The research project SeASiTe undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox with which programmers can equip their applications with efficient self-adaptation techniques. The approach includes self-adaptation with respect to relevant system and program parameters as well as…
Funding source: European Union (EU)
Funding source: BMBF / Joint project
Upcoming exascale computer architectures will be characterized by a very large number of heterogeneous hardware components, including special-purpose processors or accelerators. Realizing CFD application software, the central core component of today's flow simulations in industrial settings, on such systems requires, on the methodological side, highly scalable methods, above all for solving the high-dimensional and unsteady (non)linear systems of equations that additionally…
Funding source: EU / Cluster 4: Digital, Industry and Space
The Energy-oriented Centre of Excellence for exascale HPC applications (EoCoE-III) applies cutting-edge computational methods in its mission to foster the transition to decarbonized energy in Europe. EoCoE-III is anchored both in the High Performance Computing (HPC) community and in the energy field. It will demonstrate the benefit of HPC for the net-zero energy transition for research institutes and also for key industry in the energy sector. The present project will draw the experience of two…
Funding source: BMBF / Joint project
Funding source: Other funding organization
Publications:
CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion
38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, CA, 2024-05-27 - 2024-05-31)
In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
DOI: 10.1109/IPDPS57955.2024.00038
Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs
38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, 2024-05-27 - 2024-05-31)
In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
DOI: 10.1109/IPDPS57955.2024.00043
, , , , , , :
SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study
14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) (Denver, CO, USA, 2023-11-12 - 2023-11-17)
In: 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) 2023
DOI: 10.1145/3624062.3624197
Analytical performance estimation during code generation on modern GPUs
In: Journal of Parallel and Distributed Computing 173 (2023), p. 152-167
ISSN: 0743-7315
DOI: 10.1016/j.jpdc.2022.11.003
MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms
In: Future Generation Computer Systems-The International Journal of Grid Computing Theory Methods and Applications (2023)
ISSN: 0167-739X
DOI: 10.1016/j.future.2023.06.023
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3223512
Execution-Cache-Memory modeling and performance tuning of sparse matrix-vector multiplication and Lattice quantum chromodynamics on A64FX
In: Concurrency and Computation-Practice & Experience (2021)
ISSN: 1532-0626
DOI: 10.1002/cpe.6512
URL: https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.6512
Performance engineering for a tall & skinny matrix multiplication kernels on GPUs
13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 2019-09-08 - 2019-09-11)
In: Lecture Notes in Computer Science book series (LNCS, volume 12043), Cham: 2020
DOI: 10.1007/978-3-030-43229-4_43
Energy efficiency of nonlinear domain decomposition methods
In: International Journal of High Performance Computing Applications (2020)
ISSN: 1094-3420
DOI: 10.1177/1094342020953891
Code generation for massively parallel phase-field simulations
2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 (Denver, CO, 2019-11-17 - 2019-11-22)
In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019
DOI: 10.1145/3295500.3356186
Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
In: International Journal of High Performance Computing Applications 32 (2018), p. 220-230
ISSN: 1094-3420
DOI: 10.1177/1094342016646844
Chebyshev filter diagonalization on modern manycore processors and GPGPUs
Springer Verlag, 2018
ISBN: 9783319920399
DOI: 10.1007/978-3-319-92040-5_17
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
2018 ACM/IEEE Supercomputing Conference (Dallas, TX, 2018-11-12 - 2018-11-12)
In: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
DOI: 10.1109/PMBS.2018.8641578
URL: https://ieeexplore.ieee.org/document/8641578
Performance Modeling
Performance models describe the interaction between application and hardware and form the basis for an in-depth understanding of an application's runtime behavior. The group pursues an analytic approach whose essential components are application models and machine models. These components are initially created independently, but only their combination provides insight into bottlenecks and expected performance. Creating accurate machine models in particular requires thorough microarchitecture analysis. The Execution-Cache-Memory (ECM) model, developed by the group, predicts single-core performance as well as scaling within a multi-core processor or compute node. Combined with analytic models of electrical power consumption, it can also be used to estimate the energy consumption of an application. The ECM model is a generalization of the well-known Roofline model.
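The following sketch shows one common way ECM contributions are combined, assuming the simplest machine model in which the non-overlapping in-core time and all inter-cache and memory transfer times add up, while the overlapping in-core work runs concurrently (other overlap assumptions apply on some architectures). Times are in core cycles per cache line of work, and the numbers are made up for illustration. Multi-core scaling is modeled as linear until the memory transfer time becomes the bottleneck.

```python
import math
from dataclasses import dataclass


@dataclass
class ECMInput:
    """ECM contributions in core cycles per cache line of work (hypothetical values)."""
    t_ol: float     # in-core work that overlaps with data transfers (e.g. arithmetic)
    t_nol: float    # in-core work that does not overlap (e.g. L1 loads/stores)
    t_l1l2: float   # L2 <-> L1 transfer time
    t_l2l3: float   # L3 <-> L2 transfer time
    t_l3mem: float  # memory <-> L3 transfer time


def ecm_single_core(m: ECMInput) -> float:
    """Single-core time per cache line, assuming no overlap among the transfer terms."""
    return max(m.t_ol, m.t_nol + m.t_l1l2 + m.t_l2l3 + m.t_l3mem)


def ecm_multicore(m: ECMInput, cores: int) -> float:
    """Time per cache line on `cores` cores: linear speedup until memory saturates."""
    return max(ecm_single_core(m) / cores, m.t_l3mem)


def saturation_point(m: ECMInput) -> int:
    """Smallest core count at which the memory transfer time becomes the limit."""
    return math.ceil(ecm_single_core(m) / m.t_l3mem)


kernel = ECMInput(t_ol=4, t_nol=2, t_l1l2=6, t_l2l3=6, t_l3mem=13)
print(ecm_single_core(kernel))   # 27
print(saturation_point(kernel))  # 3
for n in range(1, 5):
    print(n, ecm_multicore(kernel, n))
```

In this made-up case, the prediction drops from 27 cycles per cache line on one core to 13.5 on two, then flattens at the memory limit of 13 cycles from three cores on.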
Publications:
MD-Bench: A Generic Proxy-App Toolbox for State-of-the-Art Molecular Dynamics Algorithms
In: Parallel Processing and Applied Mathematics. PPAM 2022., Springer, Cham, 2023, p. 321-332 (Lecture Notes in Computer Science (LNCS), Vol.13826)
ISBN: 978-3-031-30441-5
DOI: 10.1007/978-3-031-30442-2_24
Addressing White-box Modeling and Simulation Challenges in Parallel Computing
ACM SIGSIM-PADS '22 (Atlanta, GA, USA, 2022-06-08 - 2022-06-10)
In: SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation 2022
DOI: 10.1145/3518997.3534986
Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact
36th International Conference on High Performance Computing, ISC High Performance 2021 (Virtual, Online, 2021-06-24 - 2021-07-02)
In: Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek (ed.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2021
DOI: 10.1007/978-3-030-78713-4_19
Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 2019-09-23 - 2019-09-26)
In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
DOI: 10.1109/CLUSTER.2019.8890995
Performance Tools
The group develops, validates and maintains simple open source tools, which support performance analysis, the creation of performance models and the performance engineering process on the compute node level.
The well-known tool collection LIKWID (http://tiny.cc/LIKWID) comprises various tools for the controlled execution of applications on modern compute nodes with complex topologies and adaptive runtime parameters. By measuring suitable hardware metrics, LIKWID enables a detailed analysis of how application programs use the hardware and is thus pivotal for validating performance models and identifying performance patterns. Supporting derived metrics such as attained main memory bandwidth requires continuous adaptation and validation of the tool for new computer architectures.
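To illustrate what a derived metric is, the sketch below computes attained main memory bandwidth from raw event counts. It is modeled on the common approach for Intel server CPUs, where each read or write CAS (column access strobe) event on a memory channel corresponds to one 64-byte cache line. It does not call LIKWID. The function name and counter values are hypothetical, and the actual events and formulas differ between architectures, which is why such metrics have to be adapted and validated for each new one.

```python
from collections.abc import Sequence

CACHE_LINE_BYTES = 64  # assumption: one CAS event == one 64-byte cache line


def memory_bandwidth_mbytes_s(cas_reads: Sequence[int], cas_writes: Sequence[int],
                              runtime_s: float) -> float:
    """Derived metric: (sum of read + write CAS counts) * 64 B / runtime, in MBytes/s."""
    if runtime_s <= 0.0:
        raise ValueError("runtime must be positive")
    total_bytes = (sum(cas_reads) + sum(cas_writes)) * CACHE_LINE_BYTES
    return total_bytes / runtime_s / 1.0e6


# Hypothetical raw counts from four memory channels, measured over 2 s
reads = [250_000_000] * 4
writes = [125_000_000] * 4
print(f"{memory_bandwidth_mbytes_s(reads, writes, 2.0):.1f} MBytes/s")  # 48000.0 MBytes/s
```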
The Kerncraft tool (http://tiny.cc/kerncraft) automatically generates Roofline and ECM models for simple loop kernels. An important component of Kerncraft is the OSACA tool (Open Source Architecture Code Analyzer, http://tiny.cc/OSACA), which performs the single-core analysis and runtime prediction of existing assembly code. For all the tools mentioned above, we aim to support as many relevant hardware architectures as possible (Intel/AMD x86, ARM-based processors, IBM Power, NVIDIA GPUs).
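The core idea behind such throughput predictions can be sketched with a simplified port-pressure model: each instruction's micro-ops are distributed evenly over the execution ports that can handle them, and the most heavily loaded port gives a lower bound on the cycles per loop iteration. This is a simplified illustration, not OSACA's implementation. The port names and instruction-to-port assignments below are made up rather than taken from any real microarchitecture, and the model ignores dependencies (the critical path) and front-end limits.

```python
from collections import defaultdict


def port_pressure(instructions):
    """Accumulate per-port pressure in cycles per loop iteration.

    `instructions` is a list of (mnemonic, [(ports, cycles), ...]); each micro-op
    may execute on any port in `ports` and occupies it for `cycles`. The load of
    each micro-op is split evenly among its eligible ports.
    """
    pressure = defaultdict(float)
    for _mnemonic, uops in instructions:
        for ports, cycles in uops:
            share = cycles / len(ports)
            for p in ports:
                pressure[p] += share
    return dict(pressure)


def predicted_throughput(instructions):
    """Throughput bound: the most heavily loaded port limits cycles per iteration."""
    pressure = port_pressure(instructions)
    return max(pressure.values()) if pressure else 0.0


# Hypothetical loop body with a made-up port layout
loop_body = [
    ("load",       [(("2", "3"), 1.0)]),
    ("fma+load",   [(("0", "1"), 1.0), (("2", "3"), 1.0)]),   # FMA with memory operand
    ("store",      [(("4",), 1.0), (("2", "3", "7"), 1.0)]),  # store data + address
    ("add",        [(("0", "1", "5", "6"), 1.0)]),
    ("cmp+branch", [(("0", "6"), 1.0)]),
]
print(port_pressure(loop_body))
print(predicted_throughput(loop_body))
```

Here ports "2" and "3" carry the highest load (4/3 cycles), so the sketch predicts about 1.33 cycles per iteration for this invented loop.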
Based on LIKWID and the existing experience in performance analysis, the group is also pushing forward work on job-specific performance monitoring. The goal is to develop web-based administrative tools such as ClusterCockpit (http://tiny.cc/ClusterCockpit), which will make it much easier for users and administrators to identify bottlenecks in cluster jobs. ClusterCockpit is currently being tested at RRZE and other centers.
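A very simple form of job-level bottleneck screening can be sketched as follows: average the sampled per-node metrics over a job's runtime and compare them with node peak values. The metric choice, thresholds, and labels are hypothetical and are not ClusterCockpit's actual rules.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One monitoring sample of a single node (hypothetical metric set)."""
    flops_gf: float    # attained Gflop/s
    mem_bw_gbs: float  # attained memory bandwidth in GB/s


def classify_job(samples: list[Sample], peak_gf: float, peak_bw_gbs: float,
                 low: float = 0.1, high: float = 0.8) -> str:
    """Label a job from the averages of its samples relative to node peak values."""
    avg_gf = mean(s.flops_gf for s in samples)
    avg_bw = mean(s.mem_bw_gbs for s in samples)
    if avg_bw >= high * peak_bw_gbs:
        return "memory bound"
    if avg_gf >= high * peak_gf:
        return "compute bound"
    if avg_gf < low * peak_gf and avg_bw < low * peak_bw_gbs:
        return "low utilization"  # e.g. idle nodes, load imbalance, or latency effects
    return "inconclusive"


job = [Sample(12.0, 180.0), Sample(11.5, 175.0), Sample(12.5, 185.0)]
print(classify_job(job, peak_gf=2000.0, peak_bw_gbs=200.0))  # memory bound
```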
Projects:
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
The research project SeASiTe undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox with which programmers can equip their applications with efficient self-adaptation techniques. The approach includes self-adaptation with respect to relevant system and program parameters as well as…
Funding source: Other funding organization
Publications:
ClusterCockpit-A web application for job-specific performance monitoring
2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 2019-09-23 - 2019-09-26)
In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
DOI: 10.1109/CLUSTER.2019.8891017
Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels
10th IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2019
DOI: 10.1109/PMBS49563.2019.00006
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
2018 ACM/IEEE Supercomputing Conference (Dallas, TX, 2018-11-12 - 2018-11-12)
In: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
DOI: 10.1109/PMBS.2018.8641578
URL: https://ieeexplore.ieee.org/document/8641578
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
10th International Workshop on Parallel Tools for High Performance Computing (Stuttgart, Germany, 2016-10-04 - 2016-10-05)
In: Niethammer C, Gracia J, Hilbrich T, Knüpfer A, Resch MM, Nagel WE (ed.): Tools for High Performance Computing 2016, Cham: 2017
Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft
6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2015, held as part of SC15, the 27th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, TX, USA, 2015-11-15)
In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, New York, NY, USA: 2015
DOI: 10.1145/2832087.2832092
URL: http://dl.acm.org/citation.cfm?id=2832087&preflayout=flat
Overhead Analysis of Performance Counter Measurements
43rd International Conference on Parallel Processing Workshops, ICPPW 2014
DOI: 10.1109/ICPPW.2014.34
Hardware-efficient building blocks for sparse linear algebra and stencil solvers
Large, sparse systems of equations and eigenvalue problems are typically solved with iterative methods. This research area deals with the efficient implementation, optimization, and parallelization of the most important basic building blocks of such iterative solvers, with a focus on multiplying a large sparse matrix by one or more vectors (SpMV). We consider both matrix-free representations of regular matrices, such as those arising from the discretization of partial differential equations ("stencils"), and the generic case of SpMV with a stored matrix. Our work on developing and implementing optimized building blocks for SpMV-based solvers covers hardware-efficient algorithms, data access optimizations (spatial and temporal blocking), and efficient, portable data structures, and it follows our structured performance engineering process.
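A minimal sketch of the two cases mentioned above, in plain Python for readability: a general SpMV with the matrix stored in the widely used CRS (CSR) format, and a matrix-free stencil that applies the same 1D Laplacian operator without storing it. These are textbook illustrations, not the group's optimized kernels or data structures. In double precision with 4-byte indices, the CRS kernel loads at least 12 bytes of matrix data per nonzero for 2 flops, which is why SpMV is usually memory bound; the matrix-free variant avoids that traffic entirely.

```python
from dataclasses import dataclass


@dataclass
class CRSMatrix:
    """Sparse matrix in compressed row storage (CRS, also called CSR)."""
    n_rows: int
    row_ptr: list[int]   # row i owns entries row_ptr[i] .. row_ptr[i+1]-1
    col_idx: list[int]   # column index of each stored nonzero
    values: list[float]  # value of each stored nonzero


def spmv_crs(a: CRSMatrix, x: list[float]) -> list[float]:
    """General SpMV y = A x with a stored matrix: one pass over all nonzeros."""
    y = [0.0] * a.n_rows
    for i in range(a.n_rows):
        acc = 0.0
        for k in range(a.row_ptr[i], a.row_ptr[i + 1]):
            acc += a.values[k] * x[a.col_idx[k]]  # indirect (gather) access to x
        y[i] = acc
    return y


def laplace_1d_matrix_free(x: list[float]) -> list[float]:
    """Matrix-free stencil for the same operator: y_i = 2 x_i - x_{i-1} - x_{i+1}
    (zero Dirichlet boundaries). No matrix entries or indices are loaded."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


# 1D Laplacian tridiag(-1, 2, -1) of size 4 in CRS format
A = CRSMatrix(
    n_rows=4,
    row_ptr=[0, 2, 5, 8, 10],
    col_idx=[0, 1, 0, 1, 2, 1, 2, 3, 2, 3],
    values=[2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],
)
x = [1.0, 2.0, 3.0, 4.0]
print(spmv_crs(A, x))             # [0.0, 0.0, 0.0, 5.0]
print(laplace_1d_matrix_free(x))  # [0.0, 0.0, 0.0, 5.0]
```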
Projects:
Funding source: DFG / Priority Programme (SPP)
The ESSEX project investigates the computational issues arising for large scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers which comprise building blocks, algorithms and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault…
Funding source: DFG / Priority Programme (SPP)
The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction…
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
The research project SeASiTe undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox with which programmers can equip their applications with efficient self-adaptation techniques. The approach includes self-adaptation with respect to relevant system and program parameters as well as…
Publications:
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3223512
YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures
19th IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2021 (Virtual, Korea, 2021-02-27 - 2021-03-03)
In: Jae W. Lee, Mary Lou Soffa, Ayal Zaks (ed.): CGO 2021 - Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization 2021
DOI: 10.1109/CGO51591.2021.9370316
Performance engineering for a tall & skinny matrix multiplication kernels on GPUs
13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 2019-09-08 - 2019-09-11)
In: Lecture Notes in Computer Science book series (LNCS, volume 12043), Cham: 2020
DOI: 10.1007/978-3-030-43229-4_43
Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects
In: Japan Journal of Industrial and Applied Mathematics (2019)
ISSN: 0916-7005
DOI: 10.1007/s13160-019-00360-8
Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
In: International Journal of High Performance Computing Applications 32 (2018), p. 220-230
ISSN: 1094-3420
DOI: 10.1177/1094342016646844
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International (Hyderabad, India, 2015-05-25 - 2015-05-29)
In: IEEE (ed.): Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2015
DOI: 10.1109/IPDPS.2015.76
URL: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7161530
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
In: SIAM Journal on Scientific Computing 37 (2015), p. C439-C464
ISSN: 1064-8275
DOI: 10.1137/140991133
A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units
In: SIAM Journal on Scientific Computing 36 (2014), p. C401-C423
ISSN: 1064-8275
DOI: 10.1137/130930352
URL: http://epubs.siam.org/doi/abs/10.1137/130930352
Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems.
In: Parallel Processing Letters 21 (2011), p. 339-358
ISSN: 0129-6264
DOI: 10.1142/S0129626411000254
Parallel sparse matrix-vector multiplication as a test case for hybrid MPI OpenMP programming
25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011 (Anchorage, AK)
DOI: 10.1109/IPDPS.2011.332
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
COMPSAC 2009 (Seattle, USA, 2009-07-20 - 2009-07-24)
In: Proceedings of 2009 33rd Annual IEEE International Computer Software and Applications Conference, IEEE Computer Society: 2009
DOI: 10.1109/COMPSAC.2009.82
HPC/HPDA Research Software Engineering
Increasing computing power and growing amounts of data enable us to significantly improve the mathematical models used in many applications. We investigate the fusion of classical model-driven and data-driven approaches and their implementation in modern, open-source research software. Our focus lies on code generation technology for numerical solvers and machine learning methods on (block-)structured data.
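As a toy illustration of generator-based code (not the group's actual code generation tools, whose interfaces are not described here), the Python function below emits a C kernel from a table of stencil offsets and weights. The names generate_stencil_kernel and jacobi_5pt and the generated function signature are hypothetical. Generating the loop nest from a compact description bakes the weights into the source as literal constants and lets the same description be specialized for different targets.

```python
def _shift(var: str, d: int) -> str:
    """Render an index expression such as 'i', 'i + 1' or 'j - 2'."""
    if d == 0:
        return var
    return f"{var} {'+' if d > 0 else '-'} {abs(d)}"


def generate_stencil_kernel(name: str, stencil: dict[tuple[int, int], float]) -> str:
    """Emit a C99 function applying a 2D stencil {(di, dj): weight} to a row-major grid.

    The stencil radius determines the boundary layer that is left untouched; every
    weight appears in the generated source as a literal constant.
    """
    radius = max(max(abs(di), abs(dj)) for di, dj in stencil)
    terms = [
        f"{w!r} * src[({_shift('i', di)}) * nx + {_shift('j', dj)}]"
        for (di, dj), w in sorted(stencil.items())
    ]
    rhs = "\n                + ".join(terms)
    return (
        f"void {name}(const double *restrict src, double *restrict dst, int nx, int ny)\n"
        "{\n"
        f"    for (int i = {radius}; i < ny - {radius}; ++i)\n"
        f"        for (int j = {radius}; j < nx - {radius}; ++j)\n"
        f"            dst[i * nx + j] = {rhs};\n"
        "}\n"
    )


# Hypothetical example: 5-point Jacobi update for the 2D Laplace equation
JACOBI_5PT = {(-1, 0): 0.25, (1, 0): 0.25, (0, -1): 0.25, (0, 1): 0.25}
print(generate_stencil_kernel("jacobi_5pt", JACOBI_5PT))
```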
Projects:
Funding source: DFG individual funding / Research grant (EIN-SBH)
Complex phenomena in the natural and the engineering sciences are increasingly being studied with the help of simulation techniques. This is facilitated by a dramatic increase of the available computational power, and Computational Science and Engineering (CSE) is emerging as a third fundamental pillar of science. CSE aims at designing, analyzing, and implementing new simulation methods on high-performance computing (HPC) systems such that they can be employed in a robust, user-friendly, and reliable…
Funding source: DFG / Research Unit (FOR)
Laser beam welding is gaining importance as a flexible and contactless joining process. However, processing alloys with a large melting range is challenging because of their tendency toward solidification cracking. These cracks arise from critical stress and strain states in the dendritic microstructure with interdendritic melt. Despite the high industrial relevance, so far there are only approaches that address partial aspects of this problem…
Funding source: EU / Cluster 4: Digital, Industry and Space
The Energy-oriented Centre of Excellence for exascale HPC applications (EoCoE-III) applies cutting-edge computational methods in its mission to foster the transition to decarbonized energy in Europe. EoCoE-III is anchored both in the High Performance Computing (HPC) community and in the energy field. It will demonstrate the benefit of HPC for the net-zero energy transition for research institutes and also for key industry in the energy sector. The present project will draw the experience of two…
Funding source: BMBF / Joint project
Physical questions are dictated by the application and often lead to different modeling paradigms. This proposal uses, on the one hand, classical continuum mechanics, which leads to Eulerian finite element models; on the other hand, transport phenomena can often be represented better with Lagrangian methods, e.g., as trajectories in a system of many particles. Coupling these different models leads to challenges in mathematics as well as in…
Solving problems in present-day simulation is becoming increasingly complex. Both the number of physical effects taken into account and the complexity of the associated software development process are increasing. To meet these growing demands, the Chair for System Simulation (LSS) developed the massively parallel and flexible simulation framework waLBerla (widely applicable Lattice Boltzmann solver from Erlangen). Originally, the framework was centered around the lattice Boltzmann method…
Supercomputer architecture is moving quickly to multi-core and many-core architectures. An additional trend is the increasing use of special-purpose accelerators, e.g., in the form of graphics cards, the Cell processor, or reconfigurable hardware. This has the potential to deliver unprecedented performance at lower cost and reduced power consumption. However, this trend opens many unanswered questions on how these devices can be used effectively in real-life supercomputing applications, since these…
Funding source: Bayerisches Staatsministerium für Wissenschaft und Kunst (StMWK) (since 2018)
Multicomponent flows are of considerable scientific interest due to their broad range of applications. Emulsions, for example, play an important role in processing of coatings, cosmetics, pharmaceutics, and foods. Especially double emulsions, where smaller drops are encapsulated in larger drops, carry high potential for medical applications like controlled drug delivery and release. On much larger scales, multicomponent flows are of wide interest in the oil industry where advanced recovery processes…
Funding source: DFG / Priority Programme (SPP)
Future exascale computing systems with 10^7 processing units and up to 10^18 FLOPS peak performance will require a tight co-design of application, algorithm, and architecture-aware program development to sustain this performance for many applications of interest, mainly for two reasons. First, the node structure inside an exascale cluster will become increasingly heterogeneous, always exploiting the most recent on-chip manycore/GPU/hardware-assist technology. Second, the clusters themselves will be composed of heterogeneous subsystems and interconnects. As a result, new software techniques and tools supporting joint algorithm- and architecture-aware program development will become indispensable not only (a) to ease application and program development, but also (b) for performance analysis and tuning, (c) to ensure short turnaround times, and (d) for reasons of portability.
Project ExaStencils will investigate and provide a unique, tool-assisted, domain-specific co-design approach for the important class of stencil codes, which play a central role in high-performance simulation on structured or block-structured grids. Stencils are regular access patterns on (usually multidimensional) data grids. Multigrid methods involve a hierarchy of grids ranging from very fine to successively coarser ones. The challenge at exascale is that the coarser grids require less processing power, so communication dominates. From the computational algorithm perspective, domain-specific investigations include the extraction and development of suitable stencils, the analysis of performance-relevant algorithmic trade-offs (e.g., the number of grid levels), and the analysis and reduction of synchronization requirements guided by a template model of the targeted cluster architecture. Based on this analysis, sophisticated programming and software tool support shall be developed by capturing the relevant data structures and program segments for stencil computations in a domain-specific language and applying generator-based product-line technology to automatically generate and optimize stencil codes tailored to each application–platform pair. A central distinguishing feature of ExaStencils is that domain knowledge is pursued in a coordinated manner across all abstraction levels, from the formulation of the application scenario down to the generation of highly optimized stencil code.
For the newly developed design flow, the first to be seamless across all levels, the three objectives of (1) a substantial gain in productivity, (2) high flexibility in the choice of algorithm and execution platform, and (3) ExaFLOPS performance for stencil codes shall be demonstrated in a detailed final evaluation phase.
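A back-of-the-envelope sketch of why communication dominates on coarse grids: assume each process owns a cubic n³ block of a 3D grid, the grid is coarsened by a factor of 2 per direction on each level, and a one-layer halo is exchanged across all six faces (as for a 7-point stencil). The work per level scales with n³ and the halo exchange with 6n², so their ratio 6/n doubles on every coarser level. The 128³ block size is a hypothetical choice.

```python
def level_stats(n_fine: int, levels: int) -> list[tuple[int, int, int, int, float]]:
    """Return (level, n, interior points, halo points, halo/interior) per multigrid level."""
    stats = []
    n = n_fine
    for level in range(levels):
        if n < 1:
            break  # cannot coarsen further
        interior = n ** 3   # grid points updated by the stencil on this level
        halo = 6 * n ** 2   # points exchanged with the six face neighbors
        stats.append((level, n, interior, halo, halo / interior))
        n //= 2
    return stats


for level, n, interior, halo, ratio in level_stats(128, 6):
    print(f"level {level}: n={n:3d}  points={interior:8d}  halo={halo:6d}  halo/points={ratio:.3f}")
```

On the finest level of this example, fewer than 5 halo points are exchanged per 100 updated points; on the coarsest level shown (n = 4), the halo already exceeds the interior by 50%.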
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
The HPC²SE project is developing a novel metaprogramming approach to make the use of modern and future heterogeneous HPC systems simpler and more efficient for a broad class of simulations.
Numerical simulation is a key technology for research and industrial development. Examples include climate prediction, disaster management, energy supply, and vehicle engineering. Simulation-based risk assessments are increasingly gaining…
Funding source: BMBF / Joint project
Thanks to rapidly growing computing power, complex phenomena in the natural and engineering sciences are increasingly being studied with realistic simulation techniques. The resulting field of Computational Science and Engineering (CSE) is therefore regarded as a new, third pillar of science that complements and strengthens the two classical pillars of theory and experiment. At its core, CSE is about powerful simulation methods for current and future high-performance…
Funding source: DFG individual funding / Research grant (EIN-SBH)
Accurate ocean, atmosphere, or climate simulations require very efficient numerical methods and large computing capacities, which are not available in many parts of the world or to many research groups in these application fields. Such limitations also contribute to the fact that models and software packages based on structured grids still predominate in ocean science. In this project, on the one hand, the computing time for models based on…
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
In Metacca, the AnyDSL framework is being extended into a homogeneous programming environment for heterogeneous single-node and multi-node systems. UdS will extend the compiler and type system of AnyDSL to enable programmers to program accelerators productively. Building on this, LSS will develop suitable abstractions for distribution and synchronization on single-node and multi-node computers in the form of a DSL in AnyDSL. All components will be … through performance mo…
Publications:
A continuous benchmarking infrastructure for high-performance computing applications
In: International Journal of Parallel, Emergent and Distributed Systems (2024)
ISSN: 1744-5760
DOI: 10.1080/17445760.2024.2360190
Code Generation for Octree-Based Multigrid Solvers with Fused Higher-Order Interpolation and Communication
Euro-Par 2024 (Madrid, 2024-08-26 - 2024-08-30)
DOI: 10.1007/978-3-031-69583-4_17
Enabling Performance Portability for Shallow Water Equations on CPUs, GPUs, and FPGAs with SYCL
PASC '24: Platform for Advanced Scientific Computing Conference (Zürich, 2024-06-03 - 2024-06-05)
In: PASC '24: Proceedings of the Platform for Advanced Scientific Computing Conference, New York, NY: 2024
DOI: 10.1145/3659914.3659925
waLBerla-wind: A lattice-Boltzmann-based high-performance flow solver for wind energy applications
In: Concurrency and Computation-Practice & Experience 36 (2024)
ISSN: 1532-0626
DOI: 10.1002/cpe.8117
Self-supervised machine learning pushes the sensitivity limit in label-free detection of single proteins below 10 kDa
In: Nature Methods (2023)
ISSN: 1548-7105
DOI: 10.1038/s41592-023-01778-2
p-adaptive discontinuous Galerkin method for the shallow water equations with a parameter-free error indicator.
In: GEM - International Journal on Geomathematics 13 (2022)
ISSN: 1869-2672
DOI: 10.1007/s13137-022-00208-3
Known operator learning and hybrid machine learning in medical imaging - A review of the past, the present, and the future
In: Progress in Biomedical Engineering 4 (2022), Article No.: 022002
ISSN: 2516-1091
DOI: 10.1088/2516-1091/ac5b13