Prof. Dr. Gerhard Wellein
Gerhard Wellein is a Professor of High Performance Computing at the Department of Computer Science of Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and holds a PhD in theoretical physics from the University of Bayreuth. From 2015 to 2017 he was also a guest lecturer at the Faculty of Informatics of the Università della Svizzera italiana (USI) in Lugano. Since 2024, he has been a Visiting Professor for HPC at the Delft Institute of Applied Mathematics of Delft University of Technology. He is the director of the Erlangen National High Performance Computing Center (NHR@FAU) and a member of the board of directors of the German NHR Alliance, which coordinates the national Tier-2 HPC infrastructure at German universities.
Gerhard Wellein has more than twenty years of experience in teaching HPC techniques to students and scientists. He has contributed to numerous tutorials on node-level performance engineering over the past decade and received the “2011 Informatics Europe Curriculum Best Practices Award” (together with Jan Treibig and Georg Hager) for outstanding teaching contributions. His research interests focus on performance modeling and performance engineering, architecture-specific code optimization, novel parallelization approaches, and hardware-efficient building blocks for sparse linear algebra and stencil solvers. He has conducted and led numerous national and international HPC research projects and has authored or co-authored more than 100 peer-reviewed publications.
-
FOSTERING THE EUROPEAN ENERGY TRANSITION WITH EXASCALE
(Third Party Funds Group – Sub project)
Overall project: FOSTERING THE EUROPEAN ENERGY TRANSITION WITH EXASCALE
Term: 2024-01-01 - 2026-12-31
Funding source: EU / Cluster 4: Digital, Industry and Space
The Energy-oriented Centre of Excellence for exascale HPC applications (EoCoE-III) applies cutting-edge computational methods in its mission to foster the transition to decarbonized energy in Europe. EoCoE-III is anchored both in the High Performance Computing (HPC) community and in the energy field. It will demonstrate the benefit of HPC for the net-zero energy transition for research institutes and also for key industry in the energy sector. The present project will draw on the experience of the two successful previous projects, EoCoE-I and EoCoE-II, in which a set of diverse computing applications from four energy domains achieved significant efficiency gains thanks to the consortium's multidisciplinary expertise in applied mathematics and supercomputing. During this third round, EoCoE-III will channel its efforts into 5 exascale lighthouse applications covering the key domains of Energy Materials, Water, Wind and Fusion. A world-class consortium of 18 complementary partners from 6 countries will form a unique network of expertise in energy science, scientific computing and HPC, including 3 leading European supercomputing centres. This multidisciplinary effort will harness innovations in computer science and mathematical algorithms within a tightly integrated co-design approach to overcome performance bottlenecks, to deploy the lighthouse applications on the coming European exascale infrastructure and to anticipate future HPC hardware developments. New modelling capabilities will be created at unprecedented scale, demonstrating the potential benefits to the energy industry, such as accelerated design of photovoltaic devices, high-resolution wind farm modelling over complex terrains and quantitative understanding of plasma core-edge interactions in ITER-scale tokamaks. These lighthouse applications will provide a high-visibility platform for high-performance computational energy science, cross-fertilized through close working connections to the EERA consortium.
-
Quelloffene Lösungsansätze für Monitoring und Systemeinstellungen für energieoptimierte Rechenzentren
(Third Party Funds Single)
Term: 2022-09-01 - 2025-08-31
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
URL: https://eehpc.clustercockpit.org/
The aim of this project is to reduce power consumption while maximizing throughput in the operation of HPC systems. This is achieved by optimally adjusting system parameters that influence energy consumption to the respective running jobs. To quantify the throughput of useful work, the Energy Productivity of the IT Equipment metric specified by KPI4DCE is used. The savings potential is demonstrated at all participating data centers for two selected applications each. The project combines a comprehensive job-specific measurement and control infrastructure with machine learning (ML) techniques and software-hardware co-design with the ability to control energy parameters via runtime environments. Policies are used to specify the framework conditions, while the actual optimization of system parameters is automatic and adaptive. To achieve these goals, the open-source GEOPM framework will be extended by a machine learning component. To exploit the full energy-saving potential, automatic phase detection will be developed, as well as extensions to the MPI and OpenMP runtime environments that allow information about application state to be communicated to the GEOPM framework. To capture the required time-resolved metrics on energy consumption as well as the performance behavior of the application, interfaces and extensions in LIKWID will be developed. For visualization and control of the GEOPM functionality, the job-specific performance monitoring framework ClusterCockpit is extended and coupled with GEOPM. The novelty of the approach is the development and provision of a production-ready software environment for fully user-transparent energy optimization of HPC applications. The project builds on existing open-source software components and integrates, extends and adapts them for the new requirements.
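To illustrate the kind of job-specific, time-resolved energy measurement such an infrastructure builds on, here is a minimal sketch that samples the Linux powercap (Intel RAPL) interface around a workload. The sysfs path, the wrap-around handling, and the stand-in workload are illustrative assumptions, not part of the project software, and reading the counters may require elevated privileges:

```python
import time

RAPL = "/sys/class/powercap/intel-rapl:0"  # CPU package 0; path varies by system

def read_energy_uj():
    # RAPL exposes a cumulative energy counter in microjoules
    with open(f"{RAPL}/energy_uj") as f:
        return int(f.read())

def max_energy_uj():
    with open(f"{RAPL}/max_energy_range_uj") as f:
        return int(f.read())

def measure(fn, *args):
    """Return (result, seconds, joules) for one call of fn."""
    e0, t0 = read_energy_uj(), time.perf_counter()
    result = fn(*args)
    e1, t1 = read_energy_uj(), time.perf_counter()
    de = e1 - e0
    if de < 0:                 # cumulative counter wrapped around
        de += max_energy_uj()
    return result, t1 - t0, de * 1e-6

if __name__ == "__main__":
    _, secs, joules = measure(sum, range(50_000_000))   # stand-in workload
    print(f"{secs:.2f} s, {joules:.1f} J, {joules / secs:.1f} W average")
```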
-
Der skalierbare Strömungsraum
(Third Party Funds Group – Sub project)
Overall project: Der skalierbare Strömungsraum
Term: 2022-09-01 - 2025-08-31
Funding source: BMBF / Verbundprojekt
Upcoming exascale computer architectures will be characterized by a very large number of heterogeneous hardware components, including special-purpose processors and accelerators. Implementing CFD application software, the central core component of today's industrial flow simulations, on such systems requires highly scalable methods, above all for solving the high-dimensional, transient (non)linear systems of equations; these methods must also be able to exploit the high peak performance of accelerator hardware algorithmically. Moreover, the methods must be realized in the application software in such a way that non-HPC experts can use them for real applications, in particular for the simulation, control and optimization of industrially relevant processes, while exploiting the high performance of future exascale computers in a resource-efficient manner.
The open-source software FEATFLOW, developed mainly at TU Dortmund, is a powerful CFD tool and a central part of the StrömungsRaum platform, which IANUS Simulation has been using successfully in industrial settings for years. Within the overall project, FEATFLOW will be extended methodologically and through hardware-aware parallel implementations so that highly scalable CFD simulations with FEATFLOW become possible on future exascale architectures.
In the FAU subproject, performance engineering methods and processes are applied and further developed to systematically improve the hardware efficiency and scalability of FEATFLOW for the coming classes of HPC systems and the foreseeable exascale architectures, and thus to substantially reduce simulation times. In particular, the methodological extensions planned in the project will be supported in the implementation of efficient libraries. In addition, performance models will be built for selected kernel routines; these routines will be optimized and their efficient implementations published as proxy applications.
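As a hedged illustration of the performance models mentioned above, the following sketch computes a naive roofline bound for a streaming kernel; the peak performance and memory bandwidth figures are assumed example values, not measurements of any specific system:

```python
def roofline_gflops(flops, bytes_moved, peak_gflops, mem_bw_gbs):
    """Upper performance bound of a loop kernel as the minimum of the
    in-core limit and the memory-bandwidth limit (naive roofline model)."""
    intensity = flops / bytes_moved              # computational intensity, flop/byte
    return min(peak_gflops, intensity * mem_bw_gbs)

# Example: triad-like update a[i] = b[i] + s * c[i] in double precision.
# 2 flops per iteration; 3 * 8 B of loads/stores plus 8 B write-allocate = 32 B.
bound = roofline_gflops(flops=2, bytes_moved=32,
                        peak_gflops=2000.0,      # assumed CPU peak
                        mem_bw_gbs=300.0)        # assumed memory bandwidth
print(f"roofline bound: {bound:.1f} GF/s")
```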
-
DatenREduktion für Exascale-Anwendungen in der Fusionsforschung
(Third Party Funds Group – Sub project)
Overall project: DatenREduktion für Exascale-Anwendungen in der Fusionsforschung
Term: 2022-09-01 - 2025-08-31
Funding source: BMBF / Verbundprojekt
-
Weiterentwicklung des Hochleistungsrechnens
(Third Party Funds Single)
Term: 2022-01-01 - 2022-12-31
Funding source: other funding organization
2024
Algebraic temporal blocking for sparse iterative solvers on multi-core CPUs
In: International Journal of High Performance Computing Applications (2024)
ISSN: 1094-3420
DOI: 10.1177/10943420241283828
Charge-order melting in the one-dimensional Edwards model
In: Physical Review Research 6 (2024), Article No.: L022007
ISSN: 2643-1564
DOI: 10.1103/PhysRevResearch.6.L022007
CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion
38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, CA, 2024-05-27 - 2024-05-31)
In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
DOI: 10.1109/IPDPS57955.2024.00038
Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs
38th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 (San Francisco, 2024-05-27 - 2024-05-31)
In: 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2024
DOI: 10.1109/IPDPS57955.2024.00043
2023
Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives
In: Future Generation Computer Systems (2023)
ISSN: 0167-739X
DOI: 10.1016/j.future.2023.06.017
Physical Oscillator Model for Supercomputing
14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) (Denver, CO, USA, 2023-11-12 - 2023-11-17)
In: 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) 2023
DOI: 10.1145/3624062.3625535
SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study
14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) (Denver, CO, USA, 2023-11-12 - 2023-11-17)
In: 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23) 2023
DOI: 10.1145/3624062.3624197
Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications
14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022 (Gdansk, Poland, 2022-09-11 - 2022-09-14)
In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (ed.): Lecture Notes in Computer Science 2023
DOI: 10.1007/978-3-031-30442-2_12
Analytical performance estimation during code generation on modern GPUs
In: Journal of Parallel and Distributed Computing 173 (2023), p. 152-167
ISSN: 0743-7315
DOI: 10.1016/j.jpdc.2022.11.003
2D-dwell-time analysis with simulations of ion-channel gating using high-performance computing.
In: Biophysical Journal (2023)
ISSN: 0006-3495
DOI: 10.1016/j.bpj.2023.02.023
MD-Bench: A Generic Proxy-App Toolbox for State-of-the-Art Molecular Dynamics Algorithms
In: Parallel Processing and Applied Mathematics (PPAM 2022), Springer, Cham, 2023, p. 321-332 (Lecture Notes in Computer Science (LNCS), Vol. 13826)
ISBN: 978-3-031-30441-5
DOI: 10.1007/978-3-031-30442-2_24
MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms
In: Future Generation Computer Systems 149 (2023), p. 25-38
ISSN: 0167-739X
DOI: 10.1016/j.future.2023.06.023
2022
Addressing White-box Modeling and Simulation Challenges in Parallel Computing
ACM SIGSIM-PADS '22 (Atlanta, GA, USA, 2022-06-08 - 2022-06-10)
In: SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation 2022
DOI: 10.1145/3518997.3534986
Analytic performance model for parallel overlapping memory-bound kernels
In: Concurrency and Computation: Practice and Experience (2022)
ISSN: 1532-0626
DOI: 10.1002/cpe.6816
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.6816
The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs
In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-16
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3221085
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
In: IEEE Transactions on Parallel and Distributed Systems (2022), p. 1-18
ISSN: 1045-9219
DOI: 10.1109/TPDS.2022.3223512
2021
Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact
36th International Conference on High Performance Computing, ISC High Performance 2021 (Virtual, Online, 2021-06-24 - 2021-07-02)
In: Bradford L. Chamberlain, Ana-Lucia Varbanescu, Hatem Ltaief, Piotr Luszczek (ed.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2021
DOI: 10.1007/978-3-030-78713-4_19
Execution-Cache-Memory modeling and performance tuning of sparse matrix-vector multiplication and Lattice quantum chromodynamics on A64FX
In: Concurrency and Computation: Practice and Experience (2021)
ISSN: 1532-0626
DOI: 10.1002/cpe.6512
URL: https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.6512
YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures
19th IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2021 (Virtual, Korea, KOR, 2021-02-27 - 2021-03-03)
In: Jae W. Lee, Mary Lou Soffa, Ayal Zaks (ed.): CGO 2021 - Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization 2021
DOI: 10.1109/CGO51591.2021.9370316
Opening the Black Box: Performance Estimation during Code Generation for GPUs
IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (Belo Horizonte, Brazil, 2021-10-26 - 2021-10-28)
DOI: 10.1109/sbac-pad53543.2021.00014
Valley filtering in strain-induced α-T3 quantum dots
In: Physical Review B 103 (2021), Article No.: 165114
ISSN: 2469-9950
DOI: 10.1103/PhysRevB.103.165114
Multiway p-spectral graph cuts on Grassmann manifolds
In: Machine Learning (2021)
ISSN: 0885-6125
DOI: 10.1007/s10994-021-06108-1
2020
Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs
35th International Conference on High Performance Computing, ISC High Performance 2020 (Frankfurt, 2020-06-22 - 2020-06-25)
In: Ponnuswamy Sadayappan, Bradford L. Chamberlain, Guido Juckeland, Hatem Ltaief (ed.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2020
DOI: 10.1007/978-3-030-50743-5_20
A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication
In: ACM Transactions on Parallel Computing 7 (2020), Article No.: 19
ISSN: 2329-4949
DOI: 10.1145/3399732
Understanding HPC benchmark performance on Intel Broadwell and Cascade Lake processors
35th International Conference on High Performance Computing, ISC High Performance 2020 (Frankfurt, 2020-06-22 - 2020-06-25)
In: Ponnuswamy Sadayappan, Bradford L. Chamberlain, Guido Juckeland, Hatem Ltaief (ed.): Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2020
DOI: 10.1007/978-3-030-50743-5_21
Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX
2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2020 (2020-11-12)
In: Proceedings of PMBS 2020: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems 2020
DOI: 10.1109/PMBS51919.2020.00006
Analytic performance modeling and analysis of detailed neuron simulations
In: International Journal of High Performance Computing Applications (2020)
ISSN: 1094-3420
DOI: 10.1177/1094342020912528
Performance engineering for a tall & skinny matrix multiplication kernel on GPUs
13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 (Bialystok, Poland, 2019-09-08 - 2019-09-11)
In: Lecture Notes in Computer Science, Vol. 12043, Cham: 2020
DOI: 10.1007/978-3-030-43229-4_43
Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs
In: International Journal of High Performance Computing Applications (2020)
ISSN: 1094-3420
DOI: 10.1177/1094342020965661
Bridging the architecture gap: Abstracting performance-relevant properties of modern server processors
In: Supercomputing Frontiers and Innovations 7 (2020), p. 54-78
ISSN: 2409-6008
DOI: 10.14529/jsfi200204
Energy efficiency of nonlinear domain decomposition methods
In: International Journal of High Performance Computing Applications (2020)
ISSN: 1094-3420
DOI: 10.1177/1094342020953891
Exasteel: Towards a virtual laboratory for the multiscale simulation of dual-phase steel using high-performance computing
In: Lecture Notes in Computational Science and Engineering, Springer, 2020, p. 351-404 (Lecture Notes in Computational Science and Engineering, Vol.136)
DOI: 10.1007/978-3-030-47956-5_13
PHIST: A Pipelined, Hybrid-Parallel Iterative Solver Toolkit
In: ACM Transactions on Mathematical Software 46 (2020), Article No.: 3402227
ISSN: 0098-3500
DOI: 10.1145/3402227
2024
- Performance Engineering for High Performance Computing
(Speech / Talk)
2024-11-13, Event: CNLS Seminar, Los Alamos National Laboratory
2023
- The National High-Performance Computing Alliance: New infrastructure and opportunities for science and research at German universities
(Speech / Talk)
2023-03-15, Event: 35th Molecular Modelling Workshop 2023
- NHR Alliance University High Performance Computing in Germany
(Speech / Talk)
2023-03-22, Event: Seminar, Delft University of Technology (TU Delft)
- Application Knowledge Required: Performance Modeling for Fun and Profit
(Speech / Talk)
2023-05-25, Event: Seminar, Helmut-Schmidt-Universität - Universität der Bundeswehr Hamburg
- Thirteen modern ways to fool the masses with performance results on parallel computers
(Speech / Talk)
2023-06-06, Event: Dinner Talk - DCSE Summerschool: Numerical Linear Algebra on High Performance Computers, Delft University of Technology (TU Delft)
- Application Knowledge Required: Performance Modeling for Fun and Profit
(Speech / Talk)
2023-06-07, Event: HPC mini-symposium, Institute for Computational Science and Engineering (DCSE), Delft University of Technology (TU Delft)
- The National High-Performance Computing Alliance and NHR@FAU: New Structures and Opportunities
(Speech / Talk)
2023-08-03, Event: Seminar, University of Victoria (UVic)
- Performance modelling facing disruptive technologies on the horizon
(Speech / Talk)
2023-08-10, Event: ModSim 2023 - Workshop on Modeling & Simulation of Systems and Applications
- Application Knowledge Required: Analytical Performance Modeling and its application to SpMV
(Speech / Talk)
2023-09-27, Event: 2023 Woudschoten Conference
- Performance Engineering for Sparse Matrix-Vector Multiplication: Some new ideas for old problems
(Speech / Talk)
2023-09-28, Event: 2023 Woudschoten Conference
2022
- Level-based Blocking for Sparse Matrix-Power-Vector Multiplication
(Speech / Talk)
2022-08-19, Event: CS Seminar, Lawrence Berkeley National Laboratory (LBNL)
- The National High Performance Computing Alliance and NHR@FAU: New structures and opportunities
(Speech / Talk)
2022-11-23, Event: Kolloquium des Forschungszentrums für wissenschaftliches Rechnen, Universität Bayreuth
- Power, Energy and HPC
(Speech / Talk)
2022-11-24, Event: Sustainability and Computational Science 2022, Lund University, URL: https://www.compute.lu.se/other-activities/sustainability-and-computational-science-2022/
- The National High Performance Computing Alliance and NHR@FAU: New structures and opportunities
(Speech / Talk)
2022-12-19, Event: Physik Kolloquium, Universität Regensburg
2019
-
Energy Oriented Center of Excellence: toward exascale for energy (Performance evaluation, modelling and optimization)
(Third Party Funds Group – Sub project)
Overall project: Energy Oriented Center of Excellence: toward exascale for energy
Term: 2019-01-01 - 2021-12-31
Funding source: Europäische Union (EU)
2017
-
Metaprogrammierung für Beschleunigerarchitekturen
(Third Party Funds Single)
Term: 2017-01-01 - 2019-12-31
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
In Metacca, the AnyDSL framework is extended into a homogeneous programming environment for heterogeneous single- and multi-node systems. UdS will extend the AnyDSL compiler and type system to enable programmers to program accelerators productively. Building on this, LSS will develop suitable abstractions for distribution and synchronization on single- and multi-node machines in the form of a DSL embedded in AnyDSL. All components will be supported by performance models (RRZE). A runtime environment with built-in performance profiling takes care of resource management and system configuration. The resulting framework will be evaluated with two applications, ray tracing (DFKI) and bioinformatics (JGU). Target platforms are single nodes and clusters with multiple accelerators (CPUs, GPUs, Xeon Phi).
The University of Erlangen-Nürnberg is primarily responsible for the support of distributed programming (LSS) as well as for the development and implementation of supporting performance models and an integrated profiling component (RRZE). Both subareas begin with a requirements analysis to plan further steps and coordinate them with the partners. In the first year, LSS will implement the distribution of the data structures; subsequent work will concentrate on synchronization mechanisms. In the final year, code transformations will be designed to adapt the distribution and synchronization concepts in AnyDSL to the chosen applications. RRZE will first integrate the kerncraft framework into the partial evaluation, extending kerncraft to support current accelerator architectures and models for distributed-memory parallelization. Two further work packages cover resource management and a LIKWID-based profiling component.
-
Process-Oriented Performance Engineering Service Infrastructure for Scientific Software at German HPC Centers
(Third Party Funds Single)
Term: 2017-01-01 - 2019-12-31
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
URL: https://blogs.fau.de/prope/
The ProPE project will deploy a prototype HPC user support infrastructure as a distributed, cross-site collaborative effort of several tier-2/3 centers with complementary HPC expertise. Within ProPE, code optimization and parallelization of scientific software is seen as a structured, well-defined process with sustainable outcome. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering (PE) process. This PE process defines and drives code optimization and parallelization as a target-oriented, structured activity. Application hot spots are identified first and then optimized/parallelized in an iterative cycle: starting with an analysis of the algorithm, the code, and the target hardware, a hypothesis of the performance-limiting factors is proposed based on performance patterns and models. Performance measurements validate or guide the iterative adaptation of the hypothesis. After validation of the hardware bottleneck, appropriate code changes are deployed and the PE cycle restarts. The level of detail of the PE process can be adapted to the complexity of the underlying problem and the experience of the HPC analyst. Currently this process is applied by experts and at the prototype level. ProPE will formalize and document the PE process and apply it to various scenarios (single core/node optimization, distributed parallelization, IO-intensive problems). Different abstraction levels of the PE process will be implemented and disseminated to HPC analysts and application developers via user support projects, teaching activities, and web documentation.
The integration of the PE process into modern IT infrastructure across several centers with different HPC support expertise will be the second project focus. All components of the PE process will be coordinated and standardized across the partnering sites. This way the complete HPC expertise within ProPE can be offered as a coherent service on a nationwide scale. Ongoing support projects can be transferred easily between participating centers. In order to identify low-performing applications, characterize application loads, and quantify the benefits of the PE activities at a system level, ProPE will employ a system monitoring infrastructure for HPC clusters. This tool will be tailored to the requirements of the PE process and designed for easy deployment and usage at tier-2/3 centers. The associated ProPE partners will ensure the embedding into the German HPC infrastructure and provide basic PE expertise in terms of algorithmic choices, perfectly complementing the code optimization and parallelization efforts of ProPE.
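The hypothesis step of such a PE cycle can be made concrete with a small sketch that compares a kernel's code balance against the machine balance; the kernel traffic estimate and the hardware figures below are illustrative assumptions:

```python
def bottleneck_hypothesis(flops, bytes_moved, peak_gflops, mem_bw_gbs):
    """Compare code balance (bytes/flop the kernel requests) with machine
    balance (bytes/flop the machine can supply at peak) to form a hypothesis."""
    code_balance = bytes_moved / flops
    machine_balance = mem_bw_gbs / peak_gflops
    if code_balance > machine_balance:
        return code_balance, machine_balance, "memory bandwidth"
    return code_balance, machine_balance, "in-core execution"

# Sparse matrix-vector multiplication in CRS format, double precision:
# roughly 2 flops and at least 12 B (matrix value + column index) per nonzero.
cb, mb, limiter = bottleneck_hypothesis(flops=2, bytes_moved=12,
                                        peak_gflops=2000.0, mem_bw_gbs=300.0)
print(f"code balance {cb:.1f} B/F vs machine balance {mb:.2f} B/F "
      f"-> hypothesis: {limiter}-bound")
```

A subsequent measurement either confirms the hypothesis or sends the analyst back to refine the traffic estimate, which is exactly the iterative loop the PE process formalizes.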
-
Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen
(Third Party Funds Single)
Term: 2017-03-01 - 2020-02-29
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
The SeASiTe research project undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide the prototype of a toolbox with which programmers can equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation with respect to relevant system and program parameters as well as possible program transformations.
Optimizing program execution for several non-functional objectives (e.g. runtime or energy consumption) is to build on performance modeling that narrows the search space of efficient program variants. Application-independent methods and strategies for self-adaptation will be encapsulated in an autotuning navigator. The Erlangen subproject first addresses a model-based understanding of autotuning methods for regular simulation algorithms, using several common stencil classes as examples. With the help of extended performance models, structured guidelines and recommendations for the autotuning process, concerning relevant code transformations and the restriction of the search space of optimization parameters, will be derived and prepared exemplarily for the autotuning navigator.
The second focus of the work is the extension of existing analytical performance models and software tools to new computer architectures and their integration into the autotuning navigator. In addition, the Erlangen group maintains the demonstrator for stencil codes, and contributes to the design of the autotuning navigator and the definition of its interfaces.
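A minimal sketch of the search-space exploration that such an autotuning navigator is meant to guide and restrict, here a brute-force scan over loop block sizes for a 2D Jacobi-type stencil sweep (purely illustrative, not the project's toolbox):

```python
import time
import numpy as np

def jacobi_sweep_blocked(a, b, bi, bj):
    """One blocked 5-point Jacobi sweep: b <- average of a's four neighbors."""
    n, m = a.shape
    for i0 in range(1, n - 1, bi):
        for j0 in range(1, m - 1, bj):
            i1, j1 = min(i0 + bi, n - 1), min(j0 + bj, m - 1)
            b[i0:i1, j0:j1] = 0.25 * (a[i0-1:i1-1, j0:j1] + a[i0+1:i1+1, j0:j1]
                                      + a[i0:i1, j0-1:j1-1] + a[i0:i1, j0+1:j1+1])

a = np.random.rand(2048, 2048)
b = np.empty_like(a)
results = {}
for bi in (64, 256, 1024):          # restricted search space of block sizes
    for bj in (64, 256, 1024):
        t0 = time.perf_counter()
        jacobi_sweep_blocked(a, b, bi, bj)
        results[(bi, bj)] = time.perf_counter() - t0
best = min(results, key=results.get)
print("best block size:", best, f"{results[best] * 1e3:.1f} ms")
```

In practice the performance models described above would prune most of this grid before anything is timed, which is precisely the role of the navigator.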
2016
-
SPP EXA 1648
(Third Party Funds Group – Sub project)
Overall project: SPP EXA 1648
Term: 2016-01-01 - 2019-12-31
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
-
EXASTEEL II - Bridging Scales for Multiphase Steels
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
URL: http://www.numerik.uni-koeln.de/14079.html
In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.
There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous (dual-phase) steels, attempting to predict the macroscopic properties of new materials from those on the microscopic level. The goal is to develop algorithms and software towards a virtual laboratory for predictive material testing in silico. A bottleneck is the computational complexity of the multiscale models needed to describe the new materials, involving sufficiently accurate, micromechanically motivated models on the crystalline scale. Therefore, new ultra-scalable nonlinear implicit solvers will be developed and combined with a highly parallel computational scale-bridging approach (FE^2), intertwined with consistent and permanent performance engineering, to bring the challenging engineering application of a virtual laboratory for material testing and design to extreme-scale computing. We envisage a continuously increasing transition from descriptive to predictive macroscopic simulations and take into account, to the best of our knowledge for the first time within a computational scale-bridging approach, the polycrystalline nature of dual-phase steels including grain boundary effects at the microscale.
Our goals could not be reached without building on the algorithm and software infrastructure from EXASTEEL-1. We will complete the paradigm shift, begun in the EXASTEEL-1 project, from Newton-Krylov solvers to nonlinear methods (and their composition) with improved concurrency and reduced communication. By combining nonlinear domain decomposition with multigrid methods we plan to leverage the scalability of both implicit solver approaches for nonlinear methods.
Although our application is specific, the algorithms and optimized software will have an impact well beyond the particular application. Nonlinear implicit solvers are at the heart of many simulation codes, and our software building blocks PETSc, BoomerAMG, PARDISO, and FEAP are all software packages with a large user base. The advancement of these software packages is explicitly planned for in the work packages of this project.
The project thus addresses computational algorithms (nonlinear implicit solvers and scale bridging), application software, and programming (PE, hybrid programming, accelerators).
-
Equipping Sparse Solvers for Exascale II (ESSEX-II)
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2016-01-01 - 2018-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
URL: https://blogs.fau.de/essex/activities
The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction boundaries separating these layers are broken in ESSEX-II by strongly integrating objectives: scalability, numerical reliability, fault tolerance, and holistic performance and power engineering. Driven by Moore's Law and power dissipation constraints, computer systems will become more parallel and heterogeneous even on the node level in upcoming years, further increasing overall system parallelism. MPI+X programming models can be adapted in flexible ways to the underlying hardware structure and are widely expected to be able to address the challenges of the massively multi-level parallel heterogeneous architectures of the next decade. Consequently, the parallel building blocks layer supports MPI+X, with X being a combination of node-level programming models able to fully exploit hardware heterogeneity, functional parallelism, and data parallelism. In addition, facilities for fully asynchronous checkpointing, silent data corruption detection and correction, performance assessment, performance model validation, and energy measurements will be provided. The algorithms layer will leverage the components in the building blocks layer to deliver fully heterogeneous, automatically fault-tolerant, and state-of-the-art implementations of Jacobi-Davidson eigensolvers, the Kernel Polynomial Method (KPM), and Chebyshev Time Propagation (ChebTP) that are ready to use for production on modern heterogeneous compute nodes with best performance and numerical accuracy. Chebyshev filter diagonalization (ChebFD) and a Krylov eigensolver complement these implementations, and the recent FEAST method will be investigated and further developed for improved scalability. The applications layer will deliver scalable solutions for conservative (Hermitian) and dissipative (non-Hermitian) quantum systems with strong links to optics and biology and to novel materials such as graphene and topological insulators. Extending its predecessor project, ESSEX-II adopts an additional focus on production-grade software. Although the selection of algorithms is strictly motivated by quantum physics application scenarios, the underlying research directions of algorithmic and hardware efficiency, accuracy, and resilience will radiate into many fields of computational science. Most importantly, all developments will be accompanied by an uncompromising performance engineering process that will rigorously expose any discrepancy between expected and observed resource efficiency.
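As a hedged, self-contained illustration of one of the algorithms named above, the following computes Chebyshev moments in the spirit of the Kernel Polynomial Method with SciPy, assuming the matrix spectrum has already been scaled into [-1, 1]; this is textbook KPM on a toy Hamiltonian, not the ESSEX production kernels:

```python
import numpy as np
import scipy.sparse as sp

def kpm_moments(H, v, n_moments):
    """Chebyshev moments mu_n = <v| T_n(H) |v> via the three-term
    recurrence T_{n+1} = 2 H T_n - T_{n-1}; H must be scaled to [-1, 1]."""
    w0 = v.copy()
    w1 = H @ v
    mu = np.empty(n_moments)
    mu[0], mu[1] = v @ w0, v @ w1
    for n in range(2, n_moments):
        w0, w1 = w1, 2.0 * (H @ w1) - w0   # sparse matrix-vector product dominates
        mu[n] = v @ w1
    return mu

# Toy Hamiltonian: 1D tight-binding chain; hopping 0.5 keeps the spectrum in [-1, 1].
N = 1000
H = sp.diags([0.5, 0.5], [-1, 1], shape=(N, N), format="csr")
v = np.random.randn(N)
v /= np.linalg.norm(v)
print(kpm_moments(H, v, 8))
```

The loop body is dominated by the sparse matrix-vector product, which is why the building-blocks layer and its performance engineering matter so much for KPM-type algorithms.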
-
Bridging scales - from Quantum Mechanics to Continuum Mechanics. A Finite Element approach.
(Third Party Funds Single)
Term: 2016-01-01 - 2018-09-30
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
The concurrently coupled Quantum Mechanics (QM) - Continuum Mechanics (CM) approach for electro-elastic problems is considered in this proposal. Despite the fact that efforts have been made to bridge the different descriptions of matter, many questions are yet to be answered. First, an efficient Finite Element (FE)-based solution approach to the Kohn-Sham (KS) equations of Density Functional Theory (DFT) will be further developed. The h-adaptivity in the FE-based solution with non-local pseudo-potentials, as well as the mesh transformation during the structural optimization and the formulation of the deformation map, are the main topics to be studied. It should be noted that until now there exists no open-source implementation of the DFT approach which uses an FE basis and provides hp-refinement capabilities. An FE basis is very attractive in the context of DFT because of its completeness, refinement possibilities, and good polarization properties based on domain decomposition. Second, QM quantities will be related to their CM counterparts (e.g. displacements, deformation gradient, the Piola stress, polarization, etc.). This will be achieved using averaging in the Lagrangian configuration. To that end, full control over an FE-based solution of the KS equations is required. The procedure will then be tested on a representative numerical example: bending of a single-wall carbon nanotube. On the CM side, the surface-enhanced continuum theory will be utilized to properly capture surface effects. It should be noted that although several theoretical works exist on this matter, no numerical attempts have been made to check their validity on test examples. Lastly, based on the correspondence between the different formulations, a concurrently coupled QM-CM method will be proposed. Coupling will be achieved in a staggered way, i.e. the QM and CM problems will be solved iteratively with a proper exchange of information between them. A test problem of crack propagation in a graphene sheet will be considered. As a long-term goal of the project, coupling strategies for electro-elastic problems will be developed. To the best of my knowledge, none of the existing QM-CM coupling methods can handle electro-elastic problems.
-
Ultra-Skalierbare Multiphysiksimulationen für Erstarrungsprozesse in Metallen
(Third Party Funds Group – Overall project)
Term: 2016-02-01 - 2019-01-31
Funding source: BMBF / Verbundprojekt
Thanks to rapidly increasing compute power, complex phenomena in science and engineering are investigated more and more often with realistic simulation techniques. The resulting discipline, Computational Science and Engineering (CSE), is therefore considered a new, third pillar of science that complements and reinforces the two classical pillars of theory and experiment. At its core, CSE is about designing and analyzing powerful simulation methods for current and future supercomputers and implementing them robustly, reliably, and in a user-friendly way for practical use.
Modern, highly efficient simulation techniques are indispensable today for developing new materials with better properties and for optimizing manufacturing and production processes. They largely replace the traditional time- and cost-intensive experiments otherwise required for material development and for improving the quality of material components. At the same time, materials simulations pose a major challenge for fundamental research and for high-performance computing.
The mechanical properties of a material are determined to a large extent by the microstructure that forms during the manufacturing process, i.e. during solidification from the melt. Simulating the solidification process can yield important new insights into microstructure formation processes that cannot be observed experimentally, and it makes it possible to analyze systematically how they influence the resulting structure. In the future this will allow new materials with specific properties to be designed virtually, on the computer.
Simulation-based research and development for this problem requires very fine spatial and temporal resolution to capture all relevant physical effects, and therefore extremely high compute power. To solve such problems on future large-scale systems with many thousands of compute nodes, the simulation software must not only be able to use all these nodes simultaneously, it must also deliver maximum compute performance at the lowest possible resource consumption. Besides pure compute time, the energy consumption of the supercomputers becomes highly significant here. The waLBerla framework serves as the software basis of SKAMPY. In this project, waLBerla is extended to solve new application-oriented problems in materials science, using specially developed programming methods that allow particularly good exploitation of supercomputers. A promising joint feasibility study on simulating solidification processes in metal alloys has already demonstrated the performance of the approach and its portability to the architectures of all three German national supercomputers, so the project consortium is well positioned to make supercomputer simulations sustainably usable for future, significantly more complex research questions.
2012
-
ESSEX - Equipping Sparse Solvers for Exascale
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2012-11-01 - 2019-06-30
Funding source: DFG / Schwerpunktprogramm (SPP)
The ESSEX project investigates the computational issues arising for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers, which comprise building blocks, algorithms, and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault tolerance mechanisms are the driving forces behind all developments. Classic Krylov, Jacobi-Davidson, and recent FEAST methods will be enabled for exascale computing and equipped with advanced, scalable preconditioners. New implementations of domain-specific iterative schemes in physics and chemistry, namely the established Chebyshev expansion techniques for the computation of spectral properties and their novel extension to the time evolution of driven quantum systems, complement these algorithms. The software solutions of the ESSEX project will be combined into an Exascale Sparse Solver Repository (“ESSR”), where the specific demands of the quantum physics users are recognized by integration of quantum state encoding techniques at the fundamental level. The relevance of this project can then be demonstrated through application of the ESSR algorithms to graphene-based structures, topological insulators, and quantum Hall effect devices. Such studies require exascale resources together with modern numerical methods to determine many eigenstates at a given point of the spectrum of extremely large matrices or to compute an approximation to their full spectrum. The concepts, methods, and software building blocks developed in the ESSEX project serve as general blueprints for other scientific application areas that depend on sparse iterative algorithms. The strong vertical interaction between all three project layers ensures that users can quickly utilize any progress on the lower layers and immediately use the power of exascale machines once they become available.
-
ESSEX - Equipping Sparse Solvers for Exascale
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2012-11-01 - 2015-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
-
EXASTEEL - Bridging Scales for Multiphase Steels
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2012-11-01 - 2015-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
This project addresses algorithms and software for the simulation of three-dimensional multiscale materials science problems on the future supercomputers developed for exascale computing. The performance of modern high-strength steels is governed by the complex interaction of the individual constituents on the microscale. Direct computational homogenization schemes such as the FE2 method allow for high-fidelity material design and analysis of modern steels. Using this approach, fluctuations of the local field equations (balance laws) can be resolved to a high accuracy, which is needed for the prediction of failure of such micro-heterogeneous materials. Performing the scale bridging within the FE2 method for realistic problems in 3D still requires new ultra-scalable, robust algorithms and solvers, which have to be developed and incorporated into new application software. Such algorithms must be specifically designed to allow the efficient use of the future hardware. Here, the direct multiscale approach (FE2) will be combined with new, highly efficient, parallel solver algorithms. For the latter, a hybrid algorithmic approach will be taken, combining non-overlapping parallel domain decomposition (FETI) methods with efficient parallel multigrid preconditioners. A comprehensive performance engineering approach will be implemented, guided by the PI Wellein, to ensure a systematic optimization and parallelization process across all software layers. This project builds on parallel simulation software developed for the solution of complex nonlinear structural mechanics problems by the PIs Schröder, Balzani and Klawonn, Rheinbach. It is based on the application software package FEAP (Finite Element Analysis Program, R. Taylor, UC Berkeley). Within a new software environment, FEAP has been combined with a FETI-DP domain decomposition solver based on PETSc (Argonne National Laboratory) and hypre (Lawrence Livermore National Laboratory), e.g., to perform parallel simulations in nonlinear biomechanics. The optimization, performance modeling, and performance engineering will be guided by the PI Wellein. The PIs Schröder and Balzani have performed FE2 simulations in the past using an extended version of FEAP. The envisioned scale bridging for realistic, advanced engineering problems in three dimensions will require a computational power that will only be obtainable when exascale computing becomes available.
-
TERRA-NEO - Integrated Co-Design of an Exascale Earth Mantle Modeling Framework
(Third Party Funds Group – Sub project)
Overall project: SPP 1648: Software for Exascale Computing
Term: 2012-11-01 - 2015-12-31
Funding source: DFG / Schwerpunktprogramm (SPP)
Much of what one refers to as geological activity of the Earth is due to the fact that heat is transported from the interior of our planet to the surface in a planet-wide solid-state convection in the Earth's mantle. For this reason, the study of the dynamics of the mantle is critical to our understanding of how the entire planet works. Processes from earthquakes, plate tectonics, and crustal evolution to the geodynamo are governed by convection in the mantle. Without a detailed knowledge of Earth's internal dynamic processes, we cannot hope to deduce the many interactions between shallow and deep Earth processes that dominate the Earth system. The vast forces associated with mantle convection cells drive horizontal movement of Earth's surface in the form of plate tectonics, which is well known albeit poorly understood. They also induce substantial vertical motion in the form of dynamically maintained topography that manifests itself prominently in the geologic record through sea level variations and their profound impact on the ocean and climate system. Linking mantle processes to their surface manifestations is seen widely today as one of the most fundamental problems in the Earth sciences, while being at the same time a matter of direct practical relevance through the evolution of sedimentary basins and their paramount economic importance.
Simulating Earth mantle dynamics requires a resolution in space and time that makes it one of the grand challenge applications in the computational sciences. With the exascale systems of the future it will be possible to advance beyond the deterministic forward problem to a stochastic uncertainty analysis for the inverse problem. In fact, fluid dynamic inverse theory is now at hand that will allow us to track mantle motion back into the past, exploiting the rich constraints available from the geologic record, subject to the availability of powerful geodynamical simulation software that could take advantage of these future supercomputers.
The new community code TERRA-NEO will be based on a carefully designed multi-scale space-time discretization using hybridized Discontinuous Galerkin elements on an icosahedral mesh with block-wise refinement. This advanced finite element technique promises better stability and higher accuracy for the non-linear transport processes in the Earth mantle while requiring less communication in a massively parallel setting. The resulting algebraic systems with finally more than 10^12 unknowns per time step will be solved by a new class of communication-avoiding, asynchronous multigrid preconditioners that will achieve maximal scalability and resource-optimized computational performance. A non-deterministic control flow and a lazy evaluation strategy will alleviate the traditional over-synchronization of hierarchical iterative methods and will support advanced resiliency techniques on the algorithmic level. The software framework of TERRA-NEO will be developed specifically for the upcoming heterogeneous exascale computers by using an advanced architecture-aware design process. Special white-box performance models will guide the software development, leading to a holistic co-design of the data structures and the algorithms on all levels. With this systematic performance engineering methodology we will also optimize a balanced compromise between minimal energy consumption and shortest run time. This consortium is fully committed to the interdisciplinary collaboration that is necessary for creating TERRA-NEO as a new exascale simulation framework. To this end, TERRA-NEO brings together top experts that cover all aspects of CS&E, from modeling via discretization to solvers and software engineering for exascale architectures.
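Multigrid, the solver family at the heart of the project, can be illustrated with a textbook sketch; the following is a plain geometric V-cycle for a 1D Poisson model problem, not TERRA-NEO's hybridized Discontinuous Galerkin solver or its asynchronous preconditioners:

```python
import numpy as np

def vcycle(u, f, nu=2, omega=2/3):
    """One V-cycle of geometric multigrid for -u'' = f on [0, 1] with
    u(0) = u(1) = 0: weighted-Jacobi smoothing, full-weighting restriction,
    linear-interpolation prolongation. len(u) - 1 must be a power of two."""
    n = len(u) - 1
    h2 = (1.0 / n) ** 2
    for _ in range(nu):                       # pre-smoothing
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h2 * f[1:-1])
    if n > 2:
        r = np.zeros_like(u)                  # residual r = f - A u
        r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h2
        rc = np.zeros(n // 2 + 1)             # restrict by full weighting
        rc[1:-1] = 0.25 * (r[1:-3:2] + 2 * r[2:-1:2] + r[3::2])
        ec = vcycle(np.zeros_like(rc), rc, nu, omega)   # coarse-grid correction
        e = np.empty_like(u)                  # prolongate linearly
        e[::2] = ec
        e[1::2] = 0.5 * (ec[:-1] + ec[1:])
        u += e
    for _ in range(nu):                       # post-smoothing
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h2 * f[1:-1])
    return u

n = 256
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi ** 2 * np.sin(np.pi * x)            # exact solution: sin(pi x)
u = np.zeros(n + 1)
for _ in range(10):
    u = vcycle(u, f)
print("max error vs exact solution:", np.abs(u - np.sin(np.pi * x)).max())
```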
2011
-
Eine fehlertolerante Umgebung für peta-scale MPI-Löser
(Third Party Funds Group – Sub project)
Overall project: FEToL
Term: 2011-06-01 - 2014-05-31
Funding source: Bundesministerium für Bildung und Forschung (BMBF)
2009
-
SKALB: Lattice-Boltzmann-Methoden für skalierbare Multi-Physik-Anwendungen
(Third Party Funds Group – Overall project)
Term: 2009-01-01 - 2011-12-31
Funding source: BMBF / Verbundprojekt, Bundesministerium für Bildung und Forschung (BMBF)
The goal of the BMBF-funded project SKALB (Lattice-Boltzmann-Methoden für skalierbare Multi-Physik-Anwendungen) is the efficient implementation and further development of lattice Boltzmann based flow solvers for simulating complex multi-physics applications on petascale-class computers. The lattice Boltzmann method is an accepted solution approach in computational fluid dynamics. Its central advantage is the basic simplicity of the numerical scheme, which allows both complex flow geometries, such as porous media or metal foams, and direct numerical simulations (DNS) of turbulent flows to be computed efficiently. In SKALB, lattice Boltzmann applications are to be advanced methodologically and technically for the new classes of massively parallel heterogeneous and homogeneous supercomputers. RRZE contributes its long-standing experience in performance modeling and efficient implementation of lattice Boltzmann methods on a broad spectrum of modern computers, and additionally works on new programming approaches for multi-/many-core processors. The application code further developed at RRZE will be used together with the group of Prof. Schwieger for massively parallel simulation of flows in porous media.
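A hedged sketch of the kind of performance model used in this work: a memory-bandwidth ceiling on lattice-site updates per second for a D3Q19 lattice Boltzmann kernel, where the traffic assumptions and the bandwidth figure are illustrative:

```python
def lbm_mlups_bound(mem_bw_gbs, q=19, precision_bytes=8, write_allocate=True):
    """Memory-bandwidth ceiling for a lattice Boltzmann kernel in MLUP/s.
    Per cell update, q distribution values are read and q are written;
    with write-allocate the stores are first loaded, adding q more reads."""
    bytes_per_cell = q * precision_bytes * (3 if write_allocate else 2)
    return mem_bw_gbs * 1e9 / bytes_per_cell / 1e6

print(f"{lbm_mlups_bound(300.0):.0f} MLUP/s at an assumed 300 GB/s")
```

Comparing such a ceiling against measured update rates shows immediately whether an implementation already saturates the memory interface or still leaves optimization headroom.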
2019
Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 2019-09-23 - 2019-09-26)
In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
DOI: 10.1109/CLUSTER.2019.8890995
Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects
In: Japan Journal of Industrial and Applied Mathematics (2019)
ISSN: 0916-7005
DOI: 10.1007/s13160-019-00360-8
Code generation for massively parallel phase-field simulations
2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 (Denver, CO, 2019-11-17 - 2019-11-22)
In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019
DOI: 10.1145/3295500.3356186
ClusterCockpit-A web application for job-specific performance monitoring
2019 IEEE International Conference on Cluster Computing, CLUSTER 2019 (Albuquerque, NM, 2019-09-23 - 2019-09-26)
In: Proceedings - IEEE International Conference on Cluster Computing, ICCC 2019
DOI: 10.1109/CLUSTER.2019.8891017
Collecting and presenting reproducible intranode stencil performance: INSPECT
In: Supercomputing Frontiers and Innovations 6 (2019), p. 4-25
ISSN: 2409-6008
DOI: 10.14529/jsfi190301
Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels
10th IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2019
DOI: 10.1109/PMBS49563.2019.00006
Automated instruction stream throughput prediction for intel and AMD microarchitectures
2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2018 (Dallas, TX, 2018-11-12)
In: Proceedings of PMBS 2018: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis 2019
DOI: 10.1109/PMBS.2018.8641578
2018
Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
In: International Journal of High Performance Computing Applications 32 (2018), p. 220-230
ISSN: 1094-3420
DOI: 10.1177/1094342016646844
Chebyshev filter diagonalization on modern manycore processors and GPGPUs
Springer Verlag, 2018
ISBN: 9783319920399
DOI: 10.1007/978-3-319-92040-5_17
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures
2018 ACM/IEEE Supercomputing Conference (Dallas, TX, 2018-11-12 - 2018-11-12)
In: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) 2018
DOI: 10.1007/978-3-319-92040-5_2
URL: https://ieeexplore.ieee.org/document/8641578
CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance
In: IEEE Transactions on Parallel and Distributed Systems (2018)
ISSN: 1045-9219
DOI: 10.1109/TPDS.2018.2866794
URL: https://ieeexplore.ieee.org/document/8444763
Lattice Boltzmann benchmark kernels as a testbed for performance analysis
In: Computers & Fluids 172 (2018), p. 582-592
ISSN: 0045-7930
DOI: 10.1016/j.compfluid.2018.03.030
Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model
30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (Lyon, 2018-09-24 - 2018-09-27)
In: 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2018), New York: 2018
DOI: 10.1109/SBAC-PAD.2018.00047
2017
Preconditioned Krylov solvers on GPUs
In: Parallel Computing 68 (2017), p. 32-44
ISSN: 0167-8191
DOI: 10.1016/j.parco.2017.05.006
Improved coefficients for polynomial filtering in ESSEX
1st International Workshop on Eigenvalue Problems: Algorithms, Software and Applications in Petascale Computing, EPASA 2015
DOI: 10.1007/978-3-319-62426-6_5
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
10th International Workshop on Parallel Tools for High Performance Computing (Stuttgart, Germany, 2016-10-04 - 2016-10-05)
In: Niethammer C, Gracia J, Hilbrich T, Knüpfer A, Resch MM, Nagel WE (ed.): Tools for High Performance Computing 2016, Cham: 2017
Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors
In: Concurrency and Computation-Practice & Experience 29 (2017)
ISSN: 1532-0626
DOI: 10.1002/cpe.3921
An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors
32nd International Conference on High Performance Computing: ISC High Performance 2017 (Frankfurt)
In: High Performance Computing. ISC 2017. Lecture Notes in Computer Science, vol 10266, Cham: 2017
DOI: 10.1007/978-3-319-58667-0_16
LIKWID monitoring stack: A flexible framework enabling job specific performance monitoring for the masses
2017 IEEE International Conference on Cluster Computing, CLUSTER 2017
DOI: 10.1109/CLUSTER.2017.115
2016
Efficiency of general Krylov methods on GPUs - An experimental study
30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
DOI: 10.1109/IPDPSW.2016.45
Hybrid Parallel Multigrid Methods for Geodynamical Simulations
In: Bungartz H., Neumann P., Nagel E. (ed.): Software for Exascale Computing - SPPEXA 2013-2015, Berlin, Heidelberg, New York: Springer, 2016, p. 211-235 (Lecture Notes in Computational Science and Engineering, Vol.113)
ISBN: 978-3-319-40526-1
DOI: 10.1007/978-3-319-40528-5_10
URL: http://link.springer.com/chapter/10.1007%2F978-3-319-40528-5_10
Exploring performance and power properties of modern multi-core chips via simple machine models
In: Concurrency and Computation-Practice & Experience 28 (2016), p. 189-210
ISSN: 1532-0626
DOI: 10.1002/cpe.3180
Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks
29th International Conference on Architecture of Computing Systems (Nuremberg)
In: Architecture of Computing Systems -- ARCS 2016: 29th International Conference, Nuremberg, Germany, April 4-7, 2016, Proceedings, Cham: 2016
DOI: 10.1007/978-3-319-30695-7_16
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems
In: International Journal of Parallel Programming (2016), p. 1-27
ISSN: 0885-7458
DOI: 10.1007/s10766-016-0464-z
Towards an exascale enabled sparse solver repository
Springer Verlag, 2016
ISBN: 9783319405261
DOI: 10.1007/978-3-319-40528-5_13
High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations
In: Journal of Computational Physics 325 (2016), p. 226-243
ISSN: 0021-9991
DOI: 10.1016/j.jcp.2016.08.027
2015
Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft
SC15 - The International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, TX, USA, 2015-11-15)
In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, New York, NY, USA: 2015
DOI: 10.1145/2832087.2832092
URL: http://dl.acm.org/citation.cfm?id=2832087&preflayout=flat
Performance analysis of the Kahan-enhanced scalar product on current multicore processors
11th International Conference on Parallel Processing and Applied Mathematics (Krakow, Poland)
In: Accepted for PPAM 2015, 2015
URL: http://arxiv.org/abs/1505.02586
Hybrid MPI/OpenMP Parallelization in FETI-DP Methods
In: Recent Trends in Computational Engineering - CE2014, Springer, 2015, p. 67-84 (Lecture Notes in Computational Science and Engineering, Vol.105)
ISBN: 978-3-319-22997-3
DOI: 10.1007/978-3-319-22997-3_4
URL: http://link.springer.com/chapter/10.1007%2F978-3-319-22997-3_4
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems
2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (Hyderabad, India, 2015-05-25 - 2015-05-29)
In: IEEE (ed.): Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2015
DOI: 10.1109/IPDPS.2015.76
URL: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7161530
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
In: SIAM Journal on Scientific Computing 37 (2015), p. C439-C464
ISSN: 1064-8275
DOI: 10.1137/140991133
Increasing the performance of the Jacobi-Davidson method by blocking
In: SIAM Journal on Scientific Computing (2015), p. 1-27
ISSN: 1064-8275
DOI: 10.1137/140976017
URL: http://elib.dlr.de/98373/
Building a Fault Tolerant Application Using the GASPI Communication Layer
1st International Workshop on Fault-Tolerant Systems (Chicago, IL, 2015-09-08 - 2015-09-11)
In: Proceedings of FTS 2015, in conjunction with IEEE Cluster 2015: 2015
DOI: 10.1109/CLUSTER.2015.106
Overhead Analysis of Performance Counter Measurements
43rd International Conference on Parallel Processing Workshops, ICPPW 2014
DOI: 10.1109/ICPPW.2014.34
Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations
In: Concurrency and Computation-Practice & Experience (2015), p. 1-5
ISSN: 1532-0626
DOI: 10.1002/cpe.3489
URL: http://onlinelibrary.wiley.com/doi/10.1002/cpe.3489/full
2014
Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model
DOI: 10.1145/2751205.2751240
URL: http://arxiv.org/abs/1410.5010
Alvermann Andreas, Basermann Achim, Fehske Holger, Galgon Martin, Hager Georg, Kreutzer Moritz, Krämer Lukas, Lang Bruno, Pieper Andreas, Röhrig-Zöllner Melven, Shahzad Faisal, Thies Jonas, Wellein Gerhard:
ESSEX: Equipping Sparse Solvers for Exascale
In: Euro-Par 2014: Parallel Processing Workshops, Springer, 2014, p. 577-588 (Lecture Notes in Computer Science, Vol.8806)
ISBN: 9783319143125
URL: http://link.springer.com/chapter/10.1007/978-3-319-14313-2_49
Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips
2014 1st ACM SIGPLAN Workshop on Programming Models for SIMD/Vector Processing, WPMVP 2014 - Co-located with PPoPP 2014 (Orlando, USA, 2014-02-16 - 2014-02-16)
In: Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, New York, NY, USA: 2014
DOI: 10.1145/2568058.2568068
URL: http://dl.acm.org/citation.cfm?doid=2568058.2568068
Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator
In: ARCS Workshops '14, 2014
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6775080&isnumber=6775071
A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units
In: SIAM Journal on Scientific Computing 36 (2014), p. C401-C423
ISSN: 1064-8275
DOI: 10.1137/130930352
URL: http://epubs.siam.org/doi/abs/10.1137/130930352
Modeling and analyzing performance for highly optimized propagation steps of the lattice Boltzmann method on sparse lattices
(2014)
URL: https://arxiv.org/abs/1410.0412
(Techreport)
2013
Pushing the limits for medical image reconstruction on recent standard multicore processors
In: International Journal of High Performance Computing Applications 27 (2013), p. 162-177
ISSN: 1094-3420
DOI: 10.1177/1094342012442424
Effects of disorder and contacts on transport through graphene nanoribbons
In: Physical Review B 88 (2013), p. 195409
ISSN: 1098-0121
DOI: 10.1103/PhysRevB.88.195409
A survey of checkpoint/restart techniques on distributed memory systems
In: Parallel Processing Letters 23 (2013), p. 1340011-1340030
ISSN: 0129-6264
DOI: 10.1142/S0129626413400112
URL: http://www.worldscientific.com/doi/abs/10.1142/S0129626413400112
PGAS implementation of SpMVM and LBM with GPI
The 7th International Conference on PGAS Programming Models (Edinburgh, Scotland, UK)
In: Proceedings of the 7th International Conference on PGAS Programming Models, Edinburgh: 2013
An Evaluation of Different I/O Techniques for Checkpoint/Restart
2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (Boston, MA, USA, 2013-05-20 - 2013-05-24)
In: 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW): 2013
DOI: 10.1109/IPDPSW.2013.145
MPC and CoArray Fortran: Alternatives to classic MPI implementations on the examples of scalable lattice Boltzmann flow solvers
15th Results and Review Workshop on High Performance Computing in Science and Engineering, HLRS 2012 (Stuttgart)
DOI: 10.1007/978-3-642-33374-3_27
Comparison of Different Propagation Steps for Lattice Boltzmann Methods
In: Computers & Mathematics with Applications 65 (2013), p. 924-935
ISSN: 0898-1221
DOI: 10.1016/j.camwa.2012.05.002
URL: http://www.sciencedirect.com/science/article/pii/S0898122112003835
2012
Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering
5th Workshop on Productivity and Performance (PROPER 2012) (Rhodes Island, Greece)
In: Euro-Par 2012: 2012
URL: http://arxiv.org/abs/1206.3738
Performance Engineering for the Lattice Boltzmann Method on GPGPUs: Architectural Requirements and Performance Results
In: Computers & Fluids (2012), p. 10
ISSN: 0045-7930
DOI: 10.1016/j.compfluid.2012.02.013
URL: http://www.sciencedirect.com/science/article/pii/S0045793012000679
Evaluation of the Coarray Fortran Programming Model on the Example of a Lattice Boltzmann Code
The 6th Conference on Partitioned Global Address Space Programming Models (Santa Barbara, CA, USA)
In: PGAS12: 2012
Asynchronous Checkpointing by Dedicated Checkpoint Threads
In: Recent Advances in the Message Passing Interface, Springer-Verlag, 2012, p. 289-290 (Lecture Notes in Computer Science, Vol.7490)
ISBN: 978-3-642-33517-4
DOI: 10.1007/978-3-642-33518-1_36
URL: http://link.springer.com/chapter/10.1007/978-3-642-33518-1_36
Domain Decomposition and Locality Optimization for Large-Scale Lattice Boltzmann Simulations
In: Computers & Fluids (2012)
ISSN: 0045-7930
DOI: 10.1016/j.compfluid.2012.02.007
URL: http://www.sciencedirect.com/science/article/pii/S0045793012000527
2011
Simulation software for supercomputers
In: Journal of Computational Science 2 (2011), p. 93-94
ISSN: 1877-7503
DOI: 10.1016/j.jocs.2011.05.003
URL: http://www.sciencedirect.com/science/article/pii/S1877750311000342
Efficient multicore-aware parallelization strategies for iterative stencil computations
In: Journal of Computational Science 2 (2011), p. 130-137
ISSN: 1877-7503
DOI: 10.1016/j.jocs.2011.01.010
URL: http://www.sciencedirect.com/science/article/pii/S1877750311000172
A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters
In: Parallel Computing 37 (2011), p. 536-549
ISSN: 0167-8191
DOI: 10.1016/j.parco.2011.03.005
URL: http://www.sciencedirect.com/science/article/pii/S0167819111000342
Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA
PARENG 2009
In: Advances in Engineering Software, 2011
DOI: 10.1016/j.advengsoft.2010.10.007
URL: http://www.sciencedirect.com/science/article/pii/S0965997810001274
Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems
In: Parallel Processing Letters 21 (2011), p. 339-358
ISSN: 0129-6264
DOI: 10.1142/S0129626411000254
Parallel sparse matrix-vector multiplication as a test case for hybrid MPI OpenMP programming
25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011 (Anchorage, AK)
DOI: 10.1109/IPDPS.2011.332
2010
Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures
Transactions of the Fourth Joint HLRB and KONWIHR Review and Results Workshop (Leibniz Supercomputing Centre, Garching/Munich, Germany)
In: High Performance Computing in Science and Engineering, Garching/Munich 2009, Berlin Heidelberg: 2010
DOI: 10.1007/978-3-642-13872-0_1
URL: http://www.springerlink.com/content/m1288m0174021600/
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments
39th International Conference on Parallel Processing Workshops (San Diego, CA, USA, 2010-09-13 - 2010-09-16)
In: Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, IEEE: 2010
DOI: 10.1109/ICPPW.2010.38
URL: http://arxiv.org/abs/1004.4431
LIKWID performance tools
URL: http://inside.hlrs.de/pdfs/inSiDE_spring2010.pdf
A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters
(2010), p. 18
URL: https://www10.cs.fau.de/publications/reports/TechRep_2010-07.pdf
(Techreport)
Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters
In: Parallel Processing Letters 20 (2010), p. 359-376
ISSN: 0129-6264
DOI: 10.1142/S0129626410000296
URL: http://arxiv.org/abs/1006.3148
2009
Speeding up a Lattice Boltzmann Kernel on nVIDIA GPUs
PARENG2009 (Pécs, Hungary, 2009-04-06 - 2009-04-08)
In: Proceedings of the First International Conference on Parallel, Distributed and Grid Computing for Engineering, Kippen, Stirlingshire, United Kingdom: 2009
RZBENCH: performance evaluation of current HPC architectures using low-level and application benchmarks
In: High Performance Computing in Science and Engineering, Garching/Munich 2007: Transactions of the Third Joint HLRB and KONWIHR Status and Result Workshop, Dec. 3-4, 2007, Leibniz Supercomputing Centre, Garching/Munich, Germany, Berlin, Heidelberg: Springer, 2009, p. 485-501 (Mathematics and Statistics, Vol.V)
ISBN: 978-3-540-69181-5
DOI: 10.1007/978-3-540-69182-2_39
Challenges and Potentials of Emerging Multicore Architectures
Third Joint HLRB and KONWIHR Status and Result Workshop (Garching, 2007-12-03 - 2007-12-04)
In: High Performance Computing in Science and Engineering Garching-Munich 2007, Berlin Heidelberg: 2009
URL: http://www.springer.com/math/cse/book/978-3-540-69181-5
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
COMPSAC 2009 (Seattle, USA, 2009-07-20 - 2009-07-24)
In: Proceedings of 2009 33rd Annual IEEE International Computer Software and Applications Conference, IEEE Computer Society: 2009
DOI: 10.1109/COMPSAC.2009.82
Benchmark analysis and application results for lattice Boltzmann simulations on NEC SX vector and Intel Nehalem systems
In: Parallel Processing Letters 19 (2009), p. 491-511
ISSN: 0129-6264
DOI: 10.1142/S0129626409000389
URL: http://www.worldscinet.com/ppl/19/1904/S0129626409000389.html
The world's fastest CPU and SMP node: Some performance results from the NEC SX-9
23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS) (Roma, 2009-05-23 - 2009-05-29)
In: Proceedings of the IEEE International Symposium on Parallel&Distributed Processing 2009, IEEE Computer Society: 2009
DOI: 10.1109/IPDPS.2009.5161089
Vector Computers in a World of Commodity Clusters, Massively Parallel Systems and Many-Core Many-Threaded CPUs: Recent Experience Based on an Advanced Lattice Boltzmann Flow Solver
In: High Performance Computing in Science and Engineering '08: Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2008, Berlin Heidelberg: Springer, 2009, p. 333-347 (Mathematics and Statistics, Vol.5)
ISBN: 978-3-540-88301-2
DOI: 10.1007/978-3-540-88303-6_24
Selecting an Appropriate Computational Platform for Supporting the Development of New Catalyst Carriers
In: Innovatives Supercomputing in Deutschland : inSiDE 7 Spring (2009), p. 12-16
URL: http://inside.hlrs.de/htm/Edition_01_09/article_05.html
2008
Direct numerical simulation of turbulent flow over dimples - Code optimization for NEC SX-8 plus flow results
In: High Performance Computing in Science and Engineering '07: Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2007, Berlin/Heidelberg: Springer, 2008, p. 303-318
ISBN: 9783540747383
DOI: 10.1007/978-3-540-74739-0_21
URL: http://link.springer.com/chapter/10.1007%2F978-3-540-74739-0_21
Performance comparison of different parallel lattice Boltzmann implementations on multi-core multi-socket systems
In: International Journal of Computational Science and Engineering 4 (2008), p. 3-11
ISSN: 1742-7185
DOI: 10.1504/IJCSE.2008.021107
URL: https://www10.informatik.uni-erlangen.de/Publications/Papers/2008/Donath_IJCSE_4_1.pdf
Data access characteristics and optimizations for Sun UltraSPARC T2 and T2+ systems
In: Parallel Processing Letters 18 (2008), p. 471-490
ISSN: 0129-6264
DOI: 10.1142/S0129626408003521
Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers
IEEE International Symposium on Parallel and Distributed Processing, 2008. IPDPS 2008 (Miami, FL, USA, 2008-04-14 - 2008-04-18)
In: Proceedings of the 2008 IEEE International Parallel & Distributed Processing Symposium, 2008
DOI: 10.1109/IPDPS.2008.4536341
What's next? Evaluating Performance and Programming Approaches for Emerging Computer Technologies
(2008), p. 42-45
URL: http://www.rrze.uni-erlangen.de/wir-ueber-uns/publikationen/HPC-2008-Screenversion.pdf
(Techreport)
Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method
In: Progress in Computational Fluid Dynamics 8 (2008), p. 179-188
ISSN: 1468-4349
DOI: 10.1504/PCFD.2008.018088
2007
Hierarchical hybrid grids: achieving TERAFLOP performance on large scale finite element simulations
In: International Journal of Parallel, Emergent and Distributed Systems 22 (2007), p. 311-329
ISSN: 1744-5760
DOI: 10.1080/17445760701442218
2006
Optimizing performance on modern HPC systems: learning from simple kernel benchmarks
The 2nd Russian-German Advanced Research Workshop (Stuttgart, Germany)
In: Computational Science and High Performance Computing II, Berlin Heidelberg: 2006
DOI: 10.1007/3-540-31768-6_23
URL: http://www.springerlink.com/content/8401n54088177483/
Optimization of Cache Oblivious Lattice Boltzmann Method in 2D and 3D
ASIM 2006 - 19. Symposium Simulationstechnik (Hannover)
In: Simulationstechnique - 19th Symposium in Hannover, September 2006, Erlangen: 2006
URL: https://www10.informatik.uni-erlangen.de/Publications/Papers/2006/Nitsure_ASIM06.pdf
Have the Vectors the Continuing Ability to Parry the Attack of the Killer Micros?
In: High Performance Computing on Vector Systems: Proceedings of the High Performance Computing Center Stuttgart, March 2005, Berlin Heidelberg: Springer, 2006, p. 25-37 (Mathematics and Statistics, Vol.1)
ISBN: 978-3-540-29124-4
DOI: 10.1007/3-540-35074-8_2
Towards optimal performance for lattice Boltzmann applications on terascale computers
In: Parallel Computational Fluid Dynamics: Theory and Applications, Proceedings of the 2005 International Conference on Parallel Computational Fluid Dynamics: 2006
DOI: 10.1016/B978-044452206-1/50005-7
On the single processor performance of simple lattice Boltzmann kernels
In: Computers & Fluids 35 (2006), p. 910-919
ISSN: 0045-7930
DOI: 10.1016/j.compfluid.2005.02.008
2005
Performance of Scientific Applications on Modern Supercomputers
High Performance Computing in Science and Engineering (München)
In: High Performance Computing in Science and Engineering, Munich 2004: Transactions of the Second Joint HLRB and KONWIHR Status and Result Workshop, March 2-3, 2004, Technical University of Munich, and Leibniz-Rechenzentrum Munich, Germany, Berlin Heidelberg: 2005
URL: http://link.springer.com/chapter/10.1007/3-540-26657-7_1#page-1
Optimizing performance of the lattice Boltzmann method for complex structures on cache-based architectures
In: Frontiers in Simulation: Simulationstechnique, 18th Symposium in Erlangen, September 2005 (ASIM), Erlangen: 2005
URL: http://www.rrze.uni-erlangen.de/dienste/arbeiten-rechnen/hpc/Projekte/Donath_ASIM05.pdf
Taming the Bandwidth Behemoth. First Experiences on a Large SGI Altix System
In: Innovatives Supercomputing in Deutschland : inSiDE 3 (2005), p. 24
2004
Performance Evaluation of Parallel Large-Scale Lattice Boltzmann Applications on Three Supercomputing Architectures.
(2004)
URL: https://www10.cs.fau.de/publications/reports/TechRep_2004-02.pdf
(Techreport)
Performance Evaluation of Parallel Large-Scale Lattice Boltzmann Applications on Three Supercomputing Architectures
Supercomputing Conference '04 (Pittsburgh, PA, USA, 2004-11-06 - 2004-11-12)
In: Friedrich-Alexander-Universität Erlangen-Nürnberg (ed.): Supercomputing, 2004: Proceedings of the ACM/IEEE SC2004 Conference 2004
DOI: 10.1109/SC.2004.37
URL: http://dl.acm.org/ft_gateway.cfm?id=1049965&ftid=310446&dwn=1&CFID=475065051&CFTOKEN=96852916
Is there still a need for tailored HPC systems or can we go with commodity off-the-shelf clusters - Some comments based on performance measurements using a lattice Boltzmann flow solver
In: Innovatives Supercomputing in Deutschland : inSiDE 2 (2004), p. 10-15
2003
Pseudo-Vectorization and RISC Optimization Techniques for the Hitachi SR8000 Architecture
High Performance Computing in Science and Engineering (München, 2002-10-10 - 2002-10-11)
In: High Performance Computing in Science and Engineering, Munich 2002: Transactions of the First Joint HLRB and KONWIHR Status and Result Workshop, October 10-11, Technical University of Munich, Germany, New York: 2003