Research

Our activities are in the following research fields:

    • Performance Engineering
    • Performance Modeling
    • Performance Tools
    • Hardware-efficient building blocks for sparse linear algebra and stencil solvers
    • HPC/HPDA Research Software Engineering

Performance Engineering

Performance Engineering (PE) is a structured, model-based process for the optimization and parallelization of basic operations, algorithms, and application codes on modern compute architectures. The process is divided into analysis, modeling, and optimization phases, which are iterated for each homogeneous code section until optimal or satisfactory performance is achieved. The analysis starts with a hypothesis about which aspect of the architecture (the bottleneck) limits the execution speed of the software. Typical bottlenecks can be identified qualitatively with application-independent performance patterns; a concrete performance pattern is described by a set of observable runtime characteristics. Using suitable performance models, the interaction of the application with the given hardware architecture is then described analytically and quantitatively.

The model thus indicates the maximum expected performance and the potential runtime improvements attainable through appropriate modifications. If the model predictions cannot be validated by measurements, the underlying model assumptions are revisited and refined or adjusted as necessary. Based on the model, optimizations can be planned and their performance gain assessed a priori. The PE approach is not limited to standard microprocessor architectures and can also be used for projections to future computer architectures. The main focus of the group is on the compute node, where analytic performance models such as the Roofline model or the Execution-Cache-Memory (ECM) model are used.
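
As a concrete illustration of the modeling step, consider the Roofline model applied to a STREAM-triad-like loop a[i] = b[i] + s * c[i]. The following C sketch works through the arithmetic; the peak performance and memory bandwidth are assumed, illustrative numbers, not measurements of any particular system:

    #include <stdio.h>

    /* Roofline sketch for the triad kernel a[i] = b[i] + s * c[i] (doubles).
     * Per iteration: 2 flops; traffic: load b and c (16 B), store a (8 B),
     * plus 8 B write-allocate for a  =>  32 B per iteration.               */
    int main(void) {
        double flops_per_it = 2.0;
        double bytes_per_it = 32.0;
        double intensity    = flops_per_it / bytes_per_it; /* 0.0625 flop/B */

        double peak_gflops  = 48.0;   /* assumed core peak performance */
        double mem_bw_gbs   = 100.0;  /* assumed memory bandwidth      */

        /* Roofline: the slower of the two resources limits performance. */
        double p_mem  = intensity * mem_bw_gbs;  /* 6.25 Gflop/s */
        double p_roof = (p_mem < peak_gflops) ? p_mem : peak_gflops;

        printf("intensity = %.4f flop/B, prediction = %.2f Gflop/s\n",
               intensity, p_roof);
        return 0;
    }

With these numbers the kernel is strongly memory bound, so the model tells us up front that only optimizations reducing the memory traffic (e.g., avoiding the write-allocate transfer with non-temporal stores) can raise its performance.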

Performance Models

Performance models describe the interaction between application and hardware and thus form the basis for a profound understanding of the runtime behavior of an application. The group pursues an analytic approach whose essential components are application models and machine models. These components are created independently at first, but their combination and interaction finally provide insights into the bottlenecks and the expected performance. Especially the creation of accurate machine models requires a thorough microarchitecture analysis. The Execution-Cache-Memory (ECM) model, which was developed by the group, allows predictions of single-core performance as well as of scaling within a multi-core processor or compute node. In combination with analytic models of electrical power consumption, it can also be used to derive estimates of the energy consumption of an application. The ECM model is a generalization of the well-known Roofline model.
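
A minimal sketch of how an ECM prediction is assembled, assuming the common variant of the model for processors whose inter-cache data transfers do not overlap with each other (all cycle counts are hypothetical inputs chosen for illustration):

    #include <stdio.h>

    /* ECM sketch: for one unit of work (e.g., one cache line of loop
     * iterations), the single-core runtime estimate is
     *   T_ECM = max(T_OL, T_nOL + T_L1L2 + T_L2L3 + T_L3Mem),
     * and performance scales with the number of cores n until the memory
     * interface saturates at about n_s = ceil(T_ECM / T_L3Mem) cores.   */
    static double max2(double a, double b) { return a > b ? a : b; }

    int main(void) {
        double t_ol    = 4.0; /* in-core cycles overlapping with transfers  */
        double t_nol   = 3.0; /* non-overlapping in-core (load/store) cycles */
        double t_l1l2  = 3.0; /* L1<->L2 transfer cycles     */
        double t_l2l3  = 4.0; /* L2<->L3 transfer cycles     */
        double t_l3mem = 8.0; /* L3<->memory transfer cycles */

        double t_ecm = max2(t_ol, t_nol + t_l1l2 + t_l2l3 + t_l3mem);
        int    n_s   = (int)((t_ecm + t_l3mem - 1.0) / t_l3mem); /* ceil */

        printf("T_ECM = %.1f cy/unit, saturates at about %d cores\n",
               t_ecm, n_s);
        return 0;
    }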

Performance Tools

The group develops, validates, and maintains simple open-source tools that support performance analysis, the creation of performance models, and the performance engineering process at the compute-node level.

The well-known tool collection LIKWID (http://tiny.cc/LIKWID) comprises various tools for the controlled execution of applications on modern compute nodes with complex topologies and adaptive runtime parameters. By measuring suitable hardware metrics, LIKWID enables a detailed analysis of how application programs use the hardware and is thus pivotal for the validation of performance models and the identification of performance patterns. Supporting derived metrics, such as the attained main memory bandwidth, requires continuous adaptation and validation of the tool suite for new computer architectures.
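
A typical LIKWID workflow restricts counter measurements to a code region of interest via the marker API. A minimal C sketch (the region name is arbitrary, "MEM" is one of the predefined event groups, and the header name may differ between LIKWID versions):

    /* Compile with -DLIKWID_PERFMON and link with -llikwid, then run under
     * likwid-perfctr with marker support enabled, e.g.:
     *   likwid-perfctr -C 0-3 -g MEM -m ./a.out                          */
    #include <likwid-marker.h>  /* likwid.h in older LIKWID versions */

    int main(void) {
        LIKWID_MARKER_INIT;

        LIKWID_MARKER_START("triad");  /* user-chosen region name */
        /* ... kernel to be measured ... */
        LIKWID_MARKER_STOP("triad");

        LIKWID_MARKER_CLOSE;
        return 0;
    }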

The automatic generation of Roofline and ECM models for simple loop kernels is the purpose of the Kerncraft tool (http://tiny.cc/kerncraft). An important component of Kerncraft is the OSACA tool (Open Source Architecture Code Analyzer), which performs the single-core analysis and runtime prediction for existing assembly code (http://tiny.cc/OSACA). For all the tools mentioned above, we aim to support as many relevant hardware architectures as possible (Intel/AMD x86, ARM-based processors, IBM Power, NVIDIA GPUs).

Based on LIKWID and the group's experience in performance analysis, we are also pushing forward work on job-specific performance monitoring. The goal is to develop web-based administrative tools such as ClusterCockpit (http://tiny.cc/ClusterCockpit), which make it much easier for users and administrators to identify bottlenecks in cluster jobs. ClusterCockpit is currently being tested at RRZE and other centers.

Hardware-efficient building blocks for sparse linear algebra and stencil solvers

The solution of large, sparse systems of equations and eigenvalue problems is typically done by iterative methods. This research area deals with the efficient implementation, optimization, and parallelization of the most important building blocks of such iterative solvers. The focus is on the multiplication of a large sparse matrix with one or more vectors (SpMV). Both matrix-free representations of regular matrices, such as those occurring in the discretization of partial differential equations ("stencils"), and the generic case of a general SpMV with an explicitly stored matrix are considered. Our work on the development and implementation of optimized building blocks for SpMV-based solvers includes hardware-efficient algorithms, data access optimizations (spatial and temporal blocking), and efficient, portable data structures. Our structured performance engineering process is employed throughout.
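
For reference, the central kernel of this research area, SpMV with the matrix held in the widespread CRS (compressed row storage) format, can be sketched in a few lines of C (array names are illustrative):

    /* y = A * x for a sparse matrix A in CRS format.
     * val:     the nonzero values, stored row by row
     * col_idx: the column index of each nonzero
     * row_ptr: start of each row in val/col_idx (length n_rows + 1) */
    void spmv_crs(int n_rows, const int *row_ptr, const int *col_idx,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n_rows; ++i) {
            double sum = 0.0;
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
                sum += val[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }

The indirect, potentially irregular access to x is what makes data structures and blocking strategies decisive for SpMV performance on modern architectures.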

HPC/HPDA Research Software Engineering

Increasing computing power and growing amounts of data enable us to significantly improve the mathematical models for various applications. We investigate the fusion of classical model-driven and data-driven approaches and their implementation in modern, open-source research software. Our focus lies on code generation technology for numerical solvers and machine learning methods on (block-)structured data.
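
As an illustration of the kind of kernel on (block-)structured data that such code generation targets, consider one Jacobi sweep of a 2D five-point stencil (a hand-written C sketch for illustration, not the output of any particular generator):

    /* One Jacobi sweep of the 2D five-point stencil on an n x n grid
     * (row-major layout; boundary values are held fixed). */
    void jacobi_sweep(int n, const double *u, double *u_new)
    {
        for (int i = 1; i < n - 1; ++i)
            for (int j = 1; j < n - 1; ++j)
                u_new[i * n + j] = 0.25 * (u[(i - 1) * n + j] +
                                           u[(i + 1) * n + j] +
                                           u[i * n + j - 1]   +
                                           u[i * n + j + 1]);
    }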

Completed Projects

    • Further Development of High-Performance Computing

      (Third Party Funds Single)

      Term: 2022-01-01 - 2022-12-31
      Funding source: other funding organization

    • Energy Oriented Center of Excellence: toward exascale for energy (Performance evaluation, modelling and optimization)

      (Third Party Funds Group – Sub project)

      Overall project: Energy Oriented Center of Excellence: toward exascale for energy
      Term: 2019-01-01 - 2021-12-31
      Funding source: European Union (EU)

    • Metaprogramming for Accelerator Architectures (Metacca)

      (Third Party Funds Single)

      Term: 2017-01-01 - 2019-12-31
      Funding source: Bundesministerium für Bildung und Forschung (BMBF)

      In Metacca, the AnyDSL framework is being extended into a homogeneous programming environment for heterogeneous single- and multi-node systems. UdS (Saarland University) will extend the AnyDSL compiler and type system to enable productive accelerator programming. Building on this, LSS will develop suitable abstractions for distribution and synchronization on single- and multi-node machines in the form of a DSL embedded in AnyDSL. All components are supported by performance models (RRZE). A runtime environment with built-in performance profiling takes care of resource management and system configuration. The resulting framework will be evaluated with two applications: ray tracing (DFKI) and bioinformatics (JGU). Target platforms are single nodes and clusters with multiple accelerators (CPUs, GPUs, Xeon Phi).

      The University of Erlangen-Nürnberg is mainly responsible for the support of distributed programming (LSS) as well as for the development and implementation of supporting performance models and an integrated profiling component (RRZE). In both areas, a requirements analysis will be carried out at the beginning in order to plan the further steps and coordinate them with the partners. In the first year, LSS will implement the distribution of the data structures. The work will then concentrate on the implementation of synchronization mechanisms. In the final year, code transformations will be designed to adapt the concepts for distribution and synchronization in AnyDSL to the chosen applications. As a first step, RRZE will integrate the kerncraft framework into the partial evaluation; in this context, kerncraft will be extended to support current accelerator architectures as well as models for distributed-memory parallelization. In two further work packages, a resource management and a LIKWID-based profiling component will be implemented.

    • Process-Oriented Performance Engineering Service Infrastructure for Scientific Software at German HPC Centers

      (Third Party Funds Single)

      Term: 2017-01-01 - 2019-12-31
      Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
      URL: https://blogs.fau.de/prope/

      The ProPE project will deploy a prototype HPC user support
      infrastructure as a distributed cross-site collaborative effort of several
      tier-2/3 centers with complementary HPC expertise. Within ProPE,
      code optimization and parallelization of scientific software is seen
      as a structured, well-defined process with sustainable outcome. The
      central component of ProPE is the improvement, process-based
      implementation, and dissemination of a structured performance
      engineering (PE) process. This PE process defines and drives code
      optimization and parallelization as a target-oriented, structured
      process. Application hot spots are identified first and then
      optimized/parallelized in an iterative cycle: Starting with an analysis of
      the algorithm, the code, and the target hardware a hypothesis of the
      performance-limiting factors is proposed based on performance
      patterns and models. Performance measurements validate or guide
      the iterative adaptation of the hypothesis. After validation of the
      hardware bottleneck, appropriate code changes are deployed and the
      PE cycle restarts. The level of detail of the PE process can be
      adapted to the complexity of the underlying problem and the
      experience of the HPC analyst. Currently this process is applied by
      experts and at the prototype level. ProPE will formalize and document
      the PE process and apply it to various scenarios (single core/node
      optimization, distributed parallelization, I/O-intensive problems).
      Different abstraction levels of the PE process will be implemented and
      disseminated to HPC analysts and application developers via user
      support projects, teaching activities, and web documentation. The
      integration of the PE process into modern IT infrastructure across
      several centers with different HPC support expertise will be the
      second project focus. All components of the PE process will be
      coordinated and standardized across the partnering sites. This way
      the complete HPC expertise within ProPE can be offered as a coherent
      service on a nationwide scale. Ongoing support projects can be
      transferred easily between participating centers. In order to identify
      low-performing applications, characterize application loads, and
      quantify benefits of the PE activities at a system level, ProPE will
      employ a system monitoring infrastructure for HPC clusters. This tool
      will be tailored to the requirements of the PE process and designed
      for easy deployment and usage at tier-2/3 centers. The associated
      ProPE partners will ensure the embedding into the German HPC
      infrastructure and provide basic PE expertise in terms of algorithmic
      choices, perfectly complementing the code optimization and
      parallelization efforts of ProPE.

    • Self-Adaptation for Time-Step-Based Simulation Techniques on Heterogeneous HPC Systems (SeASiTe)

      (Third Party Funds Single)

      Term: 2017-03-01 - 2020-02-29
      Funding source: Bundesministerium für Bildung und Forschung (BMBF)

      The research project SeASiTe takes on the task of systematically investigating self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide the prototype of a toolbox with which programmers can equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation with respect to relevant system and program parameters as well as possible program transformations. The optimization of program execution for several non-functional objectives (e.g., runtime or energy consumption) will build on performance modeling to narrow down the search space of efficient program variants. Application-independent methods and strategies for self-adaptation will be encapsulated in an autotuning navigator.

      The Erlangen subproject first addresses the model-based understanding of autotuning techniques for regular simulation algorithms, using several common stencil classes as examples. With the help of extended performance models, structured guidelines and recommendations for the autotuning process regarding relevant code transformations and the restriction of the search space of optimization parameters will be derived and prepared exemplarily for the autotuning navigator. The second focus of the work is the extension of existing analytic performance models and software tools to new computer architectures and their integration into the autotuning navigator. In addition, the Erlangen group maintains the demonstrator for stencil codes and contributes to the design of the autotuning navigator and the definition of its interfaces.

    • SPP EXA 1648

      (Third Party Funds Group – Sub project)

      Overall project: SPP EXA 1648
      Term: 2016-01-01 - 2019-12-31
      Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    • EXASTEEL II - Bridging Scales for Multiphase Steels

      (Third Party Funds Group – Sub project)

      Overall project: SPP 1648: Software for Exascale Computing
      Term: 2016-01-01 - 2018-12-31
      Funding source: DFG / Schwerpunktprogramm (SPP)
      URL: http://www.numerik.uni-koeln.de/14079.html

      In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.

      There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous (dual-phase) steels, attempting to predict the macroscopic properties of new materials from those on the microscopic level. The goal is to develop algorithms and software towards a virtual laboratory for predictive material testing in silico. A bottleneck is the computational complexity of the multiscale models needed to describe the new materials, involving sufficiently accurate micromechanically motivated models on the crystalline scale. Therefore, new ultra-scalable nonlinear implicit solvers will be developed and combined with a highly parallel computational scale bridging approach (FE^2), intertwined with rigorous and continuous performance engineering, to bring the challenging engineering application of a virtual laboratory for material testing and design to extreme scale computing. We envisage an increasing transition from descriptive to predictive macroscopic simulations and take into account, to the best of our knowledge for the first time within a computational scale bridging approach, the polycrystalline nature of dual-phase steels, including grain boundary effects at the microscale.

      Our goals could not be reached without building on the algorithm and software infrastructure from EXASTEEL-1. We will complete the paradigm shift, begun in the EXASTEEL-1 project, from Newton-Krylov solvers to nonlinear methods (and their composition) with improved concurrency and reduced communication. By combining nonlinear domain decomposition with multigrid methods we plan to leverage the scalability of both implicit solver approaches for nonlinear methods.

      Although our application is specific, the algorithms and optimized software will have an impact well beyond the particular application. Nonlinear implicit solvers are at the heart of many simulation codes, and our software building blocks PETSc, BoomerAMG, PARDISO, and FEAP are all software packages with a large user base. The advancement of these software packages is explicitly planned for in the work packages of this project.

      The project thus addresses computational algorithms (nonlinear implicit solvers and scale bridging), application software, and programming (PE, hybrid programming, accelerators).

    • Equipping Sparse Solvers for Exascale II (ESSEX-II)

      (Third Party Funds Group – Sub project)

      Overall project: SPP 1648: Software for Exascale Computing
      Term: 2016-01-01 - 2018-12-31
      Funding source: DFG / Schwerpunktprogramm (SPP)
      URL: https://blogs.fau.de/essex/activities

      The ESSEX-II project will use the successful concepts and software
      blueprints developed in ESSEX-I for sparse eigenvalue solvers to
      produce widely usable and scalable software solutions with high
      hardware efficiency for the computer architectures of the upcoming
      decade. All activities are organized along the traditional software
      layers of low-level parallel building blocks (kernels), algorithm
      implementations, and applications. However, the classic abstraction
      boundaries separating these layers are broken in ESSEX-II by
      strongly integrating objectives: scalability, numerical reliability, fault
      tolerance, and holistic performance and power engineering. Driven by
      Moore's Law and power dissipation constraints, computer systems will
      become more parallel and heterogeneous even on the node level in
      upcoming years, further increasing overall system parallelism. MPI+X
      programming models can be adapted in flexible ways to the
      underlying hardware structure and are widely expected to be able to
      address the challenges of the massively multi-level parallel
      heterogeneous architectures of the next decade. Consequently, the
      parallel building blocks layer supports MPI+X, with X being a
      combination of node-level programming models able to fully exploit
      hardware heterogeneity, functional parallelism, and data parallelism.
      In addition, facilities for fully asynchronous checkpointing, silent data
      corruption detection and correction, performance assessment,
      performance model validation, and energy measurements will be
      provided. The algorithms layer will leverage the components in the
      building blocks layer to deliver fully heterogeneous, automatically
      fault-tolerant, and state-of-the-art implementations of Jacobi-Davidson
      eigensolvers, the Kernel Polynomial Method (KPM), and Chebyshev
      Time Propagation (ChebTP) that are ready to use for production on
      modern heterogeneous compute nodes with best performance and
      numerical accuracy. Chebyshev filter diagonalization (ChebFD) and a
      Krylov eigensolver complement these implementations, and the
      recent FEAST method will be investigated and further developed for
      improved scalability. The applications layer will deliver scalable
      solutions for conservative (Hermitian) and dissipative (non-Hermitian)
      quantum systems with strong links to optics and biology and to novel
      materials such as graphene and topological insulators. Extending its
      predecessor project, ESSEX-II adopts an additional focus on
      production-grade software. Although the selection of algorithms is
      strictly motivated by quantum physics application scenarios, the
      underlying research directions of algorithmic and hardware efficiency,
      accuracy, and resilience will radiate into many fields of computational
      science. Most importantly, all developments will be accompanied by
      an uncompromising performance engineering process that will
      rigorously expose any discrepancy between expected and observed
      resource efficiency.

    • ESSEX - Equipping Sparse Solvers for Exascale

      (Third Party Funds Group – Sub project)

      Overall project: SPP 1648: Software for Exascale Computing
      Term: 2012-11-01 - 2019-06-30
      Funding source: DFG / Schwerpunktprogramm (SPP)

      The ESSEX project investigates the computational issues arising for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers, which comprise building blocks, algorithms, and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault tolerance mechanisms are the driving forces behind all developments. Classic Krylov, Jacobi-Davidson, and recent FEAST methods will be enabled for exascale computing and equipped with advanced, scalable preconditioners. New implementations of domain-specific iterative schemes in physics and chemistry, namely the established Chebyshev expansion techniques for the computation of spectral properties and their novel extension to the time evolution of driven quantum systems, complement these algorithms. The software solutions of the ESSEX project will be combined into an Exascale Sparse Solver Repository (“ESSR”), where the specific demands of the quantum physics users are recognized by integration of quantum state encoding techniques at the fundamental level. The relevance of this project can then be demonstrated through application of the ESSR algorithms to graphene-based structures, topological insulators, and quantum Hall effect devices. Such studies require exascale resources together with modern numerical methods to determine many eigenstates at a given point of the spectrum of extremely large matrices or to compute an approximation to their full spectrum. The concepts, methods, and software building blocks developed in the ESSEX project serve as general blueprints for other scientific application areas that depend on sparse iterative algorithms. The strong vertical interaction between all three project layers ensures that the user can quickly utilize any progress on the lower layers and immediately use the power of exascale machines once they become available.


    • EXASTEEL - Bridging Scales for Multiphase Steels

      (Third Party Funds Group – Sub project)

      Overall project: SPP 1648: Software for Exascale Computing
      Term: 2012-11-01 - 2015-12-31
      Funding source: DFG / Schwerpunktprogramm (SPP)

      This project addresses algorithms and software for the simulation of three-dimensional multiscale material science problems on the future supercomputers developed for exascale computing. The performance of modern high-strength steels is governed by the complex interaction of the individual constituents on the microscale. Direct computational homogenization schemes such as the FE2 method allow for high-fidelity material design and analysis of modern steels. Using this approach, fluctuations of the local field equations (balance laws) can be resolved to a high accuracy, which is needed for the prediction of failure of such micro-heterogeneous materials. Performing the scale bridging within the FE2 method for realistic problems in 3D still requires new ultra-scalable, robust algorithms and solvers, which have to be developed and incorporated into a new application software. Such algorithms must be specifically designed to allow the efficient use of the future hardware. Here, the direct multiscale approach (FE2) will be combined with new, highly efficient, parallel solver algorithms. For the latter, a hybrid algorithmic approach will be taken, combining nonoverlapping parallel domain decomposition (FETI) methods with efficient parallel multigrid preconditioners. A comprehensive performance engineering approach will be implemented, guided by the PI Wellein, to ensure a systematic optimization and parallelization process across all software layers. This project builds on parallel simulation software developed for the solution of complex nonlinear structural mechanics problems by the PIs Schröder, Balzani and Klawonn, Rheinbach. It is based on the application software package FEAP (Finite Element Analysis Program, R. Taylor, UC Berkeley). Within a new software environment, FEAP has been combined with a FETI-DP domain decomposition solver, based on PETSc (Argonne National Laboratory) and hypre (Lawrence Livermore National Laboratory), e.g., to perform parallel simulations in nonlinear biomechanics. The optimization, performance modeling, and performance engineering will be guided by the PI Wellein. The PIs Schröder and Balzani have performed FE2 simulations in the past using an extended version of FEAP. The envisioned scale bridging for realistic, advanced engineering problems in three dimensions will require a computational power which will only be obtainable when exascale computing becomes available.

    • TERRA-NEO - Integrated Co-Design of an Exascale Earth Mantle Modeling Framework

      (Third Party Funds Group – Sub project)

      Overall project: SPP 1648: Software for Exascale Computing
      Term: 2012-11-01 - 2015-12-31
      Funding source: DFG / Schwerpunktprogramm (SPP)

      Much of what one refers to as geological activity of the Earth is due to the fact that heat is transported from the interior of our planet to the surface in a planet-wide solid-state convection in the Earth's mantle. For this reason, the study of the dynamics of the mantle is critical to our understanding of how the entire planet works. Processes from earthquakes, plate tectonics, and crustal evolution to the geodynamo are governed by convection in the mantle. Without a detailed knowledge of Earth's internal dynamic processes, we cannot hope to deduce the many interactions between shallow and deep Earth processes that dominate the Earth system. The vast forces associated with mantle convection cells drive horizontal movement of Earth's surface in the form of plate tectonics, which is well known albeit poorly understood. They also induce substantial vertical motion in the form of dynamically maintained topography that manifests itself prominently in the geologic record through sea level variations and their profound impact on the ocean and climate system. Linking mantle processes to their surface manifestations is seen widely today as one of the most fundamental problems in the Earth sciences, while being at the same time a matter of direct practical relevance through the evolution of sedimentary basins and their paramount economic importance. Simulating Earth mantle dynamics requires a resolution in space and time that makes it one of the grand challenge applications in the computational sciences. With exascale systems of the future it will be possible to advance beyond the deterministic forward problem to a stochastic uncertainty analysis for the inverse problem. In fact, fluid dynamic inverse theory is now at hand that will allow us to track mantle motion back into the past, exploiting the rich constraints available from the geologic record, subject to the availability of powerful geodynamical simulation software that can take advantage of these future supercomputers. The new community code TERRA-NEO will be based on a carefully designed multi-scale space-time discretization using hybridized Discontinuous Galerkin elements on an icosahedral mesh with block-wise refinement. This advanced finite element technique promises better stability and higher accuracy for the non-linear transport processes in the Earth mantle while requiring less communication in a massively parallel setting. The resulting algebraic systems with finally more than 10^12 unknowns per time step will be solved by a new class of communication-avoiding, asynchronous multigrid preconditioners that will achieve maximal scalability and resource-optimized computational performance. A non-deterministic control flow and a lazy evaluation strategy will alleviate the traditional over-synchronization of hierarchical iterative methods and will support advanced resiliency techniques on the algorithmic level. The software framework of TERRA-NEO will be developed specifically for the upcoming heterogeneous exascale computers by using an advanced architecture-aware design process. Special white-box performance models will guide the software development, leading to a holistic co-design of the data structures and the algorithms on all levels. With this systematic performance engineering methodology we will also optimize a balanced compromise between minimal energy consumption and shortest runtime. This consortium is fully committed to the interdisciplinary collaboration that is necessary for creating TERRA-NEO as a new exascale simulation framework. To this end, TERRA-NEO brings together top experts covering all aspects of CS&E, from modeling via discretization to solvers and software engineering for exascale architectures.

    • A Fault-Tolerant Environment for Peta-Scale MPI Solvers

      (Third Party Funds Group – Sub project)

      Overall project: FEToL
      Term: 2011-06-01 - 2014-05-31
      Funding source: Bundesministerium für Bildung und Forschung (BMBF)

    • SKALB: Lattice Boltzmann Methods for Scalable Multi-Physics Applications

      (Third Party Funds Group – Overall project)

      Term: 2009-01-01 - 2011-12-31
      Funding source: BMBF / Verbundprojekt, Bundesministerium für Bildung und Forschung (BMBF)

      The goal of the BMBF-funded project SKALB (lattice Boltzmann methods for scalable multi-physics applications) is the efficient implementation and further development of lattice-Boltzmann-based flow solvers for the simulation of complex multi-physics applications on petascale-class computers. The lattice Boltzmann method is an established solution approach in computational fluid dynamics. A central advantage of the method is the conceptual simplicity of the numerical scheme, which allows both complex flow geometries, such as porous media or metal foams, and direct numerical simulations (DNS) for the study of turbulent flows to be computed efficiently. In the SKALB project, lattice Boltzmann applications are to be advanced methodically and technically for the new classes of massively parallel heterogeneous and homogeneous supercomputers. RRZE contributes its long-standing experience in performance modeling and the efficient implementation of lattice Boltzmann methods on a broad spectrum of modern computers, and additionally works on new programming approaches for multi-/many-core processors. The application code further developed at RRZE will be used together with the group of Prof. Schwieger for massively parallel simulations of flows in porous media.