Research focus

Overview

Our activities are in the following research fields:

Performance Engineering

We are known for our work in node-level performance engineering (PE). We employ a systematic performance engineering process based on performance patterns, develop high-performance prototype codes and libraries, and analyze the performance of codes and hardware platforms.

Selected Publications

D. Ernst, G. Hager, J. Thies, and G. Wellein: Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. Accepted for PPAM’2019, the 13th International Conference on Parallel Processing and Applied Mathematics, September 8-11, 2019, Białystok, Poland. Preprint: arXiv:1905.03136
J. Hofmann, D. Fey, M. Riedmann, J. Eitzinger, G. Hager, and G. Wellein: Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors. Concurrency & Computation: Practice & Experience (2016). Available online, DOI: 10.1002/cpe.3921.
M. Kreutzer, G. Hager, G. Wellein, A. Pieper, A. Alvermann, and H. Fehske: Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems. Proc. IPDPS15, the 29th IEEE International Parallel & Distributed Processing Symposium, May 25-29, 2015, Hyderabad, India. DOI: 10.1109/IPDPS.2015.76.
J. Treibig, G. Hager, and G. Wellein: Pattern-Driven Performance Engineering. Poster at SC13, The International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, November 2013.
J. Treibig, G. Hager, and G. Wellein: Performance patterns and hardware metrics on modern multicore processors: Best practices for performance engineering. Proc. 5th Workshop on Productivity and Performance (PROPER 2012) at Euro-Par 2012, August 28, 2012, Rhodes Island, Greece. Euro-Par 2012: Parallel Processing Workshops, Lecture Notes in Computer Science 7640, 451-460 (2013), Springer, ISBN 978-3-642-36948-3. DOI: 10.1007/978-3-642-36949-0_50. Preprint: arXiv:1206.3738

Performance Modeling

We promote the use of analytic, diagnostic performance models as crucial components of performance engineering. The Execution-Cache-Memory (ECM) model, a refinement of the well-known Roofline model, was developed in our group. We also analyze and model collective behavior in large-scale parallel codes with a variety of methods.
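As a brief illustration of the kind of analytic model meant here, the basic Roofline model bounds the attainable performance of a loop by the minimum of the peak compute rate and the product of computational intensity and memory bandwidth. The Python sketch below covers only this basic Roofline bound, not the ECM model. The function name and the machine numbers are hypothetical and chosen for illustration only.

```python
def roofline(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline upper bound on performance in flop/s.

    peak_flops    -- peak compute rate in flop/s
    mem_bandwidth -- attainable memory bandwidth in byte/s
    intensity     -- computational intensity of the loop in flop/byte
    """
    return min(peak_flops, intensity * mem_bandwidth)


# Hypothetical machine parameters (assumptions, not measurements).
PEAK = 1.0e12   # 1 Tflop/s
BW = 100.0e9    # 100 GB/s

# Vector triad a[i] = b[i] + s * c[i] in double precision:
# 2 flops per iteration, 24 bytes of traffic (two loads, one store,
# write-allocate transfers ignored for simplicity).
triad_intensity = 2 / 24
triad_bound = roofline(PEAK, BW, triad_intensity)

# A kernel with very high intensity is limited by the compute peak instead.
compute_bound = roofline(PEAK, BW, 100.0)

print(f"triad bound:   {triad_bound / 1e9:.2f} Gflop/s")
print(f"compute bound: {compute_bound / 1e9:.2f} Gflop/s")
```

With these assumed numbers the triad is strongly memory bound, which is the typical situation that more refined models such as ECM are designed to describe in more detail.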

Selected Publications

A. Afzal, G. Hager, and G. Wellein: The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs. IEEE Transactions on Parallel and Distributed Systems 34(2), 623-638 (2023), DOI: 10.1109/TPDS.2022.3221085. Preprint: arXiv:2205.04190
D. Ernst, M. Holzer, G. Hager, M. Knorr, and G. Wellein: Analytical Performance Estimation during Code Generation on Modern GPUs. Journal of Parallel and Distributed Computing 173, 152-167 (2023). DOI: 10.1016/j.jpdc.2022.11.003, Preprint: arXiv:2204.14242
A. Afzal, G. Hager, and G. Wellein: Analytic performance model for parallel overlapping memory-bound kernels. Concurrency and Computation: Practice and Experience (January 2022). Available with Open Access. DOI: 10.1002/cpe.6816, Preprint: arXiv:2011.00243
A. Afzal, G. Hager, and G. Wellein: Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs. In: P. Sadayappan, B. Chamberlain, G. Juckeland, H. Ltaief (eds): High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12151. Springer, Cham. Available with Open Access. DOI: 10.1007/978-3-030-50743-5_20
J. Hofmann, G. Hager, and D. Fey: On the accuracy and usefulness of analytic energy models for contemporary multicore processors. In: R. Yokota, M. Weiland, D. Keyes, and C. Trinitis (eds.): High Performance Computing: 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, Proceedings, Springer, Cham, LNCS 10876, ISBN 978-3-319-92040-5 (2018), 22-43. DOI: 10.1007/978-3-319-92040-5_2. Winner of the ISC 2018 Gauss Award.
H. Stengel, J. Treibig, G. Hager, and G. Wellein: Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model. Proc. ICS15, the 29th International Conference on Supercomputing, June 8-11, 2015, Newport Beach, CA. DOI: 10.1145/2751205.2751240.
G. Hager, J. Treibig, J. Habich, and G. Wellein: Exploring performance and power properties of modern multicore chips via simple machine models. Concurrency and Computation: Practice and Experience 28(2), 189-210 (2016). First published online December 2013, DOI: 10.1002/cpe.3180.
J. Treibig and G. Hager: Introducing a Performance Model for Bandwidth-Limited Loop Kernels. Proceedings of the Workshop “Memory issues on Multi- and Manycore Platforms” at PPAM 2009, the 8th International Conference on Parallel Processing and Applied Mathematics, Wroclaw, Poland, September 13-16, 2009. Lecture Notes in Computer Science, Volume 6067, 2010, pp 615-624. DOI: 10.1007/978-3-642-14390-8_64.

Performance Tools

We created and maintain the LIKWID performance tool suite and Kerncraft, a tool for loop kernel analysis and performance modeling.

Selected Publications

T. Röhl, J. Eitzinger, G. Hager, and G. Wellein: LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses. Accepted for the HPCMASPA 2017, the Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications, held in conjunction with IEEE Cluster 2017, Honolulu, HI, September 5, 2017. Preprint: arXiv:1708.01476
J. Hammer, J. Eitzinger, G. Hager, and G. Wellein: Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels. In: Niethammer C., Gracia J., Hilbrich T., Knüpfer A., Resch M., Nagel W. (eds), Tools for High Performance Computing 2016, ISBN 978-3-319-56702-0, 1-22 (2017). Proceedings of IPTW 2016, the 10th International Parallel Tools Workshop, October 4-5, 2016, Stuttgart, Germany. Springer, Cham. DOI: 10.1007/978-3-319-56702-0_1, Preprint: arXiv:1702.04653
J. Hammer, G. Hager, J. Eitzinger, and G. Wellein: Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft. Proc. PMBS15, the 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, in conjunction with ACM/IEEE Supercomputing 2015 (SC15), November 16, 2015, Austin, TX. DOI: 10.1145/2832087.2832092.
J. Treibig, G. Hager, and G. Wellein: likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes. In: H. Brunst et al. (eds.), Tools for High Performance Computing 2011. Springer, ISBN 978-3-642-31475-9, 27-36 (2012). DOI: 978-3-642-31475-9.
J. Treibig, G. Hager, and G. Wellein: LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego, CA, September 13, 2010. DOI: 10.1109/ICPPW.2010.38, Preprint: arXiv:1004.4431

Hardware-Aware Building Blocks for Sparse Linear Algebra and Stencil Solvers

We investigate programming concepts and numerical algorithms for scalable, efficient and robust iterative sparse matrix applications and stencil-based solvers on HPC systems.

Selected Publications

M. Kreutzer, G. Hager, D. Ernst, H. Fehske, A.R. Bishop, and G. Wellein: Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs. In: R. Yokota, M. Weiland, D. Keyes, and C. Trinitis (eds.): High Performance Computing: 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, Proceedings, Springer, Cham, LNCS 10876, ISBN 978-3-319-92040-5 (2018), 329-349. DOI: 10.1007/978-3-319-92040-5_17. ISC 2018 Hans Meuer Award Finalist.
T. M. Malas, G. Hager, H. Ltaief, and D. E. Keyes: Multi-dimensional intra-tile parallelization for memory-starved stencil computations. ACM Transactions on Parallel Computing 4(3), 12:1-12:32 (2017). DOI: 10.1145/3155290
M. Kreutzer, J. Thies, M. Röhrig-Zöllner, A. Pieper, F. Shahzad, M. Galgon, A. Basermann, H. Fehske, G. Hager, and G. Wellein: GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems. International Journal of Parallel Programming (2016). DOI: 10.1007/s10766-016-0464-z.
A. Pieper, M. Kreutzer, A. Alvermann, M. Galgon, H. Fehske, G. Hager, B. Lang, and G. Wellein: High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations. Journal of Computational Physics 325, 226-243 (2016). DOI: 10.1016/j.jcp.2016.08.027

Software Engineering for High Performance Computing and Data Analytics

Increasing computing power and the growing amount of available data enable us to significantly improve the mathematical models for various applications. We investigate the fusion of classical model-driven and data-driven approaches and their implementation in modern, open-source research software. Our focus lies on code generation technology for numerical solvers and machine learning methods on (block-)structured data.

Activities

DFG Research Group Solidification Cracks in Laser Beam Welding: High Performance Computing for High Performance Processes, Subproject TP7: Sustainable Data and Software Management for Research Software for the Simulation of Solidification Cracks in Laser Beam Welding
EU project SCALABLE (SCAlable LAttice Boltzmann Leaps to Exascale)
DFG Project Dynamic HPC Software Packages: Seamless integration of existing software packages and code generation techniques

Selected Publications

M. Bauer, S. Eibl, C. Godenschwager, N. Kohl, M. Kuron, C. Rettinger, F. Schornbaum, C. Schwarzmeier, D. Thönnes, H. Köstler, et al. waLBerla: A block-structured high-performance framework for multiphysics simulations. Computers & Mathematics with Applications, 81:478-501, 2021
M. Bauer, H. Köstler, and U. Rüde, lbmpy: Automatic code generation for efficient parallel lattice Boltzmann methods, Journal of Computational Science, vol. 49, p. 101269, 2021
S. Faghih-Naini, S. Kuckuk, V. Aizinger, D. Zint, R. Grosso, and H. Köstler. Quadrature-free discontinuous Galerkin method with code generation features for shallow water equations on automatically generated block-structured meshes. Advances in Water Resources, 138:103552, 2020
J. Schmitt, S. Kuckuk, and H. Köstler. Constructing efficient multigrid solvers with genetic programming. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 1012-1020, 2020.
R. R. L. Machado, J. Schmitt, S. Eibl, J. Eitzinger, R. Leißa, S. Hack, A. Pérard-Gayot, R. Membarth, and H. Köstler. tinymd: A portable and scalable implementation for pairwise interactions simulations. arXiv preprint arXiv:2009.07400, 2020
C. Lengauer, S. Apel, M. Bolten, S. Chiba, U. Rüde, J. Teich, A. Größlinger, F. Hannig, H. Köstler, L. Claus, et al., ExaStencils: Advanced multigrid solver generation, in Software for Exascale Computing - SPPEXA 2016-2019, 405-452, Springer, Cham, 2020
M. Bauer, J. Hötzer, D. Ernst, J. Hammer, M. Seiz, H. Hierl, J. Hönig, H. Köstler, G. Wellein, B. Nestler, et al., Code generation for massively parallel phase-field simulations, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-32, 2019
F. Nicoli, K. König, M. Dahmardeh, A. Gemeinhardt, R. G. Mahmoodabadi, H. M. Dastjerdi, H. Köstler, and V. Sandoghdar. Opportunities and challenges of single protein detection with iSCAT. In 2019 Conference on Lasers and Electro-Optics Europe & European Quantum Electronics Conference (CLEO/Europe-EQEC), pages 1-1. IEEE, 2019
M. Heisig and H. Köstler. Petalisp: run time code generation for operations on strided arrays. In Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, 11-17, 2018