Navigation

Research focus

Performance Engineering

We are active and known for our work in the field of node-level performance engineering (PE). We employ a systematic performance engineering process based on performance patterns, develop high performance prototype codes and libraries, and perform performance analysis of codes and hardware platforms.

Activities

Selected Publications

  • D. Ernst, G. Hager, J. Thies, and G. Wellein: Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. Accepted for PPAM’2019, the 13th International Conference on Parallel Processing and Applied Mathematics,  September 8-11, 2019, Białystok, Poland. Preprint: arXiv:1905.03136
  • J. Hofmann, D. Fey, M. Riedmann, J. Eitzinger, G. Hager, and G. Wellein: Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors. Concurrency & Computation: Practice & Experience (2016). Available online, DOI: 10.1002/cpe.3921.
  • M. Kreutzer, G. Hager, G. Wellein, A. Pieper, A. Alvermann, and H. Fehske: Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems. Proc. IPDPS15, the 29th IEEE International Parallel & Distributed Processing Symposium, May 25-29, 2015, Hyderabad, India. DOI: 10.1109/IPDPS.2015.76.
  • J. Treibig, G. Hager, and G. Wellein: Pattern-Driven Performance Engineering. Poster at SC13, The International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, November 2013.
  • J. Treibig, G. Hager, and G. Wellein: Performance patterns and hardware metrics on modern multicore processors: Best practices for performance engineering. Proc. 5th Workshop on Productivity and Performance (PROPER 2012) at Euro-Par 2012, August 28, 2012, Rhodes Island, Greece. Euro-Par 2012: Parallel Processing Workshops, Lecture Notes in Computer Science 7640, 451-460 (2013), Springer, ISBN 978-3-642-36948-3. DOI: 10.1007/978-3-642-36949-0_50. Preprint: arXiv:1206.3738

Performance Modeling

We expedite the use of analytic, diagnostic performance models as crucial components in performance engineering. The Execution-Cache-Memory (ECM) model, a refinement of the well-known Roofline model, was developed in our group.

Activities

Selected Publications

  • M. Wittmann, G. Hager, R. Janalík, M. Lanser, A. Klawonn, O. Rheinbach, O. Schenk, and G. Wellein: Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model. Proc. 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), September 24-27, 2018, Lyon, France, 233-241. DOI: 10.1109/CAHPC.2018.8645938
  • J. Hofmann, G. Hager, and D. Fey: On the accuracy and usefulness of analytic energy models for contemporary multicore processors. In: R. Yokota, M. Weiland, D. Keyes, and C. Trinitis (eds.): High Performance Computing: 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, Proceedings, Springer, Cham, LNCS 10876, ISBN 978-3-319-92040-5 (2018), 22-43. DOI: 10.1007/978-3-319-92040-5_2. Winner of the ISC 2018 Gauss Award.
  • H. Stengel, J. Treibig, G. Hager, and G. Wellein: Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model. Proc. ICS15, the 29th International Conference on Supercomputing, June 8-11, 2015, Newport Beach, CA. DOI: 10.1145/2751205.2751240.
  • G. Hager, J. Treibig, J. Habich, and G. Wellein: Exploring performance and power properties of modern multicore chips via simple machine models. Concurrency and Computation: Practice and Experience 28(2), 189-210 (2016). First published online December 2013, DOI: 10.1002/cpe.3180.
  • J. Treibig and G. Hager: Introducing a Performance Model for Bandwidth-Limited Loop Kernels. Proceedings of the Workshop “Memory issues on Multi- and Manycore Platforms” at PPAM 2009, the 8th International Conference on Parallel Processing and Applied Mathematics, Wroclaw, Poland, September 13-16, 2009. Lecture Notes in Computer Science, Volume 6067, 2010, pp 615-624. DOI: 10.1007/978-3-642-14390-8_64.

 

Tool Development

We created and maintain the Likwid performance tool suite and the Kerncraft loop kernel analysis and performance modeling tool.

Activities

Selected Publications

  • T. Röhl, J. Eitzinger, G. Hager, and G. Wellein: LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses. Accepted for the HPCMASPA 2017, the Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications, held in conjunction with IEEE Cluster 2017, Honolulu, HI, September 5, 2017. Preprint: arXiv:1708.01476
  • J. Hammer, J. Eitzinger, G. Hager, and G. Wellein: Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels. In: Niethammer C., Gracia J., Hilbrich T., Knüpfer A., Resch M., Nagel W. (eds), Tools for High Performance Computing 2016, ISBN 978-3-319-56702-0, 1-22 (2017). Proceedings of IPTW 2016, the 10th International Parallel Tools Workshop, October 4-5, 2016, Stuttgart, Germany. Springer, Cham. DOI: 10.1007/978-3-319-56702-0_1, Preprint: arXiv:1702.04653
  • J. Hammer, G. Hager, J. Eitzinger, and G. Wellein: Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft. Proc. PMBS15, the 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, in conjunction with ACM/IEEE Supercomputing 2015 (SC15), November 16, 2015, Austin, TX. DOI: 10.1145/2832087.2832092,
  • J. Treibig, G. Hager, and G. Wellein: likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes. In: H. Brunst et al. (eds.), Tools for High Performance Computing 2011. Springer, ISBN 978-3-642-31475-9, (2012) 27-36 . DOI: 978-3-642-31475-9.
  • J. Treibig, G. Hager and G. Wellein: LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego CA, September 13, 2010. DOI: 10.1109/ICPPW.2010.38, Preprint: arXiv:1004.4431

Hardware-Aware Building Blocks for Sparse Linear Algebra and Stencil Solvers

We investigate programming concepts and numerical algorithms for scalable, efficient and robust iterative sparse matrix applications and stencil-based solvers on HPC systems.

Selected Publications

  • M. Kreutzer, G. Hager, D. Ernst, H. Fehske, A.R. Bishop, and G. Wellein: Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs. In: R. Yokota, M. Weiland, D. Keyes, and C. Trinitis (eds.): High Performance Computing: 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, Proceedings, Springer, Cham, LNCS 10876, ISBN 978-3-319-92040-5 (2018), 329-349. DOI: 10.1007/978-3-319-92040-5_17ISC 2018 Hans Meuer Award Finalist.
  • T. M. Malas, G. Hager, H. Ltaief, and D. E. Keyes: Multi-dimensional intra-tile parallelization for memory-starved stencil computations. ACM Transactions on Parallel Computing 4(3), 12:1-12:32 (2017). DOI: 10.1145/3155290
  • M. Kreutzer, J. Thies, M. Röhrig-Zöllner, A. Pieper, F. Shahzad, M. Galgon, A. Basermann, H. Fehske, G. Hager, and G. Wellein: GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems. International Journal of Parallel Programming (2016). DOI: 10.1007/s10766-016-0464-z.
  • A. Pieper, M. Kreutzer, A. Alvermann, M. Galgon, H. Fehske, G. Hager, B. Lang, and G. Wellein: High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations. Journal of Computational Physics 325, 226-243 (2016). DOI: 10.1016/j.jcp.2016.08.027