Publications, Posters & Talks

Below you can also find lists of our posters and talks.

Publications

Afzal A.:
A Holistic White-Box Approach to Performance Modeling for Supercomputing
FAU University Press, 2026
(FAU Studien aus der Informatik, Vol.22)
ISBN: 978-3-96147-940-5
DOI: 10.25593/978-3-96147-940-5
URL: https://open.fau.de/handle/openfau/40254
Afzal A., Hager G., Wellein G.:
Exploring metrics for analyzing dynamic behavior in MPI programs via a coupled-oscillator model
In: Parallel Computing (2026)
ISSN: 0167-8191
DOI: 10.1016/j.parco.2026.103184
URL: https://www.sciencedirect.com/science/article/abs/pii/S0167819126000025
Afzal A., Manfred Li M., Panzlaff M.:
Modeling and Chasing the Energy-Efficiency Sweet Spots in Modern GPUs
16th International Conference on Parallel Processing and Applied Mathematics, PPAM 2026 (Poznań, Poland, 2026-08-30 - 2026-09-02)
In: Wyrzykowski, R., Deelman, E. (ed.): Lecture Notes in Computer Science 2026
DOI: 10.48550/arXiv.2607.00819
Bhandary Panambur A., Nguyen TT., Bayer S., Maier A.:
Lesion-Aware AI for Mammography: Multi-Dataset Pretraining with ROI-Guided Contrastive Learning and Clinical Image Retrieval
ECR 2026
DOI: 10.26044/ecr2026/C-15780
Ghete T., Gaschler L., Krumbholz M., Sembill SS., Behrens YL., Karow A., Wölfl M., Auer F., Hauer J., Carta MG., Ferrazzi F., Hutter S., Horn A., Sticht H., Schlegel PG., Metzler M.:
Prevalence and characterization of germline RAS pathway variants in children with chronic myeloid leukemia
In: Leukemia 40 (2026), p. 1527–1531
ISSN: 0887-6924
DOI: 10.1038/s41375-026-02952-z
Laukemann J., Hager G., Wellein G.:
Microarchitectural comparison, in-core modeling, and memory hierarchy analysis of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa
In: Parallel Computing 127 (2026), Article No.: 103183
ISSN: 0167-8191
DOI: 10.1016/j.parco.2026.103183
Ma B., Afzal A., Eitzinger J., Wellein G.:
The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures
16th International Conference on Parallel Processing and Applied Mathematics, PPAM 2026 (Poznań, Poland, 2026-08-30 - 2026-09-02)
In: Wyrzykowski, R., Deelman, E. (ed.): Lecture Notes in Computer Science 2026
DOI: 10.48550/arXiv.2605.11999
Mayr M., Wind S., Schröder L., Hager G., Köstler H., Wellein G.:
AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models
(2026)
DOI: 10.48550/arXiv.2603.16164
Monroy LCR., Mayr M., Mill L., Köstler H., Maier A.:
Patch-Level Brain Tumor Sub-region Classification Using Foundation Models Under Long-Tailed Data Distributions
Brain TumorS Lighthouse Cluster of Challenges, and the Automated Identification of Moderate-Severe Traumatic Brain Injury Lesions Challenge, BraTS 2025 and AIMS-TBI 2025, held in Conjunction with the International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2025 (Daejeon, KOR, 2025-09-23 - 2025-09-27)
In: Spyridon Bakas, Emily Dennis, Mehdi Astaraki, Ujjwal Baid, Gian Marco Conte, Martha Foltyn-Dumitru, Zhifan Jiang, Marius George Linguraru, Dominic Labella, Marie-Christin Metz, Udunna Anazodo, Maria Correia de Verdier, Florian Kofler, Hongwei Bran Li, Nazanin Maleki (ed.): Lecture Notes in Computer Science 2026
DOI: 10.1007/978-3-032-16370-7_19
Schröder L., Kavane S., Köstler H.:
A Validated LBM Dataset and Pipeline for Surrogate Modeling of Turbulent 3D Obstructed Channel Flows
(2026)
DOI: 10.48550/arXiv.2606.16765
Trollmann M., Böckmann R.:
Decoding pH-Driven Phase Transition of Lipid Nanoparticles
In: Small (2026)
ISSN: 1613-6829
DOI: 10.1002/smll.202511381

Afzal A., Li M., Panzlaff M.:
Modeling and Chasing the Energy-Efficiency Sweet Spots in Modern GPUs.
Accepted for publication in 16th International Conference on Parallel Processing and Applied Mathematics, PPAM 2026 (Poznań, Poland, 2026-08-30 – 2026-09-02).
DOI: 10.48550/arXiv.2607.00819
Ma B., Afzal A., Eitzinger J., Wellein G.:
The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures.
Accepted for publication in 16th International Conference on Parallel Processing and Applied Mathematics, PPAM 2026 (Poznań, Poland, 2026-08-30 – 2026-09-02).
DOI: 10.48550/arXiv.2605.11999
A. Afzal, G. Hager, and G. Wellein: Wattlytics: A Web Platform for Co-Optimizing Performance, Energy, and TCO in HPC Clusters. Submitted. ArXiv DOI: 10.48550/arXiv.2604.08182

Afzal A., Bates N., Jana S.:
EESP'25: 1st International Workshop on Energy Efficiency with Sustainable Performance: Techniques, Tools, and Best Practices
EESP'25: 1st International Workshop on Energy Efficiency with Sustainable Performance: Techniques, Tools, and Best Practices (Hamburg, Germany, 2025-06-13 - 2025-06-13)
In: ISC High Performance 2025 International Workshops, Hamburg, Germany, June 10–13, 2025, Revised Selected Papers 2025
DOI: 10.1007/978-3-032-07612-0
Afzal A., Hager G.:
PERMAVOST '25: 5th Workshop on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy
5th Workshop on Performance Engineering, Modelling, Analysis, and Visualization Strategy, PERMAVOST 2025 (Notre Dame, IN, USA, 2025-07-20 - 2025-07-20)
In: PERMAVOST 2025 - Proceedings of the 2025 on Performance Engineering, Modelling, Analysis, and Visualization Strategy 2025
DOI: 10.1145/3731545
URL: https://dl.acm.org/doi/proceedings/10.1145/3731545#heading3
Afzal A., Hager G., Wellein G.:
Analytic roofline modeling and energy analysis of the LULESH proxy application on multi-core clusters
In: International Journal of High Performance Computing Applications (2025), Article No.: 10943420251363711
ISSN: 1094-3420
DOI: 10.1177/10943420251363711
Afzal A., Hager G., Wellein G.:
GROMACS Unplugged: How Power Capping and Frequency Shapes Performance on GPUs
31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025) (Dresden, Germany, 2025-08-25 - 2025-08-29)
In: Euro-Par 2025: Parallel Processing Workshops Volume in the Springer Lecture Notes in Computer Science (LNCS) series. 2025
DOI: 10.48550/arXiv.2510.06902
Böhm F., Bauer D., Kohl N., Alappat C., Thönnes D., Mohr M., Köstler H., Rüde U.:
Code Generation and Performance Engineering for Matrix-Free Finite Element Methods on Hybrid Tetrahedral Grids
In: SIAM Journal on Scientific Computing 47 (2025), p. B131-B159
ISSN: 1064-8275
DOI: 10.1137/24M1653756
Denysenko O., Horn A., Sticht H.:
Comparative Molecular Dynamics Study of 19 Bovine Antibodies with Ultralong CDR H3
In: Antibodies 14 (2025), Article No.: 70
ISSN: 2073-4468
DOI: 10.3390/antib14030070
Godé H., Kruse C., Angersbach R., Köstler H., Bauerheim M., Rüde U.:
Comparison of Multigrid and Machine Learning-Based Poisson Solvers
15th International Conference on Parallel Processing and Applied Mathematics, PPAM 2024 (Ostrava, 2024-09-08 - 2024-09-11)
In: Roman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski (ed.): Lecture Notes in Computer Science 2025
DOI: 10.1007/978-3-031-85703-4_12
Gourmelon N., Dreier MN., Mayr M., Seehaus T., Pyles DR., Braun M., Maier A., Christlein V.:
SSL4SAR: Self-Supervised Learning for Glacier Calving Front Extraction from SAR Imagery
In: IEEE Transactions on Geoscience and Remote Sensing (2025)
ISSN: 0196-2892
DOI: 10.1109/TGRS.2025.3580945
Handke M., Beierlein F., Imhof P., Schiedel M., Hammann S.:
New fluorogenic triacylglycerols as sensors for dynamic measurement of lipid oxidation
In: Analytical and Bioanalytical Chemistry 417 (2025), p. 287-296
ISSN: 1618-2642
DOI: 10.1007/s00216-024-05642-w
Heger L., Ankermann P., Socher E.:
Molecular Characterization of the GALC Mutation Thr112Ala Causing Krabbe Disease
In: International Journal of Molecular Sciences 26 (2025), p. 8647
ISSN: 1422-0067
DOI: 10.3390/ijms26178647
Hüttner L., Mayr M., Gorges T., Wu F., Seuret M., Maier A., Christlein V.:
Low-Rank Adaptation vs. Fine-Tuning for Handwritten Text Recognition
2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) (Tucson, Arizona, USA, 2025-02-28 - 2025-03-04)
In: IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) 2025
DOI: 10.1109/WACVW65960.2025.00146
URL: https://ieeexplore.ieee.org/document/10972546
Lacey D., Alappat C., Lange F., Hager G., Fehske H., Wellein G.:
Cache blocking of distributed-memory parallel matrix power kernels
In: International Journal of High Performance Computing Applications 39 (2025), p. 385-404
ISSN: 1094-3420
DOI: 10.1177/10943420251319332
Lange F., Heunisch L., Fehske H., DiVincenzo DP., Hartmann M.:
Cross-talk in superconducting qubit lattices with tunable couplers – comparing transmon and fluxonium architectures
In: Quantum Science and Technology 11 (2025), p. 015020
ISSN: 2058-9565
DOI: 10.1088/2058-9565/ae2358
Laukemann J.:
Reproducibility Report for SC25 Paper Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality
2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 (St. Louis, MO, USA, 2025-11-16 - 2025-11-21)
In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 2025
DOI: 10.1145/3712285.3769440
Laukemann J., Helal AE., Anderson SIG., Checconi F., Soh Y., Tithi JJ., Ranadive T., Gravelle BJ., Petrini F., Choi J.:
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation
In: IEEE Transactions on Parallel and Distributed Systems (2025)
ISSN: 1045-9219
DOI: 10.1109/TPDS.2025.3553092
Mayr M., Krenz J., Neumeier K., Bub A., Bürcky S., Brolich N., Herbers K., Habermann M., Fleischmann P., Maier A., Christlein V.:
Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis.
In: Scientific Data 12 (2025), Article No.: 811
ISSN: 2052-4463
Open Access: https://www.nature.com/articles/s41597-025-05144-z
URL: https://www.nature.com/articles/s41597-025-05144-z
Raucheisen M., Attia D., Körber M., Crevillén AA., Hidalgo A., Beierlein F., Imhof P., Mokhir A.:
Mitochondria-Catalyzed Activation of Anticancer Prodrugs
In: ChemCatChem (2025)
ISSN: 1867-3880
DOI: 10.1002/cctc.202500054
Suarez E., Bockelmann H., Eicker N., Eitzinger J., El Sayed S., Fieseler T., Frank M., Frech P., Giesselmann P., Hackenberg D., Hager G., Herten A., Ilsche T., Koller B., Laure E., Manzano C., Oeste S., Ott M., Reuter K., Schneider R., Thust K., von St. Vieth B.:
Energy-aware operation of HPC systems in Germany
In: Frontiers in High Performance Computing 3 (2025)
DOI: 10.3389/fhpcp.2025.1520207
Wind S., Sopa J., Truhn D., Lotfinia M., Nguyen TT., Bressem K., Adams L., Rusu M., Köstler H., Wellein G., Maier A., Tayebi Arasteh S.:
Multi-step retrieval and reasoning improves radiology question answering with large language models
In: npj Digital Medicine 8 (2025), Article No.: 790
ISSN: 2398-6352
DOI: 10.1038/s41746-025-02250-5
URL: https://www.nature.com/articles/s41746-025-02250-5
Wolfson-Pou J., Laukemann J., Petrini F.:
MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs
ICS '25: 2025 International Conference on Supercomputing Salt Lake City USA (Salt Lake City, 2025-06-08 - 2025-06-11)
In: ICS '25: Proceedings of the 39th ACM International Conference on Supercomputing 2025
DOI: 10.1145/3721145.3725773
Wu F., Dreier MN., Gourmelon N., Wind S., Zhang J., Seehaus T., Braun M., Maier A., Christlein V.:
AMD-HookNet++: Evolution of AMD-HookNet with Hybrid CNN-Transformer Feature Enhancement for Glacier Calving Front Segmentation
In: IEEE Transactions on Geoscience and Remote Sensing 64 (2025), p. 1-22
ISSN: 0196-2892
DOI: 10.1109/TGRS.2025.3642764
URL: https://ieeexplore.ieee.org/document/11296938
Wu F., Seuret M., Mayr M., Kordon F., Zöllner J., Wind S., Maier A., Christlein V.:
Lightweight cross-attention-based HookNet for historical handwritten document layout analysis
In: International Journal on Document Analysis and Recognition 28 (2025), p. 409-427
ISSN: 1433-2833
DOI: 10.1007/s10032-025-00519-9
URL: https://link.springer.com/article/10.1007/s10032-025-00519-9

E. Oikonomou, Y. Juli, R.R. Kolan, L. Kern, T. Gruber, C. Alzheimer, P. Krauss, A. Maier, and T. Huth: A deep learning approach to real-time Markov modeling of ion channel gating. Commun Chem 7, 280 (2024). DOI: 10.1038/s42004-024-01369-y
J. Laukemann, G. Hager, and G. Wellein: Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa. Proc. 15th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS 2024), Atlanta, GA, USA, November 18, 2024. DOI: 10.1109/SCW63240.2024.00181. Preprint: arXiv:2409.08108
F. Lange and H. Fehske: Metal-insulator transition of spinless fermions coupled to dispersive optical bosons. Sci Rep 14, 18050 (2024), DOI: 10.1038/s41598-024-68811-y
C. Alappat, J. Thies, G. Hager, H. Fehske, and G. Wellein: Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs. The International Journal of High Performance Computing Applications, 2024;0(0). Available with Open Access. DOI: 10.1177/10943420241283828. Preprint: arXiv:2309.02228
K. Nolkemper, M. Antonietti, T. D. Kühne, and S. A. Ghasemi: Kinetically Stable and Highly Ordered Two-Dimensional CN2 Crystal Structures. J. Phys. Chem. C 128(1), 330-338 (2024), DOI: 10.1021/acs.jpcc.3c03539
F. Lange, G. Wellein, and H. Fehske: Charge-order melting in the one-dimensional Edwards model. Phys. Rev. Res. 6, L022007 (2024), DOI: 10.1103/PhysRevResearch.6.L022007
M. Chakraborty and H. Fehske: Quantum transport in an environment parametrized by dispersive bosons. Phys. Rev. B 109, 085125 (2024), DOI: 10.1103/PhysRevB.109.085125
R. Ravedutti Lucio Machado, J. Eitzinger, and H. Köstler: P4irs: An Intermediate Representation and Compiler for Parallel and Performance-Portable Particle Simulations. SSRN, 4714072 (2024), DOI: 10.2139/ssrn.4714072
H. Owen, D. Ernst, T. Gruber, O. Lehmkuhl, G. Houzeaux, L. Gasparino, and G. Wellein: Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs. In 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), San Francisco, CA, USA, 2024, pp. 408-416. DOI: 10.1109/IPDPS57955.2024.00043. Preprint: arXiv:2403.08777
J. Laukemann, T. Gruber, G. Hager, D. Oryspayev, and G. Wellein: CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion. In 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), San Francisco, CA, USA, 2024, pp. 350-360. DOI: 10.1109/IPDPS57955.2024.00038. Preprint: arXiv:2311.04797

M. Chakraborty, J. Chatterjee, and H. Fehske: Particularities of polaron formation in the extended Holstein model with next nearest neighbor transfer. Phys. Rev. B 108, 235134 (2023), DOI: 10.1103/PhysRevB.108.235134
S. Ejima, F. Lange, and H. Fehske: Entanglement analysis of photoinduced η-pairing states. Eur. Phys. J. Spec. Top. (2023), DOI: 10.1140/epjs/s11734-023-00975-6
E. Oikonomou, T. Gruber, R.C. Achanta, S. Höller, C. Alzheimer, G. Wellein, and T. Huth: 2D-dwell-time analysis with simulations of ion-channel gating using high-performance computing. Biophysical Journal (2023), DOI: 10.1016/j.bpj.2023.02.023
A. Afzal, G. Hager, and G. Wellein: Physical Oscillator Model for Supercomputing. Proc. 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23), Denver, CO, USA. PMBS23 Best Short Paper Award. Available with Open Access. DOI: 10.1145/3624062.3625535, Preprint: arXiv:2310.05701
A. Afzal, G. Hager, and G. Wellein: SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study. Proc. 14th IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS23), Denver, CO, USA. Available with Open Access. DOI: 10.1145/3624062.3624197, Preprint: arXiv:2309.05373
A. Alvermann, G. Hager, and H. Fehske: Orthogonal layers of parallelism in large-scale eigenvalue computations. ACM Transactions on Parallel Computing 10(3), Article 16 (September 2023), pp 1-31. DOI: 10.1145/3614444. Preprint: arXiv:2209.01974
L. Berg, A. Alvermann, and H. Fehske: Quantized charge transport in disordered Floquet topological insulators in the absence of Anderson localization. Phys. Rev. B 108, 035123 (2023). DOI: 10.1103/PhysRevB.108.035123. Preprint: arXiv:2301.09520v3
S. Ejima and H. Fehske: Photoinduced pairing in Mott insulators. SciPost Phys. Proc. 11, 009 (2023). DOI: 10.21468/SciPostPhysProc.11.009. Preprint: arXiv:2301.04496
A. Filusch and H. Fehske: Singular flat bands in the modified Haldane-Dice model. Physica B: Condensed Matter 659, 414848 (2023). DOI: 10.1016/j.physb.2023.414848. Preprint: arXiv:2303.17850
R. Ravedutti Lucio Machado, J. Eitzinger, J. Laukemann, G. Hager, H. Köstler, and G. Wellein: MD-Bench: Engineering the in-core performance of short-range molecular dynamics kernels from state-of-the-art simulation packages. Future Generation Computer Systems (2023), ISSN 0167-739X, DOI: 10.1016/j.future.2023.06.023. Preprint: arXiv:2302.14660
A. Afzal, G. Hager, S. Markidis, and G. Wellein: Making Applications Faster by Asynchronous Execution: Slowing Down Processes or Relaxing MPI Collectives. Future Generation Computer Systems (2023), ISSN 0167-739X, DOI: 10.1016/j.future.2023.06.017. Preprint: arXiv:2302.12164
D. Ernst, M. Holzer, G. Hager, M. Knorr, and G. Wellein: Analytical Performance Estimation during Code Generation on Modern GPUs. Journal of Parallel and Distributed Computing 173, 152-167 (2023). DOI: 10.1016/j.jpdc.2022.11.003, Preprint: arXiv:2204.14242
C. L. Alappat, G. Hager, O. Schenk, and G. Wellein: Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication. IEEE Transactions on Parallel and Distributed Systems 34(2), 581-597 (2023), DOI: 10.1109/TPDS.2022.3223512. Preprint: arXiv:2205.01598
A. Afzal, G. Hager, and G. Wellein: The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs. IEEE Transactions on Parallel and Distributed Systems 34(2), 623-638 (2023), DOI: 10.1109/TPDS.2022.3221085. 2023 Best Paper Runner-up in IEEE TPDS. Preprint: arXiv:2205.04190
R. Ravedutti Lucio Machado, J. Eitzinger, H. Köstler, and G. Wellein: MD-Bench: A generic proxy-app toolbox for state-of-the-art molecular dynamics algorithms. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham. PPAM 2022 Best Paper Award. DOI: 10.1007/978-3-031-30442-2_24, Preprint: arXiv:2207.13094
A. Afzal, G. Hager, G. Wellein, and S. Markidis: Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham. Available with Open Access. DOI: 10.1007/978-3-031-30442-2_12, Preprint: arXiv:2205.13963

A. Filusch and H. Fehske: Tunable valley filtering in dynamically strained alpha-T3 lattices. Phys. Rev. B 106, 245106 (2022). DOI: 10.1103/PhysRevB.106.245106. Preprint: arXiv:2210.16522
S. Ejima, F. Lange, and H. Fehske: Photoinduced metallization of excitonic insulators. Phys. Rev. B 105, 245126 (2022). DOI: 10.1103/PhysRevB.105.245126. Preprint: arXiv:2204.0908
M. Trollmann and R. Böckmann: mRNA lipid nanoparticle phase transition. In: Biophysical Journal, Volume 121, Issue 20, pp. 3927-3939, October 2022. DOI: 10.1016/j.bpj.2022.08.037
D. Zint, R. Grosso, V. Aizinger, S. Faghih-Naini, S. Kuckuk, and H. Köstler: Automatic Generation of Load-Balancing-Aware Block-Structured Grids for Complex Ocean Domains. In: Proceedings of the 2022 SIAM International Meshing Roundtable. DOI: 10.5281/zenodo.6562440
A. Afzal, G. Wellein, and G. Hager: Addressing White-box Modeling and Simulation Challenges in Parallel Computing. In: SIGSIM-PADS ’22: Proceedings of the 2022 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp. 25-26, June 2022. DOI: 10.1145/3518997.3534986
A. Nguyen, A. E. Helal, F. Checconi, J. Laukemann, J. J. Tithi, Y. Soh, T. Ranadive, F. Petrini, and J. W. Choi: Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures. Accepted for publication at ICS 2022, the ACM International Conference on Supercomputing, June 27-30, 2022 (virtual). Preprint: arXiv:2201.12523
A. Afzal, G. Hager, and G. Wellein: Analytic performance model for parallel overlapping memory-bound kernels. Concurrency and Computation: Practice and Experience 34(10), e6816 (2022). Available with Open Access. DOI: 10.1002/cpe.6816, Preprint: arXiv:2011.00243

D. Ernst, G. Hager, M. Knorr, G. Wellein, and M. Holzer: Opening the Black Box: Performance Estimation during Code Generation for GPUs. Accepted for SBAC-PAD 2021, in 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Belo Horizonte, Brazil, Oct 26-29, 2021, pp. 22-32. DOI: 10.1109/SBAC-PAD53543.2021.00014, Preprint: arXiv:2107.01143.
D. Pasadakis, C. L. Alappat, O. Schenk, and G. Wellein: Multiway p-spectral graph cuts on Grassmann manifolds. Machine Learning (2021). DOI: 10.1007/s10994-021-06108-1. Preprint: arXiv:2008.13210
C. L. Alappat, N. Meyer, J. Laukemann, T. Gruber, G. Hager, G. Wellein, and T. Wettig: ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX. Concurrency and Computation: Practice and Experience, e6512 (2021). Available with Open Access. DOI: 10.1002/cpe.6512, Preprint: arXiv:2103.03013
A. Afzal, G. Hager, and G. Wellein: Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact. Proc. ISC High Performance 2021 Digital, June 24 – July 2, 2021, Frankfurt, Germany. DOI: 10.1007/978-3-030-78713-4_19. Preprint: arXiv:2103.03175
C. L. Alappat, J. Seiferth, G. Hager, M. Korch, T. Rauber, and G. Wellein: YaskSite – Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures. 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Seoul, Korea (South), 2021, pp. 174-186. DOI: 10.1109/CGO51591.2021.9370316, Preprint: cgo21main-p18-p-aeebf45-49058-preprint.pdf
R. Ravedutti L. M., J. Eitzinger, A. M. Maidl, and D. Weingaertner: An instrumentation framework for performance analysis of halide schedules. Journal of Computer Languages, 101065 (2021). DOI: 10.1016/j.cola.2021.101065
R. Ravedutti L. M., J. Schmitt, S. Eibl, J. Eitzinger, R. Leißa, S. Hack, A. Pérard-Gayot, R. Membarth, and H. Köstler: tinyMD: Mapping molecular dynamics simulations to heterogeneous hardware using partial evaluation. Journal of Computational Science 54, 101425 (2021). DOI: 10.1016/j.jocs.2021.101425
T. Gruber, C. L. Alappat, J. Laukemann, and G. Hager: Webinar about LIKWID, OSACA, and Sparse Matrix-Vector Multiplication (SpMV) on A64FX processor, Institute for Advanced Computational Science at Stony Brook University, July 27, 2021, Recording: https://youtu.be/0LIvlTULdz0, Slides: http://tiny.cc/OOKAMI-Hackathon

T. Patel, A. Wagenhäuser, C. Eibel, T. Hönig, T. Zeiser, and D. Tiwari: What does Power Consumption Behavior of HPC Jobs Reveal?: Demystifying, Quantifying, and Predicting Power Consumption Characteristics. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020, pp. 799-809. DOI: 10.1109/IPDPS47924.2020.00087.
C. L. Alappat, J. Laukemann, T. Gruber, G. Hager, G. Wellein, N. Meyer, and T. Wettig: Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX. 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), GA, USA, 2020, pp. 1-7. PMBS20 Best Short Paper Award. DOI: 10.1109/PMBS51919.2020.00006. Preprint: arXiv:2009.13903
A. Klawonn, M. Lanser, O. Rheinbach, G. Wellein, and M. Wittmann: Energy efficiency of nonlinear domain decomposition methods. The International Journal of High Performance Computing Applications, (September 2020). DOI: 10.1177/1094342020953891
A. Pieper, G. Hager, and H. Fehske: A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials. The International Journal of High Performance Computing Applications, September 2020. DOI: 10.1177/1094342020959423. Preprint: arXiv:1708.09689
A. Klawonn, M. Lanser, M. Uran, O. Rheinbach, S. Köhler, J. Schröder, L. Scheunemann, D. Brands, D. Balzani, A. Gandhi, G. Wellein, M. Wittmann, O. Schenk, and R. Janalík: Exasteel: Towards a virtual laboratory for the multiscale simulation of dual-phase steel using high-performance computing. In: Bungartz HJ., Reiz S., Uekermann B., Neumann P., Nagel W. (eds.): Software for Exascale Computing – SPPEXA 2016-2019. Lecture Notes in Computational Science and Engineering 136, 351-404 (2020). Springer, Cham. Available with Open Access. DOI: 10.1007/978-3-030-47956-5_13
C. L. Alappat, A. Alvermann, A. Basermann, H. Fehske, Y. Futamura, M. Galgon, G. Hager, S. Huber, A. Imakura, M. Kawai, M. Kreutzer, B. Lang, K. Nakajima, M. Röhrig-Zöllner, T. Sakurai, F. Shahzad, J. Thies, and G. Wellein: ESSEX: Equipping Sparse Solvers For Exascale. In: Bungartz HJ., Reiz S., Uekermann B., Neumann P., Nagel W. (eds.): Software for Exascale Computing – SPPEXA 2016-2019. Lecture Notes in Computational Science and Engineering 136, 143-187 (2020). Springer, Cham. Available with Open Access. DOI: 10.1007/978-3-030-47956-5_7
J. Hofmann, C. L. Alappat, G. Hager, D. Fey, and G. Wellein: Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors. Supercomputing Frontiers and Innovations 7(2), 54-78, July 2020. Available with Open Access. DOI: 10.14529/jsfi200204.
J. Thies, M. Röhrig-Zöllner, N. Overmars, A. Basermann, D. Ernst, G. Hager, and G. Wellein: PHIST: a Pipelined, Hybrid-parallel Iterative Solver Toolkit. Accepted for publication in ACM Transactions on Mathematical Software (2020). Preprint: https://elib.dlr.de/123323/
C. L. Alappat, G. Hager, O. Schenk, J. Thies, A. Basermann, A. R. Bishop, H. Fehske, and G. Wellein: A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication. ACM Trans. Parallel Comput. 7(3), Article 19 (June 2020), 37 pages. Available with Open Access. DOI: 10.1145/3399732.
F. Cremonesi, G. Hager, G. Wellein, and F. Schürmann: Analytic Performance Modeling and Analysis of Detailed Neuron Simulations. The International Journal of High Performance Computing Applications, (April 2020). Available with Open Access. DOI: 10.1177/1094342020912528. Preprint: arXiv:1901.05344
D. Ernst, G. Hager, J. Thies, and G. Wellein: Performance Engineering for Real and Complex Tall & Skinny Matrix Multiplication Kernels on GPUs. The International Journal of High Performance Computing Applications, (October 2020). Available with Open Access. DOI: 10.1177/1094342020965661. Preprint: arXiv:1905.03136v2
A. Afzal, G. Hager, and G. Wellein: Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs. In: P. Sadayappan, B. Chamberlain, G. Juckeland, H. Ltaief (eds): High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12151. Springer, Cham. Available with Open Access. DOI: 10.1007/978-3-030-50743-5_20
C. L. Alappat, J. Hofmann, G. Hager, H. Fehske, A. R. Bishop, and G. Wellein: Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors. In: P. Sadayappan, B. Chamberlain, G. Juckeland, H. Ltaief (eds): High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12151. Springer, Cham. Available with Open Access. DOI: 10.1007/978-3-030-50743-5_21

J. Laukemann, J. Hammer, G. Hager, and G. Wellein: Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels. 2019 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Denver, CO, USA, 2019, pp. 1-6, DOI: 10.1109/PMBS49563.2019.00006. PMBS19 Best Late-Breaking Paper Award. Preprint: arXiv:1910.00214
J. Eitzinger, T. Gruber, A. Afzal, T. Zeiser, and G. Wellein: ClusterCockpit – A web application for job-specific performance monitoring. Accepted for HPCMASPA 2019, the Workshop for Monitoring and Analysis for High Performance Computing Systems and Applications, September 23, 2019, Albuquerque, NM, USA. Held in conjunction with IEEE Cluster 2019. DOI: 10.1109/CLUSTER.2019.8891017
M. Bauer, J. Hötzer, D. Ernst, J. Hammer, M. Seiz, H. Hierl, J. Hönig, H. Köstler, G. Wellein, B. Nestler, and U. Rüde: Code Generation for Massively Parallel Phase-Field Simulations. Proc. International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, November 17-22, 2019. DOI: 10.1145/3295500.3356186
J. Hornich, J. Hammer, G. Hager, T. Gruber, and G. Wellein: Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT. Supercomputing Frontiers and Innovations 6(3), 4-25 (2019). ISSN 2313-8734. Available with Open Access. DOI: 10.14529/jsfi190301
A. Afzal, G. Hager, and G. Wellein: Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study. Proc. 2019 IEEE International Conference on Cluster Computing (CLUSTER), Albuquerque, NM, September 23-26, 2019. DOI: 10.1109/CLUSTER.2019.8890995, Preprint: arXiv:1905.10603
D. Ernst, G. Hager, J. Thies, and G. Wellein: Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. In: Wyrzykowski R., Deelman E., Dongarra J., Karczewski K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. PPAM 2019 Best Paper Award. DOI: 10.1007/978-3-030-43229-4_43, Preprint: arXiv:1905.03136v1
A. Alvermann, A. Basermann, H.-J. Bungartz, C. Carbogno, D. Ernst, H. Fehske, Y. Futamura, M. Galgon, G. Hager, S. Huber, T. Huckle, A. Ida, A. Imakura, M. Kawai, S. Köcher, M. Kreutzer, P. Kus, B. Lang, H. Lederer, V. Manin, A. Marek, K. Nakajima, L. Nemec, K. Reuter, M. Rippl, M. Röhrig-Zöllner, T. Sakurai, M. Scheffler, C. Scheurer, F. Shahzad, D. Simoes Brambila, J. Thies, and G. Wellein: Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects. Proc. EPASA 2018, Japan Journal of Industrial and Applied Mathematics, 36(2), 699-717, DOI: 10.1007/s13160-019-00360-8. Preprint: arXiv:1806.01036.
F. Shahzad, J. Thies, M. Kreutzer, T. Zeiser, G. Hager, and G. Wellein: CRAFT: A library for easier application-level checkpoint/restart and automatic fault tolerance. IEEE Transactions on Parallel and Distributed Systems 30(3), 501-514 (2019). DOI: 10.1109/TPDS.2018.2866794, Preprint: arXiv:1708.02030
T. Gruber: Seminar on LIKWID Profiling for Perlmutter and novelties for the ECM model. Talk at Lawrence Berkeley National Laboratory, July 25, 2019

J. Schmitt, H. Köstler, J. Eitzinger, and R. Membarth: Unified Code Generation for the Parallel Computation of Pairwise Interactions Using Partial Evaluation. Proc. 17th International Symposium on Parallel and Distributed Computing (ISPDC), Geneva, Switzerland, 2018, pp. 17-24. DOI: 10.1109/ISPDC2018.2018.00012.
G. Hager and G. Wellein: Performance Engineering. Informatik Spektrum, ISSN 1432-122X, Online first, DOI: 10.1007/s00287-018-1122-1. (in German)
J. Laukemann, J. Hammer, J. Hofmann, G. Hager, and G. Wellein: Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures. 2018 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Dallas, TX, USA, 2018, pp. 121-131. DOI: 10.1109/PMBS.2018.8641578. Preprint: arXiv:1809.00912
M. Wittmann, G. Hager, R. Janalík, M. Lanser, A. Klawonn, O. Rheinbach, O. Schenk, and G. Wellein: Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model. Proc. 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), September 24-27, 2018, Lyon, France, 233-241. DOI: 10.1109/CAHPC.2018.8645938
J. Hofmann, G. Hager, and D. Fey: On the accuracy and usefulness of analytic energy models for contemporary multicore processors. In: R. Yokota, M. Weiland, D. Keyes, and C. Trinitis (eds.): High Performance Computing: 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, Proceedings, Springer, Cham, LNCS 10876, ISBN 978-3-319-92040-5 (2018), 22-43. DOI: 10.1007/978-3-319-92040-5_2, Preprint: arXiv:1803.01618. Winner of the ISC 2018 Gauss Award.
J. Seiferth, C.L. Alappat, M. Korch, and T. Rauber: Applicability of the ECM Performance Model to Explicit ODE Methods on Current Multi-Core Processors. In: R. Yokota, M. Weiland, D. Keyes, and C. Trinitis (eds.): High Performance Computing: 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, Proceedings, Springer, Cham, LNCS 10876, ISBN 978-3-319-92040-5 (2018), 163-183. DOI: 10.1007/978-3-319-92040-5_9.
M. Kreutzer, G. Hager, D. Ernst, H. Fehske, A.R. Bishop, and G. Wellein: Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs. In: R. Yokota, M. Weiland, D. Keyes, and C. Trinitis (eds.): High Performance Computing: 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, Proceedings, Springer, Cham, LNCS 10876, ISBN 978-3-319-92040-5 (2018), 329-349. DOI: 10.1007/978-3-319-92040-5_17. ISC 2018 Hans Meuer Award Finalist.
J. Hornich, G. Hager, and C. Pflaum: Efficient optical simulation of nano structures in thin-film solar cells. Proc. SPIE 10694, Computational Optics II, 106940R (28 May 2018); DOI: 10.1117/12.2312545
M. Wittmann, V. Haag, T. Zeiser, H. Köstler, and G. Wellein: Lattice Boltzmann Benchmark Kernels as a Testbed for Performance Analysis. Computers & Fluids, Special Issue DSFD2017, (2018). DOI: 10.1016/j.compfluid.2018.03.030. Preprint: arXiv:1711.11468.

M. Galgon, L. Krämer, B. Lang, A. Alvermann, H. Fehske, A. Pieper, G. Hager, M. Kreutzer, F. Shahzad, G. Wellein, A. Basermann, M. Röhrig-Zöllner, and J. Thies: Improved coefficients for polynomial filtering in ESSEX. In T. Sakurai, S.-L. Zhang, T. Imamura, Y. Yamamoto, Y. Kuramashi, and T. Hoshi (eds.), Eigenvalue Problems: Algorithms, Software and Applications, in Petascale Computing. Proc. EPASA 2015, Tsukuba, Japan, September 2015, volume 117 of LNCSE, pages 63-79. Springer International Publishing, 2017. DOI: 10.1007/978-3-319-62426-6_5
T. Heidig, T. Zeiser, and H. Freund: Influence of resolution of rasterized geometries on porosity and specific surface area exemplified for model geometries of porous media. Transport in Porous Media 120 (1), 207–225 (2017). DOI: 10.1007/s11242-017-0916-y.
S. Bauer, M. Mohr, U. Rüde, J. Weismüller, M. Wittmann, and B. Wohlmuth: A two-scale approach for efficient on-the-fly operator assembly in massively parallel high performance multigrid codes. Applied Numerical Mathematics 122 (Supplement C), 14-38 (2017). DOI: 10.1016/j.apnum.2017.07.006.
T. Röhl, J. Eitzinger, G. Hager, and G. Wellein: LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses. Accepted for the HPCMASPA 2017, the Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications, held in conjunction with IEEE Cluster 2017, Honolulu, HI, September 5, 2017. DOI: 10.1109/CLUSTER.2017.115. Preprint: arXiv:1708.01476
T. M. Malas, G. Hager, H. Ltaief, and D. E. Keyes: Multi-dimensional intra-tile parallelization for memory-starved stencil computations. ACM Transactions on Parallel Computing 4(3), 12:1-12:32 (2017). DOI: 10.1145/3155290, Preprint: arXiv:1510.04995
J. Hofmann, G. Hager, G. Wellein, and D. Fey: An analysis of core- and chip-level architectural features in four generations of Intel server processors. In: J. Kunkel et al. (eds.), High Performance Computing: 32nd International Conference, ISC High Performance 2017, Frankfurt, Germany, June 18-22, 2017, Proceedings, Springer, Cham, LNCS 10266, ISBN 978-3-319-58667-0 (2017), 294-314. DOI: 10.1007/978-3-319-58667-0_16. Preprint: arXiv:1702.07554
J. Hammer, J. Eitzinger, G. Hager, and G. Wellein: Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels. In: Niethammer C., Gracia J., Hilbrich T., Knüpfer A., Resch M., Nagel W. (eds), Tools for High Performance Computing 2016, ISBN 978-3-319-56702-0, 1-22 (2017). Proceedings of IPTW 2016, the 10th International Parallel Tools Workshop, October 4-5, 2016, Stuttgart, Germany. Springer, Cham. DOI: 10.1007/978-3-319-56702-0_1, Preprint: arXiv:1702.04653

H. Anzt, J. Dongarra, M. Kreutzer, G. Wellein, and M. Köhler: Efficiency of General Krylov Methods on GPUs – An Experimental Study. Proc. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Chicago, IL, 683-691 (2016). DOI: 10.1109/IPDPSW.2016.45
H. Anzt, M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra: Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs. International Journal of High Performance Computing Applications (2016), ISSN: 1094-3420. DOI: 10.1177/1094342016646844
T. Röhl, J. Eitzinger, G. Hager, and G. Wellein: Validation of Hardware Events for Successful Performance Pattern Identification in High Performance Computing. In: A. Knüpfer et al. (eds.), Tools for High Performance Computing 2015, Springer International Publishing, ISBN 978-3-319-39589-0 (2016), 17-28. DOI: 10.1007/978-3-319-39589-0_2. Preprint: arXiv:1710.04094
F. Shahzad, M. Kreutzer, T. Zeiser, R. Machado, A. Pieper, G. Hager, and G. Wellein: Building and utilizing fault tolerance support tools for the GASPI applications. International Journal of High Performance Computing Applications (2016). First published: November 28, 2016. DOI: 10.1177/1094342016677085. Preprint (post-review): ft-gaspi-ijhpca.pdf
M. Kreutzer, J. Thies, M. Röhrig-Zöllner, A. Pieper, F. Shahzad, M. Galgon, A. Basermann, H. Fehske, G. Hager, and G. Wellein: GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems. International Journal of Parallel Programming (2016). DOI: 10.1007/s10766-016-0464-z. Preprint: arXiv:1507.08101
A. Pieper, M. Kreutzer, A. Alvermann, M. Galgon, H. Fehske, G. Hager, B. Lang, and G. Wellein: High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations. Journal of Computational Physics 325, 226-243 (2016). DOI: 10.1016/j.jcp.2016.08.027, Preprint: arXiv:1510.04895
J. Hofmann, D. Fey, J. Eitzinger, G. Hager, and G. Wellein: Analysis of Intel’s Haswell Microarchitecture Using the ECM Model and Microbenchmarks. Proc. Architecture of Computing Systems — ARCS 2016, Volume 9637 of the series Lecture Notes in Computer Science, 210-222 (2016). DOI: 10.1007/978-3-319-30695-7_16
J. Hofmann, D. Fey, M. Riedmann, J. Eitzinger, G. Hager, and G. Wellein: Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors. Concurrency Computat.: Pract. Exper., 29: e3921 (2016). DOI: 10.1002/cpe.3921. Preprint: arXiv:1604.01890
M. Wittmann, T. Zeiser, G. Hager, and G. Wellein: Modeling and analyzing performance for highly optimized propagation steps of the lattice Boltzmann method on sparse lattices. Preprint: arXiv:1410.0412
T. M. Malas, J. Hornich, G. Hager, H. Ltaief, C. Pflaum, and D. E. Keyes: Optimization of an electromagnetics code with multicore wavefront diamond blocking and multi-dimensional intra-tile parallelization. Proc. IPDPS16, the 30th IEEE International Parallel & Distributed Processing Symposium, May 23-27, 2016, Chicago, IL. DOI: 10.1109/IPDPS.2016.87. Preprint: arXiv:1510.05218
J. Thies, M. Galgon, F. Shahzad, A. Alvermann, M. Kreutzer, A. Pieper, M. Röhrig-Zöllner, A. Basermann, H. Fehske, G. Hager, B. Lang, and G. Wellein: Towards an Exascale Enabled Sparse Solver Repository. In: Software for Exascale Computing – SPPEXA 2013-2015, Volume 113 of the series Lecture Notes in Computational Science and Engineering, 295-316 (2016). DOI: 10.1007/978-3-319-40528-5_13. Preprint: lncs_CWPs-4.pdf
M. Kreutzer, J. Thies, A. Pieper, A. Alvermann, M. Galgon, M. Röhrig-Zöllner, F. Shahzad, A. Basermann, A. R. Bishop, H. Fehske, G. Hager, B. Lang, and G. Wellein: Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers. In: Software for Exascale Computing – SPPEXA 2013-2015, Volume 113 of the series Lecture Notes in Computational Science and Engineering, 317-338 (2016). DOI: 10.1007/978-3-319-40528-5_14

B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth: Towards Textbook Efficiency for Parallel Multigrid. Numerical Mathematics-Theory Methods and Applications 8 (2015), p. 22-46, ISSN: 1004-8979, DOI: 10.4208/nmtma.2015.w10si
B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth: Performance and Scalability of Hierarchical Hybrid Multigrid Solvers for Stokes Systems. SIAM Journal on Scientific Computing 37 (2015), p. C143-C168, ISSN: 1064-8275, DOI: 10.1137/130941353
C. Feichtinger, J. Habich, H. Köstler, U. Rüde, and T. Aoki: Performance modeling and analysis of heterogeneous lattice Boltzmann simulations on CPU-GPU clusters. Parallel Computing 46, 1-13 (2015). DOI: 10.1016/j.parco.2014.12.003
J. Hammer, G. Hager, J. Eitzinger, and G. Wellein: Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft. Proc. PMBS15, the 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, in conjunction with ACM/IEEE Supercomputing 2015 (SC15), November 16, 2015, Austin, TX. DOI: 10.1145/2832087.2832092, Preprint: arXiv:1509.03778
J. Hofmann, D. Fey, J. Eitzinger, G. Hager, and G. Wellein: Performance analysis of the Kahan-enhanced scalar product on current multicore processors. In: R. Wyrzykowski et al. (eds.), Parallel Processing and Applied Mathematics: 11th International Conference, PPAM 2015, Krakow, Poland, September 6-9, 2015. Revised Selected Papers, Part I. LNCS vol. 9573, 63-73 (2016). DOI: 10.1007/978-3-319-32149-3_7 Preprint: arXiv:1505.02586
F. Shahzad, M. Kreutzer, T. Zeiser, R. Machado, A. Pieper, G. Hager, and G. Wellein: Building a fault tolerant application using the GASPI communication layer. Proc. FTS 2015, the 1st International Workshop on Fault-Tolerant Systems, in conjunction with IEEE Cluster 2015, September 8, 2015, Chicago, IL. DOI: 10.1109/CLUSTER.2015.106, Preprint: arXiv:1505.04628
T. M. Malas, G. Hager, H. Ltaief, H. Stengel, G. Wellein, and D. E. Keyes: Multicore-optimized wavefront diamond blocking for optimizing stencil updates. SIAM Journal on Scientific Computing 37(4), C439-C464 (2015). DOI: 10.1137/140991133, Preprint: arXiv:1410.3060
M. Röhrig-Zöllner, J. Thies, M. Kreutzer, A. Alvermann, A. Pieper, A. Basermann, G. Hager, G. Wellein, and H. Fehske: Increasing the performance of the Jacobi-Davidson method by blocking. SIAM Journal on Scientific Computing, 37(6), C697–C722 (2015). DOI: 10.1137/140976017, Preprint: http://elib.dlr.de/89980/
H. Stengel, J. Treibig, G. Hager, and G. Wellein: Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model. Proc. ICS15, the 29th International Conference on Supercomputing, June 8-11, 2015, Newport Beach, CA. DOI: 10.1145/2751205.2751240. Preprint: arXiv:1410.5010
H. Fehske, G. Hager, and A. Pieper: Electron confinement in graphene with gate-defined quantum dots. Phys. Status Solidi B, 252: 1868–1871 (2015). DOI: 10.1002/pssb.201552119. Preprint: arXiv:1503.05815
M. Wittmann, G. Hager, T. Zeiser, J. Treibig, and G. Wellein: Chip-level and multi-node analysis of energy-optimized lattice-Boltzmann CFD simulations. Concurrency and Computation: Practice and Experience 28(7), 2295-2315 (2015). DOI: 10.1002/cpe.3489 Preprint: arXiv:1304.7664
M. Kreutzer, G. Hager, G. Wellein, A. Pieper, A. Alvermann, and H. Fehske: Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems. Proc. IPDPS15, the 29th IEEE International Parallel & Distributed Processing Symposium, May 25-29, 2015, Hyderabad, India. DOI: 10.1109/IPDPS.2015.76, Preprint: arXiv:1410.5242

R. Schöne, J. Treibig, M.F. Dolz, C. Guillen, C. Navarrete, M. Knobloch, and B. Rountree: Tools and methods for measuring and tuning the energy efficiency of HPC systems. Scientific Programming 22(4), 273-283 (2014). DOI: 10.3233/SPR-140393
T. Röhl, J. Treibig, G. Hager, and G. Wellein: Overhead Analysis of Performance Counter Measurements. In: Proc. PSTI 2014, the Fifth International Workshop on Parallel Software Tools and Tool Infrastructures, Sept 11, 2014, Minneapolis, MN. DOI: 10.1109/ICPPW.2014.34
T. M. Malas, G. Hager, H. Ltaief, and D. E. Keyes: Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking. Preprint: arXiv:1410.5561
A. Alvermann, A. Basermann, H. Fehske, M. Galgon, G. Hager, M. Kreutzer, L. Krämer, B. Lang, A. Pieper, M. Röhrig-Zöllner, F. Shahzad, J. Thies, and G. Wellein: ESSEX: Equipping Sparse Solvers for Exascale. In: L. Lopes et al. (Eds.): Euro-Par 2014 Workshops, Part II, LNCS 8806, 577-588 (2014). DOI: 10.1007/978-3-319-14313-2_49. Preprint
M. Kreutzer, G. Hager, G. Wellein, H. Fehske, and A. R. Bishop: A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units. SIAM Journal on Scientific Computing 36(5), C401–C423 (2014). DOI: 10.1137/130930352, Preprint: arXiv:1307.6209
J. Hofmann, J. Treibig, G. Hager, and G. Wellein: Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips. Accepted for WPMVP 2014, the Workshop on Programming Models for SIMD/Vector Processing at PPoPP 2014, Orlando, FL, Feb 16, 2014. DOI: 10.1145/2568058.2568068, Preprint: arXiv:1401.7494
J. Hofmann, J. Treibig, G. Hager, and G. Wellein: Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator. Accepted for PASA 2014, the 11th Workshop on Parallel Systems and Algorithms, Lübeck, Germany, Feb 25-26, 2014. IEEE Archive, Preprint: arXiv:1401.3615
S. Kronawitter, H. Stengel, G. Hager, and C. Lengauer: Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model. Parallel Processing Letters 24, 1441004 (2014). DOI: 10.1142/S0129626414410047
G. Hager, J. Treibig, J. Habich, and G. Wellein: Exploring performance and power properties of modern multicore chips via simple machine models. Concurrency and Computation: Practice and Experience 28(2), 189-210 (2016). First published online December 2013, DOI: 10.1002/cpe.3180, Preprint: arXiv:1208.2908

M. Wittmann, T. Zeiser, G. Hager, and G. Wellein: Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations. Computers & Fluids 80 (2013), 283-289. DOI: 10.1016/j.compfluid.2012.02.007. Preprint: arXiv:1111.1129 (2011).
M. Wittmann, G. Hager, G. Wellein, T. Zeiser, and B. Krammer: MPC and Coarray Fortran: Alternatives to Classic MPI Implementations on the Examples of Scalable Lattice Boltzmann Flow Solvers. In: W. E. Nagel et al. (eds.), High Performance Computing in Science and Engineering ‘12, Springer, ISBN 978-3-642-33373-6 (2013) 367-372. DOI: 10.1007/978-3-642-33374-3_27
C. Scheit, G. Hager, J. Treibig, S. Becker, and G. Wellein: Optimization of FASTEST-3D for Modern Multicore Systems. Preprint: arXiv:1303.4538
T. Scharpff, K. Iglberger, G. Hager, and U. Rüde: Model-guided Performance Analysis of the Sparse Matrix-Matrix Multiplication. Proc. 2013 International Conference on High Performance Computing & Simulation (HPCS 2013), July 1-5, 2013, Helsinki, Finland. DOI: 10.1109/HPCSim.2013.6641452, Preprint: arXiv:1303.1651
M. Wittmann, G. Hager, T. Zeiser, and G. Wellein: Asynchronous MPI for the Masses. Preprint: arXiv:1302.4280
F. Shahzad, M. Wittmann, T. Zeiser, G. Hager, and G. Wellein: An Evaluation of Different IO Techniques for Checkpoint/Restart. Workshop on Large-Scale Parallel Processing 2013 (LSPP13). DOI: 10.1109/IPDPSW.2013.145, Preprint: asyn_ckpt_130115.pdf
F. Shahzad, M. Wittmann, M. Kreutzer, T. Zeiser, G. Hager, and G. Wellein: A survey of checkpoint/restart techniques on distributed memory systems. Parallel Processing Letters 23(04), 1340011-1340030 (2013). DOI: 10.1142/S0129626413400112
F. Shahzad, M. Wittmann, M. Kreutzer, T. Zeiser, G. Hager, and G. Wellein: PGAS implementation of SpMVM and LBM with GPI. Proceedings of the 7th International Conference on PGAS Programming Models, Oct. 3-4, 2013, Edinburgh, Scotland, 172-184 (2013).

J. Treibig, G. Hager, and G. Wellein: likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes. In: H. Brunst et al. (eds.), Tools for High Performance Computing 2011. Springer, ISBN 978-3-642-31475-9, (2012) 27-36. DOI: 978-3-642-31475-9.
K. Sembritzki, G. Hager, B. Krammer, J. Treibig, and G. Wellein: Evaluation of the Coarray Fortran Programming Model on the Example of a Lattice Boltzmann Code. Proceedings of PGAS ’12, The 6th Conference on Partitioned Global Address Space Programming Models, Oct 10-12, 2012, Santa Barbara, CA, USA.
G. Hager: Performance engineering: From numbers to insight. Proc. 5th Workshop on Productivity and Performance (PROPER 2012) at Euro-Par 2012, August 28, 2012, Rhodes Island, Greece. Euro-Par 2012: Parallel Processing Workshops, Lecture Notes in Computer Science 7640, 393-394 (2013), Springer, ISBN 978-3-642-36948-3. DOI: 10.1007/978-3-642-36949-0_44
J. Treibig, G. Hager, and G. Wellein: Performance patterns and hardware metrics on modern multicore processors: Best practices for performance engineering. Proc. 5th Workshop on Productivity and Performance (PROPER 2012) at Euro-Par 2012, August 28, 2012, Rhodes Island, Greece. Euro-Par 2012: Parallel Processing Workshops, Lecture Notes in Computer Science 7640, 451-460 (2013), Springer, ISBN 978-3-642-36948-3. DOI: 10.1007/978-3-642-36949-0_50. Preprint: arXiv:1206.3738
K. Iglberger, G. Hager, J. Treibig, and U. Rüde: High Performance Smart Expression Template Math Libraries. Accepted for the 2nd International Workshop on New Algorithms and Programming Models for the Manycore Era (APMM 2012) at HPCS 2012, July 2-6, 2012, Madrid, Spain. DOI: 10.1109/HPCSim.2012.6266939
M. Wittmann, T. Zeiser, G. Hager, and G. Wellein: Comparison of Different Propagation Steps for Lattice Boltzmann Methods. Computers & Mathematics with Applications (Proc. ICMMES 2011). Available online, DOI: 10.1016/j.camwa.2012.05.002. Preprint: arXiv:1111.0922
M. Kreutzer, G. Hager, G. Wellein, H. Fehske, A. Basermann, and A.R. Bishop: Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation. Accepted for the Workshop on Large-Scale Parallel Processing 2012 (LSPP12). DOI: 10.1109/IPDPSW.2012.211. Preprint: arXiv:1112.5588
J. Habich, C. Feichtinger, H. Köstler, G. Hager, and G. Wellein: Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results. Computers & Fluids, DOI: 10.1016/j.compfluid.2012.02.013. Preprint: arXiv:1112.0850
J. Treibig, G. Hager, H. G. Hofmann, J. Hornegger, and G. Wellein: Pushing the limits for medical image reconstruction on recent standard multicore processors. International Journal of High Performance Computing Applications 27(2), 162–177 (2013). DOI: 10.1177/1094342012442424, Preprint: arXiv:1104.5243
K. Iglberger, G. Hager, J. Treibig, and U. Rüde: Expression Templates Revisited: A Performance Analysis of Current ET Methodologies. SIAM Journal on Scientific Computing 34(2), C42-C69 (2012). DOI: 10.1137/110830125, Preprint: arXiv:1104.1729

G. Schubert, H. Fehske, G. Hager, and G. Wellein: Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems. Parallel Processing Letters 21(3), 339-358 (2011). DOI: 10.1142/S0129626411000254, Preprint: arXiv:1106.5908
G. Hager, G. Schubert, T. Schoenemeyer, and G. Wellein: Prospects for Truly Asynchronous Communication with Pure MPI and Hybrid MPI/OpenMP on Current Supercomputing Platforms. Proc. Cray Users Group Conference 2011 (CUG 2011), May 23-26, 2011, Fairbanks, AK. Hager-Paper-CUG11.pdf
J. Treibig, G. Hager, and G. Wellein: LIKWID performance tools. In: C. Bischof et al. (eds.), Competence in High Performance Computing 2010. Springer, ISBN 978-3-642-24025-6 (2012), 165-175. DOI: 10.1007/978-3-642-24025-6_14, Preprint: arXiv:1104.4874
G. Schubert, G. Hager, H. Fehske, and G. Wellein: Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming. Workshop on Large-Scale Parallel Processing (LSPP 2011), May 20th, 2011, Anchorage, AK. DOI: 10.1109/IPDPS.2011.332, Preprint: arXiv:1101.0091
J. Treibig, G. Wellein, and G. Hager: Efficient multicore-aware parallelization strategies for iterative stencil computations. Journal of Computational Science 2, 130-137 (2011). DOI: 10.1016/j.jocs.2011.01.010, Preprint: arXiv:1004.1741

M. Wittmann, and G. Hager: Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems. Preprint: arXiv:1101.0093
G. Hager, and G. Wellein: Introduction to High Performance Computing for Scientists and Engineers. CRC Press, ISBN 978-1439811924, 356 pages, July 2010. Available as eBook.
C. Feichtinger, J. Habich, H. Köstler, G. Hager, U. Rüde, and G. Wellein: A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters. Parallel Computing 37(9), 536-549 (2011). DOI: 10.1016/j.parco.2011.03.005. Preprint: arXiv:1007.1388
M. Wittmann, G. Hager, J. Treibig, and G. Wellein: Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters. Parallel Processing Letters 20 (4), 359-376 (2010). DOI: 10.1142/S0129626410000296 Preprint: arXiv:1006.3148
H. Fehske, and G. Hager: Luttinger, Peierls or Mott? Quantum Phase Transitions in Strongly Correlated 1D Electron-Phonon Systems. In: F. Hensel, P. Edwards and R. Redmer (Eds.), Metal-to-Nonmetal Transitions. Springer Series in Material Sciences, Vol. 132, (Springer) 1-22, 2010. DOI: 10.1007/978-3-642-03953-9_1
J. Treibig, G. Hager, M. Meier, and G. Wellein: LIKWID performance tools. InSiDE 8(1), 50-53 (Spring 2010).
J. Treibig, G. Hager, and G. Wellein: LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego CA, September 13, 2010. DOI: 10.1109/ICPPW.2010.38 Preprint: arXiv:1004.4431
J. Treibig, G. Hager, and G. Wellein: Complexities of Performance Prediction for Bandwidth-Limited Loop Kernels on Multi-Core Architectures. In: S. Wagner et al., High Performance Computing in Science and Engineering, Garching/Munich 2009. Springer, ISBN 978-3642138713 (2010), 3-12. DOI: 10.1007/978-3-642-13872-0_1, Preprint (Multi-core architectures: Complexities of performance prediction and the impact of cache topology): arXiv:0910.4865.
G. Schubert, G. Hager, and H. Fehske: Performance limitations for sparse matrix-vector multiplications on current multicore environments. In: S. Wagner et al., High Performance Computing in Science and Engineering, Garching/Munich 2009. Springer, ISBN 978-3642138713 (2010), 13-26. DOI: 10.1007/978-3-642-13872-0_2, Preprint: arXiv:0910.4836.
M. Wittmann, G. Hager, and G. Wellein: Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory. Workshop on Large-Scale Parallel Processing at IPDPS 2010, April 23rd, 2010, Atlanta, GA. Preprint: arXiv:0912.4506, DOI: 10.1109/IPDPSW.2010.5470813
J. Habich, T. Zeiser, G. Hager, and G. Wellein: Performance analysis and optimization strategies for a D3Q19 Lattice Boltzmann Kernel on nVIDIA GPUs using CUDA. Advances in Engineering Software 42 (5), 266-272 (2011). DOI: 10.1016/j.advengsoft.2010.10.007

T. Zeiser, G. Hager, and G. Wellein: Benchmark analysis and application results for lattice Boltzmann simulations on NEC SX vector and Intel Nehalem systems. Parallel Processing Letters 19 (4), 491-511 (2009). DOI: 10.1142/S0129626409000389
J. Treibig, and G. Hager: Introducing a Performance Model for Bandwidth-Limited Loop Kernels. Proceedings of the Workshop “Memory issues on Multi- and Manycore Platforms” at PPAM 2009, the 8th International Conference on Parallel Processing and Applied Mathematics, Wroclaw, Poland, September 13-16, 2009. Lecture Notes in Computer Science Volume 6067, 2010, pp 615-624. DOI: 10.1007/978-3-642-14390-8_64. arXiv:0905.0792
T. Zeiser, G. Hager, and G. Wellein: The world’s fastest CPU and SMP node: Some performance results from the NEC SX-9. Proceedings of LSPP 2009 at IPDPS09, Rome, Italy, May 25-29, 2009. DOI:10.1109/IPDPS.2009.5161089
G. Hager, G. Jost, and R. Rabenseifner: Communication Characteristics and Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-core SMP Nodes. In: Proceedings of the Cray Users Group Conference 2009 (CUG 2009), Atlanta, GA, USA, May 4-7, 2009. cug09_hager_jost_rabenseifner.pdf
G. Wellein, G. Hager, T. Zeiser, M. Wittmann, and H. Fehske: Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization. Proceedings of COMPSAC 2009, the 33rd Annual IEEE International Computer Software and Applications Conference, Seattle, WA, July 20-24, 2009. DOI:10.1109/COMPSAC.2009.82
J. Habich, T. Zeiser, G. Hager, and G. Wellein: Speeding up a Lattice Boltzmann Kernel on nVIDIA GPUs. Proceedings of PARENG09-S01, the First International Conference on Parallel, Distributed and Grid Computing for Engineering, Pecs, Hungary, April 2009. DOI:10.4203/ccp.90.17
M. Wittmann, and G. Hager: A Proof of Concept for Optimizing Task Parallelism by Locality Queues. arXiv:0902.1884
R. Rabenseifner, G. Hager, and G. Jost: Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes. In Didier El Baz et al. (Eds.), Proceedings of the 17th Euromicro International Conference on Parallel, Distributed, and network-based Processing PDP 2009, Feb 18-20, 2009, Weimar, Germany. Computer Society Press, pp. 427-436. DOI:10.1109/PDP.2009.43 hjr.pdf
S. Ejima, G. Hager, and H. Fehske: Quantum phase transition in a 1D transport model with boson affected hopping: Luttinger liquid versus charge-density-wave behavior. Phys. Rev. Lett. 102, 106404 (2009), DOI: 10.1103/PhysRevLett.102.106404, arXiv:0811.0742
T. Zeiser, G. Hager, and G. Wellein: Vector computers in a world of commodity clusters, massively parallel systems and many-core many-threaded CPUs: recent experience based on advanced lattice Boltzmann flow solvers. In: W. E. Nagel, D. B. Kröner, M. Resch (eds.), High Performance Computing in Science and Engineering ’08, Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2008, Springer, ISBN 978-3-540-88301-2, (2009) 333-347. DOI:10.1007/978-3-540-88303-6.

N. Schindzielorz, J. Erler, P. Klüpfel, P.-G. Reinhard, and G. Hager: Fission of super-heavy nuclei explored with Skyrme forces. Int. J. Mod. Phys. E 18(4), 773-781 (2009). DOI:10.1142/S0218301309012860
M. Breuer, P. Lammers, T. Zeiser, G. Hager, and G. Wellein: Towards the simulation of the turbulent flow over dimples – Code evaluation and optimization for the NEC SX-8. In: W.E. Nagel, D. Kröner, M. Resch (eds.), High Performance Computing in Science and Engineering ’07, Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2007, Springer, ISBN 978-3-540-74739-0 / 978-3-540-74738-3, (2008) 303-318. DOI: 10.1007/978-3-540-74739-0_21.
H. Fehske, G. Hager, and J. Jeckelmann: Metallicity in the half-filled Holstein-Hubbard model. Europhys. Lett. 84, 57001 (2008), DOI: 10.1209/0295-5075/84/57001, arXiv:0808.1675
G. Hager, T. Zeiser, and G. Wellein: Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers. Workshop on Large-Scale Parallel Processing 2008, DOI: 10.1109/IPDPS.2008.4536341, arXiv:0712.2302
G. Hager, T. Zeiser, and G. Wellein: Data access characteristics and optimizations for Sun UltraSPARC T2 and T2+ systems. Parallel Processing Letters, Vol. 18, No. 4 (2008) 471-490. DOI: 10.1142/S0129626408003521. Preprint: ppl-hzw.pdf

G. Hager, A. Weiße, G. Wellein, E. Jeckelmann, and H. Fehske: The spin-Peierls chain revisited. J. Magn. Magn. Mater. 310, 1380-1382 (2007). Erratum: J. Magn. Magn. Mater. 316, 43 (2007). Proceedings of the 17th International Conference on Magnetism (ICM 2006), Aug 20-25 2006, Kyoto, Japan. arXiv:cond-mat/0606360
M. Hohenadler, G. Hager, G. Wellein, and H. Fehske: Carrier-density effects in many-polaron systems. J. Phys.: Condens. Matter 19 (2007) 255202. arXiv:cond-mat/0609296
T. Zeiser, G. Wellein, A. Nitsure, K. Iglberger, U. Rüde, and G. Hager: Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method. Progress in Computational Fluid Dynamics, An Int. J. Vol. 8, No.1/2/3/4 (2008) 179-188. Proceedings of ICMMES 2006. DOI:10.1504/PCFD.2008.018088
G. Hager, and G. Wellein: Architectures and Performance Characteristics of Modern High Performance Computers. In Fehske et al., Lect. Notes Phys. 739, 681-730 (2008), ISBN: 978-3-540-74685-0
G. Hager, and G. Wellein: Optimization Techniques for Modern High Performance Computers. In Fehske et al., Lect. Notes Phys. 739, 731-767 (2008), ISBN: 978-3-540-74685-0
G. Hager, H. Stengel, T. Zeiser, and G. Wellein: RZBENCH: Performance evaluation of current HPC architectures using low-level and application benchmarks. In: S. Wagner et al. (Eds.), High Performance Computing in Science and Engineering, Garching/Munich 2007. Transactions of the Third Joint HLRB and KONWIHR Status and Result Workshop, Dec 3-4, 2007, LRZ Garching, Springer, ISBN 978-3-540-69181-5 (2009) 485-501. arXiv:0712.3389
M. Stürmer, G. Wellein, G. Hager, H. Köstler, and U. Rüde: Challenges and potentials of emerging multicore architectures. In: S. Wagner et al. (Eds.), High Performance Computing in Science and Engineering, Garching/Munich 2007. Transactions of the Third Joint HLRB and KONWIHR Status and Result Workshop, Dec 3-4, 2007, LRZ Garching, Springer, ISBN 978-3-540-69181-5 (2009) 551-566.

G. Wellein, P. Lammers, G. Hager, S. Donath, and T. Zeiser: Towards optimal performance for lattice Boltzmann applications on terascale computers. In: A. Deane et al. (eds), Parallel Computational Fluid Dynamics – Theory and Applications. Proceedings of the Parallel CFD 2005 Conference, College Park, MD, USA, May 24-27, 2005. Elsevier, ISBN 0-444-52206-9 (2006) 31-40.
H. Fehske, G. Hager, G. Wellein, and E. Jeckelmann: Hole-doped Hubbard ladders. Physica B 378-380, 319-320 (2006). arXiv:cond-mat/0505666
G. Schubert, A. Alvermann, A. Weiße, G. Hager, G. Wellein, and H. Fehske: Spectral Properties of Strongly Correlated Electron Phonon Systems. NIC Symposium 2006, G. Münster, D. Wolf, M. Kremer (Editors), John von Neumann Institute for Computing, Jülich, NIC Series, Vol. 32, ISBN 3-00-017351-X, pp. 201-210, 2006.
A. Weiße, G. Hager, A. R. Bishop, and H. Fehske: Phase diagram of the spin-Peierls chain with local coupling. Phys. Rev. B 74, 214426 (2006). arXiv:cond-mat/0607209
A. Nitsure, K. Iglberger, U. Rüde, C. Feichtinger, G. Wellein, G. Hager: Optimization of Cache Oblivious Lattice Boltzmann Method in 2D and 3D. In: Becker, Matthias; Szczerbicka, Helena (eds.): Simulationstechnique – 19th Symposium in Hannover, September 2006 (ASIM 2006 – 19. Symposium Simulationstechnik, Hannover, 12. – 14. 09. 2006). Erlangen, SCS Publishing House, 2006, pp. 265-270 (Frontiers in Simulation, Vol. 16)
P. Lammers, G. Wellein, T. Zeiser, G. Hager, M. Breuer: Have the vectors the continuing ability to parry the attack of the killer micros? In: M. Resch, T. Bönisch, K. Benkert, T. Furui, Y. Seo, W. Bez (editors): High Performance Computing on Vector Systems. Proceedings of the High Performance Computing Center Stuttgart, March 2005, Springer, ISBN 3-540-29124-5, (2006) 25-39. DOI: 10.1007/3-540-35074-8_2.

G. Hager: A parallelized density matrix renormalization group algorithm and its application to strongly correlated quantum systems. Dissertation, Ernst-Moritz-Arndt-Universität Greifswald, 2005. URN: urn:nbn:de:gbv:9-000024-1
G. Hager, T. Zeiser, and H. Heller: Setting up ByGRID – First Steps Towards an e-Science Infrastructure in Bavaria. In: A. Bode, F. Durst (Eds.): High Performance Computing in Science and Engineering, Garching 2005. Transactions of the KONWIHR Result Workshop, October 14-15, 2004, Technical University of Munich, Garching, Springer, ISBN 3-540-26145-1 (2005) 97-102.
G. Hager, G. Wellein, E. Jeckelmann, and H. Fehske: Stripe formation in doped Hubbard ladders. Phys. Rev. B 71, 075108 (2005). arXiv:cond-mat/0409321
H. Fehske, G. Wellein, G. Hager, A. Weiße, K.W. Becker, and A.R. Bishop: Luttinger liquid versus charge density wave behaviour in the one-dimensional spinless fermion Holstein model. Physica B 359-361, 699-701 (2005). arXiv:cond-mat/0406023
G. Hager, T. Zeiser, J. Treibig, and G. Wellein: Optimizing performance on modern HPC systems: learning from simple kernel benchmarks. In: Proceedings of the 2nd Russian-German Advanced Research Workshop on Computational Science and High Performance Computing, HLRS, Stuttgart, March 14 – 16, 2005.
G. Wellein, T. Zeiser, S. Donath, and G. Hager: On the Single Processor Performance of Simple Lattice Boltzmann Kernels. Proc. ICMMES, 2004. Computers & Fluids 35, 910-919 (2006). DOI:10.1016/j.compfluid.2005.02.008
S. Donath, T. Zeiser, G. Hager, J. Habich, and G. Wellein: Optimizing Performance of the Lattice Boltzmann Method for Complex Structures on Cache-based Architectures. In: F. Huelsemann, M. Kowarschik, U. Ruede (Eds.): Frontiers in Simulation: Simulation Techniques – 18th Symposium in Erlangen, September 2005 (ASIM), pp. 728-735, SCS Publishing House, Erlangen, 2005.
G. Hager, B. Bergen, P. Lammers, and G. Wellein: Taming the Bandwidth Behemoth – First Experiences on a Large SGI Altix System. InSiDE 3, No. 2, Autumn 2005, pp. 24-25 (2005).

G. Hager, E. Jeckelmann, H. Fehske, and G. Wellein: Parallelization Strategies for Density Matrix Renormalization Group Algorithms on Shared-Memory Systems. J. Comput. Phys. 194(2), 795 (2004). arXiv:cond-mat/0305463
H. Fehske, G. Wellein, G. Hager, A. Weiße, and A. R. Bishop: Quantum Lattice Dynamical Effects on Single-Particle Excitations in One-dimensional Mott and Peierls Insulators. Phys. Rev. B 69, 165115 (2004). arXiv:cond-mat/0312426
G. Hager, G. Wellein, E. Jeckelmann, and H. Fehske: DMRG Investigation of Stripe Formation in Doped Hubbard Ladders. In: A. Bode (Ed.): High Performance Computing in Science and Engineering 2004 – Transactions of the Second Joint HLRB and KONWIHR Result and Reviewing Workshop (Second Joint HLRB and KONWIHR Result and Reviewing Workshop Munich – Germany 2-3 March 2004). Berlin: Springer, 2004.
G. Hager, E. Jeckelmann, H. Fehske, and G. Wellein: Exact Numerical Treatment of Finite Quantum Systems using Leading-Edge Supercomputers. In: Modelling, Simulation and Optimization of Complex Processes, Eds. H. G. Bock, E. Kostina, H.-X. Phu, R. Rannacher, Springer-Verlag Berlin Heidelberg (2005), pp 165-175.
G. Wellein, T. Zeiser, G. Hager, and P. Lammers: Application Performance of Modern Number Crunchers. CSAR Focus, Ed. 12, Summer-Autumn 2004, pp. 17-19 (2004).

G. Wellein, G. Hager, A. Basermann, and H. Fehske: Fast sparse matrix-vector multiplication for TFlop/s computers. In: J.M.L.M. Palma; J. Dongarra (Eds.): High Performance Computing for Computational Science – VECPAR2002 (High Performance Computing for Computational Science – VECPAR2002 Porto – Portugal 26-28 June 2002). Berlin: Springer, 2003.
H. Fehske, G. Wellein, A. P. Kampf, M. Sekania, G. Hager, A. Weiße, H. Büttner, and A. R. Bishop: One-dimensional electron-phonon systems: Mott- versus Peierls-insulators. In: A. Bode (Ed.): High Performance Computing in Science and Engineering 2002 – Transactions of the First Joint HLRB and KONWIHR Result and Reviewing Workshop (First Joint HLRB and KONWIHR Result and Reviewing Workshop Garching – Germany 10-11 October 2002). Berlin: Springer, 2003.
G. Hager, F. Deserno, and G. Wellein: Pseudo-Vectorization and RISC Optimization Techniques for the Hitachi SR8000 architecture. In: A. Bode (Ed.): High Performance Computing in Science and Engineering 2002 – Transactions of the First Joint HLRB and KONWIHR Result and Reviewing Workshop (First Joint HLRB and KONWIHR Result and Reviewing Workshop Garching – Germany 10-11 October 2002). Berlin: Springer, 2003.
G. Hager, F. Brechtefeld, P. Lammers, and G. Wellein: Processor Architecture and Application Performance in Modern Supercomputers. InSiDE 1, No. 1, Spring 2003, pp. 8-13 (2003).

G. Wellein, G. Hager, A. Basermann, and H. Fehske: Exact Diagonalization of Large Sparse Matrices: A Challenge for Modern Supercomputers. In: Proceedings of CRAY Users Group (CUG) Summit 2001 (CUG Summit 2001 Indian Wells – USA May 2001). 2001, CD-ROM.

Posters

A. Afzal, G. Hager, and G. Wellein: Wattlytics: Peak FLOPS Don’t Buy the Most Science per Euro. Research poster at ISC High Performance 2026, Hamburg, Germany, June 22-26, 2026.

J.-Y. Verharghe, G. Hager, and A. Afzal: ParaViz3D: MPI Trace Visualization with 3D Video. Research poster at SC25, St. Louis, MO, November 16-21, 2025. Poster PDF Poster summary
A. Afzal, G. Hager, and G. Wellein: DisCostiC: Digital Twin Performance Simulations Unlocking Hardware-Software Interplay. Research poster at ISC High Performance 2025, Hamburg, Germany, June 10-13, 2025. Poster PDF Poster summary

A. Afzal, G. Hager, and G. Wellein: DisCostiC: Simulating MPI Applications Without Executing Code. Best Research Poster Award Candidate and Finalist at SC24, Atlanta, GA, November 17-22, 2024.
J. Laukemann, F. Jung, C.-M. Pfeiler, D. Jimenez, C. Clauss, T. Dannert, F. Jenko, E. Laure, M. Schulz, G. Wellein: Exploiting Data Compression and Low Precision for Exascale Fusion Turbulence Simulations. Research Poster at SC24, Atlanta, GA, November 17-22, 2024.
A. Ghasemi: Machine Learning Interatomic Potentials: Workflow for Generating Training Dataset on Massively Parallel Computers, Research Poster at CECAM Workshop, Berlin, February 19, 2024.

J. Eitzinger and T. Gruber: EE-HPC – A Framework for Energy Efficient HPC System Operation. Research Poster at SC23, Denver, CO, November 12-17, 2023.
A. Ghasemi, E. Rahmatizad Khajehpasha, and T. D. Kühn: Incorporating electrostatic interactions in machine learning interatomic potentials, Research Poster at 1st NHR Conference, Berlin, September 18, 2023.
A. Afzal, G. Hager, and G. Wellein: Making Applications Run Faster by Slowing Down Processes? Research Poster at ISC High Performance 2023. Poster PDF
C. L. Alappat, G. Hager, J. Thies, and G. Wellein: RACE: Speeding Up Sparse Iterative Solvers Using Cache Blocked Matrix Power Kernels. Research Poster at ISC High Performance 2023. Best Poster Award Finalist and 3rd Prize. Poster PDF

A. Afzal, G. Hager, and G. Wellein: DisCosTiC: A DSL-based Parallel Simulation Framework using First-Principles Analytic Performance Models. Poster at PASC22 Conference, the Platform for Advanced Scientific Computing (PASC), Basel, Switzerland, June 28, 2022. Video presentation Poster PDF

A. Afzal, G. Hager, and G. Wellein: White-box Modelling of Parallel Computing Dynamics. Poster at HPC Asia 2022, The International Conference on High Performance Computing in Asia-Pacific Region, January 12-14, 2022 (online). Extended abstract Poster PDF

D. Ernst: The Best Thread Block Size and other parameters you have to tune for optimal performance on GPUs. PhD Forum presentation at ISC 2021 Digital. Video presentation
A. Afzal, G. Hager, and G. Wellein: Physical Oscillator Model for Parallel Distributed Computing. Poster at ISC 2021 Digital. Video presentation
A. Afzal: Noise-driven Cluster-level Performance Modelling and Engineering. PhD Forum presentation at ISC 2021 Digital. Video presentation

T. Gruber, J. Eitzinger, G. Hager, and G. Wellein: LIKWID 5: Lightweight Performance Tools. Poster at SC19.
J. Hammer, J. Hornich, G. Hager, T. Gruber, and G. Wellein: INSPECT Intranode Stencil Performance Evaluation Collection. Poster at SC19.
A. Afzal, G. Hager, and G. Wellein: Delay Flow Mechanisms on Clusters. Poster at EuroMPI 2019. EuroMPI2019_AHW-Poster.pdf EuroMPI2019-AHW-Summary.pdf

G. Hager, J. Eitzinger, J. Hornich, F. Cremonesi, C. A. Alappat, T. Gruber, and G. Wellein: Applying the Execution-Cache-Memory Performance Model: Current State of Practice. Best poster award candidate at SC18. SC18-Poster-ECM_PRINT_M.pdf SC18-Poster-Hager_summary.pdf
C. A. Alappat: Recursive Algebraic Coloring Engine. Poster in the ACM Student Research Competition and ACM SRC winner at SC18. RACE_sc18_poster_summary.pdf
J. Hammer: Out of Order Instruction Benchmarking Framework on the Back of Dragons. Poster in the ACM Student Research Competition at SC18.

J. Eitzinger, T. Röhl, G. Hager, and G. Wellein: LIKWID 4: Lightweight Performance Tools. Poster at SC16.
J. Hammer: Performance Modeling and Engineering with Kerncraft. Poster in the ACM Student Research Competition and ACM SRC runner-up at SC16.

M. Kreutzer, A. Pieper, A. Alvermann, H. Fehske, G. Hager, G. Wellein, and A. R. Bishop: Efficient Large-Scale Sparse Eigenvalue Computations on Heterogeneous Hardware. Poster at SC15.
T. Malas, G. Hager, H. Ltaief, and D. Keyes: Advanced Tiling Techniques for Memory-Starved Streaming Numerical Kernels. Best poster award candidate at SC15.

T. Malas, G. Hager, H. Ltaief, H. Stengel, G. Wellein, and D. Keyes: Optimizing Stencil Computations: Multicore-Optimized Wavefront Diamond Blocking on Shared and Distributed Memory Systems. Poster at SC14.
M. Röhrig-Zöllner, J. Thies, M. Kreutzer, A. Alvermann, A. Pieper, A. Basermann, G. Hager, G. Wellein, and H. Fehske: Performance of Block Jacobi-Davidson Eigensolvers. Poster at SC14.

J. Treibig, G. Hager, and G. Wellein: Pattern-Driven Node-Level Performance Engineering. Poster at SC13.

J. Treibig, G. Hager, M. Meier, and G. Wellein: LIKWID Performance Tools. Poster at SC11.

Talks

A. Afzal: Energy-Aware Optimizations: From Application, Software and Infrastructure Point of View. Invited talk at Energy Efficiency and Operational Costs in NHR (EEC) Community Workshop, Germany, January 23, 2026.
G. Hager and A. Ghasemi: Parallel Computing: From CPU Core to Supercomputer. Invited talk at Fakultät Mathematik und Informatik, OTH Regensburg, January 14, 2026.

T. Gruber: Unleash the control freak in yourself for fun and profit—and for science! Invited talk at the Durham HPC & AI Days, Durham, UK, June 5, 2025.

G. Hager: Analytic Performance Modeling for HPC Workloads. Invited talk at the Sino-German Workshop on Multiphysics Device Simulation and Hardware-Aware Computing, Xi’An, China, October 10-16, 2024.
T. Gruber: The LIKWID Performance Tool Suite. Invited talk for the Server Performance Group at Intel, June 27, 2024.
A. Kahler: Performance Improvements through In-Depth Hardware Knowledge. Talk at the 1st Proud and Strong in Computing Conference, June 24, 2024.
G. Hager: Hardware Evolution from an HPC Point of View. Invited talk at “20 ans du Groupe Calcul,” Paris, France, June 3, 2024.
T. Gruber: ClusterCockpit and EE-HPC: A way to more energy efficiency on HPC systems? Invited talk at the Durham HPC/AI Days 2024, Durham, UK, May 10, 2024.
A. Afzal: Predicting Parallel Applications Performance using Automated Analytic First-principles Models in DisCostiC. Invited talk at TU Darmstadt, Parallel Programming Group, Darmstadt, Germany, March 25, 2024.
G. Hager and J. Eitzinger: Resources for High Performance Computing at FAU. Talk at the FAU Graduate Centre, March 19, 2024.
T. Gruber: The LIKWID Performance Tool Suite. Invited online talk at Los Alamos National Laboratory, Los Alamos, NM, March 18, 2024.
C. Alappat: Performance optimisation of sparse iterative solvers using temporal cache blocking. Talk at the Minisymposium “Parallel in time methods for High-Performance Computing” at Algoritmy 2024, High Tatra Mountains, Slovakia, March 17, 2024.
D. Ernst: Performance Engineering of the Navier-Stokes Finite Element Assembly of Alya on GPUs. Talk at the Minisymposium “Advances in Highly Parallel Solvers for Partial Differential Equations” at SIAM PP24, Baltimore, MD, March 8, 2024.
C. L. Alappat: Accelerating Sparse Iterative Solvers and Preconditioners Using RACE. Best Paper Prize talk at SIAM PP24, Baltimore, MD, March 7, 2024.
C. L. Alappat: Accelerating Sparse Solvers with Cache-Optimized Matrix Power Kernels. Talk at the Minisymposium “Advancements in Sparse Linear Algebra: Hardware-Aware Algorithms and Optimization Techniques” at SIAM PP24, Baltimore, MD, March 7, 2024.
T. Gruber and J. Fritz (JSC): Github and Gitlab – Combine the best of both worlds. Invited talk at deRSE24 – Conference for Research Software Engineering in Germany, University of Würzburg, March 6, 2024.
T. Gruber: LIKWID Performance Tool Suite. Lecture with hands-on exercises at the 44th VI-HPS Tuning Workshop (RWTH Aachen and ZIH, TU Dresden, Germany), February 26-March 1, 2024.
J. Veh: Experience with Ceph in Erlangen. Monthly Storage Talks, March 6, 2024.
G. Hager: Performance Engineering with Resource-Based Metrics. Invited talk at the Zentralinstitut für Technische Informatik (ZITI), University of Heidelberg, February 5, 2024.
J. Veh: Ceph. Talk at the online NHR Data Lakes Workshop, GWDG, January 23, 2024.

G. Hager: Performance Modeling and Performance Engineering. Invited lecture series at the AQTIVATE Training Workshop on Exascale Computing and Scalable Algorithms, Stockholm, Sweden, November 27-December 14, 2023.
A. Afzal (G. Hager): Physical Oscillator Model for Supercomputing. Short paper presentation at PMBS23, the 14th Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Denver, CO, November 13, 2023.
A. Afzal (G. Hager): SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study. Paper presentation at PMBS23, the 14th Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Denver, CO, November 13, 2023.
G. Hager and J. Eitzinger: Resources for High Performance Computing at FAU. Talk at the FAU Graduate Centre, September 14, 2023.
A. Afzal: Performance Modeling Challenges in Extreme Scale Computing. Online talk at ICIAM 2023 Minisymposium “Progress and Challenges in Extreme Scale Computing and Big Data” [02458], Waseda University, Tokyo, Japan, August 22, 2023.
J. Laukemann: OSACA – A Multi-Platform Static Code Analyzer for In-core Performance Prediction. Talk at Scalable Tools Workshop 2023, Lake Tahoe, California, USA, June 19, 2023. Slides
G. Hager: Application Knowledge Required: Performance Modeling for Fun and Profit. Keynote at ICPE 2023, the 14th International Conference on Performance Engineering, Coimbra, Portugal, April 19, 2023.
T. Gruber: Pinning, Hardware Counting, and Microbenchmarking with LIKWID. Invited guest lecture in the CSE 6230 – “High Performance Parallel Computing” lecture, Georgia Institute of Technology, March 9, 2023.
G. Hager: Performance Engineering in CSE: A Bird’s-Eye View. Talk at the SIAM CSE23 Minisymposium “Performance Engineering and Applications” (MS167), Amsterdam, The Netherlands, March 1, 2023. Slides
C. L. Alappat: Accelerating Sparse Iterative Solvers and Preconditioners Using RACE. Talk at the SIAM CSE23 Minisymposium “Performance Engineering and Applications” (MS199), Amsterdam, The Netherlands, March 1, 2023.
G. Hager and J. Eitzinger: Resources for High Performance Computing at FAU. Talk at the FAU Graduate Centre, February 16, 2023.
G. Hager and J. Veh: News from NHR@FAU – Fritz, Alex and Woody. ECAP Seminar, FAU Erlangen-Nürnberg, January 19, 2023.
M. Wittmann: Parallelisierung mit OpenMP. Invited talk at FSIM e.V. (Fachschaft Informatik/Mathematik), OTH Regensburg, January 16, 2023.

J. Eitzinger and T. Wilde: BMBF Green HPC project Quelloffene Lösungsansätze für Monitoring und Systemeinstellungen für energieoptimierte Rechenzentren (EE-HPC). Online talk at Powerstack Seminar (2022).
G. Wellein: Power, Energy, and HPC. Invited talk at the Workshop on Sustainability and Computational Science 2022, Lund University, Lund, Sweden, November 24, 2022.
F. Lange: Infinite MPS simulations for 1D systems in and out of equilibrium. Talk at the group seminar of the Chair for Quantum Theory, Department of Physics, FAU Erlangen-Nürnberg, October 26, 2022.
T. Gruber: Using the LIKWID toolsuite. Webinar at the RSE Seminars, University of Cambridge, UK, November 3, 2022.
R. Ravedutti Lucio Machado: MD-Bench: A generic proxy-app toolbox for state-of-the-art molecular dynamics algorithms. Paper presentation at PPAM 2022, the 14th International Conference on Parallel Processing and Applied Mathematics, Gdansk, Poland, September 11-14, 2022.
A. Afzal: Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications. Paper presentation at PPAM 2022, the 14th International Conference on Parallel Processing and Applied Mathematics, Gdansk, Poland, September 11-14, 2022.
G. Hager: Spontaneous Asynchronicity – Parallel Programs out of Lockstep. Keynote at PPAM 2022, the 14th International Conference on Parallel Processing and Applied Mathematics, Gdansk, Poland, September 11-14, 2022.
G. Hager: Roofline Modeling and Performance Engineering. Invited talk with hands-on exercises at the 2022 CSCS-USI Summer University on Effective High-Performance Computing and Data Analytics, Serpiano, Switzerland, July 23, 2022.
C. Alappat: RACE: Speeding up Iterative Solvers and Spectral Clustering using Level-Based Blocking Techniques. Talk at MS5D, PASC 2022, Basel, Switzerland, 29 June 2022.
C. Alappat: RACE: Speeding Up Sparse Iterative Solvers Using Level-Based Blocking Technique. Talk at Sparse Days 2022, Saint-Girons, France, 20 June 2022.
C. Alappat: RACE: Speeding Up Sparse Iterative Solvers Using Level-Based Blocking Technique. Talk at MS41, ECCOMAS 2022, Oslo, Norway, 8 June 2022.
A. Afzal: The Role of Idle Waves in Modeling and Optimization of Parallel Programs. Online talk at the NHR PerfLab Seminar, April 26, 2022.
C. L. Alappat: Performance Engineering for Sparse Matrix-Vector Multiplication with the Recursive Algebraic Coloring Engine. Online talk at the NHR PerfLab Seminar, February 1, 2022.

G. Hager: From numbers to insight via performance models. Online invited talk at the IACS Seminar at Stony Brook University, Stony Brook, NY, October 14, 2021. Video recording
G. Hager: The surprising dynamics of non-lockstep execution. Talk at the 18th ScalPerf Workshop, Bertinoro, Italy, September 19-23, 2021.
T. Gruber: Introduction to the LIKWID performance toolkit. Online invited talk in the HPC.NRW Tools Talks Series, August 31, 2021. Video recording
A. Afzal: Variability is not all bad: parallel code out of lock-step. Invited talk at the High Performance, Parallel, and Distributed Computing Group of the Department of Mathematics and Computer Science, University of Basel, Switzerland, August 30, 2021.
G. Hager: Modeling and tuning of SpMV and a lattice QCD kernel on the A64FX. Invited talk at the online A64FX Symposium, Stony Brook University, NY, August 12, 2021. Slides
A. Afzal: Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact. ISC 2021 Digital, June 24-July 2, Frankfurt, Germany (online event).
G. Wellein: Performance Engineering for Sparse Matrix-Vector Multiplication. Invited talk at SPCL Bcast, ETH Zurich, March 23, 2021. Video Slides
C. L. Alappat and J. Seiferth: YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures. Talk at CGO 2021, March 2, 2021. Slides Video
G. Hager: A closer look at the Fujitsu A64FX processor. Public online talk in the NHR@FAU seminar, February 23, 2021. Video

C. L. Alappat: The A64FX processor: Understanding streaming kernels and sparse matrix-vector multiplication. EoCoE-II online webinar, November 18, 2020. Slides: A64FX_EoCoE.pdf, video recording: https://youtu.be/SrWT83lKVgs
C. L. Alappat and J. Laukemann: Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX. 11th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). PMBS20_A64FX.pdf, video recording: https://youtu.be/nqrrTCLTPDw
G. Wellein: Zum Verständnis des Performanceverhaltens moderner Prozessoren für dünn besetzte lineare Algebra. 10th HPC Status Conference of the Gauß-Allianz, October 1, 2020 (virtual). HPC-Statustagung-2020-GW.pdf
A. Afzal: Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs. ISC 2020 Digital, June 22-24, Frankfurt, Germany (online event).
C. Alappat, J. Hofmann, G. Hager, and G. Wellein: Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors. ISC 2020 Digital, June 22-24, Frankfurt, Germany (online event).
J. Hammer: Analytical Performance Modelling using the ECM Model, Kerncraft and OSACA. Invited Talk in the lecture “Performance Engineering” by Ana Lucia Varbanescu at the University of Amsterdam, May 15, 2020.

J. Laukemann: Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels. 10th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS19), Denver, CO, November 18, 2019 (co-located with SC19).
J. Eitzinger: Software-Eigenentwicklungen am RRZE – Ein Erfahrungsbericht. ZKI AK Supercomputing, FU Berlin, September 26-27, 2019.
J. Eitzinger: KONWIHR – Kompetenznetzwerk für wissenschaftliches Höchstleistungsrechnen in Bayern. ZKI AK Supercomputing, FU Berlin, September 26-27, 2019.
A. Afzal: Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study. Paper presentation at IEEE Cluster 2019, Albuquerque, NM, September 25, 2019.
D. Ernst: Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. Paper presentation at PPAM 2019, Bialystok, Poland, September 10, 2019.
C. L. Alappat: First results for performance modeling on ARM CPUs. 2nd ARM HPC Workshop, Shanghai, China, July 12, 2019.
C. L. Alappat: Recursive Algebraic Coloring Engine. Lawrence Berkeley National Laboratory (LBNL), Berkeley CA, June 14, 2019.
C. L. Alappat: Recursive Algebraic Coloring Engine. Georgia Institute of Technology, Atlanta, GA, June 18, 2019.
G. Hager: Von der Wettervorhersage zur Kernwaffe: Supercomputer – was sie sind und was sie können. Night of Science, Universität Frankfurt, June 14, 2019.

T. Gruber: LIKWID – Detecting performance limiting factors with hardware monitoring. Talk at aiXcelerate 2018 (HPC Tuning Workshop) at IT Center of RWTH Aachen University, Aachen, Germany, December 4, 2018.
T. Gruber: Single node optimization. Lecture at International HPC Summer School 2018, IT4Innovations National Supercomputing Center Ostrava, Czech Republic, July 12, 2018.
T. Gruber: The Performance Addict’s Toolbox. Talk at Heise Parallel 2018 conference, Heidelberg, Germany, March 8, 2018.
G. Hager: Making sense of performance numbers. Invited talk at OpenMPCon 2018, Barcelona, Spain, September 24-26, 2018.
G. Hager: Thirteen modern ways to fool the masses with performance results on parallel computers. GridKa School 2018, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, August 29, 2018.
C. Alappat: RACE: Recursive Algebraic Coloring Engine. PASC MS23, Basel, Switzerland, July 2-4, 2018.
G. Hager: Performance Engineering – Why and How? PASC MS05, Basel, Switzerland, July 2-4, 2018. PASC18_MS05_Hager.pdf
G. Wellein: Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs. ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018.
T. Köster: Porting Physical Parameterizations from a Climate Model to Accelerators. PASC MS25, Basel, Switzerland, July 2-4, 2018.
J. Eitzinger: Der unstillbare Hunger nach Rechenleistung: 25 Jahre High Performance Computing am RRZE, Video Recording, Campustreffen Sommersemester 2018, RRZE Erlangen, June 7, 2018
G. Hager: Von der Wettervorhersage zur Kernwaffe: Supercomputer – was sie sind und was sie können. Night of Science, Universität Frankfurt, June 8, 2018.
G. Wellein: “Ja wie schnell laufen sie denn?!” Performance Engineering fürs Höchstleistungsrechnen. Inauguration ceremony of LiDo3, TU Dortmund, May 16, 2018.
J. Eitzinger: ProPE: Node Level Performance Engineering and Performance Patterns, Workshop: Parallele Programmierung in Computational Engineering and Science, RWTH Aachen, March 15, 2018
G. Hager: “If it doesn’t work, we learn something.” Instructive case studies from performance engineering. Minisymposium MS29 at SIAM PP18, the 2018 Conference on Parallel Processing, March 8, 2018, Tokyo, Japan.
G. Wellein: Performance Engineering for Sparse Linear Algebra Kernels: Navigating Between Models and Expectations. Minisymposium MS85 at SIAM PP18, the 2018 Conference on Parallel Processing, March 9, 2018, Tokyo, Japan.
J. Hammer: Cache-aware Scheduling and Performance Modeling with LLVM-Polly and Kerncraft. Second LLVM Performance Workshop at CGO, Saturday February 24th, 2018, Vienna, Austria.
J. Eitzinger: SIMD – past, present and future, Keynote talk, WPMVP Workshop at PPoPP 2018, February 24th, 2018, Vienna, Austria.

T. Gruber: LIKWID Monitoring Stack – A flexible framework enabling job specific performance monitoring for the masses. Talk at HPCMASPA Workshop held in conjunction with IEEE Cluster 2017, Honolulu, Hawaii, USA, September 5, 2017.
T. Gruber: LIKWID and performance monitoring with ECM. Talk at Lawrence Berkeley National Laboratory, Berkeley, USA, August 15, 2017.
T. Gruber: Performance Analysis with LIKWID. Talk at “Parallel Programming in Computational Engineering and Science 2017” at IT Center of RWTH Aachen University, Aachen, Germany, March 21, 2017.
G. Wellein: Performance Engineering for Scalable Sparse Eigensolvers in the DFG Project ESSEX: From basic building blocks to full scale applications. JST/CREST International Symposium on Post Petascale System Software, December 12th, 2017, Tokyo, Japan.
G. Wellein: Performance Engineering: Welcome to the world of FLOPs, Bytes and Cycles! 34th ASE Seminar, December 13th, 2017, The University of Tokyo, Tokyo, Japan.
J. Eitzinger: Components for practical performance engineering in a computing center environment: The ProPE project, 7. GA Status Konferenz, HLRS Stuttgart, December 4, 2017.
J. Eitzinger: Eine kurze Einführung in Rechnerarchitektur und Programmierung von Hochleistungsrechnern als zentrales Werkzeug in der Simulation, Video recording, Collegium Alexandrinum, Erlangen, November 30, 2017
G. Wellein: Performance Engineering for HPC: Models Generating Insights.
- Austrian HPC Meeting 2017 (AHPC17), March 1-3, 2017, Grundlsee (Austria)
- Seminar, Faculty of Informatics, USI Lugano, March 29, 2017, Lugano (Switzerland)
- Invited Talk, General Assembly of SFB/TRR-55, June 2, 2017, Regensburg (Germany)
- EoCoE Face-to-Face Meeting Autumn 2017, November 29, 2017, Toulouse (France)
J. Eitzinger: Defining upper performance bounds using analytic performance models – Opportunities and Limitations, Dagstuhl Seminar 17431 “Performance Portability in Extreme Scale Computing: Metrics, Challenges, Solutions”, Schloß Dagstuhl, October 22, 2017
J. Eitzinger: Introduction and Demo: Likwid Performance Tools. Seminar Talk at University Regensburg, October 20, 2017
G. Hager: The curses and blessings of analytic performance modeling. Invited talk at PPAM 2017, the 12th International Conference on Parallel Processing and Applied Mathematics, Lublin, Poland, September 10-13, 2017.
J. Eitzinger: Evaluation of Intel Xeon Phi “Knights Landing”: Initial impressions and benchmarking results, Prace Xeon Phi User Forum, LRZ Garching, June 28, 2017.
G. Hager: Supercomputer: Mächtiges Werkzeug und Forschungsobjekt. Night of Science, Universität Frankfurt, June 9, 2017.
G. Hager: Thirteen modern ways to fool the masses with performance results on parallel computers. Evening talk at the Course on “Parallel Programming of High Performance Systems 2017”, LRZ Garching, March 6-10, 2017.
G. Hager: Making sense of temporally blocked stencil performance via analytic modeling. Invited talk at the 7th AICS International Symposium, Integrated Research Center of Kobe University, Kobe, Japan, February 23-24, 2017.

T. Gruber: Performance Analysis with LIKWID on IBM POWER8 chips. Talk at PADC Workshop 2016 at JSC Jülich, Jülich, Germany, October 18, 2016.
T. Gruber: Performance Analysis with LIKWID. Talk at “Parallel Programming in Computational Engineering and Science 2016” at IT Center of RWTH Aachen University, Aachen, Germany, March 17, 2016.
G. Hager: Performance Engineering for Algorithmic Building Blocks in GHOST. Talk at the ESSEX Minisymposium at the SPPEXA Symposium 2016, LRZ Garching, Germany, January 25-27, 2016
J. Hammer: From Regular to Irregular Algorithm Performance Modeling. Talk at UT Austin, May 6, 2016.
J. Eitzinger: Pattern-driven Performance Engineering. Talk at NEC User group, Osaka, Japan, May 23, 2016.
J. Eitzinger: Evaluation of Intel Xeon Phi “Knights Corner”: Opportunities and Shortcomings, Prace Xeon Phi User Forum, LRZ Garching, June 29, 2016.
J. Hammer: Modeling Approaches of Graph Algorithm Performance. Talk at UT Austin, August 12, 2016.
J. Hammer: From Tool Supported Performance Modeling of Regular Algorithms to Modeling of Irregular Algorithms. Talk at the Scalable Tools Workshop, Lake Tahoe, CA, August 2, 2016.
J. Hammer: Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels. Talk and poster at the International Parallel Tools Workshop, HLRS Stuttgart, October 5, 2016.
J. Eitzinger: Thoughts on Whitebox-Performance Modeling. Talk at the University of Amsterdam, February 15, 2016.
J. Hammer: Automatic Loop Kernel Analysis and Performance Modeling with Kerncraft. Talk at the University of Amsterdam, February 15, 2016.

J. Eitzinger: Systematic Node-Level Performance Engineering. Seminar talk, UT Austin, Austin, TX, USA, February 2, 2015.
J. Eitzinger: Employing the ECM performance model on SIMD Kahan summation. Talk at the Workshop on Programming Models for SIMD/Vector Processing, San Francisco, CA, USA, February 8, 2015.
J. Eitzinger: Evaluierung von Co-Array Fortran als Alternative zu MPI. Talk at Parallel 2015 Konferenz, Karlsruhe, Germany, April 22, 2015.
J. Eitzinger: Introducing the ECM diagnostic performance model. Invited talk at Scalperf 2015, Bertinoro, Italy, September 23, 2015.
G. Hager: Systematic Node-Level Performance Engineering. Talk at the SPEC DevOps Meeting, University of Würzburg, Germany, February 20, 2015.
G. Hager: Insight into stencil performance by analytic modeling. Talk at the Dagstuhl Seminar on Advanced Stencil Code Engineering, Schloss Dagstuhl, Wadern, Germany, April 13-17, 2015.
G. Hager: White-box modeling for performance and energy: Useful patterns for resource optimization. Invited lecture at PACO 2015, the Workshop on Power-Aware Computing, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany, July 6-7, 2015.
G. Hager: Model-guided performance engineering of numerical kernels. Invited talk at the meeting of the SFB Transregio 55 “Hadron Physics from Lattice QCD,” University of Wuppertal, Germany, July 10, 2015.
G. Hager: Holistic node-level performance engineering for maximum resource efficiency on modern multi-core CPUs. Talk at ParisTech TELECOM, Paris, France, September 7, 2015.
J. Hammer: Automatic Loop Kernel Analysis and Performance Modeling. Talk at the workshop on “Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems”, Supercomputing 2015, Austin, TX, November 13, 2015.
M. Kreutzer: Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems. Talk at the “2015 IEEE International Parallel and Distributed Processing Symposium” (IPDPS), Hyderabad, India, May 26, 2015.
F. Shahzad: Building a Fault Tolerant Application Using the GASPI Communication Layer. Paper presentation at the “1st International Workshop on Fault Tolerant Systems” (FTS2015), in conjunction with IEEE Cluster 2015, Chicago, IL, September 8, 2015.
M. Wittmann: Performance Modeling and Analysis of Stencil operations in Earth Mantle Convection Simulations. Talk at ParCo 2015, Symposium on Parallel solvers for very large PDE based systems in the earth and atmospheric sciences, Edinburgh, Scotland, September 1-4, 2015.
M. Wittmann: Locality and Performance Optimized Adjacency List Generation for Lattice Boltzmann Based Simulations. Talk at ParCFD 2015, Montreal, Canada, May 17-21, 2015.
