Dr. Sebastian Kuckuk

Erlangen National High Performance Computing Center
Training and Support

Room: 1.131
Martensstraße 1
91058 Erlangen


Sebastian holds a PhD in computer science from FAU, where he currently works as a researcher. His primary interests revolve around enhancing performance portability and programmer productivity through the use of domain-specific programming languages, code generation techniques, automatic parallelization, and GPU programming. He applies these methodologies in the context of massively parallel numerical solvers for computational fluid applications and other related fields.

In 2021, Sebastian joined NHR@FAU as a liaison scientist for the Chair of System Simulation. At the same time, he became a member of the Training and Support division, which he joined as a full member in spring 2024. In addition, he serves as an NVIDIA Deep Learning Institute (DLI) university ambassador and is certified to teach DLI courses covering both introductory and advanced concepts in GPU programming. Sebastian contributes his expertise through smaller project-driven consultation activities, lecture exercises, and single- and multi-day tutorials.

During his free time, Sebastian finds pleasure in cooking and biking. He also pursues his interests in learning the Japanese language and exploring topics in psychology and philosophy.

NHR Activities, Projects, and Support

Sebastian provides support and consulting for KONWIHR and NHR projects revolving around GPU programming, performance analysis and optimization.


Sebastian completed multiple courses offered by the NVIDIA Deep Learning Institute (DLI) on GPU programming and CUDA-accelerated applications that scale across multiple GPUs. He attained certification as a DLI ambassador and is certified to teach the following courses:

  • Fundamentals of Accelerated Computing with CUDA C/C++
  • Fundamentals of Accelerated Computing with CUDA Python
  • Accelerating CUDA C++ Applications with Multiple GPUs
  • Scaling CUDA C++ Applications to Multiple Nodes

A list of upcoming and past courses can be found here.


Sebastian contributes his expertise to the following lectures:

  • High-End Simulation in Practice (HESP)
  • Programming Techniques for Supercomputers (PTfS)


Sebastian is a maintainer and developer of

  • the ExaStencils code generation framework for massively parallel multigrid solvers, and
  • the GHODDESS module for quadrature-free higher-order discretizations of the shallow water equations.

Further information can be found on the official page: https://www.exastencils.fau.de/

List of Publications

Automatic Code Generation for Massively Parallel Applications in Computational Fluid Dynamics

An open access version of the thesis is available here (PDF).


Solving partial differential equations (PDEs) is a fundamental challenge in many application domains in industry and academia alike. With increasingly large problems, efficient and highly scalable implementations become more and more crucial. Today, facing this challenge is more difficult than ever due to the increasingly heterogeneous hardware landscape. One promising approach is developing domain‐specific languages (DSLs) for a set of applications. Using code generation techniques then allows targeting a range of hardware platforms while concurrently applying domain‐specific optimizations in an automated fashion. The present work aims to further the state of the art in this field. As our domain, we choose PDE solvers and, in particular, those from the group of geometric multigrid methods. To avoid too broad a focus, we restrict ourselves to methods working on structured and patch‐structured grids.

We face the challenge of handling a domain as complex as ours, while providing different abstractions for diverse user groups, by splitting our external DSL ExaSlang into multiple layers, each specifying different aspects of the final application. Layer 1 is designed to resemble LaTeX and allows inputting continuous equations and functions. Their discretization is expressed on layer 2. It is complemented by algorithmic components which can be implemented in a Matlab‐like syntax on layer 3. All information provided to this point is summarized on layer 4, enriched with particulars about data structures and the employed parallelization. Additionally, we support automated progression between the different layers. All ExaSlang input is processed by our jointly developed Scala code generation framework to ultimately emit C++ code. We particularly focus on how to generate applications parallelized with, e.g., MPI and OpenMP that are able to run on workstations and large‐scale clusters alike.

We showcase the applicability of our approach by implementing simple test problems, like Poisson’s equation, as well as relevant applications from the field of computational fluid dynamics (CFD). In particular, we implement scalable solvers for the Stokes, Navier‐Stokes and shallow water equations (SWE) discretized using finite differences (FD) and finite volumes (FV). For the case of Navier‐Stokes, we also extend our implementation towards non‐uniform grids, thereby enabling static mesh refinement, and advanced effects such as the simulated fluid being non‐Newtonian and non‐isothermal.
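
To make the simplest test problem mentioned above concrete, the following is a minimal, hand-written sketch of a geometric multigrid V-cycle for the 1D Poisson equation -u'' = f with homogeneous Dirichlet boundaries, discretized with second-order finite differences on a uniform grid. It is not taken from the thesis and is neither ExaSlang nor generated code. The function names (smooth, residual, restrict, prolongate, v_cycle, solve_poisson), the weighted Jacobi smoother, and the right-hand side are all assumptions chosen purely for illustration.

```python
import math


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for the 3-point stencil of -u'' = f (in place)."""
    n = len(u) - 1
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, n):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def residual(u, f, h):
    """r = f - A u at interior points; boundary entries stay zero."""
    n = len(u) - 1
    r = [0.0] * (n + 1)
    for i in range(1, n):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting restriction to a grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolongate(ec):
    """Linear interpolation to a grid with twice as many intervals."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for i in range(nc + 1):
        e[2 * i] = ec[i]
    for i in range(nc):
        e[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return e


def v_cycle(u, f, h):
    """One V(2,2)-cycle; len(u) - 1 must be a power of two and at least 2."""
    n = len(u) - 1
    if n == 2:
        # Coarsest grid: a single interior unknown, solved exactly.
        u[1] = 0.5 * (h * h * f[1] + u[0] + u[2])
        return u
    smooth(u, f, h, sweeps=2)                      # pre-smoothing
    rc = restrict(residual(u, f, h))               # coarse-grid right-hand side
    ec = v_cycle([0.0] * (n // 2 + 1), rc, 2.0 * h)  # coarse-grid correction
    e = prolongate(ec)
    for i in range(1, n):
        u[i] += e[i]
    smooth(u, f, h, sweeps=2)                      # post-smoothing
    return u


def solve_poisson(n=64, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1); the exact solution is sin(pi x)."""
    h = 1.0 / n
    x = [i * h for i in range(n + 1)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * (n + 1)
    for _ in range(cycles):
        v_cycle(u, f, h)
    err = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))
    return u, err
```

Calling solve_poisson(n=64, cycles=10) returns the approximate solution together with its maximum deviation from sin(pi x). After a handful of cycles, that deviation should be dominated by the O(h^2) discretization error rather than by the iterative solver.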