Dr. Sebastian Kuckuk

Training and Support

Central Scientific Institutions
Erlangen National High Performance Computing Center

Room: 1.131
Martensstraße 1
91058 Erlangen

NHR Activities, Projects, and Support

Sebastian Kuckuk has supported the analysis of libxc, focusing on the overall workflow, the composition of the compute kernels, the Maple-to-C scripts, and the potential for optimization. Based on this analysis, he provided suggestions and a proof-of-concept implementation that employs code generation techniques using Python and SymPy.
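To illustrate what code generation with Python and SymPy can look like in this setting, here is a minimal sketch. It is not the proof-of-concept mentioned above: the choice of functional (the textbook Slater exchange energy density), the function name generate_c_kernel, and the layout of the emitted C function are all assumptions made for illustration. The sketch derives a derivative symbolically, applies common subexpression elimination, and prints C code.

```python
import sympy as sp


def generate_c_kernel(name="lda_x_sketch"):
    """Emit a C function for an energy density and its derivative.

    Hypothetical example: the functional and the C signature are
    illustrative assumptions, not taken from libxc.
    """
    rho = sp.Symbol("rho", positive=True)

    # Slater exchange energy density: -(3/4) (3/pi)^(1/3) rho^(4/3)
    c_x = sp.Rational(3, 4) * (3 / sp.pi) ** sp.Rational(1, 3)
    e = -c_x * rho ** sp.Rational(4, 3)

    # The derivative is obtained symbolically instead of by hand.
    v = sp.diff(e, rho)

    # Common subexpression elimination lets both outputs share temporaries.
    temps, (e_red, v_red) = sp.cse([e, v])

    lines = [f"void {name}(double rho, double *e, double *v) {{"]
    for sym, expr in temps:
        lines.append(f"  const double {sym} = {sp.ccode(expr)};")
    lines.append(f"  *e = {sp.ccode(e_red)};")
    lines.append(f"  *v = {sp.ccode(v_red)};")
    lines.append("}")
    return "\n".join(lines), e, v, rho


source, e, v, rho = generate_c_kernel()
print(source)
```

Keeping the symbolic expressions next to the emitted source makes it possible to check the generated derivative against an independent formula before any C code is compiled.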

Moreover, he provides consultation on the efficient use of GPUs. Noteworthy topics include porting legacy code to accelerators, profiling and performance analysis for GPU-enabled applications, employing hybrid (CPU-GPU) parallelization for application codes, and porting applications to energy-efficient SoC architectures. Sebastian has already worked with an MPI/OpenMP hybrid C++ code accelerated with CUDA and a C/C++/FORTRAN code that was later accelerated with OpenCL.

Sebastian is certified as an NVIDIA Deep Learning Institute (DLI) University Ambassador.


Sebastian completed multiple courses offered by the NVIDIA Deep Learning Institute (DLI) on GPU programming and on CUDA-accelerated applications that scale across multiple GPUs. He attained certification as a DLI ambassador and is certified to teach the following courses:

  • Fundamentals of Accelerated Computing with CUDA C/C++
  • Fundamentals of Accelerated Computing with CUDA Python
  • Accelerating CUDA C++ Applications with Multiple GPUs
  • Scaling CUDA C++ Applications to Multiple Nodes

A list of upcoming and past courses can be found here.

Further, he developed a training unit on performance analysis for stencil codes on GPUs; this entailed implementing benchmark cases, profiling different configurations, data editing, and creating slides. A first version was presented as part of the Programming Techniques for Supercomputers (PTfS) lecture at FAU. In addition, he conceptualized a new workshop on performance portability and programmer productivity and realized first teaching units as interactive Jupyter notebooks.


Sebastian is a maintainer and developer of the ExaStencils code generation framework for massively parallel multigrid solvers and of the GHODDESS module for quadrature-free higher-order discretizations of the shallow water equations.


List of Publications

Automatic Code Generation for Massively Parallel Applications in Computational Fluid Dynamics

An open access version of the thesis is available here (PDF).


Solving partial differential equations (PDEs) is a fundamental challenge in many application domains in industry and academia alike. With increasingly large problems, efficient and highly scalable implementations become more and more crucial. Today, facing this challenge is more difficult than ever due to the increasingly heterogeneous hardware landscape. One promising approach is developing domain‐specific languages (DSLs) for a set of applications. Using code generation techniques then allows targeting a range of hardware platforms while concurrently applying domain‐specific optimizations in an automated fashion. The present work aims to further the state of the art in this field. As our domain, we choose PDE solvers and, in particular, those from the group of geometric multigrid methods. To keep the focus sufficiently narrow, we restrict ourselves to methods working on structured and patch‐structured grids.

We face the challenge of handling a domain as complex as ours, while providing different abstractions for diverse user groups, by splitting our external DSL ExaSlang into multiple layers, each specifying different aspects of the final application. Layer 1 is designed to resemble LaTeX and allows inputting continuous equations and functions. Their discretization is expressed on layer 2. It is complemented by algorithmic components which can be implemented in a Matlab‐like syntax on layer 3. All information provided to this point is summarized on layer 4, enriched with particulars about data structures and the employed parallelization. Additionally, we support automated progression between the different layers. All ExaSlang input is processed by our jointly developed Scala code generation framework to ultimately emit C++ code. We particularly focus on how to generate applications parallelized with, e.g., MPI and OpenMP that are able to run on workstations and large‐scale clusters alike.

We showcase the applicability of our approach by implementing simple test problems, like Poisson’s equation, as well as relevant applications from the field of computational fluid dynamics (CFD). In particular, we implement scalable solvers for the Stokes, Navier‐Stokes and shallow water equations (SWE) discretized using finite differences (FD) and finite volumes (FV). For the case of Navier‐Stokes, we also extend our implementation towards non‐uniform grids, thereby enabling static mesh refinement, and advanced effects such as the simulated fluid being non‐Newtonian and non‐isothermal.
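For readers unfamiliar with the simplest test problem named above, the following sketch solves the 1D Poisson equation -u'' = f with homogeneous Dirichlet boundaries, discretized with finite differences and solved with a geometric multigrid V-cycle. It is hand-written Python for illustration only: it is neither ExaSlang input nor output of the code generator, and the function names, the weighted Jacobi smoother, the number of sweeps, and the grid size are all assumptions.

```python
import math


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for the 3-point stencil (-1, 2, -1) / h^2."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def residual(u, f, h):
    """r = f - A u at interior points; boundary entries stay zero."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting restriction to the grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec, fine_points):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    e = [0.0] * fine_points
    for i in range(len(ec)):
        e[2 * i] = ec[i]
    for i in range(len(ec) - 1):
        e[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return e


def v_cycle(u, f, h):
    """One V(2,2) cycle; the number of intervals must be a power of two."""
    n = len(u) - 1
    if n == 2:
        # Coarsest grid: a single interior unknown, solved exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h, 2)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)
    e = prolong(ec, len(u))
    for i in range(1, n):
        u[i] += e[i]
    smooth(u, f, h, 2)
    return u


def solve_poisson(n=64, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1); the exact solution is sin(pi x)."""
    h = 1.0 / n
    x = [i * h for i in range(n + 1)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * (n + 1)
    res0 = max(abs(r) for r in residual(u, f, h))
    for _ in range(cycles):
        v_cycle(u, f, h)
    res = max(abs(r) for r in residual(u, f, h))
    err = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))
    return res0, res, err


res0, res, err = solve_poisson()
print(f"initial residual {res0:.3e}, final residual {res:.3e}, max error {err:.3e}")
```

The remaining error after the cycles is dominated by the second-order discretization error of the finite difference stencil, not by the solver, which is why the residual drops far below the reported maximum error.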