My talk will present recent developments in matrix-free finite-element algorithms for the numerical solution of partial differential equations on complex geometries. The core ingredient is the on-the-fly computation of the integrals underlying the finite-element discretization. While this leads to algorithms with several hundred arithmetic operations per unknown and was traditionally considered too expensive compared to assembling a global sparse matrix, progress in performance engineering has made it the fastest way to evaluate the matrix-vector product for practical cases of high-order discretizations with curvilinear unstructured hexahedral mesh elements or variable coefficients. The explanation is that the additional arithmetic work can be hidden behind the memory transfer of accessing the solution vectors, so that the achieved throughput is in fact close to that of simple finite difference stencils. I will present node-level performance results for high-order continuous and discontinuous Galerkin discretizations, including the case of adaptively refined meshes with hanging nodes.
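To make the idea of computing integrals on the fly concrete, here is a minimal sketch, not the implementation discussed in the talk. It assumes a 1D model operator -(a(x) u')' with linear elements on a uniform mesh of the unit interval, no boundary conditions, and two-point Gauss quadrature. The names `coefficient`, `apply_matrix_free`, and `assemble_dense` are hypothetical and chosen only for illustration. Production codes do this in compiled, vectorized code for high-order hexahedral elements, which Python only mimics structurally.

```python
# Illustrative sketch: matrix-free operator application versus an assembled matrix.
# Model problem (an assumption of this sketch): -(a(x) u')' on [0, 1], linear elements.

# Two-point Gauss quadrature on the reference cell [0, 1].
Q_POINTS = (0.5 - 0.5 / 3 ** 0.5, 0.5 + 0.5 / 3 ** 0.5)
Q_WEIGHTS = (0.5, 0.5)


def coefficient(x):
    """Hypothetical variable coefficient a(x) > 0."""
    return 1.0 + x * x


def apply_matrix_free(u, n_cells):
    """Return v = A u without ever storing A.

    Each cell gathers its two local values, evaluates the integrand at the
    quadrature points, and scatters the result back into v.
    """
    h = 1.0 / n_cells
    v = [0.0] * (n_cells + 1)
    for cell in range(n_cells):
        # Gradient of the linear interpolant is constant on the cell.
        grad_u = (u[cell + 1] - u[cell]) / h
        # Integrate a(x) * grad_u over the cell by quadrature.
        flux = 0.0
        for xi, w in zip(Q_POINTS, Q_WEIGHTS):
            x = (cell + xi) * h
            flux += w * h * coefficient(x) * grad_u
        # Test with the two shape functions, whose gradients are -1/h and +1/h.
        v[cell] -= flux / h
        v[cell + 1] += flux / h
    return v


def assemble_dense(n_cells):
    """Reference path: assemble the same operator as a dense matrix."""
    h = 1.0 / n_cells
    n = n_cells + 1
    A = [[0.0] * n for _ in range(n)]
    for cell in range(n_cells):
        integral_a = sum(
            w * h * coefficient((cell + xi) * h)
            for xi, w in zip(Q_POINTS, Q_WEIGHTS)
        )
        k = integral_a / (h * h)
        A[cell][cell] += k
        A[cell][cell + 1] -= k
        A[cell + 1][cell] -= k
        A[cell + 1][cell + 1] += k
    return A
```

Both paths produce the same matrix-vector product up to rounding. The matrix-free path reads only the vector and recomputes the coefficient-weighted integral per cell, which is the trade of extra arithmetic for less memory traffic described above.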
With the high throughput achieved for the matrix-vector product, we have observed that other operations in common iterative solvers, such as the vector operations in multigrid smoothers or the conjugate gradient method, now take a significant share of the run time on both GPUs and CPUs. I will present results on loop fusion to increase data locality, which benefits CPUs with large L2 and L3 caches.
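As a rough illustration of loop fusion for the vector operations of the conjugate gradient method, the following sketch contrasts three separate passes over the vectors with a single fused pass. It is an assumption-laden toy, not the speaker's code: the function names `cg_update_unfused` and `cg_update_fused` are hypothetical, and plain Python lists stand in for the large vectors that would be streamed from memory in a compiled implementation.

```python
# Illustrative sketch: the solution update, residual update, and residual norm
# of one conjugate gradient iteration, written unfused and fused.


def cg_update_unfused(x, r, p, Ap, alpha):
    """Three separate sweeps: each one reads and writes full vectors."""
    x_new = [xi + alpha * pi for xi, pi in zip(x, p)]      # sweep 1
    r_new = [ri - alpha * qi for ri, qi in zip(r, Ap)]     # sweep 2
    rr = 0.0
    for ri in r_new:                                       # sweep 3
        rr += ri * ri
    return x_new, r_new, rr


def cg_update_fused(x, r, p, Ap, alpha):
    """One sweep: each entry is loaded once and used for all three operations."""
    n = len(x)
    x_new = [0.0] * n
    r_new = [0.0] * n
    rr = 0.0
    for i in range(n):
        x_new[i] = x[i] + alpha * p[i]
        ri = r[i] - alpha * Ap[i]
        r_new[i] = ri
        rr += ri * ri  # the residual entry is reused while still at hand
    return x_new, r_new, rr
```

The arithmetic is identical in both versions. What changes is how often each vector entry has to be fetched, which is the data-locality effect that fusion targets.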
Speaker bio: Martin Kronbichler is a Professor at the University of Augsburg, Germany. He holds a diploma in applied mathematics from the Technical University of Munich, Germany (2007) and a PhD degree in scientific computing with specialization in numerical analysis from Uppsala University, Sweden (2012). His research interests include high-order finite element methods for flow problems with matrix-free implementations, efficient numerical linear algebra, and their parallel and high-performance implementation on emerging exascale hardware using generic numerical software.