NHR@FAU seminar: A closer look at the Fujitsu A64FX processor

Fujitsu A64FX. Source: Fujitsu, with permission.

Competitive Arm-based processor designs have been entering the HPC scene in recent years. With the Fujitsu A64FX, one of these has finally made it to the top of the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival those of accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. In this talk we present performance modeling and benchmarking results for the A64FX, with a special focus on sparse matrix-vector multiplication (SpMV). We detail the construction of the Execution-Cache-Memory (ECM) performance model for the A64FX and validate it using streaming loops. We also point out microarchitectural peculiarities to keep in mind when optimizing applications, and explain why the Compressed Row Storage (CRS) format is ill-suited for SpMV on the A64FX and should be replaced by SELL-C-σ in order to saturate the memory bandwidth. In this context, we also discuss code optimization strategies relevant to the A64FX and compare its SpMV performance with that of current x86 processors and the NVIDIA V100 GPU.
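To make the CRS versus SELL-C-σ contrast more concrete, here is a minimal C sketch; it is not taken from the talk or the slides. A plain CRS kernel runs its inner loop over a single row, so the loop length is short and irregular. In the SELL-C-σ layout, rows are sorted by descending length within windows of σ rows, grouped into chunks of C rows, padded to the longest row in each chunk, and stored column-major, so the innermost loop runs across C rows with unit stride. The type `sell_t`, the functions `sell_from_crs`, `spmv_sell` and `sell_free`, and the limit `SELL_MAX_C` are hypothetical names chosen for illustration. A production code would presumably also vectorize explicitly (for example with SVE intrinsics), parallelize over chunks, and keep x and y in permuted order instead of scattering results back through `perm`.

```c
#include <assert.h>
#include <stdlib.h>

#define SELL_MAX_C 64  /* max chunk height supported by this sketch */

static int row_len(const int *rowptr, int r) { return rowptr[r + 1] - rowptr[r]; }

/* CRS SpMV, y = A*x: the inner loop length equals the (often short,
 * irregular) row length, which maps poorly onto wide SIMD units. */
void spmv_crs(int n, const int *rowptr, const int *col, const double *val,
              const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double tmp = 0.0;
        for (int j = rowptr[i]; j < rowptr[i + 1]; ++j)
            tmp += val[j] * x[col[j]];
        y[i] = tmp;
    }
}

/* Hypothetical SELL-C-sigma container used only in this sketch. */
typedef struct {
    int n, C, nchunks;
    int *chunk_ptr;  /* start of each chunk in col/val (nchunks+1 entries) */
    int *chunk_len;  /* padded width = longest row in the chunk */
    int *perm;       /* perm[k] = original index of the k-th sorted row */
    int *col;
    double *val;
} sell_t;

/* CRS -> SELL-C-sigma: sort rows by descending length within windows of
 * sigma rows, group C sorted rows per chunk, pad each chunk to its
 * longest row, and store every chunk column-major. */
sell_t sell_from_crs(int n, const int *rowptr, const int *col,
                     const double *val, int C, int sigma)
{
    assert(C > 0 && C <= SELL_MAX_C && sigma > 0);
    sell_t s = { n, C, (n + C - 1) / C, NULL, NULL, NULL, NULL, NULL };
    s.perm = malloc(n * sizeof *s.perm);
    for (int i = 0; i < n; ++i) s.perm[i] = i;
    for (int w = 0; w < n; w += sigma)              /* stable insertion sort */
        for (int i = w + 1; i < n && i < w + sigma; ++i) {
            int r = s.perm[i], k = i;
            for (; k > w && row_len(rowptr, s.perm[k - 1]) < row_len(rowptr, r); --k)
                s.perm[k] = s.perm[k - 1];
            s.perm[k] = r;
        }
    s.chunk_ptr = malloc((s.nchunks + 1) * sizeof *s.chunk_ptr);
    s.chunk_len = malloc(s.nchunks * sizeof *s.chunk_len);
    s.chunk_ptr[0] = 0;
    for (int c = 0; c < s.nchunks; ++c) {
        int width = 0;
        for (int r = 0; r < C && c * C + r < n; ++r)
            if (row_len(rowptr, s.perm[c * C + r]) > width)
                width = row_len(rowptr, s.perm[c * C + r]);
        s.chunk_len[c] = width;
        s.chunk_ptr[c + 1] = s.chunk_ptr[c] + width * C;
    }
    /* calloc zero-fills: padding entries get col = 0 and val = 0.0 */
    s.col = calloc(s.chunk_ptr[s.nchunks], sizeof *s.col);
    s.val = calloc(s.chunk_ptr[s.nchunks], sizeof *s.val);
    for (int c = 0; c < s.nchunks; ++c)
        for (int r = 0; r < C && c * C + r < n; ++r) {
            int row = s.perm[c * C + r];
            for (int j = 0; j < row_len(rowptr, row); ++j) {
                s.col[s.chunk_ptr[c] + j * C + r] = col[rowptr[row] + j];
                s.val[s.chunk_ptr[c] + j * C + r] = val[rowptr[row] + j];
            }
        }
    return s;
}

/* SELL-C-sigma SpMV: the innermost loop runs over the C rows of a chunk
 * with unit stride through col/val, one row per SIMD lane. */
void spmv_sell(const sell_t *s, const double *x, double *y)
{
    double tmp[SELL_MAX_C];
    for (int c = 0; c < s->nchunks; ++c) {
        const int *cc = s->col + s->chunk_ptr[c];
        const double *vv = s->val + s->chunk_ptr[c];
        for (int r = 0; r < s->C; ++r) tmp[r] = 0.0;
        for (int j = 0; j < s->chunk_len[c]; ++j)
            for (int r = 0; r < s->C; ++r)
                tmp[r] += vv[j * s->C + r] * x[cc[j * s->C + r]];
        for (int r = 0; r < s->C && c * s->C + r < s->n; ++r)
            y[s->perm[c * s->C + r]] = tmp[r];       /* undo the row sort */
    }
}

void sell_free(sell_t *s)
{
    free(s->chunk_ptr); free(s->chunk_len); free(s->perm);
    free(s->col); free(s->val);
}
```

A natural choice on the A64FX would be C = 8, since its 512-bit SVE vectors hold eight doubles. A sorting window σ larger than C reduces the zero padding within each chunk. σ = 1 disables sorting, while very large σ may reorder rows enough to hurt locality in the accesses to x.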

Date & time: Tuesday, February 23, 2 p.m.

Speaker: Georg Hager, Head of NHR@FAU Training & Support

Slides: A64FX_NHR.pdf