NHR PerfLab Seminar: LARC: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache

Speaker: Dr. Jens Domke, RIKEN Center for Computational Science (R-CCS), Kobe, Japan

Title: LARC: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache

Date and time: Tuesday, September 6, 2:00 p.m. – 3:00 p.m.

Slides

Abstract: Over the last three decades, innovations in the memory subsystem have primarily targeted the data movement bottleneck. In this talk, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capacity of future HPC-focused processors, particularly with 3D-stacked SRAM. First, we propose a method, oblivious to the memory subsystem, to gauge the upper bound on performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of LARC, a processor fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. Through a large volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal where HPC CPU performance could stand circa 2028.

Speaker bio: Jens Domke is the Team Leader of the Supercomputing Performance Research Team at the RIKEN Center for Computational Science (R-CCS), Japan. He received his doctoral degree from the Technische Universität Dresden, Germany, in 2017 for his work on HPC routing algorithms and interconnects. Jens started his career in HPC in 2008, after he and a team of five students from TU Dresden and Indiana University won the Student Cluster Competition at SC08. Since then, he has published dozens of peer-reviewed journal and conference articles. Jens contributed the DFSSSP and Nue routing algorithms to the InfiniBand subnet manager, and built the first large-scale HyperX prototype at the Tokyo Institute of Technology. His research interests include system co-design; performance evaluation, extrapolation, and modelling; interconnect networks; and optimization of parallel applications and architectures.