Monthly HPC Café: Inference in the Age of Reasoning Models (hybrid)

The next HPC Café will take place on Tuesday, November 11, 2025, at 4:00 p.m. CET as a hybrid event. As always, there will be plenty of time to get in touch with your favorite HPC group. We invite you to come to NHR@FAU to enjoy coffee, cake, and computing.

The event starts at 4:00 p.m. CET with an open coffee chat. The presentation is scheduled for 4:30 p.m.

Topic: Inference in the Age of Reasoning Models

Speaker: Séverine Habert, NVIDIA

Location: seminar room 02.049 (RRZE, Martensstraße 1, 91058 Erlangen)

Or online via Zoom: https://go-nhr.de/hpc-cafe

Abstract: This presentation explores how distributed and disaggregated inference techniques enable scalable execution of large language models (LLMs), particularly in the context of reasoning and agentic AI. It highlights architectural optimizations such as KV caching, prefix reuse, KV-cache-aware routing, and KV-cache offloading, which improve performance, reduce latency, and support efficient cluster-level deployment of inference workloads.

Material from past events is available at: https://hpc.fau.de/teaching/hpc-cafe/