Monthly HPC Café: Inference in the Age of Reasoning Models (hybrid)
The next HPC Café will take place on Tuesday, November 11, 2025, at 4:00 p.m. CET as a hybrid event. As always, there will be plenty of time to get in touch with your favorite HPC group. We invite you to come to NHR@FAU to enjoy coffee, cake, and computing.
The event starts at 4:00 p.m. CET with an open coffee chat. The presentation is scheduled for 4:30 p.m.
Topic: Inference in the Age of Reasoning Models
Speaker: Séverine Habert, NVIDIA
Location: seminar room 02.049 (RRZE, Martensstraße 1, 91058 Erlangen)
Or online via Zoom: https://go-nhr.de/hpc-cafe
Abstract: This presentation explores how distributed and disaggregated inference techniques enable scalable execution of large language models (LLMs), particularly in the context of reasoning and agentic AI. It highlights architectural optimizations such as KV caching, prefix reuse, KV-cache-aware routing, and KV-cache offloading, which improve performance, reduce latency, and support efficient cluster-level deployment of inference workloads.
Material from past events is available at: https://hpc.fau.de/teaching/hpc-cafe/

