Staff · 1 piece on file
Aiko Tanaka
Inference & serving
Aiko covers the serving stack — vLLM, SGLang, TensorRT-LLM, and the kernels underneath. Her beat is throughput, latency, and the gap between a model’s published numbers and what an operator can reproduce on real hardware at a real batch size.
Beats: inference
All pieces by Aiko
Infrastructure · MAY 12, 2026
vLLM v0.20.2 ships Model Runner V2: up to 56% higher throughput on GB200
The May 2026 stable release of vLLM bundles a new GPU-native, Triton-kernel async-scheduling stack, FP8 inference, and continuous batching as the default.