AI Model Report
Aiko Tanaka

Staff · 1 piece on file

Inference & serving

Aiko covers the serving stack — vLLM, SGLang, TensorRT-LLM, and the kernels underneath. Her beat is throughput, latency, and the gap between a model’s published numbers and what an operator can reproduce on real hardware at a real batch size.

Beats: inference


All pieces by Aiko

  • Infrastructure · MAY 12, 2026

    vLLM v0.20.2 ships Model Runner V2: up to 56% higher throughput on GB200

    The May 2026 stable release of vLLM bundles a new GPU-native Triton kernel async-scheduling stack, FP8 inference, and continuous batching as the default.
