Reviews · JUNE 10, 2026
Microsoft ships seven MAI models, with MAI-Thinking-1 at 53% on SWE-Bench Pro and zero distillation
At Build 2026 in San Francisco, Microsoft AI unveiled a seven-model in-house family — led by a 35B-active-parameter MoE reasoning flagship trained from scratch — and put first-party silicon, GitHub Copilot defaults, and a Sonnet 4.6 preference claim on the line.
Microsoft AI used its Build 2026 keynote in San Francisco on June 2 to ship seven in-house MAI models at once, headlined by MAI-Thinking-1, a sparse Mixture-of-Experts reasoning model with 35 billion active parameters and a 256K context window. The company says it was trained from scratch, with no distillation from a frontier partner. The framing is what matters. After putting $13 billion into OpenAI and $5 billion into Anthropic, Microsoft is now publicly arguing it can build alongside its suppliers rather than only on top of them.
"We believe the time has come for every company to just move from consuming a frontier model to fully participating at the frontier," Satya Nadella said onstage. The line landed the day after Anthropic confidentially filed for an IPO, and it's hard to read as anything other than a portfolio-wide hedge being made legible to investors.
The numbers Microsoft put on the board are the kind that get cited later. MAI-Thinking-1 posts 97.0% on AIME 2025, 94.5% on AIME 2026, and 53% on SWE-Bench Pro, which the keynote placed "right alongside Opus 4.6." In a blind side-by-side human evaluation across 1,276 single- and multi-turn tasks scored by Surge raters, Microsoft claims the model is preferred over Claude Sonnet 4.6.
Underneath the flagship, the family does the unglamorous work. MAI-Code-1-Flash, at 5 billion active parameters and 51% on SWE-Bench Pro, ships today as the default model in VS Code and the GitHub Copilot CLI. GitHub COO Kyle Daigle described it as "inference ultra-efficient," and CNBC called it Microsoft's "inaugural" model for translating natural-language descriptions into source code. MAI-Image-2.5 and its Flash variant sit at #2 on the Arena image-edit leaderboard with a score of 1403±9, narrowly above Gemini 3 Pro Image Preview 2K (1388±3) and Gemini 3.1 Flash Image Preview / Nano Banana 2 (1389±4). MAI-Transcribe-1.5 posts the lowest word error rate (WER) on 18 of 43 FLEURS languages, outperforming GPT-4o-Transcribe, Scribe v2, and Gemini 3.1 Flash Lite. MAI-Voice-2 covers 15 languages with watermarked outputs; MAI-Voice-2-Flash was announced but isn't shipping yet.
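How narrow is "narrowly above"? A quick way to check is to treat each ± figure as a symmetric interval around the score and see whether the intervals overlap. The sketch below does only that arithmetic on the three image-edit numbers quoted above; the `ArenaScore` type and `intervals_overlap` helper are hypothetical names introduced for illustration, and reading ± as a simple half-width is an assumption, not a statement of how the leaderboard computes its intervals.

```python
from typing import NamedTuple


class ArenaScore(NamedTuple):
    """A leaderboard score with its quoted +/- margin (hypothetical helper type)."""
    name: str
    score: float
    margin: float  # assumption: the "±" value is a symmetric half-width

    @property
    def low(self) -> float:
        return self.score - self.margin

    @property
    def high(self) -> float:
        return self.score + self.margin


def intervals_overlap(a: ArenaScore, b: ArenaScore) -> bool:
    """True if the two [low, high] ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Figures as quoted in the article.
mai = ArenaScore("MAI-Image-2.5", 1403, 9)
rivals = [
    ArenaScore("Gemini 3 Pro Image Preview 2K", 1388, 3),
    ArenaScore("Gemini 3.1 Flash Image Preview / Nano Banana 2", 1389, 4),
]

for rival in rivals:
    # Gap between the bottom of MAI's range and the top of the rival's range.
    gap = mai.low - rival.high
    print(f"{rival.name}: overlap={intervals_overlap(mai, rival)}, gap={gap}")
```

Under that reading, the bottom of MAI-Image-2.5's range (1394) sits 3 points above the top of Gemini 3 Pro Image Preview 2K's range (1391) and 1 point above Nano Banana 2's (1393). The intervals do not overlap, but the margin is thin enough that "narrowly" is the right word.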
The silicon claim is where the strategy stops being abstract. Microsoft says MAI-Thinking-1 runs at a 1.4× performance-per-watt gain on its first-party Maia 200 versus NVIDIA's GB200, a figure separate from the 30% improvement Nadella cited earlier in the keynote. MAI-Thinking-1 is in private preview on Microsoft Foundry and routed through OpenRouter, Fireworks, and Baseten.
The arc here rhymes with Amazon's path from selling AWS compute to Anthropic and others toward Trainium and its own Titan models: invest in the suppliers, learn the workload, then build the substrate underneath them. Microsoft has now started the same maneuver in public, with benchmarks attached.