Item: Microsoft
Author: Lars Iverson

Microsoft shipped seven in-house models at Build 2026 on June 2, and the line that mattered wasn't a benchmark. It was a disclaimer: "We don't distill from other labs and we don't rely on opaque data. Every component of the system, from architecture to training pipeline to post-training, we built ourselves." After $13 billion into OpenAI and $5 billion into Anthropic, Mustafa Suleyman's AI Superintelligence team is signaling that the partner-model era has an expiration date.

The headliner is MAI-Thinking-1, a 35B-active-parameter MoE reasoning model with a 256K context window. It posts 97.0% on AIME 2025, 94.5% on AIME 2026, 87.7% on LiveCodeBench v6, and 53% on SWE-Bench Pro. Surge ran a blind human evaluation across 1,276 single- and multi-turn tasks against Claude Sonnet 4.6. The accompanying technical report, dated June 2, frames the architecture around FlashAttention-4 and a training regime co-designed against Microsoft's Maia 200 silicon for a 1.4× performance-per-watt gain end-to-end.

The data pipeline is the part legacy labs will read twice. Microsoft started from a 1.2-trillion-page proprietary crawl, pulled it down to 794 billion pages after UT1 blocklisting and AI-generated-content filtering, and folded in a 24.2-billion-page deduplicated Common Crawl slice. Several thousand small proxy models, ranging from 760M to 4B active parameters, were trained purely to predict which data mixes would scale.

Then there's MAI-Code-1-Flash, a 5B-active coding model that hits 51% on SWE-Bench Pro and ships today as the default in VS Code and GitHub Copilot CLI. MAI-Image-2.5 sits at #2 on the Arena Image Edit leaderboard with 1403±9, above Gemini 3 Pro Image Preview 2K at 1388±3. MAI-Transcribe-1.5 leads in 18 of 43 FLEURS languages against GPT-4o-Transcribe, Scribe v2, and Gemini 3.1 Flash Lite.

The economic argument lives inside Frontier Tuning, Microsoft's enterprise RL-environments system. A configuration tuned for Excel runs up to 10× cheaper than GPT-5.4. One tuned with McKinsey runs roughly 10× cheaper than GPT-5.5. A June 8 update added Mayo Clinic as a healthcare co-development partner. Weights will be available through OpenRouter, Fireworks, and Baseten.

The structural read is straightforward. Azure has spent three years as OpenAI's largest distribution channel, and CNBC noted the obvious corollary: every token served on a partner model is margin Microsoft doesn't keep. Seven models, a proprietary corpus, custom silicon, and an enterprise tuning layer aren't a hedge. They're a vertical stack, and the disclaimer about distillation is really a notice to the cap table.

Microsoft ships seven MAI models from scratch, declares independence from OpenAI distillation

Sources