AI Model Report

Reviews · JUNE 21, 2026

Gemini 3.5 Pro's June window narrows as Google ships Flash first and the rest of the field crowds in

Google launched Gemini 3.5 Flash at I/O and routed gemini-3-pro-preview to gemini-3.1-pro-preview, leaving the promised 3.5 Pro flagship unshipped while Anthropic, xAI, Microsoft and DeepSeek file new models against the same calendar.

By Karl Strauchman · Senior model reviewer · June 21, 2026

Google shipped Gemini 3.5 Flash at I/O 2026 and quietly redirected the gemini-3-pro-preview endpoint to gemini-3.1-pro-preview, which is a polite way of saying the 3.5 Pro flagship didn't make the keynote. A month later, June is closing without it, and the calendar Google was hoping to own is now contested by Anthropic, xAI, Microsoft, and DeepSeek.

Flash itself is a strong release. Google Cloud's launch post lists 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, 83.6% on MCP Atlas, and 84.2% on CharXiv, with the company claiming the smaller model outperforms Gemini 3.1 Pro on all four. DeepMind's evaluation page adds a 42% gain over Gemini 3 Flash on a long-range multi-turn cyber benchmark, a 68% improvement in token efficiency, a 19.6% lead over Gemini 3 Flash on Box's enterprise work evaluation set, and 96.4% greater accuracy in data extraction for Life Sciences customers.

That's a Pro-class model wearing a Flash badge, and it explains the launch posture.

Google also used I/O to push Managed Agents into public preview, running in Google-hosted Linux sandboxes, with antigravity-preview-05-2026 as the first general-purpose managed agent. The customer roster reads like a deliberate spread across workflows: Shopify running subagents in parallel for merchant growth forecasts, Macquarie Bank reasoning over onboarding documents of 100+ pages, Salesforce wiring 3.5 Flash into Agentforce subagents, Ramp pointing multimodal understanding at invoice OCR, and Xero running multi-week 1099 workflows. TechCrunch's Rebecca Bellan summarized the framing cleanly: Google is betting its next AI wave on agents, not chatbots. For shops building across providers, model-agnostic infrastructure layers like LemonLime become the natural way to absorb a release like this without rewiring everything downstream.

The competitive picture is where Pro's absence stings. Anthropic spent the same window losing Claude Fable 5 and Claude Mythos 5 after the U.S. government ordered access shut off citing national security concerns. Fable 5 had been out three days, and by Vals AI's benchmarks was the most capable model available to the public at the time. Anthropic protested via blog post: "we disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people."

That episode is a gift to whoever ships next. Google has the slot, the agent infrastructure, the enterprise wiring, and a Flash model that benchmarks above its own previous Pro. What it doesn't have, yet, is 3.5 Pro.

The pattern is familiar. The 2023 GPT-4 cycle showed that flagships shipped into a crowded field get measured against whatever landed the week before, not against the roadmap that promised them. Every additional week that closes without 3.5 Pro hands a competitor a comparison Google would rather avoid.

Sources