Elon Musk announced on June 28 that Grok 4.5, sitting on top of xAI's new 1.5-trillion-parameter V9 foundation model, has entered private beta at SpaceX and Tesla and nowhere else. The test population, in other words, consists of two companies he owns.

That framing matters because everything else in the announcement is internal too. The parameter count is internal. The performance claim, that Grok 4.5 is "close to, perhaps exceeding" Anthropic's Claude Opus, comes from Musk's own posts with no published methodology, no test suite, and no scores. The supplemental training corpus is reportedly drawn from Cursor, the AI code editor that SpaceX has agreed to acquire for $60 billion. Put plainly: an asset a Musk company is acquiring, feeding a model Musk owns, being evaluated inside two companies Musk runs.

The scale jump itself is real, or at least internally consistent. V9 at 1.5T is three times the size of the 0.5-trillion-parameter v8-small currently handling Grok workloads on X. Crypto Briefing frames it as a 50% increase in parameter count over a "Grok 4.4" at 1 trillion parameters from late May, though that intermediate version isn't corroborated elsewhere and is best treated as unverified. The release schedule has already slipped: V9 was originally targeted for late May, and Prokerala cited a mid-June public release for Grok 4.5. Neither landed.
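
The parameter ratios quoted above can be checked in a few lines of Python. This is a minimal sketch that uses only the figures reported in the coverage; the dictionary and function names are illustrative assumptions, and the "Grok 4.4" entry is included only to verify the arithmetic, not to confirm that the version exists.

```python
# Parameter counts in trillions, as quoted in the coverage.
# The "Grok 4.4" figure is unverified and appears here only for the arithmetic.
PARAMS_T = {
    "v8-small": 0.5,
    "Grok 4.4 (unverified)": 1.0,
    "V9": 1.5,
}


def scale_ratio(new: str, old: str) -> float:
    """Return how many times larger `new` is than `old`."""
    return PARAMS_T[new] / PARAMS_T[old]


def percent_lift(new: str, old: str) -> float:
    """Return the percentage increase in parameter count from `old` to `new`."""
    return (scale_ratio(new, old) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"V9 vs v8-small: {scale_ratio('V9', 'v8-small'):.1f}x")
    print(f"V9 vs Grok 4.4: {percent_lift('V9', 'Grok 4.4 (unverified)'):.0f}% lift")
```

Both reported comparisons are consistent with the stated counts. That confirms only that the numbers agree with one another, not that any of them is accurate.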

The Cursor data is being positioned as a coding-and-technical-competence upgrade, paired with an internal harness Musk calls "Grok Build" and describes as "becoming better every day." He has also pledged a cadence of fully-from-scratch models shipped via SpaceX every month through year-end, with Grok 5 variants projected up to 10 trillion parameters.

Set this against what xAI itself published for Grok 4 Heavy on third-party benchmarks: 50.7% on Humanity's Last Exam, 15.9% on ARC-AGI V2, 61.9% on USAMO'25, and a $4,694.15 net-worth figure on Vending-Bench. Those are numbers outside observers can argue with. Grok 4.5 has none of them yet. Wikipedia's entry on the Grok chatbot, summarizing independent coverage, already notes that many xAI-published metrics derive from internal evaluations or community leaderboards rather than peer-reviewed benchmarks.

The deeper pattern is the one worth naming. The 2023–2024 frontier-model cycle established a convention, weak but real, that capability claims arrived alongside external evals. Grok 4.5 inverts that: scale disclosed, capability asserted, access withheld, evaluation conducted in-house. The beta isn't a test of the model. It's a test of whether the announcement is enough.

Sources

https://x.ai/news/grok-4
https://en.wikipedia.org/wiki/Grok_(chatbot)
https://cryptobriefing.com/grok-4-5-private-beta-spacex-tesla/
https://www.prokerala.com/news/articles/a1781158.html
https://cryptobriefing.com/xai-grok-4-5-v9-model-upgrade/