Reviews · JULY 3, 2026
Claude Sonnet 5 lands at 63.2% on SWE-bench Pro, six points off Opus 4.8
Anthropic's new default Sonnet ships June 30 at an introductory $2/$10 per million input/output tokens, with an updated tokenizer that inflates token counts for the same text by 1.0–1.35× and the first real-time cyber safeguards on a Sonnet-class model.
Anthropic shipped Claude Sonnet 5 on June 30, scoring 63.2% on SWE-bench Pro against Opus 4.8's 69.2% and its own predecessor Sonnet 4.6 at 58.1%. It's now the default model on the Free and Pro plans, and available on Max, Team, and Enterprise.
The headline for buyers is the sticker: $2 per million input tokens and $10 per million output tokens, introductory, through August 31, 2026. After that, pricing reverts to $3/$15, described by Anthropic as "unchanged from Sonnet 4.6."
That "unchanged" is doing work. Sonnet 5 ships with a new tokenizer, the same architectural swap Anthropic made with Opus 4.7, and it inflates token counts by 1.0–1.35× for the same input depending on content type. Platform docs put the practical hit at "approximately 30% more tokens for the same text." At standard pricing and that 30% figure, a workload that cost $15 in output on Sonnet 4.6 quietly becomes roughly $19.50 on Sonnet 5. The introductory window absorbs that gap; September doesn't. Anthropic acknowledges the arithmetic directly, if delicately: "The introductory pricing is set so that the transition to Sonnet 5 is roughly cost-neutral." Cost-neutral for whom, and for how long, is the actual product question.
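For readers who want to rerun that arithmetic on their own workloads, here is a minimal sketch in Python. The prices and inflation factors are the ones quoted above; the function name and the assumption that inflation applies uniformly to output tokens are ours, not Anthropic's, and real workloads will vary by content type.

```python
# Output prices in USD per million tokens, as quoted in this article.
STANDARD_OUTPUT = 15.00  # Sonnet 4.6, and Sonnet 5 after August 31, 2026
INTRO_OUTPUT = 10.00     # Sonnet 5 introductory rate

def sonnet5_output_cost(old_millions, price_per_million, inflation):
    """Cost on Sonnet 5 of output that measured `old_millions` million
    tokens under the old tokenizer (hypothetical helper; assumes the
    inflation factor applies uniformly to the whole workload)."""
    return old_millions * inflation * price_per_million

if __name__ == "__main__":
    baseline = 1.0 * STANDARD_OUTPUT  # what 1M old-tokenizer tokens cost on Sonnet 4.6
    for inflation in (1.0, 1.30, 1.35):
        intro = sonnet5_output_cost(1.0, INTRO_OUTPUT, inflation)
        standard = sonnet5_output_cost(1.0, STANDARD_OUTPUT, inflation)
        print(f"x{inflation:.2f}: baseline ${baseline:.2f}, "
              f"intro ${intro:.2f}, standard ${standard:.2f}")
```

Under these assumptions, 30% inflation puts the same output at about $13 during the introductory window and $19.50 afterward, against a $15 Sonnet 4.6 baseline.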
The benchmark story is the more interesting one. Sonnet 5 hits 80.4% on Terminal-Bench 2.1 against Opus 4.8's 82.7% (Sonnet 4.6 was at 67.0%), and it lands within statistical noise of Opus 4.8 on Humanity's Last Exam with tools (57.4% vs. 57.9%) and GDPval-AA v2 (1,618 vs. 1,615). OSWorld-Verified moves from a revised 78.5% baseline to 81.2%. A mid-tier model drawing effectively level with the flagship on two of five headline benchmarks is the sort of compression that has defined every Sonnet release since the 3.5 cycle, and it's why Anthropic can treat Sonnet, not Opus, as the workhorse SKU.
The system card is careful. Sonnet 5 is "somewhat stronger than its predecessor" on cyber tasks but remains "significantly less capable" than Mythos 5, "does not advance our capability frontier" against Opus- or Mythos-class models, and poses "very low alignment risk," though higher than prior Sonnets. It doesn't cross the CB-2 threshold or trip the automated AI R&D triggers. It is, however, the first Sonnet-tier model to ship with real-time cyber safeguards, a detection layer previously reserved for Opus 4.7 and 4.8. Refusals now return HTTP 200 with stop_reason: 'refusal'; manual extended thinking and non-default temperature, top_p, or top_k throw 400s. Adaptive thinking is on by default. Priority Tier is unavailable. Context is 1M tokens, max output 128k.
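For integrators, the practical consequence is that a 200 status no longer implies a usable completion. The sketch below shows one way client code might account for that, based only on the behavior described above. Both helper names are hypothetical, the response body is treated as an already-parsed JSON dict with a top-level stop_reason field, and nothing here calls a real SDK or endpoint. It does not handle the manual extended thinking case.

```python
# Sampling parameters that, per the article, trigger a 400 when set to
# non-default values on Sonnet 5.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")

def strip_sampling_overrides(request: dict) -> dict:
    """Return a copy of a request payload without sampling overrides
    (hypothetical helper; leaves every other field untouched)."""
    return {k: v for k, v in request.items() if k not in SAMPLING_KEYS}

def classify_response(status_code: int, body: dict) -> str:
    """Map an HTTP status and parsed JSON body to a coarse outcome label."""
    if status_code == 400:
        return "invalid_request"   # e.g. an unsupported parameter was sent
    if status_code == 200 and body.get("stop_reason") == "refusal":
        return "refusal"           # successful HTTP call, but the model declined
    if status_code == 200:
        return "ok"
    return "other_error"

if __name__ == "__main__":
    payload = {"max_tokens": 1024, "temperature": 0.2, "messages": []}
    print(strip_sampling_overrides(payload))
    print(classify_response(200, {"stop_reason": "refusal"}))
```

The point of the design is simply that retry and alerting logic keyed on HTTP status alone would count a refusal as a success, so the stop reason has to be inspected as well.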
Read the pricing calendar and the tokenizer swap together and Sonnet 5 is less a discount than a repricing dressed as one, timed to a two-month window in which the meter runs slower than the model actually eats.
Sources
- https://www.anthropic.com/news/claude-sonnet-5
- https://www.anthropic.com/claude-sonnet-5-system-card
- https://techcrunch.com/2026/06/30/anthropic-launches-claude-sonnet-5-as-a-cheaper-way-to-run-agents/
- https://platform.claude.com/docs/en/about-claude/models/whats-new-sonnet-5
- https://venturebeat.com/technology/anthropic-launches-claude-sonnet-5-at-a-steep-discount-to-its-top-model-as-the-company-races-toward-a-blockbuster-ipo