AI Model Report

Reviews · JUNE 23, 2026

OpenAI ships GPT-5.5-Cyber at 85.6% on CyberGym and pivots Daybreak from finding bugs to patching them

The full release lands alongside Patch the Planet, a Trail of Bits-led open-source remediation effort that has already merged dozens of patches across 19 projects — including a Firefox WebAssembly flaw fixed two days before Pwn2Own Berlin.

By Karl Strauchman · Senior model reviewer · June 23, 2026

OpenAI shipped the full version of GPT-5.5-Cyber on June 22, 2026, scoring 85.6% on CyberGym against the base GPT-5.5's 81.8%, and paired the release with Patch the Planet, a Trail of Bits-led remediation program already counting more than 30 committed open-source projects. The benchmark lift is 3.8 points. The framing shift is bigger.

For most of the past two years, the offense-defense debate around frontier models has been stuck on a single question: can the model find the bug? Daybreak's previous work, and the Five Eyes warning earlier this year that advanced AI hacking models were months rather than years away, both kept the focus on discovery. GPT-5.5-Cyber's pitch is different. It's scoped to the side of the loop that doesn't trend on Twitter: closing the bug after someone or something else finds it.

The results from the initial five-day sprint read like a maintainer's nightmare ledger. Across 19 projects, the system surfaced hundreds of issues, and dozens of patches were merged. Coverage of the Linux kernel spanned more than 30 million lines of code and produced 8 kernel-pointer information-leak PoCs and 24 local privilege-escalation exploits. FreeBSD yielded 34 confirmed vulnerabilities and 7 local-privesc PoCs. OpenBSD turned out to have been shipping a 23-year-old use-after-free in its kernel implementation of System V semaphores. Six dnsmasq bugs. An HTTP/2 Bomb denial-of-service technique affecting NGINX, Apache, and IIS.

The Firefox detail is the one to sit with. Mozilla patched a WebAssembly flaw two days before Pwn2Own Berlin, and 5 of the 6 Firefox entries registered for the contest withdrew. A defensive tool measurably altered the economics of an offensive showcase, in public, on a clock.

Patch the Planet's existence is itself a structural read on open source. SiliconANGLE cites Linux Foundation and Harvard research showing that in 94% of widely used open-source projects studied, fewer than 10 developers contribute more than 90% of code added in a year. The supply chain everyone depends on is a few people answering email. Committed participants now include cURL, the Go project, Sigstore, pyca/cryptography, NATS Server, aiohttp, freenginx, and python.org, with Trail of Bits and HackerOne coordinating the workflow.

The commercial scaffolding around all this is the Daybreak Cyber Partner Program, seeded with Accenture, Cisco, CrowdStrike, IBM, Okta, Palo Alto Networks, and Wiz. Codex Security, which entered research preview around March 2026, has scanned more than 30,000 codebases and more than 30 million commits, with more than 70,000 findings marked as fixed by human reviewers.

The interesting question isn't whether GPT-5.5-Cyber is the strongest offensive model on the market. It's whether OpenAI has decided the more defensible product surface is patches, not exploits. Frontier labs have spent two years insisting the offense-defense balance would tilt toward defenders eventually. This is the first release that treats that claim as a roadmap rather than a talking point.

Sources