AI Model Report

Open Source · JUNE 24, 2026

GLM-5.2 lands at 744B parameters, MIT-licensed, and tied with Opus 4.8 on long-horizon coding

Z.ai's open-weights flagship debuts at #1 on open-source coding boards with a 1M-token context, IndexShare cutting per-token FLOPs 2.9x, and API pricing roughly one-sixth of GPT-5.5's.

By Lars Iverson · Open source & model weights · June 24, 2026

Z.ai shipped GLM-5.2 to its GLM Coding Plan subscribers on June 13, 2026, then three days later pushed MIT-licensed weights, a standalone API, and a chatbot into the open. The frame matters as much as the artifact: a 744–753 billion-parameter Mixture-of-Experts model with roughly 40B active parameters per token, posted to Hugging Face under zai-org/GLM-5.2 with what the release statement calls "no regional limits" and "technical access without borders." The closed-frontier labs now have a free, redistributable competitor sitting one decimal place behind them on the benchmarks that matter for software engineering.

The numbers are the story. On SWE-bench Pro, GLM-5.2 scores 62.1, ahead of GPT-5.5 at 58.6 and the prior GLM-5.1 at 58.4, with Gemini 3.1 Pro trailing at 54.2%. On FrontierSWE, designed to measure long-horizon task completion, it lands at 74.4%, effectively tied with Claude Opus 4.8 at 75.1% and ahead of GPT-5.5 at 72.6%. MCP-Atlas, the tool-use eval, puts it at 77.0 against Opus 4.8's 77.8 and GPT-5.5's 75.3. These are open weights drawing even with the frontier on coding work.

The architectural play is IndexShare, which reuses sparse-attention top-k indices across every four layers and cuts per-token compute by 2.9x at the 1-million-token context length. A refreshed Multi-Token Prediction layer lifts accepted token length by up to 20% during speculative decoding. Together with inherited MLA and DSA techniques familiar from earlier GLM and DeepSeek-style designs, the model serves a 1M-token context and a 131,072-token output ceiling at costs the incumbents can't currently match.

That's the second story. Z.ai's API runs roughly $1.40 per million input tokens and $4.40 per million output, with enterprise plans starting at $12.60 a month. On Artificial Analysis's AA-Briefcase, GLM-5.2 averaged $2.40 per task versus Opus 4.8's $10.40 and GPT-5.5 xhigh's $3.68. Elo standings still favor the frontier (Opus 4.8 at 1356, GLM-5.2 at 1266, Claude Fable 5 at 1587), and AA-Briefcase remains punishing enough that the top model satisfies all rubric criteria on just 3% of tasks. But the cost-adjusted picture is what enterprise procurement actually sees.

Endpoints at https://api.z.ai/api/coding/paas/v4 are Anthropic-compatible, with out-of-box support for Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, and Kilo Code, plus GGUF builds on llama.cpp and Unsloth. Writing in the Latent Space digest, Sebastian Raschka tracked the architectural lineage; Jeremy Howard, who flagged the absent vision support, called the model "at least as good as Opus 4.8 and GPT 5.5" in daily use.

The skepticism is professional. Lian Jye Su, an analyst at Omdia, cautioned that the claims "still need wider validation, particularly around hallucination control and coherence during extended tasks." That caveat is the live question for every long-horizon agentic deployment. What's no longer in question is whether an open-weight MoE can sit at the table.

Sources