GLM-5.2 by Zhipu AI: The 1M-Context Open-Source Milestone Model

Zhipu AI's GLM-5.2, released on 2026-06-16, is an MIT-licensed flagship language model with a 1M-token context window, a 128K-token maximum output, and the IndexShare architecture. For developers, it points toward open-source, project-scale coding agents and long-context reasoning systems.

June 16, 2026


Introduction

GLM-5.2 by Zhipu AI (Z.AI), released on 2026-06-16, is a milestone language model release for developers building production AI engineering systems. It is positioned as a flagship foundation model with a 1,000,000-token context window intended to be usable in practice, a 128,000-token maximum output, and MIT-licensed weights available on Hugging Face and ModelScope.

Those numbers matter because long context has often been a headline feature without the throughput or cost profile to make it practical. GLM-5.2 pairs the context window with IndexShare, speculative decoding improvements, and open weights, which is why it stands out among open-source releases for project-scale coding agents, RAG, and autonomous engineering workflows.

Provider: Zhipu AI (Z.AI)
Release date: 2026-06-16
Category: language model
Open source: Yes, MIT license
Strategic significance: flagship open model with 1M usable context and 128K output

Key Features & Architecture

The architecture story centers on making million-token context economically usable. Zhipu AI introduces IndexShare, which it reports reduces per-token FLOPs by a factor of 2.9 at 1M context length. The model also improves speculative decoding by combining MTP with IndexShare and KVShare, yielding a 20% increase in acceptance length. That detail matters for latency-sensitive agent loops, because a longer acceptance length means more drafted tokens are kept per verification step.
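
To make the two reported figures concrete, the sketch below turns them into relative numbers. It is back-of-the-envelope arithmetic, not a measurement: the baseline acceptance length of 2.5 tokens is a hypothetical value chosen for illustration, and treating verification passes as output tokens divided by acceptance length is a simplification of how speculative decoding behaves.

```python
# Back-of-the-envelope arithmetic for the two figures reported in the release.
# Only the 2.9x factor and the 20% increase come from the article; the
# baseline acceptance length below is a hypothetical value for illustration.

FLOPS_REDUCTION_FACTOR = 2.9   # reported: per-token FLOPs at 1M context
ACCEPTANCE_GAIN = 0.20         # reported: acceptance-length increase


def relative_cost(reduction_factor: float) -> float:
    """Per-token FLOPs relative to a baseline normalized to 1.0."""
    return 1.0 / reduction_factor


def verification_passes(output_tokens: int, acceptance_length: float) -> float:
    """Approximate target-model passes needed to emit output_tokens,
    assuming acceptance_length tokens are kept per pass on average."""
    return output_tokens / acceptance_length


flops_ratio = relative_cost(FLOPS_REDUCTION_FACTOR)   # about 0.345 of baseline
flops_saved = 1.0 - flops_ratio                       # about 65.5% fewer FLOPs

baseline_acceptance = 2.5                              # hypothetical baseline
improved_acceptance = baseline_acceptance * (1.0 + ACCEPTANCE_GAIN)

passes_before = verification_passes(128_000, baseline_acceptance)
passes_after = verification_passes(128_000, improved_acceptance)
passes_saved = 1.0 - passes_after / passes_before     # about 16.7% fewer passes

print(f"per-token FLOPs vs baseline: {flops_ratio:.3f}")
print(f"FLOPs saved: {flops_saved:.1%}")
print(f"verification passes saved: {passes_saved:.1%}")
```

Under these assumptions, the share of verification passes saved depends only on the 20% gain, not on the baseline acceptance length, so the hypothetical 2.5 does not change that ratio.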

GLM-5.2 is also framed as a developer-facing model rather than just a chat model. It supports two thinking effort levels, High and Max, so teams can trade latency for deeper reasoning. The release also highlights function calling, context caching, structured output, streaming, and MCP integration, which are the plumbing that agentic systems need.

Parameters: N/A in the release facts provided; do not infer a size class from benchmark claims.
MoE/routing: N/A in the release facts provided; the named architectural contribution is IndexShare.
Input context window: 1,000,000 tokens.
Maximum output: 128,000 tokens.
Multimodal capabilities: N/A / not claimed in the provided release facts; GLM-5.2 is described here as a language model.
IndexShare: reduces per-token FLOPs by 2.9x at 1M context length.
Speculative decoding: 20% acceptance-length increase via MTP with IndexShare and KVShare.
Thinking effort levels: High and Max.
Developer interfaces: function calling, context caching, structured output, streaming, and MCP integration.
License: MIT; weights are available on Hugging Face and ModelScope.
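
The limits in the list above can be checked on the client side before a request is sent. The sketch below is a minimal illustration, not the provider's API: `RequestPlan`, `validate`, and the lowercase effort names are hypothetical, and it assumes the input and output limits are enforced independently, which the release facts do not state.

```python
from dataclasses import dataclass

# Limits taken from the release facts listed above.
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 128_000
# The level names come from the release; the lowercase spelling is an assumption.
THINKING_EFFORTS = ("high", "max")


@dataclass(frozen=True)
class RequestPlan:
    """Hypothetical container for the sizes of a planned request."""
    prompt_tokens: int
    max_output_tokens: int
    thinking_effort: str = "high"


def validate(plan: RequestPlan) -> list[str]:
    """Return a list of human-readable problems; an empty list means the plan fits."""
    problems = []
    if plan.prompt_tokens > MAX_INPUT_TOKENS:
        problems.append(
            f"prompt has {plan.prompt_tokens} tokens, limit is {MAX_INPUT_TOKENS}"
        )
    if plan.max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested {plan.max_output_tokens} output tokens, limit is {MAX_OUTPUT_TOKENS}"
        )
    if plan.thinking_effort not in THINKING_EFFORTS:
        problems.append(f"unknown thinking effort: {plan.thinking_effort!r}")
    return problems


ok_plan = RequestPlan(prompt_tokens=900_000, max_output_tokens=64_000)
bad_plan = RequestPlan(prompt_tokens=1_200_000, max_output_tokens=200_000,
                       thinking_effort="low")

print(validate(ok_plan))    # no problems expected for a plan within both limits
for problem in validate(bad_plan):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets an agent loop report every violation at once, for example before deciding whether to trim the prompt or split the task.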
