Moonshot AI’s kimi-k2.7-code brings open-weight coding, 256k context, multimodal input, tool calling, JSON mode, partial mode, and cheaper cached inference for developers building code agents.

Moonshot AI has released kimi-k2.7-code, a new open-weights coding model designed for software engineering, autonomous agents, long-context reasoning, and multimodal development workflows. Announced on 2026-06-12, kimi-k2.7-code is positioned as a direct successor to Kimi K2.6, with Moonshot AI reporting major gains across coding and software benchmark suites while reducing the token cost of internal reasoning.
For developers, the headline is not just better benchmark performance. It is the combination of open weights, 256k-token context, long-thinking support, tool calling, JSON mode, partial mode, automatic context caching, and multimodal text-image-video input in one coding-oriented model. That mix makes kimi-k2.7-code relevant for code completion, repo-level refactoring, agent orchestration, RAG over large codebases, and AI IDE integrations.
kimi-k2.7-code is an open-weights coding model from Moonshot AI. The release information does not disclose a fixed parameter count, but Moonshot frames the model as an open-weights system optimized for code, reasoning, and long-context software tasks. Compared with K2.6, kimi-k2.7-code is designed to spend fewer tokens internally while producing stronger outputs, which matters for latency, routing, and agent cost control.
The model supports a 256k-token context window with long-thinking and deep reasoning capabilities. In practical terms, that means it can process large repositories, lengthy design documents, multi-file diffs, test suites, logs, and product specifications without forcing developers to aggressively chunk every input. Moonshot also describes kimi-k2.7-code as using a native multimodal architecture, supporting text, image, and video input rather than treating visual data as an external add-on.
Moonshot AI reports that kimi-k2.7-code improves substantially over K2.6 across several coding and software-engineering benchmarks. The strongest reported gain is on Kimi Code Bench v2, where kimi-k2.7-code delivers a +21.8% improvement over K2.6. That benchmark is especially relevant because it is Moonshot’s own coding benchmark and likely reflects the kinds of software tasks the company is optimizing for: code generation, debugging, repository reasoning, and tool use.
The model also improves by +11% on Program Bench and +31.5% on MLS Bench Lite compared with K2.6. Those numbers suggest broad gains across both traditional program-level coding tasks and more complex software-engineering scenarios. Perhaps just as important for production systems, Moonshot reports 30% fewer tokens used during internal reasoning versus K2.6. For agentic workloads that run long chains of tool calls, test-generation loops, and self-correction steps, that reduction can translate into faster runs and lower inference cost.
Moonshot AI’s verified API pricing for kimi-k2.7-code is $0.95 per million input tokens, $0.19 per million input tokens on cache hit, and $4.00 per million output tokens. The verified context window for pricing is 262,144 tokens, which aligns with the advertised 256k context window. The cache-hit input price is especially important for long-context coding agents, because repository indexing, repeated file scans, and multi-turn debugging sessions can reuse the same prompt context across many calls.
Developers should treat kimi-k2.7-code as a premium coding model rather than a low-cost chat model. At $4.00 per million output tokens, it is best suited for high-value software tasks such as architectural refactoring, multi-file debugging, test generation, agent planning, and production code review. For repetitive or latency-sensitive workloads, Moonshot’s separate 6x High-Speed mode is expected to arrive soon, while the open-weights release gives teams another path for self-hosting or custom deployment.
kimi-k2.7-code is best suited for developer workflows where context depth and reasoning quality matter more than raw chat throughput. The most obvious use case is coding assistance inside IDEs: generating code, explaining errors, rewriting modules, writing tests, reviewing pull requests, and producing structured patch suggestions. Partial Mode is particularly useful here because IDEs can render incremental model output while the model continues reasoning.
Developers can access kimi-k2.7-code through Moonshot AI’s API, via HuggingFace for open-weight deployment, and inside Kimi Code IDE. For API integrations, teams should start by enabling context caching where repeated repository context is likely to be reused. This is especially important for long-context agent loops, where the same files, specifications, or test outputs may be referenced repeatedly across multiple tool calls.
A practical first workflow is to use kimi-k2.7-code for high-value tasks rather than every autocomplete request. Start with repository-level debugging, test creation, pull-request review, or architecture analysis. Use JSON Mode when the model output must be consumed by another system, ToolCalls when the agent needs to inspect files or run commands, and Partial Mode when building interactive developer tools that need streaming responses.
API Pricing — Input: $0.95 / Output: $4.00 / Context: 262,144 tokens