AI Tools
Kimi K2.7 Code Just Dropped: The Open-Source Coding Agent That Beats Claude Opus on Tool Use

Most open-source model releases improve incrementally. Kimi K2.7 Code, which appeared on Hugging Face on June 12, 2026, does something different: it tops Claude Opus 4.8 on MCPMark tool use benchmarks — one of the most relevant real-world tests for coding agents — while pricing its API at roughly $0.95 per million input tokens. That is about one-thirtieth the cost of closed frontier alternatives for comparable capability.
This is Moonshot AI's fifth significant model release in under a year. The jump from K2.6 to K2.7 Code is not a full model rewrite. It is a specialist branch focused entirely on coding and agentic software engineering. And based on the early numbers, the focus shows.
Key takeaways
- Kimi K2.7 Code released June 12, 2026 on Hugging Face under a Modified MIT license — commercially usable
- 1 trillion total parameters, 32B active per token, 256K context window, same MoE architecture as K2.6
- Scores 81.1% on MCPMark tool use, beating Claude Opus 4.8 at 76.4% on agent tool-calling benchmarks
- Uses about 30% fewer reasoning tokens than K2.6 while improving code benchmarks by 21.8%
- API pricing: $0.95/M input, $4.00/M output through Moonshot's platform and third-party providers
What is Kimi K2.7 Code?
Moonshot AI, the Beijing-based lab behind the Kimi model family, launched K2.7 Code as a purpose-built derivative of the K2.6 architecture. Where K2.6 was a general-purpose multimodal agent, K2.7 Code narrows its focus: the entire fine-tuning run targets software engineering tasks, tool invocation, and long-horizon coding agents.
The architecture is a 1-trillion-parameter Mixture-of-Experts model with 32 billion parameters active per token across 384 experts. The context window sits at 256K tokens, which accommodates large codebases and multi-file projects. The model is available in full weights on Hugging Face and accessible via API with the model identifier kimi-k2.7-code.
The license is Modified MIT. That means you can use it commercially, fine-tune it, and deploy it in products without royalties or usage restrictions, as long as you comply with Moonshot's specific modification clauses — primarily restricting use in certain jurisdictions.
How does it compare to Kimi K2.6?
The biggest practical improvements Moonshot reports for K2.7 Code over K2.6:
- +21.8% on Kimi Code Bench v2 — the lab's internal coding evaluation
- +11.0% on Program Bench
- +31.5% on MLS Bench Lite, which covers multi-language software tasks
- About 30% fewer thinking tokens consumed per task
That last point matters enormously for anyone running the model at scale through an API. Fewer thinking tokens means lower cost per agentic coding session, faster responses, and less likelihood of hitting context limits mid-task. If you are orchestrating coding agents that make dozens of tool calls per run, a 30% reduction in reasoning token usage compounds quickly into meaningful cost savings.

The benchmark that stands out: 81.1% on MCPMark
MCPMark measures how well a model handles structured tool calls — the backbone of any agentic coding workflow. Models are evaluated on their ability to select the right tool, format inputs correctly, interpret outputs, and chain calls together across multi-step tasks.
Kimi K2.7 Code scores 81.1% on MCPMark. Claude Opus 4.8 — Anthropic's current flagship for complex reasoning and tool use — scores 76.4%. That is a 4.7-point lead on a benchmark that reflects real agentic use cases more closely than most synthetic evals.
This does not mean K2.7 Code outperforms Claude Opus 4.8 across the board. Math reasoning and general knowledge tasks are not where this model is optimized. But for teams running coding agents, automated code review, tool-calling pipelines, or AI-assisted software engineering workflows, tool use performance is the number that determines day-to-day reliability.
One important caveat: Moonshot's benchmark table is first-party only. Independent replication is still pending as of today's release. The results should be treated as directional until external leaderboards confirm them.
How does Kimi K2.7 Code compare to DeepSeek V4 Pro?
The two most serious open-weight coding models right now are Kimi K2.7 Code and DeepSeek V4 Pro (released April 24, 2026). Both are Chinese-developed, both use MoE architectures, and both dramatically undercut closed-model pricing. The differences come down to what you prioritize.
Context window: DeepSeek V4 Pro has a 1 million token context window via API. K2.7 Code caps at 256K. If you need to ingest an entire large codebase in a single prompt, V4 Pro wins by a significant margin.
Raw code benchmark: DeepSeek V4 Pro scores 80.6% on SWE-bench Verified — the highest open-weights entry and within striking distance of frontier closed models. K2.7 Code's equivalent on that specific benchmark is not yet independently confirmed at publication.
Tool use and agents: K2.7 Code's 81.1% MCPMark result is the headline advantage. DeepSeek V4 Pro has not published comparable MCPMark scores.
Pricing: DeepSeek V4 Pro is cheaper on both ends — $0.435/M input and $0.87/M output versus K2.7 Code's $0.95/M input and $4.00/M output. For output-heavy agentic runs where models generate large volumes of code, V4 Pro's output pricing is about 4.6 times lower. That gap adds up in production.
Who should use Kimi K2.7 Code?
K2.7 Code is the right choice if you are building agentic coding pipelines where tool use reliability is the primary constraint, and your context fits within 256K tokens. Teams evaluating open-source alternatives to Claude Opus 4.8 for software engineering tasks — particularly those running automated code review, test generation, or multi-step coding agents — will find the benchmark position genuinely compelling.
If your workloads routinely need 500K+ token contexts for very large codebase analysis, or if output token volume is high enough that 4.6x price difference on output dominates your budget, DeepSeek V4 Pro is the more pragmatic choice for now.
For developers who want to self-host and fine-tune without closed-model constraints, both are worth evaluating. K2.7 Code's Modified MIT license and Hugging Face availability make it as accessible as any frontier-adjacent open-weight model currently available.
How to access Kimi K2.7 Code
The model is live now through Moonshot AI's Kimi platform API using model ID kimi-k2.7-code, on Hugging Face for local deployment, and through third-party providers including DeepInfra. The Hugging Face full weights release means you can run it locally on sufficient hardware without API rate limits or per-token billing.
What today's release means for the open-source AI landscape
A year ago, the coding agent category belonged almost entirely to closed models. GPT-4o, Claude Opus, and Gemini led the benchmarks and came with per-token costs that made high-volume agentic pipelines expensive by default.
The combination of Kimi K2.6, K2.7 Code, and DeepSeek V4 Pro has effectively put frontier-adjacent coding intelligence at open-weights pricing. K2.7 Code's MCPMark result — beating Claude Opus 4.8 on tool use while pricing at roughly $0.95/M input tokens — is the clearest signal yet that the gap between open and closed models on agent benchmarks is closing faster than most predicted.
For teams building software on top of AI agents rather than just using them as chat interfaces, today's release is the kind of development worth pausing to evaluate seriously.
Frequently asked questions
Is Kimi K2.7 Code free to use?
The weights are open and downloadable under a Modified MIT license, which allows commercial use. Running it locally is free beyond compute costs. API access through Moonshot AI or third-party providers is priced at $0.95/M input tokens and $4.00/M output tokens — not free, but among the lowest costs for a frontier-adjacent coding model with these benchmark scores.
How is K2.7 Code different from K2.6?
K2.7 Code is a coding-specialist branch of the K2.6 architecture. It uses the same 1T MoE foundation but was fine-tuned specifically for software engineering tasks, tool invocation, and long-horizon agentic coding. The result is better code benchmarks, 30% lower thinking token usage per task, and higher MCPMark tool use scores — but it is not a general-purpose multimodal model the way K2.6 is.
Does Kimi K2.7 Code beat GPT-5.5 on coding?
Not by current numbers. Moonshot reports that K2.7 Code narrows the gap to GPT-5.5 from 18 points down to 7 points on Code Bench — a significant improvement, but GPT-5.5 still leads on that metric. For tool use specifically, K2.7 Code leads Claude Opus 4.8, but a comparable MCPMark figure for GPT-5.5 is not yet publicly available for a direct comparison.
Was this article helpful?
SaaS Master
Creator behind SaaS Master — tutorials, walkthroughs, reviews, and explainers that help SaaS, AI, and WordPress products get understood and chosen. Writing here about the tools, trends, and tactics that actually move the needle. Work with me →
Want your product explained this clearly — in video?
Tutorials, walkthroughs, reviews, and shorts for SaaS, AI, and WordPress products.
Work With SaaS Master