AI Tools

MiniMax M3 vs Qwen 3.7 Max: China's Two Newest Frontier AI Models Compared (2026)

June 17, 20268 min readBy SaaS Master

Two Chinese frontier AI models dropped within two weeks of each other in June 2026, and both claim to beat GPT-5.5 on at least one benchmark while costing a fraction of the price. MiniMax M3 launched June 1 as the first open-weight model to combine frontier coding, a 1M-token context window, and native multimodality in a single package. Qwen 3.7 Max arrived May 19 from Alibaba, positioning itself as the "Agent Frontier" — built specifically for tasks that run for hours and require thousands of tool calls without losing context.

These are not midtier models catching up to the West. They are competing directly with Claude Opus 4.7 and GPT-5.5 on hard benchmarks, and on price they are not even in the same universe. Here is what actually separates them and which one belongs in your stack.

Key takeaways

MiniMax M3 is open-weight and multimodal: it accepts text, image, and video inputs for $0.60 per million input tokens, making it the most affordable frontier-class multimodal model available right now.
Qwen 3.7 Max is API-only and agentic-first: at $2.50 input / $7.50 output per million tokens, it targets long-horizon autonomous agents that run for hours, not quick chat interactions.
On coding, Qwen 3.7 Max scores higher on SWE-bench Pro (60.6% vs 59.0%) and Terminal-Bench (69.7% vs 66.0%). On web browsing and agentic task averages, MiniMax M3 leads.
MiniMax M3's 50% promotional launch discount currently brings input cost to roughly $0.30 per million tokens — about one-tenth the cost of Claude Opus 4.7.
Neither model is a Western product. Both are Chinese labs, both are credibly frontier-class, and both are available globally on OpenRouter right now.

What is MiniMax M3?

MiniMax is a Shanghai-based AI lab that has been quietly building models since 2021. M3, released June 1, 2026, is their flagship and the first model to genuinely combine three capabilities that previously required separate models: frontier-level coding performance, a full 1-million-token context window, and native multimodal input across text, images, and video.

The architecture behind M3 is called MiniMax Sparse Attention (MSA). Instead of running full attention across all tokens — which gets extremely expensive at a million-token context — MSA selects which key-value blocks to attend to at each step. The practical result is that M3 can handle long-context tasks without the quadratic compute cost that normally makes long-context models prohibitively expensive to run.

On benchmarks, M3 scores 59.0% on SWE-bench Pro (the hard version of the popular coding benchmark), 66.0% on Terminal-Bench 2.1, and 83.5 on BrowseComp — a web browsing agent test. MiniMax claims M3 surpasses GPT-5.5 and Gemini 3.1 Pro on SWE-bench Pro and approaches Claude Opus 4.7. Independent benchmark sites show results in that range, though some place it slightly below Opus 4.7 on coding and reasoning.

The pricing is what makes M3 notable beyond the benchmarks. At launch it listed on OpenRouter at $0.60 per million input tokens and $2.40 per million output tokens, with a 50% promotional discount that brings it to roughly $0.30 input and $1.20 output. To put that in context: Claude Opus 4.7 runs $15 per million input tokens. GPT-5.5 is similarly priced. M3 at its regular rate is 25 times cheaper per input token than Opus — and it is multimodal.

M3 is also open-weight, meaning you can download the model and run it on your own infrastructure. That matters for teams with data privacy requirements, on-premise deployments, or who want to fine-tune.

What is Qwen 3.7 Max?

Qwen is Alibaba Cloud's AI model family, and Qwen 3.7 Max is their current flagship, announced May 19, 2026. The entire release was framed around a single positioning: "The Agent Frontier." Alibaba is not pitching this model for chat or quick lookups. They are pitching it for workflows that run for hours, span hundreds or thousands of tool calls, and require the model to maintain coherent context and decision-making across all of them.

The most striking demonstration in the release materials is a 35-hour autonomous kernel optimization run. The model completed 432 kernel evaluations and 1,158 tool calls and achieved a 10x geometric mean speedup over a standard reference implementation — without human intervention over the course of that 35-hour window. That is a qualitatively different use case than most AI tools are designed for.

On benchmarks, Qwen 3.7 Max leads MiniMax M3 on coding: 60.6% on SWE-bench Pro vs M3's 59.0%, and 69.7% on Terminal-Bench vs M3's 66.0%. On reasoning, Qwen 3.7 Max hits 92.4% on GPQA Diamond, which beats Claude Opus 4.7's 91.3% on the same test — a notable result for a non-Western model. Its score on the HMMT 2026 February math competition is 97.1%, also ahead of Opus 4.7.

Pricing for Qwen 3.7 Max is $2.50 input / $7.50 output per million tokens — significantly more expensive than M3, but still less than Claude Opus 4.7 or GPT-5.5. Alibaba's prompt caching system reduces costs dramatically for repeated context: cached input costs $0.25 per million tokens, a 90% discount, and in agentic workflows where system prompts are large and repeated, caching can cut effective costs by 60 to 80 percent.

Unlike M3, Qwen 3.7 Max is API-only. There is no downloadable version. You access it through Alibaba Cloud Model Studio or via API gateways. It supports both OpenAI-compatible and Anthropic-compatible API formats, which means you can swap it into stacks already built on either without rewriting your client code.

Which is cheaper, Qwen 3.7 Max or MiniMax M3?

At standard pricing, M3 is about 4x cheaper on input tokens ($0.60 vs $2.50) and about 3x cheaper on output tokens ($2.40 vs $7.50). For teams running high-volume inference — document processing, content pipelines, analysis at scale — that difference is real money.

However, Qwen 3.7 Max's caching discount changes the math for agentic workloads. If your system prompt is large (say, 50,000 tokens) and you are running it against thousands of queries, paying $0.25 per million cached input tokens versus $0.60 per million uncached on M3 makes Qwen competitive on actual cost-per-query depending on your cache hit rate.

For low-cache workloads or for teams processing mixed media where M3's multimodal input is needed, M3 wins on cost. For long-running agentic pipelines with repetitive context, Qwen 3.7 Max's caching makes it more competitive than the headline pricing suggests.

Which model is better for coding?

Qwen 3.7 Max has the edge on coding benchmarks: 60.6% on SWE-bench Pro vs MiniMax M3's 59.0%, and 69.7% on Terminal-Bench vs 66.0%. Those gaps are real but not wide — these are two models in the same tier, not a generation apart.

What I find more useful than the headline numbers is the context of the 35-hour run. Qwen 3.7 Max is designed to maintain coherent coding strategy over extremely long sessions. Most coding agents break down after 20 or 30 tool calls. If your use case is a full autonomous coding pipeline — not a single PR, but a multi-day development workflow — Qwen 3.7 Max appears to handle that better based on Alibaba's own demonstration.

For everyday coding tasks, pair programming, or shorter automation scripts, the gap between the two models is small enough that M3's lower cost tips the balance, especially if you are already sending image or video inputs alongside the code.

Which model handles agentic tasks better?

This is the one area where M3 leads on aggregate benchmark scores: 71.9 average on agentic tasks vs Qwen 3.7 Max's 69.7. M3 also leads on BrowseComp at 83.5, which measures web browsing agent performance.

However, Qwen 3.7 Max's entire design philosophy is agentic. The 35-hour kernel run is the kind of result that does not show up cleanly in standard benchmarks, which typically cap runs at a few hundred tool calls. For truly long-horizon agents — the kind that run overnight, manage their own tool errors, and maintain goal coherence across thousands of steps — Qwen 3.7 Max is probably the right choice based on design intent and the evidence Alibaba has published.

For shorter agentic tasks, web browsing agents, or mixed-media workflows, M3 is strong and significantly cheaper.

Who should use MiniMax M3?

M3 is the right choice if you need multimodal input at frontier-class quality without paying frontier-class prices. If your workflow involves analyzing images or video alongside text — product screenshots, UX review, document parsing with visual elements — M3 handles all of that natively at $0.60 per million input tokens.

M3 is also the right call if you want to run the model on your own infrastructure. Open-weight means you can host it, fine-tune it, and control your data. For teams with compliance requirements or who are building products on top of a model they do not want to depend on a third-party API for, that is a meaningful advantage.

Who should use Qwen 3.7 Max?

Qwen 3.7 Max is for teams building long-horizon autonomous agents where the task runs for hours, not minutes. If you are building a coding agent that takes a ticket and produces a complete PR with tests, or a research agent that synthesizes thousands of documents, or a DevOps agent that monitors and responds to production events overnight, Qwen 3.7 Max is designed explicitly for that. Its GPQA Diamond score of 92.4% and IMOAnswerBench score of 90 also make it one of the strongest reasoning models available for math and science tasks.

The OpenAI-compatible and Anthropic-compatible APIs make migration easy if your team already has a working agent stack built on either ecosystem.

Frequently asked questions

Is MiniMax M3 really open-weight?

Yes. MiniMax M3 is open-weight, meaning the model weights are publicly downloadable. You can run it on your own hardware, fine-tune it, or deploy it in air-gapped environments. This is different from Qwen 3.7 Max, which is API-only and not downloadable.

Does Qwen 3.7 Max work with the OpenAI API format?

Yes. Qwen 3.7 Max supports both OpenAI-compatible and Anthropic-compatible API formats. You can point your existing OpenAI SDK calls at the Qwen 3.7 Max endpoint with minimal code changes.

Which Chinese AI model is better for most users?

For most users who want a capable, affordable model for a mix of tasks, MiniMax M3 is the better starting point. It is multimodal, cheaper, open-weight, and scores competitively on coding and agentic benchmarks. Qwen 3.7 Max is the stronger choice if you are specifically building long-running autonomous agents or need maximum reasoning performance on math and science.

MiniMax M3 Qwen 3.7 Max Chinese AI models AI model comparison frontier AI LLM comparison

Was this article helpful?

SaaS Master

Creator behind SaaS Master — tutorials, walkthroughs, reviews, and explainers that help SaaS, AI, and WordPress products get understood and chosen. Writing here about the tools, trends, and tactics that actually move the needle. Work with me →