AI & SaaS

DeepSeek's Permanent Price Cut Explained: Why the V4-Pro Move Reset the AI Market in 2026

June 9, 20267 min readBy SaaS Master

DeepSeek just changed the math for anyone building on AI. On May 22, 2026, the company made its 75% price cut on the V4-Pro model permanent, dropping standing rates to about $0.435 per million input tokens and $0.87 per million output tokens. A discount that expires is a marketing event. A discount that does not expire is a new market floor — and that is what this is.

I follow pricing moves closely because they decide which tools the SaaS companies I work with can actually afford to ship. This one is bigger than a sale. Here is what happened and what it means for builders.

It helps to understand why pricing has become the loudest story in AI this year. The frontier capability gap between the top models has narrowed — most of them can handle the everyday tasks businesses actually run. When the products are close enough in quality, competition moves to the next lever, and that lever is cost. DeepSeek has chosen to compete almost entirely on price, and by locking in a permanent floor it is forcing every rival to decide what they are really selling: raw capability, trust, integration, or just cheaper tokens.

Key takeaways

DeepSeek made its 75% V4-Pro discount permanent on May 22, 2026, retroactive to the model's April 24 launch.
New standing rates: about $0.435/M input and $0.87/M output, with cache hits at roughly $0.0036/M — among the lowest first-party frontier-model prices anywhere.
V4-Pro is now roughly 11.5x cheaper than GPT-5.5 on input and about 34x cheaper on output.
The model is no slouch: around 80.6% on SWE-bench Verified, competitive with frontier proprietary models.
Expect pressure on every other provider's budget tier — the "cheap AI" floor just dropped again.

What exactly changed?

DeepSeek V4-Pro launched on April 24, 2026 with a full list price of $1.74 / $3.48 per million tokens and a promotional 75% discount that was supposed to run only through May 31. Instead, on May 22 the company announced the promo rate is now the permanent list rate — retroactive to launch and forward indefinitely. The expiring discount became the price.

That puts standing rates at roughly $0.435 per million input tokens and $0.87 per million output tokens. The cache-hit price fell to about $0.0036 per million tokens, which DeepSeek positions as the lowest first-party cache price for a frontier-class model in 2026. For workloads that reuse a lot of context — RAG systems, long system prompts, repeated document analysis — that cache pricing alone can reshape a budget.

Key statistics on DeepSeek V4-Pro permanent pricing

Why this is a big deal, not just a cheap option

Cheap models are not new. What makes this notable is that V4-Pro is cheap and genuinely capable. It scores around 80.6% on SWE-bench Verified — the benchmark for resolving real GitHub issues — which places it in direct competition with frontier proprietary models rather than in the budget bargain bin. It is widely regarded as one of the strongest open-weight reasoning models available, behind only Kimi K2.6.

Stack that quality against the price and the comparison gets stark. V4-Pro is roughly 11.5x cheaper than GPT-5.5 on input tokens and about 34x cheaper on output. For a product making millions of model calls a month, that is not a line-item difference — it is the difference between a feature being viable and being shelved.

What it means for the rest of the market

When a capable model sets a permanent floor this low, everyone else has to respond. We are already seeing the budget tier from Western labs drift in interesting directions — Google priced the new Gemini 3.5 Flash at roughly 3x the model it replaced, betting that quality and speed justify a premium rather than racing DeepSeek to the bottom. That is the strategic fork this move forces: compete on price and lose margin, or compete on trust, integration, and reliability and defend a premium.

For builders, the practical takeaway is that a routing strategy is now close to mandatory if cost matters. Send the easy, high-volume requests to an ultra-cheap model like V4-Pro, and reserve a premium Western model for the requests where reliability, data governance, or nuance justify the cost. Treating one model as your answer for everything is leaving money on the table in 2026.

The catch worth naming

The honest caveat is data governance. V4-Pro is a Chinese-developed model, and where your requests are processed matters for regulated industries and for some enterprise buyers. The capability and price are real; whether they fit your compliance posture is a separate decision that no benchmark answers for you. For non-sensitive, high-volume work, the value is hard to beat. For regulated data, the calculus is different — and that nuance is exactly why routing, not wholesale switching, is the smart play.

What should builders actually do this week?

If you are running on a premium model today, the first move is simple: audit where your tokens go. Most products have a handful of high-volume, low-stakes operations — summaries, tagging, classification, first-draft generation — that consume the bulk of the bill and do not need a frontier model. Those are the obvious candidates to route to V4-Pro, and the savings show up immediately without users noticing a quality drop.

The second move is to set up a fallback. One practical pattern is to send a request to the cheap model first, run a quick quality check on the output, and escalate to a premium model only when the check fails. That keeps the average cost close to the cheap tier while protecting the cases that genuinely need more horsepower. It takes a day to wire up and pays for itself almost immediately at any real volume.

Why permanent pricing changes behavior

There is a psychological piece here that is easy to miss. Teams hesitate to build on promotional pricing because they have been burned before — you architect around a cheap rate, the promo ends, and your unit economics break overnight. By making the cut permanent and retroactive, DeepSeek removed that hesitation. Builders can now design systems that assume this price floor will hold, which is exactly the kind of confidence that drives adoption. A temporary discount wins trials; a permanent one wins production deployments. That distinction is why this move will ripple further than the headline number suggests.

Frequently asked questions

How much does DeepSeek V4-Pro cost now?

About $0.435 per million input tokens and $0.87 per million output tokens, with cache hits near $0.0036 per million. These are permanent list rates as of May 22, 2026, not a limited-time promotion.

Is DeepSeek V4-Pro actually good, or just cheap?

Both. It scores around 80.6% on SWE-bench Verified, competitive with frontier proprietary models, and is considered one of the top open-weight reasoning models available.

Should I switch everything to DeepSeek to save money?

Not necessarily. The smart approach is routing — use V4-Pro for high-volume, non-sensitive tasks and keep a Western model for regulated data or work where reliability and vendor trust matter most.

DeepSeekV4-ProLLM pricingAI costAPI2026

SaaS Master

Creator behind SaaS Master — tutorials, walkthroughs, reviews, and explainers that help SaaS, AI, and WordPress products get understood and chosen. Writing here about the tools, trends, and tactics that actually move the needle. Work with me →