22 min read · Critique

DeepSeek & MiMo Are 0.5 Credits Forever — Plus Opus 4.8, Qwen3.7-Max, and 2026's Cheapest Frontier Review Stack

Permanent catalog cuts on DeepSeek V4 and MiMo, plus Opus 4.8, Ring-2.6-1T, Gemini 3.5 Flash, Grok Build, and Step 3.7 Flash — with benchmarks that explain why the price war finally matters for PR review.

0.5 cr: permanent floor for DeepSeek V4 Flash & MiMo v2.5
1 cr: permanent floor for DeepSeek V4 Pro & MiMo v2.5 Pro
Up to 99%: max MiMo API cut in Xiaomi's permanent repricing (May 27)
+100: one-time reader credits at the end of this essay

If you have been waiting for “good enough” review to cost less than coffee, this is the release. The mental model we want you to leave with is anchoring in reverse: flagship models still exist for the PR where one missed security bug costs more than a month of credits — but the median PR should not touch them. The median PR should ride open-weight lanes that now score within striking distance of Sonnet-class SWE-bench Verified numbers at 1/20th the credit burn.

Launch window — treasury pricing

Cheaper DeepSeek V4 lanes: Flash at half its old floor (1 cr to 0.5 cr), Pro at a third (3 cr to 1 cr).

Anthropic's Sonnet and Opus tiers still excel on frontier agentic dashboards, and Critique bundles those models everywhere you expect them, but DeepSeek sits in a different place on the value curve. When the watchdog and specialist graph can call Flash on a cadence measured in tens of seconds, predictable credit floors matter as much as raw Elo.

Provider | Model | New floor | Old floor | Status
DeepSeek | DeepSeek V4 Flash | 0.5 cr | 1 cr | Permanent, no expiry
DeepSeek | DeepSeek V4 Pro | 1 cr | 3 cr | Permanent, no expiry
XiaomiMiMo | MiMo v2.5 | 0.5 cr | 1.5 cr | Permanent, no expiry
XiaomiMiMo | MiMo v2.5 Pro | 1 cr | 3 cr | Permanent, no expiry
AntGroup | Ling-2.6-Flash | 0.5 cr | 1 cr | Permanent, no expiry

This is not Critique inventing a promo lane in isolation. On May 27, Xiaomi permanently renovated the entire MiMo-V2.5 pricing system: flat per-token rates with no more input-length multipliers, cuts of up to 99% versus the old 256K–1M tiers, Token Plan quotas reset and expanded 5–8×, and a public write-up on how they kept costs down — SWA with SGLang HiCache shrinking KV-cache churn, better expert-parallel bucketing, and higher cache hit rates on long agent runs. Read their announcement in Primary sources below; our 0.5cr / 1cr MiMo floors are the downstream bet that those economics should be the default for PR review, not a weekend experiment.


New & repriced models in this drop

Provider icons via LobeHub. Credit floors are per review slice — depth and specialists still multiply total burn.

13 models · 88.6% top SWE score · 0.5 cr lowest floor
Provider | Model | Benchmark | Floor
Claude | Claude Opus 4.8 | SWE-bench 88.6% | 37 cr
DeepSeek | DeepSeek V4 Pro (Max) | SWE-bench 80.6% | 1 cr
Qwen | Qwen3.7-Max | SWE-bench 80.4% | 6 cr
DeepSeek | DeepSeek V4 Flash (Max) | SWE-bench 79.0% | 0.5 cr
XiaomiMiMo | MiMo v2.5 Pro | SWE-bench 78.9% | 1 cr
AntGroup | Ring-2.6-1T | SWE-bench 74.0% | 1.5 cr
AntGroup | Ling-2.6-Flash | SWE-bench 61.2% | 0.5 cr
StepFun | StepFun 3.7 Flash | SWE-bench 56.3% | 1 cr
MiniMax | MiniMax M2.7 | SWE-bench 56.2% | 1.5 cr
XiaomiMiMo | MiMo v2.5 | SWE-bench 56.1% | 0.5 cr
Google | Gemini 3.5 Flash | SWE-bench 55.1% | 10 cr
Grok | Grok Build 0.1 | Agentic coding (xAI) | 3 cr
Claude | Opus 4.8 (Fast) | Same weights, 2× credits | 74 cr

SWE-bench scores reflect best observed performance on the toughest real-world coding tasks.

All scores are relative.

DeepSeek’s V4 family was already the rational default for teams that wanted MoE scale and a million-token context class without renting Claude for every specialist pass. What changed is the price story. V4 Flash at 0.5 credits is not “cheap for a demo.” At roughly 1M input + 150k output per credit unit, a deep 5M-token review pass on Flash can land near 2.5 credits — less than a single old-system unit on a mid-tier model. V4 Pro at 1 credit is the open-weight lead for messy PRs: Artificial Analysis quotes GDPval-AA near 1554 Elo on the Pro Max reasoning profile, leading the open-weights pack on agentic work tasks.
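The arithmetic behind that 2.5-credit figure can be sketched as follows. This is a rough model only: it assumes burn scales linearly with input tokens at about 1M tokens per credit unit, and it ignores the output allowance, depth, and specialist multipliers. The names `estimate_credits` and `TOKENS_PER_UNIT` are illustrative and not part of any Critique API.

```python
# Rough credit estimate for a single review pass.
# Assumption: one credit unit covers about 1M input tokens (per the essay),
# and burn scales linearly with input size. Real billing may differ.
TOKENS_PER_UNIT = 1_000_000


def estimate_credits(input_tokens: int, floor_credits: float) -> float:
    """Return the estimated credits for a pass at the given per-unit floor."""
    units = input_tokens / TOKENS_PER_UNIT
    return units * floor_credits


if __name__ == "__main__":
    deep_pass = 5_000_000  # the "deep 5M-token review pass" from the text
    print("V4 Flash:", estimate_credits(deep_pass, 0.5), "cr")
    print("V4 Pro:  ", estimate_credits(deep_pass, 1.0), "cr")
```

Under the same assumptions, the identical pass on V4 Pro at a 1-credit floor comes to about 5 credits.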

SWE-bench Verified — open-weight value stack

Higher is better. Vendor-reported scores; harnesses differ across labs.

Sources: Anthropic Opus 4.8 launch, DeepSeek Hugging Face cards, Qwen3.7 agent blog, Xiaomi MiMo Pro card, InclusionAI Ring/Ling HF evals.

Price-to-performance

DeepSeek V4 Pro vs Claude Sonnet 4.6 on Critique credits

Sonnet is still excellent. On vendor-reported SWE-bench Verified the two are within a point of each other, so the buying question is whether Sonnet's higher GDPval-AA class is worth 22× the floor on every lead pass.

Metric | DeepSeek V4 Pro | Claude Sonnet 4.6
critique.sh credit floor | 1 cr | 22 cr
SWE-bench Verified (vendor) | 80.6% | 79.6%
GDPval-AA Elo (AA, Apr 2026) | 1554 | ~1600 class
Best for | Default lead on cost-sensitive repos | Policy-mandated Anthropic lane

MiMo v2.5 at 0.5 credits is the parallel bet to Gemma and Ling in our volume tier: Xiaomi positions the Flash lane for high-throughput agent loops, and the tech report cites 73.4% on SWE-bench Verified for the Flash profile while keeping active parameters small enough to run wide specialist fan-out. MiMo v2.5 Pro at 1 credit is the escalation lane inside the same family: 78.9% SWE-V in the official card, with Terminal-Bench 2.0 in the high 60s.

The vendor-side story matters too. In their May 2026 price-adjustment post, Xiaomi says they “permanently renovate the entire model pricing system,” drop context-length surcharges, and fund the cut with real inference engineering rather than a time-boxed coupon. Critique passes that through as permanent catalog pricing so the median PR does not need a flagship model to get a serious second opinion. If your team has filed “cheap Chinese models” under “toy reviewers,” update the bucket: at these scores and floors, reviewing only a hand-picked subset of PRs no longer makes economic sense.

Xiaomi vs frontier tax

MiMo v2.5 Pro vs GPT-5.4 Mini

Both are “serious enough” for many PRs. One costs 1 credit; the other costs 6.

Metric | MiMo v2.5 Pro | GPT-5.4 Mini
critique.sh floor | 1 cr | 6 cr
SWE-bench Verified | 78.9% | 73.0% (Vals on 5.4 mini)
Context class | 1M (vendor) | 128K–1M (route-dependent)
Vendor pricing story | Flat 1M-context API (May 27) | Usage-tier mini model
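To make the two price-to-performance comparisons concrete, here is a small sketch that computes the floor multiple and the SWE-bench Verified gap for each pair. The numbers are copied from this essay's tables; the `Lane` class and `compare` helper are hypothetical names for illustration. Vendor scores come from different harnesses, so treat the gap as a rough signal and not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str
    floor_cr: float  # Critique credit floor, as listed in the essay
    swe_v: float     # SWE-bench Verified score (%), vendor-reported


def compare(cheap: Lane, pricey: Lane) -> tuple[float, float]:
    """Return (floor multiple, SWE-V point gap in favor of the cheap lane)."""
    multiple = pricey.floor_cr / cheap.floor_cr
    gap = round(cheap.swe_v - pricey.swe_v, 1)
    return multiple, gap


v4_pro = Lane("DeepSeek V4 Pro", 1, 80.6)
sonnet = Lane("Claude Sonnet 4.6", 22, 79.6)
mimo_pro = Lane("MiMo v2.5 Pro", 1, 78.9)
gpt_mini = Lane("GPT-5.4 Mini", 6, 73.0)

if __name__ == "__main__":
    for cheap, pricey in [(v4_pro, sonnet), (mimo_pro, gpt_mini)]:
        multiple, gap = compare(cheap, pricey)
        print(
            f"{pricey.name} costs {multiple:g}x the floor of {cheap.name}; "
            f"SWE-V gap for {cheap.name}: {gap:+.1f} pts"
        )
```

A positive gap means the cheaper lane also reports the higher score, which is the case for both pairs in these tables.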

Claude Opus 4.7 leaves the catalog; Opus 4.8 takes its Ultra slot at 37 credits with a 1M-token context window and Anthropic’s published 88.6% SWE-bench Verified. Opus 4.8 (Fast) uses the same weights at 74 credits — double the floor for teams that buy latency, not capability. Qwen3.6-Max-Preview retires in favor of Qwen3.7-Max at 6 credits: Alibaba’s agent blog cites 80.4% SWE-V and 60.6% SWE-Pro, a cleaner mid-flagship than the old 8cr preview lane. Gemini 3.5 Flash lands at 10 credits as Google’s “near-Pro coding at Flash economics” bet — 55.1% SWE-Pro public, 76.2% Terminal-Bench 2.1 in DeepMind materials.

Agent & coding lanes

New specialist-sized models worth routing

Not every model should be your lead. These are the slots we expect in specialist grids and Remedy picks.

Model (floor) | What it is | Why it exists on Critique
Grok Build 0.1 (3 cr) | xAI coding agent | Fast tool-use model for interactive fix loops; pairs with Remedy when you want xAI flavor.
Ring-2.6-1T (1.5 cr) | InclusionAI 63B active MoE | SWE-V 74% at Ring pricing; a thinking model for tool-heavy agents without Qwen flagship cost.
StepFun 3.7 Flash (1 cr) | 196B MoE, 11B active | Replaces 3.5 Flash; native multimodal + 256K context for repos with UI screenshots in PRs.
MiniMax M2.7 (1.5 cr) | Was 2 cr | M2.5 removed; M2.7 is the MiniMax lane now at a lower permanent credit floor.
Routing checklist
  1. Default lead for volume?
    deepseek/deepseek-v4-pro at 1 cr, or deepseek/deepseek-v4-flash at 0.5 cr if PRs are small.
  2. Default specialist grid?
    Mix ling-2.6-flash, mimo-v2.5, deepseek-v4-flash, ring-2.6-1t; all are sub-2cr before depth multipliers.
  3. When to escalate to Opus 4.8?
    Auth, billing, migrations, or incident-linked PRs. Use Opus 4.8 Fast only when wall-clock dominates invoice.
  4. Remedy default?
    Still Qwen3.6 Plus (free model cost) for lint-level fixes; escalate to MiMo Pro or V4 Pro when validation fails.
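The lead-model part of the checklist above can be sketched as a small routing function. This is an illustration under stated assumptions, not Critique's actual policy engine: `pick_lead`, its parameters, and the area labels are hypothetical, and the caller decides what counts as a small diff. Because the essay does not give a separate ID for the Fast lane, the function returns a flag alongside the Opus ID.

```python
# Risk areas that the checklist says should escalate to Opus 4.8.
# The label strings are hypothetical tags a caller would attach to a PR.
HIGH_RISK_AREAS = {"auth", "billing", "migrations", "incident"}


def pick_lead(
    touched_areas: set[str],
    small_diff: bool,
    wall_clock_critical: bool = False,
) -> tuple[str, bool]:
    """Return (model_id, use_fast_profile) for the lead review pass."""
    if touched_areas & HIGH_RISK_AREAS:
        # Fast only applies when latency matters more than the invoice.
        return "anthropic/claude-opus-4.8", wall_clock_critical
    if small_diff:
        return "deepseek/deepseek-v4-flash", False  # 0.5 cr floor
    return "deepseek/deepseek-v4-pro", False  # 1 cr floor


if __name__ == "__main__":
    print(pick_lead({"docs"}, small_diff=True))
    print(pick_lead({"billing", "api"}, small_diff=False))
    print(pick_lead({"auth"}, small_diff=True, wall_clock_critical=True))
```

Checking risk before diff size is a deliberate choice in this sketch: a one-line change to auth code still escalates.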
What we removed or aliased

Existing repo policies keep working — IDs map forward.

Old ID | New target
minimax/minimax-m2.5 | minimax/minimax-m2.7
stepfun/step-3.5-flash | stepfun/step-3.7-flash
qwen/qwen3.6-max-preview | qwen/qwen3.7-max
anthropic/claude-opus-4.7 | anthropic/claude-opus-4.8
:nitro suffixes | Stripped for billing; legacy speed suffixes no longer apply
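If you keep model IDs in your own tooling, the forward mapping above might be mirrored with something like the sketch below. The alias pairs come from this table; `resolve_model_id` is a hypothetical helper, and how Critique resolves IDs internally is not documented here.

```python
# Legacy ID -> current ID, copied from the alias table in this essay.
ALIASES = {
    "minimax/minimax-m2.5": "minimax/minimax-m2.7",
    "stepfun/step-3.5-flash": "stepfun/step-3.7-flash",
    "qwen/qwen3.6-max-preview": "qwen/qwen3.7-max",
    "anthropic/claude-opus-4.7": "anthropic/claude-opus-4.8",
}


def resolve_model_id(model_id: str) -> str:
    """Strip a legacy ':nitro' suffix, then map retired IDs forward."""
    base = model_id.removesuffix(":nitro")  # requires Python 3.9+
    return ALIASES.get(base, base)


if __name__ == "__main__":
    for legacy in ("minimax/minimax-m2.5:nitro", "deepseek/deepseek-v4-pro"):
        print(legacy, "->", resolve_model_id(legacy))
```

IDs that are not in the table pass through unchanged, so current policies are unaffected.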
The cuts are permanent. Critique set permanent credit floors, not a promo window. DeepSeek V4 Flash and MiMo v2.5 bill at 0.5 credits; DeepSeek V4 Pro and MiMo v2.5 Pro bill at 1 credit. The public /models page and this essay reflect the same numbers as billing.
This drop brings permanent floor cuts on DeepSeek V4 and MiMo lanes, plus new models: Claude Opus 4.8 (+ Fast), Qwen3.7-Max, Gemini 3.5 Flash, Grok Build 0.1, Ring-2.6-1T, StepFun 3.7 Flash (replacing 3.5), and MiniMax M2.7 at a lower floor. Legacy IDs alias forward so saved repo policies keep working.
Use deepseek/deepseek-v4-flash or xiaomi/mimo-v2.5 at 0.5 credits for small diffs; deepseek/deepseek-v4-pro or mimo-v2.5-pro at 1 credit for heavier lanes. Reserve Opus 4.8 for auth, billing, migrations, or incident-linked changes. The decision checklist earlier in this essay has a full routing table.
Signed-in users can claim a one-time +100 credit bonus at the end of this essay. It is meant to fund A/B tests against your current lead model while the new permanent floors are live. Claim details are on the embedded credit card below.

Read the drop, claim the credits

+100 credits for reading this drop

Signed-in Critique users get a one-time bonus of +100 credits for reading this spring catalog essay. Use them to A/B DeepSeek and MiMo lanes against your current lead. The floors are permanent; this bonus is our way of paying for your experiment time.


Open the model guide

Every floor, plan gate, and Ultra slot lives on the public models page — same numbers as billing.
