Critique · 16 min read

Qwen 3.6 and Grok 4.3 in Critique: cheaper routing, stronger coding signal, and one absurd xAI price cut

We replaced Grok 4.2 with Grok 4.3 at 3 credits, retired Qwen3.5-27B in favor of Qwen3.6-35B-A3B at 1 credit, added Ling-2.6-Flash at 1 credit, added Qwen3.6-Max-Preview at 8 credits, discounted GLM-5.1 by 1 credit, and moved MiMo v2.5 plus KAT Coder Pro V2 to their new floors.

This update is less about adding names to a dropdown and more about cleaning up our routing ladder. The old Qwen3.5-27B slot had become hard to justify once Qwen shipped a 35B/3B-active successor that is better on repository reasoning, better on frontend tasks, and still cheap enough to use as a routine specialist. On the xAI side, Grok 4.3 gives us a simpler story: reasoning is always on, multimodal input stays available, the context window remains large, and the credit floor falls sharply enough that it stops being a novelty lane and starts being a real option.

3 cr · New Grok 4.3 floor, down from 8 cr for the old Grok 4.2 slot
1 cr · Qwen3.6-35B-A3B floor, 0.5 credits cheaper than the retired Qwen3.5-27B
1 cr · Ling-2.6-Flash joins the catalog as a fast InclusionAI agent lane
8 cr · Qwen3.6-Max-Preview floor in Critique and Remedy
3 cr · GLM-5.1 now costs 1 credit less than before
2 cr · KAT Coder Pro V2 returns to its normal shelf price after a month-long discount
1.5 cr · New MiMo v2.5 floor after a +1 credit adjustment

New routing snapshot

Catalog floors after the May 2, 2026 refresh.

Vendor | Floor | Model | Notes
xAI | 3 cr | Grok 4.3 | 1M ctx · reasoning-only · text+image in
Qwen | 1 cr | Qwen3.6-35B-A3B | SWE-Bench Verified 73.4%
AntGroup | 1 cr | Ling-2.6-Flash | 104B total · 7.4B active · fast agent lane
Qwen | 8 cr | Qwen3.6-Max-Preview | Top score on 6 coding benchmarks (vendor summary)
Z.ai | 3 cr | GLM-5.1 | Discounted by 1 cr
KwaiKAT | 2 cr | KAT Coder Pro V2 | Back to normal shelf price
XiaomiMiMo | 1.5 cr | MiMo v2.5 | Credit floor raised by 1 cr
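The catalog floors above can be sketched as plain data with a helper that sorts lanes by credit floor. This is an illustrative sketch only; the model identifiers and field names are assumptions, not Critique's actual routing configuration.

```python
# Hypothetical snapshot of the credit floors described above.
# Identifiers and field names are illustrative, not real config.
CATALOG = [
    {"model": "grok-4.3",            "vendor": "xAI",        "credits": 3.0},
    {"model": "qwen3.6-35b-a3b",     "vendor": "Qwen",       "credits": 1.0},
    {"model": "ling-2.6-flash",      "vendor": "AntGroup",   "credits": 1.0},
    {"model": "qwen3.6-max-preview", "vendor": "Qwen",       "credits": 8.0},
    {"model": "glm-5.1",             "vendor": "Z.ai",       "credits": 3.0},
    {"model": "kat-coder-pro-v2",    "vendor": "KwaiKAT",    "credits": 2.0},
    {"model": "mimo-v2.5",           "vendor": "XiaomiMiMo", "credits": 1.5},
]

def lanes_by_floor(catalog, max_credits=None):
    """Return catalog entries sorted by credit floor, optionally capped."""
    rows = [r for r in catalog if max_credits is None or r["credits"] <= max_credits]
    return sorted(rows, key=lambda r: r["credits"])

# The two 1-credit lanes lead the ladder; the 8-credit Max preview closes it.
print([r["model"] for r in lanes_by_floor(CATALOG, max_credits=1.0)])
```

Sorting is stable, so lanes that share a floor keep their catalog order, which is how ties between the two 1-credit specialists would resolve in this sketch.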

PART ONE - GROK 4.3 REPLACES GROK 4.2

xAI’s current developer docs surface `grok-4.3` as the active flagship in the overview flow, with a 1M-token context class and text-plus-image input support. Their model docs also keep the important behavior constraint from the Grok 4 family: reasoning is built in, and there is no separate reasoning-effort dial for the standard Grok 4 line. That makes Grok 4.3 a clean fit for Critique. We do not need to explain hidden mode switches to users, and we do not need one price for “thinking” and another for “non-thinking.”

xAI slot refresh

How the Grok lane changed

We replaced the old Grok 4.2 entry with Grok 4.3 and repriced the lane hard downward.

Grok 4.3 (throughput tier)

Reasoning-only xAI flagship with 1M context and multimodal inputs, now viable for regular lead and specialist use.

Critique floor: 3 cr / run
xAI API input: $1.25 / 1M tokens
xAI API output: $2.50 / 1M tokens

xAI public docs show Grok 4.3 in the live developer overview as of May 2, 2026. The $1.25 / $2.50 vendor token pricing (per 1M input / output tokens) is the current xAI pricing input we used for this catalog refresh.
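At those token prices, per-run vendor cost is simple arithmetic. A minimal sketch, using the quoted $1.25 / $2.50 per 1M token rates; the token counts in the example are illustrative, not measured Critique traffic.

```python
# Quoted xAI rates for Grok 4.3, USD per 1M tokens.
INPUT_PER_M = 1.25
OUTPUT_PER_M = 2.50

def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Vendor cost of one run at the quoted per-1M token rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a 40K-token review prompt with a 4K-token response:
print(round(run_cost_usd(40_000, 4_000), 4))  # 0.06
```

Even a fairly large review prompt lands at a few cents of vendor cost, which is what makes the 3-credit floor workable.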

PART TWO - QWEN3.6-35B-A3B REPLACES QWEN3.5-27B

This is the cleaner benchmark story in the release. Qwen’s official Hugging Face card for Qwen3.6-35B-A3B lists a 35B total / 3B active MoE architecture, 262,144 native context, and extension to roughly 1.01M tokens. The published coding-agent table is mixed on the SWE rows, but it shows clear gains on the repo-scale and front-end-shaped tasks we care about most: Terminal-Bench 2.0, Claw-Eval average, SkillsBench, QwenClawBench, NL2Repo, and QwenWebBench.

Qwen3.6-35B-A3B on the rows that matter most to Critique

Official Qwen model-card scores on repo, terminal, and browser-shaped coding tasks. Higher is better.

Qwen’s table mixes percentages with benchmark-specific scales such as QwenWebBench. Qwen3.5-27B still leads on some SWE rows, but Qwen3.6-35B-A3B leads most of the broader agentic, terminal, browser, and repo-style workflow rows while also costing less in our catalog.

Why we still replaced Qwen3.5-27B

The replacement call is about the overall workflow mix, not one benchmark in isolation.

Metric | Qwen3.5-27B | Qwen3.6-35B-A3B | Practical read
Context | 256K class | 262K native / ~1.01M extended | New model has more headroom
Active params | dense-style 27B slot | 3B active MoE | Cheaper inference profile
Terminal-Bench 2.0 | 41.6 | 51.5 | Meaningful jump for agent loops
NL2Repo | 27.3 | 29.4 | Better repo-scale reasoning
QwenWebBench | 1068 | 1397 | Better browser/front-end shaped tasks
Critique floor | 1.5 cr (retired) | 1 cr | Cheaper and broader

One important correction to the simplistic launch line: the official Qwen table does not show a clean sweep on every single coding number. Qwen3.5-27B still posts higher values on some SWE rows. What it does show is that Qwen3.6-35B-A3B is stronger on the repo-scale, terminal, browser, and agentic workflow benchmarks that matter more to Critique’s review and Remedy loops, while also costing less in our catalog. That is enough to retire the older slot.

PART THREE - LING-2.6-FLASH JOINS AT 1 CREDIT

InclusionAI positions Ling-2.6-Flash as a fast instruct model for real-world agents: 104B total parameters, 7.4B active parameters, and a focus on token-efficient execution rather than theatrical benchmark chasing. The official Hugging Face card emphasizes lower token usage across coding, document processing, and lightweight workflows. That makes it a natural cheap specialist lane in Critique, especially for teams that want something faster and broader than a tiny extraction model but still do not want to pay mid-tier prices.

PART FOUR - QWEN3.6-MAX-PREVIEW JOINS AT 8 CREDITS

Alibaba positions Qwen3.6-Max-Preview as the higher-end proprietary Qwen lane. The official Model Studio docs list a 256K context window with thinking mode, function calling, and structured output support. Alibaba’s launch writeup says the preview model improves on Qwen3.6-Plus by +9.9 on SkillsBench, +6.3 on SciCode, +5.0 on NL2Repo, and +3.8 on Terminal-Bench 2.0, then summarizes the release by saying it leads six major coding benchmarks in their internal comparison set.
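For teams planning around the structured-output and thinking-mode support mentioned above, a request might look roughly like the sketch below. This assumes an OpenAI-compatible chat endpoint; the model identifier, the `enable_thinking` flag, and the field names are assumptions drawn from the vendor description, not a confirmed Model Studio API shape.

```python
import json

# Hypothetical structured-output request for Qwen3.6-Max-Preview,
# assuming an OpenAI-compatible chat-completions payload. Field names
# marked "assumed" are illustrative, not verified vendor API.
payload = {
    "model": "qwen3.6-max-preview",
    "messages": [
        {"role": "user", "content": "Summarize this diff as JSON."},
    ],
    "response_format": {"type": "json_object"},  # structured output
    "enable_thinking": True,                     # thinking mode (assumed flag)
}

print(json.dumps(payload, indent=2))
```

The point of the sketch is the shape of the request, not the exact flags; check the vendor docs before relying on any field here.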

Qwen3.6-Max-Preview vs Qwen3.6-Plus

Vendor-reported deltas from Alibaba’s launch note. Higher is better.

Alibaba published the improvement margins and the six-benchmark summary in the launch article, not a full plain-text score table in the page body.

PART FIVE - GLM DOWN, KAT UP, MIMO UP

Three more credit moves round out the refresh. `z-ai/glm-5.1` drops from 4 credits to 3, making it a more attractive mid-tier generalist. `kwaipilot/kat-coder-pro-v2` rises from 1 credit back to its normal 2-credit shelf price; the lower number was a month-long partnership discount with the Kwaipilot team, not the permanent list price. And `xiaomi/mimo-v2.5` rises from 0.5 to 1.5 credits, ending the unusually cheap launch positioning it held earlier.

Primary sources
