May 2, 2026 · 16 min read · Critique

Qwen 3.6 and Grok 4.3 in Critique: cheaper routing, stronger coding signal, and one absurd xAI price cut

We replaced Grok 4.2 with Grok 4.3 at 3 credits, retired Qwen3.5-27B in favor of Qwen3.6-35B-A3B at 1 credit, added Ling-2.6-Flash at 1 credit, added Qwen3.6-Max-Preview at 8 credits, discounted GLM-5.1 by 1 credit, and moved MiMo v2.5 plus KAT Coder Pro V2 to their new floors.

This update is less about adding names to a dropdown and more about cleaning up our routing ladder. The old Qwen3.5-27B slot had become hard to justify once Qwen shipped a 35B/3B-active successor that is better on repository reasoning, better on frontend tasks, and still cheap enough to use as a routine specialist. On the xAI side, Grok 4.3 gives us a simpler story: reasoning is always on, multimodal input stays available, the context window remains large, and the credit floor falls sharply enough that it stops being a novelty lane and starts being a real option.

3 cr · New Grok 4.3 floor, down from 8 cr for the old Grok 4.2 slot
1 cr · Qwen3.6-35B-A3B floor, 0.5 credits cheaper than retired Qwen3.5-27B
1 cr · Ling-2.6-Flash joins the catalog as a fast InclusionAI agent lane
8 cr · Qwen3.6-Max-Preview floor in Critique and Remedy
3 cr · GLM-5.1 now costs 1 credit less than before
2 cr · KAT Coder Pro V2 returns to its normal shelf price after a month-long discount
1.5 cr · New MiMo v2.5 floor after a +1 credit adjustment

New routing snapshot
Catalog floors after the May 2, 2026 refresh.
Vendor	Floor	Model	Note
Grok	3 cr	Grok 4.3	1M ctx · reasoning-only · text+image in
Qwen	1 cr	Qwen3.6-35B-A3B	SWE-Bench Verified 73.4%
AntGroup	1 cr	Ling-2.6-Flash	104B total · 7.4B active · fast agent lane
Qwen	8 cr	Qwen3.6-Max-Preview	Top score on 6 coding benchmarks (vendor summary)
Z.ai	3 cr	GLM-5.1	Discounted by 1 cr
KwaiKAT	2 cr	KAT Coder Pro V2	Back to normal shelf price
XiaomiMiMo	1.5 cr	MiMo v2.5	Credit floor raised by 1 cr
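
For readers who think about the catalog as data, the snapshot can be sketched as a lookup keyed by model name. This is only an illustration built from the floors listed in this post: the `FLOORS` dictionary and the `lanes_within` helper are hypothetical names, not part of Critique's actual routing code, and a real router would weigh more than the credit floor.

```python
from decimal import Decimal

# New floors from the snapshot in this post, in credits per run.
# Keys are display names from the post, not necessarily catalog IDs.
FLOORS = {
    "Grok 4.3": Decimal("3"),
    "Qwen3.6-35B-A3B": Decimal("1"),
    "Ling-2.6-Flash": Decimal("1"),
    "Qwen3.6-Max-Preview": Decimal("8"),
    "GLM-5.1": Decimal("3"),
    "KAT Coder Pro V2": Decimal("2"),
    "MiMo v2.5": Decimal("1.5"),
}


def lanes_within(budget, floors=FLOORS):
    """Return (model, floor) pairs at or under the budget.

    Results are sorted cheapest first, with ties broken by model name.
    """
    limit = Decimal(str(budget))
    affordable = [(model, floor) for model, floor in floors.items() if floor <= limit]
    return sorted(affordable, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    for model, floor in lanes_within(2):
        print(f"{floor} cr  {model}")
```

Using `Decimal` keeps the half-credit floors exact, which matters once floors are summed across a multi-model run.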

PART ONE - GROK 4.3 REPLACES GROK 4.2

xAI’s current developer docs surface `grok-4.3` as the active flagship in the overview flow, with a 1M-token context class and text-plus-image input support. Their model docs also keep the important behavior constraint from the Grok 4 family: reasoning is built in, and there is no separate reasoning-effort dial for the standard Grok 4 line. That makes Grok 4.3 a clean fit for Critique. We do not need to explain hidden mode switches to users, and we do not need one price for “thinking” and another for “non-thinking.”

xAI slot refresh
How the Grok lane changed
We replaced the old Grok 4.2 entry with Grok 4.3 and repriced the lane hard downward.
Grok
Throughput tier
Grok 4.3
Reasoning-only xAI flagship with 1M context and multimodal inputs, now viable for regular lead and specialist use.
Critique floor: 3 cr / run
xAI API input, per 1M tokens: $1.25
xAI API output, per 1M tokens: $2.50
xAI public docs show Grok 4.3 in the live developer overview as of May 2, 2026. The vendor token pricing of $1.25 input / $2.50 output per 1M tokens is the xAI pricing input we used for this catalog refresh.
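
To make the vendor rates concrete, here is a minimal sketch of the token-cost arithmetic at $1.25 input and $2.50 output per 1M tokens. The `vendor_cost_usd` helper and the example token counts are illustrative assumptions. The post does not describe how vendor cost maps to the 3-credit floor, so this estimates raw xAI token spend only.

```python
from decimal import Decimal

# Vendor prices quoted in this post, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("1.25")
OUTPUT_USD_PER_M = Decimal("2.50")
TOKENS_PER_M = Decimal(1_000_000)


def vendor_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate raw vendor token cost in USD for one run (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Made-up run size: 200k tokens in, 20k tokens out.
    print(vendor_cost_usd(200_000, 20_000))
```

For that made-up run, input contributes $0.25 and output $0.05. The output rate is double the input rate, but a long-context review that emits comparatively little output will still have its bill dominated by input tokens.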

PART TWO - QWEN3.6-35B-A3B REPLACES QWEN3.5-27B

This is the cleaner benchmark story in the release. Qwen’s official Hugging Face card for Qwen3.6-35B-A3B lists a 35B total / 3B active MoE architecture, a 262,144-token native context, and extension to roughly 1.01M tokens. The published coding-agent table is mixed on the SWE rows, but it shows clear gains on the repo-scale and frontend-shaped tasks we care about most: Terminal-Bench 2.0, Claw-Eval average, SkillsBench, QwenClawBench, NL2Repo, and QwenWebBench.

Qwen3.6-35B-A3B on the rows that matter most to Critique

Official Qwen model-card scores on repo, terminal, and browser-shaped coding tasks. Higher is better.

Qwen’s table mixes percentages with benchmark-specific scales such as QwenWebBench. Qwen3.5-27B still leads on some SWE rows, but Qwen3.6-35B-A3B leads most of the broader agentic, terminal, browser, and repo-style workflow rows while also costing less in our catalog.

Why we still replaced Qwen3.5-27B

The replacement call is about the overall workflow mix, not one benchmark in isolation.

Metric	Qwen3.5-27B	Qwen3.6-35B-A3B	Practical read
Context	256K class	262K native / ~1.01M extended	New model has more headroom
Active params	dense-style 27B slot	3B active MoE	Cheaper inference profile
Terminal-Bench 2.0	41.6	51.5	Meaningful jump for agent loops
NL2Repo	27.3	29.4	Better repo-scale reasoning
QwenWebBench	1068	1397	Better browser/front-end shaped tasks
Critique floor	1.5 cr (retired)	1 cr	Cheaper and broader

One important correction to the simplistic launch line: the official Qwen table does not show a clean sweep on every single coding number. Qwen3.5-27B still posts higher values on some SWE rows. What it does show is that Qwen3.6-35B-A3B is stronger on the repo-scale, terminal, browser, and agentic workflow benchmarks that matter more to Critique’s review and Remedy loops, while also costing less in our catalog. That is enough to retire the older slot.
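
The size of those gains is easier to read as deltas. The sketch below recomputes absolute and relative differences from the three benchmark rows in the comparison table. The `SCORES` dictionary and `score_deltas` helper are illustrative names, and the numbers are copied from this post rather than re-measured.

```python
# Scores from the comparison table in this post:
# (Qwen3.5-27B, Qwen3.6-35B-A3B). Higher is better.
SCORES = {
    "Terminal-Bench 2.0": (41.6, 51.5),
    "NL2Repo": (27.3, 29.4),
    "QwenWebBench": (1068, 1397),
}


def score_deltas(scores):
    """Map each benchmark to (absolute gain, relative gain in percent)."""
    result = {}
    for name, (old, new) in scores.items():
        absolute = round(new - old, 1)
        relative = round(100 * (new - old) / old, 1)
        result[name] = (absolute, relative)
    return result


if __name__ == "__main__":
    for name, (absolute, relative) in score_deltas(SCORES).items():
        print(f"{name}: +{absolute} ({relative}% relative)")
```

QwenWebBench uses its own scale rather than a percentage, so its absolute gain is not comparable to the other rows. The relative column is a rough way to put the three side by side, not a claim about equal difficulty.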

PART THREE - LING-2.6-FLASH JOINS AT 1 CREDIT

InclusionAI positions Ling-2.6-Flash as a fast instruct model for real-world agents: 104B total parameters, 7.4B active parameters, and a focus on token-efficient execution rather than theatrical benchmark chasing. The official Hugging Face card emphasizes lower token usage across coding, document processing, and lightweight workflows. That makes it a natural cheap specialist lane in Critique, especially for teams that want something faster and broader than a tiny extraction model but still do not want to pay mid-tier prices.

PART FOUR - QWEN3.6-MAX-PREVIEW JOINS AT 8 CREDITS

Alibaba positions Qwen3.6-Max-Preview as the higher-end proprietary Qwen lane. The official Model Studio docs list a 256K context window with thinking mode, function calling, and structured output support. Alibaba’s launch writeup says the preview model improves on Qwen3.6-Plus by +9.9 on SkillsBench, +6.3 on SciCode, +5.0 on NL2Repo, and +3.8 on Terminal-Bench 2.0, then summarizes the release by saying it leads six major coding benchmarks in their internal comparison set.

Qwen3.6-Max-Preview vs Qwen3.6-Plus

Vendor-reported deltas from Alibaba’s launch note. Higher is better.

Alibaba published the improvement margins and the six-benchmark summary in the launch article, not a full plain-text score table in the page body.

PART FIVE - GLM DOWN, KAT UP, MIMO UP

Three more credit moves round out the refresh. `z-ai/glm-5.1` drops from 4 credits to 3, making it a more attractive mid-tier generalist. `kwaipilot/kat-coder-pro-v2` rises from 1 credit back to its normal 2-credit shelf price; the lower number was a month-long partnership discount with the Kwaipilot team, not the permanent list price. And `xiaomi/mimo-v2.5` rises from 0.5 to 1.5 credits, ending the unusually cheap launch positioning it held earlier.
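
Taken together with the Grok and Qwen slot changes, the repricing in this refresh can be tabulated in a few lines. This is a sketch using only the before and after floors stated in this post. The `MOVES` mapping and `credit_deltas` helper are hypothetical, and the first two keys are descriptive labels rather than catalog IDs.

```python
from decimal import Decimal

# (old floor, new floor) in credits, as stated in this post.
MOVES = {
    "Grok slot (4.2 -> 4.3)": ("8", "3"),
    "Qwen slot (3.5-27B -> 3.6-35B-A3B)": ("1.5", "1"),
    "z-ai/glm-5.1": ("4", "3"),
    "kwaipilot/kat-coder-pro-v2": ("1", "2"),
    "xiaomi/mimo-v2.5": ("0.5", "1.5"),
}


def credit_deltas(moves):
    """Return the signed credit change per slot (negative means cheaper)."""
    return {slot: Decimal(new) - Decimal(old) for slot, (old, new) in moves.items()}


if __name__ == "__main__":
    for slot, delta in credit_deltas(MOVES).items():
        direction = "down" if delta < 0 else "up"
        print(f"{slot}: {direction} {abs(delta)} cr")
```

Three slots get cheaper and two get more expensive, with the Grok lane's 5-credit drop the largest single move.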

Primary sources
xAI developer overview
Live Grok 4.3 overview, context class, API examples, and model positioning.
xAI models and pricing docs
Reasoning-model behavior notes for the Grok 4 family and current pricing surface.
Qwen3.6-35B-A3B on Hugging Face
Official model card with architecture, context, and coding-agent benchmark table.
InclusionAI Ling-2.6-Flash on Hugging Face
Official model card with architecture and positioning for fast, token-efficient agents.
Alibaba Cloud Model Studio model list
Current Qwen3.6-Max-Preview context, function calling, and structured output support.
Alibaba Cloud launch note for Qwen3.6-Max-Preview
Published improvement deltas over Qwen3.6-Plus and six-benchmark summary.
