Qwen 3.6 and Grok 4.3 in Critique: cheaper routing, stronger coding signal, and one absurd xAI price cut
We replaced Grok 4.2 with Grok 4.3 at 3 credits, retired Qwen3.5-27B in favor of Qwen3.6-35B-A3B at 1 credit, added Ling-2.6-Flash at 1 credit, added Qwen3.6-Max-Preview at 8 credits, discounted GLM-5.1 by 1 credit, and moved MiMo v2.5 plus KAT Coder Pro V2 to their new floors.
This update is less about adding names to a dropdown and more about cleaning up our routing ladder. The old Qwen3.5-27B slot had become hard to justify once Qwen shipped a 35B/3B-active successor that is better on repository reasoning, better on frontend tasks, and still cheap enough to use as a routine specialist. On the xAI side, Grok 4.3 gives us a simpler story: reasoning is always on, multimodal input stays available, the context window remains large, and the credit floor falls sharply enough that it stops being a novelty lane and starts being a real option.
New routing snapshot
Catalog floors after the May 2, 2026 refresh.
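The full set of post-refresh floors is easiest to sanity-check as data. A minimal sketch: the credit values are the ones stated in this update, but the slug-style names for models the post does not slug (Grok, the Qwen lanes, Ling) are informal labels of ours, not official API identifiers.

```python
# Post-refresh credit floors as listed in this update (May 2, 2026).
# Slugs without an explicit mention in the post are informal labels.
CREDIT_FLOORS = {
    "xai/grok-4.3": 3,                 # replaces grok-4.2
    "qwen/qwen3.6-35b-a3b": 1,         # replaces qwen3.5-27b (was 1.5)
    "inclusionai/ling-2.6-flash": 1,   # new specialist lane
    "qwen/qwen3.6-max-preview": 8,     # new high-end lane
    "z-ai/glm-5.1": 3,                 # down from 4
    "kwaipilot/kat-coder-pro-v2": 2,   # back up from the 1-credit promo
    "xiaomi/mimo-v2.5": 1.5,           # up from 0.5
}

def cheapest_lane(floors: dict[str, float]) -> list[str]:
    """Return the models sitting at the lowest credit floor."""
    low = min(floors.values())
    return sorted(m for m, c in floors.items() if c == low)
```

With these numbers, the 1-credit floor is shared by the new Qwen specialist and Ling-2.6-Flash.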
PART ONE - GROK 4.3 REPLACES GROK 4.2
xAI’s current developer docs surface `grok-4.3` as the active flagship in the overview flow, with a 1M-token context class and text-plus-image input support. Their model docs also keep the important behavior constraint from the Grok 4 family: reasoning is built in, and there is no separate reasoning-effort dial for the standard Grok 4 line. That makes Grok 4.3 a clean fit for Critique. We do not need to explain hidden mode switches to users, and we do not need one price for “thinking” and another for “non-thinking.”
How the Grok lane changed
We replaced the old Grok 4.2 entry with Grok 4.3 and cut the lane's credit price sharply.
Reasoning-only xAI flagship with 1M context and multimodal inputs, now viable for regular lead and specialist use.
xAI's public docs show Grok 4.3 in the live developer overview as of May 2, 2026. The vendor's 1.25 / 2.50 token pricing is the current xAI figure this catalog refresh is priced against.
PART TWO - QWEN3.6-35B-A3B REPLACES QWEN3.5-27B
This is the cleaner benchmark story in the release. Qwen’s official Hugging Face card for Qwen3.6-35B-A3B lists a 35B total / 3B active MoE architecture, 262,144 native context, and extension to roughly 1.01M tokens. The published coding-agent table is mixed on the SWE rows, but it shows clear gains on the repo-scale and front-end-shaped tasks we care about most: Terminal-Bench 2.0, Claw-Eval average, SkillsBench, QwenClawBench, NL2Repo, and QwenWebBench.
Official Qwen model-card scores on repo, terminal, and browser-shaped coding tasks. Higher is better.
Qwen’s table mixes percentages with benchmark-specific scales such as QwenWebBench. Qwen3.5-27B still leads on some SWE rows, but Qwen3.6-35B-A3B leads most of the broader agentic, terminal, browser, and repo-style workflow rows while also costing less in our catalog.
The replacement call is about the overall workflow mix, not one benchmark in isolation.
| Metric | Qwen3.5-27B | Qwen3.6-35B-A3B | Practical read |
|---|---|---|---|
| Context | 256K class | 262K native / ~1.01M extended | New model has more headroom |
| Active params | dense-style 27B slot | 3B active MoE | Cheaper inference profile |
| Terminal-Bench 2.0 | 41.6 | 51.5 | Meaningful jump for agent loops |
| NL2Repo | 27.3 | 29.4 | Better repo-scale reasoning |
| QwenWebBench | 1068 | 1397 | Better browser/front-end shaped tasks |
| Critique floor | 1.5 cr (retired) | 1 cr | Cheaper and broader |
One important correction to the simplistic launch line: the official Qwen table does not show a clean sweep on every single coding number. Qwen3.5-27B still posts higher values on some SWE rows. What it does show is that Qwen3.6-35B-A3B is stronger on the repo-scale, terminal, browser, and agentic workflow benchmarks that matter more to Critique’s review and Remedy loops, while also costing less in our catalog. That is enough to retire the older slot.
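Because the table mixes percentage-style scores with benchmark-specific scales, absolute deltas are not comparable across rows; relative improvement is. A quick sketch recomputing it from the three benchmark rows quoted above (the numbers are the model-card figures cited in this section):

```python
# Relative improvement of Qwen3.6-35B-A3B over Qwen3.5-27B on the three
# benchmark rows from the table above (old score, new score).
ROWS = {
    "Terminal-Bench 2.0": (41.6, 51.5),
    "NL2Repo": (27.3, 29.4),
    "QwenWebBench": (1068, 1397),
}

def relative_gain(old: float, new: float) -> float:
    """Percent improvement of the new model over the old on one row."""
    return (new - old) / old * 100

gains = {name: round(relative_gain(a, b), 1) for name, (a, b) in ROWS.items()}
```

On this view the QwenWebBench jump (~31%) is the largest of the three, even though its raw scale makes the absolute delta look incomparable to the percentage rows.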
PART THREE - LING-2.6-FLASH JOINS AT 1 CREDIT
InclusionAI positions Ling-2.6-Flash as a fast instruct model for real-world agents: 104B total parameters, 7.4B active parameters, and a focus on token-efficient execution rather than theatrical benchmark chasing. The official Hugging Face card emphasizes lower token usage across coding, document processing, and lightweight workflows. That makes it a natural cheap specialist lane in Critique, especially for teams that want something faster and broader than a tiny extraction model but still do not want to pay mid-tier prices.
PART FOUR - QWEN3.6-MAX-PREVIEW JOINS AT 8 CREDITS
Alibaba positions Qwen3.6-Max-Preview as the higher-end proprietary Qwen lane. The official Model Studio docs list a 256K context window with thinking mode, function calling, and structured output support. Alibaba’s launch writeup says the preview model improves on Qwen3.6-Plus by +9.9 on SkillsBench, +6.3 on SciCode, +5.0 on NL2Repo, and +3.8 on Terminal-Bench 2.0, then summarizes the release by saying it leads six major coding benchmarks in their internal comparison set.
Vendor-reported deltas from Alibaba’s launch note. Higher is better.
Alibaba published the improvement margins and the six-benchmark summary in the launch article; it did not include a full plain-text score table in the page body.
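Since Alibaba published point deltas over Qwen3.6-Plus rather than absolute scores, the most a reader can do locally is tabulate them; averaging across benchmarks with different scales is only a rough summary, offered here as a sketch using the four deltas from the launch note.

```python
# Vendor-reported point deltas of Qwen3.6-Max-Preview over Qwen3.6-Plus,
# from Alibaba's launch note. No absolute base scores were published, so
# relative improvement cannot be recomputed from these alone.
MAX_PREVIEW_DELTAS = {
    "SkillsBench": 9.9,
    "SciCode": 6.3,
    "NL2Repo": 5.0,
    "Terminal-Bench 2.0": 3.8,
}

mean_delta = sum(MAX_PREVIEW_DELTAS.values()) / len(MAX_PREVIEW_DELTAS)
```

The mean delta works out to 6.25 points, a crude figure given the mixed scales, but it conveys that the gains are spread across all four reported benchmarks rather than concentrated in one.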
PART FIVE - GLM DOWN, KAT UP, MIMO UP
Three more credit moves round out the refresh. `z-ai/glm-5.1` drops from 4 credits to 3, making it a more attractive mid-tier generalist. `kwaipilot/kat-coder-pro-v2` rises from 1 credit back to its normal 2-credit shelf price; the lower number was a month-long partnership discount with the Kwaipilot team, not the permanent list price. And `xiaomi/mimo-v2.5` rises from 0.5 to 1.5 credits, ending the unusually cheap launch positioning it held earlier.
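To see what these three moves mean in practice, here is a back-of-envelope sketch: the before/after credit values come from this update, while the 100-call monthly workload is made up purely for illustration.

```python
# Credit impact of the three repricings on a hypothetical workload of
# 100 calls per model per month. (old_floor, new_floor) per this update.
MOVES = {
    "z-ai/glm-5.1": (4, 3),
    "kwaipilot/kat-coder-pro-v2": (1, 2),
    "xiaomi/mimo-v2.5": (0.5, 1.5),
}

def monthly_delta(old: float, new: float, calls: int = 100) -> float:
    """Credits saved (negative) or added (positive) by one repricing."""
    return (new - old) * calls
```

Under that assumption, the GLM discount saves 100 credits a month, while the KAT and MiMo floor restorations each add 100, so a team using all three lanes evenly roughly breaks even.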