Model directory

Choose the right model stack.

32 models for PR review, Remedy, and Critique Chat. Filter by job, compare benchmark scores, and line up lead, specialist, and Remedy lanes before you ship.


32 models in the review catalog

0.5 cr lowest credit floor (Gemma-4, MiMo, Ling, DeepSeek Flash)

Team-only escalation lanes

Benchmarks

Compare published scores on the same suite.

Each tab is one exact benchmark — SWE-bench Verified and SWE-bench Pro are never mixed. Scores come from vendor launch posts and public eval tables in the Critique catalog, not Critique-run tests.

500 human-validated real GitHub issues — the standard cross-vendor coding ruler.

SWE-bench Verified

Source: Vendor model cards & launch posts · harnesses differ

25 models with data

Rank	Model	Provider	SWE-bench Verified	Floor
#1	Claude Opus 4.8	Anthropic	88.6%	37 cr
#2	Claude Opus 4.8 (Fast)	Anthropic	88.6%	74 cr
#3	GPT-5.5	OpenAI	82.6%	40 cr
#4	DeepSeek V4 Pro	DeepSeek	80.6%	1 cr
#5	Gemini 3.1 Pro	Google	80.6%	16 cr
#6	Qwen3.7-Max	Alibaba	80.4%	6 cr
#7	MiniMax-M2.7	MiniMax	80.2%	1.5 cr
#8	Kimi K2.6	MoonshotAI	80.2%	4 cr
#9	GPT-5.2 Pro	OpenAI	80%	180 cr
#10	Claude Sonnet 4.6	Anthropic	79.6%	22 cr
#11	DeepSeek V4 Flash	DeepSeek	79%	0.5 cr
#12	MiMo v2.5 Pro	Xiaomi	78.9%	1 cr
#13	Qwen3.6 Plus	Alibaba	78.8%	2 cr
#14	GPT-5.4	OpenAI	78.2%	20 cr
#15	Gemini 3 Flash	Google	78%	4 cr
#16	GLM-5.1	Z.AI	77.8%	3 cr
#17	StepFun-3.7 Flash	StepFun	74.4%	1 cr
#18	Ring-2.6-1T	InclusionAI	74%	1.5 cr
#19	GLM-5V-Turbo	Z.AI	73.8%	3 cr
#20	MiMo v2.5	Xiaomi	73.4%	0.5 cr
#21	KAT Coder Pro V2	KwaiPilot	73.4%	2 cr
#22	GPT-5.4 Mini	OpenAI	73%	6 cr
#23	Trinity-Large-Thinking	Arcee AI	63.2%	1 cr
#24	Ling-2.6-Flash	InclusionAI	61.2%	0.5 cr
#25	GPT-5.4 Nano	OpenAI	46.5%	2 cr
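One way to read the ranking against cost is to ask which model clears a score bar at the lowest credit floor. The sketch below is illustrative only: it copies a handful of the published scores and floors listed above, and the helper names (`cheapest_at_or_above`, `points_per_credit`) are hypothetical, not part of any Critique API. Because vendor harnesses differ and the floor is not the full review bill, treat the output as a rough shortlist rather than a verdict.

```python
# (model, published SWE-bench Verified score in %, credit floor)
# Values are copied from the list above; only a subset is shown.
SCORES = [
    ("Claude Opus 4.8", 88.6, 37),
    ("GPT-5.5", 82.6, 40),
    ("DeepSeek V4 Pro", 80.6, 1),
    ("Gemini 3.1 Pro", 80.6, 16),
    ("DeepSeek V4 Flash", 79.0, 0.5),
    ("GPT-5.4 Nano", 46.5, 2),
]


def cheapest_at_or_above(rows, min_score):
    """Return the lowest-floor row whose published score meets min_score, or None."""
    eligible = [row for row in rows if row[1] >= min_score]
    return min(eligible, key=lambda row: row[2]) if eligible else None


def points_per_credit(rows):
    """Sort rows by published score divided by credit floor, best value first."""
    return sorted(rows, key=lambda row: row[1] / row[2], reverse=True)


if __name__ == "__main__":
    print(cheapest_at_or_above(SCORES, 80.0))
    for name, score, floor in points_per_credit(SCORES):
        print(f"{name}: {score / floor:.1f} points per credit")
```

A ratio like this flatters cheap models, so it works best as a tie-breaker after you have fixed a minimum score.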

Start with a stack, not a blank slate.

Pre-configured lead + specialist + Remedy combinations. Pin any layer in the dashboard when you have data.

Profile	Lead	Specialist	Remedy
Default stack	GLM-5V-Turbo	Qwen3.6 Plus	MiMo v2.5 Pro
Cheap volume	DeepSeek V4 Flash	Ling-2.6-Flash	MiMo v2.5
Quality first	GPT-5.4	Claude Sonnet 4.6	GLM-5.1
Team escalation	Claude Sonnet 4.6	GPT-5.5	Claude Opus 4.8

Optimize for

Balanced quality and cost: the best starting point for most engineering teams, with a strong lead and a reliable Remedy.

Named routing templates

Best Default

GLM-5V-Turbo · Qwen3.6 Plus · MiMo v2.5 Pro

3–8 cr / review

Cheap Volume

DeepSeek V4 Flash · Ling-2.6-Flash · MiMo v2.5

1–3 cr / review

Balanced Engineering

GLM-5.1 · GPT-5.5 · GLM-5V-Turbo

40–90 cr / review

Frontier Escalation

Claude Sonnet 4.6 · GPT-5.5 · GLM-5.1

40–240+ cr / review
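As a sanity check on the ranges above, you can add up the credit floors of the models in a template. The sketch below does that for three of the templates, using floors from the Full catalog table. The names `FLOOR`, `TEMPLATES`, and `floor_sum` are hypothetical, and the sum is only a lower bound: it assumes each listed model runs once and ignores depth multipliers and Remedy overhead.

```python
# Credit floors for the models used below, taken from the Full catalog table.
FLOOR = {
    "DeepSeek V4 Flash": 0.5,
    "Ling-2.6-Flash": 0.5,
    "MiMo v2.5": 0.5,
    "GLM-5.1": 3,
    "GLM-5V-Turbo": 3,
    "Claude Sonnet 4.6": 22,
    "GPT-5.5": 40,
}

# Model line-ups as listed for each named routing template.
TEMPLATES = {
    "Cheap Volume": ["DeepSeek V4 Flash", "Ling-2.6-Flash", "MiMo v2.5"],
    "Balanced Engineering": ["GLM-5.1", "GPT-5.5", "GLM-5V-Turbo"],
    "Frontier Escalation": ["Claude Sonnet 4.6", "GPT-5.5", "GLM-5.1"],
}


def floor_sum(template):
    """Sum the credit floors of every model in a template (a lower bound only)."""
    return sum(FLOOR[model] for model in TEMPLATES[template])


if __name__ == "__main__":
    for name in TEMPLATES:
        print(f"{name}: at least {floor_sum(name)} cr if every layer runs once")
```

Each sum lands inside the advertised range for its template (1.5 cr, 46 cr, and 65 cr respectively), which suggests the ranges already account for reviews where not every layer fires or where depth pushes the bill higher.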

Full catalog

Click a row for routing guidance, benchmark receipts, and OpenRouter IDs. Select up to three models to compare side by side.

32 models

Model	OpenRouter ID	Provider	Floor	Context	Speed	Plan
GLM-5V-Turbo	z-ai/glm-5v-turbo	Z.AI	3 cr	203K	Fast	Solo + Pro
MiMo v2.5 Pro (New)	xiaomi/mimo-v2.5-pro	Xiaomi	1 cr	1M	Deep	Solo + Pro
Claude Sonnet 4.6	anthropic/claude-sonnet-4.6	Anthropic	22 cr	1M	Balanced	Solo + Pro
GPT-5.4	openai/gpt-5.4	OpenAI	20 cr	1M	Balanced	Solo + Pro
Qwen3.6 Plus	qwen/qwen3.6-plus	Alibaba	2 cr	1M	Balanced	Solo + Pro
Gemma-4-31B	google/gemma-4-31b-it	Google	0.5 cr	262K	Fast	Solo + Pro
MiMo v2.5	xiaomi/mimo-v2.5	Xiaomi	0.5 cr	1M	Instant	Solo + Pro
Ling-2.6-Flash (New)	inclusionai/ling-2.6-flash	InclusionAI	0.5 cr	262K	Instant	Solo + Pro
DeepSeek V4 Flash (New)	deepseek/deepseek-v4-flash	DeepSeek	0.5 cr	1M	Fast	Solo + Pro
StepFun-3.7 Flash	stepfun/step-3.7-flash	StepFun	1 cr	262K	Instant	Solo + Pro
Trinity-Large-Thinking (New)	arcee-ai/trinity-large-thinking	Arcee AI	1 cr	262K	Balanced	Solo + Pro
DeepSeek V4 Pro (New)	deepseek/deepseek-v4-pro	DeepSeek	1 cr	1M	Balanced	Solo + Pro
MiniMax-M2.7	minimax/minimax-m2.7	MiniMax	1.5 cr	197K	Balanced	Solo + Pro
Ring-2.6-1T	inclusionai/ring-2.6-1t	InclusionAI	1.5 cr	262K	Balanced	Solo + Pro
KAT Coder Pro V2	kwaipilot/kat-coder-pro-v2	KwaiPilot	2 cr	256K	Fast	Solo + Pro
Gemini 3.1 Flash Lite	google/gemini-3.1-flash-lite-preview	Google	2 cr	1M	Instant	Solo + Pro
GPT-5.4 Nano	openai/gpt-5.4-nano	OpenAI	2 cr	400K	Instant	Solo + Pro
GLM-5.1	z-ai/glm-5.1	Z.AI	3 cr	203K	Balanced	Solo + Pro
Grok Build 0.1	x-ai/grok-build-0.1	xAI	3 cr	256K	Balanced	Solo + Pro
Grok 4.3 (New)	x-ai/grok-4.3	xAI	3 cr	1M	Balanced	Solo + Pro
Kimi K2.6 (New)	moonshotai/kimi-k2.6	MoonshotAI	4 cr	262K	Balanced	Solo + Pro
Gemini 3 Flash	google/gemini-3-flash-preview	Google	4 cr	1M	Fast	Solo + Pro
GPT-5.4 Mini	openai/gpt-5.4-mini	OpenAI	6 cr	400K	Fast	Solo + Pro
Qwen3.7-Max	qwen/qwen3.7-max	Alibaba	6 cr	262K	Balanced	Solo + Pro
Gemini 3.5 Flash	google/gemini-3.5-flash	Google	10 cr	1M	Balanced	Solo + Pro
Gemini 3.1 Pro	google/gemini-3.1-pro-preview	Google	16 cr	1M	Deep	Solo + Pro
GPT-5.3 Codex	openai/gpt-5.3-codex	OpenAI	18 cr	400K	Deep	Solo + Pro
Claude Opus 4.8	anthropic/claude-opus-4.8	Anthropic	37 cr	1M	Balanced	Team
GPT-5.5 (New)	openai/gpt-5.5	OpenAI	40 cr	1M	Balanced	Solo + Pro
Claude Opus 4.8 (Fast)	anthropic/claude-opus-4.8-fast	Anthropic	74 cr	1M	Balanced	Team
GPT-5.2 Pro	openai/gpt-5.2-pro	OpenAI	180 cr	400K	Deep	Team
GPT-5.5 Pro	openai/gpt-5.5-pro	OpenAI	237 cr	1M	Deep	Team
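If you keep a local copy of the catalog, narrowing it by floor, context window, and plan is a few lines of code. The sketch below is a minimal example over six rows from the table above; the `Model` dataclass and the `shortlist` helper are hypothetical names introduced for illustration, and context is stored in thousands of tokens with 1M written as 1000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    openrouter_id: str
    floor_cr: float
    context_k: int  # context window in thousands of tokens (1M stored as 1000)
    speed: str
    plan: str


# A subset of the catalog rows above.
CATALOG = [
    Model("GLM-5V-Turbo", "z-ai/glm-5v-turbo", 3, 203, "Fast", "Solo + Pro"),
    Model("MiMo v2.5", "xiaomi/mimo-v2.5", 0.5, 1000, "Instant", "Solo + Pro"),
    Model("DeepSeek V4 Flash", "deepseek/deepseek-v4-flash", 0.5, 1000, "Fast", "Solo + Pro"),
    Model("Gemma-4-31B", "google/gemma-4-31b-it", 0.5, 262, "Fast", "Solo + Pro"),
    Model("Claude Opus 4.8", "anthropic/claude-opus-4.8", 37, 1000, "Balanced", "Team"),
    Model("GPT-5.5 Pro", "openai/gpt-5.5-pro", 237, 1000, "Deep", "Team"),
]


def shortlist(models, *, max_floor, min_context_k=0, plan=None):
    """Keep models within the floor and context limits, optionally on one plan.

    Results are sorted by credit floor, then by name.
    """
    kept = [
        m for m in models
        if m.floor_cr <= max_floor
        and m.context_k >= min_context_k
        and (plan is None or m.plan == plan)
    ]
    return sorted(kept, key=lambda m: (m.floor_cr, m.name))


if __name__ == "__main__":
    for m in shortlist(CATALOG, max_floor=1, min_context_k=1000):
        print(m.name, m.openrouter_id)
```

Filtering on the plan column first is a quick way to see which models are Team-only before you design an escalation lane around them.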

Credit floor vs. real cost

Floor is not the full review bill

Actual spend is lead + specialist + depth multiplier. Remedy adds execution overhead and possible re-review.
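A rough estimator makes the gap between floor and bill concrete. The sketch below is an assumption, not Critique's billing formula: it treats the depth multiplier as scaling the sum of the lead and specialist floors and adds Remedy overhead as a flat amount. The function name, parameter names, and example values for depth and overhead are all hypothetical, so check the pricing examples for real figures.

```python
def estimate_review_credits(lead_floor, specialist_floor,
                            depth_multiplier=1.0, remedy_overhead=0.0):
    """Estimate credits for one review.

    ASSUMPTION: spend = (lead + specialist) * depth multiplier + Remedy overhead.
    The source does not spell out how these terms combine.
    """
    if depth_multiplier <= 0:
        raise ValueError("depth_multiplier must be positive")
    if remedy_overhead < 0:
        raise ValueError("remedy_overhead cannot be negative")
    return (lead_floor + specialist_floor) * depth_multiplier + remedy_overhead


if __name__ == "__main__":
    # Default stack floors from the catalog: GLM-5V-Turbo (3 cr) + Qwen3.6 Plus (2 cr).
    print(estimate_review_credits(3, 2))
    # Same stack with a made-up 1.5x depth setting and 1 cr of Remedy overhead.
    print(estimate_review_credits(3, 2, depth_multiplier=1.5, remedy_overhead=1))
```

Even under these simple assumptions, a deeper review with Remedy costs noticeably more than the two floors added together, which is the point of the warning above.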

See pricing examples →

Critique Chat

Two free chat models

Chat does not spend PR review credits. Current roster: Ling-2.6-Flash and DeepSeek V4 Flash.

Open Critique Chat →

Start with the default stack

GLM-5V-Turbo lead, Qwen3.6 Plus specialist, MiMo v2.5 Pro for Remedy — then swap layers when your PR data says so.

Get started
