Critique v5 Beta: Marketplace, Merge Policy in Plain English, Passport Exports, and the Biggest Ship Since v4
v5.0.0 is live in beta: Agent Skill Marketplace, natural-language merge policy, signed audit exports, Cursor SDK on Composer 2.5, warm Coding Agent API sessions, repo-first dashboard, Insights for leadership — and three companion essays that go deeper on models, harnesses, and automation.

Beta v5.0.0 · 3 June 2026
The largest ship since v4. Every merge boundary got an upgrade.
Agent Skill Marketplace. Natural-language merge policy. Signed passport exports. Cursor SDK on Composer 2.5. Coding Agent API with warm sessions. Repo-first dashboard. Insights for leadership. One beta release — not a changelog dump, a new operating layer.
What is Critique v5 beta?
Critique v5 beta is a coordinated platform release — not a single feature flag. It ships v5.0.0 on the Critique change-control stack that v4 introduced: Change Passports, the Control Board, evidence runs, merge policy checks, and Remedy proof. v5 adds operator surfaces that were missing or immature in v4.1–v4.2: a public Agent Skill Marketplace, a natural-language merge policy editor, signed passport exports, one-click finding feedback, Insights for velocity and compliance, a Coding Agent API with persistent sessions, Cursor BYOA via the Agent SDK, and a repo-first dashboard that lists every connected repository instead of a six-repo teaser.
If you installed Critique during the v4 “passport as product” era, nothing fundamental was removed. Reviews still run in sandboxes when policy allows. GitHub check names stay stable. v5 extends what you can configure, export, automate, and compare — without forcing a migration narrative.
Founder note: we almost shipped this as v4.3
Honest versioning story, because the semver on this post is doing real work. When we started this wave, the internal name was v4.3 — marketplace browse, a few dashboard fixes, polish on repo home. Reasonable increment. Then the merge policy editor stopped being “a nice extra” and became the thing operators kept asking for in support. We renamed the milestone v4.5 in Slack: “okay, this is bigger than a patch, but it is still v4-shaped.”
Then it kept getting bigger. Signed passport exports. Insights with backfill and compliance bundles. Finding feedback wired into leaderboard metrics. Cursor on the Agent SDK, not just REST handoff. Coding Agent API with warm sessions and SSE. Full repo catalogs everywhere instead of the six-repo teaser. Three companion essays because no single page could carry the benchmarks and the API examples without lying by omission. At some point we stopped debating whether this was “still v4” and admitted we were shipping the largest platform release since v4 Change Control itself. v5.0.0 is what that honesty looks like on the changelog.
- v4.3 · First internal name
  Skill marketplace browse, dashboard polish, repo-home fixes: a sensible point release on the v4 passport stack.
- v4.5 · “Big, but still v4”
  Natural-language merge policy, exports, Insights: too much for v4.3, not yet honest enough for a major bump.
- v5.0.0 · What actually shipped
  Marketplace + policy compiler + audit exports + Cursor SDK + Coding Agent API + repo-first chrome + leadership Insights, shipped as one coordinated beta.
Crossing two billion tokens of review work is not a vanity metric for a slide deck. It is the load that exposed every boring bug that matters: credit ledger edge cases, sandbox liveness gaps, repo picker caps that only hurt power users, export manifests that had to survive an auditor’s unzip script. You did that with us — not by cheering from the sidelines, but by connecting repositories, burning credits on messy agent PRs, filing feedback when findings were wrong, and staying on Critique while we renamed the release three times in internal docs.
The companion essays (read these next)
v5 is too large for one page to carry every benchmark table and API example. We published three deep essays during the same ship window. They are standalone posts — not marketing attachments — and this launch note threads them like a feed so you can jump to the layer you care about.
Companion essay · v5 ship window
MiniMax M3 and Qwen3.7 Plus on Critique: Coding Benchmarks and a Two-Week M3 Welcome Price
Critique now routes minimax/minimax-m3 on the paid PR review catalog. M3 bills at 1.5 credits per run through June 17, 2026 — the same effective floor as MiniMax-M2.7 today — then returns to a 3-credit shelf. Vendor-reported SWE-Bench Pro, terminal, and multimodal scores vs M2.7, Qwen3.6 Plus, GLM-5.1, Kimi K2.6, Composer 2.5, and Claude Opus 4.8.
Companion essay · v5 ship window
Cursor as a Top-Tier Agent Harness: Composer 2.5, Cloud BYOA, and How It Compares to the Models on Critique
Critique’s Cursor BYOA path is live: save your Cursor API key, queue a fix from any completed review run, and we launch a cloud agent on your PR through the Cursor Agent SDK with Composer 2.5. Review stays on Critique credits; execution stays on your Cursor subscription.
Companion essay · v5 ship window
Coding Agent API: Persistent OpenCode Sessions for Multi-Turn Automation
Persistent sessions let your automation send the next instruction into the same OpenCode run instead of paying for a full re-clone, re-bootstrap, and chained summary prompt on every turn. Status becomes `idle` with `sessionActive: true` until you send `endSession` or the sandbox expires.
v4 → v5: same passport, new operating system
v4 reframed the product. v5 ships the tools operators asked for in the first month of passports.
| Layer | v4.0–v4.2 | v5.0 beta |
|---|---|---|
| Review lenses | Built-in critique-review skill + repo policy | Public Skill Marketplace + performance leaderboard |
| Merge policy authoring | YAML/JSON in dashboard or repo file | Natural-language editor → compiled policy + diff preview |
| Compliance evidence | Passport timeline in UI | HMAC-signed JSON export + batch export + monthly compliance bundle |
| Agent execution handoff | Remedy + BYOA REST paths | Cursor Agent SDK default on composer-2.5 + Coding Agent API |
| Dashboard gravity | Passports queue or repo home (v4.2) | Unified workspace chrome + full repo catalog everywhere |
| Finding quality loop | Memory + suppressions | One-click Accepted / FP / Fixed / Suppress + opt-in anonymized export |
| Leadership view | Usage page + control room | Insights: velocity vs risk, cost attribution, retros, fleet benchmarks |
| Workspace staging | Chat + Builder in one shell | Durable agent queue + Processes inspector panel |
GitHub checks remain Critique / Checkpoint, Critique / Review, and Critique / Merge Policy. Branch protection you configured in v4 keeps working.
- v4.0 · Change Control Platform
  Passports, Control Board, merge policy v1, evidence contract, Agent Firewall framing.
- v4.1–v4.2 · Ecosystem + repo inbox
  Connections, MCP, Platform API, critique-review skill, repo-first PR dashboard with cached GitHub inbox.
- v5.0 · Operable at scale
  Marketplace, NL merge policy v2, exports, Insights, Coding Agent API, Cursor SDK, full repo pickers.
Agent Skill Marketplace — portable review lenses with receipts
The Agent Skill Marketplace at `/skills` is the headline v5 surface for teams that treat review as a protocol, not a single prompt. Browse versioned critique-review lenses — official and community — search by category, and install skills into Critique Chat or any agent runtime with portable `npx skills add` bundles. The shape is familiar if you have used public skill directories; the ranking signal is not.
Install counts alone do not win the board. The Skill Performance Leaderboard at `/skills/leaderboard` ranks lenses by outcomes from real review runs: human acceptance rate, false-positive rate, actionable fixes, and post-merge incident correlation from finding feedback. That is the difference between “popular on GitHub” and “actually helped merges.”
- Browse and install at /skills — Critique Chat skills sheet links here
- Publish at /skills/publish — requires Critique account; public or unlisted listings
- Leaderboard at /skills/leaderboard — global board + org-internal mode after sign-in
- critique-review landing still offers Markdown download alongside marketplace flows
- Same metrics scoped to your GitHub App installation
- Visible only after sign-in — no private repo detail on the public board
- Compare custom lenses without exposing installation metadata
Browse lenses
Official critique-review variants and community skills — category search, version pins, portable bundles.
Publish a skill
Signed-in authors ship new skills or patch versions. Choose public vs unlisted; control whether stats appear globally.
Performance board
Rankings blend acceptance, false positives, fixes, and incident correlation — not install vanity metrics.
Chat integration
Critique Chat skills sheet links to marketplace install flows alongside built-in runtime guidance.
Open-source root skill
The MIT critique-review repo remains the baseline protocol. Marketplace lenses extend — not replace — that contract.
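To make the "outcomes over installs" idea concrete, here is a minimal sketch of an outcome-weighted ranking. The post does not publish the leaderboard formula, so the weights, the `LensStats` fields, and the lens names below are all hypothetical; only the four input signals come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LensStats:
    """Outcome metrics for one review lens (all field names are hypothetical)."""
    name: str
    installs: int
    acceptance_rate: float      # share of findings humans accepted, 0..1
    false_positive_rate: float  # share of findings marked false positive, 0..1
    fix_rate: float             # share of findings that led to a fix, 0..1
    incident_rate: float        # share of merges later tied to an incident, 0..1


# Illustrative weights only; the real leaderboard formula is not published here.
WEIGHTS = {"accept": 0.4, "fix": 0.3, "fp": 0.2, "incident": 0.1}


def outcome_score(s: LensStats) -> float:
    """Blend outcome metrics; the install count is deliberately ignored."""
    return (
        WEIGHTS["accept"] * s.acceptance_rate
        + WEIGHTS["fix"] * s.fix_rate
        + WEIGHTS["fp"] * (1.0 - s.false_positive_rate)
        + WEIGHTS["incident"] * (1.0 - s.incident_rate)
    )


def rank(lenses: list[LensStats]) -> list[str]:
    """Return lens names ordered from best to worst outcome score."""
    return [s.name for s in sorted(lenses, key=outcome_score, reverse=True)]


popular = LensStats("popular-lens", 9000, 0.40, 0.35, 0.20, 0.10)
useful = LensStats("useful-lens", 120, 0.80, 0.05, 0.60, 0.02)
print(rank([popular, useful]))
```

In this toy data the lens with 120 installs outranks the one with 9,000, because `installs` never enters the score.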
Merge policy in plain English — compile with models, enforce without them
Merge policy v1 in v4 was already deterministic at enforcement time: Critique evaluated `.critique/policy.yml`, published the Critique / Merge Policy check, and blocked or warned based on rules you saved. v5 adds the authoring experience operators actually wanted — describe rules in plain English, let Critique compile them into strict policy JSON, preview YAML, show a live rule diff, surface assumptions and unsupported clauses, and display a confidence badge before you save.
The LLM only translates intent at compile time. MiniMax-M3 is the primary compiler (`minimax/minimax-m3`), falling back to MiniMax-M2.7 and DeepSeek V4 Flash when needed. At merge time there are no model calls — same server-owned check, same deterministic evaluator, same branch-protection story as v4.
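The compile-time fallback order described above can be sketched as a simple ordered retry. This is an illustration under assumptions, not Critique's implementation: `compile_policy`, `call_model`, and the returned policy shape are hypothetical, and the two fallback entries use display names because the post only gives a routing ID for the primary model.

```python
from typing import Callable, Optional

# The primary ID is from the post. The fallbacks are display names because
# their routing IDs are not given.
COMPILER_CHAIN = ["minimax/minimax-m3", "MiniMax-M2.7", "DeepSeek V4 Flash"]


def compile_policy(
    text: str,
    call_model: Callable[[str, str], Optional[dict]],
    chain: list[str] = COMPILER_CHAIN,
) -> tuple[str, dict]:
    """Try each compiler in order and return (model_used, compiled_policy)."""
    for model in chain:
        try:
            result = call_model(model, text)
        except Exception:
            # Sketch-level handling: any failure moves on to the next model.
            continue
        if result is not None:
            return model, result
    raise RuntimeError("no compiler model available")


def fake_call(model: str, text: str) -> Optional[dict]:
    """Stand-in for a real model call; the primary is simulated as down."""
    if model == "minimax/minimax-m3":
        raise TimeoutError("primary unavailable")
    return {"rules": [{"source": text}]}


model_used, policy = compile_policy("Block auth changes without tests", fake_call)
print(model_used)
```

Nothing in this loop runs at merge time. The compiled policy is saved, and enforcement reads only the saved rules.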
Block or warn when changed paths match risk tags (auth, migrations, infra, related lanes) or glob patterns. Require touched test files in the PR. Require a minimum count of current-head GitHub approvals — stale approvals on older commits do not count.
Three options control what happens when a compile is uncertain. “Require review” blocks both apply and the policy PR when confidence is low. “Allow draft PR” still opens a draft policy pull request for manual review. “Ask followups” holds apply until clarification questions are answered.
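The rule types above lend themselves to a small, model-free evaluator. The sketch below assumes a hypothetical compiled-policy shape (`Policy`, `PullRequest`, and every field name are illustrative, not the real policy schema) and shows glob matching, the touched-tests requirement, and counting only approvals on the current head.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Policy:
    """Hypothetical compiled policy; the real schema keys may differ."""
    block_globs: list[str] = field(default_factory=list)
    warn_globs: list[str] = field(default_factory=list)
    require_tests: bool = False
    test_globs: list[str] = field(default_factory=lambda: ["tests/*", "*_test.py"])
    min_approvals: int = 0


@dataclass
class PullRequest:
    head_sha: str
    changed_paths: list[str]
    approvals: dict[str, str]  # reviewer -> commit SHA they approved


def evaluate(policy: Policy, pr: PullRequest) -> tuple[str, list[str]]:
    """Return ("pass" | "warn" | "block", reasons) with no model calls."""
    blocks: list[str] = []
    warns: list[str] = []
    for path in pr.changed_paths:
        if any(fnmatchcase(path, g) for g in policy.block_globs):
            blocks.append(f"blocked path: {path}")
        elif any(fnmatchcase(path, g) for g in policy.warn_globs):
            warns.append(f"risky path: {path}")
    if policy.require_tests and not any(
        fnmatchcase(p, g) for p in pr.changed_paths for g in policy.test_globs
    ):
        blocks.append("no test files touched")
    # Only approvals on the current head count; stale ones are ignored.
    current = sum(1 for sha in pr.approvals.values() if sha == pr.head_sha)
    if current < policy.min_approvals:
        blocks.append(f"approvals on head: {current}/{policy.min_approvals}")
    if blocks:
        return "block", blocks + warns
    return ("warn", warns) if warns else ("pass", [])


policy = Policy(block_globs=["db/migrations/*"], warn_globs=["infra/*"],
                require_tests=True, min_approvals=1)
pr = PullRequest("abc123", ["infra/main.tf", "tests/test_infra.py"],
                 {"alice": "abc123", "bob": "old999"})
print(evaluate(policy, pr))
```

Because the function depends only on the saved policy and PR metadata, the same inputs always produce the same verdict.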
Companion essay · v5 ship window
MiniMax M3 welcome pricing — same compile lane as review
M3 joins the paid PR review catalog at 1.5 credits per run through the welcome window, then moves to the 3-credit shelf price. The merge policy compiler uses the same model family when available, so teams trialing M3 on reviews can carry that pricing familiarity over to policy compile, with no separate SKU to learn.
MiniMax M3 welcome window and Qwen3.7 Plus on the review catalog
MiniMax M3 on PR review and Remedy — welcome pricing through mid-June.
Critique Chat stays Ling 2.6 Flash and DeepSeek V4 Flash only — no extra chat fee. M3 and Qwen3.7 Plus are review and Remedy lanes, not chat pickers. Older MiniMax or Qwen chat preferences normalize to DeepSeek V4 Flash so review models are not silently treated as free chat.
Blog promo strips with gold treasury styling now accept custom headlines and intros — so M3/Qwen launches no longer reuse DeepSeek-only copy from earlier catalog posts. `/models` and `/pricing` show the active welcome window with strikethrough shelf pricing, promo countdown, and clear post-promo credit floor.
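For budgeting around the welcome window, the pricing above reduces to a date check. This sketch assumes "through June 17, 2026" is inclusive and ignores time zones; the function names are hypothetical.

```python
from datetime import date

WELCOME_END = date(2026, 6, 17)  # last welcome-priced day, assumed inclusive
WELCOME_CREDITS = 1.5
SHELF_CREDITS = 3.0


def m3_credits_per_run(on: date) -> float:
    """Credits billed for one M3 review run on a given calendar day."""
    return WELCOME_CREDITS if on <= WELCOME_END else SHELF_CREDITS


def m3_cost(runs_by_day: dict[date, int]) -> float:
    """Total credits for a schedule of runs keyed by day."""
    return sum(m3_credits_per_run(d) * n for d, n in runs_by_day.items())


plan = {date(2026, 6, 10): 20, date(2026, 6, 20): 20}
print(m3_cost(plan))
```

Twenty runs inside the window cost 30 credits and the same twenty after it cost 60, so this plan totals 90.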
Cursor BYOA on Composer 2.5 via the Agent SDK
Bring-your-own-agent for Cursor graduated in v5. Critique now queues through the Cursor Agent SDK on Composer 2.5 (`composer-2.5`) in Cursor cloud VMs — isolated repo clone, tool loop, PR URL, and `workOnCurrentBranch` on the head you reviewed. Execution bills your Cursor plan, not Critique review credits. If the SDK path is unavailable in a given deploy, Critique falls back to the Cloud Agents REST API with the same handoff shape.
Settings → Cursor agent (BYOA) and completed review runs explain the SDK path: save your key once, optionally add operator instructions, Queue Cursor agent, then Open in Cursor when the cloud run URL is ready. JSON export at `GET /api/review-runs/{reviewRunId}/byoa/cursor` still works for custom CI.
Companion essay · v5 ship window
Harness vs model — why we queue Cursor instead of replicating it
Teams argue about model leaderboards. Staff engineers argue about agent harnesses: Does the loop survive 40 minutes? Does it respect the PR branch? Critique does not try to replicate Cursor’s harness inside our sandboxes for BYOA — we already have Remedy for Critique-managed OpenCode on E2B.
One place for BYOA keys
Cursor, Anthropic, and OpenAI panels grouped with consistent copy and links to /docs/platform/byoa.
Queue from findings
Completed runs show Cursor queue actions, JSON export, and latest cloud run status.
SDK default documented
Claude Managed Agents and OpenAI Codex queued paths unchanged — same review-on-Critique split.
Repo-first dashboard — every connected repository, one chrome
v4.2 introduced repo home at `/dashboard/pull-requests` — searchable PR table, attention states, side inspector, checkpoint blockers, and quick settings for model lane, runtime, depth, context packs, and GitHub publish behavior. v5 fixes the footgun where repo pickers showed a short alphabetical slice instead of your full installation catalog.
Repo pickers on the repo-first dashboard now surface every connected repository. The old empty state capped quick picks at six repos with no search. Repo home shows an always-visible searchable list with connected-repo count; the header selector opens in a portaled menu so long lists are not clipped inside the scrolling dashboard shell. Issues and review-run repo filters use the same full installation list. Critique Chat and Workspace repo menus list the full catalog when opened — search still narrows, scrolling reaches repos beyond the first screen.
Dashboard and Settings share one workspace chrome — sidebar, top bar, spacing — so repo home, review runs, issues, control, usage, help, and settings subpages feel like one signed-in product, not a mix of marketing chrome and dashboard panels.
Change Passport exports — signed audit evidence, not a certification
Compliance teams kept asking for something they could attach to SOC 2 or ISO evidence requests without granting auditors Critique login. v5 ships Change Passport export: from the passport timeline, passport queue, or a review run linked to a passport, download redacted JSON with manifest section hashes, snapshots, provenance, risk, merge policy decisions, remedy proof, incidents, and timeline — HMAC-signed for integrity. Batch export covers up to 25 filtered passports in one file.
The product copy positions this as audit evidence formatted for SOC 2 / ISO review, not as a compliance certification. Critique exports what the passport recorded; your auditor decides whether that satisfies control objectives.
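An auditor-side integrity check for an HMAC-signed export might look like the sketch below. The post does not specify the hash algorithm, the canonicalization, the envelope shape, or how a verifier obtains the key, so HMAC-SHA256, sorted-key JSON, and the `payload` / `signature` field names are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize deterministically (sorted keys, compact separators)."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: dict, secret: bytes) -> str:
    """Hex HMAC-SHA256 over the canonical payload (assumed scheme)."""
    return hmac.new(secret, canonical_bytes(payload), hashlib.sha256).hexdigest()


def verify_export(bundle: dict, secret: bytes) -> bool:
    """Check a bundle shaped like {"payload": {...}, "signature": "<hex>"}."""
    expected = sign(bundle["payload"], secret)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, bundle.get("signature", ""))


secret = b"demo-signing-key"  # placeholder, not a real key
payload = {"passportId": "pp_demo", "risk": "low", "timeline": []}
bundle = {"payload": payload, "signature": sign(payload, secret)}
print(verify_export(bundle, secret))

tampered = {"payload": {**payload, "risk": "none"}, "signature": bundle["signature"]}
print(verify_export(tampered, secret))
```

The first check succeeds and the tampered copy fails, which is the property that lets a reviewer trust the file without a Critique login.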
Passport export API
Platform API and signed-in dashboard routes mirror the same bundle shape.
    curl -sS "https://critique.sh/api/v1/passports/${PASSPORT_ID}/export" \
      -H "Authorization: Bearer ${CRT_API_KEY}" \
      -o passport-audit.json

Finding feedback and opt-in model-feedback sharing
Review-run findings gain one-click feedback without leaving the dashboard. Expand a finding to mark Accepted, False positive, Fixed, or Suppress — action buttons no longer compete with the expand control. Private memory, suppressions, and the existing feedback ledger still apply by default.
Optional anonymized model-feedback sharing is strictly opt-in. Grant or revoke consent per installation (or single repo) from Control Board → Memory, export a signed batch of queued examples, or tick Share anonymized feedback on a finding when consent is active. GitHub slash commands accept `--share-feedback` only with the same consent gate — no silent training export from PR comments.
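The consent gate described above can be modeled as a check that runs before any sharing flag is honored. This is a sketch under assumptions: `ConsentStore`, `should_share`, and the example slash-command text are hypothetical; only the `--share-feedback` flag and the per-installation or per-repo consent scope come from the post.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentStore:
    """Hypothetical in-memory record of who opted in to anonymized sharing."""
    installations: set[str] = field(default_factory=set)
    repos: set[tuple[str, str]] = field(default_factory=set)  # (installation, repo)

    def allows(self, installation: str, repo: str) -> bool:
        return installation in self.installations or (installation, repo) in self.repos


def should_share(command: str, store: ConsentStore, installation: str, repo: str) -> bool:
    """Honor --share-feedback only when consent is active; never share by default."""
    wants_share = "--share-feedback" in command.split()
    return wants_share and store.allows(installation, repo)


store = ConsentStore(repos={("inst-1", "acme/api")})
cmd = "/critique feedback accepted --share-feedback"  # illustrative command text
print(should_share(cmd, store, "inst-1", "acme/api"))
print(should_share(cmd, store, "inst-1", "acme/web"))
```

Revoking consent is then a matter of removing the entry from the store; the flag alone never triggers an export.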
Coding Agent API — warm OpenCode sessions and SSE
Coding Agent as API ships for automation teams: public overview at `/coding-agent-api`, `POST /api/v1/coding-agent/runs` to start an OpenCode-backed run (repo + prompt + model), `POST …/runs/{id}/messages` for follow-ups, and `GET …/runs/{id}` for status, timeline events, patch, and draft-PR linkage. Choose managed billing (Critique credits) or pass an OpenRouter key for that run only; optional draft PR publish and validation mode match Builder semantics.
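A start-run call could be assembled as below. The route comes from the post; the `https://critique.sh` host and Bearer header are borrowed from the passport export example and are assumptions for this endpoint, as are the JSON field names `repo`, `prompt`, and `model` and the placeholder model ID. The sketch builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://critique.sh"  # host assumed from the passport export example


def start_run_request(api_key: str, repo: str, prompt: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a POST that starts a Coding Agent run."""
    body = {"repo": repo, "prompt": prompt, "model": model}  # assumed field names
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/coding-agent/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = start_run_request("crt_demo_key", "acme/api", "Fix the failing auth test", "example/model-id")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` would send it; check the API docs for the real body schema before relying on these field names.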
v5’s persistent-session upgrade keeps a warm OpenCode session between turns on the same run. After the first turn finishes, status becomes `idle` (`sessionActive: true` until `sessionExpiresAt`) while the E2B sandbox and OpenCode session stay connected. Follow-ups send the next prompt into that live session instead of spinning a new sandbox with chained summary text. Live activity streams over SSE at `GET /api/v1/coding-agent/runs/{id}/stream`. Close explicitly with `{ "endSession": true }` on the messages route when automation is done.
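On the client side, the warm-session rules reduce to two small helpers: one decides whether a follow-up can reuse the live session, and one reads `data:` payloads from the SSE stream. The `status`, `sessionActive`, `sessionExpiresAt`, and `endSession` names come from the post. The ISO 8601 timestamp format and the event payload contents are assumptions, and the SSE parser handles only `data:` lines.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator

END_SESSION_BODY = {"endSession": True}  # body the post shows for the messages route


def session_is_warm(run: dict, now: datetime) -> bool:
    """True when a follow-up can go into the live session instead of a new run."""
    if run.get("status") != "idle" or not run.get("sessionActive"):
        return False
    expires = run.get("sessionExpiresAt")
    if expires is None:
        return True
    # Assumes an ISO 8601 timestamp; the post does not specify the format.
    return datetime.fromisoformat(expires.replace("Z", "+00:00")) > now


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            buf.append(line[5:].removeprefix(" "))
        elif line == "" and buf:
            # A blank line terminates the event; multi-line data joins with "\n".
            yield "\n".join(buf)
            buf = []
    # An event cut off before its blank line is dropped, as the SSE spec requires.


now = datetime(2026, 6, 3, 12, 0, tzinfo=timezone.utc)
run = {"status": "idle", "sessionActive": True, "sessionExpiresAt": "2026-06-03T12:30:00Z"}
print(session_is_warm(run, now))

stream = ["event: log\n", 'data: {"step": "clone"}\n', "\n", ": keep-alive\n", "data: x\n", "data: y\n", "\n"]
print(list(iter_sse_data(stream)))
```

When `session_is_warm` returns false, automation would fall back to starting a fresh run rather than posting into an expired sandbox.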
Companion essay · v5 ship window
Why chained follow-ups were the right MVP — and why warm sessions are the right v5
When we shipped the Coding Agent API in Critique v5, the honest constraint was visible in the docs: follow-ups were new jobs that replayed prior output as text. That worked everywhere. Persistent sessions are the upgrade for teams running multi-turn CI bots that should not pay for a full re-clone every message.
Workspace agent queue and Processes inspector
Workspace adds a durable agent queue in the explorer: line up prompts for Critique, Claude Code, or Codex in Ask or Build mode, send or remove items via `/api/workspace/queue`, scoped to your active repository and chat or builder session — so you can stage work before kicking off a long run.
Workspace inspector → Processes shows live run and request state (streaming chat, builder job, sandbox, retrieval) in one panel so operators can tell whether work is still moving without digging through raw logs first.
Insights — velocity, risk, spend, and compliance in one signed-in hub
Insights at `/dashboard/insights` gives operators and leadership one hub grounded in Change Passport and gate evidence — not a separate analytics silo.
- Velocity vs risk — throughput vs risky merges in one glance
- Cost attribution — most expensive PRs this month + BYOK routing savings
- Blame-aware retrospective reports — merges, incidents, checkpoint overrides with cited evidence
- Staffing reports — projected weekly review load vs risk posture
- One-click compliance period export — signed outer bundle + per-passport audit JSON
- Fleet insights — benchmark comparisons when enough teams opt in (e.g. auth-path block rates)
- Daily insight rollups update as reviews complete and gates fire
- After upgrade, operators can backfill historical daily metrics from existing reviews and usage
- 30- and 90-day charts become useful immediately instead of waiting for new traffic
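The backfill and velocity-vs-risk ideas in the lists above amount to grouping historical reviews by day and comparing merge throughput with risky merges. The record shape (`day`, `merged`, `risky`) and both function names below are hypothetical; how Critique classifies a merge as risky is not described in this post.

```python
from collections import defaultdict
from datetime import date


def backfill_daily(reviews: list[dict]) -> dict[date, dict[str, int]]:
    """Group historical reviews into per-day merge and risky-merge counts."""
    days: dict[date, dict[str, int]] = defaultdict(lambda: {"merged": 0, "risky": 0})
    for r in reviews:
        if not r["merged"]:
            continue  # unmerged PRs do not count toward throughput
        bucket = days[r["day"]]
        bucket["merged"] += 1
        bucket["risky"] += 1 if r["risky"] else 0
    return dict(days)


def risky_share(rollup: dict[date, dict[str, int]]) -> float:
    """Fraction of merges flagged risky across the whole rollup."""
    merged = sum(d["merged"] for d in rollup.values())
    risky = sum(d["risky"] for d in rollup.values())
    return risky / merged if merged else 0.0


history = [
    {"day": date(2026, 6, 1), "merged": True, "risky": True},
    {"day": date(2026, 6, 1), "merged": True, "risky": False},
    {"day": date(2026, 6, 2), "merged": True, "risky": False},
    {"day": date(2026, 6, 2), "merged": False, "risky": True},
]
rollup = backfill_daily(history)
print(rollup[date(2026, 6, 1)], round(risky_share(rollup), 3))
```

Running the same aggregation over existing review history is what would let 30- and 90-day charts fill in immediately after an upgrade.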
Insights hub
Velocity, cost, retros, compliance exports, staffing signals, optional fleet benchmarks.
Passport queue
Still the v4 system of record — now with export actions on timeline and batch filters.
Control Board
NL merge policy editor lives alongside Gate, Policy, Delivery, Memory, and Learnings.
Coding Agent API
Automation overview + warm sessions + SSE for internal agent platforms.
What we did not ship (honest boundaries)
v5 is large, but it is not everything on the roadmap. Passport export is audit evidence, not a compliance certification. Fleet insights require opt-in cohort participation — sparse cohorts show dry-run suggestions, not fabricated benchmarks. Model-feedback sharing does not automatically retrain frontier models; it feeds leaderboard metrics and signed export batches. Merge policy compilation can refuse or ask followups when confidence is low — that is intentional, not a bug.
Critique Chat did not gain M3 or Qwen3.7 Plus in the picker. Review and Remedy did. That separation keeps chat economics predictable and prevents review-tier models from being treated as free conversation.
Upgrade checklist for v4 operators
1. Do you need custom review lenses? Browse /skills, install into Chat, or publish at /skills/publish. Check /skills/leaderboard for outcome-ranked lenses.
2. Are merge rules still edited as YAML by hand? Open Control Board → merge policy editor. Compile in English, review diff + confidence, save or open policy PR.
3. Does compliance ask for PR audit trails? Export single passports or batch up to 25 from the queue. Use Insights compliance period export for calendar months.
4. Does your team standardize on Cursor for fixes? Save Cursor API key in Settings → Agents. Queue from review runs; the SDK path runs on composer-2.5 by default.
5. Are you building internal coding bots? Read /coding-agent-api and the persistent sessions essay. Use idle + sessionActive for multi-turn runs.
6. Can leadership see speed vs safety? Open /dashboard/insights, backfill if needed, and pin velocity vs risk for your installation.
Run Critique v5 beta on your repositories
Connect GitHub, open repo home or passports, browse the Skill Marketplace, and export your first passport bundle. The v5 surfaces are live for signed-in installations.