45 min read · Repath Khan

Critique v5 Beta: Marketplace, Merge Policy in Plain English, Passport Exports, and the Biggest Ship Since v4

v5.0.0 is live in beta: Agent Skill Marketplace, natural-language merge policy, signed audit exports, Cursor SDK on Composer 2.5, warm Coding Agent API sessions, repo-first dashboard, Insights for leadership — and three companion essays that go deeper on models, harnesses, and automation.

Beta v5.0.0 · 3 June 2026

The largest ship since v4. Every merge boundary got an upgrade.

Agent Skill Marketplace. Natural-language merge policy. Signed passport exports. Cursor SDK on Composer 2.5. Coding Agent API with warm sessions. Repo-first dashboard. Insights for leadership. One beta release — not a changelog dump, a new operating layer.

  • Marketplace (/skills): versioned review lenses + leaderboard
  • Merge policy (NL → YAML): MiniMax-M3 compile, deterministic enforce
  • Passport export (HMAC bundle): SOC 2 / ISO audit evidence
  • Insights (30–90d): velocity vs risk, cost, retros

  • v5.0.0: beta release, 3 June 2026 (UTC ship window)
  • /skills: Agent Skill Marketplace (browse, install, publish)
  • HMAC: signed passport export for compliance workflows
  • SSE: live Coding Agent API stream + persistent OpenCode sessions

Critique v5 beta is a coordinated platform release — not a single feature flag. It ships v5.0.0 on the Critique change-control stack that v4 introduced: Change Passports, the Control Board, evidence runs, merge policy checks, and Remedy proof. v5 adds operator surfaces that were missing or immature in v4.1–v4.2: a public Agent Skill Marketplace, a natural-language merge policy editor, signed passport exports, one-click finding feedback, Insights for velocity and compliance, a Coding Agent API with persistent sessions, Cursor BYOA via the Agent SDK, and a repo-first dashboard that lists every connected repository instead of a six-repo teaser.

If you installed Critique during the v4 “passport as product” era, nothing fundamental was removed. Reviews still run in sandboxes when policy allows. GitHub check names stay stable. v5 extends what you can configure, export, automate, and compare — without forcing a migration narrative.

Honest versioning story, because the semver on this post is doing real work. When we started this wave, the internal name was v4.3 — marketplace browse, a few dashboard fixes, polish on repo home. Reasonable increment. Then the merge policy editor stopped being “a nice extra” and became the thing operators kept asking for in support. We renamed the milestone v4.5 in Slack: “okay, this is bigger than a patch, but it is still v4-shaped.”

Then it kept getting bigger. Signed passport exports. Insights with backfill and compliance bundles. Finding feedback wired into leaderboard metrics. Cursor on the Agent SDK, not just REST handoff. Coding Agent API with warm sessions and SSE. Full repo catalogs everywhere instead of the six-repo teaser. Three companion essays because no single page could carry the benchmarks and the API examples without lying by omission. At some point we stopped debating whether this was “still v4” and admitted we were shipping the largest platform release since v4 Change Control itself. v5.0.0 is what that honesty looks like on the changelog.

  1. v4.3
    First internal name

    Skill marketplace browse, dashboard polish, repo-home fixes — a sensible point release on the v4 passport stack.

  2. v4.5
    “Big, but still v4”

    Natural-language merge policy, exports, Insights — too much for v4.3, not yet honest enough for a major bump.

  3. v5.0.0
    What actually shipped

    Marketplace + policy compiler + audit exports + Cursor SDK + Coding Agent API + repo-first chrome + leadership Insights — one coordinated beta.

  • 2B+: review tokens processed on Critique, and counting
  • v4.3→v5: internal semver journey before we named the release honestly
  • 3: companion essays shipped in the same window as v5.0.0
  • You: teams who ran real PRs through beta and told us what broke

Crossing two billion tokens of review work is not a vanity metric for a slide deck. It is the load that exposed every boring bug that matters: credit ledger edge cases, sandbox liveness gaps, repo picker caps that only hurt power users, export manifests that had to survive an auditor’s unzip script. You did that with us — not by cheering from the sidelines, but by connecting repositories, burning credits on messy agent PRs, filing feedback when findings were wrong, and staying on Critique while we renamed the release three times in internal docs.

v5 is too large for one page to carry every benchmark table and API example. We published three deep essays during the same ship window. They are standalone posts — not marketing attachments — and this launch note threads them like a feed so you can jump to the layer you care about.

Critique @critique · 3 Jun 2026

Companion essay · v5 ship window

MiniMax M3 and Qwen3.7 Plus on Critique: Coding Benchmarks and a Two-Week M3 Welcome Price

Critique now routes minimax/minimax-m3 on the paid PR review catalog. M3 bills at 1.5 credits per run through June 17, 2026 — the same effective floor as MiniMax-M2.7 today — then returns to a 3-credit shelf. Vendor-reported SWE-Bench Pro, terminal, and multimodal scores vs M2.7, Qwen3.6 Plus, GLM-5.1, Kimi K2.6, Composer 2.5, and Claude Opus 4.8.

Critique @critique · 4 Jun 2026

Companion essay · v5 ship window

Cursor as a Top-Tier Agent Harness: Composer 2.5, Cloud BYOA, and How It Compares to the Models on Critique

Critique’s Cursor BYOA path is live: save your Cursor API key, queue a fix from any completed review run, and we launch a cloud agent on your PR through the Cursor Agent SDK with Composer 2.5. Review stays on Critique credits; execution stays on your Cursor subscription.

Critique @critique · 4 Jun 2026

Companion essay · v5 ship window

Coding Agent API: Persistent OpenCode Sessions for Multi-Turn Automation

Persistent sessions let your automation send the next instruction into the same OpenCode run instead of paying for a full re-clone, re-bootstrap, and chained summary prompt on every turn. Status becomes idle with sessionActive: true until you endSession or the sandbox expires.
What v5 adds on top of v4 Change Control

v4 reframed the product. v5 ships the tools operators asked for in the first month of passports.

| Layer | v4.0–v4.2 | v5.0 beta |
| --- | --- | --- |
| Review lenses | Built-in critique-review skill + repo policy | Public Skill Marketplace + performance leaderboard |
| Merge policy authoring | YAML/JSON in dashboard or repo file | Natural-language editor → compiled policy + diff preview |
| Compliance evidence | Passport timeline in UI | HMAC-signed JSON export + batch export + monthly compliance bundle |
| Agent execution handoff | Remedy + BYOA REST paths | Cursor Agent SDK default on composer-2.5 + Coding Agent API |
| Dashboard gravity | Passports queue or repo home (v4.2) | Unified workspace chrome + full repo catalog everywhere |
| Finding quality loop | Memory + suppressions | One-click Accepted / FP / Fixed / Suppress + opt-in anonymized export |
| Leadership view | Usage page + control room | Insights: velocity vs risk, cost attribution, retros, fleet benchmarks |
| Workspace staging | Chat + Builder in one shell | Durable agent queue + Processes inspector panel |

GitHub checks remain Critique / Checkpoint, Critique / Review, and Critique / Merge Policy. Branch protection you configured in v4 keeps working.

  1. v4.0
    Change Control Platform

    Passports, Control Board, merge policy v1, evidence contract, Agent Firewall framing.

  2. v4.1–v4.2
    Ecosystem + repo inbox

    Connections, MCP, Platform API, critique-review skill, repo-first PR dashboard with cached GitHub inbox.

  3. v5.0
    Operable at scale

    Marketplace, NL merge policy v2, exports, Insights, Coding Agent API, Cursor SDK, full repo pickers.

The Agent Skill Marketplace at `/skills` is the headline v5 surface for teams that treat review as a protocol, not a single prompt. Browse versioned critique-review lenses — official and community — search by category, and install skills into Critique Chat or any agent runtime with portable `npx skills add` bundles. The shape is familiar if you have used public skill directories; the ranking signal is not.

Install counts alone do not win the board. The Skill Performance Leaderboard at `/skills/leaderboard` ranks lenses by outcomes from real review runs: human acceptance rate, false-positive rate, actionable fixes, and post-merge incident correlation from finding feedback. That is the difference between “popular on GitHub” and “actually helped merges.”
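
To make the "outcomes over installs" idea concrete, here is a minimal sketch of one way such a ranking could be computed. It is not Critique's scoring formula: the `LensFeedback` fields, the choice to count Fixed as an acceptance, and the tie-break order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LensFeedback:
    """Feedback counts for one review lens (field names are hypothetical)."""
    name: str
    accepted: int
    false_positive: int
    fixed: int
    installs: int  # carried along only to show that installs do not affect rank

    @property
    def total(self) -> int:
        return self.accepted + self.false_positive + self.fixed

    @property
    def acceptance_rate(self) -> float:
        # Assumption: Accepted and Fixed both mean a human agreed with the finding.
        return (self.accepted + self.fixed) / self.total if self.total else 0.0

    @property
    def false_positive_rate(self) -> float:
        return self.false_positive / self.total if self.total else 0.0


def rank_lenses(lenses: list[LensFeedback]) -> list[LensFeedback]:
    """Sort by acceptance rate (high first), then false-positive rate (low first)."""
    return sorted(
        lenses,
        key=lambda lens: (-lens.acceptance_rate, lens.false_positive_rate),
    )


lenses = [
    LensFeedback("popular-lens", accepted=30, false_positive=60, fixed=10, installs=9000),
    LensFeedback("quiet-lens", accepted=40, false_positive=5, fixed=5, installs=120),
]
ranked = rank_lenses(lenses)
print([lens.name for lens in ranked])
```

In this toy data the lens with far fewer installs ranks first because reviewers accepted more of its findings.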

Marketplace operator paths
  • Browse and install at /skills — Critique Chat skills sheet links here
  • Publish at /skills/publish — requires Critique account; public or unlisted listings
  • Leaderboard at /skills/leaderboard — global board + org-internal mode after sign-in
  • critique-review landing still offers Markdown download alongside marketplace flows
Org-internal leaderboard mode
  • Same metrics scoped to your GitHub App installation
  • Visible only after sign-in — no private repo detail on the public board
  • Compare custom lenses without exposing installation metadata

Merge policy v1 in v4 was already deterministic at enforcement time: Critique evaluated `.critique/policy.yml`, published the Critique / Merge Policy check, and blocked or warned based on rules you saved. v5 adds the authoring experience operators actually wanted — describe rules in plain English, let Critique compile them into strict policy JSON, preview YAML, show a live rule diff, surface assumptions and unsupported clauses, and display a confidence badge before you save.

The LLM only translates intent at compile time. MiniMax-M3 is the primary compiler (`minimax/minimax-m3`), falling back to MiniMax-M2.7 and DeepSeek V4 Flash when needed. At merge time there are no model calls — same server-owned check, same deterministic evaluator, same branch-protection story as v4.
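
The fallback order described above can be sketched as a plain loop. This is an illustration under stated assumptions, not Critique's compiler: only `minimax/minimax-m3` is a model id from this post, the other two entries are display names, and `CompileUnavailable`, `compile_with_fallback`, and `fake_compile` are hypothetical.

```python
from typing import Callable

# Compiler order from the post. Only the first entry is a confirmed model id;
# the other two are display names used here as placeholders.
COMPILER_CHAIN = ["minimax/minimax-m3", "MiniMax-M2.7", "DeepSeek V4 Flash"]


class CompileUnavailable(Exception):
    """Raised by a (hypothetical) compile call when a model cannot serve the request."""


def compile_with_fallback(
    text: str,
    compile_fn: Callable[[str, str], dict],
    chain: list[str] = COMPILER_CHAIN,
) -> tuple[str, dict]:
    """Try each model in order and return (model_used, compiled_policy)."""
    errors = []
    for model in chain:
        try:
            return model, compile_fn(model, text)
        except CompileUnavailable as exc:
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all compilers unavailable: " + "; ".join(errors))


def fake_compile(model: str, text: str) -> dict:
    # Stand-in for a real model call: pretend the primary compiler is down.
    if model == "minimax/minimax-m3":
        raise CompileUnavailable("timeout")
    return {"rules": [{"source": text}]}


model_used, policy = compile_with_fallback("Block auth changes without tests", fake_compile)
print(model_used)
```

Nothing in this loop runs at merge time. It only models the authoring step, which is where the post says model calls happen.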

Merge policy v2 rules

Block or warn when changed paths match risk tags (auth, migrations, infra, related lanes) or glob patterns. Require touched test files in the PR. Require a minimum count of current-head GitHub approvals — stale approvals on older commits do not count.
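
A deterministic evaluator for rules like these needs no model at all. The sketch below shows the general shape, assuming a compiled policy with glob lists, a test requirement, and an approval floor. The real policy schema is not shown in this post, so every key in `POLICY` and every field on `PullRequest` is hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class PullRequest:
    head_sha: str
    changed_paths: list[str]
    # Each approval records the commit it was given on.
    approvals: list[dict] = field(default_factory=list)


# Hypothetical compiled policy; key names are illustrative only.
POLICY = {
    "block_globs": ["migrations/*", "infra/*"],
    "require_tests": True,
    "test_globs": ["tests/*", "*_test.py"],
    "min_current_head_approvals": 1,
}


def evaluate(policy: dict, pr: PullRequest) -> list[str]:
    """Return a list of violations; an empty list means the check passes."""
    violations = []
    for path in pr.changed_paths:
        if any(fnmatch(path, glob) for glob in policy["block_globs"]):
            violations.append(f"blocked path: {path}")
    if policy["require_tests"]:
        touched_tests = any(
            fnmatch(path, glob)
            for path in pr.changed_paths
            for glob in policy["test_globs"]
        )
        if not touched_tests:
            violations.append("no test files touched")
    # Stale approvals on older commits do not count.
    current = [a for a in pr.approvals if a["commit"] == pr.head_sha]
    if len(current) < policy["min_current_head_approvals"]:
        violations.append("not enough approvals on current head")
    return violations


pr = PullRequest(
    head_sha="abc123",
    changed_paths=["migrations/0042_add_index.sql", "app/models.py"],
    approvals=[{"user": "reviewer", "commit": "old999"}],
)
print(evaluate(POLICY, pr))
```

The same inputs always produce the same violations, which is the property that lets a branch-protection check stay stable across releases.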

Safety modes

Three safety modes control what happens after compile. “Require review” blocks both apply and the policy PR when confidence is low. “Allow draft PR” still opens a draft policy pull request for manual review. “Ask followups” holds apply until clarification questions are answered.

Authoring flow
  1. Operator describes merge rules in natural language on Control Board
  2. Critique compiles to strict JSON with MiniMax-M3 (fallback chain)
  3. Preview JSON + canonical YAML + live rule diff + confidence badge
  4. Save to dashboard, open draft/ready policy PR, or copy YAML, with validation gates intact
Enforcement (unchanged contract)
  1. Pull request reaches merge gate
  2. Deterministic evaluator reads .critique/policy.yml
  3. Critique / Merge Policy check publishes, with no LLM at merge time

Critique @critique · 3 Jun 2026

Companion essay · v5 ship window

MiniMax M3 welcome pricing — same compile lane as review

M3 joins the paid PR review catalog at 1.5 credits per run through the welcome window — then 3 credits shelf. The merge policy compiler uses the same model family when available, so teams trialing M3 on reviews can reuse economic familiarity on policy compile without a separate SKU story.
June 2026 — v5 model lane

MiniMax M3 on PR review and Remedy — welcome pricing through mid-June.

Critique Chat stays Ling 2.6 Flash and DeepSeek V4 Flash only — no extra chat fee. M3 and Qwen3.7 Plus are review and Remedy lanes, not chat pickers. Older MiniMax or Qwen chat preferences normalize to DeepSeek V4 Flash so review models are not silently treated as free chat.

  • MiniMax M3 (50% welcome): 1.5 credits / run, 3 credits shelf. Welcome price runs through June 17, 2026 (UTC).
  • Qwen3.7 Plus (review lane): 1.5 credits / run. Legacy Qwen ids alias forward. Lead, specialist, Remedy stacks.

Blog promo strips with gold treasury styling now accept custom headlines and intros — so M3/Qwen launches no longer reuse DeepSeek-only copy from earlier catalog posts. `/models` and `/pricing` show the active welcome window with strikethrough shelf pricing, promo countdown, and clear post-promo credit floor.
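
For budgeting, the welcome window reduces to a date comparison. The sketch below uses the figures from this post (1.5 credits through June 17, 2026 UTC, then 3) and assumes "through June 17" covers the whole of that UTC day. The function names are made up for the example.

```python
from datetime import datetime, timezone

# Figures from the post: 1.5 credits through June 17, 2026 (UTC), then 3.
WELCOME_CREDITS = 1.5
SHELF_CREDITS = 3.0
# Assumption: "through June 17" means the promo covers all of that UTC day.
WELCOME_ENDS = datetime(2026, 6, 18, tzinfo=timezone.utc)


def m3_credits_per_run(when: datetime) -> float:
    """Credits for one M3 review run started at `when` (must be timezone-aware)."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    in_window = when.astimezone(timezone.utc) < WELCOME_ENDS
    return WELCOME_CREDITS if in_window else SHELF_CREDITS


def cost_for_runs(run_times: list[datetime]) -> float:
    """Total credits for a list of run start times."""
    return sum(m3_credits_per_run(t) for t in run_times)


runs = [
    datetime(2026, 6, 10, 9, 0, tzinfo=timezone.utc),
    datetime(2026, 6, 17, 23, 59, tzinfo=timezone.utc),
    datetime(2026, 6, 18, 0, 0, tzinfo=timezone.utc),
]
print(cost_for_runs(runs))
```

Two runs inside the window and one after it come to 6.0 credits under these assumptions.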

Bring-your-own-agent for Cursor graduated in v5. Critique now queues through the Cursor Agent SDK on Composer 2.5 (`composer-2.5`) in Cursor cloud VMs — isolated repo clone, tool loop, PR URL, and `workOnCurrentBranch` on the head you reviewed. Execution bills your Cursor plan, not Critique review credits. If the SDK path is unavailable in a given deploy, Critique falls back to the Cloud Agents REST API with the same handoff shape.

Settings → Cursor agent (BYOA) and completed review runs both explain the SDK path: save your key once, optionally add operator instructions, choose Queue Cursor agent, then choose Open in Cursor when the cloud run URL is ready. JSON export at `GET /api/review-runs/{reviewRunId}/byoa/cursor` still works for custom CI.

Critique @critique · 4 Jun 2026

Companion essay · v5 ship window

Harness vs model — why we queue Cursor instead of replicating it

Teams argue about model leaderboards. Staff engineers argue about agent harnesses: Does the loop survive 40 minutes? Does it respect the PR branch? Critique does not try to replicate Cursor’s harness inside our sandboxes for BYOA — we already have Remedy for Critique-managed OpenCode on E2B.

v4.2 introduced repo home at `/dashboard/pull-requests` — searchable PR table, attention states, side inspector, checkpoint blockers, and quick settings for model lane, runtime, depth, context packs, and GitHub publish behavior. v5 fixes the footgun where repo pickers showed a short alphabetical slice instead of your full installation catalog.

Repo pickers on the repo-first dashboard now surface every connected repository. The old empty state capped quick picks at six repos with no search. Repo home shows an always-visible searchable list with connected-repo count; the header selector opens in a portaled menu so long lists are not clipped inside the scrolling dashboard shell. Issues and review-run repo filters use the same full installation list. Critique Chat and Workspace repo menus list the full catalog when opened — search still narrows, scrolling reaches repos beyond the first screen.

Dashboard and Settings share one workspace chrome — sidebar, top bar, spacing — so repo home, review runs, issues, control, usage, help, and settings subpages feel like one signed-in product, not a mix of marketing chrome and dashboard panels.

Compliance teams kept asking for something they could attach to SOC 2 or ISO evidence requests without granting auditors Critique login. v5 ships Change Passport export: from the passport timeline, passport queue, or a review run linked to a passport, download redacted JSON with manifest section hashes, snapshots, provenance, risk, merge policy decisions, remedy proof, incidents, and timeline — HMAC-signed for integrity. Batch export covers up to 25 filtered passports in one file.

Copy positions this as audit evidence formatted for SOC 2 / ISO review, not a compliance certification. Critique exports what the passport recorded; your auditor decides whether that satisfies control objectives.

Passport export API

Platform API and signed-in dashboard routes mirror the same bundle shape.

curl -sS "https://critique.sh/api/v1/passports/${PASSPORT_ID}/export" \
  -H "Authorization: Bearer ${CRT_API_KEY}" \
  -o passport-audit.json
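
Once the file is on disk, an auditor-side integrity check could look roughly like the sketch below. The post does not document the bundle layout, the hash algorithm, or how the HMAC key is shared, so the field names (`payload`, `sections`, `manifest.sectionHashes`, `signature`), the use of SHA-256, and the sorted-key JSON canonical form are all assumptions. The example builds its own toy bundle so it runs offline.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    # Assumption: sorted-key, compact JSON is the canonical form that was signed.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: dict, secret: bytes) -> str:
    """HMAC-SHA256 over the canonical payload, hex encoded."""
    return hmac.new(secret, canonical(payload), hashlib.sha256).hexdigest()


def verify_bundle(bundle: dict, secret: bytes) -> bool:
    """Check each manifest section hash, then the HMAC over the whole payload."""
    payload = bundle["payload"]
    for name, expected in payload["manifest"]["sectionHashes"].items():
        actual = hashlib.sha256(canonical(payload["sections"][name])).hexdigest()
        if not hmac.compare_digest(actual, expected):
            return False
    return hmac.compare_digest(sign(payload, secret), bundle["signature"])


# Build a toy bundle (hypothetical shape) so the sketch is self-contained.
secret = b"demo-secret"
sections = {"risk": {"level": "high"}, "timeline": [{"event": "merged"}]}
payload = {
    "sections": sections,
    "manifest": {
        "sectionHashes": {
            name: hashlib.sha256(canonical(body)).hexdigest()
            for name, body in sections.items()
        }
    },
}
bundle = {"payload": payload, "signature": sign(payload, secret)}
print(verify_bundle(bundle, secret))

# Editing any section after signing should make verification fail.
tampered = json.loads(json.dumps(bundle))
tampered["payload"]["sections"]["risk"]["level"] = "low"
print(verify_bundle(tampered, secret))
```

`hmac.compare_digest` is used instead of `==` so the comparison takes constant time regardless of where the strings differ.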

Review-run findings gain one-click feedback without leaving the dashboard. Expand a finding to mark Accepted, False positive, Fixed, or Suppress — action buttons no longer compete with the expand control. Private memory, suppressions, and the existing feedback ledger still apply by default.

Optional anonymized model-feedback sharing is strictly opt-in. Grant or revoke consent per installation (or single repo) from Control Board → Memory, export a signed batch of queued examples, or tick Share anonymized feedback on a finding when consent is active. GitHub slash commands accept `--share-feedback` only with the same consent gate — no silent training export from PR comments.
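
The consent gate amounts to a two-part condition: the flag must be present and consent must be active for the installation or the repo. A minimal sketch, assuming an in-memory store: `ConsentStore`, `should_share`, and the ids are hypothetical, and only the `--share-feedback` flag comes from this post.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentStore:
    """In-memory stand-in for per-installation and per-repo consent records."""
    installations: set[str] = field(default_factory=set)
    repos: set[str] = field(default_factory=set)

    def allows(self, installation_id: str, repo: str) -> bool:
        return installation_id in self.installations or repo in self.repos


def should_share(
    flags: list[str], consent: ConsentStore, installation_id: str, repo: str
) -> bool:
    """Share only when the flag is present AND consent is active."""
    return "--share-feedback" in flags and consent.allows(installation_id, repo)


consent = ConsentStore(repos={"acme/api"})
print(should_share(["--share-feedback"], consent, "inst-1", "acme/api"))
print(should_share(["--share-feedback"], consent, "inst-1", "acme/web"))
print(should_share([], consent, "inst-1", "acme/api"))
```

Only the first call returns `True`: the second has the flag but no consent, and the third has consent but no flag.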

Coding Agent as API ships for automation teams: public overview at `/coding-agent-api`, `POST /api/v1/coding-agent/runs` to start an OpenCode-backed run (repo + prompt + model), `POST …/runs/{id}/messages` for follow-ups, and `GET …/runs/{id}` for status, timeline events, patch, and draft-PR linkage. Choose managed billing (Critique credits) or pass an OpenRouter key for that run only; optional draft PR publish and validation mode match Builder semantics.
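
The three routes above can be wrapped in a few small request builders. This is a sketch, not a client library: the JSON body keys (`repo`, `prompt`, `model`) are guesses from the "repo + prompt + model" description, the `…` prefix is assumed to stand for `/api/v1/coding-agent`, Bearer auth is borrowed from the passport export example, and the model id is a placeholder. Nothing is sent over the network until you call `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional

BASE = "https://critique.sh"  # host taken from the export example in this post
# Assumption: the elided "…" prefix in the post stands for this path.
RUNS_PREFIX = "/api/v1/coding-agent/runs"


def build_request(
    method: str, path: str, api_key: str, body: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    # Assumption: same Bearer scheme as the passport export route.
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def start_run(api_key: str, repo: str, prompt: str, model: str) -> urllib.request.Request:
    # Body keys are hypothetical; check the API docs for the real schema.
    body = {"repo": repo, "prompt": prompt, "model": model}
    return build_request("POST", RUNS_PREFIX, api_key, body)


def send_message(api_key: str, run_id: str, prompt: str) -> urllib.request.Request:
    return build_request("POST", f"{RUNS_PREFIX}/{run_id}/messages", api_key, {"prompt": prompt})


def get_run(api_key: str, run_id: str) -> urllib.request.Request:
    return build_request("GET", f"{RUNS_PREFIX}/{run_id}", api_key)


req = start_run("example-key", "acme/api", "Fix the failing test", "example/model-id")
print(req.get_method(), req.full_url)
# To actually send: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the builders easy to unit test without credentials.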

v5’s persistent-session upgrade keeps a warm OpenCode session between turns on the same run. After the first turn finishes, status becomes `idle` (`sessionActive: true` until `sessionExpiresAt`) while the E2B sandbox and OpenCode session stay connected. Follow-ups send the next prompt into that live session instead of spinning a new sandbox with chained summary text. Live activity streams over SSE at `GET /api/v1/coding-agent/runs/{id}/stream`. Close explicitly with `{ "endSession": true }` on the messages route when automation is done.
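
On the consuming side, automation needs two things: a way to read the SSE stream and a check for whether the session is still warm. The sketch below parses a Server-Sent Events body following the standard `event:` / `data:` framing and tests for `status == "idle"` with `sessionActive: true`. Those two fields and `endSession` come from this post. The event names and payloads in `sample` are invented for the example.

```python
import json


def _field_value(line: str, name: str) -> str:
    """Strip 'name:' and at most one leading space, per the SSE format."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(text: str) -> list[dict]:
    """Parse a Server-Sent Events body into a list of {event, data} dicts."""
    events, event, data = [], "message", []
    for line in text.splitlines():
        if line == "":  # a blank line dispatches the buffered event
            if data:
                events.append({"event": event, "data": "\n".join(data)})
            event, data = "message", []
        elif line.startswith(":"):  # comment or keep-alive
            continue
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))
    return events


def can_follow_up(run: dict) -> bool:
    """True when the run is idle with a warm session, per the fields in the post."""
    return run.get("status") == "idle" and run.get("sessionActive") is True


# Body for the messages route when automation is done (from the post).
END_SESSION_BODY = {"endSession": True}

# Invented sample stream; real event names and payloads may differ.
sample = (
    "event: activity\n"
    'data: {"step": "clone"}\n'
    "\n"
    ": keep-alive\n"
    "event: status\n"
    'data: {"status": "idle", "sessionActive": true}\n'
    "\n"
)
events = parse_sse(sample)
last = json.loads(events[-1]["data"])
print(len(events), can_follow_up(last))
```

A bot would send its next prompt only while `can_follow_up` holds, and post `END_SESSION_BODY` once it has nothing left to ask.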

Critique @critique · 4 Jun 2026

Companion essay · v5 ship window

Why chained follow-ups were the right MVP — and why warm sessions are the right v5

When we shipped the Coding Agent API in Critique v5, the honest constraint was visible in the docs: follow-ups were new jobs that replayed prior output as text. That worked everywhere. Persistent sessions are the upgrade for teams running multi-turn CI bots that should not pay for a full re-clone every message.

Workspace adds a durable agent queue in the explorer: line up prompts for Critique, Claude Code, or Codex in Ask or Build mode, send or remove items via `/api/workspace/queue`, scoped to your active repository and chat or builder session — so you can stage work before kicking off a long run.

Workspace inspector → Processes shows live run and request state (streaming chat, builder job, sandbox, retrieval) in one panel so operators can tell whether work is still moving without digging through raw logs first.

Insights at `/dashboard/insights` gives operators and leadership one hub grounded in Change Passport and gate evidence — not a separate analytics silo.

  • 30–90d: velocity vs risk charts, daily merges vs high-risk merges
  • BYOK: cost attribution, credits by repo + BYOK savings estimate
  • 14d: retrospective override heuristic window vs linked incidents
  • Opt-in: fleet benchmarks, anonymized cohort, no repo names exported
Insights surfaces
  • Velocity vs risk — throughput vs risky merges in one glance
  • Cost attribution — most expensive PRs this month + BYOK routing savings
  • Blame-aware retrospective reports — merges, incidents, checkpoint overrides with cited evidence
  • Staffing reports — projected weekly review load vs risk posture
  • One-click compliance period export — signed outer bundle + per-passport audit JSON
  • Fleet insights — benchmark comparisons when enough teams opt in (e.g. auth-path block rates)
Live rollups + backfill
  • Daily insight rollups update as reviews complete and gates fire
  • After upgrade, operators can backfill historical daily metrics from existing reviews and usage
  • 30- and 90-day charts become useful immediately instead of waiting for new traffic
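
A daily rollup of the kind described above can be rebuilt from historical merge records, which is what makes backfill possible. The sketch below counts total and high-risk merges per day over a trailing window. The record fields (`merged_on`, `risk`) and the function names are hypothetical, not Critique's data model.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_rollup(merges: list[dict], end: date, window_days: int = 30) -> dict:
    """Count total and high-risk merges per day over the trailing window."""
    start = end - timedelta(days=window_days - 1)
    days = defaultdict(lambda: {"merges": 0, "high_risk": 0})
    for merge in merges:
        day = merge["merged_on"]
        if start <= day <= end:
            days[day]["merges"] += 1
            if merge["risk"] == "high":
                days[day]["high_risk"] += 1
    return dict(days)


def high_risk_share(rollup: dict) -> float:
    """Fraction of merges in the window that were tagged high risk."""
    total = sum(day["merges"] for day in rollup.values())
    risky = sum(day["high_risk"] for day in rollup.values())
    return risky / total if total else 0.0


merges = [
    {"merged_on": date(2026, 6, 1), "risk": "low"},
    {"merged_on": date(2026, 6, 1), "risk": "high"},
    {"merged_on": date(2026, 6, 2), "risk": "low"},
    {"merged_on": date(2026, 4, 1), "risk": "high"},  # outside the 30-day window
]
rollup = daily_rollup(merges, end=date(2026, 6, 3))
print(rollup[date(2026, 6, 1)], high_risk_share(rollup))
```

Passing `window_days=90` over the same records gives the longer chart without any new traffic.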

v5 is large, but it is not everything on the roadmap. Passport export is audit evidence, not a compliance certification. Fleet insights require opt-in cohort participation — sparse cohorts show dry-run suggestions, not fabricated benchmarks. Model-feedback sharing does not automatically retrain frontier models; it feeds leaderboard metrics and signed export batches. Merge policy compilation can refuse or ask followups when confidence is low — that is intentional, not a bug.

Critique Chat did not gain M3 or Qwen3.7 Plus in the picker. Review and Remedy did. That separation keeps chat economics predictable and prevents review-tier models from being treated as free conversation.

First hour on v5 beta
  1. Do you need custom review lenses?
    Browse /skills, install into Chat, or publish at /skills/publish. Check /skills/leaderboard for outcome-ranked lenses.
  2. Are merge rules still edited as YAML by hand?
    Open Control Board → merge policy editor. Compile in English, review diff + confidence, save or open policy PR.
  3. Does compliance ask for PR audit trails?
    Export single passports or batch up to 25 from the queue. Use Insights compliance period export for calendar months.
  4. Does your team standardize on Cursor for fixes?
    Save Cursor API key in Settings → Agents. Queue from review runs, with the SDK path on composer-2.5 by default.
  5. Are you building internal coding bots?
    Read /coding-agent-api and the persistent sessions essay. Use idle + sessionActive for multi-turn runs.
  6. Can leadership see speed vs safety?
    Open /dashboard/insights, backfill if needed, and pin velocity vs risk for your installation.
Internally the release started as v4.3 (marketplace + dashboard polish), was renamed v4.5 when merge policy and exports landed, and became v5.0.0 when Cursor SDK BYOA, Coding Agent API persistent sessions, Insights, and the full repo catalog shipped as one coordinated platform update — larger than any v4 point release.
Critique v5.0.0 is the largest platform release since v4 Change Control. It adds the Agent Skill Marketplace, natural-language merge policy compiler, signed Change Passport exports, Cursor Agent SDK BYOA on Composer 2.5, Coding Agent API with persistent OpenCode sessions, repo-first dashboard improvements, finding feedback, opt-in anonymized model-feedback sharing, Workspace agent queue, and Insights for velocity, risk, cost, and compliance.
Critique PR Review v5 (April) reorganized the review-run page around diff-first layout and multi-turn OpenCode in one session. Platform v5.0.0 (June) is the change-control platform release: marketplace, merge policy v2, exports, Insights, Coding Agent API, and dashboard chrome. Same name family, different scope — platform v5 vs review UX v5.
Browse at /skills, publish at /skills/publish, and see outcome-based rankings at /skills/leaderboard. The Critique Chat skills sheet links to the marketplace. Install portable bundles through `npx skills add` compatible flows.
Merge policy enforcement does not call an LLM. LLMs compile operator intent at authoring time (MiniMax-M3 with fallbacks). Enforcement remains deterministic via Critique / Merge Policy and .critique/policy.yml, the same contract as v4.
Redacted JSON with manifest section hashes, snapshots, provenance, risk, merge policy decisions, remedy proof, incidents, and timeline — HMAC-signed. Positioned as SOC 2 / ISO audit evidence, not certification.
Review and findings stay on Critique credits (or BYOK for review models). Cursor cloud agent execution bills your Cursor plan via your saved API key. Critique queues through the Cursor Agent SDK on composer-2.5 when available.
M3 review runs cost 1.5 credits through June 17, 2026 (UTC), then 3 credits shelf. See /models, /pricing, and the M3/Qwen companion essay for benchmark context.
The Coding Agent API supports persistent sessions. After the first turn, runs can enter idle with sessionActive true while the E2B sandbox stays warm. POST follow-ups to /messages on the same run id; stream activity via SSE; end with endSession: true.
Signed-in Insights at /dashboard/insights charts daily merges against high-risk merges over 30- and 90-day windows, with cost attribution and optional fleet benchmarks when opted in.
Finding feedback is not shared by default. Anonymized sharing is opt-in per installation or repo from Control Board → Memory. Examples strip patches and secrets; they feed leaderboard metrics, not silent training export.

Run Critique v5 beta on your repositories

Connect GitHub, open repo home or passports, browse the Skill Marketplace, and export your first passport bundle. The v5 surfaces are live for signed-in installations.
