ProductJune 5, 202622 min readRepath Khan

Critique v5.1: We Hit Go, Walked Away for Five Hours, and Shipped While Reviewing Ourselves

v5.1.0 tightens the loop between coding agents and merge-grade review: transparent OpenCode decisions, credit previews, Builder-to-passport handoff, marketplace feedback you can see, and a repo-first Platform inbox — mostly built by autonomous agents with almost no human steering.

Critique v5.1

Autonomous ship · self-review · operator-first

critique.sh

v5.1.0 · 5 June 2026

We hit go. Five hours later, Critique had reviewed itself into v5.1.

Transparent OpenCode controllers. One Coding Agent across Builder and API with credit preview before you spend. Marketplace attribution on every skill. Repo-first inbox and Platform connections beta. Mostly built by autonomous agents — the same harness that reviews your PRs.

Read v5.1 ship log Coding Agent API

Review runs

Explainable

Controller waits, nudges, and evidence quality on the run

Coding Agent

One job

Builder UI and API share builderJob + credit preview

Marketplace

Attributed

Runs, false-positive rate, and leaderboard readiness on cards

Dashboard

Repo home

Cross-repo inbox · Platform when you need control room

v5.1.0

Ship date — 5 June 2026 (UTC)

~0 hrs

Autonomous agent window on this milestone (human: go + remedy)

One job

Builder UI and Coding Agent API share the same builderJob

/version

Operator ship log — outcome bullets, no repo archaeology

What we mean by “Critique built Critique”

Picture the boring version first, because the boring version is true. A cloud agent branch opened. It read the v5.1 spec from internal ship notes and the existing v5.0 surfaces. It implemented review-controller persistence, Builder credit previews, API idempotency, marketplace attribution UI, repo-home defaults, and connections copy. It ran unit tests. It opened a pull request.

Then Critique reviewed that pull request the same way it reviews yours: sandbox OpenCode, structured findings, passport linkage, merge policy context. A human did not line-edit TypeScript at 2 a.m. They pressed go, said run Remedy on the failures that mattered, and let the critique-review skill argue with the agent until the diff was boring enough to merge. The loop closed. Review infrastructure shipped on infrastructure that had just been reviewed by itself.

Traditional ship
Engineers write code in isolation→Review tool runs after the fact→Changelog written by whoever remembers
v5.1 ship window
Cloud agent implements from ship spec→Critique reviews the agent PR (same harness as customers)→Remedy + critique-review on failures; human only gates merge→Ship log and essay published from the same release branch

Review runs that explain themselves

PR Review v5 made the diff win the first screen. v5.1 answers the next question operators actually ask: why did the agent do that? OpenCode is no longer a black box that either returns JSON or times out. The cheap controller now persists liveness decisions (wait, nudge, abort) and artifact decisions (accept, follow up) with an evidence-quality score and explicit missing-requirement lists.

When merge is blocked, the review-run detail page does not dump twelve equal-weight buttons. It shows what blocked you across verdict, passport, merge policy, and evidence quality, then offers one primary path: Start Remedy, Queue BYOA, or Dismiss with feedback. That sounds like UX polish. It is actually policy: merge decisions should not feel like a settings panel.

Operator-visible signals
Evidence strong / usable / limited — one badge, proof panel when you need depth
Controller timeline — why Critique waited or followed up before accept
Repo quick presets — auth/security and payments/data paths with stronger models
Finding feedback — each action states it updates Findings Memory on the repo
Collector parity
Yarn audit JSON/NDJSON parsed for high and critical advisories
Same sandbox collector lane as npm/pnpm — fewer “skipped repo” dead ends

If you lived through the era where every OpenCode failure looked like a hard 500, this release will feel almost insultingly calm. Recoverable stalls read as stalls. Thin evidence reads as thin evidence. The product stops training you to debug the harness when you meant to judge the pull request.

One Coding Agent product, two entry points

The most common v5.0 confusion was philosophical: “Is Builder different from the API?” Yes and no. Same `builderJob`, different front door. v5.1 makes that impossible to misunderstand.

Builder (Workspace)
Composer with model picker, credit preview before submit, draft PR publish, and handoff to passport/review when a PR number exists. You see balance, floor, and projected balance after the minimum charge; submit is blocked before the request if you cannot afford the model.
Coding Agent API
The same job over HTTP: create with optional "preview": true, idempotent retries, cursor lists, durable SSE resume, models discovery, status polls, cancel, safety budgets (maxTurns, maxRuntimeMs, maxCredits), and signed webhooks for idle/completed/failed/cancelled.

Responses now return `builderJobId` plus `links.builder`, `links.status`, `links.stream`, and `links.passport` when a draft PR exists. The public `/coding-agent-api` page says this out loud. Signed-in operators can jump from marketing copy to their latest Builder job without guessing IDs.

From green Builder run to merge-grade review

Coding agents that only ship code are demos. Coding agents that ship code and land in your change-control story are products. When a published Builder job has a PR number, v5.1 surfaces draft PR and Change Passport actions together: open the PR, jump to review for that branch, or walk back to the repository passport queue. The Branch / PR tab links both directions.

That handoff is the v5 thesis in one gesture: generation is cheap; merge evidence is expensive. Critique already had passports and policy from v4. v5.1 makes sure Builder output does not dead-end in a URL you paste into Slack.

Marketplace feedback you can see

v5.0 launched `/skills` with browse, publish, and a performance leaderboard. v5.1 makes the flywheel visible before you publish your third version. Marketplace cards and skill detail pages show attributed review runs, false-positive rate, label counts, and leaderboard readiness. Publishers can trace “this skill was active on run X” to ranked quality instead of install vanity.

Automation catches up: `crt_` keys gain `read:skills`, pinned `SKILL.md` fetch by version, and portable bundles for CI portals. Install panels can mark a marketplace skill as the default PR-review lens per repository; manual dashboard launches attach the slug to review-run metadata for attribution.

/version

Ship log

Outcome bullets for v5.1.0 — what moved for operators, no file-path archaeology.

Explore →

Marketplace

Browse skills

Official and community critique-review lenses with performance signal on the card.

Explore →

API

Coding Agent API

Preview, idempotency, webhooks, and the same builderJob as Workspace Builder.

Explore →

v5.0.0

v5.0 platform essay

Marketplace, merge policy in English, passport exports, Cursor SDK — the v5.0 story.

Explore →

Deep dive

Persistent sessions

Warm OpenCode between API turns — companion essay from the v5.0 API wave.

Explore →

Repo home first, Platform when you need the control room

New installs no longer land on a first-run chooser. Repo home is the default. `/dashboard/pull-requests` opens on a cross-repo attention inbox: blocked, failed, warned, running, needs review — without forcing you to pick a repository before you see fire. Rows keep repository context attached; the picker shows webhook and cache health (Healthy, Stale cache, Missed events, Refresh failed, Install warning) where you actually select repos.

The old control room is not gone. It is `Platform` in the nav: passports, policy, delivery, connections, and fleet status when you need the wide lens. Repo-first for daily review work; Platform for the operating picture. Settings still let you switch experiences if your team runs control-room-first.

Connections as a beta platform layer

We stopped describing Connections as “just settings.” v5.1 labels what is live today: scoped `crt_` keys, MCP v0.1.0 with three tools (list_passports, get_review_run, queue_review), REST v1 for passports and review runs, the Coding Agent API, native Linear and Slack, and Zapier-first positioning for Jira, Sentry, Vercel, and everything else in the long tail.

Three MCP tools is thin for the v5 story. We know. Expanding MCP is the obvious P0 — queue_remedy, get_passport, list_review_runs, compile_merge_policy_draft — because MCP is how Cursor and Claude distributions actually install habits. v5.1 ships honest beta framing instead of pretending the catalog is finished.

We really do not stop shipping

Apr 2026

PR Review v5

Diff-first runs, multi-turn OpenCode, quieter default streams.

3 Jun 2026

v5.0.0 beta

Marketplace, NL merge policy, passport exports, Coding Agent API warm sessions, repo-first dashboard, Insights.

5 Jun 2026

v5.1.0

Transparent controllers, one Coding Agent product, credit previews, marketplace attribution, Platform inbox — mostly agent-built.

v5.0 was the “this is a platform” release. v5.1 is the “this is how we operate the platform while agents ship the next slice” release. The semver is honest: not a rewrite, a tightening of mental models and operator paths that v5.0 introduced faster than docs could follow.

Human
Go
Pick branch, enable cloud agent, set scope to v5.1 ship notes.
Agents
~5 hours
Implement, test, PR, Critique review, Remedy on failures — repeat until mergeable.
Critique
Self-review
Same critique-review skill and passport evidence as customer runs; findings feed marketplace metrics.
Ship
v5.1.0
Changelog on /version, this essay, package.json 5.1.0 — deploy.

What to do Monday morning

If you are an operator: open a recent review run that blocked merge and read the controller timeline. Flip a repo to an auth/security preset and rerun. If you automate coding: call create with `"preview": true` once before you wire CI spend. If you publish skills: check whether your lens shows attributed runs yet; install defaults per repo if you want leaderboard credit on manual launches.

If you are evaluating whether Critique is “real”: watch us ship v5.1 mostly agent-driven, then read the same findings we show you on your PRs. The ship log at `/version` is the operator contract. This essay is the why.

Read the ship log

Every bullet on /version is written for operators and finance — active voice, no private repo links, no implementation dump.

Open /version

Compare Critique

Compare the main AI code review options.

If this article is part of a buying process, these pages compare Critique with the tools most teams evaluate for GitHub PR review.

Best AI code review tools AI code review pricing

← All essays Privacy & Terms

Ask about this essay

Nemotron-3-Super

Ask about the argument, the evidence, the structure, or how the post connects to Critique.

Not editorial advice · The essay above is the source of truth · Not saved to your account · OpenRouter privacy