Skip to content
Product22 min readRepath Khan

Critique v5.1: We Hit Go, Walked Away for Five Hours, and Shipped While Reviewing Ourselves

v5.1.0 tightens the loop between coding agents and merge-grade review: transparent OpenCode decisions, credit previews, Builder-to-passport handoff, marketplace feedback you can see, and a repo-first Platform inbox — mostly built by autonomous agents with almost no human steering.

Critique v5.1

Autonomous ship · self-review · operator-first

critique.sh

v5.1.0 · 5 June 2026

We hit go. Five hours later, Critique had reviewed itself into v5.1.

Transparent OpenCode controllers. One Coding Agent across Builder and API with credit preview before you spend. Marketplace attribution on every skill. Repo-first inbox and Platform connections beta. Mostly built by autonomous agents — the same harness that reviews your PRs.

Review runs
Explainable
Controller waits, nudges, and evidence quality on the run
Coding Agent
One job
Builder UI and API share builderJob + credit preview
Marketplace
Attributed
Runs, false-positive rate, and leaderboard readiness on cards
Dashboard
Repo home
Cross-repo inbox · Platform when you need control room
v5.1.0
Ship date — 5 June 2026 (UTC)
~0 hrs
Autonomous agent window on this milestone (human: go + remedy)
One job
Builder UI and Coding Agent API share the same builderJob
/version
Operator ship log — outcome bullets, no repo archaeology

Picture the boring version first, because the boring version is true. A cloud agent branch opened. It read the v5.1 spec from internal ship notes and the existing v5.0 surfaces. It implemented review-controller persistence, Builder credit previews, API idempotency, marketplace attribution UI, repo-home defaults, and connections copy. It ran unit tests. It opened a pull request.

Then Critique reviewed that pull request the same way it reviews yours: sandbox OpenCode, structured findings, passport linkage, merge policy context. A human did not line-edit TypeScript at 2 a.m. They pressed go, said run Remedy on the failures that mattered, and let the critique-review skill argue with the agent until the diff was boring enough to merge. The loop closed. Review infrastructure shipped on infrastructure that had just been reviewed by itself.

Traditional ship
Engineers write code in isolationReview tool runs after the factChangelog written by whoever remembers
v5.1 ship window
Cloud agent implements from ship specCritique reviews the agent PR (same harness as customers)Remedy + critique-review on failures; human only gates mergeShip log and essay published from the same release branch

PR Review v5 made the diff win the first screen. v5.1 answers the next question operators actually ask: why did the agent do that? OpenCode is no longer a black box that either returns JSON or times out. The cheap controller now persists liveness decisions (wait, nudge, abort) and artifact decisions (accept, follow up) with an evidence-quality score and explicit missing-requirement lists.

When merge is blocked, the review-run detail page does not dump twelve equal-weight buttons. It shows what blocked you across verdict, passport, merge policy, and evidence quality, then offers one primary path: Start Remedy, Queue BYOA, or Dismiss with feedback. That sounds like UX polish. It is actually policy: merge decisions should not feel like a settings panel.

Operator-visible signals
  • Evidence strong / usable / limited — one badge, proof panel when you need depth
  • Controller timeline — why Critique waited or followed up before accept
  • Repo quick presets — auth/security and payments/data paths with stronger models
  • Finding feedback — each action states it updates Findings Memory on the repo
Collector parity
  • Yarn audit JSON/NDJSON parsed for high and critical advisories
  • Same sandbox collector lane as npm/pnpm — fewer “skipped repo” dead ends

If you lived through the era where every OpenCode failure looked like a hard 500, this release will feel almost insultingly calm. Recoverable stalls read as stalls. Thin evidence reads as thin evidence. The product stops training you to debug the harness when you meant to judge the pull request.

The most common v5.0 confusion was philosophical: “Is Builder different from the API?” Yes and no. Same `builderJob`, different front door. v5.1 makes that impossible to misunderstand.

Builder (Workspace)

Composer with model picker, credit preview before submit, draft PR publish, and handoff to passport/review when a PR number exists. You see balance, floor, and projected balance after the minimum charge; submit is blocked before the request if you cannot afford the model.

Coding Agent API

The same job over HTTP: create with optional `"preview": true`, idempotent retries, cursor lists, durable SSE resume, models discovery, status polls, cancel, safety budgets (`maxTurns`, `maxRuntimeMs`, `maxCredits`), and signed webhooks for idle/completed/failed/cancelled.

Responses now return `builderJobId` plus `links.builder`, `links.status`, `links.stream`, and `links.passport` when a draft PR exists. The public `/coding-agent-api` page says this out loud. Signed-in operators can jump from marketing copy to their latest Builder job without guessing IDs.

Coding agents that only ship code are demos. Coding agents that ship code and land in your change-control story are products. When a published Builder job has a PR number, v5.1 surfaces draft PR and Change Passport actions together: open the PR, jump to review for that branch, or walk back to the repository passport queue. The Branch / PR tab links both directions.

That handoff is the v5 thesis in one gesture: generation is cheap; merge evidence is expensive. Critique already had passports and policy from v4. v5.1 makes sure Builder output does not dead-end in a URL you paste into Slack.

v5.0 launched `/skills` with browse, publish, and a performance leaderboard. v5.1 makes the flywheel visible before you publish your third version. Marketplace cards and skill detail pages show attributed review runs, false-positive rate, label counts, and leaderboard readiness. Publishers can trace “this skill was active on run X” to ranked quality instead of install vanity.

Automation catches up: `crt_` keys gain `read:skills`, pinned `SKILL.md` fetch by version, and portable bundles for CI portals. Install panels can mark a marketplace skill as the default PR-review lens per repository; manual dashboard launches attach the slug to review-run metadata for attribution.

New installs no longer land on a first-run chooser. Repo home is the default. `/dashboard/pull-requests` opens on a cross-repo attention inbox: blocked, failed, warned, running, needs review — without forcing you to pick a repository before you see fire. Rows keep repository context attached; the picker shows webhook and cache health (Healthy, Stale cache, Missed events, Refresh failed, Install warning) where you actually select repos.

The old control room is not gone. It is `Platform` in the nav: passports, policy, delivery, connections, and fleet status when you need the wide lens. Repo-first for daily review work; Platform for the operating picture. Settings still let you switch experiences if your team runs control-room-first.

We stopped describing Connections as “just settings.” v5.1 labels what is live today: scoped `crt_` keys, MCP v0.1.0 with three tools (`list_passports`, `get_review_run`, `queue_review`), REST v1 for passports and review runs, the Coding Agent API, native Linear and Slack, and Zapier-first positioning for Jira, Sentry, Vercel, and everything else in the long tail.

Three MCP tools is thin for the v5 story. We know. Expanding MCP is the obvious P0 — `queue_remedy`, `get_passport`, `list_review_runs`, `compile_merge_policy_draft` — because MCP is how Cursor and Claude distributions actually install habits. v5.1 ships honest beta framing instead of pretending the catalog is finished.

Apr 2026
PR Review v5

Diff-first runs, multi-turn OpenCode, quieter default streams.

3 Jun 2026
v5.0.0 beta

Marketplace, NL merge policy, passport exports, Coding Agent API warm sessions, repo-first dashboard, Insights.

5 Jun 2026
v5.1.0

Transparent controllers, one Coding Agent product, credit previews, marketplace attribution, Platform inbox — mostly agent-built.

v5.0 was the “this is a platform” release. v5.1 is the “this is how we operate the platform while agents ship the next slice” release. The semver is honest: not a rewrite, a tightening of mental models and operator paths that v5.0 introduced faster than docs could follow.

  1. Human
    Go

    Pick branch, enable cloud agent, set scope to v5.1 ship notes.

  2. Agents
    ~5 hours

    Implement, test, PR, Critique review, Remedy on failures — repeat until mergeable.

  3. Critique
    Self-review

    Same critique-review skill and passport evidence as customer runs; findings feed marketplace metrics.

  4. Ship
    v5.1.0

    Changelog on /version, this essay, package.json 5.1.0 — deploy.

If you are an operator: open a recent review run that blocked merge and read the controller timeline. Flip a repo to an auth/security preset and rerun. If you automate coding: call create with `"preview": true` once before you wire CI spend. If you publish skills: check whether your lens shows attributed runs yet; install defaults per repo if you want leaderboard credit on manual launches.

If you are evaluating whether Critique is “real”: watch us ship v5.1 mostly agent-driven, then read the same findings we show you on your PRs. The ship log at `/version` is the operator contract. This essay is the why.

Read the ship log

Every bullet on /version is written for operators and finance — active voice, no private repo links, no implementation dump.

Open /version