Product · May 30, 2026 · 24 min read · The Critique Team

Critique v4: The AI Change Control Platform (and What Happened to “Just Review”)

Reviews still run — in sandboxes, with evidence. v4 changes what Critique optimizes for: who gets a full review, why it runs, what may merge, and one passport per PR that remembers the answer.

v4.0.0

Largest product gravity shift since launch

Change Passport per pull request — the product object

Enforcement phases: Gate, Review, Merge

∞

Evidence runs per passport — reviews did not go away

What we are building (and what we are not)

Our goal is to be the best AI management platform for AI-powered software change — the layer that decides how machine-generated and human-generated work flows through your repository before it becomes everyone else’s dependency. That is broader than “leave a review on the PR.” It includes gating, policy, risk, evidence, verified repair, and memory from production.

We are explicitly not trying to be a prettier version of the comment bots you already know: single-model diff commentators that optimize for thread volume. Those tools ask one question: “what might be wrong here?” Critique v4 asks four: who made this change, should we spend review on it, what proof blocks merge, and what did we learn last time this path broke?

Critique v1 through v3: review was the whole story

Early Critique was legible because the category was legible: AI code review for GitHub. You installed the App, a pull request opened, and Critique produced a review — scout context, specialist passes, a lead verdict, comments on the diff. The dashboard was a review inbox. Success meant “the bot caught something a human would have missed.”

v1–v2
Review as product
Multi-model PR review, GitHub check runs, growing specialist lanes. The unit of value was the review artifact on the commit.
v3
More engine, same object
Checkpoint (deterministic pre-review gate), Workspace, Remedy, BYOK, richer sandbox execution — but operators still lived in review runs and comments.
v4
Passport as product
The PR-level Change Passport is the system of record. Review runs become evidence generators inside that story.

That progression matters because nothing was thrown away. The review pipeline — retrieval, sandbox execution, model routing, specialist synthesis — is the engine that produces evidence. v4 changes what sits in the driver’s seat: the passport and the Control Board, not the comment thread.

v4 in one table: same engine, different contract

What actually changed in v4

If you used Critique in v3 last week, here is what moved.

Dimension	v3 (review-first)	v4 (change control)
Primary object	Review run on a commit	Change Passport on the PR
Dashboard home	Review runs list	Passports queue
Operator question	What did the model say?	May this change merge?
Provenance	Not tracked	Human, bot, or managed agent + confidence
Risk	Buried in prose	Score, band, filterable column
Pre-review	Checkpoint (separate product feel)	Agent Firewall tab on Control Board
Policy	Automation + merge scattered	Unified Policy tab (gate + review + merge slices)
Blocking	Opinion in comments	Blocking decision + evidenceId
Repair	Remedy suggestions	Remedy with proof bundle
Memory	Limited / ad hoc	Findings memory + incident learnings
Reviews	The product	Still run — now “evidence runs” on the passport

GitHub check names stay stable: Critique / Checkpoint, Critique / Review, Critique / Merge Policy. Branch protection you already configured keeps working.

Reviews did not disappear — they changed jobs

This is the most common misunderstanding about v4, so we will be explicit: Critique still runs deep, AI-powered reviews on your pull requests. When policy says a change deserves a full pass, we still spin up isolated sandboxes, still route frontier and mid-tier models, still run scout plus specialist lanes plus a lead synthesis — the same family of work you relied on in v3.

What we call that work in the UI is an evidence run: a commit-level record linked to the passport. Depending on repository settings, that run may execute as a managed sandbox review (our default premium path), a collector-backed review, or backend synthesis — but the operator outcome is the same class of artifact: findings, severity, an evidence contract, and a publishable GitHub check.

What did not change
AI-powered sandbox reviews still execute for approved PRs
Multi-model routing, credits, and BYOK paths still apply
Findings still post to GitHub when configured
Remedy can still turn eligible findings into verified patches
What changed
The review inbox is no longer the homepage of the product
A blocked merge must cite evidence, not vibes
Gate may stop a PR before any sandbox credits burn
One passport chains many runs into one auditable PR story

WHO, WHY, and WHAT NOT: the management layer

Comment bots are optimized for coverage: run on every PR, emit text, hope a human reads it. A management platform is optimized for decisions under constraint: limited reviewer attention, limited credits, unlimited agent enthusiasm.

WHO
Is this change from a trusted maintainer, a first-time contributor, a Dependabot bump, or an agent whose vendor we can infer? Provenance rides on the passport with confidence and reasons, not guessed from the avatar alone.
WHY
Given who opened it and what paths moved, does this PR deserve expensive review right now? Risk scoring and gate rules answer before the sandbox starts. Review policy answers which specialists run and how hard.

WHAT NOT
Some PRs should never reach a model panel: slop shape, forbidden paths, dependency weakening, override abuse. Others may review but still must not merge until evidence and merge policy agree. “What not” is enforceable, not advisory.
WHAT HAPPENED
After merge or incident, what did we learn? Findings memory, suppressions with expiry, and incident learnings feed the next gate — so the system does not rediscover the same false positive every week.

That framing is why the passport exists. GitHub comments are ephemeral. Check runs are per-commit fragments. The passport is the PR-level ledger that ties gate events, evidence runs, merge decisions, remedy proof, and incidents into one timeline an operator (or auditor) can read without reconstructing drama from fifty notifications.

What is a Change Passport?

A Change Passport is Critique’s system of record for a single pull request. It opens when Critique governs that PR and accumulates state until the change is merged or abandoned. One passport, many evidence runs, one auditable story.

Passport lifecycle
PR opened → passport created → Provenance + risk scored → Agent Firewall (gate) → Evidence run(s) if policy allows → Merge policy evaluates → Remedy proof (optional) → Merge or block with recorded reason
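
The lifecycle above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not Critique's schema: the `Stage` names, the `Passport` class, the `advance` method, and the allowed transitions (including the gate closing a PR before any evidence run) are hypothetical, and the optional Remedy step is left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names mirroring the lifecycle line above.
    CREATED = "passport created"
    SCORED = "provenance + risk scored"
    GATED = "agent firewall (gate)"
    EVIDENCE = "evidence run(s)"
    MERGE_POLICY = "merge policy evaluated"
    CLOSED = "merged or blocked"


# Assumed transitions: the gate may close a PR before any evidence run,
# and a new commit may send a PR back from merge policy to another run.
NEXT = {
    Stage.CREATED: {Stage.SCORED},
    Stage.SCORED: {Stage.GATED},
    Stage.GATED: {Stage.EVIDENCE, Stage.CLOSED},
    Stage.EVIDENCE: {Stage.EVIDENCE, Stage.MERGE_POLICY},
    Stage.MERGE_POLICY: {Stage.EVIDENCE, Stage.CLOSED},
    Stage.CLOSED: set(),
}


@dataclass
class Passport:
    pr_number: int
    stage: Stage = Stage.CREATED
    timeline: list = field(default_factory=list)

    def advance(self, to: Stage, reason: str) -> None:
        """Move to the next stage and record why, or refuse the jump."""
        if to not in NEXT[self.stage]:
            raise ValueError(f"{self.stage.name} -> {to.name} is not allowed")
        self.timeline.append((self.stage, to, reason))
        self.stage = to
```

Recording the reason on every transition is what makes the timeline readable later; a passport that only stored its current stage could not explain how it got there.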

The queue at /dashboard/passports is how operators live in v4: filter by repo, risk band, gate outcome, verdict, evidence status, merge policy state, and memory. Click through to the passport detail and you see summary, provenance, risk, gate timeline, linked evidence runs, merge permission (including overrides), remedy proof, suppressions, linked incidents, and a single chronological feed — not six disconnected admin pages.

If no immutable snapshot exists yet, the passport still renders: provenance may be labeled heuristic, derived from live PR signals, so the queue is never empty while the first review is still queued. That is a deliberate v4 choice: control rooms fail when the primary table is blank.

Three enforcement phases (gate → review → merge)

One policy model, three moments

Unified on the Control Board — not separate “automation” and “merge policy” products.

Phase	When	GitHub check	Question answered
Gate	Before review queue	Critique / Checkpoint	Should this PR consume review at all?
Review	During evidence run	Critique / Review	What defects and risks does evidence support?
Merge	After evidence exists	Critique / Merge Policy	May this change merge under our rules?

Gate is cheap and fast: contributor trust, PR shape, paths, dependencies. Review is expensive and thorough: sandbox execution, specialists, evidence contract. Merge is the enforceable boundary: owners, risk bands, proof requirements, and a dry-run, warn, enforce progression for when you are not ready to block contributors yet.
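
The ordering of the three phases can be sketched as a short pipeline. This is an illustration under assumptions, not Critique's evaluator: `run_phases` and its callable parameters are hypothetical names. Only the check names come from the table above, and `success` / `failure` are standard GitHub check run conclusions.

```python
from typing import Callable, Dict, List


def run_phases(
    gate: Callable[[], bool],
    review: Callable[[], List[str]],
    merge: Callable[[List[str]], bool],
) -> Dict[str, str]:
    """Run gate, review, merge in order and return a conclusion per check.

    gate   returns True when the PR may consume review.
    review returns the list of blocking findings (empty when clean).
    merge  receives those findings and returns True when merge is allowed.
    """
    checks: Dict[str, str] = {}

    if not gate():
        # The cheap phase stopped the PR: the expensive review never starts.
        checks["Critique / Checkpoint"] = "failure"
        return checks
    checks["Critique / Checkpoint"] = "success"

    blocking = review()
    checks["Critique / Review"] = "failure" if blocking else "success"

    allowed = merge(blocking)
    checks["Critique / Merge Policy"] = "success" if allowed else "failure"
    return checks
```

The early return after a failed gate is the point of the design: the review callable, which stands in for sandbox execution, is never invoked for a PR the gate rejected.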

Why we rebuilt the dashboard around the merge boundary

For two years the industry optimized for writing code. Agents now open pull requests on their own, refactor across dozens of files, and ship while the author sleeps. Generation is solved. What did not scale is deciding whether a change is safe to merge.

A review comment is advice. It does not record who made the change, it does not score the risk, it does not hold a line, and it forgets everything the moment the tab closes. When most of your diffs are written by machines, advice is not enough. You need a boundary with memory.

The six surfaces operators use

The dashboard was rebuilt so gravity matches the product. Passports come first. Evidence runs are drill-down. The Control Board replaces a scatter of settings URLs.

Passports
Primary queue
Every governed PR: repo, author, source badge, gate, risk, verdict, evidence, merge policy, proof, memory.
Passport detail
PR system of record
Full timeline: provenance, risk, gate, evidence runs, merge permission, Remedy proof, incidents.
Control Board
Gate · Policy · Delivery · Memory · Learnings
One surface for firewall rules, unified policy, webhooks, suppressions, incident promotion.
Evidence runs
Commit-level drill-down
Deep review record for one commit — sandbox review output, evidence contract, blocking decision.
Agent Firewall
Pre-review gate
WHO/WHAT NOT before credits burn: trust, paths, deps, workflow/auth weakening.
Merge policy
Policy as code
Dashboard or .critique/policy.yml — dry-run, warn, enforce; overrides recorded with provenance.

The five control layers (under the hood)

Surfaces are how operators touch v4. Layers are what persists. Together they are the difference between “we use an AI reviewer” and “we run change control.”

1. Provenance

Source kind (human, bot, managed agent), inferred vendor when detectable, confidence and reasons. Feeds the WHO decision and the passports queue badge so agent floods are visible without opening every PR.
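
As a rough illustration of what such a record might hold, the sketch below models provenance as a small immutable structure. The field names, the `infer_provenance` helper, and the confidence values are assumptions made for this example. The only external fact it relies on is that GitHub App bot accounts have logins ending in `[bot]` and an account type of `Bot`. Managed-agent detection is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    source_kind: str            # "human", "bot", or "managed_agent"
    vendor: Optional[str]       # inferred vendor, None when not detectable
    confidence: float           # 0.0 to 1.0, illustrative scale
    reasons: Tuple[str, ...]    # why this classification was chosen


def infer_provenance(login: str, account_type: str) -> Provenance:
    """Classify a PR author from two GitHub account signals (heuristic)."""
    if account_type == "Bot" or login.endswith("[bot]"):
        # Strip the "[bot]" suffix to get a vendor guess, e.g. "dependabot".
        vendor = login.removesuffix("[bot]") or None
        return Provenance("bot", vendor, 0.9, ("bot account signals present",))
    # Absence of bot signals is weak evidence, so confidence stays modest.
    return Provenance("human", None, 0.6, ("no bot signals found",))
```

Keeping `reasons` next to `confidence` lets a queue badge show why a source was classified the way it was, rather than only the label.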

2. Risk

Persisted score, band, and reasons on the run, flowing to passport and overview. Risk is how you sort a hundred open PRs into “review today” versus “gate already handled it.”
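
A minimal sketch of that sorting step, assuming a 0 to 100 score and three bands. The thresholds, the band names, and the `gate` / `risk_score` keys are hypothetical; the source does not state Critique's actual scale.

```python
from typing import Dict, List

# Hypothetical thresholds: (minimum score, band name), highest first.
BANDS = ((70, "high"), (40, "medium"), (0, "low"))


def risk_band(score: int) -> str:
    """Map a 0-100 risk score to a band name."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, band in BANDS:
        if score >= floor:
            return band
    raise AssertionError("unreachable: the last band starts at 0")


def review_today(passports: List[Dict]) -> List[Dict]:
    """Keep PRs the gate did not already handle, riskiest first."""
    open_for_review = [p for p in passports if p["gate"] == "passed"]
    return sorted(open_for_review, key=lambda p: p["risk_score"], reverse=True)
```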

3. Evidence

Evidence Contract v1 normalizes review artifacts. Blocking claims require evidenceId. Legacy runs still render through accessors — your historical sandbox reviews are not orphaned.

Without evidence contract
Blocking reads as opinion
No audit trail
Threads argue forever
With evidence contract
Verdict links to findings
Replayable from evidence run
Merge policy can enforce
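
The rule that blocking claims require an `evidenceId` can be expressed as a simple validation pass. This is a sketch under assumptions: findings are modeled as plain dictionaries, and every key other than `evidenceId` (which appears in the text above) is hypothetical.

```python
from typing import Dict, Iterable, List, Set


def unbacked_blocking_findings(
    findings: Iterable[Dict], known_evidence_ids: Set[str]
) -> List[Dict]:
    """Return blocking findings that cite no evidence, or unknown evidence.

    A finding with no "evidenceId" key counts as unbacked, because None
    is never a member of the known ID set.
    """
    return [
        f
        for f in findings
        if f.get("blocking") and f.get("evidenceId") not in known_evidence_ids
    ]
```

A non-empty result would mean a verdict is resting on opinion rather than a replayable record, which is the case the evidence contract is meant to rule out.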

4. Merge policy

Schema, evaluator, Critique / Merge Policy check. Dry-run while you learn, warn while you train contributors, enforce when you are ready. Overrides patch check status with recorded provenance.
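
One way to picture the three modes is as a mapping from policy violations to a check conclusion. The sketch below is an assumption about behavior, not Critique's evaluator: `MergeCheck`, `evaluate_merge_policy`, and the choice of `neutral` for non-blocking modes are hypothetical. The conclusion strings themselves are valid GitHub check run conclusions.

```python
from dataclasses import dataclass
from typing import List, Tuple

MODES = ("dry-run", "warn", "enforce")


@dataclass(frozen=True)
class MergeCheck:
    conclusion: str              # "success", "neutral", or "failure"
    visible_to_author: bool      # whether violations are surfaced on the PR
    violations: Tuple[str, ...]  # always recorded, whatever the mode


def evaluate_merge_policy(mode: str, violations: List[str]) -> MergeCheck:
    """Turn a list of policy violations into a check result for one mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    found = tuple(violations)
    if not found:
        return MergeCheck("success", True, found)
    if mode == "enforce":
        return MergeCheck("failure", True, found)
    # dry-run and warn never block; only warn surfaces the violations.
    return MergeCheck("neutral", mode == "warn", found)
```

In this sketch the violations are recorded in every mode, so a team in dry-run can still review what enforcement would have blocked before switching it on.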

5. Verified repair + learning loop

Remedy stores proof: patch hash, validation, verification linkage. Findings memory and incident ingest (Sentry, Linear, Jira, Vercel, manual) promote learnings into rules. Production teaches the gate.
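
To make the idea of a proof bundle concrete, the sketch below hashes a patch and checks later that a stored proof still refers to the same bytes. The record layout, the function names, and the use of SHA-256 are assumptions for illustration; the source only says that proof includes a patch hash, validation, and verification linkage.

```python
import hashlib
import hmac
from typing import Dict


def patch_hash(patch_text: str) -> str:
    """Hex SHA-256 digest of the patch contents."""
    return hashlib.sha256(patch_text.encode("utf-8")).hexdigest()


def build_proof(patch_text: str, validation_passed: bool, evidence_run_id: str) -> Dict:
    """Assemble a hypothetical proof record for one repair patch."""
    return {
        "patch_hash": patch_hash(patch_text),
        "validation_passed": validation_passed,
        "evidence_run_id": evidence_run_id,  # linkage back to the run
    }


def proof_covers(proof: Dict, patch_text: str) -> bool:
    """True only if validation passed and the patch is byte-identical."""
    same_patch = hmac.compare_digest(proof["patch_hash"], patch_hash(patch_text))
    return bool(proof["validation_passed"]) and same_patch
```

Hashing the patch means a proof cannot silently be reused for an edited version of the fix: any change to the patch text produces a different digest.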

Why this is not “Bugbot with extra steps”

Single-model comment tools compete on eloquence in the thread. Critique competes on governance across the PR lifetime. You can run Critique in dry-run on merge policy and gate while still posting reviews — or you can enforce blocks that comments cannot enforce. You can show an auditor the passport, not a screenshot of a bot saying “LGTM with concerns.”

We still want the best review engine in the category. We no longer want the category to stop at review. The teams winning in 2026 are not the ones with the most comments — they are the ones with the shortest path from agent PR to trustworthy merge.

Migrating from v3: what you will notice

First week on v4
1. Where did my homepage go?
Passports replaced the review-run inbox as default gravity. Review runs live under Evidence runs, linked from each passport.
2. Did my checks break?
No. Critique / Checkpoint and Critique / Merge Policy names are unchanged. UI says Agent Firewall; GitHub says Checkpoint.
3. Do I have to enforce merge policy day one?
No. Dry-run and warn are first-class. Adopt enforcement when passport evidence quality earns trust.
4. What happened to automation settings?
Control Board → Policy and Delivery. Legacy automation editor remains for edge cases during migration.

Frequently asked questions

Did v4 remove AI code reviews?
No. v4 still runs AI-powered sandbox reviews (and other review paths) when gate and policy allow. Those runs are evidence runs attached to the passport. What changed is the primary product object and the questions Critique optimizes for: who, why, what not, and whether merge is allowed with proof.

What is a Change Passport?
The official record for one pull request under Critique control: who opened it, how risky it is, what the gate said, which evidence runs executed, what merge policy decided, whether Remedy proved a fix, and what memory or incidents apply, all in one timeline.

How does v4 differ from v3?
v3 centered ReviewRun and PR comments. v4 centers ChangePassport and the Control Board. The review pipeline still produces findings; v4 adds provenance, risk, enforceable merge policy, evidence contracts, remedy proof, and findings memory around that pipeline.

How is Critique different from comment bots?
Comment bots optimize thread text. Critique v4 optimizes merge-boundary decisions with enforceable checks, auditable passports, and gates that can block PRs before review credits are spent. Reviews are one layer, not the whole product.

Did the GitHub check names change?
No. Gate still publishes as Critique / Checkpoint; merge policy as Critique / Merge Policy. Agent Firewall is UI language only.

What happens to my existing review runs?
They remain available as evidence runs and link to passports when a passport exists. Legacy artifacts normalize into Evidence Contract v1 for display.

Where is the documentation?
See /docs/platform/change-control for passports, merge policy, evidence contract, and Control Board tabs. Marketing checkpoint landing remains at /checkpoint.

Primary sources

Critique v5.0.0 beta ship: What shipped after v4 (marketplace, policy, Insights, BYOA).
Change control docs: Operator reference for v4.
AI change control guide: Buyer-facing merge-boundary essay.
Open source PR control: Foundations and high-volume OSS.
Cursor Composer 2.5 BYOA: Fix handoffs after hosted review.
Checkpoint landing: Pre-review gate marketing surface.

Open Passports — watch the boundary fill in

Run merge policy in dry-run, let gate observe a week of agent PRs, and keep sandbox reviews on the PRs that deserve spend. That is v4.
