Critique v4: The AI Change Control Platform (and What Happened to “Just Review”)
Reviews still run — in sandboxes, with evidence. v4 changes what Critique optimizes for: who gets a full review, why it runs, what may merge, and one passport per PR that remembers the answer.
What we are building (and what we are not)
Our goal is to be the best AI management platform for AI-powered software change — the layer that decides how machine-generated and human-generated work flows through your repository before it becomes everyone else’s dependency. That is broader than “leave a review on the PR.” It includes gating, policy, risk, evidence, verified repair, and memory from production.
We are explicitly not trying to be a prettier version of the comment bots you already know: single-model diff commentators that optimize for thread volume. Those tools ask one question: “what might be wrong here?” Critique v4 asks four: who made this change, should we spend review on it, what proof blocks merge, and what did we learn last time this path broke?
Critique v1 through v3: review was the whole story
Early Critique was legible because the category was legible: AI code review for GitHub. You installed the App, a pull request opened, and Critique produced a review — scout context, specialist passes, a lead verdict, comments on the diff. The dashboard was a review inbox. Success meant “the bot caught something a human would have missed.”
- v1–v2: Review as product. Multi-model PR review, GitHub check runs, growing specialist lanes. The unit of value was the review artifact on the commit.
- v3: More engine, same object. Checkpoint (deterministic pre-review gate), Workspace, Remedy, BYOK, richer sandbox execution, but operators still lived in review runs and comments.
- v4: Passport as product. The PR-level Change Passport is the system of record. Review runs become evidence generators inside that story.
That progression matters because nothing was thrown away. The review pipeline — retrieval, sandbox execution, model routing, specialist synthesis — is the engine that produces evidence. v4 changes what sits in the driver’s seat: the passport and the Control Board, not the comment thread.
v4 in one table: same engine, different contract
If you used Critique in v3 last week, here is what moved.
| Dimension | v3 (review-first) | v4 (change control) |
|---|---|---|
| Primary object | Review run on a commit | Change Passport on the PR |
| Dashboard home | Review runs list | Passports queue |
| Operator question | What did the model say? | May this change merge? |
| Provenance | Not tracked | Human, bot, or managed agent + confidence |
| Risk | Buried in prose | Score, band, filterable column |
| Pre-review | Checkpoint (separate product feel) | Agent Firewall tab on Control Board |
| Policy | Automation + merge scattered | Unified Policy tab (gate + review + merge slices) |
| Blocking | Opinion in comments | Blocking decision + evidenceId |
| Repair | Remedy suggestions | Remedy with proof bundle |
| Memory | Limited / ad hoc | Findings memory + incident learnings |
| Reviews | The product | Still run — now “evidence runs” on the passport |
GitHub check names stay stable: Critique / Checkpoint, Critique / Review, Critique / Merge Policy. Branch protection you already configured keeps working.
Reviews did not disappear — they changed jobs
This is the most common misunderstanding about v4, so we will be explicit: Critique still runs deep, AI-powered reviews on your pull requests. When policy says a change deserves a full pass, we still spin up isolated sandboxes, still route frontier and mid-tier models, still run scout plus specialist lanes plus a lead synthesis — the same family of work you relied on in v3.
What we call that work in the UI is an evidence run: a commit-level record linked to the passport. Depending on repository settings, that run may execute as a managed sandbox review (our default premium path), a collector-backed review, or backend synthesis — but the operator outcome is the same class of artifact: findings, severity, an evidence contract, and a publishable GitHub check.
- AI-powered sandbox reviews still execute for approved PRs
- Multi-model routing, credits, and BYOK paths still apply
- Findings still post to GitHub when configured
- Remedy can still turn eligible findings into verified patches
- The review inbox is no longer the homepage of the product
- A blocked merge must cite evidence, not vibes
- Gate may stop a PR before any sandbox credits burn
- One passport chains many runs into one auditable PR story
WHO, WHY, and WHAT NOT: the management layer
Comment bots are optimized for coverage: run on every PR, emit text, hope a human reads it. A management platform is optimized for decisions under constraint: limited reviewer attention, limited credits, unlimited agent enthusiasm.
WHO. Is this change from a trusted maintainer, a first-time contributor, a dependabot bump, or an agent whose vendor we can infer? Provenance rides on the passport with confidence and reasons, not guessed from the avatar alone.
WHY. Given who opened it and what paths moved, does this PR deserve expensive review right now? Risk scoring and gate rules answer before the sandbox starts. Review policy answers which specialists run and how hard.
WHAT NOT. Some PRs should never reach a model panel: slop shape, forbidden paths, dependency weakening, override abuse. Others may review but still must not merge until evidence and merge policy agree. “What not” is enforceable, not advisory.
MEMORY. After merge or incident, what did we learn? Findings memory, suppressions with expiry, and incident learnings feed the next gate, so the system does not rediscover the same false positive every week.
That framing is why the passport exists. GitHub comments are ephemeral. Check runs are per-commit fragments. The passport is the PR-level ledger that ties gate events, evidence runs, merge decisions, remedy proof, and incidents into one timeline an operator (or auditor) can read without reconstructing drama from fifty notifications.
What is a Change Passport?
A Change Passport is Critique’s system of record for a single pull request. It opens when Critique governs that PR and accumulates state until the change is merged or abandoned. One passport, many evidence runs, one auditable story.
The queue at /dashboard/passports is how operators live in v4: filter by repo, risk band, gate outcome, verdict, evidence status, merge policy state, and memory. Click through to the passport detail and you see summary, provenance, risk, gate timeline, linked evidence runs, merge permission (including overrides), remedy proof, suppressions, linked incidents, and a single chronological feed — not six disconnected admin pages.
If no immutable snapshot exists yet, the passport still renders: provenance may be labeled heuristic from live PR signals so the queue is never empty while the first review queues. That is a deliberate v4 choice — control rooms fail when the primary table is blank.
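To make the "one passport, many evidence runs" shape concrete, here is a minimal Python sketch of how such a record might be modeled. Every class name, field name, and value below is hypothetical and chosen for illustration; this is not Critique's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    source_kind: str                 # "human", "bot", or "managed_agent" (labels assumed)
    confidence: float                # 0.0 to 1.0 (scale assumed)
    reasons: List[str] = field(default_factory=list)
    vendor: Optional[str] = None     # inferred vendor, when detectable
    heuristic: bool = False          # True when derived from live PR signals only


@dataclass
class EvidenceRun:
    commit_sha: str
    verdict: str                     # e.g. "pass" or "block" (values assumed)


@dataclass
class ChangePassport:
    repo: str
    pr_number: int
    provenance: Provenance
    evidence_runs: List[EvidenceRun] = field(default_factory=list)

    def latest_verdict(self) -> Optional[str]:
        """Verdict of the most recent evidence run, or None before the first run."""
        return self.evidence_runs[-1].verdict if self.evidence_runs else None


# A passport can exist before any review has run, with heuristic provenance.
passport = ChangePassport(
    repo="example/repo",
    pr_number=1,
    provenance=Provenance("managed_agent", 0.6, ["bot-style branch name"], heuristic=True),
)
verdict_before_review = passport.latest_verdict()

# Later commits append evidence runs to the same passport.
passport.evidence_runs.append(EvidenceRun("abc123", "block"))
passport.evidence_runs.append(EvidenceRun("def456", "pass"))
```

The point of the sketch is the ownership direction: runs hang off the passport, so the PR-level record still renders (with `heuristic=True`) when the run list is empty.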
Three enforcement phases (gate → review → merge)
Unified on the Control Board — not separate “automation” and “merge policy” products.
| Phase | When | GitHub check | Question answered |
|---|---|---|---|
| Gate | Before review queue | Critique / Checkpoint | Should this PR consume review at all? |
| Review | During evidence run | Critique / Review | What defects and risks does evidence support? |
| Merge | After evidence exists | Critique / Merge Policy | May this change merge under our rules? |
Gate is cheap and fast: contributor trust, PR shape, paths, dependencies. Review is expensive and thorough: sandbox execution, specialists, evidence contract. Merge is the enforceable boundary: owners, risk bands, proof requirements, and a path from dry-run to enforce for when you are not ready to block contributors yet.
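The ordering of the three phases can be sketched as a short pipeline that stops at the first refusal, which is how a gate can stop a PR before any review spend. The check names come from the table above; the rule bodies, the dictionary keys, and the forbidden-path prefix are placeholders assumed for illustration, not Critique's real rules.

```python
from typing import Callable, List, Tuple

PhaseFn = Callable[[dict], Tuple[bool, str]]


def gate(pr: dict) -> Tuple[bool, str]:
    # Assumed rule: treat CI workflow edits as a forbidden path.
    forbidden = [p for p in pr["paths"] if p.startswith(".github/workflows/")]
    if forbidden:
        return False, f"forbidden path touched: {forbidden[0]}"
    return True, "shape and paths acceptable"


def review(pr: dict) -> Tuple[bool, str]:
    blocking = [f for f in pr.get("findings", []) if f["blocking"]]
    if blocking:
        return False, f"{len(blocking)} blocking finding(s)"
    return True, "no blocking findings"


def merge(pr: dict) -> Tuple[bool, str]:
    if not pr.get("owner_approved", False):
        return False, "owner approval missing"
    return True, "merge policy satisfied"


PHASES: List[Tuple[str, PhaseFn]] = [
    ("Critique / Checkpoint", gate),
    ("Critique / Review", review),
    ("Critique / Merge Policy", merge),
]


def evaluate(pr: dict) -> List[Tuple[str, bool, str]]:
    """Run phases in order and stop at the first one that refuses the PR."""
    results = []
    for check_name, phase in PHASES:
        allowed, reason = phase(pr)
        results.append((check_name, allowed, reason))
        if not allowed:
            break
    return results
```

Calling `evaluate({"paths": [".github/workflows/ci.yml"]})` in this sketch yields a single refused Checkpoint result, and the review phase never runs.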
Why we rebuilt the dashboard around the merge boundary
For two years the industry optimized for writing code. Agents now open pull requests on their own, refactor across dozens of files, and ship while the author sleeps. Generation is solved. What did not scale is deciding whether a change is safe to merge.
A review comment is advice. It does not record who made the change, it does not score the risk, it does not hold a line, and it forgets everything the moment the tab closes. When most of your diffs are written by machines, advice is not enough. You need a boundary with memory.
The six surfaces operators use
The dashboard was rebuilt so gravity matches the product. Passports come first. Evidence runs are drill-down. The Control Board replaces a scatter of settings URLs.
- Passports (primary queue): every governed PR, with repo, author, source badge, gate, risk, verdict, evidence, merge policy, proof, memory.
- Passport detail (PR system of record): full timeline of provenance, risk, gate, evidence runs, merge permission, Remedy proof, incidents.
- Control Board (Gate · Policy · Delivery · Memory · Learnings): one surface for firewall rules, unified policy, webhooks, suppressions, incident promotion.
- Evidence runs (commit-level drill-down): deep review record for one commit, with sandbox review output, evidence contract, blocking decision.
- Agent Firewall (pre-merge gate): WHO/WHAT NOT before credits burn, covering trust, paths, deps, workflow/auth weakening.
- Policy (policy as code): dashboard or .critique/policy.yml; dry-run, warn, enforce; overrides recorded with provenance.
The five control layers (under the hood)
Surfaces are how operators touch v4. Layers are what persists. Together they are the difference between “we use an AI reviewer” and “we run change control.”
1. Provenance
Source kind (human, bot, managed agent), inferred vendor when detectable, confidence and reasons. Feeds the WHO decision and the passports queue badge so agent floods are visible without opening every PR.
2. Risk
Persisted score, band, and reasons on the run, flowing to passport and overview. Risk is how you sort a hundred open PRs into “review today” versus “gate already handled it.”
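As an illustration of "score, band, and reasons," the sketch below sums weighted signals into a score and maps it to a band. The signal names, weights, 0 to 100 scale, and band cutoffs are all invented for this example; the post does not publish Critique's real scoring model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical signals: (points, human-readable reason).
WEIGHTS: Dict[str, Tuple[int, str]] = {
    "touches_auth": (40, "auth code changed"),
    "agent_authored": (20, "opened by an agent"),
    "large_diff": (20, "large diff"),
}

# Hypothetical band floors on a 0-100 score, highest first.
BANDS: List[Tuple[int, str]] = [(75, "critical"), (50, "high"), (25, "medium"), (0, "low")]


@dataclass
class RiskAssessment:
    score: int
    band: str
    reasons: List[str]


def band_for(score: int) -> str:
    """Return the first band whose floor the score reaches."""
    for floor, name in BANDS:
        if score >= floor:
            return name
    raise ValueError("score must be non-negative")


def assess(signals: Dict[str, bool]) -> RiskAssessment:
    """Sum the weights of active signals, cap at 100, and attach reasons."""
    score = 0
    reasons: List[str] = []
    for key, (points, reason) in WEIGHTS.items():
        if signals.get(key):
            score += points
            reasons.append(reason)
    score = min(score, 100)
    return RiskAssessment(score, band_for(score), reasons)
```

Persisting the reasons next to the score is what makes the band explainable in a queue: sorting open PRs by `score` gives the "review today" ordering, and `reasons` says why.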
3. Evidence
Evidence Contract v1 normalizes review artifacts. Blocking claims require evidenceId. Legacy runs still render through accessors — your historical sandbox reviews are not orphaned.
Without an evidenceId:
- Blocking reads as opinion
- No audit trail
- Threads argue forever
With an evidenceId:
- Verdict links to findings
- Replayable from evidence run
- Merge policy can enforce
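The rule that blocking claims require an evidenceId can be sketched as a simple partition of findings. The `Finding` type and its fields are hypothetical, and the real Evidence Contract v1 is not shown in this post; the sketch only illustrates the idea that a block without a citation is not enforceable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    title: str
    blocking: bool
    evidence_id: Optional[str] = None   # stands in for the evidenceId in the text


def enforceable_blocks(findings: List[Finding]) -> List[Finding]:
    """Blocking findings that cite evidence and may therefore hold a merge."""
    return [f for f in findings if f.blocking and f.evidence_id]


def unsupported_blocks(findings: List[Finding]) -> List[Finding]:
    """Blocking findings with no evidence; in this sketch they stay advisory."""
    return [f for f in findings if f.blocking and not f.evidence_id]


findings = [
    Finding("SQL built by string concatenation", blocking=True, evidence_id="ev-1"),
    Finding("This feels risky", blocking=True),
    Finding("Rename variable", blocking=False),
]
```

Here only the first finding would count toward a block; the second is the "opinion in comments" case the table above describes.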
4. Merge policy
Schema, evaluator, Critique / Merge Policy check. Dry-run while you learn, warn while you train contributors, enforce when you are ready. Overrides patch check status with recorded provenance.
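One way to picture the three modes is as a mapping from policy violations to a check-run conclusion. The conclusion strings (`success`, `neutral`, `failure`) are standard GitHub check-run conclusions; how Critique actually maps modes to them, and the `notify_author` and `override_by` fields, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DRY_RUN = "dry-run"
    WARN = "warn"
    ENFORCE = "enforce"


@dataclass(frozen=True)
class CheckResult:
    conclusion: str        # a GitHub check-run conclusion
    notify_author: bool    # whether the contributor is told about violations
    override_by: str = ""  # who overrode the block, kept for provenance


def evaluate_merge(mode: Mode, violations: int, override_by: str = "") -> CheckResult:
    """Translate a policy evaluation into a check result for the given mode."""
    if violations == 0:
        return CheckResult("success", False)
    if mode is Mode.DRY_RUN:
        # Observe only: record the outcome without bothering the contributor.
        return CheckResult("neutral", False)
    if mode is Mode.WARN:
        # Surface the violations, but do not block the merge.
        return CheckResult("neutral", True)
    if override_by:
        # Enforced, but overridden: the check passes and the override is recorded.
        return CheckResult("success", True, override_by)
    return CheckResult("failure", True)
```

In this sketch dry-run and warn differ only in whether the author is notified, which matches the "learn, then train contributors, then enforce" progression described above.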
5. Verified repair + learning loop
Remedy stores proof: patch hash, validation, verification linkage. Findings memory and incident ingest (Sentry, Linear, Jira, Vercel, manual) promote learnings into rules. Production teaches the gate.
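A proof bundle of the kind described here could be as small as a content hash of the patch plus the validation outcome and a link back to the evidence. The sketch below assumes SHA-256 and hypothetical field names; the post says only that Remedy stores a patch hash, validation, and verification linkage, not how.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RemedyProof:
    patch_sha256: str        # hash algorithm is an assumption of this sketch
    validation_passed: bool
    evidence_id: str         # links the patch to the finding it repairs


def _digest(patch_text: str) -> str:
    return hashlib.sha256(patch_text.encode("utf-8")).hexdigest()


def build_proof(patch_text: str, validation_passed: bool, evidence_id: str) -> RemedyProof:
    """Record what was patched, whether it validated, and which evidence it answers."""
    return RemedyProof(_digest(patch_text), validation_passed, evidence_id)


def matches(proof: RemedyProof, patch_text: str) -> bool:
    """True when the given patch is byte-for-byte the one the proof was built for."""
    return _digest(patch_text) == proof.patch_sha256


proof = build_proof("--- a/app.py\n+++ b/app.py\n", True, "ev-1")
```

Hashing the patch means a later reader can confirm that the patch that merged is the one that was validated, rather than trusting a comment that says so.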
Why this is not “Bugbot with extra steps”
Single-model comment tools compete on eloquence in the thread. Critique competes on governance across the PR lifetime. You can run Critique in dry-run on merge policy and gate while still posting reviews — or you can enforce blocks that comments cannot enforce. You can show an auditor the passport, not a screenshot of a bot saying “LGTM with concerns.”
We still want the best review engine in the category. We no longer want the category to stop at review. The teams winning in 2026 are not the ones with the most comments — they are the ones with the shortest path from agent PR to trustworthy merge.
Migrating from v3: what you will notice
1. Where did my homepage go? Passports replaced the review-run inbox as default gravity. Review runs live under Evidence runs, linked from each passport.
2. Did my checks break? No. Critique / Checkpoint and Critique / Merge Policy names are unchanged. UI says Agent Firewall; GitHub says Checkpoint.
3. Do I have to enforce merge policy day one? No. Dry-run and warn are first-class. Adopt enforcement when passport evidence quality earns trust.
4. What happened to automation settings? Control Board → Policy and Delivery. Legacy automation editor remains for edge cases during migration.
Open Passports — watch the boundary fill in
Run merge policy in dry-run, let gate observe a week of agent PRs, and keep sandbox reviews on the PRs that deserve spend. That is v4.