24 min read · The Critique Team

Critique v4: The AI Change Control Platform (and What Happened to “Just Review”)

Reviews still run — in sandboxes, with evidence. v4 changes what Critique optimizes for: who gets a full review, why it runs, what may merge, and one passport per PR that remembers the answer.

  • v4.0.0: largest product gravity shift since launch
  • 1 Change Passport per pull request, the product object
  • 3 enforcement phases: Gate, Review, Merge
  • Evidence runs per passport: reviews did not go away

Our goal is to be the best AI management platform for AI-powered software change — the layer that decides how machine-generated and human-generated work flows through your repository before it becomes everyone else’s dependency. That is broader than “leave a review on the PR.” It includes gating, policy, risk, evidence, verified repair, and memory from production.

We are explicitly not trying to be a prettier version of the comment bots you already know: single-model diff commentators that optimize for thread volume. Those tools ask one question: “what might be wrong here?” Critique v4 asks four: who made this change, should we spend review on it, what proof blocks merge, and what did we learn last time this path broke?

Early Critique was legible because the category was legible: AI code review for GitHub. You installed the App, a pull request opened, and Critique produced a review — scout context, specialist passes, a lead verdict, comments on the diff. The dashboard was a review inbox. Success meant “the bot caught something a human would have missed.”

  1. v1–v2
    Review as product

    Multi-model PR review, GitHub check runs, growing specialist lanes. The unit of value was the review artifact on the commit.

  2. v3
    More engine, same object

    Checkpoint (deterministic pre-review gate), Workspace, Remedy, BYOK, richer sandbox execution — but operators still lived in review runs and comments.

  3. v4
    Passport as product

    The PR-level Change Passport is the system of record. Review runs become evidence generators inside that story.

That progression matters because nothing was thrown away. The review pipeline — retrieval, sandbox execution, model routing, specialist synthesis — is the engine that produces evidence. v4 changes what sits in the driver’s seat: the passport and the Control Board, not the comment thread.

What actually changed in v4

If you used Critique in v3 last week, here is what moved.

Dimension | v3 (review-first) | v4 (change control)
Primary object | Review run on a commit | Change Passport on the PR
Dashboard home | Review runs list | Passports queue
Operator question | What did the model say? | May this change merge?
Provenance | Not tracked | Human, bot, or managed agent + confidence
Risk | Buried in prose | Score, band, filterable column
Pre-review | Checkpoint (separate product feel) | Agent Firewall tab on Control Board
Policy | Automation + merge scattered | Unified Policy tab (gate + review + merge slices)
Blocking | Opinion in comments | Blocking decision + evidenceId
Repair | Remedy suggestions | Remedy with proof bundle
Memory | Limited / ad hoc | Findings memory + incident learnings
Reviews | The product | Still run, now “evidence runs” on the passport

GitHub check names stay stable: Critique / Checkpoint, Critique / Review, Critique / Merge Policy. Branch protection you already configured keeps working.

This is the most common misunderstanding about v4, so we will be explicit: Critique still runs deep, AI-powered reviews on your pull requests. When policy says a change deserves a full pass, we still spin up isolated sandboxes, still route frontier and mid-tier models, still run scout plus specialist lanes plus a lead synthesis — the same family of work you relied on in v3.

What we call that work in the UI is an evidence run: a commit-level record linked to the passport. Depending on repository settings, that run may execute as a managed sandbox review (our default premium path), a collector-backed review, or backend synthesis — but the operator outcome is the same class of artifact: findings, severity, an evidence contract, and a publishable GitHub check.

What did not change
  • AI-powered sandbox reviews still execute for approved PRs
  • Multi-model routing, credits, and BYOK paths still apply
  • Findings still post to GitHub when configured
  • Remedy can still turn eligible findings into verified patches
What changed
  • The review inbox is no longer the homepage of the product
  • A blocked merge must cite evidence, not vibes
  • Gate may stop a PR before any sandbox credits burn
  • One passport chains many runs into one auditable PR story

Comment bots are optimized for coverage: run on every PR, emit text, hope a human reads it. A management platform is optimized for decisions under constraint: limited reviewer attention, limited credits, unlimited agent enthusiasm.

WHO

Is this change from a trusted maintainer, a first-time contributor, a dependabot bump, or an agent whose vendor we can infer? Provenance rides on the passport with confidence and reasons — not guessed from the avatar alone.

WHY

Given who opened it and what paths moved, does this PR deserve expensive review right now? Risk scoring and gate rules answer before the sandbox starts. Review policy answers which specialists run and how hard.

WHAT NOT

Some PRs should never reach a model panel: slop shape, forbidden paths, dependency weakening, override abuse. Others may review but still must not merge until evidence and merge policy agree. “What not” is enforceable, not advisory.
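
As a rough illustration of the forbidden-paths rule, the sketch below stops a PR before any review is queued when a changed file matches a blocked glob. The function name, the rule format, and the example globs are assumptions made for illustration; the post does not describe how Critique's gate rules are actually expressed.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Tuple


def gate_on_paths(changed: Iterable[str], forbidden: Iterable[str]) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons). Hypothetical gate rule, not Critique's API."""
    patterns = list(forbidden)
    reasons = [
        f"{path} matches forbidden pattern {pattern}"
        for path in changed
        for pattern in patterns
        if fnmatch(path, pattern)
    ]
    # The PR may proceed to review only when no forbidden path was touched.
    return (not reasons, reasons)


allowed, reasons = gate_on_paths(
    ["src/app.py", ".github/workflows/release.yml"],
    [".github/workflows/*", "infra/secrets/*"],
)
```

Returning the reasons alongside the decision keeps the block explainable, which is the point of an enforceable rather than advisory gate.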

WHAT HAPPENED

After merge or incident, what did we learn? Findings memory, suppressions with expiry, and incident learnings feed the next gate — so the system does not rediscover the same false positive every week.
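
One way to picture "suppressions with expiry" is a filter that drops a suppression once its deadline passes, so the finding can surface again instead of staying muted forever. The `Suppression` shape and field names below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Suppression:
    """A muted finding that stops applying after expires_at (hypothetical shape)."""
    finding_key: str
    expires_at: datetime


def active_suppressions(items: List[Suppression], now: datetime) -> List[Suppression]:
    # Expired suppressions drop out, so the finding can be reported again.
    return [s for s in items if s.expires_at > now]


now = datetime(2026, 1, 15, tzinfo=timezone.utc)
items = [
    Suppression("flaky-test-warning", datetime(2026, 2, 1, tzinfo=timezone.utc)),
    Suppression("legacy-sql-string", datetime(2026, 1, 1, tzinfo=timezone.utc)),
]
still_muted = [s.finding_key for s in active_suppressions(items, now)]
```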

That framing is why the passport exists. GitHub comments are ephemeral. Check runs are per-commit fragments. The passport is the PR-level ledger that ties gate events, evidence runs, merge decisions, remedy proof, and incidents into one timeline an operator (or auditor) can read without reconstructing drama from fifty notifications.

A Change Passport is Critique’s system of record for a single pull request. It opens when Critique governs that PR and accumulates state until the change merges or abandons. One passport, many evidence runs, one auditable story.
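
To make "one passport, many evidence runs" concrete, here is a minimal data-structure sketch. The class and field names are assumptions chosen for readability; they are not Critique's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceRun:
    """Commit-level review record (hypothetical shape)."""
    commit_sha: str
    findings: List[str] = field(default_factory=list)


@dataclass
class ChangePassport:
    """PR-level record that accumulates state until merge or abandonment."""
    repo: str
    pr_number: int
    evidence_runs: List[EvidenceRun] = field(default_factory=list)
    timeline: List[str] = field(default_factory=list)
    outcome: Optional[str] = None  # None while the PR is still open

    def attach_run(self, run: EvidenceRun) -> None:
        if self.outcome is not None:
            raise ValueError("passport is closed")
        self.evidence_runs.append(run)
        self.timeline.append(f"evidence_run:{run.commit_sha}")

    def close(self, outcome: str) -> None:
        if outcome not in ("merged", "abandoned"):
            raise ValueError(f"unknown outcome: {outcome}")
        self.outcome = outcome
        self.timeline.append(f"closed:{outcome}")
```

Each new commit adds a run to the same passport, and the single `timeline` list is what lets a reader follow the PR's story in order.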

Passport lifecycle
  1. PR opened → passport created
  2. Provenance + risk scored
  3. Agent Firewall (gate)
  4. Evidence run(s) if policy allows
  5. Merge policy evaluates
  6. Remedy proof optional
  7. Merge or block with recorded reason

The queue at /dashboard/passports is how operators live in v4: filter by repo, risk band, gate outcome, verdict, evidence status, merge policy state, and memory. Click through to the passport detail and you see summary, provenance, risk, gate timeline, linked evidence runs, merge permission (including overrides), remedy proof, suppressions, linked incidents, and a single chronological feed — not six disconnected admin pages.

If no immutable snapshot exists yet, the passport still renders: provenance may be labeled heuristic from live PR signals so the queue is never empty while the first review queues. That is a deliberate v4 choice — control rooms fail when the primary table is blank.

One policy model, three moments

Unified on the Control Board — not separate “automation” and “merge policy” products.

Phase | When | GitHub check | Question answered
Gate | Before review queue | Critique / Checkpoint | Should this PR consume review at all?
Review | During evidence run | Critique / Review | What defects and risks does evidence support?
Merge | After evidence exists | Critique / Merge Policy | May this change merge under our rules?

Gate is cheap and fast: contributor trust, PR shape, paths, dependencies. Review is expensive and thorough: sandbox execution, specialists, evidence contract. Merge is the enforceable boundary: owners, risk bands, and proof requirements, with a dry-run mode for teams that are not ready to block contributors yet and an enforce mode for when they are.
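
For teams scripting against branch protection, the three phases can be modelled as an ordered enum mapped to the check names given in this post. The check names come from the text; the enum, the dictionary, and the helper are an illustrative sketch, and `"failure"` is a standard GitHub check-run conclusion value.

```python
from enum import Enum
from typing import Dict, Optional


class Phase(Enum):
    GATE = "gate"
    REVIEW = "review"
    MERGE = "merge"


# Check names as listed in the post; the mapping structure is an assumption.
CHECK_NAME = {
    Phase.GATE: "Critique / Checkpoint",
    Phase.REVIEW: "Critique / Review",
    Phase.MERGE: "Critique / Merge Policy",
}


def first_blocking_phase(conclusions: Dict[str, str]) -> Optional[Phase]:
    """Return the earliest phase whose check concluded 'failure', if any."""
    for phase in Phase:  # Enum iteration follows definition order
        if conclusions.get(CHECK_NAME[phase]) == "failure":
            return phase
    return None
```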

For two years the industry optimized writing code. Agents now open pull requests on their own, refactor across dozens of files, and ship while the author sleeps. Generation is solved. What did not scale is deciding whether a change is safe to merge.

A review comment is advice. It does not record who made the change, it does not score the risk, it does not hold a line, and it forgets everything the moment the tab closes. When most of your diffs are written by machines, advice is not enough. You need a boundary with memory.

The dashboard was rebuilt so gravity matches the product. Passports come first. Evidence runs are drill-down. The Control Board replaces a scatter of settings URLs.

Surfaces are how operators touch v4. Layers are what persists. Together they are the difference between “we use an AI reviewer” and “we run change control.”

1. Provenance

Source kind (human, bot, managed agent), inferred vendor when detectable, confidence and reasons. Feeds the WHO decision and the passports queue badge so agent floods are visible without opening every PR.
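
A toy version of a provenance record might look like the sketch below. It relies on two GitHub conventions: the API reports a `type` of `"Bot"` for app accounts, and their logins end in `[bot]`. The confidence values are illustrative placeholders, and managed-agent or vendor detection is left out because the post does not say which signals it uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Provenance:
    kind: str          # "human", "bot", or "managed_agent"
    confidence: float  # 0.0 to 1.0; the values below are illustrative only
    reasons: List[str]


def infer_provenance(login: str, account_type: str) -> Provenance:
    """Toy heuristic; real signals and weights are not described in the post."""
    if account_type == "Bot" or login.endswith("[bot]"):
        return Provenance("bot", 0.9, ["account is a GitHub App bot"])
    return Provenance("human", 0.6, ["no bot signal found"])
```

Keeping `reasons` next to `confidence` is what lets an operator see why a PR was labelled the way it was.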

2. Risk

Persisted score, band, and reasons on the run, flowing to passport and overview. Risk is how you sort a hundred open PRs into “review today” versus “gate already handled it.”
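
The sorting idea can be sketched in a few lines: map each score to a band, then order the queue so the riskiest PRs come first. The 0-100 scale and the band thresholds are invented for this example and are not Critique's actual values.

```python
from typing import List, Tuple


def risk_band(score: int) -> str:
    """Map a 0-100 score to a band. Thresholds are made up for illustration."""
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    return "low"


def triage(prs: List[Tuple[int, int]]) -> List[Tuple[int, str]]:
    """Sort (pr_number, score) pairs so the riskiest PRs come first."""
    ranked = sorted(prs, key=lambda pr: pr[1], reverse=True)
    return [(number, risk_band(score)) for number, score in ranked]


queue = triage([(101, 35), (102, 88), (103, 52)])
```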

3. Evidence

Evidence Contract v1 normalizes review artifacts. Blocking claims require evidenceId. Legacy runs still render through accessors — your historical sandbox reviews are not orphaned.

Without evidence contract
  • Blocking reads as opinion
  • No audit trail
  • Threads argue forever
With evidence contract
  • Verdict links to findings
  • Replayable from evidence run
  • Merge policy can enforce
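
The rule "blocking claims require evidenceId" lends itself to a simple validator. In this sketch `evidence_id` stands in for the post's `evidenceId`; the `Finding` shape and the function are hypothetical, not Evidence Contract v1 itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    title: str
    blocking: bool
    evidence_id: Optional[str] = None  # stands in for the post's evidenceId


def contract_violations(findings: List[Finding]) -> List[str]:
    """List blocking findings that cite no evidence (hypothetical validator)."""
    return [f.title for f in findings if f.blocking and not f.evidence_id]


findings = [
    Finding("SQL injection in /search", blocking=True, evidence_id="ev-123"),
    Finding("Possible race in cache warmup", blocking=True),
    Finding("Rename variable for clarity", blocking=False),
]
invalid = contract_violations(findings)
```

A non-blocking finding may omit evidence in this sketch; only a claim that would stop a merge has to point at something replayable.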

4. Merge policy

Schema, evaluator, Critique / Merge Policy check. Dry-run while you learn, warn while you train contributors, enforce when you are ready. Overrides patch check status with recorded provenance.
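
A plausible way to wire the three modes to a GitHub check is sketched below. The conclusion strings `success`, `failure`, and `neutral` are real check-run conclusion values, but the mapping from mode to conclusion is an assumption, and the sketch does not model how dry-run and warn differ in what contributors see.

```python
from enum import Enum


class Mode(Enum):
    DRY_RUN = "dry-run"
    WARN = "warn"
    ENFORCE = "enforce"


def check_conclusion(mode: Mode, policy_passed: bool) -> str:
    """Pick a GitHub check-run conclusion; this mapping is an assumption."""
    if policy_passed:
        return "success"
    # Only enforce mode fails the check, which is what branch protection blocks on.
    return "failure" if mode is Mode.ENFORCE else "neutral"
```

Under this mapping, a team can make the check required on day one and still not block anyone until the mode is switched to enforce.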

5. Verified repair + learning loop

Remedy stores proof: patch hash, validation, verification linkage. Findings memory and incident ingest (Sentry, Linear, Jira, Vercel, manual) promote learnings into rules. Production teaches the gate.
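
A proof bundle of this kind could be approximated as a record that pins the exact patch that was validated. SHA-256 is an assumption here (the post only says "patch hash"), and the field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RemedyProof:
    patch_sha256: str
    validation_passed: bool
    evidence_id: str  # finding the patch claims to fix (hypothetical field)


def build_proof(patch: str, validation_passed: bool, evidence_id: str) -> RemedyProof:
    digest = hashlib.sha256(patch.encode("utf-8")).hexdigest()
    return RemedyProof(digest, validation_passed, evidence_id)


def matches(proof: RemedyProof, patch: str) -> bool:
    """True when the patch is byte-for-byte the one that was validated."""
    return proof.patch_sha256 == hashlib.sha256(patch.encode("utf-8")).hexdigest()
```

Hashing the patch means a later edit to the fix, however small, no longer matches the stored proof and would need to be validated again.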

Single-model comment tools compete on eloquence in the thread. Critique competes on governance across the PR lifetime. You can run Critique in dry-run on merge policy and gate while still posting reviews — or you can enforce blocks that comments cannot enforce. You can show an auditor the passport, not a screenshot of a bot saying “LGTM with concerns.”

We still want the best review engine in the category. We no longer want the category to stop at review. The teams winning in 2026 are not the ones with the most comments — they are the ones with the shortest path from agent PR to trustworthy merge.

First week on v4
  1. Where did my homepage go?
    Passports replaced the review-run inbox as default gravity. Review runs live under Evidence runs, linked from each passport.
  2. Did my checks break?
    No. Critique / Checkpoint and Critique / Merge Policy names are unchanged. UI says Agent Firewall; GitHub says Checkpoint.
  3. Do I have to enforce merge policy day one?
    No. Dry-run and warn are first-class. Adopt enforcement when passport evidence quality earns trust.
  4. What happened to automation settings?
    Control Board → Policy and Delivery. Legacy automation editor remains for edge cases during migration.

v4 did not remove reviews. It still runs AI-powered sandbox reviews (and other review paths) when gate and policy allow. Those runs are evidence runs attached to the passport. What changed is the primary product object and the questions Critique optimizes for: who, why, what not, and whether merge is allowed with proof.

A Change Passport is the official record for one pull request under Critique control: who opened it, how risky it is, what the gate said, which evidence runs executed, what merge policy decided, whether Remedy proved a fix, and what memory or incidents apply, all in one timeline.

v3 centered ReviewRun and PR comments. v4 centers ChangePassport and the Control Board. The review pipeline still produces findings; v4 adds provenance, risk, enforceable merge policy, evidence contracts, remedy proof, and findings memory around that pipeline.

Comment bots optimize thread text. Critique v4 optimizes merge-boundary decisions with enforceable checks, auditable passports, and gates that can block PRs before review credits are spent. Reviews are one layer, not the whole product.

GitHub check names did not change. Gate still publishes as Critique / Checkpoint; merge policy as Critique / Merge Policy. Agent Firewall is UI language only.

Review runs from before v4 remain available as evidence runs and link to passports when a passport exists. Legacy artifacts normalize into Evidence Contract v1 for display.

See /docs/platform/change-control for passports, merge policy, evidence contract, and Control Board tabs. The Checkpoint marketing landing page remains at /checkpoint.

Open Passports — watch the boundary fill in

Run merge policy in dry-run, let gate observe a week of agent PRs, and keep sandbox reviews on the PRs that deserve spend. That is v4.
