Critique PR Review v5: Diff-First Runs & Multi-Turn OpenCode
Review runs read like a review again: diff and findings up front, OpenCode as a resumable session, and a path to a cheap thinker that nudges weak or stuck agent work before you accept an artifact.
Why v5 matters
When the default screen is a firehose of tool starts, permission prompts, and JSON blobs, the product trains people to debug the agent instead of judging the pull request. That is the wrong default for merge decisions. v5 assumes the reader’s job is review: understand the change, scan findings, and only then dig into execution detail if something looks off.
Nothing here removes depth. Telemetry, Remedy export, the full OpenCode stream, and sandbox diagnostics are still there. They simply stop competing with the diff for attention on first paint.
What changed on the review-run page
The surface is reorganized around three layers of evidence: the change itself (diff and touched files), what Critique concluded (structured findings and severity), and what the agent did in plain language (OpenCode highlights rather than raw step noise). Operators who need the deep trace can still open it; everyone else gets a page that reads top-to-bottom like a review memo.
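The three layers of evidence can be pictured as a small view model. The sketch below is illustrative only: `ReviewRunView`, `Finding`, `first_paint`, and the severity labels are hypothetical names, not Critique's actual schema. It shows the ordering idea: change first, findings second (most severe on top), agent highlights third, with the raw trace held back until someone asks for it.

```python
from dataclasses import dataclass, field

# Assumed severity labels; the real product may use different ones.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    title: str
    severity: str
    path: str


@dataclass
class ReviewRunView:
    touched_files: list[str]          # layer 1: the change itself
    findings: list[Finding]           # layer 2: what the review concluded
    highlights: list[str]             # layer 3: plain-language agent activity
    raw_trace: list[dict] = field(default_factory=list)  # deep trace, opt-in

    def first_paint(self) -> list[tuple[str, list[str]]]:
        """Return the sections shown by default, in reading order."""
        ordered = sorted(
            self.findings,
            key=lambda f: SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)),
        )
        return [
            ("change", list(self.touched_files)),
            ("findings", [f"[{f.severity}] {f.title} ({f.path})" for f in ordered]),
            ("activity", list(self.highlights)),
        ]
```

The raw trace stays on the object but is deliberately left out of `first_paint`, which mirrors the "still there, just not competing for attention" framing.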
OpenCode as a session, not a single prompt
Before v5, a run was one-shot: send one prompt, wait for one artifact, and hope the JSON is valid and the evidence is complete. If anything was thin, you discarded the result or reran from zero, losing context and burning time.
In v5, a headless, server-backed session receives the initial review prompt. The runner can then inspect the artifact and send follow-ups in the same conversation before accepting the result, which is closer to how a senior engineer actually works.
Real review is iterative. If the artifact is malformed, skips command evidence, ignores available specialist roles, or hand-waves tests, Critique can ask OpenCode to continue in-place instead of treating the run as binary success or failure.
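A minimal sketch of that continue-in-place loop, under stated assumptions: `Artifact`, `first_gap`, `run_review`, and `FakeSession` are hypothetical names, the `send` callable stands in for whatever client talks to the OpenCode session, and the three checks cover only some of the gaps listed above. It is not the Critique runner.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Artifact:
    valid_json: bool
    has_command_evidence: bool
    tests_shown: bool


def first_gap(artifact: Artifact) -> Optional[str]:
    """Return a follow-up prompt for the first gap found, or None if acceptable."""
    if not artifact.valid_json:
        return "The artifact is not valid JSON. Re-emit it against the schema."
    if not artifact.has_command_evidence:
        return "Cite the commands you ran and their output for each finding."
    if not artifact.tests_shown:
        return "Show the test commands and results instead of summarizing them."
    return None


def run_review(
    send: Callable[[str], Artifact],
    initial_prompt: str,
    max_followups: int = 2,  # assumed follow-up budget
) -> tuple[Artifact, list[str], bool]:
    """Send the prompt, then follow up in the same session until clean or out of budget."""
    prompts = [initial_prompt]
    artifact = send(initial_prompt)
    for _ in range(max_followups):
        gap = first_gap(artifact)
        if gap is None:
            break
        prompts.append(gap)
        artifact = send(gap)  # same session: earlier context is preserved
    return artifact, prompts, first_gap(artifact) is None


class FakeSession:
    """Test double that replays canned artifacts; a stand-in for a real session."""

    def __init__(self, replies: list[Artifact]) -> None:
        self._replies = iter(replies)
        self.received: list[str] = []

    def send(self, prompt: str) -> Artifact:
        self.received.append(prompt)
        return next(self._replies)
```

The point of the design is that a thin artifact produces a targeted prompt rather than a failed run, and the third return value records whether the budget ran out before the artifact became acceptable.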
Next: a thinker outside the sandbox
Deterministic checks catch structural failures, but they do not read tone, repetition, or “looks busy but shallow” behavior in the stream. The next layer is a small, cheap model that watches the live OpenCode output, classifies whether the agent is stuck or under-evidenced, and drafts the next best follow-up — or signals abort when the run is going nowhere.
DeepSeek V4 Flash is a practical default for that role: already in the Critique catalog, inexpensive enough to run often, and strong enough for meta-review — missing evidence, repeated claims without new commands, tests claimed but not shown, and stalls where the last meaningful beat was too long ago.
Inputs the thinker would watch:
- Latest OpenCode messages, tool calls, errors, and command timeline
- Current review artifact, schema validation, and finding depth
- Elapsed time since the last meaningful activity beat
- Whether specialists were invoked when policy made them available
Actions it could take:
- Accept the artifact as good enough for the merge gate
- Send a targeted follow-up into the same OpenCode session
- Narrow from exploration to final synthesis
- Abort or degrade gracefully when stuck or over budget
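The contract between those signals and those four actions can be sketched with plain rules. This is a deterministic stand-in, not the model-based thinker described above: `Snapshot`, `decide`, every threshold, and the `None` return (meaning "keep watching, no intervention yet") are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPT = "accept"
    FOLLOW_UP = "follow_up"
    NARROW = "narrow"
    ABORT = "abort"


@dataclass
class Snapshot:
    artifact_present: bool
    schema_valid: bool
    commands_run: int
    seconds_since_last_beat: float
    specialists_available: bool
    specialists_invoked: bool
    followups_used: int


def decide(
    s: Snapshot,
    stall_after: float = 300.0,  # assumed stall threshold, in seconds
    max_followups: int = 2,      # assumed follow-up budget
    explore_cap: int = 40,       # assumed cap on exploratory commands
) -> Optional[Action]:
    # Stuck: nothing meaningful has happened for too long.
    if s.seconds_since_last_beat > stall_after:
        return Action.ABORT
    # Still exploring: push toward synthesis only once exploration runs long.
    if not s.artifact_present:
        return Action.NARROW if s.commands_run >= explore_cap else None
    # Under-evidenced: invalid shape, no commands, or specialists ignored.
    under_evidenced = (
        not s.schema_valid
        or s.commands_run == 0
        or (s.specialists_available and not s.specialists_invoked)
    )
    if under_evidenced:
        return Action.FOLLOW_UP if s.followups_used < max_followups else Action.ABORT
    return Action.ACCEPT
```

A model in this seat would replace the boolean rules with judgment about tone and repetition, but the inputs and the closed set of outputs can stay the same, which keeps the controller testable.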
What we are optimizing for
The durable goal is not more model calls for their own sake. It is fewer silent failures, fewer weak reviews, and explicit state: what the agent saw, what it ran, what it found, what it skipped, and why the controller accepted the artifact. v5 keeps OpenCode on repository work, Critique on the control plane, and leaves room for a cheap thinker that keeps the session honest.
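One way to make that state explicit is a single record per run. The sketch below is hypothetical (`RunRecord` and its field names are not Critique's storage format); it shows the five items named above as fields and refuses an acceptance that comes without a reason.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    session_id: str
    files_seen: list[str] = field(default_factory=list)      # what the agent saw
    commands_run: list[str] = field(default_factory=list)    # what it ran
    findings: list[str] = field(default_factory=list)        # what it found
    skipped: dict[str, str] = field(default_factory=dict)    # what it skipped, with reasons
    decision: str = "pending"
    decision_reason: str = ""                                # why the controller accepted

    def accept(self, reason: str) -> None:
        """Mark the artifact accepted; a blank reason is rejected."""
        if not reason.strip():
            raise ValueError("an acceptance needs a stated reason")
        self.decision = "accepted"
        self.decision_reason = reason

    def to_json(self) -> str:
        """Serialize the record with stable key order for logging or export."""
        return json.dumps(asdict(self), sort_keys=True)
```

Requiring the reason at the moment of acceptance is the design choice that matters here: a silent accept becomes an error instead of a gap someone discovers later.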
- 1. Does the first screen answer “what changed?” without scrolling past logs? You should see the PR diff and finding highlights immediately; expand diagnostics only when debugging.
- 2. When a follow-up fires, does it stay in one OpenCode session? Check the run detail: continuation should reuse the session rather than starting a blank run, until follow-up budget is exhausted.
- 3. Is the live feed readable by default? Progress should scan quickly; switch to the raw stream when you need verbatim tool payloads.
Try the v5 review run experience
Connect a repository, open a PR review run, and see the diff-first layout plus summarized OpenCode activity. Signing up takes under a minute.