Critique PR Review v3.1: The Sandbox Now Finishes the Review
A founder-style note on what changed after V3: Critique now lets the sandbox produce the final structured review artifact, not just the evidence pack behind it.

Critique PR Review v3.1 is the point release where the final structured review artifact moves into the sandbox-native path. The app still owns policy, routing, persistence, GitHub publication, and Remedy. But the environment that inspected the repo is now much closer to being the environment that speaks for the review.
A founder note
Two days ago it was V2. Yesterday it was V3. Today it is v3.1.
I wanted this post to exist because the product moved again fast enough that leaving the story at V3 would already feel stale. The V2 post from the day before was about making the review engine deeper and more serious. The V3 post from yesterday was about making the system operational: sandbox-backed analysis, dedicated review-analysis workers, canonical review pages, better GitHub publication, and a real review-to-Remedy handoff. Critique PR Review v3.1 is the next step in the same direction. We have now moved the final review synthesis itself into the sandbox-native path.
That sentence is the whole story in compact form, but it deserves to be unpacked properly because it changes what the product is becoming. Before v3.1, the sandbox could do the grounded work and hand a durable artifact back to the app, but the app still effectively authored the final review result. In v3.1, the sandbox can now produce the final structured review output directly. That means the environment that actually saw the repo is now much closer to being the environment that speaks for the review.
The direct answer
What Critique PR Review v3.1 actually is
Critique PR Review v3.1 is an architecture-tightening release. The primary change is that the sandbox-native OpenCode execution path can now write the final JSON review artifact, and the main pipeline knows how to parse and consume that artifact while preserving a backend-led fallback path if sandbox-native review fails. In practice, that means the structured review output now includes the reviewer summary, reviewer-ready body, normalized findings, sub-agent reports, command timeline, and runtime-check state from the sandbox path itself.
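To make the shape of that contract concrete, here is a sketch of how such a sandbox-authored artifact might be modeled and defensively parsed. Every field name here is illustrative; the actual schema is internal to Critique.

```typescript
// Hypothetical shape of the sandbox-authored review artifact.
// Field names are illustrative, not Critique's real schema.
interface Finding {
  file: string;
  line: number;
  severity: "info" | "warning" | "error";
  message: string;
}

interface ReviewArtifact {
  verdict: string;                                   // e.g. "comment"
  summary: string;                                   // reviewer summary
  body: string;                                      // reviewer-ready markdown body
  findings: Finding[];                               // normalized findings
  subAgentReports: string[];                         // raw sub-agent output
  commandTimeline: { cmd: string; exitCode: number }[];
  runtimeChecks: { url: string; reachable: boolean }[];
}

// Minimal guard: reject artifacts that are structurally unusable,
// so the pipeline can fall back to the backend-led path instead.
function parseArtifact(raw: string): ReviewArtifact | null {
  try {
    const data = JSON.parse(raw);
    if (typeof data.verdict !== "string" || !Array.isArray(data.findings)) {
      return null;
    }
    return data as ReviewArtifact;
  } catch {
    return null; // malformed JSON: not a valid sandbox artifact
  }
}
```

The important design point is that parsing returns `null` rather than throwing, so "sandbox output was unusable" becomes an ordinary branch in the pipeline rather than an exception path.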
The last three release notes are really one fast-moving architecture story.
| Version | What became true | What still had to move next |
|---|---|---|
| V2 | Critique got much better at finding issues: semantic index, clustering, deeper specialists, drill-down targeting, cross-file analysis, and stronger evidence packaging. | The review engine was stronger, but the product still needed a more explicit operating model around evidence, publishing, and handoff. |
| V3 | The system became operational: review-analysis workers, stored artifacts, canonical review pages, GitHub-native publication, fix prompts, and Remedy handoff. | The sandbox was still mostly the evidence producer. The app remained the place where the final review synthesis was authored. |
| v3.1 | The sandbox-native path now produces the final structured review artifact itself, including findings, command history, runtime checks, and sub-agent output. | The remaining work is less about product shape and more about deeper runtime proof, more coverage, and continued hardening of the same architecture. |
Why I care about this release
It removes one more translation layer between evidence and verdict
A lot of AI review products still hide the most important architectural weakness in plain sight: one part of the system gathers context, another part writes the verdict, and the distance between those two layers quietly becomes a source of drift. The evidence can be grounded, while the final narration is still one step removed from the environment that actually did the work. That is not a fatal architecture. But it is not the cleanest one either.
v3.1 narrows that gap. The same sandbox path that reads the review input, inspects the checked-out repository, compares behavior, and runs targeted commands can now author the final structured artifact. The app then consumes that artifact, stores it, explains it, publishes it, and hands it forward into GitHub and Remedy. That separation feels right to me. The app should be the control plane. The sandbox should be the execution plane. And when possible, the review should finish in the execution plane too.
What shipped in the code
A real sandbox-native review path, not a cosmetic rename
- A dedicated sandbox-native OpenCode review execution path on E2B
- A JSON schema contract for the sandbox-authored review artifact
- Parsing and normalization for verdict, summary, body, findings, sub-agent reports, command timeline, and runtime checks
- Pipeline logic that prefers sandbox-authored review output when the sandbox path is valid
- A preserved backend-led fallback path if sandbox-native review execution fails
- Artifact persistence that now carries richer sandbox-originated review data forward
And, just as deliberately, what it is not:
- Not a styling-only update to the review page
- Not just another summary template on top of the existing backend path
- Not a removal of the fallback path or a brittle all-or-nothing switch
- Not a claim that the system is “done” or fully proven under every real runtime yet
That is the heart of the release. It is not flashy in the way a giant new screen or a new pricing page is flashy, but it is the kind of product movement I trust the most because it changes the truthfulness of the system. It makes the review output more native to the work that produced it. It preserves more of the execution trace. It makes downstream surfaces less dependent on interpretation and more dependent on artifact handling.
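The prefer-sandbox-with-fallback behavior described above can be sketched in a few lines. This is an illustration of the control flow, not Critique's actual implementation; the function and type names are invented for the example.

```typescript
// Hypothetical pipeline step: prefer the sandbox-authored review when it
// parses and carries a usable body, otherwise fall back to the backend-led
// synthesis path. Names are illustrative.
type Review = { source: "sandbox" | "backend"; body: string };

function finishReview(
  sandboxOutput: string | null,
  backendSynthesis: () => Review
): Review {
  if (sandboxOutput !== null) {
    try {
      const artifact = JSON.parse(sandboxOutput);
      if (typeof artifact.body === "string" && artifact.body.length > 0) {
        return { source: "sandbox", body: artifact.body };
      }
    } catch {
      // Malformed sandbox output: fall through to the backend path.
    }
  }
  return backendSynthesis();
}
```

The point of the shape is that the fallback is a first-class branch, so a sandbox failure degrades the review's provenance rather than breaking the run.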
What this means for users
The review artifact is getting closer to becoming the real source of truth
The practical implication is that the canonical review page becomes even more honest. When a user opens that page now, more of what they see can originate from the sandbox-native path itself: not only the analysis markdown and collected evidence, but the actual findings set, the command timeline that explains what happened, the runtime-check state that shows whether the app started and which URLs were reachable, and the sub-agent output that fed the conclusion. That is a much better surface for trust than a final review body that was stitched together farther away from the repo execution path.
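As one illustration, the runtime-check state mentioned above could be derived from raw probe results along these lines. The types and function name here are assumptions for the sake of the sketch, not Critique's actual API.

```typescript
// Hypothetical reduction of raw URL probes into the runtime-check state a
// review page might display: did the app start, and which URLs responded?
interface ProbeResult {
  url: string;
  status: number | null; // null = connection failed / unreachable
}

function summarizeRuntimeChecks(probes: ProbeResult[]): {
  appStarted: boolean;
  reachableUrls: string[];
} {
  // Treat any non-5xx HTTP response as evidence the app is up.
  const reachableUrls = probes
    .filter((p) => p.status !== null && p.status < 500)
    .map((p) => p.url);
  return { appStarted: reachableUrls.length > 0, reachableUrls };
}
```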
This also matters downstream. Remedy works best when the handoff is grounded in the same durable artifact that the review relied on. Fix prompts work best when they are built from a concrete finding set with explicit execution history behind it. GitHub publication works better when the internal artifact is already cleanly structured rather than reconstructed late. All of those product surfaces improve when the review artifact is better, even if the UI barely changes.
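To show why a structured finding set makes fix prompts easier to build, here is a minimal sketch of turning one finding plus its execution history into a prompt for a downstream agent. This is an invented example, not the prompt format Critique or Remedy actually uses.

```typescript
// Illustrative only: build a fix prompt from one structured finding and the
// commands the reviewer ran while diagnosing it.
interface Finding {
  file: string;
  line: number;
  message: string;
}

function buildFixPrompt(finding: Finding, commands: string[]): string {
  return [
    `Fix the following issue in ${finding.file}:${finding.line}:`,
    finding.message,
    "",
    "Commands the reviewer ran while diagnosing this:",
    ...commands.map((c) => `  $ ${c}`),
  ].join("\n");
}
```

Because the finding already carries a file, a line, and a message, the prompt is assembled rather than re-derived, which is the grounding property the paragraph above is pointing at.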
Why this feels like v3.1 and not V4
Same direction, tighter contract
I do not think every meaningful architecture improvement deserves a whole-number marketing reset. V4, to me, should imply that the external product contour changed enough that users need a genuinely new mental model. v3.1 is different. The mental model from yesterday is still correct: Critique is a GitHub-native AI review system with sandbox-backed analysis, durable artifacts, canonical review pages, GitHub publishing, and a direct path into Remedy. What changed today is that the contract inside that model got tighter and more coherent.
That matters because one of the easiest ways to make fast-moving software feel unserious is to inflate every improvement into a giant theatrical version jump. I would rather be precise. V3 introduced the operational shape. v3.1 moves the architecture closer to the shape V3 implied. That is a point release in name, but not in importance.
FAQ
What is live in v3.1 right now
1. Does the sandbox still gather deterministic evidence? Yes. That remains the base of the system, and v3.1 builds on top of it rather than replacing it.
2. Can the sandbox-native path now author the final review artifact? Yes. That is the defining change in v3.1.
3. Does the main pipeline still have a backend-led fallback path? Yes. We tightened the architecture without making the product brittle.
4. Do runtime checks, command timeline, and sub-agent output now travel more cleanly with the artifact? Yes. Those fields are now part of the sandbox-originated structured output path.
5. Does GitHub-native publishing still matter? Absolutely. The app still owns publication and orchestration. v3.1 improves what gets handed into those surfaces.
6. Is the path from review to Remedy still intact? Yes. In some ways it gets stronger because the review artifact is more authoritative.
Closing
The product is becoming more literal
The phrase I keep coming back to is literalness. Good infrastructure should become more literal over time. The component that inspects the repo should be closer to the component that explains what it found. The artifact that powers the review page should be the same artifact that powers publication and handoff. The story the product tells the user should line up with the actual operating model under the hood. v3.1 is a small but very real move in that direction.
If you read the V2 post from the day before and the V3 note from yesterday, the speed of this progression is the thing I most want you to notice. We are not wandering around adding disconnected features. We are clarifying the system. Critique is getting more opinionated about what belongs in the app, what belongs in the sandbox, and what a trustworthy review artifact should actually be.
There is more to prove. We still want deeper runtime validation, more end-to-end mileage, and continued hardening of the same path. But the shape is sharper today than it was yesterday, and that is enough reason for me to write this note.

Repath Khan
Founder, Critique
Catch up on the release cadence
If you want the full 48-hour arc, read the V2 and V3 posts back to back, then come back to this note. The product is changing fast, but the direction is getting clearer every day.
Read the V3 founder note →

Try Critique on a real pull request
Install the GitHub App, connect a repository, and run the current review system end to end. The sandbox-backed path, canonical review page, GitHub publishing, fix prompt flow, and Remedy handoff are all part of the product now.
Install Critique for GitHub →