Critique PR Review v3.1: The Sandbox Now Finishes the Review
A founder-style note on what changed after V3: Critique now lets the sandbox produce the final structured review artifact, not just the evidence pack behind it.

Critique PR Review v3.1 is the point release where the final structured review artifact moves into the sandbox-native path. The app still owns policy, routing, persistence, GitHub publication, and Remedy. But the environment that inspected the repo is now much closer to being the environment that speaks for the review.
A founder note
Two days ago it was V2. Yesterday it was V3. Today it is v3.1.
I wanted this post to exist because the product moved again fast enough that leaving the story at V3 would already feel stale. The V2 post from the day before was about making the review engine deeper and more serious. The V3 post from yesterday was about making the system operational: sandbox-backed analysis, dedicated review-analysis workers, canonical review pages, better GitHub publication, and a real review-to-Remedy handoff. Critique PR Review v3.1 is the next step in the same direction. We have now moved the final review synthesis itself into the sandbox-native path.
That sentence is the whole story in compact form, but it deserves to be unpacked properly because it changes what the product is becoming. Before v3.1, the sandbox could do the grounded work and hand a durable artifact back to the app, but the app still effectively authored the final review result. In v3.1, the sandbox can now produce the final structured review output directly. That means the environment that actually saw the repo is now much closer to being the environment that speaks for the review.
The direct answer
What Critique PR Review v3.1 actually is
Critique PR Review v3.1 is an architecture-tightening release. The primary change is that the sandbox-native OpenCode execution path can now write the final JSON review artifact, and the main pipeline knows how to parse and consume that artifact while preserving a backend-led fallback path if sandbox-native review fails. In practice, that means the structured review output now includes the reviewer summary, reviewer-ready body, normalized findings, sub-agent reports, command timeline, and runtime-check state from the sandbox path itself.
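To make the shape of that contract concrete, here is a sketch of how such a sandbox-authored artifact might be modeled and defensively parsed. Every field name here is illustrative; the actual schema is internal to Critique.

```typescript
// Hypothetical shape of the sandbox-authored review artifact.
// Field names are illustrative, not Critique's real schema.
interface Finding {
  file: string;
  line: number;
  severity: "info" | "warning" | "error";
  message: string;
}

interface ReviewArtifact {
  verdict: string;                                   // e.g. "comment"
  summary: string;                                   // reviewer summary
  body: string;                                      // reviewer-ready markdown body
  findings: Finding[];                               // normalized findings
  subAgentReports: string[];                         // raw sub-agent output
  commandTimeline: { cmd: string; exitCode: number }[];
  runtimeChecks: { url: string; reachable: boolean }[];
}

// Minimal guard: reject artifacts that are structurally unusable,
// so the pipeline can fall back to the backend-led path instead.
function parseArtifact(raw: string): ReviewArtifact | null {
  try {
    const data = JSON.parse(raw);
    if (typeof data.verdict !== "string" || !Array.isArray(data.findings)) {
      return null;
    }
    return data as ReviewArtifact;
  } catch {
    return null; // malformed JSON: not a valid sandbox artifact
  }
}
```

The important design point is that parsing returns `null` rather than throwing, so "sandbox output was unusable" becomes an ordinary branch in the pipeline rather than an exception path.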
The last three release notes are really one fast-moving architecture story.
| Version | What became true | What still had to move next |
|---|---|---|
| V2 | Critique got much better at finding issues: semantic index, clustering, deeper specialists, drill-down targeting, cross-file analysis, and stronger evidence packaging. | The review engine was stronger, but the product still needed a more explicit operating model around evidence, publishing, and handoff. |
| V3 | The system became operational: review-analysis workers, stored artifacts, canonical review pages, GitHub-native publication, fix prompts, and Remedy handoff. | The sandbox was still mostly the evidence producer. The app remained the place where the final review synthesis was authored. |
| v3.1 | The sandbox-native path now produces the final structured review artifact itself, including findings, command history, runtime checks, and sub-agent output. | The remaining work is less about product shape and more about deeper runtime proof, more coverage, and continued hardening of the same architecture. |
Why I care about this release
It removes one more translation layer between evidence and verdict
A lot of AI review products still hide the most important architectural weakness in plain sight: one part of the system gathers context, another part writes the verdict, and the distance between those two layers quietly becomes a source of drift. The evidence can be grounded, while the final narration is still one step removed from the environment that actually did the work. That is not a fatal architecture. But it is not the cleanest one either.
v3.1 narrows that gap. The same sandbox path that reads the review input, inspects the checked-out repository, compares behavior, and runs targeted commands can now author the final structured artifact. The app then consumes that artifact, stores it, explains it, publishes it, and hands it forward into GitHub and Remedy. That separation feels right to me. The app should be the control plane. The sandbox should be the execution plane. And when possible, the review should finish in the execution plane too.
What shipped in the code
A real sandbox-native review path, not a cosmetic rename
- A dedicated sandbox-native OpenCode review execution path on E2B
- A JSON schema contract for the sandbox-authored review artifact
- Parsing and normalization for verdict, summary, body, findings, sub-agent reports, command timeline, and runtime checks
- Pipeline logic that prefers sandbox-authored review output when the sandbox path is valid
- A preserved backend-led fallback path if sandbox-native review execution fails
- Artifact persistence that now carries richer sandbox-originated review data forward
And, just as deliberately, what it is not:
- Not a styling-only update to the review page
- Not just another summary template on top of the existing backend path
- Not a removal of the fallback path or a brittle all-or-nothing switch
- Not a claim that the system is “done” or fully proven under every real runtime yet
That is the heart of the release. It is not flashy in the way a giant new screen or a new pricing page is flashy, but it is the kind of product movement I trust the most because it changes the truthfulness of the system. It makes the review output more native to the work that produced it. It preserves more of the execution trace. It makes downstream surfaces less dependent on interpretation and more dependent on artifact handling.
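The prefer-sandbox-with-fallback behavior described above can be sketched in a few lines. This is an illustration of the control flow, not Critique's actual implementation; the function and type names are invented for the example.

```typescript
// Hypothetical pipeline step: prefer the sandbox-authored review when it
// parses and carries a usable body, otherwise fall back to the backend-led
// synthesis path. Names are illustrative.
type Review = { source: "sandbox" | "backend"; body: string };

function finishReview(
  sandboxOutput: string | null,
  backendSynthesis: () => Review
): Review {
  if (sandboxOutput !== null) {
    try {
      const artifact = JSON.parse(sandboxOutput);
      if (typeof artifact.body === "string" && artifact.body.length > 0) {
        return { source: "sandbox", body: artifact.body };
      }
    } catch {
      // Malformed sandbox output: fall through to the backend path.
    }
  }
  return backendSynthesis();
}
```

The point of the shape is that the fallback is a first-class branch, so a sandbox failure degrades the review's provenance rather than breaking the run.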
What this means for users
The review artifact is getting closer to becoming the real source of truth
The practical implication is that the canonical review page becomes even more honest. When a user opens that page now, more of what they see can originate from the sandbox-native path itself: not only the analysis markdown and collected evidence, but the actual findings set, the command timeline that explains what happened, the runtime-check state that shows whether the app started and which URLs were reachable, and the sub-agent output that fed the conclusion. That is a much better surface for trust than a final review body that was stitched together farther away from the repo execution path.
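As one illustration, the runtime-check state mentioned above could be derived from raw probe results along these lines. The types and function name here are assumptions for the sake of the sketch, not Critique's actual API.

```typescript
// Hypothetical reduction of raw URL probes into the runtime-check state a
// review page might display: did the app start, and which URLs responded?
interface ProbeResult {
  url: string;
  status: number | null; // null = connection failed / unreachable
}

function summarizeRuntimeChecks(probes: ProbeResult[]): {
  appStarted: boolean;
  reachableUrls: string[];
} {
  // Treat any non-5xx HTTP response as evidence the app is up.
  const reachableUrls = probes
    .filter((p) => p.status !== null && p.status < 500)
    .map((p) => p.url);
  return { appStarted: reachableUrls.length > 0, reachableUrls };
}
```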
This also matters downstream. Remedy works best when the handoff is grounded in the same durable artifact that the review relied on. Fix prompts work best when they are built from a concrete finding set with explicit execution history behind it. GitHub publication works better when the internal artifact is already cleanly structured rather than reconstructed late. All of those product surfaces improve when the review artifact is better, even if the UI barely changes.
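To show why a structured finding set makes fix prompts easier to build, here is a minimal sketch of turning one finding plus its execution history into a prompt for a downstream agent. This is an invented example, not the prompt format Critique or Remedy actually uses.

```typescript
// Illustrative only: build a fix prompt from one structured finding and the
// commands the reviewer ran while diagnosing it.
interface Finding {
  file: string;
  line: number;
  message: string;
}

function buildFixPrompt(finding: Finding, commands: string[]): string {
  return [
    `Fix the following issue in ${finding.file}:${finding.line}:`,
    finding.message,
    "",
    "Commands the reviewer ran while diagnosing this:",
    ...commands.map((c) => `  $ ${c}`),
  ].join("\n");
}
```

Because the finding already carries a file, a line, and a message, the prompt is assembled rather than re-derived, which is the grounding property the paragraph above is pointing at.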
Why this feels like v3.1 and not V4
Same direction, tighter contract
I do not think every meaningful architecture improvement deserves a whole-number marketing reset. V4, to me, should imply that the external product contour changed enough that users need a genuinely new mental model. v3.1 is different. The mental model from yesterday is still correct: Critique is a GitHub-native AI review system with sandbox-backed analysis, durable artifacts, canonical review pages, GitHub publishing, and a direct path into Remedy. What changed today is that the contract inside that model got tighter and more coherent.
That matters because one of the easiest ways to make fast-moving software feel unserious is to inflate every improvement into a giant theatrical version jump. I would rather be precise. V3 introduced the operational shape. v3.1 moves the architecture closer to the shape V3 implied. That is a point release in name, but not in importance.
FAQ
What is live in v3.1 right now
1. Does the sandbox still gather deterministic evidence? Yes. That remains the base of the system, and v3.1 builds on top of it rather than replacing it.
2. Can the sandbox-native path now author the final review artifact? Yes. That is the defining change in v3.1.
3. Does the main pipeline still have a backend-led fallback path? Yes. We tightened the architecture without making the product brittle.
4. Do runtime checks, command timeline, and sub-agent output now travel more cleanly with the artifact? Yes. Those fields are now part of the sandbox-originated structured output path.
5. Does GitHub-native publishing still matter? Absolutely. The app still owns publication and orchestration. v3.1 improves what gets handed into those surfaces.
6. Is the path from review to Remedy still intact? Yes. In some ways it gets stronger because the review artifact is more authoritative.
Closing
The product is becoming more literal
The phrase I keep coming back to is literalness. Good infrastructure should become more literal over time. The component that inspects the repo should be closer to the component that explains what it found. The artifact that powers the review page should be the same artifact that powers publication and handoff. The story the product tells the user should line up with the actual operating model under the hood. v3.1 is a small but very real move in that direction.
If you read the V2 post from the day before and the V3 note from yesterday, the speed of this progression is the thing I most want you to notice. We are not wandering around adding disconnected features. We are clarifying the system. Critique is getting more opinionated about what belongs in the app, what belongs in the sandbox, and what a trustworthy review artifact should actually be.
There is more to prove. We still want deeper runtime validation, more end-to-end mileage, and continued hardening of the same path. But the shape is sharper today than it was yesterday, and that is enough reason for me to write this note.

Repath Khan
Founder, Critique
Catch up on the release cadence
If you want the full 48-hour arc, read the V2 and V3 posts back to back, then come back to this note. The product is changing fast, but the direction is getting clearer every day.
Read the V3 founder note →

Try Critique on a real pull request
Install the GitHub App, connect a repository, and run the current review system end to end. The sandbox-backed path, canonical review page, GitHub publishing, fix prompt flow, and Remedy handoff are all part of the product now.
Install Critique for GitHub →