A billion tokens in beta: where Critique started, what shipped, and what is next
Thank you to everyone running the product while it is still rough at the edges. Here is the honest arc: GitHub review first, sandboxes and OpenCode, our own Critique agent on that stack, Remedy and Builder, Chat that grew up beside them, and Workspace as the direction toward one always-on engineer.

fuck around and find out.
May 2026 · beta · thank you
critique.sh
First, thank you. People are using Critique while we are still early in beta: filing bugs, burning real review load, and telling us where the product lies. That feedback is not a nice-to-have. It is how we keep the roadmap honest.
How the product story actually unfolded
We did not start with “a platform.” We started with Critique on the pull request: the GitHub App, comments that carry evidence, and a merge gate that could talk back. That is still the spine of the company.
Sandboxes came next so review could see a real checkout, not only a diff. OpenCode landed inside that sandbox world so an agent could work like a human reviewer with tools. Over time that stack gained its own identity: the Critique agent — OpenCode underneath, but tuned, wrapped, and operated the way we want long reviews to behave.
Remedy arrived as the honest next question after review: when a finding is narrow enough to fix, can we validate a patch in isolation and hand it back on the branch? Builder grew when teams wanted the same execution fabric for “make this change in the repo” without starting from a PR thread. Chat was early, and it kept maturing in parallel — better models, repo context, speech, and tighter bridges into review when a conversation should become action.
Today those surfaces rhyme: similar model ladder, similar respect for evidence, similar need for accounting. Workspace is the bet that they should not feel like four different products forever. The north star is blunt: one place that behaves like an always-on software engineer — review, explain, patch, and build — with one ledger and one notion of trust.
Sampling the PR stream (not every line is in Version)
The public ship log is the right place for dated release notes. It cannot capture every experiment that died in a branch, or every internal-only refactor that made the next feature possible. The boxes below are representative slices of the merge history: early GitHub-only work, the middle where sandboxes and OpenCode took over, and the late wave where metering, liveness, and catalog work had to match real load.
Illustrative groupings of pull-request numbers, not exhaustive release notes. They show how the codebase moved from GitHub-native review toward agent execution, fixes, builder loops, and operator-grade dashboards.
- Critique living on the pull request: install flows, checks, and the first serious comment surfaces.
- Evidence that could travel with a review instead of living only in a private chat.
- The constraint we never dropped: the merge line stays human-governed.
- Collector-backed paths so review could reason about a real tree, not a patch in a vacuum.
- Hardening around timeouts, logs, and failure visibility when remote execution misbehaved.
- Early signals that “review” and “environment” had to be first-class siblings.
- Agent streams attached to review runs so operators could watch work happen.
- Configuration and harness work so OpenCode was a productized runtime, not a one-off script.
- More honest telemetry when a session started but never reached a satisfying tool trace.
- The Critique agent identity on top of OpenCode: same engine, our prompts, policies, and UX.
- Remedy loops for validated fixes; Builder for branch-aware execution outside a single PR comment.
- Chat evolving beside review: better defaults, speech, and shared context with the rest of the app.
- Metering that survives nested agent calls and long sessions without lying on the dashboard.
- Liveness and stall handling so a quiet model minute does not look like a dead product.
- Model catalog and pricing passes so floors match the 2026 vendor shelf.
- Items called out on Version and in shipped UI.
- Flows we demo: GitHub review, Chat, Builder, dashboards, Remedy where permissions allow.
- Credit math that matches what left your pool.
- Spikes and branches that never merged.
- Internal refactors that unblock the next month but never get a marketing paragraph.
- Workspace pieces still wiring together — exciting, not finished as a single surface.
A billion tokens is a trust contract
Roughly a billion tokens a month across review, remedy, chat, and sandbox execution is not a flex. It is a contract: if we miscount, you notice on day three. If a run stalls without explanation, you assume the product is broken. If a map is pretty but inert, you stop opening it. That load is why the “boring” merges at the end of the range matter as much as headline features.
Where to read more
About still explains the three beats — Chat, Review, Remedy — in product language. Remedy and Models pages carry the execution and catalog detail. Older essays on PR review v5, Builder, Chat, Remedy-in-Chat, and the trust gap give the long argument. Version stays the dated ledger when you need “what changed this week.”
Run the beta like you mean it
Use the 500 starter credits, stress the flows that matter to your team, and send a blunt note if something breaks. We are still early — that is the point of being here now.
Create an account