Critique Coding Agent API: How Teams Are Actually Using Cloud Agents Over HTTP
The Coding Agent API is no longer just “start a run and poll.” It is turning into a control plane for CI repair, support-to-fix loops, internal fix bots, and agent-supervisor workflows.
A cloud coding agent API stops being interesting the moment it can only demo “open sandbox, write patch, maybe open PR.” That is table stakes now.
What matters in practice is the control plane around the run. Can a CI job retry safely? Can a platform team tag work by owner and incident? Can a supervisor tell the difference between “the agent is still running,” “the sandbox is warm, send the next instruction,” and “this session died, start a chained fallback”? Can the same organization attach judgment later at the merge boundary instead of treating code generation as the whole system?
That is what this update is about. We did not build another chat surface and call it infrastructure. We tightened the contract around runs so machines can operate the product without scraping Builder semantics out of loose event text.
Why the contract had to get sharper
The cloud-agent market has converged on a few obvious ideas. Cursor publishes a run-based Cloud Agents API. Devin exposes organization-scoped sessions, service-user auth, tags, resumable sessions, secrets, knowledge, playbooks, and session insights. OpenAI Codex can start cloud tasks from pull-request context and push fixes back when it has permission. Factory talks openly about non-interactive execution, cloud templates, and agent readiness. The category is no longer proving that remote coding works. It is proving that the surrounding contract is operable.
The baseline moved. “POST a prompt” is not the product anymore; the product is what the surrounding system can rely on.
| Capability | Why it matters | This Critique update |
|---|---|---|
| Retry-safe create | Webhook handlers and CI rerun jobs cannot open duplicate sandboxes by accident. | Idempotency keys already existed; they remain first-class in the create path. |
| Lifecycle fields | Clients need stable state, not implied state reconstructed from event text. | lifecycle, terminal, awaitingFollowUp, canFollowUp, and nextActions now ship on run and status responses. |
| Attribution | Platform teams need to group work by owner, queue, incident, or source system. | title, tags, and bounded metadata now travel with the run. |
| Warm follow-ups | Multi-step automation should not pay a cold start for every sentence. | Live idle sessions still take follow-ups on the same run ID, with cleaner expired-session fallback. |
| Machine contract | Generated clients and internal SDKs need OpenAPI, not only prose docs. | Platform OpenAPI now covers models, runs, status, messages, stream, cancel, webhook, safety, and SSE schemas. |
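To make the lifecycle row concrete, here is a minimal sketch of how a supervisor might turn a status payload into one of the three situations described earlier (still running, warm and ready for a follow-up, or dead and in need of a chained fallback). The field names come from the table above. Everything else is an assumption for illustration: that terminal, awaitingFollowUp, and canFollowUp are booleans, the example lifecycle strings, the function name, and the returned action labels. Check the published OpenAPI schema for the real types.

```python
from typing import Any, Mapping


def choose_supervisor_action(status: Mapping[str, Any], has_pending_instruction: bool) -> str:
    """Map a run status payload to a supervisor decision.

    Returns one of "wait", "send_follow_up", "start_fallback", or "done".
    Assumes (not confirmed by the docs) that the three flags are booleans.
    """
    terminal = bool(status.get("terminal"))
    awaiting = bool(status.get("awaitingFollowUp"))
    can_follow_up = bool(status.get("canFollowUp"))

    if awaiting and can_follow_up:
        # The sandbox is warm: reuse the same run for the next instruction.
        return "send_follow_up" if has_pending_instruction else "done"

    if terminal or awaiting:
        # The run ended, or the idle session can no longer take follow-ups.
        return "start_fallback" if has_pending_instruction else "done"

    # Neither terminal nor idle: the agent is still working.
    return "wait"


if __name__ == "__main__":
    # Hypothetical payload; the "idle" lifecycle value is illustrative only.
    warm = {"lifecycle": "idle", "terminal": False, "awaitingFollowUp": True, "canFollowUp": True}
    print(choose_supervisor_action(warm, has_pending_instruction=True))
```

The sketch deliberately branches on the boolean flags rather than on the lifecycle string, so it does not depend on guessing the exact set of lifecycle values.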
How teams are using cloud agents over API
There are many possible demos. In practice, the useful patterns collapse into a small set of repeatable loops. The Coding Agent API is now shaped around those loops rather than around one-off prompt submission.
1. CI repair loop: A failing workflow or flaky check triggers a run with idempotency, run tags like ci or owner:platform, a strict safety budget, and optional draft PR publish. The client watches lifecycle and nextActions instead of reverse-engineering the event ledger.
2. Support-to-fix loop: An investigated support or intake packet becomes a coding-agent run with ticket metadata and repository context. The follow-up turn adds the missing regression test or docs note without recloning the repo when the session is still warm.
3. Internal fix bot: A platform team’s own orchestrator owns routing, queuing, and permissions, while Critique owns sandbox execution, patch generation, and optional PR publish. Tags and metadata make these runs queryable by queue, tenant, or incident.
4. Agent-supervisor workflow: One system writes code, another later judges mergeability. Critique’s writer API stays explicit about repo work, while Review runs, Change Passports, and the Merge Gate API remain the judge layer on the PR that agent created.
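For the CI repair loop, retry safety depends on the client sending the same idempotency key every time the same failure is reported. One way to get that, sketched below, is to derive the key from the CI context instead of generating a random value per attempt. The inputs (repository, workflow run ID, check name), the key prefix, and the function name are assumptions for illustration, not part of the Critique API.

```python
import hashlib


def ci_idempotency_key(repository: str, workflow_run_id: str, check_name: str) -> str:
    """Derive a stable idempotency key from the CI failure being repaired.

    The same (repository, workflow run, check) triple always yields the same
    key, so a re-delivered webhook or a rerun job reuses the original run
    instead of opening a second sandbox.
    """
    # Join with a separator that cannot appear inside the individual fields
    # in practice, so ("a", "bc") and ("ab", "c") do not collide.
    raw = "\n".join([repository, workflow_run_id, check_name])
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]
    return f"ci-repair-{digest}"


if __name__ == "__main__":
    # Hypothetical CI context values.
    print(ci_idempotency_key("acme/web", "8841203377", "unit-tests"))
```

If a deliberate second attempt at the same failure should open a new run, include an attempt counter in the inputs so that choice is explicit.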
The API shape after this update
Create a run with attribution and budgets
The new fields are intentionally boring. That is the point. They give orchestration systems stable handles instead of forcing every team to invent sidecar storage.
```bash
curl https://critique.sh/api/v1/coding-agent/runs \
  -H "Authorization: Bearer crt_..." \
  -H "Idempotency-Key: intake-bug-742" \
  -H "Content-Type: application/json" \
  -d '{
    "repository": "acme/web",
    "title": "Fix Stripe webhook verification",
    "tags": ["ci", "payments", "owner:platform"],
    "metadata": {
      "ticket": "PAY-742",
      "source": "github-actions"
    },
    "prompt": "Add Stripe webhook signature verification and regression tests.",
    "modelId": "anthropic/claude-sonnet-4.6",
    "billing": { "mode": "managed" },
    "publish": { "mode": "draft_pr" },
    "validationMode": "tests",
    "safety": {
      "network": { "mode": "restricted", "allowlist": ["api.stripe.com"] },
      "resources": { "maxTurns": 3, "maxCredits": 12 }
    }
  }'
```

Three pieces matter here. First, attribution lives on the run itself: title, tags, metadata. Second, the client gets stable lifecycle hints from the API rather than guessing whether a warm session is still usable. Third, OpenAPI now describes the actual surface, which means internal SDK generation stops lagging behind the shipped routes.
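For teams calling the create endpoint from an orchestrator rather than a shell, the same request can be assembled with the Python standard library. This is a sketch, not an official client: the URL, header names, and payload fields are taken from the curl example above, while the function name and the placeholder key are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Any, Mapping

# Endpoint taken from the curl example above.
RUNS_URL = "https://critique.sh/api/v1/coding-agent/runs"


def build_create_run_request(
    api_key: str, idempotency_key: str, payload: Mapping[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a create-run request.

    Reusing the same idempotency_key on a retry is what lets a webhook
    handler or CI rerun repeat the call without opening a second sandbox.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        RUNS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Idempotency-Key": idempotency_key,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_create_run_request(
        api_key="crt_example",  # placeholder, not a real key
        idempotency_key="intake-bug-742",
        payload={
            "repository": "acme/web",
            "title": "Fix Stripe webhook verification",
            "tags": ["ci", "payments", "owner:platform"],
            "metadata": {"ticket": "PAY-742", "source": "github-actions"},
            "prompt": "Add Stripe webhook signature verification and regression tests.",
        },
    )
    print(request.get_method(), request.full_url)
```

Passing the built request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to unit-test the headers and to retry with an identical idempotency key.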
What we copied from the market, and what we did not
The right move in a fast market is not blind originality. Cursor, Devin, Codex, and Factory each make part of the shape obvious. Run-based APIs are good. Session attribution is good. Tags are good. Resumability is good. Explicit cloud-task context is good. We borrowed the parts that make an API more operable.
We did not try to copy everything. Devin’s broader session model includes service-user impersonation, secrets, knowledge, playbooks, and detailed session insights. That is a real product surface, but it also drags security, RBAC, tenancy, and enterprise admin consequences with it. We are not pretending those concerns disappear because a field looks easy to add. The current Critique update stays inside a narrower contract we can defend: repo-scoped execution with clearer lifecycle and provenance.
| Area | Now | Later, if it earns its complexity |
|---|---|---|
| Run control | Lifecycle fields, SSE status payloads, clean fallback behavior | Queue policies, richer scheduling, retries beyond current delivery model |
| Attribution | Title, tags, metadata, deterministic intent classification | Cross-org impersonation, service-user identity layers, tenant policy inheritance |
| Context | Repository, prompt, safety policy, publish mode, follow-up turns | Knowledge packs, secret catalogs, playbooks, wider preconfigured capability bundles |
| Governance | Coding Agent API for writing; Review runs and Merge Gate API for judging | Tighter policy coupling across the full writer-judge loop |
The bigger point: writer APIs are not judge APIs
A lot of the cloud-agent market still talks as if the same system should both write the code and certify that the code should ship. Sometimes that is fine for low-risk work. It is not a strong default for production engineering teams.
Critique’s position remains the same. The Coding Agent API is the writer surface. It clones a repo, executes in a sandbox, and can open a PR. Review runs, Change Passports, and the Merge Gate API are the judge surface. The reason this update matters is that better lifecycle and provenance on the writer side make the handoff to the judge side much cleaner.
Use the API like infrastructure
Generate a crt_ key, wire one concrete workflow first, and treat lifecycle, tags, metadata, and nextActions as the beginning of your agent control plane — not as decorative response fields.