Product | 10 min read | Repath Khan

Critique Coding Agent API: How Teams Are Actually Using Cloud Agents Over HTTP

The Coding Agent API is no longer just “start a run and poll.” It is turning into a control plane for CI repair, support-to-fix loops, internal fix bots, and agent-supervisor workflows.

A cloud coding agent API stops being interesting the moment it can only demo “open sandbox, write patch, maybe open PR.” That is table stakes now.

What matters in practice is the control plane around the run. Can a CI job retry safely? Can a platform team tag work by owner and incident? Can a supervisor tell the difference between “the agent is still running,” “the sandbox is warm, send the next instruction,” and “this session died, start a chained fallback”? Can the same organization attach judgment later at the merge boundary instead of treating code generation as the whole system?

That is what this update is about. We did not build another chat surface and call it infrastructure. We tightened the contract around runs so machines can operate the product without scraping Builder semantics out of loose event text.


The cloud-agent market has converged on a few obvious ideas. Cursor publishes a run-based Cloud Agents API. Devin exposes organization-scoped sessions, service-user auth, tags, resumable sessions, secrets, knowledge, playbooks, and session insights. OpenAI Codex can start cloud tasks from pull-request context and push fixes back when it has permission. Factory talks openly about non-interactive execution, cloud templates, and agent readiness. The category is no longer proving that remote coding works. It is proving that the surrounding contract is operable.

What a serious cloud coding API needs in 2026

The baseline moved. “POST a prompt” is not the product anymore; the product is what the surrounding system can rely on.

| Capability | Why it matters | This Critique update |
| --- | --- | --- |
| Retry-safe create | Webhook handlers and CI rerun jobs cannot open duplicate sandboxes by accident. | Idempotency keys already existed; they remain first-class in the create path. |
| Lifecycle fields | Clients need stable state, not implied state reconstructed from event text. | lifecycle, terminal, awaitingFollowUp, canFollowUp, and nextActions now ship on run and status responses. |
| Attribution | Platform teams need to group work by owner, queue, incident, or source system. | title, tags, and bounded metadata now travel with the run. |
| Warm follow-ups | Multi-step automation should not pay a cold start for every sentence. | Live idle sessions still take follow-ups on the same run id, with cleaner expired-session fallback. |
| Machine contract | Generated clients and internal SDKs need OpenAPI, not only prose docs. | Platform OpenAPI now covers models, runs, status, messages, stream, cancel, webhook, safety, and SSE schemas. |
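
To make the lifecycle row concrete, here is a minimal sketch of how a supervisor might turn a status response into a single decision. It assumes `terminal`, `awaitingFollowUp`, and `canFollowUp` are booleans; the field names come from this post, but their types and exact semantics are assumptions, and `Action` and `decide` are hypothetical helper names rather than part of the API.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"            # agent is still working; keep polling or streaming
    FOLLOW_UP = "follow_up"  # sandbox is warm; send the next instruction on the same run id
    FALLBACK = "fallback"    # run ended with work outstanding; start a chained run
    DONE = "done"            # nothing left for the supervisor to send


def decide(status: dict, has_more_work: bool) -> Action:
    """Map a run/status response to one supervisor action.

    Assumption: the three fields below are booleans. Any combination not
    covered explicitly falls through to WAIT, the conservative choice.
    """
    terminal = bool(status.get("terminal"))
    awaiting = bool(status.get("awaitingFollowUp"))
    can_follow_up = bool(status.get("canFollowUp"))

    if awaiting and can_follow_up:
        return Action.FOLLOW_UP if has_more_work else Action.DONE
    if terminal:
        return Action.FALLBACK if has_more_work else Action.DONE
    return Action.WAIT
```

The point of the sketch is that the branch conditions read response fields directly; nothing is inferred from event text.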

There are many possible demos. In practice, the useful patterns collapse into a small set of repeatable loops. The Coding Agent API is now shaped around those loops rather than around one-off prompt submission.

The four workflow shapes
  1. CI repair loop
     A failing workflow or flaky check triggers a run with idempotency, run tags like ci or owner:platform, a strict safety budget, and optional draft PR publish. The client watches lifecycle and nextActions instead of reverse-engineering the event ledger.
  2. Support-to-fix loop
     An investigated support or intake packet becomes a coding-agent run with ticket metadata and repository context. The follow-up turn adds the missing regression test or docs note without recloning the repo when the session is still warm.
  3. Internal fix bot
     A platform team’s own orchestrator owns routing, queuing, and permissions, while Critique owns sandbox execution, patch generation, and optional PR publish. Tags and metadata make these runs queryable by queue, tenant, or incident.
  4. Agent-supervisor workflow
     One system writes code, another later judges mergeability. Critique’s writer API stays explicit about repo work, while Review runs, Change Passports, and the Merge Gate API remain the judge layer on the PR that agent created.
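
For the CI repair loop, the idempotency key has to be the same on every rerun of the same failure. One way to get that is to derive it from facts about the failure rather than from the job attempt. The sketch below is an illustration only: the `ci-repair-` prefix, the choice of inputs, and the function name are assumptions, not a format the API prescribes.

```python
import hashlib


def ci_idempotency_key(repository: str, workflow: str, commit_sha: str, failing_check: str) -> str:
    """Derive a stable key so a rerun of the same failing check reuses the same run.

    The inputs identify the failure, not the job attempt, so a rerun of the
    same workflow on the same commit produces the same key.
    """
    raw = "\n".join([repository, workflow, commit_sha, failing_check])
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]
    return f"ci-repair-{digest}"


# Example: two attempts of the same failing job yield one key.
first_attempt = ci_idempotency_key("acme/web", "test.yml", "abc123", "unit-tests")
rerun = ci_idempotency_key("acme/web", "test.yml", "abc123", "unit-tests")
```

A new commit changes the key, so a fresh failure on a later push still gets its own run.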

Create a run with attribution and budgets

The new fields are intentionally boring. That is the point. They give orchestration systems stable handles instead of forcing every team to invent sidecar storage.

```shell
curl https://critique.sh/api/v1/coding-agent/runs \
  -H "Authorization: Bearer crt_..." \
  -H "Idempotency-Key: intake-bug-742" \
  -H "Content-Type: application/json" \
  -d '{
    "repository": "acme/web",
    "title": "Fix Stripe webhook verification",
    "tags": ["ci", "payments", "owner:platform"],
    "metadata": {
      "ticket": "PAY-742",
      "source": "github-actions"
    },
    "prompt": "Add Stripe webhook signature verification and regression tests.",
    "modelId": "anthropic/claude-sonnet-4.6",
    "billing": { "mode": "managed" },
    "publish": { "mode": "draft_pr" },
    "validationMode": "tests",
    "safety": {
      "network": { "mode": "restricted", "allowlist": ["api.stripe.com"] },
      "resources": { "maxTurns": 3, "maxCredits": 12 }
    }
  }'
```

Three pieces matter here. First, attribution lives on the run itself: title, tags, metadata. Second, the client gets stable lifecycle hints from the API rather than guessing whether a warm session is still usable. Third, OpenAPI now describes the actual surface, which means internal SDK generation stops lagging behind the shipped routes.
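
For teams calling the API from an orchestrator rather than a shell, the same create request can be assembled with the Python standard library. This is a sketch that mirrors the curl example above: the URL, headers, and payload fields are taken from it, while the function name is hypothetical and the response shape is not assumed here, so the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://critique.sh/api/v1/coding-agent/runs"


def build_create_run_request(api_key: str, idempotency_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for creating a run."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Idempotency-Key": idempotency_key,
            "Content-Type": "application/json",
        },
    )


request = build_create_run_request(
    api_key="crt_example",  # placeholder, not a real key
    idempotency_key="intake-bug-742",
    payload={
        "repository": "acme/web",
        "title": "Fix Stripe webhook verification",
        "tags": ["ci", "payments", "owner:platform"],
        "metadata": {"ticket": "PAY-742", "source": "github-actions"},
        "prompt": "Add Stripe webhook signature verification and regression tests.",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the retry path easy to test, because the same request object can be rebuilt with the same idempotency key.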

The right move in a fast market is not blind originality. Cursor, Devin, Codex, and Factory each make part of the shape obvious. Run-based APIs are good. Session attribution is good. Tags are good. Resumability is good. Explicit cloud-task context is good. We borrowed the parts that make an API more operable.

We did not try to copy everything. Devin’s broader session model includes service-user impersonation, secrets, knowledge, playbooks, and detailed session insights. That is a real product surface, but it also drags security, RBAC, tenancy, and enterprise admin consequences with it. We are not pretending those concerns disappear because a field looks easy to add. The current Critique update stays inside a narrower contract we can defend: repo-scoped execution with clearer lifecycle and provenance.

What this update adds now vs what stays for a later layer
| Area | Now | Later, if it earns its complexity |
| --- | --- | --- |
| Run control | Lifecycle fields, SSE status payloads, clean fallback behavior | Queue policies, richer scheduling, retries beyond current delivery model |
| Attribution | Title, tags, metadata, deterministic intent classification | Cross-org impersonation, service-user identity layers, tenant policy inheritance |
| Context | Repository, prompt, safety policy, publish mode, follow-up turns | Knowledge packs, secret catalogs, playbooks, wider preconfigured capability bundles |
| Governance | Coding Agent API for writing; Review runs and Merge Gate API for judging | Tighter policy coupling across the full writer-judge loop |
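
The run-control row mentions SSE status payloads. A client consuming such a stream might look like the sketch below. The line parsing follows the standard Server-Sent Events framing (`data:` fields, blank line as event delimiter). That each `data` payload is a JSON object carrying `terminal` is an assumption, and the `lifecycle` values in the sample are made up for illustration.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space after the colon
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line dispatches the event accumulated so far.
            yield json.loads("\n".join(buffer))
            buffer = []
        # Other fields (event:, id:, retry:) and ":" comment lines are ignored here.


def wait_for_terminal(lines: Iterable[str]) -> Optional[dict]:
    """Return the first payload with a truthy `terminal`, else the last payload seen."""
    last = None
    for payload in iter_sse_data(lines):
        last = payload
        if payload.get("terminal"):
            break
    return last


# Hypothetical stream contents; the lifecycle values are illustrative only.
sample_stream = [
    "event: status\n",
    'data: {"lifecycle": "running", "terminal": false}\n',
    "\n",
    ": keepalive\n",
    'data: {"lifecycle": "completed", "terminal": true}\n',
    "\n",
    'data: {"lifecycle": "ignored"}\n',
    "\n",
]
final = wait_for_terminal(sample_stream)
```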

A lot of the cloud-agent market still talks as if the same system should both write the code and certify that the code should ship. Sometimes that is fine for low-risk work. It is not a strong default for production engineering teams.

Critique’s position remains the same. The Coding Agent API is the writer surface. It clones a repo, executes in a sandbox, and can open a PR. Review runs, Change Passports, and the Merge Gate API are the judge surface. The reason this update matters is that better lifecycle and provenance on the writer side make the handoff to the judge side much cleaner.

The overlap with other cloud-agent APIs is partial. It is real at the HTTP contract layer, but Critique is still narrower than a full session platform or IDE replacement. The stronger position is cloud repo execution plus a clear path into merge-grade review.

The run contract became richer without introducing a separate CodingAgentRun table. Attribution is persisted through Builder job events and surfaced through the API response.

Deterministic intent classification exists because operators and dashboards need a cheap first pass at “what kind of task was this?” before they open the full transcript. It is intentionally modest, not a claim of deep semantic understanding.

The runtime should not fork just because the entry point changes. Browser UI and HTTP automation can share OpenCode + E2B while exposing different control surfaces on top.

Use the API like infrastructure

Generate a crt_ key, wire one concrete workflow first, and treat lifecycle, tags, metadata, and nextActions as the beginning of your agent control plane — not as decorative response fields.
