Cursor as a Top-Tier Agent Harness: Composer 2.5, Cloud BYOA, and How It Compares to the Models on Critique
Deep read on Cursor’s agent runtime and Composer 2.5 — vendor benchmarks vs Opus 4.7, GPT-5.5, Kimi K2.6, MiniMax M3, and Qwen3.7 Plus — plus Critique’s SDK-backed cloud handoffs from review runs.

critique.sh
Agent harness · BYOA · Composer 2.5
Harness + model · June 2026
Top-tier agent harness. Frontier coding model.
Critique now queues PR fix handoffs through the Cursor Agent SDK — same cloud agent loop as the IDE, running on Composer 2.5 against your repo and PR. Save your Cursor API key in Settings once; execution bills your Cursor plan, not Critique credits.
One key in Settings. Cloud agents on every PR you choose.
No extra env vars or sidecar scripts for operators. The flow matches Claude Managed Agents and OpenAI Codex BYOA: encrypted key, scoped blueprint, QStash worker, status on the review run. Cursor is the harness; Composer 2.5 is the default model id we pass to the SDK.
PART ONE — WHY CURSOR IS A HARNESS, NOT “JUST A MODEL”
Teams argue about model leaderboards. Staff engineers argue about agent harnesses: Does the loop survive 40 minutes? Does it respect the PR branch? Does it recover from a failed test without rewriting half the repo? Cursor’s moat is the second conversation. The IDE, CLI, Cloud Agents, and the TypeScript SDK all expose the same conceptual object — an Agent with durable state, Runs per prompt, streaming tool events, and cloud VMs that clone your repository.
Critique does not try to replicate that harness inside our sandboxes for Cursor BYOA. We already have Remedy when you want Critique-managed OpenCode on E2B. Cursor BYOA is for orgs that standardized on Cursor execution: same billing relationship, same agent UX in cursor.com/agents, and Composer tuned for the tool schema Cursor actually ships.
Why Critique queues Cursor for execution but uses OpenRouter-shaped models for review.
| Layer | Cursor (BYOA) | Critique review catalog |
|---|---|---|
| Question | How do we patch the PR? | What should change before merge? |
| Runtime | Cursor cloud VM + Agent SDK | Sandbox review graph + specialists |
| Default model | composer-2.5 (Cursor) | Plan-dependent (Opus, Sonnet, M3, Qwen, …) |
| Billing | Cursor API key / plan | Critique credits or BYOK OpenRouter |
| Output | Commits on PR branch | Findings, verdict, blueprint JSON |
PART TWO — COMPOSER 2.5: SPECS AND TRAINING STORY
Composer 2.5 shipped May 18, 2026 as Cursor’s in-house agentic coding model. Public materials describe it as building on Composer 2, with more reinforcement learning on long-horizon coding tasks, better effort calibration (when to keep going vs stop), and stronger tool selection and intent understanding inside Cursor’s agent loop.
The base checkpoint is widely reported as Moonshot’s open Kimi K2.5 lineage — the same architectural family as Kimi K2.6 on Critique’s catalog. Cursor’s differentiation is post-training: Cursor states Composer 2.5 trained on roughly 25× more synthetic tasks than Composer 2, with harder synthetic problems generated dynamically as the model improved (so “easy” tasks did not dominate RL). Third-party summaries also cite a large fraction of total training compute going to Cursor’s own RL stack on top of the open checkpoint.
Composer 2.5 is listed at two price points. The lower one is about $0.50 / M input and $2.50 / M output tokens (Cursor docs, May 2026), positioned for cost-sensitive batch runs.
The higher one is about $3 / M input and $15 / M output tokens. Cursor frames it as the same intelligence tier, tuned for interactive agent sessions, and it is often cited as cheaper than other fast frontier tiers at similar latency.
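To make the two price points above concrete, here is a minimal sketch that estimates the list-price cost of a single agent run from its token counts. The tier keys `lower` and `higher`, the function name, and the example token counts are illustrative assumptions; only the per-million rates come from the figures quoted above.

```typescript
// Per-million-token list prices quoted in this article (USD).
// "lower" and "higher" are placeholder labels, not Cursor's tier names.
interface TokenPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const COMPOSER_25_PRICES: Record<"lower" | "higher", TokenPrice> = {
  lower: { inputPerMillion: 0.5, outputPerMillion: 2.5 },
  higher: { inputPerMillion: 3, outputPerMillion: 15 },
};

// Estimate the list-price cost of one run from its token counts.
function estimateRunCostUsd(
  price: TokenPrice,
  inputTokens: number,
  outputTokens: number,
): number {
  const cost =
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion;
  // Round to whole cents to hide floating-point noise.
  return Math.round(cost * 100) / 100;
}

// Hypothetical run: 2M input tokens and 200k output tokens.
const lowerTierCost = estimateRunCostUsd(COMPOSER_25_PRICES.lower, 2_000_000, 200_000);
const higherTierCost = estimateRunCostUsd(COMPOSER_25_PRICES.higher, 2_000_000, 200_000);
```

For this hypothetical run the estimate is $1.50 at the lower rates and $9.00 at the higher rates. Both the input and output rates differ by a factor of six, so the same ratio holds for any token mix.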
Composer 2.5 is text-first and tool-native: file edits, terminal, search, MCP when configured in Cursor. It is not on Critique’s OpenRouter review roster — it is exclusive to Cursor surfaces (IDE, CLI, Cloud Agents, SDK). That exclusivity is exactly why BYOA exists: your review can stay multi-vendor while fixes run on the stack you already bought.
PART THREE — BENCHMARKS (VENDOR & THIRD-PARTY TABLES)
SWE-Bench Multilingual
Composer 2.5 vs peers on the same published rows.
- Claude Opus 4.7: 80.5%
- Composer 2.5: 79.8%
- GPT-5.5: 77.8%
- Composer 2: 73.7%
Opus 4.8 may supersede 4.7 on some vendor tables; compare using the exact row your procurement packet cites.
Terminal-Bench 2.0
- GPT-5.5: 82.7%
- Claude Opus 4.7: 69.4%
- Composer 2.5: 69.3%
- Composer 2: 61.7%
- Kimi K2.6: 66.7%
- MiniMax M3: 66%
- Qwen3.7 Plus: 70.3%
The Qwen3.7 Plus terminal score comes from Alibaba’s June 2026 materials (Critique catalog); the M3 score comes from the MiniMax launch blog. Harnesses differ, so do not treat these numbers as interchangeable with Critique’s internal review scores.
CursorBench v3.1
- Claude Opus 4.7 (max): 64.8%
- GPT-5.5 (xhigh): 64.3%
- Composer 2.5: 63.2%
- Claude Opus 4.7 (default): 61.6%
- GPT-5.5 (default): 59.2%
- Composer 2: 52.2%
Artificial Analysis Coding Agent Index (May 2026) reports Composer 2.5 at **62** overall with strong cost-per-task — a different blend than CursorBench but the same narrative: near-frontier scores at lower dollars per task.
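As a quick worked comparison, the sketch below computes how many percentage points Composer 2.5 gains over Composer 2 on each of the three benchmarks in this section. The scores are copied from the lists above; the type and function names are illustrative.

```typescript
// Scores copied from the three benchmark lists in this section (percent).
interface BenchmarkRow {
  benchmark: string;
  composer25: number;
  composer2: number;
}

const ROWS: BenchmarkRow[] = [
  { benchmark: "SWE-Bench Multilingual", composer25: 79.8, composer2: 73.7 },
  { benchmark: "Terminal-Bench 2.0", composer25: 69.3, composer2: 61.7 },
  { benchmark: "CursorBench v3.1", composer25: 63.2, composer2: 52.2 },
];

// Percentage-point gain of Composer 2.5 over Composer 2, rounded to one decimal.
function gainInPoints(row: BenchmarkRow): number {
  return Math.round((row.composer25 - row.composer2) * 10) / 10;
}

const gains = ROWS.map((row) => ({
  benchmark: row.benchmark,
  gain: gainInPoints(row),
}));
```

The gains come out to 6.1, 7.6, and 11 points respectively, so the largest step between the two Composer generations shows up on CursorBench v3.1.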
PART FOUR — COLOSSUS, CLOUD AGENTS, AND WHERE CRITIQUE RUNS
Cursor’s 2026 launch narrative also points forward to a training collaboration with SpaceX AI on Colossus 2; public commentary describes an order-of-magnitude step up in training compute versus prior generations. That is foundation-model factory infrastructure, not the runtime path for your Tuesday afternoon PR fix.
Critique Cursor BYOA runs in Cursor cloud agents today: we call the Agent SDK with `cloud.repos[]`, your `prUrl`, `workOnCurrentBranch: true`, and `composer-2.5`. The worker runs on Critique’s backend (QStash → pipeline), but the agent loop executes in Cursor-hosted VMs — the same class of surface as “Queue Cursor agent” in the dashboard. Claude BYOA similarly mounts your repo in Anthropic managed cloud; Codex BYOA uses OpenAI Responses until a fuller Codex agent API is available.
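The options named above can be pictured as a single object. This is a rough sketch, not the Cursor SDK's documented types: only the names `cloud.repos`, `prUrl`, `workOnCurrentBranch`, and the model id `composer-2.5` come from the paragraph above. The nesting, the `url` and `model` keys, the function name, and the example URLs are assumptions for illustration.

```typescript
// Hypothetical shape. The nesting and the `url` / `model` keys are assumed;
// check the Cursor Agent SDK reference for the real option types.
interface CursorHandoffOptions {
  model: string;
  cloud: {
    repos: { url: string; prUrl: string }[];
    workOnCurrentBranch: boolean;
  };
}

// Build the options for one PR fix handoff.
function buildHandoffOptions(repoUrl: string, prUrl: string): CursorHandoffOptions {
  if (repoUrl === "" || prUrl === "") {
    throw new Error("repoUrl and prUrl are both required");
  }
  return {
    model: "composer-2.5", // default model id named in the article
    cloud: {
      repos: [{ url: repoUrl, prUrl }],
      // Commit to the PR's existing branch rather than opening a new one.
      workOnCurrentBranch: true,
    },
  };
}

// Example with placeholder URLs.
const handoffOptions = buildHandoffOptions(
  "https://github.com/example-org/example-repo",
  "https://github.com/example-org/example-repo/pull/42",
);
```

Keeping `workOnCurrentBranch` fixed to `true` inside the builder reflects the intent described above: the agent patches the PR under review instead of creating a parallel branch.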
PART FIVE — HOW TO USE IT ON CRITIQUE
1. Where do I put the API key? Cursor Dashboard → Integrations, then Critique Settings → Cursor agent (BYOA). Same pattern as the Anthropic and OpenAI BYOA panels.
2. When can I queue? After the review run completes on the PR you want fixed. Optional operator instructions narrow scope or tests.
3. Where does the agent run? In Cursor cloud VMs via the Agent SDK, not on Critique Remedy sandboxes and not on Colossus training clusters.
4. What model executes? Composer 2.5 (`composer-2.5`) unless Cursor changes the SDK default for your account tier.
Export remains available: `GET /api/review-runs/{reviewRunId}/byoa/cursor` returns the `critique.cursor_agent_handoff` JSON for your own CI or scripts. Most teams use the queue button.
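For teams scripting the export, a minimal sketch of building that request might look like this. The path template is the one quoted above; the base URL `https://critique.example` and the run id `run_123` are placeholders, and authentication is left out because the article does not describe it.

```typescript
// Path template quoted in the article.
const EXPORT_PATH = "/api/review-runs/{reviewRunId}/byoa/cursor";

interface ExportRequest {
  method: "GET";
  url: string;
}

// Build the export request for one review run.
// baseUrl is a placeholder for wherever your Critique instance is served.
function buildExportRequest(baseUrl: string, reviewRunId: string): ExportRequest {
  if (reviewRunId.trim() === "") {
    throw new Error("reviewRunId must not be empty");
  }
  // Encode the id so unusual characters cannot alter the path.
  const path = EXPORT_PATH.replace("{reviewRunId}", encodeURIComponent(reviewRunId));
  // Strip trailing slashes so the join never yields a double slash.
  return { method: "GET", url: baseUrl.replace(/\/+$/, "") + path };
}

const exportRequest = buildExportRequest("https://critique.example/", "run_123");
```

The resulting `url` can be passed to whatever HTTP client your CI already uses; the response body would be the `critique.cursor_agent_handoff` JSON described above.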
| Path | Who executes | When |
|---|---|---|
| Review only | Nobody (human or external) | You want findings without auto-fix |
| Remedy | Critique E2B / OpenCode | You want one invoice and managed sandbox |
| Cursor BYOA | Cursor cloud + Composer 2.5 | You already pay for Cursor agents |
| Claude / Codex BYOA | Anthropic / OpenAI | Same pattern, different vendor key |
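The table above reads as a small decision rule, sketched here in code. The type names, field names, and string labels are invented for illustration and are not part of any Critique API.

```typescript
// Illustrative labels for the four rows of the table above.
type FixPath = "review-only" | "remedy" | "cursor-byoa" | "claude-or-codex-byoa";

interface TeamPreferences {
  // False when the team wants findings without any automated fix.
  wantsAutoFix: boolean;
  // Vendor whose agent plan the team already pays for, if any.
  byoaVendor?: "cursor" | "anthropic" | "openai";
}

// Map a team's preferences onto one of the four paths.
function chooseFixPath(prefs: TeamPreferences): FixPath {
  if (!prefs.wantsAutoFix) return "review-only";
  if (prefs.byoaVendor === "cursor") return "cursor-byoa";
  if (prefs.byoaVendor === "anthropic" || prefs.byoaVendor === "openai") {
    return "claude-or-codex-byoa";
  }
  // No vendor key: fall back to the Critique-managed sandbox and a single invoice.
  return "remedy";
}

const cursorTeamPath = chooseFixPath({ wantsAutoFix: true, byoaVendor: "cursor" });
```

Here `cursorTeamPath` resolves to `"cursor-byoa"`, matching the row for teams that already pay for Cursor agents.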