Product · June 4, 2026 · 22 min read · Critique

Cursor as a Top-Tier Agent Harness: Composer 2.5, Cloud BYOA, and How It Compares to the Models on Critique

Deep read on Cursor’s agent runtime and Composer 2.5 — vendor benchmarks vs Opus 4.7, GPT-5.5, Kimi K2.6, MiniMax M3, and Qwen3.7 Plus — plus Critique’s SDK-backed cloud handoffs from review runs.


Default BYOA model: composer-2.5 · Cursor Agent SDK · cloud runtime

Harness + model · June 2026

Top-tier agent harness. Frontier coding model.

Critique now queues PR fix handoffs through the Cursor Agent SDK — same cloud agent loop as the IDE, running on Composer 2.5 against your repo and PR. Save your Cursor API key in Settings once; execution bills your Cursor plan, not Critique credits.


Harness

Tool-native agent loop: edit, terminal, search, MCP inside Cursor cloud VMs.

Composer 2.5

Cursor’s in-house model — RL on long coding trajectories, Kimi K2.5 lineage.

Cloud execution

Agents run on Cursor-hosted VMs with prUrl + workOnCurrentBranch — like our Claude BYOA path.

Colossus horizon

Cursor × SpaceX AI Colossus 2 training is the next compute chapter — not where BYOA runs today.

Critique blueprint

Findings, allowed paths, validation commands — then one queue from the review run.

Bring your own agent
One key in Settings. Cloud agents on every PR you choose.

No extra env vars or sidecar scripts for operators. The flow matches Claude Managed Agents and OpenAI Codex BYOA: encrypted key, scoped blueprint, QStash worker, status on the review run. Cursor is the harness; Composer 2.5 is the default model id we pass to the SDK.

Role	Card	Detail	Ends
Execution	Cursor BYOA	Your Cursor plan, not Critique credits	Queue from review runs
Model	Composer 2.5	composer-2.5 (Composer 2 Fast tier optional in Cursor)	SDK cloud runtime
Decision layer	Critique review	Findings + blueprint (Remedy optional)	Same PR context

Composer 2.5 on SWE-Bench Multilingual (Cursor / DataCamp, May 2026): 79.8%

Composer 2.5 on Terminal-Bench 2.0 (same sources): 69.3%

Composer 2.5 on CursorBench v3.1: 63.2%

Critique queue path: SDK, cloud prUrl + workOnCurrentBranch

PART ONE — WHY CURSOR IS A HARNESS, NOT “JUST A MODEL”

Teams argue about model leaderboards. Staff engineers argue about agent harnesses: Does the loop survive 40 minutes? Does it respect the PR branch? Does it recover from a failed test without rewriting half the repo? Cursor’s moat is the second conversation. The IDE, CLI, Cloud Agents, and the TypeScript SDK all expose the same conceptual object — an Agent with durable state, Runs per prompt, streaming tool events, and cloud VMs that clone your repository.

Critique does not try to replicate that harness inside our sandboxes for Cursor BYOA. We already have Remedy when you want Critique-managed OpenCode on E2B. Cursor BYOA is for orgs that standardized on Cursor execution: same billing relationship, same agent UX in cursor.com/agents, and Composer tuned for the tool schema Cursor actually ships.

Harness vs model on a pull request

Why Critique queues Cursor for execution but uses OpenRouter-shaped models for review.

Layer	Cursor (BYOA)	Critique review catalog
Question	How do we patch the PR?	What should change before merge?
Runtime	Cursor cloud VM + Agent SDK	Sandbox review graph + specialists
Default model	composer-2.5 (Cursor)	Plan-dependent (Opus, Sonnet, M3, Qwen, …)
Billing	Cursor API key / plan	Critique credits or BYOK OpenRouter
Output	Commits on PR branch	Findings, verdict, blueprint JSON

PART TWO — COMPOSER 2.5: SPECS AND TRAINING STORY

Composer 2.5 shipped May 18, 2026 as Cursor’s in-house agentic coding model. Public materials describe it as building on Composer 2, with more reinforcement learning on long-horizon coding tasks, better effort calibration (when to keep going vs stop), and stronger tool selection and intent understanding inside Cursor’s agent loop.

The base checkpoint is widely reported as Moonshot’s open Kimi K2.5 lineage — the same architectural family as Kimi K2.6 on Critique’s catalog. Cursor’s differentiation is post-training: Cursor states Composer 2.5 trained on roughly 25× more synthetic tasks than Composer 2, with harder synthetic problems generated dynamically as the model improved (so “easy” tasks did not dominate RL). Third-party summaries also cite a large fraction of total training compute going to Cursor’s own RL stack on top of the open checkpoint.

Composer 2.5 Standard
API list price about $0.50 / M input and $2.50 / M output tokens (Cursor docs, May 2026). Positioned for cost-sensitive batch runs.
Composer 2.5 Fast (default)
About $3 / M input and $15 / M output — same intelligence tier in Cursor’s framing, tuned for interactive agent sessions. Often cited as cheaper than other fast frontier tiers at similar latency.
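
To make the two list prices concrete, here is a minimal cost sketch. The per-million rates are the approximate figures quoted above; the token counts in the usage lines are hypothetical and chosen only for illustration, not measured from a real run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    input_usd_per_m: float   # USD per million input tokens
    output_usd_per_m: float  # USD per million output tokens


# Approximate list prices quoted above (Cursor docs, May 2026).
STANDARD = Tier("Composer 2.5 Standard", 0.50, 2.50)
FAST = Tier("Composer 2.5 Fast", 3.00, 15.00)


def run_cost_usd(tier: Tier, input_tokens: int, output_tokens: int) -> float:
    """Estimated token cost in USD for one agent run on the given tier."""
    total = (input_tokens * tier.input_usd_per_m
             + output_tokens * tier.output_usd_per_m)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical run size: 2M input tokens, 100k output tokens.
    for tier in (STANDARD, FAST):
        print(f"{tier.name}: ${run_cost_usd(tier, 2_000_000, 100_000):.2f}")
```

Because Fast is priced at six times Standard on both input and output, the ratio between the two tiers stays at 6x whatever the input/output mix of a run.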

Composer 2.5 is text-first and tool-native: file edits, terminal, search, MCP when configured in Cursor. It is not on Critique’s OpenRouter review roster — it is exclusive to Cursor surfaces (IDE, CLI, Cloud Agents, SDK). That exclusivity is exactly why BYOA exists: your review can stay multi-vendor while fixes run on the stack you already bought.

PART THREE — BENCHMARKS (VENDOR & THIRD-PARTY TABLES)

SWE-Bench Multilingual — Composer 2.5 vs frontier rows

Multilingual repair suite. Scores from Cursor launch materials and DataCamp’s May 2026 comparison table — not SWE-Bench Verified or Pro.

SWE-Bench Multilingual

Composer 2.5 vs peers on the same published rows.

Model	Score
Claude Opus 4.7	80.5%
Composer 2.5	79.8%
GPT-5.5	77.8%
Composer 2	73.7%

Opus 4.8 may supersede 4.7 on some vendor tables; compare using the exact row your procurement packet cites.

Terminal-Bench 2.0 — agentic terminal coding

Higher is better. GPT-5.5 leads this suite in public comparisons; Composer 2.5 lands in the same band as Opus 4.7 (69.3% vs 69.4%).

Terminal-Bench 2.0

Model	Score
GPT-5.5	82.7%
Claude Opus 4.7	69.4%
Composer 2.5	69.3%
Composer 2	61.7%
Kimi K2.6	66.7%
MiniMax M3	66%
Qwen3.7 Plus	70.3%

Qwen3.7 Plus terminal score from Alibaba Jun 2026 materials (Critique catalog). M3 from MiniMax launch blog. Harnesses differ — do not treat as interchangeable with Critique’s internal review scores.

CursorBench v3.1 — Cursor’s agent-trajectory benchmark

Designed to reflect real Cursor agent runs. Composer 2.5 at 63.2% in Cursor/DataCamp tables; Composer 2 at 52.2%.

CursorBench v3.1

Model	Score
Claude Opus 4.7 (max)	64.8%
GPT-5.5 (xhigh)	64.3%
Composer 2.5	63.2%
Claude Opus 4.7 (default)	61.6%
GPT-5.5 (default)	59.2%
Composer 2	52.2%
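
For readers who want the gaps rather than the raw rows, the short sketch below computes, per suite, the leader, Composer 2.5's distance from it, and its gain over Composer 2. The scores are copied from the three tables in this part; the helper name `summarize` is illustrative, and the differences are percentage points across harnesses that are not directly comparable.

```python
# Scores in percent, copied from the published rows quoted in this essay.
SCORES = {
    "SWE-Bench Multilingual": {
        "Claude Opus 4.7": 80.5, "Composer 2.5": 79.8,
        "GPT-5.5": 77.8, "Composer 2": 73.7,
    },
    "Terminal-Bench 2.0": {
        "GPT-5.5": 82.7, "Claude Opus 4.7": 69.4, "Composer 2.5": 69.3,
        "Composer 2": 61.7, "Kimi K2.6": 66.7, "MiniMax M3": 66.0,
        "Qwen3.7 Plus": 70.3,
    },
    "CursorBench v3.1": {
        "Claude Opus 4.7 (max)": 64.8, "GPT-5.5 (xhigh)": 64.3,
        "Composer 2.5": 63.2, "Claude Opus 4.7 (default)": 61.6,
        "GPT-5.5 (default)": 59.2, "Composer 2": 52.2,
    },
}


def summarize(rows: dict, model: str = "Composer 2.5",
              previous: str = "Composer 2") -> dict:
    """Leader of one suite plus the model's gaps, in percentage points."""
    leader = max(rows, key=rows.get)
    return {
        "leader": leader,
        "gap_to_leader": round(rows[leader] - rows[model], 1),
        "gain_over_previous": round(rows[model] - rows[previous], 1),
    }


if __name__ == "__main__":
    for suite, rows in SCORES.items():
        print(suite, summarize(rows))
```

On these rows Composer 2.5 trails the leader by 0.7 points on SWE-Bench Multilingual, 13.4 on Terminal-Bench 2.0, and 1.6 on CursorBench v3.1, while gaining 6.1, 7.6, and 11.0 points over Composer 2.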

Artificial Analysis Coding Agent Index (May 2026) reports Composer 2.5 at 62 overall with strong cost-per-task — a different blend than CursorBench but the same narrative: near-frontier scores at lower dollars per task.

PART FOUR — COLOSSUS, CLOUD AGENTS, AND WHERE CRITIQUE RUNS

Cursor’s launch narrative for 2026 also points forward: training collaboration with SpaceX AI on Colossus 2 — public commentary describes an order-of-magnitude step up in training compute versus prior generations. That is foundation-model factory infrastructure, not the runtime path for your Tuesday afternoon PR fix.

Critique Cursor BYOA runs in Cursor cloud agents today: we call the Agent SDK with cloud.repos[], your prUrl, workOnCurrentBranch: true, and composer-2.5. The worker runs on Critique’s backend (QStash → pipeline), but the agent loop executes in Cursor-hosted VMs — the same class of surface as “Queue Cursor agent” in the dashboard. Claude BYOA similarly mounts your repo in Anthropic managed cloud; Codex BYOA uses OpenAI Responses until a fuller Codex agent API is available.
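
The options named in the paragraph above can be pictured as a small payload. This is a sketch under assumptions, not the SDK's actual schema: the field names `cloud.repos`, `prUrl`, `workOnCurrentBranch`, and the model id come from this essay, while the nesting, the `url` key inside each repo entry, and both helper functions are hypothetical. It builds a plain dictionary and does not call the Cursor SDK.

```python
from urllib.parse import urlparse

DEFAULT_MODEL = "composer-2.5"  # default model id named in this essay


def repo_url_from_pr(pr_url: str) -> str:
    """Derive https://github.com/<owner>/<repo> from a GitHub pull request URL."""
    parsed = urlparse(pr_url)
    parts = [p for p in parsed.path.split("/") if p]
    # Expected path shape: <owner>/<repo>/pull/<number>
    if parsed.netloc != "github.com" or len(parts) < 4 or parts[2] != "pull":
        raise ValueError(f"not a GitHub pull request URL: {pr_url!r}")
    return f"https://github.com/{parts[0]}/{parts[1]}"


def build_cloud_agent_options(pr_url: str, model: str = DEFAULT_MODEL) -> dict:
    """Assemble the options listed above; the nesting is assumed, not from SDK docs."""
    return {
        "model": model,
        "cloud": {
            # Hypothetical entry shape: one repo, identified by its URL.
            "repos": [{"url": repo_url_from_pr(pr_url)}],
        },
        "prUrl": pr_url,
        # The essay pairs this flag with commits landing on the PR head branch.
        "workOnCurrentBranch": True,
    }


if __name__ == "__main__":
    # Placeholder PR URL, for illustration only.
    print(build_cloud_agent_options(
        "https://github.com/example-org/example-repo/pull/42"))
```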

Critique → Cursor cloud (BYOA)
1. Review completes: findings + policy + allowed write paths
2. buildRemedyBlueprint (backend: byoa) + cursor handoff JSON
3. Queue: POST /api/review-runs/{id}/cursor-agent
4. Cursor Agent SDK: cloud repos + prUrl + composer-2.5
5. Open in Cursor: commits on PR head branch
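
One way to picture the "allowed write paths" part of the blueprint is a scope check over the files an agent changed. The sketch below is illustrative only: the `Blueprint` class, its field names, and the prefix-matching rule are assumptions for this example, not Critique's blueprint schema or its enforcement logic.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Blueprint:
    """Hypothetical stand-in for a scoped blueprint."""
    findings: list[str]
    allowed_paths: list[str]  # repo-relative files or directories the agent may write
    validation_commands: list[str] = field(default_factory=list)


def is_allowed(path: str, allowed_paths: list[str]) -> bool:
    """True if `path` equals an allowed entry or sits underneath one."""
    candidate = PurePosixPath(path)
    # Reject absolute paths and parent-directory escapes outright.
    if candidate.is_absolute() or ".." in candidate.parts:
        return False
    for entry in allowed_paths:
        root = PurePosixPath(entry)
        if candidate == root or root in candidate.parents:
            return True
    return False


def out_of_scope(changed_files: list[str], blueprint: Blueprint) -> list[str]:
    """Changed files that fall outside the blueprint's allowed paths."""
    return [f for f in changed_files
            if not is_allowed(f, blueprint.allowed_paths)]


if __name__ == "__main__":
    bp = Blueprint(
        findings=["Missing null check in request handler"],
        allowed_paths=["src/api", "tests"],
        validation_commands=["pytest -q"],
    )
    print(out_of_scope(["src/api/handler.py", "infra/deploy.yaml"], bp))
```

Matching on whole path components rather than string prefixes means `src/apiv2/x.py` is not accepted just because `src/api` is allowed.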

PART FIVE — HOW TO USE IT ON CRITIQUE

Operator checklist
1. Where do I put the API key?
Create the key under Cursor Dashboard → Integrations, then save it in Critique Settings → Cursor agent (BYOA). Same pattern as the Anthropic and OpenAI BYOA panels.
2. When can I queue?
After the review run completes on the PR you want fixed. Optional operator instructions narrow scope or tests.
3. Where does the agent run?
In Cursor cloud VMs via the Agent SDK, not on Critique Remedy sandboxes and not on Colossus training clusters.
4. What model executes?
Composer 2.5 (composer-2.5) unless Cursor changes the SDK default for your account tier.

Export remains available: GET /api/review-runs/{reviewRunId}/byoa/cursor returns the critique.cursor_agent_handoff JSON for your own CI or scripts. Most teams use the queue button.
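
For teams scripting against these routes, the sketch below builds the two HTTP requests this essay names: the queue POST and the export GET. Only the paths come from the essay. The base URL, the bearer-token header, the `instructions` body field, and the run id format are assumptions, so check the BYOA docs before relying on any of them. The functions only construct `urllib.request.Request` objects and send nothing.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def _run_url(base_url: str, review_run_id: str, suffix: str) -> str:
    """Join the base URL, the URL-escaped run id, and the route suffix."""
    run = quote(review_run_id, safe="")
    return f"{base_url.rstrip('/')}/api/review-runs/{run}/{suffix}"


def queue_cursor_agent_request(base_url: str, review_run_id: str, token: str,
                               instructions: Optional[str] = None) -> Request:
    """POST that queues a Cursor agent for a completed review run."""
    # `instructions` is a hypothetical field for the optional operator notes.
    body = {"instructions": instructions} if instructions else {}
    return Request(
        url=_run_url(base_url, review_run_id, "cursor-agent"),
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # auth scheme is assumed
            "Content-Type": "application/json",
        },
    )


def export_handoff_request(base_url: str, review_run_id: str,
                           token: str) -> Request:
    """GET that exports the handoff JSON for your own CI or scripts."""
    return Request(
        url=_run_url(base_url, review_run_id, "byoa/cursor"),
        method="GET",
        headers={
            "Authorization": f"Bearer {token}",  # auth scheme is assumed
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder host, run id, and token, for illustration only.
    req = export_handoff_request("https://critique.example", "run_123", "TOKEN")
    print(req.get_method(), req.full_url)
```

Passing either object to `urllib.request.urlopen` would perform the call; the export response can then be decoded with `json.load` and handed to your own tooling.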

Cursor BYOA vs Remedy vs review-only

Path	Who executes	When
Review only	Nobody (human or external)	You want findings without auto-fix
Remedy	Critique E2B / OpenCode	You want one invoice and managed sandbox
Cursor BYOA	Cursor cloud + Composer 2.5	You already pay for Cursor agents
Claude / Codex BYOA	Anthropic / OpenAI	Same pattern, different vendor key

Composer 2.5 does not run your PR review. Review uses Critique’s multi-model graph on OpenRouter-shaped ids (Opus, Sonnet, M3, Qwen, Kimi, etc.). Composer 2.5 is only used when you queue Cursor BYOA execution after review.

You do not need to install or run the SDK yourself. Critique runs the SDK server-side with your encrypted API key. You only save the key in Settings.

Composer 2.5 and Kimi K2.6 share reported K2.5-lineage DNA, but Composer 2.5 is Cursor’s post-trained agentic product with Cursor-only tool tuning. Kimi K2.6 on Critique is a separate OpenRouter runtime for review passes.

When the SDK path is unavailable, Critique falls back to the Cloud Agents REST API with the same PR attachment and model id. You still use one key in Settings.

The “Critique welcomes Cursor” essay is the partnership framing. This one is the deep harness + Composer 2.5 benchmark read and the SDK queue path shipping now.

Primary sources

Cursor — Composer 2.5 docs

Cursor — TypeScript SDK

Cursor — Cloud Agents API

Critique — BYOA platform docs

DataCamp — Composer 2.5 benchmarks (May 2026)

Artificial Analysis — Composer 2.5 on Coding Agent Index

Critique welcomes Cursor (partnership essay)

MiniMax M3 + Qwen3.7 — catalog comparison essay
