Inference API

OpenAI-compatible chat completions on Critique credits — Western-hosted models, usage dashboards, and per-user limits.

The Critique Inference API provides token-in/token-out chat completions for sidecars, eval harnesses, and internal tools. It uses the same crt_ keys and credit pool as PR review and Builder, so on managed billing you do not need a separate inference vendor account.

Marketing overview, rate cards, and an API key gate live at /inference-api. Long-form background: Why we built the Inference API.

Inference API vs Coding Agent API

Inference API — POST /api/v1/chat/completions returns model text; you orchestrate the loop.

Coding Agent API — POST /api/v1/coding-agent/runs clones a repo, runs OpenCode in E2B, and can open a draft PR.

Use Inference when you already have an agent runtime. Use Coding Agent when you want Critique to own the sandbox.

Authentication

Item	Value
Auth	`Authorization: Bearer crt_…`
Create keys	Settings → Connections → Critique API keys (or the key gate on /inference-api when signed in)
Scopes	`read:inference`, `write:inference` (included on new keys alongside builder scopes)
OpenAI SDK	`baseURL: "https://critique.sh/api/v1"`

write:builder also authorizes chat completions if your key predates inference scopes.

Endpoints

Method	Path	Scope	Purpose
`GET`	`/api/v1/models`	`read:inference` or `read:builder`	OpenAI-compatible model list
`POST`	`/api/v1/chat/completions`	`write:inference` or `write:builder`	Chat completions billed from Critique credits

Non-streaming responses may include X-Critique-Credits-Charged and X-Critique-Estimated-Usd headers.
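The billing headers can be read from any Fetch-style response. The following is a minimal sketch, not an official client: the helper names (`HeaderReader`, `readCreditUsage`) are hypothetical, and it assumes both header values are plain decimal numbers, which this page does not confirm. Because the headers only "may" be present, both fields are nullable.

```typescript
// Minimal shape shared by the Fetch API's Headers class and simple test doubles.
interface HeaderReader {
  get(name: string): string | null;
}

interface CreditUsage {
  creditsCharged: number | null;
  estimatedUsd: number | null;
}

// Parse a numeric header; return null when it is absent, empty, or not a number.
// Assumption: header values are plain decimal strings such as "2" or "0.0012".
function numericHeader(headers: HeaderReader, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Collect both billing headers from a non-streaming response.
function readCreditUsage(headers: HeaderReader): CreditUsage {
  return {
    creditsCharged: numericHeader(headers, "X-Critique-Credits-Charged"),
    estimatedUsd: numericHeader(headers, "X-Critique-Estimated-Usd"),
  };
}
```

With `fetch`, pass `response.headers` straight to `readCreditUsage` after a non-streaming call, and treat a `null` field as "not reported" rather than as zero.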

Hosted models (v5.2)

Model id	Role	Notes
`deepseek/deepseek-v4-flash`	Default	1M context, Western-hosted. Private tier: $0.15 / $0.30 per million input/output tokens (via credits).
`tencent/hy3`	Agentic MoE	10% below market on API ($0.126 / $0.522 per M). 1 credit per PR review run on Critique. 295B MoE (21B active), 262K context.
`nvidia/nemotron-3-ultra-550b-a55b`	Frontier MoE	Intro API pricing through 19 June 2026 (UTC): 50% off market token rates. Review runs cost 2 credits during the intro period, then 3 at shelf price.

All launch models route through Western-hosted servers. Critique does not train on Inference API payloads by default.

DeepSeek V4 Flash training opt-in (75% off)

DeepSeek V4 Flash only. Keep the private tier at full rates, or opt in to prompt logging for future model improvement at 25% of list price (75% off) — $0.0375 / $0.075 per million input/output tokens.

How to opt in
Account	Settings → Connections → Inference API panel → DeepSeek V4 Flash training opt-in
Per request	Header `X-Critique-DeepSeek-Training-Opt-In: true`

Western hosting applies either way. This deal does not apply to review runs or other models.
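To make the discount arithmetic concrete, here is a rough sketch that estimates the USD value of a DeepSeek V4 Flash request at either tier and builds the per-request opt-in header. The function names are hypothetical, and actual charges are metered in Critique credits, so treat the result as an estimate only.

```typescript
// List prices for deepseek/deepseek-v4-flash, in USD per million tokens
// (private tier, taken from the hosted models table).
const PRIVATE_INPUT_USD_PER_M = 0.15;
const PRIVATE_OUTPUT_USD_PER_M = 0.3;
// Opt-in traffic is billed at 25% of list price (75% off).
const OPT_IN_MULTIPLIER = 0.25;

// Estimate the USD value of one request. Hypothetical helper, not part of the API.
function estimateDeepSeekFlashUsd(
  inputTokens: number,
  outputTokens: number,
  trainingOptIn: boolean
): number {
  const listUsd =
    (inputTokens * PRIVATE_INPUT_USD_PER_M + outputTokens * PRIVATE_OUTPUT_USD_PER_M) / 1e6;
  return trainingOptIn ? listUsd * OPT_IN_MULTIPLIER : listUsd;
}

// Build request headers, adding the per-request opt-in header only when asked.
function deepSeekRequestHeaders(apiKey: string, trainingOptIn: boolean): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  if (trainingOptIn) {
    headers["X-Critique-DeepSeek-Training-Opt-In"] = "true";
  }
  return headers;
}
```

For one million input tokens and one million output tokens, this works out to roughly $0.45 on the private tier and roughly $0.1125 with the opt-in, matching the per-million rates quoted above.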

Quickstart

curl https://critique.sh/api/v1/chat/completions \
  -H "Authorization: Bearer crt_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      { "role": "user", "content": "Summarize this retry policy in three bullets." }
    ]
  }'

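// Same request through the official OpenAI SDK for Node (npm package "openai").
// Run as an ES module so that top-level await is available.
import OpenAI from "openai";

// Point the SDK at Critique and authenticate with a crt_ key.
const client = new OpenAI({
  apiKey: process.env.CRITIQUE_API_KEY,
  baseURL: "https://critique.sh/api/v1",
});

const res = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Draft a TypeScript type for idempotent enqueue." }],
});

// Print the assistant's reply.
console.log(res.choices[0]?.message.content);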

Usage dashboard and limits

Signed-in operators get a full dashboard at /inference-dashboard:

Credits and tokens over time (7 / 30 / 90 days)
Model and API-key attribution
Limit status vs caps
Paginated activity log

Settings → Connections includes a mini Inference API panel:

Control	Purpose
Enable / disable	Turn Inference API off for your account
Monthly / daily credit caps	Inference-only spend limits (UTC)
Daily request cap	Throttle runaway agents
Reserve credits for review	Keep a balance floor for PR review and Builder
Block when cap reached	Hard stop vs soft warning
DeepSeek training opt-in	75% off DeepSeek V4 Flash token rates when enabled

Settings API: GET / PATCH /api/settings/inference-api. Dashboard data: GET /api/dashboard/inference-api (?days=30 or ?view=activity).
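The two dashboard query forms can be built with the standard `URL` class. This is only a sketch: `dashboardUrl` is a hypothetical helper, the `critique.sh` origin is assumed to match the API base URL, and the allowed `days` values are assumed to mirror the 7 / 30 / 90 day ranges shown in the dashboard. How these endpoints authenticate is not specified on this page, so the sketch stops at building the URL.

```typescript
// Assumption: the dashboard endpoint lives on the same origin as the API base URL.
const CRITIQUE_ORIGIN = "https://critique.sh";

// Either a time window or the paginated activity view.
// Assumption: valid windows mirror the dashboard's 7 / 30 / 90 day ranges.
type DashboardQuery = { days: 7 | 30 | 90 } | { view: "activity" };

// Build the dashboard data URL for one of the two documented query forms.
function dashboardUrl(query: DashboardQuery): string {
  const url = new URL("/api/dashboard/inference-api", CRITIQUE_ORIGIN);
  if ("days" in query) {
    url.searchParams.set("days", String(query.days));
  } else {
    url.searchParams.set("view", query.view);
  }
  return url.toString();
}
```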

Privacy and acceptable use

Payloads are processed to return completions and meter credits — not sold and not used to train foundation models unless you explicitly enable DeepSeek training opt-in.
Traffic stays on Western-hosted capacity; Critique does not mirror customer prompts to non-Western training pipelines.
Short acceptable-use copy lives on /inference-api#policy.

Billing

Inference token spend converts to Critique credits at Solo-plan economics (same pool as review and Builder). See Billing & credits and Pricing.

A request made without enough credits returns HTTP 402 with insufficient_credits. A request blocked by a per-user cap returns an error instead of a completion.
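Callers usually want to separate "out of credits" from other failures so they can stop retrying. A minimal sketch follows, assuming the string `insufficient_credits` appears somewhere in the response body; the exact error JSON shape is not documented here, and `classifyResponse` is a hypothetical helper. The status codes used for cap errors are not specified on this page, so they fall into the generic bucket.

```typescript
type BillingOutcome = "ok" | "insufficient_credits" | "other_error";

// Classify a chat completions response from its status code and raw body text.
// Assumption: a 402 body contains the literal string "insufficient_credits".
function classifyResponse(status: number, bodyText: string): BillingOutcome {
  if (status >= 200 && status < 300) return "ok";
  if (status === 402 && bodyText.includes("insufficient_credits")) {
    return "insufficient_credits";
  }
  // Per-user cap errors and everything else land here.
  return "other_error";
}
```

On `insufficient_credits`, retrying will not help until credits are added or a reserve setting is changed, so an agent loop should stop and surface the error rather than back off and retry.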

Merge Gate API — PR review queue, structured findings, webhooks
Connections & Platform API — crt_ keys, scopes, MCP, REST v1
Coding Agent API — full sandbox agent runs
Models — review catalog including Hy3 and Nemotron
Ship log — operator release notes (v6.0 Platform API and lifecycle webhooks; v5.2 Inference API)
