Inference API

OpenAI-compatible chat completions on Critique credits — Western-hosted models, usage dashboards, and per-user limits.

The Critique Inference API is token-in/token-out chat for sidecars, eval harnesses, and internal tools. It uses the same crt_ keys and credit pool as PR review and Builder — no separate inference vendor account on managed billing.

Marketing overview, rate cards, and an API key gate live at /inference-api. Long-form background: Why we built the Inference API.

Inference API vs Coding Agent API

Inference API: POST /api/v1/chat/completions returns model text; you orchestrate the loop.

Coding Agent API: POST /api/v1/coding-agent/runs clones a repo, runs OpenCode in E2B, and can open a draft PR.

Use Inference when you already have an agent runtime. Use Coding Agent when you want Critique to own the sandbox.
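
To make "you orchestrate the loop" concrete, here is a minimal sketch of a caller-owned loop. The names `runLoop`, `CompleteFn`, and `nextInput` are hypothetical and not part of the Critique API; `complete` stands in for whatever function wraps your call to POST /api/v1/chat/completions.

```typescript
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Hypothetical wrapper around a chat completions call: takes the
// conversation so far and resolves to the model's reply text.
type CompleteFn = (messages: Message[]) => Promise<string>;

// Hypothetical callback: inspects a reply and returns the next user
// message (for example a tool result), or null when the task is done.
type NextInputFn = (reply: string) => string | null;

async function runLoop(
  complete: CompleteFn,
  nextInput: NextInputFn,
  initial: Message[],
  maxTurns = 4,
): Promise<Message[]> {
  const messages = [...initial];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await complete(messages);
    messages.push({ role: "assistant", content: reply });
    const next = nextInput(reply);
    if (next === null) break; // caller decided the loop is finished
    messages.push({ role: "user", content: next });
  }
  return messages;
}
```

The `maxTurns` bound is a design choice of this sketch: because the Inference API only returns text, stopping conditions and spend control are the caller's responsibility.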

Authentication

| Item | Value |
| --- | --- |
| Auth | Authorization: Bearer crt_… |
| Create keys | Settings → Connections → Critique API keys (or the key gate on /inference-api when signed in) |
| Scopes | read:inference, write:inference (included on new keys alongside builder scopes) |
| OpenAI SDK | baseURL: "https://critique.sh/api/v1" |

write:builder also authorizes chat completions if your key predates inference scopes.

Endpoints

| Method | Path | Scope | Purpose |
| --- | --- | --- | --- |
| GET | /api/v1/models | read:inference or read:builder | OpenAI-compatible model list |
| POST | /api/v1/chat/completions | write:inference or write:builder | Chat completions billed from Critique credits |
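
The scope rules in the endpoint table can be expressed as a small client-side lookup. This is an illustrative sketch only: `keyAllows` and `ACCEPTED_SCOPES` are hypothetical names, and the server remains the authority on what a key may do. The sketch deliberately does not assume that a write scope implies the matching read scope, since the documentation does not say so.

```typescript
type Scope =
  | "read:inference"
  | "write:inference"
  | "read:builder"
  | "write:builder";

type Endpoint = "GET /api/v1/models" | "POST /api/v1/chat/completions";

// Scopes accepted per endpoint, copied from the endpoint table.
const ACCEPTED_SCOPES: Record<Endpoint, Scope[]> = {
  "GET /api/v1/models": ["read:inference", "read:builder"],
  "POST /api/v1/chat/completions": ["write:inference", "write:builder"],
};

// Returns true when the key holds at least one scope the endpoint accepts.
function keyAllows(keyScopes: readonly string[], endpoint: Endpoint): boolean {
  return ACCEPTED_SCOPES[endpoint].some((scope) => keyScopes.includes(scope));
}
```

For example, a legacy key holding only `write:builder` passes the check for chat completions but not for the model list.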

Non-streaming responses may include X-Critique-Credits-Charged and X-Critique-Estimated-Usd headers.
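
A hedged sketch of reading those two headers from a response follows. It assumes the header values are plain decimal strings, which the documentation does not specify, and it returns null when a header is absent because the headers are optional. `readCreditUsage` and `HeaderReader` are hypothetical names.

```typescript
// Minimal shape shared by the standard fetch Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

interface CreditUsage {
  creditsCharged: number | null;
  estimatedUsd: number | null;
}

// Assumption: values are decimal strings such as "0.5".
function parseHeaderNumber(raw: string | null): number | null {
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

function readCreditUsage(headers: HeaderReader): CreditUsage {
  return {
    creditsCharged: parseHeaderNumber(headers.get("X-Critique-Credits-Charged")),
    estimatedUsd: parseHeaderNumber(headers.get("X-Critique-Estimated-Usd")),
  };
}
```

With `fetch`, the `headers` property of the response can be passed directly to `readCreditUsage`.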

Hosted models (v5.2)

| Model id | Role | Notes |
| --- | --- | --- |
| deepseek/deepseek-v4-flash | Default | 1M context, Western-hosted. Private tier: $0.15 / $0.30 per million input/output tokens (via credits). |
| tencent/hy3-preview | Agentic MoE | 10% below market on API ($0.0567 / $0.189 per million input/output tokens). 0.5 credits per PR review run on Critique. 262K context. |
| nvidia/nemotron-3-ultra-550b-a55b | Frontier MoE | Intro API pricing through 19 June 2026 (UTC): 50% off market token rates. Review runs cost 2 credits during the intro period, then 3 at shelf price. |

All launch models route through Western-hosted servers. Critique does not train on Inference API payloads by default.

DeepSeek V4 Flash training opt-in (75% off)

DeepSeek V4 Flash only. Keep the private tier at full rates, or opt in to prompt logging for future model improvement at 25% of list price (75% off): $0.0375 / $0.075 per million input/output tokens.
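
The arithmetic behind these rates can be sketched as a rough USD estimator. The per-million rates below are the ones listed in this document. The function name `estimateUsd` and the rate table are hypothetical, and the conversion from USD to Critique credits is not modeled here.

```typescript
interface Rate {
  inputPerM: number; // USD per million input tokens
  outputPerM: number; // USD per million output tokens
}

// List rates as published in the hosted models table.
const RATES: Record<string, Rate> = {
  "deepseek/deepseek-v4-flash": { inputPerM: 0.15, outputPerM: 0.3 },
  "tencent/hy3-preview": { inputPerM: 0.0567, outputPerM: 0.189 },
};

// Opt-in pricing is 25% of list price and applies to DeepSeek V4 Flash only.
const TRAINING_OPT_IN_MULTIPLIER = 0.25;
const OPT_IN_MODEL = "deepseek/deepseek-v4-flash";

function estimateUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
  trainingOptIn = false,
): number {
  const rate = RATES[model];
  if (rate === undefined) throw new Error(`No rate listed for ${model}`);
  const base =
    (inputTokens / 1_000_000) * rate.inputPerM +
    (outputTokens / 1_000_000) * rate.outputPerM;
  return trainingOptIn && model === OPT_IN_MODEL
    ? base * TRAINING_OPT_IN_MULTIPLIER
    : base;
}
```

For one million input and one million output tokens on DeepSeek V4 Flash, this gives roughly $0.45 on the private tier and roughly $0.1125 with the opt-in, matching $0.0375 + $0.075.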

| Level | How to opt in |
| --- | --- |
| Account | Settings → Connections → Inference API panel → DeepSeek V4 Flash training opt-in |
| Per request | Header X-Critique-DeepSeek-Training-Opt-In: true |

Western hosting applies either way. This deal does not apply to review runs or other models.
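
As an illustration of the per-request opt-in, the sketch below builds a request for `fetch` and adds the opt-in header only when asked. `buildChatRequest` is a hypothetical helper, not part of any SDK; the URL and header name are the ones given in this document.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  trainingOptIn = false,
): BuiltRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // Only sent when the caller opts this specific request in.
  if (trainingOptIn) {
    headers["X-Critique-DeepSeek-Training-Opt-In"] = "true";
  }
  return {
    url: "https://critique.sh/api/v1/chat/completions",
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}
```

The result can be passed as `fetch(request.url, request.init)`. Keeping the opt-in as an explicit argument makes it harder to log prompts by accident.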

Quickstart

```sh
curl https://critique.sh/api/v1/chat/completions \
  -H "Authorization: Bearer crt_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      { "role": "user", "content": "Summarize this retry policy in three bullets." }
    ]
  }'
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.CRITIQUE_API_KEY,
  baseURL: "https://critique.sh/api/v1",
});

const res = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Draft a TypeScript type for idempotent enqueue." }],
});
```

Usage dashboard and limits

Signed-in operators get a full dashboard at /inference-dashboard:

  • Credits and tokens over time (7 / 30 / 90 days)
  • Model and API-key attribution
  • Limit status vs caps
  • Paginated activity log

Settings → Connections includes a mini Inference API panel:

| Control | Purpose |
| --- | --- |
| Enable / disable | Turn Inference API off for your account |
| Monthly / daily credit caps | Inference-only spend limits (UTC) |
| Daily request cap | Throttle runaway agents |
| Reserve credits for review | Keep a balance floor for PR review and Builder |
| Block when cap reached | Hard stop vs soft warning |
| DeepSeek training opt-in | 75% off DeepSeek V4 Flash token rates when enabled |
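
These caps are enforced by Critique, but an agent can also apply a local guard so it stops before hitting the server-side limit. The sketch below is one possible client-side counter, not Critique's implementation. `DailyRequestGuard` is a hypothetical name, and the reset at midnight UTC is an assumption carried over from the UTC note on the credit caps.

```typescript
class DailyRequestGuard {
  private readonly cap: number;
  private day = ""; // UTC date of the current window, e.g. "2026-01-01"
  private used = 0;

  constructor(cap: number) {
    this.cap = cap;
  }

  // Returns true and counts the request if the local cap allows it.
  tryAcquire(now: Date = new Date()): boolean {
    const today = now.toISOString().slice(0, 10); // toISOString is always UTC
    if (today !== this.day) {
      this.day = today;
      this.used = 0; // new UTC day: start a fresh window
    }
    if (this.used >= this.cap) return false;
    this.used += 1;
    return true;
  }
}
```

Calling `tryAcquire()` before each completion request lets a runaway loop fail fast locally instead of consuming credits until the account cap triggers.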

Settings API: GET / PATCH /api/settings/inference-api. Dashboard data: GET /api/dashboard/inference-api (?days=30 or ?view=activity).
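
A small sketch of building the two dashboard query forms mentioned above. `dashboardPath` is a hypothetical helper. Only `?days=30` and `?view=activity` appear in this document; accepting 7 and 90 is an assumption based on the ranges shown in the dashboard UI.

```typescript
// Either a time-series window in days or the paginated activity view.
type DashboardQuery = { days: 7 | 30 | 90 } | { view: "activity" };

function dashboardPath(query: DashboardQuery): string {
  const base = "/api/dashboard/inference-api";
  if ("days" in query) {
    return `${base}?days=${query.days}`;
  }
  return `${base}?view=${query.view}`;
}
```

The helper returns a relative path because the document does not state which host or authentication method the dashboard endpoint expects.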

Privacy and acceptable use

  • Payloads are processed to return completions and meter credits — not sold and not used to train foundation models unless you explicitly enable DeepSeek training opt-in.
  • Traffic stays on Western-hosted capacity; Critique does not mirror customer prompts to non-Western training pipelines.
  • Short acceptable-use copy lives on /inference-api#policy.

Billing

Inference token spend converts to Critique credits at Solo-plan economics (same pool as review and Builder). See Billing & credits and Pricing.

Insufficient credits return 402 with insufficient_credits. Per-user caps return errors when limits block the request.
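
A minimal sketch of detecting the out-of-credits case on the client. Because this document does not specify where `insufficient_credits` appears in the response body, the check below searches the raw body text rather than assuming a JSON shape. `isInsufficientCredits` is a hypothetical name.

```typescript
// True when the response is a 402 that mentions insufficient_credits.
function isInsufficientCredits(status: number, bodyText: string): boolean {
  return status === 402 && bodyText.includes("insufficient_credits");
}
```

A caller might use this to stop retrying and surface a top-up prompt, since retrying a 402 without adding credits is unlikely to succeed.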