Inference API
OpenAI-compatible chat completions on Critique credits — Western-hosted models, usage dashboards, and per-user limits.
The Critique Inference API is token-in/token-out chat for sidecars, eval harnesses, and internal tools. It uses the same crt_ keys and credit pool as PR review and Builder — no separate inference vendor account on managed billing.
Marketing overview, rate cards, and an API key gate live at /inference-api. Long-form background: Why we built the Inference API.
Inference API vs Coding Agent API
- Inference API: POST /api/v1/chat/completions returns model text; you orchestrate the loop.
- Coding Agent API: POST /api/v1/coding-agent/runs clones a repo, runs OpenCode in E2B, and can open a draft PR.
Use Inference when you already have an agent runtime. Use Coding Agent when you want Critique to own the sandbox.
Authentication
| Item | Value |
|---|---|
| Auth | Authorization: Bearer crt_… |
| Create keys | Settings → Connections → Critique API keys (or the key gate on /inference-api when signed in) |
| Scopes | read:inference, write:inference (included on new keys alongside builder scopes) |
| OpenAI SDK | baseURL: "https://critique.sh/api/v1" |
write:builder also authorizes chat completions if your key predates inference scopes.
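Each endpoint accepts any one of its listed scopes, so a legacy key holding only builder scopes still works. The sketch below is a hypothetical client-side pre-flight check that lets a tool fail fast on a misconfigured key. The scope lists come from this page (the read scopes from the Endpoints table below). The names ACCEPTED_SCOPES and keyCanCall are illustrative, how you obtain a key's scopes is up to you, and Critique's server remains the authority.

```typescript
// Hypothetical pre-flight helper; scope lists are taken from this page.
type Endpoint = "GET /api/v1/models" | "POST /api/v1/chat/completions";

// Any one of the listed scopes authorizes the endpoint.
const ACCEPTED_SCOPES: Record<Endpoint, readonly string[]> = {
  "GET /api/v1/models": ["read:inference", "read:builder"],
  "POST /api/v1/chat/completions": ["write:inference", "write:builder"],
};

function keyCanCall(endpoint: Endpoint, keyScopes: readonly string[]): boolean {
  return ACCEPTED_SCOPES[endpoint].some((scope) => keyScopes.includes(scope));
}

// A key created before inference scopes existed, carrying only builder scopes.
const legacyKey = ["read:builder", "write:builder"];
console.log(keyCanCall("POST /api/v1/chat/completions", legacyKey)); // true
```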
Endpoints
| Method | Path | Scope | Purpose |
|---|---|---|---|
| GET | /api/v1/models | read:inference or read:builder | OpenAI-compatible model list |
| POST | /api/v1/chat/completions | write:inference or write:builder | Chat completions billed from Critique credits |
Non-streaming responses may include X-Critique-Credits-Charged and X-Critique-Estimated-Usd headers.
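Because both headers are optional, a client should treat them as possibly absent. This is a minimal sketch that reads them from a standard Headers object. Parsing the values as plain decimal numbers is an assumption about their format, and readBillingHeaders is a hypothetical name.

```typescript
// Reads Critique's optional billing headers from a non-streaming response.
// Each field is null when the header is missing or not a parseable number.
interface BillingHeaders {
  creditsCharged: number | null;
  estimatedUsd: number | null;
}

function parseNumericHeader(headers: Headers, name: string): number | null {
  const raw = headers.get(name); // Headers.get is case-insensitive
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw.trim());
  return Number.isFinite(value) ? value : null;
}

function readBillingHeaders(headers: Headers): BillingHeaders {
  return {
    creditsCharged: parseNumericHeader(headers, "X-Critique-Credits-Charged"),
    estimatedUsd: parseNumericHeader(headers, "X-Critique-Estimated-Usd"),
  };
}
```

With fetch, pass response.headers. With the OpenAI Node SDK, calling .withResponse() on the create call returns { data, response }, and response.headers can be passed in the same way.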
Hosted models (v5.2)
| Model id | Role | Notes |
|---|---|---|
| deepseek/deepseek-v4-flash | Default | 1M context, Western-hosted. Private tier: $0.15 / $0.30 per million input/output tokens (via credits). |
| tencent/hy3-preview | Agentic MoE | 262K context. API tokens at 10% below market: $0.0567 / $0.189 per million input/output tokens. PR review runs on Critique cost 0.5 credits. |
| nvidia/nemotron-3-ultra-550b-a55b | Frontier MoE | Intro API pricing through 19 June 2026 (UTC): 50% off market token rates. PR review runs cost 2 credits (3 credits at shelf price after the intro period). |
All launch models route through Western-hosted servers. By default, Critique does not train on Inference API payloads.
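For budgeting, the published per-million-token rates translate directly into a per-request estimate. This sketch assumes the OpenAI-compatible usage object (prompt_tokens, completion_tokens) on a completion. RATES and estimateUsd are hypothetical names. Nemotron is omitted because this page gives its discount, not its market rate. Actual billing is in Critique credits, so prefer X-Critique-Credits-Charged when it is present.

```typescript
// USD list rates per million tokens, copied from the table above.
interface Rate { inputPerM: number; outputPerM: number }

const RATES: Record<string, Rate> = {
  "deepseek/deepseek-v4-flash": { inputPerM: 0.15, outputPerM: 0.30 }, // private tier
  "tencent/hy3-preview": { inputPerM: 0.0567, outputPerM: 0.189 },
};

// Field names follow the OpenAI-style `usage` object (an assumption here).
interface Usage { prompt_tokens: number; completion_tokens: number }

function estimateUsd(model: string, usage: Usage): number {
  const rate = RATES[model];
  if (rate === undefined) throw new Error(`No published rate for ${model}`);
  return (
    (usage.prompt_tokens / 1_000_000) * rate.inputPerM +
    (usage.completion_tokens / 1_000_000) * rate.outputPerM
  );
}
```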
DeepSeek V4 Flash training opt-in (75% off)
DeepSeek V4 Flash only. Keep the private tier at full rates, or opt in to prompt logging for future model improvement at 25% of list price (75% off) — $0.0375 / $0.075 per million input/output tokens.
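Because the opt-in rate is exactly 25% of list on both input and output, any workload's opt-in bill is one quarter of its private-tier bill. For a hypothetical 2M input and 500K output tokens:

```latex
\begin{aligned}
\text{private} &= 2 \times \$0.15 + 0.5 \times \$0.30 = \$0.45 \\
\text{opt-in}  &= 2 \times \$0.0375 + 0.5 \times \$0.075 = \$0.1125 = 0.25 \times \$0.45
\end{aligned}
```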
| Scope | How to opt in |
|---|---|
| Account | Settings → Connections → Inference API panel → DeepSeek V4 Flash training opt-in |
| Per request | Header X-Critique-DeepSeek-Training-Opt-In: true |
Western hosting applies either way. This deal does not apply to review runs or other models.
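The per-request opt-in is just an extra header. This is a minimal sketch that builds fetch arguments and adds X-Critique-DeepSeek-Training-Opt-In only when the model is deepseek/deepseek-v4-flash. Skipping the header for other models is a design choice of this sketch, since the discount applies to no other model; it is not documented server behavior. buildChatRequest is a hypothetical name.

```typescript
// Builds fetch() arguments for a chat completion, optionally opting in to
// DeepSeek V4 Flash training logging for this single request.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }
interface ChatRequest { model: string; messages: ChatMessage[] }

const TRAINING_OPT_IN_MODEL = "deepseek/deepseek-v4-flash";

function buildChatRequest(
  apiKey: string,
  body: ChatRequest,
  options: { deepseekTrainingOptIn?: boolean } = {},
): { url: string; init: { method: "POST"; headers: Record<string, string>; body: string } } {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // Only attach the opt-in header where the 75% discount applies.
  if (options.deepseekTrainingOptIn && body.model === TRAINING_OPT_IN_MODEL) {
    headers["X-Critique-DeepSeek-Training-Opt-In"] = "true";
  }
  return {
    url: "https://critique.sh/api/v1/chat/completions",
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}
```

Usage: const { url, init } = buildChatRequest(key, body, { deepseekTrainingOptIn: true }); then await fetch(url, init). With the OpenAI Node SDK, the same header can be passed per call through the headers field of the request-options second argument to create.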
Quickstart
```bash
curl https://critique.sh/api/v1/chat/completions \
  -H "Authorization: Bearer crt_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      { "role": "user", "content": "Summarize this retry policy in three bullets." }
    ]
  }'
```

Or with the OpenAI Node SDK:

```typescript
import OpenAI from "openai";

// Point the standard OpenAI client at Critique; apiKey is a crt_ key.
const client = new OpenAI({
  apiKey: process.env.CRITIQUE_API_KEY,
  baseURL: "https://critique.sh/api/v1",
});

// Top-level await requires running this file as an ES module.
const res = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  messages: [{ role: "user", content: "Draft a TypeScript type for idempotent enqueue." }],
});

console.log(res.choices[0].message.content);
```

Usage dashboard and limits
Signed-in operators get a full dashboard at /inference-dashboard:
- Credits and tokens over time (7 / 30 / 90 days)
- Model and API-key attribution
- Limit status vs caps
- Paginated activity log
Settings → Connections includes a mini Inference API panel:
| Control | Purpose |
|---|---|
| Enable / disable | Turn the Inference API on or off for your account |
| Monthly / daily credit caps | Inference-only spend limits (UTC) |
| Daily request cap | Throttle runaway agents |
| Reserve credits for review | Keep a balance floor for PR review and Builder |
| Block when cap reached | Hard stop vs soft warning |
| DeepSeek training opt-in | 75% off DeepSeek V4 Flash token rates when enabled |
Settings API: GET / PATCH /api/settings/inference-api. Dashboard data: GET /api/dashboard/inference-api (?days=30 or ?view=activity).
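Dashboard queries take either a day window or the activity view. This sketch builds those URLs with the standard URL API. Restricting days to 7, 30, or 90 mirrors the dashboard windows above and is an assumption about what the endpoint accepts. dashboardUrl is a hypothetical name, and this page does not say whether these endpoints take a crt_ key or a signed-in session.

```typescript
// Builds URLs for GET /api/dashboard/inference-api.
type DashboardQuery = { days: 7 | 30 | 90 } | { view: "activity" };

function dashboardUrl(query: DashboardQuery, origin = "https://critique.sh"): string {
  const url = new URL("/api/dashboard/inference-api", origin);
  if ("days" in query) {
    url.searchParams.set("days", String(query.days));
  } else {
    url.searchParams.set("view", query.view);
  }
  return url.toString();
}
```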
Privacy and acceptable use
- Payloads are processed to return completions and meter credits — not sold and not used to train foundation models unless you explicitly enable DeepSeek training opt-in.
- Traffic stays on Western-hosted capacity; Critique does not mirror customer prompts to non-Western training pipelines.
- Short acceptable-use copy lives on /inference-api#policy.
Billing
Inference token spend converts to Critique credits at Solo-plan economics (same pool as review and Builder). See Billing & credits and Pricing.
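Insufficient credits return HTTP 402 with the insufficient_credits error. When Block when cap reached is enabled, requests that exceed a per-user cap are rejected with an error; otherwise the cap acts as a soft warning.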
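Agents should not blindly retry billing failures. This sketch maps a failed response to an action. Only the 402 / insufficient_credits pair is documented here. The OpenAI-style { error: { code } } body shape is an assumption, and cap errors fall through to "fail" because their status codes are not specified and daily caps reset on a UTC schedule rather than on retry. classifyFailure is a hypothetical name.

```typescript
// Maps a failed Inference API response to a client action.
type FailureAction = "top_up_credits" | "retry_later" | "fail";

function errorCode(body: unknown): unknown {
  // Assumed OpenAI-style shape: { error: { code: "..." } }.
  if (typeof body !== "object" || body === null || !("error" in body)) return undefined;
  const error = (body as { error?: unknown }).error;
  return typeof error === "object" && error !== null ? (error as { code?: unknown }).code : undefined;
}

function classifyFailure(status: number, body: unknown): FailureAction {
  if (status === 402 || errorCode(body) === "insufficient_credits") return "top_up_credits";
  if (status >= 500) return "retry_later"; // transient server-side failure
  return "fail"; // includes cap errors: surface them instead of looping
}
```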
Related
- Connections & Platform API: crt_ keys, scopes, MCP, REST v1
- Coding Agent API: full sandbox agent runs
- Models: review catalog, including Hy3 and Nemotron
- v5.2 changelog: operator ship log