Model Execution Engine
How Critique manages runtime models, chat models, OpenRouter fallbacks, and exact token cost tracking.
Critique's model execution engine categorizes models into predefined roles (review-lead, review-specialist, remedy, and chat) and leverages OpenRouter for robust fallback chains and exact token tracking.
1. Runtime vs Chat Models
The system categorizes models into two distinct groups, managed in `lib/models/catalog.ts`:
Runtime Models
These models handle the heavy lifting for automated PR reviews and the Remedy execution layer. Each model is assigned a base `creditFloor` and a plan tier (`standard` or `ultra`):
- Ultra Plan Models (High Capability):
  - `anthropic/claude-opus-4.6:nitro`
  - `openai/gpt-5.2-pro:nitro`
  - `openai/gpt-5.4-pro:nitro`
- Standard Plan Models (Main Workhorses):
  - `google/gemini-3.1-flash-lite-preview:nitro`
  - `google/gemini-3-flash-preview:nitro`
  - `anthropic/claude-sonnet-4.6:nitro`
  - `x-ai/grok-4.20-beta`
  - `minimax/minimax-m2.7:nitro`
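To make the catalog structure concrete, here is a minimal sketch of what an entry in `lib/models/catalog.ts` might look like. Only `creditFloor` and the `standard`/`ultra` plan tiers come from the text above; the rest of the shape, the `creditFloor` values, and the `modelsForPlan` helper are illustrative assumptions.

```typescript
type RuntimePlan = "standard" | "ultra";

interface RuntimeModelEntry {
  id: string;          // OpenRouter model slug
  plan: RuntimePlan;   // plan tier the model belongs to
  creditFloor: number; // base credit floor (placeholder values below)
}

const runtimeCatalog: RuntimeModelEntry[] = [
  { id: "anthropic/claude-opus-4.6:nitro", plan: "ultra", creditFloor: 10 },
  { id: "google/gemini-3-flash-preview:nitro", plan: "standard", creditFloor: 1 },
  { id: "x-ai/grok-4.20-beta", plan: "standard", creditFloor: 1 },
];

// Standard-plan users see only standard models; ultra users see both tiers.
function modelsForPlan(plan: RuntimePlan): RuntimeModelEntry[] {
  return runtimeCatalog.filter((m) => plan === "ultra" || m.plan === "standard");
}
```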
Chat Models
A curated subset available for conversational Q&A and direct UI interactions, managed in `lib/ai/chat-models.ts`. These models track additional UI-centric attributes like `supportsReasoning` (for the visual thinking UI) and audio capabilities.
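A hypothetical chat-model entry, mirroring the UI attributes named above; everything beyond the `supportsReasoning` flag and an audio flag is an assumption about the actual schema in `lib/ai/chat-models.ts`.

```typescript
interface ChatModelEntry {
  id: string;
  supportsReasoning: boolean; // drives the visual thinking UI
  supportsAudio: boolean;     // audio capability flag
}

const chatModels: ChatModelEntry[] = [
  { id: "anthropic/claude-sonnet-4.6:nitro", supportsReasoning: true, supportsAudio: false },
  { id: "google/gemini-3-flash-preview:nitro", supportsReasoning: false, supportsAudio: true },
];

// The UI can gate the thinking panel on this lookup.
function supportsThinkingUi(modelId: string): boolean {
  return chatModels.some((m) => m.id === modelId && m.supportsReasoning);
}
```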
2. OpenRouter Native Fallbacks
Unlike systems that manually chain retry logic in code, Critique relies entirely on OpenRouter's native fallback mechanism.
- In `lib/review/openrouter.ts`, model requests pass a `provider` object containing `allow_fallbacks: config.allowFallbacks`.
- The system executes a `POST` request to OpenRouter's `/chat/completions` endpoint.
- If OpenRouter utilizes a fallback (e.g., if the primary model is down or rate-limited), the API returns the ultimately used model in the `model` field of the response payload.
- Critique detects this by comparing the requested model to the response model (`fallbackUsed: resolvedModel !== args.model`) and logs it into the `ModelExecutionTelemetry`.
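The detection step above can be sketched as follows. The `provider.allow_fallbacks` payload and the `fallbackUsed` comparison mirror the text; the response and telemetry shapes here are simplified assumptions, not the actual types in `lib/review/openrouter.ts`.

```typescript
interface ChatCompletionResponse {
  model: string; // the model OpenRouter actually served the request with
}

interface ModelExecutionTelemetry {
  requestedModel: string;
  resolvedModel: string;
  fallbackUsed: boolean;
}

// Request body passes provider.allow_fallbacks so OpenRouter may reroute.
function buildRequestBody(model: string, allowFallbacks: boolean) {
  return {
    model,
    provider: { allow_fallbacks: allowFallbacks },
    messages: [{ role: "user", content: "..." }],
  };
}

// A mismatch between requested and resolved model means a fallback fired upstream.
function buildTelemetry(
  requestedModel: string,
  response: ChatCompletionResponse,
): ModelExecutionTelemetry {
  const resolvedModel = response.model;
  return { requestedModel, resolvedModel, fallbackUsed: resolvedModel !== requestedModel };
}
```

Because the comparison runs on the response payload, no client-side retry loop is needed: OpenRouter does the rerouting, and Critique only records whether it happened.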
3. Token Pricing Cache
Token pricing and cost estimation are optimized through a caching layer in `lib/openrouter/pricing.ts`.
- Pricing Map Cache: The system fetches live per-token pricing from `https://openrouter.ai/api/v1/models` and caches it in memory for 5 minutes (`pricingCache`).
- Cost Calculation: The `estimateOpenRouterUsageCost` function calculates exact usage across four dimensions:
  - `promptCostUsd`: calculated purely on uncached prompt tokens (total prompt tokens - cache read - cache write).
  - `completionCostUsd`: output tokens.
  - `cacheReadCostUsd`: tokens successfully read from the prompt cache.
  - `cacheWriteCostUsd`: new tokens written to the prompt cache.
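The four-dimension split can be sketched like this. The output keys match the text above; the input field names, the `TokenRatesUsd` shape, and the per-token rates are assumptions for illustration, not the real signature of `estimateOpenRouterUsageCost`.

```typescript
interface UsageTokens {
  promptTokens: number;     // total prompt tokens, cached and uncached
  completionTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface TokenRatesUsd {
  prompt: number;     // USD per uncached prompt token
  completion: number; // USD per output token
  cacheRead: number;  // USD per cache-read token (typically discounted)
  cacheWrite: number; // USD per cache-write token (typically a premium)
}

function estimateUsageCost(u: UsageTokens, r: TokenRatesUsd) {
  // Uncached prompt tokens = total prompt tokens - cache reads - cache writes.
  const uncachedPrompt = u.promptTokens - u.cacheReadTokens - u.cacheWriteTokens;
  return {
    promptCostUsd: uncachedPrompt * r.prompt,
    completionCostUsd: u.completionTokens * r.completion,
    cacheReadCostUsd: u.cacheReadTokens * r.cacheRead,
    cacheWriteCostUsd: u.cacheWriteTokens * r.cacheWrite,
  };
}
```

For example, with 1,000 prompt tokens of which 300 were cache reads and 200 cache writes, only the remaining 500 tokens are billed at the full prompt rate.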
4. Role Routing
Models are assigned predefined roles (`RuntimeModelRole`) that determine how they are routed and prompted.
- Specialist Models (`runOpenRouterSpecialist`): Receive a ranked list of relevant files (e.g., Security prioritizes `auth/` or `/api/` files; Tests prioritize missing coverage). They receive heuristic hints and a strict temperature setting (0.1), generating structured JSON findings (severity, confidence, exact line numbers).
- Lead Models (`generateOpenRouterLeadCopy`): Synthesize deterministic findings into a cohesive, non-hyperbolic Pull Request review. The Lead receives the specialists' output and is instructed to preserve the actual verdict, collapse overlapping test-gap findings, and emphasize product risk over generic hygiene language.
- Chat Models (`getChatModelId`): Separated from backend review automation, these models handle user Q&A and dynamic code explanations, leveraging UI-centric parameters.
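A role-routing sketch under stated assumptions: the `RuntimeModelRole` values and the specialist temperature of 0.1 come from the text above, while the default model mapping, the other temperatures, and the `resolveRole` helper are illustrative guesses.

```typescript
type RuntimeModelRole = "review-lead" | "review-specialist" | "remedy" | "chat";

const roleDefaults: Record<RuntimeModelRole, { model: string; temperature: number }> = {
  "review-lead": { model: "anthropic/claude-sonnet-4.6:nitro", temperature: 0.3 },
  "review-specialist": { model: "google/gemini-3-flash-preview:nitro", temperature: 0.1 },
  remedy: { model: "x-ai/grok-4.20-beta", temperature: 0.2 },
  chat: { model: "google/gemini-3.1-flash-lite-preview:nitro", temperature: 0.7 },
};

// Resolve the model and sampling temperature for a role, with an optional
// per-request model override (e.g., a user-selected chat model).
function resolveRole(role: RuntimeModelRole, overrideModel?: string) {
  const { model, temperature } = roleDefaults[role];
  return { model: overrideModel ?? model, temperature };
}
```

Keeping the role-to-model mapping in one table makes it easy to swap a role's default model without touching the routing call sites.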