
Model Execution Engine

How Critique manages runtime models, chat models, OpenRouter fallbacks, and exact token cost tracking.

Critique's model execution engine categorizes models into predefined roles (review-lead, review-specialist, remedy, and chat) and leverages OpenRouter for robust fallback chains and exact token tracking.

1. Runtime vs Chat Models

The system categorizes models into two distinct groups, managed in lib/models/catalog.ts:

Runtime Models

These models handle the heavy lifting for automated PR reviews and the Remedy execution layer. Each model is assigned a base creditFloor and a plan tier (standard or ultra):

  • Ultra Plan Models (High Capability):
    • anthropic/claude-opus-4.6:nitro
    • openai/gpt-5.2-pro:nitro
    • openai/gpt-5.4-pro:nitro
  • Standard Plan Models (Main Workhorses):
    • google/gemini-3.1-flash-lite-preview:nitro
    • google/gemini-3-flash-preview:nitro
    • anthropic/claude-sonnet-4.6:nitro
    • x-ai/grok-4.20-beta
    • minimax/minimax-m2.7:nitro
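The runtime catalog above can be sketched as a typed list. The `creditFloor` values and the `modelsForTier` helper below are illustrative assumptions, not the actual contents of `lib/models/catalog.ts`:

```typescript
// Sketch of the runtime catalog shape. Field names (creditFloor, tier)
// follow the description above; the numeric values are assumptions.
type PlanTier = "standard" | "ultra";

interface RuntimeModel {
  id: string;          // OpenRouter model slug
  tier: PlanTier;      // plan tier the model belongs to
  creditFloor: number; // base credit cost (illustrative values)
}

const RUNTIME_MODELS: RuntimeModel[] = [
  { id: "anthropic/claude-opus-4.6:nitro", tier: "ultra", creditFloor: 12 },
  { id: "openai/gpt-5.2-pro:nitro", tier: "ultra", creditFloor: 10 },
  { id: "google/gemini-3-flash-preview:nitro", tier: "standard", creditFloor: 2 },
  { id: "anthropic/claude-sonnet-4.6:nitro", tier: "standard", creditFloor: 4 },
];

// Ultra subscribers can use the whole catalog; standard plans only
// see standard-tier models (hypothetical policy for illustration).
function modelsForTier(tier: PlanTier): RuntimeModel[] {
  return RUNTIME_MODELS.filter(
    (m) => tier === "ultra" || m.tier === "standard",
  );
}
```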

Chat Models

A curated subset available for conversational Q&A and direct UI interactions, managed in lib/ai/chat-models.ts. These models track additional UI-centric attributes like supportsReasoning (for visual thinking UI) and audio capabilities.
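A minimal sketch of what a chat-model entry might look like. `supportsReasoning` comes from the description above; the other field names and the sample entries are assumptions:

```typescript
// Hypothetical chat-model entry with UI-centric attributes.
interface ChatModel {
  id: string;                 // OpenRouter model slug
  supportsReasoning: boolean; // drives the visual "thinking" UI
  supportsAudio: boolean;     // audio capability flag (assumed name)
}

const CHAT_MODELS: ChatModel[] = [
  { id: "anthropic/claude-sonnet-4.6:nitro", supportsReasoning: true, supportsAudio: false },
  { id: "google/gemini-3-flash-preview:nitro", supportsReasoning: false, supportsAudio: true },
];

// Illustrative lookup: should the UI render the thinking panel for this model?
function supportsReasoningUI(modelId: string): boolean {
  return CHAT_MODELS.some((m) => m.id === modelId && m.supportsReasoning);
}
```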

2. OpenRouter Native Fallbacks

Unlike systems that manually chain retry logic in code, Critique relies entirely on OpenRouter's native fallback mechanism.

  • In lib/review/openrouter.ts, model requests pass a provider object containing allow_fallbacks: config.allowFallbacks.
  • The system executes a POST request to OpenRouter's /chat/completions.
  • If OpenRouter utilizes a fallback (e.g., if the primary model is down or rate-limited), the API returns the ultimately used model in the model field of the response payload.
  • Critique detects this by comparing the requested model to the response model (fallbackUsed: resolvedModel !== args.model) and logs it into the ModelExecutionTelemetry.

3. Token Pricing Cache

Token pricing and cost estimation are backed by a caching layer in lib/openrouter/pricing.ts.

  • Pricing Map Cache: The system fetches live pricing per token from https://openrouter.ai/api/v1/models and caches it in memory for 5 minutes (pricingCache).
  • Cost Calculation: The estimateOpenRouterUsageCost function breaks the exact cost into four dimensions:
    1. promptCostUsd: Uncached prompt tokens only (total prompt tokens - cache-read tokens - cache-write tokens).
    2. completionCostUsd: Output (completion) tokens.
    3. cacheReadCostUsd: Tokens successfully read from the prompt cache.
    4. cacheWriteCostUsd: New tokens written to the prompt cache.
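The cache TTL and the four-dimension breakdown can be sketched as follows. The field names mirror the description above; the cache shape and the per-token prices are illustrative assumptions, not live OpenRouter values:

```typescript
interface Usage {
  promptTokens: number;
  completionTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface Pricing {
  promptUsdPerToken: number;
  completionUsdPerToken: number;
  cacheReadUsdPerToken: number;
  cacheWriteUsdPerToken: number;
}

// 5-minute in-memory TTL (the TTL is from the doc; the shape is assumed).
const PRICING_TTL_MS = 5 * 60 * 1000;
let pricingCache: { fetchedAt: number; prices: Map<string, Pricing> } | null = null;

function getCachedPricing(nowMs: number): Map<string, Pricing> | null {
  if (pricingCache && nowMs - pricingCache.fetchedAt < PRICING_TTL_MS) {
    return pricingCache.prices;
  }
  return null; // stale or empty: caller re-fetches the live pricing map
}

// Four-dimension cost breakdown mirroring the list above.
function estimateUsageCost(usage: Usage, p: Pricing) {
  // Uncached prompt tokens = total prompt - cache reads - cache writes.
  const uncachedPrompt =
    usage.promptTokens - usage.cacheReadTokens - usage.cacheWriteTokens;
  return {
    promptCostUsd: uncachedPrompt * p.promptUsdPerToken,
    completionCostUsd: usage.completionTokens * p.completionUsdPerToken,
    cacheReadCostUsd: usage.cacheReadTokens * p.cacheReadUsdPerToken,
    cacheWriteCostUsd: usage.cacheWriteTokens * p.cacheWriteUsdPerToken,
  };
}
```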

4. Role Routing

Models are assigned predefined roles (RuntimeModelRole) that determine how they are routed and prompted.

  • Specialist Models (runOpenRouterSpecialist): Each specialist receives a ranked list of relevant files (e.g., Security prioritizes auth/ or /api/ files; Tests prioritizes files missing coverage), along with heuristic hints and a strict temperature setting (0.1). They generate structured JSON findings with severity, confidence, and exact line numbers.
  • Lead Models (generateOpenRouterLeadCopy): Synthesize the deterministic findings into a cohesive, non-hyperbolic Pull Request review. The Lead receives the specialists' output and is instructed to preserve the actual verdict, collapse overlapping test-gap findings, and emphasize product risk over generic hygiene language.
  • Chat Models (getChatModelId): Separated from backend review automation, these models handle user Q&A interactions and dynamic code explanations, leveraging UI-centric parameters.
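The role split above can be sketched as a small routing table. `RuntimeModelRole` and the 0.1 specialist temperature come from the description; the role-to-model mapping and the non-specialist temperature are illustrative defaults, not the real configuration:

```typescript
// Role union named in the doc; the review-lead/review-specialist/remedy/chat
// split drives both model choice and prompting.
type RuntimeModelRole = "review-lead" | "review-specialist" | "remedy" | "chat";

// Hypothetical default routing table (illustrative assignments only).
const DEFAULT_MODEL_BY_ROLE: Record<RuntimeModelRole, string> = {
  "review-lead": "anthropic/claude-opus-4.6:nitro",
  "review-specialist": "google/gemini-3-flash-preview:nitro",
  remedy: "anthropic/claude-sonnet-4.6:nitro",
  chat: "google/gemini-3.1-flash-lite-preview:nitro",
};

// Specialists run at a strict 0.1 for deterministic JSON findings;
// the 0.3 default for other roles is an assumption for illustration.
function temperatureForRole(role: RuntimeModelRole): number {
  return role === "review-specialist" ? 0.1 : 0.3;
}
```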