Specialists

Deep reference for each specialist in the review pipeline — what it checks, heuristic rules, model routing, and severity mapping.

The review pipeline runs four executable specialists in parallel, plus two structural stages (Scout and Lead) that frame the review. Each specialist produces structured findings with severity, confidence, title, summary, and optional file locations.

Execution order

SCOUT → SECURITY | TESTS | ARCHITECTURE | PERFORMANCE → LEAD

Scout runs first to build the evidence pack. The four specialists run concurrently. Lead runs last to deduplicate, score, and synthesize.
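
The stage order above can be sketched as follows. This is a minimal illustration, not Critique's actual code: the type names, the `runReview` function, and the shape of `EvidencePack` are hypothetical, and only the ordering (Scout, then four concurrent specialists, then Lead) comes from the text.

```typescript
// Hypothetical types; the finding fields mirror those listed in the overview.
type SpecialistKind = "security" | "tests" | "architecture" | "performance";

interface Finding {
  specialist: SpecialistKind;
  severity: "INFO" | "WARNING" | "FAIL";
  confidence: number;
  title: string;
  summary: string;
  locations?: string[]; // optional file locations
}

type EvidencePack = { changedFiles: string[] }; // assumed, heavily simplified
type Specialist = (pack: EvidencePack) => Promise<Finding[]>;

async function runReview(
  scout: () => Promise<EvidencePack>,
  specialists: Record<SpecialistKind, Specialist>,
  lead: (findings: Finding[]) => Promise<string>,
): Promise<string> {
  // 1. Scout runs first and builds the evidence pack.
  const pack = await scout();
  // 2. The four specialists run concurrently on the same pack.
  const results = await Promise.all(
    Object.values(specialists).map((run) => run(pack)),
  );
  // 3. Lead runs last on the combined findings.
  const all = ([] as Finding[]).concat(...results);
  return lead(all);
}
```

Using `Promise.all` here means one rejected specialist rejects the whole review; whether the real pipeline tolerates partial failure is not stated in this reference.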


Runtime sources

Each specialist can produce findings from three runtime sources:

| Source | Description |
| --- | --- |
| heuristic | Deterministic pattern matching; no LLM involved |
| openrouter | LLM-generated findings via OpenRouter |
| hybrid | Both heuristic and model findings merged (the default production path) |

In hybrid mode, heuristic findings are passed to the model as context ("Heuristic hints from Critique") so it can confirm, refine, or add to them. The lead stage then deduplicates across both sources.


Scout

Scout is the evidence-gathering pre-stage, not an executable specialist.

Focus: Build the EvidencePack from the pull request diff, nearby files, repository guidance, and issue context.

What Scout collects:

  • Changed files and patches (up to 40 files)
  • File contents when needed for deeper analysis
  • Language inference from extensions (.ts/.tsx = TypeScript, .js/.jsx/.mjs/.cjs = JavaScript, etc.)
  • Test file classification (paths containing __tests__/, /test/, /tests/, or ending with .test.*/.spec.*)
  • Config file classification (package.json, tsconfig.json, eslint.config.mjs, .yml/.yaml, .github/)
  • Package root detection from apps/ or packages/ path segments
  • Related test paths by base-name matching
  • Risk tags: auth (auth/session/permission paths), billing (billing/invoice/stripe), api (/api/ paths), automation (.github/ or workflow files)
  • Repository guidance documents: AGENTS.md, CLAUDE.md, .cursorrules, .cursor/rules/review.mdc, .cursor/rules/project.mdc, .github/copilot-instructions.md, docs/architecture.md, README.md
  • Linked issue references from PR title/body (up to 5)

No model: Scout is purely deterministic.
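
A few of the classification rules listed above could look roughly like this. The function names are hypothetical, the sketch covers only the extensions and tags named in the list, and two details are assumptions: paths are lower-cased before risk matching, and "workflow files" is taken to mean any path containing `workflow`.

```typescript
// Language inference from file extensions (only the ones named in the list).
function inferLanguage(path: string): string | undefined {
  if (/\.tsx?$/.test(path)) return "TypeScript";
  if (/\.(jsx?|mjs|cjs)$/.test(path)) return "JavaScript";
  return undefined; // other extensions are outside this sketch
}

// Test file classification by directory segment or .test.* / .spec.* suffix.
function isTestFile(path: string): boolean {
  return (
    path.includes("__tests__/") ||
    path.includes("/test/") ||
    path.includes("/tests/") ||
    /\.(test|spec)\.[^/]+$/.test(path)
  );
}

type RiskTag = "auth" | "billing" | "api" | "automation";

// Risk tags derived purely from the file path.
function riskTags(path: string): RiskTag[] {
  const p = path.toLowerCase(); // assumption: matching is case-insensitive
  const tags: RiskTag[] = [];
  if (/auth|session|permission/.test(p)) tags.push("auth");
  if (/billing|invoice|stripe/.test(p)) tags.push("billing");
  if (p.includes("/api/")) tags.push("api");
  if (p.includes(".github/") || p.includes("workflow")) tags.push("automation");
  return tags;
}
```

For example, `riskTags("apps/web/app/api/auth/route.ts")` yields both `auth` and `api`, since a single path can carry several tags.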


Security

Default model: anthropic/claude-sonnet-4.6:nitro

System prompt guidance: Prioritize auth, permissions, injection, secret handling, unsafe code execution, and trust-boundary regressions.

Scope: JS/TS source files (not tests, not config). Files with paths containing auth, billing, or /api/ receive a +40 ranking boost.

Heuristic checks

| Check | Pattern | Severity | Confidence | Title |
| --- | --- | --- | --- | --- |
| dangerouslySetInnerHTML | Literal string match in source | WARNING | 0.88 | Client HTML injection surface increased |
| eval / new Function | Regex match on eval( or new Function( (regex given below) | FAIL | 0.96 | Dynamic code execution detected |
| Server env in client | 'use client' + process.env. without NEXT_PUBLIC | FAIL | 0.91 | Server-only env access in client component |
| High-risk without tests | Risk tags include auth or billing but no tests touched | WARNING | 0.76 | High-risk code changed without security-oriented test coverage |

Regex for the eval / new Function check: \b(eval|new Function)\s*\(
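
As an illustration, the three per-file checks could be written as below. The function name and finding shape are hypothetical. The detection of the 'use client' directive and the exact form of the NEXT_PUBLIC exclusion are assumptions. The "high-risk without tests" check works at the pull-request level and is left out.

```typescript
interface HeuristicFinding {
  severity: "WARNING" | "FAIL";
  confidence: number;
  title: string;
}

function securityHeuristics(source: string): HeuristicFinding[] {
  const findings: HeuristicFinding[] = [];

  // Literal string match.
  if (source.includes("dangerouslySetInnerHTML")) {
    findings.push({ severity: "WARNING", confidence: 0.88, title: "Client HTML injection surface increased" });
  }

  // Regex taken from the reference.
  if (/\b(eval|new Function)\s*\(/.test(source)) {
    findings.push({ severity: "FAIL", confidence: 0.96, title: "Dynamic code execution detected" });
  }

  // Assumption: the directive appears as a quoted string, and any
  // process.env access not starting with NEXT_PUBLIC counts as server-only.
  const isClient = /['"]use client['"]/.test(source);
  if (isClient && /process\.env\.(?!NEXT_PUBLIC)/.test(source)) {
    findings.push({ severity: "FAIL", confidence: 0.91, title: "Server-only env access in client component" });
  }

  return findings;
}
```

Checks like these match text, not syntax, so a hit inside a comment or string literal would also be reported; in hybrid mode the model is given these hints and can confirm or refine them.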

Tests

Default model: openai/gpt-5.4-mini

System prompt guidance: Prioritize missing coverage, broken test intent, missing assertions, sensitive-path changes without tests, and regressions hidden by weak test movement.

Scope: JS/TS source files (not tests, not config). Test files get +35 ranking boost; files with related test paths get +18.

Heuristic checks

| Check | Condition | Severity | Confidence | Title |
| --- | --- | --- | --- | --- |
| Source changed, no tests | Source files modified but no test files touched | FAIL if risk tags include auth/billing, else WARNING | 0.79 | Behavior changed without matching tests |
| Sensitive file without related test | Source file path includes auth or billing and no related test path moved | WARNING | 0.72 | No directly related test path moved with a sensitive source file |
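
A sketch of how these two conditions might be evaluated over the changed files. The `ChangedFile` shape and function name are hypothetical, and "no related test path moved" is interpreted here as "none of the file's related test paths appears among the changed test files".

```typescript
interface ChangedFile {
  path: string;
  isTest: boolean;
  relatedTestPaths: string[]; // as collected by Scout
}

interface TestFinding {
  severity: "WARNING" | "FAIL";
  confidence: number;
  title: string;
  path?: string;
}

function testsHeuristics(files: ChangedFile[], riskTags: string[]): TestFinding[] {
  const findings: TestFinding[] = [];
  const sources = files.filter((f) => !f.isTest);
  const touchedTests = new Set(files.filter((f) => f.isTest).map((f) => f.path));

  // Check 1: source changed but no test file touched at all.
  if (sources.length > 0 && touchedTests.size === 0) {
    const highRisk = riskTags.includes("auth") || riskTags.includes("billing");
    findings.push({
      severity: highRisk ? "FAIL" : "WARNING",
      confidence: 0.79,
      title: "Behavior changed without matching tests",
    });
  }

  // Check 2: sensitive source file whose related tests did not move.
  for (const file of sources) {
    const sensitive = /auth|billing/.test(file.path);
    const relatedMoved = file.relatedTestPaths.some((p) => touchedTests.has(p));
    if (sensitive && !relatedMoved) {
      findings.push({
        severity: "WARNING",
        confidence: 0.72,
        title: "No directly related test path moved with a sensitive source file",
        path: file.path,
      });
    }
  }
  return findings;
}
```

Both checks can fire for the same change, which is one reason Lead deduplicates overlapping "missing tests" findings.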

Architecture

Default model: xiaomi/mimo-v2-pro:nitro

System prompt guidance: Prioritize server-client boundary mistakes, module boundary drift, brittle imports, config contract problems, and dangerous coupling.

Scope: JS/TS source files with contents. Config files get +12 ranking boost.

Heuristic checks

| Check | Pattern | Severity | Confidence | Title |
| --- | --- | --- | --- | --- |
| Deep relative import | Regex: 3+ levels of ../ in an import | WARNING | 0.69 | Deep relative import suggests boundary drift |
| Client imports server module | 'use client' + imports fs, path, child_process, next/headers, or next/cookies | FAIL | 0.84 | Client bundle imports a server-only module |
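
The reference does not give the exact expressions, so the regexes below are assumptions that approximate the two descriptions. They only look at `from "..."` specifiers and ignore `require()` and dynamic `import()`.

```typescript
interface ArchFinding {
  severity: "WARNING" | "FAIL";
  confidence: number;
  title: string;
}

// Assumed pattern: an import specifier starting with three or more "../".
const DEEP_RELATIVE_IMPORT = /from\s+['"](?:\.\.\/){3,}/;

// Assumed pattern: an import whose specifier is exactly one of the listed modules.
const SERVER_ONLY_IMPORT =
  /from\s+['"](?:fs|path|child_process|next\/headers|next\/cookies)['"]/;

function architectureHeuristics(source: string): ArchFinding[] {
  const findings: ArchFinding[] = [];
  if (DEEP_RELATIVE_IMPORT.test(source)) {
    findings.push({ severity: "WARNING", confidence: 0.69, title: "Deep relative import suggests boundary drift" });
  }
  const isClient = /['"]use client['"]/.test(source);
  if (isClient && SERVER_ONLY_IMPORT.test(source)) {
    findings.push({ severity: "FAIL", confidence: 0.84, title: "Client bundle imports a server-only module" });
  }
  return findings;
}
```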

Performance

Default model: openai/gpt-5.4

System prompt guidance: Prioritize waterfalls, blocking loops, dropped concurrency, client fetch regressions, and obviously expensive patterns.

Scope: JS/TS source files with contents. Files containing useEffect, Promise.all, or await get +18 ranking boost.

Heuristic checks

| Check | Pattern | Severity | Confidence | Title |
| --- | --- | --- | --- | --- |
| Async forEach | Regex: forEach with async callback | WARNING | 0.81 | Async work inside forEach will not be awaited |
| Await in loop | for(...) block with an await within 240 characters | INFO | 0.63 | Await detected inside a loop body |
| Client fetch in useEffect | 'use client' + useEffect with fetch() nearby | INFO | 0.58 | Client-side fetch introduced in useEffect |
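
A rough sketch of the first two checks. The regexes are assumptions modeled on the descriptions in the table, not the shipped patterns. The useEffect check is omitted because the reference does not define how close "nearby" is.

```typescript
interface PerfFinding {
  severity: "INFO" | "WARNING";
  confidence: number;
  title: string;
}

// Assumed pattern: .forEach( immediately followed by an async callback.
const ASYNC_FOR_EACH = /\.forEach\(\s*async\b/;

// Assumed pattern: a for (...) header followed by "await" within 240 characters.
const AWAIT_IN_LOOP = /\bfor\s*\([^)]*\)[\s\S]{0,240}?\bawait\b/;

function performanceHeuristics(source: string): PerfFinding[] {
  const findings: PerfFinding[] = [];
  if (ASYNC_FOR_EACH.test(source)) {
    findings.push({ severity: "WARNING", confidence: 0.81, title: "Async work inside forEach will not be awaited" });
  }
  if (AWAIT_IN_LOOP.test(source)) {
    findings.push({ severity: "INFO", confidence: 0.63, title: "Await detected inside a loop body" });
  }
  return findings;
}
```

The character window is only a proximity test, so an `await` that follows a short loop can be reported even though it sits outside the loop body. That imprecision fits the INFO severity and low confidence assigned to this check.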

Lead

Lead is the synthesis post-stage, not an executable specialist.

Default model: x-ai/grok-4.20-beta

Role: Rewrite the deterministic lead summary into clean GitHub PR copy. The verdict is locked and cannot be changed by the model.

Deduplication

Findings are fingerprinted using a concern-aware SHA-1 hash that prefers explicit concernKey values and falls back to title plus location. That lets Lead collapse overlapping specialist findings such as repeated "missing tests" observations into one normalized issue. When multiple findings collide, Lead keeps the strongest version and merges supporting specialists, regression scenarios, and concrete checks.
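
One way to picture the fingerprint-and-merge step, using Node's built-in `crypto` module. The field names, the `|` separator, and the rule that "strongest" means the highest severity weight are assumptions for illustration. Merging of regression scenarios and concrete checks is not shown.

```typescript
import { createHash } from "node:crypto";

interface RawFinding {
  specialist: string;
  title: string;
  severityWeight: number; // 1 = INFO, 2 = WARNING, 3 = FAIL
  concernKey?: string;
  location?: string;
}

interface MergedFinding extends RawFinding {
  supportingSpecialists: string[];
}

// Prefer an explicit concernKey; otherwise fall back to title plus location.
function fingerprint(f: RawFinding): string {
  const basis = f.concernKey ?? `${f.title}|${f.location ?? ""}`;
  return createHash("sha1").update(basis).digest("hex");
}

function dedupe(findings: RawFinding[]): MergedFinding[] {
  const byPrint = new Map<string, MergedFinding>();
  for (const f of findings) {
    const key = fingerprint(f);
    const existing = byPrint.get(key);
    if (!existing) {
      byPrint.set(key, { ...f, supportingSpecialists: [f.specialist] });
      continue;
    }
    // Keep the stronger finding, but remember every specialist that raised it.
    const supporters = Array.from(
      new Set([...existing.supportingSpecialists, f.specialist]),
    );
    const strongest = f.severityWeight > existing.severityWeight ? f : existing;
    byPrint.set(key, { ...strongest, supportingSpecialists: supporters });
  }
  return Array.from(byPrint.values());
}
```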

Verdict derivation

| Condition | Verdict |
| --- | --- |
| Any finding's severity weight >= policy strictness threshold | FAIL |
| Findings exist but none meet the threshold | WARN |
| Zero findings | PASS |

Severity weights

| Severity | Weight | GitHub annotation level |
| --- | --- | --- |
| INFO | 1 | notice |
| WARNING | 2 | warning |
| FAIL | 3 | failure |

With the default policy strictness of FAIL (weight 3), only findings with severity FAIL will cause a FAIL verdict. Setting strictness to WARNING makes WARNING findings also trigger a FAIL verdict.
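
The verdict rules and severity weights translate directly into a small function. The names `deriveVerdict` and `SEVERITY_WEIGHT` are hypothetical; the weights, the threshold comparison, and the default strictness come from the tables above.

```typescript
type Severity = "INFO" | "WARNING" | "FAIL";
type Verdict = "PASS" | "WARN" | "FAIL";

const SEVERITY_WEIGHT: Record<Severity, number> = { INFO: 1, WARNING: 2, FAIL: 3 };

function deriveVerdict(severities: Severity[], strictness: Severity = "FAIL"): Verdict {
  if (severities.length === 0) return "PASS"; // zero findings
  const threshold = SEVERITY_WEIGHT[strictness];
  const meetsThreshold = severities.some((s) => SEVERITY_WEIGHT[s] >= threshold);
  return meetsThreshold ? "FAIL" : "WARN";
}
```

With the default strictness, `deriveVerdict(["WARNING"])` gives WARN, while `deriveVerdict(["WARNING"], "WARNING")` gives FAIL.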


File evidence ranking

Before sending files to a specialist, the pipeline scores and ranks them:

  • Base score from additions + deletions
  • Language and file-type bonuses per specialist
  • Content-pattern bonuses (e.g., files with useEffect rank higher for Performance)
  • Top 12 files are sent to each specialist

This ensures specialists focus on the most relevant changes rather than reviewing the entire diff.
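
As a sketch, ranking for the Performance specialist might look like the following. It assumes the bonuses are simply added to the base score, which the reference does not state explicitly. The +18 bonus and the top-12 cutoff come from this page, and the function and field names are hypothetical.

```typescript
interface RankableFile {
  path: string;
  additions: number;
  deletions: number;
  contents?: string;
}

function rankForPerformance(files: RankableFile[], limit = 12): RankableFile[] {
  const score = (f: RankableFile): number => {
    const base = f.additions + f.deletions; // base score from churn
    // Content-pattern bonus for Performance (assumed to be additive).
    const bonus = /useEffect|Promise\.all|\bawait\b/.test(f.contents ?? "") ? 18 : 0;
    return base + bonus;
  };
  // Copy before sorting so the caller's array is not mutated.
  return [...files].sort((a, b) => score(b) - score(a)).slice(0, limit);
}
```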


Model fallback chains

Each role has a fallback chain that is used if its primary model is unavailable:

Lead fallbacks:

  • xiaomi/mimo-v2-pro:nitro
  • minimax/minimax-m2.7:nitro
  • openai/gpt-5.4:nitro
  • anthropic/claude-sonnet-4.6:nitro
  • moonshotai/kimi-k2.5:nitro
  • z-ai/glm-5-turbo

Default specialist fallbacks:

  • minimax/minimax-m2.7:nitro
  • x-ai/grok-4.20-beta
  • xiaomi/mimo-v2-pro:nitro
  • google/gemini-3-flash-preview:nitro
  • qwen/qwen3.5-27b:nitro
  • deepseek/deepseek-v3.2-speciale
  • qwen/qwen3.5-397b-a17b:nitro
  • stepfun/step-3.5-flash:nitro
  • nvidia/nemotron-3-super-120b-a12b:free

Per-kind specialist model overrides are cleared when a policy sets reviewSpecialistModel.
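
A fallback chain can be walked with a simple loop. This is a generic sketch, not Critique's implementation: `callWithFallback` is a hypothetical helper, and it treats any thrown error as "model unavailable", which may be broader than the real criteria.

```typescript
async function callWithFallback<T>(
  models: string[], // primary model first, then its fallbacks in order
  call: (model: string) => Promise<T>,
): Promise<{ model: string; result: T }> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return { model, result: await call(model) };
    } catch (err) {
      lastError = err; // assumption: any failure moves on to the next model
    }
  }
  throw new Error(`All models failed: ${String(lastError)}`);
}
```

For Lead, the list would start with the default model x-ai/grok-4.20-beta, followed by the Lead fallbacks given above.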

Model overrides

Policies can override lead and specialist models at the installation or repository level. See the Policy Fields reference for details.