Specialists
Deep reference for each specialist in the review pipeline — what it checks, heuristic rules, model routing, and severity mapping.
The review pipeline runs four executable specialists in parallel, plus two structural stages (Scout and Lead) that frame the review. Each specialist produces structured findings with severity, confidence, title, summary, and optional file locations.
Execution order
```text
SCOUT → SECURITY | TESTS | ARCHITECTURE | PERFORMANCE → LEAD
```

Scout runs first to build the evidence pack. The four specialists run concurrently. Lead runs last to deduplicate, score, and synthesize.
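The execution order can be sketched as a small async orchestrator. This is a minimal illustration, not the pipeline's code: `runReview`, the `Specialist` callback type, and the one-field `EvidencePack` shape are hypothetical. Scout is awaited alone, the four specialists are started together and joined with `Promise.all`, and Lead receives the combined findings.

```typescript
type SpecialistKind = "security" | "tests" | "architecture" | "performance";

// Hypothetical, heavily simplified shapes; the real EvidencePack carries far more.
interface EvidencePack {
  files: string[];
}
interface Finding {
  specialist: SpecialistKind;
  title: string;
}

type Specialist = (pack: EvidencePack) => Promise<Finding[]>;

async function runReview(
  scout: () => Promise<EvidencePack>,
  specialists: Record<SpecialistKind, Specialist>,
  lead: (findings: Finding[]) => Promise<string>,
): Promise<string> {
  // 1. Scout runs on its own and builds the evidence pack.
  const pack = await scout();
  // 2. All four specialists start at once and are awaited together.
  const kinds = Object.keys(specialists) as SpecialistKind[];
  const perSpecialist = await Promise.all(kinds.map((kind) => specialists[kind](pack)));
  // 3. Lead sees every finding from every specialist.
  return lead(([] as Finding[]).concat(...perSpecialist));
}
```

Because the specialists share one immutable evidence pack and do not depend on each other's output, running them concurrently costs nothing in correctness and bounds latency by the slowest specialist.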
Runtime sources
Each specialist can produce findings from three runtime sources:
| Source | Description |
|---|---|
| `heuristic` | Deterministic pattern matching; no LLM involved |
| `openrouter` | LLM-generated findings via OpenRouter |
| `hybrid` | Heuristic and model findings merged (the default production path) |
In hybrid mode, heuristic findings are passed to the model as context ("Heuristic hints from Critique") so it can confirm, refine, or add to them. The lead stage then deduplicates across both sources.
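A finding carries severity, confidence, title, summary, and optional file locations. The sketch below models that shape and renders heuristic findings as a hint block for the model prompt. Everything beyond those field names is an assumption: the exact types, the `renderHeuristicHints` helper, and the line format are hypothetical; only the "Heuristic hints from Critique" label comes from this page.

```typescript
type Severity = "INFO" | "WARNING" | "FAIL";

// Field names follow the finding description on this page; exact types are assumed.
interface FileLocation {
  path: string;
  line?: number;
}
interface Finding {
  severity: Severity;
  confidence: number; // assumed to be in the range 0..1
  title: string;
  summary: string;
  locations?: FileLocation[];
}

// Hypothetical formatting of heuristic findings as extra context for the model.
function renderHeuristicHints(findings: Finding[]): string {
  if (findings.length === 0) return "";
  const lines = findings.map((f) => {
    const where = (f.locations ?? [])
      .map((loc) => (loc.line === undefined ? loc.path : `${loc.path}:${loc.line}`))
      .join(", ");
    const suffix = where ? ` (${where})` : "";
    return `- [${f.severity} ${f.confidence.toFixed(2)}] ${f.title}${suffix}: ${f.summary}`;
  });
  return ["Heuristic hints from Critique:", ...lines].join("\n");
}
```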
Scout
Scout is the evidence-gathering pre-stage, not an executable specialist.
Focus: Build the EvidencePack from the pull request diff, nearby files, repository guidance, and issue context.
What Scout collects:
- Changed files and patches (up to 40 files)
- File contents when needed for deeper analysis
- Language inference from extensions (`.ts`/`.tsx` = TypeScript, `.js`/`.jsx`/`.mjs`/`.cjs` = JavaScript, etc.)
- Test file classification (paths containing `__tests__/`, `/test/`, or `/tests/`, or ending with `.test.*`/`.spec.*`)
- Config file classification (`package.json`, `tsconfig.json`, `eslint.config.mjs`, `.yml`/`.yaml`, `.github/`)
- Package root detection from `apps/` or `packages/` path segments
- Related test paths by base-name matching
- Risk tags: `auth` (auth/session/permission paths), `billing` (billing/invoice/stripe), `api` (`/api/` paths), `automation` (`.github/` or workflow files)
- Repository guidance documents: `AGENTS.md`, `CLAUDE.md`, `.cursorrules`, `.cursor/rules/review.mdc`, `.cursor/rules/project.mdc`, `.github/copilot-instructions.md`, `docs/architecture.md`, `README.md`
- Linked issue references from PR title/body (up to 5)
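The path rules above can be sketched as small pure functions. This is an illustration under assumptions, not Scout's source: the function names are hypothetical, only the TypeScript and JavaScript extensions listed are handled, and matching risk keywords case-insensitively anywhere in the path is an assumed reading of the rules.

```typescript
type Language = "TypeScript" | "JavaScript" | "Other";
type RiskTag = "auth" | "billing" | "api" | "automation";

function inferLanguage(path: string): Language {
  if (/\.(ts|tsx)$/.test(path)) return "TypeScript";
  if (/\.(js|jsx|mjs|cjs)$/.test(path)) return "JavaScript";
  return "Other"; // the real mapping covers more extensions
}

function isTestFile(path: string): boolean {
  return (
    path.includes("__tests__/") ||
    path.includes("/test/") ||
    path.includes("/tests/") ||
    /\.(test|spec)\.[^/]+$/.test(path) // e.g. login.test.ts, login.spec.tsx
  );
}

function isConfigFile(path: string): boolean {
  const name = path.split("/").pop() ?? path;
  return (
    ["package.json", "tsconfig.json", "eslint.config.mjs"].includes(name) ||
    /\.ya?ml$/.test(path) ||
    path.startsWith(".github/") ||
    path.includes("/.github/")
  );
}

// Assumption: keywords match case-insensitively anywhere in the path.
function riskTags(path: string): RiskTag[] {
  const p = path.toLowerCase();
  const tags: RiskTag[] = [];
  if (/auth|session|permission/.test(p)) tags.push("auth");
  if (/billing|invoice|stripe/.test(p)) tags.push("billing");
  if (p.includes("/api/")) tags.push("api");
  if (p.includes(".github/") || p.includes("workflow")) tags.push("automation");
  return tags;
}
```

Substring matching is cheap and predictable but over-matches (a path containing `author` is tagged `auth`), which is acceptable for a ranking and hinting signal rather than a verdict on its own.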
No model: Scout is purely deterministic.
Security
Default model: anthropic/claude-sonnet-4.6:nitro
System prompt guidance: Prioritize auth, permissions, injection, secret handling, unsafe code execution, and trust-boundary regressions.
Scope: JS/TS source files (not tests, not config). Files with paths containing auth, billing, or /api/ receive a +40 ranking boost.
Heuristic checks
| Check | Pattern | Severity | Confidence | Title |
|---|---|---|---|---|
| `dangerouslySetInnerHTML` | Literal string match in source | WARNING | 0.88 | Client HTML injection surface increased |
| `eval` / `new Function` | Regex `\b(eval\|new Function)\s*\(` | FAIL | 0.96 | Dynamic code execution detected |
| Server env in client | `'use client'` + `process.env.` without `NEXT_PUBLIC` | FAIL | 0.91 | Server-only env access in client component |
| High-risk without tests | Risk tags include auth or billing but no tests touched | WARNING | 0.76 | High-risk code changed without security-oriented test coverage |
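A minimal sketch of how the four Security checks could be evaluated against one file's source. The eval regex, severities, confidences, and titles are copied from the table; `securityHeuristics` and `HeuristicFinding` are hypothetical names, the `NEXT_PUBLIC_` lookahead is an assumed reading of "without NEXT_PUBLIC", and the `riskTags`/`testsTouched` inputs stand in for whatever Scout actually supplies.

```typescript
type Severity = "INFO" | "WARNING" | "FAIL";
interface HeuristicFinding {
  severity: Severity;
  confidence: number;
  title: string;
}

const DYNAMIC_EXEC = /\b(eval|new Function)\s*\(/; // regex from the table
const USE_CLIENT = /['"]use client['"]/;
// Assumed reading: any process.env access whose name is not NEXT_PUBLIC_-prefixed.
const SERVER_ENV = /process\.env\.(?!NEXT_PUBLIC_)[A-Za-z_]/;

function securityHeuristics(source: string, riskTags: string[], testsTouched: boolean): HeuristicFinding[] {
  const out: HeuristicFinding[] = [];
  if (source.includes("dangerouslySetInnerHTML")) {
    out.push({ severity: "WARNING", confidence: 0.88, title: "Client HTML injection surface increased" });
  }
  if (DYNAMIC_EXEC.test(source)) {
    out.push({ severity: "FAIL", confidence: 0.96, title: "Dynamic code execution detected" });
  }
  if (USE_CLIENT.test(source) && SERVER_ENV.test(source)) {
    out.push({ severity: "FAIL", confidence: 0.91, title: "Server-only env access in client component" });
  }
  if ((riskTags.includes("auth") || riskTags.includes("billing")) && !testsTouched) {
    out.push({
      severity: "WARNING",
      confidence: 0.76,
      title: "High-risk code changed without security-oriented test coverage",
    });
  }
  return out;
}
```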
Tests
Default model: openai/gpt-5.4-mini
System prompt guidance: Prioritize missing coverage, broken test intent, missing assertions, sensitive-path changes without tests, and regressions hidden by weak test movement.
Scope: JS/TS source files (not tests, not config). Test files get a +35 ranking boost; files with related test paths get +18.
Heuristic checks
| Check | Condition | Severity | Confidence | Title |
|---|---|---|---|---|
| Source changed, no tests | Source files modified but no test files touched | FAIL if risk tags include auth/billing, else WARNING | 0.79 | Behavior changed without matching tests |
| Sensitive file without related test | Source file path includes auth or billing and no related test path moved | WARNING | 0.72 | No directly related test path moved with a sensitive source file |
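The two Tests checks can be sketched as follows. Severities, confidences, and titles come from the table; the `ChangedFile` shape and `testsHeuristics` name are hypothetical, and "no related test path moved" is read here as "none of the file's related test paths appear among the PR's changed files", which is an assumption.

```typescript
type Severity = "INFO" | "WARNING" | "FAIL";
interface Finding {
  severity: Severity;
  confidence: number;
  title: string;
  path?: string;
}
// Hypothetical per-file evidence, as Scout might classify it.
interface ChangedFile {
  path: string;
  isTest: boolean;
  isConfig: boolean;
  relatedTests: string[];
}

function testsHeuristics(files: ChangedFile[], riskTags: string[]): Finding[] {
  const changed = new Set(files.map((f) => f.path));
  const sources = files.filter((f) => !f.isTest && !f.isConfig);
  const testsTouched = files.some((f) => f.isTest);
  const out: Finding[] = [];

  if (sources.length > 0 && !testsTouched) {
    const highRisk = riskTags.includes("auth") || riskTags.includes("billing");
    out.push({
      severity: highRisk ? "FAIL" : "WARNING", // escalates for auth/billing changes
      confidence: 0.79,
      title: "Behavior changed without matching tests",
    });
  }
  for (const f of sources) {
    const sensitive = /auth|billing/.test(f.path);
    const relatedMoved = f.relatedTests.some((t) => changed.has(t));
    if (sensitive && !relatedMoved) {
      out.push({
        severity: "WARNING",
        confidence: 0.72,
        title: "No directly related test path moved with a sensitive source file",
        path: f.path,
      });
    }
  }
  return out;
}
```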
Architecture
Default model: xiaomi/mimo-v2-pro:nitro
System prompt guidance: Prioritize server-client boundary mistakes, module boundary drift, brittle imports, config contract problems, and dangerous coupling.
Scope: JS/TS source files with contents. Config files get a +12 ranking boost.
Heuristic checks
| Check | Pattern | Severity | Confidence | Title |
|---|---|---|---|---|
| Deep relative import | Regex: 3+ levels of ../ in an import | WARNING | 0.69 | Deep relative import suggests boundary drift |
| Client imports server module | 'use client' + imports fs, path, child_process, next/headers, or next/cookies | FAIL | 0.84 | Client bundle imports a server-only module |
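A minimal sketch of both Architecture checks as regular expressions over file contents. The module list, severities, confidences, and titles come from the table; the regexes themselves and the `architectureHeuristics` name are assumptions. This sketch only inspects `import ... from "..."` statements, so `require()` calls, bare side-effect imports, and `node:`-prefixed specifiers are not covered.

```typescript
type Severity = "INFO" | "WARNING" | "FAIL";
interface HeuristicFinding {
  severity: Severity;
  confidence: number;
  title: string;
}

// Three or more leading "../" segments in an import specifier.
const DEEP_RELATIVE = /from\s+['"](?:\.\.\/){3,}/;
// Server-only modules named in the table, matched as exact specifiers.
const SERVER_ONLY_IMPORT = /from\s+['"](?:fs|path|child_process|next\/headers|next\/cookies)['"]/;
const USE_CLIENT = /['"]use client['"]/;

function architectureHeuristics(source: string): HeuristicFinding[] {
  const out: HeuristicFinding[] = [];
  if (DEEP_RELATIVE.test(source)) {
    out.push({ severity: "WARNING", confidence: 0.69, title: "Deep relative import suggests boundary drift" });
  }
  if (USE_CLIENT.test(source) && SERVER_ONLY_IMPORT.test(source)) {
    out.push({ severity: "FAIL", confidence: 0.84, title: "Client bundle imports a server-only module" });
  }
  return out;
}
```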
Performance
Default model: openai/gpt-5.4
System prompt guidance: Prioritize waterfalls, blocking loops, dropped concurrency, client fetch regressions, and obviously expensive patterns.
Scope: JS/TS source files with contents. Files containing `useEffect`, `Promise.all`, or `await` get a +18 ranking boost.
Heuristic checks
| Check | Pattern | Severity | Confidence | Title |
|---|---|---|---|---|
| Async forEach | Regex: forEach with async callback | WARNING | 0.81 | Async work inside forEach will not be awaited |
| Await in loop | for(...) block + await within 240 chars | INFO | 0.63 | Await detected inside a loop body |
| Client fetch in useEffect | 'use client' + useEffect with fetch() nearby | INFO | 0.58 | Client-side fetch introduced in useEffect |
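The Performance checks are windowed pattern matches. In this sketch, the 240-character window for "await in loop" comes from the table; applying the same window to "fetch nearby" is an assumption because the table does not quantify "nearby". The regexes and function names are hypothetical.

```typescript
type Severity = "INFO" | "WARNING" | "FAIL";
interface HeuristicFinding {
  severity: Severity;
  confidence: number;
  title: string;
}

const ASYNC_FOREACH = /\.forEach\(\s*async\b/;
const USE_CLIENT = /['"]use client['"]/;

// True if `needle` matches within `window` chars after any match of `start`.
function withinWindow(source: string, start: RegExp, needle: RegExp, window: number): boolean {
  const re = new RegExp(start.source, "g"); // fresh global copy so lastIndex is not shared
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    if (needle.test(source.slice(m.index, m.index + window))) return true;
  }
  return false;
}

function performanceHeuristics(source: string): HeuristicFinding[] {
  const out: HeuristicFinding[] = [];
  if (ASYNC_FOREACH.test(source)) {
    out.push({ severity: "WARNING", confidence: 0.81, title: "Async work inside forEach will not be awaited" });
  }
  // `for (` followed by `await` within 240 characters.
  if (withinWindow(source, /\bfor\s*\(/, /\bawait\b/, 240)) {
    out.push({ severity: "INFO", confidence: 0.63, title: "Await detected inside a loop body" });
  }
  // Assumed window: "nearby" is treated as the same 240 characters.
  if (USE_CLIENT.test(source) && withinWindow(source, /\buseEffect\s*\(/, /\bfetch\s*\(/, 240)) {
    out.push({ severity: "INFO", confidence: 0.58, title: "Client-side fetch introduced in useEffect" });
  }
  return out;
}
```

A character window is a cheap stand-in for real scope analysis: it can miss an `await` deep inside a long loop body and can flag an `await` that sits just after a short loop, which is consistent with the low confidences these checks carry.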
Lead
Lead is the synthesis post-stage, not an executable specialist.
Default model: x-ai/grok-4.20-beta
Role: Rewrite the deterministic lead summary into clean GitHub PR copy. The verdict is locked and cannot be changed by the model.
Deduplication
Findings are fingerprinted using a concern-aware SHA-1 hash that prefers explicit concernKey values and falls back to title plus location. That lets Lead collapse overlapping specialist findings such as repeated "missing tests" observations into one normalized issue. When multiple findings collide, Lead keeps the strongest version and merges supporting specialists, regression scenarios, and concrete checks.
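Concern-aware fingerprinting and collapsing can be sketched with Node's `crypto.createHash("sha1")`. Only `concernKey`, the SHA-1 choice, and the "title plus location" fallback come from this page. The `Finding` shape, the trimmed and lowercased title, the exact string that is hashed, and the rule "strongest = higher severity weight, then higher confidence" are assumptions. For brevity only supporting specialists are merged; regression scenarios and concrete checks would be merged the same way.

```typescript
import { createHash } from "node:crypto";

type Severity = "INFO" | "WARNING" | "FAIL";
const WEIGHT: Record<Severity, number> = { INFO: 1, WARNING: 2, FAIL: 3 };

interface Finding {
  specialist: string;
  severity: Severity;
  confidence: number;
  title: string;
  concernKey?: string;
  path?: string;
  line?: number;
}
interface MergedFinding extends Finding {
  specialists: string[];
}

function fingerprint(f: Finding): string {
  // Prefer the explicit concern key; otherwise fall back to title + location.
  const basis = f.concernKey
    ? `concern:${f.concernKey}`
    : `title:${f.title.trim().toLowerCase()}|${f.path ?? ""}:${f.line ?? ""}`;
  return createHash("sha1").update(basis).digest("hex");
}

// Assumed ordering: higher severity wins; confidence breaks ties.
function isStronger(a: Finding, b: Finding): boolean {
  if (WEIGHT[a.severity] !== WEIGHT[b.severity]) return WEIGHT[a.severity] > WEIGHT[b.severity];
  return a.confidence > b.confidence;
}

function dedupe(findings: Finding[]): MergedFinding[] {
  const byKey = new Map<string, MergedFinding>();
  for (const f of findings) {
    const key = fingerprint(f);
    const existing = byKey.get(key);
    if (!existing) {
      byKey.set(key, { ...f, specialists: [f.specialist] });
      continue;
    }
    const specialists = existing.specialists.includes(f.specialist)
      ? existing.specialists
      : [...existing.specialists, f.specialist];
    // Keep the stronger body, but always carry the merged specialist list.
    byKey.set(key, isStronger(f, existing) ? { ...f, specialists } : { ...existing, specialists });
  }
  return Array.from(byKey.values());
}
```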
Verdict derivation
| Condition | Verdict |
|---|---|
| Any finding severity weight >= policy strictness threshold | FAIL |
| Findings exist but none meet threshold | WARN |
| Zero findings | PASS |
Severity weights
| Severity | Weight | GitHub annotation level |
|---|---|---|
| INFO | 1 | `notice` |
| WARNING | 2 | `warning` |
| FAIL | 3 | `failure` |
With the default policy strictness of FAIL (weight 3), only findings with severity FAIL will cause a FAIL verdict. Setting strictness to WARNING makes WARNING findings also trigger a FAIL verdict.
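The verdict table and severity weights combine into a few lines. This sketch follows both tables on this page; `deriveVerdict` and the constant names are hypothetical, and `strictness` is modeled as a severity whose weight is the threshold.

```typescript
type Severity = "INFO" | "WARNING" | "FAIL";
type Verdict = "PASS" | "WARN" | "FAIL";
type AnnotationLevel = "notice" | "warning" | "failure";

const SEVERITY_WEIGHT: Record<Severity, number> = { INFO: 1, WARNING: 2, FAIL: 3 };
const ANNOTATION_LEVEL: Record<Severity, AnnotationLevel> = {
  INFO: "notice",
  WARNING: "warning",
  FAIL: "failure",
};

// strictness defaults to FAIL (weight 3), matching the default policy.
function deriveVerdict(severities: Severity[], strictness: Severity = "FAIL"): Verdict {
  if (severities.length === 0) return "PASS";
  const threshold = SEVERITY_WEIGHT[strictness];
  return severities.some((s) => SEVERITY_WEIGHT[s] >= threshold) ? "FAIL" : "WARN";
}
```

For example, a single WARNING finding yields `WARN` under the default strictness and `FAIL` once strictness is lowered to WARNING.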
File evidence ranking
Before sending files to a specialist, the pipeline ranks files by a scoring system:
- Base score from `additions + deletions`
- Language and file-type bonuses per specialist
- Content-pattern bonuses (e.g., files with `useEffect` rank higher for Performance)
- Top 12 files are sent to each specialist
This ensures specialists focus on the most relevant changes rather than reviewing the entire diff.
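The ranking can be sketched as score, sort, and slice. The bonus values are taken from each specialist's Scope note on this page; the `FileEvidence` shape and `rankFiles` name are hypothetical, and the language bonuses are omitted because their values are not documented here.

```typescript
type SpecialistKind = "security" | "tests" | "architecture" | "performance";

// Hypothetical per-file evidence; field names are illustrative.
interface FileEvidence {
  path: string;
  additions: number;
  deletions: number;
  contents?: string;
  isTest: boolean;
  isConfig: boolean;
  hasRelatedTests: boolean;
}

// Bonuses from the per-specialist "Scope" notes; language bonuses omitted.
const BONUS: Record<SpecialistKind, (f: FileEvidence) => number> = {
  security: (f) => (/auth|billing|\/api\//.test(f.path) ? 40 : 0),
  tests: (f) => (f.isTest ? 35 : 0) + (f.hasRelatedTests ? 18 : 0),
  architecture: (f) => (f.isConfig ? 12 : 0),
  performance: (f) => (/useEffect|Promise\.all|await/.test(f.contents ?? "") ? 18 : 0),
};

function rankFiles(kind: SpecialistKind, files: FileEvidence[], limit = 12): FileEvidence[] {
  return files
    .map((file) => ({ file, score: file.additions + file.deletions + BONUS[kind](file) }))
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, limit)
    .map((entry) => entry.file);
}
```

Because the base score is raw churn, a large unrelated change can still outrank a small sensitive one; the bonuses shift the order per specialist without filtering anything out.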
Model fallback chains
If the primary model is unavailable, each role has a fallback chain:
Lead fallbacks: xiaomi/mimo-v2-pro:nitro → minimax/minimax-m2.7:nitro → openai/gpt-5.4:nitro → anthropic/claude-sonnet-4.6:nitro → moonshotai/kimi-k2.5:nitro → z-ai/glm-5-turbo
Default specialist fallbacks: minimax/minimax-m2.7:nitro → x-ai/grok-4.20-beta → xiaomi/mimo-v2-pro:nitro → google/gemini-3-flash-preview:nitro → qwen/qwen3.5-27b:nitro → deepseek/deepseek-v3.2-speciale → qwen/qwen3.5-397b-a17b:nitro → stepfun/step-3.5-flash:nitro → nvidia/nemotron-3-super-120b-a12b:free
Per-kind specialist model overrides are cleared when a policy sets reviewSpecialistModel.
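Model selection for a specialist can be sketched as "pick the primary, then walk the fallback chain". The default models and fallback list are copied from this page. Several parts are assumptions: the `specialistModels` policy field name is hypothetical, `isAvailable` stands in for however unavailability is actually detected (for example, a failed OpenRouter call), and the sketch assumes the fallback chain also applies when a policy override is set.

```typescript
type SpecialistKind = "security" | "tests" | "architecture" | "performance";

const DEFAULT_SPECIALIST_MODEL: Record<SpecialistKind, string> = {
  security: "anthropic/claude-sonnet-4.6:nitro",
  tests: "openai/gpt-5.4-mini",
  architecture: "xiaomi/mimo-v2-pro:nitro",
  performance: "openai/gpt-5.4",
};

const SPECIALIST_FALLBACKS = [
  "minimax/minimax-m2.7:nitro",
  "x-ai/grok-4.20-beta",
  "xiaomi/mimo-v2-pro:nitro",
  "google/gemini-3-flash-preview:nitro",
  "qwen/qwen3.5-27b:nitro",
  "deepseek/deepseek-v3.2-speciale",
  "qwen/qwen3.5-397b-a17b:nitro",
  "stepfun/step-3.5-flash:nitro",
  "nvidia/nemotron-3-super-120b-a12b:free",
];

interface Policy {
  reviewSpecialistModel?: string;
  specialistModels?: Partial<Record<SpecialistKind, string>>; // hypothetical field name
}

function primaryModel(kind: SpecialistKind, policy: Policy): string {
  // Setting reviewSpecialistModel clears per-kind overrides, so it wins outright.
  if (policy.reviewSpecialistModel) return policy.reviewSpecialistModel;
  return policy.specialistModels?.[kind] ?? DEFAULT_SPECIALIST_MODEL[kind];
}

// Walk primary + fallbacks in order, skipping duplicates; null if none is available.
function pickModel(kind: SpecialistKind, policy: Policy, isAvailable: (model: string) => boolean): string | null {
  const seen = new Set<string>();
  for (const model of [primaryModel(kind, policy), ...SPECIALIST_FALLBACKS]) {
    if (seen.has(model)) continue;
    seen.add(model);
    if (isAvailable(model)) return model;
  }
  return null;
}
```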
Model overrides
Policies can override lead and specialist models at the installation or repository level. See the Policy Fields reference for details.