
PR Review

How Critique's two-stage sandbox-backed pipeline analyses pull requests, assigns verdicts, and posts findings back to GitHub.

Critique attaches to your GitHub repositories as a GitHub App. When a pull request is opened or updated, the review pipeline runs automatically and posts structured feedback directly to the PR as a check run, inline annotations, and a summary comment.

How a review is triggered

Critique subscribes to pull_request webhooks from GitHub. When a PR is opened, synchronized (new commits pushed), or reopened, GitHub delivers a webhook to Critique's ingestion endpoint. The system:

  1. Verifies the webhook signature using GITHUB_WEBHOOK_SECRET.
  2. Deduplicates the delivery by checking the X-GitHub-Delivery header.
  3. Queues the event asynchronously via QStash so the webhook response returns 202 Accepted immediately — no timeout risk.
  4. A background delivery worker picks up the queued message and emits review/run.requested.
  5. The review worker runs the two-stage pipeline (analysis → final review).
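Step 1 above uses GitHub's standard HMAC scheme: GitHub signs the raw request body with the shared webhook secret and sends the result in the `X-Hub-Signature-256` header. A minimal sketch of that verification (the function name is illustrative; the signature scheme itself is GitHub's documented behaviour):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify GitHub's X-Hub-Signature-256 header against the raw request body.
// `secret` is the value configured as GITHUB_WEBHOOK_SECRET on the app.
function verifySignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so guard first; the
  // constant-time compare avoids leaking the signature via timing.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Comparing in constant time matters here: a naive `===` on the hex strings would let an attacker probe the signature byte by byte.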

Two-stage worker graph

The review pipeline is a two-stage worker graph, not a single backend pass.

GitHub webhook → QStash → analysis worker → persisted artifact → final review worker → GitHub publication

Stage 1 — Analysis worker (review/analysis.requested):

The analysis worker opens the check run, clears any stale findings from a previous run, runs evidence collection, stores the analysis artifact to the database, then re-queues review/run.requested with resumeFromAnalysis: true.

Stage 2 — Final review worker (review/run.requested with analysis resumed):

Reads the persisted analysis artifact and runs the specialist and lead synthesis path. If a sandbox-native final artifact is available, it is consumed directly. Otherwise the backend synthesis path runs (heuristic specialists → OpenRouter specialist passes → drill-down → cross-file analysis → lead rewrite).
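The two stages above can be sketched as a single handler keyed on the `resumeFromAnalysis` flag. Everything here is illustrative (the message shape, the in-memory stores standing in for the database and QStash), not Critique's actual internals:

```typescript
// Hypothetical message shape for review/run.requested.
type ReviewRunMessage = { prId: string; resumeFromAnalysis?: boolean };

const artifacts = new Map<string, { evidence: string }>(); // stands in for the DB
const queue: ReviewRunMessage[] = [];                        // stands in for QStash

async function runAnalysis(prId: string): Promise<void> {
  // Stage 1 would run evidence collection here; we persist a stub artifact.
  artifacts.set(prId, { evidence: `evidence for ${prId}` });
}

async function handleReviewRun(msg: ReviewRunMessage): Promise<string> {
  if (!msg.resumeFromAnalysis) {
    // Stage 1: collect evidence, persist the artifact, then re-queue the
    // same event with the resume flag set.
    await runAnalysis(msg.prId);
    queue.push({ ...msg, resumeFromAnalysis: true });
    return "analysis-complete";
  }
  // Stage 2: read the persisted artifact and synthesize the final review.
  const artifact = artifacts.get(msg.prId);
  if (!artifact) throw new Error("analysis artifact missing");
  return `final review from ${artifact.evidence}`;
}
```

Splitting the run this way means each stage stays short-lived and retryable: if the final review worker crashes, the expensive evidence collection is not repeated.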

Sandbox-backed analysis (primary path)

When sandbox mode is enabled, the analysis worker:

  1. Creates an E2B sandbox and clones the PR head into it.
  2. Computes the diff and repo guidance inside the sandbox.
  3. Runs deterministic collectors and emits phase events.
  4. Persists the EvidencePack analysis metadata to the database.
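The steps above can be sketched against a command-runner abstraction that stands in for the E2B sandbox's shell execution. All names here are illustrative, and the "deterministic collector" shown is just one trivial example (listing changed files from the diff):

```typescript
// Stand-in for executing a shell command inside the sandbox.
type CommandRunner = (cmd: string) => Promise<{ stdout: string }>;

async function collectSandboxEvidence(
  run: CommandRunner,
  cloneUrl: string,
  headSha: string
): Promise<{ diff: string; changedFiles: string[] }> {
  // 1. Clone the PR head into the sandbox workspace (paths illustrative).
  await run(`git clone --depth 50 ${cloneUrl} /home/user/repo`);
  await run(`git -C /home/user/repo checkout ${headSha}`);
  // 2. Compute the diff inside the sandbox.
  const diff = (await run(`git -C /home/user/repo diff origin/main...HEAD`)).stdout;
  // 3. A deterministic collector: derive the changed-file list from the diff.
  const changedFiles = diff
    .split("\n")
    .filter((line) => line.startsWith("+++ b/"))
    .map((line) => line.slice("+++ b/".length));
  return { diff, changedFiles };
}
```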

If sandbox analysis fails or is disabled, the system falls back to the GitHub API scout path — fetching the diff and related files via the GitHub REST API — so review always completes.

Sandbox-native final artifact (when available)

When the sandbox analysis path succeeds, the final review worker can hand off to the sandbox-native execution path, which runs OpenCode inside the E2B sandbox. OpenCode writes the final structured review output to /tmp/critique-review-output.json; the pipeline reads that artifact and persists a sandboxReviewArtifact alongside the standard lead summary. If the sandbox-native output is missing or invalid, the backend synthesis path runs instead.
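Consuming that artifact defensively might look like the sketch below. Only the output path comes from the docs; the parsed review shape and validation rule are assumptions:

```typescript
import { readFile } from "node:fs/promises";

// Assumed artifact shape; only `verdict` is checked here.
type SandboxReview = { verdict: "PASS" | "WARN" | "FAIL"; findings: unknown[] };

async function readSandboxArtifact(
  path = "/tmp/critique-review-output.json"
): Promise<SandboxReview | null> {
  try {
    const raw = await readFile(path, "utf8");
    const parsed = JSON.parse(raw);
    // Treat anything without a recognizable verdict as invalid, so the
    // caller falls through to the backend synthesis path.
    if (parsed && ["PASS", "WARN", "FAIL"].includes(parsed.verdict)) {
      return parsed as SandboxReview;
    }
    return null;
  } catch {
    return null; // missing or malformed artifact → fall back
  }
}
```

Returning `null` rather than throwing keeps the fallback decision in one place: any failure mode (file absent, bad JSON, wrong shape) routes to the same backend synthesis path.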

Pipeline stages (logical model)

SCOUT → SECURITY | TESTS | ARCHITECTURE | PERFORMANCE → REVIEW OUTPUT

Scout runs first. It fetches the PR diff, reads nearby repository files, infers languages and risk tags (auth, billing, API), and assembles an Evidence Pack — the shared context that every downstream specialist reads. Sandbox-backed analysis is attempted first; the GitHub API scout is the fallback.
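A hypothetical shape for the Evidence Pack, with one illustrative risk-tag inference pass. The field names and the path patterns are inferred from the description above, not taken from Critique's code:

```typescript
interface EvidencePack {
  diff: string;                           // unified diff of the PR
  nearbyFiles: Record<string, string>;    // related repo files keyed by path
  languages: string[];                    // inferred languages, e.g. ["typescript"]
  riskTags: ("auth" | "billing" | "api")[]; // inferred risk areas
}

// Illustrative deterministic pass: tag risk areas from changed file paths.
function inferRiskTags(paths: string[]): EvidencePack["riskTags"] {
  const tags = new Set<EvidencePack["riskTags"][number]>();
  for (const p of paths) {
    if (/auth|login|session/i.test(p)) tags.add("auth");
    if (/billing|invoice|payment/i.test(p)) tags.add("billing");
    if (/\/api\//i.test(p)) tags.add("api");
  }
  return [...tags];
}
```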

Specialists run in parallel. Each focuses on a narrow domain:

| Specialist   | Default model                     | Focus                                                        |
| ------------ | --------------------------------- | ------------------------------------------------------------ |
| Security     | anthropic/claude-sonnet-4.6:nitro | Auth, injection, secret handling, trust-boundary regressions |
| Tests        | openai/gpt-5.4-mini               | Coverage gaps, broken intent, missing assertions             |
| Architecture | xiaomi/mimo-v2-pro:nitro          | Server/client boundaries, module drift, brittle imports      |
| Performance  | openai/gpt-5.4                    | Waterfalls, blocking loops, dropped concurrency              |

Each specialist also runs a deterministic heuristic pass (no LLM) before calling the model, feeding those signals as structured hints to give the model additional grounding.
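The heuristics-then-model ordering can be sketched as follows. The specialist names come from the table above; the interfaces and everything else are illustrative:

```typescript
type Finding = { specialist: string; severity: "WARN" | "FAIL"; message: string };

type Specialist = {
  name: string;
  heuristics: (diff: string) => string[];              // deterministic pass, no LLM
  review: (diff: string, hints: string[]) => Promise<Finding[]>; // model call
};

async function runSpecialists(diff: string, specialists: Specialist[]): Promise<Finding[]> {
  const results = await Promise.all(
    specialists.map(async (s) => {
      const hints = s.heuristics(diff); // structured signals computed first
      return s.review(diff, hints);     // model call grounded by those hints
    })
  );
  return results.flat();
}
```

Because the heuristic pass is deterministic and cheap, it runs unconditionally; the model call then receives its output as structured hints rather than having to rediscover the same signals.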

Review output is finalized last. When sandbox-native artifact generation succeeds, the final structured review is authored inside the sandbox. Otherwise, the backend synthesizes findings from specialist payloads, deduplicates overlapping findings, and locks the verdict. The verdict is derived purely from finding severities and the configured policy strictness — not from model opinion alone.

Verdict levels

| Verdict | Meaning                                                |
| ------- | ------------------------------------------------------ |
| PASS    | No findings, or no findings above threshold            |
| WARN    | Findings exist but none meet the fail threshold        |
| FAIL    | One or more findings meet or exceed policy strictness  |

The default strictness is FAIL, meaning only findings with severity FAIL block the merge gate. Setting strictness to WARNING makes warnings also trigger a block. See Policy Fields for configuration details.
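Because the verdict is a pure function of severities and strictness, it can be sketched in a few lines. The severity names (`WARNING`, `FAIL`) follow the policy values mentioned above; the function itself is illustrative:

```typescript
type Severity = "WARNING" | "FAIL";
type Verdict = "PASS" | "WARN" | "FAIL";

function deriveVerdict(severities: Severity[], strictness: Severity = "FAIL"): Verdict {
  if (severities.length === 0) return "PASS";
  const rank = (s: Severity) => (s === "FAIL" ? 2 : 1);
  const worst = Math.max(...severities.map(rank));
  // A finding blocks when its severity meets or exceeds the strictness.
  return worst >= rank(strictness) ? "FAIL" : "WARN";
}
```

With the default `FAIL` strictness, a PR full of warnings still merges (verdict `WARN`); lowering strictness to `WARNING` turns any finding into a blocking `FAIL`.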

GitHub output

Critique posts results back to GitHub in three forms:

  • Check run — appears in the PR's "Checks" tab with Pass / Warn / Fail status. When configured as a required status check in branch protection rules, a FAIL verdict blocks merging.
  • Review comment — a summary comment on the PR thread with the full verdict, key findings, and a link to the canonical review page in the Critique dashboard.
  • Inline annotations — file-level and line-level annotations for findings that map to a specific location in the diff.
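The check-run output maps naturally onto GitHub's Checks API (`octokit.rest.checks.create`). The payload fields below follow GitHub's REST schema; the verdict-to-conclusion mapping and the check name are assumptions:

```typescript
type Verdict = "PASS" | "WARN" | "FAIL";

// Build the payload that would be passed to octokit.rest.checks.create.
function buildCheckRunPayload(verdict: Verdict, headSha: string, summary: string) {
  const conclusion =
    verdict === "PASS" ? "success" : verdict === "WARN" ? "neutral" : "failure";
  return {
    name: "critique/review",       // illustrative check name
    head_sha: headSha,
    status: "completed" as const,
    conclusion,
    output: { title: `Critique: ${verdict}`, summary },
  };
}
```

A `failure` conclusion is what branch protection keys on: when the check is marked required, GitHub blocks the merge button until a later run concludes `success` or `neutral`.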

Customising the pipeline

You can override models, strictness, and scope at the installation or repository level. See Policy Fields for the full list of configurable options.

Re-triggering a review

To manually re-run a review on an existing PR, type @critique /review in any PR comment. The bot will queue a fresh pipeline run and respond when it is complete.

Credit consumption

Each pipeline run consumes credits proportional to the token usage of the models selected. Scout and Specialists use cost-efficient models by default; the final review synthesis uses a higher-reasoning model. The creditFloor mechanism on each model prevents a single large PR from draining your remaining balance unexpectedly. See Billing & Credits for details.
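One way to picture the creditFloor guard: skip a model call whenever the estimated cost would push the balance below that model's floor. The formula and names here are assumptions for illustration, not Critique's billing code:

```typescript
type ModelBudget = { creditFloor: number };

// Returns true when the call can proceed without dipping below the floor.
function canAfford(balance: number, estimatedCost: number, model: ModelBudget): boolean {
  return balance - estimatedCost >= model.creditFloor;
}
```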