
Critique is now conversational

Tag @critique in any GitHub PR comment to ask questions, trigger re-reviews, request fixes, and get security scans — all inside the thread. Powered by Qwen 3.6 Plus with a 1M-token context window.

The problem with one-way reviews

Automated reviews that go nowhere

Every AI code review tool that came before worked the same way: the bot posts a block of comments when the PR opens, and then it goes silent. You get a report. You read it, or you do not. If you have a follow-up question, you are on your own.

That is a fundamentally broken model for code review. Real review is a conversation. A good senior engineer does not just drop a list of findings and walk away — they discuss the intent behind the change, ask about the edge cases you considered, explain the tradeoffs in a proposed fix, and iterate with you until the PR is in good shape. AI-assisted review should work the same way.

The static report format also misses an important reality: different people need different things from a review. The author wants to understand why a finding was flagged and how to fix it. The reviewer wants to understand what the change actually does. A team lead wants to know whether this is safe to merge. One auto-generated block of comments cannot serve all three people — it is addressed to nobody in particular and optimized for none of them.

What changes today

Tag @critique and start a conversation

Starting today, Critique is conversational. Any team member — author, reviewer, or observer — can tag @critique in a GitHub PR comment and get a context-aware response back in the thread. The bot reads the entire pull request diff, the surrounding repository files, and the existing comment history before it replies. It is not guessing. It is analyzing the actual context of your specific PR.

You can ask it anything in plain language. Why was this pattern chosen over the alternative? What does this function actually do? Is there a simpler way to write this logic? You can also use slash commands for specific, targeted operations — re-running the full review pipeline, requesting a security scan, getting test suggestions, or asking for a concrete code fix.

The commands

Eight ways to get what you need

The interactive chat supports both free-form questions and structured slash commands. Free-form questions are for the moments when you need to think out loud with the code. Slash commands are for when you know exactly what operation you want to run.

Supported @critique commands

Use these in any GitHub PR comment — inline or general

  • @critique <question> · Free-form Q&A about the PR, the code, or the context. Best used when you need to understand intent, tradeoffs, or behavior.
  • @critique /review · Re-runs the full multi-agent review pipeline from scratch. Best used when the PR has been updated since the last review ran.
  • @critique /explain · Breaks down complex changes or specific lines of code. Best used when a diff is hard to follow or you're onboarding someone new.
  • @critique /fix · Suggests a concrete, implementable code fix for the current issue. Best used when a finding has been flagged and you want the actual patch.
  • @critique /improve · Proposes architectural or code-quality improvements beyond the current PR scope. Best used when you want to think about the broader design, not just the diff.
  • @critique /security · Runs a focused, on-demand security analysis of the changed code. Best used when security was not the primary focus of the original review but you're unsure.
  • @critique /tests · Identifies missing test cases and provides concrete examples. Best used when coverage looks thin or edge cases are untested in the diff.
  • @critique /help · Lists all available commands with brief descriptions. Best used when you or a new team member want a quick reference.

Commands work in both inline PR review comments and general PR-level comments. Critique reads the full thread context before responding to any message.
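To make the mention format concrete, here is a minimal sketch of how a bot might parse these comments into a slash command or a free-form question. The command names come from the table above; the parsing logic itself is an assumption for illustration, not Critique's actual implementation.

```python
import re

# Slash commands from the table above; anything else after the
# mention is treated as a free-form question.
SLASH_COMMANDS = {"review", "explain", "fix", "improve",
                  "security", "tests", "help"}

def parse_mention(comment_body: str):
    """Return ("command", name, args), ("question", text), or None."""
    match = re.search(r"@critique\b\s*(.*)", comment_body, re.DOTALL)
    if match is None:
        return None  # the bot was not mentioned at all
    rest = match.group(1).strip()
    if rest.startswith("/"):
        name, _, args = rest[1:].partition(" ")
        if name in SLASH_COMMANDS:
            return ("command", name, args.strip())
    return ("question", rest)
```

A comment like "@critique /fix — apply the change" would parse as the /fix command, while "@critique why was this flagged?" falls through to free-form Q&A.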

How context works

What Critique actually reads before it replies

When you tag @critique, the bot does not just read your comment and your comment alone. It builds a full context package before constructing a response. That package includes the complete PR diff, the metadata for every file touched by the PR, the repository's relevant surrounding files, the title and description of the PR, and the entire existing comment thread in the order it happened.

This is what makes the responses feel substantive rather than generic. When you ask "is this the right approach for handling auth tokens?" the answer is informed by the actual auth middleware in your codebase, the pattern the rest of your team is already using, and the specific way the PR deviates from or extends it. Critique is not reasoning about code in the abstract. It is reasoning about your code.
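The context package described above can be pictured as a simple record that gets flattened into one prompt. The field names and the flattening order here are illustrative, not Critique's internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class ContextPackage:
    """Everything read before replying (illustrative field names)."""
    diff: str                           # the complete PR diff
    file_metadata: list[dict]           # metadata for every file touched
    surrounding_files: dict[str, str]   # path -> content of relevant repo files
    title: str                          # PR title
    description: str                    # PR description
    thread: list[dict] = field(default_factory=list)  # comments, oldest first

    def to_prompt(self) -> str:
        """Flatten the package into one prompt, thread in posted order."""
        comments = "\n".join(c["body"] for c in self.thread)
        return "\n\n".join([self.title, self.description, self.diff, comments])
```

Keeping the thread in posted order matters: the model sees the conversation the same way a human reviewer scrolling the PR would.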

What happens when you tag @critique
Comment posted → Webhook received → Context assembled (diff + files + thread) → Qwen 3.6 Plus called via OpenRouter → Response posted to GitHub thread
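That flow could be wired together roughly as below. Every callable here is a hypothetical stand-in, injected so the flow can be exercised with stubs; in production they would hit the GitHub and OpenRouter APIs.

```python
def handle_webhook(event: dict,
                   assemble_context,
                   call_model,
                   post_reply):
    """Run the pipeline: comment -> context -> model -> reply.

    `assemble_context`, `call_model`, and `post_reply` are injected
    dependencies (hypothetical names, not Critique's real internals).
    """
    body = event.get("comment", {}).get("body", "")
    if "@critique" not in body:
        return None                    # not addressed to the bot: ignore
    context = assemble_context(event)  # diff + files + thread
    answer = call_model(context, body) # Qwen 3.6 Plus via OpenRouter
    post_reply(event, answer)          # back into the GitHub thread
    return answer
```

Dependency injection keeps the dispatch logic testable without network access, which is one plausible way to structure a bot like this.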

The model

Why Qwen 3.6 Plus

Choosing the right model for interactive PR chat is not the same decision as choosing the right model for a batch review pipeline. Batch review can afford latency. It runs in the background while your CI checks complete. Interactive chat has to feel like talking to an engineer — which means responses need to arrive in seconds, not minutes, and the model needs to hold genuinely large amounts of context without losing coherence deep in a thread.

Qwen 3.6 Plus was the clear choice on all three dimensions that matter here: context capacity, benchmark performance, and speed.

One million tokens of context

A 1-million-token context window sounds like a spec sheet number, but it has a concrete practical meaning for PR chat. A large PR can touch dozens of files. A monorepo with deep dependency chains may require reading many files beyond the diff itself to understand the change properly. A long-running PR might accumulate hundreds of comments over days or weeks of discussion. All of that needs to fit in the model's working memory simultaneously.

With 1M tokens, Critique can include the complete diff, the full content of every relevant file, the entire comment history, and still have room to reason. There is no truncation, no summarization of the thread, no soft-cutting of file contents at an arbitrary character limit. The model sees everything, and its answers reflect that.
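A rough way to sanity-check that an assembled context fits the window is a character-based token estimate. The ~4-characters-per-token figure is a common heuristic for English text and code, not the model's actual tokenizer, and the reserve size is an assumption.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, per the Qwen 3.6 Plus spec above

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token."""
    return len(text) // 4 + 1

def fits_in_window(parts: list[str], reserve: int = 8_000) -> bool:
    """Check the combined context fits, leaving room for the reply."""
    total = sum(estimate_tokens(p) for p in parts)
    return total + reserve <= CONTEXT_WINDOW
```

At this scale even a very large PR (say, a 2 MB diff plus a long thread) sits comfortably under the limit, which is why no truncation is needed.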

Benchmark performance

Qwen 3.6 Plus is not just a large-context model that handles volume — it is a frontier-level coding and reasoning model that has been measured against the best alternatives available. The benchmark results are striking, and they explain why we chose it over the other models we evaluated.

  • 78.8% · SWE-bench Verified: real-world bug fixing (vs 76.2% for Qwen 3.5)
  • 61.6% · Terminal-Bench 2.0: beats Opus 4.5, Kimi-K2.5, and GLM5
  • 41.5% · DeepPlanning long-horizon: dominates GPT-5.1-Codex, GPT-5.2, GPT-5.3-Codex
  • 94.3% · IFEval instruction following: delivers exactly what you ask for

The SWE-bench Verified score of 78.8% represents a meaningful improvement over the Qwen 3.5 series and puts it in genuinely competitive territory with frontier models. SWE-bench tests the model's ability to fix real bugs in real open-source codebases — the closest available proxy for what the model actually needs to do when you ask it to "/fix" something in your PR.

The DeepPlanning long-horizon score is especially relevant for PR chat. Long-horizon reasoning tests measure how well a model maintains coherent reasoning across a complex task with many steps — exactly what happens when a reviewer and author exchange a long sequence of questions and follow-ups across a multi-file PR. At 41.5%, Qwen 3.6 Plus significantly outperforms GPT-5.1-Codex, GPT-5.2, and GPT-5.3-Codex in this dimension. For PR conversations that run long and get complicated, this matters.

Speed

Qwen 3.6 Plus is exceptionally fast. In our testing, responses to typical PR chat messages — a question about a function, a /fix request for a specific finding — stream back within 3–6 seconds of the webhook being processed. That is fast enough to feel like an actual conversation rather than a polling loop.

This matters more than it might seem. A review tool that takes two minutes to respond to a follow-up question is not a conversational tool — it is a slower version of the original static report. Speed is what makes the interaction feel like collaboration rather than waiting.

Real workflows

What this looks like in practice

Here are the patterns we have seen teams reach for most in the first days of using this feature.

Author-side use cases
  • "@critique why did you flag this pattern as a security issue?"
  • "@critique /fix — apply the suggested change to the highlighted block"
  • "@critique is there a cleaner way to write this that avoids the edge case you flagged?"
  • "@critique /tests — what coverage am I missing for this new function?"
  • "@critique /explain — help me write the PR description for this change"
Reviewer-side use cases
  • "@critique what does this middleware actually do before it calls next()?"
  • "@critique /security — scan this auth change independently"
  • "@critique does this change to the API response schema break any callers?"
  • "@critique /review — the author just pushed a fixup commit, re-run the pipeline"
  • "@critique is the test coverage adequate for a change of this risk level?"

The thread as a record

One underrated benefit of the conversational format is what it produces as a side effect: a documented decision trail. When an author asks Critique to explain why a specific pattern was flagged, and Critique explains the risk, and the author adjusts the code and asks for a re-review, all of that is preserved in the PR thread. Six months later, when someone is trying to understand why that auth flow was written the way it is, the reasoning is right there in the history.

This is something static review reports cannot provide. A block of findings appended to the PR gives you the conclusions without the reasoning. An interactive thread gives you the conclusions, the questions, the answers, and the iterations — a complete picture of how the team thought about the change.

Setup

How to get started

If you already have the Critique GitHub App installed, interactive PR chat is live for your repositories with no additional configuration required. Two webhook subscriptions are needed — issue_comment and pull_request_review_comment — which the App handles automatically.
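Those two subscriptions correspond to GitHub's webhook event names; a receiver might filter incoming deliveries like this. The event names and the "created" action come from GitHub's webhook conventions, while the routing itself is a sketch.

```python
# The two webhook events the App subscribes to; everything else
# (pushes, check runs, etc.) is ignored by the chat feature.
CHAT_EVENTS = {"issue_comment", "pull_request_review_comment"}

def is_chat_event(event_name: str, payload: dict) -> bool:
    """Accept only newly created comments on the subscribed events."""
    return event_name in CHAT_EVENTS and payload.get("action") == "created"
```

Filtering on action == "created" avoids re-triggering the bot when a comment is edited or deleted.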

If you do not yet have the App installed, the install flow takes under two minutes. Select the repositories you want Critique to monitor, authorize the App, and the bot will be ready to respond to @critique mentions in any PR on those repos immediately.

What comes next

Interactive PR chat is the first step toward making Critique a full participant in the development workflow rather than a tool that runs in the background and produces a static artifact. The next capabilities we are building in this direction include persistent context — so Critique remembers the decisions made in previous PRs on the same codebase when answering questions in new ones — and agentic fix proposals, where a /fix command can trigger Remedy to write and push the actual patch to the PR branch directly.

The goal is to close the loop entirely: AI writes the code, Critique reviews it automatically, the author and reviewer talk to Critique interactively to understand and refine it, and Remedy applies the agreed fixes. Every step in that loop is either live or in active development.

For investors & partners

Read our investor letter — the full picture on where we are, what we're building, and what's next for Critique.

Read the investor letter →

Start a conversation with your next PR.

Install the Critique GitHub App, open a pull request, and type @critique in a comment. The bot is ready.

Install the GitHub App →

FAQ

Do I need to do anything to enable interactive chat?
No. If the Critique GitHub App is already installed on your repositories, the feature is live immediately. The webhook subscriptions for issue_comment and pull_request_review_comment are handled automatically. Just type @critique in any PR comment.

Does @critique work in inline review comments as well as general PR comments?
Yes. The bot responds to @critique mentions in both general pull request comments and inline review comments attached to specific lines or files. For inline comments, Critique includes the line and file context in its response.

What context does Critique read before it responds?
Critique assembles a context package that includes the full PR diff, relevant repository files beyond the diff, the PR title and description, and the complete comment thread history. It then calls Qwen 3.6 Plus with that context before constructing a response.

Can I use a different model?
Yes. The CRITIQUE_PR_COMMENT_CHAT_MODEL environment variable overrides the default Qwen 3.6 Plus model. Any OpenRouter-compatible model ID can be used. The default is qwen/qwen3.6-plus.
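In a Python service, that override could be read as below. Only the variable name and the default model ID come from the answer above; the helper itself is a hypothetical sketch.

```python
import os

# Default model ID, as documented above.
DEFAULT_CHAT_MODEL = "qwen/qwen3.6-plus"

def chat_model() -> str:
    """Return the configured model ID, falling back to the default."""
    return os.environ.get("CRITIQUE_PR_COMMENT_CHAT_MODEL", DEFAULT_CHAT_MODEL)
```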

How fast are responses?
Typical responses for standard questions and commands arrive within 3–6 seconds of the comment being posted. Larger context packages (long threads, large PRs) may take slightly longer, but responses stream as they are generated so the first tokens appear quickly.

Is there a limit on how many times I can tag @critique in one PR?
There is no hard limit. Each mention triggers an independent context assembly and model call. For very active PRs with many exchanges, the full thread history is included each time, which may increase response time slightly as the thread grows long.

Does /review trigger a full re-review?
Yes. The /review command triggers a complete re-run of the Critique review pipeline — Scout context gathering, specialist parallel analysis, and Lead Reviewer synthesis — not just a quick re-read. This is the same pipeline that runs automatically when a PR is opened.
