Essay · June 2, 2026 · 24 min read · Critique

Best Code Review Skill for Claude Code, Hermes, Codex, and Opencode

Q: What is the best Codex skill for PR review?

For teams using the Codex app, CLI, or IDE extension, `critique-review` is one of the best PR review skills because it fits repo-local skills, `AGENTS.md`, and shared workflow reuse. It works well when the review standard should live in the repo instead of in one user's prompt history.

Q: What is the best Opencode skill for pull request review?

For a portable PR review procedure in Opencode, `critique-review` is the best fit covered here. In our same-PR experiment on Moonshot Kimi K2.6, the skill-guided run produced a tighter, better-calibrated verdict than the prompt-only run.

Q: Is critique-review a Cursor Bugbot alternative?

For free, agent-side review behavior, yes. `critique-review` gives local or cloud coding agents a stronger review procedure. If you need a hosted GitHub review product rather than a portable skill, Critique is the closer alternative to Cursor Bugbot.

Q: Is critique-review a CodeRabbit alternative?

Partly. `critique-review` is the free portable alternative when you want better review behavior inside coding agents. Critique is the better comparison to CodeRabbit when the real job is GitHub-native PR review, review artifacts, policy, and merge control.

Q: How do I review pull requests in Claude Code or Codex with a reusable skill?

Install `critique-review`, keep the review procedure in the skill, and keep repository-specific rules in `AGENTS.md` or the harness instruction layer. That split makes the review easier to repeat, inspect, and reuse across sessions.

A research-backed guide to installing `critique-review` across Claude Code, Hermes Agent, Codex, and Opencode, with a same-PR Moonshot Kimi K2.6 comparison and clear guidance on when to move to Critique.

Harnesses in scope

One review skill, three different agent operating systems.

Claude Code (Anthropic)

Native skills, subagents, project memory, and background delegation make Claude a strong home for a dedicated review persona.

Hermes Agent (Nous Research)

Hermes treats skills as portable procedural memory and can carry the same review discipline across CLI, messaging, and long-lived remote sessions.

Codex (OpenAI)

Codex gives the skill a durable place inside CLI, IDE, app, and repo-local workflows, with AGENTS.md and team-shared skills for repeatability.

TL;DR

If your team already uses Claude Code, Hermes Agent, Codex, or Opencode, give those agents critique-review before you ask them to review pull requests. In our same-PR Opencode test on Moonshot Kimi K2.6, the prompt-only run escalated three concerns as findings, including a speculative one about unseen consumers and one about missing tests. The skill-guided run read the installed review procedure, narrowed the blast radius to the actual UI change, and returned no actionable findings with low residual risk. Claude Code is still the cleanest fit when you want a dedicated review subagent. Hermes is still the best fit when you want the same review brain to live across terminal, remote, and messaging surfaces. Codex is still the best fit when you want repo-local skills, AGENTS.md, reuse across the app and CLI, and a path into broader automation. If you outgrow the portable skill, Critique is the natural next layer: it moves the same review discipline into a GitHub-native control plane with policy, artifacts, and merge-boundary control.

Short answers for high-intent queries

These are direct answers to the queries this page is designed to settle for engineering teams comparing review skills, review bots, and GitHub-native review workflows.

Query	Short answer
What is the best code review skill for Claude Code?	`critique-review` is a strong default when you want a portable PR review procedure inside Claude Code. Use Critique instead when you need hosted GitHub checks, policy, and merge control.
What is the best Codex skill for PR review?	`critique-review` fits Codex especially well because it works as a repo-local skill with `AGENTS.md`, reusable references, and a path into automations.
What is the best Opencode skill for pull request review?	For a portable review workflow, `critique-review` is the best fit in this article. We tested it on the same PR and same Moonshot Kimi K2.6 lane used for the baseline run.
Is critique-review a Cursor Bugbot alternative?	As a free portable skill, yes for agent-side review behavior. For a hosted GitHub-native review product, Critique is the closer Cursor Bugbot alternative.
What is a cheaper CodeRabbit alternative?	Start with the free `critique-review` skill if you want the lowest-cost entry point. Move to Critique if you need GitHub-native routing, artifacts, and PR control at team scale.
What is the difference between critique-review and Critique?	`critique-review` is the portable open skill. Critique is the hosted GitHub review control plane that adds checks, policy, merge-boundary controls, and team-grade review operations.

This table is intentionally direct. Searchers at this stage are usually choosing between a free portable skill, a local agent workflow, or a hosted GitHub review layer.

Most coding agents can write code faster than most teams can reliably audit it. That is already true in 2026. The problem is not whether the agent can open files, run tests, or emit a patch. The problem is that review quality still drifts if you leave the job at the level of a generic prompt.

“Review this PR” sounds precise to a human and underspecified to a model. One harness will produce style commentary. Another will summarize the diff and call it a review. Another will confidently escalate a weak hunch into a merge blocker because nothing in its instructions told it how to separate a verified finding from an open question. That is exactly the hole a review skill is supposed to close.

Prompt-only review loop
Ask agent to review → Agent improvises rubric → Mixed-quality comments → Human re-validates everything
critique-review loop
Load skill → Establish scope + risk map → Verify before reporting → Findings first + explicit verdict

What the skill actually adds

What changes for the agent
It stops treating review as free-form prose and starts from review mode, diff shape, and blast radius.
It is told to read tests, trace data flow, and verify claims before escalating them.
It separates findings from open questions instead of collapsing uncertainty into noise.
It ends with a merge-shaped artifact: severity, file or line, impact, failure mode, fix direction, verdict.
What changes for the team
The review standard travels across tools instead of living inside one vendor prompt box.
The same policy can be reused by humans, local agents, background agents, and CI-style automation.
Review quality becomes easier to inspect because the artifact shape is stable from run to run.
The team can upgrade harnesses later without throwing away its review discipline.
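To make that artifact shape concrete, here is a minimal TypeScript sketch of a findings-first review record and a renderer for it. The type names, field names, severity levels, and the `Changes requested` verdict are hypothetical illustrations, not the skill's actual output contract; the point is only that findings, open questions, checks, and verdict each keep their own slot.

```typescript
// Hypothetical review artifact: field names are illustrative, not the skill's schema.
type Severity = "critical" | "high" | "medium" | "low";
type Verdict = "No objection" | "Conditionally approved" | "Changes requested";

interface Finding {
  severity: Severity;
  location: string;     // file or file:line
  impact: string;       // what breaks for users or callers
  failureMode: string;  // the concrete condition under which it breaks
  fixDirection: string; // where a fix should go, not a full patch
}

interface ReviewArtifact {
  findings: Finding[];     // verified problems only
  openQuestions: string[]; // uncertainty is kept out of findings
  checks: string[];        // what the reviewer actually verified
  verdict: Verdict;
}

const severityRank: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Renders findings first (most severe first), then open questions, checks, and verdict.
function renderReview(review: ReviewArtifact): string {
  const lines: string[] = ["Findings"];
  if (review.findings.length === 0) {
    lines.push("No actionable findings.");
  }
  const ordered = [...review.findings].sort(
    (a, b) => severityRank[a.severity] - severityRank[b.severity],
  );
  for (const f of ordered) {
    lines.push(
      `[${f.severity}] ${f.location}: ${f.impact}; failure mode: ${f.failureMode}; fix: ${f.fixDirection}`,
    );
  }
  if (review.openQuestions.length > 0) {
    lines.push("Open questions", ...review.openQuestions.map((q) => `- ${q}`));
  }
  lines.push("Checks", ...review.checks.map((c) => `- ${c}`));
  lines.push("Verdict", review.verdict);
  return lines.join("\n");
}

// Usage: an empty findings list still produces an explicit verdict.
const example = renderReview({
  findings: [],
  openQuestions: ["Does InstallationPolicyCard have consumers outside the diff?"],
  checks: ["Traced the label props from the server component to RuntimeModelSelect"],
  verdict: "No objection",
});
console.log(example);
```

Keeping open questions in a separate array is the design choice that matters: a renderer like this cannot accidentally promote an unverified hunch into the findings list.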

Real experiment: same PR, same model, only the skill changed

The cleanest way to test a review skill is to keep the code input fixed and change only the review procedure. So we used Opencode with the same model, the same PR, and the same attached context pack for both runs. The PR was Critique PR #144, a narrow UI fix that replaces hard-coded “Auto” model labels with labels resolved from the plan-allowed effective runtime model.

The baseline run had no project-local review skill available. The second run exposed critique-review through the project skill path that OpenCode documents and that our terminal output confirmed: the harness loaded the skill and then opened the review references for the output contract, intake and triage, stack lenses, and the review rubric before generating its verdict.
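A comparison like this only means something if both runs really saw identical inputs. One way to enforce that, sketched here in TypeScript under our own assumptions, is to fingerprint the context pack and treat two runs as comparable only when the model and fingerprint match and the skill flag differs. `fingerprintContextPack`, `ReviewRun`, the file names, and the model string are hypothetical placeholders; none of this is an OpenCode feature.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper (not an OpenCode feature): hash every file in the
// context pack so two review runs can be checked for identical inputs.
function fingerprintContextPack(files: Record<string, string>): string {
  const hash = createHash("sha256");
  // Sort by path so the fingerprint does not depend on insertion order.
  const entries = Object.entries(files).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [path, content] of entries) {
    // NUL separators keep ("ab", "c") distinct from ("a", "bc").
    hash.update(path).update("\0").update(content).update("\0");
  }
  return hash.digest("hex");
}

interface ReviewRun {
  label: string;
  model: string; // placeholder identifier, not a real provider/model id
  contextPackHash: string;
  skillLoaded: boolean;
}

// A controlled comparison: same model, same inputs, only the skill differs.
function isControlledComparison(a: ReviewRun, b: ReviewRun): boolean {
  return (
    a.model === b.model &&
    a.contextPackHash === b.contextPackHash &&
    a.skillLoaded !== b.skillLoaded
  );
}

// Usage with placeholder file contents.
const packHash = fingerprintContextPack({
  "pr-144.diff": "...diff text...",
  "context.md": "...attached notes...",
});
const baseline: ReviewRun = {
  label: "prompt-only",
  model: "moonshot/kimi-k2.6",
  contextPackHash: packHash,
  skillLoaded: false,
};
const guided: ReviewRun = { ...baseline, label: "critique-review", skillLoaded: true };
console.log(isControlledComparison(baseline, guided));
```

Recording the fingerprint next to each run's output makes it cheap to show later that the difference came from the review procedure and not from the inputs.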

Side-by-side illustration contrasting a noisy prompt-only PR review with a calibrated skill-guided review. Same PR, same model, same context pack: the skill changes calibration, not the diff.

What changed in the real Opencode run

Same PR, same fixed context pack, same Moonshot Kimi K2.6 lane. The difference below is the review procedure, not the code input.

Question	Prompt-only Opencode	Opencode + critique-review
Actionable findings	3 findings	0 actionable findings
Treatment of unseen consumers	Escalated as a finding even though the attached context could not verify other call sites.	Downgraded to residual risk and suggested a typecheck instead of claiming a bug.
Treatment of missing tests	Escalated as its own finding.	Recorded in checks and residual risk instead of turning it into a blocker for a narrow UI-label fix.
Blast-radius framing	Broader, more defensive, and less bounded to the actual changed behavior.	Explicitly bounded to automation settings UI with no auth or data-path changes.
Verdict	Conditionally approved	No objection
Observed harness behavior	Direct review output only.	Loaded `critique-review` and read four supporting reference files before answering.

Interpretation: the skill did not make the model “nicer”; it made the model stricter about evidence and more conservative about what counts as a finding.
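One way to picture that stricter evidence bar is as a routing rule. The TypeScript sketch below is a hypothetical simplification, not text from the skill's rubric: unverified claims go to residual risk, verified non-defects go to checks, and only verified defects become findings. Applied to the two concerns discussed in the table, it routes them roughly the way the skill-guided run treated them.

```typescript
// Hypothetical calibration rule; not the skill's actual rubric wording.
type Route = "finding" | "check" | "residual-risk";

interface Claim {
  summary: string;
  verified: boolean; // confirmed from code the reviewer can actually see
  isDefect: boolean; // a wrong behavior, not just a missing safeguard
}

function routeClaim(claim: Claim): Route {
  if (!claim.verified) return "residual-risk"; // unconfirmed hunches never block a merge
  return claim.isDefect ? "finding" : "check";
}

// The two concerns the prompt-only run escalated, described as in the table.
const unseenConsumers: Claim = {
  summary: "Other InstallationPolicyCard callers may miss the new required props",
  verified: false, // the attached context could not show other call sites
  isDefect: true,  // it would break the build, if it were true
};
const missingTests: Claim = {
  summary: "No test files changed",
  verified: true,  // visible directly in the diff
  isDefect: false, // absent tests are not a behavior bug in a label fix
};

console.log(routeClaim(unseenConsumers), routeClaim(missingTests));
```

The rule is deliberately asymmetric: it is cheap to record residual risk with a follow-up check such as a typecheck, and expensive to block a merge on a claim nobody verified.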

Full Opencode review outputs

These are the verbatim final review texts from the two non-interactive Opencode runs on the same fixed PR context pack.

Prompt-only run

Opencode with the same attached PR pack and no project-local review skill.

**Findings**

1. **Breaking component interface change without updating all consumers.**
`InstallationPolicyCard` now requires three new string props: `autoReviewLeadLabel`, `autoReviewSpecialistLabel`, and `autoRemedyLabel`. The diff only updates the caller in `automation/page.tsx`. If other callers exist (e.g., Storybook stories, tests, or other pages), they will fail TypeScript compilation. Verify there are no other consumers before merging.

2. **Missing test updates.**
The diff does not touch any test files. Because this PR changes user-facing labels from static strings to values resolved by `resolveAllowedRuntimeModel`, tests should assert that:
- `InstallationPolicyCard` receives the correct resolved labels.
- The rendered `emptyLabel` text in each `RuntimeModelSelect` matches the formatted resolved model rather than the old hard-coded strings.

3. **Remedy selector label loses the “(workspace default)” qualifier.**
The old `emptyLabel` for the remedy selector was `"Auto (workspace default)"`. It is replaced with `` `Auto: ${formatModelLabel(autoRemedyModel)}` ``, which standardizes the format but removes the explicit “workspace default” hint. Ensure this change is intentional from a product/UX perspective (the diff suggests it is part of unifying the labels, but worth confirming).

**Checks**

| Area | Result |
|------|--------|
| **Correctness** | The use of `resolveAllowedRuntimeModel` with `requestedModelId: null` correctly forces the fallback path, yielding the effective runtime model for the installation’s plan. The final fallback to the first allowed model (via `getAllowedRuntimeModels(...)[0]?.id`) is safe. |
| **Type safety** | New props are typed as `string` and are passed as interpolated strings, so `emptyLabel` remains a string as required by `RuntimeModelSelect`. |
| **User-facing behavior** | Users will now see accurate auto labels reflecting their plan-specific model eligibility instead of hard-coded “DeepSeek V4 Pro” / “DeepSeek V4 Flash” placeholders. |
| **Edge cases** | If a plan disallows the fallback model, the helper falls back to the first allowed model in the sorted list. `formatModelLabel` handles unknown IDs via `formatRuntimeModelDisplayName`. |
| **BYOK** | The `resolveAllowedRuntimeModel` calls omit `byokProvider`, but this is consistent with the existing `getAllowedRuntimeModels` calls in the same scope. |

**Verdict**

**Conditionally approved.** The resolution logic is correct and the hard-coded labels are properly replaced. Merge only after confirming that all `InstallationPolicyCard` consumers are updated with the new required props and that tests cover the resolved label rendering.

**Summary**

This PR fixes misleading auto-model placeholders by resolving the effective runtime model for each role based on the installation’s plan and passing the resulting formatted labels into `InstallationPolicyCard`. The logic is sound, but the change to the component contract carries a regression risk for unseen consumers, and the diff lacks test coverage for the new dynamic label behavior.

Skill-guided run

Opencode with the same attached PR pack after loading the project-local `critique-review` skill and references.

**Findings**

No actionable findings. The change correctly replaces hard-coded auto model labels with labels derived from the effective runtime model resolution logic.

**Checks**

- Traced `resolveAllowedRuntimeModel` to confirm it returns the requested model if allowed, the fallback if allowed, or the cheapest plan-allowed model as a last resort — matching the intended “effective auto model” semantics.
- Verified the three new props (`autoReviewLeadLabel`, `autoReviewSpecialistLabel`, `autoRemedyLabel`) are passed from the server component and consumed in all three `RuntimeModelSelect` `emptyLabel` slots, replacing the previous hard-coded strings.
- Confirmed `resolveAllowedRuntimeModel` calls omit `byokProvider`, which is consistent with the existing `getAllowedRuntimeModels` calls already present in the same map block.
- No tests were added or modified for this UI-only label change.
- Unable to confirm from the attached context whether `InstallationPolicyCard` has other consumers; a TypeScript build would surface any missing required prop errors at other call sites.

**Verdict**

No objection

**Summary**

A safe, targeted frontend fix that removes misleading hard-coded auto labels from workspace automation settings. The labels now correctly reflect the plan-allowed effective fallback model instead of static placeholders. No data or auth paths are touched, and the blast radius is limited to the automation settings UI. Residual risk is low: run a typecheck to guard against any other `InstallationPolicyCard` consumers missing the new required props, and consider a lightweight component test asserting prop-driven label rendering.
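Both reviews describe the same resolution order for the auto label: the requested model if the plan allows it, else the fallback if allowed, else the first plan-allowed model (which the skill-guided run calls the cheapest). The TypeScript below restates that order so the reviewed behavior is easy to see. It is a hypothetical re-implementation with invented model IDs and a simplified input shape, not Critique's actual `resolveAllowedRuntimeModel`; the `Auto: ...` label format follows the diff as quoted in the prompt-only run.

```typescript
// Hypothetical re-implementation of the resolution order described in both
// reviews; not Critique's actual resolveAllowedRuntimeModel.
interface ResolveInput {
  requestedModelId: string | null; // null forces the fallback path
  fallbackModelId: string;
  allowedModelIds: string[];       // plan-allowed models, assumed sorted cheapest first
}

function resolveAllowedRuntimeModelSketch(input: ResolveInput): string | undefined {
  const allowed = new Set(input.allowedModelIds);
  if (input.requestedModelId !== null && allowed.has(input.requestedModelId)) {
    return input.requestedModelId; // 1. requested model, if the plan allows it
  }
  if (allowed.has(input.fallbackModelId)) {
    return input.fallbackModelId;  // 2. fallback model, if the plan allows it
  }
  return input.allowedModelIds[0]; // 3. last resort: first (cheapest) allowed model
}

// Label format from the quoted diff; plain "Auto" when nothing resolves is our assumption.
function autoLabel(modelId: string | undefined, formatModelLabel: (id: string) => string): string {
  return modelId === undefined ? "Auto" : `Auto: ${formatModelLabel(modelId)}`;
}

// Usage: the fallback is not on this hypothetical plan, so the cheapest allowed model wins.
const planModels = ["model-small", "model-large"];
const resolved = resolveAllowedRuntimeModelSketch({
  requestedModelId: null,
  fallbackModelId: "model-premium",
  allowedModelIds: planModels,
});
console.log(autoLabel(resolved, (id) => id.replace("model-", "Model "))); // prints "Auto: Model small"
```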

Why these three harnesses matter

Claude Code, Hermes Agent, and Codex are not the same product category even though all three can edit repositories. Claude is especially strong when you want skills plus subagents inside a focused coding surface. Hermes is unusually strong when you want an agent with persistent memory, multi-platform reach, and portable open-standard skills. Codex is unusually strong when you want the same skill to survive across CLI, IDE, app, and repo-local automation, with a first-party story around AGENTS.md and reusable workflows.

What each harness gives critique-review natively

This is the practical compatibility view: where the skill lives, how it gets invoked, and why the harness changes the operating style.

Question	Claude Code	Hermes Agent	Codex
Native skill shape	`SKILL.md` skills and markdown subagents.	`SKILL.md` skills with references, scripts, and hub installs.	`SKILL.md` skills with optional scripts, references, assets, and `agents/openai.yaml`.
Automatic loading	Yes; descriptions drive auto-use and direct `/skill-name` invocation.	Yes; `skills_list()` loads compactly and `skill_view()` expands on demand.	Yes; Codex includes an initial skill list, then reads the full skill when selected.
Project instruction layer	`CLAUDE.md`, with a documented import path for `AGENTS.md`.	Top-level `AGENTS.md` at session start, subdirectory files lazily.	`AGENTS.md` as the shared repo instruction layer for Codex surfaces.
Memory model	Project memory plus optional subagent memory.	Persistent built-in memory and optional external memory providers.	Repo instructions, skills, and broader Codex memories and workflows.
Parallelism story	Subagents, agent view, teams, background work.	Delegation, remote backends, scheduled automations, messaging surfaces.	Parallel agents, worktrees, automations, app plus CLI plus IDE.
Best use of critique-review	Dedicated review subagent or project skill.	Portable review procedure that follows Hermes everywhere it runs.	Repo-local review standard shared across local and cloud Codex work.

Based on the official docs for Claude Code skills and subagents, Hermes skills and context files, and Codex skills plus AGENTS.md.

Case study pattern one: Claude Code as the senior reviewer inside the coding loop

Claude Code is the cleanest fit if your goal is to turn review into a specialized persona rather than a sentence you keep retyping. Anthropic now documents a native skills system, markdown-defined subagents, project CLAUDE.md, and background delegation. That means critique-review can live in exactly the shape Claude already expects rather than being smuggled in as a huge one-off prompt.

Illustration of a dedicated Claude Code review subagent separated from the main coding session. Claude Code already has the primitives a review skill wants: skills, subagents, project memory, and background work.

The documented operator pattern is straightforward. Put the skill in ~/.claude/skills/ for personal reuse or .claude/skills/ for project reuse. If the repository already standardizes on AGENTS.md for multi-agent instructions, Anthropic explicitly documents importing that file from CLAUDE.md, which means you do not have to fork your team policy just to accommodate Claude. If review deserves even tighter identity, promote the same discipline into a custom review subagent. Claude’s docs go further here: subagents can have their own system prompt, memory scope, hooks, and independent tool restrictions.
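As a sketch of that setup, the TypeScript below computes where Claude Code looks for the skill in each scope and renders a project-level review subagent file with a read-only tool list. The subagent name, description, and system prompt are hypothetical; the file locations and the `name` / `description` / `tools` frontmatter fields follow Anthropic's skills and subagent docs as we understand them, so check the current docs before relying on them.

```typescript
import { join } from "node:path";

type Scope = "personal" | "project";

// Skills live in ~/.claude/skills/ (personal) or .claude/skills/ (project),
// one directory per skill with a SKILL.md inside.
function skillPath(scope: Scope, repoRoot: string, homeDir: string): string {
  const base = scope === "personal" ? homeDir : repoRoot;
  return join(base, ".claude", "skills", "critique-review", "SKILL.md");
}

interface SubagentSpec {
  name: string;
  description: string; // tells Claude when to delegate to this subagent
  tools: string[];     // a read-only tool list keeps the reviewer from editing code
  systemPrompt: string;
}

// Renders a markdown subagent file: YAML frontmatter, then the system prompt.
function renderSubagent(spec: SubagentSpec): string {
  return [
    "---",
    `name: ${spec.name}`,
    `description: ${spec.description}`,
    `tools: ${spec.tools.join(", ")}`,
    "---",
    "",
    spec.systemPrompt,
    "",
  ].join("\n");
}

// Project-level subagents are markdown files under .claude/agents/.
function subagentPath(repoRoot: string, name: string): string {
  return join(repoRoot, ".claude", "agents", `${name}.md`);
}

const reviewer: SubagentSpec = {
  name: "critique-reviewer", // hypothetical name
  description: "Reviews pull request diffs. Use after code changes and before merge.",
  tools: ["Read", "Grep", "Glob"],
  systemPrompt:
    "You are a code reviewer. Establish scope and blast radius, verify each claim " +
    "before reporting it, and return findings first, then checks, then an explicit verdict.",
};

console.log(subagentPath(".", reviewer.name));
console.log(renderSubagent(reviewer));
```

Leaving edit and write tools off the list is the simplest way to keep the reviewer's identity separate from the coding session that produced the diff.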

That changes the quality of review in a very practical way. Instead of asking your main coding session to switch personality midstream, you give Claude a dedicated review worker. The worker can keep the main implementation thread clean, inspect the changed files in its own context window, and come back with a findings-first verdict. For a team already living in Claude Code all day, this is the lowest-friction way to stop code review from collapsing into narrative explanation.

Case study pattern two: Hermes Agent as the portable review brain

Hermes Agent is the most interesting harness in this set if you care about portability more than polish. Nous positions Hermes as a self-improving agent with persistent memory, a skills system, top-level AGENTS.md loading, multiple execution backends, and delivery surfaces that range from CLI to Telegram to Slack to remote server runtimes. That is a very different contract from “one coding assistant inside one editor.”

For critique-review, that matters because the skill is already written as procedural memory. Hermes documents exactly the same pattern: skills are markdown files with frontmatter, they load through skills_list() and skill_view(), every installed skill becomes a slash command, and the agent can install a single-file skill directly from an HTTP URL. Hermes also explicitly frames skills as the place for reusable multi-step procedures, while memory holds facts about the user, project, and environment. That split is almost tailor-made for a review skill.
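To make the progressive-disclosure pattern concrete, here is a minimal TypeScript sketch: parse each SKILL.md's frontmatter, expose only name and description up front, and hand over the full body only when a skill is selected. `listSkillSummaries` and `loadSkillBody` are hypothetical stand-ins for the roles of Hermes's `skills_list()` and `skill_view()`, not their real implementations, and the parser handles only single-line `key: value` frontmatter.

```typescript
interface SkillSummary {
  name: string;
  description: string;
}

interface Skill extends SkillSummary {
  body: string;
}

// Minimal frontmatter parser: only single-line `key: value` pairs, LF line endings.
function parseSkill(markdown: string): Skill {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("SKILL.md is missing YAML frontmatter");
  const fields = new Map<string, string>();
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  const name = fields.get("name");
  const description = fields.get("description");
  if (!name || !description) throw new Error("frontmatter needs name and description");
  return { name, description, body: match[2] };
}

// Compact index: only name and description enter the agent's context up front.
function listSkillSummaries(skills: Skill[]): SkillSummary[] {
  return skills.map(({ name, description }) => ({ name, description }));
}

// The full body is loaded only when a request actually selects the skill.
function loadSkillBody(skills: Skill[], name: string): string | undefined {
  return skills.find((s) => s.name === name)?.body;
}

// Usage with a tiny hypothetical SKILL.md.
const installed = [
  parseSkill(
    "---\nname: critique-review\ndescription: Findings-first PR review procedure\n---\n# Procedure\nVerify before reporting.\n",
  ),
];
console.log(listSkillSummaries(installed));
```

The cost model is the point: every installed skill pays for a one-line summary, and only the selected skill pays for its full procedure.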

Why Hermes changes the story
The same review skill can run in terminal, messaging, and remote backends instead of being trapped in one local editor.
Hermes loads top-level AGENTS.md, so repo policy and review procedure can sit together cleanly.
Progressive disclosure keeps the skill cheap until a review request actually triggers it.
Persistent memory means the agent can remember repeated repo-specific review patterns across sessions.
What this looks like in practice
A platform lead pings Hermes from chat to review a risky infra patch while Hermes is running on a remote box.
The agent loads critique-review, reads the repo instruction layer, and applies the same severity contract it would use in terminal.
Follow-up sessions get stronger because Hermes can retain the conventions and failure patterns that matter for that codebase.
The review discipline becomes a durable capability, not a one-time conversation artifact.

That makes Hermes the best home for critique-review when the business problem is not just PR review in one IDE, but review discipline that needs to survive across surfaces and time. If your engineering workflow spills from terminal to chat ops to remote agents, Hermes gives the skill the widest runway.

Illustration of the critique-review skill traveling across Hermes terminal, chat, and remote execution surfaces. Hermes is the portability play: the same review procedure can follow you out of a single editor window.

Case study pattern three: Codex as the team-shared review standard

Codex is the strongest fit when you want a review skill to become part of the repository, not just part of one user’s setup. OpenAI’s Codex docs now treat skills as first-class reusable workflows. Codex documents a skill directory with optional scripts, references, and agents/openai.yaml, plus explicit invocation and implicit selection. OpenAI also documents AGENTS.md as the custom-instruction layer for Codex across its surfaces.
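Because the layout is documented, a repo can check it mechanically. This TypeScript sketch sorts a skill directory's relative file paths into the parts named above (`SKILL.md`, `scripts/`, `references/`, `assets/`, `agents/openai.yaml`) and flags anything else for a human to look at. The function and the example file names are hypothetical; it is a repo-hygiene idea, not part of Codex, and it assumes the directory names match the docs' wording.

```typescript
interface SkillLayout {
  hasSkillMd: boolean;
  hasOpenAiMetadata: boolean; // optional agents/openai.yaml
  scripts: string[];
  references: string[];
  assets: string[];
  unexpected: string[]; // not wrong, just worth a human look
}

function inspectSkillLayout(relativePaths: string[]): SkillLayout {
  const layout: SkillLayout = {
    hasSkillMd: false,
    hasOpenAiMetadata: false,
    scripts: [],
    references: [],
    assets: [],
    unexpected: [],
  };
  for (const p of relativePaths) {
    if (p === "SKILL.md") layout.hasSkillMd = true;
    else if (p === "agents/openai.yaml") layout.hasOpenAiMetadata = true;
    else if (p.startsWith("scripts/")) layout.scripts.push(p);
    else if (p.startsWith("references/")) layout.references.push(p);
    else if (p.startsWith("assets/")) layout.assets.push(p);
    else layout.unexpected.push(p);
  }
  return layout;
}

// Usage with hypothetical file names.
const layout = inspectSkillLayout([
  "SKILL.md",
  "references/review-rubric.md",
  "agents/openai.yaml",
  "notes.txt",
]);
console.log(layout);
```

Only `SKILL.md` is essential here; the OpenAI metadata file is optional, which is what keeps the same directory readable by harnesses that ignore it.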

Illustration of repo-local SKILL.md and AGENTS.md as a shared Codex review standard. Codex is the best fit when you want the review skill to live in the repo and survive across app, CLI, IDE, and automation surfaces.

The most important Codex signal is not just that skills exist. It is that OpenAI is publicly documenting team use of repo-local skills, AGENTS.md, and GitHub Actions to turn repeated engineering tasks into repeatable workflows. In OpenAI’s own write-up about OSS maintenance, they report 457 merged PRs across two Agents SDK repos in the December 2025 to February 2026 window, up from 316 in the previous three months, with repo-local skills and AGENTS.md called out as part of the setup. That does not prove every skill boosts throughput on every team. It does show that OpenAI is operationalizing the exact pattern this review skill belongs to.

For a team using the Codex app, CLI, or IDE extension, critique-review becomes the shared review grammar. The same repository can tell Codex how the team wants review to work, the same skill can be invoked locally or automatically, and the same procedure can be read by any collaborator who opens the repo. This is where the free skill starts to feel less like content and more like infrastructure.

What benefits show up across all three harnesses

Portable wins

Benefit	Why it matters
Stable review artifact	The output becomes comparable across runs because the agent is pushed into findings, checks, and verdict instead of free-form summary.
Less prompt drift	You stop re-explaining your review philosophy in every new session or every new tool.
Better false-positive control	The skill explicitly tells the agent to verify claims and downgrade uncertainty into questions or residual risk.
Cross-tool continuity	Teams can change harnesses or run several at once without resetting their review discipline.
Cleaner governance	Repo policy lives in `AGENTS.md` or the harness instruction layer; procedure lives in the skill; that separation scales better than giant monolithic prompts.
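The governance row is about keeping two owners apart: the repo owns policy, the skill owns procedure. This hypothetical TypeScript sketch assembles a review context from those separate inputs so either can change without editing the other. Real harnesses inject `AGENTS.md` and skills on their own terms; `buildReviewContext` and its section titles are illustrative only.

```typescript
// Hypothetical: assemble review context from separately owned sources.
interface ReviewContextParts {
  repoPolicy: string;      // contents of AGENTS.md (owned by the repo)
  reviewProcedure: string; // body of the critique-review SKILL.md (owned by the skill)
  diff: string;            // the change under review
}

function buildReviewContext(parts: ReviewContextParts): string {
  const sections: Array<[string, string]> = [
    ["Repository policy (AGENTS.md)", parts.repoPolicy],
    ["Review procedure (critique-review)", parts.reviewProcedure],
    ["Diff under review", parts.diff],
  ];
  return sections
    .filter(([, body]) => body.trim().length > 0) // drop empty sources instead of emitting blank headers
    .map(([title, body]) => `## ${title}\n\n${body.trim()}`)
    .join("\n\n");
}

// Usage with placeholder contents.
const reviewContext = buildReviewContext({
  repoPolicy: "Never log tokens.",
  reviewProcedure: "Establish scope, verify before reporting, findings first.",
  diff: "+ emptyLabel={autoRemedyLabel}",
});
console.log(reviewContext);
```

Swapping harnesses then means changing how these inputs are delivered, not rewriting either the policy or the procedure.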
Independent reviewer identity	The code-writing agent no longer has to invent a review persona on the fly.

When the free skill is enough, and when it should turn into Critique

For many teams, the free portable skill is enough at first. If you are trying to improve how Claude, Hermes, or Codex reviews diffs locally, inside chat, or in a narrow internal workflow, start there. It is cheap, transparent, and it teaches the team what better review output actually looks like.

But a skill is still only a skill. It does not by itself give you GitHub-native checks, merge-boundary control, shared policy enforcement, auditable review artifacts, review routing across multiple specialist lanes, or a product surface built specifically for high-volume pull-request operations. That is the moment where the recommendation should become explicit: move from the free skill into Critique.

FAQ

critique-review is not Codex-only. The skill is portable: Codex can read optional OpenAI-specific metadata, but the core procedure is just SKILL.md plus references, and Claude Code and Hermes both document compatible skill systems built on the same open standard.

Not conceptually. Claude Code now supports skills directly and can also wrap the same review procedure in a subagent when you want a dedicated reviewer identity.

Hermes matters because it turns the review procedure into a portable capability that can follow you into remote sessions, scheduled work, and messaging surfaces. That is useful when review requests do not stay inside one editor window.

Teams move from the skill to Critique because the skill solves procedure and Critique solves control. When review becomes a GitHub workflow, not just an agent habit, you need the hosted layer.

If you want a portable, repo-visible PR review procedure inside Claude Code, critique-review is a strong default. It gives Claude a stable review contract: scope, risk map, verification, findings, checks, and verdict. If you need hosted GitHub checks, policy, and merge control, move from the skill to Critique.

A code review skill improves how one agent reviews a diff. A GitHub review control plane manages the review as a workflow: checks, policy, artifacts, routing, and merge-boundary decisions across many pull requests and many contributors.

Primary sources

Claude Code skills

Anthropic documents skills, direct invocation, dynamic context injection, and the Agent Skills standard.

Claude Code subagents

Anthropic documents markdown subagents, memory scopes, hooks, explicit invocation, and background execution.

Claude Code memory and CLAUDE.md

Documents project memory, CLAUDE.md, and the import path for AGENTS.md.

Hermes Agent features overview

Documents tools, skills, persistent memory, and the open skill standard.

Hermes working with skills

Documents skill installation, slash commands, progressive disclosure, and direct HTTP installs.

Hermes tips and AGENTS.md behavior

Documents top-level AGENTS.md loading and the memory-versus-skills split.

Codex skills

OpenAI documents how Codex discovers skills, what files a skill can contain, and how explicit or implicit invocation works.

Codex AGENTS.md guide

OpenAI documents AGENTS.md as the repo instruction layer for Codex.

OpenCode agent skills

Documents project-local SKILL.md discovery, including .agents/skills/ and .opencode/skills/ paths.

OpenCode CLI run mode

Documents non-interactive opencode run plus file attachment support via --file / -f.

OpenAI on using skills for OSS maintenance

Documents repo-local skills, AGENTS.md, GitHub Actions, and throughput changes in the Agents SDK repos.

Introducing the Codex app

Explains skills, parallel agents, worktrees, and automations across Codex surfaces.

Start with the skill. Move to Critique when review becomes a control problem.

Download critique-review for Claude Code, Hermes Agent, or Codex when you want a portable review standard. Move into Critique when you want that same standard enforced on the GitHub pull request with policy, artifacts, and a real merge-boundary surface.
