Relace Search: The 256K-Context Subagent Behind Serious Codebase Retrieval
Why agentic search beats “RAG and pray,” how critique.sh wires relace/relace-search through a GitHub harness, and what actually moves the needle on speed (hint: it is not twenty thousand tokens per second).
relace/relace-search · Fast agentic search
The model that does not write your app — it finds where the app actually lives
Specialized retrieval loop: parallel reads, repo-native grep, then a structured report_back so the oracle model argues from evidence, not vibes.
- Context: 256K single-request budget (OpenRouter listing)
- Role: subagent that hands off to the oracle / lead model
- Pattern: parallel tools (view_file · grep · report_back)
- critique.sh: chat search through a GitHub-backed harness
If you have shipped anything with agents in the last year, you already felt the pain. The big model wants to be clever immediately. The repository is huge, noisy, and full of generated junk. Dumping the whole tree into context is either impossible or stupid. Pure vector RAG is fast to wire up and slow to trust — embeddings do not understand your import graph, your feature flags, or the difference between “similar text” and “actually called from here.”
Relace’s answer — documented as Fast Agentic Search — is narrower and more honest: train a model to behave like a search intern with a strict tool schema, run several tight loops of parallel exploration, then stop with a machine-readable handoff. On OpenRouter the model ships as relace/relace-search with a 256K context budget and pricing that makes it realistic to call on every serious chat turn, not only on “VIP” threads.
PART ONE
What Relace Search Actually Is
The official flow is deliberately boring in a good way. You mount a codebase at a known root (their examples use /repo). You expose view_file, view_directory, grep_search, a constrained bash, and report_back. The system prompt tells the model to batch independent reads in parallel, keep turns small, and only call report_back when it is ready to defend its choices. Deviating from those schemas is explicitly warned against — this is reinforcement territory, not “JSON-shaped suggestions.”
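The tool surface above can be sketched in the OpenAI-style function-calling format that OpenRouter accepts. The field names below are illustrative stand-ins; the exact schemas, which the model is trained to follow strictly, live in Relace's Fast Agentic Search documentation.

```typescript
// Hypothetical sketch of the Relace Search tool surface. Descriptions and
// parameter names are assumptions for illustration, not the official schemas.
const tools = [
  {
    type: "function",
    function: {
      name: "view_file",
      description: "Read a slice of a file mounted under /repo.",
      parameters: {
        type: "object",
        properties: {
          path: { type: "string" },
          start_line: { type: "integer" },
          end_line: { type: "integer" },
        },
        required: ["path"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "grep_search",
      description: "Regex search across the mounted repository.",
      parameters: {
        type: "object",
        properties: { pattern: { type: "string" } },
        required: ["pattern"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "report_back",
      description: "Terminal call: structured findings handed to the parent agent.",
      parameters: {
        type: "object",
        properties: {
          files: { type: "array", items: { type: "string" } },
          explanation: { type: "string" },
        },
        required: ["files", "explanation"],
      },
    },
  },
];
```

The important property is that `report_back` is terminal: once the model calls it, the search episode ends and the payload becomes the parent agent's evidence.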
PART TWO
Benchmarks: Context Budget and Why It Matters
“256K context” does not mean you should stuff a quarter-million tokens of source into one prompt. It means the search episode can afford long tool transcripts — dozens of partial file views, grep payloads, and intermediate reasoning — without the session collapsing. For comparison, the chart below uses rounded public context ceilings from provider listings; real usable budget is always lower once tools, system prompts, and safety wrappers eat overhead.
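A back-of-envelope budget makes the distinction concrete. All the per-turn numbers below are made-up illustrative figures, not measurements; only the 256K ceiling comes from the listing.

```typescript
// Illustrative token arithmetic: why a long context ceiling matters for
// tool transcripts rather than for stuffing source files into one prompt.
const contextCeiling = 256_000; // marketing ceiling from the OpenRouter listing
const systemPrompt = 2_500;     // assumed: tool schemas + instructions
const perToolTurn = 4_000;      // assumed: avg file slice + grep payload
const reasoningPerTurn = 600;   // assumed: the model's own interim tokens
const turns = 12;               // a dozen exploration loops

const used = systemPrompt + turns * (perToolTurn + reasoningPerTurn);
const headroom = contextCeiling - used;
// Even a long episode consumes well under a quarter of the ceiling,
// which is the point: long transcripts without session collapse.
```

Run the numbers and a twelve-turn episode uses roughly 58K tokens, leaving ample headroom; the ceiling buys transcript endurance, not a licence to paste the repo.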
Rounded “marketing” ceilings from OpenRouter / vendor listings. Not equal to effective working memory after tools and formatting.
Relace Search value is taken from the OpenRouter model card. Other rows are generic stand-ins for common API tiers — swap in the exact IDs from your stack when you reproduce this chart internally.
Where Relace wins is not a single jaw-dropping bar on a graph. It wins on precision per dollar: fewer blind alleys than “embed top-k chunks,” fewer hallucinated paths than asking a trillion-parameter oracle to guess file names, and a structured payload the parent agent can lint, display, or re-query.
PART THREE
Why Builder Platforms and Agent IDEs Converge Here
Products like Lovable, v0-style generators, Bolt-class builders, and IDE-native agents all sell the same fantasy in different fonts: describe the outcome, watch the machine converge. The uncomfortable secret is that convergence requires navigation — finding the auth layer, the API route, the Prisma schema, the edge case test — before anyone should touch code. That navigation step is exactly where a search subagent belongs.
We are not claiming every builder routes through Relace’s endpoint; they do not all need to. What they share architecturally is the split brain: a cheaper or narrower model (or a hard-coded tool loop) that maps the terrain, then a stronger model that edits or explains. OpenRouter’s public “apps using Relace Search” leaderboard provides the receipts: real shipping products already centralise traffic on relace/relace-search for that slice of the stack.
PART FOUR
How critique.sh Uses It Today
Critique Chat exposes a tool called searchSelectedRepository. Behind it, we hydrate a Git tree through the GitHub App, implement the Relace tool surface ourselves (including a deliberately boring bash sandbox), and loop OpenRouter completions until the model calls report_back or we hit a safety cap. The oracle chat model is instructed not to freestyle repository facts before running that tool when a repo is selected.
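The loop shape is simple enough to sketch. The names `callOpenRouter` and `runTool` below are hypothetical stand-ins, not our production harness; what matters is the two exits — a `report_back` call or the safety cap.

```typescript
// Minimal sketch of a search-subagent loop, under assumed helper signatures.
type ToolCall = { id: string; name: string; args: unknown };
type Turn = { toolCalls: ToolCall[] };

const MAX_TURNS = 16; // safety cap so a confused model cannot loop forever

async function searchLoop(
  callOpenRouter: (history: unknown[]) => Promise<Turn>,
  runTool: (call: ToolCall) => Promise<string>,
  history: unknown[],
): Promise<unknown | null> {
  for (let i = 0; i < MAX_TURNS; i++) {
    const turn = await callOpenRouter(history);

    const done = turn.toolCalls.find((c) => c.name === "report_back");
    if (done) return done.args; // structured handoff for the oracle model

    // Independent reads are batched: execute this turn's tool calls in parallel.
    const results = await Promise.all(turn.toolCalls.map(runTool));
    turn.toolCalls.forEach((c, j) =>
      history.push({ role: "tool", tool_call_id: c.id, content: results[j] }),
    );
  }
  return null; // hit the cap without a report; the caller decides the fallback
}
```

Returning `null` at the cap, rather than throwing, lets the parent agent degrade gracefully — for example, by answering from whatever partial transcript it has, with a caveat.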
Our PR pipeline Scout still builds evidence packs from diffs, heuristics, and Git metadata first. That keeps reviews working even when a search provider is down — and it is the right default for merge gates.
Chat needs interactive, question-driven exploration. Deterministic routing cannot answer ad-hoc “where does billing call Stripe?” questions the way a tool loop can.
runRelaceSearchOnRepository runs relace/relace-search with parallel tool calls, returns structured files and line anchors, and lets the user-facing model synthesise with receipts.
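What “structured files and line anchors” could look like on the wire — an assumed shape for illustration, not Critique’s actual payload contract:

```typescript
// Hypothetical evidence payload; field names are assumptions, not our API.
interface SearchEvidence {
  files: Array<{
    path: string;      // repo-relative, e.g. "src/billing/stripe.ts"
    startLine: number; // anchor for deep-linking in the chat UI
    endLine: number;
    reason: string;    // why the subagent believes this span matters
  }>;
  explanation: string; // the model's defence of its selection
}

const example: SearchEvidence = {
  files: [
    {
      path: "src/billing/stripe.ts",
      startLine: 42,
      endLine: 88,
      reason: "Stripe client wrapper and the only charge call site",
    },
  ],
  explanation: "Billing reaches Stripe through a single client wrapper.",
};
```

Because the synthesis model receives anchors rather than prose, it can cite `path:line` spans instead of hallucinating file names, and the UI can render each citation as a link.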
PART FIVE
When Not to Reach for Relace Search
- You refuse to implement (and secure) the tool harness — the model is not magic without file IO.
- Your “search” is really SQL over a warehouse — use SQL, not grep theatre.
- You need legal guarantees every chunk came from an index you control — agentic search is opportunistic exploration.
By contrast, it earns its keep when:
- Monorepos mean the right answer spans packages and tests.
- Oracle models are burning credits guessing filenames.
- Chat UX has users asking “where” and “why” before “change this line.”
The meta-lesson is architectural. Models like Relace Search are not a replacement for your lead reviewer or your Claude Opus synthesis pass. They are a scalpel — one step in a DAG. Use them where the bottleneck is discovery, not judgment.
The stages of that DAG, in order:
- Maps repo context, call sites, and risk zones
- Live task space every agent works from
- Security, tests, architecture, performance, docs
- Reads evidence, ranks severity, makes the call
- Turns findings into verified fixes
If you are evaluating this seriously, read Relace’s Fast Agentic Search documentation for the exact tool schemas, then cross-check the OpenRouter model card for pricing, context, and telemetry. Reproduce one harness in a throwaway repo before you bet your production merge gate on it. The future belongs to teams who treat search as infrastructure — not as a single slider labeled “top K.”
critique.sh ships the oracle + search split for real repos.
Connect GitHub, select a repository in chat, and watch the search subagent hand structured evidence to the model you already trust for synthesis. For merge gates, our deterministic Scout + specialist DAG is built to scale — with room to deepen via the same Relace patterns when you are ready.
Get started