Relace Search: The 256K-Context Subagent Behind Serious Codebase Retrieval
Why agentic search beats “RAG and pray,” how critique.sh wires relace/relace-search through a GitHub harness, and what actually moves the needle on speed (hint: it is not twenty thousand tokens per second).
relace/relace-search · Fast agentic search
The model that does not write your app — it finds where the app actually lives
Specialized retrieval loop: parallel reads, repo-native grep, then a structured report_back so the oracle model argues from evidence, not vibes.
- Context: 256K single-request budget (OpenRouter listing)
- Role: subagent that hands off to the oracle / lead model
- Pattern: parallel tools (view_file · grep · report_back)
- critique.sh: chat search through a GitHub-backed harness
If you have shipped anything with agents in the last year, you already felt the pain. The big model wants to be clever immediately. The repository is huge, noisy, and full of generated junk. Dumping the whole tree into context is either impossible or stupid. Pure vector RAG is fast to wire up and slow to trust — embeddings do not understand your import graph, your feature flags, or the difference between “similar text” and “actually called from here.”
Relace’s answer — documented as Fast Agentic Search — is narrower and more honest: train a model to behave like a search intern with a strict tool schema, run several tight loops of parallel exploration, then stop with a machine-readable handoff. On OpenRouter the model ships as relace/relace-search with a 256K context budget and pricing that makes it realistic to call on every serious chat turn, not only on “VIP” threads.
PART ONE
What Relace Search Actually Is
The official flow is deliberately boring in a good way. You mount a codebase at a known root (their examples use /repo). You expose view_file, view_directory, grep_search, a constrained bash, and report_back. The system prompt tells the model to batch independent reads in parallel, keep turns small, and only call report_back when it is ready to defend its choices. Deviating from those schemas is explicitly warned against — this is reinforcement territory, not “JSON-shaped suggestions.”
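The tool surface above can be sketched in the OpenAI-style function-calling format that OpenRouter accepts. The field names below are illustrative stand-ins; the exact schemas, which the model is trained to follow strictly, live in Relace's Fast Agentic Search documentation.

```typescript
// Hypothetical sketch of the Relace Search tool surface. Descriptions and
// parameter names are assumptions for illustration, not the official schemas.
const tools = [
  {
    type: "function",
    function: {
      name: "view_file",
      description: "Read a slice of a file mounted under /repo.",
      parameters: {
        type: "object",
        properties: {
          path: { type: "string" },
          start_line: { type: "integer" },
          end_line: { type: "integer" },
        },
        required: ["path"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "grep_search",
      description: "Regex search across the mounted repository.",
      parameters: {
        type: "object",
        properties: { pattern: { type: "string" } },
        required: ["pattern"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "report_back",
      description: "Terminal call: structured findings handed to the parent agent.",
      parameters: {
        type: "object",
        properties: {
          files: { type: "array", items: { type: "string" } },
          explanation: { type: "string" },
        },
        required: ["files", "explanation"],
      },
    },
  },
];
```

The important property is that `report_back` is terminal: once the model calls it, the search episode ends and the payload becomes the parent agent's evidence.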
PART TWO
Benchmarks: Context Budget and Why It Matters
“256K context” does not mean you should stuff a quarter-million tokens of source into one prompt. It means the search episode can afford long tool transcripts — dozens of partial file views, grep payloads, and intermediate reasoning — without the session collapsing. For comparison, the chart below uses rounded public context ceilings from provider listings; real usable budget is always lower once tools, system prompts, and safety wrappers eat overhead.
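A back-of-envelope budget makes the distinction concrete. All the per-turn numbers below are made-up illustrative figures, not measurements; only the 256K ceiling comes from the listing.

```typescript
// Illustrative token arithmetic: why a long context ceiling matters for
// tool transcripts rather than for stuffing source files into one prompt.
const contextCeiling = 256_000; // marketing ceiling from the OpenRouter listing
const systemPrompt = 2_500;     // assumed: tool schemas + instructions
const perToolTurn = 4_000;      // assumed: avg file slice + grep payload
const reasoningPerTurn = 600;   // assumed: the model's own interim tokens
const turns = 12;               // a dozen exploration loops

const used = systemPrompt + turns * (perToolTurn + reasoningPerTurn);
const headroom = contextCeiling - used;
// Even a long episode consumes well under a quarter of the ceiling,
// which is the point: long transcripts without session collapse.
```

Run the numbers and a twelve-turn episode uses roughly 58K tokens, leaving ample headroom; the ceiling buys transcript endurance, not a licence to paste the repo.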
Rounded “marketing” ceilings from OpenRouter / vendor listings. Not equal to effective working memory after tools and formatting.
Relace Search value is taken from the OpenRouter model card. Other rows are generic stand-ins for common API tiers — swap in the exact IDs from your stack when you reproduce this chart internally.
Where Relace wins is not a single jaw-dropping bar on a graph. It wins on precision per dollar: fewer blind alleys than “embed top-k chunks,” fewer hallucinated paths than asking a trillion-parameter oracle to guess file names, and a structured payload the parent agent can lint, display, or re-query.
PART THREE
Why Builder Platforms and Agent IDEs Converge Here
Products like Lovable, v0-style generators, Bolt-class builders, and IDE-native agents all sell the same fantasy in different fonts: describe the outcome, watch the machine converge. The uncomfortable secret is that convergence requires navigation — finding the auth layer, the API route, the Prisma schema, the edge case test — before anyone should touch code. That navigation step is exactly where a search subagent belongs.
We are not claiming every builder routes through Relace’s endpoint; they do not all need to. What they share architecturally is the split brain: a cheaper or narrower model (or a hard-coded tool loop) that maps the terrain, then a stronger model that edits or explains. OpenRouter’s public “apps using Relace Search” leaderboard provides the receipts: real shipping products already centralise traffic on relace/relace-search for that slice of the stack.
PART FOUR
How critique.sh Uses It Today
Critique Chat exposes a tool called searchSelectedRepository. Behind it, we hydrate a Git tree through the GitHub App, implement the Relace tool surface ourselves (including a deliberately boring bash sandbox), and loop OpenRouter completions until the model calls report_back or we hit a safety cap. The oracle chat model is instructed not to freestyle repository facts before running that tool when a repo is selected.
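The loop shape is simple enough to sketch. The names `callOpenRouter` and `runTool` below are hypothetical stand-ins, not our production harness; what matters is the two exits — a `report_back` call or the safety cap.

```typescript
// Minimal sketch of a search-subagent loop, under assumed helper signatures.
type ToolCall = { id: string; name: string; args: unknown };
type Turn = { toolCalls: ToolCall[] };

const MAX_TURNS = 16; // safety cap so a confused model cannot loop forever

async function searchLoop(
  callOpenRouter: (history: unknown[]) => Promise<Turn>,
  runTool: (call: ToolCall) => Promise<string>,
  history: unknown[],
): Promise<unknown | null> {
  for (let i = 0; i < MAX_TURNS; i++) {
    const turn = await callOpenRouter(history);

    const done = turn.toolCalls.find((c) => c.name === "report_back");
    if (done) return done.args; // structured handoff for the oracle model

    // Independent reads are batched: execute this turn's tool calls in parallel.
    const results = await Promise.all(turn.toolCalls.map(runTool));
    turn.toolCalls.forEach((c, j) =>
      history.push({ role: "tool", tool_call_id: c.id, content: results[j] }),
    );
  }
  return null; // hit the cap without a report; the caller decides the fallback
}
```

Returning `null` at the cap, rather than throwing, lets the parent agent degrade gracefully — for example, by answering from whatever partial transcript it has, with a caveat.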
Our PR pipeline Scout still builds evidence packs from diffs, heuristics, and Git metadata first. That keeps reviews working even when a search provider is down — and it is the right default for merge gates.
Chat needs interactive, question-driven exploration. Deterministic routing cannot answer ad-hoc “where does billing call Stripe?” questions the way a tool loop can.
runRelaceSearchOnRepository runs relace/relace-search with parallel tool calls, returns structured files and line anchors, and lets the user-facing model synthesise with receipts.
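What “structured files and line anchors” could look like on the wire — an assumed shape for illustration, not Critique’s actual payload contract:

```typescript
// Hypothetical evidence payload; field names are assumptions, not our API.
interface SearchEvidence {
  files: Array<{
    path: string;      // repo-relative, e.g. "src/billing/stripe.ts"
    startLine: number; // anchor for deep-linking in the chat UI
    endLine: number;
    reason: string;    // why the subagent believes this span matters
  }>;
  explanation: string; // the model's defence of its selection
}

const example: SearchEvidence = {
  files: [
    {
      path: "src/billing/stripe.ts",
      startLine: 42,
      endLine: 88,
      reason: "Stripe client wrapper and the only charge call site",
    },
  ],
  explanation: "Billing reaches Stripe through a single client wrapper.",
};
```

Because the synthesis model receives anchors rather than prose, it can cite `path:line` spans instead of hallucinating file names, and the UI can render each citation as a link.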
PART FIVE
When Not to Reach for Relace Search
- You refuse to implement (and secure) the tool harness — the model is not magic without file IO.
- Your “search” is really SQL over a warehouse — use SQL, not grep theatre.
- You need legal guarantees every chunk came from an index you control — agentic search is opportunistic exploration.
By contrast, it earns its keep when:
- Monorepos mean the right answer spans packages and tests.
- Oracle models are burning credits guessing filenames.
- Chat UX has users asking “where” and “why” before “change this line.”
The meta-lesson is architectural. Models like Relace Search are not a replacement for your lead reviewer or your Claude Opus synthesis pass. They are a scalpel — one step in a DAG. Use them where the bottleneck is discovery, not judgment.
The stages of that DAG, in order:
- Maps repo context, call sites, and risk zones
- Live task space every agent works from
- Security, tests, architecture, performance, docs
- Reads evidence, ranks severity, makes the call
- Turns findings into verified fixes
If you are evaluating this seriously, read Relace’s Fast Agentic Search documentation for the exact tool schemas, then cross-check the OpenRouter model card for pricing, context, and telemetry. Reproduce one harness in a throwaway repo before you bet your production merge gate on it. The future belongs to teams who treat search as infrastructure — not as a single slider labeled “top K.”
critique.sh ships the oracle + search split for real repos.
Connect GitHub, select a repository in chat, and watch the search subagent hand structured evidence to the model you already trust for synthesis. For merge gates, our deterministic Scout + specialist DAG is built to scale — with room to deepen via the same Relace patterns when you are ready.
Get started