Critique.sh

A note from the Critique team

Dear Investors,

This is not a pitch deck. It is an honest account of what we are building, why we believe it matters, and what the opportunity looks like from where we sit.

01. The problem

AI made writing code trivially fast. Nobody fixed the review layer.

Every engineering team we have talked to is feeling the same pressure: pull requests are arriving faster, diffs are getting larger, and reviewers are burning through attention just to keep up — let alone provide meaningful feedback.

The AI productivity narrative focuses almost entirely on generation. Write a function. Autocomplete a test. Ask the model to explain the code. But velocity without a quality gate is just a faster path to production bugs.

Traditional CI answers the question: does this build? It cannot answer the harder question: should this merge? That is the gap. And it compounds directly with AI adoption — the more code agents write, the more output lands in review queues that human review processes were never designed to handle at this scale.

02. The insight

Review is not a bot. It is an architecture.

Most AI review tools throw a single model at a diff and call it done. The model has no memory of your codebase, no awareness of what the PR could break, and no principled way to express confidence in its findings.

We approached this differently. We modelled the review process the way a skilled engineering team would run it: one scout who understands the blast radius, parallel specialists who investigate different risk surfaces, and a lead who reads all the evidence before writing a verdict.

Every PR opens a shared investigation board. Scout maps the codebase context — callers, contracts, tests, adjacent modules — not just the diff. Specialists claim lanes (security, tests, architecture, performance) and work in parallel, emitting machine-readable findings with severity, confidence, and line-level evidence. The lead reasoning model reads the full trail and produces a structured verdict that can drive a required GitHub check. The result behaves like a disciplined review squad, not a text generator.
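
To make the shape of that evidence concrete, here is a minimal TypeScript sketch of what a machine-readable finding and the resulting verdict could look like. The type names, fields, and values are illustrative assumptions, not Critique's actual schema.

```typescript
// Illustrative shapes only -- not Critique's actual schema.
type Lane = "security" | "tests" | "architecture" | "performance";
type Severity = "info" | "minor" | "major" | "critical";

interface Finding {
  lane: Lane;           // which specialist claimed this lane
  severity: Severity;
  confidence: number;   // 0..1, the specialist's stated confidence
  file: string;
  line: number;         // line-level evidence anchor
  summary: string;
  evidence: string[];   // excerpts pulled from the investigation board
}

interface Verdict {
  status: "PASS" | "FAIL";
  findings: Finding[];
  rationale: string;    // written by the lead model after reading the full trail
}
```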

03. The market

AI will write a large share of the world's software. Review quality becomes the safeguard.

GitHub reported more than 100 million developers in 2024. Copilot and its successors are accelerating output across that entire base. The emerging bottleneck is not generation — it is judgment.

The tools that win in this shift will be the ones that help teams understand, review, and trust what AI writes on their behalf. That is not an IDE plugin. It is a review infrastructure layer that sits between every commit and every production environment.

We are not competing with Copilot or Cursor for the seat inside the editor. We are the gate at the boundary where code becomes a company's liability.

04. The product

A GitHub App that engineers install in ten minutes and teams trust on day one.

Critique is a GitHub App. It responds to pull request events, runs the full multi-agent review pipeline, and writes a structured verdict directly into the GitHub Checks UI — optionally as a required check that blocks merge until the verdict is PASS.
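
To illustrate how a verdict can drive that check, here is a hedged sketch using Probot and the GitHub Checks API. The handler and the `runReviewPipeline` function are hypothetical stand-ins, not Critique's actual integration code.

```typescript
// Sketch only: turning a review verdict into a GitHub check run via Probot.
import { Probot } from "probot";

// Hypothetical stand-in for the multi-agent pipeline described above.
declare function runReviewPipeline(pr: unknown): Promise<{ status: "PASS" | "FAIL"; rationale: string }>;

export default (app: Probot) => {
  app.on(["pull_request.opened", "pull_request.synchronize"], async (context) => {
    const pr = context.payload.pull_request;
    const verdict = await runReviewPipeline(pr);

    // POST /repos/{owner}/{repo}/check-runs
    await context.octokit.checks.create(
      context.repo({
        name: "critique/review",
        head_sha: pr.head.sha,
        status: "completed",
        conclusion: verdict.status === "PASS" ? "success" : "failure",
        output: {
          title: `Critique verdict: ${verdict.status}`,
          summary: verdict.rationale,
        },
      })
    );
  });
};
```

Making the check required so it blocks merge is then a branch-protection or ruleset setting on the repository, not something the handler itself enforces.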

The dashboard gives teams visibility into every review run: findings by severity, specialist attribution, evidence packs, policy configuration per repository or installation. Engineers can configure which specialists run, how strict review should be, and what models the lead and sub-agents use.
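
As a rough illustration of that configuration surface, a per-repository policy could look something like the sketch below; the field names and model labels are assumptions, not Critique's actual settings format.

```typescript
// Hypothetical per-repository review policy -- field names and values are illustrative.
const reviewPolicy = {
  specialists: ["security", "tests", "architecture"], // which lanes run on this repo
  strictness: "high",                 // how aggressively findings fail the verdict
  models: {
    lead: "frontier-reasoning",       // placeholder labels, not real model identifiers
    specialists: "fast-general",
  },
  requiredCheck: true,                // block merge until the verdict is PASS
};
```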

Critique Chat sits alongside the review product — a repository-grounded AI that can answer questions about your codebase without consuming review credits. It is how teams stay productive between PRs and how we retain engagement between review events.

Remedy closes the loop. When Critique flags a problem, Remedy can spin up an ephemeral environment, apply the fix against the structured finding, run lint, tests, and build, then push directly to the PR branch — bounded to two loops to prevent runaway execution.
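
A minimal sketch of that bounded loop, under the assumption that each step is a discrete call (every helper below is hypothetical):

```typescript
// Illustrative control flow for Remedy's bounded fix loop -- all helpers are hypothetical.
declare function createEphemeralEnvironment(prBranch: string): Promise<{ destroy(): Promise<void> }>;
declare function applyFix(env: object, finding: object): Promise<void>;
declare function runChecks(env: object, steps: string[]): Promise<boolean>;
declare function pushToPrBranch(env: object): Promise<void>;

const MAX_ATTEMPTS = 2; // bounded to two loops to prevent runaway execution

async function remedy(prBranch: string, finding: { file: string; line: number; summary: string }) {
  const env = await createEphemeralEnvironment(prBranch);
  try {
    for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      await applyFix(env, finding);                          // edit against the structured finding
      const passed = await runChecks(env, ["lint", "test", "build"]);
      if (passed) {
        await pushToPrBranch(env);                           // commit lands on the PR branch
        return true;
      }
    }
    return false; // hand the problem back to a human after two failed attempts
  } finally {
    await env.destroy();
  }
}
```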

05. The model

Pay for what you use. Credits, not seats.

We sell monthly credit pools — not per-seat licenses, not flat PR caps. One credit is a normalised review slice, roughly anchored to 100k input / 15k output tokens at lead-model quality. A typical PR costs 3–50 credits depending on size and the model stack the team configures.
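
As a back-of-the-envelope illustration of that normalisation: the 100k/15k anchors come from the paragraph above, while the weighting rule and example token counts are assumptions for the sketch.

```typescript
// Illustrative credit arithmetic -- anchors are from the pricing above,
// the weighting rule and example numbers are assumptions for the sketch.
const INPUT_TOKENS_PER_CREDIT = 100_000;  // input anchor at lead-model quality
const OUTPUT_TOKENS_PER_CREDIT = 15_000;  // output anchor at lead-model quality

function estimateCredits(inputTokens: number, outputTokens: number): number {
  const slices = Math.max(
    inputTokens / INPUT_TOKENS_PER_CREDIT,
    outputTokens / OUTPUT_TOKENS_PER_CREDIT
  );
  return Math.max(1, Math.ceil(slices));
}

// A mid-sized PR that burns ~800k input and ~90k output tokens across all agents
// lands around 8 credits under this sketch -- inside the 3-50 range quoted above.
estimateCredits(800_000, 90_000); // => 8
```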

This aligns our revenue directly with usage and value delivered. Teams that review more pay more. Teams that route cheaper models for standard PRs and reserve frontier models for security-critical paths pay less per review. We capture the economics of model choice rather than fighting against it.

Standard, Pro, and Ultra tiers scale from 500 to 10,000 credits per month. Enterprise buyers get custom volume, SLAs, and on-premise hosting options. Students and OSS maintainers get a curated low-cost lane with unlimited repository indexing.

06. Why now

The window between AI velocity and AI governance is opening fast.

The enterprises adopting AI coding tools in 2025 are discovering a hard truth: they cannot audit what they cannot review. Compliance teams, security leads, and CTOs are starting to ask pointed questions about what is actually shipping — and who is responsible for it.

The open-source model ecosystem is moving quickly toward competitive performance at dramatically lower cost. That creates exactly the conditions where a smart routing layer — one that directs the right model to the right task at the right cost — becomes structurally valuable. We built that routing into the core of the review pipeline from day one.
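
A toy example of what that kind of routing rule could look like; the tiers, thresholds, and inputs are assumptions for illustration, not the actual routing logic.

```typescript
// Toy routing rule -- tiers and thresholds are illustrative assumptions.
type ModelTier = "open-weight" | "mid-tier" | "frontier";

function routeModel(lane: string, touchesSensitivePath: boolean, diffLines: number): ModelTier {
  if (lane === "security" || touchesSensitivePath) return "frontier"; // reserve frontier models for risk
  if (diffLines > 800) return "mid-tier";                             // bigger diffs get more capability
  return "open-weight";                                               // cheap models for routine PRs
}
```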

We believe the next two years will define which tools become review infrastructure. The teams building enterprise software pipelines today are evaluating this class of product now. We intend to be the answer they reach for.

07. What we are building

We are not building another autocomplete. We are building review infrastructure for the AI era.

Short term: a GitHub App that engineering teams install the week they adopt an AI coding tool, because they immediately feel the review pressure that follows.

Medium term: configurable agent orchestration where teams define policy — which specialists run on which paths, what model quality level, what strictness threshold — and the platform enforces it across every PR without human intervention.
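
One way such path-scoped policy could eventually be expressed; the format, globs, and thresholds here are entirely hypothetical.

```typescript
// Hypothetical path-scoped orchestration policy -- globs, lanes, and thresholds are illustrative.
const orchestrationPolicy = [
  { paths: ["services/payments/**"], specialists: ["security", "tests"], lead: "frontier", blockOn: "major" },
  { paths: ["docs/**"], specialists: [], lead: "fast", blockOn: "never" },
  { paths: ["**"], specialists: ["tests", "architecture"], lead: "mid-tier", blockOn: "critical" },
];
```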

Long term: a trust layer for AI-authored software. The layer that tells your CTO, your compliance team, and your board that every line merged has been reviewed by a system designed for the scale AI generates at.

If you are thinking about this problem — or you know someone who is — we would like to talk.

Connect

LinkedIn: Repath Khan
Book a call: pick a 15-minute slot via Cal.com.

Get in touch

Interested in what we're building?

We are early-stage and talking to people who take the AI governance problem seriously. If that is you, reach out directly — no deck required.

Critique.sh is in active beta. We have a real GitHub App, a real review pipeline, and real teams using it. We are not a slide deck.