EssayMarch 9, 202614 min readThe critique.sh Team

STATE OF AI ENGINEERING · 2026: The Claude Code Inflection Point & What It Means for the PR Review Crisis

Inside the fastest adoption curve in developer tooling history — and why the bottleneck has moved from writing code to reviewing it.

29M

Daily VS Code installs — up from 17.7M in Jan 2026

Of all public GitHub commits now authored by Claude Code

$2.5B

Annualised run-rate revenue by February 2026

73%

Of engineering teams using AI coding tools daily in 2026

I want to be honest about something we don't say enough as founders: the product we're building at critique.sh was made necessary by a tool we deeply admire. That tool is Claude Code. And in the first quarter of 2026, it has done something unprecedented — it has outpaced developers' ability to review the code it writes.

This piece is my attempt to document what's actually happening in software engineering right now, from where we sit as a team building infrastructure for the new agentic development era. It is partly a love letter to what Anthropic has built, partly an honest market analysis, and partly a founder's confession about why the problem we're solving is more urgent than we originally thought when we started critique.sh.

The short version: AI has solved the code generation problem faster than anyone expected. It has not solved the code quality problem. That gap — between the velocity of AI-assisted writing and the bottleneck of AI-assisted review — is the exact seam critique.sh was built to close.

PART ONE

The Fastest Adoption Curve in Developer Tooling History

There is a phrase being used in the industry right now — "Claude Code is having its ChatGPT moment." Having watched this unfold in real time, I can tell you: that framing undersells it.

When ChatGPT launched in November 2022, it took roughly two months to reach 100 million users. The comparison is made because the cultural penetration felt sudden, like a threshold had been crossed overnight. Claude Code's trajectory rhymes with that feeling, but the vector is different. This isn't consumer adoption — this is professional infrastructure adoption, which historically moves far slower.

The numbers themselves are staggering in context. Claude Code hit a $1 billion annualised run-rate just six months after launch — faster than ChatGPT, faster than Slack, faster than any enterprise software product in recorded history. By February 2026, that figure had already crossed $2.5 billion. Anthropic's total revenue trajectory reads like a typo: $1B ARR in late 2024, $4B by mid-2025, $9B by year-end, $14B in February 2026. That is 14× growth in 14 months.

What makes this different from prior hype cycles is where the adoption is concentrated. According to a 15,000-developer survey conducted in early 2026, 73% of engineering teams now use AI coding tools daily — up from 41% in 2025 and just 18% in 2024. Crucially, Claude Code leads for complex work: when developers were asked which tool they rely on for multi-file refactoring, architecture design, and debugging hard bugs, Claude came in first at 44%. That is the category where trust is hardest to earn and matters most.

MID 2025

$1B ARR milestone

Claude Code becomes the fastest enterprise software product to reach $1B in annualised revenue. Six months from launch to unicorn revenue run-rate.

NOVEMBER 2025

Bun acquisition signals developer infrastructure ambitions

Anthropic acquires Bun — a JavaScript runtime with 7M monthly downloads — integrating it directly into Claude Code's deployment pipeline.

JANUARY 12, 2026

Cowork launches in 10 days

Four engineers built Claude Code for general computing in under two weeks — using Claude Code itself. The total addressable market for agentic tooling suddenly extends far beyond 28M professional developers.

JANUARY–MARCH 2026

Agent Teams ships; the architecture goes multi-agent

With Opus 4.6, Claude Code's Agent Teams feature removes the coordinator bottleneck. Teammates now message each other directly, claim tasks from a shared list, and work in true parallel. The architecture increasingly resembles what we built at critique.sh — by necessity.

MARCH 2026

4% of all public GitHub commits

SemiAnalysis estimates Claude Code now authors 4% of all public commits on GitHub. Their projection: 20%+ by end of 2026. The PR review crisis becomes an engineering leadership crisis.

One developer with Claude Code can now do what took a team a month. The cost of a Max subscription is $200/month. The median fully-loaded US knowledge worker costs $350–500 per day.

PART TWO

Why Claude Code Is Categorically Different

Understanding why Claude Code has broken from the pack requires understanding what it is — and what it isn't. Most AI coding tools, including early GitHub Copilot, are stateless IDE extensions: sophisticated autocomplete. Every interaction begins fresh. The context is limited to the current file, maybe the current directory.

Claude Code is something fundamentally different. It reads and writes files directly. It executes bash commands. It maintains state across sessions and stores knowledge in files, building up context and spatial awareness over time. It coordinates multi-step processes that span hours or days. As one early adopter put it on Bloomberg's Odd Lots podcast: "It's more like hiring a junior developer than using autocomplete."

The terminal-native design — initially treated as a limitation compared to IDE integrations — turns out to be its architectural advantage. It enables persistent state management. It removes the IDE as a constraint on what can be automated. It makes Claude Code composable with CI/CD pipelines in ways that IDE extensions simply cannot be.

The real-world evidence is extraordinary. Rakuten engineers used Claude Code on a genuinely hard task: implementing an activation vector extraction method in vLLM, a 12.5-million-line codebase. Claude Code finished in seven hours of autonomous work with 99.9% numerical accuracy. TELUS teams created over 13,000 custom AI solutions and shipped engineering code 30% faster, saving over 500,000 hours. Zapier achieved 89% AI adoption across their entire 800-person organisation.

Individual developers report a 164% increase in story completion. Some have nearly doubled their pull request merge rates. The productivity research is extraordinarily consistent: 26–55% improvements in task completion speed, with experienced engineers seeing the largest gains.

PART THREE

The Problem We Never Saw Coming: The Review Bottleneck

Here is the thing nobody was talking about when the AI coding wave began: the bottleneck was never going to stay at code generation. It was always going to move.

When you give an engineering team a tool that multiplies their output by 5–10×, you have not eliminated the constraint on velocity. You have relocated it. And in 2026, that constraint has relocated squarely onto the pull request review process.

Think through the arithmetic. If an engineer previously opened 4 PRs per week and now opens 20 — because Claude Code handled the implementation — the review load on the team has quintupled. The senior engineers who are best qualified to review are also the ones most likely to be using AI to accelerate their own output. Everyone is producing more. Nobody has more time to review.

This is not a hypothetical. Teams building with Claude Code are describing it in real time. PR volume and size immediately jump when AI coding tools are introduced. Line-by-line human review becomes, as one engineering leader put it bluntly, "theater: slow, inconsistent, and blind to subtle security drift."

The naive response is to also use Claude Code for reviews — and many teams are doing exactly this. Individual developers are spinning up 9 parallel subagents, each focused on a specific dimension of code quality: security, performance, test coverage, architectural drift, business logic correctness. The results are genuinely promising — one team reports that AI review suggestions are "~75% useful," significantly better than earlier generations of tools.

But ad-hoc, bespoke subagent review scripts built in individual developers' CLAUDE.md files are not enterprise infrastructure. They are duct tape. They lack context depth, governance, cost controls, auditability, and the kind of organisational trust that comes from standardised process. They solve the problem for one developer on one codebase. They do not scale.

Diff-only review misses
Call sites touched by the change
Surrounding tests that should have been updated
Dependency drift across modules
Hidden auth and security boundary implications
Silent regressions in related logic
Repo-aware review catches
Full impact zones mapped before review starts
Related tests and missing coverage surfaced
Dependency and module boundary analysis
Auth and security surface inspection
Architectural drift and constraint violations

Old review flow
Developer→PR opened→Single AI pass→Human skims→Merge
With Critique
AI agent writes code→PR opened→Scout maps repo→Shared board opens→Specialists in parallel→Lead Reviewer verdict→Remedy (optional)→Safe merge

NAIVE
Single-Prompt Bot. One model reads the diff. No codebase context. Generic feedback. 40–50% noise rate. No severity ranking. No fix capability.
GAP
Context + Coordination. The diff is 3% of what matters. The other 97% is what the change touches, breaks, or misses across the whole system.
RIGHT
Multi-Agent Control Plane. Scout + Specialist parallelism + High-reasoning synthesis. Evidence-based, cost-routed, autonomously resolvable.

The core insight came from watching how the best human reviewers actually work. A great senior engineer doesn't just read the diff. They pull up three related files the PR author didn't change. They trace the call graph. They ask what happens to the edge cases nobody wrote tests for. They cross-reference the change against the last security incident. That contextual intelligence — spatial awareness of the whole system — is what naive AI review bots completely lack.

So we built the architecture that enables it. Three layers, working in concert.

How Critique reviews a pull request
01Scout
Maps repo context, call sites, and risk zones
↓
02Shared Investigation Board
Live task space every agent works from
↓
03Specialists in Parallel
Security, tests, architecture, performance, docs
↓
04Lead Reviewer
Reads evidence, ranks severity, makes the call
↓
05Remedyoptional
Turns findings into verified fixes

PART FOUR

Model-Flexible Economics: Why One-Size-Fits-All AI Review Fails

One of the most underappreciated dynamics in enterprise AI adoption is that the cost structure of LLMs creates an impossible tradeoff with naive architectures. If you route every PR through Claude Opus or GPT-5.5 Pro for the depth of analysis you actually want, your LLM costs become prohibitive at scale. If you use cheap models on everything, your review quality degrades to the point where it misses exactly the issues that matter most.

Teams end up doing neither — they achieve partial coverage, or they use AI review inconsistently, defeating the purpose of an intelligent merge gate.

The right mental model is one of risk-proportional resource allocation. A routine frontend change updating copy on a marketing page deserves fast, cheap, thorough-enough analysis. A change to your authentication middleware touching token validation logic deserves the highest-reasoning model available, regardless of cost. The economics work out — you spend aggressively where it matters, not everywhere.

Model-Flexible Economics: Why One-Size-Fits-All AI Review Fails
PR TypeModel StackCredit RangeUse Case
STANDARD REVIEWGLM-5 / MiniMax M2.71–5 creditsHigh-volume everyday PRs — style, tests, logic basics
ELEVATED REVIEWClaude 3.5 Sonnet8–20 creditsFeature PRs with architectural implications
CRITICAL REVIEWClaude Opus 4.6 / GPT-5.2 Pro / GPT-5.5 Pro37–237 creditsInfrastructure, auth, payment, security-sensitive changes
BYOA MODEYour existing subscriptions0 creditscritique.sh generates the fix blueprint; your Claude Code / Codex / Copilot executes

PR Type	Model Stack	Credit Range	Use Case
STANDARD REVIEW	GLM-5 / MiniMax M2.7	1–5 credits	High-volume everyday PRs — style, tests, logic basics
ELEVATED REVIEW	Claude 3.5 Sonnet	8–20 credits	Feature PRs with architectural implications
CRITICAL REVIEW	Claude Opus 4.6 / GPT-5.2 Pro / GPT-5.5 Pro	37–237 credits	Infrastructure, auth, payment, security-sensitive changes
BYOA MODE	Your existing subscriptions	0 credits	critique.sh generates the fix blueprint; your Claude Code / Codex / Copilot executes

This model-flexible architecture — which we call the credit system — is what allows teams to achieve 100% PR coverage without exploding their LLM budget. The Scout and Specialists run on cost-effective models. The Lead Oracle escalates to premium reasoning only when the risk profile warrants it. The economics work.

The BYOA (Bring Your Own Agent) layer deserves special mention. Many of the teams we talk to already have Claude Code subscriptions, Codex seats, or Copilot Enterprise licences. They've invested in those tools. They don't want to replace them — they want to orchestrate them intelligently. critique.sh generates the fix blueprint from deep-context analysis; if a team chooses its own execution stack, the handoff can happen outside Critique. This modularity is not a compromise — it's a recognition that the market will be heterogeneous for years.

PART FIVE

Remedy: From Critique to Autonomous Resolution

The natural question, once you have a reliable review signal, is: why stop at feedback? If the review identifies a missing edge-case test, a linting error, or a straightforward security pattern violation — all of which are deterministic, verifiable fixes — why not just fix it?

This is the thinking behind Remedy, our autonomous cloud coding agent. When the Lead Oracle identifies a fixable issue and classifies it as resolvable, Remedy boots an ephemeral, isolated cloud VM. It downloads the repository, verifies the latest git state, writes the fix, and runs the local test suite and build process to verify the patch is functional. Once verified, it pushes the commit back to the PR branch when the GitHub App can access the PR head repository; otherwise Critique stops at the review artifact instead of attempting a push.

The design constraint we were most deliberate about: Remedy operates within a strict two-loop autonomous limit. We have seen what happens when AI agents run without checkpoints — they drift, they over-correct, they introduce new problems while solving old ones. The two-loop limit is not a technical constraint; it is a governance constraint. AI that knows when to stop and hand back to humans is more trustworthy than AI that never stops.

The Remedy architecture also reflects something we've learned from watching Claude Code's Agent Teams develop: the most dangerous moment in agentic systems is when a specialist agent completes a task and the coordinator doesn't synthesise correctly. Remedy's ephemeral VM model means each fix is isolated — if it fails validation, the VM is discarded. There is no state contamination. There is no runaway accumulation of AI decisions.

PART SIX

Where This Goes: The 2026 Prediction I'm Prepared to Make

SemiAnalysis published a projection in early 2026 that I think about often: Claude Code currently authors 4% of all public GitHub commits. By end of 2026, they project it will exceed 20%. That would mean that one in five commits on GitHub is written by an AI.

I think that projection is credible. The adoption curve has not shown any signs of inflection. Enterprise deals are accelerating — 300,000+ business customers, 70% of Fortune 100 companies using Claude, Accenture deploying to 30,000 employees. The developer survey data shows AI coding tools crossing from early-adopter to professional standard. There is no obvious reason for the curve to bend downward before it reaches that level of penetration.

What this means for code quality infrastructure is significant. If 20% of commits are AI-generated by year-end, the teams that built reliable, scalable review infrastructure in the first half of 2026 will have compounding advantages over those that didn't. Review quality is not just about catching bugs — it is about maintaining the institutional knowledge encoded in your codebase, enforcing architectural contracts, and ensuring that the AI's output genuinely reflects your system's actual constraints.

The eight trends Anthropic themselves published in their 2026 Agentic Coding Trends Report are telling. They identify four priority areas for organisations planning this year: mastering multi-agent coordination, scaling human-agent oversight through AI-automated review, extending agentic coding beyond engineering teams, and embedding security architecture from the earliest stages. Three of those four directly intersect with what critique.sh does.

The organisations pulling ahead, as Anthropic's own analysis notes, are not the ones removing engineers from the loop. They are the ones making engineer expertise count where it matters most — architecture, system design, strategic decisions. The tactical execution has been delegated to agents. The judgment calls remain with humans.

critique.sh is built for that world. A world where code generation is abundant and cheap, where review is scarce and expensive, and where the merge gate is the critical interface between AI velocity and production safety.

CLOSING

A Note on Building in the Same Wave You're Studying

There is something strange about building review infrastructure for AI-generated code while using AI to generate the infrastructure itself. We use Claude Code extensively internally. Parts of critique.sh's own codebase have been touched by the same multi-agent patterns we review. This creates an interesting epistemic loop.

What I've found is that this experience is the best possible training for understanding what we're building. When you live inside the production velocity that Claude Code enables, you understand viscerally why the review bottleneck is real. When you watch your own team's PR volume quadruple in a quarter, you stop theorising about the problem and start engineering against it.

The thing I am most confident about heading into the rest of 2026: the teams that win will not be the ones that adopted AI coding tools. Those tools are now table stakes — 73% of engineering teams are already there. The teams that win will be the ones that built the quality infrastructure to match the velocity. Generation without rigour is just technical debt at machine speed.

That's the problem we're here to solve. And judging by the data, it's never been more urgent.

critique.sh is open to try.

Critique Chat and the GitHub App beta are live — start free, connect repos when you want full review automation.

Get started → critique.sh

Compare Critique

Compare the main AI code review options.

If this article is part of a buying process, these pages compare Critique with the tools most teams evaluate for GitHub PR review.

Best AI code review tools AI code review pricing

← All essays Privacy & Terms

Ask about this essay

Nemotron-3-Super

Ask about the argument, the evidence, the structure, or how the post connects to Critique.

Not editorial advice · The essay above is the source of truth · Not saved to your account · OpenRouter privacy