EssayJune 8, 20268 min readCritique

Breaking the Dead Loop: What to Do When Cursor or Windsurf Refuses to Learn

Same broken fix five times in a row? Stop prompting harder. Name the loop, apply the two-strike rule, and let sandbox-verified tests decide what actually works.

You asked Cursor to fix the failing test. It edited the assertion instead of the bug. You said that did not work. It edited the assertion again with a different comment. Third try: it reverted your manual fix and re-applied the same patch. Fourth try: it apologized and did the same thing. Fifth try: you are typing in all caps and the agent is confidently wrong again. This is not a you problem. It is a dead loop — and more prompting is the wrong medicine.

Three loop types (and why each needs a different exit)

Dead loop taxonomy

Diagnose before you prompt again.

Loop type	What it looks like	Exit move
Hallucination loop	Agent insists the fix worked; file content or test output contradicts it	Run tests yourself. Paste raw stdout. Forbid further edits until it explains the mismatch.
Context rot	Conversation is long; agent forgets constraints, reintroduces old bugs	Fresh Composer chat. Attach only the failing test, the function under test, and the error log.
Infra hang	Terminal stuck, dev server zombie, Windsurf shell never returns	Kill the process. New terminal. Do not let the agent "wait a bit longer" indefinitely.

Most "stupid AI" moments are context rot mislabeled as model stupidity.

The two-strike rule

If the same hypothesis fails twice — same file, same approach, same error — you are in a loop. Strike one: allow one retry with new information (full stack trace, git diff, reproduction steps). Strike two: stop delegating. You change strategy: revert the agent's branch, shrink the task, or write the failing test yourself and ask the agent to make only that test pass.

Circuit breaker prompt

When you must stay in the same thread, paste a hard reset instruction. The goal is to force epistemic humility: no edits until the agent states what failed, what it tried, and what evidence would falsify its next guess.

Copy-paste circuit breaker

STOP. We are in a fix loop.

1. Quote the exact error from the last command output (do not paraphrase).
2. List each fix you already attempted in this chat.
3. State ONE hypothesis for the root cause.
4. Propose ONE minimal change.
5. Do not edit files until I confirm the hypothesis.

If you cannot explain why the last fix failed, say "I need the test output" and ask for it.

TDD as ground truth

Natural language acceptance criteria drift. Tests do not. A failing test named after the bug gives the agent a binary oracle: green or red, no vibes. Start by locking the regression in place — even if you write the test manually — then let the agent implement against it. If it greens the test by weakening the assertion, the diff tells you immediately.

For integration bugs, prefer a minimal reproduction script over "fix the auth flow." One command, one expected exit code. Agents handle narrow oracles better than sprawling refactors.

Fresh Composer session vs. infinite thread

Cursor Composer and similar surfaces accumulate stale tool results, wrong file snapshots, and abandoned plans. After two strikes, open a new composer with: (1) the error log, (2) relevant files @-mentioned, (3) explicit out-of-scope list ("do not touch middleware"). Windsurf users hitting terminal hangs should treat a frozen shell as infra loop — kill it, restart, re-run the single command yourself before asking the agent to continue.

Critique sandbox verification as loop-breaker

Local dev lies for the same reasons as in deployment: wrong OS, cached modules, agent never actually ran the test it claimed passed. Critique runs your branch in an isolated sandbox — install, build, test — and attaches the logs to review. When the agent says fixed and the sandbox says fail, the loop ends with evidence, not argument. Use it after agent-heavy PRs the way you would use CI, but scoped to the diff with review context.

When to escalate out of the IDE loop
1Did the same fix fail twice?
Apply two-strike rule. Do not prompt a third time without new evidence.
2Is the chat over ~20 turns on one bug?
Context rot likely. Fresh session with minimal attachments.
3Does the agent claim green without pasted output?
Hallucination loop. Run the command yourself or trigger sandbox verification.
4Is the terminal unresponsive?
Infra hang. Kill processes. Never stack fixes on a zombie dev server.

FAQ

Usually context rot or hallucination: the model does not reliably track prior failed attempts, or it predicts plausible code without verifying runtime output. Long threads and missing test logs make both worse.

If two attempts with the same approach fail, stop prompting. Change strategy: revert, shrink scope, write a failing test, or start a fresh session with only the evidence the agent needs.

Treat it as infra loop: cancel the command, kill zombie node/next processes, open a new terminal tab, and re-run one command manually before delegating again.

Yes. External build and test logs the agent cannot edit provide ground truth. Critique attaches sandbox results to the PR so "it works" claims are checkable.

Replace the fifth apology with a green build log

Sign up for Critique and run sandbox-verified checks on your next agent-written PR — before the dead loop becomes merge history.

Start free

Compare Critique

Compare the main AI code review options.

If this article is part of a buying process, these pages compare Critique with the tools most teams evaluate for GitHub PR review.

Best AI code review tools AI code review pricing

← All essays Privacy & Terms

Ask about this essay

Nemotron-3-Super

Ask about the argument, the evidence, the structure, or how the post connects to Critique.

Not editorial advice · The essay above is the source of truth · Not saved to your account · OpenRouter privacy