Skip to content
Essay8 min readCritique

Breaking the Dead Loop: What to Do When Cursor or Windsurf Refuses to Learn

Same broken fix five times in a row? Stop prompting harder. Name the loop, apply the two-strike rule, and let sandbox-verified tests decide what actually works.

You asked Cursor to fix the failing test. It edited the assertion instead of the bug. You said that did not work. It edited the assertion again with a different comment. Third try: it reverted your manual fix and re-applied the same patch. Fourth try: it apologized and did the same thing. Fifth try: you are typing in all caps and the agent is confidently wrong again. This is not a you problem. It is a dead loop — and more prompting is the wrong medicine.

Dead loop taxonomy

Diagnose before you prompt again.

Loop typeWhat it looks likeExit move
Hallucination loopAgent insists the fix worked; file content or test output contradicts itRun tests yourself. Paste raw stdout. Forbid further edits until it explains the mismatch.
Context rotConversation is long; agent forgets constraints, reintroduces old bugsFresh Composer chat. Attach only the failing test, the function under test, and the error log.
Infra hangTerminal stuck, dev server zombie, Windsurf shell never returnsKill the process. New terminal. Do not let the agent "wait a bit longer" indefinitely.

Most "stupid AI" moments are context rot mislabeled as model stupidity.

If the same hypothesis fails twice — same file, same approach, same error — you are in a loop. Strike one: allow one retry with new information (full stack trace, `git diff`, reproduction steps). Strike two: stop delegating. You change strategy: revert the agent's branch, shrink the task, or write the failing test yourself and ask the agent to make only that test pass.

When you must stay in the same thread, paste a hard reset instruction. The goal is to force epistemic humility: no edits until the agent states what failed, what it tried, and what evidence would falsify its next guess.

Copy-paste circuit breaker

STOP. We are in a fix loop.

1. Quote the exact error from the last command output (do not paraphrase).
2. List each fix you already attempted in this chat.
3. State ONE hypothesis for the root cause.
4. Propose ONE minimal change.
5. Do not edit files until I confirm the hypothesis.

If you cannot explain why the last fix failed, say "I need the test output" and ask for it.

Natural language acceptance criteria drift. Tests do not. A failing test named after the bug gives the agent a binary oracle: green or red, no vibes. Start by locking the regression in place — even if you write the test manually — then let the agent implement against it. If it greens the test by weakening the assertion, the diff tells you immediately.

For integration bugs, prefer a minimal reproduction script over "fix the auth flow." One command, one expected exit code. Agents handle narrow oracles better than sprawling refactors.

Cursor Composer and similar surfaces accumulate stale tool results, wrong file snapshots, and abandoned plans. After two strikes, open a new composer with: (1) the error log, (2) relevant files @-mentioned, (3) explicit out-of-scope list ("do not touch middleware"). Windsurf users hitting terminal hangs should treat a frozen shell as infra loop — kill it, restart, re-run the single command yourself before asking the agent to continue.

Local dev lies for the same reasons as in deployment: wrong OS, cached modules, agent never actually ran the test it claimed passed. Critique runs your branch in an isolated sandbox — install, build, test — and attaches the logs to review. When the agent says fixed and the sandbox says fail, the loop ends with evidence, not argument. Use it after agent-heavy PRs the way you would use CI, but scoped to the diff with review context.

When to escalate out of the IDE loop
  1. 1
    Did the same fix fail twice?
    Apply two-strike rule. Do not prompt a third time without new evidence.
  2. 2
    Is the chat over ~20 turns on one bug?
    Context rot likely. Fresh session with minimal attachments.
  3. 3
    Does the agent claim green without pasted output?
    Hallucination loop. Run the command yourself or trigger sandbox verification.
  4. 4
    Is the terminal unresponsive?
    Infra hang. Kill processes. Never stack fixes on a zombie dev server.
Usually context rot or hallucination: the model does not reliably track prior failed attempts, or it predicts plausible code without verifying runtime output. Long threads and missing test logs make both worse.
If two attempts with the same approach fail, stop prompting. Change strategy: revert, shrink scope, write a failing test, or start a fresh session with only the evidence the agent needs.
Treat it as infra loop: cancel the command, kill zombie node/next processes, open a new terminal tab, and re-run one command manually before delegating again.
Yes. External build and test logs the agent cannot edit provide ground truth. Critique attaches sandbox results to the PR so "it works" claims are checkable.

Replace the fifth apology with a green build log

Sign up for Critique and run sandbox-verified checks on your next agent-written PR — before the dead loop becomes merge history.

Start free