Prompt Engineering & Structured Output/Structured Output & Scale
Advanced · 10 min

Multi-Instance & Multi-Pass Review

Why self-review is unreliable, how independent Claude instances catch errors the generator missed, and how multi-pass architectures (per-file local analysis + cross-file integration) handle large PRs. Includes confidence self-reporting for calibrated review routing.

Quick Reference

  • Self-review limitation: Claude retains its reasoning context, making it less likely to catch its own errors
  • Independent review instances (separate conversations, no shared context) are significantly more effective
  • Multi-pass: per-file local analysis first, then cross-file integration pass for dependencies
  • Confidence self-reporting: Claude reports confidence per finding, enabling calibrated routing
  • Larger context windows do NOT solve attention quality issues -- they make them worse
  • For large PRs (10+ files), split into per-file analysis then synthesize results
  • The reviewer instance should NEVER see the generator's reasoning or chain-of-thought
  • Two-instance architecture: generator produces, independent reviewer critiques
  • Cost: multi-pass roughly doubles token usage but catches 2-3x more real issues
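The confidence self-reporting item above implies a routing step on the consumer side. A minimal sketch of that routing, assuming each finding arrives as a dict with a numeric `confidence` field; the field names and thresholds are illustrative, not a fixed schema:

```python
# Sketch: route review findings by the reviewer's self-reported confidence.
# Thresholds and field names are illustrative assumptions, not a fixed API.

def route_findings(findings, auto_threshold=0.8, drop_threshold=0.3):
    """Split findings into auto-surfaced, human-triage, and discarded buckets."""
    auto, human, dropped = [], [], []
    for f in findings:
        c = f["confidence"]
        if c >= auto_threshold:
            auto.append(f)       # high confidence: surface directly on the PR
        elif c >= drop_threshold:
            human.append(f)      # medium confidence: queue for human triage
        else:
            dropped.append(f)    # low confidence: likely noise; log, don't surface
    return auto, human, dropped

findings = [
    {"issue": "SQL built by string concatenation", "confidence": 0.95},
    {"issue": "possible off-by-one in pagination", "confidence": 0.55},
    {"issue": "style: variable name could be clearer", "confidence": 0.20},
]
auto, human, dropped = route_findings(findings)
print(len(auto), len(human), len(dropped))  # → 1 1 1
```

Calibrating the two thresholds against a sample of hand-labeled findings is what makes the routing "calibrated" rather than arbitrary.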

Why Self-Review Fails

A natural first attempt at quality assurance is asking Claude to review its own output: 'Now review what you just wrote and fix any issues.' This consistently underperforms because of a structural limitation: Claude retains the reasoning context that led to the original output, so it is predisposed to judge the same answer as correct.

Use a separate Claude instance for review, not the one that generated the output

The generator instance has seen the reasoning chain that led to the output. It will rationalize its own choices. An independent reviewer instance sees only the output and the original requirements -- it approaches the review without confirmation bias from the generation process.
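The isolation described above can be enforced in the orchestration code itself. Below is a sketch of the two-instance pattern; `call_claude` is a hypothetical placeholder for a real single-turn API call (in practice, one separate request per instance). The point is that each function starts a fresh conversation, and the reviewer's prompt is built only from the requirements and the finished output:

```python
# Sketch of the two-instance architecture. call_claude is a stub standing in
# for a real model call; each invocation is a *separate* conversation with no
# shared history between generator and reviewer.

def call_claude(system: str, user: str) -> str:
    """Placeholder for a single-turn model call (hypothetical)."""
    return f"[model reply to: {user[:40]}...]"

def generate(requirements: str) -> str:
    # Instance 1: sees only the requirements.
    return call_claude(
        system="You are a senior engineer. Write code meeting the requirements.",
        user=requirements,
    )

def review(requirements: str, output: str) -> str:
    # Instance 2: a fresh conversation. It receives the requirements and the
    # finished output -- never the generator's reasoning or chain-of-thought.
    return call_claude(
        system="You are a skeptical code reviewer. List concrete defects.",
        user=f"Requirements:\n{requirements}\n\nSubmission:\n{output}",
    )

requirements = "Parse ISO-8601 dates; reject invalid input with ValueError."
draft = generate(requirements)
critique = review(requirements, draft)
```

Note what `review` does not receive: the generator's system prompt, its intermediate reasoning, or any prior turns. That omission is the mechanism, not an optimization.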

Approach                                 | Error detection rate | Why
Same-instance self-review                | ~15-25%              | Retains generation context; rationalizes own choices
Same-instance with explicit checklist    | ~25-35%              | Checklist helps but confirmation bias persists
Independent instance review              | ~45-60%              | No shared context; approaches output fresh
Independent instance + explicit criteria | ~55-70%              | Fresh perspective combined with specific review rules
Multi-pass (local + cross-file)          | ~65-80%              | Catches both local and cross-cutting issues

The numbers above are approximate and vary by task, but the pattern is consistent: independent review catches roughly 2-3x more real issues than self-review. The cost is roughly 2x tokens, which is almost always worth paying for code that ships to production.
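The multi-pass row in the table corresponds to a two-stage pipeline: one local pass per changed file, then a single integration pass over the per-file results. A minimal sketch of that orchestration, with `review_file` and `review_integration` as stubs standing in for what would be independent model calls in practice:

```python
# Sketch of a multi-pass review pipeline. In a real system, review_file and
# review_integration would each be separate model calls; here they are stubs
# so the control flow is visible.

def review_file(path: str, diff: str) -> dict:
    """Pass 1 (local): findings confined to a single file's diff."""
    return {"file": path, "findings": [f"stub local finding for {path}"]}

def review_integration(summaries: list) -> list:
    """Pass 2 (cross-file): contract mismatches, broken callers, dependency drift."""
    touched = [s["file"] for s in summaries]
    return [f"check cross-file contracts among: {', '.join(touched)}"]

def multi_pass_review(diffs: dict) -> dict:
    # Per-file passes keep each prompt small, so attention quality stays high
    # even on PRs too large to review in one window.
    local = [review_file(path, diff) for path, diff in diffs.items()]
    cross = review_integration(local)
    return {"local": local, "cross_file": cross}

pr = {"api/routes.py": "...diff...", "models/user.py": "...diff..."}
report = multi_pass_review(pr)
```

Because each local pass sees only one file, this also sidesteps the attention degradation noted earlier: a larger context window would fit the whole PR, but splitting keeps each pass sharp.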