★ OverviewIntermediate9 min

The Token Budget — What Costs What

Your context window is a finite resource that starts draining before you type a word. Understanding what loads at startup, what accumulates per turn, and how the 1M window changes the math is the foundation of every context engineering decision.

Quick Reference

→System prompt consumes ~4,200 tokens before anything else
→Each MCP server injects all its tool schemas upfront — disable irrelevant ones
→CLAUDE.md files load at full size on every session start
→Reading a 2,000-line file costs ~15,000–20,000 tokens
→1M context available on Opus 4.7, Opus 4.6, and Sonnet 4.6 — standard pricing beyond 200K
→Max/Team/Enterprise: Opus 1M is automatic. Pro: requires opt-in
→Context quality matters more at 1M — signal gets diluted, not saved
→Use CLAUDE_CODE_DISABLE_1M_CONTEXT=1 for predictable budgets

The Startup Cost: Before You Type a Word

Every Claude Code session opens with a context window that is already partially filled. These fixed-cost components load automatically as part of the session initialization sequence — you pay for them regardless of what your task is.

Component	Approx. Token Cost	Notes
System prompt	~4,200 tokens	Fixed — cannot be reduced or skipped
Environment info	~280 tokens	Working dir, OS, git status, current date
CLAUDE.md (project root)	Full file size	Re-loaded from disk each session
CLAUDE.md (subdirectories)	Full file size each	Loaded when Claude enters that directory
MCP tool schemas	Variable — can be thousands	Every enabled server injects all its tool definitions
Auto memory (MEMORY.md index)	Proportional to index size	Individual memory files load when referenced

MCP Overhead Compounds Fast

Each enabled MCP server loads its entire tool manifest at session start. A server with 20 tools, each with a name, description, and parameter spec, can easily consume 5,000–10,000 tokens. Three such servers cost more than your entire CLAUDE.md. Use ENABLE_TOOL_SEARCH=1 to enable lazy loading — tools are fetched only when Claude needs them.

Check Your Actual Startup Cost

Open a fresh session and immediately run /context. The breakdown shows what is loaded and how much context each component is consuming. Do this once per project to understand your baseline before any work begins.

What Accumulates Per Turn

After startup, the window fills incrementally. Every exchange — your message, Claude's response, and every tool call result — is appended to the context. Nothing is dropped until compaction.

Context Pressure Before the Hard Limit

The hard limit — where Claude refuses to respond — is not the problem. Quality degrades long before that. As the window fills, Claude begins working around its limitations in ways that are invisible until something breaks.

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.