Prompt Caching TTL Controls
Claude Code's prompt cache keeps the stable portion of your context cheap to re-send. Two environment variables control the TTL — and choosing the wrong one can cost you significantly on long sessions.
Quick Reference
- Default prompt cache TTL: 5 minutes (as of March 2026; previously 1 hour)
- ENABLE_PROMPT_CACHING_1H=1 opts into the 1-hour TTL; useful when you step away between turns
- FORCE_PROMPT_CACHING_5M=1 forces the 5-minute TTL, overriding the 1-hour setting for a specific session
- DISABLE_PROMPT_CACHING=1 disables caching entirely; useful for debugging billing anomalies
- Cache hit: cached tokens are billed at a significantly lower rate than fresh input tokens
- The 300-second dead zone: sleeping exactly 5 minutes between turns incurs the cache miss without amortizing it
- Active sessions: keep the gap between turns under 270 seconds to stay inside the warm-cache window
- Long-gap sessions: set ENABLE_PROMPT_CACHING_1H=1 for the 1-hour TTL
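The last two rules in the list can be read as a simple decision on the expected gap between turns. The sketch below is only an illustration: `choose_cache_env` and `WARM_WINDOW_SECONDS` are hypothetical names introduced here, and the 270-second threshold and the variable name come from the quick reference above.

```python
# Hypothetical helper, not part of Claude Code: maps an expected gap between
# turns to the environment variables suggested by the quick reference.
WARM_WINDOW_SECONDS = 270  # "stay under 270 seconds" from the list above


def choose_cache_env(expected_gap_seconds: float) -> dict[str, str]:
    """Return the env vars to set for a session with this typical gap."""
    if expected_gap_seconds < WARM_WINDOW_SECONDS:
        # Short gaps stay inside the default 5-minute TTL; nothing to set.
        return {}
    # Longer gaps would outlive the 5-minute cache, so opt into 1 hour.
    return {"ENABLE_PROMPT_CACHING_1H": "1"}


if __name__ == "__main__":
    print(choose_cache_env(60))
    print(choose_cache_env(900))
```

The returned mapping could be merged into the environment before a session is launched. Gaps longer than an hour would outlive either TTL, a case this sketch does not handle.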
How Prompt Caching Works in Claude Code
Every time Claude Code sends a request to the API, it includes the current conversation context: the system prompt, your CLAUDE.md files, MCP schemas, conversation history, and the current turn. In large sessions, this can mean 50,000–200,000+ tokens sent on every single turn. Prompt caching avoids paying the full input rate for that stable portion on each turn.
The cache stores a snapshot of the stable portion of your context. If the same content is sent within the TTL window, the API serves it from cache at a reduced rate instead of processing it fresh. Claude Code manages this caching layer automatically on top of the API.
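To make the TTL window concrete, here is a minimal simulation of which turns would hit the cache. It is a sketch under stated assumptions, not a description of the API's internals: it assumes a hit requires a gap strictly shorter than the TTL, and that every request (hit or miss) restarts the TTL clock. `count_cache_hits` is a hypothetical name.

```python
def count_cache_hits(turn_times: list[float],
                     ttl_seconds: float = 300.0) -> tuple[int, int]:
    """Return (hits, misses) for turns sent at the given times, in seconds.

    Assumptions: a hit needs a gap strictly shorter than the TTL, and each
    request restarts the TTL clock whether it hit or missed.
    """
    hits = 0
    misses = 0
    last_sent = None
    for t in turn_times:
        if last_sent is not None and t - last_sent < ttl_seconds:
            hits += 1
        else:
            misses += 1  # first turn, or the cached entry has expired
        last_sent = t
    return hits, misses


if __name__ == "__main__":
    # Gaps of 100 s, 100 s, then exactly 300 s with the 5-minute TTL.
    print(count_cache_hits([0, 100, 200, 500]))
    # The same 400-second gaps under a 5-minute and a 1-hour TTL.
    print(count_cache_hits([0, 400, 800], ttl_seconds=300))
    print(count_cache_hits([0, 400, 800], ttl_seconds=3600))
```

Under these assumptions, the turn that arrives exactly 300 seconds after the previous one is a miss, which matches the dead zone described in the quick reference.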
Cached tokens are billed at a fraction of the standard input token rate. On Opus 4.7, the savings are substantial: a 100K-token system prompt sent 6 times per hour without caching is billed at the full input rate all 6 times, whereas with a cache hit on 5 of those 6 turns only the first send pays the full rate (plus any cache-write premium) and the other 5 are billed at the reduced cached rate. At scale, prompt caching is one of the highest-leverage cost controls available.
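The exact ratio depends on how cache writes and cache reads are priced relative to ordinary input tokens. The sketch below works through the 6-turn example with illustrative multipliers: 1.25x the base input rate for a cache write and 0.1x for a cache read. Both are assumptions for this example and should be checked against current pricing for your model. `relative_cost` is a hypothetical helper.

```python
def relative_cost(turns: int, hits: int,
                  write_multiplier: float = 1.25,
                  read_multiplier: float = 0.10) -> float:
    """Cost of sending one prompt `turns` times, in units of one full-price send.

    Every miss is assumed to pay the cache-write multiplier and every hit the
    cache-read multiplier. Both defaults are illustrative assumptions.
    """
    misses = turns - hits
    return misses * write_multiplier + hits * read_multiplier


if __name__ == "__main__":
    turns = 6
    uncached = turns * 1.0                   # six full-price sends
    cached = relative_cost(turns, hits=5)    # one write plus five reads
    print(f"uncached: {uncached:.2f} full-price sends")
    print(f"cached:   {cached:.2f} full-price sends")
    print(f"ratio:    {uncached / cached:.2f}x")
```

With these assumed multipliers, the cached hour costs the equivalent of 1.75 full-price sends against 6 without caching, or roughly 3.4 times cheaper. Different multipliers shift the ratio, but the shape of the calculation stays the same.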