Prompt Caching TTL Controls

Claude Code's prompt cache keeps the stable portion of your context cheap to re-send. Two environment variables control the TTL — and choosing the wrong one can cost you significantly on long sessions.

Quick Reference

  • Default prompt cache TTL: 5 minutes (as of March 2026 — previously 1 hour)
  • ENABLE_PROMPT_CACHING_1H=1 opts into 1-hour TTL — useful when stepping away between turns
  • FORCE_PROMPT_CACHING_5M=1 forces 5-minute TTL — overrides 1-hour for a specific session
  • DISABLE_PROMPT_CACHING=1 disables caching entirely — useful for debugging billing anomalies
  • Cache hit: cached tokens are billed at a significantly lower rate than fresh input tokens
  • The 300-second dead zone: pausing exactly 5 minutes between turns lets the cache expire, so the next turn pays a cache miss that no later hits amortize
  • Active sessions: keep gaps between turns under 270 seconds to stay inside the warm cache window
  • Long-gap sessions: use ENABLE_PROMPT_CACHING_1H=1 for 1-hour TTL
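
The rules of thumb above can be sketched as a small helper that picks an environment override from the expected pause between turns. This is only an illustration: the function name `cache_env_for_gap` and both constants are hypothetical, the variable name comes from the list above, and the handling of gaps longer than an hour is an assumption rather than documented behavior.

```python
WARM_WINDOW_SECONDS = 270   # safety margin under the 5-minute (300 s) default TTL
ONE_HOUR_SECONDS = 3600     # TTL when ENABLE_PROMPT_CACHING_1H=1 is set


def cache_env_for_gap(expected_gap_seconds: float) -> dict[str, str]:
    """Return environment overrides suited to the expected pause between turns.

    Hypothetical helper: it only encodes the rules of thumb from the list above.
    """
    if expected_gap_seconds < WARM_WINDOW_SECONDS:
        # Turns arrive inside the default 5-minute window; no override needed.
        return {}
    if expected_gap_seconds < ONE_HOUR_SECONDS:
        # Gaps would outlive the default TTL, so opt into the 1-hour TTL.
        return {"ENABLE_PROMPT_CACHING_1H": "1"}
    # Assumption: gaps of an hour or more miss under either TTL,
    # so this sketch leaves the default in place.
    return {}
```

The returned mapping could be merged into a copy of `os.environ` before launching a session, so the override applies to that session only.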

How Prompt Caching Works in Claude Code

Every time Claude Code sends a request to the API, it includes the current conversation context: the system prompt, your CLAUDE.md files, MCP schemas, conversation history, and the current turn. On large sessions, this can be 50,000–200,000+ tokens sent on every single turn. Most of that context is unchanged from one turn to the next, and prompt caching avoids re-billing that stable portion at the full input rate each time.

The cache stores a snapshot of the stable portion of your context. If the same content is sent within the TTL window, the API serves it from cache at a reduced rate instead of processing it fresh. Claude Code manages this caching layer automatically on top of the API.
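
To make the TTL window concrete, here is a toy simulation that counts cache hits and misses for a sequence of turn timestamps. It is a simplified model, not Claude Code's actual logic. It assumes the first turn always misses, that a turn hits only when it arrives strictly less than the TTL after the previous turn, and that every turn restarts the TTL clock. The function name `count_cache_hits` is hypothetical.

```python
def count_cache_hits(turn_times: list[float], ttl_seconds: float) -> tuple[int, int]:
    """Count (hits, misses) for turns sent at the given timestamps, in seconds.

    Simplified model: a turn is a hit only if it arrives strictly less than
    ttl_seconds after the previous turn; every turn restarts the TTL clock.
    """
    hits = misses = 0
    previous = None
    for now in turn_times:
        if previous is not None and now - previous < ttl_seconds:
            hits += 1
        else:
            misses += 1
        previous = now  # assumed: each turn refreshes or rewrites the cache entry
    return hits, misses


# Turns every 4 minutes stay warm under a 5-minute TTL.
print(count_cache_hits([0, 240, 480, 720], ttl_seconds=300))      # (3, 1)
# Turns every 10 minutes miss every time at 5 minutes...
print(count_cache_hits([0, 600, 1200, 1800], ttl_seconds=300))    # (0, 4)
# ...but hit after the first turn with a 1-hour TTL.
print(count_cache_hits([0, 600, 1200, 1800], ttl_seconds=3600))   # (3, 1)
```

Under this model a gap of exactly 300 seconds counts as a miss, which matches the dead zone described in the Quick Reference.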

Cache Hits Are Significantly Cheaper

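Cached tokens are billed at a fraction of standard input token rates. On Opus 4.7, the savings are substantial: a 100K-token system prompt sent 6 times without caching is billed at the full input rate all six times, whereas with a cache hit on 5 of those 6 turns only the first send pays the full (cache-write) rate and the other five pay the reduced cache-read rate. The exact multiple saved depends on the read and write rates for your model, and the hits only materialize if the turns fall inside the TTL window. At scale, prompt caching is one of the highest-leverage cost controls available.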
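
The comparison above can be worked through numerically. The sketch below expresses cost in units of one uncached send of the prompt. The multipliers used here (0.1 for a cache read, 1.25 for a cache write) are assumptions for illustration, and the function name `cached_cost` is hypothetical. Substitute the rates from the pricing page for your model and TTL.

```python
def cached_cost(turns: int, hits: int,
                read_multiplier: float, write_multiplier: float) -> float:
    """Cost of sending one prompt `turns` times, in units of one uncached send.

    Misses are billed at the cache-write rate, hits at the cache-read rate.
    """
    misses = turns - hits
    return misses * write_multiplier + hits * read_multiplier


turns = 6
uncached = float(turns)  # six sends at the full input rate
# Assumed multipliers; check current pricing before relying on them.
cached = cached_cost(turns, hits=5, read_multiplier=0.1, write_multiplier=1.25)

print(f"uncached: {uncached:.2f} units")
print(f"cached:   {cached:.2f} units")
print(f"ratio:    {uncached / cached:.2f}x")
```

With these assumed rates the cached run costs 1.75 units against 6.00 uncached, roughly a 3.4x saving rather than a full 6x, because the first send still pays the write rate and each hit is discounted, not free.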