Intermediate · 6 min

Prompt Caching TTL Controls

Claude Code's prompt cache keeps the stable portion of your context cheap to re-send. Three environment variables control it: two select the TTL and one disables caching entirely. Choosing the wrong TTL can cost you significantly on long sessions.

Quick Reference

→Default prompt cache TTL: 5 minutes (as of March 2026 — previously 1 hour)
→ENABLE_PROMPT_CACHING_1H=1 opts into 1-hour TTL — useful when stepping away between turns
→FORCE_PROMPT_CACHING_5M=1 forces 5-minute TTL — overrides 1-hour for a specific session
→DISABLE_PROMPT_CACHING=1 disables caching entirely — useful for debugging billing anomalies
→Cache hit: cached tokens are billed at a significantly lower rate than fresh tokens
→The 300-second dead zone: a gap of exactly 5 minutes lands on TTL expiry, so the next turn pays for a cache miss with no hits to amortize it
→Active sessions: keep gaps between turns under 270 seconds to stay inside the warm cache window
→Long-gap sessions: use ENABLE_PROMPT_CACHING_1H=1 for 1-hour TTL
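
The last two rules in the list above can be sketched as a small helper. This is an illustrative sketch, not part of Claude Code: `recommend_cache_env` is a hypothetical name, and the only inputs taken from this article are the 270-second threshold and the variable names.

```python
from typing import Dict

WARM_WINDOW_SECONDS = 270   # safety margin under the 300-second default TTL
ONE_HOUR_SECONDS = 3600     # TTL selected by ENABLE_PROMPT_CACHING_1H=1


def recommend_cache_env(expected_gap_seconds: float) -> Dict[str, str]:
    """Suggest env vars for a session, given the typical gap between turns.

    Hypothetical helper: it only encodes the rules of thumb from the
    quick reference and does not inspect any real Claude Code state.
    """
    if expected_gap_seconds < 0:
        raise ValueError("expected_gap_seconds must be non-negative")
    if expected_gap_seconds <= WARM_WINDOW_SECONDS:
        return {}  # the default 5-minute TTL already covers the gap
    if expected_gap_seconds < ONE_HOUR_SECONDS:
        return {"ENABLE_PROMPT_CACHING_1H": "1"}
    return {}  # gap outlasts both TTLs, so neither setting keeps the cache warm


print(recommend_cache_env(60))    # {}
print(recommend_cache_env(900))   # {'ENABLE_PROMPT_CACHING_1H': '1'}
```

Gaps between 270 and 300 seconds are routed to the 1-hour TTL on purpose, since they sit too close to the 5-minute expiry to rely on.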

How Prompt Caching Works in Claude Code

Every time Claude Code sends a request to the API, it includes the current conversation context: the system prompt, your CLAUDE.md files, MCP schemas, conversation history, and the current turn. In large sessions, this can be 50,000–200,000+ tokens on every single turn. Prompt caching avoids billing the unchanged part of that context at the full input rate on each turn.

The cache stores a snapshot of the stable portion of your context. If the same content is sent within the TTL window, the API serves it from cache at a reduced rate instead of processing it fresh. Claude Code manages this caching layer automatically on top of the API.
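
To make the TTL window concrete, here is a toy model of which turns would be served from cache. It is a sketch under stated assumptions, not Claude Code's actual logic: `count_cache_hits` is a hypothetical function, the stable context is assumed never to change, and every turn (hit or miss) is assumed to restart the TTL clock.

```python
from typing import List, Optional


def count_cache_hits(turn_times: List[float], ttl_seconds: float) -> int:
    """Count turns served from cache, given ascending turn timestamps in seconds.

    Assumptions (not confirmed by the article): each turn restarts the TTL,
    and a gap equal to the TTL counts as expired.
    """
    hits = 0
    last_turn: Optional[float] = None
    for t in turn_times:
        if last_turn is not None and (t - last_turn) < ttl_seconds:
            hits += 1  # previous turn is still inside the TTL window
        last_turn = t  # this turn writes or refreshes the cache entry
    return hits


turns = [0, 120, 400, 1000, 1100]  # one 10-minute gap in the middle
print(count_cache_hits(turns, ttl_seconds=300))   # 3
print(count_cache_hits(turns, ttl_seconds=3600))  # 4
```

With the 5-minute TTL the turn after the 10-minute gap is a miss; with the 1-hour TTL only the very first turn is.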

Cache Hits Are Significantly Cheaper

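Cached tokens are billed at a fraction of standard input token rates. On Opus 4.7, the savings are substantial: a 100K-token system prompt sent 6 times per hour without caching costs several times what it costs with a cache hit on 5 of those 6 turns. The exact ratio depends on the cache-read and cache-write rates, because cached tokens are discounted rather than free and the one miss still pays to write the cache. At scale, prompt caching is one of the highest-leverage cost controls available.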
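
The 6-turn example can be worked through numerically. The sketch below measures cost in units of "one uncached send of the prompt", so no dollar prices are needed. The multipliers are assumptions, not figures from this article: cache reads at 0.10x and 5-minute cache writes at 1.25x of the base input rate. Check the current pricing page for your model before relying on them; `relative_cost` is a hypothetical helper.

```python
def relative_cost(
    turns: int,
    hits: int,
    read_multiplier: float = 0.10,   # assumed cache-read rate vs. base input
    write_multiplier: float = 1.25,  # assumed cache-write rate vs. base input
) -> float:
    """Cost of sending the same prompt `turns` times with caching enabled.

    The result is in units of one uncached send. Every miss is billed as a
    cache write, every hit as a cache read.
    """
    if not 0 <= hits <= turns:
        raise ValueError("hits must be between 0 and turns")
    misses = turns - hits
    return misses * write_multiplier + hits * read_multiplier


uncached = 6 * 1.0                       # six full-price sends
cached = relative_cost(turns=6, hits=5)  # 1 write + 5 reads = 1.75
print(round(uncached / cached, 2))       # 3.43
print(relative_cost(turns=6, hits=0))    # 7.5
```

Under these assumed multipliers the uncached session costs roughly 3.4 times the cached one. The second line shows the downside: if every turn misses, the cache-write premium makes caching more expensive than not caching at all, which is why the TTL choice matters.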

The March 2026 TTL Regression

In early March 2026, the default prompt cache TTL silently changed from 1 hour to 5 minutes. Teams that had been relying on the 1-hour TTL for sessions with long gaps between turns suddenly started paying full input token rates on most turns, with no warning and no changelog entry.

The Three TTL Environment Variables

Variable	Effect	When to use
ENABLE_PROMPT_CACHING_1H=1	1-hour cache TTL	Sessions with long gaps between turns — running builds, reviewing, attending meetings
FORCE_PROMPT_CACHING_5M=1	Forces 5-minute TTL	Override 1-hour TTL for a specific session where turns are rapid
DISABLE_PROMPT_CACHING=1	Disables caching entirely	Debugging unexpected billing or investigating stale cache behavior
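
When more than one of these variables is set, it helps to reason about which one wins. The sketch below resolves an environment mapping to an effective TTL. `effective_ttl_seconds` is a hypothetical helper, not a Claude Code API. The rule that FORCE_PROMPT_CACHING_5M overrides the 1-hour setting comes from the table above; checking DISABLE_PROMPT_CACHING first is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_TTL_SECONDS = 300  # 5-minute default, per this article


def effective_ttl_seconds(env: Mapping[str, str]) -> Optional[int]:
    """Return the cache TTL implied by `env`, or None if caching is disabled."""
    if env.get("DISABLE_PROMPT_CACHING") == "1":
        return None  # assumption: disabling caching beats any TTL setting
    if env.get("FORCE_PROMPT_CACHING_5M") == "1":
        return 300   # per the table: forces 5 minutes, overriding the 1-hour opt-in
    if env.get("ENABLE_PROMPT_CACHING_1H") == "1":
        return 3600
    return DEFAULT_TTL_SECONDS


# Inspect the current shell environment before starting a long session.
print(effective_ttl_seconds(os.environ))
```

Running this check in the same shell you launch Claude Code from is a cheap way to catch a stale export left over from an earlier session.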
