Intermediate9 min
Token Budgeting
Estimate, allocate, and control token costs per request, per user, and per feature — with practical formulas, budget caps, and real-time usage tracking.
Quick Reference
- →Cost = (input_tokens × input_price) + (output_tokens × output_price) + (cached_tokens × cache_price)
- →Budget per request: set max_tokens on the model + limit tool call iterations via remaining_steps
- →Budget per user: track cumulative usage in Store, enforce daily/monthly limits via middleware
- →Budget per feature: different features get different model tiers and token limits
- →Real-time tracking: usage_metadata on every AIMessage gives exact token counts
- →Alert when spend approaches budget — don't wait for the bill
The Cost Formula
System prompt + history + RAG context + output = total tokens → cost per request
| Component | Formula | Example (Claude Sonnet 4.6) |
|---|---|---|
| Input tokens | tokens × $3/1M | 4,000 tokens = $0.012 |
| Output tokens | tokens × $15/1M | 1,000 tokens = $0.015 |
| Cached input | tokens × $0.30/1M | 3,000 cached = $0.0009 |
| Tool calls | N calls × (input + output per call) | 3 calls ≈ $0.05-0.10 |
| Total per request | Sum of all components | Typical: $0.03-0.15 |
Track actual cost from usage_metadata