Token Budgeting

Estimate, allocate, and control token costs per request, per user, and per feature — with practical formulas, budget caps, and real-time usage tracking.

Quick Reference

→Cost = (input_tokens × input_price) + (output_tokens × output_price) + (cached_tokens × cache_price), where input_tokens counts uncached input only
→Budget per request: set max_tokens on the model and limit tool-call iterations via remaining_steps
→Budget per user: track cumulative usage in Store, enforce daily/monthly limits via middleware
→Budget per feature: different features get different model tiers and token limits
→Real-time tracking: usage_metadata on every AIMessage gives exact token counts
→Alert when spend approaches budget — don't wait for the bill
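
A minimal sketch of the per-user cap and the early alert from the list above. The in-memory dictionary stands in for a persistent store, and UserBudget, BudgetExceeded, and alert_ratio are hypothetical names chosen for illustration, not part of any library.

```python
from collections import defaultdict
from datetime import date


class BudgetExceeded(Exception):
    """Raised when a user has already used up the daily budget."""


class UserBudget:
    """Tracks cumulative spend per user per day (illustrative, in-memory only)."""

    def __init__(self, daily_limit_usd, alert_ratio=0.8):
        self.daily_limit_usd = daily_limit_usd
        self.alert_ratio = alert_ratio          # alert at 80% of the limit by default
        self._spend = defaultdict(float)        # (user_id, day) -> USD spent

    def check(self, user_id, day=None):
        """Call before a request; raises if the user is at or over the limit."""
        day = day or date.today()
        if self._spend[(user_id, day)] >= self.daily_limit_usd:
            raise BudgetExceeded(f"{user_id} reached the daily limit")

    def record(self, user_id, cost_usd, day=None):
        """Call after a request; returns True once spend crosses the alert threshold."""
        day = day or date.today()
        key = (user_id, day)
        self._spend[key] += cost_usd
        return self._spend[key] >= self.alert_ratio * self.daily_limit_usd


budget = UserBudget(daily_limit_usd=1.00)
budget.check("user-42")                         # passes: nothing spent yet
if budget.record("user-42", 0.85):
    print("user-42 is near the daily budget")
```

In a real deployment the check would likely run in middleware before the model call and the record step after it, with the counters kept in durable storage so that limits survive restarts.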

The Cost Formula

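Input tokens (system prompt + history + RAG context) plus output tokens make up the total for a request. Input and output are priced at different rates, so cost per request is computed per component and then summed.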

Component	Formula	Example (Claude Sonnet 4.6)
Input tokens	tokens × $3/1M	4,000 tokens = $0.012
Output tokens	tokens × $15/1M	1,000 tokens = $0.015
Cached input	tokens × $0.30/1M	3,000 cached = $0.0009
Tool calls	N calls × (input + output per call)	3 calls ≈ $0.05-0.10
Total per request	Sum of all components	Typical: $0.03-0.15
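
The table's arithmetic can be sketched as a small function. The rates are the ones listed in the table; Pricing, TABLE_PRICING, and request_cost are hypothetical names used for illustration, and input_tokens is assumed to mean uncached input only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per one million tokens."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


# Rates taken from the table above.
TABLE_PRICING = Pricing(input_per_m=3.00, output_per_m=15.00, cached_per_m=0.30)


def request_cost(input_tokens, output_tokens, cached_tokens=0, pricing=TABLE_PRICING):
    """Return the USD cost of one model call; input_tokens excludes cached tokens."""
    return (
        input_tokens * pricing.input_per_m
        + output_tokens * pricing.output_per_m
        + cached_tokens * pricing.cached_per_m
    ) / 1_000_000


# The three example rows combined: 4,000 input + 1,000 output + 3,000 cached.
cost = request_cost(4_000, 1_000, 3_000)
print(f"${cost:.4f}")  # $0.0279
```

A request that makes several tool calls pays this cost once per model call, which is why the tool-call row dominates the typical total.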

Track actual cost from usage_metadata
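
A hedged sketch of how the cost could be derived from a usage_metadata dictionary. It takes a plain dict so that it runs without any framework installed. The keys (input_tokens, output_tokens, input_token_details with cache_read) follow the usage_metadata shape as commonly documented for AIMessage. The assumption that input_tokens already includes cached reads depends on the provider integration and should be verified. Cache-write tokens, which are usually priced differently, are ignored here. cost_from_usage is a hypothetical helper name.

```python
def cost_from_usage(usage, input_per_m=3.00, output_per_m=15.00, cached_per_m=0.30):
    """Estimate USD cost from a usage_metadata-shaped dict.

    Assumes usage["input_tokens"] includes cache reads, so they are
    subtracted out and billed at the cheaper cached rate.
    """
    details = usage.get("input_token_details") or {}
    cached = details.get("cache_read", 0) or 0
    uncached = max(usage.get("input_tokens", 0) - cached, 0)
    output = usage.get("output_tokens", 0)
    return (
        uncached * input_per_m
        + cached * cached_per_m
        + output * output_per_m
    ) / 1_000_000


# Example payload shaped like message.usage_metadata (values are illustrative).
usage = {
    "input_tokens": 7_000,
    "output_tokens": 1_000,
    "total_tokens": 8_000,
    "input_token_details": {"cache_read": 3_000},
}
actual = cost_from_usage(usage)
print(f"${actual:.4f}")  # $0.0279
```

Summing this value across every AIMessage in a run gives the per-request total, which can then feed per-user and per-feature counters.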
