AI Engineering Judgment/Cost Engineering
Intermediate9 min

Token Budgeting

Estimate, allocate, and control token costs per request, per user, and per feature — with practical formulas, budget caps, and real-time usage tracking.

Quick Reference

  • Cost = (input_tokens × input_price) + (output_tokens × output_price) + (cached_tokens × cache_price)
  • Budget per request: set max_tokens on the model + limit tool call iterations via remaining_steps
  • Budget per user: track cumulative usage in Store, enforce daily/monthly limits via middleware
  • Budget per feature: different features get different model tiers and token limits
  • Real-time tracking: usage_metadata on every AIMessage gives exact token counts
  • Alert when spend approaches budget — don't wait for the bill

The Cost Formula

Token Budget per RequestSystem Prompt~2KHistory~4KRAG Context~3KOutput~1KCost Breakdown (Claude Sonnet 4.6)Input: 9K × $3/1M$0.027Output: 1K × $15/1M$0.015Total per request$0.042

System prompt + history + RAG context + output = total tokens → cost per request

ComponentFormulaExample (Claude Sonnet 4.6)
Input tokenstokens × $3/1M4,000 tokens = $0.012
Output tokenstokens × $15/1M1,000 tokens = $0.015
Cached inputtokens × $0.30/1M3,000 cached = $0.0009
Tool callsN calls × (input + output per call)3 calls ≈ $0.05-0.10
Total per requestSum of all componentsTypical: $0.03-0.15
Track actual cost from usage_metadata