# Cost Forensics
Token accounting reveals where your AI budget actually goes. Learn to track costs per user, per feature, and per conversation — then optimize with context trimming, model tiering, and caching to cut costs by 50-80% without sacrificing quality.
## Quick Reference
- System prompts are charged on EVERY call — a 2000-token system prompt at 100 calls/user/day adds up fast
- Conversation history is the #1 hidden cost — it grows with each turn and the full history is re-sent every time
- Retries, tool call overhead, and embedding calls are often invisible in billing dashboards
- Track cost per conversation (not per API call) to understand true unit economics
- The biggest savings come from: shorter system prompts, history trimming, and using cheaper models for routing
- Set per-user and per-feature budget limits to prevent runaway costs
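The last point can be enforced with a small guard in front of your LLM client. This is a minimal sketch, assuming you record actual cost after each call; the names (`BudgetGuard`, `allow`, `record`) are hypothetical, and a production version would persist spend and reset it daily.

```python
from collections import defaultdict

class BudgetGuard:
    """Hypothetical per-user daily budget guard (illustrative sketch)."""

    def __init__(self, daily_limit_usd: float):
        self.daily_limit = daily_limit_usd
        self.spent = defaultdict(float)  # user_id -> spend so far today

    def allow(self, user_id: str) -> bool:
        """Check BEFORE making a call; refuse once the budget is hit."""
        return self.spent[user_id] < self.daily_limit

    def record(self, user_id: str, cost_usd: float) -> None:
        """Record actual cost AFTER each call completes."""
        self.spent[user_id] += cost_usd

guard = BudgetGuard(daily_limit_usd=1.00)
guard.record("user-42", 0.95)
print(guard.allow("user-42"))  # still under the $1 limit
guard.record("user-42", 0.10)
print(guard.allow("user-42"))  # over budget: refuse, or route to a cheaper model
```

When a user trips the limit, you can fail closed (refuse) or degrade gracefully (switch to a cheaper model tier), depending on the feature.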
## Where Your Tokens Actually Go
Most teams look at their monthly LLM bill and have no idea where the money went. Token accounting breaks down every request into its components: system prompt, conversation history, retrieved context, user message, and response. The results are often surprising — system prompts and conversation history dominate, not the user's question.
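A per-request breakdown can be sketched in a few lines. The token counts below use a crude 4-characters-per-token estimate purely for illustration; a real implementation would use the provider's tokenizer (e.g. tiktoken for OpenAI models), and `account_request` is a hypothetical helper, not a library API.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

def account_request(system_prompt, history, retrieved_context, user_message):
    """Break one chat request into its input-token cost components."""
    breakdown = {
        "system_prompt": estimate_tokens(system_prompt),
        "history": sum(estimate_tokens(m["content"]) for m in history),
        "retrieved_context": estimate_tokens(retrieved_context),
        "user_message": estimate_tokens(user_message),
    }
    breakdown["total_input"] = sum(breakdown.values())
    return breakdown

breakdown = account_request(
    system_prompt="You are a helpful support agent." * 20,  # bloated prompt
    history=[{"content": "Hi, my order is late."},
             {"content": "Sorry to hear that..."}],
    retrieved_context="Order #1234 shipped on ...",
    user_message="Where is my package?",
)
print(breakdown)  # the system prompt dwarfs the actual question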
| Token Source | Typical Size | Sent Every Call? | Cost Impact |
|---|---|---|---|
| System prompt | 500-3000 tokens | Yes — every single call | HIGH — multiplied by call count |
| Conversation history | 100-10000+ tokens (grows) | Yes — grows with each turn | HIGHEST — quadratic cumulative cost |
| Retrieved context (RAG) | 500-2000 tokens | Per RAG call | Medium — proportional to chunk count |
| User message | 20-200 tokens | Once per turn | Low |
| Tool call descriptions | 200-1000 tokens | Every agent call | Medium — often overlooked |
| LLM response | 100-2000 tokens | Once per generation | Medium — output tokens cost 2-5x more |
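The history row deserves a closer look: because every turn re-sends all prior turns plus the system prompt, cumulative input tokens grow roughly quadratically with conversation length. The numbers below are illustrative assumptions, not measurements.

```python
SYSTEM_TOKENS = 1000  # system prompt, re-sent on every call (assumed size)
TURN_TOKENS = 300     # avg user message + assistant reply per turn (assumed)

def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens billed across a whole conversation."""
    total = 0
    history = 0
    for _ in range(turns):
        total += SYSTEM_TOKENS + history  # this call re-sends everything so far
        history += TURN_TOKENS            # this turn joins the history
    return total

for turns in (5, 20, 50):
    print(turns, cumulative_input_tokens(turns))
# 5 turns:  8,000 tokens
# 20 turns: 77,000 tokens
# 50 turns: 417,500 tokens — 10x the turns of a 5-turn chat, ~52x the tokens
```

This is why trimming or summarizing history pays off far more in long conversations than any single-call optimization.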
Every major provider charges more for output tokens than input tokens. GPT-5.4 charges $2.50/M input but $10/M output. Claude Sonnet 4.6 charges $3/M input but $15/M output. A verbose response costs 2-5x more per token than the prompt that generated it. Instruct models to be concise when you do not need verbose output.
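The asymmetry is easy to see in a per-call cost function. Prices here are the per-million-token figures quoted above; treat them as examples that change over time, not current quotes.

```python
PRICES = {  # model -> (input $/M tokens, output $/M tokens), from the text above
    "gpt":    (2.50, 10.00),
    "sonnet": (3.00, 15.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call under asymmetric input/output pricing."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A verbose 1500-token answer to a 500-token prompt:
verbose = call_cost("gpt", 500, 1500)
# The same question answered concisely in 300 tokens:
concise = call_cost("gpt", 500, 300)
print(f"verbose: ${verbose:.5f}, concise: ${concise:.5f}")
# verbose: $0.01625, concise: $0.00425 — the concise answer costs ~74% less
```

Because the input side of both calls is identical, the entire difference comes from output tokens, which is why "be concise" instructions translate directly into savings.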