Cost Management at Team Scale

Claude Code costs compound across a team. npx ccusage, right-sizing models and effort levels, disabling unused MCP servers, and the --max-budget-usd flag are the levers you have.

Quick Reference

→npx ccusage: daily/monthly/session usage dashboard — reads local JSONL logs, no data leaves your machine
→Low cache hit rate = expensive sessions — repeated context (CLAUDE.md, system prompt) should cache
→Disable unused MCP servers: each active server adds tool definitions to every request
→--max-budget-usd: cap spending for a non-interactive session
→Right-sizing: Haiku for subagents that don't need Opus reasoning
→Three cost drivers: model choice, effort level, context size

npx ccusage — Your Usage Dashboard

ccusage is a third-party CLI that reads Claude Code's local JSONL session logs and produces a usage dashboard. It runs entirely on your machine — no data is sent anywhere.

ccusage commands

The output shows per-model cost breakdown, total tokens, and cache hit rate. The cache hit rate metric is one of the most actionable numbers in the report.

What cache hit rate means

Claude Code caches repeated context — CLAUDE.md files, system prompts, large file reads that recur across messages. A high cache hit rate means that context costs much less on subsequent requests. A low cache hit rate means your session context is not stable — you are paying full price for the same content repeatedly.

What Drives Claude Code Spend

Three decisions drive most of your Claude Code cost:

MCP Server Cost

Every active MCP server adds its tool definitions to every request. A server with 15 tools might add 2,000 tokens of overhead. Ten servers = 20,000 tokens overhead per message, before any code is loaded.

Sign in to read this article

This is a premium article. Sign in with your Google account to continue.