Advanced · 14 min

Model Management

Once you run more than one model in production, you need a registry, a promotion pipeline, and automatic rollback — or you will lose track of what is serving traffic and why quality changed. This article builds those three things from scratch, with cost math and threshold derivation included.

Quick Reference

  • You need model management when: two or more models serve production, or prompt + model change independently
  • Pipeline: STAGING → SHADOW → CANARY (1–25%) → ACTIVE → DEPRECATED
  • Total inference cost during shadow ≈ (requests/day × avg_cost_per_call) × 2, because every request is served by both the active and the shadow model; budget it before starting
  • A/B experiments on LLMs need 10K+ sessions per arm; non-determinism inflates variance
  • Set rollback thresholds relative to your baseline, not from a table of magic numbers
  • Version bundle = model + prompt hash + config; change only one of the three per new bundle version, so any quality shift traces to a single cause
  • Rollback must complete in under 60 seconds — automate it, never rely on manual action
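
The cost and threshold bullets above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names, the traffic and price figures, and the 50% relative tolerance are assumptions made for the example, and it assumes the shadow model costs about the same per call as the active one.

```python
def shadow_phase_cost(requests_per_day: float, avg_cost_per_call: float, days: int = 1) -> dict:
    """Estimate spend while a shadow model mirrors production traffic.

    Assumes the shadow model's per-call cost matches the active model's.
    """
    baseline = requests_per_day * avg_cost_per_call * days
    return {
        "baseline": baseline,      # what you already pay for the active model
        "shadow_extra": baseline,  # the mirrored calls to the shadow model
        "total": baseline * 2,     # the "× 2" in the quick reference
    }


def rollback_threshold(baseline_value: float, max_relative_increase: float) -> float:
    """Derive a threshold from the measured baseline instead of a fixed number."""
    return baseline_value * (1 + max_relative_increase)


def should_roll_back(observed: float, baseline_value: float, max_relative_increase: float) -> bool:
    """True when a 'lower is better' metric (error rate, latency) exceeds the threshold."""
    return observed > rollback_threshold(baseline_value, max_relative_increase)


if __name__ == "__main__":
    # Hypothetical numbers: 50,000 requests/day at $0.004 per call, 7-day shadow run.
    print(shadow_phase_cost(50_000, 0.004, days=7))
    # Hypothetical baseline error rate of 2%, tolerating a 50% relative increase.
    print(should_roll_back(observed=0.035, baseline_value=0.02, max_relative_increase=0.5))
```

Expressing the tolerance as a fraction of the baseline means the same rule works for a model with a 2% error rate and one with a 10% error rate, without a separate hard-coded limit for each.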

When You Need Model Management (and When You Don't)

If you run one model with one prompt and you change them together whenever you feel like it, you do not need a model registry. You need one when you can no longer answer these questions from memory: which model is serving traffic right now, what prompt version it is using, and what baseline metrics it was registered with. That threshold is usually two models in production — one for cheap/fast tasks, one for quality-critical tasks — or the first time your prompt team and your model team start shipping changes independently.

Situation | What you need
--- | ---
Single model, prompt changes reviewed as code | Version control is enough — no registry
Two models (e.g., cheap vs. quality tier) | Registry + bundle versioning
Prompt team and model team ship independently | Registry + promotion pipeline
More than 5% of traffic complaints are 'it used to work' | Registry + automatic rollback
Multiple providers (OpenAI + Anthropic + self-hosted) | Registry + provider failover callout

Start with version bundles, add the registry later

You can get most of the reproducibility benefit by just naming your bundles (e.g., 'support-agent-v12') and storing the model ID, prompt hash, and config together — before you build any registry infrastructure. The registry is just a queryable store on top of that.
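
A named bundle can be as small as a frozen record. The following is a minimal sketch, not a prescribed schema: the `VersionBundle` class, its field names, the 12-character hash prefix, and the model ID `example-model-small` are all hypothetical choices made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionBundle:
    name: str          # human-readable label, e.g. "support-agent-v12"
    model_id: str      # provider model identifier (hypothetical value below)
    prompt_hash: str   # short hash of the exact prompt text
    config_json: str   # config serialized with sorted keys so equal configs compare equal


def hash_prompt(prompt: str) -> str:
    """Short SHA-256 prefix; enough to tell prompt versions apart."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def make_bundle(name: str, model_id: str, prompt: str, config: dict) -> VersionBundle:
    return VersionBundle(name, model_id, hash_prompt(prompt), json.dumps(config, sort_keys=True))


def changed_fields(old: VersionBundle, new: VersionBundle) -> list:
    """List which of model, prompt, or config differ between two bundles."""
    fields = ("model_id", "prompt_hash", "config_json")
    return [f for f in fields if getattr(old, f) != getattr(new, f)]


if __name__ == "__main__":
    v12 = make_bundle("support-agent-v12", "example-model-small",
                      "You are a support agent.", {"temperature": 0.2})
    v13 = make_bundle("support-agent-v13", "example-model-small",
                      "You are a concise support agent.", {"temperature": 0.2})
    # A dict keyed by name is already a tiny queryable store.
    bundles = {b.name: b for b in (v12, v13)}
    print(changed_fields(bundles["support-agent-v12"], bundles["support-agent-v13"]))
```

Checking that `changed_fields` returns exactly one entry before a bundle is registered is one way to keep model, prompt, and config changes separable when quality shifts.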