
Model Management

Build a production model registry that tracks which models, prompts, and configs are in use, supports A/B deployment and shadow testing, and enables instant rollback when a new model underperforms.

Quick Reference

  • Model registry: single source of truth for model name, version, config, prompt version, and performance metrics
  • A/B deployment: route a percentage of traffic to a new model with sticky bucketing by user ID
  • Shadow deployment: run the new model alongside production, compare outputs without user impact
  • Version bundles: pin model + prompt + config together — never change two variables at once
  • Automatic rollback: revert to the previous model within 60 seconds when quality drops below threshold
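Sticky bucketing from the list above can be sketched with a deterministic hash of the user ID, so a user stays in the same variant across requests. The function name `assign_variant` and the salt value are illustrative, not part of any particular library:

```python
import hashlib

def assign_variant(user_id: str, rollout_percent: int, salt: str = "ab-exp-1") -> str:
    """Deterministically bucket a user: the same ID always lands in the same variant.

    The salt isolates this experiment from other experiments, so bucket
    assignments don't correlate across unrelated rollouts.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # map the hash to a bucket in 0..99
    return "candidate" if bucket < rollout_percent else "production"

# Assignment is stable across requests for the same user:
assert assign_variant("user-42", 10) == assign_variant("user-42", 10)
```

Because the bucket is derived from the ID rather than stored, no session state is needed, and ramping from 10% to 50% only grows the candidate group; users already in it stay there.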

Building a Model Registry

A model registry is not optional at scale

Once you have more than two models in production (e.g., GPT-5.4 for complex queries, o4-mini for simple ones, an embedding model for retrieval), you need a registry. Without it, you lose track of which model version is serving which traffic, and debugging becomes impossible.

Model registry with versioned bundles
  • Every model change creates a new bundle — never mutate an existing bundle in place
  • Store the full config (temperature, max_tokens, system prompt hash) so you can reproduce any past behavior
  • Tag bundles with the eval suite results at the time of registration for historical comparison
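A minimal in-memory sketch of these rules, assuming nothing beyond the standard library (the class and method names are hypothetical): bundles are append-only, the full config and a prompt hash are stored at registration, and rollback is just flipping a pointer back to the previous bundle.

```python
import hashlib
import json
import time

class ModelRegistry:
    """Append-only registry: every change registers a new immutable bundle."""

    def __init__(self):
        self._bundles = {}    # bundle_id -> bundle dict; entries are never mutated
        self._active = None   # bundle_id currently serving traffic
        self._previous = None # bundle_id to revert to on rollback

    def register(self, model: str, prompt: str, config: dict, eval_results: dict) -> str:
        bundle = {
            "model": model,
            # Hash the system prompt so past behavior stays reproducible
            "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
            "config": dict(config),            # full config: temperature, max_tokens, ...
            "eval_results": dict(eval_results),  # eval suite snapshot at registration
            "registered_at": time.time(),
        }
        bundle_id = hashlib.sha256(
            json.dumps(bundle, sort_keys=True, default=str).encode()
        ).hexdigest()[:12]
        self._bundles[bundle_id] = bundle
        return bundle_id

    def activate(self, bundle_id: str) -> None:
        """Point traffic at a bundle, remembering the one it replaces."""
        self._previous = self._active
        self._active = bundle_id

    def rollback(self) -> str:
        """Instant rollback: swap the active pointer back to the previous bundle."""
        self._active, self._previous = self._previous, self._active
        return self._active
```

Since rollback only moves a pointer rather than redeploying anything, it fits inside a tight recovery window; the old bundle, including its prompt hash and config, is guaranteed to still exist because bundles are never mutated or deleted.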