Advanced · 11 min
Model Management
Build a production model registry that tracks which models, prompts, and configs are in use, supports A/B deployment and shadow testing, and enables instant rollback when a new model underperforms.
Quick Reference
- Model registry: single source of truth for model name, version, config, prompt version, and performance metrics
- A/B deployment: route a percentage of traffic to a new model with sticky bucketing by user ID
- Shadow deployment: run the new model alongside production and compare outputs without user impact
- Version bundles: pin model + prompt + config together; never change two variables at once
- Automatic rollback: revert to the previous model within 60 seconds when quality drops below threshold
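Sticky bucketing is easiest to see in code. The sketch below is one common way to do it, not a prescribed implementation: hash the user ID together with an experiment name and compare the result against the rollout percentage. The function name `assign_variant`, the `experiment` label, and the 10,000-bucket granularity are illustrative assumptions.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_percent: float) -> str:
    """Deterministically map a user to "treatment" or "control".

    Hypothetical helper: the same (experiment, user_id) pair always lands in
    the same bucket, so a user keeps seeing the same model across requests.
    """
    # Salt with the experiment name so different experiments bucket independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and fold it into 10,000 buckets (0.01% steps).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "treatment" if bucket < treatment_percent * 100 else "control"


# Example: send roughly 10% of users to the new model.
variant = assign_variant("user-42", "new-model-rollout", 10)
```

Because the assignment depends only on the hash, no per-user state has to be stored. Raising `treatment_percent` keeps every user already in the treatment group there and only moves additional users over.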
Building a Model Registry
A model registry is not optional at scale
Once you have more than two models in production (e.g., GPT-5.4 for complex queries, o4-mini for simple ones, an embedding model for retrieval), you need a registry. Without it, you lose track of which model version is serving which traffic, and debugging a regression turns into guessing which version produced a given output.
Model registry with versioned bundles
- Every model change creates a new bundle; never mutate an existing bundle in place
- Store the full config (temperature, max_tokens, system prompt hash) so you can reproduce any past behavior
- Tag bundles with the eval suite results at the time of registration for historical comparison
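The three rules above can be sketched as a small in-memory registry. This is a minimal illustration under stated assumptions, not a production design: the class names `Bundle` and `ModelRegistry`, the method names, the model names, and the eval scores are all hypothetical, and a real registry would persist to a database rather than a dict.

```python
import hashlib
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Bundle:
    bundle_id: int
    model: str
    prompt_version: str
    temperature: float
    max_tokens: int
    system_prompt_hash: str
    eval_results: Mapping[str, float]  # read-only snapshot taken at registration


class ModelRegistry:
    """Hypothetical in-memory registry; bundles are only ever added, never edited."""

    def __init__(self) -> None:
        self._bundles: dict[int, Bundle] = {}
        self._activation_history: list[int] = []

    def register(self, model: str, prompt_version: str, system_prompt: str,
                 temperature: float, max_tokens: int,
                 eval_results: Mapping[str, float]) -> Bundle:
        bundle = Bundle(
            bundle_id=len(self._bundles) + 1,
            model=model,
            prompt_version=prompt_version,
            temperature=temperature,
            max_tokens=max_tokens,
            # Store a hash so the exact prompt text can be verified later.
            system_prompt_hash=hashlib.sha256(system_prompt.encode("utf-8")).hexdigest(),
            # Copy, then wrap, so later changes to the caller's dict cannot leak in.
            eval_results=MappingProxyType(dict(eval_results)),
        )
        self._bundles[bundle.bundle_id] = bundle
        return bundle

    def get(self, bundle_id: int) -> Bundle:
        return self._bundles[bundle_id]  # KeyError for unknown IDs

    def activate(self, bundle_id: int) -> None:
        self.get(bundle_id)  # validate before recording the activation
        self._activation_history.append(bundle_id)

    def active(self) -> Bundle:
        if not self._activation_history:
            raise RuntimeError("no bundle has been activated")
        return self.get(self._activation_history[-1])

    def rollback(self) -> Bundle:
        if len(self._activation_history) < 2:
            raise RuntimeError("no previous bundle to roll back to")
        self._activation_history.pop()
        return self.active()


# Usage with made-up names and scores: change one variable (the model) per bundle.
registry = ModelRegistry()
prompt = "You are a helpful assistant."
v1 = registry.register("example-model-a", "prompt-v1", prompt, 0.2, 1024, {"accuracy": 0.91})
registry.activate(v1.bundle_id)
v2 = registry.register("example-model-b", "prompt-v1", prompt, 0.2, 1024, {"accuracy": 0.88})
registry.activate(v2.bundle_id)
restored = registry.rollback()  # v2 underperforms, so v1 becomes active again
```

Rollback here is just a pointer move over the activation history, which is what makes it fast. The older bundle is never rebuilt, because it was never modified in the first place.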