Open vs Closed Models
The trade-offs between closed-source API models (GPT-4, Claude) and open-weight models (Llama, Mistral). When self-hosting makes economic sense, licensing traps to avoid, and a decision framework for choosing between them.
Quick Reference
- →Closed models: higher quality ceiling, zero ops overhead, vendor lock-in risk
- →Open models: full control, privacy, customizable, but require infrastructure and expertise
- →Self-hosting becomes cheaper than APIs at roughly $10K-50K/month in API spend
- →Llama 4 license restricts use by companies with 700M+ monthly active users
- →Apache 2.0 models (Mixtral, Qwen) have the fewest commercial restrictions
- →Decision axes: privacy requirements x scale x customization needs x budget
Closed Model Advantages
Closed-source models accessed via API (GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro) are the default starting point for most applications. Their advantages center on quality, simplicity, and managed infrastructure.
- ▸Higher quality ceiling: frontier closed models still outperform the best open models on the hardest tasks
- ▸Zero infrastructure: no GPUs to manage, no model serving to maintain, no scaling to handle
- ▸Automatic improvements: providers continuously optimize serving, and model upgrades are seamless
- ▸Better safety alignment: extensive RLHF and safety testing that open models often lack
- ▸Richer features: function calling, JSON mode, vision, prompt caching -- open model tooling lags by 3-6 months
- ▸Pay-per-use: no upfront hardware cost, costs scale linearly with usage
API models are simple to start with, but you accept significant trade-offs: you cannot inspect model internals, you cannot guarantee data privacy beyond the provider's promises, and you are subject to rate limits, outages, and pricing changes you cannot control.
Open Model Advantages
Open-weight models (where the model weights are publicly available) offer control, privacy, and economic advantages that become significant at scale.
- ▸Data privacy: your data never leaves your infrastructure -- critical for healthcare, finance, legal, and defense
- ▸Full customization: fine-tune on your domain data, modify decoding, add custom tokens
- ▸No vendor lock-in: switch between equivalent open models without changing providers
- ▸Cost at scale: GPU inference cost per token drops dramatically below API pricing at high volume
- ▸No rate limits: your throughput is limited only by your hardware
- ▸Offline operation: models can run entirely disconnected from the internet
- ▸Reproducibility: same model weights produce the same results -- important for regulated industries
Self-hosting requires ML infrastructure expertise: GPU procurement, model serving (vLLM, TGI), quantization, load balancing, monitoring, and model updates. If your team does not have this expertise, the operational overhead may exceed the cost savings.
Self-Hosting Economics
The crossover point where self-hosting becomes cheaper than API calls depends on your volume, model size, and infrastructure costs. Here is a rough analysis.
| Scenario | API cost/month | Self-hosted cost/month | Break-even |
|---|---|---|---|
| Low volume (1M tokens/day) | ~$75-300 | ~$2,000-4,000 (1x A100) | Never -- API is cheaper |
| Medium (50M tokens/day) | ~$3,750-15,000 | ~$4,000-8,000 (2x A100) | Roughly break-even |
| High (500M tokens/day) | ~$37,500-150,000 | ~$8,000-16,000 (4x A100) | 4-10x cheaper self-hosted |
| Very high (5B tokens/day) | ~$375,000-1,500,000 | ~$30,000-60,000 (cluster) | 10-25x cheaper self-hosted |
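The table above can be approximated with a simple calculator. This is a rough sketch only: the API price, GPU hourly rate, and per-GPU throughput below are illustrative assumptions, not quotes -- substitute your own numbers.

```python
# Break-even sketch: all constants are assumed/illustrative, not real prices.
API_PRICE_PER_1M_TOKENS = 3.00     # assumed blended $/1M tokens for a closed API
GPU_COST_PER_HOUR = 2.50           # assumed on-demand A100 rate, $/hr
TOKENS_PER_GPU_PER_SECOND = 2_500  # assumed serving throughput (model- and quantization-dependent)

def monthly_api_cost(tokens_per_day: float) -> float:
    return tokens_per_day * 30 / 1e6 * API_PRICE_PER_1M_TOKENS

def monthly_self_hosted_cost(tokens_per_day: float, utilization: float = 0.5) -> float:
    # GPUs needed to serve the daily volume at the assumed throughput and utilization
    tokens_per_gpu_per_day = TOKENS_PER_GPU_PER_SECOND * 86_400 * utilization
    gpus = max(1.0, -(-tokens_per_day // tokens_per_gpu_per_day))  # ceiling division
    return gpus * GPU_COST_PER_HOUR * 24 * 30

for volume in (1e6, 50e6, 500e6):
    api, hosted = monthly_api_cost(volume), monthly_self_hosted_cost(volume)
    print(f"{volume / 1e6:>5.0f}M tokens/day: API ~${api:>9,.0f}/mo, self-hosted ~${hosted:>9,.0f}/mo")
```

Note how the self-hosted cost is a step function of GPU count while API cost scales linearly, which is exactly why the crossover only appears at high volume.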
Providers like Together AI, Fireworks, Anyscale, and Groq serve open models via API at 3-10x lower prices than closed models, giving you open-model pricing without the ops burden. For example, Llama 4 70B via Together costs roughly $0.90 per 1M tokens for both input and output -- cheaper than Claude Haiku 4.5. This is often the best starting point.
Licensing Traps
Not all 'open' models are equally open. Licenses range from fully permissive (Apache 2.0) to restricted (Llama's custom license). Understanding the legal implications is critical before building production systems on open models.
| Model | License | Commercial use | Key restrictions |
|---|---|---|---|
| Llama 4 | Llama 4 Community | Yes | No use by companies with 700M+ MAU; no use of outputs to train competing models |
| Mistral Large | Research + Commercial | Yes with license | Contact Mistral for commercial license |
| Mixtral 8x22B | Apache 2.0 | Yes, unrestricted | None |
| Qwen 3.5 | Apache 2.0 | Yes, unrestricted | None |
| DeepSeek V3.2 | MIT | Yes, unrestricted | None |
| Gemma 2 | Gemma License | Yes | Cannot use outputs to improve other LLMs |
Llama's license includes a notable clause: if your product has more than 700 million monthly active users, you need a separate license from Meta. This affects very few companies today, but if you are building a platform product, be aware of it. Also note that you cannot use Llama outputs as training data for a competing foundation model.
- ▸Apache 2.0 is the gold standard for commercial use -- no restrictions on use, modification, or distribution
- ▸Always read the actual license, not summaries -- 'open source' and 'open weights' mean different things
- ▸Some licenses restrict using model outputs for training -- check before using outputs as synthetic training data
- ▸Derivative works (fine-tuned models) typically inherit the base model's license restrictions
- ▸When in doubt, consult legal counsel -- the legal landscape for AI model licensing is evolving rapidly
Decision Framework
The choice between open and closed models depends on four primary axes. Score each for your use case and the answer usually becomes clear.
| Dimension | Favors closed/API | Favors open/self-hosted |
|---|---|---|
| Privacy | Non-sensitive data, provider DPA is sufficient | PII, healthcare, financial, defense, or regulatory requirements |
| Scale | < 50M tokens/day | > 50M tokens/day |
| Customization | Prompt engineering is sufficient | Need fine-tuning, custom decoding, or model modifications |
| Budget | Low volume, willing to pay premium for simplicity | High volume, have ML infrastructure team |
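The four axes above can be turned into a toy scoring function. The scores, threshold, and privacy override below are illustrative assumptions, not a validated rubric -- they just encode the heuristic that a hard privacy requirement dominates.

```python
# Toy decision sketch; the weights and threshold are illustrative assumptions.
def recommend(privacy: int, scale: int, customization: int, budget: int) -> str:
    """Score each axis 0 (favors closed/API) to 2 (favors open/self-hosted)."""
    if privacy == 2:
        # Hard privacy/regulatory requirements usually override the other axes
        return "open"
    total = privacy + scale + customization + budget
    return "open" if total >= 5 else "closed"

print(recommend(privacy=0, scale=0, customization=1, budget=0))  # → closed
print(recommend(privacy=2, scale=0, customization=0, budget=0))  # → open
```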
The most common successful pattern: start with a closed model API to validate your product and find product-market fit. Once you have stable usage patterns and understand your quality requirements, evaluate whether migrating to open models (self-hosted or via cheaper API providers) makes economic sense. Premature optimization with self-hosting is a common trap for early-stage teams.
- ▸If any dimension strongly favors open models (especially privacy), that usually overrides other considerations
- ▸If you need frontier quality on the hardest reasoning tasks, closed models still have an edge (but it is shrinking)
- ▸The 'hosted open model API' middle ground (Together, Fireworks) eliminates the ops burden while keeping cost advantages
- ▸Plan for model portability from day one: abstract your LLM calls behind a clean interface, regardless of which side you choose
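The portability point above can be sketched as a minimal provider-agnostic interface. `ChatModel`, `EchoStub`, and `summarize` are hypothetical names for illustration; in practice the stub would be replaced by a thin adapter around whichever vendor SDK or self-hosted endpoint you use.

```python
# Minimal portability sketch: application code depends only on an interface,
# never on a specific vendor SDK. All names here are hypothetical.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str, max_tokens: int = 256) -> str: ...

class EchoStub:
    """Stand-in implementation; swap in a closed-API or self-hosted adapter
    that satisfies the same ChatModel interface."""
    def complete(self, prompt: str, max_tokens: int = 256) -> str:
        return f"[stub] {prompt[:max_tokens]}"

def summarize(model: ChatModel, text: str) -> str:
    # Business logic is written against ChatModel, so switching providers
    # means writing one new adapter, not rewriting call sites.
    return model.complete(f"Summarize: {text}")

print(summarize(EchoStub(), "open vs closed models"))
```

The design choice here is structural typing (`Protocol`) rather than a base class, so third-party clients can satisfy the interface without inheriting from your code.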
Best Practices
Do
- ✓Start with closed model APIs for rapid prototyping and product validation
- ✓Evaluate hosted open model providers (Together, Fireworks, Groq) as a middle ground
- ✓Read the full license text before building production systems on any open model
- ✓Build model-agnostic abstractions so you can switch between open and closed models
- ✓Factor in total cost of ownership for self-hosting: hardware, ops, monitoring, and engineering time
Don't
- ✗Don't self-host before you have significant volume -- the break-even point is higher than most teams expect
- ✗Don't assume 'open source' means fully unrestricted -- many open models have commercial restrictions
- ✗Don't ignore the quality gap on frontier tasks -- closed models still lead on the hardest benchmarks
- ✗Don't build on a single provider without a migration strategy
- ✗Don't underestimate the ops burden of self-hosting -- model serving at scale is a full-time job
Key Takeaways
- ✓Closed models offer higher quality ceilings and zero ops, but lock you into a vendor with no privacy guarantees.
- ✓Open models provide full control and privacy, but require significant infrastructure expertise to self-host.
- ✓Self-hosting becomes economically favorable at roughly $10K-50K/month in API spend.
- ✓Hosted open model APIs (Together, Fireworks) offer a compelling middle ground: open model pricing without ops burden.
- ✓Start with closed APIs for product validation, then evaluate open models once you have stable usage patterns.