LLM Foundations/The Model Landscape
Intermediate · 10 min

Open vs Closed Models

The trade-offs between closed-source API models (GPT-4, Claude) and open-weight models (Llama, Mistral): when self-hosting makes economic sense, which licensing traps to avoid, and a decision framework for choosing between them.

Quick Reference

  • Closed models: higher quality ceiling, zero ops overhead, vendor lock-in risk
  • Open models: full control, privacy, customizable, but require infrastructure and expertise
  • Self-hosting becomes cheaper than APIs at roughly $10K-50K/month in API spend
  • Llama 4 license restricts use by companies with 700M+ monthly active users
  • Apache 2.0 models (Mixtral, Qwen) have the fewest commercial restrictions
  • Decision axes: privacy requirements × scale × customization needs × budget

Closed Model Advantages

Closed-source models accessed via API (GPT-5.4, Claude Sonnet 4.6, Gemini 3.1 Pro) are the default starting point for most applications. Their advantages center on quality, simplicity, and managed infrastructure.

  • Higher quality ceiling: frontier closed models still outperform the best open models on the hardest tasks
  • Zero infrastructure: no GPUs to manage, no model serving to maintain, no scaling to handle
  • Automatic improvements: providers continuously optimize serving, and model upgrades are seamless
  • Better safety alignment: extensive RLHF and safety testing that open models often lack
  • Richer features: function calling, JSON mode, vision, prompt caching -- open model tooling lags by 3-6 months
  • Pay-per-use: no upfront hardware cost, costs scale linearly with usage

The hidden cost of simplicity

API models are simple to start with, but you accept significant trade-offs: you cannot inspect model internals, you cannot guarantee data privacy beyond the provider's promises, and you are subject to rate limits, outages, and pricing changes you cannot control.

Open Model Advantages

Open-weight models (where the model weights are publicly available) offer control, privacy, and economic advantages that become significant at scale.

  • Data privacy: your data never leaves your infrastructure -- critical for healthcare, finance, legal, and defense
  • Full customization: fine-tune on your domain data, modify decoding, add custom tokens
  • No vendor lock-in: switch between equivalent open models without changing providers
  • Cost at scale: GPU inference cost per token drops dramatically below API pricing at high volume
  • No rate limits: your throughput is limited only by your hardware
  • Offline operation: models can run entirely disconnected from the internet
  • Reproducibility: same model weights produce the same results -- important for regulated industries

Open ≠ simple

Self-hosting requires ML infrastructure expertise: GPU procurement, model serving (vLLM, TGI), quantization, load balancing, monitoring, and model updates. If your team does not have this expertise, the operational overhead may exceed the cost savings.

Self-Hosting Economics

The crossover point where self-hosting becomes cheaper than API calls depends on your volume, model size, and infrastructure costs. Here is a rough analysis.

| Scenario | API cost/month | Self-hosted cost/month | Break-even |
|---|---|---|---|
| Low volume (1M tokens/day) | ~$75-300 | ~$2,000-4,000 (1x A100) | Never -- API is cheaper |
| Medium (50M tokens/day) | ~$3,750-15,000 | ~$4,000-8,000 (2x A100) | Roughly break-even |
| High (500M tokens/day) | ~$37,500-150,000 | ~$8,000-16,000 (4x A100) | 4-10x cheaper self-hosted |
| Very high (5B tokens/day) | ~$375,000-1,500,000 | ~$30,000-60,000 (cluster) | 10-25x cheaper self-hosted |
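
The table above can be reproduced with a back-of-the-envelope calculation. All prices in this sketch are illustrative assumptions (a blended $5 per 1M API tokens, $2,000 per GPU-month, flat ops overhead), not quotes from any provider:

```python
# Rough break-even sketch comparing API vs self-hosted inference cost.
# Prices are illustrative assumptions, not quotes from any provider.

def api_cost_per_month(tokens_per_day: float, price_per_m_tokens: float) -> float:
    """Monthly API bill: linear in volume (30-day month)."""
    return tokens_per_day / 1e6 * price_per_m_tokens * 30

def self_hosted_cost_per_month(num_gpus: int, gpu_month_cost: float,
                               ops_overhead: float = 2000.0) -> float:
    """Fixed hardware rental plus a flat ops/monitoring overhead."""
    return num_gpus * gpu_month_cost + ops_overhead

if __name__ == "__main__":
    for tokens_per_day, gpus in [(1e6, 1), (50e6, 2), (500e6, 4)]:
        api = api_cost_per_month(tokens_per_day, price_per_m_tokens=5.0)
        hosted = self_hosted_cost_per_month(gpus, gpu_month_cost=2000.0)
        print(f"{tokens_per_day/1e6:>6.0f}M tok/day  API ${api:>9,.0f}  self-hosted ${hosted:>7,.0f}")
```

The key structural point: API cost is linear in volume while self-hosted cost is a step function of GPU count, which is why the lines cross somewhere in the tens of millions of tokens per day.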

The middle ground: hosted open models

Providers like Together AI, Fireworks, Anyscale, and Groq offer open models via API at 3-10x lower prices than closed models. You get open-model pricing without the ops burden. For example, Llama 4 70B via Together costs roughly $0.90 per 1M tokens for both input and output -- cheaper than Claude Haiku 4.5. This is often the best starting point.

Estimating self-hosting costs
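
A minimal sizing sketch: convert daily token volume to a sustained tokens-per-second rate, add headroom for traffic peaks, and divide by per-GPU throughput. The throughput figure (~1,500 generated tokens/sec per A100) and the $2.50/GPU-hour rate are assumptions -- benchmark your own model and serving stack (e.g. vLLM) before committing:

```python
# Back-of-the-envelope GPU sizing for self-hosting.
# Throughput and price figures are assumptions; benchmark your own stack.
import math

def gpus_needed(tokens_per_day: float, tokens_per_sec_per_gpu: float,
                peak_factor: float = 2.0) -> int:
    """GPUs needed to serve average load, with headroom for traffic peaks."""
    avg_tokens_per_sec = tokens_per_day / 86_400
    return math.ceil(avg_tokens_per_sec * peak_factor / tokens_per_sec_per_gpu)

def monthly_gpu_cost(tokens_per_day: float, tokens_per_sec_per_gpu: float,
                     gpu_hour_cost: float) -> float:
    """Monthly rental cost for the fleet (30-day month, 24/7 uptime)."""
    n = gpus_needed(tokens_per_day, tokens_per_sec_per_gpu)
    return n * gpu_hour_cost * 24 * 30

# Example: 500M tokens/day at ~1,500 tok/s per GPU with 2x peak headroom
# needs 8 GPUs; at $2.50/GPU-hour that is $14,400/month.
```

Note that the result is dominated by the peak factor and the per-GPU throughput, both of which vary widely with model size, quantization, and batching.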

Licensing Traps

Not all 'open' models are equally open. Licenses range from fully permissive (Apache 2.0) to restricted (Llama's custom license). Understanding the legal implications is critical before building production systems on open models.

| Model | License | Commercial use | Key restrictions |
|---|---|---|---|
| Llama 4 | Llama 4 Community | Yes | No use by companies with 700M+ MAU; no use of outputs to train competing models |
| Mistral Large | Research + Commercial | Yes, with license | Contact Mistral for commercial license |
| Mixtral 8x22B | Apache 2.0 | Yes, unrestricted | None |
| Qwen 3.5 | Apache 2.0 | Yes, unrestricted | None |
| DeepSeek V3.2 | MIT | Yes, unrestricted | None |
| Gemma 2 | Gemma License | Yes | Cannot use outputs to improve other LLMs |

The Llama MAU trap

Llama's license includes a clause: if your product has more than 700 million monthly active users, you need a special license from Meta. This affects very few companies today, but if you are building a platform product, be aware. Also note: you cannot use Llama outputs as training data for a competing foundation model.

  • Apache 2.0 is the gold standard for commercial use -- no restrictions on use, modification, or distribution
  • Always read the actual license, not summaries -- 'open source' and 'open weights' mean different things
  • Some licenses restrict using model outputs for training -- check before using outputs as synthetic training data
  • Derivative works (fine-tuned models) typically inherit the base model's license restrictions
  • When in doubt, consult legal counsel -- the legal landscape for AI model licensing is evolving rapidly
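
One way to keep these constraints from being forgotten is to encode them next to your model configuration. The registry below is illustrative, with terms paraphrased from the table above -- it is not legal advice, and the actual license text always governs:

```python
# Illustrative license gate; terms are paraphrased examples, not legal advice.
# Always verify against the actual license text before shipping.

LICENSES = {
    "mixtral-8x22b": {"license": "Apache-2.0", "outputs_for_training": True, "mau_cap": None},
    "llama-4":       {"license": "Llama 4 Community", "outputs_for_training": False, "mau_cap": 700_000_000},
    "gemma-2":       {"license": "Gemma License", "outputs_for_training": False, "mau_cap": None},
}

def check_usage(model: str, monthly_active_users: int, train_on_outputs: bool) -> list:
    """Return a list of license concerns for the planned usage (empty = none found)."""
    info = LICENSES[model]
    issues = []
    if info["mau_cap"] is not None and monthly_active_users > info["mau_cap"]:
        issues.append(f"{model}: MAU above {info['mau_cap']:,} requires a special license")
    if train_on_outputs and not info["outputs_for_training"]:
        issues.append(f"{model}: license restricts using outputs to train other models")
    return issues
```

A check like this belongs in the same place as your model selection config, so that swapping models forces a conscious licensing review.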

Decision Framework

The choice between open and closed models depends on four primary axes. Score each for your use case and the answer usually becomes clear.

| Dimension | Favors closed/API | Favors open/self-hosted |
|---|---|---|
| Privacy | Non-sensitive data; provider DPA is sufficient | PII, healthcare, financial, defense, or regulatory requirements |
| Scale | < 50M tokens/day | > 50M tokens/day |
| Customization | Prompt engineering is sufficient | Need fine-tuning, custom decoding, or model modifications |
| Budget | Low volume, willing to pay premium for simplicity | High volume, have ML infrastructure team |
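
The scoring can be made mechanical. This toy function is a sketch under assumed weights (privacy short-circuits the rest, reflecting that it usually overrides other considerations); tune the thresholds to your own situation:

```python
# Toy decision scorer for the four axes; weights and thresholds are assumptions.

def recommend(privacy_sensitive: bool, tokens_per_day: float,
              needs_finetuning: bool, has_ml_infra_team: bool) -> str:
    """Return a coarse recommendation: 'open/self-hosted' or 'closed/API'."""
    if privacy_sensitive:
        # Hard regulatory/privacy requirements usually override everything else.
        return "open/self-hosted"
    score = sum([
        1 if tokens_per_day > 50e6 else -1,   # scale
        1 if needs_finetuning else -1,        # customization
        1 if has_ml_infra_team else -1,       # budget / ops capacity
    ])
    return "open/self-hosted" if score > 0 else "closed/API"
```

A prototype with no special requirements scores negative on every axis and lands on closed APIs, which matches the pattern described below.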

Start closed, graduate to open

The most common successful pattern: start with a closed model API to validate your product and find product-market fit. Once you have stable usage patterns and understand your quality requirements, evaluate whether migrating to open models (self-hosted or via cheaper API providers) makes economic sense. Premature optimization with self-hosting is a common trap for early-stage teams.

  • If any dimension strongly favors open models (especially privacy), that usually overrides other considerations
  • If you need frontier quality on the hardest reasoning tasks, closed models still have an edge (but it is shrinking)
  • The 'hosted open model API' middle ground (Together, Fireworks) eliminates the ops burden while keeping cost advantages
  • Plan for model portability from day one: abstract your LLM calls behind a clean interface, regardless of which side you choose
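
Planning for portability mostly means not scattering vendor SDK calls through your codebase. A minimal sketch of such an abstraction, with illustrative class and method names (not taken from any particular SDK):

```python
# Minimal provider-agnostic interface sketch; names are illustrative,
# not from any particular vendor SDK.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class ClosedAPIModel:
    """Would wrap a vendor SDK (OpenAI, Anthropic, ...) behind .complete()."""
    def __init__(self, model_name: str):
        self.model_name = model_name
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call the vendor SDK here")

class EchoModel:
    """Stand-in backend for tests; a real deployment would point at
    a self-hosted server (e.g. vLLM) or a hosted open-model API."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def answer(model: ChatModel, question: str) -> str:
    # Application code depends only on the interface, so swapping
    # closed API <-> self-hosted open model is a one-line change.
    return model.complete(question)
```

With this shape, migrating providers means writing one new adapter class rather than touching every call site -- which is exactly what "graduating" from closed to open requires.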

Best Practices

Do

  • Start with closed model APIs for rapid prototyping and product validation
  • Evaluate hosted open model providers (Together, Fireworks, Groq) as a middle ground
  • Read the full license text before building production systems on any open model
  • Build model-agnostic abstractions so you can switch between open and closed models
  • Factor in total cost of ownership for self-hosting: hardware, ops, monitoring, and engineering time

Don't

  • Don't self-host before you have significant volume -- the break-even point is higher than most teams expect
  • Don't assume 'open source' means fully unrestricted -- many open models have commercial restrictions
  • Don't ignore the quality gap on frontier tasks -- closed models still lead on the hardest benchmarks
  • Don't build on a single provider without a migration strategy
  • Don't underestimate the ops burden of self-hosting -- model serving at scale is a full-time job

Key Takeaways

  • Closed models offer higher quality ceilings and zero ops, but lock you into a vendor with no privacy guarantees.
  • Open models provide full control and privacy, but require significant infrastructure expertise to self-host.
  • Self-hosting becomes economically favorable at roughly $10K-50K/month in API spend.
  • Hosted open model APIs (Together, Fireworks) offer a compelling middle ground: open model pricing without ops burden.
  • Start with closed APIs for product validation, then evaluate open models once you have stable usage patterns.

Video on this topic

Open vs closed AI models: the real trade-offs
