Reasoning Models
Reasoning models (Claude adaptive thinking, OpenAI o3/o4-mini, Gemini 3 Deep Think) spend internal tokens on a scratchpad before producing the final answer. They cost 5-10x more per call than standard models and add latency. This article covers when that tradeoff is worth it, how each provider's API works in 2026, and how LangChain gives you a single parsing interface across all of them.
Quick Reference
- Claude Opus 4.7+: thinking={'type': 'adaptive'}; budget_tokens is rejected with a 400 error
- Claude Opus 4.6 / Sonnet 4.6: thinking={'type': 'adaptive'} + output_config={'effort': 'high'}; budget_tokens is deprecated
- OpenAI o-series: reasoning_effort='low'|'medium'|'high'; reserve o3-pro for the hardest 5% of tasks ($20 input / $80 output per million tokens)
- Gemini 3: thinking_config={'thinking_budget': -1} for dynamic thinking, 0 to disable
- LangChain: response.content_blocks returns type='reasoning' blocks, then type='text' blocks
- Reasoning tokens bill as output tokens; adaptive thinking at high effort adds ~10x cost vs a standard call
- Streaming: thinking_delta events arrive before text_delta, so drive the 'Thinking...' UX from block type changes
- Skip reasoning for classification, summarization, single-hop retrieval, and short-form generation
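Because reasoning tokens bill as output tokens, the cost multiplier can be estimated with simple arithmetic. The sketch below is illustrative only: the function name and the token counts are made up for the example, and the rates are the o3-pro figures from the list above ($20 input, $80 output per million tokens).

```python
def call_cost_usd(input_tokens, visible_output_tokens, reasoning_tokens,
                  input_rate_per_m, output_rate_per_m):
    """Estimate the cost of one call; reasoning tokens are billed at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_rate_per_m
            + billed_output * output_rate_per_m) / 1_000_000


# Hypothetical token counts, priced at the o3-pro rates quoted in the list above.
without_reasoning = call_cost_usd(1_000, 500, 0, 20.0, 80.0)
with_reasoning = call_cost_usd(1_000, 500, 5_000, 20.0, 80.0)

print(f"no reasoning:   ${without_reasoning:.2f}")   # $0.06
print(f"with reasoning: ${with_reasoning:.2f}")      # $0.46
print(f"multiplier:     {with_reasoning / without_reasoning:.1f}x")  # 7.7x
```

With these assumed numbers, 5,000 hidden reasoning tokens turn a 6-cent call into a 46-cent one, even though the visible answer is the same length.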
When NOT to Use Reasoning Models
The most expensive mistake with reasoning models is using them for tasks that don't need them. Before enabling thinking, run the single-step test: can this task be solved with a good prompt and a standard model in one pass? If yes, reasoning adds cost and latency without improving accuracy.
- Classification and routing: single-hop decisions benefit from a fast, cheap model, not an extended thinking trace
- Summarization: reasoning doesn't improve compression quality; it adds unnecessary tokens
- RAG response synthesis: when context is well-retrieved, generating from it is a simple task
- Short-form content generation: emails, headlines, translations, and reformatting don't benefit from deep reasoning
- Tool dispatch: deciding which tool to call is a classification problem; reasoning models are overkill
At high effort, reasoning models sometimes over-complicate straightforward answers. Asked 'What is 2+2?', a model may reason for 500 tokens and return the right answer, or it may follow a hallucinated reasoning chain to a confidently wrong one. On simple tasks, reasoning is a liability, not an asset.
Use reasoning models for tasks where mistakes are costly and accuracy improves with deliberation: multi-step proofs, security analysis, legal document interpretation, complex debugging, and adversarial analysis. If you can tolerate a 5-10% error rate, a standard model with chain-of-thought prompting is cheaper and faster.
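The guidance in this section can be condensed into a small routing gate. This is a sketch rather than any provider's API: the task labels, the needs_reasoning name, and the 5% tolerance threshold are all assumptions chosen for illustration.

```python
# Task labels are hypothetical; map them to whatever your pipeline uses.
SKIP_REASONING = {
    "classification", "routing", "summarization",
    "rag_synthesis", "short_form_generation", "tool_dispatch",
}
USE_REASONING = {
    "multi_step_proof", "security_analysis", "legal_interpretation",
    "complex_debugging", "adversarial_analysis",
}


def needs_reasoning(task_type: str, tolerable_error_rate: float) -> bool:
    """Return True only when the task is deliberation-heavy and errors are costly."""
    if task_type in SKIP_REASONING:
        return False
    if task_type in USE_REASONING:
        # Assumed threshold: at 5% tolerated error or more, use a standard model.
        return tolerable_error_rate < 0.05
    # Unknown task types default to the cheaper standard model.
    return False


print(needs_reasoning("security_analysis", 0.01))  # True
print(needs_reasoning("summarization", 0.01))      # False
```

Defaulting unknown task types to the standard model keeps the expensive path opt-in, which matches the single-step test: reasoning has to earn its cost.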
When to reach for reasoning — and which model to reach for