Intermediate · 12 min read

Reasoning Models

Reasoning models (Claude adaptive thinking, OpenAI o3/o4-mini, Gemini 3 Deep Think) spend internal tokens on a scratchpad before producing the final answer. Those tokens are billed, so a reasoning call typically costs 5-10x more than the same call to a standard model, and it adds latency. This article covers when that tradeoff is worth it, how each provider's API works in 2026, and how LangChain gives you a single parsing interface across all of them.
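At its core, that single parsing interface means iterating over `response.content_blocks` and branching on each block's `type`. The sketch below runs on hand-built dicts, so it needs no API key. It assumes the block shapes `{'type': 'reasoning', 'reasoning': ...}` and `{'type': 'text', 'text': ...}`, which is our reading of LangChain's standard content blocks; `split_blocks` is a hypothetical helper, not part of LangChain.

```python
from typing import Any


def split_blocks(blocks: list[dict[str, Any]]) -> tuple[str, str]:
    """Return (reasoning_text, answer_text) from a list of content blocks.

    Assumption: blocks follow LangChain's standard shape, e.g.
    {"type": "reasoning", "reasoning": "..."} and {"type": "text", "text": "..."}.
    Any other block type (tool calls, images, ...) is ignored here.
    """
    reasoning_parts: list[str] = []
    text_parts: list[str] = []
    for block in blocks:
        kind = block.get("type")
        if kind == "reasoning":
            # A reasoning block may carry no visible text; default to ""
            reasoning_parts.append(block.get("reasoning", ""))
        elif kind == "text":
            text_parts.append(block.get("text", ""))
    # Drop empty reasoning fragments; concatenate answer text as-is
    return "\n".join(p for p in reasoning_parts if p), "".join(text_parts)


# Hand-built blocks; with LangChain you would pass response.content_blocks instead.
blocks = [
    {"type": "reasoning", "reasoning": "Check both edge cases first."},
    {"type": "text", "text": "The function fails on empty input."},
]
thinking, answer = split_blocks(blocks)
print(answer)
```

Because the branching keys off `type` rather than provider-specific fields, the same helper should work whichever model produced the response, as long as LangChain has normalized it into these block types.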

Quick Reference

→Claude Opus 4.7+: thinking={'type': 'adaptive'} — budget_tokens is rejected with a 400 error
→Claude Opus 4.6 / Sonnet 4.6: thinking={'type': 'adaptive'} + output_config={'effort': 'high'} — budget_tokens deprecated
→OpenAI o-series: reasoning_effort='low'|'medium'|'high'; o3-pro for the hardest 5% of tasks ($20 input / $80 output per 1M tokens)
→Gemini 3: thinking_config={'thinking_budget': -1} for dynamic thinking, 0 to disable
→LangChain: response.content_blocks returns type='reasoning' blocks then type='text' blocks
→Reasoning tokens bill as output tokens: adaptive thinking at high effort can cost ~10x a standard call
→Stream: thinking_delta events arrive before text_delta — drive 'Thinking...' UX from block type changes
→Skip reasoning for classification, summarization, single-hop retrieval, and short-form generation
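The streaming rule above can be sketched as a small state machine. It consumes events shaped like Anthropic's Messages API stream (`content_block_delta` events whose `delta.type` is `thinking_delta` or `text_delta`), shows a "Thinking..." status on the first thinking delta, and clears it when the first text delta arrives. The event list is hand-written for illustration and `ui_updates` is a hypothetical name; in a real app the events would come from the SDK's stream.

```python
from typing import Any, Iterable, Iterator


def ui_updates(events: Iterable[dict[str, Any]]) -> Iterator[tuple[str, str]]:
    """Map raw stream events to (action, payload) UI updates.

    Only content_block_delta events are inspected; everything else
    (block start/stop, message events, signature deltas) is skipped.
    """
    phase = "idle"  # idle -> thinking -> answering
    for event in events:
        if event.get("type") != "content_block_delta":
            continue
        delta = event["delta"]
        if delta["type"] == "thinking_delta":
            if phase == "idle":
                phase = "thinking"
                yield ("status", "Thinking...")
            # Thinking text itself is not shown to the user in this sketch
        elif delta["type"] == "text_delta":
            if phase != "answering":
                phase = "answering"
                yield ("status", "")  # clear the spinner on the first answer token
            yield ("append", delta["text"])


events = [
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Let me check..."}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": " ok."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "Answer: "}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "42"}},
]
for action, payload in ui_updates(events):
    print(action, repr(payload))
```

Keying the UI on the delta type, rather than on timing, keeps the spinner correct even when a provider sends no thinking at all: the first text delta simply clears an empty status.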

When NOT to Use Reasoning Models

The most expensive mistake with reasoning models is using them for tasks that don't need them. Before enabling thinking, run the single-step test: can this task be solved with a good prompt and a standard model in one pass? If yes, reasoning adds cost and latency without improving accuracy.

▸Classification and routing — single-hop decisions benefit from a fast, cheap model, not an extended thinking trace
▸Summarization — reasoning doesn't improve compression quality; it adds unnecessary tokens
▸RAG response synthesis — when context is well-retrieved, generating from it is a simple task
▸Short-form content generation — emails, headlines, translations, and reformatting don't benefit from deep reasoning
▸Tool dispatch — deciding which tool to call is a classification problem; reasoning models are overkill
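The single-step test and the skip list can be written down as a routing rule. In this sketch the task labels are hypothetical names for the categories above, and `solvable_in_one_pass` stands in for whatever evidence you have (for example, eval results from a standard model); nothing here calls a model.

```python
# Hypothetical labels for the task categories listed above.
SKIP_REASONING = {
    "classification",
    "routing",
    "summarization",
    "rag_synthesis",
    "short_form_generation",
    "tool_dispatch",
}


def should_enable_reasoning(task_type: str, solvable_in_one_pass: bool) -> bool:
    """Single-step test: enable thinking only when a standard one-pass call falls short."""
    if task_type in SKIP_REASONING:
        # Listed categories stay on a fast, cheap model regardless
        return False
    return not solvable_in_one_pass


print(should_enable_reasoning("summarization", solvable_in_one_pass=False))
print(should_enable_reasoning("complex_debugging", solvable_in_one_pass=False))
```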

Reasoning can make simple tasks worse

At high effort, reasoning models sometimes overcomplicate straightforward questions. Asked "What is 2+2?", a model may reason for 500 tokens before returning the right answer, or talk itself into a plausible-looking "reasoning chain" that ends in a wrong one. On simple tasks, reasoning is a liability, not an asset.

Use reasoning models for tasks where mistakes are costly and accuracy improves with deliberation: multi-step proofs, security analysis, legal document interpretation, complex debugging, and adversarial analysis. If you can tolerate a 5-10% error rate, a standard model with chain-of-thought prompting is cheaper and faster.
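One way to make "mistakes are costly" concrete is to compare expected cost per task: the price of the call plus the error rate times the cost of one mistake. All numbers below are hypothetical, chosen only to show how the decision flips as mistakes get more expensive; they are not benchmark results.

```python
def expected_cost(call_cost: float, error_rate: float, cost_per_error: float) -> float:
    """Expected cost of one task: the call price plus the expected cost of a mistake."""
    return call_cost + error_rate * cost_per_error


def break_even_cost_per_error(std_call: float, std_err: float,
                              rsn_call: float, rsn_err: float) -> float:
    """Mistake cost at which both models have equal expected cost.

    Assumes the reasoning model is pricier (rsn_call > std_call) and more
    accurate (rsn_err < std_err); otherwise the result is meaningless.
    """
    return (rsn_call - std_call) / (std_err - rsn_err)


# Hypothetical inputs: $0.01 standard call at 8% error vs $0.10 reasoning call at 2%.
STD_CALL, STD_ERR = 0.01, 0.08
RSN_CALL, RSN_ERR = 0.10, 0.02

for cost_per_error in (0.10, 10.00):
    standard = expected_cost(STD_CALL, STD_ERR, cost_per_error)
    reasoning = expected_cost(RSN_CALL, RSN_ERR, cost_per_error)
    winner = "reasoning" if reasoning < standard else "standard"
    print(f"mistake=${cost_per_error:.2f}: standard ${standard:.3f}, "
          f"reasoning ${reasoning:.3f} -> {winner}")

breakeven = break_even_cost_per_error(STD_CALL, STD_ERR, RSN_CALL, RSN_ERR)
print(f"break-even mistake cost: ${breakeven:.2f}")
```

With these inputs the break-even mistake cost is about $1.50: below it the cheaper model has the lower expected cost, above it the reasoning model does.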

When to reach for reasoning — and which model to reach for

The Reasoning Model Landscape (2026)

Four providers offer production-grade reasoning models with distinct API patterns and pricing. Pick based on latency, audit trail requirements, and cost tolerance.
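To see how the API patterns diverge, here is a sketch that turns a single effort setting into provider-specific request parameters, using only the parameter names listed in the Quick Reference. It builds plain dicts and makes no SDK calls; `reasoning_kwargs` is a hypothetical helper, and the mapping (for example, always using dynamic thinking for Gemini and ignoring effort there) is an assumption to check against each provider's current docs.

```python
from typing import Any

EFFORTS = {"low", "medium", "high"}


def reasoning_kwargs(provider: str, effort: str = "high") -> dict[str, Any]:
    """Build reasoning-related request parameters for one provider.

    Parameter names mirror the Quick Reference; verify them against each
    provider's current API docs before use.
    """
    if effort not in EFFORTS:
        raise ValueError(f"unsupported effort: {effort!r}")
    if provider == "anthropic":
        # Adaptive thinking plus an effort level (Opus 4.6 / Sonnet 4.6 pattern); no budget_tokens
        return {"thinking": {"type": "adaptive"}, "output_config": {"effort": effort}}
    if provider == "openai":
        # o-series models take the effort string directly
        return {"reasoning_effort": effort}
    if provider == "gemini":
        # Assumption: always request dynamic thinking (-1); effort is not mapped
        return {"thinking_config": {"thinking_budget": -1}}
    raise ValueError(f"unknown provider: {provider!r}")


for name in ("anthropic", "openai", "gemini"):
    print(name, reasoning_kwargs(name, "medium"))
```

Keeping this mapping in one function means the rest of the application only deals with a provider name and an effort level, which is the same separation LangChain's unified parsing gives you on the response side.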

How Reasoning Works Under the Hood

Reasoning models generate an internal scratchpad before the final answer. This scratchpad (called thinking tokens on Claude and reasoning tokens on OpenAI) is billed as output tokens. You don't control what the model writes in it; you only control how much it can spend, through a token budget or an effort setting.
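The billing rule reduces to simple arithmetic: reasoning tokens are added to the answer tokens and charged at the output rate. The sketch uses the o3-pro prices from the Quick Reference ($20 input / $80 output per 1M tokens); the token counts are hypothetical, and `call_cost_usd` is an illustrative helper, not a provider API.

```python
def call_cost_usd(input_tokens: int, reasoning_tokens: int, answer_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call in USD; reasoning tokens are billed at the output rate."""
    billed_output = reasoning_tokens + answer_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Hypothetical token counts; prices are o3-pro's $20 / $80 per 1M tokens.
with_reasoning = call_cost_usd(2_000, 8_000, 500, 20.0, 80.0)
without_reasoning = call_cost_usd(2_000, 0, 500, 20.0, 80.0)
print(f"${with_reasoning:.2f} vs ${without_reasoning:.2f} "
      f"({with_reasoning / without_reasoning:.0f}x)")
```

With these inputs the 8,000 reasoning tokens raise the cost of the call from $0.08 to $0.72, roughly 9x, even though the visible answer is the same 500 tokens.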
