
Reasoning Models

Reasoning models (o3, Claude with extended thinking) emit internal thought steps before the final answer. Access reasoning via content_blocks, control effort with budget_tokens, and stream thinking tokens in real time.

Quick Reference

  • Filter response.content_blocks for type='reasoning' to get the thought steps
  • Pass thinking={'type': 'enabled', 'budget_tokens': 5000} to Claude for extended thinking
  • OpenAI o-series: reasoning_effort='low'|'medium'|'high' controls cost/quality tradeoff
  • Reasoning tokens count toward output tokens — budget affects cost
  • stream=True yields reasoning chunks before the final answer text
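The last bullet can be sketched as a loop over streamed chunks. The dicts below stand in for the content blocks carried by each streamed chunk; the exact keys on real provider output may differ (an assumption for illustration). Reasoning chunks arrive before the answer text, so you can surface "thinking" to the user in real time.

```python
# Minimal sketch of consuming a reasoning stream. Each dict mimics one
# content block from a streamed chunk (assumed shape, for illustration).

def consume_stream(chunks):
    """Collect reasoning and answer text separately as chunks arrive."""
    thinking, answer = [], []
    for chunk in chunks:
        if chunk["type"] == "reasoning":
            thinking.append(chunk["reasoning"])  # live "thinking" tokens
        elif chunk["type"] == "text":
            answer.append(chunk["text"])         # final answer tokens
    return "".join(thinking), "".join(answer)

# Simulated stream: reasoning first, then the final answer.
stream = [
    {"type": "reasoning", "reasoning": "Compare the two options"},
    {"type": "reasoning", "reasoning": " step by step."},
    {"type": "text", "text": "Option B is cheaper."},
]
thinking, answer = consume_stream(stream)
```

In a real call you would iterate `model.stream(prompt)` instead of the simulated list and inspect each chunk's content blocks the same way.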

What Reasoning Output Is

Reasoning models think before they answer. Models like OpenAI o3 and Claude with extended thinking emit a reasoning trace — internal steps the model takes to solve the problem — before producing the final response. LangChain surfaces these as content_blocks of type 'reasoning' on the AIMessage. The final text answer is a separate block of type 'text'.
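The separation described above can be shown with a small sketch. The block dicts mimic the shape of `content_blocks` on an AIMessage; real provider output may carry additional keys (an assumption for illustration).

```python
# Sketch of splitting a response into reasoning trace and final answer.
# Block dicts below assume the {'type': ..., 'reasoning'/'text': ...} shape.

def split_reasoning(blocks: list[dict]) -> tuple[list[str], str]:
    """Return (reasoning steps, final answer text) from content blocks."""
    reasoning = [b["reasoning"] for b in blocks if b["type"] == "reasoning"]
    text = "".join(b["text"] for b in blocks if b["type"] == "text")
    return reasoning, text

blocks = [
    {"type": "reasoning", "reasoning": "First, factor the expression."},
    {"type": "reasoning", "reasoning": "Then check the boundary case."},
    {"type": "text", "text": "The answer is 42."},
]
steps, answer = split_reasoning(blocks)
```

With a live model, `blocks` would be `response.content_blocks` from an invoke call; the filtering logic is the same.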

Reasoning tokens are billed as output tokens

The reasoning trace counts toward your output token usage. A call with a 5,000-token reasoning budget can cost several times more than the same prompt without reasoning. Use reasoning selectively, for complex multi-step problems where the accuracy gain justifies the cost.
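A hedged sketch of the two budget controls named earlier: `thinking`/`budget_tokens` for Claude and `reasoning_effort` for OpenAI o-series. The helpers below only build the keyword arguments you would pass to a chat model constructor; the model names are placeholders, and the max_tokens headroom is an assumption to keep room for the final answer above the thinking budget.

```python
# Build provider kwargs for reasoning control (a sketch, not a full client).

def claude_thinking_kwargs(budget_tokens: int) -> dict:
    # Assumption: leave 1000 tokens of headroom so max_tokens exceeds
    # the thinking budget and the final answer still fits.
    return {
        "model": "claude-sonnet-4-5",          # placeholder model name
        "max_tokens": budget_tokens + 1000,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }

def openai_effort_kwargs(effort: str) -> dict:
    # o-series trades cost for quality via a coarse effort setting.
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown effort: {effort}")
    return {"model": "o3", "reasoning_effort": effort}

claude_kwargs = claude_thinking_kwargs(5000)
openai_kwargs = openai_effort_kwargs("medium")
```

You would splat these into your chat model constructor, e.g. `ChatAnthropic(**claude_kwargs)`, keeping the budget low by default and raising it only for tasks that need deeper reasoning.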