Reasoning Models
Reasoning models (OpenAI o3, Claude with extended thinking) emit internal thought steps before the final answer. Access the reasoning trace via content_blocks, control effort with budget_tokens or reasoning_effort, and stream thinking tokens in real time.
Quick Reference
- Filter response.content_blocks for blocks with type='reasoning' to get the thought steps
- Pass thinking={'type': 'enabled', 'budget_tokens': 5000} to Claude to enable extended thinking
- OpenAI o-series: reasoning_effort='low'|'medium'|'high' controls the cost/quality tradeoff
- Reasoning tokens count toward output tokens, so the thinking budget directly affects cost
- stream=True yields reasoning chunks before the final answer text
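The streaming behavior can be sketched without a live model. The generator below stands in for a real model.stream(...) call (the chunk shape, with a content_blocks list of typed dicts, is an assumption about the streaming payload); the consumer shows reasoning chunks arriving before the answer text:

```python
# Sketch: consuming reasoning chunks ahead of the answer when streaming.
# fake_stream() is a stand-in for model.stream(...); each chunk carries
# content_blocks shaped like the dicts a live stream would yield.

def fake_stream():
    yield {"content_blocks": [{"type": "reasoning", "reasoning": "Plan the steps."}]}
    yield {"content_blocks": [{"type": "reasoning", "reasoning": "Check the edge cases."}]}
    yield {"content_blocks": [{"type": "text", "text": "42"}]}

def consume(stream):
    """Collect reasoning chunks and assemble the final answer text."""
    thoughts, answer_parts = [], []
    for chunk in stream:
        for block in chunk["content_blocks"]:
            if block["type"] == "reasoning":
                thoughts.append(block["reasoning"])  # arrives first
            elif block["type"] == "text":
                answer_parts.append(block["text"])
    return thoughts, "".join(answer_parts)

thoughts, answer = consume(fake_stream())
```

This lets a UI render a "thinking…" view from the reasoning chunks while the answer is still being produced.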
What Reasoning Output Is
Reasoning models think before they answer. Models like OpenAI o3 and Claude with extended thinking emit a reasoning trace — internal steps the model takes to solve the problem — before producing the final response. LangChain surfaces these as content_blocks of type 'reasoning' on the AIMessage. The final text answer is a separate block of type 'text'.
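Separating the two block types can be sketched as a small helper. The block dicts below mirror the type='reasoning' / type='text' shapes described above; the commented-out model setup at the end is an assumption about the surrounding LangChain calls, not verified code:

```python
# Sketch: splitting an AIMessage's content_blocks into reasoning and text,
# assuming each block is a dict with a 'type' key as described above.

def split_reasoning(blocks):
    """Return (reasoning_blocks, text_blocks) from a content_blocks list."""
    reasoning = [b for b in blocks if b.get("type") == "reasoning"]
    text = [b for b in blocks if b.get("type") == "text"]
    return reasoning, text

# With a live model this would look roughly like (assumed setup):
#   model = ChatAnthropic(model="...", thinking={"type": "enabled",
#                                                "budget_tokens": 5000})
#   response = model.invoke("How many r's are in 'strawberry'?")
#   reasoning, text = split_reasoning(response.content_blocks)
```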
The reasoning trace counts toward your output token usage, so a call with a 5,000-token reasoning budget can cost significantly more than a standard call. Use reasoning selectively, for complex multi-step problems where the accuracy gain justifies the cost.
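The cost effect is easy to quantify. The estimator below uses an illustrative per-million-token rate (not a real price for any provider) to show how a thinking budget inflates the output-token bill:

```python
# Sketch: reasoning tokens are billed as output tokens, so the thinking
# budget adds directly to cost. The rate below is illustrative only.

def estimate_output_cost(answer_tokens, reasoning_tokens, rate_per_mtok=15.0):
    """Dollar cost of output tokens, reasoning trace included."""
    total_tokens = answer_tokens + reasoning_tokens
    return total_tokens * rate_per_mtok / 1_000_000

# A 200-token answer alone vs. the same answer with a 5,000-token trace:
plain = estimate_output_cost(200, 0)            # 200 billed tokens
with_thinking = estimate_output_cost(200, 5000) # 5,200 billed tokens
```

At these numbers the thinking call bills 26x the tokens of the plain one, which is why the budget deserves the same attention as the prompt.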