Beginner · 8 min
Model Configuration
Configure temperature, max_tokens, retries, timeouts, and rate limiting when initializing a model. Track token usage across multiple models with UsageMetadataCallbackHandler.
Quick Reference
- `init_chat_model('model-id', temperature=0.7, max_tokens=1000, timeout=30)`
- `max_retries=6` by default; increase to 10–15 for long-running tasks on unreliable networks
- `rate_limiter=InMemoryRateLimiter(...)` prevents hitting provider rate limits
- `UsageMetadataCallbackHandler` tracks token counts across all models in a session
- `logprobs=True` on `bind()` returns per-token log probabilities (OpenAI only)
Core Parameters
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| model | str | required | Model name or 'provider:model' shorthand |
| temperature | float | varies | Randomness — 0 = deterministic, 1+ = creative |
| max_tokens | int | varies | Max response length in tokens |
| timeout | float | None | Seconds before the request is cancelled |
| max_retries | int | 6 | Retry attempts on network/rate-limit errors |
| api_key | str | env var | Auth key — usually set via environment variable |
Configuring a model with init_chat_model
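A minimal configuration covering the core parameters from the table might look like the following; the model id and prompt are illustrative:

```python
from langchain.chat_models import init_chat_model

# 'provider:model' shorthand; the model id here is illustrative.
model = init_chat_model(
    "anthropic:claude-3-5-sonnet-latest",
    temperature=0.7,   # moderate randomness
    max_tokens=1000,   # cap response length in tokens
    timeout=30,        # cancel requests that take longer than 30s
    max_retries=6,     # the documented default, shown explicitly
    # api_key is read from the provider's environment variable
    # (e.g. ANTHROPIC_API_KEY) when not passed directly.
)

response = model.invoke("Summarize this in one sentence: ...")
print(response.content)
```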
Increase max_retries for long-running agents
The default is 6 retries with exponential backoff. For long-running agent tasks on unreliable networks, raise max_retries to 10–15. Network errors, 429 rate-limit responses, and 5xx server errors are retried automatically; 401 and 404 errors are not retried, since they indicate a permanent problem such as a bad key or unknown model.
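The retry policy described above (exponential backoff on transient errors, fail fast on 401/404) can be sketched in plain Python. `ProviderError` and `call_with_retries` are hypothetical names for illustration, not part of the library:

```python
import random
import time

class ProviderError(Exception):
    """Hypothetical provider error carrying an HTTP status code."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

# Transient failures worth retrying: rate limits and server errors.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def call_with_retries(request, max_retries=6, base_delay=0.5, sleep=time.sleep):
    """Call request(), retrying transient errors up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ProviderError as err:
            # 401 (bad credentials) and 404 (unknown model) are permanent:
            # re-raise immediately rather than waste retries.
            if err.status not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            # Exponential backoff with a little jitter: ~0.5s, 1s, 2s, ...
            sleep(base_delay * (2 ** attempt) * (1 + 0.1 * random.random()))
```

Note that with `max_retries=6` and this backoff schedule, a persistently failing request can wait on the order of half a minute before the final error surfaces, which is why the timeout and retry settings should be tuned together.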