Structured Output Techniques
Getting reliable JSON, structured data, and type-safe outputs from LLMs. Covers JSON mode, function calling, constrained decoding, Pydantic validation, and handling partial/malformed output in streaming scenarios.
Quick Reference
- →JSON mode: model outputs valid JSON, but no schema enforcement (any valid JSON)
- →Function calling / tool use: model fills parameters for a declared function schema
- →Structured Outputs (OpenAI): guaranteed schema compliance via constrained decoding
- →Pydantic + instructor: Python-native schema validation with automatic retries
- →Streaming JSON: use partial parsers (e.g., partial-json-parser) to handle incomplete JSON
- →Always validate LLM output with a schema -- never trust the model to produce perfect structure
Approaches to Structured Output
There are several ways to get structured data from LLMs, each with different trade-offs in reliability, flexibility, and provider support. The right choice depends on your reliability requirements and provider.
| Approach | Schema enforced? | Provider support | Reliability | Use when |
|---|---|---|---|---|
| Prompt-based | No | All providers | Low (70-90%) | Quick prototyping, simple schemas |
| JSON mode | Partial (valid JSON only) | OpenAI, Anthropic, Gemini | Medium (90-95%) | Need valid JSON, flexible schema |
| Function calling | Yes (parameter schema) | OpenAI, Anthropic, Gemini | High (95-99%) | Tool use patterns, well-defined schemas |
| Structured Outputs | Yes (constrained decoding) | OpenAI | Very high (99%+) | Critical schemas, zero tolerance for malformation |
| Instructor + Pydantic | Yes (validation + retry) | Any provider | Very high (99%+) | Type-safe Python, any provider |
OpenAI's Structured Outputs feature uses constrained decoding: at each generation step, the sampler masks out every token that is not a valid continuation of the schema. The output is therefore guaranteed to be schema-compliant -- not because the model is good at following instructions, but because it physically cannot produce invalid output.
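The artifact the decoder is constrained by is an ordinary JSON Schema, which you can derive directly from a Pydantic model. A minimal sketch (the `Invoice` model is illustrative, not from any real API):

```python
import json
from pydantic import BaseModel

class Invoice(BaseModel):
    invoice_id: str
    total_cents: int
    paid: bool

# The JSON Schema that a constrained decoder enforces token-by-token.
# Every field is required and typed, so the model cannot emit an
# object that is missing a key or has the wrong value type.
schema = Invoice.model_json_schema()
print(json.dumps(schema, indent=2))
```

In recent versions of the openai Python SDK you can pass the Pydantic model itself (e.g. as `response_format` to `client.beta.chat.completions.parse`) and the SDK converts it to this schema for you; check your SDK version's docs for the exact entry point.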
JSON Mode and Function Calling
Function calling was designed for tool use, but it is excellent for data extraction. Define a 'function' that represents your extraction schema, force the model to call it, and parse the arguments. This gives you schema enforcement without needing OpenAI's Structured Outputs feature.
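A sketch of the pattern: declare the extraction schema as a tool, then validate whatever arguments the model returns through the same Pydantic model. The tool dict follows the common OpenAI-style shape, and `raw_arguments` stands in for the JSON string a real API response would carry:

```python
from pydantic import BaseModel

class Contact(BaseModel):
    name: str
    email: str

# The "function" is really just our extraction schema in disguise.
extract_contact_tool = {
    "name": "extract_contact",
    "description": "Extract a single contact from the given text.",
    "parameters": Contact.model_json_schema(),
}

# When the model is forced to call the tool, it returns arguments as a
# JSON string. Simulated here; in real code this comes from the API
# response's tool-call payload.
raw_arguments = '{"name": "Grace Hopper", "email": "grace@example.com"}'

# Validate through the same model that generated the tool schema, so
# the declared schema and the parsing logic can never drift apart.
contact = Contact.model_validate_json(raw_arguments)
print(contact.email)
```

Using one Pydantic model for both the tool declaration and the parsing step is the key design choice: the schema the model sees and the schema you validate against are always identical.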
Pydantic + Instructor for Type Safety
The instructor library wraps LLM APIs with Pydantic model validation, giving you Python-native type safety with automatic retries on validation failure. It works with any provider and is the recommended approach for production Python applications.
Raw JSON parsing with json.loads() is fragile. Pydantic gives you type validation, default values, custom validators, and clear error messages when the model output does not match your schema. Combined with instructor's automatic retry, this makes structured extraction robust enough for production.
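The retry loop instructor automates can be sketched by hand. Here `extract_with_retries` and the stubbed `generate` callable are illustrative; instructor exposes the same behavior through its `max_retries` parameter, feeding the `ValidationError` back to the model so it can self-correct:

```python
from pydantic import BaseModel, ValidationError, field_validator

class User(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, v: int) -> int:
        if not 0 <= v <= 150:
            raise ValueError("age must be between 0 and 150")
        return v

def extract_with_retries(generate, prompt: str, max_retries: int = 2) -> User:
    """On validation failure, append the error to the prompt and retry --
    the same loop instructor runs for you."""
    for _ in range(max_retries + 1):
        raw = generate(prompt)
        try:
            return User.model_validate_json(raw)
        except ValidationError as e:
            prompt = f"{prompt}\nYour last output failed validation:\n{e}\nReturn corrected JSON."
    raise RuntimeError("extraction failed after retries")

# Stub standing in for a real LLM call: fails once, then succeeds.
responses = iter(['{"name": "Ada", "age": -5}', '{"name": "Ada", "age": 36}'])
user = extract_with_retries(lambda p: next(responses), "Extract the user.")
print(user.age)  # 36
```

The validation error text matters: it is what the model reads on the retry, which is why Pydantic's explicit error messages make this loop converge where bare `json.loads()` failures would not.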
Handling Streaming and Partial JSON
When streaming LLM responses, you receive tokens incrementally. If the output is JSON, the stream contains partial, invalid JSON until the response completes. Handling this requires special parsing approaches.
You cannot fully validate a Pydantic model until the stream completes. Partial streaming gives you early access to fields but cannot enforce cross-field validation (like 'end_date must be after start_date'). For critical validation, wait for the complete response. For UX, show partial results but indicate they are provisional.
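To show the mechanics, here is a deliberately naive partial-JSON completer: it tracks open strings, objects, and arrays and appends the missing closers so a streaming prefix can be parsed early. It skips edge cases (e.g. a prefix ending right after a key's colon) that the partial-json-parser library handles properly:

```python
import json

def complete_partial_json(fragment: str) -> str:
    """Close any unterminated string, object, or array in a JSON prefix.
    A minimal sketch, not a robust parser."""
    stack: list[str] = []
    in_string = escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            stack.pop()
    return fragment + ('"' if in_string else "") + "".join(reversed(stack))

# Tokens arrive incrementally; parse a provisional object at each step.
stream_so_far = '{"title": "Q3 report", "tags": ["finance", "dra'
provisional = json.loads(complete_partial_json(stream_so_far))
print(provisional)  # note the truncated last tag -- provisional data
```

This is why partial results should be flagged as provisional in the UI: the truncated `"dra"` value parses fine but is not the final content.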
Choosing the Right Approach
| Requirement | Best approach | Why |
|---|---|---|
| Quick prototype, any provider | Prompt-based JSON | Fastest to implement, no library dependencies |
| Valid JSON guaranteed | JSON mode (response_format) | Built into the API, zero parsing failures |
| Schema enforcement, OpenAI | Structured Outputs | Constrained decoding, physically can't produce invalid output |
| Schema enforcement, any provider | Instructor + Pydantic | Works everywhere, type-safe, auto-retry |
| Tool use / agent workflows | Function calling | Designed for this, all providers support it |
| Streaming with typed output | Instructor partial streaming | Handles incremental Pydantic population |
Even with constrained decoding, validate the semantic content. A schema-valid JSON response can still contain nonsensical values (age: -5, date: '2099-13-45'). Layer your defenses: (1) constrained decoding for syntactic validity, (2) Pydantic validators for semantic validity, (3) business logic checks for domain validity.
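Layers (2) and (3) can live directly on the Pydantic model. A sketch with an illustrative `Booking` schema: a `field_validator` catches out-of-range scalars and a `model_validator` enforces the cross-field date rule mentioned above:

```python
from datetime import date
from pydantic import BaseModel, ValidationError, field_validator, model_validator

class Booking(BaseModel):
    guest_age: int
    start_date: date
    end_date: date

    @field_validator("guest_age")
    @classmethod
    def age_in_range(cls, v: int) -> int:
        if not 0 <= v <= 150:
            raise ValueError("guest_age out of range")
        return v

    @model_validator(mode="after")
    def dates_ordered(self) -> "Booking":
        # Cross-field check: only possible once all fields are parsed.
        if self.end_date <= self.start_date:
            raise ValueError("end_date must be after start_date")
        return self

# Schema-valid but semantically wrong: rejected here, not downstream.
try:
    Booking.model_validate(
        {"guest_age": 30, "start_date": "2025-03-01", "end_date": "2025-02-01"}
    )
    caught = None
except ValidationError as e:
    caught = e
print("rejected" if caught else "accepted")

ok = Booking.model_validate(
    {"guest_age": 30, "start_date": "2025-02-01", "end_date": "2025-03-01"}
)
print(ok.guest_age)
```

Note that `mode="after"` model validators only run once every field has individually validated, which is exactly why cross-field rules cannot be checked on a partial stream.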
- ▸Start with instructor + Pydantic -- it works with every provider and gives you the most flexibility
- ▸If you are OpenAI-only and need maximum reliability, use Structured Outputs
- ▸For agent/tool-use patterns, function calling is the natural fit across all providers
- ▸Always add Pydantic field validators for domain constraints (valid ranges, enum values, format checks)
- ▸Log validation failures -- they reveal prompt weaknesses and model limitations
Best Practices
Do
- ✓Use Pydantic models as your source of truth for output schemas -- they provide validation, docs, and type safety
- ✓Use instructor with automatic retries for production structured extraction
- ✓Add field-level validators in Pydantic for domain-specific constraints
- ✓Start with function calling or instructor -- avoid raw prompt-based JSON in production
- ✓Log and monitor validation failures to identify systematic extraction issues
Don’t
- ✗Don't use json.loads() without a Pydantic model -- you lose validation and type safety
- ✗Don't assume JSON mode guarantees your schema -- it only guarantees valid JSON syntax
- ✗Don't ignore streaming edge cases -- partial JSON requires special handling
- ✗Don't use Structured Outputs if you need provider portability (OpenAI-only feature)
- ✗Don't skip semantic validation -- schema-valid JSON can still contain wrong values
Key Takeaways
- ✓JSON mode guarantees valid JSON syntax but not your schema. Function calling and Structured Outputs enforce schemas.
- ✓Instructor + Pydantic is the best general-purpose approach: works with any provider, type-safe, auto-retries on failure.
- ✓Constrained decoding (OpenAI Structured Outputs) physically prevents schema violations at the token level.
- ✓Streaming JSON requires partial parsing -- use instructor's create_partial or partial-json-parser library.
- ✓Always validate semantics on top of structure: schema-valid JSON can still contain nonsensical values.
Video on this topic
Getting reliable JSON from LLMs every time (TikTok)