AI Output Validation: From JSON Schema to Constrained Decoding
TL;DR
Most production AI breakage isn't the model being wrong — it's the model being right but malformed. A JSON payload missing a comma, an enum value returned as a synonym, a field name that drifted. Output validation is the line between "works in demo" and "works at scale." This guide covers the four validation patterns AI PMs should know, when to use each, and how they layer.
The Four Patterns
Pattern 1: Post-hoc validation
Model generates freely; you validate the output (JSON schema, regex, custom rules). Cheap, common, brittle on edge cases.
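A minimal sketch of the post-hoc pattern, using only the standard library. The field names (`title`, `priority`) and allowed values are invented for illustration; a real service would validate against its own schema.

```python
import json

# Hypothetical shape rules for a model output
REQUIRED_FIELDS = {"title": str, "priority": str}
ALLOWED_PRIORITY = {"low", "medium", "high"}

def validate_output(raw: str):
    """Post-hoc check: parse the model's raw text, then enforce shape rules.
    Returns (ok, error_message) so the caller can retry or surface an error."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"unparseable JSON: {e}"
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            return False, f"missing field: {field}"
        if not isinstance(data[field], ftype):
            return False, f"wrong type for field: {field}"
    if data["priority"] not in ALLOWED_PRIORITY:
        return False, f"invalid enum value: {data['priority']!r}"
    return True, ""
```

The brittleness shows up in what this can't catch: output that parses and type-checks but means the wrong thing.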
Pattern 2: Vendor-native structured outputs
OpenAI's structured outputs, Anthropic's tool use. The model is forced to produce schema-compliant output. Often the right default in 2026.
Pattern 3: Constrained decoding
Generation is mathematically constrained at the token level. Outlines, regex grammars, and JSON-mode libraries. The most reliable; some quality cost.
Pattern 4: Self-validation with retry
Model checks its own output, fixes errors. Cheap to add but doubles cost on retries. Useful for complex structured outputs.
JSON Schema Validation in Practice
JSON schema is the lingua franca of AI output validation. It's simple, well-supported, and good enough for most use cases. The challenge isn't writing the schema — it's designing schemas that produce reliable model outputs.
Use enums liberally
Free-form string fields are fine left open, but categorical fields are not: without an enum, models drift into synonyms and near-matches. Declare an enum for every fixed-value field and let validation catch deviations.
Required fields must be required
If a field is optional in the schema, models will sometimes omit it. Required = required.
Description matters
JSON schemas with rich field descriptions produce better outputs. Treat the schema as part of your prompt.
Avoid overly nested schemas
Three+ levels of nesting becomes a quality cliff. Flatten when possible.
Validate examples in your prompt
Few-shot examples must themselves pass schema validation. A bad example teaches a bad pattern.
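The tips above can be combined in one schema. This is an illustrative JSON Schema (field names invented): enums for categorical fields, every field required, a description on each field, and a flat structure with no nesting.

```python
# Illustrative JSON Schema applying the design tips: enums, all fields
# required, rich descriptions, no deep nesting, no extra properties.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {
            "type": "string",
            "description": "One-sentence summary of the customer issue.",
        },
        "category": {
            "type": "string",
            "enum": ["billing", "bug", "feature_request", "other"],
            "description": "Exactly one of the listed categories.",
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
            "description": "Overall customer sentiment in the message.",
        },
    },
    "required": ["summary", "category", "sentiment"],
    "additionalProperties": False,
}
```

`additionalProperties: False` also guards against field-name drift: an output with an invented extra field fails validation instead of slipping through.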
When to Use Constrained Decoding
Constrained decoding mathematically prevents the model from producing invalid output — every generated token is restricted to those that satisfy the grammar. It's the strongest guarantee available, with a real but small quality cost.
When format is non-negotiable
Code generation, SQL queries, regex outputs. If the format must parse cleanly, constrained decoding is the right tool.
When schema is complex
Deeply nested or strict JSON. Post-hoc validation has high failure rates here; constrained decoding shines.
When the quality tradeoff is acceptable
Constrained decoding can slightly degrade output quality. For reasoning-heavy tasks, weigh that cost carefully; for format-strict tasks, the tradeoff is almost always worth it.
When self-hosting
Constrained decoding is most powerful with full model access. Some hosted APIs support it; others don't. Plan accordingly.
Layered Validation in Production
Layer 1: Prompt-level guidance
Schema and few-shot examples in the prompt itself. The cheapest layer; sets baseline quality.
Layer 2: Vendor structured output
Use the platform's native JSON mode or tool use. Catches the majority of format failures at the API boundary.
Layer 3: Schema validation in code
Server-side parse and validate. Reject malformed; trigger retries or error UI.
Layer 4: Semantic checks beyond schema
Schema validates structure; semantic checks validate meaning. "Field X must reference an existing user." Catches plausible-but-wrong outputs.
Layer 5: Human review for high-stakes
When semantic checks aren't enough, add human review on a sample or all outputs. Last layer of defense.
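Layers 3 and 4 can be sketched together: a structural parse-and-validate pass, then a semantic check the schema can't express. The `assignee` field and the user lookup are invented for the example, standing in for the "must reference an existing user" check above.

```python
import json

# Stand-in for a real lookup (database, directory service, etc.)
EXISTING_USERS = {"u_1", "u_2"}

def validate_layers(raw: str):
    """Layer 3: parse and check structure. Layer 4: check meaning.
    Returns (ok, reason) so failures can trigger retries or error UI."""
    try:
        data = json.loads(raw)                   # layer 3: parse
    except json.JSONDecodeError:
        return False, "malformed JSON"
    if "assignee" not in data:                   # layer 3: structure
        return False, "missing field: assignee"
    if data["assignee"] not in EXISTING_USERS:   # layer 4: meaning
        return False, f"unknown user: {data['assignee']}"
    return True, ""
```

Note the ordering: each layer only runs on output that survived the cheaper layer before it.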
Validation Mistakes That Bite
No retry strategy on validation failure
When a call fails validation, retry with the validation error fed back into the prompt; models often fix format errors on the second attempt. If you don't retry, every parse error becomes a user-facing failure.
Silent failure on edge cases
Outputs that pass schema but are semantically wrong slip through. Always layer semantic checks.
Schema drift between code and prompt
Schema in code says X; prompt says Y. Outputs match Y, fail X. Single source of truth required.
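One way to enforce a single source of truth, sketched here: define the schema once in code and generate the prompt's copy from it, so the two can't diverge. The schema content is illustrative.

```python
import json

# The schema lives exactly once, in code. The prompt copy is derived
# from it, so code and prompt cannot drift apart.
SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["open", "closed"]},
    },
    "required": ["status"],
}

def build_prompt(task: str) -> str:
    """Render the task plus the schema the validator will enforce."""
    return (
        f"{task}\n\nReply with JSON matching this schema exactly:\n"
        + json.dumps(SCHEMA, indent=2)
    )
```

The same `SCHEMA` object then feeds the server-side validator, so what the model is told is by construction what the code checks.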
Trusting the model on enum values
Models return synonyms ('done' vs 'completed') without strict enum constraints. Lock enums explicitly.
No telemetry on validation failures
If you don't track failure rate per surface, you can't see slow drift. Log every validation outcome.