AI Output Validation: From JSON Schema to Constrained Decoding
TL;DR
Most production AI breakage isn't the model being wrong — it's the model being right but malformed. A JSON payload missing a comma, an enum value returned as a synonym, a field name that drifted. Output validation is the line between "works in demo" and "works at scale." This guide covers the four validation patterns AI PMs should know, when to use each, and how they layer.
The Four Patterns
Pattern 1: Post-hoc validation
Model generates freely; you validate the output (JSON schema, regex, custom rules). Cheap, common, brittle on edge cases.
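A minimal sketch of the post-hoc pattern, using only the standard library. The field names (`title`, `priority`) and allowed values are invented for illustration; a real service would validate against its own schema.

```python
import json

# Hypothetical shape rules for a model output
REQUIRED_FIELDS = {"title": str, "priority": str}
ALLOWED_PRIORITY = {"low", "medium", "high"}

def validate_output(raw: str):
    """Post-hoc check: parse the model's raw text, then enforce shape rules.
    Returns (ok, error_message) so the caller can retry or surface an error."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"unparseable JSON: {e}"
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            return False, f"missing field: {field}"
        if not isinstance(data[field], ftype):
            return False, f"wrong type for field: {field}"
    if data["priority"] not in ALLOWED_PRIORITY:
        return False, f"invalid enum value: {data['priority']!r}"
    return True, ""
```

The brittleness shows up in what this can't catch: output that parses and type-checks but means the wrong thing.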
Pattern 2: Vendor-native structured outputs
OpenAI's structured outputs, Anthropic's tool use. The model is forced to produce schema-compliant output. Often the right default in 2026.
Pattern 3: Constrained decoding
Generation is mathematically constrained at the token level. Outlines, regex grammars, and JSON-mode libraries. The most reliable; some quality cost.
Pattern 4: Self-validation with retry
Model checks its own output, fixes errors. Cheap to add but doubles cost on retries. Useful for complex structured outputs.
JSON Schema Validation in Practice
JSON schema is the lingua franca of AI output validation. It's simple, well-supported, and good enough for most use cases. The challenge isn't writing the schema — it's designing schemas that produce reliable model outputs.
Use enums liberally
Free-form string fields are fine left open, but categorical fields are not: without an enum, models drift into synonyms and near-matches. Declare an enum for every fixed-value field and let validation catch deviations.
Required fields must be required
If a field is optional in the schema, models will sometimes omit it. Required = required.
Description matters
JSON schemas with rich field descriptions produce better outputs. Treat the schema as part of your prompt.
Avoid overly nested schemas
Three+ levels of nesting becomes a quality cliff. Flatten when possible.
Validate examples in your prompt
Few-shot examples must themselves pass schema validation. A bad example teaches a bad pattern.
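The tips above can be combined in one schema. This is an illustrative JSON Schema (field names invented): enums for categorical fields, every field required, a description on each field, and a flat structure with no nesting.

```python
# Illustrative JSON Schema applying the design tips: enums, all fields
# required, rich descriptions, no deep nesting, no extra properties.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {
            "type": "string",
            "description": "One-sentence summary of the customer issue.",
        },
        "category": {
            "type": "string",
            "enum": ["billing", "bug", "feature_request", "other"],
            "description": "Exactly one of the listed categories.",
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
            "description": "Overall customer sentiment in the message.",
        },
    },
    "required": ["summary", "category", "sentiment"],
    "additionalProperties": False,
}
```

`additionalProperties: False` also guards against field-name drift: an output with an invented extra field fails validation instead of slipping through.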
When to Use Constrained Decoding
Constrained decoding mathematically prevents the model from producing invalid output — every generated token is restricted to those that satisfy the grammar. It's the strongest guarantee available, with a real but small quality cost.
When format is non-negotiable
Code generation, SQL queries, regex outputs. If the format must parse cleanly, constrained decoding is the right tool.
When schema is complex
Deeply nested or strict JSON. Post-hoc validation has high failure rates here; constrained decoding shines.
When the quality tradeoff is acceptable
Constrained decoding can slightly degrade output quality. For reasoning-heavy tasks, weigh that cost carefully; for format-strict tasks, the tradeoff is almost always worth it.
When self-hosting
Constrained decoding is most powerful with full model access. Some hosted APIs support it; others don't. Plan accordingly.
Layered Validation in Production
Layer 1: Prompt-level guidance
Schema and few-shot examples in the prompt itself. The cheapest layer; sets baseline quality.
Layer 2: Vendor structured output
Use the platform's native JSON mode or tool use. Catches the majority of format failures at the API boundary.
Layer 3: Schema validation in code
Server-side parse and validate. Reject malformed; trigger retries or error UI.
Layer 4: Semantic checks beyond schema
Schema validates structure; semantic checks validate meaning. "Field X must reference an existing user." Catches plausible-but-wrong outputs.
Layer 5: Human review for high-stakes
When semantic checks aren't enough, add human review on a sample or all outputs. Last layer of defense.
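Layers 3 and 4 can be sketched together: a structural parse-and-validate pass, then a semantic check the schema can't express. The `assignee` field and the user lookup are invented for the example, standing in for the "must reference an existing user" check above.

```python
import json

# Stand-in for a real lookup (database, directory service, etc.)
EXISTING_USERS = {"u_1", "u_2"}

def validate_layers(raw: str):
    """Layer 3: parse and check structure. Layer 4: check meaning.
    Returns (ok, reason) so failures can trigger retries or error UI."""
    try:
        data = json.loads(raw)                   # layer 3: parse
    except json.JSONDecodeError:
        return False, "malformed JSON"
    if "assignee" not in data:                   # layer 3: structure
        return False, "missing field: assignee"
    if data["assignee"] not in EXISTING_USERS:   # layer 4: meaning
        return False, f"unknown user: {data['assignee']}"
    return True, ""
```

Note the ordering: each layer only runs on output that survived the cheaper layer before it.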
Validation Mistakes That Bite
No retry strategy on validation failure
When a call fails validation, retry with the validation error fed back into the prompt; models often fix format errors on the second attempt. If you don't retry, every parse error becomes a user-facing failure.
Silent failure on edge cases
Outputs that pass schema but are semantically wrong slip through. Always layer semantic checks.
Schema drift between code and prompt
Schema in code says X; prompt says Y. Outputs match Y, fail X. Single source of truth required.
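One way to enforce a single source of truth, sketched here: define the schema once in code and generate the prompt's copy from it, so the two can't diverge. The schema content is illustrative.

```python
import json

# The schema lives exactly once, in code. The prompt copy is derived
# from it, so code and prompt cannot drift apart.
SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["open", "closed"]},
    },
    "required": ["status"],
}

def build_prompt(task: str) -> str:
    """Render the task plus the schema the validator will enforce."""
    return (
        f"{task}\n\nReply with JSON matching this schema exactly:\n"
        + json.dumps(SCHEMA, indent=2)
    )
```

The same `SCHEMA` object then feeds the server-side validator, so what the model is told is by construction what the code checks.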
Trusting the model on enum values
Models return synonyms ('done' vs 'completed') without strict enum constraints. Lock enums explicitly.
No telemetry on validation failures
If you don't track failure rate per surface, you can't see slow drift. Log every validation outcome.