AI Hypothesis-Driven Product Development: A Framework for AI PMs
TL;DR
AI features are too uncertain to build by gut. The teams shipping fast aren't building harder — they're running clearer hypotheses. This guide gives you a four-step framework (write the hypothesis, define the kill criteria, build the cheapest test, decide explicitly) and the templates AI PMs use to make uncertainty productive instead of paralyzing.
Why Hypothesis-Driven Development Matters More for AI
In a deterministic product, you build what you specced. In an AI product, the spec hides four different uncertainties: Does the model do this well? Do users want this output? Do the unit economics work at scale? Will users trust the output enough to act on it? Each of those is a different hypothesis with a different test. Build all four at once and you can't tell which one killed your feature.
Capability hypothesis
"The current model can produce X with Y quality." Tested with offline evals before any UI work.
Demand hypothesis
"Users want X in this context." Tested with concierge tests, fake doors, or low-fi prototypes.
Economics hypothesis
"The unit economics work at expected scale." Tested with cost models and traffic projections.
Trust hypothesis
"Users will accept and act on AI outputs in this surface." Tested with usability studies and live trust signals.
Step 1 — Write the Hypothesis Like a Scientist
A hypothesis must be specific enough to fail. "Users will love AI summaries" isn't a hypothesis — it's a wish. The format that works: "If we ___, then ___, because ___." Concrete change, observable outcome, stated rationale.
Bad hypothesis
"AI summaries will improve user engagement." — too vague to test, no specific magnitude, no causal mechanism.
Good hypothesis
"If we add AI summaries above long threads, time-to-first-action drops 30%+ on threads >500 words, because users currently scroll-skim and miss the asks."
What changed
Specific feature, specific metric, specific magnitude, specific scope, specific causal mechanism. Now you can build a test.
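One way to keep the if/then/because format honest is to capture each hypothesis as a structured record rather than a sentence buried in a doc. A minimal sketch; the field names are illustrative, not a standard template, and the values come from the example above.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One testable product hypothesis in 'if / then / because' form."""
    change: str            # "If we ..." -- the concrete change
    outcome_metric: str    # "then ..." -- the observable metric
    target_magnitude: str  # the specific magnitude that counts as success
    scope: str             # where the hypothesis applies
    rationale: str         # "because ..." -- the causal mechanism

summary_hypothesis = Hypothesis(
    change="add AI summaries above long threads",
    outcome_metric="time-to-first-action",
    target_magnitude="drops 30%+",
    scope="threads >500 words",
    rationale="users currently scroll-skim and miss the asks",
)
```

A field you can't fill in concretely is the tell that the hypothesis is still a wish.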
Hypothesis review meeting
Run hypotheses past the team weekly. Catching vague hypotheses early saves weeks of misdirected work.
Step 2 — Define Kill Criteria Up Front
If you don't define what failure looks like before the test, you'll rationalize success when the data is mixed. Kill criteria are the most uncomfortable line in any product hypothesis — and the most valuable.
Pre-commit to thresholds
"If acceptance rate is below 40%, we kill." State the number before the data. Force yourself to mean it.
Time-box the test
"We'll decide within 4 weeks of launch." Open-ended tests die slowly. Every test needs a hard end.
Anti-cherry-picking rules
List the metrics in advance. Don't move the goalposts when results are ambiguous.
Document the kill, don't bury it
Killed features still teach. A short writeup of what you learned is worth more than the team's comfort.
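Kill criteria are easiest to honor when they live next to the hypothesis as data, not renegotiated in a meeting after the results land. A sketch of what pre-committed criteria might look like, assuming the 40% acceptance-rate threshold and four-week window above; the date and field names are illustrative.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class KillCriteria:
    """Thresholds and deadline committed to before the test starts."""
    metric: str
    kill_below: float  # kill the feature if the metric lands under this
    decide_by: date    # hard end of the test window

criteria = KillCriteria(
    metric="acceptance_rate",
    kill_below=0.40,              # "If acceptance rate is below 40%, we kill."
    decide_by=date(2025, 7, 1),   # illustrative four-week decision deadline
)

def verdict(observed: float, today: date, c: KillCriteria) -> str:
    """Apply the pre-committed rules; no renegotiating after the data."""
    if today > c.decide_by:
        return "overdue: decide now with whatever data you have"
    if observed < c.kill_below:
        return "kill"
    return "keep testing or ship, per the rest of the criteria"

print(verdict(observed=0.34, today=date(2025, 6, 20), c=criteria))
```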
Build a Hypothesis Practice in the Masterclass
The AI PM Masterclass walks through hypothesis-driven development with real templates, kill criteria, and case studies of features that should have died sooner.
Step 3 — Build the Cheapest Test, Not the Best Feature
The first build of any AI feature should cost a fraction of the production version. Concierge tests, Wizard of Oz prototypes, fake doors, internal-only launches — these are the standard tools. The bias is always to build too much; the discipline is shrinking the test until it's embarrassingly small.
Concierge test
You manually do what the AI would do. Real users get real value. You learn whether the workflow even works before automating anything.
Wizard of Oz
The interface looks AI-powered; behind it, a human is generating outputs. You measure user reaction without paying for inference or eng time.
Fake door
Add the entry point in the UI. Track who clicks. Show a "coming soon" if they do. Demand signal at near-zero cost, sketched below.
Internal-only launch
Ship to internal users for two weeks. Real eval data, real bug surface, no PR risk.
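For the fake door specifically, the demand signal boils down to a click-through rate compared against the threshold you committed to in Step 2. A minimal sketch; the counts and threshold are illustrative assumptions.

```python
# Demand signal from a fake-door test: what fraction of users who saw the
# entry point clicked it? All counts and thresholds below are illustrative.

impressions = 12_000      # users who saw the "Summarize thread" entry point
clicks = 1_450            # users who clicked and hit the "coming soon" screen
demand_threshold = 0.08   # pre-committed minimum click-through rate

click_through_rate = clicks / impressions
print(f"Fake-door CTR: {click_through_rate:.1%}")

if click_through_rate < demand_threshold:
    print("Demand hypothesis disconfirmed at this threshold.")
else:
    print("Demand signal clears the bar; worth a cheap capability test next.")
```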
Step 4 — Decide Explicitly: Ship, Iterate, Kill, or Pivot
Ship
Hypothesis confirmed. Move to production with eval suite, monitoring, and rollout plan. Document what you learned.
Iterate
Mixed signals. One more focused test before deciding. Time-box the iteration; don't loop forever.
Kill
Hypothesis disconfirmed. Document why, capture artifacts that may help future tests, redirect headcount immediately.
Pivot
Different hypothesis emerges from the data. Treat as a new test, not a continuation. Rewrite the hypothesis cleanly.
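Whichever way the call goes, write the decision down in a form that survives the team moving on. A sketch of a lightweight decision record, reusing the summaries hypothesis from Step 1; all field values are illustrative.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    SHIP = "ship"        # hypothesis confirmed
    ITERATE = "iterate"  # mixed signals; one more time-boxed test
    KILL = "kill"        # hypothesis disconfirmed
    PIVOT = "pivot"      # a different hypothesis emerged; start a new test

@dataclass
class DecisionRecord:
    """The explicit, written outcome of one hypothesis test."""
    hypothesis: str
    decision: Decision
    evidence: str    # the numbers that drove the call
    learned: str     # what future tests should know
    next_step: str   # rollout plan, next test, or where headcount goes

record = DecisionRecord(
    hypothesis="AI summaries cut time-to-first-action 30%+ on threads >500 words",
    decision=Decision.KILL,
    evidence="acceptance rate 34%, below the pre-committed 40% threshold",
    learned="users trusted summaries on short threads but not long ones",
    next_step="redirect headcount; archive eval prompts for future tests",
)
```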