Standard Scrum breaks when your "feature" is a model that may or may not work. Here are the frameworks that actually fit how AI products get built in 2026.
Why Standard Agile Fails for AI
Scrum was built for predictable software where you can break a feature into stories and estimate them. AI features are not predictable. You cannot story-point "improve the summarization quality from 72% to 85% on our eval set" because you do not know in advance whether that requires a new prompt, a new model, a new retrieval strategy, or a fundamentally different approach.
The AI PMs shipping the fastest in 2026 have stopped pretending that two-week sprints with burndown charts capture ML work. They have adapted, hybridized, or replaced classical agile with frameworks designed for uncertainty. The list below is the working set — not theoretical frameworks but the ones senior AI PMs actually run with their teams.
⚙️ Process is half the job. The AI PM Masterclass runs you through these frameworks on real AI projects — with a Salesforce Sr. Director PM who has actually shipped LLM products at scale.
Discovery-Heavy Frameworks
1. Dual-Track Agile (Marty Cagan, Adapted for AI)
Cagan's dual-track model splits work into a Discovery track (figure out what to build) and a Delivery track (build it). For AI products this is the foundational adaptation. Discovery becomes "figure out if the model can actually do this thing at the quality bar we need" — which is a research question, not a design question.
The adaptation for AI: run discovery for 1–2 weeks per initiative using prompt prototypes, small evals, and notebook experiments. Only commit to delivery when the discovery track produces a working prototype hitting at least 70% of the target quality. Most AI features die in discovery, and that is the point.
Why AI PMs need this: The single most important mental model for managing AI development uncertainty. Stops you from committing a delivery team to features that cannot actually be built.
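As a rough illustration, here is what that discovery gate could look like if you expressed it in code. The 70%-of-target bar comes from the rule above; the names and scores are invented, and this is a sketch rather than a prescribed implementation.

```python
from dataclasses import dataclass

@dataclass
class DiscoveryResult:
    """Outcome of a 1-2 week discovery spike for one AI initiative."""
    initiative: str
    target_quality: float   # quality bar the shipped feature must hit, e.g. 0.85
    prototype_score: float  # best eval score the discovery prototype reached

def ready_for_delivery(result: DiscoveryResult, bar: float = 0.70) -> bool:
    """Promote to the delivery track only if the prototype reaches
    at least `bar` (70% by default) of the target quality."""
    return result.prototype_score >= bar * result.target_quality

# Example: a summarization feature targeting 0.85 on the team's eval set.
spike = DiscoveryResult("summarization-v2", target_quality=0.85, prototype_score=0.61)
print(ready_for_delivery(spike))  # True: 0.61 >= 0.70 * 0.85 (= 0.595)
```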
2. Eval-Driven Development (Eugene Yan / Hamel Husain)
Borrowing test-driven development from software, eval-driven development says: before you build an AI feature, write the eval that proves it works. Then iterate prompts, retrieval, and models against that eval until it passes. Eugene Yan and Hamel Husain have written the canonical posts on this in 2024–2026.
For PMs, this changes how you write requirements. Instead of writing user stories, you write eval cases — input plus expected output, plus a clear pass/fail criterion. Engineers cannot ship until the eval passes. Stakeholders cannot argue with eval numbers. See our deeper dive on the AI evaluation and testing guide.
Why AI PMs need this: The most important productivity unlock in AI product management this decade. Replaces vague quality debates with measurable progress.
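Here is a minimal sketch of what eval-cases-as-requirements could look like in practice. `run_model` is a hypothetical stand-in for whatever prompt or model the team is iterating on, and the 85% pass threshold is an illustrative choice, not part of the framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One requirement, written as an eval instead of a user story."""
    case_id: str
    input_text: str
    must_contain: list[str]   # pass/fail criterion: phrases the output must include

def run_eval(cases: list[EvalCase], run_model: Callable[[str], str],
             pass_threshold: float = 0.85) -> bool:
    """Return True only if the share of passing cases meets the threshold."""
    passed = 0
    for case in cases:
        output = run_model(case.input_text)
        if all(phrase.lower() in output.lower() for phrase in case.must_contain):
            passed += 1
    score = passed / len(cases)
    print(f"{passed}/{len(cases)} cases passed ({score:.0%})")
    return score >= pass_threshold

# Hypothetical usage: swap `fake_model` for the real prompt or model under test.
cases = [
    EvalCase("refund-policy", "Summarize our refund policy", ["30 days", "full refund"]),
    EvalCase("shipping", "Summarize our shipping terms", ["5 business days"]),
]
fake_model = lambda text: "Refunds: full refund within 30 days. Shipping: 5 business days."
print(run_eval(cases, fake_model))  # both cases pass -> True
```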
3. Design Sprints (Jake Knapp, AI-Modified)
The 5-day design sprint compressed into 3 days works well for AI feature exploration. Day 1: define the model task and write evals. Day 2: prototype 3–4 prompt or architecture approaches. Day 3: run the evals, pick the winner, ship to a small user test.
The AI adaptation drops the elaborate user interview phase (you can interview after evals confirm feasibility) and replaces the final user test with an eval run plus a 5-user think-aloud session. Saves a week of work for many AI feature decisions.
Why AI PMs need this: Best framework for resolving "should we build this AI feature?" arguments in a fixed-time, low-cost way.
Strategy and Planning Frameworks
4. Lean AI Canvas
Ash Maurya's Lean Canvas adapted for AI products. Replaces "solution" with "model approach," "key metrics" with "evals + leading indicators," and adds two new sections: "data dependencies" and "failure modes." A single page that forces you to confront the AI-specific risks before writing a PRD.
Use it at the start of any new AI initiative. If you cannot fill out the data dependencies and failure modes sections with specifics, you are not ready to commit a team to the work. Stop and do more discovery.
Why AI PMs need this: Forces honest scoping before sunk costs accumulate. Particularly useful for AI feature pitches to leadership.
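One way to make the canvas concrete is to treat it as a small data structure with a readiness check, as in this sketch. The field names mirror the sections above; the example content is invented.

```python
from dataclasses import dataclass

@dataclass
class LeanAICanvas:
    """One-page scoping doc; the two AI-specific sections are mandatory."""
    problem: str
    model_approach: str                      # replaces "solution"
    evals_and_leading_indicators: list[str]  # replaces "key metrics"
    data_dependencies: list[str]             # new AI-specific section
    failure_modes: list[str]                 # new AI-specific section

    def ready_to_commit(self) -> bool:
        """Per the rule above: no specifics in data dependencies or
        failure modes means more discovery is needed."""
        return bool(self.data_dependencies) and bool(self.failure_modes)

canvas = LeanAICanvas(
    problem="Support agents spend 20 min summarizing long tickets",
    model_approach="RAG over ticket history + hosted LLM",
    evals_and_leading_indicators=["summary faithfulness eval >= 0.85"],
    data_dependencies=["90 days of ticket history with PII scrubbed"],
    failure_modes=["hallucinated refund amounts", "stale ticket context"],
)
print(canvas.ready_to_commit())  # True
```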
5. MLOps Loop (Google / Continuous Delivery for ML)
Google's MLOps maturity model describes the full lifecycle: data collection, training, validation, deployment, monitoring, retraining. As a PM, you do not own this loop, but you need to plan your roadmap around it. New training data takes weeks to collect. Retraining a model takes days. Monitoring drift takes ongoing eng time.
The PM framework move: tie every AI feature initiative to a stage of the MLOps loop. If your feature requires a new data source, the dependency is on data collection, not engineering. If it requires a model update, the dependency is the retraining cycle. This makes roadmap timing realistic.
Why AI PMs need this: The mental model that converts "this should take 2 weeks" into "this depends on the next retraining cycle, which means 8 weeks." Realism prevents missed commitments.
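A sketch of what tying an initiative to its MLOps-loop stage could look like in a planning script. The stage names follow the loop above; the lead times are illustrative assumptions you would replace with your team's real cycle times.

```python
from enum import Enum

class MLOpsStage(Enum):
    DATA_COLLECTION = "data collection"
    TRAINING = "training"
    VALIDATION = "validation"
    DEPLOYMENT = "deployment"
    MONITORING = "monitoring"
    RETRAINING = "retraining"

# Illustrative lead times in weeks; substitute your team's actual numbers.
TYPICAL_LEAD_TIME_WEEKS = {
    MLOpsStage.DATA_COLLECTION: 4,
    MLOpsStage.RETRAINING: 8,
    MLOpsStage.DEPLOYMENT: 1,
}

def realistic_estimate(feature: str, dependency: MLOpsStage, eng_weeks: int) -> int:
    """Roadmap estimate = engineering effort + the MLOps dependency it waits on."""
    wait = TYPICAL_LEAD_TIME_WEEKS.get(dependency, 0)
    total = eng_weeks + wait
    print(f"{feature}: {eng_weeks}w eng + {wait}w {dependency.value} = {total}w")
    return total

# "This should take 2 weeks" becomes 10 weeks once the retraining cycle is counted.
realistic_estimate("personalized ranking", MLOpsStage.RETRAINING, eng_weeks=2)
```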
6. CRISP-DM (Cross-Industry Standard Process for Data Mining)
A framework from the late 1990s that has aged surprisingly well. CRISP-DM's six phases — business understanding, data understanding, data preparation, modeling, evaluation, deployment — map cleanly onto AI feature development with one extra step (monitoring).
Most ML engineers know CRISP-DM by name even if they no longer cite it explicitly. Using its vocabulary in cross-functional meetings shortcuts a lot of process debates. "We are still in the data understanding phase" is a precise statement that ML engineers immediately respect.
Why AI PMs need this: Lingua franca for talking process with ML engineers. Use it to align on which phase a project is actually in.
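If you want the same vocabulary in your status reporting or tooling, a sketch like this is enough; the project names and phases shown are invented.

```python
from enum import Enum

class CrispDMPhase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6
    MONITORING = 7  # the one extra step AI products add to classic CRISP-DM

# Status report that uses the shared vocabulary instead of "it's in progress".
initiatives = {
    "ticket summarization": CrispDMPhase.DATA_UNDERSTANDING,
    "lead scoring v3": CrispDMPhase.EVALUATION,
}
for name, phase in initiatives.items():
    print(f"{name}: {phase.name.replace('_', ' ').lower()}")
```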
Execution and Cadence Frameworks
7. Kanban for AI Experimentation
Two-week sprints are wrong for AI experiments because experiments do not finish on a schedule — they finish when the eval crosses the threshold. Kanban with WIP limits handles this naturally. Each card is an experiment hypothesis with an attached eval. Cards move when the eval delivers a verdict.
The discipline that makes Kanban work for AI: set a WIP limit of 3–4 experiments per ML engineer. Force experiments to finish (positive or negative) before new ones start. This is the only way to avoid the AI team disease of "we have 47 in-flight experiments and no decisions."
Why AI PMs need this: The right execution framework for the discovery half of dual-track agile. Replaces sprints when work is genuinely unpredictable.
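A sketch of what WIP-limit enforcement could look like if your experiment board lived in code. The card fields, the limit of 3, and the example experiments are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ExperimentCard:
    hypothesis: str
    owner: str
    eval_name: str
    verdict: str | None = None  # card only moves when the eval delivers a verdict

def can_start(board: list[ExperimentCard], owner: str, wip_limit: int = 3) -> bool:
    """New experiments start only if the owner is under the WIP limit."""
    in_flight = sum(1 for c in board if c.owner == owner and c.verdict is None)
    return in_flight < wip_limit

board = [
    ExperimentCard("reranker improves retrieval", "maya", "retrieval@10"),
    ExperimentCard("shorter system prompt", "maya", "faithfulness-eval"),
    ExperimentCard("few-shot examples help tone", "maya", "tone-eval", verdict="negative"),
]
print(can_start(board, "maya"))  # True: 2 in flight, limit is 3
```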
8. OKRs With Leading + Lagging AI Metrics
Classical OKRs ("achieve 95% accuracy by Q3") fail in AI because the metric is the work — once you commit to it you have nothing left to discover. The 2026 adaptation pairs each AI OKR with a leading indicator (eval score) and a lagging indicator (user-facing metric like task completion or retention).
Run quarterly OKRs on lagging indicators, weekly check-ins on leading indicators. This separates "are we making technical progress?" from "is the technical progress moving the user-facing outcome?" — which are different questions for AI products. See our deeper guide to AI product metrics that actually matter.
Why AI PMs need this: The fix for OKRs that look great on a dashboard while the actual product still feels broken to users.
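Here is one way the leading/lagging pairing could be represented, with separate weekly and quarterly views. The metric names, values, and targets are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AIKeyResult:
    objective: str
    leading_metric: str    # eval score, checked weekly
    leading_value: float
    leading_target: float
    lagging_metric: str    # user-facing outcome, reviewed quarterly
    lagging_value: float
    lagging_target: float

    def weekly_checkin(self) -> str:
        on_track = self.leading_value >= self.leading_target
        return (f"{self.leading_metric}: {self.leading_value:.2f} "
                f"({'on track' if on_track else 'behind'} vs {self.leading_target:.2f})")

    def quarterly_review(self) -> str:
        hit = self.lagging_value >= self.lagging_target
        return (f"{self.lagging_metric}: {self.lagging_value:.1%} "
                f"({'hit' if hit else 'missed'} target {self.lagging_target:.1%})")

kr = AIKeyResult(
    objective="Users trust AI ticket summaries",
    leading_metric="summary faithfulness eval", leading_value=0.82, leading_target=0.85,
    lagging_metric="agent task completion rate", lagging_value=0.64, lagging_target=0.60,
)
print(kr.weekly_checkin())     # technical progress: slightly behind
print(kr.quarterly_review())   # user outcome: target hit anyway
```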
9. Thin AI Slice Planning
Borrowed from "vertical slice" in classical agile but adapted for AI. A thin AI slice is the smallest end-to-end version of an AI feature that demonstrates feasibility: a prompt that works on one happy-path input, plus the minimal UI to show it, plus the eval that confirms it. Time-boxed to 1–2 weeks.
The point is to fail fast on integration risk, not just model risk. Many AI features die in the integration step: the model works in isolation but cannot fit the product's latency budget, plug into the existing auth model, or satisfy the company's data residency rules. A thin slice exposes this in week 1, not week 8.
Why AI PMs need this: The single best framework for de-risking AI initiatives before committing real engineering capacity.
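A sketch of a thin-slice checklist that treats integration constraints as gates alongside the eval; the specific constraints and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ThinSliceResult:
    """Smallest end-to-end slice: one happy-path input, minimal UI, one eval."""
    happy_path_eval_passed: bool
    p95_latency_ms: int
    latency_budget_ms: int
    data_residency_ok: bool     # e.g. model/vendor allowed in required regions
    auth_integration_ok: bool   # plugs into the existing auth model

    def blockers(self) -> list[str]:
        """Return the risks the slice exposed; an empty list means proceed."""
        found = []
        if not self.happy_path_eval_passed:
            found.append("model quality")
        if self.p95_latency_ms > self.latency_budget_ms:
            found.append(f"latency ({self.p95_latency_ms}ms > {self.latency_budget_ms}ms budget)")
        if not self.data_residency_ok:
            found.append("data residency")
        if not self.auth_integration_ok:
            found.append("auth integration")
        return found

slice_week1 = ThinSliceResult(True, p95_latency_ms=3200, latency_budget_ms=1500,
                              data_residency_ok=True, auth_integration_ok=True)
print(slice_week1.blockers())  # ['latency (3200ms > 1500ms budget)'] -- caught in week 1
```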
Cross-Functional Frameworks
10. AI-Tuned RACI (Roles for ML Products)
Standard RACI charts break on AI because of two unique roles: the ML engineer who owns the model and the data engineer who owns the training data. The AI-tuned RACI explicitly assigns these — plus a "C" (consulted) slot for the responsible AI / ethics reviewer that most AI features now require.
For each AI feature, draw the RACI before kickoff. The PM is almost always R+A for feature outcomes. The ML lead is R for model performance. The data engineer is R for data availability. The legal / ethics reviewer is C. Skipping this conversation creates 3 weeks of confused ownership six weeks in.
Why AI PMs need this: Prevents the most common AI project failure mode — diffuse ownership when something goes wrong with a model in production.
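A sketch of the RACI chart as data, with a check for diffuse ownership. The R assignments follow the paragraph above; who holds "A" on each row, the single-accountable rule, and the deliverable names are illustrative choices.

```python
# Roles: R = responsible, A = accountable, C = consulted, I = informed.
raci = {
    "feature outcome":   {"PM": "RA", "ML lead": "C", "Data eng": "I", "Ethics review": "C"},
    "model performance": {"PM": "A",  "ML lead": "R", "Data eng": "C", "Ethics review": "I"},
    "data availability": {"PM": "A",  "ML lead": "C", "Data eng": "R", "Ethics review": "I"},
}

def check_ownership(chart: dict[str, dict[str, str]]) -> None:
    """Flag deliverables without exactly one accountable role."""
    for deliverable, roles in chart.items():
        accountable = [role for role, code in roles.items() if "A" in code]
        if len(accountable) != 1:
            print(f"WARNING: {deliverable} has {len(accountable)} accountable owners")
        else:
            print(f"{deliverable}: accountable = {accountable[0]}")

check_ownership(raci)
```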
How to Actually Adopt These
Do not roll out all ten at once. Pick the one that addresses your team's biggest pain. If you cannot estimate AI features, start with dual-track agile. If quality debates are endless, start with eval-driven development. If experiments never finish, start with Kanban + WIP limits. Each framework on its own is worth a quarter of focused adoption.
A Combined Operating Model
The AI PMs I respect run a stack of three: dual-track agile for the macro structure, eval-driven development for the quality bar, and Kanban with WIP limits for the discovery track. Delivery still runs sprints because productionizing a working model is, in fact, predictable engineering.
On top of those three, OKRs with leading + lagging metrics tie everything to outcomes, and an AI-tuned RACI keeps ownership clear. That is a complete operating model for an AI PM in 2026.
What to Drop
Story points. They were never accurate and they actively hurt for AI work. Burndown charts. They suggest linear progress where there is none. Velocity metrics. They incentivize the wrong behaviors. None of these are load-bearing for shipping good AI products.
You can keep stand-ups and retros — those are pure communication rituals and they survive the transition to AI work. Just stop pretending the rest of Scrum applies. Most senior ML engineers will respect you more for naming this honestly.
Where to Learn These Hands-On
Reading frameworks is the easy part. Running them with a real cross-functional team on a real AI initiative is the skill. Our AI Product Management Masterclass walks you through dual-track agile, eval-driven development, and AI-tuned RACI on real projects with a Salesforce Sr. Director PM.
Pick one framework from this list. Run it for a full quarter with one team. Measure whether decisions got faster and shipped quality got better. That is the only way these frameworks become operational instead of theoretical.