AI Product Quarterly Business Review (QBR) Template
TL;DR
A standard SaaS QBR does not work for AI products. You need to report on model quality, hallucination rates, cost-per-request trends, and probabilistic risks — none of which fit in a generic OKR template. This is the slide-by-slide QBR structure used by senior AI PMs at scale, with example fill-ins so you can copy the format and swap in your own numbers.
Slide 1–2: North Star & Headline
Open with the north star metric and one-line summary. If your CEO reads only the first slide, what is the verdict on the quarter?
North Star Metric
Example: Weekly Active AI Users (WAAU): 142,000 — up from 98,000 last quarter (+45% QoQ)
Quarterly Verdict (one sentence)
Example: Adoption hit plan; cost per request rose 18% above forecast — we are course-correcting next quarter.
Top 3 Wins
Example: 1) Shipped agent v2 to GA. 2) Cut p95 latency from 4.1s to 1.8s. 3) Closed 3 enterprise deals citing AI as primary reason.
Top 3 Misses
Example: 1) Eval pass rate stalled at 84%. 2) Power-user retention dropped 6 pts. 3) Vendor outage drove 2-hour P1.
Last QBR Commitments — Status
Example: 5 of 7 shipped; 1 descoped (BYOK — pushed to Q3); 1 missed (eval framework).
Slide 3–5: Model Performance
This is the section that distinguishes an AI QBR from a SaaS QBR. Report on quality, not just volume.
Eval Pass Rate (internal benchmark)
Q1: 79% → Q2: 84% (+5pts). Floor we ship at: 80%. Stretch goal: 90% by Q4. Note which sub-eval moved (reasoning, retrieval, tool-calling).
Hallucination Rate (sampled production traffic)
Measured weekly via human review of a 200-prompt sample. Q1: 6.2% → Q2: 3.8%. Methodology: blind review by 2 raters, disagreements escalated. Report the confidence interval alongside the point estimate, not just the rate.
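At a 200-prompt sample size, a normal-approximation interval is adequate for a QBR slide. A minimal Python sketch (the flagged count is hypothetical, chosen to land near the 3.8% example):

```python
import math

def rate_with_ci(flagged: int, sample: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and normal-approximation 95% CI for a sampled rate."""
    p = flagged / sample
    margin = z * math.sqrt(p * (1 - p) / sample)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical weekly sample: 8 of 200 prompts flagged as hallucinations
p, lo, hi = rate_with_ci(8, 200)
print(f"hallucination rate: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Note how wide the interval is at n=200 — a quarter-over-quarter move smaller than the interval width is not a defensible headline.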
Latency (p50 / p95 / p99)
p50: 720ms → 540ms. p95: 4.1s → 1.8s. p99: 11s → 4.2s. Driver: switched router to faster small model for 60% of traffic.
Refusal Rate & Over-Refusal Audit
Refusal rate: 2.1% (target: <3%). Over-refusal cases reviewed quarterly: 14 confirmed false-positive refusals out of 50 reviewed (28%). Prompt tuning to reduce over-refusals is planned this quarter.
Slide 6–8: Customer Adoption
Activation Rate
Example: % of new users who complete a first AI action within 7 days. Q1: 38% → Q2: 51%. Driver: in-app onboarding tour shipped wk 4.
Power User Cohort (PUC)
Example: Users with >20 AI actions/week. Grew from 4,200 to 7,800. PUCs drive 64% of total AI traffic — concentration risk to monitor.
Retention by Cohort (D7 / D30 / D90)
Example: Jan cohort D90 retention: 41%. Feb cohort D90: 48% (+7pts). The new prompt library is the leading hypothesis for the lift.
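One way to compute D-N retention, assuming you can materialize each user's signup date and set of active days (the data shapes here are hypothetical — a real pipeline would read them from an events table):

```python
from datetime import date, timedelta

def dn_retention(signup: dict[str, date], activity: dict[str, set[date]], n: int) -> float:
    """Share of a cohort still active on or after day N relative to signup.
    `signup` maps user -> signup date; `activity` maps user -> active dates."""
    retained = sum(
        1 for user, d0 in signup.items()
        if any(a >= d0 + timedelta(days=n) for a in activity.get(user, set()))
    )
    return retained / len(signup)
```

Definitions vary (active exactly at day N vs. any time after); whichever you pick, state it on the slide and keep it constant across quarters so the +7pt claim is apples-to-apples.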
AI-Driven NPS or Trust Score
Example: Standalone post-action survey: 'Was this response useful?' Q1: 72% useful → Q2: 81%. Sample size: 4,100 responses.
Customer Stories
Example: Quote 2–3 specific customers. Names + outcomes. 'Acme cut response time from 8h to 12min using the AI workflow.' Anchors the metrics in narrative.
Slide 9–10: Cost Trends
Finance reads this section. Show that you are running the AI like a P&L, not a science project.
Total Inference Spend
Example: $418K (Q2) vs. $312K (Q1) — up 34%. Forecast was $360K. Variance driver: 12% from agent v2 long-tool-call workflows.
Cost per Successful Request (CSPR)
Example: $0.041 → $0.052 (+27%). Higher than plan; routing audit in flight to push 70% of traffic to small model.
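CSPR is total spend divided by successful requests, which is why it runs higher than raw cost-per-call: failed and retried requests still burn tokens. A sketch with hypothetical volume and success-rate figures chosen to land near the $0.052 example:

```python
def cspr(total_spend_usd: float, requests: int, success_rate: float) -> float:
    """Cost per successful request: spend is divided only by requests that
    succeeded, so failures and retries inflate the metric."""
    successful = requests * success_rate
    return total_spend_usd / successful

# Hypothetical Q2 figures: $418K spend, ~8.9M requests, ~90% success
print(f"${cspr(418_000, 8_900_000, 0.90):.3f}")
```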
Cost per Active User
Example: $2.94 → $2.21 (-25%). Volume offset unit cost rise — gross margin holds at 71%.
Top 5 Most Expensive Workflows
Example: Rank by total spend and note whether they are also the most-used. If not, they are candidates for prompt compression or a model downgrade.
Cost Forecast Next Quarter
Example: $520K base / $610K with agent v3 launch. Sensitivity: every 10% drop in WAAU = ~$45K saved.
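The scenario arithmetic is simple enough to keep in one helper so the slide's numbers stay internally consistent; figures mirror the example and the function name is ours:

```python
def spend_scenarios(base_usd: float, launch_delta_usd: float,
                    saved_per_10pct_waau_drop: float) -> dict[str, float]:
    """Next-quarter spend scenarios: base, base plus a launch, and base under
    WAAU downside. Sensitivity is expressed as dollars saved per 10% WAAU drop."""
    return {
        "base": base_usd,
        "with_launch": base_usd + launch_delta_usd,
        "waau_minus_10pct": base_usd - saved_per_10pct_waau_drop,
        "waau_minus_20pct": base_usd - 2 * saved_per_10pct_waau_drop,
    }

# Mirrors the example: $520K base, +$90K for agent v3, ~$45K per 10% WAAU drop
print(spend_scenarios(520_000, 90_000, 45_000))
```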
Slide 11: Risks & Incidents
Surface risks before someone else does. Include both incidents that happened and risks you are tracking.
P1 Incidents This Quarter
Example: 2 incidents. Apr 14: vendor model outage, 2h 18min, 12% of traffic affected. Mitigation: dual-vendor failover shipped Apr 28.
Top Risk: Underlying Model Deprecation
Example: Vendor announced sunset of model X in Q4. Migration plan in flight; 30% of prompts already validated on successor model.
Top Risk: Regulatory Exposure
Example: EU AI Act high-risk classification under review — legal opinion due June 1. If high-risk, conformity assessment adds 6 weeks to launch path.
Top Risk: Cost Tail
Example: Long-tail of expensive agent runs: 0.4% of sessions account for 14% of spend. Hard cost cap rolling out next sprint.
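A hard cap can be as simple as a per-session budget guard that aborts the agent run once cumulative spend crosses it; the class and method names here are hypothetical:

```python
class SessionCostCap:
    """Hard per-session spend cap (sketch): accumulate inference cost per call
    and raise once the session's budget is exceeded."""

    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one call's cost; abort the run if the cap is now exceeded."""
        self.spent += cost_usd
        if self.spent > self.budget:
            raise RuntimeError(
                f"session cost cap hit: ${self.spent:.2f} > ${self.budget:.2f}"
            )
```

The operational question to settle before rollout is what the user sees at the cap — a graceful partial result beats a silent failure.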
Top Risk: Eval Coverage Gap
Example: Eval suite covers 8 of 12 production scenarios. 4 are uninstrumented. Q3 plan adds the missing 4 and adversarial cases.
Slide 12–13: Next Quarter Bets
End on three bets, not a backlog. Each bet has a hypothesis, a metric, and a kill criterion.
Bet 1: Agent v3 — Multi-Step Reasoning
Hypothesis: Multi-step reasoning unlocks the workflow customers ask for most (research + draft + revise) and lifts power-user weekly retention by 5pts.
Metric & Kill criterion: Power-user W4 retention. Baseline 41%. Target 46%. Kill criterion: <2pt lift after 6 weeks of GA.
Bet 2: Cost Routing Layer
Hypothesis: 70% of traffic does not need the frontier model. A routing layer cuts CSPR by 30% with <1pt eval drop.
Metric & Kill criterion: CSPR. Baseline $0.052. Target $0.036. Kill criterion: eval pass rate drops below 80% on routed traffic.
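A routing layer with the kill criterion baked in can be sketched as a guardrailed dispatch: low-complexity traffic goes to the small model only while its rolling eval pass rate stays above the 80% floor. The complexity score here is a stand-in for a real classifier, and the threshold values are illustrative:

```python
def route_model(prompt_complexity: float, small_model_eval_pass: float,
                complexity_threshold: float = 0.6, eval_floor: float = 0.80) -> str:
    """Pick a model tier for one request. If the small model's rolling eval
    pass rate has slipped below the ship floor, everything goes to the
    frontier model (the kill criterion acting as a runtime guardrail)."""
    if small_model_eval_pass < eval_floor:
        return "frontier"
    return "small" if prompt_complexity < complexity_threshold else "frontier"
```

Wiring the kill criterion into the router itself means the bet degrades safely on its own, rather than waiting for the next QBR to catch a quality regression.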
Bet 3: Enterprise BYOK + Audit Logs
Hypothesis: BYOK + audit log export unlocks 4 enterprise deals stuck in legal review.
Metric & Kill criterion: Enterprise deals closed citing BYOK as enabler. Target: 3 closed-won by end of quarter. Kill criterion: 0 closed at week 8.