AI Product Quarterly Business Review (QBR) Template
TL;DR
A standard SaaS QBR does not work for AI products. You need to report on model quality, hallucination rates, cost-per-request trends, and probabilistic risks — none of which fit in a generic OKR template. This is the slide-by-slide QBR structure used by senior AI PMs at scale, with example fill-ins so you can copy the format and swap in your own numbers.
Slide 1–2: North Star & Headline
Open with the north star metric and one-line summary. If your CEO reads only the first slide, what is the verdict on the quarter?
North Star Metric
Example: Weekly Active AI Users (WAAU): 142,000 — up from 98,000 last quarter (+45% QoQ)
Quarterly Verdict (one sentence)
Example: Adoption hit plan; cost per request rose 18% above forecast — we are course-correcting next quarter.
Top 3 Wins
Example: 1) Shipped agent v2 to GA. 2) Cut p95 latency from 4.1s to 1.8s. 3) Closed 3 enterprise deals citing AI as primary reason.
Top 3 Misses
Example: 1) Eval pass rate stalled at 84%. 2) Power-user retention dropped 6 pts. 3) Vendor outage drove 2-hour P1.
Last QBR Commitments — Status
Example: 5 of 7 shipped; 1 descoped (BYOK — pushed to Q3); 1 missed (eval framework).
Slide 3–5: Model Performance
This is the section that distinguishes an AI QBR from a SaaS QBR. Report on quality, not just volume.
Eval Pass Rate (internal benchmark)
Q1: 79% → Q2: 84% (+5pts). Floor we ship at: 80%. Stretch goal: 90% by Q4. Note which sub-eval moved (reasoning, retrieval, tool-calling).
Hallucination Rate (sampled production traffic)
Measured weekly via human review of a 200-prompt sample. Q1: 6.2% → Q2: 3.8%. Methodology: blind review by 2 raters, disagreements escalated. Report the confidence interval alongside the point estimate, not just the rate.
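At a 200-prompt sample size, a normal-approximation interval is adequate for a QBR slide. A minimal Python sketch (the flagged count is hypothetical, chosen to land near the 3.8% example):

```python
import math

def rate_with_ci(flagged: int, sample: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and normal-approximation 95% CI for a sampled rate."""
    p = flagged / sample
    margin = z * math.sqrt(p * (1 - p) / sample)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical weekly sample: 8 of 200 prompts flagged as hallucinations
p, lo, hi = rate_with_ci(8, 200)
print(f"hallucination rate: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Note how wide the interval is at n=200 — a quarter-over-quarter move smaller than the interval width is not a defensible headline.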
Latency (p50 / p95 / p99)
p50: 720ms → 540ms. p95: 4.1s → 1.8s. p99: 11s → 4.2s. Driver: switched router to faster small model for 60% of traffic.
Refusal Rate & Over-Refusal Audit
Refusal rate: 2.1% (target: <3%). Over-refusal cases reviewed quarterly: 14 confirmed false-positive refusals out of 50 reviewed (28%). Prompt tuning to reduce over-refusals is planned this quarter.
Slide 6–8: Customer Adoption
Activation Rate
Example: % of new users who complete a first AI action within 7 days. Q1: 38% → Q2: 51%. Driver: in-app onboarding tour shipped wk 4.
Power User Cohort (PUC)
Example: Users with >20 AI actions/week. Grew from 4,200 to 7,800. PUCs drive 64% of total AI traffic — concentration risk to monitor.
Retention by Cohort (D7 / D30 / D90)
Example: Jan cohort D90 retention: 41%. Feb cohort D90: 48% (+7pts). The new prompt library is the leading hypothesis for the lift.
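One way to compute D-N retention, assuming you can materialize each user's signup date and set of active days (the data shapes here are hypothetical — a real pipeline would read them from an events table):

```python
from datetime import date, timedelta

def dn_retention(signup: dict[str, date], activity: dict[str, set[date]], n: int) -> float:
    """Share of a cohort still active on or after day N relative to signup.
    `signup` maps user -> signup date; `activity` maps user -> active dates."""
    retained = sum(
        1 for user, d0 in signup.items()
        if any(a >= d0 + timedelta(days=n) for a in activity.get(user, set()))
    )
    return retained / len(signup)
```

Definitions vary (active exactly at day N vs. any time after); whichever you pick, state it on the slide and keep it constant across quarters so the +7pt claim is apples-to-apples.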
AI-Driven NPS or Trust Score
Example: Standalone post-action survey: 'Was this response useful?' Q1: 72% useful → Q2: 81%. Sample size: 4,100 responses.
Customer Stories
Example: Quote 2–3 specific customers. Names + outcomes. 'Acme cut response time from 8h to 12min using the AI workflow.' Anchors the metrics in narrative.
Slide 9–10: Cost Trends
Finance reads this section. Show that you are running the AI like a P&L, not a science project.
Total Inference Spend
Example: $418K (Q2) vs. $312K (Q1) — up 34%. Forecast was $360K. Variance driver: 12% from agent v2 long-tool-call workflows.
Cost per Successful Request (CSPR)
Example: $0.041 → $0.052 (+27%). Higher than plan; routing audit in flight to push 70% of traffic to small model.
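CSPR is total spend divided by successful requests, which is why it runs higher than raw cost-per-call: failed and retried requests still burn tokens. A sketch with hypothetical volume and success-rate figures chosen to land near the $0.052 example:

```python
def cspr(total_spend_usd: float, requests: int, success_rate: float) -> float:
    """Cost per successful request: spend is divided only by requests that
    succeeded, so failures and retries inflate the metric."""
    successful = requests * success_rate
    return total_spend_usd / successful

# Hypothetical Q2 figures: $418K spend, ~8.9M requests, ~90% success
print(f"${cspr(418_000, 8_900_000, 0.90):.3f}")
```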
Cost per Active User
Example: $2.94 → $2.21 (-25%). Volume offset unit cost rise — gross margin holds at 71%.
Top 5 Most Expensive Workflows
Example: Rank by total spend and note whether they are also the most-used. If not, they are candidates for prompt compression or a model downgrade.
Cost Forecast Next Quarter
Example: $520K base / $610K with agent v3 launch. Sensitivity: every 10% drop in WAAU = ~$45K saved.
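The scenario arithmetic is simple enough to keep in one helper so the slide's numbers stay internally consistent; figures mirror the example and the function name is ours:

```python
def spend_scenarios(base_usd: float, launch_delta_usd: float,
                    saved_per_10pct_waau_drop: float) -> dict[str, float]:
    """Next-quarter spend scenarios: base, base plus a launch, and base under
    WAAU downside. Sensitivity is expressed as dollars saved per 10% WAAU drop."""
    return {
        "base": base_usd,
        "with_launch": base_usd + launch_delta_usd,
        "waau_minus_10pct": base_usd - saved_per_10pct_waau_drop,
        "waau_minus_20pct": base_usd - 2 * saved_per_10pct_waau_drop,
    }

# Mirrors the example: $520K base, +$90K for agent v3, ~$45K per 10% WAAU drop
print(spend_scenarios(520_000, 90_000, 45_000))
```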
Slide 11: Risks & Incidents
Surface risks before someone else does. Include both incidents that happened and risks you are tracking.
P1 Incidents This Quarter
Example: 2 incidents. Apr 14: vendor model outage, 2h 18min, 12% of traffic affected. Mitigation: dual-vendor failover shipped Apr 28.
Top Risk: Underlying Model Deprecation
Example: Vendor announced sunset of model X in Q4. Migration plan in flight; 30% of prompts already validated on successor model.
Top Risk: Regulatory Exposure
Example: EU AI Act high-risk classification under review — legal opinion due June 1. If high-risk, conformity assessment adds 6 weeks to launch path.
Top Risk: Cost Tail
Example: Long-tail of expensive agent runs: 0.4% of sessions account for 14% of spend. Hard cost cap rolling out next sprint.
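A hard cap can be as simple as a per-session budget guard that aborts the agent run once cumulative spend crosses it; the class and method names here are hypothetical:

```python
class SessionCostCap:
    """Hard per-session spend cap (sketch): accumulate inference cost per call
    and raise once the session's budget is exceeded."""

    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one call's cost; abort the run if the cap is now exceeded."""
        self.spent += cost_usd
        if self.spent > self.budget:
            raise RuntimeError(
                f"session cost cap hit: ${self.spent:.2f} > ${self.budget:.2f}"
            )
```

The operational question to settle before rollout is what the user sees at the cap — a graceful partial result beats a silent failure.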
Top Risk: Eval Coverage Gap
Example: Eval suite covers 8 of 12 production scenarios. 4 are uninstrumented. Q3 plan adds the missing 4 and adversarial cases.
Slide 12–13: Next Quarter Bets
End on three bets, not a backlog. Each bet has a hypothesis, a metric, and a kill criterion.
Bet 1: Agent v3 — Multi-Step Reasoning
Hypothesis: Multi-step reasoning unlocks the workflow customers ask for most (research + draft + revise) and lifts power-user weekly retention by 5pts.
Metric & Kill criterion: Power-user W4 retention. Baseline 41%. Target 46%. Kill criterion: <2pt lift after 6 weeks of GA.
Bet 2: Cost Routing Layer
Hypothesis: 70% of traffic does not need the frontier model. A routing layer cuts CSPR by 30% with <1pt eval drop.
Metric & Kill criterion: CSPR. Baseline $0.052. Target $0.036. Kill criterion: eval pass rate drops below 80% on routed traffic.
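A routing layer with the kill criterion baked in can be sketched as a guardrailed dispatch: low-complexity traffic goes to the small model only while its rolling eval pass rate stays above the 80% floor. The complexity score here is a stand-in for a real classifier, and the threshold values are illustrative:

```python
def route_model(prompt_complexity: float, small_model_eval_pass: float,
                complexity_threshold: float = 0.6, eval_floor: float = 0.80) -> str:
    """Pick a model tier for one request. If the small model's rolling eval
    pass rate has slipped below the ship floor, everything goes to the
    frontier model (the kill criterion acting as a runtime guardrail)."""
    if small_model_eval_pass < eval_floor:
        return "frontier"
    return "small" if prompt_complexity < complexity_threshold else "frontier"
```

Wiring the kill criterion into the router itself means the bet degrades safely on its own, rather than waiting for the next QBR to catch a quality regression.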
Bet 3: Enterprise BYOK + Audit Logs
Hypothesis: BYOK + audit log export unlocks 4 enterprise deals stuck in legal review.
Metric & Kill criterion: Enterprise deals closed citing BYOK as enabler. Target: 3 closed-won by end of quarter. Kill criterion: 0 closed at week 8.