AI Incident Response Plan Template for Product Managers

Why AI Incidents Need Their Own Plan

A standard SaaS incident plan covers outages, data breaches, and bugs. AI incidents add new failure modes: hallucinations going viral, biased outputs surfacing, prompt injection attacks, and model regressions silently degrading quality. The response motions are different: you can't restart a hallucinated answer the way you restart a crashed service.

AI-specific incident types

Quality regression, harmful output going public, prompt injection, vendor outage, data leakage via model output, bias incident.

Time-to-detect challenge

AI quality drift can take days to detect, and severity grows quietly in the meantime. Treat detection latency as a tracked metric in its own right.
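
One way to track detection latency is to record, for each incident, when the bad behavior began and when it was first flagged, then summarize the gap. The following is a minimal sketch in Python; the record fields, incident IDs, and timestamps are hypothetical and only illustrate the calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class IncidentRecord:
    # Hypothetical fields; adapt to whatever your incident tracker stores.
    incident_id: str
    started_at: datetime   # best estimate of when bad outputs began
    detected_at: datetime  # when a person or an alert first flagged it


def detection_latency(record: IncidentRecord) -> timedelta:
    """Time between the start of the problem and its detection."""
    return record.detected_at - record.started_at


def median_detection_latency(records: list[IncidentRecord]) -> timedelta:
    """Median latency across incidents, less sensitive to one slow outlier."""
    seconds = [detection_latency(r).total_seconds() for r in records]
    return timedelta(seconds=median(seconds))


# Illustrative data only.
records = [
    IncidentRecord("INC-1", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 13, 0)),
    IncidentRecord("INC-2", datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 7, 9, 0)),
    IncidentRecord("INC-3", datetime(2024, 1, 9, 9, 0), datetime(2024, 1, 10, 9, 0)),
]
print(median_detection_latency(records))
```

The median is used here rather than the mean so that a single incident that went unnoticed for weeks does not hide improvements elsewhere; either summary could be reasonable depending on what you want to review.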

Time-to-mitigate challenge

You can't patch a bad answer that's already public. Containment is about preventing more bad answers, not reversing the one that escaped.

Communication challenge

Users want to know whether to trust the product. Silence destroys trust faster than imperfect transparency.

Severity Tier Definitions

SEV-1 — Critical

AI is causing user harm, public reputational risk, regulatory exposure, or data leakage. War room within 15 minutes. Exec on call.

SEV-2 — High

AI quality regression affecting many users; incorrect outputs on important workflows. Containment within 1 hour. Eng + PM + comms.

SEV-3 — Moderate

Quality regression on a specific surface or user segment. Containment within 4 hours. PM-led response.

SEV-4 — Low

Single-user reports, minor format issues, edge case failures. Standard ticket triage. No special response motion.
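
The tiers above can be encoded as a small lookup table so that paging tools and people read from the same definitions. This is a sketch under stated assumptions: the time targets and responders come from the tier definitions, but the classification signals and the 1000-user threshold are hypothetical, since the definitions do not say how many users count as "many".

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SeverityTier:
    name: str
    # SEV-1: time to assemble the war room. SEV-2/3: time to containment.
    # None means standard ticket triage with no special target.
    target: Optional[timedelta]
    responders: str


TIERS = {
    1: SeverityTier("SEV-1", timedelta(minutes=15), "war room, exec on call"),
    2: SeverityTier("SEV-2", timedelta(hours=1), "eng + PM + comms"),
    3: SeverityTier("SEV-3", timedelta(hours=4), "PM-led response"),
    4: SeverityTier("SEV-4", None, "standard ticket triage"),
}


def classify(user_harm_or_leak: bool, users_affected: int,
             single_surface: bool, many_users_threshold: int = 1000) -> SeverityTier:
    """Pick a tier from a few coarse signals (all assumed for illustration)."""
    if user_harm_or_leak:
        return TIERS[1]
    if users_affected >= many_users_threshold and not single_surface:
        return TIERS[2]
    if users_affected > 1:
        return TIERS[3]
    return TIERS[4]  # single-user reports and minor issues


print(classify(user_harm_or_leak=False, users_affected=5000, single_surface=False).name)
```

A function like this is only a starting point for the Minute 0-5 classification; the Incident Commander should be able to override it in either direction.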

Roles in an AI Incident

Incident Commander

Owns the response. Often a senior PM or eng lead. Makes containment calls; coordinates across functions; drives toward resolution.

Technical Lead

Owns mitigation. Disables features, rolls back prompts/models, deploys fixes. Reports status every 15 minutes during SEV-1/2.

Communications Lead

Owns messaging. Drafts customer comms, internal updates, public statements. Holds the pen on what gets said when.

Subject Matter Experts

ML engineer, safety expert, legal/comms specialist as needed. Pulled in by Incident Commander based on incident type.

First 60 Minutes — The Decision Tree

Minute 0-5: Confirm and classify

Reproduce the issue. Estimate impact (number of users, severity). Set the SEV tier. Page the Incident Commander.

Minute 5-15: Contain

Disable the feature, route around the failure, or roll back. The goal is stopping new bad outputs — not fixing the root cause yet.
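
Containment by disabling a feature is often implemented as a kill switch in front of the model call. The sketch below assumes a simple in-memory flag store standing in for a real feature-flag service; the class, the flag name `ai_answers`, and the fallback message are all hypothetical.

```python
from typing import Callable


class FeatureFlags:
    """In-memory stand-in for a real feature-flag service (hypothetical)."""

    def __init__(self) -> None:
        self._disabled: set[str] = set()

    def disable(self, feature: str) -> None:
        self._disabled.add(feature)

    def enable(self, feature: str) -> None:
        self._disabled.discard(feature)

    def is_enabled(self, feature: str) -> bool:
        return feature not in self._disabled


def answer(question: str, flags: FeatureFlags,
           model_call: Callable[[str], str]) -> str:
    """Return a model answer, or a safe fallback while the feature is contained."""
    if not flags.is_enabled("ai_answers"):
        # Containment path: no new model output reaches users.
        return "This feature is temporarily unavailable while we investigate an issue."
    return model_call(question)


def fake_model(question: str) -> str:
    # Placeholder for the real model call.
    return f"model answer to: {question}"


flags = FeatureFlags()
before = answer("What is our refund policy?", flags, fake_model)
flags.disable("ai_answers")  # the Technical Lead flips the switch
after = answer("What is our refund policy?", flags, fake_model)
```

The design point is that the switch stops new bad outputs without a deploy, which leaves the root-cause fix for later in the timeline.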

Minute 15-30: Diagnose

Identify the root cause hypothesis. Pull eval data, recent changes, model version diffs. Don't deploy a fix yet.
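
When pulling eval data, a quick comparison between the last known-good version and the current one can narrow the hypothesis. The following sketch assumes eval results are available as per-case pass/fail maps; the case names and results are invented for illustration.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of eval cases that passed."""
    return sum(results.values()) / len(results)


def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Eval cases that passed on the baseline version but fail now."""
    return sorted(
        case for case, ok in baseline.items()
        if ok and not current.get(case, False)
    )


# Hypothetical results for the last known-good version and the current one.
baseline = {"refund_policy": True, "pricing": True, "cancel_flow": True, "greeting": False}
current = {"refund_policy": False, "pricing": True, "cancel_flow": False, "greeting": False}

drop = pass_rate(baseline) - pass_rate(current)
regressed = regressions(baseline, current)
print(f"pass rate drop: {drop:.0%}, regressed cases: {regressed}")
```

The list of regressed cases is usually more useful than the aggregate drop, because it points at which workflows to check against recent prompt or model changes.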

Minute 30-60: Communicate

Internal update to the team, execs, and support. External communication if there is user-facing impact. Set expectations on resolution.

Communication Scripts

Initial public statement

"We're aware of an issue affecting [feature]. We've disabled [behavior] while we investigate. We'll update within [timeframe]." Honest, brief, time-bound.

Internal status update

Every 15 minutes during SEV-1/2: current state, mitigation status, next milestone, blocker. Predictable cadence reduces anxiety.
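
A fixed template makes the cadence easier to keep under pressure. This is a minimal sketch of a formatter for the four fields listed above plus the time of the next update; the function name, layout, and sample values are assumptions, not a prescribed format.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=15)  # cadence stated above for SEV-1/2


def format_status_update(now: datetime, current_state: str, mitigation: str,
                         next_milestone: str, blocker: str = "none") -> str:
    """Render the four fields plus the time of the next scheduled update."""
    next_update = now + UPDATE_INTERVAL
    return "\n".join([
        f"[{now:%H:%M}] Status update",
        f"Current state: {current_state}",
        f"Mitigation: {mitigation}",
        f"Next milestone: {next_milestone}",
        f"Blocker: {blocker}",
        f"Next update at {next_update:%H:%M}",
    ])


# Sample values are illustrative only.
update = format_status_update(
    datetime(2024, 1, 1, 14, 30),
    current_state="feature disabled, no new bad outputs",
    mitigation="prompt rollback in progress",
    next_milestone="rollback verified against eval set",
)
print(update)
```

Stating the next update time in every message is what makes the cadence predictable for readers who join the channel late.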

Resolution announcement

"The issue is resolved as of [time]. It affected [count] users. Root cause: [brief]. We will publish a postmortem within [timeframe]."

Postmortem publication

Within 5 business days. Public for SEV-1; internal for others. Blameless format. Concrete preventive actions with owners.
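
The publication deadline can be computed when the incident is resolved, so it gets an owner and a date immediately. The sketch below assumes the five business days are counted from the resolution date and that only weekends are skipped; public holidays are ignored, and the function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance by `days` weekdays, skipping Saturdays and Sundays.

    Public holidays are not handled; that is a simplifying assumption.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def postmortem_plan(resolved_on: date, sev: int) -> dict:
    """Due date and audience, following the rule above."""
    return {
        "due": add_business_days(resolved_on, 5),
        "audience": "public" if sev == 1 else "internal",
    }


plan = postmortem_plan(date(2024, 1, 5), sev=1)  # resolved on a Friday
print(plan["due"].isoformat(), plan["audience"])
```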