AI STRATEGY

When NOT to Add AI to Your Product: The Anti-Hype Decision Framework

By Institute of AI PM · 12 min read · May 10, 2026

TL;DR

Most AI features in 2026 ship because the CEO told the board they would, not because users asked for them. The real cost of an AI feature is 5–8x what most teams budget — model spend, eval, ops, support, brand risk, opportunity cost. This is the 6-point anti-hype framework, with real cost math, the alternatives that beat AI in most cases, and the customer signal traps that consistently fool PMs into building the wrong thing.

The 6-Point Framework

If you cannot answer yes to at least 4 of these 6 questions, do not ship AI. Ship something else.

1. Is there a real user job here, or just a feature ask?

"Add AI summarization" is a feature ask. "Help me triage 200 incoming tickets in 20 minutes instead of 2 hours" is a user job. Features without jobs become unused tabs in your product. Test: can you describe the user job without using the word AI?

2. Is the failure mode acceptable?

Models hallucinate, drift, and degrade. If a wrong answer creates a regulatory incident, kills someone, loses money irreversibly, or makes the press, your failure tolerance is essentially zero — and AI is rarely the right tool. Reserve AI for failure-tolerant or human-in-the-loop contexts.

3. Will it still work when the model commoditizes?

Foundation model labs ship the obvious AI features for free in each new release. If your AI feature is "summarize this" or "draft an email," you are building a feature that will be a free button in the OS in 18 months. Build the parts that will not be free.

4. Do you have evals, or just vibes?

Without a written eval suite that scores model outputs against a labeled dataset, you do not know if your feature works — you have feelings about a demo. Most teams ship AI before building evals. The result: silent regressions every time the model updates.
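
For a concrete reference point, here is a minimal sketch of an eval harness in Python. It assumes a hypothetical labeled JSONL dataset and a hypothetical model-calling function, and uses exact-match scoring, which real suites usually replace with rubric-based or model-graded scoring.

```python
import json

def exact_match(output: str, expected: str) -> float:
    # Simplest possible scorer; production suites typically use
    # rubric-based or model-graded scoring instead.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(model_fn, dataset_path: str) -> float:
    # Score a model function against a labeled dataset; return mean accuracy.
    scores = []
    with open(dataset_path) as f:
        for line in f:
            case = json.loads(line)  # each line: {"input": ..., "expected": ...}
            scores.append(exact_match(model_fn(case["input"]), case["expected"]))
    return sum(scores) / len(scores)

# Re-run on every prompt change and model update, and compare the score
# against a stored baseline (see the regression gate sketch further down).
# score = run_eval(call_model, "evals/triage_labeled.jsonl")  # both hypothetical
```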

5. Is the unit economics math actually positive?

Run the model: gross revenue per user, gross cost per user including model spend, payback period. Most token-billed AI features have unit economics that work at low usage and break at high usage. If a power user costs you more than a casual user pays, you have a margin trap.
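
To make "run the model" concrete, here is a toy Python sketch of the per-user math. All numbers and parameter names are illustrative assumptions, not benchmarks.

```python
def unit_economics(arpu: float, calls_per_month: float, cost_per_call: float,
                   build_cost: float, paying_users: int) -> dict:
    # Per-user margin: revenue minus model spend for that user's usage.
    cost_per_user = calls_per_month * cost_per_call
    margin_per_user = arpu - cost_per_user
    monthly_margin = margin_per_user * paying_users
    payback_months = (build_cost / monthly_margin
                      if monthly_margin > 0 else float("inf"))
    return {"cost_per_user": cost_per_user,
            "margin_per_user": margin_per_user,
            "payback_months": round(payback_months, 1)}

# Casual user: 50 calls/month at $0.04 is ~$2 of spend against $10 ARPU. Fine.
print(unit_economics(10.0, 50, 0.04, 300_000, 20_000))
# Power user: 2,000 calls/month is ~$80 of spend against the same $10 ARPU.
# Margin per user goes negative: the margin trap in one line of output.
print(unit_economics(10.0, 2_000, 0.04, 300_000, 20_000))
```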

6. Is there a deterministic alternative that is 80% as good?

Rules, regex, lookup tables, classical ML, search ranking. These are deterministic, debuggable, and free at inference. If the deterministic version gets you 80% of the value at 10% of the cost and 5% of the risk, ship that. Save AI for the cases where deterministic genuinely cannot work.
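
A minimal sketch of what the deterministic version can look like, using a hypothetical support-ticket router; the patterns and queue names are invented for illustration.

```python
import re

# Rules-first triage: deterministic, debuggable, and free at inference.
ROUTES = [
    (re.compile(r"refund|charge|billing|invoice", re.I), "billing"),
    (re.compile(r"password|login|2fa|locked", re.I), "auth"),
    (re.compile(r"crash|error|bug|broken", re.I), "bug_report"),
]

def route_ticket(subject: str) -> str:
    for pattern, queue in ROUTES:
        if pattern.search(subject):
            return queue
    return "needs_review"  # reserve the expensive model for the genuine long tail

print(route_ticket("Can't log in after password reset"))    # -> auth
print(route_ticket("My invoice shows a duplicate charge"))  # -> billing
```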

The Real Cost of an AI Feature

When teams budget for an AI feature, they usually count engineering and per-call model spend. That leaves five line items, covered below, that get missed outright or underestimated by 3–5x.

Engineering build (the obvious one)

1–3 senior engineers for 8–16 weeks, plus design, plus PM. The line that gets quoted in the planning doc. Usually accurate. Roughly $200K–$500K fully loaded.

Model inference at scale

Easy to estimate per call, hard to estimate at scale. A feature that costs $0.04 per call is $400 a month at 10K calls and $400K a month at 10M calls. Plan for 10x your initial usage estimate; getting this wrong destroys gross margin.
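
The arithmetic above as a two-line projection, with the 10x planning headroom built in. The headroom factor is this article's rule of thumb, not a constant.

```python
def monthly_inference_cost(cost_per_call: float, calls_per_month: int,
                           headroom: float = 10.0) -> tuple:
    # Expected spend at forecast usage, and spend at the 10x planning headroom.
    expected = cost_per_call * calls_per_month
    return expected, expected * headroom

print(monthly_inference_cost(0.04, 10_000))      # ~ (400, 4_000)
print(monthly_inference_cost(0.04, 10_000_000))  # ~ (400_000, 4_000_000)
```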

Eval infrastructure and ongoing eval ops

Building the eval suite, labeling the data, running evals on every model update, every prompt change, every release. 0.5 FTE minimum for any serious AI feature, year-round. The line item every team forgets.
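
One way to make "run evals on every change" operational is a regression gate in CI. A hypothetical sketch that compares the current eval score against a stored baseline; the path and tolerance are invented.

```python
BASELINE_PATH = "evals/baseline_score.txt"   # hypothetical location
REGRESSION_TOLERANCE = 0.02                  # fail if accuracy drops >2 points

def check_regression(current_score: float) -> None:
    # Compare against the last accepted score; block the release on regression.
    with open(BASELINE_PATH) as f:
        baseline = float(f.read())
    if current_score < baseline - REGRESSION_TOLERANCE:
        raise SystemExit(f"Eval regression: {current_score:.3f} "
                         f"vs baseline {baseline:.3f}")
    print(f"Eval passed: {current_score:.3f} (baseline {baseline:.3f})")

# Wire this into CI so every prompt change, model swap, and release
# runs the eval suite and calls check_regression(score).
```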

Support and trust ops

Hallucinations create support tickets. Bias incidents create escalations. Privacy questions create legal review work. AI features generate 2–4x the support load of equivalent deterministic features. Budget for it.

Model migration and re-eval cost

Every 6–12 months your foundation model is deprecated, repriced, or replaced. Each migration is a 4–8 week project: re-prompt, re-eval, A/B compare, roll out. Treat this as ongoing, not one-time.

Brand and incident risk (the silent one)

One viral hallucination thread on Twitter does more damage than the feature has earned in revenue. Hard to price, but real. Reserve incident response budget and run pre-mortems before launch.

Alternatives That Beat AI in Most Cases

Before defaulting to an LLM, work through these in order. Each one is cheaper, more predictable, and easier to debug.

Better Search and Filters

What it is: If users are asking the AI to find or filter data, faceted search, smart filters, and saved views often eliminate the need entirely. Most "AI search" features lose head-to-head against well-designed traditional search.

PM Implication: Run the experiment: ship better filters and a saved-view system. Measure task completion. The numbers usually beat the AI version, at 1/100th the cost.

Templates and Workflows

What it is: If users want the AI to "start a draft for me," a high-quality template library does the same job at zero inference cost. Notion, Linear, and Figma have shipped enormous template libraries that capture most of the AI starter-draft value.

PM Implication: Templates are content, not engineering. Hire writers and domain experts. Far better unit economics, and you build a content moat as a side effect.

Classical ML and Rules

What it is: Classification, ranking, anomaly detection, recommendation — these are solved problems with classical ML at 1/10th the cost of an LLM. LLMs feel exciting but are a dramatically more expensive way to do tasks classical ML already nails.

PM Implication: If your problem looks like classification, ranking, or scoring, default to classical ML. Use LLMs only for tasks that genuinely require generation, reasoning, or unstructured-text understanding.
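
For a sense of scale, a classical text classifier is a few lines of scikit-learn. The training examples here are toy stand-ins for a labeled historical dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labels; in practice, label a few thousand historical examples.
texts = ["refund my order", "app crashes on launch",
         "charged twice this month", "error when saving file"]
labels = ["billing", "bug_report", "billing", "bug_report"]

# TF-IDF features + logistic regression: debuggable, cheap to train,
# and effectively free at inference compared with an LLM call.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["I was charged for a refund I never got"]))  # expect ['billing']
```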

Better Defaults and Smart Onboarding

What it is: Many AI features try to compensate for confusing UX. "Ask the AI how to do this" is often a signal the underlying flow is broken. Fix the flow. Ship better defaults. Add progressive disclosure.

PM Implication: If you can solve the user problem with better UX, do that first. AI as a band-aid for bad UX is the most expensive band-aid in product history.

Build the Right Things, Not the Hyped Things

The AI PM Masterclass — taught by a Salesforce Sr. Director PM and former Apple Group PM — teaches the decision frameworks that separate senior AI PMs from PMs who ship features that no one uses.

Customer Signal Traps

Trap 1: "Customers in interviews said they want AI"

In 2026, every customer says yes when an interviewer asks about AI. They are answering a different question than the one you asked, and the signal is meaningless. Ask instead: "What is the most painful 30 minutes of your week?" The answer is rarely AI-shaped.

Trap 2: Competitor shipped it, so we have to

Your competitor is also under investor pressure, shipping AI features that nobody uses. Their AI launch is not validation; it is the same disease. Look at usage, not announcements. Most competitor AI features have <2% weekly active rates.

Trap 3: "AI" tested well in concept testing

AI in concept tests is a Rorschach blot. Users project the perfect tool onto it. Then you ship the real thing — slow, hallucinating, $0.40 per call — and adoption craters. Concept tests systematically over-predict AI feature adoption.

Trap 4: A noisy minority of power users

Power users on Twitter, Reddit, and your community Slack are loud and visible. They do not represent your average user. Build for the median, not the megaphone. The median user wants the existing flow to be more reliable.

Trap 5: Internal stakeholder pull, not user pull

The CEO mentioned AI in the all-hands. The CRO put it in the keynote. Marketing wants a press story. None of this is user signal. Treat internal pull as a constraint to manage, not a justification to build.

Trap 6: "It demos so well"

The cherry-picked demo always demos well. A real-world session at the 90th percentile of difficulty looks dramatically worse. If your launch decision is based on demo quality rather than a labeled eval set, you are launching on vibes. Vibes scale poorly.

When AI Is Genuinely the Right Call

All of the above is not anti-AI. It is anti-AI-as-default. AI is genuinely the right tool when these conditions converge.

The task involves unstructured text or multimodal input

Reading documents, understanding emails, processing screenshots, interpreting voice. These are tasks where deterministic systems genuinely cannot reach acceptable quality. AI is the right tool, not a hyped tool.

The output is reviewed by a human before consequence

Drafting an email the user sends, suggesting code the developer accepts, generating a marketing brief the manager edits. Human-in-the-loop absorbs hallucinations and turns AI into a productivity multiplier rather than a liability.

You have proprietary data that gives you an edge

If your model has access to data that competitors do not have — internal documents, customer history, regulated context — your AI feature is genuinely better than what foundation model labs can ship out of the box. Defensible.

The user job has 100x variance in input shape

Customer support, legal research, sales prospecting — jobs where every instance is different and rules cannot capture the variance. AI handles the long tail that rules cannot. Real value, real defensibility.

Become the AI PM Who Says No to the Wrong Features

The AI PM Masterclass teaches the strategic frameworks senior PMs use to filter signal from hype and ship features that actually move the metric.