Outcome-Based Pricing for AI Agents: Charge Per Result, Not Per Token
TL;DR
Token-based pricing was built for chat interfaces where usage correlates loosely with value. For AI agents that resolve support tickets, review contracts, qualify leads, or draft documents, that correlation breaks. Outcome-based pricing ties revenue to results: per ticket resolved, per contract reviewed, per deal stage advanced. It requires robust outcome measurement infrastructure, a clear attribution methodology, and unit economics that account for your variable cost floor. This guide covers how to define outcomes, structure contracts, solve attribution, and decide when outcome-based pricing beats per-token or per-seat alternatives.
Why Token Pricing Breaks for AI Agents
When an AI agent is resolving support tickets or reviewing legal documents, the number of tokens it generates has little to do with the value it delivers. An agent might spend 5,000 tokens thoroughly researching a customer issue and correctly resolving it in one reply. Another run might generate 20,000 tokens going back and forth through tool calls and produce nothing useful. Token pricing rewards token consumption and punishes efficiency. Outcome pricing rewards the thing that actually matters: completing the task.
The structural problem is incentive misalignment. Under token pricing, a vendor is paid whether the task succeeds or fails. Under outcome pricing, the vendor is only paid when the work is done. This shifts risk toward the vendor and pressure toward genuine quality improvement. Customers prefer it because the relationship is simpler: they pay for what they get. Vendors who can price this way are signaling high confidence in their agent quality.
Token-based pricing for agents
Misaligned
Vendor is paid for compute used, not results delivered. Failed agent runs, retry loops, and unnecessary tool calls all generate cost for the customer. Incentivizes token consumption over efficiency.
Seat-based pricing for agents
Partially misaligned
Revenue is flat regardless of agent utilization or task completion rate. Customers who get high value and customers who barely use the product pay the same. No signal for the vendor on where quality actually matters.
Outcome-based pricing
Aligned
Vendor is paid only when the agent delivers a result the customer defines as success. Creates direct financial pressure to improve agent reliability. Requires investment in measurement infrastructure that also improves product quality.
Hybrid: platform fee + outcome premium
Partially aligned
Base fee covers infrastructure and access. Outcome premium captures value above baseline. Common when there is fixed setup cost and variable outcome volume. Lets customers budget predictably while still aligning on results.
Defining What an Outcome Actually Is
The hardest part of outcome-based pricing is not pricing. It is defining "done" in a way that is contractually unambiguous and technically measurable. The definition determines everything else: what you build to measure it, how disputes are resolved, and what quality investments are worth making.
Hard outcomes (preferred)
Examples: Ticket closed without human escalation. Contract clause extracted and validated against database. Lead scored and moved to next CRM stage. Form filed with regulatory body. Email replied to and thread marked resolved.
Contract implication: Hard outcomes have a clear binary state: done or not done. System logs can prove it. These are the most defensible in contracts and the easiest to audit.
Soft outcomes (requires instrumentation)
Examples: Customer satisfaction score above threshold post-interaction. Time saved vs. baseline human completion time. Error rate below agreed ceiling on processed documents. User continued to next step in a workflow without contacting support.
Contract implication: Soft outcomes require measurement infrastructure built into the product. You need baseline data, a sampling methodology, and a dispute resolution process when the customer disagrees with your measurement.
Avoid: subjective quality outcomes
Examples: Customer is happy with the output. Document is high quality. Agent response is accurate enough. User feels the task was completed.
Contract implication: Any outcome that requires a human judgment call on every instance creates constant contract disputes. If you must use subjective quality outcomes, define a specific rating methodology upfront and specify who rates, how often, and what the dispute process is.
How to Structure Outcome-Based Contracts
The contract structure determines how risk is shared between vendor and customer. Three models dominate in 2026, each with different risk profiles and measurement requirements.
Per-task pricing (flat fee per completion)
Customer pays a fixed price per successfully completed outcome. Example: $2.00 per support ticket resolved without human escalation. Simple to understand and audit.
Best for: High-volume, homogeneous tasks where completion is clearly binary and average task complexity does not vary significantly. Customer support, form processing, data extraction.
Watch out: Task complexity variance. If 10% of tickets are 10x harder than average, flat per-task pricing may underprice hard tasks and create incentives for the vendor to cherry-pick easy ones.
Success fee (percentage of value created)
Customer pays a percentage of the measurable value the agent delivers. Example: 3% of total contract value for contracts the AI agent reviews and flags issues that the customer acts on.
Best for: High-value, low-volume tasks where value is quantifiable and variable. Legal review, sales qualification, financial analysis. Aligns vendor upside with customer ROI.
Watch out: Attribution complexity. Proving the agent caused the value creation (vs. the human who acted on the output) requires careful methodology and is often contested.
Outcome subscription (unlimited tasks, paid by success rate)
Customer pays a monthly fee that scales with the agent's aggregate success rate across all tasks. A higher success rate in a given month means a higher invoice. Decouples pricing from volume while maintaining outcome alignment.
Best for: Established relationships where both parties have baseline data on expected volumes and success rates. Provides predictable revenue for vendor and predictable budget for customer.
Watch out: Success rate measurement cadence must be clearly defined. Monthly or weekly measurement windows, agreed audit rights, and a clear floor (minimum fee if volume drops) are essential.
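As a rough illustration of how an outcome subscription invoice could be computed, the sketch below scales a monthly fee linearly with the success rate measured over the agreed window and applies a minimum fee. The linear scaling, the function name, and every dollar figure are assumptions for illustration; a real contract would negotiate its own curve and floor.

```python
def monthly_invoice(successes: int, attempts: int,
                    full_fee: float, minimum_fee: float) -> float:
    """Hypothetical outcome-subscription invoice for one measurement window.

    Assumption: the fee scales linearly with success rate, and the
    minimum fee applies when volume drops to zero or the scaled fee
    falls below the floor.
    """
    if attempts < 0 or successes < 0 or successes > attempts:
        raise ValueError("need 0 <= successes <= attempts")
    if attempts == 0:
        # No tasks in the window: success rate is undefined, bill the floor.
        return minimum_fee
    success_rate = successes / attempts
    return max(minimum_fee, full_fee * success_rate)


# Illustrative numbers only.
strong_month = monthly_invoice(900, 1000, full_fee=10_000, minimum_fee=4_000)
weak_month = monthly_invoice(300, 1000, full_fee=10_000, minimum_fee=4_000)
idle_month = monthly_invoice(0, 0, full_fee=10_000, minimum_fee=4_000)
```

In this sketch the strong month bills at the scaled fee, while the weak and idle months both fall back to the minimum fee, which is the behavior the floor is meant to guarantee.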
Solving the Attribution Problem
The hardest unsolved problem in outcome-based AI pricing is attribution: proving that the agent caused the outcome, not the human who reviewed the output, the existing workflow, or baseline trends. This is not just a technical problem. It is a trust problem. Customers will not pay for outcomes they believe would have happened without the agent.
Control group experiments
The gold standard. Route a percentage of tasks through the agent and an equivalent set through the baseline process (human only, or older workflow). Compare outcome rates. The difference is the agent's causal contribution. Requires volume large enough for statistical significance. Klarna's public reporting on its AI customer service assistant is a frequently cited example of comparing AI and human agents on the same outcome metrics: the company said the assistant did work equivalent to roughly 700 full-time agents.
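One way to check whether a measured difference between the agent arm and the control arm is more than noise is a two-proportion z-test. The sketch below uses only the standard library; the function name and the sample counts are hypothetical, and it assumes tasks were randomly assigned to the two arms.

```python
import math


def lift_with_z_test(agent_success: int, agent_total: int,
                     control_success: int, control_total: int):
    """Return (absolute lift, z statistic, two-sided p-value).

    Uses the pooled two-proportion z-test, which assumes random
    assignment and reasonably large samples in both arms.
    """
    p_agent = agent_success / agent_total
    p_control = control_success / control_total
    pooled = (agent_success + control_success) / (agent_total + control_total)
    std_err = math.sqrt(pooled * (1 - pooled)
                        * (1 / agent_total + 1 / control_total))
    lift = p_agent - p_control
    z = lift / std_err if std_err > 0 else 0.0
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, z, p_value


# Illustrative counts only: 820/1000 resolved by the agent arm,
# 780/1000 resolved by the baseline arm.
lift, z, p_value = lift_with_z_test(820, 1000, 780, 1000)
```

With these made-up counts the lift is four percentage points. Whether that clears the bar depends on the significance threshold both parties agree on in advance.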
Incremental lift measurement
Compare outcome rates for tasks the agent handled vs. tasks it did not. Account for selection bias: agents often get easier tasks assigned preferentially, which inflates apparent success rate. Use propensity score matching or random assignment to create comparable cohorts before claiming the lift is real.
Time-to-completion attribution
If the agent handles the task and the outcome occurs within a defined attribution window (e.g., ticket closed within 24 hours of agent response, no human touchpoint), attribute to the agent. Clear, auditable, and simple to implement in logs. The attribution window is a negotiated term in the contract.
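The attribution-window rule can be expressed as a small check over log records. This is a minimal sketch: the `TicketLog` fields are hypothetical stand-ins for whatever the ticketing system records, and the 24-hour default mirrors the example window above rather than a recommended value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TicketLog:
    # Hypothetical log fields; real systems will name these differently.
    agent_responded_at: datetime
    closed_at: Optional[datetime]
    human_touchpoints: int


def attributed_to_agent(log: TicketLog,
                        window: timedelta = timedelta(hours=24)) -> bool:
    """Credit the agent only if the ticket closed inside the window
    after the agent's response and no human touched it."""
    if log.closed_at is None or log.human_touchpoints > 0:
        return False
    elapsed = log.closed_at - log.agent_responded_at
    return timedelta(0) <= elapsed <= window


responded = datetime(2026, 1, 5, 9, 0)
in_window = TicketLog(responded, responded + timedelta(hours=3), 0)
too_late = TicketLog(responded, responded + timedelta(hours=30), 0)
escalated = TicketLog(responded, responded + timedelta(hours=1), 1)
results = [attributed_to_agent(t) for t in (in_window, too_late, escalated)]
```

Keeping the window as a parameter reflects that it is a negotiated contract term, not a constant of the system.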
Human review pass-through
Agent completes the task; human reviews and approves. The outcome is attributed to the agent if and only if the human approves the output. This gives customers confidence that quality is checked while still crediting the agent for doing the work. Common in legal and financial workflows where human sign-off is a regulatory requirement.
Unit Economics That Work
Before setting an outcome price, you must know your cost floor per outcome. Outcome-based pricing fails when vendors discover they are paying more in inference and overhead than they are collecting per result. Build the cost model before quoting a price.
Inference cost per task
Average token count per completed task multiplied by the current API price per token. For a support ticket: ~2,000 input tokens + 500 output tokens at $3/$15 per 1M tokens (input/output) = $0.006 + $0.0075, or roughly $0.0135 per call. Account for retries (add a 20-30% buffer). This is your variable floor.
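The arithmetic for the support-ticket example can be written out as a short calculation. The function name is made up for illustration, and the 25% retry buffer is simply the midpoint of the 20-30% range mentioned above.

```python
def inference_cost_per_task(input_tokens: int, output_tokens: int,
                            input_price_per_m: float,
                            output_price_per_m: float,
                            retry_buffer: float = 0.0) -> float:
    """Inference cost in dollars for one task.

    Prices are dollars per 1M tokens; retry_buffer is a fractional
    markup (0.25 means 25%) to cover retried calls.
    """
    base = (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000
    return base * (1 + retry_buffer)


# Support-ticket example: 2,000 input + 500 output tokens at $3/$15 per 1M.
base_cost = inference_cost_per_task(2_000, 500, 3.0, 15.0)
buffered_cost = inference_cost_per_task(2_000, 500, 3.0, 15.0,
                                        retry_buffer=0.25)
```

Here the base cost works out to about $0.0135 per call, and about $0.0169 once the assumed retry buffer is included.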
Infrastructure and orchestration
Vector DB queries, tool calls, external API lookups, human review time if included. Often 30-50% of total cost per task. Map every external call in the agent workflow and price it per task, not as a flat infrastructure budget.
Failure and retry cost
If your agent succeeds 80% of the time and you do not charge for failures, you are paying the inference cost for 20% of tasks with no revenue. Either charge a small initiation fee for attempts, set success rate minimums in SLAs, or price the per-success rate high enough to cover expected failures.
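To see how the success rate feeds into pricing, the sketch below computes the break-even price per successful outcome, optionally offset by a per-attempt initiation fee. The function name and the cost figures are illustrative assumptions; only the 80% success rate comes from the text above.

```python
def break_even_price(cost_per_attempt: float, success_rate: float,
                     initiation_fee: float = 0.0) -> float:
    """Price per success at which expected revenue per attempt
    equals cost per attempt.

    Revenue per attempt = initiation_fee + success_rate * price,
    so price = (cost_per_attempt - initiation_fee) / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return max(0.0, (cost_per_attempt - initiation_fee) / success_rate)


# Hypothetical all-in cost of $0.04 per attempt at an 80% success rate.
no_fee = break_even_price(0.04, 0.80)
with_fee = break_even_price(0.04, 0.80, initiation_fee=0.01)
```

Without an initiation fee, every success has to carry a quarter more than the cost of a single attempt. A small per-attempt fee lowers the price each success must recover.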
Pricing against the human alternative
The anchor for your outcome price is the cost of a human doing the same task. If a human analyst costs $50/hour and takes 30 minutes per contract review, your price ceiling is $25 per review. Price at 60-70% of human cost to make the ROI obvious. Above 80% of human cost you will lose deals to in-house headcount.
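The human-cost anchor and the resulting price band can be checked with a few lines of arithmetic. The helper names are hypothetical; the percentages are the 60-70% target band and the 80% threshold from the paragraph above.

```python
def human_cost_per_task(hourly_rate: float, minutes_per_task: float) -> float:
    """Cost of a human completing one task, in dollars."""
    return hourly_rate * minutes_per_task / 60


def price_band(human_cost: float, low: float = 0.60, high: float = 0.70,
               walk_away: float = 0.80) -> dict:
    """Target price range and the level above which deals are at risk."""
    return {
        "ceiling": human_cost,
        "target_low": human_cost * low,
        "target_high": human_cost * high,
        "risk_threshold": human_cost * walk_away,
    }


# Contract-review example: $50/hour analyst, 30 minutes per review.
ceiling = human_cost_per_task(50, 30)
band = price_band(ceiling)
```

For the contract-review example this gives a $25 ceiling, a target range of roughly $15.00 to $17.50, and a risk threshold of about $20. The band is only viable if the cost floor per outcome sits comfortably below the low end.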
When to Use Outcome-Based vs. Other Pricing Models
Outcome-based pricing is not always the right model. It requires measurement infrastructure, attribution methodology, and customer sophistication. Use this decision framework to pick the right model for your product and buyer.
Use outcome-based when
- The agent does the full unit of work autonomously (not just assists a human doing it)
- Outcomes are clearly binary and technically measurable without human judgment
- Replacement value per outcome is high enough to price profitably above your cost floor
- Your agent success rate is above 80% and stable (you are not subsidizing failures at scale)
- The customer is sophisticated enough to set up the measurement pipeline with you
Use per-token or usage-based when
- The product assists a human who does the final work (agent is a copilot, not an autonomous worker)
- Outcomes are too varied or context-dependent to define a standard unit
- You are in early product stages and do not yet have baseline outcome data
- Customer usage patterns are predictable and token count correlates reasonably with value
Use hybrid when
- There is meaningful fixed cost (onboarding, integration, dedicated infrastructure) that should be recovered regardless of outcome volume
- You want to give customers predictable budget certainty while still aligning on results
- Some value is delivered at the platform level (data, integrations, reporting) independent of task outcomes
- Enterprise procurement prefers committed spend with a variable component rather than fully variable pricing