Back to Insights
Architecture

Most AI Pricing Is Broken

By Mathijs Boezer

On June 16, 2025, Cursor changed its Pro plan from 500 fast requests per month to "$20 of API-priced compute."

The pricing change was defensible. Anthropic Sonnet was getting more capable per request. Long-horizon agentic loops cost an order of magnitude more than simple completions. The unit economics of the flat-fee plan were upside down.

Power users burned the $20 budget in hours. Rate limits hit immediately. There was no spend cap by default. Several users reported surprise bills above $1,000. Eighteen days later, CEO Michael Truell published a public apology and offered full refunds. The rollout cost Cursor more trust than the new pricing recovered in revenue.

Cursor is the loud case. The instructive ones are quieter. Replit's Effort-Based Pricing rollout in mid-2025 produced the same backlash on smaller scale. Lovable's credit redefinition, Bolt's mid-quarter token resets, Salesforce's repeated Agentforce pricing changes. Different vendors, same pattern. Each one got the unit wrong before they got the model wrong.

The Cursor episode is the cleanest 2025 illustration of a pattern showing up everywhere. SaaS pricing assumptions are not surviving contact with variable-cost AI workloads. Most companies pricing standalone AI products are getting it wrong. Most companies absorbing AI into existing seats are getting away with it. For now.

The Inversion

Software-as-a-service pricing rested on a hidden assumption: marginal cost per user was bounded and predictable. One more Slack user, one more Notion user, one more Salesforce user added some cost, but a small and forecastable amount. Pricing could be flat per seat. Gross margins could sit at 70-80%. Plans could advertise "unlimited."

AI breaks the predictability.

Inference cost per call is real. Per-call cost varies with model choice, prompt length, reasoning effort, tool use, and retry behavior. One agentic loop can consume 50x what a chat reply consumes. Heavy users are routinely 100x more expensive than light users on the same plan.

ICONIQ's 2026 enterprise AI report puts gross-margin reality at roughly 52% for AI product builders, against the 70-80% SaaS norm (their sample skews toward growth-stage AI-natives; mature SaaS adding AI sees lighter compression). Bessemer's State of the Cloud has hybrid pricing rising from 27% to 41% of AI vendor revenue in 12 months while pure per-seat fell from 21% to 15%. The migration is happening because per-seat does not work when the cost-of-goods walks in the door alongside the customer.

In the dozen-plus pricing reviews I have run with SaaS teams shipping AI features over the last year, two failure modes dominate. I call the first the *COGS-blind launch*: pricing before plotting the usage distribution. I call the second the *undefined outcome*: selling per-resolution before contracting what counts. The first costs you margin in quarter two. The second costs you the renewal in quarter four.

The Taxonomy

Five pricing models in active use. Each has a different failure mode.

Per-seat. Microsoft 365 Copilot at $30 per user per month. GitHub Copilot at $19 per seat for Business. The traditional SaaS shape. Where it breaks: power-law usage. GitHub Copilot is reportedly losing an average of $20 per user per month, with heavy users costing $80 against a $10 list price. The light users subsidize the heavy ones until the heavy cohort grows. Then unit economics invert.

Per-token. OpenAI and Anthropic APIs. The correct model for the API layer because the buyer is a developer who translates tokens to features. Where it breaks for end-user products: bill shock and unpredictability. Customers cannot translate "tokens" into intent. A single agentic loop can multiply a month's spend by 20x. Almost no consumer-facing product passes raw token pricing through, because no procurement team will sign for it.

Per-task or per-action. Zapier at $30 per month for 750 tasks. Salesforce Agentforce Flex Credits at $0.10 per action. Buyer can audit "one task equals one thing." Where it breaks: once tasks include LLM calls of variable cost, the price decouples from the underlying COGS. Zapier's AI agents now live in a separate $150-$200 per month add-on stack, which signals that per-task could not absorb the LLM variance.

Per-outcome or per-resolution. Intercom Fin at $0.99 per resolution, now north of $100 million ARR. Sierra at around $1-$2 per resolved conversation. Decagon at $0.50 to $0.99 per resolution plus a roughly $50,000 platform fee. Zendesk Automated Resolutions at $1.50 committed or $2.00 pay-as-you-go. HubSpot moved Breeze from $1.00 per conversation to $0.50 per resolved conversation in early 2026.

Where it works: customers can compare $0.99 against a human-agent ticket cost of $5 to $15 and write the business case in a sentence. Where it breaks: defining "resolution" is contractual warfare. Salesforce launched Agentforce at $2 per conversation in October 2024 and abandoned the model when customers contested what counted. Salesforce now runs three pricing models in parallel because none of them satisfied both predictability and value capture.

Outcome pricing has a second failure mode I see surprise teams almost every time. As your AI improves, deflection climbs from 25% to 75%, your per-resolution bill triples, and your old per-conversation bill stayed flat. The customers who modeled the math after their first quarter renegotiate by their fourth.

Capacity or committed-use. OpenAI Guaranteed Capacity with 1-3 year reservations. Anthropic's enterprise contracts with mandatory monthly consumption commitments. For vendors, capacity contracts solve the GPU allocation problem and lock in pre-IPO revenue. For buyers, it transforms variable cost into a line item Finance can budget. Where it breaks: over-committing is the new shelfware.

The Hybrid Envelope

Every category is converging on hybrid. A committed floor (seat, platform fee, or capacity) plus metered overage tied to whatever unit the buyer can defend internally.

Intercom Fin charges per seat ($29 to $132) plus $0.99 per resolution. Zendesk charges per seat plus a $50 AI add-on plus per-resolution. Microsoft Copilot is $30 per seat plus consumption credits underneath the surface ("Copilot Credits" billed separately to Azure). Anthropic's restructured enterprise contracts bundle lower seat fees with monthly consumption minimums.

In the hybrid restructures I run with vendors, the negotiation is rarely about which model to pick. It is about where the floor sits and how the overage meter is defined. Customers want predictability at the floor. Vendors want value capture at the ceiling. The hybrid envelope is the configuration that gives both.

By Category

Different product categories settle on different hybrid shapes.

Internal copilots (Microsoft, Google Workspace, Notion AI, Slack AI) are converging on bundled-into-base. Notion folded its standalone $10 AI add-on into the $20 Business plan in May 2025. Slack dropped its $10 AI add-on in August 2025 and raised Business+ from $12.50 to $15. Google bundled Gemini into Workspace and raised base prices 17-22%. Atlassian did the same with Rovo in April 2025. Why: standalone attach rates disappointed, and price uplift via base-tier increases is less negotiable than line-item AI charges.

Customer service agents (Intercom Fin, Sierra, Decagon, Zendesk, Salesforce Agentforce) converged on per-resolution with a platform fee or seat floor. The economics work because human-agent ticket costs sit at $5 to $15 and AI cost-per-resolution at scale runs sub-dollar. This is the category where I see the resolution-definition fight escalate fastest, and where the teams I advise are most often blindsided by the success penalty: their per-resolution bill that triples as their AI gets better at deflection.

Developer tools (Cursor, Replit, Lovable, Bolt, v0) converged on a $20 anchor plus credits, with higher tiers ($60, $100, $200) for the heavy cohort. The category-killing mistake is not the model. It is silent repricing without communication.

Foundation-model APIs stayed per-token, with capacity commitments emerging at the top end. This is the only category where pure per-token actually works at scale, because the buyer is the developer.

The Counter-Argument

Some of the bill-shock concern self-resolves.

Per-token list prices fell roughly 280x in two years on commodity tiers. Hardware efficiency keeps compounding. NVIDIA's Vera Rubin platform promises up to 10x higher inference throughput per megawatt than Blackwell on MoE workloads. Mixture-of-experts active-parameter ratios keep falling: DeepSeek-V3 activates 5.5% of parameters; Qwen3 Next activates 3.75%. Distillation is putting frontier-class capability into models that fit on a phone.

If the cost curve keeps moving the way it has, the COGS pressure on AI products eases over time. The hybrid envelope might be a 2025-2028 artifact rather than a permanent equilibrium.

The counter to the counter: at the frontier ceiling, prices reversed up in 2026. GPT-5.5 costs $5 per million input tokens and $30 per million output, roughly double GPT-5.4's $2.50/$15 and four times GPT-5's launch pricing. The 90% inference price drop story is alive at the commodity tail. The frontier products customers actually want to ship have moved the other way. Until the curves cross, the unit-economics problem is real, and the hybrid envelope is the right response.

For Builders

Three things I check before letting a pricing meeting end.

Model the COGS curve, not the average. Per-seat pricing fails because the long-tail user, not the median user, breaks the unit economics. Plot p50, p90, and p99 usage per seat before you commit to a flat price. If your p99 cost exceeds your list price, build for the floor: cap the heavy users with rate limits, or move them to a higher tier, or move the whole product to hybrid.

Define the outcome before you sell the outcome. Per-resolution pricing only works when both sides agree, in advance and in writing, on what counts. Salesforce shipped Agentforce at $2 per conversation and discovered too late that "conversation" was contested. Build the definition into the contract, with measurement that produces a number both sides can audit.

Floor plus overage beats either alone. Pure per-seat dies on heavy users. Pure per-token dies on procurement. The hybrid envelope absorbs both failure modes. Most 2026 AI pricing changes are some variant of this.

Closing

Per-seat pricing assumed marginal cost was bounded and predictable. AI made it unbounded and bursty.

The cheapest model for the buyer is the most expensive for the vendor. The cheapest model for the vendor is unsellable to the buyer. The hybrid envelope is where both sides survive a model upgrade.

Price the floor. Meter the overage. Define the outcome.