The Real Cost of AI Tokens: What Happens When Employee Reliance Scales

Token prices are falling fast. Enterprise AI bills are rising faster. Here's the data on why usage growth outpaces price decline — and what Jevons Paradox means for your AI budget.


The Headline Nobody Wants to Hear

Everybody in AI is telling you the same story right now: tokens are cheap and getting cheaper. And they're right. LLM (Large Language Model) inference prices have fallen 50x to 200x per year depending on the benchmark. GPT-4 launched in 2023 at $60 per million output tokens. GPT-4o mini costs $0.60. That's a 99% price drop in under two years.

So why did enterprise generative AI spending go from $11.5 billion in 2024 to $37 billion in 2025 — a 3.2x increase — in a single year?

Because cost per token is only half the story. The other half is what happens when people actually start relying on the tool.

The 320x Consumption Problem

The most striking data point in this entire analysis isn't from an analyst report or a consulting firm. It's from OpenAI's own State of Enterprise AI report, published December 2025. Here's what they found:

API (Application Programming Interface) reasoning token consumption per organization increased 320x year-over-year.

Not 320%. 320 times. More than 9,000 companies exceeded 10 billion tokens. Nearly 200 exceeded 1 trillion.

This isn't new companies joining the platform. This is existing organizations consuming AI 320 times more intensely in 12 months. The average enterprise worker sends 30% more ChatGPT messages weekly than a year ago. Frontier workers — the 95th percentile — send 6x more messages than the median employee.

Here's the arithmetic that should concern any CFO: if tokens cost 50x less but your organization uses 320x more, your bill is 6.4x higher. The price decline doesn't matter when demand growth outruns it more than sixfold.
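The arithmetic is worth making explicit. A back-of-envelope sketch, using the figures above with a hypothetical baseline of 100M tokens/month (not anyone's actual billing data):

```python
# Illustrative math only: price per token falls 50x while org-wide
# consumption grows 320x. Baseline volume is a made-up placeholder.
old_price_per_m = 60.00                      # $/1M tokens, GPT-4-era output pricing
new_price_per_m = old_price_per_m / 50       # 50x cheaper

old_monthly_tokens_m = 100                   # hypothetical: 100M tokens/month
new_monthly_tokens_m = old_monthly_tokens_m * 320  # 320x consumption growth

old_bill = old_monthly_tokens_m * old_price_per_m
new_bill = new_monthly_tokens_m * new_price_per_m

print(new_bill / old_bill)  # 6.4 - the bill rises 6.4x despite cheaper tokens
```

The multiplier is just 320/50; any baseline volume gives the same 6.4x result.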

An Economist From 1865 Predicted This

In 1865, William Stanley Jevons observed something counterintuitive: technological improvements that made coal more efficient to burn led to increased total coal consumption. Not decreased. The cost per unit of energy dropped, so people found more uses for coal, ran their machines longer, built new factories. Efficiency didn't reduce demand — it amplified it.

Microsoft's CEO Satya Nadella cited this directly after DeepSeek's cost breakthrough: "As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can't get enough of."

The academic research backs this up. A 2025 paper published in ACM FAccT (Fairness, Accountability, and Transparency) proceedings documented two types of rebound effects in AI adoption:

  1. Direct rebound: efficiency improvements lead to increased use of the same system. AI makes writing emails faster, so people write more emails.
  2. Indirect rebound: cost savings from efficiency enable expansion into entirely new applications. AI saved your team 3 hours a week, so now management wants you to take on AI-assisted market research too.

The radiology example from the paper is concrete: if AI delivers a 10% efficiency improvement per scan, doctors order more scans because it's easier and faster. If volume rises by more than about 11% — enough to offset the 10% per-scan saving — total resource use increases despite per-scan gains.
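The break-even point falls out of simple multiplication. A sketch with the paper's 10% figure and hypothetical volume-growth scenarios:

```python
# Rebound break-even sketch: a 10% per-scan efficiency gain is wiped out
# once scan volume grows past 1/0.9, roughly 11.1%. Scenarios are hypothetical.
per_scan_cost = 0.9  # resource use per scan after a 10% efficiency gain
breakeven_growth = 1 / per_scan_cost - 1  # ~0.111

for volume_growth in (0.05, 0.111, 0.20):
    total = per_scan_cost * (1 + volume_growth)  # vs. a baseline of 1.0
    print(f"{volume_growth:.1%} more scans -> total resource use x{total:.3f}")
```

At 5% growth total use still falls; at 20% growth it rises 8% above baseline even though every individual scan got cheaper.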

This is the pattern playing out in every organization deploying AI at scale. As models get cheaper and better, three things happen simultaneously:

  • More employees use them (breadth)
  • Existing users use them more intensively (depth)
  • New use cases emerge that wouldn't have existed without AI (demand creation)

BCG (Boston Consulting Group) confirmed this dynamic in their 2024 research: generative AI doesn't just speed up existing tasks — it expands the range of tasks workers can perform. A non-developer can now write basic code. A non-designer can create presentations. This isn't an efficiency improvement. It's capability expansion. And expanded capability always creates expanded demand.

The Budget Surprise Nobody Planned For

If the Jevons effect explains why consumption grows, the enterprise data shows how badly organizations are predicting it.

According to research from Mavvrik:

  • 85% of enterprises miss AI infrastructure forecasts by more than 10%
  • 80% miss by more than 25%
  • Budget overruns affect 66.5% of organizations
  • AI implementation costs increased 89% between 2023 and 2025
  • 100% of surveyed CEOs cancelled or delayed at least one AI initiative due to cost surprises

That last figure deserves emphasis. Not "some." Not "most." Every single CEO in the survey had pulled the plug on at least one AI project after the bill came in.

The underlying cause isn't irresponsibility. It's that consumption-based pricing requires understanding usage patterns that don't exist before deployment. You can't forecast how your power users will behave until they start using the tool. You can't predict whether your sales team will generate 500 tokens per conversation or 5,000 until you measure it.

This is a fundamentally different budgeting problem than traditional SaaS (Software as a Service). Every seat-based software contract has a knowable ceiling. AI API contracts don't. And 53% of AI vendors now use consumption-based pricing — up from 31% in 2024. The billing model that CFOs were trained on is the one that's disappearing.

The Agentic Multiplier: Where Costs Compound

The consumption numbers get more alarming when you look at how AI is being used, not just how often.

A simple chatbot turn costs a few hundred tokens. An agentic workflow — where the AI plans, calls tools, delegates to sub-agents, and iterates on results — can consume 10x to 100x more tokens for work that feels equivalent to the user.

Real benchmarks for agentic workloads show the gap:

  • Basic customer support chatbot: ~1,000 tokens per conversation
  • AI SDR (Sales Development Representative) outreach with research and personalization: ~3,000-4,000 tokens
  • Multi-step automation with chain-of-thought reasoning and tool calls: 5,000+ tokens
  • Unconstrained software engineering agent per task: $5-$8 at current rates
  • A Reflexion loop running 10 cycles: 50x the tokens of a single pass

There's a structural reason for this. In multi-turn agentic conversations, input tokens grow near-quadratically because LLMs are billed for every input token in every turn — including the entire conversation history. Turn 1 sends 200 tokens. Turn 2 sends all of turn 1 plus new context. Turn 10 is sending the cumulative history of turns 1-9 plus new input. Every turn gets more expensive than the last.
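The cumulative-history effect above can be sketched in a few lines, using a hypothetical 200 new tokens per turn:

```python
# Why multi-turn input billing grows near-quadratically: the full history
# is resent (and billed) every turn. 200 tokens/turn is an assumed size.
new_tokens_per_turn = 200

billed = 0
history = 0
for turn in range(1, 11):
    input_this_turn = history + new_tokens_per_turn  # history + new message
    billed += input_this_turn
    history = input_this_turn  # next turn resends everything so far

print(billed)  # 11000 - vs. 2000 if history weren't rebilled each turn
```

Turn n bills 200·n input tokens, so 10 turns cost 200·(1+2+…+10) = 11,000 tokens — 5.5x what the same conversation would cost if only new tokens were billed.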

An ICLR (International Conference on Learning Representations) 2026 submission — the first empirical analysis of agent token consumption — found that input tokens dominate agentic costs (not output, as commonly assumed), variance across runs is enormous (some runs use 10x more tokens than others on identical tasks), and predicting total consumption before execution is practically impossible (Pearson's r < 0.15).

That last finding deserves repeating: you cannot reliably budget for agentic AI from first principles. The costs are inherently unpredictable until measured.

The Productivity Math Is Real — With Caveats

I want to be fair to the counterargument, because the productivity data is genuinely compelling.

GitHub Copilot users complete tasks 55% faster in controlled studies. MIT and Stanford researchers documented a 14% productivity boost (NBER Working Paper) from AI, with the greatest impact on lower-skilled workers. Employees self-report saving 40-60 minutes per day on average.

For low-cost subscriptions, the ROI (Return on Investment) is unambiguously favorable. GitHub Copilot Pro at $10/month breaks even if a developer saves 1.3 hours per year. A Forrester Total Economic Impact study found 376% ROI over 3 years for a 5,000-developer organization.
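That 1.3-hour figure is easy to verify. A quick check, assuming a fully loaded developer cost of roughly $92/hour (an assumption, not a figure from the study):

```python
# Break-even check for a flat $10/month subscription.
monthly_price = 10.0
hourly_cost = 92.0  # assumed fully loaded developer cost, $/hour

annual_cost = monthly_price * 12
breakeven_hours = annual_cost / hourly_cost

print(round(breakeven_hours, 1))  # 1.3 hours saved per YEAR to break even
```

At a lower loaded cost the break-even rises proportionally, but it stays in the single-digit hours per year — a trivially low bar against the 55% speedup figure.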

But here's where it gets complicated.

The shelfware problem. Microsoft 365 Copilot deployments show a persistent pattern: organizations buy 1,000 licenses and get 300 active users. That's 70% of investment burning on seats that produce zero value. The true cost per active user becomes $100+/month, not $30. And the real all-in cost of M365 Copilot — once you factor in the required base subscriptions — is $42.50-$87 per user per month, not the $30 headline price.
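The per-active-user math follows directly from those license counts:

```python
# Shelfware math from the figures above (1,000 seats, 300 active users).
licenses, active_users = 1000, 300
list_price = 30.0  # $/seat/month headline price

monthly_spend = licenses * list_price
cost_per_active_user = monthly_spend / active_users

print(cost_per_active_user)  # 100.0 - $100/month per user who actually uses it
```

The headline price is per seat; the cost that matters is per person who actually opens the tool.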

The burnout signal. BCG's 2025 AI at Work report found that 88% of the top quartile of AI users — the people extracting the most value, saving the most time — report significant burnout. The Jevons effect operates at the cognitive level too: AI saves time per task, so more tasks get added. Total workload increases despite individual tasks getting faster.

The total cost iceberg. Raw AI API costs represent only 15-30% of total enterprise AI spend. The rest is implementation ($50,000-$200,000 for enterprise rollouts), data preparation, security validation, integration development, monitoring infrastructure, user training, and FinOps tooling to even measure what you're spending. Organizations budgeting for token costs alone are seeing the tip while the bulk sits underwater.

What This Actually Means

I'm not arguing that AI is overpriced. For many use cases — especially flat-subscription tools with high adoption — the ROI is excellent. A developer getting 55% faster at coding for $10/month is among the best investments in software history.

But the narrative that "tokens are cheap and getting cheaper" creates a dangerous false comfort for organizations moving beyond basic chat subscriptions into API-driven, agentic deployments. The data tells a clear story:

Per-token prices are falling. That is real. A 50x-200x price decline per year is extraordinary.

Usage is growing faster than prices fall. That is also real. 320x consumption growth versus a 50x-200x price decline means the bill goes up, not down.

This growth is structural, not temporary. Jevons Paradox explains why: cheaper and more capable AI creates more demand, not less. History shows this pattern across every major technology transition.

The organizations that will avoid the 85% budget-miss statistic are the ones treating AI spend governance the same way they treat cloud FinOps (financial operations for cloud spend management) — with real-time visibility, per-team budgets, model routing that uses cheap models for simple tasks and expensive models only when justified, and an honest assessment that consumption will grow as reliance deepens.
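Model routing, the cheapest of those levers, can be a few lines of policy code. A minimal sketch — the model names, prices, and thresholds are illustrative placeholders, not any vendor's API:

```python
# Minimal cost-aware model-routing sketch. Tiers and prices are made up;
# a real router would also consider latency, context length, and quality evals.
MODELS = {
    "small": {"price_per_m": 0.60},   # cheap default for routine tasks
    "large": {"price_per_m": 15.00},  # reserved for tasks that justify it
}

def route(prompt: str, needs_reasoning: bool) -> str:
    """Send short, simple prompts to the cheap tier; escalate only when the
    task is flagged as needing deep reasoning or the prompt is very long."""
    if needs_reasoning or len(prompt) > 4000:
        return "large"
    return "small"

print(route("Summarize this ticket in one line.", needs_reasoning=False))  # small
print(route("Plan a multi-step data migration.", needs_reasoning=True))    # large
```

With a 25x price gap between tiers, routing even 80% of traffic to the cheap model cuts the blended per-token rate dramatically — which is exactly why measurement has to come before routing: you can't set the thresholds without usage data.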

The ones assuming "cheap tokens" means "cheap AI" are the ones whose CEOs will be cancelling projects next quarter.


Found This Helpful?

This is the kind of analysis Intelligence Adjacent publishes regularly — grounded in data, not vendor marketing. If you want the methodology behind the conclusions, subscribe here. Lurkers get methodology guides. Contributors get implementation deep dives.
