Jevons Paradox drives AI compute demand as token costs fall 1,000x

Per-token AI costs have fallen 1,000-fold in three years, yet enterprise compute consumption is exploding — Uber burned through its entire 2026 AI budget by April, and AT&T now processes 27 billion tokens daily, up from 1 billion 18 months ago.

"Every time we get the same unit of intelligence cheaper, we are not reducing consumption; we are increasing consumption because we can solve more complex tasks with the same budget," Roman Chernin, co-founder and chief business officer at Nebius, an AI cloud company, said.

The phenomenon, known as Jevons Paradox after 19th-century economist William Stanley Jevons, describes how efficiency gains that lower costs can increase total resource consumption. In a letter to the Wall Street Journal this week, economist Maury Harris argued the principle applies to AI compute, where demand may prove "highly elastic" with respect to price. Nebius, which builds large-scale GPU clusters, saw its stock drop 40% during the DeepSeek panic in January. Yet Chernin said that same week was "probably the best week in sales" as companies realized they could afford inference at scale.
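The elasticity argument can be made concrete with a constant-elasticity demand curve. This is a textbook simplification, not a model that Harris or Chernin specified, and the function name and sample elasticity values below are illustrative assumptions. Under it, total spending rises after a price cut only when the elasticity is greater than 1.

```python
def spend_multiplier(price_drop: float, elasticity: float) -> float:
    """Factor by which total spend changes when unit price falls `price_drop`-fold.

    Assumes constant-elasticity demand, Q = k * P**(-elasticity) (an assumption,
    not a measured property of AI compute demand). Quantity then grows by
    price_drop**elasticity while price shrinks by 1/price_drop, so spend
    changes by price_drop**(elasticity - 1).
    """
    if price_drop <= 0:
        raise ValueError("price_drop must be positive")
    return price_drop ** (elasticity - 1.0)


# Illustrative elasticities only; the article reports no measured value.
for e in (0.5, 1.0, 1.2):
    print(f"elasticity {e}: spend changes by {spend_multiplier(1000, e):.2f}x")
```

With a 1,000-fold price drop, an assumed elasticity of 1.2 implies roughly four times the original spend, while an elasticity below 1 implies total spending falls. The Jevons outcome therefore depends on which side of 1 real demand sits.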

For investors, the stakes are large. Goldman Sachs estimates annual AI infrastructure spending could rise from $765 billion in 2026 to $1.6 trillion by 2031. But the winners will depend on utilization rates, financing discipline, and the ability to absorb volatile component costs: memory-chip prices have risen sixfold over the past year as AI demand spills beyond data centers into the broader economy.
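The Goldman Sachs range implies a compound annual growth rate that is easy to check. The sketch below assumes smooth compounding over the five years from 2026 to 2031, which is a simplification; the helper name is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the Goldman Sachs estimate cited above.
rate = cagr(765e9, 1.6e12, 2031 - 2026)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the estimate works out to roughly 16% growth per year.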

The Token Explosion Hits Enterprise Budgets

The shift from experimental chatbots to agentic AI systems is the primary driver. When enterprises move from single-turn queries to multi-step autonomous agents that chain calls, retrieve documents, and take action, token consumption jumps by an order of magnitude or more. A major healthcare insurer watched its monthly AI token consumption rise from 3 million to more than 150 million in under a year, a roughly 50-fold increase.
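To put these growth figures on a common scale, they can be converted to an implied monthly growth rate. The sketch below assumes steady compounding, which real usage rarely follows, and treats the insurer's "under a year" as exactly 12 months, so that result is a lower bound. The helper name is illustrative.

```python
def monthly_growth(start: float, end: float, months: float) -> float:
    """Average compounded month-over-month growth rate between two levels."""
    if start <= 0 or months <= 0:
        raise ValueError("start and months must be positive")
    return (end / start) ** (1.0 / months) - 1.0


# AT&T: 1 billion to 27 billion daily tokens over 18 months (from the article).
att = monthly_growth(1e9, 27e9, 18)

# Insurer: 3 million to 150 million monthly tokens; 12 months is an assumed
# upper bound on the duration, so this rate is a floor.
insurer = monthly_growth(3e6, 150e6, 12)

print(f"AT&T: about {att:.0%} per month")
print(f"Insurer: at least {insurer:.0%} per month")
```

On these assumptions, AT&T's volume grew about 20% a month and the insurer's at least 39% a month.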

The spending surge is reshaping vendor pricing. Anthropic eliminated flat-rate enterprise pricing after discovering developers were burning thousands of dollars in compute on $200-per-month plans. OpenAI moved Codex to per-token billing the same month. Every major AI vendor is converging on metered pricing, creating what Chernin calls a structural lock-in: every new agent deployed deepens dependence on providers who set the rate and control the terms.

Yet the demand side tells a different story from the panic that gripped markets in January. While DeepSeek's release was triggering a 40% drop in Nebius's stock and a broader selloff in AI infrastructure names, corporate engineering teams were not retreating; they were scaling up. The lower costs made previously uneconomical applications viable, from internal knowledge retrieval to automated customer workflows.

Who Wins When Compute Gets Cheaper

The competitive dynamics favor companies that move up the technology stack. Chernin estimates the bare-metal GPU rental market serves roughly a dozen customers globally. Managed infrastructure reaches hundreds. Inference platforms attract thousands. Agentic systems, he predicts, will draw tens of thousands of developers.

Nebius's Token Factory, a managed inference platform, exemplifies this strategy. The service lets companies run open-source models without managing backend infrastructure, applying optimization techniques to keep costs predictable. For enterprises, the value proposition is clear: the hosted platform handles the complexity of tracking costs, maintaining uptime, and routing tasks across different models based on budget and speed requirements.
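Routing on budget and speed can be pictured as a simple filter-and-rank step. The sketch below is a hypothetical illustration, not Token Factory's actual logic or API: the model names, prices, latencies, and quality scores are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    usd_per_million_tokens: float
    latency_ms: float
    quality: float  # arbitrary illustrative score; higher is better


def choose_model(
    options: Sequence[ModelOption],
    max_usd_per_million: float,
    max_latency_ms: float,
) -> Optional[ModelOption]:
    """Return the highest-quality option within both limits, or None."""
    eligible = [
        m for m in options
        if m.usd_per_million_tokens <= max_usd_per_million
        and m.latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    # Prefer quality; break ties by picking the cheaper option.
    return max(eligible, key=lambda m: (m.quality, -m.usd_per_million_tokens))


# Hypothetical catalog; none of these values come from a real price list.
CATALOG = [
    ModelOption("small-open-model", 0.10, 150, 0.60),
    ModelOption("medium-open-model", 0.50, 400, 0.75),
    ModelOption("large-open-model", 2.00, 1200, 0.90),
]

choice = choose_model(CATALOG, max_usd_per_million=1.0, max_latency_ms=500)
print(choice.name if choice else "no model fits the constraints")
```

A production router would also account for context length, rate limits, and failover, which this sketch leaves out.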

But the hosted inference layer faces its own commoditization risk. A 2026 study found an approximately 600-fold decline in large language model inference prices between 2020 and 2026, while the OECD's 2025 AI markets report documented sharp declines in quality-adjusted model prices as competition widened. That suggests the margin-compression pressure that hit chipmakers is now spreading upward through the stack.

For investors, the key question is which companies can build durable moats. Nvidia, trading at roughly 35 times forward earnings, faces the risk that cheaper inference reduces demand for its highest-margin training chips. Cloud hyperscalers — Amazon, Microsoft, Google — benefit from increased compute consumption but face rising capital requirements. And infrastructure providers like Nebius must prove they can maintain utilization and pricing power as the market expands.

The Jevons Paradox suggests total AI industry revenue will grow even as unit prices fall. But capturing that revenue requires more than owning compute — it requires the software, tooling, and enterprise relationships that turn raw processing power into finished products.

This article is for informational purposes only and does not constitute investment advice.