Baseten raises $1.5B betting on cheaper open-source AI models

Baseten's $1.5 billion raise signals that enterprise AI spending is pivoting from premium closed models to cheaper open-source alternatives.

Baseten, an AI startup building inference infrastructure for open-source models, is finalizing a $1.5 billion fundraising round at a valuation of up to $13 billion, the latest bet that enterprises will shift AI spending from premium closed models to cheaper alternatives.

"Open-source models are getting very, very good," Tuhin Srivastava, co-founder and chief executive of Baseten, said. "And as open-source gets better, we are growing with it."

The round uses a dual-tiered structure, with some investors participating at an $11 billion valuation and others at $13 billion, according to the company. Altimeter Capital, Conviction, Spark Capital, Sands Capital and Wellington Management are co-leading the investment — Wellington's first foray into AI inference. Baseten's software layer sits atop computing capacity sourced from 20 cloud providers, giving customers the infrastructure to run, optimize and train open-source models without managing hardware.

The bet reflects a broader market shift. The quality gap between open-source and closed-source models has collapsed from two years in 2023 to mere weeks on key engineering benchmarks today, according to independent testing. DeepSeek-V4, a 1.6 trillion-parameter open model, costs about 87 cents per million output tokens — roughly one-thirtieth of frontier pricing from OpenAI and Anthropic. If enterprises redirect even a fraction of their AI spend to open-source alternatives, the revenue projections underpinning the $200 billion-plus data center buildout could face serious pressure.

The Inference Layer Becomes a Business

Baseten is part of a growing ecosystem of startups capitalizing on the boom in inference, the process by which AI models use computing power to respond to queries. Cerebras, which designs chips specifically for inference, went public in May and now commands a market capitalization of nearly $50 billion. Fireworks AI raised funds in October at a $4 billion valuation, and Factory, a startup building autonomous coding agents, reached a $1.5 billion valuation in April.

The economics are driving adoption. One Baseten customer told Srivastava it performed a specific task at 30 percent of the cost required by a closed-source model. Most of Baseten's customers use a mix of open and closed models, tapping frontier systems only for tasks requiring the absolute best performance while routing routine workloads to cheaper alternatives.
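The routing arithmetic can be made concrete with a minimal sketch. It uses the 87-cent open-model price cited above and a closed-model price implied by the "roughly one-thirtieth" comparison (about $26 per million output tokens, an approximation). The 80 percent routing share and the function name are hypothetical, chosen only for illustration, and are not figures reported by Baseten or its customers.

```python
def blended_cost_per_million(open_share, open_price, closed_price):
    """Average price per million output tokens when open_share of the
    traffic goes to the open model and the rest to the closed model."""
    if not 0.0 <= open_share <= 1.0:
        raise ValueError("open_share must be between 0 and 1")
    return open_share * open_price + (1.0 - open_share) * closed_price


# Dollars per million output tokens, the open-model figure cited in the article.
OPEN_PRICE = 0.87
# Implied by "roughly one-thirtieth of frontier pricing"; an approximation.
CLOSED_PRICE = OPEN_PRICE * 30
# Illustrative assumption only: share of tokens routed to the open model.
HYPOTHETICAL_OPEN_SHARE = 0.8

blended = blended_cost_per_million(HYPOTHETICAL_OPEN_SHARE, OPEN_PRICE, CLOSED_PRICE)
savings = 1.0 - blended / CLOSED_PRICE

print(f"blended price: ${blended:.2f} per million output tokens")
print(f"savings versus an all-closed workload: {savings:.0%}")
```

Under these assumptions the blended price falls by roughly three-quarters even though a fifth of the traffic still goes to the frontier model, which is why the remaining closed-model share dominates the bill.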

"Open-source models are always a handful of months behind the frontier models, but they can serve a lot of use cases while saving some share of token usage for the absolute best," Oz Nur, an investor at Wellington Management, said.

China's Open-Source Offensive

The most popular open-source models today come from Chinese labs. DeepSeek's V4 series and Z.ai's GLM-5.2 have posted benchmark scores that rival or exceed those of leading American models on engineering tasks. GLM-5.2 scored 81.0 on Terminal-Bench 2.1, up from 62 for the prior version released weeks earlier. The model carries a 1 million-token context window and costs roughly one-sixth as much per token as leading American closed models.

U.S. companies are trying to catch up. Nvidia recently launched Nemotron, a family of open-source models, while Meta continues to develop its Llama series. But Chinese labs are iterating faster — GLM moved from version 5.0 to 5.2 in four months, with each release trained on domestic silicon.

The Investor Calculus

For investors, the math is straightforward. The cost of a GPT-4 class output fell from about $20 per million tokens in late 2022 to roughly 40 cents today, a fiftyfold decline. That deflation paused this year due to memory shortages, but new fab capacity coming online could resume the trend. Meanwhile, Nvidia's DGX Spark, a $4,700 desktop machine with 128 gigabytes of unified memory, can now run models up to 200 billion parameters locally.

If frontier-grade open models run on affordable local hardware, the centralized inference demand that justifies five-year depreciation schedules on data center GPUs may grow more slowly than expected. Michael Burry has flagged roughly $176 billion in understated depreciation across the industry through 2028, and roughly half of U.S. data center projects planned for 2026 already face delay or cancellation.

Baseten's customers include Cursor, Mercor and OpenEvidence. The Information earlier reported on the fundraising.

This article is for informational purposes only and does not constitute investment advice.