Baseten raises $1.5 billion as AI inference demand surges 20x

The inference infrastructure company that powers AI applications for Cursor, Abridge and OpenEvidence has become one of the fastest-growing private companies in tech, with revenue jumping roughly 20x year-over-year as enterprises shift model spending toward custom and open-source deployments.

Baseten, the San Francisco-based AI inference company, closed a $1.5 billion Series F financing led by Altimeter Capital, Conviction and Spark Capital, with Sands Capital and Wellington Management serving as co-leads. The round was structured across two tranches at $13 billion and $11 billion valuations respectively, a structure increasingly common in high-momentum AI rounds that allows different investor classes to enter at negotiated terms.

"The future of AI will be built on millions of specialized models, and the companies building the best ones know that post-training has become existential," Tuhin Srivastava, CEO and co-founder of Baseten, said. "It's how they build intelligence they own, on data that's theirs, optimized for the customers they serve."

Baseten's platform now processes more than 1 billion inference calls every day across 87 clusters spanning 18 cloud providers. The company's multi-cloud architecture — a core selling point — lets enterprises deploy open-source models from Meta, Mistral and DeepSeek alongside proprietary frontier models without lock-in to any single cloud. Leading app-layer companies now direct 30 percent to 50 percent of model spending toward custom and post-trained models, according to Baseten, as open-source quality converges with proprietary alternatives at a fraction of the cost.

The technical foundation is Truss, an open-source framework that packages machine learning models into production APIs with a single configuration file. Developers specify the model, hardware and optimization settings in a YAML file, and Baseten compiles a TensorRT-LLM container, deploys it across its cloud network and returns an OpenAI-compatible endpoint. For compound AI workflows — voice pipelines that chain speech-to-text, language model and text-to-speech steps — the platform's Truss Chains layer streams data directly between model steps, achieving sub-400-millisecond end-to-end latency.

The inference market is compounding faster than almost anything we've seen

The funding round caps a remarkable valuation trajectory. Baseten raised a $150 million Series D at a $2.15 billion valuation in September 2025, followed by a $300 million Series E at $5 billion in January 2026 that included $150 million from Nvidia. The roughly 6x valuation jump in under a year reflects a structural shift in AI economics: Deloitte projected in November 2025 that inference workloads would account for roughly two-thirds of all AI compute in 2026, up from one-third three years earlier, with the inference market exceeding $50 billion in chip spending alone this year.

LLM inference costs have fallen roughly 1,000-fold since late 2022, making agentic applications economically viable at production scale. At least one Baseten customer reports inference costs at roughly 30 percent of what closed-source alternatives charge for equivalent workloads, according to Sacra research.

"Tuhin and the Baseten team made a bet six years ago that most people weren't ready to make: that the future of AI would be built on many specialized models, and that the companies building them would need world-class infrastructure to make them real," Apoorv Agrawal, partner at Altimeter Capital, said. "That bet is paying off in a way that is hard to overstate."

Capital deployment and competitive positioning

Baseten plans to triple headcount this year, investing in engineering, research, operations and enterprise go-to-market teams. The company has raised more than $2 billion to date from investors including Altimeter, Battery Ventures, Blackbird, Greylock, IVP, Nvidia, Spark Capital and Wellington Management.

The competitive landscape includes cloud hyperscalers offering managed inference services — Amazon Bedrock, Google Vertex AI and Microsoft Azure AI — as well as specialist inference providers like Together AI and Fireworks AI. Baseten's differentiation lies in its multi-cloud independence and its focus on custom and post-trained models rather than shared inference capacity for popular models. The company targets enterprises that cannot tolerate the latency variability of shared endpoints and need dedicated GPU allocations for fine-tuned models.

For investors, the Baseten story reflects a broader thesis: as foundation models commoditize, the durable margin in AI is migrating to the infrastructure layer that orchestrates them. The company's revenue trajectory — from roughly $200 million to $600 million annualized in a single quarter, per prior reporting — suggests the market is already pricing that shift. Baseten's customers include some of the most valuable private AI companies, and its platform sits at the intersection of two structural trends: the proliferation of specialized models and the enterprise need for multi-cloud flexibility.

This article is for informational purposes only and does not constitute investment advice.