In 2019, when Baseten was founded, AI and machine learning were already generating hype, but few could have predicted just how fast things would evolve. Large models were still a niche pursuit, inference was an afterthought, and the conversation around AI revolved more around research breakthroughs than real-world deployment. Fast forward to 2024, and AI models are not only mainstream but are also driving billion-dollar investments, shifting how businesses operate and compete.
On Wednesday, the company announced a new $75 million funding round, bringing its valuation to $825 million. The round was led by existing investors IVP and Spark Capital, with participation from others. The investment signals that venture capitalists see significant value in infrastructure companies that allow enterprises to run AI efficiently—without the hassle of securing scarce GPUs or dealing with the unpredictability of cloud providers.
The Inference Problem That No One Saw Coming
For years, training AI models received most of the attention. Companies poured resources into gathering data and fine-tuning models, but once those models were trained, a new challenge emerged: getting them to run efficiently at scale.
When OpenAI launched ChatGPT, it reshaped user expectations overnight. AI systems were now expected to deliver real-time responses, and any delay became unacceptable. The launch of Stable Diffusion was another defining moment: being open source, it catalyzed an entire ecosystem around customizable AI models. As a result, inference quickly became the industry’s biggest bottleneck.
“Prior to those moments, we were actually doing a lot more than inference,” said co-founder and CEO Tuhin Srivastava. “We had taken for granted that inference was hard, but after the boom of GPT-4 and Stable Diffusion, we started focusing much more heavily on making inference seamless and scalable. Now, inference is the name of the game.”
Why Companies Are Turning to Baseten
Most businesses don’t have the infrastructure or expertise to efficiently run large AI models in production. Managing GPUs, scaling workloads, ensuring reliability, and optimizing costs require significant engineering effort. Without a dedicated solution, companies end up spending as much time managing infrastructure as they do building products.
Baseten’s platform abstracts away these complexities, allowing developers to deploy models on any cloud (AWS, GCP, or even a mix of both) and automatically spill over to Baseten’s own infrastructure when needed. This multi-cloud approach ensures that customers have access to more GPUs than a single cloud provider could offer.
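To make the deployment side concrete: Baseten maintains an open-source packaging library called Truss, which wraps a model behind a standard load/predict interface so the same artifact can run wherever the platform schedules it. The following is a minimal sketch, not Baseten's exact production format; the sentiment pipeline and input schema are hypothetical choices, and method details vary across Truss versions.

```python
# model/model.py -- minimal sketch of a Truss model class.
# Hypothetical example: the sentiment pipeline and the "text" input
# field are illustrative, not taken from Baseten's documentation.
from transformers import pipeline


class Model:
    def __init__(self, **kwargs):
        self._pipeline = None

    def load(self):
        # Runs once when the deployment starts, before serving traffic.
        self._pipeline = pipeline("sentiment-analysis")

    def predict(self, model_input):
        # Runs per request; model_input is the parsed JSON request body.
        return self._pipeline(model_input["text"])
```

Packaged this way, a model can be pushed with the `truss push` CLI and served behind an autoscaling endpoint, regardless of which underlying cloud happens to supply the GPUs.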
The company has also put significant effort into improving performance. By integrating with NVIDIA’s TensorRT-LLM, Baseten optimizes language model inference to run as fast as possible. Its native workflows manage versioning, observability, and orchestration, ensuring that models stay online even when cloud providers unexpectedly take GPUs offline for maintenance.
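For a sense of what that optimization path involves, TensorRT-LLM compiles a model checkpoint into a GPU-specific engine before serving it, which is where most of the speedup comes from. Below is a rough sketch using the library's high-level Python LLM API; the checkpoint name and sampling settings are placeholder choices (not Baseten's), and the API surface varies by release.

```python
# Sketch of single-node inference with TensorRT-LLM's high-level
# Python API. Assumes a recent tensorrt_llm release; the checkpoint
# and parameters below are illustrative.
from tensorrt_llm import LLM, SamplingParams

# Constructing the LLM compiles the checkpoint into an optimized
# TensorRT engine for the local GPU.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=64)
for output in llm.generate(["Why is inference hard at scale?"], params):
    print(output.outputs[0].text)
```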
“In this market, your No. 1 differentiation is how fast you can move,” Srivastava said. “You can go to production without worrying about reliability, security, or performance.”
As AI adoption accelerates, businesses are becoming increasingly conscious of the costs of running large models. DeepSeek’s January breakthrough, in which the Chinese AI lab claimed to have trained competitive models at a fraction of the cost of its U.S. counterparts, put further pressure on the industry to focus on efficiency.
Baseten was quick to integrate support for DeepSeek’s R1 reasoning model, which competes with OpenAI’s o1. According to Srivastava, the company has seen a surge in interest from organizations looking to cut costs.
“There are a lot of people paying millions of dollars per quarter to OpenAI and Anthropic that are thinking, ‘How can I save money?’” he said. “And they’ve flocked.”
Baseten customers typically see their inference costs drop by 40% or more compared to in-house architectures.
A Year of Growth and What’s Next
Over the past year, Baseten has scaled inference loads hundreds of times over without a single minute of downtime. The company’s revenue for the fiscal year ending in January grew sixfold.
Beyond inference, Baseten is expanding into adjacent AI infrastructure challenges. The company recently rolled out:
- Multi-cloud support, allowing workloads to seamlessly span across multiple cloud providers.
- TensorRT-LLM integration, optimizing large language models for maximum performance.
- Partnerships with AWS and GCP, giving customers access to top-tier hardware without negotiating cloud contracts themselves.
Looking ahead, Baseten plans to expand its offerings further, adding more GPU availability, a new orchestration layer for build pipelines and queues, and an optimization engine to fine-tune workloads. Customers have also been requesting solutions beyond inference, including fine-tuning and model evaluation, which are on the roadmap.
Despite its momentum, Baseten faces competition. Together AI, backed by Salesforce, is another player in the AI infrastructure space. At the same time, talent remains a challenge. The company is competing for top AI engineers against deep-pocketed firms, including hedge funds and AI model companies.
“Having more money in somewhat of a weird economic environment, it does not hurt,” Srivastava admitted.
At its core, Baseten was founded to solve a problem its own team faced when deploying ML-powered products. Five years later, it’s clear that inference is a business-critical challenge for companies worldwide. For a startup founded to simplify machine learning deployment, the next stage is to make AI more accessible, more efficient, and more cost-effective before the AI boom forces companies to rethink their strategies all over again.