Baseten Is Helping Companies Ditch Million-Dollar AI Bills

In this market, your No. 1 differentiation is how fast you can move.

In 2019, when Baseten was founded, AI and machine learning were already generating hype, but few could have predicted just how fast things would evolve. Large models were still a niche pursuit, inference was an afterthought, and the conversation around AI revolved more around research breakthroughs than real-world deployment. Fast forward to today, and AI models are not only mainstream but are also driving billion-dollar investments, shifting how businesses operate and compete.

On Wednesday, the company announced a new $75 million funding round, bringing its valuation to $825 million. The round was led by existing investors IVP and Spark Capital, with participation from others. The investment signals that venture capitalists see significant value in infrastructure companies that allow enterprises to run AI efficiently—without the hassle of securing scarce GPUs or dealing with the unpredictability of cloud providers.

The Inference Problem That No One Saw Coming

For years, training AI models received most of the attention. Companies poured resources into gathering data and fine-tuning models, but once those models were trained, a new challenge emerged: getting them to run efficiently at scale.

When OpenAI launched ChatGPT, it reshaped user expectations overnight. AI systems were now expected to deliver real-time responses, and any delay became unacceptable. The launch of Stable Diffusion was another defining moment: being open-source, it catalyzed an entire ecosystem around customizable AI models. As a result, inference quickly became the industry's biggest bottleneck.

“Prior to those moments, we were actually doing a lot more than inference,” said co-founder and CEO Tuhin Srivastava. “We had taken for granted that inference was hard, but after the boom of GPT-4 and Stable Diffusion, we started focusing much more heavily on making inference seamless and scalable. Now, inference is the name of the game.”

Why Companies Are Turning to Baseten

Most businesses don’t have the infrastructure or expertise to efficiently run large AI models in production. Managing GPUs, scaling workloads, ensuring reliability, and optimizing costs require significant engineering effort. Without a dedicated solution, companies end up spending as much time managing infrastructure as they do building products.

Baseten’s platform abstracts away these complexities, allowing developers to deploy models on any cloud (AWS, GCP, or even a mix of both) and automatically spill over to Baseten’s own infrastructure when needed. This multi-cloud approach ensures that customers have access to more GPUs than a single cloud provider could offer.
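The spillover idea described above can be sketched in a few lines: try each configured cloud in order, and fall back to an overflow pool when none has free capacity. This is a minimal illustration of the scheduling pattern, not Baseten's actual API; all names here (`GpuPool`, `place_workload`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuPool:
    """A pool of GPUs on one provider, with a count of currently free GPUs."""
    name: str
    capacity: int


def place_workload(pools, overflow, gpus_needed):
    """Return the first pool able to host the workload, else spill to overflow."""
    for pool in pools:
        if pool.capacity >= gpus_needed:
            pool.capacity -= gpus_needed
            return pool.name
    # No primary cloud has room: spill over to the fallback pool.
    if overflow.capacity >= gpus_needed:
        overflow.capacity -= gpus_needed
        return overflow.name
    raise RuntimeError("no capacity available in any pool")


# Usage: AWS is full, GCP has room, so the job lands on GCP.
pools = [GpuPool("aws", capacity=0), GpuPool("gcp", capacity=8)]
overflow = GpuPool("overflow", capacity=16)
print(place_workload(pools, overflow, gpus_needed=4))  # gcp
```

The point of the pattern is that the caller never has to know which provider ended up hosting the workload, which is what makes the "more GPUs than any single cloud" claim workable in practice.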

The company has also put significant effort into improving performance. By integrating with NVIDIA’s TensorRT-LLM, Baseten optimizes language model inference to run as fast as possible. Its native workflows manage versioning, observability, and orchestration, ensuring that models stay online even when cloud providers unexpectedly take GPUs offline for maintenance.

“In this market, your No. 1 differentiation is how fast you can move,” Srivastava said. “You can go to production without worrying about reliability, security, or performance.”

As AI adoption accelerates, businesses are becoming increasingly conscious of the costs associated with running large models. In January, Chinese AI lab DeepSeek claimed to have trained competitive models at a fraction of the cost of its U.S. counterparts, putting further pressure on the industry to focus on efficiency.

Baseten was quick to integrate support for DeepSeek’s R1 reasoning model, which competes with OpenAI’s o1. According to Srivastava, the company has seen a surge in interest from organizations looking to cut costs.

“There are a lot of people paying millions of dollars per quarter to OpenAI and Anthropic that are thinking, ‘How can I save money?’” he said. “And they’ve flocked.”

Baseten customers typically see their inference costs drop by 40% or more compared to in-house architectures.

A Year of Growth and What’s Next

Over the past year, Baseten has scaled inference loads hundreds of times over without a single minute of downtime. The company’s revenue for the fiscal year ending in January grew sixfold.

Beyond inference, Baseten is expanding into adjacent AI infrastructure challenges. The company recently rolled out:

  • Multi-cloud support, allowing workloads to seamlessly span across multiple cloud providers.
  • TensorRT integration, optimizing large language models for maximum performance.
  • Partnerships with AWS and GCP, giving customers access to top-tier hardware without negotiating cloud contracts themselves.

Looking ahead, Baseten plans to expand its offerings further, adding more GPU availability, a new orchestration layer for build pipelines and queues, and an optimization engine to fine-tune workloads. Customers have also been requesting solutions beyond inference, including fine-tuning and model evaluation, which are on the roadmap.

Despite its momentum, Baseten faces competition. Together AI, backed by Salesforce, is another player in the AI infrastructure space. At the same time, talent remains a challenge. The company is competing for top AI engineers against deep-pocketed firms, including hedge funds and AI model companies.

“Having more money in somewhat of a weird economic environment, it does not hurt,” Srivastava admitted.

At its core, Baseten was founded to solve a problem its own team faced when deploying ML-powered products. Five years later, it’s clear that inference is a business-critical challenge for companies worldwide. For a startup founded to simplify machine learning deployment, the next stage is to make AI more accessible, more efficient, and more cost-effective before the AI boom forces companies to rethink their strategies all over again.


Anshika Mathews
Anshika is the Senior Content Strategist for AIM Research. She holds a keen interest in technology and related policy-making and its impact on society. She can be reached at anshika.mathews@aimresearch.co