Ankur Goyal, the founder of Braintrust, is no stranger to the challenges of building reliable AI products. Having previously helped develop AI infrastructure at Impira and Figma, Goyal knows firsthand that creating AI tools isn’t just about innovative algorithms or impressive models; it’s about ensuring that those tools perform well under real-world conditions. This insight is central to the mission of Braintrust, which has now raised $36 million in a Series A funding round led by Andreessen Horowitz (a16z). The funding values Braintrust at $150 million and positions the company as a key player in the AI evaluation space.
We’re thrilled to announce that we've raised a $36M Series A led by @martin_casado at @a16z to advance the future of AI software engineering, bringing our total funding to $45 million. We’re also introducing functions, a flexible primitive for building with foundation models.
Braintrust (@braintrustdata), October 8, 2024
From Hype to Performance
AI tools often grab headlines for their potential to revolutionize industries, but as Goyal explains, the real challenge begins after the excitement dies down and users begin relying on these tools for day-to-day tasks. “People are good at predicting what they’ll need an AI tool for,” Goyal says. “The hard part is that if you just type something up and ship it, it doesn’t work.” The gap between initial expectations and actual performance is what Braintrust aims to address.
Braintrust steps in to help companies assess and refine their AI products, ensuring they’re not only functional but also accurate and reliable. Its software evaluates how well AI tools are working, providing a feedback loop that helps companies optimize their models over time. Building without that loop, Goyal explains, is “like baking without measuring the ingredients. If you don’t have proper evaluations in place, you end up with mush.”
The Journey to $36 Million
Launched in August 2023, Braintrust quickly gained traction by solving a problem Goyal had encountered multiple times in his career: AI models that perform well in the lab but fail in real-world scenarios. With notable clients like Airtable, Stripe, Instacart, Zapier, and Notion, Braintrust’s software has been integral in helping companies monitor, evaluate, and improve their AI tools.
In just a year, the company’s client base doubled. Its dozens of customers pay tens of thousands of dollars, and sometimes more than $100,000, for its services. That rapid growth caught the attention of investors, leading to the recent $36 million Series A round, which also saw participation from cloud leaders Datadog and Databricks. Elad Gil, Greylock, Basecase, and executives from OpenAI, Zapier, and Notion took part as well.
Martin Casado, a partner at a16z who led the funding, is a strong believer in the potential of Braintrust. He notes that while the market for AI is booming, many companies struggle to optimize their AI products effectively. “This is a new type of product developer that’s emerging,” Casado said. “Braintrust makes you a sophisticated AI programmer right away, turning what could be a frustrating experience into something scalable and reliable.”
A Better Feedback Loop for AI
At its core, Braintrust operates by offering a software development kit (SDK) that integrates directly into a company’s existing infrastructure. This allows teams to run continuous evaluations—commonly known as “evals”—on their AI tools, providing crucial insights into performance. These evaluations don’t just offer a snapshot of how the tools are working; they allow companies to track the accuracy and reliability of their AI models over time.
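To make the idea of an eval concrete, here is a minimal sketch in plain Python. It does not use Braintrust’s SDK; the names `Case`, `exact_match`, `run_eval`, and the two toy task functions are hypothetical stand-ins chosen for illustration. It shows the basic loop described above: run a task over a fixed set of cases, score each output, and record the accuracy of each version so that changes can be compared over time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    """One evaluation example: a prompt and the answer we expect."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Score 1.0 when the output matches the expected answer, ignoring case and edge whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(task: Callable[[str], str], cases: List[Case]) -> float:
    """Run the task on every case and return the mean score (0.0 for an empty dataset)."""
    if not cases:
        return 0.0
    scores = [exact_match(task(case.prompt), case.expected) for case in cases]
    return sum(scores) / len(scores)


# Hypothetical stand-ins for two versions of an AI feature.
# A real task would call a model; these use lookups so the sketch runs offline.
def baseline_task(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "unknown")


def improved_task(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "unknown")


CASES = [
    Case("2+2", "4"),
    Case("capital of France", "paris"),
    Case("capital of Peru", "Lima"),
]

# Record accuracy per version so a regression or improvement is visible.
history: Dict[str, float] = {
    "v1": run_eval(baseline_task, CASES),
    "v2": run_eval(improved_task, CASES),
}

for version, accuracy in history.items():
    print(f"{version}: {accuracy:.0%}")
```

Keeping the dataset fixed while the task changes is what turns a one-off check into a trend: each new version is measured against the same cases as the last.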
Goyal emphasizes that Braintrust is not an AI product itself but a tool that helps others build better AI software. “We’re here to help teams iterate and improve,” he said. With Braintrust’s software, companies can see tangible improvements in their AI tools, with reported accuracy often climbing from below 40% to over 80% in just a few weeks.
Take Zapier, for instance. The workflow automation platform relies on AI to handle millions of tasks each month, but before using Braintrust, its method of catching AI hallucinations was “ad hoc,” according to cofounder Bryan Helmig. Now, with Braintrust’s monitoring, Zapier has a much more robust system in place for managing datasets of prompts and ensuring AI accuracy. Similarly, productivity giant Notion has been able to streamline its AI processes, with cofounder Simon Last describing how Braintrust helped the company move from manually sharing errors to automated monitoring.
The Introduction of “Functions”
As part of its ongoing commitment to helping companies build robust AI products, Braintrust recently introduced “functions”—a feature that allows developers to create tools, prompts, and scorers directly in their codebase. These functions can then be uploaded to Braintrust for experimentation and deployment with minimal effort. Goyal explained that this new feature is designed to save teams time by automating the more tedious aspects of AI development, like managing infrastructure. “Functions are general-purpose primitives,” Goyal said, “that let you prototype, experiment, and deploy AI features with ease.”
This kind of innovation is crucial for teams looking to integrate AI seamlessly into their core products. Companies like Notion are already pushing the limits of what’s possible with AI, and Braintrust’s tools are helping them do it with greater confidence. “It lets us build much more complex stuff,” Last said, “and do it more confidently.”
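As a rough illustration of what defining tools, prompts, and scorers in a codebase can look like, the sketch below uses a small local registry. This is not Braintrust’s API: the `register` decorator, the `REGISTRY` dictionary, and the three example functions are all hypothetical, and nothing here uploads or deploys anything. It only shows the general pattern of declaring each piece as an ordinary function so it can be collected and reused.

```python
from typing import Callable, Dict

# Hypothetical local registry, grouped by the three kinds of function named in the article.
REGISTRY: Dict[str, Dict[str, Callable]] = {"tool": {}, "prompt": {}, "scorer": {}}


def register(kind: str):
    """Decorator that records a function under the given kind and returns it unchanged."""
    if kind not in REGISTRY:
        raise ValueError(f"unknown kind: {kind!r}")

    def decorator(fn: Callable) -> Callable:
        REGISTRY[kind][fn.__name__] = fn
        return fn

    return decorator


@register("prompt")
def summarize_prompt(text: str) -> str:
    """Build the prompt text that would be sent to a model."""
    return f"Summarize in one sentence:\n{text}"


@register("tool")
def word_count(text: str) -> int:
    """A simple tool: count whitespace-separated words."""
    return len(text.split())


@register("scorer")
def brevity(output: str, max_words: int = 20) -> float:
    """Score 1.0 when the output stays within the word limit, else 0.0."""
    return 1.0 if word_count(output) <= max_words else 0.0


# List what has been declared, by kind.
for kind, functions in REGISTRY.items():
    print(kind, sorted(functions))
```

Because each piece is a plain function, it can be unit-tested locally before being handed to whatever platform runs experiments on it.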
A Rising Demand for Evaluations
Braintrust is not alone in recognizing the growing need for AI evaluations. As the AI industry continues to expand, more companies are turning their attention to the importance of testing and monitoring AI systems. Startups like Galileo have also entered the space, and established AI labs, including OpenAI, have begun rolling out their own evaluation tools. Despite this competition, Goyal is confident that Braintrust’s neutral position as a third-party evaluator gives it an edge. The company is not tied to any one model or platform, allowing it to work across multiple systems and provide unbiased insights.
Investors see this neutrality as a key advantage. Being able to swap one AI model for another, much as companies can choose between cloud providers such as AWS and Azure, makes Braintrust a versatile tool. As Notion’s cofounder Simon Last put it, “Nobody wants to be stuck with just one AI model.”
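One way to picture model-neutral evaluation is to treat every model as an interchangeable function and run the same cases against each. The sketch below is an assumption-laden toy, not a description of how Braintrust works internally: `Model`, `accuracy`, `compare`, and the two fake models are hypothetical names, and the models are simple string transforms so the code runs without any provider.

```python
from typing import Callable, Dict, List, Tuple

# Any callable from prompt to output counts as a model here (hypothetical interface).
Model = Callable[[str], str]


def accuracy(model: Model, cases: List[Tuple[str, str]]) -> float:
    """Fraction of cases where the model's output equals the expected string."""
    if not cases:
        return 0.0
    hits = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return hits / len(cases)


def compare(models: Dict[str, Model], cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every model against the same cases so results are directly comparable."""
    return {name: accuracy(model, cases) for name, model in models.items()}


# Toy stand-ins for models from different providers.
def model_a(prompt: str) -> str:
    return prompt.upper()


def model_b(prompt: str) -> str:
    return prompt[::-1]


CASES = [("abc", "ABC"), ("level", "level"), ("xy", "XY")]

results = compare({"model_a": model_a, "model_b": model_b}, CASES)
best = max(results, key=results.get)
print(results, "best:", best)
```

Since the evaluation code depends only on the callable interface, swapping in a different model means changing one dictionary entry rather than the evaluation itself.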
The Future of Braintrust
With the recent influx of capital, Braintrust is well-positioned to continue expanding its services and reach. The company’s roadmap includes further enhancements to its SDK and the addition of new features that make AI development even more seamless for companies. Goyal is also looking beyond Silicon Valley, with the goal of bringing Braintrust’s tools to a broader range of industries.
For Goyal, the biggest risk isn’t competition—it’s the possibility that AI as a whole may not live up to its lofty promises. “If AI is impactful, Braintrust is incredibly well-positioned,” he said. “If it turns out not to be a big disruptive thing, well, that’s a bet I’m willing to take.”