For years, the autonomous vehicle industry has relied on rigorous simulation techniques to refine self-driving technology. Engineers have spent countless hours testing, evaluating, and iterating on systems to ensure reliability in an unpredictable world. Now, the same approach is being applied to a different kind of automation: AI voice and chat agents.
Brooke Hopkins, a former tech lead at Waymo, saw a familiar challenge when she transitioned from self-driving cars to AI agents. At Waymo, she helped develop the infrastructure for testing and evaluating autonomous systems, ensuring they could navigate complex environments safely. When she turned her attention to AI agents, she realized the industry was struggling with the same core problem: reliability.
“When I left Waymo, I realized a lot of these problems that we had at Waymo were exactly what the rest of the AI industry was facing,” Hopkins told TechCrunch. “But everyone was saying that this is a new paradigm, we’re having to come up with testing practices from first principles and that basically we all have to recreate everything. And I looked at that and said, wait, we’ve spent the last 10 years in self-driving figuring out how to do this.”
That realization led to the creation of Coval, a San Francisco-based startup dedicated to evaluating AI agents through automated simulations. Founded in 2024, Coval builds large-scale testing environments for voice and chat AI, much like how Waymo simulates millions of driving scenarios to improve its autonomous vehicles. Hopkins formally shaped the idea while participating in Y Combinator’s Summer 2024 batch, and by October, Coval was ready to launch publicly.
Coval’s foundation rests on two core strengths: deep expertise in observability infrastructure and an unwavering focus on user experience. The founding team has built complex observability systems before, and it prioritizes intuitive developer tooling on the conviction that great developer tools don’t just solve technical problems; they make complex tasks feel effortless.
Now, just a few months later, Coval is announcing a $3.3 million seed round led by MaC Venture Capital, with participation from Y Combinator and General Catalyst. The funding will support Coval’s engineering expansion and its push toward product-market fit. The company is also looking to broaden its evaluation framework beyond conversational AI, eventually incorporating other types of autonomous systems, such as web-based agents.
The need for such evaluation tools is growing rapidly. AI agents are being deployed across industries, but their reliability remains a major concern. Companies building these systems often rely on slow, manual testing processes, where engineers spend hours diagnosing issues, only to find that fixing one problem introduces another.
“One of the biggest blockers to agents being adopted by enterprises is them feeling confident that this isn’t just a demo with smoke and mirrors,” Hopkins said. “Choosing between vendors is a really complicated task for these executives because it’s just very hard to know what you even ask or how do you even prove that these agents are doing what you expect. And so this gives our companies the ability to really show that and demonstrate it.”
Coval’s approach lets companies test their AI agents at scale. The platform can run thousands of simulations in parallel, evaluating how an agent handles tasks like making a restaurant reservation or responding to a customer service request that takes an indirect path. Coval provides general performance metrics, but it also lets companies define custom benchmarks and track regressions over time.
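Coval’s actual interface isn’t described in detail here, but the general pattern the article outlines (scripted scenarios fanned out in parallel, each scored against a custom pass criterion) can be sketched in plain Python. Everything below is hypothetical: the `reservation_agent` stub, the scenarios, and the pass criteria are invented for illustration, not taken from Coval’s product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a deployed conversational agent.
def reservation_agent(utterance: str) -> str:
    if "table for" in utterance:
        return "Confirmed: your reservation is booked."
    return "Sorry, could you rephrase that?"

# Simulated user turns, each paired with a custom pass criterion
# (the kind of benchmark a team might define for its own agent).
SCENARIOS = [
    ("I'd like a table for two at 7pm", lambda r: "booked" in r.lower()),
    ("Can you book dinner tonight?", lambda r: "rephrase" not in r.lower()),
]

def run_scenario(scenario):
    utterance, passes = scenario
    reply = reservation_agent(utterance)
    return {"input": utterance, "reply": reply, "passed": passes(reply)}

def run_suite():
    # Fan the simulations out in parallel, as an eval platform would at scale.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_scenario, SCENARIOS))
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return results, pass_rate

if __name__ == "__main__":
    results, pass_rate = run_suite()
    print(f"pass rate: {pass_rate:.0%}")
```

Rerunning a suite like this on every code change, and comparing the pass rate against the previous run, is what regression tracking amounts to in practice.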
The startup is entering the market at a time when demand for AI agents is at an all-time high. Major industry figures, such as Salesforce CEO Marc Benioff, have spoken out about the potential of AI agents, with Salesforce aiming to deploy over a billion of them next year. OpenAI has built its own AI agent technology, and the startup ecosystem is evolving just as quickly: more than 100 firms across Y Combinator’s three 2024 cohorts focused on artificial intelligence. Some, such as /dev/agents, have drawn significant early-stage funding, with /dev/agents raising a $55 million seed round at a $500 million valuation in November 2024.
With so many teams racing to market, the demand for evaluation tools like Coval’s is only growing. Hopkins believes Coval’s advantage lies in its experience. “I think where we really stand out is I’ve been working in this space for half a decade and I’ve built these systems over and over,” she said. “We’ve built multiple iterations and we’ve seen how they fail and how they scale and we’re building the same concepts into Coval and all of those learnings.”
Her experience at Waymo is shaping Coval’s trajectory. In the early days of autonomous vehicles, companies relied primarily on manual testing, driving cars on test tracks and city streets. As the industry matured, the approach shifted: engineers began testing every code change in simulation, running large-scale virtual scenarios to improve efficiency. That transformation was critical to putting self-driving cars on the road safely, and Hopkins envisions a similar evolution for AI agents.
“At Waymo, I developed tools that tested each code modification made by engineers, ensuring that every change improved the Waymo Driver’s performance. I believe this methodical approach was key in helping our team address edge cases and maintain peak performance, and it ultimately cemented Waymo’s status as a leader in the autonomous vehicle space,” she said.
Coval aims to apply the same rigor to AI agents. As automation spreads into critical areas such as customer interactions, business operations, and even healthcare, the risks posed by unreliable systems grow. Without robust testing infrastructure, AI agents may cause more problems than they solve.
“Teams are coming up with promising prototypes but often hit a wall when it comes to their reliability,” Hopkins said. “As we build for the future, where AI agents execute much of our work, ranging from sending emails to prescribing medication, the risks posed by untested systems could severely throttle the progress.”
Coval’s long-term ambition extends beyond conversational AI. The company aims to become the go-to evaluation platform for all agentic systems, ensuring that autonomous software works reliably across industries. As businesses look for ways to fold AI into their workflows, Coval is positioning itself as a critical layer, one that lets teams move quickly while maintaining trust and reliability. Whether AI can be trusted is more than a philosophical question; it is a practical barrier to widespread adoption. By applying lessons learned in the autonomous vehicle industry, Coval is working to make AI agents reliable at scale.