In 2021, Russ d’Sa, formerly at Twitter, and David Zhao, a Motorola veteran, launched an open source project called LiveKit. At the time, it aimed to solve a growing but under-addressed problem: transmitting high-bandwidth, real-time audio and video with low latency, reliability, and scale. Within months, large companies like Spotify, Oracle, and Reddit began asking for a managed version. That interest led to LiveKit Cloud, a commercial layer built atop their open source foundation. Today, LiveKit powers media infrastructure for names like Meta, Microsoft, and OpenAI, and counts over 100,000 developers on its cloud and open source platforms.
The company, which began during the pandemic as a tool to make communication easier, has now positioned itself at the core of the voice AI boom. When OpenAI launched ChatGPT Voice Mode in September 2023, it did so in partnership with LiveKit. That same week, LiveKit released its own open source voice agent framework, making it easier for developers to create voice-first AI applications. An interface many investors once dismissed as a speculative bet “3–5 years out” quickly turned into a full-fledged sector, accelerated further by the debut of GPT-4o. LiveKit Agents was used in every GPT-4o voice demo, cementing its infrastructure as a staple of the emerging voice AI stack.
“If OpenAI is building the brain, LiveKit is building the nervous system to carry signals to and from that brain,” Russ d’Sa said.
The company has now raised $45 million in Series B funding, led by Altimeter, with participation from Redpoint Ventures and Hanabi Capital. Altimeter’s Jamin Ball and Brad Gerstner, who led LiveKit’s Series A, have returned to double down. Mike Volpi, a longtime infrastructure investor, is participating through his new firm, Hanabi Capital—his first investment from the fund.
LiveKit’s growth has been closely tied to the rise of voice-based AI interfaces. When the Agents framework debuted alongside ChatGPT’s Voice Mode in September 2023, the broader market still showed little interest in voice. That changed rapidly after the GPT-4o announcement, which included voice demos built on LiveKit Agents, and the voice AI ecosystem has expanded quickly since.
Source: LiveKit
LiveKit Agents is now used across industries. Hello Patient uses it to automate hospital workflows. Salient applies it to automotive loan servicing. Podium deploys AI employees for sales, scheduling, and customer support. The infrastructure layer, LiveKit Cloud, is what enables these agents to function at scale, exchanging real-time audio with low latency around the world.
Today, LiveKit is announcing Agents 1.0, a major release that consolidates its infrastructure into a platform for building and deploying voice AI systems. Among the key features is a multi-agent orchestration engine, allowing developers to build voice agents that collaborate across subtasks. This matters particularly for business-facing “closed loop” voice agents—ones that follow fixed workflows, like customer support or loan qualification. Prior to Agents 1.0, many developers tried to encode these workflows into long prompts for language models, with unreliable results. LiveKit’s new workflow system breaks these interactions into smaller, more deterministic components, increasing reliability.
Source: LiveKit
“LiveKit Agents 1.0 makes building closed loop voice agents much easier,” the team wrote. “We’ve redesigned the entire framework to be lower level and more flexible, allowing a developer to orchestrate multi-agent workflows that break otherwise complex system prompts into discrete subtasks.”
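That decomposition is easy to see in miniature. Below is a minimal, framework-agnostic Python sketch of the idea: each stage of a closed-loop call gets its own small agent with a narrow prompt, and explicit hand-offs move the conversation from one subtask to the next. All names here (`IntakeAgent`, `run_workflow`, and so on) are illustrative, not LiveKit’s actual API.

```python
# A minimal, framework-agnostic sketch of multi-agent workflow decomposition.
# Each agent owns one narrow subtask with its own short prompt, instead of
# one giant system prompt trying to cover the whole call. Names are illustrative.
from dataclasses import dataclass, field


@dataclass
class CallState:
    """Shared state handed from one agent to the next as the call progresses."""
    data: dict[str, str] = field(default_factory=dict)


class SubtaskAgent:
    """One discrete step of a closed-loop workflow."""
    prompt = ""  # narrow system prompt covering just this step

    def handle(self, state: CallState) -> str | None:
        """Run this subtask; return the key of the next agent, or None when done."""
        raise NotImplementedError


class IntakeAgent(SubtaskAgent):
    prompt = "Greet the caller and collect their name and loan ID."

    def handle(self, state: CallState) -> str | None:
        state.data["loan_id"] = "hypothetical-123"  # would come from an LLM turn
        return "qualification"


class QualificationAgent(SubtaskAgent):
    prompt = "Verify income and ask the qualification questions."

    def handle(self, state: CallState) -> str | None:
        state.data["qualified"] = "yes"  # would come from an LLM turn
        return None  # workflow complete


AGENTS: dict[str, SubtaskAgent] = {
    "intake": IntakeAgent(),
    "qualification": QualificationAgent(),
}


def run_workflow(start: str = "intake") -> CallState:
    """Deterministic hand-offs between small agents, not one long prompt."""
    state, current = CallState(), start
    while current is not None:
        current = AGENTS[current].handle(state)
    return state
```

The gain is structural: each hand-off is explicit and individually testable, which is where the reliability improvement over one sprawling system prompt comes from.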
The 1.0 release also includes a new semantic turn detection model, trained in-house and optimized for speed and multilingual support. It performs CPU inference in under 25 milliseconds and supports 13 languages, including English, Chinese, French, Spanish, Russian, and Japanese. The model is capable of handling mixed-language conversations, where users and agents switch between multiple languages in a single dialogue.
“A few months ago we introduced our first open source model, trained in-house to improve the accuracy of turn detection: one of the hardest problems in voice AI,” the team explained. “Today we’re releasing a new, larger semantic turn detection model with multilingual capabilities.”
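Conceptually, semantic turn detection replaces a fixed silence timeout with a judgment about whether the words so far form a finished thought. Here is a hedged sketch of that decision loop; `score_end_of_turn` stands in for a trained classifier, and the thresholds are invented for illustration, not taken from LiveKit’s model.

```python
# Sketch of the decision loop behind semantic turn detection: combine a
# semantic "is this utterance complete?" score with observed silence,
# instead of relying on a fixed silence timeout alone.
# `score_end_of_turn` is a stand-in for a real model; LiveKit's actual
# model and thresholds are not shown here.

def score_end_of_turn(transcript: str) -> float:
    """Stand-in for a trained classifier returning P(turn is over)."""
    # e.g. "What's my balance?" -> high score; "What's my..." -> low score
    return 0.9 if transcript.rstrip().endswith(("?", ".", "!")) else 0.2


def should_respond(transcript: str, silence_ms: int) -> bool:
    """End the turn quickly when the words look complete; wait longer otherwise."""
    p = score_end_of_turn(transcript)
    if p > 0.8:
        return silence_ms > 150   # confident: respond almost immediately
    if p > 0.5:
        return silence_ms > 600   # unsure: give the speaker more room
    return silence_ms > 1500      # likely mid-sentence: hold back
```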
Another major component is Telephony 1.0. LiveKit began building an open source SIP telephony stack in late 2023, shortly after the ChatGPT Voice Mode launch. The goal was to support voice AI in telecom, a sector that relies entirely on voice. The stack now handles thousands of concurrent calls and supports HD audio, DTMF, call transfers, and noise cancellation. It is currently used by 25% of U.S. emergency dispatch centers—an infrastructure role that, according to LiveKit, contributes to saving at least one life per week.
“Even a quarter of 911 emergency dispatch centers in the US use LiveKit — which helps save at least one life every week,” the company noted. “In that vein, we’re bumping SIP to 1.0 to reflect its maturity and robustness.”
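Of the features in that stack, DTMF and call transfer are the call-control primitives a voice-agent developer typically touches first. The sketch below shows the general shape of a keypad-menu handler; the `Call` interface and its methods are hypothetical stand-ins, not LiveKit’s API.

```python
# Hypothetical IVR-style keypad handler illustrating the kind of call
# control (DTMF digits, transfers) a SIP telephony stack exposes.
# The Call interface and its methods are illustrative, not LiveKit's API.
from typing import Protocol


class Call(Protocol):
    def say(self, text: str) -> None: ...
    def transfer(self, sip_uri: str) -> None: ...
    def hang_up(self) -> None: ...


MENU = {
    "1": ("billing", "sip:billing@example.com"),
    "2": ("support", "sip:support@example.com"),
}


def on_dtmf(call: Call, digit: str) -> None:
    """Route the caller based on the keypad digit they pressed."""
    if digit in MENU:
        name, uri = MENU[digit]
        call.say(f"Transferring you to {name}.")
        call.transfer(uri)  # a real stack would issue a SIP transfer here
    elif digit == "0":
        call.say("Goodbye.")
        call.hang_up()
    else:
        call.say("Sorry, that's not a valid option.")
```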
LiveKit is also launching Cloud Agents, a new deployment and devops layer for voice agents. Unlike web applications, voice agents are stateful—they run inference on GPUs while continuously listening and reacting to human speech. This requires more sophisticated orchestration: dynamic provisioning, health checking, transparent failover, and context migration. Cloud Agents is LiveKit’s answer to those demands. Developers can deploy agent code into containers managed across LiveKit Cloud’s global edge network. The service handles load balancing, logging, rollbacks, and versioning, and has been in internal use at LiveKit for months. A closed beta begins today.
“Vercel is to NextJS what Cloud Agents is to LiveKit’s Agents framework,” the team wrote. “We host your agent code in a secure container, deploy it across LiveKit Cloud’s network of data centers around the world, and manage the entire devops lifecycle for you.”
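The hard part of that analogy is statefulness. A web host can retry a failed request on another machine, but a live call cannot be replayed, so the conversation’s context has to move with it. Here is a conceptual sketch of the snapshot-and-restore idea behind context migration; it is an illustration of the technique, not LiveKit Cloud’s implementation.

```python
# Sketch of why stateful voice agents need context migration on failover:
# the conversation state must move with the session when its worker dies.
# This is a conceptual illustration, not LiveKit Cloud's implementation.
import json


class AgentSessionState:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.chat_history: list[dict] = []
        self.pending_task: str | None = None

    def snapshot(self) -> str:
        """Serialize everything a replacement worker needs to resume mid-call."""
        return json.dumps({
            "session_id": self.session_id,
            "chat_history": self.chat_history,
            "pending_task": self.pending_task,
        })

    @classmethod
    def restore(cls, blob: str) -> "AgentSessionState":
        """Rebuild the session on a healthy worker after a failure."""
        data = json.loads(blob)
        state = cls(data["session_id"])
        state.chat_history = data["chat_history"]
        state.pending_task = data["pending_task"]
        return state


# Conceptually, the orchestrator health-checks workers, and when one fails
# it restores the latest snapshot on a healthy worker so the caller never
# notices. A stateless web request could simply be retried; a live call cannot.
```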
LiveKit’s infrastructure is used well beyond consumer tech. It powers real-time communication for aerospace launch monitoring, police drone operations via Skydio, and government-facing apps at Oracle and Adobe. The product suite includes SDKs, APIs, and tools for building streaming video and audio experiences—what d’Sa describes as an “AI-native cloud provider.”
“It turns out what LiveKit is ultimately building is ‘AIWS’ — an AI-native cloud provider,” d’Sa said. “What Stripe did for payments, LiveKit is doing for communications.”
The company’s financials reflect that momentum. LiveKit crossed a $10 million run rate last year. It currently employs around 50 people and is focused on expanding its engineering and product teams to meet growing infrastructure demands.
“There’s a lot more building to do ahead of us than behind,” d’Sa said. “We plan to use this capital towards growing our team and furthering our progress towards offering an all-in-one platform for building AI agents that can see, hear, and speak like we do.”