In September, Alexis Conneau made headlines when he departed OpenAI, where he led the development of ChatGPT’s advanced voice features. The project had sparked controversy when one of its AI-generated voices bore a striking resemblance to actress Scarlett Johansson. Johansson revealed she had been approached to license her voice for the project and had refused. Although OpenAI denied intentionally mimicking Johansson’s voice, comparisons to her role in the 2013 film Her were unavoidable, igniting debates around AI ethics and imitation.
Now, Conneau is charting a new course with WaveForms AI, co-founded with Coralie Lemaitre, an engineer turned strategist. Emerging from stealth today, WaveForms AI announced $40 million in seed funding led by Andreessen Horowitz (a16z), valuing the startup at $200 million. The venture aims to elevate audio AI by pioneering what Conneau calls emotional general intelligence (EGI). The ultimate goal: creating AI models capable of perceiving and responding to human emotions with unprecedented nuance.
A Departure Marked by Controversy and Inspiration
Reflecting on the Johansson controversy, Conneau defended the project. “The ChatGPT voice was never meant to mimic Johansson,” he stated. However, he acknowledged the comparisons to Her, a film about a man’s emotional relationship with an AI assistant voiced by Johansson. “When they see the technology, they think about the movie right away,” he said.
Conneau cheekily nodded to the Her comparisons in his exit post on X (formerly Twitter): “After an amazing journey at @OpenAI building #Her, I’ve decided to start a new company.”
While Conneau admits the movie has inspired him, he’s clear that its dystopian undertones are not a model for WaveForms. “The movie’s depiction of the complex, negative impacts of an AI relationship is something that we should probably avoid,” he noted. Instead, WaveForms AI seeks to use audio to augment, rather than replace, human interactions.
Building Audio as the “Social-Emotional Layer” of AGI
Conneau views audio as the key to unlocking the “social-emotional layer” of artificial general intelligence (AGI). According to him, while companies like OpenAI, Google, and Meta focus on AGI’s intellectual capabilities, WaveForms aims to make AI interactions deeply human and emotionally resonant.
“Audio is the first emotional, social-emotional layer of AI,” Conneau explained. He envisions WaveForms’ audio language models (audio LLMs) capturing emotional subtleties such as tone, inflection, and accent. These capabilities could enable AI to interpret the emotional context of conversations and respond empathetically.
For instance, an AI tutor powered by WaveForms could recognize a student’s frustration and respond with additional patience or encouragement. This emphasis on emotional understanding could redefine how humans interact with AI across education, healthcare, and customer service.
From Facebook to OpenAI
Conneau’s fascination with audio AI led him to OpenAI three years ago after a stint as a researcher at Facebook. He cold-emailed OpenAI co-founder and Chief Scientist Ilya Sutskever, who became a mentor and collaborator.
At OpenAI, Conneau spearheaded the development of GPT-4o’s Advanced Voice Mode, a feature capable of real-time speech recognition and response. Earlier voice features relied on a cascaded pipeline: transcribe the user’s speech to text, run the text through a language model, then synthesize the reply as audio. Advanced Voice Mode instead tokenizes audio directly, letting a single model consume and produce speech, which enables low-latency, natural-sounding conversations.
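The architectural difference can be sketched in a few lines. This is an illustrative toy, not OpenAI’s implementation: the stage functions below are trivial stand-ins that count processing hops, to show why an audio-native model cuts latency (fewer sequential stages) and why it can preserve tone and inflection (no lossy detour through text).

```python
# Toy sketch of the two voice-assistant architectures described above.
# All "models" here are placeholder transforms; only the pipeline shape
# (number of sequential stages) reflects the real systems.

def cascaded_reply(audio: bytes) -> tuple[bytes, int]:
    """Older approach: speech -> text -> text LLM -> speech (3 stages).

    Tone, accent, and inflection are discarded at the transcription step,
    and each stage adds latency before the user hears a reply.
    """
    stages = 0
    text = audio.decode()          # stand-in for speech-to-text (ASR)
    stages += 1
    reply_text = text.upper()      # stand-in for a text-only LLM
    stages += 1
    reply_audio = reply_text.encode()  # stand-in for text-to-speech (TTS)
    stages += 1
    return reply_audio, stages

def direct_reply(audio: bytes) -> tuple[bytes, int]:
    """GPT-4o-style approach: one model maps audio tokens to audio tokens.

    A single stage, so latency drops and the emotional signal in the
    input audio is available to the model end to end.
    """
    reply_audio = audio.upper()    # stand-in for an audio-native LLM
    return reply_audio, 1
```

Both functions produce the same reply on this toy input, but the cascaded version takes three sequential hops where the direct version takes one; in a real system each hop is a full model invocation, which is where the latency difference comes from.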
Despite his success at OpenAI, Conneau felt constrained by its broader AGI focus. “I really enjoyed my time there, but companies like OpenAI, Google, and Meta are AGI-focused. What we’re talking about here is a different focus,” he explained.
He declined to comment on whether Sutskever’s departure from OpenAI in May influenced his decision to leave but praised Sutskever as “the greatest scientist we might ever see in AI—similar to Einstein in AI.”
A Vision for Emotional Intelligence
WaveForms AI’s mission is ambitious: training foundation models that can make voice interactions with AI indistinguishable from human conversations. The company plans to release its first consumer product in 2025, focusing on immersive experiences that enhance emotional connections between users and AI.
Unlike other AI companies optimizing for metrics like “time spent on platform,” Conneau insists WaveForms will prioritize human well-being. “We don’t want to replicate the mistakes of social media,” he said. Instead, he envisions AI as a complementary aspect of social life, capable of enriching human interactions.
Backing from Andreessen Horowitz and Beyond
Andreessen Horowitz (a16z), whose general partner Martin Casado led the investment in WaveForms, shares Conneau’s vision. Casado sees AI as a potentially healthier alternative to some human interactions. “I can talk to a random person on the internet who could bully me, or I could talk to an AI,” Casado said, suggesting that AI could provide safer, more constructive interactions.
Marc Andreessen, a16z co-founder, has expressed personal interest in the project, aligning with his belief that AI should integrate deeply into human life.
Avoiding the Dystopia of “Her”
Conneau remains cautious about the ethical implications of emotional AI. While platforms like Character.AI have demonstrated the demand for AI companionship, they’ve also highlighted the risks of dependency and isolation. “Is it going to replace human interaction? I don’t think that’s the future that will happen,” he said.
Instead, Conneau envisions AI as a tool for inspiration and support—whether it’s acting as a patient teacher, a compassionate healthcare assistant, or an engaging travel companion. “The idea is to create new, more immersive experiences with AI, ones that feel more enjoyable,” he said.