“Building expert-level agentic AI requires expert-level data” – Alex Ratner, CEO at Snorkel AI
One of the biggest hurdles that enterprises adopting generative AI face is ensuring models can be trusted in real-world, specialized settings. That’s the challenge Snorkel AI was founded to solve in 2019. By shifting focus from model development to data development, Snorkel aimed to address the lack of high-quality, domain-specific training and evaluation data. Now, competitors such as Labelbox and Scale have seized the moment to position their platforms as end-to-end solutions. In an increasingly crowded market, Snorkel’s bet on expert-guided, data-centric AI is being called into question.
The Palo Alto-based startup just announced a $100 million Series D round, bringing its total funding to $237 million and its valuation to $1.3 billion. Led by Addition, with participation from Greylock, Lightspeed, QBE Ventures, and others, the new funding will support further expansion of its flagship data development platform, Snorkel Flow, and the launch of new products aimed at strengthening enterprise AI evaluation.
Snorkel began in what co-founder and CEO Alex Ratner describes as an “afternoon project” at the Stanford AI Lab with Chris Ré, Paroma Varma, Braden Hancock, and Henry Ehrenberg. Back then, ML teams were spending their time building complex models and treating training data as an afterthought. But Ratner and his co-founders saw that the real bottleneck was data, or rather, the effort required to label it for use in training. In 2019, Snorkel officially spun out of Stanford to commercialize a new approach called programmatic data labeling.
Rather than rely on teams of people manually tagging data point by point, Snorkel enables subject matter experts to encode their knowledge into labeling heuristics that can then be applied across vast datasets. “This transforms months of manual labeling into hours of work,” Ratner explained in a conversation with the IA40 podcast. “It makes the labeling process more efficient and repeatable, similar to software development workflows.”
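To make the idea of labeling heuristics concrete, here is a minimal sketch in plain Python. It is an illustration only, not Snorkel Flow's API: the function names, the keyword rules, and the "billing" and "account" labels are all hypothetical, and simple majority voting stands in for whatever aggregation a production system would use.

```python
from collections import Counter

# A heuristic returns ABSTAIN when it has no opinion about an example.
ABSTAIN = None

# Hypothetical labeling heuristics an expert might write for support tickets.
def lf_mentions_refund(text):
    return "billing" if "refund" in text.lower() else ABSTAIN

def lf_mentions_invoice(text):
    return "billing" if "invoice" in text.lower() else ABSTAIN

def lf_mentions_password(text):
    return "account" if "password" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_refund, lf_mentions_invoice, lf_mentions_password]

def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine heuristic votes by majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # If the top two labels have the same count, the heuristics disagree.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

if __name__ == "__main__":
    tickets = [
        "Please refund my last invoice",
        "I forgot my password",
        "Hello there",
    ]
    for ticket in tickets:
        print(ticket, "->", weak_label(ticket))
```

The same handful of functions can be run over any number of examples, which is where the time savings Ratner describes would come from. Examples on which the heuristics abstain or conflict are the natural candidates for expert review.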
Competition Looms for Snorkel
Snorkel’s trajectory was tested with the arrival of ChatGPT and a wave of general-purpose AI models in late 2022. As LLMs like GPT-4 began showing surprising capabilities out of the box, some enterprises paused their AI investments, assuming that foundation models could handle tasks without bespoke training data. But Ratner and the Snorkel team didn’t see it that way.
“These foundation models are great starting points,” said Ratner. “But you still need labeled data tailored to the specific domain and task to get utility out of them. You get what you train on.”
Still, Snorkel faces competition in the data labeling and development space from companies like Labelbox, Scale, and Datasaur, whose comprehensive offerings challenge its positioning. Labelbox, for instance, provides an end-to-end platform that combines software with a network of over 10,000 expert labelers, enabling fast delivery and human-in-the-loop quality control across video, image, and chat.
Where Snorkel emphasizes automation through programmatic labeling, Labelbox differentiates itself with high-quality, customized human-generated data. Others, like Scale and Appen, are leaning heavily into RLHF (reinforcement learning from human feedback), fine-tuning, and prompt optimization pipelines: areas Snorkel is only beginning to address.
Prioritizing Scalable Practices vs. Speed
Ratner believes Snorkel’s biggest advantage lies in its ability to inject expert knowledge into the data pipeline at scale, making it especially suited to sectors like finance, healthcare, telecom, and government, where accurate and auditable AI decisions are non-negotiable. As models become increasingly commoditized, Snorkel sees value shifting to the quality of the data that powers them.
To further its mission, Snorkel just launched two new products: Snorkel Evaluate and Snorkel Expert Data-as-a-Service. These products are part of Snorkel’s effort to move beyond labeling, support the full data-centric AI lifecycle, and keep pace with competitors. “Labeling is only one part of the loop,” Ratner said. “Our platform enables sampling, slicing, augmenting, training with AutoML, and iterating with model feedback. You have to close the loop to guide data-centric development.”
Looking ahead, Snorkel plans to use its new funding to expand engineering and research capacity, enhance the Snorkel Flow platform, and support enterprises in deploying AI systems. “We don’t want to slow down during one of the most historic opportunities for growth in AI,” said Ratner. “But we also believe that building good, scalable practices and culture is just as important. That’s how we stay grounded, even while pushing forward.”