By all accounts, Databricks has a data problem and so does everyone else.
“Everybody has some data, and has an idea of what they want to do,” said Jonathan Frankle, chief AI scientist at Databricks’ Mosaic AI. “Nobody shows up with nice, clean fine-tuning data that you can stick into a prompt or an [application programming interface].” That gap between ambition and usable input has become the defining bottleneck in enterprise AI.
Dirty data, not model size or GPU availability, is what’s holding back most organizations from realizing the promise of generative AI. And Databricks, a company that’s built its reputation on infrastructure for training large models, now finds itself facing the same challenge its customers do: the data is too messy.
To close that gap, Databricks is advancing a new machine learning technique and acquiring companies that attack the issue from a different angle. Its latest training method, Test-time Adaptive Optimization (TAO), circumvents the need for hand-labeled datasets entirely. Instead of requiring curated fine-tuning data, TAO identifies the best outputs a model can generate and trains the model to prefer those, an approach known as “best-of-N.” The selection is governed by DBRM, the Databricks Reward Model, which learns from human preferences to decide which responses are worth keeping.
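Databricks hasn't published DBRM's internals, but the best-of-N loop described above can be sketched generically. Everything below is illustrative: `toy_model` and `toy_reward` are stand-ins for a real LLM and a learned reward model, and the function names are invented for this sketch.

```python
import random

def generate_candidates(model, prompt, n=8):
    # Sample n candidate responses; a real system would use temperature sampling.
    return [model(prompt) for _ in range(n)]

def best_of_n(model, reward_model, prompt, n=8):
    # Score every candidate with the reward model and keep the top scorer.
    candidates = generate_candidates(model, prompt, n)
    return max(candidates, key=lambda r: reward_model(prompt, r))

def build_preference_data(model, reward_model, prompts, n=8):
    # Pair each prompt with its best-scoring response; fine-tuning the base
    # model on these pairs teaches it to prefer its own best outputs.
    return [(p, best_of_n(model, reward_model, p, n)) for p in prompts]

# Toy stand-ins: a "model" that emits random numbered drafts, and a
# "reward model" that scores a draft by its trailing number.
toy_model = lambda prompt: f"{prompt}: draft {random.randint(0, 9)}"
toy_reward = lambda prompt, resp: int(resp.split()[-1])

data = build_preference_data(toy_model, toy_reward, ["q1", "q2"], n=16)
```

The key property is that no hand-labeled answers appear anywhere: the training pairs are synthesized from the model's own outputs, with the reward model doing the curation that human annotators would otherwise do.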
Frankle calls it “relatively lightweight reinforcement learning,” borrowing a strategy more often associated with OpenAI or DeepMind. But unlike those firms, Databricks is betting that such training techniques can make a difference only if paired with aggressive work on the data itself. “Without well-labeled, carefully curated data, it is challenging to fine-tune an LLM to do specific tasks,” he said. “This is exactly what every enterprise is trying to do.”
TAO is already delivering results. On FinanceBench, a benchmark for financial analysis, Meta’s Llama 3.1 model scores 68.4%. GPT-4o and OpenAI’s o3-mini hit around 82.1%. But with TAO, Databricks pushed the same Llama model to 82.8%, edging past both proprietary models.
Still, reinforcement learning is no silver bullet. “It can behave in unpredictable ways,” said Christopher Amato, a Northeastern University professor specializing in the technique. But the upside, he added, is real: more scalable data labeling and better performance over time as models and their reward functions mature.
Yet even clever methods like TAO don’t fix what Frankle keeps emphasizing: the input data is often stale, inconsistent, or hard to transform into features a model can learn from. That’s why, last week, Databricks quietly acquired Fennel AI, a startup focused on real-time feature engineering founded by Nikhil Garg, Abhay Bothra, and Aditya Nambiar. No price tag was disclosed, but after raising $15 billion in January, Databricks isn’t exactly short on cash.
Founded in 2023 by ex-Meta and Google Brain engineers, Fennel offers a Python-native platform that lets data scientists define features directly, with no data engineering team or custom pipelines required. Its core innovation is computing only what’s changed since the last run, avoiding redundant processing. For real-time use cases like fraud detection or personalized content ranking, that optimization can be the difference between laggy and live.
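Fennel’s actual API isn’t shown in the announcement, so the following is only a generic sketch of the incremental idea: an aggregate feature that folds in new events rather than rescanning the full history on every run. The class and event shape are invented for illustration.

```python
from collections import defaultdict

class IncrementalFeature:
    """Toy incremental aggregate: a per-user event count that is updated
    from new events only, instead of being recomputed over all history."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.last_processed = 0  # index of the last event already folded in

    def update(self, events):
        # Fold in only the events that arrived since the previous run.
        new = events[self.last_processed:]
        for e in new:
            self.counts[e["user"]] += 1
        self.last_processed = len(events)
        return len(new)  # how much work this run actually did

feat = IncrementalFeature()
log = [{"user": "a"}, {"user": "b"}, {"user": "a"}]
feat.update(log)             # first run: processes all 3 events
log.append({"user": "a"})
feat.update(log)             # second run: processes only the 1 new event
```

For a low-latency use case like fraud scoring, this is the difference between touching one event per request and rescanning millions; a production system would persist the state and handle out-of-order arrivals, which this sketch ignores.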
Fennel’s early clients include Rippling, Cricut, and Upwork, which use it for credit scoring, trust and safety, and recommendation systems. What it offers Databricks is not just tech but a bridge between the company’s high-powered modeling tools and the messy operational data sources that most enterprises rely on.
“Machine learning models are only as good as the data they learn from,” Databricks wrote in the acquisition announcement. “That’s why feature engineering is so critical.”
That conviction is starting to define Databricks’ strategy. The company’s open-source LLM, DBRX, was built from scratch and released as a public signal of its transparency and technical confidence. But releasing models is the easy part. Getting them to work in production, with real-world, unpredictable data? That’s the actual problem—and one that feature engineering, not just model architecture, has to solve.
Databricks’ moves are increasingly focused on this less glamorous but more decisive layer of the stack: not building bigger models, but feeding existing ones better signals. TAO and Fennel represent two flanks of the same front. One improves the model’s learning process. The other ensures the model has something meaningful to learn from.
Even in the age of LLMs, feature engineering hasn’t gone away. If anything, it’s become more important to provide personalization, domain context, and real-time grounding that models otherwise miss. TAO might help a weak model find its footing, but it still needs to stand on something solid.