Ensuring the success of AI implementations involves more than just deploying models; it requires continuous monitoring and management to detect and prevent unexpected behaviors. AI observability is a critical component in this process, allowing organizations to gain insight into how their AI systems operate in real time. By identifying and addressing issues proactively, AI observability helps maintain the reliability and trustworthiness of AI applications at scale.
This week, we have Venky Veeraraghavan, Chief Product Officer at DataRobot, who leads the Product Team and is instrumental in shaping and implementing the company's AI platform. With more than twenty-five years of experience in product leadership, including positions at Microsoft and Trilogy, Venky has dedicated over a decade to developing hyperscale big data and AI platforms for major global organizations. His dedication to AI observability reflects a deep commitment to enhancing the reliability and value of AI systems, fostering innovation, and building trust in AI solutions.
AIM Media House: Why are you particularly passionate about AI observability and predicting unexpected behaviors at scale? What inspired you to focus on this topic?
“The question is, how is this useful for others? What else can we do to add value for customers with AI?”
Venky Veeraraghavan: DataRobot has been building an AI platform for a long time. We started with AutoML, then moved on to MLOps. When generative AI came along, it was clear that we weren’t going to be the team building large models.
The question is, how is this useful for others? What else can we do to add value for customers with AI? It’s evident that the initial models, and even those today, are not fully reliable. They often provide unexpected answers, leading to a confidence problem among users. People are hesitant to rely on these outputs for their business workflows. The basic response was, “I don’t trust it.”
So, what can we do to support that trust? That’s where we focused our efforts. We identified two key problems, the second one being more prominent today. First, how do you select the right LLM (Large Language Model), prompting method, and database from the hundreds available? Second, once these are in production, how do you ensure they work as expected?
If they’re not working, what actions should be taken to mitigate issues? These questions are very important to customers. We’ve dedicated a significant amount of our development hours to creating robust solutions in this area.
AIM Media House: How has the role of AI observability evolved with the changing landscape of AI implementation? What significance does it hold across various platforms today?
“The emergence of generative AI has dramatically changed this landscape.”
Venky Veeraraghavan: I got into machine learning (ML) about eight years ago, starting by building the internal platform for the Office team at Microsoft. Machine learning models were deployed for the homepage, relevance, and ads, making MLOps crucial. Although we didn’t call it MLOps back then, it was all about ensuring that the models functioned effectively.
We focused on both operational aspects, such as latency and cost, and ML metrics, such as data drift. The core of observability was to make sure the model worked correctly. If it drifted, we would retrain the pipeline with new data and redeploy it. This was the challenging part of the observability problem.
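For readers who want a concrete picture, the drift check Venky describes can be sketched in a few lines: compare a feature's training distribution against recent production values and flag a significant shift. The synthetic data, significance threshold, and retraining hook below are illustrative assumptions, not DataRobot's implementation.

```python
# Minimal drift check: compare a feature's training distribution with
# recent production values using a two-sample Kolmogorov-Smirnov test.
# The alpha threshold and retraining trigger are illustrative choices.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values: np.ndarray,
                    prod_values: np.ndarray,
                    alpha: float = 0.05) -> bool:
    """Return True if the production distribution differs significantly."""
    statistic, p_value = ks_2samp(train_values, prod_values)
    return p_value < alpha

# Synthetic data standing in for a single model feature.
rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod = rng.normal(loc=0.4, scale=1.0, size=5_000)  # distribution has shifted

if feature_drifted(train, prod):
    print("Drift detected: trigger the retraining pipeline")
```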
However, the emergence of generative AI has dramatically changed this landscape. In traditional ML, you might look at any one prediction or a set of predictions over time to compare distributions and decide if action was needed. This process was somewhat offline.
With generative AI, you’re interacting with a chatbot or solution, and each input could produce a bad answer. Now, you need to understand every prediction and every generation in real-time. You must recognize when something is wrong and mitigate it immediately, as a human is on the other side of the conversation.
This shift has transformed observability from an esoteric concept understood mainly by data scientists to a critical, real-time issue. No one wants to provide bad answers or data to users or face embarrassment due to poor model performance. As a result, interest in observability has surged, highlighting its importance like never before.
AIM Media House: Can you shed some light on the concept of custom metrics in AI observability? What do custom metrics mean, and what significance do they hold in today’s landscape of thorough supervision across various models?
“When we talk about driving custom metrics, it’s important to recognize that AI doesn’t exist in isolation.”
Venky Veeraraghavan: I love the idea of custom metrics. We have a very strong product in that area. When we talk about driving custom metrics, it’s important to recognize that AI doesn’t exist in isolation. Traditional metrics, like operational latency, count, cost, and utilization, are fixed and straightforward. These are system metrics we are familiar with. However, ML is part of a broader workflow, integrated within larger business processes.
Customers want to measure the business value of these predictions. For example, they might develop metrics that represent the value of a prediction. Over time, they can track how these predictions impact their operations. Imagine a use case where fines are incurred if certain thresholds are not met. You could count the number of times predictions fall below this threshold and estimate the potential fines. This gives a clear value metric that is highly specific to the business and its use case.
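As a rough illustration of such a value metric, the fines example might look like the sketch below; the threshold and per-violation fine are hypothetical placeholders, not figures from DataRobot.

```python
# Custom business-value metric: count predictions that fall below a
# contractual threshold and estimate the fines they would incur.
# The threshold and per-violation fine are hypothetical placeholders.
def estimated_fines(predictions: list[float],
                    threshold: float = 0.8,
                    fine_per_violation: float = 250.0) -> dict:
    violations = sum(1 for p in predictions if p < threshold)
    return {
        "violations": violations,
        "estimated_fines": violations * fine_per_violation,
    }

print(estimated_fines([0.95, 0.72, 0.88, 0.61]))
# {'violations': 2, 'estimated_fines': 500.0}
```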
Another example of custom metrics involves guard models for generative AI. Customers might need to ensure that outputs are on-topic, adhere to specific standards, or avoid toxicity. These guard models can be based on out-of-box metrics or custom-built metrics tailored to their needs.
A flexible framework for measuring, adding, subtracting, and modifying these metrics is essential for observability. This ability to customize metrics according to specific business requirements has become a critical aspect of ensuring reliable AI performance.
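A minimal sketch of what such a flexible framework could look like is a registry where metrics can be added, removed, and evaluated uniformly over each output. All names here are hypothetical and do not reflect DataRobot's actual API.

```python
# Sketch of a pluggable metric registry: metrics can be added, removed,
# and evaluated uniformly over each generation. All names are hypothetical.
from typing import Callable, Dict

MetricFn = Callable[[str], float]

class MetricRegistry:
    def __init__(self) -> None:
        self._metrics: Dict[str, MetricFn] = {}

    def add(self, name: str, fn: MetricFn) -> None:
        self._metrics[name] = fn

    def remove(self, name: str) -> None:
        self._metrics.pop(name, None)

    def evaluate(self, output: str) -> Dict[str, float]:
        return {name: fn(output) for name, fn in self._metrics.items()}

registry = MetricRegistry()
registry.add("length", lambda text: float(len(text)))
registry.add("mentions_refund", lambda text: float("refund" in text.lower()))
print(registry.evaluate("We can process your refund within 5 days."))
```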
AIM Media House: What are your thoughts on the democratization of AI observability, especially considering the shortage of skilled professionals? How do tools like DataRobot contribute to ensuring responsible and accountable AI, maintaining accuracy levels, and enabling sanity in critical business predictions?
“To address this, our platform allows various users, whether code-first or no-code, to build solutions.”
Venky Veeraraghavan: Democratization in the context of observability is an excellent topic. The essence of democratization is bringing tools closer to business owners and subject matter experts, who may not be ML or operations experts. The challenge is balancing ease of use with ensuring proper deployment.
Our platform is designed for hybrid teams or fusion teams, which include business analysts and data analysts who understand the subject matter well but aren’t professional data scientists. These users need to build models that make sense in their world, but they run into non-functional concerns when it comes to deployment.
To address this, our platform allows various users, whether code-first or no-code, to build solutions. All solutions are centralized in a single registry. When deploying, guardrails and monitoring can be applied universally. This approach decouples model building from deployment: while end-to-end deployment would normally require expert knowledge, users can focus on building models and let the platform handle the guardrails.
For example, our platform can ensure that brand guidelines are adhered to in generative AI models, such as avoiding competitor mentions. These rules are specific to the company and can be applied automatically. A user building a customer service chatbot doesn’t need to understand these guidelines; they can focus on the chatbot, knowing that the platform will enforce the necessary rules.
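A guard of this kind can be as simple as a platform-level check applied to every model output before it reaches the user. The sketch below is illustrative only, with a made-up competitor list; it is not how DataRobot implements the feature.

```python
import re

# Hypothetical competitor list; in practice this would be company-specific
# configuration applied by the platform, not by the chatbot builder.
COMPETITORS = ["AcmeAI", "ExampleCorp"]

def passes_brand_guard(response: str) -> bool:
    """Return False if the model output mentions a known competitor."""
    pattern = r"\b(" + "|".join(map(re.escape, COMPETITORS)) + r")\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is None

draft = "You could also try AcmeAI for this."
if not passes_brand_guard(draft):
    draft = "I can help you with that within our product."  # safe fallback
print(draft)
```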
This approach empowers subject matter experts and democratized users to build real applications that can be safely deployed into production. This capability is the true superpower of observability.
AIM Media House: How can we consolidate and streamline the observability toolchain for AI systems amidst tool proliferation? With cost concerns around log data storage and analysis, what innovative approaches can make comprehensive AI observability more economically feasible for enterprises?
“I strongly believe that there is an AI lifecycle that is not infinite but fairly fixed.”
Venky Veeraraghavan: The proliferation of features is, in general, indicative of the early maturity of this space. Every startup decides to build a feature, which is interesting. However, if you try to string all of them together, you end up spending all your time doing that instead of actually addressing the core problem. I strongly believe that there is an AI lifecycle that is not infinite but fairly fixed. It starts with feature engineering and knowledge engineering, then moves to building and training models, and finally to observing and governing the models.
When doing this, you need to ensure everything works together. As a platform provider, we feel it’s important to integrate these aspects. When you identify a problem, having a feature or tool to detect it is just the beginning. The next question is, how are you going to fix it? The fix could be in real-time during serving, by changing the serving code, or it could involve developing a better model. This requires sending a signal back to where the issue originated.
There are different approaches to this. Hyperscale providers offer a series of services that you need to put together, while a SaaS model integrates everything, allowing for custom metrics and IP but with a unified orchestration framework. The last option is open source, where you build everything from scratch, giving you the most control but also requiring the most investment before you see value.
Our theory and approach focus on building a platform that handles most of the foundational work, allowing any IP you add to directly contribute value to your company rather than just base infrastructure.
Regarding logs and everything else, there are well-established techniques for log management and optimization. However, the impact of generative AI on these processes is still being determined. There are latency concerns and questions about how much logging and tracing are necessary. We are still learning how to set specific logging and optimization parameters and seeking to understand these differences better.
AIM Media House: How is the role of a data scientist evolving with advancing technology, especially as platforms are built to focus on business applications? How can we ensure data scientists fundamentally understand their models to avoid potential risks and ensure true democratization?
“With the push for greater value and ROI, AI must be integrated into applications, not just exist in notebooks or dashboards.”
Venky Veeraraghavan: That’s a great question. In the realm of predictive AI or classical machine learning, data scientists traditionally led the way, building models and working with subject matter experts. This remains true for much of predictive AI. However, with generative AI, this model has shifted. Large language model providers now handle much of the hardcore data science, offering models as open-source or API-based solutions, essentially functioning as platforms.
The key question is: what do you do with these models? Here, software developers organize the data flows, while subject matter experts like product managers and marketing managers determine the specifics, such as the voice of a chatbot or the design of personalized emails. More personas are now involved in building these solutions.
With the push for greater value and ROI, AI must be integrated into applications, not just exist in notebooks or dashboards. This integration is handled by software developers, creating larger, more collaborative teams.
Data scientists still play a crucial role in this new landscape. They develop evaluation frameworks to ensure solutions built with generative AI are correct and remain so, and they detect errors. Data scientists create and optimize metrics, assess different models, and may fine-tune smaller, open-source models for efficiency.
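As a rough illustration, an evaluation framework at its simplest is a fixed test set run through the generation pipeline with a pass-rate report. The sketch below is hypothetical; the `generate` function stands in for whatever model or pipeline is being assessed.

```python
# Minimal evaluation harness sketch: run a fixed test set through a
# generation function and report the pass rate. `generate` is a stand-in
# for whatever model or pipeline is being assessed.
from typing import Callable

def run_eval(generate: Callable[[str], str],
             cases: list[tuple[str, str]]) -> float:
    """Each case is (prompt, substring the answer must contain)."""
    passed = sum(1 for prompt, expected in cases
                 if expected.lower() in generate(prompt).lower())
    return passed / len(cases)

# Toy stand-in model for demonstration purposes.
def toy_generate(prompt: str) -> str:
    return "Our support hours are 9am to 5pm, Monday through Friday."

cases = [("When is support available?", "9am"),
         ("What days are you open?", "Monday")]
print(f"pass rate: {run_eval(toy_generate, cases):.0%}")
```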
In summary, the role of data scientists has evolved but remains vital. They are part of a broader team, focusing on ensuring the quality and efficiency of AI solutions.
AIM Media House: How do you manage the rapid pace of AI and ML development as a product officer, and do you ever feel overwhelmed by the constant need for innovation? How do you balance keeping up with new trends and ensuring practical, impactful product features?
“I love this job and spend all my time identifying what my product engineering team can do to deliver the highest value for our users.”
Venky Veeraraghavan: I’ve been a product person for over 20 years, and you’re never fully ready for change; it always happens. But I’m used to it. The pace of new developments, especially with generative AI, is incredible. It reminds me of the early days of the internet, when new concepts emerged almost daily and you had to constantly keep up. Over time, you find some best practices, but the most fun part is figuring it all out.
To the best of my ability, both physically and mentally, I’m always learning and doing practical work. The biggest compliment I can give myself is figuring out what practical product to build. My job is to understand big trends, distinguish between science fiction and practical merit, and select features that are usable at scale. I love this job and spend all my time identifying what my product engineering team can do to deliver the highest value for our users. That’s my day job, and I love it.
Do I get tired? Absolutely. Do I get overwhelmed by constantly evolving workflows? Yes, but that’s part of the challenge and excitement.