In artificial intelligence (AI), where innovation often outpaces comprehension, transparency and equity have become paramount concerns. At the forefront of this movement stands Fiddler AI, led by its Founder and CEO, Krishna Gade.
Fiddler’s commitment is to make the inner workings of AI accessible to all. In an era when AI can seem like an inscrutable black box, its tools let users interact with AI systems, probe how they reach their decisions, and build trust in them.
As the AI landscape evolves, Fiddler positions itself as a guide toward a future in which AI is demystified and embraced as a catalyst for understanding and progress.
AIM: What is the origin and significance of the name “Fiddler”?
“One day people will refer to using Fiddler to interact with their AI and build trust.”
Krishna Gade: The model is essentially becoming more and more like a complex brain on its own. You know, this interaction, where you’re trying to understand me and I’m trying to understand you, allows us to exchange questions. Similarly, we wanted to build an interface so that humans can ask questions to AI and essentially tinker with these inputs to observe how the model reacts in response to them. That’s where the name comes from. And we are excited; hopefully, one day people will refer to using Fiddler to interact with their AI and build trust.
AIM: What motivated the effort to address transparency and explainability? What approaches or methods were previously employed, and what specific problem prompted the need for a new solution?
“We aimed to establish an enterprise company capable of offering observability and explainability as a service to large enterprises.”
Krishna Gade: I’ve been immersed in AI and machine learning for about 20 years now. My first job out of college, exactly 20 years ago, was as a search quality engineer on Microsoft’s search engine (the effort that later became Bing), when Microsoft was striving to develop a Google-like search engine. One of the strategies we adopted was to build state-of-the-art machine learning algorithms to enhance search quality and provide the best possible search results. At that time, we were implementing neural networks for search ranking, primarily two-layer networks. Since then, I’ve been exposed to and have worked on these types of systems.
Over the years, I’ve observed AI and machine learning becoming increasingly complex. Even during the era of two-layer networks, understanding how the network arrived at a particular search or ranking score wasn’t straightforward. As time progressed and I moved on to companies like Twitter, Pinterest, and Facebook, these systems evolved into highly intricate entities. We began dealing with deep neural networks, sparse neural networks processing vast amounts of user activity data, including Facebook posts and metadata, to personalize news and ads. Consequently, questions arose—why am I seeing this news story? Why is this particular ad showing up? This raised concerns about transparency.
During my tenure at Facebook, I contributed to tools like “Why Am I Seeing This?”, which provided easily understandable insights into how the News Feed algorithm operated. This experience led me into the field of explainable AI. I realized that while companies like Facebook and Microsoft could address this need, not every company had the talent, skills, or tools to do so. Therefore, we aimed to establish an enterprise company capable of offering observability and explainability as a service to large enterprises.
AIM: What advancements does Fiddler offer in terms of model explainability and transparency compared to traditional methods, and how do these advancements address the limitations of existing approaches?
“Fiddler adopts the latter approach, treating the model as a black box and using game-theoretic algorithms to ask counterfactual questions.”
Krishna Gade: There are various forms of explainability. There’s explainability aimed at the developer, where the focus is on understanding how the model works. For instance, in a simple model like a decision tree, it’s clear which pathway triggered a particular decision. However, as models become more complex—such as multi-layer networks, deep neural networks, and large language models—it becomes increasingly challenging to decipher what’s happening within the model. That’s one aspect of explainability.
The other form of explainability is directed towards the user who consumes the model’s outputs. For example, in scenarios like credit underwriting in a bank or generating content in customer service, users may not understand why a certain prediction or response is being made. This lack of transparency can lead to mistrust.
So, there are essentially two types of explainability: white box explainability, where efforts are made to understand the model itself, and black box explainability, where the model is treated as a black box, and probing questions are asked to understand its workings and build trust. Fiddler adopts the latter approach, treating the model as a black box and using game-theoretic algorithms to ask counterfactual questions. For example, we might ask, “Would someone with a salary of $10,000 still be approved by the loan credit risk model?” This approach aims to address the challenge of understanding and building trust with complex models by posing these counterfactual questions.
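As a rough illustration of that style of black-box probing (a minimal sketch, not Fiddler’s implementation), the snippet below fits an opaque credit-approval model on synthetic data and asks the salary counterfactual by sweeping one input and watching the approval probability; the feature names, data, and model choice are assumptions made for the demo.

```python
# Minimal sketch of black-box counterfactual probing (not Fiddler's implementation).
# The model, features, and data are synthetic assumptions for illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy applicants: [annual_salary, debt_to_income_ratio]
X = np.column_stack([
    rng.uniform(10_000, 200_000, 2_000),   # salary
    rng.uniform(0.0, 1.0, 2_000),          # debt-to-income ratio
])
# Synthetic approval labels: higher salary and lower debt -> more likely approved
y = (X[:, 0] / 200_000 - X[:, 1] + rng.normal(0, 0.1, 2_000) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)  # stands in for any opaque model

def probe(applicant, feature, values):
    """Ask counterfactual questions: vary one input, observe how the model reacts."""
    for v in values:
        x = applicant.copy()
        x[feature] = v
        p = model.predict_proba(x.reshape(1, -1))[0, 1]
        print(f"feature[{feature}] = {v:>10,.0f} -> approval probability {p:.2f}")

applicant = np.array([60_000.0, 0.4])
# "Would someone with a salary of $10,000 still be approved?"
probe(applicant, feature=0, values=[10_000, 60_000, 150_000])
```

Shapley-value methods generalize this idea by averaging the effect of many such perturbations across coalitions of features, which is the game-theoretic machinery Gade refers to.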
AIM: What is your opinion on the perennial trade-off between model explainability and accuracy in machine learning?
“Complex AI systems have shown superior accuracy, especially in domains requiring classification of unstructured data such as image classification, language understanding, and sentiment analysis.”
Krishna Gade: I think it is somewhat true. Over the years, we’ve observed a significant shift towards allowing models to learn from data and encode patterns, particularly with deep neural networks. This approach has proven highly successful compared to symbolic AI, where humans encode rules to build interpretable AI systems.
Complex AI systems have shown superior accuracy, especially in domains requiring classification of unstructured data such as image classification, language understanding, and sentiment analysis. The emergence of large language models for content generation, although mostly opaque, has demonstrated remarkable performance.
In certain domains, particularly those dealing with structured data, there remains an opportunity to build simpler models that perform comparably well. For example, predicting hospital readmission rates or credit risk scores can often be achieved effectively with simpler models. Additionally, generalized additive models (GAMs) have found success in these areas.
Overall, while complex models excel in handling unstructured data tasks, simpler models still hold effectiveness in specific domains. This nuanced understanding underscores the evolving landscape of AI and its diverse applications.
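For the structured-data case, here is a minimal GAM-style sketch (an illustration of the idea, not any particular production model): each feature gets its own smooth spline basis, and a linear classifier combines them, so every feature’s contribution stays additive and inspectable. The data below is synthetic.

```python
# Minimal GAM-style sketch for structured data: per-feature spline bases combined
# by a linear (logistic) model keep each feature's contribution additive and
# inspectable. The data here is synthetic and purely illustrative.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 3))                 # e.g. age, prior visits, lab score
# Non-linear but additive ground truth: each feature contributes on its own
logit = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2 - X[:, 2]
y = (logit + rng.normal(0, 0.5, 2_000) > 0).astype(int)

gam_like = make_pipeline(
    SplineTransformer(n_knots=8, degree=3),     # smooth basis expansion per feature
    LogisticRegression(max_iter=2_000),
)
gam_like.fit(X, y)
print("training accuracy:", round(gam_like.score(X, y), 3))
```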
AIM: What is the significance of the partnership between Fiddler and In-Q-Tel, particularly given In-Q-Tel’s involvement with various government entities, and how does it align with Fiddler’s objectives?
“Through In-Q-Tel and other government agencies, we engage with defense and intelligence organizations to enhance their understanding of model operations.”
Krishna Gade: Yes, so In-Q-Tel, being an investor in Fiddler, serves as a partner, aiding us in collaborating with government agencies to develop tools for explainability and model monitoring. Through In-Q-Tel and other government agencies, we engage with defense and intelligence organizations to enhance their understanding of model operations.
One notable use case involves partnering with a large defense organization, where computer vision models are deployed to detect objects underwater, such as drones or sea vessels. However, when misclassifications occur, it’s crucial to understand why. For instance, in a showcased scenario, a model misclassified an underwater object as a ship instead of an airplane. Fiddler’s analysis revealed that the misclassification was influenced by pixels representing shadows in the image, resembling ship features. This highlighted the need for model retraining to address such misclassifications and improve accuracy.
Many people misunderstand explainability, viewing it as a binary choice. However, in the development of large deep learning models, explainability plays a vital role in detecting issues like data leakage. In a healthcare application, computer vision models analyzing X-rays for cancer detection exhibited high accuracy. However, upon closer examination, it was discovered that the model was picking up signals from radiologists’ markings on the X-rays, leading to data leakage. Explainability helped identify this issue, enabling the development of a more robust model.
These examples underscore the importance of explainability in machine learning, as it facilitates the detection of issues and the enhancement of model performance.
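As a small illustration of how an attribution check can surface leakage of the kind described above (a hedged sketch on synthetic data, not the actual healthcare model), the snippet below adds a column that nearly copies the label and shows it dominating a black-box model’s permutation importances.

```python
# Illustrative sketch of using a black-box attribution check to catch data
# leakage, in the spirit of the X-ray example above. The "radiologist_mark"
# column is a synthetic stand-in for a feature that leaks the label.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 5_000
legit_signal = rng.normal(size=(n, 5))                    # genuine image-derived features
y = (legit_signal[:, 0] + 0.5 * legit_signal[:, 1] + rng.normal(0, 1, n) > 0).astype(int)
radiologist_mark = y + rng.binomial(1, 0.02, n) * (1 - 2 * y)  # label flipped 2% of the time
X = np.column_stack([legit_signal, radiologist_mark])

model = GradientBoostingClassifier().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
for i, imp in enumerate(result.importances_mean):
    name = "radiologist_mark" if i == 5 else f"feature_{i}"
    print(f"{name:>16}: {imp:.3f}")
# The leaked column dominates the importances, a red flag that the model is
# "cheating" rather than learning the underlying pathology.
```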
AIM: How has your focus shifted in the field of explainability since the emergence of generative AI, and how has ChatGPT influenced your perspective and approach in this evolving landscape?
“Large language models pose various challenges, such as hallucinations, leaking private data, or generating harmful content.”
Krishna Gade: We were working with language models even before, right? For example, we developed explainability for BERT models in the past, which were early versions of these language models. The rise of large language models has democratized access to AI across industries. Previously, building AI required significant infrastructure and hiring PhDs. Now, with pre-trained models available, companies with limited AI talent can begin their AI journey. Software engineers can learn through resources like Coursera and start working on large language models for applications like chatbots or search engines.
In our case, we’ve expanded beyond explainability to what we call observability, which encompasses monitoring models as a whole. Large language models pose various challenges, such as hallucinations, leaking private data, or generating harmful content. It’s not just about identifying inaccuracies or data leakage; it’s about understanding the entire model’s behavior. Observability allows us to detect problematic scenarios, such as why the model hallucinates or where harmful content originates from. By addressing these issues, we contribute to AI safety and enable companies to deploy trustworthy generative AI applications, which is significant for society.
AIM: What features and functionalities does the platform ‘Giga Scale Model Performance Management’ offer, and how does it uniquely contribute to improving transparency and enabling efficient AI transactions?
“When you have so many interactions, you need a scalable system that can capture all of this exhaust, which is basically the inputs and outputs of the models, and process this data to look for abnormalities.”
Krishna Gade: The thing around model performance management or observability of models is like being able to collect the exhaust of the models coming through. These days, people are running models at scale. Let’s say if you have one of our banking customers with a billion interactions on their chatbot every day, and this is not even a large language model, they’re still using the old deterministic decision tree-based model.
But there are basically applications being rolled out, whether that’s recommendations applications or fraud detection applications, where scale is really high. You’re sending a whole, if you’re a fintech company, you’re a credit card company, you have many transactions going through, and you’re scoring each transaction if it is fraudulent or not. Or you may have an e-commerce company where you’re serving product recommendations, item recommendations for every user who logs in. So when you have so many interactions, you need a scalable system that can capture all of these exhaust, which is basically the inputs and the outputs of the models, and process this data and look for abnormalities.
Look for things like, are there any data quality errors creeping through? Are you seeing any inputs drifting over time? If you’re seeing any distribution shifts happening in the model outputs and then when you see these abnormalities, you can then help the AI teams catch them early and fix them in their production pipelines. Oftentimes, a lot of model performance issues happen because of data errors or data quality issues.
So what Fiddler does, it basically collects all of this data and collects the prompts, the responses, the embedding inputs, and helps you alert when things are going wrong and when unexpected things are happening and for that we had to build a scalable platform that can process millions and billions of these interactions that models are seeing and provide these visual insights for our customers.
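For a sense of what such a check looks like, here is a simplified sketch (not Fiddler’s actual pipeline) that compares a production feature’s distribution against a training-time baseline using the population stability index (PSI) and raises an alert past a common rule-of-thumb threshold; the data, feature, and threshold are illustrative assumptions.

```python
# Simplified sketch of a drift check (not Fiddler's actual pipeline): compare a
# feature's production distribution against a training-time baseline using the
# population stability index (PSI) and flag it when it exceeds a threshold.
import numpy as np

def psi(baseline, production, bins=10, eps=1e-6):
    """Population stability index between two 1-D samples."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    # Clip so production values outside the baseline range fall into the end bins
    production = np.clip(production, edges[0], edges[-1])
    b = np.histogram(baseline, edges)[0] / len(baseline) + eps
    p = np.histogram(production, edges)[0] / len(production) + eps
    return float(np.sum((p - b) * np.log(p / b)))

rng = np.random.default_rng(0)
baseline = rng.normal(50_000, 10_000, 100_000)    # e.g. transaction amounts at training time
production = rng.normal(58_000, 10_000, 100_000)  # today's traffic, shifted upward

score = psi(baseline, production)
print(f"PSI = {score:.3f}")
if score > 0.2:                                    # common rule-of-thumb threshold
    print("ALERT: input drift detected, notify the ML team")
```

In a real deployment a comparison like this would run continuously over every model input and output, with prompts and embeddings handled by analogous distance metrics.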
AIM: How should organizations strategically adopt emerging technologies like generative AI while ensuring ethical and responsible scaling with transparency and explainability?
“You want to provide support to your customers in a natural, interactive, human-understandable manner. You also want to support your service agents to perform better.”
Krishna Gade: I believe AI and ML are here to stay, right? Especially with generative AI, there’s a huge opportunity for every industry to leverage it and optimize their business workflows. We’re hearing from travel companies and financial services companies that they’re unlocking tremendous opportunities. Recently, the CEO of the fintech bank Klarna blogged about deploying an AI system that handles two-thirds of their customer service interactions today; they project about $40 million of profit this year through generative AI. Enormous opportunities await. How do we do this responsibly? It starts with finding the right use case. The best-studied use cases for generative AI are customer support, marketing, and code generation, with customer support and marketing in particular offering a wealth of custom data.
You want to provide support to your customers in a natural, interactive, human-understandable manner. You also want to support your service agents to perform better. In marketing, personalized campaigns are key: how do you run personalized advertising campaigns based on who’s looking at your website and who’s engaging with your content? There’s ample opportunity to exploit. To get started, you need a basic set of generative AI infrastructure tools. A vector database to store custom data in vectorized form is essential. Large language models, whether third-party or open source, produce human-understandable, summarized content. An orchestration system, such as LangChain, federates requests to these models and databases. An observability system like Fiddler monitors the stack – the application layer, modeling layer, and data layer – ensuring models perform correctly.
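As a rough sketch of how these pieces fit together (illustrative only), the toy retrieval-augmented flow below uses an in-memory index in place of a vector database, placeholder `fake_embed` and `fake_llm` functions in place of a real embedding model and LLM, and a logging hook in place of an observability layer such as Fiddler; in practice an orchestration framework like LangChain would replace the hand-rolled glue.

```python
# Toy sketch of the retrieval-augmented flow described above. An in-memory
# store stands in for the vector database, `fake_embed` and `fake_llm` are
# placeholders for a real embedding model and LLM, and the logging step stands
# in for an observability system such as Fiddler.
import numpy as np

def fake_embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash characters into a small fixed-size vector."""
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(i + ord(ch)) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

documents = [
    "Refunds are processed within 5 business days.",
    "Premium support is available 24/7 via chat.",
    "Shipping is free on orders above $50.",
]
index = np.stack([fake_embed(d) for d in documents])   # the "vector database"

def fake_llm(prompt: str) -> str:
    """Placeholder for a third-party or open-source LLM call."""
    return f"(model answer grounded in: {prompt!r})"

def answer(question: str) -> str:
    q = fake_embed(question)
    best = documents[int(np.argmax(index @ q))]         # nearest-neighbour retrieval
    prompt = f"Context: {best}\nQuestion: {question}"
    response = fake_llm(prompt)
    # Observability hook: log inputs and outputs so drift, toxicity, or
    # hallucination checks can run on them later.
    print({"question": question, "context": best, "response_len": len(response)})
    return response

print(answer("How long do refunds take?"))
```

The point of the logging hook is that every prompt, retrieved context, and response is captured, which is the raw material an observability system needs in order to check for drift, hallucination, or harmful content.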
We’re calling it the MOOD stack – modeling, observability, orchestration, and data. It’s akin to the LAMP stack of Web 1.0, when people built internet applications 20 years ago. You build this toolchain to develop generative AI apps. Moreover, when it comes to responsible and trustworthy AI, a governance structure is crucial. Some customers implement an AI council comprising AI leaders and other stakeholders. Legal, compliance, and product teams ensure a diverse perspective in evaluating models. An audit log is maintained, tracking metadata, training data, and model conclusions. This allows for control over the workflow, insight into model performance, and quick response to issues like bias or toxicity.
Focus on picking the right problems, setting up the infrastructure, assembling the right team, and implementing processes to ensure responsible and trustworthy AI. With these pillars in place, you’re ready to build responsible generative AI apps.