As machine learning models grow more complex, accurate predictions alone are no longer enough: practitioners, regulators, and end users increasingly need to understand and trust how those predictions are made. This need has pushed explainability and interpretability to the center of the conversation about AI. The reliability of an AI system rests not only on its predictive power but also on our ability to see into its inner workings, and understanding how an algorithm reaches its conclusions fosters trust, accountability, and ethical deployment across applications. The challenge is to strike a workable balance between cutting-edge accuracy and the transparency that real-world use demands.
We had a roundtable discussion on the topic with a set of experienced and distinguished leaders in the industry. The session was moderated by Kashyap Raibagi, Associate Director – Growth at AIM, along with panelists Satheesh Ramachandran, Head of AI and Analytics Product at Charles Schwab; Farhat Habib, Associate Vice President at Mphasis; Avijit Chaterjee, Head of AI/ML and NextGen Analytics at Memorial Sloan Kettering Cancer Center; Arjun Srinivasan, Director – Data Science at Wesco; Rishi Bhatia, Director – Data Science at Walmart Global Tech; and Deepak Jose, Global Head and Senior Director for One Demand Data and Analytics Solutions at Mars.
Exploring the Trade-off Between Explainability and Accuracy in AI
This trade-off between explainability and accuracy is not a new conversation. Step back 20-30 years to the classic credit scoring model: you needed to explain why somebody got rejected for a loan application. Another example comes from having worked in fraud for a long time. In fraud, when you flag something as a lead for an investigator to pursue, explainability is important for two reasons: a) the investigator wants to know why the model gave it a high score – it is very important to have buy-in and trust from the end users of the model, because you are competing against the investigator’s prebuilt bias – and b) regulatory reasons – when rejecting somebody’s credit card transaction, we need to show internal auditors that the decisions the model is making are not biased or discriminatory.
So, explainability has been there for a long time; it’s just evolving. Before the advent of today’s de facto explanatory algorithms such as SHAP and LIME, there were the rock-solid, old-school predictor contribution metrics from logistic regression scorecards that were the mainstay of financial, insurance, and other regulated industries for a long time.
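The predictor contribution metrics mentioned above have a simple core: in a logistic regression scorecard, each feature’s contribution to one applicant’s score is just its coefficient times its value, and the largest contribution becomes the “reason code” reported to the applicant or auditor. The sketch below illustrates this with invented coefficients and feature names (not from any real scorecard):

```python
import math

# Hypothetical fitted scorecard: each predictor has a learned coefficient.
# A predictor's contribution to one applicant's score is coefficient * value.
COEFFICIENTS = {"utilization": 2.1, "late_payments": 1.4, "tenure_years": -0.6}
INTERCEPT = -1.0

def score_with_contributions(applicant):
    """Return (probability, per-feature contributions) for one applicant."""
    contributions = {f: COEFFICIENTS[f] * v for f, v in applicant.items()}
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))  # logistic (sigmoid) link
    return probability, contributions

prob, contrib = score_with_contributions(
    {"utilization": 0.9, "late_payments": 2.0, "tenure_years": 3.0}
)
# The largest positive contribution names the main driver of a high score --
# the "reason" given to end users and regulators.
top_reason = max(contrib, key=contrib.get)
```

Because the model is additive in its inputs, this decomposition is exact rather than approximated, which is why such scorecards remained the mainstay of regulated industries.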
From a historical perspective, when AI exploded, explainability initially took somewhat of a back seat to accuracy for many new-generation applications, particularly in computer vision, largely funded by the Googles and the Facebooks – nobody cared much why I was predicting an image to be a dog versus a cat as long as I was predicting it correctly. Things have come back around now, particularly in the Gen AI world we live in. If a model says a text connotes a positive sentiment, people do want to know why; if I’m categorizing a text a certain way, people want to know why. The explainability part of the conversation has gained increasing emphasis given the quest for responsible and transparent application of generative AI. GenAI models are getting increasingly accurate but, at the same time, more complex and “black-boxy”. Still, there is significant research trying to keep up by making model outputs interpretable (attribution-based methods, attention-based methods, example-based methods, etc.), and even leveraging the explanations to improve the models.
– Satheesh Ramachandran, Head of AI and Analytics Product at Charles Schwab
A Dual Perspective from Service Provider and Industry Expertise
Explainability is important because people like to understand what is going on behind the scenes when a particular decision is being made or a product is being built. Another thing with AI: in contrast to AI we have natural intelligence, and we barely understand natural intelligence, yet we are comfortable using somebody’s intelligence to arrive at a decision. So what’s the difference between a doctor who can say, “I don’t feel right about this decision,” and an AI that gives you a decision and does not explain it? The one significant difference, I feel, is that when a person is behind a decision, their reputation is on the line – their education, everything that made them who they are up to that point – and if they make a wrong decision, they have to stand by it. If somebody consistently makes wrong decisions, people stop going to them, and they lose their livelihood or their reputation. An AI model has no such fear in any way whatsoever. An AI model can consistently get things wrong without feeling shame; that is simply not part of what AI models are.
How do you convince people about bringing AI into their organization, using more AI in the products, and so on? I felt it was not that difficult. There are a couple of hurdles that you have to cross. The first one is showing them the value that AI can bring in terms of output. Can you do more with the same number of people? Can you do better with the same quality of people?
The second concern was twofold, and sometimes the harder one: will people be laid off? Whether [the leader’s] team’s importance will go down or up is a significant concern. What I made clear to the team was that it is doubtful anyone would need to be asked to leave, even given the vast increase in the amount of content that AI could produce, for two reasons. One is that demand itself is going up and the team is already super overworked. We were having people leave because they were unhappy with the amount of work coming onto them, so you could expect to retain your whole team, produce a significantly greater amount of content, and have a team that is more critical and happier. The second part is that AI is not 100% replacing a person, because we still keep a human in the loop. After a piece of content is produced, the person who would have produced it reviews it and passes it or not, because even if the AI does pretty well for most of its applications, it will occasionally produce a zinger you did not expect. For many reasons, any public-facing content needs to be moderated by a human somehow. It could be for something as simple as sending an Apple product to a Samsung user – that was a breach of the contract we had with Samsung at the time. It could be much worse: you could send out something racist or communal, and you cannot wholly rely on AI to make sure that no such transgression happens.
– Farhat Habib, Associate Vice President at Mphasis
Discrepancy in Rigorous Research for Building Explainability in AI
In healthcare, machine learning techniques that are more explainable, such as logistic regression, are more widely used. Even though the use of deep learning in this field has come a long way, it is still perceived as a black box: it is difficult to understand how the inputs are converted into features and combined into a prediction using weights and biases. For studying cancer diagnosis and prognosis, we use a lot of multimodal models involving medical images, for they are at the crux of helping providers understand disease progression. We have had good success with techniques such as gradient blending, where we optimize across different neural networks for the clinical numeric data, the text from radiology/pathology/clindoc reports, and the various imaging modalities (CT, MRI, etc.) to generate the final prediction. We have noticed that the gradient blending method significantly outperforms what you can predict with each of these data modalities individually. For explainability of the neural networks, especially on imaging data, we use Grad-CAM, which highlights the hotspots in the image that drive the classification decision – that is, it highlights the regions of interest in the image that explain the disease. Traditionally, the old-school thinking was that you could never trust AI for clinical decision-making, for it could do more harm than good. But that is rapidly changing, as you see more and more FDA-approved AI-powered algorithms in radiology for disease diagnosis, and the use of biomarkers studied via digital pathology images to understand disease diagnosis and provide the golden proof of efficacy for the approved precision therapy.
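The Grad-CAM technique mentioned above reduces to a simple weighting step: each convolutional channel gets a weight equal to the global average of the class score’s gradient over that channel, and the heatmap is the ReLU of the weighted sum of the activation maps. A framework-free sketch of that step, using tiny invented 2×2 maps in place of tensors pulled from a real network:

```python
# Minimal sketch of the Grad-CAM weighting step. In practice, `activations`
# and `gradients` come from a deep-learning framework's backward pass; the
# toy 2x2 channel maps below are purely illustrative.

def grad_cam(activations, gradients):
    """activations, gradients: lists of equally sized 2-D channel maps."""
    heatmap = [[0.0] * len(activations[0][0]) for _ in activations[0]]
    for channel, grad in zip(activations, gradients):
        cells = [g for row in grad for g in row]
        alpha = sum(cells) / len(cells)  # global-average-pooled gradient
        for i, row in enumerate(channel):
            for j, value in enumerate(row):
                heatmap[i][j] += alpha * value
    # ReLU: keep only regions that positively support the predicted class.
    return [[max(0.0, v) for v in row] for row in heatmap]

# Two toy channels: the second has much stronger gradients, so its "hot"
# corner dominates the resulting region of interest.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[0.1, 0.1], [0.1, 0.1]], [[1.0, 1.0], [1.0, 1.0]]]
heat = grad_cam(acts, grads)
```

Upsampled and overlaid on the input image, this heatmap is what lets a clinician see which region of a CT or MRI slice drove the model’s call.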
– Avijit Chaterjee, Head of AI/ML and NextGen Analytics at Memorial Sloan Kettering Cancer Center
Balancing Data Insights and Business Expertise in Retail AI Implementation
It’s always a trade-off because when you present findings to stakeholders with extensive experience, individuals who have been in the industry for 25 years and possess in-depth knowledge of business processes, there can be a tendency for initial disagreement. They may believe they have a deeper understanding than the data suggests, posing a challenge. In such situations, accuracy takes on heightened importance.
In certain scenarios, explainability also becomes crucial. For instance, when determining the pricing for a particular item, it’s essential to provide a clear rationale for the chosen price. Stakeholders need to comprehend the reasoning behind the pricing strategy. To address this, we’ve integrated a trust-building element into our approach. When we develop any model, we automatically generate an explainability report for stakeholders. This report allows them to delve into the model, offering insights into, for example, why a specific price is recommended. It could be due to factors such as the item being low in stock and having a limited sales history. Our aim is to provide as much transparency as possible, ensuring stakeholders can make informed decisions based on a clear understanding of the data-driven insights.
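An auto-generated explainability report of the kind described above can be as simple as translating the model’s top-weighted factors into plain-language reasons attached to each recommendation. The sketch below is hypothetical – the factor names, templates, and weights are invented for illustration, not Walmart’s actual system:

```python
# Hypothetical report generator: each model factor maps to a plain-language
# template, filled in with the item's feature values.
FACTOR_TEMPLATES = {
    "low_stock": "the item is low in stock ({stock_units} units left)",
    "short_history": "the item has a limited sales history ({history_weeks} weeks)",
    "high_demand": "demand is trending up ({demand_change:+.0%} week over week)",
}

def build_report(item, price, factors, features):
    """factors: (name, weight) pairs; features: values the templates cite."""
    lines = [f"Recommended price for {item}: ${price:.2f}", "Because:"]
    for name, weight in sorted(factors, key=lambda f: -f[1]):  # strongest first
        reason = FACTOR_TEMPLATES[name].format(**features)
        lines.append(f"  - {reason} (weight {weight:.2f})")
    return "\n".join(lines)

report = build_report(
    "SKU-1234", 19.99,
    [("short_history", 0.3), ("low_stock", 0.6)],
    {"stock_units": 12, "history_weeks": 4, "demand_change": 0.08},
)
```

Attaching a report like this to every recommendation is the trust-building element: the stakeholder sees not just the price, but the ranked reasons behind it.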
– Rishi Bhatia, Director – Data Science at Walmart Global Tech
Future of Explainability in Emerging Data-Driven Technologies
With the advent of Gen AI, we are going from model explainability to data explainability, as many of us are using pre-trained models – the GPTs (closed source) and Llamas (open source) of the world. There are entities like Hugging Face and others embracing open-source models and providing this kind of explainability, but there is still a lack of transparency regarding the data used to train these models. Compare this to supervised training and labeling with your own data: there you have a greater level of control, and explaining what went into creating a model and understanding cause and effect becomes easier.
However, as we move towards Gen AI, there are concerns about the ability to explain to stakeholders which data contributed to the decision-making process. This is where insights from domain knowledge and experts (through prompts or reinforcement learning) become crucial. When fine-tuning or adding context to a pre-trained model (using a RAG-like architecture), it is important to consider how the data and domain-specific knowledge are being incorporated, which in turn helps explain the model’s outcome.
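One reason a RAG-like architecture aids data explainability is that the retrieved passages are assembled into the prompt with their source labels, so an answer can be traced back to the documents that informed it. A minimal sketch under assumed inputs – the toy keyword retriever and two-document store below stand in for a real vector index and corpus:

```python
# Toy document store; a real system would use embeddings and a vector index.
DOCUMENTS = {
    "pricing-policy.md": "Clearance items are discounted 30% after 8 weeks.",
    "returns-faq.md": "Items may be returned within 90 days with a receipt.",
}

def retrieve(query, k=1):
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda kv: -len(words & set(kv[1].lower().split())),
    )
    return scored[:k]

def build_prompt(query):
    # Labeling each passage with its source is what makes the eventual
    # answer attributable to specific data.
    sources = retrieve(query)
    context = "\n".join(f"[source: {name}] {text}" for name, text in sources)
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"

prompt = build_prompt("How are clearance items discounted?")
```

Because the context carries source labels, a stakeholder asking “why did the model say that?” can be pointed at the exact documents that were in the prompt.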
In summary, GenAI and the current flavors of advanced deep learning algorithms are putting more emphasis on data explainability and on incorporating domain knowledge, in order to provide a clear understanding of the decision-making process.
– Arjun Srinivasan, Director – Data Science at Wesco
The Urgency for Explainability in AI Investments
Will there be more investment in explainability? In my opinion, AI explainability frameworks are currently led, to a great extent, by the technology providers and the data, analytics, and AI teams. This will evolve, and organizational risk management teams will need to start focusing on it. My thought process is that if a major screw-up happens because of a lack of AI explainability, it can be a huge legal risk for any organization in any industry. That is when the real, large focus and investment in explainability is going to come. Is investment going to increase in the short term? It may not, but I think the investment will be there; I think we will have to hit some bottoms before we see the kind of investment that we expect.
– Deepak Jose, Global Head and Senior Director for One Demand Data and Analytics Solutions at Mars