Council Post: Improve Explainability of Machine Learning Models Using LLMs/GPT Prompts

Responsible AI, emphasizing ethical, fair, and trustworthy AI systems, has gained significant traction due to the growing awareness of potential risks and challenges associated with this technology. Explainability plays a crucial role in achieving this goal, ensuring transparency, accountability, and fairness in AI systems. This text explores how LLMs (Large Language Models) can bridge the explainability gap between data scientists and business users, fostering responsible AI adoption.

Current Challenges in Explainability:

The current approach to explainability in AI models often suffers from several limitations, creating a communication gap between data scientists and business users. These limitations include:

  • Non-intuitive data variable names: Technical terms and abbreviations used in data variables are often incomprehensible to non-technical stakeholders.
  • Confusing table and column names: Similar to variable names, table and column names often lack clarity and context, making data interpretation difficult.
  • Limited communication: The current workflow often involves minimal interaction between data scientists and business users, leading to a lack of shared understanding of the model and its decision-making process.
  • Naïve explainability techniques: Existing techniques that rely solely on the same confusing variable names for explanation fail to provide clear and actionable insights.

LLMs as the Bridge:

LLMs offer a powerful solution to bridge these explainability gaps by leveraging their unique capabilities:

  • Data Scientists: LLMs can analyze vast amounts of domain-specific text data, such as medical literature and research papers. This analysis can enrich their understanding of complex variables and models, enabling them to:
    Select more interpretable features during model development.
    Develop models that are inherently more understandable from the beginning.
  • Business Users: LLMs can translate complex explanations generated by data scientists and models into clear, business-friendly language using their natural language generation capabilities. This fosters:
    Improved communication and trust between technical and non-technical stakeholders.
    Better understanding and buy-in from business users, leading to wider adoption of responsible AI solutions.

Example: Transforming Explainability with LLMs

Consider the scenario of a healthcare AI model predicting the likelihood of cancer in a patient. Traditionally, the model might simply output “Malignant (80% probability)” with the top 5 variables listed as “radius_worst, area_worst, compactness_se, concavity_worst, concavity_se”. This information is technical and challenging for non-medical professionals to understand.

An LLM, however, can analyze these technical details and generate a more interpretable explanation like:

“For Patient ABC, the LLM suggests a higher chance (80%) of malignancy due to factors like larger variations in size and shape of the tumor, more pronounced dips and curves on its surface, higher contrast differences within the tissue, and increased distance from the center.”

This explanation provides meaningful context and uses clear language understandable by both medical professionals and decision-makers, fostering responsible AI adoption.
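To make the transformation concrete, here is a minimal sketch of the kind of prompt that could produce such an explanation. The prediction and feature names come from the example above; the exact prompt wording is an assumption and would normally be refined through prompt engineering.

# A hypothetical prompt for turning raw model output into a patient-friendly explanation
prompt = (
    "A breast cancer model returned: 'Malignant (80% probability)'. "
    "The top 5 contributing features were: radius_worst, area_worst, "
    "compactness_se, concavity_worst, concavity_se. "
    "Rewrite this result for Patient ABC in plain, non-technical language, "
    "briefly describing what each feature says about the tumor's size, shape and texture."
)
print(prompt)  # paste into ChatGPT, or send through an API call as sketched later in the article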

Beyond the Example:

While the example focuses on healthcare data, the potential of LLMs applies to various domains. LLMs can be used to explain models in finance, marketing, or any field where bridging the communication gap between technical and non-technical stakeholders is crucial for responsible AI implementation.

Important Considerations:

It’s essential to acknowledge the limitations and ethical considerations surrounding LLMs:

  • Specificity: Be explicit about which LLM is used (e.g., Bard, GPT-3) and acknowledge its limitations in terms of domain expertise and potential biases.
  • Prompt Engineering: Careful prompt engineering is crucial for guiding the LLM towards generating informative and accurate explanations.
  • Addressing Challenges: Potential issues such as bias in LLMs and factual inaccuracies should be anticipated, along with strategies to mitigate these risks (e.g., human oversight, data quality checks).

Step-by-Step Guide:

Step 1: Incorporating Data Understanding:

To illustrate the potential of LLMs in enhancing explainability, let’s consider a practical example. We’ll utilize the Breast Cancer Dataset available on Kaggle, which comprises 32 variables. The primary objective is to build a classification model to predict whether a cancer type is malignant or benign. While this article focuses on improving explainability with the existing model, understanding the underlying data provides context for the LLM to tailor explanations effectively.

Understanding the dataset and its purpose up front gives the LLM the context it needs to tailor explanations to the domain, connecting the raw data back to the central theme of how LLMs improve explainability in AI.
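As a starting point, a minimal data-loading sketch might look like the following. It assumes the Kaggle CSV has been downloaded locally as data.csv; the file name and the empty trailing column are assumptions based on the common export of this dataset.

import pandas as pd

# Load the Breast Cancer (Wisconsin) dataset; "data.csv" is an assumed local file name for the Kaggle download
df = pd.read_csv("data.csv")

# The Kaggle export often carries an empty trailing column; drop it if present
df = df.drop(columns=["Unnamed: 32"], errors="ignore")

print(df.shape)                          # roughly (569, 32): id, diagnosis and 30 measurements
print(df["diagnosis"].value_counts())    # M = malignant, B = benign
print(df.columns.tolist()[:10])          # a first look at the raw, non-intuitive column names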

Step 2: Model Development:

Following the data exploration stage, a robust model development process is crucial. In this case, we employed an XGBoost model with cross-validation to achieve good accuracy.
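A minimal sketch of this step is shown below, assuming the DataFrame df from Step 1; the hyperparameters are illustrative rather than the exact configuration used here.

from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

# Features and binary target (1 = malignant, 0 = benign), assuming df from Step 1
X = df.drop(columns=["id", "diagnosis"], errors="ignore")
y = (df["diagnosis"] == "M").astype(int)

# Illustrative hyperparameters; tune as needed
model = XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05, eval_metric="logloss")

# 5-fold cross-validated accuracy as a quick sanity check
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Fit on the full data so the model can be explained in the next step
model.fit(X, y)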

Step 3: Interpreting Feature Importance with SHAP:

While techniques like SHAP can identify influential features, interpreting them without context can be challenging. For instance, in our example, SHAP might identify features like “radius_mean” and “concavity_worst” among the top 5, but understanding their true impact on the model’s predictions requires deeper analysis. This is where LLMs can offer a valuable contribution by providing contextual explanations that go beyond simply listing the top features.
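A short SHAP sketch, assuming the fitted model and feature matrix X from Step 2, might look like this; it surfaces the top 5 features that the rest of the article tries to make interpretable.

import numpy as np
import shap

# TreeExplainer works directly with the fitted XGBoost model
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Rank features by mean absolute SHAP value and keep the top 5
mean_abs = np.abs(shap_values).mean(axis=0)
top5 = X.columns[np.argsort(mean_abs)[::-1][:5]]
print("Top 5 features by SHAP importance:", list(top5))

# Optional visual overview:
# shap.summary_plot(shap_values, X)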

This is where I started digging deeper into the data and how it was collected. I went through https://www.spiedigitallibrary.org/conference-proceedings-of-spie/1905/1/Nuclear-feature-extraction-for-breast-tumor-diagnosis/10.1117/12.148698.short?SSO=1, which explains how the nuclear features in the dataset were extracted. The paper contains numerous medical terms that were not easy for me to understand.

Step 4: Using ChatGPT to Understand and Explain the Variables:

4.1 – LLMs offer the potential to significantly improve the user experience for those interacting with AI models. In our case, by leveraging ChatGPT to translate technical jargon into clear, understandable language, we made the column names within the Breast Cancer Dataset more accessible to a wider audience. This demonstrates how LLMs can contribute to building more user-friendly and transparent AI systems.

For instance, ChatGPT transformed “radius_mean” into the more understandable “average radius of the tumor.” This demonstrates how LLMs can enhance our understanding of complex technical concepts by providing clear and concise explanations, even for technical users.
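The translations in this step were produced interactively in ChatGPT; an equivalent call through the OpenAI Python client is sketched below, with the model name and prompt wording as assumptions.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

raw_columns = ["radius_mean", "texture_mean", "concavity_worst", "area_se"]  # a few example columns

prompt = (
    "These column names come from the Breast Cancer (Wisconsin) dataset, where features describe "
    "cell nuclei in a digitized image of a breast mass. For each column name, give a short, "
    "plain-English description that a non-technical business user could understand:\n"
    + "\n".join(raw_columns)
)

# Model name is illustrative; any capable chat model could be substituted
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # e.g., "radius_mean: average radius of the tumor ..."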

4.2 – Achieving true explainability is often an iterative process. In this instance, we leveraged SHAP values and ChatGPT to gain deeper insights into the model’s decision-making. By analyzing the SHAP values through a simulated doctor’s perspective using ChatGPT, we were able to understand the model’s reasoning and identify areas for refinement. This highlights how LLMs can facilitate an iterative approach to building explainable AI, empowering data scientists to continually improve their models’ interpretability.
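A rough sketch of this iteration is shown below, assuming the SHAP values and model from the previous steps and the same OpenAI client as above. The “simulated doctor” persona is expressed entirely through the prompt, whose exact wording is an assumption.

# Summarize the most influential features for one patient and ask the LLM to reason about them
i = 0  # index of the row (patient) to explain
contributions = sorted(zip(X.columns, shap_values[i]), key=lambda t: abs(t[1]), reverse=True)[:5]
feature_text = "\n".join(f"{name}: SHAP contribution {value:+.3f}" for name, value in contributions)

prompt = (
    "Act as an oncologist reviewing a machine learning prediction. The model predicts malignancy with "
    "the probability below, and the SHAP values show which measurements pushed the prediction up (+) or down (-).\n\n"
    f"Predicted probability of malignancy: {model.predict_proba(X.iloc[[i]])[0, 1]:.0%}\n"
    f"{feature_text}\n\n"
    "Explain, in language a patient and a business decision-maker can follow, why the model leans this way, "
    "and note that SHAP values describe the model's behaviour, not medical causality."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)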

Summary Statement of High-Severity Indicators of Cancer

Based on a comprehensive analysis of the Breast Cancer Dataset, several factors have been associated with an increased likelihood of malignancy. It’s crucial to remember that SHAP values, while providing insights into feature importance, are not definitive measures of causality. Consulting a healthcare professional for personalized risk assessment and diagnosis is vital.

However, considering the available data, the following characteristics may be indicative of a higher risk:

  • Larger tumor size and area: Larger tumors, as indicated by features like “worst area” and potentially “perimeter” measurements, might be associated with a higher risk of malignancy compared to smaller tumors.
  • Irregular tumor shape: Features like “worst concavity” may suggest a more irregular tumor shape, which can be a potential indicator of cancer compared to a smoother, rounder shape.
  • Distance from the center to the perimeter: While its precise interpretation might require further domain expertise, this feature could point to an uneven distribution of cells within the tumor, which could be relevant for cancer risk assessment.

It’s important to emphasize that these are just potential indicators, and a definitive diagnosis of cancer requires a comprehensive evaluation by a qualified healthcare professional. Additionally, factors beyond those captured in this dataset can also play a significant role in cancer risk.

This LLM-generated summary:

  • Acknowledges the limitations of SHAP values and emphasizes the importance of consulting a healthcare professional.
  • Provides interpretations of the features in a more medically sound and nuanced way, avoiding direct translation of technical terms.
  • Highlights the need for further analysis and expertise for a definitive diagnosis.

While AI advancements are impressive, ensuring their transparency and ethical use remains crucial. This article explored the potential of Large Language Models (LLMs) to bridge the explainability gap in AI. By empowering both data scientists and non-technical stakeholders to understand and communicate model behavior, LLMs can foster trust and pave the way for responsible AI development. While limitations exist, continuous research holds the promise of unlocking the full potential of LLMs as explainability bridges, propelling us towards a future where AI serves the greater good and is understood by all.

This article is written by a member of the AIM Leaders Council, an invitation-only forum of senior executives in the Data Science and Analytics industry.

Rai Rajani Vinodkumar
Rajani, as the Head of AI Platforms, is dedicated to bringing AI to life for digital customers and revolutionizing business processes through data, advanced ML, and deep learning algorithms. With nearly 14 years of experience, she specializes in AI strategy, data science, fraud detection, and process automation across various sectors including insurance, retail, and consumer goods. She has led numerous projects focused on touchless operations, invisible fraud detection, and consistent decision-making for large and intricate datasets. Rajani's expertise has earned her prestigious awards such as the Indian Achiever's DS and AI Excellence Award, the 3AI Women's to Watch Out for Award, and the 40 under 40 Data Scientists Award.