As industries embrace artificial intelligence (AI) in key decision-making processes, it’s important to ensure that the accuracy of these AI solutions is sustained in production. While measuring accuracy offline with test datasets is relatively straightforward, doing so in a production environment presents unique challenges. In this article, we will explore the need for measuring accuracy in production, the challenges involved, and best practices for addressing those challenges, with examples.
Need for measuring accuracy in production
Measuring the accuracy of AI solutions in production is crucial because accuracy has a direct impact on business outcomes and ROI. AI-infused products can deliver an excellent user experience, as we see in retail and e-commerce. However, inaccurate results can undo that same experience, and the consequences of inaccuracy are critical in industries such as healthcare and automotive. AI models can also drift over time due to changing data and environmental factors, so there is no guarantee that model results will stay consistent. The ability to measure the accuracy of models in production at regular intervals is therefore necessary for taking corrective action.
Challenges of measuring accuracy in production
Measuring accuracy in production comes with challenges in terms of data variation, scalability and labeling. Unlike test datasets (used to calculate model accuracy during product development), production data can be complex, dynamic and diverse. The data may contain noise, anomalies, and unexpected edge cases, all of which make accuracy harder to measure in a production environment. Additionally, enterprise production environments often involve massive datasets and high throughput, so measuring accuracy at this volume and scale can be difficult. Furthermore, obtaining accurate labels (ground truth) for evaluating models is hard in production. Real-time use cases are even harder because they are time sensitive: by the time the data is labeled for measuring accuracy, the AI solution would already have produced multiple errors that impact its results.
Let’s take the example of computer vision-based AI solutions. Consider an AI solution that recognizes objects using security cameras. In a controlled test environment, the model may achieve high accuracy; however, when deployed in production, it may face several challenges.
Lighting condition: Security cameras experience varying lighting conditions throughout the day, resulting in differences in image quality. For instance, an image captured in the morning may exhibit distinct visual characteristics compared to one taken in the afternoon or evening, when shadows and lighting differ. A computer vision model trained exclusively on ideal lighting conditions may struggle to generalize effectively in a production environment.
Camera type & Installation: Different camera models and installation angles in production environments introduce variations in image quality. If this is not accounted for during model training, the model will struggle to generalize, resulting in poor accuracy. Additionally, camera view deformation is another challenge: the object can blend with the background, making it hard for the AI model to detect.
Adversarial Impact: Both naturally occurring and intentionally crafted adversarial inputs can reduce the accuracy of object detection models in a production environment. For instance, wear and tear of a “stop signal” or deliberate modification of the stop signal can cause the AI model to fail to detect it. Addressing these challenges with robust models built on real-world data is essential.
Best practices for measuring accuracy in production
Several best practices can help measure the accuracy of AI solutions in production and address these challenges.
Proactive monitoring: Implement solutions that continuously monitor model performance in real time. Any deviation or degradation in accuracy can trigger an alert so that the necessary action is taken. This can be achieved by making a robust evaluation pipeline part of the production deployment, where the pipeline periodically assesses the model’s performance and triggers alerts based on set thresholds.
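As a rough illustration of threshold-based monitoring, the sketch below tracks accuracy over a sliding window of recently labeled predictions and signals when it falls below a set threshold. The class name RollingAccuracyMonitor, its parameters and the default values are assumptions made for this example, not part of any specific monitoring product.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent labeled predictions (hypothetical helper)."""

    def __init__(self, window_size=500, alert_threshold=0.90, min_samples=50):
        # Each entry is True when a prediction matched its ground-truth label.
        self.outcomes = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold  # assumed acceptable accuracy floor
        self.min_samples = min_samples          # avoid alerting on too little data

    def record(self, predicted, actual):
        """Store whether one prediction matched its ground-truth label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing was recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        """True once enough samples exist and accuracy is below the threshold."""
        if len(self.outcomes) < self.min_samples:
            return False
        return self.accuracy() < self.alert_threshold


# Example: 8 correct and 2 incorrect predictions give 80% accuracy.
monitor = RollingAccuracyMonitor(window_size=100, alert_threshold=0.9, min_samples=10)
for predicted, actual in [("person", "person")] * 8 + [("car", "person")] * 2:
    monitor.record(predicted, actual)
if monitor.should_alert():
    print(f"Accuracy dropped to {monitor.accuracy():.2f}")  # prints: Accuracy dropped to 0.80
```

In practice the labels would arrive later than the predictions, so a pipeline like this would typically run periodically over whatever ground truth has been collected so far.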
Feedback loop: Establish feedback loops for model re-training. When model accuracy drops due to environmental change or a shift in data distribution, the affected data needs to be selectively detected and included in the model training process. If the AI-infused product includes human interaction via a user interface, leverage that interface to get instantaneous feedback.
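One possible shape for such a feedback loop is sketched below: user feedback from the interface is stored next to the model prediction, and only the samples the user corrected are put forward for re-training. The names FeedbackRecord and FeedbackLoop, and the idea of keeping records in memory, are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackRecord:
    sample_id: str
    predicted_label: str
    user_label: str  # label the user confirmed or corrected in the UI


@dataclass
class FeedbackLoop:
    """Collects user feedback and selects samples for re-training (hypothetical)."""

    records: list = field(default_factory=list)

    def add(self, sample_id, predicted_label, user_label):
        """Store one piece of user feedback alongside the model prediction."""
        self.records.append(FeedbackRecord(sample_id, predicted_label, user_label))

    def retraining_candidates(self):
        """Return (sample_id, corrected_label) for samples the user corrected."""
        return [
            (r.sample_id, r.user_label)
            for r in self.records
            if r.predicted_label != r.user_label
        ]


loop = FeedbackLoop()
loop.add("frame-001", "person", "person")    # user confirmed the prediction
loop.add("frame-002", "car", "shopping cart")  # user corrected the prediction
print(loop.retraining_candidates())
```

Selecting only corrected samples is one simple policy; a real pipeline might also keep a share of confirmed samples so the re-training set is not skewed toward failures.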
A/B Testing: Deploy multiple versions of a model in production and monitor their results on real-life datasets. An A/B test can compare results from different models on the same dataset and helps decide which model to publish to production.
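A minimal sketch of comparing two model versions on the same labeled dataset might look like the following. The function name compare_models and the returned keys are hypothetical. Besides overall accuracy, it counts the samples where exactly one model is correct, since those are the cases where the two versions actually differ.

```python
def compare_models(labels, preds_a, preds_b):
    """Compare two models' predictions against the same ground-truth labels."""
    if not (len(labels) == len(preds_a) == len(preds_b)):
        raise ValueError("labels and both prediction lists must have equal length")
    if len(labels) == 0:
        raise ValueError("at least one labeled sample is required")

    n = len(labels)
    correct_a = sum(p == y for p, y in zip(preds_a, labels))
    correct_b = sum(p == y for p, y in zip(preds_b, labels))
    # Samples where exactly one model is right show where the versions disagree.
    only_a = sum(a == y and b != y for y, a, b in zip(labels, preds_a, preds_b))
    only_b = sum(b == y and a != y for y, a, b in zip(labels, preds_a, preds_b))

    return {
        "accuracy_a": correct_a / n,
        "accuracy_b": correct_b / n,
        "only_a_correct": only_a,
        "only_b_correct": only_b,
    }


labels = ["cat", "dog", "cat", "dog"]
model_a = ["cat", "dog", "dog", "dog"]
model_b = ["cat", "cat", "dog", "dog"]
print(compare_models(labels, model_a, model_b))
```

With small samples, a difference in accuracy can easily be noise, so a statistical test on the disagreement counts would be a sensible next step before choosing a model.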
Detecting Anomalies: Deploy anomaly detection techniques to identify edge cases and anomalies and to handle outliers in production. Measuring anomalies in a time series model involves identifying data patterns that deviate significantly from the expected behaviour. Proactive alerts that flag these anomalies help in taking the necessary action.
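One simple way to flag deviations from expected behaviour in a time series is a rolling z-score, sketched below. This is only one of many anomaly detection techniques, chosen here for brevity; the function name, the window size and the threshold of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=20, threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")

    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # the points just before index i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all departs from the expected value.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical metric, e.g. detections per minute from one camera.
series = [10, 11, 10, 9, 10, 11, 10, 9, 10, 50, 10, 11]
print(rolling_zscore_anomalies(series, window=5, threshold=3.0))  # prints: [9]
```

Note that a large spike inflates the standard deviation of the windows that follow it, so nearby anomalies can be masked; more robust statistics such as the median could reduce that effect.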
Active learning: In a production environment, getting timely labeled data for model training is expensive. Active learning is one technique that mitigates this by allowing AI models to actively select the data samples to be labeled, with a focus on the most informative data. This helps accelerate the model’s learning process.
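A common way to pick "the most informative" samples is uncertainty sampling: send for labeling the samples on which the model is least sure. The sketch below ranks samples by the entropy of their predicted class probabilities. It assumes the model exposes per-class probabilities, and the function names and the labeling budget are hypothetical.

```python
import math


def prediction_entropy(probabilities):
    """Shannon entropy of one predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_for_labeling(predictions, budget):
    """Pick the `budget` sample ids whose predictions are most uncertain.

    `predictions` maps a sample id to its list of class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: prediction_entropy(predictions[sample_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled frames (two classes each).
unlabeled = {
    "frame-a": [0.99, 0.01],
    "frame-b": [0.50, 0.50],
    "frame-c": [0.70, 0.30],
    "frame-d": [1.00, 0.00],
}
print(select_for_labeling(unlabeled, budget=2))  # prints: ['frame-b', 'frame-c']
```

Entropy is only one uncertainty measure; margin between the top two classes or disagreement among several models are alternatives with the same overall workflow.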
Measuring the accuracy of AI solutions in production is paramount; however, it comes with unique challenges due to real-world complexities. Following best practices such as continuous monitoring, feedback loops, A/B testing, anomaly detection and active learning will help navigate these challenges and ensure that the AI solution provides the desired results in the production environment.
Swaroop has 17+ years of rich experience in IP-based video cameras, surveillance and computer vision technology, building and managing enterprise-scale video solutions. He has been a passionate evangelist of video analytics technology and has 2 patents in this domain. He is involved in incubating and transforming retail organisations using AI/ML computer vision technology. He is currently leading the computer vision platform at Lowe’s as Director – Data Science.