Council Post: Software Too Big to Fail; A Case for Independent Software Verification and Validation

When software reaches a level of ubiquity and criticality, it becomes "too big to fail."

In today’s interconnected world, software systems underpin critical functions across virtually every industry. From financial services and healthcare to transportation and communication, these systems are foundational to our daily business operations and lives. 

When software reaches a level of ubiquity and criticality, it becomes “too big to fail.” The failure of such software could lead to significant business and personal disruptions, highlighting the imperative for rigorous, independent verification and validation (IV&V). This methodology, successfully implemented in various market verticals, enhances software reliability and safety, ensuring these indispensable systems remain robust and secure. 

What is “Too Big to Fail”?

The phrase “too big to fail” describes businesses or sectors so deeply embedded in a financial system or economy that their collapse would be catastrophic. This concept gained prominence during the 2007-2008 financial crisis, when the failure of major banks necessitated substantial government intervention to prevent economic disaster.


In response, the United States passed the Dodd-Frank Act in 2010, imposing stringent regulatory requirements on financial institutions deemed “too big to fail.” The Federal Reserve maintains a list of these systemically important entities.

The concept of “too big to fail,” which originally applied to financial institutions, has now expanded to encompass critical software systems. This shift was starkly illustrated on July 19, 2024, when CrowdStrike, a leading cybersecurity firm, experienced a catastrophic incident due to a faulty software update. The update, which stemmed from a misconfiguration in CrowdStrike’s Falcon sensor software, caused widespread disruptions and affected approximately 8.5 million Microsoft Windows devices globally.

The impact of this incident was profound and far-reaching, affecting critical sectors including transportation, healthcare, and financial services. It led to grounded flights, delayed medical procedures, and interrupted banking services, demonstrating the interconnectedness and vulnerability of modern technology systems. Despite CrowdStrike’s quick response in reverting the faulty update and collaborating with major cloud providers – like Microsoft Azure and Google Cloud Platform – to mitigate the effects, the incident’s repercussions were severe, widespread and long-lasting.

This global software failure at CrowdStrike serves as a compelling example of how certain software platforms have become “too big to fail” in their own right. The incident’s impact on a wide range of industries – from banking and retail to media and healthcare – underscores the critical nature and global ubiquity of these systems. Some organizations continued to struggle with recovery long after the initial incident, highlighting the need for resilient disaster recovery and incident response plans.

Whether caused by human error or artificial intelligence (AI) malfunction, such incidents demonstrate the potential for widespread disruption when critical software systems fail. This event has brought into sharp focus the need for enhanced safeguards, stricter oversight, and potentially new regulatory frameworks to manage the risks associated with software that has become integral to global economic operations and society.

This realization calls for a regulatory framework akin to the Dodd-Frank Act, but tailored for pervasive, critical software systems. The goal should be to reduce the likelihood of catastrophic failures to six sigma-level events – a standard of six standard deviations from the mean (i.e., a truly rare occurrence).
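To put a rough number on that target, the sketch below computes normal-distribution tail probabilities using only Python's standard library. It assumes failures can be modeled as draws from a normal distribution, which is a simplification made here for illustration; the function names are hypothetical. It also reproduces the conventional Six Sigma quality figure of about 3.4 defects per million opportunities, which corresponds to a one-sided 4.5-sigma tail once the customary 1.5-sigma shift is applied.

```python
import math


def two_sided_tail(k: float) -> float:
    """Probability that a normal variable lies more than k standard
    deviations from its mean, in either direction."""
    return math.erfc(k / math.sqrt(2))


def one_sided_tail(k: float) -> float:
    """Probability that a normal variable lies more than k standard
    deviations above its mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))


# Strict reading: an outcome six standard deviations from the mean.
strict_six_sigma = two_sided_tail(6.0)  # roughly 2 in a billion

# Conventional Six Sigma: 6 sigma minus the assumed 1.5-sigma shift,
# expressed as defects per million opportunities (DPMO).
conventional_dpmo = one_sided_tail(6.0 - 1.5) * 1_000_000  # roughly 3.4

print(f"Strict six-sigma tail probability: {strict_six_sigma:.2e}")
print(f"Conventional Six Sigma DPMO: {conventional_dpmo:.1f}")
```

Under these assumptions, the strict reading is about a thousand times rarer than the conventional benchmark, so a framework built around this goal would need to state which definition it intends.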

Independent Verification and Validation (IV&V): Ensuring Software Quality and Reliability

Verification and Validation (V&V) are essential processes in software development. Verification asks, “Are we making the product right?” while validation asks, “Are we making the right product?” In other words, verification ensures that software functions correctly according to its design specifications, whereas validation confirms that the software meets the broader needs and expectations of its users. 

The “I” in IV&V stands for Independent, meaning that an external third party performs these functions. This independent entity has separate leadership, software analysis teams, and funding sources distinct from the organization whose software is being evaluated, removing conflicts of interest and allowing for a more robust V&V process.

NASA’s Independent Verification and Validation (IV&V) Program exemplifies this rigorous approach in action, particularly in the critical field of space exploration. The IV&V Facility has significantly contributed to the success of various missions, including International Space Station operations, Mars Rover software, and Extravehicular Mobility Unit (spacesuit) hardware-software integration.

In these contexts, software reliability is crucial for both mission success and astronaut safety. Just as code review and testing are best performed by people other than the developers who wrote the software, NASA’s IV&V Program applies that separation at the organizational level. It represents the most unbiased and rigorous form of V&V, ensuring a thorough evaluation of software quality and functionality.

This approach helps NASA maintain its high standards for space exploration technology, demonstrating the real-world impact of comprehensive software validation and verification processes. Existing standards, such as those from ANSI, IEEE and MIL, could be leveraged to create an efficient regulatory framework addressing “too big to fail” software across various critical sectors, following NASA’s lead in prioritizing software reliability and safety.

Pros and Cons of Software IV&V

The advantages of software IV&V are evident across various critical industries. While NASA employs IV&V in space exploration, other sectors have also recognized its importance. For instance, aerospace companies use V&V processes to ensure the safety and reliability of avionics systems, as seen in the extensive testing of the Boeing 787 Dreamliner’s flight control software. Similarly, medical device manufacturers apply V&V to critical equipment like the da Vinci Surgical System, significantly improving patient outcomes.

The automotive industry, particularly companies developing autonomous vehicles – like Tesla and Waymo – conducts extensive V&V testing to ensure public safety. Financial institutions rely on V&V for secure online banking and payment processing systems, while defense organizations implement stringent V&V in systems like the U.S. Navy’s Aegis Combat System.

Despite these success stories, many industries still rely on internal entities for verification and validation, similar to an internal audit function. Although these internal processes may be more efficient, they are arguably less effective due to misaligned incentives related to management, finances, and performance.

“Too big to fail” software entities may argue that regulatory requirements for IV&V would hinder innovation, slow software delivery, and increase costs. While one could easily disagree with the claim about innovation impacts, it’s true that IV&V could initially slow delivery and increase costs, which would likely be passed on to software purchasers. However, these are arguably acceptable trade-offs for minimizing global disruptions, as demonstrated by the critical role of V&V in industries where software failures could have catastrophic consequences. For zero-day events, on-call rapid response teams could preserve delivery speed when time is of the essence.

Implementing IV&V requirements would create a new industry and generate more software-oriented jobs. Additionally, software covered by IV&V might reduce outlays for certain corporate or software insurance coverages, as seen in the improved safety records of industries that have adopted rigorous V&V practices.

While there are challenges to implementing IV&V across industries, the benefits in terms of software reliability, safety, and long-term cost savings may outweigh the initial drawbacks. The key is to balance the need for thorough verification and validation with the pace of innovation and market demands, learning from sectors where V&V has proven its worth in preventing critical failures and ensuring public safety.

The Path Forward: Ensuring Software Reliability

When software reaches a critical inflection point and becomes “too big to fail,” it should be subject to independent verification and validation (IV&V). This can be achieved through efficient regulatory frameworks that leverage existing V&V standards and draw upon industry use cases that demonstrate the value of these approaches.

Implementing IV&V for critical software systems would significantly enhance global software reliability and safety. By learning from sectors where V&V has proven essential – such as aerospace, healthcare, and defense – we can develop a robust framework that balances thoroughness with practicality.

This approach would not only mitigate risks associated with software failures in critical systems but also foster a culture of quality and responsibility in software development. While there may be initial challenges in implementation, the long-term benefits of increased reliability, improved safety, and potential cost savings in disaster prevention far outweigh these concerns.

By mandating IV&V for “too big to fail” software, we can create a safer, more dependable digital infrastructure that supports innovation while prioritizing public safety and system integrity. This proactive stance will be crucial as software continues to play an increasingly vital role in our global society and economy.

Jason G. Cooper
Jason is a Chief Technology Officer at Paradigm with 20-plus years of experience as an executive leader specializing in deriving business value from technology, data and analytics across private, for-profit, and nonprofit domains, including leadership roles at HMS, Blue Cross Blue Shield plans, Cigna and CVS. He is a Fellow of the American College of Health Data Management and a member of the Society for Information Management, and he serves on the advisory boards for Kids' Chance of New Jersey and the Hunterdon County Computer Science and Applied Engineering Academy.