Artificial Intelligence (AI) has become a critical enabler across sectors, reshaping industries from healthcare to transportation. However, with its transformative potential comes a spectrum of safety and security concerns, particularly for critical infrastructure. Recognizing this, the Cybersecurity and Infrastructure Security Agency (CISA) is championing a “Secure by Design” approach to AI-based software. At the core of this effort is the integration of AI red teaming—a third-party evaluation process—into the broader framework of Testing, Evaluation, Verification, and Validation (TEVV).
By aligning AI evaluations with established software TEVV practices, stakeholders can harness decades of lessons from traditional software security while tailoring them to AI’s unique challenges.
This initiative underlines the importance of rigorous safety and security testing, helping mitigate risks of physical attacks, cyberattacks, and critical failures in AI systems.
Why AI Red Teaming Matters
AI red teaming is the systematic testing of AI systems to identify vulnerabilities and assess their robustness. By simulating attacks or failure scenarios, this process reveals weaknesses that could be exploited, enabling developers to address these gaps before deployment.
CISA emphasizes that AI red teaming is not a standalone activity but a subset of the broader AI TEVV framework. This framework ensures that AI systems are rigorously tested for reliability, safety, and security, aligning them with the requirements of critical infrastructure.
Programs like NIST’s Assessing Risks and Impacts of AI (ARIA) and the GenAI Challenge have already laid the groundwork for AI TEVV by creating tools and methodologies that assess AI risks comprehensively. CISA builds on this foundation by advocating for AI TEVV to operate as a sub-component of traditional software TEVV.
AI and Software: A Shared Foundation in TEVV
A common misconception is that AI evaluations require a completely novel approach, distinct from traditional software testing frameworks. CISA, however, argues that this is a strategic and operational fallacy. AI systems, while unique in certain aspects, are fundamentally software systems and share many of the same challenges, such as safety risks, reliability concerns, and probabilistic behavior.
1. Safety Risks Are Not New
Software safety risks are not unique to AI. Decades ago, incidents like the Therac-25 radiation therapy device failure demonstrated how software flaws could lead to catastrophic outcomes. These failures prompted updates to safety-critical software evaluation processes, a precedent that now informs AI safety assessments.
Similarly, AI systems integrated into critical infrastructure—like transportation or medical devices—must be evaluated for safety risks. For example, an AI-powered braking system in vehicles must account for a range of external conditions, such as slippery roads or unexpected obstacles, much like traditional software evaluations have done for decades.
2. Validity and Reliability Testing
Ensuring that AI systems are valid (performing as intended) and reliable (functioning consistently across scenarios) is a shared requirement with traditional software. Robustness testing for AI systems mirrors the approaches used for software in fields like aviation and healthcare, where unexpected inputs or conditions can significantly impact outcomes.
3. Probabilistic Nature of Systems
Both AI and traditional software systems exhibit probabilistic behavior. For instance, slight variations in inputs can lead to significant output changes, a trait seen in AI systems trained with vast datasets. However, traditional software is no stranger to such variability. Vulnerabilities like race conditions and cryptographic randomness are long-standing issues in software development. By leveraging existing TEVV methodologies, AI evaluations can address these challenges effectively.
CISA’s Multi-Faceted Role in AI Security
CISA plays a pivotal role in enhancing AI security evaluations by working across three key areas:
- Pre-Deployment Testing
CISA collaborates with industry, academia, and government entities to advance AI red teaming. As a founding member of the Testing Risks of AI for National Security (TRAINS) Taskforce, CISA is actively involved in developing AI evaluation benchmarks and methodologies that integrate cybersecurity considerations. - Post-Deployment Testing
Beyond pre-deployment, CISA supports technical testing for AI systems already in use. This includes penetration testing, vulnerability scanning, and configuration assessments to ensure robust security in operational environments. - Standards Development and Operational Guidance
Partnering with NIST, CISA contributes operational expertise to the development of AI security testing standards. These standards are integrated into CISA’s broader security evaluation services, such as Cyber Hygiene and Risk and Vulnerability Assessments, ensuring that AI systems meet high cybersecurity benchmarks.
Streamlining AI and Software Evaluations
CISA’s approach to treating AI TEVV as a subset of software TEVV offers significant benefits:
- Efficiency: By leveraging existing TEVV frameworks, stakeholders can avoid duplicative testing processes, saving time and resources.
- Consistency: Applying proven methodologies ensures that AI systems meet the same rigorous standards as traditional software.
- Scalability: Unified frameworks enable the development of tools and benchmarks that can be used across diverse AI applications, enhancing the robustness of evaluations.
This streamlined approach also encourages innovation at the tactical level. Rather than reinventing the wheel, developers can focus on creating novel tools and methodologies that address AI-specific challenges while building on the solid foundation of software TEVV.
Conclusion: Building on Decades of Expertise
As AI continues to integrate into critical infrastructure, ensuring its safety and security is paramount. CISA’s Secure by Design initiative highlights the importance of viewing AI evaluations through the lens of traditional software testing frameworks.
By aligning AI TEVV with established software TEVV methodologies, stakeholders can build on decades of expertise, mitigating risks effectively and ensuring that AI systems are fit for purpose. With organizations like CISA and NIST leading the charge, the future of AI security is poised to benefit from a balanced blend of innovation and proven practices.
Related