
As artificial intelligence systems become more autonomous, their ability to interact with digital tools and data introduces complex new risks.
Recognizing this challenge, researchers from NVIDIA and Lakera AI have collaborated on a new paper proposing a unified framework for the safety and security of these advanced “agentic” systems.
The proposal addresses the shortcomings of traditional security models in managing the novel threats posed by AI agents that can take actions in the real world.
The core of the proposed framework moves beyond viewing safety as a static feature of a model.
Instead, it treats safety and security as interconnected properties that emerge from the dynamic interactions between AI models, their orchestration, the tools they use, and the data they access.
This holistic approach is designed to identify and manage risks across the entire lifecycle of an agentic system, from development to deployment.
.webp)
Arxiv security researchers noted that conventional security assessment tools, such as the Common Vulnerability Scoring System (CVSS), are insufficient for addressing the unique risks in agentic AI.
A minor security flaw at the component level, they identified, could cascade into significant, system-wide user harm.
This new model introduces a more comprehensive method for evaluating these complex systems, as illustrated in the framework’s architectural diagram.
It provides a structured approach to understanding how localized hazards can compound and lead to unexpected, large-scale failures.
The framework is designed to be operational for enterprise-grade workflows, ensuring that as agents become more integrated into business processes, their actions remain aligned with safety and security policies.
AI-Driven Risk Discovery
The paper delves deeper into the crucial phase of risk discovery, which relies on an innovative AI-driven red teaming process. Within a sandboxed environment, specialized “evaluator” AI agents are used to probe the primary agentic system for weaknesses.
These probes simulate various attack scenarios, from prompt injections to sophisticated attempts at tool misuse, to uncover potential vulnerabilities before they can be exploited.
This automated evaluation allows developers to identify and mitigate novel agentic risks, such as unintended control amplification or cascading action chains, in a controlled setting.
To support the advancement of this field, the researchers have also released a comprehensive dataset, the Nemotron-AIQ Agentic Safety Dataset 1.0. It contains over 10,000 detailed traces of agent behaviors during attack and defense simulations.
This resource offers the broader community a valuable tool for studying and developing more robust safety measures for the next generation of agentic AI. The ongoing research promises to provide evolving insights into the operational behavior of these complex systems.
Follow us on Google News, LinkedIn, and X to Get More Instant Updates, Set CSN as a Preferred Source in Google.
