Is Agentic AI too smart for your own good?


Agentic AI, which consists of systems that autonomously take action based on high-level goals, is becoming integral to enterprise security, threat intelligence, and automation. While these systems present significant potential, they also introduce new risks that CISOs must address. This article examines the key security threats associated with Agentic AI and outlines strategies for mitigation.

CISOs must act now by implementing adversarial testing, governance frameworks, multi-layered authentication, and AI security posture management to prevent the weaponization of Agentic AI against their organizations.

Deceptive and manipulative AI behaviors

A recent study revealed that advanced AI models sometimes resort to deception when faced with losing scenarios. OpenAI’s o1-preview and DeepSeek R1 were found to engage in deceptive behaviors, including cheating in chess simulations when they predicted failure. This raises concerns about Agentic AI systems engaging in unpredictable and untrustworthy behavior in cybersecurity operations.

In a security context, an AI-driven SOC or automated threat remediation system might misrepresent its capabilities or game internal metrics to appear more effective than it is. This deceptive potential forces CISOs to rethink monitoring and validation frameworks for AI-driven decisions.

Mitigation strategy:

  • Implement continuous adversarial testing to detect deceptive tendencies.
  • Require Agentic AI models to provide verifiable reasoning for decisions.
  • Establish AI honesty constraints within operational models.

Autonomous AI and the rise of Shadow ML

Many enterprises already struggle with Shadow IT. With Agentic AI, a new problem is emerging: Shadow ML. Employees are deploying Agentic AI tools for automation and decision-making without security oversight, leading to unmonitored AI-generated actions.

For instance, an AI-powered financial assistant could autonomously approve transactions based on outdated risk models, or an unsanctioned AI chatbot could make regulatory compliance commitments that expose the organization to legal risks.

Mitigation strategy:

  • Deploy AI Security Posture Management (AISPM) tools to track and manage AI model usage.
  • Mandate zero-trust policies for AI-driven transactions and decisions.
  • Establish AI governance teams responsible for monitoring and approving AI deployments.

“Shadow ML is one of the biggest security threats. These rogue models often emerge from well-intentioned teams trying to move fast. But without the right controls, they become an open door to data leaks, compliance violations, and adversarial manipulation. Security teams cannot defend what they can’t see, and if you’re not actively managing your Agentic AI, you’re already behind,” Noam Vander, CISO at Atera, told Help Net Security.

“AI observability and monitoring must be built into Agentic AI deployment from day one, providing real-time visibility into how models behave, flagging anomalies, and ensuring accountability. Real-time tracking of ML model behavior ensures that unauthorized AI doesn’t slip through the cracks. Security teams need visibility into what models are running, how they interact with data, and whether they introduce unexpected vulnerabilities. This requires dedicated AI security tools, integration with SIEM and SOC workflows, and continuous anomaly detection to catch issues before they escalate,” Vander concluded.

Exploiting Agentic AI through prompt injection and manipulation

Cybercriminals are actively researching ways to manipulate Agentic AI using prompt engineering and adversarial inputs. These attacks exploit the model’s autonomy, leading it to make unauthorized transactions, disclose sensitive information, or reroute security alerts.

A particularly concerning scenario is AI-driven email security tools being manipulated to whitelist phishing emails or approve fraudulent access requests after a subtle alteration in prompt instructions.

Mitigation strategy:

  • Implement input sanitization and context verification in Agentic AI decision-making.
  • Use multi-layered authentication before an AI system can execute security-critical tasks.
  • Regularly audit logs of AI-generated actions for anomalies.

Passing unsanitized inputs into any LLM is risky – similar to how SQL injection used to be a common attack vector. Yonatan Striem-Amit, CTO, 7AI, told us that the following are critical to implement when designing any agentic system:

Avoid giving direct input to the LLM from a potentially attacker-controlled system

The question starts with what data is given to the LLM. There’s a considerable difference whether the data comes from a controlled system – such as alerts – or whether the data comes from unsanitized sources like a chatbot facing anonymous users.

Perform input verification, length and content-based filters before letting the LLM receive the data directly.

Architect for small decisions

Instead of having one LLM in a simple one or two-step architecture, opt for an agent architecture with many smaller agents. Configure each small agent only to make straightforward decisions (such as selecting options within a predefined list, retrieving simple data, etc.).

Instead of writing a query/operation from scratch, let it choose among a carefully vetted set of queries with defined output options. The results should be interpreted without considering the input to ensure unbiased interpretation.

Minimal access per agent

Building agents that can do many things is tempting, but sound engineering taught us that a minimal permission model scales better. Make sure each agent can run with minimalist permissions and minimalist access. Ensure connectors and access to external and internal systems are running in least-privilege mode (read-only, only to data the user is allowed access to).

Architect permission management outside the LLM

Information about the tenant, role, and permissions should never be visible to the LLM. Architect the solution such that a user’s scope of access and permissions is always implicit any time the LLM interacts with the world. Instead of asking the LLM to say, “I currently serve a request from joe@nothing_wrong_here.com,” force the identity into all other systems from the get-go.

Architect verification agents

When a set of small agent architectures is used, adding agents/functions that perform mid-process analysis is straightforward. These should be both LLM and not LLM checks to see if the set of requests, actions, and permissions for any LLM call matches the defined permissions and operates within statically defined guardrails. When your agents perform small tasks, it’s easy to validate they do their tasks well and as intended. Any attempt to break into such a system must simultaneously break all guardrails.

AI hallucinations and false positives in security decision-making

While Agentic AI can enhance threat detection, it also has the potential to generate false positives or false negatives at scale, undermining cybersecurity operations. AI hallucinations can lead to misattributed security alerts or even incorrectly flagging an employee as an insider threat.

A misclassified event could trigger automated lockouts, false accusations of data exfiltration, or unnecessary emergency responses, eroding trust in AI-driven security.

Mitigation strategy:

  • Require human-in-the-loop (HITL) verification for AI-driven critical security actions.
  • Implement anomaly detection layers to cross-check AI-generated alerts before execution.
  • Train models using adversarial datasets to improve resilience against hallucinations.

AI agents in cybercrime: The double-edged sword

CISOs must also prepare for offensive Agentic AI threats.

“Attackers can now use autonomy to launch sophisticated attacks. For example, an attacker could use Agentic AI to autonomously map networks, identify access points, and probe for weaknesses without constant human direction. They could also be used for adaptive evasion. Malicious AI agents could dynamically adjust their behavior to avoid detection by learning from failed attempts, modifying attack patterns, and rotating through different techniques to automatically discover which ones are most effective at going under the detection radar,” Diana Kelley, CISO at Protect AI, told us.

Mitigation strategy:

  • Deploy autonomous AI-driven red teaming to simulate attacks using Agentic AI models.
  • Strengthen AI-driven endpoint detection and response (EDR) to anticipate AI-generated malware.
  • Establish AI-incident response protocols that adapt dynamically to evolving threats.

“Agentic AI for defenders includes solutions with advanced detection strategies that focus on behavioral patterns and anomalies that might indicate autonomous agent activity, such as highly systematic scanning/probing, machine-speed decision-making, and rapidly coordinated actions across multiple systems. Once detected, the defensive Agentic AI could act to isolate the activity and limit the blast radius,” Kelley explained.

“There are also ways to defend against malicious Agentic AI. Security architectures must account for agents’ ability to chain multiple lower-risk activities into dangerous sequences. This requires comprehensive logging and correlation, pattern recognition across extended periods, understanding of normal automation behaviors, and detecting subtle deviations from expected patterns. Finally, incident response plans should prepare for high-speed autonomous attacks requiring automated defensive responses rather than purely human-driven investigation and remediation, Kelley concluded.



Source link