The AI security crisis no one is preparing for

The AI security crisis no one is preparing for

In this Help Net Security interview, Jacob Ideskog, CTO of Curity, discusses the risks AI agents pose to organizations. As these agents become embedded in enterprise systems, the potential for misuse, data leakage, and unauthorized access grows.

Ideskog warns that the industry is “sleepwalking” into a security crisis, drawing parallels to the early days of API and cloud adoption, and outlines steps companies must take to defend against these behavior-driven threats.

You’ve warned about the industry “sleepwalking” into a security crisis with AI agents. What do you mean by that, and what signs do you see that we’re already on that path?

AI agents and other non-human identities are proliferating rapidly. In some organizations they already outnumber human users by more than 80 to 1. Many are deployed with broad, persistent access to systems and data, but without the same security controls, governance, or monitoring applied to human accounts. This creates a perfect opportunity for abuse, whether that’s through prompt injection, compromised credentials, or the exploitation of insecure code that they generate.

We’re already seeing early warning signs. Security researchers have shown it’s possible to subvert AI assistants into running unauthorized commands, accessing sensitive files, or introducing supply-chain vulnerabilities, all without triggering typical security alerts. While these tests have been in controlled conditions, the techniques are straightforward enough that it’s only a matter of time before malicious actors use them in the wild.

How would you compare the current state of AI agent security to, say, the early days of API security or cloud misconfigurations? Are we repeating similar mistakes?

Yes, we are definitely seeing history repeat itself. The current state of agentic security feels very similar to the 2010s when organizations rushed to implement cloud and API adoption without understanding the security implications.

In the early API era, developers often exposed endpoints without proper authentication, input validation or rate limiting. Many systems were abused or compromised as attackers found predictable patterns and misconfigurations. The shift to cloud introduced the same sort of growing pains with misconfigured storage buckets, overly permissive roles, and poor visibility.

The same pattern is emerging with AI agents. The capabilities are impressive and the business pressure to adopt them is growing, but the understanding of how to secure them is lagging behind. Many teams do not yet have a threat model for AI. They are not considering how inputs can be manipulated, how AI systems can leak context across sessions, or how an over-permissioned agent can take unintended actions inside production environments.

We are also seeing some operational mistakes that echo those earlier eras. There are organizations that are integrating AI agents directly with internal tools and data sources without putting proper safeguards in place. Others are skipping monitoring or logging, treating these systems as if they are just front-end chatbots. In some cases, there is no definition of what constitutes an acceptable or unsafe output, which leaves too much room for error.

The key difference now is that with AI, the attack surface includes behaviour, language, and context. Existing controls cannot lock these things down easily, and a different approach is required. This could include prompt hardening, input and output filtering, and continuous monitoring of how the system behaves over time. That level of nuance is difficult to manage without purpose-built tools and a shift in mindset.

The good news though, is that we do have experience to draw on. The lessons from API and cloud security still apply, principles like least privilege, secure by default, auditability, and layered defence are just as relevant, we just need to apply them in a new context. The risks here are more about influence, misinterpretation, and unintended action than code-level vulnerabilities.

Can you share a real-world incident, either public or anonymized, where an insecure AI agent or bot caused or contributed to a security breach or operational issue?

A recent example involved Cursor IDE, an AI-powered coding tool. Researchers demonstrated that by feeding it malicious prompts, they could trick its embedded AI assistant into executing system commands on the developer’s local machine. In this type of scenario, attackers could steal environment variables, API keys, and authentication tokens, exfiltrate sensitive files to external servers, and potentially install backdoors or alter configurations.

Another example comes from GitHub Copilot, which uses OpenAI models to suggest code snippets. Early on, developers noticed Copilot sometimes generated insecure code, such as hardcoded credentials, outdated encryption, or missing input validation. This wasn’t a direct breach, but it showed how AI-generated code could quietly introduce vulnerabilities into software supply chains if not carefully reviewed.

These incidents illustrate that AI agents are not passive tools but active participants in your systems. If organizations don’t implement proper guardrails, like scoped permissions, prompt hardening, behavioural monitoring, and strict output filtering, AI agents can become significant security liabilities.

What does the attack surface look like for an AI agent operating within a typical enterprise environment? What are the weak points attackers are most likely to target?

The attack surface of an AI agent in a typical enterprise environment is both broad and highly dynamic. Unlike traditional applications, AI agents interact through natural language, which opens new avenues for manipulation. This hybrid nature, part software system, part language interface, introduces novel security challenges.

One of the most common and critical vulnerabilities is prompt injection. Because AI agents follow instructions embedded in text, attackers can craft inputs designed to override internal commands, expose hidden prompts, and even alter the agent’s intended behavior. If the agent in question interacts with sensitive data or systems, this can escalate quickly into unauthorised access or action.

Another major risk is data leakage through the model’s output. Even if underlying data sources are properly secured, an AI agent can unintentionally surface confidential information in response to cleverly worded questions. Access controls often don’t apply to LLM-based systems, which means sensitive information can leak through indirect queries or context manipulation.

Adversarial inputs, such as deliberately malformed, obfuscated, or ambiguous prompts, also present a growing risk. Attackers may use these to bypass filters, trick the model into making unsafe statements, or induce unexpected behavior.

Further complicating the picture, integrations with enterprise tools can significantly expand the attack surface. AI agents that can call APIs, update records, send emails, or execute workflows must be strictly permissioned and isolated. An over-permissioned agent can become an unintentional attack proxy if manipulated by a malicious input. Boundaries and granular controls are essential in these situations.

Training data exposure is another weak point. If an AI model has been trained or fine-tuned on internal documents or chat logs, attackers may be able to extract fragments of that data by probing the model over time. A risk that is dangerously overlooked.

Lastly, infrastructure concerns still apply. The endpoints, APIs, logging systems, and middleware supporting the AI must be secured using the same principles applied to other critical systems. Weak input validation, insufficient logging, or lack of rate limiting can allow attackers to exploit not just the AI agent, but the environment it operates in.

What makes AI systems particularly challenging to secure is the nature of their vulnerabilities. Many are not code-based but behavioral, the result of how the model interprets and responds to language. Security teams must account for intent manipulation, semantic ambiguity, and social engineering-style attacks delivered through prompts.

To reduce risk, organizations need layered defences, prompt hardening, output filtering, behavioral monitoring, red teaming, access controls, and strong observability across all AI interactions. The attack surface includes the model, its instructions, the data it accesses, and the systems it operates within. Defenders must think like language-driven adversaries to stay ahead.

What are the must-have controls or practices every organization should implement before deploying an AI agent in production?

Deploying an AI agent introduces a new class of threats that must be managed with the same rigor applied to any other critical system. Before go-live, the AI environment must be secured against a broad spectrum of attack vectors, including prompt injection, adversarial manipulation, and abuse through input flooding or denial-of-service. This requires strong identity and access management, including role-based controls for both internal users and integrated systems. All data exchanged with the AI must be encrypted in transit and at rest, and exposed endpoints should be hardened, monitored, and rate-limited.

AI models themselves must be treated as potential attack surfaces. Because generative models are inherently sensitive to how they are prompted, organizations must test and validate prompts under a range of conditions to prevent unexpected behavior. Guardrails, such as input/output filtering, structured responses, and constrained generation, are essential for ensuring the AI doesn’t leak sensitive information or produce unsafe content. Any modifications to prompts, model configurations, or system instructions must be logged and subject to change control.

Pre-deployment testing should include red teaming and adversarial exercises tailored specifically to AI use cases. This includes simulating prompt injection, data exfiltration via language output, model manipulation, and other abuse scenarios. The agent’s performance and behavior must be evaluated not only under normal load, but also under stress conditions and malformed input. For agents connected to internal data, strong segmentation and access controls must be in place to ensure the AI cannot act beyond its authorized scope.

Once live, continuous monitoring becomes critical. Organizations must implement runtime observability to detect anomalous behavior, unexpected outputs, or security policy violations. This includes logging all user interactions with sufficient detail for audit and forensic purposes, while also flagging potential misuse in real time. Where applicable, content moderation pipelines or response validation layers should be deployed to suppress or review unsafe outputs before they reach end users. Fallback mechanisms, such as routing to human operators, should be triggered when confidence is low or response behavior deviates from expectations.

Cyber teams should also be prepared with an AI-specific incident response plan. This should cover also emerging failure modes unique to AI, such as model hallucinations, data leakage via completions, prompt injection-induced behavior changes, or exploitation of system prompts. Detection rules, alerts, and playbooks should be adapted to these new categories of risk.

Finally, operational discipline is critical. Model deployments must be version-controlled, prompts must be auditable, and telemetry from AI interactions should feed into ongoing threat modeling. If the AI is integrated with external plugins, APIs, or agents that perform actions (such as executing code or sending messages), those integrations must be sandboxed and governed by strict permissions and approval workflows.


Source link

About Cybernoz

Security researcher and threat analyst with expertise in malware analysis and incident response.