Veteran security engineer Niels Provos is working on a new technical approach designed to stop autonomous AI agents from taking actions you haven’t specifically authorized.
His open-source software solution, called IronCurtain, aims to neutralize the risk of an LLM-powered agent “going rogue” – whether through prompt injection or the agent gradually deviating from the user’s original intent over the course of a long session.
How does IronCurtain work?
In the last few months, there have been reports of autonomous AI agents going off the rails due to agentic misalignment.
Instead of allowing agents unlimited access to the user's system, IronCurtain ensures that the agent never interacts with it directly: its intended actions are first analyzed by a separate trusted process.
“Every agent, whether a direct LLM session or Claude Code running in a Docker container, goes through the same pipeline,” says Provos.
Once the user gives it an instruction, the agent writes TypeScript code that runs inside an isolated V8 virtual machine (a V8 isolate), and issues typed function calls that map to MCP tool calls (i.e., requests the agent sends to external tools through the Model Context Protocol in order to get things done).
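To make this concrete, here is a minimal TypeScript sketch of what such typed function calls might look like from inside the isolate. The interface and function names below are illustrative assumptions, not IronCurtain's actual API:

```typescript
// Hypothetical typed surface exposed to agent-generated code inside the
// V8 isolate. Each method corresponds to one MCP tool the agent may request;
// none of them touch the host system directly.
interface SandboxTools {
  readFile(path: string): Promise<string>;
  writeFile(path: string, contents: string): Promise<void>;
  gitCommit(message: string): Promise<string>;
}

// Agent-generated code never calls the OS; it only issues calls like these,
// each of which becomes an MCP tool-call request for the proxy to judge.
async function renameTodos(tools: SandboxTools): Promise<void> {
  const text = await tools.readFile("notes/todo.md");
  await tools.writeFile("notes/todo.md", text.replace(/TODO/g, "DONE"));
}
```

Because the isolate only exposes these typed entry points, every side effect the agent wants must pass through the trusted process first.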
These tool-call requests are forwarded to the trusted process – an MCP proxy that acts as a policy engine and decides whether each call should be allowed, denied (blocked), or escalated to a human for approval.
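A rough sketch of how a proxy could act on each verdict, with the decision, forwarding, and approval steps stubbed out (all names here are assumptions; IronCurtain's real interfaces may differ):

```typescript
// Hypothetical proxy dispatch: allow forwards the call, deny blocks it,
// escalate pauses for human approval before forwarding.
type Verdict = "allow" | "deny" | "escalate";

interface ToolCall { tool: string; args: Record<string, unknown>; }

async function handle(
  call: ToolCall,
  decide: (c: ToolCall) => Verdict,            // the policy engine
  forward: (c: ToolCall) => Promise<unknown>,  // the real MCP server
  askHuman: (c: ToolCall) => Promise<boolean>, // approval prompt
): Promise<unknown> {
  switch (decide(call)) {
    case "allow":
      return forward(call);
    case "deny":
      throw new Error(`blocked by policy: ${call.tool}`);
    case "escalate":
      if (await askHuman(call)) return forward(call);
      throw new Error(`rejected by user: ${call.tool}`);
  }
}
```

The key property is that the agent only ever sees results or errors; it cannot bypass `decide` to reach the tool servers.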
The four layers of IronCurtain (Source: Niels Provos)
The decisions of this policy engine rely on a “constitution”: a set of guiding principles and concrete guidance written in plain English by the user and “translated” into a security policy by IronCurtain.
“A compiler LLM translates the English into per-interface rules using a library of verified policy primitives. A test scenario generator creates cases designed to find gaps and contradictions. A verifier checks that the compiled rules match the original intent. A judge iteratively refines the policy until it meets the spirit of the constitution as well as it can,” Provos explains.
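The compile-test-verify-refine loop Provos describes could be sketched as follows, with the LLM stages passed in as functions. Every name and type here is an illustrative assumption, not IronCurtain's code:

```typescript
// Hypothetical skeleton of the constitution-compilation pipeline:
// compile once, then generate test scenarios, verify, and let the judge
// refine the policy until it passes (or a round limit is hit).
interface Policy { rules: string[]; }

function compileConstitution(
  constitution: string,
  compile: (text: string) => Policy,               // compiler LLM (stubbed)
  generateTests: (p: Policy) => string[],          // test scenario generator
  verify: (p: Policy, tests: string[]) => boolean, // verifier
  refine: (p: Policy) => Policy,                   // judge
  maxRounds = 5,
): Policy {
  let policy = compile(constitution);
  for (let round = 0; round < maxRounds; round++) {
    const tests = generateTests(policy);
    if (verify(policy, tests)) break; // compiled rules match original intent
    policy = refine(policy);          // iterate toward the constitution's spirit
  }
  return policy;
}
```

In the real system each stage would be an LLM call; the loop structure is what matters here: the policy that reaches the proxy is the output of repeated verification, not a single translation pass.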
“Evaluation happens in two phases. First, structural invariants: protected paths (like the constitution itself and audit logs) are always denied, sandbox-contained paths are auto-allowed, and unknown tools are rejected. Second, compiled policy rules: each argument is checked against the rules generated from your constitution. Each argument role is evaluated independently and the most restrictive result wins.”
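The two-phase evaluation described above might look roughly like this in code; the paths, tool names, and rule representation are assumptions made for illustration:

```typescript
// Hypothetical two-phase evaluator: structural invariants first, then
// compiled per-argument rules combined so the most restrictive result wins.
type Verdict = "allow" | "deny" | "escalate";

// Most-restrictive-wins ordering: deny > escalate > allow.
const severity: Record<Verdict, number> = { allow: 0, escalate: 1, deny: 2 };
const mostRestrictive = (a: Verdict, b: Verdict): Verdict =>
  severity[a] >= severity[b] ? a : b;

interface ToolCall { tool: string; args: Record<string, string>; }

// Illustrative placeholders for protected and sandbox-contained paths.
const PROTECTED_PREFIXES = ["/ironcurtain/constitution", "/ironcurtain/audit"];
const SANDBOX_PREFIX = "/sandbox/";
const KNOWN_TOOLS = new Set(["fs/read", "fs/write", "git/commit"]);

// A compiled rule evaluates one argument independently.
type Rule = (arg: string) => Verdict;

function evaluate(call: ToolCall, rules: Record<string, Rule>): Verdict {
  // Phase 1: structural invariants.
  if (!KNOWN_TOOLS.has(call.tool)) return "deny"; // unknown tools rejected
  const values = Object.values(call.args);
  if (values.some((v) => PROTECTED_PREFIXES.some((p) => v.startsWith(p)))) {
    return "deny"; // protected paths are always denied
  }
  if (values.length > 0 && values.every((v) => v.startsWith(SANDBOX_PREFIX))) {
    return "allow"; // sandbox-contained paths are auto-allowed
  }
  // Phase 2: compiled policy rules, one verdict per argument,
  // combined so the most restrictive result wins.
  let verdict: Verdict = "allow";
  for (const [name, value] of Object.entries(call.args)) {
    const rule = rules[name] ?? ((): Verdict => "escalate");
    verdict = mostRestrictive(verdict, rule(value));
  }
  return verdict;
}
```

Note how the invariants run before any compiled rule: even a badly compiled constitution cannot open up the audit logs or the constitution file itself.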
Once the tool-call requests have been allowed, either by the trusted process or by the human, they are forwarded to standard Model Context Protocol servers that provide filesystem access, git operations, and other capabilities. These servers then execute the requested actions.
Results flow back through the trusted process to the agent, which is never allowed to access the user's filesystem, sensitive credentials (e.g., OAuth tokens, API keys, service account secrets), or environment variables. It is also prevented from accessing or modifying its own policy files, audit logs, and configuration.
IronCurtain is still in development, and Provos describes it as an early research effort. The code has been released publicly so developers and security researchers can test the approach and suggest improvements.