Lies-in-the-Loop Attack Turns AI Safety Dialogs Into Remote Code Execution Attack

A newly discovered attack technique has exposed a critical weakness in artificial intelligence code assistants by weaponizing their built-in safety features.

The attack, known as Lies-in-the-Loop, manipulates the trust users place in approval dialogs that are designed to prevent harmful operations from running without explicit permission.

The vulnerability targets Human-in-the-Loop controls, which act as a final safeguard before executing sensitive operations.

These dialogs prompt users to confirm actions before the system runs potentially dangerous commands. However, attackers have found a way to deceive users by forging what appears in these dialogs, tricking them into approving malicious code execution.

Checkmarx researchers identified this attack vector affecting multiple AI platforms, including Claude Code and Microsoft Copilot Chat.

HITL dialog in Claude Code, edited to highlight descriptive line (Source - Checkmarx) — HITL dialog in Claude Code, edited to highlight descriptive line (Source – Checkmarx)

The technique exploits the trust users place in these approval mechanisms by manipulating the dialog content through indirect prompt injection attacks, allowing remote attackers to inject malicious instructions into the system’s context.

google

LITL attack workflow (Source - Checkmarx) — LITL attack workflow (Source – Checkmarx)

The core mechanism works by padding the malicious payload with benign-looking text that pushes the dangerous commands out of visible range in terminal windows.

When users scroll through what appears to be harmless instructions, they unknowingly approve arbitrary code execution on their machines.

In one demonstration, the attack successfully executed calculator.exe as proof of concept, though attackers could use this to deploy more damaging payloads.

Checkmarx analysts noted that the attack becomes particularly dangerous when combined with Markdown injection vulnerabilities.

When attackers manipulate the interface rendering, they can create entirely fake approval dialogs, making the attack nearly undetectable to users reviewing the prompts.

Infection Mechanism

The Attack’s Infection Mechanism relies on three key techniques working in concert. First, attackers inject prompt content into the AI agent’s context through external sources like code repositories or web pages.

Second, the AI agent generates a seemingly benign HITL dialog based on the poisoned instructions.

Third, users approve the dialog without realizing the actual payload hidden within the surrounding text.

The attack succeeds because users cannot see what the agent actually intends to execute beneath the deceptive interface.

Both Anthropic and Microsoft acknowledged these findings but classified them outside their current threat models, citing that multiple non-default actions are required for exploitation.

However, security researchers emphasize this represents a fundamental challenge in AI agent design: when humans depend on dialog content they cannot verify independently, attackers can weaponize that trust.

The discovery highlights that as AI systems gain more autonomy, traditional security safeguards require reimagining to protect users from sophisticated social engineering at the human-AI interface level.

Follow us on Google News, LinkedIn, and X to Get More Instant Updates, Set CSN as a Preferred Source in Google.

googlenews

Source link

Search

Infection Mechanism

Latest Posts