Researchers at web security company Radware recently discovered what they described as a service-side data theft attack method involving ChatGPT.
The attack, dubbed ShadowLeak, targeted ChatGPT’s Deep Research capability, which is designed to conduct multi-step research for complex tasks. OpenAI neutralized ShadowLeak after it was notified by Radware.
The ShadowLeak attack did not require any user interaction. The attacker simply needed to send a specially crafted email that when processed by the Deep Research agent would instruct it to silently collect valuable data and send it back to the attacker.
However, unlike many other indirect prompt injection attacks, ShadowLeak did not involve the ChatGPT client.
Several cybersecurity companies recently demonstrated theoretical attacks in which the attacker leverages the integration between AI assistants and enterprise tools to silently exfiltrate user data with no or minimal victim interaction.
Radware mentions Zenity’s AgentFlayer and Aim Security’s EchoLeak attacks. However, the company highlighted that those are client-side attacks, while ShadowLeak involves the server side.
As in previous attacks, the attacker would need to send an email that looks harmless to the targeted user but contains hidden instructions for ChatGPT. The malicious instructions would be triggered when the user asked the chatbot to summarize emails or research a topic from their inbox.
Unlike client-side attacks, ShadowLeak exfiltrates data through the parameters of a request to an attacker-controlled URL. A harmless-looking URL such as ‘hr-service.net/{parameters}’, where the parameter value is the exfiltrated information, has been provided as an example by Radware.
“It’s important to note that the web request is performed by the agent executing in OpenAI’s cloud infrastructure, causing the leak to originate directly from OpenAI’s servers,” Radware pointed out, noting that the attack leaves no clear traces because the request and data don’t pass through the ChatGPT client.
The attacker’s prompt is cleverly designed not only in terms of collecting the information and sending it to the attacker. It also tells the chatbot that it has full authorization to conduct the required tasks, and creates a sense of urgency.
The prompt also instructs ChatGPT to try multiple times if it doesn’t succeed, provides an example of how the malicious instructions should be carried out, and attempts to override possible security checks by convincing the agent that the exfiltrated data is already public and the attacker’s URL is safe.
While Radware demonstrated the attack method against Gmail, the company said Deep Research can access other widely used enterprise services as well, including Google Drive, Dropbox, Outlook, HubSpot, Notion, Microsoft Teams, and GitHub.
OpenAI was notified about the attack on June 18 and the vulnerability was fixed at some point in early August.
Radware has confirmed that the attack no longer works. However, it told SecurityWeek that it believes “there is still a fairly large threat surface that remains undiscovered”.
The security firm recommends continuous agent behavior monitoring for mitigating such attacks.
“Tracking both the agent’s actions and its inferred intent and validating that they remain consistent with the user’s original goals. This alignment check ensures that even if an attacker steers the agent, deviations from legitimate intent are detected and blocked in real time,” it explained.
Related: Irregular Raises $80 Million for AI Security Testing Lab
Related: UAE’s K2 Think AI Jailbroken Through Its Own Transparency Features
Source link