Research Shows AI Agents Are Highly Vulnerable To Hijacking Attacks

Some of the most widely used AI agents and assistants from Microsoft, Google, OpenAI and other major companies are susceptible to being hijacked with little or no user interaction, according to new research from Zenity Labs.

During a presentation at the Black Hat USA cybersecurity conference, Zenity researchers showed how hackers could exfiltrate data, manipulate critical workflows across targeted organizations and, in some cases, even impersonate users.

Beyond infiltrating these agents, the researchers said, attackers could also gain memory persistence, letting them maintain long-term access and control.

“They can manipulate instructions, poison knowledge sources, and completely alter the agent’s behavior,” Greg Zemlin, product marketing manager at Zenity Labs, told Cybersecurity Dive. “This opens the door to sabotage, operational disruption, and long-term misinformation, especially in environments where agents are trusted to make or support critical decisions.”

Researchers demonstrated vulnerabilities in multiple popular AI agents:

OpenAI’s ChatGPT could be compromised using an email-based prompt injection that granted them access to connected Google Drive accounts.
Microsoft Copilot Studio’s customer-support agent leaked entire CRM databases, and researchers identified more than 3,000 agents in the wild that were at risk of leaking internal tools.
Salesforce’s Einstein platform was manipulated to reroute customer communications to researcher-controlled email accounts.
Attackers could turn Google’s Gemini and Microsoft 365’s Copilot into insider threats, targeting users with social-engineering attacks and stealing sensitive conversations.

Zenity Labs disclosed its findings to the companies, and some of them issued patches immediately, although it was not at once clear what guidance the others provided.

“We appreciate the work of Zenity in identifying and responsibly reporting these techniques through a coordinated disclosure,” a Microsoft spokesperson told Cybersecurity Dive. “Our investigation determined that due to ongoing systemic improvements and updates across our platform, the reported behavior is no longer effective against our systems.”

Microsoft said Copilot agents are designed with built-in safeguards and access controls. It also said the company is committed to continuing to harden its systems against emerging attack techniques.

OpenAI confirmed that it has been in talks with the researchers and that it issued a patch to ChatGPT. The company said it maintains a bug-bounty program for the disclosure of similar issues.

Salesforce said it has fixed the issue that Zenity reported.

Google said it recently deployed new, layered defenses that address the kinds of issues that Zenity discovered.

“Having a layered defense strategy against prompt injection attacks is crucial,” a Google spokesperson said, pointing to the company’s recent blog post about AI system protections.

The research comes as AI agents advance rapidly in enterprise environments and as major companies encourage their employees to embrace the technology as a significant productivity boost.

Researchers from Aim Labs, which demonstrated similar zero-click risks involving Microsoft Copilot earlier this year, said that Zenity Labs’ results shows a concerning lack of safeguards in the fast-growing AI ecosystem.

“Unfortunately, most agent-building frameworks, including those offered by the AI giants such as OpenAI, Google, and Microsoft, lack appropriate guardrails, putting the responsibility for managing the high risk of such attacks in the hands of companies,” Itay Ravia, head of Aim Labs, told Cybersecurity Dive.

Source link

Search

Latest Posts