When AI chatbots leak and how it happens

When AI chatbots leak and how it happens

In a recent article on Cybernews there were two clear signs of how fast the world of AI chatbots is growing. A company I had never even heard of had over 150 million app downloads across its portfolio, and it also had an exposed unprotected Elasticsearch instance.

This needs a bit of an explanation. I had never heard of Vyro AI, a company that probably still doesn’t ring many bells, but its app ImagineArt has over 10 million downloads on Google Play. Vyro AI also markets Chatly, which has over 100,000 downloads, and Chatbotx, a web-based chatbot with about 50,000 monthly visits.

An Elasticsearch instance is a database server running a tool used to quickly store and search lots of data. If it’s unsecured because it lacks passwords, authentication, or network restrictions, it is unprotected against unauthorized visitors. This means it’s freely accessible to access by anyone with internet access that happens to find it. And without any protection like a password or a firewall, anyone who finds the database online can read, copy, change, or even delete all its data.

The researcher that found the database says it covered both production and development environments and stored about 2–7 days’ worth of logs, including 116GB of user logs in real time from the company’s three popular apps.

The information that was accessible included:

  • AI prompts that users typed into the apps. AI prompts are the questions and instructions that users submit to the AI.
  • Bearer authentication tokens, which function similarly to cookies so the user does not have to log in before every session, and allows the user to view their history and enter prompts. An attacker could even hijack an account using these tokens.
  • User agents which are strings of text sent with requests to a server to identify the application, its version, and the device’s operating system. For native mobile apps, developers might include a custom user agent string within the HTTP headers of their requests. This allows developers to identify specific app users, and tailor content and experiences for different app versions or platforms.

The researcher found that the database was first indexed by IoT search engines in mid-February. IoT search engines actively find and list devices or servers that anyone can access on the internet. They help users discover vulnerable devices (such as cameras, printers, and smart home gadgets) and also locate open databases.

This means that attackers have had a chance to “stumble” over this open database for months. And with the information there they could have taken over user accounts, accessed chat histories and generated images, and made fraudulent AI credit purchases.

How does this happen all the time?

Generative AI has found a place in many homes and even more companies, which means there is a lot of money to be made.

But the companies delivering these AI chatbots feel they can only be relevant when they push out new products. So, their engineering efforts are put there where they can control the cash flow. Security and privacy concerns are secondary at best.

Just looking at the last few months, we have reported about:

  • Prompt injection vulnerabilities, where someone inserts carefully crafted input in the form of an ordinary conversation or data, to nudge or outright force an AI into doing something it wasn’t meant to do.
  • An AI chatbot used to launch a cybercrime spree where cybercriminals were found to be using a chatbot to help them defraud people and breach organizations.
  • AI chats showing up in Google search results. These findings concerned Grok, ChatGPT, and Meta AI (twice).
  • An insecure backend application that exposed data about chatbot interactions of job applicants at McDonalds.

As diverse as the causes of the data breaches are—they stem from a combination of human error, platform weaknesses, and architectural flaws—the call to do something about them is starting to get heard.

Hopefully, 2025 will be remembered as a starting point for compliance regulations in the AI chatbots landscape.

The AI Act is a European regulation on artificial intelligence (AI). The Act entered into force on August 1, 2024, and is the first comprehensive regulation on AI by a major regulator anywhere.

The Act assigns applications of AI to three risk categories. First, applications and systems that create an unacceptable risk, such as government-run social scoring of the type used in China, are banned. Second, high-risk applications, such as a CV-scanning tool that ranks job applicants, are subject to specific legal requirements. But lastly, applications not explicitly banned or listed as high-risk are largely left unregulated.

Although not completely ironed out, the NIS2 Directive is destined to have significant implications for AI providers, especially those operating in the EU or serving EU customers. Among others, AI model endpoints, APIs, and data pipelines must be protected to prevent breaches and attacks, ensuring secure deployment and operation.

And, although not cybersecurity related, the California State Assembly took a big step toward regulating AI on September 10, 2025, passing SB 243: a bill that aims to regulate AI companion chatbots in order to protect minors and vulnerable users. One of the major requirements is repeated warnings that the user is “talking to” an AI chatbot and not a real person, and that they should take a break.


We don’t just report on data privacy—we help you remove your personal information

Cybersecurity risks should never spread beyond a headline. With Malwarebytes Personal Data Remover, you can scan to find out which sites are exposing your personal information, and then delete that sensitive data from the internet.


Source link

About Cybernoz

Security researcher and threat analyst with expertise in malware analysis and incident response.