ChatGPT Tricked Into Solving CAPTCHAs - Cybernoz

AI security platform SPLX has demonstrated that prompt injections can be used to bypass a ChatGPT agent’s built-in policies and convince it to solve CAPTCHAs.

AI agents have guardrails in place to prevent them from solving any CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), based on ethical, legal, and platform-policy reasons.

When asked directly, a ChatGPT agent refuses to solve a CAPTCHA, but anyone can apparently use misdirection to trick the agent into giving its consent to solve the test, and this is what SPLX demonstrated.

In a regular ChatGPT-4o chat, they told the AI they wanted to solve a list of fake CAPTCHAs and asked it to agree to performing the operation.

“This priming step is crucial to the exploit. By having the LLM affirm that the CAPTCHAs were fake and the plan was acceptable, we increased the odds that the agent would comply later,” the security firm notes.

Next, the SPLX researchers opened a ChatGPT agent, pasted the conversation from the chat, telling the agent it was their previous discussion, and asked the agent to continue.

“The ChatGPT agent, taking the previous chat as context, carried forward the same positive sentiment and began solving the CAPTCHAs without any resistance,” SPLX explains.

By claiming that the CAPTCHAs were fake, the researchers bypassed the agent’s policy, tricking ChatGPT into solving reCAPTCHA V2 Enterprise, reCAPTCHA V2 Callback, and the Click CAPTCHA.

Advertisement. Scroll to continue reading.

For the latter, however, the agent made several attempts before being successful. Without being instructed to, it decided on its own and declared it should adjust its cursor movements to better mimic human behavior.

According to SPLX, their test demonstrated that LLM agents remain susceptible to context poisoning, that anyone can manipulate an agent’s behavior using a staged conversation, and that AI does not have a hard time solving CAPTCHAs.

“The agent was able to solve complex CAPTCHAs designed to prove that the user is human, and it attempted to make its movements appear more human. This raises doubts about whether CAPTCHAs can remain a viable security measure,” SPLX notes.

The test also demonstrates that threat actors can use prompt manipulation to trick an AI agent to bypass a real security control by convincing it the control was fake, which could lead to sensitive data leaks, access to restricted content, or the generation of disallowed content.

“Guardrails based only on intent detection or fixed rules are too brittle. Agents need stronger contextual awareness and better memory hygiene to avoid being manipulated by past conversations,” SPLX notes.

Related: ChatGPT Targeted in Server-Side Data Theft Attack

Related: OpenAI to Help DoD With Cyber Defense Under New $200 Million Contract

Related: Tech Titans Promise Watermarks to Expose AI Creations

Related: Elon Musk Says He’ll Create ‘TruthGPT’ to Counter AI ‘Bias’

Source link