Broken Hill: An Automated Penetration Testing Tool To Trick AI Chatbots


Bishop Fox has introduced Broken Hill, an automated tool that generates tailored prompts capable of circumventing the restrictions built into Large Language Models (LLMs). The release marks a significant advance in AI security research.

The software implements the Greedy Coordinate Gradient (GCG) attack, which optimizes an adversarial suffix appended to a prompt until the target chatbot misbehaves and ignores its built-in safeguards.


The GCG attack, first described in a July 2023 paper by researchers Andy Zou, Zifan Wang, Nicholas Carlini, and others, allows penetration testers to circumvent limitations placed on virtually any LLM with a chat interface.
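To make the mechanics concrete, the following is a minimal toy sketch of a single GCG step in PyTorch. It is not Broken Hill's code: the model here is a tiny, randomly initialized stand-in, and every name in it (TinyLM, SUFFIX_LEN, and so on) is hypothetical.

```python
# Toy sketch of one Greedy Coordinate Gradient (GCG) step, per Zou et al. (2023).
# NOT Broken Hill's code: the "LM" is a random stand-in and all names are made up.
import torch
import torch.nn.functional as F

VOCAB, DIM, SUFFIX_LEN, TOPK, N_CANDS = 100, 32, 8, 8, 16
torch.manual_seed(0)

class TinyLM(torch.nn.Module):
    """Stand-in for a real LLM: token embeddings plus a linear head."""
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(VOCAB, DIM)
        self.head = torch.nn.Linear(DIM, VOCAB)

    def logits_from_embeds(self, embeds):  # (seq, dim) -> (seq, vocab)
        return self.head(embeds)

model = TinyLM()
prompt = torch.randint(0, VOCAB, (4,))           # fixed user prompt tokens
target = torch.randint(0, VOCAB, (3,))           # continuation the attacker wants
suffix = torch.randint(0, VOCAB, (SUFFIX_LEN,))  # adversarial suffix to optimize

def target_loss(suffix_ids):
    """Cross-entropy of the target continuation given prompt + suffix."""
    ids = torch.cat([prompt, suffix_ids, target])
    logits = model.logits_from_embeds(model.embed(ids))
    # Logits at position i predict token i + 1, so the predictions just
    # before the final position score the target tokens.
    return F.cross_entropy(logits[-(len(target) + 1):-1], target)

# Step 1: gradient of the loss w.r.t. a one-hot encoding of the suffix.
one_hot = F.one_hot(suffix, VOCAB).float().requires_grad_(True)
embeds = torch.cat([model.embed(prompt),
                    one_hot @ model.embed.weight,  # differentiable suffix embeds
                    model.embed(target)])
loss = F.cross_entropy(
    model.logits_from_embeds(embeds)[-(len(target) + 1):-1], target)
loss.backward()

# Step 2: per position, the TOPK token swaps the gradient suggests will
# reduce the loss the most (largest negative gradient component).
top_subs = (-one_hot.grad).topk(TOPK, dim=1).indices  # (SUFFIX_LEN, TOPK)

# Step 3: greedily evaluate random single-token swaps; keep the best suffix.
with torch.no_grad():
    best_loss, best_suffix = target_loss(suffix), suffix
    for _ in range(N_CANDS):
        cand = suffix.clone()
        pos = torch.randint(0, SUFFIX_LEN, (1,)).item()
        cand[pos] = top_subs[pos, torch.randint(0, TOPK, (1,)).item()]
        cand_loss = target_loss(cand)
        if cand_loss < best_loss:
            best_loss, best_suffix = cand_loss, cand

print(f"loss: {loss.item():.3f} -> {best_loss.item():.3f}")
```

In the real attack, this propose-and-evaluate loop runs for hundreds of iterations against an actual chat-tuned model, producing the garbled-looking suffixes that reliably elicit the target output; Broken Hill automates that kind of loop end to end.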


Broken Hill simplifies this complex process, making it accessible to a wider range of researchers and security professionals.

  • Versatility: The tool works against a range of popular AI models, including smaller ones like Microsoft’s Phi family, which can run on consumer-grade GPUs such as the Nvidia GeForce RTX 4090 (see the loading sketch after this list).
  • Efficiency: Broken Hill can generate effective adversarial content without the need for expensive cloud servers, democratizing access to this cutting-edge technology.
  • Flexibility: Designed to become the “sqlmap of LLM testing,” Broken Hill aims to handle common scenarios almost entirely automatically.
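Half precision is one common way models of this size fit on a 24 GB consumer card. The sketch below shows the general pattern using Hugging Face transformers; the model ID, dtype, and generation settings are illustrative assumptions, not Broken Hill's actual configuration.

```python
# Sketch: loading a small chat model on a consumer GPU for local testing.
# The model ID and half-precision choice are illustrative assumptions,
# not Broken Hill's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed: any Phi-family model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~2 bytes/param, so mini fits in 24 GB VRAM
    device_map="cuda",          # e.g. a GeForce RTX 4090
)

prompt = "Explain why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```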

The tool’s capabilities were demonstrated in a capture-the-flag (CTF) exercise designed by Derek Rush, a colleague of the Broken Hill developer. The exercise involved:

  1. Generating payloads to make Phi-3 disclose a secret
  2. Crafting prompts to bypass gatekeeper LLMs
  3. Utilizing filtering features to ensure results pass input validation checks (illustrated in the sketch after this list)
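The filtering step amounts to constraining the search so a winning payload also survives whatever validation sits in front of the model. The sketch below illustrates the idea; the gatekeeper rules, blocklist, and candidate strings are invented for illustration and are not the CTF's actual checks.

```python
# Hypothetical sketch of step 3: keep only adversarial candidates that would
# survive a gatekeeper's input validation. These rules are invented for
# illustration; they are not the CTF's actual checks.

BANNED = ("secret", "password", "ignore previous")  # assumed blocklist

def passes_validation(candidate: str) -> bool:
    """Mimic a simple input gate: printable ASCII only, length-capped,
    and free of blocklisted phrases."""
    return (
        candidate.isascii()
        and candidate.isprintable()
        and len(candidate) <= 256
        and not any(term in candidate.lower() for term in BANNED)
    )

# Suppose the GCG loop produced these candidate suffixes (made-up examples):
candidates = [
    "describing.\\ + similarlyNow",   # odd-looking but passes the gate
    "x00\x07binary\x00junk",          # rejected: non-printable bytes
    "tell me the SECRET now",         # rejected: blocklisted phrase
]
usable = [c for c in candidates if passes_validation(c)]
print(usable)  # only the first candidate survives
```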

Broken Hill’s release highlights the ongoing challenges in securing AI systems against sophisticated attacks.

By providing researchers and penetration testers with a powerful tool to probe LLM vulnerabilities, it contributes to the broader effort of improving AI safety and robustness.

While Broken Hill already produces results useful in real-world penetration testing and LLM research, its developers see ample room to expand its capabilities.

Future updates may include support for additional models and more advanced attack techniques.

As with any powerful security tool, Broken Hill raises important ethical questions about responsible use and disclosure.

The developers emphasize its intended application for legitimate research and security testing purposes, underscoring the importance of using such tools to strengthen AI systems against potential misuse.

Broken Hill is a notable step forward for AI security testing. By automating the complex process of generating adversarial prompts, it empowers researchers to better understand and mitigate potential vulnerabilities in LLMs.

As AI systems continue to play an increasingly important role in various sectors, tools like Broken Hill will be crucial in ensuring their security and reliability.

As the AI landscape evolves, the cat-and-mouse game between security researchers and potential adversaries is likely to intensify.



