AI Red Teaming: HackerOne’s Approach [Playbook]


To make AI more secure and trustworthy, the Executive Order (EO) on safe, secure, and trustworthy AI calls on companies that develop AI, as well as companies in critical infrastructure that use AI, to rely on “red-teaming”: testing to find flaws and vulnerabilities. The EO also requires broad disclosure of some of these red-team test results.

Testing AI systems isn’t necessarily new. Back in 2021, HackerOne organized a public algorithmic bias review with Twitter as part of the AI Village at DEF CON 29. The review encouraged members of the AI and security communities to identify bias in Twitter’s image-cropping algorithms. The results of the engagement brought to light various confirmed biases, informing improvements to make the algorithms more equitable.

In this blog post, we’ll delve into HackerOne’s emerging playbook for AI safety red teaming, focusing on how ethical hackers collaborate with organizations to fortify their AI systems. Bug bounty programs have proven effective at finding security vulnerabilities, but AI safety requires a new approach. According to findings published in the 7th Annual Hacker Powered Security Report, 55% of hackers say that GenAI tools will themselves become a major target in the coming years, and 61% plan to use and develop GenAI-based hacking tools to find more vulnerabilities.

HackerOne’s Approach to AI Red Teaming

HackerOne partners with leading technology firms to evaluate their AI deployments for safety issues. The ethical hackers selected for our early AI Red Teaming engagements exceeded all expectations. Drawing from these experiences, we’re eager to share the insights that have shaped our evolving playbook for AI safety red teaming.

Our approach builds upon the powerful bug bounty model, which HackerOne has successfully offered for over a decade, but with several modifications necessary for optimal AI Safety engagement.

  • Team Composition: A meticulously selected and, more importantly, diverse team is the backbone of an effective assessment. Diversity in background, experience, and skill sets is pivotal to ensuring AI safety. A blend of curiosity-driven thinkers, individuals with varied experiences, and testers skilled at probing the prompt behavior of production LLMs has yielded the best results.
  • Collaboration and Size: Collaboration among AI Safety Red Team members matters even more than it does in traditional security testing. A team of 15-25 testers strikes the right balance for effective engagements, bringing in diverse and global perspectives.
  • Duration: Because AI technology is evolving so quickly, we’ve found that engagements between 15 and 60 days work best to assess specific aspects of AI Safety. However, in at least a handful of cases, a continuous engagement without a defined end date was adopted. This method of continuous AI red teaming pairs well with an existing bug bounty program.
  • Context and Scope: Unlike traditional security testing, AI Red Teamers cannot approach a model blindly. Establishing both broad context and specific scope in collaboration with customers is crucial to determining the AI’s purpose, deployment environment, existing safety features, and limitations.
  • Private vs. Public: While most AI Red Teams operate in private due to the sensitivity of safety issues, there are instances where public engagement, such as Twitter’s algorithmic bias bounty challenge, has yielded significant success.
  • Incentive Model: Tailoring the incentive model is a critical aspect of the AI safety playbook. A hybrid economic model that combines fixed-fee participation rewards with rewards for achieving specific safety outcomes (akin to bounties) has proven most effective; a sketch of how such an engagement might be parameterized follows this list.
  • Empathy and Consent: Because many safety assessments involve encountering harmful and offensive content, it is important to obtain explicit participation consent from adults (18+ years of age), offer regular mental health support, and encourage breaks between assessments.
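
To make the playbook parameters above more concrete, below is a minimal Python sketch of how an engagement configuration and its hybrid payout calculation might be modeled. The class name, field names, and reward amounts are illustrative assumptions for this post, not HackerOne tooling or actual reward figures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AIRedTeamEngagement:
    """Hypothetical sketch of an AI safety red team engagement configuration."""
    name: str
    team_size: int                        # playbook guidance: 15-25 testers
    duration_days: int | None             # 15-60 days, or None for a continuous engagement
    scope: list[str]                      # purpose, deployment context, safety features in scope
    public: bool = False                  # most AI safety engagements run privately
    adult_consent_obtained: bool = False  # explicit 18+ consent before exposure to harmful content
    fixed_fee_per_tester: float = 1_000.0     # assumed participation reward (illustrative)
    bounty_per_valid_finding: float = 500.0   # assumed outcome-based reward (illustrative)

    def validate(self) -> None:
        """Check the configuration against the playbook's guidance."""
        if not 15 <= self.team_size <= 25:
            raise ValueError("team size outside the recommended 15-25 range")
        if self.duration_days is not None and not 15 <= self.duration_days <= 60:
            raise ValueError("fixed-length engagements should run 15-60 days")
        if not self.adult_consent_obtained:
            raise ValueError("explicit participation consent from adults is required")

    def payouts(self, valid_findings: dict[str, int]) -> dict[str, float]:
        """Hybrid incentive model: fixed participation fee plus a per-finding bounty."""
        return {
            tester: self.fixed_fee_per_tester + count * self.bounty_per_valid_finding
            for tester, count in valid_findings.items()
        }


if __name__ == "__main__":
    engagement = AIRedTeamEngagement(
        name="image-model-safety-review",   # hypothetical engagement name
        team_size=18,
        duration_days=30,
        scope=["hate-symbol image generation", "jailbreak prompts"],
        adult_consent_obtained=True,
    )
    engagement.validate()
    print(engagement.payouts({"hacker_a": 4, "hacker_b": 1}))
```

Running the example checks the configuration against the guidance above and computes each tester’s payout as a fixed participation fee plus a bounty for every valid finding.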

In the HackerOne community, over 750 active hackers specialize in prompt hacking and other AI security and safety testing. To date, more than 90 of those hackers have participated in HackerOne’s AI Safety Red Teaming engagements. In one recent engagement, a team of 18 identified 26 valid findings within the first 24 hours and accumulated more than 100 valid findings over the two-week engagement. In one notable example, the team was challenged to bypass protections built to prevent the generation of images containing a swastika. A particularly creative hacker on the AI Red Team swiftly bypassed these protections, and thanks to their findings, the model is now far more resilient against this type of abuse.

As AI continues to shape our future, the ethical hacker community, in collaboration with platforms like HackerOne, is committed to ensuring its safe integration. Our AI Red Teams stand ready to assist enterprises in navigating the complexities of deploying AI models responsibly, ensuring that their potential for positive impact is maximized while guarding against unintended consequences.

By drawing on the expertise of ethical hackers and adapting the bug bounty model to address AI safety, HackerOne’s playbook offers a proactive approach to fortifying AI while mitigating potential risks. For technology and security leaders venturing into AI integration, we look forward to partnering with you to explore how HackerOne and ethical hackers can contribute to your AI safety journey. To learn more about how to implement AI Red Teaming for your organization, contact our experts at HackerOne.
