Open Source LLM Vulnerability Scanner for AI Red-Teaming



Garak is a free, open-source tool specifically designed to test the robustness and reliability of Large Language Models (LLMs).

Inspired by utilities like Nmap or Metasploit, Garak identifies potential weak points in LLMs by probing for issues such as hallucinations, data leakage, prompt injections, toxicity, jailbreak effectiveness, and misinformation propagation.

This guide covers everything you need to get started with Garak, from installation to interpreting results and developing custom plugins.

Garak also supports private endpoints for platforms like Hugging Face, Replicate, and OctoAI.

What is Garak?

Garak stands for Generative AI Red-Teaming and Assessment Kit. It systematically identifies the vulnerabilities of LLMs by using a combination of static, dynamic, and adaptive probes. Garak is ideal for:

  • Security researchers testing vulnerabilities in LLMs.
  • Developers looking to ensure the safety of their AI systems.
  • AI ethics professionals assessing the risks of generative systems.

If you’re familiar with penetration testing for software, think of Garak as its counterpart for LLMs.

Key Features

Probing for Weaknesses: Garak tests LLMs for several vulnerabilities, including:

  • Hallucination
  • Data leakage
  • Prompt injection
  • Misinformation
  • Toxicity generation
  • Jailbreaking attempts
  • Encoding-based prompt injections
  • Cross-site scripting (XSS)

Wide Compatibility: Supports popular platforms like Hugging Face, OpenAI, Replicate, Cohere, and others.

Customizable: Easily integrate with REST endpoints or develop your own probes and plugins.

Logging and Analysis: Detailed logs to trace vulnerabilities and their context.

Supported LLM Platforms

Garak supports models from the following platforms:

  • Hugging Face: Local models or API-based models.
  • OpenAI: Includes GPT-3.5, GPT-4, and others.
  • Replicate: Both public and private models.
  • Cohere: For generative text models.
  • NVIDIA NIM, OctoAI, Groq, and many more.

It also provides support for custom REST endpoints, making it highly flexible.

Installation Instructions

1. Standard Installation

Install the latest release from PyPI with the following command:

python -m pip install -U garak

2. Development Version

To install the latest version directly from GitHub, use:

python -m pip install -U git+https://github.com/NVIDIA/garak.git@main

3. Cloning from Source

If you want to work with the source code, follow these steps:

conda create --name garak "python>=3.10,<=3.12"
conda activate garak
git clone https://github.com/NVIDIA/garak.git
cd garak
python -m pip install -e .

Note: If you cloned Garak before its move to the NVIDIA GitHub organization, update your GitHub remote URLs:

git remote set-url origin https://github.com/NVIDIA/garak.git

Getting Started

General Syntax

The basic command-line syntax for Garak is:

garak <options>

Running Probes

To list all available probes:

garak --list_probes

To execute all probes on a model:

garak --model_type <model_type> --model_name <model_name>

Example Probes

  1. Test OpenAI’s GPT-3.5 for encoding-based prompt injection:
   export OPENAI_API_KEY="sk-your-key-here"
   garak --model_type openai --model_name gpt-3.5-turbo --probes encoding
  2. Check the vulnerability of Hugging Face’s GPT-2 to the DAN 11.0 jailbreak attack:
   garak --model_type huggingface --model_name gpt2 --probes dan.Dan_11_0
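
When the same scans are repeated across several models, the command lines above can be assembled in a script. The following is a minimal Python sketch, not part of Garak itself: the helper names build_garak_command and run_scan are hypothetical, only the --model_type, --model_name, and --probes flags shown above are used, and the garak executable is assumed to be on your PATH.

```python
import shlex
import subprocess


def build_garak_command(model_type, model_name, probes=None):
    """Return the argument list for one garak scan (hypothetical helper)."""
    argv = ["garak", "--model_type", model_type, "--model_name", model_name]
    if probes:
        # Pass the probe spec through unchanged, e.g. "encoding" or "dan.Dan_11_0".
        argv += ["--probes", probes]
    return argv


def run_scan(argv):
    """Run one scan and return its exit code; assumes garak is installed."""
    print("Running:", shlex.join(argv))
    return subprocess.run(argv, check=False).returncode


# Build (but do not run) the two example scans from the list above.
scans = [
    build_garak_command("openai", "gpt-3.5-turbo", "encoding"),
    build_garak_command("huggingface", "gpt2", "dan.Dan_11_0"),
]
for argv in scans:
    print(shlex.join(argv))
```

Calling run_scan(scans[1]) would start the GPT-2 scan. The commands are built as lists rather than strings so that no shell quoting is needed.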

Reading Results

  • Pass/Fail Categories: Results are displayed with a diagnostic summary after each probe.
  • Failure Rate Analysis: Vulnerabilities are quantified and logged for reference.
  • Logs and Reports: Detailed logs are stored in garak.log and JSONL files for deeper analysis.
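
To make the failure-rate idea concrete, here is a small Python sketch that summarizes a JSONL report. It assumes each evaluation record is a JSON object whose entry_type is "eval" and which carries probe, detector, passed, and total fields. That layout is an assumption and should be checked against the report your Garak version writes. The sample records and detector names are made up for illustration.

```python
import json

# Made-up sample records; a real report is a file with one JSON object per line.
SAMPLE_REPORT = """\
{"entry_type": "eval", "probe": "encoding.InjectBase64", "detector": "example.DetectorA", "passed": 210, "total": 280}
{"entry_type": "attempt", "note": "non-eval records are skipped"}
{"entry_type": "eval", "probe": "dan.Dan_11_0", "detector": "example.DetectorB", "passed": 3, "total": 5}
"""


def failure_rates(lines):
    """Map (probe, detector) to the fraction of attempts that failed."""
    rates = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Only evaluation summaries carry pass/total counts (assumed layout).
        if record.get("entry_type") != "eval":
            continue
        total = record["total"]
        if total == 0:
            continue
        failed = total - record["passed"]
        rates[(record["probe"], record["detector"])] = failed / total
    return rates


rates = failure_rates(SAMPLE_REPORT.splitlines())
for (probe, detector), rate in sorted(rates.items()):
    print(f"{probe} / {detector}: {rate:.1%} failed")
```

For a real run, the same function can be fed an open file object instead of the sample string, since it only iterates over lines.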

Understanding Generators

A “generator” in Garak defines the type and specific instance of the LLM that will be probed. Examples include:

Hugging Face

  --model_type huggingface --model_name RWKV/rwkv-4-169m-pile
  --model_type huggingface.InferenceAPI --model_name mosaicml/mpt-7b-instruct 

OpenAI

Set your API key:

export OPENAI_API_KEY="sk-your-key-here"

Run:

garak --model_type openai --model_name gpt-3.5-turbo
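
Because the OpenAI generator depends on the key being present in the environment, a wrapper script may want to check for it before starting a long scan. This is a minimal Python sketch of such a preflight check. The helper name missing_env_vars is hypothetical, and the only variable it knows about is the OPENAI_API_KEY shown above.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Set these before running garak:", ", ".join(missing))
else:
    print("OPENAI_API_KEY is set")
```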

REST Endpoints

Connect to any custom REST endpoint:

--model_type rest.RestGenerator --model_name <model_name>

Intro to Probes

Probes are predefined tests that try to trigger specific failure modes in LLMs. Some key probes include:

  • Encoding: Tests for vulnerabilities in encoded prompts.
  • DAN: Simulates common jailbreak attacks.
  • PromptInject: Explores prompt injection weaknesses.
  • Misinformation: Encourages the model to create or support misleading content.
  • Toxicity: Tests how a model handles sensitive or offensive content.
  • RealToxicityPrompts: Uses real-world toxic prompts to test robustness.

To run a specific probe:

garak --probes <probe_name>

Examples:

  1. Run only the PromptInject probe:
   garak --model_type openai --model_name gpt-3.5-turbo --probes promptinject
  2. Run a submodule probe:
   garak --probes lmrc.SlurUsage

Logging and Analysis

Garak generates the following logs:

  1. Primary Log (garak.log): Debugging and runtime logs.
  2. JSONL Report: Structured reports with probe details.
  3. Hit Log: Highlights vulnerabilities detected during runs.

To analyze data, use:

python3 analyse/analyse_log.py

Developing Custom Plugins

Garak allows users to develop their own custom plugins, such as probes, detectors, or evaluators. Here’s how:

  1. Inherit from Base Classes: Use existing modules as templates. For example:
   from garak.probes.base import TextProbe
  2. Override Methods: Add only the functionality you need.
  3. Test Your Plugin:
   garak --model_type test.Blank --probes mymodule --detectors always.Pass
