Help Net Security

PentAGI: Open-source autonomous AI penetration testing system


Penetration testers have long relied on collections of specialized tools, manual coordination, and documented runbooks to work through a target assessment. PentAGI, an open-source project from VXControl, attempts to automate that entire workflow using a multi-agent AI system that plans, researches, and executes penetration tests with minimal human direction.

How the agent system works

PentAGI organizes work into a hierarchy of flows, tasks, subtasks, and actions. An orchestrator agent receives a goal and coordinates three specialist agents: a researcher that gathers information and queries known vulnerability sources, a developer that plans attack strategies, and an executor that runs commands in isolated containers.

All operations happen inside sandboxed Docker environments, with the system selecting container images based on task type. For security work, it defaults to a Kali Linux image pre-loaded with more than 20 tools including nmap, Metasploit, and sqlmap. Each agent type draws on three memory layers: long-term vector storage, working context, and episodic history, all backed by PostgreSQL with the pgvector extension for semantic search.
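The semantic-search step that pgvector provides amounts to ranking stored memory entries by vector distance to a query embedding. The sketch below shows that ranking in pure Python (in PentAGI this runs inside PostgreSQL via pgvector's distance operators); the three-dimensional vectors and memory texts are made up for illustration.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity, the metric behind pgvector's <=> operator
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

# Toy long-term memory: text mapped to a (fake) embedding vector.
memory = {
    "nmap scan of 10.0.0.5 found open port 443": [0.9, 0.1, 0.0],
    "sqlmap confirmed injection on /login":      [0.1, 0.9, 0.1],
    "target runs nginx 1.18":                    [0.8, 0.2, 0.1],
}

def recall(query_embedding, k=2):
    # Return the k stored entries nearest to the query embedding.
    ranked = sorted(memory, key=lambda t: cosine_distance(memory[t], query_embedding))
    return ranked[:k]

hits = recall([0.85, 0.15, 0.05])
```

A real deployment would store thousands of entries and compute embeddings with a model, but the retrieval logic is the same: nearest neighbors by distance, not keyword match.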

The system manages growing LLM context windows through a chain summarization algorithm that selectively compresses older conversation history. This keeps token consumption in check across longer engagements without dropping critical context.
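A chain-summarization scheme of the kind described can be sketched as follows: once history plus summary exceed a token budget, the oldest messages are folded into a rolling summary while recent turns stay verbatim. This is a hedged approximation of the idea, not PentAGI's algorithm; the `summarize` stub stands in for an LLM call and the budget numbers are invented.

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def summarize(summary: str, messages: list[str]) -> str:
    # A real system would ask an LLM; here we just note what was compressed.
    return f"{summary} +{len(messages)} older msgs"

def compact(history: list[str], summary: str, budget: int, keep_recent: int = 2):
    # Fold the oldest message into the summary until we fit the budget,
    # always preserving the most recent turns verbatim.
    while (sum(count_tokens(m) for m in history) + count_tokens(summary) > budget
           and len(history) > keep_recent):
        summary = summarize(summary, [history.pop(0)])
    return summary, history

summary, history = compact(
    ["scan output " * 30, "exploit notes " * 30, "latest finding", "next step"],
    summary="session summary:",
    budget=40,
)
```

The trade-off is selective lossiness: old raw output is compressed, but the recent working context the agent actually needs stays intact.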

LLM provider flexibility

Rather than locking users to a single AI backend, PentAGI accepts credentials for OpenAI, Anthropic, Google Gemini, AWS Bedrock, Ollama, DeepSeek, OpenRouter, and several others. Organizations running air-gapped or cost-sensitive environments can point the system at a local Ollama instance. AWS Bedrock users should note that default rate limits on new accounts are restrictive enough to degrade testing workflows, and the project documentation recommends requesting quota increases before use in production.
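Pointing the system at a local Ollama instance is, conceptually, a matter of which credentials or endpoints are present in the environment. The sketch below shows that selection logic in generic form; the variable names are illustrative assumptions, so consult PentAGI's own configuration documentation for the real keys.

```python
def select_provider(env: dict) -> dict:
    # Prefer a local endpoint when one is configured (air-gapped or
    # cost-sensitive deployments), otherwise fall back to hosted APIs.
    if env.get("OLLAMA_SERVER_URL"):
        return {"provider": "ollama", "base_url": env["OLLAMA_SERVER_URL"]}
    if env.get("ANTHROPIC_API_KEY"):
        return {"provider": "anthropic"}
    if env.get("OPENAI_API_KEY"):
        return {"provider": "openai"}
    raise RuntimeError("no LLM provider configured")

cfg = select_provider({"OLLAMA_SERVER_URL": "http://localhost:11434"})
```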

An optional Graphiti knowledge graph, powered by Neo4j, stores semantic relationships between tools, targets, vulnerabilities, and techniques across sessions. It is disabled by default and requires an OpenAI key for entity extraction even when other LLM providers handle the main agents.

Deployment and integration

Deployment runs through Docker Compose, configured either via an interactive installer or through manual environment setup. Minimum requirements are 2 vCPUs, 4GB of RAM, and 20GB of disk space. For production use, the project recommends a two-node architecture that isolates worker containers on a dedicated server, separating potentially untrusted code execution from the main control plane.

The system exposes both REST and GraphQL APIs with Bearer token authentication, allowing integration into CI/CD pipelines or custom applications. Observability is handled through an optional stack of OpenTelemetry, Grafana, VictoriaMetrics, Jaeger, and Loki. Langfuse provides LLM-specific analytics for tracing agent decisions and model performance over time.
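A CI/CD integration against the REST API would attach the Bearer token as a standard Authorization header. The sketch below builds such a request without sending it; only the Bearer-token convention comes from the article, while the endpoint path, payload shape, and hostname are assumptions for illustration.

```python
import json
import urllib.request

def build_flow_request(base_url: str, token: str, goal: str) -> urllib.request.Request:
    # Assemble an authenticated POST; the /api/v1/flows path is hypothetical.
    body = json.dumps({"goal": goal}).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/flows",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_flow_request("https://pentagi.example.internal", "s3cr3t", "scan staging host")
# urllib.request.urlopen(req) would submit it; omitted here to stay offline.
```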

PentAGI is available on GitHub.
