A new open-source framework is reshaping how security professionals approach penetration testing by placing multiple large language models directly at the helm of automated security assessments.
Guardian, developed by Zakir Kun and available on GitHub, is an AI-powered penetration testing automation framework billed as enterprise-grade. It orchestrates OpenAI GPT-4, Anthropic Claude, Google Gemini, and models reached through OpenRouter in a unified multi-agent architecture, with the aim of delivering adaptive security assessments with complete evidence capture.
Guardian’s design centers on collaboration among agents. Rather than treating AI as a passive assistant, it divides the work among four specialized agents: Planner, Tool Selector, Analyst, and Reporter, which cooperate throughout each engagement.
The Planner agent defines the assessment strategy, the Tool Selector determines which of the 19 integrated security tools to invoke, the Analyst interprets findings and suppresses false positives, and the Reporter generates professional-grade documentation.
This pipeline enables the framework to adapt its tactics dynamically based on discovered vulnerabilities and system responses, simulating the decision-making of an experienced human pentester.
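The hand-off between the four agents can be pictured with a minimal Python sketch. Every name here (`plan`, `select_tools`, `analyze`, `report`, the `Finding` type, and the phase-to-tool mapping) is hypothetical and stands in for an LLM-backed agent; none of it is taken from Guardian's code, and no model or tool is actually called.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    tool: str
    detail: str
    false_positive: bool = False


def plan(target: str) -> list[str]:
    # Planner: decide which assessment phases to run (stubbed, no LLM call).
    return ["recon", "web"]


def select_tools(phase: str) -> list[str]:
    # Tool Selector: map a phase to tools (hypothetical mapping).
    mapping = {"recon": ["subfinder"], "web": ["httpx", "nuclei"]}
    return mapping.get(phase, [])


def analyze(findings: list[Finding]) -> list[Finding]:
    # Analyst: drop findings that were flagged as false positives.
    return [f for f in findings if not f.false_positive]


def report(target: str, findings: list[Finding]) -> str:
    # Reporter: render a tiny Markdown summary.
    lines = [f"# Report for {target}"]
    lines += [f"- {f.tool}: {f.detail}" for f in findings]
    return "\n".join(lines)


def run_pipeline(target: str) -> str:
    findings: list[Finding] = []
    for phase in plan(target):
        for tool in select_tools(phase):
            # A real run would execute the tool; this stub records a placeholder
            # and arbitrarily marks the httpx result as a false positive.
            findings.append(
                Finding(tool, f"placeholder result from {phase} phase",
                        false_positive=(tool == "httpx"))
            )
    return report(target, analyze(findings))
```

Calling `run_pipeline("example.com")` returns a short Markdown report in which the flagged result has been filtered out by the Analyst step before the Reporter sees it.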
Guardian integrates 19 battle-tested security tools across key domains, including network scanning with Nmap and Masscan, web reconnaissance via httpx, WhatWeb, and Wafw00f, subdomain discovery through Subfinder, Amass, and DNSRecon, vulnerability scanning with Nuclei, Nikto, SQLMap, and WPScan, SSL/TLS analysis using TestSSL and SSLyze, content discovery via Gobuster, FFuf, and Arjun, and advanced security analysis through XSStrike, GitLeaks, and CMSeeK.
| Tool | Category | Purpose |
|---|---|---|
| Nmap | Network | Comprehensive port scanning and service detection |
| Masscan | Network | Ultra-fast large-scale port scanning |
| httpx | Web Reconnaissance | HTTP probing and response analysis |
| WhatWeb | Web Reconnaissance | Technology fingerprinting |
| Wafw00f | Web Reconnaissance | Web Application Firewall (WAF) detection |
| Subfinder | Subdomain Discovery | Passive subdomain enumeration |
| Amass | Subdomain Discovery | Active and passive network mapping |
| DNSRecon | Subdomain Discovery | DNS enumeration and analysis |
| Nuclei | Vulnerability Scanning | Template-based vulnerability scanning |
| Nikto | Vulnerability Scanning | Web server vulnerability scanning |
| SQLMap | Vulnerability Scanning | Automated SQL injection detection and exploitation |
| WPScan | Vulnerability Scanning | WordPress-specific vulnerability scanning |
| TestSSL | SSL/TLS Testing | Cipher suite and protocol analysis |
| SSLyze | SSL/TLS Testing | Advanced SSL/TLS configuration analysis |
| Gobuster | Content Discovery | Directory and file brute-forcing |
| FFuf | Content Discovery | Advanced web fuzzing |
| Arjun | Content Discovery | HTTP parameter discovery |
| XSStrike | Security Analysis | Advanced XSS detection and exploitation |
| GitLeaks | Security Analysis | Secret and credential scanning in repositories |
| CMSeeK | Security Analysis | CMS detection and enumeration |
| DNSRecon | Security Analysis | DNS enumeration (dual-use across categories) |
Crucially, the framework operates even if only a subset of these tools is installed; the AI adapts its testing approach based on available resources and discovered attack surface.
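One plausible way to discover which tools are installed is to look for their executables on the system `PATH`. The sketch below uses Python's standard `shutil.which`; the list of binary names is an assumption for illustration, and how Guardian itself detects tools is not described in the source.

```python
import shutil

# Hypothetical binary names; the executables Guardian actually looks for may differ.
CANDIDATE_TOOLS = ["nmap", "masscan", "httpx", "nuclei", "sqlmap"]


def available_tools(candidates: list[str]) -> list[str]:
    """Return the candidates that resolve to an executable on PATH."""
    return [name for name in candidates if shutil.which(name) is not None]
```

A planner could then restrict its choices to `available_tools(CANDIDATE_TOOLS)` rather than failing when a binary is missing.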
Asynchronous execution allows up to three tools to run in parallel by default, significantly reducing overall assessment duration.
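A cap of three concurrent tools can be modeled with an `asyncio.Semaphore`. This is a sketch of the general technique, not Guardian's implementation: `run_tool` and `run_all` are hypothetical names, and a short sleep stands in for launching a real scanner.

```python
import asyncio

MAX_PARALLEL = 3  # default concurrency limit described in the text


async def run_tool(name: str, sem: asyncio.Semaphore, state: dict) -> str:
    async with sem:  # at most MAX_PARALLEL coroutines get past this point
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # stand-in for running the real tool
        state["running"] -= 1
        return f"{name}: done"


async def run_all(names: list[str]) -> tuple[list[str], int]:
    sem = asyncio.Semaphore(MAX_PARALLEL)
    state = {"running": 0, "peak": 0}
    results = await asyncio.gather(*(run_tool(n, sem, state) for n in names))
    # Return results in input order plus the peak concurrency observed.
    return list(results), state["peak"]
```

Running `asyncio.run(run_all([...]))` with five tool names completes all five while never exceeding three in flight at once.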
The framework ships with predefined workflows for Recon, Web, Network, and Autonomous modes, all customizable through YAML files.
Workflow parameters follow a clear priority hierarchy: workflow-level YAML overrides the central configuration file, which in turn overrides tool defaults, allowing teams to run multiple parallel engagements with entirely independent settings.
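The precedence order can be illustrated with a layered dictionary merge in which later layers win. The parameter names (`timeout`, `threads`, `rate_limit`) and their values are invented for the example and are not taken from Guardian's configuration schema.

```python
def resolve_params(tool_defaults: dict, central_config: dict, workflow: dict) -> dict:
    """Merge three layers; on key conflicts the later layer wins."""
    return {**tool_defaults, **central_config, **workflow}


# Hypothetical settings for a single tool.
tool_defaults = {"timeout": 60, "threads": 10, "rate_limit": 100}
central_config = {"timeout": 120, "threads": 20}
workflow = {"timeout": 300}

params = resolve_params(tool_defaults, central_config, workflow)
# timeout comes from the workflow, threads from the central config,
# and rate_limit falls through from the tool defaults.
```

Because each workflow carries its own overrides, two engagements can share the same central configuration and still end up with different effective settings.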
Reports are generated in Markdown, HTML, or JSON, each including raw tool output, AI decision traces, and executive summaries. Each finding is linked back to its originating command execution via a 2,000-character evidence snippet, enabling full session reconstruction.
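As a rough picture of how a finding might carry its evidence, the sketch below pairs the originating command with the first 2,000 characters of its output. The `Evidence` type and `make_evidence` helper are hypothetical; the source does not say which part of the output Guardian keeps or how it stores it.

```python
from dataclasses import dataclass

SNIPPET_LIMIT = 2000  # character cap mentioned in the text


@dataclass
class Evidence:
    command: str  # the command that produced the output
    snippet: str  # truncated raw output kept alongside the finding


def make_evidence(command: str, raw_output: str) -> Evidence:
    # Assumption: keep the head of the output; a real tool might pick
    # the region around the match instead.
    return Evidence(command=command, snippet=raw_output[:SNIPPET_LIMIT])
```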
Guardian includes built-in safety mechanisms critical for authorized use. Scope validation automatically blacklists the private address ranges defined in RFC 1918, and a safe mode prevents destructive operations by default.
Configurable confirmation prompts before sensitive operations create a human-in-the-loop checkpoint, while comprehensive audit logging records all AI decisions for post-engagement review.
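A scope check of the kind described above can be written with Python's standard `ipaddress` module. The three networks are the private ranges defined by RFC 1918; the function name `in_blocked_range` is hypothetical, and Guardian's actual validation logic may cover more cases.

```python
import ipaddress

# Private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_blocked_range(address: str) -> bool:
    """Return True if the address falls inside any RFC 1918 range."""
    ip = ipaddress.ip_address(address)  # raises ValueError on malformed input
    return any(ip in net for net in RFC1918_NETWORKS)
```

A scanner wrapper could call this before dispatching any tool and refuse targets for which it returns `True`.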
The framework requires Python 3.11 or higher and at least one AI provider API key to function, with support for environment variable-based key management across Linux, macOS, and Windows.
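Environment-based key management typically amounts to checking a handful of variables at startup. In the sketch below the variable names are assumptions that follow common provider conventions; they are not confirmed against Guardian's documentation, and the helper names are hypothetical.

```python
import os

# Hypothetical variable names, not confirmed against Guardian's documentation.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None) -> list[str]:
    """List providers whose key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


def require_provider(env=None) -> str:
    """Return the first configured provider, or fail if none is available."""
    providers = configured_providers(env)
    if not providers:
        raise RuntimeError("No AI provider API key found in the environment")
    return providers[0]
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.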
Released as version 2.0.0, Guardian’s roadmap includes a web dashboard for visualization, a PostgreSQL backend for multi-session tracking, MITRE ATT&CK mapping for findings, plugin system support, CI/CD pipeline integration, and support for additional models, including Llama and Mistral.
The project is available on GitHub and is designed exclusively for authorized penetration testing, security research, and educational environments.

