Pentest Agent Suite – Autonomous Bug Bounty Framework for Claude Code and 6 AI Coding Tools

May 25, 2026

Table of Contents

Pentest Agent Suite Framework
50 Agents Across Five Tracks
Quick Start

A fully autonomous bug-bounty framework called Pentest Agent Suite has been open-sourced, delivering 50 specialized security agents, 26 slash commands, 19 CLI tools, and a cross-IDE installer across seven major AI coding platforms — Claude Code, OpenAI Codex, Google Gemini, Cursor, Windsurf, VS Code Copilot, and OpenClaw.

The project, published on GitHub by researcher H-mmer, ships as a cohesive security platform with persistent memory, live bug-bounty platform integration, and a FAISS-backed semantic writeup search engine that agents query in real time to surface prior art before testing a vulnerability class.

Pentest Agent Suite is organized around three layers: 50 specialized agents, a dual-server MCP (Model Context Protocol) infrastructure, and a comprehensive rules library.

The bounty-platforms MCP server integrates 16 platforms, including HackerOne (full API), Bugcrowd, Intigriti, Immunefi, and YesWeHack, and exposes seven tools, among them list_platforms, get_program_scope, sync_program, draft_report, and submit_report.

The writeup-search MCP server auto-detects three modes: FAISS semantic search, SQLite keyword search, and a zero-dependency local fallback querying the bundled rules/payloads.md — 2,605 lines spanning XSS, SSRF, SQLi, IDOR, OAuth, SSTI, JWT, LFI, prototype pollution, NoSQLi, and DeFi attack patterns.
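The three-mode auto-detection can be pictured as a simple capability check. The following is a minimal sketch, not the project's code: the file names writeups.faiss and writeups.db and both function names are assumptions made for illustration.

```python
import importlib.util
from pathlib import Path


def detect_search_mode(index_dir: Path) -> str:
    """Pick the richest search mode the local environment can support."""
    # Hypothetical file names; the real project may use different ones.
    faiss_index = index_dir / "writeups.faiss"
    sqlite_db = index_dir / "writeups.db"

    # Semantic search needs both the faiss package and a built index.
    faiss_installed = importlib.util.find_spec("faiss") is not None
    if faiss_installed and faiss_index.exists():
        return "faiss"

    # Keyword search only needs the database file; sqlite3 ships with Python.
    if sqlite_db.exists():
        return "sqlite"

    # Zero-dependency fallback: scan the bundled payload rules.
    return "local"


def local_search(payloads_file: Path, term: str) -> list[str]:
    """Return the lines of the bundled rules file that mention the term."""
    needle = term.lower()
    with payloads_file.open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if needle in line.lower()]
```

Ordering the checks from most to least capable means a missing dependency degrades search quality rather than breaking the agent's lookup.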

Pentest Agent Suite Framework

The framework’s headline feature is the 7-Question Gate, a validation pipeline run by the validator agent on every finding — the first “NO” triggers an automatic KILL, DOWNGRADE, or CHAIN REQUIRED verdict.

No finding can reach /submit without a /validate PASS and a /quality score of 7 or higher, enforced by hard gates in the /report and /submit commands.
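The gating logic described above reduces to two checks: stop at the first "NO", and refuse submission without a PASS and a quality score of at least 7. The sketch below illustrates that shape only. The article does not list the seven questions, so the three example questions, the finding keys, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MIN_QUALITY_SCORE = 7  # threshold stated in the article


@dataclass(frozen=True)
class GateQuestion:
    text: str
    check: Callable[[dict], bool]  # True means the answer is "YES"
    verdict_on_no: str             # "KILL", "DOWNGRADE", or "CHAIN REQUIRED"


# Hypothetical questions for illustration; the real gate has seven.
EXAMPLE_QUESTIONS = [
    GateQuestion("Is the asset in scope?", lambda f: f.get("in_scope", False), "KILL"),
    GateQuestion("Is there a working proof of concept?", lambda f: f.get("has_poc", False), "KILL"),
    GateQuestion("Is impact shown without chaining?", lambda f: f.get("impact_shown", False), "CHAIN REQUIRED"),
]


def run_gate(finding: dict, questions: list[GateQuestion]) -> tuple[str, Optional[str]]:
    """Return the verdict and the question that produced it, if any."""
    for question in questions:
        if not question.check(finding):
            # The first "NO" ends the evaluation immediately.
            return question.verdict_on_no, question.text
    return "PASS", None


def can_submit(validation_verdict: str, quality_score: int) -> bool:
    """Hard gate: both conditions must hold before submission."""
    return validation_verdict == "PASS" and quality_score >= MIN_QUALITY_SCORE
```

For example, a finding with a proof of concept but no demonstrated impact would come back as CHAIN REQUIRED, and can_submit would reject it regardless of its quality score.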

The /autopilot command implements an anti-shallow depth engine that mandates multi-layer stacked-encoding in every payload attempt and refuses to declare an attack surface exhausted until a full exhaustion matrix is complete — configurable via --paranoid, --normal, or --yolo checkpoint modes.

A persistent brain.py tracks every endpoint per target, enforces circuit-breaker logic (five consecutive 403/429 responses trigger a 60-second auto-backoff), and syncs cross-engagement knowledge via incremental hash-based diffing.
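The circuit-breaker rule can be sketched in a few lines. This is an illustrative version under stated assumptions, not the code in brain.py: the class name, method names, and the injectable clock are hypothetical, while the threshold of five and the 60-second backoff come from the article.

```python
import time


class CircuitBreaker:
    """Pause requests after a run of consecutive 403/429 responses."""

    def __init__(self, threshold: int = 5, backoff_seconds: float = 60.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.backoff_seconds = backoff_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._consecutive_blocks = 0
        self._open_until = 0.0

    def record(self, status_code: int) -> None:
        """Feed in each response status as it arrives."""
        if status_code in (403, 429):
            self._consecutive_blocks += 1
            if self._consecutive_blocks >= self.threshold:
                # Trip the breaker and start counting again afterwards.
                self._open_until = self._clock() + self.backoff_seconds
                self._consecutive_blocks = 0
        else:
            # Any other status breaks the streak.
            self._consecutive_blocks = 0

    def seconds_until_ready(self) -> float:
        """Return how long the caller should wait before the next request."""
        return max(0.0, self._open_until - self._clock())
```

A caller would check seconds_until_ready() before each request and sleep for that long if it is non-zero. Counting only consecutive blocks means an occasional 403 on a protected endpoint does not stall the whole engagement.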

The installer (python3 -m tools.installer) generates native configuration formats for each supported tool and writes them to the appropriate IDE directories.

IDEs without native subagent support (Cursor, Windsurf, and OpenClaw) receive content translated into skill files and rules, with Claude-specific prose stripped and path variables rewritten to absolute references.

Target	Config Format	Scope
Claude Code	`.claude/agents/*.md`	Global + Project
OpenAI Codex	`.codex/agents/*.toml`	Global + Project
Google Gemini	`.gemini/agents/*.md`	Global + Project
Cursor	`.cursor/skills/` (skill translation)	Global + Project
Windsurf	`.windsurf/rules/*.md` (≤12 KiB/file)	Global + Project
VS Code Copilot	`.github/agents/*.agent.md` (≤30 KiB)	Project + Global-MCP
OpenClaw	`~/.openclaw/workspace/AGENTS.md`	Global + Project
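The per-file size caps in the table imply that long agent definitions have to be split before they are written out. How the installer does this is not described in the article; the sketch below shows one plausible approach, splitting at paragraph boundaries, with a hypothetical function name and the 12 KiB Windsurf cap as the default.

```python
WINDSURF_LIMIT_BYTES = 12 * 1024  # the 12 KiB per-file cap listed for Windsurf


def split_by_size(text: str, limit_bytes: int = WINDSURF_LIMIT_BYTES) -> list[str]:
    """Split text at blank lines so every chunk fits within limit_bytes."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Sizes are measured in UTF-8 bytes, not characters.
        if len(paragraph.encode("utf-8")) > limit_bytes:
            raise ValueError("a single paragraph exceeds the size limit")
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate.encode("utf-8")) <= limit_bytes:
            current = candidate
        else:
            # The paragraph does not fit; close this chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Passing limit_bytes=30 * 1024 would apply the same logic to the VS Code Copilot cap.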

50 Agents Across Five Tracks

The agent roster spans 19 HackerOne weakness specialists (including xss-hunter, sqli-hunter, ssrf-hunter, rce-hunter, oauth-hunter, and llm-ai-hunter), an 8-agent SAST pipeline, infrastructure and recon agents (cloud-recon, js-analyzer, graphql-audit, waf-profiler), and a web3-auditor for Solidity and DeFi patterns.

Five deep methodology skills accompany the hunters, each distilled from hundreds of real paid reports. They include hunt-rce (RSC CVE-2025-55182, runc Leaky Vessels, BentoML pickle), hunt-xss (DOMPurify mXSS, n8n MCP OAuth XSS GHSA-537j-gqpc-p7fq), and hunt-llm-ai (aligned to OWASP LLM Top 10 v2025 and the Agentic AI Top 10).

Cost tracking runs via Claude Code hooks: the SubagentStop event fires cost_hook.py, logging agent name and session cost to cost-tracking.json, with live spend visible in the statusline.
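A hook of this kind amounts to appending one record per event to a JSON file. The sketch below is not the project's cost_hook.py: the event keys agent_name and session_cost_usd are assumptions, since the article does not show the payload the hook receives.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_cost(event: dict, log_path: Path) -> dict:
    """Append one cost record to the JSON log and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Both keys below are assumed names, not a documented payload schema.
        "agent": event.get("agent_name", "unknown"),
        "session_cost_usd": float(event.get("session_cost_usd", 0.0)),
    }
    # Load the existing entries, or start a new list on first use.
    if log_path.exists():
        entries = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append(record)
    log_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return record
```

In a real hook script, the event dictionary would typically be parsed from the JSON that the hook receives on standard input, and log_path would point at cost-tracking.json.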

A PreToolUse scope hook (scope_hook.py) matches every Bash command against scope.yaml using exact and wildcard patterns, blocking out-of-scope execution before the tool call fires. CVSS scoring is enforced programmatically — cvss_version_guard.py mandates CVSS 3.1 for HackerOne and CVSS 4.0 for all other platforms.
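Exact and wildcard scope matching can be illustrated with Python's standard fnmatch module. This is a simplified sketch, not scope_hook.py: the function names and the hostname regex are assumptions, and the patterns are passed in as a list rather than loaded from scope.yaml.

```python
import re
from fnmatch import fnmatch

# Rough hostname matcher; it also catches dotted file names such as
# "notes.txt", so this sketch errs on the side of blocking.
HOST_PATTERN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def hosts_in_command(command: str) -> list[str]:
    """Extract anything in the command that looks like a hostname."""
    return [host.lower() for host in HOST_PATTERN.findall(command)]


def is_in_scope(host: str, patterns: list[str]) -> bool:
    """Match a host against exact entries and shell-style wildcards."""
    return any(fnmatch(host, pattern.lower()) for pattern in patterns)


def command_allowed(command: str, patterns: list[str]) -> bool:
    """Allow the command only if every host it mentions is in scope."""
    return all(is_in_scope(host, patterns) for host in hosts_in_command(command))
```

With the scope ["example.com", "*.example.com"], a curl request to api.example.com is allowed, while an nmap scan of evil.org is rejected. Note that "*.example.com" does not match the bare domain, which is why the exact entry is listed separately.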

Quick Start

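```bash
# Set HackerOne API credentials
export HACKERONE_USERNAME=you HACKERONE_TOKEN=your_token
# Scaffold a workspace for the target program and open Claude Code in it
uv run python3 tools/scaffold.py hackerone tesla
cd ~/bounties/hackerone-tesla && claude
# Inside the Claude Code session, start the hunt:
/hunt tesla.com
```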

Requirements include Python 3.10+, uv, and standard recon tooling: nmap, httpx, subfinder, nuclei, ffuf, katana, and sqlmap.

The framework is available on GitHub and is licensed exclusively for authorized security testing under responsible disclosure. A bundled rag-builder/ utility can construct local FAISS writeup indexes from a 146-repository seed list covering CTF archives, bug-bounty reports, and payload collections, with all destructive operations gated behind an explicit --execute flag.
