Kali Linux Enhances AI-driven Penetration Testing with Local Ollama, 5ire, and MCP Kali Server


Kali Linux AI-driven Penetration Testing

The Kali Linux team has published a new entry in its growing LLM-driven security series, this time eliminating all reliance on third-party cloud services by running large language models entirely on local hardware.

The guide demonstrates how security professionals can use natural language to drive penetration testing tools, all processed on-premise, with no data leaving the machine.

Privacy and operational security concerns have long made cloud-dependent AI tools a liability in sensitive penetration testing environments. The new Kali Linux guide directly addresses this by walking through a fully self-hosted stack in which the LLM, the model context server, and the GUI client all run locally.

The setup requires an NVIDIA GPU with CUDA support, a practical constraint the guide acknowledges upfront: the cost is in hardware acquisition and running expenses, not subscription fees.

The reference hardware used is an NVIDIA GeForce GTX 1060 with 6 GB of VRAM, a mid-range consumer GPU that is capable but not excessive.

The guide installs NVIDIA’s proprietary non-free drivers to enable CUDA acceleration, replacing the open-source Nouveau driver, which lacks the compute support needed for local LLM inference. After a reboot, nvidia-smi confirms that Driver Version 550.163.01 and CUDA Version 12.4 are operational.
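As a rough sketch, the driver swap amounts to installing the non-free packages and rebooting (package names follow current Debian/Kali repository conventions; verify them locally before running):

```shell
# Replace Nouveau with NVIDIA's proprietary CUDA-capable driver
# (Debian/Kali package names; requires the non-free repo component enabled)
sudo apt update
sudo apt install -y nvidia-driver nvidia-cuda-toolkit
sudo reboot

# After the reboot, verify the driver and CUDA runtime are loaded
nvidia-smi
```

This is hardware-dependent setup, not a portable script; consult the official Kali NVIDIA driver documentation for edge cases such as Secure Boot.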


Ollama as the Local LLM Engine

The backbone of the stack is Ollama, a wrapper around llama.cpp that simplifies downloading and serving open-weight language models. Installed via a manual extraction of its Linux AMD64 tarball and configured as a systemd service, Ollama runs persistently in the background at startup.
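The manual install can be sketched as follows (the download URL reflects Ollama's published release layout, and the systemd step assumes a unit file has already been created; check the official Linux install docs for the current details):

```shell
# Fetch and unpack the Linux AMD64 tarball
# (URL per Ollama's standard release naming; verify on the download page)
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

# Run as a persistent background service
# (assumes an ollama.service unit exists under /etc/systemd/system/)
sudo systemctl daemon-reload
sudo systemctl enable --now ollama
```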

Three models with native tool-calling support were pulled for evaluation: llama3.1:8b (4.9 GB), llama3.2:3b (2.0 GB), and qwen3:4b (2.5 GB) — all sized to fit within the 6 GB VRAM constraint. Tool support is a hard requirement here; without it, the LLM cannot invoke external commands through the MCP layer.
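Pulling the three models uses the standard ollama CLI; the downloads can then be confirmed with a listing (these commands require the Ollama service from the previous step to be running):

```shell
# Pull the three tool-capable models evaluated in the guide;
# each fits within the GTX 1060's 6 GB VRAM budget
ollama pull llama3.1:8b
ollama pull llama3.2:3b
ollama pull qwen3:4b

# Verify the models are available locally
ollama list
```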

MCP-Kali-Server Bridges the AI to the Terminal

The Model Context Protocol (MCP) is what transforms a conversational LLM into an active security tool. The mcp-kali-server package — already available in Kali’s repositories — acts as a lightweight API bridge that exposes a local Flask server on 127.0.0.1:5000.

When started, it verifies the presence of tools like nmap, gobuster, dirb, nikto, and others. A companion mcp-server binary connects to this API and presents available tools to the MCP client.
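The startup check is conceptually simple: confirm each expected scanner binary is on the PATH. A minimal sketch of that logic (the function name is illustrative; the actual mcp-kali-server internals may differ):

```shell
# Sketch of a tool-presence check like the one mcp-kali-server
# performs at startup (check_tools is a hypothetical helper name)
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

check_tools nmap gobuster dirb nikto
```

On a stock Kali installation all four should report as found; on other distributions the missing entries indicate which packages to install.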

The server also supports AI-assisted penetration testing tasks such as web application testing, CTF challenge solving, and interaction with platforms like Hack The Box or TryHackMe.

5ire Closes the Gap Between Ollama and MCP

Since Ollama itself does not natively support MCP, a client bridge is needed. The guide selects 5ire — an open-source AI assistant and MCP client distributed as a Linux AppImage.

Version 0.15.3 is installed to /opt/5ire/, linked into the system path, and configured with a desktop entry. Within 5ire’s GUI, users enable Ollama as the provider, toggle tool support on for each model, and register mcp-kali-server as a local tool with the command /usr/bin/mcp-server.
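The AppImage placement can be sketched as (the exact AppImage filename is an assumption; substitute the name of the file you actually downloaded):

```shell
# Place the 5ire AppImage under /opt and link it into the PATH
# (filename assumed; match it to your downloaded release)
sudo mkdir -p /opt/5ire
sudo mv 5ire-0.15.3.AppImage /opt/5ire/
sudo chmod +x /opt/5ire/5ire-0.15.3.AppImage
sudo ln -sf /opt/5ire/5ire-0.15.3.AppImage /usr/local/bin/5ire
```

A matching .desktop entry under /usr/share/applications/ then makes the client launchable from the application menu.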

End-to-End Validation: Natural Language Nmap

The stack’s real-world capability was validated with a prompt asking 5ire, backed by qwen3:4b, to perform a TCP port scan of scanme.nmap.org across ports 80, 443, 21, and 22.

The LLM correctly interpreted the natural language request, invoked nmap through the MCP chain, and returned structured results — entirely offline, with ollama ps confirming 100% GPU processing throughout.
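The command the LLM ultimately dispatches through the MCP chain is equivalent to a plain nmap invocation, for example:

```shell
# Manual equivalent of the prompt's four-port scan
# (scanme.nmap.org is Nmap's sanctioned public test host)
nmap -p 80,443,21,22 scanme.nmap.org
```

The value of the stack is not the scan itself but that the model translated an informal request into this invocation and parsed the results back into conversation, without any cloud round trip.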

This setup demonstrates a viable, privacy-preserving alternative to cloud AI assistants for offensive security work.

According to the Kali Linux Team, the full stack (Ollama, mcp-kali-server, and 5ire) is open source, hardware-dependent rather than service-dependent, and tunable based on available VRAM.

For red teams and security researchers operating in air-gapped or data-sensitive environments, the combination of local inference and MCP-driven tool execution marks a meaningful step toward autonomous, offline AI-assisted penetration testing.


