Can We Trust AI To Write Vulnerability Checks? Here’s What We Found


Vulnerability management is always a race. Attackers move quickly, scans take time, and if your scanner can’t keep up, you’re left exposed.

That’s why Intruder’s security team kicked off a research project: could AI help us build new vulnerability checks faster, without dropping our high standards for quality?

After all, speed is only useful if the detections are solid – a check that fires false positives (or worse, misses real issues) doesn’t help anyone.

In this post, we’ll share how we’ve been experimenting with AI, what’s working well, and where it falls short.

One-shot vs. Agentic Approach

We started simple: drop prompts into an LLM chatbot and see if it could write Nuclei templates. The results were messy: the output referenced Nuclei features that don’t exist, used invalid syntax, and relied on weak matchers and extractors. This was consistent across ChatGPT, Claude, and Gemini.

So we tried an agentic approach. Unlike a chatbot, an agent can use tools, search reference material, and follow rules. We went in with healthy skepticism (recent “vibe coding” disasters didn’t inspire confidence), but the improvement was immediate. 

We used Cursor’s agent, and even with minimal prompting, the quality of the output from the initial runs was far more promising.

From there, we layered on rules and indexed a curated repo of Nuclei templates. This gave the agent solid examples to learn from, cut down inconsistencies, and nudged it towards using the right functionality. Template quality jumped noticeably, and the output was far closer to what we’d expect from our engineers.
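To give a sense of what those rules contain, here’s a trimmed-down sketch of the kind of guidance file we point the agent at (the file name and wording are illustrative, not our production rule set):

    # nuclei-check-rules.mdc (illustrative)
    - Only use template fields, DSL functions and flags that exist in current
      Nuclei releases; check against the indexed example templates before finishing.
    - Prefer several strong matchers (unique body strings, favicon hashes, headers)
      over a single generic status-code match, and combine them with
      matchers-condition: and to reduce false positives.
    - Validate the finished template against a known-vulnerable target and a
      known-clean target before reporting success.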

But it wasn’t set-and-forget. Left alone, the agent still needed course corrections. With clear prompting, though, it could generate checks that looked like they’d been written manually.

That’s when our goal shifted: not full automation, but a productivity tool that helps us ship quality checks faster without lowering the bar.


Our Current Workflow

The process we’ve settled on (for now) uses a standard set of prompts and rules. The engineer provides key inputs, such as:

  • A short description of the task and the detection logic we’re after

  • A list of testing targets running the product in question

  • Known vulnerable and non-vulnerable example targets to validate against

With those in place, the agent builds the template. It’s not fully “vibe-coded,” but it’s much faster and frees our engineers to spend more time on deeper research.

Successes

Attack Surface Checks

Agentic AI has been especially useful for creating checks where no public templates exist. One sweet spot: detecting admin panels exposed to the internet. These checks are simple in principle, but writing them at scale is time-consuming. With automation, we can produce far more of them, much faster.
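For illustration, a check like that usually boils down to a short Nuclei template along these lines (the product name, path and matcher strings here are made up, not one of our shipped checks):

    id: exampleproduct-admin-panel
    info:
      name: ExampleProduct Admin Panel - Detect
      author: intruder
      severity: info
      tags: panel,exposure

    http:
      - method: GET
        path:
          - "{{BaseURL}}/admin/login"
        matchers-condition: and
        matchers:
          # A page title string unique to this product's login page
          - type: word
            part: body
            words:
              - "<title>ExampleProduct Admin Console</title>"
          - type: status
            status:
              - 200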

We’re often surprised at how many products aren’t covered by the major scanners we use under the hood. This process helps us fill those gaps and give customers a fuller view of their attack surface. Because if your VM scanner isn’t flagging exposed panels – and your estate is large – chances are you won’t know they’re there.

Unsecured Elasticsearch

We created an unsecured Elasticsearch check as a quick win for the agentic workflow. A public Nuclei detection template existed, but it didn’t cover the worst case: instances left wide open where anyone can read data. That’s the case we wanted to reliably detect.

What we fed the agent:

  • The task in 2-3 short sentences – e.g. detect Elasticsearch instances, make a request to X endpoint and then a follow-up request to Y endpoint to see if data is really exposed.

  • A list of testing targets hosting Elasticsearch servers

  • An example target that was vulnerable to the method we wanted to test

  • An example target that was not vulnerable

The agent then iterated through our process using the custom rules that we set.

The final result was a Nuclei template that lists data sources and follows promising endpoints to confirm whether unauthenticated users can read data – a multi-request template with working matchers and extractors suitable for automated scanning.
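As a simplified sketch of that shape (not our production template), it chains an index-listing request into a follow-up search and only fires if document data actually comes back:

    id: elasticsearch-unauth-data-exposure
    info:
      name: Elasticsearch - Unauthenticated Data Exposure
      author: intruder
      severity: high
      tags: elasticsearch,unauth,exposure

    http:
      - raw:
          # First request: list indices to find a candidate data source
          - |
            GET /_cat/indices?format=json HTTP/1.1
            Host: {{Hostname}}
          # Second request: try to read a document from the first index found
          - |
            GET /{{indexname}}/_search?size=1 HTTP/1.1
            Host: {{Hostname}}

        extractors:
          # Internal extractor feeds the index name into the follow-up request
          - type: json
            name: indexname
            internal: true
            part: body
            json:
              - ".[0].index"

        matchers-condition: and
        matchers:
          # Only fire when actual document data is readable without auth
          - type: word
            part: body
            words:
              - '"hits"'
              - '"_source"'
            condition: and
          - type: status
            status:
              - 200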

There was still manual input and judgement from our security engineering team, but the agent handled the repetitive heavy lifting.

Challenges

Our exploration so far has not been without its roadblocks and rethinks. 

Limits of Current Outputs

Even with rules in place, the agent sometimes strays. One example: it built a check for an exposed admin panel but didn’t include strong enough matchers, which risked false positives. A quick extra prompt fixed it – we added a favicon matcher unique to that product – but it’s a reminder that the agent still needs guardrails. Until it can reliably choose the strongest matchers and validate them, human oversight stays essential.
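The fix followed the standard Nuclei favicon pattern, roughly like the snippet below, where the favicon is fetched, hashed and compared against a known value (the hash shown is a placeholder):

    http:
      - method: GET
        path:
          - "{{BaseURL}}/favicon.ico"
        matchers:
          # mmh3 hash of the base64-encoded favicon; the value here is a placeholder
          - type: dsl
            dsl:
              - "status_code == 200 && '1234567890' == mmh3(base64_py(body))"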

Truncated Curl Output

Cursor often pipes ‘curl’ responses through ‘head’ to save tokens. Unfortunately, this can miss unique identifiers that would make ideal matchers. It’s an efficiency feature, but it works against us and we haven’t fully solved it yet.
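The effect is easy to picture (the commands below illustrate the behaviour rather than Cursor’s exact invocation): a unique string further down the page never reaches the agent’s context.

    # What the agent tends to run - truncating the output can drop good matchers
    curl -s https://target.example/login | head -n 40

    # The full response is what we actually want it to inspect
    curl -s https://target.example/login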

Forgetting the Basics

Sometimes Cursor overlooks Nuclei’s own flags, like -l for running against a host list, and instead scripts a manual loop. We’re working on new rules to remind it of key Nuclei features and cut out that inefficiency.
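For reference, the built-in flag handles this in a single command (the paths here are illustrative):

    # Run a custom template against every host in a target list
    nuclei -l targets.txt -t custom/exposed-admin-panel.yaml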

What’s Next?

AI is being pitched everywhere as a silver bullet to replace complex tasks outright. From our perspective, much of that is marketing hype. We’re still a long way from handing over security engineering to an AI agent without close supervision.

That’s not to say it’s impossible, but for now we’re cautious of anyone claiming full automation. We’ll keep pushing AI in vulnerability management, both as a productivity tool and, where possible, towards safe automation.

But the bottom line today is clear: to deliver high-quality custom checks that don’t miss vulns or generate false positives, expert engineers remain essential.
 

Author bio: Benjamin Marr, Security Engineer at Intruder

Ben is a Security Engineer at Intruder, where he automates offensive security scanning and carries out security research. His background is as an OSWE-certified penetration tester and PHP software engineer.

Sponsored and written by Intruder.


