[tl;dr sec] #303 – MCP Security Scanners, Attacking GitLab CI/CD, AI SOC Benchmarks

[tl;dr sec] #303 - MCP Security Scanners, Attacking GitLab CI/CD, AI SOC Benchmarks

I hope you’ve been doing well!

Fight Robots Fight!

New #PeakBayArea experience: this week I attended a TechCrunch Disrupt-adjacent event with the normal food, drinks, DJ and dancing, and startup founders.

A corner was dedicated to UFB: UFC but for (Ro)Bots.

Human-controlled humanoid robots designed to fight in the ring for viewer entertainment.

I wonder if in the future, if there are intelligent robots that have something approximating human emotions, we’ll look on this as cruel, like having animals or prisoners fight for sport.

And if AI takes over, I don’t want to be on record supporting this, just in case Mostly kidding.

Future robot overlords: I don’t stand for this cruelty!  

P.S. If you’re going to be at OWASP Global US in Washington DC next week, some of my colleagues are having a free hands-on interactive workshop on securing AI-generated code with Cortex Cloud and Semgrep. Wednesday Nov 5th 5pm ET.

Cloud Security starts with Cortex Cloud. Merge multicloud protection with real-time detection and response to arm security teams with unprecedented context, faster action, and better collaboration — from code to cloud to SOC.

  • Prevent 90% of risk

  • See 25x fewer alerts

  • Reduce MTTR by 96%

Take control and secure your cloud with confidence.

Having a platform with all the things is quite convenient (CNAPP, CDR, etc.). I also like how they emphasize prevention-first guardrails, and using AI to group related issues so multiple risks can be solved with a single action.  

AppSec

Adversis/NextjsServerActionAnalyzer
By Adversis: A Burp Suite extension for analyzing Next.js Server Actions – server-side functions identified by hash IDs and Next-Action headers. Features: captures Server Actions from proxy history or live browsing, identifies missing auth, sensitive parameters, errors, and IDORs, finds actions defined but never executed, and more.

Tricks for Reliable Split-Second DNS Rebinding in Chrome and Safari
Intruder’s Daniel Thatcher discusses techniques for achieving reliable, split-second DNS rebinding in modern browsers when IPv6 is available. For Safari, they exploit the browser’s behavior when DNS responses are delayed, allowing initial requests to go to an attacker’s server before redirecting to internal targets by delaying A record responses by 100-200ms. For Chrome/Edge, they leverage the browser’s preference for public IPv6 addresses over private IPv4 addresses, combined with connection resets and iFrames to bypass Private Network Access restrictions.

Introducing CheckMate for Auth0: A New Auth0 Security Tool
Auth0’s Shiven Ramji announces CheckMate, a new open source CLI tool that helps you assess and strengthen your Auth0 tenant security posture by analyzing configurations against best practices. The tool evaluates various security aspects including custom domains, applications, password policies, MFA settings, email providers, log streams, attack protection, tenant settings, and even scans Actions code for vulnerable NPM dependencies using GitHub’s Advisory Database.

I like this a lot. Honestly every platform should 1) ideally make settings as secure by default as possible, then 2) release an OSS scanning tool customers can use to flag all insecure settings, sharp edges, etc. (or even better build it freely into the product).

Or you can make it a separate product SKU and charge your customers for securing their use of your product (*cough cloud providers cough*).

Shadow IT, supply chains, and cloud sprawl are expanding attack surfaces – and AI is helping attackers exploit weaknesses faster. Built on insights from 3,000+ organizations, Intruder’s 2025 Exposure Management Index reveals how defenders are adapting.

  • High-severity vulns are up nearly 20% since 2024.

  • Small teams fix faster than larger ones – but the gap’s closing.

  • Software companies lead, fixing criticals in just 13 days.

Get the full analysis and see where defenders stand in 2025.

Hmm I’m curious what attack surface from 3000+ orgs looks like across shadow IT, supply chain, cloud infrastructure  

Cloud Security

My AWS Account Got Hacked – Here Is What Happened
Zvi Wexlstein describes how his AWS account was hacked after he accidentally exposed AWS credentials in his NextJS SSR website code. The post gives a great play-by-play breakdown of what the attacker did (flood his email with spam to hide AWS email notifications like creating a new Organization, created IAM users, launched EC2 instances, setting up SES/DKIM for phishing), and what Zvi did to evict the attacker and lock down his account.

Great lesson to hear: stuff like this can happen to anyone, even if we work in security.

Fantastic AWS Policies and Where to Find Them
Cloud Copilot gives an overview of the roughly one million AWS policy types, including principal policies (managed policies, user and role policies, group policies, permission boundaries, SCPs), resource policies (IAM role trust policies, KMS policies, VPC, S3 bucket policies, …), IAM Identity Center permission sets, and more.

They’ve released the tool iam-collect so you can easily download IAM policies, organization policies, resource policies, RAM shares, and SSO permission sets, across every account, in one place. iam-collect is also used to power iam-lens, which gives you visibility into the IAM permissions in your AWS organizations and accounts, allowing you to evaluate the effective permissions of your actual AWS IAM policies.

For me, every time I read about the complexity of IAM, it’s like reading about the floating trash island in the ocean or microplastics in our food- a painful reminder of the cruel reality of this world.

ECS on EC2: Covering Gaps in IMDS Hardening
Latacora describes the security implications of Elastic Container Service (ECS) on EC2 workloads, focusing on the need to block container access to IMDS to prevent privilege escalation attacks like ECScape. By default, each ECS task on an EC2 instance has access to IMDS, which can be used to access credentials of other tasks or escalate privileges to the instance role.

The post demonstrates that enabling IMDSv2 with hop limit=1 only blocks IMDS access in bridge networking mode but not in awsvpc or host modes, then provides specific hardening configurations for each network mode. See their GitHub repo for Terraform to test these hardening configurations across different network modes.

Supply Chain

6mile/undelete
By Paul McCarty: A tool that allows you to recover deleted NPM packages by checking secondary mirrors and pulling files from their cache. It can download 1-20 recent versions of any package and retrieve valuable metadata including NPM usernames, emails, and maintainers – useful for obtaining malware examples that are no longer live.

suzuki-shunsuke/pinact
By Shunsuke Suzuki: A CLI tool that pins GitHub Actions and Reusable Workflows to specific commit hashes rather than mutable tags, preventing potential supply chain attacks (like the GitHub Action being compromised/backdoored). The tool can edit workflow files to add commit hashes with version annotations, update versions, verify annotations, create reviews via the GitHub API, and validate pinning compliance with a –check option.

The post shows a complete attack chain: creating a test repository, identifying available instance runners, executing commands through CI/CD pipelines to gain a reverse shell, accessing sensitive data from other users’ builds (e.g. .env, SSH private keys), and ultimately pivoting to AWS cloud resources by leveraging EC2 metadata and IAM roles to execute commands on other instances via SSM.

Recommendations: restrict network access to your self-hosted GitLab instance, avoid instance runners, use group or project runners instead, be cautious about the IAM roles that you permit to the GitLab EC2 instances.

AI + Security

cisco-ai-defense/mcp-scanner
By Cisco’s Chetan Anand et al: A Python tool for scanning MCP (Model Context Protocol) servers and tools for potential security vulnerabilities. It combines YARA rules, LLM-as-a-judge, and the Cisco AI Defense inspect API to scan MCP tools, prompts, and resources. Announcement Blog.

fr0gger/proximity
By Thomas Roccia: A security scanner for MCP servers that discovers tools, prompts, and resources, providing analysis of server capabilities and optional security evaluation using NOVA rules to detect vulnerabilities like prompt injection and jailbreak attempts. NOVA is an open-source prompt pattern matching system that combines keyword detection, semantic similarity, and LLM-based evaluation to analyze prompt content.

The Backbone Breaker Benchmark: Testing the Real Security of AI Agents
Lakera discusses the not-yet-released Backbone Breaker Benchmark (b3) (paper), built around a new method called “threat snapshots,” that aims to improve the testing of AI systems by testing individual states/components instead of the entire end-to-end agent flow. Each threat snapshot is composed of 3 elements: the agent’s state and context (what it “knows” and what tools it has access to), the attack vector and objective (how the threat is delivered and what it tries to achieve), and the scoring function (how success or failure is measured).

The benchmark distills human attacks distilled from Lakera’s Gandalf sandbox, and is built around ten representative threat snapshots covering distinct security failure modes, ranging from data exfiltration and content injection to system compromise or policy bypass.

Findings: models that reason step by step tend to be more secure, and open-weight models are closing the gap with closed systems faster than expected.

  • A knowledge evaluation dataset of multiple-choice questions aimed at testing LLMs’ understanding of crucial CTI concepts, including standards, threat identification, detection strategies, mitigation techniques, and best practices.

  • Three practical CTI tasks: (1) mapping CVE descriptions to CWE, (2) calculating CVSS scores, and (3) extracting MITRE ATT&CK techniques from threat descriptions.

  • And a task where LLMs analyze publicly available threat reports and attribute them to specific threat actors or malware families.

Findings: larger, more modern LLMs tend to perform better, reasoning models leveraging test time scaling do not achieve the boost they do in areas like coding and math. See: Crowdstrike’s CyberSOCEval_data repo on GitHub, CyberSecEval docs on running it.

Microsoft raises the bar: A smarter way to measure AI for cybersecurity
Microsoft’s Anand Mudgerikar introduces ExCyTIn-Bench, an open-source benchmarking tool designed to evaluate AI systems on real-world cybersecurity investigations by simulating multistep cyberattack scenarios within an Azure SOC environment. Unlike benchmarks that focus on threat intelligence trivia or static knowledge, ExCyTIn-Bench challenges AI agents to query live log tables across 57 data sources from Microsoft Sentinel, measuring how well they can investigate, adapt, and explain findings while navigating across live data sources.

ExCyTIn evaluates reasoning processes, including goal decomposition, tool usage, and evidence synthesis, under constraints that simulate an analyst’s workflow. Recent evaluations show GPT-5 (High Reasoning) leading with a 56.2% average reward, while smaller models with effective chain-of-thought reasoning are now rivaling larger models (Note: Anthropic models not tested ).

Wrapping Up

Have questions, comments, or feedback? Just reply directly, I’d love to hear from you.

If you find this newsletter useful and know other people who would too, I’d really appreciate if you’d forward it to them

P.S. Feel free to connect with me on LinkedIn  



Source link