The world’s most “Dangerous” AI, Anthropic’s Mythos, found only one flaw in curl

Anthropic’s AI reported five vulnerabilities in curl, but only one, a low-severity issue, proved to be real.
In April, Anthropic made considerable noise announcing Mythos, a new artificial intelligence model described as so effective at identifying vulnerabilities in code as to be, in the company’s own words, “dangerously good.” So good, in fact, that Anthropic decided against releasing it to the general public, instead distributing access to a small group of major organizations to give them time to patch their most critical flaws before the model reached everyone else.
The industry reacted with a degree of alarm. Thousands of zero-days identified in a matter of weeks, software security as we knew it thrown into question: the script had all the ingredients of a viral tech story. And so it became one.
Then Daniel Stenberg weighed in. Stenberg is the creator and lead developer of curl, the data transfer library present on over twenty billion devices: every smartphone, every connected car, and every server on the planet uses curl in one way or another. Through the Linux Foundation’s Alpha Omega project, he too was granted access, though only indirectly, via a third party, to a Mythos analysis of curl’s codebase. The result? The model analyzed 176,000 lines of C code and returned five vulnerabilities it described, with notable self-assurance, as “confirmed.”
“curl is currently 176,000 lines of C code when we exclude blank lines. The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Peace,” wrote Stenberg. “The report concluded it found five ‘Confirmed security vulnerabilities’. I think using the term confirmed is a little amusing when the AI says it confidently by itself. Yes, the AI thinks they are confirmed, but the curl security team has a slightly different take.
Five issues felt like nothing as we had expected an extensive list. Once my curl security team fellows and I had poked on this short list for a number of hours and dug into the details, we had trimmed the list down and were left with one confirmed vulnerability. The other four were three false positives (they highlighted shortcomings that are documented in API documentation) and the fourth we deemed ‘just a bug’.”
In short, three of the findings were false positives describing behavior already covered in curl’s API documentation, and one was an ordinary bug with no security impact. The single real vulnerability that remained is rated low severity, and its fix is scheduled for the curl 8.21.0 release in late June.
Daniel Stenberg concluded that the hype around Anthropic’s Mythos AI looked more like marketing, as he saw no major advantage over existing security tools.
“My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos,” he added.
curl is not an ordinary codebase. As Stenberg himself notes, and as the Mythos report openly acknowledges at the very top of its analysis: “curl is one of the most fuzzed and audited C codebases in existence (OSS-Fuzz, Coverity, CodeQL, multiple paid audits). Finding anything in the hot paths (HTTP/1, TLS, URL parsing core) is unlikely.” In the months prior, other AI-powered tools (Zeropath, AISLE, and OpenAI’s Codex Security) had already produced somewhere between two and three hundred bugfixes in the codebase, including a dozen or more confirmed CVEs. Mythos arrived late, on ground that had already been extensively turned over.
There is also the Mozilla comparison. Mythos found over 270 vulnerabilities in Firefox, a result that genuinely impressed the browser’s security team. But Mozilla also made clear that every bug the model identified could have been found by elite human researchers. The value was not in the unreachability of the findings, but in the speed: closing the window between attacker discovery and vendor patch.
Stenberg, for his part, does not dismiss AI tooling in general; quite the opposite.
“AI powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers did in the past,” he wrote.
The argument is narrower: that Mythos, at least on curl, did not demonstrate meaningful superiority over what already exists.
Stenberg did not interact with Anthropic’s Mythos AI directly; he only reviewed a generated report, which limits how fully the model’s capabilities can be judged from this test. While the AI found just one low-severity flaw in curl’s heavily audited codebase, the results neither confirm the industry hype nor completely dismiss the technology. The test suggests AI vulnerability research may be useful, but current claims about revolutionary capabilities still appear overstated.
“Any project that has not scanned their source code with AI powered tooling will likely find huge number of flaws, bugs and possible vulnerabilities with this new generation of tools,” Stenberg concluded.
Pierluigi Paganini
(SecurityAffairs – hacking, Anthropic)

