Phishing remains one of the most stubbornly persistent threats in cybersecurity: humans are tired, distracted, trusting, and susceptible to urgency and authority in ways that no amount of awareness training can completely overcome.
The security community has largely accepted this reality and shifted focus toward automated detection systems that can intercept and block phishing threats before users see them.
But attackers have adapted here, too. Modern phishing campaigns increasingly employ cloaking techniques, serving benign content to scanners while delivering malicious pages to real victims, or simply blocking automated retrieval altogether. As a result, the automated defenses built to compensate for human fallibility are being systematically blinded.
Following the URL infrastructure
Researchers from Tokyo Metropolitan University have created PhishLumos, a system that treats redirections, as well as inaccessible, deceptive, and blank pages, as suspicious, and pivots away from content analysis altogether.
Instead, it turns to the underlying infrastructure of the URL: the IP addresses it resolves to, the network connections it shares with other domains, the SSL certificates it uses, and traces left behind in historical scan records.
This allows it to reconstruct the broader phishing campaign, including the other URLs used in it, and to build a graph that describes how the campaign is organized, which assets are connected, and how the pieces relate to one another.
A coordinated team of specialized LLM-powered agents then uses this graph to profile campaigns and create validated detection rules.
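To make the idea of infrastructure pivoting concrete, here is a minimal sketch of grouping URLs that share an IP address or certificate. It is not PhishLumos's actual implementation, which the article does not detail. The function name, the indicator strings, and the sample domains and addresses are all hypothetical, and the grouping uses a plain union-find over a URL-to-indicator graph.

```python
from collections import defaultdict


def cluster_by_shared_infrastructure(observations):
    """Group URLs that share at least one infrastructure indicator.

    observations: dict mapping a URL to an iterable of indicator strings,
    e.g. "ip:203.0.113.7" or "cert:aa11" (hypothetical format).
    Returns a list of URL clusters, largest first.
    """
    parent = {}

    def find(node):
        # Register unseen nodes, then walk to the root with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every URL to each indicator it was observed with.
    # Tagged tuples keep URL nodes and indicator nodes from colliding.
    for url, indicators in observations.items():
        find(("url", url))
        for indicator in indicators:
            union(("url", url), ("indicator", indicator))

    clusters = defaultdict(set)
    for url in observations:
        clusters[find(("url", url))].add(url)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Made-up sample data using reserved documentation domains and IP ranges.
sample = {
    "https://login-a.example/": {"ip:203.0.113.7", "cert:aa11"},
    "https://login-b.example/": {"ip:203.0.113.7", "cert:bb22"},
    "https://login-c.example/": {"ip:198.51.100.9", "cert:bb22"},
    "https://lonely.example/": {"ip:192.0.2.50", "cert:cc33"},
}

if __name__ == "__main__":
    for cluster in cluster_by_shared_infrastructure(sample):
        print(cluster)
```

In this sample, the first three URLs end up in one cluster because they are chained together by a shared IP address and a shared certificate, while the fourth URL shares nothing with the others and stays on its own.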
Two-phase workflow of PhishLumos (Source: IEEE)
What real-world tests showed
“On 103 real-world campaigns (6,020 URLs), PhishLumos achieved a median campaign coverage of 100% and a median detection lead time of 192.8 hours (8.0 days) before expert verification, with a 0.1% false positive rate on 1,000 benign URLs,” the researchers shared.
“In a six-month in-the-wild study starting from 600 challenging seed URLs, the generated rules uncovered 192,407 additional URLs; 92.0% were later flagged as malicious by at least one engine in a multi-engine scanning service.”
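For readers who want to see how a figure like the median lead time could be computed, the following is a small sketch. It assumes lead time means the gap between the moment a URL is flagged and the moment an expert verifies it. That definition, the function name, and the timestamps are assumptions made for illustration, not details or data from the study.

```python
from datetime import datetime
from statistics import median


def median_lead_time_hours(pairs):
    """Median gap, in hours, between flagging and expert verification.

    pairs: iterable of (flagged_at, verified_at) datetime tuples.
    """
    hours = [(verified - flagged).total_seconds() / 3600
             for flagged, verified in pairs]
    return median(hours)


# Hypothetical timestamps, not data from the study.
example_pairs = [
    (datetime(2024, 1, 1), datetime(2024, 1, 2)),   # 24 hours
    (datetime(2024, 1, 1), datetime(2024, 1, 9)),   # 192 hours
    (datetime(2024, 1, 1), datetime(2024, 1, 11)),  # 240 hours
]

if __name__ == "__main__":
    print(median_lead_time_hours(example_pairs))  # 192.0
    # Unit check on the reported figure: 192.8 hours expressed in days.
    print(round(192.8 / 24, 1))  # 8.0
```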
Since it works by finding connections between URLs that share the same underlying infrastructure, it struggles when attackers avoid reuse (e.g., when they use throwaway infrastructure).
In the researchers’ real-world study, PhishLumos was able to generate detection rules for just over half of the starting URLs it was given, with the remaining cases lacking enough shared infrastructure to work with.
The system is also only as good as the external data sources (web scans, passive DNS, and certificate logs) it queries: gaps or blind spots in those records can limit how completely it maps a campaign.
PhishLumos is designed to complement existing defenses, not replace them. For cases where it comes up empty, traditional URL scanners and human analysts remain necessary.
“PhishLumos is an analyst-facing offline tool for triaging a small number of high-priority seed URLs. It is not designed for line-rate inspection,” the researchers explained.
“The campaign-level objective aligns with how phishing operations are organized and how threat intelligence is consumed in practice. Rather than returning a per-URL label, PhishLumos produces reusable mitigation artifacts that directly support workflows such as hunting, blocking, takedown requests, and information sharing.”