The Dark Web Has a New Spy, and It’s Not Human

In cybercrime’s cat-and-mouse game, criminals almost always leave digital breadcrumbs behind. Every leaked credential posted, boasted about, or sold on the dark web forms a trail.

Investigators have long relied on threat intelligence platforms to monitor the clear, deep, and dark web, generating billions of log events and analyzing potential threats. A typical enterprise might ingest data from 10-50 different threat intelligence feeds. However, the sheer flood of infostealer activity in 2024 and 2025 has overwhelmed many SOCs.

That’s where large language models (LLMs) change the equation. Instead of analysts manually parsing hundreds of forum posts on XSS, Exploit.in, or RAMP, GPT-powered tools can scan them in bulk. With seasoned analysts steering the models to extract the right variables, a team of investigators used GPT-3.5-turbo to sift through and summarize conversations, flagging stolen credentials and mapping infection paths with 96% accuracy, 90% precision, and 88% recall.

LLMs show vast promise in uncovering malicious activity at speed, but how far can they scale in real SOC environments?

Cybercriminals Are “Logging In” Rather Than “Hacking In”

Cybercriminals prefer the path of least resistance into victim environments, which makes abusing valid accounts a favored means of access. Even the corporate single sign-on (SSO) portals meant to guard the enterprise’s front door are turning up in malware logs. These compromised accounts can provide the foothold to escalate privileges and move laterally toward sensitive databases.

Yet this kind of “easy entry” is difficult to spot, forcing organizations to use complex methods to separate legitimate user behavior from malicious activity on their networks. In January 2025, Flare searched more than 22 million stealer logs for five common corporate SSO providers and identified over 312,855 exposed corporate SSO application domains. Once these credentials are shared, any attacker can attempt to use them against a huge number of organizations.

The fallout could be devastating. Between 2021 and 2023, supply chain attacks surged by a staggering 431%. IBM calculated that the global average cost of a data breach in 2025 was $4.4 million. To keep leaks from becoming breaches, defenders must find stolen credentials in the dark web’s back alleys before the attackers do: a near-impossible race that demands being everywhere at once.

What LLMs Can Do

LLMs such as ChatGPT, Gemini, and Microsoft Copilot, a type of AI designed to understand, process, and generate human language at scale, are touching every business in some way or another. Worth $5.72 billion in 2024, the market is expected to surge at a CAGR of 35.92% from 2025, reaching $123.09 billion by 2034.

In an experiment to test their power in the wrong hands, researchers at Carnegie Mellon University prompted an LLM to recreate the conditions that led to the 2017 Equifax breach. Not only did it plan the attack, it also deployed malware and extracted data, all without direct human commands.

Unsettling as that is, the lesson that stands out is that an LLM is only as good as its prompt. Well-crafted queries translate intent into action, steering the model to extract exactly the data investigators need.

When that need is to root out threats stemming from leaked credentials, investigators can apply LLMs to scan dark web forums for key cyber threat intelligence (CTI) signals: illicit sales activity; mentions of large organizations, critical infrastructure, or initial access; exploitable vulnerabilities; geopolitical discussions; technologies; and industries. Coding this information helps analysts filter and focus on the conversations that matter, such as threats targeting specific technologies or sector vulnerabilities.
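
To make that coding step concrete, here is a minimal sketch, not the investigators’ actual prompt or schema, of how a single forum conversation might be labelled with CTI variables using the OpenAI Python client and GPT-3.5-turbo; the field names and prompt wording are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical coding schema: each field mirrors one of the CTI signals above.
CODING_PROMPT = """You are a threat intelligence analyst. Read the dark web forum
conversation below and return JSON with these fields:
- illicit_sale: true if credentials, access, or data are being sold
- organizations: list of named organizations, or []
- critical_infrastructure: true if critical infrastructure is discussed
- initial_access: true if initial access (valid accounts, VPN, RDP) is offered or sought
- vulnerabilities: CVE IDs or products described as exploitable, or []
- technologies: list of technologies referenced
- industries: list of industries referenced

Conversation:
{conversation}
"""

def code_conversation(conversation: str) -> str:
    """Ask the model to code one conversation into structured CTI variables."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": CODING_PROMPT.format(conversation=conversation)}],
        temperature=0,  # deterministic output suits a labelling task
    )
    return response.choices[0].message.content
```

Filtering on the returned fields, for example initial access offers that also name an organization, is what lets analysts narrow millions of posts down to the handful worth a human’s time.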

Investigators tested LLMs’ capacity to operate through a multi-step process: selecting high-quality sources, summarizing conversations, and coding key variables. The summaries generated from daily conversations homed in on relevant information, even when a large number of messages were posted, information that human analysts sometimes missed.
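
As a rough illustration of that multi-step flow, and reusing the `client` and `code_conversation` sketch above, the pipeline below summarizes each thread before coding it; the word limit and data shapes are assumptions for the example, not the investigators’ published method.

```python
def summarize_thread(messages: list[str]) -> str:
    """Condense one day's worth of forum messages into a short analyst-readable summary."""
    joined = "\n".join(messages)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Summarize the threat-relevant points of this dark web forum "
                       f"thread in under 150 words:\n{joined}",
        }],
        temperature=0,
    )
    return response.choices[0].message.content

def process_forum(threads: dict[str, list[str]]) -> list[dict]:
    """Assumes sources were already selected: summarize each thread, then code it."""
    results = []
    for thread_id, messages in threads.items():
        summary = summarize_thread(messages)
        results.append({
            "thread": thread_id,
            "summary": summary,
            "labels": code_conversation(summary),  # coding sketch from earlier
        })
    return results
```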

How to Make LLMs SOC-Ready

Currently, there is a disconnect between cybersecurity executives who love AI and cybersecurity analysts who distrust it. Recent research suggests that 71% of executives find AI has significantly improved productivity, but only 22% of frontline analysts using these tools agree.

If not used properly, the unreliability of LLMs can outweigh the benefits. How input is segmented, how concepts are defined, and even which tenses are used all affect results. LLMs don’t “just figure it out.” They aren’t human, so you shouldn’t expect human-like reasoning or intuition. Instead, treat them as powerful but literal machines.

If you want good results, you need to provide:

  • Clear instructions: spell out what you want, step by step.
  • Relevant context: share background details a human analyst would need to understand the task.
  • Decision-making criteria: define how to prioritize, evaluate, or choose between options.

Think of it this way: everything a skilled human would need to do the job, the LLM also needs, only in the form of explicit input. The better you translate your expertise and intuition into instructions and context, the more reliably the LLM can carry out the task.
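
To illustrate, a hypothetical prompt skeleton might bundle those three ingredients explicitly; every detail below (the triage task, the JSON shape, the priority rule) is an assumption chosen for the example, not a prescribed format.

```python
# Hypothetical prompt skeleton: instructions, context, and decision criteria made explicit.
TRIAGE_PROMPT = """\
## Instructions (step by step)
1. Read the forum excerpt below.
2. Decide whether it describes the sale of corporate SSO credentials.
3. If it does, extract the seller's handle and any named victim organization.
4. Answer in JSON: {{"sso_sale": bool, "seller": str or null, "victim": str or null, "priority": "high" or "low"}}

## Context (what a human analyst would already know)
- "Logs", "stealer logs", and "ULP" usually refer to infostealer output containing credentials.
- Posts may mix Russian and English; both are in scope.

## Decision criteria (how to prioritize)
- Mark priority "high" only if a specific organization or SSO portal URL is named.
- If the post only discusses technique and nothing is for sale, set sso_sale to false.

## Forum excerpt
{excerpt}
"""
```

A thin wrapper fills in the excerpt for each conversation and sends the result to the model; the point is simply that the instructions, background, and prioritization rules live in the prompt rather than in the analyst’s head.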

LLMs are fast becoming the cyber investigator’s most versatile asset, keeping a watchful eye on the forums where criminals barter, boast, and scheme. On these marketplaces, attackers leave trails, like credential dumps and reused malware, that once required painstaking manual analysis. Now, LLMs help comb through millions of posts to spot the faintest echo of a threat in near real time. The most effective teams craft prompts that anchor LLMs to explicit instructions and context so the model knows what to look for: was a vulnerability exploited, or merely discussed? Slicing conversations into manageable chunks helps preserve nuance and prevent context loss.
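
One way to do that slicing, sketched below under the assumption that a thread arrives as a flat list of messages, is to pack messages into character-budgeted chunks with a small overlap so conversational context carries across boundaries; the budget and overlap values are illustrative, not tuned.

```python
def chunk_messages(messages: list[str], max_chars: int = 8000, overlap: int = 2) -> list[list[str]]:
    """Group forum messages into chunks that fit a model's context window.

    A few trailing messages from the previous chunk are repeated at the start of
    the next one so the model keeps the conversational thread.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for msg in messages:
        if current and size + len(msg) > max_chars:
            chunks.append(current)
            current = current[-overlap:]  # carry a little context forward
            size = sum(len(m) for m in current)
        current.append(msg)
        size += len(msg)
    if current:
        chunks.append(current)
    return chunks
```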

Used this way, with a human eye hovering over and validating the output, LLMs help security teams chart the shape of threats before they breach the gates. The technology is powerful, but it is the steering, not the speed, that separates a breakthrough from a blind alley.

About the Author

Estelle Ruellan is a Threat Intelligence Investigator at threat exposure management (TEM) company Flare. With a criminology background, she lost her way into cybercrime. She focuses on using data science to make sense of the cyber threat ecosystem.

Estelle can be reached online at https://www.linkedin.com/in/estelle-ruellan-946778206/


