Perplexity AI, an emerging question-answering engine powered by advanced large language models, has recently come under scrutiny for deploying stealth crawling techniques that bypass standard web defenses.
Initially launched with transparent intentions, Perplexity’s crawlers would identify themselves via declared user agents such as PerplexityBot/1.0, respecting robots.txt directives and web application firewall (WAF) rules.
However, in early August 2025 researchers observed that once blocked, Perplexity began modifying its identity mid-crawl, switching to generic browser user agents and unannounced IP ranges in order to access disallowed content.
Cloudflare analysts noted that this shift in behavior represented a deliberate evasion tactic rather than an inadvertent misconfiguration.
After encountering network-level blocks, the system altered its user agent string to impersonate Chrome on macOS, issuing requests such as:
GET /secret-page.html HTTP/1.1
Host: testexample.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
These stealth requests rotated through multiple autonomous system numbers (ASNs) and IP blocks not publicly attributed to Perplexity, enabling persistent access across millions of daily requests.
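To illustrate why a change of user agent sidesteps rules keyed on a declared crawler name, here is a minimal Python sketch using the standard library's robots.txt parser. The robots.txt content and the `is_allowed` helper are assumptions made for illustration; they are not taken from any affected site. robots.txt is advisory, so the sketch only shows how a compliant client would evaluate the rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the declared Perplexity crawler.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /
"""

DECLARED_UA = "PerplexityBot/1.0"
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
URL = "https://testexample.com/secret-page.html"


def is_allowed(user_agent: str, url: str = URL) -> bool:
    """Evaluate the hypothetical robots.txt for the given user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("declared crawler allowed:", is_allowed(DECLARED_UA))
    print("generic browser UA allowed:", is_allowed(BROWSER_UA))
```

Because the rule matches on the crawler's self-reported name, the same request sent with a generic browser string no longer matches it. Name-based controls therefore depend on the crawler identifying itself honestly.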
The ramifications of this behavior are significant. Website operators who explicitly disallowed Perplexity in their robots.txt files and deployed custom WAF rules reported continued unauthorized scraping of sensitive pages.
This abuse of trust undermines core internet principles and raises legal and policy questions regarding AI training data sourcing.
Content owners now face the difficulty of distinguishing legitimate human traffic from obfuscated AI crawlers, complicating compliance with privacy regulations and copyright protections.
Furthermore, Perplexity’s fallback strategy upon being blocked—relying on alternative data sources—demonstrates adaptive persistence.
When direct crawling was unsuccessful, the system generated answers based on secondary websites, though with diminished specificity compared to original content.
This multi-source aggregation underscores the AI’s resilience and amplifies concerns over data provenance and accuracy.
Detection Evasion Mechanisms
A key aspect of Perplexity’s sophisticated persistence is its dynamic user agent rotation combined with rapid ASN hopping.
By programmatically cycling through user agents and IP prefixes, the crawler evades signature-based firewall rules.
Cloudflare researchers identified that stealth crawlers maintain session continuity by preserving cookies and referrer headers across identity changes, effectively masquerading as individual human users.
Mitigation requires behavioral analysis that flags anomalous patterns—high request velocity, uniform inter-request timing, and repeated cookie exchanges—rather than static signature matching.
Continuous refinement of bot management heuristics and adoption of emerging standards like Web Bot Auth are crucial to counteract this evolving threat.
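Two of the behavioral signals mentioned above, high request velocity and uniform inter-request timing, can be sketched in Python as follows. The function name `flag_client` and both thresholds are hypothetical values chosen for illustration, not figures from Cloudflare or any bot management product. A production system would combine many more signals.

```python
from statistics import mean, pstdev


def flag_client(timestamps, max_rate=5.0, min_cv=0.1):
    """Return behavioral flags for one client's request timestamps (seconds).

    max_rate: assumed ceiling on requests per second for a human visitor.
    min_cv:   assumed floor on the coefficient of variation of the gaps
              between requests; very regular gaps suggest automation.
    """
    if len(timestamps) < 3:
        return []  # too little data to judge

    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    duration = ts[-1] - ts[0]
    flags = []

    # Signal 1: sustained request rate above the assumed human ceiling.
    if duration > 0 and (len(ts) - 1) / duration > max_rate:
        flags.append("high_velocity")

    # Signal 2: gaps that barely vary relative to their mean.
    average_gap = mean(gaps)
    if average_gap > 0 and pstdev(gaps) / average_gap < min_cv:
        flags.append("uniform_timing")

    return flags


if __name__ == "__main__":
    scripted = [i * 0.1 for i in range(50)]        # one request every 100 ms
    browsing = [0.0, 2.5, 9.0, 9.8, 30.0, 41.2]    # irregular, slow
    print(flag_client(scripted))
    print(flag_client(browsing))
```

Since these checks look only at timing, they are unaffected by the user agent string or source IP a client presents, which is the property that makes behavioral analysis useful against rotating identities.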