On August 4, 2025, Cloudflare accused the AI search engine Perplexity of using 'disguised crawlers' to evade robots.txt. The findings raise major AI data ethics questions and prompted Cloudflare to remove Perplexity from its 'Trusted Bots' list and to offer users a direct block.
Cloudflare, a well-known internet infrastructure and security firm, has publicly called out Perplexity, an artificial-intelligence search engine that relies on large language models, for consistently circumventing website indexing rules. The allegation, outlined in an August 4, 2025 blog post by Cloudflare, claims that Perplexity uses "disguised crawlers" to sidestep robots.txt directives, the standard protocol that site owners employ to manage bot access.
According to Cloudflare's technical review, when Perplexity's main crawler is blocked via robots.txt, the service allegedly switches to a secondary bot. This backup crawler is said to hide its identity, presenting itself as a typical web-browser user-agent in order to keep scraping content from sites that have expressly denied it. This approach marks a departure from conventional internet etiquette concerning bot identification and content indexing.
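To make the mechanism concrete, here is a minimal sketch of the check a compliant crawler is expected to perform before fetching a page, written with Python's standard urllib.robotparser. The domain, paths, and policy are placeholders; "PerplexityBot" is one of the user-agent tokens Perplexity publicly documents for its crawlers.

```python
# Minimal sketch of the robots.txt check a compliant crawler performs
# before fetching a page. Domain and paths are placeholders; the
# disallow rule targets PerplexityBot, one of Perplexity's declared crawlers.
from urllib import robotparser

# Example policy a site owner might publish at https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks the policy and stops here when denied.
print(parser.can_fetch("PerplexityBot", "https://example.com/articles/1"))  # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/articles/1"))    # True
```

Nothing in the protocol enforces that check; a crawler that simply skips it, or identifies itself as an ordinary browser, sees the same content as any visitor.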
In reaction to these findings, Cloudflare has taken several decisive steps: it has removed Perplexity from its list of trusted (verified) bots, published the technical details of the cloaked crawler it observed, and added rules that let website owners block Perplexity's crawlers directly through their Cloudflare settings.
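Cloudflare customers enable that block with a single dashboard setting, so no code is required on their side. Purely as an illustration of what a user-agent based block amounts to, and of why it only works against crawlers that identify themselves, the following hypothetical origin-side sketch (application and port are placeholders) rejects requests carrying Perplexity's declared user-agent tokens:

```python
# Hypothetical origin-side sketch: reject requests whose declared
# User-Agent matches Perplexity's documented crawler tokens.
# This only stops bots that identify themselves; a crawler presenting
# a generic browser User-Agent passes straight through, which is the
# behavior Cloudflare says it observed.
from wsgiref.simple_server import make_server

BLOCKED_UA_TOKENS = ("PerplexityBot", "Perplexity-User")  # declared crawler names

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "")
    if any(token.lower() in ua.lower() for token in BLOCKED_UA_TOKENS):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Crawler blocked by site policy.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, human (or anything claiming to be one).\n"]

if __name__ == "__main__":
    make_server("127.0.0.1", 8000, app).serve_forever()
```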
The ramifications of Perplexity's purported actions go beyond simple technical violations, sparking significant discussion about data ethics in the fast-moving artificial-intelligence arena. The intentional bypass of robots.txt, a foundational tool for webmaster control over intellectual property and server load, raises questions about the long-term effectiveness of current blocking strategies against increasingly sophisticated AI agents. Experts warn that such tactics could fuel an arms race between content creators seeking to safeguard their data and AI systems eager to ingest massive amounts of information.
The episode highlights a growing tension between the massive data needs of large language models and the rights of content creators to decide how their material is accessed and used. As AI models become more skilled at mimicking human browsing patterns, the potency of traditional bot-detection and blocking methods may wane, potentially prompting the development of new protocols or legal frameworks to regulate AI data collection.
The original article accurately reports on Cloudflare's flagging of Perplexity for allegedly bypassing robots.txt directives. The information presented aligns directly with the details provided in the Cloudflare blog post. Specifically, the original article correctly identifies Cloudflare as the source of the allegation, states that Perplexity is accused of using "disguised crawlers" or "browser-masked bot[s]" to evade site-level restrictions, and notes that Perplexity has been excluded from Cloudflare's trusted bots list. It also correctly mentions that website owners can block Perplexity via Cloudflare settings and that Cloudflare shared technical details of the cloaked crawler.
The Cloudflare blog post explicitly details how its researchers observed Perplexity's crawlers ignoring robots.txt rules and user-agent string directives. Cloudflare states that when its systems blocked known Perplexity bots, it then detected sophisticated, browser-masked bots that appeared to be Perplexity continuing to scrape content from sites that had explicitly opted out. The blog post provides technical analysis of these observations, including methods used by Perplexity to obfuscate its identity, such as spoofing IP addresses and user-agent strings. The original article's inclusion of a sentence questioning "AI data ethics" and the efficacy of blocks also reflects a relevant discussion point emerging from such incidents.
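Those observations underline why defenders treat the User-Agent header as a claim rather than proof of identity. A common verification step, sketched below, is to accept a crawler's claim only when the request originates from an IP range that the crawler's operator publishes; the token "ExampleBot" and the address ranges here are illustrative placeholders, not Perplexity's or Cloudflare's actual data.

```python
# Sketch: treat the User-Agent as a claim and verify it against the
# IP ranges a crawler operator publishes. The ranges below are
# documentation-reserved placeholders, not any vendor's real addresses.
import ipaddress

# Hypothetical published ranges for a crawler that calls itself "ExampleBot".
PUBLISHED_RANGES = {
    "ExampleBot": [
        ipaddress.ip_network("192.0.2.0/24"),   # TEST-NET-1 documentation range
        ipaddress.ip_network("2001:db8::/32"),  # IPv6 documentation range
    ],
}

def is_verified_crawler(claimed_token: str, remote_ip: str) -> bool:
    """Return True only if the claimed crawler's IP falls inside its published ranges."""
    ranges = PUBLISHED_RANGES.get(claimed_token, [])
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in ranges)

# A request claiming to be ExampleBot from inside the published range: verified.
print(is_verified_crawler("ExampleBot", "192.0.2.17"))   # True
# The same claim from an unrelated address: treated as an impostor.
print(is_verified_crawler("ExampleBot", "203.0.113.9"))  # False
```

Checks of this kind catch impostors that borrow a crawler's name, but they say nothing about bots that never declare themselves in the first place, which is why large providers also rely on behavioral fingerprinting.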
There are no discrepancies or misrepresentations of the facts as presented by Cloudflare. The original article acts as a concise summary of the Cloudflare announcement, capturing the main points without introducing additional claims or distortions.