On August 4, 2025, Cloudflare accused the AI search engine Perplexity of using 'disguised crawlers' to evade robots.txt. The findings raise major AI data ethics questions and prompted Cloudflare to remove Perplexity from its 'Trusted Bots' list and to offer users a direct block.
Cloudflare, a well-known internet infrastructure and security firm, has publicly called out Perplexity, an artificial-intelligence search engine that relies on large language models, for consistently circumventing website indexing rules. The allegation, outlined in an August 4, 2025 blog post by Cloudflare, claims that Perplexity uses "disguised crawlers" to sidestep robots.txt directives, the standard protocol that site owners employ to manage bot access.
According to Cloudflare's technical review, when Perplexity's main crawler is blocked via robots.txt, the service allegedly switches to a secondary bot. This backup crawler is said to hide its identity, presenting itself as a typical web-browser user-agent in order to keep scraping content from sites that have expressly denied it. This approach marks a departure from conventional internet etiquette concerning bot identification and content indexing.
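To make the mechanism concrete, here is a minimal sketch of the check a compliant crawler is expected to perform before fetching a page, written with Python's standard urllib.robotparser. The domain, paths, and policy are placeholders; "PerplexityBot" is one of the user-agent tokens Perplexity publicly documents for its crawlers.

```python
# Minimal sketch of the robots.txt check a compliant crawler performs
# before fetching a page. Domain and paths are placeholders; the
# disallow rule targets PerplexityBot, one of Perplexity's declared crawlers.
from urllib import robotparser

# Example policy a site owner might publish at https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks the policy and stops here when denied.
print(parser.can_fetch("PerplexityBot", "https://example.com/articles/1"))  # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/articles/1"))    # True
```

Nothing in the protocol enforces that check; a crawler that simply skips it, or identifies itself as an ordinary browser, sees the same content as any visitor.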
In reaction to these findings, Cloudflare has taken several decisive steps: it has removed Perplexity from its list of trusted (verified) bots, published the technical details of the cloaked crawler it observed, and added rules that let website owners block Perplexity's crawlers directly through their Cloudflare settings.
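Cloudflare customers enable that block with a single dashboard setting, so no code is required on their side. Purely as an illustration of what a user-agent based block amounts to, and of why it only works against crawlers that identify themselves, the following hypothetical origin-side sketch (application and port are placeholders) rejects requests carrying Perplexity's declared user-agent tokens:

```python
# Hypothetical origin-side sketch: reject requests whose declared
# User-Agent matches Perplexity's documented crawler tokens.
# This only stops bots that identify themselves; a crawler presenting
# a generic browser User-Agent passes straight through, which is the
# behavior Cloudflare says it observed.
from wsgiref.simple_server import make_server

BLOCKED_UA_TOKENS = ("PerplexityBot", "Perplexity-User")  # declared crawler names

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "")
    if any(token.lower() in ua.lower() for token in BLOCKED_UA_TOKENS):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Crawler blocked by site policy.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, human (or anything claiming to be one).\n"]

if __name__ == "__main__":
    make_server("127.0.0.1", 8000, app).serve_forever()
```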
The ramifications of Perplexity's purported actions go beyond simple technical violations, sparking significant discussion about data ethics in the fast-moving artificial-intelligence arena. The intentional bypass of robots.txt, a foundational tool for webmaster control over intellectual property and server load, raises questions about the long-term effectiveness of current blocking strategies against increasingly sophisticated AI agents. Experts warn that such tactics could fuel an arms race between content creators seeking to safeguard their data and AI systems eager to ingest massive amounts of information.
The episode highlights a growing tension between the massive data needs of large language models and the rights of content creators to decide how their material is accessed and used. As AI models become more skilled at mimicking human browsing patterns, the potency of traditional bot-detection and blocking methods may wane, potentially prompting the development of new protocols or legal frameworks to regulate AI data collection.
The original article accurately reports on Cloudflare's flagging of Perplexity for allegedly bypassing robots.txt directives. The information presented aligns directly with the details provided in the Cloudflare blog post. Specifically, the original article correctly identifies Cloudflare as the source of the allegation, states that Perplexity is accused of using "disguised crawlers" or "browser-masked bot[s]" to evade site-level restrictions, and notes that Perplexity has been excluded from Cloudflare's trusted bots list. It also correctly mentions that website owners can block Perplexity via Cloudflare settings and that Cloudflare shared technical details of the cloaked crawler.
The Cloudflare blog post explicitly details how its researchers observed Perplexity's crawlers ignoring robots.txt rules and user-agent string directives. Cloudflare states that when its systems blocked known Perplexity bots, it then detected sophisticated, browser-masked bots that appeared to be Perplexity continuing to scrape content from sites that had explicitly opted out. The blog post provides technical analysis of these observations, including methods used by Perplexity to obfuscate its identity, such as spoofing IP addresses and user-agent strings. The original article's inclusion of a sentence questioning "AI data ethics" and the efficacy of blocks also reflects a relevant discussion point emerging from such incidents.
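Those observations underline why defenders treat the User-Agent header as a claim rather than proof of identity. A common verification step, sketched below, is to accept a crawler's claim only when the request originates from an IP range that the crawler's operator publishes; the token "ExampleBot" and the address ranges here are illustrative placeholders, not Perplexity's or Cloudflare's actual data.

```python
# Sketch: treat the User-Agent as a claim and verify it against the
# IP ranges a crawler operator publishes. The ranges below are
# documentation-reserved placeholders, not any vendor's real addresses.
import ipaddress

# Hypothetical published ranges for a crawler that calls itself "ExampleBot".
PUBLISHED_RANGES = {
    "ExampleBot": [
        ipaddress.ip_network("192.0.2.0/24"),   # TEST-NET-1 documentation range
        ipaddress.ip_network("2001:db8::/32"),  # IPv6 documentation range
    ],
}

def is_verified_crawler(claimed_token: str, remote_ip: str) -> bool:
    """Return True only if the claimed crawler's IP falls inside its published ranges."""
    ranges = PUBLISHED_RANGES.get(claimed_token, [])
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in ranges)

# A request claiming to be ExampleBot from inside the published range: verified.
print(is_verified_crawler("ExampleBot", "192.0.2.17"))   # True
# The same claim from an unrelated address: treated as an impostor.
print(is_verified_crawler("ExampleBot", "203.0.113.9"))  # False
```

Checks of this kind catch impostors that borrow a crawler's name, but they say nothing about bots that never declare themselves in the first place, which is why large providers also rely on behavioral fingerprinting.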
There are no discrepancies or misrepresentations of the facts as presented by Cloudflare. The original article acts as a concise summary of the Cloudflare announcement, capturing the main points without introducing additional claims or distortions.