Perplexity caught using stealth, undeclared crawlers to evade website no-crawl directives
Imagine a web browser that promises to revolutionize how you explore the internet, powered by cutting-edge AI. Now imagine it’s launched by a company accused of sidestepping the internet’s ethical boundaries to gather the data that fuels it. That’s the story of Perplexity, an AI-powered search startup, which unveiled its Comet browser in July 2025, hot on the heels of allegations that it unlawfully scraped websites using stealth crawlers.
How does a company pivot from controversy to innovation, and what does this mean for the future of web browsing?
Perplexity Launches Comet Web Browser After Getting Caught Scraping Sites Unlawfully
Perplexity, a rising star in AI-driven search, launched its Comet web browser in July 2025, aiming to redefine how users interact with the internet. Billed as a “thought partner,” Comet integrates Perplexity’s AI to provide real-time answers, contextual insights, and a seamless browsing experience. CEO Aravind Srinivas promised “core browsing improvements that Chrome hasn’t shipped for ages,” positioning Comet as a competitor to established giants.
But the launch comes under a cloud. Just weeks earlier, Cloudflare, a web security titan handling roughly 20% of global web traffic, accused Perplexity of using "stealth, undeclared crawlers" to bypass website no-crawl directives, raising ethical and legal questions. This juxtaposition of innovation and controversy raises the question: Can Perplexity's ambitious browser succeed while its data practices are under scrutiny?
Perplexity's Comet Browser: A New Frontier?
Comet is designed to blend AI-powered search with traditional browsing, building the real-time answers and contextual insights of Perplexity's AI directly into everyday navigation.
The browser targets knowledge seekers, researchers, and professionals, drawing on the roughly 15 million users of Perplexity's existing tools. Its launch aligns with an $18 billion valuation surge, fueled by deals like a partnership with Airtel to offer a free year of Perplexity Pro to 360 million customers. Yet, as Perplexity pushes boundaries, its methods for gathering the data that powers Comet have sparked a firestorm.
The Perplexity Scraping Scandal Exposed: What are Stealth Crawlers?
In August 2025, Cloudflare dropped a bombshell, alleging Perplexity used undeclared crawlers to evade website no-crawl directives, specifically robots.txt files. These files are the internet’s “do not enter” signs, guiding respectful bots on what content to avoid. Cloudflare’s tests revealed Perplexity’s tactics:
- Disguised User Agents: When blocked, Perplexity’s bots switched from declared identifiers (PerplexityBot, Perplexity-User) to a generic Chrome-on-macOS user agent, mimicking human traffic.
- IP Rotation: Bots used unlisted IP addresses and varied Autonomous System Numbers (ASNs) to dodge detection, generating 3-6 million daily stealth requests across tens of thousands of domains.
- Ignoring Robots.txt: Perplexity often bypassed or failed to fetch robots.txt files, accessing restricted content on test domains created by Cloudflare.
This behavior, Cloudflare argued, violates web norms outlined in RFC 9309, undermining the trust-based internet ecosystem. In contrast, OpenAI’s ChatGPT respected robots.txt and ceased crawling when blocked, earning praise as a model of compliance. Perplexity’s response was defiant, dismissing Cloudflare’s report as a “publicity stunt” and arguing that AI assistants like theirs require real-time data to answer user queries, unlike traditional crawlers building databases. But does this justify bypassing website owners’ explicit wishes?
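For readers unfamiliar with how a compliant crawler is supposed to behave, here is a minimal sketch using Python's standard urllib.robotparser and a hypothetical robots.txt that opts out of Perplexity's declared agents; it also illustrates why switching to a generic browser user agent sidesteps rules that only name the declared bots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that opts out of Perplexity's declared crawlers.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks the directives for its declared user agent
# before fetching, and backs off when it is disallowed.
for agent in ("PerplexityBot", "Perplexity-User", "Mozilla/5.0"):
    allowed = parser.can_fetch(agent, "https://example.com/private/report.html")
    print(f"{agent}: {'fetch' if allowed else 'skip (disallowed)'}")
```

The declared Perplexity agents are refused while the generic Chrome-style agent sails through, which is exactly why swapping user agents defeats the site owner's stated intent even when the directives themselves are technically honored.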
Ethical and Legal Fallout for Perplexity
The use of stealth crawlers raises thorny ethical and legal issues. Websites rely on robots.txt to protect sensitive data, reduce server strain, and maintain control over their content. Ignoring these directives can expose that data to unwanted collection, pile extra load onto servers, and strip owners of any say in how their work is reused.
Perplexity’s practices aren’t new—developer Robb Knight reported similar scraping in 2024, and
Amazon is reviewing whether Perplexity violated AWS terms. The controversy highlights a broader tension: AI companies’ hunger for data versus publishers’ rights to control their content.
Website Owners Need to Fight Back
Cloudflare has taken decisive action, delisting Perplexity as a verified bot and deploying heuristics to block its stealth crawlers, protecting over 2.5 million websites. Website owners can adopt strategies to combat unauthorized crawling:
- Monitor Server Logs: Identify unusual traffic patterns or repeated requests from rotating IPs (a rough sketch of this follows the list).
- Deploy Honeypots: Set traps (e.g., hidden links) to catch crawlers, enabling IP bans.
- Use Bot Management Tools: Tools like Cloudflare’s analyze request headers and behavior, blocking non-compliant bots.
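For the log-monitoring step above, the sketch below is a rough illustration rather than a production detector: it scans an access log in the common/combined format and flags user agents whose requests come from an unusually large pool of distinct IPs, one signature of a crawler rotating addresses. The log path, thresholds, and field layout are assumptions to adapt to your own setup.

```python
import re
from collections import defaultdict

# Matches the common/combined log format: IP, request line, status, size,
# then optional "referer" "user-agent". Adjust the pattern to your log layout.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" \d{3} \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def suspicious_agents(log_path, min_requests=1000, min_distinct_ips=50):
    """Flag user agents whose traffic arrives from many rotating IPs."""
    requests = defaultdict(int)   # user agent -> total request count
    sources = defaultdict(set)    # user agent -> distinct source IPs

    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_LINE.match(line)
            if not match or match.group("agent") is None:
                continue
            agent = match.group("agent")
            requests[agent] += 1
            sources[agent].add(match.group("ip"))

    return [
        (agent, requests[agent], len(sources[agent]))
        for agent in requests
        if requests[agent] >= min_requests and len(sources[agent]) >= min_distinct_ips
    ]

if __name__ == "__main__":
    # Hypothetical path; point this at your own access log.
    for agent, count, ip_count in suspicious_agents("/var/log/nginx/access.log"):
        print(f"{count} requests from {ip_count} IPs: {agent}")
```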
Innovative defenses like Cloudflare's "AI Labyrinth," which traps bots in fake content, and "Pay Per Crawl," which revives the long-dormant HTTP 402 (Payment Required) status code, empower site owners to fight back.
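Pay Per Crawl itself is a managed Cloudflare feature, but the core idea of answering unpaid crawler requests with HTTP 402 can be sketched in a few lines of Python's built-in http.server. The user-agent list, the X-Crawl-Payment header, and the port below are hypothetical stand-ins, not part of Cloudflare's actual scheme.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of crawler user-agent tokens this site wants to charge.
PAYWALLED_AGENTS = ("perplexitybot", "perplexity-user", "gptbot")

class PayPerCrawlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "").lower()
        # Hypothetical header a paying crawler might present; real payment
        # negotiation (as in Cloudflare's scheme) is far more involved.
        has_payment = "X-Crawl-Payment" in self.headers

        if any(token in agent for token in PAYWALLED_AGENTS) and not has_payment:
            self.send_response(402)  # Payment Required
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Crawling this site requires a paid agreement.\n")
            return

        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello, human (or paying bot).\n")

if __name__ == "__main__":
    HTTPServer(("localhost", 8402), PayPerCrawlHandler).serve_forever()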
The scraping allegations drew swift backlash online. On X, users like @TechBit lamented Perplexity's "shady" tactics, while @paigemacp called for stricter rules governing AI crawlers.
As the internet evolves, one question remains: Can innovation coexist with respect for digital boundaries? The answer will shape not just Perplexity’s fate, but the web’s future.