The internet runs on trust between website owners and the companies that crawl their content. But in August 2025, that trust was shaken when Cloudflare publicly accused Perplexity AI of using stealth crawlers to access data that had been explicitly blocked. The dispute quickly became about more than technical details. It touched on AI ethics, transparency, and the future of web scraping, topics with consequences for publishers, developers, and the millions who rely on AI tools.
Cloudflare’s Allegations vs. Perplexity’s Response
In its August 4 blog post, Cloudflare claimed that Perplexity had been using undeclared bots to evade robots.txt directives and Web Application Firewall (WAF) rules. Cloudflare stated that while Perplexity’s official user agents, PerplexityBot and Perplexity-User, would stop when blocked, the company secretly switched to less identifiable crawlers.
According to Cloudflare, the crawlers in question appeared to operate covertly by:
- Presenting themselves as standard browsers such as Google Chrome.
- Frequently rotating IP addresses and Autonomous System Numbers (ASNs).
- Ignoring robots.txt restrictions placed by site owners.
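The first of these tactics, presenting a browser identity instead of a declared crawler identity, can be illustrated with a minimal sketch. The user-agent strings and the naive filter below are illustrative only; they are not Cloudflare's actual detection logic or Perplexity's actual headers:

```python
# Hypothetical illustration: a declared crawler identity vs. a generic
# Chrome identity. Both strings are examples, not verbatim captures.
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
SPOOFED_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

def looks_like_declared_bot(user_agent: str) -> bool:
    """Naive filter a site operator might use to match declared Perplexity bots."""
    return any(token in user_agent for token in ("PerplexityBot", "Perplexity-User"))

# A filter keyed on the declared identities catches the official bot...
assert looks_like_declared_bot(DECLARED_UA)
# ...but a request wearing a generic Chrome user agent sails straight through it.
assert not looks_like_declared_bot(SPOOFED_UA)
```

This is why name-based blocking alone fails against impersonation: the filter can only match identities a crawler volunteers.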
In its blog post, Cloudflare argued that “these bots are designed to resemble ordinary browsers” and that they continued to collect data “even when site owners have explicitly opted out.”
Cloudflare said it launched the investigation after numerous customers reported strange activity: despite rules meant to block Perplexity, their content was still being accessed. These site operators had:
- Disallowed Perplexity in their robots.txt.
- Added WAF rules targeting PerplexityBot and Perplexity-User.
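The first of those measures can be verified with Python's standard-library robots.txt parser. The file contents below are an illustration of the kind of configuration described above, not taken from any specific site:

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt that disallows Perplexity's declared
# crawlers while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The declared Perplexity crawlers are refused...
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))   # False
print(parser.can_fetch("Perplexity-User", "https://example.com/article")) # False
# ...while any other user agent is permitted.
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))    # True
```

The catch, central to Cloudflare's allegation, is that robots.txt is purely advisory: a crawler that never identifies itself by those names is never matched by those rules.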
And yet, the data showed up in Perplexity’s AI answers. Cloudflare confirmed that the official bots were being blocked, but other traffic sources tied to Perplexity still slipped through. To support its claims, Cloudflare created decoy sites: brand-new, unlisted domains with no public exposure and strict blocking rules.
In theory, no bot could access them, but when Cloudflare queried Perplexity AI, the responses included details from these restricted pages. The findings shocked even veteran security researchers. Business Insider reported that Cloudflare detected Perplexity traffic presenting itself as Google, a tactic often linked to questionable practices.
According to Cloudflare CEO Matthew Prince, the behavior was strikingly similar to tactics “typically seen from North Korean hackers.”
After verifying its findings, Cloudflare moved swiftly to limit Perplexity’s access. The company:
- Removed Perplexity from its verified bots program.
- Updated its managed WAF rules to block the suspected stealth crawlers.
- Issued a public advisory warning site operators about Perplexity’s behavior.
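In spirit, a managed WAF rule of this kind reduces to a decision function over request attributes. The sketch below is a deliberate simplification with placeholder values (the ASNs come from the documentation-reserved range, not real attributions); production rules combine far more signals, such as TLS fingerprints, IP reputation, and behavioral heuristics:

```python
# Simplified sketch of a WAF-style decision, assuming hypothetical inputs.
BLOCKED_UA_TOKENS = ("PerplexityBot", "Perplexity-User")
SUSPECT_ASNS = {64496, 64497}  # placeholder ASNs (documentation range)

def waf_decision(user_agent: str, asn: int) -> str:
    """Return an action for a request: 'block', 'challenge', or 'allow'."""
    if any(token in user_agent for token in BLOCKED_UA_TOKENS):
        return "block"       # declared crawler, matched by name
    if asn in SUSPECT_ASNS:
        return "challenge"   # undeclared traffic from a flagged network
    return "allow"
```

The layered design reflects the problem Cloudflare described: name-based blocks handle declared bots, while network-level signals are needed for traffic that hides its identity.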
Prince stressed that the issue was about upholding basic internet norms rather than targeting a single company, noting that bots should be operated transparently and within the rules, and that disguising their activity crosses a line.
Perplexity, however, strongly rejected these claims. Company spokesperson Jesse Dwyer dismissed the blog post as little more than a “sales pitch,” telling TechCrunch that Cloudflare’s screenshots “show that no content was accessed.” In a follow-up, he claimed the crawler mentioned in the post “isn’t even ours.” Perplexity countered that the traffic identified by Cloudflare may have originated from BrowserBase, a third-party service it sometimes relies on.
Perplexity accused Cloudflare of misinterpreting traffic and blowing the issue out of proportion. In its own blog response, the startup even mocked the report, calling it “more flair than cloud.”
Industry Reactions
The move drew mixed reactions across the tech community. Some publishers welcomed Cloudflare’s hard stance, arguing that AI firms often take a “scrape first, apologize later” approach that disregards website owners’ choices. Others, however, viewed the public confrontation as unusually aggressive.
Alex Stamos, former Facebook security chief, noted in a social media post that while enforcing robots.txt is important, “turning disputes over web norms into public shaming campaigns may not be the most productive path.” Meanwhile, AI advocates suggested that Cloudflare’s framing was designed to reinforce its own value as the gatekeeper of web traffic.
Still, the episode resonated with many smaller site operators, who often lack the resources to fend off unauthorized scraping. For them, Cloudflare’s intervention signaled that at least one major infrastructure player is willing to confront AI companies head-on.
Broader Implications
The sharp disagreement highlights a growing gray area in how AI companies collect training data. Some analysts argue that Cloudflare is using the controversy to reinforce its positioning as a protector of publisher rights. Others point out that Perplexity’s denial does not entirely dispel concerns about the opacity of AI crawlers.
As Sarah Roberts, a UCLA researcher specializing in internet governance, noted in a recent panel discussion, “The tension here is not just about one company versus another. It’s about whether AI firms can be trusted to respect long-standing web norms like robots.txt.”
This back-and-forth underscores a larger issue that extends far beyond Cloudflare and Perplexity: as AI companies race to train models, the ethics and legality of large-scale web scraping are being tested in real time, raising urgent questions about consent, transparency, and control over online content.
Meanwhile, many security experts warned that if impersonation tactics like these become widespread, trust between crawlers and site owners could collapse.
At its heart, the Cloudflare vs. Perplexity fight isn’t just about bots. It’s about the future of web norms. For decades, companies like Google and Bing have respected robots.txt as an industry standard. But as AI startups race to collect training data, the temptation to bend or break rules is increasing.
Cloudflare portrays itself as a defender of those standards, while Perplexity insists it’s being unfairly singled out. Either way, the conflict highlights a central question: should AI treat the internet as free training data, or respect the choices of site owners?
The truth may remain murky, but the fight has already made waves. For AI companies, the lesson is clear: respecting site rules is about more than compliance; it’s about trust. And as AI becomes more deeply embedded into the fabric of the internet, that trust will decide who gets to build the future.