Written by Perplexity Team

Published on Aug 4, 2025

Agents or Bots? Making Sense of AI on the Open Web

As the internet evolves, so too do the ways in which we access and interact with information. In the earliest days of the web, automated bots played a simple, well-understood role: indexing sites for search, checking links, or scraping data according to clear rules set by website owners. 

But with the rise of AI-powered assistants and user-driven agents, the boundary between what counts as "just a bot" and what serves the immediate needs of real people has become increasingly blurred. 

The Rise of Digital Assistants

Modern AI assistants work fundamentally differently from traditional web crawlers. When you ask Perplexity a question that requires current information—say, "What are the latest reviews for that new restaurant?"—the AI doesn't already have that information sitting in a database somewhere. Instead, it goes to the relevant websites, reads the content, and brings back a summary tailored to your specific question.

Traditional web crawling works the other way around: crawlers systematically visit millions of pages to build massive databases, whether anyone asked for that information or not. User-driven agents, by contrast, fetch content only when a real person requests something specific, and they use that content immediately to answer the user's question. Perplexity's user-driven agents do not store the information or use it for training.
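
The contrast is easier to see in code. The sketch below is illustrative only; it is not Perplexity's implementation, and the helper functions are hypothetical stand-ins for the retrieval and summarization steps.

```python
# Illustrative sketch only (not Perplexity's code): the same HTTP fetch,
# used in two very different patterns.
import requests


def summarize(text, question):
    """Hypothetical stand-in for a summarization step."""
    return text[:200]


def compose_answer(question, snippets):
    """Hypothetical stand-in for composing the final answer."""
    return question + "\n" + "\n".join(snippets)


def crawl(seed_urls, index):
    """Crawler pattern: visit pages proactively and persist them for later."""
    for url in seed_urls:
        page = requests.get(url, timeout=10)
        index[url] = page.text  # content is kept in a store for future use
        # ...then extract links, enqueue more URLs, and repeat indefinitely.


def answer_user_question(question, relevant_urls):
    """User-driven pattern: fetch only what this one question needs."""
    snippets = []
    for url in relevant_urls:  # chosen for this specific request only
        page = requests.get(url, timeout=10)
        snippets.append(summarize(page.text, question))
    return compose_answer(question, snippets)
    # Nothing is retained afterwards and nothing is added to a training set.
```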

Why This Distinction Matters

The difference between automated crawling and user-driven fetching isn't just technical—it's about who gets to access information on the open web. When Google's search engine crawls to build its index, that's different from when it fetches a webpage because you asked for a preview. Google's "user-triggered fetchers" prioritize your experience over robots.txt restrictions because these requests happen on your behalf.
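
The crawler side of that line is straightforward to picture. The snippet below is only a sketch of the traditional crawler-side check, using Python's standard robots.txt parser; the crawler name and URLs are placeholders, not any vendor's actual code.

```python
# Sketch of the crawler-side robots.txt check, using the standard library.
# "ExampleCrawler" and example.com are placeholders.
from urllib import robotparser

robots = robotparser.RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()


def crawler_may_fetch(url: str) -> bool:
    # A bulk crawler visits pages on its own initiative,
    # so it defers to the site's robots.txt rules.
    return robots.can_fetch("ExampleCrawler", url)


def user_triggered_fetch(url: str) -> None:
    # A user-triggered fetcher acts on one person's explicit request,
    # which is why Google documents that such fetchers generally
    # do not apply robots.txt rules to those requests.
    ...
```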

The same applies to AI assistants. When Perplexity fetches a webpage, it's because you asked a specific question requiring current information. The content isn't stored for training—it's used immediately to answer your question.

When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots, they're arguing that any automated tool serving users should be suspect—a position that would criminalize email clients and web browsers, or any other service a would-be gatekeeper decides it doesn't like.

This overblocking hurts everyone. Consider someone using AI to research medical conditions, compare product reviews, or access news from multiple sources. If their assistant gets blocked as a "malicious bot," they lose access to valuable information.

The result is a two-tiered internet where your access depends not on your needs, but on whether your chosen tools have been blessed by infrastructure controllers who care more about which tools you use than what you need. This undermines user choice and threatens the open web's accessibility for innovative services competing with established giants.

A Call for Clarity: How User Agents Actually Work

An AI assistant works just like a human assistant. When you ask it a question that requires current information, it doesn't already know the answer. It looks the answer up for you in order to complete whatever task you've asked.

On Perplexity and other agentic AI platforms, this happens in real time, in response to your request, and the information is used immediately to answer your question. It's not stored in massive databases for future use, and it's not used to train AI models.

User-driven agents only act when users make specific requests, and they only fetch the content needed to fulfill those requests. This is the fundamental difference between a user agent and a bot. 
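
From a website operator's point of view, that difference shows up in the request itself: crawlers and user-driven agents identify themselves with different user agent strings. The sketch below assumes the names Perplexity publishes for its traffic (PerplexityBot for indexing, Perplexity-User for user-driven requests); check Perplexity's current documentation before relying on them.

```python
# Illustrative classifier a site operator might run. The user agent names
# below are assumed from Perplexity's published docs; verify before use.
def classify_request(user_agent_header: str) -> str:
    ua = user_agent_header or ""
    if "PerplexityBot" in ua:
        return "crawler"      # bulk indexing traffic: robots.txt applies
    if "Perplexity-User" in ua:
        return "user-agent"   # a real person asked for this page right now
    return "other"


# A site might rate-limit "crawler" traffic while serving "user-agent"
# traffic the same way it serves any browser acting for a human.
print(classify_request("Mozilla/5.0 (compatible; Perplexity-User/1.0)"))
```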

Directly Addressing Cloudflare: A Question of Competence

Cloudflare's recent blog post managed to get almost everything wrong about how modern AI assistants actually work.

In addition to misunderstanding that 20-25M user agent requests are not scraper traffic, Cloudflare claimed that Perplexity was engaging in "stealth crawling," using hidden bots and impersonation tactics to bypass website restrictions. But the technical facts tell a different story.

It appears Cloudflare confused Perplexity's traffic with 3-6M daily requests of unrelated traffic from BrowserBase, a third-party cloud browser service that Perplexity uses only occasionally for highly specialized tasks (fewer than 45,000 daily requests).

Because Cloudflare has conveniently obfuscated their methodology and declined to answer questions that would help our teams understand what happened, we can only narrow this down to two possible explanations.

  1. Cloudflare needed a clever publicity moment, and we, their own customer, happened to be a useful name to get them one.

  2. Cloudflare fundamentally misattributed 3-6M daily requests from BrowserBase's automated browser service to Perplexity, a basic traffic analysis failure that's particularly embarrassing for a company whose core business is understanding and categorizing web traffic. 

Whichever explanation is true, the technical errors in Cloudflare's analysis aren't just embarrassing—they're disqualifying. When you misattribute millions of requests, publish completely inaccurate technical diagrams, and demonstrate a fundamental misunderstanding of how modern AI assistants work, you've forfeited any claim to expertise in this space.

This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats. If you can't tell a helpful digital assistant from a malicious scraper, then you probably shouldn't be making decisions about what constitutes legitimate web traffic.

The bluster around this issue also reveals that Cloudflare's leadership is either dangerously misinformed on the basics of AI, or simply more flair than cloud. This matters because Cloudflare's customers include businesses of all types, companies that can't afford to have their infrastructure provider chasing charlatan publicity stunts.

Even more embarrassing, Cloudflare published a technical diagram supposedly showing "Perplexity's crawling workflow" that bears no resemblance to how Perplexity actually works. If Cloudflare were truly interested in understanding the data they were seeing, how our systems work, or the fundamental concepts outlined above, they could have done what we encourage all Perplexity users to do. Just ask.
