How does Perplexity follow robots.txt?

Perplexity respects robots.txt directives. Our crawler, PerplexityBot, will not index the full or partial text content of any site that disallows it via robots.txt. Even when a page is blocked, however, we may still index its domain, headline, and a brief factual summary.
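
For site owners who want to check how a robots.txt rule applies to the PerplexityBot user agent, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the example.com URL, the "SomeOtherBot" name, and the is_allowed helper are all hypothetical. The standard-library parser is a stand-in for illustration only, since this page does not describe how PerplexityBot itself evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks PerplexityBot everywhere,
# allows every other crawler.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not a real test target.
perplexity_allowed = is_allowed(
    ROBOTS_TXT, "PerplexityBot", "https://example.com/article"
)
other_allowed = is_allowed(
    ROBOTS_TXT, "SomeOtherBot", "https://example.com/article"
)

print("PerplexityBot allowed:", perplexity_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

With these rules, the check reports PerplexityBot as blocked and the other crawler as allowed. It only shows how the directive is evaluated and says nothing about the domain, headline, or summary indexing described above.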

Will my content be used for AI training if I allow it to appear in Perplexity?

No. PerplexityBot indexes pages in much the same way as other search engine crawlers. Perplexity does not build foundation models, so your content will not be used for AI model pre-training.

Why have I read that Perplexity’s crawlers don’t respect robots.txt?

Previously, users could prompt Perplexity to summarize a specific URL, even if it was blocked by robots.txt. This allowed users to access content as if they’d copied and pasted it themselves. However, this feature has been disabled to prevent misuse.

Now, PerplexityBot only crawls content in compliance with robots.txt.

Additionally, Perplexity partners with third-party crawlers to help build our search index. We have updated our agreements to ensure these providers also respect robots.txt, particularly for news publisher sites.