Perplexity Crawler IP Address List

Managing Perplexity AI Crawlers

Ensuring your site is correctly indexed by AI search engines like Perplexity is a key pillar of modern SEO.

Like many search engines and AI LLM tools, Perplexity uses web crawlers (bots, robots or spiders) and user agents to parse the internet and gather information from websites indexed globally.

Perplexity's crawlers also respect robots.txt rules, much like traditional search spiders, so adding a rule to your robots.txt file that disallows Perplexity is an effective way to stop it crawling your site.

If your website has gated content, paywalls, subscription-based content or members-only areas, then preventing Perplexity from crawling your pages is important for keeping that content private and stopping it from becoming available within AI LLMs.

How do Perplexity crawlers work?

Perplexity crawlers don't function in entirely the same way as traditional search engine crawlers.

Search engines, especially Google, have historically used crawlers to constantly scrape the internet to amass knowledge about websites, data and pages.

AI LLM tools like Perplexity instead focus on real-time retrieval: identifying, parsing and summarising the latest web content to produce their answers and results, rather than retrieving that information from data already stored on their own systems.

What is Retrieval Augmented Generation?

1. Real-Time Retrieval: After a user submits a question, the necessary and relevant information is extracted from website pages.

2. RAG Integration (Retrieval Augmented Generation): The retrieved information is stored as numerical vectors and sent to the LLM, which uses it to generate a response.

3. Direct Answers: Rather than returning a list of links, Perplexity provides summarised answers that respond directly to the user's query.
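To make the retrieve-then-generate flow above more concrete, here is a toy sketch in Python. It is not Perplexity's pipeline: the bag-of-words "embedding", the `retrieve` and `build_prompt` helpers and the example URLs are all hypothetical stand-ins. Production RAG systems use learned dense vector embeddings, and the final generation step (sending the prompt to an LLM) is deliberately left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a term-count vector (real systems use learned dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, pages: dict[str, str], k: int = 1) -> list[tuple[str, str]]:
    """Step 1: rank freshly fetched page text (url -> text) against the user's question."""
    q = embed(query)
    ranked = sorted(pages.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[tuple[str, str]]) -> str:
    """Step 2: hand the retrieved passages to the LLM, numbered so the answer can cite sources."""
    context = "\n".join(f"[{i}] {text} (source: {url})" for i, (url, text) in enumerate(passages, 1))
    return f"Answer using only these sources and cite them:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Hypothetical pages standing in for content fetched at query time.
    pages = {
        "https://example.com/robots-guide": "robots.txt rules tell crawlers which paths they may fetch",
        "https://example.com/recipes": "how to bake sourdough bread at home",
    }
    question = "which paths may crawlers fetch"
    print(build_prompt(question, retrieve(question, pages)))
```

Numbering each passage with its source URL in the prompt mirrors the citation behaviour described in this article: the model can only point back to a page that was actually retrieved.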

Compared with other AI LLMs, Perplexity is notably supportive of publishers, because every answer it produces cites the sources its content was drawn from.

Robots.txt Rules

Perplexity respects standard robots.txt directives. You can explicitly allow or block their bot:

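# Option 1: allow PerplexityBot to crawl the whole site
User-agent: PerplexityBot
Allow: /

# Option 2: block PerplexityBot from the whole site (use one option, not both)
User-agent: PerplexityBot
Disallow: /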
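Before deploying a rule, you can sanity-check how a robots.txt file is interpreted. The sketch below uses Python's standard-library `urllib.robotparser`. The robots.txt content, the `can_crawl` helper and the example URL are hypothetical, and Perplexity's own parser may differ in edge cases from Python's implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block PerplexityBot site-wide, allow every other crawler.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given crawler token may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "Googlebot"):
        print(agent, can_crawl(agent, "https://example.com/members/article"))
```

Pass the crawler's product token (for example `PerplexityBot`) rather than its full user-agent string: `robotparser` only looks at the part before the first `/`, so a string starting with `Mozilla/5.0` would not match the PerplexityBot group.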

Why Verification is Important

Scrapers often masquerade as well-known bots to avoid being blocked. By cross-referencing visitor IPs with Perplexity's official IP lists, you can distinguish legitimate AI search traffic from malicious actors.
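A minimal sketch of that cross-check in Python, using the standard-library `ipaddress` module. The CIDR ranges below are placeholders taken from the RFC 5737 documentation blocks, not Perplexity's real addresses; you would substitute the ranges from Perplexity's official list. The `classify` helper is hypothetical, and for simplicity it checks both user agents against one list, although published ranges may differ per bot.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks). These are NOT Perplexity's IPs;
# replace them with the ranges from Perplexity's official IP list.
PUBLISHED_RANGES = [ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/25")]


def is_published_ip(ip: str) -> bool:
    """True if the address falls inside any published range (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PUBLISHED_RANGES)


def classify(user_agent: str, ip: str) -> str:
    """Label a request that claims to be Perplexity as verified or spoofed."""
    claims_perplexity = "PerplexityBot" in user_agent or "Perplexity-User" in user_agent
    if not claims_perplexity:
        return "not claiming Perplexity"
    return "verified" if is_published_ip(ip) else "spoofed"


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
    print(classify(ua, "192.0.2.10"))   # inside a placeholder range
    print(classify(ua, "203.0.113.5"))  # outside every placeholder range
```

The user-agent string alone proves nothing, because any client can send it; only the combination of a Perplexity user agent and a source IP inside the published ranges counts as verified here.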

Impact on AI-Driven Visibility

Perplexity relies on these crawlers to provide citations and links back to original sources. Blocking these IPs may prevent your brand from appearing as a cited authority in AI-generated answers.

Blocking Perplexity with WAF Rules

Another way to block Perplexity crawlers is with a WAF (web application firewall), which you can configure in your Cloudflare or AWS account.

You can configure rules in your account to rate-limit or block HTTP requests whose user agent contains PerplexityBot or Perplexity-User. You can also add Perplexity's IP ranges to your WAF block list.
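The logic such a rule applies can be sketched in Python as shown below. This is not Cloudflare's or AWS's API. The `Request` class, the `waf_action` helper and the placeholder IP block (an RFC 5737 documentation range) are all hypothetical, and in Cloudflare you would express the same conditions in its rules language using fields such as `http.user_agent` and `ip.src`.

```python
import ipaddress
from dataclasses import dataclass

# User-agent tokens mentioned in this article.
BLOCKED_UA_TOKENS = ("PerplexityBot", "Perplexity-User")
# PLACEHOLDER block list (RFC 5737 documentation range) standing in for Perplexity's IP ranges.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


@dataclass
class Request:
    ip: str
    user_agent: str


def waf_action(req: Request) -> str:
    """Block if the user agent contains a Perplexity token OR the source IP is on the block list."""
    ua = req.user_agent.lower()
    if any(token.lower() in ua for token in BLOCKED_UA_TOKENS):
        return "block"
    addr = ipaddress.ip_address(req.ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return "block"
    return "allow"


if __name__ == "__main__":
    print(waf_action(Request("203.0.113.7", "Mozilla/5.0 (compatible; PerplexityBot/1.0)")))
    print(waf_action(Request("192.0.2.55", "Mozilla/5.0 Chrome/120")))
    print(waf_action(Request("203.0.113.7", "Mozilla/5.0 Chrome/120")))
```

Matching on either condition means a request is still caught if it uses one of the listed tokens from an unlisted IP, or an unmarked user agent from a listed IP. The case-insensitive comparison here is a design choice of this sketch; check how your WAF's own string operators handle case.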

If you're protecting your content, combining these WAF rules with robots.txt directives and Managed Rules in tools like Cloudflare can help prevent Perplexity from crawling it.

Perplexity Stealth Crawling Accusations

Cloudflare published an article accusing Perplexity of 'stealth crawling', using undeclared crawlers and modified user agents to circumvent blocking rules and access content anyway.

Cloudflare's team sees this as a problem because, they argue, the internet is built on foundations of trust, which they claim Perplexity is ignoring. Cloudflare's article gives a full breakdown of its grievances with Perplexity's crawlers.

Googlebot IP ranges
Special case Google crawler IPs
BingBot IP ranges
OpenAI IP ranges
Claude (Anthropic) doesn't currently make the IP addresses of its crawlers public. We will update our suite when they do.