Free robots.txt Checker

Paste your domain. We'll check if your robots.txt is blocking AI bots like ChatGPT, Claude, and Perplexity from crawling your site.

Instant analysis. No account needed.

How robots.txt Directives Actually Work

A robots.txt file is a plain text file at the root of your domain (e.g., yourdomain.com/robots.txt). Bots fetch it before crawling anything else on your site. If the file says a bot isn't allowed, a well-behaved bot stops immediately — it won't touch a single page.
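
If you want to see exactly what a crawler sees, you can fetch the file yourself. A minimal Python sketch, assuming example.com stands in for your own domain:

from urllib.request import urlopen

# Fetch robots.txt from the domain root, the same URL a crawler requests
# before touching anything else. "example.com" is a placeholder domain.
with urlopen("https://example.com/robots.txt", timeout=10) as response:
    print(response.read().decode("utf-8", errors="replace"))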

The file is structured in stanzas. Each stanza starts with a User-agent: line that names the bot, followed by one or more Allow: or Disallow: rules. The wildcard User-agent: * applies to every bot that doesn't have its own named stanza.

When a specific bot has its own stanza, those rules take complete precedence over the wildcard stanza. The two stanzas don't merge — the bot reads only its named rules. This means you can block all bots with a wildcard and then explicitly allow individual AI crawlers by adding named stanzas above it.
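
You can see this precedence with Python's standard-library robots.txt parser. It is only one implementation, but it resolves named stanzas against the wildcard the same way. A small sketch (example.com is a placeholder):

from urllib.robotparser import RobotFileParser

# A wildcard block plus a named stanza for GPTBot, matching the description above.
rules = """
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The named stanza wins for GPTBot; any other bot falls back to the wildcard.
print(parser.can_fetch("GPTBot", "https://example.com/page"))        # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/page"))  # False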

Within a single stanza, more specific rules win. An Allow: /blog/ overrides a Disallow: / for that path, because /blog/ is longer (more specific) than /.
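
The same idea as a runnable check, again with the standard-library parser. One caveat: Python's parser applies rules in file order (first match wins) rather than the longest-match rule Google-style parsers use, so the Allow line is listed first here to make both interpretations agree.

from urllib.robotparser import RobotFileParser

# One stanza: everything disallowed except /blog/.
rules = """
User-agent: *
Allow: /blog/
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))  # True: /blog/ is more specific
print(parser.can_fetch("GPTBot", "https://example.com/pricing"))    # False: falls back to Disallow: /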

The AI Bots This Tool Checks

Each AI platform uses a different user-agent string. Getting the name wrong — even by one character — means the rule doesn't apply. Here are the exact strings:

Platform | User-agent string | Used for
ChatGPT / OpenAI | GPTBot | ChatGPT search results and training
Claude / Anthropic | ClaudeBot | Claude web search and indexing
Perplexity | PerplexityBot | Real-time web answers
Google Gemini / AI Overviews | Google-Extended | Gemini training and AI Overviews
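
To check a live site against these exact strings, here is a short sketch using Python's standard-library parser; it does roughly what this tool automates, minus the reporting. The standard-library parser differs from Google's rule handling in some edge cases, so treat it as a quick sanity check. example.com is a placeholder for your own domain.

from urllib.robotparser import RobotFileParser

# The exact user-agent strings from the table above.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# "example.com" is a placeholder; substitute your own domain.
parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

for bot in AI_BOTS:
    allowed = parser.can_fetch(bot, "https://example.com/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")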

The Three robots.txt Mistakes That Block AI Traffic

Most sites that fail this check didn't set out to block AI bots. They made one of three common configuration mistakes:

1. A catch-all Disallow with no named overrides

User-agent: * followed by Disallow: / blocks every bot on earth — including GPTBot and ClaudeBot. This was a common recommendation for staging sites and was often accidentally left in place after launch. If your site was built before 2022, check this first: open yourdomain.com/robots.txt in a browser right now. If you see those two lines and nothing else, you're invisible to every AI engine.

2. An Allow rule in the wrong stanza

Adding Allow: / under the wildcard User-agent: * stanza doesn't explicitly allow a named bot like GPTBot. If GPTBot has its own stanza anywhere in the file, it ignores the wildcard entirely. You must write a separate named stanza for each bot you want to explicitly allow.
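
For example, if GPTBot already appears in its own stanza somewhere in the file, a wildcard Allow never reaches it. A quick sketch with the standard-library parser:

from urllib.robotparser import RobotFileParser

# GPTBot has its own stanza, so the wildcard Allow below never applies to it.
rules = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("GPTBot", "https://example.com/"))    # False: its own stanza wins
print(parser.can_fetch("OtherBot", "https://example.com/"))  # True: wildcard applies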

3. A wrong user-agent string

Not every robots.txt parser matches user-agent names case-insensitively, and a misspelled name never matches anything. gptbot is not guaranteed to be read as GPTBot. Always use the exact string from the table above. Copy-paste, don't type from memory.

The Correct robots.txt Configuration

Add these stanzas at the top of your robots.txt, before any wildcard rules:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Your existing rules below this line
User-agent: *
Disallow: /admin/

If you want to allow AI bots but still block them from specific sections (like account pages or private dashboards), use path-level rules:

User-agent: GPTBot
Allow: /
Disallow: /dashboard/
Disallow: /settings/
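
To confirm a path-level policy like this behaves as intended, you can run it through Python's standard-library parser. Note the ordering in the sketch: that parser takes the first matching rule rather than the longest match used by Google-style parsers, so listing the Disallow lines first keeps both interpretations in agreement.

from urllib.robotparser import RobotFileParser

# Same policy as above; Disallow lines listed first so that first-match and
# longest-match parsers reach the same decision.
rules = """
User-agent: GPTBot
Disallow: /dashboard/
Disallow: /settings/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))       # True
print(parser.can_fetch("GPTBot", "https://example.com/dashboard/home"))  # False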

Common Questions

Does robots.txt affect my Google search ranking?

Blocking Googlebot via robots.txt stops Google from crawling your pages, so yes, directly: a blocked page can still be indexed from external links, but without its content it rarely ranks. Blocking Google-Extended is separate: that only affects Gemini and AI Overviews, not organic search results. You can block one without affecting the other.

Can I allow AI bots but block them from training on my content?

robots.txt controls crawling access only — not how the crawled content is used. Some platforms honour a separate noai or noimageai meta tag, but enforcement varies. If you want citations without training data use, check each platform's individual opt-out policy.

How often should I re-check my robots.txt?

After any site migration, CMS upgrade, or infrastructure change. robots.txt files are easy to accidentally overwrite during deploys — especially with platforms that auto-generate the file. Check it within 24 hours of any deployment to a new hosting environment.