Robots.txt Generator — Create Robots.txt File

Free robots.txt generator. Create a robots.txt file for your website, control which pages search engines can crawl and index, and block AI bots.

Configure Your Robots.txt


How to Use

1. Choose a quick preset (WordPress, Laravel, etc.) or build custom rules.

2. Add your disallow/allow paths, sitemap URL, and any comments you want.

3. Click Generate, then copy or download the file and upload it to the root of your website.

What is a Robots.txt File?

A robots.txt file is a plain text file at the root of your website (e.g. yoursite.com/robots.txt) that tells search engine crawlers which pages they can and cannot access. It uses the Robots Exclusion Protocol supported by Google, Bing, Yahoo, and all major search engines.

Complete Guide to Robots.txt

How Robots.txt Works

When a search engine bot like Googlebot visits your site, it first checks for a robots.txt file at the root directory. The file contains rules that tell the bot which URLs it is allowed to crawl and which it should skip. Every rule starts with a User-agent line (which bot the rule applies to) followed by Disallow and Allow directives.

Basic Robots.txt Syntax

A robots.txt file is made up of one or more rule groups. Each group targets a specific crawler (or all crawlers with *).

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://yoursite.com/sitemap.xml

This example blocks all crawlers from /admin/ and /private/, allows /public/, and points to the XML sitemap.
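You can check how such rules evaluate with Python's standard-library urllib.robotparser. This is a minimal sketch: the robots.txt content matches the example above, and the AnyBot name and yoursite.com URLs are placeholders.

```python
from urllib import robotparser

# The example robots.txt from above, parsed locally (no network access needed).
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# can_fetch(user_agent, url) evaluates the rules for that crawler.
admin_ok = rp.can_fetch("AnyBot", "https://yoursite.com/admin/settings")
public_ok = rp.can_fetch("AnyBot", "https://yoursite.com/public/page")
blog_ok = rp.can_fetch("AnyBot", "https://yoursite.com/blog/post")  # no matching rule: allowed

print(admin_ok, public_ok, blog_ok)  # False True True
```

Note that urllib.robotparser implements the basic exclusion protocol; it does not support Google-style wildcards like * and $ inside paths.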

Common Robots.txt Rules

Block entire site: Disallow: / — prevents all crawling. Use this for staging or development sites.

Allow entire site: Allow: / or an empty Disallow — lets all crawlers access everything.

Block specific folder: Disallow: /wp-admin/ — blocks a directory and everything inside it.

Block specific file type: Disallow: /*.pdf$ — blocks crawling of all PDF files.

Crawl delay: Crawl-delay: 10 — asks bots to wait 10 seconds between requests. Supported by Bing and Yandex, but not Google.
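Several of these directives can be combined in a single rule group. A sketch for a typical setup (the paths are illustrative, and remember Google ignores Crawl-delay):

User-agent: *
Disallow: /wp-admin/
Disallow: /*.pdf$
Crawl-delay: 10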

Robots.txt for WordPress

WordPress sites should block /wp-admin/, /wp-includes/, /trackback/, /feed/, and search result pages (/?s=). Always allow /wp-admin/admin-ajax.php since many themes and plugins depend on it. Our generator has a WordPress preset that applies these rules automatically.
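Putting those WordPress rules together, a typical file looks like this (the sitemap URL is a placeholder; adjust paths to your install):

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /?s=
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml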

Robots.txt vs Noindex — What's the Difference?

Robots.txt blocks crawling — the bot won't visit the page. A noindex meta tag blocks indexing — the bot visits the page but doesn't add it to search results. If you want a page completely hidden from Google, use noindex. If you just want to save crawl budget, use robots.txt.
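For reference, a noindex directive is a single tag in the page's head section (it can also be sent as an X-Robots-Tag HTTP response header):

<meta name="robots" content="noindex">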

Where to Upload Your Robots.txt File

Upload the file to the root directory of your website so it's accessible at https://yoursite.com/robots.txt. Every domain and subdomain needs its own robots.txt file. You can verify it's working using the robots.txt report in Google Search Console (the standalone robots.txt Tester has been retired).

Common Mistakes to Avoid

Blocking CSS/JS files: Google needs to render your pages. Blocking stylesheets and JavaScript can hurt your rankings.

Blocking your sitemap: Make sure the folder containing your sitemap.xml is not disallowed.

Using robots.txt for sensitive data: Robots.txt is publicly visible. Anyone can read it. Never use it to hide sensitive URLs — use authentication or server-side access control instead.

How to Block AI Bots in Robots.txt

Block PerplexityBot in Robots.txt

Perplexity AI uses PerplexityBot as its user agent to crawl websites for its AI search engine. To block PerplexityBot from crawling your site, add this to your robots.txt:

User-agent: PerplexityBot
Disallow: /

PerplexityBot respects robots.txt according to their documentation. This blocks it from crawling any page on your site. You can also block specific directories instead of the entire site.

Block Ahrefs Bot in Robots.txt

Ahrefs uses AhrefsBot to crawl websites for its SEO database. If you want to prevent Ahrefs from crawling your site (to hide your backlink profile or reduce server load), add:

User-agent: AhrefsBot
Disallow: /

Other SEO bots you may want to block include SemrushBot, MJ12bot (Majestic), and DotBot (Moz).

Block All AI Crawlers at Once

To block all known AI bots from training on your content, add these rules to your robots.txt:

# Block AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

This blocks OpenAI (GPTBot, ChatGPT-User), Google AI (Google-Extended), Perplexity (PerplexityBot), Anthropic (ClaudeBot, anthropic-ai), and Common Crawl (CCBot) from scraping your content. Note: this does not block regular Googlebot search crawling.

Robots.txt Allow All & Troubleshooting

Robots.txt Allow All — Open Your Site to Crawlers

To allow all search engine crawlers to access your entire site, use this minimal robots.txt:

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Alternatively, an empty robots.txt file or no robots.txt file at all has the same effect — all crawlers are allowed by default. However, including the Sitemap directive is recommended to help search engines discover your pages faster.

Blocked by Robots.txt — How to Fix It

If Google Search Console shows "Blocked by robots.txt" for pages you want indexed, your robots.txt is preventing Googlebot from crawling those pages. To fix it:

1. Check your robots.txt: Visit yoursite.com/robots.txt and look for Disallow rules that match the blocked URL path.

2. Remove or modify the rule: Delete the Disallow line blocking the URL, or add a more specific Allow rule for it (Google follows the most specific matching rule, and Allow wins when rules tie).

3. Test in Search Console: Use the robots.txt report in Google Search Console to verify the URL is now allowed.

4. Request re-crawl: After updating, use the URL Inspection tool in Search Console to request Google re-crawl the page.
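As a sketch of step 2, a targeted Allow can re-open one path under an otherwise blocked directory (the paths here are illustrative):

User-agent: *
Disallow: /downloads/
Allow: /downloads/press-kit/

For Google, the longer (more specific) rule wins, so /downloads/press-kit/ stays crawlable while the rest of /downloads/ is blocked.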

Indexed Though Blocked by Robots.txt

This Search Console warning means Google found a URL (through links from other pages) but cannot crawl it because robots.txt blocks it. Google indexes the URL but not its content — it may appear in search results with no snippet or description. To fix this, remove the robots.txt block so Google can crawl the page; if you want the page out of search results entirely, also add a noindex meta tag (Google must be able to crawl the page to see the tag).

How to Remove Robots.txt in WordPress

WordPress generates a virtual robots.txt by default. If you want to modify or remove it:

Option 1 — Override with a physical file: Create a robots.txt file using our generator and upload it to your WordPress root directory via FTP or cPanel File Manager. A physical file overrides WordPress's virtual robots.txt.

Option 2 — Use an SEO plugin: Plugins like Yoast SEO and Rank Math let you edit robots.txt directly from the WordPress admin (in Yoast, under SEO → Tools → File Editor).

Option 3 — Check "Discourage search engines": Go to Settings → Reading and make sure "Discourage search engines from indexing this site" is unchecked. When checked, WordPress adds Disallow: / to robots.txt, blocking your entire site.

Frequently Asked Questions

What is a robots.txt file?
A robots.txt file is a plain text file at the root of your website that tells search engine crawlers which pages or sections they can or cannot visit. It lives at yoursite.com/robots.txt and is checked by Google, Bing, Yahoo, and other search engines before crawling your site.
How do I generate a robots.txt file?
Use our free robots.txt generator above. Select a preset template (WordPress, Laravel, etc.) or build custom rules by adding user-agents, disallow paths, and allow paths. Click Generate, then download or copy the file and upload it to your website root directory.
How do I block PerplexityBot in robots.txt?
Add these two lines to your robots.txt: User-agent: PerplexityBot followed by Disallow: / on the next line. This prevents Perplexity AI from crawling your website. PerplexityBot respects robots.txt according to their official documentation.
How do I block Ahrefs bot in robots.txt?
Add User-agent: AhrefsBot followed by Disallow: / to your robots.txt file. This prevents the Ahrefs SEO crawler from accessing your site. You can also block other SEO bots like SemrushBot, MJ12bot (Majestic), and DotBot (Moz) the same way.
How do I set robots.txt to allow all?
Use: User-agent: * followed by Allow: / and a Sitemap line. This permits all crawlers to access your entire site. An empty robots.txt or no robots.txt file has the same effect, but adding the Sitemap directive helps search engines find your pages.
What does "Blocked by robots.txt" mean in Search Console?
It means your robots.txt file contains a Disallow rule that prevents Googlebot from crawling that page. Check your robots.txt for rules matching the blocked URL, remove or modify the Disallow line, then request a re-crawl in Google Search Console.
What does "Indexed though blocked by robots.txt" mean?
This means Google found the URL through links but cannot crawl its content because robots.txt blocks it. Google indexes the URL without a snippet. To fix it: either remove the robots.txt block so Google can crawl the page normally, or allow crawling and add a noindex meta tag if you want it removed from search results.
How do I remove or edit robots.txt in WordPress?
WordPress generates a virtual robots.txt automatically. To override it: upload a physical robots.txt file to your root directory via FTP, or use Yoast SEO / Rank Math plugin to edit it from the admin panel. Also check Settings → Reading and uncheck "Discourage search engines from indexing this site".
How do I block all AI crawlers in robots.txt?
Add separate User-agent blocks for each AI bot: GPTBot and ChatGPT-User (OpenAI), Google-Extended (Google AI), PerplexityBot (Perplexity), ClaudeBot and anthropic-ai (Anthropic), and CCBot (Common Crawl). Set Disallow: / for each. This blocks AI training crawlers without affecting regular search engine indexing.
Does robots.txt block pages from Google?
Robots.txt prevents crawling, not indexing. If other pages link to a disallowed URL, Google may still index the URL (but not its content). To fully prevent a page from appearing in search results, use a noindex meta tag — but the page must be crawlable for Google to see the tag.
Should I add my sitemap to robots.txt?
Yes. Adding a Sitemap directive (e.g. Sitemap: https://yoursite.com/sitemap.xml) helps search engines discover your XML sitemap faster. This is recommended by both Google and Bing and can speed up indexing of new pages.
Does Google respect the Crawl-delay directive?
No. Google ignores the Crawl-delay directive in robots.txt. However, Bing, Yandex, and some other search engines do support it. Google sets its crawl rate automatically; the old Crawl Rate setting in Search Console has been retired, so if Googlebot is overloading your server, temporarily serving 429 or 503 responses is the documented way to slow it down.
