Create robots.txt files to control how search engine crawlers access your website. Use presets for common configurations, add custom rules for specific user-agents, specify sitemaps, and set crawl delays. Essential for SEO and website management.
Guide search engine crawlers on which parts of your site to crawl, improving crawl efficiency and supporting search engine optimization.
Prevent AI training bots (GPTBot, ChatGPT-User, anthropic-ai) from scraping your content for AI model training.
Keep well-behaved crawlers out of private directories, admin areas, and other content you don't want crawled (robots.txt is advisory, not a substitute for authentication).
Use robots.txt as part of a comprehensive SEO strategy to manage crawler resources and improve search rankings.
Set crawl delays to limit request rates and prevent excessive server load from aggressive crawlers.
Configure robots.txt for WordPress, Drupal, Joomla, and other CMSs to protect admin areas and private content.
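As a concrete illustration, a WordPress-oriented robots.txt often looks something like the sketch below; the paths follow common WordPress conventions and the sitemap URL is a placeholder, so adjust both to your install:

    # Keep crawlers out of the WordPress admin area
    User-agent: *
    Disallow: /wp-admin/
    # admin-ajax.php powers front-end features, so it stays crawlable
    Allow: /wp-admin/admin-ajax.php
    Disallow: /wp-login.php

    # Point crawlers at the XML sitemap (placeholder URL)
    Sitemap: https://example.com/sitemap.xml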
The Robots Exclusion Protocol, commonly known as robots.txt, is one of the oldest standards on the web, originally proposed by Martijn Koster in 1994 and formalized as an IETF standard in RFC 9309 (published in 2022, nearly three decades after its informal adoption). The protocol provides a mechanism for website owners to communicate with web crawlers about which parts of their site should or should not be accessed. It operates on a voluntary compliance model: well-behaved crawlers honor robots.txt directives, but malicious bots or scrapers may ignore them entirely.
The robots.txt file must be placed at the root of a website (https://example.com/robots.txt) and uses a simple text-based syntax. Each section begins with a User-agent directive specifying which crawler the rules apply to (or * for all crawlers), followed by Allow and Disallow directives that specify URL paths the crawler may or may not access. The Crawl-delay directive, though not part of the original standard, is honored by some crawlers (notably Bing and Yandex) to limit request frequency. The Sitemap directive points crawlers to XML sitemap files that list all pages the site wants indexed, complementing the exclusion rules with inclusion guidance.
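Putting these directives together, a minimal robots.txt might look like the following sketch; the paths, the Bingbot group, and the sitemap URL are illustrative placeholders:

    # Rules for all crawlers
    User-agent: *
    Disallow: /private/
    Allow: /private/annual-report.html

    # Non-standard directive, honored by some crawlers such as Bingbot
    User-agent: Bingbot
    Crawl-delay: 10

    # Inclusion guidance: where to find the XML sitemap
    Sitemap: https://example.com/sitemap.xml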
The relationship between robots.txt and SEO is nuanced and frequently misunderstood. Blocking a URL with Disallow prevents crawlers from accessing the page, but it does not prevent the URL from appearing in search results. If other pages link to a disallowed URL, search engines may still index the URL with limited information, displaying the link text from referring pages. To prevent a page from appearing in search results entirely, the page must return a noindex meta robots tag or X-Robots-Tag HTTP header, which requires the page to be crawlable. This means that using robots.txt to block sensitive pages can paradoxically make them more visible in search results by preventing the search engine from seeing the noindex directive.
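For example, to keep a page out of search results entirely, you would leave it crawlable and serve a noindex signal in one of the two standard ways shown below (applied to whatever URL you want removed from results):

    Option 1 – meta robots tag in the page's HTML head:
        <meta name="robots" content="noindex">

    Option 2 – X-Robots-Tag HTTP response header (useful for non-HTML resources such as PDFs):
        X-Robots-Tag: noindex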
The emergence of AI training bots has created a new dimension for robots.txt usage. Crawlers like GPTBot (OpenAI), ChatGPT-User (OpenAI), Google-Extended (Google), anthropic-ai (Anthropic), and CCBot (Common Crawl) scrape web content for training large language models. Many website owners now add specific Disallow rules for these user agents to prevent their content from being used in AI model training. This has reignited debate about the adequacy of the robots.txt standard, which was designed for search engine crawling and lacks granularity for distinguishing between different uses of crawled content such as indexing, caching, and AI training.
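A robots.txt group that opts these AI crawlers out of the entire site, while leaving ordinary search crawling untouched, might look like the sketch below; the list mirrors the user agents named above and will need updating as new bots appear:

    # Opt out of AI training crawlers (grouped user-agent lines share the Disallow rule)
    User-agent: GPTBot
    User-agent: ChatGPT-User
    User-agent: Google-Extended
    User-agent: anthropic-ai
    User-agent: CCBot
    Disallow: /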
The robots.txt file must be placed in the root directory of your website (e.g., https://example.com/robots.txt).
No, robots.txt is a request, not enforcement. Well-behaved bots follow it, but it doesn't prevent access. Use authentication for truly private content.
If you don't want your content used for AI training, block bots like GPTBot, ChatGPT-User, and anthropic-ai. Many sites now block these by default.
All processing happens directly in your browser. Your files never leave your device and are never uploaded to any server.