Frequently Asked Questions
What is a robots.txt file and why does it matter for SEO?
A robots.txt file is a text file at the root of your website (yoursite.com/robots.txt) that tells search engine crawlers which pages they can and can't access. It's the first file Google checks before crawling your site. A missing, broken, or misconfigured robots.txt can block Google from crawling your most important pages — or waste your crawl budget on pages that don't matter.
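A minimal robots.txt might look like this (the paths and domain are illustrative, not a recommendation for any specific site):

```
# Applies to all crawlers
User-agent: *
# Keep crawlers out of the admin area
Disallow: /admin/
# Everything else is crawlable
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```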
What happens if my site doesn't have a robots.txt file?
If your site has no robots.txt file (returns 404), search engines will crawl everything they can find. This isn't necessarily bad for small sites, but for larger sites it means Google wastes crawl budget on admin pages, duplicate content, and URLs you don't want indexed. A well-configured robots.txt focuses Google's attention on your most valuable pages.
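If you want to sanity-check your rules before deploying them, Python's standard library ships a robots.txt parser. A quick sketch, using illustrative rules and URLs:

```python
# Sketch: test robots.txt rules with Python's stdlib parser.
# The rules and URLs below are illustrative placeholders.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(rules)

# Blocked path — returns False
print(rp.can_fetch("Googlebot", "https://yoursite.com/admin/login"))
# Unblocked path — returns True
print(rp.can_fetch("Googlebot", "https://yoursite.com/products"))
```

This mirrors how a well-behaved crawler decides what to fetch: parse the rules once, then check each URL against them.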
Should I reference my sitemap in robots.txt?
Yes — you should include a Sitemap: directive in your robots.txt pointing to your XML sitemap. This helps search engines discover your sitemap automatically. Example: 'Sitemap: https://yoursite.com/sitemap.xml'. If your robots.txt doesn't include a Sitemap directive, Google may not find your sitemap unless you submit it manually in Search Console.
What does 'Disallow: /' mean in robots.txt?
'Disallow: /' tells search engines not to crawl ANY page on your site. This is the most dangerous robots.txt mistake — it makes your entire site effectively invisible to Google. It's sometimes added deliberately during development and then forgotten when the site goes live. If you see this on a production site, fix it immediately.
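The contrast is easy to miss because the two forms differ by a single character. Both snippets below are illustrative:

```
# DANGEROUS — blocks crawling of the entire site:
User-agent: *
Disallow: /

# Allows crawling of everything — an empty Disallow blocks nothing:
User-agent: *
Disallow:
```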
Can robots.txt block pages from appearing in Google?
Robots.txt blocks crawling, not indexing. If a page is linked to from other sites, Google may still index the URL (showing it in results without a snippet). To truly prevent indexing, use a 'noindex' meta tag or X-Robots-Tag header instead — and note that the page must remain crawlable, because if robots.txt blocks it, Google never fetches the page and never sees the 'noindex' directive. Robots.txt is for controlling crawl behavior, not index behavior.
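The two noindex mechanisms look like this — the meta tag for HTML pages, the HTTP header for non-HTML files such as PDFs:

```
<!-- In the page's <head> — tells crawlers not to index this page -->
<meta name="robots" content="noindex">
```

```
# HTTP response header equivalent, useful for non-HTML resources:
X-Robots-Tag: noindex
```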