Glossary

Robots.txt

A text file at your site's root that tells search engine crawlers which pages or sections they can and can't access. It controls crawling, not indexing - a page blocked by robots.txt can still appear in search results.

Why It Matters

Robots.txt is your first line of communication with search engine crawlers. It tells them what to crawl and what to ignore. For large sites, it's essential for managing crawl budget - keeping Googlebot focused on your valuable pages rather than wasting time on admin panels, filtered results, or duplicate content.

The critical distinction: robots.txt controls crawling, not indexing. If other sites link to a page you've blocked in robots.txt, Google may still index the URL (just without seeing its content). To prevent indexing, use a noindex meta tag or X-Robots-Tag header instead, and make sure the page is not blocked in robots.txt, because Google must be able to crawl the page to see the noindex directive.

In Practice

Keep your robots.txt simple. Block admin directories, internal search results, staging environments, and other sections with no SEO value. Don't block CSS or JavaScript files - Google needs these to render your pages.

Always include a reference to your XML sitemap in robots.txt. It's a small thing that helps Google find your sitemap automatically.
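Putting those rules together, a minimal robots.txt along these lines would work; the blocked paths and sitemap URL are illustrative, not prescriptive:

```
User-agent: *
Disallow: /admin/
Disallow: /search
Disallow: /staging/

Sitemap: https://www.example.com/sitemap.xml
```

Note that CSS and JavaScript stay crawlable here simply because nothing blocks them, and the Sitemap line lets crawlers discover your sitemap without any extra configuration.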

Test your robots.txt using Google Search Console's robots.txt report (or another robots.txt validator) before deploying changes. A misconfigured robots.txt can block your entire site from crawling - it's a single-point-of-failure file that deserves careful attention.
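You can also sanity-check rules locally before deploying. This sketch uses Python's standard-library robots.txt parser; the rules and URLs are hypothetical examples, not your actual file:

```python
# Check robots.txt rules locally with Python's standard library.
from urllib.robotparser import RobotFileParser

# Hypothetical rules mirroring the kind of file discussed above.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Blocked path: crawlers matching "*" may not fetch it.
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))  # False

# Unblocked path: crawling is allowed.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```

A quick script like this catches the classic failure mode - a stray `Disallow: /` blocking everything - before the file ever reaches production.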

Common Mistakes

Blocking CSS/JS files that Google needs for rendering. Using robots.txt to try to prevent indexing (it doesn't work that way). Accidentally blocking important sections during development and forgetting to remove the rule. Not testing changes before deploying.

Know the Words.
Now See Them in Action.

Free teardown. No jargon. Just what's broken and how to fix it.

Get The Teardown
