Robots.txt
A text file at your site's root that tells search engine crawlers which pages or sections they can and can't access. It controls crawling, not indexing - a page blocked by robots.txt can still appear in search results.
Why It Matters
Robots.txt is your first line of communication with search engine crawlers. It tells them what to crawl and what to ignore. For large sites, it's essential for managing crawl budget - keeping Googlebot focused on your valuable pages rather than wasting time on admin panels, filtered results, or duplicate content.
The critical distinction: robots.txt controls crawling, not indexing. If other sites link to a page you've blocked in robots.txt, Google may still index the URL (just without seeing its content). To prevent indexing, use a noindex directive instead - and note that noindex only works if the page stays crawlable, because Google has to fetch the page to see the tag.
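For reference, a noindex directive is typically placed in the page's head (the example below is a generic snippet, not tied to any particular site):

```html
<!-- Keeps the page out of search results. The page must NOT be
     blocked in robots.txt, or Google never sees this tag. -->
<meta name="robots" content="noindex">
```

The same directive can also be sent as an `X-Robots-Tag: noindex` HTTP header, which is useful for non-HTML files like PDFs.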
In Practice
Keep your robots.txt simple. Block admin directories, internal search results, staging environments, and other sections with no SEO value. Don't block CSS or JavaScript files - Google needs these to render your pages.
Always include a reference to your XML sitemap in robots.txt. It's a small thing that helps Google find your sitemap automatically.
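Putting those rules together, a minimal robots.txt along these lines (the paths and sitemap URL are placeholders for illustration) might look like:

```
User-agent: *
Disallow: /admin/
Disallow: /search
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
```

CSS and JavaScript files stay crawlable here simply because no rule matches them. The Sitemap line is independent of any User-agent group and can appear anywhere in the file.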
Test your robots.txt before deploying changes - Google Search Console's robots.txt report shows whether Google fetched and parsed the file successfully. A misconfigured robots.txt can block your entire site from crawling - it's a single-point-of-failure file that deserves careful attention.
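You can also sanity-check rules locally before deploying. As one sketch of this, Python's standard-library urllib.robotparser applies the same basic matching (it's not a substitute for Google's own parser, and the rules below are placeholders):

```python
from urllib import robotparser

# Example rules - placeholders, not a real site's file.
rules = """\
User-agent: *
Disallow: /admin/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# A stray "Disallow: /" here would make every URL return False -
# exactly the kind of site-wide block worth catching before deploy.
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```

Running a few representative URLs through a check like this in CI is a cheap guard against accidentally blocking the whole site.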
Common Mistakes
Blocking CSS/JS files that Google needs for rendering. Using robots.txt to try to prevent indexing (it doesn't work that way). Accidentally blocking important sections during development and forgetting to remove the rule. Not testing changes before deploying.
Related Terms
Crawling
How search engine bots discover and download your pages - the first step to ranking.
Crawl Budget
How many pages Google will crawl on your site in a given timeframe.
Googlebot
Google's web crawler that discovers and downloads your pages so they can be indexed.
Meta Robots
HTML directives telling search engines whether to index a page and follow its links.
Sitemap (XML)
An XML file listing all pages you want search engines to discover and index.