Custom error pages are important for user experience — they help visitors stay oriented when something goes wrong. But if they aren't set up correctly, they can cause problems for search engines and crawlers, such as unintended indexing of error content, duplicate pages in the index, and wasted crawling of URLs that don't exist.
As a provider of programmable search engine software, we often see these issues during real-world deployments. This post outlines best practices to help avoid these problems and improve crawler behavior.
Return the Correct HTTP Status Code
Every error page should return an appropriate HTTP status code: 404 Not Found for missing pages, 410 Gone for content that has been permanently removed, and a 5xx code for server-side failures.
Avoid returning 200 OK for pages that represent an error. A 200 response tells crawlers the content is valid, which can lead to unintended indexing and link-following.
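As an illustration, here is a minimal sketch of serving a custom error page while keeping the correct status code. It assumes a Python/Flask application and a 404.html template (both are assumptions for the example); most web servers and frameworks offer an equivalent setting.

from flask import Flask, render_template

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(error):
    # Serve the shared error template, but keep the 404 status code
    # instead of letting the page go out as 200 OK.
    return render_template("404.html"), 404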
Use a Canonical Link if the Error Page is Reused
If your error page is a shared template (e.g., /404.html) and may be returned with a 200 OK, consider adding a canonical link tag in the page header:
<link rel="canonical" href="https://example.com/404.html">
This helps search engines understand that the content is not unique and avoids indexing many different URLs that all show the same error page.
Include Meta Robots Directives
Use the following tag in your error page's <head>:
<meta name="robots" content="noindex, nofollow">
This tells crawlers not to index the page or follow any links on it. It's a useful safeguard, especially if some error pages are returned with a 200 OK status due to technical or legacy reasons.
Be Careful with Relative Links
Navigation links on error pages, such as "Home" or "Contact," can cause unintended crawling behavior if written as relative paths like home.html or ../contact.
For example, if a crawler accesses https://example.com/missing/path/ and the error page includes a link to contact.html, the crawler may request https://example.com/missing/path/contact.html, which likely also doesn't exist.
Recommendations: write navigation links on error pages as root-relative paths (e.g., /contact.html) or absolute URLs (e.g., https://example.com/contact.html) so they resolve the same way no matter which missing URL triggered the error page, as the resolution sketch below illustrates.
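To make the resolution rule concrete, this small sketch uses Python's urllib.parse.urljoin to show how a crawler resolves a relative link versus a root-relative link against the example URLs above (the URLs are purely illustrative):

from urllib.parse import urljoin

error_url = "https://example.com/missing/path/"

# A document-relative link resolves underneath the missing directory...
print(urljoin(error_url, "contact.html"))
# -> https://example.com/missing/path/contact.html

# ...while a root-relative link always resolves against the site root.
print(urljoin(error_url, "/contact.html"))
# -> https://example.com/contact.html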
Use Consistent Titles for Optional Exclusion Rules
Although our crawler does not rely on page heuristics to detect error pages, you can set up manual rules using our “Exclude By Field” feature.
If your error page consistently uses a title like:
<title>404 - Page Not Found</title>
you can define a rule in the crawler to exclude any page with that title from indexing, and optionally from link-following.
Keep in mind: this only works if the title is used consistently across all of your error pages, and the exclusion rule must be updated if the title text ever changes.
Monitor for Crawl Patterns
It's helpful to keep an eye on crawler behavior, especially during initial indexing. Indicators of error-related problems include a spike in requests for URLs that don't exist, many indexed pages sharing the same error-page title, and error content showing up in search results.
Our software provides crawl logs and supports exclusion rules that can be fine-tuned to prevent this type of activity.
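As a rough illustration, the sketch below scans a standard combined-format web server access log for the most frequently requested missing URLs (the access.log file name and the log format are assumptions for the example, not our crawl log format):

import re
from collections import Counter

# Matches the request path and status code in a combined-format access log line.
LOG_PATTERN = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

not_found = Counter()
with open("access.log") as log:
    for line in log:
        match = LOG_PATTERN.search(line)
        if match and match.group("status") == "404":
            not_found[match.group("path")] += 1

# The most frequently requested missing URLs, e.g. paths generated by
# relative links on an error page, float to the top.
for path, count in not_found.most_common(20):
    print(f"{count:6d}  {path}")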
Conclusion
Custom error pages should support users without misleading crawlers. By following these best practices — correct status codes, proper link handling, and metadata — you can prevent many common issues and improve the quality of your indexed content.
If you use the Thunderstone search engine, features like "Exclude By Field" and crawl logging make it easier to manage how error pages are handled.