Troubleshooting missing content URLs

"Why didn't this content get indexed?" is often the first troubleshooting question to come up.

The first step is to identify a specific URL that you expect to be part of the searchable content, but isn't. We'll refer to this as the "Content URL".

  • Use the Tools → List/Edit URLs interface to look up the Content URL. Is it present in the searchable database?

    • If so, click the listed URL to go to the List/Edit Details page for the Content URL. There you can compare its content to what you expect and view any errors.

  • If the Content URL isn't in the index, identify a URL that links to the Content URL (we'll call this the "Parent URL").

    Now look up the Parent URL in Tools → List/Edit URLs. Is the Parent URL in the index?

    • If not, repeat the process: think of a URL that links to that Parent URL and look it up, continuing until you find one that is in the index. The goal is to find the break in the "chain" of links between your Base URL and the Content URL.

  • With the Parent URL found in the index, click it to see its List/Edit Details page. On the Details page, click the Children link to see which links were found on that page, and check whether the missing page is listed.

    Is the missing link among the listed Children links, and is there an error next to it?

    • If it's not there at all, the Parametric Search Appliance might not be processing your Parent URL correctly. Please get in touch with Thunderstone Support.

    • If it's listed and there's an error, the error should describe why it's not present.

    • If the URL is there without an error, then the Parametric Search Appliance chose not to index the URL because of some rule, such as robots.txt, meta robots, exclusions, max pages, max depth, exclude by field links, etc. Walking again with a higher Verbosity value, such as 4, may help explain why it wasn't walked.
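
The walk up the "chain" of links described above can be sketched in Python as a way to keep track of what you have looked up. This is only an illustration, not part of the appliance: the function name find_chain_break, the parent_of mapping, and the indexed set are hypothetical, and you would fill them in by hand from your own Tools → List/Edit URLs lookups. The example.com URLs are placeholders.

```python
from typing import Dict, List, Optional, Set


def find_chain_break(
    content_url: str,
    parent_of: Dict[str, str],
    indexed: Set[str],
) -> Optional[List[str]]:
    """Walk up from content_url until a URL that is in the index is found.

    Returns the chain [indexed ancestor, ..., content_url], or None if the
    known parents run out (or loop) before an indexed URL is reached.
    """
    path = [content_url]
    seen = {content_url}
    current = content_url
    while current not in indexed:
        parent = parent_of.get(current)
        if parent is None or parent in seen:
            return None  # no known parent, or the links form a cycle
        path.append(parent)
        seen.add(parent)
        current = parent
    path.reverse()  # put the indexed ancestor first
    return path


# Hypothetical data, recorded by hand while looking up each URL.
indexed = {"https://example.com/", "https://example.com/docs/"}
parent_of = {
    "https://example.com/docs/guide/page.html": "https://example.com/docs/guide/",
    "https://example.com/docs/guide/": "https://example.com/docs/",
    "https://example.com/docs/": "https://example.com/",
}

path = find_chain_break(
    "https://example.com/docs/guide/page.html", parent_of, indexed
)
if path is not None and len(path) > 1:
    # path[0] is in the index; path[1] is the first link that is not.
    print("Open the Details page for:", path[0])
    print("Look in its Children list for:", path[1])
```

In this sketch the first element of the returned list is the Parent URL whose Details page you would open, and the second element is the link to look for among its Children.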

Copyright © Thunderstone Software     Last updated: Nov 8 2024