"Why didn't this content get indexed?" is a common first troubleshooting
question.
The first step is determining a specific content URL that you would expect to be
part of the searchable content, but isn't. We'll refer to this as the "Content
URL".
- Use the Tools → List/Edit URLs interface to look
up the Content URL. Is it present in the searchable database?
- If so, click the listed URL to go to the List/Edit Details page for
the Content URL. Here you can compare its content to what you expect, and
view any errors.
- If the Content URL isn't in the index, you need to determine a URL that
links to that Content URL (we'll call this the "Parent URL").
Now look up the Parent URL in Tools → List/Edit URLs.
Is the Parent URL in the index?
- If not, repeat the process: think of a URL that links to THAT Parent URL
and look that one up, continuing until you find one that IS in the index. The
goal is to find the break in the "chain" of links between your Base URL and the
Content URL.
- With the Parent URL found in the index, click on it to see its List/Edit
Details page. On the Details page, click the Children link to see what
links were found on that page and see if the missing page is listed.
Is the missing link among the listed Children links, and is there an error
next to it?
- If it's not there at all, the Parametric Search Appliance might not be processing your
Parent URL correctly. Please get in touch with Thunderstone Support.
- If it's listed and there's an error, the error should describe why the URL is not
present.
- If the URL is there without an error, then the Parametric Search Appliance chose not to
index the URL because of some rule, such as robots.txt, meta robots,
exclusions, max pages, max depth, exclude by field links, etc. Walking again
with a higher Verbosity value, such as 4, may help explain
why it wasn't walked.
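
The lookup procedure above can be summarized as a small decision routine. The Python sketch below is illustrative only: it does not talk to the Parametric Search Appliance, and the `index` dictionary, the `link_chain` list, the `diagnose` function, and the sample error text are all hypothetical stand-ins for what you would read by hand from the List/Edit URLs and Children pages.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of the index, filled in by hand from the admin pages:
# indexed URL -> {child link found on that page: error text, or None}
Index = Dict[str, Dict[str, Optional[str]]]


def diagnose(content_url: str, link_chain: List[str], index: Index) -> Tuple[str, str]:
    """Return (url_at_the_break, finding).

    link_chain lists the pages that lead to content_url, nearest parent
    first, ending at the Base URL.
    """
    if content_url in index:
        return content_url, "indexed: review its List/Edit Details page"

    missing = content_url  # the URL we expect among a parent's Children
    for parent in link_chain:
        if parent not in index:
            # This parent is missing too, so move one step up the chain.
            missing = parent
            continue
        children = index[parent]
        if missing not in children:
            return missing, "not listed in Children of parent: contact support"
        error = children[missing]
        if error is not None:
            return missing, "listed with error: " + error
        return missing, "listed without error: skipped by a rule, re-walk with higher Verbosity"

    return missing, "no URL in the chain is indexed"


# Example data (assumed, not taken from a real walk).
index = {
    "https://example.com/": {"https://example.com/docs/": "Connection timed out"},
}
chain = ["https://example.com/docs/", "https://example.com/"]
print(diagnose("https://example.com/docs/page.html", chain, index))
```

In this example the Content URL and its immediate parent are both missing, so the break is reported at the first link whose own parent is in the index. That is the same point at which the manual procedure tells you to open the Children list.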
Copyright © Thunderstone Software Last updated: Nov 8 2024