Refresh All still starts with the Base URLs and explores from there.
But instead of creating a new database and fully downloading and processing each
URL, it leaves the already-indexed data in place, and checks each of the URLs to
see if the content has changed. New URLs are added to the database, and URLs
that are no longer present on the server are removed from the database.
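The bookkeeping described above can be sketched as a comparison between what is already indexed and what the walk finds. This is only an illustrative model, not the Webinator's actual implementation: the `plan_refresh` function, the `RefreshPlan` type, and the idea of a per-URL "fingerprint" (for example a checksum or a Last-Modified value) are assumptions made for the example.

```python
from typing import Dict, NamedTuple, Set


class RefreshPlan(NamedTuple):
    added: Set[str]      # found on the server, not yet in the database
    removed: Set[str]    # in the database, no longer on the server
    changed: Set[str]    # in both, but the content differs
    unchanged: Set[str]  # in both, content identical; no reprocessing


def plan_refresh(indexed: Dict[str, str], found: Dict[str, str]) -> RefreshPlan:
    """Compare the existing index with what the walk found.

    Both arguments map URL -> content fingerprint. The fingerprint is a
    hypothetical stand-in for however a change is detected.
    """
    indexed_urls = set(indexed)
    found_urls = set(found)
    common = indexed_urls & found_urls
    changed = {url for url in common if indexed[url] != found[url]}
    return RefreshPlan(
        added=found_urls - indexed_urls,
        removed=indexed_urls - found_urls,
        changed=changed,
        unchanged=common - changed,
    )
```

Only the `added` and `changed` sets would need to be downloaded and processed; everything in `unchanged` stays in the database as it is.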
If a URL's content hasn't changed, the Webinator doesn't reprocess the file. If
the server supports If-Modified-Since (or it's doing a
walk), the content won't even be transferred. This lets the walk be much more
efficient.
Refresh All walks are useful for keeping content up to date once you've established all your walk settings. The walk is guaranteed to see anything that has changed, without needing to fully reprocess every URL every time.
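To illustrate the If-Modified-Since mechanism mentioned above, here is a minimal sketch of a conditional HTTP request using Python's standard library. It shows the general HTTP technique, not the Webinator's own code; the helper names `build_conditional_request` and `must_reprocess` are hypothetical.

```python
import urllib.request
from email.utils import formatdate


def build_conditional_request(url: str, last_fetched: float) -> urllib.request.Request:
    """Build a GET request that asks the server to send the body only if
    the resource changed after `last_fetched` (seconds since the epoch)."""
    http_date = formatdate(last_fetched, usegmt=True)  # RFC 1123 style date
    return urllib.request.Request(url, headers={"If-Modified-Since": http_date})


def must_reprocess(status: int) -> bool:
    """304 Not Modified means the server sent no body, so the indexed
    copy is still current; any other status is treated as needing a look."""
    return status != 304
```

A server that honors the header answers 304 with an empty body for unchanged content, which is why nothing is transferred. Note that `urllib.request.urlopen` reports a 304 response by raising `urllib.error.HTTPError` with `code == 304`, so a caller would check the status there.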
Refresh All walks don't reapply the walk settings to URLs that haven't changed.
For example, a new Data from Field rule to customize the Title will not take
effect for a URL whose content hasn't changed. If you change your settings to
include more URLs (e.g. add extensions, remove exclusions, add domains), a
Refresh All walk is not likely to find the newly allowed data, unless
all of the URLs leading to this data have been modified. You should do a
New walk once to process these changes.
For some large collections, especially those whose servers don't support
If-Modified-Since, checking every URL every walk may still be too
intensive. For these,
Refresh walks can be used (see below).
If more than 30%-50% of your site changes between walks, you may be
better off using a
New walk instead of a Refresh All walk.
Also, many dynamic content generators may not give accurate Last-Modified dates, which will cause every URL to be
rewalked. In that case you should use
New instead of Refresh All.
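The guidance above can be summarized as a small rule of thumb. The function below is a hypothetical helper, not a Webinator setting; the default `change_threshold` of 0.3 is an assumption taken from the low end of the 30%-50% range and should be tuned to your own site.

```python
def suggest_walk_type(changed_urls: int, total_urls: int,
                      last_modified_reliable: bool = True,
                      change_threshold: float = 0.3) -> str:
    """Suggest "New" or "Refresh All" from rough site statistics."""
    if total_urls <= 0:
        raise ValueError("total_urls must be positive")
    # Inaccurate Last-Modified dates make every URL look changed,
    # so a Refresh All walk saves nothing.
    if not last_modified_reliable:
        return "New"
    # When a large share of the site changes anyway, checking each URL
    # first is extra work on top of reprocessing most of them.
    if changed_urls / total_urls > change_threshold:
        return "New"
    return "Refresh All"
```

For example, `suggest_walk_type(40, 100)` suggests a New walk, while `suggest_walk_type(10, 100)` suggests Refresh All.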