Walk Database Tables and Fields



Field        Description
id           Unique record id
Hash         Document hash for duplicate content detection
Size         Size of retrieved raw document (i.e. HTML)
Visited      The date the page was modified (or fetched, if the modification date is not set)
Dlsecs       The number of seconds needed to fetch the page
Depth        The number of URLs traversed to reach the page
Url          The URL of the real HTML page
Title        The title of the page
Body         The formatted textual content of the page, in Storage Charset (UTF-8)
Keywords     The keywords metadata from the page
Description  The description metadata from the page
Meta         Other metadata from the page, separated by newlines
Catno        List of categories to which the URL belongs
CatnoLowest  Lowest Catno value
Modified     The date the page was last modified
NextCheck    The date the page should next be refreshed
Views        The number of times this URL has been viewed (shown in results)
Clicks       The number of times this URL has been clicked (in results)
CTR          Click-through ratio
Pop          Popularity (number of pages linking to this page)
MimeType     MIME type of original page
Charset      Character set of page as stored (usually Storage Charset)

Table 6.1: Fields in html table
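
The CTR field is described only as a click-through ratio. As a minimal sketch, assuming it is simply Clicks divided by Views (the exact formula and scale are not stated here), the relationship could look like the following. The function name click_through_ratio is hypothetical and not part of the product.

```python
def click_through_ratio(views: int, clicks: int) -> float:
    """Return clicks / views, or 0.0 when the URL has never been shown.

    Assumption: CTR is the plain ratio Clicks / Views. Whether the stored
    value is a fraction or a percentage is not documented in this section.
    """
    if views <= 0:
        # Avoid division by zero for URLs that never appeared in results.
        return 0.0
    return clicks / views


# A URL shown 8 times in results and clicked twice.
print(click_through_ratio(8, 2))
```

Returning 0.0 for a URL with no views is a choice made for this sketch only.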


Field  Description
Url    The URL of the HTML page
Ref    The URL of a reference (link) on the HTML page

Table 6.2: Fields in refs table
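
Since Pop in the html table is described as the number of pages linking to a page, the refs table holds the data needed to count inbound links. The sketch below illustrates that relationship using an in-memory SQLite database as a stand-in. The real walk database is not SQLite, the sample URLs are invented, and inbound_link_count is a hypothetical helper. Whether Pop is actually computed this way is an assumption.

```python
import sqlite3

# In-memory SQLite stand-in for the refs table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (Url TEXT, Ref TEXT)")
conn.executemany(
    "INSERT INTO refs (Url, Ref) VALUES (?, ?)",
    [
        # (page containing the link, link target)
        ("http://example.com/", "http://example.com/a.html"),
        ("http://example.com/", "http://example.com/b.html"),
        ("http://example.com/a.html", "http://example.com/b.html"),
    ],
)


def inbound_link_count(conn: sqlite3.Connection, url: str) -> int:
    """Count the distinct pages (Url) that contain a link (Ref) to url."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT Url) FROM refs WHERE Ref = ?", (url,)
    ).fetchone()
    return row[0]


print(inbound_link_count(conn, "http://example.com/b.html"))
```

COUNT(DISTINCT Url) is used so that a page linking to the same target several times is counted once, which matches the "number of pages" wording.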


Field          Description
Catno          The number for the category
OverlapsLower  Y if some members are also in a lower category
Url            The URL pattern for the category
Category       The name of the category

Table 6.3: Fields in categories table


Field   Description
Url     The URL of an HTML page that could not be retrieved
Reason  The reason it could not be retrieved
id      Unique record id (includes timestamp info)

Table 6.4: Fields in error table


Field   Description
id      Contains the date and time of the query (unique record id)
Client  The hostname of the web client that performed the query
Query   The user's query as entered

Table 6.5: Fields in querylog table (if query logging enabled)

