Database and File Usage

Webinator maintains a database that contains text from HTML pages, links to other pages, and a list of categories.

When the Webinator walker runs, it creates a new database under your specified data directory to hold the new walk. It then dispatches a separate process for each web site it needs to visit, plus one more to handle all of the "Single Pages". Each of these processes retrieves every page in its base list, storing the text of the HTML page in the html table and the hyperlinks in the refs table. All of the desirable URLs from the page that have not been seen before are placed into an internal "todo" list. After all of the base URLs are processed, the same steps are repeated for the URLs on the todo list. Processing is complete when nothing is left in the todo list.
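The todo-list loop described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Webinator's actual code: the function walk, the callbacks fetch_links and is_desirable, and the in-memory site dictionary are all hypothetical stand-ins, and plain lists take the place of the html and refs tables.

```python
def walk(base_urls, fetch_links, is_desirable):
    """Process the base URLs, then keep draining the todo list until it is empty.

    fetch_links(url) and is_desirable(url) are hypothetical callbacks that
    stand in for page retrieval and for the walker's URL filtering rules.
    """
    current = list(dict.fromkeys(base_urls))  # base list, duplicates dropped
    seen = set(current)
    html_table = []  # stand-in for the html table: one entry per page
    refs_table = []  # stand-in for the refs table: (page, link) pairs

    while current:
        todo = []  # URLs discovered during this pass
        for url in current:
            links = fetch_links(url)
            html_table.append(url)
            for link in links:
                refs_table.append((url, link))
                # Only desirable URLs that have not been seen go on the todo list.
                if link not in seen and is_desirable(link):
                    seen.add(link)
                    todo.append(link)
        current = todo  # repeat with the todo list; stop when it is empty

    return html_table, refs_table


# A tiny in-memory "site" so the sketch runs without any network access.
site = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b",
                            "http://other.example/x"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
    "http://example.com/b": ["http://example.com/c"],
    "http://example.com/c": [],
}

pages, refs = walk(
    ["http://example.com/"],
    lambda url: site.get(url, []),
    lambda url: url.startswith("http://example.com/"),
)
print(pages)
```

In this sketch every link is recorded in the refs stand-in, while only links that pass the filter and have not been seen are queued, which mirrors the description above.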

Once all of the walking is complete, the indices needed for searching are created on the data. The new database is then flagged as the "live" one and the old database is deleted. Your disk must therefore have sufficient space for two complete databases, plus the temporary space used during the indexing step.
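As a rough worked example of that space requirement, the following Python sketch adds up the peak usage. The function names and the sample sizes are illustrative assumptions, as is the idea that the new database ends up about the same size as the old one; the real indexing scratch space depends on your data.

```python
GIB = 1024 ** 3


def peak_walk_space(db_bytes, index_temp_bytes):
    """Peak space during a New walk: old database + new database + indexing scratch."""
    return 2 * db_bytes + index_temp_bytes


def extra_space_needed(db_bytes, index_temp_bytes):
    """Free space needed when the old (live) database is already on disk."""
    return db_bytes + index_temp_bytes


# Assumed figures: a 2 GiB database and 0.5 GiB of indexing scratch space.
db_size = 2 * GIB
scratch = GIB // 2

print(peak_walk_space(db_size, scratch) / GIB)     # 4.5
print(extra_space_needed(db_size, scratch) / GIB)  # 2.5
```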

The databases are stored under your specified data directory and are named db1 and db2. Webinator alternates between these two names.

Note that the above applies to a walk type of New. During a walk type of Refresh only one database, the "live" one, is used.
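The naming rule for the two walk types can be illustrated with a short Python sketch. The function target_database and its arguments are hypothetical, and how Webinator itself records which database is live is not shown here. Starting at db1 when no live database exists, and rejecting a Refresh in that case, are assumptions made only for the example.

```python
from pathlib import Path

DB_NAMES = ("db1", "db2")


def target_database(data_dir, live_name, walk_type):
    """Return the database path a walk would write to under data_dir.

    live_name is "db1", "db2", or None when no live database exists yet
    (an assumption for this sketch).
    """
    if walk_type == "New":
        # A New walk builds the database that is not currently live.
        name = DB_NAMES[1] if live_name == DB_NAMES[0] else DB_NAMES[0]
    elif walk_type == "Refresh":
        # A Refresh walk works on the live database only.
        if live_name is None:
            raise ValueError("Refresh needs an existing live database")
        name = live_name
    else:
        raise ValueError("unknown walk type: %r" % (walk_type,))
    return Path(data_dir) / name


print(target_database("data/default", "db1", "New"))      # the db2 path
print(target_database("data/default", "db1", "Refresh"))  # the db1 path
```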

Webinator also maintains a file containing the detailed report for each walk. This file has the same name as the database, with .long appended. In addition, a single file called summary holds short summary information about the state of the database.

Given a data directory named .../default, the following may be present:

.../default/db1
    an actual walk database
.../default/db2
    an actual walk database
.../default/db1.long
    detailed walk report. Displayed when viewing Walk Status
.../default/db2.long
    detailed walk report. Displayed when viewing Walk Status
.../default/summary
    summary walk report. Displayed as Walk summary when viewing Walk Settings
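To check which of these files exist for a given data directory, a small Python helper along the following lines could be used. The function names walk_files and report_present are hypothetical, and the sample path is a placeholder for your own data directory.

```python
from pathlib import Path


def walk_files(data_dir):
    """Map each expected name to its path under data_dir."""
    base = Path(data_dir)
    files = {}
    for db in ("db1", "db2"):
        files[db] = base / db                      # walk database
        files[db + ".long"] = base / (db + ".long")  # detailed walk report
    files["summary"] = base / "summary"            # summary walk report
    return files


def report_present(data_dir):
    """Return {name: True/False} showing which expected entries exist."""
    return {name: path.exists() for name, path in walk_files(data_dir).items()}


# Placeholder path; substitute your own data directory.
for name, found in report_present("data/default").items():
    print(name, "present" if found else "missing")
```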

Webinator, being based on Texis, also has the notion of a global "default" database. This database resides in the installation directory. On Unix it is called INSTALLDIR/texis/testdb. On Windows it is called INSTALLDIR\texis\testdb. This database is used to hold all of the profile and account settings. It does not contain any walked data. It is recommended that you not use this as your data directory.

Each setting has a record in the options table of the default database. See section 6.6 for the list of fields in the table. At each complete rewalk, the current options settings are copied into an options table in the walk database. The copied options do not change as settings are later modified, and they are used only when a search selects the database with db instead of selecting the profile with pr.


Copyright © Thunderstone Software     Last updated: Oct 5 2023