Designing a Schema File

A schema file contains the basic information needed to create a table or tables of a particular type. Associated information can be included in the file as comments. Information that Timport acts on directly is given as keywords with assigned values.

The simplest table you could create is one in which the full text of each file is loaded as a text field and statistics about each file are captured in their own fields. This requires virtually no study of the text content, so we'll use it as the first example.

Thunderstone's older indexing program, 3DB, indexed text files into a database. Timport can create this general kind of table with the sample schema file provided, called 3db.sch. The content of this schema file follows:

   #
   # create a 3DB style Texis table with no extra info
   #
   database /tmp/testdb
   table    threedb
   stats
   # create table threedb(id counter,File varind,Fsize long,Ftime date);

To make sense of this file, read it with the following rules in mind, which apply to all schema files:

Preliminary Schema File Format Rules

  • The Texis server must be running for client/server data loading.

  • Comment lines start with a # character.

  • Blank lines are ignored.

  • Syntax is "keyword value(s)", where any number of spaces and/or tabs separate the keyword from its value(s), or alternatively "keyword=value(s)".

  • Each line should be terminated with a newline.

  • Order is not important, with two exceptions: fields must be listed in the order in which they appear in the CREATE TABLE statement, and fields should be listed last.
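The line-level rules above can be sketched in a few lines of Python. This is only an illustration of the format as described here, not Timport's own parser: the function name parse_schema is hypothetical, and tolerating spaces around the "=" sign is an assumption.

```python
import re

# Illustrative only: a keyword, then either "=" or whitespace, then the value(s).
# Allowing spaces around "=" is an assumption, not documented Timport behavior.
LINE_RE = re.compile(r"([^\s=]+)\s*(?:=\s*)?(.*)")


def parse_schema(text):
    """Return a list of (keyword, value) pairs from schema-file text."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines are ignored; comment lines start with '#'.
        if not line or line.startswith("#"):
            continue
        keyword, value = LINE_RE.match(line).groups()
        entries.append((keyword, value.strip()))
    return entries


SAMPLE = """\
   #
   # create a 3DB style Texis table with no extra info
   #
   database /tmp/testdb
   table    threedb
   stats
   # create table threedb(id counter,File varind,Fsize long,Ftime date);
"""

if __name__ == "__main__":
    for keyword, value in parse_schema(SAMPLE):
        print(keyword, repr(value))
```

Run against the 3db.sch content, the sketch yields three entries: database, table, and stats, with stats carrying an empty value.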

The first 3 lines of the example file begin with a #, as does the last. These are comment lines, but they include information important to the creation of the table. The first comment describes what this schema file is for. The last comment gives the exact CREATE TABLE command to run in Texis before running Timport on this schema file.

The remaining 3 lines, which are not comments, are the keywords and values that Timport acts on to create the table. This is the minimum a schema file requires:

  • A database must always be listed; in this case it is named /tmp/testdb. The keyword is database, separated with one or more spaces or tabs from its value /tmp/testdb.

  • A table (or tables) must always be listed; in this case it is named threedb. The keyword is table, separated with one or more spaces or tabs from its value threedb.

  • Some information for the table's fields must be listed. In this case that is done by using the keyword stats, which requires no value.
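As a rough sketch, the three requirements above could be checked as follows. The function missing_requirements is hypothetical and not part of Timport, and the keyword name "field" for individually listed fields is an assumption, since this section does not show that syntax.

```python
def missing_requirements(entries, field_keyword="field"):
    """List the minimum schema requirements that `entries` does not meet.

    `entries` is a list of (keyword, value) pairs, for example
    [("database", "/tmp/testdb"), ("table", "threedb"), ("stats", "")].
    `field_keyword` is an ASSUMED keyword name for individually listed fields.
    """
    keywords = [keyword for keyword, _ in entries]
    values = dict(entries)
    missing = []
    # A database and a table must always be listed, each with a value.
    for required in ("database", "table"):
        if not values.get(required):
            missing.append(required)
    # Field information comes from stats (no value needed) or from listed fields.
    if "stats" not in keywords and field_keyword not in keywords:
        missing.append("field information")
    return missing


complete = [("database", "/tmp/testdb"), ("table", "threedb"), ("stats", "")]
print(missing_requirements(complete))               # prints []
print(missing_requirements([("table", "threedb")])) # prints ['database', 'field information']
```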

Stats automatically gets the file size and date. Field information can alternatively be obtained by listing the fields individually, as will be shown in later examples. Where no fields have been defined and stats is used, it will also automatically load the full text of the file as an indirect field.

You can use the keyword stats along with specified fields to capture file size and date. To also load the full text of the file when additional fields have been specified, specify it as a field within the schema file.

In data type terms, stats adds the fields "Fsize long" and "Ftime date" and fills them in with each file's information. It also adds "File varind" if no fields have been defined. Refer to the Texis manual for a more complete understanding of data types.
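To make the three stats fields concrete, here is a small Python sketch of the per-file values they would hold. It is an illustration only, not how Timport gathers them: the function stats_row is hypothetical, using the file's modification time for Ftime is an assumption, and the File entry simply records the path as a stand-in for an indirect field's reference to the file.

```python
import os
from datetime import datetime


def stats_row(path, include_file=True):
    """Build a dict mirroring the fields that stats adds for one file."""
    st = os.stat(path)
    row = {
        "Fsize": st.st_size,                            # file size in bytes
        "Ftime": datetime.fromtimestamp(st.st_mtime),   # ASSUMED: modification time
    }
    if include_file:
        # Stand-in for "File varind": a reference to the file, not its text.
        row["File"] = path
    return row
```

Passing include_file=False corresponds to the case where other fields have been defined, so only Fsize and Ftime are added.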


Copyright © Thunderstone Software     Last updated: Apr 15 2024