Field Definitions

The schema for these records, as contained in the supplied file timport.sch, follows:

database /tmp/testdb
   table    load
   # use formfeed as record delimiter (recdelim implies multiple)
   recdelim \x0c
   # take multiple records from a single file
   #multiple
   #       name    type            tag     default_val
   field   Subject varchar(80)     Subject
   field   From    varchar(40)     From
   field   Number  long            Number  0
   field   Date    date            Date
   field   File    varind          -
   field   Text    varchar(1000)   -
   # create table load(id counter,Subject varchar(80),From varchar(40),
   #             Number long,Date date,File varind,Text varchar(1000));
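The recdelim line tells the importer that a formfeed (\x0c) separates one record from the next. As a rough illustration only, the following Python sketch splits input text the same way. The helper name split_records and the choice to discard whitespace-only pieces are assumptions for this example, not documented timport behavior.

```python
from typing import List

FORMFEED = "\x0c"  # the record delimiter named by "recdelim \x0c" in the schema


def split_records(text: str) -> List[str]:
    """Split input on formfeeds; whitespace-only pieces are dropped (an assumption)."""
    return [piece for piece in text.split(FORMFEED) if piece.strip()]


if __name__ == "__main__":
    data = "Subject: first\nFrom: a\n\x0cSubject: second\nFrom: b\n\x0c"
    for n, rec in enumerate(split_records(data), start=1):
        print(n, repr(rec))
```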

Field specifications are the most strictly defined part of a schema file. Follow these rules:

  • Fields must be listed in the same order as in the create table statement, and the field lines should come after all other keywords in the schema file. Make sure the last line ends with a newline.

  • The field keyword expects 3 or 4 values: name, type, tag, and (optionally) a default value.

    Value 1: NAME
    The name of the field in the Texis table. The name of the table must be prepended to the field name when multiple tables are being loaded (e.g. "mytable.myfield").

    Value 2: TYPE
    The type of the field in the Texis table. This is the same type that would be given in an SQL "create table" statement. If no length is provided, varchar fields are given a length of 80 and all other types a length of 1.

    Value 3: TAG
    The tag for the field, or a '/' followed by a REX expression that matches the tag. When only a tag is specified, the default tag expression is:

    >>$\Rtag:=\P[\x20\x09]*[^\x0d\x0a]+

    • This expression matches the tag at the beginning of a line, followed by ':' and optional whitespace. Everything after that, up to the end of the line, is the field content. Refer to the Metamorph documentation for full REX details.

    • Use \x20 and \x09 instead of space and tab, respectively, within the REX expression, since space and tab are delimiters within the schema file.

    • Alternatively, if you specify a "recexpr", the tag is the REX subexpression number or range that matches the field (e.g. 2 or 2-4). Subexpressions are numbered starting at 1.

    • When using "csv", the tag is the field number in the input data. Input fields are numbered starting at 1.

    • When using "col", the tag is the character column position or range to include in the field (e.g. 2 or 2-10). Columns are numbered starting at 1.

    • When using "xml", the tag is an XPath-like specifier: forward slashes '/' separate nested tags, and '@' denotes an attribute.

    • A lone '-' means the field will not be searched for; its default value, if any, is always used. For a char or indirect field with no default value, the field is filled with the text of the entire record or the name of the import file, respectively.

    • With numeric fields, a lone '-' means one of two things. If the default value begins with '#', the field is filled with an incrementing number starting at 1; if a number follows the '#', it is used as the starting number instead.

      If the default value starts with a field name, the field is filled with the length of the named field. To get the combined length of multiple fields, join their names with plus signs (+) (e.g. Title+Subject+Body). A minus sign (-) may also be used to subtract a field's length.

    Value 4: DEFAULT-VALUE
    A default value to insert if the field is not found. Everything to the end of the line is used, including spaces. For character fields you might put "NONE" or "UNKNOWN" (without the quotes), or you can put '' or "" in the field for an empty value. NOTE: "" for empty was added in version 2.12 (Feb 20 1999). With no default, the importer makes one up based on the field type, as follows:

    indirect -> input file name
    byte     -> empty
    strlst   -> empty
    char     -> entire record
    numerics -> 0
    date     -> current date/time
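To make the field-line rules concrete, here is a minimal Python sketch that splits a field line into its values. It assumes whitespace-delimited values, treats everything after the tag as the default value (trailing spaces included), maps '' or "" to an empty default, and applies the fallback lengths described under TYPE (80 for varchar, 1 otherwise). FieldSpec and parse_field_line are hypothetical names for illustration; this is not timport's own parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "name" or "name(length)", e.g. "long" or "varchar(80)"
_TYPE_RE = re.compile(r"^([A-Za-z]+)(?:\((\d+)\))?$")


@dataclass
class FieldSpec:
    name: str
    ftype: str               # base type, e.g. "varchar", "long", "date"
    length: int              # declared length, or the fallback length
    tag: str                 # tag, '/'+REX, recexpr/csv/col/xml spec, or '-'
    default: Optional[str]   # None when no default value is given


def parse_field_line(line: str) -> FieldSpec:
    """Split a schema 'field' line into its 3 or 4 values (illustrative only)."""
    # maxsplit=4 keeps the rest of the line, trailing spaces included, as the default.
    parts = line.rstrip("\r\n").split(None, 4)
    if len(parts) < 4 or parts[0] != "field":
        raise ValueError(f"not a field line: {line!r}")
    _, name, type_spec, tag = parts[:4]
    m = _TYPE_RE.match(type_spec)
    if m is None:
        raise ValueError(f"unrecognized type: {type_spec!r}")
    base, declared = m.group(1), m.group(2)
    if declared is not None:
        length = int(declared)
    else:
        length = 80 if base == "varchar" else 1   # fallback lengths
    default = parts[4] if len(parts) == 5 else None
    if default in ("''", '""'):
        default = ""                              # '' or "" mean an empty default
    return FieldSpec(name, base, length, tag, default)


if __name__ == "__main__":
    for line in ("   field   Number  long            Number  0",
                 "   field   Text    varchar(1000)   -"):
        print(parse_field_line(line))
```

Using str.split with a maximum split count is what lets the default value keep its internal and trailing spaces, matching the rule that everything to the end of the line is used.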

The fields Subject, From, Number and Date are loaded by locating their tags at the start of a line. The text that follows each tag, up to the end of that line, is imported as the content of the field.
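The default tag expression can be approximated with Python's re module, although Python regular expressions are a different dialect from REX. In this sketch, ^ with re.MULTILINE stands in for the REX line-start anchor, matching is case-sensitive in place of \R, and, following the description of the default expression, the spaces or tabs after the colon are not included in the captured content. The helper names are hypothetical.

```python
import re
from typing import Optional


def default_tag_pattern(tag: str) -> "re.Pattern[str]":
    # Approximates the REX default  >>$\Rtag:=\P[\x20\x09]*[^\x0d\x0a]+
    #   ^ + re.MULTILINE -> tag must start a line
    #   no IGNORECASE    -> case-sensitive, like REX \R
    #   [ \t]*           -> optional spaces/tabs after the colon (not captured)
    #   ([^\r\n]+)       -> field content, up to the end of the line
    return re.compile(r"^" + re.escape(tag) + r":[ \t]*([^\r\n]+)", re.MULTILINE)


def extract_tagged(record: str, tag: str) -> Optional[str]:
    """Return the content after 'tag:' on the first matching line, or None."""
    m = default_tag_pattern(tag).search(record)
    return m.group(1) if m else None


if __name__ == "__main__":
    record = (
        "Subject: Quarterly report\n"
        "From: jdoe\n"
        "Number: 42\n"
        "Date: 1999-02-20\n"
        "\n"
        "Body text here.\n"
    )
    for tag in ("Subject", "From", "Number", "Date"):
        print(tag, "=>", extract_tagged(record, tag))
```

A tag that is not found yields None here; with the schema above, Number would then fall back to its default value of 0.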

The text of the whole file is loaded into the field called File, and the text of each separate record is loaded into the field called Text, as indicated by the lone '-' in Value 3 and the absence of a default value for either field.
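The lone '-' rules can be summarized as a small decision procedure. The Python sketch below is hypothetical: the DashFieldFiller class is not part of Texis, the sets of type names treated as char, indirect, and numeric are assumptions, lengths are counted in Python characters rather than bytes, a field name missing from the extracted values counts as length 0, and types such as date, byte, and strlst are not modelled.

```python
import itertools
import re
from typing import Dict, Iterator, Optional, Union

# Assumed type groupings; adjust to the Texis types actually in use.
CHAR_TYPES = {"char", "varchar"}
INDIRECT_TYPES = {"indirect", "varind"}
NUMERIC_TYPES = {"short", "int", "long", "float", "double"}


class DashFieldFiller:
    """Hypothetical helper: computes values for fields whose tag is a lone '-'."""

    def __init__(self) -> None:
        # One running counter per field that uses a '#' or '#N' default.
        self._counters: Dict[str, Iterator[int]] = {}

    def fill(self, name: str, ftype: str, default: Optional[str],
             record: str, filename: str,
             values: Dict[str, str]) -> Union[str, int, None]:
        if ftype in NUMERIC_TYPES and default:
            if default.startswith("#"):
                # '#' counts from 1; '#N' counts from N.
                if name not in self._counters:
                    start = default[1:].strip()
                    self._counters[name] = itertools.count(int(start) if start else 1)
                return next(self._counters[name])
            if default[0].isalpha() or default[0] == "_":
                # Default names other fields: add (or subtract) their lengths.
                return self._length_expr(default, values)
        if default is not None:
            return default            # a plain default is always used as-is
        if ftype in CHAR_TYPES:
            return record             # char with no default: entire record
        if ftype in INDIRECT_TYPES:
            return filename           # indirect with no default: import file name
        if ftype in NUMERIC_TYPES:
            return 0                  # numeric with no default: 0
        return None                   # other types are not modelled here

    @staticmethod
    def _length_expr(expr: str, values: Dict[str, str]) -> int:
        # e.g. "Title+Subject+Body" or "Body-Subject"; names may be table-qualified.
        total = 0
        for sign, field in re.findall(r"([+-]?)\s*([A-Za-z_][\w.]*)", expr):
            length = len(values.get(field, ""))
            total += -length if sign == "-" else length
        return total


if __name__ == "__main__":
    filler = DashFieldFiller()
    rec = "Subject: Hi\nFrom: jdoe\n\nBody text\n"
    vals = {"Subject": "Hi", "From": "jdoe"}
    print(filler.fill("Text", "varchar", None, rec, "msgs.txt", vals) == rec)  # True
    print(filler.fill("File", "varind", None, rec, "msgs.txt", vals))         # msgs.txt
    print(filler.fill("Seq", "long", "#100", rec, "msgs.txt", vals))          # 100
    print(filler.fill("Seq", "long", "#100", rec, "msgs.txt", vals))          # 101
    print(filler.fill("Len", "long", "Subject+From", rec, "msgs.txt", vals))  # 6
```

Keeping the counters on an instance means each '#' field keeps incrementing across records for as long as the same filler is reused.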


Copyright © Thunderstone Software     Last updated: Apr 15 2024