2.3 Timport Schemas



Timport understands several schema formats, covering many common data layouts as well as any layout that can be recognized by REX. All of them behave the same way except the DBF format: given a .DBF file, timport simply creates a table with the corresponding schema and data. For the other formats you can specify which fields you want timport to populate or return, what each datatype should be, and where to get the data from. A schema file consists of a number of settings followed by the field definitions. The basic format of a field definition is:

field name type tag defaultvalue

The differences between the various schemas lie in the format specified in the schema and in how the tag field of the field specifier is used. The default schema expects data to be in mail-header format, e.g.

Name: Value

Here the tag would be Name, and it is case-sensitive. If the tag starts with a forward slash, it is taken to be a REX expression that matches the tag. For other import formats the value of tag is the field, column or subexpression number.
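The tag matching described above can be sketched in Python. This is only an illustration of the idea, not timport's implementation: the function name `parse_mail_header_record` and the field names are hypothetical, and Python's `re` module stands in for REX, whose syntax differs.

```python
import re

def parse_mail_header_record(text, tags):
    """Return {field_name: value} for every tag found in a mail-header record.

    `tags` maps a (hypothetical) field name to its tag. A tag that starts
    with "/" is treated as a pattern; here it is a Python regex, NOT REX.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: Value" line
        for field_name, tag in tags.items():
            if tag.startswith("/"):
                matched = re.fullmatch(tag[1:], name) is not None
            else:
                matched = (name == tag)  # plain tags are case-sensitive
            if matched:
                fields[field_name] = value.strip()
    return fields

record = "Name: Alice\nname: ignored\nX-Id: 42\n"
print(parse_mail_header_record(record, {"person": "Name", "ident": "/X-.*"}))
```

The lowercase `name:` line is skipped because the plain tag `Name` is compared exactly, while the slash-prefixed tag is matched as a pattern.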

Other common formats are delimited text, such as csv, and REX. The delimiter is an argument to csv. With REX you specify an expression that matches the entire record. If you are not familiar with REX, note that a few features of its syntax are designed to ensure deterministic behavior, but can lead to confusion if you are used to other regular expression syntaxes. The most common causes of confusion are that repetition operators apply to an entire subexpression rather than a single character, that REX matches as much as it can and does not back up, and that REX is directional. For example, an expression to match the mail header format is:

>>$\RName:=\P[\x20\x09]*[^\x0d\x0a]+

Dissecting the REX expression gives a good introduction to REX's behavior. The >> anchors the expression at the beginning, which ensures that matching proceeds forwards. That matters most when part of the expression searches for something that is not a given character. For example, if we started at the end we would match the whole line, and then be unable to find Name:.

The $ matches a newline. The \R forces REX to respect case; normally REX is case-insensitive. Name: matches that exact string. The = ends the first subexpression, and indicates that we are looking for one occurrence of Name: at the beginning of a line. The \P is the "preceding" flag to REX: the subexpression the flag occurs in, and all preceding ones, must be found, but they are not considered part of the hit. There is also a \F flag, for "following", which behaves similarly. [\x20\x09] is a character set consisting of a space and a tab. The repetition operator * indicates zero or more occurrences, so this consumes any whitespace after Name: and before the value. The next expression, [^\x0d\x0a], is also a character set, consisting of everything except carriage return and linefeed. The repetition operator + means one or more occurrences, so there must be a value.
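For readers coming from other regular expression syntaxes, a rough analogue of the expression dissected above can be written with Python's `re` module. This is an approximation under stated assumptions, not a REX translation: the `^` anchor with `re.MULTILINE` stands in for the leading `$`, a capture group stands in for the effect of `\P` (the `Name:` prefix must be present but is not returned), and the helper name `name_values` is hypothetical.

```python
import re

# Case-sensitive "Name:" at the start of a line, optional spaces/tabs,
# then one or more characters that are not carriage return or linefeed.
HEADER = re.compile(r"^Name:[ \t]*([^\r\n]+)", re.MULTILINE)

def name_values(text):
    """Return the value of every 'Name:' header line in text."""
    return HEADER.findall(text)

sample = "Name: Alice\r\nname: lower\r\nName:\tBob\nName:\n"
print(name_values(sample))  # ['Alice', 'Bob']
```

The lowercase `name:` line and the empty `Name:` line are both rejected, mirroring the case sensitivity and the requirement that a value be present. One difference worth noting: Python's engine backtracks, so a `Name:` line followed only by spaces would still yield a whitespace value here, whereas REX is described above as not backing up.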

When running the timport program you must specify the database and table, either on the command line or in the schema file. The timport function in Texis ignores those settings.
