A good way to handle tabular data is to define an entire record with a recexpr, then identify each column of text within the record with a sub-expression. This is illustrated by the schema file provided as log.sch, written for a sampling of such text.
Let's say a log file of daily activities exists, where each record is terminated by a newline, and each line contains specified information starting at specified columns. In this example, the date and time always start at Column 1 and extend for 13 characters. The next column holds either a space or an asterisk: a space indicates that the event has occurred, an asterisk that it is merely scheduled. Next comes the name of the user who logged the record, followed by a long text field describing what occurred. The whole record is never longer than 2048 characters.
Below is an example of three records as they might look in the text log file (some lines have been wrapped to fit on the page):
95-04-21 1300 bill xxxx yyy - zzz, called, talked to joe
95-04-21 1301 bill xxxx yyyyy - zzz aaaaa, called for joe,
regarding ret'ing some voice mail software, j to cb at
999-888-7777 x 102
95-04-21 1800*bob remember telephone maint. tomorrow morn.
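The column layout described above can be sketched in a few lines of Python. This is only an illustration of the fixed-column idea, not part of Texis: the names LOG_TEXT and split_columns are hypothetical, and the second record is assumed to be a single line once the page wrapping is undone.

```python
# Hypothetical sample: the three example records, unwrapped so that
# each record occupies exactly one line (an assumption about the file).
LOG_TEXT = (
    "95-04-21 1300 bill xxxx yyy - zzz, called, talked to joe\n"
    "95-04-21 1301 bill xxxx yyyyy - zzz aaaaa, called for joe, "
    "regarding ret'ing some voice mail software, j to cb at "
    "999-888-7777 x 102\n"
    "95-04-21 1800*bob remember telephone maint. tomorrow morn.\n"
)


def split_columns(line):
    """Cut one record at the fixed column positions described in the text."""
    stamp = line[:13]      # columns 1-13: date and time
    flag = line[13:14]     # column 14: space (occurred) or asterisk (scheduled)
    # The user name runs up to the first space; the rest is free text.
    who, _, what = line[14:].partition(" ")
    return stamp, flag, who, what.lstrip(" ")


records = [split_columns(line) for line in LOG_TEXT.splitlines()]
for rec in records:
    print(rec)
```

Slicing by position works here only because the first two columns have a fixed width; the user name and description are variable, which is why the schema uses an expression rather than column offsets alone.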
The schema for this log is listed in the schema file log.sch:
#
# import the log into Texis
# example of columnar records
#
database /usr/db/custdb
user customer
pass snoopy
table log
recdelim \x0a
multiple
datefmt yy-mm-dd HHMM
recexpr >>^\P=.{13}.=[^ ]+ +[^\x0a]*
# name type tag default_val
field Date date 2
field Future char 3
field Who varchar(8) 4
field What varchar(2000) 6
# create table log(id counter, Date date, Future char(1),
# Who varchar(8), What varchar(2000));
Each field is specified with a tag number identifying a sub-expression of the whole expression given as the value of recexpr. Repetition operators mark the boundaries between sub-expressions.
Expression   Sub-expr No.   Meaning                                 Fieldname
>>^\P=       1              newline precedes expression, excluded
.{13}        2              13 characters                           Date
.=           3              1 character                             Future
[^ ]+        4              not space chars                         Who
 +           5              space chars                             not used
[^\x0a]*     6              chars up to newline                     What
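To make the mapping from sub-expressions to fields concrete, here is a rough Python counterpart of the table above. It is a sketch under stated assumptions, not the way Texis evaluates recexpr: Python's re syntax differs from the expression syntax used in the schema, the names RECORD, SAMPLE, and rows are hypothetical, and the strptime format string is an assumed equivalent of the datefmt line.

```python
import re
from datetime import datetime

# Approximate Python translation of the recexpr (NOT the schema's own syntax).
# Named groups stand in for the numbered sub-expressions in the table.
RECORD = re.compile(
    r"^(?P<Date>.{13})"     # sub-expression 2: 13 characters
    r"(?P<Future>.)"        # sub-expression 3: 1 character
    r"(?P<Who>[^ ]+)"       # sub-expression 4: non-space characters
    r" +"                   # sub-expression 5: spaces, not stored
    r"(?P<What>[^\n]*)",    # sub-expression 6: characters up to the newline
    re.MULTILINE,
)

# Hypothetical sample: two of the example records, one per line.
SAMPLE = (
    "95-04-21 1300 bill xxxx yyy - zzz, called, talked to joe\n"
    "95-04-21 1800*bob remember telephone maint. tomorrow morn.\n"
)

rows = []
for match in RECORD.finditer(SAMPLE):
    row = match.groupdict()
    # Assumed equivalent of "datefmt yy-mm-dd HHMM".
    row["Date"] = datetime.strptime(row["Date"], "%y-%m-%d %H%M")
    rows.append(row)

for row in rows:
    print(row)
```

Sub-expression 5 has no named group, mirroring the fact that no field in the schema carries tag 5.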
This schema file contains some other keywords not yet covered. Their meanings are as follows:
port   port_number      (port number is "10012" for example only)
user   texis_user       (texis_user is "customer")
pass   texis_password   (texis_password is "snoopy")