Handling Columnar Records

A good way to handle tabular data is to define the entire record with a recexpr, then identify each column of text within the record with a sub-expression. The sample schema file log.sch, which describes a log of this kind, illustrates the technique.

Let's say a log file of daily activities exists, where each record is terminated by a newline (\x0a) and each line contains specified information starting at specified columns. In this example, the date and time always start at Column 1 and extend for 13 characters. The next column holds either a space or an asterisk, indicating respectively that an event has occurred or is merely scheduled. Next comes the name of the user who logged the entry, then one or more spaces, then a longish text field describing what occurred. A whole record is never longer than 2048 characters.

Below are three records as they might look in the text log file (some lines have been wrapped to fit on the page):

     95-04-21 1300 bill     xxxx yyy - zzz, called, talked to joe
     95-04-21 1301 bill     xxxx yyyyy - zzz aaaaa, called for joe,
     regarding ret'ing some voice mail software, j to cb at
     999-888-7777 x 102
     95-04-21 1800*bob      remember telephone maint. tomorrow morn.
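
As a rough illustration of the column layout described above, the following Python sketch slices one record at fixed positions. It is not part of Texis or its import tool: the function name split_columns is hypothetical, and the strptime format string is an assumption chosen to match the yy-mm-dd HHMM layout of the sample records.

```python
from datetime import datetime


def split_columns(record: str) -> dict:
    """Slice one log record by its fixed column positions (illustrative only)."""
    stamp = record[:13]   # columns 1-13: date and time
    flag = record[13]     # column 14: ' ' (occurred) or '*' (scheduled)
    rest = record[14:]    # user name, padding spaces, then free text
    who, _, what = rest.partition(" ")
    return {
        # Assumed Python equivalent of the yy-mm-dd HHMM layout
        "Date": datetime.strptime(stamp, "%y-%m-%d %H%M"),
        "Future": flag,
        "Who": who,
        "What": what.lstrip(" "),
    }


rec = "95-04-21 1800*bob      remember telephone maint. tomorrow morn."
fields = split_columns(rec)
print(fields["Who"], fields["Future"], fields["Date"].isoformat())
```

Slicing works here only because the date, time and flag occupy fixed columns; the user name and description are variable in length, so they are split on the first space instead.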

The schema for this log, as listed in the schema file log.sch, is:

   #
   # import the log into Texis
   # example of columnar records
   #
   database /usr/db/custdb
   user     customer
   pass     snoopy
   table    log
   recdelim \x0a
   multiple
   datefmt  yy-mm-dd HHMM
   recexpr  >>^\P=.{13}.=[^ ]+ +[^\x0a]*
   #       name    type            tag     default_val
   field   Date    date            2
   field   Future  char            3
   field   Who     varchar(8)      4
   field   What    varchar(2000)   6
   # create table log(id counter, Date date, Future char(1),
   #                  Who varchar(8), What varchar(2000));

Each field's tag is the number of the sub-expression, within the expression given as the value of recexpr, that supplies the field's text. A sub-expression ends at its repetition operator, so the repetition operators mark the boundaries between sub-expressions.

     Sub-expr   No.  Meaning               Fieldname
     >>^\P=     1    newline               precedes expression, excluded
     .{13}      2    13 characters         Date
     .=         3    1 character           Future
     [^ ]+      4    not space chars       Who
      +         5    space chars           not used
     [^\x0a]*   6    chars up to newline   What
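
To make the numbering concrete, the sketch below approximates the same record expression with Python's re module, using one capture group per sub-expression so that group numbers line up with the tag numbers in the table. Python regular-expression syntax differs from the syntax used by recexpr, so this is an approximate translation, not the Texis expression itself; the names RECORD, FIELD_TAGS and parse_log are hypothetical.

```python
import re

# One capture group per sub-expression, so group N corresponds to tag N.
RECORD = re.compile(
    r"(^)"         # 1: start of line (stands in for the excluded newline)
    r"(.{13})"     # 2: 13 characters            -> Date
    r"(.)"         # 3: 1 character              -> Future
    r"([^ ]+)"     # 4: non-space characters     -> Who
    r"( +)"        # 5: spaces                   -> not used
    r"([^\n]*)",   # 6: characters up to newline -> What
    re.MULTILINE,
)

# Field name -> tag number, mirroring the field lines of the schema
FIELD_TAGS = {"Date": 2, "Future": 3, "Who": 4, "What": 6}


def parse_log(text: str) -> list:
    """Return one dict per record, keyed by field name."""
    return [
        {name: m.group(tag) for name, tag in FIELD_TAGS.items()}
        for m in RECORD.finditer(text)
    ]


log_text = (
    "95-04-21 1300 bill     xxxx yyy - zzz, called, talked to joe\n"
    "95-04-21 1800*bob      remember telephone maint. tomorrow morn.\n"
)
rows = parse_log(log_text)
for row in rows:
    print(row)
```

Group 5 is matched but never stored, just as sub-expression 5 has no field line in the schema.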

This schema file contains some other keywords not yet covered. Their meanings are as follows:

    port     port_number       (port number is "10012" for example only)
    user     texis_user        (texis_user is "customer")
    pass     texis_password    (texis_password is "snoopy")


Copyright © Thunderstone Software     Last updated: Apr 15 2024