One of the most common types of record-oriented text is a file in which a few
header lines precede a portion of narrative text. This whole pattern
is repeated throughout the file, so that there are many records per
file. You want to capture each header into its own field, and
also capture the full text of the record into a field of its own. The sample
file timport.sch provides an example of this.
The individual fields might be defined as separate expressions, or they might be defined as subexpressions of one large expression that describes the whole record. When an expression is defined for an entire record, it is assigned to the keyword recexpr, for record expression.
When a recexpr is used, the individual fields can be defined with numbers indicating which portion, or range of portions, of the overall expression captures the data for that field. When recexpr is not used, each field has its own REX expression.
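The idea of numbering portions of one overall expression can be illustrated outside Texis. The following is a minimal Python sketch, assuming Python's `re` module as a stand-in for REX: one regular expression plays the role of a recexpr, and a hypothetical `FIELD_GROUPS` table maps each field name to a numbered subexpression (capturing group). This is not timport schema syntax.

```python
import re

# Illustrative stand-in for a recexpr: one Python regex describing a whole
# record. This is Python `re` syntax, not Texis REX syntax.
RECEXPR = re.compile(
    r"From: (.*)\n"     # subexpression 1
    r"Subject: (.*)\n"  # subexpression 2
    r"Number: (\d+)\n"  # subexpression 3
    r"Date: (.*)\n"     # subexpression 4
)

# Hypothetical mapping of field name -> subexpression number.
FIELD_GROUPS = {"From": 1, "Subject": 2, "Number": 3, "Date": 4}


def fields_from_recexpr(record: str) -> dict:
    """Match the whole-record pattern once, then read each field by number."""
    m = RECEXPR.match(record)
    if m is None:
        return {}  # the record does not fit the overall pattern
    return {name: m.group(n) for name, n in FIELD_GROUPS.items()}


record = (
    "From: multiple record file\n"
    "Subject: First multiple record\n"
    "Number: 1\n"
    "Date: 1995-04-19 11:31:00\n"
    "This is my message; this is my file.\n"
)
print(fields_from_recexpr(record))
```

In this sketch, a record that does not fit the single overall pattern yields no fields at all, which is one tradeoff of describing the whole record with one expression.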
The expression for a field is referred to as its tag. You can use the
default expressions, or construct your own complete REX expression.
In the example that follows, the fields are easily
tagged as From, Subject, Number, and Date.
The text of the whole record is stored in the field called Text.
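For comparison with the recexpr approach, here is a minimal Python sketch of per-field tags, again assuming Python's `re` as a stand-in for REX. Each field gets its own independent expression (the hypothetical `TAGS` table), each is searched for separately, and the whole record is stored in `Text`.

```python
import re

# Illustrative per-field "tags": one independent Python regex per field
# (stand-ins for REX expressions, not Texis syntax).
TAGS = {
    "From": re.compile(r"^From: (.*)$", re.MULTILINE),
    "Subject": re.compile(r"^Subject: (.*)$", re.MULTILINE),
    "Number": re.compile(r"^Number: (\d+)$", re.MULTILINE),
    "Date": re.compile(r"^Date: (.*)$", re.MULTILINE),
}


def fields_from_tags(record: str) -> dict:
    """Apply each tag separately; store the whole record in Text."""
    row = {}
    for name, tag in TAGS.items():
        m = tag.search(record)
        row[name] = m.group(1) if m else None  # a missing header leaves the field empty
    row["Text"] = record  # full text of the record, headers included
    return row


record = (
    "From: multiple record file\n"
    "Subject: Second multiple record\n"
    "Number: 2\n"
    "Date: 1995-04-19 11:32:00\n"
    "This is another message.\n"
)
row = fields_from_tags(record)
print(row["Subject"], row["Number"])
```

Because each tag is applied on its own, a missing header only leaves that one field empty instead of failing the whole record.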
The first portion of the file timport.sch is the schema. The
last portion is sample text to import, which looks like this:
From: multiple record file
Subject: First multiple record
Number: 1
Date: 1995-04-19 11:31:00
This is my message; this is my file.
^L
From: multiple record file
Subject: Second multiple record
Number: 2
Date: 1995-04-19 11:32:00
This is another message.
^L
From: multiple record file
Subject: Third multiple record
Number: 3
Date: 1995-04-19 11:33:00
This is getting tedious!
I'm going to stop now.
Where multiple records occur in a single file, they are separated
by some repeating textual pattern. In this example, it is easy to see
the form feed character \x0c, which appears as ^L, separating the
three records. The keyword for this is recdelim, for record
delimiter. When a recdelim is defined in a schema file, it implies
that there are multiple records.
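What a record delimiter does can be sketched in a few lines of Python. This is an illustration only, assuming the sample file's three records with a form feed on its own line between them; `RECDELIM` and `split_records` are hypothetical names, and a plain string split stands in for timport's own delimiter handling.

```python
# The three sample records, separated by a form feed (^L, "\x0c").
SAMPLE = (
    "From: multiple record file\n"
    "Subject: First multiple record\n"
    "Number: 1\n"
    "Date: 1995-04-19 11:31:00\n"
    "This is my message; this is my file.\n"
    "\x0c\n"
    "From: multiple record file\n"
    "Subject: Second multiple record\n"
    "Number: 2\n"
    "Date: 1995-04-19 11:32:00\n"
    "This is another message.\n"
    "\x0c\n"
    "From: multiple record file\n"
    "Subject: Third multiple record\n"
    "Number: 3\n"
    "Date: 1995-04-19 11:33:00\n"
    "This is getting tedious!\n"
    "I'm going to stop now.\n"
)

RECDELIM = "\x0c"  # hypothetical name for the record delimiter


def split_records(text: str, delim: str = RECDELIM) -> list:
    """Cut the file at every delimiter; trim surrounding newlines, drop empty pieces."""
    return [chunk.strip("\n") for chunk in text.split(delim) if chunk.strip()]


records = split_records(SAMPLE)
print(len(records))  # 3
```

Each resulting piece can then be handed to the field extraction for a single record.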
Sometimes the definitions of the fields within the records form an overall pattern that does not require a separate record delimiter. In that case, use the keyword multiple instead. With a clear recdelim, as in this example, the keyword multiple is not required.
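When there is no delimiter, records have to be recognized from the field pattern itself. A minimal Python sketch of that idea, assuming every record begins with a `From: ` header line, splits the text at a zero-width position before each such line (`RECORD_START` and `split_by_pattern` are hypothetical names; splitting on empty matches requires Python 3.7 or later). This only illustrates the concept and is not how timport implements multiple.

```python
import re

# Same kind of records as the sample, but with no delimiter between them.
UNDELIMITED = (
    "From: multiple record file\n"
    "Subject: First multiple record\n"
    "Number: 1\n"
    "Date: 1995-04-19 11:31:00\n"
    "This is my message; this is my file.\n"
    "From: multiple record file\n"
    "Subject: Second multiple record\n"
    "Number: 2\n"
    "Date: 1995-04-19 11:32:00\n"
    "This is another message.\n"
)

# Zero-width split point: the start of any line beginning with "From: ".
RECORD_START = re.compile(r"^(?=From: )", re.MULTILINE)


def split_by_pattern(text: str) -> list:
    """Split before each record's first header; drop the empty leading piece."""
    return [chunk for chunk in RECORD_START.split(text) if chunk]


records = split_by_pattern(UNDELIMITED)
print(len(records))  # 2
```

In this sketch, a line of narrative text that happened to begin with `From: ` would be mistaken for the start of a new record, which is one reason a clear delimiter such as ^L is convenient when the data provides one.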
Specifically, the schema rules are:
Note that this schema file uses a recdelim, so it does not also need the keyword multiple. It defines individual fields rather than describing the entire record with one expression, so no recexpr is defined.