Where dates appear in the text in a predictable format, they can be
captured as a date field using the data type date, rather than
as characters only. This is shown in the example above with the log
file. You may also use the data type datestr to perform date parsing
but store the results in a varchar field (datestr was added in
version 2.12, Oct 2 1998).
Timport allows some flexibility in how dates may appear.
The keyword datefmt specifies the format in which date fields are expected. The default is Texis style:
yyyy-mm-dd[ HH[:MM[:SS]]]
where the first 4 digits represent the year, followed by 2 digits for the month, 2 digits for the day, and optionally 2 digits each for hours, minutes, and seconds. The scanner in Timport treats all punctuation and spaces as delimiters.
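To show how the fixed digit widths and the delimiter rule work together, here is a minimal Python sketch that reads the default layout. It is not Timport's implementation: the function name parse_default is hypothetical, and the sketch assumes every field is zero-padded to its full width, so delimiters can simply be discarded before the digits are sliced.

```python
import re
from datetime import datetime

# Field widths for the default layout yyyy-mm-dd[ HH[:MM[:SS]]].
_WIDTHS = [("year", 4), ("month", 2), ("day", 2),
           ("hour", 2), ("minute", 2), ("second", 2)]


def parse_default(text: str) -> datetime:
    """Hypothetical sketch: punctuation and spaces are treated as delimiters.

    Assumes every field is zero-padded to its full width.
    """
    # Discard all delimiters and keep only the ASCII digits, in order.
    digits = "".join(re.findall(r"[0-9]", text))
    values = {}
    pos = 0
    for name, width in _WIDTHS:
        if pos >= len(digits):
            break  # the trailing time fields are optional
        chunk = digits[pos:pos + width]
        if len(chunk) != width:
            raise ValueError(f"incomplete {name} field in {text!r}")
        values[name] = int(chunk)
        pos += width
    if pos != len(digits):
        raise ValueError(f"unexpected trailing digits in {text!r}")
    if len(values) < 3:
        raise ValueError(f"missing year, month or day in {text!r}")
    return datetime(**values)  # omitted hour/minute/second default to 0
```

Because every delimiter is ignored, "1995-04-27 16:54", "1995/04/27 16.54" and "19950427 1654" all yield the same value in this sketch.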
In the schema file above, the datefmt keyword is set to match the way dates appear in the log file:
datefmt yy-mm-dd HHMM
This matches dates such as:
95-04-21 1300
95-04-21 1301
95-04-21 1800
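For comparison, the same layout can be written with Python's standard datetime.strptime. This is only an analogy showing what each letter group captures, not how Timport parses dates. Two differences are worth noting: strptime requires the literal "-" and " " separators exactly as written, and it maps two-digit years 69-99 to 19xx and 00-68 to 20xx, which may not match Timport's rule. The name parse_log_date is hypothetical.

```python
from datetime import datetime

# strptime counterpart of datefmt "yy-mm-dd HHMM". Unlike Timport's scanner,
# strptime requires the "-" and " " separators to appear exactly as written.
LOG_FORMAT = "%y-%m-%d %H%M"


def parse_log_date(text: str) -> datetime:
    return datetime.strptime(text, LOG_FORMAT)


for stamp in ["95-04-21 1300", "95-04-21 1301", "95-04-21 1800"]:
    print(parse_log_date(stamp))
# 1995-04-21 13:00:00
# 1995-04-21 13:01:00
# 1995-04-21 18:00:00
```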
Build the value of datefmt from the following specifiers:
y for year digits
m for month digits or month name
d for day of month digits
j for day of year digits
H for hour digits
M for minute digits
S for second digits
p for "am" or "pm" string
x for junk
'p' will only check for 'a' or 'p', then skip all trailing alphabetics.
'x' will skip all alphabetics.
Examples:
Format                  Matches                     Means
yy-mm-dd HHMM           95-04-27 16:54              1995-04-27 16:54:00
dd-mm-yyyy HH:MM:SS     27/04/1995 16:54:32         1995-04-27 16:54:32
yyyymmdd HHMMSS p       19950427 045432 pm          1995-04-27 16:54:32
x, dd mmm yy HH:MM:SS   Thu, 27 Apr 95 16:55:56     1995-04-27 16:55:56
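The specifier rules and the examples can be made concrete with a small interpreter. The Python sketch below is a hypothetical re-implementation for illustration, not Timport's actual code. It assumes that each run of a letter reads up to that many digits, that punctuation and spaces in both the format and the input are skipped as delimiters, that month names are matched on their first three letters, and that two-digit years 69-99 mean 19xx and 00-68 mean 20xx. The function names are invented.

```python
from datetime import datetime, timedelta

FIELD_LETTERS = "ymdjHMSpx"  # the specifier letters listed above
DIGITS = "0123456789"
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def tokenize_format(fmt: str) -> list[tuple[str, int]]:
    """Split a datefmt value into (letter, run length) pairs, dropping delimiters."""
    tokens = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch in FIELD_LETTERS:
            j = i
            while j < len(fmt) and fmt[j] == ch:
                j += 1
            tokens.append((ch, j - i))
            i = j
        else:
            i += 1  # punctuation or space in the format acts only as a delimiter
    return tokens


def parse_with_datefmt(fmt: str, text: str) -> datetime:
    """Hypothetical datefmt matcher, written for illustration only."""
    fields = {"y": None, "m": None, "d": None, "j": None, "H": 0, "M": 0, "S": 0}
    pm = None  # None: no 'p' field; True/False once am/pm has been read
    pos = 0

    def skip(pred) -> None:
        nonlocal pos
        while pos < len(text) and pred(text[pos]):
            pos += 1

    def read_run(pred, limit=None) -> str:
        nonlocal pos
        start = pos
        while (pos < len(text) and pred(text[pos])
               and (limit is None or pos - start < limit)):
            pos += 1
        return text[start:pos]

    for letter, count in tokenize_format(fmt):
        skip(lambda c: not c.isalnum())  # punctuation and spaces are delimiters
        if letter == "x":
            skip(str.isalpha)  # 'x' skips all alphabetics
        elif letter == "p":
            flag = read_run(str.isalpha, 1).lower()  # only 'a' or 'p' is checked
            if flag not in ("a", "p"):
                raise ValueError(f"expected am/pm in {text!r}")
            pm = flag == "p"
            skip(str.isalpha)  # then any trailing alphabetics are skipped
        elif letter == "m" and pos < len(text) and text[pos].isalpha():
            name = read_run(str.isalpha).lower()
            if name[:3] not in MONTHS:  # assumption: first three letters decide
                raise ValueError(f"unknown month name {name!r}")
            fields["m"] = MONTHS.index(name[:3]) + 1
        else:
            digits = read_run(lambda c: c in DIGITS, count)  # up to `count` digits
            if not digits:
                raise ValueError(f"expected digits for {letter!r} in {text!r}")
            fields[letter] = int(digits)

    year = fields["y"]
    if year is None:
        raise ValueError(f"no year found in {text!r}")
    if year < 100:  # assumption: 69-99 -> 19xx, 00-68 -> 20xx
        year += 1900 if year >= 69 else 2000
    hour = fields["H"]
    if pm is not None:
        hour = hour % 12 + (12 if pm else 0)  # 12 am -> 0, 4 pm -> 16
    if fields["j"] is not None:
        day = datetime(year, 1, 1) + timedelta(days=fields["j"] - 1)
    elif fields["m"] is not None and fields["d"] is not None:
        day = datetime(year, fields["m"], fields["d"])
    else:
        raise ValueError(f"no month/day or day-of-year in {text!r}")
    return day.replace(hour=hour, minute=fields["M"], second=fields["S"])
```

Applied to the last row of the table, parse_with_datefmt("x, dd mmm yy HH:MM:SS", "Thu, 27 Apr 95 16:55:56") returns datetime(1995, 4, 27, 16, 55, 56). The other three rows likewise produce the values in the Means column. Reading "up to n digits" rather than exactly n lets adjacent fields such as yyyymmdd split correctly while still accepting delimited input.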
Capturing dates as date values allows documents to be selected by
date range using comparisons such as greater than (>) and less than (<),
adding to the power of the database.
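To illustrate why storing dates as date values matters for range comparisons, the Python sketch below compares the same timestamps held as text and as parsed dates. The log entries are hypothetical, and the example demonstrates the general principle rather than Texis's query engine. When the day comes first, comparing text follows character order, so a range test silently misses matching rows.

```python
from datetime import datetime

# Hypothetical log timestamps stored as plain text in a dd-mm-yyyy layout.
raw = ["27-04-1995 16:54", "03-05-1995 09:10", "21-04-1995 13:00"]

# As text, < and > follow character order, not the calendar.
as_text = sorted(raw)
text_in_range = [s for s in raw if "25-04-1995" < s < "01-05-1995"]

# As date values, < and > follow the calendar.
as_dates = sorted(datetime.strptime(s, "%d-%m-%Y %H:%M") for s in raw)
lo, hi = datetime(1995, 4, 25), datetime(1995, 5, 1)
in_range = [d for d in as_dates if lo < d < hi]

print(as_text[0])     # 03-05-1995 09:10 (May sorts before April)
print(text_in_range)  # [] (the April 27 entry is missed)
print(in_range)       # [datetime.datetime(1995, 4, 27, 16, 54)]
```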