SYNOPSIS
timport [-s schemafile] [-schema_option(s)] [options] [-file file(s)]
timport -dbf [-schema_option(s)] [options] [-file file(s)]
timport -csv [-schema_option(s)] [options] [-file file(s)]
timport -col [-schema_option(s)] [options] [-file file(s)]
timport -mail [-schema_option(s)] [options] [-file file(s)]
DESCRIPTION
Timport takes a data and table description file, called the schema file, and imports data files into Texis tables.
-s schemafile
Is required, unless using one of the special known format options, and specifies the name of the file containing the data and table descriptions.
-d database
Specifies the name of the database to use.
NOTE: This was changed in version 2.12 (Feb. 25 1999). See the -D option.
-v
Turns on verbose mode. Extra information about the processing will be printed. More -v's will increase verbosity. Placing a number immediately after the v will increase verbosity by that much.
-c
Prints Texis API calls as they are made. This is useful to programmers to see the correct usage of the Texis API.
-t
Prints a tic mark (.) for each record imported. It provides a status display so you can get a feel for how far along it is.
-D
Dumps parsed records to the screen instead of inserting into Texis. This is useful for working out the tags and expressions. When this is on no attempt is made to connect to the Texis server or database, so testing may be done without the server or fear of messing up a table.
NOTE: This was added in version 2.12 (Feb. 25 1999).
-g
Generates the schema file with all of the current settings from
the specified schema file, command line, and guessed columns from
csv and col formats. This is most useful when building a schema for a new dataset that is in csv or col format. When given just the -csv or -col command line options, or a schema file with no fields defined, Timport will attempt to guess the column positions, types and names. You can generate a schema file based on its guess and adjust for any mistakes it might have made.
-h
Prints a short usage message.
-H
Prints a long usage message including information about the schema file.
-schema_option
This option lets you specify on the command line anything that might appear in a schema file. Using this you can avoid writing a schema file for simple imports. It can also be used to override settings from the schema file. Specify an option just as it would appear in the schema file, and quote or backslash-escape values so the shell does not eat them.
e.g.: -database /tmp/testdb -csv "\x09"
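For example, a complete no-schema import of a tab-delimited file might look like this (the database path, table name and file name are illustrative):

timport -database /tmp/testdb -table load -csv "\x09" data.txt

Here -database, -table and -csv are all schema keywords given on the command line.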
-dbf
Import a dBase or FoxPro table.
-csv
Import "comma separated values" data. Guess at the field names.
-col
Import columnar data. Guess at the field positions and names.
-mail
Import data in Internet mail (RFC822) format. The fields From, Subject, and Date are stored in addition to the full text of the message.
input_file(s)
The data files to import into Texis. You may specify multiple files. Or you may specify - to read from a pipe. Or you may specify a file containing a list of file names by preceding the name with &.
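For example (the list file name is illustrative; quote the name so the shell does not interpret the &):

timport -s timport.sch "&filelist.txt"

where filelist.txt contains one data file name per line.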
Schema file format
Comment lines start with a #
character. Blank lines are ignored.
Each line has the syntax:
keyword value(s)
where any number of space(s) and/or tab(s) separate keywords and values.
Ordering of keywords is not important, except that fields must be listed in the order that they appeared in the create table statement, and fields should be listed last (after all other keywords). In Texis version 6 and later, a maximum of 1000 fields may be listed (previous versions had a limit of 800).
Possible keywords: (a * indicates a required item)
host internet_address
port port_number
user texis_user
group texis_group
pass texis_password
recdelim record_delimiting_rex
recexpr record_matching_rex
readexpr record_delimiting_rex
recsize record_max_size
datefmt date_format_string
dbf optional_translation
csv optional_delimiter
col
mail
oracle
xml
xmlns uri
xmlns:prefix uri
keepemptyrec
stats
multiple
firstmatch
allmatch separator
trimspace
trimdollar
keepfirst
csvquote
csvescquote
xmldatasetlevel value
createtable boolean_value
database texis_database_name
noid texis_table_name
droptable texis_table_name
table texis_table_name
field texis_field_name texis_sql_type tag_name_or_expr [default]
NOTE: field is not required if stats is used.
host, port, user, group, and pass are the settings used to log into the Texis server. If unspecified, timport will log into the Texis server on the same machine on the default port as PUBLIC with no password. NOTE: Versions prior to 2.12 (May 13 1998) logged in as user _SYSTEM.
recdelim is used for separating records out of an input file containing multiple records. It implies multiple. This will override readexpr.
recexpr is an expression that matches an entire record. Field tags are then numbers indicating the subexpression range for the field. Good for records that are not well delimited (like columns).
readexpr is used as an input file delimiter for reading when using multiple but not recdelim. This is needed when using multiple but not recexpr when reading from a pipe or redirection. It specifies how to delimit reads. This expression should match the interval between records. It is overridden by recdelim.
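As a sketch, if each record in the input ends with a line of ten dashes, the schema might contain just (the delimiter expression is illustrative):

recdelim ----------

Since recdelim implies multiple, no separate multiple line is needed.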
recsize sets the maximum readable record size when using recdelim or readexpr. The default is 1 megabyte. Increase this value if you ever see the "no end delimiter found" warning message.
datefmt
is the format to expect date fields in. The default is
Texis style "yyyy-mm-dd[ HH[:MM[:SS]]]".
The scanner will treat all punctuation and space as delimiters.
Specify the format using these specifiers (as used in the examples below): yyyy or yy for the year, mm for the month number, mmm for the month name, dd for the day of month, jjj for the day of year, HH, MM and SS for hour, minute and second, p for an am/pm indicator, and x to skip a word.
The date scanner will read up to the next delimiter or as many digits as you specify, whichever comes first. Any non-digit is a delimiter for the digit-only types. 'p' will only check for 'a' or 'p', then skip all trailing alphabetics. 'x' will skip all alphabetics. 1900 will be added to 2 digit year specs greater than or equal to 69. 2000 will be added to 2 digit year specs less than 70.
Examples:
FORMAT MATCHES MEANS
yy-mm-dd HHMM 95-04-27 16:54 1995-04-27 16:54:00
dd-mm-yyyy HH:MM:SS 27/04/1995 16:54:32 1995-04-27 16:54:32
yyyymmdd HHMMSS p 19950427 045432 pm 1995-04-27 16:54:32
x, dd mmm yyyy HH:MM:SS Thu, 27 Apr 1995 16:55:56 1995-04-27 16:55:56
yyyy-jjj 1997-117 1997-04-27 00:00:00
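For example, to import dates like 27/04/1995 16:54:32 the schema would include:

datefmt dd-mm-yyyy HH:MM:SS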
dbf, csv, col, mail and oracle
allow you to specify one of several known file formats.
Instead of having to specify rex expressions for the fields timport will
automatically parse out the fields from the known format.
Specify one of the following keywords:
dbf: You do not need to specify any fields.
The DBF files specified on the command line will be imported
into Texis table(s). The Texis table name will be that provided
with the table
keyword or the name of the original DBF
file if a table name is not provided. The fields will have the
same names in the Texis table as they did in the DBF table.
Data types will be preserved. Memo fields will become varchar fields. If your DBF table has special characters in it you may wish to use the dostoiso option to translate characters from the DOS code page to the ISO Latin character set (e.g. dbf dostoiso).
col: If no fields are specified, timport will attempt to guess the column positions and types by sampling a number of rows from the first input file.
The first row is assumed to contain the names of the fields.
You can specify the precise column names and positions with the field keyword. Place the character column positions in the 3rd value for field. Character columns are numbered starting at 1. Specify a range of character columns by placing a hyphen (-) between the first and last column numbers (e.g.: 5-9). To get all characters after a particular column, include the hyphen but leave off the second number (e.g.: 57-).
By default the first row of each specified file will not be imported. Use the keepfirst keyword to import the first row.
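A sketch of explicit col fields, with illustrative names and column positions:

col
field Name varchar 1-20
field Amount long 21-28
field Comment varchar 29-

Here Comment takes all characters from column 29 to the end of each line.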
csv: You may specify the field delimiter after the csv keyword. Everything up to the end of line will be taken as the field delimiter. You may encode special characters in hex notation by using \x followed by the 2 digit hex code for the character (e.g.: for tab delimiters use: csv \x09).
If no fields are specified, timport will attempt to guess the column names and types by sampling a number of rows from the first input file. The first row is assumed to contain the names of the fields. You can specify the precise column names and types with the field keyword. Place the input field numbers in the 3rd value for field. Input fields are numbered starting at 1. By default the first row of each specified file will not be imported. Use the keepfirst keyword to import the first row.
Normally double quotes (") are respected. If your data has quotes scattered through it and quotes are not used for field binding, you can turn off quote processing with the csvquote keyword. If your data uses quotes around fields, but does not escape them within fields by doubling them, you can turn off embedded quote processing with the csvescquote keyword.
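A sketch of a comma-delimited csv schema with explicit fields (names and types illustrative):

csv ,
field Name varchar 1
field Amount long 2
field Comment varchar 3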
You may use the -g option to generate the schema file for this and edit it to your liking.
xml: The input files are XML documents, formatted like this:
<dataset>
<record>
<column1>abc</column1>
<column2>def</column2>
<column3>ghi</column3>
</record>
<record>
<column1>jkl</column1>
<column2>mno</column2>
</record>
...
</dataset>
It must be formatted as a set of records wrapped by an outer tag (and possibly more outer tags - see xmldatasetlevel below).
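A sketch of a schema for the example above, assuming each field's tag is simply the name of the element that contains it:

xml
table load
field column1 varchar column1
field column2 varchar column2
field column3 varchar column3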
Note: Prior to July 2005, attributes on the dataset-level tag were not handled properly. In the following example:
<dataset randomattribute="value">
<record>
<column1>abc</column1>
...
</dataset>
Prior to July 2005, timport would see randomattribute as the first row: timport's first row would have randomattribute set to value, and all fields under record would be set to their default values. For subsequent rows, randomattribute would be set to its field's default value. With a July 2005 or later version, randomattribute will not be seen as a separate row, and dataset@randomattribute will be properly set to value for all rows fetched.
xmlns defines a default XML namespace for the schema. All schema elements will reside in this namespace. Unlike XML, the only way to specify a default namespace is for the entire schema. If finer control of where namespaces apply is needed, please use multiple xmlns:prefix commands.
xmlns:prefix defines an XML namespace prefix to be used in the schema, where prefix is replaced by whatever prefix you wish to use. It is legal (and very common) to define multiple prefixes in a single schema. Please see XML Namespaces for more detail.
keepemptyrec will use a record filled with default values when a completely empty record is found (default behavior is to discard a completely empty record).
stats will add fields "Fsize long" and "Ftime date" and fill them in with the file's info for each file. It will also add "File varind" if no fields have been defined.
multiple indicates that there may be more than one record per input file.
firstmatch indicates that the first match of a tag expression should be stored instead of the last. Sometimes a tag expression will match data in a following field. This flag ensures that the first occurrence of a tag within a record is used instead of any subsequent match within that record.
allmatch indicates that all matches of a tag expression should be combined and stored instead. Multiple occurrences are combined with the specified separator in between.
trimspace indicates that leading and trailing whitespace should be trimmed from character fields.
trimdollar indicates that leading whitespace and dollar signs should be trimmed from character fields.
keepfirst only applies to the special formats csv and col. It indicates that the first row from the input should be kept. By default it will be deleted because it usually contains titles.
csvquote only applies to the special format csv. It turns off special handling of quotes. Normally double quotes (") are respected. If your data has quotes scattered through it and quotes are not used for field binding, you will need this option.
csvescquote only applies to the special format csv. It turns off special handling of embedded quotes. Normally embedded quotes are expected to be escaped by doubling them. This option removes any attempt to handle embedded quotes.
xmldatasetlevel indicates how deep the dataset tag is in an XML document. If your data is buried a few levels deep in wrapper tags, you can use this command to specify which level to regard as the 'dataset' level (see examples below).
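As a sketch, if the records are wrapped one level deeper, e.g.:

<export>
<dataset>
<record>
<column1>abc</column1>
</record>
</dataset>
</export>

the schema might specify (assuming levels are counted from the outermost tag, so that dataset here is at level 2):

xml
xmldatasetlevel 2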
createtable indicates whether timport should attempt to create the table if it does not exist. To disable table creation set this to False.
droptable indicates that the table should be dropped before loading any new data into it.
noid will suppress the default "id counter" field for the specified table. Normally the field "id counter" is inserted at the beginning of all table definitions.
field expects 3 or 4 values: the field name, its Texis SQL type, a tag name or expression that locates the field in the input, and an optional default value. When using recexpr the tag is expected to be a range of subexpression numbers. When using csv or oracle it is expected to be an input field number. When using col it is expected to be a range of input column numbers.
See the -H option for more help.
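A sketch of field lines under each tagging style (names, types and positions illustrative):

# recexpr: the tag is a subexpression range
field Title varchar 2-3
# csv or oracle: the tag is an input field number
field Title varchar 1
# col: the tag is a range of character columns
field Title varchar 1-30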
Prerequisites
The Texis server must be running for client/server imports and the table(s) must match the schema before importing data. The importer will warn you if the table(s) don't match what you specified in the schema file. If the table does not exist it will be created. If the database does not exist and is supposed to be on the local machine it will be created.
EXAMPLE
Given the following schema file (timport.sch):
database /tmp/testdb
table load
# name type tag default_val
field Subject varchar Subject
field From varchar From
field Number long Number 0
field Date date Date
field File varind -
field Text varchar -
And the following input file (example.txt):
From: Thunderstone EPI Inc.
Subject: Test import
Number: 1
Date: 1995-04-19 11:31:00
This is my message; this is my file.
This is more message.
This is the last line of the message.
Use a command line like the following:
timport -s timport.sch example.txt
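To verify the tags and expressions before touching the database, add the -D option to dump the parsed records to the screen instead of inserting them:

timport -D -s timport.sch example.txt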