Indexing properties

indexspace

A directory in which to store the index files. The default is the empty string, which means the database directory is used. Setting indexspace can put the indexes on another disk, to balance load or to save space. If indexspace is set to a non-default value when a Metamorph index is updated, the new index will be stored in the new location.

indexblock

When a Metamorph index is created on an indirect field, the indirect files are read in blocks. This property sets the size of that block.

indexmem

When indexes are created, Texis uses memory to speed up the process. This setting adjusts the amount of memory used. The default is 40% of physical memory if that can be determined, and 16MB if not. If the value set is less than 100, it is treated as a percentage of physical memory; if it is greater than 100, it is treated as the number of bytes of memory to use. Setting this value too high can cause excessive swapping, while setting it too low causes unneeded extra merges to disk.
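The percentage-versus-bytes rule for indexmem can be sketched in Python as below. This is an illustration of the documented rule only, not Texis source; the function name is hypothetical, and three details are assumptions the text does not settle: 16MB is taken as 16 * 2**20 bytes, a user-set percentage with unknown physical memory falls back to 16MB, and a value of exactly 100 is treated as a byte count.

```python
PERCENT_LIMIT = 100                # values below this are percentages of RAM
FALLBACK_BYTES = 16 * 1024 * 1024  # "16MB" when RAM is unknown (assumes MB = 2**20)

def effective_index_mem(value, physical_mem):
    """Hypothetical helper: bytes of index memory implied by an indexmem value.

    value: the indexmem setting, or None for the default.
    physical_mem: bytes of physical RAM, or None if it cannot be determined.
    """
    if value is None:
        value = 40  # documented default: 40% of physical memory
    if value < PERCENT_LIMIT:
        if physical_mem is None:
            # Assumption: any percentage falls back to 16MB when RAM is unknown.
            return FALLBACK_BYTES
        return physical_mem * value // 100
    # Assumption: exactly 100 is treated as bytes; the text leaves it unspecified.
    return value
```

For example, with 8GB of RAM the default works out to roughly 3.2GB, while a setting of 500000000 always means about 500MB regardless of RAM.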

indexmeter

Whether to print a progress meter during index creation/update. The default is 0 or 'none', which suppresses the meter. A value of 1 or 'simple' prints a simple hash-mark meter (with no tty control codes; suitable for redirection to a file and reading by other processes). A value of 2 or 'percent' or 'pct' prints a hash-mark meter with a more detailed percentage value (suitable for large indexes). Added in version 4.00.998688241 Aug 24 2001.

meter

A semicolon-separated list of processes to print a progress meter for. Syntax:

{process[=type]}|type [; ...]

A process is one of index, compact, or the catch-all alias all. A type is a progress meter type, one of none, simple, percent, on (same as simple) or off (same as none). The default type if not given is on. E.g. to show a progress meter for all meterable processes, simply set meter to on. Added in version 6.00.1290500000 20101123.
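To make the meter value syntax concrete, here is a minimal Python parser for it, assuming that a bare type applies to every process (as the "set meter to on" example suggests), that later list items override earlier ones, and that processes not mentioned get no meter. The function and table names are hypothetical; this is a sketch of the grammar, not how Texis parses it.

```python
PROCESSES = ("index", "compact")
# Canonical meter types; "on" and "off" are aliases per the text above.
TYPE_ALIASES = {"none": "none", "off": "none",
                "simple": "simple", "on": "simple",
                "percent": "percent"}

def parse_meter(value):
    """Map each meterable process to 'none', 'simple' or 'percent'."""
    result = {p: "none" for p in PROCESSES}  # assumption: unlisted -> no meter
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, mtype = item.partition("=")
        name = name.strip().lower()
        if sep:
            mtype = mtype.strip().lower()
        elif name in TYPE_ALIASES:
            # A bare type: treat it like "all=type".
            name, mtype = "all", name
        else:
            mtype = "on"  # documented default type when none is given
        if mtype not in TYPE_ALIASES:
            raise ValueError(f"unknown meter type: {mtype!r}")
        targets = PROCESSES if name == "all" else (name,)
        for proc in targets:
            if proc not in PROCESSES:
                raise ValueError(f"unknown process: {proc!r}")
            result[proc] = TYPE_ALIASES[mtype]
    return result
```

Under these assumptions, `parse_meter("index=percent; compact")` gives a percentage meter for index and a simple meter for compact.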

addexp

An additional REX expression to match words to be indexed in a Metamorph index. This is useful if there are non-English words to be searched for, such as part numbers. When an index is first created, the expressions used are stored with it so that the index will be updated properly. The default expression is \alnum{2,99}. Note: only the expressions set when the index is initially created (i.e. at the first CREATE METAMORPH ... statement; later statements are index updates) are saved. Expressions set during an update (issuing "create metamorph [inverted] index" on an existing index) will not be added.

delexp

This removes an index word expression from the list. Expressions can be removed either by number (starting with 0) or by expression.

lstexp

Lists the current index word expressions. The value specified is ignored (but required syntactically).
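The behavior of addexp, delexp and lstexp on the expression list can be modeled with the small Python class below. It is a hypothetical illustration of the list semantics described above (default entry, removal by 0-based number or by expression text), not a Texis API; in Texis these are set as properties, and the example part-number expression is made up.

```python
class IndexExpressions:
    """Hypothetical model of the index word-expression list."""

    def __init__(self):
        self.exprs = [r"\alnum{2,99}"]  # documented default expression

    def addexp(self, expr):
        """Append another word-matching expression."""
        self.exprs.append(expr)

    def delexp(self, target):
        """Remove by position (int, starting at 0) or by expression text (str)."""
        if isinstance(target, int):
            del self.exprs[target]
        else:
            self.exprs.remove(target)

    def lstexp(self):
        """Return (number, expression) pairs, as a listing would show them."""
        return list(enumerate(self.exprs))
```

Note that positions shift after a removal by number, so listing the expressions again before a second numeric delete avoids removing the wrong one.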

addindextmp

Add a directory to the list of directories to use for temporary files while creating the index. If temporary files are needed while creating a Metamorph index, they will be created in whichever of these directories has the most free space at the time of creation. If no addindextmp directories are specified, the default list is the index's destination directory (e.g. the database or indexspace directory) and the directories named by the environment variables TMP and TMPDIR.

delindextmp

Remove a directory from the list of directories to use for temporary files while creating a Metamorph index.

lstindextmp

List the directories used for temporary files while creating Metamorph indices. Aka listindextmp.
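The "most free space wins" choice among temporary directories can be sketched in Python with `shutil.disk_usage`. The helper names are hypothetical, and skipping unset or missing directories is an assumption about how a sensible implementation would behave; Texis's own selection logic may differ.

```python
import os
import shutil

def default_index_tmp_dirs(dest_dir):
    """Default candidates when no addindextmp dirs are set: destination, $TMP, $TMPDIR."""
    return [dest_dir, os.environ.get("TMP"), os.environ.get("TMPDIR")]

def pick_index_tmp_dir(dirs):
    """Return the directory in `dirs` with the most free space, or None."""
    best, best_free = None, -1
    for d in dirs:
        if not d or not os.path.isdir(d):
            continue  # assumption: unset variables and missing dirs are skipped
        free = shutil.disk_usage(d).free
        if free > best_free:
            best, best_free = d, free
    return best
```

Because free space is measured at creation time, the chosen directory can differ from one index build to the next.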

indexvalues

Controls how a regular (B-tree) index stores table values. If set to splitstrlst (the default), then strlst-type fields are split, i.e. a separate (item,recid) tuple is stored for each (varchar) item in the strlst, rather than just one for the whole (strlst,recid) tuple. This allows the index to be used for some set-like operators that look at individual items in a strlst, such as most IN, SUBSET and INTERSECT queries.

If indexvalues is set to all, or the index is not on a strlst field, or it is on multiple fields, such splitting does not occur, and the index generally cannot be used for set-like queries (with some exceptions).

Note that if index values are split (i.e. splitstrlst is set and the index is on a single strlst field), table rows with an empty (zero-item) strlst value will not be stored in the index. This means that queries that require searching for or listing empty-strlst table values cannot use such an index. For example, a subset query with a non-empty parameter on the right side and a strlst table column on the left side will not be able to return empty-strlst rows when using an index, even though they match. Also, subset queries with an empty-strlst or empty-varchar parameter (left or right side) must use an indexvalues=all index instead. Thus if empty-strlst subset query parameters are a possibility, both types of index (splitstrlst and all) should be created.
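The difference between the two indexvalues modes, and why empty strlst rows vanish from a split index, can be shown with a small Python model. This is a hypothetical sketch of which (key, recid) entries a single-field strlst index would hold; it does not reflect Texis's on-disk B-tree format.

```python
def index_entries(rows, indexvalues="splitstrlst"):
    """Return the sorted (key, recid) entries a one-field strlst index would hold.

    rows: mapping of recid -> list of strings (the strlst value).
    """
    entries = []
    for recid, items in rows.items():
        if indexvalues == "splitstrlst":
            # One (item, recid) tuple per item; an empty strlst adds nothing.
            entries.extend((item, recid) for item in items)
        else:  # "all": a single entry for the whole value, even if empty
            entries.append((tuple(items), recid))
    return sorted(entries)
```

With `rows = {1: ["red", "blue"], 2: []}`, the split index holds only entries for recid 1, so an index-driven search can never return row 2; the all index keeps an entry for row 2's empty value.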

As with stringcomparemode, only the creation-time indexvalues value is ever used by an index, not the current value, and the optimizer will attempt to choose the best index at search time. The indexvalues setting was added in Texis version 7; previous versions effectively had indexvalues set to splitstrlst. Caveat: A version 6 Texis will issue an error when encountering an indexvalues=all index (as it is unimplemented in version 6), and will refuse to modify the index or the table it is on. A version 5 or earlier Texis, however, may silently corrupt an indexvalues=all index during table modifications.

btreethreshold

This sets a limit on how much of an index should be used. If a particular portion of the query matches more than the given percentage of the rows, the index will not be used. It is often more efficient to try to find another index than to use an index for a very frequent term. The default is 50, so if more than half the records match, the index will not be used. This only applies to ordinary indexes.
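The btreethreshold cut-off amounts to a simple percentage comparison, sketched below in Python. The function name is hypothetical, and how Texis estimates the matching-row count for a query portion is not described here, so the counts are simply taken as inputs; treating an empty table as "use the index" is also an assumption.

```python
def use_btree_index(matching_rows, total_rows, btreethreshold=50):
    """Return True if an ordinary index should be used for this query portion."""
    if total_rows == 0:
        return True  # assumption: nothing to compare against
    percent = 100.0 * matching_rows / total_rows
    # Only matching *more than* the threshold disables the index.
    return percent <= btreethreshold
```

So with the default of 50, a term matching exactly half the rows can still use the index, while one matching 51% cannot.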

btreelog

Whether to log operations on a particular B-tree, for debugging. Generally enabled only at the request of tech support. The value syntax is:

[on=|off=][/dir/]file[.btr]

Prefixing on= or off= turns logging on or off, respectively; the default (if no prefix) is on. Logging applies to the named B-tree file; if a relative path is given, logging applies to the named B-tree in any database accessed.
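A btreelog value can be decoded as in the Python sketch below, which also derives the log file name using the rule given for this setting (same name as the B-tree, ending in ".log" instead of ".btr"). The function is hypothetical and assumes POSIX-style paths; it only illustrates the value syntax, not how Texis applies it.

```python
import os

def parse_btreelog(value):
    """Split a btreelog value into (enabled, btree_path, log_path, any_database)."""
    enabled = True  # no prefix means "on"
    for prefix, flag in (("on=", True), ("off=", False)):
        if value.startswith(prefix):
            enabled, value = flag, value[len(prefix):]
            break
    if not value.endswith(".btr"):
        value += ".btr"  # the .btr extension is optional in the setting
    log_path = value[: -len(".btr")] + ".log"
    # A relative path names that B-tree in any database accessed.
    return enabled, value, log_path, not os.path.isabs(value)
```

For instance, `off=/db/myindex` turns logging off for one specific file, whereas a bare `myindex.btr` enables logging for a B-tree of that name in every database the process touches.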

The logging status is also saved in the B-tree file itself, if the index is opened for writing (e.g. at create or update). This means that once logging is enabled and saved, every process that accesses the B-tree will log operations, not just ones that have btreelog explicitly set. This is critical for debugging, as every operation must be logged. Thus, btreelog can just be set once (e.g. at index create), without having to modify (and track down) every script that might use the B-tree. Logging can be disabled later, by setting "off=file" and accessing the index for an update.

Operations are logged to a text file with the same name as the B-tree, but ending in ".log" instead of ".btr". The columns in the log file are as follows; most are for tech support analysis, and note that they may change in a future Texis release:

Date: Date
Time: Time (including microseconds)
Script and line: Vortex script and line number, if known
PID: Process ID
DBTBL handle: DBTBL handle
Read locks: Number of read locks (DBTBL.nireadl)
Write locks: Number of write locks (DBTBL.niwrite)
B-tree handle: BTREE handle
Action: What action was taken:
- open: B-tree open; Recid is root page offset
- create: B-tree create
- close: B-tree close
- RDroot: Read root page
- dump: B-tree dump
- WRhdr: Write B-tree header; Recid is root page offset
- WRdd: Write data dictionary; Recid is DD offset (reading the DD at open is not logged)
- delete: Delete key; Recid is for the key
- append: Append key
- insert: Insert key
- search: Search for key
- RDpage: Read page; Recid is for the page
- WRpage: Write page
- CRpage: Create page
- FRpage: Free page
- FRdbf: Free DBF block
Result: Result of action:
- ok: Success
- fail: Failure
- dup: Duplicate (e.g. duplicate insert into unique B-tree)
- hit: Search found the key
- miss: Search did not find the key
Search mode: Search mode:
- B: Find before
- F: Find
- A: Find after
Index guarantee: DBTBL.indguar flag (1 if no post-process needed)
Index type: Index type:
- N: DBIDX_NATIVE (bubble-up)
- M: DBIDX_MEMORY (RAM B-tree)
- C: DBIDX_CACHE (RAM cache)
Recid: Record id; see notes for Action column
Key size: Key size (in bytes)
Key flags: Flags for each key value, separated by commas:
- D: OF_DESCENDING
- I: OF_IGN_CASE
- X: OF_DONT_CARE
- E: OF_PREFER_END
- S: OF_PREFER_START
Key: Key, i.e. value being inserted, deleted etc.; multiple values separated with commas

Unavailable or not-applicable fields are logged with a dash. Note that enabling logging can produce a large log file quickly; free disk space should be monitored. The btreelog setting was added in version 5.01.1134028000 20051208.

btreedump

Dump B-tree indexes, for debugging. Generally enabled only at the request of tech support. The value is an integer whose bits are defined as follows:

Bits 0-15 define what to dump. Files are created that are named after the B-tree, with a different extension:

Bit 0: Issue a putmsg about where dump file(s) are
Bit 1: .btree file: Copy of in-mem BTREE struct
Bit 2: .btrcopy file: Copy of .btr file
Bit 3: .cache file: Page cache from BCACHE, BPAGE
Bit 4: .his file: History from BTRL
Bit 5: .core file: fork() and dump core

Bits 16+ define when to dump:

Bit 16: At "Cannot insert value" messages
Bit 17: At "Cannot delete value" messages
Bit 18: At "Trying to insert duplicate value" messages

The files are for tech support analysis. Formats and bits are subject to change in future Texis releases. The btreedump setting was added in version 5.01.1131587000 20051109.
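Because the btreedump value packs "what" bits (0-15) and "when" bits (16 and up) into one integer, it can help to build it from named constants. The Python sketch below does that; the constant and function names are hypothetical, only the bit positions come from the lists above, and (as noted) those bits may change between releases.

```python
# What to dump (bits 0-15), per the list above.
DUMP_PUTMSG  = 1 << 0  # putmsg about where dump files are
DUMP_BTREE   = 1 << 1  # .btree file
DUMP_BTRCOPY = 1 << 2  # .btrcopy file
DUMP_CACHE   = 1 << 3  # .cache file
DUMP_HISTORY = 1 << 4  # .his file
DUMP_CORE    = 1 << 5  # .core file via fork()
# When to dump (bits 16+).
AT_CANNOT_INSERT = 1 << 16
AT_CANNOT_DELETE = 1 << 17
AT_DUP_INSERT    = 1 << 18

def btreedump_value(what, when):
    """Combine what-to-dump bits with when-to-dump bits into one integer."""
    if what & ~0xFFFF or when & 0xFFFF:
        raise ValueError("what-bits must be in 0-15, when-bits in 16+")
    return what | when
```

For example, announcing the dump location and copying the .btr file at "Cannot insert value" messages gives 1 + 4 + 65536 = 65541.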

maxlinearrows

This sets the maximum number of records that should be searched linearly. If the indexes used so far yield a result set larger than maxlinearrows, the program will try to find more indexes to use. Once the result set is smaller than maxlinearrows, or all possible indexes are exhausted, the records will be processed. The default is 1000.
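The "keep adding indexes until the candidate set is small enough" loop can be sketched in Python as below. It assumes each usable index yields a set of recids and that combining indexes means intersecting those sets; the function name is hypothetical, and the real optimizer's index ordering and combination rules are not described here.

```python
def narrow_with_indexes(index_results, maxlinearrows=1000):
    """Intersect per-index recid sets until fewer than maxlinearrows remain.

    index_results: iterable of sets of recids, one per usable index, in the
    order they are tried. Returns the candidates to process row by row, or
    None if no index was available.
    """
    candidates = None
    for recids in index_results:
        candidates = set(recids) if candidates is None else candidates & recids
        if len(candidates) < maxlinearrows:
            break  # small enough: stop looking for more indexes
    return candidates
```

If `index_results` is a generator, indexes after the stopping point are never evaluated, which mirrors the point of the setting: avoid index work once a linear pass is cheap enough.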

likerrows

The maximum number of rows a single term can appear in and still be returned by liker. When searching for multiple terms with liker and likep, one does not always want documents that contain only a very frequent term to be displayed. This sets the limit of what is considered frequent. The default is 1000.
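One way to picture likerrows is as a filter applied before collecting candidate rows: terms that appear in more than likerrows rows contribute nothing, so a row matching only such terms is not returned. The Python sketch below assumes a simple term-to-recids mapping; the function name is hypothetical, and liker's actual ranking is not modeled.

```python
def liker_candidates(postings, likerrows=1000):
    """Return recids reachable through terms that are not "frequent".

    postings: mapping of term -> set of recids containing that term.
    """
    result = set()
    for term, recids in postings.items():
        if len(recids) <= likerrows:  # frequent terms are skipped
            result |= recids
    return result
```

With a term like "the" in 2000 rows and "zeolite" in rows 3 and 7, only rows 3 and 7 are candidates under the default.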

indexaccess

If this option is turned on, then data from an index can be selected as if it were a table. When selecting from an ordinary (B-tree) index, the fields that the index was created on will be listed. When selecting from a Metamorph index, a list of words (Word column), the count of rows containing each word (RowCount), and, for Metamorph inverted indexes, the count of all hits in all rows (OccurrenceCount) for each word will be returned.

indexchunk

In versions of Texis after October 1998, the indexchunk setting is deprecated and unused. In prior releases, creating a Metamorph index used temporary files which, in the worst case, could grow to twice the size of the data being indexed. This process could be broken into stages: after a certain amount of data was indexed, the temporary files were processed to generate a partial index, and the process then repeated for the rest of the data. By default, the amount of free disk space was checked on startup and used to calculate when the processing step was needed. If the system did not report free disk space accurately, or to leave more disk space free, this value could be changed. The default is 0, which calculates a value automatically; otherwise the value is the number of bytes of data to index before processing the temporary files. Lower values conserve disk space at the expense of more time spent processing intermediate files.

cleanupwait

Windows/NT specific: after updating a Metamorph index, the database waits this long before trying to remove the old copy of the index. This gives any other process currently using the index time to stop using it, so that it can be removed. The default is twenty seconds. If a whole batch of Metamorph indices is being updated one right after another, it may be useful to set this to 0 for all but the last index, since an attempt is made to remove all old indices after every index update.

dbcleanupverbose

Integer whose bit flags control some tracing messages about database cleanup housekeeping (e.g. removal of unneeded temporary or deleted indexes and tables). A bit-wise OR of the following values:

0x01: Report successful removal of temporary/deleted indexes/tables.
0x02: Report failed removal of such indexes/tables.
0x04: Report on in-use checks of temporary indexes/tables.

The default is 0 (i.e. no messages). Note that these cleanup actions may also be handled by the Database Monitor; see also the [Monitor] DB Cleanup Verbose setting in conf/texis.ini. Added in version 6.00.1339712000 20120614.
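Since dbcleanupverbose is a bitwise OR of the flags above, a small Python helper can build a value and decode one back into the messages it enables. The names here are hypothetical; only the three flag values and their meanings come from the list above.

```python
# Flag values as listed above for dbcleanupverbose.
CLEANUP_FLAGS = {
    0x01: "report successful removal of temporary/deleted indexes/tables",
    0x02: "report failed removal of such indexes/tables",
    0x04: "report on in-use checks of temporary indexes/tables",
}

def dbcleanupverbose_value(*flags):
    """Combine documented flags with bitwise OR."""
    value = 0
    for f in flags:
        if f not in CLEANUP_FLAGS:
            raise ValueError(f"undocumented flag: {f:#x}")
        value |= f
    return value

def describe_dbcleanupverbose(value):
    """List the reports enabled by a dbcleanupverbose value."""
    return [text for bit, text in CLEANUP_FLAGS.items() if value & bit]
```

For example, reporting both successful and failed removals is 0x01 | 0x02 = 3.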

indextrace

For debugging: trace index usage, especially during searches, issuing informational putmsgs. Greater values produce more messages. Note that the meaning of values, as well as the messages printed, are subject to change without notice. Aka traceindex, traceidx. Added in version 3.00.942186316 19991109.

tracerecid

For debugging: trace index usage for this particular recid. Added in version 3.01.945660772 19991219.

indexdump

For debugging: dump index recids during search/usage. Value is a bitwise OR of the following flags:

Bit 0: for new list
Bit 1: for delete list
Bit 2: for token file
Bit 3: for overall counts too

The default is 0.

indexmmap

Whether to use memory-mapping to access Metamorph index files, instead of read(). The value is a bitwise OR of the following flags:

Bit 0: for token file
Bit 1: for .dat file

The default is 1 (i.e. for token file only). Note that memory-mapping may not be supported on all platforms.

indexreadbufsz

Read buffer size, when reading (not memory-mapping) Metamorph index .tok and .dat files. The default is 64KB; suffixes like "KB" are respected. During search, the actual read block size may be smaller (if predicted) or larger (if blocks are merged). Also used during index create/update. Decreasing this size when creating large indexes can save memory (due to the large number of intermediate files), at the potential expense of time. Aka indexreadbufsize. Added in version 4.00.1006398833 20011121.

indexwritebufsz

Write buffer size for creating Metamorph indexes. The default is 128KB; suffixes like "KB" are respected. Aka indexwritebufsize. Added in version 4.00.1007509154 20011204.

indexmmapbufsz

Memory-map buffer size for Metamorph indexes. During search, it is used for the .dat file, if it is memory-mapped (see indexmmap); it is ignored for the .tok file since the latter is heavily used and thus fully mapped (if indexmmap permits it). During index update, indexmmapbufsz is used for the .dat file, if it is memory-mapped; the .tok file will be entirely memory-mapped if it is smaller than this size, else it is read. Aka indexmmapbufsize. The default is 0, which uses 25% of RAM. Added in version 3.01.959984092 20000602. In version 4.00.1007509154 20011204 and later, "KB" etc. suffixes are allowed.
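The index-update rule for the .tok file ("entirely memory-mapped if it is smaller than this size, else it is read", with 0 meaning 25% of RAM) can be written out in Python as below. The function is hypothetical and deliberately ignores the separate indexmmap flags, which are not described as affecting this update-time decision.

```python
def tok_access_during_update(tok_size, indexmmapbufsz=0, ram_bytes=None):
    """Return 'mmap' or 'read' for the .tok file during a Metamorph index update."""
    if indexmmapbufsz == 0:
        if ram_bytes is None:
            raise ValueError("ram_bytes is needed when indexmmapbufsz is 0")
        bufsz = ram_bytes // 4  # default: 25% of RAM
    else:
        bufsz = indexmmapbufsz
    # Mapped whole only if strictly smaller than the buffer size.
    return "mmap" if tok_size < bufsz else "read"
```

On a machine with 1000 bytes of "RAM" in this toy model, a 100-byte .tok file is mapped under the default, but a 250-byte one is read.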

indexslurp

Whether to enable index "slurp" optimization during Metamorph index create/update, where possible. Optimization is always possible for index create; during index update, it is possible if the new insert/update recids all occur after the original recids (e.g. the table is insert-only, or all updates created a new block). Optimization saves about 20% of index create/update time by merging piles an entire word at a time, instead of word/token at a time. The default is 1 (enabled); set to 0 to disable. Added in version 4.00.1004391616 20011029.

indexappend

Whether to enable index "append" optimization during Metamorph index update, where possible. Optimization is possible if the new insert recids all occur after the original recids, and there were no deletes/updates (e.g. the table is insert-only); it is irrelevant during index create. Optimization saves index build time by avoiding original token translation if not needed. The default is 1 (enabled); set to 0 to disable. Added in version 4.00.1006312820 20011120.
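The conditions under which the slurp and append optimizations apply can be expressed as two Python predicates. This sketch assumes recids can be compared as increasing integers (new blocks come after old ones); the function names are hypothetical, and Texis may check these conditions differently internally.

```python
def slurp_possible(creating, original_recids, new_recids):
    """indexslurp: always possible on create; on update, only if every
    new insert/update recid comes after every original recid."""
    if creating:
        return True
    if not new_recids or not original_recids:
        return True
    return min(new_recids) > max(original_recids)

def append_possible(creating, original_recids, new_recids, had_deletes_or_updates):
    """indexappend: update only; new inserts must all follow the originals,
    and there must have been no deletes or updates."""
    if creating:
        return False  # irrelevant during index create
    if had_deletes_or_updates:
        return False
    return slurp_possible(False, original_recids, new_recids)
```

An insert-only table satisfies both predicates on update; an update that reused an earlier block satisfies neither.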

indexwritesplit

Whether to enable index "write-split" optimization during Metamorph index create/update. Optimization saves memory by splitting the writes for (potentially large) .dat blocks into multiple calls, thus needing less buffer space. The default is 1 (enabled); set to 0 to disable. Added in version 4.00.1015532186 20020307.

indexbtreeexclusive

Whether to optimize access to certain index B-trees during exclusive access. The optimization may reduce seeks and reads, which may lead to increased index creation speed on platforms with slow large-file lseek behavior. The default is 1 (enabled); set to 0 to disable. Added in version 5.01.1177548533 20070425.

mergeflush

Whether to enable index "merge-flush" optimization during Metamorph index create/update. Optimization saves time by flushing in-memory index piles to disk just before final merge; generally saves time where indexslurp is not possible. The default is 1 (enabled); set to 0 to disable. Added in version 4.00.1011143988 20020115.

indexversion

Which version of Metamorph index to produce or update, when creating or updating Metamorph indexes. The supported values are 0 through 3; the default is 2. Setting version 0 sets the default index version for that Texis release. Note that old versions of Texis may not support version 3 indexes. Version 3 indexes may use less disk space than version 2, but are considered experimental. Added in version 3.00.954374722 20000329.

indexmaxsingle

For Metamorph indexes: the maximum number of locations that a single-recid dictionary word may have and still be stored solely in the .btr B-tree file (without needing a .dat entry). Words that occur in a single recid usually have their data stored solely in the B-tree, to save a .dat access at search time. However, if the word occurs many times in that single recid, the data (for a Metamorph inverted index) may be large enough to bloat the B-tree and thus negate the savings; so if the single-recid word occurs more than indexmaxsingle times, its data is stored in the .dat file. The default is 8.
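The storage decision described for indexmaxsingle reduces to the Python predicate below, a hypothetical sketch in which "locations" means the number of occurrences of the word within its one recid.

```python
def store_in_btree_only(recid_count, location_count, indexmaxsingle=8):
    """True if a word's data can live solely in the .btr file (no .dat entry)."""
    # Only single-recid words qualify, and only up to indexmaxsingle locations.
    return recid_count == 1 and location_count <= indexmaxsingle
```

With the default, a word occurring 8 times in one row stays in the B-tree, while a ninth occurrence moves its data to the .dat file.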

uniqnewlist

Whether/how to unique the new list during Metamorph index searches. Works around a potential bug in old versions of Texis; not generally set. The possible values are:

0: do not unique at all
1: unique auxiliary/compound index new list only
2: unique all new lists
3: unique all new lists and report first few duplicates

The default is 0.

tablereadbufsz

Size of read buffer for tables, used when it is possible to buffer table reads (e.g. during some index creations). The default is 16KB. When setting, suffixes such as "KB" etc. are supported. Set to 0 to disable read buffering. Added in version 5.01.1177700467 20070427. Aka tablereadbufsize.