Metamorph parameters

These settings affect how text searches are performed. Setting one is equivalent to changing the corresponding parameter in the profile, or to calling the Metamorph API function that sets it (if there is an equivalent). They are:

minwordlen
The smallest a word can get due to suffix and prefix removal. Removal of trailing vowel or double consonant can make it a letter shorter than this. Default 255.
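
As a rough illustration of how minwordlen bounds suffix removal, the following Python sketch strips a suffix only while the remaining word stays at least minwordlen characters long. The suffix list and the function name strip_suffixes are hypothetical, and the trailing vowel and double consonant adjustments mentioned above are left out; it is a model of the idea, not the Metamorph algorithm itself.

```python
def strip_suffixes(word, suffixes, minwordlen):
    """Remove suffixes from word without letting it drop below minwordlen.

    Illustrative model only: the suffix list is hypothetical and the real
    Metamorph rules (e.g. trailing vowel removal) are not reproduced.
    """
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first.
        for suffix in sorted(suffixes, key=len, reverse=True):
            root = word[: len(word) - len(suffix)]
            if word.endswith(suffix) and len(root) >= minwordlen:
                word = root
                changed = True
                break
    return word


SUFFIXES = ["s", "ing", "ed", "ly"]  # hypothetical list for the example

print(strip_suffixes("paintings", SUFFIXES, 5))    # paint
print(strip_suffixes("paintings", SUFFIXES, 8))    # painting
print(strip_suffixes("paintings", SUFFIXES, 255))  # paintings
```

Under this sketch's rule, a very large minwordlen such as the default 255 leaves ordinary words untouched, because no stripped form is long enough to be accepted.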

keepnoise
Whether noise words should be kept in the query and index instead of being stripped. Default off (noise words are stripped).

suffixproc
Whether suffixes should be stripped from the words to find a match. Default on.

prefixproc
Whether prefixes should be stripped from the words to find a match. Turning this on is not suggested when using a Metamorph index. Default off.

rebuild
Make sure that the word found can be built from the root and appropriate suffixes and prefixes. This increases the accuracy of the search. Default on.

useequiv
Perform thesaurus lookup. If this is on, the query word and all of its equivalences are searched for. If it is off, only the query word is searched for. Default off. Also known as keepeqvs in version 5.01.1171414736 20070213 and later.
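
The effect of useequiv on a single query word can be pictured with a small Python sketch. The thesaurus contents and the function name expand_term are hypothetical; the real equivalence data comes from the Metamorph thesaurus.

```python
def expand_term(word, thesaurus, useequiv=False):
    """Return the list of words to search for a single query word."""
    if not useequiv:
        # Off: only the query word itself is searched for.
        return [word]
    # On: the word plus every equivalence listed for it.
    # dict.get avoids a KeyError for words with no thesaurus entry.
    return [word] + list(thesaurus.get(word, []))


THESAURUS = {"car": ["automobile", "vehicle"]}  # hypothetical entries

print(expand_term("car", THESAURUS))                 # ['car']
print(expand_term("car", THESAURUS, useequiv=True))  # ['car', 'automobile', 'vehicle']
```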

inc_sdexp
Include the start delimiter as part of the hit. This is not generally useful in Texis unless hit offset information is being retrieved. Default off.

inc_edexp
Include the end delimiter as part of the hit. This is not generally useful in Texis unless hit offset information is being retrieved. Default on.

sdexp
Start delimiter to use: a regular expression to match the start of a hit. The default is no delimiter.

edexp
End delimiter to use: a regular expression to match the end of a hit. The default is no delimiter.
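
To show how the four delimiter settings interact, here is a minimal Python sketch that widens a match to the region between the nearest start and end delimiters. Python's re module stands in for REX, so the pattern syntax is not that of real sdexp/edexp values, and the function name delimited_hit and the fallback to the text boundaries are assumptions of the sketch.

```python
import re


def delimited_hit(text, match_start, match_end, sdexp, edexp,
                  inc_sdexp=False, inc_edexp=True):
    """Widen a match to the region bounded by start and end delimiters.

    If no delimiter is found on one side, the text boundary is used
    instead (an assumption of this sketch).
    """
    # Last start delimiter that occurs before the match.
    start = 0
    for m in re.finditer(sdexp, text[:match_start]):
        start = m.start() if inc_sdexp else m.end()
    # First end delimiter that occurs after the match.
    end = len(text)
    m = re.search(edexp, text[match_end:])
    if m:
        end = match_end + (m.end() if inc_edexp else m.start())
    return text[start:end]


TEXT = "Frogs sing. The pond is still. Night falls."
i = TEXT.index("pond")
SENTENCE_END = r"[.?!]"  # Python regex, used for both delimiters here

# Documented defaults: start delimiter excluded, end delimiter included.
print(repr(delimited_hit(TEXT, i, i + 4, SENTENCE_END, SENTENCE_END)))
# ' The pond is still.'
print(repr(delimited_hit(TEXT, i, i + 4, SENTENCE_END, SENTENCE_END,
                         inc_sdexp=True, inc_edexp=False)))
# '. The pond is still'
```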

intersects
Default number of intersections in Metamorph queries; overridden by the @ operator. Added in version 7.06.1530212000 20180628.

hyphenphrase
Controls whether a hyphen between words searches for the phrase of the two words next to each other, or searches for the hyphen literally. The default value of 1 will search for the two words as a phrase. Setting it to 0 will search for a single term including the hyphen. If you anticipate setting hyphenphrase to 0 then you should modify the index word expression to include hyphens.
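
The two interpretations can be sketched in Python as follows. The function name interpret_term and the tuple it returns are hypothetical and only illustrate the distinction described above.

```python
def interpret_term(term, hyphenphrase=1):
    """Return ("phrase", [words]) or ("word", term) for a query term."""
    if hyphenphrase and "-" in term:
        # Hyphen acts as a phrase separator: the words must be adjacent.
        return ("phrase", term.split("-"))
    # The hyphen (if any) is searched for literally as part of one term.
    return ("word", term)


print(interpret_term("well-known"))                  # ('phrase', ['well', 'known'])
print(interpret_term("well-known", hyphenphrase=0))  # ('word', 'well-known')
```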

wordc
For language or wildcard query terms during linear (non-index) searches, this defines which characters in the document constitute a word. When a match is found for a language or wildcard term, the hit is expanded to include all surrounding word characters, as defined by this setting. The resulting expansion must then match the query term for the hit to be valid. (This prevents the query "pond" from inadvertently matching the text "correspondence", for example.) The value is specified as a REX character set. The default setting is [\alpha\'], which corresponds to all letters and the apostrophe. For example, to exclude the apostrophe and include digits, use:

  set wordc='[\alnum]'

Added in version 3.00.942260000. Note that this setting is for linear searches: what constitutes a word for Metamorph index searches is controlled by the index expressions (the addexp property). Also note that non-language, non-wildcard query terms (e.g. 123 with default settings) are not word-expanded.
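
The expand-then-compare step can be modelled with a short Python sketch. Python character classes stand in for REX sets ([A-Za-z'] approximates the default), the function name valid_hits is hypothetical, and the final comparison is a plain equality test, so suffix processing and wildcards are ignored.

```python
import re


def valid_hits(text, term, wordc=r"[A-Za-z']"):
    """Yield (start, end) offsets of matches that survive word expansion."""
    word_char = re.compile(wordc)
    for m in re.finditer(re.escape(term), text):
        start, end = m.start(), m.end()
        # Grow the hit over all adjacent word characters on both sides.
        while start > 0 and word_char.fullmatch(text[start - 1]):
            start -= 1
        while end < len(text) and word_char.fullmatch(text[end]):
            end += 1
        # Simplification: the expanded word must equal the term exactly.
        if text[start:end] == term:
            yield (start, end)


# "pond" inside "correspondence" expands to the whole word and is rejected.
print(list(valid_hits("correspondence about the pond", "pond")))  # [(25, 29)]

# Adding digits to the word characters changes what counts as one word.
print(list(valid_hits("pond2", "pond")))                          # [(0, 4)]
print(list(valid_hits("pond2", "pond", wordc=r"[A-Za-z0-9]")))    # []
```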

langc
Defines which characters make a query term a language term. A language term has prefix/suffix processing applied (if enabled), and also forces the use of wordc to qualify the hit (during linear searches). Normally langc should be set the same as wordc, with the addition of the phrase characters space and hyphen. The default is [\alpha\' \-]. Added in version 3.00.942260000.
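
One way to picture the classification is the Python sketch below. It assumes a term counts as a language term only when every one of its characters is in the langc set, which is this sketch's reading rather than a documented rule; the Python class [A-Za-z' \-] approximates the default, and the function name is_language_term is hypothetical.

```python
import re


def is_language_term(term, langc=r"[A-Za-z' \-]"):
    """True if every character of term belongs to the langc set."""
    # Assumption of this sketch: all characters must be langc characters.
    return re.fullmatch(f"(?:{langc})+", term) is not None


print(is_language_term("pond"))        # True
print(is_language_term("well-known"))  # True
print(is_language_term("123"))         # False
```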

withinmode
A space- or comma-separated unit and optional type for the "within-N" operator (e.g. w/5). The unit is one of:

  • char for within-N characters

  • word for within-N words

The optional type determines what distance the operator measures. It is one of the following:

  • radius (the default if no type is specified when set) indicates all sets must be within a radius N of an "anchor" set, i.e. there is a set in the match such that all other sets are within N units right of its right edge or N units left of its left edge.

  • span indicates all sets must be within an N-unit span

Added in version 4.04.1077930936 20040227. The optional type was added in version 5.01.1258712000 20091120; previously the only type was implicitly radius. In version 5 and earlier the default setting was char (i.e. char radius); in version 6 and later the default is word span.
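
The difference between radius and span can be made concrete with a small Python sketch. It assumes each set contributes one hit, given as a half-open (start, end) pair measured in the chosen unit, and that distances are measured edge to edge; the function names are hypothetical and the exact way Texis counts units may differ.

```python
def within_span(hits, n):
    """All hits fit inside one window that is n units wide."""
    left = min(start for start, _ in hits)
    right = max(end for _, end in hits)
    return right - left <= n


def within_radius(hits, n):
    """Some anchor hit has every other hit within n units of its edges."""
    for i, (a_start, a_end) in enumerate(hits):
        others = hits[:i] + hits[i + 1:]
        if all(start >= a_start - n and end <= a_end + n
               for start, end in others):
            return True
    return False


# Three single-word sets at word positions 0, 4 and 8, queried with w/5.
HITS = [(0, 1), (4, 5), (8, 9)]

print(within_radius(HITS, 5))  # True: the middle hit can act as the anchor
print(within_span(HITS, 5))    # False: the hits cover 9 words in total
```

In this model a radius match can stretch to roughly twice N plus the width of the anchor, which is why the same hits can satisfy radius but not span.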

phrasewordproc
Which words of a phrase to do suffix/wildcard processing on. The possible values are mono to treat the phrase as a monolithic word (i.e. only last word processed, but entire phrase counts towards minwordlen); none for no suffix/wildcard processing on phrases; or last to process just the last word. Note that a phrase is multi-word, i.e. a single word in double-quotes is not considered a phrase, and thus phrasewordproc does not apply. Added in version 4.03.1082000000 20040414. Mode none supported in version 5.01.1127760000 20050926.
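
The three modes can be summarized with a Python sketch that reports which word of a phrase receives suffix/wildcard processing and which text is measured against minwordlen. The function name and return shape are hypothetical, and measuring only the last word in last mode is an assumption inferred from the contrast with mono.

```python
def phrase_processing(phrase, mode):
    """Return (processed_word, text_measured_against_minwordlen)."""
    words = phrase.split()
    if len(words) < 2:
        # A single quoted word is not a phrase, so the setting does not apply.
        raise ValueError("phrasewordproc applies only to multi-word phrases")
    if mode == "none":
        return (None, None)          # no suffix/wildcard processing at all
    if mode == "last":
        return (words[-1], words[-1])  # assumption: only the last word is measured
    if mode == "mono":
        return (words[-1], phrase)   # whole phrase counts towards minwordlen
    raise ValueError(f"unknown phrasewordproc mode: {mode}")


print(phrase_processing("garbage trucks", "mono"))  # ('trucks', 'garbage trucks')
print(phrase_processing("garbage trucks", "last"))  # ('trucks', 'trucks')
print(phrase_processing("garbage trucks", "none"))  # (None, None)
```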

mdparmodifyterms
If nonzero, allows the Metamorph query parser to modify search terms by compression of whitespace and quoting/unquoting. This is for back-compatibility with earlier versions; enabling it will break the information from bit 4 of mminfo() (query offset/lengths of sets). Added in version 5.01.1220640000 20080905.


Copyright © Thunderstone Software     Last updated: Apr 15 2024