Text Search Mode

 

Syntax: select from options or enter custom mode

(Note: In earlier releases this setting was known as Character Match Mode.)

Sets the character-matching mode for text (keyword) searches. It controls case sensitivity, the handling of diacritics (accents) and ligatures, and half-/full-width differences. The selectable values are:

  • Loose - Ignore case, ignore diacritics (accents), expand ligatures, and ignore width differences. Storage Charset should be empty or UTF-8, though ISO-8859-1 may sometimes work. In this mode a lower-case "e" matches an upper-case "E" and vice versa (ignore case), "e" matches "é" (U+00E9), "oe" matches "œ" (U+0153), and full-width characters match their half-width forms (for ASCII and katakana).

  • Strict - Ignore case only. "e" will match "E", but not "é" (U+00E9). Storage Charset should be empty or UTF-8, though ISO-8859-1 may sometimes work.

  • Strict ISO-8859-1 - Ignore case only, and assume Storage Charset is ISO-8859-1. For back-compatibility. Available only for Text Search Mode.

  • Exact - Match characters exactly, respecting case, diacritics, width etc. Available only for Attribute Compare Mode.

  • Custom -> - Use the custom mode entered in the Custom Mode box. This is a comma-separated list composed from the following tokens; consult Thunderstone tech support for advice:

    • iso-8859-1 - Assume text is ISO-8859-1 encoded. Should only be used if Storage Charset is also ISO-8859-1. If this flag is not set, text is assumed to be UTF-8, though occasional ISO-8859-1 characters will usually be able to match their UTF-8 equivalents.

    • ignorediacritics - Ignore diacritic marks (accents, umlauts, etc.). E.g. "e" will match "é" (U+00E9) and vice-versa.

    • expandligatures - Expand ligature characters. E.g. "oe" will match "œ" (U+0153) and vice-versa. Note that with this flag off, certain ligatures may still be expanded if necessary for case-folding with ignorecase.

    • ignorewidth - Ignore half-/full-width differences, e.g. for ASCII and katakana characters.

    • ignorecase - Ignore case differences, e.g. "e" matches "E" and vice-versa; this is the default. The alternative is respectcase.

    • respectcase - Case-sensitive search, e.g. "e" does not match "E". The alternative is ignorecase.

    • unicodemulti - Use Unicode case-compare tables, with multi-character expansions where needed (e.g. for ligatures). The alternative is ctype or unicodemono.

    • unicodemono - Use Unicode case-compare tables, but do not expand characters. The alternative is ctype or unicodemulti.

    • ctype - Use the operating system's ctype.h case-compare tables. Only codepoints U+0001 through U+00FF (i.e. single-byte or ISO-8859-1 range) are supported, though the actual encoding may be ISO-8859-1 or UTF-8 depending on the iso-8859-1 flag. The alternative is unicodemulti or unicodemono.
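The effect of the matching flags above can be approximated in a few lines of Python. This is only an illustrative sketch built on Python's standard unicodedata module, not the appliance's implementation: the function names parse_mode and fold, the small LIGATURES table, and the whitespace/lower-casing tolerance in the parser are all assumptions made for the example. The encoding and table-selection tokens (iso-8859-1, unicodemulti, unicodemono, ctype) are accepted by the parser but not modeled.

```python
import unicodedata

# Tokens named in the list above.
KNOWN_TOKENS = {
    "iso-8859-1", "ignorediacritics", "expandligatures", "ignorewidth",
    "ignorecase", "respectcase", "unicodemulti", "unicodemono", "ctype",
}

# Hypothetical, deliberately tiny ligature table (oe ligature only).
LIGATURES = {"\u0153": "oe", "\u0152": "OE"}


def parse_mode(mode):
    """Split a comma-separated custom mode string into a set of tokens."""
    tokens = {t.strip().lower() for t in mode.split(",") if t.strip()}
    unknown = tokens - KNOWN_TOKENS
    if unknown:
        raise ValueError("unknown token(s): " + ", ".join(sorted(unknown)))
    if {"ignorecase", "respectcase"} <= tokens:
        raise ValueError("ignorecase and respectcase are alternatives")
    return tokens


def _fold_width(ch):
    """Map a full-width or half-width character to its ordinary form."""
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith(("<wide>", "<narrow>")):
        return "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
    return ch


def fold(text, flags):
    """Reduce text to a comparison key under the given flags (approximation)."""
    if "expandligatures" in flags:
        text = "".join(LIGATURES.get(ch, ch) for ch in text)
    if "ignorewidth" in flags:
        text = "".join(_fold_width(ch) for ch in text)
    if "ignorediacritics" in flags:
        # Decompose, then drop combining marks such as U+0301.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if "respectcase" not in flags:
        # ignorecase is the documented default.
        text = text.casefold()
    return text


# Roughly the behaviors described for Loose and Strict above.
loose = parse_mode("ignorecase,ignorediacritics,expandligatures,ignorewidth")
strict = parse_mode("ignorecase")

print(fold("\u00c9cole", loose) == fold("ecole", loose))
print(fold("\u00c9cole", strict) == fold("ecole", strict))
```

Two strings are treated as matching when their folded keys are equal. Under the loose flags "École" and "ecole" produce the same key, while under the strict flags only the case difference is removed, so the accented and unaccented forms stay distinct.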

Note: Changing the Text Search Mode setting will cause text search indexes to be rebuilt, which may take several minutes or more for large profiles.


Copyright © Thunderstone Software     Last updated: Dec 5 2019