Texis Version 6 has improved Unicode
(international/foreign/hi-bit/UTF-8) character support. Two new
settings were introduced: textsearchmode and
stringcomparemode. Both have the same set of possible values,
and offer more flexibility in how text searches and string comparisons
(respectively) are handled. Some features:
Text searches (e.g. with the LIKE operator) are
case-insensitive in version 6 for the entire Unicode 5.1
locale-independent character set, not just the given operating
system's locale (which may be inconsistent and does not support
characters beyond U+00FF).
All of these behaviors can be controlled with the (new in version 6)
textsearchmode and stringcomparemode apicp
settings (see the Vortex manual for details).
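The difference between locale-limited and Unicode-wide case-insensitive matching can be illustrated outside of Texis. The following is a minimal Python sketch, not Texis code: the helper names fold_latin1_only and fold_unicode are hypothetical, and Python's str.casefold() stands in for a locale-independent Unicode case fold. It only demonstrates why a fold that stops at U+00FF fails to match text in other scripts.

```python
def fold_latin1_only(s: str) -> str:
    """Hypothetical helper: lower-case only characters up to U+00FF,
    leaving everything beyond that range untouched."""
    return "".join(ch.lower() if ord(ch) <= 0xFF else ch for ch in s)


def fold_unicode(s: str) -> str:
    """Hypothetical helper: fold case across the whole Unicode range."""
    return s.casefold()


# Greek upper-case and lower-case spellings of the same word;
# every character lies beyond U+00FF.
upper_word = "\u0391\u0398\u0397\u039d\u0391"
lower_word = "\u03b1\u03b8\u03b7\u03bd\u03b1"

# A fold limited to U+00FF leaves both strings unchanged, so they differ.
latin1_match = fold_latin1_only(upper_word) == fold_latin1_only(lower_word)

# A Unicode-wide fold maps both spellings to the same string.
unicode_match = fold_unicode(upper_word) == fold_unicode(lower_word)

print(latin1_match, unicode_match)
```

Characters within the U+00FF range, such as accented Latin letters, match under either fold; only text beyond that range shows the difference.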
Caveat: A version 5 or earlier Texis should not access or modify
a regular (B-tree) or Metamorph index originally created by a version
6 or later Texis, unless stringcomparemode was set to ctype, respectcase, iso-8859-1 (regular indices) or
textsearchmode was set to ctype, ignorecase, iso-8859-1
(Metamorph indices) at creation. If hi-bit/UTF-8/Unicode characters
exist in the data, index corruption may result from Texis 5
modifications.
The stringcomparemode setting also affects the functions
<xtree>, <strstr>, <strstri>, <substr>,
<strcmp>, <strcmpi>, <strncmp>,
<strnicmp>, <strlen>, <strrev>, <upper>,
<lower>, <sort>, <uniq>, upper(),
lower(), initcap(), text2mm() and
length(). The length() and <strlen> functions
count characters of the charset (e.g. UTF-8 characters), not bytes.
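The distinction between counting characters and counting bytes can be shown with a short Python sketch. This is an analogy only, not Texis code: Python's len() on a str counts Unicode code points, and encoding the same string to UTF-8 gives the byte count that a byte-oriented length function would report.

```python
# "naive cafe" with two accented letters (U+00EF and U+00E9), written
# with escapes so the example does not depend on the file's encoding.
text = "na\u00efve caf\u00e9"

# Character count: one per Unicode code point.
char_count = len(text)

# Byte count: each accented letter occupies two bytes in UTF-8.
byte_count = len(text.encode("utf-8"))

print(char_count, byte_count)  # prints "10 12"
```

For pure ASCII text the two counts are equal, so the difference only appears once hi-bit or multi-byte characters are present in the data.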
Version 5 and earlier behavior can be restored as the default by setting,
in texis.ini, [Apicp] Text Search Mode
to ctype, ignorecase, iso-8859-1 and [Apicp] String Compare Mode
to ctype, respectcase, iso-8859-1.