Texis Version 6 has improved Unicode
(international/foreign/hi-bit/UTF-8) character support. Two new
settings were introduced: textsearchmode and stringcomparemode. Both
have the same set of possible values, and offer more flexibility in
how text searches and string comparisons (respectively) are handled.
Some features:
Text searches (e.g. the LIKE operator) are case-insensitive in
version 6 for the entire Unicode 5.1 locale-independent character set,
not just the given operating system's locale (which may be
inconsistent and does not support characters beyond U+00FF).
All of these behaviors can be controlled with the textsearchmode and
stringcomparemode apicp settings, both new in version 6 (see the
Vortex manual for details).
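To illustrate why locale-independent case handling matters, here is a minimal Python sketch. It is not Texis code and does not reflect how Texis implements its modes. The helper names are hypothetical, the "Latin-1 only" folding is a simplified stand-in for a single-byte locale, and Python's own Unicode tables are newer than Unicode 5.1.

```python
def fold_latin1_only(text: str) -> str:
    """Lowercase only code points up to U+00FF (hypothetical stand-in
    for case handling limited to a single-byte locale)."""
    return "".join(ch.lower() if ord(ch) <= 0xFF else ch for ch in text)


def equal_ignoring_case_latin1(a: str, b: str) -> bool:
    """Case-insensitive comparison that ignores case only up to U+00FF."""
    return fold_latin1_only(a) == fold_latin1_only(b)


def equal_ignoring_case_unicode(a: str, b: str) -> bool:
    """Case-insensitive comparison using Unicode case folding,
    which does not depend on the operating system's locale."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    pairs = [
        ("CAF\u00c9", "caf\u00e9"),  # Latin-1 range only
        ("\u0391\u0398\u0397\u039d\u0391", "\u03b1\u03b8\u03b7\u03bd\u03b1"),  # Greek
    ]
    for upper, lower in pairs:
        print(
            upper,
            lower,
            equal_ignoring_case_latin1(upper, lower),
            equal_ignoring_case_unicode(upper, lower),
        )
```

The Latin-1-limited comparison matches the first pair but not the Greek pair, while the Unicode comparison matches both.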
Caveat: A version 5 or earlier Texis should not access or modify
a regular (B-tree) or Metamorph index originally created by a version
6 or later Texis, unless stringcomparemode was set to
"ctype, respectcase, iso-8859-1" (regular indices) or textsearchmode
was set to "ctype, ignorecase, iso-8859-1" (Metamorph indices) at
creation. If hi-bit/UTF-8/Unicode characters exist in the data, index
corruption may result when a version 5 or earlier Texis modifies the
index.
The stringcomparemode setting also affects the functions <xtree>,
<strstr>, <strstri>, <substr>, <strcmp>, <strcmpi>, <strncmp>,
<strnicmp>, <strlen>, <strrev>, <upper>, <lower>, <sort>, <uniq>,
upper(), lower(), initcap(), text2mm() and length(). The
length()/<strlen> functions count charset characters (e.g. UTF-8
characters), not bytes.
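The difference between counting characters and counting bytes can be shown with a short Python sketch. This only illustrates the concept and is not Texis or Vortex code. The function name char_length is hypothetical.

```python
def char_length(data: bytes, charset: str = "utf-8") -> int:
    """Count characters by decoding the bytes in the given charset first."""
    return len(data.decode(charset))


if __name__ == "__main__":
    word = "na\u00efve"  # five characters; U+00EF takes two bytes in UTF-8
    encoded = word.encode("utf-8")
    print(len(encoded))          # byte count: 6
    print(char_length(encoded))  # character count: 5
```

A byte-counting length function would report 6 for this UTF-8 string, while a character-counting one reports 5.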
To restore version 5 and earlier behavior as the default, set the
texis.ini settings [Apicp] Text Search Mode to
"ctype, ignorecase, iso-8859-1" and [Apicp] String Compare Mode to
"ctype, respectcase, iso-8859-1".