Language Characters

Syntax: list or range of characters, as inside REX []

The Language Characters setting controls which characters constitute a language query. Query terms composed entirely of these characters are considered language terms and have Word Forms processing applied. Additionally, during linear/post-process searches (e.g. hit highlighting on the Match Info page), each potential match of a language or wildcard query term is expanded to include all adjacent characters that are part of this setting, and the match is rejected if the expanded text does not match the query term. This prevents the query term "pond" from matching the text term "correspondence", for example.
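The expand-then-reject step can be sketched in Python as follows. This is only an illustration of the behavior described above, not the search engine's own code: the names LANG_CHARS, expand_match and accepted_matches are hypothetical, the character set is an ASCII-letter approximation of the default setting, and Word Forms and wildcard handling are left out.

```python
import re

# Hypothetical stand-in for the default Language Characters: ASCII letters,
# the apostrophe, and high-bit bytes 0x80-0xFF (assumption for illustration).
LANG_CHARS = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'"
    + bytes(range(0x80, 0x100))
)

def expand_match(text: bytes, start: int, end: int) -> bytes:
    """Widen text[start:end] to take in all adjacent language characters."""
    while start > 0 and text[start - 1] in LANG_CHARS:
        start -= 1
    while end < len(text) and text[end] in LANG_CHARS:
        end += 1
    return text[start:end]

def accepted_matches(text: bytes, term: bytes) -> list:
    """Return the expanded matches that still equal the query term."""
    hits = []
    for m in re.finditer(re.escape(term), text, re.IGNORECASE):
        expanded = expand_match(text, m.start(), m.end())
        # Reject the hit if expansion pulled in extra characters.
        if expanded.lower() == term.lower():
            hits.append(expanded)
    return hits
```

With the text "A pond near the correspondence office" and the term "pond", the standalone word is kept, while the "pond" inside "correspondence" expands to the whole word and is rejected.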

The syntax is a list of characters (with no separators) and/or ranges of characters, the same as a REX character class without the brackets. The default is \alpha\'\x80-\xFF, i.e. alphabetic characters, high-bit bytes (for UTF-8) and the apostrophe (for contractions). For best results, every character that could match part of a Word Definition expression should usually also be listed in Language Characters.
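To show how the default value classifies query terms, here is a rough Python approximation. It assumes that \alpha can be treated as the ASCII letters A-Z and a-z and that terms are compared as UTF-8 bytes; the real REX class may behave differently (for example, depending on locale), and the names LANGUAGE_TERM and is_language_term are hypothetical.

```python
import re

# Assumed Python equivalent of the default \alpha\'\x80-\xFF class,
# applied to UTF-8 encoded bytes (an approximation, not REX itself).
LANGUAGE_TERM = re.compile(rb"[A-Za-z'\x80-\xFF]+")

def is_language_term(term: str) -> bool:
    """True if every byte of the term falls within the assumed character set."""
    return LANGUAGE_TERM.fullmatch(term.encode("utf-8")) is not None
```

Under these assumptions, "don't" and "café" count as language terms, while "x-ray" (hyphen) and "mp3" (digit) do not, so the latter two would not get Word Forms processing unless those characters were added to the setting.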

See the REX documentation for details on REX search syntax.