Can Texis index multi-byte languages (Chinese, Japanese, etc.)?

Yes, Texis has been used with these languages. A simple configuration setting tells Texis to index multi-byte patterns (if other than UTF-8). Our customers report satisfactory results with this approach. There are, however, options that can improve accuracy further. The main difficulty is word segmentation: a given Chinese character may be a word on its own in one place and part of a longer word in another. Chinese readers tell the difference from context, but nothing in the text itself marks which is which. If you need to index one of these languages, please contact us to discuss these and related issues.
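
The segmentation ambiguity can be illustrated with a minimal Python sketch. This is not Texis code or Texis configuration; it uses overlapping character n-grams, a generic technique for indexing text that has no word delimiters. The function names (char_ngrams, build_index) and the two sample strings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def char_ngrams(text, n):
    """Return the overlapping n-character substrings of text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_index(docs, n):
    """Map each character n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


# Hypothetical sample documents (assumptions, not from any real corpus).
docs = {
    "a": "三个人",    # "three people": 人 stands alone as a word
    "b": "人民公园",  # "People's Park": 人 is part of the word 人民
}

unigrams = build_index(docs, 1)
bigrams = build_index(docs, 2)

# A single-character lookup cannot tell the two uses of 人 apart.
print(sorted(unigrams["人"]))   # ['a', 'b']

# A two-character lookup finds only the document containing the word 人民.
print(sorted(bigrams["人民"]))  # ['b']
```

In this sketch, single-character indexing matches both documents for 人, while two-character indexing separates them. Note that the bigrams for the first document include 个人, which happens to be a word in its own right even though it is not one here, so n-grams reduce the ambiguity without removing it.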


Copyright © Thunderstone Software     Last updated: Apr 15 2024