identifylanguage

Attempts to identify the predominant language of a given string. Because it returns a probability along with the identified language, this function can also serve as a test of whether the string is really natural-language text, or perhaps binary or encoded data instead. Syntax:

    identifylanguage(text[, language])

The return value is a two-element strlst: a probability and a language code. The probability, a value from 0.000 to 1.000, estimates how likely it is that the text argument is written in the language named by the returned language code. The language code is a two-letter ISO-639-1 code.
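As a client-side illustration of the return value's shape, the Python sketch below splits a two-element result into a numeric probability and a language code and checks both against the ranges stated above. It assumes the strlst has already been fetched into the client as a sequence of two strings; the function name parse_identifylanguage and the sample value ["0.962", "en"] are hypothetical and not part of Texis.

```python
from typing import Sequence, Tuple


def parse_identifylanguage(result: Sequence[str]) -> Tuple[float, str]:
    """Split a two-element identifylanguage() result into (probability, code).

    Hypothetical helper: assumes `result` is the strlst already fetched
    into the client as two strings, e.g. ["0.962", "en"].
    """
    if len(result) != 2:
        raise ValueError(f"expected 2 elements, got {len(result)}")
    prob_text, code = result
    # strlst elements are strings, so convert the probability explicitly.
    probability = float(prob_text)
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    # ISO-639-1 codes are exactly two letters.
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"not a two-letter ISO-639-1 code: {code!r}")
    return probability, code


print(parse_identifylanguage(["0.962", "en"]))  # (0.962, 'en')
```

Validating the shape up front keeps a malformed or unexpected result from silently flowing into later numeric comparisons.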

If an ISO-639-1 code is given for the optional language argument, the probability for that particular language is returned, instead of the probability for the most likely of the built-in languages (currently de, es, fr, ja, pl, tr, da, en, eu, it, ko, ru).

Note that since the return value is a strlst, the probability is returned as a strlst element (a string), not as a double value, and should therefore be cast to double before numeric comparisons. In Vortex with arrayconvert on (the default), the return value is automatically split into a two-element Vortex varchar array.
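The need to cast comes from the fact that comparing strings compares characters, not magnitudes. The Python sketch below shows that general pitfall and a threshold check that casts first; it illustrates the idea only and does not reproduce Texis comparison semantics. The cutoff MIN_PROBABILITY = 0.5 and the name looks_like_natural_language are hypothetical; this reference does not recommend any particular threshold.

```python
from typing import Sequence

# Hypothetical cutoff; choose a value suited to your own data.
MIN_PROBABILITY = 0.5


def looks_like_natural_language(result: Sequence[str],
                                min_probability: float = MIN_PROBABILITY) -> bool:
    """Return True if the probability element meets the cutoff.

    `result` is assumed to be the two-element strlst as strings,
    e.g. ["0.812", "de"]. The probability is cast to float before
    comparing, so the comparison is numeric rather than lexicographic.
    """
    probability = float(result[0])
    return probability >= min_probability


# Lexicographic string order can disagree with numeric order:
print("0.250" > ".5")                                 # True, although 0.25 < 0.5
print(float("0.250") > float(".5"))                   # False
print(looks_like_natural_language(["0.812", "de"]))   # True
print(looks_like_natural_language(["0.031", "ko"]))   # False
```

A low probability for every language suggests, but does not prove, that the input is binary or encoded data rather than text.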

The identifylanguage() function is experimental, and its behavior, syntax, name and/or existence are subject to change without notice. Added in version 7.01.1381362000 20131009.


Copyright © Thunderstone Software     Last updated: Apr 26 2017
Copyright © 2017 Thunderstone Software LLC. All rights reserved.