Tries to identify the predominant language of a given string. By returning a probability in addition to the identified language, this function can also serve as a test of whether the given string is really natural-language text, or perhaps binary/encoded data instead. Syntax:
identifylanguage(text[, language[, samplesize]])
The return value is a two-element strlst: a probability and a language code. The probability is a value from 0.000 to 1.000 that the text argument is composed in the language named by the returned language code. The language code is a two-letter ISO-639-1 code.
If an ISO-639-1 code is given for the optional language argument, the probability for that particular language is returned, instead of for the highest-probability language of the known/built-in languages (currently de, es, fr, ja, pl, tr, da, en, eu, it, ko, ru).
The optional third argument samplesize is an integer giving the size, in bytes, of the initial portion of the text to sample when determining the language; it defaults to 16384. The samplesize parameter was added in version 7.01.1382113000 20131018.
Note that since a strlst value is returned, the probability is returned as a strlst element, not a double value, and thus should be cast to double during comparisons. In Vortex with arrayconvert on (the default), the return value will be automatically split into a two-element Vortex varchar array.
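The need for the cast can be illustrated outside Texis. The following is a minimal Python sketch, not Texis or Vortex code: it assumes the two-element result has already been fetched into a list of strings, and the helper names, sample values, and 0.5 threshold are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def parse_language_result(result: List[str]) -> Tuple[float, str]:
    """Split a [probability, language code] pair and cast the probability.

    `result` stands in for the two string elements returned by
    identifylanguage(); this helper is hypothetical, not a Texis API.
    """
    if len(result) != 2:
        raise ValueError("expected [probability, language code]")
    probability_text, language_code = result
    # The probability arrives as text, so convert it before comparing.
    return float(probability_text), language_code


def looks_like_language(result: List[str], threshold: float = 0.5) -> bool:
    """Return True if the numeric probability meets the (arbitrary) threshold."""
    probability, _ = parse_language_result(result)
    return probability >= threshold


# Made-up sample values, for illustration only.
print(looks_like_language(["0.950", "en"]))
print(looks_like_language(["0.120", "fr"]))
```

Comparing the probability only after converting it to a number keeps the test independent of how the string happens to be formatted.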
The identifylanguage() function is experimental, and its behavior, syntax, name and/or existence are subject to change without notice. Added in version 7.01.1381362000 20131009.