There are two different items to discuss related to encodings: reading/writing XML, and working internally.
The XML API can read and write many character encodings, leveraging
the power of the GNU libiconv library. The reading and writing
encodings need not be the same. For example, a
document can be read in one encoding and written as
UTF-8, and vice-versa.
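
As a rough illustration of reading in one encoding and writing in another, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree, not the Vortex XML API, so every call below is an analogy rather than the API described here. The sample document and the choice of ISO-8859-1 as the source encoding are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample document stored as ISO-8859-1 bytes (encoding chosen only for illustration).
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<greeting>caf\u00e9</greeting>'
).encode("iso-8859-1")

# Read: the parser honors the encoding named in the XML declaration.
root = ET.fromstring(latin1_doc)

# Write: serialize the same tree as UTF-8 bytes.
utf8_doc = ET.tostring(root, encoding="utf-8")

# And vice versa: parse the UTF-8 bytes, then write ISO-8859-1 again.
latin1_again = ET.tostring(ET.fromstring(utf8_doc), encoding="iso-8859-1")
```

The tree itself is never "in" either encoding; the encoding only matters at the moment bytes are read or written.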
When working within the library, everything is UTF-8 regardless of what character encoding it was read from or will be written to. This deserves stressing:
This means that a document may exist on disk in a non-UTF-8 encoding, but
when the XML API parses it and you call xmlTreeGetContent() to
get the text from an element, you'll get
UTF-8 data. Whatever encoding the
file exists on disk in,
xmlTreeGetContent() will still give UTF-8 data.
Similarly, regardless of whether a document will be output in
UTF-32 or any other encoding, when adding a new element with
xmlTreeNewElement(), the name and contents must be UTF-8.
This may sound restricting, but it's actually liberating in that when
working in code, you never have to worry about what encoding the file
was read from, or what encoding it will be written out as. Always use UTF-8.
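
The idea of one internal form can be sketched as follows, again in Python with xml.etree.ElementTree as a stand-in and not the Vortex XML API. One difference to keep in mind: Python holds text as Unicode strings rather than UTF-8 bytes, so the sketch only mirrors the principle that the in-memory text does not depend on the file's encoding. The two source encodings and the element names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

text = "na\u00efve"

# The same document as it might exist on disk in two different encodings.
as_latin1 = (
    f'<?xml version="1.0" encoding="ISO-8859-1"?><doc><word>{text}</word></doc>'
).encode("iso-8859-1")
as_utf16 = (
    f'<?xml version="1.0" encoding="UTF-16"?><doc><word>{text}</word></doc>'
).encode("utf-16")

# Once parsed, the element text is identical whichever encoding it came from.
word_a = ET.fromstring(as_latin1).find("word").text
word_b = ET.fromstring(as_utf16).find("word").text

# Adding an element: supply text in the internal form, whatever the
# eventual output encoding will be.
root = ET.fromstring(as_utf16)
ET.SubElement(root, "word").text = "r\u00e9sum\u00e9"
out = ET.tostring(root, encoding="utf-16")
```

The calling code never branches on the source or target encoding; only the parse and serialize steps know about it.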
The default encoding when working in Vortex is already UTF-8
(unless manually changed with <urlcp charsettxt>). If you have
data that you need to convert to UTF-8, you can use the
<urlutil charsetconv> Vortex function.
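
For readers who want to see what such a conversion amounts to, here is a minimal Python sketch. It is not the <urlutil charsetconv> function, whose arguments are not shown in this section; the helper name to_utf8 and the sample bytes are hypothetical and exist only for illustration.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: decode bytes from source_encoding, re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")

# "Se\u00f1or" as ISO-8859-1 bytes (sample data, assumed for the example).
legacy = b"Se\xf1or"
converted = to_utf8(legacy, "iso-8859-1")
```

The source encoding has to be known in advance; bytes that are not valid in that encoding raise UnicodeDecodeError here instead of being silently mangled.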