9. Support of UTF-8.
Many systems support [UTF-8] as a character format for transferred
data. The benefit is that an application does not have to be fixed to
a specific character set, because the set of 'all' characters is used,
represented by the 'Universal Character Set' UCS-2 [UCS], a
double-byte coding for characters.
SDXF does not really deal with UTF-8 by itself; there are many
possibilities to interpret a UTF-8 sequence. The application may:
- reconstruct the UCS-2 sequence,
- accept only the pure ASCII characters and map each non-ASCII
character to a special 'non-printable' character,
- target pure ASCII and replace non-ASCII characters in a sensible
manner (French accented vowels replaced by vowels without accents,
etc.),
- target a specific ANSI character set, mapping the non-ASCII
characters where possible and replacing the others with a
'non-printable' character,
- etc.
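As an illustration of the second possibility in the list above, the
following is a minimal sketch in C. The function name utf8_to_ascii
and its parameters are hypothetical and not part of SDXF; the sketch
assumes that one substitute character per non-ASCII character is
wanted and does not validate the UTF-8 input beyond skipping
continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of SDXF: copy a UTF-8 string into an
   ASCII buffer, writing 'subst' once for every non-ASCII character.
   Returns the number of bytes written to dst (no terminator added). */
size_t utf8_to_ascii(const unsigned char *src, size_t srclen,
                     char *dst, size_t dstlen, char subst)
{
    size_t i = 0, out = 0;

    while (i < srclen && out < dstlen) {
        unsigned char c = src[i++];

        if (c < 0x80) {
            dst[out++] = (char)c;   /* pure ASCII: copy unchanged */
        } else {
            dst[out++] = subst;     /* one substitute per character */
            /* skip the continuation bytes (10xxxxxx) of this sequence */
            while (i < srclen && (src[i] & 0xC0) == 0x80)
                i++;
        }
    }
    return out;
}
```

Called on the six bytes "caf" 0xC3 0xA9 "!" with '?' as substitute,
this sketch writes the five bytes "caf?!". A replacement 'in a
sensible manner' (the third possibility) would need a lookup table
instead of a single substitute character.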
However, SDXF offers an interface for the 'extract' and 'create'
functions:
A function pointer may be specified in the options table to provide
this possibility (see 8.5). The default for this pointer is NULL: no
further conversion is done by SDXF, and the data is copied 'as is',
treated as a bit string as for data type 'binary'.
If this function is specified, it is used by the 'create' function
with the 'toUTF8' mode, and by the 'extract' function with the
'fromUTF8' mode. SDXF invokes the function transparently.
If the function returns zero (no conversion), SDXF copies the data
without conversion.
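The rules above (NULL pointer, the two modes, and the zero return
value) can be sketched in C as follows. This is only an illustration
under stated assumptions: the type CvtUTF8, the mode values, the
parameter list, and the helper names latin1_cvt and transfer are all
hypothetical; the actual definition of the conversion function is the
one given in 8.5. The example callback converts between ISO 8859-1
and UTF-8 and declines (returns zero) for anything it cannot handle.

```c
#include <stddef.h>
#include <string.h>

enum { toUTF8 = 1, fromUTF8 = 2 };      /* assumed mode values */

/* Assumed callback shape: returns nonzero if 'target' was filled,
   zero for 'no conversion'. *targetlen holds the capacity on entry
   and the number of bytes written on a successful return. */
typedef long CvtUTF8(int mode, unsigned char *target, long *targetlen,
                     const unsigned char *source, long sourcelen);

/* Hypothetical example callback: ISO 8859-1 <-> UTF-8. */
long latin1_cvt(int mode, unsigned char *target, long *targetlen,
                const unsigned char *source, long sourcelen)
{
    long i, out = 0, cap = *targetlen;

    if (mode != toUTF8 && mode != fromUTF8)
        return 0;
    for (i = 0; i < sourcelen; i++) {
        unsigned char c = source[i];

        if (c < 0x80) {                         /* ASCII in both modes */
            if (out + 1 > cap) return 0;
            target[out++] = c;
        } else if (mode == toUTF8) {            /* 0x80..0xFF: 2 bytes */
            if (out + 2 > cap) return 0;
            target[out++] = (unsigned char)(0xC0 | (c >> 6));
            target[out++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < sourcelen &&
                   (source[i + 1] & 0xC0) == 0x80) {
            if (out + 1 > cap) return 0;        /* U+0080..U+00FF */
            target[out++] = (unsigned char)(((c & 0x03) << 6) |
                                            (source[i + 1] & 0x3F));
            i++;
        } else {
            return 0;       /* outside ISO 8859-1 or malformed: decline */
        }
    }
    *targetlen = out;
    return 1;
}

/* How 'create' or 'extract' might apply the rules: a NULL pointer or
   a zero return means the bytes are copied unchanged.
   Returns the number of bytes in target, or -1 if they do not fit. */
long transfer(CvtUTF8 *cvt, int mode, unsigned char *target, long cap,
              const unsigned char *source, long sourcelen)
{
    long len = cap;

    if (cvt != NULL && cvt(mode, target, &len, source, sourcelen) != 0)
        return len;                         /* converted */
    if (sourcelen > cap)
        return -1;
    memcpy(target, source, (size_t)sourcelen);
    return sourcelen;                       /* copied 'as is' */
}
```

In this sketch the fallback copy happens in one place, so a callback
that cannot handle its input only has to return zero and does not
need to copy anything itself.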