ZVON > RFC Repository > RFC 3072

9. Support of UTF-8

   Many systems support [UTF-8] as a character format for transferred
   data.  The benefit is that no specific character set has to be fixed
   for an application, because the set of 'all' characters is used: the
   'Universal Character Set' [UCS], here in its UCS-2 form, a double-byte
   coding of characters.
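   To illustrate the relation between the two codings: UCS-2 covers
   U+0000..U+FFFF, and UTF-8 represents each such character in one to
   three bytes.  The following is a minimal C sketch; the function name
   ucs2_to_utf8 is hypothetical and not part of SDXF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS-2 character (U+0000..U+FFFF)
 * as UTF-8.  Writes 1 to 3 bytes to out and returns the count.
 * Surrogate values (U+D800..U+DFFF) are not rejected here. */
size_t ucs2_to_utf8(uint16_t c, unsigned char out[3])
{
    if (c < 0x80) {                          /* 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

   For example, 'A' (U+0041) stays one byte, 'e' with acute accent
   (U+00E9) becomes C3 A9, and the euro sign (U+20AC) becomes E2 82 AC.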

   SDXF itself does not deal with UTF-8, because there are many ways to
   interpret a UTF-8 sequence.  The application may:

   -  reconstruct the UCS-2 sequence;
   -  accept only pure ASCII characters and map each non-ASCII character
      to a special 'non-printable' character;
   -  target pure ASCII, replacing non-ASCII characters in a sensible
      manner (French accented vowels replaced by vowels without accents,
      etc.);
   -  target a specific ANSI character set, mapping non-ASCII characters
      where possible and replacing the others with a 'non-printable'
      character;
   -  etc.
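   The third strategy (pure ASCII target with sensible replacements) can
   be sketched in C as follows.  The function names, the '?' used as the
   'non-printable' replacement, and the deliberately partial table of
   French letters are all assumptions for illustration; only the 1- to
   3-byte UTF-8 forms (the UCS-2 range) are decoded.

```c
#include <stddef.h>

/* Hypothetical mapping: ASCII passes through, some French accented
 * letters lose their accent, everything else becomes '?'. */
static char fold_to_ascii(unsigned long cp)
{
    if (cp < 0x80)
        return (char)cp;
    switch (cp) {
    case 0xE0: case 0xE2: case 0xE4:            return 'a';
    case 0xE7:                                  return 'c';
    case 0xE8: case 0xE9: case 0xEA: case 0xEB: return 'e';
    case 0xEE: case 0xEF:                       return 'i';
    case 0xF4: case 0xF6:                       return 'o';
    case 0xF9: case 0xFB: case 0xFC:            return 'u';
    default:                                    return '?';
    }
}

/* Decode UTF-8 from in[0..len) and write one ASCII char per character
 * to out (which must hold len + 1 bytes).  Malformed, truncated, or
 * 4-byte sequences yield '?'.  Returns the number of chars written. */
size_t utf8_fold_ascii(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = in[i++];
        unsigned long cp;
        size_t extra;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else {                       /* outside UCS-2 or stray byte */
            while (i < len && (in[i] & 0xC0) == 0x80)
                i++;                 /* skip its continuation bytes */
            out[n++] = '?';
            continue;
        }
        for (; extra > 0 && i < len && (in[i] & 0xC0) == 0x80; extra--)
            cp = (cp << 6) | (in[i++] & 0x3F);
        out[n++] = (extra == 0) ? fold_to_ascii(cp) : '?';
    }
    out[n] = '\0';
    return n;
}
```

   With this sketch, the UTF-8 form of "Creme brulee" written with its
   French accents comes out as plain "Creme brulee", while a character
   outside the table, such as the euro sign, becomes '?'.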

   However, SDXF offers an interface for this purpose in its 'extract'
   and 'create' functions:

   A function pointer for this conversion may be specified in the
   options table (see 8.5).  The default for this pointer is NULL: SDXF
   then performs no further conversion and copies the data 'as is',
   treating them as a bit string, as for data type 'binary'.

   If this function is specified, it is called by the 'create' function
   in 'toUTF8' mode and by the 'extract' function in 'fromUTF8' mode.
   SDXF invokes it transparently.
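   The following C fragment sketches how 'create' and 'extract' might
   route UTF-8 data through such a converter, including the rule that a
   NULL pointer or a zero return leads to an unchanged copy.  Every name
   here (utf8_mode, utf8_convert_fn, sdx_options_sketch, transfer_utf8,
   create_utf8_data, extract_utf8_data) is hypothetical and does not
   reproduce the options table of 8.5; target buffers are assumed to be
   large enough.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the RFC's 'toUTF8' and 'fromUTF8' modes. */
enum utf8_mode { TO_UTF8, FROM_UTF8 };

/* Hypothetical converter signature: writes to target and returns the
 * number of bytes written, or 0 for "no conversion". */
typedef size_t (*utf8_convert_fn)(enum utf8_mode mode, char *target,
                                  const char *source, size_t length);

/* Hypothetical excerpt of an options table (not the layout of 8.5). */
struct sdx_options_sketch {
    utf8_convert_fn convert_utf8;   /* default NULL: copy 'as is' */
};

/* Use the converter if one is set; if it is NULL or returns 0, copy
 * the bytes unchanged, as for data type 'binary'. */
static size_t transfer_utf8(const struct sdx_options_sketch *opt,
                            enum utf8_mode mode, char *target,
                            const char *source, size_t length)
{
    if (opt->convert_utf8 != NULL) {
        size_t n = opt->convert_utf8(mode, target, source, length);
        if (n != 0)
            return n;
    }
    memcpy(target, source, length);
    return length;
}

/* 'create' direction: application data -> UTF-8 chunk content. */
size_t create_utf8_data(const struct sdx_options_sketch *opt,
                        char *chunk, const char *data, size_t length)
{
    return transfer_utf8(opt, TO_UTF8, chunk, data, length);
}

/* 'extract' direction: UTF-8 chunk content -> application data. */
size_t extract_utf8_data(const struct sdx_options_sketch *opt,
                         char *data, const char *chunk, size_t length)
{
    return transfer_utf8(opt, FROM_UTF8, data, chunk, length);
}
```

   Keeping the NULL check and the zero-return fallback in one helper
   gives both directions the same 'as is' behavior, so the caller of
   'create' or 'extract' never sees whether a conversion took place.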

   If the function returns zero (no conversion), SDXF copies the data
   unchanged.
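   As an illustration of a converter that an application might supply,
   the sketch below handles ISO-8859-1 (Latin-1) data, one instance of
   the 'specific ANSI character set' case listed above.  It returns zero
   when the data are pure ASCII, because UTF-8 and Latin-1 coincide
   there, so the unchanged copy is already correct.  The mode names, the
   function name latin1_converter, and its signature are hypothetical,
   not taken from 8.5.

```c
#include <stddef.h>

/* Hypothetical mode names for the RFC's 'toUTF8' and 'fromUTF8'. */
enum utf8_mode { TO_UTF8, FROM_UTF8 };

/* Hypothetical Latin-1 converter.  Returns the number of bytes written
 * to target, or 0 for "no conversion".  target must hold 2 * length
 * bytes (TO_UTF8) or length bytes (FROM_UTF8). */
size_t latin1_converter(enum utf8_mode mode, char *target,
                        const char *source, size_t length)
{
    const unsigned char *s = (const unsigned char *)source;
    unsigned char *t = (unsigned char *)target;
    size_t i, n = 0;
    int ascii = 1;

    for (i = 0; i < length; i++)
        if (s[i] >= 0x80) { ascii = 0; break; }
    if (ascii)
        return 0;              /* pure ASCII: identical in both codings */

    if (mode == TO_UTF8) {
        for (i = 0; i < length; i++) {
            if (s[i] < 0x80) {
                t[n++] = s[i];
            } else {           /* U+0080..U+00FF take two UTF-8 bytes */
                t[n++] = (unsigned char)(0xC0 | (s[i] >> 6));
                t[n++] = (unsigned char)(0x80 | (s[i] & 0x3F));
            }
        }
        return n;
    }

    /* FROM_UTF8: keep code points up to U+00FF, replace others by '?' */
    i = 0;
    while (i < length) {
        unsigned char b = s[i++];
        if (b < 0x80) {
            t[n++] = b;
        } else if ((b & 0xE0) == 0xC0 && i < length
                   && (s[i] & 0xC0) == 0x80
                   && (((b & 0x1F) << 6) | (s[i] & 0x3F)) <= 0xFF) {
            t[n++] = (unsigned char)(((b & 0x1F) << 6) | (s[i] & 0x3F));
            i++;
        } else {
            t[n++] = '?';      /* not representable in Latin-1 */
            while (i < length && (s[i] & 0xC0) == 0x80)
                i++;           /* skip the rest of this character */
        }
    }
    return n;
}
```

   Returning zero for ASCII-only data lets the converter skip work that
   the default copy already does correctly.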
