On Sun, 25-03-2007 at 22:32 -0400, MichaelL_at_frogware.com wrote:
> "Important: Supplementary code points must be supported for full Unicode
> support, regardless of the encoding form.
That's the theory. But UTF-16 is strictly less convenient than UTF-32,
which means that a lot of code working in terms of UTF-16 doesn't bother
to support supplementary code points.
The C API for character predicates (iswalpha etc.) makes sense when
wchar_t is UTF-32; it can't support supplementary code points when
wchar_t is UTF-16.
The only advantages of UTF-16 over UTF-32 are memory usage and data
exchange with systems that already use UTF-16. *Nothing* in UTF-16 is
more convenient or simpler than UTF-32; it is purely an additional
layer of complexity.
> But I'll tell you what. Find a document, written by someone with
> substantial Unicode experience, that recommends UTF-32 as the best overall
> in-memory encoding.
C/C++ on Linux uses UTF-32 for wchar_t. Gtk+ uses UTF-8 internally.
Python can be compiled to use UTF-16 or UTF-32. Perl uses UTF-8. CLISP
uses code points in the API (the internal representation is a mixture
of UTF-32, UCS-2 and ISO-8859-1). iconv uses UTF-32 as its internal
encoding, which means that recoding to/from UTF-32 is faster than
recoding to/from UTF-16 or UTF-8.
I've never seen an external file encoded in UTF-16 or UTF-32 on Linux.
Practically the only Unicode encoding used for data exchange is UTF-8,
and UTF-32 is the primary temporary in-memory representation.
--
__("< Marcin Kowalczyk
\__/ qrczak_at_knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Received on Mon Mar 26 2007 - 07:07:01 UTC