On Mon, 26-03-2007 at 11:29 -0400, MichaelL_at_frogware.com wrote:
> That said, programs that mishandle surrogate
> pairs probably also have problems with combining sequences, so using
> UTF-32 is unlikely to solve the more general problem of poor handling of
> multi-code-unit characters."
So UTF-32 doesn't solve the whole problem, it only solves half of the
problem. I agree.
Handling combining sequences is less work than handling combining
sequences and surrogates.
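
To make the difference concrete, here is a small Python sketch (the helper name utf16_units and the sample string are mine, chosen only for illustration). It counts what a program has to treat as multi-unit in each representation: with code points only the combining sequence needs care, while with UTF-16 code units the surrogate pair needs care as well.

```python
import unicodedata

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

# 'e' + COMBINING ACUTE ACCENT, followed by MUSICAL SYMBOL G CLEF (U+1D11E)
text = "e\u0301\U0001D11E"

code_points = [ord(c) for c in text]
units = utf16_units(text)

# Code points that combine with a preceding base character
combining = [cp for cp in code_points if unicodedata.combining(chr(cp))]
# Code units that are halves of a surrogate pair
surrogates = [u for u in units if 0xD800 <= u <= 0xDFFF]

print(len(code_points), len(units))    # 3 code points, 4 code units
print(len(combining), len(surrogates)) # 1 combining mark, 2 surrogate halves
```

Both views still have to deal with the combining mark; only the UTF-16 view additionally has to keep the two surrogate halves together.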
> "The only advantage of fixnums over bignums is [performance and] memory
> usage, and data exchange with those who already use fixnums. *Nothing* in
> fixnums is more convenient or simpler than bignums, it's an additional
> complexity layer."
The analogy doesn't hold, as it's *not* an additional complexity layer:
handling numbers is not more complex when the representation is split
into fixnums and bignums; this is purely an internal implementation
detail, so as long as it improves efficiency, it's a pure win. Whereas
the choice between UTF-16 and UTF-32 is a tradeoff between space and
programmer convenience: the space saving doesn't come for free.
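
As an illustration of the "internal implementation detail" point, a sketch in Python, whose integers are used here only as an example of a language where any such split is hidden from the program. The same function and the same operators work on both sides of the machine-word boundary, and the caller never has to ask which representation is in use.

```python
def factorial(n):
    """Plain integer arithmetic; no special case for large results."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

small = factorial(10)  # fits comfortably in a machine word
large = factorial(30)  # exceeds 64 bits

print(small, large.bit_length())
print(type(small) is type(large))  # same type, same operators
```

There is no analogous trick for UTF-16: the program itself sees the code units, so the surrogate pairs are visible in the interface rather than hidden behind it.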
Working in terms of code points should not be less convenient than
working in terms of UTF-16 code units. Code points are virtually always
closer to the program's domain than UTF-16 code units (unless the
program is about recoding UTF-16).
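
A minimal sketch of what this means in practice, again in Python and with a sample string of my own choosing: taking "the first two characters" by code point gives a usable string, while taking the first two UTF-16 code units cuts a surrogate pair in half. (Neither approach protects combining sequences; that part of the problem remains either way.)

```python
text = "a\U0001D11Eb"  # 'a', MUSICAL SYMBOL G CLEF (U+1D11E), 'b'

# Truncating by code point: Python str is indexed by code points.
by_code_point = text[:2]

# Truncating by UTF-16 code unit: two units are four bytes.
utf16 = text.encode("utf-16-le")
first_two_units = utf16[:4]

try:
    first_two_units.decode("utf-16-le")
    split_pair = False
except UnicodeDecodeError:
    # The slice ends with a lone high surrogate, which is not valid UTF-16.
    split_pair = True

print(by_code_point == "a\U0001D11E", split_pair)
```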
> can you find a single document written by
> a Unicode expert that recommends UTF-32?
I don't care what experts say if I have all the data needed to judge it
myself.
--
__("< Marcin Kowalczyk
\__/ qrczak_at_knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Received on Mon Mar 26 2007 - 13:02:12 UTC