[r6rs-discuss] Strings from Marcin 'Qrczak' Kowalczyk on 2007-03-26 (r6rs-discuss.mbox)

From: Marcin 'Qrczak' Kowalczyk <qrczak>
Date: Mon Mar 26 13:02:35 2007

On Mon, 26-03-2007 at 11:29 -0400, MichaelL_at_frogware.com wrote:

> That said, programs that mishandle surrogate
> pairs probably also have problems with combining sequences, so using
> UTF-32 is unlikely to solve the more general problem of poor handling of
> multi-code-unit characters."

So UTF-32 doesn't solve the whole problem, it only solves half of the
problem. I agree.

Handling combining sequences is less work than handling combining
sequences and surrogates.
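
To make the counting concrete, here is a minimal Python sketch (not part of
the original exchange; the sample string is an assumption chosen for
illustration). It holds one combining sequence and one character outside the
BMP, and shows that the code point view leaves only the combining mark to
deal with, while the UTF-16 view adds a surrogate pair on top of it.

```python
import unicodedata

# Sample text (an assumption for illustration): "e" + U+0301 COMBINING ACUTE
# ACCENT, followed by U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP.
text = "e\u0301\U0001D11E"

# Code point view: what a UTF-32 string exposes directly.
code_points = [ord(ch) for ch in text]

# UTF-16 code unit view: split the little-endian encoding into 16-bit units.
utf16 = text.encode("utf-16-le")
code_units = [
    int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)
]

# Code units in the surrogate range only ever show up in the UTF-16 view.
surrogates = [u for u in code_units if 0xD800 <= u <= 0xDFFF]

# Combining marks are present in both views.
combining = [cp for cp in code_points if unicodedata.combining(chr(cp)) != 0]

print(len(code_points), len(code_units), len(surrogates), len(combining))
```

Under either encoding a program still has to keep U+0301 attached to its
base letter; only the UTF-16 view additionally has to keep the two
surrogates together.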

> "The only advantage of fixnums over bignums is [performance and] memory
> usage, and data exchange with those who already use fixnums. *Nothing* in
> fixnums is more convenient or simpler than bignums, it's an additional
> complexity layer."

The analogy doesn't hold, as it's *not* an additional complexity layer:
handling numbers is not more complex when the representation is split
into fixnums and bignums; this is purely an internal implementation
detail, so as long as it improves efficiency, it's a pure win. Whereas
the choice between UTF-16 and UTF-32 is a tradeoff between space and
programmer convenience: the space saving doesn't come for free.

Working in terms of code points should not be less convenient than
working in terms of UTF-16 code units. Code points are almost always
closer to the program's domain than UTF-16 code units (unless the
program is about recoding UTF-16).
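
As a rough illustration of the convenience argument, the following Python
sketch truncates the same text once by code points and once by UTF-16 code
units. The helper names utf16_units and is_well_formed_utf16 are
hypothetical, introduced only for this example, and the sample string is an
assumption.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_well_formed_utf16(units):
    """Return True if every surrogate belongs to a proper high+low pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True


# Sample text (an assumption): "a", U+1D11E MUSICAL SYMBOL G CLEF, "b".
text = "a\U0001D11Eb"

# Truncating to two code points keeps the clef whole.
cut_by_code_points = utf16_units(text[:2])

# Truncating to two UTF-16 code units cuts the clef's surrogate pair in half.
cut_by_code_units = utf16_units(text)[:2]

print(is_well_formed_utf16(cut_by_code_points))
print(is_well_formed_utf16(cut_by_code_units))
```

A program indexing by code points cannot produce the broken result by
accident; one indexing by code units has to check for it at every cut.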

> can you find a single document written by
> a Unicode expert that recommends UTF-32?

I don't care what experts say if I have all the data needed to judge it
myself.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak_at_knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

Received on Mon Mar 26 2007 - 13:02:12 UTC