[r6rs-discuss] Strings

From: MichaelL_at_frogware.com <MichaelL>
Date: Mon Mar 26 11:30:07 2007

> > "Important: Supplementary code points must be supported for full Unicode
> > support, regardless of the encoding form.
>
> That's the theory. But UTF-16 is strictly less convenient than UTF-32,
> which means that a lot of code working in terms of UTF-16 doesn't bother
> to support supplementary code points.

From Wikipedia:

"Unfortunately using UTF-16 makes characters outside the Basic
Multilingual Plane a special case which increases the risk of oversights
related to their handling. That said, programs that mishandle surrogate
pairs probably also have problems with combining sequences, so using
UTF-32 is unlikely to solve the more general problem of poor handling of
multi-code-unit characters."
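
To make the quoted point concrete, here is a minimal Python sketch. It is not part of the quoted article; the helper names utf16_units and utf32_units are hypothetical and exist only for this illustration. It counts code units for a character outside the BMP and for a base letter followed by a combining mark.

```python
import unicodedata

def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

def utf32_units(s: str) -> int:
    """Number of 32-bit code units (one per code point) in s."""
    return len(s.encode("utf-32-le")) // 4

clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, a supplementary code point
e_acute = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The supplementary code point needs a surrogate pair in UTF-16,
# but only one code unit in UTF-32.
print(utf16_units(clef), utf32_units(clef))        # 2 1

# The combining sequence takes two code units in both encodings,
# although a reader sees a single character.
print(utf16_units(e_acute), utf32_units(e_acute))  # 2 2

# Normalization collapses this particular sequence, but only because
# a precomposed form (U+00E9) happens to exist.
print(len(unicodedata.normalize("NFC", e_acute)))  # 1
```

Code that assumes one code unit per character is therefore wrong for the second string in either encoding, which is the case the quote says UTF-32 does not solve.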

> The only advantage of UTF-16 over UTF-32 is memory usage, and data
> exchange with those who already use UTF-16. *Nothing* in UTF-16 is more
> convenient or simpler than UTF-32, it's an additional complexity layer.

"The only advantage of fixnums over bignums is [performance and] memory
usage, and data exchange with those who already use fixnums. *Nothing* in
fixnums is more convenient or simpler than bignums, it's an additional
complexity layer."

> > But I'll tell you what. Find a document, written by someone with
> > substantial Unicode experience, that recommends UTF-32 as the best
> > overall in-memory encoding.

I don't agree with everything you said, but more to the point, none of it
addresses the question I asked: can you find a single document written by
a Unicode expert that recommends UTF-32? Every such document I can find
recommends UTF-16 as the best overall encoding, with UTF-8 a second choice
(based on expected usage). UTF-32 is always the third choice and it always
has the caveat "if space doesn't matter."
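
For a rough sense of the space trade-off behind that caveat, here is a small Python sketch. The function name encoded_sizes and the sample strings are assumptions chosen only for illustration, not anything taken from those documents. It reports the byte length of the same text in each encoding form.

```python
def encoded_sizes(s: str) -> dict:
    """Byte length of s in each encoding form (no byte-order mark)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(s.encode(enc)) for enc in encodings}

# ASCII text: UTF-8 is smallest, UTF-32 is four times larger.
print(encoded_sizes("hello"))
# {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}

# BMP CJK text: UTF-16 is smallest.
print(encoded_sizes("\u65e5\u672c\u8a9e"))
# {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}

# A supplementary code point: all three forms need four bytes.
print(encoded_sizes("\U0001D11E"))
# {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

UTF-32 is never smaller than the other two forms for any text, and it only ties with them on supplementary code points.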
Received on Mon Mar 26 2007 - 11:29:06 UTC
