Thomas Lord scripsit:
> The restriction in section 9.14, prohibiting the domain of
> INTEGER->CHAR from including surrogates, should be relaxed.
> Implementations should be permitted, not required, to adopt
> that restriction.
I'm against it.
> 1. In general, the less restricted model is simpler and more
> powerful. In an implementation without the restriction,
> the CHAR? type can simply be isomorphic with a set of
> exact integers in some (possibly improper) superset of
> [0,#xFFFFFFFF].
If you want u32 vectors, you know where to find them.
> That enables things like "bucky bits"
> (a fine lisp tradition).
Such a fine old tradition, in fact, that they were made optional in CLtL1
and removed altogether from CLtL2/ANSI CL. They were also accompanied
in CLtL1 by a type called "string-char", which implementations could
define as a subset of "char" that excluded some or all of the bucky bits.
Allowing arbitrary u32 values without creating a string-char type means
that at least one means of representing strings must be as a u32 vector.
Using the Unicode definition makes it possible to use UTF-8 or UTF-16
internally throughout.
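
A minimal sketch of why the surrogate restriction matters for a UTF-8
internal representation. Python is used purely for illustration (nothing in
this thread depends on it), and the helper names `is_scalar_value` and
`encodable_as_utf8` are made up for this example. The point it tries to show:
the integers that a strict UTF-8 encoder accepts are exactly the Unicode
scalar values, so a surrogate has no well-formed encoding.

```python
def is_scalar_value(n: int) -> bool:
    """True if n is a Unicode scalar value: 0..#x10FFFF minus the surrogates."""
    return 0 <= n <= 0x10FFFF and not (0xD800 <= n <= 0xDFFF)


def encodable_as_utf8(n: int) -> bool:
    """True if the code point n can be encoded by a strict UTF-8 encoder."""
    try:
        # chr() rejects anything outside 0..#x10FFFF; the strict UTF-8
        # codec additionally rejects surrogate code points.
        chr(n).encode("utf-8")
        return True
    except (ValueError, OverflowError):
        # UnicodeEncodeError is a subclass of ValueError.
        return False


if __name__ == "__main__":
    for n in (0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
        print(hex(n), is_scalar_value(n), encodable_as_utf8(n))
```

An implementation whose characters may hold surrogates or values above
#x10FFFF would need some other representation for strings containing them,
which is the u32-vector fallback mentioned above.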
> It is certainly easy to teach and learn. It seems to be simpler to
> implement, too.
So is weak typing a la C.
> 2. The I/O issues can be solved in a clever way -- by
> reinterpreting ill-formed UTF-8 and UTF-16 as spellings
> of sequences of certain private-use codepoints.
> Round-trips with processes that don't understand these
> private use characters are perfectly robust to the
> extent that those processes are conforming.
Those who try to reinvent Unicode, etc. There are several ways to resolve
ill-formed byte sequences: replace with U+FFFD, throw an exception,
ignore junk. This is just what is already provided.
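
For concreteness, here is a rough sketch of those three strategies using the
standard error handlers of Python's built-in UTF-8 codec. Python is only an
illustration here, not something the draft or this thread refers to, and the
function name `decode_three_ways` is hypothetical.

```python
def decode_three_ways(data: bytes) -> dict:
    """Decode bytes as UTF-8 under three policies for ill-formed input."""
    results = {}
    # Policy 1: substitute U+FFFD REPLACEMENT CHARACTER for the bad bytes.
    results["replace"] = data.decode("utf-8", errors="replace")
    # Policy 2: silently drop the bad bytes.
    results["ignore"] = data.decode("utf-8", errors="ignore")
    # Policy 3: refuse the input by raising an exception.
    try:
        results["strict"] = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        results["strict"] = exc
    return results


if __name__ == "__main__":
    # 0xFF can never appear in well-formed UTF-8.
    for policy, outcome in decode_three_ways(b"ab\xffcd").items():
        print(policy, repr(outcome))
```

None of the three needs private-use code points; the choice is only a matter
of which policy the port or decoder is asked to apply.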
--
But you, Wormtongue, you have done what you could for your true master. Some
reward you have earned at least. Yet Saruman is apt to overlook his bargains.
I should advise you to go quickly and remind him, lest he forget your faithful
service. --Gandalf
John Cowan <cowan_at_ccil.org>
Received on Fri Nov 17 2006 - 14:40:35 UTC