[r6rs-discuss] [Formal] Scheme should not be changed to be case sensitive. from John Cowan on 2006-11-15 (r6rs-discuss.mbox)

From: John Cowan <cowan>
Date: Wed Nov 15 00:18:09 2006

Thomas Lord scripsit:

> I mean that, for a given procedure application (yes, one
> argument, for simplicity) substitution of eq value yields
> an equivalent computation. No? I thought that was the
> point of "eq?".

Only if the procedure in question is a function of its argument.
If it's constant (ignores the argument), or is a function of
some piece of state or non-localized variable, it won't work.
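
A minimal sketch of the distinction, written in Python rather than Scheme
for brevity. The names `double` and `make_tagger` are hypothetical and come
from nowhere in this thread; `a is b` stands in for `eq?`. Substituting an
identical value leaves the result unchanged for the pure procedure, but not
for the one that also reads hidden state.

```python
def double(x):
    # A function of its argument: the result depends only on x.
    return x * 2


def make_tagger():
    # Returns a procedure whose result also depends on hidden state
    # (a call counter), so it is not a function of its argument alone.
    calls = 0

    def tag(x):
        nonlocal calls
        calls += 1
        return (calls, x)

    return tag


a = 7
b = a  # b names the very same object as a (the analogue of eq?)

tag = make_tagger()
pure_results = (double(a), double(b))  # both results are equal
stateful_results = (tag(a), tag(b))    # results differ despite a is b
```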

> We need extended forms of UTF-16 and UTF-8 which
> are formally and officially and with a blessing from the pope
> capable of representing *any* sequence of *encoding values*
> from any of the encoding schemes and, then encoding values
> ought to be taken as code-points, used in a particular way.
>
> In other words, I want (for example) a "utf-16++" in which
> an unpaired leading surrogate, followed by an unpaired trailing
> surrogate, can be represented -- if I concatenate two improper
> utf-16 strings, I should reliably get a utf-16++ string whose
> length is the sum of those two and which is itself improper.

*shrug*

People who try to reinvent Unicode at this late date (you aren't
the first and won't be the last) condemn themselves to irrelevance.

> There is no (none, zero, nada, zilch, zippo) chance of making
> any CHAR? type that jibes really perfectly with any social/intuitive/
> human conception of what a "character" is -- but "a character is
> a Unicode codepoint" is just so crystal clear, but simple, and
> sufficient that I can't believe anyone -- not even bear who generally
> prefers the application-level char type to include combining
> sequences -- would pass it by.

There is no reason to allow the representation of loose surrogates,
unless you have prematurely standardized on 16-bit code units.
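
The following sketch illustrates why loose surrogates are tied to 16-bit
code units. It assumes strings are held as plain lists of 16-bit integers,
and `count_code_points` is a hypothetical helper that counts an unpaired
surrogate as one item (a choice made for illustration only, not something
UTF-16 defines). Two improper sequences of length one each concatenate into
a proper surrogate pair of length one, which is the length anomaly at issue.

```python
def is_lead(unit):
    # Leading (high) surrogate code units.
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit):
    # Trailing (low) surrogate code units.
    return 0xDC00 <= unit <= 0xDFFF


def count_code_points(units):
    # Count items in a list of 16-bit code units: a lead immediately
    # followed by a trail is one item, anything else (including an
    # unpaired surrogate) is one item per unit.
    count = 0
    i = 0
    while i < len(units):
        if is_lead(units[i]) and i + 1 < len(units) and is_trail(units[i + 1]):
            i += 2
        else:
            i += 1
        count += 1
    return count


lead_only = [0xD83D]   # improper: unpaired leading surrogate
trail_only = [0xDE00]  # improper: unpaired trailing surrogate
combined = lead_only + trail_only  # together they encode U+1F600

lengths = (
    count_code_points(lead_only),
    count_code_points(trail_only),
    count_code_points(combined),
)
```

With a representation based on whole code points rather than 16-bit units,
no such pairing step exists, so the question does not arise.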

> R^_RS is getting this all wrong by trying to, arrogantly, carve
> out some entirely *novel* definition of CHAR?.

Not at all. It's the same definition used by XML, for instance.
And it's very sensible: unpaired surrogates are useless.
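
As a rough sketch of what excluding surrogates means in practice, the two
predicates below test a code point against the Unicode scalar value range
and against the `Char` production of XML 1.0. The function names are
hypothetical. Note that XML's `Char` is somewhat narrower than the full set
of scalar values (it also omits most C0 controls and U+FFFE/U+FFFF); what
the two share is that neither admits U+D800 through U+DFFF.

```python
def is_scalar_value(cp):
    # Unicode scalar value: any code point except the surrogate range.
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_xml_char(cp):
    # Char production of XML 1.0:
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Neither definition accepts a surrogate code point.
surrogates_rejected = all(
    not is_scalar_value(cp) and not is_xml_char(cp)
    for cp in range(0xD800, 0xE000)
)
```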

> This committee -- this Scheme committee -- has decided
> that it knows better than the Unicode consortium or anyone else
> how to define the basis sets of text processing: I have no respect
> for that.

I'm not speaking for the Consortium here, but I am a member of it
and an invited expert on Unicore, so there's some evidence that
I know what I'm talking about.

> A character is a freaking code point is a freaking character.

Not.

-- 
He played King Lear as though           John Cowan <cowan_at_ccil.org>
someone had played the ace.             http://www.ccil.org/~cowan
        --Eugene Field

Received on Wed Nov 15 2006 - 00:18:01 UTC
