[r6rs-discuss] [Formal] Scheme should not be changed to be case sensitive. from Thomas Lord on 2006-11-15 (r6rs-discuss.mbox)

From: Thomas Lord <lord>
Date: Wed Nov 15 14:10:29 2006

John Cowan wrote:
> Thomas Lord scripsit:
>
>
>> I mean that, for a given procedure application (yes, one
>> argument, for simplicity) substitution of eq value yields
>> an equivalent computation. No? I thought that was the
>> point of "eq?".
>>
>
> Only if the procedure in question is a function of its argument.
> If it's constant (ignores the argument), or is a function of
> some piece of state or non-localized variable, it won't work.
>
>

I think you misunderstand something. I don't mean that for EQ?
values A and B it is always true that (EQ? (F A) (F B)). Rather,
I mean, speaking informally, that you can safely freeze a running
Scheme program, reach in from the outside and substitute any value
for one that is EQ? to it, then resume the program -- and the program
will proceed as if you hadn't done anything at all (modulo your
effects on speed and time performance).

For a particular procedure application during a particular run
of a program -- EQ? values can be substituted.
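
To make the substitution claim concrete, here is a minimal sketch in
Python rather than Scheme. It assumes Python's `is` as a rough analogue
of EQ? and `==` as a rough analogue of EQUAL?; the names `run` and
`fresh` are made up for illustration.

```python
def run(pick):
    """One 'run of the program', with a choice of which value to pass."""
    original = [0]
    alias = original        # alias is original: the EQ?-like case
    copy = list(original)   # copy == original but is a distinct object
    chosen = {"original": original, "alias": alias, "copy": copy}[pick]
    chosen.append(1)        # the procedure application under study
    return original         # what the rest of the program observes


def fresh(x):
    """A procedure whose results are never identical to one another."""
    return [x]


same_after_swap = run("original") == run("alias")  # swap is invisible
copy_result = run("copy")                          # swap is visible

arg = object()
first, second = fresh(arg), fresh(arg)
results_identical = first is second                # False, and that is fine
```

Swapping in the alias leaves the run indistinguishable, while swapping
in a merely equal copy changes what the program later observes. The
last lines show the weaker property that was not being claimed: two
applications to the identical argument need not return identical
results.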

>> We need extended forms of UTF-16 and UTF-8 which
>> are formally and officially and with a blessing from the pope
>> capable of representing *any* sequence of *encoding values*
>> from any of the encoding schemes, and then encoding values
>> ought to be taken as code-points, used in a particular way.
>>
>> In other words, I want (for example) a "utf-16++" in which
>> an unpaired leading surrogate, followed by an unpaired trailing
>> surrogate, can be represented -- if I concatenate two improper
>> utf-16 strings, I should reliably get a utf-16++ string whose
>> length is the sum of those two and which is itself improper.
>>
>
> *shrug*
>
> People who try to reinvent Unicode at this late date (you aren't
> the first and won't be the last) condemn themselves to irrelevance.
>
>

You have it backwards and I hope you will take a glance at
my note to Matthew Flatt. I'm *not* re-inventing Unicode --
the R^_RS authors *are* re-inventing Unicode. That's
my complaint. I agree about the "condemn themselves
to irrelevance" comment.
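
The concatenation problem behind the "utf-16++" request can be shown
with a small Python sketch. It relies on Python's `str` being a
sequence of code points that may include lone surrogates, and on the
standard `surrogatepass` error handler; `utf16_units` is a hypothetical
helper, not part of any library.

```python
lead = "\ud83d"    # unpaired leading (high) surrogate
trail = "\ude00"   # unpaired trailing (low) surrogate


def utf16_units(s):
    """Return the UTF-16 code units of s, letting lone surrogates through."""
    raw = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(0, len(raw), 2)]


# Concatenating at the code-point level keeps two separate code points.
joined = lead + trail

# Concatenating at the code-unit level, then decoding as ordinary UTF-16,
# silently fuses the two halves into one well-formed surrogate pair.
units = utf16_units(lead) + utf16_units(trail)
raw = b"".join(u.to_bytes(2, "little") for u in units)
decoded = raw.decode("utf-16-le", "surrogatepass")
```

Here `joined` has two code points while `decoded` has one (U+1F600), so
ordinary UTF-16 cannot distinguish "an unpaired lead followed by an
unpaired trail" from a proper pair. That distinction is what the
extended encoding form is meant to preserve.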

>> There is no (none, zero, nada, zilch, zippo) chance of making
>> any CHAR? type that jibes really perfectly with any social/intuitive/
>> human conception of what a "character" is -- but "a character is
>> a Unicode codepoint" is just so crystal clear, simple, and
>> sufficient that I can't believe anyone -- not even bear who generally
>> prefers the application-level char type to include combining
>> sequences -- would pass it by.
>>
>
> There is no reason to allow the representation of loose surrogates,
> unless you have prematurely standardized on 16-bit code units.
>

You are wrong. One reason, sufficient to show that you are wrong,
is that the Unicode standard defines a number of axiomatic mappings
and relations whose domains include loose surrogates. You are
suggesting (and the authors have written) a mapping of Unicode
into Scheme in which there is no natural way to express such mappings
-- the resulting type lattice of Scheme is just "at odds with" the type
lattice of Unicode.
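
As one illustration of such a mapping, General_Category is defined for
every code point, surrogates included. The Python sketch below uses the
standard `unicodedata` module; `scalar_char` is a hypothetical stand-in
for a constructor that accepts only Unicode scalar values (the domain
R6RS gives `integer->char`), not a real API.

```python
import unicodedata

SURROGATES = range(0xD800, 0xE000)

# Every surrogate code point has a General_Category value: "Cs".
categories = {unicodedata.category(chr(cp)) for cp in SURROGATES}


def scalar_char(n):
    """Hypothetical constructor that admits only Unicode scalar values."""
    if n in SURROGATES or not 0 <= n <= 0x10FFFF:
        raise ValueError(f"not a Unicode scalar value: {n:#x}")
    return chr(n)


def category_via_scalar_char(n):
    """General_Category, but only reachable through the restricted type."""
    return unicodedata.category(scalar_char(n))


try:
    category_via_scalar_char(0xD800)
    surrogate_expressible = True
except ValueError:
    surrogate_expressible = False
```

With the restricted constructor, the "Cs" row of the property table
simply cannot be reached through the character type, even though the
underlying mapping defines it.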

>
>> R^_RS is getting this all wrong by trying to, arrogantly, carve
>> out some entirely *novel* definition of CHAR?.
>>
>
> Not at all. It's the same definition used by XML, for instance.
> And it's very sensible: unpaired surrogates are useless.
>

Really? Here is a port which *might* contain some valid XML.
How will you interpret what you read from it?
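
One reading of the question, sketched in Python: the bytes below are
made-up sample input containing the three-byte encoding of a lone
surrogate (U+D800) inside otherwise plausible markup. A strict decoder
can only refuse; a reader that wants to inspect or report on the input
needs some value to hold what it actually read.

```python
data = b"<a>\xed\xa0\x80</a>"   # hypothetical input from the port

# Strict UTF-8 decoding rejects the lone surrogate outright.
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode keeps the offending code point so it can be examined.
lenient = data.decode("utf-8", "surrogatepass")
offending = [hex(ord(ch)) for ch in lenient if 0xD800 <= ord(ch) <= 0xDFFF]
```

The lenient path is only possible because Python's string type can hold
the surrogate code point; a character type without it would have to
invent some other representation for the bad input.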

>
>> This committee -- this Scheme committee -- has decided
>> that it knows better than the Unicode consortium or anyone else
>> how to define the basis sets of text processing: I have no respect
>> for that.
>>
>
> I'm not speaking for the Consortium here, but I am a member of it
> and an invited expert on Unicore, so there's some evidence that
> I know what I'm talking about.
>

I know. You're on my list of people whose posts I like to read.
You have my undying respect for everything from PDP-8
word processing (that's you, right?) through your now several years of
Unicode work.
I *think* that if we sat down over higher bandwidth I'd have
you convinced, in a short time, that I grok Unicode pretty well
(having learned a lot by reading you, among other things) and
you would wind up agreeing with me. We'll see how well email
approximates that, I guess. :-)

>
>> A character is a freaking code point is a freaking character.
>>
>
> Not.
>
>

Heh. I also kinda like Bear's definition: a character is
isomorphic to a base codepoint plus a well-formed list of
combining-character codepoints -- that *is* the next
level up.
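
A rough sketch of that "next level up", in Python. It assumes a code
point counts as combining when its canonical combining class is
nonzero, which only approximates proper grapheme cluster segmentation;
`clusters` is a hypothetical helper.

```python
import unicodedata


def clusters(s):
    """Group code points into (base, [combining marks]) pairs."""
    out = []
    for cp in s:
        if unicodedata.combining(cp) and out:
            out[-1][1].append(cp)   # attach the mark to the preceding base
        else:
            out.append((cp, []))    # start a new cluster
    return out


text = "e\u0301a"                   # 'e', COMBINING ACUTE ACCENT, 'a'
grouped = clusters(text)
```

The string has three code points but two clusters. A combining mark
with no preceding base is treated here as its own base, which is one
arbitrary choice among several.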

-t

Received on Wed Nov 15 2006 - 14:10:33 UTC
