[r6rs-discuss] perhaps i should be formal, but....

From: Thomas Lord <lord>
Date: Wed Mar 14 16:34:00 2007

William D Clinger wrote:
> I am posting this as an individual member of the Scheme
> community. I am not speaking for the R6RS editors.
>
> Thomas Lord wrote:
>
>> Earlier revisions of the standard defined a portable character set,
>> allowing implementations to freely expand beyond that set.
>> In a portable program, if only the portable character set is
>> used, reliably portable behavior obtains.
>>
>
> What's different now is that Unicode has become an
> established standard, and the portability advantages
> of requiring Scheme programs to use Unicode (which
> is more than just a character set) appear far larger
> than any advantages that might still be derived from
> allowing implementations and programs to choose their
> own character sets.
>


Hopefully I'll finish the formal comment in time. Briefly:

"Requiring [portable] Scheme programs to use Unicode [scalar values],"
in any reasonable sense of that phrase, is not at stake here.

"Forbidding implementations from supporting additional
characters," is one part of what is at stake.




>
>> In the R6 draft, the entire set of permitted characters is
>> explicitly enumerated.
>>
>
> Actually, I believe the set of permitted characters
> is enumerated by reference to Unicode character
> categories. SFAIK, the set of characters in those
> categories is still growing, albeit slowly.
>

I don't see the need for the word "actually" there -- I don't
think we're contradicting one another, though I can understand
how a narrow interpretation of "explicitly enumerated" would
lead you there.


>
>> Moreover, the set's mapping to integer
>> values is both discontinuous and defined by three constants
>> that, a priori, appear to be arbitrary.
>>
>
> The constants are part of the Unicode standard, and
> are more historical than arbitrary. With hindsight
> we all would have preferred a contiguous range, but
> I understand the historical circumstances that led
> to the hole in the middle.
>
>

How do you get from there to mandating that hole (and much
else that it implies) in all implementations? Relaxing those
restrictions would not seem to change the behavior of non-divergent
programs, except perhaps those which use exceptions in some
particularly odd ways.
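
For concreteness, the hole and the three constants can be written down
as a predicate. This is a minimal sketch in Python rather than Scheme,
purely for illustration; the names are hypothetical and appear in neither
the draft nor the Unicode standard. It assumes the draft's domain for
INTEGER->CHAR is the set of Unicode scalar values, that is, 0 through
#xD7FF and #xE000 through #x10FFFF.

```python
# Illustrative sketch only; all names here are hypothetical.
LOW_END = 0xD7FF       # last scalar value below the surrogate hole
HIGH_START = 0xE000    # first scalar value above the hole
MAX_SCALAR = 0x10FFFF  # largest Unicode scalar value


def is_scalar_value(n: int) -> bool:
    """Return True when n lies in 0..#xD7FF or #xE000..#x10FFFF."""
    return 0 <= n <= LOW_END or HIGH_START <= n <= MAX_SCALAR
```

Everything from #xD800 through #xDFFF, and everything above #x10FFFF,
falls outside the predicate; that is the domain restriction at issue.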


>> My question is whether any principled reason for these arbitrary
>> constants is given that might be supported without appeal
>> to analogies to other programming languages.
>>
>
> SFAIK, the justification for the constants has naught
> to do with other programming languages, but with Scheme
> and Unicode. Of all Unicode concepts, the one that comes
> closest to Scheme's historical notion of a character is
> the Unicode notion of a scalar value.
>
> Scheme could have defined its own encoding of scalar
> values, and that range could have been contiguous, but
> that would have been a Seriously Bad Idea. Using some
> Scheme-specific encoding would have created enormous
> confusion and made interfacing with other systems more
> difficult.
>

You're skirting around the issue of permitting v. forbidding extensions
that shouldn't have any impact on well-written portable programs.


>
>> Note that there is a fine distinction to be made between arbitrary
>> choices such as the numeric values assigned to portable characters,
>> and arbitrary choices such as a mandatory domain restriction
>> on INTEGER->CHAR. In the former, if CHAR<->INTEGER
>> conversion is to be supported at all, it is clear that *some* arbitrary
>> choice must be made and so, of course, appeal to a popular standard
>> for that. In the latter case, the domain restriction, there is no obvious
>> reason to believe any such restriction is needed or makes the language
>> better than another language without that restriction.
>>
>
> Even in the latter case, the report should state the domain
> for which integer->char can be relied upon to behave portably.
>
>

Yet it does more than that. If it only specified what can be relied upon
in a portable program this conversation would have a different form.
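
The difference between the two kinds of specification can be sketched
as follows (Python for illustration only; both function names and the
`ExtendedChar` class are hypothetical, not taken from any implementation).
The strict version raises outside the scalar-value domain, as the draft
requires. The permissive version is one thing a specification of domain
limits alone would allow an implementation to do.

```python
from dataclasses import dataclass


def _in_portable_domain(n: int) -> bool:
    # Unicode scalar values: 0..#xD7FF and #xE000..#x10FFFF.
    return 0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF


@dataclass(frozen=True)
class ExtendedChar:
    """Hypothetical implementation-specific character with no scalar value."""
    code: int


def strict_integer_to_char(n: int) -> str:
    # Mandated behavior: anything outside the portable domain is an error.
    if not _in_portable_domain(n):
        raise ValueError(f"{n:#x} is not a Unicode scalar value")
    return chr(n)


def permissive_integer_to_char(n: int):
    # Same result as the strict version on the portable domain;
    # outside it, the implementation supplies its own extension.
    if _in_portable_domain(n):
        return chr(n)
    return ExtendedChar(n)
```

On every scalar value the two functions return the same character, so
a program that stays inside the portable domain cannot tell them apart.
They differ only on arguments such as #xD800, where one raises and the
other returns an extension object.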

The conversation would probably still occur, though, because (as I'll try
to explain in the formal comment -- and here I thought I'd given up)
once you start to unpack the loosening of that restriction, a whole bunch
of other changes follow.


> Your question seems to come down to whether that procedure
> should be required to raise an exception when given values
> outside its portable domain:
>
>
>> So, how does it come to pass that those patently arbitrary aspects of
>> Unicode appear in the report not as a set of domain limits within which
>> the behavior of portable programs is assured, but as restrictions that
>> forbid an implementation from expanding the domains and ranges of
>> certain standard procedures?
>>
>
> The argument, I believe, is that passing a non-portable value
> to integer->char is likely to be a common error, especially
> among programmers who are just now learning about Unicode or
> were introduced to Unicode in programming languages that were
> standardized back when Unicode was expected to use a 16-bit
> character set, and that allowing such non-portable arguments
> to integer->char would, if allowed by the report, also be a
> common error among implementors who are just now learning
> about Unicode or were introduced to Unicode in programming
> languages that were standardized back when Unicode was expected
> to use a 16-bit character set.
>

Excellent. That is what tuned-for-education implementations like the
PLT family are for.


> Making it clear up front that desiring to pass non-portable
> values to integer->char is a grievous conceptual error will
> save everyone a lot of grief later.
>

That's not the proper function of the report, in my opinion. I also
disagree with the use of the word "everyone".


>
>> There is a legal question at issue: how certain procedures should
>> be specified. But the larger question is on what basis, by what ways,
>> should such specifications be decided?
>>
>> If R6 is simply to be a record of votes taken, a kind of tallying up
>> of a political process with purely pragmatic aims, then perhaps
>> it is no longer a "report" at all. The line of thought that started
>> with the "ultimate" papers has ended. What carries on, in its place,
>> is a particular *use* of the main tangible artifact of that line of
>> thought. And, in that case, the introduction should certainly be
>> purged or retitled "Obituary" and the document as a whole
>> retitled.
>>
>
> I have some sympathy for that point of view. I have less
> sympathy for that point of view with respect to Unicode than
> with several other parts of the report, however, because I
> think the draft report's treatment of Unicode is one of the
> more compelling arguments to be made in its favor.
>
>

Lemme see if I can pull off the formal. If I had to bet, and if I do
finish it, it'll go down in flames with the editors but, hopefully, it
will at least be a fun and provocative read, clarifying some of my
position.


-t



> Will
>
Received on Wed Mar 14 2007 - 16:43:12 UTC
