[r6rs-discuss] perhaps i should be formal, but.... from William D Clinger on 2007-03-14 (r6rs-discuss.mbox)

From: William D Clinger <will>
Date: Wed Mar 14 16:16:17 2007

I am posting this as an individual member of the Scheme
community. I am not speaking for the R6RS editors.

Thomas Lord wrote:
> Earlier revisions of the standard defined a portable character set,
> allowing implementations to freely expand beyond that set.
> In a portable program, if only the portable character set is
> used, reliably portable behavior obtains.

What's different now is that Unicode has become an
established standard, and the portability advantages
of requiring Scheme programs to use Unicode (which
is more than just a character set) appear far larger
than any advantages that might still be derived from
allowing implementations and programs to choose their
own character sets.

> In the R6 draft, the entire set of permitted characters is
> explicitly enumerated.

Actually, I believe the set of permitted characters
is enumerated by reference to Unicode character
categories. SFAIK, the set of characters in those
categories is still growing, albeit slowly.

> Moreover, the set's mapping to integer
> values is both discontinuous and defined by three constants
> that, a priori, appear to be arbitrary.

The constants are part of the Unicode standard, and
are more historical than arbitrary. With hindsight
we all would have preferred a contiguous range, but
I understand the historical circumstances that led
to the hole in the middle.
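
To make the three constants concrete: Unicode scalar values run
from 0 through #xD7FF and from #xE000 through #x10FFFF, and the
hole in the middle is the surrogate range. The following is only
a rough sketch in Python; the constant and function names are
hypothetical and do not come from the draft report.

```python
# Hypothetical names, chosen for illustration only. The values are the
# bounds of the Unicode scalar value range.
LAST_BEFORE_SURROGATES = 0xD7FF
FIRST_AFTER_SURROGATES = 0xE000
LAST_SCALAR_VALUE = 0x10FFFF


def is_scalar_value(n: int) -> bool:
    """Return True if n is a Unicode scalar value (not a surrogate, not out of range)."""
    return (0 <= n <= LAST_BEFORE_SURROGATES
            or FIRST_AFTER_SURROGATES <= n <= LAST_SCALAR_VALUE)
```

Everything from #xD800 through #xDFFF falls in the hole, as does
anything negative or above #x10FFFF.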

> My question is whether any principled reason for these arbitrary
> constants is given that might be supported without appeal
> to analogies to other programming languages.

SFAIK, the justification for the constants has naught
to do with other programming languages, but with Scheme
and Unicode. Of all Unicode concepts, the one that comes
closest to Scheme's historical notion of a character is
the Unicode notion of a scalar value.

Scheme could have defined its own encoding of scalar
values, and that range could have been contiguous, but
that would have been a Seriously Bad Idea. Using some
Scheme-specific encoding would have created enormous
confusion and made interfacing with other systems more
difficult.

> Note that there is a fine distinction to be made between arbitrary
> choices such as the numeric values assigned to portable characters,
> and arbitrary choices such as a mandatory domain restriction
> on INTEGER->CHAR. In the former, if CHAR<->INTEGER
> conversion is to be supported at all, it is clear that *some* arbitrary
> choice must be made and so, of course, appeal to a popular standard
> for that. In the latter case, the domain restriction, there is no obvious
> reason to believe any such restriction is needed or makes the language
> better than another language without that restriction.

Even in the latter case, the report should state the domain
for which integer->char can be relied upon to behave portably.

Your question seems to come down to whether that procedure
should be required to raise an exception when given values
outside its portable domain:

> So, how does it come to pass that those patently arbitrary aspects of
> Unicode appear in the report not as a set of domain limits within which
> the behavior of portable programs is assured, but as restrictions that
> forbid an implementation from expanding the domains and ranges of certain
> standard procedures?

The argument, I believe, is that passing a non-portable value
to integer->char is likely to be a common error, especially
among programmers who are just now learning about Unicode or
were introduced to it in programming languages that were
standardized back when Unicode was expected to fit in 16
bits. If the report allowed such non-portable arguments,
accepting them would likely become a common error among
implementors with the same background.

Making it clear up front that desiring to pass non-portable
values to integer->char is a grievous conceptual error will
save everyone a lot of grief later.
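
As an illustration of the behavior being argued for, here is a
minimal sketch in Python of a conversion that raises an exception
outside the scalar value range. The name integer_to_char and the
choice of ValueError are assumptions made for this example, not
the report's wording. Python's built-in chr is more permissive:
it accepts surrogate code points, so the check is written out
explicitly.

```python
def integer_to_char(n: int) -> str:
    """Strict conversion sketch: accept only Unicode scalar values."""
    # Scalar values are 0..0xD7FF and 0xE000..0x10FFFF; the gap is the
    # surrogate range, which chr() alone would not reject.
    if not (0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF):
        raise ValueError(f"not a Unicode scalar value: {n:#x}")
    return chr(n)
```

Under this sketch, a programmer who passes half of a UTF-16
surrogate pair, say #xD83D, gets an error at the call site
instead of a value that misbehaves later.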

> There is a legal question at issue: how certain procedures should
> be specified. But the larger question is on what basis, by what ways,
> should such specifications be decided?
>
> If R6 is simply to be a record of votes taken, a kind of tallying up
> of a political process with purely pragmatic aims, then perhaps
> it is no longer a "report" at all. The line of thought that started
> with the "ultimate" papers has ended. What carries on, in its place,
> is a particular *use* of the main tangible artifact of that line of
> thought. And, in that case, the introduction should certainly be
> purged or retitled "Obituary" and the document as a whole
> retitled.

I have some sympathy for that point of view. I have less
sympathy for that point of view with respect to Unicode than
with several other parts of the report, however, because I
think the draft report's treatment of Unicode is one of the
more compelling arguments to be made in its favor.

Will
Received on Wed Mar 14 2007 - 16:16:11 UTC
