[r6rs-discuss] [Formal] Non-ASCII characters should not be treated all alike from Thomas Lord on 2006-12-01 (r6rs-discuss.mbox)

From: Thomas Lord <lord>
Date: Fri Dec 1 18:42:27 2006

Actually (and I do understand where you are coming from), no:
simply forbidding "unescaped" forms of these characters does not
satisfy my concern.

The Report has the opportunity to assign, or to decline to assign,
meaning to those and other characters in Scheme source texts.

Clearly, the Report should assign no meanings which contradict
Unicode best practices, regarding which you are almost certainly
the best available (and a very valuably available) representative.

At the same time, the excuse "because it contradicts Unicode
best practices" is not, in and of itself, sufficient reason to assign
any source text an error meaning. That is, implementations should
be permitted to assign such source texts a non-error meaning.

In general, the report must, if it is to remain true to its preamble, aim
to define a least upper bound of constraints on implementations.

It is minimalist to require that implementations give a specific
interpretation to source texts not containing the characters in question
outside of character and string constants. It is minimalist to require
that if an implementation WRITEs a source text from an s-expression,
the same implementation must be able to READ the same expression.
It is reasonable to inform implementors that they can only count on
texts they WRITE being successfully READ by another implementation
if they stick to the quoting regime you suggest. All of those requirements
in the Report would be just fine, in my book.

What is not minimalist and, indeed, contrary to the spirit of Scheme,
is for the Report to *require* that such characters in source texts,
outside of character and string constants, comprise an error. No
practical necessity of computing requires that. It complicates the
requirements gratuitously. It is a requirement imposed, if at all,
in pursuit of an essentially *political* agenda: to enforce someone's
notion of "Unicode Best Practices" by making it very hard for working
programmers to write code in any other style.

If something required us to form a compromise position between our
views, I would suggest this:

     [These characters (e.g., "other spaces under Zs")] have no
     portable meaning in standard Scheme outside of character and
     string constants except that if such characters are produced by
     WRITE then the same implementation must interpret the
     characters so as to preserve READ/WRITE invariants within
     the context of that implementation.

One (perhaps unachievable or perhaps achievable) ideal would be
if "portable Scheme program" were a statically decidable property
of source texts -- suggesting the invention of a "Scheme lint" tool
that tells you whether or not a program conforms. The Report
should assign meaning to all programs which would be labeled
conforming, but leave all other programs undefined.
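
As a rough illustration of what such a "Scheme lint" check might look
like for just the characters under discussion, here is a minimal sketch
in Python. The function name and the report format are hypothetical. It
assumes that only string literals and `#\` character constants are
exempt, and it makes no attempt to handle comments, `#|...|#` blocks, or
`\`-escapes inside identifiers. It flags every Zs character other than
U+0020 found elsewhere in the text.

```python
import unicodedata


def find_nonportable_spaces(source):
    """Return (line, column, codepoint) for each Unicode Zs character,
    other than U+0020, found outside string and character constants.

    Hypothetical helper: comments and block comments are NOT treated
    specially, so a Zs character inside a comment is also reported.
    Lines are 1-based, columns are 0-based.
    """
    hits = []
    in_string = False
    i = 0
    while i < len(source):
        ch = source[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the character following a backslash
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif source.startswith("#\\", i):
            i += 2  # skip `#\` and the single character it introduces
        elif ch != " " and unicodedata.category(ch) == "Zs":
            line = source.count("\n", 0, i) + 1
            col = i - (source.rfind("\n", 0, i) + 1)
            hits.append((line, col, "U+%04X" % ord(ch)))
        i += 1
    return hits


# U+3000 sits inside a string and U+2003 is a character constant, so
# only the U+00A0 after `define` is reported.
sample = '(display "a\u3000b")\n(define\u00a0x #\\\u2003)\n'
print(find_nonportable_spaces(sample))  # [(2, 7, 'U+00A0')]
```

An empty result would mean only that the text avoids these characters
outside constants. It says nothing about whether the program is portable
in any other respect.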

-t

John Cowan wrote:
> Thomas Lord scripsit:
>
>> John Cowan wrote:
>>
>>> The other spaces under Zs are legacy fixed-width spaces.
>>>
>>> None of them should be allowed in Scheme programs outside string and
>>> character constants.
>>>
>>>
>>>
>> That seems like an incoherent policy.
>>
>
> What I meant was that none can be used *unescaped* in Scheme programs.
> You can include them in identifiers using \-escapes, of course.
> Does that satisfy your concern?
>
>

Received on Fri Dec 01 2006 - 18:43:33 UTC
