[r6rs-discuss] [Formal] the CHAR? type from Thomas Lord on 2006-11-17 (r6rs-discuss.mbox)

From: Thomas Lord <lord>
Date: Fri Nov 17 03:47:47 2006

---
This message is a formal comment which was submitted to formal-comment_at_r6rs.org, following the requirements described at: http://www.r6rs.org/process.html
---
Submitter's Name:
Thomas Lord
Submitter's Email Address:
lord_at_emf.net
Type of Issue:
Defect (solvable by a Simplification)
Priority:
Major
R6RS component:
The CHAR? type. (Section 9.14)
Summary:
The restriction in section 9.14, prohibiting the domain of
INTEGER->CHAR from including surrogates, should be relaxed.
Implementations should be permitted, not required, to adopt
that restriction.
Body:
The text of 9.14 says, concerning the domain of
the INTEGER->CHAR procedure:
    (integer->char sv)
    Sv must be a scalar value, i.e. a non-negative exact
    integer in [0,#xD7FF] union [#xE000,#x10FFFF].
I think it should say:
    Implementations are permitted to require that
    Sv must be a scalar value, i.e. a non-negative exact
    integer in [0,#xD7FF] union [#xE000,#x10FFFF].
or words to that effect.
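To make the difference between the two wordings concrete, here is a
minimal sketch in Python rather than Scheme. The names
is_scalar_value, integer_to_char and the allow_surrogates flag are
hypothetical and appear nowhere in the draft; the flag only models an
implementation choosing whether to adopt the restriction.

```python
def is_scalar_value(sv: int) -> bool:
    """True if sv lies in [0, #xD7FF] union [#xE000, #x10FFFF]."""
    return 0 <= sv <= 0xD7FF or 0xE000 <= sv <= 0x10FFFF


def integer_to_char(sv: int, allow_surrogates: bool = False) -> str:
    """Hypothetical stand-in for INTEGER->CHAR.

    With allow_surrogates=False this models the current text of 9.14.
    With allow_surrogates=True it models an implementation that uses
    the proposed permission and accepts surrogate code points too.
    """
    if is_scalar_value(sv):
        return chr(sv)
    if allow_surrogates and 0xD800 <= sv <= 0xDFFF:
        # Python strings can hold an unpaired surrogate, which makes it
        # convenient for illustrating the relaxed behaviour.
        return chr(sv)
    raise ValueError(f"not in the domain of integer->char: {sv:#x}")
```

Under the current wording, integer_to_char(0xD800) must be an error;
under the proposed wording, an implementation may instead behave like
integer_to_char(0xD800, allow_surrogates=True).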
Opinions vary about the desirability of an implementation in
which an "unpaired surrogate" can be represented as a CHAR?
value. There seem to be no definitive arguments for or
against this proposition. I would be happy to explain in
detail an implementation that permits unpaired surrogates as
CHAR? values, and why I prefer such an implementation.
John Cowan and I have both asserted that a problem with
allowing unpaired surrogates as CHAR? values is that there
is no standard way to write them to a UTF-8 or UTF-16 port.
That is true, but it is not an argument for the restriction
in 9.14.
What is not clear to me is why the authors favor the
restriction, and what kind of arguments, examples, or logic
I should offer in order to persuade them otherwise.
Would it be helpful for me to describe an implementation
that doesn't have the restriction? Or to explain how the
I/O issues can be addressed? I am hoping it is a simple
matter to drop the restriction on the general principle
that restrictions like that need a strong, positive rationale
which, in this case, is clearly lacking.
Very briefly, therefore:
1. In general, the less restricted model is simpler and more
powerful. In an implementation without the restriction,
the CHAR? type can simply be isomorphic with a set of
exact integers in some (possibly improper) superset of
[0,#xFFFFFFFF]. That enables things like "bucky bits"
(a fine Lisp tradition). It is certainly easy to teach
and learn. It seems to be simpler to implement, too.
2. The I/O issues can be solved in a clever way -- by
reinterpreting ill-formed UTF-8 and UTF-16 as spellings
of sequences of certain private-use codepoints.
Round-trips with processes that don't understand these
private-use characters are perfectly robust to the
extent that those processes are conforming.
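
One possible reading of point 2, for UTF-8 only, is sketched below in
Python. Everything specific here is an assumption made for
illustration and not part of the comment above: the names PUA_BASE,
decode_lenient and encode_lenient are hypothetical, and the choice of
Plane 15 private-use code points is arbitrary. The sketch relies on
Python's standard "surrogateescape" error handler to find the bytes
that are not part of well-formed UTF-8.

```python
PUA_BASE = 0xF0000  # assumed base: Plane 15 private-use area


def decode_lenient(data: bytes) -> list[int]:
    """Decode UTF-8 into code points without rejecting any input.

    Each byte that is not part of a well-formed UTF-8 sequence is
    spelled as the private-use code point PUA_BASE + byte.
    """
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # surrogateescape maps a bad byte b to U+DC00 + b.
            out.append(PUA_BASE + (cp - 0xDC00))
        else:
            out.append(cp)
    return out


def encode_lenient(cps: list[int]) -> bytes:
    """Inverse of decode_lenient: restore the original raw bytes."""
    out = bytearray()
    for cp in cps:
        if PUA_BASE + 0x80 <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out)
```

In this sketch the three bytes ED A0 80 (a UTF-8-style spelling of the
surrogate #xD800) are read as three private-use code points and
written back unchanged. It does not handle input that already
contains those same private-use code points in well-formed form; a
real design would have to address that case.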

Received on Thu Nov 16 2006 - 16:21:18 UTC