[r6rs-discuss] [Formal] the CHAR? type

From: Thomas Lord <lord>
Date: Fri Nov 17 03:47:47 2006

---
This message is a formal comment which was submitted to formal-comment_at_r6rs.org, following the requirements described at: http://www.r6rs.org/process.html
---
Submitter's Name:
  Thomas Lord
Submitter's Email Address:
    lord_at_emf.net
Type of Issue:
    Defect (solvable by a Simplification)
Priority:
    Major
R6RS component:
    The CHAR? type (Section 9.14)
Summary:
  The restriction in section 9.14 that prohibits the domain of
  INTEGER->CHAR from including surrogates should be relaxed.
  Implementations should be permitted, not required, to adopt
  that restriction.
Body:
  The text of 9.14 says, concerning the domain of
  the INTEGER->CHAR procedure:
      (integer->char sv)
      Sv must be a scalar value, i.e. a non-negative exact
      integer in [0,#xD7FF] union [#xE000,#x10FFFF].
  I think it should say:
      Implementations are permitted to require that
      Sv must be a scalar value, i.e. a non-negative exact
      integer in [0,#xD7FF] union [#xE000,#x10FFFF].
  or words to that effect.
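  The two readings can be made concrete with a minimal Python
  sketch (not R6RS code) that models a character as its integer
  code point. The function names below are hypothetical, and the
  upper bound used by the permissive version is an assumption;
  an implementation could choose any bound it likes.

```python
def is_scalar_value(sv: int) -> bool:
    """True if sv is in [0, #xD7FF] union [#xE000, #x10FFFF]."""
    return (0 <= sv <= 0xD7FF) or (0xE000 <= sv <= 0x10FFFF)


def integer_to_char_strict(sv: int) -> int:
    """Draft 9.14 reading: reject anything that is not a scalar value,
    in particular the surrogate range [#xD800, #xDFFF]."""
    if not isinstance(sv, int) or not is_scalar_value(sv):
        raise ValueError(f"not a Unicode scalar value: {sv!r}")
    return sv


def integer_to_char_permissive(sv: int, limit: int = 0xFFFFFFFF) -> int:
    """Relaxed reading: any non-negative exact integer up to an
    implementation-chosen limit (the default limit is an assumption)."""
    if not isinstance(sv, int) or not (0 <= sv <= limit):
        raise ValueError(f"outside the character domain: {sv!r}")
    return sv
```

  Python's own `chr` happens to follow the permissive reading
  within [0,#x10FFFF]: `chr(0xD800)` returns a one-character
  string holding an unpaired surrogate.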
  Opinions vary about the desirability of an implementation in
  which an "unpaired surrogate" can be represented as a CHAR?
  value.  There seem to be no definitive arguments for or
  against this proposition.  I would be happy to explain in
  detail an implementation that permits unpaired surrogates as
  CHAR? values, and why I prefer such an implementation.
  John Cowan and I have both asserted that a problem with
  allowing unpaired surrogates as CHAR? values is that there
  is no standard way to write them to a UTF-8 or UTF-16 port.
  That is true, but it is not an argument for the restriction
  in 9.14.
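  The I/O problem can be seen directly in Python, whose `str`
  type can hold unpaired surrogates: the standard UTF-8 and
  UTF-16 codecs refuse to encode them, and Python's non-standard
  `surrogatepass` error handler produces bytes that are not
  well-formed UTF-8. The helper `try_encode` is hypothetical.

```python
def try_encode(s: str, encoding: str):
    """Return the encoded bytes, or None if the codec rejects s."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None


lone = "\ud800"  # an unpaired high surrogate (Python's str allows it)

strict_utf8 = try_encode(lone, "utf-8")       # None: rejected
strict_utf16 = try_encode(lone, "utf-16-le")  # None: rejected

# "surrogatepass" does emit bytes, but the result is ill-formed
# UTF-8 that a strict decoder refuses to read back.
passed = lone.encode("utf-8", "surrogatepass")  # b'\xed\xa0\x80'
try:
    passed.decode("utf-8")
    readable = True
except UnicodeDecodeError:
    readable = False
```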
  What is not clear to me is why the authors favor the
  restriction, or what arguments, examples, or reasoning I
  should offer to persuade them otherwise.
  Would it be helpful for me to describe an implementation
  that doesn't have the restriction? Or to explain how the
  I/O issues can be addressed? I am hoping it is a simple
  matter to drop the restriction on the general principle
  that restrictions like this need a strong, positive
  rationale, which in this case is clearly lacking.
  Very briefly, therefore:
    1. In general, the less restricted model is simpler and more
       powerful. In an implementation without the restriction,
       the CHAR? type can simply be isomorphic with a set of
       exact integers in some (possibly improper) superset of
       [0,#xFFFFFFFF]. That enables things like "bucky bits"
       (a fine Lisp tradition). It is certainly easy to teach
       and learn, and it seems to be simpler to implement, too.
    2. The I/O issues can be solved in a clever way: by
       reinterpreting ill-formed UTF-8 and UTF-16 as spellings
       of sequences of certain private-use code points.
       Round-trips with processes that don't understand these
       private-use characters are perfectly robust to the
       extent that those processes are conforming.
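  The following Python sketch illustrates the idea in point 2,
  for UTF-8 only. It is an illustration under stated assumptions,
  not the specific scheme proposed here: it spells each byte of an
  ill-formed sequence as a code point in a hypothetical reserved
  block U+F0000..U+F00FF (in Supplementary Private Use Area-A),
  and it assumes those code points never occur in well-formed
  input, since otherwise decoding would be ambiguous. The
  error-handler name `pua-escape` and the functions `decode_pua`
  and `encode_pua` are hypothetical.

```python
import codecs

# Hypothetical reserved block: U+F0000 + byte value, in Plane 15
# private use. Assumes well-formed input never contains these.
PUA_BASE = 0xF0000


def _bytes_to_private_use(exc):
    """Decode error handler: spell each ill-formed byte as a private-use char."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua-escape", _bytes_to_private_use)


def decode_pua(data: bytes) -> str:
    """Decode UTF-8, turning ill-formed bytes into reserved private-use chars."""
    return data.decode("utf-8", errors="pua-escape")


def encode_pua(text: str) -> bytes:
    """Inverse of decode_pua: reserved chars become their original raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)  # restore the ill-formed byte
        else:
            out += ch.encode("utf-8")
    return bytes(out)


raw = b"caf\xc3\xa9 \xed\xa0\x80 \xff"  # valid text plus ill-formed bytes
text = decode_pua(raw)
assert encode_pua(text) == raw  # exact round-trip

# A conforming peer sees only well-formed UTF-8 ...
wire = text.encode("utf-8")
echoed = wire.decode("utf-8")  # ... and passes it back unchanged
assert encode_pua(echoed) == raw
```

  The peer never has to understand the reserved characters: to
  it they are ordinary private-use code points in well-formed
  UTF-8, and as long as it passes them through unchanged,
  `encode_pua` restores the original bytes.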
Received on Thu Nov 16 2006 - 16:21:18 UTC
