[r6rs-discuss] perhaps i should be formal, but....

From: Shiro Kawai <shiro>
Date: Wed Mar 14 17:29:56 2007

From: William D Clinger <will_at_ccs.neu.edu>
Subject: Re: [r6rs-discuss] perhaps i should be formal, but....
Date: Wed, 14 Mar 2007 16:16:11 -0400

> I am posting this as an individual member of the Scheme
> community. I am not speaking for the R6RS editors.
>
> Thomas Lord wrote:
> > Earlier revisions of the standard defined a portable character set,
> > allowing implementations to freely expand beyond that set.
> > In a portable program, if only the portable character set is
> > used, reliably portable behavior obtains.
>
> What's different now is that Unicode has become an
> established standard, and the portability advantages
> of requiring Scheme programs to use Unicode (which
> is more than just a character set) appear far larger
> than any advantages that might still be derived from
> allowing implementations and programs to choose their
> own character sets.

I also want to keep open the possibility of using different
character sets / encodings, but I agree that Unicode is
the only practical standard for portable programs. I'm
happy as long as R6RS does not prohibit an implementation
from using alternative character sets / encodings if it wishes.

  For example, Japan's official family registration system
  uses its own character set and code points, since it needs
  to distinguish subtler differences between characters than
  Unicode does. (There's no clear line between abstract
  characters and glyphs; it is context-dependent, and for
  family names the line gets closer to glyphs.)

The range restriction of integer->char seems reasonable
as a way to guarantee portable behavior. An implementation
can provide another procedure that deals with a non-Unicode
range / character set.
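
To make the range restriction concrete: the Unicode scalar
values are the code points 0 through #xD7FF and #xE000 through
#x10FFFF, i.e. every code point except the surrogates. Below
is a minimal sketch of such a checked conversion, written in
Python only for illustration. The function names are
hypothetical (the second borrows the naming idea from this
message) and are not part of R6RS or any implementation.

```python
# Illustrative sketch only; these names are assumptions, not R6RS API.
MAX_CODE_POINT = 0x10FFFF
SURROGATES = range(0xD800, 0xE000)  # U+D800..U+DFFF are not scalar values


def is_unicode_scalar_value(n):
    """Return True if n is an integer in the Unicode scalar value range."""
    return (
        isinstance(n, int)
        and 0 <= n <= MAX_CODE_POINT
        and n not in SURROGATES
    )


def unicode_scalar_value_to_char(n):
    """Convert a scalar value to a character, rejecting everything else."""
    if not is_unicode_scalar_value(n):
        raise ValueError(f"not a Unicode scalar value: {n!r}")
    return chr(n)
```

An implementation with an extended character set could offer
a separate, non-portable conversion next to this one, leaving
the portable procedure restricted to scalar values.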

I do feel it would be better if the standard used clearer
names for the integer/character conversions, such as
unicode-scalar-value->char (or some abbreviation of it), which
makes it clear that one can't pass a value that is not a
Unicode scalar value. This isn't a strong desire, though.

The wording of the "character" object definition, however,
could be changed. It is unclear to me whether (char? <obj>)
can return #t if <obj> is in the implementation's extended
character set but not in Unicode. If it can't, I can still
provide (extended-char? <obj>), for example, but it's a bit
awkward. The same goes for other procedures that deal with
characters: in (string <char> ...), must each <char> be in
Unicode? (I hope not!)

--shiro
Received on Wed Mar 14 2007 - 17:29:25 UTC
