[R6RS] Unicode SRFI - responses needed

Tue Jul 19 12:12:57 EDT 2005

Here's a summary of the SRFI feedback so far, from my perspective:

 * Jorgen Schaefer points out that Unicode defines a case-folding
   mapping, which we should use for -ci operations instead of
   downcasing. I think he's clearly right; I overlooked this mapping
   before.

 * Alex Shinn and others have convinced me that the SRFI should include
   `string-upcase', `string-downcase', and `string-titlecase', which are
   locale-independent but use the mappings in "SpecialCasing.txt" to
   handle a few conversions that are not 1-to-1 in scalar values.

   Along the same lines, the string -ci operations should incorporate
   the non-1-to-1 mappings in CaseFolding.txt. (This file provides
   specific information to use for both 1-to-1 mappings and non-1-to-1
   mappings.) After case-folding, the string comparisons should proceed
   by comparing characters (i.e., scalar values); they should not use
   the Unicode collation algorithm.

   Finally, we probably want case-folding operations `char-foldcase'
   and `string-foldcase'.

   To me, all of this strikes the right balance between usefulness and
   ease of implementation.

 * See

     http://srfi.schemers.org/srfi-75/mail-archive/msg00084.html

   where I attempt to reply to most other messages that suggest more
   exotic definitions of "character", a weaker definition of
   "character", or a different set of core operations.

   The message content is merely my opinion, but I did my best to
   reflect the opinion of the editors as a group. Whether I got it
   right is something we should discuss further.

   An ongoing point of discussion among a handful of people (not
   including me) is whether R6RS should include any character
   comparison or conversion operations. I still think it should, but we
   should discuss this specifically before putting out a new draft.

 * The reception of here strings is mixed. I would be happy to see
   them go, at this point, just to keep things simpler. In any case,
   I'd like to get a sense of the editors' opinion before producing a
   second draft.

 * An open question: are Scheme implementations required to support all
   Unicode scalar values, or are subsets ok? I think we discussed this,
   and we noted that the set of characters that fit into the 16-bit
   space is closed under various operations (I'll double-check this).
   Few other natural subsets are closed, apparently.

   I'm inclined to require the full set of Unicode characters, and let
   implementations declare that they deviate from the standard when
   they support only subsets. That way, a library implementor can say
   "this library works in Scheme", instead of "this library works in
   a Scheme that supports all Unicode characters". Meanwhile, other
   libraries might be annotated "this library works even with
   variations of Scheme that support only ASCII characters" when the
   library implementor cares and has given the question some thought.

   This question is closely related to whether R6RS nails down the
   definition of character and supplies various operations at all. Many
   appeal to the way that numbers are handled in R5RS to support the
   idea that R6RS's requirements should be minimal. Since I see our
   role as strengthening portability wherever possible, I don't agree
   with this line of reasoning.
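To make the case-folding points in the summary concrete, here is a small sketch. It is in Python rather than Scheme. It assumes that Python's built-in `str.casefold` (full case folding, e.g. "ß" to "ss") and `str.upper` (full case mappings, e.g. "ß" to "SS") behave closely enough like the proposed `string-foldcase` and `string-upcase` for illustration. The helper names below are hypothetical, not proposed SRFI names. The sketch shows that downcasing misses equivalences that case folding catches, and that comparison after folding is plain scalar-value comparison.

```python
# Sketch: case-insensitive comparison by full Unicode case folding,
# followed by plain scalar-value (code point) comparison, with no collation.
# Python built-ins are assumed stand-ins for the proposed Scheme procedures;
# all helper names here are hypothetical.

def string_foldcase(s: str) -> str:
    # str.casefold applies full (possibly non-1-to-1) case folding,
    # e.g. "ß" -> "ss".
    return s.casefold()

def string_ci_equal(a: str, b: str) -> bool:
    # Fold both sides, then compare scalar values.
    return string_foldcase(a) == string_foldcase(b)

def string_ci_less(a: str, b: str) -> bool:
    # Python orders str values lexicographically by code point.
    return string_foldcase(a) < string_foldcase(b)

def downcase_equal(a: str, b: str) -> bool:
    # The approach being replaced: downcase both sides, then compare.
    return a.lower() == b.lower()

if __name__ == "__main__":
    # A non-1-to-1 conversion: 6 characters become 7.
    print("Straße".upper())                       # STRASSE
    print(string_foldcase("Straße"))              # strasse

    # Downcasing misses these equivalences; case folding catches them.
    print(downcase_equal("Straße", "STRASSE"))    # False
    print(string_ci_equal("Straße", "STRASSE"))   # True
    print(downcase_equal("\u03c2", "\u03c3"))     # False (final vs. medial sigma)
    print(string_ci_equal("\u03c2", "\u03c3"))    # True
```

Because `string_ci_less` folds first and then compares code points, "apple" sorts before "BANANA" even though 'B' (U+0042) precedes 'a' (U+0061) in raw scalar-value order.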

Response items:

 * Does anyone doubt that we really want to pin down the definition of
   character as "Unicode scalar value"? (I still don't.)

 * Does anyone want to argue that supporting a subset of Unicode might
   count as standard-compliant? (I think that it's not necessary to
   allow this in the standard.)

 * Is anyone unhappy with slightly more complex string operations that
   take into account non-1-to-1 conversions? (I think I'm happy with
   this, and I'll implement it today to be sure.)

 * Who wants to keep character-based comparison and conversion
   operations? (I do.)

 * How many editors want to keep here strings? How many would prefer to
   see them go? (I'm now inclined to get rid of them.)
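On the subset question, whether a subset can be supported consistently depends on whether it is closed under the case operations. The following Python sketch checks whether the scalar values below some bound map only to scalar values below that bound. It uses Python's built-in case mappings as an assumed stand-in for the SRFI's, and `find_escape` and `is_closed` are hypothetical helpers. For example, ASCII is closed under upcasing, but Latin-1 is not: "ÿ" (U+00FF) upcases to "Ÿ" (U+0178), and "µ" (U+00B5) upcases to Greek capital mu (U+039C).

```python
# Sketch: is the set of scalar values below `bound` closed under a case
# mapping? Python's str.upper, str.lower, and str.casefold are assumed
# stand-ins for the proposed Scheme operations; the helpers are hypothetical.
from typing import Callable, Optional, Tuple

Mapping = Callable[[str], str]

def is_scalar_value(cp: int) -> bool:
    # Surrogate code points (U+D800..U+DFFF) are not Unicode scalar values.
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF

def find_escape(bound: int, mapping: Mapping) -> Optional[Tuple[int, str]]:
    """Return (cp, mapped) for the first scalar value cp < bound whose
    mapping contains a scalar value >= bound, or None if there is none."""
    for cp in range(bound):
        if not is_scalar_value(cp):
            continue
        mapped = mapping(chr(cp))
        # A non-1-to-1 result (e.g. "ß" -> "SS") is acceptable as long as
        # every character in it stays inside the subset.
        if any(ord(c) >= bound for c in mapped):
            return cp, mapped
    return None

def is_closed(bound: int, mapping: Mapping) -> bool:
    return find_escape(bound, mapping) is None

if __name__ == "__main__":
    for name, bound in [("ASCII", 0x80), ("Latin-1", 0x100)]:
        for op in (str.upper, str.lower, str.casefold):
            escape = find_escape(bound, op)
            status = "closed" if escape is None else f"escapes via U+{escape[0]:04X}"
            print(f"{name:8} {op.__name__:9} {status}")
```

Calling `is_closed(0x10000, op)` for each operation would perform the 16-bit check mentioned above. The answer depends on the Unicode version bundled with the Python build (see `unicodedata.unidata_version`), so no result is claimed here.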

Thanks,
Matthew