[R6RS] Procedures that depend on Unicode character classification
Matthew Flatt
mflatt at cs.utah.edu
Thu Jun 15 08:46:03 EDT 2006
At Wed, 14 Jun 2006 17:21:09 -0400, William D Clinger wrote:
> The Unicode general categories are represented by symbols
> in lower case, e.g. 'lu instead of 'Lu. Is this really
> what we intend for a case-sensitive R6RS?
That's what I intended, at least. I have no objection to using 'Lu.
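For what it's worth, folding the category names to lower case loses no information, since the two-letter abbreviations remain distinct either way. A minimal sketch in Python (not Scheme), assuming the standard unicodedata module; the helper name category_symbol and its fold parameter are hypothetical, for illustration only:

```python
import unicodedata

# The 30 two-letter General Category abbreviations, spelled as Unicode spells them.
GENERAL_CATEGORIES = (
    "Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Ps Pe Pi Pf Po "
    "Sm Sc Sk So Zs Zl Zp Cc Cf Cs Co Cn"
).split()


def category_symbol(ch, fold=False):
    """Return the general category of ch (hypothetical helper).

    With fold=True the name is lower-cased, mirroring the 'lu spelling;
    otherwise it keeps the Unicode spelling, as in 'Lu.
    """
    name = unicodedata.category(ch)
    return name.lower() if fold else name


# Lower-casing never makes two category names collide.
folded = {name.lower() for name in GENERAL_CATEGORIES}
names_stay_distinct = len(folded) == len(GENERAL_CATEGORIES)
```

So the choice between 'lu and 'Lu is purely a matter of spelling convention.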
> The description of string-foldcase talks of "cased characters",
> which I assume to be Unicode general categories Lu, Ll, and Lt,
Yes.
> but it also talks about "case-ignorable characters". What are
> they?
"Case-ignorable" is defined by Unicode:
A character C is defined to be case-ignorable if C has the Unicode
Property Word_Break=MidLetter as defined in Unicode Standard Annex
#29, "Text Boundaries;" or the General Category of C is Nonspacing
Mark (Mn), Enclosing Mark (Me), Format Control (Cf), Letter Modifier
(Lm), or Symbol Modifier (Sk).
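The two classifications above might be sketched as follows, in Python rather than Scheme and under stated assumptions. "Cased" follows the reading confirmed in this thread (Lu, Ll, Lt). Python's unicodedata module does not expose the Word_Break property, so MIDLETTER_SAMPLE is only an illustrative two-character subset, not the full MidLetter set; a real implementation would need the data from Annex #29. All names here are hypothetical.

```python
import unicodedata

# "Cased" as read in this thread: general categories Lu, Ll, and Lt.
CASED_CATEGORIES = frozenset({"Lu", "Ll", "Lt"})

# General categories named in the case-ignorable definition.
CASE_IGNORABLE_CATEGORIES = frozenset({"Mn", "Me", "Cf", "Lm", "Sk"})

# ASSUMPTION: an illustrative subset of Word_Break=MidLetter only
# (MIDDLE DOT and HYPHENATION POINT); the full set comes from UAX #29.
MIDLETTER_SAMPLE = frozenset({"\u00B7", "\u2027"})


def is_cased(ch):
    """True if ch is in general category Lu, Ll, or Lt."""
    return unicodedata.category(ch) in CASED_CATEGORIES


def is_case_ignorable(ch, midletter=MIDLETTER_SAMPLE):
    """True if ch is in the supplied MidLetter set or in Mn, Me, Cf, Lm, or Sk."""
    return ch in midletter or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES
```

For example, is_case_ignorable("\u0301") holds because COMBINING ACUTE ACCENT is in Mn, while MIDDLE DOT (category Po) qualifies only through the MidLetter set.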
> The current draft of SRFI 75 says the char-alphabetic?,
> char-numeric?, and char-whitespace? predicates are as
> defined by SRFI-14, but SRFI 14 doesn't define those
> predicates; that SRFI explicitly warns that those
> procedures "may or may not be in agreement with the
> SRFI 14 base character sets" char-set:letter,
> char-set:digit, and char-set:whitespace. I presume
> the intent is for those predicates to be in agreement.
Yes, that was the intent.
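A rough sketch of what "in agreement" could look like, in Python rather than Scheme. The category lists are an assumption based on my reading of the SRFI 14 definitions (letter = Lu, Ll, Lt, Lm, Lo; digit = Nd; whitespace = Zs, Zl, Zp plus U+0009 through U+000D) and should be checked against the SRFI text; the function names are hypothetical stand-ins for the Scheme predicates.

```python
import unicodedata

# ASSUMPTION: category lists as I understand the SRFI 14 base character sets.
LETTER_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo"})
DIGIT_CATEGORIES = frozenset({"Nd"})
SPACE_CATEGORIES = frozenset({"Zs", "Zl", "Zp"})


def char_alphabetic(ch):
    """Stand-in for char-alphabetic?, agreeing with char-set:letter."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def char_numeric(ch):
    """Stand-in for char-numeric?, agreeing with char-set:digit."""
    return unicodedata.category(ch) in DIGIT_CATEGORIES


def char_whitespace(ch):
    """Stand-in for char-whitespace?, agreeing with char-set:whitespace."""
    # U+0009..U+000D are control characters (Cc), so they are listed explicitly.
    return unicodedata.category(ch) in SPACE_CATEGORIES or "\u0009" <= ch <= "\u000D"
```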
Matthew