Peter Gavin scripsit:
> In addition, I would like to note that the Unicode standard is
> versioned; it would definitely be possible to pin R6RS to a specific
> version of Unicode.
IMHO this is a Bad Thing, and I'll address part of that issue in a
formal comment later.
> I do, however, dislike the case folding of identifiers. Case folding
> is not a reversible operation, and for instance, if an implementation
> wants to print a backtrace after a failure of some sort, it would not
> be possible to grep for the function name provided in the backtrace.
"grep -i" is your friend there, but I agree that case-folding is not
worth the trouble it causes. In particular, it folds away semantic
distinctions in certain languages, as with the Masse/Maße issue, both
of which case-fold to "masse". (For that matter, there's "polish" and
"Polish" in English.)
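
To make the collision concrete, here is a small sketch in Python, whose str.casefold() applies Unicode full case folding. The variable names are mine and purely illustrative; nothing here is R6RS behaviour.

```python
# Full Unicode case folding maps "ß" to "ss", so two distinct German
# words ("mass" and "measurements") collapse into one identifier.
words = ["Masse", "Maße"]
folded = [w.casefold() for w in words]
print(folded)

# The English pair differs only in case, so folding merges it as well.
english = {w.casefold() for w in ["polish", "Polish"]}
print(english)
```
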
> Also, the same program could refer to the same identifier using
> different byte sequences, possibly of different lengths.
Eliminating case-folding doesn't necessarily eliminate that.
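
One way this can happen, and I am assuming it is the main one, is canonically equivalent sequences: a precomposed letter versus a base letter plus a combining mark. A minimal Python sketch, with an example word of my own choosing:

```python
import unicodedata

composed = "caf\u00e9"      # é as the single code point U+00E9
decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# The two strings display identically but are different code point sequences.
differ_as_written = composed != decomposed

# Their UTF-8 encodings even have different lengths.
utf8_lengths = (len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))

# Normalizing (here to NFC) is what makes them compare equal, not case handling.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(differ_as_written, utf8_lengths, same_after_nfc)
```
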
> Another reason I disagree with case folding is that case folding
> is not necessarily a fast operation, and it may slow down the
> reader considerably. (Or it may not, I suppose it depends on the
> implementation.)
The Unicode case-folding table has 1037 entries (discarding the two
Turkic-specific ones), so an unrolled binary search would take about 10
extra comparisons. My guess is that this would be lost in the noise.
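
A rough sketch of what I mean, in Python. The five-entry table is a hand-picked stand-in for the real one, and fold_char/fold are hypothetical names; a real reader would generate the table from the Unicode data files. The 1037 figure is the one quoted above and will differ in other Unicode versions.

```python
from bisect import bisect_left

# Tiny illustrative excerpt of a folding table, sorted by code point.
FOLD_KEYS = ["A", "Z", "\u00c4", "\u00dc", "\u00df"]    # A, Z, Ä, Ü, ß
FOLD_VALUES = ["a", "z", "\u00e4", "\u00fc", "ss"]      # a, z, ä, ü, ss

def fold_char(ch):
    """Return the folded form of ch, or ch itself if it has no table entry."""
    i = bisect_left(FOLD_KEYS, ch)
    if i < len(FOLD_KEYS) and FOLD_KEYS[i] == ch:
        return FOLD_VALUES[i]
    return ch

def fold(text):
    """Fold a whole string one character at a time."""
    return "".join(fold_char(ch) for ch in text)

# Worst-case probes for binary search over n sorted entries is
# floor(log2(n)) + 1, which for a positive integer n equals n.bit_length().
worst_case_probes = (1037).bit_length()
print(fold("AZ\u00df"), worst_case_probes)
```

So the worst case is 11 probes per character that misses any fast path, which is in line with "about 10".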
> One possible compromise, solely for the purpose of maintaining backwards
> compatibility with R5RS: any character in the set of uppercase Latin
> alphabetic characters (ASCII 0x41-0x5A) is interchangeable with its
> lowercase version (ASCII 0x61-0x7A), and vice-versa. Every other
> Unicode character is kept as-is, and no other case manipulation occurs.
An early version of internationalized DNS proposed that, and it was rejected
on the grounds of mental complexity. The situation is especially bad
in those national alphabets that use both basic Latin and other letters:
"Mutter" and "MUTTER" would both case-fold to "mutter", but "mütter" and
"MÜTTER" would case-fold to "mütter" and "mÜtter" respectively.
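
The proposed rule can be sketched in a few lines of Python to show the effect. ascii_only_fold is a hypothetical name for the compromise as I read it, and str.casefold() is shown alongside only for comparison.

```python
def ascii_only_fold(text):
    """Lowercase only A-Z (0x41-0x5A); leave every other character untouched."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch
        for ch in text
    )

# Compare the ASCII-only rule with full Unicode case folding.
for word in ["Mutter", "MUTTER", "mütter", "MÜTTER"]:
    print(word, "->", ascii_only_fold(word), "| full:", word.casefold())
```

Under the ASCII-only rule the first pair names one identifier and the second pair names two, which is exactly the kind of surprise that got the DNS proposal rejected.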
--
John Cowan http://ccil.org/~cowan cowan_at_ccil.org
Economists were put on this planet to make astrologers look good.
--Leo McGarry
Received on Wed Nov 15 2006 - 09:48:14 UTC