[r6rs-discuss] [Formal] Scheme should not be changed to be case sensitive. from Peter Gavin on 2006-11-15 (r6rs-discuss.mbox)

From: Peter Gavin <pgavin>
Date: Wed Nov 15 13:25:10 2006

On 11/15/06, John Cowan <cowan_at_ccil.org> wrote:
> Peter Gavin scripsit:
> > Also, the same program could refer to the same identifier using
> > different byte sequences, possibly of different lengths.
>
> Eliminating case-folding doesn't necessarily eliminate that.
>

For argument's sake, suppose the report required all implementations
to behave as if strings and characters were internally represented as
UCS-4. Could you give me an example where two strings would be
string=? yet be represented by different byte sequences?
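
To make the question concrete, here is a minimal sketch in Python (not
from the thread). It assumes string=? compares code point by code point,
which the thread does not spell out, and ucs4_bytes is a hypothetical
helper standing in for the UCS-4 representation. Under that assumption
equal strings always have equal bytes; different byte sequences only show
up if equality is defined over some normalization, such as NFC.

```python
import unicodedata

def ucs4_bytes(s: str) -> bytes:
    """Encode a string as big-endian UCS-4 (UTF-32) with no byte-order mark."""
    return s.encode("utf-32-be")

composed = "\u00e9"      # U+00E9, e with acute as a single code point
decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute)

# Code-point comparison: the strings differ, and so do their encodings.
same_code_points = composed == decomposed
same_bytes = ucs4_bytes(composed) == ucs4_bytes(decomposed)

# Comparison after NFC normalization: the strings compare equal,
# even though the original byte sequences have different lengths.
same_after_nfc = (unicodedata.normalize("NFC", composed)
                  == unicodedata.normalize("NFC", decomposed))
byte_lengths = (len(ucs4_bytes(composed)), len(ucs4_bytes(decomposed)))
```

Whether a normalizing comparison is what was meant by "different byte
sequences" is a guess; the sketch only shows where such a difference
could come from.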

> > Another reason I disagree with case folding is that case folding
> > is not necessarily a fast operation, and it may slow down the
> > reader considerably. (Or it may not, I suppose it depends on the
> > implementation.)
>
> The Unicode case-folding table has 1037 entries (discarding the two
> Turkic-specific ones), so an unrolled binary search would take about 10
> extra comparisons. My guess is that this would be lost in the noise.
>

Good point.
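
The estimate can be checked with a rough sketch in Python. The table
below is a stand-in of 1037 sorted integers, not the real Unicode
case-folding data, and count_probes is a hypothetical helper. Each probe
in this version may cost up to two comparisons, so it is only meant to
confirm the order of magnitude.

```python
import math

def count_probes(table, key):
    """Binary-search a sorted list; return (found, number of probes)."""
    lo, hi = 0, len(table)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if table[mid] == key:
            return True, probes
        if table[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return False, probes

TABLE_SIZE = 1037  # entry count quoted in the message above

# Stand-in keys (even numbers), so odd numbers exercise the "not found" path.
table = list(range(0, 2 * TABLE_SIZE, 2))

# Worst case over every hit and every miss.
worst = max(count_probes(table, k)[1] for k in range(-1, 2 * TABLE_SIZE + 1))

# Theoretical bound for binary search over n entries: ceil(log2(n + 1)).
bound = math.ceil(math.log2(TABLE_SIZE + 1))
```

Both worst and bound come out to 11 probes for this stand-in table, in
line with the "about 10" figure.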

> > One possible compromise, solely for the purpose of maintaining backwards
> > compatibility with R5RS: any character in the set of uppercase Latin
> > alphabetic characters (ASCII 0x41-0x5A) is interchangeable with its
> > lowercase version (ASCII 0x61-0x7A), and vice-versa. Every other
> > Unicode character is kept as-is, and no other case manipulation occurs.
>
> An early version of internationalized DNS proposed that, and was rejected
> on the grounds of mental complexity. The situation is especially bad
> in those national alphabets that use both basic Latin and other letters:
> "Mutter" and "MUTTER" would both case-fold to "mutter", but "mütter" and
> "MÜTTER" would case-fold to "mütter" and "mÜtter" respectively.
>

Good point here as well. I would argue that the "mental complexity"
argument applies to case folding in general. A case-sensitive
Scheme would be easiest to grok, especially for beginning coders and
people unfamiliar with Unicode. Case folding in Unicode is nowhere
near as simple as case insensitivity in ASCII.
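
As an illustration of both points, here is a small Python sketch.
ascii_fold is a hypothetical function implementing the ASCII-only
compromise quoted above; str.casefold is Python's built-in full Unicode
case folding, used here only for comparison and not as a proposal for
what a Scheme reader should do.

```python
def ascii_fold(s: str) -> str:
    """Lowercase only A-Z (0x41-0x5A); leave every other character as-is."""
    return "".join(
        chr(ord(c) + 0x20) if "A" <= c <= "Z" else c
        for c in s
    )

# "Mutter", "MUTTER", "Mütter", "MÜTTER" (escapes keep the source ASCII-only)
words = ["Mutter", "MUTTER", "M\u00fctter", "M\u00dcTTER"]

# ASCII-only folding: the two umlaut spellings stay distinct.
ascii_folded = [ascii_fold(w) for w in words]

# Full Unicode folding: the two umlaut spellings collapse to one identifier.
full_folded = [w.casefold() for w in words]

# Full folding can also change length: sharp s (U+00DF) folds to "ss".
strasse = "Stra\u00dfe"
length_before_after = (len(strasse), len(strasse.casefold()))
```

With the ASCII-only rule, "MÜTTER" becomes "mÜtter" and no longer matches
"mütter", while full folding maps both to "mütter" and turns the
six-character "Straße" into the seven-character "strasse".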

Pete
Received on Wed Nov 15 2006 - 13:25:05 UTC
