[r6rs-discuss] Comparing Strings

From: MichaelL_at_frogware.com <MichaelL>
Date: Wed Mar 14 19:39:37 2007

(I started a thread on character and string comparisons some time ago, but
then dropped it when I was home sick for a few days. My original point was
that I thought it would be surprising that, for example, string<? was
written in terms of char<? but string-ci<? was *not* written in terms of
char-ci<?. Under the hood, the issue is that string-foldcase isn't
written in terms of char-foldcase; the ordering algorithm used by the two
ci functions is the same.)

==================================================

> > I would expect the stricmp-equivalent variants of string comparison to use
> > the algorithm that characters use rather than the one it currently uses.
>
> Nobody should use that algorithm, ever.

My point is about consistency. Right or wrong, R6RS specifically defines
string<? in terms of char<?. Given that, it would be reasonable to assume
that string-ci<? would be defined in terms of char-ci<?, but it isn't. I
think that will cause unnecessary confusion. If nothing else, the
confusion could be reduced by using different names for the things that
work differently.
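
To make the mismatch concrete, here is a small sketch. It assumes an
implementation that follows the published R6RS (rnrs unicode) library, where
char-foldcase is a one-to-one mapping and string-foldcase may change the
length of a string. The names sharp-s and strasse are just illustrative
bindings; behavior under the drafts being discussed may differ.

```scheme
(import (rnrs))

;; U+00DF (sharp s) is written with hex escapes to avoid encoding problems.
(define sharp-s #\xDF)
(define strasse "Stra\xDF;e")

;; Character folding is one-to-one, so sharp s maps to itself.
(display (char=? (char-foldcase sharp-s) sharp-s)) (newline)   ; #t

;; String folding may expand a character, here to "ss".
(display (string-foldcase strasse)) (newline)                  ; strasse

;; The string comparison therefore sees a match that no per-character
;; comparison could: the two strings do not even have the same length.
(display (string-ci=? strasse "STRASSE")) (newline)            ; #t
(display (char-ci=? sharp-s #\s)) (newline)                    ; #f
```

So string-ci=? cannot be read as "char-ci=? applied position by position",
even though string=? can be read that way with char=?.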

I was making three suggestions.

First, change the names of the current string case folding functions (and
the case-insensitive comparisons that depend on them) to break the naming
link with the char functions. For example:

        ; li = "locale-independent"?
        string-li-downcase
        string-li-upcase

        string-li-ci<?

(There would be no corresponding char functions.)

Second, use the current names to create functions that honor the
expectations:

        string-downcase ; uses char-downcase
        string-upcase ; uses char-upcase

        string-ci<? ; uses char-ci<?
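
As a rough sketch of what the second suggestion would mean, a comparison
built directly on the char functions might look like the following. The name
string-char-ci<? is hypothetical (chosen here only so it does not clash with
the existing string-ci<?), and this is one possible reading of the proposal,
not text from any draft.

```scheme
(import (rnrs))

;; Hypothetical helper, not part of R6RS: compares two strings character by
;; character with char-ci<? and char-ci=?, mirroring how string<? relates
;; to char<?.
(define (string-char-ci<? a b)
  (let ((la (string-length a))
        (lb (string-length b)))
    (let loop ((i 0))
      (cond
        ((= i lb) #f)                 ; b exhausted: a is not less than b
        ((= i la) #t)                 ; a is a proper prefix of b
        ((char-ci<? (string-ref a i) (string-ref b i)) #t)
        ((char-ci=? (string-ref a i) (string-ref b i)) (loop (+ i 1)))
        (else #f)))))

(display (string-char-ci<? "apple" "BANANA")) (newline)        ; #t
(display (string-char-ci<? "abc" "ABC")) (newline)             ; #f
;; Sharp s (U+00DF) is not expanded here, so it sorts after "s".
(display (string-char-ci<? "Stra\xDF;e" "strast")) (newline)   ; #f
```

With the folding-based string-ci<? of the published R6RS, the last comparison
should come out as #t instead, because "Stra\xDF;e" is first folded to
"strasse".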

Third, I was questioning the usefulness of a raw numeric ordering on a
string which had been case-folded using a locale-independent algorithm.
Does it really help that "ß" becomes "SS" before a *byte-level*
comparison?
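
A short illustration of the concern, again assuming the published R6RS
behavior in which string-ci<? folds both arguments and then compares them by
scalar value. The sample words are arbitrary.

```scheme
(import (rnrs))

;; Folding removes the case difference, so "Zebra" is not less than "apple".
(display (string-ci<? "Zebra" "apple")) (newline)        ; #f

;; What remains is a comparison by scalar value: "z" is U+007A and
;; e-acute is U+00E9, so "zebra" sorts before "\xE9;cole".
(display (string-ci<? "zebra" "\xE9;cole")) (newline)    ; #t

;; Sorting a small list shows the resulting order.
(define sorted
  (list-sort string-ci<? (list "zebra" "\xE9;cole" "Apple")))
(write sorted) (newline)
```

The result places the word beginning with e-acute after "zebra", which is
consistent and locale-independent but not what a dictionary ordering in any
particular language would give.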

These suggestions are independent of each other. Accepting one doesn't
necessarily mean accepting all. And rejecting one doesn't necessarily mean
rejecting all.
Received on Wed Mar 14 2007 - 19:39:06 UTC
