[r6rs-discuss] Scheme should not be changed to be case sensitive

From: Chris Hanson <cph>
Date: Tue Nov 21 01:33:11 2006

Shiro Kawai wrote:
> By prefix you mean some mark like #!fold-case? Well, it works,
> but the story can be complicated.
>
> For example, suppose I have a library that produces a string
> representation of S-expression, which may be stored in a record
> of RDBMS, or to be appended to a log file, or to be sent over
> the network. Since the library doesn't know how the resulting
> string will be used, should the library put the prefix in every
> string it produces? We may have an option to specify whether
> we need a prefix or not, but if the library is deep down in the
> layers of other libraries, passing around such an option explicitly
> is tedious.

> [...more reasons why this could be a problem...]

How is this different from a BOM? It seems to me that exactly the same
problems crop up.

> Suppose there will be a compatibility means so that you can use
> existing code without a problem. Now you have an option to
> redesign this feature of the language. According to the spirit
> of Scheme ("Language should be designed ..."), which one is simpler
> and cleaner, the one that has case-folding rule, and another that
> has not? Think about teaching it to students, or training new
> employees who have been programming in other popular languages.

This sounds like an argument for why Scheme should be case sensitive,
but I haven't argued against that (except to say that I'm unhappy about
it). Instead I'm arguing for a compromise that will allow the continued
use of case folding by those who like it, and will additionally provide
compatibility.

As for the "spirit of Scheme", I think you should be a little more
careful before invoking that particular genie. It appears in different
forms to different people, as was made painfully clear in the previous
standardization efforts.

> [...module system makes this moot since old files won't work...]

I don't buy that particular argument, precisely because it too breaks
backwards compatibility. The library spec will have to be fixed, and
I'll get to that in due course. But I'm not ready to have that argument
quite yet.

> It may be still a convenient means to tag the file to
> indicate which mode the file wants to be read. I've been
> using such kind of tag to indicate the character encoding
> of the file, and it is working very well (I'm working
> in the environment where the source files are written in
> utf-8, euc-jp, shift-jis, and latin-1).

I think this is exactly right. In fact, I'll go one step further:
Scheme files should have an extensible header that can specify this as
well as other things (e.g. character coding, language, locale). But
regardless of the details, I think the information about case
sensitivity should be associated with the file containing the symbols,
and not with the code that reads it.

So let me suggest an alternative compromise. Suppose we assume some
kind of marker that appears early in the file, prior to any symbol.
(Normally I would say the first line, but I can see that this might be
an issue for unix scripts.) The marker is not dynamic -- it's allowed
to appear once, in a very limited scope, and it's easy to tell whether
it's present or absent. Assume that there is a default behavior in the
absence of a marker -- I'll get to the issue of _what_ the default is
shortly. Also, by default, output files have no markers and conform to
the default behavior for input files.
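
To make the marker idea concrete, here is a minimal sketch in Python of a
reader that honors such a marker. Everything named in it is an assumption
for illustration only: the spelling "#!fold-case" comes from the quoted
text, "#!no-fold-case" is a made-up counterpart, the marker is assumed to
sit on a line of its own, and the tokenizer is a toy that ignores strings.
It is not a proposal for the actual syntax.

```python
import re

# Hypothetical marker spellings (assumptions, not part of any standard).
FOLD = "#!fold-case"
NO_FOLD = "#!no-fold-case"


def detect_case_mode(text, default_fold):
    """Return True if symbols in this file should be case-folded.

    Only the head of the file is examined: blank lines, ';' comments and
    a unix '#!/...' script line may precede the marker. The first line of
    ordinary code ends the search, so the marker is static, not dynamic.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue  # blank line or comment
        if stripped.startswith("#!/"):
            continue  # unix script interpreter line
        if stripped == FOLD:
            return True
        if stripped == NO_FOLD:
            return False
        break  # first real code seen: no marker present
    return default_fold


def read_symbols(text, default_fold=False):
    """Toy reader: return the tokens of a file, folded if the file says so."""
    fold = detect_case_mode(text, default_fold)
    symbols = []
    for line in text.splitlines():
        if line.strip().startswith("#!"):
            continue  # marker or script line, never a symbol
        code = line.split(";", 1)[0]  # drop trailing comment (toy rule)
        for token in re.findall(r"[^\s()]+", code):
            symbols.append(token.lower() if fold else token)
    return symbols


script = "#!/usr/bin/env scheme\n; header comment\n#!fold-case\n(define Foo 1)\n"
print(read_symbols(script))            # marker present: symbols are folded
print(read_symbols("(define Foo 1)"))  # no marker: the default applies
```

The default_fold parameter stands in for whatever default is eventually
chosen; a file that carries a marker reads the same way regardless of it.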

Now, assuming that the above suggestion is agreeable, we've reduced our
disagreement to the question of the default behavior. (I am by no means
certain that we agree on the above.) Of the available defaults, we can
choose a specific case, or we can leave this up to the individual
implementation. If a specific case is chosen as the default, then
strictly speaking only one kind of marker is needed: that for the other
case (but it does no harm to support both kinds). If the implementation
is allowed to choose, then there needs to be a marker for each case, so
that programmers can write portable code. My preference would be to
leave this choice to the implementation, since clearly there is room for
disagreement here, and the cost of allowing this freedom is small. You
seem to be arguing that the default should be case sensitivity. I don't
think anyone has argued for case insensitivity, although I can imagine
at least two such arguments.

I'll refrain from saying more about the specifics until I have a better
idea whether any of this resonates with you and others.

On a slightly different note: one thing that worries me in this
discussion is that several of the participants seem to think that case
sensitivity is the obvious choice, and that those in favor of case
insensitivity bear the burden of proof. I am relatively agnostic about
the technical choice (except as it concerns compatibility), but I feel
strongly that the choice is anything but obvious.

We are having a discussion about making a break with over 40 years of
Lisp history (and nearly 30 years of Scheme history), and for some
reason this doesn't seem to be a part of the discussion. What is it
that makes this language a dialect of Lisp? Is case insensitivity an
essential feature of Lisp? (For that matter, what about mutable pairs?)
Is this a gradual process that will converge on something very
different from what we have called Lisp, and if so why isn't it a new
language? I don't know the answers to these questions, but I think it's
a mistake to omit them from the discussion.

Received on Tue Nov 21 2006 - 01:33:04 UTC