[r6rs-discuss] [Formal] String positions and string slices

From: William D Clinger <will>
Date: Tue, 10 Apr 2007 13:12:03 -0400

I am posting this as an individual member of the Scheme
community. I am not speaking for the R6RS editors, and
this message should not be confused with the editors'
eventual formal response.

This message is a collection of responses to comments
made by two different authors during the past 24 hours.
When the author of a quotation is not identified, it is
the same as the author of the previous quotation.

John Cowan wrote:
> > Quibble: I think the historical view of strings should be
> > continued for backwards compatibility with Scheme tradition.
>
> In that case, you also have to make characters something other than
> Unicode scalar values, or else go to very tricky implementations.
> I'm trying to break as little of R5.92RS as possible.

The character and string API described in the current
draft of R6RS will not break any portable R5RS code.
Your claim that backwards compatibility requires
characters to be "something other than Unicode scalar
values" or "very tricky implementations" is false, as
shown by several implementations that already implement
characters as Unicode scalar values and several other
implementations that will do so within months.
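
For readers unfamiliar with the term, here is a minimal sketch of what "Unicode scalar value" means for character codes. The procedure names scalar-value? and round-trips? are hypothetical, chosen only for illustration, and the import uses the library name from the final R6RS, which is an assumption relative to the draft under discussion.

```scheme
(import (rnrs))

;; A Unicode scalar value is any code point from 0 to #x10FFFF
;; that is not in the surrogate range #xD800 to #xDFFF.
;; (scalar-value? is a hypothetical helper, not an R6RS procedure.)
(define (scalar-value? n)
  (and (integer? n)
       (exact? n)
       (or (<= 0 n #xD7FF)
           (<= #xE000 n #x10FFFF))))

;; For any scalar value, integer->char and char->integer are inverses.
;; Call this only with arguments that satisfy scalar-value?.
(define (round-trips? n)
  (= n (char->integer (integer->char n))))
```

Code that only converts characters to integers and back, or compares characters with char=? and char<?, behaves the same under this definition as it did with a narrower character range.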

> I think the only way to make this fly is to introduce a CL-style
> distinction between characters (in texts) and basic characters
> (in strings), though keeping the name "character".

The question is why you think that.

> Alternatively,
> we could say that strings are sequences of Scheme character objects,

"Strings are sequences of characters." (Opening sentence
of section 9.14 of the current draft R6RS.)

> but the
> atomic unit of texts is a text containing a single (Unicode) character.

That would be fine. The new data type is unconstrained
by the old.

> However, I'd be pretty unhappy with doubling up like this. There is
> little or nothing that strings can do that texts cannot.

Aside from mutation if texts are immutable, or efficiency
if you somehow manage to design an inherently inefficient
data type of texts.

> For that matter, you can implement R5RS strings as vectors,
> provided you are allowed to redefine "vector?". If strings
> are massively more efficient than texts, though, people will
> go on using them.

The strings described by the current draft R6RS are quite
efficient for the traditional uses of strings in Scheme.
It would not be hard to design a text type that is slow
enough to encourage people to continue to use strings.
I would hope, however, that the designers of the text
type would try to avoid that outcome.

Chris Hanson wrote:
> The
> historical string abstraction should remain vectors of 8-bit (or 7-bit)
> characters with side effects.

Neither the IEEE/ANSI standard nor the Scheme reports
limited characters to 7 or 8 bits. Many implementations
did so, but any code that relied upon such a limitation
was never portable.
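
The distinction can be sketched as follows. Both procedures are hypothetical examples written for this illustration, and the import again assumes the library name from the final R6RS. The first bakes in an 8-bit assumption; the second uses only operations that the reports always specified.

```scheme
(import (rnrs))

;; Non-portable: assumes (char->integer c) is always below 256.
;; Neither the Scheme reports nor IEEE/ANSI Scheme guaranteed that bound.
(define (seen-table s)
  (let ((seen (make-vector 256 #f)))
    (let loop ((i 0))
      (if (< i (string-length s))
          (begin
            (vector-set! seen (char->integer (string-ref s i)) #t)
            (loop (+ i 1)))
          seen))))

;; Portable: relies only on string-length, string-ref, and char=?,
;; so it does not care how wide a character code is.
(define (count-char s c)
  (let loop ((i 0) (n 0))
    (cond ((= i (string-length s)) n)
          ((char=? (string-ref s i) c) (loop (+ i 1) (+ n 1)))
          (else (loop (+ i 1) n)))))
```

For example, (count-char "a\x3BB;b\x3BB;" #\x3BB) evaluates to 2, whereas seen-table applied to the same string would index past the end of its 256-entry table.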

> Or we could say that strings contain only a small subset, e.g. ISO
> 8859-1 or US-ASCII.

That would be a more radical departure from past reports
than is taken by the current draft R6RS.

> > In my opinion, texts should be written up as a SRFI, and
> > then be considered for inclusion in the R7RS.
>
> <flame>
>
> Yes, but the same could be said for many of the experiments currently
> being pushed into R6RS. Speaking only for myself and some
> as-yet-unidentified historical brethren, I would be much happier with a
> less radical and more evolutionary document. Why exactly is it
> necessary to change **everything** now? Either this process works, in
> which case there will be further revisions. Or it doesn't, in which
> case it doesn't matter.

That is a valid criticism of much of the current draft,
but is not a valid criticism of its treatment of characters
and strings. In the current draft, characters and strings
are backwards compatible with the R5RS in the sense that
no portable R5RS-conforming program will have to change
its handling of characters and strings.

Furthermore, the current draft's handling of characters
and strings was first put forward as SRFI 75 in July 2005,
which received the benefit of the usual SRFI discussion and
at least one actual implementation before it was withdrawn
in May 2006 to clear the way for a revision of it in the
first draft R6RS. You can't call it experimental when
people have been using it in production code since December
2005.

> The editors should be interested in trying to make this document a success, rather
> than in packing it with all these new things. A more conservative
> document stands a much better chance of ratification and implementation.

Agreed, except that characters and strings are not new in
Scheme, none of the character and string procedures described
in (r6rs base) are new in Scheme, and the few additional
character and string procedures in (r6rs unicode) can't
break any existing code. You can't get much more
conservative than that.

> That's important, because the way things are going I am very skeptical
> that R6RS will be implemented.

If the R6RS is ratified, it will be implemented. Whether
the R6RS will be ratified is, however, an open question.

> Well, except for things like XML libraries that won't work with strings,
> or the fact that SYMBOL->STRING will signal an error on any symbol
> containing a character outside the subset.

Just to clarify: libraries written in other languages
have never interoperated with Scheme strings without
marshalling or dependence on implementation-specific
representations. Furthermore, Chris Hanson may be the
only person who thinks symbol->string should "signal
an error on any symbol containing a character outside"
Hanson's preferred set of characters. Neither can be
blamed on the current draft R6RS.

Will
Received on Tue Apr 10 2007 - 13:12:03 UTC
