[r6rs-discuss] [Formal] String positions and string slices

From: William D Clinger <will>
Date: Mon, 09 Apr 2007 19:12:01 -0400

I am posting this as an individual member of the Scheme
community. I am not speaking for the R6RS editors, and
this message should not be confused with the editors'
eventual formal response.

Chris Hanson wrote:
> I think John is right about this, although maybe he hasn't gone far
> enough.

Agreed.

> Historically, strings were treated equivalently to byte
> vectors, but this is no longer an appropriate view. We should be
> thinking of strings as pieces of text that can be manipulated by
> slicing, concatenation, searching, and a very small number of
> transformations (e.g. case canonicalization).

Quibble: I think the historical view of strings should be
continued for backwards compatibility with Scheme tradition.
In particular, eliminating string-set! and its mutating
friends would break backward compatibility and make the
transition from R5RS to R6RS even more difficult and less
acceptable to the conservative elements of our community.
(Whether I am myself one of those elements is left as an
exercise for the reader.)

I agree, however, that we should think less about Scheme's
traditional strings and more about a new data type that
represents "pieces of text that can be manipulated by
slicing, concatenation, searching, and a very small number
of transformations (e.g. case canonicalization)."

> Given this view, I think that strings should be read-only,

These new text-y things, which we might call texts, should
be read-only. Making all strings read-only, as I advocated
at Brandeis (but gave in to Chris Hanson and others who were
arguing for mutable strings), would break too much code for
us to make that change now.

> However, in the interest of backwards compatibility, I'd like to
> see names that don't conflict with those in common use.

Exactly. I would have no problem with John Cowan's proposal
if he were proposing new immutable data types of texts and
positions. That would eliminate invalidation by string-set!,
eliminate the aliasing problems of his proposal, and eliminate
a couple of ambiguities in his specification of string-slice.

Here is a brief outline of what John could have proposed had
he been willing to make texts and positions into new types.
All big-oh requirements refer to amortized average time.

string?
string
make-string
string-length
string-ref
string-set!

As in the current draft. The make-string procedure should run
in O(n) time; the others should run in O(1) time.

(text? obj)

True of texts; false of strings and everything else. Should
run in O(1) time.

(make-text string)

Should run in O(n) time; should run in O(1) time if the string
is immutable.

(position? obj)

True of positions; false of integers etc. Should run in O(1)
time.

(position-text position)

Returns the text to which the position refers. Should run in
O(1) time.

(text-start-position text)
(text-end-position text)

As in John's proposal. Should run in O(1) time.

(text-position=? position1 position2)

True iff both positions identify the same positions within
the same text. Should run in O(1) time.

(text-forward position [count])
(text-backward position [count])

As in John's proposal, but the text needn't be mentioned
because it is encapsulated within the position. Should run
in O(1) time, but may take O(count) time in some systems.

(text-prefix position)
(text-suffix position)

As in John's proposal, but the text needn't be mentioned
because it is encapsulated within the position. Should run
in O(1) time.

(text-position-difference position1 position2)

As in John's proposal, but the text needn't be mentioned
because it is encapsulated within the positions. Should
run in O(1) time, but may take longer in some systems.

(text-slice position1 position2)

The two positions must refer to the same text. Should run
in O(1) time.

(string-slice string start end)

Equivalent to

  (let ((t (make-text (substring string start end))))
    (text-slice (text-start-position t) (text-end-position t)))

but may run faster, especially if the string is immutable.

Et cetera.

For reasons already mentioned, I think this would be a more
pleasant API than could be obtained by trying to extend the
mutable strings that have been in Scheme for lo these thirty
years (unfortunately).

I also think it would be straightforward to implement the
text and position types portably using the current draft's
records, bytevectors, and strings. That implementation
could easily achieve the "should be O(1)" level of
performance, except make-text would always have to copy its
argument because there is no portable way to tell whether
a string is immutable, or whether its immutability would
be enforced even if it were.

In my opinion, texts should be written up as a SRFI, and
then be considered for inclusion in the R7RS.

Will
Received on Mon Apr 09 2007 - 19:12:01 UTC

This archive was generated by hypermail 2.3.0 : Wed Oct 23 2024 - 09:15:01 UTC