Since there now seems to be a feeling that strings may require
a non-trivial data structure, why not go with the classic:
Emacs-style buffers and markers. These are actually fairly
simple and efficient, and can make for a very nice API.
Let's start with the data structure:
A buffer is a simple resizable array of character data,
either UTF-8 bytes or UTF-16 words. Using a buffer-gap
representation (the traditional Emacs representation)
is suggested.
A marker is a reference to a location in a buffer,
implemented as a reference to a buffer and an index.
The index is automatically adjusted if text is inserted
or removed before the index position. (This is easy to
implement by chaining all the markers pointing into
a buffer into a linked list.)
A string is a pair of markers into the same buffer.
(string-start str) -> marker
(string-end str) -> marker
(substring marker1 marker2) -> string
A Scheme API works with strings and markers, buf does
not directly access buffers.
These create new buffers, and then a string delimited
by the endpoints of the marker.
(make-string ...)
(string ...)
(marker-ref marker) -> char following marker or (end-of-file)
(string-replace str1 str2)
(string-delete str) == (string-replace str "")
(string-insert marker str)
== (string-replace (substring marker marker) str)
A marker is a textual-input-port and (if the string buffer is
mutable) a textual-output-port, so the various port procedures
work on markers. This implies that the position of a marker
can be moved within a string.
This eliminates the current draft's complexity of representing
a string to enable O(1) indexing. Instead, we have some extra
well-understood complexity of managing markers, and gain a
much more powerful API.
--
--Per Bothner
per_at_bothner.com http://per.bothner.com/
Received on Tue Mar 20 2007 - 13:11:22 UTC