Thomas Lord wrote:
> Per Bothner wrote:
>> I've expressed my preference: introduce a "marker" type,
>> where a marker is a position in a string or a "buffer".
>> Simple operations (e.g. get next characater, set next character,
>> or move position) are O(1) regardless of the reresentation used.
>
>
> I'm not so sure about the O(1) claim.
>
> It should be easy to see that O(1) is unlikely in the
> case where mutations are permitted to change the
> STRING-LENGTH of the string both because
> of the need to copy string data and the need to
> update other markers.
The word I left out is "average" or "amortized".
Assume modifications tend to be clumped in the same region,
most commonly the end of the string. Then use a simple
buffer-gap representation, growing the buffer by a
fixed factor (say 2) when needed.
I will admit that efficiently managing the markers leads to
various non-trivial options. However, if the marker object
contains a raw offset into the buffer, then it only has to be
modified when the gap is moved or the buffer resized,
in which case even just straightforward looping through all
the markers still keeps the cost of updates O(1) on average,
if we assume the number of markers is less than O(N).
> Even if STRING-LENGTH doesn't change, as you
> point out in your formal comment, unless the implementation
> is using UTF-32 or similar, setting a character
> will have similar impacts.
Yes.
> Also, would you have your markers include an
> operation like (MOVE-MARKER! M N) that changes
> the position of a marker by some arbitrary integer
> N? If not, the core string API would penalize operations
> that /do/ have efficient random access to strings.
> If so, then you have a simple operation which is
> O(N) in many popular representations.
simple != useful. Show me a use for MOVE_MARKER for values
other than 1 and -1, and then maybe I'll care.
--
--Per Bothner
per_at_bothner.com http://per.bothner.com/
Received on Sat Mar 24 2007 - 16:29:42 UTC