[r6rs-discuss] Re: [Formal] formal comment (ports, characters, strings, Unicode) from Jason Orendorff on 2007-03-20 (r6rs-discuss.mbox)

From: Jason Orendorff <jason.orendorff>
Date: Tue Mar 20 07:37:38 2007

> I don't intend to teach a seminar here on implementation
> strategies for O(1) string-ref, but I'll describe just one
> simple strategy that achieves O(1) time for both string-ref
> and string-set! while using only a little more space than
> UTF-8. The basic idea is to represent every string by an
> opaque, sealed record whose fields include a vector of
> bytevectors. All but the last of those bytevectors is the
> UTF-8 encoding of exactly 100 characters; the last one
> contains between 0 and 100 characters, inclusive, and
> contains 0 characters iff the length of the entire string
> is 0.

I'm going to assume you meant this seriously.

If no one cares how bad this is, the discussion is over.

I looked at a handful of Schemes and it certainly appears they all
keep each string in a single buffer. I could be wrong. I doubt it.
It's the same in the other languages whose internals I'm familiar
with. I think there are many reasons for this--algorithms implemented
under the covers are simpler and work faster on a single buffer; you
can provide a simpler native interface; you do less copying every time
you want to call into a library; and maintenance costs.

If you want to provide (say) regular expressions using an existing
native library, it's crazy not to have your strings in a single buffer.
If your Scheme is written in Java or C#, it's impractical not to use
the system string type.

Again, if implementers don't care about any of this, we're done
here.

-j
Received on Tue Mar 20 2007 - 07:37:32 UTC

This archive was generated by hypermail 2.3.0 : Wed Oct 23 2024 - 09:15:01 UTC