[r6rs-discuss] Strings as codepoint-vectors: bad

From: Jason Orendorff <jason.orendorff>
Date: Fri Mar 16 02:20:05 2007

On 3/15/07, Per Bothner <per_at_bothner.com> wrote:
> Jason Orendorff wrote:
> > Making strings vectors of 16-bit values is simple, familiar,
> > speed-efficient, memory-efficient, easy to implement, and convenient
> > for programmers.
>
> [...]
> Most code will, as you say, work fine even if string-ref
> works on raw 8/16-bit code points. But those code
> points will not be "characters". We'd have to remove
> the "character" functions.

I don't think we would have to remove them. There
could be a way to extract the characters from a string:

  (string-iterator s) procedure
    Returns an opaque iterator object over the characters of the
    string `s`, for use with (next-char!).

  (next-char! it) procedure
    Returns the next character from the iterator `it`, advancing the
    iterator past that character; returns #f if there are no more
    characters.

Both can run in O(1) time per call, and so can the code-unit-oriented
(string-ref), (string-set!), and (string-length).
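
Purely to illustrate (this is a sketch, not a spec, and all the names
and the representation below are mine): a "string" can be modeled as a
plain vector of 16-bit UTF-16 code units, and the iterator as a small
mutable record holding the units and a cursor. Each call then does a
constant amount of work, even across a surrogate pair:

  ;; Sketch only: model a "string" as a vector of 16-bit UTF-16
  ;; code units, and an iterator as (vector units cursor-index).
  (define (string-iterator units)
    (vector units 0))

  (define (high-surrogate? u) (<= #xD800 u #xDBFF))
  (define (low-surrogate? u)  (<= #xDC00 u #xDFFF))

  (define (next-char! it)
    (let ((units (vector-ref it 0))
          (i     (vector-ref it 1)))
      (if (>= i (vector-length units))
          #f
          (let ((u (vector-ref units i)))
            (if (and (high-surrogate? u)
                     (< (+ i 1) (vector-length units))
                     (low-surrogate? (vector-ref units (+ i 1))))
                ;; Surrogate pair: consume two units, yield one char.
                (begin
                  (vector-set! it 1 (+ i 2))
                  (integer->char
                   (+ #x10000
                      (* #x400 (- u #xD800))
                      (- (vector-ref units (+ i 1)) #xDC00))))
                ;; Single-unit (BMP) code point. A real version would
                ;; also have to decide what to do with an unpaired
                ;; surrogate here.
                (begin
                  (vector-set! it 1 (+ i 1))
                  (integer->char u)))))))

For example, with U+1D11E, which is encoded as the surrogate pair
#xD834 #xDD1E:

  (define it (string-iterator (vector #x61 #xD834 #xDD1E #x62)))
  (next-char! it)  ; => #\a
  (next-char! it)  ; => #\x1D11E (a single character)
  (next-char! it)  ; => #\b
  (next-char! it)  ; => #f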

Perhaps we would like to hide the in-memory encoding of strings from
users, but that's not really possible if we *also* wish to expose a
fast low-level API with integer offsets. The (string-ref) and
(string-set!) APIs, as currently specified, hit a sweet spot of API
badness: they're so low-level and essential that it's almost
unthinkable that they be anything but O(1); yet they're sufficiently
high-level that every actual O(1) implementation sacrifices efficiency
somewhere else (for example, guaranteeing O(1) indexing by character
forces a fixed-width representation such as UTF-32, which pays for it
in memory).
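
To make the tension concrete with the sketch above (again, the names
string-ref*, string-length*, and char-at are mine, purely for
illustration): the code-unit accessors are direct array operations,
while indexing by character has to decode from the start.

  ;; O(1): code-unit accessors are plain array operations.
  (define (string-ref* units i)  (vector-ref units i))
  (define (string-length* units) (vector-length units))

  ;; O(n): finding the nth *character* must scan and decode, which
  ;; is why hiding the encoding is at odds with a fast
  ;; integer-offset API.
  (define (char-at units n)
    (let ((it (string-iterator units)))
      (let loop ((k n) (c (next-char! it)))
        (if (or (not c) (zero? k))
            c
            (loop (- k 1) (next-char! it))))))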

-j
Received on Fri Mar 16 2007 - 02:19:55 UTC
