[r6rs-discuss] Strings

From: Jason Orendorff <jason.orendorff>
Date: Tue Mar 27 13:59:05 2007

I think we've gotten way off course. The only reason to standardize
the internal representation of strings would be to expose code units.
Otherwise you wouldn't bother. I can think of two good reasons to
expose code units and one pragmatic reason:

  1. Performance. I think R6RS should support a portable regex
     library--one that people can actually use. A portable parser
     library would also be nice. These things need fast access to
     code units.

  2. Native call interface. A portable one is beyond the scope of
     R6RS, but a standard representation for strings now would
     simplify future efforts.

  3. (the pragmatic reason) Maybe the editors don't have time to add a
     thorough high-level string API to R6RS. I don't know if this is
     true or not. If so, a simple, conventional low-level API would
     be an improvement over the current draft.

If these reasons are unpersuasive, we need not carry on about UTF-8
vs. UTF-16 and the rest. If the editors decide that R6RS will not
expose code units, I'll just second Per Bothner's suggestion:

> * More generally, write the specification with the assumption
> that many/most Scheme implementations will use a simple
> UTF-8 array or a UTF-16 array. In the case of mutable
> strings, the array may be grown/relocated, and optionally
> use a buffer-gap scheme. We should not assume or require
> anything more complicated.

On the other hand, if it seems desirable to expose code units, UTF-16
is a good balance of all the factors.
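
For readers unsure what "exposing code units" means in practice, here is
a small illustrative sketch. It is written in Python rather than Scheme,
and the helper names (utf8_code_units, utf16_code_units) are made up for
this example; they are not part of any R6RS draft or proposal. It only
shows how the same three-character string occupies a different number of
code units under each encoding.

```python
import struct


def utf8_code_units(s):
    """Return the UTF-8 code units (8-bit values) of s as a list of ints."""
    return list(s.encode("utf-8"))


def utf16_code_units(s):
    """Return the UTF-16 code units (16-bit values) of s as a list of ints."""
    data = s.encode("utf-16-le")  # little-endian, no byte-order mark
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


# Hypothetical sample: 'a' (U+0061), euro sign (U+20AC), G clef (U+1D11E).
sample = "a\u20ac\U0001d11e"

code_points = len(sample)                   # 3 code points
utf8_units = utf8_code_units(sample)        # 1 + 3 + 4 = 8 code units
utf16_units = utf16_code_units(sample)      # 1 + 1 + 2 = 4 code units

if __name__ == "__main__":
    print(code_points, len(utf8_units), len(utf16_units))
    print([hex(u) for u in utf16_units])
```

The G clef lies outside the Basic Multilingual Plane, so UTF-16 stores it
as a surrogate pair (0xD834, 0xDD1E). An API that exposes code units
would let a regex or parser library index into such an array directly,
at the cost of making surrogate pairs visible to the caller.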

-j
Received on Tue Mar 27 2007 - 13:58:53 UTC