[r6rs-discuss] Strings as codepoint-vectors: bad

From: Thomas Lord <lord>
Date: Fri Mar 16 02:41:23 2007

Jason Orendorff wrote:
> Perhaps we would like to hide the in-memory encoding of strings from
> users, but that's not really possible if you *also* wish to expose a
> fast low-level API with integer offsets. The (string-ref) and
> (string-set!) APIs, as currently specified, hit a sweet spot of API
> badness: they're so low-level and essential that it's almost
> unthinkable that they be anything but O(1); yet they're sufficiently
> high-level that every actual O(1) implementation sacrifices efficiency
> somewhere else.
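
[A rough sketch of the trade-off described in the quoted paragraph,
not anything taken from the R6RS draft. Python is used only for
illustration. The helper names utf8_ref and utf32_ref are hypothetical,
and the UTF-8 scan assumes its input is already well-formed. Indexing
a variable-width encoding by code point means walking the bytes, while
a fixed-width encoding gets O(1) indexing by spending four bytes on
every code point.]

```python
def utf8_ref(buf: bytes, k: int) -> str:
    """Return the k-th code point of well-formed UTF-8 data.

    Variable-width storage: we must scan from the start, so this is O(n).
    """
    if k < 0:
        raise IndexError("negative code point index")
    i = 0  # byte offset of the code point currently being examined
    while i < len(buf):
        lead = buf[i]
        # The lead byte tells us how many bytes this code point occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4  # assumes valid input, so this is a 4-byte lead
        if k == 0:
            return buf[i:i + width].decode("utf-8")
        k -= 1
        i += width
    raise IndexError("code point index out of range")


def utf32_ref(buf: bytes, k: int) -> str:
    """Return the k-th code point of UTF-32-LE data.

    Fixed-width storage: the byte offset is simply 4 * k, so this is O(1),
    at the cost of four bytes per code point even for ASCII text.
    """
    if not 0 <= k < len(buf) // 4:
        raise IndexError("code point index out of range")
    return buf[4 * k:4 * k + 4].decode("utf-32-le")
```

[Both functions return the same code point for the same index. What
differs is where the cost lands: in time for the UTF-8 scan, in space
for the UTF-32 buffer.]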


(With apologies, but...) this is an example of a category of error
that seems to show up whenever people start talking about Unicode
implementation, whether in the context of Scheme or not. Part of
it is specific to R6, though:

The generic error is wishing for some "easy way out" that
makes Unicode as easy to hack as ASCII. Won't happen.
Text is just not that simple. Unicode does a fantastic job of
making it "... but no simpler".

The R6-specific thing is the draft's gist of designing CHAR and
STRING around text, rather than around a more abstract conception
of PORTs. Among other problems, that leads people to go on
a quest for that "easy way out" (and, anyway, there are principled
reasons to reject it a priori). This ain't Python we're hacking here.


-t
Received on Fri Mar 16 2007 - 02:50:44 UTC
