MichaelL_at_frogware.com scripsit:
> I'm also concerned that R6RS, as currently written, seems to require
> UCS-4/UTF-32 strings. The problem is that string-ref returns characters,
> and characters can't be surrogates.
If R6RS also required string-ref to run in O(1) time, then you'd be right.
But it doesn't; it's perfectly fine to implement string-ref on top of
underlying UTF-8 or UTF-16 character sequences; you just have to settle
for O(N) performance.
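
To make the O(N) point concrete, here is a rough sketch in Python (not Scheme, and not taken from any actual implementation) of an indexing routine over UTF-8 bytes. The name utf8_string_ref is made up for illustration, and the input is assumed to be well-formed UTF-8:

```python
def utf8_string_ref(buf: bytes, k: int) -> str:
    """Return the k-th character of UTF-8 data by scanning from the start.

    Hypothetical helper: the cost is proportional to k, not constant.
    Assumes buf holds well-formed UTF-8.
    """
    if k < 0:
        raise IndexError("negative index")
    i = 0  # byte offset of the character currently being examined
    n = len(buf)
    while i < n:
        lead = buf[i]
        # The lead byte alone determines how many bytes the character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if k == 0:
            # Decode just this one character; it is never a surrogate.
            return buf[i:i + width].decode("utf-8")
        k -= 1
        i += width
    raise IndexError("string index out of range")
```

The caller always gets a whole character back, whatever the storage looks like; the only thing given up is constant-time access.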
Alternatively, you can use a design in which strings that use the Latin-1
repertoire are stored as Latin-1, strings that use the BMP repertoire
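are stored as UCS-2, and all others as UCS-4. That allows string-ref to
be O(1) always, but string-set! winds up being O(N) in the general case,
because storing a character outside the string's current repertoire
forces the whole string to be recopied at a wider width. It is still
O(1) in most practical situations.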
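
A minimal sketch of that design, again in Python rather than Scheme. The class name FlexString and its methods ref and set are invented here as stand-ins for string-ref and string-set!; a real implementation would differ in many details, and this one never narrows a string once it has been widened:

```python
class FlexString:
    """Mutable string stored at 1, 2, or 4 bytes per character (hypothetical)."""

    def __init__(self, text: str = ""):
        cps = [ord(c) for c in text]
        # Pick the narrowest width that fits every character.
        self.width = max((self._width_for(cp) for cp in cps), default=1)
        self.data = bytearray()
        for cp in cps:
            self.data += cp.to_bytes(self.width, "little")

    @staticmethod
    def _width_for(cp: int) -> int:
        if cp <= 0xFF:
            return 1  # Latin-1 repertoire
        if cp <= 0xFFFF:
            return 2  # BMP repertoire (UCS-2)
        return 4      # everything else (UCS-4)

    def __len__(self) -> int:
        return len(self.data) // self.width

    def ref(self, k: int) -> str:
        """O(1): every character has the same width, so the offset is k * width."""
        if not 0 <= k < len(self):
            raise IndexError("string index out of range")
        off = k * self.width
        return chr(int.from_bytes(self.data[off:off + self.width], "little"))

    def set(self, k: int, ch: str) -> None:
        """O(1) unless ch needs a wider representation, then O(N)."""
        if not 0 <= k < len(self):
            raise IndexError("string index out of range")
        need = self._width_for(ord(ch))
        if need > self.width:
            self._widen(need)  # the O(N) case: recopy the whole string
        off = k * self.width
        self.data[off:off + self.width] = ord(ch).to_bytes(self.width, "little")

    def _widen(self, new_width: int) -> None:
        old = self.width
        wide = bytearray()
        for off in range(0, len(self.data), old):
            cp = int.from_bytes(self.data[off:off + old], "little")
            wide += cp.to_bytes(new_width, "little")
        self.data, self.width = wide, new_width
```

Storing a Latin-1 character into a Latin-1 string touches one byte; only a store that widens the repertoire pays for the copy.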
> Then we'd have uchar and ustring and, perhaps, fewer
> backward-compatibility issues.
Python has been suffering through that for several years now, and has
decided to break backward compatibility and abandon the 8-bit strings --
but using the 8-bit names for Unicode strings. I don't know what the
internal implementation is.
> But there's no bytevector-upper or bytevector-<? and such, so no,
> something was lost, at least for "low level" work.
They're easy to write, though, if you do need them. If you want them
to be locale-sensitive, you have to work a little harder.
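
For instance, the locale-insensitive versions come to only a few lines. This is a Python sketch with made-up names (bytes_upper_ascii and bytes_less, standing in for bytevector-upper and bytevector-<?), and it assumes that only ASCII letters should be case-mapped:

```python
def bytes_upper_ascii(b: bytes) -> bytes:
    """Upper-case the ASCII letters a-z; every other byte passes through."""
    return bytes(c - 0x20 if 0x61 <= c <= 0x7A else c for c in b)


def bytes_less(a: bytes, b: bytes) -> bool:
    """Lexicographic comparison by unsigned byte value."""
    for x, y in zip(a, b):
        if x != y:
            return x < y
    # One is a prefix of the other (or they are equal): the shorter sorts first.
    return len(a) < len(b)
```

Python's own bytes type already has upper() and <, so this is spelled out only to show how little is involved.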
--
John Cowan http://ccil.org/~cowan cowan_at_ccil.org
'Tis the Linux rebellion / Let coders take their place,
The Linux-nationale / Shall Microsoft outpace,
We can write better programs / Our CPUs won't stall,
So raise the penguin banner of / The Linux-nationale. --Greg Baker
Received on Thu Mar 15 2007 - 00:09:56 UTC