[r6rs-discuss] perhaps i should be formal, but.... from John Cowan on 2007-03-15 (r6rs-discuss.mbox)

From: John Cowan <cowan>
Date: Thu Mar 15 00:10:00 2007

MichaelL_at_frogware.com scripsit:

> I'm also concerned that R6RS, as currently written, seems to require
> UCS-4/UTF-32 strings. The problem is that string-ref returns characters,
> and characters can't be surrogates.

If R6RS also required string-ref to run in O(1) time, then you'd be right.
But it doesn't; it's perfectly fine to implement string-ref on top of
underlying UTF-8 or UTF-16 character sequences; you just have to settle
for O(N) performance on each access.
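
A minimal sketch of that approach, written in Python rather than Scheme so
it can be run as-is. The name utf8_string_ref is hypothetical (nothing in
R6RS defines it), and the sketch assumes the buffer holds well-formed UTF-8.
It finds the k-th character by walking the lead bytes from the start, which
is where the O(N) cost comes from.

```python
def utf8_string_ref(buf: bytes, k: int) -> str:
    """Return the k-th character of UTF-8 data by scanning from the start.

    Hypothetical helper for illustration; assumes well-formed UTF-8.
    """
    if k < 0:
        raise IndexError("negative index")
    i = 0                      # byte offset of the current character
    n = len(buf)
    while i < n:
        lead = buf[i]
        # The lead byte alone determines how many bytes the sequence uses.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if k == 0:
            return buf[i:i + width].decode("utf-8")
        k -= 1                 # skip this character and keep scanning
        i += width
    raise IndexError("index out of range")


data = "aé€😀z".encode("utf-8")   # 1-, 2-, 3- and 4-byte sequences
third = utf8_string_ref(data, 3)  # the 4-byte character
```

A real implementation could cache the byte offset of the last access to make
sequential traversal cheap, but a random access remains linear in general.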

Alternatively, you can use a design in which strings that use the Latin-1
repertoire are stored as Latin-1, strings that use the BMP repertoire
are stored as UCS-2, and all others as UCS-4. That allows string-ref to
be O(1) always, but string-set! winds up being O(N) in the general case,
because storing a character outside the string's current repertoire forces
the whole string to be copied into a wider representation. It is still O(1)
in most practical situations.
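
A rough sketch of such a design, again in Python for runnability. The class
name FlexString and its methods ref and set are hypothetical stand-ins for a
string object with string-ref and string-set!. It also assumes the array
module's "I" typecode is at least 4 bytes wide, which is true on common
platforms but is not guaranteed.

```python
from array import array

# One storage tier per repertoire: Latin-1, BMP (UCS-2), everything (UCS-4).
# Assumption: typecode "I" is 4 bytes wide on this platform.
_TYPECODES = ("B", "H", "I")
_LIMITS = (0xFF, 0xFFFF, 0x10FFFF)


def _tier(code_point: int) -> int:
    """Return the narrowest tier able to hold code_point."""
    for tier, limit in enumerate(_LIMITS):
        if code_point <= limit:
            return tier
    raise ValueError("not a Unicode code point")


class FlexString:
    """Hypothetical fixed-width string that widens itself on demand."""

    def __init__(self, text: str):
        cps = [ord(c) for c in text]
        self.tier = max((_tier(cp) for cp in cps), default=0)
        self.units = array(_TYPECODES[self.tier], cps)

    def ref(self, k: int) -> str:
        # Always O(1): every element has the same width.
        return chr(self.units[k])

    def set(self, k: int, ch: str) -> None:
        need = _tier(ord(ch))
        if need > self.tier:
            # O(N) case: copy every element into a wider array.
            self.units = array(_TYPECODES[need], self.units.tolist())
            self.tier = need
        self.units[k] = ord(ch)   # O(1) once the width suffices


s = FlexString("abc")    # stored one byte per character
s.set(1, "€")            # outside Latin-1, so the string widens to UCS-2
```

The sketch never narrows a string again after a wide character is
overwritten; whether to do so is a separate design choice.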

> Then we'd have uchar and ustring and, perhaps, fewer
> backward-compatibility issues.

Python has been suffering through that for several years now, and has
decided to break backward compatibility and abandon the 8-bit strings,
while reusing the 8-bit names for Unicode strings. I don't know what the
internal implementation is.

> But there's no bytevector-upper or bytevector-<? and such, so no,
> something was lost, at least for "low level" work.

They're easy to write, though, if you do need them. If you want them
to be locale-sensitive, you have to work a little harder.
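
For illustration, here is roughly what the locale-insensitive versions might
look like, sketched in Python over bytes objects. The names bytevector_upper
and bytevector_less are hypothetical analogues of the bytevector-upper and
bytevector-<? mentioned above, and the sketch assumes ASCII-only case mapping
and plain unsigned byte ordering.

```python
def bytevector_upper(bv: bytes) -> bytes:
    """Uppercase ASCII letters a-z; leave every other byte unchanged."""
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in bv)


def bytevector_less(a: bytes, b: bytes) -> bool:
    """Lexicographic comparison by unsigned byte value."""
    for x, y in zip(a, b):
        if x != y:
            return x < y
    # All shared positions are equal, so the shorter one sorts first.
    return len(a) < len(b)


shout = bytevector_upper(b"hello, world")
ordered = bytevector_less(b"abc", b"abd")
```

Locale-sensitive variants would need the locale's case-mapping and collation
tables, which is the extra work referred to above.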

-- 
John Cowan    http://ccil.org/~cowan  cowan_at_ccil.org
'Tis the Linux rebellion / Let coders take their place,
The Linux-nationale / Shall Microsoft outpace,
We can write better programs / Our CPUs won't stall,
So raise the penguin banner of / The Linux-nationale.  --Greg Baker

Received on Thu Mar 15 2007 - 00:09:56 UTC
