[r6rs-discuss] perhaps i should be formal, but....

From: Alexander Kjeldaas <alexander.kjeldaas>
Date: Thu Mar 15 18:52:40 2007

On 3/15/07, MichaelL_at_frogware.com <MichaelL_at_frogware.com> wrote:
> > On 3/15/07, MichaelL_at_frogware.com <MichaelL_at_frogware.com> wrote:
> > > > If string-ref also required O(1) time complexity, then you'd be right.
> > > > But it doesn't; it's perfectly fine to implement string-ref on top of
> > > > underlying UTF-8 or UTF-16 character sequences; you just have to settle
> > > > for O(N) performance.
> > >
> > > Are you suggesting that indexes represent code points rather than code
> > > units? I haven't seen anyone do that, not as the one-and-only interface
> > > to elements of a string. Have you? And do you think UTF-8/UTF-16
> > > implementations should be *required* to do that? (Obviously, then,
> > > string-length would have to return the number of code points rather
> > > than the number of code units.)
> >
> > SBCL does that.
> >
> > http://sbcl.sourceforge.net/sbcl-internals/Character-and-String-Types.html
>
> I think SBCL uses UCS-4-sized code units when Unicode is enabled. If
> that's correct, then no, it doesn't do "that", it simply chooses an
> encoding that avoids the problem (at the expense of space).
>
>

This presentation about the Unicode support in SBCL also says code point:
http://www.doc.gold.ac.uk/~mas01cr/talks/2005-04-24%20Amsterdam/presentation.pdf
According to it, the internal representation of a character is an
immediate bit, the character tag and the code point.
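
To make the code point vs. code unit distinction from the quoted
discussion concrete, here is a minimal sketch in Python (not Scheme, and
not how SBCL or any R6RS implementation actually does it). The names
utf8_string_ref, utf8_string_length and utf16_code_units are made up for
illustration. It assumes the string is stored as UTF-8 bytes and that
indexes count code points, so a lookup has to scan from the start, which
is the O(N) behaviour mentioned above.

```python
def utf8_string_ref(buf: bytes, k: int) -> str:
    """Return the k-th code point (0-based) of UTF-8 data by linear scan."""
    if k < 0:
        raise IndexError("negative index")
    i = 0  # byte offset of the current code point
    while i < len(buf):
        lead = buf[i]
        # The lead byte tells us how many bytes this code point occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if k == 0:
            return buf[i:i + width].decode("utf-8")
        k -= 1
        i += width
    raise IndexError("index out of range")


def utf8_string_length(buf: bytes) -> int:
    """Count code points: every byte that is not a continuation byte."""
    return sum(1 for b in buf if (b & 0xC0) != 0x80)


def utf16_code_units(s: str) -> int:
    """Number of 16-bit code units the text would need in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


# 'a', U+00E9 (e with acute), U+1D11E (musical G clef, outside the BMP)
text = "a\u00e9\U0001D11E"
data = text.encode("utf-8")
```

For this sample text there are 3 code points, but 7 UTF-8 code units
(bytes) and 4 UTF-16 code units, so string-length gives different answers
depending on which of the two it counts. A fixed-width 32-bit encoding
avoids the scan at the cost of space, which is the trade-off described in
the quoted reply.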


Alexander
Received on Thu Mar 15 2007 - 18:52:33 UTC
