[r6rs-discuss] Re: [Formal] formal comment (ports, characters, strings, Unicode)

From: Per Bothner <per>
Date: Tue Mar 20 13:59:32 2007

William D Clinger wrote:

> We are having this conversation because there are *lots*
> of applications that need to index either (1) the Nth
> scalar value of a string or (2) the Nth code unit of
> some particular representation of the string.

Right - they need one of them, but not both.

Furthermore, most such applications don't actually need N
to be a "counter" - what they need N for is as a magic
cookie, or position.

Hence my just-posted "marker" suggestion.

> What you were trying to say, I think, is that you want to
> add string-codeunit-ref. Since there are three standard
> forms of code units, you would need three procedures,
> not just one:
>
> string-codeunit-utf-8-ref
> string-codeunit-utf-16-ref
> string-codeunit-utf-32-ref
>
> Making all three of those run in O(1) time is much harder
> than making string-ref run in O(1) time.

No, my point is for most applications you need *one* of these
and you *don't care* which it is.

For example, copying a string, appending strings, searching for
a substring: All of these work fine on opaque "code units", and
don't need to know whether the code unit is a utf-8 byte,
a utf-16 word, a utf-32 value, or a Unicode scalar value.
(The latter two are presumably the same.)

If the standard specifies a string-codeunit-ref that returns an
opaque fixnum, recommends (SHOULD) that it be O(1), and does not
require string-ref to be O(1), then implementations have a lot
more latitude.

For example:
(string-codeunit-ref str k1) -> fixnum
(string-codeunit-char-at str k1) -> character
(string-codeunit-char-next str k1) -> k2
(string-codeunit-substring k1 k2) -> string
(string-codeunit-set! str k1)
(string-codeunit-replace! str1 k1 k2 str2)

The k1 and k2 values are exact non-negative integers.
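
As a rough illustration of how such positions could be used, here is a
minimal sketch that models a string as a bytevector of UTF-8 code units.
That representation is only an assumption for the example (an
implementation could just as well use utf-16 or utf-32), and every
procedure name below is hypothetical, not part of any proposal or
standard. Positions are assumed to fall on character boundaries.

```scheme
(import (rnrs))

;; Assumption: the "string" is a bytevector holding valid UTF-8.
;; All names here are hypothetical, for illustration only.

;; O(1): the opaque code unit at position k.
(define (codeunit-ref bv k)
  (bytevector-u8-ref bv k))

;; Length in code units of the UTF-8 sequence with lead byte b.
(define (utf8-sequence-length b)
  (cond ((< b #x80) 1)
        ((< b #xE0) 2)
        ((< b #xF0) 3)
        (else 4)))

;; Position of the character following the one at k.
(define (codeunit-char-next bv k)
  (+ k (utf8-sequence-length (codeunit-ref bv k))))

;; The code units in [k1, k2) decoded as an ordinary string.
(define (codeunit-substring bv k1 k2)
  (let ((out (make-bytevector (- k2 k1))))
    (bytevector-copy! bv k1 out 0 (- k2 k1))
    (utf8->string out)))

;; The character that starts at position k.
(define (codeunit-char-at bv k)
  (string-ref (codeunit-substring bv k (codeunit-char-next bv k)) 0))

;; Naive search that only compares opaque code units.
;; Returns a position (a "magic cookie") or #f.
(define (codeunit-search hay needle)
  (let ((n (bytevector-length hay))
        (m (bytevector-length needle)))
    (let outer ((i 0))
      (cond ((> (+ i m) n) #f)
            ((let inner ((j 0))
               (cond ((= j m) #t)
                     ((= (codeunit-ref hay (+ i j)) (codeunit-ref needle j))
                      (inner (+ j 1)))
                     (else #f)))
             i)
            (else (outer (+ i 1)))))))

;; "naive cafe" with i-diaeresis and e-acute, written with escapes.
(define s (string->utf8 "na\xEF;ve caf\xE9;"))
(define k (codeunit-search s (string->utf8 "caf\xE9;")))
```

In this UTF-8 model k comes out as 7, although the match starts at the
7th scalar value counting from 0 only by coincidence of encoding; the
caller never needs to interpret the number. It just passes k back to
codeunit-char-at or codeunit-substring, which is the sense in which
the position is a cookie rather than a counter.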
-- 
	--Per Bothner
per_at_bothner.com   http://per.bothner.com/
Received on Tue Mar 20 2007 - 13:58:29 UTC
