[r6rs-discuss] Unicode issues
Shiro Kawai wrote:
> I wonder why you didn't consider the cost of substring (given
> that the start and end point can be provided in a way that
> implementation can find them most efficiently). Maybe it's
> a matter of style, though I use substring (both explicitly and
> implicitly) all the time while I hardly ever use string mutation.
My goal was to extend Larceny's R5RS-compatible character
and string datatypes to full Unicode without requiring any
changes at all to existing programs. That ruled out changes
in the API, and it also ruled out any noticeable decrease in
the performance of any operation.
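One way to widen characters to full Unicode while keeping string-length, string-ref and string-set! constant-time is a fixed-width layout with one slot per code point. The Python sketch below illustrates that idea only. It is not Larceny's actual code, and the class and method names (FixedWidthString, length, ref, set) are hypothetical stand-ins for the R5RS procedures.

```python
from array import array


class FixedWidthString:
    """Mutable string storing one code point per fixed-width slot (hypothetical sketch).

    Every character occupies the same number of bytes, so the position
    of character k is computed directly instead of found by scanning.
    """

    def __init__(self, s: str = "") -> None:
        # Typecode 'L' is an unsigned integer of at least 32 bits,
        # wide enough for any Unicode scalar value (max 0x10FFFF).
        self._cells = array("L", (ord(c) for c in s))

    def length(self) -> int:
        # Analogue of string-length: O(1).
        return len(self._cells)

    def ref(self, k: int) -> str:
        # Analogue of string-ref: a single indexed load, O(1).
        return chr(self._cells[k])

    def set(self, k: int, ch: str) -> None:
        # Analogue of string-set!: O(1) store. The new character may lie
        # outside the Basic Multilingual Plane without resizing anything.
        self._cells[k] = ord(ch)

    def __str__(self) -> str:
        return "".join(chr(c) for c in self._cells)
```

Replacing the middle character of "abc" with U+1F600 leaves the length at 3 and moves no other characters, which is the property existing programs rely on.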
Had I been designing from scratch a datatype of texts,
I'd have done things quite differently. I'd have made
texts immutable, mainly so I could have fast (but not
necessarily constant-time) subtext and concatenation
operations.
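A common technique for fast concatenation of immutable texts is a rope. It may not be what the author had in mind. Concatenation allocates a node that points at its two operands instead of copying them. The minimal Python sketch below uses hypothetical names (Leaf, Concat, concat, text_ref). Concatenation is O(1), and indexing costs time proportional to the depth of the tree rather than the length of the text. A real implementation would rebalance to keep that depth logarithmic.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    chars: str  # an immutable chunk of characters


@dataclass(frozen=True)
class Concat:
    left: "Rope"
    right: "Rope"
    length: int  # cached total length, so indexing picks a branch in O(1)


Rope = Union[Leaf, Concat]


def rope_length(r: Rope) -> int:
    return len(r.chars) if isinstance(r, Leaf) else r.length


def concat(a: Rope, b: Rope) -> Rope:
    # O(1): neither operand is copied; both are shared by the new node.
    return Concat(a, b, rope_length(a) + rope_length(b))


def text_ref(r: Rope, k: int) -> str:
    # Walk down the tree; the cost depends on depth, not on length.
    if not 0 <= k < rope_length(r):
        raise IndexError(k)
    while isinstance(r, Concat):
        n = rope_length(r.left)
        if k < n:
            r = r.left
        else:
            k -= n
            r = r.right
    return r.chars[k]


def to_str(r: Rope) -> str:
    if isinstance(r, Leaf):
        return r.chars
    return to_str(r.left) + to_str(r.right)
```

Sharing operands is safe only because the texts are immutable. With mutable strings, a later string-set! on an operand would silently change every text built from it.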
It is easy to design such a texts API, with random access
by character index, and to provide a portable reference
implementation using R5RS/R6RS strings. So long as the
string-length, string-ref, and string-set! operations are
O(1) amortized time, the portable reference implementation
of texts should be efficient enough for all but the most
demanding applications. That is one reason why I think
O(1) string operations are important.
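The layering can be sketched as follows. Python's built-in str stands in for an R5RS/R6RS string, since CPython indexes str in constant time. Text, make_text, subtext and text_ref are hypothetical names, not a proposed SRFI API. A text records a base string plus bounds. subtext shares the base and only computes new bounds, so it is O(1), and text_ref is O(1) whenever string indexing is. In Scheme, where strings are mutable, make_text would have to copy its argument once. Python strs are immutable, so the sketch skips that copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Immutable view of base[start:end]; base is never mutated (hypothetical sketch)."""
    base: str
    start: int
    end: int


def make_text(s: str) -> Text:
    return Text(s, 0, len(s))


def text_length(t: Text) -> int:
    return t.end - t.start


def text_ref(t: Text, k: int) -> str:
    # O(1) as long as indexing into the base string is O(1).
    if not 0 <= k < text_length(t):
        raise IndexError(k)
    return t.base[t.start + k]


def subtext(t: Text, i: int, j: int) -> Text:
    # O(1): shares the base string instead of copying characters i..j-1.
    if not 0 <= i <= j <= text_length(t):
        raise IndexError((i, j))
    return Text(t.base, t.start + i, t.start + j)


def text_to_string(t: Text) -> str:
    return t.base[t.start:t.end]
```

For example, subtext(make_text("immutable texts"), 10, 15) yields a text whose characters are "texts", and no characters are copied until text_to_string is called.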
It is also possible to design a texts API that does not
provide random access by character index, is biased toward
Latin-1 or UTF-8 or UTF-16, does not provide fast subtext
or concatenation operations, and so forth. Personally, I
don't see why we should pursue a compromised API for texts
when an uncompromising API is so easy to design and to
implement efficiently.
In short: We can develop a nice API for immutable texts,
and make it into a popular SRFI, so long as we have O(1)
string operations. If some implementations of Scheme were
to abandon O(1) string operations just so they can use a
representation such as UTF-8 or UTF-16, in which indexing by
character is inefficient, then
the texts datatype might perform poorly in those systems,
but that's okay. Users who care about the performance of
immutable texts would just avoid those implementations.
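The following sketch makes the indexing cost concrete. In UTF-8 a character occupies one to four bytes, so locating character k means scanning from the start of the string or from some cached position. UTF-16 has the same problem for characters outside the BMP, which need surrogate pairs. In the sketch, utf8_char_ref is a hypothetical helper that counts lead bytes (bytes not of the form 10xxxxxx), so it is O(k). The fixed-width UTF-32 version computes the byte offset directly.

```python
def utf8_char_ref(buf: bytes, k: int) -> str:
    """Return character k of UTF-8 encoded buf by linear scan: O(k)."""
    seen = -1
    for offset, byte in enumerate(buf):
        # Continuation bytes look like 10xxxxxx; any other byte starts a character.
        if (byte & 0xC0) != 0x80:
            seen += 1
            if seen == k:
                end = offset + 1
                while end < len(buf) and (buf[end] & 0xC0) == 0x80:
                    end += 1
                return buf[offset:end].decode("utf-8")
    raise IndexError(k)


def utf32_char_ref(buf: bytes, k: int) -> str:
    """Fixed-width UTF-32LE: character k starts at byte 4*k, so this is O(1)."""
    if not 0 <= k < len(buf) // 4:
        raise IndexError(k)
    return buf[4 * k:4 * k + 4].decode("utf-32-le")
```

Implementations that store UTF-8 often cache the last index/offset pair to speed up sequential access. That does not restore O(1) random access, and random access is what a portable texts library built on string-ref would need.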
Will
Received on Wed Aug 29 2007 - 16:59:53 UTC