[r6rs-discuss] Re: [Formal] formal comment (ports, characters, strings, Unicode)

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]

From: John Cowan <cowan>
Date: Tue Mar 20 16:08:20 2007

Abdulaziz Ghuloum scripsit:

> I must've missed it somewhere, so let me ask the stupid question.
> What's the problem with the current draft that prohibits implementing
> string-find portably and efficiently?

Well, let's look at the simpler case of string-index (find the index of
a given character in a given string). The flatfooted implementation is
"Look at each character in turn using string-ref, and if it's right,
bail out, returning the current index."

Unfortunately, if string-ref is O(N), then this obvious algorithm
is O(N^2), nor is there any simple way to do better given only the
R5.92RS primitives. However, we can obviously do much better if we
have a string-for-each procedure that is O(N). Unfortunately, we have
no portable way to write this procedure without string-ref.

Therefore, if we accept that string-ref might not be O(1), we need some
sort of iterator or iterator/constructor that will have an amortized O(N)
cost for sequential access to each location. String-for-each is probably
technically sufficient, but providing a few others like string-map,
string-fold(-right), and string-unfold(-right) will make programmers'
lives easier.

(Note that both UTF-8 and UTF-16 can be traversed right-to-left at O(1)
cost, though this is not true for character encodings in general.)

-- 
When I'm stuck in something boring              John Cowan
where reading would be impossible or            (who loves Asimov too)
rude, I often set up math problems for          cowan_at_ccil.org
myself and solve them as a way to pass          http://www.ccil.org/~cowan
the time.      --John Jenkins

Received on Tue Mar 20 2007 - 16:08:12 UTC

This archive was generated by hypermail 2.3.0 : Wed Oct 23 2024 - 09:15:01 UTC