Abdulaziz Ghuloum wrote:
>
> On Mar 20, 2007, at 1:34 AM, Per Bothner wrote:
>> Well, there may be constraints that complicate that. For example
>> a Java-based implementation may want to use java.lang.String
>> for immutable strings.
>
> I think you're complaining more about the shortcomings and limitations
> of the JVM;
No, this is not a shortcoming/limitation of the JVM. Java String
objects, and likewise Java char arrays, can represent all of Unicode,
can do it compactly using a simple array, and String/array indexing are
O(1). But such indexing retrieves code units, not scalar values.
This works fine, since there is no real application where you need
to index the N'th scalar value of a string.
(There is some historical baggage in Java relating to this distinction,
but the problem is the character type, not the string or char array
types: there should be distinct primitive types for code unit
and scalar value. As it is, 'char' is a code unit, and 'int' is
used for scalar values.)
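To illustrate the code-unit versus scalar-value distinction in Java terms, here is a small sketch. It is not from the original message. It uses only standard java.lang calls. The sample string (an "a" followed by U+1D11E, a character outside the Basic Multilingual Plane) and the class name are arbitrary choices:

```java
public class CodeUnitsDemo {
    // "a" followed by U+1D11E (MUSICAL SYMBOL G CLEF). That character lies
    // outside the Basic Multilingual Plane, so a String stores it as a
    // surrogate pair: two 16-bit chars.
    static final String SAMPLE = "a" + new String(Character.toChars(0x1D11E));

    public static void main(String[] args) {
        // length() counts UTF-16 code units; codePointCount counts scalar values.
        System.out.println(SAMPLE.length());                           // 3
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length())); // 2

        // charAt is O(1) and yields a code unit: here, the high surrogate.
        char unit = SAMPLE.charAt(1);
        System.out.println(Character.isHighSurrogate(unit));           // true

        // codePointAt takes a code-unit index and yields the scalar value as an int.
        int scalar = SAMPLE.codePointAt(1);
        System.out.println(Integer.toHexString(scalar));               // 1d11e
    }
}
```

Indexing stays O(1) on the underlying char array. The price is that an index names a code unit, and a single scalar value may span two of them.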
>> However, it seems to prohibit an implementation that is simple
>> (the way a raw array is), space-efficient, and O(1) for
>> string-ref/set! Pick any two.
> As far as I can tell, there is no representation that has all three;
> otherwise, we would've seen it everywhere and there wouldn't have been
> any other logical choice. So why are we complaining about the draft
> prohibiting the nonexistent?
If we add:
(string-codepoint-ref str i)
then we can achieve all three.
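A rough Java analogue of the proposal follows. It assumes that (string-codepoint-ref str i) indexes by code unit, the way Java's charAt does. That reading is an interpretation, not something the message spells out. The method names stringCodepointRef and scalarValues are hypothetical. The second method shows that sequential processing of scalar values still works with code-unit indices, without random access to the Nth scalar value:

```java
import java.util.ArrayList;
import java.util.List;

public class CodeUnitRefSketch {
    // Hypothetical analogue of the proposed (string-codepoint-ref str i),
    // assuming it indexes by code unit: a single O(1) array access.
    static char stringCodepointRef(String str, int i) {
        return str.charAt(i);
    }

    // Hypothetical helper: walks the string front to back, collecting each
    // scalar value by advancing the code-unit index by 1 or 2.
    static List<Integer> scalarValues(String str) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < str.length()) {
            int cp = str.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // "x", U+1F600 (a surrogate pair in UTF-16), "y"
        String s = "x" + new String(Character.toChars(0x1F600)) + "y";
        System.out.println((int) stringCodepointRef(s, 0)); // 120
        System.out.println(scalarValues(s).size());          // 3
    }
}
```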
Furthermore, the draft should remove the datatype of fixed-length
mutable string, since it is
(a) useless,
(b) difficult to implement unless you also make it variable-length.
That is, since you cannot allocate a fixed-size chunk of memory for
a string of N scalar values (unless you use 24 bits per scalar
value), you need a buffer that can be resized. And if a string
can be resized, why not expose that in the string API, since that
is very useful?
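The resizing argument can be seen concretely with a UTF-16 buffer. Below is a sketch of a hypothetical scalar-indexed string-set! (the name setScalar is invented for illustration) over a java.lang.StringBuilder. Replacing a BMP character with a supplementary one changes the buffer's length in code units, even though the number of scalar values stays the same:

```java
public class ScalarSetSketch {
    // Hypothetical scalar-indexed string-set! over a UTF-16 buffer.
    // Mutates sb in place so that its k-th scalar value becomes newScalar.
    static void setScalar(StringBuilder sb, int k, int newScalar) {
        // Locating the k-th scalar value requires a linear scan: O(k).
        int start = sb.offsetByCodePoints(0, k);
        int oldScalar = sb.codePointAt(start);
        int end = start + Character.charCount(oldScalar);
        // The old and new values may each be 1 or 2 code units wide, so the
        // buffer may grow or shrink: a fixed-size char array would not do.
        sb.replace(start, end, new String(Character.toChars(newScalar)));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        System.out.println(sb.length());                       // 3
        setScalar(sb, 1, 0x1F600);                             // 'b' -> U+1F600
        System.out.println(sb.length());                       // 4
        System.out.println(sb.codePointCount(0, sb.length())); // 3
    }
}
```

The alternative the message mentions, a fixed width per scalar value (at least 21 bits, so 24 in practice, or 32 with a Java int[]), avoids resizing but gives up the compactness of UTF-16.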
--
--Per Bothner
per_at_bothner.com http://per.bothner.com/
Received on Tue Mar 20 2007 - 11:46:14 UTC