[r6rs-discuss] Strings

From: Jason Orendorff <jason.orendorff>
Date: Sat Mar 24 23:12:59 2007

Marcin 'Qrczak' Kowalczyk <qrczak_at_knm.org.pl> wrote:
> A disadvantage of UTF-16 is that character predicates like
> char-alphabetic? break for characters above U+FFFF.

This kind of bug is pretty common in Java, but it isn't a
necessary consequence of using UTF-16.

Nor does focusing on scalar values fix the problem:

  (define (all-alphabetic? s)
    (for-all char-alphabetic? (string->list s))) ;BUG

This bug is both subtler and more likely to bite.
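
To make the failure concrete, here is a minimal sketch. It assumes
the bug meant above is the combining-character case (the post does
not spell it out) and an R6RS implementation whose char-alphabetic?
follows the Unicode Alphabetic property. The names precomposed and
decomposed are illustrative only.

```scheme
(import (rnrs))

;; The definition from the post, unchanged.
(define (all-alphabetic? s)
  (for-all char-alphabetic? (string->list s)))

;; Two spellings of the same user-perceived letter, e with acute accent.
(define precomposed "\xE9;")     ; one scalar value: U+00E9
(define decomposed  "e\x301;")   ; two scalar values: U+0065 U+0301

;; U+00E9 is a lowercase letter, so this is expected to print #t.
(display (all-alphabetic? precomposed)) (newline)

;; U+0301 is a nonspacing mark without the Alphabetic property, so the
;; per-scalar-value test is expected to print #f for the same text.
(display (all-alphabetic? decomposed)) (newline)
```

No character above U+FFFF is involved, so the result does not depend
on whether the implementation stores strings as UTF-16 or as scalar
values.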

You could fix both by providing higher-level APIs:
  (string-first s) ===> the first grapheme cluster
  (string-rest s) ===> everything else
and so on. This path leads to a realignment of
all the string/character APIs toward grapheme clusters
and away from scalar values. I offer this because if the
editors want to do something unconventional, I think
this is the way to go.
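
A rough sketch of what string-first and string-rest might look like
follows. It is not a proposal for the actual API. It assumes a
deliberately simplified notion of grapheme cluster (one character
plus any combining marks that follow it), which is much cruder than
the full Unicode segmentation rules. The helpers combining-mark? and
cluster-end are hypothetical, and judging a cluster by its base
character is likewise an assumption.

```scheme
(import (rnrs))

;; Simplification: treat general categories Mn, Mc and Me as
;; "extends the preceding character".
(define (combining-mark? c)
  (memq (char-general-category c) '(Mn Mc Me)))

;; Index just past the first (simplified) cluster of s; 0 if s is empty.
(define (cluster-end s)
  (let ((n (string-length s)))
    (if (= n 0)
        0
        (let loop ((i 1))
          (if (and (< i n) (combining-mark? (string-ref s i)))
              (loop (+ i 1))
              i)))))

(define (string-first s)
  (substring s 0 (cluster-end s)))

(define (string-rest s)
  (substring s (cluster-end s) (string-length s)))

;; all-alphabetic? rewritten over clusters: each cluster is judged by
;; its base character, so trailing combining marks no longer matter.
(define (all-alphabetic? s)
  (or (= (string-length s) 0)
      (and (char-alphabetic? (string-ref (string-first s) 0))
           (all-alphabetic? (string-rest s)))))

(display (all-alphabetic? "e\x301;")) (newline)
```

With this version the decomposed spelling of e with acute accent is
a single cluster whose base character is a letter, so the check is
expected to succeed where the per-scalar-value version does not.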

-j
Received on Sat Mar 24 2007 - 23:12:51 UTC
