[r6rs-discuss] Strings as codepoint-vectors: bad

From: John Cowan <cowan>
Date: Thu Mar 15 17:24:20 2007

Jason Orendorff wrote:

> I think people who favor strings-as-codepoint-vectors must also think
> that breaking a surrogate pair is really bad. But even with a
> codepoint-centric view of text you can unwittingly break a grapheme
> cluster, which amounts to the same sort of bug--it can lead to garbled
> text--and which is probably much *more* common in practice. I never
> hear anyone complain about that.

I disagree that these two problems are analogous at all. Separating a
surrogate pair (a) is specific to UTF-16 and (b) leaves a result that
cannot be interpreted, because a lone surrogate is not well-formed in any
Unicode encoding form. Gumming up a grapheme cluster is more like an
off-by-one error when inserting a character: the output is garbled but
not garbage.
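To make the distinction concrete, here is a small sketch in Python (not part of the original post). It assumes Python's strict "utf-16-le" decoder as the test for "uninterpretable". The helper names utf16_units and decode_utf16 are hypothetical. It cuts a UTF-16 sequence inside a surrogate pair, then cuts a code-point string inside a grapheme cluster.

```python
# Sketch (not from the original post): contrast cutting a UTF-16 sequence
# inside a surrogate pair with cutting a code-point string inside a
# grapheme cluster. Helper names are hypothetical.
from typing import List, Optional


def utf16_units(s: str) -> List[int]:
    """Return the UTF-16 code units of s as integers."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def decode_utf16(units: List[int]) -> Optional[str]:
    """Strictly decode UTF-16 code units; return None if they are ill-formed."""
    data = b"".join(u.to_bytes(2, "little") for u in units)
    try:
        return data.decode("utf-16-le")  # strict error handling by default
    except UnicodeDecodeError:
        return None


# U+1D11E MUSICAL SYMBOL G CLEF needs a surrogate pair in UTF-16.
clef_units = utf16_units("\U0001D11E")    # [0xD834, 0xDD1E]

# Cutting between the two units leaves lone surrogates: neither half decodes.
first_half = decode_utf16(clef_units[:1])
second_half = decode_utf16(clef_units[1:])

# "e" + U+0301 COMBINING ACUTE ACCENT is one grapheme cluster but two code
# points. Cutting between them detaches the accent (garbled text), yet both
# pieces are still well-formed Unicode that round-trips through UTF-8.
word = "cafe\u0301"
head, tail = word[:4], word[4:]
print(first_half, second_half)
print(head.encode("utf-8").decode("utf-8") == head,
      tail.encode("utf-8").decode("utf-8") == tail)
```

The first cut produces data that no conforming decoder will accept. The second produces a word missing its accent plus a stray combining mark. That is wrong, but it is still text.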

-- 
John Cowan   <cowan_at_ccil.org>   http://www.ccil.org/~cowan
One time I called in to the central system and started working on a big
thick 'sed' and 'awk' heavy duty data bashing script.  One of the geologists
came by, looked over my shoulder and said 'Oh, that happens to me too.
Try hanging up and phoning in again.'  --Beverly Erlebacher
Received on Thu Mar 15 2007 - 17:24:16 UTC
