[r6rs-discuss] Strings as codepoint-vectors: bad


From: Michael Sperber <sperber>
Date: Wed Mar 21 05:15:09 2007

"Jason Orendorff" <jason.orendorff_at_gmail.com> writes:

> (c) If you know Unicode, it's not hard to work with code units.

I know Unicode to some extent, but I find it very hard to work with
UTF-16 code units. It's hard enough that, for example, most code
examples in Gillam's book, despite being written in Java, deal in
terms of Unicode scalar values rather than Java's native
representation.
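
To make the bookkeeping concrete, here is a minimal Python sketch (not
taken from Gillam's book or from any R5.92RS library) that views the
same text first as UTF-16 code units and then as scalar values. The
helper names utf16_code_units and scalar_values are made up for
illustration.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def scalar_values(units):
    """Combine surrogate pairs in a list of UTF-16 code units.

    Assumption: the input is well-formed UTF-16. An unpaired surrogate
    is passed through unchanged, which a real decoder would reject.
    """
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Two code units encode one scalar value above U+FFFF.
            result.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            result.append(unit)
            i += 1
    return result


text = "a\U0001D11E"  # LATIN SMALL LETTER A, MUSICAL SYMBOL G CLEF
units = utf16_code_units(text)
print([hex(u) for u in units])                 # three code units
print([hex(v) for v in scalar_values(units)])  # two scalar values
```

Any code that indexes, slices, or counts in code units has to carry
this pairing logic around; code written against scalar values does not.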

> UTF-8 and UTF-16 were explicitly designed with this in mind. If you
> don't know Unicode, you're unlikely to write correct code on top of
> the R5.92RS libraries anyway. Hiding code units eliminates exactly
> one pitfall--among *many*.

It's a particularly hideous pitfall, however. For fun, try to grok
Unicode normalization by studying the Java sample implementation at:

http://www.unicode.org/reports/tr15/Normalizer.java
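
As a rough illustration of why code units get in the way there, the
following Python sketch (using the standard unicodedata module, not the
Java sample above) decomposes a character outside the Basic
Multilingual Plane. Normalization is defined on code points, so a
UTF-16 implementation has to pair up surrogates before it can even
look up a decomposition. The helper utf16_length is made up for
illustration.

```python
import unicodedata


def utf16_length(text):
    """Number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-be")) // 2


half_note = "\U0001D15E"  # MUSICAL SYMBOL HALF NOTE
decomposed = unicodedata.normalize("NFD", half_note)

# Canonical decomposition: U+1D157 (void notehead) + U+1D165 (combining stem).
print([hex(ord(c)) for c in decomposed])

# Scalar-value view: 1 before, 2 after.
print(len(half_note), len(decomposed))

# UTF-16 code-unit view: 2 before, 4 after, all of them surrogates.
print(utf16_length(half_note), utf16_length(decomposed))
```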

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla

Received on Wed Mar 21 2007 - 05:15:04 UTC
