[r6rs-discuss] Strings

From: Marcin 'Qrczak' Kowalczyk <qrczak>
Date: Sun Mar 25 06:47:02 2007

On Sat, 24-03-2007 at 13:31 -0400, MichaelL_at_frogware.com
wrote:

> Summary
> "This document attempts to make the case that it is advantageous to use
> UTF-16 (or 16-bit Unicode strings) for text processing..."

IMHO this is one of the worst mistakes Unicode is trying to make.
It convinces people that they should not worry about characters above
U+FFFF just because they are very rare. UTF-16 combines the worst
aspects of UTF-8 and UTF-32.

If size is important and variable width of the representation of a code
point is acceptable, then UTF-8 is usually a better choice. If O(1)
indexing by code points is important, then UTF-32 is better. Nobody
wants to process texts in terms of UTF-16 code units. Nobody wants to
have surrogate processing sprinkled around the code, and thus if one
accepts an API which extracts variable-width characters, then the API
might as well deal with UTF-8, which is better for interoperability.
UTF-16 makes no sense.
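
To make the trade-off above concrete, here is a minimal Python sketch. It
is only an illustration under stated assumptions: the helper names
code_unit_counts and decode_utf16_units are hypothetical, not part of any
proposal. The first helper counts the code units a string needs in each
encoding form; the second shows the kind of surrogate pairing that any
code working on UTF-16 code units has to carry.

```python
def code_unit_counts(text):
    """Count the code units `text` needs in each encoding form.

    Hypothetical helper for illustration only.
    """
    return {
        "code points": len(text),
        "UTF-8": len(text.encode("utf-8")),            # 8-bit units
        "UTF-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }


def decode_utf16_units(units):
    """Turn a list of 16-bit code units into code points.

    A high surrogate (D800..DBFF) followed by a low surrogate
    (DC00..DFFF) is combined into one code point above U+FFFF.
    Unpaired surrogates are passed through unchanged (an assumption
    made here to keep the sketch short).
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            code_points.append(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            )
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


# "a", U+00E9, and U+1D11E (a character above U+FFFF).
sample = "a\u00e9\U0001D11E"
counts = code_unit_counts(sample)
decoded = decode_utf16_units([0x0061, 0xD834, 0xDD1E])
```

For this sample, the code point count matches the unit count only in
UTF-32; in UTF-16 the last character takes two units, so indexing by
code unit is already not indexing by character, just as in UTF-8.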

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak_at_knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
Received on Sun Mar 25 2007 - 06:46:49 UTC
