[r6rs-discuss] Strings

From: Jon Wilson <j85wilson>
Date: Mon Mar 26 16:22:32 2007

Hi Jason,

Jason Orendorff wrote:
> And most (but not all) Unicode string implementations use UTF-16.
> Among languages and libraries that are very widely used, the majority
> is overwhelming: Java, Microsoft's CLR, Python, JavaScript, Qt,
> Xerces-C, and on and on. (The few counterexamples use UTF-8: glib,
> expat. And expat can be compiled to use UTF-16.)
If this is true, then I would expect to find relatively little mention
of UTF-8 compared to UTF-16 on the internet. However, the Google test
turns up *1,040,000* hits for *utf-16* versus *173,000,000* for *utf-8*.
Now, of course I realize that this is a particularly crude technique for
determining the relative popularity of UTF-8 and UTF-16, but even a very
crude technique should not be off by this much. Roughly 166 : 1 is
quite a steep ratio.

I'm sure this all has a simple explanation, but if we're going to use
popularity as a criterion for choosing a string representation, then we
ought to be really sure that we've got that popularity lined up the
right way around.

Incidentally: *497,000* for *utf-32*.
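
As a quick check of the arithmetic, here is a small Python sketch that
works out the ratios from the hit counts quoted in this message. The
dictionary and the helper name `ratio` are made up for illustration, and
the counts are only the March 2007 search results mentioned above, not a
measurement of how strings are stored in memory.

```python
# Google hit counts as quoted in this message (March 2007).
hits = {"utf-8": 173_000_000, "utf-16": 1_040_000, "utf-32": 497_000}


def ratio(a, b):
    """Return how many hits encoding `a` has per hit of encoding `b`."""
    return hits[a] / hits[b]


# Compare UTF-8 against each of the other two encodings.
for other in ("utf-16", "utf-32"):
    print(f"utf-8 : {other} = {ratio('utf-8', other):.0f} : 1")
# prints:
#   utf-8 : utf-16 = 166 : 1
#   utf-8 : utf-32 = 348 : 1
```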

Furthermore, the IETF likes UTF-8 best. From the UTF-8 Wikipedia page:

The Internet Engineering Task Force (IETF) requires all Internet
protocols to identify the encoding used for character data with UTF-8 as
at least one supported encoding.

Regards,
Jon
Received on Mon Mar 26 2007 - 16:22:22 UTC
