Hi Jason,
Jason Orendorff wrote:
> And most (but not all) Unicode string implementations use UTF-16.
> Among languages and libraries that are very widely used, the majority
> is overwhelming: Java, Microsoft's CLR, Python, JavaScript, Qt,
> Xerces-C, and on and on. (The few counterexamples use UTF-8: glib,
> expat. And expat can be compiled to use UTF-16.)
If this is true, then I would expect to find relatively little mention
of UTF-8 compared to UTF-16 on the internet. However, the Google test
turns up *1,040,000* for *utf-16* versus *173,000,000* for *utf-8*.
Now, of course I realize that this is a particularly crude technique for
determining the relative popularity of UTF-8 and UTF-16, but even a very
crude technique should not be off by this much. Roughly 166 : 1 is
quite a steep ratio.
I'm sure this all has a simple explanation, but if we're going to use
popularity as a criterion for choosing a string representation, then we
ought to be really sure that we've got that popularity lined up the
right way around.
Incidentally: *497,000* for *utf-32*.
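For anyone who wants to check the arithmetic, here is a quick sketch in
Python. The hit counts are the ones quoted in this message; the names
`hits` and `ratio` are mine, purely for illustration, and the numbers
say nothing beyond what a crude search-count comparison can say.

```python
# Search hit counts as quoted in this message (a crude popularity proxy).
hits = {"utf-8": 173_000_000, "utf-16": 1_040_000, "utf-32": 497_000}

def ratio(a, b):
    """Return how many times more hits encoding `a` has than encoding `b`."""
    return hits[a] / hits[b]

# Compare utf-8 against each of the other two encodings.
for other in ("utf-16", "utf-32"):
    print(f"utf-8 : {other} = {ratio('utf-8', other):.0f} : 1")
```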
Furthermore, the IETF likes UTF-8 best. From the UTF-8 Wikipedia page:
  "The Internet Engineering Task Force (IETF) requires all Internet
  protocols to identify the encoding used for character data with UTF-8 as
  at least one supported encoding."
Regards,
Jon
Received on Mon Mar 26 2007 - 16:22:22 UTC