Michael linked to a Unicode FAQ earlier; I want to highlight this:
"Q: How about using UTF-32 interfaces in my APIs?
"A: Except in some environments that store text as UTF-32 in memory,
most Unicode APIs are using UTF-16. With UTF-16 APIs the low level
indexing is at the storage or code unit level, with higher-level
mechanisms for graphemes or words specifying their boundaries in terms
of the code units. This provides efficiency at the low levels, and the
required functionality at the high levels."
The author is Mark Davis, President of the Unicode Consortium.
http://unicode.org/faq/utf_bom.html#11
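
To make the point in the quoted answer concrete, here is a small sketch of my own (not from the FAQ). It indexes a string at the UTF-16 code unit level and then reports higher-level boundaries as code unit offsets. The helper names utf16_units and code_point_boundaries are made up for illustration, and I use code point boundaries as a simple stand-in for grapheme or word boundaries, which would need the Unicode segmentation rules and are not attempted here.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_point_boundaries(units: list[int]) -> list[int]:
    """Return the code-unit offset where each code point starts, plus the end offset."""
    boundaries = []
    i = 0
    while i < len(units):
        boundaries.append(i)
        # A high surrogate followed by a low surrogate is one code point
        # stored in two code units.
        is_pair = (
            0xD800 <= units[i] <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        i += 2 if is_pair else 1
    boundaries.append(len(units))
    return boundaries


# 'a', MUSICAL SYMBOL G CLEF (U+1D11E, outside the BMP), 'b'
sample = "a\U0001D11Eb"
units = utf16_units(sample)
print(len(units))                    # 4 code units for 3 code points
print(code_point_boundaries(units))  # [0, 1, 3, 4]
```

Low-level indexing stays a plain array lookup on code units, and the higher-level layer only has to hand back offsets into that same array.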
-j
Received on Fri Mar 23 2007 - 22:58:31 UTC