[r6rs-discuss] Strings from MichaelL_at_frogware.com on 2007-03-21 (r6rs-discuss.mbox)

From: MichaelL_at_frogware.com <MichaelL>
Date: Wed Mar 21 18:13:36 2007

Jason Orendorff wrote:

> And most (but not all) Unicode string implementations use UTF-16.
> Among languages and libraries that are very widely used, the majority
> is overwhelming: Java, Microsoft's CLR, Python, JavaScript, Qt,
> Xerces-C, and on and on.

(...and Windows and Mac and IBM's ICU and PHP 6 and...)

> Higher-level APIs are a fine approach.
>
> The other solution is to standardize the implementation, so that the
> efficient algorithms don't differ. I want to push this seriously one
> last time: Unicode strings have been kicked around for a while now,
> and despite Will's link, real-world implementations do not vary much.
> I don't think it's premature to standardize.

I started looking into these issues a while ago when we were faced with
internationalizing an app. (The app runs on several platforms and under
several web servers.) Before learning about what's out there, I would have
wanted to keep my options open; knowing what I know now, I'd agree with
Jason. It would make sense to standardize on UTF-16 strings and UTF-32
characters. (Note, btw, that that doesn't preclude UTF-8 strings. It just
means that the built-in string type would be UTF-16.)
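
To make the string/character distinction concrete, here is a minimal sketch in Python (not Scheme, and not anything from the R6RS draft). The helper names utf16_code_units and code_points are made up for illustration. It shows a three-character string whose last character lies outside the Basic Multilingual Plane: a UTF-16 string stores it as two 16-bit code units (a surrogate pair), while a UTF-32 character holds it as a single value.

```python
def utf16_code_units(text: str) -> list:
    """Return the 16-bit UTF-16 code units that encode `text` (hypothetical helper)."""
    raw = text.encode("utf-16-be")  # big-endian, so no byte-order mark is emitted
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def code_points(text: str) -> list:
    """Return the Unicode code points, i.e. what a UTF-32 character would hold."""
    return [ord(ch) for ch in text]


# 'a', e-acute, and MUSICAL SYMBOL G CLEF (U+1D11E, outside the BMP)
sample = "a\u00e9\U0001d11e"

units = utf16_code_units(sample)
points = code_points(sample)

print([hex(u) for u in units])   # four code units; the last two form a surrogate pair
print([hex(p) for p in points])  # three code points
```

Indexing the string by code unit is constant time but can land in the middle of a surrogate pair; indexing by character needs either a scan or a wider representation. That trade-off is what a "UTF-16 strings, UTF-32 characters" design would be exposing.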

On a different note, I find this desire to shield programmers from code
units odd and senseless. If R6RS intends Scheme to be a higher-level
language that abstracts away representation issues, why is it adding
fixnums and flonums? Why do bytevectors have operations that get and set
singles and doubles?
Received on Wed Mar 21 2007 - 18:13:00 UTC
