[r6rs-discuss] perhaps i should be formal, but....

From: Per Bothner <per>
Date: Wed Mar 14 15:23:00 2007

Thomas Lord wrote:

> My question is whether any principled reason for these arbitrary
> constants is given that might be supported without appeal
> to analogies to other programming languages.

Consider what happens if Unicode surrogate values are considered
valid characters. That implies they can be stored in a string,
which is basically a character array.

Then the question arises as to what it means to index into a
string: Is it the N'th code point or the N'th scalar value?
The draft specifies that it's the N'th scalar value - which
means any use of surrogates must be hidden.
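
As a rough illustration of what hiding surrogates entails, here is a
minimal Python sketch of indexing by scalar value over UTF-16 storage.
It is not taken from the draft: the function name nth_scalar and the
list-of-code-units representation are assumptions made for the example,
and the input is assumed to be well-formed UTF-16.

```python
def nth_scalar(units, n):
    """Return the n'th scalar value (0-based) from a list of UTF-16 code units.

    Assumes `units` is well-formed: every high surrogate is immediately
    followed by a low surrogate, and no low surrogate appears on its own.
    """
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: combine it with the following low surrogate
            # so the pair is seen as a single scalar value.
            low = units[i + 1]
            value = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
            width = 2
        else:
            value = u
            width = 1
        if count == n:
            return value
        count += 1
        i += width
    raise IndexError("scalar index out of range")


# "a", U+1D11E (stored as the pair D834 DD1E), "b"
units = [0x0061, 0xD834, 0xDD1E, 0x0062]
print(hex(nth_scalar(units, 1)))
```

Index 1 here names the whole pair, never one of its halves, which is
the sense in which the surrogates stay hidden from the caller.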

If you allow Unicode surrogate values as actual character
values, then you effectively prohibit an implementation
from storing characters internally using UTF-16, since you
can't tell whether a surrogate pair is one Scheme character
or two. UTF-16 is the natural representation in Java, at least.
(I think that might be the encoding used in Windows APIs as well.)
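
A minimal sketch of that ambiguity, in Python (an illustration only;
the helper name utf16_units is hypothetical). Python strings can hold
lone surrogate code points, so they serve as a stand-in for a design
where surrogates are characters:

```python
def utf16_units(s):
    """Encode a str as a list of 16-bit UTF-16 code units (hypothetical helper)."""
    # The "surrogatepass" error handler lets lone surrogate code points
    # through the encoder instead of raising UnicodeEncodeError.
    data = s.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


two_chars = "\ud800\udc00"   # two characters: a high and a low surrogate
one_char = "\U00010000"      # one character: scalar value U+10000

# The two strings differ in length as character sequences...
print(len(two_chars), len(one_char))
# ...yet this comparison shows whether their UTF-16 storage differs.
print(utf16_units(two_chars) == utf16_units(one_char))
```

Both strings come out as the same two code units, so an implementation
that keeps only the UTF-16 form has no way to recover which of the two
was originally stored.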

-- 
	--Per Bothner
per_at_bothner.com   http://per.bothner.com/

Received on Wed Mar 14 2007 - 15:21:50 UTC
