[r6rs-discuss] Re: [Formal] formal comment (ports, characters, strings, Unicode)

From: Per Bothner <per>
Date: Mon Mar 26 11:48:49 2007

Shiro Kawai wrote:
> Suppose I want to use Scheme as the extension language of
> the editor. It will have an operation to extract a region
> of the buffer as a Scheme string. And it will be useful
> if the extracted string contains language information as
> well, for I might want to do language-specific operations.

Associating arbitrary "properties" with a character or a
run of characters in a string is a very useful operation.
Emacs has this:

   Each character position in a buffer or a string can have a "text
   property list", much like the property list of a symbol.

Java Swing text "Document" objects provide something similar.
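To make the idea concrete, here is a minimal Python sketch of a string that carries a property list for each run of characters. It is loosely modeled on Emacs text properties (`put-text-property` / `get-text-property`). The class `PropertizedString`, its methods, and the run-splitting representation are hypothetical illustrations. They are not the API of Emacs, Swing, or any Scheme implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """A maximal run of characters [start, end) sharing one property list."""
    start: int
    end: int  # exclusive
    props: dict = field(default_factory=dict)


class PropertizedString:
    """Hypothetical string type with a property list per run of characters."""

    def __init__(self, text: str):
        self.text = text
        # Initially a single run covering the whole string, with no properties.
        self.runs = [Run(0, len(text))] if text else []

    def _split_at(self, pos: int) -> None:
        # Ensure a run boundary falls exactly at pos (copying the props).
        for i, run in enumerate(self.runs):
            if run.start < pos < run.end:
                left = Run(run.start, pos, dict(run.props))
                right = Run(pos, run.end, dict(run.props))
                self.runs[i:i + 1] = [left, right]
                return

    def put_property(self, start: int, end: int, name: str, value) -> None:
        # Set property `name` to `value` on characters [start, end).
        self._split_at(start)
        self._split_at(end)
        for run in self.runs:
            if start <= run.start and run.end <= end:
                run.props[name] = value

    def get_property(self, pos: int, name: str):
        # Value of property `name` at character position pos, or None if unset.
        for run in self.runs:
            if run.start <= pos < run.end:
                return run.props.get(name)
        raise IndexError(pos)

    def substring(self, start: int, end: int) -> "PropertizedString":
        # Extract a region, keeping the properties of the runs it overlaps.
        result = PropertizedString(self.text[start:end])
        if start >= end:
            return result
        result.runs = [
            Run(max(r.start, start) - start, min(r.end, end) - start, dict(r.props))
            for r in self.runs
            if r.start < end and r.end > start
        ]
        return result


s = PropertizedString("hello 世界")
s.put_property(6, 8, "language", "zh")
s.put_property(0, 5, "font", "monospace")
region = s.substring(4, 8)               # the text "o 世界"
print(region.get_property(0, "font"))      # monospace
print(region.get_property(2, "language"))  # zh
```

Storing runs rather than one property list per character keeps the cost proportional to the number of property changes, not the length of the text. Properties are also not limited to a fixed bit budget. The `substring` call mirrors the editor use case from the quoted message: an extracted region keeps its language information alongside any other properties.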

> Using 32bits per character and put auxiliary language info
> into the top 11 bits can be a plausible implementation.

For some applications 11 bits may be enough. But if you want a
language property as well as a font property, why then you're
already out of bits.
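The arithmetic behind the 11-bit figure: the largest Unicode code point, U+10FFFF, needs 21 bits, which leaves 32 - 21 = 11 bits in a 32-bit word. The following Python sketch packs a tag into those spare bits. The names `pack`, `unpack`, and the 8-bit language and font field widths are hypothetical, chosen only to illustrate the budget. Python integers are not fixed-width, so the 32-bit limit is an assumption the code enforces by range-checking the tag.

```python
CODE_POINT_BITS = 21                  # U+10FFFF, the largest code point, needs 21 bits
TAG_BITS = 32 - CODE_POINT_BITS       # 11 bits left over in a 32-bit word
CODE_POINT_MASK = (1 << CODE_POINT_BITS) - 1
MAX_TAG = (1 << TAG_BITS) - 1         # 2047, i.e. at most 2048 distinct tags


def pack(ch: str, tag: int) -> int:
    """Pack one character and an auxiliary tag into a single 32-bit value."""
    if not 0 <= tag <= MAX_TAG:
        raise ValueError(f"tag {tag} does not fit in {TAG_BITS} bits")
    return (tag << CODE_POINT_BITS) | ord(ch)


def unpack(word: int) -> tuple[str, int]:
    """Recover (character, tag) from a packed value."""
    return chr(word & CODE_POINT_MASK), word >> CODE_POINT_BITS


# Hypothetical field widths: room for 256 languages and 256 fonts.
LANGUAGE_BITS = 8
FONT_BITS = 8
print(LANGUAGE_BITS + FONT_BITS <= TAG_BITS)  # False: 16 bits needed, 11 available
```

A single 11-bit field covers one modest property. Once a second property is needed, the combined fields exceed the spare bits, and the information has to live outside the character representation.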

> (I think Emacs treats characters of different language by
> adding leading octet unique to each language.

Not quite. It can represent simultaneously different encodings
in the same buffer, but encoding isn't the same as language.
This "feature" is a holdover from the pre-Unicode (or rather
anti-Unicode) days: "Mule" was developed in Japan, where there
was a lot of anti-Unicode sentiment, but I think that war is over.
-- 
	--Per Bothner
per_at_bothner.com   http://per.bothner.com/
Received on Mon Mar 26 2007 - 11:46:50 UTC
