Shiro Kawai wrote:
> Suppose I want to use Scheme as the extension language of
> the editor. It will have an operation to extract a region
> of the buffer as a Scheme string. And it will be useful
> if the extracted string contains language information as
> well, for I might want to do language-specific operations.
Associating arbitrary "properties" with a character or a
run of characters in a string is a very useful operation.
Emacs has this:
    Each character position in a buffer or a string can have a "text
    property list", much like the property list of a symbol.
Java Swing text "Document" objects provide something similar.
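
To make the idea concrete, here is a rough sketch of properties attached
to runs of characters rather than packed into each character. This is not
how Emacs or Swing implement it; the class and method names are made up
for illustration, and Python is used only for brevity.

```python
class TextWithProperties:
    """A string plus property runs (hypothetical illustration only)."""

    def __init__(self, text):
        self.text = text
        self.runs = []  # list of (start, end, props); end is exclusive

    def put(self, start, end, **props):
        """Attach props to the characters in [start, end)."""
        if not 0 <= start <= end <= len(self.text):
            raise IndexError("run is outside the text")
        self.runs.append((start, end, dict(props)))

    def properties_at(self, index):
        """Merge the props of every run covering index (later runs win)."""
        merged = {}
        for start, end, props in self.runs:
            if start <= index < end:
                merged.update(props)
        return merged

    def substring(self, start, end):
        """Extract a region, carrying along the clipped property runs."""
        piece = TextWithProperties(self.text[start:end])
        for s, e, props in self.runs:
            lo, hi = max(s, start), min(e, end)
            if lo < hi:
                piece.runs.append((lo - start, hi - start, dict(props)))
        return piece


buf = TextWithProperties("Hello 世界")
buf.put(6, 8, language="ja")    # a run of two characters
buf.put(0, 8, font="serif")     # an unrelated property over the whole text
region = buf.substring(4, 8)    # the extracted region keeps both properties
```

Because each run holds an arbitrary property list, adding a second or
third kind of property costs nothing per character.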
> Using 32bits per character and put auxiliary language info
> into the top 11 bits can be a plausible implementation.
For some applications 11 bits may be enough. But if you want a
language property as well as a font property, why then you're
already out of bits.
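
A quick sketch of the arithmetic, for illustration only. Unicode code
points go up to 0x10FFFF and so need 21 bits, which is where the 11 spare
bits in a 32-bit cell come from. The 6/5 split between language and font
below is purely an assumption of mine, as are all the names.

```python
CODE_POINT_BITS = 21                       # 0x10FFFF < 2**21
TAG_BITS = 32 - CODE_POINT_BITS            # the 11 auxiliary bits
CODE_POINT_MASK = (1 << CODE_POINT_BITS) - 1
MAX_TAG = (1 << TAG_BITS) - 1              # largest 11-bit value


def pack(code_point, tag):
    """Store the tag in the top 11 bits and the code point in the low 21."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if not 0 <= tag <= MAX_TAG:
        raise ValueError("tag does not fit in 11 bits")
    return (tag << CODE_POINT_BITS) | code_point


def unpack(cell):
    """Return (code_point, tag) from a 32-bit cell."""
    return cell & CODE_POINT_MASK, cell >> CODE_POINT_BITS


# Hypothetical split of the 11 bits between two properties.
LANGUAGE_BITS = 6                          # room for 64 languages
FONT_BITS = TAG_BITS - LANGUAGE_BITS       # 5 bits left: 32 fonts


def make_tag(language_id, font_id):
    """Combine a language id and a font id into one 11-bit tag."""
    if not 0 <= language_id < (1 << LANGUAGE_BITS):
        raise ValueError("language id does not fit")
    if not 0 <= font_id < (1 << FONT_BITS):
        raise ValueError("font id does not fit")
    return (language_id << FONT_BITS) | font_id


def split_tag(tag):
    """Return (language_id, font_id) from an 11-bit tag."""
    return tag >> FONT_BITS, tag & ((1 << FONT_BITS) - 1)
```

With this particular split, two properties already use every bit, and
neither has much range; a third property has nowhere to go.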
> (I think Emacs treats characters of different language by
> adding leading octet unique to each language.
Not quite. It can simultaneously represent different encodings
in the same buffer, but encoding isn't the same as language.
This "feature" is a holdover from the pre-Unicode (or rather
anti-Unicode) days: "Mule" was developed in Japan, where there
was a lot of anti-Unicode sentiment, but I think that war is over.
--
--Per Bothner
per_at_bothner.com http://per.bothner.com/
Received on Mon Mar 26 2007 - 11:46:50 UTC