[R6RS] Procedures that depend on Unicode character classification
William D Clinger
will at ccs.neu.edu
Tue Jun 13 22:11:50 EDT 2006
Mike wrote:
> I'd like to suggest that the procedures that depend on the Unicode
> character classification, i.e. the ...-ci... procedures, the
> {char,string}...case procedures, and `char-general-property' procedure
> be moved to a separate library.
>
> They're by far not as frequently used as most of the others, and
> leaving them out of a program might save significant space, as the
> tables they need to operate tend to weigh in roughly at the order of
> 100kbytes (depending on the compact representation chosen).
Mike's estimate surprised me, so I spent a couple of hours
writing a parser for the UCD File Format and investigating
the table sizes. My estimated table sizes for a 32-bit
system, with no serious compression, are:
    tables for case folding and the -ci procedures:               9 kbytes
    tables for char-general-property and associated predicates:  10 kbytes
    tables for the four normalization procedures:                20 kbytes
I don't fully understand string normalization yet, so I'm
less confident about that estimate than the other two, but
I think my estimated 20 kbytes is more likely to be on the
high side than on the low side.
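
For readers who want to try a similar measurement, here is a minimal sketch in Python. It is not the parser described above. It assumes the semicolon-separated field layout of UnicodeData.txt (field 0 is the code point, fields 12 and 13 are the simple uppercase and lowercase mappings), uses a four-line inline sample instead of the real file, and assumes an uncompressed table of two 32-bit words per mapping. The function names are hypothetical.

```python
# Four sample records in the UnicodeData.txt layout (15 fields, ';'-separated).
SAMPLE = """\
0020;SPACE;Zs;0;WS;;;;;N;;;;;
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""


def parse_ucd(text):
    """Split UCD-style text into one list of stripped fields per record."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # some UCD files carry '#' comments
        if not line:
            continue
        records.append([field.strip() for field in line.split(";")])
    return records


def simple_case_pairs(records):
    """Collect (code point, mapped code point) pairs from fields 12 and 13."""
    pairs = []
    for fields in records:
        code_point = int(fields[0], 16)
        for index in (12, 13):  # simple uppercase, simple lowercase
            if fields[index]:
                pairs.append((code_point, int(fields[index], 16)))
    return pairs


def flat_table_bytes(n_entries, words_per_entry=2, bytes_per_word=4):
    """Size of an uncompressed table on a 32-bit system (assumed layout)."""
    return n_entries * words_per_entry * bytes_per_word


if __name__ == "__main__":
    pairs = simple_case_pairs(parse_ucd(SAMPLE))
    print(len(pairs), "mappings,", flat_table_bytes(len(pairs)), "bytes")
```

Running the same functions over the full UnicodeData.txt would give one rough data point for the case-mapping tables. The result depends entirely on the assumed representation, so it should not be expected to reproduce the figures above.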
Will