John Cowan <cowan_at_ccil.org> writes:
> The Unicode character U+FFFD is intended to represent a character
> in a non-Unicode encoding which is not representable within Unicode
> as currently defined. It is not intended to represent an encoding
> error.
I disagree. Here is what section 3.2 of Unicode 4.0 says:
"For example, in UTF-8 every code unit of the form 110xxxxx must be
followed by a code unit of the form 10xxxxxx. A sequence such as
110xxxxx 0xxxxxxx is ill-formed and must never be generated. When
faced with this ill-formed code unit sequence while transforming or
interpreting text, a conformant process must treat the first code unit
110xxxxx as an illegally terminated code unit sequence, for example,
by signaling an error, filtering the code unit out, or representing
the code unit with a marker such as U+FFFD replacement character."
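To make the quoted rule concrete, here is a rough Python sketch of a decoder that takes the "marker" option: an illegally terminated sequence becomes U+FFFD, and decoding resumes at the byte that broke the sequence, so that byte is not swallowed. The function name `decode_utf8_replacing` is hypothetical. Emitting one U+FFFD per maximal ill-formed prefix, instead of one per byte, is an assumption on my part; it follows the "maximal subpart" practice that later Unicode versions recommend.

```python
# Sketch: UTF-8 decoding where ill-formed sequences become U+FFFD.
# Assumption: one U+FFFD per maximal ill-formed prefix (not per byte).

REPLACEMENT = "\ufffd"

def _second_byte_range(lead: int) -> tuple[int, int]:
    """Allowed range for the first continuation byte after `lead`.
    Narrower ranges exclude overlongs, surrogates and values > U+10FFFF."""
    if lead == 0xE0:
        return 0xA0, 0xBF
    if lead == 0xED:
        return 0x80, 0x9F
    if lead == 0xF0:
        return 0x90, 0xBF
    if lead == 0xF4:
        return 0x80, 0x8F
    return 0x80, 0xBF

def decode_utf8_replacing(data: bytes) -> str:
    out: list[str] = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                  # 0xxxxxxx: ASCII, one byte
            out.append(chr(lead))
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:         # 110xxxxx: needs one 10xxxxxx
            need, cp = 1, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:       # 1110xxxx: needs two
            need, cp = 2, lead & 0x0F
        elif 0xF0 <= lead <= 0xF4:       # 11110xxx: needs three
            need, cp = 3, lead & 0x07
        else:                            # stray 10xxxxxx, C0, C1, F5..FF
            out.append(REPLACEMENT)
            i += 1
            continue
        j = i + 1
        lo, hi = _second_byte_range(lead)
        while need and j < n and lo <= data[j] <= hi:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
            need -= 1
            lo, hi = 0x80, 0xBF          # later bytes: any 10xxxxxx
        if need:
            # Illegally terminated: mark the consumed prefix with U+FFFD and
            # restart at data[j], which is then decoded on its own.
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
        i = j
    return "".join(out)
```

For the exact case in the quotation, `decode_utf8_replacing(b"\xc3A")` gives `"\ufffdA"`: the 110xxxxx byte becomes the marker and the following 0xxxxxxx byte still decodes as "A". Python's own `b"\xc3A".decode("utf-8", errors="replace")` produces the same string.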
--
__("< Marcin Kowalczyk
\__/ qrczak_at_knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Received on Fri Sep 22 2006 - 18:25:25 UTC