[r6rs-discuss] [Formal] U+FFFD not intended for encoding errors

From: John Cowan <cowan>
Date: Fri Sep 22 18:27:38 2006

Marcin 'Qrczak' Kowalczyk scripsit:

> "For example, in UTF-8 every code unit of the form 110xxxxx must be
> followed by a code unit of the form 10xxxxxx. A sequence such as
> 110xxxxx 0xxxxxxx is ill-formed and must never be generated. When
> faced with this ill-formed code unit sequence while transforming or
> interpreting text, a conformant process must treat the first code unit
> 110xxxxx as an illegally terminated code unit sequence, for example,
> by signaling an error, filtering the code unit out, or representing
> the code unit with a marker such as U+FFFD replacement character."

Good catch. I withdraw my comment.
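
The three treatments named in the quoted passage (signal an error, filter
the code unit out, or substitute U+FFFD) can be sketched with Python's
built-in UTF-8 codec and its standard "strict", "ignore", and "replace"
error handlers. This is only an illustration: the byte string and the
helper name decode_with are assumptions chosen for the example, not
anything taken from the thread or from the R6RS drafts.

```python
# 0xC3 has the form 110xxxxx (lead byte of a two-byte sequence), but
# 0x41 ("A") has the form 0xxxxxxx, so the pair is ill-formed UTF-8.
ILL_FORMED = b"\xc3\x41"


def decode_with(policy):
    """Decode the sample bytes using one of Python's codec error handlers."""
    return ILL_FORMED.decode("utf-8", errors=policy)


# Option 1: signal an error.
try:
    decode_with("strict")
    signaled = False
except UnicodeDecodeError:
    signaled = True

# Option 2: filter the offending code unit out; the "A" survives.
filtered = decode_with("ignore")

# Option 3: represent the offending code unit with U+FFFD.
marked = decode_with("replace")

print(signaled, repr(filtered), repr(marked))
```

In all three cases only the dangling lead byte is treated as the error;
the following 0x41 is still decoded as an ordinary "A".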

-- 
Real FORTRAN programmers can program FORTRAN    John Cowan
in any language.  --Allen Brown                 cowan_at_ccil.org
Received on Fri Sep 22 2006 - 18:27:34 UTC
