[R6RS] I/O questions for everyone: encoding errors
Matthew Flatt
mflatt at cs.utah.edu
Thu Jul 13 14:57:29 EDT 2006
At Thu, 13 Jul 2006 19:58:50 +0200, Michael Sperber wrote:
>
> 1. If one of the read-char and read-string... procedures encounters an
> invalid encoding, should it:
>
> a) skip the first byte of the invalid encoding and treat it as
> U+FFFD (REPLACEMENT CHARACTER)
> b) skip the first byte of the invalid encoding and ignore it
> c) raise a continuable exception that allows the handler to specify
> what the decoding should be
> d) do one of the above depending on an (optional) configuration
> option specified upon opening the port.
"c", but without the "continuable" part.
Options "a" and "b" can be expressed as decodings. For example, one could
define a decoding like UTF-8, except that bytes that would be bad in UTF-8
are decoded as U+FFFD. Similarly, one could define a decoding that ignores
bytes that would be bad for UTF-8.
Or maybe I mean "d", because you get to specify the transcoder for the
port.
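A rough illustration of the idea that "a" and "b" are just alternative decodings, using Python's codec error handlers as a stand-in (R6RS has no such API, and the handler names `utf8-replace-first` and `utf8-skip-first` are hypothetical, chosen for this sketch). Each handler reacts to an invalid sequence by consuming only its first byte, as options "a" and "b" are worded, and strict decoding plays the role of "c" without the continuable part:

```python
import codecs

# Hypothetical handler names, registered for this sketch only.
def replace_first_byte(exc: UnicodeError):
    """Option "a": decode the first byte of a bad sequence as U+FFFD."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # Resume right after the first bad byte, not after the whole
    # ill-formed subsequence that Python reports (exc.start..exc.end).
    return ("\ufffd", exc.start + 1)

def skip_first_byte(exc: UnicodeError):
    """Option "b": drop the first byte of a bad sequence and resume."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    return ("", exc.start + 1)

codecs.register_error("utf8-replace-first", replace_first_byte)
codecs.register_error("utf8-skip-first", skip_first_byte)

# E2 82 starts a 3-byte sequence, but 'c' (0x63) is not a continuation byte.
data = b"ab\xe2\x82cd"

print(ascii(data.decode("utf-8", errors="utf8-replace-first")))  # 'ab\ufffd\ufffdcd'
print(ascii(data.decode("utf-8", errors="utf8-skip-first")))     # 'abcd'
try:
    data.decode("utf-8")  # default errors="strict": an exception, like "c"
except UnicodeDecodeError as exc:
    print("invalid encoding at byte", exc.start)  # 2
```

Python's built-in "replace" handler behaves slightly differently: for this input it yields a single U+FFFD for the ill-formed pair E2 82, because it resumes after the whole reported subsequence rather than after its first byte.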
For transcoders created with the current pre-defined codecs, I think
"c" is the right answer. To get "a"- or "b"-like behavior, we could add
some pre-defined "a"- or "b"-like codecs, or we could add an extra
argument to `transcoder'; I have no opinion on whether or how that
should be done, though.
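An extra argument to `transcoder' would fix the error policy at the point where the port is opened. Python's `io.TextIOWrapper` takes an `errors` argument in just that position, so it can serve as a loose sketch of the idea (the `read_all` helper is hypothetical, and this is an analogy only, not a proposal for the R6RS signature):

```python
import io

data = b"ab\xffcd"  # 0xFF can never appear in well-formed UTF-8

def read_all(raw: bytes, errors: str) -> str:
    """Open a text "port" over raw bytes with a fixed decoding-error policy."""
    port = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors=errors)
    return port.read()

print(ascii(read_all(data, "replace")))  # 'ab\ufffdcd'  ("a"-like)
print(ascii(read_all(data, "ignore")))   # 'abcd'        ("b"-like)
try:
    read_all(data, "strict")             # raises, like "c"
except UnicodeDecodeError as exc:
    print("decoding failed:", exc.reason)
```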
> 2. If one of the write-char and write-string... procedures gets
> passed a character that the transcoder of the port cannot encode,
> should it:
>
> a) encode the U+003F (QUESTION MARK) character instead
> b) try to encode the U+FFFD (REPLACEMENT CHARACTER), and, if that
> fails, do one of the other options
> c) ignore the character
> d) raise a continuable exception that allows the handler to specify
> what the encoding should be
> d) do one of the above depending on an (optional) configuration
> option specified upon opening the port.
Same reasoning, but the given choices make the answer easier: "d".
(Again without the continuable part, though.)
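The encoding side can be sketched the same way in Python. The built-in "replace" and "ignore" handlers match options "a" and "c", strict encoding matches the exception option, and option "b" needs a small custom handler, registered here under the hypothetical name `fffd-or-question`. The fallback to `?` assumes an ASCII-compatible target encoding:

```python
import codecs

def fffd_then_question_mark(exc: UnicodeError):
    """Option "b": try U+FFFD in the target encoding, else fall back to "?"."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    try:
        rep = "\ufffd".encode(exc.encoding)
    except UnicodeEncodeError:
        rep = b"?"  # assumes an ASCII-compatible target encoding
    # Bytes replacements are copied as-is; replace one character, then resume.
    return (rep, exc.start + 1)

codecs.register_error("fffd-or-question", fffd_then_question_mark)

text = "x\u20acy"  # U+20AC EURO SIGN is in neither ASCII nor Latin-1

print(text.encode("ascii", errors="replace"))                # b'x?y'  (option "a")
print(text.encode("ascii", errors="ignore"))                 # b'xy'   (option "c")
print(text.encode("latin-1", errors="fffd-or-question"))     # b'x?y'  (option "b")
print("x\ud800y".encode("utf-8", errors="fffd-or-question"))  # b'x\xef\xbf\xbdy'
try:
    text.encode("ascii")  # strict: an exception, minus the continuable part
except UnicodeEncodeError as exc:
    print("cannot encode", repr(exc.object[exc.start]))
```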
Matthew