[R6RS] Changing the transcoding mid-stream
Michael Sperber
sperber at informatik.uni-tuebingen.de
Sat Aug 19 07:42:02 EDT 2006
William D Clinger <will at ccs.neu.edu> writes:
> Note, however, that the section is called "Text Transcoders",
> and the sentence preceding the one you quoted begins with the
> words "Text transcoders".
Yes, and I already admitted that this was potentially misleading. The
next draft will not have this wording.
> The third paragraph of that section requires transcoders to
> do something weird if they encounter an illegal encoding.
> That implies that all of the transcoders, including the UTF-8
> transcoder, will interfere with binary i/o. Since the SRFI
> also says that "no codec" corresponds to UTF-8, it follows
> that the proposal is useless for what I mean by binary i/o.
OK, I get the misunderstanding. More below.
> I don't want to argue with you about what most programmers
> believe or have considered. What concerns me is whether
> the proposal can deal with what I personally consider to
> be binary and mixed binary/textual i/o. Since you do not
> like my definition of those things,
I don't dislike your definitions; I'm still in the process of
understanding them, just as you were still in the process of
understanding mine. I do want to address your concerns, but up to now
I didn't know what they were.
First of all, here's how to read your WAV file:
(open-file-input-port "foo.wav")
Indeed, "no transcoder" isn't the same as specifying a UTF-8
transcoder, and the sentence you quoted is highly misleading. Sorry
about that. As long as no transcoder is associated with the port, the
various binary I/O procedures deal with the binary data without doing
any interpretation of it as UTF-8. Even a
(transcoder (eol-style (eol-style crlf)))
would only ever look for #x0a and #x0d bytes and translate those,
ignoring the rest.
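
To make the "no transcoder" case concrete, here is a minimal sketch. It assumes the names that ended up in the final R6RS `(rnrs io ports)` library (`open-bytevector-input-port`, `get-u8`), which are not necessarily the names in the draft under discussion, and it reads from an in-memory bytevector instead of "foo.wav" so that it is self-contained. The helper `read-all-bytes` is hypothetical.

```scheme
;; Sketch only: uses final-R6RS names, not the draft names from this thread.
(import (rnrs))

;; #xFF and #xFE can never occur in well-formed UTF-8.
(define raw (u8-list->bytevector '(#x52 #x49 #xFF #xFE)))

;; No transcoder argument, so the port is a binary port.
(define port (open-bytevector-input-port raw))

;; Hypothetical helper: collect every remaining byte of a binary port.
(define (read-all-bytes p)
  (let loop ((acc '()))
    (let ((b (get-u8 p)))
      (if (eof-object? b)
          (reverse acc)
          (loop (cons b acc))))))

(define bytes-read (read-all-bytes port))
(display bytes-read)
(newline)
```

The bytes come back exactly as stored, including the two that a UTF-8 decoder would have to reject or replace.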
> You couldn't use read-char to read the text fields because read-char
> assumes UTF-8. You would have to use read-bytes-n (or similar) for
> the text fields, and then translate the bytes yourself.
No, you wouldn't. The details depend on what mechanism we adopt for
associating a transcoder with a port and/or changing it mid-stream.
In the proposal where the transcoder is an argument, you'd do:
(define port (open-file-input-port "foo.wav"))
; binary I/O
... (get-u8 port) ...
; textual I/O
... (get-char port (transcoder (codec (latin1-codec)))) ...
; binary I/O again
... (get-u8 port) ...
If the transcoder is settable, you do:
(define port (open-file-input-port "foo.wav"))
; binary I/O
... (get-u8 port) ...
; textual I/O
(input-port-transcoder-set! port (transcoder (codec (latin1-codec))))
... (get-char port) ...
; binary I/O again
(input-port-transcoder-set! port (transcoder))
... (get-u8 port) ...
(One might make #f an alias for (transcoder) here.)
> In particular, we have no operations for translating bytes objects
> (or subsequences of bytes objects) into strings. (We have
> open-bytes-reader, but there is no way to specify a
> translation/transcoding for it.)
No, but there's `open-bytes-input-port', and that does let you
specify a transcoder.
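
A minimal sketch of that idea, translating a bytes object into a string by opening a transcoded port on it. It assumes the final R6RS spellings (`open-bytevector-input-port`, `make-transcoder`, `latin-1-codec`, `get-string-all`), not the draft names used above; the helper `latin1-bytes->string` is hypothetical.

```scheme
;; Sketch only: uses final-R6RS names, not the draft names from this thread.
(import (rnrs))

;; Hypothetical helper: decode a bytevector as Latin-1 text by reading it
;; through a port that has a Latin-1 transcoder attached.
(define (latin1-bytes->string bv)
  (let* ((port (open-bytevector-input-port
                bv
                (make-transcoder (latin-1-codec))))
         (s (get-string-all port)))
    (close-port port)
    ;; get-string-all returns the eof object for an empty input.
    (if (eof-object? s) "" s)))

;; #x56 #xF6 #x6C #x6B is "Völk" in Latin-1; #xF6 alone is not valid UTF-8.
(define decoded
  (latin1-bytes->string (u8-list->bytevector '(#x56 #xF6 #x6C #x6B))))
```

Each input byte maps to exactly one character here, so `decoded` has four characters and its second one is U+00F6.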
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla