From: William D Clinger <will_at_ccs.neu.edu>
Subject: [r6rs-discuss] Stateful codecs and inefficient transcoding
Date: Tue, 07 Nov 2006 11:33:39 -0500
> For the latest, up-to-the-minute list of my mistakes and
> other people's ideas for fixing draft R6RS section 15.3
> (port i/o), please see http://www.ccs.neu.edu/home/will/R6RS/
Much cleaner, IMHO. A few comments and requests for clarification.
* Although I agree that splitting binary and text ports is a
good idea, I have one concern.
Are the default (current-input-port) and (current-output-port)
text ports or binary ports? If they are text ports, then
we need a way to specify their transcoders before the
Scheme process begins. If they are binary ports, a large
number of scripts will probably need to take extra steps to
set up text I/O:
  (with-input-from-port (transcoded-port
                         (current-input-port)
                         (some-transcoder))
    (lambda ()
      (with-output-to-port (transcoded-port
                            (current-output-port)
                            (some-transcoder))
        (lambda ()
          ...the actual processing...
          ))))
I think Java requires something like this, so it may not
be as outrageous as it seems to me. Alternative ideas are:
(1) have separate ports for binary and text, e.g.
current-text-{input|output}-port/current-binary-{input|output}-port,
and require Scheme programs to use only one kind per process, or
(2) drop the concept of current-input/output-ports from the standard.
* The requirement of closing the original binary port, when a
text port is created from it, seems a bit too constraining.
For input ports it might be reasonable, since there would
be no portable way to know how much data is buffered within
the transcoded port. For output ports, however, we can
make sure the transcoding is 'done' by flushing and closing
the transcoded port, and we still might want to use the original
binary port for other purposes. One such use is terminal
output in interactive use; a user might want to try out several
transcoders to see which matches the terminal's coding system.
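As a rough illustration of the output case, here is a sketch in
Python rather than Scheme. It assumes Python's io.TextIOWrapper as a
stand-in for a transcoded port and io.BytesIO as a stand-in for the
terminal's binary port; neither is part of the draft, and the names
raw, text and result are made up for the example. Each text layer is
flushed and then detached, so the binary stream stays open for the
next transcoder:

```python
import io

# Stand-in for the original binary output port (e.g. the terminal).
raw = io.BytesIO()

# Try two encodings in turn on the same binary stream.
for name in ("utf-8", "latin-1"):
    # newline="\n" disables platform-specific newline translation.
    text = io.TextIOWrapper(raw, encoding=name, newline="\n")
    text.write("\u00e9\n")   # e with acute accent, then a newline
    text.flush()             # push all encoded bytes down to raw
    text.detach()            # drop the text layer without closing raw

# The binary stream is still usable after both text layers are gone.
raw.write(b"done\n")
result = raw.getvalue()
```

Here the text layer can be discarded safely only because it was
flushed first, which is the same condition argued for above.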
* The behavior of flushing a transcoded output port should be
clarified. If the transcoding is stateful, should flush 'reset'
the state, or merely empty the buffer while keeping the state?
Resetting the state may produce some extra output, such as
an escape sequence.
If we err on the side of safety, requiring flush to reset the state
would be better (since the output becomes 'self-contained' when it is
cut off between the flush and the next output). However, if
the underlying transcoder uses libiconv, the only way to enforce
the reset is to call iconv_close(), which requires the next
output operation to call iconv_open() again---a possible performance
loss.
(Some stateful encodings, like iso-2022-jp, require the state to be
reset at the end of each line, so "newline and flush" can
reset the state even if the flush operation itself does not
guarantee it. Forcing flush to reset the state would be a
burden for such cases.)
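To make the two possible meanings of flush concrete, here is a small
sketch using Python's incremental iso-2022-jp encoder instead of a
Scheme transcoder. The mapping is my assumption, not anything in the
draft: encode("") stands for "empty the buffer, keep the state", and
encode("", final=True) stands for "flush and reset". The variable
names are made up for the example.

```python
import codecs

# ISO-2022-JP is stateful: escape sequences switch character sets.
encoder = codecs.getincrementalencoder("iso-2022-jp")()

# Encoding a kana leaves the encoder in the JIS X 0208 state.
chunk = encoder.encode("\u3042")        # HIRAGANA LETTER A

# Flush that keeps the state: no extra bytes are produced.
kept = encoder.encode("")

# Flush that resets the state: the escape back to ASCII is emitted.
reset = encoder.encode("", final=True)

# A newline is ASCII, so writing it forces the switch back to ASCII
# before the newline, whatever flush is defined to do.
line = "\u3042\n".encode("iso-2022-jp")
```

In this sketch the resetting flush costs three extra bytes (ESC ( B),
and the "newline and flush" case pays for the reset already when the
newline is written.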
--shiro
Received on Tue Nov 07 2006 - 16:54:06 UTC