[r6rs-discuss] Stateful codecs and inefficient transcoding from Shiro Kawai on 2006-11-07 (r6rs-discuss.mbox)

From: Shiro Kawai <shiro>
Date: Tue Nov 7 16:52:26 2006

From: William D Clinger <will_at_ccs.neu.edu>
Subject: [r6rs-discuss] Stateful codecs and inefficient transcoding
Date: Tue, 07 Nov 2006 11:33:39 -0500

> For the latest, up-to-the-minute list of my mistakes and
> other people's ideas for fixing draft R6RS section 15.3
> (port i/o), please see http://www.ccs.neu.edu/home/will/R6RS/

Much cleaner, IMHO. A few comments and requests for clarification.

* Although I agree that splitting binary and text ports is a
  good idea, I have one concern.

  Are the default (current-input-port) and (current-output-port)
  text ports or binary ports? If they are text ports, then
  we need to have a way to specify their transcoders before the
  Scheme process begins. If they are binary ports, a large number
  of scripts will probably need to take extra steps to set up
  text I/O:

   (with-input-from-port (transcoded-port
                           (current-input-port)
                           (some-transcoder))
     (lambda ()
       (with-output-to-port (transcoded-port
                              (current-output-port)
                              (some-transcoder))
         (lambda ()
           ...the actual processing...
           ))))

  I think Java requires something like this, so it may not
  be so outrageous as it seems to me. Alternative ideas are:
  (1) have separate ports for binary and text, e.g.
  current-text-{input|output}-port/current-binary-{input|output}-port,
  and require Scheme programs to use only one of the two per process, or
  (2) drop the concept of current-input/output-ports from the standard.
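
  As an illustration only, the wrapping step could be packaged in a
  helper. This sketch assumes the port procedures of (rnrs io ports)
  as later published in R6RS (transcoded-port, standard-input-port,
  standard-output-port), which may differ from the draft under
  discussion. The name call-with-text-ports is hypothetical.

```scheme
(import (rnrs))

;; Hypothetical helper (not part of any draft): wrap two binary ports in
;; text ports that share one transcoder, call PROC with them, then flush
;; the text output port and return whatever PROC returned.
(define (call-with-text-ports binary-in binary-out transcoder proc)
  (let ((in  (transcoded-port binary-in transcoder))
        (out (transcoded-port binary-out transcoder)))
    (call-with-values
      (lambda () (proc in out))
      (lambda results
        (flush-output-port out)
        (apply values results)))))

;; Usage: write UTF-8 text through the binary standard output port.
(call-with-text-ports
  (standard-input-port) (standard-output-port)
  (make-transcoder (utf-8-codec))
  (lambda (in out)
    (put-string out "hello")
    (newline out)))
```

  Passing the ports explicitly avoids relying on with-input-from-port
  and with-output-to-port, which this sketch does not assume exist.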

* The requirement of closing the original binary port, when a
  text port is created from it, seems a bit too constraining.
  For the input ports it might be reasonable, since there would
  be no portable way to know how much data is buffered within
  the transcoded port. For the output ports, however, we can
  make sure the transcoding is 'done' by flushing and closing
  the transcoded port, and we still might want to use the original
  binary port for other purposes. One such use is terminal
  output in interactive use; a user might want to test out several
  transcoders to see which matches the terminal's coding system.
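
  A minimal sketch of that interactive use, assuming the
  string->bytevector and put-bytevector procedures of (rnrs io ports)
  as later published in R6RS. It sidesteps the closing question by
  encoding each string up front, so the binary port is never handed to
  transcoded-port. The name try-codecs is hypothetical.

```scheme
(import (rnrs))

;; Hypothetical helper: encode STR once per codec and write the bytes to
;; BINARY-OUT. The binary port is never wrapped, so it stays open and
;; usable after every attempt.
(define (try-codecs binary-out str codecs)
  (for-each
    (lambda (codec)
      (put-bytevector binary-out
                      (string->bytevector str (make-transcoder codec))))
    codecs)
  (flush-output-port binary-out))

;; Usage: print "cafe" with an acute accent in two encodings and see
;; which one the terminal renders correctly.
(try-codecs (standard-output-port)
            "caf\xE9;\n"
            (list (latin-1-codec) (utf-8-codec)))
```

  This only works for whole strings; a stateful encoding spread over
  several writes would still need a transcoded port.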

* The behavior of flushing a transcoded output port should be
  clarified. If the transcoding is stateful, should flush 'reset'
  the state, or merely empty the buffer while keeping the state?
  Resetting the state may produce some extra output, such as
  an escape sequence.

  If we err on the side of safety, requiring flush to reset the
  state would be better (since the output becomes 'self-contained'
  when it is chopped between a flush and the next output). However,
  if the underlying transcoder uses libiconv, the only way to enforce
  the reset is to call iconv_close(), which requires the next
  output operation to call iconv_open() again, a possible performance
  loss.

  (Some stateful encodings, like iso-2022-jp, require resetting
  the shift state at the end of each line, so "newline and flush"
  can reset the state even if the flush operation itself does not
  guarantee it. Forcing flush to reset the state would be a
  burden in such cases.)
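
  The two possible meanings of flush can be sketched with a toy model.
  This is not a real codec: characters above #x7F stand in for JIS
  characters, and the markers "[JIS]" and "[ASCII]" stand in for the
  iso-2022-jp escape sequences ESC $ B and ESC ( B. All names here are
  hypothetical.

```scheme
(import (rnrs))

;; Toy stateful encoder. It tracks a shift state and emits a marker
;; whenever the state changes, as an escape-based encoding would.
(define (make-toy-encoder)
  (let ((state 'ascii)   ; current shift state
        (buffer '()))    ; pending output pieces, newest first
    (define (emit! s) (set! buffer (cons s buffer)))
    (define (switch! new marker)
      (unless (eq? state new)
        (emit! marker)
        (set! state new)))
    (define (drain!)
      (let ((out (apply string-append (reverse buffer))))
        (set! buffer '())
        out))
    (lambda (op . args)
      (case op
        ((write)
         (string-for-each
           (lambda (c)
             (if (char>? c #\x7F)
                 (switch! 'jis "[JIS]")
                 (switch! 'ascii "[ASCII]"))
             (emit! (string c)))
           (car args)))
        ;; Empty the buffer; the shift state carries over.
        ((flush-keeping-state) (drain!))
        ;; Return to the initial state first, then empty the buffer.
        ((flush-resetting-state)
         (switch! 'ascii "[ASCII]")
         (drain!))
        (else (error 'toy-encoder "unknown operation" op))))))

(define enc (make-toy-encoder))
(enc 'write "\x65E5;\x672C;")
(enc 'flush-keeping-state)    ; "[JIS]" followed by the two characters
(enc 'write "\x8A9E;")
(enc 'flush-resetting-state)  ; the character followed by "[ASCII]"
```

  The first chunk is not self-contained, since it ends in the JIS
  state. The second flush pays for one extra marker to end in the
  initial state. Writing an ASCII character such as a newline also
  switches back, which is why "newline and flush" already yields a
  reset in this model.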

--shiro
Received on Tue Nov 07 2006 - 16:54:06 UTC
