[r6rs-discuss] Stateful codecs and inefficient transcoding from Marcin 'Qrczak' Kowalczyk on 2006-11-05 (r6rs-discuss.mbox)

From: Marcin 'Qrczak' Kowalczyk <qrczak>
Date: Sun Nov 5 15:51:59 2006

William D Clinger <will_at_ccs.neu.edu> writes:

> This message describes some simplifications to section 15.3 of the
> draft that, in my opinion, would satisfy the real requirements while
> enhancing the usability and efficiency of port i/o.

This is much better!

I support John Cowan's remarks, in particular separation between byte
ports and character ports.

Here is what I would change:

* Standardising the protocol between codecs, transcoders, and ports
  would make it possible to extend the set of transcoders portably.

  This is harder than it might seem. I will try to port my design to
  Scheme.

  Efficiency will not necessarily suffer, because an implementation
  might take shortcuts for builtin transcoders. Such shortcuts
  complicate the implementation, though, so it would be better for
  the universal protocol itself to be fast enough.

* There are two concepts:

  1. A description of a translation of a sequence of bytes
     or characters to another sequence of bytes or characters
     (4 types).

  2. A pair of such descriptions which are supposed to be
     approximately inverses of each other.

  The current proposal uses only the second concept, in a few variants
  (transcoders, which are composed of codecs and newline converters).

  But in reality they don't necessarily come in pairs. For example, it
  makes sense to consider a newline converter which accepts several
  conventions on input; this makes no sense for output. A compressor
  often has settable parameters, but the corresponding decompressor
  reads the parameters from the stream; they are not specified
  separately.

  In my design the one-direction transcoder is the more fundamental
  concept. Character encodings have names for convenience; there is a
  mapping from encoding names to encoders, and a separate one from
  encoding names to decoders.
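  A minimal Python sketch of the one-direction design follows. It assumes a hypothetical protocol in which a transcoder is a function from an iterable of input chunks to an iterator of output chunks. `newline_decoder`, `newline_encoder`, `ENCODERS`, `DECODERS` and `compose` are illustrative names, not part of any Scheme draft, and the byte/character tables simply reuse Python's incremental codecs. Note the asymmetry: the newline decoder needs no parameter because it accepts every convention, while the encoder must be told which one to produce.

```python
import codecs
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical protocol (not from any Scheme draft): a one-direction
# transcoder maps an iterable of input chunks to an iterator of output chunks.
Transcoder = Callable[[Iterable], Iterator]


def newline_decoder(chunks: Iterable[str]) -> Iterator[str]:
    """Input side: accept LF, CRLF and lone CR; always produce LF."""
    pending_cr = False  # True if the previous character was a CR
    for chunk in chunks:
        out = []
        for ch in chunk:
            if pending_cr:
                pending_cr = False
                if ch == "\n":
                    continue  # LF completing a CRLF already emitted as LF
            if ch == "\r":
                out.append("\n")
                pending_cr = True
            else:
                out.append(ch)
        yield "".join(out)


def newline_encoder(convention: str) -> Transcoder:
    """Output side: must be told which single convention to produce."""
    def encode(chunks: Iterable[str]) -> Iterator[str]:
        for chunk in chunks:
            yield chunk.replace("\n", convention)
    return encode


def incremental_encoder(name: str) -> Transcoder:
    """Characters -> bytes, using Python's stateful incremental codecs."""
    def encode(chunks: Iterable[str]) -> Iterator[bytes]:
        enc = codecs.getincrementalencoder(name)()
        for chunk in chunks:
            yield enc.encode(chunk)
        yield enc.encode("", final=True)  # signal end of data
    return encode


def incremental_decoder(name: str) -> Transcoder:
    """Bytes -> characters; incomplete sequences are kept between chunks."""
    def decode(chunks: Iterable[bytes]) -> Iterator[str]:
        dec = codecs.getincrementaldecoder(name)()
        for chunk in chunks:
            yield dec.decode(chunk)
        yield dec.decode(b"", final=True)
    return decode


# Encoding names are only a convenience: two separate tables map a name
# to an encoder and to a decoder.
ENCODERS: Dict[str, Transcoder] = {n: incremental_encoder(n) for n in ("utf-8", "latin-1")}
DECODERS: Dict[str, Transcoder] = {n: incremental_decoder(n) for n in ("utf-8", "latin-1")}


def compose(first: Transcoder, second: Transcoder) -> Transcoder:
    """Run `first`, then feed its output chunks to `second`."""
    return lambda chunks: second(first(chunks))


# Usage: write CRLF text as UTF-8, then read it back in two chunks that
# split the two-byte character.
data = b"".join(compose(newline_encoder("\r\n"), ENCODERS["utf-8"])(["ł\n"]))
text = "".join(compose(DECODERS["utf-8"], newline_decoder)([data[:1], data[1:]]))
# data == b"\xc5\x82\r\n", text == "ł\n"
```

  Because encoders and decoders are looked up independently, an input-only converter such as `newline_decoder` does not have to be paired with an artificial inverse.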

> * To prevent interference between operations on the
> original port and operations on the port created by
> transcoded-port, the original port is closed when
> the derived port is created.

Such interference has a purpose: it is a way to mix text and binary
i/o on the same stream, or to use multiple encodings (other than by
extracting byte arrays and transcoding them separately).

Unfortunately this interference is delicate:

For output, the transcoded stream must be flushed but not closed;
probably flushed in the sense of notifying the transcoder about the end
of the data (there are other modes of flushing when we consider
compression).
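To illustrate "flushed but not closed", here is a Python sketch built around a hypothetical `TextOverBytes` wrapper. The wrapper writes text through a stateful encoder into a byte stream that binary output also uses. It relies on Python's incremental encoder API, where `encode("", final=True)` tells the encoder that the data has ended. For ISO-2022-JP, that call emits the escape sequence that returns to ASCII, and the underlying byte stream stays open for raw bytes.

```python
import codecs
import io


class TextOverBytes:
    """Hypothetical text writer sharing a byte stream with binary output."""

    def __init__(self, raw: io.BufferedIOBase, encoding: str):
        self.raw = raw
        self.encoder = codecs.getincrementalencoder(encoding)()

    def write(self, text: str) -> None:
        # A stateful encoder may keep shift state between calls.
        self.raw.write(self.encoder.encode(text))

    def finish(self) -> None:
        # Notify the encoder of the end of data (for ISO-2022-JP this
        # emits the escape back to ASCII). The raw stream is NOT closed.
        self.raw.write(self.encoder.encode("", final=True))


raw = io.BytesIO()
text = TextOverBytes(raw, "iso2022_jp")
text.write("日本")
text.write("語")
text.finish()
raw.write(b"\x00\x01")  # binary data following the text segment
```

Calling `close()` on the text layer instead of `finish()` would also close `raw`, and skipping `finish()` would leave the binary bytes in the middle of a JIS X 0208 shift state.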

For input, if the length of the portion to be decoded is not known
beforehand but is implied by the result of the decoding, then
decoding must be performed one character at a time. This is slow but
unavoidable. Buffering of transcoding must somehow be turned off.
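Below is a Python sketch of the unbuffered case. It assumes the text portion is a header terminated by a newline, and that the encoding's incremental decoder emits each character as soon as its bytes are complete (true for UTF-8). The hypothetical `read_text_line_unbuffered` pulls one byte at a time through the decoder, so it never consumes a byte past the terminator. The remainder can then still be read as binary. A buffered decoder reading ahead would instead swallow the following bytes or fail on them (here `\xff` is not valid UTF-8).

```python
import codecs
import io


def read_text_line_unbuffered(raw: io.BufferedIOBase, encoding: str) -> str:
    """Hypothetical helper: decode a newline-terminated text header from a
    byte stream without consuming any byte beyond the terminator."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        byte = raw.read(1)  # one byte at a time: slow, but never over-reads
        if not byte:
            parts.append(decoder.decode(b"", final=True))  # end of stream
            break
        piece = decoder.decode(byte)  # "" until a character is complete
        parts.append(piece)
        if piece.endswith("\n"):
            break
    return "".join(parts)


raw = io.BytesIO("zażółć\n".encode("utf-8") + b"\xff\xfe\x00")
header = read_text_line_unbuffered(raw, "utf-8")
payload = raw.read()  # the binary remainder is still intact
```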

I don't know of any good way of avoiding these complexities. Trying to
avoid them would either limit expressiveness or make transcoding slow
in the usual case.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak_at_knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

Received on Sun Nov 05 2006 - 15:51:25 UTC
