[r6rs-discuss] Stateful codecs and inefficient transcoding

From: William D Clinger <will>
Date: Sat Nov 4 12:16:59 2006

I am posting this as an individual member of the Scheme
community. I am not speaking for the R6RS editors.

Thanks again to those who have explained to me the real
requirements for port i/o. This message describes some
simplifications to section 15.3 of the draft that, in
my opinion, would satisfy the real requirements while
enhancing the usability and efficiency of port i/o.

The main ideas here are similar to those of SRFI 81,
which was a starting point for section 15.3 of the
draft, but there are a few differences.

 * A transcoder is an immutable description (think of
    it as a factory for transcoding objects) of some
    possibly stateful algorithm for translating sequences
    of bytes into sequences of characters and vice versa.

 * Every transcoder can operate in the input direction
    (bytes to characters) or in the output direction
    (characters to bytes), but the composition of those
    directions need not be identity (and often isn't).

 * If t1 and t2 are transcoders, then their composition
    is defined separately for the input and output
    directions. In the input direction, their composition
    is t1input followed by t1output followed by t2input.
    In the output direction, their composition is t1output
    followed by t1input followed by t2output.

 * The standard transcoders are constructed from codecs,
    eol styles, and handling modes as described in section
    15.3 of the draft R6RS.

 * The standard codecs include those in the draft, plus
    utf-16-codec and utf-32-codec, which interpret a BOM
    (or its absence) as specified by the Unicode standard.
    The standard codecs thus include all seven character
    encoding schemes defined by Unicode, plus latin-1-codec.

 * Implementations may support other codecs, eol styles,
    and other kinds of transcoders.

 * The binary transcoder is defined as the transcoder
    constructed from the latin-1-codec, lf eol-style,
    and (arbitrarily, since no transcoding errors are
    possible) the raise handling mode. (Note that the
    binary transcoder's input and output directions
    compose to identity; transcoders based on Unicode
    codecs or other eol styles do not have that property.)

 * A binary port is a port whose transcoder is the
    binary transcoder.

 * The binary lookahead-X, get-X, and put-X operations
    (which have "byte" or "bytes" in their names) operate
    only on binary ports.

 * The textual lookahead-X, get-X, and put-X operations
    operate on all ports, since a binary port is also a
    textual port. They no longer accept a transcoder as
    an optional argument.

 * A new procedure, transcoded-port, takes a port and
    a transcoder as arguments and returns a new port
    whose state is largely that of the original port
    but whose transcoder is the composition of the
    original transcoder with the newly specified
    transcoder. (In general, the transcoding cost
    for a composition will be additive, but certain
    compositions might be optimized.)

 * To prevent interference between operations on the
    original port and operations on the port created by
    transcoded-port, the original port is closed when
    the derived port is created. (Implementation note:
    the original port is cloned, the cloned port is
    encapsulated within the derived port, and then the
    original port is closed in a special way that
    doesn't release resources needed by the clone.)

 * If no optional transcoder argument is passed to an
    open-file-X procedure, then the transcoder associated
    with the resulting port is not specified.

 * The port-position and set-port-position! procedures
    are required only for binary ports that were created
    by an open-X procedure. (Rationale: asking for the
    byte position of a complexly transcoded port can be
    like asking for the carrier frequency of a spread
    spectrum signal.)

 * The open-X procedures may raise an exception if
    the specified transcoder is not supported for the
    kind of port being opened.
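
The model in the list above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions,
not the draft's API: the names Transcoder, Composed, Port,
BINARY, and get_string_all are hypothetical, Python codec names
stand in for the R6RS codecs, the eol style is reduced to a
single string, and handling modes are omitted. The composition
follows the order literally as stated in the list.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcoder:
    """Immutable description; stateful coder objects are made on demand."""
    codec: str        # Python codec name standing in for an R6RS codec
    eol: str = "\n"   # external line ending ("\n" models the lf eol style)

    def decode(self, data: bytes) -> str:
        # Input direction: bytes -> characters, via a fresh stateful decoder.
        text = codecs.getincrementaldecoder(self.codec)().decode(data, final=True)
        return text if self.eol == "\n" else text.replace(self.eol, "\n")

    def encode(self, text: str) -> bytes:
        # Output direction: characters -> bytes, via a fresh stateful encoder.
        if self.eol != "\n":
            text = text.replace("\n", self.eol)
        return codecs.getincrementalencoder(self.codec)().encode(text, final=True)


@dataclass(frozen=True)
class Composed:
    """Composition of t1 and t2, in the order stated in the list above."""
    t1: object
    t2: object

    def decode(self, data: bytes) -> str:
        # t1input, then t1output, then t2input
        return self.t2.decode(self.t1.encode(self.t1.decode(data)))

    def encode(self, text: str) -> bytes:
        # t1output, then t1input, then t2output
        return self.t2.encode(self.t1.decode(self.t1.encode(text)))


# Assumed model of the binary transcoder: latin-1 codec with lf eol style.
BINARY = Transcoder("latin-1", "\n")


class Port:
    """Hypothetical in-memory input port holding bytes and a transcoder."""

    def __init__(self, data: bytes, transcoder):
        self._data = data
        self.transcoder = transcoder
        self.closed = False

    def get_string_all(self) -> str:
        if self.closed:
            raise ValueError("port is closed")
        return self.transcoder.decode(self._data)


def transcoded_port(port: Port, transcoder) -> Port:
    """Return a derived port with the composed transcoder; close the original."""
    if port.closed:
        raise ValueError("port is closed")
    derived = Port(port._data, Composed(port.transcoder, transcoder))
    port.closed = True  # prevents interference with the derived port
    return derived
```

For example, wrapping a binary port that holds UTF-8 bytes with
transcoded_port(p, Transcoder("utf-8")) yields a port that reads
the text as UTF-8 and leaves p closed. BINARY round-trips every
byte value unchanged, whereas Transcoder("latin-1", "\r\n") turns
the bytes of "a\nb" into those of "a\r\nb" when input is followed
by output, so its two directions do not compose to identity.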

Let me know what you think.

Will
Received on Sat Nov 04 2006 - 12:16:49 UTC
