[r6rs-discuss] Stateful codecs and inefficient transcoding from John Cowan on 2006-11-04 (r6rs-discuss.mbox)

From: John Cowan <cowan>
Date: Sat Nov 4 13:26:27 2006

William D Clinger scripsit:

> * If t1 and t2 are transcoders, then their composition
> is defined by describing their composition in both
> the input and output directions. In the input
> direction, their composition is t1input followed by
> t2output followed by t2input. For output, their
> composition is t2output followed by t2input followed
> by t1output.

This is very confusing. Let t1 be the binary transcoder, and let t2
be the {utf-8-codec, lf, raise} transcoder. Differences other than
encoding drop out. On input, t1input maps bytes to characters in the
Latin-1 repertoire. t2output maps these characters to either one-byte
or two-byte sequences; t2input then maps them back again. Where does
full UTF-8 decoding get done? I would rather expect the sequence to be
t1input followed by t1output (the identity) followed by t2input.
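Here is a minimal Python sketch (Python, not R6RS Scheme) that makes the two orderings concrete. It models the binary transcoder as Latin-1, as described above, and models t2 as plain UTF-8, ignoring end-of-line and error-handling differences. All function names are hypothetical. It runs the UTF-8 bytes for "é" through both orderings:

```python
# Hypothetical model: the binary transcoder maps each byte to the
# character with the same code point (the Latin-1 repertoire).
def t1_input(data: bytes) -> str:
    return data.decode("latin-1")

def t1_output(text: str) -> bytes:
    return text.encode("latin-1")

# t2: the {utf-8-codec, lf, raise} transcoder, reduced to its encoding.
def t2_input(data: bytes) -> str:
    return data.decode("utf-8")

def t2_output(text: str) -> bytes:
    return text.encode("utf-8")

def compose_as_proposed(data: bytes) -> str:
    # t1input, then t2output, then t2input
    return t2_input(t2_output(t1_input(data)))

def compose_as_expected(data: bytes) -> str:
    # t1input, then t1output (the identity on bytes), then t2input
    return t2_input(t1_output(t1_input(data)))

raw = "\u00e9".encode("utf-8")  # b'\xc3\xa9', the UTF-8 form of "é"
print([hex(ord(c)) for c in compose_as_proposed(raw)])  # ['0xc3', '0xa9']
print([hex(ord(c)) for c in compose_as_expected(raw)])  # ['0xe9']
```

Under the proposed ordering, each byte still comes out as one Latin-1 character, so no UTF-8 decoding ever takes place.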

> The rationale for this definition of composition is that
> it adds a new layer of transcoding onto the existing layer,
> instead of replacing the existing layer. That allows
> some of the weirder, file-at-a-time transcodings.

Other than composing the binary transcoder with an ordinary one, what are
the actual use cases for this? I don't know what you mean by "file at
a time"; transcoders require varying amounts of state, from none (UTF-8)
to one bit (UTF-16) to a few bytes (full ISO 2022), but I know of no
encodings in which a byte can retroactively change the interpretation
of bytes that have already been transcoded.
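Python's built-in codecs can illustrate this range of state. The sketch below uses Python's codec names ("utf-16", "iso2022_jp"), not R6RS codecs. A byte-order mark fixes the one bit of UTF-16 state, and an ISO-2022-JP escape sequence changes only how the bytes after it are read:

```python
# UTF-16: the byte-order mark sets one bit of state (endianness)
# that governs how every later pair of bytes is read.
little = b"\xff\xfeA\x00".decode("utf-16")
big = b"\xfe\xff\x00A".decode("utf-16")
print(little == big == "A")  # True

# ISO-2022-JP: an escape sequence switches the character set for the
# bytes that FOLLOW it; bytes already decoded keep their meaning.
ESC_JIS_X_0208 = b"\x1b$B"  # switch to JIS X 0208
ESC_ASCII = b"\x1b(B"       # switch back to ASCII

pair = b'$"'  # the same two bytes, before and after the escape
data = pair + ESC_JIS_X_0208 + pair + ESC_ASCII
decoded = data.decode("iso2022_jp")
print([hex(ord(c)) for c in decoded])  # ['0x24', '0x22', '0x3042']
```

The first `$"` stays two ASCII characters. Only the copy after the escape becomes U+3042 (HIRAGANA LETTER A).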

> In practice, t1 will usually be the binary transcoder, and t2output
> followed by t2input will be the identity restricted to some subset of
> the Unicode characters.

Note, however, that there are transcoders which accept certain characters
for output that they never generate on input; they conflate multiple
Unicode characters into the same external encoding.
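As a toy illustration, assume a hypothetical ASCII transcoder that folds NO-BREAK SPACE and HYPHEN onto ASCII bytes when writing. This does not model any real codec, and Python stands in for Scheme:

```python
# Hypothetical transcoder: on output it folds a few non-ASCII
# characters onto ASCII bytes; on input it only ever yields ASCII.
FOLD = {"\u00a0": " ", "\u2010": "-"}  # NO-BREAK SPACE, HYPHEN

def toy_output(text: str) -> bytes:
    return "".join(FOLD.get(ch, ch) for ch in text).encode("ascii")

def toy_input(data: bytes) -> str:
    return data.decode("ascii")

original = "a\u00a0b\u2010c"
round_trip = toy_input(toy_output(original))
print(round_trip == original)  # False
print(round_trip == "a b-c")   # True
# Two different characters share one external encoding:
print(toy_output("a b") == toy_output("a\u00a0b"))  # True
```

For such a transcoder, output followed by input is the identity only on the characters it can produce on input. It is not the identity on everything it accepts for output.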

-- 
Business before pleasure, if not too bloomering long before.
        --Nicholas van Rijn
                John Cowan <cowan_at_ccil.org>
                    http://www.ccil.org/~cowan

Received on Sat Nov 04 2006 - 13:26:22 UTC
