[r6rs-discuss] Stateful codecs and inefficient transcoding from William D Clinger on 2006-10-31 (r6rs-discuss.mbox)

From: William D Clinger <will>
Date: Tue Oct 31 00:53:26 2006

I am posting this as an individual member of the Scheme
community. I am not speaking for the R6RS editors.

Shiro Kawai wrote:
> (1) As noted in the lookahead-char entry, this kind of coding
> requires some amount of lookahead from the port. In fact,
> it requires a potentially unlimited amount: although it is highly
> unlikely, an iso2022 encoding *can* have an arbitrary number
> of escape sequences before it hits the 'real character'.

Ouch.
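To make the lookahead problem concrete, here is a minimal Python sketch of a toy, ISO-2022-style stateful decoder. It is a simplified model, not a real codec: only two escape sequences are recognized (`ESC ( B` for single-byte ASCII and `ESC $ B` for a double-byte set), and `peek_char` is a hypothetical stand-in for what `lookahead-char` has to do. Because escape sequences change state without producing a character, a peek must scan past all of them, so the number of bytes examined grows with the number of escapes.

```python
ESC = 0x1B
TO_ASCII = bytes([ESC]) + b"(B"   # ESC ( B: switch to single-byte ASCII (toy model)
TO_DOUBLE = bytes([ESC]) + b"$B"  # ESC $ B: switch to a double-byte set (toy model)


def peek_char(data: bytes, pos: int, mode: str):
    """Find the next real character at or after pos without consuming input.

    Hypothetical helper. Returns (char, bytes_examined); char is None if
    the input runs out before a complete character is found.
    """
    start = pos
    # Escape sequences only change state, so a peek must skip every one of them.
    while data[pos:pos + 3] in (TO_ASCII, TO_DOUBLE):
        mode = "ascii" if data[pos:pos + 3] == TO_ASCII else "double"
        pos += 3
    width = 1 if mode == "ascii" else 2
    if pos + width > len(data):
        return None, len(data) - start
    if mode == "ascii":
        return chr(data[pos]), pos + 1 - start
    # Double-byte characters are shown as their hex code for illustration.
    return "<" + data[pos:pos + 2].hex() + ">", pos + 2 - start


for k in (0, 1, 10, 1000):
    # k redundant pairs of escape sequences, then one real character.
    data = (TO_DOUBLE + TO_ASCII) * k + b"A"
    print(k, peek_char(data, 0, "ascii"))  # bytes examined: 1, 7, 61, 6001
```

Each redundant pair of escape sequences adds six bytes that the port must buffer before it can answer a single peek, and nothing in the encoding bounds k.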

> I wonder if the designers intend to require unlimited
> lookahead of a conforming implementation of a port (the
> standard doesn't specify iso2022 as a standard codec, but
> it is a *must* to support if you want to write a practical
> application that handles email).

Speaking only for myself, I am glad the draft R6RS does
not require unlimited lookahead. I am also glad the
draft R6RS does not forbid unlimited lookahead as an
extension, but I would not want to implement such an
extension myself.

It doesn't surprise me that there are standard encodings
that would require unlimited lookahead. In my opinion,
it is probably not practical for an implementation to
support those encodings via the transcoders of the draft
R6RS. I also think the transcoders of the draft R6RS
are barely adequate for the Unicode encodings, and are
obviously not adequate for all possible translations
from binary to text and vice versa.

That limitation is, in my view, an advantage, because a
fully general solution would be less efficient.

> Or can a conforming implementation support only the
> typical case (an escape sequence always followed by a
> real character) and report a violation of an
> implementation restriction otherwise?

Violations of an implementation restriction are a
general-purpose mechanism for escaping from the requirements
of the R5RS and draft R6RS. In my opinion, no shame should
be attached to such restrictions when they can only arise
in connection with an implementation-specific extension.
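The restricted alternative Shiro describes, supporting only the typical case, can be sketched in the same toy model. In this hypothetical Python version, `skip_escapes_bounded` gives up after `limit` escape sequences and raises `ImplementationRestriction`, an invented exception standing in for an implementation-restriction violation, so the lookahead a port needs stays bounded by a small constant that depends only on `limit`.

```python
ESC = 0x1B
# Toy model: the two escape sequences and the decoder mode each one selects.
ESCAPES = {bytes([ESC]) + b"(B": "ascii", bytes([ESC]) + b"$B": "double"}


class ImplementationRestriction(Exception):
    """Hypothetical stand-in for an implementation-restriction violation."""


def skip_escapes_bounded(data: bytes, pos: int, mode: str, limit: int = 1):
    """Skip at most `limit` escape sequences before the next real character.

    Returns (mode, pos) for that character. Rather than buffering without
    bound, more than `limit` consecutive escapes is reported as an
    implementation restriction.
    """
    seen = 0
    while data[pos:pos + 3] in ESCAPES:
        seen += 1
        if seen > limit:
            raise ImplementationRestriction(
                f"more than {limit} escape sequence(s) before a character")
        mode = ESCAPES[data[pos:pos + 3]]
        pos += 3
    return mode, pos


print(skip_escapes_bounded(b"\x1b$B\x30\x21", 0, "ascii"))  # ('double', 3)
try:
    skip_escapes_bounded(b"\x1b$B\x1b(BA", 0, "ascii")
except ImplementationRestriction as exc:
    print("restriction:", exc)
```

With `limit=1` the common case (one escape sequence, then a character) succeeds, while two escapes in a row are rejected instead of forcing the port to read further ahead.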

> Are there any other examples that
> show the usefulness of the transient encoder?

I am told that XML files can contain several different
encodings of text, and it appears that the "transient
encoder" technique is useful for XML files.
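One way to picture the "transient encoder" technique for XML is to read the XML declaration with one transcoder (ASCII) and the rest of the same byte stream with whatever encoding the declaration names. The Python sketch below is a simplification under stated assumptions: the encoding is ASCII-compatible, any declaration appears at the very start, and byte-order marks and UTF-16 are ignored. `decode_xml` and its regular expression are hypothetical, not part of any XML library.

```python
import re

# Matches an XML declaration at the very start and captures its encoding name.
DECL = re.compile(rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\'][^>]*\?>')


def decode_xml(data: bytes) -> str:
    """Decode XML bytes by switching transcoders partway through the stream."""
    m = DECL.match(data)
    if m is None:
        return data.decode("utf-8")  # assume UTF-8, the XML default without a declaration
    head = data[:m.end()].decode("ascii")   # first transcoder: ASCII, for the declaration
    encoding = m.group(1).decode("ascii")
    body = data[m.end():].decode(encoding)  # second transcoder: the declared encoding
    return head + body


doc = '<?xml version="1.0" encoding="iso-2022-jp"?><a>日本</a>'
print(decode_xml(doc.encode("iso-2022-jp")) == doc)  # True
```

The switch happens at a point the decoder only learns about by reading text, which is exactly the kind of mid-stream change of transcoder under discussion.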

> Aren't these [non-homomorphic encodings] confusing?

Yes.

> I feel the complexity comes from the fact that transcoding
> is inherently a streaming operation, but the transient
> transcoder forces such a stream to be cut for each I/O
> procedure call.

Agreed. Note, however, that mixing binary with textual
i/o, and mixing different transcoders for textual i/o,
was (in my opinion) regarded as a requirement. That
requirement already implies cutting streams into pieces
that use different transcoders (or none at all).
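The cost of cutting a stateful stream into pieces can be seen in another toy Python sketch (again hypothetical, not a real codec). `ToyDecoder` keeps its shift state between calls; `decode_stream` either reuses one decoder for the whole stream or, mimicking a transient transcoder, starts a fresh one for each chunk. For simplicity it assumes chunk boundaries never split an escape sequence or a double-byte character.

```python
ESC = 0x1B
# Toy model: the two escape sequences and the decoder mode each one selects.
ESCAPES = {bytes([ESC]) + b"(B": "ascii", bytes([ESC]) + b"$B": "double"}


class ToyDecoder:
    """Hypothetical stateful ISO-2022-style decoder (not a real codec)."""

    def __init__(self):
        self.mode = "ascii"  # every new decoder starts in ASCII mode

    def decode(self, chunk: bytes) -> list:
        out, pos = [], 0
        while pos < len(chunk):
            seq = chunk[pos:pos + 3]
            if seq in ESCAPES:
                self.mode = ESCAPES[seq]  # state change, no output
                pos += 3
            elif self.mode == "ascii":
                out.append(chr(chunk[pos]))
                pos += 1
            else:
                # Double-byte character, shown as its hex code for illustration.
                out.append(chunk[pos:pos + 2].hex())
                pos += 2
        return out


def decode_stream(chunks, fresh_per_call: bool) -> list:
    dec = ToyDecoder()
    out = []
    for chunk in chunks:
        if fresh_per_call:
            dec = ToyDecoder()  # a new transient transcoder forgets the shift state
        out.extend(dec.decode(chunk))
    return out


chunks = [b"\x1b$B\x30\x21", b"\x30\x22\x1b(Bok"]
print(decode_stream(chunks, fresh_per_call=False))  # ['3021', '3022', 'o', 'k']
print(decode_stream(chunks, fresh_per_call=True))   # ['3021', '0', '"', 'o', 'k']
```

With one persistent decoder the second chunk begins in double-byte mode and yields `3022`; with a fresh decoder per call the same two bytes are misread as the ASCII characters `0` and `"`. Any design that cuts the stream has to carry this state across the cut somehow.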

Should the Scheme community agree that this perceived
requirement should be dropped, it ought to be possible
to come up with a simpler design for the i/o system.

Will
Received on Tue Oct 31 2006 - 00:53:20 UTC