[r6rs-discuss] [Formal] Improve port i/o. from William D Clinger on 2006-11-22 (r6rs-discuss.mbox)

From: William D Clinger <will>
Date: Wed Nov 22 13:50:52 2006

I am posting this as an individual member of the Scheme
community. I am not speaking for the R6RS editors, and
this message should not be confused with the editors'
eventual formal response.

Mike Sperber wrote:
> > The real requirements appear to be:
> >
> > * Support efficient binary i/o.
> >
> > * Support efficient text i/o.
>
> I object to these requirements, particularly the conflation of
> "efficient" with "text" or "binary."

Conflation? I looked up the word, just to make sure
I understand its meaning. I don't see the conflation
you see.

> a) A lower level of efficiency is sufficient for most of my
> applications, many of which are I/O-bound.
> c) As the proposed design implies buffering at some level, and the
> transfer of binary data to arbitrary bytes objects, it is at
> least quite difficult to implement zero-copy I/O, facilities for
> which are offered by many modern operating systems.

That is a peculiar juxtaposition of arguments. If the
input data arise outside the memory system, then they
must be written into the memory system. Copying the
data a second time increases the cost by at most a
factor of 3 (reading the previously written data, and
then writing it again; this factor is then diluted by
other costs).

> The design of the (r6rs ports) library in the current draft, despite
> its numerous flaws, has the nice property that binary and textual I/O
> can be interleaved arbitrarily on the same port.

Yes. That design is also acceptably efficient for simple
Unicode encodings. As has been discussed here at length,
it is inefficient (even if possible) for some more complex
encodings. In practice, an implementation might have to
make separate calls to iconv (or something similar) for
every character. The cost of that is far more than a
factor of 3.

I had thought that mixing binary with textual i/o was
more important than efficient support for a wide range
of encodings. As has been explained to me, I thought
wrong.

> This seems a bizarre way to implement what is essentially a
> destructive operation on a port. Why isn't `transcode-port' simply an
> imperative operation, the way it is in SRFI 81? (I'm pretty sure
> there's a rationale, but it isn't stated.)

The proposed transcode-port procedure takes a binary port
and returns a text port. If no text port is also binary,
as is implied by the proposal, then programs will not be
able to compose calls to transcode-port. That eliminates
the buffering and other problems associated with switching
repeatedly from one textual encoding to another, without
resorting to any ad hoc restriction on the number of times
a side effect such as transcode-port! can be performed.

> > * To simplify the process of reading individual characters
> > from a binary port, the R6RS should provide something like
> > get-char-from-binary and lookahead-char-from-binary,
> > which would take a binary port and a transcoder as
> > arguments. (See [issue:lookahead].)
>
> This actually seems a throwback to the (r6rs ports) library in the
> current draft, and it does allow a limited form of interleaving text
> and binary I/O.

Subsequent discussion here suggests those two procedures
are not really needed. In practice, the encodings that
are used at the beginning of an XML file, and that are
likely to be mixed with binary data, seem to be limited
to one-byte-per-character encodings (ASCII, Latin-n,
EBCDIC) or UTF-8, all of which are easily converted from
binary to text.
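
To illustrate why those encodings are easy to handle, here is a
minimal sketch in Python (not Scheme, and not the R6RS API). The
helper name read_xml_declaration and the sample document are
hypothetical. It assumes the XML declaration is written in an
ASCII-compatible encoding, reads it byte by byte from a binary
stream, and only then switches the rest of the stream to text.

```python
import io
import re

def read_xml_declaration(binary_port):
    """Read bytes up to and including the first '>' and return them as text.

    Assumes the declaration uses an ASCII-compatible one-byte encoding,
    so each byte maps to exactly one character (Latin-1 is used here
    because it can decode any byte value).
    """
    raw = bytearray()
    while True:
        b = binary_port.read(1)
        if not b:          # end of input before '>' was seen
            break
        raw += b
        if b == b">":
            break
    return raw.decode("latin-1")

# Hypothetical sample input: a declaration followed by Latin-1 content.
data = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<a>\xe9</a>'
port = io.BytesIO(data)

declaration = read_xml_declaration(port)
match = re.search(r'encoding="([^"]+)"', declaration)
encoding = match.group(1) if match else "utf-8"

# Only now is the remainder of the binary stream treated as text.
body = io.TextIOWrapper(port, encoding=encoding).read()
```

Because every byte of the declaration is one character, no lookahead
or decoder state is needed before the switch to text.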

> What you haven't done is demonstrate what the inefficiencies
> of a stateful transcoder would be, or discuss whether those
> inefficiencies would be unacceptable. The price is that the interface
> grows and gains complexity.

No, the interface shrinks (especially if we can get rid
of readers and writers) and becomes simpler (by dropping
the transcoder arguments to the get-X procedures).

The i/o system of the current draft R6RS is efficient for
a small number of simple standard transcoders, but wouldn't
be efficient for stateful or more complex transcoders.
With the proposed improvements, an implementation could
use iconv to convert large chunks of input or output to
or from some convenient universal encoding, e.g. UTF-8.
That would be efficient even for stateful and complex
transcoders.
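
The chunk-at-a-time approach can be sketched as follows. This is an
analogy in Python, not iconv and not the proposed R6RS interface; the
function name decode_in_chunks and the sample string are hypothetical.
It uses Python's standard incremental decoders, which keep the state
of a stateful encoding (here ISO-2022-JP) across chunk boundaries, so
the decoder is called once per chunk rather than once per character.

```python
import codecs

def decode_in_chunks(data: bytes, encoding: str, chunk_size: int = 4096) -> str:
    """Decode `data` one chunk at a time, carrying decoder state across chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # The decoder buffers any incomplete escape sequence or
        # multi-byte character until the next chunk arrives.
        pieces.append(decoder.decode(chunk))
    # Signal end of input so trailing incomplete data is reported.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)

# ISO-2022-JP switches character sets with escape sequences,
# so it is a stateful encoding.
text = "Scheme \u65e5\u672c\u8a9e ports"
encoded = text.encode("iso2022_jp")

# A deliberately tiny chunk size forces escape sequences and
# two-byte characters to straddle chunk boundaries.
decoded = decode_in_chunks(encoded, "iso2022_jp", chunk_size=3)
```

With a realistic chunk size, the per-call overhead of the converter is
spread over thousands of characters instead of being paid for each one.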

> To make it concrete, here's what an alternative design would look like
> that probably satisfies a different efficiency requirement from the
> one you seem to assume, allows arbitrary mixing of text and binary
> I/O, and requires fewer procedures in the interface:
>
> - A procedure `transcode-port!' changes the transcoder associated with
> a port.

The transcode-port! procedure would be better than the
transcode-port procedure if and only if we allow the
transcoder to be changed multiple times.

> - The various procedures for reading and writing binary data bypass
> the transcoding mechanism.

I will not bother to repeat John Cowan's arguments against
this.

> - `get-char-from-binary' etc. are eliminated.

Fine with me, for reasons explained above.

> - The various procedures for reading textual data only remove as much
> binary data from the underlying data stream as necessary. For the
> Unicode encodings, this is trivial. For other, more stateful
> encodings, this may be more complex, and may in effect prohibit
> interleaving textual and binary data, but then we'd be no worse off
> than before in those cases.

I will not bother to repeat John Cowan's arguments against
this.

Will
Received on Wed Nov 22 2006 - 13:50:48 UTC