[I'm not speaking for the editors, as usual.]
William D Clinger <will_at_ccs.neu.edu> writes:
> The real requirements appear to be:
>
> * Support efficient binary i/o.
>
> * Support efficient text i/o.
I object to these requirements, particularly the conflation of
"efficient" with "text" or "binary." Here are some problems that
arise from this:
- The notion of "efficiency" is at best vaguely defined. It seems to
  me, from the discussion here and on the editors list and from
  specific design decisions that you propose, that it refers to a
  particular implementation model and set of optimizations that you
  assume, and to a level of efficiency reachable with that
  implementation model.

- I conjecture the following:

  a) A lower level of efficiency is sufficient for most applications.

  b) Design improvements are possible if the assumptions
     underlying the particular notion of efficiency implicit in the
     design are removed.

  c) The proposed design makes achieving a maximal level of efficiency
     at least very difficult.

  These are conjectures; I may be wrong on all three counts, and I
  look forward to a technical discussion on them.

  Here are some arguments on a) and c):

  a) A lower level of efficiency is sufficient for most of my
     applications, many of which are I/O-bound.

  c) As the proposed design implies buffering at some level, and the
     transfer of binary data to arbitrary bytes objects, it is at
     least quite difficult to implement zero-copy I/O, facilities for
     which are offered by many modern operating systems.

     This isn't particularly important to me, as it can (and IMHO
     should) be satisfied with a much simpler I/O system such as
     Primitive I/O. But as long as an unqualified notion of
     efficiency is part of the requirements, it ought to be
     considered.

  Arguments on b) are really about the general design. I'll go back
  an indentation level to talk about that.

The design of the (r6rs ports) library in the current draft, despite
its numerous flaws, has the nice property that binary and textual I/O
can be interleaved arbitrarily on the same port. While some may argue
that it is not particularly important, I disagree. It is certainly
useful in a number of circumstances, and I would like to see it added
to the list of requirements. While lookahead issues on certain text
encodings make this difficult, it is not difficult for the Unicode
encodings, which hopefully will become more prevalent with time.
Also, Unicode encodings are all the current proposals ever talk about
anyway.
I suspect the decision to remove this requirement is largely based on
perceived efficiency problems, but it is hard to know. These problems
also influenced the design of SRFIs 80 and 81, which prevent certain
patterns of interleaved text and binary I/O so as to be able to
block-transcode ahead. The proposed design exhibits essentially an
equivalent restriction (binary I/O can be followed by text I/O, but
not vice versa), as `transcoded-port' is an irreversible operation.
I've since come to believe this restriction is bad and unnecessary.
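
To illustrate why block-transcoding ahead gets in the way of interleaving,
here is a minimal sketch. It uses Python's standard `io.TextIOWrapper` as a
stand-in for a block-decoding text port; this is an analogy only, not the
SRFI 80/81 or R6RS API, and the payload is made up for the example. Asking
for a single character makes the text layer pull the whole chunk out of the
underlying byte stream, so a following binary read finds nothing left.

```python
import io

# Made-up payload: UTF-8 text followed by three "binary" bytes.
# (The tail is chosen to be valid UTF-8 so that strict decoding succeeds.)
payload = "h\u00e9llo\n".encode("utf-8") + b"\x00\x01\x02"

raw = io.BytesIO(payload)                      # the underlying binary stream
text = io.TextIOWrapper(raw, encoding="utf-8")  # block-decoding text layer

first = text.read(1)    # ask for exactly one character
consumed = raw.tell()   # how far the text layer advanced the byte stream
leftover = raw.read()   # what a subsequent binary read would still see
```

In this sketch the text layer has consumed every byte of the payload after
a one-character read, which is exactly the kind of behaviour that forces a
design to forbid text-then-binary interleaving.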
This brings me to my first more concrete issue with the proposal:
> * A new procedure, transcoded-port, takes a binary port
> and a transcoder as arguments and returns a new text
> port whose state is largely that of the binary port
> but whose transcoder is the newly specified transcoder.
>
> * To prevent interference between operations on the
> original binary port and buffering of transcoded
> characters on the text port created by transcoded-port,
> the original binary port is closed when the derived text
> port is created.
This seems a bizarre way to implement what is essentially a
destructive operation on a port. Why isn't `transcode-port' simply an
imperative operation, the way it is in SRFI 81? (I'm pretty sure
there's a rationale, but it isn't stated.)
> * To simplify the process of reading individual characters
> a binary port, the R6RS should provide something like
> get-char-from-binary and lookahead-char-from-binary,
> which would take a binary port and a transcoder as
> arguments. (See [issue:lookahead].)
This actually seems a throwback to the (r6rs ports) library in the
current draft, and it does allow a limited form of interleaving text
and binary I/O. This then raises the general question of why
`transcoded-port' really needs to close the underlying port at all, or
why the transcoding cannot be changed once it's set. I suspect this
is to facilitate the inlining of the inner loop of the encoder or
decoder, as you've demonstrated the efficiency of that implementation
model. What you haven't done is demonstrate what the inefficiencies
of a stateful transcoder would be, or discuss whether those
inefficiencies would be unacceptable. The price is that the interface
grows and gains complexity.
To make it concrete, here's what an alternative design would look like
that probably satisfies a different efficiency requirement from the
one you seem to assume, allows arbitrary mixing of text and binary
I/O, and requires fewer procedures in the interface:
- A procedure `transcode-port!' changes the transcoder associated with
a port.
- The various procedures for reading and writing binary data bypass
the transcoding mechanism.
- `get-char-from-binary' etc. are eliminated.
- The various procedures for reading textual data only remove as much
binary data from the underlying data stream as necessary. For the
Unicode encodings, this is trivial. For other, more stateful
encodings, this may be more complex, and may in effect prohibit
interleaving textual and binary data, but then we'd be no worse off
than before in those cases.
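
The points above can be sketched roughly as follows. This is a Python
illustration, not Scheme and not a proposed R6RS interface: the class
`MixedPort` and its method names are hypothetical, chosen to mirror
`transcode-port!` and the binary/textual read procedures. It assumes an
encoding whose incremental decoder yields a character as soon as that
character's last byte has been seen (true for UTF-8 and Latin-1), and it
reads one byte at a time for clarity rather than speed.

```python
import codecs
import io


class MixedPort:
    """Hypothetical port allowing interleaved text and binary reads."""

    def __init__(self, stream, encoding="utf-8"):
        self._stream = stream
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def transcode_port(self, encoding):
        """Analogue of `transcode-port!`: replace the transcoder in place."""
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def get_bytes(self, n):
        """Binary read: bypasses the transcoder entirely."""
        return self._stream.read(n)

    def get_char(self):
        """Text read: take only as many bytes as one character needs."""
        while True:
            b = self._stream.read(1)
            if not b:
                # End of stream: flush the decoder. This raises
                # UnicodeDecodeError if a character was cut short.
                tail = self._decoder.decode(b"", final=True)
                return tail or None
            ch = self._decoder.decode(b)
            if ch:
                return ch


# Made-up data: one UTF-8 character, two binary bytes, then Latin-1 text.
data = "\u00e9".encode("utf-8") + b"\x00\x2a" + "\u00df!".encode("latin-1")
port = MixedPort(io.BytesIO(data))

c1 = port.get_char()            # text read under UTF-8
blob = port.get_bytes(2)        # binary read, untouched by the transcoder
port.transcode_port("latin-1")  # change the transcoder on the same port
c2 = port.get_char()
c3 = port.get_char()
end = port.get_char()           # nothing left
```

Because the text read never takes more bytes than the current character
needs, the binary read in the middle sees exactly the bytes that follow it.
Whether a byte-at-a-time decoder of this kind is fast enough is the open
efficiency question; the sketch only shows that the interface stays small.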
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla
Received on Wed Nov 22 2006 - 04:16:26 UTC