[R6RS] I/O
Michael Sperber
sperber at informatik.uni-tuebingen.de
Fri Jul 7 14:35:26 EDT 2006
Many thanks for the detailed comments!
A new version of the I/O SRFIs has been checked in. In addition to
the corrections described below, I've massaged the condition hierarchy
a bit in light of what R6RS will have, and added a `native-eol-style'
procedure to the ports layer.
William D Clinger <will at ccs.neu.edu> writes:
> General comment: Exposing so much low-level detail
> makes it harder to construct an efficient i/o system.
> To me, this primitive i/o abstraction layer looks
> like an extra layer of pure overhead.
This is a misunderstanding about the primary role of the Primitive I/O
layer. The Primitive I/O layer is mainly for people implementing
custom data sources (and possibly doing very high-performance I/O,
which is hard with the Ports layer).
Now, an implementor will need to provide `open-reader-input-port' and
`open-writer-output-port'. However, the proposal does not expose how,
say, ports on files are implemented: These could completely bypass
primitive I/O. (This was a hot topic on the SRFI mailing list; the
first draft did expose more detail.)
It is possible to design the interface for providing custom data
sources to more closely match the Ports system (and I assume most
Scheme systems include abstractions for this, at least internally),
but this is very difficult to do in a manner that's stable, efficient
and easy to write code against. Refer to the history of "custom ports" in
PLT Scheme for particularly gruesome examples.
> If a port were defined as a reader or writer plus a
> transcoder, I could see some use to this layer, but
> with the side-effecting semantics for associating
> transcoders with ports, I don't.
I'm not sure I understand the comment: Defining ports this way would
ignore the issue of buffering which is essential to the design of Port
I/O. I also don't see, even if ports were defined as
"reader/writer plus buffer plus transcoder", how the "side-effecting
semantics" would make it less useful.
> Filenames: Please define "octet" somewhere.
I've replaced this with "byte". (Both "byte" and "octet" are specified in
the section on bytes objects.)
> Readers and Writers: "The objects representing I/O
> descriptors are called readers for input and writers
> for output." That sentence appears to be misleading
> because, if I understand this document correctly,
> the word "descriptor" means something completely
> different (and essentially undefined) throughout
> the rest of the document; it does not mean a reader
> object or a writer object.
Right on. I've eliminated the use of the word "descriptor" in this
paragraph.
> Readers, (get-position): "EOFs do not count as
> octets." Do you envision multiple EOFs?
Yes.
> Specification of make-simple-reader: The document
> does not explain how a programmer is supposed to
> lay hands on an object that can legitimately be
> passed as the second argument (the descriptor) to
> this procedure. From that I conclude that this
> procedure has no conceivable use in portable code,
> and does not belong in the R6RS.
I'm obviously failing at describing this clearly, and I need your
help. Remember that the Primitive I/O layer is for implementors of
custom data sources or sinks. The descriptor is an optional
communication channel between the operations of a certain kind of
source or sink. For example, an implementation of a bytes writer
(which is built-in, but it wouldn't have to be) will need to provide
`writer-bytes', given just a writer. Thus, bytes writers keep the
data that's being accumulated in the descriptor; the descriptor is a
communication channel between `open-bytes-writer' and `writer-bytes'.
None of the other procedures, which are ignorant of what kind of
reader/writer they get, touches the descriptor. Thus, this has nothing
to do with portability.
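
To make the descriptor idea concrete, here is a minimal Python sketch of the
pattern described above. It is a model, not the SRFI's API: the `Writer`
class, its fields, and the functions `open_bytes_writer`, `writer_bytes` and
`copy_all` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Writer:
    """Hypothetical stand-in for a primitive writer."""
    id: str
    descriptor: Any                # private channel; generic code never inspects it
    write: Callable[[bytes], int]  # consumes bytes, returns the count written


def open_bytes_writer() -> Writer:
    """Create a writer that accumulates everything written to it in memory."""
    accumulated = bytearray()

    def write(data: bytes) -> int:
        accumulated.extend(data)
        return len(data)

    # The accumulator doubles as the descriptor so writer_bytes can find it.
    return Writer(id="bytes", descriptor=accumulated, write=write)


def writer_bytes(writer: Writer) -> bytes:
    """Recover the accumulated data, given just the writer."""
    return bytes(writer.descriptor)


def copy_all(chunks: Iterable[bytes], writer: Writer) -> None:
    """Generic code: works with any writer and never touches the descriptor."""
    for chunk in chunks:
        writer.write(chunk)
```

Only `open_bytes_writer` and `writer_bytes` agree on what the descriptor
holds; `copy_all` would work unchanged with a writer whose descriptor is
something else entirely.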
> It may be that any object whatsoever may be passed
> as the descriptor, inasmuch as the reader's state
> is essentially private to the procedures that are
> passed (read!, available, get-position, set-position!,
> end-position, close). In that case, make-simple-reader
> has a purpose, but I wonder what purpose is served by
> its descriptor argument.
The problem is that this state is hidden in closures, and is therefore
more difficult to make available to auxiliary operations such as
`writer-bytes'.
> Prerequisites: The unspecified value should be specified
> as the value returned by the unspecified procedure.
Done; note that this is purely for the SRFI version. It won't appear
in the R6RS document.
> Instead of saying "strings are represented as vectors
> of scalar values", which implies that the vector?
> predicate is true of strings, it should say something
> like "strings are analogous to vectors of scalar values".
Done.
> File options: Instead of saying that file options are
> as in SRFI 79, it should say that file options are a
> subset of a certain set of symbols, as in the current
> draft of the primitive io srfi.
Done.
> Buffer modes: In addition to none, line, and block,
> shouldn't there be an insouciant mode?
Sure. Could we pick a different word, though? I'm reasonably
proficient in English, but I had to look this one up. (And, looking
at the entry in Roget's, it seems to have negative connotations.) How
about `dont-care', `no-preference' or `never-mind'?
> The description of buffer-mode should refer to name as a symbol, not
> as an identifier. (The buffer-mode syntax should recognize the name
> as a symbol, not as the name of a variable. This matters when
> buffer-mode is used within the scope of a variable whose name looks
> like the symbol that names the mode.)
Done.
> Specification of eol-style: These forms should
> evaluate to the symbols lf, crlf, and cr.
Done.
> Specification of read-bytes-some: If this procedure
> is intended to hang when waiting to see whether more
> bytes are forthcoming from its argument, the spec
> should say so. This applies to several subsequent
> specifications also.
I've tried to improve this.
> Specification of read-u8: Please define octet
> somewhere.
I've replaced "octet" by "byte" pervasively.
> The spec speaks of "the next end of file"; do you envision input
> ports that contain multiple ends of file?
The model is that you have a byte sequence with interleaved
end-of-files which goes on indefinitely. For a finite data source, it
ends in an infinite sequence of end-of-files. I've tried to describe
this better.
> How is "just past the end
> of file" different from "just before the end of file"?
It differs in whether the next read-<something> will return this end
of file or whatever comes after it. (Which may be another end of file
object, or not. But the difference is observable if a byte comes
after it.)
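
The model can be sketched as a toy Python port. This is only an illustration
under assumed names: `ModelPort`, `read_u8`, `port_eof` and the `EOF`
sentinel are hypothetical and are not taken from the SRFI text.

```python
from collections import deque

EOF = object()  # stand-in for the end-of-file object


class ModelPort:
    """Toy input port over a sequence of bytes interleaved with EOF markers."""

    def __init__(self, items):
        self._items = deque(items)

    def read_u8(self):
        # Once the finite data is exhausted, every further read yields EOF.
        return self._items.popleft() if self._items else EOF

    def port_eof(self):
        # True exactly when the next read would return an end of file.
        return (self._items[0] is EOF) if self._items else True


port = ModelPort([0x61, EOF, 0x62])
```

After reading `0x61`, the port is just before the end of file and `port_eof`
is true. After reading that end of file, the port is just past it, and
`port_eof` is false because a byte follows. Once `0x62` has been read, every
further read returns `EOF`.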
> These questions apply to several subsequent specifications as well.
> By the way, what if UTF-8 is inconsistent with the transcoding of
> the input port?
This last sentence I don't understand. Could you explain?
> Specification of read-string: The number of bytes
> read appears to be ambiguous, since 0 bytes can
> always be interpreted as a UTF-8 string and many
> bytes that could follow a UTF-8 string might be
> interpreted as an extension of that string.
`Read-string' is really a dumb idea (someone on the SRFI list spotted
it, but I had forgotten about it); it was there for symmetry with
`read-bytes'. I've elided it.
>
> Specification of read-char: This also seems
> ambiguous in the sense that the character #\a
> might be followed by modifiers that could be
> composed with #\a to form a new character. I
> presume the intent is that no such compositions
> be formed.
No. The prefix of the byte sequence forms an encoding of a scalar
value, and it's unambiguous when that sequence ends.
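
A small Python sketch of why the end of the sequence is unambiguous: in
UTF-8 the first byte alone determines how many bytes encode the scalar
value. The function name `read_scalar_value` is hypothetical, and this is a
model of the idea rather than of any particular implementation.

```python
def read_scalar_value(data: bytes, start: int = 0):
    """Decode one UTF-8 encoded scalar value; return (character, next_index)."""
    lead = data[start]
    if lead < 0x80:
        length = 1
    elif 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        raise ValueError("invalid UTF-8 lead byte")
    end = start + length
    # bytes.decode validates the continuation bytes for us.
    return data[start:end].decode("utf-8"), end


# "a" followed by U+0301 COMBINING ACUTE ACCENT, then "b".
data = "a\u0301b".encode("utf-8")
first, pos = read_scalar_value(data)        # "a", consumed 1 byte
second, pos = read_scalar_value(data, pos)  # the combining mark on its own
```

Under this reading, a combining mark that follows #\a is simply the next
scalar value in the sequence, returned by the next read.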
> A similar remark applies to the next two procedures.
>
> Specification of port-eof?: What if the port is
> currently pointing *past* an end of file (whatever
> that means)?
If there's a byte there, it will return #f. If there's another end of
file, it will return #t.
> Specification of input-port-position: The term
> "transcoded port" has not been defined prior to
> its mention in this spec.
Done that.
> Ditto for "truncated stream" and "translated stream".
Leftovers from a previous version; elided.
> Specification of set-input-port-position!: Ditto
> the above, plus "terminated stream", which I assume
> is something like a closed port.
Yes, but not quite. Elided.
> transcode-input-port!: I don't like the side
> effect on the port. I assume the intention is
> to prevent non-UTF-8 data from being written to
> a UTF-8 port.
No. The intention is to support reading data streams with unknown
encodings, where the first few bytes denote the encoding. This is
fairly common with Unicode, with a BOM at the beginning. (This is
where the concept of a purely "character port" falls down, BTW.)
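
As a rough illustration of that use case, here is a Python sketch that
inspects the first few bytes for a BOM and then decodes from that point on.
The function `sniff_and_decode` and the `_BOMS` table are hypothetical names;
the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Longer BOMs first: the UTF-32LE BOM begins with the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_and_decode(raw: bytes, default: str = "utf-8") -> str:
    """Inspect the leading bytes, then decode everything after the BOM."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    return raw.decode(default)
```

The point of the sketch is the order of events: some raw bytes have to be
examined before the encoding is known, so the decision to transcode is made
after the port already exists.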
> Specification of open-bytes-input-port: The term
> "byte stream" has not been defined. Ditto for
> open-string-input-port.
Leftover; elided.
> Specification of write-bytes: What if the bytes to be written are
> inconsistent with the transcoder associated with the output port?
> The same question applies to write-u8, write-string-n, write-char,
> et cetera.
I assume you mean the situation where a non-UTF-8 byte sequence is
written. I've put a paragraph on this in the "Transcoders" section.
(This was specified in the original SRFI, but the relevant section
drifted to the Streams SRFI, I think.)
> set-output-port-buffer-mode!: Might there be some
> inefficiency associated with requiring every output
> port to support this operation?
I don't think so.
> transcode-output-port!: See my remarks regarding
> transcode-input-port!.
Actually, the restriction on `transcode-input-port!' isn't necessary
for `transcode-output-port!'. I've removed it.
> call-with-string-output-port: Why does this create
> a "bytes writer" instead of a character writer? If
> it's a bytes writer, programs can write sequences
> of bytes that have no UTF-8 decoding, and the spec
> doesn't say what's supposed to happen in that case.
It does now.
> Specification of open-file-input+output-ports: The
> period at the end of the first sentence should be
> outside both parentheses. (Once again, "stream ports"
> is an undefined term.)
Done.
> Design rationale, Encoding: The rationale claims to
> avoid the problems that result from "associating an
> encoding with a port", "by specifying that textual
> I/O always uses UTF-8". I don't follow this at all.
> The proposal includes "predefined codecs for the ISO
> 8859-1, UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE
> encodings"
The codecs translate between UTF-8 and the other encodings.
> and provides a side-effecting procedure that associates them with a
> port; furthermore that side effect is allowed only once, which seems
> really ad hoc given that some data may already have been read from
> or written to the port before that side effect is performed.
What you're writing is exactly the reason why it's only supported
once: If the stream is un-transcoded, the buffer position easily
corresponds to a position in the input stream, and it's trivial to do
the transcoding *from that point*. If it is transcoded, this mapping
isn't easily available.
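
A short Python sketch of the mapping problem, under the assumption that the
source delivers UTF-16BE and the transcoded buffer holds UTF-8. The helper
`offsets` is a hypothetical name used only for this illustration.

```python
def offsets(text: str, encoding: str) -> list:
    """Byte offset at which each character of text starts in the given encoding."""
    result, position = [], 0
    for ch in text:
        result.append(position)
        position += len(ch.encode(encoding))
    return result


text = "a\u00e9\u20ac"                     # "a", e-acute, euro sign
raw_offsets = offsets(text, "utf-16-be")   # positions in the untranscoded input
utf8_offsets = offsets(text, "utf-8")      # positions in a transcoded buffer
```

In an untranscoded buffer, an index is a byte offset in the input. In the
transcoded buffer the two lists diverge after the first character, so
recovering the input position from a buffer position requires extra
bookkeeping.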
> Design rationale, display: According to the most
> recent status report, formatted output is not under
> consideration for R6RS, so something like display
> should remain.
Is the R5RS compatibility library not enough?
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla