[r6rs-discuss] Issues with section 8.2. Port I/O, a.k.a. (rnrs io ports (6)) from William D Clinger on 2007-08-25 (r6rs-discuss.mbox)

From: William D Clinger <will>
Date: Sat, 25 Aug 2007 17:03:35 -0400

Abdulaziz Ghuloum wrote:
> I don't know about how people feel about changing anything in
> the document at this stage in the game.

Apparently a supermajority don't want to fix the many
problems, or want their repair to be deferred until R7RS.
Otherwise the R6RS would not have been ratified.

> Section 8.2.2. File options:
>
> It is hard for me to understand the listed file options
> (no-create, no-fail, and no-truncate) and how the options
> interact.

That means you are sentient.

> 8.2.3. Buffer modes.
>
> The three listed buffer modes are none, line, and block.

FYI, Larceny's preferred buffer mode for interactive output
ports is datum. The buffer-mode syntax does not allow datum,
which is one of several reasons that syntax is deprecated
in Larceny.

> I don't understand what line buffering means.

Buffer modes are mostly implementation-dependent, so you
can use your best judgment. I'm going to tell you what
I think, which is usually what Larceny does or will do.

> First, the option
> does not make sense for binary ports.

So binary ports can just ignore the line buffer mode and
do whatever they want. (See below, however.)

> Second, what
> constitutes a line depends on what character(s) constitute an
> end-of-line marker, but this is unknown unless the bytes of a
> file are actually decoded.

The behavior is easier to understand for output ports. If
the buffer mode is line, then outputting a linefeed should
flush the output buffer (whatever that means); whether a
<carriage return>, <next line>, or <line separator> flushes
the output buffer is presumably implementation-dependent.
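
A minimal sketch of the output-side behavior described above, written in Python rather than Scheme so it can be run as is. The class name LineBufferedWriter and its methods are hypothetical, not part of any R6RS implementation. It assumes that only a linefeed triggers a flush.

```python
import io


class LineBufferedWriter:
    """Hypothetical wrapper: buffers text and flushes when a linefeed is written."""

    def __init__(self, sink):
        self.sink = sink          # any object with a write(str) method
        self.pending = []         # text not yet handed to the sink

    def write(self, text):
        self.pending.append(text)
        # Only a linefeed triggers a flush in this sketch; other line
        # terminators are left to the implementation's discretion.
        if "\n" in text:
            self.flush()

    def flush(self):
        self.sink.write("".join(self.pending))
        self.pending.clear()


sink = io.StringIO()
port = LineBufferedWriter(sink)
port.write("partial")
held_back = sink.getvalue()       # nothing has reached the sink yet
port.write(" line\n")
flushed = sink.getvalue()         # the buffered text arrives at once
```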

For an input port, I believe the intent is that anything
that's likely to be translated into a linefeed should fill
the buffer from which the textual input procedures fetch
their data, so (for example) those procedures will not
block on an interactive input port once an end-of-line
has been typed. To implement this, you'll probably have
to arrange for binary ports to pay attention to the line
buffer mode.

> For input-ports, the bytes have to
> be read and decoded one at a time until an end-of-line marker
> is found. But this is more or less what unbuffered IO is.

Not quite. When the buffer mode is line, you can decode
the bytes twice, once at the binary level and again when
the bytes are translated into characters. Since this is
all implementation-dependent anyway, and it doesn't hurt
to treat any old byte as a potential end-of-line, the
binary-level decoding can treat all of the usual suspects
as an end-of-line encoding for the purpose of buffering.
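
One way to picture the binary-level pass, as a rough Python sketch. The function name, the read_byte callable, and the particular set of bytes are all assumptions made for illustration. The set covers LF, CR, and the final bytes of NEL and LS as encoded in UTF-8.

```python
# Bytes treated as a possible end-of-line for buffering purposes only.
# 0x0A = LF, 0x0D = CR, 0x85 = last byte of NEL in UTF-8 (also NEL in Latin-1),
# 0xA8 = last byte of LS (U+2028) in UTF-8.
POSSIBLE_EOL_BYTES = frozenset({0x0A, 0x0D, 0x85, 0xA8})


def fill_line_buffer(read_byte):
    """Read bytes until a possible end-of-line byte or end of input.

    read_byte is a hypothetical callable returning one int, or None at EOF.
    """
    buffer = bytearray()
    while True:
        b = read_byte()
        if b is None:
            break
        buffer.append(b)
        if b in POSSIBLE_EOL_BYTES:
            break                 # a false positive only means stopping early
    return bytes(buffer)


source = iter(b"first\r\nsecond")
chunk = fill_line_buffer(lambda: next(source, None))
```

In this sketch the first call stops at the carriage return, so the textual layer would still need to look at the next byte to recognize a CR LF pair. The real decoding into characters happens later, in the transcoder.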

Some operating systems have a "cooked" mode that basically
does this for you, in which case your main problem may be
to avoid the "cooked" mode for buffer modes other than line.

> For output-ports, and in the case of multibyte eol marker, the
> situation is just as bad since the port has to maintain additional
> state (e.g. have I seen a <cr> and this is a <lf> or not).

Not really. The textual output procedures have to look
for a linefeed anyway in order to implement the eol style,
so it's no big deal for the linefeed case to flush the buffer.

> A related question: what's the rationale for including the
> "buffer-mode?" predicate?

There is none, just as there is no rationale for the
buffer-mode syntax.

> 8.2.4. Transcoders.
>
> Codecs:
> What's the endianness of the codec returned by (utf-16-codec)?

As specified by the Unicode standard: the default is
big-endian, but that particular codec allows either of
two byte order marks that specify big- or little-endian.
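
The rule just stated can be sketched in a few lines of Python. The function name decode_utf16 is hypothetical, and the sketch handles the byte order explicitly instead of relying on any library default.

```python
def decode_utf16(data):
    """Decode UTF-16 bytes: honor a byte order mark, else assume big-endian."""
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")   # BOM says big-endian
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")   # BOM says little-endian
    return data.decode("utf-16-be")           # no BOM: default to big-endian


no_bom = decode_utf16(b"\x00h\x00i")
big = decode_utf16(b"\xfe\xff\x00h\x00i")
little = decode_utf16(b"\xff\xfeh\x00i\x00")
```

All three calls decode the same two-character string. Only the byte order on the wire differs.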

> Was the UTF-32 codec dropped unintentionally?

No, it was dropped intentionally.

> Eol-style:
> Which procedures are affected by the eol-style of a port's
> transcoder? Does it only affect (get-line <textual-input-port>)
> or does it perform translation on other operations?

It affects all of the textual input procedures. All must
perform the eol style translation.

The eol style also affects all of the textual output
procedures.
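
A rough Python sketch of what such a translation might look like. It assumes that, for any eol style other than none, every recognized line-ending encoding is read as a single linefeed, and that each linefeed written is replaced by the style's own encoding. The function names and tables are illustrative only.

```python
import re

# Line-ending encodings recognized on input in this sketch. Longer
# sequences come first so that CR LF is not split into two line endings.
INPUT_EOL = re.compile("\r\n|\r\x85|\r|\x85|\u2028")

# Output encodings keyed by eol-style name.
OUTPUT_EOL = {
    "lf": "\n", "cr": "\r", "crlf": "\r\n",
    "nel": "\x85", "crnel": "\r\x85", "ls": "\u2028",
}


def translate_input(text):
    """Map every recognized line ending to a single linefeed."""
    return INPUT_EOL.sub("\n", text)


def translate_output(text, eol_style):
    """Replace each linefeed with the encoding chosen by eol_style."""
    return text.replace("\n", OUTPUT_EOL[eol_style])


read_back = translate_input("a\r\nb\rc\x85d")
written = translate_output("a\nb\n", "crlf")
```

Because the translation lives in the transcoder, every textual procedure that reads or writes through it sees the same result.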

> Does bytevector->string and string->bytevector use the
> eol-style of the given transcoder argument?

Yes.

> 8.2.7. Input ports
>
> Why does open-file-input-port take a file-option argument?

Answer #1: Because it's there.
Answer #2: So implementors can add their own feeping creatures.

> Why are there two ways of specifying the optional argument
> (dropping it, and adding #f).

People who write code often prefer to drop the optional
argument. Programs that write code often prefer to use
#f instead of dropping it.

> Is it possible that all
> maybe-transcoder arguments to such procedures be named simply
> "transcoder" and the #f option be removed?

It is possible that someone will suggest that for R7RS.
For now, the people have spoken.

> 8.2.9. Textual input.
>
> This section mentions the get-line procedure. Is there a
> similar put-line procedure that I'm missing somewhere?

The index lists no such procedure. It's hard for me to
imagine what its intended semantics might be.

> 8.2.13. Input/output ports.
>
> The description of input/output ports does not explain how the
> read and write operations interact.

Their interaction is implementation-dependent. Some of
the editors had a shared source/sink model in mind, but
there is nothing in the ratified document that actually
enforces that.

When you actually implement this stuff, you will discover
quite a few dark corners that make the above issues look
simple.

Will
Received on Sat Aug 25 2007 - 17:03:35 UTC
