[r6rs-discuss] Stateful codecs and inefficient transcoding

From: John Cowan <cowan>
Date: Sat Nov 4 13:09:15 2006

William D Clinger scripsit:

> Thanks again to those who have explained to me the real requirements for
> port i/o. This message describes some simplifications to section 15.3
> of the draft that, in my opinion, would satisfy the real requirements
> while enhancing the usability and efficiency of port i/o.

A right hearty plus-one to almost all of this: I'm posting one quibble
and one major objection.

> * The standard codecs include those in the draft,

As I said in my formal comment, I still think that
utf-{16,32}{be,le}-codec shouldn't be standardized just because they
are cheap and easy; they are rarely wanted, and people will end up using
utf-16le-codec on Windows (because they know that Intel machines are
little-endian) and be in for a big surprise some day when they try to
read a UTF-16 file that originated on a SPARC system.
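
To make the surprise concrete, here is a minimal sketch in Python
rather than Scheme (the codec names "utf-16-le" and "utf-16" are
Python's, not the draft's, and the sample text is made up). It
decodes a BOM-prefixed big-endian file once with a hard-wired
little-endian codec and once with a BOM-sensing one:

```python
import codecs

text = "codec"

# A file as a big-endian machine might write it: a byte order mark
# followed by big-endian code units. (Illustrative data only.)
big_endian_file = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# Hard-wiring little-endian reads every 16-bit code unit byte-swapped,
# and raises no error while doing so.
wrong = big_endian_file.decode("utf-16-le")

# The BOM-sensing codec consults the byte order mark and recovers the text.
right = big_endian_file.decode("utf-16")

print(repr(wrong))
print(repr(right))
```

The wrong decoding raises no error here; it just yields unrelated
characters, which is what makes the mistake easy to ship.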

(It occurred to me on reading your message that the 5.91 draft doesn't
actually require codecs to be procedures, and thus the original claim
that stateful encodings can't be handled is false. For all we know,
(utf-8-codec) might on a particular implementation return the symbol
utf-8.)

> * The binary transcoder is defined as the transcoder
> constructed from the latin-1-codec, lf eof-style,
> and (arbitrarily, since no transcoding errors are
> possible) the raise handling mode.

No, no, a thousand times no!

If a port is binary, you should be able to read nothing but bytes objects
from it (which you can then convert to machine integers or floats using
the procedures of Section 11). If you want to read characters from
a port, make it a character port with a proper transcoder. Java 1.0
introduced methods to treat byte sequences as strings in this crass
fashion; those methods were deprecated already in Java 1.1, and anyone who
uses them is in a state of sin. Treating bytes directly as characters is
data punning, every bit as bad as the pre-ISO C ability to pass a float
to a procedure expecting a long and have it arrive, not converted in the
sense of (long)32.0, but with its bits reinterpreted.
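
The difference between converting and punning can be sketched in
Python with the standard struct module. This only illustrates bit
reinterpretation for a 32-bit float and a 32-bit integer; it is not
a reproduction of pre-ISO C calling conventions:

```python
import struct

value = 32.0

# Conversion: the numeric value is preserved.
converted = int(value)

# Punning: the four bytes of the IEEE 754 single-precision encoding
# are reread as a 32-bit signed integer, so the value is not preserved.
raw = struct.pack("<f", value)
punned = struct.unpack("<i", raw)[0]

print(converted)
print(hex(punned))
```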

Furthermore, Latin-1 should *not* be privileged just because it happens to
be very commonly used (and to be the bottom 256 characters of Unicode).
Require it as a codec, fine; slip it in as the implicit encoding of
binary streams, not so much.
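
A rough sketch, again in Python with made-up sample data, of what the
implicit Latin-1 reading does to bytes that are really UTF-8. Since
every byte value is a valid Latin-1 character, the decode cannot fail,
and so the wrong answer arrives without any error:

```python
# UTF-8 bytes for a word containing one non-ASCII character.
data = "na\u00efve".encode("utf-8")

# Read as Latin-1: each byte becomes one character, so the two-byte
# sequence for U+00EF turns into two unrelated characters.
as_latin1 = data.decode("latin-1")

# Read with the encoding the bytes were actually written in.
as_utf8 = data.decode("utf-8")

# Latin-1 round-trips every possible byte, which is why no
# transcoding error can ever be raised.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("latin-1")

print(len(as_latin1), len(as_utf8))
```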

> * The textual lookahead-X, get-X, and put-X operations
> operate on all ports, since a binary port is also a
> textual port.

-1, per above.

-- 
A rabbi whose congregation doesn't want         John Cowan
to drive him out of town isn't a rabbi,         http://www.ccil.org/~cowan
and a rabbi who lets them do it                 cowan_at_ccil.org
isn't a man.    --Jewish saying
Received on Sat Nov 04 2006 - 13:09:10 UTC
