William D Clinger <will_at_ccs.neu.edu> writes:
> The rest of this comment suggests a better design, and
> then describes some outstanding issues for which I have
> no strong recommendation at this time.
Thanks, it's getting better and better. I also support Per Bothner's
posts on this subject.
> * The binary transcoder is a special pseudo-transcoder
> that is returned by the binary-transcoder procedure
This design doesn't encompass bytes->bytes transcoders, e.g.
compression, which are very similar to bytes->chars (on input)
and chars->bytes (on output) transcoders except for the types
of the stream elements.
The same is true of chars->chars transcoders, e.g. ports applying
Unicode normalization (NFC/NFD/NFKC/NFKD).
Of course, "doesn't encompass" doesn't mean that it forbids them:
they could be implemented as yet another kind of port, mostly
duplicating the functionality of transcoders. But the design is
already quite close to encompassing them.
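To illustrate how little separates these cases, here is a minimal Python sketch. It assumes a transcoder is modelled as a function from an iterable of input chunks to an iterator of output chunks; that interface and the helper names (inflate, decode_utf8, normalize_nfc, compose) are hypothetical, not the interface in the draft under discussion. The three stages differ only in their element types: bytes->bytes (zlib), bytes->chars (UTF-8) and chars->chars (NFC), and they compose uniformly.

```python
import codecs
import unicodedata
import zlib
from typing import Callable, Iterable, Iterator

# Hypothetical interface: a transcoder maps a stream of input chunks to a
# stream of output chunks. Only the chunk type (bytes or str) varies.


def inflate(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """bytes -> bytes: zlib decompression, fed incrementally."""
    d = zlib.decompressobj()
    for chunk in chunks:
        out = d.decompress(chunk)
        if out:
            yield out
    tail = d.flush()
    if tail:
        yield tail


def decode_utf8(chunks: Iterable[bytes]) -> Iterator[str]:
    """bytes -> chars: UTF-8 decoding; a character split across chunks
    is held in the decoder's state until its last byte arrives."""
    dec = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        out = dec.decode(chunk)
        if out:
            yield out
    tail = dec.decode(b"", final=True)
    if tail:
        yield tail


def normalize_nfc(chunks: Iterable[str]) -> Iterator[str]:
    """chars -> chars: Unicode NFC normalization.
    Simplification: buffers everything, because a combining mark may
    arrive in a later chunk than the character it attaches to."""
    yield unicodedata.normalize("NFC", "".join(chunks))


def compose(*stages: Callable) -> Callable:
    """Chain transcoders; each stage consumes the previous stage's output."""
    def run(chunks):
        for stage in stages:
            chunks = stage(chunks)
        return chunks
    return run


# Usage: compressed UTF-8 text containing "e" + COMBINING ACUTE ACCENT,
# split into 3-byte pieces to mimic arbitrary reads from a binary port.
raw = zlib.compress("cafe\u0301".encode("utf-8"))
pieces = [raw[i:i + 3] for i in range(0, len(raw), 3)]
text = "".join(compose(inflate, decode_utf8, normalize_nfc)(pieces))
print(text == "caf\u00e9")  # True
```

Nothing in compose cares whether a stage changes the element type, which is the sense in which a bytes->chars transcoder is just one case of a more general stream transformer.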
> [issue:position]
>
> Asking for the byte position of a complexly transcoded
> port can be like asking for the carrier frequency of a
> spread spectrum signal, and I am told that some standard
> encodings do not always align the encodings of characters
> upon byte boundaries, so the port-position operation
> should be required only for binary ports, if at all.
Indeed.
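Even with a simple encoding such as UTF-8, the byte position of a text stream is ambiguous while a character is only partly read. A small Python sketch, using the standard incremental UTF-8 decoder from the codecs module as a stand-in for a transcoded port (trace_utf8 is a hypothetical helper):

```python
import codecs


def trace_utf8(data: bytes):
    """Feed `data` to an incremental UTF-8 decoder one byte at a time and
    record (bytes consumed, chars produced, bytes held in decoder state)."""
    dec = codecs.getincrementaldecoder("utf-8")()
    rows = []
    for consumed, byte in enumerate(data, start=1):
        chars = dec.decode(bytes([byte]))
        pending, _flags = dec.getstate()
        rows.append((consumed, chars, pending))
    return rows


# "a", U+00E9 (two bytes in UTF-8), "b": 3 characters, 4 bytes.
for row in trace_utf8("a\u00e9b".encode("utf-8")):
    print(ascii(row))
```

After the second byte no character has been produced and the decoder holds b'\xc3' internally, so the "byte position" at that moment is 2 if counted on the binary side and 1 if counted up to the last complete character.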
One could support opaque tokens as positions of text streams (like
fgetpos/fsetpos in C); in practice such a token includes the byte
position, the encoding state, and probably information about
buffering at that point. This supports remembering a position and
seeking back to it later. However, I mention this design only in
order to argue against proposing it:
- It's rarely used in practice.
  http://www.google.com/codesearch?q=fsetpos shows mostly
  implementations of this function or wrappers for it in other
  languages; its actual uses (I counted two in the first 10 pages)
  apply to binary files, where ftell/fseek could be used just as well.
- Some implementations of transcoding don't support cloning the
  encoding state (iconv; calling an external program with its input
  and output redirected through pipes).
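For comparison, Python's io.TextIOWrapper follows the opaque-token design: its documentation describes tell() as returning an opaque number that does not usually represent a byte count, and seek() on a text stream expects such a value. CPython builds the token by snapshotting the decoder with getstate()/setstate(), which is exactly the cloneable encoding state that the second point above says some transcoders lack. A minimal sketch, using an in-memory byte stream as an assumed stand-in for a file:

```python
import io

raw = io.BytesIO("\u00e9t\u00e9\nhiver\n".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8")

first = text.read(2)      # consume two characters (three bytes)
cookie = text.tell()      # opaque position token, not a character count
rest_once = text.read()

text.seek(cookie)         # seek back to the remembered position
rest_again = text.read()

print(first == "\u00e9t", rest_once == rest_again)  # True True
```

Callers may store and replay the cookie but should not do arithmetic on it, which is also the contract of fpos_t in C.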
> It seems as though the right thing to do may be to eliminate
> readers and writers from the report, while folding their
> functions into ports that represent arbitrary sources and sinks.
I agree.
--
__("< Marcin Kowalczyk
\__/ qrczak_at_knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Received on Sat Nov 25 2006 - 06:58:58 UTC