Michael Sperber scripsit:
> Any number of files contain both text and binary data---most image
> file formats, mp3 files, etc. Moreover, one way to view looking at an
> XML file would be:
>
> - Start off in binary, identifying whether it's UTF-16, UTF-32 or one
> of the 8-bit encodings.
>
> - Switch to ASCII in the latter case for reading the encoding tag.
>
> - Switch to the correct encoding afterwards.

As I pointed out earlier, you don't want to switch to ASCII, because
not all 8-bit encodings are ASCII-compatible (EBCDIC is not). See
http://recycledknowledge.blogspot.com/2005/07/hello-i-am-xml-encoding-sniffer.html

Furthermore, in practice, high-performance XML parsers use binary
input exclusively and do their own transcoding as late as possible
(i.e. just before passing the content to the client).
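
To make the sniffing step concrete, here is a minimal sketch in Python
that stays in binary the whole time. It follows the byte patterns listed
in the (non-normative) autodetection appendix of the XML 1.0
Recommendation; the function name and the returned family labels are my
own invention for illustration, not any parser's API. It only picks the
encoding family; a real parser would then read the encoding declaration
using that family's byte values.

```python
def sniff_xml_encoding_family(head: bytes) -> str:
    """Guess the encoding family from the first bytes of an XML entity.

    The labels returned are hypothetical names for this sketch only.
    """
    # Order matters: the 4-byte UTF-32 BOMs must be tested before the
    # 2-byte UTF-16 BOMs, since FF FE is a prefix of FF FE 00 00.
    patterns = [
        (b"\x00\x00\xfe\xff", "utf-32-be"),     # BOM
        (b"\xff\xfe\x00\x00", "utf-32-le"),     # BOM
        (b"\xfe\xff", "utf-16-be"),             # BOM
        (b"\xff\xfe", "utf-16-le"),             # BOM
        (b"\xef\xbb\xbf", "utf-8"),             # BOM
        (b"\x00\x00\x00<", "utf-32-be"),        # '<' without BOM
        (b"<\x00\x00\x00", "utf-32-le"),
        (b"\x00<\x00?", "utf-16-be"),           # '<?' without BOM
        (b"<\x00?\x00", "utf-16-le"),
        (b"<?xm", "ascii-compatible"),          # UTF-8, ISO 8859-x, ...
        (b"\x4c\x6f\xa7\x94", "ebcdic"),        # '<?xm' in EBCDIC
    ]
    for prefix, family in patterns:
        if head.startswith(prefix):
            return family
    # No BOM and no recognizable XML declaration: UTF-8 is the default.
    return "utf-8"
```

Note that the EBCDIC case is detected from raw byte values, which is
exactly why decoding the declaration as ASCII would be wrong there.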
--
John Cowan  cowan_at_ccil.org
Cash registers don't really add and subtract;
they only grind their gears.
But then they don't really grind their gears, either;
they only obey the laws of physics.  --Unknown
Received on Wed Nov 22 2006 - 12:07:35 UTC