Chris Hanson scripsit:
> To identify the coding of an XML entity, there are two stages: (1)
> determine if there's a BOM at the beginning of the entity, and (2)
> determine if there's an XML declaration present. Stage (1) requires
> only binary I/O, since you're looking for one of several specific
> sequences. Stage (2) requires only ASCII I/O, because the XML
> declaration is entirely coded in ASCII non-control characters (and
> #\tab). I believe this was a deliberate design decision, mostly
> because it is so simple and elegant.
I'm sure it was deliberate to use the ASCII *repertoire*, but XML
does not require the ASCII *encoding* -- in particular, EBCDIC
XML is perfectly possible, so you want to do the encoding-sniffing
using binary I/O throughout.
See
http://recycledknowledge.blogspot.com/2005/07/hello-i-am-xml-encoding-sniffer.html
for the nitty gritty.
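
For the impatient, here is a rough Python sketch of what binary-only
sniffing can look like. It is an illustration rather than a complete
sniffer: the function name, the tables, the 256-byte window, and the choice
of cp037 as the one EBCDIC code page tried are all assumptions of mine, and
it trusts a BOM without cross-checking it against the declaration.

```python
import re

# Byte-order marks, longest first so UTF-32LE is not mistaken for UTF-16LE.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# What the start of "<?xml" looks like as raw bytes in each encoding family:
# (leading bytes, Python codec used to read the declaration, default name).
SIGNATURES = [
    (b"\x00\x00\x00<", "utf_32_be", "UTF-32BE"),
    (b"<\x00\x00\x00", "utf_32_le", "UTF-32LE"),
    (b"\x00<\x00?", "utf_16_be", "UTF-16BE"),
    (b"<\x00?\x00", "utf_16_le", "UTF-16LE"),
    (b"<?xm", "ascii", "UTF-8"),
    (b"\x4c\x6f\xa7\x94", "cp037", "EBCDIC"),  # "<?xm" in EBCDIC
]

ENCODING_RE = re.compile(
    r"""encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its leading bytes."""
    # Stage 1: a BOM, if present, settles the question.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Stage 2: recognise "<?xml" by its bytes, and only then decode just
    # enough of the entity to read the encoding pseudo-attribute.
    for signature, codec, default in SIGNATURES:
        if data.startswith(signature):
            text = data[:256].decode(codec, errors="replace")
            if not text.startswith("<?xml"):
                return default
            declaration = text.split("?>", 1)[0]
            match = ENCODING_RE.search(declaration)
            return match.group(1) if match else default
    # No BOM and no recognisable declaration: XML's default applies.
    return "UTF-8"
```

Note that the declaration is never read through an ASCII text port; the
bytes themselves decide which codec gets used to read it, which is why an
EBCDIC entity comes out right.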
--
John Cowan <cowan_at_ccil.org>  http://www.ccil.org/~cowan
Using RELAX NG compact syntax to develop schemas is one of the simple
pleasures in life....
        --Jeni Tennison
Received on Mon Nov 20 2006 - 20:57:57 UTC