[r6rs-discuss] reading XML files


From: John Cowan <cowan>
Date: Mon Nov 20 20:58:04 2006

Chris Hanson scripsit:

> To identify the coding of an XML entity, there are two stages: (1)
> determine if there's a BOM at the beginning of the entity, and (2)
> determine if there's an XML declaration present. Stage (1) requires
> only binary I/O, since you're looking for one of several specific
> sequences. Stage (2) requires only ASCII I/O, because the XML
> declaration is entirely coded in ASCII non-control characters (and
> #\tab). I believe this was a deliberate design decision, mostly
> because it is so simple and elegant.

I'm sure it was deliberate to use the ASCII *repertoire*, but XML
does not require the ASCII *encoding* -- in particular, EBCDIC
XML is perfectly possible, so you want to do the encoding-sniffing
using binary I/O throughout.

See http://recycledknowledge.blogspot.com/2005/07/hello-i-am-xml-encoding-sniffer.html
for the nitty gritty.
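Here is a minimal Python sketch of sniffing done entirely on bytes. It assumes the byte signatures from the non-normative Appendix F of the XML 1.0 specification. The function name `sniff_xml_encoding` and the use of `cp037` as a stand-in EBCDIC codec are illustrative assumptions and are not taken from the post above. The sketch also leaves out the unusual UCS-4 byte orders that Appendix F mentions. The key point is that the declaration is decoded with the codec of the detected family rather than read as ASCII, so EBCDIC input is handled.

```python
import codecs
import re

# Stage 1: byte order marks. FF FE 00 00 (UTF-32LE) must be tested before
# FF FE (UTF-16LE), because the shorter mark is a prefix of the longer one.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Stage 2: no BOM, so look for "<?xm" as it would be encoded in each family.
# Each codec only needs to be good enough to read the declaration itself.
FAMILIES = [
    (b"\x00\x00\x00\x3c", "utf-32-be"),
    (b"\x3c\x00\x00\x00", "utf-32-le"),
    (b"\x00\x3c\x00\x3f", "utf-16-be"),
    (b"\x3c\x00\x3f\x00", "utf-16-le"),
    (b"\x3c\x3f\x78\x6d", "utf-8"),  # any ASCII-compatible encoding
    (b"\x4c\x6f\xa7\x94", "cp037"),  # some EBCDIC variant (assumed cp037)
]

ENCODING_DECL = re.compile(
    r"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._\-]*)["']""")


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    for signature, family in FAMILIES:
        if head.startswith(signature):
            # errors="ignore" drops any multi-byte sequence that is cut off
            # at the end of `head`; the declaration itself is unaffected.
            text = head.decode(family, errors="ignore")
            match = ENCODING_DECL.match(text)
            return match.group(1) if match else family
    # No BOM and no XML declaration: XML 1.0 requires UTF-8.
    return "utf-8"
```

A caller would open the file in binary mode and pass it a prefix, for example `sniff_xml_encoding(f.read(1024))`. With this approach, a declaration longer than the prefix would be missed.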

-- 
Using RELAX NG compact syntax to        John Cowan <cowan_at_ccil.org>
develop schemas is one of the simple    http://www.ccil.org/~cowan
pleasures in life....
        --Jeni Tennison                 <cowan_at_ccil.org>

Received on Mon Nov 20 2006 - 20:57:57 UTC

This archive was generated by hypermail 2.3.0 : Wed Oct 23 2024 - 09:15:00 UTC