[R6RS] Source code encoding
Michael Sperber
sperber
Mon Mar 7 14:15:17 EST 2005
>>>>> "Marc" == Marc Feeley <feeley at IRO.UMontreal.CA> writes:
Marc> Why would this be interesting, since an ASCII encoded file also happens
Marc> to be a UTF-8 encoded file? Why would you want to distinguish these
Marc> encodings by adding a BOM to UTF-8?
>>
>> You don't---you want the BOM to distinguish UTF-8 from UTF-16, not
>> ASCII from UTF-8.
Marc> Something's strange here. First of all there is no need for a BOM in
Marc> UTF-8 because UTF-8 is a sequence of bytes. [...]
For an explanation, check
http://www.unicode.org/faq/utf_bom.html#BOM
This is somewhat convoluted, but addresses your concern very
specifically. Gillam boils it down more nicely at the end of Chapter
6.
Bottom line: "If you want auto-detect a specific UTF format, use a BOM
always."
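To make the auto-detection point concrete, here is a minimal sketch in Python of what BOM sniffing could look like. The names `sniff_bom` and `_BOMS` are made up for illustration, and falling back to UTF-8 when no BOM is present is an assumption, not something the Unicode FAQ mandates.

```python
import codecs

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, bom_length) guessed from the leading bytes.

    Hypothetical helper: if no BOM is found, the caller-supplied
    default is returned with a BOM length of 0.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


# Usage: strip the BOM, then decode the rest with the detected encoding.
raw = codecs.BOM_UTF16_LE + "(define x 1)".encode("utf-16-le")
encoding, skip = sniff_bom(raw)
text = raw[skip:].decode(encoding)
```

Note that the sketch cannot tell a UTF-16 LE file whose first character is U+0000 from a UTF-32 LE file; without a BOM at all, it simply guesses the default.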
Given all these contortions, however, I'm coming around to the view
that simply specifying UTF-8 might be best: backwards-compatible to
ASCII, no BOM requirement. The only downside I see is that you can't
use a recent Windows Notepad to write non-ASCII Unicode text.
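A small Python sketch of both halves of that trade-off, assuming the Notepad problem is that it prepends a BOM when saving as UTF-8. The sample source text is invented for illustration.

```python
# Pure ASCII bytes decode identically as ASCII and as UTF-8.
ascii_source = b"(define x 1)"
same = ascii_source.decode("ascii") == ascii_source.decode("utf-8")

# A UTF-8 file with a leading BOM, as an editor might write it.
with_bom = b"\xef\xbb\xbf" + "(define \u03bb 1)".encode("utf-8")

# A plain UTF-8 decoder keeps the BOM as the character U+FEFF,
# which a reader that expects no BOM would see as stray input.
strict = with_bom.decode("utf-8")

# Python's "utf-8-sig" codec drops one leading BOM if present.
tolerant = with_bom.decode("utf-8-sig")
```

A reader that wanted to accept such files anyway would have to skip the leading U+FEFF itself, which is the sort of special case a plain "UTF-8, no BOM" rule avoids.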
--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla