[R6RS] Source code encoding

Michael Sperber sperber
Mon Mar 7 14:15:17 EST 2005


>>>>> "Marc" == Marc Feeley <feeley at IRO.UMontreal.CA> writes:

Marc> Why would this be interesting, since an ASCII encoded file also happens
Marc> to be a UTF-8 encoded file?  Why would you want to distinguish these
Marc> encodings by adding a BOM to UTF-8?
>> 
>> You don't---you want the BOM to distinguish UTF-8 from UTF-16, not
>> ASCII from UTF-8.

Marc> Something's strange here.  First of all there is no need for a BOM in
Marc> UTF-8 because UTF-8 is a sequence of bytes. [...]

For an explanation, check 

http://www.unicode.org/faq/utf_bom.html#BOM

This is somewhat convoluted, but addresses your concern very
specifically.  Gillam boils it down more nicely at the end of Chapter
6.

Bottom line: "If you want to auto-detect a specific UTF format, use a BOM
  always."
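
To make the auto-detection idea concrete, here is a minimal Python
sketch of BOM sniffing.  The function name `sniff_bom` and the fallback
to UTF-8 when no BOM is present are assumptions made for illustration,
not anything proposed for R6RS.  UTF-32 is deliberately left out to keep
the sketch short.

```python
import codecs

# Byte-order marks for the encodings this sketch distinguishes.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, payload) with any recognized BOM stripped.

    Without a BOM the encoding cannot be detected this way, so the
    caller-supplied default is returned and the data is left untouched.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return default, data
```

A reader would then decode the payload with the returned encoding,
e.g. `payload.decode(encoding)`.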

Given all these contortions, however, I'm coming around to the view
that simply specifying UTF-8 might be best: backwards-compatible with
ASCII, no BOM requirement.  The only downside I see is that you can't
use a recent Windows Notepad to write non-ASCII Unicode text, since it
prepends a BOM to the UTF-8 files it saves.
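
Both points can be checked in a few lines of Python.  This is only an
illustration with a made-up sample string: ASCII bytes decode
identically as UTF-8, while a leading BOM survives a plain UTF-8 decode
as the character U+FEFF unless the reader strips it (Python's
`utf-8-sig` codec does so).

```python
# Hypothetical sample source text, pure ASCII.
source = b"(define (id x) x)"

# Any ASCII-encoded file is also a valid UTF-8 file with the same text.
assert source.decode("ascii") == source.decode("utf-8")

# The same bytes with a UTF-8 BOM (EF BB BF) prepended.
with_bom = b"\xef\xbb\xbf" + source

plain = with_bom.decode("utf-8")         # BOM is kept as U+FEFF
tolerant = with_bom.decode("utf-8-sig")  # BOM is stripped if present
```

A reader that wants to accept such files anyway could decode with the
tolerant codec, which also handles BOM-less input unchanged.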

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla

