---
This message is a formal comment which was submitted to formal-comment_at_r6rs.org, following the requirements described at: http://www.r6rs.org/process.html
---
Submitter: John Cowan
Email address: cowan_at_ccil.org
Issue type: Defect
Priority: Major
Component: I/O
Report version: 5.91
Summary: R6RS must provide a UTF-16 codec, because UTF-16 is an essential encoding.

R6RS implementations are currently required to support the UTF-8, Latin-1 (ISO 8859-1), UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE encodings. This list omits the essential UTF-16 encoding.

The difference between UTF-16 and UTF-16{BE,LE} is that in the former, the presence of a BOM (U+FEFF) character at the beginning of the input stream indicates the ordering of the bytes that make up each character. The BOM is not considered part of the content. (If no BOM is present, the environment's default ordering is used; failing that, big-endian order is used.)

In the UTF-16BE and UTF-16LE encodings, no BOM is permitted; an initial U+FEFF character has its alternative semantics of zero-width no-break space. These encodings are far less commonly used than the UTF-16 encoding.

In particular, the Windows operating system consistently creates UTF-16 documents in little-endian order (not UTF-16LE documents) whenever characters must be written that are not available in the locale-dependent encoding. In essence, Windows systems provide two different encodings at any one time: the "ANSI" (locale-dependent, 8-bit or 8/16-bit) encoding, and the UTF-16 encoding. (The MS-DOS compatibility support provides a third encoding for use by MS-DOS programs.) Failing to provide a UTF-16 codec will make it unnecessarily hard to process Unicode documents generated by Windows.

In addition, UTF-16 (not UTF-16LE or UTF-16BE) is one of the two encodings which all XML processors (parsers) are required to accept, the other being UTF-8.
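The BOM rule described above can be illustrated with a minimal sketch in Python. This is not part of the proposal and is not R6RS code; the function name `decode_utf16` and its `default` parameter are hypothetical, chosen only for illustration. It assumes the input is a complete byte string rather than a stream.

```python
import codecs

def decode_utf16(data: bytes, default: str = "big") -> str:
    """Decode UTF-16 bytes, honoring a leading BOM if one is present.

    `default` ("big" or "little") is a hypothetical stand-in for the
    environment's default ordering, used only when no BOM is found.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return data[2:].decode("utf-16-be")    # BOM is not part of the content
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return data[2:].decode("utf-16-le")    # BOM is not part of the content
    # No BOM: fall back to the supplied default ordering.
    codec = "utf-16-be" if default == "big" else "utf-16-le"
    return data.decode(codec)

# A little-endian document with a BOM, as Windows typically writes it.
le_with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
```

Decoding `le_with_bom` with `decode_utf16` yields the two-character string "hi", whereas decoding the same bytes with the fixed-order "utf-16-le" codec keeps the initial U+FEFF as content, which is the distinction the comment draws between UTF-16 and UTF-16LE.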
Depending on the predominant language of the document, UTF-16 encoding may be more or less compact than UTF-8 encoding. Failing to provide a UTF-16 codec will make a substantial range of XML documents difficult to process.

I propose that a procedure named "utf-16-codec" be added to section 15.3.3 (p. 86). I further propose that the codecs for the rarely used UTF-{16,32}{BE,LE} encodings be removed. No form of UTF-32 encoding is in common use in I/O, though UTF-32 format is sometimes convenient for internal use.

-- 
John Cowan    http://ccil.org/~cowan    cowan_at_ccil.org
Monday we watch-a Firefly's house, but he no come out. He wasn't home. Tuesday we go to the ball game, but he fool us. He no show up. Wednesday he go to the ball game, and we fool him. We no show up. Thursday was a double-header. Nobody show up. Friday it rained all day. There was no ball game, so we stayed home and we listened to it on-a the radio. --Chicolini

Received on Tue Oct 31 2006 - 19:43:57 UTC