[r6rs-discuss] [Formal] Non-ASCII characters should not be treated all alike

From: Aubrey Jaffer <agj>
Date: Fri Dec 1 14:36:23 2006

 | Date: Tue, 28 Nov 2006 00:05:05 -0500
 | From: John Cowan <cowan_at_ccil.org>
 |
 | Submitter: John Cowan
 | Email address: cowan_at_ccil.org
 | Issue type: Defect
 | Priority: Minor
 | Component: Lexical
 | Report version: 5.91
 | Summary: Non-ASCII characters should not be treated all alike
 |
 | The lexical syntax should not allow Nd, Mc, or Me characters to
 | be initial in identifiers. Allowing a sequence of Nd characters
 | to be identifiers means that digit-strings in non-ASCII digits
 | are identifiers. I don't insist that all digit-strings be
 | numerals, but they certainly should not be identifiers.
 |
 | Likewise, Unicode semantics attaches a Mc or Me character to
 | its predecessor, which would not be part of the identifier.
 | That's undesirable.
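
To make the report's concern concrete, here is a minimal Python sketch that looks up the Unicode general category of a candidate initial character. The names `BAD_INITIAL`, `bad_initial`, and `digits_only` are hypothetical helpers invented for illustration. They are not part of any Scheme report or implementation, and the sample characters were chosen only as examples of each category.

```python
import unicodedata

# Categories the report argues should not begin an identifier
# (hypothetical helper set, for illustration only).
BAD_INITIAL = {"Nd", "Mc", "Me"}


def bad_initial(ch):
    """Return True if ch is a decimal digit (Nd) or a spacing or
    enclosing combining mark (Mc, Me)."""
    return unicodedata.category(ch) in BAD_INITIAL


def digits_only(s):
    """Return True if s is non-empty and consists solely of Nd characters."""
    return len(s) > 0 and all(unicodedata.category(c) == "Nd" for c in s)


samples = [
    "\u0663",  # ARABIC-INDIC DIGIT THREE (Nd)
    "\u093E",  # DEVANAGARI VOWEL SIGN AA (Mc)
    "\u20DD",  # COMBINING ENCLOSING CIRCLE (Me)
    "\u03BB",  # GREEK SMALL LETTER LAMDA (Ll), an ordinary letter
]
for ch in samples:
    print("U+%04X" % ord(ch), unicodedata.category(ch), bad_initial(ch))
```

Under a rule that treats every non-ASCII character alike, a string such as `"\u0663\u0664"` (two Arabic-Indic digits) would lex as an identifier; `digits_only` flags exactly that case.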

Also, should all Zs (space separator) characters delimit identifiers?
I particularly wonder about the NO-BREAK spaces, U+00A0 and U+202F:

  U+0020 SPACE
  U+00A0 NO-BREAK SPACE
  U+1680 OGHAM SPACE MARK
  U+180E MONGOLIAN VOWEL SEPARATOR
  U+2000 EN QUAD
  U+2001 EM QUAD
  U+2002 EN SPACE
  U+2003 EM SPACE
  U+2004 THREE-PER-EM SPACE
  U+2005 FOUR-PER-EM SPACE
  U+2006 SIX-PER-EM SPACE
  U+2007 FIGURE SPACE
  U+2008 PUNCTUATION SPACE
  U+2009 THIN SPACE
  U+200A HAIR SPACE
  U+202F NARROW NO-BREAK SPACE
  U+205F MEDIUM MATHEMATICAL SPACE
  U+3000 IDEOGRAPHIC SPACE
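
As a rough illustration of what is at stake, the following Python sketch splits text on the draft's whitespace rule (tab, linefeed, line tabulation, form feed, carriage return, or any character in category Zs, Zl, or Zp). The function names are hypothetical, and the other delimiters (parentheses, brackets, double quote, semicolon) are deliberately ignored. Category lookups depend on the Unicode version bundled with the Python interpreter, so a character such as U+180E may be classified differently there than in the list above.

```python
import unicodedata

# Control characters named explicitly by the draft's whitespace rule.
CONTROLS = {"\t", "\n", "\x0b", "\x0c", "\r"}


def is_draft_whitespace(ch):
    """Assumed reading of the draft rule: a listed control character,
    or any character whose general category is Zs, Zl, or Zp."""
    return ch in CONTROLS or unicodedata.category(ch) in {"Zs", "Zl", "Zp"}


def split_tokens(text):
    """Split text into maximal runs of non-whitespace characters."""
    tokens, current = [], []
    for ch in text:
        if is_draft_whitespace(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(split_tokens("foo\u00a0bar"))  # NO-BREAK SPACE between foo and bar
print(split_tokens("foo\u202fbar"))  # NARROW NO-BREAK SPACE
```

Both calls yield two tokens, `foo` and `bar`: under this rule a no-break space separates identifiers just as U+0020 does, which is the behavior being questioned.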

For reference, the draft's lexical syntax:

<delimiter> -> <whitespace> | ( | ) | [ | ] | " | ;
<whitespace> -> <character tabulation> | <linefeed>
    | <line tabulation> | <form feed> | <carriage return>
    | <any character whose category is Zs, Zl, or Zp>

Received on Fri Dec 01 2006 - 14:33:48 UTC
