[r6rs-discuss] [Formal] Non-ASCII characters should not be treated all alike
| Date: Tue, 28 Nov 2006 00:05:05 -0500
| From: John Cowan <cowan_at_ccil.org>
|
| Submitter: John Cowan
| Email address: cowan_at_ccil.org
| Issue type: Defect
| Priority: Minor
| Component: Lexical
| Report version: 5.91
| Summary: Non-ASCII characters should not be treated all alike
|
| The lexical syntax should not allow Nd, Mc, or Me characters to
| be initial in identifiers. Allowing a sequence of Nd characters
| to be identifiers means that digit-strings in non-ASCII digits
| are identifiers. I don't insist that all digit-strings be
| numerals, but they certainly should not be identifiers.
|
| Likewise, Unicode semantics attaches a Mc or Me character to
| its predecessor, which would not be part of the identifier.
| That's undesirable.
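
To make the concern above concrete, here is a rough Python sketch that looks up the general category of a few sample characters and applies the exclusion the report asks for. The helper name `may_start_identifier` and the choice of samples are my own, for illustration only. It covers just the proposed Nd/Mc/Me restriction, not the full identifier grammar.

```python
import unicodedata

# Categories the report argues should not begin an identifier.
NON_INITIAL_CATEGORIES = {"Nd", "Mc", "Me"}

def may_start_identifier(ch):
    """Hypothetical check: reject Nd, Mc, and Me as initial characters."""
    return unicodedata.category(ch) not in NON_INITIAL_CATEGORIES

# One sample each for Nd, Mc, and Me, plus an ordinary letter for contrast.
for ch in ["\u0663", "\u0903", "\u20dd", "\u03bb"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category {unicodedata.category(ch)}, "
          f"initial ok? {may_start_identifier(ch)}")
```

Under this rule a string made up only of non-ASCII digits could not be read as an identifier, since its first character is already rejected.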
Also, should all Zs (whitespace) characters delimit identifiers?
I particularly wonder about NO-BREAK spaces U+00A0 and U+202F:
U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+1680 OGHAM SPACE MARK
U+180E MONGOLIAN VOWEL SEPARATOR
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
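
A list like the one above can be regenerated with a few lines of Python. This is only a sketch: the function name `zs_characters` is made up, and the result depends on which Unicode version your Python's `unicodedata` module ships, so it may not match the list above exactly.

```python
import sys
import unicodedata

def zs_characters():
    """Return every code point whose general category is Zs."""
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Zs"]

# Print each Zs character in the same "U+XXXX NAME" form used above.
for cp in zs_characters():
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```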
For reference, the draft's lexical syntax currently reads:
\meta{delimiter} \: \meta{whitespace} \| ( \| ) \| \openbracket{} \| \closedbracket{} \| " \| ;
\meta{whitespace} \: \meta{character tabulation} \| \meta{linefeed}
\> \| \meta{line tabulation} \| \meta{form feed} \| \meta{carriage return}
\> \| \meta{any character whose category is Zs, Zl, or Zp}
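
Read literally, these productions make both NO-BREAK spaces delimiters. The Python sketch below shows that under some assumptions: carriage return is taken to be its own alternative in the whitespace production, the function names are invented, and the splitter is deliberately naive (it ignores strings and comments, so it is not a real lexer).

```python
import unicodedata

# The five control characters named explicitly by the whitespace production:
# character tabulation, linefeed, line tabulation, form feed, carriage return.
NAMED_WHITESPACE = {"\t", "\n", "\x0b", "\x0c", "\r"}
# The non-whitespace delimiters: ( ) [ ] " ;
OTHER_DELIMITERS = set('()[]";')

def is_whitespace(ch):
    return (ch in NAMED_WHITESPACE
            or unicodedata.category(ch) in {"Zs", "Zl", "Zp"})

def is_delimiter(ch):
    return is_whitespace(ch) or ch in OTHER_DELIMITERS

def split_on_delimiters(text):
    """Naive splitter: break text at delimiters and drop the delimiters."""
    tokens, current = [], []
    for ch in text:
        if is_delimiter(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

print(split_on_delimiters("foo\u00a0bar"))  # NO-BREAK SPACE between foo and bar
print(split_on_delimiters("foo\u202fbar"))  # NARROW NO-BREAK SPACE
```

In both cases the text splits into `foo` and `bar`, which is the behaviour I am asking about.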
Received on Fri Dec 01 2006 - 14:33:48 UTC