[r6rs-discuss] [Formal] Non-ASCII characters should not be treated all alike from Aubrey Jaffer on 2006-12-01 (r6rs-discuss.mbox)

From: Aubrey Jaffer <agj>
Date: Fri Dec 1 14:36:23 2006

| Date: Tue, 28 Nov 2006 00:05:05 -0500
| From: John Cowan <cowan_at_ccil.org>
|
| Submitter: John Cowan
| Email address: cowan_at_ccil.org
| Issue type: Defect
| Priority: Minor
| Component: Lexical
| Report version: 5.91
| Summary: Non-ASCII characters should not be treated all alike
|
| The lexical syntax should not allow Nd, Mc, or Me characters to
| be initial in identifiers. Allowing a sequence of Nd characters
| to be identifiers means that digit-strings in non-ASCII digits
| are identifiers. I don't insist that all digit-strings be
| numerals, but they certainly should not be identifiers.
|
| Likewise, Unicode semantics attaches a Mc or Me character to
| its predecessor, which would not be part of the identifier.
| That's undesirable.
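
To make the quoted proposal concrete, here is a minimal sketch in Python. It is not part of the report. The helper name excluded_as_initial and the sample characters are illustrative assumptions, and the categories come from whatever Unicode database the running Python ships with.

```python
import unicodedata

# Categories the quoted comment proposes to bar from identifier-initial position.
EXCLUDED_INITIAL = {"Nd", "Mc", "Me"}

def excluded_as_initial(ch: str) -> bool:
    """True if ch is in a category the proposal would not allow to start an identifier."""
    return unicodedata.category(ch) in EXCLUDED_INITIAL

# Sample characters (chosen for illustration): a non-ASCII digit, a spacing
# combining mark, an enclosing mark, and an ordinary letter.
for ch in ("\u0661", "\u0903", "\u20dd", "\u03bb"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{unicodedata.category(ch)}, excluded={excluded_as_initial(ch)}")
```

This only checks the first character's category. It says nothing about whether a string of Nd characters should then be read as a numeral, which the comment leaves open.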

Also, should all Zs (space separator) characters delimit identifiers?
I particularly wonder about the no-break spaces U+00A0 and U+202F:

  U+0020 SPACE
  U+00A0 NO-BREAK SPACE
  U+1680 OGHAM SPACE MARK
  U+180E MONGOLIAN VOWEL SEPARATOR
  U+2000 EN QUAD
  U+2001 EM QUAD
  U+2002 EN SPACE
  U+2003 EM SPACE
  U+2004 THREE-PER-EM SPACE
  U+2005 FOUR-PER-EM SPACE
  U+2006 SIX-PER-EM SPACE
  U+2007 FIGURE SPACE
  U+2008 PUNCTUATION SPACE
  U+2009 THIN SPACE
  U+200A HAIR SPACE
  U+202F NARROW NO-BREAK SPACE
  U+205F MEDIUM MATHEMATICAL SPACE
  U+3000 IDEOGRAPHIC SPACE
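
A list like the one above can be regenerated with a short Python sketch. The function names are illustrative assumptions. The output depends on the Unicode version bundled with the interpreter, so it may differ from the list above. For example, later Unicode versions no longer classify U+180E as Zs.

```python
import sys
import unicodedata

def zs_characters():
    """Return every character whose general category is Zs in this Python's Unicode database."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Zs"]

def is_no_break(ch):
    """True if the character's compatibility decomposition is tagged <noBreak>."""
    return unicodedata.decomposition(ch).startswith("<noBreak>")

for ch in zs_characters():
    flag = "  (no-break)" if is_no_break(ch) else ""
    print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}{flag}")
```

Flagging the <noBreak> entries separates out the characters a lexer might reasonably treat differently from ordinary spaces.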

The draft's lexical syntax (LaTeX source) currently makes all of them
whitespace, and hence delimiters:

\meta{delimiter} \: \meta{whitespace} \| ( \| ) \| \openbracket{} \| \closedbracket{} \| " \| ;
\meta{whitespace} \: \meta{character tabulation} \| \meta{linefeed}
\> \| \meta{line tabulation} \| \meta{form feed} \| \meta{carriage return}
\> \| \meta{any character whose category is Zs, Zl, or Zp}
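
As a rough illustration of what these productions imply, the following Python sketch implements them as predicates. The names is_whitespace, is_delimiter and split_on_whitespace are assumptions for this example only. It also assumes form feed and carriage return are separate alternatives, and that the brackets are the ASCII square brackets.

```python
import unicodedata

# Control characters named explicitly in the <whitespace> production:
# character tabulation, linefeed, line tabulation, form feed, carriage return.
EXPLICIT_WHITESPACE = {"\t", "\n", "\v", "\f", "\r"}
# The non-whitespace alternatives of <delimiter>.
OTHER_DELIMITERS = set('()[]";')

def is_whitespace(ch):
    """<whitespace>: an explicit control character or any Zs, Zl, or Zp character."""
    return ch in EXPLICIT_WHITESPACE or unicodedata.category(ch) in {"Zs", "Zl", "Zp"}

def is_delimiter(ch):
    """<delimiter>: whitespace, a parenthesis or bracket, a double quote, or a semicolon."""
    return is_whitespace(ch) or ch in OTHER_DELIMITERS

def split_on_whitespace(text):
    """Split text into runs of non-whitespace characters (a toy tokenizer, not a full lexer)."""
    tokens, current = [], []
    for ch in text:
        if is_whitespace(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

# Under this grammar a NO-BREAK SPACE separates two identifiers.
print(split_on_whitespace("foo\u00a0bar"))
```

With these rules, foo and bar joined by U+00A0 come out as two tokens, which is the behaviour the question above is asking about.
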
Received on Fri Dec 01 2006 - 14:33:48 UTC
