[r6rs-discuss] Why lexers can be simpler when restricted to ASCII

From: Alan Watson <alan>
Date: Mon, 23 Apr 2007 12:06:05 -0500

In formal comment 231, I stated:

"Many current Schemes have lexers written for ASCII (or Latin-1)
character sets. Conversion of these lexers to the new standard would be
easier if the report allowed inline hex escapes to appear anywhere in
Scheme code."

The editors replied:

"It is unclear why converting the lexers would be significantly simpler
through this change"

Let me explain my original opinion. Many Schemes currently have lexers
written in C using "char". These need converting to a wider type such as
"long" to handle Unicode. Furthermore, table-driven approaches are
practical for ASCII (128 values), but not practical for Unicode
(1,114,112 code points, roughly 2^20 values).
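
To make the table-driven point concrete, here is a minimal sketch in C of
the kind of lookup an ASCII lexer can use. The names (char_class,
ascii_class, init_ascii_class, classify) are hypothetical and are not taken
from any particular Scheme implementation, and the sketch assumes an ASCII
execution character set.

```c
#include <stddef.h>

/* Hypothetical character classes for a Scheme-like lexer. */
enum char_class { CC_OTHER, CC_SPACE, CC_DIGIT, CC_LETTER, CC_PAREN };

/* One byte per ASCII value: the whole table is 128 bytes. */
static unsigned char ascii_class[128];

/* Fill the table once at startup. */
static void init_ascii_class(void)
{
    int c;
    for (c = 0; c < 128; c++)
        ascii_class[c] = CC_OTHER;
    ascii_class[' '] = ascii_class['\t'] = CC_SPACE;
    ascii_class['\n'] = ascii_class['\r'] = CC_SPACE;
    for (c = '0'; c <= '9'; c++)
        ascii_class[c] = CC_DIGIT;
    for (c = 'a'; c <= 'z'; c++)
        ascii_class[c] = CC_LETTER;
    for (c = 'A'; c <= 'Z'; c++)
        ascii_class[c] = CC_LETTER;
    ascii_class['('] = ascii_class[')'] = CC_PAREN;
}

/* Classify a code point held in a "long". ASCII is a single array
 * lookup. A flat table covering every code point up to 0x10FFFF would
 * need 0x110000 (1,114,112) entries, so everything else falls through
 * to a placeholder here. */
static enum char_class classify(long cp)
{
    if (cp >= 0 && cp < 128)
        return (enum char_class)ascii_class[cp];
    return CC_OTHER; /* placeholder: not a real Unicode classification */
}
```

The ASCII branch is what existing lexers already have. The placeholder
branch is where the real conversion work would go.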

In case that isn't clear enough: My Scheme uses flex for its lexer. I
cannot see how to simply convert it to accept Unicode. I think I will
have to dump flex and implement a new lexer by hand.
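
As a rough sketch of why inline hex escapes would help: if any non-ASCII
character could be written as \x<hex digits>; then the lexer's input could
stay pure ASCII, and the only new code would be a small decoder along these
lines. The function name parse_hex_escape is hypothetical, and the escape
form is assumed to be the draft report's inline hex escape syntax.

```c
#include <stddef.h>

/* Decode an inline hex escape such as "\x3bb;" at the start of s.
 * On success, return the scalar value and store the number of
 * characters consumed in *len. Return -1 on malformed input. */
static long parse_hex_escape(const char *s, size_t *len)
{
    long value = 0;
    size_t i = 2;

    if (s[0] != '\\' || s[1] != 'x')
        return -1;
    if (s[i] == ';')
        return -1; /* at least one hex digit is required */

    for (; s[i] != ';'; i++) {
        char c = s[i];
        int d;
        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (c >= 'a' && c <= 'f')
            d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            d = c - 'A' + 10;
        else
            return -1; /* not a hex digit (this also catches end of string) */
        value = value * 16 + d;
        if (value > 0x10FFFF)
            return -1; /* beyond the Unicode code space */
    }

    if (value >= 0xD800 && value <= 0xDFFF)
        return -1; /* surrogates are not scalar values */

    *len = i + 1; /* include the closing ';' */
    return value;
}
```

The decoded value is a "long", but the characters the lexer actually scans
are all in the ASCII range, so the existing tables would keep working.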

Regards,

Alan
Received on Mon Apr 23 2007 - 13:06:05 UTC