[R6RS] safe and unsafe; declarations

Mon Feb 27 19:21:55 EST 2006

Mike wrote:
> > <priority> --> 0 | 1 | 2 | 3
> 
> Is there any special reason why there are four?  I understand the need
> for at least three.  Is the intention that this set might be extended?

There are four because this has worked well in Common Lisp.
I don't think it is necessary to extend beyond four priorities
for portable code, but implementors will still want to provide
their own mechanism(s) for more detailed control.  Later in
this message, I'll use Larceny as an example to show how the
four priorities might map onto Larceny's existing mechanisms.

IMO, there is more reason to extend the declaration syntax
for other qualities than to extend the set of priorities.

> > <typespec> --> #t
> >              | number? | complex? | real? | rational? | integer?
> >              | exact? | inexact? | fixnum? | flonum?
> >              | (< _ <bound>) | (< <bound> _) | (< <bound> _ <bound>)
> >              | (<= _ <bound>) | (<= <bound> _) | (<= <bound> _ <bound>)
> >              | boolean? | symbol? | null? | pair? | list?
> >              | char? | string? | vector?
> >              | input-port? | output-port?
> 
> This worries me for various reasons---most of all because the set of
> type specs doesn't seem to be extensible.  The rationale would be
> clearer if only "primitive types" (i.e. the types without which there
> is no semantic core) were included, but ports seem a bit misplaced
> here.

I share your concern about where the line is drawn.  The
principle I followed was to include most of the unary
predicates in R6RS.  (I left out EOF-OBJECT? and maybe a
few others.)

I can see the value of letting a <typespec> be any unary
predicate.  The downside is that arbitrary predicates may
have side effects and may not terminate, so a program's
behavior would depend upon which type declarations happen
to be ignored by an implementation.  That's a lot worse
than having type exceptions depend on which declarations
are ignored, because a type exception at least explains
why it was raised.
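
To make that concern concrete, here is a small sketch in
Python (not Scheme, and not part of the proposal).  The names
noisy_vector_p, declare, and run are hypothetical; declare
models a type declaration that an implementation is free to
enforce or to ignore.

```python
# Hypothetical model: a <typespec> that is an arbitrary predicate
# with a side effect.
calls = []

def noisy_vector_p(x):
    """A 'predicate' that records every object it is asked about."""
    calls.append(x)              # side effect visible to the program
    return isinstance(x, list)   # a Python list stands in for a vector

def declare(pred, value, enforce):
    """Model a declaration that an implementation may enforce or ignore."""
    if enforce and not pred(value):
        raise TypeError("declaration violated: %r" % (value,))
    return value

def run(enforce):
    """Return how many times the predicate ran for one declaration."""
    calls.clear()
    declare(noisy_vector_p, [1, 2, 3], enforce)
    return len(calls)

print(run(enforce=True))   # the predicate was called
print(run(enforce=False))  # the predicate was never called
```

The two runs leave different state behind, and no exception
is raised in either case to explain the difference.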

Other downsides are that programmers might not have as
clear an idea concerning what kinds of type declarations
they should provide, and more people might begin to argue
that implementations should be required to enforce type
declarations.  If we're going to go down that road, then
we probably ought to design an assertion mechanism instead.

> I would appreciate if the speed-oriented implementors (you or Kent)
> could give us some idea of how important type declarations are for
> generating good code.  (I do understand that these declarations have
> other uses, but they seem to be designed primarily to achieve speed.)

On most machines, you can generate much better code for
arithmetic operations if you know that all of the operands
are flonums.  We're talking about a factor of 5 to 20 in
time, and a factor of 3 to 5 in code space.  Even in safe
mode, where the compiler may have to perform a few checks
at run time, the code can be much faster and smaller if it
can just raise an exception when the flonum declarations
turn out to be incorrect.  In unsafe mode, the compiler
can just trust the declarations, which presumably have
been tested in safe mode.
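
As a rough model of the difference between the two modes,
here is a sketch in Python.  The function hypot_squared and
its safe parameter are hypothetical; Python floats stand in
for flonums, and the speedups described above would come
from compiled code, not from this model.

```python
def hypot_squared(x, y, safe=True):
    """x and y are declared flonum? (modeled here as Python float)."""
    if safe:
        # Safe mode: check each declared variable once, and raise
        # an exception if a declaration turns out to be incorrect.
        for name, v in (("x", x), ("y", y)):
            if type(v) is not float:
                raise TypeError("%s was declared flonum?: %r" % (name, v))
    # Unsafe mode trusts the declarations and comes straight here.
    # In compiled code this would be raw floating-point arithmetic
    # with no per-operation type dispatch.
    return x * x + y * y

print(hypot_squared(3.0, 4.0))              # checked, then fast path
print(hypot_squared(3.0, 4.0, safe=False))  # declarations trusted
```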

For other kinds of type declarations the speedups are less
dramatic, but can be important in some cases.

In my opinion, however, the main advantage of type
declarations is not speed or space, but the way they
encourage programmers to state what is known about a
variable where the variable is bound, instead of requiring
readers (and the implementation) to discover those facts
anew every place the variable is used.

Sometimes a compiler can tell that every use of some variable
v assumes that v is a vector, but cannot test that common
assumption in any one place because the compiler can't prove
that any of the uses will be executed; some other exception
might be raised first.  So the test is repeated at each use,
and all those scattered tests take space as well as time.

So it's about code space as well as speed, and it's about
safety as well as readability: you want the assumption to
be checked earlier (at the point of declaration) rather
than later, where you would need more complete test
coverage to ensure that it will be checked at all.  With
type declarations, a lot of type errors can be caught at
compile time.  Even in Scheme!
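
The test-coverage point can be illustrated with a small
Python sketch.  Both function names are hypothetical, and a
Python list stands in for a vector.

```python
def first_or_zero_scattered(v, use_it):
    """No declaration: v is checked only where it is actually used."""
    if use_it:
        if not isinstance(v, list):      # the check lives at the use
            raise TypeError("v is not a vector")
        return v[0]
    return 0                             # a bad v slips through here

def first_or_zero_declared(v, use_it):
    """v is declared vector?: checked once, where it is bound."""
    if not isinstance(v, list):
        raise TypeError("v was declared vector?")
    return v[0] if use_it else 0

print(first_or_zero_scattered(42, False))  # the bad argument goes unnoticed
try:
    first_or_zero_declared(42, False)
except TypeError as e:
    print("caught:", e)                    # caught at the declaration
```

A test suite that never takes the use_it branch would never
see the error in the first version.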

> >              | <rtd>
> 
> Detail: This default escape to record types seems a bit premature, as
> record types may be an instance of some other primitive underlying
> type mechanism.  It also goes against the notational convention of
> using predicates.  How about (has-record-type? <rtd> _) or something
> like it?

Good point.  Maybe (record-predicate <rtd>) would be a better
syntax for that <typespec>.

                                * * *

To explain how a compiled implementation might interpret these
declarations, I will use the current version of Larceny (v0.90)
as a typical example.

Larceny's COMPILER-SWITCHES procedure defines five handy modes:
slow, standard, fast-safe, fast-unsafe, and default.  (Curiously,
the initial mode is fast-safe, not default.)  These modes control
a variety of switches that can also be set individually.

In terms of the proposed declaration syntax, let's say that
the default switch settings above correspond to priority 2 for
each of the four qualities (safety, fast, small, debug).  If
the programmer declares a different set of priorities, then
every switch's setting might be recomputed for the scope of
that declaration according to policies such as:

Debugging:
  + issue-warnings is on            iff debug + safety >= 3
  + include-procedure-names is on   iff debug >= 1
  + include-variable-names is on    iff debug >= 1
  + include-source-code is on       iff debug - small >= 0
  + single-stepping is on           iff debug - fast - small = 3

Safety:
  + avoid-space-leaks is on         iff small = 3
  + write-barrier is on             [irrelevant for this example]
  + runtime-safety-checking is on   iff safety >= 1
    + catch-undefined-globals is on iff debug * safety - small >= 1

Speed:
  + integrate-procedures is "larceny" [irrelevant for this example]
  + control-optimization is on      iff fast + small >= 1
  + parallel-assignment-optimization is on iff fast + small - debug >= 0
  + lambda-optimization is on       iff fast + small - debug >= 1
  + benchmark-mode is on            iff fast + small - debug - safety >= 1
  + benchmark-block-mode is on      [irrelevant for this example]
  + global-optimization is on       iff 2 * fast + small - debug >= 1
  [The next four switches are irrelevant unless global-optimization is on]
    + interprocedural-inlining is on             iff fast - small >= 0
    + interprocedural-constant-propagation is on iff fast >= 1
    + common-subexpression-elimination is on     iff fast >= 3
    + representation-inference is on             iff fast >= 3
  + local-optimization is on        iff fast + small >= 1
  + peephole-optimization is on     iff fast + small >= 1
  + inline-allocation is on         iff fast - small >= -1
  + fill-delay-slots is on          iff fast >= 1
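
The policies above are plain arithmetic on the four
priorities, so they can be transcribed directly.  The
following Python sketch does that; the function name
switch_settings is hypothetical, the switches marked
irrelevant are omitted, and the dependence of the four
nested switches on global-optimization is not modeled.

```python
def switch_settings(safety, fast, small, debug):
    """Apply the policies listed above; each priority is 0, 1, 2, or 3."""
    for p in (safety, fast, small, debug):
        if p not in (0, 1, 2, 3):
            raise ValueError("priority out of range: %r" % (p,))
    return {
        # Debugging
        "issue-warnings": debug + safety >= 3,
        "include-procedure-names": debug >= 1,
        "include-variable-names": debug >= 1,
        "include-source-code": debug - small >= 0,
        "single-stepping": debug - fast - small == 3,
        # Safety
        "avoid-space-leaks": small == 3,
        "runtime-safety-checking": safety >= 1,
        "catch-undefined-globals": debug * safety - small >= 1,
        # Speed
        "control-optimization": fast + small >= 1,
        "parallel-assignment-optimization": fast + small - debug >= 0,
        "lambda-optimization": fast + small - debug >= 1,
        "benchmark-mode": fast + small - debug - safety >= 1,
        "global-optimization": 2 * fast + small - debug >= 1,
        "interprocedural-inlining": fast - small >= 0,
        "interprocedural-constant-propagation": fast >= 1,
        "common-subexpression-elimination": fast >= 3,
        "representation-inference": fast >= 3,
        "local-optimization": fast + small >= 1,
        "peephole-optimization": fast + small >= 1,
        "inline-allocation": fast - small >= -1,
        "fill-delay-slots": fast >= 1,
    }

# Priority 2 for every quality, as in the example above.
for name, on in switch_settings(2, 2, 2, 2).items():
    print(name, "on" if on else "off")
```

Single-stepping, for example, comes on only when debug is 3
and both fast and small are 0.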

If safety + debug - fast - small >= 5, then Larceny might
enforce all type declarations, although it would enforce
the extended procedure? type declarations only at call sites
(to avoid breaking object identity on procedures; IMO that
identity is a really bad idea, but we are stuck with it).
At some lower threshold, Larceny might enforce all of the
type declarations that can be enforced in constant time.

If fast + small - debug - 100 * safety >= 4, Larceny might
even trust some type declarations.
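
The two thresholds for type declarations can be sketched
the same way.  The function name and the three result
strings below are hypothetical.  The intermediate threshold
for constant-time enforcement is not given a value above,
so it is not modeled.

```python
def type_declaration_policy(safety, fast, small, debug):
    """Classify how type declarations might be treated (priorities 0..3)."""
    if safety + debug - fast - small >= 5:
        return "enforce-all"
    # The factor of 100 means this can hold only when safety is 0.
    if fast + small - debug - 100 * safety >= 4:
        return "trust-some"
    return "unspecified"

print(type_declaration_policy(3, 0, 0, 3))
print(type_declaration_policy(0, 3, 3, 2))
print(type_declaration_policy(2, 2, 2, 2))
```

With priorities limited to 0 through 3, the two conditions
can never hold at once: enforcing everything needs safety
of at least 2, and trusting declarations needs safety of 0.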

Will