[R6RS] Separating linking from environment manipulation

Mon May 23 12:16:35 EDT 2005

This is an attempt to separate a particular set of issues with modules
from the others, namely:

1. linking, i.e. describing the top-level structure of a program and
   how the different parts are assembled

2. namespace management in the actual code

I only do #1 in this email; different kinds of namespace management
systems can (hopefully) be plugged in.  Thus, the proposal could be
set on top of either separate Scheme-48-style namespace management or
Chez-style "modules" that allow environment manipulation in the core
language.

So, in a way, this is an alternative instance for Matthew's "library"
framework (posted privately in September 2004), but it specifies
slightly less in that it doesn't get involved in namespace
management.  This message is long.  I plan on explaining the proposal
at the workshop.  You may want to skip to the examples section.

Overview & Rationale
====================

- The proposal is loosely based on Richard's "Missing Link" paper at

  http://mumble.net/~kelsey/papers/missing-link.ps.gz

  Reading it will probably help in understanding this proposal, even
  though I've tried to make it self-contained.

- This proposal takes the Pebble view of modules, i.e. that a module
  is a part of a program that may be combined with others in various
  ways.  For example, I can write a program and link it either to the
  Xaw or the Xaw3D GUI, where both the GUI library and the program
  itself are sets of modules.  This, in my view, is essential for
  anything that goes by the name "module," and I frequently need this
  kind of facility in practice.  (In a way, it's a more powerful
  version of Bigloo's "meta-configuration" files, which allow
  associating module implementations with tags.)

  Note that MzScheme's MODULE, Chez's MODULE, and the Scheme 48 module
  system don't handle this situation by themselves.  Chez and Scheme 48
  handle it to some extent, but outside the purview of the module
  system itself.  Bigloo's module system allows it to some extent.

- The proposal is meant to make the common case for linking very easy,
  but allow more complex scenarios.

- The proposal has interfaces as dependency tokens, whose main job is
  to say whether an export and an import agree.  They also carry
  information about the exported names.

- Like namespace management, phasing is not addressed by this
  proposal.  It could (and should) be specified separately, possibly
  along with the namespace management.

- The coverage of the configuration language is smaller than that of
  my previous proposals, as it doesn't address namespace management.
  I've tried to make it minimal to allow (if we decide on it) putting
  more of the "module stuff" in the extensible part of the language.
  Nevertheless, it's not integrated with the core language for the
  reasons outlined previously.  Specifically, I think it's good to:

  - Have the bird's-eye view of the top-level modular structure of the
    program directly readable by humans and programs, visualizable by
    module browsers etc.
  - Allow packaging collections of modules without fearing that the
    top-level structure is, in some way, environment-dependent or
    platform-dependent.
  - Allow finding the meanings of top-level identifiers with
    reasonable effort.
  - Allow restricting the initial namespace of a piece of code to
    something less than the core language.
  - Potentially allow integrating separate readers for different
    syntaxes.
  - Enable the linking infrastructure described here, and keep the
    stuff in the core language simple and regular.

  Again, please note that the proposal allows, for example, putting
  a full Chez-style system below it with little (albeit some)
  redundancy between the two.  (This statement doesn't mean I
  unconditionally endorse a full Chez-style system.)

Examples
========

A simple example:

(define-interface foo*
  (export a b))

(define-interface bar*
  (export c d))

(define-module foo foo*
  (import (scheme scheme*))
  (implementation-environment scheme)
  (begin
    (define a 1)
    (define b 2)))

The meaning of interfaces FOO* and BAR* should be clear.  (I'm
choosing simple names for interfaces and modules, but in the real
world, we'll probably want a hierarchical namespace for them.  They
certainly need to be unique across a system.)

The definition of the FOO module says that its implementation
satisfies the FOO* interface.  It imports a module with the SCHEME*
interface (presumably, the Scheme core language) and locally calls it
SCHEME.  It also says that its body is written in the language defined
by the SCHEME module.  (This is essentially the same as in the
MzScheme MODULE construct.)

Here's a module with an additional import:

(define-module bar bar*
  (import (scheme scheme*)
          (foo foo*))

  (implementation-environment scheme)
  <IMPORTS E>
  (begin
    <IMPORTS I>
    (define c (+ a b))
    (define d 5)))

The <IMPORTS E> and the <IMPORTS I> are placeholders for the forms
that actually make the exports of FOO available to the body; they are
not specified in this proposal.  In a Bigloo/Scheme-48-style system,
<IMPORTS E> would look something like (open foo); in a Chez-style
system, the import would be made available via (import foo).  (A
Bigloo/Scheme-48-style system would make the
IMPLEMENTATION-ENVIRONMENT form unnecessary; we could just do (open
scheme).)

The two could be linked together to form a *library*, i.e. a
collection of modules with partial link information, like so:

(define-library main (bar)
  (modules foo bar)
  (libraries r6rs))

The MAIN library form "exports" module BAR, which may be imported in
other libraries linked against it.  The MODULES form specifies the
modules in the library.  Since there's only one implementation of FOO*
in the library, the FOO* import of BAR can be linked to FOO without
any further work.  The same holds for R6RS, which presumably is a
library containing an implementation of SCHEME*.  Libraries can be
combined hierarchically via the LIBRARIES clause in the DEFINE-LIBRARY
form.
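
As a sketch of such a hierarchical combination, a further library
could be layered on top of the MAIN library just defined.  The names
QUUX*, QUUX, and APP are made up for illustration and are not part of
the proposal:

(define-interface quux*
  (export e))

;; QUUX is a hypothetical client of BAR.
(define-module quux quux*
  (import (scheme scheme*)
          (bar bar*))
  (implementation-environment scheme)
  <IMPORTS E>
  (begin
    <IMPORTS I>
    (define e (* c d))))

;; APP is a hypothetical library built on top of MAIN.
(define-library app (quux)
  (modules quux)
  (libraries main r6rs))

Assuming the rules above, the BAR* import of QUUX would presumably be
linked to the BAR exported by MAIN without a LINK clause, since that
is the only implementation of BAR* in sight.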

Imagine a second implementation of FOO*:

(define-module foo2 foo*
  (import (scheme scheme*))
  (implementation-environment scheme)
  (begin
    (define a 3)
    (define b 4)))

... and a module importing two implementations of FOO*:

(define-module baz bar*
  (import (scheme scheme*)
          (foo-a foo*)
          (foo-b foo*))

  (implementation-environment scheme)
  <IMPORTS E>
  (begin
    <IMPORTS I>
    (define c (+ foo-a:a foo-b:b))
    (define d 5)))

Here, <IMPORTS E> or <IMPORTS I> must add a prefix "foo-a:" to the
imports from FOO-A and "foo-b:" to the imports from FOO-B.

Linking then goes like this:

(define-library main (baz)
  (modules foo foo2 baz)
  (libraries r6rs)
  (link (baz foo-a foo)
        (baz foo-b foo2)))
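
The same mechanism would presumably cover the GUI scenario from the
overview.  The following is only a sketch: the interfaces GUI* and
EDITOR*, the modules XAW, XAW3D, and EDITOR, the library EDITOR-APP,
the exported names, and the file names are all hypothetical, and the
string syntax for <filename> is an assumption, since it is
unspecified.

(define-interface gui*
  (export make-window show-window))

(define-interface editor*
  (export run-editor))

;; Two hypothetical implementations of the same GUI interface.
(define-module xaw gui*
  (import (scheme scheme*))
  (implementation-environment scheme)
  (file "xaw.scm"))

(define-module xaw3d gui*
  (import (scheme scheme*))
  (implementation-environment scheme)
  (file "xaw3d.scm"))

;; The program itself only names the interface it needs.
(define-module editor editor*
  (import (scheme scheme*)
          (gui gui*))
  (implementation-environment scheme)
  <IMPORTS E>
  (file "editor.scm"))

;; With two implementations of GUI* present, LINK picks one.
(define-library editor-app (editor)
  (modules xaw xaw3d editor)
  (libraries r6rs)
  (link (editor gui xaw3d)))

Switching the program to the other toolkit would then mean changing
the link spec to (editor gui xaw), with no change to EDITOR itself.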

Syntax
======

;; Configuration language

<config program> -> <config form>*

<config form> -> <interface definition>
               | <module definition>
               | <library definition>

;; Interfaces

<interface definition> -> (DEFINE-INTERFACE <interface identifier> <interface form>)

<interface form> -> <interface identifier>
                  | (EXPORT <export>*)
                  | (COMPOUND-INTERFACE <interface form>*)

;; Modules

<module definition> -> (DEFINE-MODULE <module identifier> <interface identifier>
                         <module clause>*)
                     | (DEFINE-MODULE <module identifier> <module form>)

<module clause> -> <import clause>
                 | <implementation-environment clause>
                 | <begin clause>
                 | <file clause>

<import clause> -> (IMPORT <import spec>*)

<import spec> -> (<identifier> <interface identifier>)

<implementation-environment clause>
                -> (IMPLEMENTATION-ENVIRONMENT <identifier>)

<begin clause> -> (BEGIN <program>)

<file clause> -> (FILE <filename>)

<module form> -> <module identifier>
               | (COPY-MODULE <module form>)

;; Libraries

<library definition> -> (DEFINE-LIBRARY <library identifier>
                                          (<module identifier> ...)
                          <library clause>*)

<library clause> -> (MODULES <module identifier>*)
                  | (LIBRARIES <library identifier>*)
                  | (LINK <link spec>*)

<link spec> -> (<module identifier> <identifier> <module identifier>)
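
A few forms from the grammar that the examples don't exercise,
sketched with made-up names.  FOO-BAR*, FOO-ALIAS*, FOO-FROM-FILE, and
the file name are illustrations only, and the string syntax for
<filename> is an assumption, since it is unspecified:

;; An interface built from two existing interfaces.
(define-interface foo-bar*
  (compound-interface foo* bar*))

;; An interface defined as another name for an existing one.
(define-interface foo-alias* foo*)

;; A module whose body lives in a file instead of a BEGIN clause.
(define-module foo-from-file foo*
  (import (scheme scheme*))
  (implementation-environment scheme)
  (file "foo.scm"))

Presumably FOO-BAR* would carry the exports of both FOO* and BAR*,
i.e. a, b, c, and d; the grammar itself doesn't spell this out.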

Remarks
=======

- Any module appearing in the body of a DEFINE-LIBRARY form that
  doesn't appear in its export list must be fully linked, i.e. have no
  open imports in the linking environment specified by the library.

- "Any module appearing in the body of a DEFINE-LIBRARY form" extends
  to modules in the libraries in the LIBRARIES form by transitivity.

- COPY-MODULE creates a new module with the same implementation but
  fresh state from an existing module.  It's necessary, for example,
  to link and then hide local "helper" modules from libraries.
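
A sketch of how COPY-MODULE might be used for such a helper module,
with the hypothetical names FOO-PRIVATE and BAR-WITH-PRIVATE-FOO (FOO
and BAR are the modules from the examples section):

;; A copy of FOO: same implementation, fresh state.
(define-module foo-private (copy-module foo))

;; Only BAR is exported; FOO-PRIVATE serves as its FOO* import.
(define-library bar-with-private-foo (bar)
  (modules foo-private bar)
  (libraries r6rs))

Here FOO-PRIVATE is the only implementation of FOO* in the library, so
BAR would presumably be linked to it without a LINK clause, and its
state would be separate from that of any FOO linked elsewhere.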

Open questions
==============

- The proposal doesn't say whether modules can be mutually recursive.
  If they can, we may need a clause specifying initialization order:

<library clause> -> (INITIAL-ORDER <module identifier>*)

- The lexical syntax of the <X identifier> nonterminals (<interface
  identifier>, <module identifier>, <library identifier>) is
  unspecified as of yet.

- The syntax of <export> is unspecified.  Presumably, it could carry
  information about "implicit" exports, value/macro distinctions, or
  types.

- The syntax of <filename> is unspecified as of yet.

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla