Recursive modules for programming

(1)

Recursive modules for programming

Keiko Nakata¹ Jacques Garrigue²

1Kyoto University Research Institute for Mathematical Sciences

2 Graduate School of Mathematics, Nagoya University

Abstract. The ML module system is useful for building large-scale programs. Programmers can factor their programs into nested and parame- terized modules, and can control abstraction with signatures. In spite of this ﬂexibility, however, recursion between modules is prohibited. As a result of this constraint, programmers may have to consolidate conceptually separate components into a single module, intruding on modular programming. Introducing recursive modules is a natural way out of this predicament, yet existing proposals vary in expressiveness and verbosity.

In this paper, we propose a type system for recursive modules, which can infer their signatures. Opaque signatures can also be given explicitly, to provide type abstraction either inside or outside of the recursion.

The type system is provably decidable, and is sound for a call-by-value semantics.

1 Introduction

The ML module system [16] offers useful mechanisms for building large-scale programs: 1) nested structures of modules allow hierarchical decomposition of programs; 2) functors can be used to express advanced forms of parameteriza- tion, which ease code reuse; 3) abstraction can be controlled by signatures with transparent, opaque, or translucent types [9]. In spite of this flexibility, however, mutual recursion between modules is prohibited as dependencies between modules must accord with the order of definitions. As a result of this constraint, programmers may have to consolidate conceptually separate components into a single module, intruding on modular programming.

Recently, much work has been devoted to investigating extensions with recursion of the ML module system. Two important issues involved in recursive modules are type checking and initialization. Crary, Harper and Puri [1], Russo [18], and Dreyer [5] have given type theoretic accounts for recursive modules, proposing type systems together. Hirschowitz and Leroy [8] and Dreyer [4] have investigated type systems which guarantee well-formedness of recursive modules, ensuring that they will not attempt to access not-yet-evaluated values during initialization.

It seems that we are almost ready for programming with recursive modules.

Indeed, some real languages support recursive modules [12, 17], in which we can use recursive modules for practical programs, or, at least, get a ﬂavor of them.

(2)

In this paper, we ﬁrst review two examples. In the ﬁrst one, two recursive modulesTreeandForestrespect each other’s privacy: we seal them with opaque signatures individually. Thus type abstraction is enforced inside the recursion.

In the second,Tree andForest are in closer relations: they know each other’s exact implementations, and we seal them with an opaque signature as a whole.

Thus type abstraction is enforced outside of the recursion.

Both privacy and intimacy will be important for recursive modules in a com- petitive environment. Existing proposals, however, vary in their way to handle them. We may be denied privacy. We may have to write signatures twice for the same module one of which is solely for assisting the type checker and does not aﬀect the resulting signature of the module.

This paper proposes a type system for recursive modules, in which modules can have privacy or keep close relations depending on the situation they are in.

Moreover, it does not require additional signature annotations from programmers. Thus programmers can either omit writing signatures or give signatures explicitly to control abstraction.

The goal of this paper is to design a practical system, where real programs can be written. In order to avoid too much verbosity, it should be able to reconstruct necessary type information during type checking, with minimal reliance on type annotations. The resulting type system is technically involved. Having both of modules with type components and recursion, the reconstruction could easily become undecidable. We paid extensive consideration to make the type system decidable, getting ideas from rewriting theory.

The rest of the paper is organized as follows. In the next section, we review two examples of recursive modules. Section 3 deﬁnes our calculus, with which we develop our formalization. Sections 4 to 6 explain our type system and present a soundness result. Section 7 examines related work. In Section 8, we will conclude and give a brief overview of ongoing and future work.

2 Example

In this section, we review two examples and informally present our calculus¹. The ﬁrst example appears in Fig. 1. The top-level moduleTreeForestcon- tains two modules Treeand Forest:Tree represents a module for trees whose leaves and nodes are labeled with integers;Forestrepresents a module for sets of those integer trees.

Tree and Forest refer to each other in a mutually recursive way. Their type components Tree.t and Forest.t refer to each other, as do their value componentsTree.maxandForest.max. These functions calculate the maximum integers a tree and a forest contain, respectively.

1 In examples, we shall allow ourselves to use some usual core language constructions, such as let and if expressions and list type constructors even though they are not parts of the formal development given in Section 3.

(3)

module TreeForest = struct (TF) module Tree = (struct

datatype t = Leaf of int | Node of int * TF.Forest.t val max = λx.case x of Leaf i ⇒ i

| Node (i, f) ⇒ let j = TF.Forest.max f in if i > j then i else j end : sig type t val max : t → int end)

module Forest = (struct type t = TF.Tree.t list val max = λx.case x of [] ⇒ 0

| hd :: tl ⇒ let i = TF.Tree.max hd in let j = max tl in if i > j then i else j

end : sig type t val max : t → int end) end

Fig. 1.Modules for trees and forest

To enable forward references, we enrich structures and signatures withself variables. The self variable of a structure (resp. signature) is bound to the struc- ture (resp. signature) itself which is going to be deﬁned. Components of the structure (resp. signature) can refer to each other recursively using the self variable. For instanceTreeForesthas a self variable namedTF, which is used inside TreeandForestto refer to each other recursively. We keep the usual ML scop- ing rules for backward references. Thus Tree.max can refer to the Leaf and Node constructors without going through a self variable. Tree might also be used without preﬁx insideForest, but the explicit notation seems clearer.

This ﬁrst example illustrates a possible use of recursive modules where they respect each other’s privacy, that is, they are sealed with opaque signatures individually, enforcing type abstraction between them.

The second example appears in Fig. 2. NowTreeForestis a functor, param- eterized by the type of labels of trees. We assume that an applicative functor MakeSetis given in a library for making sets of comparable elements.

TreeandForestdefine the same recursive types as the first example, except that the argument types of the value constructors Leaf and Node are param- eterized. The module alias module F = TF.Forest in Tree enables us to use an abbreviation F for TF.Forestinside Tree. Similarly, the type s in Tree is an abbreviation which expands intoTF.Forest.t. While we can dispense with abbreviations by replacing them with their definitions altogether, they are useful in practice [15].

In this second example, Tree and Forest have closer relations with each other: the functions split in Tree and sweep in Forest need to know the underlying implementations of the types Forest.tand Tree.tof the other to construct and deconstruct values of those types. Given a tree,splitcuts oﬀ the root node of the tree and returns the resulting forest.sweep gathers the leaves from a given forest.

Since the two modules depend on each other closely, we do not seal Tree andForestindividually here. Instead, we seal them as a whole with an opaque signature. The signature only exposes functionssplit,sweep, andincr, which

(4)

module TreeForest = functor (X: sig type t val compare : t → t → int end) → (struct (TF)

module S = MakeSet(X) module Tree = struct

module F = TF.Forest type s = F.t

datatype t = Leaf of X.t | Node of X.t * s val split = λx.case x of Leaf i ⇒ [Leaf i]

| Node (i, f) ⇒ (Leaf i) :: f

val labels = λx.case x of Leaf i ⇒ TF.S.singlton i

| Node (i, f) ⇒ TF.S.add i (F.labels f) end

module Forest = struct module T = TF.Tree type t = T.t list

val sweep = λx.case x of [] ⇒ []

| (T.Leaf y) :: tl ⇒ sweep tl

| (T.Node y) :: tl ⇒(T.Node y) :: (sweep tl) val labels = λx.case x of [] ⇒ TF.S.empty

| hd :: tl ⇒ TF.S.union (T.labels hd) (labels tl) val incr = λt. λf. let l1 = T.labels t and l2 = labels f in

if TF.S.subset l2 l1 then (t :: f) else f end

end:sig (Z)

module Tree : sig type t val split : t → Z.Forest.t end

module Forest : sig type t val sweep : t → t val incr : Z.Tree.t → t → t end end

end)

Fig. 2.Modules for trees and forests

augments a given forest only if the set of labels in a given tree subsumes the set of labels in the forest, but hidesTree.labelsandForest.labels, which are utility functions forincr. The signature also enforces type abstraction, protecting their privacy from the outside.

The two examples we have seen so far illustrate two possible uses of recursive modules. They may have privacy, enforcing type abstraction inside the recursion.

They may have intimacy, enforcing type abstraction outside the recursion. We think both uses are useful and would become common in practice.

Comparison with existing type systems The two examples presented are type checked in our type system without requiring additional annotations. Below, we will examine how existing type systems type check these examples.

In Russo’s system [18] there is no obvious way to type check the ﬁrst example, keeping type abstraction between Tree andForest. A suggested solution, which is found in his paper, is to annotate the self variable TFof TreeForest

(5)

with arecursive signature^{2 3} [18]:

sig (Z : sig module Tree : sig type t end

module Forest : sig type t = Tree.t list end end)

module Tree : sig datatype t = Leaf of int | Node of int * Z.Forest.t end module Forest : sig type t = Tree.t list val max : t → int end

end

The above annotation for TF, however, would break type abstraction between Tree and Forest, exposing underlying implementations of types Tree.t and Forest.tto each other.

In Dreyer’s system [5], both of the sealing signatures for Tree and Forest must be given in advance, that is, one has to write both signatures before deﬁning either of the two modules, as opposed to Fig. 1 where the signatures are written in a module-wise way.

Fig. 1 can be type checked in Objective Caml [12], without modiﬁcations.

Next, we will examine the second example. As we claimed, it is typable in our type system, and no additional annotations are required. In Russo’s system, however, one must annotateTFwith a recursive signature⁴:

sig (Z : sig module Tree : sig type t end

module Forest : sig type t = Tree.t list end end)

module Tree : sig datatype t = Leaf of X.t | Node of X.t * Z.Forest.t end module Forest : sig type t = Tree.t list val labels : t → MakeSet(X).t end end

Note that this annotation is solely for assisting the type checker. We have already given in Fig. 2 the eventual signaturesTree andForestshould have, which do not expose the underlying implementations of typesTree.t andForest.tnor thelabelsfunctions.

To type check Fig. 2 in Dreyer’s system and Objective Caml, one must write fully transparent signatures for Tree and Forest, in addition to the opaque signatures we have given in Fig. 2. Then, the type checker ﬁrst type checks the two modules assisted by the transparent signatures, which expose all components of the two modules including the underlying implementations of Tree.t and Forest.t. Once this succeeds, type abstraction can be enforced using the opaque signatures. Thus one has to write annotations yet more verbose than in Russo’s system.

We believe that both privacy and intimacy are important when programming with recursive modules. Existing type systems, however, do not handle them equally. These type systems require additional annotations from programmers, which are used only for helping the type checker, but do not aﬀect resulting signatures of modules. In practice, modules could contain large and complex type deﬁnitions. Then, additional annotations would quickly become a burden

2 This recursive signature does not exactly follow his syntax, e.g. we have to use structureinstead ofmodulein his system.

3 We note that by permuting the deﬁnition order ofTreeandForestone can reduce annotations to some extent in this case. However permutation does not always work.

4 To be precise, we have to remove the module alias F = TF.Forest from Tree to make it typable in Russo’s system.

(6)

Module expr. E ::=struct(Z)D1. . . Dnend structure

| functor(X :S)→E functor

| (E:S) sealing

| p module path

Deﬁnitions D ::=moduleM =E module deﬁnition

| vall=e value deﬁnition

| datatypet=T datatype deﬁnition

| typet=τ type abbreviation

Signature expr. S ::=sig(Z)B1. . . Bn end signature type

| functor(X:S)→S functor type Speciﬁcations B ::=moduleM:S module speciﬁcation

| typet=τ manifest type speciﬁcation

| typet abstract type speciﬁcation

| vall:τ value speciﬁcation

Recursive ident. rid ::=Z|rid.M

Module ident. mid ::=rid|mid(mid)|mid(X) Module path p, q, r::=mid|X

Module descript.K ::=E|S

Program P ::=struct(Z)D1. . . Dnend

Fig. 3.Syntax for the module language

for programmers to write, and would make programs overly verbose. Even if we assume that these annotations provide some useful information, our experience with type inference in ML is that one often writes a module without its signature, and then eventually writes a signature by editing the result of type inference.

This technique would not be available with recursive modules.

3 Syntax

Figure 3 gives the syntax for our module language, which is based on Leroy’s module calculus with manifest types [9]. We useM as a metavariable for module names,X for module variables, and Z for self variables. For simplicity, we dis- tinguish them syntactically, however the context could tell them apart without this distinction.

As explained in the previous section, we enrich both of structures and signatures with self variables to support recursive references between modules and signatures. In the constructsstruct(Z)D1. . . Dn endandsig(Z)B1. . . Bnend, the self variableZ is bound inD1. . . Dn, andB1. . . Bn, respectively.

The construct which enables recursive references is recursive identifiers. A recursive identifier is constructed from a self variable and the dot notation “.M”, which represents access to the module componentM of a structure. A recursive identifier can start from any bound self variable, thus can locate any module at any levels of nested structures regardless of the order of definitions. For instance, through the self variable of the top-level structure, one can refer to any modules in the structure except for ones reside in sealed modules. It is important that

(7)

τ ::=1 |τ1→τ2 |τ1∗τ2|ext rid.t|X.t T ::=cofτ

e ::=x|()|(λx.e:τ)|(e1, e2)|πi(e)|e1(e2)|rid.c e|caseeofms|rid.l|X.l ms ::=rid.c x⇒ e

ext rid::=Z|ext rid.M |ext rid(ext rid)|ext rid(X) Fig. 4.Syntax for the core language

recursive identiﬁers can only use bound self variables, and self variables of sealed modules are unbound outside of them. Otherwise type abstraction could be broken.

For the sake of simplicity, we assume that functor applications only contain module paths. In a practical system, however, we can make weaken this restriction following Leroy’s proposal [11].

To obtain a decidable type system, we impose afirst-order structure restric- tionthat requires functors 1) not to have functors as arguments, 2) nor to access inner modules of arguments. The first condition means that our functors are first-order, and the second implies that we have to pass inner modules as inde- pendent parameters for functors instead of passing a module which contains all of them. One might have noticed that the syntax of module paths excludes those of the formsX.M andX(p). This is due to this restriction.

Figure 4 gives the syntax for our core language. We usexas a metavariable for program variables (variables, for short), andcfor value constructor names.

The core language describes a simple functional language extended withvalue pathsX.l andrid.landtype pathsX.tandext rid.t.

Value paths X.l and rid.l refer to the value components l in the modules referred to by X andrid, respectively.

For type paths, we slightly extend the class of paths so as to liberally include functor applications, motivated by Leroy’s proposal [10]. This gives us more ﬂexibility in type sharing between modules.

From now on, we will say “paths” to mean module, value and type paths as a whole.

In our formalization, 1) function deﬁnitions are explicitly type annotated;

2) every structures and signature types have self variables; 3) paths are always preﬁxed by some self variable. Yet, our running examples do not follow these exactly. We assume that there is a prior phase to the type checking which adds type annotations for functions by running a type inference algorithm of the core language. The original program may still require some type annotations, to avoid running into the polymorphic recursion problem. In Section 8, we will discuss how to deﬁne this inference algorithm. The prior phase also adds omitted self variables, and use them for backward references.

We assume that the following three conditions hold: 1) module and self variables in a program diﬀer from each other; 2) a program does not contain free module and self variables; 3) any sequence of module deﬁnitions, type and

(8)

datatype definitions, value definitions, opaque and transparent type specifications, value specifications, and module specifications does not contain duplicate definitions nor specifications for the same name.

4 Overview of the type system

In this section, we overview our type system using the examples in Section 2.

For a moment, we shall suppose that we have an oracle to expand paths, that is, we suppose that we have an algorithm to ﬁnd out the component a path refers to. Then, type checking of Fig. 2 is straightforward. For instance, let us consider type checkingTree.split. The critical part is in the value deconstruction of the form Node(i, f). We should know that f has a type Tree.t list to ensure that (Leaf i) :: f is a well-typed expression. The expansion algorithm tells us that the type s in Tree refers to the type Tree.t list, by chasing type abbreviations. This concludes that split is a well-typed function of the type Tree.t → Tree.t list.

Type checking of Fig. 1 proceeds in a similar way to that of Fig. 2, except that we are careful not to peek at the inside of sealed modules when type checking the outside of them. To see this, let us consider to type checkTree.max. It resides outsideForest and calls Forest.max through the value path TF.Forest.max.

The expansion algorithm tells us thatTF.Forest.maxrefers to the speciﬁcation val max : t → int in Forest’s sealing signature. Thus we can know that TF.Forest.max fis a well-typed expression of the typeint. Since the typetin Forest’s signature is opaque, we have no way to know its underlying implementation. This implies that applyingTF.Forest.max to the null list[] produces an ill-typed expression, enforcing type abstraction ofForesttowardsTree.

Applyingmaxto the null list is a well-typed expression insideForest. This implies that we have to infer diﬀerent types forForest.maxdepending on whether it is referred to from the inside of the sealing or from the outside. The expansion algorithm takes care of that without diﬃculties. When the preceding self variable of a path is declared inside a moduleM, then the path refers to components of M from inside, otherwise it does from outside. According to this criterion,TF.Forest.maxin Fig. 2 refers toForest.maxfrom the outside, asTF is declared outsideForest.

Developing a terminating procedure for expanding paths is the most technically involved part of the type system. In Section 6, we will give a brief account for it. We defer to [14] and [13] for details.

5 Type system and soundness

Our type system is based on Leroy’s applicative functor calculus [10]. It diﬀers from Leroy’s in that: 1) we use the expansion algorithm to obtain the necessary type information about paths; 2) we deﬁne type equalities by means of path expansion instead of a type strengthening operation as Leroy does.

(9)

p;p⁰ p.M ;p⁰.M

p;p⁰ p(q);p⁰(q)

q;q⁰ p(q);p(q⁰)

`p7→(θ, q) p;θ(q) Fig. 5.Normalization of module paths

[lk-sf ]

`Z^θ7→(θ, ∆(Z)) [lk-sexp]

`p7→(θ,struct(Z). . .moduleM=E . . .end)

`p.M7→(θ, E)

[lk-ssig]

`p7→(θ,sig(Z) . . .moduleM:S . . .end)

`p.M7→(θ, S) [lk-fexp]

`p17→(θ,functor(X:S)→E) p26;

`p1(p2)7→(θ[X7→p2], E)

[lk-fsig]

`p17→(θ,functor(X:S⁰)→S) p26;

`p1(p2)7→(θ[X7→p2], S) [lk-seal]

`p7→(θ,(E:sig(Z) . . .moduleM:S . . .end))

`p.M7→(θ, S)

[lk-fseal]

`p17→(θ,(E:functor(X:S⁰)→S)) p26;

`p1(p2)7→(θ[X7→p2], S) Fig. 6.Deﬁnition look-up

In the rest of this paper, we assume a fixed source program P. The type system and the operational semantics to be defined are with respect to this fixed program. Moreover, for technical reasons of defining the type system, we annotate self variables with module variable environments. A module variable environment is a substitution of module paths for module variables. Correspond- ingly, we assume that every occurrence of a self variableZ in the source program P is implicitly annotated with the identity substitutionid. That is, we regard Z as an abbreviation for Zîd. We use θ as a metavariable for module variable environments. We also assume a global mapping ∆, which maps a self variable to its associated structure or signature type. For instance∆ mapsZto

sig (Z)

module Tree : sig type t val split : t → Z.Forest.t end

module Forest : sig type t val sweep : t → t val incr : Z.Tree.t → t → t end end

in Fig. 2.

The complete rules for the type system can be found in Appendix A. Here we only presentnormalizationof module paths anddeﬁnition look-upjudgments.

The normalization is given by a small step reduction of module paths. It would be an “ideal” algorithm for expanding module paths: it chases module aliases in an intuitive way, but does not necessarily terminate. In Appendix A, we instead presentsemi-ground normalization, which is provably terminating and coincides with the intuitive one for well-typed programs. Our type system uses the semi-ground normalization instead of the ideal one to avoid undecidability.

The normalization is deﬁned in Fig. 5. A judgmentp;p⁰is read aspreduces into p⁰. A path pis in normal form, denoted as p6;, when there is no q such thatp;q.

(10)

The ﬁrst three rules are straightforward. The last rule says that when a path pis deﬁned in the source program as an alias for the path q, expressed by the premise`p7→(θ, q) to be explained below, thenpreduces toθ(q).

The deﬁnition look-up judgment `p7→(θ, K) is deﬁned in Fig. 6, whereK is a metavariable for module expressions and signatures. The judgment `p7→

(θ, K) is read that p refers to the module description K, where each module variableX is bound toθ(X).

For a self variable, we consult the global mapping ∆[lk-sf ]. The next four rules[lk-sexp],[lk-ssig],[lk-fexp][lk-fsig]are straightforward. Note that a module pathp.M may refer to either of a module expression or a signature depending on whether p refers to a structure or a signature type. The last two rules are important. They involve sealed modules. When a pathprefers to a module sealed with a signature type[lk-seal], then p.M refers to the signature’s module ﬁeld M. When a pathp1refers to a module sealed with a functor type[lk-fseal], then p1(p2) refers to the functor’s body with its module environment augmented with a new bindingX 7→p2. These two rules disallow to peek at the inside of a sealed module when a path refers to the module from the outside of the sealing.

We put the side conditionp26;in the rules[lk-fexp], [lk-fsig], and[lk-fseal]

to make sure that all path components are normalizable. While the reduction strategy of the ideal normalization is not deterministic, this condition forces it to reduce all functor arguments before chasing module aliases. This is required to make it compatible with the decidable semi-ground normalization.

The judgment ` p 7→ (θ, K) does not hold for all paths, even in a well- typed program. For instance, the judgment holds for the pathTF.S, but not for TF.Tree.F.Sin Fig. 2. The central role of the normalization, or the expansion of module paths, in the type system is to either reduce a module pathpinto p⁰ such that`p⁰ 7→(θ, K) holds, andKis not a path, or raise an error if it cannot.

It is straightforward to make the type system allow separate type checking.

The last two rules of the deﬁnition look-up play an important role in this respect.

These two rules imply that the type system does not need to know the inside of a sealed module when type checking its outside. The inside is looked up only when type checking the sealed module itself.

Proposition 1 (Decidability of type checking). It is decidable whether a program P is well-typed or not.

Soundness

Here we give a call-by-value operational semantics and state its type soundness.

Valuesv and evaluation contextsE are:

v ::= ()|(v1, v2)|u.c v|λx.e

E::={} |(E, e)|(v, E)|πi(E)|E (e)|v(E)|E e|u.c E|caseEofms where we useuas a metavariable for paths in normal form.

A small step reduction is either:

(11)

p.l→^mpp⁰.l whenp;p⁰ p.c^mp2→p⁰.c whenp;p⁰ πi(v1, v2)^prj→vi

(λx.e)v^fun→ e{v/x} caseu.c v ofp.c x⇒e^case→ e{v/x} u.l^vpth→ θ(e) when`p7→(θ,struct(Z) . . . vall=e . . . end) or an inner reduction obtained by induction:

e1→e2

E{e1} →E{e2}

When deconstructing a value through the case statementcaseu.c vofp.c x, we do not explicitly check thatuandpare equivalent paths,i.e.that both refer to the same module. Our type system already ensures that p expands into a path equivalent tou. Note also that, when applying a value constructorp.c, we ﬁrst normalizep. This normalization might not terminate due to cyclic module deﬁnitions. In Section 8, we will discuss how to detect these cycles.

During the reduction, we would like to look up from the source program actual implementations of modules, instead of their signatures. For this to be work, we assume that once the source program is type checked, all sealing signatures in the program are erased. Then the deﬁnition look-up deﬁned in Fig. 6 always looks up actual implementations during the normalization of module paths and in the reduction step^vpth→. That is, the rules[lk-seal]and[lk-fseal]are not used during the reduction.

The following proposition states that when the source program is well-typed, a well-typed expression does not go wrong.

Proposition 2 (Soundness). Let a program P and an expression e be well- typed with respect to P. Then the reduction→ either leads e to a value or else gives rise to an inﬁnite reduction sequence.

6 Expanding paths

Designing an algorithm for expanding paths is made diﬃcult by the potential presence of cyclic and dangling references. Since we have to use the algorithm during type checking, it must terminate for any program, whether it is well- typed or not, to keep type checking decidable. It may seem easy to detect cyclic deﬁnitions in simple cases such asstruct (Z) type t = Z.s type s = t end.

However, when we combine type abbreviations and module aliases, it becomes non-trivial to detect cycles.

For expanding module paths, we split the expansion into two phases by defer- ring substitution of functor arguments. This enables us to enforce the ﬁrst-order structure restriction without relying on the type system. The restriction in turn makes it possible to apply techniques in ground term rewriting systems, for which termination conditions are well-investigated [2]. Our module path expansion is deﬁned in Fig. 7, 8 and 9 in Appendix A. It acts similarly to the one we used in our previous work [14], except that we check the well-foundedness of path

(12)

dependency relations [14] simultaneously with the expansion, rather than in a previous phase as we did before. This makes the algorithm easier to implement.

For expanding type paths, we define another algorithm on top of the module path expansion, which is found in Fig. 10. The expansion is defined for all types, rather than only for type paths. This is convenient to define type equalities, which are defined in terms of the expansion.

The critical rule is:

[typ]

`p;sp⁰ `p⁰7→(θ,struct . . . typet=τ1. . . end) L(p⁰.t)6∈Σ Σ,L(p⁰.t)`τ1↓τ2 Σ`θ(τ2)↓τ

Σ`p.t↓τ

This rule is for chasing type abbreviations. To avoid chasing cyclic type abbreviations, we check that expanding the abbreviated type does not come back to the same place in the source program. The premise L(p⁰.t) 6∈ Σ, which states that the defining location ofp⁰.tis not inΣ, is used for this purpose. We make this cycle check flexible in handling functors by splitting the expansion of the abbreviated type into two phases. We first expand the abbreviated type without substituting functor arguments, in the premise Σ,L(p⁰.t)` τ₁ ↓ τ₂. Once this expansion succeeds, we apply the substitution for the newly obtained type and expand it, by the premise Σ ` θ(τ2) ↓ τ. Again, the technique we used here is inspired by term rewriting theory, in particular recursive path ordering [3].

Recursive path ordering is a technique to construct a well-founded ordering on trees from a well-founded ordering on labels of the trees. In our setting, trees are the derivation trees of type expansion, and we use a well-founded ordering on module paths. The ordering on module paths is obtained through our module path expansion.

We deﬁne value path expansion, applying the same idea as with type expansion,. It can be found in Fig. 13 in the appendix.

As we already mentioned in section 5, our expansion algorithms are pes- simistic in that they attempt to expand all functor arguments, even though some might not be actually used. That is, our strategy to expand paths is close to a call-by-value approach (with a bit of laziness to allow recursion), while a call-by-name approach might succeed more often, and allow more programs to be well-typed. For instance a call-by-name approach might allow the module deﬁnition module M = F(M) when F is provably contractive, for instance if F does not use its argument. However our approach, using semi-ground normalization, has the major advantage of being decidable. Semi-ground normalization also appears to be (almost) complete with respect to the ideal normalization we presented in section 5. That is, programs which would be well-typed using the ideal normalization are still well-typed using semi-ground normalization (and conversely), provided modules passed as arguments to functors have no nesting.

(13)

7 Related work

Much work has been devoted to investigating recursive module extensions of the ML module system. Notably, type systems and initialization for recursive modules pose non-trivial issues, and have been the main subjects of study.

Crary, Harper and Puri[1] gave a foundational, type-theoretic account for recursive modules. They analyzed recursive modules in the context of a phase- distinction formalism [6], by introducing a ﬁxed-point operator for modules and recursively dependent signatures. Their type system requires fully transparent signature annotations for recursive modules, where all components of the modules must be made public. This means that recursive module cannot have privacy, thus Fig. 1 from Section 2 is untypable.

Objective Caml [12] is a real language which supports recursive modules.

The syntax of our calculus is motivated by it. In particular, we follow it in that we write a sealing signature for each recursive modules rather than lump them together in advance. The way we type check recursive modules is also inspired by O’Caml, in particular by Leroy’s applicative functor calculus [10].

We already mentioned in Section 2 that O’Caml requires verbose annotations for intimate modules. O’Caml also supports recursive signatures, with a rather concise syntax. However, it allows to write problematic modules whose type checking does not terminate. The possibility of non-termination when typing O’Caml modules is a well-known fact, but the phenomenon was supposed to be rare enough. Recursive signatures seem to make it more acute. This is one of our motivations in insisting on decidable type checking for our system. Of course we obtain it through restrictions, and a less expressive signature language. Yet, this may be the price for safety. Since we have similar typing rules, we hope that our approach can apply to O’Caml with little change.

Russo [18] proposed a type system for recursive modules, which is imple- mented in Moscow ML [17]. As examined in Section 2, the type system does not allow privacy between recursive modules, and self variables must be annotated with forward declarations in his system.

Dreyer [5] gave a theoretical account for recursive modules with generative abstract data types (as opposed to applicative ones, where functors are applicative). The paper focuses on recursive modules with privacy, proposing a “des- tination passing” interpretation for them. Hence the type system can handle Fig. 1 from Section 2, but requires verbose annotations for Fig. 2, as we have seen in Section 2.

Hirschowitz and Leroy [8] and Dreyer [4] have proposed type systems which ensure that initialization of recursive modules does not try to access components of modules that are not yet evaluated. They are only interested in the safety of initialization, hence their modules do not have type components.

Their type systems judge both of the programs:

struct (Z) val l = Z.m val m = l end and

struct (Z) val l = λx. x + Z.m val m = l(3) end

(14)

to be ill-typed, since, in both programs, the evaluation of the component m cyclically requires the evaluation of itself. Our type system, in particular our value path expansion algorithm, can detect the cycle for the former program, but not for the latter.

From a technical point of view, our value path expansion algorithm and their type systems are orthogonal. Hence, we think that it is possible to combine both to obtain a stronger type system. This looks like a promising avenue for future work.

8 Conclusion and Future work

In this paper we presented a type system for recursive modules. The novelty of our type system is that it has a terminating procedure to resolve path references.

This procedure enables it to reconstruct the necessary type information during type checking, thus the type system can type check recursive modules without relying on signature annotations given by programmers.

Our type system is rich in expressiveness: type abstraction can be enforced either inside or outside of recursive modules. It also avoids verbosity: programmers do not have to write additional signature annotations when the information can be inferred.

There is still a lot of work to be done to obtain a fully practical system. Here we give a brief overview of ongoing and future work.

Type inference for the core language We can deﬁne a type inference algorithm for the core language by combining a standard type inference algorithm and our value path expansion algorithm. We have to be careful about polymorphism and well-formedness. To obtain as much polymorphism as possible, we need to determine the best order for type inference. Using information obtained through our value path expansion, we can build a call graph of functions (represented by a directed graph), which expresses how functions in modules depend on each other. This graph gives us useful information to control inference: the strongly connected components of the graph indicate sets of value components whose type should be inferred simultaneously, referring to each other monomorphically; by topologically sorting the connected components, we can generalize types in a connected component before moving on to typing the next one. We must also check for well-formedness of types, as module variables should not escape their scope. Explicit type annotations can be used to break dependencies in the call graph, and allow polymorphic recursion. Note that these annotations cannot be completely avoided, as type inference for polymorphic recursion is known to be undecidable [7].

Detecting invalid paths It might be desirable to statically ensure that paths are valid references, that is, that they are not cyclic or dangling. Our type system already detects invalid paths as long as they do not go through sealings. For instance, the module path expansion algorithm will detect the cycle in

(15)

struct (Z) module M = Z.M end, and the type system will reject it, but struct (Z) module M = (Z.M : sig val l : int end) endwill be accepted, as the type system forbids expansion to go through the sealing signature of M.

The latter cycle could be detected in a separate phase from the type checking, using the same approach which detects the former. This is ongoing work.

Separate compilation To support separate compilation of recursive modules is a challenging task, if we require invalid paths to be detected. As we mentioned in Section 5, the type system allows separate type checking. There are no diﬃculties with respect to the soundness property. However, since a signaturesig val l : int endsays nothing about how the value componentldepends on components of other recursively deﬁned modules, we cannot detect the cycle in

structure (Z)

module M = (Z.N : sig val l : int end) module N = (Z.M : sig val l : int end) end

if we only know the sealing signature of N when compiling M, and vise versa.

We are currently investigating two approaches. One is to check cycles at the module linking time. Then, we would need access to the implementations of all the modules to be linked. Another is to enrich signatures with information on dependencies between modules. We need further investigation on this topic.

Lazy modules The operational semantics presented in the paper uses lazy evaluation for both modules and their value components, in the sense that only components of modules that are accessed are evaluated, and the evaluation is triggered at access time. This semantics simpliﬁes the soundness statement and its proof. However, it might not be natural for practical programming. Indeed, in our experimental implementation, we keep modules lazy, but we evaluate all the value components (but not module components) of a module at once, triggered by the ﬁrst access to a component of the module. This semantics gives us a uniform way to initialize statically and dynamically loaded modules, that is, both of statically and dynamically loaded modules are initialized only when their components are accessed. We need more investigation on this semantics.

Acknowledgments We thank Susumu Nishimura, Masahito Hasegawa, and Sinya Katumata for their valuable comments on the draft paper.

References

1. K. Crary, R. Harper, and S. Puri. What is a recursive module? InProc. PLDI’99, pages 50–63, 1999.

2. M. Dauchet and S. Tison. The theory of ground rewrite systems is decidable. In Proc. LICS’90, 1990.

3. N. Dershowitz. Orderings For Term-Rewriting Systems. Theoretical Computer Science, 17(3):279–301, 1987.

4. D. Dreyer. A type system for well-founded recursion. InProc. POPL’04, 2004.

(16)

5. D. Dreyer. Recursive Type Generativity. InProc. ICFP’05, 2005.

6. R. Harper, J. C. Mitchell, and E. Moggi. Higher-order modules and the phase distinction. InPorc. of POPL’90, pages 341–354, 1990.

7. F. Henglein. Type inference with polymorphic recursion. ACM Transactions on Programming Languages and Systems, 15:253–289, 1993.

8. T. Hirschowitz and X. Leroy. Mixin modules in a call-by-value setting. InProc.

ESOP’02, pages 6–20, 2002.

9. X. Leroy. Manifest types, modules, and separate compilation. InProc. POPL’94, pages 109–122. ACM Press, 1994.

10. X. Leroy. Applicative functors and fully transparent higher-order modules. In Proc. POPL’95, pages 142–153. ACM Press, 1995.

11. X. Leroy. A modular module system. Journal of Functional Programming, 10(3):269–303, 2000.

12. X. Leroy, D. Doligez, J. Garrigue, and J. Vouillon. The Objective Caml system.

Software and documentation available on the Web,http://caml.inria.fr/.

13. K. Nakata. Type inference for recursive modules. http://www.kurims.kyoto-u.

ac.jp/~keiko/.

14. K. Nakata, A. Ito, and J. Garrigue. Recursive Object-Oriented Modules. InProc.

FOOL’05, 2005.

15. B. Pierce, editor. Advanced Topics in Types and Programming Languages, chap- ter 9. The MIT Press, 2004.

16. R.Milner, M. Tofte, R. Harper, and D. MacQueen.The Deﬁnition of Standard ML - Revised. The MIT Press, 1997.

17. S. Romanenko, C. Russo, N. Kokholm, and P. Sestoft. Moscow ML. Software and documentation available on the Web,http://www.dina.dk/~sestoft/mosml.

html.

18. C. Russo. Recursive Structures for Standard ML. InProc. ICFP’01, pages 50–61.

ACM Press, 2001.

A Appendix

This appendix presents the type system.

We make the following four assumptions. 1) Signatures assigned to functor arguments are only signature types that do not contain module specifications; 2) All occurrences of module descriptions, value definitions, type definitions, and type specifications in the source program P are labeled with distinct integers.

We letL be a mapping from paths to integers, which sends a pathp(resp,p.l andp.t) to an integeriwhenp(resp,p.landp.t) refers to a module description (resp. value definitions, and type definitions or type specifications) labeled with i. 3) There is a prior path to the type checking which replaces every occurence of a self variableZ of a signature of a module variableX withX. For instance, a functor expression:

functor (X: sig (Z) type t type s = Z.t val f : int →Z.s end) → X is turned into:

functor (X: sig (X) type t type s = X.t val f : int →X.s end) → X 4) We annotate module variables with module variable environments, as we did for self variables. We regardX as an abbreviation forX^id.

(17)

5) We have a global mappingΩ, which sends a module variable to its associated signature, and enrich the deﬁnition the look-up judgment given in Fig. 6 with the following rule:

[lk-mv]

`X^θ7→(θ, Ω(X))

We letW as a metavariable for module variables and self variables.

Fig. 7 deﬁnes semi-ground normalization, which is the composition of the ground normalization given in Fig. 8, and the variable normalization given in Fig. 9. Abusing notations, we use p, q, r as metavariables ranging over both of module paths and extended recursive identiﬁersext mid.

Fig. 10 deﬁnes the expansion of types (this is responsible for expanding type paths.) This expansion is important for judging type equivalence, which is found in Fig. 11 with auxiliary judgments in Fig. 12.

Fig. 13 presents type reconstruction, with an auxiliary judgment in Fig. 14.

Type reconstruction is responsible for expanding value paths.

Finally, Fig 15 presents the overall type system, which uses auxiliary judgments in Fig. 16 to Fig. 19.

Note that ground-normalization, type expansion, and type reconstruction all deﬁne algorithms, as their derivations are deterministic.

`p;^gp⁰

`p;^sη(p⁰)

Fig. 7.Semi-ground normalization

[gnlz-mv]

Σ`X;^gX

[gnlz-sf ] Σ`Z^θ;^gZ^θ

[gnlz-exp1]

Σ`p;gp⁰ `p⁰7→(θ, E) E6≡mid Σ`p.M;^gp⁰.M

[gnlz-pth1]

Σ`p;^gp⁰ `p⁰.M7→(θ, q) q6≡X L(p⁰.M)6∈Σ Σ,L(p⁰.M)`q;gr

Σ`p.M;gθ(r) [gnlz-exp2]

Σ`p1;^gp⁰1 Σ`p2;^gp⁰2

`p⁰1(p⁰2)7→(θ, E) E6≡mid Σ`p1(p2);gp⁰1(p⁰2)

[gnlz-pth2]

Σ`p1;^gp⁰1 Σ`p2;^gp⁰2 `p⁰1(p⁰2)7→(θ, q) q6≡X L(p⁰1(p⁰2))6∈Σ Σ,L(p⁰1(p⁰2))`q;^gr

Σ`p1(p2);^gθ(r) Fig. 8.Ground-normalization

(18)

η(X) = X η(Z^θ) =Z^η(θ), η(p.M) =ζ(η(p).M) η(p1(p2)) =ζ(η(p1)(η(p2)))

ζ(p) =

{θ(X) when `p7→(θ, X)

p otherwise

Fig. 9.Variable normalization

[uni]

Σ`1↓1 [arr]

Σ`τ1↓τ1⁰ Σ`τ2↓τ2⁰

Σ`τ1→τ2↓τ1⁰→τ2⁰

[pair]

Σ`τ1↓τ1⁰ Σ`τ2↓τ2⁰

Σ`τ1∗τ2↓τ1⁰∗τ2

[dtyp]

`p;^sp⁰

`p⁰7→(θ,struct . . .datatypet=. . .end) Σ`p.t↓p⁰.t

[typ]

`p;^sp⁰ `p⁰7→(θ,struct . . . typet=τ1. . . end) L(p⁰.t)6∈Σ Σ,L(p⁰.t)`τ1↓τ2 Σ`θ(τ2)↓τ

Σ`p.t↓τ [opq]

`p;^sp⁰

`p⁰7→(θ,sig . . . typet . . .end) Σ`p.t↓p⁰.t

[trsp]

`p;^sp⁰ `p⁰7→(θ,sig . . . typet=τ1. . . end) L(p⁰.t)6∈Σ Σ,L(p⁰.t)`τ1↓τ2 Σ`θ(τ2)↓τ

Σ`p.t↓τ [opq2]

`p;sp⁰

`p⁰7→(θ,(E:sig . . . typet . . .end)) Σ`p.t↓p⁰.t

[trsp2]

`p;^sp⁰ `p⁰7→(θ,(E:sig . . . typet=τ1. . . end)) L(p⁰.t)6∈Σ Σ,L(p⁰.t)`τ1↓τ2 Σ`θ(τ2)↓τ

Σ`p.t↓τ Fig. 10.Expansion of types

`1≡1

`τ1≡τ2 `τ₁⁰ ≡τ₂⁰

`τ1 →τ1⁰≡τ2→τ2⁰

`τ1≡τ2 `τ₁⁰ ≡τ₂⁰

`τ1∗τ1⁰≡τ2∗τ2⁰

`p≡p⁰

`p.t≡p⁰.t Fig. 11.Type equivalence

L(p) =L(p⁰) `p7→(θ, E) `p⁰7→(θ⁰, E) `θ≡θ⁰

`p≡p⁰

∀X∈dom(θ), `θ(X)≡θ⁰(X)

`θ≡θ⁰ Fig. 12.Auxiliary judgments for the type equivalence