• 検索結果がありません。

Construction Algorithm to the While Language

iÐ0

while piănq t iÐi`1 u

ă n

0

` 1

ϕwhile

ϕend i

1

__????? 2

OO

3

??





2

OO

1 44

2

OO

jj 1 1

OO

oo_ _ _ _ _ _

Figure 8.5: Example of circular value graph withϕwhileandϕend, whereion the right refers to its value at the exit of the while statements on the left.

Another important point in detecting congruence is that we have to distinguish ϕwhile of an outer loop in a nested loop fromϕwhile of an inner loop. It is sufficient to check containment relation between two while statements in the comparison ofϕwhile. These considerations do not affect the construction of value graphs. Refer to the discussion onϕwhile andϕendin Section 8.5.

8.2.3 Definition and Formalization of Value Graphs

Definition

Here we formally define the data structure of value graphs in this chapter. A value graph is a triple pG, V, Fq. Gis a directed graphpN, EqwhoseN is a set of tagged nodes andEis a set of ranked edges.

V is a mapping from an alphabetΣtoN. F is an injective binary relation fromΣtoN.

These denotations in programs are as follows. Σis a set of variable symbols. Gis a set of expression values, which is a solid-line part of value graphs shown in Figures 8.2, 8.3, and 8.5. V is a variable binding, which is a break-line part of value graphs. F, which we introduce for convenience of algorithm description, captures the occurrences of free variables (i.e., variables of unknown values) inN. The rank in E denotes the position in an argument list. The set of tags in N consists of variables, constants operators, andϕ-functions. Each node has the only tag but tags are mutable. Multiple nodes can have the same tag. nt denote a noden tagged byt, but we omitt in the case wheret is unused. We call a node tagged by a variable a variable node and call a node tagged by aϕ-function a ϕ-node. For every value graphpG, V, Fq, it is an invariant thatvÞÑnvPF for any variable nodenvPG.

For algorithm descriptions, we introduce a special variable $ and a special mapω. $ is a variable that refers to the value of an expression and does not appear in a given program. ω denotes no control flow and pretends a variable binding aspG, ω, Fq. ωis used for dealing with goto statements in Section 8.4.

Formalization by Circuits

We also formalize a value graph pG, V, Fqas a circuit. We regard G as a black-box circuit where we can observe only input/output points. imgpVq, i.e., the image ofV corresponds to the output points of G, andimgpFq corresponds to the input points ofG. Wires are connected to the input/output points of G from the outside. Variables symbols in dompVq and dompFq correspond to the colors of wires to distinguish them. Nodes tagged by constants or variables are terminals in G and nodes tagged by operators or ϕ-functions are gates in G. This formalization (or a view of value graphs) enables us to focus on the concerned part ofGon constructing value graphs.

8.3. CONSTRUCTION ALGORITHM TO THE WHILE LANGUAGE 75

8.3.1 Syntax-Directed Formulation

We define C for each case in this subsection. LetfreshNodeptqbe a primitive function that generates a fresh node3 tagged byt. A fresh node is distinct from every other node.

First, from the definition of value graphs, the definitions in the cases of expressions and assignments are immediately obtained.

Cpvq “ pptnu,Hq,t$ÞÑnu,tvÞÑnuq, wherenfreshNodepvq,

Cpcq “ pptnu,Hq,t$ÞÑnu,Hq, wherenfreshNodepcq,

Cpe1fe2q “ pG3,t$ÞÑn3u, F1YF2q,

whereG3G1YG2Y ptn3u,tn3ÞÑn1, n3ÞÑn2uq, pG1,t$ÞÑn1u, F1q “Cpe1q,

pG2,t$ÞÑn2u, F2q “Cpe2q, n3freshNodepfq, CpvÐeq “ pG,tvÞÑnu, Fq,

wherepG,t$ÞÑnu, Fq “Cpeq.

For simplicity, the algorithm description above takes no account of the order of the operands off. It is immediate to take account of operand order. Similarly, in the remainder of this chapter, we use algorithm descriptions taking no account of operand order.

The case of statement sequencings1 s2 is defined as

Cps1 s2q “Cps1q bCps2q,

where b is a serial composition operator over value graphs. The meaning of b is easy to understand as the construction of series circuits on the basis of the formalization by circuits. Specifically, it is to construct correct wire connections between the output points of Cps1q and the input points of Cps2q.

The outgoing wire in a color of Cps1qconnects to all the incoming wire in the same color Cps2q. Then, we remove terminals in Cps2q connected to Cps1q and short-circuit incoming wires in Cps2q. We can formulatebover the data structures of value graphs as follows:

ppN1, E1q, V1, F1q b ppN2, E2q, V2, F2q “ pG3, V3, F3q, whereG3“ pN1YupdateσpN2q, E1YE21 YEq,

V3“ tvÞÑnPV1|vRdompV2qu YV2,

F3F1Y tvÞÑnPF2|vRSu Y tv1ÞÑn2 |vÞÑn2PFalias, nv11V1pvqu, Falias“ tvÞÑn2PF2|vPS, nv11V1pvqu,

E“ tn1ÞÑV1pvq |n1ÞÑnPE2, vÞÑnPF2´Falias, vPSu, E21E2´ tn1ÞÑnPE2|vÞÑnPF2´Falias, vPSu,

σ“ tvÞÑv1|vÞÑn2PFalias, nv11V1pvqu, S “dompV1q XdompF2q.

updateσpNqis a primitive operation to update the tags of variable nodes in a setN according a mappingσ overΣ. For example,updatetvÞÑv1uptn`1, nv2uq “ tn`1, nv21u. Note that the identity ofn2here is unchanged and the edges ofn2remain connected after updating. updateσpN2qcorresponds to copy propagation on the nodes of a resultant value graph. Falias is the subset ofF2 sensitive to this copy propagation. The third term in the RHS of the definition ofF3makes the invariant vÞÑnvPF3 hold.

3Node tagged by constants do not have to be fresh regarding the same constant.

When we regard a circuit as a function, the serial compositionbcorresponds to function composition.

bis therefore associative and its identity is pH,H,Hq. Since this identity corresponds to the identity function, this meaning is to do nothing. The case of empty statements is therefore defined as

Cppassq “ pH,H,Hq.

To define the cases of if statements and while statements, we have to place high-level ϕ-functions conceptually. This is straightforward by using the single-entry single-exit control flow of these high-level control structures [BM94].

Specifically, the single-entry single-exit control flow in if statements is observed as follows. To reach the then/else part of an if statement, it is necessary to pass through the entry of the if statement; it means single-entry. To reach the exit of an if statement, it is necessary to pass through the exit of the then/else part; it means single-exit. These lead to two observations. ϕ-functions at the exit of an if statement are unnecessary for variables defined neither in the then part nor the else part. If a variable v is defined either in the then part or the else part, the value ofvat the entry reaches the exit and may become confluent viav. Without redundant (high-level)ϕ-functions, we can therefore define the case of if statements as follows:

Cpifpeq ts1uelse ts2uq “ pG, V, Fq,

whereGGcYGtYGeY pimgpVq YimgpFq, Eq, V “ tvÞÑfreshNodepϕifq |vϕu,

FFcYFtYFeYF,

F“ tvÞÑfreshNodepvq |vϕ´ pdompVtq XdompVeqqu, E“ tVpvq ÞÑnc|vϕu Y tVpvq ÞÑn|vÞÑnPFu,

Y tVpvq ÞÑn|vÞÑnPVtu Y tVpvq ÞÑn|vÞÑnPVeu, Σϕ“dompVtq YdompVeq,

pGc,t$ÞÑncu, Fcq “Cpeq, pGt, Vt, Ftq “Cps1q, pGe, Ve, Feq “Cps2q.

Here,freshNodepvqdenotes the value of vat the entry of a given if statement.

The generation ofϕwhile is almost the same as that ofϕwhile as an if statement whose else part is an empty statement. The main difference is that the join point ofϕwhileis the entry of its while statement.

We therefore have to consider the sequencing ofϕwhile derived from a while statement followed by the while statement. We apply serial compositionbto the sequencing, and thereby we introduce circularity into a resultant value graph. Finally, we append aϕendnode toϕwhilenode. The case of while statements is defined as follows:

Cpwhilepeq tsuq “ pGY pimgpVendq, Eendq, Vend, Fq, whereEend“ tVendpvq ÞÑVpvq |vPdompVqu,

Vend“ tvÞÑfreshNodepϕendq |vPdompVqu,

pG, V, Fq “ ppimgpVq YimgpFq, Eq, V, Fq b pGcYGb,H, FcYFbq, V“ tvÞÑfreshNodepϕwhileq |vPdompVbqu,

F“ tvÞÑfreshNodepvq |vPdompVbqu,

E“ tn1 ÞÑnc, n1 ÞÑFpvq, n1ÞÑn|vÞÑnPVb, n1Vpvqu, pGc,t$ÞÑncu, Fcq “Cpeq,

pGb, Vb, Fbq “Cpsq.

Here,freshNodepvqdenotes the value of vat the entry of a given while statement.

8.3. CONSTRUCTION ALGORITHM TO THE WHILE LANGUAGE 77

8.3.2 Implementation Issues

Use of Union-Find Trees

Although C achieves the construction of value graphs from ASTs in a syntax-directed manner, a naive implementation ofC can cause a problem on efficiency. updateσpN2qin the definition ofbcorresponds to a naive copy propagation. The time complexity of this naive copy propagation depends on the order of processing assignments. For example, consider the following sequence of assignments:

x1Ða x2Ðx1

...

xmÐxm´1.

A naive copy propagation is to rewrite a RHS variable with a LHS variable in nearest prior assignments. It costsOpmqtime if we perform it fromx2Ðx1, but it costsOpm2qtime if we perform it fromxmÐxm´1. C, which performs in the simplest syntax-directed manner, does not guarantee that its processing order is the program order. If we complicated C, we could guarantee it but shall spoil the simplicity of its syntax-directed manner. This inefficiency ofC is, however, unacceptable.

The cause of this inefficiency is to perform copy propagation eagerly. If we perform it lazily, we can avoid the worst case. We can consider variable rewriting as construction of equivalence classes and therefore we can implement this lazy copy propagation by using union-find trees4.

Specifically, we construct an equivalence class of the nodes in tagged by the same variable by using a union-find tree. As a result, since the invariant v ÞÑnv PF holds, F becomes an injective function.

We require two simple modifications on C. First, when we construct F1YF2, letting v P dompF1q Y dompF2q, we unify two nodes F1pvqand F2pvq. Second, when we refer to variable nodes, we find their representatives. To update the tags of only representatives suffices for copy propagation. We therefore use the only field of each variable node both for a tag and a link to representative.

If we use union-find trees equipped with path compression and union by rank, the copy propagation formassignments costsOpmαpm, mqq, whereαpm, mqdenotes the inverse Ackermann function. From a realistic size ofm, we can assumeαpm, mqto be a small constant. C thus becomes almost as efficient as ordinary program-order algorithms.

Use of union-find trees also simplifies the implementation of C. In particular, it simplifies the con-struction ofE in the definition ofb. A naive way of implementing it is that all variable nodes mange reverse edges. This requires mutation of the fields of operator nodes and more fields in a variable node.

Meanwhile, if we change the representative of a variable node to operator nodes in applyingb, we achieve the equivalent result ofE in later find operations.

Reference from ASTs to Value Graphs

As mentioned in Section 8.2.1, for value numbering, we require a mappingT from expressions in ASTs to nodes in value graphs. SinceCconstructs a distinct node for each expression,T becomes an injective function. We can shareT globally and implicitly inC. A simple and efficient way of constructingT is to register a fresh node corresponding to a given expression with globally visibleT, in the case of expressions regardingC. We here assume T to be a separate table because it is easy to construct and dispose of.

We also can make each node in ASTs hold a reference to value graphs. An actual representation of T depends on implementation.

That T holds references to nodes of value graphs simplifies the implementation of a value graph pG, F, Vq. If we constructGas a pointer structure, we can reach nodes ofGthroughT, and thereby we do not have to constructGexplicitly as a set. An important observation is that optimizations concern the part ofGreachable fromT. In particular,ϕ-functions for dead variables are unnecessary. Actually, existing algorithms [BCHS98, CCF91] for placingϕ-functions prune unnecessaryϕ-functions on the basis

4The original paper [GF64] of union-find trees also aimed at managing equivalent variables in compilers.

s ::“ . . . ˇ ˇ l: ˇ

ˇ gotol (Statement)

Figure 8.6: Syntax of the while language with goto/label statements, which is an extension to the syntax defined in Figure 8.4, whereldenotes a metavariable over labels. Label statements that are not targeted by any goto statement are regarded as empty statements.

of live variable analysis. We, however, can pruneϕ-functions easily on the basis of the reachability from T. When we trace links throughT from all essential expressions (e.g., returned expressions), unreachable ϕ-functions fromT are simply unnecessary. If implementation languages have garbage collection (GC), unnecessaryϕ-functions can be implicitly removed by using weak references.