High-Level Program Analysis - Structured Parallel Programming with Trees （木を用いた構造化並列プログラミング）

Chapter 6

Syntax-Directed Computation and Program Analysis

6.1 Motivation for Program Analysis

A syntax-directed computation is to calculate results from trees on the basis of the production rules of their grammars. Attribute grammars (AGs) [Knu68] are one of the most famous styles of syntax-directed computations. In fact, data-parallel skeletons is also a style of syntax-syntax-directed computations.

For example, we first define the grammar (i.e., syntax) of join lists to formalize lists in the divide-and-conquer paradigm and then define reduce for each production rule on the basis of the interpretation of its production. That is, programming with data-parallel skeletons is a syntax-directed manner of programming, i.e.,syntax-directed programming.

As mentioned in Chapter 5, it is important to consider the interpretation and underlying data of trees.

The most typical target of syntax-directed computations is a computer program, where a syntax defines the abstract syntax tree (AST) of an input program. In fact, compiler frontends perform syntax-directed computations on ASTs and their implementation is the primary application of AGs [Wai90, WBGK10, WdMBK02, WKSB07, EH07]. We therefore focus on ASTs in the domain of programming languages.

Recall that data-parallel skeletons (especially, reduce) give an input tree an interpretation. In pro-gramming languages, to give a program an interpretation is generally called program analysis. For example, the work of compiler frontends, which are often defined by using AGs, is called semantic anal-ysis. AST-based program analysis is very similar to data-parallel skeletons. We therefore consider that dealing with AST-based program analysis leads to structured parallel programming with trees.

In this part, we deal with AST-based program analysis as case studies on parallel programming with trees. We do not consider any dynamic analysis for two reasons. The first is that we usually perform it incrementally. The second is that it usually focuses on the executed part and ignores the non-executed part. These characteristics make the amount of work diﬃcult to estimate. Such problems are inappropriate to parallel programming. Meanwhile, static analysis is usually exhaustive and deals with the whole program. Consequently, it tends to be expensive and thereby was accelerated through parallelization in many studies [NG13, MLBP12, AKNR12, PRMH11, RL11, MLMP10]. Static analysis is appropriate to the case studies on parallel programming. We therefore deal with static analysis.

inexpensive than that of low-level ones close to target languages. Nevertheless, CFG-based approaches to program analysis are more popular than AST-based ones in optimizing compilers. This is because CFGs-based approaches deal with arbitrary control flow by definition. If source languages do not cause arbitrary control flow, compiler optimizations could destroy control structures and then lead to arbitrary control flow. In this sense, CFG-based approaches are easier to apply. This is why CFG-based approaches are more popular.

ASTs are more intuitive and easier to understand than CFGs. As demonstrated in the literature on AG-based compiler constructions [Wai90, WBGK10, WdMBK02, WKSB07, EH07], high-level approaches based on ASTs are useful for compiler constructions. Moreover, in editors integrated with compiler frontends such as Eclipse¹, AST-based approaches are quite natural. High-level program analysis based on ASTs is extensively desired. It is therefore worth migrating from de facto standard CFG-based implementation to AST-based implementation.

A main obstacle to migrating from CFGs to ASTs is the control flow that does not form a control structure, which compilers could introduce in the process of compilation. We can introduce such control flow into ASTs by using goto/label statements. It is, however, diﬃcult to interpret the control flow derived from goto/label statements in a syntax-directed manner because it ignores the structures of ASTs. If we can deal with goto/label statements in a syntax-directed manner, we can replace CFG-based analysis with AST-based analysis. Consequently, we can enjoy the benefits from high-level program analysis as well as high-level compiler constructions. The primary technical issue in Chapters 7 and 8 is therefore how to tame with goto/label statements.

From the perspective of syntax-directed programming, how to tame goto/label statements corre-sponds to how to deal with irregular computations involved in computations with trees. To deal with such irregular part in a structured manner is a technical issue in structured parallel programming with trees. We therefore consider that to tame goto/label statements is very appropriate to a case study on parallel programming with trees.

1https://eclipse.org/

Chapter 7

Syntax-Directed Divide-and-Conquer Data-Flow Analysis

This chapter is self-contained and an extended version of our publication [SM14b].

7.1 Introduction

Data-flow analysis (DFA) is a classic and fundamental formalization in programming languages and particularly forms the foundation of compiler optimization. Many optimizations consist of a pair of analysis and transformation, and DFA often formulates the analysis part of an optimization and occupies the computational kernel of its optimization pass.

Nowadays, an input to DFA can be very large. For example, state-of-the-art optimizing compilers such as GCC and LLVM are equipped with link-time optimization (LTO), which is to reserve intermediate representations beside executables at compile time and then optimize the whole program at link time by using all reserved intermediate representations of linked executables. An input program of LTO is larger than the one of usual separate compilations. Furthermore, LTO promotes aggressive procedure inlining, which can incur an exponential blow up of input programs. In DFA for LTO, it is therefore desired that large-scale input programs can be dealt with eﬀectively.

One promising approach to dealing with large-scale inputs is parallelization. Since parallel machines are widespread, well-parallelized DFA will benefit many users of LTO. A primary concern is the generation and assignment of parallel tasks. Concretely, load balancing with little overhead is important. Although load balancing is necessary to reduce parallel time, the load balancing itself could incur considerable overhead in processing large-scale inputs. For parallel DFA of large-scale input programs, the divide and conquer directly on input data structures without preprocessing is very much desired because this will result in the immediate generation of parallel finer-grained tasks in recursion.

A naive approach to the divide and conquer of DFA is procedure-level decomposition. In interpro-cedural as well as intraprointerpro-cedural analysis, the analysis of each procedure is computationally almost independent of that of the others and therefore can be performed in parallel. This procedure-wise paral-lelization, however, can incur a poor load balancing in LTO with aggressive inlining. Aggressive inlining expands the main procedures sharply by substituting and eliminating many other procedures; conse-quently, it reduces the number of procedures and causes a size imbalance among procedures. To obtain better load balancing, the divide and conquer over a procedure is necessary.

DFA usually deals with a procedure in the form of a control-flow graph (CFG). Although there were some earlier studies on parallel DFA that developed divide-and-conquer methods on CFGs, these methods required an auxiliary tree structure [LRF95] or duplication of CFGs [KGS94] and therefore incur significant overhead. These drawbacks stem from the nature of CFGs. The loops and sharing of paths in CFGs make the divide and conquer of DFA diﬃcult because they impose unstructured dependence on parts of the DFA. To resolve this dependence, some preprocessing is generally required. Therefore, DFA on CFGs is essentially diﬃcult to divide and conquer.

In contrast to CFGs, abstract syntax trees (ASTs) are easy to divide and conquer owing to their recursive structures. If we can perform DFA on ASTs, the divide and conquer of DFA will be straight-forward in a recursion on ASTs (i.e., a syntax-directed manner) and enable us to perform each DFA of independent AST subtrees in parallel. Rosen [Ros77] developed high-level data-flow analysis, a well-formed method of DFA on ASTs, but his method cannot deal with goto/label. Since goto/label causes control flow unrestricted to the structures of ASTs, it introduces into ASTs unstructured dependence similar to that of CFGs. Taming goto/label is therefore essential for general DFA.

To resolve this problem, we have developed a novel parallel syntax-directed method of general DFA that tames goto/label. The proposed method is built upon Tarjan’s algebraic formalization [Tar81a] of DFA. First, our method summarizes the syntax-directed data flow in a bottom-up parallel sweep of a given AST, while detaching the goto-derived data flow and constructing a compact system of linear equations that represent it. Next, we obtain the summary of the goto-derived data flow by solving the system.

Lastly, we merge the syntax-directed data flow with the goto-derived flow. Our method is particularly useful for programs containing few goto/label statements because the divide and conquer over a given AST is applied to the most part of DFA. We can assume such an input thanks to the popularity of structured programming. Furthermore, our method guarantees asymptotically linear speedup.

The following are our two major contributions:

• We have developed a novel syntax-directed divide-and-conquer parallel method of DFA based on Tarjan’s formalization [Tar81a] (Section 7.3). The essence of our method is to detach the goto-derived data flow and calculate it afterward. Our method guarantees asymptotically linear speedup.

• We have also developed a practical technique to enhance our DFA method on the basis of the notion of interval analysis [AC76, Tar81b] (Section 7.4). Our technique will reduce the constant factors of our method for usual input programs.

• We have demonstrated the feasibility of our method experimentally through prototype implementa-tions on a C compiler (Section 7.5). Our parallel prototype achieved a significant speedup and our sequential prototype achieved reasonable performance compared to the standard implementation.

ドキュメント内 Structured Parallel Programming with Trees （木を用いた構造化並列プログラミング） (ページ 67-70)