Conclusion - Structured Parallel Programming with Trees （木を用いた構造化並列プログラミング）

the parallelism in trees is transformed into the parallelism in arrays and finally tree computations is implemented with vector operations. This flattening incurs the time and space overhead of array rep-resentations themselves and operations on them. To tame this overhead, several studies dealt with the avoidance of flattening [KCL^`12, BFR^`13]. However, this overhead is the result of a weak abstraction of parallel computations on trees. The construction of and the computations on trees in NESL are re-spectively implemented as list constructions and list computations repeated in recursive functions. This means that NESL have to provide a list view of trees and enable arbitrary recursions. Therefore, the encapsulation of tree data structures is diﬃcult and there is not much room for an eﬃcient implementa-tion. In contrast, tree skeletons can use an encapsulated implementation of trees and therefore have much room for an eﬃcient implementation. Entry functions in our approach actually utilize this property.

4.8 Conclusion

We have presented our parallelizer for implicitly skeletal programming on trees. Our parallelizer builds on the existing formalization [MHT06] and enables programmers to enjoy the benefits of tree skeletons with no burden derived from their complication,

The current implementation deals only with reduce over binary trees. Further development is left for future work. An extension to the computations of uAcc and dAcc is straightforward. However, automatic parallelization of recursive functions that consist of a mixture of reduce, uAcc, anddAccwill be nontrivial. On the basis of the diﬀusion theorem [HTI99], we can parallelize such computations in theory. Its automation with reasonable program analysis to recursive functions is, however, unexplored.

The computations over trees of unbounded degree are also an issue. The tree contraction algorithm and operations [MM11a] to trees of unbounded degree will be helpful.

The current implementation uses the only implementation of a tree skeleton. An implementation specialized to a specific composition of tree skeletons is, however, more eﬃcient. In such cases, dispatch to an appropriate implementation of a composition of tree skeletons is desired. It is promising that underlying skeleton libraries in C++ implement this dispatch because C++ template metaprogramming enables such optimizations [ME10, EM14] at the library level. This approach will promote the separation of concerns between parallelizers and underlying skeleton libraries.

Recent work [Mor11, Mor12] by Morihata applied tree contraction algorithms successfully to at-tributed tree transducers and macro tree transducers, which are restricted recursive functions that transform given trees. These results theoretically demonstrate the possibility of parallelizing recursive functions to tree skeletons for tree transformations. Their practical usability is, however, still unknown and automatic parallelization based on these results is left for future work.

Chapter 5

Limitations of Tree Skeletons

In Chapters 3 and 4, we have improved the usability of tree skeletons both for skeleton implementers and skeleton users. However, tree skeletons are still not practically useful. Applications of tree skeletons are actually very limited. For example, the maximum marking problems [MHT08] on trees can be solved by using tree skeletons. Although maximum marking problems themselves can cover a class of valuable optimization problems, their settings to which we can apply tree skeletons easily are rather artificial.

This is because we adopt an unreasonable approach to tree-based computations.

Trees in programming are more than trees. Trees in programming are abstract representations of some data. The interpretation of trees is closely relevant to this underlying data. In the case of join lists, the underlying data is lists to which the concatenation operation is constant-time. Typically, binary search trees are closely relevant to the underlying data. Their tree structures represent sorted sets. We can balance binary search trees by definition. When querying a binary search tree, we indeed traverse it but essentially calculate the magnitude relation between a key and a subset that its subtree represents.

The interpretation of BinTree_α is nothing more than an algebra over full binary trees. The interpre-tation ofSegTree_αincorporates tree contraction operations but is also nothing more than an algebra over segmented trees. To use tree skeletons, we have to define these interpretations as algebra independent of the underlying data of trees. Intuitively, this is to implement, for example, the operations on binary search trees only with recursion on their structures without any comparison of their payload values.

Such tree manipulations are obviously counter-intuitive and indubitably ineﬃcient. Therefore, use of tree skeletons is unreasonable in programming for tree-based computations.

The consideration of underlying data can simplify the implementation of load balancing. For ex-ample, consider XML processing, which was considered as a typical application of tree skeletons [Ski97, NEM^`07]. In using tree skeletons, DOM trees of unbounded degree are transformed into full binary trees.

As a result, an input tree may be a list-like tree and be suited to load balancing based on segmented trees. However, the height of actual XML documents is small; e.g., XML documents of hundreds of height are unrealistic. List-like DOM trees are not worth considering. In fact, for the MapReduce-based implementation [EI12] ofreduceover XML-like bracketed structures, load balancing to tame list-like trees was unnecessary as well as ineﬃcient. Moreover, large-scale XML documents that constitute database usually have lists of the same kind of elements. It would be suﬃcient to parallelize computations on such lists simply as in list skeletons. The interpretation of substructures of data is thus of high importance for load balancing and practical implementations.

Tree skeletons deal solidly with programming on trees, where only the structure of trees matters.

However, programming with trees, where trees are utilized for dealing with their underlying data, is truly desired. In the following chapters, we therefore deal with parallel programming with trees in a divide-and-conquer manner by considering the interpretation of trees and the underlying data primarily.

Part II

Syntax-Directed Programming

Chapter 6

Syntax-Directed Computation and Program Analysis

6.1 Motivation for Program Analysis

A syntax-directed computation is to calculate results from trees on the basis of the production rules of their grammars. Attribute grammars (AGs) [Knu68] are one of the most famous styles of syntax-directed computations. In fact, data-parallel skeletons is also a style of syntax-syntax-directed computations.

For example, we first define the grammar (i.e., syntax) of join lists to formalize lists in the divide-and-conquer paradigm and then define reduce for each production rule on the basis of the interpretation of its production. That is, programming with data-parallel skeletons is a syntax-directed manner of programming, i.e.,syntax-directed programming.

As mentioned in Chapter 5, it is important to consider the interpretation and underlying data of trees.

The most typical target of syntax-directed computations is a computer program, where a syntax defines the abstract syntax tree (AST) of an input program. In fact, compiler frontends perform syntax-directed computations on ASTs and their implementation is the primary application of AGs [Wai90, WBGK10, WdMBK02, WKSB07, EH07]. We therefore focus on ASTs in the domain of programming languages.

Recall that data-parallel skeletons (especially, reduce) give an input tree an interpretation. In pro-gramming languages, to give a program an interpretation is generally called program analysis. For example, the work of compiler frontends, which are often defined by using AGs, is called semantic anal-ysis. AST-based program analysis is very similar to data-parallel skeletons. We therefore consider that dealing with AST-based program analysis leads to structured parallel programming with trees.

In this part, we deal with AST-based program analysis as case studies on parallel programming with trees. We do not consider any dynamic analysis for two reasons. The first is that we usually perform it incrementally. The second is that it usually focuses on the executed part and ignores the non-executed part. These characteristics make the amount of work diﬃcult to estimate. Such problems are inappropriate to parallel programming. Meanwhile, static analysis is usually exhaustive and deals with the whole program. Consequently, it tends to be expensive and thereby was accelerated through parallelization in many studies [NG13, MLBP12, AKNR12, PRMH11, RL11, MLMP10]. Static analysis is appropriate to the case studies on parallel programming. We therefore deal with static analysis.

ドキュメント内 Structured Parallel Programming with Trees （木を用いた構造化並列プログラミング） (ページ 61-67)