were efficient for photon mapping [WGS04]. The main advantage of our approach based onm-bridge is genericity regarding the shapes of trees.
12.6.3 Iterative Search
In state-space search (or combinatorial search), we can find iterative patterns such as iterative deepening depth-first search, IDA* search, and Monte-Carlo tree search. While the traversal that we deal with is to traverse tree data structures, these traverse state space and the traces of search form trees in hindsight.
A state is per se a value used in search and the next states can be calculated from a current state. These are state transition rather than tree traversal. These locality issues are therefore quite different from our iterative tree traversal.
Since state-space search is state transition, the locality in transition table is important. Actually, a distributed parallelization regarding tables [RPBS99] was quite effective for IDA* search [RPBS99]
and Monte-Carlo tree search [YKK`11]. As seen from these results, both locality enhancement and parallelization therefore rely strongly on data structures.
12.6.4 Data-Parallel Skeletons
In parallel programming, parallel computing patterns are called skeletons [Col89, RG02]. Those which exploit parallelism of data structures are called data-parallel skeletons. Sinceiterexploits parallelisms of given data structures, we can regard it as a data-parallel skeleton.
What kind of skeletons is iter, then? The map to querying points is a list skeleton [Ski93]. The pruning and reduction of space-partitioning trees can be implemented with tree skeletons [Ski96, GCS94].
Moreover, space-partitioning trees are based on the freedom on dividing a given multidimensional space.
This trait in a two-dimensional case is closely relevant to the algebra used in matrix skeletons [EHKT07].
iter is actually a chimera of different data-parallel skeletons. This leads to an interesting observation on skeletons. Even though applications of matrix skeletons and tree ones are very limited, a chimera of them can have important applications.
12.7 Conclusion
In this chapter, we have presented an application of m-bridging to the design of data structures for iterative tree traversal. Our parallel patterniterthat formulates typical iterative tree traversal can deal on our data structures with important applications.
We plan to implement iterand our data structures and evaluate them experimentally by using im-portant applications.
A further step in future work that we consider is to deal with fast algorithms for theN-body problem.
Many problems in statistical learning are known to be formalized as a kind ofN-body problem [GM01].
In Riegel’s recent work [Rie13], a wider range of problems were formalized as the generalized N-body problem and a fast algorithm for solving it was presented. If we incorporated these with parallel patterns on our data structures, our approach would be more valuable.
Chapter 13
Towards Neighborhood Abstractions
In this part, we have investigated parallel programming for neighborhood computations in cache-efficient and divide-and-conquer manners. Unfortunately, we have not built a technical connection between the work in Chapter 11 and that in Chapter 12, which deal with seemingly different computations. Both stencil computation and querying of space-partitioning are, however, conceptually very close.
Stencil computation is usually considered as a regular array-based computation. In this sense, it seems to be out of the scope of tree-based computations. This regularity of stencil computation is, however, derived from a uniform grid. If we adopt an adaptive grid, stencil computation becomes irregular. In fact, adaptive mesh refinement for partial differential equation solvers is considered as adaptive stencils [DS06], and its load balancing uses space-partitioning trees [Mit07]. Moreover, grids can be hierarchical in multigrid methods [BHM00]. In this case, stencil computation becomes a tree-based computation.
In querying of space-partitioning trees, we find neighbors by visiting spatial nodes. We can also consider this visited nodes as neighbors, which have various granularity. In this sense, a query is a tree-shaped stencil. Besides, in the domain ofN-body problems, fast multipole methods (FMMs), which are faster algorithms based on space-partitioning trees, were developed. The original FMM [GR87] performs a typical stencil computation at each spatial level.
These connections between stencil computation and space-partitioning trees are derived from physical laws. Physical contributions such as gravity are inversely proportional to distance in general. It is natural to use neighbors for calculating their approximations. Since we always calculate approximations of model formulae in scientific computing on the basis of numerical analysis, use of neighborhood is primordial.
Then, to improve the precision of approximations in general, we have to calculate contributions of a far distance. Hierarchical subspaces, which space-partitioning trees represent, enable us to interpret faraway areas adaptively as neighbors. Use of space-partitioning trees is therefore reasonable to improve the precision of approximations.
Because of this natural connection, it is desired to unify abstractions of stencil computation and querying of space-partitioning trees. We have been pursued a unified abstraction for various neighborhood computations. At the beginning, we conjectured that trees were useful for such a unified abstraction.
Through the work in this part, we, however, found trees to be inadequate to abstract neighborhood computations because a tree structure is merely an axis of a neighborhood subspace. We have to focus on the formalization of neighborhood rather than overall iterative and/or recursive computations. In fact, from the perspective of abstractions of neighborhood, Chapter 11 formalizes neighborhood as constant band matrices and Chapter 12 formalizes it as a root-to-leaf path in space-partitioning trees. Since both are far different, it is reasonably difficult to build a technical connection between both.
If we focus on the formalization of neighborhood, a unified abstraction for parallel neighborhood computations is still difficult to design. This is because the structures of neighborhood affect global recursions of individual neighborhood computations. For example, Callahan-Kosaraju algorithms [CK95, Cal93] construct neighborhood as well-separated pairs on space-partitioning trees. We can represent these pairs as side links among nodes. However, the construction of these pairs is not a recursion on trees but a recursion on pairs of subspaces. In fact, it contains a peculiar shuffling of siblings from the viewpoint of tree computations. Because the global recursion of a whole computation is of key importance of load
129
balancing, the variety of global recursions depending on kinds of neighborhood makes it difficult to design abstractions. We therefore leave it for future work.
Chapter 14
Conclusion
14.1 Summary of Contributions
In this dissertation, we have dealt with parallel programming with trees in a divide-and-conquer manner.
Our main contributions in this work are summarized as follows:
• We have designed an iterator-based interface between tree skeletons and tree-structure implementa-tion, and have developed a tree skeleton library loosely coupled between both by using our interface.
We have demonstrated the benefits of its flexibility experimentally. (Chapter 3)
• We have developed a parallelizer that transforms sequential recursive functions in C into tree skeleton calls with operators based on the formalization by Matsuzaki at al. [MHT06]. It hides the complicated API of tree skeletons from programmers and brings the benefits of tree skeletons with no burden on programmers. (Chapter 4)
• We have developed a novel syntax-directed divide-and-conquer method of data-flow analysis based on Tarjan’s formalization [Tar81a] and Rosen’s high-level approach [Ros77, Ros80]. It can deal with arbitrary goto/label statements. We have demonstrated the feasibility and scalability of our method experimentally through prototype implementations on a C compiler. (Chapter 7)
• We have developed a syntax-directed divide-and-conquer method for constructing value graphs on the basis of a functional formalization of value graphs and Rosen’s high-level approach [Ros77, Ros80]. It tames goto/label statements troublesome inϕ-function placement. (Chapter 8)
• We have developed a linear algebraic approach to optimizing stencil computation. We have pre-sented its concept, techniques to implement it, and its effects on asymptotic complexities. We demonstrated its performance gain experimentally through a prototype library. (Chapter 11)
• We have developed techniques for enhancing the asymptotic cache complexity of iterative traversal of space-partitioning trees, on the basis of its skeletal formalization and m-bridging [Rei93]. We have explored the applicability of our approach in practically important problems. (Chapter 12)