Applications - Structured Parallel Programming with Trees （木を用いた構造化並列プログラミング）

criterion must be the number of the nodes of its original tree. We therefore have to apply an additional m-bridging to recursively segmented trees for load balancing. We use mfor the parameter of forming recursively segmented trees andm¹ for that of load balancing, assumingmăm¹.

The calculation of m¹-critical nodes on recursively segmented trees is straightforward. A problem is that an m¹-bridge does not necessarily match any segment of a recursively segmented tree. Even if anm¹-bridge does not match a segment, it can contain some of its finer-level segments. With a space-containment criterionc¹: αˆαÑboolforSPTreeα,β, we can assign processors to segments that cover all nodes of an original tree.

Although processors are not assigned to coarse-level segments (especially, root-level ones), it is no problem, rather convenient. Many querying points will pass through such coarse-level segments in travers-ing diﬀerentm¹-bridges. It is therefore reasonable to share or replicate them among all processors. In addition, m¹-bridges of much smaller weight such as single-node ones (see Figure 12.2) are not worth privatizing in practice. We therefore may neglect assigning processors to segments contained in such cheapm¹-bridges.

When an m¹-bridge contains a segment, we do not have to assign a processor explicitly to the finer-level segments contained in it. That is, we can prune processor assignment in top-down traversal.

The computational pattern of this processor assignment is therefore very close to iter. Intuitively, a containment criterion c¹ corresponds to a pruning criterion c in iter, and the destructive updating of nodes regarding processor numbers corresponds tokbandkliniter. Note that the assignment of processor numbers does not cause race by definition.

By using the method above, we can distribute buﬀered recursively segmented trees. It is, however, insuﬃcient for tree traversal. We have to consider intersection (or communication) among processors in tree traversal. We can implement it as a simple asynchronous parallel computing. We associate a mailbox, which is an asynchronous buﬀer, with each segment numbered a processor number. When one processor transmits a querying point to a segment with a mailbox for another processor, we insert the querying point into the mailbox. Each processor always monitor its own mailboxes and transmit querying points in an own mailbox to the buﬀer of segment with which the mailbox is associated. By using such an asynchronous intersection, we can implement tree traversal on distributed buﬀered recursively segmented trees.

In summary, how to distribute a buﬀered recursively segmented treeT is the following:

1. Calculate eachm¹-critical node onT and obtain its subspace, which may have a hole.

2. Distribute m¹-critical subspaces to all processors, where subspaces of much smaller weight may be neglected.

3. Assign processor numbers to segments by using a space-containment criterionc¹in a manner similar to iter.

4. Associate a mailbox for processor iwith each segment numberedi.

12.5. APPLICATIONS 123 For example, we consider a database of trade history. A trade is a triple of type IdˆTimeˆPrice, where Id denotes trader’s identity numbers, Time denotes trade timestamps, andPrice denotes trade prices. The space partitioned by a given treeT is two-dimensional of typeId ˆTime and its subspaces are rectangles of type Rect_IdˆTime. A leaf subspace of T is a trade list of typeListIdˆTimeˆPrice. A querying point is a line segment onTime of typeIdˆPeriod, wherePeriod denotes the type of periods, and a query sums up the trade prices on this line segment. That is, this batch processing calculates the sales of traders during some period. The types and parameters ofiterfor this batch processing are defined as follows.

α“Rect_IdˆTime, β“ListIdˆTimeˆPrice, γ“IdˆPeriod,

δ“Price,

cpq, xq “isEmptypqXxq, kbpq, xq “0,

klppi, pq, yq “ÿ

rw| pj, t, wq Py, i“j, tPps, x‘y“x`y,

c_cpq, x, x_‚q “qĎx_‚, kcpq, x, x_‚q “0,

wherer. . .|. . .sis a notation for list comprehension andř

r. . .|. . .sdenotes list reduction with`.

If we perform distributed parallelization, we obtain a distributed database immediately. Since the insertion of trades to this distributed T does not necessitate mutual exclusion except for mailbox, it is therefore suited to data updated every day. Although balanced k-d trees are known to be appropriate for range queries, the advantage of our recursively segmented trees is that its (re)construction does not have to take the definition of space into account.

12.5.2 N-body Problem

TheN-body problem is to calculate the contributions amongN particles. Although the direct method costsOpN²qtime, the Barnes-Hut algorithm, which utilizes some approximation, costsOpNlogNqtime.

It typically uses an octree to hold N particles in the three-dimensional space. The main idea of the Barnes-Hut algorithm is to approximate faraway particles by using a subspace containing them. This is a pattern ofiterwhere a space-partitioning tree contain all querying points.

For example, in gravitation simulations, a query calculates a force of type Force, which is a three-dimensional vector. Particles (i.e., mass points) are type of Particle, which has to hold a pair of mass and position for gravitation. The subspaces of the three-dimensional space are type ofRect3. Each node of a given octree has a pair of its subspace and a particle that approximates the particles contained in its subspace. In addition to this pair, each leaf subspace has a particle list of type ListParticle. The types

and parameters ofiterfor the Barnes-Hut algorithm are defined as follows.

α“Rect3ˆParticle,

β“Rect3ˆParticleˆListParticle, γ“Particle,

δ“Force,

cpq, xq “isFarawaypq, xq, kbpq,pr, pqq “forcepp, qq,

klpq,pr, p, Lqq “ifcpq,pr, pqqthenforcepp, qqelse ÿ

rforcepp, qq |pPL, p‰qs x‘y“x`y,

c_cpq, x, x_‚q “isFarawaypq, x´x_‚q, kcpq, x, x_‚q “forcepq, pq,

wherepr, pq “x´x_‚,

where isFarawaypq, xq denotes a faraway criterion of particle q from subspace x, forcepp, qq denotes a force from particle pto particle q, and `denotes a vector addition. Although SPTree_α,β is a binary tree, it can represent any octree naturally by dividing each dimension successively.

In the definitions above ofcc andkc, we construct a one-hole subspace through the subtraction over Rect3ˆParticle. Although the subtraction overRect3is immediate, the subtraction overParticlerequires a consideration. Recall that the part of Particle denotes an approximation of the points contained in a subspace. In the Barnes-Hut algorithm, The total mass and the center of mass are usually used for this approximation. In this case, we can define the addition ofParticle as the calculation of total mass and that of the center of mass, and then define the subtraction as its cancellation operation. In addition, we have to defineisFarawaypq, xqfor not onlyxof typeRect₃ but the one-hole subspaces. Although a safe definition ofisFaraway is easy, a reasonable definition is important for eﬀective pruning.

12.5.3 Ray Tracing

Ray tracing is a popular problem in computer graphics. It traces lines of sight (i.e., view rays) from a viewpoint to objects settled in the three-dimensional space. An intersection point of view rays to objects leads to a pixel of a resultant image. A given set of objects settled in the three-dimensional space is called ascene.

In ray tracing algorithms, a bounding volume hierarchy (BVH) is often used for detecting intersections in a scene. The BVH decomposes a bounding box into finer ones and holds an object at a leaf. It enables us to prune intersection tests.

Photon mapping is the most popular algorithm for ray tracing. It scatters many photons over a scene in advance and stores them usually into a k-d tree. Then, it collects photons around an intersection point from thek-d tree and estimates the illumination at the point.

We can apply iterto both cases. In both cases, subspaces are type ofRect₃.

Each leaf of a BVH holds an objects of typeObject, which is a supertype ofRect3. A querying point is a view ray of type Ray. A query calculates an intersection position of typePosition. In the case of

12.5. APPLICATIONS 125 BVHs, the types and parameters ofiterare defined as follows.

α“Rect3, β“Object, γ“Ray, δ“Position,

cpq, xq “ pintersectpq, xq ”p₈q, kbpq, xq “p₈,

klpq, yq “intersectpq, yq, x‘y“nearerpx, yq,

ccpq, x, x_‚q “ pintersectpq, x´x_‚q ”p₈q, k_cpq, x, x_‚q “p₈,

where intersect of type Ray ˆObject Ñ Position calculates an intersection position, nearer of type PositionˆPosition ÑPositionreturns the position nearer to the viewpoint, andp₈denotes the position of the point at infinity. Each query andintersect returnp₈ to denote no intersection.

In photon mapping, photons are stored in the leaves of a k-d tree and a query calculates k-nearest neighbors of a given point under some bounded range [Jen00]. To calculate a set ofk-nearest neighbors, a size-bounded mergeable heap is useful. We consider an abstract data type Heap^k_α that denotes a mergeable heap ofk elements of typeα. Its constructorinitHeap takes an initial set of elements and a weight functionwto calculate the weight of an element for sorting. Themerge operation is defined over heaps that have the same weight function. Since the range of neighbors is bounded, a query point is a ball of typeBall that consists of a center position and a radius. LetPhoton be the type of photons, we can define the types and parameters ofiteras follows:

α“Rect3, β“Photon, γ“Ball, δ“Heap^k_Photon, cpq, xq “isEmptypqXxq, kbpq, xq “ H,

k_lpq, pq “initHeapptpu,distFrom_qq, H‘H¹“mergepH, H¹q,

ccpq, x, x_‚q “qĎx_‚, kcpq, x, x_‚q “ H,

wheredistFromb denotes a function that calculate a distance from the center of ballbto a given photon.

12.5.4 Nearest-Neighbor Classifiers

In nearest-neighbor querying by iter, a bounded range of neighbors is important because it is used in top-down pruning. Without this bounded range, a query point would traverse the whole tree and incur a terrible slowdown. While bounded ranges are natural in photon mapping,k-nearest-neighbor classifiers, which are extensively used in pattern recognition, do not assume bounded ranges in general.

In the unbounded-range case, we generally perform bottom-up pruning by using an intermediate result of k-nearest neighbors. iter cannot deal with this bottom-up pruning. One approach to dealing with this limitation is multi-pass top-down pruning. Specifically, in the first pass, we find a leaf on the basis of a querying point with an empty range and then approximate a range of neighbors from the point(s) in the found leaf. In the second pass, we perform a range-bounded k-nearest-neighbor query

with the approximated range. If we do not obtain k neighbors, we retry a range-bounded k-nearest-neighbor query by relaxing the current range bound a little. In the after third passes, one-hole range queries suﬃce for accumulating nearest neighbors.

We can implement bottom-up pruning itself straightforwardly on (buﬀered) recursively segmented trees. In this case, it will be reasonable to perform bottom-up pruning at the root of each traversed segment on each level. This, however, complicates implementation compared to iter. Because last-level segments would provide reasonably approximated ranges in iterative top-down pruning, it would be suﬃcient fork-nearest-neighbor queries to extenditerin a segment-aware manner.

ドキュメント内 Structured Parallel Programming with Trees （木を用いた構造化並列プログラミング） (ページ 134-138)