Operator Generation and Code Generation
Since e00, e01, e10, and e11 are equivalent to their bodies, it is immediate to generate the primary operators. After augmented coefficient matrices and matrix data types are obtained, we can generate auxiliary operators simply by using symbolic matrix multiplication, following the general forms described in Section 4.2. Note that the special variables v, l, and r are captured as the formal parameters of generated operators.
The final code generation is straightforward. Because of high compatibility between C89 and C++03, we have been able to reuse the C code generator of COINS for code generation. The code generation of helper functions and matrix data types is completely performed by the C code generator. We implemented an ad hoc C++ code generator for entry functions and operator definitions as a simple extension of the C code generator because they necessitate C++ functionality.
From the augmented coefficient matrix above of mis, generated is the following function (more pre-cisely, a member function of C++ function objects) that corresponds toψ
rs.
h i r _ t _ v o i d o p e r a t o r ()( s t r u c t _ m i s _ m a t & m , c o n s t h i r _ t _ i n t v ) c o n s t {
h i r _ s _ p r i v a t e h i r _ v _ a u t o h i r _ t _ i n t _ v a r 4 3 ; h i r _ s _ p r i v a t e h i r _ v _ a u t o h i r _ t _ i n t _ v a r 4 5 ; h i r _ s _ p r i v a t e h i r _ v _ a u t o h i r _ t _ i n t _ v a r 4 7 ; h i r _ s _ p r i v a t e h i r _ v _ a u t o h i r _ t _ i n t _ v a r 4 9 ; _ v a r 4 3 = ( h i r _ _ _ A D D ( v , m . _ 1 _ 0 ));
_ v a r 4 5 = ( h i r _ _ _ A D D ( v , m . _ 1 _ 1 ));
_ v a r 4 7 = max ( m . _0_0 , m . _ 1 _ 0 );
_ v a r 4 9 = max ( m . _0_1 , m . _ 1 _ 1 );
m . _0 _ 0 = _ v a r 4 3 ; m . _0 _ 1 = _ v a r 4 5 ; m . _1 _ 0 = _ v a r 4 7 ; m . _1 _ 1 = _ v a r 4 9 ; }
The matrix argumentmis overwritten with a return value because passed data is unnecessary inreduce.
Right before code generation, our parallelizer excludes symbols inconvenient to C++ programs. For example,wchar_t is a typedefed type in C but a keyword in C++. Ifwchar_t were generated as C, resultant programs would have a syntactic error in C++. We therefore have to exclude wchar_t from code generation. Similarly, we exclude symbols specified in#pragma excludefrom code generation.
4.4. UNDERLYING BINARY TREE SKELETON 43 Left denotes a node having only the left child andRight denotes a node having only the right child. We can definereduceoverGBinTreeα as
reducepk11, k10, k01, k00,Branchpx, l, rqq “k11preducepk11, k10, k01, k00, lq, x,reducepk11, k10, k01, k00, rqq, reducepk11, k10, k01, k00,Leftpx, lqq “k10preducepk11, k10, k01, k00, lq, xq,
reducepk11, k10, k01, k00,Rightpx, rqq “k01px,reducepk11, k10, k01, k00, rqq, reducepk11, k10, k01, k00,Leafpxqq “k00pxq.
Parametersk11,k10,k01, and k00 are respectively the interpretations ofBranch,Left, Right, andLeaf. We can define the segmented version ofGBinTreeα as
GSegTreeα“BranchpGContextα,GSegTreeα,GSegTreeαq, GSegTreeα“LeftpGContextα,GBinTreeαq,
GSegTreeα“RightpGContextα,GBinTreeαq, GSegTreeα“LeafpGBinTreeαq,
GContextα“Branch1px,GContextα,GBinTreeαq, GContextα“Branch2px,GBinTreeα,GContextαq, GContextα“Leftpx,GContextαq,
GContextα“Rightpx,GContextαq, GContextα“Holepxq.
We can also definereduce overGSegTreeα.
reduce:pbˆaˆbÑbq ˆ pbˆaÑbq ˆ paˆbÑbq ˆ paÑbq ˆ paÑcq ˆ pbˆcˆbÑbq ˆ pbˆcÑbq ˆ pcˆbÑbq ˆ pcˆcˆbÑcq ˆ pbˆcˆcÑcq ˆ pcˆcÑcq ˆ pcˆcÑcq ˆGBinTreeαÑb
reducepk11, k10, k01, k00, τ, ϕ11, ϕ10, ϕ01, ψl, ψr, ψsl, ψ
rs, tq “redptq, whereredpBranchpx, l, rqq “ϕ11predplq,redCtxpcq,redprqq
redpLeftpc, lqq “ϕ10predplq,redCtxpcqq redpRightpc, rqq “ϕ01predCtxpcq,redprqq redpLeafptqq “reducepk11, k10, k01, k00, tq
redCtxpBranch1px, l, rqq “ψlpredCtxplq, τpxq,reducepk11, k10, k01, k00, rqq, redCtxpBranch2px, l, rqq “ψrpreducepk11, k10, k01, k00, lq, τpxq,redCtxprqq, redCtxpLeftpx, lqq “ψ
slpredCtxplq, τpxqq, redCtxpRightpx, rqq “ψ
rspτpxq,redCtxprqq redCtxpHolepxqq “τpxq,
and algebraic conditions are as follows:
k11px, l, rq “ϕ11pl, τpxq, rq, ϕ11pϕ11pl1, y1, r1q, y, rq “ϕ11pl1, ψlpy1, y, rq, r1q,
ϕ11pϕ10pl1, y1q, y, rq “ϕ10pl1, ψlpy1, y, rqq, ϕ11pϕ01py1, r1q, y, rq “ϕ01pψlpy1, y, rq, r1q, ϕ11pl, y, ϕ11pl1, y1, r1qq “ϕ11pl1, ψrpl, y, y1q, r1q,
ϕ11pl, y, ϕ10pl1, y1qq “ϕ10pl1, ψrpl, y, y1qq, ϕ11pl, y, ϕ01py1, r1qq “ϕ01pψrpl, y, y1q, r1q,
k10pl, xq “ϕ10pl, τpxqq, ϕ10pϕ11pl1, y1, l1q, yq “ϕ11pl1, ψslpy1, yq, r1q,
ϕ10pϕ10pl1, y1q, yq “ϕ10pl1, ψslpy1, yqq, ϕ10pϕ01py1, r1q, yq “ϕ01pψslpy1, yq, r1q,
k01px, rq “ϕ01pτpxq, rq, ϕ01py, ϕ11pl1, y1, r1qq “ϕ11pl1, ψ
srpy, y1q, r1q, ϕ01py, ϕ10pl1, y1qq “ϕ10pl1, ψ
srpy, y1qq, ϕ01py, ϕ01py1, r1qq “ϕ01pψ
rspy, y1q, r1q.
Auxiliary operatorsϕ11,ϕ10,ϕ01,ψl,ψr,ψsl, andψ
rsare respectively the interpretations ofBranch,Left, andRight forGSegTreeα, andBranch1,Branch2,Left, and Right forGContextα.
ψsl andψ
rscorrespond to theCompressoperation from the perspective of tree reduction. However, ψsl and ψ
sr contain matrix construction, which corresponds to the Rake operation. We can therefore consider thatψsl andψ
rsrespectively correspond toShuntl andShuntr, where theRake operation is applied to a missing leaf in the form of matrix construction with no value of recursive calls.
To best our knowledge, the skeleton identical to reduce over GBinTreeα has not been presented.
However, we can derive its definition (including its algebraic conditions) straightforwardly from the notion of segmented trees and that of tree contraction operations.
4.4.2 Specialization to Multilinear Computations
We can specialize reduce over GBinTreeα to multilinear computations. Let Vβ be the type of vectors whose components are of typeβ andMβ be the type of matrices whose components are of type β.
First, from the definition (4.1) of multilinear functions, we can determine the types of the primary operators as follows.
k11:VβˆαˆVβÑVβ
k10:VβˆαÑVβ
k01:αˆVβ ÑVβ
k00:αÑVβ
Next, as mentioned in Section 4.2, we consider extended payload values of typeαˆMβ. On the basis of the meaning of extended payload values, we can define the following auxiliary operators:
ϕ11:Vβˆ pαˆMβq ˆVβÑVβ
ϕ11pyl,px, Mq,yrq “M k11pyl, x,yrq, ϕ10:Vβˆ pαˆMβq ÑVβ
ϕ10pyl,px, Mqq “M k10pyl, xq, ϕ01:pαˆMβq ˆVβ ÑVβ
ϕ01ppx, Mq,yrq “M k01px,yrq.
4.4. UNDERLYING BINARY TREE SKELETON 45 The operators above are simple instances of operators for reduce and not specialized instances because we obtain them through type substitution.
We can specialize the definitions ofψl,ψr, ψsl, andψ
rs. An important point of specialization is that we introduce extended payload values only for remaining the payload values of hole nodes. If we store on its root the payload value of the hole of a one-hole context, we do not have to introduce extended payload values. In this case, the four auxiliary operators can be simplified as follows:
ψl:MβˆαˆVβÑMβ
ψlpM, x,yrq “AlM, ψr:VβˆαˆMβÑMβ
ψrpyl, x, Mq “ArM, ψsl:MβˆαÑMβ
ψslpM, xq “AslM, ψrs:αˆMβ ÑMβ
ψrspx, Mq “A
rsM, wherex,yl, andyrare implicitly used in Al, Ar, Asl, and A
sr.
Then, the auxiliary operatorτ to lift payload values becomes unnecessary because we reduce a one-hole context in a sequential bottom-up manner. Instead, we lift every one-hole node to the identity matrix I. We thus can reduce an one-hole context to a single matrix by usingψl,ψr, ψsl, ψ
rs, and the primary operators. We construct an extended payload value for each hole node by pairing the matrix to which a one-hole context reduces and the payload value of its hole. After that, we reduce a global tree by using ϕ11,ϕ10, andϕ01 successfully.
4.4.3 Library Implementation
We implemented reduce over GSegTreeα as a C++ library with MPI2, on the basis of an array-based implementation [Mat07a] of tree skeletons. Our implementation is a straightforward extension of the original implementation [Mat07a] for full binary trees to that for general binary trees.
Our library implementation utilizes C++ templates extensively. C++ templates enabled us to im-plement the specialization to multilinear computations easily. What we did for the specialization was only to specialize the reduction of one-hole contexts. After including it in the basic implementation, the instantiation with specialized operator types guidesreduceto the specialized implementation.
Our library implementation assumes programming based upon the SPMD (single-program multiple-data) model with MPI, as in existing data-parallel skeleton libraries [SM14a, EM14]. The functions config::initandconfig::finzalizethat our library provides (see Figure 4.2) are indeed the wrappers ofMPI_initandMPI_finalize.
Our library is assumed to be used only from entry functions. Entry functions are desired to select an appropriate implementation of skeletons from user’s input. Because entry functions, where generated operators are used implicitly from programmers, take only a binary tree data structure, it is natural that the implementations of binary tree data structures specify those of skeletons. If we can implement entry functions as generic functions, the code generation of our parallelizer becomes simple and the generated programs become convenient in C++. If there is a clear interface between entry functions and tree skeletons, we can achieve loose coupling between our parallelizer and our library. To achieve these, we have designed our library to enable callingreducevia tree data structures. As a result, we can implement an entry functionfas follows.
t e m p l a t e < c l a s s BinTree >
s t r u c t v e c t o r _ t y p e f ( c o n s t B i n T r e e & t ) {
r e t u r n B i n T r e e :: s k e l e t o n s :: r e d u c e (/∗ o p e r a t o r s o m i t t e d ∗/, t ) }
2http://www.mpi-forum.org/
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5
0 5 10 15 20 25
Execution time (seconds)
Number of processes par seq
Figure 4.7: Execution time of reduce for mis described in Figure 4.1, where seq means reduce over GBinTreeint andparmeansreduceoverGSegTreeint. Time for tree construction is not contained.
On the basis of the genericity of templates, f is generic with respect to the parameter type BinTree.
GivenBinTree determines the implementation ofreduce.