
Volume 2011, Article ID 476939, 20 pages, doi:10.1155/2011/476939

Research Article

Inapproximability and Polynomial-Time Approximation Algorithm for UET Tasks on Structured Processor Networks

M. Bouznif¹ and R. Giroudeau²

¹ Laboratoire G-SCOP, 46 avenue Félix Viallet, 38031 Grenoble Cedex 1, France

² LIRMM, 161 rue Ada, UMR 5056, 34392 Montpellier Cedex 5, France

Correspondence should be addressed to R. Giroudeau, [email protected]

Received 26 October 2010; Revised 22 March 2011; Accepted 4 April 2011

Academic Editor: Ching-Jong Liao

Copyright © 2011 M. Bouznif and R. Giroudeau. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We investigate complexity and approximation results on processor networks where the communication delay depends on the distance between the processors performing the tasks. We prove that there is no heuristic with a performance guarantee smaller than 4/3 for makespan minimization with a precedence graph on a large class of processor networks (hypercube, grid, torus, and so forth) with a fixed diameter $\delta \in \mathbb{N}$. We extend the complexity results to the case where the precedence graph is a bipartite graph. We also design an efficient polynomial-time $O(\delta^2)$-approximation algorithm for makespan minimization on processor networks with diameter $\delta$.

1. Introduction

1.1. Problem Statement

In this paper, we consider the processor network model, which is a generalization of the model in which task allocation on the processors does not have any influence over the length of the schedule. Indeed, in that model the graph of processors, denoted hereafter $G^* = (V^*, E^*)$, where $V^* = \{\pi^1, \ldots, \pi^m\}$ is a set of $m$ processors and $E^*$ is the set of links between them, is fully connected. The starting time of a task $i$ therefore depends only on the potential communication delay, given by the precedence graph, between $i$ and its own predecessors.

In the processor network model, this assumption is relaxed in order to take into account the fact that the processor graph may not be fully connected. Thus, task allocation on the processors can be expressed by its essential and fundamental characteristics. We consider a model in which a distance function (defined hereafter), denoted $d(\pi^l, \pi^h)$, between two processors $\pi^l$ and $\pi^h$ in the graph of processors affects the communication delay between two tasks $i$ and $j$ subject to a precedence constraint, and consequently the starting time of task $j$. The communication time used for computing the starting time of a task is denoted $c_{(i, \pi^l),(j, \pi^h)}$: this notation indicates the value of the communication delay between task $i$, which is allotted to processor $\pi^l$, and task $j$, which will be executed on processor $\pi^h$. This value is assumed to be $c_{ij}\, d(\pi^l, \pi^h)$, where $c_{ij}$ is the communication delay given by the precedence graph.

Figure 1: Difference between the problems $P \mid \mathrm{prec}; c_{ij} = 1; p_i = 1 \mid C_{\max}$ and $P, \mathrm{grid}\ 2 \times 2 \mid \mathrm{prec}; c_{ij} = d(\pi^i, \pi^j); p_i = 1 \mid C_{\max}$ (Gantt diagrams $G_1$ and $G_2$ for tasks $a, \ldots, e$ on processors $\pi^0, \ldots, \pi^3$, time axis from 0 to 3).

Formally, the processor network model may be defined as

$$\forall (i, j) \in E, \quad t_j \ge t_i + p_i + c_{ij}\, d(\pi^l, \pi^h), \tag{1.1}$$

where $\pi^l$ (resp. $\pi^h$) represents the processor on which task $i$ (resp. task $j$) is scheduled, $t_i$ represents the starting time of task $i$, $p_i$ represents the processing time of task $i$, $d(\pi^l, \pi^h)$ represents the length of the shortest path between $\pi^l$ and $\pi^h$ in the graph of processors $G^* = (V^*, E^*)$, and $c_{ij}$ represents the communication delay incurred if the two tasks are executed on two neighboring processors (this value is given by the precedence graph).
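Constraint (1.1) can be checked mechanically for a given allocation. The following is a minimal sketch, not part of the original paper: the function names `distances` and `is_feasible`, the adjacency-list encoding of $G^*$, and the two-task example are all illustrative assumptions. It computes $d(\pi^l, \pi^h)$ by breadth-first search and verifies (1.1) together with the requirement that a processor runs one task at a time.

```python
from collections import deque

def distances(adj, src):
    """Hop distances from src in an undirected processor graph (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_feasible(adj, arcs, start, proc, p=1, c=1):
    """Check constraint (1.1) for identical processing time p and delay c."""
    dist = {u: distances(adj, u) for u in adj}
    # A processor executes at most one task at a time.
    by_proc = {}
    for task, t in start.items():
        by_proc.setdefault(proc[task], []).append(t)
    for times in by_proc.values():
        times.sort()
        if any(b - a < p for a, b in zip(times, times[1:])):
            return False
    # t_j >= t_i + p_i + c_ij * d(pi^l, pi^h) for every arc (i, j).
    return all(
        start[j] >= start[i] + p + c * dist[proc[i]][proc[j]]
        for i, j in arcs
    )

# Hypothetical example on the 2 x 2 grid of Example 1.1, where
# pi^0 and pi^3 are each connected to pi^1 and pi^2.
grid = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
arcs = [("x", "y")]
print(is_feasible(grid, arcs, {"x": 0, "y": 3}, {"x": 0, "y": 3}))  # True
print(is_feasible(grid, arcs, {"x": 0, "y": 2}, {"x": 0, "y": 3}))  # False
```

In the second call the successor is placed two hops away but starts only two time units after its predecessor, so the delay of $1 + 2$ required by (1.1) is violated.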

We consider the classic UET-UCT scheduling problem (Unit Execution Time-Unit Communication Time, i.e., $\forall i \in V$, $p_i = 1$, and $\forall (i, j) \in E$, $c_{ij} = 1$) on a bounded number of processors such that the processor network is a structured graph with a diameter $\delta$. In these topologies, processors are numbered $\pi^1, \pi^2, \ldots, \pi^m$, and processor $\pi^h$ may communicate with processor $\pi^l$ at a communication cost equal to $d(\pi^h, \pi^l)$, where $d(\pi^h, \pi^l)$ represents the length of the shortest path in graph $G^*$ between processors $\pi^h$ and $\pi^l$. The communication delay is therefore the distance function proposed above.

In scheduling theory, a problem type is categorized by its machine environment, job characteristics, and objective function. We use the three-field notation scheme $\alpha \mid \beta \mid \gamma$ proposed by Graham et al. [1], where $\alpha$ designates the processor environment, $\beta$ the characteristics of the jobs, and $\gamma$ the criterion. We consider the problem of makespan minimization (denoted in what follows by $C_{\max}$) with unitary tasks and unitary communication delays (UET-UCT) in the presence of a precedence graph $G$, on a processor network having a graph $G^*$ such that the communication delay depends on the shortest path in $G^*$. This problem is denoted by $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max}$.

Example 1.1. Figure 1 shows the difference between the two problems $P \mid \mathrm{prec}; c_{ij} = 1; p_i = 1 \mid C_{\max}$ and $P, \mathrm{grid}\ 2 \times 2 \mid \mathrm{prec}; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max}$. The relationship between processors is as follows: $\pi^0$ and $\pi^3$ are connected to $\pi^1$ and $\pi^2$. The processing time of the tasks and the communication delay between the tasks are unitary (UET-UCT problem). Gantt diagram $G_1$ represents an optimal solution for the $P \mid \mathrm{prec}; c_{ij} = 1; p_i = 1 \mid C_{\max}$ problem. We can notice that task $e$ can be executed on any processor at $t = 2$. Gantt diagram $G_2$ represents an optimal solution for the problem $P, \mathrm{grid}\ 2 \times 2 \mid \mathrm{prec}; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max}$. In order to obtain an optimal solution, task $a$ must be delayed by one unit of time and must be processed on the same processor $\pi^2$ as task $c$, at $t = 1$. Thus, task $e$ may be executed at $t = 2$ only on processor $\pi^2$.

Table 1: Previous complexity results on the processor network model.

Topology          Precedence graph   Complexity     Reference
Unbounded chain   Tree               NP-complete    [2]
                  Antitree           NP-complete
Star              Tree               NP-complete    [2]
Cycle/chain       Prec               ρ ≥ 4/3        [4]
Star              Prec               ρ ≥ 6/5        [3]

1.2. Organization of the Paper

This paper is organized as follows: the next section is devoted to related work.

In Section 3, after defining the graph class $\mathcal{G}$, we propose a general nonapproximability result for an unspecified precedence graph. We also extend this result to the case where the precedence graph is a bipartite graph and to the case where duplication is allowed. In the last section, we design a polynomial-time approximation algorithm with a performance ratio within $O(\delta^2)$.

2. Related Works

2.1. Complexity Results

To the best of our knowledge, the first complexity result was given by Picouleau [2]. The considered problem was to schedule unit execution time tasks with a precedence graph on an unbounded number of processors and on a chain or star topology (a star is a tree of depth one). Picouleau proved that this problem is NP-complete if the precedence graph is a tree or an outtree. Recently, in [3], the authors proved that there is no heuristic with a performance guarantee smaller than 6/5 for minimizing the makespan on a processor network represented by a star. This model is closest to the master-slave architecture. In [4], the authors proved that there is no hope of finding a polynomial-time approximation algorithm with a ratio $\rho < 4/3$ for the problem of scheduling a set of tasks on a ring or a chain as the processor network (see Table 1).

2.1.1. Approximation Results

For the ring topology, Lahlou developed in [5], using the list scheduling proposed by Rayward-Smith [6], a $\rho$-approximation algorithm with $\sqrt{m} \le \rho \le 1 + \frac{3}{8}m - \frac{1}{2m}$, where $m$ is the number of processors.

Moreover, Hwang et al. [7] studied approximation list algorithms for scheduling problems where the communication times depend on contention and on a distance function for the tasks involved and the processors that execute the tasks. The authors examined a simple strategy called extended list scheduling (ELS), which is a straightforward extension of list scheduling. They proved that the ELS strategy is unsatisfactory and proposed an improved strategy called earliest task first.

Recently, in [3], the authors proposed a sophisticated three-step polynomial-time approximation algorithm with a ratio equal to four for the makespan minimization problem on a processor network forming a star. In [4], the authors developed two polynomial-time approximation algorithms for processor networks with limited or unlimited resources.

2.2. Our Contributions

In this paper, we answer the following question: is there a large class of graphs for which there exists a polynomial-time reduction from 3-PARTITION showing NP-completeness?

If so, it is sufficient to show that a graph $G^*$ belongs to this class in order to prove the nonexistence of a PTAS. To complete the study of processor networks, we design a polynomial-time approximation algorithm with a ratio of at most $(\delta + 1)^2/3 + 1$, where $\delta$ designates the diameter of the graph $G^*$.

3. Computational Complexity for a Large Class of Graphs

3.1. The Graph Class $\mathcal{G}$

We propose a large class of graphs $\mathcal{G}$ for which the problem of deciding whether an instance of $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max} \le 3$ admits a solution is NP-complete.

We now present a graph class for which the same polynomial-time transformation from the 3-PARTITION problem applies, showing that our scheduling problem is NP-complete when the processor network belongs to this class. Hereafter, we give the definition of the prism graph.

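Definition 3.1. A prism $P = (V_P, E_P)$ of size $k$ and length $L$ ($k, L \in \mathbb{N}$) is a connected undirected graph such that

(i) there are two sets of vertices $K$ and $K'$ such that $K \subset V_P$, $K' \subset V_P \setminus K$, and $|K| = |K'| = k$. The vertices are denoted $s_1, \ldots, s_k$ (resp. $s'_1, \ldots, s'_k$);

(ii) there exists an order on the vertices of $K$ and $K'$ such that, $\forall s_i \in K$, $s'_i \in K'$, $1 \le i \le k$, there is a path of length $L$, denoted $C_i$, between $s_i$ and $s'_i$;

(iii) $i \ne j \wedge x \in C_i \setminus \{s_i, s'_i\} \wedge y \in C_j \setminus \{s_j, s'_j\} \Rightarrow (x, y) \notin E_P$.

Moreover, the size of a prism is polynomial in $k$. An illustration is given in Figure 2.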

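Definition 3.2. Let $\mathcal{G}$ be a collection of graphs. $\mathcal{G}$ possesses the prism property if and only if $\forall n_0, \forall n_1 \in \mathbb{N}$, $\exists G \in \mathcal{G}$, $G = (V, E)$, such that $G$ contains a unique subgraph $G_1 = (V_1, E_1)$, induced by vertices $V_1 \subset V$, that is a prism of size $k = n_0$ and length $L = n_1$.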

Figure 2: An example of a prism of size $k$ and length $L$ (legend: needed edges, examples of authorized edges, examples of forbidden edges, vertices in $K$ or $K'$, vertices not in $K \cup K'$).

Lemma 3.3. The graph class $\mathcal{G}$ is not empty.

Proof. In particular, we will see in Section 3.2 that classic structured graphs such as the torus, the grid, the complete binary tree, and so forth, belong to this graph class.

Theorem 3.4. The problem of deciding whether an instance of $P, G^* \mid \beta; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max}$ has a schedule of length at most two is polynomial with $\beta \in \{\mathrm{prec}, \mathrm{bipartite}\}$ and $G^* \in \mathcal{G}$.

Proof. In a schedule of length at most two, no communication is allowed between any pair of tasks.

The remainder of this section is devoted to proving Theorem 3.5.

Theorem 3.5. The problem of deciding whether an instance of $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^k, \pi^l); p_i = 1 \mid C_{\max}$ has a schedule of length at most three is NP-complete with $G^* \in \mathcal{G}$.

Proof. The proof is established by a reduction from the 3-PARTITION problem [8].

Instance

A finite set $A$ of $3M$ elements $\{a_1, \ldots, a_{3M}\}$, a bound $B \in \mathbb{N}$, and a size $s(a) \in \mathbb{N}$ for each $a \in A$ such that each $s(a)$ satisfies $B/4 < s(a) < B/2$ and such that $\sum_{a \in A} s(a) = MB$.

Question 1. Can $A$ be partitioned into $M$ disjoint sets $A_1, \ldots, A_M$ such that, for all $i \in \{1, \ldots, M\}$, $B = \sum_{a \in A_i} s(a) = \sum_{a \in A} s(a)/M \in \mathbb{N}$?

3-PARTITION is known to be NP-complete in the strong sense [8]. Even if $B$ is polynomially bounded by the instance size, the problem is still NP-complete.
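To make the source problem of the reduction concrete, here is a small exhaustive solver for 3-PARTITION. It is only an illustrative sketch (the function name `three_partition` and the list-of-sizes encoding are assumptions, and the search is exponential, as expected for a strongly NP-complete problem); it is not used anywhere in the proof.

```python
from itertools import combinations

def three_partition(sizes):
    """Return index triples that each sum to B, or None if none exists."""
    if len(sizes) % 3:
        return None
    m = len(sizes) // 3
    total = sum(sizes)
    if m == 0 or total % m:
        return None
    bound = total // m  # the bound B of the instance
    # B/4 < s(a) < B/2 forces every block to contain exactly three elements.
    if not all(bound < 4 * s and 2 * s < bound for s in sizes):
        return None

    def solve(remaining):
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        # The first unassigned element must belong to some triple.
        for j, k in combinations(rest, 2):
            if sizes[first] + sizes[j] + sizes[k] == bound:
                tail = solve([x for x in rest if x not in (j, k)])
                if tail is not None:
                    return [(first, j, k)] + tail
        return None

    return solve(list(range(len(sizes))))

print(three_partition([4, 5, 6, 4, 5, 6]))  # [(0, 1, 2), (3, 4, 5)]
print(three_partition([4, 4, 4, 6, 6, 6]))  # None
```

Both examples have $M = 2$ and $B = 15$; in the second one no triple of sizes drawn from $\{4, 6\}$ sums to 15, so the answer is negative.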

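It is easy to see that the problem of deciding whether an instance of $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max}$ has a schedule of length at most three is in NP.

Given an instance $I$ of the 3-PARTITION problem, we construct an instance $I'$ of the scheduling problem $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max} \le 3$ with $G^* \in \mathcal{G}$, in the following way.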

Figure 3: Graph $Z^2$ (tasks $Z^2[k, 0]$ and $Z^2[k, 1]$ for $0 \le k \le 3$).

The precedence graph $G = W \cup Z$, which will be scheduled on the processor network $G^*$, is decomposed into two disjoint graphs, denoted in what follows by $W$ and $Z$ (the graph $Z$ is a collection of graphs $Z^{s(a_j)}$, i.e., $Z = \bigcup_{a_j \in A} Z^{s(a_j)}$). Hereafter, graphs $Z$ and $W$ are characterized.

Graph $Z^i$

Let $i$ be an integer such that $i > 1$. Graph $Z^i$ consists of $4 \times i$ vertices denoted by $Z^i[k, 0]$, $Z^i[k, 1]$, where $0 \le k < 2i$. The precedence constraints between these tasks are defined as follows:

(i) arcs $Z^i[j, 0] \to Z^i[j, 1]$ for any $j$, $0 \le j \le 2i - 1$,

(ii) arcs $Z^i[2j, 0] \to Z^i[2j + 1, 1]$ for any $j$, $0 \le j \le i - 1$,

(iii) arcs $Z^i[2j, 0] \to Z^i[2j - 1, 1]$ for any $j$, $1 \le j \le i - 1$.

Remark 3.6. A valid schedule of length three for the case where the precedence graph is $Z^i$, on a path of $2i$ processors, is as follows, for any $j$, $0 \le j \le 2i - 1$:

(i) tasks $Z^i[j, 0]$ and $Z^i[j, 1]$ are executed on $\pi^j$,

(ii) tasks $Z^i[j, \epsilon]$ are executed at time $\epsilon$, for any $\epsilon \in \{0, 1\}$, if $j$ is even,

(iii) tasks $Z^i[j, \epsilon]$ are otherwise executed at time $\epsilon + 1$, for any $\epsilon \in \{0, 1\}$.

See Figure 3 for graph $Z^2$ and Figure 4 for the valid schedule described in Remark 3.6.
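The schedule of Remark 3.6 can be verified programmatically. The sketch below is illustrative only: tasks $Z^i[k, \epsilon]$ are encoded as pairs `(k, eps)`, the function names are assumptions, and the processors are assumed to form a path, so that $d(\pi^a, \pi^b) = |a - b|$.

```python
def z_graph(i):
    """Arcs of Z^i over tasks (k, eps), 0 <= k < 2i, eps in {0, 1}."""
    arcs = [((j, 0), (j, 1)) for j in range(2 * i)]            # arcs (i)
    arcs += [((2 * j, 0), (2 * j + 1, 1)) for j in range(i)]    # arcs (ii)
    arcs += [((2 * j, 0), (2 * j - 1, 1)) for j in range(1, i)]  # arcs (iii)
    return arcs

def remark_schedule(i):
    """Start time and processor index of each task, following Remark 3.6."""
    start, proc = {}, {}
    for j in range(2 * i):
        for eps in (0, 1):
            start[(j, eps)] = eps if j % 2 == 0 else eps + 1
            proc[(j, eps)] = j
    return start, proc

def check(i):
    """True if the schedule is feasible on a processor path with length <= 3."""
    arcs = z_graph(i)
    start, proc = remark_schedule(i)
    # Unit tasks, unit delay scaled by the path distance |a - b|.
    ok_prec = all(
        start[v] >= start[u] + 1 + abs(proc[u] - proc[v]) for u, v in arcs
    )
    # No two tasks share a (processor, time slot) pair.
    ok_slots = len({(proc[t], start[t]) for t in start}) == len(start)
    makespan = max(start.values()) + 1
    return ok_prec and ok_slots and makespan <= 3

print(check(2), check(5))  # True True
```

Each processor $\pi^j$ leaves exactly one of its three time slots unused (slot 2 if $j$ is even, slot 0 if $j$ is odd).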

Graph $W$

Remark 3.7. A path of length $l$ admits $l + 1$ vertices.

The graph $W = (\mathcal{V} \cup \mathcal{V}', E_W)$ is defined as follows. Let $G^* = (V^*, E^*)$ be a graph such that $G^* \in \mathcal{G}$, with $V^* = \{v^*_1, \ldots, v^*_{n^*}\}$. By Definition 3.2, we know that there exists a unique subgraph $G = (V \subset V^*, E \subset E^*)$ of size $k$ and length $L$ with the desired properties. In the following we set $k = n$ and $L = 2B + 1$, and the size of $G^* = (V^*, E^*)$ is polynomial in $k$. Note that $n^* > 2B$.

The $W$-graph is defined by polynomial-time transformations from the $G^*$-graph. The graph given in Figure 5 will be used to illustrate the following construction.

(i) Paths of three tasks are created and precedence constraints are added (see Figure 6). The two sets of tasks $\mathcal{V}_1$ and $\mathcal{V}'$ are created.

(ii) The tasks are partitioned into three subsets $\mathcal{V}'$, $\mathcal{K}$, and $\mathcal{V}$ (see Figure 7).

(iii) The $\mathcal{V}_1$-tasks are thus partitioned into two subsets $\mathcal{K}$ and $\mathcal{V}$. We consider the subgraph induced by the $\mathcal{V} \cup \mathcal{V}'$-tasks (see Figure 8) as the $W$-graph.

Figure 4: Valid schedule of length three for graph $Z^2$ (processors $\pi^0, \ldots, \pi^3$, time axis $t$ from 0 to 3).

Figure 5: The beginning of the construction of the $W$ graph from $G^* \in \mathcal{G}$ (the graph $G^*$ with its $V'$-vertices and $V$-vertices; the $G$-graph is induced by the $V$-vertices).

The purpose of removing these tasks is to free time slots in which the tasks of the $Z$-graph can be executed when the tasks of the $W$-graph, deprived of the removed tasks, are executed on the graph of processors.

The set of vertices $V^*$ is partitioned into two sets, $V^* = V \cup V'$:

(i) $V = \{v^*_1, \ldots, v^*_{2n(B+1)}\}$, the vertices of $G$, which are the vertices of the $n$ unique paths of length $2B + 1$ respecting the characteristics given by Definition 3.1,

(ii) $V' = \{v^*_{2n(B+1)+1}, \ldots, v^*_{n^*}\}$, the set of the other vertices. Note that these vertices do not belong to the $G$ graph.

Figure 6: Next step of the construction of the $W$ graph. Paths of three tasks are created and precedence constraints between tasks are added ($\mathcal{V}'$ tasks and $\mathcal{V}_1$ tasks).

Figure 7: Partition of the tasks built from the $G^*$ graph into the task sets $\mathcal{V}'$, $\mathcal{K}$, and $\mathcal{V}$.

Figure 8: The final $W$ graph resulting from the successive transformations ($\mathcal{V}'$ tasks and $\mathcal{V}$ tasks).

The definition of the $W$ graph is given below.

(i) $\forall i \in \{1, \ldots, 2n(B+1)\}$, we create a path of three tasks $v^*_i[0]$, $v^*_i[1]$, and $v^*_i[2]$, with arcs $v^*_i[0] \to v^*_i[1] \to v^*_i[2]$. The set of these tasks is denoted $\mathcal{V}_1 = \{v^*_i[j] \mid i \in \{1, \ldots, 2n(B+1)\}, j \in \{0, 1, 2\}\}$. The cardinality of $\mathcal{V}_1$ is $6n(B+1)$ (see Figure 6).

(ii) $\forall i \in \{2n(B+1) + 1, \ldots, n^*\}$, we create a path of three tasks $v^*_i[0] \to v^*_i[1] \to v^*_i[2]$. This set of tasks is denoted $\mathcal{V}'$. The number of these tasks is $3(n^* - 2n(B+1))$ with $n^* = |V^*|$.

(iii) $\forall (k, l) \in E^*$, we add the arcs $v^*_k[0] \to v^*_l[2]$ and $v^*_l[0] \to v^*_k[2]$ (see Figure 6).

Now, $4nB$ tasks are removed from the $W$-graph. To keep the polynomial-time transformation clear, we prefer to create tasks and then remove some of them rather than enumerate all precedence constraints. Therefore, we consider the following index sets:

(i) $J_0 = \{2i(B+1) \mid i \in \{1, 2, \ldots, n\}\}$,

(ii) $J_1 = \{2i(B+1) + 1 \mid i \in \{0, 1, 2, \ldots, n-1\}\}$,

(iii) $I_0 = \{k \in \{1, \ldots, 2n(B+1)\} \setminus (J_0 \cup J_1) \mid k \text{ is even}\}$,

(iv) $I_1 = \{k \in \{1, \ldots, 2n(B+1)\} \setminus (J_0 \cup J_1) \mid k \text{ is odd}\}$.

We remove from the set $\mathcal{V}_1$ the tasks $v^*_k[0]$, $v^*_k[1]$ with $k \in I_0$ (resp. $v^*_k[1]$, $v^*_k[2]$ with $k \in I_1$). $\mathcal{K}$ denotes the set of removed tasks (see Figure 7). Finally, we put $\mathcal{V} = \mathcal{V}_1 \setminus \mathcal{K}$ with $|\mathcal{V}| = 2nB + 6n$ (see Figure 8).
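The cardinalities quoted in this construction can be cross-checked numerically. The sketch below assumes the reading in which the $n$ paths together contain $2n(B+1)$ vertices, that is, each path of length $2B + 1$ has $2B + 2$ vertices by Remark 3.7; the function names are illustrative. It rebuilds the index sets and confirms that $6n(B+1)$ tasks are created, $4nB$ are removed, and $2nB + 6n$ remain.

```python
def index_sets(n, B):
    """Index sets J0, J1, I0, I1 over the vertices 1..2n(B+1) of the n paths."""
    last = 2 * n * (B + 1)
    J0 = {2 * i * (B + 1) for i in range(1, n + 1)}   # last vertex of each path
    J1 = {2 * i * (B + 1) + 1 for i in range(n)}      # first vertex of each path
    inner = set(range(1, last + 1)) - J0 - J1
    I0 = {k for k in inner if k % 2 == 0}
    I1 = {k for k in inner if k % 2 == 1}
    return J0, J1, I0, I1

def task_counts(n, B):
    """Return (created, removed, remaining) numbers of V1-tasks."""
    _, _, I0, I1 = index_sets(n, B)
    created = 3 * 2 * n * (B + 1)            # three tasks per vertex
    removed = 2 * len(I0) + 2 * len(I1)      # two tasks per interior vertex
    return created, removed, created - removed

n, B = 2, 3
created, removed, remaining = task_counts(n, B)
print(created == 6 * n * (B + 1))      # True
print(removed == 4 * n * B)            # True
print(remaining == 2 * n * B + 6 * n)  # True
```

Every interior vertex of a path keeps a single task, while the two end vertices of each path keep all three, which gives $2B + 6$ remaining tasks per path.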

Figures 5, 6, 7, and 8 describe the construction of the $W$-graph from $G^* \in \mathcal{G}$.

$E_W$ is the set of arcs as described above.

Lastly, the number of processors is $m = n^*$, and they are numbered $\pi^i$ with $i \in [1, n^*]$.

In summary, the precedence graph $G = W \cup Z$ is composed of $W = (\mathcal{V} \cup \mathcal{V}', E_W)$, with $3n^* - 4nB$ tasks and the precedence constraints given above, and of the graph $Z = \bigcup_{a_j \in A} Z^{s(a_j)}$, with $4nB$ tasks.

The transformation is computed in polynomial time.

(i) Let us assume that $A = \{a_1, \ldots, a_{3M}\}$ can be partitioned into $M$ disjoint subsets $A_1, \ldots, A_M$, each summing up to $B$. We will then prove that there is a schedule of length at most three.

Let us construct this schedule.

First, the task $v^*_i[j] \in \mathcal{V} \cup \mathcal{V}'$ is executed on the processor $\pi^i$ at $t = j$ with $j \in \{0, 1, 2\}$, if this task exists.

Consider the processors on which the $\mathcal{V}$-tasks are scheduled. By the previous allocation, these processors are numbered $\pi^1, \ldots, \pi^{2n(B+1)}$.

Let $\{A_1, \ldots, A_n\}$ be a partition of $A$. Consider $A_i = \{a_{i1}, a_{i2}, a_{i3}\}$ with a fixed $i$. The tasks of $Z^{s(a_j)}$, $a_j \in A_i$, are executed between processors $\pi^{1 + 2(i-1)(B+1)}$ and $\pi^{2i(B+1)}$. Moreover, the tasks $Z^{s(a_j)}[l, k]$, $k \in \{0, 1\}$, $l \in J_0$ (resp., $k \in \{1, 2\}$, $l \in J_1$) are scheduled on $2s(a_{ij})$ processors in succession in order to respect a schedule of length three.

Thus, without loss of generality, we suppose that the tasks of $Z^{s(a_{i1})}$ are scheduled between processors $\pi^{1 + 2(i-1)(B+1)}$ and $\pi^{2(i-1)(B+1) + 2s(a_{i1})}$. In a similar way, the tasks of $Z^{s(a_{i2})}$ (resp., $Z^{s(a_{i3})}$) are executed between processors $\pi^{2 + 2(i-1)(B+1) + 2s(a_{i1})}$ and $\pi^{1 + 2(i-1)(B+1) + 2s(a_{i1}) + 2s(a_{i2})}$ (resp., $\pi^{2 + 2(i-1)(B+1) + 2s(a_{i1}) + 2s(a_{i2})}$ and $\pi^{2i(B+1)}$).

(ii) Let us now assume that there is a schedule $S$ of length at most three. We will prove that $A = \{a_1, \ldots, a_{3M}\}$ can be partitioned into $M$ disjoint subsets $A_1, \ldots, A_M$, each summing up to $B$.

Lemma 3.8. In any valid schedule of length three there is no idle time.

Proof. The number of processors is $m = n^*$ and the number of tasks is $3n^*$: $4nB$ for the $Z$-graph and $3n^* - 4nB$ for the $W$ graph.

Lemma 3.9. In any valid schedule of length three, the subgraph induced by the $\mathcal{V}$ tasks must be executed on $2(B+1)$ processors in succession.

Proof. Consider the subgraph induced by the $\mathcal{V}$ tasks. This precedence graph admits paths of length two, and these paths must be executed on the same processor (no communication delay is allowed).

Consider the tasks of paths of length one. Let $v^*_i[0] \in \mathcal{V}$ be a task without predecessor.

By construction, $v^*_i[0]$ admits one successor denoted by $v^*_{i_1}[2] \in \mathcal{V}$.

Suppose that these two tasks are allotted to the same processor $\pi^l$. Since $v^*_{i_1}[2]$ admits another predecessor, denoted by $v^*_{i_2}[0] \in \mathcal{V}$, the task $v^*_{i_1}[2]$ is allotted at $t = 2$.

The task $v^*_{i_2}[0]$ cannot be executed at $t = 1$ on $\pi^l$ since this task admits another successor besides $v^*_{i_1}[2]$. Therefore, there exists an idle slot at $t = 1$ on the processor $\pi^l$. By construction there is no independent task, and since the $Z$ graph admits only paths of length one, no task can be allotted to this idle slot. This is impossible.

In conclusion, the subgraph induced by the $\mathcal{V}$ tasks must be executed on $2(B+1)$ processors in succession.

Lemma 3.10. In any valid schedule of length three, two subgraphs induced by the $\mathcal{V}$ tasks from two disjoint paths of length $2B + 1$ cannot be allotted to the same processors.

Proof. Consider the $\mathcal{V}$ tasks which are elements of two disjoint paths of length $2B + 1$. A task without predecessor of one path cannot be allotted to the same processor as a task without successor of the other path, since there is no isolated task to schedule.

Lemma 3.11. In any valid schedule of length three, the $Z^{s(a_j)}$ tasks must be executed on the same processors as the $\mathcal{V}$ tasks.

Proof. Let $\Pi = \{\pi^l \mid \mathcal{V}\text{ tasks allotted on } \pi^l\}$ be the set of processors on which the $\mathcal{V}$ tasks are executed.

Suppose that the $Z^{s(a_j)}$-tasks are executed on processors $\pi^k \notin \Pi$. By Lemma 3.8, there is no idle slot, so the tasks on the paths of three tasks are necessarily allotted to a processor $\pi^* \in \Pi$. This is impossible by Lemma 3.9.

With the previous lemmas, we know that $6n(B+1)$ tasks (the $\mathcal{V}$ tasks and the $Z^{s(a_j)}$-tasks) are executed on the $n$ disjoint paths of length $2B + 1$. By Definition 3.2, we know that the graph $G^*$ admits a unique set of $n$ disjoint paths of length $2B + 1$ with the desired properties.

Moreover, with the precedence constraints, these tasks are allotted to a processor path of length $2B + 1$. Without loss of generality, we suppose that a task $v_l \in \mathcal{V}'$ is executed on the processor $\pi^l$ with $l \in \{2n(B+1) + 1, \ldots, n^*\}$.

When building the partition $\{A_1, \ldots, A_n\}$ with the desired property from a schedule $S$ of length three, we know that two tasks of the same subgraph $Z^{s(a_j)}$ (see Lemma 3.11) cannot be executed on two different paths. The edge distance between these two processors is at least two.

We define $A_j$ such that $a_i \in A_j$ if and only if the tasks of the graph $Z^{s(a_i)}$ are executed between the processors numbered $\pi^{1 + (j-1)2(B+1)}$ to $\pi^{2j(B+1)}$, with a fixed $j$.

Now, we will compute $\sum_{a_i \in A_j} s(a_i)$.

Using the previous remarks, without loss of generality, we suppose that the tasks $v^*_i[k]$ with $i \in \{1, \ldots, 2n(B+1)\}$ and $k \in \{0, 1, 2\}$ (if they exist) are executed on $\pi^l$ with $l \in \{1, \ldots, 2n(B+1)\}$.

Consider the $Z^{s(a_i)}$-tasks which are scheduled between processors $\pi^{1 + (j-1)2(B+1)}$ and $\pi^{2j(B+1)}$ for a fixed $j \in \{1, \ldots, 2n(B+1)\}$, except the indices such that paths of three tasks constituted by tasks from $\mathcal{V}$ are allotted on $\pi^l$.

Using Lemma 3.9, we know that the number of $\mathcal{V}$ tasks executed on processors $\pi^{1 + (j-1)2(B+1)}$ to $\pi^{2j(B+1)}$ for a fixed $j$ is $6 + 2B$. These $2(B+1)$ processors offer $6(B+1)$ time slots and, by Lemma 3.8, none of them is idle, so exactly $4B$ slots are filled by $Z$-tasks. Since $Z^{s(a_i)}$ contains $4s(a_i)$ tasks, we obtain $\sum_{a_i \in A_j} s(a_i) = B$.

In conclusion, $\{A_1, \ldots, A_n\}$ forms a partition of $A$ with the desired properties.

The construction suggested previously can easily be adapted to obtain a bipartite graph of depth one. Moreover, from the proof of Theorem 3.5, we can derive the following theorem.

Theorem 3.12. The problem of deciding whether an instance of $P, G^* \mid \beta; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max}$ has a schedule of length at most three is NP-complete with $\beta \in \{\mathrm{prec}, \mathrm{bipartite}\}$.

Proof. The proof is similar to the proof of Theorem 3.5, considering the graph $G$ instead of widget $G$. Nevertheless, each path of length two induced by the $\mathcal{V}$ tasks is transformed into two paths of length one.

We use the same construction as proposed for the proof of Theorem 3.5.

Nevertheless, all paths of three tasks are transformed into two paths in the following way: $v^*_i[0] \to v^*_i[1]$ and $v^*_i[0] \to v^*_i[2]$. These three tasks must be executed on the same processor.

Indeed, if $v^*_i[2]$ admits several predecessors, this is obvious. Otherwise, suppose that $v^*_i[0]$ is allotted to a processor $\pi$. Then $v^*_i[1]$ must be executed at $t = 1$ on $\pi$. Suppose the task $v^*_i[2]$ is scheduled at $t = 2$ on a neighboring processor. Then no task from the graphs $Z$ and $G$ can be executed on processor $\pi$ at $t = 2$. Now, using the same arguments as previously, there is a schedule of length three if and only if the set $A = \{a_1, \ldots, a_{3n}\}$ can be partitioned into $n$ disjoint subsets $A_1, \ldots, A_n$, each summing up to $B$.

The proof of Theorem 3.5 therefore implies that the problem where the tasks can be duplicated is also NP-complete.

Corollary 3.13. The problem of deciding whether an instance of $P, G^* \mid \beta; c_{ij} = d(\pi^l, \pi^k); p_i = 1, \mathrm{dup} \mid C_{\max}$ with $G^* \in \mathcal{G}$ has a schedule of length at most three is NP-complete with $\beta \in \{\mathrm{prec}, \mathrm{bipartite}\}$.

Proof. The proof comes directly from Theorems 3.5 and 3.12. In fact, Lemma 3.8 implies that no task can be duplicated (the number of tasks is equal to three times the number of processors).

Moreover, nonapproximability results can be deduced.

Corollary 3.14. No polynomial-time algorithm exists with a performance bound less than 4/3, unless P = NP, for the problems $P, G^* \mid \beta; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max}$ and $P, G^* \mid \beta; c_{ij} = d(\pi^l, \pi^k); p_i = 1, \mathrm{dup} \mid C_{\max}$, with $\beta \in \{\mathrm{prec}, \mathrm{bipartite}\}$ and $G^* \in \mathcal{G}$.

Proof. The proof of Corollary 3.14 is an immediate consequence of the impossibility theorem; see [9, page 4].

3.2. Discussion

In the previous section, we proposed a graph class $\mathcal{G}$ for which the problem of deciding whether an instance of $P, G^* \mid \beta; c_{ij} = d(\pi^l, \pi^k); p_i = 1 \mid C_{\max}$ has a schedule of length at most three is NP-complete with $\beta \in \{\mathrm{prec}, \mathrm{bipartite}\}$ and $G^* \in \mathcal{G}$.

Hereafter, we exhibit the parameters $L$, $k$ for some classic structured graphs in order to prove that the graph class $\mathcal{G}$ is not empty.

(i) For a grid $G^* = \mathrm{Grid}(m, p)$ with $m, p \in \mathbb{N}$, where the couple $(i, j)$ designates the $j$th position in the $i$th line, $1 \le i \le m$, $1 \le j \le p$ (or for the torus topology), we need $k = 2n + 1$ lines and $L = 2B + 2$ columns. The set of vertices for the graph $G$ (a subgraph of $G^*$ with the desired properties given by Definition 3.2) is $V = \{(i, j) \mid 2 \le i \le 2n, i \text{ even}, 2 \le j \le 2B + 3\}$, and $V' = \{(i, 1) \mid 1 \le i \le 2n + 1\} \cup \{(i, j) \mid 1 \le i \le 2n + 1, i \text{ odd}, 1 \le j \le 2B + 3\}$.

(ii) For the complete binary tree, it is sufficient to consider a tree with a height of $\log n + 2B + 1$.

iiiFor the HypercubeHdtopologyor cube connected cycles, it is suﬃcient to have d2lognB2.

iv. . ..
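For the grid case, condition (iii) of Definition 3.1 can be checked directly. The sketch below assumes the reading of item (i) in which the $n$ paths are the even rows $2, 4, \ldots, 2n$ restricted to columns $2, \ldots, 2B + 3$; the function names are illustrative and only the plain grid (no torus wrap-around) is modeled.

```python
def grid_prism_paths(n, B):
    """Rows 2, 4, ..., 2n of the grid, columns 2..2B+3, as vertex lists."""
    return [
        [(i, j) for j in range(2, 2 * B + 4)]
        for i in range(2, 2 * n + 1, 2)
    ]

def adjacent(u, v):
    """Two grid vertices are adjacent when they differ by one step."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1]) == 1

def satisfies_prism(paths):
    # Each path must be a chain of consecutive grid neighbours.
    chains = all(adjacent(a, b) for p in paths for a, b in zip(p, p[1:]))
    # Condition (iii): no edge joins interior vertices of two distinct paths.
    inner = [p[1:-1] for p in paths]
    separated = all(
        not adjacent(x, y)
        for a in range(len(inner))
        for b in range(a + 1, len(inner))
        for x in inner[a]
        for y in inner[b]
    )
    return chains and separated

paths = grid_prism_paths(3, 2)
print(len(paths), len(paths[0]) - 1)  # 3 5
print(satisfies_prism(paths))         # True
```

With $n = 3$ and $B = 2$ this yields three paths of length $2B + 1 = 5$; the odd rows lying between them are what keeps the paths from being adjacent.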


4. An Approximation Algorithm for Processor Networks with a Fixed Diameter

4.1. Description and Correctness of an Algorithm

In order to design an efficient polynomial-time approximation algorithm, the classic strategy consists of taking an instance of the combinatorial optimization problem and applying some transformations and/or using polynomial-time algorithms as subroutines (shortest path, spanning tree, maximum matching, etc.). Afterwards, it is sufficient to evaluate the best lower bound for any optimal solution; this lower bound may then be compared to the feasible solution obtained for the combinatorial optimization problem in order to determine the ratio of the approximation algorithm.

Here, instead of considering an instance $I$ and trying to directly develop a feasible solution for the $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^k, \pi^l); p_i = 1 \mid C_{\max}$ problem, we consider a partial instance of $I$, denoted $I^*$. An instance $I$ of our scheduling problem is constituted by a precedence graph with unit execution times and unit communication times, and by $m$ processors forming a graph $G^*$ with the distance function. The partial instance $I^*$ of $I$ is constituted only by the precedence graph with unitary tasks and unitary communication times. For any instance $I^*$, we use the classic approximation algorithm proposed by Munier and König [10] for the $P \mid \mathrm{prec}; c_{ij} = 1; p_i = 1 \mid C_{\max}$ problem. We obtain a feasible schedule for this problem, denoted $S$ (we omit consideration of the processor graph for the moment). Nevertheless, this solution is not feasible for our scheduling problem.

We proceed with a polynomial-time chain of transformations, from schedule $S$ to a schedule $S''$, in order to get a feasible schedule. It is only in the last step, that is, only for schedule $S''$, that we guarantee a feasible schedule for the problem $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^k, \pi^l); p_i = 1 \mid C_{\max}$.

This chain is defined as follows: $I^* \xrightarrow{f} S \xrightarrow{g} S' \xrightarrow{h} S''$, where $f$ is the Munier-König algorithm [10], $g$ the dilatation algorithm (see [11] for details, or Appendix A), and $h$ the folding algorithm (see [12] for details, or Appendix B). The schedule $S''$ is a feasible solution for the $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^k, \pi^l); p_i = 1 \mid C_{\max}$ problem.

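Subsequently, we will consider the three following scheduling problems:

(i) $P \mid \mathrm{prec}; c_{ij} = 1; p_i = 1 \mid C_{\max}$,

(ii) $P \mid \mathrm{prec}; c_{ij} \ge 2; p_i = 1 \mid C_{\max}$,

(iii) and finally $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^k, \pi^l); p_i = 1 \mid C_{\max}$.

The principal steps of the algorithm are described below.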

The approximation algorithm uses three steps. In each step we apply an algorithm for a specified scheduling problem [10–12]. The first two steps each produce a schedule that is not feasible for our problem.

(i) In the first step of the algorithm, a schedule (denoted $S$) on an unbounded number of processors is produced for the scheduling problem $P \mid \mathrm{prec}; c_{ij} = 1; p_i = 1 \mid C_{\max}$. For this problem, Munier and König [10] presented a $(4/3)$-approximation algorithm that is based on an integer linear programming formulation. They use the following procedure: an integrity constraint is relaxed, and a feasible schedule is produced by rounding.

(ii) The second step of the algorithm produces a schedule denoted $S'$, also on an unbounded number of processors, from $S$ by applying the dilatation principle proposed in [11] for the problem $P \mid \mathrm{prec}; c_{ij} \ge 2; p_i = 1 \mid C_{\max}$ (this algorithm produces a feasible schedule for the large communication delay problem from a schedule for unitary communication delays). We therefore have $S' = g(S)$, where $g$ is the dilatation algorithm.

(iii) The third step produces a schedule $S''$, feasible for the $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^k, \pi^l); p_i = 1 \mid C_{\max}$ problem on the $G^*$ topology, from $S'$ using the folding principle [12]. The folding procedure constructs a feasible schedule on a restricted number of processors from a feasible schedule on an unbounded number of processors. Thus, $S'' = h(S')$, with $h$ being the folding algorithm.

Figure 9: Description of the chain of polynomial-time transformations $I^* \xrightarrow{f} S \xrightarrow{g} S' \xrightarrow{h} S''$, with schedule lengths $C_{\max}^{\mathrm{UET\text{-}UCT},\infty}$, $C_{\max}^{\mathrm{UET\text{-}LCT}(c=\delta),\infty}$, and $C_{\max}^{G,m}$. It is clear that $C_{\max}^{\mathrm{UET\text{-}UCT},\infty} \le C_{\max}^{\mathrm{UET\text{-}LCT}(c=\delta),\infty} \le C_{\max}^{G,m}$.

Note that the length of schedule $S$ is at most that of $S'$, which is at most that of $S''$. The three steps are summarized in Figure 9. The notation is described in the proof of Theorem 4.2.
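The second step can be illustrated with a simple time-scaling rule. The sketch below is an assumption made for illustration and may differ from the actual dilatation algorithm of [11]: it multiplies every start time by the expansion coefficient $(\delta + 1)/2$ and keeps each task on its processor. The function names and the four-task example are hypothetical. With this rule, two tasks on different processors that were separated by $1 + 1$ time units become separated by $\delta + 1$, which is enough for a communication delay of $\delta$.

```python
from fractions import Fraction

def dilate(start, delta):
    """Scale all start times by (delta + 1) / 2 (assumed dilation rule)."""
    factor = Fraction(delta + 1, 2)
    return {task: factor * t for task, t in start.items()}

def respects_delay(arcs, start, proc, c):
    """Unit tasks; the delay c applies only between different processors."""
    return all(
        start[j] >= start[i] + 1 + (c if proc[i] != proc[j] else 0)
        for i, j in arcs
    )

# Hypothetical UET-UCT schedule on an unbounded number of processors.
arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
proc = {"a": 0, "b": 0, "c": 1, "d": 0}
start = {"a": 0, "b": 1, "c": 2, "d": 4}

print(respects_delay(arcs, start, proc, 1))             # True
print(respects_delay(arcs, start, proc, 3))             # False
print(respects_delay(arcs, dilate(start, 3), proc, 3))  # True
```

Since the scaling factor is at least one, tasks that shared a processor without overlapping still do not overlap after dilation.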

Theorem 4.1. The previous algorithm produces a feasible schedule for the problem $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^k, \pi^l); p_i = 1 \mid C_{\max}$.

Proof. The proof is clear from the previous description of the algorithm. Indeed, the communication delay is preserved and the precedence constraints are respected. Moreover, at most $m$ tasks are executed at any time.

4.2. Relative Performance Analysis

Theorem 4.2. The problem $P, G^* \mid \mathrm{prec}; c_{ij} = d(\pi^k, \pi^l); p_i = 1 \mid C_{\max}$ may be approximated within a factor of $(\delta + 1)^2/3 + 1$ using the previous algorithm.

Proof. We denote by $C_{\max}^{x,y,z}$, with $x \in \{\mathrm{opt}, \emptyset\}$, $y \in \{\mathrm{UET\text{-}UCT}, \mathrm{UET\text{-}LCT}(c = \delta), G^*\}$, and $z \in \{m, \infty\}$, the length of the schedule. Moreover, $\rho^{G^*,m}$ (resp., $\rho^{G^*,\infty}$) designates the performance ratio on a $G^*$ processor network model with a bounded (resp., unbounded) number of processors.

Now let us examine the relative performance of this algorithm.

(i) According to the algorithm, the first step deals with the problem $P \mid \mathrm{prec}; c_{ij} = 1; p_i = 1 \mid C_{\max}$.

First of all, the schedule for (UET-UCT, $\infty$) is not optimal. The algorithm from [10] gives a 4/3 relative performance, and so, by [10], we know that

$$C_{\max}^{\mathrm{UET\text{-}UCT},\infty} \le \frac{4}{3}\, C_{\max}^{\mathrm{opt},\mathrm{UET\text{-}UCT},\infty}. \tag{4.1}$$

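(ii) In the second step, a feasible solution for a large communication delay $c = \delta$ is created (recall that $\delta$ stands for the diameter of the processor network). This solution comes from using the dilatation algorithm, whose expansion coefficient is $(\delta + 1)/2$ [11]. And so,

$$C_{\max}^{\mathrm{UET\text{-}LCT}(c=\delta),\infty} \le \frac{\delta + 1}{2} \cdot \frac{4}{3}\, C_{\max}^{\mathrm{opt},\mathrm{UET\text{-}LCT}(c=\delta),\infty}, \tag{4.2}$$

$$C_{\max}^{\mathrm{UET\text{-}LCT}(c=\delta),\infty} \le \frac{2(\delta + 1)}{3}\, C_{\max}^{\mathrm{opt},\mathrm{UET\text{-}LCT}(c=\delta),\infty}. \tag{4.3}$$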

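Thus, we have a schedule on a UET-LCT task system with a communication delay equal to $\delta$ and an infinite number of processors.

By definition it is obvious that

$$C_{\max}^{G^*,\infty} \le C_{\max}^{\mathrm{UET\text{-}LCT}(c=\delta),\infty}, \tag{4.4}$$

$$C_{\max}^{\mathrm{opt},\mathrm{UET\text{-}UCT},\infty} \le C_{\max}^{\mathrm{opt},\mathrm{UET\text{-}LCT}(c=\delta),\infty}. \tag{4.5}$$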

It is necessary to evaluate the gap between the optimal length of a schedule on a fully connected processor graph and on a processor graph with a diameter of length $K$. For this, we consider unitary tasks subject to precedence constraints and an unbounded number of processors.

Lemma 4.3. The gap between a schedule on a fully connected graph of processors with a large communication delay $c$, for all pairs of tasks, and a schedule on a graph of processors with a diameter of length $K \in \mathbb{N}$ is at most $(c + 1)/2$.

Proof. We first need to compare the relative performance of this schedule with that on our processor network model. The relative performance for the UET-LCT task system is not valid for our model. We need to compute a new bound for this schedule on our model.

Let $p = \{x_1, x_2, \ldots, x_n\}$ be a critical path of the schedule (i.e., a path that gives the length of the schedule). Suppose that there is a communication delay between each pair of tasks $(x_i, x_{i+1})$ with $1 \le i < n$. In the UET-LCT task system (with a communication delay equal to $c$ for all pairs of tasks) the length of the schedule would be $(1 + c)n - c$ units of time.

In the graph of processors with a diameter of length $k$, the same path allows a length of $(k/2)(n - 1) + n$ units of time. The worst-case length for this path is $n + (n - 1)k$ and the best case is $2n - 1$. So, the ratio is $(n(1 + c) - c)/(2n - 1)$. For large $n$, we obtain the desired result.
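The limit invoked at the end of this proof can be written out explicitly. The following worked step is only a restatement of the ratio given in the proof, dividing numerator and denominator by $n$:

```latex
\[
\frac{n(1+c) - c}{2n - 1}
  \;=\; \frac{(1+c) - \dfrac{c}{n}}{2 - \dfrac{1}{n}}
  \;\xrightarrow[\;n \to \infty\;]{}\; \frac{c+1}{2}.
\]
```

The case $n = 1$ gives a ratio of 1, and for $c \ge 1$ the ratio does not decrease as $n$ grows, so $(c + 1)/2$ bounds it for every path length.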

(16)

By applying Lemma 4.3, which is valid for all schedules, and in particular for the optimum, withcδ, we obtain

Copt,UET-LCTcδ,∞

max ≤ δ1

2 C^opt,G_max ^∗^,∞ 4.6

and so

Cmax^G^∗^,∞≤CUET-LCTcδ,∞

max by4.4 4.7

C^G_max^∗^,∞≤ 2δ1

3 Copt,UET-LCTcδ,∞

max using4.3 4.8

C^G_max^∗^,∞≤ δ1²

3 C^opt,G,∞_max using4.6 4.9

ρ^G^∗^,∞≤ δ1²

3 . 4.10

Now we have to transform this schedule using an infinite number of processors into a schedule with a bounded number of processors. This can be done easily using the method from12 . The new worst-case relative performance is just increased by one. Thus we have

$$\rho^{G^*,\,m} \leq \rho^{G^*,\,\infty} + 1 \leq \frac{(\delta+1)^2}{3} + 1. \tag{4.11}$$
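To make the chain of bounds concrete, here is a minimal sketch that composes the factor $2(\delta+1)/3$ from (4.8) with the gap $(\delta+1)/2$ from (4.6) and then adds one for the bounded number of processors, as in (4.11). The function names are hypothetical and only serve this illustration.

```python
from fractions import Fraction


def ratio_unbounded(delta: int) -> Fraction:
    """Bound (4.10): (delta + 1)^2 / 3 on an unbounded number of processors."""
    expansion_factor = Fraction(2 * (delta + 1), 3)  # factor appearing in (4.8)
    network_gap = Fraction(delta + 1, 2)             # factor appearing in (4.6)
    return expansion_factor * network_gap


def ratio_bounded(delta: int) -> Fraction:
    """Bound (4.11): the unbounded ratio increased by one."""
    return ratio_unbounded(delta) + 1


if __name__ == "__main__":
    for delta in (1, 2, 3, 4):
        print(delta, ratio_unbounded(delta), ratio_bounded(delta))
```

For instance, a processor graph of diameter $\delta = 2$ gives $3^2/3 + 1 = 4$, and $\delta = 3$ gives $16/3 + 1 = 19/3$.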

Remark 4.4. Note that the order of the operations may be modified. Nevertheless, the ratio then becomes $7/6 \times (\delta+1)^2$. Indeed, the folding principle may be applied directly to the solution given by the algorithm of Munier and König [10], which yields a schedule on $m$ processors. Afterwards, we apply the dilation principle. This order yields a polynomial-time approximation algorithm with a ratio bounded by $7/6 \times (\delta+1)^2$.

Remark 4.5. We may recall two classic results in scheduling for which the performance ratio increases by one between the unbounded and bounded versions.

(1) When the number of processors is unlimited, the problem of scheduling a set of $n$ tasks under precedence constraints with no communication delay is polynomial. It is sufficient to use the classical algorithm given by Bellman [13], as well as the two techniques widely used in project management: CPM (Critical Path Method) and PERT (Project/Program Evaluation and Review Technique). In contrast, when the number of processors is limited, the problem becomes NP-complete, and a $(2 - 1/m)$-approximation was developed by Graham (see [14]), where $m$ designates the number of processors, based on list scheduling in which no order on tasks is specified.

(2) The second illustration is given by the transition from the unrestricted version of UET-UCT to the restricted variant. A 4/3-approximation algorithm is known from [10]. Using this result, Munier and Hanen [15] design a 7/3-approximation for the restricted version.
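The first result recalled above (unlimited processors, no communication delay) amounts to a longest-path computation on the precedence graph. The following is a minimal sketch of that critical-path computation, not the exact formulation of [13]; the function name `critical_path_schedule` and the input format are assumptions made for this example.

```python
from collections import deque


def critical_path_schedule(durations, edges):
    """Earliest start times on an unlimited number of processors.

    durations: dict mapping each task to its processing time.
    edges: iterable of (i, j) pairs meaning task i precedes task j.
    Returns (start_times, makespan). No communication delay is modeled.
    """
    successors = {task: [] for task in durations}
    indegree = {task: 0 for task in durations}
    for i, j in edges:
        successors[i].append(j)
        indegree[j] += 1

    start = {task: 0 for task in durations}
    ready = deque(task for task, deg in indegree.items() if deg == 0)
    processed = 0
    while ready:  # tasks are visited in topological order
        i = ready.popleft()
        processed += 1
        for j in successors[i]:
            start[j] = max(start[j], start[i] + durations[i])
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)

    if processed != len(durations):
        raise ValueError("precedence graph contains a cycle")
    makespan = max((start[t] + durations[t] for t in durations), default=0)
    return start, makespan


if __name__ == "__main__":
    unit = {"a": 1, "b": 1, "c": 1, "d": 1}
    arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    print(critical_path_schedule(unit, arcs))
```

Each task starts as soon as all of its predecessors have finished, so the makespan equals the length of a longest path in the precedence graph.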


5. Conclusion

We have sharpened the demarcation line between the polynomially solvable and NP-hard cases of the central scheduling problem UET-UCT on a structured processor network by showing that its decision version is polynomially solvable for $C_{\max} \leq 2$ while it is NP-complete for $C_{\max} \geq 3$. This result holds for a large class of graphs with a nonconstant diameter, and it implies that there is no $\rho$-approximation algorithm with $\rho < 4/3$. These results are extended to the case where the precedence graph is a bipartite graph.

Lastly, we complete our complexity results by developing a polynomial-time approximation algorithm for P, G^∗|prec, cij dπ^k, π^l 1, p_i 1|Cmax with a worst-case relative performance of $(\delta+1)^2/3 + 1$, where $\delta$ designates the diameter of the graph.

An interesting question for further research is to find a polynomial-time approximation algorithm with performance guarantee $\rho$, where $\rho \in \mathbb{R}$.

Appendices A.

This section describes the dilatation principle. This principle has been studied in [11] and used for designing a new polynomial-time approximation algorithm with a nontrivial performance guarantee for the problem $P \mid prec;\ c_{ij} = c \geq 2;\ p_i = 1 \mid C_{\max}$. For the latter problem, the authors propose a $(c+1)/2$-approximation algorithm (the best ratio as far as we know).

A.1. Introduction, Notation, and Description of the Method

Notation 1. We use $\sigma^{\infty}$ to denote the UET-UCT schedule and $\sigma_c^{\infty}$ to denote the UET-LCT schedule.

Moreover, we use $t_i$ (resp., $t_i^c$) to denote the starting time of task $i$ in schedule $\sigma^{\infty}$ (resp., in schedule $\sigma_c^{\infty}$).

Principle

The tasks in $\sigma_c^{\infty}$ keep the same assignment as in the feasible schedule $\sigma^{\infty}$ on an unbounded number of processors. We proceed to an expansion of the makespan while preserving the communication delay $t_j^c \geq t_i^c + 1 + c$ for two tasks $i$ and $j$, with $(i, j) \in E$, processed on two different processors. For this, the starting time $t_i$ is multiplied by a factor $d$.

In the following section, we will justify and determine the coefficient $d$.

More formally, let $G = (V, E)$ be a precedence graph. We determine a feasible schedule $\sigma^{\infty}$, for the model UET-UCT, using the $4/3$-approximation algorithm proposed by Munier and König [10]. The result of this algorithm gives a pair of values $(t_i, \pi)$, $\forall i \in V$, on the schedule $\sigma^{\infty}$, with $t_i$ being the starting time of task $i$ in the schedule $\sigma^{\infty}$ and $\pi$ the processor on which task $i$ will be processed at $t_i$.

From this solution, we will derive a solution for the problem with large communication delays. For this, we will propose a new pair of values $(t_i^c, \pi)$, $\forall i \in V$, derived from the pair $(t_i, \pi)$. This set of new pairs is computed as follows: the start time is $t_i^c = d \times t_i = \frac{c+1}{2}\, t_i$, and the processor $\pi$ is unchanged. In other words, all tasks in the schedule $\sigma_c^{\infty}$ are allotted to the same processor as in the schedule $\sigma^{\infty}$, and the starting time of a task $i$ is multiplied by the factor $(c+1)/2$. The justification of the expansion coefficient is given below. An illustration of the expansion is given in Figure 10.


[Figure: two panels, "Model UET-UCT" and "Model UET-LCT", showing tasks x, y, z on processors π¹ and π² with the communication delay marked; the time axes are labeled in terms of k in the UET-UCT panel and in terms of (c+1)k/2 in the UET-LCT panel.]

Figure 10: Illustration of the notion of expansion.
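The expansion step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the schedule is represented as a dictionary from task to (start time, processor), and the name `expand` is hypothetical. It assumes a feasible UET-UCT schedule is already available, for example from the algorithm of [10], which is not implemented here.

```python
from fractions import Fraction


def expand(schedule, c):
    """Multiply every start time by d = (c + 1) / 2 and keep the processor.

    schedule: dict mapping task -> (start_time, processor) for the UET-UCT model.
    c: large communication delay (c >= 2) of the UET-LCT model.
    Runs in time linear in the number of tasks.
    """
    d = Fraction(c + 1, 2)  # expansion coefficient
    return {task: (d * start, proc) for task, (start, proc) in schedule.items()}


if __name__ == "__main__":
    # Hypothetical UET-UCT schedule: x precedes y (same processor) and z (other processor).
    uet_uct = {"x": (0, "p1"), "y": (1, "p1"), "z": (2, "p2")}
    print(expand(uet_uct, c=3))
```

With $c = 3$ the coefficient is $d = 2$, so task z moves from time 2 to time 4, which leaves exactly $1 + c$ units after the start of x on the other processor. Exact fractions are used because $d$ is not an integer when $c$ is even.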

A.2. Feasibility, Analysis of the Method, and Computation of the Ratio

Next, we justify the existence of the coefficient $d$. Moreover, we prove the correctness of the feasible schedule for the $P \mid prec;\ c_{ij} = c \geq 2;\ p_i = 1 \mid C_{\max}$ problem. Lastly, we propose a worst-case analysis for the algorithm.

Lemma A.1. The coefficient of an expansion is $d = (c+1)/2$.

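Proof. Let there be two tasks $i$ and $j$ such that $(i, j) \in E$, which are processed on two different processors in the feasible schedule $\sigma^{\infty}$. We are interested in obtaining a coefficient $d$ such that $t_i^c = d \times t_i$ and $t_j^c = d \times t_j$. After expansion, in order to respect the precedence constraints and the communication delay, we must have $t_j^c \geq t_i^c + 1 + c$, and so $d \times t_j - d \times t_i \geq c + 1$, that is, $d \geq (c+1)/(t_j - t_i)$. Since $i$ and $j$ are processed on different processors in $\sigma^{\infty}$, we have $t_j - t_i \geq 2$, so $(c+1)/(t_j - t_i) \leq (c+1)/2$. It is therefore sufficient to choose $d = (c+1)/2$.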

Lemma A.2. An expansion algorithm gives a feasible schedule for the $P \mid prec;\ c_{ij} = c \geq 2;\ p_i = 1 \mid C_{\max}$ problem in $O(n)$.

Proof. It is sufficient to check that the solution given by an expansion algorithm produces a feasible schedule for the UET-LCT model. Let $i$ and $j$ be two tasks such that $(i, j) \in E$. We use $\pi^i$ (resp., $\pi^j$) to denote the processor on which task $i$ (resp., task $j$) is executed in schedule $\sigma^{\infty}$. Moreover, we use $\pi'^i$ (resp., $\pi'^j$) to denote the processor on which task $i$ (resp., task $j$) is executed in schedule $\sigma_c^{\infty}$. Thus,

(i) if $\pi^i = \pi^j$ then $\pi'^i = \pi'^j$. Since the solution given by Munier and König [10] gives a feasible schedule on the model UET-UCT, we have $t_i + 1 \leq t_j$, hence $\frac{2}{c+1}\, t_i^c + 1 \leq \frac{2}{c+1}\, t_j^c$, and so $t_i^c + 1 \leq t_i^c + \frac{c+1}{2} \leq t_j^c$;

(ii) if $\pi^i \neq \pi^j$ then $\pi'^i \neq \pi'^j$. We have $t_i + 1 + 1 \leq t_j$, hence $\frac{2}{c+1}\, t_i^c + 2 \leq \frac{2}{c+1}\, t_j^c$, and so $t_i^c + (c+1) \leq t_j^c$.
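The two cases of this proof translate directly into a feasibility check. The sketch below is an illustration under assumptions: the schedule format (task mapped to start time and processor) and the name `is_feasible` are hypothetical, and the check additionally verifies that two unit tasks on the same processor do not overlap, which the proof leaves implicit.

```python
from fractions import Fraction
from itertools import combinations


def is_feasible(schedule, edges, c):
    """Check a unit-execution-time schedule with communication delay c.

    schedule: dict mapping task -> (start_time, processor).
    edges: iterable of (i, j) pairs meaning task i precedes task j.
    """
    for i, j in edges:
        t_i, p_i = schedule[i]
        t_j, p_j = schedule[j]
        delay = 0 if p_i == p_j else c  # case (i) versus case (ii)
        if t_j < t_i + 1 + delay:
            return False
    # Two unit tasks on the same processor must start at least one time unit apart.
    for a, b in combinations(schedule, 2):
        t_a, p_a = schedule[a]
        t_b, p_b = schedule[b]
        if p_a == p_b and abs(t_a - t_b) < 1:
            return False
    return True


if __name__ == "__main__":
    edges = [("x", "y"), ("x", "z")]
    uet_uct = {"x": (0, "p1"), "y": (1, "p1"), "z": (2, "p2")}
    c = 3
    d = Fraction(c + 1, 2)
    uet_lct = {task: (d * t, p) for task, (t, p) in uet_uct.items()}
    print(is_feasible(uet_uct, edges, 1), is_feasible(uet_lct, edges, c))
```

In this small instance the original schedule is feasible for unit delays but not for $c = 3$, while the expanded schedule satisfies both constraints with $c = 3$.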

Theorem A.3. An expansion algorithm gives a $2(c+1)/3$-approximation algorithm for the problem $P \mid prec;\ c_{ij} = c \geq 2;\ p_i = 1 \mid C_{\max}$.

Proof. We use $C_{\max}^{h,\,\text{UET-UCT},\,\infty}$ (resp., $C_{\max}^{\mathrm{opt},\,\text{UET-UCT},\,\infty}$) to denote the makespan of the schedule computed by Munier and König (resp., the optimal value of a schedule $\sigma^{\infty}$). In the same way, we use $C_{\max}^{h^*,\,\text{UET-LCT},\,\infty}$ (resp., $C_{\max}^{\mathrm{opt},\,\text{UET-LCT},\,\infty}$) to denote the makespan of the schedule computed by the expansion algorithm (resp., the optimal value of a schedule $\sigma_c^{\infty}$).