Analysis of an Approximation Algorithm for Scheduling Independent Parallel Tasks
Keqin Li
Department of Mathematics and Computer Science, State University of New York, New Paltz, NY 12561-2499, USA received April 14, 1998, revised May 28, 1999, accepted May 30, 1999.
In this paper, we consider the problem of scheduling independent parallel tasks in parallel systems with identical processors. The problem is NP-hard, since it includes the bin packing problem as a special case when all tasks have unit execution time. We propose and analyze a simple approximation algorithm called , where is a positive integer. Algorithm has a moderate asymptotic worst-case performance ratio in the range
for all ; but the algorithm has a small asymptotic worst-case performance ratio in the range !
, when task sizes do not exceed of the total available processors, where#"$ is an integer. Furthermore, we show that if the task sizes are independent, identically distributed (i.i.d.) uniform random variables, and task execution times are i.i.d.
random variables with finite mean and variance, then the average-case performance ratio of algorithm% is no larger than 1.2898680..., and for an exponential distribution of task sizes, it does not exceed 1.2898305.... As demonstrated by our analytical as well as numerical results, the average-case performance ratio improves significantly when tasks request for smaller numbers of processors.
Keywords: approximation algorithm, average-case performance ratio, parallel task scheduling, probabilistic analy- sis, worst-case performance ratio
1 Introduction
In this paper, we consider the problem of scheduling independent parallel tasks in parallel systems with identical processors. Assume that we are given a list of & tasks ')(+*-,/.102,4310657585709,4:<; . Each task,4=
is specified by its execution time >6=, and its size ?=, i.e., ,4= requires ?= processors to execute. There are @ identical processors, and any ?= processors can be allocated to ,4=. Once,4= starts to execute, it runs without interruption until it completes. Tasks in ' are mutually independent, that is, there is no precedence constraint nor data communication among the& tasks. The problem addressed here is to find a nonpreemptive schedule of ' such that its makespan (i.e., the total execution time of the & tasks) is minimized.
The problem is NP-hard, since it includes the bin packing problem as a special case when all tasks have unit execution time. Therefore, one practical way to solve this problem is to design and analyze approximation algorithms that produce near optimal solutions. LetAB*C'D; be the makespan of the schedule
E
The author can be reached at email: [email protected], phone: (914) 257-3534, fax: (914) 257-3571.
1365–8050 c
F
1999 Maison de l’Informatique et des Math´ematiques Discr`etes (MIMD), Paris, France
generated by an algorithm A for ' , and GIHKJL*'; be the makespan of an optimal schedule of ' . The
quantity MON
P
(RQ7S8T
UWV
NYX Z2[<\
]_^ `_a8b<cd UIe AB*';
GIHKJL*';1fhg
is called the asymptotic worst-case performance ratio of algorithmA . If there exist two constantsi andj such that for all' ,AB*';%klinmoGIHDJL*'D;Wpqj/>r , where>rs(tTvu1wx.zy{=y{:_*>6=|; is the longest execution time of the& tasks, then
M N
P
k}i , andi is called an asymptotic worst-case performance bound of algorithm
A . Moreover, if for any small~hY and all largel , there exists a list' , such thatGIHKJL*C'D; , and
AB*';% *i~z;GIHKJL*';, then the boundi is called tight, i.e.,
M N
P
(i . When task sizes and execution times are random variables, bothAB*'; andGIHDJL*'D; become random variables, and
MON
P
(Q7S8T
: V N X
*A*';9;
*|GIHDJO*';9;g
is called the asymptotic average-case performance ratio of algorithm A , where *m; stands for the ex- pectation of a random variable. Of course,
M N
P depends on the probability distributions of task sizes and execution times. If there exists a constant such that for all' , *CAB*';9;#km
*|GIHDJI*C'D;2; as& , then
M N
P
k , and is called an asymptotic average-case performance bound of algorithmA .
We notice that our scheduling problem defined above looks similar to but is quite different from the two dimensional rectangle packing problem [1, 2, 7, 9], where each task, = is treated as a rectangle with width
? = and height> =. The rectangle packing model implies that processors should be allocated in contiguous groups. That is, the@ processors have indices 1, 2, 3, ...,@ , and task, = must be allocate? = processors with indices ,spY , ...,#p? = l for some . Such a scheduling problem arises in parallel systems like linear arrays. The rectangle packing problem has been extensively studied, where a complicated algorithm with asymptotic worst-case performance ratio as low as 1.25 has been found [1]. However, contiguous processor allocation is not required in our model, where any?= processors can be allocated to,4=. Our problem could be regarded as a resource constraint scheduling problem [3, 6], where the resource is a set of processors. It has applications in parallel computing systems such as symmetric shared memory multiprocessors and distributed computing systems such as bus-connected networks of workstations. In these systems, a processor allocation mechanism is independent of the topology of an interconnection network. Another related problem is scheduling malleable tasks which has also been investigated in the literature [8, 10]. In that problem, each task requests for several possible numbers of processors, i.e., a task has adjustable size, and for each size, an execution time is also specified. The problem has several variations depending on different ways in which the execution time of a task changes with the number of processors allocated to it, and the performance measures to be optimized (e.g., makespan, average flow time).
The problem we consider here is to schedule nonmalleable tasks with noncontiguous processor alloca- tion. Even though the complicated algorithm in [1] for rectangle packing can also be applied to solve our problem, we propose and analyze a simple approximation algorithm called# , where is a positive inte- ger. Algorithm has a moderate asymptotic worst-case performance ratio (5<57575 k
M N
¡W¢
k£5¤¥¥<58575
for all¦£ ); but the algorithm has a small asymptotic worst-case performance ratio hp£§<*-¨#p£©;sk M N¡ ¢
k)spª§1¨ , when task sizes do not exceed@§1¨ , where¨« is an integer. We notice that the
capability to deal with small tasks is important in real applications since many task sizes are relatively small as compared with the system size so that a large scale parallel system can be shared by many users
simultaneously. However, it is not clear whether the algorithm in [1] has such capability. Furthermore, the simplicity of our algorithm allows us to conduct average-case performance analysis. In particular, we show that if the numbers of processors requested by the tasks are independent, identically distributed (i.i.d.) random variables uniformly distributed in the range¬8575@£, and task execution times are i.i.d. ran- dom variables with finite mean and variance, then
M N
¡4¢
k®5¥¯°¯¯<57585, i.e., *s*C'D;2;2§
*|GIHKJO*C'D;2;
is asymptotically bounded from above by 1.2898680... as&£± . For an exponential distribution of task sizes, we have
M N
¡4¢
k«5¥¯°¯²³<58575. As demonstrated by our analytical as well as numerical re-
sults, the average-case performance ratio improves significantly when tasks request for smaller numbers of processors. We notice that there is lack of such results on probabilistic algorithm analysis, especially in multi-dimensional cases [4]. The average-case performance of the algorithm in [1] is unknown.
The rest of the paper is organized as follows. We present algorithm in Section 2. The worst-case performance of the algorithm is analyzed in Section 3, and its average-case performance is studied in Section 4. Finally we give a summary in Section 5.
2 The Approximation Algorithm H
Our algorithm H for scheduling independent parallel tasks divides' into sublists' . ,' 3 , ..., 'K
according to task sizes (i.e., numbers of processors requested by tasks), where´ is a positive integer.
Forµk}¶·k¸t , define'¹(º©, =D» ' ¼ @§ *|¶Bp£©;s½? = k}@§¶{¾ , i.e.,'D¹ contains all tasks in'
that have sizes in the interval¿o¹(*C@§ *|¶Àp£©;0@§¶. Define'Dª(ºo, =D» '¼Á½? = k@§¾ , i.e.,
'K contains all tasks whose sizes are in the range¿( *C<0@§1Â.
Algorithm H produces schedules of the ' ¹ ’s sequentially and separately. To process tasks in ' ¹ , where·kölkë£ , the@ processors are partitioned into¶ groups, ÄB.0ÄO30o58575702Ä ¹ , each contains
@§¶ processors. Each group ÄsÅ of processors is treated as a unit, and is assigned to a task in ' ¹ . Such an allocation can be implemented using, for example, the list scheduling algorithm [4]. Suppose
' ¹ ()*,
¹
.
09,
¹
3
0657585709,
¹
:Æ
;, where& ¹ is the number of tasks in' ¹ . Initially, groupÄsÅ is assigned to, ¹Å , wherekYnk}¶ , and, .¹ 09, 3¹ 0657585709,
¹
¹ are removed from' ¹ . Upon the completion of a task, ¹Å , the first unscheduled task in'¹ , i.e.,, ¹zǹ . , is removed from'¹ and scheduled to execute onÄ Å . This process repeats until all tasks in'D¹ are finished. Then algorithm H begins the scheduling of next sublist'D¹Ç . .
For 'D , there is no need to divide the @ processors. The list scheduling algorithm is again em- ployed here. Let'K«(È*-, . 09, 3 0657585709,
: ¢ ;. Initially, as many tasks in 'K are scheduled as possible, i.e., tasks , . 02, 3 0o58575709,
É start their execution, whereÊ is defined in such a way that the total size of
,
.
02,
3
0o58575802,
É is no larger than @ , but the total size of , .
09,
3
0657585709,
ÉÇ . exceeds @ . When a task finishes, the next task in' begins its execution, provided that there are enough idle processors. Notice that it is the scheduling of tasks in' that takes advantage of noncontiguous processor allocation.
3 Combinatorial Analysis
In this section, we analyze the worst-case performance of algorithm H . Let#*'; be the makespan of the schedule produced by algorithm H for' , andGIHKJL*'; be the makespan of an optimal schedule of
' . First, we show the following result.
Theorem 1. For any list' of& tasks and)l² , we have
#µ*C'D;k
o²
GIHDJL*'D;_p*-Ël;Ì>
r 0
where>r is the longest execution time of the& tasks. Furthermore, forÍÎ and any large« , there exists' , such that GIHKJL*';Ï , ands*C'D;§GIHKJI*';(Ã
3Р. Therefore, for allȣ , we have
5x58575k M N
¡4¢
k}5¤¥¥ 57585.
Proof. Assume that all tasks in' are executed in the time interval ¬<0 *';Ñ, and that tasks in' ¹ are scheduled in¬Ò ¹ 09Ò¹zÇ .Ó, whereLkY¶k$ , i.e., the first task in' ¹ starts at timeÒ¹ , and the last completion time of the tasks in'¹ isÒ9¹zÇ . . Letʹ be the starting time of the last task, :ƹ in'¹ . DefineÔ . (Ò 3 ÕÒ. , and for all¥µk¶nkY , defineÔ1¹À(}Ê©¹IqÒ9¹ , and¨¹(Ò9¹zÇ . ʹ be the remaining execution time of'D¹ once, :ƹ starts. Clearly,¨©¹k>r, and
s*C'D;Ö( Ô . p$Ô 3 pÔ
Ð
pm6mom©p$Ô©p¨ 3 p¨
Ð
pm6m6m1p¨©
k Ô . p$Ô 3 pÔ
Ð
pm6mom©p$Ô©p*®©;Ì>
r 5 (1)
We give several lower bounds forGIHKJL*';. First, tasks in' . have large sizes such that no two of them can execute in parallel. Therefore, we have
GIHDJI*C'D;Ô . 5 (2)
Second, there can be at most two tasks from 'K3 that can execute at the same time, and there can be at most one task from' 3 that can execute simultaneously with a task in' . . Thus,
GIHDJI*C'D;Ô1.Kp
¥Ô©3%qÔ.
¥ (
Ô.
¥
p$Ô©35 (3)
Third, define the area of a task ,4= to be ? =C>6=, and the total area of ' ¹ to be A ¹ (Ø×tÙÚÛ b Æ
? =>6= for
Ükª¶k} , andA(ÝA . pYA 3 p}mom6mpYA# . For convenience, we assume that processor requirements
are normalized such that½? = kÞ for allµk£ßhk& . Clearly,GIHDJL*'D;s}A . We give a lower bound forA as follows. Fork}¶Õk¸Y , we notice that all the¶ groupsÄ . ,Ä 3 , ...,ÄÀ¹ are busy until, :ƹ starts its execution. That is, during time interval¬Ò9¹0Óʹ©, at least¶x§ *|¶%p©; of the processors are busy, i.e., we haveAs¹£*|¶x§ *|¶#pl©;2;Ô1¹ . For'D , we note that during time interval¬ÒB0ÓÊ©Ï, the percentage of busy processors is at least*é;§ , i.e.,A#Þ}*2*-Ã;2§Õ;Ô ; otherwise, some tasks should start earlier.
Hence,
GIHDJI*C'D;
¥ Ô . p ¥² Ô 3 p ²à Ô Ð pm6mom©p
l
Ô#á . p
®l
Ô©B5 (4)
Now let us consider the ratio
â ( Ô . p$Ô 3 p$Ô
Ð
pm6mom©p$Ô©sá . pÔ
Tvuwã|Ô1.10
.
3
Ô.äp$Ô30
.
3
Ô1.äp
3Ð
Ô©3ptmom6m©p
sá .
Ô
sá .äp
sá .
Ô
Iå
5 (5)
LetæÁ(Ô
Ð
pm6mom©p$Ô . Apparently,
â k
Ô.äp$Ô3Dpæ
Tvu1w ãÔ . 0 .3 Ô . pÔ 3 0 .3 Ô . p 3Ð Ô 3 p Ð
ç æ å 5 (6)
We give the proof of the following result in the appendix.
Ô.äp$Ô©3Kpæ
TµuwãÑÔ . 0 . Ô . pÔ 3 0 . Ô . p 3Ð Ô 3 p Ðç æ å k
o²
o¯
0 whereÔ.©0Ô309æ·l<5 (7)
Combining Equations (1)–(7), we gets*C'D;k}
. Ð
.9è
GIHDJI*C'D;/p*-Ël;Ì>r . To show the lower bound for
M N
¡ ¢ , let us consider a list' of tasks which contains& tasks of size .3 pÕ~,
& tasks of size .Ð pY~, and& tasks of size .é l¥~ , where& is a multiple of 6, and~Þ is a very small
quantity. All tasks have unit execution time. Clearly,GIHDJI*C'D;L(& . Algorithm H divides' into' . ,
' 3 , and' é , and#*';#(Þ&Ápl&ê§¥Ipl&ê§1Â(®
3Ð & . Thus, we can choose& a sufficiently large number
while keepings*C'D;2§GIHKJL*';ë(
3Ð . This completes the proof of the theorem.
For tasks with small sizes, algorithm# exhibits much better performance due to increasing processor utilization, as claimed by the following theorem.
Theorem 2. For any list' of& tasks, such that? = k@§1¨ for allß, where¨ is an integer, we have
#*';k
e
Dp
¨ f
GIHDJL*'D;êp*¨#p©;>
r 0
where>r is the longest execution time of the& tasks. Furthermore, forÈY¨#p and any large , there exists ' , such that GIHDJI*C'D;¸ , and #*';2§GIHKJL*';(ìspÞ§<*-¨Op ;. Therefore, we have
Dpt1§ *-¨Ïp©;Dk M N
¡4¢
k}Dpt1§¨ .
Proof. The proof is similar to that of Theorem 1. Since¨Á¥ , and sublists' . ,' 3 , ...,'Kí6á . are empty, Equation (1) becomes
*';kÔ í p$Ô íÇ .äpm6mom©pÔ p*-˨Ïp©;Ì>
r 5
By using the area lower bound forGIHDJI*C'D; , we have
GIHKJL*';R
e ¨
¨hpxf ÔíKp
e
¨Ïpt
¨Ïp$¥_f Ôí2Ç
.
pm6mom©p
e
f
Ôsá
. p e
f
Ô
¨
¨Ïp
*Ô í p$Ô í2Ç .äpm6mom©p$Ô ;z5
The above two inequalities give the asymptotic worst-case performance bound #p1§¨ in the theorem.
To show the lower boundDpt1§ *¨hp©; for
M N¡ ¢ , let us consider a list' which contains&4¨ tasks of size
§ *¨#p£©;êpl~ , and& tasks of size1§ *¨Ïp£;äq¨1~, where& is a multiple of¨sp , and~ is a sufficiently small value. All tasks have unit execution time. Clearly,GIHKJL*C'D;À(Ã& . Algorithm H divides' into
'Kí and'Dí2Ç . , ands*C'D;(¸&Õp&ê§ *¨Opª©;. Thus, we can choose & sufficiently large while keeping
#*'D;§GIHDJI*C'D;ë(ªDpt1§ *¨%p©;.
When¨Á ³ , algorithm# has better asymptotic worst-case performance ratio than the algorithm in
[1].
4 Probabilistic Analysis
Now let us consider the average-case performance of algorithm H . For convenience, we assume the task sizes are normalized such thatµ½l? =ëk , and that the?=’s are independent, identically distributed (i.i.d.) random variables with a common probability density functionîï*æ{; in the range*C<0o. Our assumption on the task execution times is quite general, i.e., the>o=’s are i.i.d. random variables with meanð and variance
ñ 3
, whereð andñ are any finite numbers independent of& . The probability distributions of task sizes and execution times are independent of each other.
Theorem 3. We have the following asymptotic average-case performance bound for algorithms :
MON
¡4¢
(òQ8S7T
: V N
XL *sµ*'D;2;
*ÑGIHKJO*C'D;9;1g
k
#á .
ó¹ d .
¶ÜôÎõ Æõ
Æ2ö
õ
îï*-æ{;9÷ævp
®lsôøõ
¢
ù
æ{îï*-æW;÷æ
Tvuw
e ô .
ù
æ{îï*æ{;÷æ/0
ô .
õú
îï*æ{;÷æ
f 5
(Note that the bound only depends onîï*-æW; .) Proof. It is clear that the mean task size is
?(
ô .
ù
æ{îï*æ{;÷æ/5
Since a task size falls into¿ ¹ with probabilityû©üÆ
îï*-æ{;9÷æ , where¿ ¹ (Î*§<*C¶Àp©;061§¶ for allk¶Õk
l , and¿( *C<061§, the expected number of tasks in'¹ is
*-&/¹;K(
Xô ü Æ
îï*-æW;÷æ
g
&ä0
for allOkY¶Âk . Also, the expected size of tasks in'D¹ is
?¹O(
ô ü Æ
æWîï*-æW;÷æ
ô ü Æ îï*æ{;÷æ
5
Since the area of a task, = has expectation? ð , and tasks in' . have to be executed sequentially, we have
*|GIHDJI*C'D;2;lTvu1wn*-&
? ðä0 *& . ;ð/;K(týv&4ðä0 (8)
where
ýÎ(Tvu1wn*
?{0 *& . ;2§1&/;ë(tTvuw
e ô .
ù
æ{îï*æ{;9÷æê0
ô .
õú
îï*-æ{;9÷æ
f 5
Letþv¹ be the makespan of the schedule for'¹ . Then, *C#µ*';9;ë(
*þ
.
; p
*þ
3
;pmom6mÑp
*þµI;. Clearly,
*Cþ . ;ë(
*-& . ;Ìðÿ(
Xô .
õú
îï*-æW;÷æ
g
&4ðä5 (9)
For¥Bkt¶kl®l , we have *Cþv¹;K( *Ô1¹;/p *-¨¹; , and
*Ô1¹;k
*&/¹;
ðä5
Furthermore,¨©¹ is no more than the maximum of¶ random execution times. It is well known from order statistics [5] that the mean of the maximum of i.i.d. random variables . 0 3 065757580 with meanð and varianceñ 3 is
*-Tvu1w_* . 0 3 0o5857570;2;%kðvp
sl
¥sl
ñ 5
Therefore,
*þ ¹ ; k *&/¹;
¶
ðÜpðÜp
¶l
¥¶
ñ
( X
¶ e ô ü Æ îï*æ{;÷æ
f
&Âpt
g
ðÜp
¶B
¥¶À
ñ 5 (10)
Finally, we considerþµ . Since
*A ;D( *& ;
? ðÿ(
Xô ü¢
æ{îï*-æW;÷æ
g
&4ðä0
and the processor utilization is at leasth§1 in the time interval¬ÒB0ÓÊo#, we get
*CþµL;k
*CA#O;
h
.
p *-¨oO;ë(
Xô ü¢
æWîï*-æW;÷æ
g
&4ðÜp *-¨oO;5
For *¨ ;, the main difficulty is that when task, :¢ starts execution, the number of active tasks still in execution is unknown, which could be as large as& , the total number of tasks in' . Since *& ; could be*&/; , we use the following quite loose upper bound for *-¨oO;, that is,¨o is no more than the maximum of& random execution times,
*-¨©L;kðÜp
&Á
¥1&Á
ñ
½ðvp
&¥ ñ 5
Therefore, we get
*CþµL;k
e
Xô ü¢
æ{îï*æ{;÷æ
g
&vp
f
ðÂp
&¥ ñ 5 (11)
Combining Equations (9)–(11), we obtain
*sµ*'D;2;k
e ô .
õú
îï*-æW;÷æµp
sá .
ó¹ d 3
¶Üô õ
Æõ
Æ2ö
õ
îï*-æ{;9÷æ
p
sô õ
¢
ù
æ{îï*æ{;9÷ævp
Ël
& f
&4ðÜp
e
sá .
ó¹ d 3
¶l
¥¶Bl
p &
¥f ñ 5
Notice that
¶
½ ¶¥
for all¶£ . Consequently,
sá .
ó
¹ d 3
¶B
¥¶À
½
#á .
ó
¹ d 3 ¶¥ ½ ô
3 æ¥
÷æÿ½
¥
² . 5
The above calculations give rises to
*C#µ*';9;k
e
sá .
ó
¹ d .
¶vôÎõ
Æõ
Æ9ö
õ
îï*æ{;÷æµp
Iô õ
¢
ù
æWîï*-æW;÷æµp
l
& f
&4ðÜp
e ¥
²
. p &
¥f ñ 5
(12) Using Equations (8) and (12), we obtain
*s*C'D;2;
*ÑGIHKJL*';9;
k
ý e
sá .
ó¹ d .
¶vô õ
Æõ
Æ9ö
õ
îï*-æW;÷æµp
®sô õ
¢
ù
æWîï*-æ{;9÷ævp
®l
& f
p
ý e ¥
² m
.
& p
¥&$f
ñð
(
ý e
sá
.
ó¹ d .
¶
ôÎõ Æõ
Æ9ö
õ
îï*-æW;÷æµp
®
ô õ
¢
ù
æWîï*-æ{;9÷æ
f
p
e .
& p & f 5
It is clear that as&Õ¦ , we get the asymptotic average-case performance bound in the theorem.
As an example, let us consider the uniform distributions, that is, the? =’s are i.i.d. random variables uniformly distributed in the range *x06§1¨©, where¨t is a positive integer. That is, îï*æ{;n(´¨ for
µ½$æÿk£§¨ . (Notice that when@ is sufficiently large, a discrete uniform distribution onº0Ó¥ 0o5857580Ó@§¨ ¾
can be treated as a continuous uniform distribution on*C<061§¨©.)
Theorem 4. If the? =’s be i.i.d. random variables uniformly distributed in the range*x06§1¨©, we have
M N
¡4¢
k í , that is,
*C#µ*';9;ksí
*|GIHKJO*C'D;2;z0
as&nØ , where
IíÏ(¥¨
3
X
3
¨ e
Dp
¥ 3 pmom6m©p
*-¨Il;
3
fhg
0
as¸¦ .
Proof. We examine the numerator and denominator of the bound in Theorem 3. The denominator is simply
ýÎ(Tvu1w
e ô .
ù
æ{îï*-æW;÷æ/0
ô .
õú
îï*æ{;÷æ
f (
¥¨
0
and the numerator is
(t¨
e
#á .
ó d
¶ 3
*C¶Op©;
p
¥1q*-©;f
5
Since
¶ 3
*C¶Àp©;
(
¶ 3
¶ p
¶Opt
0
we have
sá .
ó¹ d í
¶ 3
*C¶Àp©;
( e
#á .
ó¹ d í
¶ 3 f
¨ p
5
Note that
#á .
ó¹ d í
¶ 3 ( N
ó
¹ d .
¶ 3 í6á .
ó
¹ d .
¶ 3 N
ó
¹ d
¶ 3 k 3
í6á .
ó
¹ d .
¶ 3
5
Thus, we have
#á .
ó¹ d í
¶ 3
*C¶Àp©;
k 3
¨ e
Dp
¥ 3 pm6m6m1p
*¨s©;
3 f 0
and
k¨
e 3
¨ e
Dp
¥ 3 ptmom6m©p
*-¨L©;
3 f p
¥*l; f 5
By choosing sufficiently large, the average-case performance bound
§1ý can be made arbitrarily close tosí .
When¨L(ª , the asymptotic average-case performance bound given in Theorem 4 is . ( 3 §1²%Õ¥s(
5¥¯°¯¯<57575. To show the quality of the average-case performance boundIí in Theorem 4, we give the
following numerical data.
. ( 5¥¯°¯¯<57585
3 ( 58³°
à
¤¥<57585
Ð ( 58©¯¯x©¥<57585
ç ( 5¯¥²²¥ ¤57585
( 5<
àà
°<57585
é ( 5 ³³¥
à
¯¤57585
( 5 à ¤ à
¥ ©°<57585
è ( 5 à
©³²<57585
( 5²°² ¤¥ 57585
.ù ( 5²²¥³³°<57585
It is clear that í ½Ïp}1§¨ for all¨ÿÎ , i.e., í is less than the asymptotic worst-case performance bound in Theorem 2.
Though closed form solutions are not available, the average-case performance bounds of algorithm could be calculated using Theorem 3 numerically for arbitrary probability distribution of task sizes. For instance, let us consider a truncated exponential distribution, i.e.,
îï*-æ{;D(
á!
h
á
0 µ½$æ·k5
Theorem 5. If the? =’s be i.i.d. random variables exponentially distributed in the range *x06, we have
M N¡ ¢
k" , that is,
*C#µ*';9;%k#
*ÑGIHKJL*';9;0
as&nØ , where
( N
ó
¹ d .
¶ m á$&%
a
¹zÇ . c á'%Ó¹
h
á
á$
h
á
0
as¸¦ .
Proof. It can be easily verified by straightforward calculation that the numerator and denominator in Theorem 3 are
(
sá .
ó¹ d .
¶
m á$&%
a
¹zÇ . c á$&%Ó¹
h
á
p
Ël
m
h
á$
e
h
á$&%
(
á&%2
f 0
and
ý(tTvuw
e
á$
h
á$
0
á$&% 3 á
h
á
f (
á$
h
á$
0
respectively. By letting ¦ , the average-case performance bound
§1ý can be made arbitrarily close to .
To show the average-case performance bound" in Theorem 5, we let ()o¥ à , and choose
in such a way that the mean task size
?(
á$
h
á
takes the values 1§ *|¥1¨; for ¨( 0¥<0657585706© , so that a comparison can be made between performance bounds of# under the uniform and the exponential distributions.
(tx5<©¯<©²<5758570 (ª5¥¯°¯²³ 57575
(t²x5³°²³<©°<5758570
(ª5¥¤1²xo
à
57575
(³<5°²<5758570 (ª5¥
à
³¤³³ 57575
(¤ 5°¤1¯<© ¤¤5758570
(ª57o
à
xo<57575
(t°x5°°³ àà
5758570 "B(ª57©¥¤²
à
<57575
(ª5°°°x
à
³ 5758570 "B(ª57o¥<©¯<57575
(ªo²x5°°°¯² ¤<5758570 "B(ª5¯
à
¤
à
³ 57575
(ª©³<5°°°° ¤5758570 "B(ª5 ¤¥¥¤1<57575
(ª¤ 5°°°°°³<5758570 "B(ª5¥°² ¤57575
(ªo°x5°°°°°°x5758570 "B(ª5³³ ¤1 ³²<57575
As shown in the above list, is slightly smaller than 3 §1²v¥ , when?(È<5³ , i.e, ¨l( , due to distribution imbalance in*C<0o. However, is larger than í for all¨B} , because *-&ê.6; is never null.
5 Conclusions
We have studied the problem of scheduling independent nonmalleable parallel tasks in parallel systems with identical processors. We proposed a simple approximation algorithm called # , and performed combinatorial analysis for its worst-case performance and probabilistic analysis for its average-case per- formance. In particular, we proved the following results. (1) The asymptotic worst-case performance ratio
M N
¡W¢ is in the range¬8 3Ð 5857 .
Ð
.9è
. (2) If the numbers of processors requested by the tasks are uniformly distributed i.i.d. random variables and task execution times are i.i.d. random variables with finite mean and variance, then the average-case performance ratio is
M N
¡4¢
kÞ5¥¯°¯¯<57585. In other words, less than 22.5% of the allocated computing power is wasted. (3) Both the worst- and average-case performance ratios improve significantly when tasks request for smaller numbers of processors. (4) Results similar to (2)–(3) also hold for the truncated exponential distribution of task sizes.
Acknowledgements
The author wishes to express his gratitude to the editor and two anonymous referees for their comments on improving the paper. This research was supported in part by National Aeronautics and Space Adminis- tration and the Research Foundation of State University of New York through the NASA/University Joint Venture in Space Science Program under Grant NAG8-1313.
References
[1] B.S. Baker, D.J. Brown, and H.P. Katseff, “A 5/4 algorithm for two-dimensional packing,” Journal of Algorithms 2, 348-368, 1981.
[2] B.S. Baker and J.S. Schwarz, “Shelf algorithms for two-dimensional packing problems,” SIAM Jour- nal on Computing 12, 508-525, 1983.
[3] J. Blazewicz, J.K. Lenstra, and A.H.G. Rinnooy Kan, “Scheduling subject to resource constraints:
classification and complexity,” Discrete Applied Mathematics 5, 11-24, 1983.
[4] E.G. Coffman, Jr., D.S. Johnson, P.W. Shor, and G.S. Lueker, “Probabilistic analysis of packing and related partitioning problems,” in Probability and Algorithms, 87-107, National Research Council, 1992.
[5] H.A. David, Order Statistics, John Wiley & Sons, 1970.
[6] M.R. Garey and R.L. Graham, “Bound for multiprocessor scheduling with resource constraints,”
SIAM Journal on Computing 4, 187-200, 1975.
[7] I. Golan, “Performance bounds for orthogonal oriented two-dimensional packing algorithms,” SIAM Journal on Computing 10, 571-582, 1981.
[8] R. Krishnamurti and E. Ma, “An approximation algorithm for scheduling tasks on varying partition sizes in partitionable multiprocessor systems,” Technical Report RC 15900, IBM Research Division, 1990.
[9] D.K.D.B. Sleator, “A 2.5 times optimal algorithm for bin packing in two dimensions,” Information Processing Letters 10, 37-40, 1980.
[10] J. Turek, J.L. Wolf, K.R. Pattipati, and P.S. Yu, “Scheduling parallelizable tasks: putting it all on the shelf,” Proc. International Conference on Measurement and Modeling of Computer Systems, 225-236, 1992.
Appendix. Proof of Equation (7)
The following fact is used in the proof. If )
*æ{;K(+*
æµp-,
.
æp$÷
0
where
*
0/,10
.
0÷x02æ$ª , then
)
*-æW; is an increasing (decreasing) function ofæ if
*
÷,
.
Þ (½ª ). We consider three cases.
Case 1.Ô . is the maximum. SinceÔ . .3 Ô . pqÔ 3 , we haveÔ 3 k .3 Ô . . SinceÔ . .3 Ô . p 3Ð Ô 3 p Ðç æ , we haveæÕk
çÐ * .
3 Ô . 3Ð Ô 3 ; . Hence,
â ( Dp
Ô 3
Ô . p æ
Ô .
( Dp
Ô 3
Ô.
p ಠe ¥ ¥² m Ô 3
Ô. f
( Dp
¥² p
° m Ô 3 Ô .
k Dp
¥² p
° m
¥
(
o²
o¯
5
Case 2. 3. Ô . p£Ô 3 is the maximum. Since Ô . k 3. Ô . p£Ô 3 , we haveÔ . k ¥Ô 3 . Since 3. Ô . pÔ 3
.
3 Ô . p 3Ð Ô 3 p Ðç æ , we getæÿk
ç Ô 3 . Now,
â (
Ô.äp$Ô3Dpæ
.3 Ô . p$Ô 3 k Ô.äp
. Ð
Ô©3
.3 Ô . pÔ 3 0
which is an increasing function ofÔ . . Henceâ takes its maximum value .
Ð
.9è whenÔ . (£¥Ô 3 .
Case 3. .3 Ô . p 3Ð Ô 3 p Ðç æ is the maximum. Since .3 Ô . plÔ 3 k .3 Ô . p 3Ð Ô 3 p Ðç æ , we getæ
ç Ô 3 . Note that
â (
æplÔ1.äp$Ô3
Ðç
æµp
.3 Ô . p 3Ð Ô 3
is a decreasing function ofæ . Thus,â gets its maximum value whenæÁ(
ç
Ô3 , i.e.,
â k Ô . p .Ð
Ô 3
.
3 Ô . p$Ô 3 0
which is an increasing function ofÔ. . Since, Ô.µk .3 Ô1.p
3Ð
Ô3%p
Ðç æ , andæ(
ç
Ô3 , we haveÔ.µkÞ¥Ô3 . Hence
â
reaches its maximum value .
Ð
.è whenÔ.h(£¥Ô3 .