4.3.1 Heterogeneous Computation Platform
Let us consider a computation Grid, in that, a master process controlsn worker processes and each process runs on a particular computer. The Grid runs on a heterogeneous platform, i.e., workers can have different CPU powers and different network bandwidths.
The master can divide the total workload, Ltotal, into arbitrary chunks and deliver them to the appropriate workers. We assume that the master uses its network connection in a sequential fashion, i.e., it does not send chunks to some workers simultaneously. Workers can receive data from network and perform computation simultaneously.
We keep using the heterogeneous computation platform mentioned in the last chapter, with some addition.
• Ltotal: the total amount of workload.
• Wi: worker numberi, some time called Pi.
• n: total number of workers that are actually selected to process the workload
• m: the number of rounds.
• chunkj,i : the fraction of total workload that the master delivers to worker Wi in round j (i= 1, .., n; j = 1, .., m)
• Si: computation speed of the workerimeasured by the number of units of workload performed per second (f lop/s)
• ESi: estimated average speed of workerWi for Grid tasks on the next round. ESi is derived from equation (4.15).
• Bi: the data transfer rate of the connection link between the master and worker Wi (f lop/s)
• T compj,i: we model the time required for worker i to perform the computation chunkj,i as:
T compj,i=cLati+ chunkj,i ESi
• cLati : the fixed overhead time, in seconds, for starting a computation (e.g. for starting a remote process) in the worker Wi. The computation, including thecLati overhead, can be overlapped with communication.
• nLati : the overhead time, in seconds, incurred by the master to initiate a data transfer toWi (e.g. pre-process application input data and/or initiate a TCP). We denote total latencies by Lati =cLati + nLati.
• T commj,i: we model the communication time spent by the master to sendchunkj,i units of data to workerWi as:
T commj,i=nLati+chunkj,i
(4.2)
• roundj: the fraction of workload dispatched during round j roundj =
n i=1
chunkj,i (4.3)
We fix the time required for each worker to perform communication and computation during each round
cLati+ chunkj,i
Si +nLati+chunkj,i
Bi =constj (4.4)
We set
Ai = Bi×ESi
Bi +ESi (4.5)
so we have
chunkj,i =αiroundj +βi (4.6) where
αi = Ai
n
k=1Ak; βi =Ai
n
k=1Ak(Latk−Lati)
n
k=1Ak (4.7)
Most static scheduling algorithms [4, 18, 16] assume that the execution time of a workload chunk is well-known based on the assumption that workers have guaranteed availability of fixed, predefined CPU power. On a non dedicated, dynamic platform such as Grid, these assumptions are not realistic. Thus in this section we present a model of executing local and Grid tasks at a given, non-dedicated worker.
4.3.2 Markovian Queue M/M/1
1 2 3 4
T1 T2 T3 T4 T5 T6 T7
S1 S S
2 S3 4
Number of waiting tasks
T8 Time Figure 4.1: Arrivals and departures at a queue. {Ti} refer to the arrival instants, {Si} refer to the service times
Per [39] a queue or a waiting line is formed by arriving customers (local tasks and Grid tasks in our case) requiring service from a service station (workers Wi in our case).
If service is not immediately available, the arriving tasks may join the queue and wait for service and leave the system after being served (see Figure (4.1). In the meantime, other tasks may arrive for service. We assume that the service system has unlimited capacity (waiting room capacity) for holding both local tasks and Grid tasks. The basic features of our queue are
• The input process: the arriving tasks consist of Grid tasks and local tasks. Grid tasks are the portions of total load Ltotal that are delivered by the server. The local tasks are produced by local applications at the workers.
• The service mechanism: during the execution of a Grid task on a certain worker, some local tasks may arrive causing to interrupt the execution of the lower priority Grid tasks. We consider the execution of the local tasks as preemptive, i.e. a local task must be executed until completion once it is started. The execution of the local tasks follow the rule of first come first served.
• The worker’s capacity. From the view point of the Grid tasks, the state of a worker alternates between available and unavailable. When the worker is executing its own local tasks, it is unavailable for the Grid tasks, otherwise its state is available. The original computation of worker Wi is Si.
We assume that the arrival of the local tasks of worker Wi is assumed to follow a Poisson distribution with arrival rate λi, their execution process follows an exponential distribution with service rate µi and the local task process in the worker is an M/M/1 [39] queuing system (i= 1,2, ..., n) (Figure (4.2)).
µ Output
Worker
Queue Input P( t)λ
Figure 4.2: M/M/1 queue
The execution time of chunkj,i on the workerWi can be expressed as:
T compj,i=X1+Y1+X2+Y2+...+XNL+YNL (4.8) where:
• N L: the number of local tasks which arrive during the execution of workload chunkj,i.
• Yk: execution time of the local task k (k = 1,2, ..., N L), these are independent identical distribution random variables.
• Xk: execution time of kth section ofchunkj,i (k = 1,2, ..., N L). We have:
X1+X2+...+XNL = chunkj,i
(4.9)
From the M/M/1 queuing theory [39] we have:
E(N L) = λichunkj,i
Si ; E(Yk) = 1
µi−λi (4.10)
Because of N L and Yk are independent random variables (k= 1,2, ..., N L) we derive E(T compj,i) = E(T compj,i|N L) =
NL k=1
Xk+
NL
k=1
E(Yk)
= chunkj,i
Si +E(N L)×E(Yk) = chunkj,i
Si(1−ρi) (4.11) where ρi = λi/µi represents the CPU utilization. For a worker Wi with the CPU utilization ρi we can express the computation time of the chunkj,i as
T compj,i = chunkj,i Si(1−ρi)
However λi, µi, ρi are representative of the dynamicity of the environment during a long time. They do not exactly reflect the dynamicity of the environment during a short interval such as the execution time of an application. Therefore, we introduce theadaptive factor δi, which represents the credibility of performance prediction for workerWi and it is initialized to 1 at the beginning of the scheduling process (i.e., in the first round). At the end of each round afterward, δi is computed as follows:
δi = F Si
ESi (4.12)
whereF Si denotes the factually measured available CPU power. Now the expected value of the execution time of chunkj,i is
T compj,i = chunkj,i×δi
Si(1−ρi) (4.13)
Since the actual power of workers available to the Grid tasks varies over time, we have to predict how δi changes. In the next section we describe 2 ways for prediction smoothing parameter δi, i.e. the CPU utilization:
• Prediction δi by using proposed 2PP strategy.
• Prediction δi by using an existing strategy called Mixed Tendency Based.