Academic year: 2021

(1)

Parallel Programming in MPI

part 2

(2)

Today's Topic

• Non-Blocking Communication

• Execute other instructions while waiting for the completion of a communication.

• Implementation of collective communications

• Measuring execution time of MPI programs

• Deadlock


(4)

Non-blocking communication functions

• Non-blocking = proceed to the next instruction without waiting for the completion of the current one

Example) MPI_Irecv & MPI_Wait

• Blocking (MPI_Recv): wait for the arrival of the data, then execute the next instructions.

• Non-Blocking (MPI_Irecv): proceed to the next instructions without waiting for the data, and call MPI_Wait at the point where the data is needed.

(5)

MPI_Irecv

• Non-Blocking Receive

Parameters:

start address for storing received data, number of elements, data type, rank of the source, tag (= 0, in most cases), communicator (= MPI_COMM_WORLD, in most cases), request

• request: Communication Request

• Used to wait for the completion of this communication

• Example)

MPI_Request req;
MPI_Status status;
...
MPI_Irecv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
...
MPI_Wait(&req, &status);

Usage:

int MPI_Irecv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Request *r);

(6)

MPI_Isend

• Non-Blocking Send

Parameters:

start address of the data to send, number of elements, data type, rank of the destination, tag (= 0, in most cases), communicator (= MPI_COMM_WORLD, in most cases), request

• Example)

MPI_Request req;
MPI_Status status;
...
MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
...
MPI_Wait(&req, &status);

Usage:

int MPI_Isend(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm, MPI_Request *r);

(7)

Non-Blocking Send?

• Blocking send (MPI_Send):

Wait for the data to be copied somewhere else.

• That is, until the data has been sent out to the network, or until it has been copied to a temporary buffer.

• Non-Blocking send (MPI_Isend):

Do not wait.

(8)

Notice: Data is indeterminate during non-blocking communications

• MPI_Irecv:

• The value of the variable specified as the receive buffer is indeterminate until MPI_Wait.

Example) A holds 10 when MPI_Irecv to A is issued, and the arriving data is 50.

• Reading A (... = A) between MPI_Irecv and MPI_Wait: the value of A can be 10 or 50.

• Reading A (... = A) after MPI_Wait: the value of A is 50.

(9)

Notice: Data is indeterminate during non-blocking communications

• MPI_Isend:

• If the variable holding the data to be sent is modified before MPI_Wait, the value actually sent is unpredictable.

Example) A holds 10 when MPI_Isend of A is issued.

• Modifying A between MPI_Isend and MPI_Wait (A = 50) causes incorrect communication: the data sent can be 10 or 50.

• You can modify the value of A after MPI_Wait (A = 100).

(10)

MPI_Wait

• Wait for the completion of a non-blocking communication (MPI_Isend or MPI_Irecv).

• After it returns, the send data can be modified and the received data can be read.

• Parameters:

request, status

status:

The status of the received data is stored at the completion of MPI_Irecv.

Usage:

int MPI_Wait(MPI_Request *r, MPI_Status *s);

(11)

MPI_Waitall

• Wait for the completion of the specified number of non-blocking communications

Parameters:

count, requests, statuses

count:

The number of non-blocking communications

requests, statuses:

Arrays of MPI_Request and MPI_Status, each with at least 'count' elements.

Usage:

int MPI_Waitall(int c, MPI_Request *requests, MPI_Status *statuses);


(13)

Inside of the functions of collective communications

• Usually, collective communication functions are implemented with point-to-point message passing functions such as MPI_Send, MPI_Recv, MPI_Isend and MPI_Irecv.

(14)

Inside of MPI_Bcast

• One of the simplest implementations

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int i, myid, procs;
    MPI_Status st;

    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    if (myid == root){
        /* the root sends the data to every other rank, one at a time */
        for (i = 0; i < procs; i++)
            if (i != root)
                MPI_Send(a, c, d, i, 0, comm);
    } else{
        /* every other rank receives the data from the root */
        MPI_Recv(a, c, d, root, 0, comm, &st);
    }
    return 0;
}

(15)

Another implementation: With MPI_Isend

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int i, myid, procs, cntr;
    MPI_Status st, *stats;
    MPI_Request *reqs;

    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    if (myid == root){
        /* one request and one status per destination (needs <stdlib.h>) */
        stats = (MPI_Status *)malloc(sizeof(MPI_Status)*procs);
        reqs = (MPI_Request *)malloc(sizeof(MPI_Request)*procs);
        cntr = 0;
        /* start all sends without waiting, then wait for all of them */
        for (i = 0; i < procs; i++)
            if (i != root)
                MPI_Isend(a, c, d, i, 0, comm, &(reqs[cntr++]));
        MPI_Waitall(procs-1, reqs, stats);
        free(stats);
        free(reqs);
    } else{
        MPI_Recv(a, c, d, root, 0, comm, &st);
    }
    return 0;
}

(16)

Flow of the Simple Implementation

Rank 0 Rank 1 Rank 2 Rank 3 Rank 4 Rank 5 Rank 6 Rank 7

• Rank 0: Isend to 1, Isend to 2, Isend to 3, Isend to 4, Isend to 5, Isend to 6, Isend to 7, then waitall

• Ranks 1 to 7: Irecv from 0, then wait

(17)

Time for Simple Implementation

1 link can transfer 1 message at a time

[Figure: ranks 0 to 7 at each of 7 transfer steps]

Total Time = T * (P-1)

(18)

Another implementation: Binomial Tree

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int myid, procs;
    MPI_Status st;
    int mask, relative_rank, src, dst;

    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    /* rank relative to the root, so that the root becomes 0 */
    relative_rank = myid - root;
    if (relative_rank < 0)
        relative_rank += procs;

    /* receive phase: the lowest set bit of relative_rank gives the distance to the sender */
    mask = 1;
    while (mask < procs){
        if (relative_rank & mask){
            src = myid - mask;
            if (src < 0) src += procs;
            MPI_Recv(a, c, d, src, 0, comm, &st);
            break;
        }
        mask <<= 1;
    }

    /* send phase: forward the data at decreasing distances */
    mask >>= 1;
    while (mask > 0){
        if (relative_rank + mask < procs){
            dst = myid + mask;
            if (dst >= procs) dst -= procs;
            MPI_Send(a, c, d, dst, 0, comm);
        }
        mask >>= 1;
    }
    return 0;
}

(19)

Flow of Binomial Tree

• Use 'mask' to determine when and how to Send/Recv

• Rank 0: Send to 4 (mask = 4), Send to 2 (mask = 2), Send to 1 (mask = 1)

• Rank 1: Recv from 0 (mask = 1)

• Rank 2: Recv from 0 (mask = 2), Send to 3 (mask = 1)

• Rank 3: Recv from 2 (mask = 1)

• Rank 4: Recv from 0 (mask = 4), Send to 6 (mask = 2), Send to 5 (mask = 1)

• Rank 5: Recv from 4 (mask = 1)

• Rank 6: Recv from 4 (mask = 2), Send to 7 (mask = 1)

• Rank 7: Recv from 6 (mask = 1)
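The send/receive pattern of the binomial tree can be checked without running MPI at all. The sketch below repeats the mask arithmetic of the broadcast and records the partners instead of communicating. The function name binomial_schedule and its output parameters are hypothetical and are not part of the slides.

```c
/* Hypothetical helper (not from the slides): compute, without any MPI calls,
 * which rank 'myid' receives from and which ranks it sends to in the
 * binomial-tree broadcast. Returns the number of destinations written to
 * dsts[]; *src is set to -1 for the root, which receives nothing. */
int binomial_schedule(int myid, int root, int procs, int *src, int dsts[])
{
    int relative_rank = (myid - root + procs) % procs;
    int mask = 1, n = 0;

    *src = -1;
    /* Receive phase: the lowest set bit of relative_rank selects the sender. */
    while (mask < procs) {
        if (relative_rank & mask) {
            *src = (myid - mask + procs) % procs;
            break;
        }
        mask <<= 1;
    }
    /* Send phase: forward to ranks at decreasing distances below that bit. */
    mask >>= 1;
    while (mask > 0) {
        if (relative_rank + mask < procs)
            dsts[n++] = (myid + mask) % procs;
        mask >>= 1;
    }
    return n;
}
```

With 8 processes and root 0, binomial_schedule(4, 0, 8, &src, dsts) gives src = 0 and the destinations 6 and 5, in that order, which agrees with the flow for Rank 4.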

(20)

Time for Binomial Tree

Use multiple links at a time

[Figure: ranks 0 to 7 at each of 3 transfer steps]

Total Time = T * log2(P)

T: Time for transferring 1 message

P: Number of processes
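To compare the two broadcast algorithms numerically, the step counts can be computed directly. This is a minimal sketch under the slides' model that one link carries one message at a time; the function names linear_steps and tree_steps are hypothetical.

```c
/* Hypothetical helpers (not from the slides): number of transfer steps
 * needed to broadcast to 'procs' processes. */

/* Simple implementation: the root sends to the other ranks one by one. */
int linear_steps(int procs)
{
    return procs - 1;
}

/* Binomial tree: the number of ranks holding the data doubles every step,
 * so the step count is log2(procs) rounded up. */
int tree_steps(int procs)
{
    int steps = 0, have = 1;

    while (have < procs) {
        have *= 2;
        steps++;
    }
    return steps;
}
```

For 8 processes this gives 7 steps against 3; for 1024 processes, 1023 steps against 10. The total time in each case is T multiplied by the step count.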


(22)

Measure the time of MPI programs

• MPI_Wtime

• Returns the current time in seconds as a floating-point number.

• Example)

...
double t1, t2;
...
t1 = MPI_Wtime();
/* the processing to measure */
t2 = MPI_Wtime();

(23)

Problem on measuring time in parallel programs

• Each process measures a different time. Which one is the time we want?

[Figure: timelines of Rank 0 (Read, Send), Rank 1 (Receive) and Rank 2 (Receive), each calling MPI_Wtime() around the region marked "Measure time here"]

(24)

Use MPI_Barrier

• A solution using the collective communication MPI_Barrier

• Synchronize processes with MPI_Barrier before each measurement

• For measuring total execution time.

[Figure: the same timelines of Rank 0, Rank 1 and Rank 2, with MPI_Barrier called on every rank before each MPI_Wtime() around the region marked "Measure time here"]

(25)

Detailed analysis

Average

• MPI_Reduce can be used to obtain the average:

double t1, t2, t, total;
t1 = MPI_Wtime();
...
t2 = MPI_Wtime();
t = t2 - t1;
MPI_Reduce(&t, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (myrank == 0)
    ...   /* average = total / number of processes */

MAX and MIN

• Use MPI_Gather to gather all of the results to Rank 0.

• Let Rank 0 find MAX and MIN.

(26)

Relationships among Max, Ave and Min

• Can be used for checking the load balance (the variation in the amount of work per process).

• Max - Ave is large, Ave - Min is large: NG

• Max - Ave is small, Ave - Min is large: Mostly OK

• Ave - Min is small: ...
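Once the per-process times have been gathered on rank 0, the three values can be computed in one pass. The sketch below is plain C and assumes the times are already in an array, for example one filled by MPI_Gather. The struct time_stats and the function summarize_times are hypothetical names.

```c
/* Hypothetical helper (not from the slides): summarize the per-process
 * times gathered on rank 0. Assumes n >= 1. */
struct time_stats {
    double max, min, ave;
};

struct time_stats summarize_times(const double *t, int n)
{
    struct time_stats s = { t[0], t[0], 0.0 };
    double total = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        if (t[i] > s.max) s.max = t[i];
        if (t[i] < s.min) s.min = t[i];
        total += t[i];
    }
    s.ave = total / n;
    return s;
}
```

For the times 1.0, 2.0, 3.0 and 6.0 this gives Max = 6.0, Ave = 3.0 and Min = 1.0, so Max - Ave = 3.0 is large compared with the average, which points to one overloaded process.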

(27)

Measuring time for communications

double t1, t2, t3, t4, comm = 0;

t3 = MPI_Wtime();
for (i = 0; i < N; i++){
    /* computation */
    t1 = MPI_Wtime();
    /* communication */
    t2 = MPI_Wtime();
    comm += t2 - t1;
    /* computation */
    t1 = MPI_Wtime();
    /* communication */
    t2 = MPI_Wtime();
    comm += t2 - t1;
}
t4 = MPI_Wtime();
/* total time = t4 - t3, communication time = comm */

(28)

Analyze computation time

• Computation time = Total time - Communication time

• Or, just measure the computation time

• Variation in computation time = degree of load imbalance

The balance of computation time shows the balance of the amount of computation.

• Note: communication time is difficult to evaluate simply, since it includes waiting time caused by load imbalance.


(30)

Deadlock

• A state in which a program cannot proceed for some reason.

Places where you need to be careful about deadlocks in MPI programs:

1. MPI_Recv, MPI_Wait, MPI_Waitall

2. Collective communications

Wrong case: both ranks block in MPI_Recv, so neither reaches MPI_Send.

if (myid == 0){
    MPI_Recv from rank 1
    MPI_Send to rank 1
}
if (myid == 1){
    MPI_Recv from rank 0
    MPI_Send to rank 0
}

One solution:

if (myid == 0){
    MPI_Irecv from rank 1
    MPI_Send to rank 1
    MPI_Wait
}
if (myid == 1){
    MPI_Irecv from rank 0
    MPI_Send to rank 0
    MPI_Wait
}

(31)

Summary

• Effect of non-blocking communication

• Split the start and the completion of a communication.

• Enable overlapping of communication and computation.

• Implementation of collective communication

• Construct algorithms with sends and receives.

• Time depends on the algorithm.

• Measuring execution time of MPI programs

• Be careful about deadlocks in parallel programs

(32)

Report) Make Reduce function by yourself

• Fill in the body of the 'my_reduce' function in the program shown in the next slide to complete the program.

• my_reduce: Simplified version of MPI_Reduce

• Calculates the total sum of integer numbers only. The root rank is always 0. The communicator is always MPI_COMM_WORLD.

• Any algorithm is OK.

(33)

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define N 20

int my_reduce(int *a, int *b, int c)
{
    /* to be filled in for the report */
    return 0;
}

int main(int argc, char *argv[])
{
    int i, myid, procs;
    int a[N], b[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    for (i = 0; i < N; i++){
        a[i] = i;
        b[i] = 0;
    }
    my_reduce(a, b, N);
    if (myid == 0)
        for (i = 0; i < N; i++)
            printf("b[%d] = %d , correct answer = %d\n", i, b[i], i*procs);
    MPI_Finalize();
    return 0;
}
