4.4 Broadcast

MPI_BCAST(buffer, count, datatype, root, comm)

INOUT buffer    starting address of buffer (choice)
IN    count     number of entries in buffer (integer)
IN    datatype  data type of buffer (handle)
IN    root      rank of broadcast root (integer)
IN    comm      communicator (handle)

int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

MPI_BCAST(BUFFER, COUNT, DATATYPE, ROOT, COMM, IERROR)

<type> BUFFER(*)

INTEGER COUNT, DATATYPE, ROOT, COMM, IERROR

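MPI_BCAST broadcasts a message from the process with rank root to all processes of the group, itself included. It is called by all members of the group using the same arguments for comm and root. On return, the contents of root's communication buffer have been copied to all processes.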

General, derived datatypes are allowed for datatype. The type signature of count, datatype on any process must be equal to the type signature of count, datatype at the root.

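This implies that the amount of data sent must be equal to the amount received, pairwise between each process and the root. MPI_BCAST and all other data-movement collective routines make this restriction. Distinct type maps between sender and receiver are still allowed.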
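To make the end state concrete, here is a serial C model of what MPI_BCAST leaves behind; it is not MPI code. Each "process" buffer is one row of a flat array, and the helper name model_bcast is a hypothetical illustration. It captures only the final buffer contents, not how an MPI implementation actually moves the data.

```c
#include <assert.h>
#include <string.h>

/* Serial model of the MPI_BCAST outcome (hypothetical helper, no MPI calls).
 * bufs holds nprocs buffers of count ints each, stored back to back;
 * buffer r stands for the buffer of the process with rank r. */
void model_bcast(int *bufs, int nprocs, int count, int root)
{
    assert(nprocs > 0 && count >= 0 && root >= 0 && root < nprocs);
    const int *src = bufs + (size_t)root * count;
    for (int r = 0; r < nprocs; ++r) {
        if (r != root) {
            /* every non-root buffer ends up as a copy of the root's buffer */
            memcpy(bufs + (size_t)r * count, src, (size_t)count * sizeof(int));
        }
    }
}
```

With three "processes" of four ints each and root 1, every row ends up equal to row 1.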

4.4.1 Example using MPI_BCAST

Example 4.1 Broadcast 100 ints from process 0 to every process in the group.

MPI_Comm comm;
int array[100];
int root=0;
...
MPI_Bcast( array, 100, MPI_INT, root, comm);

As in many of our example code fragments, we assume that some of the variables (such as comm in the above) have been assigned appropriate values.

4.5 Gather

MPI_GATHER(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)

IN  sendbuf    starting address of send buffer (choice)
IN  sendcount  number of elements in send buffer (integer)
IN  sendtype   data type of send buffer elements (handle)
OUT recvbuf    address of receive buffer (choice, significant only at root)
IN  recvcount  number of elements for any single receive (integer, significant only at root)
IN  recvtype   data type of recv buffer elements (significant only at root) (handle)
IN  root       rank of receiving process (integer)
IN  comm       communicator (handle)

int MPI_Gather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

MPI_GATHER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)

<type> SENDBUF(*), RECVBUF(*)

INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR

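Each process (root process included) sends the contents of its send buffer to the root process. The root process receives the messages and stores them in rank order. The outcome is as if each of the n processes in the group (including the root process) had executed a call to

MPI_Send(sendbuf, sendcount, sendtype, root, ...),

and the root had executed n calls (one for each i = 0, ..., n-1) to

MPI_Recv(recvbuf + i*recvcount*extent(recvtype), recvcount, recvtype, i, ...),

where extent(recvtype) is the type extent obtained from a call to MPI_Type_extent().

An alternative description is that the n messages sent by the processes in the group are concatenated in rank order, and the resulting message is received by the root as if by a call to MPI_RECV(recvbuf, recvcount*n, recvtype, ...).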
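The placement rule above (block i starts recvcount*i elements into recvbuf) can be sketched as a serial C model. This is not MPI code: the helper name model_gather is hypothetical, each rank's send buffer is represented by a plain pointer, and recvtype is assumed to be a basic type such as MPI_INT so that one element equals one extent.

```c
#include <assert.h>
#include <string.h>

/* Serial model of the MPI_GATHER outcome (hypothetical helper, no MPI calls).
 * sendbufs[i] is the send buffer of rank i and holds recvcount ints.
 * recvbuf plays the root's receive buffer; it must hold n*recvcount ints,
 * because recvcount counts the items from ONE process, not the total. */
void model_gather(const int *const sendbufs[], int n, int recvcount, int *recvbuf)
{
    assert(n >= 0 && recvcount >= 0);
    for (int i = 0; i < n; ++i) {
        /* the i-th receive lands recvcount*i elements into recvbuf,
         * i.e. i*recvcount*extent(MPI_INT) bytes from its start */
        memcpy(recvbuf + (size_t)i * recvcount, sendbufs[i],
               (size_t)recvcount * sizeof(int));
    }
}
```

The result is exactly the rank-order concatenation described in the alternative formulation.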

The receive buffer is ignored for all non-root processes.

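General, derived datatypes are allowed for both sendtype and recvtype. The type signature of sendcount, sendtype on process i must be equal to the type signature of recvcount, recvtype at the root. This implies that the amount of data sent must be equal to the amount of data received, pairwise between each process and the root. Distinct type maps between sender and receiver are still allowed.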

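All arguments to the function are significant on process root, while on other processes only the arguments sendbuf, sendcount, sendtype, root, and comm are significant. The arguments root and comm must have identical values on all processes.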

The specification of counts and types should not cause any location on the root to be written more than once. Such a call is erroneous.

Note that the recvcount argument at the root indicates the number of items it receives from each process, not the total number of items it receives.

MPI_GATHERV(sendbuf, sendcount, sendtype, recvbuf, recvcounts, displs, recvtype, root, comm)

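IN  sendbuf     starting address of send buffer (choice)
IN  sendcount   number of elements in send buffer (integer)
IN  sendtype    data type of send buffer elements (handle)
OUT recvbuf     address of receive buffer (choice, significant only at root)
IN  recvcounts  integer array (of length group size) containing the number of elements that are received from each process (significant only at root)
IN  displs      integer array (of length group size). Entry i specifies the displacement relative to recvbuf at which to place the incoming data from process i (significant only at root)
IN  recvtype    data type of recv buffer elements (significant only at root) (handle)
IN  root        rank of receiving process (integer)
IN  comm        communicator (handle)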

int MPI_Gatherv(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int *recvcounts, int *displs, MPI_Datatype recvtype, int root, MPI_Comm comm)

MPI_GATHERV(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNTS, DISPLS, RECVTYPE, ROOT, COMM, IERROR)

<type> SENDBUF(*), RECVBUF(*)

INTEGER SENDCOUNT, SENDTYPE, RECVCOUNTS(*), DISPLS(*), RECVTYPE, ROOT, COMM, IERROR

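MPI_GATHERV extends the functionality of MPI_GATHER by allowing a varying count of data from each process, since recvcounts is now an array. It also allows more flexibility as to where the data is placed on the root, by providing the new argument, displs.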

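The outcome is as if each process, including the root process, sends a message to the root,

MPI_Send(sendbuf, sendcount, sendtype, root, ...),

and the root executes n receives (one for each i = 0, ..., n-1),

MPI_Recv(recvbuf + displs[i]*extent(recvtype), recvcounts[i], recvtype, i, ...).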

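Messages are placed in the receive buffer of the root process in rank order; that is, the data sent from process j is placed in the jth portion of the receive buffer recvbuf on process root. The jth portion of recvbuf begins at offset displs[j] elements (in terms of recvtype) into recvbuf.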
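The displs rule can be pictured with a serial C model of the root's receive buffer after MPI_GATHERV. This is a sketch, not MPI code: the helper name model_gatherv is hypothetical, and recvtype is assumed to be MPI_INT so that displs[j] counts ints.

```c
#include <assert.h>
#include <string.h>

/* Serial model of where MPI_GATHERV puts each rank's data at the root
 * (hypothetical helper, no MPI calls). Rank j's recvcounts[j] ints are
 * copied into recvbuf starting at element displs[j]; elements of recvbuf
 * not covered by any block are left untouched. */
void model_gatherv(const int *const sendbufs[], int n,
                   const int recvcounts[], const int displs[], int *recvbuf)
{
    for (int j = 0; j < n; ++j) {
        assert(recvcounts[j] >= 0 && displs[j] >= 0);
        memcpy(recvbuf + displs[j], sendbufs[j],
               (size_t)recvcounts[j] * sizeof(int));
    }
}
```

Gaps between blocks are simply never written, which is what makes the stride layouts in the later examples possible.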

The receive buffer is ignored for all non-root processes.

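The type signature implied by sendcount, sendtype on process i must be equal to the type signature implied by recvcounts[i], recvtype at the root. This implies that the amount of data sent must be equal to the amount of data received, pairwise between each process and the root. Distinct type maps between sender and receiver are still allowed, as illustrated in Example 4.6.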

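All arguments to the function are significant on process root, while on other processes only the arguments sendbuf, sendcount, sendtype, root, and comm are significant. The arguments root and comm must have identical values on all processes.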

The specification of counts, types, and displacements should not cause any location on the root to be written more than once. Such a call is erroneous.
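One way to catch such an erroneous call before making it is to check that the receive regions are pairwise disjoint. The sketch below is a hypothetical helper, not an MPI routine; it assumes recvtype is a basic type such as MPI_INT, so block j occupies the half-open element range [displs[j], displs[j] + recvcounts[j]).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical checker (not part of MPI): returns true if no element of
 * the root's receive buffer would be written by two different blocks.
 * Pairwise O(n^2) interval test, adequate for illustration. */
bool gatherv_regions_disjoint(int n, const int recvcounts[], const int displs[])
{
    for (int a = 0; a < n; ++a) {
        for (int b = a + 1; b < n; ++b) {
            if (recvcounts[a] == 0 || recvcounts[b] == 0)
                continue; /* empty blocks write nothing */
            long a_lo = displs[a], a_hi = a_lo + recvcounts[a];
            long b_lo = displs[b], b_hi = b_lo + recvcounts[b];
            if (a_lo < b_hi && b_lo < a_hi)
                return false; /* the half-open ranges intersect */
        }
    }
    return true;
}
```

For blocks of 100 ints at displs[i] = i*stride, this returns true for stride = 100 and false for stride = 99, matching the stride ≥ 100 requirement of Example 4.5.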

4.5.1 Examples using MPI_GATHER, MPI_GATHERV

Example 4.2 Gather 100 ints from every process in the group to the root. See Figure 4.2.

MPI_Comm comm;
int gsize,sendarray[100];
int root, *rbuf;
...
MPI_Comm_size( comm, &gsize);
rbuf = (int *)malloc(gsize*100*sizeof(int));
MPI_Gather( sendarray, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

Example 4.3 The previous example modified: only the root allocates memory for the receive buffer.

MPI_Comm_rank( comm, &myrank);
if ( myrank == root) {
    MPI_Comm_size( comm, &gsize);
    rbuf = (int *)malloc(gsize*100*sizeof(int));
}
MPI_Gather( sendarray, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

Figure 4.2: The root process gathers 100 ints from each process in the group.

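Example 4.4 Do the same as the previous example, but use a derived datatype. Note that the type cannot be the entire set of gsize*100 ints, since type matching is defined pairwise between the root and each process in the gather.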

MPI_Comm comm;
int gsize,sendarray[100];
int root, *rbuf;
MPI_Datatype rtype;
...
MPI_Comm_size( comm, &gsize);
MPI_Type_contiguous( 100, MPI_INT, &rtype );
MPI_Type_commit( &rtype );
rbuf = (int *)malloc(gsize*100*sizeof(int));
MPI_Gather( sendarray, 100, MPI_INT, rbuf, 1, rtype, root, comm);
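The matching rule behind this example concerns type signatures, not type maps. The toy C model below (hypothetical names, not the MPI API) represents a datatype only by its signature, a sequence of basic-type codes, and checks that 100 copies of MPI_INT on a sender match 1 copy of rtype (100 contiguous ints) at the root, whereas 2 copies of rtype, or a type spanning gsize*100 ints, would not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not the MPI API): a datatype described only by
 * its type signature, i.e. the sequence of basic types it contains. */
enum basic { B_INT, B_DOUBLE };

struct toy_type {
    int len;               /* number of basic elements in the signature */
    const enum basic *sig; /* the signature itself */
};

/* True if (count1, t1) and (count2, t2) have identical signatures:
 * t1's signature repeated count1 times equals t2's repeated count2 times. */
bool same_signature(int count1, struct toy_type t1, int count2, struct toy_type t2)
{
    long n1 = (long)count1 * t1.len, n2 = (long)count2 * t2.len;
    if (n1 != n2)
        return false;
    for (long k = 0; k < n1; ++k) {
        if (t1.sig[k % t1.len] != t2.sig[k % t2.len])
            return false;
    }
    return true;
}
```

How the ints are laid out in memory (the type map) plays no part in this comparison, which is why differing type maps are still allowed.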

Example 4.5 Now have each process send 100 ints to the root, but place each set (of 100) stride ints apart at the receiving end. Use MPI_GATHERV and the displs argument to achieve this effect. Assume stride ≥ 100. See Figure 4.3.

MPI_Comm comm;

int gsize,sendarray[100];

int root, *rbuf, stride;

int *displs,i,*rcounts;

...

MPI_Comm_size( comm, &gsize);
rbuf = (int *)malloc(gsize*stride*sizeof(int));
displs = (int *)malloc(gsize*sizeof(int));
rcounts = (int *)malloc(gsize*sizeof(int));
for (i=0; i<gsize; ++i) {
    displs[i] = i*stride;
    rcounts[i] = 100;
}
MPI_Gatherv( sendarray, 100, MPI_INT, rbuf, rcounts, displs, MPI_INT,
             root, comm);

Figure 4.3: The root process gathers 100 ints from each process in the group; each set is placed stride ints apart.

Note that the program is erroneous if stride < 100.

Example 4.6 Same as Example 4.5 on the receiving side, but send the 100 ints from the 0th column of a 100×150 int array, in C. See Figure 4.4.

MPI_Comm comm;

int gsize,sendarray[100][150];

int root, *rbuf, stride;

MPI_Datatype stype;

int *displs,i,*rcounts;

...

MPI_Comm_size( comm, &gsize);
rbuf = (int *)malloc(gsize*stride*sizeof(int));
displs = (int *)malloc(gsize*sizeof(int));
rcounts = (int *)malloc(gsize*sizeof(int));
for (i=0; i<gsize; ++i) {
    displs[i] = i*stride;
    rcounts[i] = 100;
}
/* Create datatype for 1 column of array */
MPI_Type_vector( 100, 1, 150, MPI_INT, &stype);
MPI_Type_commit( &stype );
MPI_Gatherv( sendarray, 1, stype, rbuf, rcounts, displs, MPI_INT,
             root, comm);

Figure 4.4: The root process gathers column 0 of a 100×150 C array, and each set is placed stride ints apart.
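What the vector type selects on the sending side can be modeled serially in C. The sketch below is not MPI code; the helper name model_pack_column is hypothetical, and the array is passed as its flat row-major storage. It copies the ints that MPI_Type_vector(count, 1, 150, MPI_INT, ...) would read starting at &sendarray[0][col]: count blocks of one int, 150 ints apart.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 100, COLS = 150 };

/* Serial model (hypothetical helper, no MPI calls) of the elements selected
 * by MPI_Type_vector(count, 1, COLS, MPI_INT, &stype) applied at &a[0][col]
 * of a row-major ROWS x COLS int array. a is the flat ROWS*COLS storage;
 * the selected ints are copied to out[0..count-1] in order. */
void model_pack_column(const int *a, int col, int count, int *out)
{
    assert(col >= 0 && col < COLS && count >= 0 && count <= ROWS);
    for (int k = 0; k < count; ++k)
        out[k] = a[(size_t)k * COLS + col]; /* block k lies k*COLS ints past a[col] */
}
```

With col = 0 and count = 100 this is the column sent here; with col = myrank and count = 100 - myrank it is the column sent in Example 4.7. At the root these ints arrive as 100 contiguous MPI_INT, a different type map with the same signature.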

Example 4.7 Process i sends (100-i) ints from the ith column of a 100×150 int array, in C. It is received into a buffer with stride, as in the previous two examples. See Figure 4.5.

MPI_Comm comm;

int gsize,sendarray[100][150],*sptr;

int root, *rbuf, stride, myrank;

MPI_Datatype stype;

int *displs,i,*rcounts;

...

MPI_Comm_size( comm, &gsize);
MPI_Comm_rank( comm, &myrank );
rbuf = (int *)malloc(gsize*stride*sizeof(int));
displs = (int *)malloc(gsize*sizeof(int));
rcounts = (int *)malloc(gsize*sizeof(int));
for (i=0; i<gsize; ++i) {
    displs[i] = i*stride;
    rcounts[i] = 100-i;   /* note change from previous example */
}
/* Create datatype for the column we are sending */
MPI_Type_vector( 100-myrank, 1, 150, MPI_INT, &stype);
MPI_Type_commit( &stype );
/* sptr is the address of start of "myrank" column */
sptr = &sendarray[0][myrank];
MPI_Gatherv( sptr, 1, stype, rbuf, rcounts, displs, MPI_INT,
             root, comm);

Figure 4.5: The root process gathers 100-i ints from column i of a 100×150 C array, and each set is placed stride ints apart.

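Example 4.8 Same as Example 4.7, but done in a different way at the sending end. We create a datatype that causes the correct striding at the sending end, so that we read a column of a C array. A similar thing was done in Example 3.33, Section ??.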

MPI_Comm comm;

int gsize,sendarray[100][150],*sptr;

int root, *rbuf, stride, myrank, blocklen[2];

MPI_Aint disp[2];

MPI_Datatype stype,type[2];

int *displs,i,*rcounts;

...

MPI_Comm_size( comm, &gsize);

MPI_Comm_rank( comm, &myrank );

rbuf = (int *)malloc(gsize*stride*sizeof(int));

displs = (int *)malloc(gsize*sizeof(int));

rcounts = (int *)malloc(gsize*sizeof(int));

for (i=0; i<gsize; ++i) {

displs[i] = i*stride;

rcounts[i] = 100-i;

}

/* Create datatype for one int, with extent of entire row */

disp[0] = 0; disp[1] = 150*sizeof(int);

type[0] = MPI_INT; type[1] = MPI_UB;

blocklen[0] = 1; blocklen[1] = 1;

MPI_Type_struct( 2, blocklen, disp, type, &stype );

MPI_Type_commit( &stype );

sptr = &sendarray[0][myrank];

MPI_Gatherv( sptr, 100-myrank, stype, rbuf, rcounts, displs, MPI_INT,

root, comm);
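The struct type above contains a single int at displacement 0, but MPI_UB stretches its extent to 150 ints, so copy k of the type is read 150*k ints after sptr. MPI_UB and MPI_Type_struct were deprecated in MPI-2 and removed in MPI-3.0; in current MPI the same type would be built with MPI_Type_create_resized(MPI_INT, 0, 150*sizeof(int), &stype). The toy C model below (hypothetical names, not the MPI API) describes a type by the displacements of its ints plus its extent, and checks that 100-myrank copies of this type touch exactly the same offsets as one copy of the vector type from Example 4.7, shown here for myrank = 3.

```c
#include <assert.h>

/* Toy datatype (hypothetical, not the MPI API): displacements of its ints,
 * measured in ints, plus its extent in ints. */
struct toy_dtype {
    int n;           /* number of ints in the type map */
    const int *disp; /* displacement of each int */
    int extent;      /* distance between consecutive copies */
};

/* Writes the offsets (in ints from the buffer start) read by a send of
 * `count` copies of t: copy k contributes disp[j] + k*extent for each j.
 * Returns the number of offsets written. */
int toy_offsets(struct toy_dtype t, int count, long *out)
{
    int m = 0;
    for (int k = 0; k < count; ++k)
        for (int j = 0; j < t.n; ++j)
            out[m++] = t.disp[j] + (long)k * t.extent;
    return m;
}
```

A vector of 97 single ints with stride 150 has extent 96*150 + 1 ints; this does not matter here because only one copy of it is sent.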

Example 4.9 Same as Example 4.7 at the sending side, but at the receiving side we make the stride between received blocks vary from block to block. See Figure 4.6.

MPI_Comm comm;

int gsize,sendarray[100][150],*sptr;

int root, *rbuf, *stride, myrank, bufsize;

MPI_Datatype stype;

int *displs,i,*rcounts,offset;

...

MPI_Comm_size( comm, &gsize);
MPI_Comm_rank( comm, &myrank );
stride = (int *)malloc(gsize*sizeof(int));
...
/* stride[i] for i = 0 to gsize-1 is set somehow */
/* set up displs and rcounts vectors first */
displs = (int *)malloc(gsize*sizeof(int));
rcounts = (int *)malloc(gsize*sizeof(int));
offset = 0;
for (i=0; i<gsize; ++i) {
    displs[i] = offset;
    offset += stride[i];
    rcounts[i] = 100-i;
}
/* the required buffer size for rbuf is now easily obtained */
bufsize = displs[gsize-1]+rcounts[gsize-1];
rbuf = (int *)malloc(bufsize*sizeof(int));
/* Create datatype for the column we are sending */
MPI_Type_vector( 100-myrank, 1, 150, MPI_INT, &stype);
MPI_Type_commit( &stype );
sptr = &sendarray[0][myrank];
MPI_Gatherv( sptr, 1, stype, rbuf, rcounts, displs, MPI_INT,
             root, comm);

Figure 4.6: The root process gathers 100-i ints from column i of a 100×150 C array, and each set is placed stride[i] ints apart (a varying stride).
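The root-side bookkeeping of this example can be isolated into a small C helper and checked without MPI. The name setup_variable_stride is hypothetical; like the example it hard-codes rcounts[i] = 100 - i, and it adds an assertion, left implicit in the example, that each stride is at least as long as the block it follows, so that no receive location is written twice.

```c
#include <assert.h>

/* Hypothetical helper mirroring the root-side setup of the variable-stride
 * gather: displs[i] is the running sum of stride[0..i-1], rcounts[i] is
 * 100 - i, and the return value is the required rbuf size in ints. */
int setup_variable_stride(int gsize, const int stride[], int displs[], int rcounts[])
{
    assert(gsize >= 1 && gsize <= 100);
    int offset = 0;
    for (int i = 0; i < gsize; ++i) {
        rcounts[i] = 100 - i;
        if (i + 1 < gsize)
            assert(stride[i] >= rcounts[i]); /* else blocks i and i+1 overlap */
        displs[i] = offset;
        offset += stride[i];
    }
    /* the buffer must reach the end of the last block */
    return displs[gsize - 1] + rcounts[gsize - 1];
}
```

For strides {120, 100, 150} the blocks start at 0, 120, and 220, and the last block of 98 ints ends at 318; the final stride does not affect the buffer size.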

Source: mpi-report-j.dvi (pages 134-146)