¥Ñ¥Ã¥±¡¼¥¸ Rhpc ¤Î¾õ¶·

(1)

パッケージ

Rhpc

の状況

中間栄治

†

中野純司

‡

†

_COM-ONE

‡

_{統計数理研究所}

2015

年度統計数理研究所共同研究集会

「データ解析環境 R の整備と利用」

平成 27 年 12 月 5 日

(2)

概要

1 _はじめに

2 _Rhpc

3 forign MPI

4 Windows

版はじめました

5 _おわりに

(3)

概要

1 _はじめに

2 _Rhpc

3 forign MPI

4 Windows

版はじめました

5 _おわりに

(4)

Rhpc

は元々, R の HPC 関係のプログラムを ‘Rhpc’ と言うディレク

トリ配下にまとめていたものであり, それをそのままパッケージ化

(

しかし殆ど全ては何故か別のディレクトリに移動)

開発のきっかけは「何かパッケージを作りたい」「えー, snow より

早くできるのー？」という...

データ採取の為の最低限の機能として, Rhpc worker call および

Rhpc lapply

を実装

(5)

概要

1 _はじめに

2 _Rhpc

3 forign MPI

4 Windows

版はじめました

5 _おわりに

(6)

Rhpc

の概要

スーパーコンピューター上でおおよそ最適になるように構築

SPMD

は私が嫌いなので

...apply

系で並列処理

スーパーコンピューターなので

MPI

を選択

ワーカープロセスは Embedding R で作成 (libR 必須)

Windows

版はじめました

(7)

Rhpc

基本関数

1

MPI

の初期化と終了処理及びコミュニケータに関するもの

Rhpc initialize

Rhpc getHandle

Rhpc finalize

Rhpc numberOfWorker(

ワーカー数を返す

)

ワーカーノードに対する関数 Rhpc worker

Rhpc worker call

Rhpc Export

Rhpc EvalQ

(8)

Rhpc

基本関数

2

Apply

系

Rhpc lapply

Rhpc lapplyLB

ワーカー上の乱数の初期化

Rhpc setupRNG

雑多な関数

Rhpc worker noback

(

実験的なもの

: MPI

で作られた外部プログラムの為の関数

)

(9)

Rhpc

基本関数

3

lapply

だけ使うというのはつらいかも

Rhpc apply

Rhpc sapply

Rhpc sapplyLB

その他 (直接使うことは無い)

Rhpc serialize, Rhpc unserialize

Rhpc enquote, Rhpc splitList

(10)

Many workers example (1): Rhpc Export and

parallel::ClusterExport(MPI)

0 50 100 150 0 20 40 60 Export performance Number of workers sec Rhpc::Rhpc_Export parallel::clusterExport(Rmpi)

(11)

Many workers example (2A): Rhpc lapply* and

parallel::ClusterApply*(MPI)

0 50 100 150 0 20 40 60 80 100 120 140 SQRT performance 1 Number of workers sec Rhpc::Rhpc_lapply Rhpc::Rhpc_lapplyLB parallel::clusterAapply(Rmpi+patch) parallel::clusterAapplyLB(Rmpi+patch) parallel::clusterAapply(Rmpi) parallel::clusterAapplyLB(Rmpi)

(12)

Many workers example (2B): Rhpc lapply* and

parallel::ClusterApply*(MPI)

0 50 100 150 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 SQRT performance 2 Number of workers sec Rhpc::Rhpc_lapply Rhpc::Rhpc_lapplyLB parallel::clusterAapply(Rmpi+patch) parallel::clusterAapplyLB(Rmpi+patch)

(13)

Many workers example (2C): Rhpc lapply* and

parallel::ClusterApply*(MPI)

0 50 100 150 0.00 0.02 0.04 0.06 0.08 0.10 0.12 SQRT performance 3 Number of workers sec Rhpc::Rhpc_lapply Rhpc::Rhpc_lapplyLB

(14)

概要

1 _はじめに

2 _Rhpc

3 forign MPI

4 Windows

版はじめました

5 _おわりに

(15)

MPI

で作られた外部プログラムの呼び出し

一般に MPI を用いたプログラム (C 及び Fortran 等) では

Master(rank0)

と Worker(rank1 以上) をコミュニケータを用いて通信さ

せ, SPMD スタイルのプログラミングを行う. したがって Rhpc のよう

に MPI を利用しているプログラムで利用するためには, MPI のコミュニ

ケータの受け渡しが必要になる. そこで、 Rhpc と既存の MPI 外部プロ

グラムが比較的容易に共存可能な環境を提供した. これにより、

Rhpc lapply

を使いつつ MPI の外部プログラムを呼び出す事も可能とな

る.

Rhpc

では, このために Rhpc worker noback 関数を提供する.

(16)

Rhpc

がセットする

options

の値

Rhpc

では MPI 関数の初期化等によって広域変数 options (options

関数を参照) に以下の変数を設定する.

Rhpc.mpi.f.comm

Fortran のためのコミュニケータ (R 型:整

数型)

Rhpc.mpi.c.comm

C 等のためのコミュニケータ (R 型:外部ポイ

ンタ型)

Rhpc.mpi.procs

MPI のコミュニケーションサイズ

Rhpc.mpi.rank

MPI 上のランク

(17)

call of using ‘.Fortran’, ‘.C’ and ‘.Call’ from R

Fortran

及び C の MPI 外部プログラムを R 側から呼び出す部分

1 mpipif<-function(n)

2 {

3 ## Exported functions get values by getOption()

4 ## when they run on workers

5 out<-.Fortran("mpipif", 6 comm=getOption("Rhpc.mpi.f.comm"), 7 n=as.integer(n), 8 outpi=as.double(0)) 9 out$outpi 10 } 1 mpipicall<-function(n) 2 {

5 out<-.Call("mpipicall", 6 comm=getOption("Rhpc.mpi.c.comm"), 7 n=as.integer(n)) 8 out 9 } 1 mpipic<-function(n) 2 {

5 out<-.C("mpipic", 6 comm=getOption("Rhpc.mpi.f.comm"), 7 n=as.integer(n), 8 outpi=as.double(0)) 9 out$outpi 10 }

‘.C’

呼び出しの時、引数として R

の外部参照ポインタは渡せないの

で Fortran 用の整数型のコミュニ

ケーターを渡す. 一般に C であれ

ば.Call で呼んだ方が良い.

see help(.C)

(18)

Changing MPI Fortran code for ‘.Fortran’ in R.

program main | subroutine mpipif(mpi_comm,n,outpi) include "mpif.h" include "mpif.h"

double precision mypi, sumpi double precision mypi, sumpi double precision h, sum, x, f, a double precision h, sum, x, f, a double precision pi double precision pi

parameter (pi=3.14159265358979323846) parameter (pi=3.14159265358979323846) integer n, rank, procs, i, ierr integer n, rank, procs, i, ierr character*16 argv | integer mpi_comm

integer argc | double precision outpi f(a) = 4.d0 / (1.d0 + a*a) f(a) = 4.d0 / (1.d0 + a*a) argc = COMMAND_ARGUMENT_COUNT() <

n=0 <

if (argc .ge. 1) then < call getarg(1, argv) < read(argv,*) n < endif < < call MPI_INIT(ierr) < c COMM c COMM

call MPI_COMM_RANK(MPI_COMM_WORLD, | call MPI_COMM_RANK(mpi_comm, & rank, ierr) & rank, ierr)

call MPI_COMM_SIZE(MPI_COMM_WORLD, | call MPI_COMM_SIZE(mpi_comm, & procs, ierr) & procs, ierr)

call MPI_BCAST(n,1,MPI_INTEGER,0, call MPI_BCAST(n,1,MPI_INTEGER,0, & MPI_COMM_WORLD,ierr) | & mpi_comm,ierr)

if ( n .le. 0 ) goto 30 if ( n .le. 0 ) goto 30 h = 1.0d0/n h = 1.0d0/n sum = 0.0d0 sum = 0.0d0

do 20 i = rank+1, n, procs do 20 i = rank+1, n, procs x = h * (dble(i) - 0.5d0) x = h * (dble(i) - 0.5d0) sum = sum + f(x) sum = sum + f(x) 20 continue 20 continue

mypi = h * sum mypi = h * sum

call MPI_REDUCE(mypi,sumpi,1, call MPI_REDUCE(mypi,sumpi,1, & MPI_DOUBLE_PRECISION, & MPI_DOUBLE_PRECISION, & MPI_SUM,0, & MPI_SUM,0, & MPI_COMM_WORLD,ierr) | & mpi_comm,ierr)

if (rank .eq. 0) then | outpi=sumpi print *, ’pi = ’, sumpi | 30 continue endif | return 30 call MPI_FINALIZE(ierr) <

stop <

(19)

Changing MPI C code for ‘.C’ in R.

#include "mpi.h" #include "mpi.h" #include <stdio.h> #include <stdio.h> #include <math.h> #include <math.h>

> #include <R.h> > #include <Rinternals.h>

int main( int argc, char *argv[] ) | int mpipic( int *comm, int *N, double *outpi )

{ {

> MPI_Comm mpi_comm; int n=0, rank, procs, i; int n=0, rank, procs, i; double mypi, pi, h, sum, x; double mypi, pi, h, sum, x; if ( argc >= 2){ | mpi_comm = MPI_Comm_f2c(*comm);

n = atoi(argv[1]); | n = *N;

} <

< MPI_Init(&argc,&argv); < // COMM // COMM

MPI_Comm_size(MPI_COMM_WORLD,&procs); | MPI_Comm_size(mpi_comm, &procs); MPI_Comm_rank(MPI_COMM_WORLD,&rank); | MPI_Comm_rank(mpi_comm, &rank); MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); | MPI_Bcast(&n, 1, MPI_INT, 0, mpi_comm); h = 1.0 / (double) n; h = 1.0 / (double) n;

sum = 0.0; sum = 0.0;

for (i = rank + 1; i <= n; i += procs) { for (i = rank + 1; i <= n; i += procs) { x = h * ((double)i - 0.5); x = h * ((double)i - 0.5); sum += (4.0 / (1.0 + x*x)); sum += (4.0 / (1.0 + x*x));

} }

mypi = h * sum; mypi = h * sum;

MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); | mpi_comm); if (rank == 0) | *outpi=pi; printf("pi = %.16f\n", pi); < MPI_Finalize(); < return(0); return(0); } }

(20)

Changing MPI C code for ‘.Call’ in R.

#include "mpi.h" #include "mpi.h" #include <stdio.h> #include <stdio.h> #include <math.h> #include <math.h>

> #include <R.h> > #include <Rinternals.h>

int main( int argc, char *argv[] ) | SEXP mpipicall(SEXP comm, SEXP N)

{ {

> MPI_Comm mpi_comm; > SEXP ret;

int n=0, rank, procs, i; int n=0, rank, procs, i; double mypi, pi, h, sum, x; double mypi, pi, h, sum, x;

if ( argc >= 2){ | mpi_comm = *((MPI_Comm*)R_ExternalPtrAddr(comm)); n = atoi(argv[1]); | PROTECT(ret=allocVector(REALSXP,1));

} | n = INTEGER(N)[0];

MPI_Init(&argc,&argv); < // COMM // COMM

MPI_Comm_size(MPI_COMM_WORLD,&procs); | MPI_Comm_size(mpi_comm, &procs); MPI_Comm_rank(MPI_COMM_WORLD,&rank); | MPI_Comm_rank(mpi_comm, &rank); MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); | MPI_Bcast(&n, 1, MPI_INT, 0, mpi_comm ); h = 1.0 / (double) n; h = 1.0 / (double) n;

sum = 0.0; sum = 0.0;

for (i = rank + 1; i <= n; i += procs) { for (i = rank + 1; i <= n; i += procs) { x = h * ((double)i - 0.5); x = h * ((double)i - 0.5); sum += (4.0 / (1.0 + x*x)); sum += (4.0 / (1.0 + x*x));

} }

mypi = h * sum; mypi = h * sum;

MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); | mpi_comm );

if (rank == 0) | REAL(ret)[0]=pi; printf("pi = %.16f\n", pi); | UNPROTECT(1); MPI_Finalize(); | return(ret); return(0); <

(21)

Call foreign MPI program from R

1 source("mpipicall.R") 2 source("mpipic.R") 3 source("mpipif.R") 4 5 library(Rhpc) 6 Rhpc_initialize() 7 cl<-Rhpc_getHandle(4) 8 9 n<-100 10

11 ## Load shared library

12 Rhpc_worker_call(cl,dyn.load,"pi.so"); dyn.load("pi.so")

13

14 ## Rhpc_worker_noback calls a function, but does not

15 ## get any result.

16 ## Workers should be started faster than a master.

17 Rhpc_worker_noback(cl,mpipicall,n); mpipicall(n)

18 Rhpc_worker_noback(cl,mpipic,n); mpipic(n)

19 Rhpc_worker_noback(cl,mpipif,n); mpipif(n)

20

(22)

概要

1 _はじめに

2 _Rhpc

3 forign MPI

4 Windows

版はじめました

5 _おわりに

(23)

Windows

上での

Rhpc

CRAN

における Windows 版の MPI パッケージは MS-MPI を用いて

おり, 我々もこれに習って MS-MPI を用いて Rhpc の Windows 版を作成

した

CRAN

の MS-MPI のビルド環境は MS-MPIv4.2

MS-MPI

の最新版は MS-MPIv7

MS-MPI v5 SDK

以降のデフォルトパスには括弧が含まれる

64bit

版では def ファイルを用いて link する必要がある

実行には mpiexec を必要とする

MPI

の実行だけなら, SDK は不要

Rhpc

はソースからビルド可能なように, 何れの MS-MPI でもビルド可

(24)

Windows

上での

Rhpc:

起動方法

1

Windows

上での Rhpc:起動方法 1

C:\Users\boofoo> mpiexec.exe

-env PATH "C:\Program Files\R\R-3.2.2\bin\x64;%PATH%"

-n 1 CMD /C "C:\Program Files\R\R-3.2.2\bin\x64\Rgui.exe"

:

-env PATH "C:\Program Files\R\R-3.2.2\bin\x64;%PATH%"

-n 3 "%USERPROFILE%\Documents\R\win-library\3.2\Rhpc\RhpcWorker64.exe"

(25)

Windows

上での

Rhpc:

起動方法

2

Windows

上での Rhpc:起動方法 2

C:\Users\boofoo> Documents\R\win-library\3.2\Rhpc\RhpcWin64.cmd

上記バッチのシンボリックリンクをデスクトップに作成して, そこに

作業用ディレクトを指定する (たぶんおすすめ)

並列数などはバッチ内の環境変数を参照する

NPROCS (

デフォルトは物理コア数)

OMP NUM THREADS (1)

R HOME (

デフォルトはレジストリの値)

(26)

Windows

上での

Rhpc:

実行

Windows

上での Rhpc:実行

> library(Rhpc)

> Rhpc initialize()

rank 0/ 4(1140850688) : hostname : 2152

> cl <- Rhpc getHandle() #

引数は無し

Detected communication size 4

(27)

Windows

上での

Rhpc:

性能

Windows64bit

上で 4 並列 (1 Master 3 Worker) の場合での性能は以

下のとおりである. 以下の場合, 並列数が少ないので, export における

MPI

の集団通信における恩恵が若干しかない. 一方で, 遅延評価の為

の*lapply 系はまずまずの性能が発揮されている.

parallel(SOCK)

Rhpc

Transfer of matrix4000

2

_{by *export}

_1.54sec

_1.39sec

10000 times of calc sqrt by *lapply

0.70sec

0.08sec

(28)

概要

1 _はじめに

2 _Rhpc

3 forign MPI

4 Windows

版はじめました

5 _おわりに

(29)

¥Ñ¥Ã¥±¡¼¥¸ Rhpc ¤Î¾õ¶·

パッケージ

Rhpc

の状況

中間 栄治

†

中野 純司

‡

COM-ONE

統計数理研究所

2015

年度統計数理研究所共同研究集会

「データ解析環境 R の整備と利用」

平成 27 年 12 月 5 日

概要

1

はじめに

2

Rhpc

3

forign MPI

4

Windows

版はじめました

5

おわりに

概要

1

はじめに

2

Rhpc

3

forign MPI

4

Windows

版はじめました

5

おわりに

Rhpc

は元々, R の HPC 関係のプログラムを ‘Rhpc’ と言うディレク

トリ配下にまとめていたものであり, それをそのままパッケージ化

(

しかし殆ど全ては何故か別のディレクトリに移動)

開発のきっかけは「何かパッケージを作りたい」「えー, snow より

早くできるのー？」という...

データ採取の為の最低限の機能として, Rhpc worker call および

Rhpc lapply

を実装

概要

1

はじめに

2

Rhpc

3

forign MPI

4

Windows

版はじめました

5

おわりに

Rhpc

の概要

スーパーコンピューター上でおおよそ最適になるように構築

SPMD

は私が嫌いなので

...apply

系で並列処理

スーパーコンピューターなので

MPI

を選択

ワーカープロセスは Embedding R で作成 (libR 必須)

Windows

版はじめました

Rhpc

基本関数

1

MPI

の初期化と終了処理及びコミュニケータに関するもの

Rhpc initialize

Rhpc getHandle

中間栄治

中野純司

_COM-ONE

_{統計数理研究所}

_はじめに

_Rhpc

_おわりに

_はじめに

_Rhpc

_おわりに

_はじめに

_Rhpc

_おわりに

_はじめに

_Rhpc

_おわりに