(1)

R on super-computers at ISM

Junji Nakano
The Institute of Statistical Mathematics

Ei-ji Nakama
COM-ONE Inc.

(2)

Abstract

- The Institute of Statistical Mathematics (ISM)
  - is a national research institute for statistical sciences
  - provides several super-computer systems for statistical research by Japanese statisticians
- We use R on these super-computers and have made several parallel computing functions available on them. These functions include
  - parallel BLAS replacements such as ATLAS and the GOTO library
  - the snow package for implementing parallel R functions using MPI and other distributed computing techniques
- We have also implemented a Web environment that makes parallel R easy to use.

(3)

R and super-computers (1)

- R is widely used for developing statistical theory and for real data analysis
- R sometimes requires a huge amount of calculation, such as
  - huge matrix manipulations
  - bootstrap or MCMC calculations
  - data mining
- Super-computers may be useful for this purpose

(4)

R and super-computers (2)

- Current super-computers are equipped with
  - many CPUs
    - however, a single CPU of a super-computer is not so different from one in a personal computer
  - large memory
- Almost all super-computers at present use parallel computing techniques
- We use R on our super-computers to utilize these facilities

(5)

Parallel computing systems (1)

- Traditional 1 CPU system
- Shared memory system
- Distributed memory system

[Diagram: Memory and CPU blocks for each system type]

(6)

Parallel computing systems (2)

[Diagram: execution over time, divided into a serial section and parallelizable sections, shown for a single computer, shared memory, and distributed memory]
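The split between a serial section and parallelizable sections on this slide is commonly summarized by Amdahl's law, which the slides do not name. The following is a minimal sketch under that assumption; the parallel fraction `p = 0.9` is a made-up value, not a measurement from the ISM machines.

```r
# Amdahl's law: upper bound on speedup when a fraction p of the run time
# can be parallelized over n CPUs and the remaining (1 - p) stays serial.
amdahl_speedup <- function(p, n) {
  stopifnot(p >= 0, p <= 1, n >= 1)
  1 / ((1 - p) + p / n)
}

cpus <- c(1, 2, 4, 8, 16, 32, 64)
speedup <- amdahl_speedup(p = 0.9, n = cpus)   # p = 0.9 is hypothetical
print(round(data.frame(cpus = cpus, speedup = speedup), 2))
```

Under this model the serial section caps the speedup at 1 / (1 - p) no matter how many CPUs are added, which is one reason timing curves flatten as the CPU count grows.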

(7)

Parallel computing systems (3)

- Software tools for parallel computing
  - Shared memory system
    - Process fork, thread
    - OpenMP
  - Distributed memory system
    - Message passing
      - PVM (Parallel Virtual Machine)
      - MPI (Message Passing Interface) (mainly)
    - HPF (High Performance Fortran)

(8)

Super-computers at ISM (1): ismaltx

- SGI Altix3700 Super Cluster
  - CPU: Intel Itanium2 1.3GHz
    - 64 x 4 = 256 CPUs (total max speed: 1331.2 GFLOPS)
  - Memory: total 1920 GB
    - 512 GB x 3 + 384 GB = 1920 GB
  - OS: RedHat Enterprise Linux v.3 (for SGI)
  - Physical random number generator boards (16 sets)
- Frontend computer for the parallel computer
  - SGI Altix3300
  - 8 CPUs (Intel Itanium2 1.3GHz)
  - 32 GB memory

(9)

Super-computers at ISM (2): ismsr

- HITACHI SR11000 Model H1
  - CPU: IBM Power4+ 1.7GHz
    - 16 x 4 = 64 CPUs (total max speed: 435.2 GFLOPS)
  - Memory: total 128 GB
    - 32 GB x 4 = 128 GB
  - OS: (IBM) AIX 5L V5.2

(10)

Super-computers at ISM (3): ismxc

- HP XC4000 System
  - CPU: AMD Opteron 2.6GHz
    - 2 x 128 = 256 CPUs (total max speed: 1331.2 GFLOPS)
  - Memory: total 640 GB
    - 4 GB x 96 + 8 GB x 32 = 640 GB
  - OS: RedHat Enterprise Linux v.3 (for HP)
  - Physical random number generator boards (8 sets)
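The quoted peak speeds and memory totals for the three machines can be checked with a few lines of arithmetic. This is a sketch only: the flops-per-cycle figures are not stated on the slides and are assumptions, chosen because CPUs x clock (GHz) x flops per cycle then reproduces the quoted GFLOPS totals.

```r
# CPU counts, clocks, and memory sizes are taken from the slides.
# flops_per_cycle is an ASSUMPTION inferred from the quoted peak figures.
systems <- data.frame(
  name            = c("ismaltx", "ismsr", "ismxc"),
  cpus            = c(64 * 4, 16 * 4, 2 * 128),
  ghz             = c(1.3, 1.7, 2.6),
  flops_per_cycle = c(4, 4, 2),
  stringsAsFactors = FALSE
)
systems$peak_gflops <- with(systems, cpus * ghz * flops_per_cycle)

# Memory totals in GB, as itemized on the slides.
memory_gb <- c(ismaltx = 512 * 3 + 384,
               ismsr   = 32 * 4,
               ismxc   = 4 * 96 + 8 * 32)

print(systems)
print(memory_gb)
```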

(11)

BLAS

- BLAS: Basic Linear Algebra Subprograms
  - high-quality "building block" routines for performing basic vector and matrix operations
    - Level 1 BLAS do vector-vector operations
    - Level 2 BLAS do matrix-vector operations
    - Level 3 BLAS do matrix-matrix operations
  - Because BLAS are efficient, portable, and widely available, they are commonly used in the development of high-quality linear algebra software, LINPACK and LAPACK for example
- R uses BLAS for executing linear algebra calculations
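To make the three levels concrete, here is a small sketch of one operation of each kind written in R. It illustrates the categories only; which BLAS routine (if any) R dispatches to for a given expression depends on the R version and build, so no particular routine is claimed here.

```r
set.seed(1)
n <- 4
x <- rnorm(n)
y <- rnorm(n)
A <- matrix(rnorm(n * n), n, n)
B <- matrix(rnorm(n * n), n, n)

level1 <- sum(x * y)   # vector-vector operation: dot product, O(n) work
level2 <- A %*% x      # matrix-vector operation: O(n^2) work
level3 <- A %*% B      # matrix-matrix operation: O(n^3) work

print(level1)
print(dim(level2))
print(dim(level3))
```

Level 3 operations do the most arithmetic per element of data moved, which is why a tuned or threaded BLAS tends to pay off most for large matrix products.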

(12)

ATLAS (1)

- ATLAS: Automatically Tuned Linear Algebra Software
  - achieves performance on par with machine-specific tuned libraries
  - provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK
  - Threads are available
- ATLAS FAQ says:

(13)

ATLAS (2)

- ATLAS is used, or is planned to be used, in the following PSEs:
  - MAPLE (v7 and higher)
  - MATLAB (v6.0 and higher)
  - Mathematica (forthcoming)
  - Octave
- ATLAS is also included in:
  - Absoft Pro Fortran
- ATLAS may be optionally used by almost any project requiring the BLAS. Here are some projects that we have seen providing the option for using ATLAS:
  - GSL
  - HPL
  - LAPACK
  - MPB
  - The R Project
  - Scientific Python
  - Scilab
  - PWSCF

(14)

GOTO library

- Kazushige Goto's library (libgoto) is a highly tuned BLAS library for various architectures. It is based on minimising TLB misses. The library contains the following functions:
  - Level 1
  - Level 1 extensions
  - Level 2
  - Level 3
  - a part of LAPACK: slaswp, sgetf2, sgetrf, sgetrs, sgesv, dlaswp, dgetf2, dgetrf, dgetrs, dgesv, claswp, cgetf2, cgetrf, cgetrs, cgesv, zlaswp, zgetf2, zgetrf, zgetrs, zgesv
- The library can be downloaded from Goto's web pages
- Threads are available.

(15)

Execute ATLAS R on ismaltx (1)

- Login to ismaltx.ism.ac.jp by telnet
- Many Rs are stored in /usr/local1/ .

    {ismaltx 3}/usr/local1% ls
    bin        lib             R-1.9.1.intel8       R-2.0.1.gccgoto
    dust       log             R-1.9.1.intel8atlas  R-2.0.1.gccgotop
    etc        man             R-1.9.1.intel8goto   R-2.0.1.intel8
    example    R-1.9.1         R-1.9.1.intel8gotop  R-2.0.1.intel8atlas
    gcc-3.4.1  R-1.9.1.gcc     R-1.9.1.intel8scs    R-2.0.1.intel8atlasp
    gcc-3.4.2  R-1.9.1.gccatlas  R-2.0.1.gcc        R-2.0.1.intel8goto
    include    R-1.9.1.gccgoto   R-2.0.1.gccatlas   R-2.0.1.intel8gotop
    info       R-1.9.1.gccscs    R-2.0.1.gccatlasp  share
    {ismaltx 4}/usr/local1%

- Simple example of R program (ex1.R)

    # Time the product of two random m x m matrices.
    matrixcalc <- function(m)
    {
      A <- array(rnorm(m^2), dim=c(m,m))
      B <- array(rnorm(m^2), dim=c(m,m))
      st <- proc.time()
      A %*% B
      et <- proc.time()
      return((et-st)[3])   # third element: elapsed (wall-clock) seconds
    }

(16)

Execute ATLAS R on ismaltx (2)

- Shell file to choose a specific R (R.sh)

    #!/bin/sh
    RVER=2.0.1
    COMP=intel8
    BLAS=atlasp
    export ATLAS_NUM_THREADS=8
    export LD_LIBRARY_PATH=/usr/local1/lib/atlas/atlas${ATLAS_NUM_THREADS}:${LD_LIBRARY_PATH}
    /usr/local1/R-${RVER}.${COMP}${BLAS}/bin/R -q --no-save --args < $1

- Write qsub script for batch job (ex1.sh)

    #!/bin/sh
    #PBS -q q8
    #PBS -j oe
    #PBS -l ncpus=8
    cd /home0/nakanoj/itanium/work
    umask 022
    sh ./R.sh ex1.R

(17)

Execute ATLAS R on ismaltx (3)

- Execute them on ismaltx
- ATLAS_NUM_THREADS=1

    {ismaltx 150}/home0/nakanoj/itanium/work% qsub ex1.sh
    27404.ismaltx
    {ismaltx 151}/home0/nakanoj/itanium/work% cat ex1.sh.o27404
    > matrixcalc <- function(m)
    + { A<-array(rnorm(m^2),dim=c(m,m))
    + B<-array(rnorm(m^2),dim=c(m,m))
    + st<-proc.time()
    + A%*%B
    + et<-proc.time()
    + return((et-st)[3]) }
    > print(matrixcalc(2048))
    [1] 1.039062
    >
    {ismaltx 152}/home0/nakanoj/itanium/work%

    {ismaltx 153}/home0/nakanoj/itanium/work% qsub ex1.sh
    27405.ismaltx
    {ismaltx 154}/home0/nakanoj/itanium/work% cat ex1.sh.o27405
    > matrixcalc <- function(m)
    + { A<-array(rnorm(m^2),dim=c(m,m))
    [listing truncated on the slide]
    + return((et-st)[3]) }
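The job output above reports elapsed seconds for one matrix product, while the benchmark plots in this deck use MFLOPS. A minimal sketch of the conversion follows, assuming the usual count of 2 * m^3 floating point operations for an m x m product; the slides do not state how their MFLOPS figures were computed, and `matrix_mflops` is a hypothetical helper name.

```r
# Convert the elapsed time of an m x m matrix product to MFLOPS,
# ASSUMING 2 * m^3 floating point operations per product.
matrix_mflops <- function(m, elapsed) {
  2 * m^3 / elapsed / 1e6
}

m <- 512   # small size so the sketch runs quickly on an ordinary machine
A <- matrix(rnorm(m^2), m, m)
B <- matrix(rnorm(m^2), m, m)
elapsed <- system.time(A %*% B)[["elapsed"]]

# Guard against a zero reading from a coarse timer.
if (elapsed > 0) {
  cat(sprintf("%d x %d product: %.0f MFLOPS\n", m, m, matrix_mflops(m, elapsed)))
}
```

Reporting a rate rather than a time makes runs with different matrix sizes and thread counts directly comparable.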

(18)

ATLAS and GOTO on ismaltx

[Figure: four plots of time (sec) against matrix size. "BLAS on ISMALTX" (matrix sizes 256 to 16384) and "BLAS on ISMALTX 256-4096" have the legend ATLAS thread, ATLAS, goto. "GOTO on ISMALTX" and "GOTO on ISMALTX 256-4096" have the legend 1, 2, 4, 8, 16, and 32 threads.]

(19)

[Figure: "R-2.2.1 + GotoBLAS-1.08 on IA64 Linux". MFLOPS against N (0 to 30000). Legend: ismaltx4 gcc_GOTO_1, ismaltx3 gcc_GOTO_2, ismaltx3 gcc_GOTO_4, ismaltx3 gcc_GOTO_8, ismaltx4 gcc_GOTO_16, ismaltx2 gcc_GOTO_32, ismaltx2 gcc_GOTO_48, ismaltx2 gcc_GOTO_64.]

(20)

[Figure: "R-2.2.1 + GotoBLAS-1.02 on Power4+ AIX5.2L". MFLOPS against N (0 to 15000). Legend: ismsr gcc_GOTO_1, srnd02 gcc_GOTO_2, ismsr gcc_GOTO_4, ismsr gcc_GOTO_8, ismsr gcc_GOTO_16.]

(21)

MPI

- De facto message passing library specification
  - extended message-passing model
  - not a language or compiler specification
  - not a specific implementation or product
- For parallel computers, clusters, and (mainly) homogeneous networks
- Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers

(22)

MPI is Simple

- Many parallel programs can be written using just these six functions, only two of which are non-trivial:
  - MPI_INIT
  - MPI_FINALIZE
  - MPI_COMM_SIZE
  - MPI_COMM_RANK
  - MPI_SEND
  - MPI_RECV

(23)

ScaLAPACK

- ScaLAPACK (Scalable LAPACK) library
  - a subset of LAPACK routines redesigned for distributed memory parallel computers
  - It is portable on any computer that supports MPI or PVM.
  - Like LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy
  - The fundamental building blocks of the ScaLAPACK library are distributed memory versions (PBLAS) of the Level 1, 2 and 3 BLAS, and a set of Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks that arise frequently in parallel linear algebra computations
  - In the ScaLAPACK routines, all interprocessor communication occurs within the PBLAS and the BLACS
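The idea of a block-partitioned algorithm can be illustrated with a serial blocked matrix product. This is a sketch for intuition only and is not ScaLAPACK code: `blocked_matmul` and the block size `nb` are names introduced here, and nothing is distributed across processors.

```r
# Serial illustration of a block-partitioned algorithm: C = A %*% B is
# accumulated from products of nb x nb sub-blocks instead of whole matrices.
blocked_matmul <- function(A, B, nb) {
  n <- nrow(A); k <- ncol(A); p <- ncol(B)
  stopifnot(k == nrow(B), nb >= 1)
  C <- matrix(0, n, p)
  # Partition each index range into consecutive chunks of at most nb indices.
  row_blocks <- split(seq_len(n), ceiling(seq_len(n) / nb))
  mid_blocks <- split(seq_len(k), ceiling(seq_len(k) / nb))
  col_blocks <- split(seq_len(p), ceiling(seq_len(p) / nb))
  for (i in row_blocks) {
    for (j in col_blocks) {
      for (l in mid_blocks) {
        C[i, j] <- C[i, j] +
          A[i, l, drop = FALSE] %*% B[l, j, drop = FALSE]
      }
    }
  }
  C
}

set.seed(7)
A <- matrix(rnorm(7 * 5), 7, 5)
B <- matrix(rnorm(5 * 6), 5, 6)
print(all.equal(blocked_matmul(A, B, nb = 3), A %*% B))
```

Each inner step touches only three small blocks, so they can stay in fast memory while they are reused; in a distributed setting the same partition also decides which processor owns which block.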

(24)

Execute RScaLAPACK on ismaltx

- Simple R example (ex2.R)

    library(RScaLAPACK)
    m <- 2048
    num_thread <- 4
    matrixcalc <- function(m, thread)
    {
      A <- array(rnorm(m^2), dim=c(m,m))
      B <- array(rnorm(m^2), dim=c(m,m))
      AB <- A %*% B
      st <- proc.time()
      sla.solve(A, AB, thread)
      et <- proc.time()
      return((et-st)[3])
    }
    print(matrixcalc(m, num_thread))

- Script for qsub (ex2.sh)

    #!/bin/sh
    #PBS -q q32
    #PBS -j oe
    #PBS -l ncpus=32
    cd /home0/nakano/itanium/work
    umask 022
    export LD_LIBRARY_PATH=/usr/local1/lib/atlas/atlas1:$LD_LIBRARY_PATH
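In ex2.R the right-hand side is built as AB = A %*% B, so solving A X = AB should recover B. The sketch below runs the same check with base R's serial `solve()` on a small matrix; it does not use RScaLAPACK, and the size `m = 200` is an arbitrary choice for a quick local test.

```r
set.seed(42)
m <- 200                     # small, arbitrary size for a quick local check
A <- matrix(rnorm(m^2), m, m)
B <- matrix(rnorm(m^2), m, m)
AB <- A %*% B

X <- solve(A, AB)            # serial solve of A X = AB with base R
max_err <- max(abs(X - B))   # X should match B up to rounding error
print(max_err)
```

A check of this kind on a small case is one way to validate a script before submitting a large batch job.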

(25)

RScaLAPACK on ismaltx

[Figure: two plots of time (sec) against CPU (nomp, 1, 2, 4, 8, 16, 32, 64). "RScaLAPACK on ISMALTX all" has one line per matrix size: 1024^2, 2048^2, 4096^2, 6144^2, 8192^2, 10240^2. "RScaLAPACK on ISMALTX 1024-6144" shows only 1024^2, 2048^2, 4096^2, 6144^2.]

(26)

Simple Network of Workstations (snow) for R (1)

- The package snow (Simple Network of Workstations) implements a simple mechanism for using a workstation cluster for "embarrassingly parallel" computations in R
- Three low-level interfaces have been implemented: one based on sockets, one using PVM via the rpvm package, and one using MPI via the Rmpi package
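As a minimal sketch of an embarrassingly parallel computation, the example below bootstraps a sample mean on a socket cluster. It uses the `parallel` package bundled with current R, whose cluster functions were derived from snow, rather than snow itself; the data, the worker count of 2, and the helper `boot_mean` are all made up for illustration.

```r
library(parallel)  # bundled with R; its cluster API was derived from snow

# Each replicate is independent of the others, which is what makes the
# computation "embarrassingly parallel".
boot_mean <- function(i, x) mean(sample(x, replace = TRUE))

x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9)   # made-up data

cl <- makeCluster(2)                   # two local socket workers (assumed)
clusterSetRNGStream(cl, iseed = 123)   # independent RNG streams per worker
reps <- parLapply(cl, 1:200, boot_mean, x = x)
stopCluster(cl)

boot_means <- unlist(reps)
cat("bootstrap SE of the mean:", sd(boot_means), "\n")
```

Because the workers never need to talk to each other, the only communication is sending the tasks out and collecting the results.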

(27)

Simple Network of Workstations (snow) for R (2)

    > cl <- makeCluster(c("itasca", "owasso"), type = "SOCK")
    > system.time(nuke.boot <-
    +     boot(nuke.data, nuke.fun, R=999, m=1,
    +          fit.pred=new.fit, x.pred=new.data))
    [1] 26.32 0.71 27.02 0.00 0.00
    > clusterEvalQ(cl, library(boot))
    > system.time(cl.nuke.boot <-
    +     clusterCall(cl,boot,nuke.data, nuke.fun, R=500, m=1,
    +                 fit.pred=new.fit, x.pred=new.data))
    [1] 0.01 0.00 14.27 0.00 0.00
    > clusterApply(cl, 1:2, get("+"), 3)
    [[1]]
    [1] 4

    [[2]]
    [1] 5

(28)

Simple Network of Workstations (snow) for R (3)

- Simple example R program using snow (ex3.R)

    require(snow)
    N <- 1024
    cl <- makeCluster(getClusterOption("spec"))
    A <- array(rnorm(N*N), dim=c(N,N))
    B <- array(rnorm(N*N), dim=c(N,N))
    C <- parMM(cl, A, B)
    stopCluster(cl)

- qsub script (ex3.sh)

    #!/bin/csh
    #PBS -q q8
    #PBS -l ncpus=8
    #PBS -j oe
    cd /home0/nakanoj/itanium/work
    source /usr/local1/bin/env_local1.csh
    mpiexec -boot N R CMD BATCH --no-save ex3.R
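One natural way for a function such as `parMM` to distribute a matrix product is to split the rows of A into blocks, multiply each block by B on a separate worker, and stack the partial results. The sketch below shows that decomposition serially so it can be run anywhere; `row_block_matmul` is a hypothetical helper, and the slides do not describe how `parMM` is implemented.

```r
library(parallel)  # used only for splitIndices(); bundled with R

row_block_matmul <- function(A, B, nworkers) {
  # Contiguous, roughly equal sets of row indices, one set per worker.
  idx <- splitIndices(nrow(A), nworkers)
  blocks <- lapply(idx, function(i) A[i, , drop = FALSE])
  # lapply() processes the blocks one after another. With a snow cluster
  # `cl`, the same step could be written as
  #   clusterApply(cl, blocks, get("%*%"), B)
  parts <- lapply(blocks, function(Ai) Ai %*% B)
  do.call(rbind, parts)   # stack the row blocks back into the full product
}

set.seed(3)
A <- matrix(rnorm(10 * 4), 10, 4)
B <- matrix(rnorm(4 * 3), 4, 3)
print(all.equal(row_block_matmul(A, B, nworkers = 3), A %*% B))
```

With this layout every worker needs a full copy of B, so communication cost grows with the size of B as well as with the number of workers.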

(29)

[Figure: "R-2.2.1 snow SOCK BLAS is GOTO_2". Vertical axis labelled t(MFLOPS), horizontal axis from 1000 to 6000. Legend: SOCK GOTO_2 2, SOCK GOTO_2 4, SOCK GOTO_2 8, SOCK GOTO_2 16, SOCK GOTO_2 32.]

(30)

(31)

(32)

Web interface for Parallel R (1)

- Demo …
- Purpose
  - Easy use without thinking about the job scheduler
  - Same interface for the three super-computers
- Implementation
  - Web interface written in Perl
  - Generates a shell script for execution
  - Daemon written in C for communicating with the job scheduler
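The "generating shell script for execution" step can be pictured as filling a template like ex1.sh from a few user choices. The sketch below does this in R purely for illustration: the actual interface is written in Perl and its code is not shown on the slides, so the function name, its arguments, and the `q<ncpus>` queue naming (guessed from `q8` and `q32` in the earlier scripts) are all assumptions.

```r
# HYPOTHETICAL helper: build the lines of a qsub script shaped like ex1.sh.
# Name, arguments, and the queue naming rule are illustrative assumptions.
make_qsub_script <- function(r_file, ncpus, workdir, launcher = "./R.sh") {
  stopifnot(ncpus >= 1)
  c("#!/bin/sh",
    sprintf("#PBS -q q%d", ncpus),        # assumes queues are named q<ncpus>
    "#PBS -j oe",
    sprintf("#PBS -l ncpus=%d", ncpus),
    sprintf("cd %s", workdir),
    "umask 022",
    sprintf("sh %s %s", launcher, r_file))
}

script <- make_qsub_script("ex1.R", ncpus = 8,
                           workdir = "/home0/nakanoj/itanium/work")
writeLines(script)
```

Generating the script from parameters is what lets one front end serve machines whose schedulers and paths differ.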
