Microsoft PowerPoint - jpgrid41.takano

(1)

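IEEE/ACM SC2013 Report

Ryousei Takano

Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST)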

(2)

SC13 “HPC Everywhere”

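• 25th IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis

– “Super Computing” no longer appears in the conference name

– This year's focus was Big Data (Analysis)

• November 10 to 16, Denver, Colorado, USA

• A top-tier conference for HPC

– Acceptance rate this year: 20% (90/456)

• TOP500, various awards, workshops, tutorials, BoFs, and more

• Huge exhibition hall

– No booths from the laboratories under the US DoE

• Attendees: 10,500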

(3)

Big Data

S. Koonin, “Big Data for Big Cities”

• Keynote:

G. Bell (Intel), “The Secret Life of Data”

• Invited talks

– A. N. Choudhary (Northwestern University)

– S. Koonin (New York University)

A. N. Choudhary, “Big Data + Big Compute = An

(4)

TOP 500

• No major changes in the ranking

Green 500

• Xeon + NVIDIA K20x systems won by a wide margin

#    System                           MFlops/W   Power (kW)
1    TSUBAME-KFC (Xeon/K20x)           4503.17        27.78
2    Wilkes (Xeon/K20)                 3631.86        52.62
3    HA-PACS TCA (Xeon/K20x)           3517.84        78.77
4    Piz Daint (Xeon/K20x)             3185.91      1753.66
5    romeo (Xeon/K20x)                 3130.95        81.41
6    TSUBAME 2.5 (Xeon/K20x)           3068.71       922.54
7    iDataPlex DX360M4 (Xeon/K20x)     2702.16        53.62
8    iDataPlex DX360M4 (Xeon/K20x)     2629.10       269.94
9    iDataPlex DX360M4 (Xeon/K20x)     2629.10        55.62
10   CSIRO GPU Cluster (Xeon/K20m)     2358.69        71.01

TSUBAME-KFC (oil-immersion cooling)
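The two numeric columns of the Green 500 list can be combined to recover the sustained performance behind each entry, since efficiency (MFlops/W) times power (W) gives MFlops. The following is a minimal sketch of that arithmetic; the function name `sustained_tflops` is hypothetical, and the three rows are copied from the list above.

```python
def sustained_tflops(mflops_per_watt: float, power_kw: float) -> float:
    """Sustained performance implied by efficiency x power, in TFlops."""
    watts = power_kw * 1_000.0           # kW -> W
    mflops = mflops_per_watt * watts     # (MFlops/W) * W = MFlops
    return mflops / 1_000_000.0          # 1 TFlops = 10^6 MFlops


# (system, MFlops/W, power in kW) taken from the Green 500 list above
green500 = [
    ("TSUBAME-KFC", 4503.17, 27.78),
    ("Wilkes", 3631.86, 52.62),
    ("Piz Daint", 3185.91, 1753.66),
]

for name, efficiency, power_kw in green500:
    print(f"{name:12s} {sustained_tflops(efficiency, power_kw):9.1f} TFlops")
```

The spread in the power column is what makes the comparison interesting: the top entry reaches its efficiency at under 30 kW, while Piz Daint stays within the top five at well over a megawatt.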

(6)

Graph 500

• No change from the previous list

*) TEPS: Traversed Edges Per Second

#       System                  GTEPS
1   1   Sequoia (BG/Q)          15363
2   2   Mira (BG/Q)             14328
3   3   JUQUEEN (BG/Q)           5848
4   4   K computer (SPARC64)     5524.12
5   5   Fermi (BG/Q)             2567
6   6   Tianhe-2 (Xeon/Phi)      2061.48
7   7   Turing (BG/Q)            1427
7   7   Blue Joule (BG/Q)        1427
7   7   DIRAC (BG/Q)             1427
7   7   Zumbrota (BG/Q)          1427
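To make the TEPS metric concrete, here is a minimal sketch that runs a breadth-first search on a small undirected graph and divides the number of edges in the traversed component by the elapsed time. It is an illustration only: the function name `bfs_teps` and the adjacency-dict input are assumptions, and the edge counting is simplified compared with the official benchmark rules.

```python
import time
from collections import deque


def bfs_teps(adj, root):
    """Run BFS from root; return (parent map, traversed edges, TEPS).

    adj maps each vertex to a list of neighbours (undirected graph,
    so every edge appears in both endpoints' lists).
    """
    parent = {root: root}
    queue = deque([root])
    start = time.perf_counter()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    elapsed = time.perf_counter() - start
    # Simplified count: edges whose endpoints were reached, each counted once.
    edges = sum(len(adj[u]) for u in parent) // 2
    teps = edges / elapsed if elapsed > 0 else float("inf")
    return parent, edges, teps


# Triangle 0-1-2 plus an isolated vertex 3 that BFS never reaches.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
parent, edges, teps = bfs_teps(graph, 0)
print(f"traversed {edges} edges at {teps:.0f} TEPS")
```

The listed systems report GTEPS, that is, 10^9 of these traversed edges per second on far larger graphs.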

(7)

Green Graph 500

• TSUBAME-KFC took a double crown, topping this list as well as the Green 500

• The Graph CREST team dominated the Small Data category

Big data category:

#   System         MTEPS/W   Graph500 rank
1   TSUBAME-KFC       6.72              47
2   JUQUEEN           5.41               3
3   Mira              4.42               2
4   EBD-RH5885v2      4.35              96
5   Sequoia           3.55               1

Small data category (scale < 30):

#   System                       MTEPS/W   Graph500 rank
1   GraphCREST-Xperia-A-SO-04E    153.17             143
2   GraphCREST-NEXUS7-2013        129.63             141
3   Kitty6                         73.57              58
4   GraphCREST-Tegra3              64.12             150
5   GraphCREST-Intel-NUC           53.82             124

(8)

30 Technical Sessions

• Application Performance Characterization

• Cloud Resource Management and Scheduling

• Data Management in the Cloud

• Energy Management

• Engineering Scalable Applications

• Extreme-Scale Applications

• Fault Tolerance and Migration in the Cloud

• Fault-Tolerant Computing

• GPU Programming

• Graph Partitioning and Data Clustering

• I/O Tuning

• Improving Large-Scale Computation and Data Resources

• In-Situ Data Analytics and Reduction

• Inter-Node Communication

• Load Balancing

• MPI Performance and Debugging

• Matrix Computations

• Memory Hierarchy

• Memory Resilience

• Optimizing Data Movement

• Optimizing Numerical Code

• Parallel Performance Tools

• Parallel Programming Models and Compilation

• Performance Analysis of Applications at Large Scale

• Performance Management of HPC Systems

• Physical Frontiers

• Preconditioners and Unstructured Meshes

• Sorting and Graph Algorithms

• System-wide Application Performance Assessments

(9)

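Fast VM Migration

• Proposes guide-copy, a live migration technique that is fast and places little load on the network

– A variant of the post-copy approach

– Optimizes page transfer using hints from a guide VM left running at the migration source

– cf. Yabusame, Miyakodori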
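The idea of hint-driven page transfer can be illustrated with a toy model. The sketch below is not the guide-copy algorithm itself: it assumes the guide VM's hints are simply a list of page numbers in the order they are expected to be needed, and that the source pushes them a fixed number of accesses ahead of the migrated VM. The function name `count_remote_faults` and both parameters are hypothetical.

```python
def count_remote_faults(access_trace, prefetch_hints=None, lookahead=0):
    """Count accesses that must wait for a page fetch from the source host.

    access_trace   : page numbers touched by the migrated VM, in order
    prefetch_hints : pages the source pushes proactively, in push order
                     (assumed to come from the guide VM)
    lookahead      : how many accesses ahead of the VM the pushes run
    """
    resident = set()   # pages already present at the destination
    faults = 0
    hints = prefetch_hints or []
    pushed = 0         # number of hinted pages pushed so far
    for step, page in enumerate(access_trace):
        # The source pushes hinted pages up to `lookahead` accesses ahead.
        while pushed < len(hints) and pushed < step + lookahead:
            resident.add(hints[pushed])
            pushed += 1
        if page not in resident:
            faults += 1            # demand fetch, as in plain post-copy
            resident.add(page)
    return faults


trace = [3, 1, 4, 1, 5, 9, 2, 6]
print("plain post-copy faults:", count_remote_faults(trace))
print("with accurate hints   :", count_remote_faults(trace, trace, lookahead=2))
```

In this model, plain post-copy faults once per distinct page, while accurate hints that run ahead of the VM remove the faults entirely; partial or late hints fall somewhere in between.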

(10)

Fast VM Migration

← Fewer page faults and lower latency ↓ Lower bandwidth usage

(11)

Cloud Resource Management

S. Niu (Tsinghua Univ.), et al., “Cost-effective Cloud HPC Resource Provisioning by

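• Background and motivation

– Tools for building virtual clusters on public clouds are now in place, e.g., StarCluster

– Users want to compute cheaply by exploiting reserved instances

• Proposes the Semi-Elastic Cluster (SEC), in which cloud resources are purchased jointly, Groupon-style, and shared

• Cluster size is adjusted dynamically according to load

• Implemented as an extension of batch scheduling

– 61% cost reduction in simulation experiments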

[Figure: job schedules, Processors vs. Time (Hour), for jobs A (0,1.5), B (0.25,0.5), C (1,0.75), D (1.75,1.5). Panels: (a) Pure on-demand cloud, (b) Traditional local cluster, (c) Semi-elastic cluster]
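The trade-off behind a semi-elastic cluster can be sketched with a simple cost model: a pool of reserved nodes is paid for every hour whether used or not, and demand above the pool is served by on-demand nodes at a higher rate. This is only an illustration under assumed numbers, not the scheduler or pricing from the paper; the function names, the demand trace, and both rates are hypothetical.

```python
def total_cost(demand, reserved_nodes, reserved_rate, on_demand_rate):
    """Cost of serving `demand` (nodes needed in each hour).

    Reserved nodes are billed every hour; any shortfall is covered
    by on-demand nodes billed only for the hours they are used.
    """
    cost = 0.0
    for needed in demand:
        burst = max(0, needed - reserved_nodes)
        cost += reserved_nodes * reserved_rate + burst * on_demand_rate
    return cost


def best_reserved_size(demand, reserved_rate, on_demand_rate):
    """Reserved pool size (0..peak demand) with the lowest total cost."""
    return min(
        range(max(demand) + 1),
        key=lambda n: total_cost(demand, n, reserved_rate, on_demand_rate),
    )


demand = [4, 4, 10, 2, 4, 8]       # hypothetical nodes needed per hour
reserved_rate, on_demand_rate = 0.4, 1.0   # hypothetical prices per node-hour

n = best_reserved_size(demand, reserved_rate, on_demand_rate)
print("pure on-demand :", total_cost(demand, 0, reserved_rate, on_demand_rate))
print("fixed at peak  :", total_cost(demand, max(demand), reserved_rate, on_demand_rate))
print(f"semi-elastic   : {total_cost(demand, n, reserved_rate, on_demand_rate):.1f} with {n} reserved nodes")
```

With these made-up numbers the cheapest option sits between the two extremes in the figure: a reserved base sized for typical load, plus on-demand nodes for the peaks.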

(12)

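Data Management in the Cloud (1)

• Background

– In data science fields that handle extremely large data sets, the usual practice is to transfer the data with GridFTP and process it on the client side, or to use the SaaS service Globus Online

– For transfers across a WAN, a server-side facility that supports user-defined data subsetting is needed to reduce the volume of data

• SDQuery DSI (Scientific Data Query Data Storage Interface) was developed as a GridFTP plug-in

– Provides a subsetting API for the HDF5 and NetCDF data formats

– System optimizations

• A performance model that automatically chooses between index-based search over data segments and a full scan with in-memory filtering

• Parallel-stream data transfer, which uses separate TCP streams when different disk blocks are read

• Parallel indexing, which builds indexes for the sub-blocks concurrently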

Y. Su (Ohio State Univ.), et al., “SDQuery DSI: Integrating Data Management Support with a Wide Area Data Transfer Protocol”
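The choice between index-based search and a full scan with in-memory filtering can be pictured as a comparison of two estimated costs. The sketch below is a generic selectivity-based model written for illustration; it is not the performance model from the SDQuery DSI paper, and every name and parameter in it is an assumption.

```python
import math


def estimate_costs(n_segments, selectivity, seq_read_s, random_read_s, index_lookup_s):
    """Return (full_scan_seconds, indexed_seconds) for one subsetting query.

    n_segments     : number of data segments in the file
    selectivity    : fraction of segments that match the query (0..1)
    seq_read_s     : time to read one segment sequentially
    random_read_s  : time to read one segment located through the index
    index_lookup_s : fixed time to consult the index
    """
    full_scan = n_segments * seq_read_s
    hits = math.ceil(selectivity * n_segments)
    indexed = index_lookup_s + hits * random_read_s
    return full_scan, indexed


def choose_plan(n_segments, selectivity, seq_read_s=0.001,
                random_read_s=0.01, index_lookup_s=0.05):
    """Pick the plan with the lower estimated time."""
    full_scan, indexed = estimate_costs(
        n_segments, selectivity, seq_read_s, random_read_s, index_lookup_s)
    return "index" if indexed < full_scan else "full_scan"


for sel in (0.01, 0.05, 0.5):
    print(f"selectivity {sel:4.2f} -> {choose_plan(1000, sel)}")
```

The general shape is what matters: a highly selective query favors the index, while a query that touches a large share of the data is served better by one sequential pass.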

(13)

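Data Management in the Cloud (1)

The experiments showed:

• The performance model is valid

• Subsetting has little effect on high-bandwidth networks, but a large effect when bandwidth is insufficient

• Parallel streams and parallel indexing improve performance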

(14)

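Data Management in the Cloud (2)

• Background

– Data-intensive applications need extremely high-performance data transfer tools

– Three bottlenecks on the end-to-end path must be addressed: host, network, and storage

• Designed, optimized, and evaluated a 100 Gbps end-to-end high-speed data transfer system

– iSER (iSCSI Extensions for RDMA) for the back-end storage connection

– RFTP (an RDMA-based file transfer protocol) for host-to-host communication

– NUMA-aware tuning on each host to optimize performance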

Y. Ren (Stony Brook Univ.), et al., “Design and Performance Evaluation of NUMA-Aware RDMA-Based End-to-End Data Transfer Systems”

(15)

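Data Management in the Cloud (2)

Back-end SAN design

• Uses the iSER protocol

• Places each file in the memory of a designated NUMA node and assigns target processes so that I/O stays local

Use of the RDMA-based protocol RFTP

• For fast, zero-copy data transfer

• The proposed approach (RFTP) achieved 91 Gbps in a 100 Gbps environment, versus 29 Gbps for GridFTP

• CPU utilization was also lower with the proposed approach

• The reduction is especially large on the RFTP sink side (RDMA Write)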
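To put the two throughput figures in perspective, the short calculation below converts them into link utilization and into the time needed to move a data set. The 1 TB size is an arbitrary example, and the function names are hypothetical; only the 91 Gbps, 29 Gbps, and 100 Gbps values come from the slide.

```python
def transfer_seconds(size_bytes: float, throughput_gbps: float) -> float:
    """Time to move size_bytes at the given throughput (Gbps = 10^9 bit/s)."""
    return size_bytes * 8 / (throughput_gbps * 1e9)


def link_utilization(throughput_gbps: float, link_gbps: float = 100.0) -> float:
    """Fraction of the link capacity actually used."""
    return throughput_gbps / link_gbps


one_tb = 10**12  # example data set size in bytes (assumption)
for name, gbps in (("RFTP", 91.0), ("GridFTP", 29.0)):
    print(f"{name:8s} {link_utilization(gbps):5.0%} of link, "
          f"{transfer_seconds(one_tb, gbps):6.1f} s per TB")

print(f"speedup: {91.0 / 29.0:.2f}x")
```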

(16)

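Fault Tolerance Toward Post-Petascale

• Technical sessions

– Fault-Tolerant Computing

– Fault Tolerance and Migration in the Cloud

– Matrix Computations

• Panel

– Fault Tolerance/Resilience at Petascale/Exascale: Is it Really Critical?...

• Some talks build fault tolerance into the algorithm itself, such as the parallel Hessenberg reduction (linear algebra operations with checksums), but the overall impression was that Checkpoint/Restart will get by (or will be made to)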

Y. Jia (Univ. of Tennessee), et al., “Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance”
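The phrase "linear algebra operations with checksums" refers to algorithm-based fault tolerance. The sketch below shows the textbook checksum scheme for matrix multiplication, which is much simpler than the Hessenberg reduction in the cited paper and is given here only to illustrate the principle. All function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def with_column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append a column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]


def find_error(c_full, tol=1e-9):
    """Locate a single corrupted data entry in a full-checksum product.

    Returns (row, col), or None if all checksums agree.
    """
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if abs(sum(c_full[i][:n_cols]) - c_full[i][n_cols]) > tol]
    bad_cols = [j for j in range(n_cols)
                if abs(sum(c_full[i][j] for i in range(n_rows)) - c_full[n_rows][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single data-entry fault")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# The product of the checksum-extended inputs carries its own checksums.
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
print("clean product     :", find_error(c_full))
c_full[1][0] += 5          # inject a fault into one data entry
print("after fault at 1,0:", find_error(c_full))
```

Because the checksums are carried through the computation, a corrupted entry can be located from the one row and one column whose sums no longer match, without rolling back to a checkpoint.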

(17)

Exhibition

• 58 countries, 350 exhibits, 10,550 attendees

• Covered by various media

– http://news.mynavi.jp/column/sc13/

• CUDA6, Post-FX10, SX-ACE, etc.

– http://www.hpcwire.com/tag/sc13/

(18)

ARM-based system

Charm++ cluster in a bag

Tiled wall display controlled by RasPi cluster@SDSC

EU exascale supercomputer research project: Mont-Blanc

(19)

FPGA

Convey HC memcached appliance@DELL memcached benchmark:

(20)

CNT Computer@Stanford

LEGO Turing Machine@Inria (http://rubens.ens-lyon.fr/)

(21)

(22)

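Impressions

• HPC + Big Data

• Growing attention to HPC Cloud

– Some papers made one wonder whether this was a systems conference

– AIST has exhibited HPC cloud work at its booth for the past several years, and the number of visitors showing interest has visibly grown year by year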

http://sc13.supercomputing.org/
