JAIST Repository https://dspace.jaist.ac.jp/

Academic year: 2021

Japan Advanced Institute of Science and Technology

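Title A Study on GPGPU Acceleration of Quantum Monte Carlo Simulations of Solid Systems Using a Spline Basis Set (スプライン基底関数系を用いた固体系の量子モンテカルロシミュレーションに対するGPGPUによる高速化の研究)

Author(s) Uejima, Yutaka (上嶋, 裕)

Citation

Issue Date 2012‑03

Type Thesis or Dissertation

Text version author

URL http://hdl.handle.net/10119/10434

Rights

Description Supervisor: Associate Professor Ryo Maezono (前園 涼), School of Information Science, Master's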

GPGPU Acceleration of Spline Basis Set Quantum Monte Carlo Simulation for Solid System

Uejima Yutaka (1010007)

School of Information Science, Japan Advanced Institute of Science and Technology

February 2012

Keywords: GPGPU, Quantum Monte Carlo, HPC, Hybrid MPI.

In fundamental nanomaterial research, quantum states are investigated by computer simulation. One computational approach is the ab initio Quantum Monte Carlo (QMC) method. The technique describes the behavior of electrons more directly than the conventional auxiliary density functional theory. This characteristic makes QMC applicable to biomolecules and magnetic materials, whose simulation was difficult with traditional approaches. Because QMC is based on a statistical technique, a parallel efficiency of over 99% (with 80,000 cores) has been achieved. QMC is therefore considered well suited to recent massively parallel supercomputers, and further applications are expected.

Another advantage of QMC is its high reliability, since the calculation does not rely on approximations: the result is obtained directly from the multivariable equation. On the other hand, the huge computation time is recognized as an issue for first-principles electronic structure calculations of solids and large-scale molecules.

Therefore, as a solution to this issue, throughput can be enhanced by accelerating the bottleneck sections that are processed on the CPU.

Hybrid parallelization is one method for this. It assigns MPI parallelization between nodes and another parallel implementation, such as OpenMP or Pthreads, between CPU cores. OpenMP splits a process running on a CPU into many threads and easily boosts the bottlenecked part. Hybrid parallelization using OpenMP is gaining popularity in recent parallel scientific computing on supercomputers.

Copyright © 2012 by Uejima Yutaka

However, the performance improvement from OpenMP is known to be limited, since it uses only CPU cores. As one of the next-generation parallelization approaches after OpenMP, general-purpose computing on graphics processing units (GPGPU) is attracting attention. GPUs have more arithmetic cores than CPUs, and as a result their floating-point performance is very high.

GPU acceleration is a recent trend in various fields, since the GPU structure is simple and performance tuning is easy. Owing to their high efficiency, combining low power consumption with high computational performance, GPUs are used in supercomputers and clusters.

In this research, the bottleneck section of the spline basis set expanded QMC electronic structure calculation of solids was replaced by a GPU-accelerated implementation.

A 30.67-fold performance improvement was confirmed for a TiO2 simulation with 1536 electrons. In QMC, the positions of the electrons need to be updated over ten million times based on the Metropolis algorithm. At each iteration step, the one-electron wave functions must be recalculated.
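The Metropolis update described above can be illustrated with a minimal sketch. It is not the thesis code: the one-dimensional Gaussian trial function `psi`, the uniform proposal, and the names `metropolis` and `step_size` are all assumptions chosen for illustration. The point it shows is that every proposed move requires the wave function to be re-evaluated at the new position.

```python
import math
import random


def psi(x):
    """Toy 1D trial wave function (hypothetical, for illustration only)."""
    return math.exp(-0.5 * x * x)


def metropolis(n_steps, step_size=1.0, seed=0):
    """Sample positions distributed as |psi(x)|^2 with the Metropolis rule.

    Returns the list of visited positions and the acceptance rate.
    """
    rng = random.Random(seed)
    x = 0.0
    psi_old = psi(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Propose a symmetric random move of the particle.
        x_new = x + rng.uniform(-step_size, step_size)
        # The wave function must be recalculated for every proposal.
        psi_new = psi(x_new)
        # Accept with probability min(1, |psi_new / psi_old|^2).
        if rng.random() < (psi_new / psi_old) ** 2:
            x, psi_old = x_new, psi_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


if __name__ == "__main__":
    samples, rate = metropolis(20000)
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"acceptance rate {rate:.2f}, mean {mean:.3f}, variance {var:.3f}")
```

For this toy function the sampled density is proportional to exp(-x^2), so the sample variance should approach 0.5 as the number of steps grows.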

According to profiling, this recalculation accounted for 30% of the total execution time. For solid systems, the functions are expanded in B-spline basis sets. The recalculation therefore decomposes into matrix-vector multiplications, to which parallelization can be applied, and so the author decided to use the GPU.
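To make the decomposition into a matrix-vector multiplication concrete, the following sketch evaluates several orbitals at one position on a one-dimensional periodic grid of uniform cubic B-splines. The one-dimensional setting, the grid layout, and the names `cubic_bspline_weights`, `basis_vector`, and `evaluate_orbitals` are assumptions for illustration, not the thesis implementation. Each row of `coeffs` holds the expansion coefficients of one orbital, and each row-times-vector product is independent of the others, which is what makes the operation a candidate for parallel execution.

```python
import math


def cubic_bspline_weights(t):
    """Weights of the four uniform cubic B-splines overlapping offset t in [0, 1)."""
    w0 = (1 - t) ** 3 / 6
    w1 = (3 * t**3 - 6 * t**2 + 4) / 6
    w2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    w3 = t**3 / 6
    return [w0, w1, w2, w3]


def basis_vector(x, n_grid):
    """Values of all n_grid periodic basis functions at position x (mostly zeros)."""
    i = math.floor(x)
    t = x - i
    b = [0.0] * n_grid
    for k, w in enumerate(cubic_bspline_weights(t)):
        b[(i - 1 + k) % n_grid] += w
    return b


def evaluate_orbitals(coeffs, x):
    """Matrix-vector product: coeffs (orbitals x grid points) times basis values."""
    b = basis_vector(x, len(coeffs[0]))
    return [sum(c * bk for c, bk in zip(row, b)) for row in coeffs]


if __name__ == "__main__":
    # Two hypothetical orbitals: a constant and a linear ramp in the coefficients.
    coeffs = [[1.0] * 8, [float(k) for k in range(8)]]
    print(evaluate_orbitals(coeffs, 3.25))
```

Only four entries of the basis vector are nonzero at any position, so a practical implementation would likely multiply just those four columns rather than the dense vector used here for clarity.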

At the initial implementation stage, only a 1.5-fold speedup was observed over the sequential counterpart. However, after the degree of GPU parallelization was increased by simultaneous updates, the final speedup of 30.67 times was obtained. This thesis explains the GPU replacement of the bottlenecks and the optimizations for the GPU. Furthermore, based on the obtained results, the effect of single precision on the results and on computational performance is discussed. In addition, further improvement of the bottleneck part is discussed, and the author suggests further performance tuning techniques together with speedup estimates.
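One possible reading of the simultaneous update is that the orbitals are evaluated for several electrons in a single batch instead of one electron at a time. The sketch below is an assumption about that idea, not the thesis code: `evaluate_one_by_one` performs one matrix-vector product per electron, while `evaluate_batched` treats the whole set as a single matrix-matrix product. Both give the same numbers; the batched form simply exposes every (electron, orbital) pair as an independent unit of work.

```python
def matvec(coeffs, b):
    """Multiply the coefficient matrix (orbitals x basis) by one basis vector."""
    return [sum(c * x for c, x in zip(row, b)) for row in coeffs]


def evaluate_one_by_one(coeffs, basis_vectors):
    """One matrix-vector product per electron (many small pieces of work)."""
    return [matvec(coeffs, b) for b in basis_vectors]


def evaluate_batched(coeffs, basis_vectors):
    """All electrons at once as one matrix-matrix product; result[e][j].

    Every (e, j) entry is independent, so on a GPU each entry could be
    assigned to its own thread (an assumption, not the thesis kernel).
    """
    n_elec = len(basis_vectors)
    n_orb = len(coeffs)
    n_basis = len(coeffs[0])
    out = [[0.0] * n_orb for _ in range(n_elec)]
    for e in range(n_elec):
        for j in range(n_orb):
            acc = 0.0
            for k in range(n_basis):
                acc += coeffs[j][k] * basis_vectors[e][k]
            out[e][j] = acc
    return out


if __name__ == "__main__":
    # Hypothetical data: 2 orbitals, 3 basis functions, 3 electrons.
    coeffs = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.25]]
    basis_vectors = [[1.0, 0.0, 0.0], [0.2, 0.3, 0.5], [0.0, 1.0, 1.0]]
    print(evaluate_batched(coeffs, basis_vectors))
```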
