Academic year: 2021


GPGPU Programming Using Existing Parallelization Methods

Satoshi OHSHIMA, Shoichi HIRASAWA and Hiroki HONDA

As GPU performance improves, GPGPU, which applies the performance of GPUs to general-purpose computation, is attracting much attention. GPGPU is expected to outperform CPUs, particularly for parallel programs. However, writing GPGPU software is not easy because it requires programming methods peculiar to GPGPU. In this paper, we propose using existing parallelization methods as one way to make GPGPU programming easier. Toward a concrete implementation, we also consider how to describe processing on GPUs with the existing parallelization methods SIMD instructions, OpenMP, and MPI, built on CUDA, a GPGPU programming language that has recently come into use.

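1. Introduction

In recent years, the performance of GPUs (Graphics Processing Units) has improved remarkably, driven by the demand for advanced image processing 1). Because the hardware organization of a GPU is better suited to parallel and vector processing than that of a CPU (Central Processing Unit), GPGPU (General-Purpose computation using GPUs) 2), which uses GPUs for general-purpose computation, is drawing growing attention.

Although most consumer PCs today are equipped with a GPU, GPGPU is used in comparatively few cases. One major reason is the difficulty of GPGPU programming: although the GPU is hardware suited to parallel processing, existing parallelization methods cannot easily be used on it. Conventional GPGPU programming has required graphics programming skills based on a graphics API and a shader language. Because this way of writing programs is peculiar to GPGPU programming, GPGPU is not easy to use for people from other fields. Programming languages that hide the peculiarities of GPGPU programming, such as CUDA 3), have now appeared, but they are available only on specific GPUs and still require an understanding of the GPU architecture and the learning of a new language.

This paper therefore proposes GPGPU programming using existing parallelization methods. Because GPUs are suited to executing highly parallel programs, a means of easily applying existing parallelization methods to GPUs can be expected to make the GPU an effective execution environment for parallel programs.

The rest of this paper is organized as follows. Section 2 briefly describes GPUs and GPGPU, which form the background of this work, and clarifies the current problems. Section 3 proposes GPGPU programming using existing parallelization methods as a solution. Section 4 examines, as an implementation example of the proposal, how to implement existing parallelization methods for GPUs using CUDA. Section 5 discusses related work, and Section 6 gives a summary and future work.

† Graduate School of Information Systems, The University of Electro-Communications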


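2. GPU Architecture and Existing GPGPU Programming Methods

The GPU originally developed as hardware for drawing images at high speed. Because functions such as video playback assistance now tend to be integrated as well, GPUs are found in most ordinary PCs, although their performance varies widely.

Fig. 1 shows the hardware organization of a traditional GPU and how it corresponds to the main stages of image drawing. The main processing a GPU performs for image drawing lends itself to acceleration by parallel and vector computation. The processing units on a GPU have therefore become increasingly parallel, with an internal structure suited to vector computation. In addition, because advanced image rendering requires complex computation that depends on what is drawn, the processing units have become increasingly programmable. To write a program that uses the GPU, one must use a graphics API such as DirectX or OpenGL to control the timing of GPU operation and the data communication between the CPU and the GPU, and use a shader language such as HLSL, GLSL, or Cg to describe the processing performed by the computation units.

Fig. 1 Hardware organization of a traditional GPU and its correspondence to graphics processing

GPGPU uses the characteristics of the GPU to perform a variety of processing (general-purpose computation). For uses that can exploit the hardware performance of the GPU, that is, for problems suited to parallel or vector computation, it can deliver computing performance that far exceeds that of a CPU. Implementing a GPGPU application, however, is not easy. For example, to perform numerical computation on a GPU, the numerical algorithm must be implemented so that it maps onto the image drawing system (Fig. 2). This requires familiarity with both numerical programming and graphics programming, and obtaining high performance also calls for knowledge of parallelization methods and the GPU architecture. Languages and libraries for GPGPU programming that do not require graphics programming knowledge, such as CUDA, now exist, but writing applications with them requires knowledge of the GPU architecture or the learning of methods that differ from existing programming methods.

Today a reasonably high-performance GPU is increasingly required at the level of the OS or desktop environment, as with Aqua/Quartz in Mac OS X and Aero Glass in Windows Vista 4)–7). GPUs (video cards) with video playback assistance are also becoming more common, so most consumer PCs are equipped with a GPU. Given the high theoretical computing performance of GPUs, a variety of applications can be expected to run faster if even part of that performance can be exploited, short of using all of it. Outside image processing, however, GPGPU is currently used only in some numerical and scientific computation. The foremost reason is that programs that exploit the GPU are not easy to write. For the many application programmers who are not familiar with GPGPU, it is important that GPGPU be easy to use. This is of course not limited to consumer PCs; it is equally a challenge for HPC use, which demands especially high-performance computing environments.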

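3. Proposal

3.1 Proposal of GPGPU Programming Using Existing Parallelization Methods

Because GPUs provide a higher degree of hardware-level parallelism than CPUs, the applications for which GPUs are expected to improve computing performance are chiefly those with high parallelism. Parallelization, meanwhile, is a topic on which a great deal of research has already been done, including SIMD, SMP, PC clusters, and Grid, and it continues to draw attention with the spread of multi-core CPUs. If GPGPU programming could be done with existing parallelization methods, the GPU could be treated as something much closer to existing parallel environments. As the GPU becomes familiar and easy to use, many users can be expected to make effective use of its high computing performance.

Fig. 2 Example of numerical calculation programming using graphics programming

We therefore propose using existing parallelization methods in GPGPU programming. The remainder of this section first reviews the hardware organization of the latest GPUs. Then, taking into account that existing parallel programming uses different parallelization methods depending on the granularity of what is parallelized, it examines how the GPU hardware structure can be mapped to those methods. The next section examines implementing existing parallelization methods for CUDA as an implementation example of this proposal.

3.2 Assigning Existing Parallelization Methods to the GPU

This subsection surveys the hardware structure of the latest GPUs and examines how existing parallelization methods can be applied to GPUs, and how the execution models used by those methods can be related to the GPU hardware.

Fig. 3 and Fig. 4 outline the internal structure of the GeForce8000 series and the RadeonHD2000/3800 series, the latest GPU generations at the time of writing. In these GPUs the programmable processing units have become highly parallel, and the GPU as a whole can execute more than 100 operations in parallel. However, the arithmetic units on the GPU do not all have the same function, nor are they laid out uniformly. Several arithmetic units are grouped into a processing unit, and the processing units in turn are grouped to form the GPU. Likewise for memory, the GPU does not have only (global) memory that is transparently accessible from the whole GPU; each processing unit also has its own local memory. In other words, current GPUs not only provide a high degree of hardware-level parallelism, but their arithmetic units and memory can also be regarded as hierarchical.

Fig. 3 Architecture of GeForce8000 series

Fig. 4 Architecture of RadeonHD2000/3800 series

Existing parallel programming exploits data-level, instruction-level, thread-level, and process-level parallelism. The concrete means of parallel programming include SIMD instructions, OpenMP and various thread libraries, and MPI (Fig. 5). We therefore consider mapping the hierarchy of the GPU to the hierarchy of existing parallelization methods and implementing these functions and libraries for GPUs.

Fig. 5 Hierarchical parallel programming on CPU

3.2.1 SIMD Instructions for GPUs

SIMD instructions are so-called vector instructions that process multiple data items in a single operation, and they are used for fine-grained, data-parallel parallelization. In GPU programming, four-dimensional vector operations suit the vertex and color computations performed during image drawing. As a result, not only is the hardware suited to vector operations, but shader languages also provide interfaces for them. Writing programs so that they can be accelerated by vector operations is a basic part of GPGPU programming. In CPU programming, on the other hand, SIMD instructions are used through the instruction set each CPU provides, such as MMX, SSE, or VMX. A different program description is therefore needed for each instruction set a CPU supports, and inline assembler is sometimes required, which makes SIMD instructions for GPUs difficult to create. We therefore consider using the description method of Nakanishi et al. 8). This method hides from the user (the application programmer) the differences in description among SIMD instruction sets and allows SIMD parallelization through a unified description regardless of the execution environment. In implementation terms, it translates a program written in the proposed description method into the SIMD instructions supported by each CPU. Implementing a translation from the description method to the vector operations supported by the GPU therefore makes SIMD instructions for GPUs possible.

3.2.2 OpenMP for GPUs

Parallelization using OpenMP and the various thread libraries is a fine-grained parallelization method suited to environments with shared memory. Because OpenMP is implemented on top of thread libraries (mainly pthread) and is easy to write and use, the discussion here is limited to OpenMP for GPUs. In OpenMP, pragmas are inserted into the source code of a sequential program, and at compile time the pragmas are interpreted to transform the program into a thread-parallel program that uses shared memory. OpenMP is often used for fine-grained parallelization, such as loop parallelization by automatic rewriting of loop variables, and it is also suited to the incremental parallelization of sequential programs. Using OpenMP requires a compiler that can interpret OpenMP pragmas. Representative compilers include the Omni OpenMP Compiler 9), the Intel C++ Compiler, and GCC, and several other commercial compilers also support it. The basic implementation of an OpenMP compiler can be realized by inserting code that controls threads at the places where pragmas appear. Because GPUs have shared memory, OpenMP for GPUs can be implemented if code equivalent to OpenMP thread control can be emulated on the GPU.

3.2.3 MPI for GPUs

MPI is a standard for building parallel programs by using message passing to exchange data and control among multiple nodes. It suits usage in which work is distributed according to the rank information each node obtains, and it is often used for coarser-grained parallelization than OpenMP. MPI is a parallelization method for distributed-memory environments, but it can also be used in shared-memory environments without any particular problem. Indeed, because it allows fine control over communication, it can deliver higher performance than OpenMP even in shared-memory environments. An MPI implementation needs neither a compiler nor a complex source-to-source translation mechanism; instead it provides mechanisms such as those for launching multiple processes (across multiple PCs). MPI implementations include mpich, mpich2, and LAM 10),11). One conceivable implementation of MPI for GPUs lets the MPI interface be used for exchanging data and control among computation units through shared memory. In addition, by making the MPI interface available for CPU-GPU communication, the distributed memory formed by CPU memory and GPU memory can be managed with descriptions close to those used for existing distributed memory.

As described above, mapping the GPU to the various existing parallelization methods can be expected to make it easy to write parallel programs for GPUs that exploit the hierarchy of the arithmetic units and memory inside the GPU. Moreover, on PC clusters composed of nodes with SMP or multi-core CPUs, efficient parallelization is achieved by hybrid approaches in which OpenMP handles parallelization within a node and MPI handles parallelization across nodes. With GPUs as well, parallel processing at various levels, such as hybrid parallelization that exploits the hierarchical parallelism inside the GPU and parallel processing across the GPU and the CPU, can be expected to become easy.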
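The loop rewriting described in Section 3.2.2 can be illustrated with a minimal C sketch. It is not the implementation discussed in this paper: `chunk_bounds`, `saxpy_pragma`, and `saxpy_worker` are hypothetical names introduced here, and the static, contiguous split of the iteration space is only one assumed scheduling policy. The first function is what an application programmer writes with an OpenMP pragma; the second is the kind of per-worker loop a compiler could generate, where `tid` would correspond to a thread (or, on a GPU, possibly to a CUDA Thread or Block index).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split the iteration space [0, n) into `nthreads`
   contiguous chunks and return the chunk [begin, end) owned by `tid`.
   The first (n % nthreads) chunks receive one extra iteration. */
static void chunk_bounds(size_t n, size_t nthreads, size_t tid,
                         size_t *begin, size_t *end)
{
    assert(nthreads > 0 && tid < nthreads);
    size_t base = n / nthreads;
    size_t extra = n % nthreads;
    *begin = tid * base + (tid < extra ? tid : extra);
    *end = *begin + base + (tid < extra ? 1 : 0);
}

/* Pragma form: the sequential loop annotated by the programmer.
   Without OpenMP support the pragma is ignored and the loop runs
   sequentially, producing the same result. */
void saxpy_pragma(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Rewritten form: the loop one worker executes after its loop bounds
   have been replaced by the bounds of its own chunk. */
void saxpy_worker(size_t n, float a, const float *x, float *y,
                  size_t nthreads, size_t tid)
{
    size_t begin, end;
    chunk_bounds(n, nthreads, tid, &begin, &end);
    for (size_t i = begin; i < end; ++i)
        y[i] = a * x[i] + y[i];
}
```

Calling `saxpy_worker` once for every `tid` from 0 to `nthreads - 1` touches each element exactly once, so it yields the same array as `saxpy_pragma`. With GCC, the pragma takes effect only when the file is compiled with `-fopenmp`.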

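4. Toward Implementing Existing Parallelization Methods for GPUs Using CUDA

This section examines more concretely how to implement existing parallelization methods for GPUs using CUDA.

Until now, GPU functions could be used only through a graphics API, so in GPGPU all processing had to be implemented in the manner of graphics programming. GPU programming therefore required learning graphics programming. With CUDA, by contrast, much of the processing can be described more intuitively (Fig. 6). In addition, only as much internal information about GPUs as graphics programming required used to be published, but a great deal more has been made available along with CUDA. Information useful for debugging and optimizing GPU programs can thus be said to have become easier to obtain.

Fig. 6 GPGPU programming using CUDA

Fig. 7 shows the parallel execution model and memory model of CUDA. Based on these, we examine implementing the existing parallelization methods SIMD instructions, OpenMP, and MPI with CUDA.

Fig. 7 Parallel execution model and memory model of CUDA

4.1 Implementing SIMD Instructions for CUDA

We first consider implementing SIMD instructions for CUDA. The Threads within each Block in CUDA can execute the same operation simultaneously on a set of target data. Because this is the same as the processing of typical SIMD instructions, we believe SIMD instructions for CUDA can be realized by providing, for Thread-based computation, an interface that corresponds to the existing SIMD instruction sets (the common description method).

4.2 Implementing OpenMP for CUDA

Next we consider OpenMP for CUDA. Parallel processing with OpenMP requires shared memory that every arithmetic unit can access freely. CUDA has several kinds of memory that can be used as shared memory. Global Memory, Constant Memory, and Texture Memory can be used as shared memory by all Blocks and all Threads, and Shared Memory can be used as shared memory by all Threads within the same Block. Of these, Constant Memory and Texture Memory are read-only from the GPU's point of view, so we exclude them here. Of the remaining two, using Global Memory would allow OpenMP parallel processing to be replaced at both the Thread level and the Block level, and using Shared Memory would allow it at the Block level (within the scope of a single Block, since Shared Memory is shared only among the Threads of one Block). We therefore believe OpenMP for CUDA can be implemented by realizing the functions of OpenMP on top of these memories.

Because OpenMP is used for fine-grained parallel processing, it can be considered better suited to Thread-level than to Block-level parallel processing. On the other hand, memory transfer inside the GPU is faster than between a CPU and main memory, and Block-level parallel processing offers more freedom than Thread-level processing. For OpenMP for CUDA, we will therefore implement it at both the Thread level and the Block level, compare their performance, and look for the better implementation.

4.3 Implementing MPI for CUDA

Finally we consider implementing MPI for CUDA. In the CUDA parallel execution model, Thread-level parallel processing is close to SIMD and cannot execute different kinds of operations in parallel. Because this is a fatal problem for MPI, which performs coarse-grained parallel processing, we use Block-level parallel processing. Parallel processing with MPI cannot do without communication between nodes, and emulating synchronous communication functions requires a synchronization mechanism between Blocks. CUDA has no mechanism for direct communication between Blocks, so we emulate communication between Blocks by exchanging data through Global Memory. CUDA also provides no mechanism for synchronizing Blocks, so we emulate synchronization using Global Memory. With these, we believe MPI for CUDA can be implemented.

Another conceivable application is to define functions that transfer data between the CPU and the GPU, so that the distributed memory spanning the different hardware of the CPU and the GPU can be managed through the MPI interface. One problem here is that the CUDA execution model does not allow data to be sent or received between the CPU and the GPU in the middle of a Grid (the unit in which the CPU issues computation to the GPU), so the Grid may need to be split. Nevertheless, being able to manage memory through an existing interface is a large benefit, so this application is worth implementing.

As described above, we believe SIMD instructions, OpenMP, and MPI for CUDA can be implemented by using Thread-level and Block-level parallelization appropriately. It also becomes possible to emulate the hierarchical (hybrid) parallel programs of existing parallel programming for CPUs. Fig. 8 shows the execution image of the hybrid parallel program using CUDA that we are studying. Using these, the performance of the GPU can be expected to be brought out more easily and efficiently.

Fig. 8 Execution image of a hybrid parallel program using CUDA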
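The idea in Section 4.3 of emulating communication between Blocks through Global Memory can be modeled, under strong simplifying assumptions, by the following C sketch. It is a sequential CPU-side model rather than CUDA code and is not the implementation examined in this paper: the names `mailbox_t`, `global_mem`, `try_send`, and `try_recv`, the single-slot design, and the constants `MAX_RANKS` and `MSG_BYTES` are all hypothetical. Each Block plays the role of an MPI rank, and a flag stored next to the payload tells the receiver when a message is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RANKS 4   /* hypothetical number of Blocks acting as ranks */
#define MSG_BYTES 64  /* hypothetical fixed capacity of one message */

/* One single-slot mailbox per (source, destination) pair. The array
   stands in for a region of Global Memory visible to every Block. */
typedef struct {
    bool full;                      /* true while a message is waiting */
    size_t len;                     /* number of valid payload bytes */
    unsigned char data[MSG_BYTES];  /* message payload */
} mailbox_t;

static mailbox_t global_mem[MAX_RANKS][MAX_RANKS];

/* Non-blocking send from rank `src` to rank `dst`. Returns false if the
   slot is still occupied or the message does not fit. */
bool try_send(int src, int dst, const void *buf, size_t len)
{
    assert(src >= 0 && src < MAX_RANKS && dst >= 0 && dst < MAX_RANKS);
    mailbox_t *m = &global_mem[src][dst];
    if (m->full || len > MSG_BYTES)
        return false;
    memcpy(m->data, buf, len);
    m->len = len;
    m->full = true;  /* publish only after the payload is written */
    return true;
}

/* Non-blocking receive at rank `dst` of a message from rank `src`.
   Returns false if no message is waiting or `buf` is too small. */
bool try_recv(int dst, int src, void *buf, size_t cap, size_t *len)
{
    assert(src >= 0 && src < MAX_RANKS && dst >= 0 && dst < MAX_RANKS);
    mailbox_t *m = &global_mem[src][dst];
    if (!m->full || m->len > cap)
        return false;
    memcpy(buf, m->data, m->len);
    *len = m->len;
    m->full = false;  /* release the slot for the next message */
    return true;
}
```

A blocking, MPI-like send or receive would repeat `try_send` or `try_recv` until it succeeds, which is also how a flag in Global Memory could emulate synchronization. On a real GPU the flag accesses would additionally need atomic operations or memory fences, and waiting in a loop is safe only if the communicating Blocks are resident on the GPU at the same time; this sketch ignores both issues.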

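5. Related Work

Several languages and libraries for efficient GPGPU programming are currently being studied. Some of them, such as BrookGPU 12) and RapidMind 13), target not only GPUs but also the Cell Broadband Engine and multi-core CPUs. These share with this work the aim of making GPGPU programming easier. However, although they take the form of extensions to C/C++ so that programs can be written in a style close to existing programming languages, they create new languages or libraries, so the learning cost for existing application programmers is undeniably large. On the other hand, creating a language suited to the target architecture may make high performance easier to obtain than with existing languages, so ease of description and learning versus performance deserves discussion and evaluation.

CUDA, which this work uses for implementation, is a library that depends strongly on the architecture of the GeForce8000 series. CUDA is said to remain usable on new GPUs that NVIDIA releases in the future, but for now it cannot be used on GPUs from other vendors, so writing all GPU programs in CUDA is not realistic. AMD, which splits the GPU market with NVIDIA, has likewise announced that it will provide Close-to-the-Metal (CTM) 14). In view of how easily GPU programs can be written and made to perform well, GPU vendors are likely to continue providing development environments and documentation for GPGPU. On the other hand, it is unclear whether CUDA and CTM will continue to be provided without changes to their language specifications, and the need to write a separate program for each kind of GPU is a major obstacle to making GPGPU programming easier. This proposal provides an environment that absorbs the differences among GPUs and can be used in common, and it can therefore be said to have the potential to solve these problems.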

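6. Conclusion

This paper proposed GPGPU programming using existing parallelization methods as a new way to make GPGPU easy to use, and examined an implementation example using CUDA. If this proposal makes existing parallelization methods available for GPGPU programming, GPUs will become easy to use for a variety of applications. Furthermore, viewed as an example of programming special hardware, the GPU, with existing programming methods for CPUs, the proposal suggests that not only parallelization methods but also various other CPU programming methods could be used for GPUs. The proposal points to a new possibility for easily exploiting GPUs, whose performance is expected to keep improving.

We are currently implementing the proposal. We will continue the implementation, apply it to various applications, and evaluate its performance. Through this work we will also evaluate the use of existing parallelization methods for GPGPU programming itself, which can be expected to deepen the discussion of better GPGPU programming methods.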

References

1) gpgpu.org: SIGGRAPH 2007 GPGPU COURSE, http://www.gpgpu.org/s2007/.

2) gpgpu.org: General-Purpose computation on GPUs (GPGPU), http://gpgpu.org/.

3) NVIDIA: CUDA Programming Guide 1.0 (CUDA NVIDIA Homepage), http://developer.nvidia.com/cuda/.

4) Apple: Apple Human Interface Guidelines, http://developer.apple.com/documentation/UserExperience/Conceptual/OSXHIGuidelines/.

5) Apple: Graphics & Imaging - Quartz, http://developer.apple.com/graphicsimaging/quartz/.

6) Microsoft: Experience Windows Vista: Windows Aero, http://www.microsoft.com/windows/products/windowsvista/features/experiences/aero.mspx.

7) RenderingProject/aiglx: RenderingProject/aiglx - Fedora Project Wiki, http://fedoraproject.org/wiki/RenderingProject/aiglx.

8) 中西悠, 渡邊啓正, 平澤将一, 本多弘樹: コードの性能可搬性を提供するSIMD向け共通記述方式, 情報処理学会論文誌 コンピューティングシステム, Vol. 48, pp. 95–105 (2007) (in Japanese).

9) M. Sato, S. Satoh, K. Kusano and Y. Tanaka: Design of OpenMP Compiler for an SMP Cluster, EWOMP '99, pp. 32–39 (1999).

10) Burns, G., Daoud, R. and Vaigl, J.: LAM: An Open Cluster Environment for MPI, Proceedings of Supercomputing Symposium, pp. 379–386 (1994).

11) Squyres, J. M. and Lumsdaine, A.: A Component Architecture for LAM/MPI, Proceedings, 10th European PVM/MPI Users' Group Meeting, Lecture Notes in Computer Science, No. 2840, Venice, Italy, Springer-Verlag, pp. 379–387 (2003).

12) BrookGPU: BrookGPU, http://graphics.stanford.edu/projects/brookgpu/.

13) RapidMind Inc.: RapidMind, http://www.rapidmind.net/.

14) ATI: ATI CTM Guide, http://ati.amd.com/companyinfo/researcher/documents.html.
