
Hybrid Parallelization of the SPAM Particle Simulation Code




High Performance Computing 87-8 (2001. 7. 25)

Hybrid Parallelization for SPAM Particle Code

Shigehiro YOSHIKAWA,† Taisuke BOKU,† William G. Hoover,†† Carol G. Hoover††† and Mitsuhisa SATO†

† Institute of Information Sciences and Electronics, University of Tsukuba
†† Department of Applied Science, University of California
††† Lawrence Livermore National Laboratory

Abstract. We apply the SPAM (Smooth Particle Applied Mechanics) method to two-dimensional shock-wave analysis and develop a parallelized SPAM code for SMP clusters. To exploit the parallelism within an SMP node, we implement the program with a hybrid method combining MPI and OpenMP. We evaluated the performance of the parallelized SPAM code on an SMP-PC cluster named COSMO, and the results show that the MPI-Only method achieves higher performance than the Hybrid one. We present a performance analysis based on cache utilization on SMP processors and also discuss issues of OpenMP for this kind of problem.

1. Introduction

SPAM (Smooth Particle Applied Mechanics) is a particle simulation method suited to the analysis of continuous media. Its basic principle closely resembles molecular dynamics (MD) simulation: the interaction between particles is described by a potential function that varies smoothly with distance. The cutoff technique widely used in MD can therefore be applied to the inter-particle force calculation. A cutoff ignores the potential between particles separated by more than a fixed distance, which greatly reduces the cost of the interaction calculation. We apply SPAM to a two-dimensional shock-wave analysis problem.

Two approaches are commonly used to parallelize a particle simulation with a cutoff: particle decomposition and space decomposition 1),2). In our problem the spatial domain is far larger than the cutoff radius, so we adopted space decomposition. The space is divided into subspaces (cells) of fixed size, with the cell edge length set equal to or larger than the cutoff radius. This greatly reduces the communication needed for the interaction calculation. Because the particle density inside the cells changes considerably during the simulation, it is important to balance the computational load by devising a suitable mapping of cells to processors.

In this paper we implement a parallel SPAM code for two-dimensional shock-wave analysis and analyze its performance. The target platform is the SMP cluster, which has attracted growing interest in the HPC field in recent years because it allows a high-performance system to be built at lower cost than conventional parallel computers. To exploit the parallelism within an SMP node, we implement not only an MPI-only version but also a hybrid version that combines MPI and OpenMP. In the hybrid version, load is balanced between SMP nodes through the MPI-level cell mapping and within an SMP node through OpenMP thread scheduling.

Section 2 gives an overview of the parallel SPAM particle simulation, Section 3 describes the programming method used for the hybrid parallelization, Section 4 presents a performance evaluation on a real machine, and Section 5 concludes.

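2. Parallel SPAM particle simulation

2.1 Overview

We apply SPAM to a two-dimensional shock-wave analysis. The potential function between particles i and j and the force acting on particle i are given by the following equations. Equation (1), which defines $w_p(r)$ in terms of $r/R$, is not legible in this copy. The force components on particle $i$ are

\[ f_x(i) = -\sum_{j \neq i} w_p(r_{ij}) \, \frac{x_i - x_j}{r_{ij}} \qquad (2) \]

\[ f_y(i) = -\sum_{j \neq i} w_p(r_{ij}) \, \frac{y_i - y_j}{r_{ij}} \qquad (3) \]

Here $r_{ij}$ is the distance between particle i and particle j, and $R$ is the cutoff radius. The force on a particle is computed only for particles j that lie within the cutoff radius R. From the computed forces, the simulation advances the positions and velocities of the particles in time with a suitable time step $\Delta t$. To reduce the error of this time integration, the fourth-order Runge-Kutta method is applied.

The simulation proceeds as follows.

( 1 ) Set the initial conditions of the simulation (initial particle positions, velocities, and so on).
( 2 ) Distribute the particles into cells.
( 3 ) Map the cells to processes.
( 4 ) For each particle i in a cell, find the pairs (i, j) formed with the other particles in the same cell and with the particles in the neighboring cells, and generate a list of them. This step is called sorting.
( 5 ) Compute the forces acting on the particles and their accelerations with the Runge-Kutta method. The cutoff test is performed here.
( 6 ) Update the particle positions.
( 7 ) Move particles between cells.
( 8 ) Repeat steps 4 to 7.

Steps 5 and 7 require explicit data transfer between processes. In step 3 the cell mapping must be chosen with load balance in mind. So that the mapping of cells to processes can be described freely, each cell holds not only the data of its own particles but also information about its neighboring cells. Each cell keeps pointers to its neighbors and can therefore find out which process each neighbor is assigned to. Coding it this way makes the processing overhead relatively large, but the cell mapping is easy to describe, which makes load balancing easier.

In our code the mapping of cells to processes and the core of the simulation are written independently, so the code works with any mapping. As a result, communication between processes is performed cell by cell, and the number of messages is relatively large compared with the total volume communicated per process. As shown later, however, most of the execution time is spent on the particle computation inside each process, and communication time has little influence.

2.2 Initial conditions

In the two-dimensional shock-wave code used here, the initial particle velocity is random in the y direction, whereas in the x direction it follows the profile shown in Figure 1. This models a large force applied in the x direction. In practice a suitable fluctuation is added to the profile of Figure 1. The initial positions are set so that the particle density increases toward the positive x direction. The cells on the right therefore have a much higher particle density than the cells on the left.

[Figure 1: initial x-velocity as a function of x-position. Horizontal axis: x-position, -800 to 800. Vertical axis: velocity, 1 to 2.]

When the time evolution is simulated under these initial conditions, the particles pile up in the right-hand part of the space, so the region that actually requires computation becomes localized. If the cell-to-process mapping chosen at the start of the simulation is kept under these circumstances, efficiency becomes very poor. After a certain number of time steps the code therefore returns to step 3 and remaps the cells to the processes. Such a remapping is generally expensive, so it is performed only at suitable times.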
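The sorting step (step 4) can be illustrated with a small sketch. The module below is not the authors' code: it is a minimal stand-in, assuming square cells of side `rcut` in x, a periodic y direction of period `ly`, and 1-based particle indices. For brevity it tests every unordered pair by brute force instead of scanning cell by cell, but it keeps the two properties the text describes: each pair appears once as (i, j) and never as (j, i), and only pairs in the same or adjacent cells are listed. The names `build_pairs`, `maxpair`, and `ly` are hypothetical.

```fortran
module pair_list_sketch
  implicit none
contains
  ! Build the list of unordered pairs (i, j), i < j, whose cells are
  ! identical or adjacent. Cells have side >= rcut; y is periodic.
  subroutine build_pairs(n, x, y, rcut, ly, maxpair, ipairs, npair)
    integer, intent(in)  :: n, maxpair
    real(8), intent(in)  :: x(n), y(n), rcut, ly
    integer, intent(out) :: ipairs(2, maxpair), npair
    integer :: i, j, ncy, dcx, dcy
    integer :: cx(n), cy(n)

    ncy = max(1, int(ly / rcut))             ! number of cells along y
    do i = 1, n
      cx(i) = floor(x(i) / rcut)             ! cell column of particle i
      cy(i) = modulo(floor(y(i) / (ly / ncy)), ncy)
    end do

    npair = 0
    do i = 1, n - 1
      do j = i + 1, n                        ! j > i: (j, i) is never stored
        dcx = abs(cx(i) - cx(j))
        dcy = abs(cy(i) - cy(j))
        dcy = min(dcy, ncy - dcy)            ! periodic wrap in y
        if (dcx <= 1 .and. dcy <= 1) then
          if (npair >= maxpair) error stop "pair list overflow"
          npair = npair + 1
          ipairs(1, npair) = i
          ipairs(2, npair) = j
        end if
      end do
    end do
  end subroutine build_pairs
end module pair_list_sketch
```

The list still contains pairs that are farther apart than the cutoff; as in the paper, the actual cutoff test is left to the force loop.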

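3. Hybrid parallelization

Our SPAM code is written in Fortran77. The MPI programming itself involves nothing unusual, but, as noted above, the cell mapping can be controlled flexibly. To obtain more parallelism on an SMP node, we consider further parallelizing a single MPI process with OpenMP. We call this hybrid parallelization.

The most time-consuming part of this problem is the calculation, by the Runge-Kutta method, of the forces acting on the particles and their accelerations. For example, when a 43,200-particle simulation is run on a single computer, the force calculation consumes about 70 % of the CPU time. We parallelize the loop that performs this calculation with OpenMP directives.

So that the OpenMP parallelization of the inter-particle force calculation has a coarse granularity, the particle-pair list produced by sorting is generated per process rather than per cell. That is, each process first scans all of its cells to generate the pair list and then processes the whole list at once.

The sorting step is itself relatively expensive. When the list is generated, if an entry (i, j) exists for particles i and j, the entry (j, i) is not generated. The two particles are related by action and reaction, so the value of the potential function computed for (i, j) can be reused to obtain the force for (j, i), which halves the number of potential-function evaluations. Because the inter-particle potential calculation in the SPAM code is complicated, this saving matters for performance and cannot be given up. Because this introduces a dependence between particles, it is difficult to parallelize the sorting step efficiently with OpenMP.

The force calculation is carried out over the (i, j) list generated in the sorting step. This list contains both pairs inside the cutoff radius and pairs outside it, so parallelizing it with an OpenMP parallel do using a simple block partition may leave the computational load of the threads unbalanced. OpenMP thread scheduling can be used to distribute this load. In addition, a critical directive is needed to protect the update of the forces so that several threads do not update the force on the same particle at the same time. Figure 2 shows the core force loop parallelized with the OpenMP parallel do. npair is the length of the (i, j) list and rlucy is the cutoff radius. Two separate critical sections are used in order to keep each exclusively controlled region small.

    !$OMP parallel do private(i,j,xij,yij,rr,rij,w,wp)
          do ij=0,npair-1
            i = ipairs(1,ij)
            j = ipairs(2,ij)
            xij = x(i) - x(j)
            yij = y(i) - y(j)
            if(yij.gt.+0.5*ny) yij = yij - ny
            if(yij.lt.-0.5*ny) yij = yij + ny
            rr = xij*xij + yij*yij
            if(rr.lt.rlucy*rlucy)then
              rij = sqrt(rr)
              call pot(rij,w,wp)
    !$OMP critical (FX)
              fx(i) = fx(i) - wp*xij/rij
              fx(j) = fx(j) + wp*xij/rij
    !$OMP end critical (FX)
    !$OMP critical (FY)
              fy(i) = fy(i) - wp*yij/rij
              fy(j) = fy(j) + wp*yij/rij
    !$OMP end critical (FY)
            endif
          enddo

Figure 2: Code of the force calculation parallelized with OpenMP

Our program mixes MPI and OpenMP. In the following sections, running the program with several MPI processes, each assigned a single thread, is called MPI-Only. Running it with several threads assigned to one MPI process is called Hybrid. The number of physical processors used is determined by the combination of the number of MPI processes and the number of OpenMP threads.

4. Performance evaluation

The problem size is a simulation of 43,200 particles, started from the initial conditions described in Section 2.2. A periodic boundary condition is set in the y direction. Since all particles move from left to right in the x direction, a boundary condition is set only at the right edge.

4.1 Evaluation environment

The parallel SPAM code is evaluated on COSMO (Cluster Of Symmetric Multi prOcessor), an SMP-PC cluster built in our laboratory 5). Each node is a DELL PowerEdge 6300 with four Intel Pentium-II Xeon processors (450 MHz, 512 KB 4-way L2 cache), and the cluster consists of four nodes. The memory is organized with 4-way interleaving, which gives high throughput 5). The nodes are connected by a 100base-TX Ethernet switch and by 800 Mbps Myrinet. Only 100base-TX Ethernet is used in this evaluation. The OS is Linux 2.2.16 with SMP support. The OpenMP compiler is PGI Fortran 3.1 and the MPI library is MPICH-1.2.0.

In the evaluation on COSMO, the number of physical processors actually used is the product of the number of SMP nodes and the number of processors used within each SMP node. For both MPI-Only and Hybrid, the number of SMP nodes is varied over 1, 2, 4 and the number of processors used within each SMP node over 1, 2, 3, 4, giving 12 configurations in total.

4.2 Load balancing by cell mapping

The problem we target is very poorly load-balanced along the x direction. If the x direction is simply partitioned into as many blocks as there are MPI processes and the cells are assigned to the processes block by block (flat mapping), the load is not distributed well. We therefore partition the x direction more finely, into a multiple of the number of MPI processes, and map the cells to the processes cyclically (cyclic mapping). Cyclic mapping is better for load balance, but it increases the number of block boundaries and therefore has the drawback of increasing data communication between processes.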
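The difference between the two mappings can be sketched as two owner functions over the cell columns in the x direction. This is an illustrative sketch, not the mapping code of the paper: columns are assumed to be numbered from 0, and the block width `bw` (in columns) and the function names are hypothetical.

```fortran
module cell_mapping_sketch
  implicit none
contains
  ! Flat mapping: the ncol cell columns are split into nproc contiguous
  ! blocks, one block per process. col is 0-based.
  integer function flat_owner(col, ncol, nproc) result(p)
    integer, intent(in) :: col, ncol, nproc
    p = min((col * nproc) / ncol, nproc - 1)
  end function flat_owner

  ! Cyclic mapping: the columns are cut into narrower blocks of width bw,
  ! and the blocks are dealt out to the processes round-robin.
  integer function cyclic_owner(col, bw, nproc) result(p)
    integer, intent(in) :: col, bw, nproc
    p = modulo(col / bw, nproc)
  end function cyclic_owner
end module cell_mapping_sketch
```

With a density that grows toward the right, flat mapping gives the rightmost process the densest columns, whereas cyclic mapping hands every process a mix of sparse and dense columns. The price is that neighboring columns more often belong to different processes, which is the extra communication mentioned above.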

We implemented both mapping methods. To examine the load distribution, we measured how many times the force calculation (the calculation inside the if statement of Figure 2) was actually performed during one time step. Table 1 shows the result for a run with 4 processes. With cyclic mapping the computational load is distributed evenly over the processes.

Table 1: Distribution of the force calculation by cyclic mapping (number of force calculations per time step)

    Process ID   flat mapping   cyclic mapping
    0                 366,719          972,417
    1                 387,360          974,227
    2               1,562,349          962,881
    3               1,535,350          954,058

Figure 3 shows the execution time per time step when flat mapping and when cyclic mapping is used for the cells. Cyclic mapping improves performance by up to about 30 %. As the number of processors increases, however, the difference between the two becomes smaller. We attribute this to the additional data communication between processes caused by cyclic mapping. In the evaluations below, cyclic mapping is used to map cells to MPI processes.

[Figure 3: Effect of cell mapping. Horizontal axis: Number of Processors (0 to 16). Vertical axis: Elapsed Time / step (sec). Curves: 1node(flat mapping), 1node(cyclic mapping), 2node(flat mapping), 2node(cyclic mapping), 4node(flat mapping), 4node(cyclic mapping).]

4.3 Load balancing by OpenMP thread scheduling

Next we examine the effect of using OpenMP thread scheduling to distribute the computational load of the iterations of the force loop. This differs from load balancing by cell mapping as follows. Cell mapping aims to assign equal lengths of the (i, j) list (npair) to the processes, whereas the load balancing considered here aims to distribute the force calculations that are actually carried out within the (i, j) list.

Thread scheduling is controlled by specifying schedule(SCHEDULE,chunk) on the parallel do directive of Figure 2. To check its effect, we run with 4 threads on a single SMP node of COSMO. Table 2 shows the execution time per time step when the OpenMP thread scheduling is changed among static, cyclic, and dynamic. With static, the loop is simply partitioned into as many blocks as there are threads and the blocks are assigned to the threads statically. With cyclic, a block size is specified as chunk, the loop is partitioned at a finer granularity, and the blocks are assigned to the threads cyclically. With dynamic, the loop is partitioned finely as with cyclic and the blocks are assigned to the threads dynamically. For cyclic and dynamic, the block size was chosen to give the best performance in each case.

Table 2: Effect of thread scheduling

    Schedule   Execution time (sec)
    static     6.80
    cyclic     6.78
    dynamic    6.77

As the table shows, changing the thread scheduling makes little difference to the execution time. We believe this is because the pairs (i, j) that lie within the cutoff radius, and for which the force is actually computed, are spread through the (i, j) list without bias. Table 3 shows, for each schedule, how many times the force calculation (the calculation inside the if statement of Figure 2) was actually performed during one time step. Even with the static schedule, which is a plain block partition, the computational load is not strongly unbalanced among the four threads. Little benefit can therefore be expected here from tuning the thread scheduling.

Table 3: Distribution of the force calculation by thread scheduling (number of force calculations per time step)

    Thread ID    static     cyclic    dynamic
    0           924,895    942,257    960,329
    1           970,685    955,141    960,285
    2           972,495    970,682    956,265
    3           972,915    972,910    964,111
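The three schedules compared above correspond to standard OpenMP `schedule` clauses. The sketch below is only an illustration under assumptions: it counts the pairs inside the cutoff rather than computing forces, the precomputed squared distances `rr` and the selector `mode` are hypothetical, and the paper's "cyclic" is taken to mean `schedule(static, chunk)`, which deals fixed-size blocks out round-robin.

```fortran
module schedule_sketch
  implicit none
contains
  ! Count the pairs whose squared distance rr(k) is inside the cutoff,
  ! using one of three loop schedules: 1 static, 2 cyclic, 3 dynamic.
  integer function count_inside(rr, rcut2, mode, chunk) result(cnt)
    real(8), intent(in) :: rr(:), rcut2
    integer, intent(in) :: mode, chunk
    integer :: k, c
    c = 0
    select case (mode)
    case (1)   ! static: one contiguous block of iterations per thread
      !$omp parallel do schedule(static) reduction(+:c)
      do k = 1, size(rr)
        if (rr(k) < rcut2) c = c + 1
      end do
      !$omp end parallel do
    case (2)   ! cyclic: blocks of chunk iterations dealt out round-robin
      !$omp parallel do schedule(static, chunk) reduction(+:c)
      do k = 1, size(rr)
        if (rr(k) < rcut2) c = c + 1
      end do
      !$omp end parallel do
    case (3)   ! dynamic: blocks of chunk iterations handed out on demand
      !$omp parallel do schedule(dynamic, chunk) reduction(+:c)
      do k = 1, size(rr)
        if (rr(k) < rcut2) c = c + 1
      end do
      !$omp end parallel do
    case default
      error stop "unknown schedule mode"
    end select
    cnt = c
  end function count_inside
end module schedule_sketch
```

All three variants return the same count; only the assignment of iterations to threads differs. Compiled without OpenMP support, the directives are treated as comments and the loops run serially.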

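4.4 Hybrid versus MPI-Only

Figure 4 shows the execution time per time step when the SPAM code is run as Hybrid and as MPI-Only. In the MPI-Only runs, processes are mapped to nodes so that adjacent processes (processes in charge of adjacent cells) are placed on the same node, in order to keep communication efficient. Figure 4 shows that MPI-Only performs far better than Hybrid.

[Figure 4: Execution time of MPI-Only and Hybrid. Horizontal axis: Number of Processors (0 to 16). Vertical axis: Elapsed Time / step (sec). Curves: 1node(MPI-only), 1node(Hybrid), 2node(MPI-only), 2node(Hybrid), 4node(MPI-only), 4node(Hybrid).]

To find the cause of this difference, we measured the internal processing time excluding the communication time between processes, that is, the time taken to compute the forces acting on the particles. Figure 5 shows the result. It shows that the main cause of the performance difference is the time taken by the force calculation, not the communication time.

[Figure 5: Time of MPI-Only and Hybrid for the force calculation only. Axes and curves as in Figure 4.]

On an SMP cluster the memory bus becomes the bottleneck, so a performance analysis based on measuring how efficiently the cache is used is effective 5). Below we consider the L2 cache miss rate, which reflects accesses to the memory bus. Table 4 shows the L2 cache miss rate measured for the force calculation. The Intel Pentium-II Xeon provides counters for measuring processor events. Using these counters, we measure the number of data memory references (DATA_MEM_REFS) and the number of all bus transactions (BUS_TRAN_ANY) and derive the L2 cache miss rate from them.

Table 4: L2 cache miss rate of the force calculation (%)

    #node   #processor   MPI-Only   Hybrid
    1       1            0.118      0.119
    1       2            0.119      0.552
    1       3            0.121      0.723
    1       4            0.130      0.854
    2       1            0.124      0.120
    2       2            0.122      0.555
    2       3            0.125      0.726
    2       4            0.132      0.861
    4       1            0.122      0.123
    4       2            0.128      0.563
    4       3            0.138      0.747
    4       4            0.142      0.899

The table shows that with Hybrid the L2 cache miss rate becomes large when several processors are used within an SMP node. We now consider why.

In Hybrid, threads are used to parallelize the processing of the (i, j) list generated in the sorting step. To explain how this list is organized, the procedure that generates it is given below.

( 1 ) For a particle i in a cell, generate the pairs with the particles j in the same cell without duplication.
( 2 ) Generate the pairs of particle i with the particles j in the 8 adjacent cells. Unless the cell lies on a boundary, pairs are generated with the particles of only 4 of these cells, to avoid duplication.
( 3 ) Repeat 1 and 2 for all particles in the cell.
( 4 ) Repeat 1 to 3 for all cells handled by the process.

The (i, j) list is thus generated cell by cell. In step 4, a process visits the cells it is in charge of in order along the y direction, from bottom to top.

When the forces are computed with an (i, j) list generated in this way, only the data in 5 cells are accessed repeatedly, as shown in Figure 6. The double-precision real data used there are the particle position (x, y in Figure 2) and the force (fx, fy in Figure 2), four values per particle. In this problem a cell never contains more than 100 particles, so the amount of data accessed in 5 cells is estimated as 5 × 100 × 4 × 8 byte = 16000 byte, about 16 KB, which is very small. There are 6 cells in the y direction, so even when 6 × 3 = 18 cells are accessed the data amount to about 56 KB and are not evicted from the L2 cache. When the forces for the next column of cells (the column to the right in the x direction) are computed, most of the cells involved are therefore already in the cache. Accessing the cells in order in the x direction as well thus makes effective use of the cache.

[Figure 6: Data access pattern in the force calculation. Axes: X, Y. Labels: cell; range of particle i; range of particle j; "when the force calculation has finished for all particles in the cell, move to the next cell".]

Next we consider execution on several processors. With MPI-Only, even when the number of processes is increased, each process generates its own (i, j) list and processes it from the head, so the cache is used effectively. With Hybrid, in contrast, the elements of the (i, j) list are assigned to threads with no regard for the spatial arrangement of the cells, so the cache hit rate is lower than with MPI-Only. In other words, cache use is most efficient when the computation starts at the head of the (i, j) list or at a point where the processing moves from one column to the next, but Hybrid assigns to the threads work that starts at arbitrary positions in the list, and efficiency suffers.

Making Hybrid assign the work in the same way as MPI-Only is difficult and requires extra processing. To assign the column boundaries of the (i, j) list to the threads, lower-level thread programming that uses the thread ID would be needed.

4.5 Issues in OpenMP programming

To summarize the above: in Hybrid programming, that is, in programming with OpenMP, when data are produced and then used within a process, it is important that both are executed by the same thread. Data localization is what matters. For example, when data are produced in a first loop and referenced in a second loop, the cache hit rate drops sharply unless the iterations of both loops are assigned to the same threads, and this becomes a major cause of performance degradation.

An effective way to avoid this is to use block assignments in which the earlier loop and the later loop match exactly. When dynamic load balancing is required, as in the problem treated here, this cannot be applied well. Another way is to divide the processing that spans several loops among the threads at a coarse level from the start, and to program each thread so that all computation phases are carried out within it. This, however, is what is usually called naive parallelization, and it may largely forfeit the ease of programming that OpenMP offers through incremental parallelization of a sequential program. It is also as hard to apply to dynamic load balancing as the first method.

Hybrid parallel programming with OpenMP has the merits of reducing the overhead of intra-node communication and of reducing communication overhead by making inter-node communication coarser grained. When, as in this example, the overhead related to data locality in the OpenMP part exceeds the communication overhead, it is difficult for the overall performance to exceed that of MPI alone. In that case naive parallelization makes little sense (programming with MPI alone would be easier), and there are few cases in which hybrid parallelization yields high performance.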
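One known alternative to the critical sections in the force loop is to let every thread accumulate into its own copy of the force arrays and add the copies together at the end of the loop. The sketch below expresses this with an OpenMP array reduction. It is an assumption-laden illustration, not the authors' code: indices are 1-based, `ny` is taken to be the period in y as a real number, and `toy_wp` is a hypothetical placeholder for the paper's `pot` routine, whose definition is not given.

```fortran
module force_reduction_sketch
  implicit none
contains
  ! Pairwise force accumulation without critical sections: each thread
  ! sums into private copies of fx and fy, which OpenMP adds at loop end.
  subroutine pair_forces(n, npair, ipairs, x, y, ny, rlucy, fx, fy)
    integer, intent(in)  :: n, npair, ipairs(2, npair)
    real(8), intent(in)  :: x(n), y(n), ny, rlucy
    real(8), intent(out) :: fx(n), fy(n)
    integer :: ij, i, j
    real(8) :: xij, yij, rr, rij, wp

    fx = 0.0d0
    fy = 0.0d0
    !$omp parallel do private(i, j, xij, yij, rr, rij, wp) reduction(+:fx, fy)
    do ij = 1, npair
      i = ipairs(1, ij)
      j = ipairs(2, ij)
      xij = x(i) - x(j)
      yij = y(i) - y(j)
      if (yij > +0.5d0 * ny) yij = yij - ny     ! periodic boundary in y
      if (yij < -0.5d0 * ny) yij = yij + ny
      rr = xij * xij + yij * yij
      if (rr < rlucy * rlucy .and. rr > 0.0d0) then
        rij = sqrt(rr)
        wp = toy_wp(rij, rlucy)
        fx(i) = fx(i) - wp * xij / rij
        fx(j) = fx(j) + wp * xij / rij          ! reaction on particle j
        fy(i) = fy(i) - wp * yij / rij
        fy(j) = fy(j) + wp * yij / rij
      end if
    end do
    !$omp end parallel do
  end subroutine pair_forces

  ! Hypothetical stand-in for the pot routine: any smooth function that
  ! vanishes at the cutoff would do for this illustration.
  pure real(8) function toy_wp(r, rc) result(wp)
    real(8), intent(in) :: r, rc
    wp = -(1.0d0 - r / rc)
  end function toy_wp
end module force_reduction_sketch
```

The trade-off is memory and a final summation: every thread holds full-length copies of `fx` and `fy`, and combining them costs extra additions proportional to the number of particles times the number of threads.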

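5. Conclusion

In this paper we applied SPAM, a method suited to the analysis of continuous media, to two-dimensional shock-wave analysis and parallelized it with the SMP cluster as the target platform. To obtain higher performance on SMP nodes, we implemented not only an MPI-only version but also a hybrid parallelization combining MPI and OpenMP. In terms of execution performance, however, using MPI alone gave the better result. That high performance is difficult to obtain with hybrid MPI and OpenMP parallelization has also been reported in recent studies 3),4). From the analysis based on the cache miss rate, we consider the main cause to be that the hybrid version cannot exploit data locality as efficiently as the MPI-only version.

The hybrid parallelization carried out here leaves room for performance improvement. Our code uses the critical directive so that threads do not collide when updating the forces acting on the particles. The overhead of this mutual exclusion can be avoided by preparing additional arrays and letting each thread update the forces independently, but this requires an extra summation at the end of the loop. To examine the trade-off between the overheads of the two methods, the latter method must also be implemented and evaluated. Data placement and the control of the access order also require further study.

Acknowledgments

We thank Dr. Antony DeGroot of Lawrence Livermore National Laboratory for his valuable comments in the early stage of this research. Part of this research was supported by a Grant-in-Aid for Scientific Research (C), No. 12680327.

References

1) D. M. Beazley, P. S. Lomdahl: Message-passing multi-cell molecular dynamics on the Connection Machine 5, Parallel Computing 20, pp. 173-195, 1994.
2) S. Plimpton: Fast Parallel Algorithms for Short-Range Molecular Dynamics, Journal of Computational Physics, vol. 117, pp. 1-19, 1995.
3) D. S. Henty: Performance of Hybrid Message-Passing and Shared-Memory Parallelism for Discrete Element Modeling, Proc. of SC'00, Dallas, USA, Nov. 2000.
4) F. Cappello, D. Etiemble: MPI versus MPI + OpenMP on IBM SP for the NAS Benchmarks, Proc. of SC'00, Dallas, USA, Nov. 2000.
5) (authors' names not legible in this copy): Performance evaluation of OpenMP+MPI on an SMP-PC cluster (in Japanese), IPSJ SIG technical report 2000-HPC-80-27, pp. 155-160, 2000.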

