Hybrid Parallelization of the SPAM Particle Simulation Code
Hybrid Parallelization for SPAM particle code

Shigehiro YOSHIKAWA,† Taisuke BOKU,† William G. Hoover,†† Carol G. Hoover††† and Mitsuhisa SATO†

† Institute of Information Sciences and Electronics, University of Tsukuba
†† Department of Applied Science, University of California
††† Lawrence Livermore National Laboratory

We apply the SPAM (Smooth Particle Applied Mechanics) method to 2-dimensional shock wave analysis and develop a parallelized SPAM code for SMP clusters. To exploit the parallelism within an SMP node, we implement the program with a Hybrid method that combines MPI and OpenMP. We evaluated the performance of the parallelized SPAM code on an SMP-PC cluster named COSMO, and the results show that the MPI-Only method achieves higher performance than the Hybrid one. We present a performance analysis based on cache utilization on the SMP processors, and also discuss issues of OpenMP for this kind of problem.

1. Introduction

SPAM (Smooth Particle Applied Mechanics) is a particle simulation method suited to the analysis of fluids. Its basic structure closely resembles molecular dynamics (MD) simulation, and the interaction between particles is evaluated only inside a cutoff radius. In this study we apply SPAM to a 2-dimensional shock wave problem.

Two approaches are known for parallelizing particle simulations that use such a cutoff: particle decomposition and space decomposition 1),2). Because the simulated space is very large compared with the cutoff radius, we adopt space decomposition. The space is divided into subspaces of fixed size (cells), and the side length of a cell is set equal to or larger than the cutoff radius, so that the communication needed for the interaction computation stays local. In our problem the particle density inside the cells changes greatly during the simulation, so it is important to balance the computational load through the mapping of cells to processors.

Our target platform is the SMP cluster, which has attracted much attention in the HPC field in recent years. To obtain good parallel performance on SMP nodes, we implement not only an MPI-only version but also a hybrid version that combines MPI and OpenMP. In the hybrid version, load is balanced between SMP nodes by the mapping of cells with MPI, and inside an SMP node by OpenMP thread scheduling.

Section 2 describes the parallel SPAM simulation, Section 3 the programming method used for hybrid parallelization, Section 4 the performance evaluation on a real machine, and Section 5 concludes.
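2. Parallel SPAM Code

2.1 Overview

We apply SPAM to 2-dimensional shock wave analysis. The potential between particles i and j enters through the function w_p(r), which is nonzero only inside the cutoff radius R, and the force acting on particle i is

\[
f_x(i) = -\sum_{j \neq i} w_p(r_{ij})\,\frac{x_i - x_j}{r_{ij}}, \qquad
f_y(i) = -\sum_{j \neq i} w_p(r_{ij})\,\frac{y_i - y_j}{r_{ij}}
\]

Here r_ij is the distance between particle i and particle j, and R is the cutoff radius. The force on a particle is computed only when particle j lies within the cutoff radius R. Based on these forces, the simulation advances the positions and velocities of the particles in time with a small time step Δt. To reduce the error of this time integration, the 4th-order Runge-Kutta method is used.

The simulation proceeds as follows.

( 1 ) Set the initial conditions of the simulation (initial positions and velocities of the particles).
( 2 ) Distribute the particles to cells.
( 3 ) Map the cells to processes.
( 4 ) For each particle i in a cell, find the pairs (i, j) with the other particles in the same cell and with the particles in the adjacent cells, and build the pair list. This step is called sorting.
( 5 ) Compute the forces and accelerations acting on the particles with the Runge-Kutta method. The cutoff test is performed here.
( 6 ) Update the particle data.
( 7 ) Move particles between cells.
( 8 ) Repeat steps 4 to 7.

Steps 5 and 7 require data exchange between processes. In step 3, the cells must be mapped with load balance in mind.

The code is written so that the mapping of cells to processes can be chosen flexibly. Each cell keeps not only the data of its own particles but also information about its neighboring cells, so it can determine which process holds each neighbor. Coding in this way adds some overhead, but it allows the cell mapping to be changed freely for load balancing. In our code the mapping of cells to processes is separated from the main part of the simulation, so any mapping can be used. As a result, communication between processes is performed cell by cell, which increases the number of messages compared with exchanging data once per process. As shown later, however, the execution time is dominated by the force computation inside each process, and the communication time is not as significant.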
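The force sums of Section 2.1 can be sketched as a loop over a precomputed pair list. The following is a minimal serial illustration in Python, not the paper's Fortran code. The function names are hypothetical, and the weight function is assumed to be Lucy's 2-dimensional form with its standard normalization, since the constants used in the paper cannot be read from the source.

```python
import math

def lucy_derivative(r, R):
    """dw/dr of Lucy's 2-D weight function w = 5/(pi R^2) (1+3q)(1-q)^3, q = r/R.

    The normalization is an assumption made for this sketch."""
    q = r / R
    if q >= 1.0:
        return 0.0
    return -60.0 / (math.pi * R**3) * q * (1.0 - q) ** 2

def pair_forces(x, y, pairs, R):
    """Accumulate f_x and f_y from a list of (i, j) pairs, each pair listed once."""
    fx = [0.0] * len(x)
    fy = [0.0] * len(x)
    for i, j in pairs:
        xij = x[i] - x[j]
        yij = y[i] - y[j]
        rr = xij * xij + yij * yij
        if 0.0 < rr < R * R:              # cutoff test
            rij = math.sqrt(rr)
            wp = lucy_derivative(rij, R)
            # the same pair term updates both particles with opposite signs
            fx[i] -= wp * xij / rij
            fx[j] += wp * xij / rij
            fy[i] -= wp * yij / rij
            fy[j] += wp * yij / rij
    return fx, fy
```

Because each pair updates both particles with opposite signs, the total force sums to zero, and pairs beyond the cutoff contribute nothing.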
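2.2 Initial Conditions

In the 2-dimensional shock wave code, the initial particle velocity is uniform along the y direction, while along the x direction it follows the profile shown in Figure 1. The initial positions are set so that the particle density increases toward the positive x direction. The cells on the right therefore hold many more particles than the cells on the left.

[Figure 1: Initial x-velocity as a function of x-position (x-position from -800 to 800, velocity from 1 to 2).]

When the simulation is advanced in time from these initial conditions, the particles pile up in the right-hand part of the space, so the region that actually requires computation becomes localized. Continuing to use the cell-to-process mapping that was fixed at the start of the simulation therefore becomes very inefficient.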
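For this reason the code can return to step 3 after a time step has finished and remap the cells to the processes. Such a remapping is costly, so it is executed only at suitable time steps.

3. Implementation

The SPAM code implemented in this study is written in Fortran 77. The MPI part uses only basic MPI programming, but, as described above, it allows flexible control over the mapping of cells. To obtain more parallelism on an SMP node, we further parallelize each MPI process with OpenMP. We call this kind of parallelization hybrid parallelization.

The most time-consuming part of this code is the computation of the forces and accelerations acting on the particles with the Runge-Kutta method. For example, when a simulation of 43,200 particles is run on a single machine, this computation accounts for about 70% of the CPU time. We parallelize the loop that performs it with OpenMP directives.

So that OpenMP parallelization of the force computation is effective, the particle pairs built in the sorting step are collected per process rather than per cell. Each process first scans its cells and builds the particle pairs, and then joins them into a single list. A characteristic of sorting is that, for any particles i and j, the pair (j, i) is not created if the pair (i, j) exists. Because of the symmetry between the two particles, the potential computed for (i, j) can also be used to compute the force for (j, i), which halves the potential computation. The inter-particle computation is very expensive in the SPAM code, so this saving is important for performance and cannot be given up. Because it introduces dependences between particles, however, it is difficult to parallelize the sorting step efficiently with OpenMP.

The force computation is then performed over the (i, j) pairs built in the sorting step. The list contains both pairs that lie inside the cutoff radius and pairs that lie outside it, so when the list is simply divided into blocks by an OpenMP parallel do, the computational load may differ from thread to thread. OpenMP thread scheduling can be used to balance this load. In addition, several threads may update the force on the same particle at the same time, so the updates must be protected with the critical directive.

Figure 2 shows the code in which the loop performing the force computation is parallelized with an OpenMP parallel do. npair is the total number of (i, j) pairs and rlucy is the cutoff radius. The region enclosed by critical is split into two in order to reduce the part that is subject to mutual exclusion.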
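The critical sections serialize every force update. An alternative that the paper's concluding remarks mention is to give each thread its own copy of the force array and to add the copies together after the loop. The following sketch emulates that idea serially in Python; it is not OpenMP code, the function name is hypothetical, and each iteration of the outer loop merely stands in for one thread.

```python
def accumulate_private(pair_terms, n_particles, n_threads):
    """pair_terms: list of (i, j, f); f is subtracted from particle i and added to j.

    Each emulated thread processes one contiguous block of the pair list and
    writes only to its own private array, so no lock is needed."""
    npair = len(pair_terms)
    private = [[0.0] * n_particles for _ in range(n_threads)]
    for t in range(n_threads):                      # stands in for one thread
        lo = t * npair // n_threads
        hi = (t + 1) * npair // n_threads
        for i, j, f in pair_terms[lo:hi]:
            private[t][i] -= f
            private[t][j] += f
    # reduction step: the extra additions that replace the critical sections
    return [sum(private[t][k] for t in range(n_threads))
            for k in range(n_particles)]
```

The price is memory proportional to the number of threads and an additional summation pass, which is the trade-off against the cost of mutual exclusion.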
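The program in this study is written with both MPI and OpenMP. In what follows, the case in which each MPI process runs a single thread and several MPI processes are used is called MPI-Only, and the case in which several threads are assigned to each MPI process is called Hybrid. The effective number of processors is the product of the number of MPI processes and the number of OpenMP threads.

    !$OMP parallel do private(i,j,xij,yij,rr,rij,w,wp)
          do ij=0,npair-1
    ! particle indices of pair ij
            i = ipairs(1,ij)
            j = ipairs(2,ij)
            xij = x(i) - x(j)
            yij = y(i) - y(j)
    ! periodic boundary in the y direction
            if(yij.gt.+0.5*ny) yij = yij - ny
            if(yij.lt.-0.5*ny) yij = yij + ny
            rr = xij*xij + yij*yij
    ! force is computed only inside the cutoff radius
            if(rr.lt.rlucy*rlucy)then
              rij = sqrt(rr)
              call pot(rij,w,wp)
    !$OMP critical (FX)
              fx(i) = fx(i) - wp*xij/rij
              fx(j) = fx(j) + wp*xij/rij
    !$OMP end critical (FX)
    !$OMP critical (FY)
              fy(i) = fy(i) - wp*yij/rij
              fy(j) = fy(j) + wp*yij/rij
    !$OMP end critical (FY)
            endif
          enddo

Figure 2: Force computation loop parallelized with OpenMP.

4. Performance Evaluation

The problem size is a simulation of 43,200 particles, started from the initial conditions described in Section 2.2. A periodic boundary condition is used in the y direction. In the x direction all particles move from left to right, so a boundary condition is set only at the right end.

4.1 Evaluation Environment

The parallel SPAM code is evaluated on COSMO (Cluster Of Symmetric Multi prOcessor), an SMP-PC cluster operated in our laboratory. Each node is a DELL PowerEdge6300 with four Intel Pentium-II Xeon processors (450MHz, 512KB 4-way L2 cache), and the cluster consists of 4 nodes. The nodes are connected by a 100base-TX Ethernet switch and by 800Mbps Myrinet; only 100base-TX Ethernet is used in this evaluation. The OS is Linux 2.2.16 with SMP support, the OpenMP compiler is PGI Fortran 3.1, and the MPI library is MPICH-1.2.0.

In the evaluation on COSMO, the effective number of processors is the product of the number of SMP nodes and the number of processors used inside each SMP node. For both MPI-Only and Hybrid, the number of SMP nodes is varied over 1, 2, 4 and the number of processors used inside each SMP node over 1, 2, 3, 4, giving 12 configurations in total.

4.2 Cell Mapping

The load of our target problem is highly unbalanced along the x direction. If the x direction is simply divided into as many blocks as there are MPI processes and each block is assigned to one process (flat mapping), the load is not well balanced. We therefore divide the x direction into blocks finer than the number of MPI processes and map the cells to the processes cyclically (cyclic mapping) 5). Cyclic mapping is effective for load balancing, but it increases the number of block boundaries and hence the communication between processes.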
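The difference between the two mappings can be illustrated with a small Python sketch. The function names and the per-column work figures are hypothetical and chosen only to mimic a load that grows toward the right; they are not measurements from the paper.

```python
def flat_mapping(n_cols, n_procs):
    """Contiguous blocks of cell columns, one block per process."""
    return [c * n_procs // n_cols for c in range(n_cols)]

def cyclic_mapping(n_cols, n_procs):
    """Cell columns dealt out to the processes in round-robin order."""
    return [c % n_procs for c in range(n_cols)]

def load_per_process(work_per_col, owner, n_procs):
    """Total work assigned to each process under a given mapping."""
    load = [0] * n_procs
    for w, p in zip(work_per_col, owner):
        load[p] += w
    return load

def process_boundaries(owner):
    """Number of adjacent column pairs owned by different processes,
    a rough proxy for the amount of inter-process communication."""
    return sum(1 for a, b in zip(owner, owner[1:]) if a != b)

work = [1, 1, 1, 1, 5, 5, 5, 5]   # hypothetical: denser on the right
print(load_per_process(work, flat_mapping(8, 4), 4))     # [2, 2, 10, 10]
print(load_per_process(work, cyclic_mapping(8, 4), 4))   # [6, 6, 6, 6]
```

In this toy case the cyclic mapping evens out the load, while the number of process boundaries rises from 3 to 7, which reflects the trade-off described above.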
To compare the load balance achieved by the two mappings, we measured how many times the actual force computation (the computation inside the if statement of Figure 2) is executed during one time step. Table 1 shows the result for 4 processes. With cyclic mapping the computational load is distributed evenly over the processes.

Table 1: Number of force computations per process for each mapping

  Process ID   flat mapping   cyclic mapping
  0                 366,719          972,417
  1                 387,360          974,227
  2               1,562,349          962,881
  3               1,535,350          954,058

Figure 3 shows the execution time per time step when flat mapping and when cyclic mapping is used for the cells. Using cyclic mapping improves the performance by about 30%. As the number of processors increases, however, the difference between the two becomes smaller. We attribute this to the increase in communication between processes caused by cyclic mapping. In the following evaluations, cyclic mapping is used to map the cells to the MPI processes.

[Figure 3: Elapsed time per step (sec) versus number of processors, for flat mapping and cyclic mapping on 1, 2 and 4 nodes.]

4.3 OpenMP Thread Scheduling

Next, we examine the effect of OpenMP thread scheduling used to balance the computational load of the iterations of the force loop. This load balancing differs from the load balancing by cell mapping. Cell mapping distributes the number of (i, j) pairs (npair) evenly over the processes, whereas the target here is the distribution of the force computations that are actually carried out among the (i, j) pairs.

Thread scheduling is controlled by specifying schedule(SCHEDULE,chunk) on the parallel do directive of Figure 2. The measurements are made with 4 threads on one SMP node of COSMO. Table 2 shows the execution time per time step when the thread scheduling is changed among static, cyclic and dynamic. With static, the loop is divided evenly into as many blocks as there are threads, and the blocks are assigned to the threads statically. With cyclic, a chunk size is specified so that the loop is divided into finer blocks, which are assigned to the threads cyclically. With dynamic, the loop is divided into fine blocks as with cyclic, and the blocks are assigned to the threads dynamically. For cyclic and dynamic, the chunk size was chosen to give the best performance in each case.

Table 2: Execution time per time step for each thread scheduling (sec)

  static   cyclic   dynamic
  6.80     6.78     6.77

Table 3: Number of force computations per thread for each thread scheduling

  Thread ID   static    cyclic    dynamic
  0           924,895   942,257   960,329
  1           970,685   955,141   960,285
  2           972,495   970,682   956,265
  3           972,915   972,910   964,111

As Table 2 shows, changing the thread scheduling makes almost no difference to the execution time. We consider that this is because the pairs that lie inside the cutoff radius, for which the force is actually computed, are spread evenly through the (i, j) pair list. Table 3 shows, for each scheduling, how many times the actual force computation is executed by each thread during one time step. Even with static scheduling, which simply divides the loop into blocks, the imbalance of the computational load among the 4 threads is small. Thread scheduling therefore has little effect here.
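How the three schedules distribute loop iterations can be sketched in Python as follows. This is an idealized model, not the behavior of any particular OpenMP runtime: the block sizes chosen for the static case are an approximation, the dynamic case assumes that a thread becomes free exactly when the cost of its chunks has elapsed, and all function names are hypothetical. The cost of an iteration is taken as 1 when the pair lies inside the cutoff and 0 otherwise.

```python
import heapq

def static_block_owner(n_iter, n_threads):
    """Emulates schedule(static): one contiguous block per thread."""
    return [i * n_threads // n_iter for i in range(n_iter)]

def static_cyclic_owner(n_iter, n_threads, chunk):
    """Emulates schedule(static, chunk): chunks dealt out round-robin."""
    return [(i // chunk) % n_threads for i in range(n_iter)]

def work_from_owner(costs, owner, n_threads):
    """Sum the iteration costs that land on each thread."""
    work = [0] * n_threads
    for c, t in zip(costs, owner):
        work[t] += c
    return work

def dynamic_work(costs, n_threads, chunk):
    """Idealized schedule(dynamic, chunk): the next chunk goes to the
    thread that finishes its current work first."""
    free_at = [(0, t) for t in range(n_threads)]
    heapq.heapify(free_at)
    work = [0] * n_threads
    for start in range(0, len(costs), chunk):
        busy_until, t = heapq.heappop(free_at)
        c = sum(costs[start:start + chunk])
        work[t] += c
        heapq.heappush(free_at, (busy_until + c, t))
    return work
```

With the skewed costs [0, 0, 0, 0, 1, 1, 1, 1] and 4 threads, the block split gives [0, 0, 2, 2], while the cyclic and dynamic emulations with chunk size 1 give [1, 1, 1, 1]. When the costly iterations are spread evenly through the list, as the paper reports for the (i, j) pair list, all three come out nearly equal.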
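4.4 Hybrid versus MPI-Only

Next, Figure 4 shows the execution time per time step when the SPAM code is run as Hybrid and as MPI-Only. In MPI-Only, processes are mapped to nodes so that adjacent processes (processes in charge of adjacent cells) are placed on the same node, in order to reduce communication. Figure 4 shows that MPI-Only performs better than Hybrid.

[Figure 4: Elapsed time per step (sec) versus number of processors, for MPI-Only and Hybrid at each node count.]

To find the cause of this difference, we measured the time excluding the communication between processes, that is, the time spent in the computation of the forces acting on the particles. Figure 5 shows the result. It shows that the time spent in the force computation, not the communication time, is the main cause of the difference.

[Figure 5: Elapsed time per step (sec) versus number of processors, for MPI-Only and Hybrid (computation only).]

On an SMP cluster the memory bus becomes a bottleneck, so a performance analysis based on cache behavior is appropriate. We therefore consider the L2 cache miss ratio. Table 4 shows the L2 cache miss ratio measured for the force computation part. The Intel Pentium-II Xeon provides counters for measuring processor events. Using these counters, we measured the number of data memory references (DATA_MEM_REFS) and the number of all bus transactions (BUS_TRAN_ANY), and obtained the L2 cache miss ratio from them.

Table 4: L2 cache miss ratio of the force computation (%)

  #node   #processor   MPI-Only   Hybrid
  1       1            0.118      0.119
  1       2            0.119      0.552
  1       3            0.121      0.723
  1       4            0.130      0.854
  2       1            0.124      0.120
  2       2            0.122      0.555
  2       3            0.125      0.726
  2       4            0.132      0.861
  4       1            0.122      0.123
  4       2            0.128      0.563
  4       3            0.138      0.747
  4       4            0.142      0.899

The table shows that in Hybrid the L2 cache miss ratio becomes large when several processors are used inside an SMP node. We now consider why the L2 cache miss ratio becomes high in Hybrid.

In Hybrid, the (i, j) pairs built in the sorting step are processed in parallel by threads. To see in what order these (i, j) pairs are arranged, we first describe how the pairs are built.

( 1 ) For a particle i in a cell, build the pairs with the particles j in the same cell so that no pair is duplicated.
( 2 ) Build the pairs of particle i with the particles j in the 8 adjacent cells. Except when the cell lies on a boundary, pairs are built only with the particles in four of these cells, in order to avoid duplication.
( 3 ) Repeat 1 and 2 for all particles in the cell.
( 4 ) Repeat 1 to 3 for all cells the process is in charge of.

In this way the (i, j) pairs are built cell by cell. In step 4, a process scans the cells it is in charge of in the y direction, from bottom to top.

When the forces on the particles are computed with the (i, j) pairs built in this order, only the data in 5 cells are accessed repeatedly, as shown in Figure 6. The double-precision data mainly used in the force computation are the particle coordinates (x, y in Figure 2) and the forces (fx, fy in Figure 2), 4 arrays in all. In this problem no cell holds more than 100 particles, so the data accessed in 5 cells amount to 5 × 100 × 4 × 8 byte = 16000 byte, about 16 KB, which is very small. Also, there are only 6 cells in the y direction in this problem, so even when 6 × 3 = 18 cells are accessed the data amount to about 56 KB, which does not overflow the L2 cache. Hence, when the next column of cells (the column to the right in the x direction) is processed, the cells that are needed again are already in the cache. The cache is therefore used effectively when the cells are also accessed in order along the x direction.

[Figure 6: Access pattern of cells during the force computation (axes X and Y; cell; range of particle i; range of particle j). When the force computation has finished for all particles in a cell, processing moves to the next cell.]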
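The pair-building procedure above can be sketched as a cell list with a half stencil. The following Python sketch is an illustration under stated assumptions and not the paper's code: the function name is hypothetical, the particular four neighbor cells are an assumed choice because the ones used in the paper cannot be read from the source, the y direction is treated as periodic and the x direction as open, and at least 3 cells in y are required so that the periodic wrap does not visit the same cell pair twice.

```python
def build_pairs(cells, nx, ny):
    """cells maps (cx, cy) to a list of particle ids.

    Returns each unordered pair of particles in the same or in adjacent
    cells exactly once. Requires ny >= 3 (periodic wrap in y)."""
    half_stencil = [(0, 1), (1, -1), (1, 0), (1, 1)]   # assumed choice of 4 cells
    pairs = []
    for cx in range(nx):
        for cy in range(ny):                # scan each column bottom to top
            here = cells.get((cx, cy), [])
            # pairs inside the cell, without duplication
            for a in range(len(here)):
                for b in range(a + 1, len(here)):
                    pairs.append((here[a], here[b]))
            # pairs with four of the eight neighboring cells
            for dx, dy in half_stencil:
                ncx = cx + dx
                if ncx >= nx:               # open boundary in x
                    continue
                ncy = (cy + dy) % ny        # periodic boundary in y
                for i in here:
                    for j in cells.get((ncx, ncy), []):
                        pairs.append((i, j))
    return pairs
```

Because pairs are emitted cell by cell in scan order, consecutive entries of the list touch particles from only a few neighboring cells, which is the locality that the cache analysis relies on.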
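Next, we consider execution on several processors. In MPI-Only, even when the number of processes is increased, each process builds its own (i, j) pairs and processes them in cell order from the beginning, so the cache is used effectively. In Hybrid, on the other hand, portions of the (i, j) pair list are assigned to threads without regard to this spatial arrangement of the cells, so the cache is not hit as well as in MPI-Only. In other words, it is most efficient for the cache if the computation starts at the head of the list or at a point where the cell changes, but Hybrid assigns the threads portions that start in the middle of the (i, j) pair list, and we consider that this increases the cache misses.

Making the same kind of assignment in Hybrid as in MPI-Only is difficult and requires extra work. To assign the (i, j) pairs to threads at the points where the cell changes, explicit thread programming that uses thread IDs would be required.

4.5 Discussion of OpenMP Issues

To summarize the above, in Hybrid programming, that is, in programming with OpenMP, when data are produced and then used inside a process it is important that both are handled by the same thread. That is, the affinity of the data is important. For example, when one loop produces data and the next loop accesses them, the cache hit ratio drops sharply and the performance degrades considerably unless the iterations of the two loops are assigned to the same threads.

The simplest way to avoid this is to use the same static block assignment for the former loop and the latter loop. However, this cannot be applied well when irregular load balancing is needed, as in the problem taken up here. Another way is to divide the work across several loops among the threads from the start and to let each thread proceed through the computation phases on its own part. This, however, is a so-called naive parallelization. It gives up much of the productivity that OpenMP offers through incremental parallelization of a sequential program, and is therefore undesirable. It is also as difficult to apply to dynamic load balancing as the first method.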
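A Hybrid parallel program using OpenMP has the advantage that it removes the overhead of communication inside a node and also reduces the number of communications between nodes, and with it the communication overhead. However, when the overhead that comes from the division of data in the OpenMP part exceeds the communication overhead, as described above, it is difficult for the final performance to exceed that of MPI alone. In this case naive parallelization makes little sense (it would be easier to program with MPI only), and hybrid parallelization does not necessarily improve performance.

5. Conclusion

In this paper we applied SPAM, a method suited to the analysis of fluids, to 2-dimensional shock wave analysis, and parallelized it with the SMP cluster as the target platform. To obtain higher performance on SMP nodes, we implemented not only an MPI-only version but also a hybrid version combining MPI and OpenMP. In the measured performance, however, using MPI alone gave the better result. That it is difficult to obtain high performance by hybrid parallelization with MPI and OpenMP has also been reported in other parallel computing studies 3),4). From the analysis based on the cache miss ratio, we consider the main cause to be that the hybrid version cannot keep the locality of data as well as the MPI-only version.

The hybrid parallelization carried out here leaves room for improving the performance. In our code the critical directive is used so that threads do not conflict when they update the forces acting on particles. The overhead of this mutual exclusion can be avoided by preparing an array with an additional dimension and letting each thread perform its updates separately, but this requires extra additions after the loop. To examine the trade-off between the overheads of the two methods, the latter method also needs to be implemented and evaluated. The placement of data and the control of the access pattern also need further study.

Acknowledgments

We thank Antony DeGroot of Lawrence Livermore National Laboratory for his generous cooperation in the early stage of this research. Part of this research was supported by a Grant-in-Aid for Scientific Research (C), No. 12680327.

References

1) D.M.Beazley, P.S.Lomdahl : Message-passing multi-cell molecular dynamics on the Connection Machine 5, Parallel Computing, vol.20, pp.173-195, 1994.
2) S.Plimpton : Fast Parallel Algorithms for Short-Range Molecular Dynamics, Journal of Computational Physics, vol.117, pp.1-19, 1995.
3) D.S.Henty : Performance of Hybrid Message-Passing and Shared-Memory Parallelism for Discrete Element Modeling, Proc. of SC'00, Dallas, USA, Nov. 2000.
4) F.Cappello, D.Etiemble : MPI versus MPI + OpenMP on IBM SP for the NAS Benchmarks, Proc. of SC'00, Dallas, USA, Nov. 2000.
5) [author names not recoverable] : Performance evaluation of OpenMP+MPI on an SMP-PC cluster (in Japanese), IPSJ SIG Technical Report, 2000-HPC-80-27, pp.155-160, 2000.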