λϒʔαʔνΛแͨ͠ϞϯςΧϧϩ୳ࡧʹجͮ͘
ғޟΞϧΰϦζϜ
ଠా ༤େ
1ҏ౻ խ
2 ֓ཁɿϞϯςΧϧϩ୳ࡧʹ͓͚ΔϓϨΠΞτͷޮԽͷݚڀ׆ൃʹߦΘΕ͖ͯͨ. ͔͠͠ɼϓϨΠ Ξτͷଟ༷ੑʹ͍ͭͯͷݚڀ͋·Γ͞Ε͍ͯͳ͍. ͦ͜ͰɼຊݚڀͰϞμϯώϡʔϦεςΟΫεͷ ҰͭͰ͋ΔλϒʔαʔνΛϓϨΠΞτʹద༻͢Δ͜ͱΛఏҊ͢Δ. ϓϨΠΞτΛߦͬͨہ໘Λλϒʔ ϦετʹՃ͠ɼλϒʔظؒ୳ࡧ͢ΔͷΛېࢭ͢Δ. ·ͨɼλϒʔظؒΛա͗ͨہ໘ΛλϒʔϦετ͔Β औΓআ͘. ͦΕʹΑΓϓϨΠΞτͷଟ༷ੑΛ֬อ͢Δ͜ͱ͕Ͱ͖Δ. ࣮ݧΛରہٴͼ٧ޟʹͯߦ͍ɼ λϒʔαʔνΛแͨ͠ϞϯςΧϧϩ୳ࡧ୯७ͳϞϯςΧϧϩ୳ࡧʹൺͯྑ͍ੑೳ͕ಘΒΕͨ.An Igo Algorithm of Monte Carlo Tree Search
Including Tabu Search
Takehiro Ohta
1Masaru Itoh
2Abstract: Efficiency of playout in Monte Carlo tree search (MCTS) have been extensively studied up to now. However, diversity of playout is not really investigated. Because of that, this paper refers to the diver-sity in MCTS. So this paper proposes to combine MCTS with tabu search (TA), which is a modern heuristic technique for combinatorial problems, into the computer igo algorithm. Once a phase of the playout is added into a tabu list, the searching method prohibits the adoption of the same phase during a given tabu tenure. When the number of trials for playouts is greater than the tabu tenure, the phase is removed from the tabu list. And then the phase could be adopted again. Thus the proposed method can be obtained to ensure the diversity of playout as a whole. The numerical results for some life-and-death igo problems shows that the method of MCTS including TA have obviously got an advantage over the simple MCTS algorithm with the view of the right moves.
1.
͡Ίʹ
ۙͷίϯϐϡʔλғޟͷعྗɼ࣮֬͜͜ʹ
্͍ͯ͠Δ. 20133݄ʹߦΘΕͨୈ̒ճUECഋίϯ
ϐϡʔλғޟେձ[1]ͰɼCrazy Stone[2]͕༏উͨ͠.
ίϯϐϡʔλғޟͷعྗ্͕ͨ͠ཧ༝ͱͯ͠ɼϞϯ ςΧϧϩ୳ࡧ(MCTS: Monte Carlo Tree Search)[3]
ϓϨΠΞτͷޮԽͳͲ͕ڍ͛ΒΕΔ. ؔ࿈ݚڀͱͯ͠
LGRF(Last Good Reply with Forget)[4]ύλʔϯʹΑ
1 ѪۀେֶେֶӃܦӦใՊֶݚڀՊ
Graduate School of Business Administration and Computer Science, Aichi Institute of Technology
2 ѪۀେֶใՊֶ෦ใՊֶՊ
Faculty of Information Science, Aichi Institute of Technology
ΔϓϨΠΞτͷணख[5]ͳͲ͕͋Δ. લऀϓϨΠΞτ தͷ͋Δہ໘ͰউͬͨଧͪํهԱ͓͖ͯ͠ɼෛ͚ͨଧͪ ํΕΔͱ͍͏ख๏Ͱ͋Δ. ͜ΕʹΑΓಉ͡ہ໘ʹૺ۰ ͨ͠߹ɼউͬͨ࣌ͷखΛଧͭ͜ͱͰϓϨΠΞτΛޮ Խ͢Δ. ޙऀύλʔϯΛϓϨΠΞτͰͷީิखͱ͢Δ ͜ͱͰϓϨΠΞτͷਫ਼Λ্ͤ͞Δ. ͜ͷΑ͏ʹϓϨ ΠΞτͷޮԽʹ͍ͭͯ׆ൃʹݚڀ͞Ε͍ͯΔ͕ɼϓ ϨΠΞτͷଟ༷ੑʹ͍ͭͯ͋·Γݚڀ͞Ε͍ͯͳ͍. ͦ͜ͰɼຊݚڀͰϞμϯώϡʔϦεςΟΫεͷҰͭͰ ͋Δλϒʔαʔν[6]ΛϓϨΠΞτʹద༻͢Δ͜ͱΛఏ Ҋ͢Δ. λϒʔαʔνɼҰ୳ࡧΛߦͬͨղΛλϒʔϦ ετʹՃ͠ɼλϒʔظؒத୳ࡧ͠ͳ͍ͱ͍͏ΞϧΰϦ ζϜͰ͋Δ. ಉҰہ໘ͷ८ճΛ͗ͭͭɼ୳ࡧۭؒͷଟ༷
The 18th Game Programming Workshop 2013
-ੑΛ֬อ͢Δ͜ͱ͕Ͱ͖Δ. ୯७ͳϞϯςΧϧϩ୳ࡧͱλϒʔαʔνΛแͨ͠Ϟ ϯςΧϧϩ୳ࡧͱͷੑೳൺֱΛରہ͓Αͼ٧ޟʹͯߦͬ ͨ. ·ͨɼϓϨΠΞτʹ͓͚ΔಉҰہ໘Λ୳ࡧͨ͠ൺ ʹ͍ͭͯߟ͢Δ.
2.
ϞϯςΧϧϩ୳ࡧ
ϞϯςΧϧϩ୳ࡧɼϞϯςΧϧϩ๏ʹUCB1ΞϧΰϦζϜ(Upper Confidence Bound)[7]ͱ୳ࡧΛ༻͍ͨΞ
ϧΰϦζϜͰ͋Δ. ୯७ͳϞϯςΧϧϩ๏ɼ໌Β͔ʹѱ ͍खʹϓϨΠΞτΛۉʹ࣮ߦ͠ɼฏۉউͰखͷྑ ͠ѱ͠Λஅ͢Δ. ͔͠͠ɼUCB1ΞϧΰϦζϜΛ༻͢ Εɼউ͕ߴ͍खʹରͯ͠ϓϨΠΞτΛΑΓଟ͘ࢼߦ Ͱ͖ɼ࠷େ(૬खʹͱͬͯ࠷খ)ͷՁΛ࣋ͭख͕બͰ ͖ΔΑ͏ʹͳΔ. UCB1ΞϧΰϦζϜͷܭࢉࣜΛࣜ(1)ʹࣔ͢. ͜͜Ͱɼ ¯ XiखiΛଧͬͨ࣌ͷݱࡏͷউɼniखiҎ߱ͷϓϨΠ ΞτͷճɼNͦͷہ໘ʹ͓͍ͯࢪߦͨ͠ϓϨΠΞ τͷ૯(niͷ૯)Ͱ͋Δ. UCB1(i) = ¯Xi+ 2 logN ni (1) ϞϯςΧϧϩ୳ࡧͷॲཧͷྲྀΕΛҎԼʹࣔ͢. STEP 1 ϧʔτϊʔυ͔Βࣜ(1)ͷ͕࠷େͰ͋Δࢠ ϊʔυΛબ͠ͳ͕Βϊʔυ·ͰΛԼΓΔ. STEP 2ϊʔυͷϓϨΠΞτճ͕ᮢΛ͑ͯ ͍ΕɼͦͷϊʔυͷࢠϊʔυΛ࡞͠ΛԼΓΔ. ᮢΛ͍͑ͯͳ͚Εɼࢠϊʔυ࡞͠ͳ͍. STEP 3ϊʔυͰϓϨΠΞτΛ1ճߦ͏. STEP 4୧͖ͬͯͨϊʔυʢϧʔτϊʔυΛؚΉʣ͢ ͯʹϓϨΠΞτͷ݁ՌΛөͤ͞ɼউΛߋ৽͢Δ. STEP 5੍ݶ࣌ؒͷ੍ʹୡ͍ͯ͠Ε୳ࡧΛऴྃ͢ Δ.͞ͳ͚ΕSTEP 1ʹΔ.
3.
ఏҊ๏ͷ֓ཁ
ғޟͰɼਤ1ͷΑ͏ʹணखखॱ͕ҟͳ͍ͬͯͯಉ͡ ہ໘ʹͳΔ͜ͱ͕ଟ͍. ಛʹɼ9࿏൫٧ޟͰணखൣғ͕ ڱ͍ͷͰݦஶʹදΕΔ. ಉҰہ໘ͷ८ճʹΑΓɼہॴղʹ ऩଋ͢ΔՄೳੑ͕ੜ͡Δ. ಉҰہ໘ͷ८ճ͕ى͜Γқ͍ہ ໘ͱͯ͠γνϣ͕ڍ͛ΒΕΔ. ྫ͑ɼγνϣ͕ਖ਼ண Ͱ͋Δہ໘ͰγνϣΛूதతʹ୳ࡧ͢Δͷਖ਼͍͕͠ɼ γνϣ͕ਖ਼ணͰͳ͍ہ໘ͰγνϣΛूதతʹ୳ࡧ͢Δ ͷޮՌతͰͳ͍. ͦ͜ͰɼຊݚڀͰϞμϯώϡʔϦεςΟΫεͷҰͭͰ ͋ΔλϒʔαʔνΛϞϯςΧϧϩ୳ࡧʹద༻͢Δ. 3.1 ఏҊ๏ͷΞϧΰϦζϜ λϒʔαʔνϞϯςΧϧϩ୳ࡧͷSTEP 3ͷϓϨΠ Ξτதͷہ໘ʹରͯ͠ߦ͏. Ұ୳ࡧͨ͠ϓϨΠΞτ ͷہ໘ΛλϒʔϦετʹՃ͓͖ͯ͠ɼλϒʔظؒத୳ ࡧ͠ͳ͍Α͏ʹ͢Δ. λϒʔαʔνΛแ͢Δ͜ͱͰɼಉ Ұہ໘ͷ८ճΛ͗ͭͭɼϓϨΠΞτͷଟ༷ੑ͕֬อͰ ͖Δ. λϒʔϦετਤ2ʹࣔ͢Α͏ʹϞϯςΧϧϩͷ ֤ϊʔυ͕อ࣋͢Δ. ਤதͷϊʔυn1ͱϊʔυn2ͷ λϒʔϦετͷཁૉ͕ҟͳΔΑ͏ʹɼλϒʔϦετ֤ ϊʔυຖʹཧ͠ɼϊʔυؒͰͷλϒʔϦετͷڞ༗ߦ Θͳ͍. λϒʔϦετϓϨΠΞτॳख༻ͱ࣍ख༻ͷ2 ͭͷϦετͰߏ͞ΕɼϦετͷߏΩϡʔߏͱ͠ɼ Ϧετͷ͞ΛλϒʔαΠζͱ͢Δ. ਤ2λϒʔαΠζ ͕3Ͱ͋ΔϦετΛද͓ͯ͠ΓɼਤதͷFirstॳख༻ͷ ϦετΛɼSecond࣍ख༻ͷϦετΛҙຯ͢Δ. ·ͨɼਤ 2தͷҹλϒʔϦετΛߋ৽͢ΔࡍͷΩϡʔΛͣΒ͢ ํΛද͍ͯ͠Δ. લઅϞϯςΧϧϩ୳ࡧSTEP3ͷՃΞϧΰϦζϜ ͷ֓ཁΛҎԼʹࣔ͢. STEP 3-1ϓϨΠΞτ1खͱ2खͷ࣌ɼͦͷہ໘ ͕λϒʔϦετʹؚ·Ε͍ͯΔ͔൱͔ΛௐΔ. STEP 3-2ؚ·Ε͍ͯͳ͚Εɼͦͷہ໘ΛλϒʔϦε τʹՃ͠ɼϓϨΠΞτΛଓߦ͢Δ.ؚ·Ε͍ͯΔ ͳΒɼͦͷखഁغ͞ΕɼϓϨΠΞτΛ1ख͔ ΒΓ͠ɼSTEP 3-1ʹΔ. STEP 3-3λϒʔϦετʹہ໘ΛՃ͢Δͱಉ࣌ʹɼλ ϒʔϦετΛߋ৽͠ɼλϒʔظؒΛա͗ͨہ໘Λλ ϒʔϦετ͔ΒऔΓআ͘. λϒʔαʔνΛಋೖ͢Δ͜ͱͰɼ୯७ͳMCTSख๏ʹ ൺͯϓϨΠΞτʹཁ͢Δ͕࣌ؒ૿େ͢Δͱਪ͞ΕΔ. ͔͠͠ɼ࣮ࡍʹ୯७ͳMCTSख๏ͱେࠩͷͳ͍ܭࢉ࣌ ؒͰ͋ͬͨ. ࣍અͰఏҊ๏ͷܭࢉྔʹ͍ͭͯઆ໌͢Δ. 1 3 2 1 3 2ਤ1 Example of the same state obtained from different moves
First Second Tabu Size = 3 (5, 3) (4, 5) (7, 1) … n1 n2 First Second Tabu Size = 3 (4, 7) (3, 2) (8, 4) Null Null Null n1 n2 (2, 1) (1, 8) (3, 4)
ਤ2 Tabu lists on each node in the tree
The 18th Game Programming Workshop 2013
-(5, 3) (7, 1) (1, 8) (3, 4) Null (2, 1) (4, 5) (2, 1) Tabu Size = 3 (5, 3) (7, 1) (1, 8) (3, 4) First Second
ਤ3 Updated queue of tabu lists
3.2 ఏҊ๏ͷܭࢉྔ λϒʔϦετͷαΠζΛLͱ͠ɼN࿏൫ͰୈKखͷ ہ໘Λߟ͑Δ. ͜ͷͱ͖ϓϨΠΞτ͕1ճޭ͢Δ·Ͱ ʹ࣮ߦ͢ΔϓϨΠΞτͷฏۉࢼߦճΛT ͱ͠ɼT ͷ্ քT¯ΛٻΊͯΈΔ. ҎԼͷ3ͭΛԾఆ͢Δ. (1)λϒʔ Ϧετશ෦ຒ·͍ͬͯΔ. (2)λϒʔϦετʹొ͢Δ ہ໘ॳखͱ࣍खͷ2ख·Ͱͱ͢Δ. (3)બ͞Εͨख͕ λϒʔϦετʹؚ·ΕΔ߹ɼͦͷखഁغ͞Εɼλϒʔ ϦετͷΩϡʔಉ࣌ʹߋ৽͞ΕΔ. Ծఆ(3)ͷঢ়گΛਤ 3ͷࠨਤʹࣔ͢. ϓϨΠΞτॳखͷ࠲ඪ͕(5, 3)ͳΒɼ ॳखͷϦετʹ(5, 3)ؚ͕·Ε͍ͯΔͷͰɼख(5, 3)ഁ غ͞ΕΔ. ͦͷࡍɼॳखͷϦετ͚͕ͩ1ͭͣΕͯɼਤ3 ͷӈਤͷΑ͏ʹͳΔ. ͯ͞ɼ൫໘ʹੴ͕(K − 1)ݸ͋ΔͷͰɼୈKख͕ λϒʔʹͳΔ֬L/(N2− (K − 1))Ͱ͋Γɼ࣍ख͕λ ϒʔʹͳΔ֬L/(N2− K)Ͱ͋Δ. ฏۉࢼߦճΛ্ քͰධՁ͢Εɼ֬P ΛP = L/(N2− K)ͱ͓͍ͯ Σ(ࣦഊ͢Δճ)× (ͦͷ֬) + 1Λܭࢉ͢ΕΑ͍. ∞ n=1 n(n − 1)(1 − P )Pn−1+Pn+ 1 (2) ࣜ(2)ୈ1߲λϒʔαʔνΛಋೖͨ͜͠ͱʹΑΓແޮ ͱͳΔϓϨΠΞτͷฏۉࢼߦճͷ্քͰ͋Δ. ϓϨ ΠΞτ͕༗ޮʹͳΔʹ͞Βʹ͏1ճࢼߦΛՃ͠ͳ ͚ΕͳΒͳ͍. ͦΕ͕ࣜ(2)ୈ2߲ͷ+ 1ͷҙຯͰ͋Δ. ࣜ(2)Λܭࢉ͢Δͱɼ3P/(1 − P )2+ 1ΛಘΔ. ্քͱ͍ ͏ҙຯ߹͍͔Βɼ͜ΕΛ4P/(1 − P )2+ 1ͰධՁ͢Εɼ ݁ہT¯ ¯ T = N2− K + L N2− K − L 2 (3) ͱͯ͠ಘΒΕΔ. ࣜ(3)ΑΓϞϯςΧϧϩ୳ࡧʹλϒʔαʔνΛแ͞ ͤͯແޮʹͳΔϓϨΠΞτ͕ۃʹ૿͑Δ͜ͱͳ ͍ͱ͍͑Δ.
4.
࣮ݧ
୯७ͳMCTSख๏ͱλϒʔαʔνΛแͨ͠Tabu + MCTSख๏Λରہٴͼ٧ޟʹͯੑೳൺֱΛߦͬͨ. ·ͨɼ ಉҰہ໘ͷ୳ࡧൺʹ͍࣮ͭͯݧΛߦͬͨ. ֤छύϥ ϝʔλɼλϒʔϦετͷαΠζΛ3ʙ12ɼλϒʔظؒΛ 3ʙ12ɼλϒʔϦετʹՃ͢Δہ໘ΛϓϨΠΞτ2खද1 Results of the games
݁Ռ ରہ উͪ ෛ͚ উ Tabu3 200ہ 138উ 62ഊ 69.0 % Tabu4 200ہ 148উ 52ഊ 74.0 % Tabu5 200ہ 154উ 46ഊ 77.0 % Tabu6 200ہ 162উ 38ഊ 81.0 % Tabu7 200ہ 152উ 48ഊ 76.0 % Tabu8 200ہ 162উ 38ഊ 81.0 % Tabu9 200ہ 155উ 45ഊ 77.5 % Tabu10 200ہ 155উ 45ഊ 77.5 % Tabu11 200ہ 157উ 43ഊ 78.5 % Tabu12 200ہ 160উ 40ഊ 80.0 %
ද2 Results of binomial test
݁Ռ p ৴པ۠ؒ Tabu3 8.027e−8 0.621 - 0.753 Tabu4 7.261e−12 0.673 - 0.799 Tabu5 1.017e−9 0.703 - 0.865 Tabu6 < 2.200e−16 0.749 - 0.862 Tabu7 8.738e−14 0.695 - 0.817 Tabu8 < 2.200e−16 0.749 - 0.862 Tabu9 2.397e−15 0.711 - 0.831 Tabu10 2.397e−15 0.711 - 0.831 Tabu11 < 2.200e−16 0.722 - 0.840 Tabu12 < 2.200e−16 0.738 - 0.853 ҎɼϓϨΠΞτͷᮢΛ100ͱͨ͠. 4.1 ରہʹΑΔੑೳൺֱ ରہ݁ՌΛද1ʹࣔ͢. ද1ͷTabu3ʙ12λϒʔαΠ ζ͕3ʙ12Ͱ͋ΔTabu + MCTSख๏Λද͍ͯ͠Δ. Tabu + MCTSख๏Λࠇ൪ɼ୯७ͳMCTSख๏Λന൪ͱ͠ɼί ϛΛ6ɼ࣋ͪ࣌ؒΛ֤10ɼޟ൫Λ9࿏൫ͱͨ͠. ද1ΑΓɼTabu3ʙTabu12ͷશͯʹ͓͍ͯ୯७ͳMCTS ख๏ʹରͯ͠ߴ͍উ͕ಘΒΕͨ. ࣍ʹɼද1ͷରઓ݁Ռ Λجʹ༗ҙਫ४5%Ͱೋ߲ݕఆΛߦͬͨ. ݁ՌΛද2ʹࣔ ͢. ͜͜ͰɼؼແԾઆΛʮ୯७ͳMCTSख๏ͱTabu3ʙ Tabu12ͷTabu + MCTSख๏ʹعྗͷࠩͳ͍ʯͱ͠ɼ ରཱԾઆΛʮೋͭͷख๏ʹعྗͷ͕ࠩ͋Δʯͱͨ͠. ද2 ΑΓTabu3ʙTabu12ͷશͯʹ͓͍ͯp͕0.05ΑΓখ͞ ͍ͷ໌Β͔Ͱ͋Δ. ΑͬͯɼؼແԾઆغ٫͞Εɼରཱ Ծઆ͕࠾͞ΕΔ. ͭ·ΓɼTabu + MCTSख๏ͱ୯७ͳ MCTSख๏ʹعྗͷ͕ࠩ͋Δͱ͍͑Δ. 4.2 ಉҰہ໘ͷ୳ࡧൺ ୯७ͳMCTSख๏ͱTabu + MCTSख๏͕ɼͦΕͧΕ 9࿏൫ͷࠇ൪1खΛ୳ࡧͨ͠ࡍͷϓϨΠΞτʹ͓͍ͯ ಉҰہ໘Λ୳ࡧͨ͠ൺΛਤ4ʹࣔ͢. ·ͨہ໘ͷରɼ ϓϨΠΞτ1खͱ2खͷہ໘ͱͨ͠. ਤ4ΑΓɼTabu + MCTSख๏ͷํ͕୯७ͳMCTSख ๏ʹൺͯಉҰہ໘Λ୳ࡧͨ͠ൺ͕͍͜ͱ͕͔Δ.
The 18th Game Programming Workshop 2013
-10 15 20 25
MCTS Tabu4 Tabu6 Tabu8 Tabu10 Tabu12
S ear ch ing r at ios on t h e s am e s tat e ( % )
first move in playout second move in playout
ਤ4 Searching ratios on the same state in playouts
·ͨɼλϒʔαΠζ͕େ͖͘ͳΔʹैͬͯɼಉҰہ໘Λ୳ ࡧͨ͠ൺ͕ݮগ͍ͯ͠Δ. ۩ମతʹɼϓϨΠΞτ1ख ͰMCTS21.5%ʹର͠Tabu11Ͱ14.6%ɼϓϨΠΞ τ2खͰMCTS17.2%ʹର͠Tabu11Ͱ12.4%ʹݮ গ͍ͯ͠Δ. ͜ΕɼఏҊ͢ΔTabu + MCTSख๏͕୯७ ͳMCTSख๏ʹൺͯϓϨΠΞτͷଟ༷ੑΛ֬อͯ͠ ͍Δ͜ͱΛҙຯ͢Δ. 4.3 ٧ޟʹΑΔੑೳൺֱ ٧ޟ10Λ୯७ͳMCTSख๏ͱTabu + MCTSख๏ ʹͦΕͧΕ30ճͣͭղͤͨ͞. ͦΕͧΕͷख๏ͷਖ਼ Λਤ5ʹࣔ͢. ͜͜Ͱɼਖ਼Λ٧ޟͷ1ख͕ਖ਼ணͰ ׂ͋ͬͨ߹ͱ͢Δ. ਤ5ͷMCTSͱ୯७ͳMCTSख๏ ΛɼTabu12λϒʔαΠζ12ͷTabu + MCTSख๏Λ ද͍ͯ͠Δ. ਤ5ΑΓMCTSʹൺͯTabu12ͷํ͕٧ޟਖ਼͕ ߴ͍͜ͱ͕͔Δ. ಉ༷ʹTabu3ʙ11ʹରͯ͠ಉ࣮͡ݧΛ ߦͬͨ݁ՌɼTabu12ʹ͍ۙਖ਼͕ಘΒΕͨ. ࣍ʹɼ୯७ ͳMCTSख๏ͱTabu3ʙ12ͷTabu + MCTSख๏ͷ٧ޟ ਖ਼ͷ݁ՌΛجʹ༗ҙਫ४5%Ͱχ2ݕఆΛߦͬͨ. ͜͜ ͰɼؼແԾઆΛʮMCTSͱTabu3ʙ12ʹ༗ҙͳࠩͳ͍ʯ ͱ͠ɼରཱԾઆΛʮMCTSͱTabu3ʙ12ʹ༗ҙͳ͕ࠩ͋ Δʯͱͨ͠. χ2ݕఆͷ݁ՌΛද3ʹࣔ͢. ද3ΑΓ༗ҙਫ ४5%ͷχ2Ͱ͋Δ3.841Λେ෯ʹ͍͑ͯΔ. Αͬͯɼ ؼແԾઆغ٫͞ΕɼରཱԾઆ͕࠾͞ΕΔ. ͭ·Γɼ୯ ७ͳMCTSख๏ͱTabu3ʙ12ͷTabu + MCTSख๏ʹ ༗ҙͳ͕ࠩ͋Δͱ͍͑Δ.
5.
͓ΘΓʹ
ຊߘͰλϒʔαʔνΛแͨ͠MCTSख๏ΛఏҊ͠ ͨ. ࣮ݧʹΑΓɼରہٴͼ٧ޟʹ͓͍ͯఏҊͨ͠Tabu + MCTSख๏ͷํ͕୯७ͳMCTSख๏ʹൺͯੑೳ͕ྑ ͍͜ͱΛ֬ೝͨ͠. ·ͨɼTabu + MCTSख๏ϓϨΠΞ τͷଟ༷ੑΛ֬อͰ͖Δ͜ͱ֬ೝͨ͠. ࠓޙͷ՝ͱͯ͠ɼࣗݾରઓʹΑΔউͷੳ͚ͩͰͳ ၥ㢟1 ၥ㢟2 ၥ㢟3 ၥ㢟4 ၥ㢟5 ၥ㢟6 ၥ㢟7 ၥ㢟8 ၥ㢟9 ၥ㢟10 0 20 40 60 80 100 ၥ㢟1 ၥ㢟2 ၥ㢟3 ၥ㢟4 ၥ㢟5 ၥ㢟6 ၥ㢟7 ၥ㢟8 ၥ㢟9 ၥ㢟10 0 20 40 60 80 100 MCTSTabu12ਤ5 Ratios of the correct answers ද3 Results of chi-squared test
݁Ռ χ2 p Tabu3 98.783 < 2.200e−16 Tabu4 100.109 < 2.200e−16 Tabu5 90.009 1.622e−15 Tabu6 99.114 < 2.200e−16 Tabu7 95.081 < 2.200e−16 Tabu8 107.867 < 2.200e−16 Tabu9 103.609 < 2.200e−16 Tabu10 83.569 3.149e−14 Tabu11 98.841 < 2.200e−16 Tabu12 96.830 < 2.200e−16 ͘ΦʔϓϯιʔεͱͷରઓʹΑΔੑೳൺֱ͕ඞཁͰ͋Δ. ঘɼຊݚڀͷҰ෦ɼจ෦Պֶল Պݚඅ ج൫ݚڀ(C) No.25330441 ͷॿΛಘͨ. ͜͜ʹँҙΛද͢Δ. ࢀߟจݙ [1] ୈ6ճUECഋίϯϐϡʔλғޟେձ, http://jsb.cs.uec. ac.jp/˜igo/
[2] Crazy Stone, http://remi.coulom.free.fr/CrazyStone/ [3] R´emi Coulom, “Computing Elo Ratings of Move
Pat-terns in the Game of Go”, Computer Games Workshop, 2007.
[4] Hendrik Baier, and Peter D. Drake, “The Power of Forgetting: Improving the Last-Good-Reply Policy in Monte Carlo Go”, IEEE Transaction on Computational Intelligence and AI in Games, Vol.2, No.4, pp.303–309, 2010.
[5] Slyvain Gelly, Yizao Wang, R´emi Munosɼand Olivier Teytaudɼ“Modification of UCT with Patterns in Monte-Carlo Go”, Technical Report RR-6062, INRIA, 2006. [6] ʑ෦࢘,༄Ӝກݑ, “ہॴ୳ࡧ๏ͱͦͷ֦ு—λϒʔ
୳ࡧ๏Λத৺ͱͯ͠”,ܭଌͱ੍ޚ, Vol.47, No.6, pp.493– 499, 2008.
[7] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer, “Finite-time Analysis of the Multiarmed Bandit Prob-lem”, Machine Learning, Vol.47, No.2–3, pp.235–256, 2002.