The basic algorithm - Existing programs and research

2.4 Existing programs and research

3.1.2 The basic algorithm

Abramson was the ﬁrst to propose, in 1993, a way to use Monte-Carlo for playing games, with his expected outcome model [5]. This model was presented as a possible replacement for traditional Alpha-Beta search [47]. The main idea behind the algorithm is to play, in a given position, the move leading to the best average result. Now this idea of average result of a move must be deﬁned, and a simple way to do that is to use for it the percentage of victory of this move. To compute this percentage of victory, he proposes to use the Monte-Carlo method. The way to compute this average result for a move is quite simple and elegant: we just need to launch a huge number of random games from the position attained by playing the move, and average the results of these random games. This simple algorithm is given in pseudocode in Figure 3.1.

Being a stochastic method, Monte-Carlo has been naturally applied for games which also inherently include a part of randomness, like scrabble [71], poker [11] or backgam-mon [77], in which it generally leads to good results (providing some speciﬁc improve-ments). However, that does not mean that the method cannot be used for deterministic games, and it has indeed been applied to such games, like the game of Go, in more recent years [17]. Since then and following good results for the game of Go, it has been applied for numerous other games, both deterministic and stochastic [84, 76, 78].

One of the good point of Monte-Carlo for games is that it is very easy to program and, in a way, knowledge free. With the exception of the rules of the games, which permit us to generate moves and play them as well as to evaluate endgame positions, this algorithm does not require us to include any other speciﬁc form of knowledge, be it an evaluation function, patterns, an Opening-Book or an endgame database. Of course, as we will see later, including those can be very beneﬁcial for Monte-Carlo, but the point remains that it is possible to implement a program playing any game quite easily. For this reason,

1 function getBestMove ( P o s i t i o n , NumberRandomGames ) 2 f o r e a c h move m p l a y a b l e from P o s i t i o n

3 a v e r a g e v a l u e [m] = 0

4 f o r i from 1 t o NumberRandomGames 5 Pos = copy ( P o s i t i o n )

6 p l a y ( Pos , m)

7 while ( Pos i s not ended p o s i t i o n )

8 p l a y ( Pos , random move p l a y a b l e from Pos )

9 end while

10 v a l u e [m] = v a l u e [m] + r e s u l t ( Pos ) 11 end f o r

12 v a l u e [m] = v a l u e [m] / NumberRandomGames 13 end f o r

14 return( move m w i t h h i g h e s t v a l u e [m] ) 15 end function

Figure 3.1: Pseudocode for a basic Monte-Carlo playing engine

Monte-Carlo methods have also been used with great success in General Game-Playing competitions [36], the goal of which is to play a set of diﬀerent games without knowing the rules beforehand.

Another good point of using Monte-Carlo for games is that the search is anytime.

It is quite easy to change the code presented in Figure 3.1 not to launch N random games for each moves with a big number N, but instead play a random game for each move and repeat this process as long as there is time. Quite naturally, and thanks to the convergence property of Monte-Carlo, the longer the computation time, the bigger the number of games for each move and the more accurate the values computed. With unlimited time, Monte-Carlo will even return the best move, that is the move with the best average result.

Figure 3.2: Example of a global position badly handled by simple Monte-Carlo However, this simplicity comes at several costs. The ﬁrst one is computing power:

while random games are quick to play, the basic Monte-Carlo algorithm for games needs to play a signiﬁcant number of random games to get a correct estimation of the average result of each move. While this computation problem is common to a number of game programming methods, it is particularly acute here since a good portion of the computing time is allocated to bad moves and thus wasted.

Another cost is that the possible success of this application of Monte-Carlo to games is under the assumption that the move leading to the best average result is the best move to play in any given position. While this is a reasonable assumption and very human-like, it is deﬁnitely wrong in certain circumstances. To illustrate this, let us take a closer look at the Amazons position in Figure 3.2: the Black Amazon at D8 is defending a non-negligible territory. While there are probably several potentially good moves in this position, a simple Monte-Carlo program will choose D8D7-D6. While this looks like an opportunity to attack the center, a simple answer would be White playing C7D8-E8, thus stealing from Black all the territory on the top. A more reasonable move both attacking the center and defending the territory on the top would be D8D7-D8, or even D8E8-D8 for more safety.

Figure 3.3: Example of a local position badly handled by simple Monte-Carlo The position in Figure 3.3 gives us another example of a position badly handled by a Monte-Carlo program, this time a local position: while this is a win position for the White player if he is playing ﬁrst, this kind of territory requires him to play moves in the correct order to win. On the other hand, the upper territory is easier to ﬁll, and will give a better average result, so moving down for the White player will mostly lead to loss for him. On the other hand, moving up (and then shooting back, sealing his own territory by doing this) looks more attractive, because if the Black player makes the mistake of not shooting back at the entrance of his territory, he will lose more often than not. Since we are here dealing with random games, this outcome is quite probable. So in this situation, the White player will prefer to move upward while the only winning move begins by going downward.

The next two sections will present possible improvements to tackle these two particular problems.

Some other problems may appear depending on the game Monte-Carlo is applied to.

One of the most acute of them is that, since Monte-Carlo gets its evaluations from random games played from a given position up to the end, it may have some troubles terminating for games without a clear ending condition. Converging games such as the game of the Amazons pose no problem in this respect. For others such as the game of Go, it is possible to enforce speciﬁc rule such asDo not ﬁll the eyes to ensure that random games

will terminate [17]. For lots of others however, such as Shogi [81] or Lines of Actions [84], speciﬁc rules or knowledge have to be used to ensure the termination of the random games.

3.1.3 N-armed bandit heuristics and progressive pruning for

ドキュメント内 Monte-Carlo Techniques: Applications to the Game of the Amazons (ページ 30-33)