Part III - Advanced Topics

Chapter 12 - Game Playing

[Missing text: pages 306-307]

Thus it is clear that a program that simply does a straightforward search of the game tree will not be able to select even its first move during the lifetime of its opponent. Some kind of heuristic search procedure is necessary.

One way of looking at all the search procedures we have discussed is that they are essentially generate-and-test procedures in which the testing is done after varying amounts of work by the generator. At one extreme, the generator generates entire proposed solutions, which the tester then evaluates. At the other extreme, the generator generates individual moves in the search space, each of which is then evaluated by the tester and the most promising one is chosen. Looked at this way, it is clear that to improve the effectiveness of a search-based problem-solving program two things can be done:

• Improve the generate procedure so that only good moves (or paths) are generated.
• Improve the test procedure so that the best moves (or paths) will be recognized and explored first.

In game-playing programs, it is particularly important that both these things be done. Consider again the problem of playing chess. On the average, there are about 35 legal moves available at each turn. If we use a simple legal-move generator, then the test procedure (which probably uses some combination of search and a heuristic evaluation function) will have to look at each of them. Because the test procedure must look at so many possibilities, it must be fast. So it probably cannot do a very accurate job. Suppose, on the other hand, that instead of a legal-move generator, we use a plausible-move generator in which only some small number of promising moves are generated. As the number of legal moves available increases, it becomes increasingly important to apply heuristics to select only those that have some kind of promise. (So, for example, it is extremely important in programs that play the game of go [Benson et al., 1979].) With a more selective move generator, the test procedure can afford to spend more time evaluating each of the moves it is given so it can produce a more reliable result. Thus by incorporating heuristic knowledge into both the generator and the tester, the performance of the overall system can be improved.

Of course, in game playing, as in other problem domains, search is not the only available technique. In some games, there are at least some times when more direct techniques are appropriate. For example, in chess, both openings and endgames are often highly stylized, so they are best played by table lookup into a database of stored patterns. To play an entire game then, we need to combine search-oriented and nonsearch-oriented techniques.

The ideal way to use a search procedure to find a solution to a problem is to generate moves through the problem space until a goal state is reached. In the context of game-playing programs, a goal state is one in which we win. Unfortunately, for interesting games such as chess, it is not usually possible, even with a good plausible-move generator, to search until a goal state is found. The depth of the resulting tree (or graph) and its branching factor are too great. In the amount of time available, it is usually possible to search a tree only ten or twenty moves (called ply in the game-playing literature) deep. Then, in order to choose the best move, the resulting board positions must be compared to discover which is most advantageous. This is done using a static evaluation function, which uses whatever information it has to evaluate individual board positions by estimating how likely they are to lead eventually to a win. Its function is similar to that of the heuristic function h' in the A* algorithm: in the absence of complete information, choose the most promising position. Of course, the static evaluation function could simply be applied directly to the positions generated by the proposed moves. But since it is hard to produce a function like this that is very accurate, it is better to apply it as many levels down in the game tree as time permits.

A lot of work in game-playing programs has gone into the development of good static evaluation functions. [Footnote: See Berliner [1979b] for a discussion of some theoretical issues in the design of static evaluation functions.] A very simple static evaluation function for chess based on piece advantage was proposed by Turing: simply add the values of black's pieces (B), the values of white's pieces (W), and then compute the quotient W/B. A more sophisticated approach was that taken in Samuel's checkers program, in which the static evaluation function was a linear combination of several simple functions, each of which appeared as though it might be significant. Samuel's functions included, in addition to the obvious one, piece advantage, such things as capability for advancement, control of the center, threat of a fork, and mobility. These factors were then combined by attaching to each an appropriate weight and then adding the terms together. Thus the complete evaluation function had the form:

c1 × piece advantage + c2 × advancement + c3 × center control + ...

There were also some nonlinear terms reflecting combinations of these factors. But Samuel did not know the correct weights to assign to each of the components. So he employed a simple learning mechanism in which components that had suggested moves that turned out to lead to wins were given an increased weight, while the weights of those that had led to losses were decreased.
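As an illustration (not Samuel's actual code), an evaluation function of this form, together with a crude version of the weight-adjustment idea, might be sketched as follows in Python; the feature names, weights, and update rule are hypothetical:

    # A sketch of a Samuel-style linear static evaluation function.
    # The features, weights, and update rule below are hypothetical.
    WEIGHTS = {"piece_advantage": 1.0, "advancement": 0.4, "center_control": 0.3}

    def static_eval(features, weights=WEIGHTS):
        """Return the weighted sum c1*f1 + c2*f2 + ... for one board position."""
        return sum(weights[name] * value for name, value in features.items())

    def adjust_weights(weights, features, won, rate=0.05):
        """Crudely reward the features of a won game and penalize those of a lost one."""
        sign = 1 if won else -1
        for name, value in features.items():
            weights[name] += sign * rate * value
        return weights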

Unfortunately, deciding which moves have contributed to wins and which to losses is not always easy. Suppose we make a very bad move, but then, because the opponent makes a mistake, we ultimately win the game. We would not like to give credit for winning to our mistake. The problem of deciding which of a series of actions is actually responsible for a particular outcome is called the credit assignment problem [Minsky, 1963]. It plagues many learning mechanisms, not just those involving games. Despite this and other problems, though, Samuel's checkers program was eventually able to beat its creator. The techniques it used to acquire this performance are discussed in more detail in Chapter 17.

We have now discussed the two important knowledge-based components of a good game-playing program: a good plausible-move generator and a good static evaluation function. They must both incorporate a great deal of knowledge about the particular game being played. But unless these functions are perfect, we also need a search procedure that makes it possible to look ahead as many moves as possible to see what may occur. Of course, as in other problem-solving domains, the role of search can be altered considerably by altering the amount of knowledge that is available to it. But, so far at least, programs that play nontrivial games rely heavily on search.

What search strategy should we use then? For a simple one-person game or puzzle, the A* algorithm described in Chapter 3 can be used. It can be applied to reason forward from the current state as far as possible in the time allowed. The heuristic function h' can be applied at terminal nodes and used to propagate values back up the search graph so that the best next move can be chosen. But because of their adversarial nature, this procedure is inadequate for two-person games such as chess. As values are passed back up, different assumptions must be made at levels where the program chooses the move and at the alternating levels where the opponent chooses. There are several ways that this can be done. The most commonly used method is the minimax procedure, which is described in the next section. An alternative approach is the B* algorithm [Berliner, 1979a], which works on both standard problem-solving trees and on game trees.

Figure 12.1: One-Ply Search

12.2 The Minimax Search Procedure

The minimax search procedure is a depth-first, depth-limited search procedure. It was described briefly in Section 1.3.1. The idea is to start at the current position and use the plausible-move generator to generate the set of possible successor positions. Now we can apply the static evaluation function to those positions and simply choose the best one. After doing so, we can back that value up to the starting position to represent our evaluation of it. The starting position is exactly as good for us as the position generated by the best move we can make next. Here we assume that the static evaluation function returns large values to indicate good situations for us, so our goal is to maximize the value of the static evaluation function of the next board position.

An example of this operation is shown in Figure 12.1. It assumes a static evaluation function that returns values ranging from -10 to 10, with 10 indicating a win for us, -10 a win for the opponent, and 0 an even match. Since our goal is to maximize the value of the heuristic function, we choose to move to B. Backing B's value up to A, we can conclude that A's value is 8, since we know we can move to a position with a value of 8.

But since we know that the static evaluation function is not completely accurate, we would like to carry the search farther ahead than one ply. This could be very important, for example, in a chess game in which we are in the middle of a piece exchange. After our move, the situation would appear to be very good, but, if we look one move ahead, we will see that one of our pieces also gets captured and so the situation is not as favorable as it seemed. So we would like to look ahead to see what will happen to each of the new game positions at the next move, which will be made by the opponent. Instead of applying the static evaluation function to each of the positions that we just generated, we apply the plausible-move generator, generating a set of successor positions for each position. If we wanted to stop here, at two-ply lookahead, we could apply the static evaluation function to each of these positions, as shown in Figure 12.2.

Figure 12.2: Two-Ply Search

Figure 12.3: Backing Up the Values of a Two-Ply Search

But now we must take into account that the opponent gets to choose which successor moves to make and thus which terminal value should be backed up to the next level. Suppose we made move B. Then the opponent must choose among moves E, F, and G. The opponent's goal is to minimize the value of the evaluation function, so he or she can be expected to choose move F. This means that if we make move B, the actual position in which we will end up one move later is very bad for us. This is true even though a possible configuration is that represented by node E, which is very good for us. But since at this level we are not the ones to move, we will not get to choose it. Figure 12.3 shows the result of propagating the new values up the tree. At the level representing the opponent's choice, the minimum value was chosen and backed up. At the level representing our choice, the maximum value was chosen.

Once the values from the second ply are backed up, it becomes clear that the correct move for us to make at the first level, given the information we have available, is C, since there is nothing the opponent can do from there to produce a value worse than -2. This process can be repeated for as many ply as time allows, and the more accurate evaluations that are produced can be used to choose the correct move at the top level. The alternation of maximizing and minimizing at alternate ply when evaluations are being pushed back up corresponds to the opposing strategies of the two players and gives this method the name minimax.
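To make the backing-up step concrete, the following small Python sketch repeats the computation on made-up leaf values (not those of Figure 12.3, though chosen so that move C again has the best worst case, -2):

    # The opponent minimizes over the second-ply values; we maximize over
    # those backed-up minima. The leaf values below are illustrative only.
    leaf_values = {
        "B": [8, -6, 3],
        "C": [0, -2, 5],
        "D": [-3, -9, 1],
    }
    backed_up = {move: min(values) for move, values in leaf_values.items()}
    best_move = max(backed_up, key=backed_up.get)
    print(backed_up)    # {'B': -6, 'C': -2, 'D': -9}
    print(best_move)    # C: its worst case, -2, is the best we can guarantee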

Having described informally the operation of the minimax procedure, we now describe it precisely. It is a straightforward recursive procedure that relies on two auxiliary procedures that are specific to the game being played:

• MOVE-GEN(Position, Player), the plausible-move generator, which returns a list of nodes representing the moves that Player can make from Position.
• STATIC(Position, Player), the static evaluation function, which returns a number representing the goodness of Position from the standpoint of Player.

As with any recursive program, a critical issue in the design of the MINIMAX procedure is when to stop the recursion and simply call the static evaluation function. There are a variety of factors that may influence this decision. They include:

• Has one side won?
• How many ply have we already explored?
• How promising is this path?
• How much time is left?
• How stable is the configuration?

For the general MINIMAX procedure discussed here, we appeal to a function, DEEP-ENOUGH, which is assumed to evaluate all of these factors and to return TRUE if the search should be stopped at the current level and FALSE otherwise. Our simple implementation of DEEP-ENOUGH will take two parameters, Position and Depth. It will ignore its Position parameter and simply return TRUE if its Depth parameter exceeds a constant cutoff value.
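In code, this simple version of DEEP-ENOUGH might look like the following sketch; the cutoff constant is hypothetical:

    # A sketch of the simple DEEP-ENOUGH described above.
    MAX_DEPTH = 4    # hypothetical cutoff value

    def deep_enough(position, depth):
        """Ignore the position and stop once the depth limit is exceeded."""
        return depth > MAX_DEPTH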

One problem that arises in defining MINIMAX as a recursive procedure is that it needs to return not one but two results:

• The backed-up value of the path it chooses.
• The path itself. We return the entire path even though probably only the first element, representing the best move from the current position, is actually needed.

We assume that MINIMAX returns a structure containing both results and that we have two functions, VALUE and PATH, that extract the separate components.

Since we define the MINIMAX procedure as a recursive function, we must also specify how it is to be called initially. It takes three parameters: a board position, the current depth of the search, and the player to move. So the initial call to compute the best move from the position CURRENT should be

MINIMAX(CURRENT, 0, PLAYER-ONE)

if PLAYER-ONE is to move, or

MINIMAX(CURRENT, 0, PLAYER-TWO)

if PLAYER-TWO is to move.

    Algorithm: MINIMAX(Position, Depth, Player)

    1. If DEEP-ENOUGH(Position, Depth), then return the structure
      • VALUE = STATIC(Position, Player);
      • PATH = nil

      This indicates that there is no path from this node and that its value is that determined by the static evaluation function.

    2. Otherwise, generate one more ply of the tree by calling the function MOVE-GEN(Position, Player) and setting SUCCESSORS to the list it returns.
    3. If SUCCESSORS is empty, then there are no moves to be made, so return the same structure that would have been returned if DEEP-ENOUGH had returned TRUE.
    4. If SUCCESSORS is not empty, then examine each element in turn and keep track of the best one. This is done as follows.

      Initialize BEST-SCORE to the minimum value that STATIC can return. It will be updated to reflect the best score that can be achieved by an element of SUCCESSORS.

      For each element SUCC of SUCCESSORS, do the following:

      1. Set RESULT-SUCC to
        • MINIMAX(SUCC, Depth + 1, OPPOSITE(Player))
        This recursive call to MINIMAX will actually carry out the exploration of SUCC.
      2. Set NEW-VALUE to -VALUE(RESULT-SUCC). This will cause it to reflect the merits of the position from the opposite perspective from that of the next lower level.
      3. If NEW-VALUE > BEST-SCORE, then we have found a successor that is better than any that have been examined so far. Record this by doing the following:
        1. Set BEST-SCORE to NEW-VALUE.
        2. The best known path is now from CURRENT to SUCC and then on to the appropriate path down from SUCC as determined by the recursive call to MINIMAX. So set BEST-PATH to the result of attaching SUCC to the front of PATH(RESULT-SUCC).
    5. Now that all the successors have been examined, we know the value of Position as well as which path to take from it. So return the structure
      • VALUE = BEST-SCORE
      • PATH = BEST-PATH

    When the initial call to MINIMAX returns, the best move from CURRENT is the first element on PATH. To see how this procedure works, you should trace its execution for the game tree shown in Figure 12.2.
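The procedure translates almost directly into code. The Python sketch below is one possible rendering; move_gen, static, opposite, and deep_enough stand for the game-specific procedures described above and are assumptions of the sketch:

    # A negamax-style sketch of the MINIMAX procedure above.
    def minimax(position, depth, player, move_gen, static, opposite, deep_enough):
        """Return (value, path), where path is the best line of play found."""
        if deep_enough(position, depth):
            return static(position, player), []
        successors = move_gen(position, player)
        if not successors:
            return static(position, player), []
        best_score = float("-inf")    # the minimum value STATIC can return
        best_path = []
        for succ in successors:
            value, path = minimax(succ, depth + 1, opposite(player),
                                  move_gen, static, opposite, deep_enough)
            new_value = -value        # flip the sign to take this level's perspective
            if new_value > best_score:
                best_score = new_value
                best_path = [succ] + path
        return best_score, best_path

Negating each value as it is passed back up is what allows the same code to serve both the maximizing and the minimizing player.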

    The MINIMAX procedure just described is very simple. But its performance can be improved significantly with a few refinements. Some of these are described in the next few sections.

    12.3 Adding Alpha-Beta Cutoffs

Recall that the minimax procedure is a depth-first process. One path is explored as far as time allows, the static evaluation function is applied to the game positions at the last step of the path, and the value can then be passed up the path one level at a time. One of the good things about depth-first procedures is that their efficiency can often be improved by using branch-and-bound techniques in which partial solutions that are clearly worse than known solutions can be abandoned early. We described a straightforward application of this technique to the traveling salesman problem in Section 2.2.1. For that problem, all that was required was storage of the length of the best path found so far. If a later partial path outgrew that bound, it was abandoned. But just as it was necessary to modify our search procedure slightly to handle both maximizing and minimizing players, it is also necessary to modify the branch-and-bound strategy to include two bounds, one for each of the players. This modified strategy is called alpha-beta pruning. It requires the maintenance of two threshold values, one representing a lower bound on the value that a maximizing node may ultimately be assigned (we call this alpha) and another representing an upper bound on the value that a minimizing node may be assigned (this we call beta).

    To see how the alpha-beta procedure works, consider the example shown in Figure 12.4. [Footnote: In this figure, we return to use of a single STATIC function from the point of view of the maximizing player.] After examining node F, we know that the opponent is guaranteed a score of -5 or less at C (since the opponent is the minimizing player). But we also know that we are guaranteed a score of 3 or greater at node A, which we can achieve if we move to B. Any other move that produces a score of less than 3 is worse than the move to B, and we can ignore it. After examining only F, we are sure that a move to C is worse (it will be less than or equal to -5) regardless of the score of node G. Thus we need not bother to explore node G at all. Of course, cutting out one node may not appear to justify the expense of keeping track of the limits and checking them, but if we were exploring this tree to six ply, then we would have eliminated not a single node but an entire tree three ply deep.

Figure 12.4: An Alpha Cutoff

To see how the two thresholds, alpha and beta, can both be used, consider the example shown in Figure 12.5. In searching this tree, the entire subtree headed by B is searched, and we discover that at A we can expect a score of at least 3. When this alpha value is passed down to F, it will enable us to skip the exploration of L. Let's see why. After K is examined, we see that I is guaranteed a maximum score of 0, which means that the value F can obtain by moving to I is at most 0. But this is less than alpha's value of 3, so no more branches of I need be considered. The maximizing player already knows not to choose to move to C and then to I since, if that move is made, the resulting score will be no better than 0 and a score of 3 can be achieved by moving to B instead. Now let's see how the value of beta can be used. After cutting off further exploration of I, J is examined, yielding a value of 5, which is assigned as the value of F (since it is the maximum of 5 and 0). This value becomes the value of beta at node C. It indicates that C is guaranteed to get a 5 or less. Now we must expand G. First M is examined and it has a value of 7, which is passed back to G as its tentative value. But now 7 is compared to beta (5). It is greater, and the player whose turn it is at node C is trying to minimize. So this player will not choose G, which would lead to a score of at least 7, since there is an alternative move to F which will lead to a score of 5. Thus it is not necessary to explore any of the other branches of G.

From this example, we see that at maximizing levels, we can rule out a move early if it becomes clear that its value will be less than the current threshold, while at minimizing levels, search will be terminated if values that are greater than the current threshold are discovered. But ruling out a possible move by a maximizing player actually means cutting off the search at a minimizing level. Look again at the example in Figure 12.4. Once we determine that C is a bad move from A, we need not bother to explore G, or any other paths below C, at that minimizing level. So the way alpha and beta are actually used is that search at a minimizing level can be terminated when a value less than alpha is discovered, while a search at a maximizing level can be terminated when a value greater than beta has been found. Cutting off search at a maximizing level when a high value is found may seem counterintuitive at first, but if you keep in mind that we only get to a particular node at a maximizing level if the minimizing player at the level above chooses it, then it makes sense.

Figure 12.5: Alpha and Beta Cutoffs

Having illustrated the operation of alpha-beta pruning with examples, we can now explore how the MINIMAX procedure described in Section 12.2 can be modified to exploit this technique. Notice that at maximizing levels, only beta is used to determine whether to cut off the search, and at minimizing levels only alpha is used. But at maximizing levels alpha must also be known, since when a recursive call is made to MINIMAX, a minimizing level is created, which needs access to alpha. So at maximizing levels alpha must be known not so that it can be used but so that it can be passed down the tree. The same is true of minimizing levels with respect to beta. Each level must receive both values, one to use and one to pass down for the next level to use.

The MINIMAX procedure as it stands does not need to treat maximizing and minimizing levels differently since it simply negates evaluations each time it changes levels. It would be nice if a comparable technique for handling alpha and beta could be found so that it would still not be necessary to write separate procedures for the two players. This turns out to be easy to do. Instead of referring to alpha and beta, MINIMAX uses two values, USE-THRESH and PASS-THRESH. USE-THRESH is used to compute cutoffs. PASS-THRESH is merely passed to the next level as its USE-THRESH. Of course, USE-THRESH must also be passed to the next level, but it will be passed as PASS-THRESH so that it can be passed to the third level down as USE-THRESH again, and so forth. Just as values had to be negated each time they were passed across levels, so too must these thresholds be negated. This is necessary so that, regardless of the level of the search, a test for greater than will determine whether a threshold has been crossed. Now there need still be no difference between the code required at maximizing levels and that required at minimizing ones.

We have now described how alpha and beta values are passed down the tree. In addition, we must decide how they are to be set. To see how to do this, let's return first to the simple example of Figure 12.4. At a maximizing level, such as that of node A, alpha is set to be the value of the best successor that has yet been found. (Notice that although at maximizing levels it is beta that is used to determine cutoffs, it is alpha whose new value can be computed. Thus at any level, USE-THRESH will be checked for cutoffs and PASS-THRESH will be updated to be used later.) But if the maximizing node is not at the top of the tree, we must also consider the alpha value that was passed down from a higher node. To see how this works, look again at Figure 12.5 and consider what happens at node F. We assign the value 0 to node I on the basis of examining node K. This is so far the best successor of F. But from an earlier exploration of the subtree headed by B, alpha was set to 3 and passed down from A to F. Alpha should not be reset to 0 on the basis of node I. It should stay as 3 to reflect the best move found so far in the entire tree. Thus we see that at a maximizing level, alpha should be set to either the value it had at the next-highest maximizing level or the best value found at this level, whichever is greater. The corresponding statement can be made about beta at minimizing levels. In fact, what we want to say is that at any level, PASS-THRESH should always be the maximum of the value it inherits from above and the best move found at its level. If PASS-THRESH is updated, the new value should be propagated both down to lower levels and back up to higher ones so that it always reflects the best move found anywhere in the tree.

At this point, we notice that we are doing the same thing in computing PASS-THRESH that we did in MINIMAX to compute BEST-SCORE. We might as well eliminate BEST-SCORE and let PASS-THRESH serve in its place.
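Anticipating the algorithm stated below, these observations can be collected into a negamax-style sketch of the pruning procedure; as before, move_gen, static, opposite, and deep_enough are hypothetical game-specific helpers:

    # A sketch of MINIMAX with alpha-beta cutoffs, using the USE-THRESH /
    # PASS-THRESH scheme described in the text.
    def minimax_a_b(position, depth, player, use_thresh, pass_thresh,
                    move_gen, static, opposite, deep_enough):
        """Return (value, path); stop a level once pass_thresh crosses use_thresh."""
        if deep_enough(position, depth):
            return static(position, player), []
        successors = move_gen(position, player)
        if not successors:
            return static(position, player), []
        best_path = []
        for succ in successors:
            value, path = minimax_a_b(succ, depth + 1, opposite(player),
                                      -pass_thresh, -use_thresh,
                                      move_gen, static, opposite, deep_enough)
            new_value = -value                 # flip perspective, as in MINIMAX
            if new_value > pass_thresh:        # a better move at this level
                pass_thresh = new_value
                best_path = [succ] + path
            if pass_thresh >= use_thresh:      # threshold crossed: prune the rest
                break
        return pass_thresh, best_path

Note how the two thresholds are negated and swapped on each recursive call, exactly as described above.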

With these observations, we are in a position to describe the operation of the function MINIMAX-A-B, which requires five arguments: Position, Depth, Player, Use-Thresh, and Pass-Thresh. The initial call, to choose a move for PLAYER-ONE from the position CURRENT, should be

MINIMAX-A-B(CURRENT,
            0,
            PLAYER-ONE,
            maximum value STATIC can compute,
            minimum value STATIC can compute)

    These initial values for Use-Thresh and Pass-Thresh represent the worst values that each side could achieve.

    Algorithm: MINIMAX-A-B(Position, Depth, Player, Use-Thresh, Pass-Thresh)

    1. If DEEP-ENOUGH(Position, Depth), then return the structure
      • VALUE = STATIC(Position, Player);
      • PATH = nil
    2. Otherwise, generate one more ply of the tree by calling the function MOVE-GEN(Position, Player) and setting SUCCESSORS to the list it returns.
    3. If SUCCESSORS is empty, there are no moves to be made; return the same structure that would have been returned if DEEP-ENOUGH had returned TRUE.
    4. If SUCCESSORS is not empty, then go through it, examining each element and keeping track of the best one. This is done as follows.

      For each element SUCC of SUCCESSORS:

[Missing text: pages 318-319]

    Figure 12.8: The Beginning of an Exchange

    Figure 12.9: The Situation Calms Down

    Waiting for quiescence helps in avoiding the horizon effect, in which an inevitable bad event can be delayed by various tactics until it does not appear in the portion of the game tree that minimax explores. The horizon effect can also influence a program's perception of good moves. The effect may make a move look good despite the fact that the move might be better if delayed past the horizon. Even with quiescence, all fixed-depth search programs are subject to subtle horizon effects.
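One way to build this idea into the cutoff test of Section 12.2 is sketched below; the is_quiet test (for example, no capture or check is pending) and the depth limits are hypothetical:

    # A sketch of a quiescence-aware cutoff test.
    def deep_enough_quiescent(position, depth, is_quiet, max_depth=4, hard_limit=8):
        """Stop at the nominal limit only if the position is quiet; never pass the hard limit."""
        if depth >= hard_limit:
            return True
        return depth >= max_depth and is_quiet(position)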

    12.4.2 Secondary Search

One good way of combating the horizon effect is to double-check a chosen move to make sure that a hidden pitfall does not exist a few moves farther away than the original search explored. Suppose we explore a game tree to an average depth of six ply and, on the basis of that search, choose a particular move. Although it would have been too expensive to have searched the entire tree to a depth of eight, it is not very expensive to search the single chosen branch an additional two levels to make sure that it still looks good. This technique is called secondary search.

One particularly successful form of secondary search is called singular extensions. The idea behind singular extensions is that if a leaf node is judged to be far superior to its siblings and if the value of the entire search depends critically on the correctness of that node's value, then the node is expanded one extra ply. This technique allows the search program to concentrate on tactical, forcing combinations. It employs a purely syntactic criterion, choosing interesting lines of play without recourse to any additional domain knowledge. The DEEP THOUGHT chess computer [Anantharaman et al., 1990] has used singular extensions to great advantage, finding midgame mating combinations as long as thirty-seven moves, an impossible feat for fixed-depth minimax.

    12.4.3 Using Book Moves

For complicated games taken as wholes, it is, of course, not feasible to select a move by simply looking up the current game configuration in a catalogue and extracting the correct move. The catalogue would be immense and no one knows how to construct it. But for some segments of some games, this approach is reasonable. In chess, for example, both opening sequences and endgame sequences are highly stylized. In these situations, the performance of a program can often be considerably enhanced if it is provided with a list of moves (called book moves) that should be made. The use of book moves in the opening sequences and endgames, combined with the use of the minimax search procedure for the midgame, provides a good example of the way that knowledge and search can be combined in a single program to produce more effective results than could either technique on its own.

    12.4.4 Alternatives to Minimax

Even with the refinements above, minimax still has some problematic aspects. For instance, it relies heavily on the assumption that the opponent will always choose the optimal move. This assumption is acceptable in winning situations where a move that is guaranteed to be good for us can be found. But, as suggested in Berliner [1977], in a losing situation it might be better to take the risk that the opponent will make a mistake. Suppose we must choose between two moves, both of which, if the opponent plays perfectly, lead to situations that are very bad for us, but one is slightly less bad than the other. But further suppose that the less promising move could lead to a very good situation for us if the opponent makes a single mistake. Although the minimax procedure would choose the guaranteed bad move, we ought instead to choose the other one, which is probably slightly worse but possibly a lot better. A similar situation arises when one move appears to be only slightly more advantageous than another, assuming that the opponent plays perfectly. It might be better to choose the less advantageous move if it could lead to a significantly superior situation if the opponent makes a mistake. To make these decisions well, we must have access to a model of the individual opponent's playing style so that the likelihood of various mistakes can be estimated. But this is very hard to provide.

    As a mechanism for propagating estimates of position strengths up the game tree, minimax stands on shaky theoretical grounds. Nau [1980] and Pearl [1983] have demonstrated that for certain classes of game trees, e.g., uniform trees with random terminal values, the deeper the search, the poorer the result obtained by minimaxing.

This "pathological" behavior of amplifying error-prone heuristic estimates has not been observed in actual game-playing programs, however. It seems that game trees containing won positions and nonrandom distributions of heuristic estimates provide environments that are conducive to minimaxing.

    12.5 Iterative Deepening

A number of ideas for searching two-player game trees have led to new algorithms for single-agent heuristic search, of the type described in Chapter 3. One such idea is iterative deepening, originally used in a program called CHESS 4.5 [Slate and Atkin, 1977]. Rather than searching to a fixed depth in the game tree, CHESS 4.5 first searched only a single ply, applying its static evaluation function to the result of each of its possible moves. It then initiated a new minimax search, this time to a depth of two ply. This was followed by a three-ply search, then a four-ply search, etc. The name "iterative deepening" derives from the fact that on each iteration, the tree is searched one level deeper. Figure 12.10 depicts this process.

Figure 12.10: Iterative Deepening

On the face of it, this process seems wasteful. Why should we be interested in any iteration except the final one? There are several reasons. First, game-playing programs are subject to time constraints. For example, a chess program may be required to complete all its moves within two hours. Since it is impossible to know in advance how long a fixed-depth tree search will take (because of variations in pruning efficiency and the need for selective search), a program may find itself running out of time. With iterative deepening, the current search can be aborted at any time and the best move found by the previous iteration can be played. Perhaps more importantly, previous iterations can provide invaluable move-ordering constraints. If one move was judged to be superior to its siblings in a previous iteration, it can be searched first in the next iteration. With effective ordering, the alpha-beta procedure can prune many more branches, and total search time can be decreased drastically. This allows more time for deeper iterations.
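The control loop itself is simple. In the sketch below, choose_move(position, depth) stands for a hypothetical fixed-depth search (say, minimax with alpha-beta), and the time budget is illustrative; a real program would also abort an iteration that is still running when time runs out:

    # A sketch of iterative deepening under a time budget.
    import time

    def iterative_deepening(position, choose_move, time_budget=5.0, max_depth=64):
        """Search 1 ply, then 2 ply, ..., and return the deepest completed result."""
        deadline = time.monotonic() + time_budget
        best_move = None
        for depth in range(1, max_depth + 1):
            if time.monotonic() >= deadline:   # out of time: play the last completed result
                break
            best_move = choose_move(position, depth)
        return best_move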

Years after CHESS 4.5's success with iterative deepening, it was noticed [Korf, 1985a] that the technique could also be applied effectively to single-agent search to solve problems like the 8-puzzle. In Section 2.2.1, we compared two types of uninformed search, depth-first search and breadth-first search. Depth-first search was efficient in terms of space but required some cutoff depth in order to force backtracking when a solution was not found. Breadth-first search was guaranteed to find the shortest solution path but required inordinate amounts of space because all leaf nodes had to be kept in memory. An algorithm called depth-first iterative deepening (DFID) combines the best aspects of depth-first and breadth-first search.

    Algorithm: Depth-First Iterative Deepening

    1. Set SEARCH-DEPTH = 1.
    2. Conduct a depth-first search to a depth of SEARCH-DEPTH. If a solution path is found, then return it.
    3. Otherwise, increment SEARCH-DEPTH by 1 and go to step 2.
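A direct Python rendering of DFID might look like this; successors(state) and is_goal(state) are hypothetical problem-specific helpers, and the check that a successor does not already occur on the current path is a small optional refinement:

    # A sketch of depth-first iterative deepening (DFID).
    def dfid(start, successors, is_goal, max_depth=50):
        """Return the shortest solution path from start to a goal, or None."""
        def depth_limited(path, limit):
            state = path[-1]
            if is_goal(state):
                return path
            if limit == 0:
                return None
            for succ in successors(state):
                if succ not in path:               # avoid trivial cycles on this path
                    result = depth_limited(path + [succ], limit - 1)
                    if result is not None:
                        return result
            return None

        for depth in range(1, max_depth + 1):      # deepen by one level per iteration
            solution = depth_limited([start], depth)
            if solution is not None:
                return solution
        return None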

Clearly, DFID will find the shortest solution path to the goal state. Moreover, the maximum amount of memory used by DFID is proportional to the number of nodes in that solution path. The only disturbing fact is that all iterations but the final one are essentially wasted. However, this is not a serious problem. The reason is that most of the activity during any given iteration occurs at the leaf-node level. Assuming a complete tree, we see that there are at least as many leaf nodes at level n as there are nodes in all the previous levels combined. Thus, the work expended during the nth iteration is roughly equal to the work expended during all previous iterations. This means that DFID is only slower than depth-first search by a constant factor. The problem with depth-first search is that there is no way to know in advance how deep the solution lies in the search space. DFID avoids the problem of choosing cutoffs without sacrificing efficiency, and, in fact, DFID is the optimal algorithm (in terms of space and time) for uninformed search.

But what about informed, heuristic search? Iterative deepening can also be used to improve the performance of the A* search algorithm [Korf, 1985a]. Since the major practical difficulty with A* is the large amount of memory it requires to maintain the search node lists, iterative deepening can be of considerable service.

    Algorithm: Iterative-Deepening-A*

    1. Set THRESHOLD = the heuristic evaluation of the start state.
    2. Conduct a depth-first search, pruning any branch when its total cost function (g + h') exceeds THRESHOLD. [Footnote: Recall that g stands for the cost so far in reaching the current node, and h' stands for the heuristic estimate of the distance from the node to the goal.] If a solution path is found during the search, return it.
    3. Otherwise, increment THRESHOLD by the minimum amount it was exceeded during the previous step, and then go to Step 2.

Like A*, Iterative-Deepening-A* (IDA*) is guaranteed to find an optimal solution, provided that h' is an admissible heuristic. Because of its depth-first search technique, IDA* is very efficient with respect to space. IDA* was the first heuristic search algorithm to find optimal solution paths for the 15-puzzle (a 4x4 version of the 8-puzzle) within reasonable time and space constraints.
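A corresponding sketch of IDA* follows; here successors(state) is assumed to yield (next state, step cost) pairs and h(state) to compute the heuristic estimate h', both hypothetical problem-specific helpers:

    # A sketch of Iterative-Deepening-A* (IDA*).
    import math

    def ida_star(start, successors, h, is_goal):
        """Return an optimal solution path, provided h never overestimates."""
        def search(path, g, threshold):
            state = path[-1]
            f = g + h(state)
            if f > threshold:
                return f, None                     # prune; report the overshoot
            if is_goal(state):
                return f, path
            minimum = math.inf
            for succ, cost in successors(state):
                if succ not in path:
                    t, solution = search(path + [succ], g + cost, threshold)
                    if solution is not None:
                        return t, solution
                    minimum = min(minimum, t)
            return minimum, None

        threshold = h(start)                       # step 1: initial threshold
        while True:
            t, solution = search([start], 0, threshold)
            if solution is not None:
                return solution
            if math.isinf(t):
                return None                        # no solution exists
            threshold = t                          # step 3: raise by the minimum overshoot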

    12.6 References on Specific Games

In this chapter we have discussed search-based techniques for game playing. We discussed the basic minimax algorithm and then introduced a series of refinements to it. But even with these refinements, it is still difficult to build good programs to play difficult games. Every game, like every AI task, requires a careful combination of search and knowledge.

    Chess

    Research on computer chess actually predates the field we call artificial intelligence. Shannon [1950] was the first to propose a method for automating the game, and two early chess programs were written by Greenblatt et al. [1967] and Newell and Simon [1972].

Chess provides a well-defined laboratory for studying the trade-off between knowledge and search. The more knowledge a program has, the less searching it needs to do. On the other hand, the deeper the search, the less knowledge is required. Human chess players use a great deal of knowledge and very little search; they typically investigate only 100 branches or so in deciding a move. A computer, on the other hand, is capable of evaluating millions of branches. Its chess knowledge is usually limited to a static evaluation function. Deep-searching chess programs have been calibrated on exercise problems in the chess literature and have even discovered errors in the official human analyses of the problems.

A chess player, whether human or machine, carries a numerical rating that tells how well it has performed in competition with other players. This rating lets us evaluate in an absolute sense the relative trade-offs between search and knowledge in this domain. The recent trend in chess-playing programs is clearly away from knowledge and toward faster brute-force search. It turns out that deep, full-width search (with pruning) is sufficient for competing at very high levels of chess. Two examples of highly rated chess machines are HITECH [Berliner and Ebeling, 1989] and DEEP THOUGHT [Anantharaman et al., 1990], both of which have beaten human grandmasters and both of which use custom-built parallel hardware to speed up legal move generation and heuristic evaluation.

    Checkers

Work on computer checkers began with Samuel [1963]. Samuel's program had an interesting learning component which allowed its performance to improve with experience. Ultimately, the program was able to beat its author. We look more closely at the learning mechanisms used by Samuel in Chapter 17.

    Go

Go is a very difficult game to play by machine since the average branching factor of the game tree is very high. Brute-force search, therefore, is not as effective as it is in chess. Human go players make up for their inability to search deeply by using a great deal of knowledge about the game. It is probable that go-playing programs must also be knowledge-based, since today's brute-force programs cannot compete with humans. For a discussion of some of the issues involved, see Wilcox [1988].

    Backgammon

Unlike chess, checkers, and go, a backgammon program must choose its moves with incomplete information about what may happen. If all the possible dice rolls are considered, the number of alternatives at each level of the search is huge. With current computational power, it is impossible to search more than a few ply ahead. Such a search will not expose the strengths and weaknesses of complex blocking positions, so knowledge-intensive methods must be used. One program that uses such methods is BKG [Berliner, 1980]. BKG actually does no searching at all but relies instead on positional understanding and understanding of how its goals should change for various phases of play. Like its chess-playing cousins, BKG has reached high levels of play, even beating a human world champion in a short match.

NEUROGAMMON [Tesauro and Sejnowski, 1989] is another interesting backgammon program. It is based on a neural network model that learns from experience. NEUROGAMMON is one of the few competitive game-playing programs that relies heavily on automatic learning.

    Othello

Othello is a popular board game that is played on an 8x8 grid with bi-colored pieces. Although computer programs have already achieved world-championship level play [Rosenbloom, 1982; Lee and Mahajan, 1990], humans continue to study the game and international tournaments are held regularly. Computers are not permitted to compete in these tournaments, but it is believed that the best programs are stronger than the best humans. High-performance Othello programs rely on fast brute-force search and table lookup.

The Othello experience may shed some light on the future of computer chess. Will top human players in the future study chess games between World Champion computers in the same way that they study classic human grandmaster matches today? Perhaps it will turn out that the different search versus knowledge trade-offs made by humans and computers will make it impossible for either of them to benefit from the experiences of the other.

    Others

Levy [1988] contains a number of classic papers on computer game playing. The papers cover the games listed above as well as bridge, scrabble, dominoes, go-moku, hearts, and poker.

    12.7 Exercises

    1. Consider the following game tree in which static scores are all from the first player's point of view:

      Suppose the first player is the maximizing player. What move should be chosen?

    2. In the game tree shown in the previous problem, what nodes would not need to be examined using the alpha-beta pruning procedure?
    3. Why does the search in game-playing programs always proceed forward from the current position rather than backward from a goal state?
    4. Is the minimax procedure a depth-first or breadth-first search procedure?
    5. The minimax algorithm we have described searches a game tree. But for some games, it might be better to search a graph and to check, each time a position is generated, if it has been generated and evaluated before. Under what circumstances would this be a good idea? Modify the minimax procedure to do this.
    6. How would the minimax procedure have to be modified to be used by a program playing a three- or four-person game rather than a two-person one?
    7. In the context of the search procedure described in Section 12.3, does the ordering of the list of successor positions created by MOVE-GEN matter? Why or why not? If it does matter, how much does it matter (i.e., how much effort is reasonable for ordering it)?
    8. Implement the alpha-beta search procedure. Use it to play a simple game such as tic-tac-toe.
    9. Apply DFID to the water jug problem of Section 2.1.