Chapter 7 - Symbolic Reasoning under Uncertainty

So far, we have described techniques for reasoning with a complete, consistent, and unchanging model of the world. Unfortunately, in many problem domains it is not possible to create such models. In this chapter and the next, we explore techniques for solving problems with incomplete and uncertain models.

7.1 Introduction to Nonmonotonic Reasoning

In their book, The Web of Belief, Quine and Ullian [1978] provide an excellent discussion of techniques that can be used to reason effectively even when a complete, consistent, and constant model of the world is not available. One of their examples, which we call the ABC Murder story, clearly illustrates many of the main issues that such techniques must deal with. Quoting Quine and Ullian [1978]:

This story illustrates some of the problems posed by uncertain, fuzzy, and often changing knowledge. A variety of logical frameworks and computational methods have been proposed for handling such problems. In this chapter and the next, we discuss two approaches:

  • Nonmonotonic reasoning, in which the axioms and/or the rules of inference are extended to make it possible to reason with incomplete information.
  • Statistical reasoning, in which the representation is extended to allow some kind of numeric measure of certainty (rather than simply TRUE or FALSE) to be associated with each statement.

Other approaches to these issues have also been proposed and used in systems. For example, it is sometimes the case that there is not a single knowledge base that captures the beliefs of all the agents involved in solving a problem. This would happen in our murder scenario if we were to attempt to model the reasoning of Abbott, Babbitt, and Cabot, as well as that of the police investigator. To be able to do this reasoning, we would require a technique for maintaining several parallel belief spaces, each of which would correspond to the beliefs of one agent. Such techniques are complicated by the fact that the belief spaces of the various agents, although not identical, are sufficiently similar that it is unacceptably inefficient to represent them as completely separate knowledge bases. In Section 15.4.2 we return briefly to this issue. Meanwhile, in the rest of this chapter, we describe techniques for nonmonotonic reasoning.

Conventional reasoning systems, such as first-order predicate logic, are designed to work with information that has three important properties:

  • It is complete with respect to the domain of interest, so that all the facts that are necessary to solve a problem are present in the system or can be derived from those that are.
  • It is consistent.
  • The only way it can change is that new facts can be added as they become available. If these new facts are consistent with all the other facts that have already been asserted, then nothing will ever be retracted from the set of facts that are known to be true. This property is called monotonicity.

Unfortunately, if any of these properties is not satisfied, conventional logic-based reasoning systems become inadequate. Nonmonotonic reasoning systems, on the other hand, are designed to be able to solve problems in which all of these properties may be missing.

In order to do this, we must address several key issues, including the following.

  1. How can the knowledge base be extended to allow inferences to be made on the basis of lack of knowledge as well as on the presence of it? For example, we would like to be able to say things like, "If you have no reason to suspect that a particular person committed a crime, then assume he didn't," or "If you have no reason to believe that someone is not getting along with her relatives, then assume that the relatives will try to protect her." Specifically, we need to make clear the distinction between:

     • It is known that ¬P.
     • It is not known whether P.

    First-order predicate logic allows reasoning to be based on the first of these. We need an extended system that allows reasoning to be based on the second as well. In our new system, we call any inference that depends on the lack of some piece of knowledge a nonmonotonic inference. [Footnote: Recall that in Section 2.4, we also made a monotonic/nonmonotonic distinction. There the issue was classes of production systems. Although we are applying the distinction to different entities here, it is essentially the same distinction in both cases, since it distinguishes between systems that never shrink as a result of an action (monotonic ones) and ones that can (nonmonotonic ones).]

    Allowing such reasoning has a significant impact on a knowledge base. Nonmonotonic reasoning systems derive their name from the fact that because of inferences that depend on lack of knowledge, knowledge bases may not grow monotonically as new assertions are made. Adding a new assertion may invalidate an inference that depended on the absence of that assertion. First-order predicate logic systems, on the other hand, are monotonic in this respect. As new axioms are asserted, new wff's may become provable, but no old proofs ever become invalid.

    In other words, if some set of axioms T entails the truth of some statement w, then T combined with another set of axioms N also entails w. Because nonmonotonic reasoning does not share this property, it is also called defeasible: a nonmonotonic inference may be defeated (rendered invalid) by the addition of new information that violates assumptions that were made during the original reasoning process. It turns out, as we show below, that making this one change has a dramatic impact on the structure of the logical system itself. In particular, most of our ideas of what it means to find a proof will have to be reevaluated.

  2. How can the knowledge base be updated properly when a new fact is added to the system (or when an old one is removed)? In particular, in nonmonotonic systems, since the addition of a fact can cause previously discovered proofs to become invalid, how can those proofs, and all the conclusions that depend on them, be found? The usual solution to this problem is to keep track of proofs, which are often called justifications. This makes it possible to find all the justifications that depended on the absence of the new fact, and those proofs can be marked as invalid. Interestingly, such a recording mechanism also makes it possible to support conventional, monotonic reasoning in the case where axioms must occasionally be retracted to reflect changes in the world that is being modeled. For example, it may be the case that Abbott is in town this week and so is available to testify, but if we wait until next week, he may be out of town. As a result, when we discuss techniques for maintaining valid sets of justifications, we talk both about nonmonotonic reasoning and about monotonic reasoning in a changing world.
  3. How can knowledge be used to help resolve conflicts when there are several inconsistent nonmonotonic inferences that could be drawn? It turns out that when inferences can be based on the lack of knowledge as well as on its presence, contradictions are much more likely to occur than they were in conventional logical systems in which the only possible contradictions were those that depended on facts that were explicitly asserted to be true. In particular, in nonmonotonic systems, there are often portions of the knowledge base that are locally consistent but mutually (globally) inconsistent. As we show below, many techniques for reasoning nonmonotonically are able to define the alternatives that could be believed, but most of them provide no way to choose among the options when not all of them can be believed at once.

To do this, we require additional methods for resolving such conflicts in ways that are most appropriate for the particular problem that is being solved. For example, as soon as we conclude that Abbott, Babbitt, and Cabot all claim that they didn't commit a crime, yet we conclude that one of them must have since there's no one else who is believed to have had a motive, we have a contradiction, which we want to resolve in some particular way based on other knowledge that we have. In this case, for example, we choose to resolve the conflict by finding the person with the weakest alibi and believing that he committed the crime (which involves believing other things, such as that the chosen suspect lied).

The rest of this chapter is divided into five parts. In the first, we present several logical formalisms that provide mechanisms for performing nonmonotonic reasoning. In the last four, we discuss approaches to the implementation of such reasoning in problem-solving programs. For more detailed descriptions of many of these systems, see the papers in Ginsberg [1987].

7.2 Logics for Nonmonotonic Reasoning

Because monotonicity is fundamental to the definition of first-order predicate logic, we are forced to find some alternative to support nonmonotonic reasoning. In this section, we look at several formal approaches to doing this. We examine several because no single formalism with all the desired properties has yet emerged (although there are some attempts, e.g., Shoham [1987] and Konolige [1987], to present a unifying framework for these several theories). In particular, we would like to find a formalism that does all of the following things:

Figure 7.1: Models, Wff's, and Nonmonotonic Reasoning

As we examine each of the theories below, we need to evaluate how well they perform each of these tasks. For a more detailed discussion of these theories and some comparisons among them, see Reiter [1987a], Etherington [1988], and Genesereth and Nilsson [1987].

Before we go into specific theories in detail, let's consider Figure 7.1, which shows one way of visualizing how nonmonotonic reasoning works in all of them. The box labeled A corresponds to an original set of wff's. The large circle contains all the models of A. When we add some nonmonotonic reasoning capabilities to A, we get a new set of wff's, which we've labeled B. [Footnote: As we will see below, some techniques add inference rules, which then generate wff's, while others add wff's directly. We'll ignore that difference for the moment.] B (usually) contains more information than A does. As a result, fewer models satisfy B than A. The set of models corresponding to B is shown at the lower right of the large circle. Now suppose we add some new wff's (representing new information) to A. We represent A with these additions as the box C. A difficulty may arise, however, if the set of models corresponding to C is as shown in the smaller, interior circle, since it is disjoint with the models for B. In order to find a new set of models that satisfy C, we need to accept models that had previously been rejected. To do that, we need to eliminate the wff's that were responsible for those models being thrown away. This is the essence of nonmonotonic reasoning.

7.2.1 Default Reasoning

We want to use nonmonotonic reasoning to perform what is commonly called default reasoning. We want to draw conclusions based on what is most likely to be true. In this section, we discuss two approaches to doing this.

Nonmonotonic Logic

One system that provides a basis for default reasoning is Nonmonotonic Logic (NML) [McDermott and Doyle, 1980], in which the language of first-order predicate logic is augmented with a modal operator M, which can be read as "is consistent." For example, the formula

∀x, y : Related(x, y) ∧ M GetAlong(x, y) → WillDefend(x, y)

should be read as, "For all x and y, if x and y are related and if the fact that x gets along with y is consistent with everything else that is believed, then conclude that x will defend y."

Once we augment our theory to allow statements of this form, one important issue must be resolved if we want our theory to be even semidecidable. (Recall that even in a standard first-order theory, the question of theoremhood is undecidable, so semidecidability is the best we can hope for.) We must define what "is consistent" means. Because consistency in this system, as in first-order predicate logic, is undecidable, we need some approximation. The one that is usually used is the PROLOG notion of negation as failure, or some variant of it. In other words, to show that P is consistent, we attempt to prove ¬P. If we fail, then we assume ¬P to be false and we call P consistent. Unfortunately, this definition does not completely solve our problem. Negation as failure works in pure PROLOG because, if we restrict the rest of our language to Horn clauses, we have a decidable theory. So failure to prove something means that it is not entailed by our theory. If, on the other hand, we start with full first-order predicate logic as our base language, we have no such guarantee. So, as a practical matter, it may be necessary to define consistency on some heuristic basis, such as failure to prove inconsistency within some fixed level of effort.
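To make the heuristic in the last sentence concrete, here is a minimal sketch, in Python, of "consistent because we failed to prove the negation," with effort bounded by a depth limit. The rule format, the backward-chaining prover, and the depth bound are illustrative assumptions of ours, not part of NML itself.

    # Consistency as bounded failure to prove the negation (a sketch).
    def prove(kb, goal, depth=0, max_depth=5):
        """Backward-chain over rules of the form (premises, conclusion).
        Facts are rules with no premises; effort is capped by max_depth."""
        if depth > max_depth:
            return False                 # ran out of allotted effort
        for premises, conclusion in kb:
            if conclusion == goal and \
                    all(prove(kb, p, depth + 1, max_depth) for p in premises):
                return True
        return False

    def consistent(kb, p):
        # Heuristically call P consistent if we fail to prove not-P.
        return not prove(kb, ("not", p))

    kb = [([], "Related(Alice,Bea)")]    # one known fact

    # M GetAlong(Alice,Bea): no proof of its negation, so assume it.
    if prove(kb, "Related(Alice,Bea)") and consistent(kb, "GetAlong(Alice,Bea)"):
        print("conclude WillDefend(Alice,Bea)")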

A second problem that arises in this approach (and others, as we explain below) is what to do when multiple nonmonotonic statements, taken alone, suggest ways of augmenting our knowledge that if taken together would be inconsistent. For example, consider the following set of assertions:

∀x : Republican(x) ∧ M ¬Pacifist(x) → ¬Pacifist(x)

∀x : Quaker(x) ∧ M Pacifist(x) → Pacifist(x)

Republican(Dick)

Quaker(Dick)

The definition of NML that we have given supports two distinct ways of augmenting this knowledge base. In one, we first apply the first assertion, which allows us to conclude ¬Pacifist(Dick). Having done that, the second assertion cannot apply, since it is not consistent to assume Pacifist(Dick). The other thing we could do, however, is apply the second assertion first. This results in the conclusion Pacifist(Dick), which prevents the first one from applying. So what conclusion does the theory actually support?

The answer is that NML defines the set of theorems that can be derived from a set of wff's A to be the intersection of the sets of theorems that result from the various ways in which the wff's of A might be combined. So, in our example, no conclusion about Dick's pacifism can be derived. This theory thus takes a very conservative approach to theoremhood.
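This conservative, intersection-based notion of theoremhood can be simulated directly. The sketch below (our own simplification, treating the wff's as propositional default-style rules) applies the rules in every possible order, collects each resulting augmentation, and keeps only what is common to all of them.

    # NML theoremhood as the intersection of all augmentations (a sketch).
    from itertools import permutations

    # Each rule: (prerequisite, assumption that must be consistent, conclusion).
    rules = [
        ("Republican(Dick)", "not Pacifist(Dick)", "not Pacifist(Dick)"),
        ("Quaker(Dick)",     "Pacifist(Dick)",     "Pacifist(Dick)"),
    ]
    facts = {"Republican(Dick)", "Quaker(Dick)"}

    def negated(s):
        return s[4:] if s.startswith("not ") else "not " + s

    augmentations = []
    for order in permutations(rules):
        beliefs = set(facts)
        for pre, assume, concl in order:
            # Apply a rule only if its prerequisite holds and its
            # assumption is still consistent with current beliefs.
            if pre in beliefs and negated(assume) not in beliefs:
                beliefs.add(concl)
        augmentations.append(beliefs)

    print(set.intersection(*augmentations))
    # Only the two original facts survive; neither conclusion about
    # Dick's pacifism is in every augmentation, so neither is a theorem.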

It is worth pointing out here that although assertions such as the ones we used to reason about Dick's pacifism look like rules, they are, in this theory, just ordinary wff's which can be manipulated by the standard rules for combining logical expressions. So, for example, given

A ∧ M B → B

¬A ∧ M B → B

we can derive the expression

M B → B

In the original formulation of NML, the semantics of the modal operator M, which is self-referential, were unclear. A more recent system, Autoepistemic Logic [Moore, 1985], is very similar but solves some of these problems.

Default Logic

An alternative logic for performing default-based reasoning is Reiter's Default Logic (DL) [Reiter, 1980], in which a new class of inference rules is introduced. In this approach, we allow inference rules of the form [Footnote: Reiter's original notation had ":M" in place of ":", but since it conveys no additional information, the M is usually omitted.]

A : B / C

Such a rule should be read as, "If A is provable and it is consistent to assume B then conclude C." As you can see, this is very similar in intent to the nonmonotonic expressions that we used in NML. There are some important differences between the two theories, however. The first is that in DL the new inference rules are used as a basis for computing a set of plausible extensions to the knowledge base. Each extension corresponds to one maximal consistent augmentation of the knowledge base. [Footnote: What we mean by the expression "maximal consistent augmentation" is that no additional default rules can be applied without violating consistency. But it is important to note that only expressions generated by the application of the stated inference rules to the original knowledge are allowed in an extension. Gratuitous additions are not permitted.] The logic then admits as a theorem any expression that is valid in any extension. If a decision among the extensions is necessary to support problem solving, some other mechanism must be provided. So, for example, if we return to the case of Dick the Republican, we can compute two extensions, one corresponding to his being a pacifist and one corresponding to his not being a pacifist. The theory of DL does not say anything about how to choose between the two. But see Reiter and Criscuolo [1981], Touretzky [1986], and Rich [1983] for discussions of this issue.

A second important difference between these two theories is that, in DL, the nonmonotonic expressions are rules of inference rather than expressions in the language. Thus they cannot be manipulated by the other rules of inference. This leads to some unexpected results. For example, given the two rules

A : B / C        ¬A : B / C

and no assertion about A, no conclusion about B will be drawn, since neither inference rule applies.
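The contrast with NML is easy to see in code. In the sketch below (same propositional encoding as before, and ours rather than Reiter's), the DL rules only fire when their prerequisites are provable, so with nothing known about A, B is never concluded; the corresponding NML wff's, by contrast, could be combined to derive M B → B.

    # DL rules are inference rules: no provable prerequisite, no firing.
    defaults = [("A", "B", "B"), ("not A", "B", "B")]
    beliefs = set()                      # nothing is known about A

    for pre, just, concl in defaults:
        if pre in beliefs and ("not " + just) not in beliefs:
            beliefs.add(concl)

    print(beliefs)                       # empty: B is not concluded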

Abduction

Standard logic performs deduction. Given two axioms:

∀x : A(x) → B(x)

A(C)

we can conclude B(C) using deduction. But what about applying the implication in reverse? For example, suppose the axiom we have is

∀x : Measles(x) → Spots(x)

The axiom says that having measles implies having spots. But suppose we notice spots. We might like to conclude measles. Such a conclusion is not licensed by the rules of standard logic and it may be wrong, but it may be the best guess we can make about what is going on. Deriving conclusions in this way is thus another form of default reasoning. We call this specific form abductive reasoning. More precisely, the process of abductive reasoning can be described as, "Given two wff's (A → B) and (B), for any expressions A and B, if it is consistent to assume A, do so."
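A small sketch of this process (the rule list and the consistency test are illustrative assumptions of ours):

    # Abduction: hypothesize an antecedent that would explain an observation.
    rules = [("Measles(John)", "Spots(John)"),   # measles implies spots
             ("Allergy(John)", "Spots(John)")]   # so does an allergy

    def abduce(observation, beliefs):
        """Return the antecedents that could explain the observation
        and are consistent with what is currently believed."""
        return [ante for ante, conseq in rules
                if conseq == observation
                and ("not " + ante) not in beliefs]

    print(abduce("Spots(John)", beliefs={"Spots(John)"}))
    # ['Measles(John)', 'Allergy(John)'] -- either antecedent could have
    # produced the spots, which is why the certainty measures discussed
    # next are so often attached to abductive conclusions.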

In many domains, abductive reasoning is particularly useful if some measure of certainty is attached to the resulting expressions. These certainty measures quantify the risk that the abductive reasoning process is wrong, which it will be whenever there were other antecedents besides A that could have produced B. We discuss ways of doing this in Chapter 8.

Abductive reasoning is not a kind of logic in the sense that DL and NML are. In fact, it can be described in either of them. But it is a very useful kind of nonmonotonic reasoning, and so we mention it explicitly here.

Inheritance

One very common use of nonmonotonic reasoning is as a basis for inheriting attribute values from a prototype description of a class to the individual entities that belong to the class. We considered one example of this kind of reasoning in Chapter 4, when we discussed the baseball knowledge base. Recall that we presented there an algorithm for implementing inheritance. We can describe informally what that algorithm does by saying, "An object inherits attribute values from all the classes of which it is a member unless doing so leads to a contradiction, in which case a value from a more restricted class has precedence over a value from a broader class." Can the logical ideas we have just been discussing provide a basis for describing this idea more formally? The answer is yes. To see how, let's return to the baseball example (as shown in Figure 4.5) and try to write its inheritable knowledge as rules in DL.

We can write a rule to account for the inheritance of a default value for the height of a baseball player as:

Baseball-Player(x) : height(x, 6-1) / height(x, 6-1)

Now suppose we assert Pitcher(Three-Finger-Brown). Since this enables us to conclude that Three-Finger-Brown is a baseball player, our rule allows us to conclude that his height is 6-1. If, on the other hand, we had asserted a conflicting value for Three Finger's height, and if we had an axiom like

∀x, y, z : height(x, y) ∧ height(x, z) → y = z,

which prohibits someone from having more than one height, then we would not be able to apply the default rule. Thus an explicitly stated value will block the inheritance of a default value, which is exactly what we want. (We'll ignore here the order in which the assertions and the rules occur. As a logical framework, default logic does not care. We'll just assume that somehow it settles out to a consistent state in which no defaults that conflict with explicit assertions have been asserted. In Section 7.5.1 we look at issues that arise in creating an implementation that assures that.)

But now, let's encode the default rule for the height of adult males in general. If we pattern it after the one for baseball players, we get

Adult-Male(x) : height(x, 5-10) / height(x, 5-10)

Unfortunately, this rule does not work as we would like. In particular, if we again assert Pitcher(Three-Finger-Brown), then the resulting theory contains two extensions: one in which our first rule fires and Brown's height is 6-1 and one in which this new rule applies and Brown's height is 5-10. Neither of these extensions is preferred. In order to state that we prefer to get a value from the more specific category, baseball player, we could rewrite the default rule for adult males in general as:

Adult-Male(x) : ¬Baseball-Player(x) ∧ height(x, 5-10) / height(x, 5-10)

This effectively blocks the application of the default knowledge about adult males in the case that more specific information from the class of baseball players is available. Unfortunately, this approach can become unwieldy as the set of exceptions to the general rule increases. For example, we could end up with a rule like:

Adult-Male(x) : ¬Baseball-Player(x) ∧ ¬Midget(x) ∧ ¬Jockey(x) ∧ height(x, 5-10) / height(x, 5-10)

What we have done here is to clutter our knowledge about the general class of adult males with a list of all the known exceptions with respect to height. A clearer approach is to say something like, "Adult males typically have a height of 5-10 unless they are abnormal in some way." We can then associate with other classes the information that they are abnormal in one or another way. So we could write, for example:

∀x : Adult-Male(x) ∧ ¬AB(x, aspect1) → height(x, 5-10)

∀x : Baseball-Player(x) → AB(x, aspect1)

∀x : Midget(x) → AB(x, aspect1)

∀x : Jockey(x) → AB(x, aspect1)

Then, if we add the single default rule:

: ¬AB(x, y) / ¬AB(x, y)

we get the desired result.
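A rough procedural rendering of this scheme (the class hierarchy, its encoding, and the traversal are assumptions of ours, not the book's Figure 4.5 algorithm):

    # Default inheritance with abnormality: the most specific class that
    # carries a default height wins, because membership in it makes an
    # individual AB(x, aspect1) with respect to every broader default.
    default_height = {"Baseball-Player": "6-1", "Adult-Male": "5-10"}
    superclass = {"Pitcher": "Baseball-Player",
                  "Baseball-Player": "Adult-Male"}

    def classes_of(c):
        """Yield c and its superclasses, most specific first."""
        while c is not None:
            yield c
            c = superclass.get(c)

    def height(klass):
        for c in classes_of(klass):
            if c in default_height:      # first (most specific) default
                return default_height[c]
        return None

    print(height("Pitcher"))             # 6-1 for Three-Finger-Brown, not 5-10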

7.2.2 Minimalist Reasoning
So far, we have talked about general methods that provide ways of describing things that are generally true. In this section we describe methods for saying a very specific and highly useful class of things that are generally true. These methods are based on some variant of the idea of a minimal model. Recall from the beginning of this section that a model of a set of formulas is an interpretation that satisfies them. Although there are several distinct definitions of what constitutes a minimal model, for our purposes, we will define a model to be minimal if there are no other models in which fewer things are true. (As you can probably imagine, there are technical difficulties in making this precise, many of which involve the treatment of sentences with negation.) The idea behind using minimal models as a basis for nonmonotonic reasoning about the world is the following: "There are many fewer true statements than false ones. If something is true and relevant it makes sense to assume that it has been entered into our knowledge base. Therefore, assume that the only true statements are those that necessarily must be true in order to maintain the consistency of the knowledge base." We have already mentioned (in Section 6.2) one kind of reasoning based on this idea, the PROLOG concept of negation as failure, which provides an implementation of the idea for Horn clause-based systems. In the rest of this section we look at some logical issues that arise when we remove the Horn clause limitation.

The Closed World Assumption

A simple kind of minimalist reasoning is suggested by the Closed World Assumption or CWA [Reiter, 1978]. The CWA says that the only objects that satisfy any predicate P are those that must. The CWA is particularly powerful as a basis for reasoning with databases, which are assumed to be complete with respect to the properties they describe. For example, a personnel database can safely be assumed to list all of the company's employees. If someone asks whether Smith works for the company, we should reply "no" unless he is explicitly listed as an employee. Similarly, an airline database can be assumed to contain a complete list of all the routes flown by that airline. So if I ask if there is a direct flight from Oshkosh to El Paso, the answer should be "no" if none can be found in the database. The CWA is also useful as a way to deal with AB predicates, of the sort we introduced in Section 7.2.1, since we want to take as abnormal only those things that are asserted to be so.

Although the CWA is both simple and powerful, it can fail to produce an appropriate answer for either of two reasons. The first is that its assumptions are not always true in the world; some parts of the world are not realistically "closable." We saw this problem in the murder story example. There were facts that were relevant to the investigation that had not yet been uncovered and so were not present in the knowledge base. The CWA will yield appropriate results exactly to the extent that the assumption that all the relevant positive facts are present in the knowledge base is true.

The second kind of problem that plagues the CWA arises from the fact that it is a purely syntactic reasoning process. Thus, as you would expect, its results depend on the form of the assertions that are provided. Let's look at two specific examples of this problem.

Consider a knowledge base that consists of just a single statement:

A(Joe) ∨ B(Joe)

The CWA allows us to conclude both ¬A(Joe) and ¬B(Joe), since neither A nor B must necessarily be true of Joe. Unfortunately, the resulting extended knowledge base

A(Joe) ∨ B(Joe)

¬A(Joe)

¬B(Joe)

is inconsistent.

The problem is that we have assigned a special status to positive instances of predicates, as opposed to negative ones. Specifically, the CWA forces completion of a knowledge base by adding the negative assertion ¬P whenever it is consistent to do so. But the assignment of a real world property to some predicate P and its complement to the negation of P may be arbitrary. For example, suppose we define a predicate Single and create the following knowledge base:

Single(John)

Single(Mary)

Then, if we ask about Jane, the CWA will yield the answer ¬Single(Jane). But now suppose we had chosen instead to use the predicate Married rather than Single. Then the corresponding knowledge base would be

¬Married(John)

¬Married(Mary)

If we now ask about Jane, the CWA will yield the result ¬Married(Jane).
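The purely syntactic character of the CWA is easy to exhibit in code. Below is a small sketch (our formulation) of the CWA as completion: every ground atom that is not provable gets its negation added. Applied to the disjunctive knowledge base, the completion is inconsistent, just as described above.

    # The CWA as syntactic completion of a knowledge base (a sketch).
    kb = [("or", "A(Joe)", "B(Joe)")]    # A(Joe) v B(Joe)
    atoms = ["A(Joe)", "B(Joe)"]

    def provable(kb, atom):
        # Only atoms asserted outright count; a bare disjunction
        # proves neither of its disjuncts.
        return atom in kb

    completion = list(kb)
    for a in atoms:
        if not provable(kb, a):
            completion.append(("not", a))   # CWA: assume the negation

    print(completion)
    # [('or', 'A(Joe)', 'B(Joe)'), ('not', 'A(Joe)'), ('not', 'B(Joe)')]
    # -- unsatisfiable: the disjunction requires one disjunct to be true.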

Circumscription

Although the CWA captures part of the idea that anything that must not necessarily be true should be assumed to be false, it does not capture all of it. It has two essential limitations:

  • It operates on individual predicates without considering the interactions among predicates that are defined in the knowledge base. We saw an example of this above when we considered the statement A(Joe) ∨ B(Joe).
  • It assumes that all predicates have all of their instances listed. Although in many database applications this is true, in many knowledge-based systems it is not. Some predicates can reasonably be assumed to be completely defined (i.e., the part of the world they describe is closed), but others cannot (i.e., the part of the world they describe is open). For example, the predicate has-a-green-shirt should probably be considered open since in most situations it would not be safe to assume that one has been told all the details of everyone else's wardrobe.

Several theories of circumscription (e.g., McCarthy [1980], McCarthy [1986], and Lifschitz [1985]) have been proposed to deal with these problems. In all of these theories, new axioms are added to the existing knowledge base. The effect of these axioms is to force a minimal interpretation on a selected portion of the knowledge base. In particular, each specific axiom describes a way that the set of values for which a particular predicate of the original theory is true is to be delimited (i.e., circumscribed).

As an example, suppose we have the simple assertion

∀x : Adult(x) ∧ ¬AB(x, aspect1) → Literate(x)

We would like to circumscribe AB, since we would like it to apply only to those individuals to which it must apply. In essence, what we want to do is to say something about what the predicate AB must be (since at this point we have no idea what it is; all we know is its name). To know what it is, we need to know for what values it is true. Even though we may know a few values for which it is true (if any individuals have been asserted to be abnormal in this way), there are many different predicates that would be consistent with what we know so far. Imagine this universe of possible binary predicates. We might ask, which of these predicates could be AB? We want to say that AB can only be one of the predicates that is true only for those objects that we know it must be true for. We can do this by adding a (second order) axiom that says that AB is the smallest predicate that is consistent with our existing knowledge base.

In this simple example, circumscription yields the same result as does the CWA since there are no other assertions in the knowledge base with which a minimization of AB must be consistent. In both cases, the only models that are admitted are ones in which there are no individuals who are abnormal in aspect1. In other words, AB must be the predicate FALSE.

But, now let's return to the example knowledge base

A(Joe) ∨ B(Joe)

If we circumscribe only A, then this assertion describes exactly those models in which A is true of no one and B is true of at least Joe. Similarly, if we circumscribe only B, then we will accept exactly those models in which B is true of no one and A is true of at least Joe. If we circumscribe A and B together, then we will admit only those models in which A is true of only Joe and B is true of no one or those in which B is true of only Joe and A is true of no one. Thus, unlike the CWA, circumscription allows us to describe the logical relationship between A and B.
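Since circumscription is a statement about minimal models, it can be illustrated by brute-force model enumeration. The sketch below (ours; a one-element universe and a propositional treatment, nothing like a real circumscription prover) keeps exactly the models of A(Joe) ∨ B(Joe) in which the circumscribed predicates have minimal extensions.

    # Circumscription as minimal-model selection (a brute-force sketch).
    from itertools import product

    universe = [frozenset(), frozenset({"Joe"})]
    models = [(a, b) for a, b in product(universe, universe)
              if "Joe" in a or "Joe" in b]    # must satisfy A(Joe) v B(Joe)

    def circumscribe(models, keys):
        """Keep models not strictly dominated on the circumscribed
        predicates (pointwise subset comparison of their extensions)."""
        def smaller(m, n):
            return all(k(m) <= k(n) for k in keys) and \
                   any(k(m) <  k(n) for k in keys)
        return [m for m in models
                if not any(smaller(n, m) for n in models)]

    print(circumscribe(models, [lambda m: m[0]]))
    # circumscribing A alone: A true of no one, B true of (at least) Joe
    print(circumscribe(models, [lambda m: m[0], lambda m: m[1]]))
    # circumscribing both: A of only Joe and B of no one, or vice versa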

7.3 Implementation Issues

Although the logical frameworks that we have just discussed take us part of the way toward a basis for implementing nonmonotonic reasoning in problem-solving programs, they are not enough. As we have seen, they all have some weaknesses as logical systems. In addition, they fail to deal with four important problems that arise in real systems.

The first is how to derive exactly those nonmonotonic conclusions that are relevant to solving the problem at hand while not wasting time on those that, while they may be licensed by the logic, are not necessary and are not worth spending time on.

The second problem is how to update our knowledge incrementally as problem-solving progresses. The definitions of the logical systems tell us how to decide on the truth status of a proposition with respect to a given truth status of the rest of the knowledge base. Since the procedure for doing this is a global one (relying on some form of consistency or minimality), any change to the knowledge base may have far-reaching consequences. It would be computationally intractable to handle this problem by starting over with just the facts that are explicitly stated and reapplying the various nonmonotonic reasoning steps that were used before, this time deriving possibly different results.

The third problem is that in nonmonotonic reasoning systems, it often happens that more than one interpretation of the known facts is licensed by the available inference rules. In Reiter's terminology, a given nonmonotonic system may (and often does) have several extensions at the moment, even though many of them will eventually be eliminated as new knowledge becomes available. Thus some kind of search process is necessary. How should it be managed?

The final problem is that, in general, these theories are not computationally effective. None of them is decidable. Some are semidecidable, but only in their propositional forms. And none is efficient.

In the rest of this chapter, we discuss several computational solutions to these problems. In all of these systems, the reasoning process is separated into two parts: a problem solver that uses whatever mechanism it happens to have to draw conclusions as necessary and a truth maintenance system whose job is just to do the bookkeeping required to provide a solution to our second problem. The various logical issues we have been discussing, as well as the heuristic ones we have raised here, are issues in the design of the problem solver. We discuss these issues in Section 7.4. Then in the following sections, we describe techniques for tracking nonmonotonic inferences so that changes to the knowledge base are handled properly. Techniques for doing this can be divided into two classes, determined by their approach to the search control problem:

  • Depth-first, in which we follow a single, most likely path until some new piece of information comes in that forces us to give up this path and find another.
  • Breadth-first, in which we consider all the possibilities as equally likely. We consider them as a group, eliminating some of them as new facts become available. Eventually, it may happen that only one (or a small number) turn out to be consistent with everything we come to know.

It is important to keep in mind throughout the rest of this discussion that there is no exact correspondence between any of the logics that we have described and any of the implementations that we will present. Unfortunately, the details of how the two can be brought together are still unknown.

7.4 Augmenting a Problem Solver

So far, we have described a variety of logical formalisms, all of which describe the theorems that can be derived from a set of axioms. We have said nothing about how we might write a program that solves problems using those axioms. In this section, we do that.

As we have already discussed several times, problem solving can be done using either forward or backward reasoning. Problem solving using uncertain knowledge is no exception. As a result, there are two basic approaches to this kind of problem solving (as well as a variety of hybrids):

  • Reason forward from what is known. Treat nonmonotonically derivable conclusions the same way monotonically derivable ones are handled. Nonmonotonic reasoning systems that support this kind of reasoning allow standard forward-chaining rules to be augmented with unless clauses, which introduce a basis for reasoning by default. Control (including deciding which default interpretation to choose) is handled in the same way that all other control decisions in the system are made (whatever that may be, for example, via rule ordering or the use of metarules).
  • Reason backward to determine whether some expression P is true (or perhaps to find a set of bindings for its variables that make it true). Nonmonotonic reasoning systems that support this kind of reasoning may do either or both of the following two things:
    • Allow default (unless) clauses in backward rules. Resolve conflicts among defaults using the same control strategy that is used for other kinds of reasoning (usually rule ordering).
    • Support a kind of debate in which an attempt is made to construct arguments both in favor of P and opposed to it. Then some additional knowledge is applied to the arguments to determine which side has the stronger case.

Let's look at backward reasoning first. We will begin with the simple case of backward reasoning in which we attempt to prove (and possibly to find bindings for)

Figure 7.2: Backward Rules Using UNLESS

an expression P. Suppose that we have a knowledge base that consists of the backward rules shown in Figure 7.2.

Assume that the problem solver that is using this knowledge base uses the usual PROLOG-style control structure in which rules are matched top to bottom, left to right. Then if we ask the question ?Suspect(x), the program will first try Abbott, who is a fine suspect given what we know now, so it will return Abbott as its answer. If we had also included the facts

RegisteredHotel(Abbott, Albany)

FarAway(Albany)

then, the program would have failed to conclude that Abbott was a suspect and it would instead have located Babbitt.
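Since Figure 7.2 itself is not reproduced here, the following Python sketch is only a guess at its spirit: backward rules whose UNLESS clauses succeed exactly when their argument cannot be proved (negation as failure). The rule set and predicate names are illustrative, not necessarily the book's.

    # Backward chaining with UNLESS clauses (a sketch of Figure 7.2's idea).
    facts = {"Beneficiary(Abbott)", "Beneficiary(Babbitt)"}
    rules = {
        # goal: (subgoals, unless-clauses)
        "Suspect(Abbott)":  (["Beneficiary(Abbott)"],  ["Alibi(Abbott)"]),
        "Suspect(Babbitt)": (["Beneficiary(Babbitt)"], ["Alibi(Babbitt)"]),
        "Alibi(Abbott)":    (["RegisteredHotel(Abbott,Albany)",
                              "FarAway(Albany)"],
                             ["RegisterForged(Abbott)"]),
    }

    def prove(goal):
        if goal in facts:
            return True
        if goal in rules:
            subgoals, unless = rules[goal]
            return all(prove(g) for g in subgoals) and \
                   not any(prove(u) for u in unless)   # UNLESS: fail to prove
        return False

    print(prove("Suspect(Abbott)"))      # True with only the facts above
    facts |= {"RegisteredHotel(Abbott,Albany)", "FarAway(Albany)"}
    print(prove("Suspect(Abbott)"))      # False: the alibi now blocks it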

As an alternative to this approach, consider the idea of a debate. In debating systems, an attempt is made to find multiple answers. In the ABC Murder story case, for example, all three possible suspects would be considered. Then some attempt to choose among the arguments would be made. In this case, for example, we might want to have a choice rule that says that it is more likely that people will lie to defend themselves than to defend others. We might have a second rule that says that we prefer to believe hotel registers rather than people. Using these two rules, a problem solver would conclude that the most likely suspect is Cabot.

Backward rules work exactly as we have described if all of the required facts are present when the rules are invoked. But what if we begin with the situation shown in Figure 7.2 and conclude that Abbott is our suspect. Later, we are told that he was

Figure 7.3: Forward Rules Using UNLESS

registered at a hotel in Albany. Backward rules will never notice that anything has changed. To make our system data-driven, we need to use forward rules. Figure 7.3 shows how the same knowledge could be represented as forward rules. Of course, what we probably want is a system that can exploit both. In such a system, we could use a backward rule whose goal is to find a suspect, coupled with forward rules that fire as new facts that are relevant to finding a suspect appear.

7.5 Implementation: Depth-First Search

7.5.1 Dependency-Directed Backtracking

If we take a depth-first approach to nonmonotonic reasoning, then the following scenario is likely to occur often: We need to know a fact, F, which cannot be derived monotonically from what we already know, but which can be derived by making some assumption A which seems plausible. So we make assumption A, derive F, and then derive some additional facts G and H from F. We later derive some other facts M and N, but they are completely independent of A and F. A little while later, a new fact comes in that invalidates A. We need to rescind our proof of F, and also our proofs of G and H since they depended on F. But what about M and N? They didn't depend on F, so there is no logical need to invalidate them. But if we use a conventional backtracking scheme, we have to back up past conclusions in the order in which we derived them. So we have to back up past M and N, thus undoing them, in order to get back to F, G, H, and A. To get around this problem, we need a slightly different notion of backtracking, one that is based on logical dependencies rather than the chronological order in which decisions were made. We call this new method dependency-directed backtracking [Stallman and Sussman, 1977], in contrast to chronological backtracking, which we have been using up until now.

Before we go into detail on how dependency-directed backtracking works, it is worth pointing out that although one of the big motivations for it is in handling nonmonotonic reasoning, it turns out to be useful for conventional search programs as well. This is not too surprising when you consider that what any depth-first search program does is to "make a guess" at something, thus creating a branch in the search space. If that branch eventually dies out, then we know that at least one guess that led to it must be wrong. It could be any guess along the branch. In chronological backtracking we have to assume it was the most recent guess and back up there to try an alternative. Sometimes, though, we have additional information that tells us which guess caused the problem. We'd like to retract only that guess and the work that explicitly depended on it, leaving everything else that has happened in the meantime intact. This is exactly what dependency-directed backtracking does.

As an example, suppose we want to build a program that generates a solution to a fairly simple problem, such as finding a time at which three busy people can all attend a meeting. One way to solve such a problem is first to make an assumption that the meeting will be held on some particular day, say Wednesday, add to the database an assertion to that effect, suitably tagged as an assumption, and then proceed to find a time, checking along the way for any inconsistencies in people's schedules. If a conflict arises, the statement representing the assumption must be discarded and replaced by another, hopefully noncontradictory, one. But, of course, any statements that have been generated along the way that depend on the now-discarded assumption must also be discarded.

Of course, this kind of situation can be handled by a straightforward tree search with chronological backtracking. All assumptions, as well as the inferences drawn from them, are recorded at the search node that created them. When a node is determined to represent a contradiction, simply backtrack to the next node from which there remain unexplored paths. The assumptions and their inferences will disappear automatically. The drawback to this approach is illustrated in Figure 7.4, which shows part of the search tree of a program that is trying to schedule a meeting. To do so, the program must solve a constraint satisfaction problem to find a day and time at which none of the participants is busy and at which there is a sufficiently large room available.

In order to solve the problem, the system must try to satisfy one constraint at a time. Initially, there is little reason to choose one alternative over another, so it decides to schedule the meeting on Wednesday. That creates a new constraint that must be met by the rest of the solution. The assumption that the meeting will be held on Wednesday

Figure 7.4: Nondependency-Directed Backtracking

is stored at the node it generated. Next the program tries to select a time at which all participants are available. Among them, they have regularly scheduled daily meetings at all times except 2:00. So 2:00 is chosen as the meeting time. But it would not have mattered which day was chosen. Then the program discovers that on Wednesday there are no rooms available. So it backtracks past the assumption that the day would be Wednesday and tries another day, Tuesday. Now it must duplicate the chain of reasoning that led it to choose 2:00 as the time because that reasoning was lost when it backtracked to redo the choice of day. This occurred even though that reasoning did not depend in any way on the assumption that the day would be Wednesday. By withdrawing statements based on the order in which they were generated by the search process rather than on the basis of responsibility for inconsistency, we may waste a great deal of effort.

If we want to use dependency-directed backtracking instead, so that we do not waste this effort, then we need to do the following things (a small sketch of these pieces follows the list):

  • Associate with each node one or more justifications. Each justification corresponds to a derivation process that led to the node. (Since it is possible to derive the same node in several different ways, we want to allow for the possibility of multiple justifications.) Each justification must contain a list of all the nodes (facts, rules, assumptions) on which its derivation depended.
  • Provide a mechanism that, when given a contradiction node and its justification, computes the "no-good" set of assumptions that underlie the justification. The no-good set is defined to be the minimal set of assumptions such that if you remove any element from the set, the justification will no longer be valid and the inconsistent node will no longer be believed.
  • Provide a mechanism for considering a no-good set and choosing an assumption to retract.
  • Provide a mechanism for propagating the result of retracting an assumption. This mechanism must cause all of the justifications that depended, however indirectly, on the retracted assumption to become invalid.
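Here is a minimal sketch of that bookkeeping (the data structures and names are ours, not Stallman and Sussman's, and minimality of the no-good set is not enforced):

    # Nodes, justifications, no-good computation, and retraction.
    class Node:
        def __init__(self, name, assumption=False):
            self.name = name
            self.assumption = assumption
            self.justifications = []     # each: a list of supporting nodes
            self.believed = assumption   # assumptions start out believed
        def __repr__(self):
            return self.name

    def justify(node, supporters):
        node.justifications.append(supporters)
        node.believed = node.believed or all(s.believed for s in supporters)

    def no_good(node):
        """Collect the assumptions underlying a node's support."""
        seen, stack, found = set(), [node], set()
        while stack:
            n = stack.pop()
            if n.name in seen:
                continue
            seen.add(n.name)
            if n.assumption:
                found.add(n)
            for just in n.justifications:
                stack.extend(just)
        return found

    def retract(assumption, nodes):
        """Withdraw an assumption; relabel everything justified through it."""
        assumption.believed = False
        for n in nodes:                  # one pass suffices for this example
            if not n.assumption:
                n.believed = any(all(s.believed for s in j)
                                 for j in n.justifications)

    # Usage: F depends on assumption A; M does not. Retracting A kills F
    # but leaves M intact -- no chronological unwinding through M.
    A, F, M = Node("A", assumption=True), Node("F"), Node("M")
    justify(F, [A]); justify(M, [])
    print(no_good(F))                    # {A}
    retract(A, [F, M])
    print(F.believed, M.believed)        # False True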

In the next two sections, we will describe two approaches to providing such a system.

7.5.2 Justification-Based Truth Maintenance Systems

The idea of a truth maintenance system or TMS [Doyle, 1979] arose as a way of providing the ability to do dependency-directed backtracking and so to support nonmonotonic reasoning. There was a later attempt to rename it to Reason Maintenance System (a bit less pretentious), but since the old name has stuck, we use it here.

A TMS allows assertions to be connected via a spreadsheet-like network of dependencies. In this section, we describe a simple form of truth maintenance system, a justification-based truth maintenance system (or JTMS). In a JTMS (or just TMS for the rest of this section), the TMS itself does not know anything about the structure of the assertions themselves. (As a result, in our examples, we use an English-like shorthand for representing the contents of nodes.) The TMS's only role is to serve as a bookkeeper for a separate problem-solving system, which in turn provides it with both assertions and dependencies among assertions.

To see how a TMS works, let's return to the ABC Murder story. Initially we might believe that Abbott is the primary suspect because he was a beneficiary of the deceased and he had no alibi. There are three assertions here, a specific combination of which we now believe, although we may change our beliefs later. We can represent these assertions in shorthand as follows:

  • Suspect Abbott (Abbott is the primary murder suspect.)
  • Beneficiary Abbott (Abbott is a beneficiary of the victim.)
  • Alibi Abbott (Abbott was at an Albany hotel at the time.)

Our reason for possible belief that Abbott is the murderer is nonmonotonic. In the notation of Default Logic, we can state the rule that produced it as

Beneficiary(x) : ¬Alibi(x) / Suspect(x)

or we can write it as a backward rule as we did in Section 7.4.

If we currently believe that he is a beneficiary and we have no reason to believe he has a valid alibi, then we will believe that he is our suspect. But if later we come to believe that he does have a valid alibi, we will no longer believe Abbott is a suspect.

But how should belief be represented and how should this change in belief be enforced? There are various ad hoc ways we might do this in a rule-based system. But they would all require a developer to construct rules carefully for each possible change in belief. For instance, we would have to have a rule that said that if Abbott ever gets

Figure 7.5: A Justification

an alibi, then we should erase from the database the belief that Abbott is a suspect. But suppose that we later fire a rule that erases belief in Abbott's alibi. Then we need another rule that would reconclude that Abbott is a suspect. The task of creating a rule set that consistently maintains beliefs when new assertions are added to the database quickly becomes unmanageable. In contrast, a TMS dependency network offers a purely syntactic, domain-independent way to represent belief and change it consistently.

Figure 7.5 shows how these three facts would be represented in a dependency network, which can be created as a result of applying the first rule of either Figure 7.2 or Figure 7.3. The assertion Suspect Abbott has an associated TMS justification. Each justification consists of two parts: an IN-list and an OUT-list. In the figure, the assertions on the IN-list are connected to the justification by "+" links, those on the OUT-list by "-" links. The justification is connected by an arrow to the assertion that it supports. In the justification shown, there is exactly one assertion in each list. Beneficiary Abbott is in the IN-list and Alibi Abbott is in the OUT-list. Such a justification says that Abbott should be a suspect just when it is believed that he is a beneficiary and it is not believed that he has an alibi.

More generally, assertions (usually called nodes) in a TMS dependency network are believed when they have a valid justification. A justification is valid if every assertion in the IN-list is believed and none of those in the OUT-list is. A justification is nonmonotonic if its OUT-list is not empty, or, recursively, if any assertion in its IN-list has a nonmonotonic justification. Otherwise, it is monotonic. In a TMS network, nodes are labeled with a belief status. If the assertion corresponding to the node should be believed, then in the TMS it is labeled IN. If there is no good reason to believe the assertion, then it is labeled OUT. What does it mean that an assertion "should be believed" or has no "good" reason for belief?

A TMS answers these questions for a dependency network in a way that is independent of any interpretation of the assertions associated with the nodes. The labeling task of a TMS is to label each node so that two criteria about the dependency network

Figure 7.6: Labeled Nodes with Premise Justification

structure are met. The first criterion is consistency: every node labeled IN is supported by at least one valid justification and all other nodes are labeled OUT. More specifically than before, a justification is valid if every node in its IN-list is labeled IN and every node in its OUT list is labeled OUT. Notice that in Figure 7.5, all of the assertions would have to be labeled OUT to be consistent. Alibi Abbott has no justification at all, much less a valid one, and so must be labeled OUT. But the same is true for Beneficiary Abbott, so it must be OUT as well. Then the justification for Suspect Abbott is invalid because an element of its IN-list is labeled OUT. Suspect Abbott would then be labeled OUT as well. Thus status labels correspond to our belief or lack of it in assertions, and justifications correspond to our reasons for such belief, with valid justifications being our "good" reasons. Notice that the label OUT may indicate that we have specific reason to believe that a node represents an assertion that is not true, or it may mean simply that we have no information one way or the other.

But the state of affairs in Figure 7.5 is incomplete. We are told that Abbott is a beneficiary. We have no further justification for this fact; we must simply accept it. For such facts, we give a premise justification: a justification with empty IN- and OUT-lists. Premise justifications are always valid. Figure 7.6 shows such a justification added to the network and a consistent labeling for that network, which shows Suspect Abbott labeled IN.
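The consistency criterion is mechanical enough to sketch directly. The toy JTMS below (our own simplification: no well-foundedness check and no contradiction handling) relabels nodes until every node labeled IN has a valid justification and every other node is OUT, reproducing the labeling of Figure 7.6.

    # A toy JTMS labeling pass enforcing the consistency criterion.
    class JTMSNode:
        def __init__(self, name):
            self.name, self.justifications, self.label = name, [], "OUT"
        def __repr__(self):
            return f"{self.name}:{self.label}"

    def valid(just):
        in_list, out_list = just
        return all(n.label == "IN" for n in in_list) and \
               all(n.label == "OUT" for n in out_list)

    def relabel(nodes):
        changed = True
        while changed:                   # iterate to a consistent fixpoint
            changed = False
            for n in nodes:
                lab = "IN" if any(valid(j) for j in n.justifications) else "OUT"
                if lab != n.label:
                    n.label, changed = lab, True

    suspect = JTMSNode("Suspect Abbott")
    benef = JTMSNode("Beneficiary Abbott")
    alibi = JTMSNode("Alibi Abbott")
    benef.justifications.append(([], []))          # premise: always valid
    suspect.justifications.append(([benef], [alibi]))

    relabel([benef, alibi, suspect])
    print(suspect, benef, alibi)
    # Suspect Abbott:IN Beneficiary Abbott:IN Alibi Abbott:OUT
    # Giving Alibi Abbott a valid justification and relabeling would
    # push Suspect Abbott OUT, as described next.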

That Abbott is the primary suspect represents an initial state of the murder investigation. Subsequently, the detective establishes that Abbott is listed on the register of a good Albany hotel on the day of the murder. This provides a valid reason to believe Abbott's alibi. Figure 7.7 shows the effect of adding such a justification to the network, assuming that we have used forward (data-driven) rules as shown in Figure 7.3 for all of our reasoning except possibly establishing the top-level goal. That Abbott was registered at the hotel, Registered Abbott, was told to us and has a premise justification and so is labeled IN. That the hotel is far away is also asserted as a premise. The register might have been forged, but we have no good reason to believe it was. Thus

Figure 7.7: Changed Labeling

Register Forged lacks any justification and is labeled OUT. That Abbott was on the register of a far away hotel and the lack of belief that the register was forged will cause the appropriate forward rule to fire and create a justification for Alibi Abbott, which is thus labeled IN. This means that Suspect Abbott no longer has a valid justification and must be labeled OUT. Abbott is no longer a suspect.

Notice that such a TMS labeling carefully avoids saying that the register definitely was not forged. It only says that there is currently no good reason to believe that it was. Just like our original reason for believing that Abbott was a suspect, this is a nonmonotonic justification. Later, if we find that Abbott was secretly married to the desk clerk, we might add to this network a justification that would reverse some of the labeling. Babbitt will have a similar justification based upon lack of belief that his brother-in-law lied, as shown in Figure 7.8 (where B-I-L stands for "Brother-In-Law").

Abbott's changing state showed how consistency was maintained. There is another criterion that the TMS must meet in labeling a dependency network: well-foundedness (i.e., the proper grounding of a chain of justifications on a set of nodes that do not themselves depend on the nodes they support). To illustrate this, consider poor Cabot. Not only does he have fewer bs and ts in his name, he also lacks a valid justification for his alibi that he was at a ski show. We have only his word that he was. Ignoring the more complicated representation of lying, the simple dependency network in Figure 7.9 illustrates the fact that the only support for the alibi of attending the ski show is that Cabot is telling the truth about being there. The only support for his telling the truth would be if we knew he was at the ski show. But this is a circular argument. Part of the task of a TMS is to disallow such arguments. In particular, if the support for a node only depends on an unbroken chain of positive links (IN-list links) leading back to itself,

Figure 7.8: Babbitt's Justification

then that node must be labeled OUT if the labeling is to be well-founded.

The TMS task of ensuring a consistent, well-founded labeling has now been outlined. The other major task of a TMS is resolving contradictions. In a TMS, a contradiction node does not represent a logical contradiction but rather a state of the database explicitly declared to be undesirable. (In the next section, we describe a slightly different kind of TMS in which this is not the case.) In our example, we have a contradiction if we do not have at least one murder suspect. Thus a contradiction might have the justification shown in Figure 7.10, where the node Other Suspects means that there are suspects other than Abbott, Babbitt, and Cabot. This is one way of explicitly representing an instance of the closed world assumption. Later, if we discover a long-lost relative, this will provide a valid justification for Other Suspects. But for now, it has none and must be labeled OUT. Fortunately, even though Abbott and Babbitt are not suspects, Suspect Cabot is labeled IN, invalidating the justification for the contradiction. While the contradiction is labeled OUT, there is no contradiction to resolve.

Now we learn that Cabot was seen on television attending the ski tournament. Adding this to the dependency network first illustrates the fact that nodes can have more than one justification, as shown in Figure 7.11. Not only does Cabot say he was at the ski slopes, but he was seen there on television, and we have no reason to believe that this was an elaborate forgery. This new valid justification of Alibi Cabot causes it to be labeled IN (which also causes Tells Truth Cabot to come IN). This change in state propagates to Suspect Cabot, which goes OUT. Now we have a problem. The justification for the contradiction is now valid and the contradiction is IN. The job of the TMS at this point is to determine how the contradiction can be made OUT again. In a TMS network, a node can be made OUT by causing all of its justifications

Figure 7.9: Cabot's Justification

Figure 7.10: A Contradiction

to become invalid. Monotonic justifications cannot be made invalid without retracting explicit assertions that have been made to the network. Nonmonotonic justifications can, however, be invalidated by asserting some fact whose absence is required by the justification. We call assertions with nonmonotonic justifications assumptions. An assumption can be retracted by making IN some element of its justification's OUT list (or recursively in some element of the OUT list of the justification of some element in its IN-list). Unfortunately, there may be many such assumptions in a large dependency network. Fortunately, the network gives us a way to identify those that are relevant to the contradiction at hand. Dependency-directed backtracking algorithms, of the sort we described in Section 7.5.1, can use the dependency links to determine an AND/OR tree of assumptions that might be retracted and ways to retract them by justifying other beliefs.

In Figure 7.10, we see that the contradiction itself is an assumption whenever its justification is valid. We might retract it by believing there were other suspects or by finding a way to believe again that either Abbott, Babbitt, or Cabot was a suspect. Each of the last three could be believed if we disbelieved their alibis, which in turn

Figure 7.11: A Second Justification

are assumptions. So if we believed that the hotel register was a forgery, that Babbitt's brother-in-law lied, or that the television pictures were faked, we would have a suspect again and the contradiction would go back OUT. So there are four things we might believe to resolve the contradiction. That is as far as DDB will take us. It reports there is an OR tree with four nodes. What should we do?

A TMS has no answer for this question. Early TMSs picked an answer at random. More recent architectures take the more reasonable position that this choice is a problem for the same problem-solving agent that created the dependencies in the first place. But suppose we do pick one. Suppose, in particular, that we choose to believe that Babbitt's brother-in-law lied. What should be the justification for that belief? If we believe it just because not believing it leads to a contradiction, then we should install a justification that is valid only as long as it needs to be. If later we find another way that the contradiction can be labeled OUT, we will not want to continue in our abductive belief.

For instance, suppose that we believe that the brother-in-law lied, but later we discover that a long-lost relative, jilted by the family, was in town the day of the murder. We would no longer have to believe the brother-in-law lied just to avoid a contradiction. A TMS may also have algorithms to create such justifications, which we call abductive since they are created using abductive reasoning. If they have the property that they are not unnecessarily valid, they are said to be complete. Figure 7.12 shows a complete abductive justification for the belief that Babbitt's brother-in-law lied. If we come to

Figure 7.12: A Complete Abductive Justification

believe that Abbott or Cabot is a suspect, or we find a long-lost relative, or we somehow come to believe that Babbitt's brother-in-law didn't really say Babbitt was at his house, then this justification for lying will become invalid.
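
In the IN/OUT-list convention, the justification of Figure 7.12 is easy to state. The sketch below (node names are ours, abbreviated from the story) shows why the justification is complete: belief in the lie stands only while the brother-in-law's statement is IN and every alternative resolution of the contradiction is OUT.

    # A sketch of the complete abductive justification of Figure 7.12. The
    # belief that Babbitt's brother-in-law lied is valid only while his
    # statement is believed and every alternative resolution is absent (OUT).

    def valid(justification, status):
        return (all(status.get(n) == "IN" for n in justification["in"]) and
                all(status.get(n) != "IN" for n in justification["out"]))

    lies_bil = {
        "in":  ["Says B-I-L"],    # he did make the statement
        "out": ["Suspect Abbott", "Suspect Cabot", "Other Suspects"],
    }

    status = {"Says B-I-L": "IN"}
    print(valid(lies_bil, status))    # True: no other resolution, so believe he lied

    status["Other Suspects"] = "IN"   # a long-lost relative turns up
    print(valid(lies_bil, status))    # False: the abductive belief lapses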

At this point, we have described the key reasoning operations that are performed by a JTMS:

  • consistent labeling
  • contradiction resolution

We have also described a set of important reasoning operations that a JTMS does not perform, including:

  • applying rules to derive conclusions
  • creating justifications for the results of applying rules (although justifications are created as part of contradiction resolution)
  • choosing among alternative ways of resolving a contradiction
  • detecting contradictions

All of these operations must be performed by the problem-solving program that is using the JTMS. In the next section, we describe a slightly different kind of TMS, in which, although the first three of these operations must still be performed by the problem-solving system, the last can be performed by the TMS.

7.5.3 Logic-Based Truth Maintenance Systems
A logic-based truth maintenance system (LTMS) [McAllester, 1980] is very similar to a JTMS. It differs in one important way. In a JTMS, the nodes in the network are treated as atoms by the TMS, which assumes no relationships among them except the ones that are explicitly stated in the justifications. In particular, a JTMS has no problem simultaneously labeling both P and ¬P IN. For example, we could have represented explicitly both Lies B-I-L and Not Lies B-I-L and labeled both of them IN. No contradiction will be detected automatically. In an LTMS, on the other hand, a contradiction would be asserted automatically in such a case. If we had constructed the ABC example in an LTMS system, we would not have created an explicit contradiction corresponding to the assertion that there was no suspect. Instead we would replace the contradiction node by one that asserted something like No Suspect. Then we would assert ¬No Suspect. When No Suspect came IN, it would cause a contradiction to be asserted automatically.
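
The extra check this implies can be sketched as follows, assuming (as a representation of our own, not McAllester's implementation) that each node is a signed literal:

    # A proposition labeled IN together with its negation signals a
    # contradiction automatically in an LTMS.

    def ltms_contradictions(labels):
        """labels maps (proposition, positive?) literals to 'IN' or 'OUT'."""
        return [p for (p, positive), status in labels.items()
                if positive and status == "IN"
                and labels.get((p, False)) == "IN"]

    labels = {("Lies B-I-L", True): "IN",
              ("Lies B-I-L", False): "IN"}    # both P and ¬P are IN
    print(ltms_contradictions(labels))        # -> ['Lies B-I-L']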

7.6 Implementation: Breadth-First Search

The assumption-based truth maintenance system (ATMS) [de Kleer, 1986] is an alternative way of implementing nonmonotonic reasoning. In both JTMS and LTMS systems, a single line of reasoning is pursued at a time, and dependency-directed backtracking occurs whenever it is necessary to change the system's assumptions. In an ATMS, alternative paths are maintained in parallel. Backtracking is avoided at the expense of maintaining multiple contexts, each of which corresponds to a set of consistent assumptions. As reasoning proceeds in an ATMS-based system, the universe of consistent contexts is pruned as contradictions are discovered. The remaining consistent contexts are used to label assertions, thus indicating the contexts in which each assertion has a valid justification. Assertions that do not have a valid justification in any consistent context can be pruned from consideration by the problem solver. As the set of consistent contexts gets smaller, so too does the set of assertions that can consistently be believed by the problem solver. Essentially, an ATMS system works breadth-first, considering all possible contexts at once, while both JTMS and LTMS systems operate depth-first.

The ATMS, like the JTMS, is designed to be used in conjunction with a separate problem solver. The problem solver's job is to:

  • Create nodes that correspond to assertions (both those that are given as axioms and those that are derived by the problem solver).
  • Associate with each such node one or more justifications, each of which describes a reasoning chain that led to the node.
  • Inform the ATMS of inconsistent contexts.

Notice that this is identical to the role of the problem solver that uses a JTMS, except that no explicit choices among paths to follow need be made as reasoning proceeds. Some decision may be necessary at the end, though, if more than one possible solution still has a consistent context. A minimal sketch of this interface follows.
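
The class and method names below are hypothetical, invented only to mirror the three jobs listed above; they do not come from de Kleer's system or any particular ATMS library.

    # An illustrative problem-solver/ATMS interface: the problem solver
    # creates nodes, attaches justifications, and reports inconsistencies.

    class ATMS:
        def __init__(self):
            self.justifications = {}   # node -> list of justifications
            self.nogoods = []          # assumption sets known to be inconsistent

        def add_node(self, node):
            self.justifications.setdefault(node, [])

        def add_justification(self, node, antecedents):
            """Record one reasoning chain (a list of antecedent nodes)."""
            self.add_node(node)
            self.justifications[node].append(list(antecedents))

        def declare_nogood(self, assumptions):
            """The problem solver reports an inconsistent context."""
            self.nogoods.append(frozenset(assumptions))

    atms = ATMS()
    atms.add_node("Suspect Abbott")
    atms.add_justification("Suspect Abbott", ["Register not forged"])
    atms.declare_nogood({"A7", "A8"})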

The role of the ATMS system is then to:

  • Propagate inconsistencies, thus ruling out contexts that include subcontexts (sets of assertions) that are known to be inconsistent.
  • Label each problem solver node with the contexts in which it has a valid justification. This is done by combining contexts that correspond to the components of a justification. In particular, given a justification of the form

    A1 ∧ A2 ∧ ... ∧ An → C

    assign as a context for the node corresponding to C the intersection of the contexts corresponding to the nodes A1 through An, as sketched below.
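
Since a node's label lists only the minimal environments and implicitly stands for every superset of them, intersecting the antecedents' sets of contexts amounts, in assumption-set terms, to taking unions: one environment is drawn from each antecedent's label and the chosen sets are combined. The sketch below is our own illustration of that combination; the nogood and minimality filtering discussed next is omitted here.

    # Combining labels for a justification A1 ∧ A2 ∧ ... ∧ An → C: for each
    # way of choosing one environment (assumption set) per antecedent, the
    # consequent holds in the union of the chosen sets.

    from itertools import product

    def combine_labels(antecedent_labels):
        """Each label is a list of environments, i.e., frozensets of assumptions."""
        return {frozenset().union(*choice)
                for choice in product(*antecedent_labels)}

    # If A1 holds under {A1} and A2 holds under {A2} or {A8}, then C holds
    # under {A1, A2} and {A1, A8}.
    print(combine_labels([[frozenset({"A1"})],
                          [frozenset({"A2"}), frozenset({"A8"})]]))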

Figure 7.13: A Context Lattice

Contexts get eliminated as a result of the problem solver asserting inconsistencies and the ATMS propagating them. Nodes get created by the problem solver to represent possible components of a problem solution. They may then get pruned from consideration if all their context labels get pruned. Thus a choice among possible solution components gradually evolves, in a process very much like the constraint satisfaction procedure that we examined in Section 3.5.

One problem with this approach is that, given a set of n assumptions, the number of possible contexts that may have to be considered is 2^n. Fortunately, in many problem-solving scenarios, most of them can be pruned without ever looking at them. Further, the ATMS exploits an efficient labeling system that makes it possible to encode a set of contexts as a single context that delimits the set. To see how both of these things work, it is necessary to think of the set of contexts that are defined by a set of assumptions as forming a lattice, as shown for a simple example with four assumptions in Figure 7.13. Lines going upward indicate a subset relationship.

The first thing this lattice does for us is to illustrate a simple mechanism by which contradictions (inconsistent contexts) can be propagated so that large parts of the space of 2^n contexts can be eliminated. Suppose that the context labeled {A2, A3} is asserted to be inconsistent. Then all contexts that include it (i.e., those that are above it) must also be inconsistent.

Now consider how a node can be labeled with all the contexts in which it has a valid justification. Suppose its justification depends on assumption A1. Then the context labeled {A1} and all the contexts that include it are acceptable. But this can be indicated just by saying {A1}. It is not necessary to enumerate its supersets. In general, each node will be labeled with the greatest lower bounds of the contexts in which it should be believed.

Clearly, it is important that this lattice not be built explicitly but only used as an implicit structure as the ATMS proceeds.
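
Both lattice operations are easy to state once contexts are identified with their assumption sets. The following sketch (our own illustration, with made-up environments) marks an environment inconsistent when it contains some known nogood, which prunes everything above the nogood in the lattice, and then keeps only the minimal environments of a label, since their supersets are implied:

    # Nogood propagation and label minimization over the implicit lattice.
    # An environment is inconsistent if some known nogood is a subset of it;
    # a label keeps only its minimal (greatest-lower-bound) environments.

    def inconsistent(env, nogoods):
        return any(ng <= env for ng in nogoods)

    def minimal_label(envs, nogoods):
        consistent = [e for e in envs if not inconsistent(e, nogoods)]
        return [e for e in consistent if not any(f < e for f in consistent)]

    nogoods = [frozenset({"A2", "A3"})]             # as in the example above
    envs = [frozenset({"A1"}),
            frozenset({"A1", "A4"}),                # implied by {A1}: dropped
            frozenset({"A1", "A2", "A3"})]          # contains a nogood: dropped
    print(minimal_label(envs, nogoods))             # -> [frozenset({'A1'})]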

As an example of how an ATMS-based problem solver works, let's return to the ABC Murder story. Again, our goal is to find a primary suspect. We need (at least) the following assumptions:

  • A1. Hotel register was forged.
  • A2. Hotel register was not forged.
  • A3. Babbitt's brother-in-law lied.
  • A4. Babbitt's brother-in-law did not lie.
  • A5. Cabot lied.
  • A6. Cabot did not lie.
  • A7. Abbott, Babbitt, and Cabot are the only possible suspects.
  • A8. Abbott, Babbitt, and Cabot are not the only suspects.

The problem solver could then generate the nodes and associated justifications shown in the first two columns of Figure 7.14. In the figure, the justification for a node that corresponds to a decision to make assumption N is shown as {N}. Justifications for nodes that correspond to the result of applying reasoning rules are shown as the rule involved. Then the ATMS can assign labels to the nodes as shown in the last two columns. The first of these shows the label that would be generated for each justification taken by itself. The second shows the label (possibly containing multiple contexts) that is actually assigned to the node given all its current justifications. These columns are identical in simple cases, but they may differ in more complex situations, as we see for nodes 12, 13, and 14 of our example.

There are several things to notice about this example:

  • Nodes may have several justifications if there are several possible reasons for believing them. This is the case for nodes 12, 13, and 14.
  • Recall that when we were using a JTMS, a node was labeled IN if it had at least one valid justification. Using an ATMS, a node will end up being labeled with a consistent context if it has at least one justification that can occur in a consistent context.
  • The label assignment process is sometimes complicated. We describe it in more detail below.

Suppose that a problem-solving program first created nodes 1 through 14, representing the various dependencies among them without committing to which of them it currently believes. It can indicate known contradictions by marking as no good the context:

  • A, B, C are the only suspects; A, B, C are not the only suspects: {A7, A8}

Figure 7.14: Nodes and Their Justifications and Labels

The ATMS would then assign the labels shown in the figure. Let's consider the case of node 12. We generate four possible labels, one for each justification. But we want to assign to the node a label that contains just the greatest lower bounds of all the contexts in which it can occur, since they implicitly encode the superset contexts. The label {A2} is the greatest lower bound of the first, third, and fourth labels, and {A8} is the same for the second label. Thus those two contexts are all that are required as the label for the node. Now let's consider labeling node 8. Its label must be the combination of the labels of nodes 7, 13, and 14. But nodes 13 and 14 have complex labels representing alternative justifications. So we must consider all ways of combining the labels of all three nodes. Fortunately, some of these combinations, namely those that contain both A7 and A8, can be eliminated because they are already known to be contradictory. Thus we are left with a single label, as shown.
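
For concreteness, the node-12 minimization can be replayed in miniature. Since Figure 7.14 is not reproduced here, the four per-justification environments below are hypothetical, chosen only to match the description above; minimization then leaves exactly {A2} and {A8}.

    # Hypothetical per-justification environments for node 12 (the real ones
    # appear in Figure 7.14), minimized to their greatest lower bounds:
    # supersets are implied and need not be listed.

    def minimize(envs):
        return [e for e in envs if not any(f < e for f in envs)]

    candidates = [frozenset({"A2"}), frozenset({"A8"}),
                  frozenset({"A2", "A4"}), frozenset({"A2", "A6"})]  # hypothetical
    print(minimize(candidates))    # -> [frozenset({'A2'}), frozenset({'A8'})]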

Now suppose the problem-solving program labels the context {A2} as no good, meaning that the assumption it contains (namely that the hotel register was not forged) conflicts with what it knows. Then many of the labels that we had disappear, since they are now inconsistent. In particular, the labels for nodes 1, 2, 9, and 10 disappear, and node 12 loses its {A2} context. At this point, the only suspect node that has a label is node 8. But node 12 (Not prime suspect Abbott) also still has a label, one that corresponds to the assumption that Abbott, Babbitt, and Cabot are not the only suspects. If this assumption is made, then Abbott would not be a clear suspect even if the hotel register were forged. Further information or some choice process is still necessary to choose between these remaining nodes.

7.7 Summary

In this chapter we have discussed several logical systems that provide a basis for nonmonotonic reasoning, including nonmonotonic logic, default logic, abduction, inheritance, the closed world assumption, and circumscription. We have also described a way in which the kind of rules that we discussed in Chapter 6 could be augmented to support nonmonotonic reasoning.

We then presented three kinds of TMS systems, all of which provide a basis for implementing nonmonotonic reasoning. We have considered two dimensions along which TMS systems can vary: whether they automatically detect logical contradictions and whether they maintain single or multiple contexts. The following table summarizes this discussion:

                                  Single context    Multiple contexts
  Nonlogical contradictions      JTMS              ATMS
  Logical contradictions         LTMS              (none)

As can be seen in this table, there is currently no TMS that both detects logical contradictions and maintains multiple contexts.

These various TMS systems each have advantages and disadvantages with respect to each other. The major issues that distinguish JTMS and ATMS systems are:

  • The JTMS is often better when only a single solution is desired, since it does not need to consider alternatives; the ATMS is usually more efficient if all solutions are eventually going to be needed.
  • To create the context lattice, the ATMS performs a global operation in which it considers all possible combinations of assumptions. As a result, either all assumptions must be known at the outset of problem solving or an expensive recompilation process must occur whenever an assumption is added. In the JTMS, on the other hand, the gradual addition of new assumptions poses no problem.
  • The JTMS may spend a lot of time switching contexts when backtracking is necessary. Context switching does not happen in the ATMS.
  • In an ATMS, inconsistent contexts disappear from consideration. If the initial problem description was overconstrained, then all nodes will end up with empty labels and there will be no problem-solving trace that can serve as a basis for relaxing one or more of the constraints. In a JTMS, on the other hand, the justification that is attached to a contradiction node provides exactly such a trace.
  • The ATMS provides a natural way to answer questions of the form, "In what contexts is A true?" The only way to answer such questions using a JTMS is to try all the alternatives and record the ones in which A is labeled IN.

One way to get the best of both of these worlds is to combine an ATMS and a JTMS (or LTMS), letting each handle the part of the problem-solving process to which it is best suited.

The various nonmonotonic systems that we have described in this chapter have served as a basis for a variety of applications. One area of particular significance is diagnosis (for example, of faults in a physical device) [Reiter, 1987b; de Kleer and Williams, 1987]. Diagnosis is a natural application area for minimalist reasoning in particular, since one way to describe the diagnostic task is, "Find the smallest set of abnormally behaving components that would account for the observed behavior." A second application area is reasoning about action, with a particular emphasis on addressing the frame problem [Hanks and McDermott, 1986]. The frame problem is also natural for this kind of reasoning, since it can be described as, "Assume that everything stays the same after an action except the things that necessarily change." A third application area is design [Steele et al., 1989]. Here, nonmonotonic reasoning provides a basis for using common design principles to find a promising path quickly, even in a huge design space, while preserving the option to consider alternatives later if necessary. And yet another application area is extracting intent from English expressions (see Chapter 15).

In all the systems that we have discussed, we have assumed that belief status is a binary function. An assertion must eventually be either believed or not. Sometimes, this is too strong an assumption. In the next chapter, we present techniques for dealing with uncertainty without making that assumption. Instead, we allow for varying degrees of belief.

7.8 Exercises

  1. Try to formulate the ABC Murder story in predicate logic and see how far you can get.
  2. The classic example of nonmonotonic reasoning involves birds and flying. In particular, consider the following facts:
    • Most things do not fly.
    • Most birds do fly, unless they are too young or dead or have a broken wing.
    • Penguins and ostriches do not fly.
    • Magical ostriches fly.
    • Tweety is a bird.
    • Chirpy is either a penguin or an ostrich.
    • Feathers is a magical ostrich.

    Use one or more of the nonmonotonic reasoning systems we have discussed to answer the following questions:

    • Does Tweety fly?
    • Does Chirpy fly?
    • Does Feathers fly?
    • Does Paul fly?
  3. Consider the missionaries and cannibals problem of Section 2.6. When you solved that problem, you used the CWA several times (probably without thinking about it). List some of the ways in which you used it.
  4. A big technical problem that arises in defining circumscription precisely is the definition of a minimal model. Consider again the problem of Dick, the Quaker and Republican, which we can rewrite using a slightly different kind of AB predicate as:

    ∀x : Republican(x) ∧ ¬AB1(x) → ¬Pacifist(x)

    ∀x : Quaker(x) ∧ ¬AB2(x) → Pacifist(x)

    Republican(Dick)

    Quaker(Dick)

    1. Write down the smallest models you can that describe the two extensions that we computed for that knowledge base.
    2. Does it make sense to say that either is smaller than the other?
    3. Prioritized circumscription [McCarthy, 1986] attempts to solve this problem by ranking predicates by the order in which they should be minimized. How could you use this idea to indicate which extension to prefer?
  5. Consider the problem of finding clothes to wear in the morning. To solve this problem, it is necessary to use knowledge such as:
    • Wear jeans unless either they are dirty or you have a job interview today.
    • Wear a sweater if it's cold.
    • It's usually cold in the winter.
    • Wear sandals if it's warm.
    • It's usually warm in the summer.
    1. Build a JTMS-style database of the necessary facts to solve this problem.
    2. Show how the problem can be solved and how the solution changes as the relevant facts (such as time of year and dirtiness of jeans) change.
  6. Show how a JTMS could be used in medical diagnosis. Consider rules such as, "If you have a runny nose, assume you have a cold unless it is allergy season."
  7. Solve the same medical reasoning problem with an ATMS.
  8. Show how a JTMS could be used to select a TV program to watch. Consider rules such as, "If it is 6:00, then watch the news on channel 2 unless there is a football game still going on."
  9. TMSs are useful tools in solving constraint satisfaction problems, since they facilitate the nonmonotonic reasoning that occurs during the search for a complete solution.
    1. Show how a JTMS could be used to solve the cryptarithmetic problems of Chapter 2.
    2. Show how an ATMS would solve the same problem.
  10. We described informally the JTMS labeling process. Write a formal description of that algorithm.
  11. Work through the details of the ATMS node labeling process whose results are shown in Figure 7.14.