In this chapter, we begin exploring one particular way of representing facts: the language of logic. Other representational formalisms are discussed in later chapters. The logical formalism is appealing because it immediately suggests a powerful way of deriving new knowledge from old: mathematical deduction. In this formalism, we can conclude that a new statement is true by proving that it follows from the statements that are already known. Thus the idea of a proof, as developed in mathematics as a rigorous way of demonstrating the truth of an already believed proposition, can be extended to include deduction as a way of deriving answers to questions and solutions to problems.
One of the early domains in which AI techniques were explored was mechanical theorem proving, by which was meant proving statements in various areas of mathematics. For example, the Logic Theorist [Newell et al., 1963] proved theorems from the first chapter of Whitehead and Russell's Principia Mathematica [1950]. Another theorem prover [Gelernter et al., 1963] proved theorems in geometry. Mathematical theorem proving is still an active area of AI research. (See, for example, Wos et al. [1984].) But, as we show in this chapter, the usefulness of some mathematical techniques extends well beyond the traditional scope of mathematics. It turns out that mathematics is no different from any other complex intellectual endeavor in requiring both reliable deductive mechanisms and a mass of heuristic knowledge to control what would otherwise be a completely intractable search problem.
At this point, readers who are unfamiliar with propositional and predicate logic may want to consult a good introductory logic text before reading the rest of this chapter. Readers who want a more complete and formal presentation of the material in this chapter should consult Chang and Lee [1973]. Throughout the chapter, we use the following standard logic symbols: "->" (material implication), "¬" (not), "V" (or), "^" (and), "∀" (for all), and "∃" (there exists).
Let's first explore the use of propositional logic as a way of representing the sort of world knowledge that an AI system might need. Propositional logic is appealing because it is simple to deal with and a decision procedure for it exists. We can easily
represent real-world facts as logical propositions written as well-formed formulas (wff's) in propositional logic, as shown in Figure 5.1. Using these propositions, we could, for example, conclude from the fact that it is raining the fact that it is not sunny. But very quickly we run up against the limitations of propositional logic. Suppose we want to represent the obvious fact stated by the classical sentence
Socrates is a man.
We could write:
SOCRATESMAN
But if we also wanted to represent
Plato is a man.
we would have to write something such as:
PLATOMAN
which would be a totally separate assertion, and we would not be able to draw any conclusions about similarities between Socrates and Plato. It would be much better to represent these facts as:
MAN(SOCRATES)
MAN(PLATO)
since now the structure of the representation reflects the structure of the knowledge itself. But to do that, we need to be able to use predicates applied to arguments. We are in even more difficulty if we try to represent the equally classic sentence
All men are mortal.
We could represent this as:
MORTALMAN
But that fails to capture the relationship between any individual being a man and that individual being a mortal. To do that, we really need variables and quantification unless we are willing to write separate statements about the mortality of every known man.
So we appear to be forced to move to first-order predicate logic (or just predicate logic, since we do not discuss higher order theories in this chapter) as a way of representing knowledge because it permits representations of things that cannot reasonably be represented in propositional logic. In predicate logic, we can represent real-world facts as statements written as wff's.
But a major motivation for choosing to use logic at all is that if we use logical statements as a way of representing knowledge, then we have available a good way of reasoning with that knowledge. Determining the validity of a proposition in propositional logic is straightforward, although it may be computationally hard. So before we adopt predicate logic as a good medium for representing knowledge, we need to ask whether it also provides a good way of reasoning with the knowledge. At first glance, the answer is yes. It provides a way of deducing new statements from old ones. Unfortunately, however, unlike propositional logic, it does not possess a decision procedure, even an exponential one. There do exist procedures that will find a proof of a proposed theorem if indeed it is a theorem. But these procedures are not guaranteed to halt if the proposed statement is not a theorem. In other words, although first-order predicate logic is not decidable, it is semidecidable. A simple such procedure is to use the rules of inference to generate theorems from the axioms in some orderly fashion, testing each to see if it is the one for which a proof is sought. This method is not particularly efficient, however, and we will want to try to find a better one.
Although negative results, such as the fact that there can exist no decision procedure for predicate logic, generally have little direct effect on a science such as AI, which seeks positive methods for doing things, this particular negative result is helpful since it tells us that in our search for an efficient proof procedure, we should be content if we find one that will prove theorems, even if it is not guaranteed to halt if given a nontheorem. And the fact that there cannot exist a decision procedure that halts on all possible inputs does not mean that there cannot exist one that will halt on almost all the inputs it would see in the process of trying to solve real problems. So despite the theoretical undecidability of predicate logic, it can still serve as a useful way of representing and manipulating some of the kinds of knowledge that an AI system might need.
Let's now explore the use of predicate logic as a way of representing knowledge by looking at a specific example. Consider the following set of sentences:
The facts described by these sentences can be represented as a set of wff's in predicate logic as follows:
man(Marcus)
This representation captures the critical fact of Marcus being a man. It fails to capture some of the information in the English sentence, namely the notion of past tense. Whether this omission is acceptable or not depends on the use to which we intend to put the knowledge. For this simple example, it will be all right.
Pompeian(Marcus)
∀x : Pompeian(x) -> Roman(x)
ruler(Caesar)
Here we ignore the fact that proper names are often not references to unique individuals, since many people share the same name. Sometimes deciding which of several people of the same name is being referred to in a particular statement may require a fair amount of knowledge and reasoning.
∀x : Roman(x) -> loyalto(x, Caesar) V hate(x, Caesar)
In English, the word "or" sometimes means the logical inclusive-or and sometimes means the logical exclusive-or (XOR). Here we have used the inclusive interpretation. Some people will argue, however, that this English sentence is really stating an exclusive-or. To express that, we would have to write:
∀x : Roman(x) -> [(loyalto(x, Caesar) V hate(x, Caesar)) ^
¬(loyalto(x, Caesar) ^ hate(x, Caesar))]
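The difference between the two readings can be checked mechanically. The short Python sketch below (function names invented for this illustration) confirms by truth table that the formula above agrees with exclusive-or, and that the two readings differ only when both disjuncts hold:

```python
from itertools import product

def inclusive_reading(a, b):
    # loyalto V hate, read as inclusive-or
    return a or b

def exclusive_reading(a, b):
    # (loyalto V hate) ^ not(loyalto ^ hate), the formula above
    return (a or b) and not (a and b)

# The exclusive reading matches XOR on every assignment.
for a, b in product([False, True], repeat=2):
    assert exclusive_reading(a, b) == (a != b)

# The two readings disagree only when both disjuncts are true.
assert inclusive_reading(True, True) and not exclusive_reading(True, True)
```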
∀x : ∃y : loyalto(x, y)
A major problem that arises when trying to convert English sentences into logical statements is the scope of quantifiers. Does this sentence say, as we have assumed in writing the logical formula above, that for each person there exists someone to whom he or she is loyal, possibly a different someone for everyone? Or does it say that there exists someone to whom everyone is loyal (which would be written as ∃y : ∀x : loyalto(x, y))? Often only one of the two interpretations seems likely, so people tend to favor it.
∀x : ∀y : [person(x) ^ ruler(y) ^ tryassassinate(x, y)] -> ¬loyalto(x, y)
This sentence, too, is ambiguous. Does it mean that the only rulers that people try to assassinate are those to whom they are not loyal (the interpretation used here), or does it mean that the only thing people try to do is to assassinate rulers to whom they are not loyal?
In representing this sentence the way we did, we have chosen to write "try to assassinate" as a single predicate. This gives a fairly simple representation with which we can reason about trying to assassinate. But using this representation, the connections between trying to assassinate and trying to do other things and between trying to assassinate and actually assassinating could not be made easily. If such connections were necessary, we would need to choose a different representation.
tryassassinate(Marcus, Caesar)
From this brief attempt to convert English sentences into logical statements, it should be clear how difficult the task is. For a good description of many issues involved in this process, see Reichenbach [1947].
Now suppose that we want to use these statements to answer the question
Was Marcus loyal to Caesar?
It seems that using 7 and 8, we should be able to prove that Marcus was not loyal to Caesar (again ignoring the distinction between past and present tense). Now let's try to produce a formal proof, reasoning backward from the desired goal:
¬loyalto(Marcus, Caesar)
In order to prove the goal, we need to use the rules of inference to transform it into another goal (or possibly a set of goals) that can in turn be transformed, and so on, until there are no unsatisfied goals remaining. This process may require the search of an AND-OR graph (as described in Section 3.4) when there are alternative ways of satisfying individual goals. Here, for simplicity, we show only a single path. Figure 5.2 shows an attempt to produce a proof of the goal by reducing the set of necessary but as yet unattained goals to the empty set. The attempt fails, however, since there is no way to satisfy the goal person(Marcus) with the statements we have available.
The problem is that, although we know that Marcus was a man, we do not have any way to conclude from that that Marcus was a person. We need to add the representation of another fact to our system, namely:
∀x : man(x) -> person(x)
Now we can satisfy the last goal and produce a proof that Marcus was not loyal to Caesar.
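The backward-chaining idea in this proof can be sketched in a few lines of Python. This is a toy, not the book's procedure: it assumes facts stored as (predicate, argument) pairs and rules with a single premise, and the encoding is invented for this illustration:

```python
# Toy knowledge base: facts as (predicate, argument) pairs, and
# single-premise rules mapping a conclusion predicate to its premise,
# e.g. "person": "man" stands for man(x) -> person(x).
facts = {("man", "Marcus"), ("Pompeian", "Marcus")}
rules = {
    "person": "man",
    "Roman": "Pompeian",
}

def prove(pred, arg):
    """Reason backward: a goal succeeds if it is a known fact or if
    the premise of a rule concluding it can itself be proved."""
    if (pred, arg) in facts:
        return True
    premise = rules.get(pred)
    return premise is not None and prove(premise, arg)

print(prove("person", "Marcus"))   # True, via man(x) -> person(x)
print(prove("ruler", "Marcus"))    # False: no fact or rule applies
```

Without the "person" rule in the table, the first goal would fail exactly as the attempt in Figure 5.2 does.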
From this simple example, we see that three important issues must be addressed in the process of converting English sentences into logical statements and then using those statements to deduce new ones:
An additional problem arises in situations where we do not know in advance which statements to deduce. In the example just presented, the object was to answer the question "Was Marcus loyal to Caesar?" How would a program decide whether it should try to prove
loyalto(Marcus, Caesar)
or
¬loyalto(Marcus, Caesar)
There are several things it could do. It could abandon the strategy we have outlined of reasoning backward from a proposed truth to the axioms and instead try to reason forward and see which answer it gets to. The problem with this approach is that, in general, the branching factor going forward from the axioms is so great that it would probably not get to either answer in any reasonable amount of time. A second thing it could do is use some sort of heuristic rules for deciding which answer is more likely and then try to prove that one first. If it fails to find a proof after some reasonable amount of effort, it can try the other answer. This notion of limited effort is important, since any proof procedure we use may not halt if given a nontheorem. Another thing it could do is simply try to prove both answers simultaneously and stop when one effort is successful. Even here, however, if there is not enough information available to answer the question with certainty, the program may never halt. Yet a fourth strategy is to try both to prove one answer and to disprove it, and to use information gained in one of the processes to guide the other.
In Chapter 4, we discussed the specific attributes instance and isa and described the important role they play in a particularly useful form of reasoning, property inheritance. But if we look back at the way we just represented our knowledge about Marcus and Caesar, we do not appear to have used these attributes at all. We certainly have not used predicates with those names. Why not? The answer is that although we have not used the predicates instance and isa explicitly, we have captured the relationships they are used to express, namely class membership and class inclusion.
Figure 5.3 shows the first five sentences of the last section represented in logic in three different ways. The first part of the figure contains the representations we have already discussed. In these representations, class membership is represented with unary predicates (such as Roman), each of which corresponds to a class. Asserting that P(x) is true is equivalent to asserting that x is an instance (or element) of P. The second part of the figure contains representations that use the instance predicate explicitly. The predicate instance is a binary one, whose first argument is an object and whose second argument is a class to which the object belongs. But these representations do not use an explicit isa predicate. Instead, subclass relationships, such as that between Pompeians and Romans, are described as shown in sentence 3. The implication rule there states that if an object is an instance of the subclass Pompeian then it is an instance of the superclass Roman. Note that this rule is equivalent to the standard set-theoretic definition of the subclass-superclass relationship. The third part contains representations that use both the instance and isa predicates explicitly. The use of the isa predicate simplifies the representation of sentence 3, but it requires that one additional axiom (shown here as number 6) be provided. This additional axiom describes how an instance relation and an isa relation can be combined to derive a new instance relation. This one additional axiom is general, though, and does not need to be provided separately for additional isa relations.
These examples illustrate two points. The first is fairly specific. It is that, although class and superclass memberships are important facts that need to be represented, those memberships need not be represented with predicates labeled instance and isa. In fact, in a logical framework it is usually unwieldy to do that, and instead unary predicates corresponding to the classes are often used. The second point is more general. There are usually several different ways of representing a given fact within a particular representational framework, be it logic or anything else. The choice depends partly on which deductions need to be supported most efficiently and partly on taste. The only important thing is that within a particular knowledge base consistency of representation is critical. Since any particular inference rule is designed to work on one particular form of representation, it is necessary that all the knowledge to which that rule is intended to apply be in the form that the rule demands. Many errors in the reasoning performed by knowledge-based programs are the result of inconsistent representation decisions. The moral is simply to be careful.
There is one additional point that needs to be made here on the subject of the use of isa hierarchies in logic-based systems. The reason that these hierarchies are so important is not that they permit the inference of superclass membership. It is that by permitting the inference of superclass membership, they permit the inference of other properties associated with membership in that superclass. So, for example, in our sample knowledge base it is important to be able to conclude that Marcus is a Roman because we have some relevant knowledge about Romans, namely that they either hate Caesar or are loyal to him. But recall that in the baseball example of Chapter 4, we were able to associate knowledge with superclasses that could then be overridden by more specific knowledge associated either with individual instances or with subclasses. In other words, we recorded default values that could be accessed whenever necessary. For example, there was a height associated with adult males and a different height associated with baseball players. Our procedure for manipulating the isa hierarchy guaranteed that we always found the correct (i.e., most specific) value for any attribute. Unfortunately, reproducing this result in logic is difficult.
Suppose, for example, that, in addition to the facts we already have, we add the following. [Footnote: For convenience, we now return to our original notation using unary predicates to denote class relations.]
Pompeian(Paulus)
¬[loyalto(Paulus, Caesar) V hate(Paulus, Caesar)]
In other words, suppose we want to make Paulus an exception to the general rule about Romans and their feelings toward Caesar. Unfortunately, we cannot simply add these facts to our existing knowledge base the way we could just add new nodes into a semantic net. The difficulty is that if the old assertions are left unchanged, then the addition of the new assertions makes the knowledge base inconsistent. In order to restore consistency, it is necessary to modify the original assertion to which an exception is being made. So our original sentence 5 must become:
∀x : Roman(x) ^ ¬eq(x, Paulus) -> loyalto(x, Caesar) V hate(x, Caesar)
In this framework, every exception to a general rule must be stated twice, once in a particular statement and once in an exception list that forms part of the general rule. This makes the use of general rules in this framework less convenient and less efficient when there are exceptions than is the use of general rules in a semantic net.
A further problem arises when information is incomplete and it is not possible to prove that no exceptions apply in a particular instance. But we defer consideration of this problem until Chapter 7.
In the example we explored in the last section, all the simple facts were expressed as combinations of individual predicates, such as:
tryassassinate(Marcus, Caesar)
This is fine if the number of facts is not very large or if the facts themselves are sufficiently unstructured that there is little alternative. But suppose we want to express simple facts, such as the following greater-than and less-than relationships:
Clearly we do not want to have to write out the representation of each of these facts individually. For one thing, there are infinitely many of them. But even if we only consider the finite number of them that can be represented, say, using a single machine word per number, it would be extremely inefficient to store explicitly a large set of statements when we could, instead, so easily compute each one as we need it. Thus it becomes useful to augment our representation by these computable predicates. Whatever proof procedure we use, when it comes upon one of these predicates, instead of searching for it explicitly in the database or attempting to deduce it by further reasoning, we can simply invoke a procedure, which we will specify in addition to our regular rules, that will evaluate it and return true or false.
It is often also useful to have computable functions as well as computable predicates. Thus we might want to be able to evaluate the truth of
gt(2+3,1)
To do so requires that we first compute the value of the plus function given the arguments 2 and 3, and then send the arguments 5 and 1 to gt.
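A minimal sketch of this evaluation scheme in Python follows. Rather than storing infinitely many facts such as gt(2, 1), the prover calls a procedure when it meets one of these symbols; the dispatch tables and the tuple encoding of terms are assumptions made for this illustration:

```python
# Computable predicates and functions, looked up by name.
computable_predicates = {
    "gt": lambda x, y: x > y,
    "lt": lambda x, y: x < y,
}
computable_functions = {
    "plus": lambda x, y: x + y,
}

def evaluate(term):
    """Numbers stand for themselves; a tuple such as ("plus", 2, 3)
    is computed by calling the named function on evaluated arguments."""
    if isinstance(term, tuple):
        fn, *args = term
        return computable_functions[fn](*(evaluate(a) for a in args))
    return term

def holds(pred, *args):
    """Evaluate a computable predicate instead of searching for it."""
    return computable_predicates[pred](*(evaluate(a) for a in args))

# gt(2+3, 1): first compute plus(2, 3) = 5, then test gt(5, 1).
print(holds("gt", ("plus", 2, 3), 1))   # True
```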
The next example shows how these ideas of computable functions and predicates can be useful. It also makes use of the notion of equality and allows equal objects to be substituted for each other whenever it appears helpful to do so during a proof.
Consider the following set of facts, again involving Marcus:
man(Marcus)
Again we ignore the issue of tense.
Pompeian(Marcus)
born(Marcus, 40)
For simplicity, we will not represent A.D. explicitly, just as we normally omit it in everyday discussions. If we ever need to represent dates B.C., then we will have to decide on a way to do that, such as by using negative numbers. Notice that the representation of a sentence does not have to look like the sentence itself as long as there is a way to convert back and forth between them. This allows us to choose a representation, such as positive and negative numbers, that is easy for a program to work with.
∀x : man(x) -> mortal(x)
erupted(volcano, 79) ^ ∀x : [Pompeian(x) -> died(x, 79)]
This sentence clearly asserts the two facts represented above. It may also assert another that we have not shown, namely that the eruption of the volcano caused the death of the Pompeians. People often assume causality between concurrent events if such causality seems plausible.
Another problem that arises in interpreting this sentence is that of determining the referent of the phrase "the volcano." There is more than one volcano in the world. Clearly the one referred to here is Vesuvius, which is near Pompeii and erupted in 79 A.D. In general, resolving references such as these can require both a lot of reasoning and a lot of additional knowledge.
There are several ways that the content of this sentence could be expressed. For example, we could introduce a function age and assert that its value is never greater than 150. The representation shown above is simpler, though, and it will suffice for this example.
now = 1991
Here we will exploit the idea of equal quantities that can be substituted for each other.
Now suppose we want to answer the question "Is Marcus alive?" A quick glance through the statements we have suggests that there may be two ways of deducing an answer. Either we can show that Marcus is dead because he was killed by the volcano or we can show that he must be dead because he would otherwise be more than 150 years old, which we know is not possible. As soon as we attempt to follow either of those paths rigorously, however, we discover, just as we did in the last example, that we need some additional knowledge. For example, our statements talk about dying, but they say nothing that relates to being alive, which is what the question is asking. So we add the following facts:
∀x : ∀t : [alive(x, t) -> ¬dead(x, t)] ^ [¬dead(x, t) -> alive(x, t)]
This is not strictly correct, since ¬dead implies alive only for animate objects. (Chairs can be neither dead nor alive.) Again, we will ignore this for now. This is an example of the fact that rarely do two expressions have truly identical meanings in all circumstances.
This representation says that one is dead in all years after the one in which one died. It ignores the question of whether one is dead in the year in which one died.
To answer that requires breaking time up into smaller units than years. If we do that, we can then add rules that say such things as "One is dead at time(year1, month1) if one died during (year1, month2) and month2 precedes month1." We can extend this to days, hours, etc., as necessary. But we do not want to reduce all time statements to that level of detail, which is unnecessary and often not available.
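The finer-grained rule can be sketched directly, assuming times are encoded as (year, month) tuples; since Python compares tuples lexicographically, "died at an earlier (year, month)" is simply the < test. The function name and encoding are illustrative, not the book's notation:

```python
def dead_at(time_of_death, t):
    """One is dead at time t if one died strictly before t.
    Times are (year, month) tuples, compared lexicographically."""
    return time_of_death < t

assert dead_at((79, 8), (79, 9))       # died in month 8, dead in month 9
assert not dead_at((79, 8), (79, 8))   # the month of death itself is left open
assert dead_at((79, 12), (80, 1))      # crossing a year boundary also works
```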
A summary of all the facts we have now represented is given in Figure 5.4. (The numbering is changed slightly because sentence 5 has been split into two parts.) Now let's attempt to answer the question "Is Marcus alive?" by proving:
¬alive(Marcus, now)

Two such proofs are shown in Figures 5.5 and 5.6. The term nil at the end of each proof indicates that the list of conditions remaining to be proved is empty and so the proof has succeeded. Notice in those proofs that whenever a statement of the form:
a ^ b -> c
was used, a and b were set up as independent subgoals. In one sense they are, but in another sense they are not if they share the same bound variables, since, in that case, consistent substitutions must be made in each of them. For example, in Figure 5.6 look at the step justified by statement 3. We can satisfy the goal
that a match exists and a way of guaranteeing uniform substitutions throughout a proof. Mechanisms for doing both those things are discussed below.
From looking at the proofs we have just shown, two things should be clear:
The first of these observations suggests that if we want to be able to do nontrivial reasoning, we are going to need some statements that allow us to take bigger steps along the way. These should represent the facts that people gradually acquire as they become experts. How to get computers to acquire them is a hard problem for which no very good answer is known.
The second observation suggests that actually building a program to do what people do in producing proofs such as these may not be easy. In the next section, we introduce a proof procedure called resolution that reduces some of the complexity because it operates on statements that have first been converted to a single canonical form.
in reasoning with statements in predicate logic. Resolution is such a procedure, which gains its efficiency from the fact that it operates on statements that have been converted to a very convenient standard form, which is described below.
Resolution produces proofs by refutation. In other words, to prove a statement (i.e., show that it is valid), resolution attempts to show that the negation of the statement produces a contradiction with the known statements (i.e., that it is unsatisfiable). This approach contrasts with the technique that we have been using to generate proofs by chaining backward from the theorem to be proved to the axioms. Further discussion of how resolution operates will be much more straightforward after we have discussed the standard form in which statements will be represented, so we defer it until then.
Suppose we know that all Romans who know Marcus either hate Caesar or think that anyone who hates anyone is crazy. We could represent that in the following wff:
To use this formula in a proof requires a complex matching process. Then, having matched one piece of it, such as thinkcrazy(x, y), it is necessary to do the right thing with the rest of the formula including the pieces in which the matched part is embedded and those in which it is not. If the formula were in a simpler form, this process would be much easier. The formula would be easier to work with if
Conjunctive normal form [Davis and Putnam, 1960] has both of these properties. For example, the formula given above for the feelings of Romans who know Marcus would be represented in conjunctive normal form as
¬Roman(x) V ¬know(x, Marcus) V
hate(x, Caesar) V ¬hate(y, z) V thinkcrazy(x, y)
Since there exists an algorithm for converting any wff into conjunctive normal form, we lose no generality if we employ a proof procedure (such as resolution) that operates only on wff's in this form. In fact, for resolution to work, we need to go one step further. We need to reduce a set of wff's to a set of clauses, where a clause is defined to be a wff in conjunctive normal form but with no instances of the connector ^. We can do this by first converting each wff into conjunctive normal form and then breaking apart each such expression into clauses, one for each conjunct. All the conjuncts will be considered to be conjoined together as the proof procedure operates. To convert a wff into clause form, perform the following sequence of steps.
∀x : ¬[Roman(x) ^ know(x, Marcus)] V
[hate(x, Caesar) V (∀y : ¬(∃z : hate(y, z)) V thinkcrazy(x, y))]
∀x : [¬Roman(x) V ¬know(x, Marcus)] V
[hate(x, Caesar) V (∀y : ∀z : ¬hate(y, z) V thinkcrazy(x, y))]
∀x : P(x) V ∀x : Q(x)
would be converted to
∀x : P(x) V ∀y : Q(y)
This step is in preparation for the next.
∀x : ∀y : ∀z : [¬Roman(x) V ¬know(x, Marcus)] V
[hate(x, Caesar) V (¬hate(y, z) V thinkcrazy(x, y))]
At this point, the formula is in what is known as prenex normal form. It consists of a prefix of quantifiers followed by a matrix, which is quantifier-free.
∃y : President(y)
can be transformed into the formula
President(S1)
where S1 is a function with no arguments that somehow produces a value that satisfies President.
If existential quantifiers occur within the scope of universal quantifiers, then the value that satisfies the predicate may depend on the values of the universally quantified variables. For example, in the formula
∀x : ∃y : father-of(y, x)
the value of y that satisfies father-of depends on the particular value of x. Thus we must generate functions with the same number of arguments as the number of universal quantifiers in whose scope the expression occurs. So this example would be transformed into
∀x : father-of(S2(x), x)
These generated functions are called Skolem functions. Sometimes ones with no arguments are called Skolem constants.
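Skolemization can be sketched on an explicit quantifier prefix: each existential variable is replaced by a fresh function of the universal variables in whose scope it occurs. The prefix encoding below is an assumption made for this illustration:

```python
from itertools import count

# A supply of fresh Skolem names: S1, S2, ...
fresh_names = (f"S{i}" for i in count(1))

def skolemize(prefix):
    """prefix: list of ("forall" | "exists", var) pairs, outermost
    first. Returns a substitution mapping each existential variable
    to (skolem name, tuple of governing universal variables)."""
    universals, subst = [], {}
    for quantifier, var in prefix:
        if quantifier == "forall":
            universals.append(var)
        else:
            # Skolem function of the universals seen so far.
            subst[var] = (next(fresh_names), tuple(universals))
    return subst

# Exists y : President(y)        becomes President(S1), a Skolem constant.
print(skolemize([("exists", "y")]))
# Forall x : Exists y : father-of(y, x)  becomes father-of(S2(x), x).
print(skolemize([("forall", "x"), ("exists", "y")]))
```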
[¬Roman(x) V ¬know(x, Marcus)] V
[hate(x, Caesar) V (¬hate(y, z) V thinkcrazy(x, y))]
¬Roman(x) V ¬know(x, Marcus) V
hate(x, Caesar) V ¬hate(y, z) V thinkcrazy(x, y)
However, it is also frequently necessary to exploit the distributive property [i.e., (a ^ b) V c = (a V c) ^ (b V c)]. For example, the formula
(winter ^ wearingboots) V (summer ^ wearingsandals)
becomes, after one application of the rule
(winter V (summer ^ wearingsandals)) ^
(wearingboots V (summer ^ wearingsandals))
and then, after a second application, required since there are still conjuncts joined by OR's,
(winter V summer) ^
(winter V wearingsandals) ^
(wearingboots V summer) ^
(wearingboots V wearingsandals)
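The repeated application of the distributive rule can be sketched directly, assuming a tiny formula encoding in which atoms are strings and ("or", a, b) / ("and", a, b) are nodes (negations are assumed to have already been pushed inward):

```python
def distribute(f):
    """Push OR inward over AND: (a ^ b) V c  =>  (a V c) ^ (b V c)."""
    if isinstance(f, str):              # an atom: nothing to do
        return f
    op, left, right = f
    left, right = distribute(left), distribute(right)
    if op == "or":
        if isinstance(left, tuple) and left[0] == "and":
            return ("and", distribute(("or", left[1], right)),
                           distribute(("or", left[2], right)))
        if isinstance(right, tuple) and right[0] == "and":
            return ("and", distribute(("or", left, right[1])),
                           distribute(("or", left, right[2])))
    return (op, left, right)

wff = ("or", ("and", "winter", "wearingboots"),
             ("and", "summer", "wearingsandals"))
# Yields the four clauses derived above:
# (winter V summer) ^ (winter V wearingsandals) ^
# (wearingboots V summer) ^ (wearingboots V wearingsandals)
print(distribute(wff))
```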
(∀x : P(x) ^ Q(x)) = ∀x : P(x) ^ ∀x : Q(x)
Thus since each clause is a separate conjunct and since all the variables are universally quantified, there need be no relationship between the variables of two clauses, even if they were generated from the same wff.
Performing this final step of standardization is important because during the resolution procedure it is sometimes necessary to instantiate a universally quantified variable (i.e., substitute for it a particular value). But, in general, we want to keep clauses in their most general form as long as possible. So when a variable is instantiated, we want to know the minimum number of substitutions that must be made to preserve the truth value of the system.
After applying this entire procedure to a set of wff's, we will have a set of clauses, each of which is a disjunction of literals. These clauses can now be exploited by the resolution procedure to generate proofs.
The resolution procedure is a simple iterative process: at each step, two clauses, called the parent clauses, are compared (resolved), yielding a new clause that has been inferred from them. The new clause represents ways that the two parent clauses interact with each other. Suppose that there are two clauses in the system:
winter V summer
¬winter V cold
Recall that this means that both clauses must be true (i.e., the clauses, although they look independent, are really conjoined).
Now we observe that precisely one of winter and ¬winter will be true at any point. If winter is true, then cold must be true to guarantee the truth of the second clause. If ¬winter is true, then summer must be true to guarantee the truth of the first clause. Thus we see that from these two clauses we can deduce
summer V cold
This is the deduction that the resolution procedure will make. Resolution operates by taking two clauses that each contain the same literal, in this example, winter. The literal must occur in positive form in one clause and in negative form in the other. The resolvent is obtained by combining all of the literals of the two parent clauses except the ones that cancel.
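A single resolution step can be sketched in Python, with a clause represented as a frozenset of literals and "~p" standing for the negation of atom "p" (an encoding chosen for this illustration):

```python
def negate(lit):
    """~p for p, and p for ~p."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Yield every resolvent of the two parent clauses: for each pair
    of complementary literals, combine all the remaining literals."""
    for lit in c1:
        if negate(lit) in c2:
            yield (c1 - {lit}) | (c2 - {negate(lit)})

winter_or_summer = frozenset({"winter", "summer"})
not_winter_or_cold = frozenset({"~winter", "cold"})
# The single resolvent combines the leftover literals: summer V cold.
print(list(resolve(winter_or_summer, not_winter_or_cold)))
```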
If the clause that is produced is the empty clause, then a contradiction has been found. For example, the two clauses
winter
¬winter
will produce the empty clause. If a contradiction exists, then eventually it will be found. Of course, if no contradiction exists, it is possible that the procedure will never terminate, although, as we will see, there are often ways of detecting that no contradiction exists.
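The resolvent computation just described is easy to sketch in code. The following is an illustrative fragment, not from the text: it represents a clause as a frozen set of literal strings, with a `~` prefix marking negation, and returns every resolvent of two parent clauses. An empty frozen set in the result is the empty clause, i.e., a contradiction.

```python
def negate(lit):
    """Flip the sign of a literal: winter <-> ~winter."""
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses.

    A clause is a frozenset of literals; negated literals are
    prefixed with '~'. For each literal in c1 whose negation occurs
    in c2, combine the remaining literals of both parents.
    """
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in c1 if negate(lit) in c2]

# winter V summer  and  ~winter V cold  resolve to  summer V cold
print(resolve(frozenset({'winter', 'summer'}),
              frozenset({'~winter', 'cold'})))

# winter  and  ~winter  resolve to the empty clause
print(resolve(frozenset({'winter'}), frozenset({'~winter'})))
```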
So far, we have discussed only resolution in propositional logic. In predicate logic, the situation is more complicated since we must consider all possible ways of substituting values for the variables. The theoretical basis of the resolution procedure in predicate logic is Herbrand's theorem [Chang and Lee, 1973], which tells us the following:
The second part of the theorem is important if there is to exist any computational procedure for proving unsatisfiability, since in a finite amount of time no procedure will be able to examine an infinite set. The first part suggests that one way to go about finding a contradiction is to try the possible substitutions systematically and see if each produces a contradiction. But that is highly inefficient. The resolution principle, first introduced by Robinson [1965], provides a way of finding contradictions by trying a minimum number of substitutions. The idea is to keep clauses in their general form as long as possible and only introduce specific substitutions when they are required. For more details on different kinds of resolution, see Stickel [1988].
In order to make it clear how resolution works, we first present the resolution procedure for propositional logic. We then expand it to include predicate logic.
In propositional logic, the procedure for producing a proof by resolution of proposition P with respect to a set of axioms F is the following.
Let's look at a simple example. Suppose we are given the axioms shown in the first column of Figure 5.7 and we want to prove R. First we convert the axioms to clause form, as shown in the second column of the figure. Then we negate R, producing ¬R, which is already in clause form. Then we begin selecting pairs of clauses to resolve together. Although any pair of clauses can be resolved, only those pairs that contain complementary literals will produce a resolvent that is likely to lead to the goal of producing the empty clause (shown as a box). We might, for example, generate the sequence of resolvents shown in Figure 5.8. We begin by resolving with the clause ¬R since that is one of the clauses that must be involved in the contradiction we are trying to find.
One way of viewing the resolution process is that it takes a set of clauses that are all assumed to be true and, based on information provided by the others, it generates new clauses that represent restrictions on the way each of those original clauses can be made true. A contradiction occurs when a clause becomes so restricted that there is no way it can be true. This is indicated by the generation of the empty clause. To see how this works, let's look again at the example. In order for proposition 2 to be true, one of three things must be true: ¬P, ¬Q, or R. But we are assuming that ¬R is true. Given that, the only way for proposition 2 to be true is for one of two things to be true: ¬P or ¬Q. That is what the first resolvent clause says. But proposition 1 says that P is true, which means that ¬P cannot be true, which leaves only one way for proposition 2 to be true, namely for ¬Q to be true (as shown in the second resolvent clause). Proposition 4 can be true if either ¬T or Q is true. But since we now know that ¬Q must be true, the only way for proposition 4 to be true is for ¬T to be true (the third resolvent). But proposition 5 says that T is true. Thus there is no way for all of these clauses to be true in a single interpretation. This is indicated by the empty clause (the last resolvent).
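The walkthrough above can also be carried out mechanically. The sketch below is illustrative only: it uses the clauses actually named in the walkthrough (proposition 3 is not cited there and is omitted), encodes them as frozen sets of `~`-prefixed literal strings, and saturates the clause set by resolution, reporting whether the empty clause is ever derived.

```python
def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(c1, c2):
    return [(c1 - {l}) | (c2 - {negate(l)})
            for l in c1 if negate(l) in c2]

def refutable(clauses):
    """Saturate by resolution; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                for r in resolve(a, b):
                    if not r:          # empty clause: contradiction found
                        return True
                    new.add(r)
        if new <= clauses:             # nothing new: no contradiction exists
            return False
        clauses |= new

# Clauses named in the walkthrough, plus the negated goal ~R:
clauses = [frozenset({'P'}),              # proposition 1
           frozenset({'~P', '~Q', 'R'}),  # proposition 2
           frozenset({'~T', 'Q'}),        # proposition 4
           frozenset({'T'}),              # proposition 5
           frozenset({'~R'})]             # negated goal
print(refutable(clauses))                 # prints True
```

Note that this exhaustive saturation loop is far less selective than the strategy described in the text, which begins with the clause ¬R because it must participate in any contradiction.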
In propositional logic, it is easy to determine that two literals cannot both be true at the same time: simply look for L and ¬L. In predicate logic, this matching process is more complicated, since the arguments of the predicates must be considered. For example, man(John) and ¬man(John) is a contradiction, while man(John) and ¬man(Spot) is not. Thus, in order to determine contradictions, we need a matching procedure that compares two literals and discovers whether there exists a set of substitutions that makes them identical. There is a straightforward recursive procedure, called the unification algorithm, that does just this.
The basic idea of unification is very simple. To attempt to unify two literals, we first check if their initial predicate symbols are the same. If so, we can proceed. Otherwise, there is no way they can be unified, regardless of their arguments. For example, the two literals
tryassassinate(Marcus, Caesar)
hate(Marcus, Caesar)
cannot be unified. If the predicate symbols match, then we must check the arguments, one pair at a time. If the first pair matches, we can continue with the second, and so on. To test each argument pair, we can simply call the unification procedure recursively. The matching rules are simple. Different constants or predicates cannot match; identical ones can. A variable can match another variable, any constant, or a predicate expression, with the restriction that the predicate expression must not contain any instances of the variable being matched.
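These matching rules translate directly into a recursive procedure. The following sketch is illustrative, not the text's algorithm: it adopts the convention (an assumption) that variables are strings prefixed with '?', that compound terms are tuples whose first element is the predicate symbol, and that a unifier is returned as a substitution dictionary (or None on failure). The restriction that a predicate expression must not contain the variable being matched appears here as the occur check.

```python
def is_var(t):
    # Convention (an assumption, not from the text):
    # variables are strings beginning with '?', e.g. '?x'.
    return isinstance(t, str) and t.startswith('?')

def walk(t, subst):
    # Chase a variable through the substitution; apply it inside tuples.
    while is_var(t) and t in subst:
        t = subst[t]
    if isinstance(t, tuple):
        return tuple(walk(a, subst) for a in t)
    return t

def occurs(v, t):
    # The occur check: v must not appear anywhere inside t.
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a) for a in t)

def unify(x, y, subst=None):
    """Return a substitution unifying x and y, or None if impossible."""
    if subst is None:
        subst = {}
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return None if occurs(x, y) else {**subst, x: y}
    if is_var(y):
        return unify(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None   # different constants or predicate symbols

# Different predicate symbols cannot be unified:
print(unify(('tryassassinate', 'Marcus', 'Caesar'),
            ('hate', 'Marcus', 'Caesar')))          # prints None

# P(x, x) and P(y, z): the result binds x to y and y to z,
# the composed substitution (z/y)(y/x) discussed below.
print(unify(('P', '?x', '?x'), ('P', '?y', '?z')))
```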
The only complication in this procedure is that we must find a single, consistent substitution for the entire literal, not separate ones for each piece of it. To do this, we must take each substitution that we find and apply it to the remainder of the literals before we continue trying to unify them. For example, suppose we want to unify the expressions
P(x, x)
P(y, z)
The two instances of P match fine. Next we compare x and y, and decide that if we substitute y for x, they could match. We will write that substitution as
y/x
(We could, of course, have decided instead to substitute x for y, since they are both just dummy variable names. The algorithm will simply pick one of these two substitutions.) But now, if we simply continue and match x and z, we produce the substitution z/x. But we cannot substitute both y and z for x, so we have not produced a consistent substitution.
What we need to do after finding the first substitution y/x is to make that substitution throughout the literals, giving
P(y, y)
P(y, z)
Now we can attempt to unify arguments y and z, which succeeds with the substitution z/y. The entire unification process has now succeeded with a substitution that is the composition of the two substitutions we found. We write the composition as
(z/y)(y/x)
following standard notation for function composition. In general, the substitution