Computers have an entirely deserved reputation for lacking common sense. Anyone who has ever received a bill for $0.00 from an accounting program can attest to this fact. An AI program may possess more knowledge than an accounting program, but it still computes using primitives that it knows nothing about. For example, consider the following interaction between a medical diagnosis system and a human (adapted from Lenat and Guha [1990]):
System: How old is the patient?
Human: (looking at his 1957 Chevrolet): 33.
System: Are there any spots on the patient's body?
Human: (noticing rust spots): Yes.
System: What color are the spots?
Human: Reddish-brown.
System: The patient has measles (probability 0.9).
Obviously, the system does not really know what measles are, what spots are, or what the difference between cars and people is. Even within its specialty, the system is unaware of fundamental facts, for example, that humans have two arms. Clearly, what the system lacks is knowledge. So far in this book, we have seen a number of techniques that can be used to enable an AI program to represent and reason with commonsense knowledge. For example, in predicate logic, one can state facts such as "if you die, you are dead at all later times." Frames can describe everyday objects, and scripts can describe the typical sequences of events. Nonmonotonic logics can support default reasoning, an important aspect of common sense.
As of yet, however, no program can match the commonsense reasoning powers of a five-year-old child. This is due, in part, to the large amount of knowledge required for common sense. In Section 10.3, we discussed the CYC program, one attempt to codify this information in a large knowledge base. In this chapter, we look more closely at the kinds of knowledge such a system must possess. In particular, we investigate how to understand and predict physical processes, how to model the behavior of materials, and how to reason about time and space. Memory is another key aspect of common sense. We look at how a memory can organize experiences, generalize them, and use them to solve new problems.
People know a great deal about how the physical world works. Consider the three situations shown in Figure 19.1.
Anyone can predict what will happen in these scenarios. In situation (a), the ball will probably bounce on the ground several times, then come to rest. In situation (b), the ball will travel upward and to the right, then downward. In situation (c), the ball will swing repeatedly from left to right, finally coming to rest in the middle. Now, how can we build a computer program to do this kind of reasoning?
The obvious answer is to program in the equations governing the physical motion of objects. These equations date back to classical physics and appear in every introductory physics textbook. For example, if the initial velocity of the ball in Figure 19.1(b) is v0, and the angle of its departure from the ground is θ, then the ball's position t seconds after being launched is given by:
height = v0 t sin θ - 0.5 g t²
distance = v0 t cos θ
We can do the same thing for Figures 19.1(a) and (c). For Figure 19.1(a), we need to know the coefficient of elasticity, and for Figure 19.1(c), we need to know the length of the string, the initial velocity of the ball, and its original horizontal displacement.
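To make the quantitative approach concrete, here is a minimal Python sketch that tabulates the flight of the ball in Figure 19.1(b) directly from the two equations above. The time step and the value g = 9.8 m/s² are our own choices, not anything prescribed by the text.

import math

def trajectory(v0, theta_deg, g=9.8, dt=0.1):
    """Tabulate (time, distance, height) for the launched ball of Figure 19.1(b)."""
    theta = math.radians(theta_deg)
    points = []
    t = 0.0
    while True:
        height = v0 * t * math.sin(theta) - 0.5 * g * t * t
        distance = v0 * t * math.cos(theta)
        if t > 0 and height < 0:      # the ball has come back down to the ground
            break
        points.append((t, distance, height))
        t += dt
    return points

# A ball launched at 10 m/s and 45 degrees:
for t, x, y in trajectory(10, 45):
    print(f"t={t:.1f}s  distance={x:.2f}m  height={y:.2f}m")

Note that the program needs exact numbers for v0, θ, and g before it can say anything at all, which is precisely the objection raised next.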
There are two problems with this approach. First, most people do not know these equations, yet they are perfectly capable of predicting what will happen in physical situations. Second, unlike equations, people do not need exact, numerical measures. They need only qualitative descriptions, such as the ones given at the beginning of this section. People seem to reason more abstractly than the equations would indicate. The goal of qualitative physics is to understand how to build and reason with abstract, numberless representations.
One might object to qualitative physics on the grounds that computers are actually well-suited to model physical processes with numerical equations. After all, a computer's ability to solve simultaneous equations far outstrips that of a human. However, we cannot escape common sense so easily. Equations themselves say nothing about when they should be used; this is usually left up to a human physicist. The commonsense knowledge employed by the physicist is part of what we must model. While some sort of qualitative physics seems necessary for automating the solution of physics problems, it is not sufficient by itself. The goal of qualitative physics is not to replace traditional physics but rather to provide a foundation for programs that can reason about the physical world. One such program might be a physics expert system.
As a further illustration of the need for qualitative models, consider a scene in which a glass of water leans precariously against a book on top of a cluttered desk. When the book is moved, the glass begins to tip over. At present, no set of differential equations can accurately model exactly how the spilling water will flow across the desk. Even if such a model existed, it would be impossible to measure the initial conditions accurately enough to make an accurate prediction. Yet anyone in this situation can immediately visualize what is likely to happen and take rapid action to prevent it.
Qualitative physics seeks to understand physical processes by building models of them. A model is an abstract representation that eliminates irrelevant details. For example, if we want to predict the motion of a ball, we may want to consider its mass and velocity, but probably not its color. Traditional physical models are built up from real-valued variables, rates of change, expressions, equations, and states. Qualitative physics provides similar building blocks, ones which are more abstract and nonnumeric.
Notice that qualitative addition differs from its quantitative counterpart, in part because the result of qualitative addition may be ambiguous. For example, if both glasses are between empty and full, it is impossible to know whether combining them will result in a full glass or not.
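A small Python sketch makes the ambiguity explicit. The quantity space {empty, between, full} and the addition table below are our own illustrative choices; the result of combining two qualitative amounts is expressed as the set of possible outcomes.

# Quantity space for the amount of water in a glass.
EMPTY, BETWEEN, FULL = "empty", "between", "full"

# Qualitative addition returns a SET of possible outcomes, since the result
# may be ambiguous. This table is an illustrative assumption, not a standard.
QADD = {
    (EMPTY, EMPTY):     {EMPTY},
    (EMPTY, BETWEEN):   {BETWEEN},
    (EMPTY, FULL):      {FULL},
    (BETWEEN, BETWEEN): {BETWEEN, FULL},   # ambiguous: may or may not fill the glass
    (BETWEEN, FULL):    {FULL},            # ignoring overflow
    (FULL, FULL):       {FULL},
}

def qadd(a, b):
    """Combine two qualitative amounts, returning every possible result."""
    return QADD.get((a, b)) or QADD[(b, a)]

print(qadd(BETWEEN, BETWEEN))   # {'between', 'full'} -- the ambiguous case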
No matter how states are represented, we need some way to reason about the information contained in them. A common reasoning method in qualitative physics is called qualitative simulation [Kuipers, 1986]. The idea is to construct a sequence of discrete "episodes" that occur as qualitative variables change values. States are linked to other states by qualitative rules. Some rules are very general. For example, one simulation rule states that variables reach closer values before reaching further ones, and another rule states that changing from one value to another consumes some finite amount of time. Other rules, such as the rules governing the motion of objects through the air, are more specific.
In systems that contain more than one object, rules must apply to all objects simultaneously. For example, consider an electrical device with many components. Because the components are connected, they influence one another. The constraint satisfaction technique (Chapters 3 and 14) is one efficient way of propagating a change in one component to other nearby components.
Since combining qualitative values can lead to ambiguity, a qualitative simulation must sometimes split into two or more possible paths. A network of all possible states and transitions for a qualitative system is called an envisionment. Figure 19.2 shows an envisionment of the bouncing ball system of Figure 19.1(a). This network allows a computer to reason about the behavior of the ball without recourse to numerical simulation. There are often many paths through an envisionment. Each path is called a history.
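The sketch below builds a toy envisionment for the bouncing ball and enumerates its histories. The particular state names and transitions are a simplification of our own, not the contents of Figure 19.2.

# A hand-built envisionment of the bouncing ball in Figure 19.1(a).
# Each state is a pair (position, motion); the state set and the transitions
# are our own simplification.
envisionment = {
    ("above-ground", "falling"):          [("on-ground", "momentarily-stopped")],
    ("on-ground", "momentarily-stopped"): [("above-ground", "rising"),   # it bounces again
                                           ("at-rest", "none")],          # or it stops
    ("above-ground", "rising"):           [("above-ground", "falling")],
    ("at-rest", "none"):                  [],                             # final state
}

def histories(state, limit=6):
    """Enumerate paths (histories) through the envisionment, up to a depth limit."""
    if limit == 0 or not envisionment[state]:
        yield [state]
        return
    for nxt in envisionment[state]:
        for rest in histories(nxt, limit - 1):
            yield [state] + rest

for h in histories(("above-ground", "falling")):
    print(" -> ".join(f"{pos}/{motion}" for pos, motion in h))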
Envisionments are useful in a number of applications. Most importantly, envisionments provide explanations for physical systems, and those explanations can be used to predict future behavior. In addition, if a system is an artificial one, such as a mechanical device, envisionments can be used to diagnose problems that occur when components fail to behave correctly. Envisionments can also be used to represent and/or repair inaccurate mental models that people may have. For more information about envisionments and qualitative simulation, see Weld and de Kleer [1988].
In order to write programs that automatically construct envisionments, we must represent qualitative knowledge about the behavior of particular kinds of processes, substances, spaces, devices, and so on. In the next section, we look at how to codify some of this knowledge.
A computer program that interacts with the real world must be able to reason about things like time, space, and materials. As fundamental and commonsensical as these concepts may be, modeling them turns out to present some thorny problems.
While physicists and philosophers still debate the true nature of time, we all manage to get by on a few basic commonsense notions. These notions help us to decide when to initiate actions, how to reason about others' actions, and how to determine relationships between events. For instance, if we know that the Franco-Prussian War preceded World War I and that the Battle of Verdun occurred during World War I, then we can easily infer that the Battle of Verdun must have occurred sometime after the Franco-Prussian War. A commonsense theory of time must account for reasoning of this kind.
The most basic notion of time is that it is occupied by events. These events occur during intervals, continuous spaces of time. What kinds of things might we want to say about an interval? An interval has a starting point and an ending point, and a duration defined by these points. Intervals can be related to other intervals, as we saw in the last paragraph. It turns out that there are exactly thirteen ways in which two non-empty time intervals can relate to one another. Figure 19.3 shows these relationships. As is clear from the figure, there are actually only seven distinct relationships: the relationship of equality plus six other relationships that have their own inverses.
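If intervals are given concrete numeric endpoints, it is easy to compute which of the thirteen relationships holds. The Python sketch below does this; the relation names are the conventional ones and may differ slightly from the labels in Figure 19.3, and the endpoint tests are our own operationalization.

def interval_relation(i, j):
    """Classify the relationship between intervals i and j, each a numeric
    (start, end) pair with start < end."""
    (a, b), (c, d) = i, j
    if b < c:             return "BEFORE"
    if d < a:             return "AFTER"
    if b == c:            return "MEETS"
    if d == a:            return "MET-BY"
    if a == c and b == d: return "EQUAL"
    if a == c:            return "STARTS" if b < d else "STARTED-BY"
    if b == d:            return "FINISHES" if a > c else "FINISHED-BY"
    if c < a and b < d:   return "DURING"
    if a < c and d < b:   return "CONTAINS"
    return "OVERLAPS" if a < c else "OVERLAPPED-BY"

print(interval_relation((1870, 1871), (1914, 1918)))      # BEFORE: Franco-Prussian War vs. WWI
print(interval_relation((1916.1, 1916.9), (1914, 1918)))  # DURING: Battle of Verdun vs. WWI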
Now we can state rules for drawing inferences about time intervals. For example, common sense tells us that the IS-BEFORE relation is transitive. That is, if event a occurred before event b and if event b occurred before event c, then event a must have
occurred before event c. How many such axioms will we need before we capture all of our basic commonsense notions of time? We can greatly simplify matters if we define some interval relationships in terms of other more basic ones. In fact, we can reduce all the relations in Figure 19.3 to the single relation MEETS. Here is the definition of the relation IS-BEFORE:
i IS-BEFORE j ≡ ∃k : (i MEETS k) ∧ (k MEETS j)
In other words, if i IS-BEFORE j, then there must be some k in between that MEETS both i and j. When the rest of the relations are defined similarly, MEETS becomes the only primitive relation, and we can write all our commonsense axioms in terms of it. Our first axiom states that points where intervals MEET are unique:
∀i, j : (∃k : (i MEETS k) ∧ (j MEETS k)) →
    (∀l : (i MEETS l) ↔ (j MEETS l))
In other words, i and j cannot MEET k at different points in time, so every event has a unique starting time. We can write a similar axiom to state that every event has a unique ending time. Next, we state that given two places where intervals meet, exactly one of the following three conditions must hold: the places are the same, the first place precedes the second, or the second precedes the first. [Footnote: In this formula, ⊕ should be read as "exclusive-or": p ⊕ q ⊕ r is logical shorthand for (p ∧ ¬q ∧ ¬r) ∨ (¬p ∧ q ∧ ¬r) ∨ (¬p ∧ ¬q ∧ r).]
∀i, j, k, l : (i MEETS j) ∧ (k MEETS l) →
    (i MEETS l) ⊕
    ∃m : (i MEETS m) ∧ (m MEETS l) ⊕
    ∃m : (k MEETS m) ∧ (m MEETS j)
There are two more axioms. One states that there are always intervals surrounding any given interval. This axiom turns out to be useful, although it prohibits any reasoning about infinite time intervals.
∀i : ∃j, k : (j MEETS i) ∧ (i MEETS k)
Finally, we can state that for any two intervals that MEET, there exists a continuous interval that is the union of the two:
∀i, j : (i MEETS j) →
    ∃a, b, (i + j) :
        (a MEETS i) ∧ (j MEETS b) ∧
        (a MEETS (i + j)) ∧ ((i + j) MEETS b)
These axioms encode a rich commonsense theory of time. They allow us to derive many facts, such as the transitivity of the IS-BEFORE relation. Suppose we know that a IS-BEFORE b and that b IS-BEFORE c. By the definition of IS-BEFORE, there must be some interval d that lies between a and b, i.e., a MEETS d and d MEETS b. By the union axiom, we can deduce the existence of an interval (d + b) such that there is an x that MEETS (d + b) and a y that is MET-BY (d + b). By the uniqueness of starting points, we can conclude that a also MEETS (d + b). Since b IS-BEFORE c, there must be an e between them. We can now construct another union interval (d + b + e), which we can prove MEETS c and IS-MET-BY a. Therefore a IS-BEFORE c.
This may seem like a roundabout way of doing things, and it is. There is nothing in the axioms themselves that dictates how they should be used in real programs. In fact, efficient implementations represent all thirteen temporal relations explicitly, making use of precompiled tables that record how the relations can interact. Constraint satisfaction is a useful technique for making inferences about these relations [Kautz, 1986]. The logical statements above are just a concise way of writing down one particular commonsense theory of time.
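The sketch below shows the flavor of such a table-driven approach. Only a handful of composition-table entries are filled in, just enough to run the Franco-Prussian War example from earlier in this section; a real implementation tabulates all 13 x 13 combinations, and the particular entries and helper names here are our own.

# COMPOSE[(r1, r2)] lists the possible relations between A and C, given that
# A r1 B and B r2 C hold. Only a few entries are shown.
COMPOSE = {
    ("BEFORE", "BEFORE"):   {"BEFORE"},
    ("BEFORE", "MEETS"):    {"BEFORE"},
    ("BEFORE", "CONTAINS"): {"BEFORE"},
    ("DURING", "BEFORE"):   {"BEFORE"},
}
INVERSE = {"BEFORE": "AFTER", "AFTER": "BEFORE", "DURING": "CONTAINS",
           "CONTAINS": "DURING", "MEETS": "MET-BY", "MET-BY": "MEETS"}

def infer(rel_ab, rel_bc):
    """Possible relations between A and C, given A rel_ab B and B rel_bc C."""
    return COMPOSE[(rel_ab, rel_bc)]

# Franco-Prussian War BEFORE WWI; Battle of Verdun DURING WWI.
# The relation between WWI and Verdun is therefore the inverse of DURING:
print(infer("BEFORE", INVERSE["DURING"]))   # {'BEFORE'}: the war precedes the battle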
In this book, we have often used examples from the blocks world. Primitives in this world include block names, actions like PICKUP and STACK, and predicates like ON(x, y). These primitives constitute a useful abstraction, but eventually we must break them down. If we want a real robot to achieve ON(x, y), then that robot had better know what ON really means, where x and y are located, how big they are, how they are shaped, how to align x on top of y so that x won't fall off, and so forth. These requirements become more apparent if we want to issue commands like "place block x near block y" or "lean block x up against block y." Commonsense notions of space are critical for living in the real world.
Objects have spatial extent, while events have temporal extent. We might therefore try to expand our commonsense theory of time into a commonsense theory of space. Because space is three-dimensional, there are far more than thirteen possible spatial relationships between two objects. For instance, consider one block perfectly aligned on top of another. The objects are EQUAL in the length and width dimensions, while they MEET in the height dimension. If the top block is smaller than the bottom one but still centered on top of it, then they still MEET in the height dimension, but we must use the spatial equivalent of IS-DURING to describe the length and width relationships. The main problem with this approach is that it generates a vast number of relations (namely 13³ = 2197), many of which are not very commonsensical. Moreover, a number of interesting spatial relations, such as "x curves around y," are not included. So we must consider another approach.
In our discussion of qualitative physics, we saw how to build abstract models by transforming real-valued variables into discrete quantity spaces. We can also view objects and spaces at various levels of abstraction. For instance, we can view a three-dimensional piece of paper as a two-dimensional sheet; similarly, we can view a three-dimensional highway as a one-dimensional curve. Hobbs [1985] proposed one very general mechanism for creating and manipulating abstract models. With this mechanism, we start out with a full-blown theory of the world, and then we construct a simpler, more abstract model by extracting a set of relevant properties. We then group objects into classes whose members are indistinguishable from each other as far as their relevant properties go. For example, as we drive along a highway, our major relevant property might be DISTANCE-TO-GOAL. This property effectively reduces the bits of concrete in the three-dimensional highway into a one-dimensional curve, where each point on the curve has a unique DISTANCE-TO-GOAL value. In a similar fashion, we can map real time intervals onto discrete time steps, spatial coordinates onto a two-dimensional grid, and so on. Choosing a set of relevant properties amounts to viewing the world at a particular level of granularity. Since different granularities are systematically related to each other, we can reason in a simplified model with relative assurance that our actions will be implementable in the real world.
The idea of granularity can be used to build a commonsense model of space [Kautz, 1985]. The basic idea is to define relations over spaces. The first relation is INSIDE(x, y, g), where x and y are spaces occupied by particular objects and g is the level of granularity at which those objects are viewed. For example, water is INSIDE a glass if the three-dimensional space taken up by the water is completely contained within the three-dimensional space taken up by the glass. If we view a highway as a three-dimensional slab of concrete, then a car driving along the highway would be considered ADJACENT to the highway, but not INSIDE of it. However, if some granularity g views the highway as a one-dimensional curve, then the relation INSIDE(Car, Highway, g) holds for as long as the car stays on the road. This is because the car and its position on the road are indistinguishable at that level of granularity.
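One way to make this concrete, purely as an illustration: represent a space as a set of fine-grained cells and a granularity as a function that throws away the properties we do not care about. The cell coordinates and helper names below are our own assumptions.

def inside(x, y, g):
    """x is INSIDE y at granularity g if every coarsened cell of x is also a
    coarsened cell of y."""
    return {g(cell) for cell in x} <= {g(cell) for cell in y}

# The highway as a thin three-dimensional slab of (x, y, z) cells; the car
# occupies two cells just above the road surface.
highway = {(x, y, 0) for x in range(100) for y in range(3)}
car     = {(10, 1, 1), (11, 1, 1)}

fine_3d   = lambda cell: cell       # keep all three dimensions
coarse_1d = lambda cell: cell[0]    # keep only the distance along the road

print(inside(car, highway, fine_3d))     # False: the car is merely ADJACENT to the slab
print(inside(car, highway, coarse_1d))   # True: viewed as a curve, the car is on the road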
We can now define a number of useful properties for curves, lines, surfaces, planes, and volumes. For example, here is the definition of a terminal point p of a curve c:
TERMINAL(p, c) ≡
    INSIDE(p, c) ∧
    ∀c1, c2 : INSIDE(c1, c) ∧ INSIDE(c2, c)
        ∧ INSIDE(p, c1) ∧ INSIDE(p, c2)
        → INSIDE(c1, c2) ∨ INSIDE(c2, c1)
In other words, p is a TERMINAL of c if, whenever two subcurves of c both include p, one must be a subcurve of the other. We can similarly define curve segments, adjoining curves, loops, and forks. Another useful class to define is that of a RIBBON:
RIBBON(object, side1, side2)
A ribbon is essentially a curve viewed at a coarser level of granularity, resulting in a two-dimensional ribbonlike shape. Our world contains many objects that are usefully viewed as ribbons, e.g., rivers and bridges (Figure 19.4). We can define several properties of curves as they relate to ribbons. For example, we can define when a curve runs ALONG one side of a ribbon and when it cuts ACROSS it.
These definitions assume that we have defined the terms PERPENDICULAR, AXIS, and ADJACENT, and that we have supplied the commonsense axiom that an object x is ADJACENT to an object y if any part of x is ADJACENT to y.
A robot could use the ALONG relation to plot a course down the river's edge. It could similarly use the ACROSS relation to navigate to the other side of the river. Unfortunately, the ACROSS relation is not enough, as the robot might try to cross the river without using the bridge. The robot is still missing one fact: you can't walk on water. That's common sense.
Why can't you walk on water? What happens if you turn a glass of water upside down? What happens when you pour water into the soil of a potted plant?
Liquids present a particularly interesting and challenging domain to formalize. Hayes [1985] presented one attempt to describe them. Before we can write down any properties of liquids, we must decide what kinds of objects those properties will describe. In the last section, we defined spatial relations in terms of the spaces occupied by objects, not in terms of the objects themselves. It is particularly useful to take this point of view with liquids, since liquid "objects" can be split and merged so easily. For example, if we consider a river to be a piece of liquid, then what happens to the river when the liquid flows out into the ocean? Instead of continually changing our characterization of the river, it is more convenient to view the river as a fixed space occupied by water.
Containers play an important role in the world of liquids. Since we do not want to refer to liquid objects, we must have another way of stating how much liquid is in a container. We can define a CAPACITY function to bound the amount of liquid l that a space s can hold. The space is FULL when the AMOUNT equals the CAPACITY.
We can also define an AMOUNT function:
AMOUNT(Water, Glass) > none
This statement means, "There is water in the glass." Here, Water refers to the generic concept of water and Glass refers to the space enclosed by a particular glass.
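As a small illustration of these functions, here is a sketch in Python. The numeric capacities, the particular spaces, and the helper names (amount, capacity, is_full, contains_some) are all our own assumptions.

# Litres that each space can hold, and how much water each currently contains.
capacity = {"Glass": 0.3, "Jug": 2.0}
amount   = {("Water", "Glass"): 0.3, ("Water", "Jug"): 0.5}

def is_full(liquid, space):
    """FULL: the AMOUNT of the liquid equals the CAPACITY of the space."""
    return amount.get((liquid, space), 0.0) >= capacity[space]

def contains_some(liquid, space):
    """AMOUNT(liquid, space) > none."""
    return amount.get((liquid, space), 0.0) > 0.0

print(contains_some("Water", "Glass"))                      # True: "There is water in the glass."
print(is_full("Water", "Glass"), is_full("Water", "Jug"))   # True False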
Spaces have a number of other properties besides CAPACITY and FULL. Recall that spaces can be linked to one another by the INSIDE relation. In addition, a space can be free or not. A space is free if it is not wholly contained inside a solid object. In addition, every space is bounded on all sides by a set of two-dimensional regions, called faces. If a free face (one not part of a solid object) separates two free spaces, it is called a portal. Liquids can flow from one free space to another via a portal. Two objects are said to be joined if they share a common face.
We can now define a closed container as a hollow object with no portals, and an open container as one with (at least) one portal at the top.
Liquids make things wet. To model wetness, we will find it useful to imagine a solid object as being surrounded by a very thin free space. This space is broken up into a set of thin outer spaces corresponding to the various faces of the object. Objects that touch share these outer spaces.
The last two facts state that SURROUND(o) contains all its outer spaces, and that any larger, free space containing object o also contains SURROUND(o). Now we can define wetness as a relation between an outer space d and some generic liquid l.
Suppose our robot encounters a room with six inches of water on the floor. What will happen if the robot touches the floor? By the definition of TOUCHING, we have:
∃d1 : OUTER(d1, Robot) ∧ OUTER(d1, Floor)
Since the floor only has one face, d, we can conclude:
OUTER(d, Robot) ∧ OUTER(d, Floor)
Combining the first clause with the fact WET-BY(d, Water) gives us IS-WET(Robot). In other words, the robot will get wet. Recall that at the end of the last section, our robot was about to try crossing a river without using a bridge. It might find this fact useful:
INSIDE(s1, s2) ∧ FREE(s1) ∧ FULL(s2, l) → FULL(s1, l)
It is straightforward to show that if the robot is submerged, it will be wet all over. Predicting that the robot will become submerged in the first place requires some envisionment. We need a rule that says a dense solid object must be supported by another solid object, or else it will tend to move downward. One property of liquids is that they do not support dense solid objects.
We also need general rules describing how liquids themselves behave over time. Consider all the possible forms that a liquid may take at a given instant. Hayes [1985] distinguishes between "lazy, still" liquids, "lazy, moving" liquids, and "energetic, moving" liquids. Energetic, moving liquids are liquids being propelled by some active force, for example, oil being pumped through a pipeline. Lazy liquids are liquids in their natural state. Sometimes they are moving, as in river water and rain, and sometimes they are still. Liquids in any of these forms can also be either bulk or divided. Most of the time we deal with bulk liquid, but sometimes we encounter mist, dew, or rain. Finally, liquids can be either unsupported, on a surface, or in a container.
What happens to these types of liquids? Figure 19.5 shows five envisionments for lazy, bulk liquids. A containment event can become a falling event if the container tips. The falling event becomes a wetting event and then a spreading one. Depending on where the spreading takes place, further falling or flowing events may ensue. When all the liquid has left the container, the spreading will stop, and sometime afterward, a drying event will begin.
Other materials behave differently. Solids can be rigid or flexible. A string can be used to pull an object but not to push it. Solids can also be particulate (like sand), in which case they share many of the same behaviors as liquids. Gases are also similar to liquids. Also, some solids soak up liquid (sponges, dirt), while others are watertight.
We can see that commonsense knowledge representation has a strongly taxonomic flavor. A lot of work has been done in these and other areas, but much more also remains to be worked out.
Memory is central to commonsense behavior. Human memory contains an immense amount of knowledge.
Psychological studies suggest several distinctions in human memory. One distinction
is between short-term memory (STM) and long-term memory (LTM). We know that a
person can only hold a few items at a time in STM, but the capacity of LTM is very large.
LTM storage is also fairly permanent. The production system is one computer model of
the STM-LTM structure. Perceptual information is stored directly in STM, also called
working memory. Production rules, stored in LTM, match themselves against items in
STM. Produrtion.c fire, modify STM, and repeal.
LTM is often divided into episodic memory and semantic memory. Episodic memory
contains information about past personal experiences, usually stored from an autobiographical
point of view. For example, a college graduation, a wedding, or a concert may
all form episodic memories. Semantic memory, on the other hand, contains facts like
"Birds fly." These facls are no longer conneeted with personal experiences.
Work on modeling semantic memory began with Quillian [1969]. This model soon
developed into the idea of semantic networks and from there into the other slot-and-
filler structures we saw in Chapters 9 and 10. Semantic memory is especially useful in
programs that understand natural language.
Models for episodic memory grew out of research on scripts. Recall that a script is
a stereotyped sequence of events, such as those involved in going to the dentist. One
obvious question to ask is: How are scripts acquired? Surely they are acquired through
personal experience. But a particular experience often includes details that we do not
want to include in a script. For example, just because we once saw The New Yorker
magazine in a dentist's waiting room, that doesn't mean that The New Yorker should be
part of the dentist script. The problem is that if a script contains too many details, it will
not be matched and retrieved correctly when new, similar situations arise.
In general, it is difficult to know which script to retrieve (as we discussed in Section
4.3.5). One reason for this is that scripts are too monolithic. It is hard to do
any kind of partial matching. It is also hard to modify a script. More recent work
reduces scripts to individual scenes, which can be shared across multiple structures.
Stereotypical sequences of scenes are strung together into memory organization packets
(MOPs) [Schank, 1977]. Usually, three distinct MOPs encode knowledge about an
event sequence. One MOP represents the physical sequence of events, such as entering
a dentist's office, sitting in the waiting room, reading a magazine, sitting in the dentist's
chair, etc. Another MOP represents the set of social events that take place. These are
events that involve personal interactions. A third MOP revolves around the goals of the
person in the particular episode. Any of these MOPs may be important for understanding
new situations.
MOPs organize scenes, and they themselves are further organized into higher-level
MOPs. For example, the MOP for visiting the office of a professional may contain a
sequence of abstract general scenes, such as talking to an assistant, waiting, and meeting.
High-level MOPs contain no actual memories, so where do they come from?
New MOPs are created upon the failure of expectations. When we use scripts for
story understanding, we are able to locate interesting parts of the story by noticing places
where events do not conform to the script's expectations. In a MOP-based system, if
an expectation is repeatedly violated, then the MOP is generalized or split. Eventually,
episodic memories can fade away, leaving only a set of generalized MOPs. These MOPs
look something like scripts, except that they share scenes with one another.
Let's look at an example. The first time you go to the dentist, you must determine
how things work from scratch since you have no prior experience. In doing so, you
store detailed accounts of each scene and string them together into a MOP. The next
time you visit the dentist, that MOP provides certain expectations, which are mostly
met. You are able to deal with the situation easily and make inferences that you could
not make the first time. If any expectation fails, this provides grounds for modifying
the MOP. Now, suppose you later visit a doctor's office. As you begin to store episodic
scenes, you notice similarities between these scenes and scenes from the dentist MOP.
Such similarities provide a basis for using the dentist MOP to generate expectations.
Multiple trips to the doctor will result in a doctor MOP that is slightly different from
the dentist MOP. Later experiences with visiting lawyers and government officials will
result in other MOPs. Ultimately, the structures shared by all of these MOPs will cause
a generalized MOP to appear. Whenever you visit a professional's office in the future,
you can use the generalized MOP to provide expectations.
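A rough Python sketch of this generalization process: episodes are stored as sequences of scenes, and scenes that recur across enough episodes are collected into a generalized MOP. The scene names and the sharing threshold are our own invention.

episodes = {
    "dentist-visit": ["enter-office", "talk-to-assistant", "wait", "read-magazine",
                      "sit-in-chair", "drilling", "pay"],
    "doctor-visit":  ["enter-office", "talk-to-assistant", "wait", "read-magazine",
                      "sit-in-chair", "examination", "pay"],
    "lawyer-visit":  ["enter-office", "talk-to-assistant", "wait", "meeting", "pay"],
}

def generalize(episodes, min_share=3):
    """Scenes appearing in at least min_share episodes form the generalized MOP."""
    counts = {}
    for scenes in episodes.values():
        for scene in set(scenes):
            counts[scene] = counts.get(scene, 0) + 1
    shared = {scene for scene, n in counts.items() if n >= min_share}
    # Keep the shared scenes in the order in which one of the episodes stores them.
    return [scene for scene in episodes["dentist-visit"] if scene in shared]

print(generalize(episodes))
# ['enter-office', 'talk-to-assistant', 'wait', 'pay'] -- a "visit a professional's
# office" MOP; episode-specific scenes such as drilling drop out.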
With MOPs, memory is both a constructive and reconstructive process. It is constructive
because new experiences create new memory structures. It is reconstructive
because even if the details of a particular episode are lost, the MOP provides information
about what was likely to have happened. The ability to do this kind of reconstruction is
an important feature of human memory.
There are several MOP-based computer programs. CYRUS [Kolodner, 1984] is a
program that contains episodes taken from the life of a particular individual. CYRUS
can answer questions that require significant amounts of memory reconstruction. The
IPP program [Lebowitz, 1983] accepts stories about terrorist attacks and stores them in
an episodic memory. As it notices similarities in the stories, it creates general memory
structures. These structures improve its ability to understand. MOPTRANS [Lytinen,
1984] uses a MOP-based memory to understand sentences in one language and translate
them into another.
We now turn to the role of memory in general problem solving. Most AI programs
solve problems by reasoning from first principles. They can explain their reasoning
by reporting the string of deductions that led from the input data to the conclusion.
With human experts, however, we often observe a different type of explanation. An
expert encountering a new problem is usually reminded of similar cases seen in the
past, remembering the results of those cases and perhaps the reasoning behind those
results. New problems are solved by analogy with old ones and the explanations are
often couched in terms of prior experiences. Medical expertise, for example, seems to
follow this pattern, and legal education is also case-oriented.
Computer systems that solve new problems by analogy with old ones are often called
case-based reasoning (CBR) systems. A CBR system draws its power from a large case
library, rather than from a set of first principles. In order to be successful, CBR systems
must answer several questions: how cases are indexed and organized in memory, how relevant cases are retrieved, how a retrieved case is adapted to a new problem, and how cases are acquired in the first place.
The memory structures we discussed in the previous section are clearly relevant to
CBR. Those structures were used primarily in text understanding applications, however.
Now we look at general memory-based problem solving.
To use a memory effectively, we must have a rich indexing mechanism. When we
are presented with a problem, we should be reminded of relevant past experiences, but
not be inundated with a lot of useless memories. The obvious idea is to index past
episodes by the features present in them. For example, any experience having to do
with a car would be filed under Car, as well as under other indices. But we must have
some scheme for distinguishing important indices from unimportant ones. Otherwise,
everything will remind us of everything else, and we will be unable to focus on memories
that will best help us to solve our current problem. But important features are not always
the most obvious ones. Here is an example from Schank [1977], called the "steak and
haircut" story:
Clearly, the indices Steak, Wife, and Rare are insufficient to remind Y of the barbershop
episode. We need more general indices, such as Provide-Service, Refusal, and Extreme.
Dyer [1983] also takes up this theme, embodied in a program that deduces adages and
morals from narratives.
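A toy sketch of feature-based reminding: each stored episode is filed under a set of indices, and a new situation retrieves the episodes with which it shares the most indices. The episode names, index sets, and overlap threshold are our own assumptions.

case_memory = {
    "steak-episode":   {"Provide-Service", "Refusal", "Extreme", "Steak", "Wife"},
    "haircut-episode": {"Provide-Service", "Refusal", "Extreme", "Barber", "Hair"},
    "picnic-episode":  {"Outdoors", "Cloudy", "Food"},
}

def remindings(situation, memory, min_overlap=2):
    """Return stored episodes sharing at least min_overlap indices, best first."""
    scored = [(len(situation & indices), name) for name, indices in memory.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= min_overlap]

# With only the surface indices, nothing but the steak episode itself comes back;
# with the abstract indices included, the haircut episode is retrieved as well.
print(remindings({"Steak", "Wife", "Rare"}, case_memory))
print(remindings({"Steak", "Wife", "Rare", "Provide-Service", "Refusal", "Extreme"}, case_memory))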
Some features are only important in certain contexts. For example, suppose it is
cloudy. If your problem is to plan a picnic, you might want to retrieve other episodes
involving cloudy days. But if your problem is to write a computer program, then the fact
that it is cloudy is probably incidental. Because important features vary from domain
to domain, a general CBR system must be able to learn a proper set of indices from
experience. Both the inductive and explanation-based learning techniques described in
Chapter 17 have been used for this task.
Recall that in our discussion of production systems, we talked about how rules
and states could be organized into a RETE network for efficient matching. We also
discussed matching frames and scripts in Section 4.3.5. Something similar is required
for CBR, since the number of cases can be very large. The data structure for the case
itself is also important. A case is usually stored as a monolithic structure, although in
some variations, cases can be stored piecemeal. The former strategy is efficient when
it is possible to obtain almost-perfect matches; the latter strategy is better in complex
problem-solving domains.
The result of the retrieval process is usually a set of cases. The next step is to take
the best case and adapt it to the current situation. One method for choosing the best case
is the use of preference heuristics [Kolodner, 1989].
Since even the best case will not match the current situation exactly, it will have
to be adapted. At the simplest level, this involves mapping new objects onto old ones
(e.g., Steak onto Hair, and Rare onto Short). When old cases represent entire problem-
solving episodes, adaptation can be quite complex. CHEF [Hammond, 1986] is an
example of a case-based planner, a program whose cases are actually complete plans
for solving problems in the domain of cooking. CHEF's case library is augmented with
a plan-modification library indexed by plan types and change types. CHEF first looks
at the retrieved plan and sees if it satisfies the current goals. If any goal is unsatisfied,
then the plan-modification library is consulted. The library may suggest a list of steps
to be added to the plan, deleted from the plan, or substituted for existing steps. This
modification process is not guaranteed to succeed, however, and so CHEF includes a
plan repair module that uses domain knowledge to explain why the new plan fails, if it
does. Once a complete, working plan is created, it is executed and then stored in the
case library for future reference.
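The retrieve-adapt-store cycle can be sketched in a few lines. Everything concrete here, including the plans, goals, and the tiny substitution table standing in for a plan-modification library, is our own illustration rather than CHEF's actual machinery.

case_library = [
    {"goals": {"stir-fry", "beef"},     "plan": ["chop beef", "chop broccoli", "stir-fry"]},
    {"goals": {"souffle", "chocolate"}, "plan": ["melt chocolate", "whip eggs", "bake"]},
]
substitutions = {"beef": "chicken"}   # a stand-in for a plan-modification library

def retrieve(goals):
    """Return the stored case whose goals overlap most with the new goals."""
    return max(case_library, key=lambda case: len(case["goals"] & goals))

def adapt(plan, goals):
    """Substitute ingredients in the old plan to satisfy the new goals."""
    new_plan = []
    for step in plan:
        for old, new in substitutions.items():
            if old in step and new in goals:
                step = step.replace(old, new)
        new_plan.append(step)
    return new_plan

goals = {"stir-fry", "chicken"}
new_plan = adapt(retrieve(goals)["plan"], goals)
case_library.append({"goals": goals, "plan": new_plan})   # store for future reuse
print(new_plan)   # ['chop chicken', 'chop broccoli', 'stir-fry']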
We have said nothing yet about how cases are acquired originally. In fact, most
CBR systems draw on a small library of cases that are entered by hand. Of course, we
will eventually be able to transform large bodies of on-line texts, such as legal cases,
into large case libraries. Another approach is to bootstrap gradually from rule-based
search into CBR. The idea is to start solving problems with a heuristic search engine.
Each time a problem is solved, it is automatically stored in a case library. As the library
grows, it becomes possible to solve some new problems by analogy with old ones. This
idea is very similar to some of the learning techniques we saw in Section 17.4, the
acquisition of search control rules, for example. This brings up the issue of whether it
is better to store whole cases in memory or to store smaller bits of control knowledge
instead. There are a number of trade-offs involved. First is the ease of modification.
Central to case-based reasoning is the idea that stored cases can be adapted and modified.
Search control rules are more procedural. Once learned, they are hard to modify. If a
search control rule starts to perform badly, it is usually deleted in toto. Another trade-off
involves indexing. Search control rules are fully indexed: they apply in exactly the
situations to which they are relevant. Cases, on the other hand, are usually indexed
heuristically, as we saw above. Finally, search control rules are explicitly generalized at
storage time. In CBR, generalization occurs over time as a by-product of the retrieval
and adaptation process. Aggressive generalization makes it easy to solve new problems
quickly, but in less complete domains, where proper generalizations are unknown, an
aggressive strategy can be inefficient and even incorrect.
19.5 Exercises

Consider the following facts: the Franco-Prussian War preceded World War I, and the Battle of Verdun occurred during World War I. Convert these facts into logical statements in terms of the MEETS relation. Use the commonsense axioms of time given in Section 19.2.1 to show that the Franco-Prussian War must have occurred before the Battle of Verdun.