Language is meant for communicating about the world. By studying language, we can come to understand more about the world. We can test our theories about the world by how well they support our attempt to understand language. And, if we can succeed at building a computational model of language, we will have a powerful tool for communicating about the world. In this chapter, we look at how we can exploit knowledge about the world, in combination with linguistic facts, to build computational natural language systems.
Throughout this discussion, it is going to be important to keep in mind that the difficulties we will encounter do not exist out of perversity on the part of some diabolical designer. Instead, what we see as difficulties when we try to analyze language are just the flip sides of the very properties that make language so powerful. Figure 15.1 shows some examples of this. As we pursue our discussion of language processing, it is important to keep the good sides in mind since it is because of them that language is a significant enough phenomenon to be worth all the trouble.
By far the largest part of human linguistic communication occurs as speech. Written language is a fairly recent invention and still plays a less central role than speech in most activities. But processing written language (assuming it is written in unambiguous characters) is easier, in some ways, than processing speech. For example, to build a program that understands spoken language, we need all the facilities of a written language understander as well as enough additional knowledge to handle all the noise and ambiguities of the audio signal.[Footnote: Actually, in understanding spoken language, we take advantage of clues, such as intonation and the presence of pauses, to which we do not have access when we read. We can make the task of a speech-understanding program easier by allowing it, too, to use these cues, but to do so, we must know enough about them to incorporate into the program knowledge of how to use them.] Thus it is useful to divide the entire language-processing problem into two tasks:
The Problem: English sentences are incomplete descriptions of the information that they are intended to convey:
The Good Side: Language allows speakers to be as vague or as precise as they like. It also allows speakers to leave out things they believe their hearers already know.
The Problem: The same expression means different things in different contexts:
The Good Side: Language lets us communicate about an infinite world using a finite (and thus learnable) number of symbols.
The Problem: No natural language program can be complete because new words, expressions, and meanings can be generated quite freely:
The Good Side: Language can evolve as the experiences that we want to communicate about evolve.
The Problem: There are lots of ways to say the same thing:
The Good Side: When you know a lot, facts imply each other. Language is intended to be used by agents who know a lot.
In Chapter 14 we described some of the issues that arise in speech understanding, and in Section 21.2.2 we return to them in more detail. In this chapter, though, we concentrate on written language processing (usually called simply natural language processing).
Throughout this discussion of natural language processing, the focus is on English. This happens to be convenient and turns out to be where much of the work in the field has occurred. But the major issues we address are common to all natural languages. In fact, the techniques we discuss are particularly important in the task of translating from one natural language to another.
Natural language processing includes both understanding and generation, as well as other tasks such as multilingual translation. In this chapter we focus on understanding, although in Section 15.5 we will provide some references to work in these other areas.
Recall that in the last chapter we defined understanding as the process of mapping from an input form into a more immediately useful form. It is this view of understanding that we pursue throughout this chapter. But it is useful to point out here that there is a formal sense in which a language can be defined simply as a set of strings without reference to any world being described or task to be performed. Although some of the ideas that have come out of this formal study of languages can be exploited in parts of the understanding process, they are only the beginning. To get the overall picture, we need to think of language as a pair (source language, target representation), together with a mapping between elements of each to the other. The target representation will have been chosen to be appropriate for the task at hand. Often, if the task has clearly been agreed on and the details of the target representation are not important in a particular discussion, we talk just about the language itself, but the other half of the pair is really always present.
One of the great philosophical debates throughout the centuries has centered around the question of what a sentence means. We do not claim to have found the definitive answer to that question. But once we realize that understanding a piece of language involves mapping it into some representation appropriate to a particular situation, it becomes easy to see why the questions "What is language understanding?" and "What does a sentence mean?" have proved to be so difficult to answer. We use language in such a wide variety of situations that no single definition of understanding is able to account for them all. As we set about the task of building computer programs that understand natural language, one of the first things we have to do is define precisely what the underlying task is and what the target representation should look like. In the rest of this chapter, we assume that our goal is to be able to reason with the knowledge contained in the linguistic expressions, and we exploit a frame language as our target representation.
Before we go into detail on the several components of the natural language understanding process, it is useful to survey all of them and see how they fit together. Roughly we can break the process down into the following pieces:
The boundaries between these five phases are often very fuzzy. The phases are sometimes performed in sequence, and they are sometimes performed all at once. If they are performed in sequence, one may need to appeal for assistance to another. For example, part of the process of performing the syntactic analysis of the sentence "Is the glass jar peanut butter?" is deciding how to form two noun phrases out of the four nouns at the end of the sentence (giving a sentence of the form "Is the x y?"). All of the following constituents are syntactically possible: glass, glass jar, glass jar peanut, jar peanut butter, peanut butter, butter. A syntactic processor on its own has no way to choose among these, and so any decision must be made by appealing to some model of the world in which some of these phrases make sense and others do not. If we do this, then we get a syntactic structure in which the constituents "glass jar" and "peanut butter" appear. Thus although it is often useful to separate these five processing phases to some extent, they can all interact in a variety of ways, making a complete separation impossible.
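The use of world knowledge to choose among syntactically possible bracketings can be made concrete with a small sketch. Here a hand-built set of "sensible" compounds stands in for a real model of the world; all names and entries below are illustrative assumptions, not part of any actual system.

```python
# Hypothetical world model: noun pairs that form sensible compounds.
SENSIBLE_COMPOUNDS = {
    ("glass", "jar"),      # a jar made of glass
    ("peanut", "butter"),  # butter made from peanuts
}

def bracketings(nouns):
    """Yield every way to split the noun sequence into two noun phrases."""
    for i in range(1, len(nouns)):
        yield tuple(nouns[:i]), tuple(nouns[i:])

def makes_sense(phrase):
    """A compound makes sense if every adjacent noun pair is known."""
    return all(pair in SENSIBLE_COMPOUNDS for pair in zip(phrase, phrase[1:]))

def plausible_bracketings(nouns):
    """Filter the syntactically possible splits by the world model."""
    return [(x, y) for x, y in bracketings(nouns)
            if makes_sense(x) and makes_sense(y)]

print(plausible_bracketings(["glass", "jar", "peanut", "butter"]))
# [(('glass', 'jar'), ('peanut', 'butter'))]
```

Of the three possible splits of the four nouns, only "glass jar" / "peanut butter" survives the appeal to the world model.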
Specifically, to make the overall language understanding problem tractable, it will help if we distinguish between the following two ways of decomposing a program:
In this chapter, we focus primarily on the first of these issues. It is the one that has received the most attention from people working on this problem. We do not completely ignore the second issue, although considerably less of substance is known about it. For an example of this kind of discussion that talks about interleaving syntactic and semantic processing, see Lytinen [1986].
With that caveat, let's consider an example to see how the individual processes work. In this example, we assume that the processes happen sequentially. Suppose we have an English interface to an operating system and the following sentence is typed:
Morphological Analysis
Morphological analysis must do the following things:
In addition, this process will usually assign syntactic categories to all the words in the sentence. This is usually done now because interpretations for affixes (prefixes and suffixes) may depend on the syntactic category of the complete word. For example, consider the word "prints." This word is either a plural noun (with the "-s" marking plural) or a third person singular verb (as in "he prints"), in which case the "-s" indicates both singular and third person. If this step is done now, then in our example, there will be ambiguity since "want," "print," and "file" can all function as more than one syntactic category.
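As a sketch, the handling of the "-s" suffix described above might look like the following. The lexicon is a hypothetical fragment covering only our example words, and the suffix rules are limited to "-s"; a real morphological analyzer would handle many more affixes and spelling changes.

```python
# Hypothetical lexicon fragment: each stem with its possible categories.
LEXICON = {
    "print": {"noun", "verb"},
    "want":  {"noun", "verb"},
    "file":  {"noun", "verb"},
}

def analyze(word):
    """Return every (stem, category, inflection) reading of a word,
    handling only the "-s" suffix."""
    readings = []
    if word in LEXICON:                      # the uninflected form itself
        for cat in sorted(LEXICON[word]):
            readings.append((word, cat, "base"))
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = word[:-1]
        if "noun" in LEXICON[stem]:          # "-s" as the plural marker
            readings.append((stem, "noun", "plural"))
        if "verb" in LEXICON[stem]:          # "-s" as 3rd person singular
            readings.append((stem, "verb", "3rd-singular"))
    return readings

print(analyze("prints"))
# [('print', 'noun', 'plural'), ('print', 'verb', '3rd-singular')]
```

Note that the interpretation of "-s" depends on the syntactic category of the stem, which is exactly why category assignment is done during this phase.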
Syntactic Analysis
Syntactic analysis must exploit the results of morphological analysis to build a structural description of the sentence. The goal of this process, called parsing, is to convert the flat list of words that forms the sentence into a structure that defines the units that are represented by that flat list. For our example sentence, the result of parsing is shown in Figure 15.2. The details of this representation are not particularly significant; we describe alternative versions of them in Section 15.2. What is important here is that a flat sentence has been converted into a hierarchical structure and that that structure has been designed to correspond to sentence units (such as noun phrases) that will correspond to meaning units when semantic analysis is performed. One useful thing we have done here, although not all syntactic systems do, is create a set of entities we call reference markers. They are shown in parentheses in the parse tree. Each one corresponds to some entity that has been mentioned in the sentence. These reference markers are useful later since they provide a place in which to accumulate information about the entities as we get it. Thus although we have not tried to do semantic analysis (i.e., assign meaning) at this point, we have designed our syntactic analysis process so that it will find constituents to which meaning can be assigned.
Semantic Analysis
Semantic analysis must do two important things:
For this example, suppose that we have a frame-based knowledge base that contains the units shown in Figure 15.3. Then we can generate a partial meaning, with respect to that knowledge base, as shown in Figure 15.4. Reference marker RM1 corresponds to the top-level event of the sentence. It is a wanting event in which the speaker (denoted by "I") wants a printing event to occur in which the same speaker prints a file whose extension is ".init" and whose owner is Bill.
Discourse Integration
At this point, we have figured out what kinds of things this sentence is about. But we do not yet know which specific individuals are being referred to. Specifically, we do not know to whom the pronoun "I" or the proper noun "Bill" refers. To pin down these references requires an appeal to a model of the current discourse context, from which we can learn that the current user (who typed the word "I") is User068 and that the only
person named "Bill" about whom we could be talking is User073. Once the correct referent for Bill is known, we can also determine exactly which file is being referred to: F1 is the only file with the extension ".init" that is owned by Bill.
Pragmatic Analysis
We now have a complete description, in the terms provided by our knowledge base, of what was said. The final step toward effective understanding is to decide what to do as a result. One possible thing to do is to record what was said as a fact and be done with it. For some sentences, whose intended effect is clearly declarative, that is precisely the correct thing to do. But for other sentences, including this one, the intended effect is different. We can discover this intended effect by applying a set of rules that characterize cooperative dialogues. In this example, we use the fact that when the user claims to want something that the system is capable of performing, then the system should go ahead and do it. This produces the final meaning shown in Figure 15.5.
The final step in pragmatic processing is to translate, when necessary, from the knowledge-based representation to a command to be executed by the system. In this case, this step is necessary, and we see that the final result of the understanding process is
where "lpr" is the operating system's file print command.
Summary
At this point, we have seen the results of each of the main processes that combine to form a natural language system. In a complete system, all of these processes are necessary in some form. For example, it may have seemed that we could have skipped the knowledge-based representation of the meaning of the sentence since the final output of the understanding system bore no relationship to it. But it is that intermediate knowledge-based representation to which we usually attach the knowledge that supports the creation of the final answer.
All of the processes we have described are important in a complete natural language understanding system. But not all programs are written with exactly these components. Sometimes two or more of them are collapsed, as we will see in several sections later in this chapter. Doing that usually results in a system that is easier to build for restricted subsets of English but one that is harder to extend to wider coverage. In the rest of this chapter we describe the major processes in more detail and talk about some of the ways in which they can be put together to form a complete system.
Syntactic processing is the step in which a flat input sentence is converted into a hierarchical structure that corresponds to the units of meaning in the sentence. This process is called parsing. Although there are natural language understanding systems that skip this step (for example, see Section 15.3.3), it plays an important role in many natural language understanding systems for two reasons:
Although there are many ways to produce a parse, almost all the systems that are actually used have two main components:
The most common way to represent grammars is as a set of production rules. Although details of the forms that are allowed in the rules vary, the basic idea remains the same and is illustrated in Figure 15.6, which shows a simple context-free, phrase structure grammar for English. Read the first rule as, "A sentence is composed of a noun phrase followed by a verb phrase." In this grammar, the vertical bar should be read as "or." The e denotes the empty string. Symbols that are further expanded by rules are called nonterminal symbols. Symbols that correspond directly to strings that must be found in an input sentence are called terminal symbols.
Grammar formalisms such as this one underlie many linguistic theories, which in turn provide the basis for many natural language understanding systems. Modern linguistic theories include: the government binding theory of Chomsky [1981; 1986], GPSG [Gazdar et al., 1985], LFG [Bresnan, 1982], and categorial grammar [Ades and Steedman, 1982; Oehrle et al., 1987]. The first three of these are also discussed in Sells [1986]. We should point out here that there is general agreement that pure, context-free grammars are not effective for describing natural languages. [Footnote: There is, however, still some debate on whether context-free grammars are formally adequate for describing natural languages (e.g., Gazdar [1982]).] As a result, natural language processing systems have less in common with computer language processing systems (such as compilers) than you might expect.
Regardless of the theoretical basis of the grammar, the parsing process takes the rules of the grammar and compares them against the input sentence. Each rule that matches adds something to the complete structure that is being built for the sentence.
The simplest structure to build is a parse tree, which simply records the rules and how they are matched. Figure 15.7 shows the parse tree that would be produced for the sentence "Bill printed the file" using this grammar. Figure 15.2 contained another example of a parse tree, although some additions to this grammar would be required to produce it.
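The rule-matching process that produces such a tree can be sketched as a small top-down, recursive-descent parser. The grammar and lexicon below are illustrative stand-ins for the rules of Figure 15.6, which we do not reproduce in full; trees are represented as nested tuples.

```python
# Toy grammar and lexicon, assumed for illustration only.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["PROPER-NOUN"], ["DET", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "PROPER-NOUN": {"bill"},
    "DET": {"the"},
    "N": {"file"},
    "V": {"printed"},
}

def parse(symbol, words, pos):
    """Yield (tree, next position) for every way `symbol` can match
    the input starting at words[pos]."""
    if symbol in LEXICON:                    # terminal category
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, words[pos]), pos + 1
        return
    for rule in GRAMMAR[symbol]:             # try each expansion top-down
        states = [((), pos)]
        for rhs_symbol in rule:              # match the right-hand side
            states = [(kids + (tree,), new_pos)
                      for kids, p in states
                      for tree, new_pos in parse(rhs_symbol, words, p)]
        for kids, p in states:
            yield (symbol,) + kids, p

words = "bill printed the file".split()
trees = [tree for tree, p in parse("S", words, 0) if p == len(words)]
print(trees[0])
# ('S', ('NP', ('PROPER-NOUN', 'bill')), ('VP', ('V', 'printed'), ('NP', ('DET', 'the'), ('N', 'file'))))
```

Each nested tuple records the rule that was applied, so the output is exactly the trace-of-rules parse tree described in the text.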
Notice that every node of the parse tree corresponds either to an input word or to a nonterminal in our grammar. Each level in the parse tree corresponds to the application of one grammar rule. As a result, it should be clear that a grammar specifies two things about a language:
So far, we have shown the result of parsing to be exactly a trace of the rules that were applied during it. This is not always the case, though. Some grammars contain additional information that describes the structure that should be built. We present an example of such a grammar in Section 15.2.2.
But first we need to look at two important issues that define the space of possible parsers that can exploit the grammars we write.
Top-Down versus Bottom-Up Parsing
To parse a sentence, it is necessary to find a way in which that sentence could have been generated from the start symbol. There are two ways that this can be done:
The choice between these two approaches is similar to the choice between forward and backward reasoning in other problem-solving tasks. The most important consideration is the branching factor. Is it greater going backward or forward? Another important issue is the availability of good heuristics for evaluating progress. Can partial information be used to rule out paths early? Sometimes these two approaches are combined into a single method called bottom-up parsing with top-down filtering. In this method, parsing proceeds essentially bottom-up (i.e., the grammar rules are applied backward). But using tables that have been precomputed for a particular grammar, the parser can immediately eliminate constituents that can never be combined into useful higher-level structures.
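The precomputed tables mentioned above are commonly left-corner tables: for each nonterminal, the set of symbols that can appear as its leftmost descendant. A minimal sketch, using a toy grammar that is an illustrative assumption rather than a realistic one:

```python
# Toy grammar, assumed for illustration only.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["PROPER-NOUN"], ["DET", "N"]],
    "VP": [["V", "NP"]],
}

def left_corners(grammar):
    """For each nonterminal, compute every symbol that can begin it."""
    corners = {nt: set() for nt in grammar}
    changed = True
    while changed:                           # iterate to a fixed point
        changed = False
        for nt, rules in grammar.items():
            for rule in rules:
                first = rule[0]
                found = {first} | corners.get(first, set())
                if not found <= corners[nt]:
                    corners[nt] |= found
                    changed = True
    return corners

corners = left_corners(GRAMMAR)
# corners["S"] == {"NP", "PROPER-NOUN", "DET"}: a parser working
# bottom-up toward an S can immediately discard any analysis whose
# first constituent is, say, a V, since no S can begin that way.
```

The table is computed once per grammar; during parsing, the filter is a constant-time set-membership test.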
Finding One Interpretation or Finding Many
As several of the examples above have shown, the process of understanding a sentence is a search process in which a large universe of possible interpretations must be explored to find one that meets all the constraints imposed by a particular sentence. As for any search process, we must decide whether to explore all possible paths or, instead, to explore only a single most likely one and to produce only the result of that one path as the answer.
Suppose, for example, that a sentence processor looks at the words of an input sentence one at a time, from left to right, and suppose that so far, it has seen:
There are two paths that the processor could be following at this point:
There are four ways of handling sentences such as these:
Although the problems of deciding which paths to follow and how to handle backtracking are common to all search processes, they are complicated in the case of language understanding by the existence of genuinely ambiguous sentences, such as our earlier example "They are flying planes." If it is important that not just one interpretation but rather all possible ones be found, then either all possible paths must be followed (which is very expensive since most of them will die out before the end of the sentence) or backtracking must be forced (which is also expensive because of duplicated computations). Many practical systems are content to find a single plausible interpretation. If that interpretation is later rejected, possibly for semantic or pragmatic reasons, then a new attempt to find a different interpretation can be made.
Parser Summary
As this discussion suggests, there are many different kinds of parsing systems. There are three that have been used fairly extensively in natural language systems:
We do not have space here to go into all these methods. In the next section, we illustrate the main ideas involved in parsing by working through an example with an ATN. After this, we look at one way of parsing with a more declarative representation.
An augmented transition network (ATN) is a top-down parsing procedure that allows various kinds of knowledge to be incorporated into the parsing system so it can operate efficiently. Since the early use of the ATN in the LUNAR system [Woods, 1973], which provided access to a large database of information on lunar geology, the mechanism has been exploited in many language-understanding systems. The ATN is similar to a finite state machine in which the class of labels that can be attached to the arcs that define transitions between states has been augmented. Arcs may be labeled with an arbitrary combination of the following:
Figure 15.8 shows an example of an ATN in graphical notation. Figure 15.9 shows the top-level ATN of that example in a notation that a program could read. To see how an ATN works, let us trace the execution of this ATN as it parses the following sentence:
This execution proceeds as follows:
(NP (FILE (LONG) DEFINITE))
The return causes the machine to be in state Q1, with the SUBJ register set to the structure just returned and the TYPE register set to DCL.
(S DCL (NP (FILE (LONG) DEFINITE))
HAS
(VP PRINTED))
This structure is the output of the parse.
This example grammar illustrates several interesting points about the use of ATNs. A single subnetwork need only occur once even though it is used in more than one place. A network can be called recursively. Any number of internal registers may be used to contain the result of the parse. The result of a network can be built, using the function BUILDQ, out of values contained in the various system registers. A single state may be both a final state, in which a complete sentence has been found, and an intermediate state, in which only a part of a sentence has been recognized. And, finally, the contents of a register can be modified at any time.
In addition, there are a variety of ways in which ATNs can be used which are not shown in this example:
ATN grammars have substantial procedural components. The grammar describes the order in which constituents must be built. Variables are explicitly given values, and they must already have been assigned a value before they can be referenced. This procedurality limits the effectiveness of ATN grammars in some cases: for example, in speech processing, where some later parts of the sentence may have been recognized clearly while earlier parts are still unknown (suppose we had heard, "The long * * * file printed."), or in systems that want to use the same grammar to support both understanding and generation (e.g., Appelt [1987], Shieber [1988], and Barnett et al. [1990]). Although there is no clear distinction between declarative and procedural representations (as we saw in Section 6.1), there is a spectrum, and it often turns out that more declarative representations are more flexible than more procedural ones. So in this section we describe a declarative approach to representing grammars.
When a parser applies grammar rules to a sentence, it performs two major kinds of operations:
Now think back to the unification operation that we described in Section 5.4.4 as part of our theorem-proving discussion. Matching and structure building are operations that unification performs naturally. So an obvious candidate for representing grammars is some structure on which we can define a unification operator. Directed acyclic graphs (DAGs) can do exactly that.
Each DAG represents a set of attribute-value pairs. For example, the graphs corresponding to the words "the" and "file" are:
[CAT: DET
 LEX: the]

[CAT: N
 LEX: file
 NUMBER: SING]
Both words have a lexical category (CAT) and a lexical entry. In addition, the word "file" has a value (SING) for the NUMBER attribute. The result of combining these two words to form a simple NP can also be described as a graph:
[NP: [DET: the
HEAD: file
NUMBER: SING]]
The rule that forms this new constituent can also be represented as a graph, but to do so we need to introduce a new notation. Until now, all our graphs have actually been trees. To describe graphs that are not trees, we need a way to label a piece of a graph and then point to that piece elsewhere in the graph. So let {n} for any value of n be a label, which is to be interpreted as a label for the next constituent following it in the graph. Sometimes, the constituent is empty (i.e., there is not yet any structure that is known to fill that piece of the graph). In that case, the label functions very much like a variable and will be treated like one by the unification operation. It is this degenerate kind of a label that we need in order to describe the NP rule:
We can write this rule as the following graph:
[CONSTITUENT1 : [CAT: DET
LEX: {1} ]
CONSTITUENT2: [CAT: N
LEX: {2}
NUMBER: {3}]
BUILD: [NP: [DET: {1}
HEAD: {2}
NUMBER: {3}]]]
This rule should be read as follows: Two constituents, described in the subgraphs labeled CONSTITUENT1 and CONSTITUENT2, are to be combined. The first must be of CAT DET. We do not care what its lexical entry is, but whatever it is will be bound to the label {1}. The second constituent must be of CAT N. Its lexical entry will be bound to the label {2}, and its number will be bound to the label {3}. The result of combining these two constituents is described in the subgraph labeled BUILD. This result will be a graph corresponding to an NP with three attributes: DET, HEAD, and NUMBER. The values for all these attributes are to be taken from the appropriate pieces of the graphs that are being combined by the rule.
Now we need to define a unification operator that can be applied to the graphs we have just described. It will be very similar to logical unification. Two graphs unify if, recursively, all their subgraphs unify. The result of a successful unification is a graph that is composed of the union of the subgraphs of the two inputs, with all bindings made as indicated. This process bottoms out when a subgraph is not an attribute-value pair but is just a value for an attribute. At that point, we must define what it means for two values to unify. Identical values unify. Anything unifies with a variable (a label with no attached structure) and produces a binding for the label. The simplest thing to do is then to say that any other situation results in failure. But it may be useful to be more flexible. So some systems allow a value to match with a more general one (e.g., PROPER-NOUN matches NOUN). Others allow values that are disjunctions [e.g., (MASCULINE V FEMININE)], in which case unification succeeds whenever the intersection of the two values is not empty.
There is one other important difference between logical unification and graph unification. The inputs to logical unification are treated as logical formulas. Order matters, since, for example, f(g(a), h(b)) is a different formula than f(h(b), g(a)). The inputs to graph unification, on the other hand, must be treated as sets, since the order in which attribute-value pairs are stated does not matter. For example, if a rule describes a constituent as
[CAT: DET
LEX: {1} ]
we want to be able to match a constituent such as
[LEX: the
CAT: DET]
Algorithm: Graph-Unify
A simple parser can use this algorithm to apply a grammar rule by unifying CONSTITUENT1 with a proposed first constituent. If that succeeds, then CONSTITUENT2 is unified with a proposed second constituent. If that also succeeds, then a new constituent corresponding to the value of BUILD is produced. If there are variables in the value of BUILD that were bound during the matching of the constituents, then those bindings will be used to build the new constituent.
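One way Graph-Unify might be realized is sketched below, using Python dicts for attribute-value graphs. The representation choices here (strings like "{1}" as variable labels, a separate bindings dictionary) are our own illustrative assumptions, and the extensions for marker hierarchies and disjunctive values described above are omitted.

```python
def unify(g1, g2, bindings):
    """Unify two attribute-value graphs; return the merged graph or None."""
    g1, g2 = resolve(g1, bindings), resolve(g2, bindings)
    if is_var(g1):
        bindings[g1] = g2
        return g2
    if is_var(g2):
        bindings[g2] = g1
        return g1
    if isinstance(g1, dict) and isinstance(g2, dict):
        result = dict(g1)                # union of the two sets of pairs
        for attr, val in g2.items():
            if attr in result:
                sub = unify(result[attr], val, bindings)
                if sub is None:
                    return None          # a shared attribute failed to unify
                result[attr] = sub
            else:
                result[attr] = val
        return result
    return g1 if g1 == g2 else None      # atomic values must be identical

def is_var(g):
    """A label like "{1}" with no attached structure acts as a variable."""
    return isinstance(g, str) and g.startswith("{") and g.endswith("}")

def resolve(g, bindings):
    """Follow any chain of bindings from a label to its value."""
    while is_var(g) and g in bindings:
        g = bindings[g]
    return g

# Matching the rule's first constituent against the word "the".
# Note that the order of the attribute-value pairs does not matter.
bindings = {}
print(unify({"CAT": "DET", "LEX": "{1}"},
            {"LEX": "the", "CAT": "DET"}, bindings), bindings)
# {'CAT': 'DET', 'LEX': 'the'} {'{1}': 'the'}
```

Because the dicts are treated as sets of pairs, the order-insensitivity requirement discussed above comes for free; the recursion bottoms out at atomic values, where only identical values unify.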
There are many possible variations on the notation we have described here. There are also a variety of ways of using it to represent dictionary entries and grammar rules. See Shieber [1986] and Knight [1989] for discussions of some of them.
Although we have presented unification here as a technique for doing syntactic analysis, it has also been used as a basis for semantic interpretation. In fact, there are arguments for using it as a uniform representation for all phases of natural language understanding. There are also arguments against doing so, primarily involving system modularity, the noncompositionality of language in some respects (see Section 15.3.4), and the need to invoke substantial domain reasoning. We will not say any more about this here, but to see how this idea could work, see Allen [1989].
Producing a syntactic parse of a sentence is only the first step toward understanding it. We must still produce a representation of the meaning of the sentence. Because understanding is a mapping process, we must first define the language into which we are trying to map. There is no single, definitive language in which all sentence meanings can be described. All of the knowledge representation systems that were described in Part II are candidates, and having selected one or more of them, we still need to define the vocabulary (i.e., the predicates, frames, or whatever) that will be used on top of the structure. In the rest of this chapter, we call the final meaning representation language, including both the representational framework and the specific meaning vocabulary, the target language. The choice of a target language for any particular natural language understanding program must depend on what is to be done with the meanings once they are constructed. There are two broad families of target languages that are used in NL systems, depending on the role that the natural language system is playing in a larger system (if any).
When natural language is being considered as a phenomenon on its own, as, for example, when one builds a program whose goal is to read text and then answer questions about it, a target language can be designed specifically to support language processing. In this case, one typically looks for primitives that correspond to distinctions that are usually made in language. Of course, selecting the right set of primitives is not easy. We discussed this issue briefly in Section 4.3.3, and in Chapter 10 we looked at two proposals for a set of primitives, conceptual dependency and CYC.
When natural language is being used as an interface language to another program (such as a database query system or an expert system), then the target language must be a legal input to that other program. Thus the design of the target language is driven by the backend program. This was the case in the simple example we discussed in Section 15.1.1. But even in this case, it is useful, as we showed in that example, to use an intermediate knowledge-based representation to guide the overall process. So, in the rest of this section, we assume that the target language we are building is a knowledge-based one.
Although the main purpose of semantic processing is the creation of a target language representation of a sentence's meaning, there is another important role that it plays. It imposes constraints on the representations that can be constructed, and, because of the structural connections that must exist between the syntactic structure and the semantic one, it also provides a way of selecting among competing syntactic analyses. Semantic processing can impose constraints because it has access to knowledge about what makes sense in the world. We already mentioned one example of this, the sentence, "Is the glass jar peanut butter?" There are other examples in the rest of this section.
Lexical Processing
The first step in any semantic processing system is to look up the individual words in a dictionary (or lexicon) and extract their meanings. Unfortunately, many words have several meanings, and it may not be possible to choose the correct one just by looking at the word itself. For example, the word "diamond" might have the following set of meanings:
To select the correct meaning for the word "diamond" in the sentence,
it is necessary to know that neither geometrical shapes nor baseball fields shimmer, whereas gemstones do.
Unfortunately, if we view English understanding as mapping from English words into objects in a specific knowledge base, lexical ambiguity is often greater than it seems in everyday English. For example, consider the word "mean." This word is ambiguous in at least three ways: it can be a verb meaning "to signify"; it can be an adjective meaning "unpleasant" or "cheap"; and it can be a noun meaning "statistical average." But now imagine that we have a knowledge base that describes a statistics program and its operation. There might be at least two distinct objects in that knowledge base, both of which correspond to the "statistical average" meaning of "mean." One object is the statistical concept of a mean; the other is the particular function that computes the mean in this program. To understand the word "mean" we need to map it into some concept in our knowledge base. But to do that, we must decide which of these concepts is meant. Because of cases like this, lexical ambiguity is a serious problem, even when the domain of discourse is severely constrained.
The process of determining the correct meaning of an individual word is called word sense disambiguation or lexical disambiguation. It is done by associating, with each word in the lexicon, information about the contexts in which each of the word's senses may appear. Each of the words in a sentence can serve as part of the context in which the meanings of the other words must be determined.
Sometimes only very straightforward information about each word sense is necessary. For example, the baseball field interpretation of "diamond" could be marked as a LOCATION. Then the correct meaning of "diamond" in the sentence "I'll meet you at the diamond" could easily be determined if the fact that "at" requires a TIME or a LOCATION as its object were recorded as part of the lexical entry for "at." Such simple properties of word senses are called semantic markers. Other useful semantic markers are
Using these markers, the correct meaning of "diamond" in the sentence "I dropped my diamond" can be computed. As part of its lexical entry, the verb "drop" will specify that its object must be a PHYSICAL-OBJECT. The gemstone meaning of "diamond" will be marked as a PHYSICAL-OBJECT. So it will be selected as the appropriate meaning in this context.
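The marker-based selection just described can be sketched as a simple lookup-and-filter procedure. The lexicon entries, marker names, and requirement table below are illustrative assumptions, not the contents of any actual system:

```python
# Sketch of semantic-marker disambiguation (lexicon and markers are
# hypothetical, in the spirit of the "diamond" example).
LEXICON = {
    "diamond": [
        {"sense": "gemstone", "markers": {"PHYSICAL-OBJECT", "SHIMMERABLE"}},
        {"sense": "baseball-field", "markers": {"LOCATION"}},
        {"sense": "geometric-shape", "markers": {"ABSTRACT-OBJECT"}},
    ],
}

# Selectional requirements imposed by the governing word (also hypothetical).
REQUIREMENTS = {
    "drop-object": "PHYSICAL-OBJECT",   # "drop" requires a physical object
    "at-object": "LOCATION",            # "at" requires a time or location
}

def disambiguate(word, requirement):
    """Return the senses of `word` whose markers satisfy `requirement`."""
    required = REQUIREMENTS[requirement]
    return [s["sense"] for s in LEXICON[word] if required in s["markers"]]

print(disambiguate("diamond", "drop-object"))   # ['gemstone']
print(disambiguate("diamond", "at-object"))     # ['baseball-field']
```

The point of the sketch is only that each constraint prunes the sense list; a real lexicon would need far richer entries, as the discussion below of marker proliferation makes clear.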
This technique has been extended by Wilks [1972; 1975a; 1975b] in his preference semantics, which relies on the notion that requirements, such as the one described above for an object that is a LOCATION, are rarely hard-and-fast demands. Rather, they can best be described as preferences. For example, we might say that verbs such as "hate" prefer a subject that is animate. Thus we have no difficulty in understanding the sentence
as describing the feelings of a man and not those of soft drinks. But now consider the sentence
Now there is no animate subject available, and so the metaphorical use of "lawn" as an animate subject should be accepted.
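The difference between a hard requirement and a Wilks-style preference can be sketched as a scoring function: the best-matching candidate wins, but an imperfect match is still accepted rather than rejected. The feature lists are illustrative assumptions:

```python
# Sketch of preference semantics: selectional constraints are scored
# preferences, not hard requirements (feature sets are hypothetical).
FEATURES = {
    "man": {"ANIMATE"},
    "lawn": {"PHYSICAL-OBJECT"},
    "soft-drinks": {"PHYSICAL-OBJECT"},
}

def preference_score(verb_prefs, candidate):
    """Count how many of the verb's preferred features the candidate has."""
    return len(verb_prefs & FEATURES[candidate])

def choose_subject(verb_prefs, candidates):
    """Pick the candidate that best satisfies the verb's preferences; an
    imperfect match is still accepted (the metaphorical reading)."""
    return max(candidates, key=lambda c: preference_score(verb_prefs, c))

hate_prefs = {"ANIMATE"}          # "hate" prefers an animate subject
print(choose_subject(hate_prefs, ["man", "soft-drinks"]))  # 'man'
print(choose_subject(hate_prefs, ["lawn"]))  # 'lawn' -- accepted anyway
```

With a hard requirement, the second call would fail; under preferences it simply returns the only candidate, which is what licenses the metaphorical reading.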
Unfortunately, to solve the lexical disambiguation problem completely, it becomes necessary to introduce more and more finely grained semantic markers. For example, to interpret the sentence about Susan's diamond correctly, we must mark one sense of "diamond" as SHIMMERABLE, while the other two are marked NONSHIMMERABLE. As the number of such markers grows, the size of the lexicon becomes unmanageable. In addition, each new entry into the lexicon may require that a new marker be added to each of the existing entries. The breakdown of the semantic marker approach when the number of words and word senses becomes large has led to the development of other ways in which correct senses can be chosen. We return to this issue in Section 15.3.4.
Sentence-Level Processing
Several approaches to the problem of creating a semantic representation of a sentence have been developed, including the following:
In the following sections, we discuss each of these approaches.
A semantic grammar [Burton, 1976; Hendrix et al., 1978; Hendrix and Lewis, 1981] is a context-free grammar in which the choice of nonterminals and production rules is governed by semantic as well as syntactic function. In addition, there is usually a semantic action associated with each grammar rule. The result of parsing and applying all the associated semantic actions is the meaning of the sentence. This close coupling of semantic actions to grammar rules works because the grammar rules themselves are designed around key semantic concepts.
An example of a fragment of a semantic grammar is shown in Figure 15.10. This grammar defines part of a simple interface to an operating system. Shown in braces under each rule is the semantic action that is taken when the rule is applied. The term "value" is used to refer to the value that is matched by the right-hand side of the rule. The dotted notation x.y should be read as the y attribute of the unit x. The result of a successful parse using this grammar will be either a command or a query.
A semantic grammar can be used by a parsing system in exactly the same ways in which a strictly syntactic grammar could be used. Several existing systems that have used semantic grammars have been built around an ATN parsing system, since it offers a great deal of flexibility.
Figure 15.11 shows the result of applying this semantic grammar to the sentence
Notice that in this approach, we have combined into a single process all five steps of Section 15.1.1, with the exception of the final part of pragmatic processing, in which the conversion to the system's command syntax is done.
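The flavor of a semantic grammar can be sketched as a tiny recursive-descent parser whose rules are organized around domain concepts (COMMAND, FILE-NAME) rather than syntactic categories, with the meaning built as each rule succeeds. The rules and structures are hypothetical, loosely in the spirit of Figure 15.10:

```python
# Minimal sketch of a semantic grammar for an operating-system interface.
# Each rule carries its semantic action and builds part of the meaning
# directly; nonterminals are semantic, not syntactic, categories.
def parse_command(tokens):
    """COMMAND -> print FILE-NAME  { command(print, value.file) }"""
    if tokens and tokens[0] == "print":
        file_meaning = parse_file_name(tokens[1:])
        if file_meaning is not None:
            return {"action": "print", "object": file_meaning}
    return None

def parse_file_name(tokens):
    """FILE-NAME -> <word ending in .init>  { file(value) }"""
    if len(tokens) == 1 and tokens[0].endswith(".init"):
        return {"type": "file", "name": tokens[0]}
    return None

print(parse_command("print stuff.init".split()))
# {'action': 'print', 'object': {'type': 'file', 'name': 'stuff.init'}}
```

Because the nonterminals encode domain knowledge, a successful parse is already most of the meaning; that is also why such grammars do not transfer across domains.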
The principal advantages of semantic grammars are the following:
After many experiments with the use of semantic grammars in a variety of domains, the conclusion appears to be that for producing restricted natural language interfaces quickly, they can be very useful. But as an overall solution to the problem of language understanding, they are doomed by their failure to capture important linguistic generalizations.
Case grammars [Fillmore, 1968; Bruce, 1975] provide a different approach to the problem of how syntactic and semantic interpretation can be combined. Grammar rules are written to describe syntactic rather than semantic regularities. But the structures the rules produce correspond to semantic relations rather than to strictly syntactic ones. As an example, consider the two sentences and the simplified forms of their conventional parse trees shown in Figure 15.12.
Although the semantic roles of "Susan" and "the file" are identical in these two sentences, their syntactic roles are reversed. Each is the subject in one sentence and the object in another.
Using a case grammar, the interpretations of the two sentences would both be
(printed (agent Susan)
(object File))
Now consider the two sentences shown in Figure 15.13.
The syntactic structures of these two sentences are almost identical. In one case, "Mother" is the subject of "baked," while in the other "the pie" is the subject. But the relationship between Mother and baking is very different from that between the pie and baking. A case grammar analysis of these two sentences reflects this difference. The first sentence would be interpreted as
(baked (agent Mother)
(timeperiod 3-hours))
The second would be interpreted as
(baked (object Pie)
(timeperiod 3-hours))
In these representations, the semantic roles of "Mother" and "the pie" are made explicit. It is interesting to note that this semantic information actually does intrude into the syntax of the language. While it is permissible to conjoin two parallel sentences (e.g., "the pie baked" and "the cake baked" become "the pie and the cake baked"), this is only possible if the conjoined noun phrases are in the same case relation to the verb. This accounts for the fact that we do not say, "Mother and the pie baked."
Notice that the cases used by a case grammar describe relationships between verbs and their arguments. This contrasts with the grammatical notion of surface case, as exhibited, for example, in English, by the distinction between "I" (nominative case) and "me" (objective case). A given grammatical, or surface, case can indicate a variety of semantic, or deep, cases.
There is no clear agreement on exactly what the correct set of deep cases ought to be, but some obvious ones are the following:
The process of parsing into a case representation is heavily directed by the lexical entries associated with each verb. Figure 15.14 shows examples of a few such entries. Optional cases are indicated in parentheses.
Languages have rules for mapping from underlying case structures to surface syntactic forms. For example, in English, the "unmarked subject" [Footnote: The unmarked subject is the one that is used by default; it signals no special focus or emphasis in the sentence.] is generally chosen by the following rule:
These rules can be applied in reverse by a parser to determine the underlying case structure from the superficial syntax.
Parsing using a case grammar is usually expectation-driven. Once the verb of the sentence has been located, it can be used to predict the noun phrases that will occur and to determine the relationship of those phrases to the rest of the sentence.
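This expectation-driven style can be sketched as follows: the verb's lexical entry predicts the case slots to fill, and two surface forms that fill the slots the same way yield the same deep structure. The case frames below are illustrative assumptions, loosely after Figure 15.14:

```python
# Sketch of expectation-driven case-frame parsing. The verb's lexical
# entry (hypothetical) lists required and optional deep cases.
CASE_FRAMES = {
    "print": {"required": ["agent", "object"], "optional": ["instrument"]},
    "bake":  {"required": ["object"], "optional": ["agent", "timeperiod"]},
}

def fill_frame(verb, fillers):
    """Map noun phrases onto the verb's case slots, checking that every
    required case is filled and no unknown case is used."""
    frame = CASE_FRAMES[verb]
    allowed = set(frame["required"]) | set(frame["optional"])
    assert all(case in allowed for case in fillers), "unknown case"
    assert all(case in fillers for case in frame["required"]), "missing case"
    return (verb, sorted(fillers.items()))

# Active and passive surface forms yield the same case structure:
active = fill_frame("print", {"agent": "Susan", "object": "the file"})
passive = fill_frame("print", {"object": "the file", "agent": "Susan"})
print(active == passive)  # True
```

The equality of the two results is the point of the case-grammar analysis of "Susan printed the file" and "The file was printed by Susan": syntactic roles differ, deep cases do not.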
ATNs provide a good structure for case grammar parsing. Unlike traditional parsing algorithms in which the output structure always mirrors the structure of the grammar rules that created it, ATNs allow output structures of arbitrary form. For an example of their use, see Simmons [1973], which describes a system that uses an ATN parser to translate English sentences into a semantic net representing the case structures of sentences. These semantic nets can then be used to answer questions about the sentences.
The result of parsing in a case representation is usually not a complete semantic description of a sentence. For example, the constituents that fill the case slots may still be English words rather than true semantic descriptions stated in the target representation. To go the rest of the way toward building a meaning representation, we still require many of the steps that are described in Section 15.3.4.
Conceptual parsing, like semantic grammars, is a strategy for finding both the structure and the meaning of a sentence in one step. Conceptual parsing is driven by a dictionary that describes the meanings of words as conceptual dependency (CD) structures.
Parsing a sentence into a conceptual dependency representation is similar to the process of parsing using a case grammar. In both systems, the parsing process is heavily driven by a set of expectations that are set up on the basis of the sentence's main verb. But because the representation of a verb in CD is at a lower level than that of a verb in a case grammar (in which the representation is often identical to the English word that is used), CD usually provides a greater degree of predictive power. The first step in mapping a sentence into its CD representation involves a syntactic processor that extracts the main noun and verb. It also determines the syntactic category and aspectual class of the verb (i.e., stative, transitive, or intransitive). The conceptual processor then takes over. It makes use of a verb-ACT dictionary, which contains an entry for each environment in which a verb can appear. Figure 15.15 (taken from Schank [1973]) shows the dictionary entries associated with the verb "want." These three entries correspond to the three kinds of wanting:
Once the correct dictionary entry is chosen, the conceptual processor analyzes the rest of the sentence looking for components that will fit into the empty slots of the verb structure. For example, if the stative form of "want" has been found, then the conceptual processor will look for a conceptualization that can be inserted into the structure. So, if the sentence being processed were
the structure shown in Figure 15.16 would be built.
The conceptual processor examines possible interpretations in a well-defined order. For example, if a phrase of the form "with PP" (recall that a PP is a picture producer) occurs, it could indicate any of the following relationships between the PP and the conceptualization of which it is a part:
Suppose that the conceptual processor were attempting to interpret the prepositional phrase in the sentence
First, the system's immediate memory would be checked to see if a park with a girl has been mentioned. If so, a reference to that particular object is generated and the process terminates. Otherwise, the four possibilities outlined above are investigated in the order in which they are presented. Can "the girl" be an instrument of the main ACT (PTRANS) of this sentence? The answer is no, because only MOVE and PROPEL can be instruments of a PTRANS and their objects must be either body parts or vehicles. "Girl" is neither of these. So we move on to consider the second possibility. In order for "girl" to be an additional actor of the main ACT, it must be animate. It is. So this interpretation is chosen and the process terminates. If, however, the sentence had been
the process would not have stopped since a fountain is inanimate and cannot move. Then the third possibility would have been considered. Since parks can have fountains, it would be accepted and the process would terminate there. For a more detailed description of the way a conceptual processor based on CD works, see Schank [1973], Rieger [1975], and Riesbeck [1975].
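The ordered testing of candidate roles can be sketched directly; the feature lists and the rule order are illustrative assumptions, not Schank's actual rules:

```python
# Sketch of the ordered interpretation of "with PP" phrases in a CD-style
# conceptual processor (features and rule order are illustrative).
FEATURES = {
    "girl": {"ANIMATE"},
    "fountain": {"INANIMATE", "PARK-PART"},
    "peacock": {"ANIMATE"},
}

def interpret_with(pp, main_act):
    """Try each candidate role for 'with PP' in a fixed order and return
    the first that is acceptable."""
    # 1. Instrument of the ACT: for PTRANS, only body parts or vehicles.
    if main_act == "PTRANS" and FEATURES[pp] & {"BODY-PART", "VEHICLE"}:
        return "instrument"
    # 2. Additional actor of the ACT: must be animate.
    if "ANIMATE" in FEATURES[pp]:
        return "additional-actor"
    # 3. Part of the location: e.g., parks can have fountains.
    if "PARK-PART" in FEATURES[pp]:
        return "part-of-location"
    return None

print(interpret_with("girl", "PTRANS"))      # additional-actor
print(interpret_with("fountain", "PTRANS"))  # part-of-location
print(interpret_with("peacock", "PTRANS"))   # additional-actor
```

Note that the peacock case comes out as "additional-actor," which foreshadows the difficulty discussed next: a simple ANIMATE test accepts interpretations that world knowledge would reject.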
This example illustrates both the strengths and the weaknesses of this approach to sentence understanding. Because a great deal of semantic information is exploited in the understanding process, sentences that would be ambiguous to a purely syntactic parser can be assigned a unique interpretation. Unfortunately, the amount of semantic information that is required to do this job perfectly is immense. All simple rules have exceptions. For example, suppose the conceptual processor described above were given the sentence
Since peacocks are animate, they would be acceptable as additional actors of the main verb, "went." Thus, the interpretation that would be produced would be that shown in Figure 15.17(a), while the more likely interpretation, in which John went to a park containing peacocks, is shown in Figure 15.17(b). But if the possible roles for a prepositional phrase introduced by "with" were considered in the order necessary for
this sentence to be interpreted correctly, then the previous example involving the phrase, "with Mary," would have been misunderstood.
The problem is that the simple check for the property ANIMATE is not sufficient to determine acceptability as an additional actor of a PTRANS. Additional knowledge is necessary. Some more knowledge can be inserted within the framework we have described for a conceptual processor. But to do a very good job of producing correct semantic interpretations of sentences requires knowledge of the larger context in which the sentence appears. Techniques for exploiting such knowledge are discussed in the next section.
The final approach to semantics that we consider here is one in which syntactic parsing and semantic interpretation are treated as separate steps, although they must mirror each other in well-defined ways. This is the approach to semantics that we looked at briefly in Section 15.1.1 when we worked through the example sentence "I want to print Bill's init file."
If a strictly syntactic parse of a sentence has been produced, then a straightforward way to generate a semantic interpretation is the following:
We have already discussed the first ofthese steps (in Section 15.3). In the rest of this section, we discuss the second.
Montague Semantics
Recall that we argued in Section 15.1.1 that the reason syntactic parsing was a good idea was that it produces structures that correspond to the structures that should result from semantic processing. If we investigate this idea more closely, we arrive at a notion called compositional semantics. The main idea behind compositional semantics is that, for every step in the syntactic parsing process, there is a corresponding step in semantic interpretation. Each time syntactic constituents are combined to form a larger syntactic unit, their corresponding semantic interpretations can be combined to form a larger semantic unit. The necessary rules for combining semantic structures are associated with the corresponding rules for combining syntactic structures. We use the word "compositional" to describe this approach because it defines the meaning of each sentence constituent to be a composition of the meanings of its constituents with the meaning of the rule that was used to create it. The main theoretical basis for this approach is modern (i.e., post-Fregean) logic; the clearest linguistic application is the work of Montague [Dowty et al., 1981; Thomason, 1974].
As an example of this approach to semantic interpretation, let's return to the example that we began in Section 15.1.1. The sentence is
The output of the syntactic parsing process was shown in Figure 15.2, and a fragment of the knowledge base that is being used to define the target representation was shown in Figure 15.3. The result of semantic interpretation was also shown there in Figure 15.4. Although the exact form of semantic mapping rules in this approach depends on the way that the syntactic grammar is defined, we illustrate the idea of compositional semantic rules in Figure 15.18.
The first two rules are examples of verb-mapping rules. Read these rules as saying that they map from a partial syntactic structure containing a verb, its subject, and its object, to some unit with the attributes instance, agent, and object. These rules do two things. They describe the meaning of the verbs ("want" or "print") themselves in terms of events in the knowledge base. They also state how the syntactic arguments of the verbs (their subjects and objects) map into attributes of those events. By the way, do not get confused by the use of the term "object" in two different senses here. The syntactic object of a sentence and its semantic object are two different things. For historical reasons (including the standard usage in case grammars as described in Section 15.3.2), they are often called the same thing, although this problem is sometimes avoided by using some other name, such as affected-entity, for the semantic object. Alternatively, in some knowledge bases, much more specialized names, such as printed-thing, are sometimes used as attribute names.
The third and fourth rules are examples of modifier rules. Like the verb rules, they too must specify both their own constituent's contribution to meaning as well as how it combines with the meaning of the noun phrase or phrases to which it is attached.
The last two rules are simpler. They define the meanings of nouns. Since nouns do not usually take arguments, these rules specify only single-word meanings; they do not need to describe how the meanings of larger constituents are derived from their components.
One important thing to remember about these rules is that since they define mappings from words into a knowledge base, they implicitly make available to the semantic processing system all the information contained in the knowledge base itself. For example, Figure 15.19 contains a description of the semantic information that is associated with the word "want" after applying the semantic rule associated with the verb and retrieving semantic constraints associated with wanting events in the knowledge base. Notice that we now know where to pick up the agent for the wanting (RMI) and we know some property that the agent must have. The semantic interpretation routine will reject any interpretation that does not satisfy all these constraints.
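The pairing of syntactic and semantic rules can be sketched as follows. The event names, word meanings, and rule shapes are illustrative assumptions standing in for Figure 15.18, not the book's actual rule notation:

```python
# Sketch of compositional semantic interpretation: each syntactic rule
# has a paired semantic rule, and constituent meanings combine bottom-up.
NOUN_MEANINGS = {"I": "Speaker", "file": "File-Struct"}

def verb_rule(event, subject_meaning, object_meaning):
    """S -> NP V NP maps to a unit with instance, agent, and object."""
    return {"instance": event, "agent": subject_meaning,
            "object": object_meaning}

def interpret(subject, verb, obj):
    """Combine word meanings exactly as the parse combines the words."""
    events = {"print": "Printing-Event", "want": "Wanting-Event"}
    return verb_rule(events[verb], NOUN_MEANINGS[subject],
                     NOUN_MEANINGS[obj])

print(interpret("I", "print", "file"))
# {'instance': 'Printing-Event', 'agent': 'Speaker', 'object': 'File-Struct'}
```

A real system would, as the text notes, also consult the knowledge base at each step to reject combinations whose constraints (such as the required properties of a wanting event's agent) are violated.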
This compositional approach to defining semantic interpretation has proved to be a very powerful idea. (See, for example, the Absity system described in Hirst [1987].) Unfortunately, there are some linguistic constructions that cannot be accounted for naturally in a strictly compositional system. Quantified expressions have this property. Consider, for example, the sentence
There are several ways in which the relative scopes of the quantifiers in this sentence can be assigned. In the most likely reading, both existential quantifiers are within the scope of the universal quantifier. But, in other readings, they are not. These include readings corresponding to, "There is a major such that every student who had not declared it took an English class," and "There is an English class such that every student who had not declared some major took it." In order to generate these meanings compositionally from the parse, it is necessary to produce a separate parse for each scope assignment. But there is no syntactic reason to do that, and it requires substantial additional effort. An alternative is to generate a single parse and then to use a noncompositional algorithm to generate as many alternative scopes as desired.
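The noncompositional alternative can be sketched in its simplest form: take the quantifiers from the single parse and enumerate their possible scope orderings separately. (A real scoping algorithm would also filter out orderings that are semantically equivalent or implausible; this sketch only enumerates.)

```python
# Sketch of noncompositional scope generation: from one parse, candidate
# readings are produced by enumerating quantifier orderings.
from itertools import permutations

def scope_orderings(quantifiers):
    """Return every left-to-right scope assignment for the quantifiers."""
    return list(permutations(quantifiers))

readings = scope_orderings(["every-student", "some-major", "an-English-class"])
print(len(readings))  # 6 candidate scopings from a single parse
```

This is why the approach is attractive: one syntactic analysis serves all the readings, and the scoping machinery is kept out of the grammar.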
As a second example, consider the sentence, "John only eats meat on Friday and Mary does too." The syntactic analysis of this sentence must include the verb phrase constituent, "only eats meat on Friday," since that is the constituent that is picked up by the elliptical expression "does too." But the meaning of the first clause has a structure more like
which can be read as, "Meat is the only thing that John eats on Friday."
Extended Reasoning with a Knowledge Base
A significant amount of world knowledge may be necessary in order to do semantic interpretation (and thus, sometimes, to get the correct syntactic parse). Sometimes the knowledge is needed to enable the system to choose among competing interpretations. Consider, for example, the sentences
Let's concentrate on the problem of deciding to which constituent the prepositional phrase should be attached and of assigning a meaning to the preposition "with." We have two main choices: either the phrase attaches to the action of making the cake and "with" indicates the instrument relation, or the prepositional phrase attaches to the noun phrase describing the dessert that was made, in which case "with" describes an additional component of the dessert. The first two sentences are relatively straightforward if we imagine that our knowledge base contains the following facts:
But now consider the third sentence. A giant tower is neither a food nor a mixer. So it is not a likely candidate for either role. What is required here is the much more specific (and culturally dependent) fact that
The highly specific nature of this knowledge is illustrated by the fact that the last of these sentences does not make much sense to us since we can find no appropriate role for the tower, either as part of a pie or as an instrument used during pie making.
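The attachment decision described above can be sketched as a lookup against exactly such facts. The type hierarchy and fact base below are illustrative assumptions (including the culturally specific wedding-cake fact), not a real knowledge base:

```python
# Sketch of knowledge-based prepositional-phrase attachment for
# "made a cake with X" (the type hierarchy and facts are illustrative).
ISA = {
    "chocolate": "food",
    "mixer": "mixer",       # a kind of kitchen implement
    "tower": "structure",
}
FACTS = {
    # Cakes may contain foods; by (cultural) convention, wedding cakes
    # may also have structures such as giant towers on them.
    ("cake", "component"): {"food", "structure"},
    # Mixers serve as instruments of making.
    ("cake", "instrument"): {"mixer"},
}

def attach(pp_object, head="cake"):
    """Choose the attachment/role licensed by the knowledge base."""
    kind = ISA[pp_object]
    if kind in FACTS[(head, "instrument")]:
        return "instrument-of-making"
    if kind in FACTS[(head, "component")]:
        return "component-of-" + head
    return None

print(attach("chocolate"))  # component-of-cake
print(attach("mixer"))      # instrument-of-making
print(attach("tower"))      # component-of-cake, via the wedding-cake fact
```

Remove the culturally specific fact and the tower sentence becomes uninterpretable, which is exactly the behavior the pie example illustrates.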
Another use for knowledge is to enable the system to accept meanings that it has not been explicitly told about. Consider the following sentences as examples:
Suppose our system has only the following meanings for the words "Joyce," "Washington," and "squirrel" (actually we give only the relevant parts of the meanings):
But suppose that we also have only the following meanings for the verbs in these sentences:
The problem is that it is not possible to construct coherent interpretations for any of
these sentences with these definitions. An author is not a …
Of course, this problem can become arbitrarily complex. For example, metaphors
are a rich source for linguistic expressions [Lakoff and Johnson, 1980]. And the
problem becomes even more complex when we move beyond single sentences and
attempt to extract meaning from texts and dialogues. We delve briefly into those issues
in Section 15.4.
The Interaction between Syntax and Semantics
If we take a compositional approach to semantics, then we apply semantic interpretation
rules to each syntactic constituent, eventually producing an interpretation for an entire
sentence. But making a commitment about what to do implies no specific commitment
about when to do it. To implement a system, however, we must make some decision
on how control will be passed back and forth between the syntactic and the semantic
processors. Two extreme positions are:
There are arguments in favor of each approach. The theme of most of the arguments
is search control and the opportunity to prune dead-end paths. Applying semantic
processing to each constituent as soon as it is produced allows semantics to rule out right
away those constituents that are syntactically valid but that make no sense. Syntactic
processing can then be informed that it should not go any further with those constituents.
This approach would pay off, for example, for the sentence, "Is the glass jar peanut
butter?" But this approach can be costly when syntactic processing builds constituents
that it will eventually reject as being syntactically unacceptable, regardless of their
semantic acceptability. The sentence, "The horse raced past the barn fell down," is
an example of this. There is no point in doing a semantic analysis of the sentence
"The horse raced past the barn," since that constituent will not end up being part of
any complete syntactic parse. There are also additional arguments for waiting until a
complete sentence has been parsed to do at least some parts of semantic interpretation.
These arguments involve the need for large constituents to serve as the basis of those
semantic actions, such as the ones we discussed in Section 15.3.4, that are hard to define
completely compositionally. There is no magic solution to this problem. Most systems
use one of these two extremes or a heuristically driven compromise position.
To understand even a single sentence, it is necessary to consider the discourse and
pragmatic context in which the sentence was uttered (as we saw in Section 15.1.1).
These issues become even more important when we want to understand texts and
dialogues, so in this section we broaden our concern to these larger linguistic units.
There are a number of important relationships that may hold between phrases and parts
of their discourse contexts, including:
The word "it" should be identified as referring to the red balloon. References such
as this are called anaphoric references or anaphora.
The phrase "the title page" should be recognized as being part of the book that
was just bought.
Taking a flight should be recognized as part of going on a trip.
The pronoun "they" should be recognized as referring to the burglars who broke
into the house.
The moons in the second sentence should be understood to be some of the moons
mentioned in the first sentence. Notice that to understand the second sentence at
all requires that we use the context of the first sentence to establish that the word
"moons" means moon decals.
Dave should be understood to be some person named Dave. Although there are
many people with that name, the speaker had one particular one in mind, and the
discourse context should tell us which.
The snow should be recognized as the reason that the schools were closed.
Sally's sudden interest in a job should be recognized as arising out of her desire
for a new car and thus for the money to buy one.
In many circumstances, this sentence should be recognized as having, as its
intended effect, that the hearer should do something like close the window or turn
up the thermostat.
The speaker's presuppositions, including the fact that CS101 is a valid course,
that Joe is a student, and that Joe took CS101, should be recognized so that if any
of them is not satisfied, the speaker can be informed.
In order to be able to recognize these kinds of relationships among sentences, a
great deal of knowledge about the world being discussed is required. Programs that
can do multiple-sentence understanding rely either on large knowledge bases or on
strong constraints on the domain of discourse so that only a more limited knowledge
base is necessary. The way this knowledge is organized is critical to the success of
the understanding program. In the rest of this section, we discuss briefly how some of
the knowledge representations described in Chapters 9 and 10 can be exploited by a
language-understanding program. In particular, we focus on the use of the following
kinds of knowledge:
Although these issues are complex, we discuss them only briefly here. Most of the
hard problems are not peculiar to natural language processing. They involve reasoning
about objects, events, goals, plans, intentions, beliefs, and likelihoods, and we have
discussed all these issues in some detail elsewhere. Our goal in this section is to tie
those reasoning mechanisms into the process of natural language understanding.
There are two important parts of the process of using knowledge to facilitate understanding:
The first of these is critical if the amount of knowledge available is large. Some
techniques for handling this were outlined in Section 4.3.5, since the problem arises
whenever knowledge structures are to be used.
The linguistic properties of coherent discourse, however, provide some additional
mechanisms for focusing. For example, the structure of task-oriented discourses typically
mirrors the structure of the task. Consider the following sequence of (highly
simplified) instructions:
This task decomposes into three subtasks: making the cake, making the filling, and
combining the two components. The structure of the paragraph of instructions is: overall
sketch of the task, instructions for step 1, instructions for step 2, and then instructions
for step 3.
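The way a task-structured discourse predicts the focus of each instruction block can be sketched as walking the task decomposition. The task names are illustrative assumptions:

```python
# Sketch of using task structure to track discourse focus: the focus
# moves through the subtasks in the order the instructions address them.
TASKS = {"make-dessert": ["make-cake", "make-filling", "combine"]}

def focus_sequence(top_task):
    """Predict the focus of each instruction block: the overall sketch
    first, then each subtask in order."""
    return [top_task] + TASKS[top_task]

print(focus_sequence("make-dessert"))
# ['make-dessert', 'make-cake', 'make-filling', 'combine']
```

An understander holding this sequence knows, when reading the second block of instructions, that definite references are most likely to resolve against objects involved in making the cake.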
A second property of coherent discourse is that dramatic changes of focus are
usually signaled explicitly with phrases such as "on the other hand," "to return to an
earlier topic," or "a second issue is."
Assuming that all this knowledge has been used successfully to focus on the relevant
part(s) of the knowledge base, the second issue is how to use the focused knowledge
to help in understanding. There are as many ways of doing this as there are discourse
phenomena that require it. In the last section, we presented a sample list of those
phenomena. To give one example, consider the problem of finding the meaning of
definite noun phrases. Definite noun phrases are ones that refer to specific individual
objects, for example, the first noun phrase in the sentence, "The title page was torn."
The title page in question is assumed to be one that is related to an object that is currently
in focus. So the procedure for finding a meaning for it involves searehing for ways in
which a title page could be related to a focused object. Of course, in some sense, almost
any object in a knowledge base relates somehow to almost any other. But some relations
are far more salient than others, and they should be considered first. Highly salient
relations include physical-part-of, temporal-part-of, and element-of. In this example,
physical-part-of relates the title page to the book that is in focus as a result of its mention
in the previous sentence.
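The search procedure just described can be sketched directly: try the salient relations in order against each focused object. The knowledge-base triples are illustrative assumptions:

```python
# Sketch of definite-noun-phrase resolution: search the focused objects
# for one related to the referent by a highly salient relation.
KB = {
    ("title-page", "physical-part-of", "book"),
    ("chapter", "physical-part-of", "book"),
}
SALIENT = ["physical-part-of", "temporal-part-of", "element-of"]

def resolve(definite_np, focus_list):
    """Return a (relation, anchor) pair linking the NP to a focused
    object, trying the most salient relations first."""
    for relation in SALIENT:
        for anchor in focus_list:
            if (definite_np, relation, anchor) in KB:
                return relation, anchor
    return None

# "I bought a book. The title page was torn."
print(resolve("title-page", ["book"]))  # ('physical-part-of', 'book')
```

Ordering the relations by salience is what keeps the search from drowning in the fact that, in a large knowledge base, almost everything is related to almost everything else.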
Other ways of using focused information also exist. We examine some of them in
the remaining parts of this section.
In order for a program to be able to participate intelligently in a dialogue, it must be able
to represent not only its own beliefs about the world, but also its knowledge of the other
dialogue participant's beliefs about the world, that person's beliefs about the computer's
beliefs, and so forth. The remark "She knew I knew she knew I knew she knew" [Footnote: From
Kingsley Amis' Jake's Thing.] may
be a bit extreme, but we do that kind of thinking all the time. To make computational
models of belief, it is useful to divide the issue into two parts: those beliefs that can
be assumed to be shared among all the participants in a linguistic event and those that
cannot.
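The arbitrarily nested beliefs alluded to above can be sketched with an explicit belief operator; the predicate names and propositions are illustrative assumptions, not a worked-out modal logic:

```python
# Sketch of representing nested individual beliefs with an explicit
# believes(agent, proposition) structure (names are illustrative).
def believes(agent, proposition):
    return ("believes", agent, proposition)

# "She knew I knew she knew": beliefs nest to arbitrary depth.
fact = ("took", "Joe", "CS101")
nested = believes("she", believes("me", believes("she", fact)))

def depth(p):
    """Count how deeply a proposition is embedded under belief operators."""
    return 1 + depth(p[2]) if p[0] == "believes" else 0

print(depth(nested))  # 3
```

Shared beliefs, by contrast, need no operator at all: they sit in the knowledge base as plain facts, which is the point of the next subsection.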
Modeling Shared Beliefs
Shared beliefs can be modeled without any explicit notion of belief in the knowledge
base. All we need to do is represent the shared beliefs as facts, and they will be accessed
whenever knowledge about anyone's beliefs is needed. We have already discussed
techniques for doing this. For example, much of the knowledge described in Chapter 10
is exactly the sort that people presume is shared by other people they are communicating
with. Scripts, in particular, have been used extensively to aid in natural language
understanding. Recall that scripts record commonly occurring sequences of events.
There are two steps in the process of using a script to aid in language understanding:
Both of these aspects of reasoning with scripts have already been discussed in Section 10.2.
The story-understanding program SAM [Cullingford, 1981] demonstrated
the usefulness of such reasoning with scripts in natural language understanding. To
understand a story, SAM first employed a parser that translated the English sentences
into their conceptual dependency representation. Then it built a representation of the
entire text using the relationships indicated by the relevant scripts.
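The two steps (selecting a relevant script and using it to fill in unstated events) can be sketched as follows. The script contents and the triggering rule are invented for illustration; SAM's actual machinery, built on conceptual dependency, is far richer:

```python
# Illustrative sketch of script-based understanding:
# (1) select a script triggered by events mentioned in the text,
# (2) use the script to supply the events the text leaves implicit.

SCRIPTS = {
    "restaurant": ["enter", "order", "eat", "pay", "leave"],
}

def trigger_script(events):
    """Pick a script that mentions any observed event."""
    for name, sequence in SCRIPTS.items():
        if any(e in sequence for e in events):
            return name
    return None

def fill_in(events):
    """Return the full event sequence implied by the matched script."""
    name = trigger_script(events)
    if name is None:
        return events
    return SCRIPTS[name]        # the script supplies the omitted steps

story = ["enter", "eat", "leave"]      # "order" and "pay" are unstated
print(fill_in(story))
# -> ['enter', 'order', 'eat', 'pay', 'leave']
```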
Modeling Individual Beliefs
As soon as we decide to represent individual beliefs, we need to introduce some explicit
predicate(s) to indicate that a fact is believed. Up until now, belief has been indicated
only by the presence or absence of assertions in the knowledge base. To model belief, we
need to move to a logic that supports reasoning about belief propositions. The standard
approach is to use a modal logic such as that defined in Hintikka [1962]. Classical
logic deals with the truth or falsehood of different statements as they are.
Modal logic, on the other hand, concerns itself with the different "modes" in which
a statement may be true. Modal logics allow us to talk about the truth of a set of
propositions not only in the current state of the real world, but also about their truth or
falsehood in the past or the future (these are called temporal logics), and about their
truth or falsehood under circumstances that might have been, but were not (these are
sometimes called conditional logics). We have already used one idea from modal logic,
namely the notion of necessary truth. We used it in Section 13.5, when we talked about
nonlinear planning in TWEAK.
Modal logics also allow us to talk of the truth or falsehood of statements concerning
the beliefs, knowledge, desires, intentions, and obligations of people and robots, which
may in fact be, respectively, false, unjustified, unsatisfiable, irrational, or mutually
contradictory. Modal logics thus provide a set of powerful tools for understanding natural
language utterances, which often involve reference to other times and circumstances,
and to the mental states of people.
In particular, to model individual belief we define a modal operator BELIEVE that
enables us to make assertions of the form BELIEVE(A, P), which is true whenever A
believes P to be true. Notice that this can occur even if P is believed by someone else
to be false or even if P is false.
Another useful modal operator is KNOW: KNOW(A, P) is true whenever A believes P
and P is in fact true.
A third useful modal operator is KNOW-WHAT(A, P), which is true if A knows the
value of the function P. For example, we might say that A knows the value of his age.
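A minimal sketch of these operators, assuming the standard treatment of KNOW as true belief and representing propositions simply as strings:

```python
# Sketch of BELIEVE, KNOW, and KNOW-WHAT over a knowledge base that
# separates what is true from what each agent believes.

facts = {"age(A) = 30"}                       # what is true in the world
beliefs = {
    "A": {"age(A) = 30", "flat(earth)"},      # A's beliefs (one is false!)
    "B": set(),
}

def BELIEVE(agent, p):
    return p in beliefs[agent]

def KNOW(agent, p):
    """Standard modal treatment: knowledge is belief in a true fact."""
    return BELIEVE(agent, p) and p in facts

def KNOW_WHAT(agent, fn):
    """True if some belief of `agent` supplies a value for function `fn`."""
    return any(b.startswith(fn + " =") for b in beliefs[agent])

print(BELIEVE("A", "flat(earth)"))   # True: A believes it...
print(KNOW("A", "flat(earth)"))      # False: ...but it is not a fact
print(KNOW_WHAT("A", "age(A)"))      # True: A knows the value of his age
```

Note that BELIEVE can hold even when the proposition is false, exactly as the text requires.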
An alternative way to represent individual beliefs is to use the idea of knowledge
base partitioning that we discussed in Section 9.1. Partitioning enables us to do two
things:
Requirement 1 makes it imperative that shared beliefs not be duplicated in the
representation. This suggests that a single knowledge base must be used to represent the
beliefs of all the participants. But requirement 2 demands that it be possible to separate
the beliefs of one person from those of another. One way to do this is to use partitioned
semantic nets. Figure 15.20 shows an example of a partitioned belief space.
Figure 15.20: A Partitioned Semantic Net Showing Three Belief Spaces
Three different belief spaces are shown:
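The partitioning idea can be sketched as follows. The spaces, agents, and facts are invented, since they stand in for the contents of Figure 15.20:

```python
# Sketch of a partitioned knowledge base: shared beliefs are stored
# once (requirement 1), while each agent's belief space keeps its own
# private, possibly conflicting, assertions (requirement 2).

shared = {"is-book(b1)", "has-title-page(b1)"}
partitions = {
    "S1": {"owns(john, b1)"},        # John's belief space
    "S2": {"owns(mary, b1)"},        # Mary's conflicting belief space
}

def believes(space, p):
    """A proposition holds in a space if it is private to that space
    or shared by everyone -- shared facts are never duplicated."""
    return p in partitions[space] or p in shared

print(believes("S1", "is-book(b1)"))     # True, via the shared partition
print(believes("S1", "owns(mary, b1)"))  # False, private to S2
```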
15.4.3 Using Goals and Plans for Understanding

Consider the text
To understand this story, we need to recognize that John had
Some of the common goals that can be identified in stories of all sorts (including
children's stories, newspaper reports, and history books) are
To achieve their goals, people exploit plans. In Chapter 13, we talked about several
computational representations of plans. These representations can be used to support
natural language processing, particularly if they are combined with a knowledge base of
operators and stored plans that describe the ways that people often accomplish common
goals. These stored operators and plans enable an understanding system to form a
coherent representation of a text even when steps have been omitted, since they specify
things that must have occurred in the complete story. For example, to understand this
simple text about John, we need to make use of the fact that John was exploiting the
operator USE (by A of P to perform G), which can be described as:
USE(A, P, G):
precondition: KNOW-WHAT(A, LOCATION(P))
NEAR(A, P)
HAS-CONTROL-OF(A, P)
READY(P)
postcondition: DONE(G)
In other words, for A to use P to perform G, A must know the location of P, A must
be near P, A must have control of P (for example, I cannot use a screwdriver that you
are holding and refuse to give to me), and P must be ready for use (for example, I cannot
use a broken screwdriver).
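A sketch of USE as a precondition-checked operator. The state is a set of ground atoms spelled as strings; the spellings are illustrative assumptions of this sketch:

```python
# Sketch of the USE operator: it applies only when all preconditions
# hold in the current state; its postcondition is then added.

def preconditions_of_use(agent, tool):
    return [
        f"know-what({agent}, location({tool}))",
        f"near({agent}, {tool})",
        f"has-control-of({agent}, {tool})",
        f"ready({tool})",
    ]

def apply_use(state, agent, tool, goal):
    """Apply USE(A, P, G) if possible; return (new state, missing)."""
    missing = [p for p in preconditions_of_use(agent, tool)
               if p not in state]
    if missing:
        return state, missing           # cannot use the tool yet
    return state | {f"done({goal})"}, []

state = {"near(john, screwdriver)", "ready(screwdriver)"}
new_state, missing = apply_use(state, "john", "screwdriver", "assemble-bike")
print(missing)
# John must still locate the screwdriver and take control of it.
```

The unsatisfied preconditions are exactly the subgoals John pursues next in the story.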
In our story, John's plan for constructing the bike includes using a screwdriver. So
he needs to establish the preconditions for that use. In particular, he needs to know the
location of the screwdriver. To find that out, he makes use of the operator LOOK-FOR:
LOOK-FOR(A, P):
precondition: CAN-RECOGNIZE(A, P)
postcondition: KNOW-WHAT(A, LOCATION(P))
A story understanding program can connect the goal of putting together the bike
with the activity of looking for a screwdriver by recognizing that John is looking for a
screwdriver so that he can use it as part of putting the bike together.
Often there are alternative operators or plans for achieving the same goal. For
example, to find out where the screwdriver was, John could have asked someone. Thus
the problem of constructing a coherent interpretation of a text or a discourse may involve
considering many partial plans and operators.
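The chaining step, recognizing that LOOK-FOR (or the alternative of asking) serves USE by establishing one of its preconditions, can be sketched as follows. The operator tables and condition names are illustrative:

```python
# Sketch of the backward-chaining step in plan recognition: an observed
# action is explained if its postcondition establishes a precondition
# of an operator that serves a known goal.

OPERATORS = {
    "LOOK-FOR": {"pre": ["can-recognize"], "post": ["know-location"]},
    "ASK":      {"pre": ["near-someone"],  "post": ["know-location"]},
    "USE":      {"pre": ["know-location", "near", "has-control", "ready"],
                 "post": ["goal-done"]},
}

def explains(observed, goal_op):
    """Which preconditions of `goal_op` does `observed` establish?"""
    post = set(OPERATORS[observed]["post"])
    return [p for p in OPERATORS[goal_op]["pre"] if p in post]

print(explains("LOOK-FOR", "USE"))   # -> ['know-location']
print(explains("ASK", "USE"))        # the alternative plan also works
```

Because both LOOK-FOR and ASK establish the same precondition, the understander must entertain several partial plans at once, just as the text observes.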
Plan recognition has served as the basis for many understanding programs. PAM
[Wilensky, 1981] is an early example; it translated stories into a CD representation.
Another such program was BORIS [Dyer, 1983]. BORIS used a memory structure called
the Thematic Abstraction Unit to organize knowledge about plans, goals, interpersonal
relationships, and emotions. For other examples, see Allen and Perrault [1980] and
Sidner [1985].
15.4.4 Speech Acts

Language is a form of behavior. We use it as one way to accomplish our goals. In
essence, we make communicative plans in much the same sense that we make plans for
anything else [Austin, 1962]. In fact, as we just saw in the example above, John could
have achieved his goal of locating a screwdriver by asking someone where it was rather
than by looking for it. The elements of communicative plans are called speech acts
[Searle, 1969]. We can axiomatize speech acts just as we axiomatized other operators in
the previous section, except that we need to make use of modal operators that describe
states of belief, knowledge, wanting, etc. For example, we can define the basic speech
act A INFORM B of P as follows:
INFORM(A, B, P)
precondition: BELIEVE(A, P)
KNOW-WHAT(A, LOCATION(B))
postcondition: BELIEVE(B, BELIEVE(A, P))
BELIEVE-IN(B, A) → BELIEVE(B, P)
To execute this operation, A must believe P and A must know where B is. The result
of this operator is that B believes that A believes P, and if B believes in the truth of
what A says, then B also believes P.
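A sketch of executing INFORM, with nested beliefs stored as tuples and BELIEVE-IN approximated by a simple trust set (both are representational assumptions of this sketch):

```python
# Sketch of executing INFORM(A, B, P): B comes to believe that A
# believes P, and if B believes in A (trusts A), B also adopts P.

def inform(beliefs, trusts, a, b, p):
    """Update B's belief state after A informs B of P."""
    beliefs.setdefault(b, set()).add((a, "believes", p))
    if a in trusts.get(b, set()):      # BELIEVE-IN(B, A)
        beliefs[b].add(p)
    return beliefs

beliefs = {"A": {"raining"}}
trusts = {"B": {"A"}}                  # B believes what A says
inform(beliefs, trusts, "A", "B", "raining")
print(beliefs["B"])
# B now believes that A believes it is raining, and believes it too.
```

Dropping "A" from B's trust set leaves B with only the nested belief, matching the weaker postcondition.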
We can define other speech acts similarly. For example, we can define ASK-WHAT
(in which A asks B the value of some predicate P):
ASK-WHAT(A, B, P):
precondition: KNOW-WHAT(A, LOCATION(B))
KNOW-WHAT(B, P)
WILLING-TO-PERFORM
(B, INFORM(B, A, P))
postcondition: KNOW-WHAT(A, P)
This is the action that John could have performed as an alternative way of finding a
screwdriver.
We can also define other speech acts, such as A REQUEST B to perform R:
REQUEST(A, B, R)
precondition: KNOW-WHAT(A, LOCATION(B))
CAN-PERFORM(B, R)
WILLING-TO-PERFORM(B, R)
postcondition: WILL(PERFORM(B, R))
Unfortunately, this analysis of language is complicated by the fact that we do not always
say exactly what we mean. Instead, we often use indirect speech acts, such as "Do you
know what time it is?" or "It sure is cold in here." Searle [1975] presents a linguistic
theory of such indirect speech acts. Computational treatments of this phenomenon
usually rely on models of the speaker's goals and of ways that those goals might
reasonably be achieved by using language. See, for example, Cohen and Perrault
[1979].
15.4.5 Conversational Postulates

Fortunately, there is a certain amount of regularity in people's goals and in the way
language can be used to achieve them. This regularity gives rise to a set of conversational
postulates, which are rules about conversation that are shared by all speakers. Usually
these rules are followed. Sometimes they are not, but when this happens, the violation
of the rules communicates something in itself. Some of these conversational postulates
are:
A inferred from B's incomplete response that B did not know who won the race,
because if B had known she would have provided a name.
Of course, sometimes people "cop out" of these conventions. In the following
dialogue, B is explicitly copping out:
But in the absence of such a cop out, and assuming a cooperative relationship between
the parties to a dialogue, the shared assumption of these postulates greatly facilitates
communication. For a more detailed discussion of conversational postulates, see Grice
[1975] and Gordon and Lakoff [1975].
We can axiomatize these conversational postulates by augmenting the preconditions
for the speech acts that we have already defined. For example, we can describe the sin-
cerity conditions by adding the following clauses to the precondition for REQUEST(A,
B, R):
WANT(A, PERFORM(B, R))
BELIEVE(A, CAN-PERFORM(B, R))
BELIEVE(A, WILLING-TO-PERFORM(B, R))
BELIEVE(A, ¬WILL(PERFORM(B, R)))
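The augmented precondition list can be sketched as follows, with ¬WILL spelled as not-will; all predicate spellings are illustrative:

```python
# Sketch of REQUEST's preconditions augmented with the sincerity
# conditions: the speaker must also want the act done, and believe
# the hearer can do it, is willing, and would not otherwise do it.

def request_preconditions(a, b, r):
    basic = [
        f"know-what({a}, location({b}))",
        f"can-perform({b}, {r})",
        f"willing-to-perform({b}, {r})",
    ]
    sincerity = [
        f"want({a}, perform({b}, {r}))",
        f"believe({a}, can-perform({b}, {r}))",
        f"believe({a}, willing-to-perform({b}, {r}))",
        f"believe({a}, not-will(perform({b}, {r})))",
    ]
    return basic + sincerity

def sincere_request_possible(state, a, b, r):
    """A can sincerely plan REQUEST(A, B, R) only if every
    precondition, including the sincerity conditions, holds."""
    return all(p in state for p in request_preconditions(a, b, r))

print(len(request_preconditions("a", "b", "r")))   # -> 7
```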
If we assume that each participant in a dialogue is following these conventions, then
it is possible to infer facts about the participants' belief states from what they say. Those
facts can then be used as a basis for constructing a coherent interpretation of a discourse
as a whole.
To summarize, we have just described several techniques for representing knowledge
about how people act and talk. This knowledge plays an important role in text and
discourse understanding, since it enables an understander to fill in the gaps left by the
original writer or speaker. It turns out that many of these same mechanisms, in particular
those that allow us to represent explicitly the goals and beliefs of multiple agents, will
also turn out to be useful in constructing distributed reasoning systems, in which several
(at least partially independent) agents interact to achieve a single goal. We come back
to this topic in Section 16.3.
15.5 Summary

In this chapter, we presented a brief introduction to the surprisingly hard problem of
language understanding. Recall that in Chapter 14, we showed that at least one understanding
problem, line labeling, could effectively be viewed as a constraint satisfaction
problem. One interesting way to summarize the natural language understanding problem
that we have described in this chapter is to view it too as a constraint satisfaction
problem. Unfortunately, many more kinds of constraints must be considered, and even
when they are all exploited, it is usually not possible to avoid the guess and search
part of the constraint satisfaction procedure. But constraint satisfaction does provide a
reasonable framework in which to view the whole collection of steps that together create
a meaning for a sentence. Essentially each of the steps described in this chapter exploits
a particular kind of knowledge that contributes a specific set of constraints that must be
satisfied by any correct final interpretation of a sentence.
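This view can be sketched as a filtering process in which each knowledge source contributes constraints and only readings satisfying all of them survive. The readings and constraints here are invented for illustration (echoing the sherry example from the exercises):

```python
# Sketch of interpretation as constraint satisfaction: each processing
# level contributes a filter over candidate readings of a sentence.

readings = [
    {"word-sense": "drinking-glass", "number": "plural"},
    {"word-sense": "eyeglasses",     "number": "plural"},
]

constraints = [
    # syntactic: the subject must agree in number with its verb
    lambda r: r["number"] == "plural",
    # semantic: only a drinking glass can be filled with sherry
    lambda r: r["word-sense"] != "eyeglasses",
]

surviving = [r for r in readings if all(c(r) for c in constraints)]
print([r["word-sense"] for r in surviving])
# -> ['drinking-glass']
```

A real system would, of course, interleave constraint application with search rather than enumerate all readings up front.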
Syntactic processing contributes a set of constraints derived from the grammar of
the language. It imposes constraints such as:
Semantic processing contributes an additional set of constraints derived from the
knowledge it has about entities that can exist in the world. It imposes constraints such as:
Discourse processing contributes a further set of constraints that arise from the
structure of coherent discourses. These include:
And finally, pragmatic processing contributes yet another set of constraints. For
example,
There are many important issues in natural language processing that we have barely
touched on here. To learn more about the overall problem, see Allen [1987], Cullingford
[1986], Dowty et al. [1985], and Grosz et al. [1986]. For more information on syntactic
processing, see Winograd [1983] and King [1983]. See Joshi et al. [1981] for more
discussion of the issues involved in discourse understanding. Also, we have restricted
our discussion to natural language understanding. It is often useful to be able to go
the other way as well, that is, to begin with a logical description and render it into
English. For discussions of natural language generation systems, see McKeown and
Swartout [1987] and McDonald and Bolc [1988]. By combining understanding and
generation systems, it is possible to attack the problem of machine translation, by which
we understand text written in one language and then generate it in another language. See
Slocum [1988], Nirenburg [1987], Lehrberger and Bourbeau [1988], and Nagao [1989]
for discussions of a variety of approaches to this problem.
15.6 Exercises

The old man's glasses were filled with sherry.
What information is necessary to choose the correct meaning for the word
"glasses"? What information suggests the incorrect meaning?
After you have done this, you might want to look at the discussion of this problem
in Church and Patil [1982].
Do not expect to produce an ATN that can handle all possible verb phrases. But do
design one with a reasonable structure that handles most common ones, including
the ones above. The grammar should create structures that reflect the structures
of the input verb phrases.
in the two formalisms.
To do this, you will need to do all the following things:
B: He's a great guy. Everyone likes him.
B (A's friend): What do you want to buy?