Frame Conceptions and Text Understanding
Research in Text Theory
Untersuchungen zur Texttheorie

Editor: János S. Petőfi, Bielefeld

Advisory Board: Irena Bellert, Montreal; Maria-Elisabeth Conte, Pavia; Teun A. van Dijk, Amsterdam; Wolfgang U. Dressler, Wien; Peter Hartmann, Konstanz; Robert E. Longacre, Dallas; Roland Posner, Berlin; Hannes Rieser, Bielefeld

Volume 5
Walter de Gruyter • Berlin • New York 1980
Frame Conceptions and Text Understanding Edited by Dieter Metzing
Walter de Gruyter • Berlin • New York 1980
Library of Congress Cataloging in Publication Data

Main entry under title:
Frame conceptions and text understanding.
(Research in text theory ; v. 5)
Includes indexes.
1. Discourse analysis. 2. Comprehension.
I. Metzing, Dieter. II. Series.
P302.F7 410 79-23482

CIP-Kurztitelaufnahme der Deutschen Bibliothek

Frame conceptions and text understanding / ed. by Dieter Metzing. - Berlin, New York: de Gruyter, 1979.
(Research in text theory ; Vol. 5)
ISBN 3-11-008006-0
NE: Metzing, Dieter [Hrsg.]

© Copyright 1979 by Walter de Gruyter & Co., formerly J. J. Göschen'sche Verlagshandlung, J. Guttentag Verlagsbuchhandlung, Georg Reimer, Karl J. Trübner, Veit & Comp., Berlin 30.
Printed in Germany. All rights of reprinting, of photomechanical reproduction, and of the production of photocopies, including extracts, are reserved.
Typesetting and printing: Passavia, Passau. Binding: Lüderitz & Bauer, Berlin.
Contents

List of Contributors
Preface

Part One
M. Minsky: A Framework for Representing Knowledge
I. P. Goldstein, R. B. Roberts: Nudge, a Knowledge-based Scheduling Program
P. J. Hayes: The Logic of Frames

Part Two
E. Charniak: Ms. Malaprop, a Language Comprehension Program
W. G. Lehnert: The Role of Scripts in Understanding
St. T. Rosenberg: Frame-based Text Processing
C. Sidner: Discourse and Reference Components of PAL
Y. Wilks: Frames, Semantics and Novelty

Index
List of Contributors

E. Charniak, Department of Computer Science, Brown University, Providence (USA)
I. P. Goldstein, Xerox Palo Alto Research Center, Palo Alto (USA)
P. J. Hayes, Department of Computer Science, Essex University, Colchester (GB)
W. G. Lehnert, Department of Computer Science, Yale University, New Haven (USA)
M. Minsky, Artificial Intelligence Laboratory, Mass. Inst. of Technology, Cambridge (USA)
R. B. Roberts, Artificial Intelligence Laboratory, Mass. Inst. of Technology, Cambridge (USA)
St. T. Rosenberg, Lawrence Berkeley Laboratory, Berkeley University, Berkeley (USA)
C. Sidner, Artificial Intelligence Laboratory, Mass. Inst. of Technology, Cambridge (USA)
Y. Wilks, Department of Language and Linguistics, Essex University, Colchester (GB)
Preface

The contributions to this volume deal with frame conceptions and their application and development in the domain of natural language and text understanding. "Frame conceptions" is taken as a name for a number of tentative but productive and influential ideas on knowledge representation, information processing, recognition and reasoning processes. Frame conceptions are starting to play an important role in Computer Science (Artificial Intelligence), (Cognitive) Linguistics, (Cognitive) Psychology and Education. Though it covers a broad range of 'frame issues', this volume is far from being a complete 'frame documentation'. Only a selected documentation of a first period of exploration and experimentation is presented, facilitating access to several important 'frame papers'. In the first part more general frame properties are discussed, whereas in the second part frame conceptions are applied to topics of text understanding.
For readers more familiar with text linguistics than with the 'frame literature' some additional remarks may be in order. The relevance of the selected contributions for the development of computer models of text understanding may be obvious, but they also seem important for some other reasons. The frame conceptions considered are based on fundamental theoretical assumptions about human information processing and text understanding, and these assumptions may be evaluated separately from designed or implemented process models. Furthermore, a satisfactory theory of discourse could hardly leave out the process aspects of discourse, and any design of process models could hardly ignore frame-oriented approaches to text understanding.
Interactions between text sentences on the one hand and a knowledge base on the other are a central topic in the following contributions, whereas approaches analyzing a text as an object of its own, characterized by text-specific structural and stylistic properties, are not considered. This may change in the course of future research, for stylistic knowledge (e.g. about types of text) could be organized in terms of frames, and structural/stylistic properties of texts may convey information relevant to inference processes.
The first contribution is a condensed version of a very influential article by Minsky. Frame conceptions were at first developed at MIT in the domain of
vision and natural language processing (e.g. Kuipers, Winograd); they were first applied to story understanding at Yale (Schank and his collaborators) and a bit later by Charniak and at MIT (Rosenberg, Sidner).
In his contribution Minsky points out the expectation-driven aspects of recognition and comprehension, aspects requiring special devices for the organization of knowledge as well as for the selection of relevant knowledge. From a frame-based point of view, text understanding means: choosing frames, collecting evidence from text sentences, filling in details, supplying standard details that are missing, making conjectures, inferring, testing and revising assumptions. The general requirements for frame-based information processing (and acquisition) stated by Minsky may be taken as guidelines for the design of higher-level representation languages, 'frame representation languages'. (Note that each representation language tends to have a special view on the problems to be solved.) Two such languages have been implemented: FRL-0 by Goldstein and Roberts and KRL-0 by Bobrow and Winograd; they have been applied to problems of text processing, FRL-0 by Rosenberg and Sidner, KRL-0 by Lehnert.
Goldstein and Roberts' frame representation language is object-centered, not action-centered (so is KRL). FRL may be viewed as an augmented semantic network whose nodes may be complex items, frames. These nodes may be specified by: comments (meta-descriptions), abstraction hierarchies (direct and indirect ones), defaults and constraints (followed by a value or a description) and procedural attachment. Sequences of events may be represented as plans (following conventions of Sacerdoti's procedural nets). To understand a text means, according to Goldstein/Roberts: setting up partially instantiated frames on the basis of evidence from text sentences and building up frame gestalts on the basis of additional information represented in generic frames.
If frames are more than "a cumbersome convention for the listing of facts" (cf. Dresher/Hornstein, Cognition 4, 1976), if they are elements of a representation language, are they then a new syntax with or without a well-defined semantics? If inferences are the principal part of text understanding, what kinds of 'frame-inference' mechanisms are needed? In his contribution Hayes examines the logical aspects of the frame representation language KRL. His critical analysis is important in at least three respects:
- It helps to clarify the semantic status of frames in KRL, which is compared to Predicate Calculus. (For a more process-oriented semantics of KRL see B. Smith, Levels, Layers, and Planes, MIT thesis 1978.)
- It points out the importance of default reasoning, taken on the one hand as a fundamental element of human communication and on the other as a device to be integrated into a system of formal logic.
- It may bridge some gaps between logic-oriented approaches to text linguistics and frame-oriented approaches to text comprehension.
The contributions of the second part of this volume are based on experiences with special text processing systems. Charniak, who used so-called 'Demons' for reference problems in his first story understander, replaced them by the richer structure of action-frames (and other types of frames). His action-frames are an example of a procedural frame representation (allowing e.g. for iteration). According to Charniak, to understand a text means to "fit what one is told into the framework established by what one already knows". To understand statements about actions means to know: why an action was performed; what its position is in a chain of actions; what could happen if a certain action is not executed. A prerequisite for understanding a text about a subject domain is to become competent in this domain. A test of competence is the processing of text statements followed by How and Why questions. To understand a text sentence means to match it against a frame statement. A frame statement may be viewed as an element of a network of frames guiding the construction of reasoning chains (cf. the LEADS-TO and COMES-FROM links).
In experimenting with a Script Applier Mechanism (SAM), Lehnert was in a position to use different representational structures: a semantic representation based on action primitives (Schank's Conceptual Dependency), scripts containing knowledge about stereotypic situations, and plans required for less stereotypic goal-oriented activities. Conceptual Dependency is used as an aid for script application. Note that the need for semantic primitives in a frame representation language is still an open question (cf. Lehnert/Wilks and Bobrow/Winograd in Cognitive Science 3, 1979). To understand a text means, according to Lehnert: recognizing the relevant scripts, determining word senses using script expectations, completing causal chains, and elaborating statements or questions according to objects in focus, i.e. objects belonging to activated scripts.
At least three kinds of relation might be assumed to exist between a text and a knowledge base:
- Both are completely separated. A semantic representation of a text is set up without consulting a knowledge base. A text is viewed as a (particular) sequence of statements referring to a (particular) sequence of states of affairs, a sequence which has to be determined each time by putting together very elementary building blocks (elements of (simple) propositions).
- The only role of each text sentence is to change the state of a knowledge base; and this role is completed as soon as its consequences for the knowledge base have been determined.
- There are multiple interactions between both: text statements may be completed or augmented by information from the knowledge base;
sequences of states of affairs may be recognized by putting together more complex building blocks (elements of a frame system); consequences of a text sentence may only be determined at the end of a sequence of text sentences; a particular arrangement of statements or states of affairs may become part of the knowledge base. So it makes sense to distinguish between a natural language text, a story base and a knowledge base.
In his contribution Rosenberg analyses some interactions between text sentences, a story data base and a knowledge base, studying especially intersentence links (coreference) and theme organization. Tasks of a discourse processor are described and exemplified by frame representations related to a text example.
It appears that frame conceptions have so far only been applied to more traditional views of a text, seen as a connected sequence of sentences or propositions. But according to a different view, a text is seen as a connected sequence of speech acts. "In comprehension, people do not process sentences as abstract formulae, they observe that someone has said something to them, and attempt to interpret that act and its consequences, which may include modification of their model of the world." ... "A speech act is performed for some purpose, with some goal in mind. And complete understanding involves the ability to infer such goals and purposes at every level, from inferring the purpose of referring expressions to inferring the speaker's overall goal in constructing the text." (Morgan, in: Theoretical Issues in Natural Language Processing 2, Proceedings ACM 1978, 111f.)
The contribution of Sidner shows how a frame representation language may be used for a speech act oriented approach to text comprehension. PAL is a model of a hearer interpreting a speaker's extended discourse, establishing the reference and discourse purpose of each utterance. The knowledge represented in a frame is useful for recognizing sequences of speech acts belonging to a superordinate purpose, and for recognizing focus and focus-shifts. It is very instructive to compare informal discourse models - for example those outlined by Winograd (in: M. Just, P. Carpenter (eds.), Cognitive Processes in Comprehension, New York 1977) or by Morgan - with those discourse models which are implemented, in order to evaluate the usefulness of frame conceptions for context-dependent aspects of discourse structures and reasoning processes - and in order to evaluate the amount of work still to be done.
Frame representations are still associated with a number of technical problems as to their internal structure, basic elements, semantics, their selection, matching and acquisition, and as to the amount of information to be represented in frames. Other questions concern their application in systems modelling text understanding: are they a uniform way of knowledge representation (as has been claimed for KRL, for example) or are they more suited for certain
types of knowledge? Rosenberg, for example, proposed to use framed bundles of standard knowledge as elements in a planning representation language (MIT, A.I. Lab, Working Paper 156, 1977; MIT, A.I. Lab, Memo 475, 1978). In his contribution to this volume Wilks proposes the use of structures simpler than frames for a first level of text representation, using frame-like structures ('proto-texts') only at a higher level in order to interpret extended uses of word senses. He discusses the incorporation of these richer semantic structures into his system of preference semantics.
What are frame conceptions? They are "working concepts", useful for some time in attempts to develop a new theory of human information processing (cf. Bobrow/Winograd, in: Cognitive Science 3, 1979), and as such they may soon be replaced by other "working concepts".
Dieter Metzing
M. MINSKY
A Framework for Representing Knowledge*

* This is a highly condensed version of a much longer paper in The Psychology of Computer Vision (P. H. Winston, Ed.), McGraw-Hill 1975.

Here is the essence of the frame theory: When one encounters a new situation (or makes a substantial change in one's view of a problem), one selects from memory a structure called a frame. This is a remembered framework to be adapted to fit reality by changing details as necessary.
A frame is a data-structure for representing a stereotyped situation like being in a certain kind of living room or going to a child's birthday party. Attached to each frame are several kinds of information. Some of this information is about how to use the frame. Some is about what one can expect to happen next. Some is about what to do if these expectations are not confirmed.
We can think of a frame as a network of nodes and relations. The "top levels" of a frame are fixed, and represent things that are always true about the supposed situation. The lower levels have many terminals — "slots" that must be filled by specific instances or data. Each terminal can specify conditions its assignments must meet. (The assignments themselves are usually smaller "sub-frames.") Simple conditions are specified by markers that might require a terminal assignment to be a person, an object of sufficient value, or a pointer to a sub-frame of a certain type. More complex conditions can specify relations among the things assigned to several terminals.
Collections of related frames are linked together into frame-systems. The effects of important actions are mirrored by transformations between the frames of a system. These are used to make certain kinds of calculations economical, to represent changes of emphasis and attention, and to account for the effectiveness of "imagery."
For visual scene analysis, the different frames of a system describe the scene from different viewpoints, and the transformations between one frame and another represent the effects of moving from place to place. For non-visual kinds of frames, the differences between the frames of a system can represent actions, cause-effect relations, or changes in conceptual viewpoint. Different frames of a system share the same terminals; this is the critical point that makes it possible to coordinate information gathered from different viewpoints.
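A minimal illustrative sketch of these ideas: a frame with marker-constrained terminals, weakly-bound defaults, and a matching process that reports its failures. Every name here is invented for illustration, not taken from the paper:

    class Terminal:
        """A lower-level slot of a frame."""
        def __init__(self, name, markers=(), default=None):
            self.name = name
            self.markers = list(markers)  # conditions an assignment must meet
            self.default = default        # weakly-bound stereotype
            self.value = None             # specific instance or sub-frame

        def assign(self, item):
            # An assignment must satisfy every marker condition.
            if all(marker(item) for marker in self.markers):
                self.value = item
                return True
            return False

        def current(self):
            # Until displaced, the terminal yields its default assignment.
            return self.value if self.value is not None else self.default

    class Frame:
        """A remembered framework for a stereotyped situation."""
        def __init__(self, top_level, terminals):
            self.top_level = top_level    # always true of the situation
            self.terminals = {t.name: t for t in terminals}

        def match(self, observations):
            # Try to fill terminals from observed items; collect failures,
            # which can later be used to select an alternative frame.
            failures = []
            for name, item in observations.items():
                terminal = self.terminals.get(name)
                if terminal is not None and not terminal.assign(item):
                    failures.append(name)
            return failures

    # A marker is just a predicate, e.g. "must be a person":
    guest = Terminal("guest", markers=[lambda x: x.get("kind") == "person"])
    party = Frame("birthday party", [guest])
    print(party.match({"guest": {"kind": "person", "name": "Jane"}}))  # []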
Much of the phenomenological power of the theory hinges on the inclusion of expectations and other kinds of presumptions. A frame's terminals are normally already filled with "default" assignments. Thus, a frame may contain a great many details whose supposition is not specifically warranted by the situation. These have many uses in representing general information, most likely cases, techniques for by-passing "logic," and ways to make useful generalizations. The default assignments are attached loosely to their terminals, so that they can be easily displaced by new items that better fit the current situation. They thus can serve also as "variables" or as special cases for "reasoning by example," or as "textbook cases," and often make the use of logical quantifiers unnecessary.
The frame-systems are linked, in turn, by an information retrieval network. When a proposed frame cannot be made to fit reality — when we cannot find terminal assignments that suitably match its terminal marker conditions — this network provides a replacement frame. These interframe structures make possible other ways to represent knowledge about facts, analogies, and other information useful in understanding.
Once a frame is proposed to represent a situation, a matching process tries to assign values to each frame's terminals, consistent with the markers at each place. The matching process is partly controlled by information associated with the frame (which includes information about how to deal with surprises) and partly by knowledge about the system's current goals. There are important uses for the information obtained when a matching process fails; it can be used to select an alternative frame that better suits the situation.

Local and Global Theories for Vision

When we enter a room we seem to see the entire scene at a glance. But seeing is really an extended process. It takes time to fill in details, collect evidence, make conjectures, test, deduce, and interpret in ways that depend on our knowledge, expectations and goals. Wrong first impressions have to be revised. Nevertheless, all this proceeds so quickly and smoothly that it seems to demand a special explanation.
Would parallel processing help? This is a more technical question than it might seem. At the level of detecting elementary visual features, texture elements, stereoscopic and motion-parallax cues, it is obvious that parallel processing might be useful. At the level of grouping features into objects, it is harder to see exactly how to use parallelism, but one can at least conceive of the aggregation of connected "nuclei" (Guzman TR-228), or the application of boundary line constraint semantics (Waltz TR-271), performed in a special parallel network.
At "higher" levels of cognitive processing, however, one suspects fundamental limitations in the usefulness of parallelism. Many "integral" schemas were proposed in the literature on "pattern recognition" for parallel operations on pictorial material — perceptrons, integral transforms, skeletonizers, and so forth. These mathematically and computationally interesting schemes might quite possibly serve as ingredients of perceptual processing theories. But as ingredients only! Basically, "integral" methods work only on isolated figures in two dimensions. They fail disastrously in coping with complicated, three-dimensional scenery. The new, more successful symbolic theories use hypothesis formation and confirmation methods that seem, on the surface at least, more inherently serial. It is hard to solve any very complicated problem without giving essentially full attention, at different times, to different sub-problems. Fortunately, however, beyond the brute idea of doing many things in parallel, one can imagine a more serial process that deals with large, complex, symbolic structures as units! This opens a new theoretical "niche" for performing a rapid selection of large substructures; in this niche our theory hopes to find the secret of speed, both in vision and in ordinary thinking. Seeing a Cube In the tradition of Guzman and Winston, we assume that the result of looking at a cube is a structure something like that in figure 1. The substructures "A" and "B" represent details or decorations on two faces of the cube. When we move to the right, face "A" disappears from view, while the new face decorated with " C " is now seen. If we had to analyse the scene from the start, we would have to (1) lose the knowledge about "A", (2) recompute "B", and (3) compute the description of "C". cube region-of
A
B
left-above etc.
vertical-type parallelogram etc.
A Figure 1
E
B
But since we know we moved to the right, we can save "B" by assigning it also to the "left face" terminal of a second cube-frame. To save "A" (just in case!) we connect it also to an extra, invisible face-terminal of the new cube-schema as in figure 2.
[Figure 2: the new cube-frame, with B and C assigned to visible face-terminals and A attached to an extra, invisible face-terminal.]
If later we move back to the left, we can reconstruct the first scene without any perceptual computation at all: just restore the top-level pointers to the first cube-frame. We now need a place to store "C"; we can add yet another invisible face to the right in the first cube-frame! See figure 3. We could extend this to represent further excursions
around the object. This would lead to a more comprehensive frame system, in which each frame represents a different "perspective" of a cube. In figure 4 there are three frames corresponding to 45-degree MOVE-RIGHT and MOVE-LEFT actions. If we pursue this analysis, the resulting system can become very large; more complex objects need even more different projections. It is not obvious either that all of them are normally necessary or that just one of each variety is adequate. It all depends.

[Figure 4: three spatial frames of a cube frame-system linked by MOVE-RIGHT and MOVE-LEFT transformations, with corresponding pictorial frames below. Relation markers in common-terminal structure can represent more invariant (e.g. three-dimensional) properties.]

It is not proposed that this kind of complicated structure is recreated every time one examines an object. It is imagined instead that a great collection of
frame systems is stored in permanent memory, and one of them is evoked when evidence and expectation make it plausible that the scene in view will fit it. How are they acquired? We propose that if a chosen frame does not fit well enough, and if no better one is easily found, and if the matter is important enough, then an adaptation of the best one so far discovered will be constructed and remembered for future use.
Each frame has terminals for attaching pointers to substructures. Different frames can share the same terminal, which can thus correspond to the same physical feature as seen in different views. This permits us to represent, in a single place, view-independent information gathered at different times and places. This is important also in non-visual applications.
The matching process which decides whether a proposed frame is suitable is controlled partly by one's current goals and partly by information attached to the frame; the frames carry terminal markers and other constraints, while the goals are used to decide which of these constraints are currently relevant. Generally, the matching process could have these components:
(1) A frame, once evoked on the basis of partial evidence or expectation, would first direct a test to confirm its own appropriateness, using knowledge about recently noticed features, loci, relations, and plausible subframes. The current goal list is used to decide which terminals and conditions must be made to match reality.
(2) Next it would request information needed to assign values to those terminals that cannot retain their default assignments. For example, it might request a description of face "C", if this terminal is currently unassigned, but only if it is not marked "invisible." Such assignments must agree
with the current markers at the terminal. Thus, face "C" might already have markers for such constraints or expectations as:
- Right-middle visual field.
- Must be assigned.
- Should be visible; if not, consider moving right.
- Should be a cube-face sub-frame.
- Share left vertical boundary terminal with face "B".
- If failure, consider box-lying-on-side frame.
- Same background color as face "B."
(3) Finally, if informed about a transformation (e.g., an impending motion) it would transfer control to the appropriate other frame of that system.
Within the details of the control scheme are opportunities to embed many kinds of knowledge. When a terminal-assigning attempt fails, the resulting error message can be used to propose a second-guess alternative. Later it is shown how memory can be organized into a "Similarity Network" as proposed in Winston's thesis (TR-231).

Is Vision Symbolic?

Can one really believe that a person's appreciation of three-dimensional structure can be so fragmentary and atomic as to be representable in terms of the relations between parts of two-dimensional views? Let us separate, at once, the two issues: is imagery symbolic? and is it based on two-dimensional fragments? The first problem is one of degree; surely everyone would agree that at some level vision is essentially symbolic. The quarrel would be between certain naive conceptions on one side — in which one accepts seeing either as picture-like or as evoking imaginary solids — and, on the other, such experimental results as those of Piaget (1956) and others, in which many limitations that one might fear would result from symbolic representations are shown actually to exist!
As for our second question: the issue of two- vs. three-dimensions evaporates at the symbolic level. The very concept of dimension becomes inappropriate. Each type of symbolic representation of an object serves some goals well and others poorly. If we attach the relation labels left-of, right-of, and above between parts of the structure, say, as markers on pairs of terminals, certain manipulations will work out smoothly; for example, some properties of these relations are "invariant" if we rotate the cube while keeping the same face on the table. Most objects have "permanent" tops and bottoms. But if we turn the cube on its side such predictions become harder to make; people have great difficulty keeping track of the faces of a six-colored cube if one makes them roll it around in their mind.
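Returning to the cube of figures 1-4, the following toy sketch shows view-frames sharing terminal objects, with transformations that merely select another frame of the system. The names and the dictionary encoding are assumptions of this illustration:

    # Each face description is one shared object; frames point at it
    # rather than copying it, so nothing is recomputed when we move.
    face_a = {"decoration": "A"}
    face_b = {"decoration": "B"}
    face_c = {"decoration": "C"}

    # Two frames of one cube-system, sharing terminals (cf. figure 4).
    frames = {
        "left-45": {"left-face": face_a, "center-face": face_b,
                    "invisible": [face_c]},
        "head-on": {"left-face": face_b, "center-face": face_c,
                    "invisible": [face_a]},   # "A" saved, just in case
    }

    # Transformations between frames mirror the effects of actions.
    transforms = {("left-45", "MOVE-RIGHT"): "head-on",
                  ("head-on", "MOVE-LEFT"): "left-45"}

    def move(view, action):
        # Returning to an old frame restores its assignments at no
        # perceptual cost: only the top-level pointers change.
        return transforms.get((view, action), view)

    print(move("left-45", "MOVE-RIGHT"))   # -> head-on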
If one uses instead more "intrinsic" relations like next-to and opposite-to, then turning the object on its side disturbs the "image" much less. In Winston's thesis we see how systematic replacements (e.g., of "left" for "behind," and "right" for "in-front-of") can deal with the effect of spatial rotation.

Seeing a Room

Visual experience seems continuous. One reason is that we move continuously. A deeper explanation is that our "expectations" usually interact smoothly with our perceptions. Suppose you were to leave a room, close the door, turn to reopen it, and find an entirely different room. You would be shocked. The sense of change would be hardly less striking if the world suddenly changed before your eyes.
A naive theory of phenomenological continuity is that we see so quickly that our image changes as fast as does the scene. There is an alternative theory: the changes in one's frame-structure representation proceed at their own pace; the system prefers to make small changes whenever possible; and the illusion of continuity is due to the persistence of assignments to terminals common to the different view-frames. Thus, continuity depends on the confirmation of expectations, which in turn depends on rapid access to remembered knowledge about the visual world.
Just before you enter a room, you usually know enough to "expect" a room rather than, say, a landscape. You can usually tell just by the character of the door. And you can often select in advance a frame for the new room. Very often, one expects a certain particular room. Then many assignments are already filled in.
The simplest sort of room-frame candidate is like the inside of a box. Following our cube-model, the room-frame might have the top-level structure shown in figure 5.
[Figure 5: top-level structure of a room-frame, with terminals for the left wall, center wall and right wall, separated by vertical edges g and h.]
One has to assign to the frame's terminals the things that are seen. If the room is familiar, some are already assigned. If no expectations are recorded already, the first priority might be locating the principal geometric landmarks. To fill in LEFT WALL one might first try to find edges "a" and "d" and then the associated corners "ag" and "gd." Edge "g", for example, is usually easy to find because it should intersect any eye-level horizontal scan from left to right. Eventually, "ag," "gb," and "ba" must not be too inconsistent with one another — because they are the same physical vertex.
However the process is directed, there are some generally useful knowledge-based tactics. It is probably easier to find edge "e" than any other edge, because if we have just entered a normal rectangular room, then we may expect that:
- Edge "e" is a horizontal line.
- It is below eye level.
- It defines a floor-wall texture boundary.
Given an expectation about the size of a room, we can estimate the elevation of "e," and vice versa. In outdoor scenes, "e" is the horizon, and on flat ground we can expect to see it at eye-level. If we fail quickly to locate and assign this horizon, we must consider rejecting the proposed frame: either the room is not normal or there is a large obstruction.
The room-analysis strategy might try next to establish some other landmarks. Given "e," we next look for its left and right corners, and then for the verticals rising from them. Once such gross geometrical landmarks are located, we can guess the room's general shape and size. This might lead to selecting a new frame better matched to that shape and size, with additional markers confirming the choice and completing the structure with further details.

Scene Analysis and Subframes

If the new room is unfamiliar, no pre-assembled frame can supply fine details; more scene-analysis is needed. Even so, the complexity of the work can be reduced, given suitable subframes for constructing hypotheses about substructures in the scene. How useful these will be depends both on their inherent adequacy and on the quality of the expectation process that selects which one to use next.
One can say a lot even about an unfamiliar room. Most rooms are like boxes, and they can be categorized into types: kitchen, hall, living room, theater, and so on. One knows dozens of kinds of rooms and hundreds of particular rooms; one no doubt has them structured into some sort of similarity network for effective access. This will be discussed later.
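Returning to the tactics for edge "e" listed above: they amount to a small checklist that either confirms the landmark or rejects the room-frame. A minimal sketch, assuming invented field names, an eye-level threshold, and a simple rejection rule:

    def check_edge_e(edge, eye_level):
        # Expectations for edge "e" in a normal rectangular room;
        # each failed expectation becomes a complaint.
        complaints = []
        if not edge["horizontal"]:
            complaints.append("not a horizontal line")
        if edge["elevation"] >= eye_level:
            complaints.append("not below eye level")
        if edge["texture_boundary"] != ("floor", "wall"):
            complaints.append("no floor-wall texture boundary")
        return complaints

    def find_edge_e(candidate_edges, eye_level):
        for edge in candidate_edges:
            if not check_edge_e(edge, eye_level):
                return edge        # landmark confirmed; keep the room-frame
        # Failing quickly to locate "e": consider rejecting the frame --
        # either the room is not normal or there is a large obstruction.
        return None

    edges = [{"horizontal": True, "elevation": 0.4,
              "texture_boundary": ("floor", "wall")}]
    print(find_edge_e(edges, eye_level=1.6))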
A typical room-frame has three or four visible walls, each perhaps of a different "kind." One knows many kinds of walls: walls with windows, shelves, pictures, and fireplaces. Each kind of room has its own kinds of walls. A typical wall might have a 3 × 3 array of region-terminals, (left-center-right) × (top-middle-bottom), so that wall-objects can be assigned qualitative locations. One would further want to locate objects relative to geometric interrelations in order to represent such facts as "Y is a little above the center of the line between X and Z."
In three dimensions, the location of a visual feature of a subframe is ambiguous, given only eye direction. A feature in the middle of the visual field could belong either to a Center Front Wall object or to a High Middle Floor object; these attach to different subframes. The decision could depend on reasoned evidence for support, on more directly visual distance information derived from stereo disparity or motion-parallax, or on plausibility information derived from other frames: a clock would be plausible only on the wall-frame, while a person is almost certainly standing on the floor.
Given a box-shaped room, lateral motions induce orderly changes in the quadrilateral shapes of the walls, as in figure 6. A picture-frame rectangle, lying flat against a wall, should transform in the same way as does its wall. If a "center-rectangle" is drawn on a left wall it will appear to project out, because one makes the default assumption that any such quadrilateral is actually a rectangle and hence must lie in a plane that would so project. In figure 7A, both quadrilaterals could "look like" rectangles, but the one to the right does not match the markers for a "left rectangle" subframe (these require, e.g., that the left side be longer than the right side). That rectangle is therefore represented by a center-rectangle frame, and seems to project out as though parallel to the center wall. Thus we must not simply assign the label "rectangle" to a quadrilateral but to a particular frame of a rectangle-system.
[Figure 7: (A) two quadrilaterals that could both "look like" rectangles; (B) a space-transformation applied to the top-level system is applied also to its subsystems.]
When we move, we expect whatever space-transformation is applied to the top-level system will be applied also to its subsystems, as suggested in figure 7B. Similarly, the sequence of elliptical projections of a circle contains congruent pairs that are visually ambiguous, as shown in figure 8.
[Figure 8: the sequence of elliptical projections of a circle; congruent pairs are visually ambiguous.]
But because wall objects usually lie flat, we assume that an ellipse on a left wall is a left-ellipse,
expect it to transform the same way as the left wall, and are surprised if the prediction is not confirmed.
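A schematic sketch of the ambiguity resolution just described: eye direction alone cannot decide between the wall and floor subframes, so distance evidence or plausibility information from other frames settles the attachment. The numbers and names are invented:

    # Plausibility of object kinds at each subframe, supplied by frames:
    # a clock is plausible only on a wall; a person almost surely stands
    # on the floor. The particular values are illustrative only.
    PLAUSIBILITY = {
        ("clock", "center-front-wall"): 0.95,
        ("clock", "high-middle-floor"): 0.01,
        ("person", "center-front-wall"): 0.02,
        ("person", "high-middle-floor"): 0.97,
    }

    def attach(kind, distance_evidence=None):
        # Stereo disparity or motion-parallax evidence settles the
        # attachment directly; otherwise plausibility information from
        # other frames decides.
        if distance_evidence == "far":
            return "center-front-wall"
        if distance_evidence == "near":
            return "high-middle-floor"
        candidates = ["center-front-wall", "high-middle-floor"]
        return max(candidates, key=lambda s: PLAUSIBILITY.get((kind, s), 0.5))

    print(attach("clock"))    # -> center-front-wall
    print(attach("person"))   # -> high-middle-floor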
Default Assignment

While both Seeing and Imagining result in assignments to frame terminals, imagination leaves us wider choices of detail and variety of such assignments. Frames are probably never stored in long-term memory with unassigned terminal values. Instead, what really happens is that frames are stored with weakly-bound default assignments at every terminal! These manifest themselves as often-useful but sometimes counter-productive stereotypes.
Thus in the sentence "John kicked the ball," you probably cannot think of a purely abstract ball, but must imagine characteristics of a vaguely particular ball; it probably has a certain default size, default color, default weight. Perhaps it is a descendant of one you first owned or were injured by. Perhaps it resembles your latest one. In any case your image lacks the sharpness of presence, because the processes that inspect and operate upon the weakly-bound default features are very likely to change, adapt, or detach them.
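A small sketch of such weakly-bound defaults: the terminal yields its stereotype until a better-fitting item displaces it. The particular defaults are, of course, invented:

    class DefaultTerminal:
        def __init__(self, default):
            self.default = default    # weakly bound; easily displaced
            self.value = None

        def current(self):
            return self.value if self.value is not None else self.default

    # "John kicked the ball": no particular ball is described, yet the
    # image has a default size, color and weight until the text says more.
    ball = DefaultTerminal({"size": "child-sized", "color": "red",
                            "weight": "light"})
    print(ball.current())                  # the stereotype
    ball.value = {"size": "large", "color": "white", "weight": "heavy"}
    print(ball.current())                  # displaced by a better fit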
Words, Sentences and Meaning

The concepts of frame and default assignment seem helpful in discussing the phenomenology of "meaning." Chomsky (1957) points out that such a sentence as
(A) "colorless green ideas sleep furiously"
is treated very differently than the non-sentence
(B) "furiously sleep ideas green colorless"
and suggests that because both are "equally nonsensical," what is involved in the recognition of sentences must be quite different from what is involved in the appreciation of meanings.
There is no doubt that there are processes especially concerned with grammar. Since the meaning of an utterance is "encoded" as much in the positional and structural relations between the words as in the word choices themselves, there must be processes concerned with analysing those relations in the course of building the structures that will more directly represent the meaning. What makes the words of (A) more effective and predictable than (B) in producing such a structure — putting aside the question of whether that structure should be called semantic or syntactic — is that the word-order relations in (A) exploit the (grammatical) conventions and rules people usually use to induce others to make assignments to terminals of structures. This is entirely consistent with grammar theories. A generative grammar would be a summary description of the exterior appearance of those frame rules — or their associated processes — while the operators of transformational grammars seem similar enough to some of our frame transformations.
We certainly cannot assume that "logical" meaninglessness has a precise psychological counterpart. Sentence (A) can certainly generate an image! The dominant frame is perhaps that of someone sleeping; the default system assigns a particular bed, and in it lies a mummy-like shape-frame with a translucent green color property. In this frame there is a terminal for the character of the sleep — restless, perhaps — and "furiously" seems somewhat inappropriate at that terminal, perhaps because the terminal does not like to accept anything so "intentional" for a sleeper. "Idea" is even more disturbing, because one expects a person, or at least something animate. One senses frustrated procedures trying to resolve these tensions and conflicts more properly, here or there, into the sleeping framework that has been evoked.
Utterance (B) does not get nearly so far, because no subframe accepts any substantial fragment. As a result no larger frame finds anything to match its terminals; hence, finally, no top-level "meaning" or "sentence" frame can organize the utterance as either meaningful or grammatical. By combining this "soft" theory with gradations of assignment tolerances, one could develop systems that degrade properly for sentences with "poor" grammar rather than none; if the smaller fragments — phrases and sub-clauses — satisfy subframes well enough, an image adequate for certain kinds of comprehension could be constructed anyway, even though some parts of the top-level structure are not entirely satisfied. Thus, we arrive at a qualitative theory of "grammatical": if the top levels are satisfied but some lower terminals are not, we have a meaningless sentence; if the top is weak but the bottom solid, we can have an ungrammatical but meaningful utterance.
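The closing qualitative rule can be put schematically. In this sketch the two boolean inputs stand in for real frame matching at the top and bottom levels; the function merely restates the rule:

    def classify(top_levels_satisfied, lower_terminals_satisfied):
        # The qualitative theory of "grammatical," taken literally.
        if top_levels_satisfied and not lower_terminals_satisfied:
            return "meaningless (but grammatical) sentence"
        if not top_levels_satisfied and lower_terminals_satisfied:
            return "ungrammatical but meaningful utterance"
        if top_levels_satisfied:
            return "grammatical and meaningful"
        return "neither grammatical nor meaningful"

    # Sentence (A): word order satisfies the top-level sentence frame,
    # but "ideas" and "furiously" fail the sleeper's terminal markers.
    print(classify(True, False))
    # Utterance (B): no subframe accepts any substantial fragment.
    print(classify(False, False))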
Discourse

Linguistic activity involves larger structures than can be described in terms of sentential grammar, and these larger structures further blur the distinctness of the syntax-semantics dichotomy. Consider the following fable, as told by W. Chafe (1972):
There was once a Wolf who saw a Lamb drinking at a river and wanted an excuse to eat it. For that purpose, even though he himself was upstream, he accused the Lamb of stirring up the water and keeping him from drinking ...
To understand this, one must realize that the Wolf is lying! To understand the key conjunctive "even though" one must realize that contamination never flows upstream. This in turn requires us to understand (among other
things) the word "upstream" itself. Within a declarative, predicate-based "logical" system, one might try to formalize "upstream" by some formula like:
[A upstream B] AND [Event T: Stream muddy at A]
⊃ [Exists [Event U: Stream muddy at B]] AND [Later U T]
But an adequate definition would need a good deal more. What about the fact that the order of things being transported by water currents is not ordinarily changed? A logician might try to deduce this from a suitably intricate set of "local" axioms, together with appropriate "induction" axioms. I propose instead to represent this knowledge in a structure that automatically translocates spatial descriptions from the terminals of one frame to those of another frame of the same system. While this might be considered to be a form of logic, it uses some of the same mechanisms designed for spatial thinking.
In many instances we would handle a change over time, or a cause-effect relation, in the same way as we deal with a change in position. Thus, the concept river-flow could evoke a frame-system structure something like the following, where S1, S2, and S3 are abstract slices of the flowing river, shown in figure 9.
[Figure 9: frames of the river-flow system; slices S1, S2, S3 of the stream flow downstream past the Wolf and the Lamb, who stand next-to the river.]
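One way to picture the translocation mechanism proposed above is the following sketch of a river-flow frame-system, in which the time transformation shifts every slice's assignment one terminal downstream. The encoding and names are invented for illustration:

    # Terminals of one frame of the river-flow system: successive slices
    # of the stream, the Wolf standing upstream of the Lamb.
    PLACES = ["at-wolf", "midstream", "at-lamb"]

    def next_frame(frame):
        # The transformation to the next frame of the system translocates
        # each slice's assignment one terminal downstream; a default
        # ("clear") fills the upstream source.
        later = {PLACES[i + 1]: frame[PLACES[i]]
                 for i in range(len(PLACES) - 1)}
        later[PLACES[0]] = "clear"
        return later

    state = {"at-wolf": "clear", "midstream": "clear", "at-lamb": "muddy"}
    state = next_frame(state)
    # Mud stirred up at the Lamb never reaches the Wolf upstream: the
    # spatial structure itself yields what the axioms struggle to say.
    assert state["at-wolf"] == "clear"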
There are many more nuances to fill in. What is "stirring up" and why would it keep the wolf from drinking? One might normally assign default floating objects to the S's, but here S3 interacts with "stirring up" to yield something that "drink" does not find acceptable. Was it "deduced" that stirring river-water means that S3 in the first frame should have "mud" assigned to it, or is this simply the default assignment for stirred water?
Almost any event, action, change, flow of material, or even flow of information can be represented to a first approximation by a two-frame generalized event. The frame-system can have slots for agents, tools, side-effects, preconditions, generalized trajectories, just as in the "trans" verbs of "case grammar" theories, but we have the additional flexibility of representing changes explicitly. To see if one has understood an event or action, one can try to build an appropriate instantiated frame-pair.
However, in representing changes by simple "before-after" frame-pairs, we can expect to pay a price. Pointing to a pair is not the same as describing their differences. This makes it less convenient to do planning or abstract reasoning; there is no explicit place to attach information about the transformation. As a second approximation, we could label pairs of nodes that point to corresponding terminals, obtaining a structure like the "comparison-notes" in Winston (TR-231), or we might place at the top of the frame-system information describing the differences more abstractly. Something of this sort will be needed eventually.

Scenarios

We condense and conventionalize, in language and thought, complex situations and sequences into compact words and symbols. Some words can perhaps be "defined" in elegant, simple structures, but only a small part of the meaning of "trade" is captured by:

first frame: A has X, B has Y
second frame: B has X, A has Y

Trading normally occurs in a social context of law, trust and convention. Unless we also represent these other facts, most trade transactions will be almost meaningless. It is usually essential to know that each party usually wants both things but has to compromise. It is a happy but unusual circumstance in which each trader is glad to get rid of what he has. To represent trading strategies, one could insert the basic maneuvers right into the above frame-pair scenario: in order for A to make B want X more (or want Y less) we expect him to select one of the familiar tactics:
- Offer more for Y.
- Explain why X is so good.
- Create a favorable side-effect of B having X.
- Disparage the competition.
- Make B think C wants X.
These only scratch the surface. Trades usually occur within a scenario tied together by more than a simple chain of events each linked to the next. No single such scenario will do; when a clue about trading appears it is essential to guess which of the different available scenarios is most likely to be useful.
Charniak's thesis (TR-266) studies questions about transactions that seem easy for people to comprehend yet obviously need rich default structures. We find in elementary school reading books such stories as:
Jane was invited to Jack's Birthday Party. She wondered if he would like a kite. She went to her room and shook her piggy bank. It made no sound.
We first hear that Jane is invited to Jack's Birthday Party. Without the party scenario, or at least an invitation scenario, the second line seems rather mysterious: She wondered if he would like a kite. To explain one's rapid comprehension of this, we make a somewhat radical proposal: to represent explicitly, in the frame for a scenario structure, pointers to a collection of the most serious problems and questions commonly associated with it. In fact we shall consider the idea that the frame terminals are exactly those questions. Thus, for the birthday party:
Y must get P for X: Choose P!
X must like P: Will X like P?
Buy P: Where to buy P?
Get money to buy P: Where to get money? (Sub-questions of the "present" frame?)
Y must dress up: What should Y wear?
Certainly these are one's first concerns when one is invited to a party. The reader is free to wonder whether this solution is acceptable. The question "Will X like P?" certainly matches "She wondered if he would like a kite" and correctly assigns the kite to P. But is our world regular enough that such question sets could be pre-compiled to make this mechanism often work smoothly? The answer is mixed. We do indeed expect many such questions; we surely do not expect all of them. But surely "expertise" consists partly in not having to realize, ab initio, what are the outstanding problems and interactions in situations. Notice, for example, that there is no default assignment for the Present in our party-scenario frame. This mandates attention to that assignment problem and prepares us for a possible thematic concern. In any case, we probably need a more active mechanism for understanding "wondered" which can apply the information currently in the frame to produce an expectation of what Jane will think about.
The key words and ideas of a discourse evoke substantial thematic or scenario structures, drawn from memory with rich default assumptions. In any event, the individual statements of a discourse lead to temporary representations — which seem to correspond to what contemporary linguists call "deep structures" — which are then quickly rearranged or consumed in elaborating the growing scenario representation. In order of "scale," among the ingredients of such a structure there might be these kinds of levels:
Surface Syntactic Frames — Mainly verb and noun structures. Prepositional and word-order indicator conventions.
Surface Semantic Frames — Action-centered meanings of words. Qualifiers and relations concerning participants, instruments, trajectories and strategies, goals, consequences and side-effects.
Thematic Frames — Scenarios concerned with topics, activities, portraits, setting. Outstanding problems and strategies commonly connected with topics.
Narrative Frames — Skeleton forms for typical stories, explanations, and arguments. Conventions about foci, protagonists, plot forms, development, etc., designed to help a listener construct a new, instantiated Thematic Frame in his own mind.

Requests to Memory
We can now imagine the memory system as driven by two complementary needs. On one side are items demanding to be properly represented by being embedded into larger frames; on the other side are incompletely-filled frames demanding terminal assignments. The rest of the system will try to placate these lobbyists, but not so much in accord with "general principles" as in accord with special knowledge and conditions imposed by the currently active goals.
When a frame encounters trouble — when an important condition cannot be satisfied — something must be done. We envision the following major kinds of accommodation to trouble.
MATCHING: When nothing more specific is found, we can attempt to use some "basic" associative memory mechanism. This will succeed by itself only in relatively simple situations, but should play a supporting role in the other tactics.
EXCUSE: An apparent misfit can often be excused or explained. A "chair" that meets all other conditions but is much too small could be a "toy."
ADVICE: The frame contains explicit knowledge about what to do about the trouble. Below, we describe an extensive, learned "Similarity Network" in which to embed such knowledge.
SUMMARY: If a frame cannot be completed or replaced, one must give it up. But first one must construct a well-formulated complaint or summary to help whatever process next becomes responsible for reassigning the subframes left in limbo.

Matching

When replacing a frame, we do not want to start all over again. How can we remember what was already "seen"? We consider here only the case in which the system has no specific knowledge about what to do and must resort to some "general" strategy. No completely general method can be very good, but if we could find a new frame that shares enough terminals with the old frame, then some of the common assignments can be retained, and we will probably do better than chance.
The problem can be formulated as follows: let E be the cost of losing a certain already assigned terminal and let F be the cost of being unable to assign some other terminal. If E is worse than F, then any new frame should retain the old subframe. Thus, given any sort of priority ordering on the terminals, a typical request for a new frame should include:
(1) Find a frame with as many terminals in common with [a, b, ..., z] as possible, where we list high-priority terminals already assigned in the old frame.
But the frame being replaced is usually already a subframe of some other frame and must satisfy the markers of its attachment terminal, lest the entire structure be lost. This suggests another form of memory request, looking upward rather than downward:
(2) Find or build a frame that has properties [a, b, ..., z].
If we emphasize differences rather than absolute specifications, we can merge (2) and (1):
(3) Find a frame that is like the old frame except for certain differences [a, b, ..., z] between them.
One can imagine a parallel-search or hash-coded memory to handle (1) and (2) if the terminals or properties are simple atomic symbols. (There must be some such mechanism, in any case, to support a production-based program or some sort of pattern matcher.) Unfortunately, there are so many ways to do this that it implies no specific design requirements. Although (1) and (2) are formally special cases of (3), they are different in practice, because complicated cases of (3) require knowledge about differences. In fact (3) is too general to be useful as stated, and we will later propose to depend on specific, learned knowledge about differences between pairs of frames rather than on broad, general principles.
It should be emphasized again that we must not expect magic. For difficult, novel problems a new representation structure will have to be constructed, and this will require application of both general and special knowledge.

Excuses

We can think of a frame as describing an "ideal." If an ideal does not match reality because it is "basically" wrong, it must be replaced. But it is in the nature of ideals that they are really elegant simplifications; their attractiveness derives from their simplicity, but their real power depends upon additional knowledge about interactions between them! Accordingly we need not abandon an ideal because of a failure to instantiate it, provided one can explain the discrepancy in terms of such an interaction. Here are some examples in which such an "excuse" can save a failing match:
OCCLUSION: A table, in a certain view, should have four legs, but a chair might occlude one of them. One can look for things like T-joints and shadows to support such an excuse.
FUNCTIONAL VARIANT: A chair-leg is usually a stick, geometrically; but more important, it is functionally a support. Therefore, a strong center post, with an adequate base plate, should be an acceptable replacement for all the legs. Many objects are multiple purpose and need functional rather than physical descriptions.
BROKEN: A visually missing component could be explained as in fact physically missing, or it could be broken. Reality has a variety of ways to frustrate ideals.
PARASITIC CONTEXTS: An object that is just like a chair, except in size, could be (and probably is) a toy chair. The complaint "too small" could often be so interpreted in contexts with other things too small, children playing, peculiarly large "grain," and so forth.
In most of these examples, the kinds of knowledge needed to make the repair — and thus salvage the current frame — are "general" enough usually to be attached to the thematic context of a superior frame.

Advice and Similarity Network
In moving about a familiar house, we already know a dependable structure for "information retrieval" of room frames. When we move through Door D, in Room X, we expect to enter Room Y (assuming D is not the Exit). We could
represent this as an action transformation of the simplest kind, consisting of pointers between pairs of room frames of a particular house system. When the house is not familiar, a "logical" strategy might be to move up a level of classification: when you leave one room, you may not know which room you are entering, but you usually know that it is some room. Thus, one can partially evade lack of specific information by dealing with classes — and one has to use some form of abstraction or generalization to escape the dilemma of Bartlett's commander.
Winston's thesis (TR-231) proposes a way to construct a retrieval system that can represent classes but has additional flexibility. His retrieval pointers can be made to represent goal requirements and action effects as well as class memberships.
What does it mean to expect a chair? Typically, four legs, some assortment of rungs, a level seat, an upper back. One expects also certain relations between these "parts." The legs must be below the seat, the back above. The legs must be supported by the floor. The seat must be horizontal, the back vertical, and so forth. Now suppose that this description does not match; the vision system finds four legs, a level plane, but no back. The "difference" between what we expect and what we see is "too few backs." This suggests not a chair, but a table or a bench. Winston proposes pointers from each description in memory to other descriptions, with each pointer labelled by a difference marker. Complaints about mismatch are matched to the difference pointers leaving the frame and thus may propose a better candidate frame. Winston calls the resulting structure a Similarity Network.
Is a Similarity Network practical? At first sight, there might seem to be a danger of unconstrained growth of memory. If there are N frames, and K kinds of differences, then there could be as many as K × N × N interframe pointers. One might fear that:
(1) If N is large, say 10⁷, then N × N is very large — of the order of 10¹⁴ — which might be impractical, at least for human memory.
(2) There might be so many pointers for a given difference and a given frame that the system will not be selective enough to be useful.
(3) K itself might be very large if the system is sensitive to many different kinds of issues.
But, according to contemporary opinions (admittedly, not very conclusive) about the rate of storage into human long-term memory, there are probably not enough seconds in a lifetime to cause a saturation problem. So the real problem, paradoxically, is that there will be too few connections! One cannot expect to have enough time to fill out the network to saturation. Given two frames that should be linked by a difference, we cannot
count on that pointer being there; the problem may not have occurred before. However, in the next section we see how to partially escape this problem.

Clusters, Classes and Geographic Analogy

To make the Similarity Network act more "complete," consider the following analogy. In a city, any person should be able to visit any other; but we do not build a special road between each pair of houses; we place a group of houses on a "block." We do not connect roads between each pair of blocks, but have them share streets. We do not connect each town to every other, but construct main routes, connecting the centers of larger groups. Within such an organization, each member has direct links to some other individuals at his own "level," mainly to nearby, highly similar ones; but each individual has also at least a few links to "distinguished" members of higher-level groups. The result is that there is usually a rather short sequence between any two individuals, if one can but find it.
At each level, the aggregates usually have distinguished foci or capitols. These serve as elements for clustering at the next level of aggregation. There is no nonstop airplane service between New Haven and San Jose because it is more efficient overall to share the "trunk" route between New York and San Francisco, which are the capitols at that level of aggregation.
The non-random convergences and divergences of the similarity pointers, for each difference d, thus tend to structure our conceptual world around (1) the aggregation into d-clusters and (2) the selection of d-capitols. Note that it is perfectly all right to have several capitols in a cluster, so that there need be no one attribute common to them all. The "crisscross resemblances" of Wittgenstein are then consequences of the local connections in our similarity network, which are surely adequate to explain how we can feel as though we know what is a chair or a game — yet cannot always define it in a "logical" way as an element in some class-hierarchy or by any other kind of compact, formal, declarative rule. The apparent coherence of the conceptual aggregates need not reflect explicit definitions, but can emerge from the success-directed sharpening of the difference-describing processes.
The selection of capitols corresponds to selecting stereotypes or typical elements whose default assignments are unusually useful. There are many forms of chairs, for example, and one should choose carefully the chair-description frames that are to be the major capitols of chairland. These are used for rapid matching and assigning priorities to the various differences. The lower priority features of the cluster center then serve either as default properties of the chair types or, if more realism is required, as dispatch pointers to the local chair villages and towns.
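A sketch of such a Similarity Network: difference pointers carry a mismatch complaint directly to the next candidate frame, with the common-terminal request (1) of the Matching section as a fallback. The stored differences and frame contents are invented examples:

    # Difference pointers: (frame, complaint) -> a better candidate.
    SIMILARITY_NET = {
        ("chair", "too few backs"): "table",
        ("chair", "too small"): "toy chair",
        ("table", "too few legs"): "bench",
    }

    FRAMES = {"chair": {"legs", "seat", "back"},
              "table": {"legs", "top"},
              "bench": {"legs", "seat"}}

    def replace_frame(frame, complaint):
        # First follow a learned difference pointer, if one exists ...
        candidate = SIMILARITY_NET.get((frame, complaint))
        if candidate is not None:
            return candidate
        # ... otherwise fall back on the "general" request: a frame with
        # as many terminals in common with the old one as possible.
        old = FRAMES[frame]
        return max((f for f in FRAMES if f != frame),
                   key=lambda f: len(FRAMES[f] & old))

    print(replace_frame("chair", "too few backs"))   # -> table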
Difference pointers could be "functional" as well as geometric. Thus, after rejecting a first try at "chair," one might try the functional idea of "something one can sit on" to explain an unconventional form. This requires a deeper analysis in terms of forces and strengths. Of course, that analysis would fail to capture toy chairs, or chairs of such ornamental delicacy that their actual use would be unthinkable. These would be better handled by the method of excuses, in which one would bypass the usual geometrical or functional explanations in favor of responding to contexts involving art or play.

Analogies and Alternative Descriptions
Suppose your car battery runs down. You believe that there is an electricity shortage and blame the generator.

The generator can be represented as a mechanical system: the rotor has a pulley wheel driven by a belt from the engine. Is the belt tight enough? Is it even there? The output, seen mechanically, is a cable to the battery or whatever. Is it intact? Are the bolts tight? Are the brushes pressing on the commutator?

Seen electrically, the generator is described differently. The rotor is seen as a flux-linking coil, rather than as a rotating device. The brushes and commutator are seen as electrical switches. The output is current along a pair of conductors leading from the brushes through control circuits to the battery.

The differences between the two frames are substantial. The entire mechanical chassis of the car plays the simple role, in the electrical frame, of one of the battery connections. The diagnostician has to use both representations. A failure of current to flow often means that an intended conductor is not acting like one. For this case, the basic transformation between the frames depends on the fact that electrical continuity is in general equivalent to firm mechanical attachment. Therefore, any conduction disparity revealed by electrical measurements should make us look for a corresponding disparity in the mechanical frame. In fact, since "repair" in this universe is synonymous with "mechanical repair," the diagnosis must end in the mechanical frame. Eventually, we might locate a defective mechanical junction and discover a loose connection, corrosion, wear, or whatever.

One cannot expect to have a frame exactly right for any problem or expect always to be able to invent one. But we do have a good deal to work with, and it is important to remember the contribution of one's culture in assessing the complexity of problems people seem to solve. The experienced mechanic need not routinely invent; he already has engine representations in terms of ignition, lubrication, cooling, timing, fuel mixing, transmission, compression, and so forth. Cooling, for example, is already subdivided into fluid circulation, air flow, thermostasis, etc. Most "ordinary" problems are presumably solved by systematic use of the analogies provided by the transformations between pairs of these structures.
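The generator story can likewise be rendered schematically. In the sketch below (hypothetical throughout: the symptom names and check lists are invented for illustration), the transformation rule that electrical continuity corresponds to firm mechanical attachment becomes a table mapping an electrical disparity to the mechanical inspections it suggests.

# The electrical frame describes the generator's parts as circuit elements.
ELECTRICAL_FRAME = {"rotor": "flux-linking coil",
                    "brushes+commutator": "switch",
                    "output": "pair of conductors"}

# Transformation between frames: electrical continuity <=> firm mechanical
# attachment, so each electrical disparity indexes mechanical checks.
TO_MECHANICAL = {
    "no current at output": ["is the cable to the battery intact?",
                             "are the bolts tight?",
                             "are the brushes pressing on the commutator?"],
    "rotor not turning":    ["is the belt even there?",
                             "is the belt tight enough?"],
}

def diagnose(electrical_symptom):
    # "Repair" means mechanical repair, so diagnosis ends in this frame.
    return TO_MECHANICAL.get(electrical_symptom,
                             ["no stored transformation: invent a frame"])

print(diagnose("no current at output"))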
The huge network of knowledge, acquired from school, books, apprenticeship, or whatever, is interlinked by difference and relevancy pointers. No doubt the culture imparts a good deal of this structure by its conventional use of the same words in explanations of different views of a subject.

Summaries: Using Frames in Heuristic Search
Over the past decade, it has become widely recognized how important are the details of the representation of a "problem space"; but it was not so well recognized that descriptions can be useful to a program, as well as to the person writing the program. Perhaps progress was actually retarded by ingenious schemes to avoid explicit manipulation of descriptions. Especially in "theorem-proving" and in "game-playing," the dominant paradigm of the past might be schematized so:

The central goal of a Theory of Problem Solving is to find systematic ways to reduce the extent of the Search through the Problem Space.

Sometimes a simple problem is indeed solved by trying a sequence of "methods" until one is found to work. Some harder problems are solved by a sequence of local improvements, by "hill-climbing" within the problem space. But even when this solves a particular problem, it tells us little about the problem space, hence yielding no improved future competence. The best-developed technology of Heuristic Search is that of game-playing, using tree-pruning, plausible-move generation, and terminal-evaluation methods. But even those systems that use hierarchies of symbolic goals do not improve their understanding or refine their representations. But there is a more mature and powerful paradigm:

The primary purpose in problem solving should be better to understand the problem space, to find representations within which the problems are easier to solve. The purpose of search is to get information for this reformulation, not — as is usually assumed — to find solutions; once the space is adequately understood, solutions to problems will more easily be found.

The value of an intellectual experiment should be assessed along the dimension of success — partial success — failure, or in terms of "improving the situation" or "reducing a difference." An application of a "method," or a reconfiguration of a representation, can be valuable if it leads to a way to improve the strategy of subsequent trials. Earlier formulations of the role of heuristic search strategies did not emphasize these possibilities, although they are implicit in discussions of "planning."

Papert (1972, see also Minsky 1972) is correct in believing that the ability to diagnose and modify one's own procedures is a collection of specific and
important "skills." Debugging, a fundamentally important component of intelligence, has its own special techniques and procedures. Every normal person is pretty good at them or otherwise he would not have learned to see and talk! Goldstein (AIM-305) and Sussman (TR-297) have designed systems which build new procedures to satisfy multiple requirements by such elementary but powerful techniques as: 1. Make a crude first attempt by the first order method of simply putting together procedures that separately achieve the individual goals. 2. If something goes wrong, try to characterize one of the defects as a specific (and undesirable) kind of interaction between two procedures. 3. Apply a "debugging technique" that, according to a record in memory, is good at repairing that specific kind of interaction. 4. Summarize the experience, to add to the "debugging techniques library" in memory. These might seem simple-minded, but if the new problem is not too radically different from the old ones, then they have a good chance to work, especially if one picks out the right first-order approximations. If the new problem is radically different, one should not expect any learning theory to work well. Without a structured cognitive map — without the "near misses" of Winston, or a cultural supply of good training sequences of problems — we should not expect radically new paradigms to appear magically whenever we need them. Some Relevant
Some Relevant Reading
Abelson, R.P. 1973 "The Structure of Belief Systems". Computer Models of Thought and Language. Ed. R. Schank and K. Colby (San Francisco: W.H. Freeman).
Bartlett, F.C. 1967 Remembering (Cambridge: Cambridge University Press).
Berlin, I. 1957 The Hedgehog and the Fox (New York: New American Library).
Celce-Murcia, M. 1972 Paradigms for Sentence Recognition (Los Angeles: Univ. of California, Dept. of Linguistics).
Chafe, W. 1972 First Tech. Report, Contrastive Semantics Project (Berkeley: Univ. of California, Dept. of Linguistics).
Chomsky, N. 1957 Syntactic Structures (German translation: "Strukturen der Syntax"). Janua Linguarum, Studia Memoriae, 182.
Fillmore, C.J. 1968 "The Case for Case". Universals in Linguistic Theory. Ed. Bach and Harms (Chicago: Holt, Rinehart and Winston).
Freeman, P. and Newell, A. 1971 "A Model for Functional Reasoning in Design". Proc. Second Intl. Conf. on Artificial Intelligence (London, Sept.).
Gombrich, E.H. 1969 Art and Illusion: A Study in the Psychology of Pictorial Representation (Princeton: Princeton University Press).
Hogarth, W. 1955 The Analysis of Beauty (Oxford: Oxford University Press).
Huffman, D.A. 1972 "Impossible Objects as Nonsense Sentences". Machine Intelligence 6. Ed. D. Michie and B. Meltzer (Edinburgh: Edinburgh University Press).
Koffka, K. 1962 Principles of Gestalt Psychology (New York: Harcourt, Brace and World).
Kuhn, T. 1970 The Structure of Scientific Revolutions, 2nd ed. (Chicago: University of Chicago Press).
Lavoisier, A. 1949 Elements of Chemistry (Chicago: Regnery).
Levin, J.A. 1973 Network Representation and Rotation of Letters. Publication of the Dept. of Psychology, University of California, La Jolla.
Minsky, M. 1970 "Form and Content in Computer Science". 1970 ACM Turing Lecture. Journal of the ACM, 17, No. 2, 197-215.
Minsky, M. and Papert, S. 1969 Perceptrons (Cambridge: M.I.T. Press).
Moore, J. and Newell, A. 1973 "How can MERLIN Understand?" Knowledge and Cognition. Ed. L. Gregg (Potomac, Md.: Lawrence Erlbaum Associates).
Newell, A. 1973 Production Systems: Models of Control Structures. Visual Information Processing (New York: Academic Press).
Newell, A. 1973 "Artificial Intelligence and the Concept of Mind". Computer Models of Thought and Language. Ed. R. Schank and K. Colby (San Francisco: W.H. Freeman).
Newell, A. and Simon, H.A. 1972 Human Problem Solving (Englewood Cliffs, N.J.: Prentice-Hall).
Norman, D. 1972 "Memory, Knowledge and the Answering of Questions". Loyola Symposium on Cognitive Psychology, Chicago.
Papert, S. 1972 "Teaching Children to be Mathematicians vs. Teaching about Mathematics". Int. J. Math. Educ. Sci. Technol., 3, 249-262.
Piaget, J. 1968 Six Psychological Studies. Ed. D. Elkind (New York: Vintage).
Piaget, J. and Inhelder, B. 1956 The Child's Conception of Space (New York: The Humanities Press).
Pylyshyn, Z.W. 1973 "What the Mind's Eye Tells the Mind's Brain". Psychological Bulletin, 80, 1-24.
Roberts, L.G. 1965 "Machine Perception of Three-Dimensional Solids". Optical and Electro-Optical Information Processing (Cambridge: M.I.T. Press).
Sandewall, E. 1972 "Representing Natural Language Information in Predicate Calculus". Machine Intelligence 6. Ed. D. Michie and B. Meltzer (Edinburgh: Edinburgh University Press).
Schank, R. 1973 "Conceptual Dependency: A Theory of Natural Language Understanding". Cognitive Psychology, 3 (1972), 552-631. See also Schank, R. and K. Colby, Computer Models of Thought and Language (San Francisco: W.H. Freeman).
Simmons, R.F. 1973 "Semantic Networks: Their Computation and Use for Understanding English Sentences". Computer Models of Thought and Language. Ed. R. Schank and K. Colby (San Francisco: W.H. Freeman).
Underwood, S.A. and Gates, C.L. 1972 Visual Learning and Recognition by Computer, TR-123, Publications of Elect. Res. Center, University of Texas.
Wertheimer, M. 1959 Productive Thinking (Evanston, Ill.: Harper & Row).
Wilks, Y. 1973 "Preference Semantics". Memo AIM-206, Publications of Stanford Artificial Intelligence Laboratory, Stanford University.
Wilks, Y. 1973 "An Artificial Intelligence Approach to Machine Translation". Computer Models of Thought and Language. Ed. R. Schank and K. Colby (San Francisco: W.H. Freeman).
I.P. GOLDSTEIN, R.B. ROBERTS
NUDGE, a Knowledge-based Scheduling Program 1
Traditional scheduling algorithms (using the techniques of PERT charts, decision analysis or operations research) require well-defined, quantitative, complete sets of constraints. They are insufficient for scheduling situations where the problem description is ill-defined, involving incomplete, possibly inconsistent and generally qualitative constraints. The NUDGE program uses an extensive knowledge base to debug scheduling requests by supplying typical values for qualitative constraints, supplying missing details and resolving minor inconsistencies. The result is that an informal request is converted to a complete description suitable for a traditional scheduler. To implement the NUDGE program, a knowledge representation language — FRL-0 — based on a few powerful generalizations of the traditional property list representation has been developed. The NUDGE knowledge base defined in FRL-0 consists of a hierarchical set of concepts that provide generic descriptions of the typical activities, agents, plans and purposes of the domain to be scheduled. Currently, this domain is the management and coordination of personnel engaged in a group project. NUDGE constitutes an experiment in knowledge-based, rather than power-based, AI programs. It also provides an example of an intelligent support system, in which an AI program serves as an aid to a decision maker. Finally, NUDGE has served as an experimental vehicle for testing advanced representation techniques.
1. Introduction
A classic issue in AI is the knowledge versus power controversy (Minsky & Papert 1974). The knowledge position advocates that intelligence arises mainly from the use of a large store of specific knowledge, while the power theory argues for a small collection of general reasoning mechanisms. This paper reports on an experiment in which a knowledge-based program NUDGE has been implemented for the scheduling domain, a domain in which power-based programs have long been the dominant paradigm. Traditionally, scheduling programs apply simple but powerful decision
1 This paper has been submitted to the Fifth International Conference on Artificial Intelligence.
analysis techniques to finding the optimal schedule under a well-defined set of constraints. The performance of NUDGE confirms that for well-defined, formal situations, the traditional power-based approach is appropriate. But for the problem of defining these formal situations when given only informal specifications, a knowledge-based approach is necessary. By an informal specification, we mean a scheduling request that is potentially incomplete, possibly inconsistent and qualitative. (See Balzer 1974 for an analysis of informal program specifications.) Thus, the NUDGE program accepts informal requests and produces a calendar containing possible conflicts and an associated set of strategies for resolving those conflicts. A domain-independent search algorithm BARGAIN then resolves these conflicts by traditional decision analysis techniques.

informal scheduling request --> [Knowledge-based Program NUDGE] --> formal scheduling request --> [Power-based Program BARGAIN] --> Schedule
NUDGE uses a broad database of knowledge to expand and debug informal scheduling requests. The database is used to supply missing details, resolve inconsistencies, determine available options, notice necessary prerequisites and plan for expected outcomes. To manage this large store of knowledge, a representation language — FRL-0 — has been implemented. FRL-0 extends the traditional attribute/value description of properties by allowing properties to be described by: comments, abstractions, defaults, constraints, indirect pointers from other properties, and attached procedures. These are not new representation techniques. Abstraction, for example, was discussed in Quillian 1968, and attached procedures have become a common property of AI languages since PLANNER (Hewitt 1969). However, the strengths and weaknesses of these representation techniques and their potential interactions are still not well understood. For this reason, we have chosen not to include as many representation capabilities as are currently being implemented in KRL (Bobrow and Winograd 1976) and OWL (Martin 1977). We view FRL-0 as an experimental medium to study the utility of a few specific capabilities and their interactions. Because a knowledge-based approach requires a large store of specific data, it was necessary to choose a particular domain to carry out our experiment. Our criterion was to select a realm in which scheduling requests are typically informal. This criterion ruled out such scheduling problems as those of an assembly line. (See Tonge 1963 for an AI treatment of this problem.) Instead, we selected office scheduling; in particular, assisting a manager in scheduling his team. This environment includes scheduling meetings, monitoring the progress of subgoals assigned to team members, alerting the manager to deadlines, and real-time rescheduling.
In providing NUDGE with the knowledge necessary for these functions, our research serves a third purpose beyond (1) exploring the relation between knowledge-based and power-based scheduling and (2) exercising various representation strategies. It provides insight into the categories of knowledge that are necessary for office scheduling (independent of their representation). NUDGE contains a hierarchy for activities involving information transfer, for people in various roles related to this transfer, for the plans governing these transfers, and for the associated demands on time, space and personnel. The hierarchy is on the average five levels deep and includes approximately 100 objects, each described by a generalized property list called a frame. An abridged version of this hierarchy appears below, with specialization indicated by nesting.

(THING
  (ACTIVITY (EATING (LUNCH) (DINNER))
            (TRANSPORT (FLY) (DRIVE) ...)
            (MEETING (PA-MEETING)
                     (COMMUNICATION-AT-A-DISTANCE (PHONECALL)
                       (CORRESPONDENCE (US-MAIL) (NET-MAIL)))
                     (ONE-WAY-COMMUNICATION (READ) (WRITE (DRAFT) (EDIT))
                       (TELL (LECTURE)) (LISTEN)))
            (RESEARCH (PA-PROJECT) (VLDB-PROJECT) ...))
  (PEOPLE (GROUP (AI-LAB (VLDB-GROUP) (PA-GROUP) ...) ...)
          (PERSON (PROFESSOR (IRA))
                  (STUDENT (GRADUATE-STUDENT (MITCH) (CANDY))
                           (UNDERGRADUATE-STUDENT))
                  (VISITOR (ACADEMIC-VISITOR) (GOVERNMENT-VISITOR) (REPORTER))))
  (PLACE (CITY (CAMBRIDGE) (SAN-FRANCISCO) ...)
         (SCHOOL (MIT) (STANFORD) ...)
         (BUILDING (HOUSE (25-WILDWOOD))
                   (OFFICE-BUILDING (545-TECHNOLOGY-SQUARE))
                   (RESTAURANT (DODIN-BOUFFANT)))
         (ROOM (OFFICE (NE43-819)) (SEMINAR-ROOM (AI-LOUNGE))))
  (TIME (INTERVAL) (MOMENT))
  (OBJECT (INFORMATION (REPORT (PROPOSAL) (PROGRESS-REPORT) (FINAL-REPORT))
                       (LETTER) (MEMO))
          (VEHICLE (CAR) (PLANE) ...)
          (REMINDER)
          (CALENDAR)))
The term "frame" as used in FRL-O was inspired by Minsky's (1975) development of frame theory. Frame theory contends that (1) intelligence arises from the application of large amounts of highly specific knowledge, as opposed to a few general inferencing mechanisms, and (2) this is accomplished through the use of a library of frames, packets of knowledge that provide descriptions of typical objects and events. These descriptions contain both
an abstract template providing a skeleton for describing any instance and a set of defaults for typical members of the class. The defaults allow the information system to supply missing detail, maintain expectations, and notice anomalies. We have yet to investigate with equal care related areas of knowledge not strictly involved in scheduling the information flow between members of a research team — this includes space allocation, budgeting, and travel scheduling. The last of these is the focus of the GUS system developed by Bobrow et al. (1976). GUS is a frame-based dialog system that addresses many of the issues raised here. The major difference is that GUS focusses on the use of frame representations to support dialog comprehension while NUDGE focusses on their use to support sophisticated scheduling. Finally, although developed for the office scheduling domain, the NUDGE knowledge base does have a broader applicability. The use of a hierarchy of successively more abstract concepts allows highly specific information to be clearly segregated from more general facts about the world. Furthermore, some of the concept hierarchies, such as those for time and place, apply in toto to many other applications.
2. Scenario
The following request is typical of those understood by NUDGE.

(S1) NUDGE, schedule a meeting with Bruce for next Tuesday.
S1 is an informal specification in that such facts as the following are left unsaid:

Where the meeting is to be held.
Which Tuesday is next Tuesday.
What time on Tuesday is preferred.
How long the meeting should last.
Whether there are any prerequisites.
What to do if Bruce is unavailable.

As such it is typical of the requests made by people of one another. Given a knowledge of the individuals involved, their roles and the context, NUDGE expands S1 to the more complete S1', where the added information is in italics.

(S1') > I UNDERSTAND YOUR REQUEST TO MEAN: SCHEDULE A MEETING WITH BRUCE AND ME AT 3 IN THE AFTERNOON NEXT TUESDAY ONE WEEK FROM TOMORROW LASTING ONE HOUR IN MY OFFICE TO DISCUSS THE PERSONAL ASSISTANT PROJECT.

When processed by the language comprehension component of NUDGE (Bullwinkle 1977), S1 causes the construction of a frame gestalt, a data structure
of instantiated frames containing all information in S1'. (The actual text of S1' is generated by the text output procedures associated with the frames in this gestalt.) The knowledge-based phase of the scheduling concludes with the conversion of the information in the frame gestalt to a calendar containing possible conflicts and an associated set of strategies for resolving those conflicts. The BARGAIN program then reports the highest utility compromise to the user.

(R1)
YOUR PREFERRED TIME FOR MEETING WITH BRUCE ON TUESDAY IS ALREADY FILLED BY A MEETING WITH MARK. HOWEVER, MARK'S PURPOSE IS TO DISCUSS LANGUAGE RESEARCH AND CANDY IS AVAILABLE. SHALL I RESCHEDULE MARK TO SEE HER INSTEAD?

Mark's purpose is not specifically to see Ira, and hence, a member of Ira's group can be substituted to see Mark. Observe that sufficient knowledge has made it unnecessary to juggle the possible time of the meeting between Ira and Bruce, producing a compromise not entirely satisfactory to either. Instead, a sufficiently broad perspective has allowed NUDGE to identify the option of finding an alternative participant by referencing the purpose of the conflicting appointment. A knowledge-based phase in the scheduling thus avoids the myopia a purely power-based approach can exhibit. Continuing the scenario:

(S2) > Yes.
(R2) THE MEETING WITH BRUCE IS SCHEDULED. SHOULD BRUCE PREPARE A WRITTEN PA PROGRESS REPORT?
(S3) > Yes.
(R3) I WILL REMIND BRUCE ABOUT THE REPORT.

Section 3 describes the frame gestalt representing NUDGE's completion of the S1 request. Section 4 discusses the representation techniques of FRL-0 supporting the creation of the gestalt. Section 5 analyzes the underlying knowledge of activities and individuals represented with these techniques.

3. The Formation of Frame Gestalts
A frame gestalt consists of a set of generic frames, instantiated appropriately for a particular scheduling request. The generic frames are selected on the basis of clues implicit in the scheduling request; knowledge-based reasoning begins with a recognition process. Information missing in the request is then computed from defaults, constraints, and procedures — all associated with these generic frames. For S1, the gestalt includes the frames shown below, with daughters of a frame representing its specializations. Many other interrelations among these frames are not shown.
THING
  TIME
    MOMENT
      MOMENT54
    INTERVAL
      INTERVAL17
  ACTIVITY
    MEETING
      PA-MEETING
        MEETING37
  PEOPLE
    PERSON
      MANAGER
        IRA
      STAFF
        BRUCE
  ACTION
    SCHEDULE
      SCHEDULE21
The goal of the knowledge-based phase is to compute this frame gestalt. The input to the gestalt formation process is a set of partially instantiated frames representing the information actually present in the informal request. This input is generated by a natural language front end consisting of the Wait-and-See-Parser developed by M. Marcus (1976) and a frame-based semantics designed by C. Bullwinkle (1977). There are four partially instantiated frames: MEETING37, SCHEDULE21, INTERVAL17, MOMENT54. MEETING37 is the frame for the proposed meeting and initially contains information regarding the participants and time of the event extracted from the English request. We give MEETING37 below in FRL-0 notation.

(MEETING37
  (AKO ($VALUE (PA-MEETING (SOURCE: PRAGMATICS S1))))
  (WHO ($VALUE (IRA (SOURCE: PRAGMATICS S1))
               (BRUCE (SOURCE: SEMANTICS S1))))
  ...)
MEETING37 reports that the participants are Bruce and Ira. Bruce is known from the semantic interpretation of S1, while Ira is inserted on the basis of pragmatics, i.e. that the person requesting the meeting wishes to be included among the participants. MEETING37 has been identified by pragmatics as A-KIND-OF (AKO) PA-MEETING on the basis of knowledge regarding the common activities of Ira and Bruce. Had no such special knowledge existed, the sentence would simply have triggered an instance of the MEETING frame. The first element of these frame structures is the name of the frame and each remaining item is a slot. Each slot has a value, explicitly marked by $VALUE. A slot can have more than one value, as in the participant slot of MEETING37. A frame is thus simply a multi-level association list. The semantics of attached procedures and inheritance is enforced by the access functions.
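The phrase "multi-level association list" can be made concrete with a small sketch. The following Python fragment is an assumption-laden approximation, not FRL-0 itself: frames map slots to facets to data, and an access function (here called fget, by analogy) enforces inheritance by walking the AKO chain when a frame lacks the requested facet.

FRAMES = {
    "MEETING37":  {"AKO":   {"$VALUE":   ["PA-MEETING"]},
                   "WHO":   {"$VALUE":   ["IRA", "BRUCE"]}},
    "PA-MEETING": {"AKO":   {"$VALUE":   ["MEETING"]},
                   "WHERE": {"$DEFAULT": ["AI-PLAYROOM"]}},
    "MEETING":    {"AKO":   {"$VALUE":   ["ACTIVITY"]}},
}

def fget(frame, slot, facet):
    """Return the data for (slot, facet), inheriting up the AKO chain."""
    while frame is not None:
        data = FRAMES.get(frame, {}).get(slot, {}).get(facet)
        if data:
            return data
        parents = FRAMES.get(frame, {}).get("AKO", {}).get("$VALUE", [])
        frame = parents[0] if parents else None
    return None

print(fget("MEETING37", "WHO", "$VALUE"))      # ['IRA', 'BRUCE']
print(fget("MEETING37", "WHERE", "$DEFAULT"))  # ['AI-PLAYROOM'], inherited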
4. Representation Technology
The formation of a frame gestalt occurs by expanding the frames extracted from the initial request in terms of the knowledge stored in the FRL database.
This section discusses this process in terms of the contributions made by six representation techniques embedded in FRL-0: comments, abstraction, defaults, constraints, indirection and procedural attachment.

(1) Comments. The first generalization of property lists in FRL-0 is the inclusion of comments attached to values. Comments are used in the examples to record the source of the value in each slot. So far, the only source is the semantic and pragmatic interpretation performed by the language comprehension process. Alternative sources are inferences made by attached procedures and inherited properties. Other kinds of commentary provide numerical utilities for use by the scheduler, and describe differences between multiple procedural methods attached to the same slot. This commentary provides guidelines both for BARGAIN to judge the reliability and strength of various constraints and for NUDGE to debug inconsistencies arising from conflicting contributions by different frames during the gestalt formation process. The former would arise if conflicts existed between the generic information of the IRA and BRUCE frames, while the latter is exemplified by S1 being part of a dialog. The sentence "The meeting should be tomorrow in my office." would override the default suggested by generic knowledge. Our use of commentary to guide a debugging process derives from research by Sussman (1973) and Goldstein (1974). Self-knowledge, in the form of machine-understandable annotations, also facilitates the system's ability to explain its inferences to a user. This is critical if the user is to become confident in the system's capabilities.

(2) Abstraction. The second property list generalization is to allow information to be inherited between concepts. In essence, this is an implementation of Quillian's SUPERCONCEPT pointer, but between property lists rather than nodes in a semantic net. For example, details of the meeting between Bruce and Ira are inherited from a generic description of PA-MEETINGs. This description includes answers to such questions as where such meetings are typically held, when, why, who is involved, what prerequisites are required and what consequences result. Since the answers to some of these questions are clearly applicable to a broader set of activities than meetings of the PA-GROUP, the information is distributed in a hierarchy of successively more general frames, thereby achieving both power and economy. Each frame points to its generalization by means of its AKO slot. The process by which information in a generic frame is acquired by a specialized instance of that frame is called inheritance. MEETING37, for example, inherits information from its generalization, the PA-MEETING frame.

(PA-MEETING
  (AKO   ($VALUE (MEETING)))
  (WHY   ($VALUE (PA-PROJECT)))
  (WHERE ($DEFAULT (AI-PLAYROOM)))
  (WHEN  ($DEFAULT (((ON FRIDAY) (FOR 1 HOUR)))))
  (WHO   ($DEFAULT ((IRA (ROLE: MANAGER)) (BRUCE (ROLE: FRL))
                    (CANDY (ROLE: SEMANTICS)) (MITCH (ROLE: SYNTAX))))))
(3) Defaults. The third generalization that naturally accompanies a hierarchy of frames is default information, as the slots of a generic activity typically supply default answers to the common questions asked about such events. The utility of such default information was a major insight of Minsky's original frames research. Their use is prevalent throughout the NUDGE database, and gives the scheduler much of its power. For example, PA-MEETING supplies the information that such activities typically involve four people, occur on Fridays in the AI Lab Playroom and last one hour. The $DEFAULT atom distinguishes defaults from values. We shall refer to the different kinds of information associated with a slot as its facets. The role commentary information associated with the participants of PA-MEETINGs is used by the PERSON-SWAPPING scheduling strategy of the BARGAIN program. (Managers are optimistically defined to know all of the team members' roles.) In forming a frame gestalt, the defaults of superior frames are used unless they are overridden by information from a more reliable source, such as the explicit constraints of the original request. Thus, the WHERE default would apply to the MEETING37 gestalt. The WHEN default, however, is overridden by the explicit request in S1 that the meeting occur on Tuesday. Defaults are also useful to the natural language understanding system by supplying expectations that aid the parser and semantics in processing ambiguity and ellipsis. However, we do not develop that application here.

(4) Constraints. A knowledge representation language must accommodate descriptions of properties if it is to support the recognition of new instances of a generic concept. FRL-0 allows constraints to be attached to a slot by means of facets for requirements and preferences. These constraints are illustrated in the MEETING frame, which is the immediate generalization of PA-MEETING. (The comparison operator in the $PREFER form below is reconstructed from the surrounding prose; it did not survive in the printed text.)

(MEETING
  (AKO  ($VALUE (ACTIVITY)))
  (WHO  ($REQUIRE ((EXISTS ?WHO (HAS-ROLE ?WHO 'CHAIRMAN)))))
  (WHEN ($PREFER ((< (DURATION ?WHEN) (HOUR 1.5)))))
  ...)
Requirements are predicates which must be true of the values in a slot. Preferences can be relaxed yet leave a valid slot. The $REQUIRE facet of MEETING
stipulates that a chairman be present at all meetings. The $PREFER facet states that meetings should not last longer than 90 minutes. It is possible to collapse defaults, preferences and requirements into a single CONSTRAINT facet. However, we have found it convenient to preserve the distinction, given the use of these facets by a scheduler. Requirements cannot be relaxed. Preferences, however, can be negotiated. Defaults are preferences that offer specific alternatives, rather than acting as predicates on a set. KRL's development of "perspectives", in which a frame is described from a particular viewpoint, is a more elaborate kind of constraint than we typically employ. While a frame pattern matcher can, in essence, specify a perspective, we have not generally used such a device for the narrowly defined world of office scheduling. Whether an extended description mechanism will be needed for more complex applications remains to be seen. Our current plans are to see where FRL-0 fails before we incorporate more powerful, but more complex techniques.

(5) Indirection. Not all of the relevant description of MEETING37 is contained in abstractions of the meeting concept. Frames in orthogonal hierarchies also supply constraints. For example, activities involve agents, and the frames for these agents have relevant information. The frame system has separate hierarchies for activities and people, interconnected by indirection. The IRA frame exemplifies this.

(IRA
  (AKO ($VALUE (PERSON)))
  ((MEETING WHEN)    ($PREFER ((DURING AFTERNOON)) ((ON FRIDAY))))
  ((MEETING WHERE)   ($DEFAULT (NE43-819))
                     ($PREFER ((IN 545-TECH-SQUARE))))
  ((PA-MEETING WHEN) ($DEFAULT ((AT 3 PM)) ((AT 10 AM)))
                     ($PREFER ((ON TUESDAY))))
  ...)
The first atomic slot identifies Ira as a person. The remaining non-atomic slots (i.e., slots with compound names) provide information regarding various activities with which IRA is typically involved. For example, in general IRA prefers MEETINGs on Friday afternoons in his office. This information is not stored directly in the activity frame since it is not generically true for the activity, but only applicable when IRA is involved. IRA's default of NE43-819 as the place to hold meetings supplies the value of the WHERE slot of MEETING37. The indirect information appears in the frame gestalt if both the agent and activity frames are triggered. For S1, triggering IRA and MEETING together resulted in the frame gestalt supplying the missing information
regarding the location of the meeting. Thus, indirection provides links between different hierarchies, extending the frame system to include a network of contingent facts. Indirection is a simplified kind of mapping between concepts. FRL differs from MERLIN (Moore & Newell 1973) (in which general mapping is allowed) by providing a restricted but more structured environment. Mapping occurs, in essence, only between agent and activity frames through indirection and between concept and superconcept frames through inheritance.

(6) Procedural attachment. A knowledge representation must allow procedural as well as declarative knowledge. Procedural attachment provides this capability in FRL-0. This capability is found in most AI languages beginning with PLANNER (Hewitt 1969). There are typically three kinds of procedural attachment, and all are provided in FRL-0. These are if-added, if-needed, and if-removed methods. A difference from traditional AI languages is that these procedures are attached to the slots of a frame rather than to assertions of an arbitrary form. FRL-0 is thus a more structured environment than languages like PLANNER and CONNIVER (McDermott & Sussman 1972). Providing a mechanism for triggering arbitrary procedures by adding a value to a slot supports the fundamental operation of FRL-0, which is instantiation; that is, creating an instance of a frame and filling in values. For example, when the time of MEETING37 is arranged (a value of the WHEN slot is assigned) its name is entered in a calendar for easy reference. The method for doing this is supplied by the WHEN slot of the ACTIVITY frame.

(ACTIVITY
  (AKO  ($VALUE (THING)))
  (WHO  ($REQUIRE ((ALL (AKO PERSON))))
        ($IF-NEEDED ((ASK) (TYPE: REQUEST))
                    ((USE TOPIC) (TYPE: DEDUCE))))
  (WHEN ($IF-ADDED ((ADD-TO-CALENDAR)))
        ($REQUIRE ((AKO INTERVAL))))
  ...)
ACTIVITY also illustrates IF-NEEDED methods. These methods allow access to arbitrary procedures for supplying values. For example, examine the WHO slot of ACTIVITY. There are two IF-NEEDED methods there. The first, (ASK), is a function that requests the value from the user. Its purpose is indicated by the comment TYPE: REQUEST. The other method, (USE TOPIC), attempts to deduce the participants by accessing the WHO slot of the frame provided as the value of the TOPIC. The comment on this method indicates that it is of TYPE: DEDUCE. The TYPE comments are used by the function controlling the overall instantiation process (the IF-NEEDED
method of INSTANCE in THING, which all frames inherit). Their function is to allow deductive methods to be used in preference to interactive requests if possible. If-needed methods are exceedingly powerful. For example, defaults can be viewed as a special kind of if-needed method, so useful and widespread that a special facet of a slot is devoted to it. Idiosyncratic forms of inheritance (using other than the AKO link) can be embedded in an if-needed for appropriate slots. Attached procedures are also used to maintain the integrity of the database. For example, AKO and INSTANCE are slots that provide a two-way link between a frame and its generalization. This linkage is maintained by a pair of if-added and if-removed methods. The procedures which implement this mechanism appear in THING, the most general frame in NUDGE.

(THING
  (AKO      ($IF-ADDED   ((ADD-INSTANCE)))
            ($IF-REMOVED ((REMOVE-INSTANCE))))
  (INSTANCE ($IF-NEEDED  ((INSTANTIATE-A-FRAME)))
            ($IF-ADDED   ((ADD-AKO)))
            ($IF-REMOVED ((REMOVE-AKO)))))
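The two-way-link maintenance just described can be sketched as follows (a hypothetical Python rendering; only the name add_instance echoes THING's ADD-INSTANCE method, and the rest is illustrative): adding a value to a frame's AKO slot fires an attached procedure that updates the inverse INSTANCE link.

from collections import defaultdict

frames = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def add_value(frame, slot, value):
    frames[frame][slot]["$VALUE"].append(value)
    for method in IF_ADDED.get(slot, []):     # fire any attached procedures
        method(frame, value)

def add_instance(frame, parent):
    # The if-added method on AKO: maintain the inverse INSTANCE link.
    frames[parent]["INSTANCE"]["$VALUE"].append(frame)

IF_ADDED = {"AKO": [add_instance]}

add_value("MEETING37", "AKO", "PA-MEETING")
print(frames["PA-MEETING"]["INSTANCE"]["$VALUE"])   # ['MEETING37']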
Subtleties. We conclude our discussion of FRL-0 as a representation language with a consideration first of some of the subtleties involved in the use of these six techniques and then of some of FRL-0's current limitations.

1. There is more than one kind of inheritance. Additive and restrictive inheritance are two kinds of inheritance strategies that correspond to two common forms of specialization. Additive inheritance is appropriate where specialization adds new non-contradictory facts to the more general concept. Restrictive inheritance is appropriate where specialization overrides the information contained in the more general concept. Commentary is employed to inform the inheritance mechanism whether to stop or to continue up an AKO chain once the first datum is found.

2. Methods can conflict. Procedural attachment can be complex. For example, care must be taken to avoid loops: a method in slot A may add a value to slot B that in turn adds a value to slot A. An endless loop results. We handle this by using comments to describe the source of information. It is up to the individual method to access this commentary. Scheduling multiple methods associated with a single slot may be required when the methods have an implicit order. Currently, the frame system executes the methods in a fixed order. If more subtle ordering is required, the user must combine the methods into a single procedure, with this method responsible for performing the proper ordering.
3. The distinction between value and requirement is not sharp. Requirements have been presented as predicates to filter out unwanted values. To the extent that NUDGE can reason directly from them, however, they can be used in place of values. For example, an IF-NEEDED procedure can use the requirement to select the generic category when instantiating a new frame to fill a slot.

4. A frame is more than the sum of its parts. In our initial conception of a frame system, we did not provide for a SELF slot to contain idiosyncratic information about the frame itself. Our hypothesis was that all of the information in the frame could be represented in the individual slots. However, the need arose to represent global information about the frame, not local to any slot. Two examples are knowledge of how to print the frame in English and knowledge of the preferred order in which to instantiate the slots of the frame. For these reasons, a SELF slot was introduced with a set of facets appropriate to the various classes of global information it contains. At present these are a $DISCUSS facet which contains a procedure for describing the frame in prose, and an $ORDER facet which contains a procedure that orders the slots at the time of instantiation.

5. Inheritance of values and defaults can conflict. A given slot may have both a default and a value in its generalization frame. Which should dominate? Currently, the frame system treats values as more important than defaults, so all of the values of superior frames are checked before a default is accepted. However, this may not be appropriate in all cases. When it is not, the user of the frame system can obtain complete control by asking for the full heritage of the slot, i.e. all of the facets for the slot in the current frame and its superiors. The user can then select the desired datum.

Limitations. The following are limitations of version 0 of FRL.

1. No provision is made for multiple worlds in the frame system, although the BARGAIN program can consider alternative calendars.

2. Procedures cannot be attached to arbitrary forms, but only to values. For example, there is no way to have a procedure trigger when a new requirement is added.

3. Arbitrary data structures cannot be asserted. Only information of the form "frame, slot, facet, datum, comment" can be placed in a frame.

4. Hash coding is not currently used. Hence, it is expensive to find all the frames with a slot of a given name and even more expensive to find all the frames in which a given value appears.

5. Comments cannot be associated with arbitrary parts of a frame, but only either with individual data or the SELF slot of the frame. There is no way to associate a comment with a subset of the slots.
6. Mapping between frames is restricted to matching slots. A generalized mapping function, as in MERLIN (Moore & Newell 1973), wherein one slot can be mapped to another, is not allowed.

Eventually, we may find that the more sophisticated capabilities of CONNIVER, MERLIN, KRL, or OWL are needed. But the rapid rise and fall of PLANNER argues for caution in the introduction of complex new techniques. We plan to introduce additional techniques only as the simple generalized property list scheme of FRL-0 proves inadequate. At present, FRL is adequate to represent the knowledge described in the next section, which comprises the basis of its scheduling expertise.
5. Epistemology of Scheduling
NUDGE's ability to expand informal scheduling requests arises from the recognition of the request as an instance of various generic frames. This section provides snapshots of this generic knowledge, which includes frame hierarchies for activities, people, time, and plans.

(1) Activities. Most activities known to NUDGE involve information transfer. They span a set of events central to the scheduling domain and interesting in the subtleties they introduce for successfully debugging conflicting schedules. The "activity" subtree of the frame hierarchy shown earlier illustrates these information transfer activities and their disposition along generalization chains. The use of concept hierarchies is an alternative to the unstructured set of primitives proposed by Schank (1973). We find this approach powerful in that it facilitates the recognition of new objects and allows fine distinctions to be readily made. To illustrate the first point, S1 could have been treated as referring to an instantiation of the MEETING frame, in the absence of recognizing a meeting between Bruce and Ira as a PA-MEETING. Later, if additional information allows this recognition, the AKO pointer of MEETING37 would simply be adjusted to point to PA-MEETING. Otherwise no change is needed. The fine distinctions between information transfer activities represented in the activity hierarchy guide the time of scheduling, the preparations required, and the pattern of the consequences. A phone call need not be scheduled for a precise time while a formal meeting must. We find a separate frame useful, rather than representing phone and mail as different instruments to a single communication activity (as might be Schank's approach), because other information is clustered around the choice of means. A letter requires more preparation time than a phone call, implies a certain delay in communication, and leaves open whether a response will be received.
(2) People. Beyond the straightforward record of pertinent facts about a person — their name, address, phone number, office — lies the need to capture the alternate roles people play in different situations. Roles exist in their own abstraction tree. For scheduling purposes, the roles define an order of importance that dictates, in the case of conflicting defaults and preferences, the order in which they should be relaxed. Acquiring properties by virtue of playing a role is identical to inheriting information as a specialized instance of a frame; the AKO/INSTANCE links define the path along which this information flows. A particular feature of roles is that they are often transitory and conditional on the type of activity. Originally, we maintained a strict hierarchy in FRL with each frame pointing to a single abstraction. Recently, we have allowed multiple AKO links. The motivation was that it appeared natural for a person to inherit from several roles. For example, in a given situation, Ira may be both a visitor and a professor. Insofar as the information inherited from multiple parents is nonconflicting, no difficulty arises from this bifurcation in the AKO path. If there is a conflict, it shows up in the frame gestalt with source comments indicating the various sources of the conflicting information. It is up to the process using the gestalt to decide on a resolution. For example, as a professor, Ira's preferences with respect to a meeting time would take precedence over a student's. As a visitor to another university, they would not. The techniques for debugging schedules take account of this, treating the VISITOR role as overriding the PROFESSOR role. People can be members of groups. A group has many of the same characteristics as a person insofar as it can appear in the WHO slot of an activity and may have its own "personal" facts: a name, an address, etc. The MEMBER and AFFILIATE slots record this dual relation in NUDGE.

(3) Time. In our earliest work on NUDGE, time was represented simply as points on a real-number line. This was adequate for the formal analysis made by the scheduler, but proved insufficient to represent the informal time specifications supplied by users. People's time specifications are generally incomplete and occasionally inconsistent. To handle these informal requests, we moved to a frame-based representation for time similar to the one described by Winograd (1975). Below is part of the generic frame for a moment in time:

(MOMENT
  (AKO ($VALUE (TIME)))
  (MINUTE ...)
  (HOUR    ($IF-NEEDED ((ASK) (TYPE: REQUEST)))
           ($IF-ADDED  ((DAYTIME-EQUATION)))
           ($REQUIRE   ((INTEGER-RANGE 0 23) (TYPE: SYNTAX))
                       ((DAYTIME-AGREEMENT))))
  (DAY     ($IF-NEEDED ((ASK) (TYPE: REQUEST)))
           ($IF-ADDED  ((WEEKDAY-EQUATION)))
           ($REQUIRE   ((INTEGER-RANGE 1 31) (TYPE: SYNTAX))
                       ((DAY-MONTH-AGREEMENT))
                       ((WEEKDAY-AGREEMENT)))
           ($DEFAULT   ((CALENDAR-DAY (NOW)))))
  (WEEKDAY ($IF-NEEDED ((ASK) (TYPE: REQUEST))
                       ((WEEKDAY-EQUATION) (TYPE: DEDUCED)))
           ($REQUIRE   ((WEEKDAY?) (TYPE: SYNTAX))
                       ((WEEKDAY-AGREEMENT) (TYPE: DEDUCED))))
  (DAYTIME ($IF-NEEDED ((ASK) (TYPE: REQUEST))
                       ((DAYTIME-EQUATION) (TYPE: DEDUCED)))
           ($REQUIRE   ((DAYTIME?) (TYPE: SYNTAX))
                       ((DAYTIME-AGREEMENT) (TYPE: DEDUCED))))
  (MONTH "similar to day")
  (YEAR "similar to day and month"))
The MOMENT frame can represent an incomplete request by creating an instance with only a few of the slots instantiated. The attached procedures — WEEKDAY-EQUATION and DAYTIME-EQUATION — derive as many additional descriptors as possible. A set of time predicates (BEFORE, DURING, AFTER, etc.) have been implemented that allow a user to ask questions regarding his calendar. For example, using the natural language front end, the system can be asked: "Is Ira free on Monday, February 7?" The predicates take instantiated time frames as input and perform their analysis on a "need to know" basis. That is, (BEFORE M1 M2) will return T even if M1 and M2 are incomplete, providing there is sufficient information to make the required judgment. For example, M1 may specify a moment in January without saying precisely when and M2 a moment in February. In this case, the BEFORE question can be answered despite ignorance of the exact time involved. In this fashion, we go beyond Winograd's original discussion of frame-based time representations in that he did not consider the issues raised by reasoning with incomplete time specifications. Of course, the nature of the incompleteness may be such that no answer is possible. In this case, the time predicates report that the frames are too incomplete for an answer. The time predicates can also tolerate a certain degree of inconsistency. For example, suppose a user asks if a meeting is possible with Bruce on Tuesday, January 20, 1977. In fact, January 20 is a Monday. But if the frame system knows that Bruce is on vacation all of January, it is more appropriate for it to reply: "I assume you mean Monday, January 20. Bruce will be on vacation then." rather than first asking for a clarification and then telling the user his request will fail. Inconsistency is detected by IF-ADDED methods which, in the course of deriving values, observe that they have computed a slot value that conflicts with a user-supplied value. A comment regarding the inconsistency is placed both at the slot level and at the frame level. For example, the MOMENT frame for the inconsistent time specification given above would be:
(MOMENT12
  (WEEKDAY ($VALUE (TUESDAY (SOURCE: USER))
                   (MONDAY (SOURCE: DERIVED))))
  (SELF (LOGICAL-STATE ($VALUE (INCONSISTENT (SEE: WEEKDAY))))))
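The "need to know" behaviour of the time predicates can be suggested by a short sketch (hypothetical; the slot list, the dictionaries standing in for MOMENT instances, and the convention of returning None for "too incomplete" are our assumptions, not FRL-0 code).

FIELDS = ["YEAR", "MONTH", "DAY", "HOUR"]   # most to least significant

def before(m1, m2):
    """True/False if decidable, None if the frames are too incomplete."""
    for field in FIELDS:
        a, b = m1.get(field), m2.get(field)
        if a is None or b is None:
            return None        # need-to-know: cannot decide at this level
        if a != b:
            return a < b
    return None                # equal down to the known resolution

january  = {"YEAR": 1977, "MONTH": 1}              # sometime in January
february = {"YEAR": 1977, "MONTH": 2, "DAY": 7}    # February 7
print(before(january, february))   # True, despite the unknown January day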
The time predicates report the inconsistency, and then attempt to answer the original question by reducing the inconsistency to an incompleteness. This is done by referencing an ordering on the slots corresponding to their relative reliability. Year dominates Month, which dominates Day, which dominates Weekday. The inferior slot values are ignored until the inconsistency is removed. The question is then answered using the resulting incomplete frame. At best, the time predicates have guessed correctly and the user has learned the answer to his question. At worst, he is alerted to the inconsistency and responds with a repetition of his original request with the inconsistency removed.

(4) Plans. It is uncommon to schedule isolated events. Typically, clusters of related activities are organized around a theme: a series of meetings to discuss the state of a group's research, work by several people on a joint paper. These clusters embody two kinds of interrelations in addition to the AKO/INSTANCE bond already discussed. First, there is a logical ordering of activities, which in the realm of scheduling nearly always entails a chronological ordering to be enforced. Second, activities can be broken down into sub-activities; these represent sub-goals with respect to the purposes of the activity itself. Opposing PREREQUISITE/POSTREQUISITE links connect frames possessing a logical ordering. The values of a PREREQUISITE slot name frames which must immediately precede it. Analogous SUB/SUPER links connect frames subordinate one to another. A plan is a group of frames connected by these pointers. These implement a procedural net in the style of Sacerdoti (1975), which served to unify the ideas of ABSTRIPS and NOAH as schemes for representing planning knowledge. An example of using a plan is illustrated in the scenario. NUDGE's response R2 alludes to a PA progress report, whose frame contains the following planning links.

PROGRESS REPORT
    DRAFT  <--pre--  EDIT  --post-->  DISTRIBUTE
Interconnections describing its subgoals and the order in which they must be accomplished permit the creation of an instance mirroring this structure which satisfies the request. At R2 in the scenario, NUDGE makes a point of scheduling the preparation of a written progress report for Bruce, which is clearly something to be accomplished before the newly scheduled meeting
with Ira. The generic frame for PA-MEETING has a PREREQUISITE slot containing a requirement for this and an if-needed procedure to accomplish it.

(PA-MEETING
  (PREREQUISITE ($REQUIRE ((AKO REPORT)))
                ($IF-NEEDED ((INSTANTIATE-AS-REQUIRED))))
  ...)
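How such planning links might drive instantiation can be sketched as follows (hypothetical Python; the PLANS table and the expand routine are illustrative, not the NUDGE implementation): prerequisites are expanded before the activities that require them, mirroring the pre/sub structure above.

PLANS = {
    "PA-MEETING":      {"pre": ["PROGRESS-REPORT"]},
    "PROGRESS-REPORT": {"sub": ["DRAFT", "EDIT", "DISTRIBUTE"]},
    "EDIT":            {"pre": ["DRAFT"], "post": ["DISTRIBUTE"]},
}

def expand(activity, seen=None):
    """List an activity with its prerequisites and sub-activities,
    prerequisites first, each frame at most once."""
    seen = set() if seen is None else seen
    if activity in seen:
        return []
    seen.add(activity)
    links = PLANS.get(activity, {})
    steps = []
    for pre in links.get("pre", []):
        steps += expand(pre, seen)
    for sub in links.get("sub", []):
        steps += expand(sub, seen)
    steps.append(activity)
    return steps

print(expand("PA-MEETING"))
# ['DRAFT', 'EDIT', 'DISTRIBUTE', 'PROGRESS-REPORT', 'PA-MEETING']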
Frames and Knowledge. Frame systems have proved a convenient representation for knowledge that naturally falls into a taxonomy of successively more general categories. Individual frames are convenient for representing concepts that have multi-dimensional descriptions which may be potentially incomplete or inconsistent. However, the limits of frames as a representational scheme are not yet clearly understood. We plan extensions into different domains to understand these limitations.

6. Bargaining between Goals
NUDGE translates an ill-defined, under-specified scheduling request into a complete specification, represented by the frame gestalt. This gestalt becomes the input to a scheduling program, BARGAIN, that seeks the best time for the requested activity if the desired time is unavailable. Other traditional scheduling programs could be employed, as the gestalt is a complete and formal request. We use BARGAIN since it improves upon traditional decision analysis programs by incorporating AI techniques to control the search process. BARGAIN is power-based in the sense that its competence is predicated on efficient search. It engages in a best-first search, as controlled by a static evaluation function that measures (1) the number of violated preferences, (2) their respective utilities and (3) the number of remaining conflicts. BARGAIN was originally designed by Goldstein (1975) and implemented in CONNIVER by F. Kern (1975).

BARGAIN employs a set of 8 search operators which constitute debugging strategies for time conflicts. One set are "resource-driven", i.e. they are experts on the physics of time and eliminate a conflict by altering the duration, interrupting, sharing or moving the event. The second set are "purpose-driven" and go outside the time domain to examine the topic of the meeting and alternative methods for accomplishing it. An application of any one of these techniques produces a new calendar with the conflict resolved, and possibly new conflicts introduced. Each strategy has a cost associated with it. BARGAIN halts when it has found the best sequence of debugging strategies that generate a conflict-free calendar within various computational constraints.

The applicability of a search operator — especially the purpose-driven kind — can depend on the overall knowledge context. Hence, the power-based
approach benefits from some heterarchy with the preceding knowledge-based phase. A given search operator may ask the frame system whether it applies. For example, a strategy to change participants must rely on knowledge of available candidates and the goals of the activity for suggesting suitable replacements. The relative preference for different scheduling strategies is controlled by specific assertions in the HOW slot, which contains the names of strategies applicable to the activity in which it appears. For example, PA meetings can be postponed as a last resort and only reluctantly interrupted, as can be seen in this excerpt from the PA-MEETING frame.

(PA-MEETING
  (HOW ($DEFAULT ((POSTPONE (MAXIMUM: 2)) (UTILITY: HIGH))
                 ((INTERRUPT (UTILITY: HIGH)))))
  ...)
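BARGAIN's control structure can be suggested schematically (a hypothetical sketch; the paper's static evaluation function over violated preferences, utilities and remaining conflicts is simplified here to an accumulated operator cost, and operators are assumed to carry apply, cost and name attributes).

import heapq
import itertools

def bargain(calendar, conflicts_of, operators, max_expansions=1000):
    """Best-first search for the cheapest sequence of debugging strategies
    that yields a conflict-free calendar."""
    tie = itertools.count()                 # tie-breaker for the heap
    frontier = [(0, next(tie), calendar, [])]
    while frontier and max_expansions > 0:
        max_expansions -= 1
        cost, _, cal, plan = heapq.heappop(frontier)
        if not conflicts_of(cal):
            return cal, plan                # the best compromise found
        for op in operators:                # e.g. postpone, interrupt, swap
            for new_cal in op.apply(cal):
                heapq.heappush(frontier, (cost + op.cost, next(tie),
                                          new_cal, plan + [op.name]))
    return None, None                       # computational budget exhausted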
Our approach to power-based scheduling parallels the conservative development of the knowledge-based component in that the well-understood techniques of decision analysis have been augmented only as required. This augmentation has involved applying AI search techniques to improve the efficiency with which a best compromise is found.

7. Conclusions
(1) FRL-0 provides a simple, but powerful representation technology. Frames are generalized property lists, sharing much of the simplicity of traditional attribute/value representation schemes. Yet the addition of a few capabilities — comments, constraints, defaults, procedural attachment, inheritance — provides a great deal more power and economy in the representation. KRL and OWL are more ambitious and more complex, and may well apply to contexts in which FRL-0 proves insufficient. But this remains to be seen. We plan further experiments with FRL-0 to identify its strengths and weaknesses.

(2) Whether FRL-0 or some other AI language is employed, our experience with the nature of informal requests, the issues raised by multiple inheritance paths, the interaction between a search program and a rich knowledge base, and the epistemology of information transfer activities, time, place and people will surely be relevant to the design of knowledge-based AI programs.

(3) FRL is an experiment in the utility of frames. Our experience is that clustering the answers to common questions about a concept in its frame structure provides a useful representation paradigm. The gestalt derived from this frame structure supplies missing information similar to that generated by competent human schedulers to handle informal requests.
(4) The entire system can be viewed from another perspective. Since a frame's behavior is actually governed largely by the attached procedures, it can be viewed as an accessing scheme to the underlying procedural knowledge. Thus, frames implement goal-directed invocation (as in PLANNER), but with pattern matching replaced by the more general process of frame instantiation.
(5) NUDGE is a step towards an AI system with common sense. By generating a complete frame gestalt, the system minimizes the possibility of overlooking obvious alternatives. Defaults, preferences and requirements allow much to remain unsaid. A tolerance for minor inconsistencies is a benchmark of a robust knowledge system.
(6) NUDGE and BARGAIN are steps towards the creation of an automated office. Given the enormous information flow in a modern office, this is an important area for applied AI research.
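To make points (1) and (4) concrete, here is a deliberately tiny Python sketch of three of the capabilities named in point (1): defaults, inheritance along AKO links, and procedural attachment. FRL-0 itself was of course a LISP system, and every name below is invented; the lookup order shown is one plausible choice, not a specification of FRL-0.

class Frame:
    def __init__(self, name, ako=None):
        self.name, self.ako = name, ako
        self.value, self.default, self.if_needed = {}, {}, {}

    def get(self, slot, asker=None):
        # One plausible lookup order: $VALUE, then $DEFAULT, then an
        # attached $IF-NEEDED procedure, then inheritance via AKO.
        # Running the procedure on the asking frame gives the
        # goal-directed invocation of point (4).
        asker = asker or self
        if slot in self.value:
            return self.value[slot]
        if slot in self.default:
            return self.default[slot]
        if slot in self.if_needed:
            return self.if_needed[slot](asker)
        return self.ako.get(slot, asker) if self.ako else None

meeting = Frame("MEETING")
meeting.default["duration"] = 60                      # minutes
meeting.if_needed["end"] = lambda f: f.get("start") + f.get("duration")

pa = Frame("PA-MEETING-23", ako=meeting)
pa.value["start"] = 780                               # 13:00, in minutes
print(pa.get("duration"))                             # 60, an inherited default
print(pa.get("end"))                                  # 840, computed on demand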
8. Bibliography
Balzer, R. 1974 "Human Use of World Knowledge", ISI/RR-73-7.
Bobrow, D.G.; Kaplan, R.M.; Kay, M.; Norman, D.; Thompson, H.; Winograd, T. 1976 "GUS, A Frame-Driven Dialog System" (Xerox Research Center, Palo Alto, Cal.).
Bobrow, D.G. & Winograd, T. 1976 "An Overview of KRL, a Knowledge Representation Language" (Xerox Research Center, Palo Alto, Cal.).
Bullwinkle, C. 1977 "The Semantic Component of PAL: The Personal Assistant Language Understanding Program", forthcoming Working Paper, MIT AI Laboratory.
Collins, A. and Warnock, E. 1974 Semantic Networks, Memo 2833 (Bolt Beranek & Newman, Cambridge/Mass.).
Goldstein, I.P. 1974 Understanding Simple Picture Programs, MIT-AI TR 294.
Goldstein, I.P. 1975 "Bargaining Between Goals", IJCAI IV, Tbilisi.
Hewitt, C. 1969 "PLANNER: A Language for Proving Theorems in Robots", IJCAI I.
Kern, F. 1975 A Heuristic Approach to Scheduling, M.S. Thesis, MIT.
Marcus, M. 1976 "A Design for a Parser for English", Proc. of the ACM Conference, Houston, October 1976, pp. 62-68.
Martin, W.A. 1977 "A Theory of English Grammar". In A Computational Approach to Modern Linguistics: Theory and Implementation (in preparation).
McDermott, D. and Sussman, G. 1972 "The CONNIVER Reference Manual", MIT AI Memo 259.
Minsky, M. 1975 "A Framework for Representing Knowledge". In P.H. Winston (Ed.), The Psychology of Computer Vision (New York: McGraw-Hill).
Minsky, M. & Papert, S. 1974 Artificial Intelligence, University of Oregon.
Moore, J. and Newell, A. 1973 "How Can MERLIN Understand?". In L. Gregg (Ed.), Knowledge and Cognition (Potomac, Md.: Lawrence Erlbaum Assoc.).
Quillian, R. 1968 "Semantic Memory". In M. Minsky (Ed.), Semantic Information Processing (MIT Press, Cambridge, Mass.).
Sacerdoti, E. 1975 "The Non-linear Nature of Plans", IJCAI IV.
Schank, R.C. 1973 "Identification of Conceptualizations Underlying Natural Language". In R.C. Schank & K.M. Colby (Eds.), Computer Models of Thought and Language (San Francisco: Freeman).
Sussman, G.J. 1973 A Computational Model of Skill Acquisition, MIT-AI TR 297.
Tonge, F.M. 1963 "Summary of a Heuristic Line Balancing Procedure". In E.A. Feigenbaum & J. Feldman (Eds.), Computers and Thought (New York: McGraw-Hill).
Winograd, T. 1975 "Frame Representations and the Declarative/Procedural Controversy". In D. Bobrow and A. Collins (Eds.), Representation and Understanding (New York: Academic Press).
P.J. HAYES

The Logic of Frames

Introduction: Representation and Meaning
Minsky introduced the terminology of 'frames' to unify and denote a loose collection of related ideas on knowledge representation: a collection which, since the publication of his paper (Minsky, 1975), has become even looser. It is not at all clear now what frames are, or were ever intended to be. I will assume, below, that frames were put forward as a (set of ideas for the design of a) formal language for expressing knowledge, to be considered as an alternative to, for example, semantic networks or predicate calculus. At least one group has explicitly designed such a language, KRL (Bobrow/Winograd, 1977a, 1977b), based on the frames idea. But it is important to distinguish this from two other possible interpretations of what Minsky was urging, which one might call the metaphysical and the heuristic (following the terminology of (McCarthy/Hayes, 1969)).

The "metaphysical" interpretation is that to use frames is to make a certain kind of assumption about what entities shall be assumed to exist in the world being described. That is, to use frames is to assume that a certain kind of knowledge is to be represented by them. Minsky seems to be making a point like this when he urges the idea that visual perception may be facilitated by the storage of explicit 2-dimensional view prototypes and explicit rotational transformations between them. Again, the now considerable literature on the use of 'scripts' or similar frame-like structures in text understanding systems (Charniak, 1977; Lehnert, 1977; Schank, 1975) seems to be based on the view that what might be called "programmatic" knowledge of stereotypical situations like shopping-in-a-supermarket or going-somewhere-on-a-bus is necessary in order to understand English texts about these situations. Whatever the merits of this view (its proponents seem to regard it as simply obvious, but see (Feldman, 1975) and (Wilks, 1976) for some contrary arguments), it is clearly a thesis about what sort of things a program needs to know, rather than about how those things should or can be represented. One could describe the sequence of events in a typical supermarket visit equally well in almost any reasonably expressive formal language.

The "heuristic", or as I would prefer now to say, "implementation", interpretation is that frames are a computational device for organising stored representations in computer memory, and perhaps also for organising the processes of retrieval and inference which manipulate these stored representations.
Minsky seems to be making a point like this when he refers to the computational ease with which one can switch from one frame to another in a frame-system by following pointers. And many other authors have referred with evident approval to the way in which frames, so considered, facilitate certain retrieval operations. (There has been less emphasis on undesirable computational features of frame-like hierarchical organisations of memory.) Again, however, none of this discussion engages representational issues. A given representational language can be implemented in all manner of ways: predicate calculus assertions may be implemented as lists, as character sequences, as trees, as networks, as patterns in an associative memory, etc.: all giving different computational properties but all encoding the same representational language. Indeed, one might almost characterise the art of programming as being able to deploy this variety of computational techniques to achieve implementations with various computational properties. Similarly, any one of these computational techniques can be used to implement many essentially different representational languages. Thus, circuit diagrams, perspective line drawings, and predicate calculus assertions, three entirely distinct formal languages (cf. Hayes, 1975), can all be implemented in terms of list structures. Were it not so, every application of computers would require the development of a new specialised programming language.

Much discussion in the literature seems to ignore or confuse these distinctions. They are vital if we are to have any useful taxonomy, let alone theory, of representational languages. For example, if we confuse representation with implementation then LISP would seem a universal representational language, which stops all discussion before we can even begin. One can characterise a representational language as one which has (or can be given) a semantic theory, by which I mean an account (more or less formal, more or less precise — this is not the place to argue for a formal model theory, but see Hayes, 1977) of how expressions of the language relate to the individuals or relationships or actions or configurations, etc., comprising the world, or worlds, about which the language claims to express knowledge. (Such an account may — in fact must — entail making some metaphysical assumptions, but these will usually be of a very general and minimal kind (for example, that the world consists of individual entities and relationships of one kind or another which hold between them: this is the ontological commitment needed to understand predicate logic).) Such a semantic theory defines the meanings of expressions of the language. That's what makes a formal language into a representational language: its expressions carry meaning. The semantic theory should explain the way in which they do this carrying.

To sum up, then, although frames are sometimes understood at the metaphysical level, and sometimes at the computational level, I will discuss them as a representational proposal: a proposal for a language for the representation of knowledge, to be compared with other such representational languages: a language with a meaning.
What Do Frames Mean ?

A frame is a data structure — we had better say expression — intended to represent a 'stereotypical situation'. It contains named 'slots', which can be filled with other expressions — fillers — which may themselves be frames, or presumably simple names or identifiers (which may themselves be somehow associated with other frames, but not by a slot-filler relationship: otherwise the trees formed by filling slots with frames recursively would always be infinitely deep). For example, we might have a frame representing a typical house, with slots called kitchen, bathroom, bedrooms, lavatory, room-with-TV-in-it, owner, address, etc. A particular house is then to be represented by an instance of this house frame, obtained by filling in the slots with specifications of the corresponding parts of the particular house, so that, for example, the kitchen slot may be filled by an instance of the frame contemporary-kitchen which has slots cooker, floorcovering, sink, cleanliness, etc., which may contain in turn respectively an instance of the split-level frame, the identifier vinyl, an instance of the double-drainer frame, and the identifier '13' (for "very clean"), say. Not all slots in an instance need be filled, so that we can express doubt (e.g. "I don't know where the lavatory is"), and in real 'frame' languages other refinements are included, e.g. descriptors such as "which-is-red" as slot fillers, etc. We will come to these later.

From examples such as these (cf. also Minsky's birthday-party example in Minsky, 1975), it seems fairly clear what frames mean. A frame instance denotes an individual, and each slot denotes a relationship which may hold between that individual and some other. Thus, if an instance (call it G00097) of the house frame has its slot called kitchen filled with a frame instance called, say, G00082, then this means that the relationship kitchen (or, better, iskitchenof) holds between G00097 and G00082. We could express this same assertion (for it is an assertion) in predicate calculus by writing: iskitchenof(G00097, G00082). Looked at this way, frames are essentially bundles of properties. House could be paraphrased as something like

λx. (kitchen(x, y₁) & bathroom(x, y₂) & ...)

where the free variables yᵢ correspond to the slots. Instantiating House to yield a particular house called Dunroamin (say) corresponds to applying the λ-expression to the identifier Dunroamin to get

kitchen(Dunroamin, y₁) & bathroom(Dunroamin, y₂) & ...

which, once the "slots" are filled, is an assertion about Dunroamin. Thus far, then, working only at a very intuitive level, it seems that frames are simply an alternative syntax for expressing relationships between individuals, i.e. for predicate logic. But we should be careful, since although the meanings may appear to be the same, the inferences sanctioned by frames may differ in some crucial way from those sanctioned by logic. In order to get more insight into what frames are supposed to mean we should examine the ways in which it is suggested that they be used.
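The translation being gestured at here is entirely mechanical, as a small sketch shows (Python; all names are hypothetical): a frame instance is nothing more than a bundle of atomic assertions, one unary atom for the concept and one binary atom per filled slot.

def assertions(instance_name, concept, slots):
    # Translate a frame instance into predicate-calculus-style atoms.
    # Unfilled slots simply contribute nothing, which is how an
    # instance expresses doubt.
    atoms = [f"{concept}({instance_name})"]
    for slot, filler in slots.items():
        if filler is not None:
            atoms.append(f"is_{slot}_of({instance_name}, {filler})")
    return atoms

print(assertions("G00097", "house",
                 {"kitchen": "G00082", "owner": "J.Smith", "lavatory": None}))
# ['house(G00097)', 'is_kitchen_of(G00097, G00082)',
#  'is_owner_of(G00097, J.Smith)']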
Frame Inference
One inference rule we have already met is instantiation: given a frame representing a concept, we can generate an instance of the concept by filling in its slots. But there is another, more subtle, form of inference suggested by Minsky and realised explicitly in some applications of frames. This is the "criteriality" inference. If we find fillers for all the slots of a frame, then this rule enables us to infer that an appropriate instance of the concept does indeed exist. For example, if an entity has a kitchen and a bathroom and an address and ..., etc., then it must be a house. Possession of these attributes is a sufficient as well as necessary condition for an entity to qualify as a house, criteriality tells us. An example of the use of this rule is in perceptual reasoning. Suppose for example the concept of a letter is represented as a frame, with slots corresponding to the parts of the letter (strokes and junctions, perhaps), in a program to read handwriting (as was done in the Essex Fortran project (Brady/Wielinga, 1977)). Then the discovery of fillers for all the slots of the 'F' frame means that one has indeed found an 'F' (the picture is considerably more complicated than this, in fact, as all inferences are potentially subject to disconfirmation: but this does not affect the present point).

Now one can map this understanding of a frame straightforwardly into first-order logic also. A frame representing the concept C, with slot-relationships R₁, ..., Rₙ, becomes the assertion

∀x (C(x) ≡ ∃y₁, ..., yₙ. R₁(x, y₁) & ... & Rₙ(x, yₙ))

or, expressed in clausal form:

∀x. C(x) ⊃ R₁(x, f₁(x))
∀x. C(x) ⊃ R₂(x, f₂(x))
  ...
∀x. C(x) ⊃ Rₙ(x, fₙ(x))
∀x y₁ ... yₙ. R₁(x, y₁) & R₂(x, y₂) & ... & Rₙ(x, yₙ) ⊃ C(x)

The last long clause captures the criteriality assumption exactly. Notice the Skolem functions in the other clauses: they have a direct intuitive reading, e.g. for kitchen, the corresponding function is kitchenof, which is a function from houses to their kitchens. These functions correspond exactly to the selectors which would apply to a frame, considered now as a data structure, to give the values of its fields (the fillers of its slots). All the variables here are universally quantified. If we assume that our logic contains equality, then we could dispense altogether with the slot-relations Rᵢ and express the frame as an assertion using equality. In many ways this is more natural. The above then becomes:

C(x) ⊃ ∃y. y = f₁(x)   (and so on for f₂, ..., fₙ)
f₁(x) = y₁ & ... & fₙ(x) = yₙ ⊃ C(x)

(Where the existential quantifiers are supposed to assert that the functions are applicable to the individual in question. This assumes that the function symbols fᵢ denote partial functions, so that it makes sense to write ¬∃y. y = fᵢ(x). Other notations are possible.) We see then that criterial reasoning can easily be expressed in logic. Such expression makes clear, moreover (what is sometimes not clear in frames literature), whether or not criteriality is being assumed.
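The long clause is also directly executable. A toy Python rendering (the slot lists and facts are invented): an entity found to have fillers for every slot of a concept is inferred to be an instance of that concept.

CRITERIAL = {"house": ("kitchen", "bathroom", "address")}   # hypothetical slot lists

facts = {("kitchen", "G1"): "K1",
         ("bathroom", "G1"): "B1",
         ("address", "G1"): "13 Acacia Ave"}

def infer_concepts(entity):
    # R1(x,y1) & ... & Rn(x,yn) => C(x): if every slot-relation of C
    # has a filler for this entity, conclude C(entity).
    return [c for c, slots in CRITERIAL.items()
            if all((slot, entity) in facts for slot in slots)]

print(infer_concepts("G1"))   # ['house']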
A third form of frames reasoning has been proposed, often called matching (Bobrow/Winograd, 1977a). Suppose we have an instance of a concept, and we wish to know whether it can plausibly be regarded as also being an instance of another concept. Can we view John Smith as a dog-owner?, for example, where J.S. is an instance of the Man frame, let us suppose, and Dogowner is another frame. We can rephrase this question: can we find an instance of the Dogowner frame which matches J.S.? The sense of match here is what concerns us. Notice that this cannot mean a simple syntactic unification, but must rest — if it is possible at all — on some assumptions about the domain about which the frames in question express information. For example, perhaps Man has a slot called pet, so we could say that a sufficient condition for J.S.'s being matchable to Dogowner is that his pet slot is filled with an object known to be canine. Perhaps Dogowner has slots dog and name: then we could specify how to build an instance of Dogowner corresponding to J.S.: fill the name slot with J.S.'s name (or perhaps with J.S. himself, or some other reference to him) and the dog slot with J.S.'s pet. KRL has facilities for just this sort of transference of fillers from slots in one frame to another, so that one can write routines to actually perform the matchings.

Given our expressions of frames as assertions, the sort of reasoning exemplified by this example falls out with very little effort. All we need to do is express the slot-to-slot transference by simple implications, thus:

isdog(y) & pet(x, y) ⊃ dogof(x, y)

(using the first formulation in which slots are relations). Then, given:

name(J.S., "John Smith")   (1)
pet(J.S., Fido)            (2)
isdog(Fido)                (3)

(the first two from the J.S. instance of the Man frame, the third from general world-knowledge: or perhaps from Fido's being in fact an instance of the Dog frame) it follows directly that

dogof(J.S., Fido)          (4)

whence, by the criteriality of Dogowner, from (1) and (4), we have:

Dogowner(J.S.)

The translation of this piece of reasoning into the functional notation is left as an exercise for the reader.
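The whole chain, slot transfer followed by criteriality, fits in a few lines. The Python sketch below is only a cartoon of the first-order derivation above (no real unification; all names are invented):

facts = {("name", "J.S."): "John Smith",      # (1)
         ("pet",  "J.S."): "Fido"}            # (2)
dogs = {"Fido"}                               # (3) isdog(Fido)

def as_dog_owner(person):
    # isdog(y) & pet(x, y) => dogof(x, y); then fillers for the
    # name and dog slots make x a Dogowner, by criteriality.
    y = facts.get(("pet", person))
    if y in dogs and ("name", person) in facts:               # (4) dogof
        return {"name": facts[("name", person)], "dog": y}    # the built instance
    return None

print(as_dog_owner("J.S."))   # {'name': 'John Smith', 'dog': 'Fido'}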
All the examples of 'matching' I have seen have this rather simple character. More profound examples are hinted at in (Bobrow/Winograd, 1977b), however. So far as one can tell, the processes of reasoning involved may be expressible only in higher-order logic. For example, it may be necessary to construct new relations by abstraction during the "matching" process. It is known (Huet, 1972; Pietrzykowski/Jensen, 1973) that the search spaces which this gives rise to are of great complexity, and it is not entirely clear that it will be possible to automate this process in a reasonable way.

This reading of a frame as an assertion has the merit of putting frames, frame-instances and 'matching' assumptions into a common language with a clear extensional semantics which makes it quite clear what all these structures mean. The (usual) inference rules are clearly correct, and are sufficient to account for most of the deductive properties of frames which are required. Notice, for example, that no special mechanism is required in order to see that J.S. is a Dogowner: it follows by ordinary first-order reasoning.

One technicality is worth mentioning. In KRL, the same slot-name can be used in different frames to mean different relations. For example, the age of a person is a number, but his age as an airline passenger (i.e. in the traveller frame) is one of {infant, child, adult}. We could not allow this conflation, and would have to use different names for the different relations. It is an interesting exercise to extend the usual first-order syntax with a notion of name-scope in order to allow such pleasantries (a toy version is sketched below). But this is really nothing more than syntactic sugar.
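As an aside, the name-scoping just mentioned is easy to mimic. A toy Python rendering (relations invented): the surface name 'age' is resolved inside the scope of a frame before any lookup, so the two uses never collide.

FRAME_SLOTS = {
    "Person":    {"age": "Person.age"},        # an integer
    "Traveller": {"age": "Traveller.age"},     # one of {infant, child, adult}
}

facts = {("Person.age", "G0043"): 34,
         ("Traveller.age", "G0043"): "adult"}

def slot_value(frame, slot, entity):
    # Qualify the slot name by its frame, so the same surface name
    # can denote two different relations.
    return facts[(FRAME_SLOTS[frame][slot], entity)]

print(slot_value("Person", "age", "G0043"))     # 34
print(slot_value("Traveller", "age", "G0043"))  # 'adult'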
Seeing As

One apparently central intuition behind frames, which seems perhaps to be missing from the above account, is the idea of seeing one thing as though it were another: or of specifying an object by comparison with a known prototype, noting the similarities and points of difference (Bobrow/Winograd, 1977a). This is the basic analogical reasoning behind MERLIN (Moore/Newell, 1973), which Minsky cites as a major influence. Now this idea can be taken to mean several rather different things. Some of them can be easily expressed in deductive-assertional terms, others less easily.

The first and simplest interpretation is that the 'comparison' is filling-in the details. Thus, to say JS is a man tells us something about him, but to say he is a bus conductor tells us more. The bus conductor frame would presumably have slots which did not appear in the Man frame (since-when for example, and bus-company), but it would also have a slot to be filled by the Man instance for JS (or refer to him in some other way), so have access to all his slots. Now there is nothing remarkable here. All this involves is asserting more and more restrictive properties of an entity. This can all be done within the logical framework of the last section.

The second interpretation is that a frame represents a 'way of looking' at an entity, and this is a correct way of looking at it. For example a Man may also be a Dog-owner, and neither of these is a further specification of the other: each has slots not possessed by the other frame. Thus far, there is nothing here more remarkable than the fact that several properties may be true of a single entity. Something may be both a Man and a Dog-owner, of course: or both a friend and an employee, or both a day and a birthday. And each of these pairs can have its own independent criteriality. However, there is an apparent difficulty. A single thing may have apparently contradictory properties, seen from different points of view. Thus, a man viewed as a working colleague may be suspicious and short-tempered; but viewed as a family man, may have a sweet and kindly disposition. One's views of oneself often seem to change depending on how one perceives one's social role, for another example. And in neither case, one feels, is there an outright contradiction: the different viewpoints 'insulate' the parts of the potential contradiction from one another.

I think there are three possible interpretations of this, all expressible in assertional terms. The first is that one is really asserting different properties in the two frames: that 'friendly' at work and 'friendly' at home are just different notions. This is analogous to the case discussed above where 'age' means different relations in two different contexts. The second is that the two frames somehow encode an extra parameter: the time or place, for example: so that Bill really is unfriendly at work and friendly at home. In expressing the relevant properties as assertions one would be obliged then to explicitly represent these parameters as extra arguments in the relevant relations, and provide an appropriate theory of the times, places, etc. which distinguish the various frames. These may be subtle distinctions, as in the self seen-as-spouse or the self seen-as-hospital-patient or seen-as-father, etc., where the relevant parameter is something like interpersonal role. I am not suggesting that I have any idea what a theory of these would be like, only that to introduce such distinctions, in frames or any other formalism, is to assume that there is such a theory, perhaps a very simple one. The third interpretation is that, after all, the two frames contradict one another. Then of course a faithful translation into assertions will also contain an explicit contradiction. The assertional language makes these alternatives explicit, and forces one who uses it to choose which interpretation he means. And one can always express that interpretation in logic. At worst, every slot-relation can have the name of its frame as an extra parameter, if really necessary.

There is however a third, more radical, way to understand seeing-as. This is to view a seeing-as as a metaphor or analogy, without actually asserting that it is true. This is the MERLIN idea. Example: a man may be looked at as a pig, if you think of his home as a sty, his nose as a snout, and his feet as trotters. Now such a caricature may be useful in reasoning, without its being taken to be veridically true. One may think of a man as a pig, knowing perfectly well that as a matter of fact he isn't one. MERLIN's notation and inference machinery for handling such analogies
are very similar respectively to frames and "matching", and we have seen that this is merely first-order reasoning. The snag is that we have no way to distinguish a 'frame' representing a mere caricature from one representing a real assertion. Neither the old MERLIN (in which all reasoning is this analogical reasoning) nor KRL provide any means of making this rather important distinction.

What does it mean to say that you can look at a man as a pig ? I think the only reasonable answer is something like: certain of the properties of (some) men are preserved under the mapping defined by the analogy. Thus, perhaps, pigs are greedy, ill-mannered and dirty, their snouts are short, upturned and blunt, and they are rotund and short-legged. Hence, a man with these qualities (under the mapping which defines the analogy: hence, the man's nose will be upturned, his house will be dirty) may plausibly be regarded as pig-like. But of course there are many other properties of pigs which we would not intend to transfer to a man under the analogy: quadrupedal gait, being a source of bacon, etc. (Although one of the joys of using such analogies is finding ways of extending them: "Look at all the little piggies ... sitting down to eat their bacon" [G. Harrison]).

So, the intention of such a caricature is that some, not all, of the properties of the caricature shall be transferred to the caricaturee. And the analogy is correct, or plausible, when these transferred properties do, in fact, hold of the thing caricatured: when the man is in fact greedy, slovenly, etc. This is almost exactly what the second sense of seeing-as seemed to mean: that the man 'matches' the pig frame. The difference (apart from the systematic rewriting) is that here we simply cannot assume criteriality of this pig frame. To say that a man is a pig is false: yet we have assumed that this fellow does fit this pig frame. Hence the properties expressed in this pig frame cannot be criterial for pig. To say that a man is a pig is to use criteriality incorrectly. This then helps to distinguish this third sense of seeing-as from the earlier senses: the failure of criteriality. And this clearly indicates why MERLIN and KRL cannot distinguish caricatures from factual assertions; for criteriality is not made explicit in these languages.

We can however easily express a non-criterial frame as a simple assertion. One might wonder what use the 'frame' idea is when criteriality is abandoned, since a frame is now merely a conjunction. Its boundaries appear arbitrary: why conjoin just these properties together ? The answer lies in the fact that not all properties of the caricature are asserted of the caricaturee, just those bundled together in the seeing-as frame. The bundling here is used to delimit the scope of the transfer. We could say that these properties were criterial for pig-likeness (rather than pig-hood).

In order to express caricatures in logic, then, we need only to define the systematic translations of vocabulary: nose for snout, etc. This seems to require some syntactic machinery which logic does not provide: the ability to substitute one relation symbol for another in an assertion. This kind of "analogy mapping"
ping" was first developed some years ago by R. Kling and used by him to express analogies in mathematics. Let (j) be the syntactic mapping 'out' of the analogy (e.g. 'snout 1 -* r nose] 'sty 1 -* 'house 1 ), and suppose Xx. \|/(x) is the defining conjunction of the frame of Pig-likeness : Pig-like (x) = \|/ (x) (Where v|/ may contain several existentially bound variables, and generally may be a complicated assertion). Then we can say that Pig-like (Fred) is true just when holds for Fred, i. e. the asserted properties are actually true of Fred, when the relation names are altered according to the syntactic mapping . So, a caricature frame needs to contain, or be somehow associated with, a specification of how its vocabulary should be altered to fit reality. With this modification, all the rest of the reasoning involved is first-order and conventional.
Defaults

One aspect of frame reasoning which is often considered to lie outside of logic is the idea of a default value: a value which is taken to be the slot filler in the absence of explicit information to the contrary. Thus, the default for the home-port slot in a traveller frame may be the city where the travel agency is located (Bobrow et al. 1977). Now, defaults certainly seem to take us outside first-order reasoning, in the sense that we cannot express the assumption of the default value as a simple first-order consequence of there being no contrary information. For if we could, the resulting inference would have the property that

p ⊢ q  but  (p & r) ⊢ ¬q

for suitable p, q and r (p does not deny the default: q represents the default assumption: r overrides the default), and no logical system behaves this way (Curry (1956) for example takes p ⊢ q ⇒ p & r ⊢ q to be the fundamental property of all 'logistic' systems). This shows however only that a naive mapping of default reasoning into assertional reasoning fails. The moral is to distrust naivety.

Let us take an example. Suppose we have a Car frame and an instance of it for my car, and suppose it has a slot called status, with possible values {OK, struggling, needs-attention, broken}, and the default is OK. That is, in the absence of contrary information, I assume the car is OK. Now I go to the car, and I see that the tyre is flat: I am surprised, and I conclude that (contrary to what I expected) the correct filler for the status slot is broken. But, it is important to note, my state of knowledge has changed. I was previously making an assumption — that the car was OK — which was reasonable given my state of knowledge at the time. We might say that if ψ represented my state of knowledge, then status(car) = OK was a reasonable inference from ψ: ψ ⊢ status(car) = OK. But once I know the tyre is flat, we have a new state of knowledge ψ₁, and of course
ψ₁ ⊢ status(car) = broken. In order for this to be deductively possible, it must be that ψ₁ is got from ψ not merely by adding new beliefs, but also by removing some old ones. That is, when I see the flat tyre I am surprised: I had expected that it was OK. (This is not to say that I had explicitly considered the possibility that the tyre might be flat, and rejected it. It only means that my state of belief was such that the tyre's being OK was a consequence of it.) And of course this makes sense: indeed, I was surprised. Moreover, there is no contradiction between my earlier belief that the car was OK and my present belief that it is broken. If challenged, I would not say that I had previously been irrational or mad, only misinformed (or perhaps just wrong, in the sense that I was entertaining a false belief).

As this example illustrates, default assumptions involve an implicit reference to the whole state of knowledge at the time the assumption was generated. Any event which alters the state of knowledge is liable therefore to upset these assumptions. If we represent these references to knowledge states explicitly, then 'default' reasoning can be easily and naturally expressed in logic. To say that the default for home-port is Palo Alto is to say that unless the current knowledge-state says otherwise, then we will assume that it is Palo Alto, until the knowledge-state changes. Let us suppose we can somehow refer to the current knowledge-state (denoted by NOW), and to a notion of derivability (denoted by the turnstile ⊢). Then we can express the default assumption by:

(∃y. NOW ⊢ ⌜homeport(traveller) = y⌝) ∨ homeport(traveller) = Palo Alto.

The conclusion of which allows us to infer that homeport(traveller) = Palo Alto until the state of knowledge changes. When it does, we would have to establish this conclusion for the new knowledge state. I believe this is intuitively plausible. Experience with manipulating collections of beliefs should dispel the feeling that one can predict all the ways new knowledge can affect previously held beliefs. We do not have a theory of this process, nor am I claiming that this notation provides one.* But any mechanism — whether expressed in frames or otherwise — which makes strong assumptions on weak evidence needs to have some method for unpicking these assumptions when things go wrong, or equivalently of controlling the propagation of inferences from the assumptions. This inclusion of a reference to the knowledge-state which produced the assumption is in the latter category.

* Recent work of Doyle, McDermott and Reiter is providing such a theory: see (Doyle, 1978), (McDermott/Doyle, 1978), (Reiter, 1978).

An example of the kind of axiom which might form part of such a theory of assumption-transfer is this. Suppose ⊢ p, and hence p, is in the knowledge-state φ, and suppose we wish to generate a new knowledge-state φ′ by adding the observation q. Let ψ be φ − ⌜⊢ p⌝. Then if ψ ∪ {q} ⊬ ¬p, define φ′ to be ψ ∪ ⌜ψ ⊢ p⌝ ∪ {q}.
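A Python caricature of the proposal (all names invented): each default conclusion records the knowledge state that licensed it, here a bare version counter standing in for NOW, so a change of state forces re-derivation rather than producing a contradiction.

class KB:
    def __init__(self):
        self.facts = {}         # hard information
        self.version = 0        # stands in for NOW, the knowledge state
        self.assumed = {}       # slot -> (value, version that licensed it)

    def tell(self, slot, value):
        # A fresh observation: the knowledge state itself changes,
        # so earlier default assumptions become suspect.
        self.facts[slot] = value
        self.version += 1

    def ask(self, slot, default):
        if slot in self.facts:                  # NOW |- slot = y, for some y
            return self.facts[slot]
        prior = self.assumed.get(slot)
        if prior and prior[1] == self.version:  # assumption still licensed
            return prior[0]
        self.assumed[slot] = (default, self.version)  # re-derived for NOW
        return default

kb = KB()
print(kb.ask("homeport", "Palo Alto"))   # 'Palo Alto', assumed by default
kb.tell("homeport", "Boston")            # observation: no contradiction,
print(kb.ask("homeport", "Palo Alto"))   # just a new state: 'Boston'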
This can all be written, albeit rather rebarbatively, in logic augmented with notations for describing constructive operations upon knowledge-states. It would justify for example the transfer of status(car) = OK past an observation of the form, say, that the car was parked in an unusual position, provided that the belief state did not contain anything which allowed one to conclude that an unusual parking position entailed anything wrong with the car. (It would also justify transferring it past an observation like it is raining, or my mother is feeling ill, but these transfers can be justified by a much simpler rule: if p and q have no possible inferential connections in φ — this can be detected very rapidly from the 'connection graph' (Kowalski 1973), as sketched below — then addition of q cannot affect p.)

To sum up, a close analysis of what defaults mean shows that they are intimately connected with the idea of observations: additions of fresh knowledge into a data-base. Their role in inference — the drawing of consequences of assumptions — is readily expressible in logic, but their interaction with observation requires that the role of the state of the system's own knowledge is made explicit. This requires not a new logic, but an unusual ontology, and some new primitive relations. We need to be able to talk about the system itself, in its own language, and to involve assumptions about itself in its own processes of reasoning.
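The screening rule just mentioned can be given a toy rendering. The Python sketch below reduces formulas to bare sets of predicate symbols, which is an invention of this sketch and far cruder than Kowalski's connection graphs; it checks whether q's symbols can reach p's through the formulas of the belief state.

def connected(p, q, kb):
    # p, q, and each kb formula are given as sets of predicate symbols;
    # q can affect p only if their symbols are linked through the kb.
    reach, frontier = set(q), set(q)
    while frontier:
        frontier = {s for f in kb if f & frontier for s in f} - reach
        reach |= frontier
    return bool(reach & p)

kb = [{"status", "car"}, {"car", "tyre"}, {"rain", "weather"}]
print(connected({"status"}, {"rain"}, kb))  # False: rain cannot touch status(car)
print(connected({"status"}, {"tyre"}, kb))  # True: tyre links to car to status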
Reflexive Reasoning
We have seen that most of 'frames' is just a new syntax for parts of first-order logic. There are one or two apparently minor details which give a lot of trouble, however, especially defaults. There are two points worth making about this. The first is that I believe that this complexity, revealed by the attempt to formulate these ideas in logic, is not an artefact of the translation but is intrinsic to the ideas involved. Defaults just are a complicated notion, with far-reaching consequences for the whole process of inference-making. The second point is a deeper one. In both cases — caricatures and defaults — the necessary enrichment of logic involved adding the ability to talk about the system itself, rather than about the worlds of men, pigs and travel agents. I believe these are merely two relatively minor aspects of this most important fact: much common-sense reasoning involves the reasoner in thinking about himself and his own abilities as well as about the world. In trying to formalise intuitive common-sense reasoning I find again and again that this awareness of one's own internal processes of deduction and memory is crucial to even quite mundane arguments. There is only space for one example. I was once talking to a Texan about television. This person, it was clear, knew far more about electronics than I did. We were discussing the number of lines per screen in different countries. One part of the conversation went like this.
Texan: You have 900 lines in England, don't you ?
Me: No, 625.
Texan (confidently): I thought it was 900.
Me (somewhat doubtfully): No, I think it's 625. (pause) Say, they couldn't change it without altering the sets, could they ? I mean by sending some kind of signal from the transmitter or ...
Texan: No, they'd sure have to alter the receivers.
Me (now confident): Oh, well, it's definitely 625 lines then.

I made a note of my own thought processes immediately afterwards, and they went like this. I remembered that we had 625 lines in England. (This remembering cannot be introspectively examined: it seems like a primitive ability, analogous to FETCH in CONNIVER. I will take it to be such a primitive in what follows. Although this seems a ludicrously naive assumption, the internal structure of remembering will not concern us here, so we might as well take it to be primitive.) However, the Texan's confidence shook me, and I examined the belief in a little more detail. Many facts emerged: I remembered in particular that we had changed from 405 lines to 625 lines, and that this change was a long, expensive and complicated process. For several years one could buy dual-standard sets which worked on either system. My parents, indeed, had owned such a set, and it was prone to unreliability, having a huge multi-gang sliding-contact switch: I had examined its insides once. There had been newspaper articles about it, technical debates in the popular science press, etc. It was not the kind of event which could have passed unnoticed. (It was this richness of detail, I think, which gave the memory its subjective confidence: I couldn't have imagined all that, surely ?) So if there had been another, subsequent, alteration to 900 lines, there would have been another huge fuss. But I had no memory at all of any such fuss: so it couldn't have happened. (I had a definite subjective impression of searching for such a memory. For example, I briefly considered the possibility that it had happened while my family and I were in California for 4 months, being somehow managed with great alacrity that time: but rejected this when I realised that our own set still worked, unchanged, on our return.)

Notice how this conclusion was obtained. It was the kind of event I would remember; but I don't remember it; so it didn't happen. This argument crucially involves an explicit assertion about my own memory. It is not enough that I didn't remember the event: I had to realise that I didn't remember it, and use that realisation in an argument.

The Texan's confidence still shook me somewhat, and I found a possible flaw in my argument. Maybe the new TV sets were constructed in a new sophisticated way which made it possible to alter the number of lines by remote control, say, by a signal from the transmitter. (This seems quite implausible to me now; but my knowledge of electronics is not rapidly accessible, and it did seem a viable possibility at the moment.) How to check whether this was
possible ? Why, ask the expert: which I did, and his answer sealed the only hole I could find in the argument.

This process involves taking a previously constructed argument — a proof, or derivation — as an object, and inferring properties of it: that a certain step in it is weak (can be denied on moderately plausible assumption), for example. Again, this is an example of reflexive reasoning: reasoning involving descriptions of the self.

Conclusion

I believe that an emphasis on the analysis of such processes of reflexive reasoning is one of the few positive suggestions which the 'frames' movement has produced. Apart from this, there are no new insights to be had there: no new processes of reasoning, no advance in expressive power. Nevertheless, as an historical fact, 'frames' have been extraordinarily influential. Perhaps this is in part because the original idea was interesting, but vague enough to leave scope for creative imagination. But a more serious suggestion is that the real force of the frames idea was not at the representational level at all, but rather at the implementation level: a suggestion about how to organise large memories. Looked at in this light, we could sum up 'frames' as the suggestion that we should store assertions in nameable 'bundles' which can be retrieved via some kind of indexing mechanism on their names. In fact, the suggestion that we should store assertions in non-clausal form.

Acknowledgements

I would like to thank Frank Brown and Terry Winograd for helpful comments on an earlier draft of this paper.
Appendix: Translation of KRL-0 into Predicate Logic

KRL → many-sorted predicate logic

Units
(i) Basic, (ii) Specialisation, (iii) Abstract, (iv) Individual, (v) Manifestation
(vi) Relation → relation

Slot → binary relation or unary function

Descriptors
(i) direct pointer → name
(ii) Perspective → λ-expression; e.g. (a trip with destination = Boston airline = TWA) becomes λx. trip(x) & destination(x) = Boston & airline(x) = TWA (in this case both fillers are unique; if not, we would use a relation, e.g. airline(x, TWA))
(iii) Specification → ι-expression; e.g. (the actor from Act E17 (a chase ...)) becomes ιx. actor(E17) = x, or ιx. actor(E17) = x & Act(E17)
(iv) predication → λ-expression
(v) logical boolean → non-atomic expression
(vi) restriction → ι-expression; e.g. (the one (a mouse) (which owns (a dog))) becomes ιx. mouse(x) & ∃y. dog(y) & owns(x, y)
(vii) selection → λ-expression with conditional body; e.g. (using (the age from Person this one) select from (which is less than 2) → Infant (which is at least 12) → Adult otherwise → Child) becomes ιx. (age(this one) < 2 & x = infant) ∨ (age(this one) ≥ 12 & x = adult) ∨ (age(this one) ≥ 2 & age(this one) < 12 & x = child)
(viii) set specification → λ-expression (sets coded as predicates) or a set specification (if we use set theory; only very simple set theory is necessary)
(ix) contingency → ι-expression or ε-expression whose body mentions a state or has a bound state variable; e.g. (during state 24 then (the topblock from (a stack with height = 3))) becomes ιx. ∃y. isstack(y, state24) & height(y) = 3 & topblock(y, x), where I have taken stack to be a contingent property: other choices are possible (e.g. stacks always "exist" but have zero height in some states).

Examples

Traveller(x) ⊃ Person(x) & (category(x) = infant ∨ category(x) = child ∨ category(x) = adult) & ∃y. airport(y) & preferredairport(x, y)

Person(x) ⊃ string(firstname(x)) & string(lastname(x)) & integer(age(x)) & city(hometown(x)) & address(streetaddress(x))

Person(G0043) & firstname(G0043) = "Juan" & foreignname(lastname(G0043)) & firstcharacter(lastname(G0043)) = "M" & age(G0043) > 21

Traveller(G0043) & category(G0043) = Adult & preferredairport(G0043, SJC)
References

Bobrow, D.G., Kaplan, R.M., Norman, D.A., Thompson, H. and Winograd, T.
1977 "GUS, a Frame-Driven Dialog System", Artificial Intelligence 8, 155-173.
Bobrow, D.G. and Winograd, T.
1977a "An Overview of KRL", Cognitive Science 1, 3-46.
1977b "Experience with KRL-0: One Cycle of a Knowledge Representation Language", Proc. 5th Int. Joint Conf. on AI, MIT, (vol 1), 213-222.
Brady, J.M. and Wielinga, B.J.
1977 "Reading the Writing on the Wall", Proc. Workshop on Computer Vision, Amherst, Mass.
Charniak, E.
1977 "Ms. Malaprop, a Language Comprehension Program", Proc. 5th Int. Joint Conf. on AI, MIT, (vol 1), 1-8.
Curry, H.B.
1956 Introduction to Mathematical Logic (Amsterdam: Van Nostrand).
Doyle, J.
1978 Truth Maintenance System for Problem Solving, Memo TR-419, A.I. Laboratory, MIT.
Feldman, J.
1975 "Bad-Mouthing Frames", Proc. Conf. on Theor. Issues in Natural Language Processing, Cambridge, Mass., 102-103.
Hayes, P.J.
1975 "Some Problems and Non-problems in Representation Theory", Proc. 1st AISB Conf., Brighton, Sussex.
1977 "In Defence of Logic", Proc. 5th Int. Joint Conf. on AI, MIT, (vol 2), 559-565.
Huet, G.P.
1972 Constrained Resolution: a Complete Method for Type Theory, Jennings Computer Science Report 1117, Case Western Reserve University.
Kowalski, R.
1973 An Improved Theorem-Proving System for First Order Logic, DCL Memo 65, Edinburgh.
Lehnert, W.
1977 "Human and Computational Question Answering", Cognitive Science 1, 47-73.
McCarthy, J. and Hayes, P.J.
1969 "Some Philosophical Problems from the Standpoint of Artificial Intelligence", Machine Intelligence 4, 463-502.
McDermott, D. and Doyle, J.
1978 Non-monotonic Logic I, Memo AI-486, A.I. Laboratory, MIT.
Minsky, M.
1975 "A Framework for Representing Knowledge", in P. Winston (Ed.), The Psychology of Computer Vision (New York: McGraw-Hill), 211-277.
Moore, J. and Newell, A.
1973 "How Can MERLIN Understand?", in L. Gregg (Ed.), Knowledge and Cognition (Hillsdale, N.J.: Lawrence Erlbaum Assoc.), 201-310.
Pietrzykowski, T. and Jensen, D.
1973 Mechanizing ω-Order Type Theory through Unification, Dept. of Applied Analysis and Comp. Science, Report CS-73-16, University of Waterloo.
Reiter, R.
1978 "On Reasoning by Default", Proc. 2nd Symp. on Theor. Issues in Natural Language Processing, Urbana, Illinois.
Schank, R.
1975 "The Structure of Episodes in Memory", in D.G. Bobrow and A. Collins (Eds.), Representation and Understanding (New York: Academic Press), 237-272.
Wilks, Y.
1976 "Natural Language Understanding Systems within the AI Paradigm: a Survey", in M. Penny (Ed.), Artificial Intelligence and Language Comprehension (National Institute of Education, Washington, D.C.).
E. CHARNIAK
Ms. Malaprop, a Language Comprehension Program¹

¹ This paper has been submitted to the Fifth International Joint Conference on Artificial Intelligence.

Abstract

This paper describes Ms. Malaprop, a program (currently being designed) which will answer questions about simple stories dealing with painting, where stories, questions and answers will be expressed in semantic representation rather than English in order to allow concentration on the inferential problems involved in language comprehension. The common sense knowledge needed to accomplish the task is provided by the frame representation of "mundane" painting found in Charniak (1976b). The present paper, after reviewing this representation, goes on to describe how it is used by Ms. Malaprop. Some specific questions of semantic representation, matching, correcting false conclusions, and search, will be discussed.

1. Introduction
For the purpose of this paper, I take language comprehension to be the process of fitting what one is told into the framework established by what one already knows. So, to take a simple example,

(1) Jack was going to paint a chair. He started to clean it.

our understanding of the second line of (1) is conditioned by two facts. The first is the story specific information provided by the first line, i.e., that Jack has the intention of painting the chair, while the second comes from our general fund of common sense knowledge and states that it is a good idea if the thing to be painted is clean before one starts. By tying the second line to such information a person, or computer, would "know" such related facts as why the action was performed, what might have happened if it hadn't been, and how far along Jack is in the process of painting the chair.

Ms. Malaprop is a computer program (currently being designed) which will answer questions about examples such as (1). Indeed (1) is a typical example in many respects. For one thing it stays quite close to our knowledge of everyday events. As such the story specific information serves only to tell the program which parts of its real world knowledge are relevant to the story situation; the story does not build up a complex setting of its own. Hence when Ms. Malaprop fits new story information into what she knows it is always by relating it to her store of common sense knowledge, and never by
seeing how it relates to some complex plot supplied in the story. This is obviously unrealistic as far as stories go, but it is all too realistic given current understanding of language comprehension. This example is also typical insofar as once we have seen the second line as an instance of a certain portion of the painting process, the typical questions one might ask to demonstrate understanding, such as "why", or "what would have happened if he hadn't", should not be too difficult to answer. Hence I shall for the most part ignore the problem of how questions actually get answered in order to concentrate on the problems of the initial integration which, without further discussion, I will assume occurs at "read time" rather than "question time". (For discussion of this assumption, see Charniak (1976a).)

No example is completely typical however, and one thing (1) does not indicate is that Ms. Malaprop, at least in her early versions, will not understand English, but rather will be given stories and questions already in semantic representation. This representation has been almost entirely designed (see Charniak (1976b)) but, except in those places where it is the topic of discussion, it will be replaced by English phrases throughout. Also, while many of the examples which are being used to define Ms. Malaprop's capabilities are like (1) in that they call for the program to tell you what in some sense it already knows, other examples are considerably more complex. For example:

(2) Jack was going to paint his chair green. He got some blue and yellow paint. Question: Why ?

(3) After Jack finished he did not wash the paint brush. He was going to throw it away. Question: Why didn't Jack wash the brush ?

I should note that the foreseen first version of the program will handle all of the painting examples herein, given the caveat, repeated here for the last time, that Ms. Malaprop cannot handle actual English.

2. The Framed Painting
Evidently, a program which answers such questions will have to have at its disposal quite a bit of information about painting and its neighbouring concepts. This knowledge base is completely designed and is described in detail in Charniak (1976b). We can only give a brief overview here, but it should be stressed that this representation is a) completely formalized, b) fairly complete, and c) fairly deep. By this last comment I mean that I have striven to hook up the representation of painting knowledge to more basic knowledge wherever possible. So, the representation "knows" why one should wash a paint brush after use because it knows about what happens when paint dries on something. But this latter is based on its knowledge of the evaporation of liquids containing residues, which in turn is based on its knowledge of evaporation in general. Almost nothing of these properties can be presented here, and the interested reader is encouraged to consult the afore-mentioned article.

Let us start by considering a very simplified version of the painting "frame" (term due to Minsky (1975)), expressed mostly in informal English, but with some formalism thrown in.

PAINTING (COMPLEX-EVENT)
  VARS:  (AGENT must be animate)
         (OBJECT must be a solid)
         (PAINT must be a liquid, usually is paint)
         (INSTRUMENT must be a solid, usually is either a roller or a paint brush, and should be absorbent)
  GOAL:  PAINTING-GOAL (OBJECT has a coat of PAINT on it)
           COMES-FROM: (PAINTING 6 via rules which say that paint on INSTRUMENT will stick to OBJECT, partially fulfilling the goal)
  EVENT: PAINTING 1 (OBJECT not dirty)
           COMES-FROM: (WASH-GOAL)
           LEADS-TO: (NOT DIRTY-OBJECT 1)
         →
         PAINTING 2 (everything nearby covered with newspaper)
         →
         PAINTING 3 (LOOP
           PAINTING 4 (get PAINT on INSTRUMENT)
             COMES-FROM: (rules which explain how immersing INSTRUMENT in PAINT will give the desired result)
           PAINTING 5 (GREATER DRIP-THRESHOLD than the amount of PAINT on INSTRUMENT)
             COMES-FROM: (rules showing how the regulation of pressure regulates the amount of PAINT)
           PAINTING 6 (INSTRUMENT is in contact with OBJECT)
           PAINTING 7 (GREATER amount of PAINT on INSTRUMENT than the STREAK-THRESHOLD)
             COMES-FROM: either (regulating pressure) or (adding more paint via PAINTING 4)
         )
         →
         PAINTING 8 (PAINT removed from INSTRUMENT)
           LEADS-TO: (rules expressing how if it were not removed INSTRUMENT would stiffen)
Approaching this in steps, we first note that it is divided into three sections. The first, labeled VARS, is simply a list of variables along with some specification of what sorts of things may be bound to these variables. Then comes the GOAL, which expresses the goal of the activity. Finally we have EVENT, which is a description of what sorts of things have to be done in order to accomplish the goal. The arrows here are to indicate rough time ordering.

Going down one level of detail we notice that the EVENT is made up of a series of "frame statements", each of which has a name, PAINTING 1, etc., which is followed by an expression in parentheses. Here these are informal English-like statements, but in the complete version they are simply predicate plus argument structures. Some of these have extra information following them (labeled COMES-FROM and LEADS-TO) but let us ignore these for the time being. If we just look at the frame statements in EVENT we see that they give an outline of how to paint. One portion of this outline is a LOOP (PAINTING 3) which tells us to get paint on the instrument (PAINTING 4), bring the instrument in contact with the object (PAINTING 6), while at the same time keeping the volume of paint above the streak threshold (PAINTING 7), and below the drip threshold (PAINTING 5). Shifting our attention to the GOAL, we see that it too is a frame statement (named PAINTING-GOAL), and had we specified the variable restrictions in VARS more fully, we would have seen them to be frame statements also.

If we look now at a single frame statement, say PAINTING 1, we see that it has various "tags". One of these expresses how the state described by the frame statement normally is achieved (or equivalently how it "comes about", or where it COMES-FROM) while the other gives the reason for doing this portion of the frame (or the results of the frame statement, or what it LEADS-TO). So PAINTING 1 is brought about by the WASH frame (left to the reader's imagination). This is expressed by saying that PAINTING 1 matches (in the normal pattern matching sense) the goal statement of the WASH frame, namely WASH-GOAL. In much the same way there is a COMES-FROM pointer from PAINTING-GOAL to PAINTING 6 which states how it is that the goal is brought about by PAINTING 6. Note that in this case PAINTING-GOAL does not match PAINTING 6, so in a complete version there would be "intermediaries" or rules which, from a syntactic point of view, explain how the two statements can be made to match, while from a semantic point of view they explain how it is that bringing the instrument in contact with the object can ultimately lead to the goal being achieved. For example, these rules would tell us that if there were no paint on the instrument the desired result would not be achieved. In much the same way, PAINTING 1 (OBJECT not dirty) LEADS-TO the prevention of flaking and cracking, which is expressed by a separate frame, DIRTY-OBJECT, given below.
DIRTY-OBJECT (SIMPLE-EVENT)
  VARS: ...
  EVENT: (AND DIRTY-OBJECT 1 (object is dirty)
              DIRTY-OBJECT 2 (paint is put over the dirt))
  CAUSES (after a year, plus or minus a factor of four)
         (OR DIRTY-OBJECT 3 (paint flakes)
             DIRTY-OBJECT 4 (paint cracks))

In effect, then, we are told that PAINTING 1 will match the negation of DIRTY-OBJECT 1, and hence prevent the causal relation described in DIRTY-OBJECT. Note that the EVENT in DIRTY-OBJECT is of a different form than that of PAINTING, as the former expresses a simple cause and effect relation, while the latter gives a complex series of "commands" without any cause and effect relations. In fact, they are two different kinds of frames, as is indicated by the type marks appearing by their names, SIMPLE-EVENT and COMPLEX-EVENT. These are two of five types of frames allowed by the system.

Returning to our PAINTING frame we can now see how story statements like those of (1) can be integrated into Ms. Malaprop's knowledge of the world. The first line of (1) (Jack was going to paint a chair) will set up an instance of the PAINTING frame with the AGENT and OBJECT variables bound appropriately. Then the second line comes in (Jack started to clean the chair). If we assume that the input representation of this corresponds to "Jack started an activity which would cause the chair not to be dirty" we can see that part of this will match PAINTING 1, a fact which will be recorded by a LEADS-TO pointer. (Remember, LEADS-TO pointers indicate reasons for doing things.) Then, if Ms. Malaprop is asked "why" she will simply follow the pointer from the story statement back to PAINTING 1, and reply, in effect, "one should not paint a dirty object". If asked "why" again, the program would follow the LEADS-TO pointer from PAINTING 1 to DIRTY-OBJECT 1, and reply, "otherwise the paint might flake".
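A toy Python rendering of this read-time bookkeeping (the structures and glosses are invented, and far simpler than the real representation): story statements are tied to frame statements with LEADS-TO pointers as they are read, and "why" is answered afterwards by pure pointer-chasing.

KNOWLEDGE = {
    "PAINTING1":     {"gloss": "one should not paint a dirty object",
                      "leads-to": "DIRTY-OBJECT1"},
    "DIRTY-OBJECT1": {"gloss": "otherwise the paint might flake",
                      "leads-to": None},
}

story = {}

def read_line(stmt_id, matched_frame_statement):
    # read-time integration: record the reason the action was performed
    story[stmt_id] = {"leads-to": matched_frame_statement}

def why(stmt_id):
    # question-time: just follow one LEADS-TO pointer
    table = story if stmt_id in story else KNOWLEDGE
    target = table[stmt_id]["leads-to"]
    return KNOWLEDGE[target]["gloss"] if target else None

read_line("S2", "PAINTING1")   # "He started to clean it."
print(why("S2"))               # one should not paint a dirty object
print(why("PAINTING1"))        # otherwise the paint might flake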
3.
Semantic Representation and the Use of Framed Painting
In the last section we observed how Ms. Malaprop could use the framed painting representation to answer questions like "Why did Jack wash the chair?". While I believe that Ms. Malaprop handles such examples in a fairly natural way, it should not have escaped the reader's notice that this type of "why" question is exactly the sort of thing the framed painting was designed to handle. In this section I will present some other ways in which the frames can be used in story comprehension, ways which are not so obviously "built in" to the structure of framed painting. At the same time I have chosen examples which will illustrate a second property of the program. Ms. Malaprop is designed to be the inference portion of a larger program which would incorporate the proverbial "front end" to translate the natural language into the semantic representation. Nevertheless, much of the input to Ms. Malaprop has a very "surfacy" feel to it. This is due to my belief that semantic representation is not all of a piece, but rather that one starts with a relatively crude semantic representation which is then refined and "deepened" during inference. So Ms. Malaprop will be given such imprecise and even ambiguous predicates as HAVE (a notorious problem) and NORMAL (in the sense that "very" indicates "greater than NORMAL"). While I mean to contrast this view with the more traditional one that the "front end" converts natural language into a precise and unambiguous semantic representation, given the now standard view that such a front end would require the full services of inference, it is by no means clear that the two views can be cleanly distinguished. Hence I will not attempt to argue my view, but rather assume it and hope the few examples suffice to show its utility.

START and FINISH. Typically when we say someone is engaged in painting, we mean two separate, though clearly related, things. We may simply wish to indicate that he is somewhere in the activity described in PAINTING, but we may rather wish to delimit a more restricted set of actions. Perhaps the easiest way to see this is to consider "starting" and "finishing".

(4) Jack had finished painting. Then he washed the brush.
(5) Jack put paper under the chair. Then he started to paint.

How is our program to account for the existence of these two paintings, inner and outer? (An earlier discussion of this problem is to be found in Scragg (1975).) One might simply say that there are two PAINTINGs, not one, and that the first line of (4) is ambiguous between the two. However, this is an inelegant solution at best, since exactly this same division seems to occur regularly in a wide variety of actions.

(6) Jack has finished putting on his shoes. Now he will tie them.
(7) Jack put the dishes in the sink, and added soap and water. Before he started washing the dishes the telephone rang.

I have assumed in Ms. Malaprop that the inner activity can be inferred from the outer according to the following rule:

(8) Starting from the goal statement of the complex activity, follow the COMES-FROM pointer into the event (see section 2). The command to which it points is the inner event unless it is embedded in loops, in which case the inner event is the outermost loop which so encloses it.

So for painting, the COMES-FROM points to PAINTING 6 (instrument in contact with object), which is enclosed by PAINTING 3 (the loop for keeping paint on the instrument), so it is the latter which is the inner event of painting, a result in full accord with intuition. What this means in terms of practice is that Ms. Malaprop will be given as input story statements like (START JACK1 SS1), where SS1 is the name of (PAINTING JACK1 CHAIR1). The program will then look to see if any of the events of PAINTING have been accomplished, in which case the inner painting must be meant. Otherwise she will assume the outer painting. (The original START statement will be retained in the data base since, in the case of a mistaken assumption, its presence will, in principle, allow the program to rectify the error.) It should be noted, however, that rule (8) is far from the complete story. Consider for example:
(9) Jack finished {(a) making / (b) baking} a cake. Then he put {(c) it in the oven / (d) on the icing}.

The combination of (b) with (c) is absurd, although (a) with (c) is fine; (b) with (d) is fine, although (a) with (d) is slightly odd. What this seems to show is that "making a cake" and "baking a cake" do not have the same inner events, although superficially their outer event is the same. I assume that this has something to do with the idiosyncrasies of "making" and "baking" and will hence retain rule (8) as a good first approximation.

HAVE. A typical example of a word with built-in, and seemingly intractable, ambiguity is "have". The usual first approximation is to divide it up into a number of reasonably well-defined word senses, such as "own", "part-of", "hold", etc. But this inevitably runs into problems with the ill-defined "have for use" meaning illustrated in examples like:

(10) Jack was carrying a walky-talky, but when his friends asked him to demonstrate it he admitted that he did not have the other half. It was being repaired.

Here Jack is not saying that he does not own the other half, or that he is not holding it, but simply that he does not have the "use" of it. Furthermore, many examples which could possibly be forced into one of the other word senses are better seen as examples of "have for use".

(11) The children waited to start the game until Fred came. He had the bat.

Fred might be holding the bat, and perhaps he owns it, but neither of these need be the case. Such examples are easily understood within the frame system being described here. "Have for use" simply says that the object in question may be bound to an appropriate variable in an appropriate frame. For example, if in a discussion of painting we are told that Jack got some paint (where "got" ≈ "cause to have"), this simply informs us that we may bind this instance of paint to the variable in painting which represents the paint to be used. By looking at "have" in this light we need not worry exactly what set of physical conditions is being claimed. Our painting frame will tell us (indirectly) that typical use of paint requires being in its proximity, but note that if for some reason this condition does not apply (painting by remote control using a robot), then Jack can still "have" the paint although indefinitely distant from it. Furthermore, although I can hardly imagine what the appropriate frames might look like, this handling of "have" at least holds out the promise of working for such refractory examples as:

(12) Jack can work in Switzerland now that he has a work permit.
(13) Prof. Wetfred has the brightest graduate students in the department.

Of course, one may encounter a "have" without an appropriate binding frame available, only to be told in the next sentence or two the use intended. Given the aforementioned view of inference as refining and deepening the semantic representation, this is no problem, however. Ms. Malaprop will simply record the "ambiguous" have statement, and then return to it should the next few lines provide a possible binding frame.

TOO MUCH — TOO LITTLE. For our last example of interesting interactions between the "meanings" of words and our frames, consider the use of "too" as seen in the following examples:

(14) Jack was painting the wall. At one point he had too little paint on his brush. Question: What do you think happened? Answer: The paint did not go on evenly.
(15) Then he got too much paint on the brush. Q: What happened? A: Possibly the paint dripped.

Such examples hardly exhaust the uses of the term, but they can be handled naturally within Ms. Malaprop. Roughly speaking, Ms. Malaprop interprets such cases as saying that some threshold has been exceeded and that this leads to an undesirable effect. That is to say, Ms. Malaprop starts out by looking for a command in an active frame which, for (14), matches (16).

(16) (GREATER PAINT-VOLUME THRESHOLD)

She will find PAINTING 7. Once it is found, Ms. Malaprop simply states that this command was not "obeyed". Any deleterious results from this will then be specified by the LEADS-TO pointer from the command. Quite similar techniques work for "very", except that with "very" there is no implication of negative results, but only the weaker "greater than normal". A program would then be faced with deciding what constitutes "normality" in a given circumstance. This is a difficult problem, and for the moment I intend only to implement one such example, namely:

(16) When Jack ran out of paint he started to press very hard on his paint brush.

Although I have not worked out the exact mechanism here, the general idea seems clear enough given the painting frame as previously developed. Noting that pressure on the painting instrument is but one of two ways to accomplish PAINTING 7 (keeping the amount of paint on the instrument greater than some threshold), and further that the other means can no longer be used because of the lack of paint, Ms. Malaprop will interpret "normal" as expressing the normal trade-off between the two methods, and the lack of paint as the reason why the normal trade-off is not being followed.
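The threshold interpretation of "too" just described can be sketched as follows; the command patterns and result strings are invented stand-ins, and the fragment is an illustration rather than the actual mechanism.

```python
# A hedged sketch of the "too much / too little" interpretation: find a
# threshold command among the active frame's commands, declare it
# disobeyed, and report the result named by its LEADS-TO tag.

COMMANDS = [
    {"name": "PAINTING7",
     "pattern": ("GREATER", "PAINT-VOLUME", "THRESHOLD"),
     "leads_to": "the paint goes on evenly"},
    {"name": "PAINTING5",
     "pattern": ("GREATER", "DRIP-THRESHOLD", "PAINT-VOLUME"),
     "leads_to": "the paint does not drip"},
]

def interpret_too(pattern):
    """'Too little/much X' means some threshold command was disobeyed."""
    for cmd in COMMANDS:
        if cmd["pattern"] == pattern:
            return f"{cmd['name']} disobeyed; so not: {cmd['leads_to']}"
    return None

interpret_too(("GREATER", "PAINT-VOLUME", "THRESHOLD"))
# -> "PAINTING7 disobeyed; so not: the paint goes on evenly"
```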
4.
Search and Language Comprehension
The notion of search has seldom played an important role in Artificial Intelligence language work. The earlier programs hardly did any search (with the exception of Raphael's (1968) SIR), while later work shunted the issue aside by a variety of means: one could adopt a sufficiently limited domain of discourse (Winograd 1972), one could adopt a wider domain but know so little about it as, in effect, to form a limited domain (Charniak 1972), or one could posit the existence of massive parallel inference capabilities (Rieger 1974, Fahlman 1975). The first two approaches can only work so long, and I have no confidence in the premise behind the third, so at some point the problem of search has to be faced. Within the general approach assumed by this paper the problem of search (and its brother, problem solving) has come up, at least implicitly, in work designed to show how a comprehension program might infer a person's motives from his actions or vice versa (Schank and Abelson 1975, Rieger 1976). I have not considered this problem with respect to Ms. Malaprop, but the program does have search problems, if at a somewhat more modest level of complexity. The problems stem from the fact that Ms. Malaprop must find (and hence search for) frame statements which match incoming story statements.

In most of the cases we have considered so far, Ms. Malaprop's search is of a very simple sort. At any given time there is a list of "context frames", which are simply those complex event frames which have been mentioned in the story. Given a story statement, a list of frame statements with the same predicate is retrieved from each context frame, and matches are attempted against all of them. (Matches may be more or less good, as will be discussed in the next section.) While I am sympathetic to the view that even this amount of search will prove unacceptable in a system which is to deal with the complexities of real stories, a more pressing problem is that even this flagrant use of search proves not to be sufficient. Consider the following example:

(17) Jack was painting a wall. At one point he pressed the brush too hard against the wall. Question: What happened?

This is, in fact, very similar to (15), only instead of being told there was too much paint on the brush, we are told that he pressed too hard. The result, of course, is the same; the problem is to figure this out.
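Schematically, the simple search mode just described amounts to an index from predicates to frame statements, as in the following sketch (the indexing layout is assumed, not taken from the program):

```python
# A sketch of the "normal" search: every mentioned complex-event frame is
# a context frame, and a story statement is matched only against frame
# statements that share its predicate.

CONTEXT_FRAMES = {
    "PAINTING": {
        "GREATER":   ["PAINTING5", "PAINTING7"],  # threshold commands
        "LIQUID-IN": ["PAINTING4"],
    },
}

def candidates(predicate):
    """Gather same-predicate statements from every context frame."""
    found = []
    for frame_statements in CONTEXT_FRAMES.values():
        found.extend(frame_statements.get(predicate, []))
    return found

candidates("GREATER")    # -> ["PAINTING5", "PAINTING7"]
# Nothing here mentions pressure, which is why (18) below finds no match:
# the pressure information is buried in a subframe reached via COMES-FROM.
```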
Working in a parallel fashion to the "too much paint" example, Ms. Malaprop will attempt to find in one of the context frames a command matching:

(18) (GREATER THRESHOLD PRESSURE1)

Were she able to find a match for (18) she would then proceed to say that the command was disobeyed, but given the search mechanism just explained she will not, in fact, find the match. The reasons are not immediately obvious, but a little explanation should make them clear. As was assumed in the original "too much paint" example, the actual command, PAINTING 5, is to regulate the amount of paint, not the amount of pressure. PAINTING 5 does state, however, that one way this is done is by regulating the pressure, but this does not allow us to make the match needed for (18). To see why, let us take a closer look at the relevant command.

(19) PAINTING 5 (GREATER DRIP-THRESHOLD the amount of PAINT on the INSTRUMENT)
     COMES-FROM: ((19a) (a rule which states that surface volume varies directly with pressure)
                  (19b) (apply pressure to instrument))

The COMES-FROM here states that one applies pressure to the instrument (19b), and that by changing that pressure the volume will go up or down accordingly (19a). But to use this information to match (18) we must apply yet another rule, which states:

(20) If X varies (in)directly with Y, then X being greater or less than a threshold (vice versa if indirectly) can be caused by Y being greater or less than a second threshold.

In terms of our example this means that the program must first look into the simple event frame which expresses rule (19a) to note that the volume varies directly with the pressure, and then apply rule (20) to infer that we must keep the pressure lower than some threshold. This last statement is the command which will match (18). Given our previous search mechanism none of this would have taken place. For one thing the crucial information is found not in PAINTING but in the frame for rule (19a), and secondly, we need to apply yet a second rule, namely (20), before this information yields a match. We could, of course, simply loosen our restrictions on search so that (a) story statements will be matched not only against the context frames but also against any frames pointed to by the context frames, and (b) extra rules may be used in matching. However, if we were somewhat worried about the effects of the previous search technique in terms of search time, this new "restriction", or rather lack of restrictions, on search will be problematic to say the least.

The way in which Ms. Malaprop will actually handle this problem is quite different. It depends on the fact that in example (17) we not only know that there is "too much" of something, which is the portion of the input we have been emphasizing; we also know that Jack is pressing the paint brush against the wall. This latter statement will, in fact, match (19b). Once this happens, Ms. Malaprop will go into a different search mode, called the "restricted search" mode. In effect she assumes that any embellishments of the matched story statement should be matched against the same portion of the frame. Hence, in the attempt to match (18), she will concentrate all of her "energy" in the area of PAINTING 5, and by so concentrating will allow herself to go considerably deeper into subframes than she would normally. I have yet to work out the exact restrictions which will apply here, but for our purposes it is sufficient to note that in the normal circumstance (i.e. when we do not have any previous match within the sentence to tell the program which part of which frame is being discussed), only the "normal", relatively superficial, frame search will be allowed.

Let us consider a second example, differing in detail but posing the same overall problem. This was originally given as (2), but is repeated below.

(2) Jack was going to paint his chair green. He got some blue and yellow paint. Question: Why?

The effect of the first line of (2) will be to set up an instance of the painting frame and to note that the paint in question, although not yet mentioned in the story, will be green. The second line, according to our previous discussion of "have", will try to bind the blue and yellow paint to variables in a previously established frame. In the case of "paint", however, this will be narrowed down considerably, because one of the facts we know about paint is that its typical role in life is as the value of the variable PAINT in the PAINTING frame. Of course, in (2) this simple match will fail, for the colors are wrong, but note that we are again in a restricted search situation. That is, we already know which portion of PAINTING should match the second line of (2). Hence Ms. Malaprop will expend more effort in making the match. In particular, what should then occur is that the information about STUFF-COLOR (as opposed to simply COLOR, which is "surface color") will be brought into play, and the search will be led to the frame for color mixing. Once there it is all downhill.
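Since the exact restrictions are, as noted, still to be worked out, the following fragment can only be a rough guess at how the two search modes might differ; the statement names, parent links, and depth limits are all invented.

```python
# A self-contained sketch of the two search modes: normal search stays
# shallow; restricted search digs into subframes, but only in the
# neighbourhood of an earlier match.

STATEMENTS = {
    "PAINTING5": {"depth": 0, "parent": None,
                  "matches": ["GREATER DRIP-THRESHOLD VOLUME"]},
    # rule (19a) reached through PAINTING5's COMES-FROM pointer, combined
    # with rule (20), yields a derived pressure command:
    "19a+20":    {"depth": 1, "parent": "PAINTING5",
                  "matches": ["GREATER THRESHOLD PRESSURE1"]},
}

def search(goal, anchor=None):
    limit = 2 if anchor else 1          # restricted mode digs deeper
    for name, stmt in STATEMENTS.items():
        near = anchor is None or name == anchor or stmt["parent"] == anchor
        if stmt["depth"] < limit and near and goal in stmt["matches"]:
            return name
    return None

search("GREATER THRESHOLD PRESSURE1")                      # -> None
search("GREATER THRESHOLD PRESSURE1", anchor="PAINTING5")  # -> "19a+20"
```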
5.
The Problem of Matching
We have been talking about how Ms. Malaprop integrates a story statement by matching it against some frame statement. The last section was concerned with how potential matching frame statements are located; in this one we will consider how the program decides that there is indeed a match.

One factor is time. For example, given sufficient disparities in time, some context complex event frames will not even be considered, simply because the events they describe were over so long before the new event that there cannot be any relation. In other cases the time disparity is not so great, and time will only serve to eliminate certain possibilities within a particular frame. For example:

(21) Jack finished painting the chair. He was going to clean the paint brush. He got some newspaper. Question: Why? Answer: Presumably in order to wipe the brush on it.

Within PAINTING there are two uses of newspaper: once at the beginning, to put around the object to be painted, and again near the end, for wiping the brush (not shown in the version of painting given earlier). In (21), the fact that Jack has "finished" painting will be interpreted (according to the rules given in section 3) as indicating that Jack has finished the "inner" painting but not the outer. Hence we must still examine this context frame, but many of the early statements can be disregarded on the basis of time considerations.

But the major influence on matching is the binding of variables. This being a quite complex problem, let us start with the simplest situations and work our way up.

(22) Jack was going to paint the chair with some green paint. He got a paint brush. Then he dipped the paint brush in the paint.

The first line of this example tells the program to set up an instance of painting, and further to make the following bindings:

(23) AGENT = JACK1
     OBJECT = CHAIR1
     PAINT = GREEN-PAINT1

Then, using our analysis of "have", line two tells us to add the binding:

(24) INSTRUMENT = PAINT-BRUSH1

In each case the left-hand side is a variable in painting while the right is the story object bound to that variable. Now, when we see the third line, we will be trying to match:

(25) (LIQUID-IN INSTRUMENT PAINT) from the painting frame
(26) (LIQUID-IN PAINT-BRUSH1 GREEN-PAINT1) from the story

Here LIQUID-IN is a predicate which expresses the fact that an object is submerged in a liquid. Given that both INSTRUMENT and PAINT are already bound to the objects they are supposed to match, there is no difficulty and the match will be made. But we will not always have situations where all of the variables are bound in advance. For example:

(27) Jack was painting a chair. He dipped a brush into the paint.

In trying to integrate the second line of (27) we have exactly the same situation as in matching (25) with (26), only this time neither of the relevant variables is bound. In such cases the bindings must be the result of the match. To see how this occurs, let us concentrate on INSTRUMENT. Once Ms. Malaprop notes that INSTRUMENT is unbound she will examine the entry for the variable in PAINTING, and will find, roughly speaking, the following:
(28) (INSTRUMENT
      (28a) (SOLID INSTRUMENT)
      NORMAL
      (28b) (PAINT-BRUSH INSTRUMENT)
      (28c) (ROLLER INSTRUMENT)
      (28d) (ABSORBANT INSTRUMENT) )

The statement before the NORMAL states that the instrument must be a solid. Those after it give various statements which are normally true of the variable. In the case of Jack's paint brush, it will certainly satisfy (28b), and if it is a typical brush it will satisfy (28d) as well. In fact, simply satisfying (28b) would have been sufficient to ensure a match between INSTRUMENT and PAINT-BRUSH1. But now let us consider a still more difficult example:

(29) Jack was going to paint the chair. He dipped a sponge into the paint.

We have here an anomalous situation, and if asked, Ms. Malaprop would be quite right to say that she did not know what Jack was doing. But suppose we add:

(30) Then he wiped the sponge across a leg of the chair. Question: Why? Answer: I guess he is painting the chair with his sponge.

The point of this example is that when first confronted with (29) the program tries to match (25) against:

(31) (LIQUID-IN SPONGE1 PAINT1)

In the attempt to match SPONGE1 to INSTRUMENT it will not only match the strict requirements but will also match one of the normal conditions, that of absorbency. However, this is not sufficient for Ms. Malaprop to make a match, because a distinction is made between "normal object conditions" (which state what sort of object the thing is), which are sufficient to match the variable, and "normal property conditions" (like "this thing is normally absorbent"), which are not. However, Ms. Malaprop will remember in the case of (29) that there was at least some positive evidence for the match, and when (30) comes in and the same variable match is tried again, this will be deemed sufficient and the match will be made. This is somewhat complex, but it is not complex enough. In fact, the problem of matching as developed here is simply a special case of the recognition or diagnosis problem, and as such, given the notorious difficulties of these issues, it is sure to remain beyond our grasp for quite some time.
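The distinction between strict requirements, normal object conditions, and normal property conditions can be sketched as follows; the set-based encoding is an invented simplification of (28):

```python
# A simplified sketch of variable matching against (28): the strict
# requirement must hold; a "normal object condition" suffices for a
# match; a "normal property condition" only counts as remembered
# evidence for a later attempt.

INSTRUMENT = {
    "strict":        {"SOLID"},
    "normal_object": {"PAINT-BRUSH", "ROLLER"},
    "normal_prop":   {"ABSORBANT"},
}

def try_bind(properties, prior_evidence=False):
    if not INSTRUMENT["strict"] <= properties:
        return "fail"
    if INSTRUMENT["normal_object"] & properties:
        return "match"                     # e.g. PAINT-BRUSH1 in (22)
    if INSTRUMENT["normal_prop"] & properties:
        # the sponge of (29): evidence only, unless already noted once
        return "match" if prior_evidence else "remember evidence"
    return "fail"

try_bind({"SOLID", "PAINT-BRUSH", "ABSORBANT"})        # -> "match"
try_bind({"SOLID", "ABSORBANT"})                       # -> "remember evidence"
try_bind({"SOLID", "ABSORBANT"}, prior_evidence=True)  # -> "match" after (30)
```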
6.
Guessing, and Guessing Wrong
I stated at the outset that I would not try to justify here the decision to have Ms. Malaprop "integrate" new information at read time rather than waiting for user questions. Nevertheless, there are certain problems entailed by such a decision, and some discussion of them would be appropriate. To say that Ms. Malaprop "integrates" incoming story statements at read time is simply another way to say that she makes inferences at read time. So, in matching statements like "Jack got some newspaper" against our frames we are assuming, that is, inferring, that the newspaper will be used in the course of painting, be it to clean the brush afterwards or to put under the object beforehand. We then confront two problems. First, since in theory the number of such inferences is infinite, which ones should the program make? Secondly, such inferences, while quite reasonable, may upon occasion be wrong — Jack may have fetched the paper because he was going to pack some glasses in a box immediately after he finished painting. So if we are to make such inferences we must be able to unmake them as well. But how?

About the first of these I shall have nothing to say here. Indeed, about the second I have little to say, except for one point: if one hopes to correct mistaken beliefs, one should have some record of how a belief came about, and what influence, if any, it has had on subsequent computation. It is this question I wish to address, although we will approach it from a somewhat oblique angle. Consider the following example:

(32) Jack was painting a chair. He dipped a brush into the paint. Then he drew the brush across the chair. Question: Could this step be left out? Answer: No. Question: Why not? Answer: Because it is this step which gets the paint on the chair, and that's what painting is about.

The interesting point about this second question is that it is a very different sort of "why" question than the others we have considered. Formerly we were asking, "what were the person's goals when he did this activity?" This time the request is rather, "how did you infer that this step could not be left out?" So derivational information is needed not only to correct mistaken inferences but to answer questions as well. If we go back and look at how the program actually did infer that the step could not be left out, we see that it used a rule which, in simplified form, looks like:

(33) ⊢ (AND OBLIGATORY1 (LEADS-TO X G)
          OBLIGATORY2 (GOAL G A) )
     IFF OBLIGATORY3 (OBLIGATORY X A)

This states that if action X leads to the goal of an action A, then X is obligatory with respect to A. The use of this rule in the present case will produce:

(34) (OBLIGATORY PAINTING6 PAINTING) COMES-FROM: (OBLIGATORY3)

(Actually, the relevant derivation would relate the story statement instantiating PAINTING 6 with that instantiating PAINTING, but as usual this introduces complications I would rather avoid.) In (34) COMES-FROM has been extended from its normal role of indicating how a state of affairs came about, and here indicates how this particular fact was inferred. (The next example, however, will serve to indicate how close these two notions are within the system.) So, to answer the second question of (32) we simply follow the COMES-FROM pointer back to (33) and give the conditions which were used to make the inference as our answer. We will extend LEADS-TO in the analogous fashion. Now let us turn to a case of mistaken assumption.

(35) Jack finished painting. He did not clean the brush. He left the brush in the paint. Question: Will the paint dry on the brush?

After the second line Ms. Malaprop will infer the bad results of the failure to wash the brush. (I think I can justify this action, but not in the available space.) The problem with this inference in the case of (35) is that the last line tells us we jumped to a false conclusion. The crucial knowledge here concerns evaporation, and in simplified form it would look like:

(36) ⊢ (AND EVAP1 (liquid on some surface)
          EVAP2 (liquid exposed to air) )
     CAUSES EVAP3 (liquid dries)

Using this fact, by the time we reach line three of (35) we will have:

(37) (paint is sticking to the brush) COMES-FROM: (the story itself) LEADS-TO: (EVAP1)
(38) (paint exposed to air) COMES-FROM: (air exposure rule, see below) LEADS-TO: (EVAP2)
(39) (paint will dry) COMES-FROM: (EVAP3)

In (38) the "air exposure rule" is a rule which states that, if there is no reason to believe the contrary, everything is assumed to be exposed to air. This rule, of course, is fallible, and will be marked as such. Given such a structure we can see, at least in principle, the program's reaction to the last line of (35). Concluding that the paint on the brush will not be exposed to air, it follows the results of this assumption to the rule of evaporation and negates it, hence negating the fact that the paint brush will be unabsorbant the next time Jack wishes to use it (a fact not included in (37)-(39), but one which would be in a complete version).
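The dependency records (37)-(39) and the retraction they support might be sketched as follows; the structure is invented for illustration and ignores, among other things, the marking of fallible rules:

```python
# A sketch of dependency records and retraction: each inferred fact
# records where it COMES-FROM and what it LEADS-TO, so negating a
# fallible assumption lets the program follow pointers forward and
# withdraw the consequences.

facts = {
    "paint-on-brush": {"from": "the story",         "to": ["paint-dries"]},
    "exposed-to-air": {"from": "air-exposure-rule", "to": ["paint-dries"],
                       "fallible": True},
    "paint-dries":    {"from": "EVAP3",             "to": []},
}

def retract(fact):
    """Withdraw a fact and, recursively, everything inferred from it."""
    for consequence in facts[fact]["to"]:
        if consequence in facts:
            retract(consequence)
    del facts[fact]

retract("exposed-to-air")   # "He left the brush in the paint."
# "paint-dries" is withdrawn too; "paint-on-brush", which came from the
# story itself, survives the retraction.
```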
7.
Conclusion
Almost none of the program described here has been implemented. There does exist a "frame checker" which checks the frames for correct syntax and translates them into the internal structure used by Ms. Malaprop. That is all. I would estimate that the version which will handle all of the examples herein is six months off, but previous experience tells me that such estimates are likely to be too ambitious by factors of two or three. I have no illusions that when completed Ms. Malaprop will handle these examples "correctly". Ms. Malaprop can only be an approximation to "the truth", because she will have embedded in her approximate, or rather trivialized, solutions to the many standard AI problems which have come up in the course of this paper: search, matching, diagnosis, visual recognition, problem solving, etc. AI workers have always believed in the essential unity of cognitive processes. At one time this was expressed in the belief that some simple idea (like heuristic search) underlay all of our mental abilities. Ms. Malaprop, on the other hand, reflects what I take to be the now emerging view: what we once took to be peculiar complexities of particular AI domains occur in the rest of AI as well. It is here that the unity lies.
Acknowledgements

This research was supported by the Dalle Molle Foundation and the Department of Computer Science, University of Geneva. Many of the ideas presented here were "batted around" in the Institute's Thursday afternoon seminars, and may properly "belong" to other people. But I doubt that any of us could remember. I would like to thank all the members of the seminar, and in particular Walter Bischof, who displays exemplary courage while being shot at.
References

Charniak, E.
1972 "Toward a model of children's story comprehension". Memo 266, M.I.T. Artificial Intelligence Laboratory.
1976a "Inference and knowledge", part I. In E. Charniak and Y. Wilks (Eds.), Computational semantics. (Amsterdam, North-Holland), 1-21.
1976b "A framed PAINTING: the representation of a common sense knowledge fragment". Working paper 29, Institute for Semantic and Cognitive Studies.

Fahlman, Scott E.
1975 "A system for representing and using real-world knowledge". Memo 331, M.I.T. Artificial Intelligence Laboratory.

Minsky, M.
1975 "A framework for representing knowledge". In P.H. Winston (Ed.), The psychology of computer vision. (New York, McGraw-Hill), 211-277.

Raphael, B.
1968 "SIR: a computer program for semantic information retrieval". In M. Minsky (Ed.), Semantic information processing. (Cambridge, Mass., M.I.T. Press), 33-145.

Rieger, C.
1974 Conceptual memory. Unpublished Ph.D. thesis, Stanford University.
1976 "An organization of knowledge for problem solving and language comprehension". In: Artificial Intelligence, 7, 89-127.

Schank, R. and Abelson, R.
1975 "Scripts, plans and knowledge". In: Advanced papers of the Fourth International Conference on Artificial Intelligence, Tbilisi, 151-158.

Winograd, T.
1972 Understanding natural language. (New York, Academic Press).
W.G. LEHNERT
The Role of Scripts in Understanding1
1.
Natural Language Processing
The complexities underlying the human ability to use language are not immediately apparent. People generally regard simple reading skills as commonplace and therefore unimpressive. It is much easier to be impressed by the proof of a theorem or a skillfully played chess game, since fewer people have a facility for mathematics or chess. But when so many people can read, the ability to read is not considered a hallmark of remarkable intelligence.

The field of natural language processing has one ostensible goal: to create computer programs which simulate human communication through natural language. We are ultimately striving for programs that can understand text or conduct an interactive conversation with people in a natural manner. Researchers in natural language processing have considerable respect for the human cognition involved in language skills. At the present time there are no computer programs that can simulate the language abilities of a three-year-old. Furthermore, there is an excellent chance that we will see a computerized chess master before we see a computerized conversationalist with the competence of a three-year-old.

Why is natural language processing so difficult? The answer is simple: the cognition involved in language skills is inextricably bound up with the organization of information in human memory and with human thought processes. The pervasiveness of this connection is by no means obvious or undisputed. A majority of linguists persist in denying any such connection for fear their discipline will fall into the clutches of cognitive psychology. Linguistics, psychology, and artificial intelligence each attack the language problem with a different research paradigm. Linguists have a tendency to describe language as a quasi-mathematical entity that can be studied without regard for its function as a communication device. Psychologists prefer to shed light on the cognitive functions behind language comprehension by designing experiments. Artificial intelligence researchers regard language as a phenomenon in information processing which can be profitably studied in terms of rigorous process models and computer implementations of these models.
1 The research described here was done at the Yale Artificial Intelligence Project and is funded in part by the Advanced Research Projects Agency of the Department of Defense and monitored under Office of Naval Research contract N00014-75-C-1111.
While each paradigm operates according to a different set of research premises and rules, ideas emerging from one paradigm can sometimes stimulate research in another. Some of the ideas which have come out of natural language processing research have aroused the interest of cognitive and social psychologists. In particular, the information processing approach to language has produced some interesting theories concerning human memory organization. But before we present these ideas, some motivational background is in order.

1.1 Contextual Understanding

When people read sentences they do not process each sentence in isolation from what went before. People use previous context in order to make sense of each new sentence.

(1) John goes to the zoo often. He is very fond of one particular seal.
(2) John picked up a package of cigarettes. He noticed the seal was mutilated.
(3) The royal proclamation was finally completed. The king sent for his seal.

Here we have three different senses of the word "seal". People have no trouble determining which sense is appropriate in each case. If any one of these examples occurred in a natural context (say in a story), only one sense of the word would come to mind. This phenomenon is often referred to as a problem in word sense disambiguation. People are able to "effortlessly" arrive at the proper sense of words which are potentially ambiguous. In fact, improper word sense interpretations are never even considered. The seal in (1) is an animal, the seal in (2) is a sticker, and the seal in (3) is a device which produces an official stamp of some sort. The cognitive process which enables us to interpret this word three different ways in three different situations is dependent on previous context in each case.

(4) He is very fond of one particular seal.
(5) He noticed the seal was mutilated.
(6) The king sent for his seal.

Without previous context it is not clear which sense of "seal" is appropriate. John may have a favorite seal he uses with sealing wax. The king could have a pet seal. But these possibilities are not considered when previous context provides suitable interpretive constraints.

1.2 Pronominal Reference

Another problem of which people remain largely unconscious is reference specification for pronouns.

(7) Mary ordered a hamburger from the waitress. She brought it to her quickly.
(8) John tried to make toast in the broiler but he burnt it.
(9) When Bill saw John kissing Mary he punched him.
In (7) we have no trouble understanding that the waitress brought Mary the hamburger. In (8) it is clear that the toast was burnt, not the broiler. But if we substitute "broke" for "burnt" in (8), then the pronominal referent for "it" changes to the broiler. In (9) we assume that Bill punched John. Appropriate reference specifications for pronouns rely on knowledge about the world. For the examples above, we need to know that waitresses bring food to restaurant patrons, that toast is often burnt by accident, and that witnessing a kiss may arouse jealousy, which may in turn be expressed by violence. There are no syntactic rules that are effective for pronominal reference problems. Substituting "broke" for "burnt" in (8) doesn't change the syntactic construction of the sentence, but it does change the referent for "it". People are able to reference pronouns on the basis of what makes the most sense. And this ability to make sense of a sentence is dependent on knowledge about the world.

1.3 Causal Chain Completion

In the last section we saw a sentence which makes sense only because we are able to make a number of inferences. For our purposes we will define an inference to be an assumption that could be wrong.

(9) When Bill saw John kissing Mary he punched him.

We infer that Bill has some sort of relationship with Mary which causes Bill to feel possessive or protective toward Mary, and that Bill feels jealous or angry when he sees John kissing her. These feelings on Bill's part explain his aggression toward John. Without inferences of this sort, we would not feel that there was a valid causal connection between Bill's seeing John kiss Mary and Bill's punching John.

(10) When Bill saw John leaving Mary he yelled at him.

In (10) we cannot account for any causality. There is no chain of inferences which allows us to connect Bill's seeing John leave Mary and Bill's yelling at John. We do not make any inferences about Bill's relationship with Mary or his feelings about John's leaving Mary. While (10) is a reasonable statement, we do not understand it as thoroughly as we understand (9) because we cannot establish a causal connection. We are made to feel that it would somehow make more sense in a context providing us with more information. When people read they are continually looking for causal connections between events. Causal connections are rarely spelled out explicitly because people are so adept at recognizing implicit causalities. It would be very tedious if everything were always spelled out for us:

(11) John was fond of Mary. When John saw Bill kissing Mary he became angry. He wanted to make sure that Bill wouldn't kiss Mary again so he went up to Bill and punched him. He thought this would keep Bill away from Mary.
People are so adept at filling in missing information that they often cannot remember what they were explicitly told and what they inferred (see Bower, 1976; Bransford and Franks, 1971).

(12) John went into the restaurant and ordered. The waitress brought him a hamburger and he ate quickly. He tipped the waitress and left.

After reading this story, it is not difficult to answer the following questions:

(Q1) What did John order?
(Q2) Who served John the hamburger?
(Q3) What did John eat?
(Q4) Did John leave the restaurant?
Yet the answers to Q1, Q3 and Q4 each rely on an inference that was made at the time of story understanding. John ordered a hamburger, John ate the hamburger, and John left the restaurant. None of this was explicitly stated in the story. Establishing causal connections and filling in missing information are two necessary tasks in language comprehension that depend on the generation of inferences.

1.4 Focus Establishment

As a final example, we will consider a problem that arises in question answering. When people answer questions, the appropriateness of an answer often depends on an ability to focus attention on one aspect of the question. For example, suppose we are in the context of the following story:

(13) John went into the restaurant and the hostess handed him a menu. He ordered lasagna. The waitress served him quickly and brought him a check.

Now suppose we ask:

(Q5) Did the waitress give John a menu?

It would be perfectly natural to answer:

(A5a) No, the hostess gave John a menu.

It would be equally correct, but far less appropriate, to respond:

(A5b) No, the waitress gave John a check.

Both (A5a) and (A5b) elaborate a negative answer in order to clarify something for the questioner. The elaboration in (A5a) assumes that the questioner is mainly interested in who gave John a menu. The elaboration in (A5b) assumes the questioner is primarily concerned with what the waitress gave John. Different elaborations result from different shifts in focus. In order to produce appropriate elaborations for questions like these, it is necessary to establish the best interpretive focus for the question. This is equivalent to determining what the questioner is interested in. People occasionally make mistakes in interpreting questions by misplacing the focus of the question. When the focus is not clear and could reasonably be assigned to more than one component of the question, we are conscious of ambiguity. But when focus establishment occurs without such difficulties, we remain unaware of any thought processes concerned with interpretive focus. People who hear Q5 in the context of our restaurant story can produce an answer like (A5a) without any sense of ambiguity in the question. (A5b) does not occur to anyone as an appropriate answer, in spite of the fact that it is a perfectly correct answer.

1.5 Summary

The problems outlined in the last four sections are not intended to be a representative overview of major problems in natural language processing. They are simply four issues in language comprehension that require solutions depending on general knowledge about the world. Finding the right word sense of "seal" in "The royal proclamation was finally completed. The king sent for his seal..." relies on knowledge about royal documents. Identification of pronominal referents in "Mary ordered a hamburger from the waitress. She brought it to her quickly..." depends on knowing what happens in restaurants. Understanding the causality in "When Bill saw John kissing Mary he punched him" is a matter of knowing something about human behavior and emotional reactions. And focus establishment for a question depends on an ability to assess what the questioner is most likely to be interested in. This assessment must rely on knowledge about what things are relatively routine and what kinds of variations can occur within a routine. The key issue behind all these problems is one of epistemology. What kinds of knowledge do people have? How is this knowledge organized in memory? What are the memory processes that access and manipulate this knowledge?
2.
Expectation-Driven Understanding
As people read text, they generate expectations about what is going to happen next and what they are likely to hear about next. The process of understanding is largely a process of generating such expectations and recognizing when an expectation has been substantiated or violated. There is a particular type of joke, often called a garden path joke, that takes advantage of expectation-based understanding. A typical garden path joke leads you along, substantiating your expectations, and then forces you to back up and reinterpret things according to a different set of assumptions. For example:

(14) John and Bill went hunting last week. They shot two bucks. It was all they had.

Until we get to the third sentence, everything seems fine. The context of a hunting trip, which was established in the first sentence, sets up expectations about hunting. In this context, the second sentence makes sense as a statement about the hunt; they killed two deer. We have been led down the garden path.
When we get to the third sentence, we discover it doesn't make sense. In order to have it make sense, we have to go back and reinterpret the second sentence to mean they spent two dollars. This interpretation would never have come to mind initially within the context of a hunting trip.

Sometimes we get explicit references to expectations in language. The word "but" can be used in a number of ways, one of which is as a signal concerning expectations.

(15) John bought a ticket for the concert but he didn't go.

As soon as we hear that John bought a ticket to a concert, we generate a number of expectations about what John did later. In particular, we assume that John intended to go to the concert. The word "but" here operates as a signal that tells us to get ready to have one of our inferences violated. "But" is not an appropriate connective when the statement following it does not contradict some inference.

(16) John bought a ticket for the concert but he hated to read.

Here we sense no connection between buying a concert ticket and hating to read. The use of the word "but" is inappropriate since we had no expectations about John's attitudes toward reading.

Expectations often form the basis for inferences which fill in missing information. For example, if we hear that John went to a restaurant and had a steak, we generate a number of expectations about what John is liable to do after entering the restaurant. We expect John to sit down, decide what he wants to eat, order, be served, and eat. When we hear that he ate a steak, we assume that all our other expectations about what John did in the restaurant still hold. Our expectations about what John would do become inferences about what John did.

A statement can make sense or fail to make sense on many different levels. The ability to recognize words defines one very fundamental level of understanding. But on a higher level, it is not unreasonable to assert that a sentence or piece of text "makes sense" to the extent that it substantiates expectations generated beforehand. Consider:

(17) John went into the department store and asked for a summer sublet.

This sentence makes sense insofar as we understand what John did, but it fails to make sense in terms of what people normally do in department stores.

(18) John went into the realty office and asked for a summer sublet.
(19) John went into the department store and asked for the appliance section.

We have expectations about what constitutes normal behavior in places like realty offices and department stores. When these expectations are confirmed, as in (18) and (19), we feel satisfied that things make sense. When these expectations are violated, as in (17), we feel disturbed.

In developing theories of expectation-based understanding, we must ask where these expectations come from, and what sort of memory organization is accessed by the cognitive processes that generate expectations. It is not enough to classify various types of expectations unless we can also account for the underlying memory organization from which those expectations are derived. Since text comprehension is a human process, we are naturally concerned with human memory organization. The following ideas about memory organization will therefore be presented both as a theory of human memory organization and as a proposed memory schema for computerized language understanding systems. If we can model human memory organization and human thought processes on a computer, we can design a system competent in natural language processing. While this is clearly a sufficient condition for a natural language system, there are people who would argue that such a condition is not necessary. That is, there might be effective strategies for processing natural language that are not at all like the processing strategies people use. We will not argue this issue here. Suffice it to say that by modelling people we at least have a place to start; we already know that humans have effective information processing systems.
3.
Scripts
The notion of a "script" was first introduced by Roger Schank and Robert Abelson at the Fourth International Conference on Artificial Intelligence (Schank and Abelson 1975). Work on a computer implementation of script-based text processing had begun some months earlier at Yale University (Cullingford 1977). At about this same time, a paper by Marvin Minsky (Minsky 1975) emerged from MIT describing how a system of frames could be used to encode necessary knowledge about the world for problems in artificial intelligence. In this paper Minsky discussed a technique for artificial intelligence in general, and his formulation of frames was described mainly in terms of vision problems. It was clear, however, that Minsky was advocating a strategy for expectation-driven information processing in which particular situations are interpreted in terms of generalized expectations. On this level, we can safely say that scripts are one type of frame; they are frames designed for the specific task of natural language processing.

3.1 What is a Script?

In each culture there are a number of stereotypic situations in which human behavior is highly predictable and narrowly defined. Behavior in these situations is often described in terms of cultural conventions. These conventions are learned in childhood, adhered to throughout one's life, and rarely questioned or analyzed. Scripts describe those conventional situations that are defined by a highly stereotypic sequence of events. For example, Americans have a very simple script for preparing a cup of tea:

(1) Pour the tea into the cup.
(2) Add milk. (optional)
Most people have never questioned this script or considered whether there might be a better way. This is what one learns, and this is how everyone does it. In Great Britain, people learn a slightly different script for preparing tea:

(1) Pour milk into the cup. (optional)
(2) Add tea.

In the American script tea is poured into the cup first and milk is added. In the English script, milk goes into the cup first and tea is added. To anyone who has acquired one of these scripts, their way of doing it seems to be the "most natural" procedure. People tend to be unconscious of their conventions until they are confronted with different ones. Given these two different conventions, we can look for reasons why one is superior or inferior to the other. But when we acquire a cultural script, we rarely question it; it is merely the way the world works.

Most scripts are acquired in childhood, either through direct experience or by vicarious observation. Many people have scripts for gunfights, bank robberies, and airplane hijackings, in spite of the fact that they have never been directly involved in any such episodes. Movies, books, and television have contributed significantly to vicarious script acquisition. These scripts are general in the sense that a large population shares stereotypic knowledge of such situations. Some of the scripts acquired through direct experience are idiosyncratic insofar as they may be specific to one individual. For example, many people who own dogs have a dog walking script. While each of these scripts shares certain events (such as calling the dog and perhaps putting a leash on the dog), other aspects of the script are highly individualized. If I follow the same path time after time, this path is an individualized aspect of my dog walking script that is not a part of anyone else's script for walking their dog. While idiosyncratic scripts may be of interest to behavioral psychologists, the scripts that are important for natural language processing are those shared by a large population as a cultural norm.

When a script is shared by many people, that script can be referenced very efficiently. If a friend mentions that he went out to a restaurant, you will not interpret this to mean that he simply moved himself into the proximity of a restaurant. His statement is normally interpreted to mean that the entire restaurant script was executed. That is, he went into a restaurant, decided what he wanted to eat, made his choice known to an appropriate employee, the order was conveyed to a cook who prepared the meal, the meal was served and eaten, a check was received and paid, and he left the restaurant. This entire inference chain is conveyed by saying "I went to a restaurant". Because this information is being communicated, it makes sense to respond with a question like "What did you have to eat?" Of course, anyone who does not have a restaurant script will not be able to make these inferences, and will therefore not understand the full import of what is being said. Technical talk or conversation about esoteric knowledge domains is often incomprehensible to the uninitiated because unfamiliar scripts are referenced. For example, a statement like "I boot-strapped the computer system last night" means a little or a lot depending on how much knowledge you have about computers and their maintenance.

Many scripts can be described from different points of view. For example, the restaurant script has a patron's point of view, a waiter's perspective, and a cook's viewpoint. While the tip may be the most important event in the waiter's version of the restaurant script, a restaurant patron is likely to be much more concerned with the meal he receives. When scripts are used to understand, one point of view is normally assumed. If a script perspective changes, and we are forced to view events from a new vantage point, we tend to be conscious of a shift in viewpoint.

Scripts are used by people both behaviorally and cognitively. The behavioral aspect of script application occurs when people are actually in a scriptal situation and behave in a manner appropriate to that script. The cognitive aspect of script application occurs when people are processing language and must generate inferences about what is being said on the basis of their scriptal knowledge. Scripts are significant in natural language processing because they provide us with strategies for inference generation.
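Schematically, a script of this kind is little more than a stereotyped event sequence against which reported events are marked off, every unreported event becoming a default inference. The following sketch illustrates the idea, not any particular implementation; the event names paraphrase the restaurant script above.

```python
# A minimal sketch of a script as a stereotyped event sequence: one
# reported event licenses default inferences that the surrounding
# events happened as well.

RESTAURANT = ["enter", "decide", "order", "cook prepares", "serve",
              "eat", "receive check", "pay", "leave"]

def instantiate(script, reported):
    """Mark reported events; every other step becomes an inference."""
    return [(event, "stated" if event in reported else "inferred")
            for event in script]

story = instantiate(RESTAURANT, {"enter"})
# "I went to a restaurant" leaves "order", "eat", "pay", ... inferred,
# which is why "What did you have to eat?" is a sensible follow-up.
```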
3.2 The SAM System
SAM (Script Applier Mechanism) is a computer program that illustrates how scripts can be used in understanding. SAM has roughly twenty scripts in its knowledge base, which enable it to understand stories about those knowledge domains. SAM reads stories that have been typed in on a computer terminal. As SAM processes each sentence of the input story, it generates an internal memory representation of the story. This internal story representation is then accessed by post-understanding processes that produce paraphrases of the original story and answer questions about the story. For example, here is a newspaper story that SAM has understood:

Friday evening a car swerved off Route 69. The vehicle struck a tree. The passenger, a New Jersey man, was killed. David Hall, 27, was pronounced dead at the scene by Dr. Dana Blauchard, medical examiner. Frank Miller, 32, of 593 Foxon Rd., the driver, was taken to Milford Hospital by Flanagan Ambulance. He was treated and released. The passenger was extricated from the vehicle by the Branford Fire Department. No charges were made. Patrolman Robert Onofrio investigated the accident.

Once SAM has generated a memory representation for this story, we can ask SAM questions:

Was anyone killed?
YES, DAVID HALL DIED.
Was anyone hurt?
YES, FRANK MILLER WAS SLIGHTLY INJURED.

Why was Frank Miller hurt?
BECAUSE THE AUTOMOBILE HIT THE TREE.

Did Miller go to the hospital?
YES, HE WAS IN THE MILFORD HOSPITAL.

How did Miller get to the hospital?
THE FLANAGAN AMBULANCE COMPANY TOOK HIM TO THE MILFORD HOSPITAL.

SAM can also produce a paraphrase of the story:

AN AUTOMOBILE HIT A TREE NEAR HIGHWAY 69 FOUR DAYS AGO. DAVID HALL, AGE 27, RESIDENCE IN NEW JERSEY, THE PASSENGER, DIED. FRANK MILLER, AGE 34, RESIDENCE AT 593 FOXON ROAD IN NEW HAVEN, CONNECTICUT, THE DRIVER, WAS SLIGHTLY INJURED. THE POLICE DEPARTMENT DID NOT FILE CHARGES.

Scripts are encoded in SAM as causal chains (Schank 1975b) of alternating states and events. Individual states and events are encoded in a language-free meaning representation called Conceptual Dependency (Schank 1975a). Input sentences are initially analyzed and translated into Conceptual Dependency representations. Then each conceptualization is given to the script applier, which attempts to match that concept against one of the concepts in a currently active script. For example, suppose the first sentence SAM receives is:

(20) John went to a restaurant.

This is represented by a primitive act PTRANS that is used to describe changes of physical location:

John ⇔ PTRANS → Restaurant
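Schematically, the script applier's matching step might look like the following sketch; the tuple encoding of conceptualizations is an invented simplification, not SAM's actual Conceptual Dependency structures.

```python
# A rough sketch of the script-applier step just described: an analyzed
# conceptualization is matched against the events of currently active
# scripts, instantiating a script at the matched position.

ACTIVE_SCRIPTS = {
    "RESTAURANT": [("PTRANS", "restaurant"),   # patron enters
                   ("MTRANS", "order"),        # patron orders
                   ("INGEST", "food")],        # patron eats
}

def apply_scripts(concept):
    """Return the script and event position that the concept instantiates."""
    act, _filler = concept
    for name, events in ACTIVE_SCRIPTS.items():
        for i, (script_act, _slot) in enumerate(events):
            if script_act == act:
                return name, i
    return None

apply_scripts(("PTRANS", "restaurant"))   # -> ("RESTAURANT", 0)
```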