Human-Computer Interaction - The Design of User-Friendly, Transparent Interfaces; Research and Evaluation [2]


Cognitive Artifacts

Donald A. Norman

Norman, Donald A. (1991), "Cognitive artifacts", in John M. Carroll (Ed.), Designing Interaction, Cambridge University Press, Cambridge. 20 pages.

A cognitive artifact is an artificial device designed to maintain, display, or operate upon information in order to serve a representational function.

The distinctive characteristics of human beings as a species are:
1. Their special ability to modify the environment in which they live through the creation of artifacts, and
2. the corresponding ability to transmit the accumulated modifications to subsequent generations through precept and procedure coded in human language. (Cole, 1990, p. 1)

Artifacts pervade our lives, our every activity. The speed, power, and intelligence of human beings are dramatically enhanced by the invention of artificial devices, so much so that tool making and usage constitute one of the defining characteristics of our species. Many artifacts make us stronger or faster, or protect us from the elements or predators, or feed and clothe us. And many artifacts make us smarter, increasing cognitive capabilities and making possible the modern intellectual world. My interest is in cognitive artifacts, those artificial devices that maintain, display, or operate upon information in order to serve a representational function and that affect human cognitive performance. In this chapter I discuss three aspects of cognitive artifacts:
1. Two differing views of artifacts: the system view and the personal view;
2. Levels of directness and engagement: the relationship between those aspects of artifacts that serve the execution of acts and those that serve the evaluation of environmental states, and the resulting feeling of directness of control or engagement;
3. Representational properties of cognitive artifacts: the relationship between the system state and its representation in the artifact.

Some History

Despite the enormous impact of artifacts upon human cognition, most of our scientific understanding is of the unaided mind: of memory, attention, perception, action, and thought, unaided by external devices. There is little understanding of the information-processing roles played by artifacts and how they interact with the information-processing activities of their users. The power and importance of culture and artifacts to enhance human abilities are ignored within much of contemporary cognitive science, despite the heavy prominence given to their importance in the early days of psychological and anthropological investigation. The field has a sound historical basis, starting at least with Wundt (1916), nurtured and developed by the Soviet social-historical school of the 1920s (Leont'ev, 1981; Luria, 1979; Vygotsky, 1978; Wertsch, 1985), and still under study by a hardy band of social scientists, often unified by titles such as "activity theory," "action theory," or "situated action," with much of the research centered in Scandinavia, Germany, and the Soviet Union. In the early part of the 20th century, American psychology moved from its early interest in mental functioning to the behavioral era, in which studies of representational issues, consciousness, mind, and culture were considered, at best, irrelevant to science. These dark ages ended in the mid-1950s,
but by then, the historical continuity with the earlier approaches and with European psychology had been lost. As a result, American cognitive psychology had to recreate itself, borrowing heavily from British influences. The emphasis was on the study of the psychological mechanisms responsible for memory, attention, perception, language, and thought in the single, unaided individual, studied almost entirely within the university laboratory. There was little or no emphasis on group activities, on the overall situation in which people accomplished their normal daily activities, or on naturalistic observations. Given these biases and history, it is no surprise that little thought was given to the role of the environment (whether natural or artificial) in the study of human cognition.

The field has now returned to pay serious attention to the role of the situation, other people, natural and artificial environments, and culture. In part, this change has come about through the dedicated effort of the current researchers, in part because the current interest in the design of computer interfaces has forced consideration of the role of real tasks and environments, and therefore of groups of cooperating individuals, of artifacts, and of culture. The birth, death, and now apparent rebirth of the interest in culture and artifacts in thought is reflected in a survey paper by Cole, "Cultural psychology: a once and future discipline?" (Cole, 1990). For Cole, cultural psychology builds on the two major assumptions that stand as the opening quotation to this chapter: (1) the human's ability to create artifacts; (2) the corresponding ability to transmit accumulated knowledge to subsequent generations.

In this chapter I emphasize the information-processing role played by physical artifacts upon the cognition of the individual, hence the term cognitive artifact. Here, I will not be concerned with how they are invented, acquired, or transmitted across individuals or generations. The goal is to integrate artifacts into the existing theory of human cognition. The field of human-computer interaction has pioneered in the formal study of the cognitive relationship between a person's activities, the artifact of the computer, and the task, and this chapter is a result of work in that tradition. However, most of the work has been narrowly focused on the details of the "interface" between the person and the machine. But it has become increasingly clear that the nature of the interaction between the people and the task affects the artifact and its use, with the view and use of the artifact varying with both the nature of the task and the level of expertise and skill of the people (e.g., see Bannon & Bødker, this volume, both for a clear description of this philosophy and also for a general review). I agree that we need a broader outlook upon tools and their use, but we also need better scientific understanding of the role played by the artifact itself, and so the main focus is upon the properties of the artifact and how its design affects the person and task.

It is clear that we are entering a new era of technology, one dominated by access to computation, communication, and knowledge, access that moreover can be readily available, inexpensive, powerful, and portable. Much of what will transpire can be called the development of cognitive artifacts, artificial devices that enhance human cognitive capabilities. As we shall see, however, artifacts do not actually change an individual's capabilities.
Rather, they change the nature of the task performed by the person. When the informational and processing structure of the artifact is combined with the task and the informational and processing structure of the human, the result is to expand and enhance cognitive capabilities of the total system of human, task, and artifact.

Two Views of Artifacts: The System View and the Personal View

The most obvious analysis of an artifact is that it enhances human ability. According to this analysis an artifact such as a pulley system makes us stronger, a car makes us faster, and paper and pencil make us smarter. By this analysis, artifacts such as written notes, books, and recordings amplify the cognitive power of human memory and artifacts such as mathematics and logic amplify the power of thought. The notions that artifacts enhance or amplify may be natural, but as Cole and Griffin point out in their essay "Cultural amplifiers reconsidered" (1980), they are badly misleading.

Artifacts may enhance performance, but as a rule they do not do so by enhancing or amplifying individual abilities. There are artifacts that really do amplify. A megaphone amplifies voice intensity to allow a person's voice to be heard for a greater distance than otherwise possible. This is amplification: The voice is unchanged in form and content but increased in quantity (intensity). But when written language and mathematics enable different performance than possible without their use, they do not do so by amplification: They change the nature of the task being done by the person and, in this way, enhance the overall performance. Artifacts appear to play different roles depending upon the point from which they are viewed. When a person uses an artifact to accomplish some task, the outside observer sees the system view, the total structure of person plus artifact (Figure 2.1) in accomplishing that task. The person, however, sees the personal view: how the artifact has affected the task to be performed (Figure 2.2).

Figure 2.1. The system view of a cognitive artifact. Under this view, we see the entire system composed of the person, the task, and the artifact. Seen from this perspective, the artifact enhances cognition, for with the aid of the artifact, a system can accomplish more than without the artifact.

Figure 2.2. The personal view of a cognitive artifact. Under this view, that of the individual person who must use the artifact, the view of the task has changed: thus, the artifact does not enhance cognition; it changes the task. New things have to be learned, and old procedures and information may no longer be required: The person's cognitive abilities are unchanged.

The System View of an Artifact

The two views of artifacts, and an illustration of how cognition is distributed across people and technology, can perhaps most easily be illustrated by example. Consider the everyday memory aid, the reminder or "to-do" list, or in industrial contexts, the checklist for a task (e.g., the checklists used by pilots before each critical phase of flight in a commercial aircraft). From the system point of view, checklists enhance memory and performance; from the personal point of view, they change the task. At first, the checklist or to-do list may appear to be a memory aid. It can be seen to help us remember what to do during the course of our activities. In fact, there can be no question that checklists change our behavior and prevent some kinds of forgetting: They are so effective in industrial and aviation settings that their use is often required by regulation. It is tempting to say that a list extends or enhances our memory. After all, with it, we can perform as if we had a perfect memory for the items on the list. Without it, we occasionally forget to take important actions. When we think of the to-do list in terms of what the person-plus-list system can do, we are looking at one view of the artifact. This is the view of the artifact from afar, looking at it in the context of the person and the task to be performed: This is the system view. The system view of the list is as a memory enhancer.

The Personal View of an Artifact

The checklist or to-do list has another view, the view it presents to the task performer: this is the personal view. From the point of view of the user of the artifact, using the list is itself a task. Without the list, we must remember or plan all of our actions. With the list, we need to do very little remembering and planning: The planning and "remembering" were done ahead of time, at the time we made up the list. At the time we perform the individual actions we need not repeat the planning and remembering. The use of a list instead of unaided memory introduces three new tasks, the first performed ahead of time, the other two at the time the action is to be done:
1. The construction of the list;
2. Remembering to consult the list;
3. Reading and interpreting the items on the list.

The fact that the preparation of the list is done prior to the action has an important impact upon performance because it allows the cognitive effort to be distributed across time and people. This preparatory task, which Hutchins calls "precomputation" (E. Hutchins, 1989, personal communication), can be done whenever convenient, when there are no time pressures or other stresses, and even by a different person than the individual who performs the actions. In fact, precomputation can take place years before the actual event, and one precomputation can serve many applications. In the aviation setting, flight checklists are prepared by the chief pilot of each airline, approved by the Federal Aviation Authority, and then passed on to the pilots who use them for many years and many thousands of flights without further modification: This is both precomputation and a distribution of the cognitive task of planning across people and time. To the aviation system, the checklist enhances memory and accuracy of action; to the individual pilots, the checklist is a new task inserted into the daily routine, and at times it is apt to be viewed as extraneous to the main goals of the day. As such it is a nuisance, and it can lead to new classes of errors: Some of these errors may resemble those that would occur without the use of the checklist, and some may not.

When we compare the activities performed with and without the aid of a reminder list, we see that the conclusion one draws depends on the point of view being taken. To the outside observer (who takes the system view), the same actions are intended to be performed with and without the list, but (usually) they are carried out more accurately and reliably with the list. To the individual user (who takes the personal view), the list is not a memory or planning enhancer; it is a set of new tasks to be performed, with the aspects of the list relevant to memory and planning separated from the aspects of the list relevant to performance.

Every artifact has both a system and a personal view, and they are often very different in appearance. From the system view, the artifact appears to expand some functional capacity of the task performer. From the personal view, the artifact has replaced the original task with a different task, one that may have radically different cognitive requirements and use radically different cognitive capacities than the original task. This analysis points out that from all points of view, artifacts change the way a task gets done. In particular, artifacts can: Distribute the actions across time (precomputation); Distribute the actions across people (distributed cognition); Change the actions required of the individuals doing the activity.
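The division of labor described above can be made concrete with a small Python sketch. It is my own illustration, not code from the chapter; the names Checklist, build_checklist, and perform_with_checklist are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Checklist:
    """A precomputed cognitive artifact: the planning is done once, ahead of time,
    possibly by someone other than the person who later performs the actions."""
    author: str
    items: List[str] = field(default_factory=list)


def build_checklist(author: str, planned_actions: List[str]) -> Checklist:
    # New task 1: constructing the list (precomputation, done under no time pressure).
    return Checklist(author=author, items=list(planned_actions))


def perform_with_checklist(checklist: Checklist, do_action: Callable[[str], None]) -> None:
    # New task 2: remembering to consult the list at all (not modeled further here).
    for item in checklist.items:
        # New task 3: reading and interpreting each item, then acting on it.
        do_action(item)


if __name__ == "__main__":
    # The "chief pilot" precomputes once; a different person executes, much later.
    before_takeoff = build_checklist("chief pilot", ["Flaps set", "Trim set", "Controls free"])
    perform_with_checklist(before_takeoff, do_action=print)
```

The sketch only separates the precomputed construction step from the two runtime tasks; everything interesting about error classes and regulation lies outside it.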
Levels of Directness and Engagement

When we use an artifact to do a task, of necessity we make use of a representation. Artifacts act as mediators between us and the world, both in execution (between actions and the resulting changes to the world state) and in perception (between changes in the world and our detection and interpretation of the state). The nature of the interaction between the person and the object of the task varies from direct engagement to a very indirect, remote form of interaction. Thus, when we write or draw with a pencil on paper, there is a direct relationship between movement of the pencil and the resulting marks on the paper. When we ask someone else to write or draw for us, the relationship is much less direct. Some interactions are so indirect and remote that feedback and information about the world state are difficult to get and possibly delayed in time, and incomplete or of unknown accuracy. These differences can have a major impact upon task performance and to a large extent are controlled by the design of the task and the artifact. (See the important discussion by Laurel, 1986, which introduces the concept of "direct engagement.")

Bødker (1989) distinguishes among several possible relationships among the person, the artifact, and the objects being operated upon. Thus, the artifact can be used to mediate directly between the person and the object (as in using a hammer or chisel to operate upon nails or wood). Or the artifact can present a virtual object or world upon which operations are performed, eventually to be reflected onto the real object. In some cases, the virtual world exists only within the computer (as in building a spreadsheet or graphic object that will never exist outside the computer). The object might actually exist outside the computer, but be created or operated upon through the virtual world of the artifact (as in controlling an industrial process through the computer display, or developing the content and format of a publication within the computer word processor and publishing system). In these cases, there are several layers of representation: representation 1, the represented world of the real object; representation 2, the representing world within the artifact; representation 3, the way the artifact displays the virtual world; and representation 4, the mental representation of the human.

Actions are performed through a feedback mechanism involving both an execution and evaluation phase (Figure 2.3). Both phases of the action cycle need support from the representational format used by the artifact. The choice of representation and interactions permitted by the artifact affect the interaction of the person with the object, whether real or virtual (Hutchins, Hollan, & Norman, 1986; Norman, 1986, 1988). Different forms of artifacts have different representational implications, which in turn dramatically affect the interactions.

Figure 2.3. The action cycle. Artifacts that support action must support both the execution and evaluation phases of the action cycle, usually through different representations. The gulfs of execution and evaluation refer to the mismatch between our internal goals and expectations and the availability and representation of information about the state of the world and how it might be changed. The gulf of execution refers to the difficulty of acting upon the environment (and how well the artifact supports those actions). The gulf of evaluation refers to the difficulty of assessing the state of the environment (and how well the artifact supports the detection and interpretation of that state).

Activity Flow

The gulf of execution refers to the difficulty of acting upon the environment (and how well the artifact supports those actions). The gulf of evaluation refers to the difficulty of assessing the state of the environment (and how well the artifact supports the detection and interpretation of that state). There are two ways of bridging the gulfs. One is by appropriate design of the artifact, the other through mental effort and training. Thus, with increasing skill, a person mentally bridges the gulfs, so that the operations upon the artifact are done subconsciously, without awareness, and the operators view themselves as operating directly upon the final object (Bødker, 1989; Hutchins, 1986; Hutchins et al., 1986).

Bødker introduces the notion of "activity flow" to describe the activity cycle in accomplishing a task. Automatization of effort and the resulting feeling of direct engagement can occur where a consistent, cohesive activity flow is supported by the task, artifact, and environment. Interruptions and unexpected results break the activity flow, forcing conscious attention upon the task. For many activities, this "bringing to consciousness" is disruptive of efficient performance. The problem with disrupting activity flow is that the disruption brings to conscious awareness the disrupting activity, even when this is not the main focus of attention. This is usually undesirable, for it can have a negative impact upon the task being performed. In fact, disruptions of this sort can lead to errors when the interrupting activity interferes with the maintenance of working memory for the task. The resulting memory difficulties may mean that the interrupted task is not resumed properly, either by being delayed beyond its proper execution time, by returning to the wrong point in the task, or by being forgotten altogether and never resumed: three classic forms of action errors.

But deliberate disruption of the activity flow might be a useful safety device if it forces conscious attention upon critical, safety-related aspects of the task. Automatic behavior is valuable in many skilled operations, for it permits the attention to be directed to one area of concern even while performing smoothly the operations required for another area: for example, the way in which a skilled typist can enter text automatically while concentrating upon the construction of future sentences. But at times, it might be valuable to force conscious attention to some aspect of performance by deliberately breaking the activity flow. Thus, "forcing functions" (physical constraints that prevent critical or dangerous actions without conscious attention) could be viewed as serving their function by a deliberate disruption of normal activities.

A good example of a deliberate disruption of activity for safety purposes is the use of checklists in industry and, especially, in commercial aviation. In aviation, the checklist is often reviewed by both pilots, one reading aloud the items, the other confirming and saying aloud the setting of each item as it is read. These actions are intended to force a deliberate, conscious disruption of skilled behavior, deliberately breaking the normal activity flow. Safety-related checks and cautions should be disruptive in order to receive conscious attention. Automatic actions are the most susceptible to errors by action slips and to disruption by external events and interruptions.
In fact, the checklist can fail in its function: After thousands of usages and years of experience, checklist use can be so routine that it does become automatic, sometimes with serious consequences (Degani & Wiener, 1990; Norman & Hutchins, 1990; NTSB, 1989). The point is not that one class of interaction or representation is superior to another but that the different forms and modes each have different properties.

Representation and Artifacts

The power of a cognitive artifact comes from its function as a representational device. Indeed, I define a cognitive artifact as an artificial device designed to maintain, display, or operate upon information in order to serve a representational function. It is now time to take a look at some of the representational features of artifacts. This will be brief and incomplete: This work is just beginning, and although the work so far is suggestive, a more complete analysis will have to come later.

Representational Systems

A representational system has three essential ingredients (Newell, 1981; Rumelhart & Norman, 1988):
The represented world: that which is to be represented;
The representing world: a set of symbols;
An interpreter (which includes procedures for operating upon the representation).

Surface Representations

Some artifacts are capable only of a surface level representation. Thus, memory aids such as paper, books, and blackboards are useful because they allow for the display and (relatively) permanent maintenance of representations. The slide rule and abacus are examples of computational devices which only contain surface representations of their information. These devices are primarily systems for making possible the display and maintenance of symbols: They implement the "physical" part of the physical symbol system. These are called surface representations because the symbols are maintained at the visible "surface" of the device: for example, marks on the surface, as pencil or ink marks on paper, chalk on a board, indentations in sand, clay, or wood.

Internal Representations

Artifacts that have internal representations are those in which the symbols are maintained internally within the device (unlike paper and pencil, where the symbols are always visible on the "surface"). This poses an immediate requirement on the artifact: There must be an interface that transforms the internal representation into some surface representation that can be interpreted and used by the person. Artifacts that have only surface representations do not have such a requirement, for the surface representation itself serves as the interface.

The Interface between Artifact and Person

Cognitive artifacts need interfaces for several reasons. In the case of artifacts with internal representations, the internal representation is inaccessible to the user, so the interface is essential for any use of the artifact. Moreover, even for artifacts that have only surface representations, the style and format of the interface determine the usability of the device. Here, the standard issues in the field of interface design apply.

We can conceptualise the artifact and its interface in this way. A person is a system with an active, internal representation. For an artifact to be usable, the surface representation must correspond to something that is interpretable by the person, and the operations required to modify the information within the artifact must be performable by the user. The interface serves to transform the properties of the artifact's representational system to those that match the properties of the person. To the user of an artifact, the representing world is the surface of the artifact: the information structures accessible to the person employing the artifact. One of the basic issues in developing an artifact is the choice of mapping between the representing world and the represented world (or
between the surface representation and the task domain being supported by the artifact). In the mapping between the represented world and the representing world of the artifact, the choice of representation determines how faithfully the match is met.

The Object Symbol

One major concern in interfaces is the relationship between control operation and system state. Usually, these two aspects of the interface are separated and handled by different components. The two different aspects are not always present, and even when they are, they may differ considerably from one another in physical location, conception, and form of representation. This independence of control and display was not always true, and it seems to have arisen more by historical accident than by design. Some controls have the interesting representational property that they serve both as the objects to be operated upon and also as representations of their states (see Figure 2.4). Simple examples occur for any controls operated by physical levers, where the act of moving the lever changes both the system state and also the physical appearance of the device: The position of the lever is both the actual state of the device and also its representation. Norman and Hutchins named the situation where the physical object is both the object operated upon and the symbol of its state the "object symbol" (Norman & Hutchins, 1988). The special case in which the same object serves as both a control of its value and a representation of its value was first described by Draper (1986), who argued for the importance of treating input and output to a computer system as a unified activity.

Figure 2.4. The object symbol. When a person manipulates a real or virtual world through an artifact, when the object in the artifact is both the means of control (for execution of actions) and also the representation of the object state (for evaluation), then we have the case of an object symbol. In condition a, the execution and evaluation are done separately. In condition b, the same representation is used for execution and evaluation; condition c represents the case of the object symbol.

Object symbols used to be the prevailing mode of operation, for they represent the natural and frequently occurring mode of operation with mechanical systems, especially simpler systems. Many mechanical systems have the property that one directly manipulates the parts of interest and that one assesses the state of the device from the position of those same parts. The object symbol situation disappears when controls are physically removed from the site of action. In the modern world of computer controls, the object symbol is rare. With modern electronic systems, the controls and indicators have almost no physical or spatial relationships to the device itself, which introduces an arbitrary or abstract relationship between the controls, the indicators, and the state of the system. But this state of affairs has come about by accident, not by design. The advantages of separating controls from physical equipment led to a natural separation of object and symbol. Once there was a separation, then the control no longer signaled system state. The result has been separation of the control of state from the indicator of state and, in some systems, a complete neglect of the development of appropriate representational forms for either control or display.
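The contrast can be sketched in a few lines of Python. This is my own illustration rather than anything from the chapter: in the first class the lever is simultaneously the control and the display of state (an object symbol), while in the second the control channel and the indicator are separate and can drift apart.

```python
class LeverValve:
    """Object symbol: the lever is both the control and the display.
    Reading the state just means looking at where the lever is."""

    def __init__(self) -> None:
        self.lever_position = "closed"      # the physical object IS the representation

    def move_lever(self, position: str) -> None:
        self.lever_position = position      # acting on the object updates state and display at once

    def apparent_state(self) -> str:
        return self.lever_position          # evaluation reads the very thing execution moved


class RemoteValve:
    """Separated control and display: a command channel sets the state and an
    indicator reports it, so the relationship between the two must be designed."""

    def __init__(self) -> None:
        self._actual_state = "closed"
        self._indicator = "closed"

    def send_command(self, position: str) -> None:
        self._actual_state = position       # execution goes through one channel...

    def refresh_indicator(self) -> None:
        self._indicator = self._actual_state    # ...evaluation depends on a separate update

    def apparent_state(self) -> str:
        return self._indicator              # may be stale if the refresh was missed
```

In the second class, forgetting to call refresh_indicator leaves the displayed state stale, which is the arbitrary relationship between controls, indicators, and system state that the text describes.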

Figure 2.5. Substitutive and additive dimensions. Each of the ovals represents the representational aspects of values along the dimension from A to E. In the substitutive case, the representations replace one another. In the additive case, each successive representation includes the previous. Examples of additive dimensions are loudness and brightness. Examples of substitutive dimensions are pitch and hue.

Additive and Substitutive Dimensions

Many years ago, Stevens identified two forms of psychological representational dimensions or scales: additive and substitutive (Stevens, 1957). In an additive scale, the representations could be ordered, with each succeeding one containing the one before it, plus perhaps new aspects. The psychological percepts of loudness and brightness (which are the psychological mappings of physical sound and light intensities) form additive scales. In a substitutive scale, each new item replaces the one before it, with perhaps some overlap of attributes. The psychological percepts of pitch and hue (which are the psychological mappings of physical sound frequency and light wavelength) form substitutive scales. Restle (1961) showed that these two scale types could be represented in set-theoretic terms (as shown in Figure 2.5). In an additive scale, "as one moves along the sequence of sets one picks up new aspects, and one never loses any of the old ones. Any such sequence of sets is ordered in a strict way, and distances are additive" (Restle, 1961, p. 49). In a substitutive scale where, for example, one is moving from state A to state B, "each step of the process involves discarding some elements from A and adding some new elements from B. Elements from A which have earlier been discarded are never reused and elements from B which have been added are never discarded.... each move along the scale involves substituting some elements from B for some of the elements of A" (Restle, 1961, p. 50).

Representational Naturalness

I propose the following hypotheses about the form of representation used in a cognitive artifact.

Hypothesis 1: The "naturalness" of a mapping is related to the directness of the mapping, where directness can be measured by the complexity of the relationship between representation and value, measured by the length of the description of that mapping.

The use of "length of description" as the measure of naturalness is taken from the analogous use in specifying the complexity of a statement in complexity theory. The length of the description is, of course, a function of the terms used for the description. I propose that the terms be psychological, perceptual primitives. It is important not to confuse the idea of the mapping terms with natural language or conscious awareness. The mapping terms are purely formal and do not imply that the person is aware of them. They are not terms in natural language.

Hypothesis 2: Experts derive more efficient mapping terms, thus reducing the complexity of a mapping and increasing its feeling of "naturalness." However, although these derived terms may simplify the mapping relationship, they always extract some penalty in time or computation (and, thereby, in mental workload) for their interpretation.

Hypothesis 2 accounts for the phenomenon that experts can apparently get used to any representation, without obvious decrease in performance (except for learning time). This hypothesis allows the apparent complexity and naturalness of a representation to change with the development of expert skill. However, because the derived mapping terms are built upon some set of perceptual primitives, these derived terms will need to be interpreted, thereby extracting some information-processing workload. In normal behavior, this will probably not be noticeable, but in times of heavy workload or stress, the extra workload required to use the derived terms should degrade performance. In other words, although experts can get used to anything and even claim it to be natural and easy to use, less natural representations will suffer first under periods of heavy workload and stress.

Finally, I suggest that the choice of representation for the mapping between the representing world (the surface representation) and the represented world (the task domain being supported by the artifact) follow a guiding principle for appropriateness taken from the work of Mackinlay, Card, and Robertson (1989):

Appropriateness principle: The surface representation used by the artifact should allow the person to work with exactly the information acceptable to the task: neither more nor less.

This principle is a direct paraphrase of the expressiveness principle for input devices developed by Mackinlay, Card, and Robertson (1989), namely: "An input device should allow the user to express exactly the information acceptable to the application: neither more nor less" (emphasis added). Mackinlay et al. were developing a language for describing the mapping between input device and function, which meant they were on a parallel undertaking to the one described here. In principle, their analyses can be translated into the ones needed for the study of the representational properties of artifacts.
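Before turning to examples, Restle's set-theoretic picture of the two scale types (Figure 2.5) can be sketched in a few lines of Python. This is my own illustration, not material from the chapter; the numbered elements are arbitrary stand-ins for perceptual aspects.

```python
# Additive scale (e.g., loudness): moving up the scale only adds aspects, never removes
# them, so the sets are nested and "greater" literally means "contains".
additive = [{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}]      # values A, B, C, D

# Substitutive scale (e.g., hue): each step discards some aspects and adds new ones,
# so neighbouring sets overlap but neither contains the other.
substitutive = [{1, 2}, {2, 3}, {3, 4}, {4, 5}]        # values A, B, C, D


def is_additive(scale) -> bool:
    """Every set contains all of its predecessor's elements."""
    return all(prev <= cur for prev, cur in zip(scale, scale[1:]))


def is_substitutive(scale) -> bool:
    """Consecutive sets share something, but each step also loses and gains elements."""
    return all(prev & cur and prev - cur and cur - prev
               for prev, cur in zip(scale, scale[1:]))


assert is_additive(additive) and not is_additive(substitutive)
assert is_substitutive(substitutive) and not is_substitutive(additive)
```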

Figure 2.6. An unnatural mapping. Here, percentage (which is an additive dimension) is represented by a substitutive scale: different shadings. And where the shadings can be ordered along an additive scale, the ordering conflicts with the ordering of percentages. (Redrawn from a figure in the Los Angeles Times, September 13, 1988, p. 21.)

Using Density to Represent Numerical Value

Example: Contrast the case where an additive scale is used to represent an additive domain with one in which a substitutive scale is used to represent an additive domain. Figure 2.6 illustrates the representation of percentages (an additive scale) by arbitrary shadings. According to Hypothesis 1, the superior representation would be to use an ordered sequence of density (an additive scale) to represent percentages (an additive scale), as shown in Figure 2.7. Note that there is still a problem with the representation in Figure 2.7, but the problem helps emphasize the point about the importance of matching representational format. The white areas, perceptually, appear to represent the states with the least concentration of radon. This is because white fits on the ordered density scale to the left of (less than) the 0-10% density. In fact, white represents those states for which there are no data. One way to represent this situation to avoid the conflict in representational interpretation would be to delete the states for which there is no information from the map. I chose the method shown because the natural misinterpretation helps make the point about the impact of representational scale.

Figure 2.7. A natural mapping. Here, the map of Figure 2.6 has been redrawn so that percentage (which is an additive dimension) is represented by an additive scale: ordered densities of shading. Now, the density ordering matches the percentage ordering. (Redrawn from a figure in the Los Angeles Times, September 13, 1988, p. 21.)

Color hue is frequently used to represent density or quantity, especially in geographic maps, satellite photographs, and medical imagery. But hue is a substitutive scale, and the values of interest are almost always additive scales. Hence, according to Hypothesis 1, hue is inappropriate for this purpose. The use of hue should lead to interpretive difficulties. In fact, people who use these color representations do demonstrate difficulties by their continual need to refer to the legend that gives the mapping between the additive scale of interest and the hues. According to the hypothesis, density or brightness would provide a superior representation. It would probably be even better to use a spatial third dimension for representing this information.
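A minimal sketch of the two mappings, assuming an invented five-step density ramp and an arbitrary hue legend (neither is from the chapter): the density mapping preserves the ordering of the additive quantity, while the hue mapping forces the reader back to a legend.

```python
def density_for(percentage: float) -> str:
    """Additive-to-additive mapping: darker shading means a larger value,
    so the ordering can be read directly off the display."""
    ramp = [" ", "░", "▒", "▓", "█"]        # ordered from light to dark
    index = min(int(percentage / 100 * (len(ramp) - 1) + 0.5), len(ramp) - 1)
    return ramp[index]


def hue_for(percentage: float) -> str:
    """Additive-to-substitutive mapping: hues carry no intrinsic order,
    so recovering the value means consulting the legend."""
    legend = {0: "green", 25: "purple", 50: "orange", 75: "blue", 100: "red"}
    nearest = min(legend, key=lambda level: abs(level - percentage))
    return legend[nearest]


for value in (5, 35, 65, 95):
    print(f"{value:3d}%  density={density_for(value)!r}  hue={hue_for(value)}")
```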

Figure 2.8. Differing representations for numerical quantity. If one simply wishes to compare numerical values, tally marks are superior to Arabic numerals, for the length of the representation is analogous to the numerical value. If one wants to do arithmetic operations, the symbolic (Arabic) representation is better, even though length is not a good indication of value. The Roman numeral representation is a compromise, being somewhat symbolic, but also approximately proportional to the value being represented.
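A small sketch of the caption's point (my own illustration, not from the chapter): with tally marks the comparison can be read off a perceptual property of the representation itself, whereas Arabic numerals must first be interpreted.

```python
def tally(n: int) -> str:
    """Tally-mark representation: the physical length of the representation is the value."""
    return "|" * n


def looks_bigger(a: str, b: str) -> bool:
    """A purely perceptual comparison: which representation takes up more space?"""
    return len(a) > len(b)


# With tallies, the perceptual comparison always agrees with the numerical one:
print(looks_bigger(tally(8), tally(3)))    # True, and indeed 8 > 3

# With Arabic numerals, length resolves magnitude only to within a factor of ten,
# so the perceptual comparison is silent about 8 versus 3 and the symbols must be interpreted:
print(looks_bigger("8", "3"))              # False: both are one digit long
print(int("8") > int("3"))                 # True, but only after symbolic interpretation
```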

Legends of maps and graphs are usually used to present the mapping rule for the representational code being used. According to my hypotheses, frequent use of legends is a sign of inappropriate representational mapping. With appropriate representations, the mapping code is easily learned and applied: Legends should not be essential to understanding.

Representations for Comparing Numerical Counts

Even such a simple example as counting items in order to compare quantity provides another instance of the use of mapping rules. When one is interested in comparing the values of counts to determine which is greater, according to these hypotheses, the superior form of representation will have the size of the representation itself map onto the size of the number. Size comparisons require additive comparisons. Line length provides an additive representation. The Arabic numeral method for representing number does not. Counting methods that use tally marks to represent the number of objects translate number into length: in this case, the length of the space required to show the tally marks (Figure 2.8). Tally marks, therefore, provide an additive representation in which the size of the representation is related to the value of the number. Thus, according to Hypothesis 1, Arabic notation is inferior for simple Boolean comparisons because its perceptual representation bears little relationship to its numerical value: There is only a weak perceptual relationship between the physical dimensions of a numerical representation and its numerical value (the physical length of the number, that is, how many digits it contains, is proportional to the logarithm of its value, but with a discreteness of resolution good only to within a factor of 10). But Arabic notation is superior to all other common notations when numerical operations need to be performed.

Most people feel uncomfortable with this result because the comparison of Arabic numerals seems natural and straightforward. Here is where Hypothesis 2 comes into play. Most people forget the years of training it has taken to reach this state of naturalness. Moreover, there is psychological evidence that the time to compare two different (Arabic) numbers varies with the size of the difference between the numbers, strongly suggesting that an internal translation has to be made into the more primitive, additive representation, as suggested by Hypothesis 2. Moreover, I would predict that under heavy workload, comparisons of Arabic numbers would suffer. However, in cases where an exact numerical value is required or where numerical operations need to be performed, Arabic notation is clearly superior, which is why it is the standard notation used today. The form of representation most appropriate for an artifact depends upon the task to be performed, which is one reason that so many different numerical representations do exist (Ifrah, 1987; Nickerson, 1988).

Intrinsic Properties of Representation

Some years ago, Palmer described several properties of representations, including two that he called "intrinsic" and "extrinsic" (Palmer, 1978). The important point of these attributes is that they constrain what one can do with representations. A simple example will suffice. Consider three objects: A, B, and C. Suppose that we know that object A is taller than both object B and object C, but we don't know which is taller, B or C. We can represent this state of affairs very nicely by symbolic expressions. Let H(i) be the height of object i. Then we know that:

H(A) > H(B); H(A) > H(C)

We do not know the relationship between H(B) and H(C), and this symbolic form of representation does not force us to represent the relationship. That is an important, positive aspect of this form of representation. However, on the negative side, there is nothing to stop us from writing a contradictory statement: H(B) > H(A), or even H(A) > H(A) Suppose we represented the objects by a visual image: In the image, height of the object would be represented by height of the image. A possible representation for the three objects is shown in Figure 2.9. Note that with an image, it is simply not possible to represent an object without also representing its form and size: In this case, the representation of height is an intrinsic property of a visual image. Moreover, it is simply not possible to enter a contradictory statement in the same way that we could with the other representational format.

Figure 2.9. Intrinsic properties of a representation. Using images to represent the objects A, B, and C, we cannot also avoid representing their form and dimensions. Even if we did not know the height of C, we would be forced to select some value under this form of representation.
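The same contrast can be sketched in Python (my own illustration, not Norman's): a symbolic representation stores the height relations as bare assertions and will happily accept a contradiction, while an "image-like" representation must assign every object some concrete height, so it cannot express a contradiction and is forced to commit to a value for C.

```python
# Symbolic (extrinsic) representation: a set of assertions "x is taller than y".
# Nothing in the format prevents us from recording a contradiction.
symbolic = {("A", "B"), ("A", "C")}
symbolic.add(("B", "A"))            # contradicts ("A", "B"), yet is perfectly representable


def consistent(assertions) -> bool:
    """A contradiction exists whenever both (x, y) and (y, x) are asserted."""
    return not any((y, x) in assertions for (x, y) in assertions)


print(consistent(symbolic))         # False: the representation let the error in

# Image-like (intrinsic) representation: every object must be drawn with *some* height,
# so a value for C is forced even though it is unknown, and no contradiction can be
# drawn, because "taller than" is simply read off the stored heights.
image = {"A": 30, "B": 20, "C": 10}     # the 10 for C is an arbitrary but unavoidable commitment


def taller(heights, x, y) -> bool:
    return heights[x] > heights[y]


print(taller(image, "A", "B"), taller(image, "A", "C"))    # True True; never both x>y and y>x
```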

The form of representation used by an artifact carries great weight in determining its functionality and utility. The choice of representation is not arbitrary: Each particular representation provides a set of constraints and intrinsic and extrinsic properties. Each representation emphasises some mappings at the expense of others, makes some explicit and visible whereas others are neglected, and the physical form suggests and reminds the person of the set of possible operations. Appropriate use of intrinsic properties can constrain behavior in desirable or undesirable ways.

Forcing functions are design properties that use the intrinsic properties of a representation to force a specific behavior upon the person (Norman, 1988). Thus, in normal operation, it is not possible to start a modern automobile without the proper key, for the ignition switch is operated by turning the key: The switch has a built-in forcing function that requires insertion of the key. One of the intrinsic properties of the lock is the lack of affordances for turning. One of the intrinsic properties of a key is the affordance it offers for rotation of the lock (assuming it is the proper key for the particular lock). However, it is possible to leave the automobile without removing the key from the ignition: there is no forcing function. Bells and alarms that accompany the opening of the door without removing the key are not forcing functions. These are reminders: extrinsic or added-on properties of the system. They can remind the user, but they allow the behavior. A forcing function would require the key to open the door, or perhaps make it so that the door would not open with the key still in the ignition. These forcing functions, of course, have undesirable consequences.

Any design can be thought of as a representation. The designer has to decide how to represent the features of the device, how to implement the operation, and how to represent the current state. In the choice of design, many factors come into play, including aesthetics, cost, manufacturing efficiency, and usability. The face that the device puts forward to the person is often a compromise among the
competing requirements of these different factors, but this face, the interface, is a representation. Forcing functions are simply the manifestations of the intrinsic properties of the design representation. Representations carry with them many subtle intrinsic properties, often ones not intended by the designer. Line lengths represent quantity, and two lines of different lengths thereby intrinsically present a comparison of the lengths, even if that is not intended by the designer. Many inappropriate uses of graphs can be traced to conflicts with the unintended intrinsic properties of the graphs.

Figure 2.10. Inappropriate use of an additive scale. This example, inspired by Mackinlay (1986), shows that additive scales have the intrinsic property of numerical value and, therefore, they imply numerical comparison. This, of course, is an inappropriate operation for these data. Note that there is no formal problem with the representation save for the erroneous implication.

Additive Scales for Qualitative Information

A marvelous demonstration of how representational format can be misused in graphs is presented by Mackinlay (1986). Suppose we wish to represent the country of origin of various automobiles. Mackinlay points out that the example shown in Figure 2.10 is clearly inappropriate. Clearly, the choice of a bar graph is inappropriate for this purpose. But why? What is the problem with Figure 2.10? The bar graph does uniquely specify the desired relationship between manufacture and country: There is no formal problem with the presentation. The problem arises from the intrinsic, additive properties of the lengths of bars. Additive scales have the intrinsic property of numerical value and, therefore, they imply numerical comparison. This, of course, is an inappropriate operation for these data. Finally, the bar graph violates the appropriateness principle that the surface representation used by the artifact should allow the person to work with exactly the information acceptable to the task: neither more nor less. In this case, the bars are capable of carrying more informational structure than the task permits. The excess informational value permitted by the graph is clearly inappropriate: The graph, and any artifact, should use a representation that is neither too rich nor too poor.

Summary

Cognitive artifacts play an important role in human performance. In this chapter I provide the beginning of an analysis of their critical components by focusing upon three aspects of artifacts: their role in enhancing cognition (the difference between the system and the personal point of view); the degrees of engagement that one can experience; and the role of representational format. The study of artifacts can lead to several advances. First, because so many human activities depend upon artifacts, a full understanding of those activities requires an understanding of the human information-processing mechanisms, the internal knowledge of the human, and also the structure, capabilities, and representational status of the artifacts. Second, by understanding the ways in which
cognitive artifacts serve human cognition, we may be better able to design new ones and improve old ones.

A major theme of the chapters in this book is the role of artifacts, both in support of human activities and also as a tool for the understanding of human cognition. Artifacts play a critical role in almost all human activity. Indeed, as the quotation from Cole with which I opened this chapter suggests, the development of artifacts, their use, and then the propagation of knowledge and skills of the artifacts to subsequent generations of humans are among the distinctive characteristics of human beings as a species. The evolution of artifacts over tens of thousands of years of usage and mutual dependence between human and artifact provides a fertile source of information about both. The study of the artifact informs us about the characteristics of the human. The study of the human informs us of the appropriate characteristics of artifacts. And the study of both the artifact and the human must emphasise the interactions between and the complementarity of the two. The study of the relationship between humans and the artifacts of cognition provides a fertile ground for the development of both theory and application.

Acknowledgments

Much of this chapter reflects joint work and thinking with my research collaborator, Ed Hutchins. I wish to acknowledge my strong debt and gratitude to Ed for his contributions and my thanks for his permission to use them in this way throughout this chapter. I am grateful to the many people who have commented and aided in this work. In particular, I thank Jack Carroll, Mike Cole, Emmy Goldknopf, and Hank Strub for their comments, intensive critiques, and suggestions. The working group for this conference, of course, was a valuable source of feedback, and considerable thanks must go to them and to Jack Carroll for providing the framework for the interaction. Research support was provided by grant NCC 2591 to Donald Norman and Edwin Hutchins from the Ames Research Center of the National Aeronautics and Space Administration in the Aviation Safety/Automation Program. Everett Palmer served as technical monitor. Additional support was provided by funds from the Apple Computer Company and the Digital Equipment Corporation to the Affiliates of Cognitive Science at UCSD.

References

Bødker, S. (1989). A human activity approach to user interfaces. Human-Computer Interaction, 4, 171-195.

Cole, M. (1990). Cultural psychology: A once and future discipline? Paper presented at the Nebraska Symposium, 1989.

Cole, M., & Griffin, P. (1980). Cultural amplifiers reconsidered. In D. R. Olson (Ed.), The social foundations of language and thought. New York: Norton.

Degani, A., & Wiener, E. L. (1990, February). Human factors of flight-deck checklists: The normal checklist (Contractor's Report). Moffett Field, CA: National Aeronautics and Space Administration, Ames Research Center.

Draper, S. W. (1986). Display managers as a basis for user-machine interaction. In D. A. Norman & S. W. Draper (Eds.), User centered system design: New perspectives in human-computer interaction (pp. 339-352). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hutchins, E. (1986). Mediation and automatization. Quarterly Newsletter of the Laboratory of
Comparative Human Cognition, University of California, San Diego, 8(2), 47-58.

Hutchins, E., Hollan, J., & Norman, D. A. (1986). Direct manipulation interfaces. In D. A. Norman & S. W. Draper (Eds.), User centered system design: New perspectives in human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.

Ifrah, G. (1987). From one to zero: A universal history of numbers (L. Bair, Trans.). New York: Penguin Books. (Original work published 1981)

Laurel, B. K. (1986). Interface as mimesis. In D. A. Norman & S. W. Draper (Eds.), User centered system design: New perspectives in human-computer interaction (pp. 67-85). Hillsdale, NJ: Lawrence Erlbaum Associates.

Leont'ev, A. N. (1981). Problems of the development of mind. Moscow: Progress Publishers.

Luria, A. R. (1979). The making of mind: A personal account of Soviet psychology (M. Cole & S. Cole, Eds.). Cambridge, MA: Harvard University Press.

Mackinlay, J. D. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5(2), 110-141.

Mackinlay, J. D., Card, S. K., & Robertson, G. G. (1989). A semantic analysis and taxonomy of input devices. Unpublished manuscript, Xerox Palo Alto Research Center.

Newell, A. (1981). The knowledge level. AI Magazine, 2, 1-20. Also published in Artificial Intelligence, 1982, 18, 87-127.

Nickerson, R. (1988). Counting, computing, and the representation of numbers. Human Factors, 30, 181-199.

Norman, D. A. (1986). Cognitive engineering. In D. A. Norman & S. W. Draper (Eds.), User centered system design: New perspectives in human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.

Norman, D. A. (1988). The psychology of everyday things. New York: Basic Books. Also published in paperback as D. A. Norman. (1990). The design of everyday things. New York: Doubleday.

Norman, D. A., & Hutchins, E. (1990). Checklists. Unpublished manuscript, Department of Cognitive Science, University of California, San Diego.

Norman, D. A., & Hutchins, E. L. (1988). Computation via direct manipulation (Final Report: ONR Contract N00014-85-C-0133). La Jolla, CA: University of California, San Diego, Institute for Cognitive Science.

NTSB. (1989). Aircraft accident report: Delta Air Lines, Inc., Boeing 727-232, N473DA, Dallas-Fort Worth International Airport, Texas, August 31, 1988 (Report No. NTSB/AAR-88/05, Govt. Accession No. PB 89/04, September 26, 1989). Washington, DC: National Transportation Safety Board.

Palmer, S. (1978). Fundamental aspects of cognitive representation. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorisation. Hillsdale, NJ: Lawrence Erlbaum Associates.

Restle, F. (1961). Psychology of judgment and choice. New York: Wiley.

Rumelhart, D. E., & Norman, D. A. (1988). Representation in memory. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens' handbook of experimental psychology. New York: Wiley.

Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64, 153-181.

Vygotsky, L. S. (1978). Mind in society: The development of higher mental processes (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Cambridge, MA: Harvard University Press.

Wertsch, J. V. (1985). Vygotsky and the social formation of mind. Cambridge, MA: Harvard University Press.

Wundt, W. M. (1916). Elements of folk psychology: Outlines of a psychological history of the development of mankind (Edward Leroy Schaub, Trans.). London: Allen & Unwin.

Published in the Proceedings of CHI '97, March 22-27, 1997, © 1997 ACM

Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms

Hiroshi Ishii and Brygg Ullmer
MIT Media Laboratory, Tangible Media Group
20 Ames Street, Cambridge, MA 02139-4307 USA
{ishii, ullmer}@media.mit.edu

ABSTRACT

This paper presents our vision of Human Computer Interaction (HCI): "Tangible Bits." Tangible Bits allows users to "grasp & manipulate" bits in the center of users' attention by coupling the bits with everyday physical objects and architectural surfaces. Tangible Bits also enables users to be aware of background bits at the periphery of human perception using ambient display media such as light, sound, airflow, and water movement in an augmented space. The goal of Tangible Bits is to bridge the gaps between both cyberspace and the physical environment, as well as the foreground and background of human activities. This paper describes three key concepts of Tangible Bits: interactive surfaces; the coupling of bits with graspable physical objects; and ambient media for background awareness. We illustrate these concepts with three prototype systems – the metaDESK, transBOARD and ambientROOM – to identify underlying research issues.

Keywords

tangible user interface, ambient media, graspable user interface, augmented reality, ubiquitous computing, center and periphery, foreground and background INTRODUCTION: FROM THE MUSEUM

Long before the invention of personal computers, our ancestors developed a variety of specialized physical artifacts to measure the passage of time, to predict the movement of planets, to draw geometric shapes, and to compute [10]. We can find these beautiful artifacts made of oak and brass in museums such as the Collection of Historic Scientific Instruments at Harvard University (Fig. 1). We were inspired by the aesthetics and rich affordances of these historical scientific instruments, most of which have disappeared from schools, laboratories, and design studios and have been replaced with the most general of appliances: personal computers. Through grasping and manipulating these instruments, users of the past must have developed rich languages and cultures which valued haptic interaction with real physical objects. Alas, much of this richness has been lost to the rapid flood of digital technologies. We began our investigation of "looking to the future of HCI" at this museum by looking for what we have lost with the advent of personal computers. Our intention was to rejoin the richness of the physical world in HCI. Permission to make digital/hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is b y permission of th ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or a fee. CHI ‘97, Atlanta GA USA Copyright 1997 ACM 0-89791-802-9/97/03 ..$3.50

Outline of This Paper

To look towards the future of HCI, this paper will present our vision of Tangible Bits and introduce design projects including the metaDESK, transBOARD and ambientROOM systems to illustrate our key concepts. This paper is not intended to propose a solution to any one single problem. Rather, we will propose a new view of interface and raise a set of new research questions to go beyond GUI.

FROM DESKTOP TO PHYSICAL ENVIRONMENT

In 1981, the Xerox Star workstation set the stage for the first generation of GUI [16], establishing a "desktop metaphor" which simulates a desktop on a bit-mapped screen. The Star was the first commercial system which demonstrated the power of a mouse, windows, icons, property sheets, and modeless interactions. The Star also set several important HCI design principles, such as "seeing and pointing vs remembering and typing," and "what you see is what you get." The Apple Macintosh brought this new style of HCI to the public's attention in 1984, creating a new stream in the personal computer industry [1]. Now, the GUI is widespread, largely through the pervasiveness of Microsoft Windows.

In 1991, Mark Weiser (Xerox PARC) published an article on his vision of "Ubiquitous Computing" [18], illustrating a different paradigm of computing and HCI which pushes computers into the background and attempts to make them invisible.

The aim of our research is to show concrete ways to move beyond the current dominant model of GUI bound to computers with a flat rectangular display, windows, a mouse, and a keyboard. To make computing truly ubiquitous and invisible, we seek to establish a new type of HCI that we call "Tangible User Interfaces" (TUIs). TUIs will augment the real physical world by coupling digital information to everyday physical objects and environments. Fig. 2 illustrates the transition of HCI from the GUI of desktop PCs to Tangible User Interfaces which will change the world itself into an interface.

Figure 2 From GUI to Tangible User Interfaces – from the typical HCI of the GUI on a desktop PC to a Tangible UI in which the world itself becomes the interface.

We see the locus of computation is now shifting from the desktop in two major directions: i) onto our skins/bodies, and ii) into the physical environments we inhabit. The transition to our bodies is represented by recent activities in the new field of "wearable computers" [13]. We are focusing on the second path: integration of computational augmentations into the physical environment. Our intention is to take advantage of natural physical affordances [15] to achieve a heightened legibility and seamlessness of interaction between people and information.

GOALS OF TANGIBLE BITS

"Tangible Bits" is an attempt to bridge the gap between cyberspace and the physical environment by making digital information (bits) tangible. We are developing ways to make bits accessible through the physical environment. Our key concepts are:
1) Interactive Surfaces: Transformation of each surface within architectural space (e.g., walls, desktops, ceilings, doors, windows) into an active interface between the physical and virtual worlds;
2) Coupling of Bits and Atoms: Seamless coupling of everyday graspable objects (e.g., cards, books, models) with the digital information that pertains to them; and
3) Ambient Media: Use of ambient media such as sound, light, airflow, and water movement for background interfaces with cyberspace at the periphery of human perception.

Figure 3 Center and Periphery of User's Attention within Physical Space – graspable media occupy the foreground at the center of attention; ambient media occupy the background at the periphery.

Ultimately, we are seeking ways to turn each state of physical matter – not only solid matter, but also liquids and gases – within everyday architectural spaces into "interfaces" between people and digital information. We are exploring ways of both improving the quality and broadening the bandwidth of interaction between people and digital information by:
• allowing users to "grasp & manipulate" foreground bits by coupling bits with physical objects, and
• enabling users to be aware of background bits at the periphery using ambient media in an augmented space.

Current HCI research is focusing primarily on foreground activity and neglecting the background [2]. However, subconsciously, people are constantly receiving various information from the "periphery" without attending to it explicitly. If anything unusual is noticed, it immediately comes to the center of their attention. The smooth transition of users' focus of attention between background and foreground using ambient media and graspable objects is a key challenge of Tangible Bits.

RELATED WORKS

Our work is inspired by the vision of "Ubiquitous Computing" [18] and the new stream of "Augmented Reality" research [20, 7, 9, 4]. The notion of "foreground/background" [2] also has stimulated our vision. Tangible Bits is also directly grounded on the previous works of ClearBoard [12] and Graspable User Interfaces [8]. Interactions with many media artists and product designers have also influenced the development of our vision.

Ubiquitous Computing

Mark Weiser (Xerox) proposed a vision of Ubiquitous Computing in which access to computational services is delivered through a number of different devices, the design and location of which would be tailored to support various tasks. In addition to this ubiquity, he stressed that the delivery of computation should be "transparent." His team at PARC implemented a variety of computational devices including Tabs, Pads, and Boards, along with the infrastructure which allows these devices to talk with each other. Our work has been stimulated by Weiser's vision, but it is also marked by important differences. The Tab/Pad/Board vision is largely characterized by exporting a GUI-style interaction metaphor to large and small computer terminals situated in the physical environment. While this approach clearly has a place, our interest lies in looking towards the bounty of richly-afforded physical devices of the last few millennia and inventing ways to reapply these elements of "tangible media" augmented by digital technology. Thus, our vision is not about making "computers" ubiquitous per se, but rather about awakening richly-afforded physical objects, instruments, surfaces, and spaces to computational mediation, borrowing perhaps more from the physical forms of the pre-computer age than the present.

Augmented Reality

Augmented Reality (AR) (or Computer-Augmented Environments) is a new research stream which tries to answer the question of how to integrate the "real world" and computational media [20, 7, 9, 4]. DigitalDesk (Wellner) [20] is one pioneering work which demonstrated a way to merge physical and digital documents by using video projection of a computer display onto a real desk with physical documents. The most common AR approach is the visual overlay of digital information onto real-world imagery with see-through head-mounted (or hand-held) display devices or video projections. Our approach in Tangible Bits is differentiated by a strong focus on graspable physical objects as input rather than by considering purely visual augmentations, and in the combination of ambient media with graspable objects.

ClearBoard

ClearBoard [12] (Ishii & Kobayashi) was designed to achieve a seamless integration of shared drawing space and interpersonal space for geographically distributed shared drawing activity. ClearBoard triggered the idea of changing a "wall" from a passive architectural partition to a dynamic collaboration medium that integrates distributed real and virtual spaces. ClearBoard led us to a vision of new architectural spaces where all the surfaces including walls, ceilings, windows, doors and desktops become active surfaces through which people can interact with other spaces, both real and virtual.

Figure 4 ClearBoard [12].

Bricks: Graspable Interfaces

Graspable User Interfaces [8] (Fitzmaurice, Ishii & Buxton) allow direct control of virtual objects through physical handles called "bricks." Bricks can be "attached" to virtual objects, thus making virtual objects physically graspable. Although a mouse provides time-multiplexed input (one device to control different functions at different points in time), Bricks offer a concurrence between space-multiplexed input and output. Bricks can be attached to virtual objects to serve as dedicated transducers, each occupying its own space. Bricks encourage two-handed direct manipulation and allow parallel input specification, thereby improving the communication bandwidth with the computer. This work led us to a strong focus on graspable physical objects as a means to access and manipulate bits within the Tangible Bits project.

Figure 5 Bricks [8].

Marble Answering Machine

Durrell Bishop, while a student at the Royal College of Art (RCA), designed a prototype telephone answering machine to explore ways in which computing can be taken off the desk and integrated into everyday objects. In the marble answering machine, incoming voice messages are physically instantiated as marbles (Fig. 6). The user can grasp the message (marble) and drop it into an indentation in the machine to play the message. The user can also place the marble onto an augmented telephone, thus dialing the caller automatically [5]. The original concept animation was followed by several physical prototypes which realized the answering machine along with a family of other physical instantiations of applications. This physical embodiment of incoming phone messages as marbles demonstrated the great potential of making digital information graspable by coupling bits and atoms.

Figure 6 Marble Answering Machine (Courtesy D. Bishop).

Passive Real-World Interface Props

Hinckley et al. have developed passive real-world interface props for 3D neurosurgical visualization [11]. Here, users are given physical props (e.g., head viewing prop, cutting-plane selection prop) as a mechanism for manipulating 3D models within a traditional computer screen. They worked towards an interface which facilitates natural two-handed interaction and provides tactile and kinesthetic feedback. Our work differs from the passive props approach in its seamless integration of input and output for supporting physical interactions.

Live Wire

Natalie Jeremijenko designed a beautiful instrument called Live Wire while an artist in residence at Xerox PARC [19]. It is a piece of plastic cord that hangs from a small electric motor mounted on the ceiling. The motor is electrically connected to the area Ethernet network, such that each passing packet of information causes a tiny twitch of the motor. Bits flowing through the wires of a computer network become tangible through motion, sound, and even touch. The activity of the wire is visible and audible from many offices without being obtrusive, taking advantage of peripheral cues. This work encouraged us to think about ambient media as a general mechanism for displaying activities in cyberspace.

Fields and Thresholds: Benches

Anthony Dunne and Fiona Raby at the RCA presented "Fields and Thresholds" [6] at the Doors of Perception 2 conference. In this work, they explored ways to blur the boundary between "telematic" spaces and physical spaces and to create a sense of another place using non-visual media such as acoustic and thermal transducers. As an example, they described two cold steel benches located in different cities. When a person sits on one of these benches, a corresponding position on the other bench warms, and a bi-directional sound channel is opened. At the other location, after feeling the bench for "body heat," another person can decide to make contact by sitting near the warmth. Initially the sound channel is distorted, but as the second party lingers, the audio channel clears. This subtle and poetic representation of the presence of others stimulated the conceptual design of our ambientROOM.

TANGIBLE BITS: RESEARCH PROTOTYPES

"Tangible User Interfaces" emphasize both visuallyintensive, "hands-on" foreground interactions, and background perception of ambient light, sound, airflow, and water flow at the periphery of our senses. The metaDESK and transBOARD are prototype systems for exploring the use of physical objects as a means to manipulate bits in the center of users’ attention (foreground). On the other hand, the ambientROOM is focused on the periphery of human perception (background). metaDESK

transBOARD

Foreground Objects on Interactive Surface

ambientROOM

Ambient Media in Background

Fig. 7 Three Research Platforms of Tangible Bits

The metaDESK consists of a nearly physical passive objects, horizontal backdesktop instruments projected graphical surface; the metaDESK (lenses, phicons, etc.) "activeLENS," an desktop arm-mounted LCD metaphor graspable screen; the windows, icons, widgets. "passiveLENS," a passive optically GUI on desktop PC transparent "lens" Virtual World actively mediated by the desk; "phicons," Figure 8 metaDESK design physical icons; and approach instruments which are used on the surface of the desk. These physical objects and instruments are sensed by an array of optical, mechanical, and electromagnetic field sensors embedded within the metaDESK, which is based on Input Technologies' VisionMaker™. The metaDESK "brings to life" these physical objects and instruments as elements of tangible interfaces. Real World

TUI: Tangible UI lens

phicon

tray

window

icon

menu

phandle instrument

GUI: Graphical UI handle

widget

Figure 9 Physical instantiation of GUI elements in TUI

Tangible Geospace

Tangible Geospace is a prototype application of the metaDESK platform. Tangible Geospace uses physical models of landmarks such as MIT's Great Dome and Media Lab buildings as phicons to allow the user to manipulate 2D and 3D graphical maps of the MIT campus (Fig. 10, 11). By grasping a small physical model of the Great Dome and placing it onto the desk’s surface, a twodimensional map of MIT campus appears on the desk surface beneath the object, with the location of the Dome passiveLENS

activeLENS

metaDESK

In the metaDESK design, we have tried to push back from GUIs into the real world, physically embodying many of the metaphorical devices (windows, icons, handles) they have popularized. Simultaneously, we have attempted to push forward from the unaugmented physical world, inheriting from the richness of various historical instruments and devices often made "obsolete" by the advent of personal computers. This design approach of metaDESK is illustrated in Fig. 8. Fig. 9 illustrates examples of the physical instantiation of GUI elements such as windows, icons, and handles in Tangible User Interfaces. For example, the "activeLENS," an arm-mounted flat-panel display, is a physically instantiated window which allows haptic interaction with 3D digital information bound to physical objects.

instrument

Figure 10

phicons

Tangible Geospace on metaDESK

Ishii and Ullmer, Tangible Bits

Figure 11 Phicon and activeLENS

5 By bringing a passiveLENS device onto the desk, the user may interact with satellite-imagery or future/past-time overlay views of the map space, or explore alternate interactions consistent with physical instantiation of the Magic Lens metaphor [17]. With two phicon objects on the desk, there is an issue of ambiguity that must be resolved. For instance, when one or both phicons are rotated independently, how should the application respond? We currently ignore this conflicting information, but could also imagine other interpretations such as warping the map view. To resolve this ambiguity, we designed a rotation-constraint instrument made of two cylinders mechanically coupled by a sliding bar as shown in Fig. 12. This instrument has mechanical constraints which prohibit independent rotation and realize distinct axes of scaling and rotation. By building in these physical constraints, we resolve the question of ambiguity in this particular case. ambientROOM

Figure 12

Scaling and Rotation Device with embedded mechanical constraints

on the map bound to the physical location of the Dome phicon. (Fig. 11). Simultaneously, the arm-mounted activeLENS displays a spatially-contiguous 3D view of MIT campus (Fig. 11). By grasping and moving the activeLENS (a physically embodied window), the user can navigate the 3D representation of campus building-space. The Great Dome phicon acts both as a container of bits which represent the MIT campus, and as a handle for manipulating the map. By rotating or translating the Dome object across the desk surface, both the 2D desk-view and 3D activeLENS-view are correspondingly transformed. The user is thus interacting visually and haptically with three spaces at once—the physical-space of the Dome object; the 2D graphical space of the desk surface; and the 3D graphical space of the activeLENS. The user may then take a second phicon, this time of the Media Lab building, and place it onto the surface of the desk. The map then rotates and scales such that the second phicon is bound to the location of the Media Lab building on the map. Now there are two physical constraints and handles on the MIT campus space, allowing the user to simultaneously scale, rotate, and translate the map by moving one or both objects with respect to each other. Because each phicon serves as an independent locus of control, the user may grasp and manipulate both objects simultaneously with his/her two hands. Alternatively, two users may independently grasp separate building objects, cooperatively manipulating the transformation of the Geospace. In this fashion, there is no one locus of control as is true in point-and-click mouse interaction; rather, the interaction is constrained by the physics of the physical environment, supporting multiple pathways of single- and multi-user interaction.
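The paper does not spell out the mapping math behind this two-phicon behavior, but it amounts to solving a similarity transform from two control points. The following sketch (our illustration in Python, not the metaDESK implementation; all names and coordinates are hypothetical) recovers the map's scale, rotation, and translation from the desk positions of two phicons and the map landmarks they are bound to.

import math

def two_point_transform(p1, p2, m1, m2):
    # p1, p2: desk positions of the two phicons (e.g., in metres)
    # m1, m2: map coordinates of the landmarks bound to those phicons
    # (the two landmarks are assumed to be distinct points)
    dxp, dyp = p2[0] - p1[0], p2[1] - p1[1]
    dxm, dym = m2[0] - m1[0], m2[1] - m1[1]
    scale = math.hypot(dxp, dyp) / math.hypot(dxm, dym)
    theta = math.atan2(dyp, dxp) - math.atan2(dym, dxm)
    c, s = math.cos(theta), math.sin(theta)
    # translation chosen so landmark m1 renders exactly under phicon p1
    tx = p1[0] - scale * (c * m1[0] - s * m1[1])
    ty = p1[1] - scale * (s * m1[0] + c * m1[1])
    return scale, theta, (tx, ty)

def to_desk(point, scale, theta, t):
    # project any map point into desk coordinates with the solved transform
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * point[0] - s * point[1]) + t[0],
            scale * (s * point[0] + c * point[1]) + t[1])

# e.g. Dome phicon at (0.30, 0.40) and Media Lab phicon at (0.55, 0.42) on the
# desk, bound to hypothetical map landmarks (120, 80) and (410, 95)
scale, theta, t = two_point_transform((0.30, 0.40), (0.55, 0.42), (120, 80), (410, 95))
dome_on_desk = to_desk((120, 80), scale, theta, t)   # lands back at (0.30, 0.40)

Re-solving the transform whenever either phicon moves keeps both the 2D desk view and the 3D activeLENS view registered to the physical handles.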

ambientROOM

The ambientROOM complements the graphically-intensive, cognitively-foreground interactions of the metaDESK by using ambient media – ambient light, shadow, sound, airflow, water flow – as a means for communicating information at the periphery of human perception. The ambientROOM is based on Steelcase's Personal Harbor™ unit, a 6' x 8' freestanding room, which we have augmented with MIDI-controllable facilities. The ambientROOM is designed to employ both the foreground and background of users' attention.

Figure 13 ambientROOM based on Personal Harbor™.

In normal day to day interactions, we get information in two main ways. First, we get information from what we are focusing on, where our center of attention is directed. When we are speaking with a colleague in the office, we are consciously focusing on that person and receiving information directly from them. But at the same time, we are also getting information from ambient sources. We may have a sense of the weather outside from ambient cues such as light, temperature, sound, and air flow from nearby windows. We may also have an idea of the activities of colleagues in the area from the ambient sound and the visible presence of passers-by. In contrast to the conscious foreground processing occurring in discussions with a colleague, much of this ambient information is processed through background communication channels. Our goal for the ambientROOM is to explore how we can take advantage of this natural parallel background processing using ambient media to convey information.

One focus of the ambientROOM is the use of ambient media to subtly display and communicate information which is not the user's primary foreground task. We are also concerned with providing handles for the seamless transition of the user's interaction between background and foreground information. In the real world, when a process that was not a focus of attention catches our interest, we are often able to seamlessly integrate it into our activity. Realizing HCI analogs of this fuzzy, fluid boundary between background and foreground is a challenging research opportunity.

Figure 14 Phicons in the ambientROOM.

To identify design issues within the ambientROOM, we have prototyped a simple scenario which suggests possible directions for continuing work. Within the ambientROOM, we have utilized phicons as handles and containers for "sources" and "sinks" of information. Imagine one's business is manufacturing toys, and the latest toy car product has just been advertised on the company Web site. In this example, we can imagine using the physical object of the toy car as a phicon representing a source of information, say, web-hits on the car-announcement Web page. By grasping this car phicon and moving it into the proximity of some information sink, we can establish an ambient display of car web-page activity. For instance, by moving the phicon near a speaker in the ambientROOM, we can imagine activating a subtle audio display of associated web activity, where each web-hit to the car page is presented as the sound of raindrops. The sound of heavy rain would indicate many visits to the web page and success in attracting customer attention, while no rain might indicate poor marketing or a potential breakdown of the web server. A steady pattering of rain might remain at the periphery of the user's attention, allowing the user to concentrate on foreground activities such as reading e-mail. However, if the sound of rain suddenly stops or grows louder, it will attract his attention, causing him to grasp the object and bring it onto the metaDESK, for example, displaying more detailed interactive graphical information about recent web page activity.

We prototyped the above interaction in the ambientROOM with passive physical phicons, simple MIDI-based infrared proximity sensors, and recorded sounds of rainfall linked to sample web activity levels. While we found this ambient display compelling, we also determined that at times the sounds of rain could be distracting. As an alternate display technology, we built a thin water tank outfitted with a solenoid-driven float which can be "pulled by bits." With each pull, the float creates a ripple on the surface of water. Light projected onto the water tank from above casts a subtle but poetic image of ripples on the ceiling of the room (Fig. 15). This new ambient display of bits in the room has been received enthusiastically by both artists and commercial sponsors who have visited our lab.

Figure 15 An ambient display: light reflection from water onto the ceiling.

Although ambient media is most often processed continually in the background, it can naturally move into the center of attention. The brain will naturally move the ambient information into the foreground either if it becomes anomalous or if there is a lull in the foreground concern. By using ambient media as an additional method to convey information, we are taking advantage of our brain's natural abilities as both a parallel processor and as an attention manager. We have begun a new research project, "Ghostly Presence," to support awareness at the periphery by displaying the presence of remote people and their activities using ambient media and graspable physical representations.
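The prototype linked recorded rainfall sounds to sample web activity levels over MIDI; the paper does not give the mapping itself. As a rough, hypothetical sketch of one such mapping, the hit rate could set the pause between raindrop sounds, so that more visits produce denser "rain" while zero activity falls silent.

import random

def raindrop_pause(hits_per_minute, quiet=8.0, dense=0.25):
    # Convert page activity into the pause (seconds) between raindrop sounds:
    # more hits -> shorter pauses -> "heavier rain".  Returns None for silence.
    if hits_per_minute <= 0:
        return None                              # no visits (or a stalled server): no rain
    level = min(hits_per_minute, 240) / 240.0    # clamp and normalise the rate
    pause = quiet - level * (quiet - dense)
    return pause * random.uniform(0.8, 1.2)      # jitter so the patter sounds natural

# e.g. 12 hits/min gives sparse drops; 200 hits/min approaches a steady patter
for rate in (0, 12, 200):
    print(rate, raindrop_pause(rate))

Whatever the mapping, the output has to stay quiet enough to remain at the periphery of attention, with only anomalies (sudden silence or a downpour) pulling it into the foreground.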

transBOARD

The transBOARD is a networked digitally-enhanced physical whiteboard designed to explore the concept of interactive surfaces which absorb information from the physical world, transforming this data into bits and distributing it into cyberspace (Fig. 16). The transBOARD supports Java-based distributed access to physical whiteboard activity. The stroke server and Java applet (developed by Chris Fuchs in our group) allow distributed users to graphically view both realtime and recorded drawing processes over the Internet, or potentially to express drawing activity through ambient display media within the ambientROOM (for instance, with a subtle sound of scratching produced from a wall surface). The transBOARD was implemented on a SoftBoard™ product from Microfield Graphics, which monitors the activity of tagged physical pens and erasers with a scanning infrared laser. From the user's point of view, the transBOARD is nearly the same as an ordinary whiteboard, and minimally alters the familiar work practice of using a whiteboard.

Figure 16 transBOARD: a networked digital whiteboard with hyperCARDs.

The transBOARD supports the use of "hyperCARDs" (barcode-tagged paper cards) as containers of digital strokes. These magnetically-backed cards, which can be attached to the vertical surface of the transBOARD, are another example of phicons. In our demonstration scenario, when a card is attached to the surface during drawing sessions, monitored pen-strokes from the whiteboard are virtually "stored" within the card; in fact, they are being recorded on a web server to which the hyperCARD's barcode serves as a URL. In addition, the strokes are broadcast live to remote users who might be monitoring the session with an identical hyperCARD. The user can then keep the meeting contents within this card, perhaps bringing it to his office or home. If he puts this card on a conventional computer screen, it will automatically bring up the digital strokes stored in the source web server, without requiring manual entry of a filename or URL. For the implementation of this prototype, we used a barcode wand to let the transBOARD or remote computer identify the card at the beginning of each session. However, we found scanning of the barcode with the wand to be quite cumbersome, and we are now designing a new version that uses wireless passive RF ID tag technology.

The surface of the transBOARD serves as a one-way filter for absorbing bits from the physical world into cyberspace. In contrast, the metaDESK surface is a bi-directional interactive surface spanning physical and virtual spaces. The lack of capability for displaying bits on the transBOARD makes an interesting contrast with the metaDESK, which can both absorb and express digital information on its surface. We are planning to install a transBOARD-based activeLENS to augment the transBOARD's limited display capabilities.
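The paper says only that a hyperCARD's barcode serves as a URL on a stroke server where monitored pen strokes are recorded and later replayed. A minimal sketch of that idea follows (hypothetical names and server address; the actual transBOARD used a Java stroke server and applet, not this code).

# Hypothetical sketch: the barcode printed on a hyperCARD is the key under
# which that session's pen strokes live on a stroke server.
STROKE_SERVER = "http://stroke-server.example.edu/cards/"   # placeholder address

def card_url(barcode: str) -> str:
    # Map a card's barcode to the web location of its stored strokes.
    return STROKE_SERVER + barcode

class CardSession:
    # While a card is attached to the board, append each monitored stroke to
    # its record; a real system would also POST each stroke to card_url() and
    # broadcast it live to remote viewers holding an identical card.
    def __init__(self, barcode: str):
        self.url = card_url(barcode)
        self.strokes = []            # each stroke: a list of (x, y, time) samples

    def add_stroke(self, samples):
        self.strokes.append(samples)

    def replay(self):
        # Placing the card on a computer screen retrieves the strokes by URL.
        return self.url, list(self.strokes)

The design point this illustrates is that the card carries no data at all; it carries only an identifier, and the network supplies the content wherever the card is presented.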

DISCUSSIONS: OPTICAL METAPHORS

Through the design process of Tangible User Interfaces, we encountered many design problems which helped us to identify the most salient research issues. Among the many issues that we explored, we realized that metaphors which bridge physical and digital worlds are particularly interesting. In this section, we would like to discuss one user interface metaphor that can be applied seamlessly across the metaDESK, ambientROOM, and transBOARD platforms.

We found the metaphor of light, shadow, and optics in general to be particularly compelling for interfaces spanning virtual and physical space. We first implicitly invoked this metaphor by creating the activeLENS, an arm-mounted flat-panel display, modeled in both form and function upon the optical jeweler's magnifying lens. Exploring this same optical process, we created the passiveLENS, a transparent plexiglass surface brought to life by the back-projected metaDESK display. Pushing further still, we applied the optical metaphor to the metaDESK's physical icons (phicons), and created the notion of "digital shadows." In physical space, illuminated objects cast shadows reflecting their physical substance. In augmented space, we reasoned, physical objects might also cast digital shadows which project information pertinent to their virtual contents. We have used this digital shadow concept in several of our prototypes.

Reflecting on the notion of digital shadows in Tangible Geospace, we were struck by considerations of where sources of digital, semantic, or virtual light might originate. So inspired, we have begun implementing an instrumented physical flashlight which projects various "wavelengths" of "virtual" or "semantic light" into the Tangible Geospace scene. In Tangible Geospace, this virtual light might cast digital shadows of various physical and virtual objects. For instance, the 3D building geometries of the MIT campus could be imagined not only to render "optically-constrained" shadows of buildings' physical forms, but also to generate shadows as a function of non-physical parameters of the geographical landscape, say, as a function of the research publications or sponsorship inflows of various buildings. Furthermore, we can think of projecting other "optical" qualities into the scene such that the rendering style changes from photorealistic to impressionistic.

The optical metaphor is not limited to the graphically-intensive surface of the metaDESK, but has also played a major role in our design of the ambientROOM. We have discussed at some length the use of ambient light, shadow, and reflection/refraction from moving water within the ambientROOM. Here, notions of light and shadow are used both literally as a medium for ambient display, and metaphorically for the play of light and shadow at the periphery of human perception. On the transBOARD, the optical metaphor is somewhat weaker due to the limited computer display facilities. However, we have considered mounting an "optical" activeLENS device onto the transBOARD, allowing users to spatially and temporally navigate local and remote drawings along with supporting digital media.

Perhaps the most compelling aspect of the optical metaphor is its seamless consistency with the physics of real space. By not only invoking but also obeying the optical constraints metaphorically imposed on our physical interface prototypes, we are able to maximize the legibility of interface of our creations. People know what to expect of a flashlight, know what to expect of lenses. By satisfying these expectations, we can realize truly seamless, "invisible" integration of our technologies with the physical environment.

Finally, the optical metaphor is a great source of inspiration for future work. Ideas about the interaction of digital light with physical space open many new possibilities for literal and metaphorical uses of mirrors, prisms, virtual and physical transparency and opacity, and light of different spectrums and powers of penetration in the context of both foreground and backchannel information display.

CONCLUSIONS

This paper presented our vision of Tangible Bits which bridges the gap between the worlds of bits and atoms through graspable objects and ambient media in physical environments. Current GUI-based HCI displays all information as "painted bits" on rectangular screens in the foreground, thus restricting itself to very limited communication channels. GUIs fall short of embracing the richness of human senses and skills people have developed through a lifetime of interaction with the physical world.

Our attempt is to change "painted bits" into "tangible bits" by taking advantage of multiple senses and the multimodality of human interactions with the real world. We believe the use of graspable objects and ambient media will lead us to a much richer multi-sensory experience of digital information.

Ishii met a highly successful PDA (Personal Digital Assistant) called the "abacus" when he was 2 years old. This simple abacus-PDA was not merely a computational device, but also a musical instrument, imaginary toy train, and a back scratcher. He was captivated by the sound and tactile interaction with this simple artifact. When his mother kept household accounts, he was aware of her activities by the sound of her abacus, knowing he could not ask for her to play with him while her abacus made its music. We strongly believe this abacus is suggesting to us a direction for the next generation of HCI.


ACKNOWLEDGMENTS

We thank Prof. William Buxton and George Fitzmaurice at the University of Toronto for countless discussions about graspable UI, skill-based design, and foreground & background issues, through which many of the ideas in this paper were developed and shaped. Thanks are also due to Bill Verplank at Interval Research for his insightful comments and discussions on haptic interfaces and tangible media, as well as for suggesting the metaDESK's rotation-constraint instrument. He also introduced the work of the Marble Answering Machine and Benches to Ishii. Finally, we appreciate Mark Weiser for his inspiring Ubiquitous Computing work and the introduction of Live Wire to Ishii. We thank TTT (Things That Think), a new consortium at the MIT Media Lab, for its ongoing support of the Tangible Bits project. TTT was begun in October 1995 to explore futures where intelligence, sensing, and computation move off the desktop into "things." We also would like to acknowledge the contribution of many hardworking graduate and undergraduate students at MIT for work on the implementation of the Tangible Bits platforms. In particular, we thank graduate students Scott Brave, Andrew Dahley, Matt Gorbet, and Flavia Sparacino, as well as undergraduate assistants Minpont Chien, Philipp Frei, Chris Fuchs, Dylan Glas, Chris Lopes, Dave Naffziger, Tom Rikert, and Craig Wisneski for their work on the metaDESK, ambientROOM, and transBOARD prototypes. Scott Brave contributed to the text of the ambientROOM discussion. We also thank Thad Starner and the Vision and Modeling group for metaDESK computer vision support, and Robert Poor, Scott Brave, and Brian Bradley for valuable comments on the paper. Thanks are finally due to administrative assistant Betty Lou McClanahan for her support of our research and comments on this paper.

REFERENCES

1. Apple. Human Interface Guidelines: The Apple Desktop Interface. Addison-Wesley, 1987.
2. Buxton, W. Integrating the Periphery and Context: A New Model of Telematics, in Proceedings of Graphics Interface '95, 239-246.
3. Buxton, W. (in press). Living in Augmented Reality: Ubiquitous Media and Reactive Environments. To appear in Finn, Sellen & Wilber (Eds.), Video Mediated Communication. Hillsdale, N.J.: Erlbaum.
4. Cooperstock, J. R., Tanikoshi, K., Beirne, G., Narine, T., and Buxton, W. Evolution of a Reactive Environment, in Proceedings of CHI '95, 170-177.
5. Crampton Smith, G. The Hand That Rocks the Cradle. I.D., May/June 1995, pp. 60-65.
6. Dunne, A., and Raby, F. Fields and Thresholds. Presentation at the Doors of Perception 2, November 1994, http://www.mediamatic.nl/Doors/Doors2/DunRab/DunRab-Doors2-E.html
7. Feiner, S., MacIntyre, B., and Seligmann, D. Knowledge-based augmented reality. Commun. ACM, 36(7), July 1993, 52-62.
8. Fitzmaurice, G. W., Ishii, H., and Buxton, W. Bricks: Laying the Foundations for Graspable User Interfaces, in Proceedings of CHI '95, 442-449.
9. Fitzmaurice, G. Situated Information Spaces and Spatially Aware Palmtop Computers. Commun. ACM, 36(7), July 1993, 38-49.
10. Hambly, M. Drawing Instruments 1580-1980. Sotheby's Publications, London, 1988.
11. Hinckley, K., Pausch, R., Goble, J., and Kassel, N. Passive Real-World Interface Props for Neurosurgical Visualization, in Proceedings of CHI '94, April 1994, 452-458.
12. Ishii, H., Kobayashi, M., and Arita, K. Iterative Design of Seamless Collaboration Media. Commun. ACM, 37(8), August 1994, 83-97.
13. Mann, S. 'Smart Clothing': Wearable Multimedia Computing and 'Personal Imaging' to Restore the Technological Balance between People and Their Environments, in Proc. of ACM MULTIMEDIA '96, November 1996, 163-174.
14. Negroponte, N. Being Digital. Alfred A. Knopf, Inc., New York, 1995.
15. Norman, D. A. The Psychology of Everyday Things. Basic Books, 1988.
16. Smith, D. Designing the Star User Interface. Byte, April 1982, pp. 242-282.
17. Stone, M., Fishkin, K., and Bier, E. The Movable Filter as a User Interface Tool, in Proceedings of CHI '94, ACM Press, 306-312.
18. Weiser, M. The Computer for the 21st Century. Scientific American, 265(3), 1991, pp. 94-104.
19. Weiser, M., and Brown, J. S. Designing Calm Technology. http://www.ubiq.com/hypertext/weiser/calmtech/calmtech.htm, December 1995.
20. Wellner, P., Mackay, W., and Gold, R. Computer Augmented Environments: Back to the Real World. Commun. ACM, 36(7), July 1993.

CHAPTER 4

THE PRINCIPLES OF UNIVERSAL DESIGN

Molly Follette Story

4.1 INTRODUCTION

The Center for Universal Design at North Carolina State University defined universal design as "the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design" (Connell et al., 1997). Among the international community, there are multiple names for and definitions of universal design. Some are broader, others are narrower, and still others emphasize certain aspects over others; but consensus is unnecessary. Differing terminology is a sign of healthy engagement with the concept, of practitioners seeking wording that is useful for a variety of specific purposes. Regardless of wording, the goal is profound: we can and should make our human-made world as accessible and usable as possible for as diverse a user population as possible.

4.2 WHY CREATE PRINCIPLES OF UNIVERSAL DESIGN?

Early in its history, the universal design concept suffered from a lack of established criteria that defined what makes a design most broadly usable. Instead, universal design was most often communicated through presentation of examples that demonstrated specific aspects of the concept, without concrete descriptions of requisite characteristics (e.g., Universal Designers and Consultants, 1996).

Before the Principles of Universal Design were written, only limited accessibility criteria were available, and these were found in a few U.S. and international codes and standards. Some criteria were provided by accessibility building codes, such as those contained in the U.S. Americans with Disabilities Act Accessibility Guidelines (ADAAG). Other criteria were provided by standards for accessibility of electronic and information technologies, such as Section 508 of the 1998 amendments to the Rehabilitation Act in the United States. Other sets of usability (not accessibility) criteria were available in some American National Standards Institute (ANSI) and International Standards Organization (ISO) standards, but most were quite limited in scope. The most general, ISO 13407, Human-Centered Design Processes for Interactive Systems, defined a process for involving end users in the design process but did not provide design guidance. ISO 9241, Ergonomic Requirements for Office Work with Video Display Terminals, included discussion of dialogue principles (Part 10) and guidance on usability (Part 11). ANSI/HFES 200, Human Factors Engineering of Software User Interfaces, provided design requirements and recommendations that were intended to increase the accessibility, learnability, and ease of use of software, but hardware was not specifically addressed.


Typically, if accessibility was considered at all, these standards provided only minimum requirements to accommodate people with disabilities (basic accessibility) and fell substantially short of ideal conditions (both good accessibility and usability). The limitations of such prescriptive standards are discussed in Chap. 6, “U.S. Accessibility Codes and Standards: Challenges for Universal Design.” As discussed in Chap. 7, “The ADA and Accessibility: Interpretations in U.S. Courts,” standards such as the ADA also applied to only a limited set of specific products and environments. Guiding principles were needed that articulated the full range of criteria for achieving universal design for all types of designs, as well as clarified how the concept of universal design might pertain to specific designs under development and suggested how usability of those designs could be maximized.

4.3 THE PRINCIPLES OF UNIVERSAL DESIGN

From 1994 to 1997, the Center for Universal Design conducted a research and demonstration project funded by the U.S. Department of Education's National Institute on Disability and Rehabilitation Research (NIDRR). The project was titled "Studies to Further the Development of Universal Design" (project no. H133A40006). One of the activities of the project was to develop a set of universal design guidelines. The resulting Principles of Universal Design were as follows:

Principle 1: Equitable Use
Principle 2: Flexibility in Use
Principle 3: Simple and Intuitive Use
Principle 4: Perceptible Information
Principle 5: Tolerance for Error
Principle 6: Low Physical Effort
Principle 7: Size and Space for Approach and Use

Each of these principles was defined and then expanded in a set of guidelines describing key elements that should be present in a design adhering to the principle (see Table 4.1). The purpose of the Principles of Universal Design and their associated guidelines was to articulate the concept of universal design in a comprehensive way. The principles reflected the authors' belief that basic universal design principles applied to all design disciplines, including those that focused on built environments, products, and communications. The principles were intended to guide the design process, allow systematic evaluation of designs, and assist in educating both designers and consumers about the characteristics of more usable design solutions (Story et al., 1998; Center for Universal Design, 2000a; Mueller, 1997).

The authors of the Principles of Universal Design envisioned that beyond the principles and guidelines, two additional levels of detail would eventually be developed. If level 1 were conceptual principles and level 2 were design guidelines, level 3 would be compliance tests (e.g., Center for Universal Design, 2000b) and level 4 would be design strategies. The tests in level 3 might be in the form of questions that would allow designers to query a design for universal usability. Level 4, which would offer strategies for meeting the guidelines and passing the tests, would have several discipline-specific branches. For example, for Principle 3, Simple and Intuitive Use, the level 4 design strategies might describe the following:

• For architecture—methods of creating clear environmental way-finding features
• For products—methods of applying the concepts of correspondence and cognitive mapping to user interfaces
• For software—methods of supporting broadly accessible user interaction modes
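To make the envisioned four-level structure concrete, the sketch below represents a principle, its guidelines, and the proposed compliance tests and design strategies as nested data, so a design team could "query a design" by walking the level-3 questions. The chapter defines only levels 1 and 2, so the example test and strategy strings here are hypothetical illustrations, not published criteria.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Guideline:
    code: str                                             # e.g. "3a"
    text: str
    tests: List[str] = field(default_factory=list)        # level 3: yes/no questions
    strategies: List[str] = field(default_factory=list)   # level 4: discipline-specific advice

@dataclass
class Principle:
    number: int
    name: str
    guidelines: List[Guideline] = field(default_factory=list)

simple_and_intuitive = Principle(3, "Simple and Intuitive Use", [
    Guideline("3a", "Eliminate unnecessary complexity.",
              tests=["Can a first-time user complete the task without instructions?"],
              strategies=["Products: apply correspondence and cognitive mapping to the user interface."]),
])

# "querying a design for universal usability" then means walking the level-3 questions
for g in simple_and_intuitive.guidelines:
    for question in g.tests:
        print(f"Principle {simple_and_intuitive.number}, {g.code}: {question}")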


TABLE 4.1 The Principles of Universal Design, Version 2.0 (Connell et al., 1997)

Principle 1: Equitable Use
The design is useful and marketable to people with diverse abilities.
Guidelines:
1a. Provide the same means of use for all users: identical whenever possible; equivalent when not.
1b. Avoid segregating or stigmatizing any users.
1c. Make provisions for privacy, security, and safety equally available to all users.
1d. Make the design appealing to all users.

Principle 2: Flexibility in Use
The design accommodates a wide range of individual preferences and abilities.
Guidelines:
2a. Provide choice in methods of use.
2b. Accommodate right- or left-handed access and use.
2c. Facilitate the user's accuracy and precision.
2d. Provide adaptability to the user's pace.

Principle 3: Simple and Intuitive Use
Use of the design is easy to understand, regardless of the user's experience, knowledge, language skills, or current concentration level.
Guidelines:
3a. Eliminate unnecessary complexity.
3b. Be consistent with user expectations and intuition.
3c. Accommodate a wide range of literacy and language skills.
3d. Arrange information consistent with its importance.
3e. Provide effective prompting and feedback during and after task completion.

Principle 4: Perceptible Information
The design communicates necessary information effectively to the user, regardless of ambient conditions or the user's sensory abilities.
Guidelines:
4a. Use different modes (pictorial, verbal, tactile) for redundant presentation of essential information.
4b. Maximize "legibility" of essential information.
4c. Differentiate elements in ways that can be described (i.e., make it easy to give instructions or directions).
4d. Provide compatibility with a variety of techniques or devices used by people with sensory limitations.

Principle 5: Tolerance for Error
The design minimizes hazards and the adverse consequences of accidental or unintended actions.
Guidelines:
5a. Arrange elements to minimize hazards and errors: most used elements, most accessible; hazardous elements eliminated, isolated, or shielded.
5b. Provide warnings of hazards and errors.
5c. Provide fail-safe features.
5d. Discourage unconscious action in tasks that require vigilance.

Principle 6: Low Physical Effort
The design can be used efficiently and comfortably and with a minimum of fatigue.
Guidelines:
6a. Allow user to maintain a neutral body position.
6b. Use reasonable operating forces.
6c. Minimize repetitive actions.
6d. Minimize sustained physical effort.

Principle 7: Size and Space for Approach and Use
Appropriate size and space is provided for approach, reach, manipulation, and use regardless of user's body size, posture, or mobility.
Guidelines:
7a. Provide a clear line of sight to important elements for any seated or standing user.
7b. Make reach to all components comfortable for any seated or standing user.
7c. Accommodate variations in hand and grip size.
7d. Provide adequate space for the use of assistive devices or personal assistance.

Copyright © 1997 by North Carolina State University. Major funding provided by the National Institute on Disability and Rehabilitation Research, U.S. Department of Education.

4.4 EXAMPLES OF THE PRINCIPLES OF UNIVERSAL DESIGN

It is useful to illustrate the Principles of Universal Design with examples. Each of the designs presented here demonstrates a good application of one of the guidelines associated with the principles. The design solutions included here are not necessarily universal in every respect, but each is a good example of a specific guideline and helps to illustrate its intent.

Principle 1: Equitable Use
The design is useful and marketable to people with diverse abilities.
• In other words, designs should appeal to diverse populations and offer everyone a comparable and nonstigmatizing way to participate.

The water play area in a children's museum shown in Fig. 4.1 simulates a meandering brook and invites enjoyment for everyone in and around the water. It is appealing to and usable by people who are short or tall, young children or older adults.

Principle 2: Flexibility in Use
The design accommodates a wide range of individual preferences and abilities.
• In other words, designs should provide for multiple ways of doing things.

Adaptability is one way to make designs universally usable. A medical examination table that adjusts in height can be lowered so that it is easier for the patient to get onto and off of the table; it can also be raised so that it is easier for the health care provider to examine and treat the patient at a level that is most effective and comfortable for him or her and the specific procedure (Fig. 4.2).

Principle 3: Simple and Intuitive Use
Use of the design is easy to understand, regardless of the user's experience, knowledge, language skills, or current concentration level.


FIGURE 4.1 “Meandering brook” at children’s museum invites participation. Long description: The photo shows a water play area in the sunshine outside a children’s museum. It has multiple pools of varying heights that have curving side walls. The water in the pools cascades from one pool to the next. A number of plastic balls float in the pools. Many people are playing in the water and with the balls. Some people are standing, bending over, or sitting next to the pools. Some of the children are standing or crouching in the pools.

• In other words, make designs work in expected ways. The prototype electronic thermostat designed at the Center for Universal Design provides information in visual, audible, and tactile formats (Fig. 4.3). The functions are clearly laid out and labeled; readouts are provided in both digital format (visible) and analog format (visible and tactile); and the thermostat’s voice output (audible) helps users know what is happening when they push the buttons. For example, when the user presses one of the directional keys, which has a raised tactile arrow, the thermostat announces “72 degrees.” If the user keeps the down-pointing arrow depressed, the thermostat will count down: “71, 70, 69, 68, . . .” When the user lets go, the thermostat repeats “68 degrees.” The other control buttons would also trigger the thermostat to speak.
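As a toy illustration of this redundant, multimodal feedback (not the Center for Universal Design prototype's actual firmware; the display and speech calls below are stand-ins for hardware), the down-arrow behavior could be sketched as follows.

class TalkingThermostat:
    # Sketch: every setting change updates the visual readouts and is also
    # spoken aloud, so the same information reaches users through sight,
    # touch (raised arrows), and hearing.
    def __init__(self, setpoint=72):
        self.setpoint = setpoint

    def hold_down_arrow(self, steps=1):
        # while the down-pointing arrow is held, count down one degree at a
        # time, announcing each value ("71, 70, 69, ...")
        for _ in range(steps):
            self.setpoint -= 1
            self.speak(str(self.setpoint))
        # on release, repeat the final setting in full and refresh the displays
        self.speak(f"{self.setpoint} degrees")
        self.refresh_displays()

    def refresh_displays(self):
        print(f"[digital readout] {self.setpoint} degrees")   # stands in for LCD and analog needle

    def speak(self, text):
        print(f"[voice output] {text}")                       # stands in for speech hardware

thermostat = TalkingThermostat(72)
thermostat.hold_down_arrow(steps=4)   # announces 71, 70, 69, 68, then "68 degrees"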


FIGURE 4.2 Height-adjustable exam table suits a wide range of patient and health care professional users.

Long description: The photo shows the side view of a height-adjustable medical examination table. The table is set to its lowest height and is in a flat position. A man sits on the end of the table with his hands on the surface and feet flat on the floor. A walker stands on the floor in front of him.

FIGURE 4.3 Prototype thermostat design by the Center for Universal Design. Long description: The photo shows a computer rendering of a prototype thermostat and its remote control. The left end of the thermostat is a semicircular, high-contrast analog gauge that displays the temperature with a needle that points to a value between 45° when straight down to 95° when straight up. To the right, in the middle of the thermostat body, is a large digital display that shows 72°. Above the display is a large tactile button with an upward-pointing triangle, and below it is a button with a downward-pointing triangle. In the upper corner of the right edge of the thermostat is a large, round, tactile button labeled H for heat, and in the lower corner is a corresponding button labeled C for cool. Between these two buttons are two other buttons, labeled FAN and OFF. The remote control has only two large, tactile buttons, grouped closer to one end, with triangles that point up and down.


FIGURE 4.4 Subway fare machines with high-contrast and tactile lettering and audible output on demand. Long description: The photos show close-up views of a subway fare vending machine. The machine surface is black, information on it is presented in white uppercase and lowercase lettering and raised black uppercase lettering, and it has a button that is labeled “Push for audio.”

Principle 4: Perceptible Information
The design communicates necessary information to the user, regardless of ambient conditions or the user's sensory abilities.
• In other words, designs should provide for multiple modes of output.

The subway fare machine shown in Fig. 4.4 provides tactile lettering in all-capital letters, which is easier to feel with the fingertips, and high-contrast printed lettering in capital and lowercase letters, which is easier to see with low vision. The fare machine also offers a push button for selecting instructions to be presented audibly for users with vision impairments. Redundant audible feedback is also helpful for people with disabilities that affect cognitive processing.

Principle 5: Tolerance for Error
The design minimizes hazards and the adverse consequences of accidental or unintended actions.
• In other words, designs should make it difficult for users to make a mistake; but if users do, the error should not result in injury to the person or the product.

The dead-man switch, activated by a secondary bar that runs parallel to the handle of some power lawn mowers (Fig. 4.5), requires the user to squeeze the bar and the handle together to make the mower blade spin. If the two are not held together, the blade stops turning.

Principle 6: Low Physical Effort
The design can be used efficiently and comfortably and with a minimum of fatigue.
• In other words, designs should minimize strain and overexertion.


FIGURE 4.5 The "dead-man" switch on a lawn mower handle requires conscious use.

Long description: The photo shows the handle portion of a lawnmower. The handle is made of metal tubing, which is bent down on the sides and attaches to the mower body. A U-shaped bar is connected to the tubing on the sides near the handle and is spring-loaded to rest in a position away from the handle. The user must squeeze the bar against the handle for the mower blade to operate.

FIGURE 4.6 Computer hardware can be configured with a microphone to work with voice recognition software. Long description: The photo shows a laptop computer setup with multiple peripheral devices. The computer is open, and a microphone sits to the right side.


FIGURE 4.7 Lowering one section of the nurses’ station counter in a hospital suits the needs of visitors who are shorter or seated in a wheelchair or scooter. Long description: The photo shows a nurses’ station in a hospital. Most of the counter is raised to standing elbow height, but one section in the middle is cut away, which exposes the desk surface behind. A woman sits at the desk, interacting with a small girl in front of the station who is wearing a hospital gown.

A microphone and voice recognition software on a computer (Fig. 4.6) eliminate the need for highly repetitive keystrokes or manual actions of any kind. This feature accommodates disabilities of the hand and also reduces repetitive stress injuries to the hand.

Principle 7: Size and Space for Approach and Use
Appropriate size and space is provided for approach, reach, manipulation, and use regardless of the user’s body size, posture, or mobility.
• In other words, designs should accommodate variety in people’s body sizes and ranges of motion.
Reception desks that have counters at multiple heights (Fig. 4.7), for example, 28-in. high for sitting and 36-in. high for standing, accommodate people of varying heights, postures, and preferences.

4.5 CONCLUSION

The efforts described in this chapter to create a set of Principles of Universal Design were an attempt to articulate a concept that embraces human diversity and applies to all design specialties. It is important to recognize, however, that while the principles are useful, they offer only a starting point for the universal design process. By its nature, any design challenge can be successfully addressed through multiple solutions. Choosing the most appropriate design solution requires an understanding of and negotiation among inevitable tradeoffs in accessibility and usability. This demands a commitment to soliciting user input throughout the design process. It is essential to involve representative users in evaluating designs during the development process to ensure that the needs of the full diversity of potential users have been addressed.

The Principles of Universal Design helped to articulate and describe the different aspects of universal design. The principles’ purpose was to guide others, and in spite of their general nature, they have proved useful in shaping projects of various types all over the world. It is the author’s hope that they will continue to support and inspire ongoing advancement in the field of universal design.

4.6 BIBLIOGRAPHY

Center for Universal Design, Universal Design Exemplars (CD-ROM), Raleigh, N.C.: North Carolina State University, 2000a.
——, Universal Design Performance Measures for Products, Raleigh, N.C.: North Carolina State University, 2000b.
Connell, B. R., M. L. Jones, R. L. Mace, J. L. Mueller, A. Mullick, E. Ostroff, J. Sanford, et al., The Principles of Universal Design, Version 2.0, Raleigh, N.C.: Center for Universal Design, North Carolina State University, 1997.
Mueller, J. L., Case Studies on Universal Design, Raleigh, N.C.: Center for Universal Design, North Carolina State University, 1997.
Story, M. F., J. L. Mueller, and R. L. Mace, The Universal Design File: Designing for People of All Ages and Abilities, Raleigh, N.C.: Center for Universal Design, North Carolina State University, 1998.
——, Images of Universal Design Excellence, Takoma Park, Md.: Universal Designers and Consultants, 1996.


The GOMS Family of User Interface Analysis Techniques: Comparison and Contrast Bonnie E. John Computer Science, Psychology and the Human-Computer Interaction Institute Carnegie Mellon University Pittsburgh, PA 15213 Phone: (412) 268-7182 Email: [email protected]

& David E. Kieras Department of Electrical Engineering and Computer Science University of Michigan Advanced Technology Laboratory Building 1101 Beal Avenue Ann Arbor, MI 48109-2110 Phone: (313) 763-6739 Email: [email protected]

10 June 1996

*** To appear in ACM ToCHI ***

Copyright © 1996 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or [email protected].


Keywords: GOMS, cognitive modeling, usability engineering


ABSTRACT

Since the publication of The psychology of human-computer interaction (Card, Moran & Newell, 1983), the GOMS model has been one of the most widely known theoretical concepts in HCI. This concept has produced several GOMS analysis techniques that differ in appearance and form, underlying architectural assumptions, and predictive power. This paper compares and contrasts four popular variants of the GOMS family (the Keystroke-Level Model, the original GOMS formulation, NGOMSL, and CPM-GOMS) by applying them to a single task example.

1. INTRODUCTION

Since the publication of The psychology of human-computer interaction (Card, Moran & Newell, 1983, hereafter, CMN), GOMS analysis has been one of the most widely known theoretical concepts in HCI. The GOMS concept, that it is useful to analyze knowledge of how to do a task in terms of Goals, Operators, Methods, and Selection rules, provided the stimulus for much research that verifies and extends the original work. Today, there are several variants of the GOMS analysis technique, and many applications of the technique in real-world design situations (John & Kieras, in press). However, the clear differences between these techniques can create confusion about how they relate to each other and to the original concept. The purpose of this paper is to compare several of the popular variants, demonstrating and discussing their similarities and differences.

This paper is not a tutorial in how to use any version of GOMS; that information is elsewhere in textbooks, handbooks and tutorial notes (CMN; John & Gray, 1995; Kieras, 1988, in press). It is also not a guide for deciding when to use the variants of GOMS in a particular design situation; that information is in the preceding paper (John & Kieras, in press). This paper presents how different GOMS techniques are related.

We will examine four variants of GOMS: the simplest version presented by Card, Moran and Newell, called the Keystroke-Level Model (KLM); the original formulation of GOMS, which we will refer to as CMN-GOMS; a more rigorous version called NGOMSL; and a version that can model overlapping human activities, CPM-GOMS. To make the comparison, we analyze the same task in each of these variants, then discuss the qualitative and quantitative similarities and differences.

1.1. The example task

Throughout this presentation, we use a single example task, and present how each GOMS technique represents this task. A GOMS model can, and should, start at a high level of a task such as collaboratively writing a research paper with a co-author. At such a high level, the subtasks involve many different applications: a word processor to actually write the paper, a graphics application to make figures, bibliographies to look up related work, e-mail to send the drafts back and forth, the operating system used to manage the files, and so forth. This wide inclusion of applications, many of which were not designed with the others in mind, gives a broad perspective on the knowledge people need to accomplish such a complex task. GOMS models can then show how knowledge transfers from one application to another, or how much additional time is spent moving information between applications that do not fit together well. However, presenting such a broad task is impossible within the confines of this article, so we will present a very small part of it, editing a paragraph in a word-processor (Figure 1), and make reference to the larger task as appropriate.

Text-editing was the original task used in the development of GOMS, and is of course still an important task domain. However, it is incorrect to assume that GOMS is somehow limited to text-editing; GOMS is much more general. In fact, nine cases presented in John & Kieras (in press) concern task domains radically different from text editing, as does other published work (e.g., Beard, Smith, & Denelsbeck, in press; Vera & Rosenblatt, 1995). But this familiar domain, with task goals and typical procedures familiar to all readers, makes the best example context to present and compare the different GOMS techniques.

Before presenting the analyses, we define each of the components of the GOMS model and discuss an important distinction between two forms that GOMS models take. In order to understand GOMS models that have arisen in the last decade and the relationships between them, an analyst must understand each of the components of the model (goals, operators, methods, and selection rules), the concept of level of detail, and the different computational forms that GOMS models take. In this section, we will define each of these concepts; in subsequent sections we will categorize existing GOMS models according to these concepts.

Figure 1. The example task: editing a marked-up manuscript.

1.2. Definitions of GOMS Components

1.2.1. Goals. Goals are what the user has to accomplish. The common-sense meaning of the term applies here; a goal is the "end towards which effort is directed" (Webster's, 1977, p. 493). In the collaborative writing example mentioned above, the highest-level goal is to write the paper. Goals are often broken down into sub-goals; all of the subgoals must be accomplished in order to achieve the overall goal. Some subgoals for collaborative writing might be to format the bibliography, send the current draft to the second author, or incorporate marked-up comments into the text file (Figure 1). Expanding the latter, the subgoal could be EDIT-MANUSCRIPT and its subgoals might be MOVE-TEXT, DELETE-PHRASE and INSERT-WORD. All of the subgoals must be accomplished to accomplish the higher-level goal. Goals and sub-goals are often arranged hierarchically, but a strict hierarchical goal structure is not required. In particular, some versions of GOMS models allow several goals to be active at once, and some versions represent extremely well-practiced behavior in a "flattened" structure that does not contain an explicit hierarchy of subgoals.

1.2.2. Operators. An operator is an action performed in service of a goal. Operators can be perceptual, cognitive, or motor acts, or a composite of these. Operators can change the user's internal mental state or physically change the state of the external environment. The important parameters of operators, in particular execution time, are assumed to be independent of how the user or the system got into the current state (i.e., independent of the history of operators). Execution time may be approximated by a constant, by a probability distribution, or by a function of some parameter. For instance, the time to type a word might be approximated by a constant (e.g., the average time for an average word by an average typist), or a statistical distribution, or by a function involving the number of letters in the word and the time to type a single character (which could, in turn, be approximated by a constant or a distribution). The accuracy of execution time predictions obtained from a GOMS model depends on the accuracy of this assumption and on the accuracy of the duration estimates. In our text-editing example, with the goal-hierarchy defined above, some operators could be MOVE-MOUSE, CLICK-MOUSE-BUTTON, SHIFT-CLICK-MOUSE-BUTTON and HIT-DELETE-KEY.

1.2.3. Methods. Methods are sequences of operators and subgoal invocations that accomplish a goal. If the goals have a hierarchical form, then there is a corresponding hierarchy of methods. The content of the methods depends on the set of possible operators and on the nature of the tasks represented. One method for accomplishing the goal DELETE-PHRASE (in the text editor we are using to write this paper) would be to MOVE-MOUSE to the beginning of the phrase, CLICK-MOUSE-BUTTON, MOVE-MOUSE to the end of the phrase, SHIFT-CLICK-MOUSE-BUTTON, and finally, HIT-DELETE-KEY (the mark-and-delete method).

1.2.4. Selection rules. There is often more than one method to accomplish a goal. Instead of the mark-and-delete method just described, another method for accomplishing the DELETE-PHRASE goal in Figure 1 would be MOVE-MOUSE to the end of the phrase, CLICK-MOUSE-BUTTON, and HIT-DELETE-KEY 11 times (the delete-characters method). If there is more than one method applicable to a goal, then selection rules are necessary to represent the user's knowledge of which method should be applied. Typically such rules are based on specific properties of the task instance. Selection rules can arise through a user's personal experience with the interface or from explicit training. For example, a user may have a rule for the delete-phrase goal that says if the phrase is more than eight characters long, then use the mark-and-delete method, otherwise use the delete-characters method.

1.2.5. Goals vs. Operators: Level of Detail. It is important to clarify a common point of confusion about goals and operators. The distinction is strictly one of the required level of detail: The difference between a goal and an operator in a GOMS analysis is merely a matter of the level of detail chosen by the analyst. For a goal, the analyst provides a method that uses lower-level operators to specify the details of how it is to be accomplished; in contrast, operators are not broken down any further. That is, an analyst will decide that certain user activities do not need to be "unpacked" into any more detail, and thus will represent them as operators, while other activities do need to be considered in more detail so the analyst will represent these in terms of goals with their associated methods. Thus, any particular GOMS analysis assumes a certain grain of analysis, a "stopping point" in the level of detail, chosen to suit the needs of the analysis.
Continuing the text-editing example, a GOMS analysis could have only one goal (EDIT-MANUSCRIPT) and a few high-level operators (e.g., MOVE-TEXT, DELETE-PHRASE and INSERT-WORD). Or, if the design situation required a finer level of detail, the analysis could have four goals (EDIT-MANUSCRIPT, with MOVE-TEXT, DELETE-PHRASE and INSERT-WORD as subgoals) and finer-grained operators like MOVE-CURSOR, CLICK-MOUSE-BUTTON, DOUBLE-CLICK-MOUSE-BUTTON, SHIFT-CLICK-MOUSE-BUTTON and HIT-DELETE-KEY to accomplish these goals.
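To make these component definitions concrete, the following sketch (our own encoding, not notation from the paper) expresses the DELETE-PHRASE example in ordinary code: operators carry assumed keystroke-level time estimates, methods are operator sequences, and a selection rule picks a method from a property of the task instance. Mental operators are omitted for simplicity, and the duration values are illustrative.

# Illustrative encoding of GOMS components for the DELETE-PHRASE example.
# Operator durations are assumed keystroke-level values (seconds).
OPERATOR_TIMES = {
    "MOVE-MOUSE": 1.10,
    "CLICK-MOUSE-BUTTON": 0.20,
    "SHIFT-CLICK-MOUSE-BUTTON": 0.48,
    "HIT-DELETE-KEY": 0.28,
}

def mark_and_delete(task):
    # Method: mark the phrase, then delete it with one keystroke.
    return ["MOVE-MOUSE", "CLICK-MOUSE-BUTTON",
            "MOVE-MOUSE", "SHIFT-CLICK-MOUSE-BUTTON", "HIT-DELETE-KEY"]

def delete_characters(task):
    # Method: click at the end of the phrase and delete one character at a time.
    return ["MOVE-MOUSE", "CLICK-MOUSE-BUTTON"] + \
           ["HIT-DELETE-KEY"] * task["phrase_length"]

def select_method(task):
    # Selection rule: phrases longer than eight characters use mark-and-delete.
    return mark_and_delete if task["phrase_length"] > 8 else delete_characters

def accomplish_delete_phrase(task):
    # Accomplish the goal: pick a method, expand it to operators, sum the times.
    operators = select_method(task)(task)
    return operators, sum(OPERATOR_TIMES[op] for op in operators)

operators, seconds = accomplish_delete_phrase({"phrase_length": 11})

The point of the sketch is only the division of labor: goals name what must be accomplished, methods expand goals into operators, and the selection rule encodes the user's knowledge of when to use which method.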

In principle, the goals and operators of a task could be described at much higher levels (e.g., collaboratively writing a paper) or ever-deeper levels of detail, down to muscle group twitches. However, at any stopping point, the analyst must be sure that it is reasonable to assume that the execution times of the lowest-level operators (primitive operators) are constant regardless of the surrounding context (or are a constant function of some given parameter). The times can be estimated from data, often from previous similar tasks found in the literature, and used to predict performance on new tasks. The dual constraints that primitive operators be context-free and already estimated lead most GOMS models to stop at the command or keystroke level. It is not necessary to bring all parts of an analysis down to the same level of primitive operators. In many design situations, different parts of a system or different user tasks may require different levels of scrutiny, and GOMS allows such selective detail of analysis. Starting from the high-level user goals, the analyst expands only those parts of the goal hierarchy as necessary for the questions at hand. Other parts can be expanded later as other questions arise. For instance, in collaborative writing, a GOMS analyst might first choose to expand all the search functions in the different applications (word-processor, bibliography, e-mail), and weeks later expand on the spell-checking functions. Thus, decomposing goals into sub-goals and primitive operators is a very flexible analysis tool that suits many design situations.

1.3. Form of a GOMS Model

Different GOMS models in the literature differ substantially in the basic form and appearance of their methods. There are two basic forms: the program form and the sequence form.

1.3.1. Program form. A GOMS model in program form is analogous to a parameterized computer program. The methods take any admissible set of task parameters and will execute the corresponding instance of the task correctly. For example, if the mark-and-delete method described above was represented in program form, it would take as task parameters the starting and ending locations of the to-be-deleted phrase, and when executed, would move the mouse to the corresponding locations. Thus, a GOMS model in program form describes how to accomplish a general class of tasks, with a specific instance of the class being represented by a set of values for the task parameters. Typically, such a model will explicitly contain some form of conditional branching and invocations of submethods to accomplish subgoals. The procedural knowledge represented in program form is fixed, but the execution pathway and sequence of operators through the task will depend on the specific properties of the task instance. Once the model is defined, all of the possible tasks can be covered by different execution pathways through the model. Thus, a program form model is a compact, generative [1] description that explicitly represents the knowledge of what features of the task environment the user should attend to and how the user should operate the system to accomplish the task goals. The program form has the advantage that all procedural knowledge is visible to the analyst. In addition, if many task instances need to be analyzed, the generative nature of the program form allows those tasks to be instantiated quickly, especially if implemented in a running computer program.

[1] The term generative is used analogously to its sense in formal linguistics. The syntax of a language can be represented compactly by a generative grammar, a set of rules for generating all of the grammatical sentences in the language.

However, program form has two disadvantages. First, the only way to determine the sequence of operators used in a task instance is to run the model (either by hand or machine) and obtain a trace of the method execution steps. Second, defining and expressing a complete and accurate program form model can be quite time consuming, especially if it is represented as a machine-executable model.

1.3.2. Sequence form. In contrast, the methods in a sequence-form GOMS model contain a fixed sequence of operators for accomplishing a particular task instance. There may be some conditionality and parameters included in the sequence model. For instance, in the text-editing example above, listing the exact operators necessary to delete the phrase indicated in Figure 1 is a GOMS model in sequence form (e.g., MOVE-MOUSE, CLICK-MOUSE-BUTTON, 11*HIT-DELETE-KEY). A more general sequence model would take the number of characters in the phrase as a parameter and contain an implicit iteration. For example, for the delete-characters method, there would be a MOVE-MOUSE operator, a CLICK-MOUSE-BUTTON operator, and then the HIT-DELETE-KEY operator would be repeated until there were no more characters in the phrase. The advantages and disadvantages of the sequence form are the inverse of the program form. That is, the analyst does not have to explicitly define the procedural knowledge for every possible task situation in program-like detail, and the sequence of operators is clearly visible to the analyst. But there may be more information about the structure of the methods than can be captured by the operator sequences for a set of task instances; such unrepresented aspects will not be inspectable. Finally, even though listing the operator sequence for an individual task instance is usually easy, if a large number of task instances are involved, it could be time-consuming to construct and evaluate the corresponding large number of sequence-form models.
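The contrast between the two forms can be sketched in code (our illustration, not the paper's notation): the program form is a parameterized procedure that must be run to reveal the operator sequence for an instance, while the sequence form is simply that listing, fixed in advance. The coordinate values below are made up.

# Program form: a parameterized procedure; any instance of the task class is
# handled by supplying task parameters (here, the phrase boundaries).
def mark_and_delete(start_location, end_location):
    return [("MOVE-MOUSE", start_location),
            ("CLICK-MOUSE-BUTTON",),
            ("MOVE-MOUSE", end_location),
            ("SHIFT-CLICK-MOUSE-BUTTON",),
            ("HIT-DELETE-KEY",)]

# Sequence form: a fixed operator listing for one particular task instance,
# the 11-character phrase of Figure 1 done with the delete-characters method.
SEQUENCE_FORM = ["MOVE-MOUSE", "CLICK-MOUSE-BUTTON"] + ["HIT-DELETE-KEY"] * 11

# The program form must be executed (traced) to see the operators for an
# instance; the sequence form shows them directly but covers only that instance.
trace = mark_and_delete(start_location=(3, 12), end_location=(3, 23))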

2. COMPARISON OF GOMS TASK ANALYSIS TECHNIQUES

We now apply each technique to the example task, discuss the underlying architectural basis and the ensuing constraints for each technique, and compare and contrast the analysis with that of the other GOMS variants.

2.1. The Keystroke-Level Model (KLM)

The Keystroke-Level Model (KLM) is the simplest GOMS technique (Card, Moran, & Newell, 1980a; CMN, Ch. 8). To estimate execution time for a task, the analyst lists the sequence of operators and then totals the execution times for the individual operators. In particular, the analyst must specify the method used to accomplish each particular task instance. Other GOMS techniques discussed below predict the method given the task situation, but the KLM does not. Furthermore, the specified methods are limited to being in sequence form and containing only keystroke-level primitive operators. Given the task and the method, the KLM uses preestablished keystroke-level primitive operators to predict the time to execute the task. The original KLM presentation included six types of operators: K to press a key or button, P to point with a mouse to a target on a display, H to home hands on the keyboard or other device, D to draw a line segment on a grid, M to mentally prepare to do an action or a closely-related series of primitive actions, and R to represent the system response time during which the user has to wait for the system. Each of these operators has an estimate of execution time, either a single value, a parameterized estimate (e.g., K is dependent on typing speed and whether a key or mouse button click, press, or release is involved), or a simple approximating function.

As presented in CMN, the KLM technique includes a set of five heuristic rules for placing mental operators to account for mental preparation time during a task that requires several physical operators. For example, Rule 0 reads "Insert M's in front of all K's that are not part of argument strings proper (e.g., text or numbers). Place M's in front of all P's that select commands (not arguments)." (CMN, p. 265) Subsequent research has refined these six primitive operators, improving the time estimates or differentiating between different types of mental operations (Olson & Olson, 1990). Practitioners often tailor these operators or define new ones to suit their particular user group and interface requirements (e.g., Haunold & Kuhn, 1994). In addition, the heuristics for placing mental operators have been refined for specific types of subtasks (e.g., for making a fixed series of menu choices, Lane, Napier, Batsell & Naman, 1993). Since the original heuristic rules were created primarily for command-based interfaces, they had to be updated for direct manipulation interfaces. Thus, heuristic Rule 0 should be expanded to read, "Insert M's in front of all K's that are not part of argument strings proper (e.g., text or numbers). Place M's in front of all P's that select commands (not arguments) or that begin a sequence of direct-manipulation operations belonging to a cognitive unit [2]."

2.1.1. Architectural basis and constraints
The KLM is based on a simple underlying cognitive architecture, a serial stage model of human information-processing in which one activity is done at a time until the task is complete. All of the human information-processing activity is assumed to be contained in the primitive operators, including internal perceptual and cognitive actions, which are subsumed by black-box Mental (M) operators. This restricts the KLM to tasks that can be usefully approximated by a series of operators, with no parallel activities, no interruptions, and no interleaving of goals. Luckily, many single-user computer tasks are usefully approximated with these restrictions. However, these restrictions, along with primitive operators defined to be at the keystroke level, make the KLM impractical for representing an entire high-level task like collaboratively writing a research paper. The next two GOMS variants are more able to handle that task.

2.1.2. Example KLM
Figure 2 provides a sample KLM for moving the circled phrase in Figure 1. To construct this model, we used heuristics for placing Ms that have been updated for mouse-based interfaces (CMN, p. 265 and above) and the original operator types and times supplied in CMN (p. 264). Figure 2 also includes illustrative observations that an analyst might make about the model. Quantitatively, the KLM makes the prediction that this task will take about 14 seconds. Qualitatively, the analyst can use the model to highlight several ideas. The subgoal structure is not explicit in the KLM itself, but an analyst can see it in the model (as annotated) and use it to look for recurring subprocedures that might be combined or shortened. For instance, the analyst has made an annotation to consider a MOVE command instead of CUT and PASTE. A KLM for MOVE would show what time savings this would provide, which could then be weighed against other considerations like users' prior knowledge or other functionality (e.g., the ability to paste multiple copies). Considering the sub-goal structure is an important use of all GOMS versions, and the next two variants will make it explicit in the model itself.

[2] The concept of a cognitive unit is discussed in CMN, p. 268.

Moving text with the MENU-METHOD

Description                                              Operator   Duration (sec)
Mentally prepare by Heuristic Rule 0                        M           1.35
Move cursor to beginning of phrase (no M by Rule 1)         P           1.10
Click mouse button (no M by Rule 0)                         K           0.20
Move cursor to end of phrase (no M by Rule 1)               P           1.10
Shift-click mouse button (one average typing K)             K           0.28
  (one mouse button click K)                                K           0.20
Mentally prepare by Heuristic Rule 0                        M           1.35
Move cursor to Edit menu (no M by Rule 1)                   P           1.10
Press mouse button                                          K           0.10
Move cursor to Cut menu item (no M by Rule 1)               P           1.10
Release mouse button                                        K           0.10
Mentally prepare by Heuristic Rule 0                        M           1.35
Move cursor to insertion point                              P           1.10
Click mouse button                                          K           0.20
Mentally prepare by Heuristic Rule 0                        M           1.35
Move cursor to Edit menu (no M by Rule 1)                   P           1.10
Press mouse button                                          K           0.10
Move cursor to Paste menu item (no M by Rule 1)             P           1.10
Release mouse button                                        K           0.10
TOTAL PREDICTED TIME                                                   14.38

Figure 2. A Keystroke-Level Model for moving the text in Figure 1. The notes on the right represent hand-written notes an analyst might add to the KLM to highlight ideas.
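The arithmetic in Figure 2 is easy to mechanize. The sketch below is our own illustration, using the operator durations shown in the figure; it only sums a hand-specified operator listing and does not implement the M-placement heuristics, which remain the analyst's job.

# KLM execution-time estimate for the move-text task in Figure 2.
# Durations are the values used in the figure; the operator listing,
# including M placements, is supplied by the analyst.
OPERATOR_DURATIONS = {
    "M": 1.35,          # mentally prepare
    "P": 1.10,          # point with the mouse
    "K-click": 0.20,    # mouse button click
    "K-press": 0.10,    # mouse button press or release alone
    "K-type": 0.28,     # one average keystroke
}

MOVE_TEXT_WITH_MENUS = [
    "M", "P", "K-click",                  # select start of phrase
    "P", "K-type", "K-click",             # shift-click end of phrase
    "M", "P", "K-press", "P", "K-press",  # Edit menu, Cut item
    "M", "P", "K-click",                  # click at insertion point
    "M", "P", "K-press", "P", "K-press",  # Edit menu, Paste item
]

predicted = sum(OPERATOR_DURATIONS[op] for op in MOVE_TEXT_WITH_MENUS)
print(f"Predicted execution time: {predicted:.2f} sec")   # 14.38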

2.2. Card, Moran, & Newell GOMS (CMN-GOMS)

CMN-GOMS is the term we use to refer to the form of GOMS model presented in CMN (Ch. 5; Card, Moran, & Newell, 1980b). CMN-GOMS has a strict goal hierarchy. Methods are represented in an informal program form that can include submethods and conditionals. A CMN-GOMS model, given a particular task situation, can thus predict both operator sequence and execution time. CMN do not describe the CMN-GOMS technique with an explicit "how to" guide, but their presentation of nine models at different levels of detail illustrates a breadth-first expansion of a goal hierarchy until the desired level of detail is attained. CMN report results in which such models predicted operator sequences and execution times for text editing tasks, operating systems tasks, and the routine aspects of computer-aided VLSI layout. These examples are sufficiently detailed and extensive that researchers have been able to develop their own CMN-GOMS analyses (e.g., Lerch, Mantei, & Olson, 1989).


2.2.1. Architectural basis and constraints
In the context of the CMN book, it would appear that CMN-GOMS is based on the Model Human Processor (MHP), a simple conventional model of human information processing with parallel stages described by CMN (Ch. 2, and summarized in section 2.6.1 below). But in fact CMN do not establish this relationship, and do not derive the GOMS concept from the specific properties of the MHP. Rather, CMN-GOMS is based on two of the MHP "Principles of Operation", the Rationality Principle and the Problem Space Principle, both of which were developed in the problem-solving theoretical literature (e.g., Newell & Simon, 1972; see CMN Ch. 11). The Problem Space Principle postulates that a user's activity can be characterized as applying a sequence of actions, called operators, to transform an initial state into a goal state. With experience, the sequence of operators to accomplish a goal no longer has to be inferred; rather the sequence, termed a method, can be routinely recalled and executed when the same goal situation is recognized (CMN Ch. 11). The Rationality Principle asserts that users will develop methods that are efficient, given the structure of the task environment (i.e., the design of the system) and human processing abilities and limitations. Thus, human activity with a computer system can be viewed as executing methods to accomplish goals, and because humans strive to be efficient, these methods are heavily determined by the design of the computer system. This means that the user's activity can be predicted to a great extent from the system design. Thus, constructing a GOMS model based on the task and the system design can predict useful properties of the human interaction with a computer.

CMN-GOMS, like the KLM, is based on the simple serial stage architecture, and even though it has program methods and goal structure, no further assumptions are made about how these methods are executed or represented. Consequently, CMN-GOMS is easy to write, but the lack of an explicit description of the method representation and mechanisms involved in task execution means that CMN-GOMS models are relatively vague and unspecified compared to the next two GOMS techniques.

2.2.2. Example CMN-GOMS model
Because the goal hierarchy is explicitly represented, a CMN-GOMS model could start at the level of collaboratively writing a research paper, with subgoals like SEND-DRAFT-TO-CO-AUTHOR, FORMAT-BIBLIOGRAPHY, or EDIT-MANUSCRIPT. Figure 3 displays only those goals and operators at and below the EDIT-MANUSCRIPT subgoal. It includes details for the MOVE-TEXT subgoal and illustrative analyst annotations. Moving is accomplished by first cutting the text and then pasting it. Cutting is accomplished by first selecting the text, and then issuing the CUT command. As specified by a selection rule, selecting the text can be done in two different ways, depending on the nature of the text to be selected. Finally, pasting requires selecting the insertion point, and then issuing the PASTE command.

Quantitatively, CMN-GOMS models predict the operator sequence and execution time. Qualitatively, CMN-GOMS models focus attention on methods to accomplish goals; similar methods are easy to see, and unusually short or long methods jump out (as annotated) and can spur design ideas. In addition, the annotations indicate that this analyst has observed that the VERIFY operator explicitly records points of feedback to the user.

2.2.3. Comparison to the KLM
A major difference between the KLM and the CMN-GOMS models is that CMN-GOMS is in program form; therefore, the analysis is general and executable. That is, any instance of the


GOAL: EDIT-MANUSCRIPT
. GOAL: EDIT-UNIT-TASK                          ...repeat until no more unit tasks
. . GOAL: ACQUIRE UNIT-TASK                     ...if task not remembered
. . . GOAL: TURN-PAGE                           ...if at end of manuscript page
. . . GOAL: GET-FROM-MANUSCRIPT
. . GOAL: EXECUTE-UNIT-TASK                     ...if a unit task was found
. . . GOAL: MODIFY-TEXT
. . . . [select: GOAL: MOVE-TEXT*               ...if text is to be moved
. . . .          GOAL: DELETE-PHRASE            ...if a phrase is to be deleted
. . . .          GOAL: INSERT-WORD]             ...if a word is to be inserted
. . . . VERIFY-EDIT

*Expansion of MOVE-TEXT goal                    Operator times for the Figure 1 task (sec)
GOAL: MOVE-TEXT
. GOAL: CUT-TEXT
. . GOAL: HIGHLIGHT-TEXT
. . . [select**: GOAL: HIGHLIGHT-WORD
. . . .          MOVE-CURSOR-TO-WORD
. . . .          DOUBLE-CLICK-MOUSE-BUTTON
. . . .          VERIFY-HIGHLIGHT
. . .  GOAL: HIGHLIGHT-ARBITRARY-TEXT
. . . .          MOVE-CURSOR-TO-BEGINNING                 1.10
. . . .          CLICK-MOUSE-BUTTON                       0.20
. . . .          MOVE-CURSOR-TO-END                       1.10
. . . .          SHIFT-CLICK-MOUSE-BUTTON                 0.48
. . . .          VERIFY-HIGHLIGHT]                        1.35
. . GOAL: ISSUE-CUT-COMMAND
. . .            MOVE-CURSOR-TO-EDIT-MENU                 1.10
. . .            PRESS-MOUSE-BUTTON                       0.10
. . .            MOVE-MOUSE-TO-CUT-ITEM                   1.10
. . .            VERIFY-HIGHLIGHT                         1.35
. . .            RELEASE-MOUSE-BUTTON                     0.10
. GOAL: PASTE-TEXT
. . GOAL: POSITION-CURSOR-AT-INSERTION-POINT
. . .            MOVE-CURSOR-TO-INSERTION-POINT           1.10
. . .            CLICK-MOUSE-BUTTON                       0.20
. . .            VERIFY-POSITION                          1.35
. . GOAL: ISSUE-PASTE-COMMAND
. . .            MOVE-CURSOR-TO-EDIT-MENU                 1.10
. . .            PRESS-MOUSE-BUTTON                       0.10
. . .            MOVE-MOUSE-TO-PASTE-ITEM                 1.10
. . .            VERIFY-HIGHLIGHT                         1.35
. . .            RELEASE-MOUSE-BUTTON                     0.10
                                 TOTAL TIME PREDICTED (SEC)  14.38

**Selection Rule for GOAL: HIGHLIGHT-TEXT: If the text to be highlighted is a single word, use the HIGHLIGHT-WORD method, else use the HIGHLIGHT-ARBITRARY-TEXT method.

Figure 3. Example of CMN-GOMS text-editing methods showing the top-level unit-task method structure, an expansion of one method, and a selection rule.


described class of tasks can be performed or simulated by following the steps in the model, which may take different paths depending on the specific task situation. Subgoal invocation and method selection are predicted by the model given the task situation, and need not be dictated by the analyst as they must for the KLM. Another major difference is that the goal-hierarchy is explicit in CMN-GOMS, while it was implicit in the KLM. Comparing Figure 3 with Figure 2 shows the relationship between CMN-GOMS and the KLM. For instance, there is a one-to-one mapping between the physical operators in the CMN-GOMS model and the Ks and Ps in the KLM. The CMN-GOMS model has other operators at this level: VERIFY-LOCATION and VERIFY-HIGHLIGHT, which are not overt physical actions. The KLM has no explicit goals or choices between goals, whereas the CMN-GOMS model represents these explicitly. Roughly, the VERIFY operators, subgoal invocations, and selection rules of the CMN-GOMS model are represented as the M operators in the KLM. That is, such operators appear in the CMN-GOMS model in groups that roughly correspond to the placement of Ms in the KLM. This is only approximately the case, as the VERIFY operators sometimes occur in the middle of a group of physical operators, but the approximation is close. Given the task specified by the manuscript in Figure 1, this model would predict the trace of operators shown with the estimates of operator times in the far right column. The estimates for the physical operators are identical to the ones in the KLM. The VERIFY-HIGHLIGHT and VERIFY-POSITION operators are assigned 1.35 sec, the same value as the KLM's M operator because this is CMN's best estimate of mental time in the absence of other information [3]. Thus, the CMN-GOMS model produces the same estimate for task completion as the KLM. Notice that the CMN-GOMS technique assigns time only to operators, not to any "overhead" required to manipulate the goal hierarchy. In their results, CMN found that time predictions were as good with the assumption that only operators contributed time to the task as they were when goal manipulation also contributed time. However, they suggested that at more detailed levels of analysis such cognitive activity might become more important. Also notice that where the KLM puts Ms at the beginning of subprocedures, the CMN-GOMS model puts the mental time in verify operators at the end of subprocedures. Since mental time is observable only as pauses between actions, it is difficult to distinguish between these two techniques empirically, and only appeals to more detailed cognitive architectures can explain the distinction. Pragmatically, however, this difference is irrelevant in most design situations. We will discuss the issue of mental time again after presenting all the GOMS techniques.
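Because the CMN-GOMS methods are in program form, they can be executed to produce both the operator trace and the time estimate in Figure 3. The sketch below is our own rendering of that idea, not CMN's notation; it uses the selection rule for HIGHLIGHT-TEXT and the operator times shown in the figure.

# Executable rendering of the MOVE-TEXT methods of Figure 3 (illustrative).
OP_TIMES = {"MOVE-CURSOR": 1.10, "CLICK": 0.20, "SHIFT-CLICK": 0.48,
            "DOUBLE-CLICK": 0.20, "PRESS": 0.10, "RELEASE": 0.10, "VERIFY": 1.35}

def highlight_text(task, trace):
    # Selection rule: a single word uses HIGHLIGHT-WORD, else HIGHLIGHT-ARBITRARY-TEXT.
    if task["text-is"] == "word":
        trace += ["MOVE-CURSOR", "DOUBLE-CLICK", "VERIFY"]
    else:
        trace += ["MOVE-CURSOR", "CLICK", "MOVE-CURSOR", "SHIFT-CLICK", "VERIFY"]

def issue_command(trace):
    # Generic menu method: open the Edit menu, select the item, release.
    trace += ["MOVE-CURSOR", "PRESS", "MOVE-CURSOR", "VERIFY", "RELEASE"]

def move_text(task):
    trace = []
    highlight_text(task, trace)                    # GOAL: CUT-TEXT
    issue_command(trace)                           #   ISSUE-CUT-COMMAND
    trace += ["MOVE-CURSOR", "CLICK", "VERIFY"]    # GOAL: PASTE-TEXT, position cursor
    issue_command(trace)                           #   ISSUE-PASTE-COMMAND
    return trace, sum(OP_TIMES[op] for op in trace)

trace, seconds = move_text({"text-is": "arbitrary"})
print(len(trace), "operators,", round(seconds, 2), "sec")   # 18 operators, 14.38 sec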

2.3. Natural GOMS Language (NGOMSL)

NGOMSL is a structured natural language notation for representing GOMS models and a procedure for constructing them (Kieras, 1988, in press). An NGOMSL model is in program form, and provides predictions of operator sequence, execution time, and time to learn the methods. An analyst constructs an NGOMSL model by performing a top-down, breadth-first expansion of the user's top-level goals into methods, until the methods contain only primitive operators, typically keystroke-level operators. Like CMN-GOMS, NGOMSL models explicitly represent the goal structure, and so can represent high-level goals like collaboratively writing a research paper.

[3] Some design situations may require, or provide opportunity for, using better estimates of specific types of mental operators. Analysts can look at the additional empirical work of CMN in Chapter 5, where they measure many specific mental times, or other HCI empirical work (e.g., John & Newell, 1987, for estimates of time to recall command abbreviations; Olson & Olson, 1990, for mental preparation in spreadsheet use).

2.3.1. Architectural basis and constraints
The NGOMSL technique refines the basic GOMS concept by representing methods in terms of a cognitive architecture called cognitive complexity theory (CCT: Kieras & Polson, 1985; Bovair, Kieras, & Polson, 1990). CCT assumes a simple serial stage architecture in which working memory triggers production rules that apply at a fixed rate. These rules alter the contents of working memory or execute primitive external operators such as making a keystroke. GOMS methods are represented by sets of production rules in a prescribed format. Learning procedural knowledge consists of learning the individual production rules. Learning transfers from a different task if the rules had already been learned (see also Anderson, 1993). CCT has been shown to provide good predictions of execution time, learning time, and transfer of procedure learning (Kieras & Bovair, 1986; Bovair, Kieras & Polson, 1988, 1990).

NGOMSL originated from attempts to define a higher-level notation to represent the content of a CCT model (Bennett, Lorch, Kieras, & Polson, 1987; Butler, Bennett, Polson, & Karat, 1989). It is a structured natural language notation in which methods are represented in program form as a list of steps which contain operators: both external keystroke-level operators and internal operators that represent operations of the CCT architectural mechanisms, such as adding and removing working memory information, or setting up subgoals. The relationship between the NGOMSL notation and the CCT architecture is direct: there is essentially a one-to-one relationship between statements in the NGOMSL language and the production rules for a GOMS model written in the CCT format. Therefore, the CCT prediction results can be used by NGOMSL models to estimate not only execution time, like the KLM and CMN-GOMS, but also the time to learn the procedures.

Although an NGOMSL analysis can provide useful descriptions of a task at many levels of analysis (Karat & Bennett, 1991), quantitative predictions of learning and execution times are meaningful only if the methods use operators that the user is assumed to already know and that have known properties. CCT and NGOMSL models have been empirically validated at the keystroke level of analysis (operators like DETERMINE-POSITION and CLICK-MOUSE-BUTTON); thus, models at that level can produce reliable quantitative estimates. In principle, other levels could be researched and empirically validated, but this has not yet been done.

Because NGOMSL models specify methods in program form, they can characterize the procedural complexity of tasks, both in terms of how much must be learned and how much has to be executed. However, the underlying simple serial stage architecture of CCT limits NGOMSL to hierarchical and sequential methods. Thus, there is no provision for representing methods whose steps could be executed in any order, or which could be interrupted and resumed. Also, there is no direct way to represent how perceptual, cognitive, and motor processing might overlap. For example, there is no provision for representing a user doing perceptual processing on an icon while simultaneously homing the hand to the mouse and doing a retrieval from long-term memory. To some extent it is possible to approximate overlapping operations by setting certain operator times to zero (as has been done in Figure 4; see Gong, 1993). Direct representation of processing overlap requires a different underlying cognitive architecture; such an approach is represented by the CPM-GOMS technique, to be discussed next.

2.3.2. Example NGOMSL model
Continuing the text-editing example, Figure 4 shows the NGOMSL methods involved in moving text. Notice that more methods are represented here than are executed in the example task instance. Quantitatively, NGOMSL provides learning time as well as execution time predictions, discussed in detail below. Qualitatively, NGOMSL provides all that KLM and CMN-GOMS provide, and

NGOMSL Statements                                                    External Operator Times (sec)

Method for goal: Move text
 Step 1. Accomplish goal: Cut text.
 Step 2. Accomplish goal: Paste text.
 Step 3. Return with goal accomplished.

Method for goal: Cut text
 Step 1. Accomplish goal: Highlight text.
 Step 2. Retain that the command is CUT, and accomplish goal: Issue a command.
 Step 3. Return with goal accomplished.

Method for goal: Paste text
 Step 1. Accomplish goal: Position cursor at insertion point.
 Step 2. Retain that the command is PASTE, and accomplish goal: Issue a command.
 Step 3. Return with goal accomplished.

Selection rule set for goal: Highlight text
 If text-is word, then accomplish goal: Highlight word.
 If text-is arbitrary, then accomplish goal: Highlight arbitrary text.
 Return with goal accomplished.

Method for goal: Highlight word
 Step 1. Determine position of middle of word.
 Step 2. Move cursor to middle of word.
 Step 3. Double-click mouse button.
 Step 4. Verify that correct text is selected.
 Step 5. Return with goal accomplished.

Method for goal: Highlight arbitrary text
 Step 1. Determine position of beginning of text.                        1.20
 Step 2. Move cursor to beginning of text.                               1.10
 Step 3. Click mouse button.                                             0.20
 Step 4. Determine position of end of text. (already known)              0.00
 Step 5. Move cursor to end of text.                                     1.10
 Step 6. Shift-click mouse button.                                       0.48
 Step 7. Verify that correct text is highlighted.                        1.20
 Step 8. Return with goal accomplished.

Method for goal: Position cursor at insertion point
 Step 1. Determine position of insertion point.                          1.20
 Step 2. Move cursor to insertion point.                                 1.10
 Step 3. Click mouse button.                                             0.20
 Step 4. Verify that correct point is flashing.                          1.20
 Step 5. Return with goal accomplished.

Method for goal: Issue a command
 Step 1. Recall command name and retrieve from LTM the menu name for it, and retain the menu name.
 Step 2. Recall the menu name, and move cursor to it on Menu Bar.        1.10
 Step 3. Press mouse button down.                                        0.10
 Step 4. Recall command name, and move cursor to it.                     1.10
 Step 5. Recall command name, and verify that it is selected.            1.20
 Step 6. Release mouse button.                                           0.10
 Step 7. Forget menu name, forget command name, and return with goal accomplished.

Predicted Pure Procedure Learning Time for 44 statements + 6 LTM chunks = 784 sec
Total Predicted Execution Time = 16.38 sec

Figure 4. An example of NGOMSL methods for moving text, showing a generic command-issuing method that uses items in long-term memory to associate menu names to the contained commands. Adapted from Kieras (in press).


more. For example, NGOMSL makes the similarity between methods explicit, i.e., all menucommands use a submethod for issuing a command. Like CMN-GOMS, VERIFY operators draw the analyst's attention to feedback. In addition, NGOMSL models explicitly represent working memory and long-term memory usage, allowing the analyst to assess the demands of the design on those cognitive resources. In this example, working memory need only store the command name and the menu name, a reasonable amount of information. This model assumes that users will have learned which commands are in which menus; if they haven't they will either systematically search through all the menus, or guess. Because these assumptions are explicit, they can be questioned, and considered in design. Learning time predictions. NGOMSL models have been shown to be good predictors of time to learn how to use a system, keeping in mind that what is predicted is the pure learning time for the procedural knowledge represented in the methods. Note that, as mentioned above, the user is assumed to already know how to execute the operators; the GOMS methods do not represent the knowledge involved in executing the operators themselves, but only represent the knowledge of which operators to apply and in what order to accomplish the goal. Innovative interface technology often results in new operators; moving the cursor with a mouse was a new operator for many users in the early 1980s, and selecting objects with an eye-movement tracker or manipulating 3D objects and flying about in virtual space with data-glove gestures will be new operators as these technologies move into the workplace. Clearly, the time to learn how to execute new operators is a critical aspect of the value of new interface devices, but a GOMS model that assumes such operators can not predict their learning time. The time to learn new operators themselves would have to be measured, or simply not included in the analysis. The total elapsed time to learn to use a system depends not only on how much procedural knowledge must be learned but on how much time it takes to complete the training curriculum itself. That is, most learning of computer use takes place in the context of the new user performing tasks of some sort, and this performance would take a certain amount of time even if the user were fully trained. Thus the total learning time consists of the time to execute the training tasks plus the extra time required to learn how to perform the tasks (the pure learning time). As Gong (1993) showed, training-task execution times can be estimated from a GOMS model of the training tasks. The key empirical result is that the procedure learning time is approximately linear with the number of NGOMSL statements that must be learned. Thus, the pure learning time for the methods themselves can be estimated just by counting the statements and multiplying by an empirically-determined coefficient. Transfer of training effects can be calculated by deducting the number of NGOMSL statements in methods that are identical, or highly similar, to ones already known to the learner (see Kieras, 1988, in press; also Bovair, Kieras, & Polson, 1988, 1990). This characterization of interface consistency in terms of the quantitative transferability of procedural knowledge is perhaps the most significant contribution of the CCT research and the NGOMSL technique. 
An important limitation of this result is that the accuracy of absolute predictions of learning time will depend on whether the analyst has followed the same "style" in writing the methods as was used to obtain the empirical coefficient. This uncertainty can be dealt with by performing relative comparisons using models written in a consistent style. Further work is needed to describe and document a style for analysts to follow that will yield consistently accurate absolute predictions of learning time. An additional component of the pure learning time is the time required to memorize chunks of declarative information required by the methods, such as the menu names under which commands are found. Such items are assumed to be stored in long-term memory (LTM), and while not strictly part of the GOMS methods, are required to be in LTM for the methods to execute correctly.

Including this component in learning time estimates is a way to represent the learning load imposed by menu or command terms, and the heuristics suggested in CMN can be applied to estimate the time to memorize these items based on the number of chunks. However, heuristics for counting chunks are not very well defined at this time (see Gong, 1993). The validity and utility of the learning time predictions depend on the general requirements of the learning situation. Clearly, if the learner is engaged in problem-solving, or in an unstructured learning situation, the time required for learning is more variable and ill-defined than if the learner is trained in a tightly controlled situation. The original work by Kieras, Polson, and Bovair used a mastery learning situation, in which the users were explicitly trained on the methods and were required to repeatedly execute each procedure fully and exactly before going to the next (Bovair, Kieras, & Polson, 1990; Kieras & Bovair, 1986; Polson, 1988). The CCT predictions were extremely accurate in this sort of learning situation. Gong (1993) used a more realistic learning situation in which users were given a demonstration and explanation, and then had to perform a series of training tasks at their own pace, without detailed feedback or correction. The NGOMSL method length, transfer measures, and the number of memory chunks were excellent predictors of this more realistic training time, although the prediction coefficients were different than those in Kieras (1988). Finally, even in learning situations that are naturalistically unstructured, at least the ordinal predictions of learning time should hold true, as suggested by results such as Ziegler, Hoppe, & Fahnrich (1986). It seems reasonable that regardless of the learning situation, systems whose methods are longer and more complex will require more time to learn, because more procedural knowledge has to be acquired, either by explicit study or inferential problem-solving. But clearly more work on the nature of relatively unstructured learning situations is required. The above discussion of estimating learning time can be summarized as follows, using the values determined by Gong (1993):

Total Procedure Learning Time = Pure Procedure Learning Time + Training Procedure Execution Time

Pure Procedure Learning Time = NGOMSL Method Learning Time + LTM Item Learning Time

NGOMSL Method Learning Time = 17 sec · Number of NGOMSL Statements to be Learned

LTM Item Learning Time = 6 sec · Number of LTM Chunks to be Learned

These formulas give a pure procedure learning time estimate for the whole set of methods shown in Figure 4 of 784 sec, in a "typical" learning situation, assuming no prior knowledge of any methods, and assuming that learning the proper command words for the two menu terms will require learning three chunks each.

Execution time predictions. Like the other GOMS models, execution time predictions are based on the sequence of operators executed while performing the benchmark tasks. A trace of the example NGOMSL model performing the text moving example is summarized in Figure 4. The trace includes the same sequence of physical operators as the KLM and CMN-GOMS models in Figures 2 and 3. The predicted execution time is obtained by counting 0.1 sec for each NGOMSL statement executed and adding the total external operator time, using values based on the KLM recommended in Kieras (in press). This gives a predicted execution time of 16.38 sec, which is comparable to the predictions of the other two models (14.38 sec for both the KLM and CMN-GOMS models).
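As a worked check of the learning-time formulas, the Figure 4 estimate can be reproduced directly from the statement and chunk counts. The small sketch below is just that arithmetic, with a hypothetical function name; it is not part of any NGOMSL tool.

# Pure procedure learning time using the coefficients quoted above from
# Gong (1993): 17 sec per NGOMSL statement, 6 sec per LTM chunk.
def pure_procedure_learning_time(n_statements, n_ltm_chunks,
                                 sec_per_statement=17, sec_per_chunk=6):
    return n_statements * sec_per_statement + n_ltm_chunks * sec_per_chunk

# Figure 4: 44 NGOMSL statements to learn, plus three chunks for each of the
# two menus' command assignments = 6 LTM chunks.
print(pure_procedure_learning_time(44, 6))   # 784 sec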


2.3.3. Comparison with KLM and CMN-GOMS
The primary difference between execution time predictions for NGOMSL, KLM and CMN-GOMS is how time is assigned to cognitive and perceptual operators. There are some stylistic differences in how many large mental operators are assumed; for example, the NGOMSL example follows the recommendations (Kieras, 1988, in press) for the number and placement of DETERMINE-POSITION and VERIFY operators, and so has more M-like operators than do the CMN-GOMS and KLM models. These stylistic differences could be resolved with further research. A more important difference is in the nature of the unobservable operators. The KLM has a single crude M operator that precedes each cognitive unit of action. NGOMSL, because it is based on CCT, uniformly requires some cognitive execution time for every step, manipulating goals and working memory, and for entering and leaving methods. In contrast, CMN-GOMS assigns no time to such cognitive overhead. But all three models include M-like operators for substantial time-consuming mental actions such as locating information on the screen and verifying entries. Thus, these methods assign roughly the same time to unobservable perceptual and cognitive activities, but do so at different places in the trace.

2.4. Cognitive-Perceptual-Motor GOMS (CPM-GOMS)

CPM-GOMS, like the other GOMS models, predicts execution time based on an analysis of component activities. However, CPM-GOMS requires a specific level of analysis where the primitive operators are simple perceptual, cognitive and motor acts. Unlike the other extant GOMS techniques, CPM-GOMS does not make the assumption that operators are performed serially; rather, perceptual, cognitive and motor operators can be performed in parallel as the task demands. CPM-GOMS uses a schedule chart (or PERT chart, familiar to project managers, e.g., Stires & Murphy, 1962) to represent the operators and dependencies between operators. The acronym CPM stands for both the Cognitive-Perceptual-Motor level of analysis and also Critical Path Method, since the critical path in a schedule chart provides the prediction of total task time.

2.4.1. Architectural basis and constraints
CPM-GOMS is based directly on the Model Human Processor (MHP; see CMN, Ch. 2), which is a basic human information-processing architecture similar to those appearing in the human cognitive and performance literature for the last few decades. The human is modeled by a set of processors and storage systems in which sensory information is first acquired, recognized, and deposited in working memory by perceptual processors, and then a cognitive processor acts upon the information and commands motor processors to make physical actions. Each processor operates serially internally, with a characteristic cycle time, but processors run in parallel with each other. The unique contribution of CMN was to present this standard picture of human information processing in the form of an engineering model, which, by careful simplifications and approximations, is able to quantitatively account for many basic phenomena relevant to human-computer interaction (see CMN, Ch. 2).

The CPM-GOMS technique directly applies the MHP to a task analysis by identifying the operators that must be performed by each processor, and the sequential dependencies between them. The MHP architecture allows parallelism between CPM-GOMS operators, which is necessary for analyzing some tasks, but it also forces the primitive operators to be at the level of the cycle times of the MHP's processors. Thus, CPM-GOMS models are much more detailed than previous GOMS variants. As the following example will make clear, CPM-GOMS models are too detailed for tasks that can be usefully approximated by serial operators. CPM-GOMS models also make an assumption of extreme expertise in the user. That is, they typically model performance that has been optimized to proceed as fast as the MHP and information-flow dependencies will allow. We will discuss the implications of this assumption in the context of the text-editing example, below.

2.4.2. Example CPM-GOMS model
To build a CPM-GOMS model, the analyst begins with a CMN-GOMS model of a task (thereby inheriting all the qualitative information obtained from doing a CMN-GOMS model). The CMN-GOMS model can start at any level but must stop with operators at the activity level, primarily high-level perceptual (READ-SCREEN) or motor (ENTER-COMMAND) actions. The analyst continues by dropping to a lower level where these operators are then expressed as goals, which are accomplished by methods containing MHP-level operators. John & Gray (1995) have provided templates (assemblies of cognitive, perceptual and motor operators and their dependencies) for different activities under different task conditions. For instance, Figure 5 contains the template for the READ-SCREEN goal when an eye movement is required, and Figure 6 contains the template when the user is already looking at the right spot on the display. Each operator in the templates has a duration estimate, or a set of estimates that depend on task conditions. For instance, visually perceiving and comprehending a 6-character word takes 290 ms, whereas visually perceiving and comprehending that a symbol is merely present or absent (e.g., the presence of highlighting) takes 100 ms (Figures 5 and 6). These templates are first joined together serially, and then interleaved to take advantage of the parallelism of the underlying cognitive architecture. The operators, their estimates of duration, and the pattern of dependencies between them combine to produce a detailed model of which actions will occur in the performance of the task and when they will happen. The sequence of operators which produces the longest path through this chart is called the critical path; the sum of the durations of operators on the critical path estimates the total duration of the task. If empirical data about the actual performance of observable motor operators are available from a current system that is similar to the system being designed, it is desirable to verify the model against these data. Then the verified models are modified to represent the proposed design, and quantitative predictions of performance time can be determined from the critical path of the CPM-GOMS model. Qualitative analyses of what aspects of a design lead to changes in the performance time are quite easy once the models are built, as are subtask profiling, sensitivity and parametric analyses, and playing "what-if" with suggested design features (Chuah, John & Pane, 1994; Gray, John & Atwood, 1993).

Continuing the example of the MOVE-TEXT goal, Figure 7 shows a CPM-GOMS model of this task. For brevity, the model covers only the portion of the procedure involved with highlighting the text to be moved. Each box in the chart represents an operator, and each horizontal line of boxes represents the sequence of operators executed by a perceptual, cognitive, or motor processor. The lines connecting the boxes indicate sequential dependencies between the operators, and the highlighted lines correspond to the critical path. Before discussing this example model in detail, it is important to note that text-editing is not a good application of the CPM-GOMS technique, and we present it here only to compare it to the other GOMS techniques. Text-editing is usefully approximated by serial processes, which is why the KLM, CMN-GOMS and NGOMSL have been so successful at predicting performance on text editors. CPM-GOMS is overly detailed for such primarily serial tasks and can underestimate the execution time. For examples of tasks for which a parallel-processing model is essential, and where the power of CPM-GOMS is evident, see the telephone operator task in Gray, John and Atwood (1993) and transcription typing (John, in press; John & Newell, 1989).
Text-editing is usefully approximated by serial processes, which is why the KLM, CMN-GOMS and NGOMSL have been so successful at predicting performance on text editors. CPM-GOMS is overly detailed for such primarily serial tasks and can underestimate the execution time. For examples of tasks for which a parallel-processing model is essential, and where the power of CPM-GOMS is evident, see the telephone operator task in Gray, John and Atwood (1993) and transcription typing (John, in press; John & Newell, 1989).
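Since the prediction machinery of CPM-GOMS is the critical-path computation over a schedule chart, it can be sketched in a few lines of code. The following Python fragment is our own minimal illustration, not a tool from the GOMS literature: operators are nodes with durations (msec), dependencies are edges, and the predicted time is the longest path through the chart. The operator names and the example fragment are hypothetical, and a full CPM-GOMS model would additionally serialize the operators belonging to each MHP processor, which this sketch omits.

def critical_path(durations, deps):
    """durations: {operator: msec}; deps: {operator: [operators that must finish first]}.
    Returns (predicted time in msec, operators on the critical path)."""
    finish, best_pred = {}, {}

    def resolve(op):
        if op in finish:
            return finish[op]
        start, pred = 0, None
        for d in deps.get(op, []):
            f = resolve(d)
            if f > start:
                start, pred = f, d
        finish[op] = start + durations[op]
        best_pred[op] = pred
        return finish[op]

    for op in durations:
        resolve(op)
    end = max(finish, key=finish.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return finish[end], list(reversed(path))

# Hypothetical fragment: the cursor movement runs in parallel with an eye movement
# and perception, so the slower motor chain determines the critical path.
durations = {"initiate-move": 50, "move-cursor": 480,
             "initiate-eye-move": 50, "eye-move": 30, "perceive": 290, "verify": 50}
deps = {"move-cursor": ["initiate-move"],
        "eye-move": ["initiate-eye-move"],
        "perceive": ["eye-move"],
        "verify": ["perceive", "move-cursor"]}

print(critical_path(durations, deps))
# (580, ['initiate-move', 'move-cursor', 'verify']) -- verify waits on the 530-msec motor chain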

GOMS Family Comparison

p. 19

[Figure 5 schedule-chart template: a 50-msec cognitive operator to attend info (x), a 50-msec cognitive operator to initiate an eye movement to x, a 30-msec eye movement, visual perception of info (x) taking 100 msec for a simple binary visual signal or 290 msec for a complex signal similar to a 6-letter word, and a 50-msec cognitive operator to verify info (x).]

Figure 5. Example of a template for building CPM-GOMS models, adapted from John & Gray, 1995. This template accomplishes the goal READ-SCREEN when an eye-movement is required in the task.

[Figure 6 schedule-chart template: a 50-msec cognitive operator to attend info (x), visual perception of info (x) taking 100 msec for a simple binary visual signal or 290 msec for a complex signal similar to a 6-letter word, and a 50-msec cognitive operator to verify info (x).]

Figure 6. Example of a template for building CPM-GOMS models, adapted from John & Gray, 1995. This template accomplishes the goal READ-SCREEN when an eye-movement is NOT required in the task.
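The templates in Figures 5 and 6 can be thought of as small data structures that the analyst instantiates and chains together. The sketch below is our own illustration (the function name and representation are invented), using only the duration estimates shown in the figures; it reports each template's serial duration, which is an upper bound, since in a real CPM-GOMS model these operators would be interleaved with operators on other processors rather than simply summed.

# READ-SCREEN templates as lists of (operator, msec) pairs, using the duration
# estimates shown in Figures 5 and 6 (John & Gray, 1995).
def read_screen_template(eye_movement_needed, complex_signal):
    perceive = 290 if complex_signal else 100   # 6-letter word vs. binary signal
    ops = [("attend info", 50)]
    if eye_movement_needed:
        ops += [("initiate eye movement", 50), ("eye movement", 30)]
    ops += [("perceive info", perceive), ("verify info", 50)]
    return ops

def serial_duration(ops):
    # Upper bound: the template's duration if nothing overlaps with other processors.
    return sum(msec for _, msec in ops)

print(serial_duration(read_screen_template(True, True)))    # 470 msec: eye movement needed, complex signal
print(serial_duration(read_screen_template(False, False)))  # 200 msec: already looking, binary signal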

Execution time predictions. In Figure 7, the times for the operators shown on the boxes in the schedule chart are based on the durations estimated by John & Gray (1995). The highlighted lines and boxes comprise the critical path. Reading the total duration on the final item of the critical path gives a total execution time through this subsequence of the task equal to 2.21 sec. The ability of CPM-GOMS to represent parallel processing is illustrated in the set of operators that accomplish the MOVE-TO-BEGINNING-OF-PHRASE goal. These operators are not performed strictly serially; that is, the eye-movement and perception of information occur in parallel with the cursor being moved to the new location. The information-flow dependency lines between the operators ensure that the eyes must get there first, before the new position of the cursor can be verified to be at the right location, but the movement of the mouse takes longer than the eye-movement and perception, so it defines the critical path.

Multiple active goals can be represented in CPM-GOMS models and are illustrated in Figure 7 in the sets of operators that accomplish the MOVE-TO-END-OF-PHRASE goal and the SHIFT-CLICK-MOUSE-BUTTON goal. Because the shift key is hit with the left hand (in this model of a right-handed person) and the mouse is moved with the right hand, pressing the shift key can occur while the mouse is still being moved to the end of the phrase. Thus, the operators that accomplish the SHIFT-CLICK-MOUSE-BUTTON goal are interleaved with the operators that accomplish the MOVE-TO-END-OF-PHRASE goal. This interleaving represents a very high level of skill on the part of the user.

2.4.3. Comparison with KLM, CMN-GOMS, and NGOMSL. Although text-editing is not the best task to display the advantages of CPM-GOMS, there are several interesting aspects of the model in Figure 7 compared to the other example models. First, there is a direct mapping from the CMN-GOMS model to the CPM-GOMS model, because all CPM-GOMS models start with CMN-GOMS, and the particular model in Figure 7 was built with reference to the one in Figure 3. As with the KLM, selection rules are not explicitly represented, because CPM-GOMS models are in sequence form and the analyst chooses a particular method for each task instance. For example, in Figure 7, the selection between HIGHLIGHT-ARBITRARY-PHRASE and HIGHLIGHT-WORD that is explicitly represented in CMN-GOMS and NGOMSL is only implicit in the analyst's choice of the method for this particular model.

Although the qualitative process represented in this CPM-GOMS model is reasonable, its quantitative prediction is much shorter than the estimates from the other models. The CPM-GOMS model predicts the total execution time to be 2.21 sec; totaling the execution time over the same steps in the other models gives 4.23 sec for both the KLM and CMN-GOMS and 6.18 sec for the NGOMSL model. The primary source of the discrepancy between the GOMS variants is the basic assumption in the CPM-GOMS technique that the user is extremely experienced and executes the task as rapidly as the MHP architecture permits.

One aspect of the extreme-expertise assumption is that the CPM-GOMS model assumes the user knows exactly where to look for the to-be-moved phrase. This means that the model needs only one eye-movement to find the beginning and one to find the end of the target phrase, and that the mouse movements to these points can be initiated prior to the completion of the eye movements. In some real-world situations, like telephone operators handling calls (Gray, John, & Atwood, 1993), the required information always appears at fixed screen locations, and with experience, the user will learn where to look. But in a typical text-editing task like our example, the situation changes from one task instance to the next, and so visual search would be required to locate the target phrase. CPM-GOMS has been used to model visual search processes (Chuah, John & Pane, 1994), but for brevity, we did not include this complexity in our example.

A second aspect of the assumed extreme expertise is that the example does not include any substantial cognitive activity associated with the selection of methods or with complex decisions. Such cognitive activity is represented in the other GOMS variants with M-like operators of about a second in duration.
In contrast, in Figure 7, the method selection is implicit in a single cognitive operator (INITIATE-MOVE-TEXT-METHOD), which is the minimum cognitive activity required by the MHP to recognize a situation and note it in working memory. Likewise, VERIFY-POSITION operators are included in the CPM-GOMS model, but they represent much more elementary recognitions, that the cursor is indeed in the location where the model is already looking, rather than the complex verifications that a text modification has been done correctly required in CMN-GOMS and NGOMSL. Thus, Figure 7 represents a minimum of cognitive activity, which is an unreasonable assumption for a normal text-editing task. However, in an experiment by CMN (pp. 279-286), the performance time of an expert user on a novel editing task was well predicted by the KLM, but after 1100 trials on the exact same task instance, the performance time decreased by 35%, largely because the M operators became much shorter. It is this type of extreme expertise that our example CPM-GOMS model represents. A more elaborate CPM-GOMS model could represent complex decisions as a series of MHP-level operators performing minute cognitive steps serially, as in the earlier work on recalling computer command abbreviations (John & Newell, 1987). However, the technique for modeling complex decisions in CPM-GOMS is still a research issue, and so it should be used only for tasks in which method selection is based on obvious cues in the environment and decisions can be represented very simply.

A final contributor to the short predicted time is that the mouse movements in CPM-GOMS are calculated specifically for the particular target size and distance in this situation, yielding much shorter times than CMN's 1.10 sec estimate of average pointing time used in the other models (further discussion appears in the next section).

[Figure 7. CPM-GOMS model of a move-text method for the text-editing task in Figure 1 (schedule chart spanning two pages, summarized here). Each box gives an operator name with its start time and duration in msec; rows correspond to eye-movement, left-hand, and right-hand motor operators, cognitive operators, and visual perception operators; bold marks the critical path, which reaches 2,210 msec at the end of the highlighting submethod, after which the model continues with the rest of the task. Annotations in the chart note that two cognitive operators set up the move-text task; that the eye movement, perception, and verification serving MOVE-CURSOR-TO-BEGINNING are off the critical path because the hand can begin moving the cursor before the eyes have verified the destination (though the eyes arrive before the cursor); and that the operators accomplishing SHIFT-CLICK-MOUSE-BUTTON interleave with those accomplishing MOVE-TO-END-OF-PHRASE, which is how multiple active goals are represented and is an indication of extreme expertise.]
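For readers who want to see where a value like the 480-msec mouse movement comes from, the sketch below applies a Welford-style formulation of Fitts' Law of the kind CMN describe (movement time = I_M * log2(D/S + 0.5), with a slope of roughly 100 msec per bit). The target distance and size are hypothetical values chosen only to show how a specific, nearby target yields a pointing time well under the 1.10-sec average.

import math

def fitts_pointing_time(distance, size, slope_msec_per_bit=100):
    # Welford form of Fitts' Law: T = I_M * log2(D/S + 0.5); the slope is an approximation.
    return slope_msec_per_bit * math.log2(distance / size + 0.5)

# Hypothetical target: a phrase boundary about 13.4 cm away with an effective size of 0.5 cm.
print(round(fitts_pointing_time(13.4, 0.5)))   # 477 msec -- close to the 480 msec used in Figure 7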

3. Summary and Comparison of the GOMS Techniques

We have modeled the same goal, MOVE-TEXT, with four different GOMS task analysis techniques. For purposes of comparison, we included a CPM-GOMS model for the same text-editing task, although the technique is not recommended for modeling such sequential tasks, and for brevity, it was shown only for the text-highlighting submethod.

3.1. Summary comparison of predictions.

The KLM, CMN-GOMS, and NGOMSL models all produce the same sequence of observable operators, as does the CPM-GOMS model (although at a more detailed level). Table 1 summarizes the quantitative predictions from the above presentation, both for the overall example task and for the subtask consisting just of highlighting the to-be-moved text.

NGOMSL is the only one of the four techniques that makes learning time predictions, and these are limited to the effects of the amount of procedural knowledge and related LTM information to be learned, and to learning situations for which the coefficients have been empirically determined.

KLM, CMN-GOMS, and NGOMSL produce execution time predictions that are roughly the same for both the overall task and the subtask, although they make different assumptions about unobservable cognitive and perceptual operators and so distribute the time in different ways (see below). An important difference is that the NGOMSL technique currently entails more M-like operators than the other techniques, as well as some cognitive overhead due to method step execution. Thus, NGOMSL will typically predict execution times that are longer than KLM or CMN-GOMS predictions.

As shown in the execution time predictions for the text-highlighting submethod, the CPM-GOMS model predicts a substantially shorter execution time than the other models. As discussed above, this is due to the assumption of extreme expertise, which produces maximum operator overlapping, finer-grain time estimates for the individual operators, and the minimum of cognitive activity allowed by the MHP. An interesting similarity between NGOMSL and CPM-GOMS is the roughly similar cognitive overhead time in the example submethod; in NGOMSL this value is the statement execution time at 0.1 sec/statement, and in CPM-GOMS it is the total time for which the cognitive processor is on the critical path in Figure 7.
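As a concrete check on that similarity, the arithmetic below is our own restatement of the Table 1 values (the statement and operator counts are implied by the reported totals rather than enumerated in this paper): the NGOMSL overhead of 0.90 sec corresponds to nine executed statements at 0.1 sec each, and the CPM-GOMS figure of 1.10 sec corresponds to twenty-two 50-msec cognitive operators falling on the critical path in Figure 7.

# NGOMSL: statement-execution overhead for the highlighting submethod.
statements_executed = 9            # implied by Table 1: 0.90 s at 0.1 s per statement
ngomsl_overhead = statements_executed * 0.1

# CPM-GOMS: summed duration of cognitive operators on the critical path.
cognitive_ops_on_path = 22         # implied by Table 1: 1.10 s at 50 msec per operator
cpm_goms_overhead = cognitive_ops_on_path * 0.050

print(round(ngomsl_overhead, 2), round(cpm_goms_overhead, 2))   # 0.9 and 1.1 -- roughly similar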


Table 1. Predicted time measures (seconds) for each technique for the MOVE-TEXT example.

                                            KLM     CMN-GOMS   NGOMSL    CPM-GOMS
Overall Measures
  Procedure Learning
    (both highlighting methods)             ---        ---     784.00       ---
  Total Example Task Execution Time        14.38      14.38     16.38    not shown in
                                                                          this example
Text Highlighting Sub-Method
  Highlighting Sub-Method Execution Time    4.23       4.23      6.18       2.21
  Total Cognitive Overhead                  ---        ---       0.90       1.10

3.2. Summary comparison of operator times.

Table 2 lists the operator times assumed in the different techniques and used in the MOVE-TEXT example. There are basically two types of operators: those that are directly observable when looking at human performance (the motor operators) and those that are not, or not usually, observable (perceptual and cognitive operators, eye-movements).4 The values for directly-observable operators are quite similar across the GOMS techniques, while the assumptions about unobservable operators vary more widely.

Mouse button operations. The times for mouse button operators and using the shift key in KLM, CMN-GOMS, and NGOMSL are based on values from CMN. The slightly different value for CLICK-MOUSE-BUTTON in the CPM-GOMS technique can be read from the example in Figure 7. That is, clicking the mouse button requires a 50 ms cognitive operator and two motor operators at 100 ms each. The SHIFT-CLICK operation is assumed to be the sequence of hitting the shift key (280 msec, from CMN) and then the mouse button (200 msec) in the first three techniques. However, in CPM-GOMS, the shift key operator can overlap with earlier processing in the MOVE-TEXT task, so that it is not on the critical path. Thus, the entire SHIFT-CLICK operation adds only 250 msec to the critical path (the same as CLICK-MOUSE-BUTTON).
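The CPM-GOMS column of Table 2 for these operators can be reproduced with a few lines of arithmetic; the fragment below is our own illustration using the operator durations visible in Figure 7. A click contributes one 50-msec cognitive operator plus the 100-msec down-stroke and up-stroke, and because the shift-key press is made by the left hand and overlaps with the mouse movement, SHIFT-CLICK adds no additional time to the critical path.

# Operator durations (msec) read off the Figure 7 schedule chart.
COGNITIVE = 50      # one cognitive-processor cycle (e.g., initiate mouse-button click)
KEYSTROKE = 100     # one motor operator (mouse-button down, mouse-button up, shift press)

click_on_critical_path = COGNITIVE + KEYSTROKE + KEYSTROKE   # initiate + down + up = 250 msec
shift_click_on_critical_path = click_on_critical_path        # shift press overlaps; nothing extra on the path

print(click_on_critical_path / 1000, shift_click_on_critical_path / 1000)  # 0.25 s and 0.25 s, as in Table 2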

4. Although eye-movements are observable with an eye-tracker, eye-tracking research in HCI is sparse and we will treat them as unobservable in this task.


Cursor movement. The 1.10 sec used in KLM, CMN-GOMS, and NGOMSL is the average value suggested by CMN for large-screen text-editing tasks. But Gong (1993) found that many of the mouse movements involved in using a Macintosh interface, such as making menu selections and activating windows, were much faster than 1.10 sec, and that Fitts' Law estimates (see CMN, p. 55) were more accurate. Thus, Fitts' Law values based on the actual or typical locations and sizes of screen objects should probably be used whenever possible in all of the techniques. For CPM-GOMS, moving the cursor to point to an object is a combination of cognitive, motor, and perceptual operators (see Figure 7), and only some of them occur on the critical path in any particular task situation. The duration of the mouse-movement motor operator itself was calculated using Fitts' Law (480 msec for both movements). In this example, moving to the beginning of the phrase put 680 ms on the critical path (2 cognitive, 1 motor, and 1 perceptual operator in Figure 7) and, coincidentally, moving to the end of the phrase also put 680 ms on the critical path (also 2 cognitive, 1 motor, and 1 perceptual operator).

Unobservable operations. All of the GOMS variants make assumptions about unobservable operations. The KLM makes the simplest assumption, putting all such operations (perceiving information, eye-movements, comparisons, decisions, mental calculations, etc.) into one operator, M, 1.35 seconds in length. This operator is always put at the beginning of a cognitive unit. CMN-GOMS and NGOMSL break this catch-all M into more specific unobservable operators. CMN-GOMS uses unobservable operators to verify the editing actions (VERIFY-HIGHLIGHT and VERIFY-POSITION), also assigned the estimate of 1.35 seconds; NGOMSL uses DETERMINE-POSITION and VERIFY, both 1.20 seconds. For the KLM, CMN-GOMS, and NGOMSL models, the estimates for the unobservable operators shown are those currently recommended for each technique as average values to be used in the absence of more specific measurements. They are all roughly the same, at about a second in duration, but are slightly different because they were determined empirically with different data sets at different historical points in the development of GOMS techniques. None of these techniques has a theoretical commitment to any particular value. Any available empirically-determined values for the operators involved in a particular analysis should be used instead of these average estimates.

There are also differences in the distribution of mental time. The KLM tends to place mental time in the preparation for action, while CMN-GOMS mental time tends to come at the end of actions in VERIFY operators, and NGOMSL has mental time in both places. These stylistic differences could probably be resolved with further research. In addition to the M-like operators, NGOMSL also takes time for the unobservable activity associated with the production-rule cycling assumed in the underlying architecture, represented with the 0.1 sec/statement "cognitive overhead."

In considerably more detail, CPM-GOMS also represents the underlying unobserved operations in terms of the cycle times of the MHP processors, such as the cognitive cycle time (estimated at 70 ms by CMN, but refined by subsequent work to be 50 ms; John & Newell, 1989; Nelson, Lehman & John, 1994; Wiesmeyer, 1992), perceptual cycle time (which depends on the complexity of the signal being perceived, see Figures 5 and 6), and eye-movement time (estimated to be 30 msec, CMN, p. 25). Both the durations and the dependencies of these unobservable operators are specified in the templates used to construct the model. However, the other operators needed to accomplish a task and their dependencies make every critical path different, and no one estimate of "mental time" is meaningful in CPM-GOMS. For example, in the MOVE-TEXT task in Figure 7, the entry in Table 2 for Mental Preparation is the sum of the durations of the two cognitive operators on the critical path that set up the move-text task and highlight-phrase subtask. The entry for Determine Position is the sum of the durations of those operators that locate the beginning of the phrase on the screen and that occur on the critical path (3 cognitive operators, 1 eye-movement motor operator, and 1 perceptual operator). All of these operators depend on each other and have to occur in order; thus, if this were the only activity taking place in the task, they would all be on the critical path and take 420 ms. However, since looking for the beginning of the phrase is just one part of the MOVE-TEXT task, other activities can occur in parallel (e.g., moving the mouse, discussed in the last section) and their operators are interleaved with these, making the critical path more complicated, so that only the first two cognitive operators appear on the critical path for this task.

Table 2. Operator times (seconds) used in each technique for the MOVE-TEXT example. See text for explanation of the CPM-GOMS entries.

                                                                CPM-GOMS
                                KLM     CMN-GOMS   NGOMSL     critical path
Directly Observable Motor Operators
  Click-mouse-button            0.20      0.20       0.20         0.250
  Shift-click-mouse-button      0.48      0.48       0.48         0.250
  Cursor movement               1.10      1.10       1.10 or      0.680 by
                                                     Fitts' Law   Fitts' Law
Unobservable Perceptual or Cognitive Operators
  Mental Preparation            1.35    not used   not used       0.100
  Determine Position          not used  not used     1.20         0.100
  Edit Verification           not used    1.35       1.20       not used

3.3. Summary comparison of architectural assumptions.

The assumed cognitive architectures range from the trivial, in the case of the KLM, to slightly more complicated for CMN-GOMS, to an elaborated sequential architecture with a working memory and a specified procedure-knowledge representation in NGOMSL, to a powerful but relatively unspecified multiple-parallel-processor architecture in CPM-GOMS. The strengths and weaknesses of the techniques correspond quite directly to these architectural differences. The KLM is easy to apply, but predicts only execution time, and only from analyst-supplied methods. At the other extreme, CPM-GOMS predicts execution time for subtle, overlapping patterns of activities, but also requires analyst-supplied methods. CMN-GOMS, once its program methods have been worked out, can predict execution time for all subsumed task instances, and NGOMSL, with the additional investment in its explicit representation of procedural knowledge, can then also predict some aspects of learning time. Thus, rather than being radically different, the GOMS techniques occupy various points in a space of possible techniques defined by different architectural assumptions and the form of the methods supplied by the analyst (see John & Kieras, 1994, for more discussion). Some important possibilities for research lie in the gaps in this space; for example, the extant set of ready-to-use GOMS techniques lacks a program-form approach to analyzing overlapping cognitive, perceptual, and motor activities.

4. CONCLUSIONS

The four specific GOMS modeling techniques discussed here are all related to a general task-analysis approach. This general approach emphasizes the importance of the procedures for accomplishing goals that a user must learn and follow in order to perform well with the system. By using descriptions of user procedures, the techniques can provide quantitative predictions of procedure learning and execution time and qualitative insights into the implications of design features. While other aspects of system design are undoubtedly important, the ability of GOMS models to address this critical aspect makes them not only a key part of the scientific theory of human-computer interaction, but also useful tools for practical design (John & Kieras, in press).

The current GOMS models are quite effective because they capture procedural speed and complexity. But other aspects of human performance with an interface are not addressed by the simple cognitive architectures underlying the current GOMS variants. Current research in cognitive architectures assumes both more detail and more variety of human mechanisms, and so can potentially account for a wider and deeper range of design issues. Representative examples of such work use architectures that represent perceptual-cognitive-motor interactions (Nelson, Lehman, & John, 1994; Kieras & Meyer, in press; Kieras, Wood, & Meyer, 1995), comprehension processes (Kitajima & Polson, 1992; Doane, Mannes, Kintsch, & Polson, 1992), and problem-solving and learning mechanisms (Altmann, Larkin, & John, 1995; Anderson, 1993; Bauer & John, 1995; Howes, 1994; Polson & Lewis, 1990; Rieman, Lewis, Young, & Polson, 1994).5 Because these research efforts are rigorous and make use of computational models, they should eventually lead to engineering-style design tools for additional aspects of interface design and usability. Thus, while the current generation of GOMS models is ready for application, we can expect to see future models of human-computer interaction that are even more comprehensive, accurate, and useful.

ACKNOWLEDGMENTS

The authors contributed equally to this article; the order of their names reflects alphabetical order and not seniority of authorship. We thank Wayne Gray and Judy Olson for their comments on drafts of this paper. Work on this paper by Bonnie John was supported by the Office of Naval Research, Cognitive Science Program, Contract Number N00014-89-J-1975N158, and the Advanced Research Projects Agency, DoD, and monitored by the Office of Naval Research under contract N00014-931-0934. Work on this paper by David Kieras was supported by the Office of Naval Research, Cognitive Science Program, under Grant Number N00014-92-J-1173 NR 4422574, and the Advanced Research Projects Agency, DoD, and monitored by the NCCOSC under contract N66001-94-C-6036. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Office of Naval Research, NCCOSC, the Advanced Research Projects Agency, or the U.S. Government.

5. A special issue on cognitive architectures for HCI is scheduled to appear in Human-Computer Interaction in 1997.


REFERENCES

Altmann, E. M., Larkin, J. H., & John, B. E. (1995). Display navigation by an expert programmer: A preliminary model of memory. In Proceedings of CHI 1995 (Denver, Colorado, May 7-11, 1995). New York: ACM, pp. 3-10.
Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ: Lawrence Erlbaum Associates.
Bauer, M. I., & John, B. E. (1995). Modeling time-constrained learning in a highly-interactive task. In Proceedings of CHI 1995 (Denver, Colorado, May 7-11, 1995). New York: ACM, pp. 19-26.
Beard, D. V., Smith, D. K., & Denelsbeck, K. M. (in press). Quick and dirty GOMS: A case study of computed tomography interpretation. Human-Computer Interaction.
Bennett, J. L., Lorch, D. J., Kieras, D. E., & Polson, P. G. (1987). Developing a user interface technology for use in industry. In H. J. Bullinger & B. Shackel (Eds.), Proceedings of the Second IFIP Conference on Human-Computer Interaction, Human-Computer Interaction – INTERACT '87 (Stuttgart, Federal Republic of Germany, Sept. 1-4). North-Holland: Elsevier Science Publishers B.V., pp. 21-26.
Bovair, S., Kieras, D. E., & Polson, P. G. (1988). The acquisition and performance of text-editing skill: A production-system analysis (Tech. Rep. No. 28). Ann Arbor: University of Michigan, Technical Communication Program.
Bovair, S., Kieras, D. E., & Polson, P. G. (1990). The acquisition and performance of text-editing skill: A cognitive complexity analysis. Human-Computer Interaction, 5, 1-48.
Butler, K. A., Bennett, J., Polson, P., & Karat, J. (1989). Report on the workshop on analytical models: Predicting the complexity of human-computer interaction. SIGCHI Bulletin, 20(4), pp. 63-79.
Card, S. K., Moran, T. P., & Newell, A. (1980a). The keystroke-level model for user performance time with interactive systems. Communications of the ACM, 23(7), 396-410.
Card, S. K., Moran, T. P., & Newell, A. (1980b). Computer text-editing: An information-processing analysis of a routine cognitive skill. Cognitive Psychology, 12, 32-74.
Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.
Chuah, M. C., John, B. E., & Pane, J. (1994). Analyzing graphic and textual layouts with GOMS: Results of a preliminary analysis. In Proceedings Companion of CHI 1994 (Boston, MA, April 24-28, 1994). New York: ACM, pp. 323-324.
Doane, S. M., Mannes, S. M., Kintsch, W., & Polson, P. G. (1992). Modeling user action planning: A comprehension based approach. User Modeling and User-Adapted Interaction, 2, 249-285.
Gong, R. (1993). Validating and refining the GOMS model methodology for software user interface design and evaluation. Ph.D. dissertation, University of Michigan.

GOMS Family Comparison

p. 29

Gray, W. D., John, B. E., & Atwood, M. E. (1993). Project Ernestine: A validation of GOMS for prediction and explanation of real-world task performance. Human-Computer Interaction, 8(3), 237-309.
Haunold, P., & Kuhn, W. (1994). A keystroke level analysis of a graphics application: Manual map digitizing. In Proceedings of CHI 1994 (Boston, MA, April 24-28, 1994). New York: ACM, pp. 337-343.
Howes, A. (1994). A model of the acquisition of menu knowledge by exploration. In Proceedings of CHI 1994 (Boston, MA, April 24-28, 1994). New York: ACM, pp. 445-451.
John, B. E. (in press). TYPIST: A TheorY of Performance In Skilled Typing. Human-Computer Interaction.
John, B. E., & Gray, W. D. (1995). GOMS Analyses for Parallel Activities. Tutorial Notes, CHI 1995 (Denver, Colorado, May 7-11, 1995). New York: ACM.
John, B. E., & Kieras, D. E. (1994). The GOMS family of analysis techniques: Tools for design and evaluation. Technical Report CMU-CS-94-181, School of Computer Science, Carnegie Mellon University.
John, B. E., & Kieras, D. E. (in press). Using GOMS for user interface design and evaluation: Which technique? ACM Transactions on Computer-Human Interaction.
John, B. E., & Newell, A. (1987). Predicting the time to recall computer command abbreviations. In Proceedings of CHI+GI 1987 (Toronto, April 5-9, 1987). New York: ACM, pp. 33-40.
John, B. E., & Newell, A. (1989). Cumulating the science of HCI: From S-R compatibility to transcription typing. In Proceedings of CHI 1989 (Austin, Texas, April 30-May 4, 1989). New York: ACM, pp. 109-114.
Karat, J., & Bennett, J. (1991). Modeling the user interaction methods imposed by designs. In M. Tauber & D. Ackermann (Eds.), Mental models and Human-Computer Interaction 2. Amsterdam: Elsevier.
Kieras, D. E. (1988). Towards a practical GOMS model methodology for user interface design. In M. Helander (Ed.), The handbook of human-computer interaction (pp. 135-158). Amsterdam: North-Holland.
Kieras, D. E. (in press). A guide to GOMS model usability evaluation using NGOMSL. In M. Helander & T. Landauer (Eds.), The handbook of human-computer interaction (Second Edition). Amsterdam: North-Holland.
Kieras, D. E., & Bovair, S. (1986). The acquisition of procedures from text: A production-system analysis of transfer of training. Journal of Memory and Language, 25, 507-524.
Kieras, D. E., & Meyer, D. E. (in press). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction.
Kieras, D. E., & Polson, P. G. (1985). An approach to the formal analysis of user complexity. International Journal of Man-Machine Studies, 22, 365-394.

GOMS Family Comparison

p. 30

Kieras, D. E., Wood, S. D., & Meyer, D. E. (1995). Predictive engineering models using the EPIC architecture for a high-performance task. In Proceedings of CHI 1995 (Denver, CO, May 7-11, 1995). New York: ACM, pp. 11-18.
Kitajima, M., & Polson, P. G. (1992). A computational model of skilled use of a graphical user interface. In Proceedings of CHI 1992 (Monterey, May 3-7, 1992). New York: ACM, pp. 241-249.
Lane, D. M., Napier, H. A., Batsell, R. R., & Naman, J. L. (1993). Predicting the skilled use of hierarchical menus with the Keystroke-Level Model. Human-Computer Interaction, 8(2), 185-192.
Lerch, F. J., Mantei, M. M., & Olson, J. R. (1989). Translating ideas into action: Cognitive analysis of errors in spreadsheet formulas. In Proceedings of CHI 1989. New York: ACM, pp. 121-126.
Nelson, G. H., Lehman, J. F., & John, B. E. (1994). Integrating cognitive capabilities in a real-time task. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society (Atlanta, Georgia, August 13-16, 1994).
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Olson, J. R., & Olson, G. M. (1990). The growth of cognitive modeling in human-computer interaction since GOMS. Human-Computer Interaction, 5, 221-265.
Polson, P. G. (1988). Transfer and retention. In R. Guindon (Ed.), Cognitive science and its application for human-computer interaction. Hillsdale, NJ: Erlbaum, pp. 59-162.
Polson, P., & Lewis, C. (1990). Theory-based design for easily learned interfaces. Human-Computer Interaction, 5, 191-220.
Rieman, J., Lewis, C., Young, R. M., & Polson, P. G. (1994). "Why is a raven like a writing desk?" Lessons in interface consistency and analogical reasoning from two cognitive architectures. In Proceedings of CHI 1994 (Boston, MA, April 24-28, 1994). New York: ACM, pp. 438-444.
Stires, D. M., & Murphy, M. M. (1962). PERT (Program Evaluation and Review Technique) CPM (Critical Path Method). Boston: Materials Management Institute.
Vera, A. H., & Rosenblatt, J. K. (1995). Developing user model-based intelligent agents. In J. D. Moore & J. F. Lehman (Eds.), Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum, pp. 500-505.
Webster's New Collegiate Dictionary. (1977). Springfield, MA: G. & C. Merriam Company.
Wiesmeyer, M. D. (1992). An operator-based model of human covert visual attention. Ph.D. thesis, University of Michigan.
Ziegler, J. E., Hoppe, H. U., & Fahnrich, K. P. (1986). Learning and transfer for text and graphics editing with a direct manipulation interface. In Proceedings of CHI 1986. New York: ACM.


1 Activity Theory and Human-Computer Interaction

Bonnie A. Nardi

What is activity theory, and how will it benefit studies of human-computer interaction? This book addresses these questions. Many HCI researchers are eager to move beyond the confines of traditional cognitive science, but it is not clear exactly which direction to move in. This book explores one alternative for HCI research: activity theory, a research framework and set of perspectives originating in Soviet psychology in the 1920s. Just as HCI research is concerned with practical problems of design and evaluation, activity theorists from the outset have addressed practical needs, applying their research efforts to the problems of mentally and physically handicapped children, educational testing, ergonomics, and other areas. Following the lead of dialectical materialism, activity theory focuses on practice, which obviates the need to distinguish ``applied'' from ``pure'' science—understanding everyday practice in the real world is the very objective of scientific practice.

Activity theory is a powerful and clarifying descriptive tool rather than a strongly predictive theory. The object of activity theory is to understand the unity of consciousness and activity. Activity theory incorporates strong notions of intentionality, history, mediation, collaboration and development in constructing consciousness (see Kaptelinin, chapter 5; Kuutti, this volume). Activity theorists argue that consciousness is not a set of discrete disembodied cognitive acts (decision making, classification, remembering), and certainly it is not the brain; rather, consciousness is located in everyday practice: you are what you do. And what you do is firmly and inextricably embedded in the social matrix of which every person is an organic part. This social matrix is composed of people and artifacts. Artifacts may be physical tools or sign systems such as human language. Understanding the interpenetration of the individual, other people, and artifacts in everyday activity is the challenge activity theory has set for itself.

Unlike anthropology, which is also preoccupied with everyday activity, activity theory is concerned with the development and function of individual consciousness. Activity theory was developed by psychologists, so this is not surprising, but it is a very different flavor of psychology from what the West has been accustomed to, as activity theory emphasizes naturalistic study, culture, and history. The chapters in part I explain what activity theory is. They, along with the seminal article, ``The Problem of Activity in Psychology'' by the Russian psychologist Leont'ev (1974) (widely available in English in university libraries), form a primer of activity theory.

Activity theory offers a set of perspectives on human activity and a set of concepts for describing that activity. This, it seems to me, is exactly what HCI research needs as we struggle to understand and describe ``context,'' ``situation,'' ``practice.'' We have recognized that technology use is not a mechanical input-output relation between a person and a machine; a much richer depiction of the user's situation is needed for design and evaluation. However, it is unclear how to formulate that depiction in a way that is not purely ad hoc. Here is where activity theory helps: by providing orienting concepts and perspectives.
As Engeström (1993) has noted, activity theory does not offer ``ready-made techniques and procedures'' for research; rather, its conceptual tools must be ``concretized according to the specific nature of the object under scrutiny.'' As we expand our horizons to think not only about usable systems but now useful systems, it is imperative that we have ways of finding out what would be useful. How can we begin to understand the best ways to undertake major design projects, such as providing universal access to the Internet, effectively using computers in the classroom, supporting distributed work teams, and even promoting international understanding in ways both small (e.g., international video/e-mail pen-pals for schoolchildren) and large (e.g., using technology to find new means of conflict resolution)? Laboratory-based usability studies are part of the solution, but they are best preceded in a phased design process by careful field studies to ascertain how technology can fit into users' actual social and material environments, the problems users
have that technology can remedy, the applications that will promote creativity and enlightenment, and how we can design humane technology that ensures privacy and dignity. Recently a major American journal of HCI rejected a set of papers that would have formed a special issue on activity theory. The concern was that activity theory is hard to learn, and because we have not seen its actual benefits realized in specific empirical studies, the time spent learning it would be of dubious benefit. The chapters in parts II and III of this book speak to this concern by providing empirical studies of human-computer interaction developed from an activity theory perspective. In these pages you will meet Danish homicide detectives, a beleaguered U.S. Post Office robot and its human creators, disgruntled slide makers, absent-minded professors, enthusiastic elementary school students, sly college students, and others. These people and artifacts, and the situations in which they are embedded, are analyzed with concepts from activity theory. Several interesting ways to structure an activity theory analysis are provided in these chapters, so readers are offered substantial methodological tools to support practice. Throughout the book we have tried to ``compare and contrast'' activity theory with other techniques and theories to make it ``easier'' to learn (if indeed it is truly difficult). Thus readers will find that as they read the chapters, they may think about activity theory in relation to cognitive science, GOMS, Gibson's work on affordances, Norman's cognitive artifacts, situated action models, distributed cognition, actor-network theory, and other social scientific artifacts. Bannon and Bødker (1991) have compared activity theory to task analysis and user modeling elsewhere, so we have not undertaken that task here. Briefly, they argued that these approaches are very limited in that (1) task analysis provides a set of procedural steps by which a task supposedly proceeds, with little attention to ``the tacit knowledge that is required in many skilled activities, or the fluent action in the actual work process,'' and (2) user modeling considers user characteristics (e.g., is the user an expert or a novice?) but says little about the situation in which the user works or the nature of the work itself. Activity theory proposes a strong notion of mediation—all human experience is shaped by the tools and sign systems we use. Mediators connect us organically and intimately to the world; they are not merely filters or channels through which experience is carried, like water in a pipe (see Zinchenko, this volume). Activity theorists are the first to note that activity theory itself is but one mediating tool for research (as are all theories!) and that like any tool, its design evolves over time (see Kaptelinin, chapter 3, this volume). Activity theory is certainly evolving and growing; it is not by any means a static end point. Activity theory has a tremendous capacity for growth and change, an intellectual energy that is being realized in research efforts in Russia, Europe, North America, and Australia. I think perhaps this is because of activity theory's rich philosophical and scientific heritage and because it permits such wide scope of analysis. Activity theory provides ample room in the intellectual sandbox for adventure and discovery and leads to the work of philosophers, psychologists, anthropologists, linguists, educators, and others whose thoughts have influenced activity theory. 
The chapters in part III of this book push on the frontiers of activity theory, expanding its conceptual base. Let's talk for a moment about the most concrete practical benefit we could expect from activity theory in the near term. The most immediate benefit I hope for is the dissemination of a common vocabulary for describing activity that all HCI researchers would share. Activity theory has a simple but powerful hierarchy for describing activity that could be common coin for all HCI researchers. This hierarchy (described in several of the chapters in this book) has a superficial resemblance to GOMS but goes beyond GOMS in essential ways, especially in describing dynamic movement between levels of activity rather than assuming stasis. The development of a common vocabulary is crucial for HCI. As we move toward ethnographic and participatory design methods to discover and describe real everyday activity, we run into the problem that has bedeviled anthropology for so long: every account is an ad hoc description cast in situationally specific terms. Abstraction, generalization and comparison become problematic. An ethnographic description, although it may contain much information of direct value for design and evaluation, remains a narrative account structured according to the author's own personal vocabulary, largely unconstrained and arbitrary. Ethnography—literally, ``writing culture''—assumes no a priori framework that orders the data, that contributes to the coherence and generalizability of the descriptive account. This leads to a disappointing lack of cumulative research results. One would like to be able to develop a comparative framework, perhaps a taxonomy as suggested by Brooks (1991), that would help us as we pursue design and evaluation activities. It would be desirable to be able to go back to previous work and find a structured
set of problems and solutions. Activity theory will help us to achieve this goal but not until its concepts become part of a shared vocabulary. Let us look briefly at a few of the main concerns of activity theory: consciousness, the asymmetrical relation between people and things, and the role of artifacts in everyday life. Each of these concerns (and others) will be considered at length in this book, and I introduce them briefly here to anticipate some of what the reader will encounter. A basic tenet of activity theory is that a notion of consciousness is central to a depiction of activity. Vygotsky described consciousness as a phenomenon that unifies attention, intention, memory, reasoning, and speech (Vygotsky 1925/1982; see Bakhurst 1991). Does HCI really need to worry about consciousness? The answer would seem to be yes, as we have been worrying about it all along. A notion of consciousness, especially one that focuses on attention and access to cognitive resources, permeates HCI discourse. When we speak of ``direct manipulation,'' ``intelligent agents,'' ``expert behavior,'' and ``novice behavior,'' we are really positing concepts in which consciousness is central. The notion of consciousness has continually snuck in the back door of HCI studies, as Draper (1993) has pointed out. We use the word ``transparent,'' to describe a good user interface—that is, one that is supportive and unobtrusive, but which the user need pay little, if any, attention to. We have borrowed the concept ``affordances'' from Gibson, which practically dispenses with the notion of consciousness but still implies a particular stance toward it. We speak of ``skilled performance,'' implying a kind of mental ease and access to certain cognitive resources peculiar to experts who have become very good at something. ``Novices,'' on the other hand, consciously labor to perform actions that will later become automatic, requiring little conscious awareness. Their less able performance is attributable to their need to focus deliberate attention on task actions while at the same time working with fewer cognitive resources than they will have available later as they gain expertise and experience in their tasks. Even in the earliest HCI work we find concern with the user's consciousness. In 1972 Bobrow wrote that a programming technique ``can greatly facilitate construction of complex programs because it allows the user to remain thinking about his program operation at a relatively high level without having to descend into manipulation of details.'' This is a succinct statement of the interdependence of the ``how'' and the ``what'' of consciousness: the user's attention is at stake, and at the very same time, so is the content of what he thinks about as he programs. Consciousness is still with us: Carey and Rusli (1995) argue that simply observing users does not tell the researcher enough; it must be discovered what the user is thinking. They give an example, asking, ``Was a switch in search tactics the result of abandoning an unproductive attempt, or the result of gaining knowledge from the last few actions?'' There are very different implications for technology design depending on the reason for the switch. Looking back more than a decade at Malone's (1983) classic paper on office organization, we find that Malone noted that users' behavior cannot be understood without reference to intentionality: is a user organizing her office so that she can find something later, or so that she will be reminded of something? 
The observer sees the same behavior but cannot know what it means without asking the user. Malone observed that finding and reminding are quite different functions, equally important for users, and that we cannot understand them if we do not take account of the user's intentions. The unstudied use of a notion of consciousness will continue to crop up in HCI research, and rather than dealing with each new instance piecemeal, in a new vocabulary, as though we had never heard of it before, an overarching framework prepared to deal with the phenomenon of consciousness will be useful. Draper (1993) talks about ``designing for consciousness,'' and it seems that this is exactly what we should be doing when we discuss the possibility of, for example, ``intelligent agents.'' The notion of agents suggests that the user direct conscious awareness toward the user interface rather than that the user interface disappear ``transparently.'' In a direct manipulation interface, on the other hand, cognitive content concerns the nitty-gritty of one's task, with the interface ideally fading from awareness. Thus we see from this brief excursion into the difficult subject of consciousness that already we have gained two insights: (1) we must know what the user is thinking to design properly, as Carey and Rusli (1995) argue, and (2) we have a larger conceptual space into which to place differing user interface paradigms such as intelligent agents and direct manipulation. Of course, psychologists have studied attention and consciousness for a long time; this is not new to activity theory. Activity theory, however, embeds consciousness in a wider activity system and describes a dynamic by which changes in consciousness are directly related to the material and social conditions current in a person's situation (see Kaptelinin, chapters 3, 5; Nardi, chapter 4; Bødker; Raeithel and Velichkovsky, this volume). This extends the concept of consciousness past an idealistic, mentalistic
construct in which only cognitive resources and attention ``inside the head'' are at issue, to a situated phenomenon in which one's material and social context are crucial. An important perspective contributed by activity theory is its insistence on the asymmetry between people and things (see Kaptelinin, chapter 5; Nardi, chapter 4; Zinchenko, this volume). Activity theory, with its emphasis on the importance of motive and consciousness—which belong only to humans—sees people and things as fundamentally different. People are not reduced to ``nodes'' or ``agents'' in a system; ``information processing'' is not seen as something to be modeled in the same way for people and machines. In activity theory, artifacts are mediators of human thought and behavior; they do not occupy the same ontological space. This results in a more humane view of the relationship of people and artifacts, as well as squarely confronting the many real differences between people and things. Cognitive science has been the dominant theoretical voice in HCI studies since the inception of our young field. We are beginning to feel a theoretical pinch, however—a sense that cognitive science is too restrictive a paradigm for finding out what we would like to know (Bannon and Bødker, 1991; Kuutti, this volume). Activity theory is not a rejection of cognitive science (see Kaptelinin, chapter 5, this volume) but rather a radical expansion of it. One reason we need this expansion is that a key aspect of HCI studies must be to understand things; technology—physical objects that mediate activity—and cognitive science have pretty much ignored the study of artifacts, insisting on mental representations as the proper locus of study. Thus we have produced reams of studies on mentalistic phenomena such as ``plans'' and ``mental models'' and ``cognitive maps,'' with insufficient attention to the physical world of artifacts—their design and use in the world of real activity (Hutchins 1994). Norman (1988) has done much to alleviate this situation, turning our attention toward what Sylvia Plath called the ``thinginess of things'' (Plath 1982), but we still have a long way to go. Activity theory proposes that activity cannot be understood without understanding the role of artifacts in everyday existence, especially the way artifacts are integrated into social practice (which thus contrasts with Gibson's notion of affordances). Cognitive science has concentrated on information, its representation and propagation; activity theory is concerned with practice, that is, doing and activity, which significantly involve ``the mastery of ... external devices and tools of labor activity'' (Zinchenko 1986). Kaptelinin (chapters 3, 5, this volume) and Zinchenko (this volume) describe the activity theory concept of ``functional organ,'' a fundamental notion pinpointing the way the mind and body are profoundly extended and transformed by artifacts (see also Vygotsky 1929, Leont'ev 1981). There are echoes of Haraway's (1990) cyborg here but in a different (and much earlier) voice. The notion of the functional organ, rather than being a riveting poetic image like the cyborg, is a tenet of a larger system of theoretical thought and a tool for further scientific inquiry. Some readers may be impatient with activity theory terminology. It can be inelegant in translation from the Russian and, worse, confusing. 
The notion of an ``object,'' in particular, becomes a point of confusion as activity theorists use terms such as ``object-oriented'' in an entirely different way than they are used in the programming community. A degree of forbearance is helpful when first confronting activity theory terminology.

Activity theory challenges much that we have held useful and important in HCI research. But this book is not mounted as an attack on previous work; rather, it is an inquiry into satisfying ways to extend, and where necessary to reformulate, the basis for the study of problems in human-computer interaction. This inquiry is intended to be ecumenical and inclusive yet probing and questioning. There is a new kind of post-postmodern voice struggling to speak clearly here; it is polyvocal and dialogical, to be sure, but also committed to social and scientific engagement. This voice has little use for the peevish debate and posturing that mark much current (and past) discourse; instead the aim is to acknowledge, learn from, and yet go beyond existing theory, to reach for what Bertelsen (1994) calls a ``radical pragmatic science of HCI.'' Many who have come to find activity theory useful for HCI acknowledge a debt to cognitive science, especially the pioneering work of Card, Moran, and Newell (1983), for the suggestion that HCI design can benefit from a rigorous scientific foundation, as well as a debt to participatory design work (Kyng 1991; Muller and Kuhn 1993), which urges a humane, socially responsible scientific practice. That activity theory fuses these two intellectual impulses into a unified approach perhaps explains why we are seeking its counsel at this particular time in the history of our field.

REFERENCES

Bakhurst, D. (1991). Consciousness and Revolution in Soviet Philosophy. Cambridge: Cambridge University Press.

Bannon, L., and Bødker, S. (1991). Beyond the interface: Encountering artifacts in use. In J. Carroll, ed., Designing Interaction: Psychology at the Human Computer Interface. Cambridge: Cambridge University Press.
Bertelsen, O. (1994). Fitts' law as a design artifact: A paradigm case of theory in software design. In Proceedings EastWest Human Computer Interaction Conference (vol. 1, pp. 37–43). St. Petersburg, August 2–6.
Bobrow, D. (1972). Requirements for advanced programming systems for list processing. Communications of the ACM (July).
Brooks, R. (1991). Comparative task analysis: An alternative direction for human-computer interaction science. In J. Carroll, ed., Designing Interaction: Psychology at the Human Computer Interface. Cambridge: Cambridge University Press.
Card, S., Moran, T., and Newell, A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.
Carey, T., and Rusli, M. (1995). Usage representations for re-use of design insights: A case study of access to on-line books. In J. Carroll, ed., Scenario-based Design for Human Computer Interaction. New York: Wiley.
Draper, S. (1993). Critical notice: Activity theory: The new direction for HCI? International Journal of Man-Machine Studies 37(6):812–821.
Engeström, Y. (1993). Developmental studies of work as a testbench of activity theory. In S. Chaiklin and J. Lave, eds., Understanding Practice: Perspectives on Activity and Context (pp. 64–103). Cambridge: Cambridge University Press.
Haraway, D. (1990). Simians, Cyborgs and Women: The Reinvention of Nature. London: Routledge.
Hutchins, E. (1994). Cognition in the Wild. Cambridge, MA: MIT Press.
Kyng, M. (1991). Designing for cooperation—cooperating in design. Communications of the ACM 34(12):64–73.
Leont'ev, A. (1974). The problem of activity in psychology. Soviet Psychology 13(2):4–33.
Leont'ev, A. (1981). Problems of the Development of Mind. Moscow: Progress.
Malone, T. (1983). How do people organize their desks? ACM Transactions on Office Information Systems 1, 99–112.
Muller, M., and Kuhn, S. (1993). Introduction to special issue on participatory design. Communications of the ACM 36:24–28.
Norman, D. (1988). The Psychology of Everyday Things. New York: Basic Books.
Plath, S. (1982). The Journals of Sylvia Plath. New York: Ballantine Books.
Vygotsky, L. S. (1925/1982). Consciousness as a problem in the psychology of behaviour. In Collected Works: Questions of the Theory and History of Psychology. Moscow: Pedagogika.
Vygotsky, L. S. (1929). The problem of the cultural development of the child. Journal of Genetic Psychology 36:415–432.
Zinchenko, V. P. (1986). Ergonomics and informatics. Problems in Philosophy 7:53–64.

5 Plans

Once the European navigator has developed his operating plan and has available the appropriate technical resources, the implementation and monitoring of his navigation can be accomplished with a minimum of thought. He has simply to perform almost mechanically the steps dictated by his training and by his initial planning synthesis. (Gladwin 1964: 175)

Every account of communication involves assumptions about action, in particular about the bases for action’s coherence and intelligibility. This chapter and the next discuss two alternative views of action. The first, adopted by most researchers in artificial intelligence, locates the organization and significance of human action in underlying plans. At least as old as the Occidental hills, this view of purposeful action is the basis for traditional philosophies of rational action and for much of the behavioral sciences. It is hardly surprising, therefore, that it should be embraced by those newer fields concerned with intelligent artifacts, particularly cognitive science and information-processing psychology. On the planning view, plans are prerequisite to and prescribe action, at every level of detail. Mutual intelligibility is a matter of the reciprocal recognizability of our plans, enabled by common conventions for the expression of intent and shared knowledge about typical situations and appropriate actions. The alternative view, developed in Chapter 6 of this book, is that although the course of action can always be projected or reconstructed in terms of prior intentions and typical situations, the prescriptive significance of intentions for situated action is inherently 51

vague. The coherence of situated action is tied in essential ways not to individual predispositions or conventional rules but to local interactions contingent on the actor’s particular circumstances.1 A consequence of action’s situated nature is that communication must incorporate both a sensitivity to local circumstances and resources for the remedy of troubles in understanding that inevitably arise. This chapter reviews the planning model of purposeful action and shared understanding. Those who adopt the planning model as a basis for interaction between people and machines draw on three related theories about the mutual intelligibility of action: (1) the planning model itself, which takes the significance of action to be derived from plans and identifies the problem for interaction as their recognition and coordination; (2) speech act theory, which accounts for the recognizability of plans or intentions by proposing conventional rules for their expression; and (3) the idea of shared background knowledge, as the common resource that stands behind individual action and gives it social meaning. Each of these promises to solve general problems in human communication, such as the relation of observable behavior to intent, the correspondence of intended and interpreted meaning, and the stability of meaning assignments across situations, in ways that are relevant to particular problems in people’s interaction with machines.

the planning model

The planning model in cognitive science treats a plan as a sequence of actions designed to accomplish some preconceived end. The model posits that action is a form of problem solving, where the actor’s problem is to find a path from some initial state to a desired goal state, given

The term circumstances of course begs a further set of questions. Most important for the purpose of this argument is recognition of the extent to which the conditions of our actions are not simply pregiven and self-evident but are themselves constituted through unfolding courses of action and interaction. This is not to say that action is constructed somehow always de novo or in a vacuum. On the contrary, human activity invariably occurs in circumstances that include more and less long-standing, obdurate, and compelling layers of culturally and historically constituted, social and material conditions. However familiar and constraining, though, the significance of those conditions, and their relevance for what is happening here and now, must be actively reenacted by participants in ways not fully specified in advance or in any strongly determinate way. For explorations of the improvisatory character of action drawn from close studies of jazz performance, see Sawyer (2003).

certain conditions along the way. Actions are described, at whatever level of detail, by their preconditions and their consequences: In problem-solving systems, actions are described by prerequisites (i.e., what must be true to enable the action), effects (what must be true after the action has occurred), and decomposition (how the action is performed, which is typically a sequence of subactions). (Allen 1984: 126)
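To make this style of description concrete, here is a minimal sketch, in Python, of an action represented by its prerequisites, effects, and decomposition. The blocks-world action, the field names, and the predicate strings are illustrative assumptions, not Allen's own notation.

```python
# A hypothetical rendering of an action described by prerequisites, effects,
# and a decomposition into subactions, in the spirit of the passage above.
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    prerequisites: frozenset   # what must be true to enable the action
    effects: frozenset         # what must be true after the action has occurred
    decomposition: tuple = ()  # how the action is performed: a sequence of subactions


put_block_on_table = Action(
    name="put-A-on-table",
    prerequisites=frozenset({"holding(A)"}),
    effects=frozenset({"on(A, table)", "handempty"}),
    decomposition=("move-arm-over-table", "open-gripper"),
)


def enabled(action: Action, state: frozenset) -> bool:
    """An action is enabled in a given state when all of its prerequisites hold there."""
    return action.prerequisites <= state


print(enabled(put_block_on_table, frozenset({"holding(A)", "at(table)"})))  # True
```

Everything the system can know about the act is carried by these three fields; how a condition such as "holding(A)" actually comes to hold in the world lies outside the representation.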

Goals define the actor’s relationship to the situation of action, because the situation is just those conditions that obstruct or advance the actor’s progress toward his or her goals. Advance planning is inversely related to prior knowledge of the environment of action and of the conditions that the environment is likely to present. Unanticipated conditions will require replanning. In every case, however, whether constructed entirely in advance or completed and modified during the action’s course, the plan is prerequisite to the action. Plan Generation and Execution Monitoring One of the earliest attempts to implement the planning model on a machine occurred as part of a project at Stanford Research Institute, beginning in the mid-1960s. The project’s goal was to build a robot that could navigate autonomously through a series of rooms, avoiding obstacles and moving specified objects from one room to another. The robot, named by its designers Shakey, was controlled by a problem-solving program called STRIPS, which employed a means–end analysis to determine the robot’s path (Fikes and Nilsson 1971). The STRIPS program examined the stated goal and then determined a subset of operators, or actions available to the robot, that would produce that state. The preconditions of those actions in turn identified particular subgoal states, which could be examined in the same way. The system thus worked backward from the goal until a plan was defined from the initial state to the goal state, made up of actions that the robot could perform. Subsequent work on problem solving and plan synthesis consisted in large part in refinements to this basic means–ends strategy, toward the end of achieving greater efficiency by constraining the search through possible solution paths.2 Beyond the problem of constructing plans, artificial intelligence researchers have had to address problems of what Nilsson (1973) terms 2

For a review of subsequent work see Sacerdoti (1977, Chapter 3). (Original footnote.)

failure and surprise in the execution of their planning programs, due to the practical exigencies of action in an unpredictable environment. The objective that Shakey should actually be able to move autonomously through a real (albeit somewhat impoverished) environment added a new class of problems to those faced by mathematical or game-playing programs operating in an abstract formal domain: for a problem-solver in a formal domain is essentially done when it has constructed a plan for a solution; nothing can go wrong. A robot in the real world, however, must consider the execution of the plan as a major part of every task. Unexpected occurrences are not unusual, so that the use of sensory feedback and corrective action are crucial. (Raphael, cited in McCorduck 1979: 224)
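The basic means–ends strategy attributed to STRIPS above (select an operator whose effects supply an unmet goal condition, treat its preconditions as subgoals, and continue until the initial state is reached) can be sketched roughly as follows. The operators and the state encoding are illustrative assumptions rather than the original STRIPS domain, and, as the passage just quoted stresses, the plan that results says nothing yet about execution in an unpredictable world.

```python
# A rough sketch of backward chaining over operators described by
# preconditions and effects (hypothetical domain, not the original STRIPS).

OPERATORS = {
    "go-to-box": {"pre": {"at(door)"}, "add": {"at(box)"}, "del": {"at(door)"}},
    "push-box-to-goal": {"pre": {"at(box)"}, "add": {"box-at(goal)"}, "del": set()},
}


def apply_ops(state, names):
    """Compute the state that results from performing a sequence of operators."""
    for n in names:
        op = OPERATORS[n]
        state = (state | op["add"]) - op["del"]
    return state


def plan(state, goals, depth=8):
    """Return a list of operator names achieving goals from state, or None."""
    if goals <= state:
        return []
    if depth == 0:
        return None
    unmet = next(iter(goals - state))
    for name, op in OPERATORS.items():
        if unmet in op["add"]:
            # Achieve the operator's preconditions first (they become subgoals),
            # then apply the operator and plan for whatever goals remain.
            prefix = plan(state, set(op["pre"]), depth - 1)
            if prefix is None:
                continue
            mid_state = apply_ops(state, prefix + [name])
            suffix = plan(mid_state, goals, depth - 1)
            if suffix is not None:
                return prefix + [name] + suffix
    return None


print(plan({"at(door)"}, {"box-at(goal)"}))  # ['go-to-box', 'push-box-to-goal']
```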

In Shakey’s case, execution of the plan generated by the STRIPS program was monitored by a second program called PLANEX. The PLANEX program monitored not the actual moves of the robot, however, but the execution of the plan. The program simply assumed that the execution of the plan meant that the robot had taken the corresponding action in the real world. The program also made the assumption that every time the robot moved there was some normally distributed margin of error that would be added to a “model of the world” or representation of the robot’s location. When the cumulative error in the representation got large enough, the plan monitor initiated another part of the program that triggered a camera that could, in turn, take a reading of Shakey’s location in the actual world. The uncertainty to which Shakey was to respond consisted in changes made to the objects in its environment. Another order of uncertainty was introduced with Sacerdoti’s system NOAH (an acronym for Nets of Action Hierarchies). Also developed at the Stanford Research Institute as part of the Computer-Based Consultant project, NOAH was designed to monitor and respond to the actions of a human user. With NOAH, Sacerdoti extended the techniques of problem solving and execution monitoring developed in the planning domain to the problem of interactive instruction: NOAH is an integrated problem solving and execution monitoring system. Its major goal is to provide a framework for storing expertise about the actions of a particular task domain, and to impart that expertise to a human in the cooperative achievement of nontrivial tasks. (Sacerdoti 1977: 2)

The output of the planning portion of Sacerdoti’s program is a “procedural net” or hierarchy of partially ordered actions, which becomes in turn the input to the execution-monitoring portion of the system. The

execution monitor takes the topmost action in the hierarchy, provides the user with an instruction, and then queries the user regarding the action’s completion. A principal objective of the innovations that Sacerdoti introduced for the representation of procedures in NOAH was to extend execution monitoring to include tracking and assessment of the user’s actions in response to the instructions generated: The system will monitor the apprentice’s work to ensure that the operation is proceeding normally. When the system becomes aware of an unexpected event, it will alter instructions to the apprentice to deal effectively with the new situation. (ibid.: 3)

A positive response from the user to the system’s query regarding the action is taken to mean that the user understood the instruction and has successfully carried it out, whereas a negative response is taken as a request for a more detailed instruction. The system allows as well for a “motivation response” or query from the user as to why a certain task needs to be done (to which the system responds by listing tasks to which the current task is related) and for an “error response” or indication from the user that the current instruction cannot be carried out. Just as the accumulation of error in the PLANEX program required feedback from the world in order to reestablish the robot’s location, the error response from the user in Sacerdoti’s system requires that NOAH somehow repair its representation of the user’s situation: PLANEX presumed that an adequate mechanism existed for accurately updating the world model. This was almost the case, since there were only a small number of actions that the robot vehicle could take, and the model of each action contained information about the uncertainty it would introduce in the world model. When uncertainties reached a threshold, the vision subsystem was used to restore the accuracy of the world model. For the domain of the Computer-Based Consultant, or even for a richer robot domain, this approach will prove inadequate . . . NOAH cannot treat the world model as a given. It must initiate interactions with the user at appropriate points to ensure that it is accurately monitoring the course of the execution . . . [W]hen a serious error is discovered (requiring the system to be more thorough in its efforts to determine the state of the world), the system must determine what portions of its world model differ from the actual situation. (ibid.: 71–2)
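The instruction and query cycle described for NOAH's execution monitor can be caricatured in a few lines. The task hierarchy, the wording of the instructions, and the treatment of the error response are all illustrative assumptions, not Sacerdoti's program.

```python
# A hypothetical sketch of an execution monitor that instructs a user, queries
# for completion, and expands an action into subactions on a negative response.

PROCEDURAL_NET = {
    "replace the pump": ["remove the old pump", "install the new pump"],
    "remove the old pump": ["undo the mounting bolts", "lift the pump clear"],
}


def monitor(task, get_response):
    """Issue an instruction for the task, then act on the user's reply."""
    print("Please " + task + ".")
    reply = get_response(task)  # expected: "yes", "no", or "error"
    if reply == "yes":
        return True
    if reply == "no" and task in PROCEDURAL_NET:
        # A negative response is read as a request for more detailed instruction.
        return all(monitor(sub, get_response) for sub in PROCEDURAL_NET[task])
    # An error response means the monitor's model of the user's situation no
    # longer matches the world and must somehow be repaired.
    print("Cannot proceed with '" + task + "': the world model needs updating.")
    return False


# Example run, with canned replies standing in for the apprentice.
replies = iter(["no", "yes", "yes"])
monitor("replace the pump", lambda task: next(replies))
```

What the sketch leaves out, of course, is precisely the work discussed below: establishing what a user's response actually signifies.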

The situation in which Shakey moved consisted of walls and boxes (albeit boxes that could be moved unexpectedly by a human hand). The problem in designing Shakey was to maintain consistency between the represented environment and the physical environment in which the robot moved. In introducing the actions of a user, the computer’s

environment becomes not only a physical but also a social one, requiring the interpretation of the user’s actions and an assessment of the user’s understanding of his or her situation. The difficulty of maintaining a shared understanding of a situation, as shown more clearly in Chapters 8 and 9, is not just a matter of monitoring the course of events but of establishing their significance. Nonetheless, with Sacerdoti we have at least a preliminary recognition of the place of the situation in the intelligibility of action and communication. Interaction and Plan Recognition Adherents of the planning model in artificial intelligence research have taken the requirement of interaction as an injunction to extend the planning model from a single individual to two or more individuals acting in concert. The planning model attempts to bring concerted action under the jurisdiction of the individual actor by attaching to the others in the actor’s world sufficient description and granting to the actor sufficient knowledge that he or she is able to respond to the actions of others as just another set of environmental conditions. The problem of social interaction, consequently, becomes an extension of the problem of the individual actor. The basic view of a single, goal-directed agent, acting in response to an environment of conditions, is complicated (the conditions now include the actions of other agents) but intact. The problem for interaction, on this view, is to recognize the actions of others as the expression of their underlying plans. The complement to plan generation and execution in artificial intelligence research, therefore, is plan recognition or the attribution of plans to others based on observation of their actions. The starting premise for a theory of plan recognition is that an observer takes some sequence of actions as evidence and then forms hypotheses about the plans that could motivate and explain those actions. One persisting difficulty for action understanding in artificial intelligence research has been the uncertain relation between actions and intended effects. Allen (1984) illustrates this problem with the example of turning on a light: There are few physical activities that are a necessary part of performing the action of turning on a light. Depending on the context, vastly different patterns of behavior can be classified as the same action. For example, turning on a light usually involves flipping a light switch, but in some circumstances it may involve tightening the light bulb (in the basement) or hitting the wall

(in an old house). Although we have knowledge about how the action can be performed, this does not define what the action is. The key defining characteristic of turning on the light seems to be that the agent is performing some activity which will cause the light, which was off when the action started, to become on when the action ends. An important side effect of this definition is that we could recognize an observed pattern of activity as “turning on the light” even if we had never seen or thought about that pattern previously. (ibid.: 126)

Allen’s point is twofold. First, the “same” action as a matter of intended effect can be achieved in any number of ways, where the ways are contingent on circumstance rather than on definitional properties of the action. And second, although an action can be accounted for post hoc with reference to its intended effect, an action’s course cannot be predicted from knowledge of the actor’s intent, nor can the course be inferred from observation of the outcome. Allen identifies the indeterminate relationship of intended effect to method as a problem for planning or plan recognition systems: a problem that he attempts to resolve by constructing a logical language for action descriptions that handles the distinction between what he calls the “causal definition” of an action (i.e., the pre and post conditions that must hold to say that the action has occurred, independent of any method) and the action’s characterization in terms of a particular method or procedure for its accomplishment.3 Whereas Allen’s approach to the problem of plan recognition is an attempt to reconstruct logically our vocabulary of purposeful action, a few more psychologically oriented researchers in artificial intelligence have undertaken experiments designed to reveal the process by which people bring the actions of others under the jurisdiction of an ascribed plan. Schmidt, Sridharan, and Goodson (1978) observe, for example, that plan attribution seems to require certain transformations of the sequential organization of the action described.4 They report that throughout the process of plan attribution the problem to be solved by the subject remains “ill-formed,” by which they mean that at any given time neither the range of possible plans that the other might be carrying out, 3

3. Another, less problematic, uncertainty that Allen attempts to capture is the observation that while some components of an action are sequentially ordered in a necessary way (i.e., one is prerequisite to the other), other components, although necessary to the action, have no necessary sequential relationship to each other. The incorporation of unordered actions into the structure of plans, pioneered by Sacerdoti (1975), was viewed as a substantial breakthrough in early planning research. (Original footnote.)
4. The empirical method of their study is unusual in artificial intelligence research, where work generally proceeds on the basis of imagination and introspection. (Original footnote.)

nor the criteria for assessing just what plan is actually in effect, are clearly defined (ibid.: 80). Nonetheless, they report that their subjects are able to posit an underlying plan. Their strategy appears to be to adopt tentatively a single hypothesis about the other’s plan rather than entertain all or even some number of logical possibilities simultaneously. The preferred hypothesis regarding the other’s plan then affects what actions are noted and recalled in the subject’s accounts of the action, and the temporal order of events is restructured into logical “in order to” or “because” relationships, such that relations among actions are not restricted to consecutive events in time. At the same time, the current hypothesis is always subject to elaboration or revision in light of subsequent events to the extent that subjects are often required to suspend judgment on a given hypothesis and to adopt a “wait and see” strategy. Wherever possible, actions that violate the structure of an attributed plan are explained away before the plan itself is reconsidered. Schmidt, Shridharan, and Goodson conclude that all of these observations “support the generalization that action understanding is simply a process of plan recognition” (ibid.: 50). It is worth noting, however, that although these observations clearly point to a process of plan attribution by the observer, there is no independent evidence that the process of plan attribution is a process of recognizing the plan of the actor. The Status of Plans Assessment of the planning model is complicated by equivocation in the literature between plans as a conceptual framework for the analysis and simulation of action and plans as a psychological mechanism for its actual production. When researchers describe human action in terms of plans, the discussion generally finesses the question of just how the formulations provided by the researcher are purported to relate to the actor’s intent. The claim is at least that people analyze each other’s actions into goals and plans to understand each other. But the suggestion that the plan is “recognized” implies that it has an existence prior to and independent of the attribution and that it actually determines the action. The identification of the plan with the actor’s intent is explicit in the writing of philosophers of action supportive of artificial intelligence research like Margaret Boden, who writes: unless an intention is thought of as an action-plan that can draw upon background knowledge and utilize it in the guidance of behavior one cannot understand how intentions function in real life. (1973: 27–8)

Intentions, in other words, are realized as plans-for-action that directly guide behavior. A logical extension of Boden’s view, particularly given an interest in rendering it more computable, is the view that plans actually are prescriptions or instructions for action. An early and seminal articulation of this view came from Miller, Galanter, and Pribram (1960), who define an intention as “the uncompleted parts of a Plan whose execution has already begun” (ibid.: 61). With respect to the plan itself: Any complete description of behavior should be adequate to serve as a set of instructions, that is, it should have the characteristics of a plan that could guide the action described. When we speak of a plan . . . the term will refer to a hierarchy of instructions . . . A plan is any hierarchical process in the organism that can control the order in which a sequence of operations is to be performed. A Plan is, for an organism, essentially the same as a program for a computer . . . , we regard a computer program that simulates certain features of an organism’s behavior as a theory about the organismic Plan that generated the behavior. Moreover, we shall also use the term “Plan” to designate a rough sketch of some course of action . . . , as well as the completely detailed specification of every detailed operation . . . We shall say that a creature is executing a particular Plan when in fact that Plan is controlling the sequence of operations he is carrying out. (ibid.: 17, original emphasis)
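Read computationally, the passage amounts to something like the following sketch, in which a Plan is a hierarchy of instructions and executing it is simply a matter of descending that hierarchy in order. The domain is an invented illustration.

```python
# A hypothetical rendering of a Plan as a hierarchy of instructions that
# controls the order in which a sequence of operations is performed.

PLAN = ("make tea", [
    ("boil water", ["fill kettle", "switch kettle on", "wait for boil"]),
    ("brew", ["put leaves in pot", "pour water", "steep"]),
    "pour into cup",
])


def execute(step):
    """Carry out a step: either a primitive operation or a named subplan."""
    if isinstance(step, str):
        print("doing:", step)
        return
    name, substeps = step
    print("subplan:", name)
    for sub in substeps:
        execute(sub)


execute(PLAN)
```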

With Miller, Galanter, and Pribram the view that purposeful action is planned is put forth as a psychological “process theory” compatible with the interest in a mechanistic, computationally tractable account of intelligent action.5 By improving on or completing our commonsense descriptions of the structure of action, the structure is now represented not only as a plausible sequence but also as a hierarchical plan. The plan reduces, moreover, to a detailed set of instructions that actually serves as the program that controls the action. At this point the plan as stipulated becomes substitutable for the action, insofar as the action is viewed as derivative from the plan. And once this substitution is done, the theory is self-sustaining: the problem of action is assumed to be solved by the planning model and the task that remains is the model’s refinement. Although attributing the plan to the actor resolves the question of the plan’s status, however, it introduces new problems with respect to what we actually mean by purposeful action. If plans are synonymous with purposeful action how do we account, on the one hand, for a prior intent to act that may never be realized and, on the other hand, for 5

For a close, critical reading of Miller, Galanter, and Pribram from within the field of artificial intelligence research, see Agre (1997, Chapter 8).

an intentional action for which we would ordinarily say no plan was formed ahead of time?6 And if any plan of action can be analyzed at any level of detail, what level of description represents that which we would want to call purposeful action? If every level, there is no reason in principle to distinguish, for example, between deliberate action and involuntary response, as the latter always can be ascribed to a process of planning unavailable to the actor. In fact, this is just what Boden would have us do. On her account, action can be reduced to basic units for which “no further procedural analysis could conceivably be given.” Those units compose “complex procedural schemata or action-plans,” which in turn produce “complex intentional effects” (1973: 36). Psychological processes at the level of intention, in other words, are reducible ultimately to bodily operations. But although the planning model would have a statement of intent reflect an actual set of instructions for action, even casual observation indicates that our statements of intent generally do not address the question of situated action at any level of detail. In fact, because the relation of the intent to accomplish some goal to the actual course of situated action is enormously contingent, a statement of intent generally says very little about the action that follows. It is precisely because our plans are inherently vague – because we can state our intentions without having to describe the actual course that our actions will take – that an intentional vocabulary is so useful for our everyday affairs. The confusion in the planning literature over the status of plans mirrors the fact that in our everyday action descriptions we do not normally distinguish between accounts of action provided before and after the fact and an action’s actual course. As commonsense constructs plans are a constituent of practical action, but they are constituent as an artifact of our reasoning about action, not as the generative mechanism of action. Our imagined projections and our retrospective reconstructions are the principal means by which we catch hold of situated action and reason about it, whereas situated action itself, in contrast, is essentially transparent to us as actors.7 The planning model, however, takes over our commonsense preoccupation with the anticipation of action and the review of 6

6. Davis (cited in Allen 1984) gives the example of a person driving who brakes when a small child runs in front of the car. See also Searle’s distinction (1980) between “prior intentions” and “intentions-in-action.” (Original footnote.)
7. One result of the transparency of situated action is that we have little vocabulary with which to talk about it, though Chapters 6 and 7 attempt to present some recent efforts from the social sciences. For a treatment of the philosophical vocabulary proposed by Heidegger, see Dreyfus (1991). (Original footnote, with updated reference.)

its outcomes and attempts to systematize that reasoning as a model for action while ignoring the actual stuff, the situated action, which is the reasoning’s object.8

speech acts

A growing number of research efforts devoted to machine intelligence have as their objective, for both theoretical and practical reasons, human–machine communication using English or “natural language” (for example, Brady and Berwick 1983; Bruce 1981; Joshi, Webber, and Sag 1981). Researchers in natural language understanding have embraced Austin’s observation (1962) that language is a form of action as a way of subsuming communication to the planning model. If language is a form of action, it follows that language understanding, like the interpretation of action generally, involves an analysis of a speaker’s utterances in terms of the plans those utterances serve: Let us start with an intuitive description of what we think occurs when one agent A asks a question of another agent B which B then answers. A has some goal; s/he creates a plan (plan construction) that involves asking B a question whose answer will provide some information needed in order to achieve the goal. A then executes this plan, asking B the question. B interprets the question, and attempts to infer A’s plan (plan inference). (Allen 1983; original emphasis)

As with the interpretation of action, plans are the substrate on which the interpretation of natural language utterances rests, insofar as “human language behavior is part of a coherent plan of action directed toward satisfying a speaker’s goals” (Appelt 1985: 1). We understand language, and action more generally, when we successfully infer the other’s goals and understand how the other’s action furthers them. The appropriateness of a response turns on that analysis, from which in turn: The hearer then adopts new goals (e.g., to respond to a request, to clarify the previous speaker’s utterance or goal), and plans his own utterances to achieve those. A conversation ensues. (Cohen n.d.: 24)

A note of clarification is in order here, particularly in light of readings of this text that have taken my argument to be either that plans do not exist or that they are “merely” fictions created before and after the fact of specifically situated activity. In rereading this passage I realize the contribution that I myself may have made to this misunderstanding in not emphasizing clearly enough that I take planning itself to be a form of situated action. As I have argued in Chapter 1, this is true both in the sense that plans are imaginative and discursive accounts created in anticipation of action and in the sense that they may be cited in the midst of ongoing activity, as well as afterwards. See also Chapter 11.

Given such an account of conversation, the research problem with respect to language understanding is essentially the same as that of the planning model more generally; that is, to characterize actions in terms of their preconditions and effects and to formulate a set of inference rules for mapping between actions and underlying plans. Among researchers in the natural language area of artificial intelligence research, Searle’s speech act theory (1969) is seen to offer some initial guidelines for computational models of communication: We hypothesize that people maintain, as part of their models of the world, symbolic descriptions of the world models of other people. Our plan-based approach will regard speech acts as operators whose effects are primarily on the models that speakers and hearers maintain of each other. (Cohen and Perrault 1979: 179)

Searle’s conditions of satisfaction for the successful performance of speech acts are read as the speech act’s “preconditions,” whereas its illocutionary force is the desired “effect”: Utterances are produced by actions (speech acts) that are executed in order to have some effect on the hearer. This effect typically involves modifying the hearer’s beliefs or goals. A speech act, like any other action, may be observed by the hearer and may allow the hearer to infer what the speaker’s plan is. (Allen 1983: 108)
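A rough sketch of what it looks like to treat a speech act as a planning operator in this spirit, with preconditions and effects stated over models of the speaker's and hearer's beliefs and goals. The REQUEST operator and its predicates are an illustrative simplification, not Cohen and Perrault's formal definitions.

```python
# A hypothetical REQUEST operator whose preconditions and effects range over
# models of the speaker's and hearer's mental states.

def request(speaker, hearer, act, beliefs, goals):
    """REQUEST(speaker, hearer, act): the speaker asks the hearer to perform act."""
    # Preconditions: the speaker wants the act done and believes the hearer can do it.
    assert (speaker, "wants", act) in goals
    assert (speaker, "believes-can-do", hearer, act) in beliefs
    # Effects: the hearer comes to believe the speaker wants the act done and,
    # on this model, may adopt it as a goal of his or her own.
    beliefs.add((hearer, "believes", speaker, "wants", act))
    goals.add((hearer, "wants", act))


beliefs = {("A", "believes-can-do", "B", "close-the-door")}
goals = {("A", "wants", "close-the-door")}
request("A", "B", "close-the-door", beliefs, goals)
print(sorted(goals))  # B has now adopted A's goal as well
```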

In describing utterances by their preconditions and effects, speech acts seem to provide at least the framework within which computational mechanisms for engineering interaction between people and machines might emerge. But although Searle’s “conditions of satisfaction” state conventions governing the illocutionary force of certain classes of utterance, he argues against the possibility of a rule-based semantics for construing the significance of any particular utterance. Although the maxims that speech act theory proposes (for example, the felicity condition for a directive is that S wants H to do A) tell us something about the general conditions of satisfaction for a directive, they tell us nothing further about the significance of any particular directive. With respect to the problem of interpretation, Gumperz (1982b: 326) offers the following example from an exchange between two secretaries in a small office:

A: Are you going to be here for ten minutes?
B: Go ahead and take your break. Take longer if you want.
A: I’ll just be outside on the porch. Call me if you need me.
B: OK. Don’t worry.

Gumperz points out that B’s response to A’s question clearly indicates that B interprets the questions as an indirect request that B stay in the office while A takes a break and, by her reply, A confirms that interpretation. B’s interpretation accords with a categorization of A’s question as an indirect speech act (Searle 1979), and with Grice’s discussion of implicature (1975); that is, B assumes that A is cooperating and that her question must be relevant, and therefore B searches her mind for some possible context or interpretive frame that would make sense of the question and comes up with the break. But, Gumperz points out, this analysis begs the question of how B arrives at the right inference: What is it about the situation that leads her to think A is talking about taking a break? A common sociolinguistic procedure in such cases is to attempt to formulate discourse rules such as the following: “If a secretary in an office around break time asks a co-worker a question seeking information about the co-worker’s plans for the period usually allotted for breaks, interpret it as a request to take her break.” Such rules are difficult to formulate and in any case are neither sufficiently general to cover a wide enough range of situations nor specific enough to predict responses. An alternative approach is to consider the pragmatics of questioning and to argue that questioning is semantically related to requesting, and that there are a number of contexts in which questions can be interpreted as requests. While such semantic processes clearly channel conversational inference, there is nothing in this type of explanation that refers to taking a break. (1982b: 326–7)

The problem that Gumperz identifies here applies equally to attempts to account for inferences such as B’s by arguing that she recognizes A’s plan to take a break. Clearly she does: the outstanding question is how. Although we can always construct a post hoc account that explains interpretation in terms of knowledge of typical situations and motives, it remains the case that with speech act theory, as with the planning model, neither typifications of intent nor general rules for its expression are sufficient to account for the mutual intelligibility of our situated action. In the final analysis, attempts to construct a taxonomy of intentions and rules for their recognition seem to beg the question of situated interpretation rather than answer it.

background knowledge

Gumperz’s example demonstrates a problem that any account of human action must face; namely, that an action’s significance seems to lie as much in what it presupposes and implies about its situation as in any

explicit or observable behavior as such. Even the notion of observable behavior becomes problematic in this respect, insofar as what we do and what we understand others to be doing is so thoroughly informed by assumptions about the action’s significance. In the interpretation of purposeful action, it is hard to know where observation leaves off and where interpretation begins. In recognition of the fact that human behavior is a figure defined by its ground, social science has largely turned from the observation of behavior to explication of the background that seems to lend behavior its sense. For cognitive science the background of action is not the world as such, but knowledge about the world. Researchers agree that representation of knowledge about the world is a principal limiting factor on progress in machine intelligence. The prevailing strategy in representing knowledge has been to categorize the world into domains of knowledge (e.g., areas of specialization such as medicine along one dimension or propositions about physical phenomena such as liquids along another) and then to enumerate facts about the domain and relationships between them. Having carved out domains of specialized knowledge the catchall for anything not clearly assignable is “common sense,” which then can be spoken of as if it were yet another domain of knowledge (albeit one that is foundational to the others). Although some progress has been made in selected areas of specialized knowledge, the domain of commonsense knowledge so far remains intractable and unwieldy.9 One approach to bounding commonsense knowledge, exemplified by the work of Schank and Abelson (1977), is to classify the everyday world as types of situations and assign to each its own body of specialized knowledge. The claim is that our knowledge of the everyday world is organized by a “predetermined, stereotyped sequence of actions that define a well-known situation” or script (ibid.: 422). Needless to say, “[s]cripts are extremely numerous. There is a restaurant script, a birthday party script, a football game script, a classroom script, and so on” (ibid.: 423). Every situation, in other words, has its plan made up of ordered action sequences, each action producing the conditions that enable the next action to occur. Admittedly, the normative order of these action sequences can be thrown off course by any one of what Schank and Abelson term distractions, obstacles, or 9

For a cogent critique of the most ambitious effort to encode “commonsense” knowledge as a foundation for AI, see Adam’s account of the CYC project in Adam (1998, Chapter 3).

errors. Distractions, about which they have little to say, comprise the interruption of one script by another, whereas: An obstacle to the normal sequence occurs when someone or something prevents a normal action from occurring or some enabling condition for the action is absent. An error occurs when the action is completed in an inappropriate manner, so that the normal consequences of the action do not come about. (ibid.: 426)
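A script in this sense can be sketched as a stored data structure: a stereotyped sequence of scenes kept together with typical obstacles and their stock remedies. The restaurant scenes and remedies below are illustrative assumptions, not Schank and Abelson's own encoding.

```python
# A hypothetical script: an ordered sequence of scenes plus typical obstacles
# and the stock remedies stored alongside them.

RESTAURANT_SCRIPT = {
    "scenes": ["enter", "be seated", "order", "eat", "pay", "leave"],
    "obstacles": {
        # an enabling condition for a scene is absent
        "order": {"no menu": "ask the waiter for a menu"},
        "pay": {"no cash": "offer a card instead"},
    },
}


def run_script(script, world):
    """Step through the scenes, applying a stored remedy when an obstacle arises."""
    for scene in script["scenes"]:
        obstacle = world.get(scene)  # e.g. "no menu"
        if obstacle:
            remedy = script["obstacles"].get(scene, {}).get(obstacle)
            print(f"{scene}: obstacle '{obstacle}' -> {remedy or 'script fails'}")
            if remedy is None:
                return False
        print("doing:", scene)
    return True


run_script(RESTAURANT_SCRIPT, {"order": "no menu"})
```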

Not only does the typical script proceed according to a normal sequence of actions, in other words, but each script has its typical obstacles and errors that, like the script itself, are stored in memory along with their remedies and retrieved and applied as needed. Whereas plans associate intentions with action sequences, scripts associate action sequences with typical situations. In practice, however, the stipulation of relevant background knowledge for typical situations always takes the form of a partial list, albeit one offered as if the author could complete the list given the requisite time and space: If one intends to buy bread, for instance, the knowledge of which bakers are open and which are shut on that day of the week will enter into the generation of one’s plan of action in a definite way; one’s knowledge of local topography (and perhaps of map-reading) will guide one’s locomotion to the selected shop; one’s knowledge of linguistic grammar and of the reciprocal roles of shopkeeper and customer will be needed to generate that part of the action-plan concerned with speaking to the baker, and one’s financial competence will guide and monitor the exchange of coins over the shop counter. (Boden 1973: 28)

Like Boden’s story of the business of buying bread, attempts in artificial intelligence research to formalize commonsense knowledge rely on an appeal to intuition that shows little sign of yielding to scientific methods. The difficulty is not just that every action presupposes a large quantity of background knowledge: though it would pose practical problems, such a difficulty would be tractable eventually. Just because “implicit knowledge” can in principle be enumerated indefinitely, deciding in practice about the enumeration of background knowledge remains a stubbornly ad hoc procedure, for which researchers have not succeeded in constructing rules that do not depend, in their turn, on some deeper ad hoc procedures. Nevertheless, the image evoked by “shared knowledge” is a potentially enumerable body of implicit assumptions or presuppositions that stands behind every explicit action or utterance and from which

participants in interaction selectively draw in understanding each other’s actions. This image suggests that what actually does get said on any occasion must reflect the application of a principle of communicative economy, which recommends roughly that to the extent that either the premises or rationale of an action can be assumed to be shared, they can be left unspoken. That means, in turn, that speakers must have procedures for deciding the extent of the listener’s knowledge and the commensurate requirements for explication. The listener, likewise, must make inferences regarding the speaker’s assumptions about shared knowledge on the basis of what he or she chooses explicitly to say. What is unspoken and relevant to what is said is assumed to reside in the speaker’s and listener’s common stock of background knowledge, the existence of which is proven by the fact that an account of what is said always requires reference to further facts that, though unspoken, are clearly relevant. This image of communication is challenged, however, by the results of an exercise assigned by Garfinkel to his students (1972). Garfinkel’s aim was to press the commonsense notion that background knowledge is a body of things thought but unsaid that stands behind behavior and makes it intelligible. The request was that the students provide a complete description of what was communicated, in one particular conversation, as a matter of the participants’ shared knowledge. Students were asked to report a simple conversation by writing on the left-hand side of a piece of paper what was said and on the right-hand side what it was that they and their partners actually understood was being talked about. Garfinkel reports that when he made the assignment: many students asked how much I wanted them to write. As I progressively imposed accuracy, clarity, and distinctness, the task became increasingly laborious. Finally, when I required that they assume I would know what they had actually talked about only from reading literally what they wrote literally, they gave up with the complaint that the task was impossible. (ibid.: 317)

The students’ dilemma was not simply that they were being asked to write “everything” that was said, where that consisted of some bounded, albeit vast, content. Instead, it was that the task of enumerating what was talked about itself extended what was talked about, providing a continually receding horizon of understandings to be accounted for. The assignment, it turned out, was not to describe some existing content but to generate it. As such, it was an endless task. The students’ failure suggests not that they gave up too soon but that what they were assigned

to do was not what the participants in the conversation themselves did to achieve shared understanding. Although the notion of “background assumptions” connotes an actual collection of things that are there in the mind of the speaker – a body of knowledge that motivates a particular action or linguistic expression and makes it interpretable – Garfinkel’s exercise, as well as the phenomenology of experience, suggest that there is reason to question the view that background assumptions are part of the actor’s mental state prior to action: As I dash out the door of my office, for example, I do not consciously entertain the belief that the floor continues on the other side, but if you stop me and ask me whether, when I charged confidently through the door, I believed that the floor continued on the other side, I would have to respond that indeed, I did. (Dreyfus 1982: 25)

A background assumption, in other words, is generated by the activity of accounting for an action when the premise of the action is called into question. But there is no particular reason to believe that the assumption actually characterizes the actor’s mental state prior to the act. In this respect, the “taken for granted” denotes not a mental state but something outside of our heads that, precisely because it is nonproblematically there, we do not need to think about. By the same token, in whatever ways we do find action to be problematical the world is there to be consulted should we choose to do so. Similarly, we can assume the intelligibility of our actions, and as long as the others with whom we interact present no evidence of failing to understand us we do not need to explain ourselves, yet the grounds and significance of our actions can be explicated endlessly. The situation of action is thus an inexhaustibly rich resource, and the enormous problems of specification that arise in cognitive science’s theorizing about intelligible action have less to do with action than with the project of substituting definite procedures for vague plans, and representations of the situation of action, for action’s actual circumstances. To characterize purposeful action as in accord with plans and goals is just to say again that it is purposeful and that somehow, in a way not addressed by the characterization itself, we constrain and direct our actions according to the significance that we assign to a particular context. How we do that is the outstanding problem. Plans and goals do not provide the solution for that problem; they simply restate it. The dependency of significance on a particular context, every particular

context’s open-endedness, and the essential contingency of contextual elaboration are resources for practical affairs but perplexities for a science of human action. And, to anticipate the analysis in Chapter 9, it is an intractable problem for projects that rest on providing in advance for the significance of canonical descriptions – such as instructions – for situated action.

6 Situated Actions

This total process [of Trukese navigation] goes forward without reference to any explicit principles and without any planning, unless the intention to proceed to a particular island can be considered a plan. It is nonverbal and does not follow a coherent set of logical steps. As such it does not represent what we tend to value in our culture as “intelligent” behavior. (Gladwin 1964: 175)

This chapter turns to recent efforts within anthropology and sociology to challenge traditional assumptions regarding purposeful action and shared understanding. A point of departure for the challenge is the idea that commonsense notions of planning are not inadequate versions of scientific models of action, but rather are resources for people’s practical deliberations about action.1 As projective and retrospective accounts of action, plans are themselves located in the larger context of some ongoing practical activity. And as commonsense notions about the structure of that activity, plans are part of the subject matter to be investigated in a study of purposeful action, not something to be improved on or transformed into axiomatic theories of action. The premise that practical reasoning about action is properly part of the subject matter of social studies is due to the emergence of a branch of sociology named ethnomethodology. This chapter describes the inversion of traditional social theory recommended by ethnomethodology and the implications of that inversion for the problem of purposeful

For an exposition of the ethnomethodological premises that underwrite this idea, see Lynch (1993).

action and shared understanding. To designate the alternative that ethnomethodology suggests (more a reformulation of the problem of purposeful action and a research programme than an alternative theory), I have introduced the term situated action.2 That term underscores the view that every course of action depends in essential ways on its material and social circumstances. Rather than attempt to abstract action away from its circumstances and represent it as a rational plan, the approach is to study how people use their circumstances to achieve intelligent action. Rather than build a theory of action out of a theory of plans, the aim is to investigate how people produce and find evidence for plans in the course of situated action. More generally, rather than subsume the details of action under the study of plans, plans are subsumed by the larger problem of situated action. The view of action that ethnomethodology recommends is neither behavioristic, in any narrow sense of that term, nor mentalistic. It is not behavioristic in that it assumes that the significance of action is not reducible to uninterpreted bodily movements. Nor is it mentalistic, however, in that the significance of action is taken to be based, in ways that are fundamental, rather than secondary or epiphenomenal, in the physical and social world. The basic premise is twofold: first, that what traditional behavioral sciences take to be cognitive phenomena have an essential relationship to a publicly available, collaboratively organized world of artifacts and actions and, second, that the significance of artifacts and actions, and the methods by which their significance is conveyed, have an essential relationship to their particular, concrete circumstances.3 The ethnomethodological view of purposeful action and shared understanding is outlined in this chapter under five propositions: (1) plans are representations of situated actions; (2) in the course of situated action, representation occurs when otherwise transparent activity becomes in some way problematic; (3) the objectivity of the situations

2. In saying that I had introduced the term situated action, I meant within the context of the present discussion. Subsequent attributions to the contrary, I by no means intended to suggest that I had coined that phrase! Origins of the phrase in sociological writings go back at least to C. Wright Mills’s (1940) “Situated Actions and Vocabularies of Motive.” Rawls (2002: 20) points out that although Garfinkel’s ethnomethodological treatment of the relation of action and accounts is consistent with Mills’s, Garfinkel attends not only to the retrospective character of accounts but also to the prospective and ongoing character of both accounts and the actions that they formulate.
3. On the relevance of a phenomenological account of the public availability of objects and artifacts to system design, see Robertson (2002).

of our action is achieved rather than given; (4) a central resource for achieving the objectivity of situations is language, which stands in a generally indexical relationship to the circumstances that it presupposes, produces, and describes; (5) as a consequence of the indexicality of language, mutual intelligibility is achieved on each occasion of interaction with reference to situation particulars rather than being discharged once and for all by a stable body of shared meanings.

plans are representations of action

The pragmatist philosopher and social psychologist George Herbert Mead (1934) has argued for a view of meaningful, directed action as two integrally but problematically related kinds of activity. One kind of activity is situated and ad hoc improvisation – the part of us, so to speak, that actually acts. The other kind of activity is derived from the first and includes our representations of action in the form of future plans and retrospective accounts. Plans and accounts are distinguished from action as such by the fact that to represent our actions we must in some way make an object of them. Consequently, our descriptions of our actions come always before or after the fact, in the form of imagined projections and recollected reconstructions.4 Mead’s treatment of the relation of deliberation and reflection to action is one of the more controversial, and in some ways incoherent, pieces of his theory. But his premise of a disjunction between our actions and our grasp of them at least raises the question for social science of the relationship between projected or reconstructed courses of action and actions in situ. Most accounts of purposeful action have taken this relationship to be a directly causal one, at least in a logical sense (see Chapter 5 in this book). Given a desired outcome, the actor is assumed to make a choice among alternative courses of action, based on the anticipated consequences of each with respect to that outcome. Accounts of actions taken, by the same token, are just a report on the choices made. The student of purposeful action on this view need know only the predisposition of the actor and the alternative courses that are available to predict the action’s course.

Here again, I regret the implication that plans and other forms of imaginative reflection stand somehow outside of action rather than being themselves moments of situated activity (activities of planning, remembering, etc.), displaced in time and space from the occasion anticipated or recollected. The interesting questions for this discussion turn on how it is that activities of planning are invoked and made relevant to the course of some subsequent activity and vice versa. See Chapter 11.


The action's course is just the playing out of these antecedent factors, knowable in advance of and standing in a determinate relationship to the action itself.

The alternative view is that plans are resources for situated action but do not in any strong sense determine its course. Although plans presuppose the embodied practices and changing circumstances of situated action, the efficiency of plans as representations comes precisely from the fact that they do not represent those practices and circumstances in all of their concrete detail. So, for example, in planning to run a series of rapids in a canoe, one is very likely to sit for a while above the falls and plan one's descent.5 The plan might go something like "I'll get as far over to the left as possible, try to make it between those two large rocks, then backferry hard to the right to make it around that next bunch." A great deal of deliberation, discussion, simulation, and reconstruction may go into such a plan. But however detailed, the plan stops short of the actual business of getting your canoe through the falls. When it really comes down to the details of responding to currents and handling a canoe, you effectively abandon the plan and fall back on whatever embodied skills are available to you.6 The purpose of the plan in this case is not to get your canoe through the rapids, but rather to orient you in such a way that you can obtain the best possible position from which to use those embodied skills on which, in the final analysis, your success depends.

5 This example was suggested to me by Randall Trigg, to whom I am indebted for the insight that plans orient us for situated action in this way. (Original footnote.)

6 This phrasing is unfortunate, in suggesting that the plan is somehow jettisoned (see Chapter 1). It would be better to say that your ability to act according to the plan ultimately turns on the embodied skills available to you in situ, which are themselves presupposed, rather than specified, by the plan.

Even in the case of more deliberative, less highly skilled activities we generally do not anticipate alternative courses of action or their consequences until some course of action is already underway. It is frequently only on acting in a present situation that its possibilities become clear, and we often do not know ahead of time, or at least not with any specificity, what future state we desire to bring about. Garfinkel points out that in many cases it is only after we encounter some state of affairs that we find to be desirable that we identify that state as the goal toward which our previous actions, in retrospect, were directed "all along" or "after all" (1967: 98). The fact that we can always perform a post hoc analysis of situated action that will make it appear to have followed a rational plan says more about the


nature of our analyses than it does about our situated actions. To return to Mead's point, rather than direct situated action, rationality anticipates action before the fact and reconstructs it afterwards.

representation and breakdown

Although we can always construct rational accounts of situated action before and after the fact, when action is proceeding smoothly it is essentially transparent to us. Similarly, when we use what Heidegger terms equipment that is "ready-to-hand," the equipment has a tendency to disappear:

Consider the example (used by Wittgenstein, Polanyi, and Merleau-Ponty) of the blind man's cane. We hand the blind man a cane and ask him to tell us what properties it has. After hefting and feeling it, he tells us that it is light, smooth, about three feet long, and so on; it is occurrent for him. But when the man starts to manipulate the cane, he loses his awareness of the cane itself; he is aware only of the curb (or whatever object the cane touches); or, if all is going well, he is not even aware of that . . . Precisely when it is most genuinely appropriated equipment becomes transparent. (Dreyfus 1991: 65)7

7 This quote has been updated from the citation in the original text, which was drawn from a prepublished manuscript of Dreyfus's book. The phrase ready-to-hand, used in that earlier version, has been replaced with the term occurrent.

In contrast, the "unready-to-hand," in Heidegger's phrase, comprises occasions wherein equipment that is involved in some practical activity becomes unwieldy, temporarily broken, or unavailable. At such times, inspection and practical problem solving occur, aimed at repairing or eliminating the disturbance to "get going again." In such times of disturbance, our use of equipment becomes explicitly manifest as a goal-oriented activity, and we may then try to formulate procedures or rules: "The scheme peculiar to [deliberating] is the 'if–then'; if this or that, for instance, is to be produced, put to use, or averted, then some ways and means, circumstances, or opportunities will be needed" (Heidegger, cited in Dreyfus 1991: 72). Another kind of breakdown that arises when equipment to be used is unfamiliar is discussed in Chapter 9 in this book in relation to the "expert help system" and the problem of instructing the novice user of a machine. The important point here is just that the rules and procedures that come into play when we deal with the unready-to-hand are not self-contained or foundational but contingent on and derived from the


situated action that the rules and procedures represent. The representations involved in managing problems in the use of equipment presuppose the very transparent practices that the problem renders noticeable or remarkable. Situated action, in other words, is not made explicit by rules and procedures. Rather, when situated action becomes in some way problematic rules and procedures are explicated for purposes of deliberation and the action, which is otherwise neither rule based nor procedural, is then made accountable to them.

the practical objectivity of situations

If we look at the world commonsensically, the environment of our actions is made up of a succession of situations that we walk into and to which we respond. As I noted in Chapter 5 in this book, advocates of the planning model not only adopt this commonsense realist view with respect to the individual actor but also attempt to bring concerted action under the same account by treating the actions of others as just so many more conditions of the actor's situation. In the same tradition, normative sociology posits and then attempts to describe an objective world of social facts, or received norms, to which our attitudes and actions are a response. Emile Durkheim's famous maxim that the objective reality of social facts is sociology's fundamental principle (1938) has been the methodological premise of social studies since early in this century. Recognizing the human environment to be constituted crucially by others, sociological norms comprise a set of environmental conditions beyond the material to which human behavior is responsive: namely the sanctions of institutionalized group life. Human action, the argument goes, cannot be adequately explained without reference to these "social facts," which are to be treated as antecedent, external, and coercive vis-à-vis the individual actor. By adopting Durkheim's maxim, and assuming the individual's responsiveness to received social facts, social scientists hoped to gain respectability under the view that human responses to the facts of the social world should be discoverable by the same methods as are appropriate to studies of other organisms reacting to the natural world.

A principal aim of normative sociology was to shift the focus of attention in studies of human behavior from the psychology of the individual to the conventions of the social group. But at the same time that normative sociology directed attention to the community or group, it maintained an image of the individual member rooted in behaviorist


psychology and natural science – an image that has been dubbed by Garfinkel the "cultural dope": "By 'cultural dope' I refer to the man-in-the-sociologist's-society who produces the stable features of the society by acting in compliance with preestablished and legitimate alternatives of action that the common culture provides" (1967: 68). Insofar as the alternatives of action that the culture provides are seen to be nonproblematic and constraining on the individual, their enumeration is taken to constitute an account of situated human action. The social facts (that is to say, what actions typically come to) are used as a point of departure for retrospective theorizing about the "necessary character of the pathways whereby the end result is assembled" (ibid.: 68).

In 1954 the sociologist Herbert Blumer published a critique of traditional sociology titled, "What Is Wrong with Social Theory?" (see Blumer 1969: 140–52). Blumer argues that the social world is constituted by the local production of meaningful action and that as such the social world has never been taken seriously by social scientists. Instead, Blumer says, investigations by social scientists have looked at meaningful action as the playing out of various determining factors, all antecedent and external to the action itself. Whether those factors are brought to the occasion in the form of individual predispositions, or are present in the situation as preexisting environmental conditions or received social norms, the action itself is treated as epiphenomenal. As a consequence, Blumer argues, we have a social science that is about meaningful human action but not a science of it.

For the foundations of a science of action Blumer turns to Mead, who offers a metaphysics of action that is deeply sociological. Blumer points out that a central contribution of Mead's work is his challenge to traditional assumptions regarding the origins of the commonsense world and of purposeful action:

His treatment took the form of showing that human group life was the essential condition for the emergence of consciousness, the mind, a world of objects, human beings as organisms possessing selves, and human conduct in the form of constructed acts. He reversed the traditional assumptions underlying philosophical, psychological, and sociological thought to the effect that human beings possess minds and consciousness as original "givens," that they live in worlds of pre-existing and self-constituted objects, and that group life consists of the association of such reacting human organisms. (Blumer 1969: 61)

Mead’s reversal, in putting human interaction before the objectivity of the commonsense world, should not be read as an argument for metaphysical idealism: Mead does not deny the existence of constraints in


the environment in which we act. What Mead is working toward is not a characterization of the natural world simpliciter but of the natural world under interpretation or the world as construed by us through language. The latter is precisely what we mean by the social world and, on Mead's account, interaction is a condition for that world, while that world is a condition for intentional action.

More recently, ethnomethodology has turned Durkheim's aphorism on its head with more profound theoretical and methodological consequences.8

8 For extensive consideration of Durkheim's aphorism and its ethnomethodological rereading, see Rawls (1996), Garfinkel (2002).

Briefly, the standpoint of ethnomethodology is that what traditional sociology captures is precisely our commonsense view of the social world (see Sacks 1963; Garfinkel 1967; Garfinkel and Sacks 1970). Following Durkheim, the argument goes, social studies have simply taken this commonsense view as foundational and attempted to build a science of the social world by improving on it. Social scientific theories, under this attempt, are considered to be scientific insofar as they remedy shortcomings in, and preferably quantify, the intuitions of everyday, practical sociological reasoning. In contrast, ethnomethodology grants commonsense sociological reasoning a fundamentally different status than that of a defective approximation of an adequate scientific theory. Rather than being resources for social science to improve on, the "all things being equal" typifications of commonsense reasoning are to be taken as social science's topic. The notion that we act in response to an objectively given social world is replaced by the assumption that our everyday social practices render the world publicly available and mutually intelligible. It is those practices that constitute ethnomethods. The methodology of interest to ethnomethodologists, in other words, is not their own but that deployed by members of the society in coming to know, and making sense out of, the everyday world of talk and action. The outstanding question for social science, therefore, is not whether social facts are objectively grounded but how their objective grounding is accomplished. Objectivity is a product of systematic practices or members' methods for rendering our unique experience and relative circumstances mutually intelligible. The source of mutual intelligibility is not a received conceptual scheme, or a set of coercive rules or norms, but those common practices that produce the typifications of which schemes and rules are made. The task of social studies, then, is to


describe the practices, not to enumerate their product in the form of a catalogue of commonsense beliefs about the social world. The interest of ethnomethodologists, in other words, is in how it is that the mutual intelligibility and objectivity of the social world is achieved. Ethnomethodology locates that achievement in our everyday situated actions, such that our common sense of the social world is not the precondition for our interaction but its product. By the same token, the objective reality of social facts is not the fundamental principle of social studies, but social studies’ fundamental phenomenon.

the indexicality of language

Our shared understanding of situations is due in great measure to the efficiency of language, "the typifying medium par excellence" (Schutz 1962: 14). Language is efficient in the sense that, on the one hand, expressions have assigned to them conventional meanings that hold on any occasion of their use. The significance of a linguistic expression on some actual occasion, on the other hand, lies in its relationship to circumstances that are presupposed or indicated by, but not actually captured in, the expression itself.9 Language takes its significance from the embedding world, in other words, even while it transforms the world into something that can be thought of and talked about.

9 For a semantic theory based on this view of language, see Barwise and Perry (1985). Their work on language and information was highly salient among the audiences to whom these passages were written at the time, centered at the newly formed Center for the Study of Language and Information (CSLI) at Stanford University.

Expressions that rely on their situation for significance are commonly called indexical, after the "indexes" of Charles Peirce (1933), the exemplary indexicals being first- and second-person pronouns, tense, and specific time and place adverbs such as here and now. In the strict sense, exemplified by these commonly recognized indexical expressions, the distinction of conventional or literal meaning and situated significance breaks down. That is to say, these expressions are distinguished by the fact that although one can state procedures for finding the expression's significance, or rules for its use, the expression's meaning can be specified only as the use of those procedures in some actual circumstances (see Bates 1976, Chapter 1). Heritage (1984: 143) offers as an example the indexical expression "that's a nice one." There is, first of all, the obvious fact that this


expression will have quite a different significance when uttered by a visitor with reference to a photograph in her host's photo album or by one shopper to another in front of the lettuce bin at the grocery store. But although linguists and logicians would commonly recognize the referent of "that's" as the problematic element in such cases, Heritage points out that the significance of the descriptor nice is equally so. So, in the first case, nice will refer to some properties of the photograph, whereas different properties will be intended in the case of the lettuce. Moreover, in either case whichever sense of nice is intended is not available from the utterance but remains to be found by the hearer through an active search of both the details of the referent and the larger context of the remark. So nice in the first instance might be a comment on the composition of the photograph, on the appearance of the host, or on some indefinite range of other properties of the photo in question. What is more, visitor and host will never establish in just so many words precisely what it is that the visitor intends and the host understands. Their interpretations of the term will remain partially unarticulated, located in their unique relationship to the photograph and the context of the remark. Yet the shared understanding that they do achieve will be perfectly adequate for purposes of their interaction. It is in this sense – that is, that expression and interpretation involve an active process of pointing to and searching the situation of talk – that language is a form of situated action.

Among philosophers and linguists, the term indexicality typically is used to distinguish those classes of expressions whose meaning is conditional on the situation of their use in this way from those such as, for example, definite noun phrases whose meaning is claimed to be specifiable in objective, or context-independent terms. But the communicative significance of a linguistic expression is always dependent on the circumstances of its use. A formal statement not of what the language means in relation to any context, but of what the language-user means in relation to some particular context, requires a description of the context or situation of the utterance itself. And every utterance's situation comprises an indefinite range of possibly relevant features.10

10 The "problem" of context was a central preoccupation for cognitive science in the 1980s, as evidenced for example by a seminar series at CSLI titled "Why Context Won't Go Away," devoted to discussion of how context might best be represented in philosophical and computational formalisms.

Our practical solution to this theoretical problem is not to enumerate some subset


of the relevant circumstances – we generally never mention our circumstances as such at all – but to “wave our hand” at the situation, as if we always included in our utterance an implicit ceteris paribus clause and closed with an implicit et cetera clause. One consequence of this practice is that we always “mean more than we can say in just so many words”: “[S]peakers can . . . do the immense work that they do with natural language, even though over the course of their talk it is not known and is never, not even “in the end,” available for saying in so many words just what they are talking about. Emphatically, that does not mean that speakers do not know what they are talking about, but instead that they know what they are talking about in that way” (Garfinkel and Sacks 1970: 342–4, original emphasis). In this sense deictic expressions, time and place adverbs, and pronouns are just particularly clear illustrations of the general fact that all language, including the most abstract or eternal, stands in an essentially indexical relationship to the embedding world. Because the significance of an expression always exceeds the meaning of what actually gets said, the interpretation of an expression turns not only on its conventional or definitional meaning, nor on that plus some body of presuppositions, but also on the unspoken situation of its use. Our situated use of language, and consequently language’s significance, presupposes and implies a horizon of things that are never actually mentioned – what Schutz referred to as the “world taken for granted” (1962: 74). Philosophers have been preoccupied with this fact about language as a matter of the truth conditionality of propositions, the problem being that the truth conditions of an assertion are always relative to a background, and the background does not form part of the semantic content of the sentence as such (Searle 1979). And the same problems that have plagued philosophers of language as a matter of principle are now practical problems for cognitive science. As I pointed out in Chapter 5 in this book, the view that mutual intelligibility rests on a stock of shared knowledge has been taken over by researchers in cognitive science, in the hope that an enumeration of the knowledge assumed by particular words or actions could be implemented as data structures in the machine, which would then “understand” those words and actions. Actual attempts to include the background assumptions of a statement as part of its semantic content, however, run up against the fact that there is no fixed set of assumptions that underlies a given statement. As a consequence, the elaboration of background assumptions is fundamentally ad hoc and arbitrary, and each elaboration of assumptions in principle introduces further assumptions to be elaborated, ad infinitum.
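The regress described in the preceding paragraph can be made concrete with a small, purely illustrative sketch. Nothing below comes from an actual knowledge-representation system; the utterance, the listed assumptions, and the elaboration rule are all invented for the example. The point is only that once background is stored as a finite structure, each stored assumption invites the same treatment, and the structure never closes:

```python
# Purely illustrative sketch (not from any actual system): storing the
# "background assumptions" of an utterance as a finite data structure,
# in the spirit of the knowledge-encoding projects discussed above.

background = {
    "that's a nice one": [
        "speaker and hearer attend to the same object",
        "the object is of a kind that can be evaluated",
        "'nice' picks out properties relevant to objects of that kind",
    ],
}

def elaborate(assumption):
    # Each assumption, once spelled out, presupposes further assumptions;
    # nothing in the data structure says where elaboration may stop.
    return [
        f"({assumption}) holds under normal circumstances",
        f"what counts as normal circumstances for ({assumption}) is mutually known",
    ]

layer = background["that's a nice one"]
for step in range(3):  # three rounds of elaboration already show the regress
    layer = [refined for a in layer for refined in elaborate(a)]
    print(f"after round {step + 1}: {len(layer)} assumptions still to elaborate")
```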


The problem of communicating instructions for action, in particular certain of its seemingly intractable difficulties, becomes clearer with this view of language in mind. The relation of efficient linguistic formulations to particular situations parallels the relation of instructions to situated action. As linguistic expressions, instructions are subject to the constraint that: “However extensive or explicit what a speaker says may be, it does not by its extensiveness or explicitness pose a task of deciding the correspondence between what he says and what he means that is resolved by citing his talk verbatim” (Garfinkel and Sacks 1970: 342–4). This indexicality of instructions means that an instruction’s significance with respect to action does not inhere in the instruction but must be found by the instruction follower with reference to the situation of its use. Far from replacing the ad hoc methods used to establish the significance of everyday talk and action, therefore, the interpretation of instructions is thoroughly reliant on those same methods. As Garfinkel concludes: “To treat instructions as though ad hoc features in their use was a nuisance, or to treat their presence as grounds for complaining about the incompleteness of instructions, is very much like complaining that if the walls of a building were gotten out of the way, one could see better what was keeping the roof up” (Garfinkel 1967: 22). Like all action descriptions, instructions necessarily rely on an implicit et cetera clause to be called complete. The project of instruction writing is ill conceived, therefore, if its goal is the production of exhaustive action descriptions that can guarantee a particular interpretation. What “keeps the roof up” in the case of instructions for action is not only the instructions as such, but also their interpretation in use. And the latter has all of the ad hoc and uncertain properties that characterize every occasion of the situated use of language.

the mutual intelligibility of action

By "index" Peirce meant not only that the sign relies for its significance on the event or object that it indicates but also that the sign is actually a constituent of the referent. So language more generally is not only anchored in, but in large measure constitutes, the situation of its use. Ethnomethodology generalizes this constitutive function of language still further to action, in the proposition that the purposefulness of action is recognizable in virtue of the methodic, skillful, and therefore taken-for-granted practices whereby we establish the rational properties of actions in a particular context. It is those practices that provide for the


"analyzability of actions-in-context given that not only does no concept of context-in-general exist, but every use of 'context' without exception is itself essentially indexical" (Garfinkel 1967: 10). In positing the reflexivity of purposeful action and the methods by which we convey and construe action's purposes, ethnomethodology does not intend to reduce meaningful action to method. The intent is rather to identify the mutual intelligibility of action as the problem for sociology. To account for the foundations of mutual intelligibility and social order, traditional social science posits a system of known-in-common social conventions or behavioral norms. What we share, on this view, is agreement on the appropriate relation of actions to situations. We walk into a situation, identify its features, and match our actions to it. This implies that, on any given occasion, the concrete situation must be recognizable as an instance of a class of typical situations, and the behavior of the actor must be recognizable as an instance of a class of appropriate actions. And with respect to communication, as Wilson (1970) points out:

the different participants must define situations and actions in essentially the same way, since otherwise rules could not operate to produce coherent interaction over time. Within the normative paradigm, this cognitive agreement is provided by the assumption that the actors share a system of culturally established symbols and meanings. Disparate definitions of situations and actions do occur, of course, but these are handled as conflicting subcultural traditions or idiosyncratic deviations from the culturally established cognitive consensus. (ibid.: 699)

In contrast with this normative paradigm, Garfinkel proposes that the stability of the social world is not the consequence of a “cognitive consensus” or stable body of shared meanings but of our tacit use of the documentary method of interpretation to find the coherence of situations and actions. As a general process, the documentary method describes a search for uniformities that underlie unique appearances. Applied to the social world, it describes the process whereby actions are taken as evidence, or “documents,” of underlying plans or intent, which in turn fill in the sense of the actions (1967, Chapter 3). The documentary method describes an ability – the ascription of intent on the basis of evidence, and the interpretation of evidence on the basis of ascribed intent – that is as identifying of rationality as the ability to act rationally itself. At the same time, the documentary method is not reducible to the application of any necessary and sufficient conditions, either behavioral or contextual, for the identification of intent. There are no logical formulae


for recognizing the intent of some behavior independent of context, and there are no recognition algorithms for joining contextual particulars to behavioral descriptions so that forms of intent can be precisely defined over a set of necessary and sufficient observational data (see Coulter 1983: 162–3). Given the lack of universal rules for the interpretation of action, the programme of ethnomethodology is to investigate and describe the use of the documentary method in particular situations. Studies indicate, on the one hand, the generality of the method and, on the other hand, the extent to which special constraints on its use characterize specialized domains of practical activity such as natural science, courts of law, and the practice of medicine.11

11 For example, the work of coroners at the Los Angeles Suicide Prevention Center (Garfinkel 1967: 11–18), the deliberations of juries (ibid.: Chapter 4) and courtroom practices of attorneys (Atkinson and Drew 1979), the work of clinic staff in selecting patients for out-patient psychiatric treatment (Garfinkel 1967, Chapter 7), the work of physicians interviewing patients for purposes of diagnosis (Beckman and Frankel 1983), the work of scientists discovering an optical pulsar (Garfinkel, Lynch, and Livingston 1981). (Original footnote.)

In a contrived situation that, though designed independently and not with them in mind, closely parallels both the "Turing test" and encounters with Weizenbaum's ELIZA programs, Garfinkel set out to test the documentary method in the context of counseling. Students were asked to direct questions concerning their personal problems to someone they knew to be a student counselor, seated in another room. They were restricted to questions that could take yes/no answers, and the answers were then given by the counselor on a random basis. For the students, the counselor's answers were motivated by the questions. That is to say, by taking each answer as evidence for what the counselor "had in mind," the students were able to find a deliberate pattern in the exchange that explicated the significance and relevance of each new response as an answer to their question. Specifically, the yes/no utterances were found to document advice from the counselor, intended to help in the solution of the student's problem. So, for example, students assigned to the counselor, as the advice "behind" the answer, the thought formulated in the student's question: "when a subject asked 'Should I come to school every night after supper to do my studying?' and the experimenter said 'My answer is no,' the subject in his comments said, 'He said I shouldn't come to school and study'" (Garfinkel 1967: 92). In cases where an answer seemed directly to contradict what had come before, students either attributed the


apparent contradiction to a change of mind on the part of the counselor, as the result of having learned more between the two replies, or to some agenda on the part of the counselor that lent the reply a deeper significance than its first, apparently inconsistent, interpretation would suggest. In other cases, the interpretation of previous answers was revised in light of the current one, or an interpretation of the question was found and attributed to the counselor that rationalized what would otherwise appear to be an inappropriate answer. Generally, Garfinkel observes: “The underlying pattern was elaborated and compounded over the series of exchanges and was accommodated to each present ‘answer’ so as to maintain the ‘course of advice,’ to elaborate what had ‘really been advised’ previously, and to motivate the new possibilities as emerging features of the problem” (ibid.: 90). Garfinkel’s results with arbitrary responses make the success of Weizenbaum’s DOCTOR program easier to understand and lend support to Weizenbaum’s hypothesis that the intelligence of interactions with the DOCTOR program is due to the work of the human participant, specifically, to methods for interpreting the system’s behavior as evidence for some underlying intent. The larger implications of the documentary method, however, touch on the status of an “underlying” reality of psychological and social facts in human interaction, prior to situated action and interpretation: It is not unusual for professional sociologists to think of their procedures as processes of “seeing through” appearances to an underlying reality; of brushing past actual appearances to “grasp the invariant.” Where our subjects are concerned, their processes are not appropriately imagined as “seeing through,” but consist instead of coming to terms with a situation in which factual knowledge of social structures – factual in the sense of warranted grounds of further inferences and actions – must be assembled and made available for potential use despite the fact that the situations it purports to describe are, in any calculable sense, unknown; in their actual and intended logical structures are essentially vague; and are modified, elaborated, extended, if not indeed created, by the fact and matter of being addressed. (Garfinkel 1967: 96)

The stability of the social world, from this standpoint, is not due to an eternal structure but to situated actions that create and sustain shared understanding on specific occasions of interaction. Social constraints on appropriate action are always identified relative to some unique and unreproducible set of circumstances. Members of the society are treated as being at least potentially aware of the concrete details of their circumstances, and their actions are interpreted in that light. Rather than


actions being determined by rules, actors effectively use the normative rules of conduct that are available to produce significant actions. So, for example, there is a normative rule for greetings that runs to the effect: do not initiate greetings except with persons who are acquaintances. If we witness a person greeting another who we know is not an acquaintance, we can either conclude that the greeter broke the rule or infer that via the use of the rule he or she was seeking to treat the other as an acquaintance (Heritage 1984: 126). Such rules are not taught or encoded but are learned tacitly through typification over families of similar situations and actions. Despite the availability of such typifications, no action can fully provide for its own interpretation in any given instance. Instead, every instance of meaningful action must be accounted for separately with respect to specific, local, contingent determinants of significance. The recommendation for social studies, as a consequence, is that instead of looking for a structure that is invariant across situations we look for the processes whereby particular, uniquely constituted circumstances are systematically interpreted so as to render meaning shared and action accountably rational. Structure, on this view, is an emergent product of situated action, rather than its foundation. Insofar as the project of ethnomethodology is to redirect social science from its traditional preoccupation with abstract structures to an interest in situated actions, and the cognitive sciences share in that same tradition, the ethnomethodological project has implications for cognitive science as well.
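Garfinkel's counseling demonstration, described earlier in this chapter, is mechanically trivial to reproduce, which is precisely the point: all of the intelligence observed in the exchanges lay in the students' interpretive work, not in the procedure for generating answers. The sketch below is only an illustration of that procedure; apart from the one question quoted by Garfinkel, the sample questions are invented, and the even-odds random choice of answers is a stand-in for his predetermined random series.

```python
import random

# Illustrative reconstruction of Garfinkel's yes/no "counselor": answers are
# fixed at random, independently of what is asked. Any advice found in the
# transcript is supplied by the questioner's documentary work, not the code.
def counseling_session(questions, seed=None):
    rng = random.Random(seed)
    answers = [rng.choice(["My answer is yes.", "My answer is no."])
               for _ in questions]
    return list(zip(questions, answers))

if __name__ == "__main__":
    transcript = counseling_session(
        [
            "Should I come to school every night after supper to do my studying?",
            "Should I drop the course and take it again next year?",   # invented
            "Would it help to talk the problem over with my parents?", # invented
        ],
        seed=1967,
    )
    for question, answer in transcript:
        print("Student:  ", question)
        print("Counselor:", answer)
```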


Norman, Donald A. (2005), "Human-centered design considered harmful", Interactions, July 2005. DOI: 10.1145/1070960.1070976

Human-Centered Design Considered Harmful
By Donald A. Norman, Nielsen Norman Group, norman@nngroup.com

Human-centered design has become such a dominant theme in design that it is now accepted by interface and application designers automatically, without thought, let alone criticism. That's a dangerous state—when things are treated as accepted wisdom. The purpose of this essay is to provoke thought, discussion, and reconsideration of some of the fundamental principles of human-centered design. These principles, I suggest, can be helpful, misleading, or wrong. At times, they might even be harmful. Activity-centered design might be superior.

Know Your User

If there is any principle that is sacred to those in the field of user-interface design and human-computer interaction, it is "know your user." After all, how can one design something for people without a deep, detailed knowledge of those people? The plethora of bad designs in the world would seem to be excellent demonstrations of the perils of ignoring the people for whom the design is intended. Human-centered design was developed to overcome the poor design of software products. By emphasizing the needs and abilities of those who were to use the software, usability and understandability of products has indeed been improved. But despite these improvements, software complexity is still with us. Even companies that pride themselves on following human-centered principles still have complex, confusing products.

If it is so critical to understand the particular users of a product, then what happens when a product is designed to be used by almost anyone in the world? There are many designs that do work well for everyone. This is paradoxical, and it is this very paradox that led me to reexamine common dogma.

Most items in the world have been designed without the benefit of user studies and the methods of human-centered design. Yet they do quite well. Moreover, these include some of the most successful objects of our modern, technological worlds. Consider two representative examples:

The Automobile. People all over the world learn to drive quite successfully with roughly the same configuration of controls. There were no systematic studies of users. Rather, early automobiles tried a variety of configurations, initially copying the seating and steering arrangements of horse-drawn carriages, going through tillers and rods, and then various hand and foot controls until the current scheme evolved.

Everyday Objects. Just look around: kitchen utensils, garden tools, woodworking tools, typewriters, cameras, and sporting equipment vary somewhat from culture to culture, but on the whole, they are more similar than not. People all over the world manage to learn them—and manage quite well.

Activity-Centered Design. Why do these devices work so well? The basic reason is that they were all developed with a deep understanding of the activities that were to be performed: Call this activity-centered design. Many were not even designed in the common sense of the term; rather, they evolved with time. Each new generation of builders slowly improved the product upon the previous generation, based on feedback from their own experiences as well as from their customers. Slow, evolutionary folk design. But even for those devices created by formal design teams, populated with people whose job title was "designer," these designers used their own understanding of the activities to be performed to determine how the device would be operated. The users were supposed to understand the task and to understand the designers' intentions.

Activities Are Not the Same as Tasks

Do note the emphasis on the word "activity" as opposed to "task." There is a subtle difference. I use the terms in a hierarchical fashion. At the highest levels are activities, which are composed of tasks, which themselves are composed of actions, and actions are made up of operations. The hierarchical structure comes from my own brand of "activity theory," heavily motivated by early Russian and Scandinavian research. To me, an activity is a coordinated, integrated set of tasks. For example, mobile phones that combine appointment books, diaries and calendars, note-taking facilities, text messaging, and cameras can do a good job of supporting communication activities. This one single device integrates several tasks: looking up numbers, dialing, talking, note taking, checking one's diary or calendar, and exchanging photographs, text messages, and emails. One activity, many tasks.

What Adapts? Technology or People?

The historical record contains numerous examples of successful devices that required people to adapt to and learn the devices. People were expected to acquire a good understanding of the activities to be performed and of the operation of the technology. None of this "tools adapt to the people" nonsense—people adapt to the tools.

Think about that last point. A fundamental corollary to the principle of human-centered design has always been that technology should adapt to people, not people to the technology. Is this really true? Consider the history of the following successful technologies.

The Clock (and Watch). An arbitrary division of the year and day into months, weeks, days, hours, minutes, and seconds, all according to physical principles that differ from psychological or biological ones, now rules our lives. We eat when our watches tell us it is meal time, not when we are hungry. We awake according to the harsh call of the alarm, not when we are rested. University classes are taught in one-hour periods, three times a week, in ten- to 15-week sessions, not because this is good for education, but because it makes for easier scheduling. The extreme reliance on time is an accidental outgrowth of the rise of the factory and the resulting technological society.

Writing Systems. Consider printing, handwriting, and typing. All are artificial and unnatural. It takes people weeks, months, or even years to learn and become skilled. One successful stylus-based text input device for the Roman alphabet is Graffiti—yet another unnatural way of writing.

Musical Instruments. Musical instruments are complex and difficult to manipulate and can cause severe medical problems. Musical notation is modal, so the same representation on a treble clef has a different interpretation on the bass clef. The usability profession has long known of the problems with modes, yet multiple staves have been with us for approximately 1,000 years. It takes considerable instruction and practice to become skilled at reading and playing. The medical problems faced by musicians are so severe that there are books, physicians, Web pages and discussion groups devoted to them. For example, repetitive stress injuries among violinists and pianists are common. Neither the instruments nor the notation would pass any human-centered design review.
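The hierarchy introduced above under "Activities Are Not the Same as Tasks" (activities composed of tasks, tasks of actions, actions of operations) can be written down as a small data structure. The sketch below is only an illustration of that decomposition, using the mobile phone example; the class names, field names, and the particular operations listed are invented here, not part of any established activity-theory notation.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative only: the activity > task > action > operation hierarchy
# described under "Activities Are Not the Same as Tasks". Names invented.

@dataclass
class Action:
    name: str
    operations: List[str] = field(default_factory=list)

@dataclass
class Task:
    name: str
    actions: List[Action] = field(default_factory=list)

@dataclass
class Activity:
    name: str
    tasks: List[Task] = field(default_factory=list)

# One activity, many tasks: the communication activity supported by a phone.
communicating = Activity(
    name="keeping in touch",
    tasks=[
        Task("looking up a number", [Action("search contacts", ["tap", "type", "scroll"])]),
        Task("dialing and talking", [Action("place the call", ["tap", "hold", "speak"])]),
        Task("checking the calendar", [Action("open the diary", ["tap", "swipe"])]),
        Task("exchanging messages", [Action("send a text", ["type", "tap send"])]),
    ],
)

print(communicating.name, "->", [task.name for task in communicating.tasks])
```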

Human-Centered versus Activity-Centered: What's the Difference?

What is going on? Why are such non-human-centered designs so successful? I believe there are two reasons, one the activity-centered nature, and two the communication of intention from the builders and designers. Successful devices are those that fit gracefully into the requirements of the underlying activity, supporting them in a manner understandable by people. Understand the activity, and the device is understandable. Builders and designers often have good reasons for the way they constructed the system. If these reasons can be explained, then the task of learning the system is both eased and made plausible. Yes, it takes years to learn to play the violin, but people accept this because the instrument itself communicates rather nicely the relationship between strings and the resulting sounds. Both the activity and the design are understandable, even if the body must be contorted to hold, finger, and bow the instrument.

Activity-centered design (ACD) is actually very much like human-centered design (HCD). Many of the best attributes of HCD carry over. But there are several differences, first and foremost that of attitude. Attitude? Yes, the mindset of the designer. The activities, after all, are human activities, so they reflect the possible range of actions, of conditions under which people are able to function, and the constraints of real people. A deep understanding of people is still a part of ACD. But ACD is more: It also requires a deep understanding of the technology, of the tools, and of the reasons for the activities.

Tools Define the Activity: People Really Do Adapt to Technology

HCD asserts as a basic tenet that technology adapts to the person. In ACD, we admit that much of human behavior can be thought of as an adaptation to the powers and limitations of technology. Everything, from the hours we sleep to the way we dress, eat, interact with one another, travel, learn, communicate, play, and relax. Not just the way we do these things, but with whom, when, and the way we are supposed to act, variously called mores, customs, and conventions.

People do adapt to technology. It changes social and family structure. It changes our lives. Activity-centered design not only understands this, but might very well exploit it.

Learn the activity, and the tools are understood. That's the mantra of the human-centered design community. But this is actually a misleading statement, because for many activities, the tools define the activity. Maybe the reality is just the converse: Learn the tools, and the activity is understood. Consider art, where much time is spent learning the vagaries of the media. If you want to do oil painting, then you need to understand oil, and brushes, and painting surfaces—even how and when to clean your brush. Is this the tool wagging the dog? Yes, and that is how it always is, how it always shall be. The truly excellent artists have a deep and thorough understanding of their tools and technologies. It isn't enough to have an artistic sense. So too with sports, with cooking, with music, and with all other major activities that use tools. To the human-centered design community, the tool should be invisible; it should not get in the way. With activity-centered design, the tool is the way.

Why Might HCD Be Harmful?

Why might a human-centered design approach ever be harmful? After all, it has evolved as a direct result of the many problems people have with existing designs, problems that lead to frustration, grief, lost time and effort, and, in safety-critical applications, errors, accidents, and death. Moreover, HCD has demonstrated clear benefits: improved usability, fewer errors during usage, and faster learning times. What, then, are the concerns?

One concern is that the focus upon individual people (or groups) might improve things for them at the cost of making it worse for others. The more something is tailored for the particular likes, dislikes, skills, and needs of a particular target population, the less likely it will be appropriate for others.

The individual is a moving target. Design for the individual of today, and the design will be wrong tomorrow. Indeed, the more successful the product, the more that it will no longer be appropriate. This is because as individuals gain proficiency in usage, they need different interfaces than were required when they were beginners. In addition, the successful product often leads to unanticipated new uses that are very apt not to be well supported by the original design.

But there are more-serious concerns: First, the focus upon humans detracts from support for the activities themselves; second, too much attention to the needs of the users can lead to a lack of cohesion and added complexity in the design.

Consider the dynamic nature of applications, where any task requires a sequence of operations, and activities can comprise multiple, overlapping tasks. Here is where the difference in focus becomes evident, and where the weakness of the focus on the users shows up.

Static Screens versus Dynamic Sequences

"We find that work in the kitchen does not consist of independent, separate acts, but of a series of interrelated processes." (Christine Frederick, The Labor-Saving Kitchen, 1919.)

The methods of HCD seem centered around static understanding of each set of controls, each screen on an electronic display. But as a result, the sequential operations of activities are often ill-supported. The importance of support for sequences has been known ever since the time-and-motion studies of the early 1900s, as the quotation from Frederick, above, illustrates. Simply delete the phrase "in the kitchen" and her words are still a powerful prescription for design. She was writing in 1919: What has happened in the past 100 years to make us forget this? Note that the importance of support for sequences is still deeply understood within industrial engineering and human factors and ergonomics communities. Somehow, it seems less prevalent within the human-computer interaction community.

Many of the systems that have passed through HCD design phases and usability reviews are superb at the level of the static, individual display, but fail to support the sequential requirements of the underlying tasks and activities. The HCD methods tend to miss this aspect of behavior: Activity-centered methods focus upon it.

Too Much Listening to Users

One basic philosophy of HCD is to listen to users, to take their complaints and critiques seriously. Yes, listening to customers is always wise, but acceding to their requests can lead to overly complex designs. Several major software companies, proud of their human-centered philosophy, suffer from this problem. Their software gets more complex and less understandable with each revision. Activity-centered philosophy tends to guard against this error because the focus is upon the activity, not the human. As a result, there is a cohesive, well-articulated design model. If a user suggestion fails to fit within this design model, it should be discarded. Alas, all too many companies, proud of listening to their users, would put it in.

Here, what is needed is a strong, authoritative designer who can examine the suggestions and evaluate them in terms of the requirements of the activity. When necessary, it is essential to be able to ignore the requests. This is the goal: cohesion and understandability. Paradoxically, the best way to satisfy users is sometimes to ignore them.

Note that this philosophy applies in the service domain as well. Thus, Southwest Airlines has been successful despite the fact that it ignores the two most popular complaints of its passengers: provide reserved seating and inter-airline baggage transfer. Southwest decided that its major strategic advantage was inexpensive, reliable transportation, and this required a speedy turn-around time at each destination. Passengers complain, but they still prefer the airline.

Sometimes what is needed is a design dictator who says, "Ignore what users say: I know what's best for them." The case of Apple Computer is illustrative. Apple's products have long been admired for ease of use. Nonetheless, Apple replaced its well-known, well-respected human interface design team with a single, authoritative (dictatorial) leader. Did usability suffer? On the contrary: Its new products are considered prototypes of great design.

The "listen to your users" approach produces incoherent designs. The "ignore your users" approach can produce horror stories, unless the person in charge has a clear vision for the product, what I have called the "conceptual model." The person in charge must follow that vision. Listen to customers, but don't always do what they say.

Now consider the method employed by the human-centered design community. The emphasis is often upon the person, not the activity. Look at those detailed scenarios and personas: Honestly, now, did they really inform your design? Did knowing that the persona is that of a 37-year-old, single mother, studying for the MBA at night, really help lay out the control panel or determine the screen layout and, more importantly, to design the appropriate action sequence? Did user modeling, formal or informal, help determine just what technology should be employed?

Show me an instance of a major technology that was developed according to principles of human-centered design, or rapid prototype and test, or user modeling, or the technology adapting to the user. Note the word "major." I have no doubt that many projects were improved, perhaps even dramatically, by the use of these techniques. But name one fundamental, major enhancement to our technologies that came about this way.

Human-centered design does guarantee good products. It can lead to clear improvements of bad ones. Moreover, good human-centered design will avoid failures. It will ensure that products do work, that people can use them. But is good design the goal? Many of us wish for great design. Great design, I contend, comes from breaking the rules, by ignoring the generally accepted practices, by pushing forward with a clear concept of the end result, no matter what. This ego-centric, vision-directed design results in both great successes and great failures. If you want great rather than good, this is what you must do.

There is a lot more to say on this topic. My precepts here are themselves dangerous. We dare not let the entire world of designers follow their instincts and ignore conventional wisdom: Most lack the deep understanding of the activity coupled with a clear conceptual model. Moreover, there certainly are sufficient examples of poor design out in the world to argue against my position. But note, many of those bad designs are profitable products. Hmm. What does that suggest? Would they be even more profitable had human-centered design principles been followed? Perhaps. But perhaps they might not have existed at all. Think about that.

Yes, we all know of disastrous attempts to introduce computer systems into organizations where the failure was a direct result of a lack of understanding of the people and system. Or was it a result of not understanding the activities? Maybe what is needed is more activity-centered design; maybe failures come from a shallow understanding of the needs of the activities that are to be supported. Note too that in safety-critical applications, a deep knowledge of the activity is fundamental. Safety is usually a complex system issue, and without deep understanding of all that is involved, the design is apt to be faulty.

Still, I think it's time to rethink some of our fundamental suppositions. The focus upon the human may be misguided. A focus on the activities rather than the people might bring benefits. Moreover, substituting activity-centered for human-centered design does not mean discarding all that we have learned. Activities involve people, and so any system that supports the activities must of necessity support the people who perform them. We can build upon our prior knowledge and experience, both from within the field of HCD, but also from industrial engineering and ergonomics.

All fields have fundamental presuppositions. Sometimes it is worthwhile to reexamine them, to consider the pros and cons and see whether they might be modified or even replaced. Is this the case for those of us interested in human-centered design? We will never know unless we do the exercise.

ABOUT THE AUTHOR
Don Norman wears many hats, including co-founder of the Nielsen Norman Group, professor at Northwestern University, and author; his latest book is Emotional Design. He lives at www.jnd.org.

Washington and Lee Law Review Online, Volume 72, Issue 3, Article 8 (June 14, 2016)

Evolving the IRB: Building Robust Review for Industry Research

Molly Jackman, Facebook
Lauri Kanerva, Facebook


Recommended Citation
Molly Jackman & Lauri Kanerva, Evolving the IRB: Building Robust Review for Industry Research, 72 WASH. & LEE L. REV. ONLINE 442 (2016), https://scholarlycommons.law.wlu.edu/wlulr-online/vol72/iss3/8


Evolving the IRB: Building Robust Review for Industry Research

Molly Jackman and Lauri Kanerva*

Abstract

Increasingly, companies are conducting research so that they can make informed decisions about what products to build and what features to change. These data-driven insights enable companies to make responsible decisions that will improve people’s experiences with their products. Importantly, companies must also be responsible in how they conduct research. Existing ethical guidelines for research do not always robustly address the considerations that industry researchers face. For this reason, companies should develop principles and practices around research that are appropriate to the environments in which they operate, taking into account the values set out in law and ethics. This paper describes the research review process designed and implemented at Facebook, including the training employees receive and the steps involved in evaluating proposed research. We emphasize that there is no one-size-fits-all model of research review that can be applied across companies, and that processes should be designed to fit the contexts in which the research is taking place. However, we hope that general principles can be extracted from Facebook’s process that will inform other companies as they develop frameworks for research review that serve their needs.

Table of Contents

I. Introduction
II. The Merits of Industry Research and Review
III. Existing Frameworks
IV. Designing a Process
   A. Training
   B. Review by Substantive Area Expert
   C. Review by Research Review Group
   D. A Note on Evaluative Criteria
V. Conclusion
   A. Leverage Existing Infrastructure
   B. Inclusiveness Is Key
   C. Ask for Help
   D. Listening to Feedback
   E. Flexibility

* Molly Jackman, Public Policy Research Manager, Facebook. Lauri Kanerva, Research Management Lead, Facebook.

I. Introduction

Increasingly, companies are conducting research to understand how to improve their products and develop new insights about the world.1 Traditional guidelines for research may not always robustly address the considerations—ethical and otherwise—that industry researchers face.2 Thus, it is prudent for companies to develop principles and practices around research that are appropriate to the environments in which they operate, taking into account the values set out in law and ethics. Establishing and abiding by such principles enables companies to do responsible research that is calibrated to their industry and that will make real contributions to society and science. This challenge of establishing and implementing a robust research review does not just apply to industry. Analysis of existing datasets is being undertaken with greater frequency in

1. See generally Mathieu Alemany Oliver & Jean-Sébastien Vayre, Big Data and the Future of Knowledge Production in Marketing Research: Ethics, Digital Traces, and Abductive Reasoning, 3 J. MARKETING ANALYTICS 5 (2015) (exploring how big data has transformed marketing research techniques). 2. See Jules Polonetsky, Omer Tene & Joseph Jerome, Beyond the Common Rule: Ethical Structures for Data Research in Non-Academic Settings, 13 COLO. TECH. L.J. 333, 337 (2015) (asserting that traditional privacy principles do not adequately address new ethical concerns arising from big data research).


government, medicine, science, and academia.3 These studies provide insights to inform product development and also hold the potential to contribute to general knowledge and solve important policy problems.4 While existing frameworks provide some guidance for ethical review, there is a need for context-specific guidelines, tailored to the range of research that exists in these different environments. This Article describes the research review process developed and implemented at Facebook.5 The process leverages the company’s organizational structure, creating multiple training opportunities and research review checkpoints in the existing organizational flow. Moreover, the review criteria are tailored to the typical questions Facebook researchers address and the data that they use. In developing this process, we have benefited from numerous rounds of feedback from internal teams and external experts.6 We hope that general principles can be extracted from our process to inform thinking about the evolution of research review in general. We emphasize, however, that there is no one-size-fits-all model for research review; the model best suited to protect people and promote ethical research is one that fits the unique context in which the research takes place. Additionally, a flexible process is key: The ever-changing nature of the questions and data involved in industry (and academic) research requires that any processes must be able to adapt efficiently to new internal challenges and external feedback so they can improve over time.7 3. See id. at 335 (noting the growing use of big data research in the fields of “healthcare, education, energy conservation, law enforcement, and national security”) 4. See id. at 335–36 (“[B]ig data is not only fueling business intelligence but also informing decision-making around some of the world’s toughest social problems . . . . The benefits of such research accrue not only to organizations but also to affected individuals, communities, and society at large.”). 5. This including the Facebook family of applications and services. 6. The authors are grateful for the thoughtful advice and consultation of numerous individuals. Special thanks to Martin Abrams, Rebecca Armstrong, Joetta Bell, Ryan Calo, Brenda Curtis, Anastasia Doherty, Penelope Eckert, William Faustman, Susan Fish, Celia Fisher, Manjit Gill, William Hoffman, Joe Jerome, Reynol Junco, Michelle Meyer, Doug McFarland, Amy Lynn McGuire, Jules Polonetsky, Evan Selinger, Adam Tanner, Timothy Yi, and Ruby Zefo. 7. This view is supported by many entities dealing with big data research, including analysts and other industry stakeholders. See Lisa Morgan, Flexibility


The Article proceeds as follows. Part Two describes the need for internal review processes within companies. Part Three provides an overview of Institutional Review Boards (IRBs) and describes why we at Facebook found that the Common Rule framework does not fully meet our research needs. Part Four includes information about the research review process at Facebook. Part Five concludes with a discussion of the lessons we learned during implementation of our review process, including (1) leveraging existing infrastructure; (2) openness; (3) seeking help from experts; (4) listening to feedback; and (5) being flexible to changing internal and external conditions. II. The Merits of Industry Research and Review In a joint study conducted by researchers at Harvard, MIT, McKinsey, and the University of Pennsylvania, companies that characterized their decision-making structures as data-driven were found to perform better on objective measures of financial and operational success.8 To be sure, decisions can be driven by insights generated outside of a company; however, companies often possess the best data with which to study their own products and performance, making internal research highly valuable in many contexts.9 Is Critical for Big Data Analytics, SOFTWARE DEV. TIMES (Apr. 1, 2015), http://sdtimes.com/flexibility-is-critical-for-big-data-analytics/ (last visited Apr. 11, 2016) (“Regardless of how sophisticated or unsophisticated an organization may be, tool investments should consider the current state, but be flexible enough to adapt to a future state.”) (on file with the Washington and Lee Law Review); Marc Andrews, Flexibility Is Key to a Smooth Big Data and Analytics Journey, IBM BIG DATA & ANALYTICS HUB (Oct. 26, 2014), http://www.ibmbigdatahub.com/blog/flexibility-key-smooth-big-data-andanalytics-journey (last visited Apr. 11, 2016) (conveying to companies that “embarking on a big data and analytics journey is like setting off on a worldwide tour. You have an idea of what you want to do and see, and what you’ll need, but you must be flexible—your adventure will undoubtedly take some unforeseeable turns”) (on file with the Washington and Lee Law Review). 8. See Andrew McAfee & Erik Brynjolfsson, Big Data: The Management Revolution, HARV. BUS. REV., Oct. 2012, at 61, 67, http://www.tias.edu/docs/defaultsource/Kennisartikelen/mcafeebrynjolfson_bigdatamanagementrevolution_hbr2 012.pdf?sfvrsn=0 (“The evidence is clear: Data-driven decisions tend to be better decisions.”). 9. See id. at 64 (providing specific examples of benefits stemming from


Research does not only make companies more efficient and innovative but also can make them more responsible. For example, A/B testing—comparing outcomes for a treatment and control group to determine differences in performance—can provide insights into what people find most useful and relevant, rather than relying solely on intuition.10 As ethicist and legal scholar Michelle Meyer writes, “Practices that are subjected to . . . A/B testing . . . generally have a far greater chance of being discovered to be unsafe or ineffective, potentially leading to substantial welfare gains if practitioners act on their newfound knowledge.”11 Intuition often drives innovation; research allows companies to test whether new products—in Facebook’s case, anything from allowing replies to comments12 to incorporating suicide prevention features13—are improving people’s experience on a small scale before being implemented for a broader population. Sustaining a research program in a company—which can generate data-driven insights to inform decision-making and lead to greater efficiency and growth—requires developing an infrastructure to support it, including creating an internal approach to reviewing the ethics of proposed research.14 Early review of research provides feedback on the ethical implications internal data research). 10. See Michelle N. Meyer, Two Cheers for Corporate Experimentation: The A/B Illusion and the Virtues of Data-Driven Innovation, 13 COLO. TECH. L.J. 273, 277 (2015) (explaining how A/B testing is typically conducted and examining its uses). 11. Id. 12. Vadim Lavrusik, Improving Conversations on Facebook with Replies, FACEBOOK (Mar. 25, 2013, 10:59 AM), https://www.facebook.com/notes/journalists-on-facebook/improvingconversations-on-facebook-with-replies/578890718789613/ (last visited Mar. 11, 2016) (announcing launch of this “new comments feature designed to improve conversations”) (on file with the Washington and Lee Law Review). 13. Alexis Kleinman, Facebook Adds New Feature for Suicide Prevention, HUFFINGTON POST (Feb. 25, 2015, 4:03 PM), http://www.huffingtonpost.com/2015/02/25/facebook-suicideprevention_n_6754106.html (last updated Mar. 2, 2015) (last visited Apr. 11, 2016) (reporting this new Facebook feature and explaining how it works) (on file with the Washington and Lee Law Review). 14. See Polonetsky, Tene & Jerome, supra note 2, at 364–65 (discussing the benefits of internal review boards).


of proposed projects, so that problems can be anticipated and avoided. Although the review process we describe is primarily intended to consider ethical issues, research review can also benefit companies by identifying potential challenges in those other domains like law or public policy so they can be addressed. III. Existing Frameworks In 1978, the National Commission published the Belmont Report,15 intended to serve as guidelines for academic research.16 The Belmont Report influenced the Federal Policy for the Protection of Human Subjects, or the “Common Rule,” published in 1991.17 The Common Rule outlines the basic provisions for IRBs. At qualifying academic institutions, researchers are required to justify their proposals in accordance with the principles of the Belmont Report, as codified in the Common Rule, to an IRB.18 Institutions are only required to form IRBs, however, when they receive federal funding—which means that private companies conducting research are under no obligation to do so.19 Existing frameworks have not kept up with state of the art research because, even at institutions that are subject to IRBs, researchers are increasingly undertaking studies that are exempt from full review under the Common Rule.20 Leading scholars have questioned whether proposed changes to the Common Rule 15. Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research, 44 Fed. Reg. 23,191 (Apr. 18, 1979). 16. See Belmont Report, OFF. FOR HUM. RES. PROTECTIONS, U.S. DEP’T OF HEALTH & HUM. SERVICES, http://www.hhs.gov/ohrp/archive/belmontArchive.html (last visited Mar. 11, 2016) (describing the history of the Belmont Report) (on file with the Washington and Lee Law Review). 17. Basic HHS Policy for Protection of Human Research Subjects, 45 C.F.R. § 46.101(a) (2009). 18. Id. §§ 46.107–46.109. 19. Id. § 46.101(a). 20. See Effy Vayena, Urs Gasser, Alexandra Wood, David R. O’Brien & Micah Altman, Towards a New Ethical and Regulatory Framework for Big Data Research, 72 WASH. & LEE L. REV. ONLINE (forthcoming Apr. 2016) (manuscript at 4–5) (describing how modern research often evades the Common Rule due to its limited application) (on file with the Washington and Lee Law Review).


would adequately address this issue, because they may not reach important uses of data.21 While companies may work to build IRBs or other internal review mechanisms, they must undertake those efforts against the backdrop of the concern these experts have expressed: that the Common Rule does not provide sufficient guidance to address the challenges relevant to most industry research.22 While a broad ethical framework is helpful, a more prescriptive approach can be developed and implemented within companies based on their particular research contexts. The field of research ethics has an established framework of protections that should apply to people and data involved in research—many of which have been built into the program we operate at Facebook. For instance, the Menlo Report, which proposes guidelines for ethical review of technology research, says that respect for persons is maintained in industry research by ensuring data protections and removing non-essential identifying information from data reporting.23 Facebook has designed processes and systems consistent with these principles.24 For instance, a dedicated security team monitors data access, and 21. See COUNCIL FOR BIG DATA, ETHICS & SOC’Y, COMMENT LETTER ON PROPOSED CHANGES TO THE COMMON RULE 2 (Dec. 29, 2015), http://bdes.datasociety.net/wp-content/uploads/2015/12/BDES-Common-RuleLetter.pdf?utm_content=bufferb4ef5&utm_medium=social&utm_source=twitter. com&utm_campaign=buffer We wish to express our view that any rules which include or exclude data science from federal ethics regulations should be based on sound research and reasoning about risks to human subjects and preservation of social justice, and achieve clarity about when and how ethics regulations should apply. The proposed revisions in the NPRM fall short of this in several regards. 22. See id. at 1 (“Not surprisingly, researchers and practitioners are increasingly finding that these new methods of knowledge production raise ethical challenges that do not easily translate into the regulatory frameworks developed over the last several decades.”). 23. HOMELAND SEC., SCI. & TECH., THE MENLO REPORT: ETHICAL PRINCIPLES GUIDING INFORMATION AND COMMUNICATION TECHNOLOGY RESEARCH 8 (2012), https://www.caida.org/publications/papers/2012/menlo_report_actual_formatted/ menlo_report_actual_formatted.pdf/. 24. See generally Ryan Calo, Consumer Subject Review Boards: A Thought Experiment, 66 STAN. L. REV. ONLINE 97 (Sept. 3, 2013), http://www.stanfordlawreview.org/sites/default/files/online/topics/Calo.pdf (examining the ethical concerns involved with studying human behavior).


employees are well trained in privacy protection policies. We also have a comprehensive privacy program staffed with experts who specialize in data protection. In addition to research review, this privacy group must approve research proposals that raise privacy considerations. In many areas, however, existing review guidelines do not provide sufficient guidance regarding research conducted in an industry context. For example, most IRB experts consider product-oriented projects to be quality improvement research because the goal is to contribute to implementable (as opposed to generalizable) knowledge.25 Most Facebook research is part of this category, which does not typically qualify as human subjects research and is, thus, outside the scope of the Common Rule. Given the lack of guidance in this area, two IRBs applying the same standards to the evaluation of this category of research may reach different conclusions. Moreover, some IRB experts suggest that decisions made based on the Common Rule are more likely to be too lenient than too stringent in an industry context, due to gaps in oversight.26 For instance, analyses of existing datasets that are reported in de-identified form are eligible for exemption according to Common Rule 46.101(b)(4).27 Our research review group has worked with researchers to improve the ethical aspects of research conducted on historical Facebook data—for instance, by identifying the implications of research for the community we serve and ensuring that those implications are taken into account in research design and reporting. Incorporating this broader context into research has been important for maintaining the integrity of the research and disclosing it responsibly. The same research would likely have been deemed exempt from the purview of an IRB, however, because it involved the analysis of pre-existing, de-identified datasets.

25. Based on private conversations with IRB members and experts.

26. See, e.g., Vayena et al., supra note 20, at manuscript 3–7 (providing instances of gaps in the current oversight framework). 27. Basic HHS Policy for Protection of Human Research Subjects, 45 C.F.R. §§ 46.101(b)(4) (2009).
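Part III notes that analyses of pre-existing datasets reported in de-identified, aggregated form often fall outside Common Rule review. As a rough illustration of what aggregate-only reporting can look like in practice (a sketch under assumptions, not a description of Facebook’s actual tooling; the threshold and field names below are invented), one common safeguard is to release only group-level counts and to suppress any group smaller than a minimum cell size:

from collections import Counter

MIN_CELL_SIZE = 100  # assumed threshold for illustration; real policies vary by context

def aggregate_for_release(records, group_key):
    """Return group-level counts, suppressing any group below the minimum cell size.
    Only aggregates are released; individual records never leave this function."""
    counts = Counter(r[group_key] for r in records)
    released = {group: n for group, n in counts.items() if n >= MIN_CELL_SIZE}
    suppressed = sorted(group for group, n in counts.items() if n < MIN_CELL_SIZE)
    return released, suppressed

# Example with synthetic records (hypothetical survey respondents by age band).
synthetic = [{"age_band": "18-24"}] * 250 + [{"age_band": "25-34"}] * 180 + [{"age_band": "85+"}] * 7
released, suppressed = aggregate_for_release(synthetic, "age_band")
print(released)    # {'18-24': 250, '25-34': 180}
print(suppressed)  # ['85+']  -- below the threshold, so not reported

The particular threshold is a policy choice; the point of the sketch is simply that only aggregates, never individual records, are reported out.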


For these reasons, the guidelines articulated in the Common Rule do not always provide sufficient guidance around many of the research questions we face. To be sure, some of the research conducted at Facebook, such as user experience surveys, would fall clearly under the purview and expertise of an IRB. And indeed, when proposed research falls outside the expertise of our internal research review group, we can and do consult with outside IRBs. For the majority of research we undertake, however, the Common Rule framework would fail to subject it to meaningful review—an outcome that was important for us to avoid. Rather than attempt to fit a square peg into a round hole, we developed a process specifically tailored to the context in which we operate and the full range of research questions and methodologies we employ.

IV. Designing a Process

We designed our process to leverage the structure that already exists at Facebook, creating multiple training opportunities and research review checkpoints in the organizational flow. Figure 1 summarizes this process.

Figure 1: Research Review at Facebook




Note that the research review process exists in parallel to our privacy process—not as a substitute for it. Research affecting user privacy must, then, be evaluated by both the privacy and research review groups.

A. Training

We provide three levels of training related to privacy and research, depending on each individual's involvement with research:

1. Employee onboarding: Socialization to our practices and principles around ethical research begins during onboarding and is mandatory for all employees. Every new hire receives training on our company policies around data access and privacy.

2. Researcher-specific training: Those working directly with data—for example, data scientists and quantitative researchers—attend “bootcamp,” where they learn about our research review process, why it matters, and the types of research that are subject to extended review.

3. Reviewer-specific training: Individuals directly involved in the research review decision-making process—substantive area experts and members of the research review group—complete the National Institutes of Health’s (NIH) human subjects training. The NIH training, however, is just a starting point. The reviewers meet regularly to share lessons learned, discuss challenges, and review the latest thinking on the subject of research review from academics and policymakers.


These managers consider the scientific and ethical merits of each proposal, based on the criteria described in a subsequent section, and can request feedback or additional review from the cross-functional group as needed. Moreover, because research evolves as it progresses, managers may refer a project to the research review group at any stage—not just at the project's inception. The managers undoubtedly exercise some discretion in their decision to approve, escalate, or decline to advance a research proposal. Any review process involves some degree of subjectivity, which is why we have designed ours to err on the side of multiple reviews. We do not have categories of research—including product improvements—that are automatically approved. Moreover, as previously discussed, research that also touches on privacy is considered by a separate privacy review group with expertise specific to that area.

C. Review by Research Review Group

The research review group consists of a standing committee of five, and includes experts in the substantive area of the research as well as law, ethics, communications, and policy. Most of the research Facebook conducts relates to small product tests—for example, evaluating whether the size or placement of a comment box affects people’s engagement. The research area expert may expedite the review of these studies or seek the counsel of a particular reviewer from the larger group based on the area of sensitivity. Some research, however, raises additional complexities. For those studies, the group considers the potential ethical, policy, and legal implications. Once extended review has been triggered, we require consensus among all members of the group before the research proposal is approved. In evaluating research, the group considers the potential benefits of the results and identifies any potential downsides that require evaluation—for instance, whether there are data privacy or security issues that have not already been reviewed through our privacy program. Benefits typically relate to our efforts to improve Facebook products and services. The group also


considers the anticipated contribution to general knowledge and whether the research could generate positive externalities and implications for society.28 We have designed our process to be inclusive: Companies have to consider a myriad of factors when deciding to undertake a particular project, and diverse networks help ensure that a broad range of experiences and expertise are leveraged. Frequently, the group solicits feedback from others across the company who have particular expertise about the research, or a dimension of it. The group also can go outside the company for additional expert consultation.29 For example, before conducting research on trends in the LGBT community on Facebook, we sought feedback from prominent groups representing LGBT people on the value that this research would provide and on what data to collect and report.30 So, when decisions are made, the research has been considered from a variety of sides.

28. For example, Facebook researchers used image recognition technologies to process satellite maps in order to generate high-resolution population estimates to support our connectivity initiatives. These maps will be open-sourced, so that they can provide value outside the Facebook context—for instance, guiding government infrastructure planning, crisis rescue and recovery teams, and humanitarian groups deciding how to most efficiently allocate medication and other resources. See Connecting the World with Better Maps: Data-assisted Population Mapping, NEWSROOM AT FACEBOOK (Feb. 21, 2016), https://fbnewsroomus.files.wordpress.com/2016/02/population_density_final_mj2_ym_tt2113.pdf (last visited Apr. 27, 2016) (on file with the Washington and Lee Law Review).

29. We have considered including an external member on our review board, following the IRB model. To this point, however, we have instead taken the approach of engaging external stakeholders on a case-by-case basis, identifying those with the most substantive and methodological knowledge on particular research proposals under consideration. So far, we have found it more valuable to engage top experts on each project, rather than to include an additional standing member on our committee who is a generalist. 30. See Bogdan State & Nils Wernerfelt, America’s Coming Out on Facebook, RESEARCH AT FACEBOOK (Oct. 15, 2015), https://research.facebook.com/blog/america-s-coming-out-on-facebook/ (last visited Mar. 11, 2016) (setting forth this study) (on file with the Washington and Lee Law Review).
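As a rough sketch of the two-tier routing described in Parts IV.B and IV.C (an illustration under assumptions, not Facebook’s internal system; the class and field names here are invented for exposition), a proposal is either handled by the substantive area expert as a standard review or escalated for extended review, where approval requires consensus of the full group:

from dataclasses import dataclass, field
from typing import List, Sequence

@dataclass
class Proposal:
    title: str
    touches_privacy: bool = False      # also routed to the parallel privacy review
    raises_complexities: bool = False  # e.g., sensitive topics or vulnerable populations

@dataclass
class ReviewOutcome:
    path: str                          # "standard" or "extended"
    approved: bool
    notes: List[str] = field(default_factory=list)

def route_review(proposal: Proposal, manager_decision: str,
                 group_votes: Sequence[bool] = ()) -> ReviewOutcome:
    """Manager triage first; extended review requires consensus of the whole group."""
    notes = []
    if proposal.touches_privacy:
        notes.append("also requires separate privacy review")
    if proposal.raises_complexities or manager_decision == "escalate":
        # Extended review: every member of the review group must agree.
        approved = len(group_votes) > 0 and all(group_votes)
        return ReviewOutcome("extended", approved, notes)
    return ReviewOutcome("standard", manager_decision == "approve", notes)

# Example: a sensitive-topic study is escalated to a five-person review group.
outcome = route_review(
    Proposal("teen well-being survey", touches_privacy=True, raises_complexities=True),
    manager_decision="escalate",
    group_votes=[True, True, True, True, True],
)
print(outcome.path, outcome.approved, outcome.notes)

A real process would also document deliberations and route privacy-relevant proposals to the parallel privacy review, as the text describes; this sketch captures only the triage step and the consensus rule.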

D. Evaluative Criteria

When reviewing research proposals, our basic formula is the same as an IRBs: We consider the benefits of the research against the potential downsides. And also like an IRB, the particular inputs into this formula depend on the research that is under review. Each research proposal is different and requires judgment about whether it is consistent with our values. Four criteria, however, guide our consideration of proposed research. First, we consider how the research will improve our society, our community, and Facebook. Like many companies, we do research to make our product better. We are fortunate, however, to have the capacity to be forward-looking and to prioritize research that will lead to long-term innovations over incremental gains. As the company grows, our research agenda expands to include projects that contribute value to our community and society. For instance, our accessibility team develops technologies to make Facebook more inclusive for people with disabilities.31 Collaborative research with the University of Washington informed the design of our suicide prevention tool.32 Researchers in our Connectivity Lab are using technologies developed across Facebook to create high-quality population density maps based on satellite images, which have the potential to inform policymaking and decisions about where to invest in connectivity and other infrastructure.33 Thus, when evaluating research, we

31. Shaomei Woo, Hermes Pique & Jeff Wieland. Using Artificial Intelligence to Help Blind People ‘See’ Facebook, FACEBOOK NEWSROOM (Apr. 5, 2015), http://newsroom.fb.com/news/2016/04/using-artificial-intelligence-to-helpblind-people-see-facebook/ (last visited Apr. 5, 2016) (on file with the Washington and Lee Law Review). 32. See Deborah Bach, Forefront and Facebook Launch Suicide Prevention Effort, UNIV. OF WASH.: UW TODAY (Feb. 25, 2015), http://www.washington.edu/news/2015/02/25/forefront-and-facebook-launchsuicide-prevention-effort/ (last visited Mar. 11, 2016) (announcing this collaboration and outlining its goals) (on file with the Washington and Lee Law Review). 33. Andi Gros & Tobias Tiecke, Connecting the World with Better Maps, CODE AT FACEBOOK (Feb. 21, 2016), https://code.facebook.com/posts/1676452492623525/connecting-the-world-withbetter-maps/ (last visited Apr. 5, 2016) (on file with the Washington and Lee Law Review).


consider not just the value it will bring to Facebook, but also to science and, most importantly, the people we serve. Second, we ask whether there are potentially adverse consequences that could result from the study, and whether every effort has been taken to minimize them. Like an IRB, we think about potential downsides to study participants. Our review pays attention to the impact of research focused on vulnerable populations (e.g., teen bullying) or sensitive topics (e.g., suicide prevention). Third, we consider whether the research is consistent with people’s expectations. Ethicist and legal scholar Helen Nissenbaum writes, “[W]hat people care most about is not simply restricting the flow of information but ensuring that it flows appropriately.”34 In keeping with this perspective, we try to make sure that our methodology is consistent with people’s expectations of how their information is collected and stored. To be sure, gauging people’s expectations is not an exact science. We stay closely aware of principles and discussions being put forward by ethicists, advocates, academics.35 We also know that certain categories of research—for example, analyses of aggregate trends in public posts—are less sensitive than others, so we try to leverage these types of designs when possible. We also ask researchers who publish their work to be explicit, where appropriate, about the fact that their research conforms with our data policy and to articulate the values that motivate the research. Our research review process helps us apply those values consistently. Finally, we ensure that we have taken appropriate precautions designed to protect people’s information. For

34. HELEN NISSENBAUM, PRIVACY IN CONTEXT: TECHNOLOGY, POLICY AND THE INTEGRITY OF SOCIAL LIFE 2 (2010). 35. Some examples include: (1) Towards a New Digital Ethics: Data, Dignity, and Technology. EUR. DATA PROTECTION SUPERVISOR, (Sept. 11, 2015), https://secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents /Consultation/Opinions/2015/15-09-11_Data_Ethics_EN.pdf (last visited Apr. 5, 2016) (on file with the Washington and Lee Law Review).; (2) Polonetsky et al., supra note 2; (3) Civil Rights Principles for the Era of Big Data, LEADERSHIP CONF. (2014), http://www.civilrights.org/press/2014/civil-rights-principles-bigdata.html?referrer=https://www.google.com/ (last visited Apr. 5, 2016) (on file with the Washington and Lee Law Review).


instance, we generally release our research results in aggregated form.

V. Conclusion

Through building our internal review process, we have learned a number of lessons:

A. Leverage Existing Infrastructure

Facebook’s research review process is managed on the same online platform that teams use to track their work. By building on the existing infrastructure, the research review process becomes part of researchers’ normal workflows. This reduces the burden placed on researchers—both in terms of training and paperwork. It also makes it easy to solicit input from the research review group and additional stakeholders across the company at any stage of the research process. Deliberations are documented within this system; like IRBs, we do not make those deliberations public, but we do maintain records of our decisions.

B. Inclusiveness Is Key

Research review does not occur behind closed doors. We have found that including researchers and managers in the deliberations leads to faster turn-around and more informed decision-making. It also helps educate researchers about ethical considerations that may inform their future work. The deliberations and decisions of the research review group are accessible to all employees through the centralized platform that we use to track our work. Moreover, anyone at the company is empowered to refer research for review if he or she believes a review is warranted.

C. Ask for Help

As our company grows, we engage in research that is increasingly diverse and complex. While the same


cross-functional group evaluates each of these proposals, the infrastructure around our review process allows us to bring more people into the conversation seamlessly. We often do so. In addition, we can reach out to external consultants or IRBs if we lack the necessary expertise to evaluate a proposal comprehensively.

D. Listening to Feedback

The development of our current process did not occur in a vacuum. Throughout, we benefited from the feedback of our community, as well as from experts in industry research, academia, and human subjects review. We continue to listen and to iterate as we receive additional feedback.

E. Flexibility

Within industry and academia, norms around data use and analysis are constantly evolving, as are the questions researchers ask. Principles that are set in stone are at risk of quickly becoming irrelevant and unhelpful.36 We believe that a research review process is most likely to be successful and sustainable if it can change fluidly in response to shifting paradigms, new research questions, and external feedback. Accordingly, we plan to continue improving our research process over time.

There is no one-size-fits-all model of research review that can be applied across companies. We hope, however, that the lessons we have learned will help inform others as they create the processes that serve their needs.

36. Lewis Gersh, The Velocity of Obsolescence, FORBES: ENTREPRENEURS (July 29, 2013, 11:41 AM), http://www.forbes.com/sites/lewisgersh/2013/07/29/the-velocity-ofobsolescence/#4217d7a1665e (last visited Mar. 11, 2016) (exploring the rapid speed at which technology changes and advances) (on file with the Washington and Lee Law Review).

Bill Gaver, Tony Dunne and Elena Pacenti

design


Cultural Probes

Homo ludens impinges on his environment: He interrupts, changes, intensifies; he follows paths and in passing, leaves traces of his presence everywhere. — Constant


As the local site coordinator finished his introduction to the meeting, our worries were increasing. The group had taken on a glazed look, showing polite interest, but no real enthusiasm. How would they react when we presented them with our packages? Would disinterest deepen to boredom, or even hostility?


Of course an explanation had been necessary for this special meeting with us, three foreign designers. The coordinator explained that we were there as part of a European Union–funded research project looking at novel interaction techniques to increase the presence of the elderly in their local communities. We represented two design centers that would be working over the next two years with three community sites: in the Majorstua, a district of Oslo; the Bijlmer, a large planned community near Amsterdam; and Peccioli, a small village outside Pisa. We were at the last site, to get to know the group a little. An important preamble, then, well delivered by the coordinator, but the explanation was of necessity fairly complicated. On our arrival, the 10 elderly members had been friendly and enthusiastic, if a little puzzled.

Now they were looking tired. Finally the time came. I stood up and said, “We’ve brought you a kind of gift,” as we all passed the clear blue plastic envelopes to the group. (See Fig. 1) “They’re a way for us to get to know you better, and for you to get to know us.” Already people were starting to unwind the strings fastening the envelopes. “Take a look,” I said, “and we’ll explain what’s in them.” An assortment of maps, postcards, cameras, and booklets began accumulating in front of them. Curious, they started examining the materials. Soon they were smiling and discussing them with their neighbors. As the feeling of the group livened perceptibly, we started explaining the contents. Worry transformed to excitement. Perhaps the probes would work after all.

Cultural Probes

The cultural probes—these packages of maps, postcards, and other materials—were designed to provoke inspirational responses from elderly people in diverse communities. Like astronomic or surgical probes, we left them behind when we had gone and waited for them to return fragmentary data over time. The probes were part of a strategy of pursuing experimental design in a responsive way. They address a common dilemma in developing projects for unfamiliar groups. Understanding the local cultures was necessary so that our designs wouldn’t seem irrelevant or arrogant, but we didn’t want the groups to constrain our designs unduly by focusing on needs or desires they already understood. We wanted to lead a discussion with the groups toward unexpected ideas, but we didn’t want to dominate it.

Figure 1. A cultural probe package.


Postcards

Within the probe packages, people found 8 to 10 postcards scattered among other materials. The cards had images on the front, and questions on the back, such as:

✦ Please tell us a piece of advice or insight that has been important to you.
✦ What do you dislike about Peccioli?
✦ What place does art have in your life?
✦ Tell us about your favorite device.

The questions concerned the elders’ attitudes towards their lives, cultural environments, and technology. But we used oblique wording and evocative images to open a space of possibilities, allowing the elders as much room to respond as possible. Postcards are an attractive medium for asking these sorts of questions because of their connotations as an informal, friendly mode of communication. (See Fig. 2) Unlike formal questionnaires, the postcards encouraged questions to be approached casually, which was underlined by pre-addressing and stamping them for separate return.

Maps

The probes contained about seven maps, each with an accompanying inquiry exploring the elders’ attitudes toward their environment. (See Fig. 3) Requests ranged from straightforward to poetic. For instance, a map of the world included the question “Where have you been in the world?”, and small dot stickers were provided to mark answers. Participants were also asked to mark zones on local maps, showing us where, for instance,

✘ They would go to meet people
✘ They would go to be alone
✘ They liked to daydream
✘ They would like to go but can’t

A more surreal task was given to each group as well; in the case of Peccioli, for example, a map was labeled “if Peccioli were New York...” and was accompanied by stickers showing scenes ranging from the Statue of Liberty to people injecting drugs. The maps were printed on a variety of textured papers to emphasize their individuality and cut into several different envelope forms. When the elderly were finished with them, they folded them together and put them in the mail.

Camera

Each probe included a disposable camera, repackaged to separate it from its commercial origins and to integrate it with the other probe materials. On the back we listed requests for pictures, such as

✱ Your home
✱ What you will wear today
✱ The first person you see today
✱ Something desirable
✱ Something boring

About half the pictures were unassigned, and the elders were asked to photograph whatever they wanted to show us before mailing the camera back to us. (See Fig. 4)

Photo Album and Media Diary

The last two items in the probes were in the form of small booklets. The first was a photo album, which requested the elders to “use 6 to 10 pictures to tell us your story.” When questioned, we encouraged participants to use photos of the past, their families, their current lives, or anything they found meaningful. (See Fig. 5)

Figure 2. A postcard (“what is your favorite device?”)


Bill Gaver and Tony Dunne, Royal College of Art, London, U.K. ([email protected], [email protected])
Elena Pacenti, Domus Academy, Milan, Italy ([email protected])

Finally, each probe contained a media diary, in which elderly participants were asked to record their television and radio use, including what they watched, with whom, and when. They were also asked to note incoming and outgoing calls, including their relationship with the caller and the subject of the calls. The entries were made daily, for a total of a week.

Context

A number of converging interests and constraints were involved in designing the probes. The Presence Project has been funded for two years under the European Union’s i3 initiative. Eight partners from four countries are exploring technologies to increase the presence of the elderly in their local communities. This is a relatively unconstrained project, defined only in terms of its overall goal and its flow over time. The first year has been spent on opening a space of possible designs; the second will focus on developing prototypes to be

tested in the sites. The sites themselves constrain the sorts of design explorations that might be meaningful. In Oslo, we are working with a group of elderly who have been learning to use the Internet at a local library. In the Netherlands, the elders live in the Bijlmer, an extensive planned community with a poor reputation. Finally, the Italian site is in Peccioli, a small Tuscan village where an elder center is being planned. The diversity of the sites was clear from the outset. Our task was to better understand their particularities. The openness of the design brief, and the availability of more quantitative demographic data from the local sites, meant that we could freely explore many different aspects of the elders’ attitudes. Of course, we might have used more traditional methods to do this, including perhaps ethnographic studies, interviews, or questionnaires. That we didn’t stems, in part, from how we think about doing research through design.

Design as Research

We approach research into new technologies from the traditions of artist–designers rather than the more typical science- and engineering-based approaches. Unlike much research, we don’t emphasize precise analyses or carefully controlled methodologies; instead, we concentrate on aesthetic control, the cultural implications of our designs, and ways to open new spaces for design. Scientific theories may be one source of inspiration for us, but so are more informal analyses, chance observations, the popular press, and other such “unscientific” sources.

Figure 3. A map (“if Peccioli were New York...“)


Figure 4. Camera


Unlike most design, we don’t focus on commercial products, but on new understandings of technology. This allows us—even requires us—to be speculative in our designs, as trying to extend the boundaries of current technologies demands that we explore functions, experiences, and cultural placements quite outside the norm. Instead of designing solutions for user needs, then, we work to provide opportunities to discover new pleasures, new forms of sociability, and new cultural forms. We often act as provocateurs through our designs, trying to shift current perceptions of technology functionally, aesthetically, culturally, and even politically.

Inspiration, not Information

The artist–designer approach is openly subjective, only partly guided by any “objective” problem statement. Thus we were after “inspirational data” with the probes, to stimulate our imaginations rather than define a set of problems. We weren’t trying to reach an objective view of the elders’ needs through the probes, but instead a more impressionistic account of their beliefs and desires, their aesthetic preferences and cultural concerns. Using official-looking questionnaires or formal meetings seemed likely to cast us in the role of doctors, diagnosing user problems and prescribing technological cures. Conversely, we didn’t want to be servants either, letting the elders set the directions for our designs. Trying to establish a role as provocateurs, we shaped the probes as interventions that would affect the elders while eliciting informative responses from them.

Combating Distance

To establish a conversation with the elder groups, we had to overcome several kinds of distance that might separate us, some endemic to most research, some particular to this project. Foremost was the kind of distance of officialdom that comes with being flown in as well-funded experts. Trying to reduce this sort of distance underlay a great deal of the tone and aesthetics of the probe materials.

Geographic and cultural distances were more specific problems for this project. We designed the materials to be posted separately, both to acknowledge our distance and to emphasize our ongoing lives in other countries (thus we used our names in the addresses, as opposed to an institutional title like “The Presence Project”). We also tried to design the materials to be as visual as possible, to some extent bypassing language barriers.

Respecting Our Elders

A particularly important gap for us to bridge was the generational gap implied by designing for another age group. To encourage a provocative dialogue about design, we tried to reject stereotypes of older people as “needy” or “nice.” This freed us, in turn, to challenge the elder groups, both through the probes and our eventual designs. Moving beyond a view of older people as needy or nice has allowed us to view them in new ways, opening new opportunities for design. For instance, elders represent a lifetime of experiences and knowledge, often deeply embedded in their local communities. This could be an invaluable resource to the younger members of their community. Conversely, elders also represent a life free from the need to work, and thus the possibility of exploring life as homo ludens, humanity defined by its playful qualities. Our designs could offer them opportunities to appreciate their environments—social, urban, and natural—in new and intriguing ways.

Figure 5. Photo album

Functional Aesthetics

Throughout the project, we have viewed aesthetic and conceptual pleasure as a right rather than a luxury. We didn’t work on the aesthetics of the probes simply to make them appealing or motivating but because we believe aesthetics to be an integral part of functionality, with pleasure a criterion for design equal to efficiency or usability. We worked to make the probe materials delightful, but not childish or condescending. In fact, the aesthetics were somewhat abstract or alien in order to encourage from respondents a slightly detached attitude to our



requests. But although the materials were aesthetically crafted, they were not too professionally finished. This gave them a personal and informal feeling, allowing them to escape the genres of official forms or of commercial marketing. In the end, they revealed the energy we put into them and expressed our tastes and interests to the groups. The aesthetics of the packages were thus another attempt to reduce the distance between us and the groups. Through the materials and images and the requests we made, we tried to reveal ourselves to the groups as we asked them to reveal themselves to us. Not only did this make the probes themselves enjoyable and communicative, but it meant that they started to hint at what the elders might expect from our eventual designs.

Applying Conceptual Art

The conceptual concerns and specific techniques of various arts movements also influenced our design. For instance, our maps are related to the psychogeographical maps of the Situationists [1] (see the sidebar), which capture the emotional ambience of different locations. Unfamiliar with the local sites ourselves, we asked the local groups to map them for us. Not only did this give us material to inform our designs, but, we hope, provoked the elders to consider their environment in a new way. We used other techniques from groups such as Dada, the Surrealists, and more contemporary artists in the probes as well. They incorporated elements of collage, in which juxtaposed images open new and provocative

spaces, and of borrowing and subverting the visual and textual languages of advertisements, postcards, and other elements of commercial culture. Finally, we tried to use, judiciously, tactics of ambiguity, absurdity, and mystery throughout, as a way of provoking new perspectives on everyday life.

Launching the Probes

We gave the probes to members of the elder groups in a series of meetings at the local sites, like the one described in the beginning of this paper. We did not describe every item, but instead introduced the types of things they would find. We wanted them to be surprised as they returned to the packages over the following weeks. Originally we had planned to send the packages to the groups, but we were afraid they might reject the unusual approach we were taking. We decided to present them ourselves to explain our intentions, answer questions, and encourage the elders to take an informal, experimental approach to the materials. This turned out to be an extremely fortunate decision, because one of the unexpected strengths of the probes was in sparking a dialogue between us and the elderly. What we feared would be polite group discussions turned out to be spontaneous and personal, and we learned a great deal about the groups in discussing the materials. Even after we left, some of the elders sent us personal greetings beyond the materials themselves—postcards, letters, even personalized Christmas cards.

The Returns

Figure 6. Some of the returned items.


For about a month after we left each site, we started receiving the completed materials, at a rate that seemed to compare favorably with that for other methods. Every day or so, we would find another few postcards, maps, or cameras in our post, which allowed us to scan and sort them in a piecemeal and leisurely fashion. (See Fig. 6) Some of the items that the elders returned were left blank or they included notes about why the given request was difficult. We had encouraged this in the meetings, as a way of keeping the process open to the elders’ opinions.

And in fact, we redesigned the materials for each group as we received returns from the last. Sorting through the masses of maps, cards, and photographs that we received, strong and differentiated views of the three sites began to emerge. Some items acted as beacons for us—a photograph of friends at an Italian café, a map of the Bijlmer with extensive notes about the “junkies and thieves” in the area, a joke about death from Oslo. They seemed to capture particular facets of the cultures, clearly symbolizing important issues. (See Fig. 7) The return rates from the groups added to our impressions of their differences. The Oslo group returned almost all the materials, and thus seemed enthusiastic and diligent. The Bijlmer group returned a bit more than half the materials: they seemed less convinced by the project but willing to take part in tasks they found meaningful or provoking. Finally, the Peccioli group returned less than half the materials, despite being enthusiastic when they received them. We take this as a sign that they are well meaning but happily distracted by their daily lives—an important factor for our designs.

From Probes to Designs

The probes were not designed to be analyzed, nor did we summarize what they revealed about the sites as an explicit stage in the process. Rather, the design proposals we produced reflected what we learned from the materials. For the Royal College of Art, the probe materials allowed the different characters of the three sites to emerge, which we are reflecting in quite different design scenarios:

✦ In the Bijlmer, our ideas respond to the paradox of a strong community in a dangerous area: We have proposed building a network of computer displays with which the elderly could help inhabitants communicate their values and attitudes about the culture.

Figure 7. A returned map showing zones of safety and fear in the Bijlmer.

✦ The group in Oslo is affluent, well educated, and enthusiastic: We are proposing that they lead a communitywide conversation about social issues, publishing questions from the library that are sent for public response to electronic systems in cafés, trams, or public spaces.

✦ Finally, the elders in Peccioli enjoy a relaxed social life in a beautiful setting. We plan to amplify their pleasure by creating social and pastoral radioscapes, allowing them to create flexible communications networks and to listen to the sounds of the surrounding countryside. (See Fig. 8)

For the Domus Academy, the returns suggested a range of nonstereotypical profiles of elders that were less focused on the particular sites. For instance, many elders are experts on


Figure 8. From the proposal for the Bijlmer.

the current status and history of their communities and might serve as local information resources, perhaps by guiding tourists. They are eager to keep in touch with friends and families, and thus new technologies might

support relationships with distant relatives or with children and grandchildren closer to home, or might provide forms of “soft surveillance” or informal help chains to combat social isolation. Finally, the elderly might provide a living memory of a particular community, enriching the physical environment with virtual traces of its history. These proposals were our reply to the elders’ responses to the probes, integrating what we learned about them with suggestions for new possibilities. The best evidence that the returns from the probes spurred valuable insights into the local cultures was that the elders clearly recognized themselves in the proposals. Although some of our suggestions were intended to be strange or provocative, the elders became readily involved with them, making suggestions for reshaping the ideas, but without breakdowns in the conversation that would have indicated our perceptions

The Situationists

One influence on our work is the Situationists [1, 3], a collective of artist-provocateurs based largely in Paris from the late 1950s to early 1960s. Like the Dadaists and Surrealists (e.g., [2]), the Situationists wanted their art to be revolutionary, reawakening passion and unconscious desires in the general public. Fundamental in this approach was their analysis of the ways that commercial culture expropriates people’s experience into the “Spectacle,” an all-encompassing, media-fueled show. As the Spectacle subsumes ideas, desires, even protests, people are forced into an alienated position, as consumers of their own experience. The Situationists used artistic strategies both as a radical critique of the Spectacle and as concrete research into the promise of new cultural possibilities. Art was to be liberated from the safe enclave of established galleries and used to seduce and confront ordinary people. They mass-produced paintings sold by the yard; altered prints, comic strips, and advertisements; and created new architectures to be changed at will by the people who lived in them. Throughout, they embraced disorientation and confusion as methods for liberation. Psychogeographical maps were developed to represent the city’s topology of desire, fear, isolation and sociality, to challenge the cultural homogeneity assumed by commercial interests. Situationists took dérives, meandering around the city guided only by the landscape of impulse and desire, and mapped what they found. We have borrowed from this technique for the cultural probes. More generally, we approach our design in their spirit of functional pleasure.


User-Centered Inspiration

Although the probes were central to our understanding of the sites, they didn’t directly lead to our designs. They were invaluable in making us aware of the detailed texture of the sites, allowing us to shape proposals to fit them. But we were also influenced by our preexisting conceptual interests, our visits to the sites, anecdotes and data about the areas from the local coordinators, and readings from the popular and specialist press. Just as many influences went into designing the probes, so have they been one of many influences on our design process. The cultural probes were successful for us in trying to familiarize ourselves with the sites in a way that would be appropriate for our approach as artist–designers. They provided us with a rich and varied set of materials that both inspired our designs and let us ground them in the detailed textures of the local cultures. What we learned about the elders is only half the story, however. The other half is what the elders learned from the probes. They provoked the groups to think about the roles they play and the pleasures they experience, hinting to them that our designs might suggest new roles and new experiences. In the end, the probes helped establish a conversation with the groups, one that has continued throughout the project. We believe the cultural probes could be adapted to a wide variety of similar design projects. Just as machine-addressed letters seem more pushy than friendly, however, so might a generic approach to the probes produce materials that seem insincere, like official forms with a veneer of marketing. The real strength of the method was that we had designed and produced the materials specifically for this project, for those people, and for their environments. The probes were our personal communication to the elders, and prompted the elders to communicate personally in return.

“The game should be played for some length of time to arrive at the most curious results. The questions, as well as the answers, are to be considered as symptomatic.” — J. Levy, Surrealism

Acknowledgments

The Presence Project is supported by a grant from the European Union under the I3 initiative. We are extremely grateful to the members of the three groups and to Sidsel Bjorneby, Simon Clatworthy, Danielle van Diemen, and Cecelia Laschi, the local site coordinators. We thank Ben Hooker and Fiona Raby for help with the design and production of the probe materials and Anne Schlottmann for helpful comments on this paper. Finally, we thank our partners from the Domus Academy, Netherlands Design Institute, Telenor, Human Factors Solutions, Scuola Superiore Sant’Anna, and IDEA.

References
1. Andreotti, L. and Costa, X. (eds.). Situationists: Art, Politics, Urbanism. Museo d’Art Contemporani de Barcelona, Barcelona, 1996.
2. Levy, J. Surrealism. Black Sun Press (1st ed.), 1936; reprinted by Da Capo Press, New York, 1995.
3. Plant, S. The Most Radical Gesture: The Situationist International in a Postmodern Age. Routledge, London, 1992.

© ACM 1072-5220/99/0100 $5.00


V. KAPTELININ, B. NARDI AND C. MACAULAY

methods & tools


The Activity Checklist: A Tool for Representing the “Space” of Context

Introduction


In recent years, specialists in human–computer interaction (HCI) have come to appreciate the importance of understanding the context in which computer-supported activities take place [1]. Such understanding directly affects design and evaluation by revealing what users are up to and how they might most effectively use a technology. The idea is to gain this understanding before the design process has progressed too far, or during evaluation, when openings for modifications and improvements to the technology exist.


Victor Kaptelinin
Department of Informatics
Umeå University
S-901 87 Umeå, Sweden
[email protected]

Bonnie A. Nardi
Department of Human–Computer Interaction
AT&T Labs–Research
Menlo Park, California 94040, USA
[email protected]

Catriona Macaulay
Department of Computing
Napier University
Edinburgh, EH14 1DJ, United Kingdom
[email protected]


There have been several attempts to come up with tools and techniques to support taking context into account in the design and evaluation of computer technologies. These approaches include task analysis [6], participatory design [3], and contextual design [7], among others. However, contextual factors are notoriously elusive and difficult to pin down [5], so there is still a need for conceptual tools to deal with context at a practical level. The existing approaches to context are for the most part “bottom up” ones. They start with an empirical analysis of contextual factors and gradually develop concepts such as “task decomposition” [6], “future workshops” [3], or “flow models” [7], which later can be put in an appropriate theoretical framework. From our point of view, this “bottom up” or empirically-driven strategy can be complemented with a “top down” one, that is, starting with an abstract theoretical representation of context and then situating this representation in the reality of design and evaluation. Borrowing Brown and Duguid’s well-known metaphor [5], we can say that if it is difficult to grapple with the “whale” of context by trying to get a firm grip on its specific parts, let’s try a large net instead. In this paper we present a tool that is directly shaped by a general theoretical approach—activity theory [10, 11, 18]. Activity theory provides a broad theoretical framework for describing the structure, development, and context of human activity. In the 1990s, activity theory has been applied to problems of human–computer interaction by an international community of scholars and practitioners [1–4, 8, 9, 12]. Activity theory is framed by several basic principles (explained in the next section): hierarchical structure of activity, object-orientedness, internalization and externalization, tool mediation, and development. These general principles help orient thought and research, but they are somewhat abstract when it comes to the actual business of working on a design or performing an evaluation. To make activity theory more useful, we have developed an artifact—the Activity Checklist—that makes concrete the conceptual system of activity theory for the specific tasks of design and evaluation. The Activity Checklist is intended to elucidate the most important contextual factors of human–computer interaction. It is a guide to the specific areas to which a researcher or practitioner should be paying attention when trying to understand the context in which a tool will be or is used. The Checklist lays out a kind of “contextual design space” by representing the key areas of context specified by activity theory. In the rest of this paper we discuss activity theory, present the Checklist, and show its use by giving an example of a specific technology. The Checklist is an adjunct to the basic principles of activity theory—not a tool to be used in isolation. An overview of activity theory with empirical applications can be found in [13].

Basic Principles of Activity Theory: An Overview

Activity theory is a general conceptual approach, rather than a highly predictive theory. The unit of analysis in activity theory is the activity, consisting of a subject (an individual or group), an object or motive, artifacts, and sociocultural rules. Leont’ev [10] made the point that we cannot pull these pieces apart without violating the very essence of human activity, just as we cannot pull apart sodium and chloride if we want to understand salt. Understanding human activity requires a commitment to a complex unit of analysis. Two basic ideas animate activity theory: (1) the human mind emerges, exists, and can only be understood within the context of human interaction with the world; and (2) this interaction, that is, activity, is socially and culturally determined. These ideas are elaborated in activity theory into a set of five principles as follows.

Object-Orientedness
The principle of object-orientedness states that every activity is directed toward something that objectively exists in the world, that is, an object. For example, a computer program is an object of a programmer’s activity.


Human activity can be oriented toward two types of objects: things and people [10]. The notion of an object is not limited in activity theory to the physical, chemical, and biological properties of entities. Socially and culturally determined properties are also objective properties that can be studied with objective methods. For example, the intended purposes and ways of using artifacts can be objectively studied.

Hierarchical Structure of Activity
According to Leont’ev [11], interaction between human beings and the world is organized into functionally subordinated hierarchical levels. Leont’ev differentiated among three levels: activities, actions, and operations. Activities are undertaken in order to fulfill motives. Motives can be considered top-level objectives that are not subordinated to any other objectives. Behind a motive “… there always stands a need or a desire, to which [the activity] always answers” [10]. People may or may not be consciously aware of their motives. Actions are goal-directed processes that must be carried out to fulfill a motive. For instance, a programmer may write a utility program needed to make his larger program work efficiently. The larger program itself might be an action with respect to a motive such as getting ahead at work. Actions are conscious; people are aware of their goals. Goals can be broken into lower level goals, which, in turn, can have lower level goals, much like the concept of goals and subgoals in artificial intelligence (AI) and other traditions. For example, writing a utility program might involve talking to another programmer about how she solved a similar problem, which might involve scheduling a time to talk, opening an electronic calendar, and so forth. Actions are similar to what are often referred to in the human–computer interaction literature as tasks [e.g., 15]. Moving down the hierarchy of actions we cross the border between conscious and automatic processes. Functional subunits of actions, which are carried out automatically, are operations. Operations do not have their own goals; rather they adjust actions to current situations. Actions transform into operations when they become routinized and unconscious with practice. When learning to drive a car, the shifting of the gears is an action with an explicit goal that must be consciously attended to. Later, shifting gears becomes operational and “can no longer be picked out as a special goal-directed process: its goal is not picked out and discerned by the driver” [10]. Conversely, an operation can become an action when “conditions impede an action’s execution through previously formed operations” [10]. For example, if one’s mail program ceases to work, one continues to send mail by substituting another mailer, but it is now necessary to pay conscious attention to using an unfamiliar set of commands. This dynamic movement up and down the hierarchy distinguishes the activity theory hierarchy from static models such as GOMS.

Internalization and Externalization
Activity theory differentiates between internal and external activities. The traditional notion of mental processes (such as in cognitive science) corresponds to internal activities. Activity theory emphasizes that internal activities cannot be understood if they are analyzed separately, in isolation from external activities, because it is the constant transformation between external and internal that is the very basis of human cognition and activity. Internalization is the transformation of external activities into internal ones. Activity theory emphasizes that internalization is not just a matter of mental representations being placed in someone’s head; the holistic activity, including motor activity and the use of artifacts, is crucial for internalization. For example, learning to calculate may involve counting on the fingers in the early stages of learning simple arithmetic.


Once the arithmetic is internalized, the calculations can be performed in the head without external aids. Internalization provides a means for people to try potential interactions with reality without performing actual manipulation with real objects (mental simulations, imaginings, considering alternative plans, and so forth). Therefore, internalization can help identify optimal actions before actually performing an action externally. In some cases, internalization can make an action more efficient because external components are omitted, as when calculations are performed in the head. Externalization transforms internal activities into external ones. Externalization is often necessary when an internalized action needs to be repaired, or scaled, such as when a calculation is not coming out right when done mentally or is too large to perform without pencil and paper or calculator (or some external artifact).

The Checklist in the Field
Catriona Macaulay

As a tool for thinking, the Checklist lends itself to many situations and uses. In this section I illustrate one such situation—the domain investigation—with a personal account of my experiences using the Checklist. For some time now, the need to “contextualize” the design of computer systems has been recognized [1]. Context is of course a notoriously slippery term, and contextualizing design can mean anything from simply taking into account the physical environment in which a system is to be used to developing richly detailed accounts of how people do the things we design new artifacts to support. Ethnographic techniques (see [3] for an introduction) have become firmly established as one way of gathering contextual information. The uses of ethnography within design settings have been described as a continuum, ranging from requirements gathering tied to a particular development project, to opening up a broad domain such as information gathering in order to contribute to our currently limited understanding about fundamental tasks [4].

My field site was a UK national daily newspaper. I had gone there to explore what ‘information gathering’ meant in the context of journalism. And I was doing this for a very explicit purpose, that of informing the design of future technologies to support such activities. Like many ethnographers, having made the decision to go into the field I was unsure about what to do when I got there. To complicate matters, I came from a background in computing and human-computer interaction studies and therefore was taking a particular information technology-biased set of preconceptions and inclinations into the field with me. These issues, my natural inclinations towards theory, and my inexperience as a fieldworker all led me to look for some kind of theoretical scaffolding. Activity Theory (AT) seemed a good choice. AT, I reasoned, had been investigated within HCI and CSCW circles for some time [2]. It seemed to offer hope for bridging the field-design gap by providing a set of concepts relevant to both AT researchers and designers. And activity theory provided a particularly rich set of insights into the relationship between artifacts and practice.

The adoption of theoretical frameworks is, of course, not without its dangers. Prior to conducting my main study, I undertook a short pilot study at a small community organization. Very quickly during this period I felt overwhelmed by my attempts to orient my field experiences around activity theory issues, and I eventually abandoned the attempt. It was at this point that I fortuitously discovered the Checklist. Now I had something tangible I could use. It gave me a quick way of relating experiences in the field to AT concepts. It helped me think about the kinds of data I wanted to gather, and the kinds of questions I wanted to ask. As time went on, field driven concerns came to dominate my efforts and the Checklist took more of a back seat. When I was out of the field and reviewing my notes and transcripts, the Checklist once again gave me an additional viewpoint on it all.

But how did I actually use it? Well, one of the Checklist’s benefits is its informality. Key concepts are illustrated with sample questions which suggest avenues for thought and exploration rather than formal directions. The Checklist orients without prescribing. I reduced the main section of the Checklist to A5 and kept a copy in my fieldnotes books as an aide-mémoire. This proved particularly handy for the nervous neophyte fieldworker I was. It gave me something to look at and think about in those awful moments sitting around in the field feeling completely lost! I also had a copy stuck on my office wall which I could refer to when I was preparing for interviews or observation sessions.


Externalization is also important when collaboration between several people requires their activities to be performed externally in order to be coordinated.

Mediation
Activity theory’s emphasis on social factors and on the interaction between people and their environments explains why the principle of tool mediation plays a central role. First, tools shape the way human beings interact with reality. Shaping external activities results in shaping internal ones. Second, tools usually reflect the experience of other people who tried to solve similar problems before and invented or modified the tool to make it more efficient and useful. This experience is accumulated in (1) the structural properties of tools (shape, size, material) and (2) the knowledge of how the tool should be used.

During data analysis, the Checklist provided a resource for deriving additional codewords for my data analysis work. Just having a copy visible while I worked, one I could occasionally look up at, was helpful. The sample questions were particularly useful. For example, one day I caught sight of one of the sample questions under the Environment column: Are concepts and vocabulary of the system consistent with the concepts and vocabulary of the domain? I suddenly realized that whilst in computing people talk about “information” all the time, in journalism people talk about “sources.” Computing people design systems primarily to help people find information, but “information” is often treated by designers simplistically. Journalists, on the other hand, are more interested in the sources of information than in the information itself. The challenge is finding a source for information about something within an extremely limited timeframe. Subjective judgements about the relevance of a piece of information, then, are made largely in relation to judgements about the source. This led me to the realization that sources can be seen as a very particular kind of artifact within journalistic information gathering, and that they have largely been overlooked by designers of information gathering systems. During the early stages of my study, the sample questions helped me understand the specific issues the Checklist deals with. Later, as my understanding grew, I turned more to the issues in the Checklist rather than the sample questions. Later still, I found myself developing my own sample questions, questions I now carry with me into my next study.

Activity theory and the Checklist also proved a useful counter to my natural inclination to cling to the familiar—to obvious technological artifacts. Entering the world of ethnographic fieldwork from a computing background, one can easily become over-focussed on high-tech devices, or on “information” in a simplistic sense. During my first forays into the field, I was so focussed on what I thought the obvious constituents of information gathering activity would be, that I completely failed to recognize the importance of sources. It was this kind of benefit from the Checklist that I most valued. The Checklist was a tool for reflexivity, helping me in my attempts to maintain an awareness of where my own instinctive concerns and interests were closing me off from those of the people I was studying.

In summary then, the Checklist became a valuable aide-mémoire and a tool for reflexivity. Although the Checklist as presented here does not explicitly draw attention to its reflexive uses, this was clearly something of particular benefit to more broadly scoped fieldwork such as mine. For the theoretically-oriented fieldworker, the Checklist provides a flexible and non-prescriptive way of maintaining an awareness of aspects of AT potentially relevant to design concerns. Of course it does not do away with the need to engage with the ideas behind activity theory more broadly, but it certainly helps kick-start the process.

References
1. Clarke, S. (1997). Encouraging the Effective Use of Contextual Information in Design. Unpublished PhD, University of Glasgow, Glasgow, Scotland.
2. Draper, S. (1992). Activity Theory: The New Direction for HCI? International Journal for Man-Machine Studies, 37.
3. Hammersley, M., & Atkinson, P. (1995). Ethnography: Principles in Practice (2nd ed.). London: Routledge.
4. Whittaker, S., Terveen, L., and Nardi, B. Let’s stop pushing the envelope and start addressing it. Submitted to TOCHI.


Point 2 is critical for activity theory. Many theories discuss Point 1 (such as the idea of affordances, Latourian notions of tool prescriptions, and so forth). Activity theory emphasizes that a tool comes fully into being when it is used and that knowing how to use it is a crucial part of the tool. So, the use of tools is an evolutionary accumulation and transmission of social knowledge, which influences the nature of not only external behavior but also the mental functioning of individuals. The concept of tool in activity theory is broad and embraces both technical tools, which are intended to manipulate physical objects (e.g., a hammer), and psychological tools, which are used by human beings to influence other people or themselves (e.g., the multiplication table or a calendar).

Development
Finally, activity theory requires that human interaction with reality be analyzed in the context of development. Activity theory sees all practice as being reformed and shaped by historical development. It is important to understand how tools are used not in a single instant of trying them out in a laboratory (for example) but as usage unfolds over time. In that time, development may occur making the tool more useful and efficient than might be seen in a single observation. In activity theory, development is thus not only an object of study, it is also a general research methodology. That is why a basic research method in activity theory is the formative experiment, which combines active participation with monitoring of the developmental changes in the object of study.

Integration of the Principles

These basic principles of activity theory should be considered an integrated system, because they are associated with various aspects of the whole activity. A systematic application of any of these principles makes it eventually necessary to involve all the others. For instance, understanding the hierarchical structure of an activity requires an analysis of its object or motive, as well as developmental transformations between actions and operations and between internal and external components. The latter, in turn, can critically depend on the tools used in the activity.

Activity Checklist

As mentioned earlier, activity theory does not provide ready-made solutions that can be directly applied to specific problems. We see its main potential in supporting researchers and designers in their own search for solutions, in particular, by helping them to ask meaningful questions. To make such an application of activity theory more practical, we introduce an analytical tool, the Activity Checklist. The Activity Checklist is intended to be used at early phases of system design or for evaluating existing systems. Accordingly, there are two slightly different versions of the Checklist, the “evaluation version” and the “design version.” Both versions are used as organized sets of items covering the contextual factors that can potentially influence the use of a computer technology in real-life settings. It is assumed that the Checklist can help to identify the most important issues, for instance, potential trouble spots, that designers can address. Having two versions of the Checklist implies a commitment to the study of actual use as a critical part of design. Researchers such as Bannon [1] have made the useful point that design and use are two sides of the same coin. Still, a design must begin somewhere, and it is helpful to have guidance in the earliest stages of brainstorming and creative imagining of how a technology might come into being. The Checklist covers a large space. It is intended to be used first by examining the whole space for areas of interest, then focusing on the identified areas of interest in as much depth as possible. The general strategy, then,

is breadth-first consideration of the relevant areas of context enumerated in the Checklist, followed by a “drilling down” into specific areas that should yield rich results given the tools and problems at hand. The structure of the Checklist reflects the five basic principles of activity theory. Since the Checklist is intended to be applied in analyzing how people use (or will use) a computer technology, the principle of tool mediation is strongly emphasized. This principle has been applied throughout the Checklist and systematically combined with the other four principles. It results in four sections corresponding to four main perspectives on the use of the “target technology” to be evaluated or designed:
1. Means and ends—the extent to which the technology facilitates and constrains the attainment of users’ goals and the impact of the technology on provoking or resolving conflicts between different goals.
2. Social and physical aspects of the environment—integration of target technology with requirements, tools, resources, and social rules of the environment.
3. Learning, cognition, and articulation—internal versus external components of activity and support of their mutual transformations with target technology.
4. Development—developmental transformation of the foregoing components as a whole.
Taken together, these sections cover various aspects of how the target technology supports, or is intended to support, human actions (“target actions”). See the Checklist in the Appendix to this paper.
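To make the four sections concrete before turning to their use, here is a minimal sketch, not part of the original Checklist, of how its sections and items could be held as plain data while taking notes during an evaluation. The class names, fields, and the example note are illustrative assumptions only; the section names and abbreviated items follow the paper.

from dataclasses import dataclass, field

@dataclass
class Finding:
    item: str                 # Checklist item the note refers to
    note: str                 # observation or open question from the evaluation
    follow_up: bool = False   # flagged for the later "drilling down" pass

@dataclass
class Section:
    name: str
    items: list
    findings: list = field(default_factory=list)

    def record(self, item, note, follow_up=False):
        self.findings.append(Finding(item, note, follow_up))

# The four sections named in the paper, each with a few abbreviated items.
checklist = [
    Section("Means/ends", [
        "Goals and subgoals of the target actions",
        "Potential conflicts between target goals",
    ]),
    Section("Environment", [
        "Integration of target technology with other tools",
        "Division of labor",
    ]),
    Section("Learning/cognition/articulation", [
        "Components of target actions that are to be internalized",
        "Support of problem articulation in case of breakdowns",
    ]),
    Section("Development", [
        "Users' attitudes toward target technology over time",
        "Anticipated changes in the environment",
    ]),
]

# Breadth-first pass: brief notes across all sections, flagging trouble spots.
checklist[0].record("Potential conflicts between target goals",
                    "Fast data entry seems to conflict with record accuracy",
                    follow_up=True)

# Drill-down pass: revisit only the flagged findings.
for section in checklist:
    for finding in section.findings:
        if finding.follow_up:
            print(section.name, "->", finding.item, ":", finding.note)

This simply mirrors the breadth-first-then-drill-down strategy described above: record quick notes against items across all four sections, then return only to the flagged ones.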

Using the Checklist

According to our experience of using the Checklist and teaching other people how to use it, there are several points to remember when trying to apply the tool in a specific project. First, the Checklist is supposed to be used not as the only basis for system design or evaluation, but in combination with other techniques. One of the main advantages of using the Checklist seems to be more effective application of a number of already established methods and techniques. For instance, the Checklist can help identify the most relevant issues to be covered in an interview or to make sure important problems are not overlooked in a discussion of empirical data collected in an observational study. Second, the linear structure of the Checklist does not imply that it should be used linearly, by focusing on isolated items one by one and ignoring the rest of the Checklist. Instead, practitioners using the tool should look for patterns of related items, even if these items belong to different sections. Third, in order to use the tool effectively, practitioners should familiarize themselves with the Checklist and even try to internalize it. We recommend that practitioners follow the items in the Checklist repeatedly at various phases of design or evaluation. A quick initial run should identify the most important potential trouble spots and filter out the rest. Further runs may result in finding patterns, revising previously made judgments about the importance or unimportance of certain issues, and formulating requests for more information, if necessary. Fourth, it should be noted that every tool is used for some purpose, and the Checklist is no exception. Therefore, potential users of the Checklist should clearly understand why they are using the tool. Such understanding can help focus on relevant items and ignore irrelevant ones. Also, such understanding is necessary for successful incorporation of conclusions, judgments, and ideas related to individual items into more general notions relevant to design or evaluation of the system as a whole.

Apple Data Detectors: An Example of Using the Checklist

The design of Apple Data Detectors, a multipurpose intelligent agent technology for analyzing and taking action on structured data [14], is an example of using the Checklist to go beyond the narrow scope of many design projects. Apple Data Detectors recognizes structured data such as URLs, e-mail addresses, postal addresses, ISBN numbers, and stock symbols. Using “structure detection” technology [17], a detector is written to analyze structured data. The detector is then paired with an action such as “Open URL” or “Create e-mail message.” Apple Data Detectors works in any text; applications do not need to be modified to use it. In the design of Apple Data Detectors, we were concerned with the Learning/development areas of the Checklist. We devoted many resources to considering how end users would go from simple use of the tool, involving only accessing structures supplied by Apple or third-party developers, to programming their own new agents. Because we considered the principle of development from the beginning of our research, we were able to create an architecture that supports end-user programming [12]. We paid special attention to the first four areas in the Development column. To reiterate the point made earlier, the Checklist can be used to scope out a large possible space of potential areas of interest and then narrow down to specific areas to actively pursue. The Checklist is useful in reminding developers of a larger space that gets beyond the details of user interface mechanisms and leads systematically into many areas of the context of use that may provide inspiration for interesting designs. A research project at another lab used structure detection technology much as we did [16], but the prototype looked quite different because the emphasis was not on the users’ wider context, as it was in Apple Data Detectors. Apple Data Detectors allows for end-user development of structures, scripting of actions, mixing and matching of recognizers and actions, and composite structures (see Table 1). It provides flexibility and a growth path for users. Most designers will have to be concerned with the Means/ends column of the Checklist. In Apple Data Detectors we studied potential uses of Data Detectors and found that for the technology to be useful, users need composite structures such as postal addresses. It is considerably more difficult to write a parser that handles composite structures (e.g., an address is composed of a name, street, city, etc., each of which is an atomic structure). But our users can select an address (with the mouse), and Apple Data Detectors will recognize it and take a prespecified action such as adding the address to the user’s address book, putting each field of the address in the appropriate place in the address book. We also gave careful thought at the outset of the project to our criteria for success and failure (the Design section of the Means/ends column). Our criteria were that the technology be useful for Apple customers and that developers be able to use it painlessly. (Third-party developers are developing the structures and actions that work with their applications.) We thus decided not to use OpenDoc, even though there was some pressure to do so, because the developers’ experience would have been much more difficult. As it turned out, this was the right decision more than we knew at the time, because OpenDoc was eventually put on the corporate back burner.
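To illustrate the recognizer/action pairing and the idea of composite structures built from atomic ones, here is a rough sketch in Python. It is not Apple's implementation, and the patterns, action names, and sample text are assumptions made for the example.

import re
import webbrowser

# Atomic recognizers: structure name -> regular expression (deliberately simplified).
RECOGNIZERS = {
    "url": re.compile(r"https?://\S+"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

# Actions are kept separate from recognizers so they can be mixed and matched.
ACTIONS = {
    "url": [("Open URL", lambda m: webbrowser.open(m))],
    "email": [("Create e-mail message", lambda m: print("compose to " + m))],
}

def detect(text):
    """Scan arbitrary text and return (structure name, matched string) pairs."""
    hits = []
    for name, pattern in RECOGNIZERS.items():
        hits.extend((name, match) for match in pattern.findall(text))
    return hits

# A composite structure (a toy postal address) is parsed into named atomic fields,
# so an action can put each field in the right slot of, say, an address book.
ADDRESS = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<street>[^,]+), (?P<city>\w+)"
)

def detect_address(text):
    match = ADDRESS.search(text)
    return match.groupdict() if match else None

if __name__ == "__main__":
    sample = "Contact Ada Lovelace, 12 Example Road, Oslo or mail [email protected]"
    print(detect(sample))          # [('email', '[email protected]')]
    print(detect_address(sample))  # {'name': 'Ada Lovelace', 'street': '12 Example Road', 'city': 'Oslo'}

The point of keeping recognizers and actions separate, as in Apple Data Detectors, is that either side can be extended or rebound by end users without touching the other.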

Table 1. A comparison of two uses of structure detection technology

Intel Selection Recognition Agent          Apple Data Detectors
No scripting (C behind an API)             AppleScript
No path to end-user modification           Script editor; end-user modification
Recognizer/action pairs bound together     Separate recognizers, actions
No composite structures                    Composite structures


It was the explicit attention to a firm set of design criteria that helped us weather that storm.

Conclusion

As mentioned earlier, the Activity Checklist is not the only attempt to deal with context in the field of HCI, and it is not intended as a substitute for other approaches. From our point of view, the Checklist can be most successfully used together with other tools and techniques to efficiently address issues of context. For instance, task analysis [6] places a heavy emphasis on the Means/ends dimension of context, whereas environment and, especially, learning and development are underrepresented. Using the terminology of activity theory, we could say that task analysis gives a thorough description of individual actions, whereas the higher levels of activity and interrelations between actions receive less attention. Contextual design [7], conversely, provides an elaborated set of concepts and techniques for describing the environment (as in the “Environment” section of the Checklist). It also supports identifying users’ tasks (although, in our opinion, not to the extent to which it addresses the structure and functioning of the environment) but is less focused on learning and development. Finally, Development is a major concern within participatory design approaches [e.g., 3], but identifying task structures does not have a high priority there. Therefore, each of the existing empirically driven approaches to context has its strong points. From our point of view, the main advantage of the Activity Checklist is that it is a general framework that can be used to (1) provide a preliminary overview of potentially relevant contextual factors, (2) select appropriate tools for further exploration, and (3) evaluate limitations of those tools. In other words, the Checklist can help to leverage the various strengths of empirically based approaches. The fact that the Checklist is comprehensive and wide-ranging should not mislead its potential users. It would be impossible to investigate all the areas it covers without a multiyear study, but that is not how it is intended to be employed. For most uses of the Checklist, users should first do a “quick-and-dirty” perusal of the areas represented in the Checklist that are likely to be troublesome or interesting (or both) in a specific design or evaluation. Then, once those areas have been identified, they can be explored more deeply. The breadth of coverage in the Checklist will help to ensure that designers do not miss areas that might be important for understanding the tool they are working on.

Acknowledgments

Many thanks to Helen Hasan, Mark Spasser, Clay Spinuzzi, and John Waterworth for their helpful comments on an earlier version of the paper.

References
1. Bannon, L. From human factors to human actors: The role of psychology and human–computer interaction studies in system design. In Design at Work: Cooperative Design of Computer Systems, J. Greenbaum and M. Kyng, eds., Lawrence Erlbaum, Hillsdale, NJ, 1991.
2. Bødker, S. Through the Interface: A Human Activity Approach to User Interface Design. Lawrence Erlbaum, Hillsdale, NJ, 1991.
3. Bødker, S., Knudsen, J., Kyng, M., Ehn, P., and Madsen, K.H. Computer support for cooperative design. In Proceedings of ACM CSCW’88 Conference on Computer-Supported Cooperative Work (Portland, OR, 1988).
4. Bourke, I., Verenikina, I., and Gould, E. Interacting with proprietary software users: An application for activity theory? In Proceedings of the East-West International Conference on Human-Computer Interaction (Moscow, August 3–7, 1993). ICSTI, Moscow.
5. Brown, J., and Duguid, P. Borderline issues: Social and material aspects of design. Human-Computer Interaction 9, 1 (1994).
6. Dix, A., Finlay, J., Abowd, G., and Beale, R. Human-Computer Interaction. Prentice Hall, London, 1993.
7. Holtzblatt, K., and Beyer, H. Making customer-centered design work for teams. Communications of the ACM 36, 10 (1993).
8. Kaptelinin, V. Human–computer interaction in context: The activity theory perspective. In Proceedings of the East-West Human–Computer Interaction Conference (St. Petersburg, Russia, August 4–8). ICSTI, Moscow, 1992.
9. Kuutti, K. Activity theory and its applications in information systems research and design. In Information Systems Research Arena of the 90’s, H.-E. Nissen, H.K. Klein, and R. Hirschheim, eds., Elsevier, Amsterdam, 1991.
10. Leont’ev, A.N. Activity, Consciousness, Personality. Prentice Hall, Englewood Cliffs, NJ, 1978.
11. Leont’ev, A.N. Problems of the Development of Mind. Progress, Moscow, 1981.
12. Lieberman, H., Nardi, B., and Wright, D. Training agents to recognize text by example. In Proceedings of the International Conference on Autonomous Agents, Seattle, April 1999.
13. Nardi, B., ed. Context and Consciousness: Activity Theory and Human–Computer Interaction. MIT Press, Cambridge, MA, 1996.
14. Nardi, B., Miller, J., and Wright, D. Collaborative, programmable intelligent agents. Communications of the ACM, March 1998.
15. Norman, D. Cognitive artifacts. In Designing Interaction: Psychology at the Human-Computer Interface, J. Carroll, ed., Cambridge University Press, Cambridge, 1991.
16. Pandit, M., and Kalbag, S. The Selection Recognition Agent: Instant access to relevant information and operations. In Proceedings of Intelligent User Interfaces ’97 (Orlando, FL, 1997). ACM Press, New York.
17. Rus, D., and Subramanian, D. Multi-media RISSC informatics. In Proceedings of the 2nd International Conference on Information and Knowledge Management (Washington, DC, 1993). ACM Press, New York, pp. 283–294.
18. Wertsch, J., ed. The Concept of Activity in Soviet Psychology. M. E. Sharpe, Armonk, NY, 1981.

© ACM 1072-5220/99/0700 $5.00

Appendix. Activity Checklist

PREAMBLE

Means/ends (hierarchical structure of activity): Human beings have hierarchies of goals that emerge from attempts to meet their needs under current circumstances. Understanding the use of any technology should start with identifying the goals of target actions, which are relatively explicit, and then extending the scope of analysis both “up” (to higher-level actions and activities) and “down” (to lower level actions and operations).

Environment (object-orientedness): Human beings live in the social, cultural world. They achieve their motives and goals by active transformation of objects in their environments. This section of the checklist identifies the objects involved in target activities, which constitute the environment of the use of target technology.

Learning/cognition/articulation (externalization/internalization): Activities include both internal (mental) and external components which can transform into each other. Computer systems should support both internalization of new ways of action and articulation of mental processes, when necessary, to facilitate problem solving and social coordination.

Development (development): Activities undergo permanent developmental transformations. Analysis of the history of target activities can help to reveal the main factors influencing the development. Analysis of potential changes in the environment can help to anticipate their effect on the structure of target activities.


EVALUATION VERSION

Means/ends
People who use the target technology
Goals and subgoals of the target actions (target goals)
Criteria for success or failure of achieving target goals
Decomposition of target goals into subgoals
Setting of target goals and subgoals
Potential conflicts between target goals
Potential conflicts between target goals and goals associated with other technologies and activities
Resolution of conflicts between various goals
Integration of individual target actions and other actions into higher-level actions
Constraints imposed by higher-level goals on the choice and use of target technology
Alternative ways to attain target goals through lower-level goals
Troubleshooting strategies and techniques
Support of mutual transformations between actions and operations

Environment
Role of target technology in producing the outcomes of target actions
Tools, other than target technology, available to users
Integration of target technology with other tools
Access to tools and materials necessary to perform target actions
Tools and materials shared between several users
Spatial layout and temporal organization of the working environment
Division of labor, including synchronous and asynchronous distribution of work between different locations
Rules, norms, and procedures regulating social interactions and coordination related to the use of target technology
Individual contributions to shared resources of group or organization

Learning/cognition/articulation
Components of target actions that are to be internalized
Knowledge about target technology that resides in the environment and the way this knowledge is distributed and accessed
Time and effort necessary to master new operations
Self-monitoring and reflection through externalization
Use of target technology for simulating target actions before their actual implementation
Support of problem articulation and help request in case of breakdowns
Strategies and procedures of providing help to other users of target technology
Coordination of individual and group activities through externalization
Use of shared representation to support collaborative work

Development
Use of target technology at various stages of target action “life cycles”—from goal setting to outcomes
Effect of implementation of target technology on the structure of target actions
New higher-level goals that became attainable after the technology had been implemented
Users’ attitudes toward target technology (e.g., resistance) and changes over time
Dynamics of potential conflicts between target actions and higher-level goals
Anticipated changes in the environment and the level of activity they directly influence (operations, actions, or activities)


DESIGN VERSION

USE

Means/ends
People who use the target technology
Goals and subgoals of the target actions (target goals)
Criteria for success or failure of achieving target goals
Decomposition of target goals into subgoals
Setting of target goals and subgoals
Potential conflicts between target goals
Potential conflicts between target goals and goals associated with other technologies and activities
Resolution of conflicts between various goals
Integration of individual target actions and other actions into higher-level actions
Constraints imposed by higher-level goals on the choice and use of target technology
Alternative ways to attain target goals through lower-level goals
Troubleshooting strategies and techniques
Support of mutual transformations between actions and operations
Goals that can be changed or modified, and goals that have to remain after new technology is implemented

Environment
Role of existing technology in producing the outcomes of target actions
Tools available to users
Integration of target technology with other tools
Access to tools and materials necessary to perform target actions
Tools and materials shared between several users
Spatial layout and temporal organization of the working environment
Division of labor, including synchronous and asynchronous distribution of work between different locations
Rules, norms, and procedures regulating social interactions and coordination related to target actions

Learning/cognition/articulation
Components of target actions that are to be internalized
Time and effort necessary to learn how to use existing technology
Self-monitoring and reflection through externalization
Possibilities for simulating target actions before their actual implementation
Support of problem articulation and help request in case of breakdowns
Strategies and procedures of providing help to colleagues and collaborators
Coordination of individual and group activities through externalization
Use of shared representation to support collaborative work

Development
Use of tools at various stages of target action “life cycles”—from goal setting to outcomes
Transformation of existing activities into future activities supported with the system
History of implementation of new technologies to support target actions
Anticipated changes in the environment and the level of activity they directly influence (operations, actions, or activities)
Anticipated changes of target actions after new technology is implemented


DESIGN VERSION

DESIGN

Means/ends
Parties involved in the process of design
Goals of designing a new system
Criteria of success or failure of design
Potential conflicts between goals of design and other goals (e.g., stability of the organization, minimizing expenses)

Environment
Resources available to the parties involved in design of the system
Rules, norms, and procedures regulating interaction between the parties

Learning/cognition/articulation
Representations of design that support coordination between the parties
Mutual learning of the content of the work (designers) and possibilities and limitations of technology (users)

Development
Anticipated changes in the requirements to the system

SAMPLE QUESTIONS

Means/ends
Are all target actions actually supported?
Is there any functionality of the system that is not actually used? If yes, which actions were intended to be supported with this functionality? How do users perform these actions?
Are there actions, other than target actions, that are not supported, but users obviously need such support?
Are there conflicts between different goals of the user? If yes, what are the current trade-offs and rules or procedures for resolving the conflicts?
Is it necessary for the user to constantly switch between different actions or activities? If yes, are there “emergency exits” which support painless transition between actions and activities, and, if necessary, returning to previous states, actions, or activities?

Environment
Are concepts and vocabulary of the system consistent with the concepts and vocabulary of the domain?
Is target technology considered an important part of work activities?
Are computer resources necessary to produce a certain outcome integrated with each other?
Is target technology integrated with other tools and materials?
Are characteristics of target technology consistent with the nature of the environment (e.g., central office work vs. teleworking)?

Learning/cognition/articulation
Is the whole “action lifecycle,” from goal setting to the final outcome, taken into account and/or supported?
Does the system help to avoid unnecessary learning?
Is externally distributed knowledge easily accessible when necessary?
Does the system provide problem representations in case of breakdowns that can be used to find a solution or formulate a request for help?
Are there external representations of the user’s activities that can be used by others as clues for coordinating their activities within the framework of group or organization?
Does the system provide representations of user’s activities that can help in goal setting and self-evaluation?

Development
What are the consequences of implementing the target technology on target actions? Did expected benefits actually take place?
What are the basic limitations of the current technology?
Did users have enough experience with the system at the time of evaluation?
Did the system require a large investment of time and effort in learning how to use it?
Did the system show increasing or decreasing benefits over the process of its use?
Are users’ attitudes toward the system becoming more or less positive?
Are there negative or positive side-effects associated with the use of the system?

Chapter 1

Why Interview?

I interview because I am interested in other people’s stories. Most simply put, stories are a way of knowing. The root of the word story is the Greek word histor, which means one who is “wise” and “learned” (Watkins, 1985, p. 74). Telling stories is essentially a meaning-making process. When people tell stories, they select details of their experience from their stream of consciousness. Every whole story, Aristotle tells us, has a beginning, a middle, and an end (Butcher, 1902). In order to give the details of their experience a beginning, middle, and end, people must reflect on their experience. It is this process of selecting constitutive details of experience, reflecting on them, giving them order, and thereby making sense of them that makes telling stories a meaning-making experience. (See Schutz, 1967, p. 12 and p. 50, for aspects of the relationship between reflection and meaning making.) Every word that people use in telling their stories is a microcosm of their consciousness (Vygotsky, 1987, pp. 236–237). Individuals’ consciousness gives access to the most complicated social and educational issues, because social and educational issues are abstractions based on the concrete experience of people. W. E. B. Du Bois knew this when he wrote, “I seem to see a way of elucidating the inner meaning of life and significance of that race problem by explaining it in terms of the one human life that I know best” (Wideman, 1990, p. xiv). Although anthropologists have long been interested in people’s stories as a way of understanding their culture, such an approach to research in education has not been widely accepted. For many years those who were trying to make education a respected academic discipline in universities argued that education could be a science (Bailyn, 1963). They urged their colleagues in education to adapt research models patterned after those in the natural and physical sciences. In the 1970s a reaction to the dominance of experimental, quantitative, and behaviorist research in education began to develop (Gage, 1989). The critique had its own energy and was also a reflection of the era’s more general resistance to received authority (Gitlin, 1987, esp. chap. 4).

Researchers in education split into two, almost warring, camps: quantitative and qualitative. It is interesting to note that the debate between the two camps got especially fierce and the polemics more extreme when the economics of higher education took a downturn in the mid-1970s and early 1980s (Gage, 1989). But the political battles were informed by real epistemological differences. The underlying assumptions about the nature of reality, the relationship of the knower and the known, the possibility of objectivity, the possibility of generalization, inherent in each approach are different and to a considerable degree contradictory. To begin to understand these basic differences in assumptions, I urge you to read James (1947), Lincoln and Guba (1985, chap. 1), Mannheim (1975), and Polanyi (1958). For those interested in interviewing as a method of research, perhaps the most telling argument between the two camps centers on the significance of language to inquiry with human beings. Bertaux (1981) has argued that those who urge educational researchers to imitate the natural sciences seem to ignore one basic difference between the subjects of inquiry in the natural sciences and those in the social sciences: The subjects of inquiry in the social sciences can talk and think. Unlike a planet, or a chemical, or a lever, “If given a chance to talk freely, people appear to know a lot about what is going on” (p. 39). At the very heart of what it means to be human is the ability of people to symbolize their experience through language. To understand human behavior means to understand the use of language (Heron, 1981). Heron points out that the original and archetypal paradigm of human inquiry is two persons talking and asking questions of each other. He says:

The use of language, itself, . . . contains within it the paradigm of cooperative inquiry; and since language is the primary tool whose use enables human construing and intending to occur, it is difficult to see how there can be any more fundamental mode of inquiry for human beings into the human condition. (p. 26)

Interviewing, then, is a basic mode of inquiry. Recounting narratives of experience has been the major way throughout recorded history that humans have made sense of their experience. To those who would ask, however, “Is telling stories science?” Peter Reason (1981) would respond, The best stories are those which stir people’s minds, hearts, and souls and by so doing give them new insights into themselves, their problems and their human condition. The challenge is to develop a human science
that can more fully serve this aim. The question, then, is not “Is story telling science?” but “Can science learn to tell good stories?” (p. 50)

THE PURPOSE OF INTERVIEWING The purpose of in-depth interviewing is not to get answers to questions, nor to test hypotheses, and not to “evaluate” as the term is normally used. (See Patton, 1989, for an exception.) At the root of in-depth interviewing is an interest in understanding the lived experience of other people and the meaning they make of that experience. (For a deeply thoughtful elaboration of a phenomenological approach to research, see Van Manen, 1990, from whom the notion of exploring “lived” experience mentioned throughout this text is taken.) Being interested in others is the key to some of the basic assumptions underlying interviewing technique. It requires that we interviewers keep our egos in check. It requires that we realize we are not the center of the world. It demands that our actions as interviewers indicate that others’ stories are important. At the heart of interviewing research is an interest in other individuals’ stories because they are of worth. That is why people whom we interview are hard to code with numbers, and why finding pseudonyms for participants1 is a complex and sensitive task. (See Kvale, 1996, pp. 259–260, for a discussion of the dangers of the careless use of pseudonyms.) Their stories defy the anonymity of a number and almost that of a pseudonym. To hold the conviction that we know enough already and don’t need to know others’ stories is not only anti-intellectual; it also leaves us, at one extreme, prone to violence to others (Todorov, 1984). Schutz (1967, chap. 3) offers us guidance. First of all, he says that it is never possible to understand another perfectly, because to do so would mean that we had entered into the other’s stream of consciousness and experienced what he or she had. If we could do that, we would be that other person. Recognizing the limits on our understanding of others, we can still strive to comprehend them by understanding their actions. Schutz gives the example of walking in the woods and seeing a man chopping wood. The observer can watch this behavior and have an “observational understanding” of the woodchopper. But what the observer understands as a result of this observation may not be at all consistent with how the woodchopper views his own behavior. (In analogous terms, think of the prob-
lem of observing students or teachers.) To understand the woodchopper’s behavior, the observer would have to gain access to the woodchopper’s “subjective understanding,” that is, know what meaning he himself made out of his chopping wood. The way to meaning, Schutz says, is to be able to put behavior in context. Was the woodchopper chopping wood to supply a logger, heat his home, or get in shape? (For Schutz’s complete and detailed explication of this argument, see esp. chaps. 1–3. For a thoughtful secondary source on research methodology based on phenomenology, for which Schutz is one primary resource, see Moustakas, 1994.) Interviewing provides access to the context of people’s behavior and thereby provides a way for researchers to understand the meaning of that behavior. A basic assumption in in-depth interviewing research is that the meaning people make of their experience affects the way they carry out that experience (Blumer, 1969, p. 2). To observe a teacher, student, principal, or counselor provides access to their behavior. Interviewing allows us to put behavior in context and provides access to understanding their action. The best article I have read on the importance of context for meaning is Elliot Mishler’s (1979) “Meaning in Context: Is There Any Other Kind?” the theme of which was later expanded into his book, Research Interviewing: Context and Narrative (1986). Ian Dey (1993) also stresses the significance of context in the interpretation of data in his useful book on qualitative data analysis. INTERVIEWING: “THE” METHOD OR “A” METHOD? The primary way a researcher can investigate an educational organization, institution, or process is through the experience of the individual people, the “others” who make up the organization or carry out the process. Social abstractions like “education” are best understood through the experiences of the individuals whose work and lives are the stuff upon which the abstractions are built (Ferrarotti, 1981). So much research is done on schooling in the United States; yet so little of it is based on studies involving the perspective of the students, teachers, administrators, counselors, special subject teachers, nurses, psychologists, cafeteria workers, secretaries, school crossing guards, bus drivers, parents, and school committee members, whose individual and collective experience constitutes schooling. A researcher can approach the experience of people in contemporary organizations through examining personal and institutional docu-
ments, through observation, through exploring history, through experimentation, through questionnaires and surveys, and through a review of existing literature. If the researcher’s goal, however, is to understand the meaning people involved in education make of their experience, then interviewing provides a necessary, if not always completely sufficient, avenue of inquiry. An educational researcher might suggest that the other avenues of inquiry listed above offer access to people’s experience and the meaning they make of it as effectively as and at less cost than does interviewing. I would not argue that there is one right way, or that one way is better than another. Howard Becker, Blanche Geer, and Martin Trow carried on an argument in 1957 that still gains attention in the literature because, among other reasons, Becker and Geer seemed to be arguing that participant observation was the single and best way to gather data about people in society. Trow took exception and argued back that for some purposes interviewing is far superior (Becker & Geer, 1957; Trow, 1957). The adequacy of a research method depends on the purpose of the research and the questions being asked (Locke, 1989). If a researcher is asking a question such as, “How do people behave in this classroom?” then participant observation might be the best method of inquiry. If the researcher is asking, “How does the placement of students in a level of the tracking system correlate with social class and race?” then a survey may be the best approach. If the researcher is wondering whether a new curriculum affects students’ achievements on standardized tests, then a quasi-experimental, controlled study might be most effective. Research interests don’t always or often come out so neatly. In many cases, research interests have many levels, and as a result multiple methods may be appropriate. If the researcher is interested, however, in what it is like for students to be in the classroom, what their experience is, and what meaning they make out of that experience—if the interest is in what Schutz (1967) calls their “subjective understanding”—then it seems to me that interviewing, in most cases, may be the best avenue of inquiry. I say “in most cases,” because below a certain age, interviewing children may not work. I would not rule out the possibility, however, of sitting down with even very young children to ask them about their experience. Carlisle (1988) interviewed first-grade students about their responses to literature. She found that although she had to shorten the length of time that she interviewed students, she was successful at exploring with first graders their experience with books.
WHY NOT INTERVIEW? Interviewing research takes a great deal of time and, sometimes, money. The researcher has to conceptualize the project, establish access and make contact with participants, interview them, transcribe the data, and then work with the material and share what he or she has learned. Sometimes I sense that a new researcher is choosing one method because he or she thinks it will be easier than another. Any method of inquiry worth anything takes time, thoughtfulness, energy, and money. But interviewing is especially labor intensive. If the researcher does not have the money or the support to hire secretarial help to transcribe tapes, it is his or her labor that is at stake. (See Chapter 8.) Interviewing requires that researchers establish access to, and make contact with, potential participants whom they have never met. If they are unduly shy about themselves or hate to make phone calls, the process of getting started can be daunting. On the other hand, overcoming shyness, taking the initiative, establishing contact, and scheduling and completing the first set of interviews can be a very satisfying accomplishment. My sense is that graduate programs today in general, and the one in which I teach in particular, are much more individualized and less monolithic than I thought them to be when I was a doctoral candidate. Students have a choice of the type of research methodology they wish to pursue. But in some graduate programs there may be a cost to pay for that freedom: Those interested in qualitative research may not be required to learn the tenets of what is called “quantitative” research. As a result, some students tend not to understand the history of the method they are using or the critique of positivism and experimentalism out of which some approaches to qualitative research in education grew. (For those interested in learning that critique as an underpinning for their work, as a start see Johnson, 1975; Lincoln & Guba, 1985.) Graduate candidates must understand the so-called paradigm wars (Gage, 1989) that took place in the 1970s and 1980s and are still being waged in the 2000s (Shavelson & Towne, 2002). By not being aware of the history of the battle and the fields upon which it has been fought, students may not understand their own position in it and the potential implications for their career as it continues. If doctoral candidates choose to use interviewing as a research methodology for their dissertation or other early research, they should know that their choice to do qualitative research has not been the dominant one in the history of educational research. Although qualitative research has gained ground in the last 30
years, professional organizations, some journals in education, and personnel committees on which senior faculty tend to sit, are often dominated by those who have a predilection for quantitative research. Furthermore, the federal government issued an additional challenge to qualitative researchers when it enacted legislation that guides federal funding agencies to award grants to researchers whose methodologies adhere to “scientific” standards. (See the definition of “scientific” in section 102,18 of the Education Sciences Reform Act of 2002.) In some arenas, doctoral candidates choosing to do qualitative rather than quantitative research may have to fight a stiffer battle to establish themselves as credible. They may also have to be comfortable with being outside the center of the conventional educational establishments. They will have to learn to search out funding agencies, journals, and publishers open to qualitative approaches. (For a discussion of some of these issues, see Mishler, 1986, esp. pp. 141–143; Wolcott, 1994, pp. 417–422.) Although the choice of a research method ideally is determined by what one is trying to learn, those coming into the field of educational research must know that some researchers and scholars see the choice as a political and moral one. (See Bertaux, 1981; Fay, 1987; Gage, 1989; Lather, 1986a, 1986b; Popkowitz, 1984.) Those who espouse qualitative research often take the high moral road. Among other criticism, they decry the way quantitative research turns human beings into numbers. But, there are equally serious moral issues involved in qualitative research. As I read Todorov’s (1984) The Conquest of America, I began to think of interviewing as a process that turns others into subjects so that their words can be appropriated for the benefit of the researcher. Daphne Patai (1987) raises a similar issue when she points out that the Brazilian women she interviewed seemed to enjoy the activity, but she was deeply troubled by the possibility that she was exploiting them for her scholarship. Interviewing as exploitation is a serious concern and provides a contradiction and a tension within my work that I have not fully resolved. Part of the issue is, as Patai recognizes, an economic one. Steps can be taken to assure that participants receive an equitable share of whatever financial profits ensue from their participation in research. But, at a deeper level, there is a more basic question of research for whom, by whom, and to what end. Research is often done by people in relative positions of power in the guise of reform. All too often the only interests served are those of the researcher’s personal advancement. It is a constant struggle to make the research process equitable, especially in the United States where a good deal of our social structure is inequitable.
CONCLUSION So why choose interviewing? Perhaps constitutive events in your life, as in mine, have added up to your being “interested” in interviewing as a method. It is a powerful way to gain insight into educational and other important social issues through understanding the experience of the individuals whose lives reflect those issues. As a method of inquiry, interviewing is most consistent with people’s ability to make meaning through language. It affirms the importance of the individual without denigrating the possibility of community and collaboration. Finally, it is deeply satisfying to researchers who are interested in others’ stories. NOTE 1. The word a researcher chooses to refer to the person being interviewed often communicates important information about the researcher’s purpose in interviewing and his or her view of the relationship. In the literature about interviewing, a wide range of terms is used. Interviewee or respondent (Lincoln & Guba, 1985; Richardson, Dohrenwend, & Klein, 1965) casts the participant in a passive role and the process of interviewing as one of giving answers to questions. Some writers refer to the person being interviewed as the subject (Patai, 1987). On one hand, this term can be seen as positive; it changes the person being interviewed from object to subject. On the other hand, the term subject implies that the interviewing relationship is hierarchical and that the person being interviewed can be subjugated. Alternatively, anthropologists tend to use the term informant (Ellen, 1984), because the people they interview inform them about a culture. Researchers pursuing cooperative inquiry and action research may consider all involved in the research as co-researchers (Reason, 1994). The use of this term has significant implications for how you design research, and gather and interpret data. In searching for the term we wanted to use, my colleagues and I focused on the fact that in-depth interviewing encourages people to reconstruct their experience actively within the context of their lives. To reflect that active stance we chose the word participants to refer to the people we interview. That word seems to capture both the sense of active involvement that occurs in an in-depth interview and the sense of equity that we try to build in our interviewing relationships.

E-Mail Interviewing in Qualitative Research: A Methodological Discussion

Lokman I. Meho School of Library and Information Science, Indiana University, 1320 E. 10th Street, LI 011, Bloomington, IN 47405. E-mail: [email protected]

This article summarizes findings from studies that employed electronic mail (e-mail) for conducting in-depth interviewing. It discusses the benefits of, and the challenges associated with, using e-mail interviewing in qualitative research. The article concludes that while a mixed-mode interviewing strategy should be considered when possible, e-mail interviewing can in many cases be a viable alternative to face-to-face and telephone interviewing. A list of recommendations for carrying out effective e-mail interviews is presented.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 57(10):1284–1295, 2006. © 2006 Wiley Periodicals, Inc. Published online 25 May 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20416. Received May 25, 2005; revised June 23, 2005; accepted August 3, 2005.

Introduction

The past two decades have seen a considerable increase in the number of studies in library and information science (LIS) that employ qualitative research methods. This increase has, in turn, resulted in a noticeable shift towards studies that rely on observation and in-depth (or less-structured) interviewing, as opposed to questionnaires or structured interviewing. The goal of both observation and in-depth interview methods is to improve understanding of social and cultural phenomena and processes rather than to produce objective facts about reality and make generalizations to given populations (Fidel, 1993; Pettigrew, Fidel, & Bruce, 2001; Wang, 1999). Over the years, however, researchers have identified challenges associated with the observation and in-depth interview methods, including cost, time, and limited access to research participants (Denzin & Lincoln, 2005; Gubrium & Holstein, 2002; Kvale, 1996; Miles & Huberman, 1994; Patton, 2002; Strauss & Corbin, 1998; Taylor & Bogdan, 1998). Challenged with the task of identifying new methods or tools for conducting more effective research while retaining or improving quality, researchers started to explore using the Internet for carrying out qualitative research. These researchers began to use (and still do) three main types of Internet-based qualitative research methods: online synchronous interviews, online asynchronous interviews,
and virtual focus groups.1 In contrast to studies that used online synchronous interviews (Bowker & Tuffin, 2004; Madge & O’Connor, 2002; Mann & Stewart, 2000) and those that used virtual focus groups (Burton & Bruening, 2003; Chase & Alvarez, 2000; Gaiser, 1997; Mann & Stewart, 2000; Schneider, Kerwin, Frechtling, & Vivari, 2002; Underhill & Olmsted, 2003), studies that used online asynchronous interviewing have never been reviewed as a separate body of literature and their specific characteristics have nearly always been subsumed under the broader category of online research methods (e.g., Kraut et al., 2004; Madge & O’Connor, 2004). To fill this gap, the current paper reviews studies that used online, asynchronous, in-depth interviewing within the context of qualitative research. In doing so, the article addresses two questions: What opportunities, constraints, and challenges does online, asynchronous, in-depth interviewing present for collecting qualitative data? How can in-depth e-mail interviews be conducted effectively?

Before discussing these questions, it is important to note that online, asynchronous, in-depth interviewing, which is usually conducted via e-mail, is, unlike e-mail surveys, semistructured in nature and involves multiple e-mail exchanges between the interviewer and interviewee over an extended period of time. Online, asynchronous, in-depth interviewing is also different from virtual focus groups in that the information volunteered by individual participants is not shared with, viewed, or influenced by other participants (Schneider et al., 2002). With the exception of Meho and Tibbo (2003), LIS researchers have yet to adopt this method of interviewing in their qualitative research. Exploring the value of e-mail interviewing in qualitative research and 1 Electronic questionnaires (via Web page delivery or electronic mail) are among the earliest and most popular online methods used by researchers. These, however, are considered quantitative in nature and the studies based on them are not reviewed here. For excellent reviews of online surveys or questionnaires, see Birnbaum (2004), Couper (2000), Dillman (2000), and Tourangeau (2004).


knowing under what conditions it can be most effective and how to implement it, should be useful to LIS researchers. This knowledge could be particularly useful to those who study people who prefer to be interviewed online rather than face-to-face, as well as people who are not easily accessible or are geographically far apart. What follows is a review of studies that employed qualitative e-mail interviewing, a summary of their major findings, and a list of recommendations for carrying out effective e-mail interviews.2 Review of the Literature Citation and bibliographic searches using multiple types of sources and strategies indicate that the use of in-depth, e-mail interviewing is rapidly increasing. In 2003 and 2004 alone, for example, there were as many qualitative studies using this data collection method as in all previous years (see below). Moreover, nearly all of the studies conducted before 2003 were methodological in nature, aiming simply to test the suitability of e-mail for qualitative interviewing. In contrast, most of the studies conducted since 2003 have not addressed methodological issues; this suggests that e-mail interviewing has become a viable tool for qualitative research. Methodological Studies In the earliest study Foster (1994) used e-mail for conducting qualitative interviews with subscribers to a listserv. His intentions were both to study the ways in which university instructors conducted curriculum planning and to explore the advantages of e-mail interviewing, along with the challenges that could arise were the method not employed carefully. Murray (1995, 1996) interviewed five nurses to study why and how they used computer-mediated communication and to examine the potentials of e-mail for interviewing research participants. Persichitte, Young, and Tharp (1997) interviewed six education professionals by e-mail to examine how they used technology at work and to develop guidelines for conducting e-mail interviews (see also Young, Persichitte, & Tharp, 1998). Murray and Sixsmith (1998) conducted electronic interviews with 21 prosthesis users located in Australia, Canada, the Netherlands, United Kingdom, and United States in order to explore the viability of e-mail as a qualitative research medium for in-depth interviewing. After educating her third- and fourth-year college students about qualitative methods and in-depth interviewing, Curasi (2001) asked them to interview 48 consumers by e-mail and face-to-face to examine how data from the two types of interviews compare and how to increase the effectiveness of the e-mail method. 2 The relevant literatures of discourse analysis, content analysis, and computer-mediated communication (CMC) are not reviewed here. Discourse analysis is described in detail in Dijk (1997) and Schiffrin, Tannen, and Hamilton (2001); content analysis in Krippindorff (2004), Patton (2002), and Weber (1990); and CMC in Herring (2002), Knapp and Daly (2002), and Thurlow, Tomic, and Lengel (2004).

Non-Methodological Studies

Kennedy (2000) conducted in-depth, e-mail interviews with 17 women who are designers to determine the kinds of experiences they have on the Internet and the impact these experiences have on their Internet Web site activities, Internet personal activities, and non-Internet personal and/or social activities. Karchmer (2001) used e-mail interviewing to explore 13 K–12 teachers' reports about how the Internet influenced literacy and literacy instruction in their classrooms. Meho and Tibbo (2003) used e-mail interviewing to explore and model the information-seeking behavior of social science faculty; they interviewed 60 scholars from 14 different countries (see also Meho, 2001). Kim, Brenner, Liang, and Asay (2003) used e-mail to interview ten 1.5-generation Asian American college students to study their adaptation experiences as immigrants and their current experiences as young adults. Hodgson (2004) used e-mail to interview 22 self-reported self-injurers to learn about their stigma management techniques and their motives for self-injury. Lehu (2004) conducted in-depth interviews with 53 top-level managers and advertising executives to investigate why brands age and what managers subsequently do to rejuvenate them. Murray (2004) interviewed 35 prosthesis users by e-mail and face-to-face to understand the embodied perceptual experience of successful prosthesis use. Murray and Harrison (2004) conducted e-mail and face-to-face interviews with 10 stroke survivors to investigate the meaning and experience of being a survivor. Olivero and Lunt (2004) conducted semistructured long e-mail and face-to-face interviews with 28 adult Internet users to explore their views on privacy. More details about the 14 studies mentioned above are provided in Table 1. Although only Curasi (2001), Meho and Tibbo (2003), Murray (2004), Murray and Harrison (2004), and Olivero and Lunt (2004) collected both e-mail and face-to-face interview data, the majority of the 14 studies summarized above discussed the benefits and challenges of e-mail interviewing in qualitative research and how to alleviate or eliminate some of those problems, or how to conduct more efficient and effective e-mail interviews. The following is a summary of their findings.

TABLE 1. Details of e-mail interview studies.
Note. FTF = face-to-face; NR = not reported.

Foster (1994). E-mail participants: NR; FTF participants: none. Participants: university instructors in Australia, New Zealand, Canada, the United States, and Europe. Recruitment: invitation posted on a listserv.
Murray (1995, 1996). E-mail participants: 5; FTF participants: none. Participants: nurses (location NR). Recruitment: invitation posted on a listserv.
Persichitte, Young, & Tharp (1997); Young et al. (1998). E-mail participants: 6; FTF participants: none. Participants: education professionals (location NR). Recruitment: individual e-mail solicitations to colleagues.
Murray & Sixsmith (1998). E-mail participants: 21; FTF participants: none. Participants: prosthesis users in Australia, Canada, the Netherlands, the United Kingdom, and the United States. Recruitment: invitation posted on a listserv.
Kennedy (2000). E-mail participants: 17 (all women); FTF participants: none. Participants: female Web designers (location NR). Recruitment: invitations posted on listservs, message boards, and a personal research Web site, as well as e-mail messages to individual Web sites.
Curasi (2001). E-mail participants: 24; FTF participants: 24. Participants: consumers (location NR). Recruitment: NR.
Karchmer (2001). E-mail participants: 13 (10 women and 3 men); FTF participants: none. Participants: K–12 teachers in 11 different states in the United States. Recruitment: announcements on electronic mailing lists (n = 8), direct e-mail solicitations to specific teachers (n = 10), and snowballing (n = 13).
Kim et al. (2003). E-mail participants: 10 (3 women and 7 men); FTF participants: none. Participants: college students in the Mid-Atlantic states. Recruitment: psychology course.
Meho & Tibbo (2003). E-mail participants: 60 (19 women and 41 men); FTF participants: 5 (all men). Participants: faculty in 14 different countries. Recruitment: database searching to identify e-mail addresses, followed by individual invitations.
Hodgson (2004). E-mail participants: 22 (18 women and 4 men); FTF participants: none. Participants: self-reported self-injurers in different states in the United States. Recruitment: message boards.
Lehu (2004). E-mail participants: 53; FTF participants: none. Participants: executives in France. Recruitment: telephone and e-mail.
Murray (2004). E-mail and FTF participants: 35 in total (14 and 21; 19 women and 16 men). Participants: successful prosthesis users in the United Kingdom [and possibly other countries]. Recruitment: advertisement on an e-mail discussion group (approximately 200 members) via the moderator of the list.
Murray & Harrison (2004). E-mail participants: 5; FTF participants: 5 (6 women and 4 men in total). Participants: stroke survivors in the United Kingdom [and possibly other countries]. Recruitment: e-mail discussion group for stroke survivors.
Olivero & Lunt (2004). E-mail participants: 23 (12 women and 11 men); FTF participants: 5. Participants: Web and e-mail users in different regions of the United Kingdom. Recruitment: snowballing among experienced Internet and e-mail users.

Benefits and Challenges Associated With the Use of E-Mail Interviewing in Qualitative Research

Cost and Efficiency

E-mail interviews cost considerably less to administer than telephone or face-to-face interviews. Researchers can invite participation of large or geographically dispersed samples of people by sending them e-mail messages individually or through listservs, message boards, or discussion groups, rather than making long-distance telephone calls, using regular mail, or traveling to the location of participants. The use of e-mail in research also decreases the cost of transcribing. Data from e-mail interviews are generated in
electronic format and require little editing or formatting before they are processed for analysis. E-mail also eliminates the need for synchronous interview times and allows researchers to interview more than 1 participant at a time, because a standard interview schedule or list of questions can be sent individually to several participants at once, irrespective of their geographical location or time zone. However, the time period required to collect e-mail interview data varies from one study to another. Some researchers report a delay of several months before data collection is complete, while others wait only a week. This variation occurs because it may take days or even weeks before a respondent replies to an e-mail message. The number of follow-up exchanges between the researcher and participants may also fluctuate greatly; some interviewers complete data collection after only one follow-up exchange, whereas others may require more than 30 exchanges (see Table 1). Overall, the length of the data collection period depends on several factors, including but not limited to: the number of participants in each study, the number of questions asked, the degree of commitment or motivation of the participants, the quantity and quality of data gathered, the time both the participants and the interviewers can afford to spend on these interviews, and access to the Internet at the time questions were e-mailed. Some studies showed that when online communication was stretched over a long period of time, participants experienced a degree of affirmation for their participation (Bowker & Tuffin, 2004; Walther, 1996). As discussed further below, however, other studies show that the longer it takes to complete an interview with a participant, the higher the possibility of dropouts or frustration to both the researcher and the interviewees (Hodgson, 2004). Methods or procedures that a researcher could employ to reduce the possibilities of dropouts or frustration caused by length or number of interviews are discussed below. Democratization and Internationalization of Research Although e-mail interviewing limits the research to those people with access to the Internet, the method, on the other hand, democratizes and internationalizes research. In contrast to face-to-face and telephone interviewing, e-mail interviewing enables researchers to study individuals or groups with special characteristics or those often difficult or impossible to reach or interview face-to-face or via telephone, such as executives (Lehu, 2004), prosthesis users (Murray, 2004; Murray & Sixsmith, 1998), self-reported self-injurers (Hodgson, 2004), stroke survivors (Murray & Harrison, 2004), and people with disabilities (Bowker & Tuffin, 2004), or those who are geographically dispersed (Foster, 1994; Hodgson, 2004; Karchmer, 2001; Meho & Tibbo, 2003; Murray, 2004; Murray & Harrison, 2004; Murray & Sixsmith, 1998; Olivero & Lunt, 2004) or located in dangerous or politically sensitive sites (Meho & Tibbo, 2003). Moreover, e-mail enables the interviewing of shy people or people who do not or cannot express themselves as


well in talking as they do in writing, especially when the language used in communicating with participants is their second one (Karchmer, 2001; Kim et al., 2003). In short, e-mail allows the researcher to interview groups or communities that would not and could not have been studied otherwise. Sample Recruitment Recruiting in e-mail interviewing studies is done in multiple ways, including individual solicitations, snowballing, or invitations through listservs, message boards, discussion groups, or personal research Web sites. Because these are the same methods employed by online survey researchers, it was not surprising that e-mail interviewing researchers face similar problems in recruiting participants. For example, although recruiting is easy in some cases, in others it can be daunting because even when e-mail addresses are found or invitations are sent to listservs and message boards, not all potential participants read the invitations (Meho & Tiboo, 2003). The number of qualitative studies that have employed e-mail interviewing is insufficient for making generalizations, but experience with online survey research indicates that, due to information overload, many people delete invitations before they are read. To ensure sufficient participation, researchers who encounter high undeliverable rates usually send reminders to those who did not reply to initial invitations. As with traditional mail and e-mail surveys, reminders can significantly increase participation rates (see Meho & Tibbo). A number of the e-mail interviewing studies reviewed here also share findings similar to those of online survey research in terms of high rates of nondelivery (Dommeyer & Moriarty, 2000; Frost, 1998; Meho & Tibbo, 2003; Oppermann, 1995). Among other reasons, this occurs because people change or lose their e-mail addresses (e.g., students who graduate from school, faculty members who change jobs, or people who change their Internet service providers). But because a representative sample is not a goal in qualitative research, authors who employ e-mail interviewing usually overcome this problem of high nondelivery rate by inviting new or additional individuals to participate, if needed. Informed Consent and Confidentiality As in conventional studies, researchers who employ qualitative e-mail interviewing develop informed consent, providing participants detailed information about the research in which they are asked to participate and ensuring that they understand fully what participation would entail, including any possible risks. Participants in e-mail interview research are asked to take part in a study only after they provide their consent, which can be given to the researcher in a number of ways, including but not limited to: returning via fax or snail mail a signed form that was sent as an e-mail attachment, e-mailing back a signed form, or simply replying via e-mail affirmatively to an invitation to participate by stating in the message that the consent form was read and agreed to. The


right to withdraw from a study at any time is also included in the consent form. For more details about informed consent in online research, see Kraut et al. (2004). According to Kraut et al. (2004), “research on the Internet is not inherently more difficult to conduct or inherently riskier to subjects than more traditional research styles. But because the Internet is a relatively new medium for conducting research, it raises ambiguities that have been long settled in more conventional laboratory and field settings” (p. 114). In addressing these issues, researchers and Institutional Review Boards (IRBs) will need expertise, which many currently lack.3 This includes expertise about both online behavior and technology. For example, security, digital signatures, procedures for stripping identifying information, and provisions for one-on-one debriefing require specialized technical expertise. As in the case of face-to-face research, in the context of e-mail interviewing researchers need to ensure that adequate provisions are taken to protect the privacy of participants and to maintain the confidentiality of data. This is so because identifying information such as records of statements, attitudes, or behaviors, coupled with names, email addresses, partially disguised pseudonyms, or other identifying information, may be inadvertently disclosed either when the data are being collected or, more commonly, when they are stored on a networked computer connected to the public Internet (Kraut et al., 2004; Singer & Levine, 2003). Emphasizing to participants that certain measures will be adopted to maximize confidentiality is necessary. Examples of these measures include the use of pseudonyms and hiding the user names, domain names, and any other personal identifiers when publishing or storing interview data. Many people perceive online communication as anonymous because there is no in-person contact and thus, little accountability. This anonymity may explain why some people are more willing to participate in e-mail interview studies, whereas others are more willing to stop participating, not respond in a timely fashion, embellish more, or be less friendly to the interviewer (Hodgson, 2004; Mann & Stewart, 2000). As explained further below, the anonymity afforded by online communication can be an important factor in increasing self-disclosure (Herring, 1996; Mann & Stewart; Tidwell & Walther, 2002) and in facilitating a closer connection with interviewees’ personal feelings, beliefs, and values (Matheson, 1992). The American Psychological Association (Kraut et al., 2004), the American Association for the Advancement of Science (Frankel & Siang, 1999), the Association of Internet Research (Ess, 2002), Brownlow and O’Dell (2002), Eysenbach and Till (2001), Mann and Stewart (2000), Pittenger (2003), and Sharf (1999) all provide excellent detailed discussion on ethical issues relevant to online research. 3 The American Psychological Association (APA) recommends that all IRB boards have technical consultants who can be called on to resolve these issues when needed. APA further recommends that IRBs undertake an educational mission to inform researchers about the issues, the judgments that are now involved, and remedies for ensuring the health and protection of subjects in online research (Kraut et al., 2004).
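As a concrete illustration of the measures described above (pseudonyms and the removal of user names, domain names, and other personal identifiers before interview data are stored or published), the short Python sketch below shows one way such stripping might be automated. It is only a minimal sketch under stated assumptions, not a procedure taken from the studies reviewed here: the file layout, the name-to-pseudonym table, and the regular expression are hypothetical and would have to be adapted to a real project, and automated substitution supplements rather than replaces a careful manual reading of each transcript.

import re
from pathlib import Path

# Hypothetical mapping from real identifiers to pseudonyms. In practice this
# table would be kept separately from the transcripts and protected (e.g., on
# an encrypted drive that is not connected to a networked computer).
PSEUDONYMS = {
    "Jane Doe": "Participant 01",
    "jdoe@example.edu": "Participant 01",
    "John Smith": "Participant 02",
}

# Matches most e-mail addresses so that any address not listed above is still
# masked rather than stored verbatim.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def pseudonymize(text):
    # Replace known names and addresses first, then mask any remaining addresses.
    for identifier, pseudonym in PSEUDONYMS.items():
        text = text.replace(identifier, pseudonym)
    return EMAIL_PATTERN.sub("[e-mail address removed]", text)

def pseudonymize_file(source, destination):
    # Read a raw transcript and write a pseudonymized copy for analysis or archiving.
    destination.write_text(pseudonymize(source.read_text(encoding="utf-8")), encoding="utf-8")

if __name__ == "__main__":
    # Hypothetical folder layout: raw transcripts in ./raw, cleaned copies in ./clean.
    Path("clean").mkdir(exist_ok=True)
    for transcript in Path("raw").glob("*.txt"):
        pseudonymize_file(transcript, Path("clean") / transcript.name)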

Medium Effects One of the most important differences between e-mail interviews and face-to-face or telephone interviews involves media richness, that is, the ability of a communication medium to foster interaction and feedback and to permit people to communicate with many kinds of cues, using multiple senses (Panteli, 2002; Robert & Dennis, 2005). Having said this, face-to-face interviews are then expected to provide richer data than telephone interviews and telephone interviews are expected to provide richer data than e-mail interviews (Schneider et al., 2002). This is true because in e-mail interviews, for example, the interviewer will not be able to read facial expressions and body language, make eye contact, or hear voice tones of the participants. As a result, it is possible that some important visual or nonverbal cues are missed online that would be observed during face-to-face data collection (Selwyn & Robson, 1998). On the other hand, e-mail interviews reduce, if not eliminate, some of the problems associated with telephone or face-to-face interviews, such as the interviewer/interviewee effects that might result from visual or nonverbal cues or status difference between the two (e.g., race, gender, age, voice tones, dress, shyness, gestures, disabilities). Murray and Harrison (2004), for example, argue that some of their potential participants—stroke survivors—were assumed not to be able or willing to take part in face-to-face interviews because of speech and mobility disabilities or self-consciousness about their appearance. Kim et al. (2003), too, explain that, among other things, e-mail may safeguard against possible loss of face among some people when they describe potentially sensitive events, experiences, or personal characteristics (e.g., difficult relationships with family, lack of English proficiency, racism, academic problems), thus allowing them to participate in research studies. In short, in many cases e-mail facilitates greater disclosure of personal information, offering further benefits to both the researcher and participants (Bowker & Tuffin, 2004). Another medium-related problem in e-mail interviewing is that it is always possible that some participants may not be as effective writers as they are speakers (Karchmer, 2001). As mentioned earlier, however, the opposite could be true, too. There could be some participants (and even interviewers) who do not or cannot express themselves as well in talking as they do in writing. Online communication could solve the latter problem because neither the participants nor the interviewers need to communicate orally or face-to-face. Acknowledging that e-mail has strengths and weaknesses as a communication medium, researchers strive to maximize the richness of the tool by employing certain linguistic methods, such as the use of acronyms or abbreviations (e.g., LOL, laughing out loud; ROFL, rolling on the floor laughing) and emoticons (e.g., those little smiley faces), as well as underlining and capitalization (for emphasis), as a substitute for nonverbal cues (Walther, Anderson, & Park, 1994). Because little is known about the number of e-mail users who are literate with these communication methods, it is


important for researchers who use e-mail interviewing as a data collection method to instruct and encourage their participants to use such acronyms and emoticons. This will not only lessen some of the losses in nonverbal cues but it should also increase the depth of the data collected. Interview Questions As in face-to-face and telephone interactions, most e-mail interview-based studies use an interview schedule for data collection. Some researchers decompose the schedule into several sections and ask a certain number of questions at one time, whereas others send all primary interview questions in one e-mail message (see Table 1). Moreover, some researchers e-mail their questions only after securing permission from their participants. Others e-mail their interview questions along with the interview invitation and consent form so that potential participants will have a better idea of what would be involved in the interview process before any commitments are made. The lack of a standard for conducting e-mail interviews is due to variations in the length of an interview schedule, the characteristics of the target population, and the experiences of the researchers in conducting qualitative e-mail interviews. With the exception of Meho and Tibbo (2003), no research report explicitly explained why one method was used but not another. In their case, a pretest was conducted to determine which method was best for their participants. The result indicated that all interview materials could be sent together in one e-mail, including the invitation for participation, background information about the researchers, consent form, instructions, and the interview schedule. This strategy may not be appropriate with other populations. The point here, however, is that pretests help determine the best method for each individual study or group of participants. Meho and Tibbo (2003), as well as other researchers such as Curasi (2001) found that the interview guide containing the interview questions could be sent to informants via e-mail with the questions embedded in the e-mail message, rather than in an attached document. Research has shown that embedded questions result in significantly higher response rates (five times as much) than do those attached to an e-mail message (Dommeyer & Moriarty, 2000). This is because the attached e-mail questions present too many obstacles to the potential respondent. Anyone responding to an attached e-mail survey must have: a strong interest in responding; the hardware and software that will enable him/her to download, read, and upload a foreign file; the knowledge of how to execute the various response steps; and a low fear of computer viruses. Absence of any one of these could result in a nonresponse. The embedded e-mail survey, despite its formatting limitations, can be answered and returned by the most unsophisticated of e-mail users and therefore can appeal to a broader audience (Dommeyer & Moriarty). A distinctive feature in e-mail interviewing is that it allows participants to take their time in answering questions and to take part in the interviews in a familiar environment 1290

(e.g., home or office), which may make them feel more relaxed expressing themselves and in responding when and how they feel comfortable (Kennedy, 2000; Lehu, 2004). Although this may generate rich and high quality data, it also means that the e-mailed questions must be much more selfexplanatory than those posed face-to-face, with a clear indication given of the responses required. Even when questions are pretested, because of lack of face-to-face or direct interaction in e-mail interviews, there is always room for miscommunication and misinterpretation. The inclusion of additional information may, however, function to narrow participants’ interpretations and, thereby, constrain their responses. Therefore, managing this methodological dilemma requires meticulous attention to detail, with attempts to reduce ambiguity and improve specificity while avoiding the narrowing of participants’ interpretations and constraint of their responses. According to Bowker and Tuffin (2004), restricting some of the ideas chosen for analysis will be inevitable, but it is very important and necessary to minimize participants’ confusion and eventual frustration by specifying the meaning of interview questions. The following two examples from Meho and Tibbo’s study (2003) demonstrate cases for which additional explanation of questions is needed: Interview Question: Who and when do you usually ask for help in locating research information? For what kind(s) of help do you normally ask? Participant Answer: I don’t know what you mean here. I usually hire a graduate student to do some basic legwork for me in terms of hunting out the newest information on whatever subject I am working on at the time. Interview Question: What criteria do you employ when assessing whether to follow up on materials not found in your university library? Participant Answer: Don’t know what you mean by this.
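The two questions quoted above can also illustrate how an e-mail interview guide might be delivered. Following the earlier point that questions embedded in the body of the message draw markedly higher response rates than attached files, and the later recommendation to solicit participants individually rather than through a mailing list, a researcher might assemble and send individualized invitation messages along the lines of the Python sketch below. The sketch is an illustration only, not a procedure reported by Meho and Tibbo or by the other studies reviewed here; the mail server, addresses, study description, and recipient list are hypothetical placeholders.

import smtplib
from email.message import EmailMessage

# The interview guide is embedded directly in the message body rather than
# attached, since embedded questions have been found to draw higher response rates.
QUESTIONS = [
    "1. Who and when do you usually ask for help in locating research information? For what kind(s) of help do you normally ask?",
    "2. What criteria do you employ when assessing whether to follow up on materials not found in your university library?",
]

# Hypothetical recruitment list; in a real study each address would come from
# the sampling procedure and would have been obtained ethically.
PARTICIPANTS = [
    {"name": "Dr. A. Example", "email": "a.example@university.edu"},
]

def build_invitation(name, questions):
    # One personalized message per participant, with consent information and
    # the full set of questions visible before any commitment is made.
    return (
        f"Dear {name},\n\n"
        "May I interview you by e-mail for a study of scholars' information-seeking behavior? "
        "Participation is voluntary, your answers will be kept confidential, and you may withdraw at any time. "
        "The questions are included below so that you can see what participation would involve:\n\n"
        + "\n".join(questions)
        + "\n\nWith thanks,\nThe Researcher"
    )

def send_invitations(host, port, sender):
    with smtplib.SMTP(host, port) as server:
        for person in PARTICIPANTS:
            message = EmailMessage()
            message["Subject"] = "Research Interview"   # an informative subject line
            message["From"] = sender
            message["To"] = person["email"]             # one individual recipient per message
            message.set_content(build_invitation(person["name"], QUESTIONS))
            server.send_message(message)

if __name__ == "__main__":
    # Hypothetical SMTP settings.
    send_invitations("smtp.example.edu", 25, "researcher@university.edu")

Sending one message per recipient, rather than a single message with many visible addressees, also keeps participants' addresses hidden from one another, which is consistent with the confidentiality measures discussed earlier.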

The fact that these two questions were not clear to, or were misinterpreted by, only a very small fraction of the study participants (3.3%) suggests that additional explanation or follow-up on misinterpreted questions be provided on an individual basis rather than to all participants. This should especially be the case when such questions are interpreted correctly by the majority of the study participants. Probes Probes or follow-up questions in interviews are generally used to elaborate and clarify participants’ responses or to help elicit additional information and depth from informants. Unlike face-to-face and telephone interviews, e-mail interviews do not allow direct probing; it can be done only in follow-up e-mails, which can take place any time during the data collection and analysis periods. The lack of direct probing in e-mail interviews may result in missing some important pieces of data, especially given that not all participants respond to follow-up questions, even if they were told to expect them. In Kennedy’s study (2000),


for example, 23 participants were initially involved; yet 3 stopped communicating after the first set of questions were answered and 3 did not supply enough information for analysis. In Karchmer’s study (2001), 16 initially agreed to participate, but 3 discontinued correspondence within the first week of data collection. In Meho and Tibbo’s (2001) study, 15 of the 60 study participants terminated the interview process and did not answer any of the follow-up questions. There are, however, cases in which all participants responded to follow-up probes, such as in Curasi’s (2001) study. As expected by researchers, the lack or loss of communication after the initial interview can be frustrating (Hodgson, 2004), but in none of the studies reviewed here was there discussion of this problem or an indication that this loss had any impact on the quality of data collected. In fact, although the lack of direct probing may result in the loss of some important pieces of information, on the other hand, it can play a major role in improving the quality of the e-mail interview data (Lehu, 2004). This is because the researcher is not limited to the probes that come to mind during the face-to-face interviews and because it gives participants ample time to think about their answers before sending them (Curasi, 2001). The benefits of indirect probing are further discussed in the following section. Data Quality According to Denscombe (2003, p. 51), the quality of responses gained through online research is much the same as responses produced by more traditional methods. The same conclusion was reached in several studies that compared, or conducted, both e-mail and face-to-face interviews (e.g., Curasi, 2001; Meho & Tibbo, 2003; Murray, 2004; Murray & Harrison, 2004). These studies found that participants interviewed via e-mail remained more focused on the interview questions and provided more reflectively dense accounts than their face-to-face counterparts. This is not to say that the quality of face-to-face interviews is lower, but rather to highlight the benefits of the e-mail interview, which was possibly aided by the ability of both the researchers and the interviewees to take the time to be more thoughtful and careful in their responses to, or communication with, each other than they would during natural conversation (Karchmer, 2001; Murray, 2004; Young et al., 1998). Data quality, according to Curasi (2001), is dependent on who is being interviewed, who the interviewers are, and how skillful they are in online interviewing. She found, for example, that some e-mail interview participants provided very short and very precise responses to the questions posed. Others, however, discussed at length their feelings and experiences, sometimes in as much depth and detail as their face-to-face counterpart, especially when data from the initial questions are combined with those from follow-up questions. In other studies, data from face-to-face interviews did not reveal any information that was not already discovered via data from e-mail interviews (Meho & Tibbo, 2003). Still other studies found that much of the information

conveyed through electronic mail is information that would not be conveyed through another medium, such as sensitive and personal information—health, medical, political, and so on (Beck, 2005; Murray & Sixsmith, 1998). Overall, e-mail interviewing offers an opportunity to access, in an interactive manner, participants’ thoughts, ideas, and memories in their own words. It allows the recording of many anecdotes that participants share to enhance the accounts of their experiences. It also allows participants to construct their own experiences with their own dialogue and interaction with the researcher. E-mail interviewing is additionally empowering to the participants because it essentially allows them to be in control of the flow of the interview (Bowker & Tuffin, 2004), enabling them to answer at their convenience and in any manner they feel suitable (Kennedy, 2000). Levinson (1990) considers that the asynchronous electronic communication’s capacity to provide opportunity for reflection and editing of messages before sending them contributes to the production of a closer fit between ideas, intentions, and their expression in writing. A summary of the advantages and disadvantages of e-mail interviewing, or challenges associated with it, is provided in Table 2. Guidelines for Conducting Effective E-Mail Interviews In addition to the findings discussed above, the studies reviewed or examined in this article and the personal experience of the author offer several suggestions for those considering the use of e-mail interviews in qualitative research. These suggestions are presented here in order to assist researchers in conducting more efficient and effective e-mail interviews, as well as to enable them to establish trustworthy results:







• •

Invitations: Solicit people for participation individually if possible rather than via a mailing list or message board. According to Dillman (2000), this technique shows potential participants that they are important, thereby encouraging them to participate. Subject line: Use an effective subject line for the first contact with the interviewees, such as Research Interview. This will avoid or reduce the likelihood of a request being deleted before it is read. Self-disclosure: Introduce yourself and provide brief information about your professional status/credentials. Then tell your interviewees how you acquired their e-mail addresses. This will help to establish trust. There is evidence that people will engage in more self-disclosure when they first become recipients of such self-disclosure from their interviewers (Moon, 2000). Interview request: State your request succinctly and professionally, as in “May I interview you for an article I am writing?” Be open about the research: Suspicion can exist when online researchers contact participants. One way to establish trust that creates rapport is to be as open as possible about the purposes and processes of the research. Outline the details of the project and specify the topic of the interview and the interview procedure, including information about


TABLE 2. Advantages/disadvantages of e-mail interviewing.

Interviewers and participants
Advantages:
• Allows access to individuals often difficult or impossible to reach or interview face-to-face or via telephone
• Allows access to diverse research subjects
• Allows access to individuals regardless of their geographic location
• Allows interviewing of individuals who do not or cannot express themselves as well in talking as they do in writing
• Allows interviewing of individuals who prefer online interaction over face-to-face or telephone conversation
Disadvantages/Challenges:
• Limited to individuals with access to the Internet
• Requires skills in online communication from both interviewer and interviewees
• Requires expertise in technology from both interviewer and interviewees

Cost
Advantages:
• Eliminates expenses of calling and traveling
• Eliminates expenses of transcribing
• Decreases cost of recruiting large/geographically dispersed samples
Disadvantages/Challenges:
• Can be high for participants

Time
Advantages:
• Eliminates time required for transcribing
• Eliminates the need to schedule appointments
• Allows interviewing more than 1 participant at a time
Disadvantages/Challenges:
• May take several days or weeks before an interview is complete

Recruitment
Advantages:
• Done via e-mail, listservs, message boards, discussion groups, and/or Web pages
Disadvantages/Challenges:
• Invitations for participation may be deleted before they are read

Participation
Advantages:
• Done by e-mail
Disadvantages/Challenges:
• High undeliverable rates (e.g., due to inactive e-mail addresses)
• Some participants may drop out before interview is complete

Medium effects
Advantages:
• Allows participants to take part in the interviews in a familiar environment (e.g., home or office)
• Empowers participants, essentially allowing them to be in control of the flow of the interview
• Allows participants to take their time in answering questions
• Allows participants to express their opinions and feelings more honestly (because of sense of anonymity)
• Encourages self-disclosure
• Eliminates interruption that takes place in face-to-face/telephone interviews
• Eliminates transcription errors
• Eliminates interviewer/interviewee effect resulting from visual and nonverbal cues or status difference between the two (e.g., race, gender, age, voice tones, dress, gestures, disabilities)
• Cues and emotions can be conveyed through use of certain symbols or text
Disadvantages/Challenges:
• Does not allow direct probing
• Requires that questions be more self-explanatory than those posed face-to-face or by telephone, to avoid miscommunication and misinterpretation
• Loses visual and nonverbal cues due to inability to read facial expressions or body language or hear the voice tones of each other
• May narrow participants’ interpretations and, thereby, constrain their responses
• Requires meticulous attention to detail
• Participants may lose focus

Data quality
Advantages:
• Allows participants to construct their own experiences with their own dialogue and interaction with the researcher
• Facilitates a closer connection with interviewee’s personal feelings, beliefs, and values
• Data are more focused on the interview questions asked
• Responses are more thought out before they are sent
Disadvantages/Challenges:
• One-dimensional (based on text only)
• In-depth information is not always easily obtainable



• Incentives: Consider providing nontraditional incentives for people who will be willing to participate in a study. Meho and Tibbo (2003), for example, offered their study participants online bibliographic searches and personal citation searches (see Table 1). Promising participants a copy of the results may help encourage individuals to participate. Researchers should also communicate to potential participants the benefits of participation, such as the opportunity to gain perspectives on, and understanding of, their own ideas and opinions and those of their peers.
• Research ethics and informed consent: Emphasize the anonymity of the participants (e.g., by assuring them that all implicit and explicit links between their names and the data they provide will be removed). In addition, follow and communicate the standard procedures for the protection of human subjects to the participants, such as asking them to read an approved informed consent form before the interview takes place. Avoid overly elaborate assurances of anonymity and confidentiality because they may actually heighten rather than diminish respondents’ concern, causing participants to be less willing to provide sensitive information (Singer & Levine, 2003).
• Interview questions: Keep in mind that participants are not being interviewed face-to-face. So, as mentioned earlier, make sure that the questions to be asked are clear enough both to avoid misinterpretations and to motivate participants to delve deeper into the topic at hand. Also, determine whether there is a need to ask a certain number of questions at a time or to ask all initial, important questions in the very first e-mail message—this largely depends on the nature of the study, the number of questions prepared, and the participants; these points can be verified by conducting pretests (Meho & Tibbo, 2003; Young et al., 1998).
• Instructions: Along with the initial interview questions, include instructions to the participants on completing the interview. This might include how or where to place their answers; that the more detailed their responses the better; that there are no wrong or incorrect answers (in an effort to encourage spontaneity and reduce inhibitions); that they can use acronyms and symbols that communicate feelings, emotions, and the like; and that they should not worry about misspellings or grammatical errors.
• Deadlines and reminders: Indicate the due dates when inviting individuals to participate, but make them reasonable for the participants so that they have ample time to respond. Send reminders 1 week before the deadline, in case of no reply, to increase the response rate. When sending reminders, e-mail all important materials again (e.g., informed consent, interview schedule/questions, and so on) because some participants may feel shy about admitting that they deleted previously sent information. Limit the number of reminders to one or two; otherwise, they may be construed as pressure to continue participation.
• Follow-up questions: Be timely with follow-up questions, especially when clarifications, illustrations, explanations, or elaborations are needed. Check for messages from interviewees regularly and, if necessary, summarize the interviewee’s responses to previous questions and return the summary to the interviewee for verification. This will demonstrate understanding and concern for careful representation while allowing for clarification of misinterpretations (Young et al., 1998).
• Participants and data quality: Be very discriminating as to the sample interviewed. A highly committed or motivated participant can be very helpful, providing detailed and in-depth interviews. Conversely, potential informants who lack commitment to the project may not be worth the follow-up, extra energy, and the possible time delays they require (Curasi, 2001). Be alert for misunderstandings and attentive to changes in the tone of responses, symbols that are inconsistent with previous dialogue, and any other clues that might lead to questioning the credibility of a response. Be prepared to refocus the discussion on the interview topic(s). Online relationships that develop over longer time frames can become quite comfortable for the interviewee, and there may be a tendency toward self-disclosure beyond the scope of the interview topic(s). Do not overtly discourage this sharing; rather, subtly encourage responses related to the research topic (Young et al., 1998). Finally, reserve a few, or try to identify new, potential participants and use them as backup subjects. These may be needed if too many participants withdraw or if more data are needed.
• Survey methodology: Review the literature of, and learn how to carry out successful, survey research; it can be very useful in designing and conducting effective in-depth e-mail interviewing studies. Excellent sources on the topic include Dillman (2000), Groves, Dillman, Eltinge, and Little (2002), Groves et al. (2004), Presser et al. (2004), and Shaw and Davis (1996).

Conclusion

This article reviewed studies that used e-mail for conducting qualitative, in-depth interviews. It was found that e-mail interviewing offers unprecedented opportunities for qualitative research, providing access to millions of potential research participants who are otherwise inaccessible. The method can be employed quickly, conveniently, and inexpensively, and it can generate high-quality data when handled carefully. Although the method has a number of challenges, many of them were found to be easy to overcome, presenting scholars with a new technique for conducting efficient and effective qualitative research. While a mixed-mode interviewing strategy should always be considered when possible, semi-structured e-mail interviewing can be a viable alternative to face-to-face and telephone interviews, especially when time, financial constraints, or geographical boundaries are barriers to an investigation. The use of e-mail to collect qualitative data will certainly expand as Internet access and use become more prevalent.4 Empirical studies addressing relevant methodological issues are few, and thus there is a need to explore more fully the conditions under which in-depth e-mail interviewing can be most effective, the factors that may influence its reliability, how the implementation of some techniques may improve response rate and quality of data, and how respondents react to e-mail-based interviews in comparison to telephone and face-to-face interviewing.

Acknowledgments

I would like to thank Blaise Cronin, Alice Robbin, and Debora Shaw for their valuable comments and suggestions.

4 With an estimated 800 million Internet users worldwide (Internet World Stats, 2005) and thousands of scholars utilizing the ease of access to many of these users, online research has become a field in its own right, boasting a number of peer-reviewed journals (e.g., Social Research Update—http://www.soc.surrey.ac.uk/sru/—and Sociological Research Online—http://www.socresonline.org.uk/home.html), a plethora of book titles, and an association that draws hundreds of researchers from across the globe to its annual conference (The Association of Internet Researchers—http://www.aoir.org/).


References

Beck, C.T. (2005). Benefits of participating in Internet interviews: Women helping women. Qualitative Health Research, 15, 411–422.
Birnbaum, M.H. (2004). Human research and data collection via the Internet. Annual Review of Psychology, 55, 803–832.
Bowker, N., & Tuffin, K. (2004). Using the online medium for discursive research about people with disabilities. Social Science Computer Review, 22(2), 228–241.
Brownlow, C., & O’Dell, L. (2002). Ethical issues for qualitative research in on-line communities. Disability & Society, 17, 685–694.
Burton, L.J., & Bruening, J.E. (2003). Technology and method intersect in the online focus group. Quest, 55, 315–327.
Chase, L., & Alvarez, J. (2000). Internet research: The role of the focus group. Library & Information Science Research, 22, 357–369.
Couper, M.P. (2000). Web surveys: A review of issues and approaches. Public Opinion Quarterly, 64, 464–494.
Curasi, C.F. (2001). A critical exploration of face-to-face interviewing vs. computer-mediated interviewing. International Journal of Market Research, 43(4), 361–375.
Denscombe, M. (2003). The good research guide. Maidenhead: Open University Press.
Denzin, N.K., & Lincoln, Y.S. (2005). The Sage handbook of qualitative research (3rd ed.). London: Sage Publications.
Dijk, T.A.V. (1997). Discourse studies: A multidisciplinary introduction. Thousand Oaks, CA: Sage Publications.
Dillman, D.A. (2000). Mail and Internet surveys: The tailored design method (2nd ed.). New York: John Wiley & Sons, Inc.
Dommeyer, C.J., & Moriarty, E. (2000). Comparing two forms of an e-mail survey: Embedded vs. attached. International Journal of Market Research, 42(1), 39–50.
Ess, C. (2002). Ethical decision-making and Internet research: Recommendations from the AoIR Ethics Working Committee. Association of Internet Researchers (AoIR). Retrieved February 27, 2005, from http://www.aoir.org/reports/ethics.pdf
Eysenbach, G., & Till, J.E. (November 10, 2001). Ethical issues in qualitative research on Internet communities. British Medical Journal, 323(7321), 1103–1105.
Fidel, R. (1993). Qualitative methods in information retrieval research. Library & Information Science Research, 15(3), 219–247.
Foster, G. (1994). Fishing with the Net for research data. British Journal of Educational Technology, 25(2), 91–97.
Frankel, M.S., & Siang, S. (1999). Ethical and legal aspects of human subjects research on the Internet. American Association for the Advancement of Science (AAAS). Retrieved February 27, 2005, from http://www.aaas.org/spp/sfrl/projects/intres/report.pdf
Frost, F. (1998). Electronic surveys: New methods of primary data collection. In European Marketing Academy, Proceedings of the 27th EMAC Conference: Marketing, Research and Practice (1998). Stockholm: EMAC.
Gaiser, T.J. (1997). Conducting on-line focus groups: A methodological discussion. Social Science Computer Review, 15(2), 135–144.
Groves, R.M., Dillman, D.A., Eltinge, J.L., & Little, R.J.A. (2002). Survey nonresponse. New York: John Wiley & Sons, Inc.
Groves, R.M., Fowler, F.J., Couper, M.P., Lepkowski, J.M., Singer, E., & Tourangeau, R. (2004). Survey methodology. Hoboken, NJ: John Wiley & Sons, Inc.
Gubrium, J.F., & Holstein, J.A. (2002). Handbook of interview research: Context & method. Thousand Oaks, CA: Sage Publications.
Herring, S.C. (1996). Computer-mediated communication: Linguistic, social and cross-cultural perspectives. Philadelphia, PA: John Benjamin Publishing Company.
Herring, S.C. (2002). Computer-mediated communication on the Internet. Annual Review of Information Science and Technology, 36, 109–168.
Hodgson, S. (2004). Cutting through the silence: A sociological construction of self-injury. Sociological Inquiry, 74(2), 162–179.
Internet World Stats. (2005). Internet World Stats: Usage and population statistics. Retrieved May 25, 2005, from http://www.internetworldstats.com/
Karchmer, R.A. (2001). The journey ahead: Thirteen teachers report how the Internet influences literacy and literacy instruction in their K–12 classrooms. Reading Research Quarterly, 36(4), 442–466.
Kennedy, T.L.M. (2000). An exploratory study of feminist experiences in cyberspace. CyberPsychology & Behavior, 3(5), 707–719.
Kim, B.S.K., Brenner, B.R., Liang, C.T.H., & Asay, P.A. (2003). A qualitative study of adaptation experiences of 1.5-generation Asian Americans. Cultural Diversity & Ethnic Minority Psychology, 9(2), 156–170.
Knapp, M.L., & Daly, J.A. (2002). Handbook of interpersonal communication (3rd ed.). Thousand Oaks, CA: Sage Publications.
Kraut, R., Olson, J., Banaji, M., Bruckman, A., Cohen, J., & Couper, M. (2004). Psychological research online: Report of board of scientific affairs’ advisory group on the conduct of research on the Internet. American Psychologist, 59(2), 105–117.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks, CA: Sage Publications.
Kvale, S. (1996). Interviews: An introduction to qualitative research interviewing. Thousand Oaks, CA: Sage Publications.
Lehu, J.-M. (2004). Back to life! Why brands grow old and sometimes die and what managers then do: An exploratory qualitative research put into the French context. Journal of Marketing Communications, 10, 133–152.
Levinson, P. (1990). Computer conferencing in the context of the evolution of media. In L.M. Harasim (Ed.), Online education: Perspectives on a new environment (pp. 3–14). New York: Praeger.
Madge, C., & O’Connor, H. (2002). On-line with e-mums: Exploring the Internet as a medium for research. Area, 34(1), 92–102.
Madge, C., & O’Connor, H. (2004). Online methods in geography educational research. Journal of Geography in Higher Education, 28(1), 143–152.
Mann, C., & Stewart, F. (2000). Internet communication and qualitative research: A handbook for researching online. London: Sage Publications.
Matheson, K. (1992). Women and computer technology: Communicating for herself. In M. Lea (Ed.), Contexts of computer-mediated communication (pp. 66–88). Hemel Hempstead: Harvester Wheatsheaf.
Meho, L.I. (2001). The information-seeking behavior of social science faculty studying stateless nations. Unpublished doctoral dissertation, University of North Carolina at Chapel Hill.
Meho, L.I., & Tibbo, H.R. (2003). Modeling the information-seeking behavior of social scientists: Ellis’s study revisited. Journal of the American Society for Information Science and Technology, 54(6), 570–587.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage Publications.
Moon, Y. (2000). Intimate exchanges: Using computers to elicit self-disclosure from consumers. Journal of Consumer Research, 27, 323–339.
Murray, C.D. (2004). An interpretive phenomenological analysis of the embodiment of artificial limbs. Disability and Rehabilitation, 26(16), 963–973.
Murray, C.D., & Harrison, B. (2004). The meaning and experience of being a stroke survivor: An interpretive phenomenological analysis. Disability and Rehabilitation, 26(13), 808–816.
Murray, C.D., & Sixsmith, J. (1998). E-mail: A qualitative research medium for interviewing? International Journal of Social Research Methodology, 1(2), 103–121.
Murray, P.J. (1995). Research from cyberspace: Interviewing nurses by e-mail. Health Informatics, 1(2), 73–76.
Murray, P.J. (1996). Nurses’ computer-mediated communications on NURSENET: A case study. Computers in Nursing, 14(4), 227–234.
Olivero, N., & Lunt, P. (2004). Privacy versus willingness to disclose in e-commerce exchanges: The effect of risk awareness on the relative role of trust and control. Journal of Economic Psychology, 25(2), 243–262.
Oppermann, M. (1995). E-mail surveys: Potentials and pitfalls. Marketing Research, 7(3), 29–33.
Panteli, N. (2002). Richness, power cues and email text. Information & Management, 40(2), 75–86.
Patton, M.Q. (2002). Qualitative research and evaluation methods (3rd ed.). Thousand Oaks, CA: Sage Publications.
Persichitte, K.A., Young, S., & Tharp, D.D. (1997). Conducting research on the Internet: Strategies for electronic interviewing. In Proceedings of Selected Research and Development Presentations at the 1997 National Convention of the Association for Educational Communications and Technology (19th, Albuquerque, NM, February 14–18, 1997). Washington, DC: Association for Educational Communications and Technology.
Pettigrew, K.E., Fidel, R., & Bruce, H. (2001). Conceptual frameworks in information behavior. Annual Review of Information Science and Technology, 35, 43–78.
Pittenger, D.J. (2003). Internet research: An opportunity to revisit classic ethical problems in behavioral research. Ethics & Behavior, 13(1), 45–60.
Presser, S., Couper, M.P., Lessler, J.T., Martin, E., Martin, J., Rothgeb, J.M., et al. (2004). Methods for testing and evaluating survey questions. Public Opinion Quarterly, 68(1), 109–130.
Robert, L.P., & Dennis, A.R. (2005). Paradox of richness: A cognitive model of media choice. IEEE Transactions on Professional Communication, 48(1), 10–21.
Schiffrin, D., Tannen, D., & Hamilton, H.E. (2001). The handbook of discourse analysis. Malden, MA: Blackwell Publishers.
Schneider, S.J., Kerwin, J., Frechtling, J., & Vivari, B.A. (2002). Characteristics of the discussion in online and face to face focus groups. Social Science Computer Review, 20(1), 31–42.
Selwyn, N., & Robson, K. (1998). Using e-mail as a research tool. Social Research Update, 21. Retrieved from http://www.soc.surrey.ac.uk/sru/SRU21.html
Sharf, B.F. (1999). Beyond netiquette: The ethics of doing naturalistic discourse research on the Internet. In Steve Jones (Ed.), Doing Internet research: Critical issues and methods for examining the Net (pp. 243–256). Thousand Oaks, CA: Sage Publications.
Shaw, D., & Davis, C.H. (1996). Modern Language Association: Electronic and paper surveys of computer-based tool use. Journal of the American Society for Information Science, 47(12), 932–940.
Singer, E., & Levine, F.J. (2003). Protection of human subjects of research: Recent developments and future prospects for the social sciences. Public Opinion Quarterly, 67(1), 148–164.
Strauss, A.L., & Corbin, J.M. (1998). Basics of qualitative research: Techniques and procedures for developing grounded theory. Thousand Oaks, CA: Sage Publications.
Taylor, S.J., & Bogdan, R. (1998). Introduction to qualitative research methods: A guidebook and resource (3rd ed.). New York: John Wiley & Sons, Inc.
Thurlow, C., Tomic, A., & Lengel, L.B. (2004). Computer mediated communication: Social interaction and the Internet. Thousand Oaks, CA: Sage Publications.
Tidwell, L.C., & Walther, J.B. (2002). Computer-mediated communication effects on disclosure, impressions, and interpersonal evaluations: Getting to know one another a bit at a time. Human Communication Research, 28(3), 317–348.
Tourangeau, R. (2004). Survey research and societal change. Annual Review of Psychology, 55, 775–801.
Underhill, C., & Olmsted, M.G. (2003). An experimental comparison of computer-mediated and face-to-face focus groups. Social Science Computer Review, 21(4), 506–512.
Walther, J.B. (1996). Computer-mediated communication: Impersonal, interpersonal, and hyperpersonal interaction. Communication Research, 23, 3–43.
Walther, J.B., Anderson, J.F., & Park, D.W. (1994). Interpersonal effects in computer-mediated communication: A meta-analysis of social and antisocial communication. Communication Research, 21, 460–487.
Wang, P. (1999). Methodologies and methods for user behavioral research. Annual Review of Information Science and Technology, 34, 53–99.
Weber, R.P. (1990). Basic content analysis (2nd ed.). Newbury Park, CA: Sage Publications.
Young, S., Persichitte, K.A., & Tharp, D.D. (1998). Electronic mail interviews: Guidelines for conducting research. International Journal of Educational Telecommunications, 4(4), 291–299.


Ethnography in Participatory Design

Andy Crabtree
Centre for CSCW Research
Department of Sociology
Lancaster University
United Kingdom
Telephone: +44 1524 94683
E-mail: [email protected]

© Computer Professionals for Social Responsibility, 1998. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 1998 Participatory Design Conference, pp. 93-105, Seattle, Washington, 12-14 November: Computer Professionals for Social Responsibility.

ABSTRACT

Even the most cursory glance through recent proceedings of the biennial participatory design conference shows that ethnography is becoming an increasingly widespread technique in work-oriented design. This paper (1) explicates the rationale behind participatory design’s ‘turn to ethnography’; (2) identifies central problems with the technique’s employment from participatory design’s point of view; (3) presents methodological solutions developed in the course of designing a prototype supporting the work activities of some 2500 potential end-users distributed in over 250 offices around the world. Emphasis is placed on attention to working language as a reproducible means of getting hands-on work and organisation, particularly in large-scale settings.

Keywords

Work-oriented design, participatory design, ethnography, methodological problems, methodological solutions.

INTRODUCTION

Over the last decade a common core of techniques supporting user-involvement in systems design has emerged from within participatory design. Future workshops, mock-ups, and scenario construction are frequently employed in concert with prototyping as techniques of requirements specification. While effective in enabling user participation in design, these particular prototyping techniques are subject to an endemic problem of systems design. Specifically, in emphasising future possibilities, there is the danger of ‘tunnel vision’ and thus of coming up with perfect technological solutions to the wrong set of work problems [63]. Although techniques of participatory design, particularly experimental techniques, have gone a long way in reducing the significance of this problem in practical circumstances of design, the problem nevertheless remains an ever present danger. One course of action seen to be contributing to a potential solution has been to turn to ethnography, an approach which insists that rigorous attention be paid to the social organisation of current practice [36].

THE EMERGENCE OF COMMON PRACTICES IN WORK-ORIENTED PARTICIPATORY DESIGN

Since its inception in Scandinavia nearly thirty years ago, the concept of active user-involvement in work-oriented design has undergone some radical transformations. In its origins, the concept of user-involvement emphasised unmediated, trade union-oriented and (thus) institutionalised notions of participation in workplace design [51, 56, 57]. Today, more inclusive stakeholder notions emphasising interdisciplinarity and the development of practical techniques supporting user-involvement predominate [7, 30, 46]. Although a heterogeneous enterprise, and despite internal equivocation regarding this shift in focus [2, 3, 8], participatory design has enjoyed modest success in developing commonly applicable techniques that (potentially) support original ambitions of workplace democracy [44]. As awareness of the benefits of user-involvement in design has grown over recent years, these practical achievements (outlined below) have seen techniques of participatory design taken up, further developed and complemented with new techniques in Western Europe, North America, Australasia and beyond. In shifting focus from the politics of design to the

practicalities of design, participatory design has placed an emphasis on developing computer-based artefacts that resonate with or ‘fit’, and at the same time transform, the activities cum organisation of work in which they are to be embedded [66]. Following the Utopia project and the emergence of the ‘tool perspective’, end-users have been elevated from central focus in design to indispensable resource in so much as they are seen, and treated as a matter of policy, as the proper experts in details of work’s achievement. Subsequent emphasis in design has been placed on eliciting the tacit local knowledge and skills which characterise work and on supporting local knowledge and skills in a mutual, collective process of learning and design of potential technological solutions [23]. In the effort to understand work and its organisation, participatory design practices have been further elaborated through the inclusion of workplace analysts [14] and the development of cooperative user-designer techniques [11, 13, 26, 38]. Today, PD is a heterogeneous enterprise employing a wide range of practical techniques for enabling active user participation in design. Despite a vast array of evolving approaches, participatory design might nevertheless be said to consist in a common core of techniques supporting user-involvement in work-oriented design:

Future workshops: user-designer sessions intended to identify substantive ‘problems’ of work and alternatives from a user perspective [39].

Studies of work: typically (but not exclusively) preliminary studies of the workplace intended to shed light upon important aspects of practice requiring support [15].

Mock-ups: cardboard designs that serve in the game of envisioning future work, enabling users to experience and modify potential design-solutions [25].

Prototyping: the construction of the future through the preliminary and iterative design of potential systems enabling concrete experience and modification by prospective users [12].

Scenario construction: employed in developing potential applications, scenarios are open-ended hypothetical alternatives to current practice constructed and enacted by users and designers on the basis of representative instances of work activities [16].

These common techniques may be employed individually at various, selective stages in development, or in concert throughout development.

MAKING COMMON PRACTICES WORK

In treating users as the ultimate experts on what constitutes

appropriate computer support within their own context of work, participatory design is characteristically concerned with creating worklike contexts in which users and designers can formulate appropriate designs. Following future workshops and initial workplace studies, scenarios may be constructed. Scenarios are concerned with and informed by observations of specific work situations and are enacted and explored by users and designers alike through the use of mock-ups and/or prototypes in order:

• to explicate or make visible users’ taken-for-granted knowledge and skills;

• in simulating work through alternative technological means, to identify what is necessary to practice and what is contingently dependent on the current organisation of work;

• thereby, to enable users and designers to get hands-on future technical possibilities in concrete detail [49].

Scenarios are open-ended and continuously elaborated, developed and refined by users and designers hand-in-hand with mock-ups and prototypes throughout development until a concrete product – a fully functional prototype – emerges [45]. Prototyping’s strength lies in its orientation to future practice and the construction and iterative development of potential applications in (varying degrees of) cooperation with end-users [1, 17, 27]. Prototyping’s strength however, is also its weakness: in the iterative construction of potential applications lies the endemic problem of ‘tunnel vision’ – i.e. the danger of designing perfect technological solutions to wrong problems of work [1, 43, 63]. As Mogensen describes the situation: ‘First of all, prototyping is directed towards the future (potential computer applications) … Once the process of development of successive prototypes has started, the danger arises that one is led to elaborate the details of the current prototype instead of questioning its underlying premises.’ [49: 98]

The methodological problem alluded to here is not so much one of designing a potential application in successive iteration through the use of said techniques but rather, the methodological orientation taken in employing those techniques; namely the inherent orientation to the future. Recognising that effective prototyping depends on an adequate understanding of - and thus an orientation to current practice [31, 49], cooperative approaches to experimental prototyping have emerged in response to the problem of getting hands-on the current organisation of work [32]. The purpose of understanding current practice is two-fold. On the one hand current practice is oriented to in order to identify practical problems of work and thus, to

formulate initial ‘guesses’ as to what might constitute realistic possibilities for design. On the other hand, to elaborate those guesses through an experimental process in which prototypes are developed through confronting the practical problems embodied in current practice. In elaborating practical problems of work through a continuous process of analysis and design, participants ‘work up’ alternate futures through the cooperative formulation of concrete design-solutions to those problems. Thus, in experimentation both the current and the future are mutually and reciprocally elaborated through iteration and cooperation in analysis and design. Cooperative techniques of experimentation are predicated upon existing common practices of participatory design. Of particular but often downplayed importance are initial descriptions or studies of work. Initial descriptions of work facilitate the formulation of initial guesses as to what might constitute realistic possibilities for design. These guesses may then be explored in cooperation with end-users through the construction of scenarios and use of mock-ups and / or prototypes. As Morten Kyng points out, work descriptions ‘are descriptions of relevant, existing situations within the users’ workplace. Here the word relevant indicates that users find that these situations are important parts of their work and that currently they constitute a bottleneck, are error prone, or for other reasons need to be changed.’ [45: 94]

In the enactment of scenarios predicated on these descriptions, the prototype assumes the character of a ‘triggering artefact’, mediating analysis and allowing users and designers alike to investigate current practice, its problems, dynamics and constraints, in exposing current practice to alternate future possibilities [48, 50]. In exploring and experimenting with practice through designing-prototypes-with-users the future is ‘worked up’ in the present by elaborating current practice, and the problem of designing perfect solutions for wrong problems of work is, in principle at least, adequately resolved.

PROBLEM

Experimental techniques, like all techniques of participatory design, are predicated in their employment on what users consider relevant. Emerging from, working within and attempting to ‘handle’ the dialectics between tradition and transcendence1, Simonsen and Kensing suggest that while there can be no doubt that users should ‘be taken seriously’, failure to ‘take a closer look’ at enacted practice may result in inappropriate design.

1 The gap between current practice and future practice which further characterises participatory design [24].

Without observing practice in situ, as it is performed in the workplace, one may well come up with the perfect solution to the wrong problem: ‘The immediate learning experience from [our] research project was that “taking a closer look” did result in specific changes to our first design proposal. This was, to some extent, even a surprising result, as both we and the users found the first design proposal very appropriate.’ [62: 56]

The methodological problem alluded to here does not abnegate the notion that design should be predicated on what users find relevant. Rather, it is to point out that what users find relevant in the course of accomplishing participatory design activities, experimental or not, is not necessarily what they find relevant in the course of work’s accomplishment. Of course there is, quite frequently, a strong relationship between activities of work and participatory design – but a relationship is all that exists: the two are not the same. The problem here is well known and consists in the difficulty of articulating or otherwise making visible enacted practice in actual details of its enactment [45]. Although participatory design has devised a number of sophisticated techniques to deal with the problem, it is not fully resolvable through the sole application of such techniques. The reason: enacted practice is highly localised, contingent, and (above all) subject to continuous enquiry and discovery for practitioners themselves in the course of work’s accomplishment [59]. Thus, enacted practice is, to some significant extent, intransigent to explication in alternate contexts [35]; hence the need to ‘take a closer look’. Despite significant methodological developments in experimentation, the endemic problem emerging from the simulation of context and the intractable dialectics of tradition and transcendence maintains to some, not insignificant, extent [62]. It will continue to do so in so much as enacted practice is intransigent to adequate abstraction to – and thus visibility in – alternate contexts however artfully provided for. One problem that participatory design has faced for some time then, is that of developing complementary means of ‘taking a closer look’.

ETHNOGRAPHY - A CANDIDATE SOLUTION

Granting the need for a technique of getting hands-on current practice in actual details of its enactment does not answer the question as to which technique may be best suited to meet this need. In Scandinavia (at the very least) common agreement existed in the early 90’s however, as to the desirability of incorporating a sociological approach to work in systems design [42]. From several competing sociological schools of candidate solution, ethnography

emerged in applying an approach that facilitates the design of systems that resonate with or ‘fit’ work in context. The term ‘ethnography’ delineates little more than a distinction between quantitative and qualitative methods of social research. As Shapiro, commenting on the limits of ethnography in CSCW, remarks: ‘Ethnography can be put to the service of virtually any theoretical school: there are, for example, functionalist, structuralist, interactionist, Weberian and Marxist ethnographies.’ [61: 418]

This is not the place to explore the differences between such schools of thought. It is, however, to note that ethnography is anything but a unified method, indeed it is not really a method at all but, as Shapiro makes clear, is rather a gloss on various and different analytic frameworks2. Despite the disunity of ethnography it might nevertheless be said to entail a minimum orientation which has something to do with seeing social activities from the point of view of participants. As Randall et al. point out: ‘One “take” on this [orientation] … is the ethnomethodological one, in which members methods for accomplishing situations in and through the use of local rationalities becomes the topic of inquiry.’ [53: 330]

Ethnomethodologically informed ethnography’s primary topic of inquiry has been the world of work and organisation. Seen from ethnomethodology’s point of view, ethnography’s task is to identify the everyday methods and practical reasoning in and through the application of which activities of work are practically accomplished as routine, taken-for-granted activities within a working division of labour. Ethnomethodology focuses on the working division of labour as individuals are necessarily individuals-as-part-of-a-collectivity and much of their work therefore consists of the intersubjective coordination of tasks into an ongoing assemblage which just is the ‘organisation’ of work: the factory, the office, the air traffic control suite etc. In coming to understand the situated methods through the application of which workers accomplish and coordinate their activities as activities ‘within’ some unique or distinct assemblage [29, 64], ethnomethodologically informed ethnography displays the performance of work and production of organisation in skilful, intersubjective (i.e. social) details of its real-time achievement in contrast to idealised form [54]. Through its orientation to the social organisation of current practice, ethnomethodology has achieved some prominence in system design [64, 34, 35, 36]. In respect of these achievements, participatory design turned to this particular brand of ethnography as a (potentially) complementary means of getting hands-on current practice [65, 9, 10, 62, 40, 41].

2 This point draws attention to the distinction between gathering data and producing findings through analysis of the data gathered: data may be analysed in a multiplicity of ways for a multiplicity of purposes.

SOME PROBLEMS WITH ETHNOGRAPHY’S CANDIDACY

Despite achieving considerable prominence within CSCW, ethnomethodologically informed ethnography’s candidacy in participatory design has not been and is not now without its problems; some real, others putative3.

Interpretation

On a general level, participatory designers have suggested that to construe ethnography as a methodology supporting requirements gathering is to profoundly misrepresent and obscure its true nature as a vehicle of ‘cultural translation and representation’ [4]. Seen as a translation exercise, ethnography is construed as an interpretative activity which limits its (potential) input into design. As Harold Garfinkel, ethnomethodology’s founder, points out: ‘[Ethnomethodology] is not an interpretative enterprise. Enacted local practices are not texts which symbolise “meanings” or events. They are in detail identical with themselves, and not representative of something else. The witnessably recurrent details of ordinary everyday practices constitute their own reality. They are studied in their unmediated details and not as signed enterprises.’ [28: 8]

The concept of ‘interpretation’ is akin to that of ‘forming a hypothesis’ or ‘making an informed guess’ [68]. Ethnomethodology is not in the business of making informed guesses about enacted local practices but seeks to describe them in practitioners’ terms and actual details of their witnessable (re)occurrence which is the orderliness and thus (social) organisation of work: ethnomethodology describes what people do in observed and observable details of the doing. There is no hypothesising here then – this or that activity happened: the question is, in visible details of some particular activity’s, or family of activities’, (re)occurrence, how? Thus, despite occasional labelling to the contrary by its own practitioners, ethnography interprets nothing but seeks to rigorously describe and explicate the socially organised features of some family of activities’ (re)occurrence, thereby making visible the practices that systems will be embedded in and change4. Having said that, ethnography is unquestionably a means of cultural representation5 in so much as the approach, properly conducted, makes visible enacted local practices: the intersubjective workings of a culture, such as the workplace. It is the very ability to represent a culture’s workings – the shared, social ‘methods’ or practices of work’s situated accomplishment and coordination – that has enabled ethnography as a methodology supporting requirements specification in the design of CSCW systems.

3 Ethnomethodologically informed ethnography is simply referred to as ethnography from here on in as I am only concerned with ethnomethodologically informed ethnography from this point forward. It is worth bearing this point in mind to avoid confusion when considering claims about ethnography - any claims made about ethnography forthwith are claims about ethnomethodologically informed ethnography only.

Proxy User

One of the central problems with ethnography in participatory design has to do with the notion of the ethnographer as a proxy user. In one respect this is a nonsense as the ethnographer does not (or at least should not) seek to be a proxy user but rather, seeks to predicate design on enacted local practice. Of course the ethnographer can never know the work domain as users know it [35]. However, it is not the ethnographer’s task to speak on behalf of or represent users but the practices users enact, through attention to the recurrent details of their enactment. That is to say that the ethnographer seeks to represent the ‘job’, and more specifically, the intersubjective methods or social practices in and through which the ‘job’ gets done time and time again. Thus, the ethnographer is concerned with portraying those features of work that maintain regardless of individual. Nevertheless, criticism has been made [45] in light of remarks suggesting that ethnographers can act as ‘users’ champions’ in the early stages of design [6]. Ethnography has (and can have) no objection to direct user-involvement from the outset of design, although the economic realities of industrial design may well dictate otherwise – as advocates of participatory design are well aware [32]. However, as Bardram (1996) points out, to exclude users even from the initial stages of design and elect ethnographers as proxy creates a potential problem of ‘one-way communication between users and designers, meaning that information is floating from the work practices to the designer, but no information about the future technology, the use of computers etc., is floating back to the future users in the workplace.’ [4: 616]

4 The methodological issue of interpretation is both a complex and subtle one which is addressed at greater length in the forthcoming paper Ethnomethodologically Informed Ethnography and Information Systems Design [22].

5 Where representation is understood in the sense of to stand or act in the place of, as a proxy (Webster’s: 1994).

In short, exclusion of users from initial design limits requirements formulation, thus affecting the efficacy of the design process as a whole. While it needs to be recognised that ethnographers frequently act as communicative agents between users and designers, the potential problem of one-way communication, and thus the isolation of users and designers, is a significant problem to be reckoned with. However, the endemic problem of tunnel vision in design suggests not only that users and designers should be in direct contact from the outset of design, whenever possible, but also that ethnographers should be an integral link in that chain if design is not to go astray in this way.

Intervention

System design is characterised as a process of change: design is an intervening activity. In the course of its participation in design, ethnography has characterised itself as ‘non-intervening’: ‘Ethnography insists that its inquiries be conducted in a nondisruptive and non-interventionist manner, principles that cannot be compromised given that much of the motivation for IT is to reorganise work.’ [36: 431]

Comments such as this have led many participatory designers to criticise ethnography as failing to recognise the dynamics of design [4, 32, 45, 49]. That ethnography in a sociological mode should advocate a non-interventionist attitude I find curious. From its origins, sociology has been explicitly concerned with the issue of social change; not simply as a topic of sociological inquiry but, more importantly, as the point and purpose of sociological inquiry. Sociological findings were, from their very conception let alone production, to be put to use in changing society, and ethnographers are very much involved in bringing about social change, particularly in working order through technological design. To take a non-interventionist attitude is not only wholly incompatible with the ethos of sociology but also incongruent with the design activities within which ethnography is embedded and performed. Having said that, as Hughes et al. [36] point out, there are some principles at work here ‘that cannot be compromised’. The most important principle is the notion of maintaining faithfulness to the phenomenon. If system design depends on an adequate understanding of enacted practice, then it needs to achieve a congruent understanding of practice’s workings on any occasion of design. Ethnography’s success here (to date) depends on it observing work in situ in a nondisruptive manner. This is not a negotiable matter, but a condition of effective organisational change through design, as anybody can change practice. If one is not aware of the

social characteristics of the job which are work’s guarantee, however, design may well fail or, worse, impinge upon working life in ways that are detrimental to workers and business alike [34, 36, 53]. Motivated by change, participatory designers frequently emphasise the need to take action and intervene. Intervening in the absence of sufficient knowledge of enacted practice can hardly be construed as best practice in any respect [62]. To require that an approach to understanding and getting hands-on enacted practice be non-disruptive is not to advocate that the understandings produced by that approach be non-interventionist: what one uses the understanding for, and how, is an entirely different matter6 [40].

Current Practice

Ethnography’s orientation to enacted local practice has given rise to criticism to the effect that it ‘fetishes’ current practice at the (potential) expense of future conceptions of work [50]. Criticisms such as this are not intended to abnegate an attention to current practice but draw attention to the proper place of such an attention in design. As Mogensen describes it: ‘Current practice imposes a number of constraints on potential applications’ and as such ‘current practice often contains the keys to what “guesses” could be appropriate.’ [49: 98]

The point and purpose in attending to current practice is to discover realistic possibilities for design where the notion of ‘realistic’ is understood in the context of ‘constraints’ or features of practice which are integral to the continued performance of work. Ethnography could not agree more: ‘Ethnography .. brings a particular focus to the analysis of systems in use and thereby outlines the “play of possibilities” for system design …[Thus] we are not making .. a defence of current practice [but explicating] .. possibilities that good design should not ignore’. [53: 337]

Ethnography does not ‘look’ at current practice for its own sake then, but in order to identify ‘essential’ characteristics of practice on any occasion of design. Specifically, the shared methods in and through which activities are accomplished and coordinated – the what and how of practice so to speak – and the practical reasoning underpinning activities – the why of practice. Knowing the ‘what’, the ‘how’ and the ‘why’ of practice as of enacted detail is to understand the realistic ‘play of possibilities’. If design is to be effective, it must be able to get hands-on the realistic play of possibilities on any occasion of design, hence ethnography’s attention to current practice. It should also be said that identifying these features of practice is not only an initial concern in design but a concern that runs throughout development [19]. Ethnography ought to influence and run in parallel to exploratory and experimental activities of user participation, thus enabling design to maintain an adequate grasp on current practice in ‘working up’ the future through cooperation in design [21].

6 It might also be said that to recognise design’s interventionist character is not to buy into any preconceived notions as to what intervention ‘is all about’: appropriate intervention depends on the situated character of the phenomenon in question, not on prior formulations of what the phenomenon might be.

Implications for design

The greatest problem ethnography faces is that of ‘linking’ its findings to system specifications [4, 36, 52, 61]. The issue has been treated in two ways by ethnographers to date: one, through the development of structured means such as DNP (COMIC Del. 2.2*), which consists in developing computer support for organising ethnographic findings and formulating abstractions; and two, by reformulating the problem. In the case of the latter, for example: in considering the working practices of ethnographers and designers, Plowman et al. [19] note that specifications for design are routinely generated through internal reports and discussions with designers. Reflecting on the character of internal reports and discussions, Plowman et al. suggest that ethnographic studies ‘impart knowledge to design’ rather than ‘give form to design’. Ethnographic studies are ‘informative’, and that is all they are supposed to be. Such a reformulation of the problem is untenable. It does, however, encapsulate a common attitude and one which is detrimental to ethnographic study in design (in the longer term at least). As Shapiro makes forcefully clear: ‘Any role at all for sociologists in this field rests on their claim to being in a better position to identify particular aspects of “what is really going on” in a given field of work and “what is really the problem” that people encounter in doing it. If this claim is not sustainable then sociologists have no contribution to make to systems design.’ [60: 21]

If ethnography cannot support system developers in the redesign of work rather than ‘run for cover’ [61], then it has no business in design. Having said that, findings in hand, it is no part of ethnography’s remit to come up with actual design-solutions [54]. Design-solutions are the indisputable task of the participatory designer cum software engineer, and users. Ethnography’s task is to develop commonly applicable means of discovering and linking ‘what is really going on’, why and how, in ways that support the formulation of potential design-solutions. Of course, the problem is how ethnography might achieve this in ways that are readily assimilable by software engineers and users.

* www.comp.lancs.ac.uk/computing/research/cseg/comic/

‘LINKING’ ETHNOGRAPHY TO DESIGN

The central problem of linking ethnography to design is a problem of method. As Kensing and Simonsen describe the situation: ‘though we have learned that applying ethnography contributed to [our] result, it is impossible to specify .. precisely which techniques gave which kind of insight.’ [62: 56]

Although firm advocates and developers of ethnography as a means of getting hands on enacted practice in the attempt to solve the endemic problem of designing perfect solutions to wrong problems of work, Kensing et al. point out that issues of technique are still problematic [40]. In attempting to support the continued assimilation of ethnography in work-oriented practices of participatory design, below I outline the ethnographic method developed in the course of the designing a prototype for a global customer service system. Immersion in the setting [36], standard use of audio or videotape7 [65] and description of work in its own term’s [9] are taken-for-granted here. The concern here is with language, its relationship to the routine performance of work, and with the production of concrete resources supporting the formulation of concrete design-solutions. Properly speaking, what is outlined below is not simply a method but a methodology: way of working and rationale of work are two sides of the same coin. Language: a candidate methodological solution

The notion of language as methodological solution to the problem of securing empirical reference (getting hands-on current practice) has some pedigree within the social sciences [37]. One 'take' on this point of view emerged in the course of interdisciplinary work in developing a global customer service system supporting the commercial activities of a large, geographically distributed container shipping company. The organisation's staff, some 2,500 members, work out of two hundred and fifty offices in over seventy countries, providing world-wide coverage. The first and biggest problem the project presented was its sheer scale: how is one supposed to get hands-on a globally distributed company of that size?

Language-games and organisations

A fundamental feature of all human practice is language. Different practices have different 'grammars' – i.e. they all use natural language but do so in distinct ways constitutive of distinct practices. Thus, the language of container shipping is different to the language of rail transport, rail transport different to sociology, sociology different to computer science, and so on. In its use, language is constitutive of distinct practices, and the language of any practice is distinct in and as itself: as the practice of container shipping, rail transport, sociology or computer science etc. Borrowing a metaphor from Wittgenstein, I characterise a distinct practice as a 'language-game'. From my own point of view, to understand a language-game is to understand a distinct organisation of work: a practice or, more precisely, a family of practices. It should be said that the methodological reason for invoking the notion of a language-game is not simply to draw attention to the relationship between language and organisation. Rather, in so much as language is practice [67], it is to point out that attention to a working language is a primary means of discovering organisation in and as the normal, natural course of work's accomplishment8. What one sees in getting hands-on the language-game is practice, and thus organisation, in its own terms, in real world detail providing for the possibility of effective technological support. Thus, to understand the language-game of customer service in the container shipping business, for example, is to understand, in actual details of accomplishment, that complex of categorised activities in and through which customer service work is achieved and some element of a unique organisation produced.

8. As Pelle Ehn reminds us: 'To design new artefacts that are useful for people, designers have to understand the language-games of the use activity.' [24: 108]

Organisations and language-game concepts

The strength of this orientation to practice lies in an all too frequently glossed feature of work in large organisations. No matter what their size, work is achieved locally, in small settings: in offices, workshops, and on factory floors etc., consisting in sections, sub-sections, work groups and so on, all of which consist in a relatively small number of members. Widespread, even global, practice emerges from the implementation and routine accomplishment of predefined procedures in small settings and assemblies of work. Thus, to get hands-on practice in one location is to get hands-on it in another in so much as (and only in so much as) the same pre-defined procedures of work apply, which they frequently do, hence there being such a thing as 'common' practice whether at local, regional or global levels; different procedures, then different organisations of work (as one frequently finds at regional levels in a global scheme9). In their application, common procedures of work are rendered intelligible (and thus discoverable) through unique concepts: this set of procedures is called X, that set of procedures called Y. Furthermore, this or that set of procedures is applied (and work thus performed) by persons occupying discrete positions within the working division of labour. Likewise, these positions are rendered intelligible through concepts found in each and every local setting of work: in container shipping, for example, whether in Europe, Asia or America, one finds persons occupying positions dealing with 'pricing', 'export handling', 'documentation' etc. in customer service. In other words, the working division of labour is a categorised framework enabling the identification of common practices of work. These practices consist of the achievement of pre-defined procedures, which are themselves categorised and related. One may get hands-on work, then, by 'mapping the grammar' of the language-game10 [68]. In order to map a language-game's grammar it is necessary to adopt the ethnographic stance – to observe practice itself. The purpose here is to document language-game categories or concepts as enacted concepts. The first step is to get hands-on the working division of labour. This is achieved by mapping the primary concepts constitutive of practice, or the area of practice in which design is interested, and their interrelations:

Example 1.0: In developing GCSS we were concerned with developing technological support for a distinct area of organisational practice known as 'customer service'. The primary concepts at work here are 'quoting', 'pricing', 'export booking', 'allocation', 'documentation', 'inbound handling'. Interrelations are respective – quoting (standard rates) and pricing (non-standard rates) relate to booking and allocation (one formulates and issues a financial rate and, if accepted, does a booking and assigns cargo to specific vessels); booking to documentation (having booked cargo and loaded the container on a specific vessel, legal documentation must be made to cover its shipment); documentation to inbound handling (having shipped cargo to some point, arrangements for its release and delivery must be made).

9. In the context of container shipping, Europe works in one way, Asia another, for example, although work within these regions is much the same as the same procedures of work apply. In so much as local variations do occasionally occur, then they are conceptually distinct and thus mappable. See [20] for further detail.

10. That is to say, by describing the practised ways in which categorised positions and procedures of work are recurrently achieved and related to other categorised positions and procedures.

These concepts were discovered through attention to the membership categories employed by persons embedded in, and occupying discrete positions within, the working division of labour: people who, as a matter of daily routine, do 'quotes', 'pricing', 'export handling' etc. Having identified the primary concepts and the sense in which they relate to one another, the next step is to map the grammar of each primary concept. Each primary concept consists in a family of activity specific or relational concepts [55, 5]. Mapping the grammar of each primary concept thus consists in identifying relational concepts and mapping their individual grammatical features:

Example 1.1: The primary concept of 'export handling' consists in the relational concepts of 'preliminary booking', 'freight type' (plus the categories 'full load', 'partial load', 'over size', 'dangerous', all of which are associated with other activity specific concepts: 'over size' to 'dimensions', for example), 'routing', 'space allocation', 'pricing', 'planning', 'inland haulage', 'confirmation', 'notification'11.

11. In practice, each concept would be described in the details of its constitution: the actions, collaborations, temporal character of the work, tools (no matter how mundane) and information produced and used etc.

By mapping individual grammatical features I mean this: insofar as language-game concepts are enacted, mapping a relational concept's grammatical features consists of describing the actions in and through which the activity being mapped is recurrently accomplished:

Example 1.2: In mapping the primary concept 'booking' and the relational concept 'over size' we must, in addition to regular booking concepts, map the concepts of 'dimension' - which consists in obtaining and inserting details of 'length', 'width', 'height' and 'weight' into the system - and 'acceptance': the shipment of 'over size' freight must be 'accepted' by the vessel 'coordinator'. The work of acceptance consists in sending a telex marked 'OOG' to the coordinator who, having assessed the feasibility of carrying the freight and the availability of space on the vessel, approves shipment by inserting 'A' for accept and returns the telex; over size bookings cannot be confirmed without being accepted by the coordinator.

Despite its simplicity, the above example, which is greatly abstracted as space limits what can be shown here, serves to demonstrate that mapping a relational concept's grammatical features not only makes the constitutive details of a particular activity's accomplishment visible but also, and at the same time, renders apparent the embodied ways in which that accomplishment relates to or is coordinated with other activities within the working division of labour. Mapping a language-game's grammar not only makes visible the ways in which work is intersubjectively orchestrated and achieved as a matter of everyday routine, then, but in so doing secures a particular relevance for design in making visible what the 'game' is and how it is played. This issue goes to the heart of systems design, for in achieving an understanding of the 'game' (the family of practices an organisation consists of as a phenomenon in action) and how it is played, how its constituent activities 'hang together' as activities in playing the game, we come to understand what playing the game depends on and thus what is necessary or essential to practice and what is contingent on the current organisation of the game. In other words, in mapping grammar and thereby achieving an understanding of the situated ways in which the 'game' is played, we come to understand what is and what is not amenable to change. Thus, we come to see what practices playing the 'game' relies upon. A fortiori, mapping grammar contributes to the resolution of the classical problem of tradition and transcendence on any occasion of design [24]. In doing so it contributes to the solution of the endemic problem of tunnel vision. Furthermore, in mapping the grammar of language-game concepts, we furnish concrete resources for design.
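A mapped grammar of this kind can also be recorded as a simple, machine-readable structure that designers can inspect and extend alongside the prose description. The following Python sketch is purely illustrative: the concept names are taken from Examples 1.0-1.2, but the data structure, the field names and the small confirmation rule are assumptions made for the sake of the example, not artefacts of the GCSS project.

```python
# Hypothetical sketch: recording a mapped language-game grammar as plain data.
# Concept names come from Examples 1.0-1.2; the structure itself is assumed.
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A primary or relational language-game concept and its grammar."""
    name: str
    relational: list = field(default_factory=list)   # activity-specific concepts
    related_to: list = field(default_factory=list)   # other primary concepts
    notes: str = ""                                   # how the activity is recurrently done

# Primary concepts of customer service (Example 1.0) and their interrelations.
customer_service = {
    "quoting": Concept("quoting", related_to=["export booking", "allocation"],
                       notes="standard rates; if accepted, leads to a booking"),
    "pricing": Concept("pricing", related_to=["export booking", "allocation"],
                       notes="non-standard rates"),
    "export booking": Concept(
        "export booking",
        relational=["preliminary booking", "freight type", "routing",
                    "space allocation", "pricing", "planning",
                    "inland haulage", "confirmation", "notification"],
        related_to=["documentation"]),
    "allocation": Concept("allocation", related_to=["export booking"],
                          notes="assigning cargo to specific vessels"),
    "documentation": Concept("documentation", related_to=["inbound handling"],
                             notes="legal cover for shipment once loaded"),
    "inbound handling": Concept("inbound handling",
                                notes="release and delivery at destination"),
}

# One grammatical feature from Example 1.2: an 'over size' booking cannot be
# confirmed until dimensions are recorded and the vessel coordinator accepts.
def can_confirm_oversize(dimensions_recorded: bool, coordinator_accepted: bool) -> bool:
    return dimensions_recorded and coordinator_accepted

if __name__ == "__main__":
    print(customer_service["export booking"].relational)
    print(can_confirm_oversize(dimensions_recorded=True, coordinator_accepted=False))
```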

Instances of language-game concepts

Primary and relational language-game concepts are mapped through the provision of 'instances'. Instances are concrete cases of concepts-in-use, of activities-being-done, of work-in-progress [10]. They describe, in real world detail, the social organisation of this or that concept; they describe the practices in and through which this element of the game is played. Specifically, instances display: shared, intersubjective techniques or ways of working; the artefacts used, including information worked on and transformed in the working; and, of the utmost importance, the practical reasoning or point and purpose for which information is being worked on. Language-game concepts are mapped through real world instances of concepts-in-use and, as such, instances delineate a fluid movement of action and interaction in real time. Instances may be provided in the form of video or audio recordings of work-in-progress and by transcripts of informal interviews with staff in their actual settings of work which incorporate copies of artefacts-in-use (screen dumps, documents, hard copy files etc.). In so much as instances display practice, they provide methodical detail of work's real time accomplishment: these methods, like the practices constitutive of chess, are the practices whereby the 'game' is played by any competent member. Thus, instances furnish concrete topics and resources for design.

An instance of a language-game concept

In the normal, natural course of customer service work in container shipping, 'allocation' is an activity concerned with assigning cargo to a particular vessel. In discussion with the project's participatory designer and the organisation's project management, allocation was 'scoped' as a matter of specifying rules regarding weight, financial margins, type of containers etc., and displaying allocation figures per vessel and office. Through experimentation with allocation functions in workshops, it became apparent to the developers that the 'scope' needed to be extended, specifically to enable 'taking action when space pressed'. Ethnographic studies of the work were undertaken and, although a contingent activity, 'taking action when space pressed' transpired to be an everyday activity accomplished in routine (or recurrent) ways. The routine character of work here consisted in export handlers informing the 'capacity manager' of the current state of affairs and of prospective business by telex, and asking for an according increase in allocation for the space pressed vessel. The capacity manager coordinates all requests from export handlers through the use of hard-copy vessel specific allocation sheets, informs 'line management' of the actual and prospective state of affairs by using a computer based artefact akin to an edit sheet, and requests an according increase in allocation. Line management checks the actual state of affairs for the vessel in all regional offices through an on-line vessel specific allocation overview and, if any regional office is underbooked and, as the prospects indicate, does not look likely to achieve its allocation, grants the request. A vessel becomes increasingly space pressed the closer it gets to its arrival / departure date. Thus, taking action is typically a 'last minute' activity which is vital to the well being of the business, as the company wishes to maximise its operational capacities and get each vessel as full as possible, and full with cargo generating the most income. Given the 'last minute' character of allocation, it is not uncommon for several, if not all, offices to be competing for space. Thus, when calculating 'prospects', capacity managers often 'add' an excess to the total figure, knowing that line management will probably not be able to give it to them but, in responding to over-estimated prospects, will probably give them something close to what they really need. This 'negotiation' is on-going and becomes increasingly frenetic the closer the arrival / departure date comes. Compromise is the norm here and capacity managers routinely have to 'roll' some cargo to the next available vessel, which may well be a week away; rolling consists in using an on-line export overview displaying customer, commodity, number of containers and other details of relevance to making a decision as to who and what can be rolled in maximising operational capacities and cost-benefit. So the next vessel …
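Instances like this one might also be catalogued as structured records so that they can be indexed and revisited during prototyping. The sketch below is a hypothetical illustration only: the Instance fields simply restate what the text says an instance should display (techniques, artefacts, practical reasoning, recordings), and the filled-in example paraphrases the allocation story above; neither reproduces a format actually used in the project described here.

```python
# Hypothetical sketch of an instance record; all field names are illustrative
# assumptions, not a cataloguing format from the GCSS project.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Instance:
    concept: str                 # the language-game concept the instance displays
    setting: str                 # where the work was observed
    techniques: List[str]        # shared, intersubjective ways of working
    artefacts: List[str]         # information and tools worked with
    practical_reasoning: str     # the point and purpose of the work
    recordings: List[str] = field(default_factory=list)  # video/audio/transcript refs

allocation_instance = Instance(
    concept="allocation / taking action when space pressed",
    setting="export handling and capacity management offices",
    techniques=["telex requests for increased allocation",
                "coordination via vessel-specific allocation sheets",
                "over-estimating prospects to negotiate space",
                "rolling cargo to the next available vessel"],
    artefacts=["telex", "hard-copy allocation sheets",
               "edit-sheet style computer artefact",
               "on-line vessel allocation overview", "on-line export overview"],
    practical_reasoning="maximise vessel capacity with the highest-earning cargo "
                        "as arrival/departure dates approach",
    recordings=["interview transcripts", "screen dumps", "copies of telexes"],
)

if __name__ == "__main__":
    print(allocation_instance.concept, "-", allocation_instance.practical_reasoning)
```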

Analysing instances

In mapping the grammar of 'allocation' in details of that concept's enactment, it became apparent that, in addition to the scoped requirements, the 'problem' of work we had to support, if the system was to adequately support the daily accomplishment of work, consisted in providing for the accomplishment and coordination of activities between export handling and capacity management on the one hand, and capacity management and line management on the other. In coming to understand the rationale of the work by mapping the situated ways in which taking action when space pressed was routinely accomplished, and discussing the details of that accomplishment with the participatory designer and users, it became apparent how we might go about solving that problem. This is not to say that we sought to reproduce existing mechanisms of coordination but rather that, in coming to understand the social organisation of the work through observing such mechanisms in use, we came to understand just what kind of design-solutions were realistically possible. Specifically, design-solutions would have to: enable communication between export handlers and capacity management on a case by case basis; enable the capacity manager to coordinate cases; give an overview of the actual and prospective state of affairs throughout local export handling per vessel; give an overview of roll criteria; enable line management to get an overview of the actual and prospective state of affairs throughout regional export handling per vessel; and enable capacity management and line management to 'negotiate' allocation on a contingent, moment-by-moment basis. Achieving an understanding of real world working practice through mapping grammar, and thereby documenting the actual details of work's accomplishment, allows us to identify practical problems of work and situated, intersubjective methods of solution which, taken together, provide for the development of systems that support, and at the same time transform, the activities in which they are to be embedded. Instances of language-game concepts-in-use facilitate the specification of requirements in that they delineate a problem-space emergent from practice itself. Furthermore: in illuminating the ways in which staff routinely go about solving the problem, instances of concepts-in-use delineate a solution-space rich in productional detail providing for the initial formulation of concrete design-solutions. One product emerging from the orientation to the socially organised features of the allocation instance, for example, was the development of a flexible overview enabling coordination and negotiation. Having said that, design-solutions such as the overview are not the product of ethnography alone but of ethnography, object-orientation and participatory design working in concert with end-users in a process of evolutionary prototyping12. The instance is, one might say, a concrete starting point for design; which is not to say that one must have a collection of instances prior to design but rather that they should be generated throughout the course of design in concert with exploratory and experimental activities. Instances are concrete starting points in that they display the social organisation of activities of work in real time and as such they circumscribe a problem-solution space for design. In this respect instances of concepts-in-use not only enable design to get hands-on practice in details of its enactment but also, and at the same time, 'link' ethnography to design in a readily assimilable way. As such, instances furnish concrete topics for design and serve as resources, sensitising designers to the subtleties of work's real time accomplishment in a manner that provides for the formulation of concrete design-solutions. Instances are invaluable resources in grounding design in practice and its constituent details, then. They enable system design to get hands-on the language-games of use activities - thus contributing to the resolution of the problems of tradition and transcendence, and tunnel vision - and do so in a rigorous, reproducible fashion, furnishing transformable resources in the process.

12. The confines of this paper exclude elaboration of that, somewhat complex, process. For an explication of the process and its formal characteristics see [18, 21].
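To make the kind of design-solution at stake here slightly more concrete, the following sketch models a minimal 'flexible overview' of the sort described: per-office actuals and prospects per vessel, case-by-case requests from export handlers, and a crude stand-in for line management's granting decision. All names, fields and the TEU unit are assumptions introduced for illustration; the sketch does not reproduce the actual GCSS overview or its logic.

```python
# Illustrative sketch only: one way a "flexible overview enabling coordination
# and negotiation" could be modelled. Names and fields are assumptions.
from dataclasses import dataclass

@dataclass
class OfficeStatus:
    office: str
    allocation_teu: int      # space currently granted to this office
    booked_teu: int          # actual bookings against that allocation
    prospects_teu: int       # prospective business reported by export handlers

@dataclass
class AllocationRequest:
    office: str
    vessel: str
    extra_teu: int
    reason: str              # the case-by-case grounds export handlers give

def vessel_overview(statuses):
    """Actual and prospective state of affairs per office for one vessel."""
    return {s.office: {"unused": s.allocation_teu - s.booked_teu,
                       "prospects": s.prospects_teu} for s in statuses}

def grant(request, statuses):
    """Grant a request only if some other office is underbooked and its
    prospects suggest it will not use its allocation (a crude stand-in for
    line management's judgement)."""
    for s in statuses:
        if s.office != request.office:
            spare = s.allocation_teu - s.booked_teu - s.prospects_teu
            if spare >= request.extra_teu:
                return True
    return False

if __name__ == "__main__":
    statuses = [OfficeStatus("Rotterdam", 400, 390, 40),
                OfficeStatus("Singapore", 400, 250, 60)]
    req = AllocationRequest("Rotterdam", "vessel 0815", 30,
                            "two late bookings, high-earning cargo")
    print(vessel_overview(statuses))
    print("granted" if grant(req, statuses) else "negotiate or roll cargo")
```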

ETHNOGRAPHY IN PARTICIPATORY DESIGN

User-involvement has undergone some radical transformations since its inception. In shifting emphasis from institutional notions of user participation in design to more technologically oriented means, prototyping has emerged as a common (potential) solution to the problem of accomplishing (more) democratic organisational change. Prototyping's strength lies in its orientation to the future and, in participatory design's case at least, the formulation of potential futures in active cooperation with users. The strengths of prototyping are also its weaknesses, however. On the one hand, in orienting to the future lies the endemic danger of tunnel vision: designing the perfect solution to the wrong problem(s) of work. On the other hand, what users find relevant in the course of participatory design activities is not necessarily what they find relevant in the course of accomplishing work. No matter how artfully provided for, the problem of explicating enacted practice in alternate contexts cannot be fully resolved. In attempting to solve the problem of designing perfect solutions to wrong problems of work by getting hands-on enacted practice in details of its enactment, participatory design has turned to ethnography as one complementary, candidate solution. Incorporating ethnography into participatory design in readily assimilable and reproducible ways has proved to be problematic, however. In explicating the ethnographic techniques employed in the design of a global customer service system, it has here been suggested that attention to working language provides a rigorous, reproducible and complementary means of getting hands-on work and organisation, and of linking ethnography to design. In treating practice as a language-game and mapping the grammar of its concepts-in-use, instances of the intersubjective ways in which work is routinely accomplished and coordinated in real time are provided as resources for design. In the analysis of instances, practical problems of work are displayed. More: the everyday ways in which practitioners routinely solve those problems are displayed. Still further: instances display the rationale of work and thus make visible 'what the work is really all about' in actual details of its achievement.

In the technological transformation of these findings – i.e. in prototyping – current and future working practice may be further elaborated through experimentation and continued ethnographic inquiry until a concrete application thoroughly grounded in, and at the same time transforming, everyday organisational practice emerges. In conclusion, it might be said that participatory design relies on obtaining an adequate understanding of the language-game of the use activity. Working in parallel with experimental techniques, that goal may in significant part be achieved on any occasion of design by attending 1) to the working division of labour and the membership categories employed by persons embedded therein; and 2) to the categories members use to make their activities intelligible to each other and to the inquirer alike. Concrete resources – instances – supporting the formulation of design-solutions may be furnished by describing the recurrent activities of the language-game categories or concepts employed: 'a language-game is something that consists in the recurrent procedures of the game in time.' [69: 519]

ACKNOWLEDGEMENTS

This research was funded by the Danish National Centre for IT-Research, research grant COT 74.4, and made possible by the members of the Dragon Project. Many thanks.

REFERENCES

1. Bally, L., Brittan, J., Wagner, K.H. (1977) A Prototype Approach to Information System Design and Development, Information & Management, vol 1.
2. Bansler, J.P. & Kraft, P. (1994a) The Collective Resource Approach: The Scandinavian Experience, Scandinavian Journal of Information Systems, 6 (1).
3. Bansler, J.P. & Kraft, P. (1994b) Privilege and Invisibility in the New Work Order: A Reply to Kyng, Scandinavian Journal of Information Systems, 6 (1).
4. Bardram, J.E. (1996) The Role of Workplace Studies in Design of CSCW Systems, Proceedings of IRIS 19, 613-629, Gothenburg Studies in Informatics, Sweden.
5. Benson, D. & Hughes, J.A. (1983) The Use of Categories and the Display of Culture, The Perspective of Ethnomethodology, Longman: London.
6. Bentley, R., et al. (1992) Ethnographically Informed Systems Design for Air Traffic Control, Proceedings of CSCW ‘92, Toronto, Canada: ACM Press.
7. Bjerknes, G. et al. (1987) Computers and Democracy: A Scandinavian Challenge, Aldershot: Avebury.
8. Bjerknes, G. & Bratteteig, T. (1994) User Participation, Proceedings of PDC ‘94, Chapel Hill, NC: CPSR.
9. Blomberg, J., et al. (1993) Ethnographic Field Methods and Their Relation to Design, Participatory Design (eds. Schuler, D. et al.), Hillsdale, NJ: Lawrence Erlbaum Associates.
10. Blomberg, J., Suchman, L., Trigg, R. (1994) Reflections on a Work-Oriented Design Project, Proceedings of PDC ‘94, Chapel Hill, NC: CPSR.
11. Bødker, S., et al. (1987) A UTOPIAN Experience, Computers and Democracy (eds. Bjerknes, G. et al.), Avebury: Aldershot.
12. Bødker, S. & Grønbæk, K. (1991) Cooperative Prototyping, International Journal of Man-Machine Studies, 34 (3).
13. Bødker, S. & Grønbæk, K. (1991) From Prototyping by Demonstration to Cooperative Prototyping, Design at Work (eds. Greenbaum, J. et al.), Hillsdale, NJ: Lawrence Erlbaum Associates.
14. Bødker, S., et al. (1993) The AT-Project, Computer Science Dept., Århus University: Daimi PB-454.
15. Bødker, S., et al. (1993) Cooperative Design, Participatory Design (eds. Schuler, D. et al.), Hillsdale, NJ: Lawrence Erlbaum.
16. Bødker, S. et al. (1995) A Conceptual Toolbox for Designing CSCW Applications, Proceedings of COOP ‘95, France: ACM Press.

17. Boehm, B.W. (1988) A Spiral Model of Software Development and Enhancement, Computer, vol 21.
18. Christensen, M., et al. (1998) Multiperspective Application Development in Evolutionary Prototyping, Proceedings of ECOOP ‘98, Belgium: Springer.
19. COMIC Deliverable 2.2 (1994) Field Studies and CSCW, (eds.) Lancaster & Manchester University.
20. Crabtree, A. (1998) Talking Work: Language-Games, Organisations and CSCW, Computer Supported Cooperative Work: The Journal of Collaborative Computing, to appear, the Netherlands: Kluwer.
21. Crabtree, A. & Mogensen, P. (1998) The Relevance of Specifics and the Specifics of Relevance, Computer Science Dept., Århus University.
22. Crabtree, A., Nichols, D.M., O’Brien, J., Rouncefield, M., Twidale, M. (1998) Ethnomethodologically Informed Ethnography and Information Systems Design, Computing Dept., Lancaster University.
23. Ehn, P. & Kyng, M. (1987) The Collective Resource Approach ..., Computers and Democracy (eds. Bjerknes, G. et al.), Aldershot: Avebury.
24. Ehn, P. (1988) Work-Oriented Design of Computer Artefacts, Stockholm, Sweden: Arbetslivscentrum.
25. Ehn, P. & Kyng, M. (1991) Cardboard Computers, Design at Work (eds. Greenbaum, J. et al.), Hillsdale, NJ: Lawrence Erlbaum Associates.
26. Ehn, P. & Sjögren, D. (1991) From System Descriptions to Scripts for Action, Design at Work (eds. Greenbaum, J. & Kyng, M.), Hillsdale, NJ: Lawrence Erlbaum Associates.
27. Floyd, C. (1984) A Systematic Look at Prototyping, Approaches to Prototyping, Berlin: Springer-Verlag.
28. Garfinkel, H. (1996) Ethnomethodology's Programme, Social Psychology Quarterly, 59 (1).
29. Garfinkel, H. (1967) Studies in Ethnomethodology, Englewood-Cliffs, NJ: Prentice-Hall.
30. Greenbaum, J. & Kyng, M. (1991) Design at Work: Cooperative Design of Computer Systems, Hillsdale, NJ: Lawrence Erlbaum Associates.
31. Grønbæk, K. (1991) Prototyping and Active User Involvement in System Development, Ph.D. thesis, Computer Science Dept., Århus University.
32. Grønbæk, K. et al. (1995) Cooperative Experimental System Development, Proceedings of Computers in Context, Computer Science Dept., Århus University.
33. Grudin, J. (1989) Why Groupware Applications Fail, Office: Technology and People, 4 (3).
34. Hughes, J.A., Randall, D., Shapiro, D. (1992) Faltering from Ethnography to Design, Proceedings of CSCW ‘92, Toronto, Canada: ACM Press.
35. Hughes, J.A., Randall, D., Shapiro, D. (1993) From Ethnographic Record to System Design, CSCW: The Journal of Collaborative Computing, 1 (3).

36. Hughes, J.A. et al. (1994) Moving Out of the Control Room: Ethnography in System Design, Proceedings of CSCW ‘94, Chapel Hill, NC: ACM Press.
37. Hughes, J.A. & Sharrock, W.W. (1997) The Philosophy of Social Research, Longman: London.
38. Kensing, F. (1987) Generating Visions in System Development, System Design for Human Development and Productivity, Amsterdam: North-Holland.
39. Kensing, F. & Madsen, K.H. (1991) Generating Visions, Design at Work (eds. Greenbaum, J. & Kyng, M.), Hillsdale, NJ: Lawrence Erlbaum Associates.
40. Kensing, F., Simonsen, J., Bødker, K. (1996) MUST: A Method for Participatory Design, Proceedings of PDC ‘96, Cambridge, MA: CPSR.
41. Kensing, F. & Simonsen, J. (1997) Using Ethnography in Contextual Design, Communications of the ACM, 40 (7).
42. Knudsen, T. et al. (1993) The Scandinavian Approaches, Proceedings of IRIS 16, Computer Science Dept: University of Copenhagen.
43. Kyng, M. (1988) Designing for a Dollar a Day, Office: Technology and People, vol 4.
44. Kyng, M. (1994) Collective Resources Meets Puritanism, Scandinavian Journal of Information Systems, 6 (1).
45. Kyng, M. (1995) Creating Contexts for Design, Scenario-based Design (ed. Carroll, J.), NY: John Wiley.
46. Kyng, M. & Mathiassen, L. (eds.) (1997) Computers and Design in Context, Cambridge, MA: MIT Press.
47. Malcolm, N. (1995) Wittgenstein on Language and Rules, Wittgensteinian Themes (ed. von Wright, G.H.), Ithaca, NY: Cornell University Press.
48. Mogensen, P. & Trigg, R. (1992) Artefacts as Triggers for Participatory Analysis, Proceedings of PDC ‘92, Boston, MA: CPSR.
49. Mogensen, P. (1994) Challenging Practice: An Approach to Cooperative Analysis, Ph.D. thesis, Computer Science Dept., Århus University.
50. Mogensen, P. & Robinson, M. (1995) Triggering Artefacts, AI & Society, vol 9.
51. Nygaard, K. (1979) The "Iron and Metal Project", Swedish Centre for Working Life: Malmö.
52. Plowman, L., Rogers, Y., Ramage, M. (1995) What Are Workplace Studies For? Proceedings of ECSCW ‘95, Sweden: Kluwer.
53. Randall, D., Rouncefield, M., Hughes, J.A. (1995) BPR and Ethnomethodologically Informed Ethnography in CSCW, Proceedings of ECSCW ‘95, Sweden: Kluwer.
54. Rouncefield, M., Hughes, J.A., Rodden, T., Viller, S. (1994) CSCW and the Small Office, Proceedings of CSCW ‘94, Chapel Hill, NC: ACM Press.

55. Sacks, H. (1979) Hot-Rodder: A Revolutionary Category, Everyday Language (ed. Psathas, G.), NY: Irvington Press.
56. Sandberg, A. (1979) The DEMOS Project, Swedish Centre for Working Life: Malmö.
57. Sandberg, A. (1979) Project DUE, Swedish Centre for Working Life: Malmö.
58. Schmidt, K. & Carstensen, P. (1983) Bridging the Gap: Requirements Analysis for System Design, COMIC-Risø Deliverable 2.2.
59. Schutz, A. (1967) The Phenomenology of the Social World, Evanston: Northwestern University Press.
60. Shapiro, D. (1993) Interdisciplinary Design, Proceedings of IRIS 16, Computer Science Dept: University of Copenhagen.
61. Shapiro, D. (1994) The Limits of Ethnography, Proceedings of CSCW ‘94, Chapel Hill, NC: ACM Press.
62. Simonsen, J. & Kensing, F. (1994) Take Users Seriously, But Take a Deeper Look, Proceedings of PDC ‘96, Chapel Hill, NC: CPSR.
63. Sol, H.G. (1984) Prototyping: A Methodological Assessment, Approaches to Prototyping, Berlin: Springer-Verlag.
64. Suchman, L. (1987) Plans and Situated Actions: The Problem of Human-Machine Communication, Cambridge: Cambridge University Press.
65. Suchman, L. & Trigg, R. (1991) Understanding Practice, Design at Work (eds. Greenbaum, J. & Kyng, M.), Hillsdale, NJ: Lawrence Erlbaum Associates.
66. Utopia Project Group (1981) Training, Technology and Product from the Quality of Work Perspective, Swedish Centre for Working Life: Malmö.
67. Wittgenstein, L. (1967) Zettel, Oxford: Basil Blackwell.
68. Wittgenstein, L. (1968) Philosophical Investigations, Oxford: Basil Blackwell.
69. Wittgenstein, L. (1968) On Certainty, Oxford: Basil Blackwell.



SUMMARY
• Provides the historical and methodological grounding for understanding participatory design as a methodology
• Describes its research designs, methods, criteria, and limitations
• Provides guidance for applying it to technical communication research

The Methodology of Participatory Design
CLAY SPINUZZI
Technical Communication, volume 52, number 2, May 2005
Manuscript received 11 August 2004; revised 24 November 2004; accepted 28 November 2004.

INTRODUCTION

Technical communicators have begun writing quite a bit about participatory design, sometimes with a fervor that rivals that with which we used to write about T-units or think-aloud protocols. The terms participatory design and user-centered design are being broadly applied in the philosophical and pedagogical work of technical communication (Blythe 2001; Henry 1998; Johnson 1998; Salvo 2001; Spinuzzi 2003); methods associated with those terms are being applied in technical communication research (Mirel 1988, 2003; Smart 2003; Smart and Whiting 2002; Smart, Whiting, and DeTienne 2002; Spinuzzi 2002a, 2002c, in press; Wixon and Ramey 1996); and prototypes in particular are often presented as a vital part of iterative usability (see, for example, Barnum 2002, Chapter 4; Smart and Whiting 2002). But that breadth of application has often come at the price of imprecision. It's hard to find a good methodological explanation of participatory design. That lack of a strong methodological explanation is not just technical communication's problem, though. Participatory design is often discussed in human-computer interaction, computer-supported cooperative work, and related fields as a research orientation or even a field (see Muller 2002, p. 1,052) rather than a methodology. The distinction may be important in principle, but in practice, it has become an escape hatch that allows practitioners to label their work "participatory design" without being accountable to established, grounded precedent. By looking at that established precedent, I argue, we can define participatory design as a methodology, even if it's a loose one. And I believe it's time we did: Without such a definition, we can't hold ourselves accountable to participatory design or build on a coherent body of knowledge. Consequently, we have trouble applying participatory design rigorously to our technical communication projects, and we tend to think of participatory design as an approach to design rather than a rigorous research methodology. In this article, I discuss participatory design as a research methodology, characterizing it as a way to understand knowledge by doing, the traditional, tacit, and often invisible (in the sense of Nardi and Engestrom 1999; Muller 1999) ways that people perform their everyday activities and how those activities might be shaped productively. I first define and describe what participatory design research is. I describe participatory design research in terms of its paradigm, methodology, research design, and methods. With this definition and description as a framework, I next discuss why we should pursue participatory design studies. In this section, I discuss the benefits of knowledge by doing and provide evaluative criteria to use as guidelines for creating and assessing participatory design research. Finally, I explore the implications of understanding participatory design as a research methodology, and I discuss some practical applications.

WHAT IS PARTICIPATORY DESIGN RESEARCH?

Participatory design is research. Although it has sometimes been seen as a design approach characterized by user involvement (Johnson 1998), participatory design has its own highly articulated methodological orientation, methods, and techniques, just as does participatory action research, the approach on which it is based (Glesne 1998). Implementations of participatory design do vary in their attention to rigor and validity (Spinuzzi in press), but they all reflect a commitment to sustained, methodical investigation according to grounded methodological principles, as we'll see below.


Participatory design is rather different from most research conducted by technical communicators, though it turns out to be a good match for the work we do. As the name implies, the approach is just as much about design—producing artifacts, systems, work organizations, and practical or tacit knowledge—as it is about research. In this methodology, design is research. That is, although participatory design draws on various research methods (such as ethnographic observations, interviews, analysis of artifacts, and sometimes protocol analysis), these methods are always used to iteratively construct the emerging design, which itself simultaneously constitutes and elicits the research results as co-interpreted by the designer-researchers and the participants who will use the design. Like member checks in ethnographic research, participatory design's many methods ensure that participants' interpretations are taken into account in the research. Unlike member checks, however, these methods are shot through the entire research project; the goal is not just to empirically understand the activity, but also to simultaneously envision, shape, and transcend it in ways the workers find to be positive. In participatory design, participants' co-interpretation of the research is not just confirmatory but an essential part of the process.

Participatory design started in Scandinavia through a partnership between academics and trade unions. Since that time it has worked its way across the Atlantic, becoming an important approach for researchers interested in human-computer interaction, computer-supported cooperative work, and related fields. From there, it has begun to influence writing studies, particularly through technical communication as well as computers and composition (for example, Sullivan and Porter 1997; Johnson 1998; see Spinuzzi 2002b for an overview).

Participatory design has undergone many changes—for instance, later variations have moved away from the Marxist underpinnings of the earlier work—but its core has remained more or less constant. It attempts to examine the tacit, invisible aspects of human activity; assumes that these aspects can be productively and ethically examined through design partnerships with participants, partnerships in which researcher-designers and participants cooperatively design artifacts, workflow, and work environments; and argues that this partnership must be conducted iteratively so that researcher-designers and participants can develop and refine their understanding of the activity. The result of the research typically consists of designed artifacts, work arrangements, or work environments. As Pelle Ehn suggests, participatory design attempts to steer a course "between tradition and transcendence" (1989, p. 28)—that is, between participants' tacit knowledge and researchers' more abstract, analytical knowledge. The developers of participatory design believed that politically and ethically, the two types of knowledge must be bridged, with each being valued by all involved in the research. That's especially true in studies of workers, for which participatory design was initially designed, but also in studies of end users and students.

History

Participatory design originated in Scandinavia in the 1970s and 1980s. This early Scandinavian work was motivated by a Marxist commitment to democratically empowering workers and fostering democracy in the workplace. This avowedly political research aimed to form partnerships with labor unions that would allow workers to determine the shape and scope of new technologies introduced into the workplace. Up to that point, labor unions had little experience with computer technologies and had been forced to accept systems developed by management, systems that represented a sharp break from workers' traditional ways of working; exerted a greater and greater control over increasingly fine details of their work; and automated large swathes of the workflow, putting people out of work (see Ehn 1990; Zuboff 1989). Since they did not know how to design computer technologies themselves, workers were put into the position of accepting these disempowering technologies or simply rejecting them. Some Scandinavian researchers set out to develop a third way, an approach that provided a set of "language games" (Ehn and Kyng 1991, pp. 176-177) that would allow software developers and workers to collaboratively develop and refine new technologies—allowing workers to retain control over their work. These researchers turned to action research, in which ethnographic methods are linked to positive change for the research participants (see Glesne 1998 for an overview). Clement and van den Besselaar explain that

Unlike conventional research, which is directed primarily at producing results of interest to those beyond the immediate research site, an essential goal of action research is to achieve practical or political improvements in the participants' lives (e.g., less routine work, greater autonomy, more effective tools). The researcher becomes directly involved in the ongoing work and feeds results back to the participants. (1993, p. 33)

Action research involves alternating between practical work to support changes (such as design activities) on one hand, and systematic data collection and analysis on the other hand (p. 33; see also Bertelsen 2000). In early participatory design studies, workers tried to describe computer systems that could automate work while still valuing their craft skills and upholding their autonomy. But because workers had no experience in systems design, they could not begin to speculate on how to build such a system. So in the UTOPIA project, researchers joined with a workers' union to experiment with a range of research techniques, including mockups and other low-fidelity prototypes, future workshops, and organizational toolkits (Bødker and colleagues 1987). Although the project failed to produce a working system, it did produce a design approach and a range of techniques for participatory design work. Based on UTOPIA and other projects that came after it, the Scandinavians issued the "Scandinavian challenge": to develop and use design approaches that encourage industrial democracy (Bjerknes, Ehn, and Kyng 1987). This call resulted in many approaches and techniques under the umbrella of participatory design, such as CARD, PICTIVE, cooperative interactive storyboard prototyping, and contextual design (see Muller, Wildman, and White 1993 for an exhaustive taxonomy). Some of these, such as contextual design, have become complex enough to be categorized on their own and, arguably, differentiated enough that they are no longer participatory design in the strictest sense of the term (see Spinuzzi 2002c). In the United States, because of relatively weak labor unions and a focus on functionality rather than workplace democracy (see Spinuzzi in press), participatory design has tended to be implemented through nonintrusive methods: workplace microethnographies rather than walkthroughs and workshops (Blomberg and colleagues 1993; Blomberg, Suchman, and Trigg 1997), small-scale card-matching exercises rather than large-scale organizational games (Muller and Carr 1996), and one-on-one prototyping sessions that focus on confirming developed ideas rather than group prototyping sessions that emphasize exploration (Beyer and Holtzblatt 1998; Spinuzzi 2005). But the basic methodological principles of participatory design remained. What distinguishes participatory design from related approaches such as user-centered design is that the latter supposes only that the research and design work is done on behalf of the users; in participatory design, this work must be done with the users (Iivari 2004).

Defining users' knowledge

Participatory design's object of study is the tacit knowledge developed and used by those who work with technologies. It's important to understand this focus because tacit knowledge, which is typically difficult to formalize and describe, has tended to be ignored by the theory of cognition that has tended to dominate human-computer interaction: information processing cognitive science (Winograd and Flores 1986; Nardi 1996; Nardi and Engestrom 1999). In practice, this theory tends to lead to a rationalist approach to design, which generally assumes that there is one best way to perform any activity—an assumption it shares with Taylorism. This rationalist approach was something to which early participatory designers reacted strongly. They were heavily influenced by Marxist critiques of Taylorism, such as Harry Braverman's argument that Taylorism seeks to effect managerial control through "the dictation to the worker of the precise manner in which work is to be performed" (1974, p. 90, emphasis his). That is, rather than allowing workers to determine how to accomplish their tasks—and develop their own tacit craft skills and knowledge not possessed by management—the Taylorist manager examines the work, then breaks it into discrete, formal tasks that can be optimized, regulated, and taught to new workers. All discretion and all decisions are taken away from the workers. Knowledge is made explicit, formalized, and regulated; workers' craft traditions are judged inferior. (See Muller 1997, 1999 for discussions of this tendency in U.S. corporations and a response from the perspective of participatory design.)

Participatory design opposes this notion of knowledge on both political and theoretical grounds. Politically, this notion of knowledge as wholly consisting of optimized tasks spells the death of workplace democracy: if it is accepted, workers cannot have a say in their own work because only trained researchers can determine the best way of performing that work. Theoretically, participatory design is founded on constructivism, a theory that explicitly resists the notion that knowledge can be completely formalized and classified. (For overviews of the constructivist argument in writing studies, see Mirel 1998; Spinuzzi 2003.) Knowledge is situated in a complex of artifacts, practices, and interactions; it is essentially interpretive, and therefore it cannot be decontextualized and broken into discrete tasks, nor totally described and optimized. In the constructivist view, participants' knowledge is valorized rather than deprecated, and their perspectives therefore become invaluable when researching their activity and designing new ways to enact that activity. "Knowing and learning," as Barbara Mirel says, "take place in a dynamic system of people, practices, artifacts, communities, and institutional practices" (1998, p. 13).

When we think of knowledge, we often think of explicit forms of knowledge: things that are written down, defined, categorized, systematized, or quantified. But to understand knowledge-making in participatory design, we have to understand that much knowledge tends to be tacit. Tacit knowledge is implicit rather than explicit, holistic rather than bounded and systematized; it is what people know without being able to articulate. As Ehn argues, participatory design takes a Heideggerian approach to knowledge in which "the fundamental difference between involved, practical understanding and detached theoretical reflection is stressed" (1989, p. 28). This pragmatic approach involves alternating between the two by discovering tacit knowledge, then critically reflecting on it. Since practical tacit knowledge was a main goal of early participatory design research, researchers adopted

APPUED RESEARCH The Methodology of Participatory Design

the tool perspective, the idea that "computer support is designed as a collection of tools for the skilled worker to use. The tool perspective takes the work process as its origin rather than data or information flow. This means: not detailed analysis, description, and formalization of qualifications, but the development of professional education based on the skills of professionals; not information flow analysis and systems description but specification of tools" (Bodker and colleagues 1987, p. 26l). The tool perspective allowed researchers to recognize and leverage the workers' craft knowledge, allowing them to develop new tools that would support rather than disrupt that work: "a tool is developed as an extension of the accumulated knowledge of tools and materials within a domain" (Bodker and colleagues 1987, p. 26l; see also Ehn 1989, pp. 339-40). In contrast to design approaches favored by management that served Tayloristic goals (deskilling, work intensification), the tool perspective involved building "computer-based tools by which the craftsman can still apply and develop original skills" (p. 26l; see Ehn 1989, p. 34; Ehn and Kyng 1987, pp. 34-38). This tacit or craft knowledge is linked to metis: "Metis, or what is also called cunning intelligence, is the ability to act quickly, effectively, and prudently within everchanging contexts" (Johnson 1998, p. 53). These everchanging contexts are what Mirel points to when she talks about complex tasks (1998, 2004). In participatory design, tacit knowledge is not only explored, it is in many cases made material, as we saw with the tool perspective that participatory designers adopted. Workers find unconventional ways to use the tools that have been supplied to them, learn how to construct their own ad hoc tools (Spinuzzi 2003), and—if they are allowed the time and freedom to do so—eventually stabilize new tools and the ways they interact with them. One goal of participatory design is to preserve tacit knowledge so that technologies can fit into the existing web of tacit knowledge, workflow, and work tools, rather than doing away with them. In contrast to rationalist studies that assume workers' tasks can be broken down into their components, formalized, and made more efficient, participatory design assumes that tacit knowledge cannot be completely formalized; the task-and-efficiency orientation typical in many user-centered design methods such as GOMS (Card, Moran, and Newell 1983; Muller 1999) and usability testing (Barnum 2002; Rubin 1994) can actually get in the way of the holistic activity. Certainly, some tacit knowledge can be made explicit and formalized, but attempts at explication of such tacit knowledge must always be incomplete. The knowledge is too layered and subtle to be fully articulated. That is why action1 6 6 TechnicalCOIVMlNCATION • Volume 52, Number 2, May 2005

Spinuzzi

centered skill has always been learned through experience (on-the-job training, apprenticeships, sports practice, and so forth). Actions work better than words when it comes to learning and communicating these skills. (Zuboff 1988, p. 188)

So tacit knowledge often remains invisible, since it is not made systematic or quantifiable, it passes unnoticed and often undervalued. (See Nardi and Engestrom 1999 for a collection of essays on this theme.) In particular, low-level workers are often not valued by management because their skills are invisible: the complexity, difficulty, and interconnectedness of their work are not recognized. One example is Blomberg, Suchman, and Trigg's (1994) study in which document analysts (temporary workers who coded legal documents) were found to perform complex interpretive work. The attorneys who employed these workers did not recognize the work as being complex or interpretive, and consequently planned to outsource the work to lower paid workers in another country. Like others working in the participatory design tradition, Blomberg, Suchman, and Trigg attempted to demonstrate to management the tacit knowledge that workers brought to the activity, knowledge that had remained invisible up to that point, yet was vital to the continued success of the activity. Describing users' knowledge

Since users' tacit knowledge is highly valued, participatory design focuses on exploring that tacit knowledge and taking it into account when building new systems. This task is accomplished with a strong political or ethical orientation: users' knowledge is described so that it can be used to design new tools and workflows that empower the users. (What is meant by empowerment is sometimes different in the different strands of participatory design.) In this section, I describe the paradigm that underpins participatory design, its methodology, research design, and methods. Paradigm Participatory design's paradigm is constructivist in Mirel's sense (1998). That is, it sees knowledgemaking as occurring through the interaction among people, practices, and artifacts—knowledge doesn't just reside in the head; it's a condition of a certain context. One of the most distinct and influential notions of participatory design is that of the language game (^hn 1989, p. 17): bridging the worlds of researcher-designers and users by finding a common "language" or mode of interaction with which both parties feel comfortable. Methodology Participatory design's methodology is derived from participatory action research or, as Ehn calls it, "practice research": "Practical interventionistic investigations (as opposed to gathering of data) and parallel theo-

APPUED RESEARCH spinuzzi

The Methodology of Participatory Design

retical reflection (as opposed to detached theoretical re- Methods Methods are grouped by stage. flections a posteriori)" (Ehn 1989, p. 13). As discussed • Stage 1: Initial exploration of work above, this activist brand of research has an explicit Since initial exploration tends to involve examining political-ethical orientation: to empower workers to take technology use on site. Stage 1 draws from ethnocontrol over their work. Unlike Donald Norman—who graphic methods such as observations, interviews, argues that the designer should be a dictator (Grossman walkthroughs and organizational visits, and examina2002)—participatory designers see themselves as facilitations of artifacts. This stage is typically conducted on tors who attempt to empower users in making their own site, during the normal work day. In the earlier Scandecisions (Clement 1994). dinavian iterations, this initial exploration tended to To achieve that goal, participatory design emphasizes be highly interactive and intrusive: the researchers co-research and co-design: researcher-designers must generally aligned themselves with relatively powerful come to conclusions in conjunction with users. So particiworkers' unions that believed in the projects and patory design involves redesigning workplaces and work could insist on the sorts of disruptions caused by organization as well as work tools. And it is iterative, walkthroughs and organizational visits (see B0dker, allowing workers and researchers to critically examine the Gr0nbaek, and Kyng 1993 for an overview). impacts of these incremental redesigns in progress. In North America, unions were much weaker and workers were not in a position to force participation, Research design Participatory design is still developnor were they terribly interested in such projects. So ing and consequently its research design tends to be researchers turned to less intrusive ethnographic and quite flexible. For instance, the early Scandinavian work ethnomethodological techniques such as observatended to rely on union-sponsored workshops and tions and interviews (see Wall and Mosher 1994 for games involving heavy direct interaction between dean overview). Although the methods draw from ethsigners and users, while later work in the U.S. has tended nography, they are oriented toward design as well to supplement targeted interaction with less intrusive as description, so they tend to be focused and enmethods such as observation and artifact analysis. But acted differently, with more interaction in mind (see three basic stages are present in almost all participatory Beyer and Holtzblatt 1998 for one example, and design research: Spinuzzi 2002c and in press for critiques). Much of • Stage 1: Initial exploration of work that interaction takes place during the second stage, In this stage, designers meet the users and familiarin discovery processes. ize themselves with the ways in which the users • Stage 2: Discovery processes work together. This exploration includes the techStage 2 is where researchers and users interact most nologies used, but also includes workflow and work heavily, and it also typically involves group interacprocedures, routines, teamwork, and other aspects tions. Again, discovery processes tended to be more of the work. 
interactive and intrusive in the earlier Scandinavian • Stage 2: Discovery processes iterations than in the later North American iterations, In this stage, designers and users employ various but in all implementations they are more interactive techniques to understand and prioritize work organithan traditional ethnographies. Because of participazation and envision the future workplace. This stage tory design's orientation toward design, the goal is allows designers and users to clarify the users' goals to cooperatively make meaning out of the work and values and to agree on the desired outcome of rather than to simply describe it. Methods used durthe project. This stage is often conducted on site or ing this stage include organizational games (Bodker, in a conference room, and usually involves several Gronbaek, and Kyng 1993, pp. 166-167), role-playusers. ing games (Iacucci, Kuutti and Ranta 2000), organi• Stage 3: Prototyping zational toolkits (Tudor, Muller, and Dayton 1993; Ehn and Sjogren 1991; B0dker and colleagues 1987), In this stage, designers and users iteratively shape future workshops (B0dker, Gr0nbaek, and Kyng technological artifacts to fit into the workplace envi1993, p. 164; Bertelsen 1996), storyboarding (Madsen sioned in Stage 2. Prototyping can be conducted on and Aiken 1993), and workflow models and intersite or in a lab; involves one or more users; and can pretation sessions (Beyer and Holtzblatt 1998). be conducted on-the-job if the prototype is a working prototype. • Stage 3: Prototyping The stages can be (and usually should be) iterated Finally, this stage involves a variety of techniques for several times. Together, they provide an iterative coiteratively shaping artifacts. These techniques include exploration by designers and users. mockups (Ehn 1989; Ehn and Kyng 1991; B0dker Volume 52, Number 2, May 2005 • TechnicalCOMMlNCATION



and colleagues 1987), paper prototyping (Novick 2000), cooperative prototyping (B0dker and Gr0nbaek 1991; Gr0nbaek and Mogensen 1994); and PICTIVE (Muller 1991b, 1993), among many others. Finally, and just as importantly, results are disseminated in forms that users can understand and share—a continuation of the "language games" that allow researchers and users to collaborate, and a way to continue to support the empowerment and participation of users. The tone for this dissemination was set early on, in the UTOPIA project: results were discussed in everyday language in a union publication called Graffitti (Ehn 1989, pp. 350-352). Another example is contextual design's practice of "walking" through affinity diagrams and consolidated models with participants and of providing a room with diagrams and prototypes posted on the walls so that workers, managers, engineers, marketing people, and customers can see the state of the project in progress (Beyer and Holtzblatt 1998, chapter 10).


CRITICALLY EXAMINING PARTICIPATORY DESIGN STUDIES

Despite its advantages, participatory design has some rather sharp limitations as well as some criteria for success that are not immediately obvious. Below, I review some of the limitations of participatory design and discuss criteria for evaluating participatory design studies.

Limitations of participatory design
Participatory design has strengths, but as with other research approaches, those strengths come with tradeoffs.

Limitations of methodology
Since participatory design aims to ground changes in traditional craft skills as a way of empowering workers, some argue that participatory design does not lend itself to radical change of the sort that sometimes must characterize new systems (Beyer and Holtzblatt 1998). In fact, participatory designers have been cautioned to think of their work as "evolution, not revolution" (Sumner and Stolze 1997). This gradualist tendency can lead to tunnel vision, in which particular stakeholders are served while others are left to fend for themselves (Bjerknes and Bratteteig 1995; Bødker 1996). In response, some participatory designers have worked to bring in new accounts of stakeholders that can support more complex projects (Bødker 1996; Muller 2003).

Another limitation is that some strains of participatory design—particularly later work that emphasizes functional empowerment over democratic empowerment, such as cooperative prototyping (Bødker and Grønbæk 1991)—have a tendency to focus too narrowly on artifacts rather than overall workflow, presuming that fine-tuning the artifact will necessarily result in empowering changes to the overall work activity (Spinuzzi 2002c). Finally, as participatory design has migrated across socioeconomic borders, from Scandinavia to North America, researchers have had difficulty maintaining its methodological tenets, particularly its focus on democratic empowerment (Muller 1991a; Spinuzzi 2002b, in press).

Limitations of method
If more rigorous methods can be described as "measure twice, cut once," participatory design methods can be described as "explore, approximate, then refine." This essentially dissimilar methodological orientation—related to action research's juggling act between the traditional researcher's role of collecting and analyzing data versus the activist's role of initiating and sustaining significant change at the research site—tends to alter how researcher-designers apply established methods. For instance, participatory design researchers often draw on ethnographic methods to develop knowledge about the participants' work, tools, and craft traditions. But these researchers, who often come from backgrounds in systems design, human-computer interaction, or technical communication, tend to apply these methods quite loosely in the eyes of trained ethnographers. Diana Forsythe, for instance, scathingly critiques these applications as "do-it-yourself ethnography" and complains that "superficial social research may confer the illusion of increased understanding when in fact no such understanding has been achieved." She specifically takes to task a contextual design project "in which brief exercises in shadowing, observation, and interviewing have been undertaken from a common sense stance without engaging the questions that define ethnography as anthropologists understand it," and warns that "such an exercise can result in a cognitive hall of mirrors. Without addressing basic issues such as the problem of perspective, researchers have no way of knowing whether they have really understood anything of their informants' world view or have simply projected and then 'discovered' their own assumptions in the data" (1999, p. 136; see also Cooper and colleagues 1995; Nyce and Löwgren 1995).

Forsythe's critique is valid if the aim of research is to extract knowledge in the mode of traditional research, pulling the data into another domain where it can be abstracted, analyzed, and used apart from the site. But participatory design research, properly done, continually brings the analysis back to the domain and shares it with the participants, who co-interpret it, co-analyze it, and co-design responses to it. That is, the traditional methods are—at least in the best examples—re-networked or reconfigured to meet the design orientation.

The "same" methods can be enacted differently and take rather different shapes as they are attached to different methodologies and paradigms. In this case, the resulting research and designs do give up traditional research rigor, but they do so to gain reflexivity and agreement. (In the earlier, highly politicized Scandinavian work, that agreement took the shape of political representation; in later work, the focus shifted to ethical concerns in giving workers the tools needed to do their jobs, and agreement took the shape of consensus among representative users.) This tradeoff resembles "rolling" member checks. For example, Muller (1999) describes using the participatory design technique CARD to study the work of telephone operators. CARD, he says, has less rigor and predictive power than more narrowly defined analytical techniques such as GOMS, but on the other hand it brings in benefits that are more important to participatory designers:

   Its strengths lie in its ability to capture diverse information . . . , its openness to the disconfirmation of assumptions . . . , and its extensibility in the face of new information. Underlying all of these attributes is CARD's enfranchisement of multiple stakeholders with differing disciplines, perspectives, and positions. (p. 54; see also Bertelsen 2000)

Rigor is difficult to achieve because researchers cede considerable control to their participants and share a "design language" with those participants which must by its nature be imprecise. On the other hand, the proof is in the pudding, so to speak—the design artifact both encapsulates the research results (as the material trace left by the design efforts) and elicits them (both during design sessions and afterwards, as it is introduced into the environment to be used as a stable work artifact). Wall and Mosher demonstrate that the same design artifacts can be used as records of a field study; tools for analysis; communication tools for a language game in which researcher-designers and users participate; and focal artifacts for co-design and co-development (1994). Rigor becomes something different in participatory design research: a desirable goal, but subordinated to users' control and aims.

Practical limitations
In addition to the methodological and methodical critiques is the practical one: participatory design research takes an enormous amount of time, resources, and institutional commitment to pull off. That institutional commitment in particular can be hard to come by. From the standpoint of a profit-oriented business, participatory design seems to provide little structure and no deadlines (Wood and Silver 1995, pp. 322-323). Researchers find that they have to cede considerable control to workers, who must be committed to the process and cannot be coerced. For example, Bertelsen (1996) ruefully recounts how some of his participants simply failed to show up for a future workshop, compromising the design developed in the workshop. Finally, unlike ethnographic studies, participatory design studies typically require continuous critical participation by workers. Later participatory design variants such as contextual design (Beyer and Holtzblatt 1998) and customer partnering (Hackos, Hammar, and Elser 1997) have compromised by sharply limiting users' participation.

Evaluating participatory design
Participatory design is usually brought in at major turning points when work is to be automated and tools and workflows are to be changed. Since participatory design projects by definition involve design as well as research, the object of the research tends to be expressed in a purpose statement rather than a research question.

   The purpose of this project . . . is to design a number of computer applications for [an organization] and to develop a long-term strategy for decentralizing development and maintenance. (Bødker, Grønbæk, and Kyng 1993, p. 161)

   The overall object of the project has been to contribute to the development of skill-enhancing tools for graphic workers. (Bødker and colleagues 1987, p. 254)

   The work-oriented design project was originally conceived to explore bringing together the worlds of corporate research, product development, and specific worksites . . . in an effort to design more useful new technologies. (Blomberg, Suchman, and Trigg 1997, p. 269)

In concert with these types of research statements, participatory design has developed criteria that are also oriented toward development. Participatory design is still a relatively young approach, and at present it is more of a movement or research orientation than a coherent methodology, so it hasn't developed evaluative criteria to the same level that, say, experimental studies have. But we can draw nascent criteria from the methodological principles discussed earlier. They are often difficult to meet. As Blomberg and Henderson (1990) illustrate, it's easy to produce a study that looks like participatory design but that fails at all three of the criteria listed here. Participatory design projects, despite their ceding of power and analysis to users, still must rigorously apply these criteria to have internal integrity.

Criterion #1: Quality of life for workers
Most participatory designers would point to this criterion as the most important one. Participatory design is meant to improve workers' quality of life both in terms of democratic empowerment (that is, workers' control over their own work organization, tools, and processes) and functional empowerment (that is, workers' ability to perform their given tasks with ease; see Blomberg, Suchman, and Trigg 1997; Spinuzzi and colleagues 2003; Spinuzzi in press). In a participatory design study, workers critically reflect on their own practices, work organization, and tools. In the earlier Scandinavian iterations, this critical reflection usually involved examining ways that workers could better control the terms of their work; in later U.S.-based iterations, critical reflection turned to an examination of tacit knowledge to more effectively meet the goals of the work. Either way, this methodological principle translates into an exploration of tacit knowledge, invisible work, and unstated individual and organizational goals. To meet this criterion, participatory design studies strive for:

• Reflexivity and agreement between researchers and users. The two groups interact closely through interviews, focus groups, workshops, organizational games, prototyping sessions, and other techniques to continually reassess the activity under investigation and to synchronize their interpretations.
• Codetermination of the project by researchers and users. Specific project criteria are codetermined by researchers and users during the project. This way, researchers do not take total ownership of the project; users are also able to shape the project to reflect their values, goals, and ends.

Criterion #2: Collaborative development
Collaborative development is a key part of the effort to improve workers' quality of life. As noted earlier, users' work is often invisible and their knowledge is often tacit. Thus designers of information systems, educational Web sites, and documentation often assume that the work is simple, easily formalized, and (sometimes) easily automated. Collaborative development allows researcher-designers to avoid that trap by inviting participants to be co-researchers and co-developers. Doing so allows researcher-designers to elicit and explore the tacit knowledge and invisible practices that might otherwise have been lost, and simultaneously encourages workers to participate in their own empowerment. In terms of a study criterion, this methodological principle translates into a requirement for mechanisms to ensure that data collection and analysis be done in conjunction with participants. In ethnographical terms, participatory design uses member checks—but in participatory design, the member checks are continuous since the project is co-owned and co-enacted by the participants. To meet this criterion, participatory design studies strive for:

• Involvement. The successful study will provide mechanisms for participation and produce verifiable changes based on them. Participatory design studies are not a "listening tour" in which researchers hear the concerns of users, then go away and design a solution; they are participatory top to bottom and must include verifiable, regular avenues for group interaction and definite routines for ensuring that users' concerns are methodically addressed in the resulting design.
• Mechanisms for consensus/agreement and representation. In most cases, not every user can be involved in a participatory design study. For instance, if a participatory design study involved redesigning an interface used by 2000 workers, it's simply not practical or manageable to involve every worker in workshops and prototyping sessions. Instead, workers must be represented in the same way that politicians are elected to represent the interests and views of their constituencies. In the earlier Scandinavian iterations of participatory design, representatives were assigned to projects by their unions, making them explicitly political representatives. In North America, however, unions hold considerably less power and no other ready-made mechanisms for political representation of workers exist; rather, workers are typically selected by management and are seen as functionally representative (that is, "average users"). In any case, users must be given the opportunity to be broadly represented in the study, and the representatives should have a way to settle disagreements or come to consensus.
• Common language games, such as contextual design's work diagrams and PICTIVE's pictures. To collaboratively develop solutions, users should be able to interact with researchers in a neutral "language" understood by both sides. It's not enough to offer such a language game; researchers must also confirm that users are comfortable with the language game, able to understand it, and able to use it both to critique solutions and to express their own solutions.
• Common aims codetermined by researchers and users in advance. Near the beginning of the project, researchers and users should be able to settle on a list of common aims that represent the users' interests. That list must be flexible, as users will continue to critically evaluate their own aims.

Criterion #3: Iterative process
But to enact collaborative development, researcher-designers and participants must follow an iterative process. Tacit knowledge and invisible practices are by their nature difficult to tease out.


A crude caricature of participatory design might involve gathering workers' comments on current practice and their responses to a prototype, but without sustained, iterative reflection on and use of a designed artifact, workers may not be able to comment critically or respond effectively (see Hackos, Hammar, and Elser 1997). Each change in a prototype tends to unearth other invisible work practices and other tacit knowledge. In terms of a criterion for a study, this methodological principle translates into a requirement for a series of opportunities to sustain the continuous member check. To meet this criterion, participatory design studies strive for:

• Continual participation. Users should be involved repeatedly or continually and offered mechanisms for co-design at multiple stages.


• Revisiting stages. Rarely is one sweep through the stages enough because the stages are designed to inspire critical reflection on the work and turn up tacit knowledge. So a successful participatory design project should be flexible enough to revisit stages repeatedly and cyclically.
• Sustained reflection. Finally, the continuous member check must go beyond simply reacting to the functionality of designs—a danger especially in the later stages of a project, when functioning prototypes take on the appearance of completeness and participants' attention often turns to minor details. At all points, participants should be encouraged and given avenues to critically reflect on the implication of the research results for their own work.

CONCLUSION

Although participatory design is often portrayed as a research orientation or a field, understanding it as a methodology leads us to better understand its promises and constraints, its limitations and its criteria—and, I think, also leads us to greater respect for the careful work that goes into developing a participatory design study. That's especially important for technical communicators. We are, after all, in a design-oriented field (Kaufer and Butler 1996) and we have drawn heavily on design-oriented research methodologies, methods, and techniques such as usability testing. If we understand participatory design as an orientation, we are tempted to articulate a few general principles and retrofit our existing techniques to accommodate them. But if we understand it as a methodology, we are able to draw on a coherent body of methods and techniques operating within a general research design under common methodological premises. That is, we are able to conduct studies that have a great deal in common with other studies; we are able to draw from and contribute to a coherent, common body of knowledge. Our work becomes relevant to others working in human-computer interaction, computer-supported cooperative work, and similar fields.

REFERENCES

Barnum, C. M. 2002. Usability testing and research. New York, NY: Longman.

Bertelsen, O. W. 1996. The festival checklist: Design as the transformation of artifacts. In PDC '96: Proceedings of the Participatory Design Conference, pp. 93-101. Palo Alto, CA: Computer Professionals for Social Responsibility.

Bertelsen, O. W. 2000. Design artifacts: Toward a design-oriented epistemology. Scandinavian journal of information systems 12:15-27.

Beyer, H., and K. Holtzblatt. 1998. Contextual design: Defining customer-centered systems. San Francisco, CA: Morgan Kaufmann Publishers, Inc.

Bjerknes, G. 1992. Shared responsibility: A field of tension. In Software development and reality construction, ed. C. Floyd, H. Zullighoven, R. Budde, and R. Keil-Slawik. New York, NY: Springer-Verlag, pp. 295-301.

Bjerknes, G., P. Ehn, and M. Kyng, eds. 1987. Computers and democracy—A Scandinavian challenge. Aldershot, UK: Avebury.

Bjerknes, G., and T. Bratteteig. 1995. User participation and democracy: A discussion of Scandinavian research on system development. Scandinavian journal of information systems 7:73-98.

Blomberg, J., J. Giacomi, A. Mosher, and P. Swenton-Wall. 1993. Ethnographic field methods and their relation to design. In Participatory design: Principles and practices, ed. D. Schuler and A. Namioka. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 123-156.

Blomberg, J., and A. Henderson. 1990. Reflections on participatory design: Lessons from the Trillium experience. In CHI '90 Human Factors in Computing Systems. Seattle, Washington: ACM, pp. 353-359.

Blomberg, J., L. Suchman, and R. Trigg. 1994. Reflections on a work-oriented design project. In Participatory Design Conference (PDC '94), ed. R. Trigg, S. I. Anderson, and E. Dykstra-Erickson. Palo Alto, CA: Computer Professionals for Social Responsibility, pp. 99-109.




Blomberg, J., L. Suchman, and R. Trigg. 1997. Back to work: Renewing old agendas for cooperative design. In Computers and design in context, ed. M. Kyng and L. Mathiassen. Cambridge, MA: MIT Press, pp. 267-288.

Ehn, P., and M. Kyng. 1987. The collective resource approach to systems design. In Computers and democracy—A Scandinavian challenge, ed. G. Bjerknes, P. Ehn, and M. Kyng. Brookfield, VT: Gower, pp. 17-58.

Bødker, S. 1996. Creating conditions for participation: Conflicts and resources in systems development. Human-computer interaction 11:215-236.

Ehn, P., and M. Kyng. 1991. Cardboard computers: Mocking-it-up or hands-on the future. In Design at work: Cooperative design of computer systems, ed. J. Greenbaum and M. Kyng. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 169-196.

Bødker, S., P. Ehn, J. Kammersgaard, M. Kyng, and Y. Sundblad. 1987. A Utopian experience: On design of powerful computer-based tools for skilled graphical workers. In Computers and democracy—A Scandinavian challenge, ed. G. Bjerknes, P. Ehn, and M. Kyng. Aldershot, UK: Avebury, pp. 251-278.

Bødker, S., and K. Grønbæk. 1991a. Cooperative prototyping: Users and designers in mutual activity. International journal of man-machine studies 34:453-478.

Bødker, S., and K. Grønbæk. 1991b. Design in action: From prototyping by demonstration to cooperative prototyping. In Design at work: Cooperative design of computer systems, ed. J. Greenbaum and M. Kyng. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 197-218.

Bødker, S., K. Grønbæk, and M. Kyng. 1993. Cooperative design: Techniques and experiences from the Scandinavian scene. In Participatory design: Principles and practices, ed. D. Schuler and A. Namioka. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 157-176.

Braverman, H. 1974. Labor and monopoly capital. New York, NY: Monthly Review Press.

Card, S. K., T. P. Moran, and A. Newell. 1983. The psychology of human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.

Clement, A. 1994. Computing at work: Empowering action by "low-level" users. Communications of the ACM 37 (1): 52-63.

Clement, A., and P. van den Besselaar. 1993. A retrospective look at PD projects. Communications of the ACM 36 (4): 29-37.

Cooper, G., C. Hine, J. Rachel, and S. Woolgar. 1995. Ethnography and human-computer interaction. In The social and interactional dimensions of human-computer interfaces, ed. P. J. Thomas. New York, NY: Cambridge University Press, pp. 11-36.

Ehn, P. 1989. Work-oriented design of computer artifacts. Hillsdale, NJ: Lawrence Erlbaum Associates.

Ehn, P., and D. Sjögren. 1991. From system descriptions to scripts for action. In Design at work: Cooperative design of computer systems, ed. J. Greenbaum and M. Kyng. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 241-268.

Forsythe, D. E. 1999. "It's just a matter of common sense": Ethnography as invisible work. Computer supported cooperative work 8:127-145.

Glesne, C. 1998. Becoming qualitative researchers: An introduction. 2nd ed. New York, NY: Allyn and Bacon.

Grønbæk, K., and P. Mogensen. 1994. Specific cooperative analysis and design of general hypermedia development. In Participatory Design Conference (PDC '94), ed. R. Trigg, S. I. Anderson, and E. Dykstra-Erickson. Palo Alto, CA: Computer Professionals for Social Responsibility, pp. 159-171.

Grossman, W. M. 2002. Interview: Designed for life. New Scientist 176:46-49.

Hackos, J. T., M. Hammar, and A. Elser. 1997. Customer partnering: Data gathering for complex on-line documentation. IEEE transactions on professional communication 40:102-110.

Iacucci, G., K. Kuutti, and M. Ranta. 2000. On the move with a magic thing: Role playing in concept design of mobile services and devices. In DIS '00. Brooklyn, NY: ACM, Inc., pp. 193-202.

Iivari, N. 2004. Enculturation of user involvement in software development organizations: An interpretive case study in the product development context. In Proceedings of the third Nordic conference on human-computer interaction, pp. 287-296. New York, NY: ACM Press.

Johnson, R. R. 1998. User-centered technology: A rhetorical theory for computers and other mundane artifacts. New York, NY: SUNY Press.



Kaufer, D. S., and B. S. Butler. 1996. Rhetoric and the arts of design. Mahwah, NJ: Lawrence Erlbaum Associates.

Madsen, K. H., and P. H. Aiken. 1993. Experiences using cooperative interactive storyboard prototyping. Communications of the ACM 36 (4): 57-66.

Mirel, B. 1998. "Applied constructivism" for user documentation. Journal of business and technical communication 12:7-49.

Mirel, B. 2004. Interaction design for complex problem solving: Developing usable and useful software. San Francisco, CA: Morgan Kaufmann.

Muller, M. J. 1991a. No mechanization without representation: Who participates in participatory design of large software products? In CHI '91. New Orleans, LA: ACM, p. 391.

Muller, M. J. 1991b. PICTIVE: An exploration in participatory design. In CHI '91. New Orleans, LA: ACM, pp. 225-231.

Muller, M. J. 1993. PICTIVE: Democratizing the dynamics of the design session. In Participatory design: Principles and practices, ed. A. Namioka and D. Schuler. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 211-237.

Muller, M. J. 1997. Ethnocritical heuristics for reflecting on work with users and other interested parties. In Computers and design in context, ed. M. Kyng and L. Mathiassen. Cambridge, MA: MIT Press, pp. 349-380.

Muller, M. J. 1999. Invisible work of telephone operators: An ethnocritical analysis. Computer supported cooperative work 8:31-61.

Muller, M. J. 2003. Participatory design: The third space in HCI. In The human-computer interaction handbook: Fundamentals, evolving technologies and emerging applications. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc., pp. 1051-1068.

Muller, M. J., and R. Carr. 1996. Using the CARD and PICTIVE participatory design methods for collaborative analysis. In Field methods casebook for software design, ed. D. Wixon and J. Ramey. New York, NY: John Wiley and Sons, pp. 17-34.

Muller, M. J., D. M. Wildman, and E. A. White. 1993. Taxonomy of PD practices: A brief practitioner's guide. Communications of the ACM 36 (4): 26-27.

Nardi, B. A., ed. 1996. Context and consciousness: Activity theory and human-computer interaction. Cambridge, MA: MIT Press.

Nardi, B. A., and Y. Engeström. 1999. A web on the wind: The structure of invisible work. Computer supported cooperative work 8:1-8.

Novick, D. G. 2000. Testing documentation with "low-tech" simulation. In SIGDOC 2000. Cambridge, MA: ACM, pp. 55-68.

Nyce, J. M., and J. Löwgren. 1995. Toward foundational analysis in human-computer interaction. In The social and interactional dimensions of human-computer interfaces, ed. P. J. Thomas. New York, NY: Cambridge University Press, pp. 37-47.

Rubin, J. 1994. Handbook of usability testing. New York, NY: John Wiley and Sons.

Smart, K. L., and M. E. Whiting. 2002. Using customer data to drive documentation design decisions. Journal of business and technical communication 16:115-169.

Spinuzzi, C. 2002a. Documentation, participatory citizenship, and the Web: The potential of open systems. In Proceedings of the 20th Annual International Conference on Computer Documentation. Chapel Hill, NC: ACM Press, pp. 194-199.

Spinuzzi, C. 2002b. A Scandinavian challenge, a US response: Methodological assumptions in Scandinavian and US prototyping approaches. In Proceedings of the 20th Annual International Conference on Computer Documentation. Chapel Hill, NC: ACM Press, pp. 208-215.

Spinuzzi, C. 2002c. Toward integrating our research scope: A sociocultural field methodology. Journal of business and technical communication 16:3-32.

Spinuzzi, C. 2003. Tracing genres through organizations: A sociocultural approach to information design. Cambridge, MA: MIT Press.

Spinuzzi, C. In press. Lost in the translation: Shifting claims in the migration of a research technique. Technical communication quarterly 14.

Spinuzzi, C., J. Bowie, I. Rogers, and X. Li. 2003. Open systems and citizenship: Developing a departmental website as a civic forum. Computers and composition 20:168-193.

Sullivan, P., and J. E. Porter. 1997. Opening spaces: Writing technologies and critical research practices. New directions in computers and composition studies. Greenwich, CT: Ablex Publishing Corp.


Sumner, T., and M. Stolze. 1997. Evolution, not revolution: Participatory design in the toolbelt era. In Computers and design in context, ed. M. Kyng and L. Mathiassen. Cambridge, MA: MIT Press, pp. 1-26.

Tudor, L., M. Muller, and J. Dayton. 1993. A C.A.R.D. game for participatory task analysis and redesign: Macroscopic complement to PICTIVE. In INTERCHI '93. Amsterdam, Netherlands: ACM, Inc.

Wall, P., and A. Mosher. 1994. Representations of work: Bringing designers and users together. In PDC '94: Proceedings of the Participatory Design Conference, ed. R. Trigg, S. I. Anderson, and E. Dykstra-Erickson. Palo Alto, CA: Computer Professionals for Social Responsibility, pp. 87-98.

Zuboff, S. 1988. In the age of the smart machine: The future of work and power. New York, NY: Basic Books.

CLAY SPINUZZI is an assistant professor of rhetoric at The University of Texas at Austin, where he directs the Computer Writing and Research Lab. His interests include research and design methodologies; his book on the subject, Tracing genres through organizations, was published by MIT Press in 2003. Contact: [email protected].


Personas: Practice and Theory John Pruitt Microsoft Corporation One Microsoft Way Redmond, WA 98052 USA +1 425 703 4938 [email protected] Jonathan Grudin Microsoft Research One Microsoft Way Redmond, WA 98052 USA +1 425 706 0784 [email protected]

Abstract
"Personas" is an interaction design technique with considerable potential for software product development. In three years of use, our colleagues and we have extended Alan Cooper's technique to make Personas a powerful complement to other usability methods. After describing and illustrating our approach, we outline the psychological theory that explains why Personas are more engaging than design based primarily on scenarios. As Cooper and others have observed, Personas can engage team members very effectively. They also provide a conduit for conveying a broad range of qualitative and quantitative data, and focus attention on aspects of design and use that other methods do not.

Keywords
Personas, User Archetypes, User Profiles, User Research, Design Method, Scenarios, User-Centered Design.

Industry/category
Computer software, hardware, and technology

Project statement

We have used Personas on projects that range from small to large. This paper will discuss two such projects, one small and one large. The smaller project involved the first version of a new Web browser, MSN Explorer. The larger, our most recent Personas effort,


was in support of the Microsoft Windows product development team. The goal of each effort was to help a team identify and understand its target audience as well as aid in design and development decisions for a specific product release.

Project participants
The MSN Explorer product development team consisted of several hundred members. The Windows product development team changed greatly over time, starting with several hundred members and growing to several thousand at the peak of the effort. Each team comprised programmers, quality assurance testers, program managers, designers, technical writers, product planners, user researchers, and marketing professionals, among others. The MSN Explorer Persona team included one full-time usability engineer and the part-time efforts of a product designer and two additional usability engineers. The Windows Persona creation team consisted of 22 people: several technical writers, several usability engineers, four product planners, and two market researchers. After the Windows Personas were created, the ensuing Persona campaign involved the part-time efforts of several usability engineers, ethnographers, graphic designers, and product planners.

Project dates and duration
The MSN Explorer Persona effort started in January of 2000 and lasted about 10 months. The actual creation of the Personas took about 2 months. Our Windows Persona effort started around March of 2001 and is ongoing at the time of writing this paper (roughly two years), though the initial creation and validation of the Windows Personas took about three months.

Process
An Introduction to Personas
The use of Personas, fictional people, in product design is widely heralded: in Alan Cooper's book The Inmates are Running the Asylum [8], in tutorials by Kim Goodwin of Cooper Design [11], and in workshops [23] [24], newsletters [5] [14] [20], on-line resources [10] [15], and research papers [4] [12] [19]. The use of abstract representations of users originated in marketing [e.g., 18], but Cooper's use of Personas, their goals, and activity scenarios is focused on design. He notes that designers often have a vague or contradictory sense of their intended users and may base scenarios on people similar to themselves. His "goal-directed design" provides focus through the creation of fictional Personas whose goals form the basis for scenario creation. Cooper's early Personas were rough sketches, but over time his method evolved to include interviews or ethnography to create more detailed characters [16]. Prior to Cooper, others promoted the use of abstract representations of users to guide design: user profiles and scenarios derived from contextual inquiry [13] [25] and user classes fleshed out into "user archetypes" [17]. These representations were also used as a basis for scenario construction.

Cooper's approach can be effective, but our use of Personas diverges in several ways. He emphasizes an "initial investigation phase" and downplays ongoing data collection and usability engineering: "Seems like sandpaper…. Very expensive and time-consuming, it wasn't solving the fundamental problem." [8] Statements such as "We always design before putting up buildings" and claims that designers have an innate ability to make intuitive leaps that no methodology can replace [11] understate the value of user involvement. Personas used alone can aid design, but they can be more powerful if used to complement, not replace, a full range of quantitative and qualitative methods. They can amplify the effectiveness of other methods.

The four problems listed below are also noted in a recent paper by Blomquist and Arvola [4], describing a Persona effort that was not considered fully successful.

Personas might help a designer focus. However, their greatest value is in providing a shared basis for communication. Cooper emphasizes communicating the design and its rationale among designers and their clients: "It's easy to explain and justify design decisions when they're based on Persona goals..." [16]. We have extended this, using Personas to communicate a broader range of information to more people: designers, developers, testers, writers, managers, marketers, and others. Information from market research, ethnographic studies, instrumented prototypes, usability tests, or any other relevant source can be conveyed rapidly to all project participants.

Our experience with Personas
We have actively used Personas, and refined our techniques for using them, for over three years. When the MSN Explorer effort began, we did not set out to create Personas. In fact, we were only vaguely familiar with the concept. Our goal was to help a development team understand and focus on a set of target users. We read Cooper's 1999 book and looked around the industry and our company to see how other teams had defined their audiences and communicated that information to their broader team. Many product teams within our company had done significant work with market segmentation, user role definition, user

profiling, and fictional character definitions created for use in scenario-based design. One specific technique, under the name “user archetypes,” started around 1995 with a single product team and focused primarily on product planning, marketing, and product messaging [17]. Their approach resembled Geoffrey Moore’s “targeting customer characterizations” [18]. Over time, other product teams adopted that method, and adapted it to better suit product development. Although much of the adoption and adaptation of Persona-like methods by various teams happened independently, common issues arose and similar solutions were developed. From others around the company who had been directly involved with creating these user abstractions or who were expected to use them in product definition and design, we found that the early Persona-like efforts suffered from four major problems: 1. The characters were not believable; either they were obviously designed by committee (not based on data) or the relationship to data was not clear. 2. The characters were not communicated well. Often the main communication method was a resume-like document blown up to poster size and posted around the hallways. 3. There was no real understanding about how to use the characters. In particular, there was typically nothing that spoke to all disciplines or all stages of the development cycle. 4. The projects were often grass-roots efforts with little or no high-level support (such as people resources for creating and promoting Personas, budget for posters or other materials to make the Personas visible, or


encouragement from team leaders: "thou shalt use these characters").

The approach outlined here was developed specifically to address these four problems. It has been refined to address additional issues encountered along the way:
• How best to create user abstractions?
• How much can be fictional, and what should be based on data?
• What data is most appropriate?
• How can different types of data be combined?
• How can you validate your creations?
• Can multiple related product teams share a common set of abstractions?
• How can you determine whether the effort was worth it?
• Did the product get better as a result? And so on.

Our method and process by necessity combined techniques gleaned from the previous Persona-like efforts with what we could learn from Cooper’s book, which was not written as a “how to” manual. Our MSN Explorer Personas effort suffered from several problems. First, because this was new to us, we began with little idea of how much work was involved and what would be gained. Thus, obtaining resources and creating reasonable timelines was difficult. We started with no budget and two people who had plenty of other work to do. We began the Personas effort as the product vision and initial planning were being completed. By the time we finished creating Personas, which took much longer than expected, our team had fully completed the basic design and specification phase

of the cycle. We had neither time nor resources to do original research, but were fortunate that others had completed several field studies and market research pertinent to our product. Finally, the whole idea of using fictional characters to aid design was new to most people on our development team, so there was much resistance to overcome and education required. By the time we began the Windows Personas effort, our understanding of the method had grown tremendously through our experiences and through sharing experiences with other Persona practitioners [23]. Because of the success of previous Persona efforts and the Persona buzz around the industry, the method had become more familiar and fairly well accepted by the development team. We were given people resources and a decent budget for posters, events, and other promotional exploits. Most important, Personas were being requested by execs and team leaders as well as members of the design and development team. What we had set out to do in our first effort was more likely to be achieved in this larger effort.

Practice details
Creating and using Personas: Our approach
The following is a bulleted sketch of our current process. Where appropriate, we call out differences in the resource-lacking Explorer Personas effort and the resource-intensive Windows Personas effort.

• We attempt to start an effort using previously executed, large-sample market segmentation studies, much like those discussed by Weinstein [27]. Highest priority segments are fleshed out with user research that includes field studies, focus groups, interviews, and further market research. We use metrics around market size, historical revenue, and strategic or competitive placement to determine which segments are enriched into Personas. We try to keep the set of characters down to a manageable number: 3 to 6 Personas, depending on the breadth of product use.
• Generally, we collect as much existing related market and user research as possible (from internal and external sources) to help inform and "fill out" the Personas. We have yet to start a Persona effort in an area that does not have some existing quantitative and qualitative data. Thus, our own research endeavors typically start after we create our Personas.
• Although we have not yet created full-on international or disabled Personas, we included international market and accessibility information in our Personas. Several of our partner teams have also created "anti-Personas": Personas intended to identify people that are specifically not being designed for.
• In our larger Persona effort, involving 22 people, we divided the team so that each Persona (6 in all) had 2 or more dedicated people. At the other extreme, two people created all four MSN Explorer Personas, though a few other people contributed to or reviewed various aspects of the work from time to time. As mentioned, this lighter effort relied solely on existing user research, and as a result, generated far less detailed Personas.
• The Windows Personas effort drew on many research studies. We divvied up the research documents, with each team member becoming well acquainted with only a few studies. We then held "affinity" sessions where we physically cut data points and interesting/relevant facts out of the studies and pinned them to a wall to form groups of related findings across studies. The resulting groups of findings were used in writing narratives that told the story of the data.

• As we wrote the Personas' stories, we employed qualitative data and observational anecdotes where possible. A not quite yet achieved goal is to have every statement in our Personas generated from or related to user data or observation.
• In all Persona efforts, we use a central "foundation" document for each Persona as a storehouse for information about that Persona (data, key attributes, photos, reference materials, and so on). Figure 1 shows the table of contents for a typical foundation document. Note that the foundation document is not the primary means of communicating information about the Persona to general team members (more on this following). Likewise, the foundation documents do not contain all or even most of the feature scenarios (for example, "walk-through" scenarios are located directly in the feature specs). Instead, the foundation document contains goals, fears, and typical activities that motivate and justify scenarios that appear in feature specs, vision documents, storyboards, and so forth.
• Links between Persona characteristics and the supporting data are made explicit and salient in the foundation documents. These documents contain copious footnotes, comments on specific data, and links to research reports that support and explain the Personas' characteristics. All Persona illustrations and materials point to the foundation documents (which are on an intranet site) to enable team members to access the supporting documentation.
• Once a basic Persona description is written, we find local people to serve as models and hold one- to two-hour photo shoots to get visual material to help illustrate and communicate each Persona. We have avoided stock photo galleries because they typically offer only one or two shots of a given model and the images are too "slick."
• For our Windows Personas effort, after our Personas were created, we set up "sanity check" site visits with users who match the Personas on high-level characteristics to see how well they match on low-level characteristics. We do this because our creation method utilizes multiple data sources, many of which are not directly comparable or inherently compatible.

Figure 1: The table of contents for a foundation document.

Overview – Alan Waters (Business Owner): Get to know Alan, his business, and family.
A Day in the Life: Follow Alan through a typical day.
Work Activities: Look at Alan's job description and role at work.
Household and Leisure Activities: Get information about what Alan does when he's not at work.
Goals, Fears, and Aspirations: Understand the concerns Alan has about his life, career, and business.
Computer Skills, Knowledge, and Abilities: Learn about Alan's computer experience.
Market Size and Influence: Understand the impact people like Alan have on our business.
Demographic Attributes: Read key demographic information about Alan and his family.
Technology Attributes: Get a sense of what Alan does with technology.
Technology Attitudes: Review Alan's perspective on technology, past and future.
Communicating: Learn how Alan keeps in touch with people.
International Considerations: Find out what Alan is like outside the U.S.
Quotes: Hear what Alan has to say.
References: See source materials for this document.

• Once the Personas' documents and materials are in place, we hold a kick-off meeting to introduce the Personas to the team at large.
• Communicating our Personas has been multifaceted, multimodal, and ongoing, progressively disclosing more and more information about them. Our foundation documents are available to anyone on the team who wishes to review them, but they are not the primary means for delivering information. Instead, we create many variations of posters, flyers, and handouts over the course of the development cycle. For the Windows Personas we even created a few gimmicky (and popular) promotional items (e.g., squeeze toys, beer glasses, and mouse pads—sprinkled with Persona images and information). We created Web sites that host foundation documents, links to supporting research, related customer data and scenarios, and a host of tools for using the Personas (screening material for recruiting usability test participants, spreadsheet tools, comparison charts, posters and photos, etc.). In an ongoing "Persona fact of the week" e-mail campaign, each Persona gets a real e-mail address used occasionally to send information to the development team. Figure 2 shows two general posters designed to further a team's understanding of the Personas. One compares important characteristics of four Personas. The other communicates the fact that our Personas are based on real people and tries to provide a sense of the essence of a Persona by providing quotations from real users who are similar to that Persona. Figure 3 shows two posters from a series that provides information specifically about how customers think about security and privacy. The first again provides real quotes from users who fit our various Persona profiles. The second poster shows how a real hacker targeted people who resemble one of our Personas. (Note: Figures 2 and 3 are intentionally small and hide detail to preserve proprietary information in them.)
• We instruct our team in Persona use and provide tools to help. Cooper describes Persona use mostly as a discussion tool. "Would Alan use this feature?" This is valuable, but we have generated additional activities and incorporated them into specific development processes. We created spreadsheet tools and document templates for clearer and consistent Persona utilization. As an example of how Personas can become explicitly involved in the design and development process, Figure 4 shows an abstract version of a feature-Persona weighted priority matrix that can help prioritize features for a product development cycle. In the example, the scoring in the feature rows is as follows: -1 (the Persona is confused, annoyed, or in some way harmed by the feature), 0 (the Persona doesn't care about the feature one way or the other), +1 (the feature provides some value to the Persona), +2 (the Persona loves this feature or the feature does something wonderful for the Persona even if they don't realize it). The sums are weighted according to the proportion of the market each Persona represents. Once completed, the rows can be sorted according to the weighted sum, and criteria can be created to establish which features should be pursued and which should be reconsidered. As shown below, features 2 and 4 should be made a high priority for the development team; feature 3 should probably be dropped (a short computational sketch of this weighting follows the figure).

Figure 2: Two general posters: one comparing characteristics across Personas; the other presenting real quotes from users that fit the profile of one of our Personas.

Figure 3: Two more targeted posters: one communicating aspects of security and privacy across all of our Personas; the other showing how certain types of hackers can target one of our Personas.


Figure 4: A feature-by-Persona weighted priority matrix.

              Persona 1   Persona 2   Persona 3   Weighted Sum
  Weight:         50          35          15
  Feature 1        0           1           2            65
  Feature 2        2           1           1           150
  Feature 3       -1           1           0           -15
  Feature 4        1           1           1           100
  Etc.             -           -           -             -
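The arithmetic behind Figure 4 can be written out in a few lines. The Python sketch below is our illustration, not part of the authors' toolkit; it reuses the weights and scores shown in the figure, and the helper name weighted_priority is hypothetical.

# Sketch of the feature-by-Persona weighted priority matrix in Figure 4.
# Scores: -1 harmed, 0 indifferent, +1 some value, +2 loves it.
# Weights are the share of the market each Persona represents.

persona_weights = {"Persona 1": 50, "Persona 2": 35, "Persona 3": 15}

feature_scores = {
    "Feature 1": {"Persona 1": 0, "Persona 2": 1, "Persona 3": 2},
    "Feature 2": {"Persona 1": 2, "Persona 2": 1, "Persona 3": 1},
    "Feature 3": {"Persona 1": -1, "Persona 2": 1, "Persona 3": 0},
    "Feature 4": {"Persona 1": 1, "Persona 2": 1, "Persona 3": 1},
}

def weighted_priority(scores, weights):
    """Return (feature, weighted sum) pairs sorted highest first."""
    sums = {
        feature: sum(weights[p] * s for p, s in per_persona.items())
        for feature, per_persona in scores.items()
    }
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)

for feature, total in weighted_priority(feature_scores, persona_weights):
    print(f"{feature}: {total}")
# Prints Feature 2: 150, Feature 4: 100, Feature 1: 65, Feature 3: -15,
# matching the sorted reading of Figure 4.

Sorting by the weighted sum is what pushes a seemingly important feature down the list when it serves only a low-weight Persona, which is exactly the discussion the tool is meant to provoke.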

• We make a strong effort to ensure that all product and feature specification documents contain walk-through scenarios that utilize our Personas. We do the same with vision documents, storyboards, demos, and so forth. Unfortunately for the MSN Explorer effort, we completed our Personas too late in the process to use this approach.
• During the Windows Personas effort, we collected Persona scenarios from across the product team in a spreadsheet that enables us to track and police the use of the Personas (this also enables us to roughly gauge the direction of a product as it is developed—for example, how many scenarios are written for Toby vs. Abby when we know Abby is a higher-priority target; a minimal tally sketch follows this list).
• Design teams have made creative visual explorations based on the Personas. More specifically, they created branding and style collages by cutting and pasting images that "feel like" our Personas from a variety of magazines onto poster boards [Fig. 5]. They then used these boards to do a variety of visual treatments across several areas of our product. In another Persona effort, we took these types of explorations to focus groups to understand in detail what aspects of the designs were appealing and how they worked together to form a holistic style. Although the Personas were not critical to this process, they served as a springboard that inspired creation.
• As a communication mechanism useful to the Persona team itself, we create Persona screeners and recruit participants for usability and market research. We then categorize, analyze, and report our findings by Persona type. For the Windows Personas, we have gone to the extreme of creating a Persona user panel. Through an outside firm, we established a 5,000-person panel of users that match our Persona profiles. We poll the panel on a regular basis to better understand reported activities, preferences, and opinions, as well as reactions to our feature plans, vision, and implementations. We have not aged our Personas over time, but we do revise them as new data becomes available. Unlike Cooper, we support a strong, ongoing effort to obtain as much quantitative and qualitative information about users as possible, thereby improving the selection, enrichment, and evolution of sets of Personas.
• One of our technical writing groups, a partner to the Windows team, used the Windows Personas to plan and write "How to" and reference books for the popular press. In doing so, they expanded the Personas to include notions of learning style, book usage patterns, and so forth to enrich how they authored for specific audiences.
• Although this hasn't happened for the Persona efforts described here, in other efforts the quality assurance test team has used Personas to organize bug bashes and select/refine scenarios for their QA testing.
• For the Windows Personas, we undertook a large effort to reconcile two sets of target audiences (one in the form of Personas and one in the form of customer segments) when a team working on a related product was directed to be "better together" with our product.
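A minimal tally sketch for the scenario-tracking idea above, assuming the scenario log is kept as simple (spec, persona) records; the field names, persona names beyond Toby and Abby, and sample rows are illustrative, not the team's actual spreadsheet.

# Count how many walk-through scenarios mention each Persona,
# to spot drift away from a high-priority Persona such as Abby.
from collections import Counter

scenario_log = [
    {"spec": "Search pane", "persona": "Abby"},
    {"spec": "Search pane", "persona": "Toby"},
    {"spec": "Home page setup", "persona": "Abby"},
]

counts = Counter(row["persona"] for row in scenario_log)
for persona, n in counts.most_common():
    print(f"{persona}: {n} scenario(s)")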


Figure 5: A Persona-focused style collage.

Results

Benefits of Personas
It is clear to us that Personas can create a strong focus on users and work contexts through the fictionalized settings. Though we have not tried to formally measure the impact of our Personas, the subjective view of the Personas and the surrounding effort by the development team has been favorable. A wide range of team members (from executives to designers and developers) knows about and discusses our product in terms of the Personas. We've seen Personas go from scattered use (in early Persona projects) to widespread adoption and understanding (in recent product cycles). Our Personas are seen everywhere and used broadly (for example, feature specs, vision documents, storyboards, demo-ware, design discussions, bug bashes—even used by VPs arguing for user concerns in product strategy meetings).

Figure 6: A design exploration based on the Figure 5 style collage.

Not only have our development teams engaged with Personas, but also correspondingly they have engaged with our other user-centered activities. Our Persona campaigns generated a momentum that increased general user focus and awareness. With our most recent Persona effort, we've had partner teams building related but different products adopt and adapt our Personas in an effort to enhance cross-team collaboration, synergy, and communication.

The act of creating Personas has helped us make our assumptions about the target audience more explicit. Once created, the Personas have helped make assumptions and decision-making criteria equally explicit. Why are we building this feature? Why are we building it like this? Without Personas, development teams routinely make decisions about features and implementation without recognizing or communicating their underlying assumptions about who will use the product and how it will be used. The "feature by Persona-weighted priority matrix" described in the previous section is a good example of this. Using that tool inevitably results in favored features or seemingly important features being pushed down in the list. When this happens, teams must be very explicit with their reasoning to get a feature back in the plan. We stress to the team that this tool is not "golden," it is a guide; exceptions can and should be made, when appropriate.

Personas are a medium for communication; a conduit for information about users and work settings derived from ethnographies, market research, usability studies, interviews, observations, and so on. Once a set of Personas is familiar to a team, a new finding can be instantly communicated: "Alan cannot use the search tool on your Web page" has an immediacy that "a subset of participants in the usability study had problems with the search tool" doesn't, especially for team members who now, for all intents and purposes, see Alan (a Persona) as a real person. We have found this to be extremely powerful for communicating results and furthering our teammates' understanding of the Personas.

Finally, Personas focus attention on a specific target audience. The method helps establish who is and consequently who is not being designed for. Personas explicitly do not cover every conceivable user. They also help focus sequentially on different kinds of users. For example, a quality assurance engineer can one day test a product focusing on Alan scenarios, another day focusing on Laura scenarios. As stated in the previous section, this works for testers and other product team members, in "bug bashes," for example. An experienced tester reported feeling that he was identifying "the right kind" of problems in drawing on knowledge of a Persona in guiding his test scripts and activities. Compare this to an observation from a study of interface development:

   Some people realized that tests conducted by Quality Control to ensure that the product matches specification were not sufficient. One manager noted, "I would say that testing should be done by a group outside Development, 'cause Development knows how the code works, and even though you don't want it to, your subconscious makes you test the way you know it works. See, those people in the Quality Control group have nothing to do with customers. They're not users." In fact, two members of Field Support were reported to have found more bugs than the Quality Control group in the latest release, and they had accomplished this by working with the product as they imagined that users would. Testing by Field Support was an innovative experiment, however, and not part of the accepted development process. "The Quality Control group has a lot of systematic testing, and you need some of that, but at the same time, you need somebody who is essentially a customer. It is as if you had a customer in house who uses it the way a customer would every day, and is particularly tough on it and shakes all these things out. That's what these two guys did, and it was just invaluable." [21, p. 64]


The Field Support engineers could “test as a user” because of their extensive experience with customers. That Persona use results in similar positive reports is encouraging. Risks of Personas Getting the right Persona or set of Personas is a challenge. Cooper argues that designing for any one external person is better than trying to design vaguely for everyone or specifically for oneself. This may be true, and it does feel as though settling on a small set of Personas provides some insurance, but it also seems clear that Personas should be developed for a particular effort. In making choices it becomes clear that the choices have consequences. For example, they will guide participant selection for future studies and could be used to filter out data from sources not matching one of the Persona profiles. Related to this is the temptation of Persona reuse. After the investment in developing Personas and acquainting people with them, it may be difficult to avoid overextending their use when it would be better to disband one cast of characters and recruit another one. It can be good or bad when our partner teams adopt or adapt our Personas. Different teams and products have different goals, so the Personas are stretched a bit. So far, the stretching has been modest and closely tied to data (because our target customers do indeed overlap), but it is a concern. In addition, marketing and product development have different needs that require different Persona attributes, and sometimes different target audiences. Marketing is generally interested in buyer behavior and customers; product development is interested in end

users. We’ve had some success in collaborating here, but there are rough edges. Finally, we have seen a certain level of “Persona mania” within our organization and others. Personas can be overused. At worst, they could replace other user-centered methods, ongoing data collection, or product evaluation. Personas are not a panacea. They should augment and enhance: augment existing design processes and enhance user focus. We’ve found that Personas enhance user testing and other evaluation methods, field research, scenario generation, design exploration, and solution brainstorming.

Discussion Theory of mind: how personas work At first encounter, Personas may seem too “arty” for a science and engineering-based enterprise. It may seem more logical to focus directly on scenarios, which after all describe the actual work processes one aims to support. Cooper offered no explanation for why it is better to develop Personas before scenarios. For 25 years, psychologists have been exploring our ability to predict another person’s behavior by understanding their mental state. Theory of Mind first asked whether primates share this ability [22] and then explored its development in children [1]. Every day of our lives, starting very young, we use partial knowledge to draw inferences, make predictions, and form expectations about the people around us. We are not always right, but we learn from experience. Whenever we say or do something, we anticipate the reactions of other people. Misjudgments stand out in memory, but we usually get it right.


Personas invoke this powerful human capability and bring it to the design process. Well-crafted Personas are generative: Once fully engaged with them, you can almost effortlessly project them into new situations. In contrast, a scenario covers just what it covers.

If team members are told, “Market research shows that 20% of our target users have bought cell phones,” it may not help them much. If told “Alan has bought a cell phone” and Alan is a familiar Persona, they can immediately begin extrapolating how this could affect behavior. They can create scenarios. We do this kind of extrapolation all the time, we are skilled at it—not perfect, but very skilled.

THE POWER OF FICTION TO ENGAGE People routinely engage with fictional characters in novels, movies, and television programs, often fiercely. They shout advice to fictional characters and argue over what they have done off-screen or after the novel ends. Particularly in ongoing television dramas or situation comedies, characters come to resemble normal people to some extent. Perhaps better looking or wittier on average, but moderately complex—stereotypes would become boring over time. A fiction based on research can be used to communicate. For example, watching a character succumb slowly to a dementia on ER, one can understand the disease and perhaps even design technology to support sufferers, if the portrayal is based on real observation and data.

METHOD ACTING AND FOCUSING ON DETAIL Many actors prepare by observing and talking with people who resemble the fictional character they will portray. As with Personas, the fictional character is based on real data. An actor intuits details of the character’s behavior in new situations. A designer, developer, or tester is supported in doing the same for the people on whom a Persona is based. Method acting uses a great deal of detail to enable people to generate realistic behavior. Detailed histories are created for people and even objects, detail that is not explicitly referred to but which is drawn on implicitly by the actor.

Merging personas with other approaches As noted above, we see Personas complementing other approaches, or used where another approach is impractical. SCENARIOS AND TASK ANALYSIS Scenarios are a natural element of Persona-based design and development. In Carroll’s words [7], a scenario is a story with a setting, agents, or actors who have goals or objectives, and a plot or sequence of actions and events. Given that scenarios have “actors” and Personas come with scenarios, the distinction is in which comes first, which takes precedence. Actors or agents in scenario-based design are typically not defined fully enough to promote generative engagement. Consider Carroll’s example: “An accountant wishes to open a folder on a system desktop in order to access a memo on budgets. However, the folder is covered up by a budget spreadsheet that the accountant wishes to refer to while reading the memo. The spreadsheet is so large that it nearly fills the display. The accountant pauses for several seconds, resizes the


spreadsheet, moves it partially out of the display, opens the folder, opens the memo, resizes and repositions the memo and continues working.”

The lifelessness of characters in such scenarios has been critiqued from a writer’s perspective [19] and by scenario-based design researchers who suggest using caricatures, perhaps shocking or extreme ones [6] [9]. Bødker writes in [6], “It gives a better effect to create scenarios that are caricatures… it is much easier… to relate to.… Not that they ‘believe’ in the caricatures, indeed they do not, but it is much easier to use one’s common sense judgment when confronted with a number of extremes than when judging based on some kind of ‘middle ground.’” She also recommends constructing both utopian and nightmarish scenarios around a proposed design to stimulate reflection. Task analysis is generally directed toward formal representations that are particularly difficult to engage with generatively. An exception is [2], which recommends detailed character sketches. These thoughtful analyses point to weaknesses in scenarios taken alone. Unless based strongly on data, a scenario can be created to promote any feature, any position (utopian or dystopian), and can be difficult to engage with. Personas need not be extreme or stereotyped characters; the team engages with them over a long enough time to absorb nuances, as we do with real people. This duration of engagement is critical. In a movie, heroes and villains may be stereotyped because of a need to describe them quickly, as with stand-alone scenarios. But in an ongoing television series or a novel, predictable stereotypes become boring, so more complex, realistic characters are more effective.
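As a rough structural illustration (the names, goals, and traits below are invented, not drawn from any study), a scenario references an actor, while a Persona carries enough grounded detail to generate many scenarios:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Persona:
    """A richly specified, reusable character grounded in user data."""
    name: str
    role: str
    goals: List[str]
    traits: Dict[str, str] = field(default_factory=dict)  # habits, skills, context

@dataclass
class Scenario:
    """A single story: one actor, one setting, one sequence of actions."""
    actor: Persona
    setting: str
    actions: List[str]

# Hypothetical example: one Persona can anchor many scenarios.
alan = Persona(
    name="Alan",
    role="accountant",
    goals=["reconcile budgets quickly"],
    traits={"display": "small laptop screen", "phone": "recently bought a cell phone"},
)
scenarios = [
    Scenario(alan, "quarter close", ["open memo", "resize spreadsheet", "compare figures"]),
    Scenario(alan, "commuting", ["check budget alerts on the new phone"]),
]
```

The stand-alone scenario stops at its last action; the Persona, because of the accumulated detail, invites the team to keep writing new ones.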

CONTEXTUAL DESIGN AND ETHNOGRAPHY Contextual Design [3], a powerful approach to obtaining and analyzing behavioral data, is a strong candidate for informing Personas. As it evolved over two decades, Contextual Design increasingly stressed communication with team members, ways to share knowledge acquired in the field. Personas are primarily a tool to achieve this and thus a natural partner to Contextual Design [4] [14]. Ethnographic data may help the most in developing realistic Personas, when available in sufficient depth. Quantitative data may be necessary in selecting appropriate Personas, but does not replace observation. Again, the parallel to method acting arises. Why not just use real people? Designing for a real person is better than designing blind, but just about everyone has some behaviors one would not want to focus design on. Using a real individual would exclude or complicate the use of data from market research, usability testing, and so on. It could undermine the confidence of team members in the generality of particular behaviors—team members do step back and recognize that a Persona represents a group of people, as when they describe “testing six Alans.” PARTICIPATORY DESIGN AND VALUE-SENSITIVE DESIGN Participatory or cooperative design focuses on the eventual users of a system or application. It has the same goal of engaging developers with user behavior and also enlists our ability to anticipate behaviors of familiar people. When designing for a relatively small,


accessible group of people, this approach makes the most sense. Product development is more challenging for participatory design. We discuss the relationship of Personas and participatory design in depth in [12]. Early participatory design efforts included a strong focus on sociopolitical and “quality of life” issues. These issues are more significant today as the reach of computing extends [26]. Although the industry and many companies have engaged these issues at a high level, most usability and interaction design techniques avoid addressing them. Persona use brings sociopolitical issues to the surface. Each Persona has a gender, age, race, ethnicity, family or cohabitation arrangement, socio-economic background, and work or home environment. This provides an effective avenue for recognizing and perhaps changing assumptions about users. If one populated a Persona set entirely with middle-aged white males, the mistake would be obvious. Cooper writes, “all things being equal, I will use people of different races, genders, nationalities, and colors.” Realism, not “political correctness,” is his stated goal. He stereotypes if he feels it will provide more credence and avoids casting strongly against expectations if he feels it will undermine credibility. Persona use does require decision-making; it isn’t a science. If not used appropriately, any powerful tool can take one down the wrong path, as in lying with statistics or using non-representative video examples. Personas are one such powerful tool. It is up to all of us together to develop effective ways to use them.

Acknowledgments We thank Gayna Williams, Shari Schneider, Mark Patterson, Chris Nodder, Holly Jamesen, Tamara Adlin, Larry Parsons, Steve Poltrock, Jeanette Blomberg, and members of the Microsoft Personas and Qual groups.

References
[1] Astington, J.W., & Jenkins, J.M. (1995). Theory of mind development and social understanding. Cognition and Emotion, 9, 151-65.
[2] Benyon, D. & Macauley, C. (2002). Scenarios and the HCI-SE design problem. Interacting with Computers, 14, 4, 397-405.
[3] Beyer, H. & Holtzblatt, K. (1998). Contextual design. Morgan Kaufmann.
[4] Blomquist, Å. & Arvola, M. (2002). Personas in action: Ethnography in an interaction design team. To appear in Proc. NordiCHI 2002.
[5] Brechin, E. (2002). Reconciling market segments and Personas. http://www.cooper.com/newsletters/2002_02/reconciling_market_segments_and_Personas.htm
[6] Bødker, S. (2000). Scenarios in user-centred design—setting the stage for reflection and action. Interacting with Computers, 13, 1, 61-75.
[7] Carroll, J. (2000). Making use: Scenario-based design of human-computer interactions. MIT Press.
[8] Cooper, A. (1999). The inmates are running the asylum. Macmillan.
[9] Djajadiningrat, J.P., Gaver, W.W. & Frens, J.W. (2000). Interaction relabelling and extreme characters: Methods for exploring aesthetic interactions. Proc. DIS 2000, 66-71.
[10] Freydenson, E. (2002). Bringing your Personas to life in real life. http://boxesandarrows.com/archives/002343.php


[11] Goodwin, K. (2002). Goal-directed methods for great design. CHI 2002 tutorial. http://www.acm.org/sigchi/chi2002/tut-sun.html#9
[12] Grudin, J. & Pruitt, J. (2002). Personas, participatory design, and product development: An infrastructure for engagement. Proc. PDC 2002, 144-161.
[13] Hackos, J. & Redish, J. (1998). User and task analysis for interface design. John Wiley and Sons, New York.
[14] Holtzblatt, K. (2002). Personas and contextual design. http://www.incent.com/community/design_corner/02_0913.html
[15] Hourihan, M. (2002). Taking the “you” out of user: My experience using Personas. Boxes and Arrows. http://boxesandarrows.com/archives/002330.php
[16] Interview with Kim Goodwin (2002). User Interface 7 East.
[17] http://www.uiconf.com/uie-7/goodwin_interview.htm
[18] Mikkelson, N. & Lee, W. O. (2000). Incorporating user archetypes into scenario-based design. Proc. UPA 2000.
[19] Moore, G. A. (1991). Crossing the chasm. Harper Collins Publishers, New York.
[20] Nielsen, L. (2002). From user to character—an investigation into user-descriptions in scenarios. Proc. DIS 2002.
[21] Perfetti, C. (2002). Personas: Matching a design to the users' goals. User Interface 7 East. http://www.uiconf.com/uie-7/goodwin_article.htm

[22] Poltrock, S.E. & Grudin, J. (1994). Organizational obstacles to interface design and development: Two participant observer studies. ACM Transactions on Computer-Human Interaction, 1, 1, 52-80.
[23] Premack, D. & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral & Brain Sciences, 1, 4, 515-526.
[24] Pruitt, J.S., Jamesen, H. & Adlin, T. (2001). Personas, user archetypes, and other user representations in software design. UPA 2001 workshop. http://www.upassoc.org/conf2001/reg/program/workshops/w2.html
[25] Pruitt, J.S., Jamesen, H. & Adlin, T. (2002). Creating and using personas: A practitioner’s workshop. UPA 2002 workshop. http://www.usabilityprofessionals.org/conferences/2002/program/workshops/wkshop_Personas.php
[26] Tahir, M. F. (1997). Who's on the other side of your software: Creating user profiles through contextual inquiry. Proc. UPA ’97.
[27] Value-sensitive design site: http://www.vsdesign.org/
[28] Weinstein, A. (1998). Defining your market: Winning strategies for high-tech, industrial, and service firms. New York: Haworth Press.

… ranging from the basic metaphor to the choice of background color. The team just barely has time to incorporate these comments into a revised design before committing to code.

by Marc Rettig

Now consider a different situation, one I have witnessed first-hand over the past few months: a development team spends weeks designing an interface. During the first few days, they construct a paper prototype of their initial thinking about all aspects of the design, and test it with typical representatives of the user community. One of them “plays computer,” moving components of the paper interface around on the table in response to the users’ actions. The others observe and take notes. After the tests they take a week to distill lessons from their observations, redesign the interface, and retest with several new users. This process continues until, at the end of the time allotted for interface design, the team has revised their design four times and tested it with many typical users.

This technique, building prototypes on paper and testing them with real users, is called low-fidelity prototyping, or “lo-fi” for short. The value of prototyping is widely recognized, but as the first situation exemplifies, that value is not always gained in practice. If that has been your experience, you might want to try lo-fi prototyping, which requires little more in the way of implementation skills than the ones you learned in kindergarten.

The idea of lo-fi prototyping (a.k.a. “paper prototypes”) has been around a long time. So long, in fact, that more than one person in the CHI community expressed surprise when I said I was planning to write a column on the subject. But I see this as a wonderfully simple and effective tool that has somehow failed to come into general use in the software community. I say this based on the success of the teams I’ve watched over the past several months, together with the fact that this is the first commercial project where I’ve seen paper prototypes employed. Paper prototyping is potentially a breakthrough idea for organizations that have never tried it, since it allows you to demonstrate the behavior of an interface very early in development, and test designs with real users. If quality is partially a function of the number of iterations and refinements a design undergoes before it hits the street, lo-fi prototyping is a technique that can dramatically increase quality. It is fast, it brings results early in development (when it is relatively cheap to make changes), and it allows a team to try far more ideas than they could with high-fidelity prototypes. Lo-fi prototyping helps you apply Fudd’s first law of creativity: “To get a good idea, get lots of ideas.”

The Problems with Hi-Fi For years developers have used everything from demo-builders to multimedia tools to high-level languages to build prototypes. Lo-fi proponents call these “hi-fi prototypes.” They have their place: selling an idea, testing look-and-feel, detailed proof-of-concept, testing changes to an existing system, and so forth. I’m not suggesting we should stop building them. But they also have problems.

• Hi-fi prototypes take too long to build and change. Even with high-level tools, a fully functional prototype can take weeks to create. I have seen teams build a complete working lo-fi prototype in four hours. The goal is to get through as many iterations as you can during the design phase, because each iteration means improvement. If testing flushes out problems with the basic metaphor or control structure in a design, changing the prototype can again take weeks. This is what Debbie Hix and Rex Hartson, researchers and faculty members at Virginia Tech, call the “software developer’s dilemma”: you can’t evaluate an interaction design until after it is built, but after building, changes to the design are difficult. Paper prototypes, on the other hand, are extremely fast to develop, and the technique is very easy to learn. It is the fastest of the so-called rapid prototyping techniques. To make a broad generalization, interface designers spend 95% of their time thinking about the design and only 5% thinking about the mechanics of the tool. Software-based tools, no matter how well executed, reverse this ratio.

• Reviewers and testers tend to comment on “fit and finish” issues. You are trying to get feedback on the big things: the flow of the conversation, the general layout of the controls, the terminology, the expressiveness and power of the basic metaphor. With a slick software prototype, you are just as likely to hear criticisms about your choice of fonts, color combinations, and button sizes. On the back side of the same coin, developers easily become obsessed with the prettiness-power of a good tool, and spend their hours choosing colors instead of coming up with new ideas.

In contrast, the hand-made appearance of a paper or acetate prototype forces users to think about content rather than appearance.

• Developers resist changes. They are attached to their work because it was so hard to implement. Spend enough time crafting something and you are likely to fall in love with it. Knowing this, team members may feel reluctant to suggest that their colleague should make drastic changes to the lovely looking, weeks-in-the-making software prototype. They would be less hesitant to suggest redrawing a sketch that took an hour to create.

• A prototype in software can set expectations that will be hard to change. Prototyping tools let you do wonderful things in a (relatively) short time. You can make something that looks like a finished product, fooling testers and even management about how far along you are. If it tests well, you may wind up spending time on “reverse damage control,” handling questions about your sudden lack of progress.

• A single bug in a hi-fi prototype can bring a test to a complete halt. To test effectively, your prototype needs to be complete and robust enough for someone to try to do something useful with it. Even with the coolest of high-level tools, building a prototype is still essentially a programming exercise, and we all know how hard it can be to get all the bugs out of a program. On the other hand, I often see teams correcting “bugs” in a paper prototype while the test is in progress.

A Trojan Meme The spread of lo-fi design through my current project started with a visit from Jared Spool (of User Interface Engineering in Andover, Mass.). He and his associate presented the basic ideas, then put us to work in four teams to design and build a prototype of an automated menu for a fast food restaurant. For three hours we discussed, designed, sketched and glued, then ran the results in a faceoff competition with “real users” and a “real task.” That is, we brought people in from elsewhere in the building and told them, “You have $4.92. Order as much food as you can.” The designs were measured by how quickly and efficiently people

could use the interfaces without coaching from the designers. Between tests, each team had a few minutes to refine their interface. We were all impressed with the results of the exercise. In about six hours we had learned the technique, designed an interface and built a model of it, conducted tests, and measurably improved the original design. That was four months ago, and now we have scores of people working on lo-fi designs, refining them through repeated tests with actual users. Interface sketches are lying all over the place, scans are put on the network for peer review, and terms like “affordance” and “mental model” are common parlance. I call this a “Trojan meme” instead of just a “selfish meme” because it did more than reproduce itself through the department. (A meme is an idea, the mental equivalent of a gene, and selfish ones try to replicate themselves in as many minds as possible.) As it spread, it served as a vehicle for spreading a general appreciation of the value of usability design: developers saw first-hand the difference in people’s reactions to successive refinements in their designs. Within days of designing an interface, they saw exactly how their work was perceived by people just like those who will eventually be using their product. The value of two important laws of interaction design was memorably demonstrated: “Know Your User” and “You Aren’t Your User.” Testing for iterative refinement is known in the interface design community as “formative evaluation,” meaning you are evaluating your design while it is still in its formative stages. Testing is used as a kind of natural selection for ideas, helping your design evolve toward a form that will survive in the wilds of the user community. This is in contrast to “summative evaluation,” which is done

once after the product is complete. With summative evaluation you find out how well you did, but you find out too late to make substantial changes. Lo-fi prototyping works because it effectively educates developers to have a concern for usability and formative evaluation, and because it maximizes the number of times you get to refine your design before you must commit to code. To make the most of these advantages, the prototyping effort needs to be carefully planned and followed by adequate testing and evaluation. (It also helps to have someone who can enthusiastically champion the idea.) Hix and Hartson have an excellent chapter on formative evaluation in their book, Developing User Interfaces. If you plan to adopt any of these techniques, I recommend you read their book. The rest of this is drawn from our experience over dozens of designs and scores of tests, notes from Jared Spool’s workshop, and Hix and Hartson’s book.

Building a Lo-Fi Prototype 1. Assemble a kit. In this decadent age of too many computers and too few paint brushes, it might be hard to get all the materials you need by rummaging through the supply closet in the copy room. Make a trip to the office supply store, or better yet, the art supply store, and buy enough school supplies to excite the creative impulses of your team. Here’s a shopping list:

• White, unlined, heavy paper that is bigger than letter size (11 by 17 inches is nice), and heavy enough to endure the rigors of repeated testing and revision.

• Hundreds of 5-by-8-inch cards. These come in handy as construction material, and later you’ll use them by the score for note taking during tests.

• Various adhesives. Tape: clear, colored, double-backed, pin-striping tape, whatever. Glue sticks, and most importantly, Post-It glue, a stick of the kind of glue that’s on the back of those sticky yellow notes. Rolls of white correction tape are great for button labels and hurriedly written field contents.

• Various markers: colored pens and pencils, highlighters, fine and thick markers, pastels.

• Lots of sticky note pads of various sizes and colors.

• Acetate sheets, the kind you use to make overhead presentations. Hix and Hartson swear by these as the primary construction material for lo-fi interfaces.

• See what you find in the architecture section. They have sheets of rub-on texture, for example, which could give you an instant shading pattern.

• Scissors, X-acto knives, straightedges, Band-Aids.

Just like kindergartners, lo-fi designers sometimes find inspiration in the materials at hand. So go ahead, buy that package of colored construction paper. The worst that can happen is you won’t use it. Eventually your team will develop their own construction methods, and settle on a list of essentials for their lo-fi construction kit.

2. Set a deadline. There is a terrific temptation to think long and hard about each aspect of the interface before you commit anything to paper. How should you arrange the menus? What should be in a dialog box, what should be in menus, and what should be in a tool palette? When you are faced with a blank sheet of paper, these kinds of decisions crowd your thoughts all at once. “Wait,” you think, “we haven’t thought about this enough!” That’s exactly the point: no matter how hard you think about it, you …

Figure 1. A few components of a paper prototype. The main window is in the middle, showing a few pieces of data added with strips of correction tape, and controls stuck on with Post-It paper. The window is surrounded by pop-up menus, dialog boxes, and sundry interface widgets.

…gets, producing large amounts of data, or rendering artistic and attractive designs. Exploit these talents and divide the labor accordingly. Construct a first version completely by hand. Sketch the widgets, hand-letter the labels. Don’t even worry about using a straightedge at first. Just get the ideas down on paper. Test small details on one another, or drag people in from the hall for quick tests of alternative solutions. Of course, hand-drawn sketches, no matter how carefully done, may not be appropriate for some testing situations. For example, a customer may be willing to let you test your design with actual users. They may understand the transience of the prototype, but you still want to make a good impression. You want to look sharp. Some of the teams on my project have made remarkably attractive paper interfaces using components created with drawing software, then printed on a laser printer. Some of them build up a familiar look with elements taken from screen captures. To facilitate this kind of thing, they set up a library of lo-fi widget images: blank buttons of all sizes, window and dialog frames, scroll bars, entry fields, and so on. People print these out, resize them on the photocopier, and make them part of their standard lo-fi kit. Or they resize them on the computer, add labels, and print out a custom part for their work in progress. This is an example of the kind of

preparation that will help lo-fi prototyping become a normal part of your design process. Preparing a widget library, writing down guidelines, and taking time to train people will make everyone more enthusiastic and productive.

Preparing for a Test However much care you take in building your prototype, the tests will be ineffective unless you prepare well for them. Be sure to attend to the following matters.

1. Select your users. Before you start designing, you should do enough user and task analysis to understand the people who will be using your software: their educational and training background, knowledge of computers, their familiarity with the domain, typical tasks involved in their job, and so on. Based on this study, you can look for pools of potential testers for your prototype. With a good user profile on hand, you can develop a questionnaire that will help to choose the best representative users from available candidates. It would seem reasonable to arrange it so the people testing your prototype are the same people who will be using the final product. But bona fide members of the user community may be hard to corral for the time it takes to run a test, and using them may not be the best idea in the long run. Be sensitive to the political climate. People may feel threatened

by the intrusion of a new system into their work (perhaps justifiably!), or there may be a competitive situation that makes your employer reluctant to expose new ideas outside the walls of your building. Since you are looking for appropriate knowledge and skills, not job titles, you can often get by with “surrogate users,” people who fit the same profile as your actual clients, but free from whatever association prevents you from testing with the clients themselves. I’ve heard of all kinds of tricks for attracting people to the test. Spool says he’s done everything from running ads in the newspaper to recruiting university students to contacting local user groups. Anything to avoid using actual customers, employees, or friends and family. (The latter may be accessible, but there are a lot of things about sharing ties in the same social web that can conspire to damage a usability test. For example, testers who know you or the project may skew the results by trying hard to please you or do what they think you expect them to do.) Finally, remember that no two people are the same, and your product’s users may be a diverse group. Try to recruit testers that represent the whole range of characteristics in your target audience. Our practice has been to conduct at least one

Figure 2. A lo-fi testing session.

round of testing in our office with surrogates, then go to the field for testing with the most typical end users we can find.

2. Prepare test scenarios. Write a set of scenarios, preferably drawn from task analysis, describing the product during use in a typical work situation. Design your prototype to support a few of these scenarios, narrowing the scope of your efforts to a reasonably small set of functions, but broad enough to allow meaningful tests. If possible, ask someone to review the scenarios and sample data and tell you whether they look realistic. In our experience, people find a lo-fi interface more engaging, more realistic, if it shows data that looks familiar and we ask them to perform realistic tasks. This helps draw them into the “let’s pretend you’re really using a computer at your job” world, which leads to better tests. On the other hand, unrealistic scenarios and data can severely damage the credibility of your design.

3. Practice. Just as a bug in a software prototype can ruin a test session, so can a bug in a lo-fi prototype. That bug could be a missing component, a misunderstanding on the part of the person playing “computer,” or even excessive hesitation and confusion because the team is unfamiliar with how to conduct a test. So to avoid embarrassment, conduct several dry runs before you test with people from outside your team. Each team member should be comfortable with his or her role, and you need to make sure you have the supplies and equipment needed to gather good information.

Conducting a Test We find it takes four people to get the most out of a test session (see Figure 2), and that their activities fall into four essential roles:

• Greeter. Much the same as the usher in a church, the greeter welcomes users and tries to put them at ease. We have some forms we ask people to fill out (an experience profile, for example), a job the greeter handles while other team members are setting up for the test.

• Facilitator. Once the test is set up, the facilitator takes the lead, and is


the only team member who is allowed to speak freely during the test. Facilitating means three things: giving the user instructions, encouraging the user to express his or her thoughts during the test, and making sure everything gets done on time. This is a difficult enough job that the facilitator should not be expected to take notes during a session.

• Computer. One team member acts as the “computer.” He or she knows the application logic thoroughly, and sustains the illusion that the paper prototype behaves like a real computer (with an unusually slow response time). A pointing finger serves as a cursor, and expressions like “I typed … in that field” substitute for keyboard entry. If the user touches a control, the computer rearranges the prototype to simulate the response, taking care not to explain anything other than the behavior of the interface.

• Observers. The rest of the team members quietly take notes on 5-by-8-inch index cards, writing one observation per card. If they think of a recommended solution, they write it on the same card that records the problem.

Since all of these roles can be exhausting, we rotate them among the team when we conduct more than one session a day (and we very often schedule four sessions in a day). Typical test sessions usually last a little over an hour, and go through three phases: getting ready, conducting the test, and debriefing. We begin with greetings, introductions, refreshments and general ice-breaking, trying our very best to assure people that the test is confidential, the results will remain anonymous, and their supervisor won’t hear a word about whether or not they “got it.” People often say things like, “Am I flunking the test? Am I getting it right?” To which we answer, “Don’t worry, the question is whether or not we are flunking. The interface is on trial, not you. If you fail to understand something or can’t complete one of the tasks, that’s a sign of trouble with the design, not a lack of intelligence on your part.” While this is going on, someone positions a video camera (we tape all

the sessions) so it points down over the user’s shoulder to look at the interface and the hands moving over it. No one’s face ever appears on tape. During the test, the facilitator hands written tasks to the user one at a time. These must be very clear and detailed. As the person works on each task, the facilitator tries to elicit the user’s thoughts without influencing his or her choices. “What are you thinking right now?” “What questions are on your mind?” “Are you confused about what you’re seeing?” While this is going on, the rest of the team members observe and take notes, and may occasionally interject a question. But they must never laugh, gape, say “a-ha,” nudge one another, or otherwise display their reaction to what’s happening to their careful design. This kind of thing can intimidate or humiliate users, ruining the relationship and spoiling the test. It can be terribly difficult to keep still while the user spends 10 minutes using all the wrong controls for all the wrong reasons. You will feel a compelling urge to explain the design to your users. Don’t give in. When the hour is over, we spend a 10-minute debriefing session asking questions, gathering impressions, and expressing our thanks.

Evaluating Results Lo-fi or hi-fi, prototyping is worthless unless information is gathered and the product is refined based on your findings. As I wrote earlier, Hix and Hartson nicely cover the details of gathering and analyzing test data. We spend quite a bit of time (at least a day per iteration) sorting and prioritizing the note cards we wrote during the test sessions. Our method involves arranging the paper prototype on a big table, then piling the note cards next to the relevant interface component. Then team members divide the labor of going through the piles to summarize and prioritize the problems. These sorted piles inform a written report on findings from the test, and form the agenda of a meeting to discuss recommended changes to the design. The team works through the piles and agrees on suggested changes, which are written on Post-It notes and affixed directly to the


relevant part of the paper prototype. Constructing the revised prototype becomes a process of taking each component and following the recommendations that were stuck to it.
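The sorting step lends itself to very light tooling. As a hypothetical illustration (the components, problems, and priorities are invented), each note card can be treated as a small record and piled by the interface component it concerns:

```python
from collections import defaultdict

# One record per 5-by-8-inch note card: the component it concerns, the observed
# problem, an optional suggested fix, and a priority assigned during sorting.
cards = [
    {"component": "order dialog", "problem": "'Submit' misread as 'Save'",
     "suggestion": "rename button", "priority": 1},
    {"component": "menu bar", "problem": "user hunted for Undo",
     "suggestion": None, "priority": 2},
    {"component": "order dialog", "problem": "tab order skips the quantity field",
     "suggestion": "fix tab order", "priority": 1},
]

# Pile the cards next to their component, highest-priority problems first,
# mirroring the physical sort on the big table.
piles = defaultdict(list)
for card in cards:
    piles[card["component"]].append(card)

for component, pile in piles.items():
    print(component)
    for card in sorted(pile, key=lambda c: c["priority"]):
        print(f"  P{card['priority']}: {card['problem']}")
```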

Hix, who for years has been teaching courses and workshops in interface design, says that people consistently enter the first lo-fi exercise with skepticism. After trying it they invariably say something to the effect of, “I can’t believe how much we learned from this!” If this column is the first place you have heard about the lo-fi technique, one danger is that you will set aside this magazine with just enough skepticism that, however much interest I’ve managed to create, you will fail to actually try it. Having seen other skeptics converted, I’m confident in recommending this technique. If you already have a working high-fidelity prototype, it probably isn’t worth abandoning that course to switch to lo-fi. But if you are in the very early stages of design and exploring broad questions, or if you need to learn more now, lo-fi prototyping is just the tool to pick up.


WIZARD OF OZ STUDIES — WHY AND HOW

Nils Dahlbäck, Arne Jönsson, Lars Ahrenberg
Natural Language Processing Laboratory
Department of Computer and Information Science
Linköping University
S-581 83 LINKÖPING, SWEDEN
Phone: +46 13281000
[email protected], [email protected], [email protected]

ABSTRACT We discuss current approaches to the development of natural-language dialogue systems, and claim that they do not sufficiently consider the unique qualities of man-machine interaction as distinct from general human discourse. We conclude that empirical studies of this unique communication situation are required for the development of user-friendly interactive systems. One way of achieving this is through the use of so-called Wizard of Oz studies. We describe our work in this area. The focus is on the practical execution of the studies and the methodological conclusions that we have drawn on the basis of our experience. While the focus is on natural language interfaces, the methods used and the conclusions drawn from the results obtained are of relevance also to other kinds of intelligent interfaces.

KEYWORDS: Design and evaluation, dialogue, natural language interfaces.

THE NEED FOR WIZARD OF OZ STUDIES Dialogue has been an active research area for quite some time in natural language processing. It is fair to say that researchers studying dialogue and discourse have developed their theories through detailed analysis of empirical data from many diverse dialogue situations. In their recent review of the field, Grosz, Pollack and Sidner [12] mention work on task-oriented dialogues, descriptions of complex objects, narratives, informal and formal arguments, negotiations and explanations. One thing which these studies have shown is that human dialogue is a very complex activity, leading to a corresponding complexity of the theories proposed. In particular it is evident that participants must rely on knowledge and reasoning capabilities of many different kinds to know what is going on in a dialogue. When it comes to applying data and theories to the design of natural language interfaces, it has often been argued that human dialogues should be regarded as a norm and a starting point, i.e. that a natural dialogue between a person and a computer should resemble a dialogue between humans as much as possible. But a computer is not a person, and some of the differences are such that they can be expected to have a major influence on the dialogue, thus making human-human data an unreliable source of information for some important aspects of design, in particular the style and complexity of interaction.

First let us look at some of the differences between the two dialogue situations that are likely to play a significant role. We know that language is influenced by interpersonal factors. To take one example, it has been suggested by R. Lakoff [18] and others that the use of so-called indirect speech acts is motivated by a need to follow “rules of politeness” (1. don’t impose, 2. give options). But will a user feel a need to be polite to a computer? And if not, will users of NLIs use indirect requests in the search for information from a database? If not, do we need a component in our NLI for handling indirect requests? This is obviously an empirical question that can be answered only by studying the language used in such situations. Indirect utterances are of course something more than just ways of being polite. There are other forms, such as omitting obvious steps in an argument, relying on the listener’s background knowledge, and giving answers, not to the question, but by supplying information relevant to an inferred higher goal. But also the use of these will presumably vary with the assessed characteristics of one’s dialogue partner. In the case of keyboard input another important factor is that the communication channel is different from ordinary human dialogues. The fact that the dialogue will be written instead of spoken will obviously affect the language used. As pointed out by Cohen [3], “Keyboard interaction, with its emphasis on optimal packaging of information into the smallest linguistic ‘space’, appears to be a mode that alters the normal organization of discourse.”

Much of our language behaviour, on all levels, from pronunciation to choice of words and sentences, can be seen as a result of our attempts to find the optimal compromise between two needs, the need to make ourselves understood, and the need to reach this goal with as little effort as possible. It is a

well established fact in linguistic research that we as speakers adapt to the perceived characteristics of our interlocutors. The ability to modify the language to the needs of the hearer seems to be present already at the age of four [21]. Language directed to children is different from language directed to adults, as is the case when talking to foreigners, brain-injured people etc. There are good reasons to believe that similar adjustments can and will be made when we are faced with the task of interacting with a computer in natural language. One important consequence of this is that goals in some dialogue research in computational linguistics, such as ‘getting computers to talk like you and me’ [20] or developing interfaces that will “allow the user to forget that he is questioning a machine” [10], are not only difficult to reach. They are misconceived. Given these differences between the two types of dialogue and the well-founded assumption that they will affect the linguistic behaviour of the human interlocutors, it follows that the language samples used for providing the empirical ground should come from relevant settings and domains. In other words, the development of NLI software should be based on an analysis of the language and interaction style used when communicating with NLIs. Since, as we just observed, users adapt to the language of their interlocutors, analysis of the language used when communicating with existing NLIs is of limited value in the development of the next generation of systems. This is what motivates data collection by means of Wizard of Oz techniques, i.e. studies where subjects are told that they are interacting with a computer system through a natural-language interface, though in fact they are not. Instead the interaction is mediated by a human operator, the wizard, with the consequence that the subject can be given more freedom of expression, or be constrained in more systematic ways, than is the case for existing NLIs. (Some well-known studies based on a more or less ‘pure’ Wizard of Oz technique are those of Cohen [3], Grosz [11], Guindon [13], and Kennedy et al. [17]. For a review and discussion of these and other studies, see [5, 16]. [9] provides a review focused on speech systems.) Of course you cannot expect to gather all the data you need for the design of a given application system by means of Wizard of Oz studies, e.g. as regards vocabulary and syntactic constructions related to the domain. But for finding out what the application-specific linguistic characteristics are, or for gathering data as a basis for theories of the specific genre of human-computer interaction in natural language, the Wizard of Oz technique seems to us to be the best available alternative. The rest of this paper is concerned with a description of our work in the area of Wizard of Oz simulation studies. The focus is on the practical execution of the studies and the methodological conclusions that we have drawn on the basis of our experience. Some results on the characteristics of human-computer interaction in natural language have been reported elsewhere [4, 5, 6, 7, 8, 16] and further work is currently in progress. Some of the major results obtained


thus far are that man-machine dialogues exhibit a simpler structure than human dialogues, making it possible to use simpler and computationally more tractable dialogue models, and that the users and the system rely on a conceptual model that is specific to the domain but common to all users, i.e. they use mutual knowledge based on community membership, to use the terminology of [2]. These results in turn suggest less need for dynamic user modelling but a larger need for dynamic focus management than has hitherto been assumed in the HCI/NLP communities.
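The flavour of “simpler but computationally more tractable” can be suggested with a toy sketch. This is only an illustration of initiative-response pairs combined with an explicit focus stack, with invented data; it is not the dialogue model actually used in this work:

```python
# Toy dialogue manager: each user initiative is answered directly, and the
# current focus is kept as a small stack so that elliptical follow-up
# questions ("what about the price?") can be resolved without user modelling.

class DialogueManager:
    def __init__(self, database):
        self.database = database
        self.focus = []  # stack of objects currently under discussion

    def handle(self, topic, attribute):
        """Answer a question about a topic; a missing topic reuses the focus."""
        if topic is not None:
            self.focus.append(topic)          # new topic: push it onto the focus
        if not self.focus:
            return "Which object do you mean?"
        current = self.focus[-1]
        value = self.database.get(current, {}).get(attribute, "unknown")
        return f"{attribute} of {current}: {value}"

db = {"Lefkada": {"hotels": "Adani", "price": "3 820 SEK"}}
dm = DialogueManager(db)
print(dm.handle("Lefkada", "hotels"))  # explicit topic
print(dm.handle(None, "price"))        # elliptical follow-up resolved via the focus stack
```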

SOME DESIGN ISSUES To circumvent the risk of drawing general conclusions

that in fact are only a reflection of the specific experimental setting used, we have striven to vary the type of background system, not only as regards the content or application domain, but also as regards the ‘intelligence’ of the system and the types of possible actions to perform by the person using it. Thus far we have collected approximately 150 dialogues, using nine different real or simulated background systems. Apart from the use of ‘pure’ natural language, in one case the dialogues also contain tables displaying the contents of the Ingres database, and in two cases a limited use of graphics is possible. Our aim is to simulate the interaction with a system in the case of an occasional or one-time user, i.e. a user who is unfamiliar with the system, but has some knowledge of the domain. We think that this is the most relevant user category to study, as otherwise the user will be adapted to the system. We have therefore tried to use background systems and tasks which follow these criteria. But it is not enough to have a reasonable background system and a good experimental environment to run a successful experiment. Great care should also be taken regarding the task given to the subjects. If we give them too simple a task to solve, we will not get much data to analyse, and if we give them too detailed instructions on which information they should seek from the system, there is a risk that what they will type is not their way of phrasing the questions but ours. Our approach has been to develop a so-called scenario, i.e. a task to solve whose solution requires the use of the system, but where there does not exist one single correct answer, and/or where there is more than one way to reach the goal. Fraser and Gilbert [9] in their simulations of speech systems also propose the use of scenarios to achieve realistic interactions. We have previously stressed some consequences of the fact that computers are different from people, which has motivated the choice of Wizard of Oz simulations. But another consequence of this is that such simulations are very difficult to run. People are flexible, computers are rigid (or consistent); computer output is fast, people are slow at typewriting; computers never make small mistakes (e.g. occasional spelling errors), people make them all the time. The list could be made longer, but the conclusion is obvious. If we want our subjects to believe that they are communicating with a computer also after three exchanges, we cannot let the person simulating the computer just sit and slowly write the answers on the screen. Therefore, to make the output from the wizard resemble that of a computer as far as possible as regards timing and consistency, we have developed an environment for conducting the experiments, currently running on SUN Sparc stations in the Medley Lisp environment. The background system can be a real system on the same or another computer, or it can be simulated too. The simulation environment will be the topic of the next section.



THE SIMULATION ENVIRONMENT ARNE The simulation environment now exists in its third version, ARNE-3. Some of its main features are:





• response editor with canned texts and templates easily accessed through menus
• ability to access various background systems
• editor for creating queries to database systems
• interaction log with time stamps

The simulation environment is customized for each new application. An overview of the simulation environment is shown in figure 1, where the application is a Travel agency system holding information on holiday trips to the Greek archipelago.
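A toy sketch may make the two base components, a hierarchically organised store of canned responses and a time-stamped interaction log, more concrete; the menu labels and texts below are invented stand-ins, not the actual ARNE-3 contents:

```python
import time

# Canned texts organised as a hierarchy of menus: the wizard can stop at any
# level and send the (more or less detailed) text stored there.
canned = {
    "Resort": {
        "_text": "Trips are available to Ikaria, Lefkada, Kreta, Samos and Tolon.",
        "Lefkada": {
            "_text": "General information about Lefkada ...",
            "Hotel Adani": {"_text": "General information about hotel Adani on Lefkada ..."},
        },
    },
}

def lookup(path):
    """Walk the menu hierarchy and return the canned text at the chosen depth."""
    node = canned
    for label in path:
        node = node[label]
    return node["_text"]

log = []  # interaction log with time stamps

def send(text):
    """Send a response to the subject and record it in the log."""
    log.append((time.strftime("%H:%M:%S"), "wizard", text))
    print(text)

send("Wait ...")                       # quick acknowledgement while searching
send(lookup(["Resort", "Lefkada"]))    # stop at the second level of the hierarchy
```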

Figure 1: An overview of the simulation environment (the wizard’s view).

The environment in its base configuration consists of two parts, a log and a response editor, each accessed through its own window. The editor window can be seen in the lower left part of the screen, while the log window is found in the lower right part. Maps and other kinds of graphics can also be displayed. In one scenario for the Travel system the subject can also order a holiday trip. The window in the upper left part of the screen is the template for the order. This is filled in by the wizard as the interaction proceeds. When the ordering window is completed, the subject receives a confirmation in natural language of the ordered item. This is generated automatically by a Common Lisp function from the order template. This is in line with our general policy to automate as much as possible of the interaction. The editor window is used to create responses to the subjects. When a response is ready it is sent to the subject and simultaneously logged in the log window. To speed up the response time the editor has a hierarchically organised set of canned texts which are easily reached through a set of menus, seen to the right in the editor window. Figure 2 shows the hierarchy of canned texts for the menu item Resmål (Eng: Resort). The wizard can stop at any point in the hierarchy and will thus provide more or less detailed information depending on how far the dialogue has developed. So, in the


Figure 2: A menu hierarchy

example of figure 2, if the wizard stops at Lefkada, on the second level in the hierarchy, the subject will be provided with general information on Lefkada, while if the wizard stops at Adani, general information about hotel Adani on Lefkada is provided. The total amount of canned text available in this fashion is 2300 rows, where a row is anything from a full line of text to a single word. This corresponds to approximately 40 A4 pages. The text from the menus is entered into the editor window, to allow the wizard to edit the information, if necessary, before sending it to the subject. Certain messages are so simple and so commonly used that they can be prompted directly to the subject without first passing the editor. These are also present in the quick menus. In the example there are two such quick responses, one is the prompt ==>, and the other is Vänta ... (Eng. “Wait ...”). The latter ensures that the subject receives an acknowledgement as soon as the input is acquired, while the wizard scans through the canned texts for an answer. The simulation environment can also be connected to existing background systems. One example of this is the Cars simulations, where subjects could acquire information on properties of used cars from an Ingres database containing such information. The simulation environment in these simulations consisted of four different windows. Here there were two windows added for database access, one is the actual database interface and the other is a query command editor. As forming an SQL query can take quite some time, we needed to speed up that process. This is done in a similar way as the response generation, namely by using hierarchically organised menus. The menus contain information that can be used to fill in an SQL template. The editor used for this purpose is an instance of the same editor that was used for creating responses to the subject.

The simulation environment can also be connected to existing background systems. One example of this is the Cars simulations, where subjects could acquire information on properties of used cars from an Ingres database containing such information. The simulation environment in these simulations consisted of four different windows: two windows were added for database access, one being the actual database interface and the other a query command editor. As forming an SQL query can take quite some time we needed to speed up that process. This is done in a similar way as the response generation, namely by using hierarchically organised menus. The menus contain information that can be used to fill in an SQL template. The editor used for this purpose is an instance of the same editor that was used for creating responses to the subject. Thus, the wizard need not learn a new system, which again provides a more efficient interaction as the same commands and actions are used to carry out both tasks. The database access menus do not only contain SQL query templates, but also entries for the objects and properties that are stored in the database. Thus the wizard can avoid misspelled words, which would lead to a database access failure and a slowdown of response time. It is a time-consuming task to customize the simulation environment to a particular application. For some applications we have used some 20-40 pilot studies before being satisfied with the scenario and the performance of the simulation. But we believe that without such preparation, there is a large risk that the value of the results obtained is seriously diminished.
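The SQL-template menus can be sketched in the same spirit. The schema, column names, and car models below are invented for illustration (the paper says only that an Ingres database of used cars was queried); the point is that because the wizard picks objects and properties from menus mirroring the database contents, misspelled values that would make a query fail cannot occur.

# Hypothetical sketch of filling an SQL template from menu selections.
# Template, table and column names are invented, not taken from the paper.
SQL_TEMPLATE = "SELECT {columns} FROM cars WHERE model = '{model}';"

# The menus only offer values that actually occur in the database.
MENU_COLUMNS = ("price", "fuel_consumption", "rust_protection")
MENU_MODELS = ("Volvo 240", "Saab 900", "Opel Kadett")

def build_query(columns, model):
    """Fill in the SQL template from values chosen in the menus."""
    if not all(c in MENU_COLUMNS for c in columns) or model not in MENU_MODELS:
        raise ValueError("only menu-provided values are allowed")
    return SQL_TEMPLATE.format(columns=", ".join(columns), model=model)

print(build_query(["price", "rust_protection"], "Saab 900"))
# SELECT price, rust_protection FROM cars WHERE model = 'Saab 900';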

OUR EXPERIMENTAL DATA
We have conducted a number of Wizard of Oz experiments using different background systems and scenarios.

Corpus 1
Corpus 1 was collected using the first versions of the simulation environment. This corpus contains dialogues with five real or simulated background systems. The first system, PUB, was a library database then in use at our department, containing information on which books the department owned, and in which researcher's room they were kept. Common bibliographic information was also obtainable. The C-line system was a simulated database containing information about the computer science curriculum at Linköping University. The scenario for the subjects was that they should imagine themselves working as study counselors, their task being to answer a letter from a student with questions about the Master's program in computer science. Five dialogues were run in this condition.


In the third system, called HiFi, the user can order high-quality HiFi equipment after having queried a simulated database containing information about the available equipment. The system can also answer some questions about which pieces can suitably be combined, so in a sense it is not a database but an expert system. The fourth system in this corpus is the first version of the automated travel agency encountered previously in the description of the simulation environment. In this version there were no graphics facilities. The last system in corpus 1 is a simulated wine selection advisory system. It is capable of suggesting suitable wines for different dishes, if necessary within a specified price range. It could also tell whether two wines could be combined in the same meal. The analysis of this corpus is presented in [5, 7, 16] (general overview), [4, 5, 8] (dialogue structure), and [5, 6] (pronoun analysis). [1] gives an overview of the NLI project for which the analysis was used. [5] presents the most detailed analysis of both the dialogue structure and the pronoun patterns, and also analyses the use of definite descriptions.

Corpus 2
The second corpus was collected using the refined Wizard of Oz simulation environment presented here, and with a new set of scenarios. This corpus consists of a total of 60 dialogues using two different background systems: the Cars database of used car models and a considerably revised and enlarged version of the travel system used in corpus 1. In this corpus half of the subjects could only obtain information from the system, whereas the other half could also order the trip, as was the case in corpus 1. Dialogues were collected under two different conditions: one where the subjects knew that they were interacting with a person and one which was a real Wizard of Oz simulation. We thus have 10 subjects in each cell. The analysis of this corpus is presently under way. Some results are used in [8].

SYSTEMS TRIED BUT NOT USED
We have found the simulation of database dialogues fairly straightforward, as is the case with the simulation of systems where the user can perform more tasks, such as ordering equipment after having obtained information about the stock. But for some other kinds of systems we have encountered different kinds of problems, in some cases leading us to abandon the project of collecting the dialogues for a particular system. One example of this was an EMYCIN-based expert system, advising on tax issues in connection with the transfer of real estate. There were many reasons for our believing that this was a suitable system for our purposes. The domain is one with which we thought most people had some familiarity. Another reason was that rule-based expert systems such as this are a large and growing area and are considered one possible application domain for natural language interfaces.

The basic reason for not being able to use this promising application was that the system was only a prototype that never was completed. Not only did it contain some bugs, but there were "holes" in its knowledge, i.e. some subareas for which no rules were implemented. It turned out to be impossible to create a scenario which guaranteed that the subjects kept themselves within the system's competence. The lesson we learned from this was that if we are to use a real background system, it must be well tested and functioning properly. Furthermore, the dialogue of EMYCIN-based expert systems is controlled by the system to an extent that makes it difficult to simulate a more open dialogue where the user can take the initiative too.

With the development of bitmapped screens and mice, it becomes interesting to study multi-modal interfaces where users can use both written input and direct manipulation. And if we make it possible for the user to use both modes, we can learn something about when the different interface methods are to be preferred. We therefore tried to use a computer-based calendar system developed at our department for this purpose. In the system you can book meetings with groups and individuals, deciding on the time and location. You can also ask the system about these meetings, and about the times when people or groups of people are not booked, for instance when planning a meeting with them. You can do this by writing in a calendar displayed in a window on the screen, but also by using a limited natural language interface. There were two major problems when designing this experiment. The first problem was to expand the ARNE environment so that it could handle graphics too. In the Calendar system we actually send the graphics on the net between the different workstations, which for obvious reasons gave long response times. This gave rise to some problems discussed below. In the later travel agency project we have therefore stored all the graphical information on the user's machine, and only send a signal from the wizard to this station telling which picture to display.
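The change made for the later travel agency simulations, keeping the pictures on the subject's machine and sending only a short display command over the network, could look roughly like the sketch below. The socket details, message format, and picture identifier are assumptions for illustration; the paper says only that a signal is sent telling the subject's station which picture to display.

# Hypothetical sketch: the wizard sends a short picture identifier instead of
# transmitting the bitmap itself; the subject's workstation then shows a
# locally stored image with that name. Host, port and protocol are invented.
import socket

def show_picture(picture_id, host="subject-workstation", port=5000):
    """Tell the subject's station to display a locally stored picture."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(f"SHOW {picture_id}\n".encode())

# e.g. show_picture("lefkada_map") sends a few bytes on the wire instead of a
# full bitmap, which keeps response times short.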


The second problem was deciding how to analyse the obtained data, and this we did not solve. If we lack well-developed theories for dialogues in natural language, the case is even worse for this kind of multi-modal dialogue. The only thing we have been able to do thus far is to simply observe the users running the system. But even this simple data collection has given us one somewhat surprising observation concerning the effects of very slow response times on the dialogue structure. The interesting fact is that in spite of these long response times, the language used by the subjects is still coherent, with a number of anaphoric expressions, something which goes somewhat contrary to expectations, since one could assume that it is necessary for the user to have the dialogue 'on the top of his consciousness' to be able to use such linguistic devices. It is of course not possible here to give an explanation of this phenomenon, which in our opinion requires further investigation. But it is possible that the fact that both the dialogue and the calendar are displayed on


the screen affects the dialogue structure. Another system tried but not used was an advisory program for income tax return and tax planning that runs on IBM PCs. The reason for thinking that this was a suitable system for our experiments is of course the same as the first one described above. One reason for not using it was that very little dialogue was necessary to use the program, apart from filling in the menus that correspond to various parts of the tax forms. So it seems as if a natural language interface is not the preferred mode for such a system, but at most something which can supplement it. Another difficulty was with the scenario, as people are not willing to present their own income tax planning in an experiment, and it is quite a complex task to learn a fictitious tax profile. In one of our simulations we have also tried to simulate an advisory system. But there are some problems with this too, the most important being that it is difficult for the wizard to maintain a consistent performance and give the same answers to similar questions from different users, and even from the same user. To some extent these problems can be overcome, but it seems to require longer development phases than for other kinds of systems. Advisory systems thus seem to give us two kinds of problems if we want to use them in Wizard of Oz studies. On the one hand, the simulation of the system is difficult to do, and if one wants to use a real system developed on existing shells, at least in some cases the dialogue is system driven to an extent that there seems to be little that can be gained from such a study. To summarize, we can identify three parameters that must be given careful consideration: the background system, the task given to subjects, and the wizard's guidelines and tools.




● If the background system is not simulated, it should be fully implemented. A shaky prototype will only reveal that system's limitations and will not provide useful data. Furthermore, the system should allow for a minimum of mixed-initiative dialogue. A system-directed background system will give a dialogue which is not varied enough.

● The task given to subjects must be reasonably open, i.e. have the form of a scenario. Retrieving information from a database system and putting it together for a specific purpose can be fruitful. But if the domain is so complex that it requires extensive domain knowledge, or the task is of a more private nature, then it is likely that the subjects will try to finish their task as quickly as possible and again not provide enough dialogue variation. The specification of the task must allow for varied outcomes. Many different outcomes must be considered "correct" and there should be many ways to explore the background system to achieve a reasonable result.



● The simulation experiment must be studied in detail, in pilot experiments, before the real simulations are carried out. This information is used to provide knowledge to the wizard on how to act in various situations that he may encounter. Furthermore, the wizard needs a variety of pre-stored responses covering typical situations. Otherwise, besides slowing down the simulation, the ensuing variation will provide results that are less generalizable.

FOR AND AGAINST THE CHOSEN METHOD
We have conducted post-experimental interviews with all our subjects. The most important objective was of course to ascertain that they had not realized that the system had been simulated, and also to explain what we had done and why we had deceived them. (We also explained that the collected dialogue should be destroyed if they so wished.)

In their review of the Wizard of Oz method, Fraser and Gilbert [9] argued, on ethical grounds, against deceiving subjects about the nature of their conversation partner. We do not want to deny that there are ethical problems here, but we think that they can be overcome, and that there are good reasons for trying to do so. As pointed out above, it has been shown that there are differences in the language used when subjects think they communicate with a computer and when they think they communicate with a person. And, what is more important, the differences observed concern aspects of language over which subjects seem to have little conscious control, e.g. the type and frequency of the anaphoric expressions used. So at least if your interests concern syntax and discourse, we consider it important to make the subjects believe that they are communicating with a computer, simply because we do not think that subjects can role-play here and give you the data you need. And if, on the other hand, you find that subjects find it difficult to use an existing NLI, as for instance in [14], this amounts to hardly anything more than a demonstration of the limitations of existing technology. So much for the need, but how about the ethics? We would claim that if one uses the practice developed within experimental social psychology of a debriefing session afterwards, explaining what you have done and why you found it necessary to do so, and furthermore tells the subjects that the data collected will be destroyed immediately if they so wish, you will encounter little problem. In our experiments we have done so, and we have so far only had a number of good laughs and interesting discussions with our subjects on their expectations of what a computer can and cannot do, but no one has criticized us for our 'lies'. Perhaps one reason for this is that none of the subjects felt that they had been put in an embarrassing situation. It is not exactly the same as Candid Camera. Another possible critique is that one should study existing systems instead of simulated ones. But in this case we agree with Tennant's [22] conclusion that people can often adapt


to the limitations of an existing system, and such an experiment does not therefore tell you what they ideally would need. It could also be argued that the human ability to adapt to the communicative capacity of the dialogue partner means that what we find is only the subjects' adaptive responses to the wizard's conception of what an NLI should be able to do. But it is exactly for this reason that the wizards in our experiments have not been instructed to mimic any specific capacity limitations. At the present stage of development of NLI technology, we cannot say with any high degree of certainty what we will and will not be able to do in the future. Furthermore, it is extremely difficult to be consistent in role-playing an agent with limited linguistic or communicative ability, so to make such an experiment you would need some way of imposing the restrictions automatically, for instance by filtering the input through a specific parser, and only understanding those utterances that can be analysed by this parser. Furthermore, the fact that we have used different wizards for the different background systems guarantees at least that the language we find our subjects using is not the reflection of the idiosyncrasies of one single person's behaviour in such a situation. The possible critique against the artificiality of the experimental setting can be levelled against another aspect of the method used, namely that the subjects are role-playing. They are not real users, and their motivation for searching for information or ordering equipment is not really theirs. This is an argument that should be taken seriously. It is, however, our belief that the fact that the subjects are role-playing affects different aspects of their behaviour differently. If the focus of interest is, for instance, the goals and plans of the users, and the way these are manifested in the dialogue, the use of role-playing subjects should be made with caution. But if the focus is on aspects not under voluntary conscious control (cognitively impenetrable, to use Pylyshyn's [19] term), the prospect is better for obtaining ecologically valid data. To take one specific example: if a user is just pretending to buy a holiday trip to Greece, she might not probe the alternatives to the extent that she would if she were in fact to buy it, simply because the goal of finishing the task within a limited time takes precedence. But it does not seem likely that the latter fact will affect the use of pronouns in a specific utterance, or the knowledge about charter holidays and Greek geography that is implicitly used in interpreting and formulating specific utterances.

CONCLUDING REMARKS
The present paper makes two points, one theoretical and one methodological. On the theoretical side we argue that it is natural for any human engaging in a dialogue to adapt to the perceived characteristics of the dialogue partner. Since computers are different from people, a necessary corollary of this is that the development of interfaces for natural dialogues with a computer cannot take human dialogues as its sole starting point, but must be based on knowledge of the unique characteristics of these kinds of dialogues. Our own work has been concerned with natural-language interfaces, but the argument is of relevance for all kinds of intelligent dialogue systems. The methodological point is simply that to acquire the relevant knowledge, we need high-quality empirical data. But if the point is simple, gathering such data is not quite that simple. One way of doing so is by simulating intelligent interfaces (and sometimes also systems) using so-called Wizard of Oz studies, i.e. having a person simulate the interface (and system). But even if the basic idea is simple, to acquire the required high-quality data a great deal of care and consideration needs to be used in the design of such experiments. We have described our own simulation environment ARNE and some of our practical experiences, both positive and negative, from our research in this area, to illustrate some of the points that we consider important if such a research program is to contribute to the development of theoretically and empirically sound user-friendly intelligent interfaces.

REFERENCES
1. Ahrenberg, Lars, Arne Jönsson & Nils Dahlbäck (1990) Discourse Representation and Discourse Management for Natural Language Interfaces, Proceedings of the Second Nordic Conference on Text Comprehension in Man and Machine, Täby, Stockholm.
2. Clark, Herbert H. & Catherine Marshall (1981) Definite Reference and Mutual Knowledge. In Joshi, Aravind, Webber, Bonnie, and Sag, Ivan (eds.) Elements of Discourse Understanding. Cambridge, Mass.: Cambridge University Press.
3. Cohen, Philip R. (1984) The pragmatics of referring and the modality of communication, Computational Linguistics, 10, pp 97-146.
4. Dahlbäck, Nils (1991a) Empirical Analysis of a Discourse Model for Natural Language Interfaces, Proceedings of the Thirteenth Annual Meeting of The Cognitive Science Society, Chicago, Illinois.
5. Dahlbäck, Nils (1991b) Representations of Discourse: Cognitive and Computational Aspects, PhD thesis, Linköping University.
6. Dahlbäck, Nils (1992) Pronoun usage in NLI-dialogues: A Wizard of Oz study. To appear in Papers from the Third Nordic Conference on Text Comprehension in Man and Machine, Linköping, April 21-23, 1992.
7. Dahlbäck, Nils & Arne Jönsson (1989) Empirical Studies of Discourse Representations for Natural Language Interfaces, Proceedings of the Fourth Conference of the European Chapter of the ACL, Manchester.
8. Dahlbäck, Nils & Arne Jönsson (1992) An empirically based computationally tractable dialogue model, Proceedings of the Fourteenth Annual Meeting of The Cognitive Science Society, Bloomington, Indiana.
9. Fraser, Norman & Nigel S. Gilbert (1991) Simulating speech systems, Computer Speech and Language, 5, pp 81-99.
10. Gal, Annie (1988) Cooperative responses in Deductive Databases. PhD thesis, Department of Computer Science, University of Maryland, College Park.
11. Grosz, Barbara (1977) The Representation and Use of Focus in Dialogue Understanding. Unpublished PhD thesis, University of California, Berkeley.
12. Grosz, Barbara J., Martha Pollack & Candace L. Sidner (1989) Discourse. In Posner, M. I. (ed.) Foundations of Cognitive Science, Cambridge, MA: The MIT Press.
13. Guindon, Raymonde (1988) A multidisciplinary perspective on dialogue structure in user-advisor dialogues. In Guindon, Raymonde (ed.) Cognitive Science and its Applications for Human-Computer Interaction. Hillsdale, N.J.: Erlbaum.
14. Jarke, M., Krause, J., Vassiliou, Y., Stohr, E., Turner, J. & White, N. (1985) Evaluation and assessment of domain-independent natural language query systems, IEEE Quarterly Bulletin on Database Engineering, Vol. 8, No. 3, Sept.
15. Jönsson, Arne (1991) A Dialogue Manager Using Initiative-Response Units and Distributed Control, Proceedings of the Fifth Conference of the European Chapter of the Association for Computational Linguistics, Berlin.
16. Jönsson, Arne & Nils Dahlbäck (1988) Talking to a Computer is not Like Talking to Your Best Friend. Proceedings of the First Scandinavian Conference on Artificial Intelligence, Tromsø, Norway.
17. Kennedy, A., Wilkes, A., Elder, L. & Murray, W. (1988) Dialogue with machines. Cognition, 30, 73-105.
18. Lakoff, R.T. (1973) The Logic of Politeness; or minding your p's and q's, Papers from the Ninth Regional Meeting, Chicago Linguistic Society, pp 292-305.
19. Pylyshyn, Zenon (1984) Computation and Cognition, Cambridge, MA: The MIT Press.
20. Reichman, Rachel (1985) Getting Computers to Talk Like You and Me, MIT Press, Cambridge, MA.
21. Shatz, M. & Gelman, R. The development of communication skills: Modifications in the speech of young children as a function of listener. Monographs of the Society for Research in Child Development, 38, No. 152.
22. Tennant, Harry (1981) Evaluation of Natural Language Processors. PhD thesis, University of Illinois, Urbana-Champaign.



Guerrilla HCI: Using Discount Usability Engineering to Penetrate the Intimidation Barrier
by Jakob Nielsen, 1994
This paper was one of the chapters in the book Cost-Justifying Usability (edited by Randolph G. Bias and Deborah J. Mayhew).

One of the oldest jokes in computer science goes as follows: Q: How many programmers does it take to change a light bulb? A: None; it is a hardware problem! When asking how many usability specialists it takes to change a light bulb, the answer might well be four: Two to conduct a field study and task analysis to determine whether people really need light, one to observe the user who actually screws in the light bulb, and one to control the video camera filming the event. It is certainly true that one should study user needs before implementing supposed solutions to those problems. Even so, the perception that anybody touching usability will come down with a bad case of budget overruns is keeping many software projects from achieving the level of usability their users deserve.

1 The Intimidation Barrier
It is well known that people rarely use the recommended usability engineering methods [Nielsen 1993; Whiteside et al. 1988] on software development projects in real life. This includes even such basic usability engineering techniques as early focus on the user, empirical measurement, and iterative design, which are used by very few companies. Gould and Lewis [1985] found that only 16% of developers mentioned all three principles when asked what one should do when developing and evaluating a new computer system for end users. Twenty-six percent of developers did not mention a single one of these extremely basic principles. A more recent study found that only 21% of Danish software developers knew about the thinking-aloud method and that only 6% actually used it [Milsted et al. 1989]. More advanced usability methods were not used at all. One important reason usability engineering is not used in practice is the cost of using the techniques. Or rather, the reason is the perceived cost of using these techniques, as this chapter will show that many usability techniques can be used quite cheaply. It should be no surprise, however, that practitioners view usability methods as expensive considering, for example, that a paper in the widely read and very respected journal Communications of the ACM estimated that the "costs required to add human factors elements to the development of software" were $128,330 [Mantei and Teorey 1988]. This sum is several times the total budget for usability in most smaller companies, and one interface


evangelist has actually found it necessary to warn such small companies against believing the CACM estimate [Tognazzini 1990]. Otherwise, the result could easily be that a project manager would discard any attempt at usability engineering in the belief that the project's budget could not bear the cost. Table 1 shows the result of adjusting a usability budget according to the discount usability engineering method discussed below. The numbers in Table 1 are for a medium scale software project (about 32,000 lines of code). For small projects, even cheaper methods can be used, while really large projects might consider allocating additional funds to usability and the full-blown traditional methodology, though even large projects can benefit considerably from using discount usability engineering.

Original usability cost estimate by [Mantei and Teorey 1988]                  $128,330
Scenario developed as paper mockup instead of on videotape                    - $2,160
Prototyping done with free hypertext package                                 - $16,000
All user testing done with 3 subjects instead of 5                           - $11,520
Thinking aloud studies analyzed by taking notes instead of by video taping    - $5,520
Special video laboratory not needed                                          - $17,600
Only 2 focus groups instead of 3 for market research                          - $2,000
Only 1 focus group instead of 3 for accept analysis                           - $4,000
Questionnaires only used in feedback phase, not after prototype testing       - $7,200
Usability expert brought in for heuristic evaluation                          + $3,000
Cost for "discount usability engineering" project                             $65,330

Table 1: Cost savings in a medium scale software project by using the discount usability engineering method instead of the more thorough usability methods sometimes recommended.
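As a quick arithmetic check (not part of the original chapter), the individual adjustments in Table 1 do add up to the quoted total:

# Verify that the adjustments in Table 1 take the $128,330 estimate to $65,330.
original = 128_330
adjustments = [
    -2_160,   # paper mockup instead of videotaped scenario
    -16_000,  # free hypertext package for prototyping
    -11_520,  # 3 test subjects instead of 5
    -5_520,   # note-taking instead of videotaping thinking-aloud studies
    -17_600,  # no special video laboratory
    -2_000,   # 2 focus groups instead of 3 for market research
    -4_000,   # 1 focus group instead of 3 for accept analysis
    -7_200,   # questionnaires only in the feedback phase
    3_000,    # usability expert brought in for heuristic evaluation
]
assert original + sum(adjustments) == 65_330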

British studies [Bellotti 1988] indicate that many developers don't use usability engineering because HCI (human-computer interaction) methods are seen as too time-consuming and expensive, and because the techniques are often intimidating in their complexity. The "discount usability engineering" approach is intended to address these two issues. Further reasons given by Bellotti were that there was sometimes no perceived need for HCI and a lack of awareness about appropriate techniques. These two other problems must be addressed by education [Perlman 1988, 1990; Nielsen and Molich 1989] and propaganda [Nielsen 1990a], but even for that purpose, simpler usability methods should help. Also, time itself is on the side of increasing the perceived need for HCI, since the software market seems to be shifting away from the "features war" of earlier years [Telles 1990]. Now, most software products have more features than users will ever need or learn, and Telles [1990] states that the "interface has become an important element in garnering good reviews" of software in the trade press. As an example of "intimidating complexity," consider the paper by Karwowski et al. [1989] on extending the GOMS model [Card et al. 1983] with fuzzy logic. Note that I am not complaining that doing so is bad research. On the contrary, I find it very exciting to develop methods to extend models like GOMS to deal better with real-world circumstances like uncertainty and user errors. Unfortunately, the fuzzy logic GOMS and similar work can easily lead to intimidation when software people without in-depth knowledge of the HCI field read the papers. These readers may well believe that such methods represent "the way" to do usability engineering even though usability specialists


would know that the research represents exploratory probes to extend the field and should only serve as, say, the fifth or so method one would use on a project. There are many simpler methods one should use first [Nielsen 1992a, 1993]. I certainly can be guilty of intimidating behavior too. For example, together with Marco Bergman, I recently completed a research project on iterative design where we employed a total of 99 subjects to test various versions of a user interface at a total estimated cost of $62,786. People reading papers reporting on this and similar studies might be excused if they think that iterative design and user testing are expensive and overly elaborate procedures. In fact, of course, it is possible to use considerably fewer subjects and get by with much cheaper methods, and we took care to say so explicitly in our paper. A basic problem is that, with a few exceptions, published descriptions of usability work normally describe cases where considerable extra effort was expended on deriving publication-quality results, even though most development needs can be met in much simpler ways. As one example, consider the issue of statistical significance. I recently had a meeting to discuss usability engineering with the head of computer science for one of the world's most famous laboratories, and when discussing the needed number of subjects for various tests, he immediately referred to the need for test results to be statistically significant to be worth collecting. Certainly, for much research, you need to have a high degree of confidence that your claimed findings are not just due to chance. For the development of usable interfaces, however, one can often be satisfied by less rigorous tests. Statistical significance is basically an indication of the probability that one is not making the wrong conclusion (e.g., a claim that a certain result is significant at the p