
Individual and Group Decision Making: Current Issues

Edited by

N. John Castellan, Jr.
Indiana University

1993

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Hillsdale, New Jersey    Hove and London

Copyright © 1993 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, retrieval system, or any other means, without the prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
365 Broadway
Hillsdale, New Jersey 07642

Library of Congress Cataloging-in-Publication Data

Individual and group decision making : current issues / edited by N. John Castellan, Jr.
p. cm.
Based on papers presented at the Science Weekend symposia of the 1990 convention of the American Psychological Association.
Includes bibliographical references and index.
ISBN 0-8058-1090-0 (c.). ISBN 0-8058-1091-9 (p.)
1. Decision-making. 2. Decision-making, Group. I. Castellan, N. John, 1939-
BF448.I53 1993
302.3-dc20    92-34702 CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

CONTENTS

Preface, ix

I. INDIVIDUAL ADDRESSES, 1

1. Some Practical Judgment and Decision-Making Research
   Hal R. Arkes, 3
2. The Use of Multiple Strategies in Judgment and Choice
   John W. Payne, James R. Bettman, and Eric J. Johnson, 19

II. PROCESSING PROBABILISTIC INFORMATION, 41

3. Using Configural and Dimensional Information
   Stephen E. Edgell, 43
4. Judgment of Nonlinear Contingencies and Applications of Contingencies to Organizational Behavior
   John E. Sawyer, 65
5. Becoming More or Less Uncertain
   Janet A. Sniezek and Timothy Buckley
6. Decision Errors Made by Individuals and Groups
   R. Scott Tindale
7. Paradoxes in Individual and Group Decision Making: A Plea for Models
   N. John Castellan, Jr.

III. JURY DECISION MAKING

8. The Normative Status of Base Rates at Trial
   Jonathan J. Koehler
9. The Evaluation of Hearsay Evidence: A Social Psychological Approach
   Peter Miene, Eugene Borgida, and Roger Park
10. Jury Decision Making and the Insanity Defense
    James R. P. Ogloff
11. Research on Jury Decision Making: The State of the Science
    William C. Thompson

IV. NATURALISTIC GROUP DECISION MAKING

12. Shared Mental Models in Expert Team Decision Making
    Janis A. Cannon-Bowers, Eduardo Salas, and Sharolyn Converse
13. Team Decision Making and Technology
    LorRaine Duffy
14. Group Situation Awareness and Distributed Decision Making: From Military to Civilian Applications
    A. Rodney Wellens
15. Naturalistic Group Decision Making: Overview and Summary
    William C. McDaniel, 293

Author Index, 301

Subject Index, 311

PREFACE

At the 1990 annual convention of the American Psychological Association, a portion of the meeting was devoted to a "Science Weekend," an exciting program sponsored by the APA Science Directorate and several APA divisions. Three program tracks or themes were focal points for the weekend, and one track was Decision Making. The Decision-Making track included two invited addresses and symposia by distinguished scholars in the field. However, because of the strong interest in decision making, the time allotted for the Science Weekend could not accommodate the diversity and breadth of the field. As a result, additional sessions on decision making were included elsewhere in the meeting. As a participant in one of the symposia and an observer at others, I was struck by the strong interest in the sessions (some were standing room only) and the variety of approaches to decision making. During the convention, the idea for this volume took root and was encouraged and nurtured by colleagues. Participants in the convention symposia and invited addresses were enthusiastic about the possibility of presenting their work in a more coherent and extended form. As a result, this volume consists of two of the invited addresses and three of the symposia from that meeting: Processing Probabilistic Information, Jury Decision Making, and Naturalistic Group Decision Making.

As the project developed in consultation with the various contributors, it became clear that we needed more than a simple compilation of manuscripts derived from convention presentations. The participants in each symposium shared a common vision of research in their particular area and had an opportunity to debate and clarify their ideas at the meeting. The ability to provide more extended narratives than is possible in specialized journal articles was seen as an opportunity not just to advance the science of decision making, but to provide surveys and tutorials displaying the richness of decision-making methodology in psychology and the behavioral sciences. That methodology extends from detailed models of individual decision making to group decision processes, to jury decision making, to complex decision tasks in natural environments. It is necessary to understand the full range of decision-making methodologies, research, and applications if we are to develop coherent models and theories of individual decision making and group processes, and if we seek to enhance decision making and judgment in complex systems.

Because the general theme of the volume and its purpose extend beyond the intent of the original presentations, the contributors have rewritten and expanded their presentations. In some cases, they follow the meeting presentations closely, but in most the meeting presentation was the springboard for papers that could serve a broad audience, from specialists in the field to those who desire an up-to-date compendium of current research. Thus the meeting presentations were the starting points for the final chapters in this volume.

Sections II, III, and IV of the book can be read independently of the other sections. Each has the benefit of a final summary statement that either synthesizes all the chapters in the section or points to new directions emanating from the basic research in the area. Taken as a whole, the 15 chapters provide an exciting perspective on the field and could form a basic set of readings for courses on individual and group decision making in a variety of disciplines. To be sure, there are gaps, but the coverage from basic laboratory research to complex applied group decision processes should challenge researchers and students to pursue the field of decision making as enthusiastic scientists and practitioners.

The final editing of this volume took place while I was on leave from Indiana University and serving as Visiting Scientist and Director for the Decision, Risk, and Management Science Program at the National Science Foundation. I am especially grateful to Hollis Heimbouch of Lawrence Erlbaum Associates for her early encouragement of the project and to many colleagues in the Society for Judgment and Decision Making who also encouraged the project and offered comments and suggestions on the various chapters.

- N. John Castellan, Jr.

PART I
INDIVIDUAL ADDRESSES

CHAPTER 1
SOME PRACTICAL JUDGMENT AND DECISION-MAKING RESEARCH

Hal R. Arkes
Ohio University

"So what?" is a question I wish more people would ask cognitive psychologists. When we describe our latest research findings to interested laypersons, most listeners nod politely. However, they do not feel comfortable asking us the question they are asking themselves, namely, "What practical implications could such esoteric findings possibly have?" I believe that most researchers in the area of judgment and decision making could answer that question quite satisfactorily. But if we are not asked in a blatant way, we have a tendency not to mention the implications of our research. Without waiting for anyone to inquire, I demonstrate how applied this area of research can be. I present three very practical areas of judgment and decision-making research: economics, the hindsight bias, and the "validity effect."

ECONOMICS

The first topic is the psychology of windfall gains, and the "Cathy" cartoon (Fig. 1.1) helps illustrate the principle. Somehow saving a lot of money at a sale permits one to spend $400 on a parking place, a behavior one would normally not consider rational. The money Cathy saved at the sale is a windfall gain, and there is evidence that such money is treated differently than nonwindfall cash. What characterizes a windfall gain? Why is it so eminently spendable? Windfall money is unanticipated. It has not already been entered into any "account." Because one has no plans for it, such money seems to burn a hole in one's pocket.

FIG. 1.1. Cathy demonstrates the spendability of windfall profits. CATHY, copyright © 1989, Cathy Guisewite. Reprinted with permission of UNIVERSAL PRESS SYNDICATE. All rights reserved.

To demonstrate this phenomenon I begin with a simple two-group study (Arkes et al., 1990). The design of this experiment is quite straightforward. One group came to the experiment anticipating some payment. A second group was totally surprised by being given money when they arrived. We merely assessed to what extent each group was willing to gamble the funds obtained at the beginning of the experimental session. If lack of anticipation is the key factor in the willingness to spend windfall gains, then the group surprised by being given money should be more likely to spend it.

The sign-up sheet for this experiment directed undergraduate men to an office where they left their phone numbers. Between 1 and 5 days before the experiment was to take place, all subjects were telephoned by an experimenter. Those in the unanticipated-money group were merely reminded of the date, time, and location of the experiment. Those in the anticipated-money group were also told that they would be paid $3 for participating in the upcoming study. The male experimenter who telephoned the subjects greeted each subject when he arrived for the experiment. Subjects were taken individually to an experimental room where a female experimenter introduced herself and presented the subject with $3.00 in the form of 12 quarters. She then explained the experiment as follows:

"The first part of this experiment involves gambling. You will need this pair of dice. You can bet as much as you want on the roll of the dice, from 25¢ to $3. If you roll a number 7 or greater, you win. If you roll a number less than 7, you lose. For example, if you bet $1 and you roll a number 7 or greater, I will pay you $1. If you roll a number less than 7, you will pay me $1. You can roll the dice only once. How much do you want to bet?"


The 18 men stated their wager and rolled the dice. They actually won or lost the money they bet. Those who anticipated the money wagered an average of $1.00. Those who did not anticipate the money wagered an average of $2.16. This difference was highly statistically significant. People in the group that did not anticipate the money spent about twice as much in the gambling situation as those who did anticipate the money. Hence it is reasonable to conclude that one factor in the proclivity to spend windfall gains is their unanticipated nature.

We think that a number of practical phenomena are related to this finding. First, tax rebates are particularly effective in stimulating the economy, because such unanticipated funds are likely to be spent rather than saved. To test this implication of our hypothesis we ran a questionnaire study. Some subjects were given the following scenario:

"Suppose that your annual salary is $20,000 per year. Some friends have asked you to invest $1,000 in the construction of an indoor tennis club. You've checked into the financial records of such clubs in other cities, and it looks like about half of such clubs prove to be good investments. People who invest $1,000 in such clubs earn about 20% interest on their money every year if the club is a success. If the club is not a success, you can figure on rescuing only $250 of your original investment. Although you do not have any savings, the government has just announced that they will be giving an immediate rebate of $1,000 to every taxpayer in order to stimulate the economy. You could use your upcoming rebate check for this tennis court investment. Should you do it?" (Yes: 34; No: 11)

For other subjects the portion of the scenario containing information about the tax rebate was replaced by: "You have about $1,000 in savings, which you could use for this tennis court investment." (Yes: 25; No: 22)
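For concreteness, the scenario's numbers support a back-of-the-envelope expected value. The sketch below is ours, and the one-year horizon is our simplifying assumption (the scenario itself specifies none):

```python
stake = 1_000         # amount invested
p_success = 0.5       # "about half of such clubs prove to be good investments"
annual_return = 0.20  # 20% interest per year if the club succeeds
salvage = 250         # amount rescued if the club fails

# One-year expected value of the $1,000 stake (horizon assumed).
ev = p_success * stake * (1 + annual_return) + (1 - p_success) * salvage
print(f"Expected value of the $1,000 after one year: ${ev:,.0f}")  # $725
```

On a one-year view the investment is unattractive ($725 expected versus $1,000 kept), though a longer horizon with the compounding 20% return changes the arithmetic. Either way, the source of the $1,000 should be irrelevant to the choice, which is what makes the rebate/savings difference interesting.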

The people who considered the tax rebate were significantly more likely to spend the money on the tennis courts than were those who were told that they had the money in savings. Again, windfall money (the tax rebate) is more spendable.

A second implication of this finding was discussed by Richard Thaler and Eric Johnson (1990), who have noted that people are quite prone to "gamble with the house's money." Suppose you go to Atlantic City with $100 to gamble. If you win $50 right away, you would likely be quite eager to bet with that $50 winning, because it is the "house's money." After that $50 is gone, you will become more conservative with your original $100. That is not unanticipated windfall money. That is your money. Of course, it would have been rational to consider all $150 to be your money, but there appears to be an important psychological difference between your original money and the unanticipated windfall. The latter is more spendable.

Third, there are important marketing principles related to this finding. For example, I bought a clothes washer and dryer on February 29, 1984. The salesman informed me that he was having an unadvertised "Leap Day Sale." He encouraged me to spend my savings on some gadgets he had strategically placed near the large appliances. Apparently, he had found that many people are willing to spend their windfall on electric screwdrivers, digital thermometers, and other items he had nearby. Because he wanted to unload some slow-moving inventory, it was probably wise for him to place it where windfall gains would be dispensed.

Thus far, I have discussed the inability to treat windfall gains rationally. I now turn to the inability to treat certain costs rationally. The "sunk cost effect" is manifested in a greater tendency to continue an endeavor once an investment in money, effort, or time has been made (Arkes & Blumer, 1985). A prior investment should be irrelevant in the decision to continue a behavior. Economists agree that the only determinants of a behavior should be the incremental costs and benefits that will accrue if that behavior is chosen. For example, my wife and I bought an expensive aluminum pan several years ago. The very first dinner we cooked in it was lime chicken, which tasted like aluminum. I would estimate that about a quarter pound of aluminum from the pan had dissolved into the chicken. I suggested that we discard the pan, especially because evidence had been publicized that excessive aluminum in the diet causes Alzheimer's disease. My wife protested: "That pan cost a lot of money. We can't get rid of it!"

I want to emphasize in the strongest possible terms that the determinants of a behavior should be the costs and benefits that are expected to derive from that behavior. If the costs of a behavior outweigh its benefits, then the behavior should not be done. We should not continue to cook with a pan that poisons us no matter how much it cost last month. That cost is a sunk cost. It should not influence our future behavior.

I have seen a number of instances in which people seemed to be behaving in violation of the sunk cost principle. One such violation occurred in the "theater study" (Arkes & Blumer, 1985, Experiment 2). We simply took over the ticket booth at the Ohio University Theater on the day that season tickets were sold. As patrons approached the ticket office to request season tickets, we consulted a random number table and sold them one of three types of tickets. Approximately one third of the customers received the tickets at the normal price ($15). One third got a $2 discount, and the final third got a $7 discount. We color coded the 12 tickets in each packet, and we merely counted the number of plays the people in each of the three groups attended.

Suppose you bought a ticket for full price in September. It is now January 15, it is 10° below zero outside, and you have the flu. Furthermore, this is a surrealist play on sinus trouble, in which you are not too interested. Should you go? If you were in the no-discount group, you might reason, "Hell yes, I'm going. I paid a lot of money for this ticket." Those who received a discount might feel they should just stay home in bed, because they got a "good deal" on the tickets. Both lines of reasoning are irrational. If the benefits to be derived are expected to exceed the costs, then one should go. If the costs are expected to exceed the benefits, then one should not go. The money you paid several months ago is a sunk cost. It should not influence your decision to attend. That money has been paid regardless of your attendance. However, as we expected, those who paid the full price for their ticket were more likely to attend the first five plays than people who received a discount. This represents a naturalistic demonstration of the sunk cost effect.

What motivates people to attend to sunk costs? We think that one possible answer is that people do not want to appear wasteful. They think that by not attending a play they will be "wasting" the sunk cost. To test this wastefulness explanation, we gave people the following questionnaire (Arkes & Blumer, 1985, Experiment 6):

"On your way home from work you buy a TV dinner on sale for $3 at the local grocery store. A few hours later you decide it is time for dinner, so you get ready to put the TV dinner in the oven. Then you get an idea. You call up your friend to ask if he would like to come over for a quick TV dinner and then watch a good movie on TV. Your friend says 'Sure.' So you go out to buy a second TV dinner. However, the on-sale TV dinners are gone. You therefore have to spend $5 (the regular price) for the TV dinner identical to the one you just bought for $3. You go home and put both TV dinners in the oven. When the two dinners are fully cooked, you get a phone call. Your friend is ill and cannot come. You are not hungry enough to eat both dinners. You cannot freeze one. You must eat one and discard the other. Which one do you eat?"

Subjects were given the choice of discarding the $3 dinner, discarding the $5 dinner, or stating that they had no preference as to which one they discarded. The prediction of rational economic theory would be that you should be indifferent between the two dinners. After all, they are absolutely identical. However, a quarter of the subjects said they would eat the $5 dinner. To discard that one would appear wasteful. Three quarters of the subjects did act in accordance with rational economic theory and expressed indifference between the two dinners.
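The principle at stake can be written as a one-line decision rule. Here is a minimal sketch (our illustration, with made-up utilities for the January theater night, not data from the studies):

```python
def should_attend(expected_benefit: float, expected_cost: float,
                  ticket_price_paid: float) -> bool:
    """Attend only if expected future benefits exceed expected future costs.

    ticket_price_paid is accepted as an argument purely to make the point:
    it is a sunk cost and deliberately plays no role in the decision.
    """
    return expected_benefit > expected_cost

# Made-up numbers: a surrealist play on sinus trouble, 10 below zero, the flu.
print(should_attend(expected_benefit=2.0, expected_cost=9.0,
                    ticket_price_paid=15.0))  # False, whatever the ticket cost
```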

The sunk cost effect does not apply just to tiny amounts of money. I read an argument in a news magazine in mid-1990 that the U.S. Congress should continue appropriations for the expensive B-2 bomber because so much money had already been spent on it! Similarly, the Chicago Tribune reported on July 13, 1991, that supporters of the $30-$40 billion space station claim that the $5 billion already spent would be wasted if Congress killed the project, even though the project cannot be justified on scientific grounds. Apparently, spending the remaining $30 billion on a worthless piece of hardware is not seen as being as wasteful as forsaking the $5 billion sunk cost.

Further practical implications of research on the sunk cost effect were demonstrated by Larrick, Morgan, and Nisbett (1990) and Larrick, Nisbett, and Morgan (in press). These authors reasoned that people who were aware of economic principles such as the proper role of sunk costs should make better decisions and demonstrate superior adaptation in everyday life compared to those who were less aware. The first way these authors tested this hypothesis involved surveying faculty members in economics, biology, and the humanities at the University of Michigan. Respondents were tested for their use of proper economic reasoning involving sunk costs and closely related concepts. Those who answered the survey questions according to rational economic theory had higher salaries than those who did not do so. This relationship was even stronger when the economists were omitted from the sample, so the salary relationship could not be due to highly paid economists answering all the questions according to rational economic theory while starving artists answered them all contrary to the theory.

Larrick et al. suggested that those who are aware of the proper role of sunk cost in decision making should make better decisions in a wide variety of areas, such as time management. These superior decisions would foster career enhancement, which would, in turn, have a positive effect on salary. For example, one of the sunk cost survey questions was "Have you ever dropped a research project because it was not proving worthwhile?" Those who feel they cannot leave a fruitless project simply because they have already spent so much time on it are likely to suffer three consequences: (a) they will answer "incorrectly" the Larrick et al. economic reasoning questions; (b) they will be more likely to waste a lot of time compared to their colleagues who do not attend to sunk costs; (c) their careers will suffer, resulting in lower salaries. These consequences would result in the relationship between salary and sunk cost reasoning found by Larrick et al.

A second way Larrick and his colleagues tested their hypothesis involved surveying undergraduates. Because these survey respondents were too young to have a career and salary, the authors examined college grade point average (GPA) and other variables. First, it was found that the students' verbal scores on the Scholastic Aptitude Test (SAT) were significantly correlated with sound use of sunk cost and related economic principles in their own reasoning and behaviors. This would suggest that intelligence is related to the use of proper choice rules. Second, it was found that students whose GPAs are higher than what might be predicted based on their SAT scores are particularly likely to use such rules properly in their own decisions and behavior. The implication drawn by the authors is that placement of sunk costs in proper perspective is beneficial in one's adaptation to an educational environment. For example, one of the questions asked of the students was whether they had started a term paper over after nearly finishing it. The argument made by Larrick et al. was that those who could ignore sunk costs when confronted by a term paper that was not going well would be more likely to scrap it, start over, and finish a new and probably better paper. Larrick et al. hypothesized that students who attended to sunk costs would probably have papers and GPAs of lower quality. Whether one considers GPA, salary, theater attendance, or bomber funding, one will make better decisions if future costs, not sunk ones, are reckoned.

HINDSIGHT BIAS

In hindsight we tend to exaggerate the likelihood that we would have been able to predict an event beforehand. This is commonly known as "Monday morning quarterbacking," although I am most interested in the hindsight bias not for its sports implications but for its medical ones. After an autopsy is done it seems "obvious" what the correct diagnosis was. We tend to think that the physician should have been able to make the correct diagnosis earlier. This may be a manifestation of the hindsight bias, because the diagnosis probably was not so obvious earlier.

Neal Dawson and several collaborators examined the hindsight bias in a clinicopathologic conference (CPC) (Dawson et al., 1988). During a CPC the medical staff of a hospital listens to the presentation of a case by one of the physicians. Before the actual CPC takes place, the physician is given the entire premortem medical record of a patient with which he or she has no prior familiarity. The physician studies these records for a few weeks and then presents the case to the assembled staff at the CPC. The presenting physician always discusses the possible causes of the patient's death. Before finishing the presentation, the physician states boldly what he or she feels was the correct diagnosis. Then the pathologist who performed the autopsy stands up and announces, occasionally with a smirk, what the correct diagnosis really was. He or she discusses the case knowing the right answer. At this point many people in the audience are saying to themselves or to their neighbor, "The presenting physician, Dr. Jones, certainly should have gotten this one right. This was obviously a case of X. I would have diagnosed that one easily." Dawson et al. felt that the CPC situation is ripe for hindsight bias. This would be an unfortunate manifestation of the bias, because it robs people of the instructional value of the exercise. The physicians in the audience should realize that they are receiving valuable new information about a difficult case. They have something to learn. Instead they inappropriately exaggerate the likelihood that they would have made the correct diagnosis beforehand. They do not learn much from the CPC, because they feel that they "knew it all along."

Dawson et al. first wanted to test whether their suspicion about the presence of the hindsight bias in CPCs was correct. To do this they attended four different CPCs at a large hospital in Cleveland. After the discussant presented all the case information to the audience but before the correct diagnosis was announced, half the audience members were asked to assign their own probability estimates to each of five possible diagnoses. These physicians were the "foresight group," because they were asked to render a judgment in foresight, before the actual diagnosis was known. Then the presenting physician stated which diagnosis he thought was the correct one, and the pathologist gave the correct answer. The remaining half of the members of the audience were then asked to write down the probabilities they thought they would have assigned to each of five possible diagnoses if they had been making the initial diagnosis without knowledge of the correct answer. These were the hindsight subjects.

The four CPC cases were divided into the two easier ones and the two more difficult ones. The physicians were also divided into two groups: the less experienced ones and the more experienced ones. So there were four groups: less experienced and more experienced physicians considering easy and tough cases. Figure 1.2 depicts the results of this study. If there were no hindsight bias, then the estimated probabilities for foresight and hindsight groups should be identical. The hindsight bias was present in three of these four groups. All but the veteran physicians judging tough cases showed a higher hindsight than foresight estimate. Of course, this is the hallmark of the hindsight effect. After one knows the correct diagnosis, it seems obvious. Dawson et al. hypothesized that the veterans who judged tough cases in hindsight were aware of how extraordinarily difficult these cases were; the victims died from very rare diseases. These physicians knew that they would have been unlikely to have made a correct diagnosis had they been asked to do so in foresight. The less experienced physicians might have misjudged the difficulty of the cases because they were less likely to have known how infrequently these diseases occur. Therefore they would not have appreciated how unlikely they would have been to have made the correct diagnosis had they been asked to do so in foresight.

FIG. 1.2. Mean estimated probabilities of the correct diagnosis as a function of timing of the estimates (foresight vs. hindsight), experience of the physicians (less vs. more), and case difficulty (easier vs. more difficult) (from Dawson et al., 1988).

The results of this research have a number of practical implications. First, all malpractice considerations are made from a position of hindsight. With the exception of the most difficult cases considered by the most experienced physicians, when the outcome of a case is known, it may seem that the correct diagnosis should have been more apparent in foresight than it really was. Of course, jurors are generally not experienced physicians. Next, second opinions are often requested from a person who has been told what the opinion of the first clinician is. This may make the second opinion spuriously likely to corroborate the first opinion, which compromises the information value of the second opinion. Finally, as noted earlier, the educational value of the CPC is sabotaged if the audience feels they have not learned anything from this "easy" case.

In another study, David Faust, Tom Guilmette, Kathy Hart, and I made an effort to reduce the hindsight effect (Arkes, Faust, Guilmette, & Hart, 1988). We presented neuropsychologists with a couple of paragraphs describing a person who might have had any of three possible diagnoses: alcohol withdrawal, Alzheimer's disease, or brain damage secondary to alcohol abuse. The foresight group was asked to assign probabilities to each of the three possible diagnoses. Each of three hindsight groups was told that a different one of the three diagnoses was the correct one, and we asked them to assign probabilities to each of the diagnoses as if they did not know which one was correct. The top portion of Table 1.1 depicts the results from these groups, which are termed the "No Reasons" groups.

TABLE 1.1
Mean Probability Assigned to Each Diagnosis

                         Outcome     Alcohol           Alzheimer's     Brain Damage/
Group               N    Provided    Withdrawal (AW)   Disease (AD)    Alcohol Abuse (BD)
No Reasons
  Foresight        30    None        37                26              37
  Hindsight        22    AW          44 (12)           24              33*
  Hindsight        22    AD          27                34 (11)         38*
  Hindsight        28    BD          22                28              50 (19)
Reasons
  Foresight        24    None        33                32              34
  Hindsight        22    AW          29 (5)            34              36*
  Hindsight        23    AD          22                39 (10)         39
  Hindsight        23    BD          22                38              39* (13)

Note: All probabilities are multiplied by 100. The numbers in parentheses indicate the number of neuropsychologists whose probability for that particular diagnosis exceeded the corresponding foresight estimate.
*Row does not sum to 100.0 due to rounding error.

We got the usual hindsight effect, with those in the foresight group assigning lower probability estimates to the so-called correct diagnosis than did those in the corresponding hindsight group. We then ran four more groups: a special foresight group and three special hindsight groups. Utilizing a procedure developed by Slovic and Fischhoff (1977) and by Koriat, Lichtenstein, and Fischhoff (1980), we asked each of these groups to state one reason why each of the possible diagnoses might be correct before they made their probability estimates for each diagnosis. These were termed the Reasons groups. We found that this simple intervention reduced the hindsight effect in these groups! We think the reason why this debiasing procedure worked is that the hindsight groups who had to list a reason why each diagnosis might be correct would be performing a behavior that would reduce the obviousness of the "correct" diagnosis. If I am a neuropsychologist and am told that a person is suffering from Alzheimer's, I will be able to locate corroborating evidence in the case that makes this diagnosis seem compelling. "Anyone should have been able to make this easy diagnosis," I might think. However, if I am forced to locate other evidence in the case history consistent with other diagnoses, I will probably be able to do so. I may then appreciate for the first time how difficult the case would be for diagnosticians who had to consider it in foresight. This has the very practical benefit of lowering the hindsight bias.

These medical examples of the hindsight bias should not obscure the fact that this bias is also manifested in other domains. For example, shortly after Iraq invaded Kuwait in August of 1990, the lead sentence of an article in the Boston Globe proclaimed, "Saddam Hussein's latest bid for hegemony in the Persian Gulf was entirely predictable." In hindsight, events may seem "entirely predictable." They may not seem that way in foresight, however. For example, writing in the Washington Post on October 5, 1988, Milton Viorst asserted that "... Iraq ... currently has no wars on its agenda, and it has pledged to abide by the Geneva Convention in the future." If events were "entirely predictable" in foresight, we would not have to wait for their occurrence before writing their history.

Law is another practical area in which hindsight has been shown to play an important role. For example, Casper, Benedict, and Kelly (1988) examined jurors' attitudes toward police officers who had improperly searched an apartment. For one group of simulated jurors, the description of the case included the information that the police found 340 packages of heroin in the apartment. A second group of simulated jurors read that the police found nothing incriminating. A third group was given no information about what the police found. The jurors were asked what damages, if any, to award the plaintiff, who was the resident of the apartment searched by the police. Note that the groups differed in that they heard different outcomes of the search. According to the law, this outcome information should not be used in deciding whether the police acted improperly in making the search and what damages, if any, are therefore due the apartment dweller. The only relevant information should be whether the evidence known to the officers prior to the search met legal standards. Nevertheless, the simulated jurors did not ignore the outcome information. When the search resulted in the discovery of heroin in the apartment, the tenant was awarded far less in monetary damages for the illegal search than if nothing was found or if no information was given. In addition, compared to subjects in the other two groups, subjects in the "heroin group" felt that the police were less likely to have used excessive force in their treatment of the suspect. Casper et al. suggest that subjects reinterpret the evidence in a way consistent with the outcome knowledge. If heroin was found, then the police were not unjustified in their treatment of the suspect. After all, he was a criminal, wasn't he? The inability to disregard outcome knowledge in evaluating prior evidence is the hallmark of the hindsight bias.

THE VALIDITY EFFECT

The last area of judgment and decision-making research I describe is the one most closely related to social psychology. First examined by Hasher, Goldstein, and Toppino (1977), the validity effect refers to the finding that merely repeating a statement causes it to be perceived as more valid. I find this result quite troubling; it has enormous implications for advertising, propaganda, and persuasion in general.


In a typical validity effect study, subjects are asked to read a list of trivia statements, like "Mercury has a higher boiling point than copper" and "Over 400 Hollywood films were produced in 1948." Subjects rate these statements for validity on a 1 to 7 scale, where "1" signifies that the rater thinks the statement is definitely false and "7" signifies that the rater thinks the statement is definitely true. In our own studies, the subject then returns a week or two later and rates some of the statements seen previously plus some new ones not shown before. The result is that at this second session the repeated statements are rated as more true than their nonrepeated counterparts. Note that no attempt has been made to persuade. No supporting arguments are offered. Mere repetition seems to increase rated validity.

There are two experiments that suggest how robust this phenomenon is. The first is Catherine Hackett's dissertation (Arkes, Hackett, & Boehm, 1989, Experiment 2), in which she changed the original Hasher et al. (1977) study in two ways. First, rather than using only trivia statements, some of which were factually true and some of which were factually false, Hackett also used opinion statements, like "At least 75% of all politicians are basically dishonest." Note that the validity of such statements cannot be ascertained by consulting an encyclopedia, as can be done with the usual trivia sentences. Hackett wanted to know if repetition augmented the rated validity of opinion statements as well as the rated validity of the trivia statements. The second difference is that Hackett not only used statements about which people initially felt neutral (4.0 on the 7-point scale), the typical stimulus material in such experiments; she also used statements that subjects initially felt were true (5.25 on the scale) and some that subjects initially felt were false (2.75 on the scale). A good way to summarize the results is to say that the validity effect worked for every kind of statement. I was both amazed at the robust nature of the validity effect and frightened at how labile the rating of validity seemed to be.

But the next study (Arkes, Gradwohl-Nash, & Joyner, 1989) is more amazing and frightening. First, we constructed eight puzzles, each one consisting of a 15 x 15 matrix of English letters. An example is depicted in Fig. 1.3. Within each of these matrices were buried four English words printed horizontally on four different rows. In each matrix one of the words was the subject of a factually true sentence to be used later, one was the subject of a factually false sentence to be used later, and the other two words were not used subsequently. At the top of each matrix was information about the number of letters in each of the four hidden words. The purpose of this information was to help subjects find the buried words. There were two different groups of four matrices each. One quartet of matrices contained 16 nouns I label Set A. The other quartet of matrices contained 16 different nouns I label Set B.

FIG. 1.3. Matrix used in the experiment by H. R. Arkes, J. Gradwohl-Nash, and C. A. Joyner (1989). Each puzzle told subjects the lengths of the four buried words and asked them to circle each word.

Following subjects' completion of four matrix tasks, the first experimenter announced that she would like everyone to participate in a different experiment that her friend was doing. At this point Experimenter 2 announced the "True-False Rating Experiment." She handed out a sheet containing 16 sentences, half of which were factually false and half of which were factually true. At the beginning of the list of sentences was the usual 1 to 7 scale on which subjects were asked to rate the truth or falsity of each sentence. Eight of the 16 sentences contained one noun from puzzle Set A. Thus the subjects who had previously solved the Set A puzzles by finding the buried words had these 8 sentences cued by the prior puzzles. The remaining 8 sentences had a noun from Set B, so these sentences were previously cued for the subjects who had solved the Set B puzzles. In other words, each sentence was cued for half of the subjects and not cued for the other half of the subjects.


The results of the sentence rating were either encouraging or discouraging, depending on whether you are cheering for the power of the validity effect or the manifestation of the subjects' common sense. For each sentence we calculated the mean validity rating given by subjects for whom the sentence was cued by the prior puzzle task and the mean validity rating given by subjects for whom the sentence was not cued by the prior puzzle task. The 16 sentences were rated more true when one word from the sentence was present in the prior puzzle. Another way to appreciate the results is to consider that 12 of the 16 sentences were rated more true when cued by a prior word from a puzzle than when not cued. We administered a final questionnaire in which we asked subjects to state what relation there might have been between the puzzle task and the sentence-rating task. None of the 36 subjects expressed any realization whatsoever that the sentence-rating task contained words from the puzzle task. Our conclusion is straightforward. Augmenting the familiarity of a statement, even in a minimal way, appears to heighten the perceived validity of the statement. It is easy to think of useful applications of this result-some clearly beneficial and some clearly sinister. We suspect that familiarity is used as one means of judging validity. Often this heuristic may be quite sensible, because valid statements should be more widely spoken about than false ones. However, our research suggests that this heuristic can be exploited in a way that can lead to irrational increases in perceived validity. A few years ago I spoke to a foreign student about this research. She comes from a totalitarian society and told me that the government would frequently put up posters containing assertions that the population knew were ridiculous. However, she said that after she walked by these posters several times per day for a few months, the assertions did not strike her as ridiculous anymore. Nazi propagandist Joseph Goebbels made this same observation a half century ago (Gordon, Falk, & Hodapp, 1984).

CONCLUSION

I think that many judgment errors are costs of an otherwise highly adaptive system. These errors are a relatively small cost to pay for some useful adaptations. I begin with a simple analogy. Humans walk on two feet rather than four. Upright gait, very unusual in the animal kingdom, has resulted in epidemic levels of back pain in humans. This pain, however, was a relatively small cost for our ancestors to pay in order to free their hands for tool use. The benefit far outweighed the cost.

Now consider the hindsight bias. When an event occurs, we try to make sense of it. We search assiduously for precursors that may have been responsible for the event's occurrence. Of course, the potential events that did not occur are not likely to cue the search for their causes. So we have a very biased consideration of past causal factors. Those that may have caused the event that did occur are sought. Those that may have caused events that did not occur are not sought. Following this selective inventory, the actual event seems to us to have been inevitable and easily predictable beforehand. Hence we obtain a hindsight bias. To ignore the causes of events that never took place is usually a benefit, because it saves a lot of time and cognitive energy. However, the cost of ignoring such causes is the hindsight bias.

Consider another example. Several years ago Allan Harkness and I (Arkes & Harkness, 1980) presented a list of 12 symptoms to a group of hearing and speech therapy students. Eight of the symptoms pertained to Down's Syndrome. About half the students were able to recognize the fact that the patient with these 12 symptoms had Down's Syndrome. About 2 weeks later we returned and asked subjects to examine a list of symptoms and tell us which ones had been presented during the first session. Some of the symptoms we presented had not been seen 2 weeks earlier but were symptoms commonly seen in Down's Syndrome children. Compared to students who had not been able to make the diagnosis, those students who correctly made the Down's Syndrome diagnosis 2 weeks earlier falsely stated that these new Down's Syndrome symptoms had been seen earlier. Making the diagnosis of Down's Syndrome imposes a schema upon the symptom list. The schema, "Down's Syndrome," is much easier to recall than the list of symptoms on which it is based. However, this savings in cognitive effort has a cost: the inability to retrieve the information that engendered the schema in the first place. Most of us, perhaps all of us, would agree that schematizing one's knowledge has huge benefits. However, there are costs, and the costs are known as memory errors.

My view of biases as costs of otherwise adaptive systems helps me understand many of these judgment errors. My own understanding provides no consolation for those who bear the practical costs of judgment errors. For example, spouses who refuse to leave terribly abusive relationships because "I've already invested so many years in it" are falling prey to the sunk cost effect. Multimillionaire athletes who end up in poverty demonstrate how spendable windfall money can be. Physicians who practice "defensive medicine" by ordering numerous marginally useful tests might thereby be manifesting an accurate assessment of how compelling the hindsight bias might be if their diagnosis were incorrect. Perhaps defensive medicine is rational even if the hindsight bias is not. Judgment and decision-making research is so important precisely because it is so practical. And given how rational some of our practical decisions are, I think this research will continue for a long time.


REFERENCES

Arkes, H. R., & Blumer, C. (1985). The psychology of sunk cost. Organizational Behavior and Human Decision Processes, 35, 124-140.
Arkes, H. R., Faust, D., Guilmette, T., & Hart, K. (1988). Eliminating the hindsight bias. Journal of Applied Psychology, 73, 305-307.
Arkes, H. R., Gradwohl-Nash, J., & Joyner, C. A. (1989, November). Solving a word puzzle makes subsequent statements containing the word seem more valid. Convention of the Psychonomic Society, Atlanta.
Arkes, H. R., Hackett, C., & Boehm, L. (1989). The generality of the relation between familiarity and judged validity. Journal of Behavioral Decision Making, 2, 81-94.
Arkes, H. R., & Harkness, A. R. (1980). Effect of making a diagnosis on subsequent recognition of symptoms. Journal of Experimental Psychology: Human Learning and Memory, 6, 568-575.
Arkes, H. R., Joyner, C. A., Nash, J., Pezzo, M., Christensen, C., Schweigert, W., Boehm, L., Siegel-Jacobs, K., & Stone, E. (1990, November). The psychology of windfall gains. Convention of the Psychonomic Society, New Orleans.
Casper, J. D., Benedict, K., & Kelly, J. R. (1988). Cognitions, attitudes and decision-making in search and seizure cases. Journal of Applied Social Psychology, 18, 93-113.
Dawson, N. V., Arkes, H. R., Siciliano, C., Blinkhorn, R., Lakshmanan, M., & Petrelli, M. (1988). Hindsight bias: An impediment to accurate probability estimation in clinicopathologic conferences. Medical Decision Making, 8, 259-264.
Gordon, G. N., Falk, I., & Hodapp, W. (1984). The idea invaders. New York: Hastings House.
Hasher, L., Goldstein, D., & Toppino, T. (1977). Frequency and the conference of referential validity. Journal of Verbal Learning and Verbal Behavior, 16, 107-112.
Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6, 107-118.
Larrick, R. P., Morgan, J. N., & Nisbett, R. E. (1990). Teaching the use of cost-benefit reasoning in everyday life. Psychological Science, 1, 362-370.
Larrick, R. P., Nisbett, R. E., & Morgan, J. N. (in press). Who uses the cost-benefit rules of choice? Implications for the normative status of economic theory. Organizational Behavior and Human Decision Processes.
Slovic, P., & Fischhoff, B. (1977). On the psychology of experimental surprises. Journal of Experimental Psychology: Human Perception and Performance, 3, 544-551.
Thaler, R. H., & Johnson, E. J. (1990). Gambling with the house money and trying to break even: The effects of prior outcomes on risky choice. Management Science, 36, 643-660.
Viorst, M. (1988, October 5). Poison gas and 'genocide': The shaky case against Iraq. The Washington Post, p. A25.

CHAPTER 2
THE USE OF MULTIPLE STRATEGIES IN JUDGMENT AND CHOICE

John W. Payne and James R. Bettman
Fuqua School of Business, Duke University

Eric J. Johnson
Wharton School, University of Pennsylvania

The problem of preferential choice and judgment is an old one in psychology. Many years of research have been devoted to understanding how people make decisions in situations where one alternative is better on some attributes of value, whereas another alternative is better on other attributes. For example, as a faculty member in a psychology department you might be faced with the decision of which of two applicants to hire. One applicant (alternative) might offer more potential as a teacher (an attribute of value), whereas the other applicant offers more potential as a researcher and as a colleague. As another example, as a faculty member you also might be faced with deciding among alternative investments for your retirement funds. The alternative investments will offer various combinations of the attributes of risk and return. Generally, the higher the risk, the higher the return. How would you go about processing the information about the various alternatives in order to make a choice?

This chapter reports on an ongoing program of research concerned with preferential decision behavior.¹ The focus of that program of research is the fact that people use multiple strategies when making decisions, contingent on a wide variety of task demands. An underlying theme of that research is the adaptive nature of decision behavior (see also the chapter by Arkes in this book). The idea is that decision behavior based on the selective use of heuristics, even though not consistent with normative approaches such as economic theory or Bayesian models, may still represent an intelligent response to decision problems. We argue that decision behavior needs to be viewed from the perspective of a decision maker with limited information-processing capabilities and multiple goals for the decision process.

The chapter is organized as follows: First, we outline alternative decision strategies and highlight a few major findings from the last 25 years of research that demonstrate the contingent nature of decision behavior. Next, we offer a framework for understanding the use of multiple strategies in decision making that is based on the idea that strategy selection reflects a compromise between the desire to make a good decision (accuracy) and the desire to minimize the cognitive effort used in making the decision. Some studies that test and elaborate the implications of an effort-accuracy framework for strategy selection are then briefly reviewed. Finally, some implications of our research for decision aiding and other practical matters are described.

¹A much more extensive report on our program of research can be found in Payne, Bettman, and Johnson (in press). Support for our program of research was provided by the Office of Naval Research and the Isle Maligne Society.

CONTINGENT DECISION BEHAVIOR

Alternative Decision Strategies There are many different strategies that can be used to make a decision. One strategy often used in decision research is the weighted additive model, which explicitly reflects tradeoffs among attributes. In this model, a measure of the relative importance (weight) of the attribute is multiplied by the value of that attribute for the particular alternative; then these products are summed over all attributes to arrive at an overall evaluation of the alternative. The alternative with the highest overall evaluation is then assumed to be chosen. When faced with more complex decision problems involving many alternatives, people often adopt simplifying strategies that are much more selective in the use of information. Further, the strategies adopted tend to be noncompensatory, in that excellent values on some attributes cannot compensate for poor values on other attributes. As an example, as a faculty member faced with 12 job applicants, you might decide that publication is the most important attribute. Then you might decide to eliminate any applicant who has not had a research publication from further consideration. Tversky (1972) referred to such a strategy as an elimination-by-aspects process. Other simplification strategies (heuristics for choice) that people use include satisficing (Simon, 1955), the equal weighting rule (Dawes, 1979; Einhorn & Hogarth, 1975), the majority of confirming dimensions heuristic (Russo & Dosher, 1983), and the lexicographic choice process (fversky, 1969). Each heuristic represents a different method for simplifying decision processing

2.

THE USE OF MULTIPLE STRATEGIES

21

The satisficing heuristic, for instance, involves the consideration of alternatives one at a time, with the value of each attribute of an alternative compared to a predefined cutoff level, often thought of as an aspiration level. If any attribute value is below the cutoff, then that alternative is rejected. The first alternative in a set whose values meet the cutoffs for all attributes is chosen (e.g., the first faculty candidate who has satisfactory levels of research, teaching, and potential as our colleague would be hired). Thus, both the amount of information processed about any given alternative and the number of alternatives processed can be limited by using the satisficing heuristic. If no alternatives pass all the cutoffs, the cutoffs can be relaxed and the process repeated. If multiple alternatives pass the cutoffs, the process can be repeated with more restrictive cutoffs, or an alternative can be randomly selected from those alternatives that are satisfactory.

The equal weight strategy simplifies the decision process by ignoring information about the relative importance of each attribute. For example, research potential, teaching potential, and potential as a colleague would all be treated as equally important in hiring a new faculty member. An overall value for each alternative is obtained by simply summing the values for each attribute for that alternative. This assumes that the attribute values are expressed, or can be expressed, on a common scale of value.

The majority of confirming dimensions heuristic involves processing pairs of alternatives. The values for each of the two alternatives are compared on each attribute, and the alternative with a majority of winning (better) attribute values is retained. Thus, processing is simplified by requiring only ordinal judgments of which alternative is better on an attribute, rather than assessments of the degree to which one alternative is better than the other. The retained or winning alternative from the first paired comparison is then compared to the next alternative in the set. The process of pairwise comparison is repeated until all alternatives have been evaluated and the final winning alternative identified.

Finally, the lexicographic choice strategy proceeds by first determining the most important attribute (e.g., research potential in the selection of a faculty member) and then examining the values of all alternatives on that attribute. The alternative with the best value on the most important attribute is selected. If two alternatives have tied values, the second most important attribute is considered, and so on, until the tie is broken.² A lexicographic process both reduces the amount of information that must be processed and avoids difficult value tradeoffs.

²Although generally seen as a descriptive model of choice, the lexicographic rule is sometimes used as the basis for advice about decisions. For example, one piece of advice that we sometimes give to graduate students seeking their first academic job is to ignore all other considerations and select the job offer that maximizes professional opportunities; that is, the student is advised not to worry about factors such as lifestyle and location. The idea is that such additional factors can be considered for a second job, once the graduate student has established herself or himself as a professional.
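The following sketch renders three of these heuristics in runnable form, using the same kind of hypothetical candidate data as before. The cutoff levels, the importance ordering, and the use of fixed cutoffs for elimination-by-aspects (rather than Tversky's probabilistic selection of aspects) are simplifying assumptions made purely for illustration.

```python
# Sketches of three simplifying heuristics, run on invented candidate data.

options = {
    "Candidate 1": {"research": 9, "teaching": 4, "collegiality": 6},
    "Candidate 2": {"research": 6, "teaching": 8, "collegiality": 7},
    "Candidate 3": {"research": 3, "teaching": 9, "collegiality": 9},
}

def satisficing(options, cutoffs):
    """Pick the first alternative whose every attribute meets its cutoff."""
    for name, attrs in options.items():
        if all(attrs[a] >= cutoff for a, cutoff in cutoffs.items()):
            return name
    return None  # no survivor: the cutoffs would then be relaxed

def lexicographic(options, importance_order):
    """Choose on the most important attribute; break ties with the next."""
    remaining = dict(options)
    for attr in importance_order:
        best = max(attrs[attr] for attrs in remaining.values())
        remaining = {n: a for n, a in remaining.items() if a[attr] == best}
        if len(remaining) == 1:
            break
    return next(iter(remaining))

def elimination_by_aspects(options, aspects):
    """Eliminate alternatives failing each cutoff, in importance order."""
    remaining = dict(options)
    for attr, cutoff in aspects:
        survivors = {n: a for n, a in remaining.items() if a[attr] >= cutoff}
        if survivors:            # never eliminate every alternative
            remaining = survivors
        if len(remaining) == 1:
            break
    return list(remaining)

print(satisficing(options, {"research": 5, "teaching": 5, "collegiality": 5}))
print(lexicographic(options, ["research", "teaching", "collegiality"]))
print(elimination_by_aspects(options, [("research", 5), ("teaching", 6)]))
```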

Contingent Strategy Usage Based on Task Complexity

Many of the most striking examples of multiple strategy use and contingent judgment and choice concern how people adapt their decision processes to deal with task complexity. Perhaps the best-established task complexity effect is the impact of changes in the number of alternatives available (Biggs, Bedard, Gaber, & Linsmeier, 1985; Billings & Marcus, 1983; Johnson, Meyer, & Ghose, 1989; Klayman, 1985; Onken, Hastie, & Revelle, 1985; Payne, 1976; Sundstrom, 1987). When faced with decision problems involving just two or three alternatives, people often use decision strategies that process all the relevant information and require one to decide explicitly the extent to which one is willing to trade off less of one valued attribute (e.g., research potential) for more of another valued attribute (e.g., teaching potential). When more alternatives are involved, people often use some of the heuristics outlined previously (e.g., elimination-by-aspects) as a first step to cut down the number of options.

As an illustration of the evidence for the use of multiple strategies in decision making as a function of the number of alternatives available, consider Fig. 2.1. That figure provides some excerpts from the verbal protocols (thinking-aloud records) obtained by Payne (1976) in a study of choices among apartments. The protocols represent the responses of two different subjects (A and D) faced with two levels of task complexity: (a) two-alternative problems (Panels a and b), and (b) multialternative choice problems (Panels c and d). Panels a and b in Fig. 2.1 suggest the consideration of tradeoffs among attributes. For example, in Panel b Subject D explicitly asks a tradeoff question dealing with the exchange of a higher rent for a lower level of noise. The excerpts in Panels c and d, on the other hand, indicate more noncompensatory processing, such as satisficing (Panel c) and elimination-by-aspects (Panel d). Also, note from Fig. 2.1 that strategy differences are shown both within the same subject (e.g., Panels a and c) and across subjects (e.g., Panels c and d).

The Benefits and Costs of Flexibility in Decision Making

The flexibility in decision making just illustrated provides benefits by allowing the decision maker to reflect changes in task environments (e.g., an increased number of alternatives) in terms of changes in decision strategies (e.g., a shift from compensatory to noncompensatory processes). However, there are also potential costs, due both to using noncompensatory heuristics in this instance and to flexibility in the use of decision strategies in general.

(a) Additive Utility
A24: O.K., the decision is now between the two rent prices
A25: in accordance with the other qualities
A26: Now for apartment A has the advantage
A27: because the noise level is low
A28: and the kitchen facilities are good
A29: even though the rent is $30 higher than B.

(b) Additive Difference
D238: O.K. we have an A and a B
D239: First look at the rent for both of them
D240: The rent for A is $170 and
D241: The rent for B is $140
D242: $170 is a little steep
D243: but it might have a low noise level
D244: So we'll check A's noise level
D245: A's noise level is low
D246: We'll go to B's noise level
D247: It's high
D248: Gee, I can't really very well study with a lot of noise
D249: So I'll ask myself the question, is it worth spending that extra $30 a month for, to be able to study in my apartment?

(c) Satisficing
A163: The rent for apartment E is $140
A164: Which is a good note
A165: The noise level for this apartment is high
A166: That would almost deter me right there
A167: Ah, I don't like a lot of noise
A168: And, if it's high, it must be pretty bad
A169: Which means, you couldn't sleep
A170: I would just put that one aside right there. I wouldn't look any further than that
A171: Even though, the rent is good

(d) Elimination-By-Aspects
D289: Since we have a whole bunch here,
D290: I'm going to go across the top and see which noise levels are high
D291: If there are any high ones, I'll reject them immediately
D295: Go to D
D296: It has a high noise level
D297: So, we'll automatically eliminate D.
D300: So, we have four here
D304: that are O.K. in noise level

FIG. 2.1. Verbal protocols of choice strategies. Source: Payne (1976).


The use of noncompensatory processes in multialternative choice, for instance, can lead to the elimination of potentially good alternatives early in the decision process (e.g., a potentially excellent faculty member who did not publish as a graduate student may be eliminated in a first cut through the candidates). As another example, the use of a lexicographic process can lead to intransitive patterns of preference, in which a person expresses a choice of X over Y, a choice of Y over Z, and a choice of Z over X (see Tversky, 1969, for an example based on choice among job applicants).

Another example of difficulties resulting from contingent processing is the now classic preference reversal phenomenon (Lichtenstein & Slovic, 1971). Common sense suggests that good decisions are consistent decisions, in that small changes in the way in which a question is asked should not change what we prefer. However, Sarah Lichtenstein and Paul Slovic showed more than 20 years ago that the expressed preference order between two gambles often reverses, contingent on whether the response requested is a direct choice between the gambles or a bidding price for each gamble; that is, the same individual may choose Gamble A over Gamble B and bid more for Gamble B than for Gamble A, a reversal in preference. Such reversals were even replicated in a Las Vegas casino setting (Lichtenstein & Slovic, 1973).

Tversky, Sattath, and Slovic (1988) have shown recently that tradeoffs between attributes (e.g., lives vs. dollars) also are contingent on the nature of the response mode. The more prominent dimension (i.e., lives for most people) looms larger when the decision maker responds by making a choice than when he or she responds by making a matching response, in which an aspect of one option is adjusted so that this option matches another option in overall value.³ This implies that the tradeoff between lives and dollars is different when an individual responds via matching than when that individual responds by making a choice. Tversky, Sattath, and Slovic suggested that choice tends to elicit qualitative types of reasoning strategies that focus on the most important attribute, whereas matching tasks elicit more quantitative types of reasoning.

If preferences or beliefs are subject to subtle changes depending on how information is presented or how questions are asked, the decision maker may be vulnerable to strategic manipulation by others. Tversky and Sattath (1979), for example, discuss how placing constraints on the order in which an individual considers the elements of a choice set (e.g., an agenda) can affect the preference order of that individual. Thus, the flexible use of cognitive processes to make decisions, contingent on properties of the decision environment, has both benefits and costs for the individual.

³To illustrate a matching response, imagine that you are asked to consider the following two programs for dealing with traffic accidents, described in terms of yearly costs (in millions of dollars) and the number of casualties per year: Program X is expected to lead to 570 casualties and cost $12 million, whereas Program Y is expected to lead to 500 casualties and cost $??. Your task is to provide a value for the cost of Program Y, presumably some amount greater than $12 million, that would make it equal in overall value to Program X.
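The intransitivity noted above is easy to reproduce with a lexicographic rule that, in the spirit of Tversky's (1969) demonstration, treats small differences on the most important attribute as ties. The option values and the just-noticeable-difference threshold below are invented; Tversky's own stimuli differed.

```python
# How a lexicographic semiorder can produce intransitive choices (in the
# spirit of Tversky, 1969). The rule assumed here: if two options differ
# by less than JND on the primary attribute, treat them as tied and
# decide on the secondary attribute. All numbers are invented.

JND = 1.5  # assumed just-noticeable difference on the primary attribute

# (primary, secondary) values for three options
X = (5.0, 1)
Y = (4.0, 2)
Z = (3.0, 3)

def prefer(a, b):
    """Return True if option a is chosen over option b under the rule."""
    if abs(a[0] - b[0]) >= JND:   # difference is noticeable: use primary
        return a[0] > b[0]
    return a[1] > b[1]            # otherwise fall back on the secondary

print(prefer(Y, X))  # True: X and Y seem tied on primary; Y wins on secondary
print(prefer(Z, Y))  # True: same logic for Z vs. Y
print(prefer(X, Z))  # True: X vs. Z is noticeable on primary, so X wins
```

All three statements print True, yielding the cycle Y over X, Z over Y, and X over Z.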

AN EFFORT-ACCURACY FRAMEWORK FOR DECIDING AMONG STRATEGIES

These examples provide only a brief view of contingent processing; however, there is extensive evidence that human decision behavior is a highly contingent form of information processing (see Payne et al., in press, for a more complete review of this evidence). Therefore, a conceptual framework for understanding the use of multiple strategies to solve decision problems would be extremely valuable. We offer such a framework, based on the idea that using various decision strategies is an adaptive response of a limited-capacity information processor to the demands of complex task environments. In particular, we emphasize the use of multiple strategies as an adaptive way to balance the goals of achieving an accurate decision and limiting the cognitive effort needed to reach a decision.

Our framework can be summarized in terms of several assumptions. First, we assume people have available a repertoire of strategies for solving decision problems of any complexity. Individuals may have acquired different strategies through formal training (Larrick, Morgan, & Nisbett, 1990) or through experience (Kruglanski, 1989). Second, we assume that the available strategies have differing benefits (such as accuracy) and costs (such as effort). Third, for a given level of desired accuracy, we assume that people are motivated to use as little effort as necessary to solve a decision problem; for a given level of desired effort, we assume that individuals will try to use a strategy that is as accurate as possible. Fourth, we assume that the relative benefits and costs of the available strategies will vary across different decision environments. Thus, to be adaptive in terms of accuracy and effort, a decision maker must be flexible in using strategies across such environments. Fifth, we assume that an individual selects the strategy that he or she anticipates will represent the "best" effort-accuracy tradeoff for the task. A further discussion of these assumptions can be found in Payne et al. (in press).

The idea that strategy selection involves the consideration of the benefits and costs of different strategies is a frequently used framework for explaining contingent decision behavior (e.g., Beach & Mitchell, 1978; Klayman, 1983; Klein, 1983; Russo & Dosher, 1983; Shugan, 1980; Thorngate, 1980; Wright, 1977). Our version of this framework, however, focuses on strategy selection at a more detailed information-processing level than the work of most other researchers. In addition, we place more stress on the role that cognitive effort plays in strategy selection. In the next section we consider how to measure cognitive effort and present evidence validating our approach.
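A deliberately crude sketch of the fifth assumption follows: given anticipated accuracy and effort for each strategy in the repertoire, the decision maker picks the strategy with the best tradeoff. The accuracy and effort numbers, and the single penalty parameter for effort, are invented placeholders rather than estimates from our studies.

```python
# A caricature of strategy selection under an effort-accuracy tradeoff.
# Anticipated accuracy/effort values and the effort penalty are invented.

strategies = {
    # name: (anticipated relative accuracy, anticipated effort in EIPs)
    "weighted additive":      (1.00, 240),
    "lexicographic":          (0.90,  60),
    "elimination-by-aspects": (0.85,  80),
    "random choice":          (0.00,   1),
}

def best_strategy(effort_weight):
    """Maximize accuracy minus a penalty per unit of anticipated effort."""
    return max(strategies,
               key=lambda s: strategies[s][0] - effort_weight * strategies[s][1])

print(best_strategy(effort_weight=0.0001))  # effort barely matters: WADD wins
print(best_strategy(effort_weight=0.005))   # effort is costly: a heuristic wins
```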


Cognitive Effort and Decision Strategies

Measuring Cognitive Effort. The importance of the concept of cognitive effort to our framework means that we need a method for measuring the effort associated with the use of various strategies in different task environments. The idea that decision making is influenced by considerations of cognitive effort is an old one (Simon, 1955). In the late 1970s and early 1980s, a method for comparing the effort required by different decision strategies was independently proposed by Huber (1980) and Johnson (1979). Based on the work of Newell and Simon (1972), they suggested that decision strategies could be described by a set of elementary information processes (EIPs). An EIP could include such mental operations as reading a piece of information into short-term memory, comparing the values of two alternatives on an attribute to determine which is larger, and multiplying a weight and an attribute value. A set of EIPs for decision making that we have used in our research is shown in Table 2.1.

A particular decision strategy would be defined in terms of a specific collection and sequence of EIPs. For example, a lexicographic choice strategy would involve a number of reading and comparison EIPs but no adding or multiplying EIPs. In contrast, a weighted additive strategy would have reading EIPs, a number of adding and multiplying EIPs, and some comparisons (but fewer comparisons than the lexicographic strategy). The number of EIPs required to execute a strategy in a particular task environment reflects the cognitive effort required to make a decision in that environment using that specific strategy. Cognitive effort is also a function of the specific mix of EIPs utilized as well as the total number of EIPs used. The latter idea reflects the belief that people will find some EIPs (e.g., multiplications) more effortful than others (e.g., comparisons).

How valid is this EIP approach to measuring decision effort? As we indicate in the following section, there is empirical evidence showing that measuring strategy effort as a function of EIPs is predictive of other measures of decision effort, such as response times and self-reports of effort.

TABLE 2.1
Elementary EIPs Used in Decision Strategies

READ        Read an alternative's value on an attribute into STM
COMPARE     Compare two alternatives on an attribute
DIFFERENCE  Calculate the size of the difference of two alternatives on an attribute
ADD         Add the values of an attribute in STM
PRODUCT     Weight one value by another (multiply)
ELIMINATE   Remove an alternative or attribute from consideration
MOVE        Go to next element of the external environment
CHOOSE      Announce preferred alternative and stop the process

Note: STM = short-term memory.
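To see how such counts behave, the following sketch tallies rough EIP totals for a weighted additive and a lexicographic strategy as the choice set grows. The counting conventions (e.g., how ties, moves, and eliminations are tallied) are simplified assumptions for illustration, not the precise production-system counts used in our research.

```python
# Rough EIP counts for two strategies on a problem with n_alt alternatives
# and n_att attributes. The counting conventions are simplifying
# assumptions, not the chapter's exact production-system tallies.

def wadd_eips(n_alt, n_att):
    return {
        "READ":    n_alt * (2 * n_att),  # each weight and each value
        "PRODUCT": n_alt * n_att,        # weight x value
        "ADD":     n_alt * (n_att - 1),  # summing products per alternative
        "COMPARE": n_alt - 1,            # finding the best overall value
    }

def lex_eips(n_alt, n_att_examined=1):
    # With ties, the lexicographic rule examines further attributes,
    # increasing its READ and COMPARE counts.
    return {
        "READ":    n_alt * n_att_examined,
        "COMPARE": (n_alt - 1) * n_att_examined,
        "PRODUCT": 0,
        "ADD":     0,
    }

for label, counts in (("WADD", wadd_eips(6, 4)), ("LEX", lex_eips(6))):
    print(label, counts, "total =", sum(counts.values()))
```

Even under these crude conventions, the lexicographic total grows far more slowly with problem size than the weighted additive total.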


Empirical Validation of an EIP Approach to Decision Effort. We examined the assumption that EIP counts provide a measure of cognitive effort by having decision makers make choices using different prescribed strategies for choice sets varying in size. Both decision latencies and self-reports of decision difficulty were obtained as measures of strategy execution effort. The crucial question was whether models based on EIP counts could predict these indicators of cognitive effort in choice. Given space constraints, the following description of our methods and results is necessarily limited (for more details, see Bettman, Johnson, & Payne, 1990).

We trained 7 subjects to use 6 different decision strategies: weighted adding, equal weighting, lexicographic, elimination-by-aspects, satisficing, and majority of confirming dimensions. Each subject used each strategy in a separate session to make 20 decisions involving selection among job candidates; the problems ranged in size from 2 to 6 alternatives and from 2 to 4 attributes. For each session, subjects were told to make their selections using the prescribed rule exactly as given to them.

Subjects used the Mouselab computer-based information acquisition system to acquire information and make their decisions (Johnson, Payne, Schkade, & Bettman, 1991). Subjects used a mouse as a pointing device to move a cursor around a matrix containing the attribute weights and values. When the subject pointed the cursor at a cell of the matrix, the information in that cell was displayed, and all remaining information in the matrix was concealed. Mouselab monitored the subjects' information sequences and recorded latencies for each acquisition, the overall time for each problem, any errors made by the subject (e.g., departures from the prescribed search pattern), and the choice. Subjects also rated the difficulty of each choice and the effort each choice required on two response scales presented after each problem. Finally, in a seventh session subjects made choices for 12 problems of various sizes where the subject was free to use any strategy desired.

To determine whether the EIP framework could predict the effort required to use a decision strategy, we used regression analysis to assess the degree to which four alternative models of effort based on EIPs fit the observed response times and self-reports of effort. The simplest model treated each EIP as equally effortful and summed the numbers of each component EIP required for a particular choice to get an overall measure of effort (the equal-weighted EIP model). The second model allowed the effort required by each individual EIP to vary by using counts for each of the individual EIPs as separate independent variables (the weighted EIP model). A third model allowed the effortfulness of the individual EIPs to vary across rules (the weighted EIP by rule model).


Although such variation is possible, of course, our goal of developing a unifying framework for describing the effort of decision strategies would be much more difficult if the sequence of operations or the rule used affected the effort required for individual EIPs. The fourth model allowed the required effort for each EIP to vary across individuals but not rules (the weighted EIP by individual model), based on the expectation that some individuals would find certain EIPs relatively more effortful than other individuals. A fifth model, based simply on the number of pieces of information acquired, was also assessed as a baseline model of decision effort (the information acquisition model). This last model implies that the specific type of processing done on the information acquired makes little or no difference in determining decision effort.

Overall, the results yielded strong support for our EIP approach to measuring strategy effort. A model based on weighted EIP counts provided good fits for the observed overall response times (R² = .84) and self-reports of effort (R² = .59). In addition, the fit of the weighted EIP model to the data was statistically superior to that of the baseline information acquisition model and to that of the equal-weighted EIP model. Thus, a model of cognitive effort in choice requires concern not only for the amount of information processed but also for different weights for the particular processes (EIPs) applied to that information. Interestingly, the estimates of the time taken for each EIP were mostly in line with prior cognitive research.

The estimated weights for the various EIPs were essentially the same regardless of the decision strategy used; that is, the fits for the more complex weighted EIP by rule model were essentially the same as the fits for the weighted EIP model. This supports the assumption of independence of EIPs across rules. Finally, the results showed significant individual differences in the effort associated with particular EIPs (i.e., the fit of the weighted EIP by individual model was significantly better than that of the weighted EIP model). This suggests that individuals may choose different decision strategies in part because certain component EIPs may be relatively more or less effortful across individuals. In fact, Bettman et al. (1990) showed that the processing patterns used by subjects in an unconstrained choice environment were related to the relative costs of certain EIPs, although the limited number of subjects in that study precluded any strong conclusions. Subjects for whom arithmetic operators were relatively more difficult, as indicated by the coefficients for the various EIPs, showed greater selectivity in processing.

To summarize, we found strong support for our EIP approach to measuring decision effort. Next, we illustrate how our general accuracy-effort framework can be used (a) to generate specific predictions about how the use of strategies will vary across task environments, and (b) to test the extent to which actual decision behavior adapts in ways predicted by our framework.
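Before turning to those tests, the form of the weighted EIP model can be sketched as an ordinary regression of response time on the counts of each elementary process. The data below are simulated with invented per-EIP times; the point is only to show the structure of the model, not to reproduce our estimates.

```python
import numpy as np

# Sketch of the "weighted EIP" model: regress response time on the counts
# of each elementary process. Per-EIP times and the data are invented.

rng = np.random.default_rng(0)
eip_names = ["READ", "COMPARE", "ADD", "PRODUCT", "ELIMINATE", "MOVE"]
true_ms = np.array([300, 400, 900, 1500, 250, 200])  # invented ms per EIP

counts = rng.integers(0, 40, size=(120, len(eip_names)))  # 120 choice trials
rt = counts @ true_ms + rng.normal(0, 2000, size=120)     # noisy total times

# One coefficient per EIP, no intercept (a simplifying assumption: effort
# is zero when no processing occurs).
weights, *_ = np.linalg.lstsq(counts.astype(float), rt, rcond=None)

for name, w in zip(eip_names, weights):
    print(f"{name:10s} ~ {w:7.1f} ms per operation")
```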


TESTS OF ADAPTIVE STRATEGY SELECTION

As noted earlier, we have emphasized understanding contingent decision behavior at a detailed information-processing level of explanation. Consequently, we have adopted a distinctive combination of methodologies in our research. First, we use computer simulation in order to derive specific process-level predictions regarding adaptivity in decision processes. Then we use process-tracing methods in our experiments to gather the detailed process-level data needed to test those predictions empirically. Next we briefly summarize some of the results we have obtained. Details can be found in Payne, Bettman, and Johnson (1988, in press).

Monte-Carlo Simulation of Effort and Accuracy in Choice

To determine the effort and accuracy of various heuristics in different environments, we first modeled each of a set of decision strategies (e.g., the weighted additive [WADD], elimination-by-aspects [EBA], lexicographic choice [LEX], satisficing [SAT], majority of confirming dimensions [MCD], and equal-weight [EQW] rules) as production systems (Newell & Simon, 1972). A production system is a collection of IF-THEN rules; a discussion of the value of production systems as representations of human cognitive processes can be found in Newell (1980). We then implemented these production system models as computer programs and ran Monte-Carlo simulations using these models of each strategy in order to estimate how the effort and accuracy of the various strategies vary with changes in decision environments.

Effort was calculated on the basis of counts of EIPs, as discussed earlier. For each heuristic, accuracy was measured in terms of the relative performance of that heuristic when compared to the optimal choice (given by the weighted additive rule, which uses all the relevant problem information) and the choice that would be expected if a random choice procedure (RAND) was used, which involves no processing of information. Specifically, in terms of the values of the alternatives chosen by each rule, we measured relative accuracy as (heuristic - random)/(weighted adding - random).

Based on a review of factors that might have important effects on either the effort or accuracy of decision strategies (e.g., see Beach, 1983; McClelland, 1978; Thorngate, 1980), we varied several factors in the simulations to provide different choice environments: the number of alternatives and number of attributes, time pressure, the presence or absence of dominated alternatives, and the degree of dispersion of weights across attributes. To illustrate the latter variable, a four-attribute problem with low dispersion might have relative weights on the attributes of .30, .20, .22, and .28, whereas a problem with a high degree of dispersion in weights might have weights of .68, .12, .05, and .15 for the four attributes.
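A minimal version of such a simulation is sketched below for a single heuristic. The way weights and outcome values are generated, and the problem sizes, are simplifying assumptions; the production-system implementations used in our work were far more detailed.

```python
import random

# Monte-Carlo sketch of the relative-accuracy measure
#   (heuristic - random) / (weighted adding - random)
# for a lexicographic heuristic. Problem sizes and the generation of
# weights and values are simplifying assumptions.

def simulate(n_trials=2000, n_alt=5, n_att=4):
    rel_num = rel_den = 0.0
    for _ in range(n_trials):
        weights = [random.random() for _ in range(n_att)]
        total = sum(weights)
        weights = [w / total for w in weights]   # normalized "probabilities"
        options = [[random.random() for _ in range(n_att)]
                   for _ in range(n_alt)]

        def ev(opt):                             # weighted additive value
            return sum(w * v for w, v in zip(weights, opt))

        wadd = max(ev(o) for o in options)       # optimal (WADD) choice
        rand = sum(ev(o) for o in options) / n_alt  # expected random choice
        key = weights.index(max(weights))        # most important attribute
        lex = ev(max(options, key=lambda o: o[key]))  # lexicographic choice

        rel_num += lex - rand
        rel_den += wadd - rand
    return rel_num / rel_den

print(f"relative accuracy of LEX ~ {simulate():.2f}")
```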


For the empirical studies reviewed next, these weights were operationalized in terms of the probabilities of various outcomes that might be experienced, given that a particular alternative was chosen. In other words, the decision maker was asked to select the best gamble from a set of gambles.

The conclusions from the Monte-Carlo simulations can be summarized as follows. First, in some decision environments, the use of a non-normative strategy like the lexicographic rule may not only significantly reduce the effort needed to reach a decision but can also provide a level of accuracy comparable to that obtained by the weighted additive rule. Thus, the use of heuristic decision strategies often makes sense when both accuracy of choice and decision effort are considered.

Second, no single heuristic was the most efficient strategy across all task environments. In the low-dispersion, dominance-possible environment, for example, simplifying processing by ignoring weight (probability) information (i.e., using the equal-weight strategy) is quite accurate. In contrast, when the dispersion in weights is higher, the lexicographic rule is the most accurate of the heuristics and is substantially better than the equal-weight rule. Thus, a decision maker wanting to achieve both a reasonably high level of accuracy and low effort would have to use a repertoire of strategies, with selection contingent on task demands.

Third, under time pressure the weighted additive rule rapidly degrades in accuracy, whereas heuristics like elimination-by-aspects and lexicographic choice show much smaller losses in accuracy. In fact, under severe levels of time pressure, elimination-by-aspects is often the most accurate rule. Thus, under time constraints the preferred strategy is to process at least some information about all alternatives as soon as possible rather than to worry about processing each alternative in depth.⁴

The simulation results just reported highlight what an idealized decision maker might do to shift strategies adaptively as task environments change. In the next section we discuss the extent to which actual decision behavior involves shifts in strategies of the type predicted by the simulation.

⁴Interestingly, Eisenhardt (1989) reported that firms in the computer industry operating in rapidly changing environments (time pressure) did better if they used a "breadth-not-depth" strategy for evaluating options.

Experimental Validation of the Simulation Results

We have conducted a number of experiments designed to validate the results of our simulation studies. Those experiments have used Mouselab to collect process-level data. This involves setting up the decision task so that the subject must use the mouse to view or select information; as noted before, these acquisition processes can then be easily monitored by the computer.


Data can be obtained on what information the subject seeks, in what order, how much information is examined, and how long the information is examined. Further details on Mouselab's capabilities can be found in Johnson et al. (1991).

Our first experiment examined the sensitivity of decision behavior to variations in the goals for the task (emphasis on accuracy or emphasis on effort savings). The second series of experiments examined the sensitivity of decision processes to variations in time pressure and to variations in the dispersion of the probabilities (weights) associated with the outcomes of the alternatives in a choice set. More complete details on each set of studies can be found in Creyer, Bettman, and Payne (1990) and Payne et al. (1988), respectively.

Effects of Accuracy and Effort Goals on Decision Processes. A key assumption underlying any accuracy-effort approach to strategy selection is that processing should be sensitive to the relative emphasis placed on accuracy versus effort. For example, people should utilize strategies that provide greater accuracy (often at the cost of greater effort) when the incentives for accuracy are increased. However, as several authors point out (e.g., Ashton, 1990; Tversky & Kahneman, 1986; Wright & Aboul-Ezz, 1988), incentives sometimes enhance performance, sometimes have no effect, and at times may actually decrease performance. One concept important for understanding incentive effects is the distinction between working harder and working smarter (Einhorn & Hogarth, 1986; Tversky & Kahneman, 1986). Working harder denotes devoting more effort to the same strategy; working smarter, in contrast, refers to changing strategies appropriately to take advantage of a specific situation. We believe that a common response to general incentives is simply to work harder at the same strategy. However, we believe that specific incentives that explicitly change the relative salience of effort and accuracy considerations in the decision environment can lead to strategy changes.

Subjects used Mouselab to acquire information and make decisions for 32 sets of 4 nonrisky alternatives, each defined by 6 attributes. The subjects' task was to select the alternative in each set that they thought was best overall. The sets varied within subjects with respect to (a) the dispersion of the weights provided for the attributes (high or low), (b) the explicit goal of the decision maker for the set (minimize effort or maximize accuracy), and (c) the presence or absence of effort and accuracy feedback (these feedback factors are not discussed in this chapter).

We manipulated effort-accuracy tradeoffs by explicitly emphasizing either a goal of maximizing accuracy relative to effort or a goal of minimizing effort relative to accuracy for each choice set. Subjects were told that an index of overall performance would be developed based on both the time taken and the accuracy achieved for each trial.⁵


They were told further that, for trials when the goal was to minimize effort, time taken would receive a weight of three and accuracy a weight of one. On trials where the goal was to maximize accuracy, time taken would have a weight of one and accuracy a weight of three. Thus, both accuracy and effort (time taken) mattered on all trials; we tried to manipulate the relative importance of those two goals.

Subjects did more processing when the goal was to maximize accuracy rather than to minimize effort; that is, more information was acquired and more time was spent on the information. In addition, information acquisition was less selective under a goal of maximizing accuracy: subjects spent proportionally less time on the most important attribute, were less selective in processing over attributes, and were less selective in processing across alternatives. Finally, processing was more alternative based when the goal was to maximize accuracy. This more extensive, less selective, and more alternative-based processing is more consistent with normative strategies and also leads to better performance, because subjects attained greater relative accuracy levels when the goal was to maximize accuracy.

To summarize, when we emphasized the goal of accuracy more than effort, we found a shift in strategies in the direction predicted by the effort-accuracy framework. These results provide the clearest evidence available to date for the effects of differences in goals on process-tracing measures of decision strategies (see Billings & Scherer, 1988; Ford, Schmitt, Schechtman, Hults, & Doherty, 1989, pp. 101-102).

⁵Accuracy was measured relative to the weighted additive rule.
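One way such a performance index could be operationalized is sketched below. The chapter specifies only the relative weights of three and one, so the combination rule and the rescaling of time used here are invented for illustration.

```python
# A sketch of a performance index combining accuracy and time with
# weights 3:1 or 1:3 depending on the stated goal. The additive
# combination rule and the rescaling of time are invented; the text
# specifies only the relative weights.

def performance_index(accuracy, seconds, goal):
    time_cost = seconds / 60.0  # assumed rescaling of time taken
    if goal == "maximize accuracy":
        return 3 * accuracy - 1 * time_cost
    if goal == "minimize effort":
        return 1 * accuracy - 3 * time_cost
    raise ValueError(goal)

print(performance_index(accuracy=0.95, seconds=45, goal="maximize accuracy"))
print(performance_index(accuracy=0.80, seconds=15, goal="minimize effort"))
```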


well in the simulations under time pressure. In particular, there should be more attribute-based processing, greater selectivity in processing, and a greater proportion of processing focused on probabilities and the most important attribute under higher levels of time pressure. We conducted experiments in which subjects were asked to make a series of choices from sets of risky options where dominated options were possible. Each choice set contained four risky options, with each option offering four possible outcomes (attributes). For any given outcome, the probability was the same for all four options. After completing the series of choices, subjects actually played one gamble and received the amount of money corresponding to the alternative they had chosen. The sets varied in terms of two within-subjects factors: (a) presence or absence of time pressure, and (b) high or low dispersion in probabilities. In addition, half the subjects had a IS-second constraint for the problems with time pressure, whereas the other half had a 25-second time constraint (the average response time for the no time pressure conditions was 44 seconds). Information acquisitions, response times, and choices were monitored using Mouselab. For trials with time pressure, Mouselab ensured that subjects could not collect any additional information once the available time had expired. A clock on the display screen indicated the time left as it counted down. Overall, the results for subjects' choice processes validated the patterns predicted by the simulation. Subjects showed a substantial degree of adaptive decision making. More specifically, subjects processed less information, were more selective, and tended to process more by attribute when dispersion in probabilities was high rather than low. Because accuracy was equivalent in the two dispersion conditions, subjects took advantage of changes in the structure of the available alternatives to reduce processing load whereas maintaining accuracy. Individual subjects who were more adaptive in their patterns of processing (i.e., who were relatively more selective and more attribute-based processors in high dispersion environments) also attained higher relative accuracy scores. Importantly, this increase in performance was not accompanied by a significant increase in effort. Hence, more adaptive subjects also appeared to be more efficient decision makers.

We also found several effects of time pressure. First, under severe time pressure, people accelerated their processing (e.g., less time was spent per item of information acquired), selectively focused on a subset of the more important information, and changed their pattern of processing in the direction of relatively more attribute-based processing. This general pattern of results is consistent with the simulation results, suggesting that an efficient strategy under severe time pressure would involve selective and attribute-based processing. The effects of time pressure were substantially less for those subjects with a 25- as opposed to a 15-second constraint.


Under more moderate time pressure, subjects showed some acceleration and some selectivity in processing but provided no evidence for a shift in the pattern of processing. These results suggested a possible hierarchy of responses to time pressure: people initially may try to respond to time pressure simply by working faster; if this is insufficient, they may then focus on a subset of the available information; finally, if that is still insufficient, people may change processing strategies (e.g., from alternative-based processing to attribute-based processing).

The results of the experiments outlined previously provide compelling evidence for adaptivity in decision making. Although not perfectly adaptive, our subjects were able to change processing strategies in ways that the simulation indicated were appropriate given changes in context and task features of the decision problems. Our conceptual framework thus receives strong support in these empirical studies. Individuals appear to weigh accuracy and effort concerns in selecting decision strategies. Given that individuals appear to be so flexible, how can we further aid them in making better decisions? In the next section, we examine the implications of adaptivity for designing decision aids.

DECISION AIDING

Understanding contingent decision processes has important ramifications for helping individuals to make better decisions. In particular, researchers have examined how various decision aids can either improve the accuracy of decisions, decrease the effort required for decision making, or both. Two major areas of active research on decision aids are reviewed next: decision analysis and the design of information environments.

Decision Analysis

Decision analysis is a set of models and methods for helping people deal with difficult and stressful decisions. The operating assumption of decision analysts is that a decision maker wishes to select the action that has the highest expected utility. Decision analytic methods include both tools for structuring decisions (e.g., decision trees) and tools for eliciting beliefs (probabilities) and values (utilities) (Watson & Buede, 1987). The "divide and conquer" approach, in which complex decision tasks are decomposed into smaller components, is also an important feature of decision analysis (Henrion, Fischer, & Mullin, in press; MacGregor & Lichtenstein, 1991; Ravinder & Kleinmuntz, 1991). Some evidence for the general value of decision analysis is provided by Politser (1991).


The contingent nature of decision behavior has important implications for decision analysis. For example, the variance in preferences across tasks that are seemingly similar (e.g., choice vs. bidding for the same lotteries) calls into question the validity of the judgmental inputs needed to operationalize decision analysis (Watson & Buede, 1987). On the other hand, Tversky (1988) argued that the evidence of contingent decision processes shows that people may greatly benefit from various decision aids. He suggested that rather than abandoning decision analysis we try to make it more responsive to the complexities and limitations of the human mind.

There are at least three ways in which decision analysts are trying to be responsive to the adaptive nature of judgment and choice. First, new methods for eliciting values and beliefs have been proposed that will hopefully avoid some of the inconsistencies in preference and probability judgments (e.g., McCord & de Neufville, 1986).

A second approach for dealing with contingent judgments is sensitivity analysis, in which utility is measured in several ways and any discrepancies are explicitly reconciled by the decision maker (von Winterfeldt & Edwards, 1986). Edwards (1990) made a strong case for using multiple methods to elicit the same beliefs or values, followed by a discussion of discrepancies, followed by revision of the elicited quantities by the respondent. The belief is that asking respondents to think harder (expend more cognitive effort) and reconcile conflicting estimates will enhance the validity (accuracy) of the assessed preferences and beliefs. Although that belief is in some ways consistent with an effort-accuracy framework, we have noted earlier that incentives (here, the encouragement of the analyst) do not always lead to more normative behavior. In addition, there is also the danger that the analyst will become too involved in constructing the decision maker's values (Fischhoff, Slovic, & Lichtenstein, 1980, discuss how an elicitor can affect the expression or formulation of values). We agree with Edwards (1990) that more data on the effects of multiple elicitations would help a great deal.

D. Kleinmuntz (1990) discussed a third approach to dealing with the sensitivity of assessment procedures to task and context effects. He advocated building error theories to predict the cumulative effects of assessment errors given that a complex problem has been decomposed into a series of simpler judgment tasks. Each judgment has an expected value (systematic portion) and a standard deviation (random portion), and Kleinmuntz proposed that the expected value not be viewed as a true internal opinion but as a value determined jointly by the person, the task, and the context. Thus, systematic error due to strategy shifts resulting from task and context changes is explicitly included in the error theory along with the more commonly considered random error portion of judgment. Elaborating on the idea of multiple assessments of preferences and beliefs, Kleinmuntz suggested that if the direction and size of the systematic errors can be predicted, then it should be possible to select a portfolio of assessment procedures in such a way that biases cancel each other out.


Finally, Kleinmuntz noted that decision makers' great concern with conserving cognitive effort suggests that simplified modeling techniques might be preferred in practice (see Behn & Vaupel, 1982, and von Winterfeldt & Edwards, 1986, for examples), although Politser (1991) argued that simple problem analysis may not always provide the greatest gains.
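The flavor of such an error theory can be conveyed by a small simulation in which each component assessment carries a systematic bias and a random error, and the decomposed judgments are recombined. All magnitudes below are invented; the sketch shows only how opposing biases can partially cancel in the aggregate.

```python
import random

# Sketch of an error theory for decomposition in the spirit of Kleinmuntz
# (1990): each elicited quantity = true value + systematic bias + random
# error. All magnitudes are invented for illustration.

random.seed(1)

def assess(true_value, bias, sd):
    return true_value + bias + random.gauss(0, sd)

def decomposed_estimate(components):
    # components: list of (true value, systematic bias, random sd)
    return sum(assess(t, b, s) for t, b, s in components)

true_total = 100.0
components = [(40, +3, 4), (35, -2, 4), (25, -1, 4)]  # biases partly cancel

errors = [decomposed_estimate(components) - true_total for _ in range(10000)]
mean_err = sum(errors) / len(errors)
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
print(f"mean error ~ {mean_err:.2f}, RMSE ~ {rmse:.2f}")
```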

Changing Information Environments to Aid Decisions

Because individuals adjust their decision strategies depending on properties of the decision task, decisions can sometimes be improved by rather straightforward, inexpensive changes to the information environments within which individuals make judgments and choices. For example, in the 1970s the provision of unit price information in supermarkets was promoted as a way of increasing consumer welfare. However, several studies showed that people were either not aware of unit prices or were not using them. Russo (1977) argued that people would like to compare alternatives directly on important attributes like unit price; however, he also noted that it was difficult for most consumers to process unit price information as it was normally displayed. Each unit price was typically available only under each item on a shelf. Hence, comparing unit prices for many items could involve searching up and down a shelf, with potentially great demands on memory. Russo argued, therefore, that people would tend to ignore such unit price information because it was hard to process. Thus, making information available was not sufficient to change consumer behavior; the available information also had to be processable.

Russo demonstrated the power of this argument by showing that consumers' actual purchase decisions could be altered by making a simple change in the format used to present unit price information: he put all the available information on unit prices together in an easy-to-read list, with unit prices ranked from lowest to highest. Consumers changed their purchasing patterns and paid lower prices, on average, when the new lists were posted.

Additional evidence of the effects of information formats on people's responses in real-world settings has been provided by a number of researchers since Russo. For example, the importance of improved information displays has been borne out in studies of hazard warning labels on household products (Viscusi, Magat, & Huber, 1986), in the provision of the results of home energy audits (Magat, Payne, & Brucato, 1986), and in the provision of information on radon levels in homes (Smith, Desvousges, Fisher, & Johnson, 1988). The processability of presented information matters a great deal in determining the decisions people make.
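The display change itself is almost trivial computationally, which is part of the point. The sketch below produces a ranked unit-price list from hypothetical products, roughly the kind of reorganization Russo posted in the supermarket aisles; the product names and prices are invented.

```python
# Producing a ranked unit-price list in the spirit of Russo (1977).
# Product names and prices are invented.

products = [
    ("Brand A, 32 oz", 2.56),
    ("Brand B, 48 oz", 3.36),
    ("Brand C, 16 oz", 1.92),
]

def unit_price(item):
    name, price = item
    ounces = float(name.split(",")[1].split()[0])  # parse the size in ounces
    return price / ounces

for name, price in sorted(products, key=unit_price):  # cheapest per oz first
    cents = 100 * unit_price((name, price))
    print(f"{name}: ${price:.2f}  ({cents:.1f} cents/oz)")
```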


CONCLUSION

One of the major findings from behavioral decision research is that an individual uses different processing strategies for making a decision, contingent on the nature of the task environment. This chapter reviews a program of research directed at understanding such contingent use of strategies in decision making. We argue that this observed flexibility in the use of decision strategies often reflects adaptive behavior when both the effort and accuracy of decisions are considered. Evidence in support of an effort-accuracy framework is presented. We also argue that understanding the contingent nature of decision behavior has implications for such decision-aiding techniques as decision analysis and changing the information environments of decision makers. Whereas some may view the fact that decision processes are not invariant across task environments as a difficulty for decision research, we view it as a source of excitement and opportunity.

REFERENCES

Ashton, R. H. (1990). Pressure and performance in accounting decision settings: Paradoxical effects of incentives, feedback, and justification. Journal of Accounting Research, 28 (Supplement), 148-180.
Beach, L. R. (1983). Muddling through: A response to Yates and Goldstein. Organizational Behavior and Human Performance, 31, 47-53.
Beach, L. R., & Mitchell, T. R. (1978). A contingency model for the selection of decision strategies. Academy of Management Review, 3, 439-449.
Behn, R. D., & Vaupel, J. W. (1982). Quick analysis for busy decision makers. New York: Basic Books.
Bettman, J. R., Johnson, E. J., & Payne, J. W. (1990). A componential analysis of cognitive effort in choice. Organizational Behavior and Human Decision Processes, 45, 111-139.
Biggs, S. F., Bedard, J. C., Gaber, B. G., & Linsmeier, T. J. (1985). The effects of task size and similarity on the decision behavior of bank loan officers. Management Science, 31, 970-987.
Billings, R. S., & Marcus, S. A. (1983). Measures of compensatory and noncompensatory models of decision behavior: Process tracing versus policy capturing. Organizational Behavior and Human Performance, 31, 331-352.
Billings, R. S., & Scherer, L. M. (1988). The effects of response mode and importance in decision making strategies: Judgment versus choice. Organizational Behavior and Human Decision Processes, 34, 1-19.
Creyer, E. H., Bettman, J. R., & Payne, J. W. (1990). The impact of accuracy and effort feedback and goals on adaptive decision behavior. Journal of Behavioral Decision Making, 3, 1-16.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.
Edwards, W. (1990). Unfinished tasks: A research agenda for behavioral decision theory. In R. M. Hogarth (Ed.), Insights in decision making: A tribute to Hillel J. Einhorn (pp. 44-65). Chicago: University of Chicago Press.
Einhorn, H. J., & Hogarth, R. M. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171-192.


Einhorn, H. J., & Hogarth, R. M. (1986). Decision making under ambiguity. Journal of Business, 59, S225-S250.
Eisenhardt, K. M. (1989). Making fast strategic decisions in high-velocity environments. Academy of Management Journal, 32, 543-576.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1980). Knowing what you want: Measuring labile values. In T. Wallsten (Ed.), Cognitive processes in choice and decision behavior (pp. 117-141). Hillsdale, NJ: Lawrence Erlbaum Associates.
Ford, J. K., Schmitt, N., Schechtman, S. L., Hults, B. M., & Doherty, M. L. (1989). Process tracing methods: Contributions, problems, and neglected research questions. Organizational Behavior and Human Decision Processes, 43, 75-117.
Henrion, M., Fischer, G. W., & Mullin, T. (in press). Divide and conquer? Effects of decomposition on the accuracy and calibration of subjective probability distributions. Organizational Behavior and Human Decision Processes.

Huber, O. (1980). The influence of some task variables on cognitive operations in an information-processing decision model. Acta Psychologica, 45, 187-196.
Johnson, E. J. (1979). Deciding how to decide: The effort of making a decision. Unpublished manuscript, University of Chicago.
Johnson, E. J., Meyer, R. M., & Ghose, S. (1989). When choice models fail: Compensatory representations in negatively correlated environments. Journal of Marketing Research, 26, 255-270.
Johnson, E. J., Payne, J. W., Schkade, D. A., & Bettman, J. R. (1991). Monitoring information processing and decisions: The Mouselab system. Unpublished manuscript, Center for Decision Studies, Fuqua School of Business, Duke University.
Klayman, J. (1983). Analysis of predecisional information search patterns. In P. C. Humphreys, O. Svenson, & A. Vari (Eds.), Analyzing and aiding decision processes (pp. 401-414). Amsterdam: North-Holland.
Klayman, J. (1985). Children's decision strategies and their adaptation to task characteristics. Organizational Behavior and Human Decision Processes, 35, 179-201.
Klein, N. M. (1983). Utility and decision strategies: A second look at the rational decision maker. Organizational Behavior and Human Performance, 31, 1-25.
Kleinmuntz, D. N. (1990). Decomposition and the control of error in decision-analytic models. In R. M. Hogarth (Ed.), Insights in decision making: A tribute to Hillel J. Einhorn (pp. 107-126). Chicago: University of Chicago Press.
Kruglanski, A. W. (1989). The psychology of being "right": The problem of accuracy in social perception and cognition. Psychological Bulletin, 106, 395-409.
Larrick, R. P., Morgan, J. N., & Nisbett, R. E. (1990). Teaching the use of cost-benefit reasoning in everyday life. Psychological Science, 1, 362-370.
Lichtenstein, S., & Slovic, P. (1971). Reversals of preference between bids and choices in gambling decisions. Journal of Experimental Psychology, 89, 46-55.
Lichtenstein, S., & Slovic, P. (1973). Response-induced reversals of preference in gambling: An extended replication in Las Vegas. Journal of Experimental Psychology, 101, 16-20.
MacGregor, D. G., & Lichtenstein, S. (1991). Problem structuring aids for quantitative estimation. Journal of Behavioral Decision Making, 4, 101-116.
Magat, W. A., Payne, J. W., & Brucato, P. F. (1986). How important is information format? An experimental study of home energy audit programs. Journal of Policy Analysis and Management, 6, 20-34.
McClelland, G. H. (1978). Equal versus differential weighting for multiattribute decisions. Unpublished working paper, University of Colorado.
McCord, M. R., & de Neufville, R. (1986). "Lottery equivalents": Reduction of the certainty effect problem in utility assessment. Management Science, 32, 56-60.
Newell, A. (1980). Harpy, production systems, and human cognition. In R. Cole (Ed.), Perception and production of fluent speech (pp. 299-380). Hillsdale, NJ: Lawrence Erlbaum Associates.


Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Onken, J., Hastie, R., & Revelle, W. (1985). Individual differences in the use of simplification strategies in a complex decision-making task. Journal of Experimental Psychology: Human Perception and Performance, 11, 14-27.
Payne, J. W. (1976). Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16, 366-387.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 534-552.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (in press). The adaptive decision maker. Cambridge: Cambridge University Press.
Politser, P. E. (1991). Do medical decision analyses' largest gains grow from the smallest trees? Journal of Behavioral Decision Making, 4, 121-138.
Ravinder, H. V., & Kleinmuntz, D. N. (1991). Random error in additive decompositions of multiattribute utility. Journal of Behavioral Decision Making, 4, 83-97.
Russo, J. E. (1977). The value of unit price information. Journal of Marketing Research, 14, 193-201.
Russo, J. E., & Dosher, B. A. (1983). Strategies for multiattribute binary choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 676-696.
Shugan, S. M. (1980). The cost of thinking. Journal of Consumer Research, 7, 99-111.
Simon, H. A. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69, 99-118.
Smith, V. K., Desvousges, W. H., Fisher, A., & Johnson, F. R. (1988). Learning about radon's risk. Journal of Risk and Uncertainty, 1, 233-258.
Sundstrom, G. A. (1987). Information search and decision making: The effects of information displays. Acta Psychologica, 65, 165-179.
Thorngate, W. (1980). Efficient decision heuristics. Behavioral Science, 25, 219-225.
Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31-48.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.
Tversky, A. (1988). Discussion. In D. E. Bell, H. Raiffa, & A. Tversky (Eds.), Decision making: Descriptive, normative, and prescriptive interactions (pp. 599-612). Cambridge: Cambridge University Press.
Tversky, A., & Kahneman, D. (1986). Rational choice and the framing of decisions. Journal of Business, 59, S251-S278.
Tversky, A., & Sattath, S. (1979). Preference trees. Psychological Review, 86, 542-573.
Tversky, A., Sattath, S., & Slovic, P. (1988). Contingent weighting in judgment and choice. Psychological Review, 95, 371-384.
Viscusi, W. K., Magat, W. A., & Huber, J. (1986). Informational regulation of consumer health risks: An empirical evaluation of hazard warnings. Rand Journal of Economics, 17, 351-365.
von Winterfeldt, D., & Edwards, W. (1986). Decision analysis and behavioral research. Cambridge: Cambridge University Press.
Watson, S. R., & Buede, D. M. (1987). Decision synthesis: The principles and practice of decision analysis. Cambridge: Cambridge University Press.
Wright, P. L. (1977). Decision times and processes on complex problems. Unpublished manuscript, Stanford University.
Wright, W. F., & Aboul-Ezz, M. E. (1988). Effects of extrinsic incentives on the quality of frequency assessments. Organizational Behavior and Human Decision Processes, 41, 143-152.

PART II

PROCESSING PROBABILISTIC INFORMATION

CHAPTER 3

USING CONFIGURAL AND DIMENSIONAL INFORMATION

Stephen E. Edgell
University of Louisville

This chapter explores many years of published data and some new data from the multiple-cue probability learning research paradigm. The focus is on answering two questions concerning subjects' performance in using probabilistic information in making decisions. The first question is whether or not it is more difficult to use relevant configural information than it is to use relevant dimensional information. Although early studies answered this in the affirmative, later work has shown that the issue is very complex. Some partial answers are discussed. The second question is whether having relevant dimensional (configural) information affects the utilization of relevant configural (dimensional) information; that is, does the utilization of one type of information interact with the utilization of the other? It is found that there is no evidence for relevant configural information affecting dimensional utilization but strong evidence for relevant dimensional information affecting configural utilization. The Castellan and Edgell model is compared to the findings, and a revised version of the model is found to better account for these results.

Although everyone has a good idea what decision making is, a precise definition has never been agreed on. One characteristic often associated with decision making is that of a decision maker using information to make a decision or judgment. (In this chapter no distinction is made between decisions and judgments.) The decision maker cognitively processes the information to arrive at a decision. Much of the study of decision making has involved exploring these cognitive processes. Moreover, most of the researchers in this area have taken into account the nature of the information that the decision maker uses.


Following the advice of Brunswik (1952), most researchers have considered the relationship between the information that is available to the decision maker and the event or criterion about which the decision maker is making a judgment. The information and the event or criterion make up what is called the environment of the decision maker. It is common to refer to a "lens model" of the decision-making process in which the information dimensions are in the center, with the subject's cognitive processes to the left and the environment to the right. However, because certain analytical models (e.g., linear regression and correlation) have become associated with Brunswik's lens model, that terminology is not used in this chapter; rather, I refer to the cognitive process and the environment. The importance of the relationship between these two is of basic interest in this chapter.

The information in the environment is often only partially related to the criterion or, in other words, there is random error in the relationship; that is, the environment is probabilistic. Many researchers consider a probabilistic environment to be one of the defining characteristics of a decision-making task. As an example, consider a stockbroker who is trying to make a judgment as to how to advise clients on a particular stock. The stockbroker has several dimensions of information to use in making this decision. There could be reports on sales from several quarters, inventory, reserves, profits, and so on. None of these dimensions will perfectly predict the criterion of what will happen to the price of the stock, but they should have some relationship to it. It is the task of the stockbroker to use this information to arrive at a judgment as to what is the most likely value of the criterion.

The focus of this chapter is on the effect that the environment has on the cognitive processes of the decision maker. The information dimensions may be relevant to the decision process; that is, there may be a relationship between the criterion and the information dimensions. If there are dimensions of information that are related to the criterion, then there is relevant dimensional information in the environment. It may also be the case that there is a relationship between patterns of the information dimensions and the criterion over and above any relationship between the individual dimensions and the criterion; that is, there may be relevant configural information in the environment. (The definitions of relevant dimensional information and relevant configural information are made more precise later.) In particular, this chapter focuses on the utilization of dimensional and configural information. The question of whether it is harder for the decision maker to utilize configural information than to utilize dimensional information is explored. Whether there are interactions of having one type of relevant information in the environment on the cognitive use of the other type is then examined.


MULTIPLE-CUE PROBABILITY LEARNING

It is obvious that if the decision maker is to use the information in the environment in any way correctly, there first must be learning on the part of the decision maker. Many researchers have chosen to study the cognitive process of the decision maker after learning has taken place. For example, when studying the stockbroker it would be assumed that the stockbroker has learned, at least to some degree, how to use the information. The researcher would study the stockbroker after he or she has learned. However, other researchers have chosen to study the learning process itself. The subject, or decision maker, is exposed to a probabilistic environment. The subject must learn to use the information to make decisions about an event. Some form of feedback must be given to the subject in order for learning to take place. The feedback is usually the correct event. However, some studies have looked at the effect of other types of feedback on learning (e.g., Castellan, 1974; Hammond & Summers, 1973; Todd & Hammond, 1965). This chapter reviews only studies that used correct event feedback.

This paradigm is called multiple-cue probability learning. On each trial, the subject is given one value from each information, or cue, dimension and must give as his or her judgment a prediction of what the correct event is. After making the prediction, the subject is given the correct event as feedback. Over trials the subject's task is to learn how to utilize the cues to more accurately predict the event. As is generally the case with real-world decision tasks, the function relating the cues to the event is probabilistic (i.e., contains error variance). This, of course, makes perfect prediction of the event impossible. In a multiple-cue probability learning study, there are usually sufficient trials given to the subject for the subject's performance to reach asymptote. Indeed, in many experiments the main focus of the study is on this asymptotic performance. However, some studies have investigated the learning process also.

To be a little more formal, let the environment contain p cue dimensions C1, C2, ..., Cp. On each trial the decision maker is given one value from each dimension. This is equivalent to the information about one company in the stockbroker example. There is some function, f(C1, C2, ..., Cp, X), of these cues and a random variable unrelated to the cues (to make the environment probabilistic) that is the criterion that the subject is to predict. After the subject makes his or her response or prediction, the value of this function on that trial is given as feedback. The cues can be metric (that is, have numerical values), or they can be nonmetric (that is, have values that are not numeric). An example of a metric cue dimension for a stockbroker would be sales, whereas an example of a nonmetric cue dimension would be the type of business of the company, such as airline. Although many real-world decision-making environments probably involve cue dimensions of both
types, laboratory research has concentrated on environments of all metric cue dimensions (metric multiple-cue probability learning) or all nonmetric cue dimensions (nonmetric multiple-cue probability learning).

Since the famous book by Meehl (1954), one of the important topics in decision making has been that of configural information utilization. Configural information is information that is contained in the pattern of two or more cues but not in any one of them separately. More technically, the function that relates the cues to the event is not additive. A function is additive if it can be written as the sum of functions of each individual variable. For example, if there were only two dimensions, the function f would be additive if there exist two functions f1 and f2 such that the following holds:

f(C1, C2, X) = f1(C1, X) + f2(C2, X).

If a function is nonadditive, then it is at least partially configural. Yntema and Torgerson (1961) demonstrated that a function that is nonadditive can have most of its variance accounted for by an additive function. A configural function can be partitioned into additive variance and nonadditive variance, with parsimony dictating that as much variance as possible be attributed to the additive part. In order to understand the term configural better, it may be helpful to note that in the area of research design the term interaction is used and is a synonym for configural. Further, the term main effect is a synonym for dimensional information.
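To make the partition concrete, here is a minimal sketch (my own illustration, not an example from the chapter or from Yntema and Torgerson) that performs the two-way decomposition on a small numerical example:

```python
# A minimal sketch of the additive/nonadditive partition. The function
# f(x, y) = x * y on a 5 x 5 grid is an arbitrary illustration: it is
# plainly nonadditive, yet most of its variance is carried by the best
# additive approximation.
xs = ys = range(1, 6)
vals = {(x, y): x * y for x in xs for y in ys}
grand = sum(vals.values()) / len(vals)                 # grand mean

# Row and column effects: the additive (main effect) part of the function.
row = {x: sum(vals[x, y] for y in ys) / len(ys) - grand for x in xs}
col = {y: sum(vals[x, y] for x in xs) / len(xs) - grand for y in ys}

ss_total = sum((v - grand) ** 2 for v in vals.values())
ss_additive = sum((row[x] + col[y]) ** 2 for (x, y) in vals)

print(ss_additive / ss_total)   # 0.9: 90% of the variance is additive;
                                # the remaining 10% is configural
```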

MEASURING VALIDITY AND UTILIZATION

In order to fully analyze multiple-cue probability learning studies, it is necessary to be able to mathematically model the environment and the subject's cognitive rule. This model must be capable of separating the dimensional components from the configural components such that the dimensional components account for as much of the variance as possible. The same modeling scheme should be applicable to both the environment and the subject's cognitive processing. This would make it possible to compare the analysis of the environment with the analysis of the decision maker's cognitive processes.

In the nonmetric task such a modeling scheme was described by Edgell (1978, 1980). This scheme uses the general linear model from a standard factorial analysis of variance (as was proposed by Hoffman, Slovic, & Rorer, 1968) applied to the conditional probabilities of one of the events given the pattern of cues. (Bjorkman, 1973, proposed these conditional probabilities as measures of validity.) Although the model is described fully in Edgell (1978, 1980), a few examples here help to illustrate its use. Consider a nonmetric environment with only one cue dimension that takes on one of two values (C1,1 and C1,2). (Ci,j is the jth value of cue dimension i.) Also suppose there is
one event that also takes on only two values (E1 and E2). Suppose further that the event base rate is .5 (i.e., P(E1) = P(E2) = .5). Also let the cue base rate be .5 (i.e., P(C1,1) = P(C1,2) = .5). (If the cue base rate is not .5 the modeling scheme is complicated, but this is not of concern with the studies reviewed in this chapter.) Now suppose that the conditional probabilities are P(E1|C1,1) = .7 and P(E1|C1,2) = .3. We subtract the grand mean (the event base rate) from each of these conditional probabilities, which gives +.2 and -.2. We arbitrarily choose the positive value and designate the validity as .2. It should be easy to see that the validity can take on values from 0 to .5. When the validity is 0, the cue is useless, and when the validity is .5, the cue would allow perfect prediction of the event.

As a more complex example consider an environment with two cues that each take on two values. Thus there are four cue patterns possible. Again, let the cue and event base rates be .5 and further let the two cue dimensions be independent (P(C1,i|C2,j) = .5). The four conditional probabilities of Event 1 given each of the possible cue patterns for three different possible environments are given in Table 3.1.

TABLE 3.1
Conditional Probabilities for Three Example Environments

                             Environment
                           1       2       3
P(E1|C1,1, C2,1)          .7      .9      .9
P(E1|C1,2, C2,1)          .3      .5      .1
P(E1|C1,1, C2,2)          .7      .5      .5
P(E1|C1,2, C2,2)          .3      .1      .5

The same basic analysis is done except that the usual two by two table for a factorial analysis of variance is used. For environment 1, subtracting .5 and taking the row and column means for each cue dimension across the other cue, we get for cue dimension 1, +.2 and -.2, and for cue dimension 2, 0 and 0. Thus cue dimension 1 is relevant whereas cue dimension 2 is irrelevant. For environment 2, both cue dimensions give values of +.2 and -.2. Hence, both cue dimensions are relevant. In the third environment, cue dimension 1 also gives values of +.2 and -.2, whereas cue dimension 2 gives values of 0 and 0. However, the .5, the +.2 and -.2, and the 0 and 0 do not sum to give the conditional probabilities. Additional factors of +.2 and -.2 are needed. This is the validity of the configural information. The configural information was irrelevant in the first two environments. Referring to Table 3.1, it is obvious that in environment 3 cue dimension 2 modifies the validity of cue dimension 1 even though cue dimension 2 has no validity on the average. This is a classic interaction scenario.

The same analysis scheme can be applied to the conditional proportions of one response (that of predicting Event 1) given each cue pattern that are observed over some number of trials for each subject. This would convert
the subject's response proportions to utilization weights with the same range and interpretation as the validity weights for the environment. As when modeling the environment, there are two weights for each cue dimension, four for each two-dimension pattern, and so on, but they only differ by sign. We choose to use for each dimension and pattern that subject's weight that corresponds to the same dimension or pattern weight in the environment that was chosen because it was positive. Because of this, one or more of the subject's weights could be negative. This would occur if the subject were utilizing the dimensional or configural information in the opposite direction as the environment.

There has been no equivalent scheme developed for the metric environments. Linear regression is often used to model the environment and the subject's utilization. However, this only models the linear component of each. Tucker's (1964) equation does allow the determination of the amount of correct nonlinear utilization. This suffers from the problem of modeling only the utilization that is correct with respect to the environment, and it confounds additive but nonlinear utilization with configural utilization. The metric studies reviewed in this chapter have used the correlation of the subject's predictions with the actual event (usually called achievement) as the dependent measure. This measure is useful but quite crude, because all correct utilization is lumped together.

Some authors have proposed using this same measure in the nonmetric case. However, the scheme just outlined is obviously much preferred. Not only does it give a detailed breakdown, but the analyses of the environment and the subject's utilization are independent of each other. Thus incorrect, as well as correct, utilization can be modeled. Further it should be noted that achievement is a function of the environment, the subject's utilization, and chance. The aforementioned scheme is not affected by chance, which adds error variance to achievement, thus making it less desirable as a dependent variable.
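Because the scheme is just a 2 x 2 analysis-of-variance decomposition, it can be stated in a few lines of code. The sketch below is my transcription of the arithmetic described above (not Edgell's published implementation), run on the three environments of Table 3.1; applied to a subject's response proportions instead, the same function yields utilization weights.

```python
# A sketch of the validity decomposition for a 2 x 2 nonmetric environment
# with cue and event base rates of .5.
def decompose(p):
    """p[i][j] = P(E1 | cue 1 has value i+1, cue 2 has value j+1)."""
    grand = 0.5                                    # event base rate
    d = [[p[i][j] - grand for j in (0, 1)] for i in (0, 1)]
    dim1 = (d[0][0] + d[0][1]) / 2                 # row mean: cue dimension 1
    dim2 = (d[0][0] + d[1][0]) / 2                 # column mean: cue dimension 2
    config = d[0][0] - dim1 - dim2                 # residual: configural weight
    return tuple(round(abs(w), 3) for w in (dim1, dim2, config))

# The three environments of Table 3.1 (rows: cue 1 value; columns: cue 2 value).
for name, env in [(1, [[.7, .7], [.3, .3]]),
                  (2, [[.9, .5], [.5, .1]]),
                  (3, [[.9, .5], [.1, .5]])]:
    print(name, decompose(env))
# 1 (0.2, 0.0, 0.0) -> only cue dimension 1 relevant
# 2 (0.2, 0.2, 0.0) -> both dimensions relevant, configural irrelevant
# 3 (0.2, 0.0, 0.2) -> dimension 1 and the configural information relevant
```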

THEORY

Castellan and Edgell (1973) proposed a theory to account for subjects' behavior in nonmetric multiple-cue probability learning studies. The theory proposed that on each trial a subject would choose either to ignore the cues, to use a particular cue dimension, or to use the pattern of the cues. (This is the model referred to as Model 2 in their paper. Their Model 1 has been refuted by several studies.) If the subject decided to ignore the cues, then the subject would formulate a response based on how often each response had led to a correct prediction. If the subject chose to observe a particular cue dimension, a response would be formulated based on the observed cue value and how often the subject had been correct making each response to
that cue value. If the pattern was selected, the response would be formulated in a similar manner using the value of the pattern of cues. In their original paper Castellan and Edgell did not consider the issue of subpatterns of cues; a subpattern is a pattern of two or more cue dimensions, but not of all the cue dimensions. The model could be proposed such that the subject could choose to use the overall pattern only, or such that the subject could also choose on any trial to use any of the subpatterns. Obviously, this is not an issue except in environments with three or more dimensions. These two possible versions were discussed in Edgell (1980) and are considered later in this chapter.

Castellan and Edgell proposed functions that would determine the probabilities of making each of the choices necessary from the environment probabilities. The model with these parameter values would be applicable to asymptotic performance statistics on the subjects. Edgell and Morrissey (1987), in order to account for findings in studies where the environment changed during learning, proposed an alternative way of determining the probabilities of what the subject chooses to attend to on each trial. However, this latter scheme was never quantified. Edgell and Morrissey (1987) only proposed enough details to account for the direction of the effects that they found.
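The flavor of the process is easy to see in simulation. The sketch below is schematic only: the attention probabilities are arbitrary constants rather than the parameter functions Castellan and Edgell derived, and the response rule is assumed to probability-match the stored event counts.

```python
import random
from collections import defaultdict

# A schematic simulation of the Model 2 process described above.
random.seed(1)
SOURCES = ["ignore", "dim1", "dim2", "pattern"]
WEIGHTS = [0.2, 0.3, 0.3, 0.2]                    # P(attending to each source)
P_E1 = {(0, 0): .9, (0, 1): .5, (1, 0): .1, (1, 1): .5}   # environment 3
history = {}                     # (source, observed value) -> [n(E1), n(E2)]
responses = defaultdict(lambda: [0, 0])   # cue pattern -> [n predicted E1, n trials]

for trial in range(20000):
    cues = (random.randrange(2), random.randrange(2))
    source = random.choices(SOURCES, WEIGHTS)[0]
    value = {"ignore": None, "dim1": cues[0],
             "dim2": cues[1], "pattern": cues}[source]
    e1, e2 = history.setdefault((source, value), [1, 1])
    predicted_e1 = random.random() < e1 / (e1 + e2)   # probability matching
    event_is_e1 = random.random() < P_E1[cues]
    history[(source, value)][0 if event_is_e1 else 1] += 1
    responses[cues][0] += predicted_e1
    responses[cues][1] += 1

# The response proportions per cue pattern could now be decomposed exactly
# as the environment was, yielding the simulated utilization weights.
print({k: round(n / t, 2) for k, (n, t) in sorted(responses.items())})
```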

IS CONFIGURAL INFORMATION HARDER TO UTILIZE?

The first question to be explored in this chapter is whether it is more difficult for decision makers to utilize relevant configural information than for them to utilize relevant dimensional information. One might well expect that, because patterns are more complex than individual dimensions, it would be more difficult for subjects to utilize configural information than for them to utilize dimensional information. The Castellan and Edgell (1973) model in either its original form or in its modified form predicts that in a nonmetric environment configural information would be less utilized than dimensions when both have the same validity. Several studies in the literature contain data that are relevant to this question. First several studies that used nonmetric multiple-cue probability learning environments are reviewed, and then some that used metric multiple-cue probability learning environments are reviewed.

Some early research using the nonmetric environment seemed to find that it was more difficult for subjects to utilize configural than dimensional information (Edgell, 1978; Edgell & Castellan, 1973). Subjects were run in an environment with two cue dimensions. The two cue dimensions were nonmetric and binary valued. The subjects were to predict a binary-valued event. This is the simplest possible environment that could have relevant configural information. In one condition of one of their experiments, the validity of the information in one of the dimensions and the validity of the configural in-
formation were equal and of a moderate level (.2). Validity is measured as discussed previously. The event base rate and the other cue dimension were irrelevant. Subjects were run with correct event feedback for 400 trials. The mean asymptotic utilization weights (as discussed before) from the last 100 trials are shown graphically in Fig. 3.1 by the first two bars labeled Experiment 1. The crosshatched taller bar on the left is the mean utilization of the dimensional information, and the shorter bar next to it is the mean utilization of the configural information. Notice that the subjects were much more strongly utilizing the dimensional information than the configural information (p < .001).

It could be argued that, because there were both relevant dimensional information and relevant configural information in the environment, they were competing for the attention of the subject. If there was no competing dimensional information, perhaps the utilization of configural information would be higher. However, in another of their experiments, that exact condition was one of the conditions run. Only the configural information was relevant, with the same validity (.2) as in the study discussed earlier. The mean asymptotic utilization of the configural information is given by the bar labeled Experiment 2 on the far right of Fig. 3.1. Again, comparing it with the mean dimensional utilization from the previous experiment (the crosshatched bar on the far left), one can see that the mean dimensional utilization was much higher. In fact there is little, if any, difference in the utilization of the configural information in the first and the second experiments, as shown in Fig. 3.1 by comparing the two bars hatched with horizontal lines.

A different experiment using a three-cue nonmetric environment found a higher utilization for a relevant dimension than for a relevant pattern that consisted of the other two dimensions (Edgell & Morrissey, 1992). This was found for a condition that the authors called the unitary stimuli condition. The stimuli consisted of one or two squares or triangles made up of vertical or horizontal lines. The utilization of the relevant dimension was over twice as high as the utilization of the relevant pattern when averaged over the three possible conditions where a different one of the dimensions was relevant. It should be noted that this result occurred even though the validity of the pattern (.3) was higher than the validity of the relevant dimension (.2). The other type of stimuli (consisting of a pair of different letters or characters for each dimension), which was run in this study, resulted in only a slightly higher utilization for relevant dimensional as opposed to relevant configural information. However, due to the higher validity of the configural information, this is not evidence that configural information is as easy to use as dimensional information. In fact, it could be taken as further evidence for dimensional information being easier to utilize.

Stockburger and Erickson (1974) ran a nonmetric study with four binary-valued dimensions and a binary-valued event. In one condition all the rele-
[Fig. 3.1. Mean asymptotic utilization of the dimensional and configural information in Experiments 1 and 2; vertical axis, utilization (0 to .5).]
Group Reasons

Alternative    Positive reasons    Negative reasons    Prob.
A                    10                  15             .40
B                    20                  30             .40
The numerical values in the tables were chosen to illustrate the point. Certainly each of the effects can be obtained with other combinations of numbers. The point is that it is not necessary to assume any sort of group interaction for preference reversals to be manifested.

The following heuristic example further illustrates this point. Suppose each person in the group writes each of his or her reasons on a slip of paper indicating to which alternative it referred and whether it was positive or negative. Then when the group meets, they simply pool the slips of paper and the group preference is determined by simple counting (see the sketch below).3 Although it is perhaps unlikely that decision makers in groups perform in the manner just described, this example shows that sometimes surprising results can have simple interpretations.4 Occam's Razor suggests that the simple counting mechanism is a strong explanation. This example underscores the critical need for testing models rather than simply exploring data, and the need to design experiments that can discriminate between alternative models.

3This simple heuristic assumes that the decision makers do not eliminate duplicate reasons when they pool the coded reasons. Each group member could have completely separate reasons for the decision, or the reasons could simply be coded #1, #2, #3, and so forth.

4Interestingly, this counting approach could be relevant to other experimental work on individual and group decision making. Stasser (Stasser, 1992; Stasser & Titus, 1985) did extensive research on the pooling of information in group decision making. Recently, an elegant theory has been proposed to account for the observed behavior. A counting model like that described in this chapter provides an alternative, initial model for judging his results.
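The counting mechanism is simple enough to state directly. The sketch below uses hypothetical reason counts (not the numbers from the tables above), chosen so that every member individually prefers A by proportion of positive reasons while pooling the slips and counting favors B; that is, a preference reversal arises with no group interaction assumed.

```python
from collections import Counter

# A minimal sketch of the slips-of-paper heuristic with hypothetical counts.
reasons = {member: {"A": (3, 1), "B": (7, 3)} for member in (1, 2, 3)}

# Individual rule: prefer the alternative with the higher proportion of
# positive reasons. Every member prefers A (.75 vs. .70).
for alts in reasons.values():
    probs = {alt: pos / (pos + neg) for alt, (pos, neg) in alts.items()}
    assert max(probs, key=probs.get) == "A"

# Group rule: pool all the slips and count net positive reasons.
net = Counter()
for alts in reasons.values():
    for alt, (pos, neg) in alts.items():
        net[alt] += pos - neg

print(net.most_common())   # [('B', 12), ('A', 6)]: the pooled count favors B
```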


It is worth remarking that simple counting models that accurately predict group behavior from individual behavior have a long history in individual and group problem solving research (cf. Lorge & Solomon, 1955, 1962; Restle, 1962; Restle & Davis, 1962). These models are able to predict group problem solving and learning behavior quite accurately from the performances of individual members of the group. Many have complained that such models are simplistic and fail to display the richness of group interaction. However, the point is that such models can do very well without assuming any special interaction between group members. Thus, the challenge to researchers is to find alternative models and theories that predict as well and provide for meaningful interaction.

In chapter 4, Sawyer explores the scaling of alternative value functions in multiattribute models. He explores monotone, but not linear (affine), transformations. He modeled behavior using the Lens Model with extra terms for linear, quadratic, and cubic effects, although the analysis cannot reveal the precise form of the functions. One difficulty with the Lens Model is that it is descriptive, and, although it may give an elegant account of outcomes, it tells us little about process. An alternative approach, which Sawyer mentions, is to apply Anderson's functional measurement approach to the data (Anderson, 1979, 1986). Although this could permit the scales to be identified (up to a transformation), it still tells us little about process. Over recent decades, it has been repeatedly demonstrated that linear (or other algebraic) models are superior to individual judgment, even though the models may tell us little about the behavior of individual decision makers (e.g., Dawes, Faust, & Meehl, 1989; Hoffman, 1960), although some have called for a rapprochement of competing views of the judgment process (Kleinmuntz, 1990). One approach is to combine the analysis of value functions with process tracing models. Although such eclectic approaches have been frequently advocated (e.g., Einhorn, Kleinmuntz, & Kleinmuntz, 1979), implementation has been surprisingly rare.

One criterion seldom considered in modeling individual judges is the acceptability of the model to the decision maker. Linear models have been shown to be superior (in an actuarial or statistical sense) to individual judgments. Such analysis suggests the use of such models in place of the actual judge. However, almost everyone would agree that the model is still only an approximation of the actual judgment process. But practically no study has asked decision makers about the acceptability of the model as a substitute for their own judgments. Sawyer and Castellan (in prep.), using the land use paradigm described earlier but with individual subjects only, asked subjects to make a number of land use judgments. Several days later, subjects returned to the task but were not asked to make numerical evaluations again. Rather, they were presented with a set of three ratings for each scenario, and asked to choose the one number from each triple that best reflected their
own evaluation. The triples were simply presented as three scale values representing different possible evaluations, and their source was not identified. One element of each triple was the subject's own earlier judgment, another was the judgment predicted by a linear model of the decision maker, and the third was a value chosen to control for spacing and magnitude ordering of the triple. The basic outcome of relevance here is that subjects were essentially indifferent between their original evaluations and the model's predictions. But both were significantly preferred to the control values.5 The indifference may have important meaning. One reason advanced for using linear models in place of actual judgments is that they remove the random error that is part of each judgment. Presumably, on repeated testing, actual judgments should vary. The model predictions should be within some suitable or acceptable interval of internal error. The decision makers' indifference between their original predictions and the model predictions suggests that the predicted values are within that interval.

5In another condition of the same experiment, subjects produced separate importance weightings for the various attributes in the land use scenarios, as well as judgments about each scenario. When subjects were tested several days later with outcome triples consisting of model predictions, predictions based on importance weights, and control values, the model predictions were preferred three to two over importance weight predictions, and three to one over control predictions.

Edgell's chapter on processing configural information (chapter 3) uses a task that is quite different from that of the other authors in this section. The research and the analysis are directed to testing a specific model. The analysis pits one model against another. This is a strong approach, because we make progress by pitting one model or theory against another. In pursuing the analysis and the subsequent interpretation, we find that the model does not deal with the salience of the alternatives considered in the decision task. In his task, a subject must combine information about stimuli, which vary in size, color, and shape. (Here color sometimes refers to shading in the stimuli.) The model makes predictions in terms of the validity of the cue dimensions and the patterns formed from them. There seem to be differences in performance that depend on which cue dimension has the higher validity. However, the model deals with cue validity but not the intrinsic salience of a cue dimension. An important next step is to include salience factors in the model. However, this might not be so straightforward. Salience can easily account for differences in performance on single dimension tasks, but it appears that salience interacts with the validity of the configural information as well. A salience factor should help to explain the results when the components of the stimuli are decomposed before presentation.

The effects of salience or meaningfulness raise strong challenges for Edgell's model. The salience effects reported by Edgell are not surprising because it is well documented in other studies of multiple-cue probability learning
(e.g., Muchinsky & Dudycha, 1975; Sawyer, 1991; Sniezek, 1986).6 Meaningfulness and salience are factors in several current theories and models, often descriptive in nature, of decision making under uncertainty, which offer productive avenues to follow. Examples include Image Theory (Mitchell & Beach, 1990) and the work on strategies discussed in chapter 2 by Payne, Bettman, and Johnson.

Edgell argues that whether dimensional or configural information is used (weighted) more in the judgment task is meaningless because of context effects. I would argue that it is not meaningless, but that we need a method for transforming physical dimensions into underlying psychological or sensory dimensions. That may be the key to understanding the real effect of configurality on judgment. And that is what Sawyer has begun to do for us with his scaling of value functions.

The underlying theme in these remarks is the need for models. In all cases, models are the key to explanation. Models are almost always wrong, but they help us eliminate alternative explanations and can lead us to new insights. These papers have given us important clues about information processing and decision making. We now need unified theories and models that explain as well as predict behavior. Moreover, good theories and models should enable us to begin to generalize meaningfully our experimental results. The papers in this section give us some of the basic ingredients of such theories.

6Strictly speaking, one should make a distinction between salience and meaningfulness, because one can easily conceive of cues that are salient but not meaningful and cues that are meaningful but not salient. The critical issue for the discussion here is that a dimension has characteristics other than its statistical properties, which renders it more distinctive than other dimensions.

ACKNOWLEDGMENTS

I would like to thank Stephen Edgell, Scott Tindale, John Sawyer, and Bernhard Flury for their comments on an earlier version of this chapter.

REFERENCES

Anderson, N. H. (1979). Algebraic rules in psychological measurement. American Scientist, 67, 555-563.
Anderson, N. H. (1986). A cognitive theory of judgment and decision. In B. Brehmer, H. Jungermann, P. Lourens, & G. Sevón (Eds.), New directions in research on decision making (pp. 63-108). Amsterdam: North-Holland.
Carroll, J. S. (1980). Analyzing decision behavior: The magician's audience. In T. S. Wallsten (Ed.), Cognitive processes in choice and decision behavior (pp. 69-76). Hillsdale, NJ: Lawrence Erlbaum Associates.


Castellan, N. J., Jr., & Sawyer, T. A. (1990). Multiattribute decision models: Task order and group effects. In G. von Furstenberg (Ed.), Acting under uncertainty: Multi-disciplinary conceptions (pp. 353-372). Boston: Kluwer.
Dawes, R. M. (1975). The mind, the model, and the task. In F. Restle, R. M. Shiffrin, N. J. Castellan, Jr., H. R. Lindman, & D. B. Pisoni (Eds.), Cognitive theory (Vol. 1, pp. 119-129). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.
Einhorn, H. J., Kleinmuntz, D. N., & Kleinmuntz, B. (1979). Linear regression and process-tracing models of judgment. Psychological Review, 86, 465-485.
Hastie, R., & Park, B. (1986). The relationship between memory and judgment depends on whether the judgment task is memory-based or on-line. Psychological Review, 93, 258-268.
Hoffman, P. J. (1960). The paramorphic representation of clinical judgment. Psychological Bulletin, 57, 116-131.
Kleinmuntz, B. (1990). Why we still use our heads instead of formulas: Toward an integrative approach. Psychological Bulletin, 107, 296-310.
Lorge, I., & Solomon, H. (1955). Two models of group behavior in the solution of Eureka-type problems. Psychometrika, 20, 139-148.
Lorge, I., & Solomon, H. (1962). Group and individual behavior in free-recall verbal learning. In J. Criswell, H. Solomon, & P. Suppes (Eds.), Mathematical methods in small group processes (pp. 221-231). Stanford, CA: Stanford University Press.
Mitchell, T. R., & Beach, L. R. (1990). "... Do I love thee? Let me count ..." Toward an understanding of intuitive and automatic decision making. Organizational Behavior and Human Decision Processes, 47, 1-20.
Moore, D. S., & McCabe, G. P. (1989). Introduction to the practice of statistics. San Francisco: W. H. Freeman.
Muchinsky, P. M., & Dudycha, A. L. (1975). Human inference behavior in abstract and meaningful environments. Organizational Behavior and Human Performance, 13, 377-391.
Restle, F. (1962). Speed and accuracy of cognitive achievement in small groups. In J. Criswell, H. Solomon, & P. Suppes (Eds.), Mathematical methods in small group processes (pp. 250-262). Stanford, CA: Stanford University Press.
Restle, F., & Davis, J. H. (1962). Success and speed of problem solving by individuals and groups. Psychological Review, 69, 520-536.
Sawyer, J. E. (1991). Hypothesis sampling, construction, or adjustment: How are inferences about nonlinear monotonic contingencies developed? Organizational Behavior and Human Decision Processes, 49, 124-150.
Sawyer, T. A., & Castellan, N. J., Jr. (in prep.). Preferences among predictions and the correlation between predicted and observed judgments.
Sniezek, J. A. (1986). The role of variable labels in cue probability learning tasks. Organizational Behavior and Human Decision Processes, 38, 141-161.
Sniezek, J. A., & Henry, R. A. (1990). Revision, weighting, and commitment in consensus group judgment. Organizational Behavior and Human Decision Processes, 45, 66-84.
Stasser, G. (1992). Information salience and the discovery of hidden profiles by decision-making groups: A "thought experiment." Organizational Behavior and Human Decision Processes, 52, 156-181.
Stasser, G., & Titus, W. (1985). Pooling of unshared information in group decision making: Biased information sampling during discussion. Journal of Personality and Social Psychology, 48, 1467-1478.

PART III

JURY DECISION MAKING

CHAPTER 8

THE NORMATIVE STATUS OF BASE RATES AT TRIAL

Jonathan J. Koehler
University of Texas

Quantitative probability ... is not proof, nor even probative evidence of the proposition to be proved. That in one throw of the dice there is a quantitative probability, or greater chance, that a less number of spots than sixes will fall uppermost is no evidence whatever that in a given throw such was the actual result. Without something more, the actual result of the throw would still be utterly unknown. The slightest real evidence that sixes did in fact fall uppermost would outweigh all the probability otherwise. -Day v. Boston and Maine R.R., 1902, p. 774

Although poorly reasoned and written long before the recent surge in the use of quantitative evidence at trial (see Fienberg, 1989), the conclusions of the Day court reflect a judicial wariness of probability evidence that persists today. Some courts and legal scholars believe that the use of probabilistic evidence and methods at trial is inconsistent with certain goals and values of the judicial process (Cohen, 1977; for an early judicial opinion along these lines, see Virginia v. Hawk, 1908). It has been argued, for example, that the notion of individualized justice is compromised when probabilistic arguments-arguments that obtain their force from factors and situations other than those immediately at issue in the present case-are used to help convict a defendant (Nesson, 1985; Tribe, 1971).

A second set of arguments regards the introduction of probability evidence to be at once confusing and seductive. Accordingly, jurors will be overly impressed with probabilities and statistics and underweight other probative (i.e.,
diagnostic) evidence that does not readily lend itself to quantification (Tribe, 1971). The Minnesota Supreme Court has repeatedly rejected probabilistic evidence on these grounds in cases involving blood (State v. Boyd, 1983), hair (State v. Carlson, 1978), semen (State v. Kim, 1987), and DNA (State v. Schwartz, 1989; State v. Nielsen, 1991) evidence.

Finally, it has been argued that even if the seduction of the quantitative is resisted, the use of probabilistic evidence and methods will lead to less accurate or even absurd verdicts (Brilmayer & Kornhauser, 1978; Jonakait, 1983). Some say that even a skilled application of the rules of probability cannot promote a search for truth in the courtroom.

The first argument-that probability evidence conflicts with certain of our judicial values-is essentially a policy argument. The procedural and moral concerns it reflects indicate that the discovery of truth is not the only function of a trial. There is also great sensitivity to issues related to the process by which truth is discovered. For example, the courts will often disallow the introduction of even highly probative evidence when it has been obtained in ways that seem to violate the privilege against self-incrimination (Miranda v. Arizona, 1966) or the right to confidential attorney-client and doctor-patient communications (although see U.S. v. Hodge & Zweig, 1977, and Tarasoff v. Regents of University of California, 1976, for exceptions).

These policy concerns are important. Many of them cut straight to the heart of the nature of justice. But the argument that would reject probabilistic evidence and methods on grounds that they contradict one or more cherished policies does not withstand careful scrutiny. All evidence is probabilistic, in the sense that there is a risk of error associated with each fragment of courtroom testimony. Evidence that is referred to here and elsewhere as probabilistic differs from other types of evidence only in the overtness of this risk of error. Elsewhere, Daniel Shaviro and I have argued that evidence that is only implicitly probabilistic is no less immoral, unfair, humiliating, or dehumanizing than much of the overtly probabilistic evidence that has been criticized (Koehler, 1992a; Koehler & Shaviro, 1990; Shaviro, 1989).

The second set of arguments-that jurors will overweight probability evidence-is an intuitively appealing empirical claim; that is, according to Tribe (1971), "Readily quantifiable factors are easier to process-and hence more likely to be recognized and then reflected in the outcome-than are factors that resist ready quantification" (p. 1,362). However, empirical research on this issue lends no support to Tribe's assertion. Quite the contrary, it appears that people may not attach as much weight to quantitative evidence as they perhaps should. Recent studies on mock jurors' use of probability evidence almost uniformly show that subjects attach relatively little weight to statistical evidence when other sources of evidence are available (for
reviews, see Kaye & Koehler, 1991; Thompson, 1989).1 These data are consistent with a larger body of literature on statistical reasoning as well (see Nisbett & Ross, 1980; Saks & Kidd, 1980-1981).

The remainder of this chapter focuses primarily on the third set of arguments against the use of probabilistic evidence and methods at trial. For the most part, these arguments are concerned with (a) the diagnostic value of probability evidence in general, and (b) the value of probabilistic techniques for combining different sources of evidence. Critics claim that naked statistics, such as base rates, are an inferior or worthless form of legal evidence. In addition, some charge that there is little relation between the rules of probability (e.g., Bayes' theorem) and accurate fact finding. Jurors and judges would be better off, they say, relying on more intuitive methods for arriving at final verdicts. These arguments, which are often embedded in discussions of various hypotheticals, are discussed next.

COURTROOM BASE RATES AND THE HYPOTHETICALS

A base rate may be defined as the general frequency with which an event occurs or an attribute is present in some reference population. Base rates are generally expressed as proportions. Thus, the base rate for come-from-behind victories by a local hockey team might be 30%, whereas the base rate for bad tempers among the Amish might be less than 5%.

The discussion of base rates at trial has centered primarily around several hypotheticals first introduced by critics of probabilistic evidence and methods. In one hypothetical, a woman is hit by a bus, and it is known that 80% of the buses in town were operated by the blue bus company. Is the 80% base rate probative with respect to the issue of whether a blue bus hit the woman? If so, should it be admitted as evidence in a civil suit against the blue bus company? And if it is admitted and no other evidence is heard, should the plaintiff prevail?2

1It must be noted, however, that the evidence to date on the impact of probabilistic evidence on jurors' decisions is limited. No studies have examined the impact of extremely small probability values (e.g., 1 in a million, or 1 in a billion) that, although often highly probative, may still be overvalued by jurors. Moreover, most studies have not been conducted in a realistic or even semirealistic manner in which jurors are exposed to probability evidence in the context of trial procedure in which witnesses are cross-examined, counterarguments are offered, and verdicts are rendered following group deliberation.

2This hypothetical, which is due to Tribe (1971), is based on Smith v. Rapid Transit, Inc. (1945). In this case, a female driver was sideswiped late at night on Main Street by a bus that she could identify only as "a big, long wide affair." Nevertheless, the woman sued the Rapid Transit bus company on the grounds that, because Rapid Transit chartered most of the buses on Main Street, a Rapid Transit bus probably caused her accident. The court agreed but held in favor of Rapid Transit: "The most that can be said of the evidence in the instant case is that the mathematical chances somewhat favor the proposition that a bus of the defendant caused the accident. This was not enough" (p. 754).
A second hypothetical involves the sale of 499 tickets to a rodeo event that is attended by 1,000 people (Cohen, 1977). According to the base rate, the probability that a random patron is a gatecrasher (assuming that all those who did not purchase a ticket are gatecrashers) is 50.1%. Should this base rate be admitted in a suit against a randomly selected patron? If other information is made available, such as evidence that this patron did or did not have a ticket stub in his or her possession, is the base rate probative?

In a third hypothetical, 24 of 25 prisoners in a prison yard participate in the killing of a prison guard, while one prisoner does not participate (Nesson, 1979). Because none of the prisoners will talk, the warden selects one of the prisoners to stand trial for the crime and argues that the 96% base rate is strong evidence in favor of the defendant's guilt. Is this base rate probative with respect to the defendant's guilt? If so, does it provide sufficient grounds on which to convict?3

Reactions to these hypotheticals are typically strong. Whereas people disagree about which, if any, of these base rates should be admitted at trial, few believe that any are sufficient to sustain a conviction. Wells (1992) recently showed that mock jurors and experienced trial judges are generally unwilling to find for the plaintiff in cases based on naked statistical evidence alone. Moreover, this reluctance appeared to be unrelated to the ultimate issue probability values that are implicated by naked base rate evidence. Many people believe that it is not fair to convict a defendant in the absence of evidence that relates him or her to the crime in a more direct fashion (Brilmayer & Kornhauser, 1978). Many more people would be willing to convict in, say, the blue bus problem if an 80% reliable eyewitness identified the bus that hit the woman as a blue one.4


3Indeed, according to one study, a majority of judges would consider a 96% probability of guilt proof "beyond a reasonable doubt," the standard of proof that must be satisfied in a criminal suit (McCauliff, 1982). This study of 171 judges showed that, whereas the judges varied widely in the probability values that they associated with certainty beyond a reasonable doubt (50%-100%), the median value was 90%. Similar results were obtained by Simon and Mahan (1971).

4By "fair" I have in mind a more general notion than the distributional fairness hypothesis that Wells (1992, Experiment 3) tested and subsequently ruled out as an explanation for his data. In this experiment, Wells showed that, even when the statistical evidence is made case specific and distributionally fair (in the sense that a defendant who is at fault X% of the time will lose only X% of cases over the long run), fact finders are reluctant to find for the plaintiff. However, fact finders may have fairness concerns other than distributional ones. For example, they may believe that a verdict for the plaintiff must be grounded in at least some direct (i.e., case specific) evidence to be fair.

The blue bus hypothetical and its variants are surely engaging and provocative. They remind us that verdicts are based on considerations other than probabilistic ones. Unfortunately, they do not serve the purpose of educating the legal profession about the evidentiary value of probability evidence. Instead, they invite readers to confuse the probative value of base rates with
their moral sufficiency to sustain a verdict. Too often the conclusion that base rates should not be admitted at trial is justified by analyses of one or more hypotheticals in which it is argued that defendants should not be convicted on base rate evidence alone. In practice, however, one rarely encounters cases in which base rates represent the only available evidence. More often, plaintiffs offer at least some evidence that corroborates a previously admitted base rate. This additional evidence need not be capable of sustaining the plaintiff's contention by itself in order for the plaintiff to prevail.

When the Smith court held that the probabilistic evidence "was not enough" to hold Rapid Transit, Inc. responsible for the plaintiff's injury (see Footnote 2), it did not mean that the probabilistic evidence was irrelevant or lacked probative value. Instead, the court meant that verdicts must be both probable and morally defensible. Even if one believes that probability evidence-such as a base rate-fails this moral defensibility standard, it need not fail the probity standard. Thus, the reluctance of fact finders to convict the defendants in the three preceding hypotheticals has no bearing on the true or perceived probative value of the base rate evidence. It may well be that the presentation of even a small amount of additional nonstatistical evidence against the defendant would convince many judges and jurors to find in favor of the plaintiff in each of these hypotheticals.

Nevertheless, some legal critics question the probative value of base rates. Four popular arguments are considered next. In general, these arguments are unconvincing. However, a less extreme form of the fourth argument is presented and treated as a potentially serious threat to the probative status of base rate evidence in the courtroom.

FOUR SKEPTICAL ARGUMENTS

Argument #1: Base Rates are Irrelevant in Individual Cases Because They Only Inform about Groups, Cases in General, or Cases in the Long Run. This

argument is more likely to be advanced by law students and practicing attorneys who have little or no familiarity with statistics or probability theory than by professional critics of probability evidence.5 Nevertheless, it is included here because it has great intuitive appeal for many who are skeptical of probabilistic reasoning at trial.

5Indeed, some well-known critics, like Lawrence Tribe, expressly reject it.

From the standpoint of accuracy, there is no valid distinction between long-run group data and individual cases. Because groups are composed of individual cases, information that informs long-run accuracy rates, like base rates, must likewise inform accuracy in individual cases. Failure to understand this principle can be costly or even deadly. L. J. Cohen (1981) argued that a hospital
administrator interested in maximizing lives saved should use base rate data, whereas individual patients, presumably interested in maximizing their own chances for survival, should not. But how can it be that decision makers exposed to identical information should make different probability estimates? If the use of base rates is appropriate for the administrator, it must also be appropriate for the patients in the administrator's hospital. As Krantz (1981), Sternberg (1981), and others noted in replies to Cohen, patients who follow Cohen's advice stand a far greater chance of dying than those who do not.

Argument #2: Base Rates are an Inferior Form of Evidence. As just noted, many believe that probabilistic evidence alone is insufficient to sustain a conviction. It is widely felt that it would be unfair to convict a defendant in the absence of at least some individuating evidence to support the charge. The inferiority of probability evidence that this position entails is policy based; that is, the reluctance to ground verdicts entirely on probabilistic evidence may have little to do with the probative value of this evidence. But some critics have merged and confused these policy and probity considerations. In her discussion of the blue bus and gatecrasher hypotheticals, Brilmayer (1986) suggested that base rate evidence cannot tell us that the ABC company "really" was at fault because it operated so many blue buses, or that Sally Smith "really" was a gatecrasher. Instead, Brilmayer said: "[t]here is only a background statistic about the number of buses owned, or the number of tickets sold" (p. 675). Although not explicitly stated, Brilmayer's argument reflects a belief that the probability evidence is less likely to promote accurate verdicts than other forms of evidence. This is not true. As already noted, all evidence is probabilistic, in the sense that it carries with it a risk of error. An 80% base rate carries with it the same 20% risk of error as does an 80% reliable eyewitness. Error does not care about its source. It cares only about probability, whether implicit or explicit. In terms of determining whether Sally Smith "really" was a gatecrasher or not, probability evidence is no less helpful to the decision maker than eyewitness testimony that carries with it the same probability of error.

Argument #3: Base Rates Become Irrelevant when Individuating Information is Made Available. This is essentially the argument used by the Day court

(see opening quotation) when it concluded its dice-throwing example by suggesting that "the slightest real evidence that sixes did in fact fall uppermost would outweigh all the probability otherwise." Although empirical data suggest that people often do attach much greater weight to individuating information than to base rate probabilities (for reviews, see Bar-Hillel, 1983; Koehler, 1992b), the Day court's reasoning is fallacious. Imagine, for example, that an eyewitness gets a brief glimpse of a 100-sided
die and believes that it had 71 spots uppermost. Certainly, this is probative information in the legal sense that it makes the fact more probable than it was prior to the introduction of this evidence (see Federal Rule of Evidence 401). However, the 1% base rate probability for 71 spots is also probative, and it remains probative even after the eyewitness testimony is introduced.6 In this case, the very large prior probability that 71 spots would not fall uppermost (99%) likely would and should outweigh the shaky eyewitness testimony to the contrary.

The extent to which this principle is appreciated by the courts is hard to assess. In Bazemore v. Davis (1978) the District of Columbia Court of Appeals warned against making judgments based on a simple more-likely-than-not standard. In a footnote, the court elaborated with an example in which it is known that 99% of cars are black and a person is asked to guess the color of a particular car. In the absence of additional information, the court writes, we should guess that it is black. "It is equally true, however, [that] if we opened our eyes we could make a more accurate determination" (p. 1,382, Footnote 7). This example can hardly be regarded as a counterexample to the value of probabilistic reasoning or as an argument against the use of a probabilistic standard of guilt. At best, it is a reminder that decision makers should use all available information before making decisions. At worst, it is an invitation to discard probability evidence when evidence of a more individuating kind becomes available. If the car in question soared past an eyewitness's window at dusk, the identification of the car as any color other than black should-from an accuracy standpoint-be regarded with suspicion.7

6The only time a base rate is completely irrelevant in the face of eyewitness testimony is when the eyewitness testimony is infallible.

7Assuming that the prior probability of a black car is .99, an eyewitness who identifies the car as a color other than black would need to be more than 99% reliable under the task conditions in order for the odds that the car is not black to be greater than 50%.

Argument #4: Base Rates are Worthless Because They Rarely, if Ever, are Derived from Appropriate Reference Classes. Base rates were previously defined as the relative frequency with which an event occurs or an attribute is present in some reference population. However, identifying the appropriate reference populations for most real-world probabilistic judgment tasks is itself problematic (Einhorn & Hogarth, 1981). Consider, for example, Tversky and Kahneman's (1980) well-known taxi cab problem.8 We are told that 85% of the cabs in a city are Green and 15% are Blue. We are also told that an 80% reliable witness reports that the color of the cab he observed in a hit-and-run accident at night was Blue. Tversky and Kahneman assume that the base rates provided in this problem for Blue
and Green cabs in the city form the Bayesian prior odds ratio, whereas the witness reliability statistic provides information needed to form the likelihood ratio. According to this reasoning, a Bayesian posterior odds ratio may be computed to identify the relative probabilities that the cab in the accident was Green or Blue.9

Not everyone accepts this solution to the problem. For example, L. J. Cohen (1981) argued that a base rate is informative only when its reference class "share[s] all the relevant characteristics" with the instant case (p. 329). Cohen disputes Tversky and Kahneman's solution to the cab problem on the grounds that the available base rates are derived from a reference class that neglects certain "relevant characteristics" of the instant case. For Cohen, the relevant base rates would be derived from the reference class "cabs in accidents in the city" rather than from the more general class "cabs in the city." The base rates associated with "cabs in accidents in the city" would seem to be more relevant than those associated with "cabs in the city," because they take into account a causal feature of apparently great significance, namely, the propensity to get into accidents. But it might also be argued that base rates derived from the reference class "cabs in accidents at night" would be even more relevant to a prediction of the instant case. And if one believes that geographical location is important, then the base rates associated with "cabs in accidents at night in the vicinity of this accident" would be more relevant still. In principle, then, base rate refinements may be offered until the reference class reduces to a set of one (the instant case alone), or at least until it becomes so small that it does not allow for reliable base rate estimates.

Indeed, if Cohen's relevance requirement is taken too literally, then no background information could ever be considered useful. Imagine, for example, that it is known that people your age with similar medical histories and with three of your symptoms (fatigue, intermittent chest pains, and chronic muscle soreness) have a 90% chance of recovering completely when drug A is taken. But now suppose you develop a fourth symptom-say, shortness of breath-about which there is no explicit base rate data for people who share your other characteristics and symptoms. Assuming that shortness of breath is a "relevant characteristic," must we disregard the preceding 90% base rate statistic because it is derived from an insufficiently refined reference class? In fact, most people would continue to place great stock in the base rate, and they would likely be better off for doing so.10

8Smith v. Rapid Transit (1945) or Tribe's (1971) Blue Bus hypothetical were probably the inspiration for this hypothetical.
9P(B|b)/P(G|b) = [P(B)/P(G)] x [P(b|B)/P(b|G)] = (.15/.85) x (.80/.20) = 12/17,
where B = Cab in accident was Blue, G = Cab in accident was Green, b = Witness reports cab in accident was Blue. According to this solution, the odds that the cab in the accident was Blue given that the witness reports seeing a Blue cab are 12:17 (i.e., 41%).

10Cornfield's Theorem shows that it is often unreasonable to assume that an unexamined factor can explain an observed association; under some conditions, it is impossible (see Gastwirth, 1988, pp. 296-297).
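The arithmetic in Footnote 9 can be checked in a few lines; this sketch merely restates the equation:

```python
# A direct restatement of the Bayesian solution in Footnote 9.
prior_odds = 0.15 / 0.85         # P(Blue) / P(Green): the city's base rates
likelihood_ratio = 0.80 / 0.20   # P(reports Blue | Blue) / P(reports Blue | Green)
posterior_odds = prior_odds * likelihood_ratio     # = 12/17
p_blue = posterior_odds / (1 + posterior_odds)
print(round(posterior_odds, 3), round(p_blue, 2))  # 0.706 0.41
```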


SUBOPTIMAL BASE RATES AND BAYESIAN TECHNIQUES

The conclusion that people will be better off (i.e., make more accurate judgments) using base rates that are derived from incompletely refined reference classes is essentially an empirical claim. Normative models such as Bayes' theorem cannot guarantee the accuracy of their output values when there is some question as to the appropriateness or accuracy of one or more input values. In many base rate tasks, for example, available base rates may not necessarily translate into the prior probabilities that Bayes' theorem requires (this point is elaborated in Koehler, 1992b). Moreover, our confidence that base rates will improve judgmental accuracy should diminish when target cases are not sampled at random from stable and well-defined sample spaces. In such cases (which are the rule rather than the exception), the role base rates play in improving decisional accuracy is an empirical matter. Consider the following example. If a piece of Halloween candy is selected at random from a basket that contains 10 Tootsie Rolls and 90 Sweet Tarts, we know that there is a 10% chance of selecting a Tootsie Roll. But if a candy is selected from the basket in an admittedly nonrandom fashion by a young Halloween trick-or-treater, it is not at all clear how, if at all, the 10% Tootsie Roll base rate should be employed in predicting the selection. Obviously, it is much more important to know the base rate preferences for Tootsie Rolls and Sweet Tarts among the trick-or-treater population. Once this proportion is known (or even estimated), the proportion of Tootsie Rolls in the basket may be no help whatsoever in predicting the selection. 11 In this admittedly self-serving example, the outcome that results from a nonrandom selection process is determined entirely (or nearly entirely) by a feature other than base rates, namely, preference. In many real-world cases, however, base rates are more likely to exert an impact on outcomes, even in the absence of random selection from well-defined sample spaces. Again, identification of the conditions under which base rates are most likely to increase judgmental accuracy in the natural ecology is a topic for empirical research. Fortunately, a few clinical studies have emerged that provide some basis for speculation about the predictive power of suboptimal, real-world base rates. Willis (1984) reanalyzed a series of neuropsychological studies and concluded that, in most cases, diagnostic accuracy could have been improved 11 This conclusion depends on what assumptions are made about the possibility of errors and the opportunity to correct them. If it is assumed that uncorrectable selection errors will be made even by those trick-or-treaters who deliberately attempt to select a particular candy, knowledge of the base rates will increase predictive accuracy. Specifically, as the proportion of the less preferred candy increases, the probability that a failure to select the particular preferred piece of candy will result in the selection of the less preferred type of candy increases.


Fortunately, a few clinical studies have emerged that provide some basis for speculation about the predictive power of suboptimal, real-world base rates. Willis (1984) reanalyzed a series of neuropsychological studies and concluded that, in most cases, diagnostic accuracy could have been improved by equating available base rates with prior probabilities and aggregating them with other data in a Bayesian way. Duthie and Vincent (1986) likewise found that a Bayesian aggregation of base rates and Diagnostic Inventory of Personality and Symptoms (DIPS) percentile scores greatly improved diagnostic hit rates over either indicator used alone. Balla, Iansek, and Elstein (1985) presented 44 experienced physicians with negative CT scan data about an actual lung cancer patient and asked them to estimate the probability that this patient had metastatic deposits in his brain. Textbook base rate data strongly supported the presence of such deposits (99.8%); a Bayesian analysis that combined the CT scan data with this base rate yielded a 98% posterior probability in favor of deposits. However, 29 of the 44 physicians (66%) estimated the probability of deposits to be less than 25%. A postmortem on the patient defied the physicians' intuitive predictions and revealed that he did indeed have brain deposits. Although based on an n of 1, this result slightly increases our confidence in the efficacy of a Bayesian use of real-world base rate data.

These studies do not overwhelmingly demonstrate that predictive accuracy is enhanced when suboptimal, real-world base rates are processed in a Bayesian way. Clearly, more focused research that pits Bayesian methods against other techniques in a variety of contexts, including the courtroom, is needed. But for now there is little support for the strong claim that decision makers should ignore suboptimal base rates.
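The arithmetic behind the Balla, Iansek, and Elstein figures can be reconstructed in odds form. The likelihood ratio of roughly 0.1 for a negative CT scan is inferred here from the reported numbers; it is not a value stated in the study:

    prior odds of deposits      = .998 / .002 = 499:1
    posterior odds of deposits  = 499 × 0.1 ≈ 50:1
    P(deposits | negative scan) ≈ 50 / 51 ≈ .98

Even an informative negative scan cannot overcome a 499:1 prior, which is why the Bayesian posterior remains near certainty while most of the physicians' estimates fell below 25%.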

CONCLUSIONS

Much of the criticism directed at probability evidence at trial arises from policy-based concerns about how justice ought to be pursued. Many of these criticisms have themselves been challenged elsewhere. But even when they have merit, policy concerns are not necessarily decisive. Judges must also consider the probative and potentially prejudicial impact of probability evidence (Federal Rules of Evidence 401 and 403; see Koehler, 1991, for further discussion of the probity-policy distinction). To the extent that judges understand the probative and prejudicial impacts this type of evidence should and will have on jurors, they will be able to make more informed decisions about its admissibility. Some have charged that probability evidence should be excluded at trial because it will "dwarf all efforts to put it into perspective with more impressionistic sorts of evidence" (Tribe, 1971, p. 1360). Twenty years later, no evidence has emerged to support Tribe's contention. If anything, there is good reason to believe that jurors generally underweight probabilistic evidence relative to other, less quantifiable, types of evidence. This chapter was largely concerned with legal criticisms about the probative merit of base rate probability evidence. The traditional criticisms were found to be wrong or unconvincing.


The claim that probabilities are diagnostically relevant to a large series of cases but irrelevant to the individual cases that compose the series is logically indefensible. Similarly, the argument that would treat overtly probabilistic evidence as diagnostically inferior to evidence that is only implicitly probabilistic cannot be sustained; probative value is not negatively related to numerical explicitness. Although probability evidence can and should be defended against these criticisms, special problems associated with base rate evidence were identified. Most worrisome are the difficulties associated with identifying appropriate reference classes and treating target cases as if they were sampled at random from these reference classes. These problems are serious and deserve more attention than they have received. On the other hand, it was argued that these problems do not necessarily justify disregarding base rates in favor of other types of evidence, as some have suggested. Instead, the probative value of base rates that are not obtained under ideal circumstances should be treated as an empirical matter. Hopefully, future studies will investigate the important prescriptive issues related to identifying the conditions under which attentiveness to these suboptimal base rates will and will not improve judgmental accuracy in the courtroom and elsewhere.

ACKNOWLEDGMENTS

Thanks are due to Joseph Gastwirth for his comments on an earlier version of this chapter. Some of the ideas presented here were discussed in a Cornell Law Review (1990) article by the author and Daniel N. Shaviro.

REFERENCES

Balla, J. I., Iansek, R., & Elstein, A. (1985). Bayesian diagnosis in presence of preexisting disease. The Lancet, 1, 326-329.
Bar-Hillel, M. (1983). The base rate fallacy controversy. In R. W. Scholz (Ed.), Decision making under uncertainty (pp. 39-61). North-Holland: Elsevier.
Brilmayer, L. (1986). Second-order evidence and Bayesian logic. Boston University Law Review, 66, 673-691.
Brilmayer, L., & Kornhauser, L. (1978). Review: Quantitative methods and legal decisions. University of Chicago Law Review, 46, 116-153.
Cohen, L. J. (1977). The probable and the provable. Oxford: Clarendon Press.
Cohen, L. J. (1981). Can human irrationality be experimentally demonstrated? The Behavioral and Brain Sciences, 4, 317-331.
Duthie, B., & Vincent, K. R. (1986). Diagnostic hit rates of high point codes for the Diagnostic Inventory of Personality and Symptoms using random assignment, base rates, and probability scales. Journal of Clinical Psychology, 42, 612-614.
Einhorn, H. J., & Hogarth, R. M. (1981). Behavioral decision theory: Processes of judgment and choice. Annual Review of Psychology, 32, 53-88.


Fienberg, S. E. (Ed.). (1989). The evolving role of statistical assessments as evidence in the courts. New York: Springer-Verlag.
Gastwirth, J. L. (1988). Statistical reasoning in law and public policy (Vol. 1). New York: Academic Press.
Jonakait, R. N. (1983). When blood is their argument: Probabilities in criminal cases, genetic markers, and, once again, Bayes' Theorem. University of Illinois Law Review, 1983, 369-421.
Kaye, D. H., & Koehler, J. J. (1991). Can jurors understand probabilistic evidence? Journal of the Royal Statistical Society A, 154, 75-81.
Koehler, J. J. (1991). The probity-policy distinction in the statistical evidence debate. Tulane Law Review, 66, 141-150.
Koehler, J. J. (1992a). Probabilities in the courtroom: An evaluation of the objections and policies. In D. K. Kagehiro & W. S. Laufer (Eds.), Handbook of psychology and law (pp. 167-184). New York: Springer-Verlag.
Koehler, J. J. (1992b). On the use and appropriateness of base rates in probabilistic judgment. Unpublished manuscript.
Koehler, J. J., & Shaviro, D. (1990). Veridical verdicts: Increasing verdict accuracy through the use of overtly probabilistic evidence and methods. Cornell Law Review, 75, 247-279.
Krantz, D. H. (1981). Improvements in human reasoning and an error in L. J. Cohen's. Behavioral and Brain Sciences, 4, 340-341.
McCauliff, C. M. A. (1982). Burdens of proof: Degrees of belief, quanta of evidence, or constitutional guarantees? Vanderbilt Law Review, 35, 1293-1335.
Nesson, C. (1979). Reasonable doubt and permissive inferences: The value of complexity. Harvard Law Review, 92, 1187-1225.
Nesson, C. (1985). The evidence or the event? On judicial proof and the acceptability of verdicts. Harvard Law Review, 98, 1357-1392.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall.
Saks, M. J., & Kidd, R. F. (1980-1981). Human information processing and adjudication: Trial by heuristics. Law and Society Review, 15, 123-160.
Shaviro, D. (1989). Statistical-probability evidence and the appearance of justice. Harvard Law Review, 103, 530-554.
Simon, R. J., & Mahan, L. (1971). Quantifying burdens of proof. Law and Society Review, 5, 319-330.
Sternberg, R. J. (1981). Some questions regarding the rationality of a demonstration of human rationality. Behavioral and Brain Sciences, 4, 352-353.
Thompson, W. C. (1989). Are juries competent to evaluate statistical evidence? Law and Contemporary Problems, 52, 9-41.
Tribe, L. H. (1971). Trial by mathematics: Precision and ritual in the legal process. Harvard Law Review, 84, 1329-1393.
Tversky, A., & Kahneman, D. (1980). Causal schemas in judgments under uncertainty. In M. Fishbein (Ed.), Progress in social psychology (pp. 49-72). Hillsdale, NJ: Lawrence Erlbaum Associates.
Wells, G. L. (1992). Naked statistical evidence of liability: Is subjective probability enough? Journal of Personality and Social Psychology, 62, 739-752.
Willis, W. G. (1984). Reanalysis of an actuarial approach to neuropsychological diagnosis in consideration of base rates. Journal of Consulting and Clinical Psychology, 52, 567-569.

LEGAL CASES

Bazemore v. Davis, 394 A.2d 1377 (D.C. 1978).
Day v. Boston and Maine R.R., 96 Me. 207, 52 A. 771 (1902).


Miranda v. Arizona, 384 U.S. 436, 86 S. Ct. 1602, 16 L.Ed.2d 694 (1966).
Smith v. Rapid Transit, Inc., 317 Mass. 469, 58 N.E.2d 754 (1945).
State v. Boyd, 331 N.W.2d 480 (Minn. 1983).
State v. Carlson, 267 N.W.2d 170 (Minn. 1978).
State v. Kim, 398 N.W.2d 544 (Minn. 1987).
State v. Nielsen, 467 N.W.2d 615 (Minn. 1991).
State v. Schwartz, 447 N.W.2d 422 (Minn. 1989).
Tarasoff v. Regents of University of California, 17 Cal. 3d 425 (1976).
U.S. v. Hodge and Zweig, 548 F.2d 1347 (9th Cir. 1977).
Virginia v. Hawk, 160 F. 348 (1908).


CHAPTER 9

THE EVALUATION OF HEARSAY EVIDENCE: A SOCIAL PSYCHOLOGICAL APPROACH

Peter Miene, Eugene Borgida, and Roger Park
University of Minnesota

Suppose that your favorite bank teller is present during a bank robbery, witnesses the robber, and a few hours later tells a police investigator everything that happened. At trial, in the absence of exceptional circumstances, the police investigator may not be allowed to testify about the events described by the teller. The evidence would normally be excluded even if the teller were unavailable at the time of trial. The introduction of the evidence would be barred by the hearsay rule (Park, 1987).

In the law of evidence, hearsay is evidence that is introduced into court by one person (the witness) based on what another person (the declarant) has said outside of court, although a simple repetition of a statement is not necessarily hearsay. According to Lilly (1978), in order for testimony to be considered hearsay, "the repeated statement must be offered for the purpose of proving that what the declarant said is true-just as if the declarant were on the witness stand, giving testimony which the proponent wants the trier to believe" (p. 157). Thus, some out-of-court statements are not hearsay because they are not offered to prove the truth of the matter asserted. For example, a declarant's statement "If you don't help me I'll kill you," offered by the proponent to show that the hearer was under duress, would not be hearsay. It does not matter whether the declarant was telling the truth or not; the hearer might still fear death. Most of the reasons provided by the Federal Rules of Evidence for excluding hearsay do not apply because there is no need to cross-examine the declarant under oath.

The fact that an out-of-court statement is hearsay does not necessarily mean that it will be excluded from evidence.


There are dozens of exceptions to the hearsay rule. For example, hospital records are routinely admitted for the truth of what they assert under the business records exception to the hearsay rule. The hearsay exceptions themselves have exceptions and qualifications, and the law of hearsay is a complicated web of doctrine (Park, 1987). Despite the many exceptions to the hearsay rule, some evidence is clearly inadmissible hearsay. Suppose, for example, that in a criminal case the prosecution offered the written report of a police officer who observed a crime, in lieu of the officer's courtroom testimony. Or suppose that, in a civil suit to recover damages for personal injuries caused by an accident, one of the parties offered an investigator's testimony that a day after the accident she interviewed a bystander who saw the accident, and that the bystander said that one of the cars crossed the centerline. When offered as evidence to prove the truth of their assertions, the police report and the bystander's statement would almost surely be excluded on hearsay grounds.

The difference between hearsay testimony and eyewitness testimony is significant, and particularly so for the research presented in this chapter. In eyewitness testimony, the witness testifies to information with which he or she has had direct experience, and this testimony is subject to direct and cross-examination. Under cross-examination, weaknesses and contradictions in the testimony are revealed, and it is assumed that the jury responds to this information accordingly. The power of cross-examination to reveal inaccuracies or inconsistencies in testimony is central to our adversarial system of justice (Monahan & Walker, 1990). With hearsay testimony, however, the witness, and especially the out-of-court declarant, are largely immune from the powers of cross-examination. The opposing attorney can do little but question the witness's ability to accurately recall what the declarant said and highlight the fact that this testimony is not based on direct experience.

In general, there are two competing views regarding the proper treatment of hearsay admissibility (Miene, Park, & Borgida, 1992). One position argues that cross-examination of a hearsay witness cannot reveal anything about the credibility of the out-of-court declarant, and the testimony may therefore be unduly prejudicial. In addition, some legal scholars adopting this position cite the possibility that errors can easily occur when one person recalls information said by another (e.g., the hearsay witness did not correctly hear what the declarant said). For these reasons, legal scholars adopting this position do not favor the admissibility of hearsay evidence. Park (1987) stated that the opposing view takes the position that "hearsay can be convincing evidence, and it is the sort of evidence on which we routinely rely in the most important affairs of home, state, and business" (p. 54). This position argues, then, that people are aware of potential problems with hearsay from their everyday experiences and are able to process this type of information in a relatively unbiased manner. Thus, it is argued, hearsay evidence should not be withheld from the jury because jurors will be able to give this testimony an appropriate evaluation in reaching their verdict decisions.


The question of hearsay admissibility revolves around the issue of whether it is just to withhold from the jury information that is perhaps unreliable and difficult to assess in verdict decisions, or to provide the jury with all available information, trusting that they will be able to use the information appropriately. Research on the processes of human inference suggests that people are not always sensitive to factors that may underlie the reliability of evidence used in everyday life (Fiske & Taylor, 1991; Nisbett & Ross, 1980). Hearsay is a type of evidence that social perceivers, often prone to overweighing anecdotal and emotionally compelling evidence based on small samples, might be rather inclined to overvalue in their decision making. Hearsay evidence, if indeed more accessible in memory, may be better recalled while verdict decisions are being made, and this process could result in hearsay being overvalued (see Imwinkelried, 1989; Stewart, 1970). Cautionary instructions from a judge, however, coupled with cross-examination pointing out the potential flaws of hearsay evidence, may prompt jurors to be cautious in their interpretation and evaluation of hearsay. Such testimony would then have a negligible impact on verdict decisions. Only a few empirical studies, however, have directly examined these issues, and they are discussed in the next section.

EMPIRICAL STUDIES OF HEARSAY

A study reported by Landsman and Rakos (1990, 1991) is one of the first attempts to assess the impact of hearsay evidence on decision making. Landsman and Rakos designed their study to gauge the conditions under which hearsay would be relied upon by mock jurors by experimentally manipulating the strength of the hearsay testimony and the overall strength of the case against the defendant in a factorial research design. Their goal was to examine the overall impact of hearsay on decision making by creating some situations in which the hearsay was stronger than the other evidence and other situations in which the hearsay was weaker than the other evidence. Because their interest centered on the impact of the hearsay evidence, the authors chose levels of hearsay evidence that varied both in the content of the testimony and in the credibility of the hearsay witness. The methodological confounding of these variables was deemed necessary to create levels of hearsay that varied widely along a dimension of evidentiary strength. To examine how mock jurors utilize hearsay evidence in decision making, Landsman and Rakos randomly assigned 147 participants to 1 of the 12 conditions created by the 4 (levels of hearsay) × 3 (strength of other evidence) factorial design.


Participants read a 12-page transcript of a trial in which the defendant was charged with stealing money from a coat in a restaurant. The transcripts contained opening and closing statements by the attorneys, opening remarks and final instructions from the judge, and a large number of evidentiary statements made by several witnesses. The study participants read the trial transcript and then provided a verdict and other evaluations. The results indicated that, despite the fact that the two strongest levels of hearsay were rated as more important in the verdict decision than the two weakest levels of hearsay, the strength of the hearsay evidence had no impact on the verdict measure. The hearsay evidence, which was not labeled as hearsay for the mock jurors, was designed to incriminate the defendant; that is, strong hearsay was expected to produce more guilty verdicts than weak hearsay. However, 67% of the mock jurors receiving the "weak" hearsay found the defendant guilty, whereas only 58% of the jurors receiving the "strong" hearsay voted for conviction. These conviction rates compare to the 51% of the mock jurors who received no hearsay evidence and who believed the defendant was guilty. Thus, the strength of the hearsay evidence had no systematic effect on mock jurors' verdict decisions (no significant differences between conditions), nor did the strength of the hearsay interact in any way with the strength of the other evidence.

An experimental study by Kovera, Penrod, and Park (1992) examined the witnessing conditions of eyewitnesses and hearsay witnesses. Eyewitnesses, and in this case declarants, provided accounts that were classified as good, moderate, or poor in terms of accuracy. Each hearsay witness watched a videotape of one of these accounts and then described the information after an interval of either 1 day or 1 week. Mock jurors were then exposed to a variety of eyewitness and hearsay accounts. Kovera, Penrod, and Park found that hearsay witnesses testifying after the short delay were far more accurate than those witnesses testifying after the long delay. The mock jurors rated the quality and the usefulness of the hearsay testimony after the short delay significantly higher than the long-delay testimony. However, mock jurors rated the eyewitness testimony as more useful and of higher quality than the hearsay testimony.

Do Jurors Use Hearsay Evidence?

In our research on hearsay evidence, we hypothesized that mock jurors would differentiate between evidence entered by an eyewitness as opposed to a hearsay witness (Miene, Park, Borgida, & Anderson, 1990). More specifically, it was expected that hearsay testimony would be discounted or have less impact on the verdict decision compared to eyewitness testimony conveying the same information. Previous research on the effects of eyewitness testimony suggests that such testimony is influential in juror decision making (Kassin, Ellsworth, & Smith, 1989; Wells & Loftus, 1984).


Cutler, Penrod, and Stuve (1988) manipulated 10 witness and identification factors that had previously been shown to affect eyewitness memory (e.g., the use of disguise, weapon visibility, violence, lineup size, and witness confidence) and created 64 trial stimulus videotapes in a fractional factorial design. They found that the mock juror subjects recalled the testimony relevant to the manipulated factors, but eight of these factors had only trivial effects on the subjects' inferences regarding the defendant's culpability and the likelihood that the identification was correct. Cutler, Penrod, and Stuve argued that, in the absence of expert testimony, lay people are not sensitive to the factors that influence eyewitness memory and identification. If jurors are not sensitive to those factors associated with the unreliability of eyewitness testimony, then what are the effects of expert testimony? Cutler, Penrod, and Dexter (1989) hypothesized that expert testimony could have three types of effects. One possibility, based on a host of empirical studies demonstrating that jurors have difficulty integrating both legal and scientific concepts in their decision making, is that such testimony may confuse or mislead members of the jury. A second, more desirable effect is that expert testimony would increase juror sensitivity to eyewitness accuracy concerns. However, a third possibility is that expert testimony may be given too much weight; that is, jurors may be so impressed by the expert testimony that they undervalue or disregard the eyewitness testimony, a so-called juror skepticism effect. Cutler, Penrod, and Dexter (1989) created videotaped trials that manipulated, among other variables, the witnessing and identification conditions, the confidence the witness expressed in her identification, and the presence or absence of expert testimony, and they had 538 mock juror subjects view one of the videotapes. The results indicated that expert testimony increased juror sensitivity: Jurors gave more importance to the witnessing and identification factors and less weight to the eyewitness's confidence in her identification in conditions where they received expert testimony. The expert testimony did not produce a skepticism effect, nor was there any evidence indicating that the jurors became confused by the expert testimony. Thus, studies on the influence of expert testimony in cases involving eyewitness testimony suggest that testimony from an expert sensitizes jurors to the fallibility of eyewitness testimony and is effective at weakening the influence of eyewitness testimony. With these studies in mind, we expected that in the absence of expert testimony an eyewitness would be more influential in the context of a juror decision-making task than a hearsay witness. In order to test these hypotheses about hearsay evaluation, we first created a trial stimulus tape based on an apparently real theft.¹

¹The authors are grateful to Martin J. Costello, Dean Steven H. Goldberg of the Pace University School of Law, and the Hon. John S. Connolly of Ramsey County (MN) District Court for their invaluable advice on and participation in this trial simulation.


A situation was created so that a few participants, recruited ostensibly to evaluate law students in a mock trial, witnessed an experimental confederate enter the University of Minnesota Law School and leave a short time later carrying a computer. A law professor then led these participants to believe that his computer had just been stolen and that they were in fact eyewitnesses to the theft. Another group of participants, designed to be the hearsay witnesses, were prevented from witnessing the theft. They became hearsay witnesses through a procedure in which they were individually paired with one of the eyewitnesses during a mock police questioning regarding the theft. Thus, the staged theft created actual eyewitnesses and hearsay witnesses to a seemingly real event, and a mock trial involving these witnesses, the confederate defendant, two practicing attorneys, and a district court judge was conducted and videotaped. From this videotape, four different experimental conditions were created: circumstantial evidence only (circumstantial condition); circumstantial evidence plus hearsay testimony (hearsay condition); circumstantial evidence plus eyewitness testimony (eyewitness condition); and circumstantial evidence plus eyewitness and hearsay testimony (all-evidence condition). All conditions included standard Minnesota judicial instructions and opening and closing statements by the two attorneys (i.e., discussing the testimony presented in the particular condition). The two conditions involving hearsay testimony also included the standard Minnesota hearsay caution as part of the judge's instructions to the jury, but the hearsay testimony was not labeled as such, nor was it objected to, at the time of presentation.² Despite 30 years of social science research demonstrating the ineffectiveness of such curative instructions, there is little evidence that the courts are changing these instructions in light of the empirical evidence (J. A. Tanford, 1989, 1990, 1991); hence, our inclusion of a cautionary instruction in the present study.

The circumstantial evidence consisted of two witnesses played by actors. One actor played the role of the law professor reporting the stolen computer, and the second actor played the role of the defendant's landlord, who testified that he found the computer in the defendant's apartment. The circumstantial evidence was created by the attorneys working on the project with us, and it was designed to provide context for the eyewitness or hearsay witness testimony. The circumstantial evidence alone was not believed to be strong enough to produce a verdict of guilty. For the eyewitness evidence, we selected the witness (from the several available) who provided the most accurate and complete testimony when questioned by the attorneys. We selected the hearsay witness who had been paired with this eyewitness during the mock police questioning because she was the best hearsay witness and because we wanted the eyewitness and hearsay evidence to be as similar as possible.

²Note that the hearsay evidence used in this study is not covered by an exception to the hearsay rule and would therefore not be admissible in an actual trial.


The videotaping took place 1 week after the staged theft; the eyewitness testified based on her memory of the theft, and the hearsay witness testified based on her memory of the eyewitness's account provided during the mock police questioning. The all-evidence condition presented both of these accounts. The four videotapes were then shown to 186 undergraduate subjects (111 women, 75 men), run in groups ranging in size from 3 to 10. The subjects were instructed to watch the trial as if they were jurors in the case and were told they would be asked for their verdict decision and other judgments after the trial. More specifically, the first dependent measure asked participants to decide on their verdict on the charge of theft as outlined by the judge and to indicate their confidence in that verdict. Participants next evaluated each witness on several dimensions (ability, influence, importance, and reliability). In addition, participants rated the effectiveness of the two attorneys, the strength of their respective cases, the influence of the judicial instructions, and the influence of the defendant's lack of testimony in his own defense. Participants completed a free-response sheet that asked them to describe and rank order the three "most important pieces of evidence that you personally used in arriving at your verdict decision." Participants then described and ranked the evidence for the verdict decision other than the one they chose. For example, if a participant found the defendant guilty, he or she was first asked to list the three most important pieces of evidence leading to a guilty verdict and to then list the three most important pieces of evidence that would support a verdict of not guilty. The participants then completed a multiple-choice quiz covering the evidence presented as well as information contained in the judge's instructions. This measure was included as a check on the participants' attention to the trial tape. Finally, participants rated their satisfaction with the videotaped trial and provided their personal opinions regarding hearsay admissibility. A final, open-ended measure asked whether they believed jurors serving in actual trials could properly evaluate hearsay evidence.

The eyewitness selected for the videotaped trial provided an accurate account of the events of the theft, but her description of the thief was not especially good. The hearsay witness reproduced the eyewitness's account of the theft and her description of the thief very accurately; the actual evidence provided by these two witnesses was therefore the same. To maximize experimental control, one would have the same person play the role of the eyewitness and the hearsay witness. However, we desired a more naturalistic design that would provide us with actual testimony based on a real, albeit staged, event. This meant we had to use different people in the roles of eyewitness and hearsay witness. To ensure that any obtained differences between the eyewitness and the hearsay witness were due to the testimony, and not due to factors associated with these witnesses, we had an independent sample of undergraduate subjects watch the videotapes and rate the witnesses on 15 dimensions related to credibility and persuasiveness.


An overall MANOVA on these 15 dimensions indicated there was no significant difference in the subjects' perceptions of the two witnesses. Thus, the two witnesses provided the same evidence, and they were seen as equally convincing, trustworthy, confident, and effective by an independent sample of raters.

The results from the Miene et al. study indicate that mock juror subjects clearly distinguished between the testimony provided by either an eyewitness or a hearsay witness (even though the evidence presented was the same) and, as expected, weighed the eyewitness testimony more heavily in their verdict decisions. As shown in Table 9.1, 62% of the subjects in the eyewitness condition found the defendant guilty, whereas only 40% of the subjects in the hearsay condition convicted the defendant, test of proportions z = 2.05, p < .05. Comparing the verdict pattern in the hearsay condition to that in the circumstantial condition, the addition of the hearsay testimony produced a meager 4% increase over the 36% guilty rate produced by the circumstantial evidence alone. Also, the addition of the hearsay testimony to the eyewitness testimony had no impact on the verdict decisions of those mock jurors. In fact, the percentage of guilty verdicts in the all-evidence condition (55%) is actually lower, although not significantly, than the percentage of guilty verdicts in the eyewitness condition. The overall verdict pattern, which is significant, χ²(3) = 8.35, p < .05, clearly demonstrates that our mock juror subjects were not influenced in their verdict decisions by the hearsay testimony. We also created a continuous dependent measure of verdict by multiplying each subject's binary verdict decision by their confidence in that decision, and this produced a 14-point scale ranging from -7 (very confident defendant is guilty) to +7 (very confident defendant is innocent). (A negative score on this measure therefore reflects a verdict of guilty and a positive score reflects a verdict of not guilty, and scores higher in absolute value indicate greater confidence in the verdict decision.) An ANOVA of this dependent variable was significant, F(3, 176) = 3.79, p = .01.

TABLE 9.1
Juror Verdicts as a Function of Experimental Condition

Condition                   % Guilty    % Not Guilty
Circumstantial (n = 42)       35.7          64.3
Hearsay (n = 50)              40.0          60.0
Eyewitness (n = 47)           61.7          38.3
All Evidence (n = 47)         55.3          44.7

Note. N = 186. χ²(3) = 8.35, p = .04.
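The omnibus chi-square reported in Table 9.1 can be reproduced from the published percentages. The guilty counts below are back-calculated from the table (e.g., 35.7% of 42 is 15) and are therefore a reconstruction rather than the authors' raw data:

    # Reproducing the chi-square test for the 2 x 4 verdict table (Table 9.1).
    # Counts are back-calculated from the reported percentages and cell sizes.
    from scipy.stats import chi2_contingency

    guilty     = [15, 20, 29, 26]  # Circumstantial, Hearsay, Eyewitness, All Evidence
    not_guilty = [27, 30, 18, 21]

    chi2, p, dof, expected = chi2_contingency([guilty, not_guilty])
    print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")  # chi2(3) = 8.35, p = .039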


Follow-up tests indicated that the eyewitness condition (M = -1.70) was significantly different from both the circumstantial condition (M = 1.27) and the hearsay condition (M = 0.62).

In addition to the differences obtained on the verdict measure, we were interested in examining our subjects' perceptions of the eyewitness and hearsay witness testimony. We found that their perceptions of these two forms of testimony differed on all measured dimensions. The testimony of the eyewitness was rated as significantly more influential in the verdict decision than that of the hearsay witness, t(95) = 2.46, p = .01, and more important in the verdict decision, t(94) = 3.46, p < .01, and the eyewitness was also perceived as a more reliable witness, t(95) = 2.26, p = .03. These findings are clearly consistent with the obtained pattern of verdicts. Our mock juror subjects reported that the hearsay was less important and influential than the eyewitness testimony, and significantly fewer subjects found the defendant guilty in the hearsay condition compared to the eyewitness condition. Thus, data from our nonverdict measures clearly indicate that the hearsay testimony was not influential in the decision process and that eyewitness testimony was perceived as influential evidence.

Our interest then turned to exploring how our subjects made their decisions and upon what information these decisions were based. To provide some indirect evidence on how these decisions were made, we had subjects complete open-ended questions in which they were asked to name and rank the three items of evidence they found most supportive of their verdict decision, whether that decision was guilty or not guilty. To analyze these free responses, we developed a coding scheme containing 10 categories of evidence, including testimony by the four witnesses, the description of the defendant, the fact that the stolen computer was found in the defendant's apartment, and, for those jurors in the circumstantial and hearsay conditions, the fact that no eyewitness testified. Two independent raters classified each statement as belonging in one of the 10 evidence categories or in an additional miscellaneous category; the raters' classifications were in agreement for 86% of the statements, and the discrepancies were resolved through discussion with one of the authors. In addition, we had our subjects name and rank the three items of evidence that raised the greatest doubts in their minds about their decisions. For example, if a subject found the defendant guilty, he or she was first asked to list the three most important reasons why he or she believed the defendant to be guilty. After providing this information, the subject was then asked to list the three most important pieces of evidence that suggested the defendant was not guilty (see Table 9.2).

An examination of these open-ended responses yielded some insights into our mock jurors' decision process. We expected to find results suggesting that our subjects used the evidence provided by the hearsay testimony but then engaged in some type of discounting of the evidence because hearsay is considered less reliable than eyewitness testimony.


TABLE 9.2
Evidence Used in Verdict Decisions

I. Percentage of Subjects Citing Evidence in Support of Guilty Verdict¹

                                              Circumstantial  Eyewitness  Hearsay  All Evidence
1. Computers found in defendant's apartment        80             59         65         69
2. Testimony of the witness                                       21         15
3. Positive ID                                                     7         20

II. Percentage of Subjects Citing Evidence Against Guilty Verdict²

                                              Circumstantial  Eyewitness  Hearsay  All Evidence
1. Questionable description of defendant                          81          5         37
2. Lack of eyewitness testimony                     55             4         52         15

¹All data taken from subjects voting guilty.
²Data for the first measure were taken from subjects voting guilty; data for the second measure were taken from subjects voting not guilty.

Instead of finding evidence for this type of explicit discounting of the hearsay evidence, we found that our subjects simply did not report using the hearsay in their decision-making process. Specifically, for those participants voting guilty, the most important evidence was the fact that the stolen computer was found in the defendant's apartment. This finding was mentioned by 67% of the subjects as the single most important item of evidence, and this was found across all four conditions. This evidence, introduced through the landlord's testimony, established possession of the stolen goods but did not directly establish defendant guilt. The testimony of the eyewitness was the second most important evidence used by those in the eyewitness condition (21% listed this as the most important piece of evidence). The testimony of the hearsay witness was listed as most important by 15% of the subjects voting guilty in the hearsay condition. When asked to indicate what evidence suggested the defendant was not guilty, a striking difference emerged between the conditions. Sixty-seven percent of the mock jurors who voted guilty in the conditions receiving eyewitness testimony said the poor description of the defendant was the most important evidence supporting a verdict of not guilty. In contrast, only one person (5%) voting guilty in the hearsay condition reported having doubts about the description of the defendant. So what was creating doubt in the minds of the mock jurors receiving the hearsay testimony? Fifteen (75%) of the hearsay jurors specifically mentioned that no eyewitness account of the theft had been presented (or that the testimony they had heard was "only hearsay"), and this raised the most concern for the hearsay jurors who believed the defendant to be guilty.


Similarly, of those jurors voting not guilty in the hearsay condition, 53% listed the lack of an eyewitness as the primary reason for their decision to acquit, and this finding was mirrored in the circumstantial evidence-only condition, in which 56% cited this reason. No single item of evidence was commonly mentioned by the 18 jurors in the eyewitness condition as supporting their vote of not guilty, although the poor description of the defendant was listed more frequently (22%) than any other piece of evidence. The mock jurors in the all-evidence condition also reported being most influenced by the poor description, as 57% of those subjects reported this as being the most important reason in their decision to vote not guilty. The evidence cited as being most supportive of a guilty verdict by those jurors acquitting the defendant was again the fact that the stolen computer was found in the defendant's apartment. This response was listed by 63% of all subjects voting not guilty, and these individuals were distributed evenly across conditions.

In summary, subjects in the eyewitness condition and the all-evidence condition, which included the eyewitness and hearsay testimony, reported that the evidence contained in the eyewitness testimony was used in their verdict decisions. Subjects in the hearsay condition, on the other hand, reported using evidence from the two other witnesses (the law professor and the landlord), while only rarely mentioning the evidence contained in the testimony of the hearsay witness. Thus, it appears that the eyewitness testimony was more influential in our mock jurors' decision making. Sometimes this evidence was used to support the verdict decision; other times it was reported as raising doubts about the verdict selected; nevertheless, the eyewitness testimony received a great deal of attention in the open-ended responses. More importantly, the hearsay testimony was rarely mentioned in these open-ended responses. Instead of relying on and subsequently discounting the hearsay evidence, it seems that the hearsay testimony was either simply ignored or, for whatever reason, not reported.

CONCLUSIONS

"Hearsay" has a specialized meaning for jurists. In the law of evidence, a statement is "hearsay" if it is offered into evidence to prove the truth of the matter that it asserts. There are differences between the legal term of art and the lay concept of hearsay. First, the legal term hearsay is not a synonym for rumor or gossip. A statement from a reliable source with first-hand information is still hearsay. For example, an out-of-court statement from a trained observer who saw an accident would be hearsay if offered to prove the truth of the matter asserted. Second, the term hearsay is not restricted to oral statements (i.e., to statements that one "hears" someone else "say").


A written assertion, offered to prove its truth, is also hearsay.

Probably the principal reason for excluding hearsay is the belief that the jury will be misled by it. Hearsay is regarded as inferior evidence because the out-of-court declarant has not testified under oath and is not subject to observation and cross-examination. The absence of these courtroom safeguards is thought to deprive the jury of the means of assessing the credibility of the declarant (see Park, 1987). In addition, some jurists have pointed out that the admission of hearsay raises dangers of fabrication by the in-court witness, and of unfair surprise. Many commentators have suggested that the hearsay rule should be reformed to allow hearsay to be received more freely, trusting the jury to give it appropriate value. Radical reform would require both statutory change and, in criminal cases in which hearsay is offered by the prosecution, a change in the Supreme Court's interpretation of the Confrontation Clause of the Sixth Amendment.³ Radical reform is not likely to occur in the near future, although less drastic change that ameliorates the impact of the hearsay rule has often been proposed and sometimes adopted. In the Miene et al. study discussed in this chapter, the trial stimulus was constructed so that in some conditions the jury was allowed to consider evidence that, under current law, would be excluded on hearsay grounds. We believed that the study of the jury's treatment of inadmissible hearsay was more relevant to law reform concerns than would be the jury's treatment of admissible hearsay.

In terms of the question of hearsay admissibility, the data from this study clearly suggest that hearsay as a form of testimony does not appear to be overvalued in the verdict decision, as some legal scholars have argued it would be. The addition of hearsay evidence to the circumstantial evidence raised the conviction rate in our study by only 4%, and the addition of hearsay evidence to the eyewitness testimony actually lowered the conviction rate by about 7%. In general, our data suggest that hearsay testimony was discounted. However, the data do not distinguish whether our subjects did not use hearsay because they believed it to be unreliable, or whether they did not use hearsay because of the judge's cautionary instructions to discount hearsay evidence.

³The confrontation clause of the Sixth Amendment provides that "In all criminal prosecutions, the accused shall enjoy the right ... to be confronted with the witnesses against him ...." The language of the Amendment does not provide clear guidance about hearsay issues. It has been interpreted to place limits on the reception of unreliable hearsay that does not fall under a firmly rooted exception to the hearsay rule. However, the text of the clause could easily be interpreted merely to require that the defendant be confronted with whatever witnesses the prosecution chose to produce at trial. Under this interpretation, trial witnesses could testify about hearsay declarations, and the confrontation clause would impose no limits on the creation of new hearsay exceptions. It would merely require the presence of the defendant when evidence was presented to the trier of fact (see Park, 1987).


The latter hypothesis strikes us as dubious in light of the scores of empirical studies on inadmissible and limited-admissibility evidence showing that jurors are not influenced by a judge's admonitions or cautionary instructions (e.g., Cox & S. Tanford, 1989; Elwork, Sales, & Alfini, 1982; Severance & Loftus, 1982; S. Tanford & Cox, 1988; S. Tanford, Penrod, & Collins, 1985; Wissler & Saks, 1985). In our case, the argument could be made that jurors discounted the hearsay when they first heard the witness and then believed the judge's instructions at the end of the trial that validated this discounting. However, some additional evidence supporting the claim that jurors believe hearsay to be unreliable independent of judicial instructions is presented in Table 9.3. Subjects in the Miene et al. study were asked for their opinions about hearsay, and the data reflect a rather negative view regarding the reliability of hearsay. One measure described the hearsay evidence (without labeling it as "hearsay") that was presented in the hearsay and all-evidence conditions. Subjects were asked in those two conditions how useful such evidence was in their own verdict decisions. Subjects in the circumstantial and eyewitness conditions were asked how useful this evidence would have been. As the Table 9.3 data suggest, jurors who actually received this testimony rated it as significantly less useful than the jurors who were merely rating its potential usefulness. We then described and provided an example of hearsay and asked subjects whether they believed that hearsay should be admissible evidence (see Appendix for the items used in these analyses). Subjects tended to agree that hearsay should not be presented to a jury as evidence. Subjects also agreed that "making hearsay evidence admissible would encourage some lawyers or litigants to lie or create evidence by getting witnesses to testify to statements that were never made." Subjects also agreed with the statements that hearsay is not useful because the declarant's credibility is unknown, and that there is a danger that the witness may not remember or may misstate what the declarant said.

TABLE 9.3
Opinions of Hearsay Evidence

                Circumstantial  Eyewitness   Hearsay   All Evidence
Measure            (n = 41)      (n = 47)    (n = 50)    (n = 46)
Usefulness           5.78          5.47        4.00        3.59
Inadmissible         3.68          3.47        3.62        4.11
Jury decide          3.71          3.74        4.28        4.35
Lies                 3.24          2.77        3.32        3.30
Credibility          2.68          3.02        2.40        3.24
Memory               2.51          2.70        2.86        2.85

Note: Lower means indicate a stronger concern about the reliability of hearsay evidence. Exact item wordings are found in the Appendix.


TABLE 9.4
Juror Evaluation of Hearsay: Open-Ended Item

                 % Yes    % No    % Uncertain
Circumstantial     29       55        16
Eyewitness         30       63         7
Hearsay            28       56        16
All Evidence       40       47        13
Marginals          32       55        13

Note: Question asked was: "Do you think most people serving on a jury would be able to properly evaluate hearsay evidence? Please briefly give us your opinion."

Finally, subjects responded to an open-ended question asking whether they believed most people serving on a jury would be able to properly evaluate hearsay evidence. As shown in Table 9.4, the majority of participants in the Miene et al. study believed that jurors could not properly evaluate hearsay (although many believed that they were personally able to do so). The opinion data indicate, then, that participants believe hearsay to be potentially unreliable and difficult to evaluate in the decision-making process. In addition to the generally negative view that emerges about hearsay, it is important to note that these perceptions were the same across all experimental conditions in the study, despite the fact that jurors in two conditions heard judicial instructions on hearsay whereas jurors in the other two conditions did not. This fact is consistent with our interpretation that jurors discount the reliability of hearsay on their own, and not on the basis of the judge's limiting instructions. Nevertheless, further research should focus on distinguishing more conclusively between these two interpretations. Such data not only would enhance our understanding of how jurors think about hearsay evidence, but they also would address perhaps the central issue for jury researchers: juror and jury competence.

APPENDIX
Items Used to Assess Opinions About Hearsay

1. Evidence related to the identification of the defendant was presented in the trial you just saw. Part of this evidence was testimony by a witness who was present during police questioning of someone else who saw the computer thief and who described the thief and picked an identification photo from the police officer's photo display. As a juror in this case, to what extent was this evidence useful in reaching your verdict? (1 = not at all useful, 7 = extremely useful)

Hearsay evidence is "second-hand" information of a certain type. Legally, it is defined as in-court testimony about an out-of-court statement, when the testimony is offered to show the truth of some assertion in the out-of-court statement.


For example, Joe tells me that the blue car ran a red light and crashed into a school bus. If I am a witness in court and say that Joe told me that the blue car ran the red light and crashed into the school bus, I am offering hearsay testimony. Please give your opinion on the statements below.

2 & 3. Some legal experts believe that hearsay is unreliable evidence and should not be presented to juries. Other legal experts believe that the jury should be allowed to decide whether the hearsay is unreliable or not. As a potential juror (anyone over the age of 18 can be called to serve on a jury), what is your opinion of the two options given below?

Hearsay should not be presented to the jury as evidence. [1 = strongly agree, 7 = strongly disagree]

Hearsay should be presented and the jury can then decide how to use it when making their decision. [1 = strongly agree, 7 = strongly disagree]

4. Making hearsay evidence admissible would encourage some lawyers or litigants to lie or create evidence by getting witnesses to testify to statements that were never made.

5. Hearsay testimony is not useful because the credibility of the person who originally makes the statement out of court is not known (this person is not a witness, so she or he cannot be cross-examined).

6. Hearsay testimony is not useful because the witness in court may not remember or may misstate what the original speaker actually said out of court.

REFERENCES

Cox, M., & Tanford, S. (1989). Effects of evidence and instructions in civil trials: An experimental investigation of rules of admissibility. Social Behavior, 4, 31-55.
Cutler, B. L., Penrod, S. D., & Dexter, H. R. (1989). The eyewitness, the expert psychologist, and the jury. Law and Human Behavior, 13, 311-332.
Cutler, B. L., Penrod, S. D., & Stuve, T. E. (1988). Juror decision making in eyewitness identification cases. Law and Human Behavior, 12, 41-55.
Elwork, A., Sales, B. D., & Alfini, J. J. (1982). Making jury instructions understandable. Charlottesville, VA: Michie.
Fiske, S. T., & Taylor, S. (1991). Social cognition. New York: McGraw-Hill.
Imwinkelried, E. J. (1989). The importance of the memory factor in analyzing the reliability of hearsay testimony: A lesson slowly learnt-and quickly forgotten. Florida Law Review, 41, 215-252.
Kassin, S. M., Ellsworth, P. C., & Smith, V. L. (1989). The "general acceptance" of psychological research on eyewitness testimony. American Psychologist, 44, 1089-1098.
Kovera, M. B., Penrod, S. D., & Park, R. C. (1992). Jurors' perceptions of hearsay evidence. Minnesota Law Review, 76, 703-721.
Landsman, S. A., & Rakos, R. F. (1990). The impact of hearsay evidence on mock jurors. Paper presented at the annual meeting of the American Psychological Association, Boston.


Landsman, S., & Rakos, R. F. (1991). Research essay: A preliminary empirical enquiry concerning the prohibition of hearsay evidence in American courts. Law and Psychology Review, 15, 65-85.
Lilly, G. (1978). An introduction to the law of evidence. St. Paul, MN: West.
Miene, P., Park, R., & Borgida, E. (1992). Juror decision making and the evaluation of hearsay evidence. Minnesota Law Review, 76, 683-702.
Miene, P., Park, R., Borgida, E., & Anderson, J. (1990). The evaluation of hearsay evidence. Paper presented at the annual meeting of the American Psychological Association, Boston.
Monahan, J., & Walker, L. (1990). Social science in law: Cases and materials (2nd ed.). Mineola, NY: Foundation Press.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall.
Park, R. (1987). A subject matter approach to hearsay reform. Michigan Law Review, 86, 51-122.
Severance, L., & Loftus, E. (1982). Improving the ability of jurors to comprehend and apply criminal jury instructions. Law and Society Review, 17, 153-197.
Stewart, I. D., Jr. (1970). Perception, memory, and hearsay: A criticism of present law and the proposed Federal Rules of Evidence. Utah Law Review, 1, 1-39.
Tanford, J. A. (1989). 30 years and still waiting: The negligible effect of jury instruction research on judicial decisions. Paper presented at the annual meeting of the Law and Society Association, Madison, WI.
Tanford, J. A. (1990). The law and psychology of jury instructions. Nebraska Law Review, 69, 71-111.
Tanford, J. A. (1991). Law reform by courts, legislatures, and commissions following empirical research on jury instructions. Law and Society Review, 25, 155-175.
Tanford, S., & Cox, M. (1988). The effects of impeachment evidence and limiting instructions on individual and group decision making. Law and Human Behavior, 12, 477-497.
Tanford, S., Penrod, S., & Collins, R. (1985). Decision making in joined criminal trials: The influence of charge similarity, evidence similarity, and limiting instructions. Law and Human Behavior, 9, 319-337.
Wells, G. L., & Loftus, E. F. (1984). Eyewitness testimony: Psychological perspectives. New York: Cambridge University Press.
Wissler, R. L., & Saks, M. J. (1985). On the inefficiency of limiting instructions: When jurors use prior conviction evidence to decide on guilt. Law and Human Behavior, 9, 37-48.

CHAPTER 10

JURY DECISION MAKING AND THE INSANITY DEFENSE

James R. P. Ogloff
Simon Fraser University, British Columbia

In its effort to regulate society, the law makes countless assumptions about human behavior. Some of these assumptions deal with people's understanding of legal information. Nowhere is this more evident than in the area of jury decision making. One of the more controversial areas of jury decision making involves the insanity defense. This chapter summarizes some of the research that investigates the extent to which jurors comprehend insanity defense instructions, and the factors that jurors use when making decisions about the insanity defense. The chapter also describes two studies that were conducted to determine whether the specific insanity standard employed (including the assignment of burden of proof and standard of proof) had a significant effect on mock jurors' verdicts. Participants' comprehension of insanity defense instructions was measured, and the factors jurors used to decide whether to find the defendant Not Guilty by Reason of Insanity (NGRI) were also assessed. Participants' comprehension of insanity defense standards was very low. When asked to identify the factors they considered important in determining whether to find a defendant NGRI, participants identified only three elements of insanity defense standards as being significant. The results have important implications for policy decisions regarding the insanity defense.

Following the 1982 acquittal of John W. Hinckley, Jr., 17 states and the federal government revised their insanity defense statutes by using a substitute standard, by assigning the burden of proof to the defendant, or by altering the standard of proof necessary to meet the burden. Several states have also introduced the Guilty But Mentally Ill verdict.


After all the attention that has been devoted to the insanity defense, however, little is known about the actual impact that insanity defense standards have on the outcome of insanity cases. Implicit in the assumptions the law makes when altering insanity defense standards is that jurors will understand the language of the standards and will employ those instructions when deliberating in an insanity defense case. For this assumption to be accurate, jurors must (a) understand the insanity defense instructions when they are provided with them, and (b) employ the insanity defense standards when deliberating and rendering their verdicts. If these assumptions are not met, and jurors nonetheless make decisions in insanity defense cases, it is important to identify the factors they consider important when rendering a verdict. The purpose of the studies reported in this chapter is to provide information regarding the impact of insanity defense standards on jurors' findings of guilt. The research also investigated simulated jurors' comprehension of jury instructions, and the factors that mock jurors report as having been important when deciding whether to find a defendant not guilty by reason of insanity (NGRI).

CURRENT FORMULATIONS OF THE STANDARD FOR LEGAL INSANITY

M'Naghten Standard of Insanity

The first "modern" legal standard of insanity was announced in Regina v. M'Naghten (1843). M'Naghten was arrested and charged with murder for mortally wounding the Prime Minister of England's private secretary (Moran, 1981, 1985). M'Naghten was mentally ill and was actually attempting to assassinate the prime minister. M'Naghten was acquitted and the House of Lords resolved that a similar standard to that employed in Regina v. M'Naghten (1843) was correct: "You must find the defendant Not Guilty By Reason of Insanity if you believe that, at the time of committing the act, the defendant was labouring under such a defect of reason, from disease of the mind, as not to know the nature and quality of the act he was doing; or, if he did know it, that he did not know what he was doing was wrong." The substantive requirements of M'Naghten are still being used by numerous jurisdictions around the world, including 21 of the United States and Canada (see Table 10.1). The M'Naghten standard has been criticized as focusing rather narrowly on one's cognitive capacity to know that what one is doing is wrong (Hermann & Sor, 1983; Loh, Jeffries, & Bonnie, 1986; Melton, Petrila, Poythress, & Slobogin, 1987; Perlin, 1989; Simon & Aaronson, 1988).

TABLE 10.1
Insanity Defense Standards Currently Employed in Jurisdictions in the United States

[For each state and the District of Columbia, the table reports the test used (M'Naghten, ALI, Durham, or none), the party bearing the burden of proof (defendant or prosecution), the standard of proof (POE, CCE, or BRD), whether a GBMI verdict is available, whether no reforms were made, and the statutory or case citation.]

1Indicates that the element was reformed, or introduced, during or following the Hinckley case.
*Question of sanity relates to mens rea at the time of the crime.
**In Wisconsin, the defendant is given a choice of using the ALI test or the M'Naghten test (Schleisner v. State, 1967). If the defendant chooses the ALI test, the burden of proof remains with the defendant, who must prove his or her insanity at the time of the offense by the greater weight of the credible evidence, a standard roughly equivalent to the preponderance of the evidence standard (LaFollette v. Raskin, 1967).
~Indicates that the statute or case is for the state's GBMI verdict.
Note: ALI - American Law Institute Insanity Defense Standard; BRD - Beyond a Reasonable Doubt; CCE - Clear and Convincing Evidence; D - Defendant; Dur. - Durham Insanity Defense Standard; m - modified; M'N - M'Naghten Insanity Defense Standard; P - Prosecution; POE - Preponderance of the Evidence.
This table was adapted from Callahan, Mayer, and Steadman (1987).

Because M'Naghten requires a subjective exploration of the defendant's thinking, it is referred to as a "cognitive" test of insanity (e.g., Loh et al., 1986).

American Law Institute Standard of Insanity

The American Law Institute (ALI) developed the standard for criminal responsibility that was adopted by the Model Penal Code (ALI, 1962; Simon & Aaronson, 1988): "A person is not responsible for criminal conduct if at the time of such conduct as a result of mental disease or defect he lacks substantial capacity either to appreciate the criminality (wrongfulness) of his conduct or to conform his conduct to the requirements of law" (§4.01). To date, 22 states have adopted some variation of the ALI standard (see Table 10.1).1

The ALI standard contains a volitional prong, designed to focus on situations where the defendant had the cognitive ability to know that what he or she was doing was wrong (as required by the M'Naghten standard) but was unable to conform his or her behavior to the law. Because the ALI standard provides for relief from criminal responsibility for both cognitive and volitional reasons, it is apparently more expansive than M'Naghten. Conceptually, therefore, one would expect that use of the ALI standard would result in a greater number of NGRI acquittals than the M'Naghten standard. Somewhat surprisingly, however, there has not been a great deal of empirical research on this topic (Finkel, 1989; Finkel & Handel, 1989; Finkel, Shaw, Bercaw, & Koch, 1985; Sales & Hafemeister, 1984).

The Guilty But Mentally Ill Verdict

In 1975, the Michigan Legislature introduced a new verdict: "Guilty But Mentally Ill" (GBMI).2 Essentially, the GBMI verdict holds a defendant criminally responsible for his or her act but recognizes that the defendant is mentally ill. The GBMI verdict is typically employed as an option in addition to the NGRI and guilty verdicts.3 A number of states have adopted the GBMI verdict (see Table 10.1). Some commentators argue that the verdict has been a success because it allows defendants to be held criminally responsible for their actions, while enabling the defendant to seek treatment (e.g., Mickenberg, 1987). Critics argue that the GBMI verdict is an overreaction to a problem that really does not exist (i.e., that the insanity defense allows dangerous defendants to simply "get off"; e.g., Slobogin, 1985). Similarly, some say that the GBMI verdict serves no necessary purpose and is a misleading verdict introduced for purely political reasons (Blunt & Stock, 1985; McGraw, Farthing-Capowich, & Keilitz, 1985; Melton et al., 1987; Petrella, Benedek, Bank, & Packer, 1985). Those found GBMI are often not given psychiatric treatment (McGraw et al., 1985; Melton et al., 1987). Some evidence shows that the verdict confuses jurors, causing them to find a disproportionate number of defendants "guilty"-even innocent ones (Savitsky & Lindblom, 1986).

1The ALI standard was in force in the District of Columbia when Hinckley was tried. D.C. CODE ANN. § 24-301 (1981).
2MICH. COMP. LAWS ANN. §§ 768.29a(2), 768.36.
3Id.

To summarize, the most common insanity defense standards employed are M'Naghten and ALI, or some variation thereof. In addition, several states have introduced the GBMI verdict. In order to understand the full impact of insanity defense standards on juror verdicts, it is important to understand the effect that the assignment of burden of proof and the standard of evidence required to meet the burden of proof have on jurors.

THE LOCUS OF BURDEN OF PROOF AND THE STANDARD OF PROOF IN INSANITY DEFENSE CASES

The assignment of burden of proof determines whether the prosecution or the defense has the onus of proving an issue in the case and rebutting any legal assumptions. The state bears the burden of proving, beyond a reasonable doubt, that the defendant committed all the elements of a crime (In re Winship, 1970). By contrast, the defendant must often prove all circumstances of justification, excuse, or alleviation (Commonwealth v. York, 1845). If a state considers sanity to be a fundamental element of all crimes, the prosecution bears the burden of proving that the defendant was sane at the time of the offense (e.g., Commonwealth v. Vogel, 1970; see Melton et al., 1987, p. 125). If, however, insanity is considered to be an exculpating factor, the defendant must prove that he or she was insane at the time of the offense (American Psychiatric Association, 1982, pp. 12-13; Melton et al., 1987, p. 125). Currently, 35 states and the District of Columbia place the burden of proving the defendant's insanity on the defendant (see Table 10.1).

The standard of proof defines the amount of evidence required to satisfy the burden of proof. When the prosecution has the burden of proof, the prosecution must prove, beyond a reasonable doubt, that the defendant was sane at the time of the offense (Commonwealth v. Vogel, 1970; In re Winship, 1970; Melton et al., 1987, p. 125). When the burden of proof has been assigned to the defendant, the defendant does not have to prove his or her insanity by the most stringent standard of beyond a reasonable doubt (BRD; approximately 95% certainty), but by the lesser standards of preponderance of evidence (POE; approximately 51% certainty) or clear and convincing evidence (CCE; approximately 75% certainty). Currently, 34 states place the burden of proof on the defendant to prove his or her insanity by a preponderance of the evidence (see Table 10.1). Only Idaho and the District of Columbia use the clear and convincing evidence standard of proof. Eleven other states assign the burden of proving the defendant's sanity to the prosecution, beyond a reasonable doubt. The burden and standard of proof are important concepts in determining the defendant's guilt. In fact, as Melton et al. (1987) noted, "[a]rguably, the outcome of an insanity case could depend as much on a jurisdiction's approach to these proof issues as on its substantive test of insanity" (p. 125).

Overall, there are three major variables that may play some role in the success with which the insanity defense is employed in cases: (a) the actual standard of insanity being employed (M'Naghten or ALI); (b) the locus of the burden of proof (defense or prosecution); and (c) the standard of proof (BRD, CCE, and POE). There is surprisingly little empirical evidence exploring the role that these factors may play in the process of determining whether a defendant is acquitted by reason of insanity (Sales & Hafemeister, 1984).
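The interaction of burden and standard can be made concrete with a small sketch. The following Python fragment is purely illustrative and written for this discussion: the function name is hypothetical, the thresholds are the approximate certainty levels quoted above, and the reduction of a factfinder's judgment to a single number is a didactic simplification, not part of any statute or of the studies reported here.

```python
# Didactic sketch only: treat the approximate certainty levels from the
# text (BRD ~ .95, CCE ~ .75, POE ~ .51) as decision thresholds and ask
# when an insanity claim succeeds under a given burden assignment.

THRESHOLDS = {"BRD": 0.95, "CCE": 0.75, "POE": 0.51}

def insanity_claim_succeeds(burden_on, standard, certainty_insane):
    """burden_on: 'defense' (must prove insanity) or 'prosecution'
    (must prove sanity); certainty_insane: the factfinder's subjective
    certainty (0-1) that the defendant was insane at the offense."""
    threshold = THRESHOLDS[standard]
    if burden_on == "defense":
        # The defense must push certainty of insanity past the threshold.
        return certainty_insane >= threshold
    # The prosecution must push certainty of *sanity* past the threshold;
    # if it cannot, the insanity claim succeeds.
    return (1.0 - certainty_insane) < threshold

# A factfinder who is 60% certain the defendant was insane:
print(insanity_claim_succeeds("defense", "POE", 0.60))      # True
print(insanity_claim_succeeds("defense", "CCE", 0.60))      # False
print(insanity_claim_succeeds("prosecution", "BRD", 0.60))  # True
```

Under this toy rule, the same 60%-certain factfinder produces different findings under defense/POE, defense/CCE, and prosecution/BRD regimes, which illustrates why burden and standard are treated as separate variables in the studies that follow.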

EMPIRICAL RESEARCH REGARDING THE INSANITY DEFENSE

Much of the controversy surrounding the insanity defense has involved legal arguments for determining the most appropriate standard of insanity (see National Institute of Justice, 1989; Piquet & Best, 1985). This approach largely overlooks the impact of the language of the insanity standard on the trier of fact (judge or jury). As discussed next, however, some investigators have attempted to determine whether the particular insanity defense standard employed differentially affects NGRI acquittal rates (see Ogloff, Schweighofer, Turnbull, & Whittemore, 1992, for a recent review of research on the insanity defense).

THE EFFECT OF VARYING INSANITY DEFENSE STANDARD ON NGRI ACQUITTAL RATE

Archival-Naturalistic Research

In 1976, Sauer and Mullens compared the number of pretrial psychiatric examinations in Maryland between the 1966 and 1973 fiscal years. Maryland replaced the M'Naghten test with the ALI test in 1967. Results suggested that significantly more examinations occurred in 1973 (380) than in 1966 (278). There was also a significant increase in the percentage of individuals evaluated as not responsible for their actions between 1966 (8%) and 1973 (19%). This study has been criticized because the authors relied on a pre-post comparison, rather than comparing the trend of findings of not responsible across time (Sales & Hafemeister, 1984). In their favor, the authors did attempt to explain the other factors that may have accounted for an increase in findings of not responsible. For example, they noted that they found no major differences in frequency of hospitalization, frequency of psychological labels applied, or the quality of evaluations between 1966 and 1973. Nonetheless, without a longitudinal time-series design that would show the trend of findings of not responsible over a period of time, one cannot draw any conclusions regarding the cause of the increase in findings of not responsible (see Luckey & Berman, 1979; Steadman & Braff, 1983).

In a study using very different methodology, Arens and Susman (1966) found that the legal standard chosen by a jurisdiction may have little influence on NGRI determinations. The authors performed a content analysis of trial manuscripts from NGRI cases in Washington, D.C., prior to and following the introduction of the Durham standard. The results suggested that, although the official insanity test changed, the wording of judges' instructions to the jury regarding the insanity standard did not change appreciably.

Arens (1967) presented data from the District of Columbia demonstrating that the percentage of defendants found NGRI increased steadily from 0.4% in 1954, following the introduction of the Durham standard, to 14.4% in 1961. The percentage stabilized around 14% for the 1961 to 1963 fiscal years, then fluctuated from 5.9% to 9.4% during the 1964 to 1966 fiscal years. Thus, there was an apparently significant increase in the percentage of defendants found NGRI in the years following the introduction of the Durham test. In evaluating the data reported by Arens, Keilitz (1987) warned that "it is problematic to attribute this increase [in NGRIs] simply to the application of the Durham rule" (p. 300). For example, the most significant increase in percentage of successful acquittals did not occur until 1961, 7 years after the Durham test was introduced. Further, the percentage of successful insanity acquittals decreased dramatically from 13.3% in 1963 to only 5.9% in 1964. Finally, the most dramatic increase reported by Arens occurred following a number of court decisions that served to limit the Durham test (Keilitz, 1987). Thus, one must question the cause for the increase in NGRIs as reported by Arens.

Pasewark, Randolph, and Bieber (1983) compared insanity acquittals in Wyoming over a 6-year period during which three different standards of insanity were employed (including ALI and M'Naghten). No major differences occurred in the volume of insanity acquittals.

From the preceding review, it is impossible to draw solid conclusions regarding the impact that changing the insanity defense standards has had on the insanity acquittal rate. In most studies, it is simply impossible to wade through all the extraneous variables in order to determine the extent to which the insanity standard per se produces changes in the insanity acquittal rate. None of the studies have employed the time-series or longitudinal analyses necessary to begin to understand the trend in insanity acquittal rates prior to and following changes to insanity standards. In addition, the findings of the studies were contradictory.


Analogue Studies

Because of the difficulty associated with performing archival studies, and in an attempt to exert more control over the variables influencing insanity defense decisions, researchers have also employed analogue studies. In an early study, a judge assigned actual jurors who were called for regular jury duty to serve on a jury in an experimental trial (James, 1959a, 1959b; Simon, 1967; Simon & Aaronson, 1988). Two tape-recorded trials (housebreaking and incest), based on actual cases, were used as the stimuli in the experiment. The independent variables in both studies included insanity instruction (M'Naghten, Durham, or no instructions). The results suggested that there were few differences in the NGRI rate among insanity standards. The number of juries finding the defendant NGRI in the housebreaking trial were 7 (M'Naghten), 4 (Durham), and 6 (no instructions), and for the incest trial were 0 (M'Naghten), 5 (Durham), and 4 (no instructions). Analyses revealed that M'Naghten jurors were significantly less likely to vote for an NGRI acquittal than the Durham jurors.

Simon's research has some flaws. The low number of juries finding the defendants NGRI threatens the power of the statistical analyses employed. Also, the study did not test the effect that varying the burden and standard of proof may have on the acquittal rate.

Finkel and his colleagues have investigated the impact that insanity defense instructions have on jurors (Finkel, 1989; Finkel et al., 1985; Finkel & Handel, 1989). In an early study, Finkel et al. (1985) presented participants with a booklet of five cases in which the insanity defense was raised. Although the reasons for insanity varied, basic elements were identical among all cases. The insanity defense instruction they were to use in their decision making was defined on the front page of the booklet of cases and questionnaires (the "wild beast" test, M'Naghten, M'Naghten plus the irresistible impulse test, Durham, the ALI test, and the disability of mind test proposed by Fingarette, 1972, and Fingarette and Hasse, 1979). The results differed significantly for the type of case; however, the insanity acquittal rate did not vary significantly across groups who had received different insanity instructions. Given the array of insanity instructions included, this finding is rather remarkable. The external validity of the study must be questioned because of the choice of participants (undergraduates) and the lack of similarity between the study and the actual trial situation. Regardless of this criticism, one cannot overlook the importance of the central finding of the study: Participants' ratings of NGRI did not vary across insanity defense standards.

In another study, Finkel (1989) attempted to determine the effect that changes made to the federal insanity standard by the Insanity Defense Reform Act of 1984 (IDRA) have on simulated jurors. Using a methodology similar to his earlier studies (Finkel et al., 1985; Finkel & Handel, 1989), he provided participants with a booklet of randomly ordered cases. Once again, the results showed that participants' verdicts varied significantly for type of case, but not for the type of insanity instruction.

Savitsky and Lindblom (1986) investigated the impact of guilty but mentally ill (GBMI) instructions on mock jury decision making. In addition to the guilty and not guilty choices, participants in the three-choice condition were told that they could find the defendant NGRI, according to the ALI standard. The four-choice condition added a GBMI verdict option. Jurors in the two- and three-choice conditions were most likely to find the defendant not guilty (69%-85%). However, in the four-choice condition, "[w]hen the GBMI choice was made available to the participants ... 65% of the participants cast their predeliberation choice for GBMI" (p. 695). Similarly, postdeliberation verdicts produced significant differences for number of verdict choices. All the juries in the two- and three-choice conditions returned verdicts of not guilty; however, only one of the four-choice juries returned a verdict of not guilty. The remaining juries labeled the defendant GBMI.

As the preceding review suggests, analogue studies, like their archival counterparts, have been marred by methodological and conceptual flaws. The analogue research has generally failed to find any difference among mock juror verdicts when the insanity defense standard is varied. One reason for the apparent discrepancies between archival and analogue findings is that analogue studies focus on the language of the standards, whereas the archival studies focus on system-wide information. For example, historical artifacts such as the level of funding a state psychiatric hospital receives or the political influence surrounding a particular insanity defense standard do not influence the decision making of mock jurors. Likewise, as discussed before, the laboratory studies suffer from a number of methodological and conceptual flaws that decrease their external validity. Thus, the level of analysis between archival and analogue studies is different enough to partially account for the discrepant findings. Another consideration is that none of the archival studies was carefully designed. Thus, all the confounding variables previously discussed (e.g., historical artifacts) make it impossible to determine the internal validity of the archival studies. Pasewark and McGinley (1985) reported that a very limited number of jurisdictions maintain statistics regarding the frequency and success of insanity pleas. Therefore, it is currently nearly impossible to conduct well-designed, controlled archival studies that would address many of the questions raised by the aforementioned review of the literature.

Jurors' Comprehension of Insanity Defense Instructions

Some researchers have investigated the extent to which jurors comprehend insanity instructions. Because of the general inaccessibility of actual jurors, researchers have focused their attention on mock jurors. James (1959b) assessed the extent to which jurors understood the court's instructions. Jurors were approximately 58% accurate in recalling the instructions they were given. James also assessed the jurors' ability to recall a variety of material from the trial. Significantly, the jurors' accuracy rate for the insanity defense instructions was lower than that for any of the other material participants were asked to recall.

Arens, Granfield, and Susman (1965) investigated the extent to which jurors comprehended insanity defense instructions. The results showed that the overall percentage of jurors' comprehension of the standards ranged between 31% and 40%, regardless of the specific instructions given. Arens et al. (1965) also measured the jurors' comprehension of the burden of proof. Varying from 35% to 50%, the comprehension level for burden of proof was also low, although not quite so low as for the insanity defense standards.

Elwork and his colleagues (Elwork, Sales, & Alfini, 1977, 1982; Elwork, Sales, & Suggs, 1981) also found that jurors have a great deal of difficulty comprehending jury instructions generally. Consistent with other researchers, Elwork et al. found that juror comprehension is approximately 30% for insanity defense instructions.4 Even when a tested method for rewriting jury instructions in order to make them maximally understandable was employed, Elwork et al. (1982) found that jurors averaged only 51% correct on a questionnaire designed to test their comprehension of M'Naghten jury instructions.

Overall, the results of the studies just reviewed are consistent in demonstrating that jurors have a great deal of difficulty understanding insanity defense instructions. If mock jurors are unable to comprehend insanity instructions well, it must be very difficult for them to apply the standards to the facts of the cases they are deciding. Nonetheless, they do make decisions regarding insanity. The question that next arises, then, is on what do jurors base their insanity defense decisions if it is not the insanity instructions they are given?

The Factors Jurors Use in Determining Whether to Find a Defendant NGRI

Arens et al. (1965) also obtained information regarding the factors that jurors employ when determining whether one should be found NGRI. The authors presented several pages of examples of the responses participants provided to a number of questions about the insanity defense. The authors concluded that, even when given Durham instructions, jurors tended to focus on "M'Naghten-like" (cognitive) factors.

4Elwork et al. only used the M'Naghten jury instructions. Thus, there remains a question regarding the generalizability of their findings to ALI instructions.


Roberts, Golding, and Fincham (1987) investigated the implicit theories laypersons use to determine whether one should be held criminally responsible for one's actions. Independent variables included level of mental disorder, bizarreness of the crime, and planfulness of the crime. The participants were undergraduates who read vignettes that included all the combinations of the independent variables. After reading the vignette, participants were asked questions about criminal responsibility. The results indicated that the level of mental illness of the defendant was the primary determinant of NGRI decisions. Also, participants were more likely to find a defendant NGRI if the criminal act was not planned. Finally, if the criminal act was bizarre, participants were more likely to find the defendant NGRI.

The results also provide information about important dimensions that affect jurors' decisions in insanity cases. However, the design of the study makes it impossible to know which factors jurors "naturally" use to determine whether one is NGRI. Thus, although we know that a defendant is more likely to be acquitted if he or she is very mentally ill (psychotic), commits a bizarre act, and did not apparently plan the act, we do not know how important these factors are to the jurors' overall decision of whether to find the defendant NGRI.

More recently, Finkel and his colleagues have used quantitative techniques in an attempt to understand the factors jurors use when making decisions regarding the insanity defense (Finkel, 1989; Finkel & Handel, 1989). After completing the first part of the study that was just described, participants were asked to evaluate and categorize the reasons for their verdicts. In order to do this, Finkel listed "guilty" and "not guilty" constructs and had participants identify the factors that explained the reasons for their verdict. The constructs included the defendant's incapacity, awareness, clarity of thinking, ability to control his behavior, culpability, and evil motive. The other construct was whether any other people were at fault for the crime. The results indicated that mock jurors provided multiple constructs for their decisions (M = 3.2 constructs for NGRI verdicts and M = 3.0 constructs for guilty verdicts). In addition, the constructs that jurors use shift among cases. Thus, mock jurors are apparently flexible and thorough in determining whether one is NGRI. These results are interesting and also suggest that mock jurors do not make random or arbitrary decisions regarding the insanity defense.

Given the shortcomings of previous investigations, research is needed to compare the standards of insanity currently being used, the locus of the burden of proof, and the different standards of proof. In addition, because previous research suggests that the standards of insanity may not significantly vary jurors' acquittal decisions, it is important to determine what factors jurors rely on when deciding whether to find a defendant NGRI.

The purpose of the studies discussed next is to address some of the shortcomings of the research previously conducted on the insanity defense. Experiment 1 was conducted to assess the effect that varying the insanity defense standard (ALI vs. M'Naghten, GBMI), the locus of the burden of proof (prosecution vs. defense), and the standard of proof (beyond a reasonable doubt, clear and convincing evidence, and preponderance of the evidence) have on jurors' decisions of whether to find a defendant NGRI. Experiment 2 served as a replication and extension of the first experiment. In addition to investigating the insanity defense standard, burden, and standard of proof, participants' ability to comprehend the insanity defense instructions was also measured. Finally, the factors that participants employ in determining whether a defendant is NGRI were also evaluated.

EXPERIMENT 1

Method

Participants. Two hundred and fifty-five undergraduates (153 females, 102 males) at a midwestern university volunteered to participate in the study.

Materials. Videotaped Reenactment of a Trial. Melton and Gardner (in prep.) produced a videotape based on an actual trial from Michigan in which the defendant's sanity was at issue. The videotape was professionally produced by the Nebraska Educational Television network and was edited down to a viewing time of 101 minutes for this study.5

In the case, the defendant, a fundamentalist Christian, killed his daughter and three of her friends. The events leading to the killings are well delineated in the videotape. The defendant's daughter, who was a very good student at the University of Nebraska (as the case facts were reframed), left home and was living with her boyfriend at his apartment. The defendant and his wife did not know of their daughter's whereabouts, so they began a 3-day search for her. During the same time, the defendant was working long hours on his job with a railroad. The defendant and his wife paid an informant 20 dollars to obtain information about the daughter's location. They learned that their daughter was living with her boyfriend at his apartment in "Stonehead Manor." The apartment building was a known location for drugs and other dangers.

Late in the evening following the third day of searching for their daughter, the defendant and his wife went to Stonehead Manor. The defendant claimed to have been armed because he was told that the building manager had weapons in his apartment. The defendant entered the apartment in which his daughter was living. He entered a bedroom in the apartment where he saw his nude daughter and her boyfriend in bed together. He approached the bed and hit the boyfriend with his gun. The gun discharged and a bullet hit his daughter. At this point, the defendant apparently panicked, whereupon he shot his daughter again "to put her out of her misery." The defendant then shot and killed the boyfriend and two other young men who were sleeping in other rooms in the apartment.

The videotape makes clear that there was no evidence that the daughter and the boyfriend had had sexual intercourse that evening, or that the daughter or the other victims had used drugs. The defendant was apparently very rigid and conservative. Evidence was presented to show that he had placed great hope in the abilities of his daughter, who had disappointed him greatly by having premarital sexual intercourse with her boyfriend. Although the defendant held a teaching degree from a college in his home state of Tennessee, he did not feel that he would be a good teacher, so he chose blue-collar work. The defendant had a speech impediment that caused him to stutter. Aside from evidence demonstrating that the defendant had worked extremely hard and had not slept during the 3 days he was searching for his daughter, expert psychiatric testimony was offered. The edited version of the videotape used in this study included testimony from a psychologist and a psychiatrist, who both agreed that the defendant's mental state at the time of the offense was such that he was exhausted and probably unable to control his behavior.

Jury Instructions and Questionnaire. The jury instructions were based on those developed by Melton and Gardner (in prep.) and are standard jury instructions a judge would give jurors in a murder case in which the defendant's insanity is at issue. The instructions and questionnaires varied only to the extent that participants were assigned to conditions that varied the insanity standard (ALI, M'Naghten, GBMI, and No Instructions), burden of proof (Defendant or Prosecution), and standard of evidence (preponderance of the evidence [POE], clear and convincing evidence [CCE], and beyond a reasonable doubt [BRD]).6 The instructions also included explicit information about the presumption of innocence, the burden of proof, reasonable doubt, murder and lesser included offenses, inferring deliberation and premeditation, and the disposition of the defendant.

The questionnaire asked participants to render a verdict: (a) guilty of first degree murder, (b) guilty of second degree murder, (c) guilty of voluntary manslaughter, and (d) NGRI. Participants in the GBMI category were also given the option of finding the defendant guilty (of any category of murder) and GBMI.

5Information about the trial is available from the author.
6The instructions employed are available from the author.

Procedure

Participants were tested in groups of 10 to 12 people. No participants withdrew from the study at any time. After watching the videotaped trial, participants read the jury instructions. Participants were then given the questionnaires to complete and were told that they could return to the jury instructions at any time while completing the questionnaire. The average testing session lasted approximately 2½ hours.

Results

Verdict. A Loglinear Analysis revealed no significant interaction effect for experimental condition by verdict [χ²(18) = 12.57, n.s.]. There was no main effect for experimental condition [χ²(6) = 1.59, n.s.]. Thus, the verdict did not vary significantly across the experimental conditions (see Table 10.2). There was a significant main effect for verdict [χ²(3) = 93.04, p < .001]. Regardless of experimental condition, participants were more likely to choose a verdict of voluntary manslaughter than any other verdict. Post hoc comparisons of Guilty-NGRI verdicts for combined ALI and M'Naghten conditions also revealed no significant differences [χ²(3) = .28, n.s.]. Similarly, no significant results were obtained for post hoc analyses of burden of proof [χ²(3) = 1.69, n.s.] or standard of proof [χ²(3) = .43, n.s.].

TABLE 10.2
Results for Verdict in Experiment 1

Condition     1st Degree Murder   2nd Degree Murder   Voluntary Manslaughter   NGRI        Total
              N (%)               N (%)               N (%)                    N (%)       N
ALI-P-BRD      1 (4.3)             2 (8.7)            15 (65.2)                 5 (21.7)    23
ALI-D-POE      2 (8.7)             1 (4.3)            18 (78.3)                 2 (8.7)     23
ALI-D-CCE      1 (4.5)             5 (22.7)           15 (68.2)                 1 (4.5)     22
ALI            4 (5.9)             8 (11.8)           48 (70.6)                 8 (11.8)    68
M-P-BRD        1 (4.5)             4 (18.2)           15 (68.2)                 2 (9.1)     22
M-D-POE        1 (4.8)             4 (19.1)           15 (71.4)                 1 (4.8)     21
M-D-CCE        1 (4.3)             2 (8.7)            16 (69.6)                 4 (17.4)    23
M'Naghten      3 (4.9)            10 (15.2)           41 (66.1)                 7 (10.6)    66
No Instr.      2 (9.1)             7 (31.8)           11 (50.0)                 2 (9.1)     22
Total          9 (5.9)            25 (16.6)          100 (66.2)                17 (11.3)   151
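The condition-by-verdict counts in Table 10.2 can be probed with standard contingency-table machinery. The sketch below is illustrative only: it assumes NumPy and SciPy are available, and because the chapter reports a loglinear analysis, this simpler chi-square test of independence will not reproduce the statistics quoted above, although the 7 × 4 layout does yield the same 18 degrees of freedom.

```python
# Illustrative sketch: chi-square test of independence on the
# condition-by-verdict counts of Table 10.2 (aggregate rows excluded).
# With expected cell counts this small, the chi-square approximation
# should be treated with caution.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: ALI-P-BRD, ALI-D-POE, ALI-D-CCE, M-P-BRD, M-D-POE, M-D-CCE, No Instr.
# Columns: 1st degree, 2nd degree, voluntary manslaughter, NGRI.
observed = np.array([
    [1, 2, 15, 5],
    [2, 1, 18, 2],
    [1, 5, 15, 1],
    [1, 4, 15, 2],
    [1, 4, 15, 1],
    [1, 2, 16, 4],
    [2, 7, 11, 2],
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")  # dof = (7-1)*(4-1) = 18
```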

Guilty But Mentally Ill Condition. Analyses of the GBMI results revealed that when the GBMI option was available, participants chose that option significantly more often than the NGRI option [χ²(3) = 14.29, p < .01]. Twelve participants (46.15%) found the defendant GBMI, and only two participants found the defendant NGRI. Similarly, significantly more participants found the defendant GBMI (n = 12, 46.15%) than guilty of first or second degree murder [n = 1 (3.85%) each; χ²(3) = 12.76, p < .05]. There was no significant difference between the number of participants choosing a verdict of Guilty of Voluntary Manslaughter (n = 10, 38.46%) or Guilty of Voluntary Manslaughter, But Mentally Ill [n = 8 (30.77%); χ²(3) = 0.36, n.s.]. There was also no difference between the number of participants choosing a verdict of Guilty of Voluntary Manslaughter (n = 10, 38.46%) and those choosing any Guilty But Mentally Ill verdict [n = 12 (46.15%); χ²(3) = 0.34, n.s.]. Thus, participants were equally likely to find the defendant guilty of voluntary manslaughter and GBMI.

Discussion of Experiment 1 Results

Verdict. There were no significant differences for verdict among any of the experimental conditions. The fact that most people found the defendant guilty of voluntary manslaughter decreases the likelihood that the participants' verdicts were simply random. Thus, the results present strong evidence that mock jurors do not make verdict distinctions based on the insanity instructions with which they are presented. The results are supported by previous analogue research (Finkel, 1989; Finkel et al., 1985; Finkel & Handel, 1989) and some archival research (Pasewark et al., 1983). However, the findings are in conflict with some archival studies (Arens, 1967; Sauer & Mullens, 1976).

In order to resolve this discrepancy, it may be helpful to measure mock jurors' comprehension of the insanity defense standards to determine whether, in fact, they can recall the subtle nuances of the standards with which they are presented. It is also necessary to determine what general factors participants consider important when deciding whether to find a defendant NGRI. Also, it would be interesting to see how much importance participants place on the elements of insanity defense standards. Similarly, because altering the burden and standard of proof does not seem to make a difference in mock jurors' decisions about the insanity defense, it is important to learn whether jurors can recall which side has been assigned the burden of proof, and what the standard of proof was.

An important concern with these findings is that most participants decided that the defendant was guilty of voluntary manslaughter. Thus, because of the relatively small number of participants who found the defendant NGRI, it is difficult to be certain that a floor effect was not obtained, in which case one would not expect any difference among experimental groups. Thus, it is important to ensure that a substantial number of people find the defendant NGRI in order to increase the validity of results concerning both the NGRI acquittal rate and participants' comprehension of the insanity defense standards.

Guilty But Mentally Ill Condition. As previous research demonstrated (Savitsky & Lindblom, 1986), when the GBMI option is introduced, participants tend to choose that option more often than others. Indeed, the preceding results indicated that the addition of the GBMI verdict resulted in fewer participants choosing each of the other options; significantly fewer people chose the voluntary manslaughter verdict in the GBMI condition than in all other conditions. The results suggest that the GBMI verdict serves as a "miscellaneous" category for participants who may not wish to find the defendant "not guilty" by reason of insanity but feel bad about finding the defendant guilty.

A second study was performed to replicate the findings of the first study and to evaluate participants' comprehension of jury instructions as well as the factors they believe are important when determining whether one should be found NGRI.

EXPERIMENT 2

Method

Participants. Two hundred and twenty-seven undergraduates (137 females, 90 males) from a midwestern university volunteered to participate in the study.

Materials. The same videotaped trial as described in Experiment 1 was used in Experiment 2.

Jury Instructions and Questionnaire. Participants were given the same jury instructions as described in Experiment 1. For Experiment 2, however, no GBMI instructions were included. In order to elicit a maximum number of NGRI responses, the questionnaire participants completed had only two possible verdicts: (a) guilty of second degree murder, and (b) not guilty by reason of insanity. Other than verdict choices, the first questionnaire used in Experiment 2 was identical to the one described earlier in Experiment 1.

Participants were given two additional questionnaires. The first follow-up questionnaire asked participants to list "all of the factors which you personally consider to be important in deciding whether to find a defendant not guilty by reason of insanity." In addition, an open-ended question asked participants to recall the insanity defense standard that was provided in the jury instructions they read.

In a second follow-up questionnaire, participants were presented with a randomly ordered checklist of the elements of all the insanity defense tests. Participants were asked to "please mark any of these factors which were presented to you in the insanity defense standard provided in the jury instructions you read in the first package." This recognition task determined whether participants could correctly identify those elements of the insanity defense test with which they were presented. Participants were then presented with the same checklist of insanity defense elements; this time, however, they were asked to "please mark the factors below which you believe to be important in determining whether a defendant should be found not guilty by reason of insanity." This questionnaire also tested whether participants could correctly recall which side had the burden of proof for proving sanity or insanity. Finally, participants were asked to check off the standard of proof with which they were presented in the jury instructions.

Procedure

The procedure was identical to that followed in Experiment 1, except that participants were given the two follow-up questionnaires, one at a time, before they were debriefed. The first questionnaire was collected prior to being given either follow-up questionnaire.

Results

Questionnaire Results. Just as in the first study, a Loglinear Analysis revealed no significant interaction effect for experimental condition by verdict [χ²(6) = 2.36, n.s.]. There was also no main effect for experimental condition [χ²(6) = 1.92, n.s.]. Thus, again the verdict did not vary significantly across the experimental conditions (see Table 10.3). There also was a significant main effect for verdict [χ²(3) = 43.97, p < .001]. As Table 10.3 reveals, regardless of experimental condition, participants were more likely to choose a verdict of NGRI than guilty of second degree murder. Post hoc comparisons of Guilty-NGRI verdicts for combined ALI and M'Naghten conditions also revealed no significant differences [χ²(3) = 0.40, n.s.]. Similarly, no significant results were obtained for post hoc analyses of burden of proof [χ²(3) = 0.12, n.s.] or standard of proof [χ²(3) = 1.37, n.s.].

TABLE 10.3
Results for Verdict in Experiment 2

Condition         2nd Degree Murder   NGRI          Total
                  N (%)               N (%)         N
ALI-P-BRD          9 (29.0)           23 (74.2)      31
ALI-D-POE          6 (19.4)           25 (80.7)      31
ALI-D-CCE         11 (29.0)           27 (71.1)      38
ALI               26 (25.7)           75 (74.3)     101
M-P-BRD           10 (30.3)           23 (69.7)      33
M-D-POE            8 (25.0)           24 (75.0)      32
M-D-CCE           10 (34.5)           19 (65.5)      29
M'Naghten         28 (29.8)           66 (70.2)      94
No Instructions    7 (22.6)           24 (77.4)      31
Total             61 (27.0)          165 (73.0)     226

Follow-Up Questionnaire 1

Comprehension of Jury Instructions. The first follow-up questionnaire was designed to measure participants' comprehension of the insanity defense standards using a free-recall technique. Participants were asked to try their best to write out the insanity defense instructions they were given. If the participant correctly identified an element of the insanity defense with which he or she was presented, the item was marked correct. Raters were trained by the author to understand each of the elements of the insanity defenses so they would be able to identify items that were substantively correct but did not contain language identical to the elements. One rater scored all the questionnaires, and a second rater graded 50 randomly selected questionnaires to obtain a rating of inter-rater reliability. A Pearson product-moment correlation showed that the inter-rater reliability across all 50 questionnaires was r = .88.

Participants' comprehension was relatively low for both the ALI and M'Naghten rules. An ANOVA performed on the insanity elements revealed significant differences among the number of people correctly recalling insanity elements (F(8, 191) = 2.89, p < .01). Post hoc analyses revealed no significant differences for the number of people correctly recalling ALI elements. However, significantly more people receiving the M'Naghten standard correctly recalled the "at the time of the offense" (M = 39.71, sd = 7.71) and "as not to know the nature and quality of the act he was doing" (M = 38.42, sd = 8.11) items than the "defendant was laboring under such a defect of reason" (M = 13.91, sd = 3.26) or "he did not know what he was doing was wrong" (M = 18.54, sd = 4.76) items.

All the ALI items were correctly recalled by more people than these two M'Naghten items as well.

TABLE 10.4
Mean Number of Participants Correctly Recalling Insanity Defense Elements

Insanity Defense Element                                                        Percentage Answering Correctly

A. M'Naghten
   a. at the time of the offense                                                42.71%
   b. the defendant was labouring under such a defect of reason                 14.89%
   c. from disease of the mind                                                  31.03%
   d. as not to know the nature and quality of the act he was doing             42.11%
   e. or, if he did know it, that he did not know what he was doing was wrong   20.83%

B. ALI
   a. at the time of the offense                                                30.39%
   b. as a result of mental disease or defect                                   31.37%
   c. the defendant lacked the capacity either to appreciate the wrongfulness
      of his conduct or                                                         34.31%
   d. he lacked substantial capacity to conform his conduct to the requirements
      of the law                                                                29.41%

An ANOVA performed on the insanity standards revealed that there was no difference between the mean number of correct items recalled for M'Naghten (M = 2.93, sd = 0.46) or ALI (M = 3.13, sd = .32; F(1, 191) = 2.37, n.s.).

Follow-Up Questionnaire 2

Comprehension of Insanity Defense Standard. This questionnaire measured the participants' ability to correctly identify the elements of the insanity defense instruction that they read. Analyses revealed that participants who received the ALI instructions identified different items than those receiving the M'Naghten instructions [χ²(7) = 18.01, p < .05]. Indeed, a Spearman rank-order correlation comparing the item rankings of participants who received the ALI instructions and those who received the M'Naghten instructions showed very little relationship between the rankings of items [r(8) = 0.10, n.s.]. Both groups tended to correctly identify more items with which they were presented in their insanity instructions than items that were not presented to them. However, as Table 10.5 shows, the percentage of people checking the correct elements still remained relatively low.

TABLE 10.5
Rank Ordering of the Insanity Defense Elements Correctly Recognized by Participants

Insanity Defense Element                                                        Percentage of Participants Checking the Item

A. ALI Test
*1. The defendant's acts were a result of mental disease or defect             20%
*2. The defendant lacked substantial capacity to conform his conduct
    to the requirements of the law                                             14.94%
*3. At the time of the offense                                                 14.71%
*4. The defendant lacked substantial capacity to appreciate the
    wrongfulness of his conduct                                                11.95%
 5. The defendant was suffering from a disease of the mind                     11.95%
 6. The defendant was laboring under a defect of reason                         9.89%
 7. The defendant did not know the nature and quality of the act he was doing   8.97%
 8. The defendant did not know that what he was doing was wrong                 7.59%

B. M'Naghten
 1. The defendant's acts were a result of mental disease or defect             15.79%
*2. The defendant was laboring under a defect of reason                        13.73%
*3. The defendant was suffering from a disease of the mind                     13.50%
*4. At the time of the offense                                                 12.59%
*5. The defendant did not know what he was doing was wrong                     12.36%
*6. The defendant did not know the nature and quality of the act he was doing  12.13%
 7. The defendant lacked substantial capacity to appreciate the
    wrongfulness of his conduct                                                10.07%
 8. The defendant lacked substantial capacity to conform his conduct
    to the requirements of the law                                              9.84%

*Indicates that the element is part of the insanity defense standard in question.

Comprehension of Burden of Proof. Results indicated that 63.9% of all participants correctly identified the assignment of the burden of proof. There was no significant difference in correct identification of the burden of proof between participants who had received the ALI standard (65.11%) and those who had received the M'Naghten standard (62.5%).

Comprehension of Standard of Evidence. Overall, 51.59% of participants identified the correct standard of proof. Significant differences occurred between the correctly and incorrectly identified standards of evidence in two experimental conditions. The results for these conditions are presented in Table 10.6. Thus, in the two conditions in which the burden of proof was assigned to the prosecution to prove the defendant's sanity beyond a reasonable doubt, significantly more people correctly identified the standard of proof.

TABLE 10.6
Differences in Comprehension of Standard of Proof

A. ALI-Burden of Proof Assigned to the Prosecution Beyond a Reasonable Doubt

Standard of Proof   Correct        Incorrect       Total
                    N (%)          N (%)           N
BRD                 17 (73.9)       6 (26.1)       23
POE                  2 (8.7)       21 (91.3)       23
CCE                  4 (17.4)      19 (82.6)       23
Total               23 (33.0)      46 (66.7)       69

χ²(5) = 25.96, p < .001

B. M'Naghten-Burden of Proof Assigned to the Prosecution Beyond a Reasonable Doubt

Standard of Proof   Correct        Incorrect       Total
                    N (%)          N (%)           N
BRD                 15 (65.2)       8 (34.8)       23
POE                  0 (0.0)       23 (100.0)      23
CCE                  8 (34.8)      15 (65.2)       23
Total               23 (33.0)      46 (66.7)       69

χ²(5) = 22.04, p < .001

Factors Participants Consider Important When Making Insanity Defense Determinations

Follow-Up Questionnaire 1. This questionnaire also asked participants simply to list all the factors they considered essential when making decisions about whether to find the defendant NGRI. Table 10.7 shows a rank ordering of the factors that participants provided. No significant differences occurred among experimental groups for the items participants regarded as being important when determining whether to find a defendant NGRI.

Inter-rater reliability was obtained in a similar manner as described previously in the comprehension section. This time, a rater read the responses and listed all the factors that participants mentioned as having been important to them. The rater continued reading through all the questionnaires, adding new items to the master list whenever they arose. A second rater read through 50 questionnaires (approximately one fourth) and listed the factors participants mentioned. A Pearson product-moment correlation showed that there was a relatively high degree of inter-rater agreement (r = .79).

TABLE 10.7
Factors Participants Considered Important in Determining Whether a Defendant is Not Guilty By Reason of Insanity

Rank  Factor                                                                    N    %*
 1.   Expert psychiatric testimony                                             77   39
 2.   Defendant's intent to harm                                               73   37
 3.   **Whether the defendant was insane at the time of the offense            62   31
 4.   Background of the defendant                                              51   26
 5.   Defendant's past history of mental illness                               50   25
 6.   **Whether the defendant appreciated the wrongfulness of his actions      32   16
 7.   Defendant's ability to recall events from the time of the offense        30   15
 8.   **Defendant's ability to control his actions at the time of the offense  30   15
 9.   Situation at the scene of the crime                                      19   10
10.   Defendant's own testimony                                                17    9
11.   Offender/Victim relationship                                              8    4
12.   Defendant's remorse                                                       6    3

*Indicates the percentage of participants who identified the particular factor as being important.
**These items are factors of insanity defense standards.

Follow-Up Questionnaire 2. The second follow-up questionnaire provided participants with a random checklist of all the elements from the ALI and M'Naghten insanity defense standards. Participants were asked to check off those items they felt were important in deciding whether a defendant is NGRI. Table 10.8 shows a rank ordering of the insanity defense elements that participants found important. A chi-square analysis revealed that the number of people endorsing an element as important did not vary significantly depending on the insanity standard they received [χ²(7) = 9.68, n.s.]. Although not significant, a Spearman rank-order correlation coefficient does show some degree of relationship between the rank orderings of participants who received the ALI instructions and those who received M'Naghten instructions [r(8) = 0.57, p < .10].

TABLE 10.8
Rank Ordering of the Insanity Defense Elements Participants Considered Important

Insanity Defense Element                                                        N*    %**
1. The defendant's acts were a result of mental disease or defect             172   18.05
2. The defendant was suffering from a disease of the mind                     154   16.16
3. The defendant did not know that what he was doing was wrong                115   12.07
4. The defendant did not know the nature and quality of the act he was doing  112   11.75
5. The defendant lacked substantial capacity to conform his conduct
   to the requirements of the law                                             107   11.23
6. At the time of the offense                                                  98   10.28
7. The defendant lacked substantial capacity to appreciate the
   wrongfulness of his conduct                                                 98   10.28
8. The defendant was laboring under a defect of reason                         97   10.18
                                                                                    100.0%

*Indicates the number of participants who identified the particular item as being important when making a determination about the insanity defense.
**These percentages are based on the total number of factors checked as being important, not the percentage of participants choosing the item.
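The inter-rater agreement figures reported in this section and in the comprehension section (r = .79 and r = .88) are ordinary Pearson product-moment correlations between two raters' scores on the same questionnaires. A minimal sketch, assuming SciPy and using hypothetical scores in place of the unpublished rating data:

```python
# Hypothetical data: number of elements (or factors) each rater credited
# on ten of the same questionnaires; the chapter's raw ratings are not
# reproduced, so these values are placeholders.
from scipy.stats import pearsonr

rater_1 = [3, 2, 4, 1, 3, 2, 5, 0, 2, 3]
rater_2 = [3, 2, 3, 1, 3, 2, 5, 1, 2, 3]

r, p = pearsonr(rater_1, rater_2)
print(f"inter-rater r = {r:.2f}")  # values near 1.0 indicate consistent scoring
```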

GENERAL DISCUSSION

The Impact of Jury Instructions Results from both experiments revealed no significant differences for verdict among experimental conditions. However, participants' verdicts were not simply random: Most people in the first study found the defendant guilty of voluntary manslaughter, and the most popular verdict in the second study was NGRI. The results suggest that mock jurors do not appear to make verdict distinctions based on the insanity instructions with which they are presented. This conclusion is supported by the fact that there were no significant differences regarding verdict choice for participants in the no-instruction

190

OGLOFF

TABLE 10.8 Rank Ordering of the Insanity Defense Elements Participants Considered Important Insanity Defense Elements

I. 2. 3. 4. 5. 6. 7. 8.

The defendant's acts were a result of mental disease or defect The defendant was suffering from a disease of the mind The defendant did not know that what he was doing was wrong The defendant did not know the nature and quality of the act he was doing The defendant lacked substantial capacity to conform his conduct to the requirements of the law At the time of the offense The defendant lacked substantial capacity to appreciate the wrongfulness of his conduct The defendant was laboring under a defect of reason

N*

%**

172

115

18.05 16.16 12.o7

112

11.75

107 98

11.23 10.28

98

10.28 10.18

!54

97

100.0% *Indicates the number of participants who identified the particular item as being important when making a determination about the insanity defense. • *These percentages are based on the total number of factors checked as being important, not the percentage of participants choosing the item.

condition. Thus, participants who were not given insanity defense instructions made verdict choices similar to those of participants who received insanity defense instructions. Similarly, no significant effects were found for the assignment of burden of proof or standard of proof in either study. Therefore, again, there is reason to believe that mock jurors may not focus their verdict decision making on the legal standards alone.

Indeed, as noted earlier, the M'Naghten and ALI standards are identical except for some subtle language differences and the ALI's added volitional prong. Thus, if differences are to occur between findings of NGRI using M'Naghten as compared to ALI standards, the differences will be the result of one or more of three factors: (a) jurors are able to detect and understand the language differences between those elements of M'Naghten and ALI that are similar; (b) jurors place some degree of importance on the differences between the standards when deliberating about the insanity defense; or (c) differences may arise in cases where the defendant has a general cognitive understanding of "right" and "wrong," yet the ability to control his or her behavior is at issue.

Experiment 2 results showed that mock jurors did not remember the insanity defense standard with which they were presented.7 Indeed, the very highest mean percentage of people who were able to correctly recall any element of the defense was only 42.71, and that was for one of the most obvious factors, the "at the time of the offense" requirement. Although these results do not necessarily mean that participants did not understand, or even employ, the standards while deciding the outcome of the case, they do show that the mock jurors did not find the instructions salient enough to remember them.

7 Arguably, participants may have been able to remember the insanity defense if they had been given an opportunity to deliberate, and if they had been given more incentive to process the information in the instructions. Thus, future researchers may want to investigate these points further.

Consideration of the comprehension levels for the ALI elements helps elucidate the lack of effect for insanity defense instructions. Indeed, less than one third of those participants who were provided with ALI instructions recalled the volitional prong of ALI, the element that most differentiates the ALI and M'Naghten standards. Logically, then, one would expect no more than one third of the participants who received the ALI standards to provide different verdicts than they would have provided had they been given the M'Naghten instructions. Further, this statement would be true only for those cases where the defendant's volition is a central issue.

The results obtained from the checklist of insanity elements in Experiment 2 indicate that the elements of the standards are hardly distinguishable by participants. Indeed, when asked to check off those elements that were part of the insanity defense with which they were presented, participants seemed to choose the items virtually at random. Thus, not only are participants rather incapable of recalling the elements of the standards with which they were presented, but they are even less able to correctly differentiate and identify them.

Even if jurors could comprehend the standards well, one would not necessarily expect the standards to play a significant role in their decision making unless they believed the elements of the standards were important to that decision making. Again, however, evidence from Experiment 2 does not support this proposition. When asked to note those factors that were important in deciding whether to find a defendant NGRI, only three insanity defense elements emerged. The first was listed by 31% of participants: "whether the defendant was insane at the time of the offense." Arguably, one would hardly need to receive jury instructions regarding insanity in order to know that, before one can be found not guilty by reason of insanity, one must be "insane." The other two factors that correspond to the insanity defense standards were "whether the defendant appreciated the wrongfulness of his actions" and "the defendant's ability to control his actions." Approximately 15% of the participants chose one of these two factors as being important. Thus, fewer than one out of seven participants felt that crucial elements of the insanity defense standard were important when determining whether to find a defendant NGRI.

To further complicate matters, there was no significant difference between the number of ALI and M'Naghten participants who identified the "defendant's ability to control his actions" as an important factor, even though only the ALI participants were given this element in their insanity defense instructions. This suggests that participants may simply use their visceral instincts when deciding whether a defendant is NGRI.

The concept that jurors may rely on their own "prototypes" or "schemata" when making decisions has been supported by other researchers (Elwork et al., 1977; Severance & Loftus, 1982; Smith, in press). For example, in a line of research, Smith has found that mock jurors may determine the guilt or innocence of a defendant by comparing the characteristics of the defendant's crime to the features of their prototype or schema for the crime. Thus, the defendant is judged guilty when there is sufficient feature overlap between the case a juror is deciding and their prototypical case (Smith, in press).

The results reported here support the concept that mock jurors may use their own schemata when deciding the outcome of a trial. Further, their schema for "legally insane" is surprisingly narrow and concrete. Indeed, only 12 factors were identified as being important when determining whether a defendant is NGRI. Only five factors were endorsed by at least 25% of the subjects: expert psychiatric testimony, the defendant's intent to harm, whether the defendant was insane at the time of the offense, the background of the defendant, and the defendant's past history of mental illness. Thus, the schemata subjects employ appear to be relatively consistent across subjects.

Perhaps more puzzling than the finding that the insanity defense standards do not result in different verdicts is the finding that neither the burden of proof nor the standard of proof appears to have a significant effect on participants' decision making. Indeed, as any lawyer knows, it is much harder to have to prove something than it is to muster enough suspicion for one to develop a level of doubt. Thus, one would intuitively expect the assignment of burden of proof to play a more important role than the results from this study suggested. Similarly, the standard of proof is also very significant from a legal perspective, yet apparently not so for the participants in these studies.

Again, however, the answer to the perplexing question of why the instructions, this time about burden and standard of proof, do not have a significant impact on jurors' decision making may lie in the fact that only 64% of participants correctly identified the burden of proof. Although this figure may seem high, one must remember that because there were only two choices (prosecution and defendant), the percentage of people who correctly identified the burden of proof is really only 14 points higher than chance. Slightly over 50% of participants correctly identified the standard of proof. Although this number is much higher than chance (33%), it is still relatively low considering that almost half the participants did not know the standard of proof in their condition. Further, the results show that the only cells in which significantly more people than not identified the correct burden of proof and standard of proof were those in which the burden of proof was on the prosecution, beyond a reasonable doubt. Almost without a doubt, if you were to ask people who has the burden of proof, and what the standard of proof is, in criminal cases, many would say the state has the burden and the standard of proof is beyond a reasonable doubt.8 Thus, one must question the extent to which the instructions actually influence jurors' decisions.

8 In an informal questionnaire completed by students in two introductory psychology classes I taught, 53% of the 400 students correctly identified the state or prosecution when asked, "In a criminal law case, where a defendant has been charged with a crime, who has the responsibility of proving that the defendant did, or did not, commit the crime?" Similarly, 55% of the students responded with "beyond a reasonable doubt" when asked, "How certain must the judge and/or jury be that the defendant did, or did not, commit the crime in order to find him or her guilty?"

Interestingly, in a recent study, Kagehiro (1990) tested the ability of participants to recall the standard of proof in cases. Participants were presented with the standard of proof either in traditional legal definitions or in quantified definitions (i.e., specifying the percentage of certainty a juror must have to satisfy the burden of proof; e.g., beyond a reasonable doubt means approximately 95% certain). Kagehiro's results suggest that when traditional instructions are used, participants' comprehension rates are strikingly similar to those found in Experiment 2 herein. However, when participants were given quantified definitions of the standard of proof, they were significantly better able to recall it. Thus, there is some evidence to suggest that courts should use quantified definitions of the standard of proof if they want to maximize jurors' comprehension of those instructions.
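The comprehension figures just discussed are easier to interpret against their chance baselines. As a worked restatement of the arithmetic already reported (not an additional analysis), the margin over chance for a question with k response alternatives is

\[ \text{margin} = p_{\text{correct}} - \frac{1}{k}, \]

so the burden-of-proof question (k = 2) yields 0.64 - 0.50 = 0.14, and the standard-of-proof question (k = 3) yields 0.50 - 0.33 \approx 0.17.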

Factors that Mock Jurors Consider Important in Making Decisions Regarding the Insanity Defense

The findings regarding the factors that jurors use when determining whether one is NGRI support the findings of Finkel et al. (Finkel, 1988; Finkel et al., 1985; Finkel & Handel, 1989). Participants reported focusing on a number of important factors when deciding whether to find a defendant NGRI. As Table 10.7 shows, participants found the expert psychiatric testimony to be most important when making their decision. This finding is supported by Steadman and Braff (1983), whose findings show that the controlling factor in NGRI adjudications is whether the pretrial forensic examiner found the NGRI defendant insane. Interestingly, mock jurors also reported relying quite heavily on the defendant's intent to harm. It is unclear whether they actually focused on the defendant's cognitive ability to form intent or on his mens rea.

As noted previously, participants also reported that whether the defendant was insane was important in deciding whether to find him NGRI. These were the most salient factors provided, as indicated by the fact that between 33% and 40% of participants identified them. Participants also focused on the defendant himself. Of importance to them were the defendant's background, his past history of mental illness, his ability to recall events from the offense, his testimony, his relationship with the victim, and his remorse. These factors seem to indicate that participants were particularly concerned with the defendant's veracity. As already mentioned, participants also reported that the defendant's ability to appreciate the wrongfulness of his offense and his ability to control his behavior were important factors in considering whether to find the defendant NGRI.

Overall, participants focused on a number of important factors when reaching their verdicts. The factors tend to cluster around three themes. First, participants attempt to learn whether the defendant was in fact insane at the time of the offense. Second, participants are interested in knowing about the defendant's history and character. Finally, some participants focus on the defendant's cognitive abilities and volitional control. It is interesting to note that, although only a few of these factors are exactly the same as insanity defense elements, there is certainly some level of concordance between the factors participants supplied and the insanity defenses in general. Indeed, one must be insane to be found NGRI, and one must have had either cognitive or volitional difficulties. Given the relatively small number of participants who identified the latter two factors as important, and the fact that these factors were equally important to participants who received ALI or M'Naghten standards, it is apparent that the factors participants find important do not correspond closely with the elements of the insanity defense standards. This suggests that, although participants may not identify the elements of the actual insanity defense standards as important, their responses do not appear to be arbitrary. Indeed, participants identified thoughtful and logical factors as being important when trying to determine whether a defendant is NGRI.

THEORETICAL IMPLICATIONS

The findings presented may have important theoretical implications, providing some support for the contention that, for whatever reason, the particular insanity defense standards employed do not strongly influence a juror's decision making. Thus, any differences that exist between the ALI and M'Naghten standards may be practically meaningless. Assuming that this information is valid and generalizable, jurors likely base their decisions on factors other than the instructions they receive.


As Smith (in press) and others have noted (e.g., Kassin & Wrightsman, 1988; Pfeifer & Ogloff, 1991; Steele & Thornburg, 1988), jurors may base their decision making on pre-existing schemata or prototypes. In addition, these schemata may be difficult to alter. Nonetheless, as Roberts et al. (1987) and Finkel and Duff (1989) reported, changing verdict schema options, by presenting jurors with different cases, results in different verdicts. Similarly, not only can different schemata produce different verdicts, but they can produce verdicts that fit more closely with jurors' ratings and constructs, and verdicts that significantly reduce error variance (Finkel, 1990). Finally, it is unclear whether and to what extent jury deliberation may affect an individual juror's schema.

If jurors are not likely either to remember or to employ traditional insanity defense standards, two options are available: (a) jurisdictions could employ a standard that is more consistent with the way jurors intuitively construe insanity, or (b) attempts could be made to make insanity defense instructions crystal clear to jurors. Although there is not enough empirical information to determine how jurors construe insanity and how they make decisions regarding the insanity defense, the "Justly Responsible" test is one formulation of a test of insanity that is surprisingly consistent with the conclusion that jurors apply their own sense of justice when determining whether to find a defendant NGRI. The American Law Institute considered the Justly Responsible test when developing an insanity defense standard (Hermann & Sor, 1983). The Justly Responsible test holds that defendants are not to be found criminally responsible if their mental impairment is so substantial that they cannot justly be held responsible (Goldstein, 1967).9 As Hermann and Sor (1983) specified: "The theory behind this alternative is that the justly responsible standard encourages the jury to apply its own sense of justice according to accepted community standards for determining whether the defendant ought to be found responsible for a crime" (p. 125). Thus, the Justly Responsible test appears to articulate the very manner in which participants were determining whether to find the defendant NGRI: They were relying on their own schemata of insanity.10 Further, because of the consistency with which participants identified important factors for making their decision, there is some evidence to believe that their decisions reflect the community standard of criminal responsibility.

9 The modified Justly Responsible test was adopted by the Rhode Island Supreme Court in State v. Johnson (1979).

10 In a pilot study that preceded the research performed here, the Justly Responsible test, as employed by Rhode Island (State v. Johnson, 1979), was used as an alternate insanity defense standard. Although the n was low (20), there were no significant differences between findings of NGRI for participants who were given the Justly Responsible test (10%) and those who were given the other insanity defense standards. This provides some support for the contention that, in fact, participants are already employing their own version of the Justly Responsible test.


The objections raised against the Justly Responsible standard suggest that such a test would give juries so much leeway that the result would be grossly varying verdicts in similar cases.11 This criticism is certainly open to empirical review. Based on the findings in the studies here, and many other studies, there is little evidence to suggest that insanity juries are at all "acquittal prone." Thus, it is unlikely that an inappropriately large acquittal rate would occur. Further, as suggested by Finkel's work, there are some surprising similarities among the factors people consider important when deciding an insanity case. Overall, it is not entirely clear whether gross discrepancies would occur among juries and cases if a standard such as the Justly Responsible test were employed.

11 For a discussion of the potential problems with the Justly Responsible test, see U.S. v. Brawner, 471 F.2d 969 (D.C. Cir. 1972).

The second option raised earlier is to develop a method of writing and relaying insanity defense instructions so that they will be clear to jurors. Indeed, if courts want insanity defense instructions to have a greater impact on jurors, they must take steps to increase jurors' understanding of those instructions. For example, courts may wish to employ techniques such as Kagehiro's (1990) idea of giving jurors quantified definitions of the standard of proof, or Elwork et al.'s (1982) and Steele and Thornburg's (1988) techniques of rewriting jury instructions to make them more understandable. Although jury instructions could be made more understandable, the present research suggests that, even if jurors could understand the standards, they may not find their elements important. Therefore, legal scholars and psycholegal scholars may wish to develop an insanity defense standard that more closely reflects those elements that jurors find important when deciding the fate of a defendant when the insanity defense is at issue.

LIMITATIONS OF THE STUDIES AND THE NEED FOR FUTURE RESEARCH

The most apparent problem with mock jury research revolves around the inherent limitations concerning external validity. Criticisms are generally based on the methodological inadequacies of research due to the fact that analogue studies do not replicate all aspects of an actual trial. Indeed, it is impossible to argue that a high degree of external validity exists when one attempts to simulate a real jury operating in a real case by using undergraduates who view a 101-minute videotape, read some instructions, and complete a few questionnaires without deliberation.

The major concerns regarding external validity and simulated jury research have been reviewed elsewhere (Davis, Bray, & Holt, 1977; Gerbasi, Zuckerman, & Reis, 1977; MacCoun, 1990; Pfeifer, 1990; Pfeifer & Ogloff, 1991; Weiten & Diamond, 1979; Wilson & Donnerstein, 1977). First, the responses of college undergraduates acting as "jurors" may not reflect the responses of actual jurors. Also, undergraduate participants may not be nearly as motivated to pay attention to instructions and evidence as jurors would be in an actual case. Second, by focusing on the decisions of individual jurors, we learn little about jury deliberation and jury decisions. Third, because participants are aware that they are in an experiment, they simply may not take their roles as seriously as they would if they were in a real trial. Fourth, critics argue that because analogue studies do not incorporate all aspects of a trial (e.g., jury instructions, deliberations), they do not provide valid information about how a real jury would behave. Finally, there is a question about the generalizability of results obtained from studies relying on only one or a few cases.

Unfortunately, the research in this chapter is flawed by many of the previous criticisms. Indeed, the participants were undergraduates who knew that they were in an experiment, participants did not deliberate, and the verdict choices in Experiment 2 were not those a real jury would have received. In addition, only one case was employed in the experiments. Nonetheless, the findings are important for a variety of reasons.

First, a high level of experimental control was employed in these experiments. Thus, the only factors that were systematically varied among participants were the jury instructions. Therefore, any differences that occurred between participants should have been because of the jury instructions, and the lack of differences may be the result of the instructions' lack of effect on participants. Thus, whereas the results obtained in this study may not translate directly to the "real world," the findings do provide reason to question whether actual jurors can, or do, distinguish among insanity defense standards.

Second, given the major purpose of the experiments conducted herein, to determine whether the use of insanity instructions per se causes differences in participants' decisions regarding a defendant's guilt, highly controlled empirical research is arguably the most suitable research approach. Certainly, archival research is more externally valid, but so many factors must be considered that it becomes important to know the effect of a single variable, such as jury instructions regarding the insanity defense. In designing and conducting the experiments, care was taken to ensure some modicum of external validity. Indeed, these are the only insanity defense studies that have taken into consideration important legal concerns like burden and standard of proof. Further, the jury instructions used herein were comprehensive; participants were not simply given the insanity instructions and asked to render a decision. Thus, the trade-off for a high level of experimental control in these studies was a reduced degree of external validity.


One serious limitation of the studies is that, although the findings were reliable between Experiments 1 and 2, the results could differ when different cases are employed. However, research conducted by Finkel (Finkel, 1988; Finkel et al., 1985; Finkel & Handel, 1989) suggests that, although jurors may arrive at different verdicts for different defendants, their verdicts are not significantly influenced by insanity defense instructions. Similarly, James (1959b) found that jurors' accuracy rate for the insanity defense instructions was lower than that for any of the other material participants were asked to recall. Nonetheless, future researchers will want to test the reliability of the findings obtained herein by using other cases.

Obviously, the results reported here represent only another small step in a course of research that must be conducted if we are ever to learn the true impact that instructions about insanity have on juror- and jury-decision making. Future researchers must strive to enhance the external validity of their research without losing too much control over extraneous variables. Similarly, as noted in the introduction, researchers need to conduct longitudinal archival research using time-series methodology to attempt to determine whether the increases in insanity acquittal rates that have been found in some archival studies are actually caused by changes in the insanity defense.

ACKNOWLEDGMENTS

This chapter was made possible in part by a Social Sciences and Humanities Research Council Research Grant awarded to the author. I would like to thank Gary Melton, Norman Finkel, David Finkelman, and Ronald Roesch for their comments on previous versions of this work. Some of the material in this chapter appeared in Ogloff (1991).

REFERENCES

American Law Institute. (1962). Model penal code. Washington, DC: Author.
American Psychiatric Association. (1982). Statement on the insanity defense. Washington, DC: Author.
Arens, R. (1967). The Durham rule in action: Judicial psychiatry and psychiatric justice. Law and Society Review, 41-80.
Arens, R., Granfield, D. D., & Susman, J. (1965). Jurors, jury charges and insanity. Catholic University Law Review, XIV, 1-29.
Arens, R., & Susman, J. (1966). Judges, jury charges and insanity. Howard Law Journal, 12, 1-34.
Blunt, L. W., & Stock, H. V. (1985). Guilty but mentally ill: An alternative verdict. Behavioral Sciences and the Law, 8, 49-67.
Callahan, L., Mayer, C., & Steadman, H. J. (1987). Insanity defense reform in the United States: Post-Hinckley. Mental and Physical Disability Law Reporter, 11, 54-59.
Davis, J. H., Bray, R. M., & Holt, R. W. (1977). The empirical study of decision processes in juries: A critical review. In J. Tapp & F. Levine (Eds.), Law, justice, and the individual in society (pp. 326-361). New York: Holt, Rinehart & Winston.
Elwork, A., Sales, B. D., & Alfini, J. (1977). Juridic decisions: In ignorance of the law or in light of it? Law and Human Behavior, 1, 163-190.
Elwork, A., Sales, B. D., & Alfini, J. (1982). Making jury instructions understandable. Charlottesville, VA: Michie/Bobbs-Merrill.
Elwork, A., Sales, B. D., & Suggs, D. (1981). The trial: A research review. In B. D. Sales (Ed.), The trial process (pp. 1-68). New York: Plenum Press.
Fingarette, H. (1972). The meaning of criminal insanity. Berkeley: University of California Press.
Fingarette, H., & Hasse, A. F. (1979). Mental disabilities and criminal responsibility. Berkeley: University of California Press.
Finkel, N. J. (1988). Maligning and misconstruing jurors' insanity verdicts: A rebuttal. Forensic Reports, 1, 97-124.
Finkel, N. J. (1989). The Insanity Defense Reform Act of 1984: Much ado about nothing. Behavioral Sciences and the Law, 7, 403-419.
Finkel, N. J. (1990, August). The insanity defense: A comparison of verdict choice options. Paper presented at the Annual Convention of the American Psychological Association, Boston.
Finkel, N. J., & Duff, M. A. (1989). The insanity defense: Giving jurors a third option. Forensic Reports, 2, 235-263.
Finkel, N. J., & Handel, S. F. (1989). How jurors construe "insanity." Law and Human Behavior, 13, 41-59.
Finkel, N. J., Shaw, R., Bercaw, S., & Koch, J. (1985). Insanity defenses: From the jurors' perspective. Law and Psychology Review, 9, 77-92.
Gerbasi, K. L., Zuckerman, M., & Reis, H. T. (1977). Justice needs a new blindfold: A review of mock jury research. Psychological Bulletin, 84, 323-345.
Goldstein, A. (1967). The insanity defense. New Haven, CT: Yale University Press.
Hermann, D. H. J., & Sor, Y. S. (1983). Convicting or confining? Alternative directions in insanity defense reform: Guilty but mentally ill versus new rules for release of insanity acquittees. Brigham Young University Law Review, 499-638.
James, R. M. (1959a). Jurors' assessment of criminal responsibility. Social Problems, 7, 58-67.
James, R. M. (1959b). Status and competence of jurors. American Journal of Sociology, 65, 563-567.
Kagehiro, D. (1990). Defining the standard of proof in jury instructions. Psychological Science, 1, 187-193.
Kassin, S., & Wrightsman, L. (1988). The American jury on trial: Psychological perspectives. New York: Hemisphere.
Keilitz, I. (1987). Researching and reforming the insanity defense. Rutgers Law Review, 39, 289-322.
Low, P. W., Jeffries, J. C., & Bonnie, R. J. (1986). The trial of John W. Hinckley, Jr.: A case study of the insanity defense. Mineola, NY: The Foundation Press.
Luckey, J. W., & Berman, J. J. (1979). Effects of a new commitment law on involuntary admissions and service utilization patterns. Law and Human Behavior, 3, 149-162.
MacCoun, R. J. (1990). Experimental research on jury decision-making. Jurimetrics Journal, 30, 223-233.
McGraw, B. D., Farthing-Capowich, D., & Keilitz, I. (1985). The "guilty but mentally ill" plea and verdict: Current state of the knowledge. Villanova Law Review, 30, 117-191.
Melton, G. B., & Gardner, M. R. (in prep.). Effects of addition of a GBMI verdict. Unpublished research report, University of Nebraska-Lincoln.
Melton, G. B., Petrila, J., Poythress, N., & Slobogin, C. (1987). Psychological evaluations for the courts: A handbook for mental health professionals and lawyers. New York: Guilford.
Mickenberg, I. (1987). A pleasant surprise: The guilty but mentally ill verdict has both succeeded in its own right and successfully preserved the traditional role of the insanity defense. Cincinnati Law Review, 55, 943-996.
Moran, R. (1981). Knowing right from wrong: The insanity defense of Daniel McNaughton. New York: Macmillan, Free Press.
Moran, R. (1985). The modern foundation for the insanity defense: The cases of James Hadfield (1800) and Daniel McNaughton (1843). The Annals of the American Academy of Political and Social Science, 477, 31-42.
National Institute of Justice, U.S. Department of Justice. (1989). Topical bibliography: Insanity defense/competency to stand trial. Washington, DC: Author.
Ogloff, J. R. P. (1991). A comparison of insanity defense standards on juror decision-making. Law and Human Behavior, 15, 509-531.
Ogloff, J. R. P., Schweighofer, A., Turnbull, S., & Whittemore, K. (1992). Empirical research and the insanity defense: How much do we really know? In J. R. P. Ogloff (Ed.), Psychology and law: The broadening of the discipline (pp. 171-210). Durham, NC: Carolina Academic Press.
Pasewark, R. A., & McGinley, H. (1985). Insanity plea: National survey of frequency and success. Journal of Psychiatry and Law, 13, 101-108.
Pasewark, R. A., Randolph, R., & Bieber, S. (1983). Insanity plea: Statutory language and trial procedures. Journal of Psychiatry and Law, 12, 399-422.
Perlin, M. L. (1989). Mental disability law: Civil and criminal. Charlottesville, VA: The Michie Company.
Petrella, R. C., Benedek, E. P., Bank, S. C., & Packer, I. (1985). Examining the application of the guilty but mentally ill verdict in Michigan. Hospital and Community Psychiatry, 36, 254-259.
Pfeifer, J. E. (1990). Juries and racism: Findings of discrimination or discriminatory findings? Nebraska Law Review, 69, 230-250.
Pfeifer, J. E., & Ogloff, J. R. P. (1991). Ambiguity and guilt determinations: A modern racism perspective. Journal of Applied Social Psychology, 21, 1713-1725.
Picquet, D., & Best, R. (1985). The insanity defense: A bibliographic research guide. New York: Harrison.
Roberts, C. F., Golding, S. L., & Fincham, F. D. (1987). Implicit theories of criminal responsibility. Law and Human Behavior, 11, 207-232.
Sales, B. D., & Hafemeister, T. (1984). Empiricism and legal policy on the insanity defense. In L. A. Teplin (Ed.), Mental health and criminal justice (pp. 253-278). Beverly Hills, CA: Sage.
Sauer, R. H., & Mullens, P. M. (1976). The insanity defense: M'Naghten versus ALI. Bulletin of the American Academy of Psychiatry and the Law, 4, 73-75.
Savitsky, J. C., & Lindblom, W. D. (1986). The impact of the Guilty But Mentally Ill verdict on juror decisions: An empirical analysis. Journal of Applied Social Psychology, 16, 686-701.
Severance, L. J., & Loftus, E. F. (1982). Improving the ability of jurors to comprehend and apply criminal jury instructions. Law and Society Review, 17, 153-197.
Simon, R. J. (1967). The jury and the defense of insanity. Boston: Little, Brown.
Simon, R. J., & Aaronson, D. E. (1988). The insanity defense: A critical assessment of law and policy in the post-Hinckley era. New York: Praeger.
Slobogin, C. (1985). The guilty but mentally ill verdict: An idea whose time should not have come. George Washington Law Review, 53, 494-527.
Smith, V. L. (in press). The impact of pre-trial instruction on jurors' information processing and decision making. Journal of Applied Psychology.
Steadman, H. J., & Braff, J. (1983). Defendants not guilty by reason of insanity. In J. Monahan & H. J. Steadman (Eds.), Mentally disordered offenders: Perspectives from law and social science (pp. 109-132). New York: Plenum Press.
Steele, W. W., & Thornburg, E. G. (1988). Jury instructions: A persistent failure to communicate. North Carolina Law Review, 67, 77-119.
Weiten, W., & Diamond, S. S. (1979). A critical review of the jury simulation paradigm: The case of defendant characteristics. Law and Human Behavior, 3, 71-94.
Wilson, D. W., & Donnerstein, E. (1977). Guilty or not guilty? A look at the "simulated" juror paradigm. Journal of Applied Social Psychology, 7, 175-190.

CASES

Commonwealth v. Vogel, 440 Pa. 1, 2, 268 A.2d 89, 90 (1970).
Commonwealth v. York, 50 Mass. 93 (1845).
Durham v. United States, 214 F.2d 862 (D.C. Cir. 1954).
In re Winship, 397 U.S. 358 (1970).
LaFollette v. Raskin, 34 Wis.2d 607, 150 N.W.2d 318 (1967).
Regina v. M'Naghten, 10 Cl. and F. 200, 8 Eng. Rep. 718 (1843).
Schleisner v. State, 58 Wis.2d 605, 207 N.W.2d 636 (1967).
State v. Johnson, 121 R.I. 254, 399 A.2d 469 (1979).
U.S. v. Brawner, 471 F.2d 969 (D.C. Cir. 1972).
U.S. v. Hinckley, 525 F. Supp. 1342 (D.C. 1981).

CHAPTER 11

RESEARCH ON JURY DECISION MAKING: THE STATE OF THE SCIENCE

William C. Thompson
University of California, Irvine

Psychologists who are interested in individual and group decision making have long looked with fascination at juries. These small, ad hoc groups of lay individuals perform a crucial role in the legal system: They are the ultimate trier of fact in most criminal and many civil trials. Jury decisions not only determine the outcome of cases that go to trial, they also set the parameters within which the outcomes of many more cases are negotiated. Plea bargains and out-of-court settlements are reached in the shadow of the jury, as litigants balance their hopes and fears of what a jury might decide. Because so much is at stake in jury trials, it seems important to know whether jury decisions are accurate, reasonable, and fair. Indeed, the importance of the jury in the legal system illustrates the importance of understanding the strengths and limitations of decision making by small ad hoc groups.

It is not surprising, then, that a literature has developed on jury decision making (Erlanger, 1970; Gerbasi, Zuckerman, & Reis, 1977; Hans & Vidmar, 1986; Hastie, Penrod, & Pennington, 1983; MacCoun, 1989; Saks & Hastie, 1978). An important early study (Kalven & Zeisel, 1966) surveyed trial judges, asking them to describe and evaluate the outcome of jury trials that had occurred in their courts. Inferences about jury performance were drawn by comparing the decisions of juries in several thousand cases with the decision that the judges said they would have made. This approach is somewhat analogous to the comparison of "lay" and "expert" judgments in other domains. Other researchers have conducted post-trial interviews with jurors, trying to infer the basis for jury decisions from self-reports of the participants (Moran & Comfort, 1986; Zeisel, 1968). By far the most common approach, however, has been the experimental jury simulation study. In simulation studies, subjects are asked to play the role of jurors and to review and make decisions on simulated evidence under conditions systematically varied by the experimenter. The simulated evidence can range from brief written descriptions of facts to videotapes of elaborate and realistic recreations of trials. Although there are obvious strengths and weaknesses of each methodology, a convergence of findings across studies and across methods can be quite convincing.

The study of jury decision making is often mentioned as a prime area in which psychological research can contribute to the development of effective legal procedures (Saks & Hastie, 1978; Wrightsman, 1991). Whether this literature has, in fact, contributed significantly to the legal system has been subject to debate (Haney, 1980; Konecni & Ebbesen, 1979, 1981; Loh, 1981). Although there are some notable exceptions, much of the research on jury decision making is rarely if ever cited by courts or legal commentators and appears to have had little influence in the legal arena. Part of the problem is that many studies in the area answer questions lawyers consider obvious, trivial, unimportant, or uninteresting (Vidmar, 1979). Simulation studies, in particular, have often been motivated by a desire to test hypotheses of theoretical interest to psychologists rather than to answer questions of practical interest to the legal system. The finding that physically attractive defendants fare better in simulated jury trials than those who are ugly (e.g., Sigall & Ostrove, 1975) caused little stir among legal commentators. Apart from being obvious, such findings have limited influence in the legal system because victims of such a bias have no legally cognizable claim that their constitutional rights have been violated (and thus no reason to present such research in court), and because it is difficult to see how legal procedures might be changed to solve the problem. Such findings may, of course, be quite interesting to social psychologists studying person perception if they somehow advance theory in the area. Quite often, however, such studies have done little more than test the existence of well-known phenomena using a "jury paradigm," and thus have had limited influence in psychology as well as law.

Additionally, concerns have arisen about the generalizability of the results of many simulation studies. Simulations are frequently faulted for failure to take into account potentially important moderating and mediating variables (Bermant, McGuire, McKinley, & Salo, 1974; Konecni & Ebbesen, 1981, 1982; Pennington & Hastie, 1981; Vidmar, 1979). In fairness to researchers, it must be noted that jury trials are highly complex events, in which so many variables are potentially important that it is impossible to test the generalizability of phenomena across all states of all potentially important variables. Nevertheless, there has been a tendency for researchers to overgeneralize; sweeping conclusions about all jury trials are sometimes drawn from findings that may occur only in specific factual contexts. Even the best simulations, which have highly realistic stimulus materials, often examine reactions of simulated jurors to a single case, leaving readers to wonder whether the phenomena they document will generalize to other types of cases, or even to similar cases in which the facts are varied.

To make matters worse, the cases used in simulation studies are sometimes unrealistic. Researchers motivated by theoretical rather than legal concerns have often been cavalier about how accurately they simulated the legal system because the verisimilitude of their legal "simulations" was not terribly important to them. Telling subjects they were to act as jurors was often little more than a cover story used to put them in an appropriate frame of mind. To legal scholars, however, it sometimes appeared that the studies were simulating a legal world that existed more in psychologists' imaginations than in reality. The credibility of the field as a whole has been damaged by researchers who have made sweeping and misleading generalizations about the real legal system based on such findings.

Lawyers began taking jury simulation studies more seriously when researchers began addressing questions that lawyers themselves had defined as important, using studies that simulated the legal world in a thoughtful, legally sophisticated manner. A milestone was reached in 1978 when the U.S. Supreme Court, in Ballew v. Georgia, appeared to rely heavily on jury simulation research to reach the conclusion that juries of fewer than six persons are unconstitutional (see generally, Saks & Hastie, 1978, pp. 75-83).1 Research on the effects of death-qualifying capital juries was also taken very seriously by several appellate courts (see generally, Thompson, 1989a). Although the U.S. Supreme Court, in Lockhart v. McCree (1986), ultimately declined to make new constitutional law based on the findings of this research, the detailed attention given the simulation studies in the Court's analysis helped establish the importance of such research in the legal arena. Ballew and McCree thus reflect a coming of age of research on jury decision making, particularly jury simulation studies, in the legal community. Although the volume of jury simulation research has tapered off over the last 10 years, the current research is markedly more interesting, significant, and legally relevant than much of what went before. The accompanying chapters in this section are generally consistent with this shift from quantity to quality, and illustrate some important trends in the field.

1 For a contrary view, see Loh (1981, p. 340), who argues that the Supreme Court used the jury research "the way a drunk uses a lamppost: for support rather than illumination."


HOW JUDGES' INSTRUCTIONS AFFECT VERDICTS IN CASES INVOLVING AN INSANITY DEFENSE

The chapter by Ogloff (this volume), discussing how jurors' decisions are affected by judges' instructions on the insanity defense, contributes to a broad literature on the effects of judges' instructions on juries (e.g., Elwork, Sales, & Alfini, 1977; Kassin & Wrightsman, 1979). A major concern of psychologists who have studied jury decision making is whether jurors follow judges' instructions. Although the jury is the ultimate trier of fact, it is the judge who determines what laws are applicable to the case and instructs the jury on the law. The task of the jury, then, is twofold: first to evaluate the facts and then to apply the law to the facts. In a homicide case, for example, the judge's instructions may tell the jury the elements of various crimes (first-degree murder, second-degree murder, manslaughter) and defenses (e.g., self-defense) that might be relevant. The jury's role is to decide what happened in the case at hand and to fit those facts into the appropriate legal category. Useful models of this process, with strong grounding in empirical research, have been developed by Pennington and Hastie (1991, 1992).

In this context, a key issue is whether jurors comprehend judges' instructions. The law that judges must explain to jurors is often quite complex, featuring unfamiliar terminology (e.g., proximate cause), subtle distinctions between legal categories, and terms with legal definitions that may deviate from common usage (e.g., malice; Elwork, Sales, & Alfini, 1982). Because verdicts can easily be reversed on appeal if the judge's instructions are legally inaccurate (but are almost never challenged on grounds that jurors misunderstood the law), the primary concern of judges is that the instructions be accurate, not that they be easy to comprehend. Consequently, instructions are typically cobbled together and read to jurors word-for-word from form books of language previously approved by appellate courts. Whether jurors fully understand these form instructions is an obvious and important issue addressed by a number of studies, including those reported herein by Ogloff.2

2 A related concern, of course, is jurors' willingness (and, in some instances, their ability) to follow judges' instructions. It is widely recognized that juries sometimes "nullify" unpopular laws through their reluctance to convict violators. Juries have been both celebrated and vilified for doing so (Kadish & Kadish, 1971). Because juries are not required to justify their decisions, subtle acts of civil disobedience by the jury are difficult to detect and difficult to distinguish from simple failure to comprehend instructions.

Ogloff examined judges' instructions in criminal cases involving an insanity defense. The legal standards determining whether a defendant may be found not guilty by reason of insanity are complex and have been controversial for well over a century (Bazelon, 1974). Indeed, some commentators have noted the striking similarity of the debate that arose over the insanity defense following M'Naghten's acquittal in the 1840s and the debate that followed John Hinckley's acquittal in 1982 (Moran, 1985). As Ogloff noted, Hinckley's acquittal provided political impetus to efforts to revise insanity standards in a number of states.3 The changes in the law concerned not only the legal definition of insanity but the party with the burden of proof and the standard of proof. The question addressed by Ogloff is what effect such changes have on the juries that actually decide insanity cases.

3 A number of commentators attributed Hinckley's acquittal to the requirement, under District of Columbia law at that time, that the prosecution prove the defendant's sanity beyond a reasonable doubt. As Ogloff noted, there was a move following Hinckley's acquittal to shift the burden, requiring the defendant to prove he was not legally sane.

This is a question of broad interest in the legal community. The answer is important not only to mental health professionals, who are concerned with the outcome of criminal cases involving a mental defense, but also to law and society scholars, who are concerned more broadly with the connection between the law on the books and the law in practice. It also is likely to interest legal scholars concerned with the appropriateness of jury verdicts and experts in judgment and decision making interested in how lay individuals apply complex rules when making decisions.

Does the change in the law on the books really affect the way the law is applied by the jury? Ogloff noted correctly that previous research has left this issue murky. A number of archival studies, correlating rates of NGI verdicts with changes in the law, have suggested the legal changes may have significant effects. But these correlational studies have too many uncontrolled variables to allow confident conclusions about causal relationships. On the other hand, a number of jury simulation studies (which have featured tight experimental control of the relevant variables) have found that judges' instructions have little effect. But Ogloff argued that these studies are insufficiently realistic to instill confidence.

To this inconclusive literature, Ogloff contributed two additional jury simulation studies. Simulated jurors watched a realistic reenactment of a criminal trial in which the defense was insanity, heard legally accurate judge's instructions, and individually rendered verdicts. They were also tested for their memory and comprehension of the judge's instructions. The content of the judge's instructions was varied experimentally between subjects such that the various conditions reflected the full range of possible instructions that jurors in the real world might hear. Ogloff found no differences across conditions in the rate of NGI verdicts, and found poor memory for and comprehension of the judge's instructions. Ogloff suggested that his subjects failed to understand the judge's instructions; they decided the case based on their own schemata or notions of what is just rather than on the law as presented by the judge. Thus, the variation in judges' instructions from condition to condition had no statistically detectable effect on jurors' verdicts. Based on these findings, he concluded that there is little connection between the legal standard propounded by the judge and jury decisions, and thus, more broadly, that efforts to change legal standards regarding the insanity defense have little practical effect.

Although Ogloff's conclusions are plausible, they are not adequately supported by his findings and must be viewed as somewhat speculative at this point. Ogloff claims, for example, that subjects' failure to comprehend the judges' instructions explains the failure of the instructions to affect verdicts. This conclusion would be more convincing if it could be shown that the variation in judges' instructions did affect verdicts among subjects who understood them. Future studies might include conditions in which subjects receive idealized judges' instructions that are sufficiently thorough and lucid to assure comprehension. The use of such conditions would help distinguish failure to comprehend the judge's instructions from unwillingness or inability to follow them as a causal factor.

The most important limitation of Ogloff's studies, however, is that they involve judgments in a single case (and one that is not fully described). This limitation is important because the effects of substantive legal rules on jury decisions in insanity cases may well vary depending on the nature of the case. One variation that Ogloff examined, for example, was whether jurors were given the traditional M'Naghten standard or the standard recommended by the American Law Institute (ALI standard). These two standards are quite similar with regard to the so-called cognitive elements of insanity. Under both standards, someone who, as a result of a mental disease or defect, did not understand the wrongfulness of his conduct should be found not guilty. So one would expect similar jury decisions under the two standards where the defendant's claim is that his mental illness rendered him unable to appreciate the wrongfulness of his actions. Where the M'Naghten and ALI standards differ sharply is on the so-called volitional element of insanity. Under ALI, but not under M'Naghten, people who, as a result of mental illness, are "unable to conform their conduct to the requirements of the law" are excused. So one would expect different jury decisions under the two standards where the defendant's claim is that his mental illness rendered him unable to control himself. To gain a full account of the effects of legal standards on jury decisions, one should ideally look at a range of cases, including at least one case in which a strong normative argument can be made that the instructions should make a difference.4 The studies must, of course, have adequate statistical power to reliably detect differences of the expected size.

4 The importance of normative analysis of the problems posed in jury simulation studies is elaborated later.
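The power point deserves emphasis. As a rough guide, the standard two-proportion sample-size formula can be applied; the NGI rates, alpha, and power used here are hypothetical values chosen only for illustration, not figures from Ogloff's studies:

\[ n \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\left[\,p_1(1-p_1) + p_2(1-p_2)\,\right]}{(p_1 - p_2)^2}. \]

For example, detecting a shift in NGI rates from \(p_1 = .30\) to \(p_2 = .45\) with two-tailed \(\alpha = .05\) and power of .80 (so \(z\)-values of 1.96 and 0.84) would require roughly \(7.84 \times .4575 / .0225 \approx 160\) mock jurors per instruction condition; smaller expected differences drive the requirement up rapidly.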

The strong point of Ogloff's studies is that they address a truly important question and that they make an obvious effort to achieve legal realism.5 Although the studies leave many questions unanswered and provide inadequate support for Ogloff's more speculative conclusions, they are nevertheless a useful contribution to the literature. It is useful to know, for example, that Ogloff's subjects apparently failed to appreciate points of major importance in the judges' instructions, such as which side bore the burden of proof on the issue of insanity. Although the failure to find significant differences in verdicts across conditions is open to multiple interpretations, subjects' apparently limited comprehension of the details of the instructions must surely give pause to anyone thinking that a legislative change in the wording of the insanity standards will necessarily influence the outcome of jury trials.

5 The studies employed an apparently realistic videotaped reenactment of an actual trial and realistic judge's instructions. Although one might quibble with the failure to include group deliberation and the use of individual judgments, rather than group verdicts, as the primary dependent variable, there is, in any simulation study, a need to balance verisimilitude against available resources. In most jury simulation studies, group verdict can be predicted with good accuracy from the kind of individual judgments Ogloff obtained.

THE NORMATIVE STATUS OF BASE RATES AT TRIAL

In recent years, there has been a rapid increase in the use of quantitative evidence in jury trials (DeGroot, Fienberg, & Kadane, 1986; Fienberg, 1989; Kaye, 1987, 1990; Thompson, 1989b). For example, market share data are often presented in antitrust cases; statistical projections of lost income are presented in personal injury and wrongful death cases; epidemiological data are presented in toxic tort cases; data on hiring and retention rates are presented in employment discrimination cases; and data on the error rate of forensic tests are presented in criminal trials. The growing use of quantitative data in court goes hand in hand with courts' increasing reliance on the testimony of experts, whose conclusions are often based upon (and must be evaluated in light of) statistical data of various kinds (Saks & Van Duizend, 1983).

Quite often, statistics are presented in the form of base rates (Thompson, 1989b). However, the use of base rate statistics in court has been controversial. In some cases, courts have held base rate statistics inadmissible on grounds that they lack probative value; in other cases, they have rejected base rate statistics based on concerns that such evidence will be misused by juries. Empirical studies on jurors' reactions to base rate statistics have only recently begun to appear (for reviews see Kaye & Koehler, 1991; Thompson, 1989b). Given the confusion in the legal community over jurors' ability to use such evidence appropriately, such studies are greatly needed.

The chapter by Koehler (this volume) makes a number of important points about the use of base rates at trial. Koehler does not present jury research per se; rather, he offers a discussion of normative issues surrounding the use of base rates in court. He tells us not how base rates are used, but how they should be used. This discussion will be of great importance to those interested in evaluating jurors' use of base rate evidence. In order to determine whether jurors are using base rate evidence appropriately, one must have a standard of appropriateness against which to compare their performance. Koehler's incisive analysis helps provide such a benchmark. Koehler's work in this chapter and elsewhere (Koehler & Shaviro, 1990) is also helpful, of course, to appellate courts and legal scholars interested in analyzing the legal relevance and probative value of base rate evidence.

The heart of Koehler's chapter is his analysis of four arguments against the use of base rate evidence in court. It is worth noting that Koehler is the first scholar to classify attacks on base rate statistics so neatly and accurately. The underlying literature on which Koehler draws is both extensive and confused. A reader is confronted, at first glance, with dozens of complex arguments and involved analyses. Koehler distills this commentary to its essence and finds, quite rightly, four distinct "skeptical arguments." The identification and delineation of these arguments is an important service in itself. Koehler's concise analysis of each argument makes this chapter even more helpful. The clear distinctions drawn between concerns about the probative value of base rate evidence and policy-related concerns about the use of the evidence in court are particularly valuable.

When reading Koehler's chapter, it is important to realize that base rate statistics can be used in court to prove a fact in two distinct ways (Thompson, 1989b). In some instances, the base rate is directly relevant to a target outcome in that it directly expresses the frequency of that outcome. When a pedestrian is struck by a bus of unknown origin, evidence that a particular company operated 90% of the buses in that city is directly relevant to the question of who owned the bus. Similarly, when a man possessing heroin is charged with concealing an illegally imported narcotic, evidence that 98% of the heroin is illegally imported is directly relevant to the question of whether the heroin possessed by the defendant was illegally imported. In such instances, the base rate establishes the prior probability of the target outcome.6 If 90% of the buses that could have been involved in the accident are owned by a particular company, there is a prior probability of .90 that that company owned the offending bus.

6 The prior probability of a target outcome is the probability that a reasonable person would assign to that outcome prior to receiving any case-specific or individuating information.
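In Bayesian terms (a gloss on the bus example rather than notation drawn from Koehler's chapter), a directly relevant base rate supplies the prior probability, which any additional individuating evidence E then updates by Bayes' rule:

\[ P(A \mid E) = \frac{P(E \mid A)\,(.90)}{P(E \mid A)(.90) + P(E \mid \bar{A})(.10)}, \]

where A is the event that the company in question owned the offending bus. With no individuating evidence at all, the assessment simply remains at the base rate of .90.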

In other instances, the base rate is only indirectly relevant to a target outcome and must be combined with other information before any probabilistic assessment of the target outcome is possible. When forensic tests link a criminal defendant to a crime by showing his blood type matches that of the perpetrator, evidence that the blood type is found in 5% of the population is relevant to the ultimate issue of the defendant's guilt, but only indirectly. The base rate of the blood type does not, by itself, reveal anything about the likelihood of the target outcome (the defendant's guilt) and thus, unlike a directly relevant base rate, does not establish a prior probability of the target outcome. Instead, it speaks to the likelihood that the defendant might, by chance, have a "matching" blood type if innocent, and thus helps to establish the value of the forensic evidence.
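The distinction can be made concrete with the standard likelihood-ratio treatment of a forensic match (an illustration consistent with the text, using a hypothetical prior and assuming the test always reports a match when the defendant truly is the source):

\[ \frac{P(\text{match} \mid \text{source})}{P(\text{match} \mid \text{not source})} = \frac{1}{.05} = 20. \]

The 5% base rate thus multiplies the prior odds that the defendant is the source by 20, whatever those odds are: a prior of 1:9 becomes 20:9 (a probability of about .69), whereas a prior of 1:99 becomes only 20:99 (about .17). The base rate fixes the strength of the evidence, not the probability of guilt itself.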

The use of "directly relevant" base rates in court has been controversial, particularly where the base rate is the sole evidence of a target outcome (Kaye, 1987). Base rates of this kind have been labeled "naked statistical evidence" (Kaye, 1982) and have generally been held inadmissible, although a few courts have admitted such evidence.7 The most widely discussed case involving "naked statistical evidence" is Smith v. Rapid Transit (1945), in which the plaintiff was struck by a hit-and-run bus and based her claim that the bus was the defendant's solely on evidence that the defendant operated 90% of the buses in the city. The Massachusetts Supreme Court sustained the defendant's motion for summary judgment on grounds that the base rate statistic was insufficient to make a case against the defendant in the absence of some more particularized proof of the identity of the offending bus. Koehler's analysis suggests that the holding in Smith might be justified by policy concerns about finding liability based solely on "naked statistics" but cannot be justified on grounds that the evidence is insufficiently probative.

7 E.g., Turner v. U.S. (1970); Sindell v. Abbott Labs (1980).

Koehler offers no insights into how jurors actually respond to naked statistics, but other researchers have recently begun to look into that question (Hikida & Thompson, 1992; Wells, 1992, in press). Wells found, for example, that people are reluctant to hold a defendant liable in cases similar to Smith, where the plaintiff relies solely on naked statistical evidence, but they are quite willing to hold the defendant liable in cases in which the plaintiff presents evidence of identical probative value involving "indirectly relevant" base rates. Subjects who received the "naked" statistical evidence and subjects who received the "indirectly relevant" base rates gave comparable estimates of the probability that the defendant was responsible, but the latter group was far more willing to base a legal judgment on the evidence. This intriguing finding certainly warrants further exploration.

The use in trials of "indirectly relevant" base rates has been more common and less controversial (Kaye, 1987; Thompson, 1989b). For example, where the perpetrator and defendant are shown to have the same blood type, statistics on the percentage of the population possessing a given blood type are routinely admitted in most states. Statistics have also been admitted in connection with forensic evidence showing a match between samples of hair, glass, paint, fibers, particles, and teeth marks (Thompson, 1989b). Most of the research on jurors' reactions to base rate statistics has focused on "indirectly relevant" base rates of this type (for reviews see Kaye & Koehler, 1991; Thompson, 1989b).

JURORS' EVALUATION OF HEARSAY EVIDENCE

The rules of evidence, which control what information the jury can and cannot receive, have two purposes. First, they help screen out irrelevant information that could needlessly prolong the trial without contributing to the resolution of the issue at hand. Quite often, evidence is ruled inadmissible because it is deemed insufficiently diagnostic to be worth the trouble of hearing (Lempert, 1977). Second, the rules help screen out information that is prejudicial, that is, information that the jury is so likely to misuse or draw wrong conclusions from that the factfinding process would be more accurate without it8 (Kaplan, 1968). To accomplish these purposes, the law has developed an elaborate thicket of rules (see generally, Lempert & Saltzburg, 1977).

8 Whether evidence is relevant is a normative issue (i.e., it concerns the weight a decision maker should give the evidence). Whether evidence is prejudicial is a psychological issue; it requires a judgment of whether jurors are likely to give the evidence the weight it deserves.

One of the more tangled parts of this thicket is the set of rules concerning hearsay. Hearsay is evidence about what a person said outside of court that is introduced to prove the truth of what the person said. Suppose a witness testifies as follows: "Joe told me it was raining." If this testimony were offered in court to prove that it was raining, it would be hearsay.9 The person whose statement is quoted by the witness, in this case Joe, is called the declarant.

9 It would not be hearsay if it were introduced to prove something other than the fact asserted by Joe. For example, if it were introduced to prove Joe could talk, or that Joe was alive (rather than dead), it would not be hearsay.

Courts have traditionally viewed hearsay with some skepticism, fearing that it will be overvalued by jurors and therefore be prejudicial (Lempert & Saltzburg, 1977). The fear arises in part from historic cases in which miscarriages of justice were seen to arise from the use of hearsay evidence. But the fear is primarily grounded in a psychological intuition: Courts have traditionally assumed, without much empirical basis, that jurors fail to fully appreciate the unreliability of hearsay testimony (Lempert & Saltzburg, 1977). With hearsay there are two levels of uncertainty: there is the possibility the declarant was lying or mistaken, as well as the possibility the witness is lying or mistaken about what the declarant said. And the first possibility is unusually difficult to assess because the declarant, who is not present in court, cannot be cross-examined to test for bias or reveal hidden sources of error.

court, cannot be cross-examined to test for bias or reveal hidden sources of error. Because it is assumed that jurors will fail to discount adequately for these limitations, there is a general rule excluding hearsay testimony. Like most evidentiary rules, however, the hearsay exclusion is subject to multiple exceptions designed to allow such testimony where special circumstances exist that help assure that it is reliable or where strong practical considerations favor its use (Lempert & Salzburg, 1977). The chapter by Miene, Borgida, and Park (this volume), initiates an empirical examination of the fundamental assumption underlying the hearsay rule-that jurors fail to discount adequately for the uncertainty surrounding hearsay. Research in this area is particularly timely and important because a controversy has arisen in the legal community about the continuing viability of some traditional hearsay rules (Park, 1987). One attack on hearsay exclusions is being mounted by groups concerned with protecting vulnerable witnesses, particularly children, from the trauma of testifying in court (Goodman & Helgeson, 1985). They would like to make it easier for prosecutors to present hearsay testimony by parents, teachers, or social workers about the out-of-court statements of alleged child abuse victims. In many such cases, if the jury could hear second-hand reports regarding the child's statements, it would not be necessary to have the child testify. To this end, a number of states have recently passed or have pending legislation expanding exceptions to the hearsay rule, thereby broadening the circumstances in which hearsay testimony may reach the jury. In deciding just how liberal to be in allowing hearsay testimony, conscientious legislators must, of course, consider how such evidence will affect the accuracy of fact-finding by juries. It is on this question that research like that initiated by Miene et al. will be helpful. Such research is also likely to be helpful to appellate courts as they analyze the constitutionality of new hearsay rules. Regardless of what legislators may wish to allow, the use of hearsay is limited, to some extent, by the U.S. Constitution. The confrontation clause of the Sixth Amendment of the U.S. Constitution states: "In all criminal prosecutions, the accused shall enjoy the right ... to be confronted with the witnesses against him." The U.S. Supreme Court has consistently held that the confrontation clause is not an absolute bar to hearsay: "[T]he Clause permits, where necessary, the admission of certain hearsay statements against a defendant despite the defendant's inability to confront the declarant at trial" (Maryland v. Craig, 1990). However, at present, there is considerable uncertainty about what types of hearsay are and are not allowable under the confrontation clause, and the law in this area appears to be evolving. As courts struggle with the application of the confrontation clause in hearsay cases, they will need to consider the importance of the interests it protects and thus will inevitably face the question

214

THOMPSON

This question, of course, concerns the ability of jurors to draw appropriate conclusions from hearsay evidence. The study reported by Miene et al. is a small first step toward answering this important question. The goal of the study was to determine whether simulated jurors discount hearsay evidence when deciding the guilt of a criminal defendant. In one condition jurors received an eyewitness's description of what he saw. In another condition, jurors received the same description of what the eyewitness saw, but it was recounted by a hearsay witness who had heard the eyewitness describe it outside court rather than by the eyewitness himself. Conviction rates were significantly lower in the hearsay condition, which indicates that subjects discounted the hearsay testimony relative to direct testimony from the eyewitness.

According to Miene et al., this finding suggests that jurors are sensitive to the unreliability of hearsay and discount for it when making decisions based on hearsay evidence. Although the finding is certainly consistent with this interpretation, it is a rather slender reed on which to hang a conclusion that traditional concerns about the use of hearsay in jury trials are overblown. Hearsay evidence can serve a variety of purposes in a trial and can arise in a variety of factual contexts. The way in which jurors respond to it may well be context dependent. Hence, additional research involving a broader range of fact patterns is needed to test the generality of these findings.

Another difficulty in evaluating such findings is the lack of a normative standard against which to compare jurors' performance. A design like that employed by Miene et al. can provide some evidence of whether jurors discount hearsay or not, but it cannot answer the key legal question of whether they respond appropriately to the evidence. Do they discount as much as they should? Inferences drawn from hearsay evidence are, of course, complex because they involve multiple sources of uncertainty. Nevertheless, it should be possible to develop normative models that could be used as a benchmark for human judgment. The cascaded inference models developed by Schum and his colleagues might be quite helpful in this area (Schum, 1977; Schum & Martin, 1982). The techniques used by Schum and Martin (1982) to cast light on people's sensitivity to such evidentiary subtleties as partial redundancy of evidence may well help cast light on people's sensitivity to the sources of uncertainty underlying hearsay.

Another approach is to test people's inferences about the reliability of hearsay evidence under conditions where the actual reliability of the evidence can be objectively verified. For example, one recent study simulated the conditions giving rise to trial-like hearsay and eyewitness testimony (Kovera, Park, & Penrod, 1992). Two observers watched a staged event. One observer, acting as an eyewitness, reported what he had seen directly to mock jurors. The other observer, acting as a declarant, described what he had seen to a second individual (the hearsay witness), who, in turn, recounted the conversation to the mock jurors. By varying the amount of time that passed between the time the event was observed and the time the observers described it, the researchers manipulated the accuracy of the eyewitness, hearsay witness, and the declarant in a manner that could be objectively verified (longer delays led to less accurate reports). A major goal of the study was to test mock jurors' sensitivity to this variation.

The study found that jurors were insensitive to the time-delay variation when evaluating the eyewitness; that is, they gave as much weight to the statements of inaccurate eyewitnesses (long delay) as to accurate eyewitnesses (short delay). By contrast, jurors were sensitive to the time-delay variations when evaluating the testimony of the hearsay witness; that is, they gave more weight to the hearsay testimony under conditions where the declarant and hearsay witness were more accurate (short delays) than under conditions where they were less accurate (long delays).

The advantage of this approach is that it allows researchers to compare jurors' evaluations of evidence with objective information on the accuracy of the evidence, and thus to draw conclusions about whether jurors are responding to variations in the evidence as they should. Although this approach requires rather elaborate simulations, it is one possible way to answer the important question of whether jurors' responses are normatively appropriate.
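To suggest what such a normative benchmark might look like, the sketch below gives a deliberately simplified cascaded-inference calculation. It is our own toy formalization, not Schum and Martin's model: testimony is treated as a chain of links (the declarant and, for hearsay, the hearsay witness), each link accurate with some assumed reliability, and a broken link is assumed to produce pure noise. All numerical values are invented for illustration.

from math import prod

def posterior_given_report(prior, reliabilities, noise=0.5):
    # Probability of the event given a positive report that passed through
    # a chain of links (declarant, hearsay witness, ...). Each link is
    # accurate with its stated reliability; if any link fails, the report
    # is pure noise: a "yes" with probability `noise` either way.
    chain_ok = prod(reliabilities)
    p_yes_if_event = chain_ok + (1 - chain_ok) * noise
    p_yes_if_no_event = (1 - chain_ok) * noise
    numerator = p_yes_if_event * prior
    return numerator / (numerator + p_yes_if_no_event * (1 - prior))

# Direct testimony is a one-link chain; hearsay adds a second link.
print(posterior_given_report(0.5, [0.9]))       # direct:  ~0.950
print(posterior_given_report(0.5, [0.9, 0.9]))  # hearsay: ~0.905

Under these assumptions, an ideal juror's posterior drops from .950 for direct testimony to .905 for the two-link hearsay chain. It is the size of this gap, not merely the presence of some discount, that designs like Miene et al.'s would need a normative model to evaluate.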

CONCLUSION

The chapters on jury decision making in this volume are a fair reflection of the "state of the science" in jury simulation research. All three chapters focus on issues of great importance in the legal system. They reflect the increasing legal sophistication of researchers in the field and the gradual shift in their focus from questions primarily of interest to psychologists to questions of direct importance to the legal community. There is no evidence in these studies of the kind of "simulation psychosis" that led some earlier researchers to simulate a highly unrealistic legal world. Although the studies vary in how elaborately they recreate trial evidence, the simulations are legally accurate; the issues that the simulated jurors must resolve are the issues faced by real jurors. These are studies that legal professionals will find interesting.

Like many studies on jury decision making, however, those reported in this volume are limited by their use of single-case scenarios. It is important to be cautious about generalizing from such findings until it can be determined how robust they are across different fact patterns. Although the tendency of researchers to overgeneralize such findings is natural and, perhaps, inevitable, they should take care to distinguish what is demonstrated by their data from what might plausibly be inferred from it.

The study by Miene et al. highlights the need, in this area, for better normative analysis. It will be difficult to assess whether jury decisions are appropriate without normative standards against which to compare them. In this regard, Koehler's chapter, offering a normative analysis of base rate evidence, is particularly helpful. If we are to fully understand the strengths and limitations of jury decision making, an analysis of how juries should decide cases and research on how they do decide cases must proceed hand in hand.

REFERENCES

Bazelon, D. L. (1974). Psychiatrists and the adversary process. Scientific American, 230, 18-23.
Bermant, G., McGuire, M., McKinley, W., & Salo, C. (1974). The logic of simulation in jury research. Criminal Justice and Behavior, 1, 224-233.
DeGroot, M. H., Fienberg, S. E., & Kadane, J. B. (Eds.). (1986). Statistics and the law. New York: Wiley.
Elwork, A., Sales, B. D., & Alfini, J. (1977). Juridic decisions: In ignorance of the law or in light of it? Law and Human Behavior, 1, 163-189.
Elwork, A., Sales, B. D., & Alfini, J. (1982). Making jury instructions understandable. Charlottesville, VA: Michie/Bobbs-Merrill.
Erlanger, H. A. (1970). Jury research in America: Its past and future. Law and Society Review, 4, 345-370.
Fienberg, S. E. (Ed.). (1989). The evolving role of statistical assessments as evidence in the courts. New York: Springer-Verlag.
Gerbasi, K. C., Zuckerman, M., & Reis, H. T. (1977). Justice needs a new blindfold: A review of mock jury research. Psychological Bulletin, 84, 323-345.
Goodman, G. S., & Helgeson, V. S. (1985). Child sexual assault: Children's memory and the law. University of Miami Law Review, 40, 181-225.
Haney, C. (1980). Psychology and legal change: On the limits of a factual jurisprudence. Law and Human Behavior, 4, 147-199.
Hans, V. P., & Vidmar, N. (1986). Judging the jury. New York: Plenum Press.
Hastie, R., Penrod, S. D., & Pennington, N. (1983). Inside the jury. Cambridge, MA: Harvard University Press.
Hikida, R. H., & Thompson, W. C. (1992). Base rate statistics and liability judgments: "Naked" statistics exposed. Paper presented at the meeting of the Western Psychological Association, Portland, OR.
Kadish, M. R., & Kadish, S. H. (1971). The institutionalization of conflict: Jury acquittals. Journal of Social Issues, 27(2), 199-218.
Kalven, H., & Zeisel, H. (1966). The American jury. Boston: Little, Brown.
Kaplan, J. (1968). Decision theory and the factfinding process. Stanford Law Review, 20, 1065-1100.
Kassin, S. M., & Wrightsman, L. S. (1979). On the requirements of proof: The timing of judicial instructions and mock juror verdicts. Journal of Personality and Social Psychology, 37, 1877-1887.
Kaye, D. H. (1982). The limits of the preponderance of the evidence standard: Justifiably naked statistical evidence and multiple causation. American Bar Foundation Research Journal, 2, 487-501.

Kaye, D. H. (1987). The admissibility of "probability evidence" in criminal trials-Part II. Jurimetrics, 27, 160-180.
Kaye, D. H. (1990). Review essay: Improving legal statistics. Law and Society Review, 24, 1255-1275.
Kaye, D. H., & Koehler, J. J. (1991). Can jurors understand probabilistic evidence? Journal of the Royal Statistical Society, 154(1), 75-81.
Koehler, J. J., & Shaviro, D. (1990). Veridical verdicts: Increasing verdict accuracy through the use of overtly probabilistic evidence and methods. Cornell Law Review, 75, 247-279.
Konecni, V. J., & Ebbesen, E. B. (1979). External validity of research in legal psychology. Law and Human Behavior, 3, 39-70.
Konecni, V. J., & Ebbesen, E. B. (1981). A critique of theory and method in social-psychological approaches to legal issues. In B. D. Sales (Ed.), The trial process (pp. 481-498). New York: Plenum Press.
Konecni, V. J., & Ebbesen, E. B. (1982). Social psychology and the law: The choice of research problems, settings, and methodology. In V. J. Konecni & E. B. Ebbesen (Eds.), The criminal justice system: A social psychological analysis. San Francisco: W. H. Freeman.
Kovera, M. B., Park, R. C., & Penrod, S. D. (1992). Jurors' perceptions of eyewitness and hearsay evidence. Minnesota Law Review, 76, 703-722.
Lempert, R. L. (1977). Modeling relevance. Michigan Law Review, 75, 1021-1075.
Lempert, R. L., & Salzburg, S. (1977). A modern approach to evidence. St. Paul, MN: West.
Loh, W. (1981). Perspectives on psychology and law. Journal of Applied Social Psychology, 11, 314-345.
MacCoun, R. J. (1989). Experimental research on jury decision-making. Science, 244, 1046-1050.
Moran, G., & Comfort, A. (1986). Neither "tentative" nor "fragmentary": Verdict preference of impaneled felony jurors as a function of attitude toward capital punishment. Journal of Applied Psychology, 71, 146-156.
Moran, R. (1985). The modern foundation for the insanity defense. The Annals of the American Academy of Political and Social Science, 477, 31-42.
Park, R. (1987). A subject matter approach to hearsay reform. Michigan Law Review, 86, 51-122.
Pennington, N., & Hastie, R. (1981). Juror decision-making models: The generalization gap. Psychological Bulletin, 89, 246-287.
Pennington, N., & Hastie, R. (1991). A cognitive theory of juror decision making: The Story Model. Cardozo Law Review, 13, 5001-5039.
Pennington, N., & Hastie, R. (1992). Explaining the evidence: Tests of the Story Model for juror decision making. Journal of Personality and Social Psychology, 62, 189-206.
Saks, M. J., & Hastie, R. (1978). Social psychology in court. New York: Van Nostrand-Reinhold.
Saks, M. J., & Van Duizend, R. (1983). The use of scientific evidence in litigation. Williamsburg, VA: National Center for State Courts.
Schum, D. A. (1977). The behavioral richness of cascaded inference models: Examples in jurisprudence. In N. J. Castellan, Jr., D. B. Pisoni, & G. Potts (Eds.), Cognitive theory (Vol. 2, pp. 149-159). Hillsdale, NJ: Lawrence Erlbaum Associates.
Schum, D. A., & Martin, A. W. (1982). Formal and empirical research on cascaded inference in jurisprudence. Law and Society Review, 17, 105-151.
Sigall, H., & Ostrove, N. (1975). Beautiful but dangerous: Effects of offender attractiveness and nature of crime on juridic judgment. Journal of Personality and Social Psychology, 31, 410-414.
Thompson, W. C. (1989a). Death qualification after Wainwright v. Witt and Lockhart v. McCree. Law and Human Behavior, 13, 185-215.
Thompson, W. C. (1989b). Are juries competent to evaluate statistical evidence? Law and Contemporary Problems, 52(4), 9-41.
Vidmar, N. (1979). The other issues in jury simulation research: A commentary with particular reference to defendant character studies. Law and Human Behavior, 3, 95-106.


Wells, G. L. (1992). Naked statistical evidence of liability: Is subjective probability enough? Paper presented at the meeting of the American Psychology-Law Society, San Diego, CA.
Wells, G. L. (in press). Naked statistical evidence of liability: Is subjective probability enough? Journal of Personality and Social Psychology.
Wrightsman, L. (1991). Psychology and the legal system (2nd ed.). Pacific Grove, CA: Brooks/Cole.
Zeisel, H. (1968). Some data on juror attitudes toward capital punishment (Monograph). Center for Studies in Criminal Justice, University of Chicago Law School.

LEGAL CASES

Ballew v. Georgia, 435 U.S. 223 (1978).
Lockhart v. McCree, 106 S.Ct. 1758 (1986).
Maryland v. Craig, 110 S.Ct. 3157 (1990).
Sindell v. Abbott Labs, 26 Cal.3d 588, 607 P.2d 924, 163 Cal.Rptr. 132 (1980).
Smith v. Rapid Transit, 317 Mass. 469, 58 N.E.2d 754 (1945).
Turner v. U.S., 396 U.S. 398, 414-16, reh'g denied, 397 U.S. 958 (1970).

PART IV

NATURALISTIC GROUP DECISION MAKING

CHAPTER 12

SHARED MENTAL MODELS IN EXPERT TEAM DECISION MAKING*

Janis A. Cannon-Bowers and Eduardo Salas
Naval Training Systems Center, Orlando, Florida

Sharolyn Converse
North Carolina State University

Critical performance in many complex systems depends on the coordinated activity of a team of individuals. Cockpit crews, surgery teams, fire fighting teams, and military teams are all examples of teams that operate in situations where ineffective performance can have disastrous consequences. Such teams are comprised of individuals who have high degrees of expertise in particular areas, requiring that information contributed from different team members converge in support of critical decisions. Furthermore, the decision environments in which expert teams must operate are often characterized by severe time pressure; complex, multicomponent decision tasks; rapidly evolving and changing information; high short-term memory demands; and high information ambiguity (Orasanu, 1990; Orasanu & Salas, in press).

Efforts to understand and improve team performance have been ongoing for over 50 years, yet relatively little is known about how to train teams or manage team performance (Salas, Blaiwes, Reynolds, Glickman, & Morgan, 1985; Salas, Dickinson, Converse, & Tannenbaum, 1992). Recently, several authors have suggested that team performance and team decision making can be understood in terms of shared mental models of the task and team (Cannon-Bowers & Salas, 1990; Kleinman & Serfaty, 1989; Orasanu & Salas, in press). According to this position, effective team performance requires that team members hold common or overlapping cognitive representations of task requirements, procedures, and role responsibilities.

*The views expressed herein are those of the authors and do not reflect the official positions of the organization with which they are affiliated.

The purpose of this chapter is to describe how the notion of shared mental models can advance our understanding of teamwork and team decision making, and to delineate the implications of adopting such a position. In order to accomplish this, several relevant bodies of research are reviewed, including literature regarding team performance, team decision making, and mental model theory. Following this, a case is made that, when teamwork is conceptualized in terms of shared mental models, it provides an effective means to understand this rather elusive phenomenon. Finally, the implications of adopting the shared mental model perspective in terms of team decision making and training for team decision making are discussed.

TEAMWORK AND TEAM DECISION MAKING

Defining Terms

Before presenting a review of the literature on team performance, it is first necessary to define relevant terms. In the current context, a team is defined in a manner consistent with Dyer (1984), Morgan, Glickman, Woodard, Blaiwes, and Salas (1986), and Orasanu and Salas (in press) as: a group of two or more individuals who must interact cooperatively and adaptively in pursuit of shared valued objectives. Further, team members have clearly defined, differentiated roles and responsibilities, hold task-relevant knowledge, and are interdependent (i.e., must rely on one another in order to accomplish goals). Given this definition, teams can be distinguished from groups, in which members are homogeneous with respect to expertise, roles, and responsibilities (Klaus & Glaser, 1968; Orasanu & Salas, in press).

Team decision making (TDM) in this chapter refers to a team process that involves gathering, processing, integrating, and communicating information in support of arriving at a task-relevant decision. TDM does not require that a consensus is reached among members, nor does it suggest that all team members are involved in all aspects of the decision. Instead, TDM requires that team members process and filter "raw" data, apply individual expertise, communicate relevant information, and (often) make recommendations to other members. In hierarchically structured teams where final decision authority is retained by a single individual, the team functions to provide the decision maker with assessments and information that are crucial to the situation.

Several important implications of this definition of TDM should be noted. First, TDM is a process; that is, it is a set of ongoing activities in which demand on particular individuals can vary according to momentary task demands.

Second, the quality of TDM is dependent on the ability of the team to function effectively as a coherent unit; that is, effective TDM requires effective teamwork. Third, as suggested by Orasanu and Salas (in press), TDM tasks require the application of various classes of expertise by individual team members and often are beyond the capability of a single individual. Fourth, given the nature of TDM, individual task competence (i.e., knowledge, skills, and abilities relevant to an individual member's task or role) is a necessary but not sufficient condition for effective TDM. In addition, Orasanu (1990) noted that in the operational environment TDM often is embedded in a larger task; that is, the goal of the team is to accomplish a task, rather than simply to make a decision.

Given these definitions, we contend that the study of TDM must focus on understanding how teams function (how they interact, coordinate, communicate, exchange information, and adapt) and on explicating the nature of teamwork and teamwork skills. The following sections provide a brief review of current research in this area.

Research into Team Performance

Many researchers have suggested that teams are a critical component of modern American industry (Cummings, 1981; Hackman & Morris, 1975; Sundstrom, DeMeuse, & Futrell, 1990). A similar conclusion can be drawn for the military and public sectors. It is not surprising, therefore, that teams have been the subject of countless investigations over the past few decades (Cannon-Bowers, Salas, & Converse, 1990). However, despite considerable theorizing and research, relatively little is known about either the nature of teamwork or how best to train teams to perform effectively (Hackman, 1987; Salas et al., in press; Tannenbaum, Beard, & Salas, in press). In particular, past research has done little to identify specific teamwork skills or to investigate how teams acquire, maintain, and lose these critical skills (Dyer, 1984).

One of the problems with earlier team performance research (and a widely lamented criticism of it) is that confusion and misspecification have existed in what is meant by a team. Specifically, past researchers have frequently used the labels team and group (as defined earlier) interchangeably, rendering the applicability and generalizability of results questionable (Orasanu & Salas, in press). Recently, however, a series of studies conducted with military command and control teams and aircrews has made significant progress in understanding team performance (Glickman et al., 1987; Oser, McCallum, Salas, & Morgan, 1989; Stout, Cannon-Bowers, Salas, & Morgan, 1990). Defining a team as we do previously, Glickman et al. (1987) found that two separate tracks of behavior evolve during team training. The "taskwork" track involves skills that are related to the execution of the task and/or mission (e.g., operating equipment, following procedures, etc.). The second track, called the "teamwork" track, involves skills that are related to functioning effectively as a team member.

"teamwork" track, involves skills that are related to functioning effectively as a team member. In summarizing the overall results of this and related work, the following conclusions can be drawn: (a) Behaviors that are related specifically to team functioning (i.e., independent of the particular task at hand) are important to task outcomes (e.g., Oser et al., 1989; Stout et al., 1990); (b) effective teamwork behavior appears to be fairly consistent across tasks (e.g., Glickman et al., 1987; Oser et al., 1989); (c) team process variables (e.g., communication, coordination, compensatory behavior) influence team effectiveness (e.g., Stout et al., 1990). Mcintyre, Morgan, Salas, and Glickman (1988) recently summarized these and related studies in order to identify the common teamwork skills exhibited by successful team members. They concluded that teamwork appears to be comprised of a set of behaviors including: closed-loop communication, compensatory behavior, mutual performance monitoring, giving and receiving feedback, adaptability, and coordination of activity. Further, Mcintyre et al. (1988) suggested that in effective teams members seem to be able to predict the behavior and needs of other members. Kleinman and his associates (e.g., Kleinman, Luh, Pattipati, & Serfaty, 1992; Kleinman & Serfaty, 1989; Kohn, Kleinman, & Serfaty, 1987) also have studied team performance, but in a simulated rather than an operational command and control task. Among other things, this work revealed that teams under high workload conditions maintain performance by relying on "implicit" coordination strategies because opportunities for overt communication are restricted (Kleinman & Serfaty, I 989). The research just cited supports the contention that crucial teamwork behaviors can be isolated from other task-related behaviors. However, a more detailed understanding of team performance and its impact on TOM requires that the general behavioral dimensions of teamwork be decomposed into requisite knowledge, skills, and abilities (KSAs) and defined in terms of more specific behaviors. For several classes of teamwork behavior such as communication, giving and receiving feedback, and mutual performance monitoring, KSA development seems to be fairly straightforward. Because these categories of behavior are relatively observable, their relationship to overall TOM quality and team performance can be specified. In fact, a number of studies have investigated the relationship between team performance and communication behavior (e.g., Foushee & Manos, 1981; Lanzetta & Roby, I 960; Oser, Prince, & Morgan, 1990). It is in this area of defining and training skills associated with coordination of action and adaptability that things become more complicated. This is due to the fact that these skills appear to involve the ability of team members to predict the needs of the task and anticipate the actions of other team members in order to adjust their behavior accordingly. Further, effective coordination requires that team members understand when particular behaviors

225

12. SHARED MENTAL MODELS

In order to explain these more elusive aspects of teamwork behavior, several researchers have suggested that team members may hold shared or common mental models of the task and team (Cannon-Bowers, Salas, & Converse, 1990; Kleinman & Serfaty, 1989; McIntyre et al., 1988; Orasanu, 1990; Orasanu & Salas, in press). Before detailing and expanding this view, the mental model construct and supporting research must be reviewed; this is done in the following sections.

MENTAL MODELS

The notion of "mental models" has been invoked as an explanatory mechanism in a variety of disciplines over the years (see Wilson & Rutherford, 1989). In the area of skilled performance and system control, for example, Veldhuyzen and Stassen (1977) suggested that a mental model of a system consists of knowledge about the system under control (i.e., its structure, overall behavior, etc.); knowledge about disturbances that act on the system and the manner in which these affect the system; and knowledge about the task that must be performed in relation to the system. These internal representations of complex systems allow operators to interact successfully with the system by making explicit the nature and relationship of system components. In cognitive psychology and cognitive science, researchers have suggested that mental models are important to the more general understanding of how humans interact and cope with the world (Rouse & Morris, 1986). In fact, the notion of a schema (i.e., an organized knowledge structure that includes objects, situations and events, and the relationships between them) has been a common means to explain human behavior for many years (e.g., Anderson, 1977; Bobrow & Norman, 197 5; Rumelhart & Ortony, 1977). Schemata have been hypothesized to be important to the selection, abstraction, interpretation, and integration of information (Alba & Hasher, 1983). With respect to the mental model construct, several theorists have maintained that mental models allow people to predict and explain system behavior, and to recognize and remember the relationship between system components and events (see Wilson & Rutherford, 1989). Wickens (1984) contended further that mental models provide a source of people's expectations. In an even more comprehensive view, Johnson-Laird (1983) suggested that people "understand the world by constructing working models of it in their mind" (p. 10). Mental models enable people to draw inferences and make predictions, to understand phenomena, to decide what actions to take, to control system execution, and to experience events vicariously (Johnson-Laird, 1983).

226

CANNON-BOWERS, SALAS, CONVERSE

For our purposes, we define a "mental model" in this chapter as suggested by Rouse and Morris (1986). They offered a general definition of a mental model as a "mechanism whereby humans generate descriptions of system purpose and form, explanations of system functioning and observed system states, and predictions of future system states" (p. 360). This definition appears to capture the essential characteristics of mental models as defined and used in other areas.

Purposes of Mental Models

Most researchers agree that mental models perform a number of important functions that allow people to interact effectively with their environment. According to Veldhuyzen and Stassen (1977), for example, mental models perform several functions with respect to manual system control. Specifically, mental models allow an operator to estimate all variables necessary to control the system, to adopt strategies necessary to control the system, to select proper control actions in relation to the defined strategy, to evaluate the potential of control actions that have been initiated, and to understand what happens as the task is being executed (Veldhuyzen & Stassen, 1977).

From a more purely cognitive standpoint, the mental model construct assumes that people organize knowledge into structured, meaningful patterns that are stored in memory (Johnson-Laird, 1983; Rouse & Morris, 1986). These patterns contain several classes of information: concepts, features, and the relationships between concepts and features (Rips, Shoben, & Smith, 1973). Relationships among concepts and features can be defined on several bases; for example, they can be expressed in terms of time, cause and effect, or categorical membership (Collins & Loftus, 1975). The organization of knowledge into structured patterns (i.e., mental models) enables people to process information in a rapid and flexible manner and underlies complex cognitive functioning (Rumelhart & Ortony, 1977). When information is retrieved from memory, related information becomes more easily accessible. As such, mental models provide a heuristic function by allowing information about situations, objects, and environments to be classified and retrieved in terms of their most salient and important features. This is particularly useful when rapid comprehension and response are required.

In their review, Rouse and Morris (1986) concluded that a number of common themes can be drawn among theories that describe the purpose of mental models; namely, that mental models serve to help people describe, explain, and predict system behavior. It also must be noted that, in contrast to the construct of a "schema," most theorists conceptualize mental models as more than simple mental images. Instead, mental models are manipulable, enabling people to predict system states via mental manipulation of model parameters (see Johnson-Laird, 1983, for a detailed description of mental model functioning).

For example, Klein (1989) has suggested that expert decision makers engage in a mental simulation that allows them to predict the ramifications of a potential decision prior to taking action. Wilson and Rutherford (1989) and de Kleer and Brown (1981, 1983) refer to this ability as "running" a mental model. These theorists maintain that running a mental model enables people to create causal event links so as to anticipate the specific outcomes that will result given particular inputs to the model.

The mental model construct has also been used as a means to evaluate the nature of an operator's knowledge of complex system performance, and as a basis to analyze effective and ineffective performance (Jagacinski & Miller, 1978; Sanderson, 1989). The unique contribution of mental model theory in this regard is that it helps researchers to understand how operators can adapt to changing conditions, sequence task inputs, and recognize the impact of a single behavior on the overall system. Further, the mental model construct can be used as a basis on which to design complex systems that require human control, to design instruction, or as a basis to evaluate training effectiveness (Moore & Gordon, 1988). In fact, Rouse and Morris (1986) contended that the most fruitful use of the mental model construct may be as a means of solving applied problems such as developing training strategies, or improving system design.

In sum, the mental model construct has been invoked frequently by psychologists and engineers to explain human cognitive functioning and human-system performance. Whereas many questions remain regarding the exact form and nature of mental models (see Rouse & Morris, 1986), they offer a powerful explanatory mechanism for understanding complex performance. In the following sections, a theory of shared mental models is delineated as a means to explain the nature of coordinated team performance and team decision making.
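The idea of "running" a mental model has a natural computational reading, which the toy sketch below is meant to suggest. It is our own illustrative analogy, not a model drawn from the literature cited above: the mental model is represented as a state-transition function, and running it means simulating forward from a hypothetical input to generate an expectation before acting. The console scenario and all names in it are invented for illustration.

# A toy analogy (ours) for "running" a mental model: the model is a causal
# transition function, and running it forward yields a predicted outcome
# that can be compared against what actually happens.

def run_mental_model(transition, state, actions):
    # Mentally simulate a sequence of actions; return the predicted end state.
    for action in actions:
        state = transition(state, action)
    return state

# A deliberately simple "equipment model" of a radar console: it can track
# only a fixed number of contacts, and accepting a contact when full fails.
def console_model(state, action):
    tracked, capacity = state
    if action == "accept_contact" and tracked < capacity:
        tracked += 1
    return (tracked, capacity)

# Running the model ahead of time lets an operator anticipate saturation
# (8 of 8 tracks in use) before committing to accept three new contacts.
print(run_mental_model(console_model, (7, 8), ["accept_contact"] * 3))  # (8, 8)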

SHARED MENTAL MODELS IN TDM

Specification of knowledge, skills, and abilities for effective team performance is necessary to enhance our understanding of TDM. With respect to several classes of teamwork behavior (coordinating action, adapting to changing task conditions, and anticipating the needs of the task and team), KSA specification is difficult. This is because these activities require team members to predict future events (with respect to task or team requirements) in order to sequence, time, and adjust their behavior appropriately. Moreover, the bases of these behaviors are difficult to define in observable terms; that is, whereas the outcome of an accurate prediction can be observed (e.g., a team member provides appropriate information to another member without being asked for it), the process by which the team member arrives at the prediction (i.e., anticipates the need) cannot be observed. Rather, effective team members appear to draw on an internal knowledge base that allows them to decide which behaviors are required, and when and how to execute them.

Perhaps the most parsimonious explanation of how teams coordinate, adapt, and predict is in terms of mutual expectations (Cannon-Bowers & Salas, 1990; Cream, Eggemeier, & Klein, 1978; Gabarro, 1990; Vreuls & Obermayer, 1985). When a novel event is encountered, teams that cannot strategize overtly must rely on existing expectations regarding the task and team in order to decide what action to take. The role of mental models in explaining team performance, then, stems from their ability to provide a set of organized expectations for performance from which accurate, timely predictions can be drawn (Cannon-Bowers & Salas, 1990; Rouse, Cannon-Bowers, & Salas, in press). From this, it can be hypothesized that team effectiveness is a function of the compatibility of expectations generated from team members' mental models. This line of thinking expands Orasanu and Salas' (in press) definition of shared mental models as "organized knowledge that is shared by team members" (p. 8) by suggesting that in addition to shared knowledge team members must hold shared expectations that are generated from this knowledge.

Recently, Rouse et al. (in press) have added an additional proviso: that shared explanations of task and team performance are also crucial to team performance. They argue that shared mental models enable team members to draw accurate explanations about system and task performance. Specifically, shared mental models help team members to explain what is happening during task performance (in terms of task demands and teammate behavior), and to arrive at common explanations across members. These common or shared explanations in turn lead to development of shared expectations of task and team demands.

In summary, shared mental models are defined here as knowledge structures held by members of a team that enable them to form accurate explanations and expectations for the task, and, in turn, to coordinate their actions and adapt their behavior to demands of the task and other team members.

Past Work in Shared Mental Models

A number of researchers have hypothesized the existence and importance of shared mental models in teams. As early as 1934, Mead maintained that "complex cooperative activity" is only possible if each team member can direct his or her behavior according to shared notions of task processes and activities (Mead, 1934). Hammond (1965) concluded that members of ineffective problem-solving teams utilized stimulus cues differently as a function of different mental models of the problem-solving task.

In their review of command and control decision making, Wohl, Entin, Kleinman, and Pattipati (1984) hypothesized that a team must have a "mutual" mental model of the co-functioning of team members. As noted earlier, Kleinman and Serfaty (1989) used the notion of shared mental models to explain the results of an investigation of simulated tactical decision making. They concluded that team members under high workload conditions exercised mutual (or shared) mental models, allowing them to coordinate implicitly (i.e., without overt communication). Athens (1982) hypothesized that frequent communication among military commanders allowed them to develop mutual mental models, providing a common view of the environment. Further, Athens maintained that the increased similarity among commanders' mental models would lead to improved communication and coordination.

Orasanu (1990) used the term shared mental model to refer to common models of the problem or situation. According to this view, team members must develop a shared understanding of the situation during emergencies, including definition of the problem, plans and strategies for solving the problem, interpretation of cues and information, and roles and responsibilities of participants (Orasanu, 1990). These "shared situation" or "shared problem" models include common understanding of the problem, goals, information cues, strategies, and member roles, all of which are grounded in the team's more general models of the task and team. They provide a context in which communication can be interpreted, and a basis for predicting the behavior and needs of other members (Orasanu & Salas, in press).

In an analysis of cockpit crew behavior, Orasanu (1990) found support for these hypotheses regarding the role of shared problem models in TDM. Briefly, effective crews in her study displayed communication patterns that were different from those displayed by ineffective crews (although the total amount of communication did not differ). The nature of these differences led Orasanu to conclude that effective crews were building shared models of the situation that enhanced their performance. For example, effective crews were more likely to articulate plans and strategies for coping with emergent situations, and to assign responsibilities to crew members (Orasanu & Salas, in press).

Several other formulations are also similar to the shared mental model hypothesis. These include Klein and Thordsen's (1989) construct of "team mind" and Wegner's (1987) construct of "transactive memory." Both of these positions maintain that teams can be conceptualized as unified information-processing units, analogous in some ways to the individual mind. In fact, Klein and Thordsen (1989) hypothesized the existence of team-level constructs including team (collective) consciousness and preconscious, as well as team memory.


Empirical Evidence of Shared Mental Models

Very few studies have tested directly the hypothesis that shared mental models enhance TDM performance, although (indirect) evidence for existence of the phenomenon can be found in the team performance literature. Several studies have already been described in this regard: Kleinman and Serfaty (1989), Orasanu (1990), and Hammond (1965). Others include a study by Cream (1974), who demonstrated that accurate expectations regarding other team members' functional responsibilities were important to team effectiveness. Oser et al. (1990) found that the teamwork behavior of "offering information before it was requested" was related to team effectiveness in military command and control teams. In an early study, Hemphill and Rush (1952) concluded that, when team members shared knowledge about their functions and role responsibilities, effectiveness was enhanced. Finally, Foushee, Lauber, Baetge, and Acomb (1986) suggested that the effects of fatigue on aircrew performance could be overcome by shared experience. These researchers found that, over time, crews were able to develop interaction patterns that enabled them to perform effectively, even when fatigued (Foushee et al., 1986).

In one of the attempts to test the shared mental model hypothesis directly, Brehmer (1972) trained two-member teams to have similar mental models of a simulated tactical decision-making task. Results indicated that shared mental models were insufficient to maintain performance under stressful conditions. He concluded that, in addition to shared mental models of the task, team members must hold a common assessment of what actions are required to meet task demands (Brehmer, 1972). Under the current formulation, we would maintain that "common assessment of actions" is part of the shared mental model, not an addition to it as suggested by Brehmer. We would question, therefore, whether subjects in Brehmer's study actually developed shared mental models.

Adelman, Zirk, Lehner, Moffett, and Hall (1986) also provided a more direct test of shared mental model propositions. Using a tactical decision-making task, these researchers manipulated the extent of overlap in team members' mental models but were unable to demonstrate a significant relationship between the extent of overlap and overall team performance (Adelman et al., 1986). However, in creating shared mental models, they manipulated the nature of information given to team members regarding only one facet of the task environment (i.e., the sector from which attack was most likely to occur). Our conceptualization of shared mental models would suggest that this was an insufficient manipulation of shared mental models because it would not provide an adequate basis on which team member expectations could be derived.
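Studies of this kind presuppose some way of quantifying "extent of overlap." The sketch below is one hedged illustration of our own; it is neither Adelman et al.'s manipulation nor a validated instrument, and all rules and values in it are invented. It contrasts overlap in declarative knowledge, indexed by set similarity over elicited knowledge elements, with the compatibility this chapter emphasizes: agreement between the expectations two members generate for the same scenarios.

# Illustrative metrics only (ours): two crude ways to quantify how much
# two team members' mental models "overlap."

def knowledge_overlap(elements_a, elements_b):
    # Jaccard similarity over sets of elicited knowledge elements (0 to 1).
    a, b = set(elements_a), set(elements_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def expectation_agreement(model_a, model_b, scenarios):
    # Fraction of scenarios for which two members predict the same response.
    return sum(model_a(s) == model_b(s) for s in scenarios) / len(scenarios)

def member_a(s):
    # Hypothetical rule: illuminate any closing contact.
    return "illuminate" if s["closing"] else "monitor"

def member_b(s):
    # Hypothetical rule: illuminate only fast closing contacts.
    return "illuminate" if s["closing"] and s["fast"] else "monitor"

scenarios = [{"closing": c, "fast": f} for c in (True, False) for f in (True, False)]
print(knowledge_overlap({"radar", "ROE"}, {"radar", "sonar"}))      # ~0.33
print(expectation_agreement(member_a, member_b, scenarios))         # 0.75

On the chapter's argument, it is the second quantity (compatibility of generated expectations), rather than raw knowledge overlap, that should predict coordinated performance.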

The last two investigations cited (Adelman et al., 1986; Brehmer, 1972) lead us to raise an important question; namely, what should be the content of the mental models shared by team members? More specifically, we must ask, what is the nature and extent of information that must be shared by team members in order to allow them to form accurate explanations and expectations of the task and team? Whereas the answer to this question is ultimately an empirical one, we attempt to shed some light on it by analyzing in more detail the nature of TDM performance and incorporating the notion of multiple mental models into the discussion.

Multiple Mental Models

Several theorists working in the area of mental models have hypothesized that there may be different types of mental models. For example, Young (1983) described several types of mental models. He delineated in detail the surrogate model, a mental model that is a representation of the components of a device and the relationship among those components. A second type of mental model described by Young, the task-action mapping model, contains information that relates actions to outcomes so that an operator can predict the likely outcome of various actions (Young, 1983). In other formulations, Rasmussen (1979) and Rouse and Morris (1986) offered taxonomies of mental models. According to Rouse and Morris (1986), the various types of mental models delineated by these taxonomies all have value, depending on the questions being asked about a system. Further, they maintain that when a system is used in different ways, multiple models of it are likely to be developed (Rouse & Morris, 1986).

In fact, the significance of hypothesizing multiple mental models is that the appropriateness of a given model is likely to depend on the task that is being accomplished (Gentner & Gentner, 1983; Mayer & Greeno, 1972; Young, 1981). For example, Young (1981, 1983) maintained that task-action mapping models of a device (described before) will be most useful if a person is trying to use the device to accomplish a particular task. Furthermore, there is evidence to support Rouse and Morris' (1986) contention that people actually hold several mental models of a system (Stevens & Collins, 1980; Wilson & Rutherford, 1989). Stevens and Collins (1980) found, for example, that subjects employed more than one of four possible mental models when answering questions about the causes of rainfall. These results prompted Stevens and Collins to conclude that when interacting with complex systems people must know when to use each of the various models they hold, and also how the various models are related to one another.

Applying the notion of multiple mental models to teams, it seems reasonable to hypothesize that team members may hold multiple models of the task and team. Furthermore, the content and format of these various models will likely make them more or less useful in generating common expectations of task and team behavior.

An example may help to illustrate this contention. One of the tasks facing a team of operators in a Navy tactical decision-making task is to defend the ship against hostile aircraft. Briefly, this task is accomplished by a team whose members must operate sensor consoles to detect aircraft, integrate and exchange pertinent situation assessment information regarding the aircraft's intent, transmit information to key decision makers, and take action based on the aircraft's believed intent. Typically, such tasks occur under several adverse situational conditions such as high workload, ambiguity, severe time pressure, and threat; all conditions that militate against explicit coordination strategies.

To be effective in such a situation, it seems reasonable to hypothesize that team members must understand the system at several levels. First, they must understand the dynamics and control of the equipment with which they are interacting to extract information. Second, they must understand the task and how to accomplish it (i.e., the significance of information, what information is needed, how information must be combined, required procedures, etc.). They must also understand how various facets of the environment affect the task and task demands; for example, when workload increases as a function of air traffic in the area, or when radar reception is affected by weather conditions. Third, they must understand their role in the task, that is, their particular contribution, how they must interact with other team members, who requires particular types of information, and so forth. Related to this, they must also know when to monitor their teammates' behavior, when to step in and help a fellow member who is overloaded, and when to change his or her behavior in response to the needs of the team. Finally, in order to perform optimally, they must be familiar with the knowledge, skills, abilities, preferences, and other task-relevant attributes of their teammates. This is because the expectations for the behavior of their teammates will vary as a function of the individuals who comprise the team. When working with a particularly competent team member, for example, they may alter their behavior or their expectations concerning how they think that team member will perform. In addition, the experience they have had with particular teammates will affect the quality of their expectations for their performance (a contention in keeping with the Foushee et al., 1986, findings cited earlier).

Situations of this complexity seem to require multiple mental representations of the task; Table 12.1 describes four types of mental models that are hypothesized to be useful for effective team performance: one that describes the equipment (equipment model), one that describes the task (task model), and two that describe the team: one that describes the roles, responsibilities, and interactions of team members (team interaction model) and one that describes the team members themselves (team model). It should be noted that these models are not independent. For example, the task model and team interaction model will interact when task demands require team members to redistribute functional responsibilities.

TABLE 12.1
Multiple Mental Models in Teams

Type of Model            Knowledge Contents           Stability of Model Contents

Equipment Model          Equipment functioning        High
                         Operating procedures
                         Equipment limitations
                         Likely failures

Task Model               Task procedures              Moderate
                         Likely contingencies
                         Likely scenarios
                         Task strategies
                         Environmental constraints

Team Interaction Model   Roles/responsibilities       Moderate
                         Information sources
                         Interaction patterns
                         Communication channels
                         Role interdependencies

Team Model               Teammates' knowledge         Low
                         Teammates' skills
                         Teammates' abilities
                         Teammates' preferences
                         Teammates' tendencies

Taking this notion one step further, it is reasonable to hypothesize that the complexity and stability of such models are not equivalent. As depicted in Table 12.1, the equipment model is likely to be consistent across particular instances of performance; the operator always interacts with the equipment in a similar manner. The task model is likely to be more dynamic and complex because a host of situational parameters will vary across task instances and dictate different accomplishment strategies. Likewise, the team interaction model is likely to be dynamic because the roles, responsibilities, and interaction patterns of team members will vary as a function of task demands. Still more dynamic is the team model, which depends not only on the situation, but also on the particular team members involved.

ISSUES IN THE SHARED MENTAL MODEL HYPOTHESIS

Before moving on to more practical concerns, we believe it is necessary to raise several issues that bear on our conception of shared mental models. The following sections attempt to refine our presentation of shared mental model theory and highlight aspects of the theory that we believe are worthy of further discussion.

What Must be Shared in Shared Mental Models?

Given what has been said about expectations (i.e., that providing expectations about the task and team performance is the most important function of shared mental models) and about multiple models (i.e., that in complex, multioperator systems people probably hold multiple mental models of the task and team), we can now begin to address the question: What must be shared in shared mental models? Although we cannot answer this question definitively without some empirical investigation, we can hypothesize that team members must share those mental models that describe when and how they must interact with one another in order to accomplish the task. This means, for example, that the detailed models of individual functions (e.g., the equipment model) probably do not need to be shared, although familiarity with other team members' equipment may be helpful. On the other hand, the team interaction model, which delineates the impact of each individual's function and his or her contribution to the task, does need to be common among members. Models of the task (task model) that create expectations about how events are likely to unfold and how the team is likely to respond to task demands (i.e., strategies) must also be shared; Orasanu's (1990) notion of shared problem models is relevant here. Finally, team models that provide a basis for team members to predict the specific behavior and needs of their teammates must be shared among members. Team models will provide information regarding the knowledge, skills, abilities, preferences, and tendencies of particular teammates so that behavior can be tailored accordingly.

The exact contents of what must be shared among team members are most likely to be task dependent. For example, we would maintain that in tasks that are relatively proceduralized (i.e., the responses to various task contingencies can be well specified), team members probably need to share task and team interaction models, but the importance of the team model is diminished because the task offers relatively little behavioral discretion (i.e., it makes little difference who occupies a particular role). On the other hand, when tasks are more dynamic and require high levels of flexibility and adaptability on the part of team members, the importance of team models will be increased. This is because familiarity with teammates (how they operate, what they are likely to do, what information they will require) enhances a team member's ability to develop viable expectations for performance.


What Do They Look Like?

Mental model theorists have wrestled with the question of what form of representation mental models take, that is, what they look like, for several years (Rouse & Morris, 1986). According to Rouse and Morris (1986), two dimensions of representational format are particularly important: whether mental models are spatial or verbal, and whether they are representational or abstract. Another important concern is exactly how information is organized within the model. Rouse and Morris (1986) concluded further that methodologies to answer this "form of representation" question are extremely limited at present.

In terms of shared mental model theory, the question of mental model format is important to the extent that format dictates how the mental model is used, how the mental model affects task performance, and the nature of expectations that are generated from the model. Given what we have said about shared mental models thus far, it is tempting to assume that team members must hold representationally similar mental models in order to be effective. However, when shared mental model theory is interpreted in terms of expectations (as in this chapter), the issue of mental model form becomes indirect; that is, because we maintain that the function or benefit of shared mental models is that they lead to common expectations of the task and team, it is the expectations rather than the mental models themselves that must be shared. In a sense, then, we are suggesting that mental models must be compatible in terms of the expectations they generate. Interpreting shared mental models in this way relaxes the requirement that individual team members must have identical mental models, a requirement that ignores the potential for individual differences in mental model development and one that would be extremely difficult to demonstrate.

A valid question becomes, however, why invoke the construct of mental models at all (i.e., why not explain the theory simply in terms of common expectations)? There are at least two reasons why the mental model construct is necessary to the theory. First, it provides a context in which we can conceptualize expectations. Given the complexity of the systems and tasks we have been discussing, it is reasonable to assume that expectations are derived from organized knowledge structures (i.e., mental models), as suggested by Wickens (1984). Therefore, if we want to understand how people develop and maintain expectations, we must consider the role of mental models. Further, by hypothesizing that team members hold multiple mental models of the task and team, we are in a better position to determine how various components of system knowledge lead to specific performance expectations. In this sense, the mental model construct has both theoretical and heuristic value.

A second reason why the mental model construct is important in this context has practical as well as theoretical implications.

tention that TOM requires coordination of activity, adaptability, flexibility, and anticipation of other members' behavior, and that it often occurs indynamic ambiguous environments, it would be impossible to train team members to have specific performance expectations for the variety of situations that may arise. Instead, the goal of training must be to provide people with mechanisms that will enable them to extrapolate their knowledge of the system so that they will form task and team expectations rapidly and accurately. Hence, the mental model construct can guide the development of training and instruction for TOM. More is said about this topic in a later section. In summary, we have attempted to refine shared mental model theory by suggesting that it does not imply identical mental models. Rather, the crucial implication of shared mental model theory is that team members hold compatible mental models that lead to common expectations for the task and team. Too Much of a Good Thing It is a commonly held assumption that one of the unique features of teams is that they can harness multiple perspectives in solving a problem. In terms of TOM, we agree with past researchers that team members hold unique expertise that they bring to bear in performing the task (Orasanu & Salas, in press). However, we also have argued that team members must hold shared mental models of the task and team in order to perform effectively. A reasonable question then becomes: When do shared mental models become a liability? In other words, at what point do team members' knowledge and expectations overlap so much that the uniqueness of their individual contributions is lost? Although this question has not been addressed in the context of shared mental models, some findings from related work may be applicable. First, Janis' (1972) concept of "groupthink" bears on the current discussion. According to Janis, groupthink occurs in highly cohesive groups when the desire for unanimity overrides realistic appraisal and consideration of possible courses of action; that is, the motivation to preserve the group has higher priority than does the outcome of the decision-making process. In the current context, groupthink could occur as a result of too much overlap in team member mental models, or when team members refuse to abandon incorrect models because they are shared by the group. There is also evidence that homogeneity among team members can reduce creativity (Smith, 1971). The reasoning here is that team members from diverse perspectives will stimulate discussion and creativity. These results suggest that shared mental models can become a liability in TOM if they lead to a single-minded view of the problem. Further, they suggest that there may be an optimal degree of shared knowledge structures that must be achieved to maximize TOM performance. On the one hand, too

12.

SHARED MENTAL MODELS

237
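The shape of this hypothesized trade-off can be made concrete with a toy computation. The sketch below is purely illustrative and not an empirical model: the geometric functional form, the weights, and the idea that coordination and diversity benefits reduce to single numbers are our assumptions, introduced only to show how a performance optimum at high but incomplete overlap could arise.

    # Toy illustration only (assumed functional form and weights): notional
    # team performance as the product of a coordination benefit that rises
    # with knowledge overlap and a diversity benefit that falls with it.

    def notional_performance(overlap: float, coord: float = 0.7, div: float = 0.3) -> float:
        """Score for a given proportion of shared knowledge (0 to 1).

        The geometric form peaks at overlap = coord / (coord + div),
        i.e., at high but not total overlap under these assumed weights.
        """
        return (overlap ** coord) * ((1.0 - overlap) ** div)

    for pct in range(0, 101, 10):
        o = pct / 100.0
        print(f"overlap {o:.1f} -> notional performance {notional_performance(o):.3f}")

With the assumed weights favoring coordination, the curve peaks near 0.7 rather than at 1.0, which is one way of picturing the position taken here: foster sharing aggressively, but not to the point where every member's knowledge is fully redundant.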

To combat the propensity for groupthink or excessive conformity, we offer two possibilities. First, research into aircrew coordination behavior indicates that assertiveness is a critical teamwork skill (Prince et al., 1992). Specifically, team members must be trained to advocate positions they believe to be correct and to communicate information they believe to be important to the decision. Recent evidence suggests that assertiveness skills can be trained successfully in the team task performance context (Smith & Salas, 1991). A second solution to the conformity problem may be to provide decision-making teams with support systems that present alternative hypotheses about the situation. For example, such systems could alert decision makers to the fact that other reasonable explanations and solutions exist for the current situation. Further research along this line is needed.

Personality, Attitudes, and Other Individual Variables

We would like to point out that, whereas we have not directly addressed such issues as personality, attitudes, motivation, goals, and other individual characteristics, we do believe that they are critical to TDM performance. In fact, our notion of the "team model" embraces these constructs. Recall that the team model contains information about the knowledge, skills, and abilities (KSAs) of fellow team members. Other important components of this model include other members' preferences, styles, personalities, attitudes, and any other personal characteristics that will affect how he or she performs the task. In the shared mental model context, such information about other team members will enhance a team member's ability to form accurate expectations for performance.

Of particular importance in this regard are the team leader's personal characteristics. First, the team leader will have a direct impact on how the team is expected to perform (in terms of procedures, interactions, strategies, etc.). He or she is critical, therefore, to the development of shared mental models. For example, research by Franz, McCallum, Lewis, Prince, and Salas (1990) revealed that the leader's behavior during a mission prebrief with an aircrew had a significant impact on the crew's subsequent performance in a flight simulator. Second, as indicated by Orasanu's (1990) data, leader behavior during task performance has an impact on the extent and quality of shared problem models developed by the team. Team leaders who communicated their plans and strategies, and who were proactive in assigning roles and responsibilities, aided the team by clarifying expectations for performance (Orasanu, 1990). Research is needed to explicate further the impact of leader behavior on shared mental model development.

FOSTERING SHARED MENTAL MODELS

If shared mental model theory is to have practical value, methods to foster shared mental models must be devised. The following sections first review research into individual mental model training and then offer several directions that we believe will be fruitful in training shared mental models.

Training Individual Mental Models

According to Rouse and Morris (1986), "one of the purposes of instruction is to provide necessary mental models" (p. 357). We would assert further that the goal of instruction to foster shared mental models is to provide mental models that lead team members to form common interpretations of, and expectations for, the task and team. Note again that we are not necessarily requiring that mental models be identical among members (see the discussion earlier). Rather, we are suggesting that mental models that lead to similar expectations must be developed. Research into individual mental model training has important implications for achieving this goal.

In their review of mental models and instruction, Rouse and Morris (1986) drew several conclusions. Among other things, they asserted that providing people with knowledge of theories, fundamentals, and principles is not sufficient to ensure effective performance. For example, in an experiment involving device operation, Kieras and Bovair (1984) concluded that effective training must include information about the system or device that allows an operator to infer specific information about the system's operation. Rouse and Morris (1986) concluded from this and other studies that guidance and cuing for applying general knowledge is necessary for this knowledge to be transferred to task performance. Related to this, it appears that presenting conceptual models of a system (i.e., those that make explicit the major objects, actions, and causal relationships) during training can help trainees to develop more accurate mental models (see Mayer, 1989). This is particularly true for less able learners (Mayer, 1989), learners in complex domains (Borgman, 1986), and when the model allows the operator to infer procedures for operating the system (Kieras, 1988).

Another area of concern here is how to present instruction so that trainees acquire a particular organization of the material. In the shared mental model context this is important to the extent that specific knowledge structures will lead to the development of accurate expectations. Review of this literature reveals that trainees can be led to acquire a particular organizational structure of material (Eylon & Reif, 1984; Meyer, Brandt, & Bluth, 1980; Shavelson, 1972, 1974; Thro, 1978). With respect to performance, Eylon and Reif (1984) also found that a hierarchical organization of material presented in training led to superior performance on complex problems, and that "task-adapted" organizations (i.e., those containing information that was relevant to the task being accomplished) were more effective than unrelated organizations.

Another line of inquiry related to the current topic involves the impact of practice on mental model development. Several studies indicate that allowing people simply to interact with a device or system will often lead to models that are impoverished or incorrect (Bayman & Mayer, 1984; Frederiksen & White, 1989). Further, Frederiksen and White (1989) found that even when subjects appeared to be performing similarly on a task, they did not necessarily have similar, fully articulated mental models. The implication of such findings is that simply letting people practice on a task will not necessarily lead to valid, accurate mental model development.

A final issue to be addressed in this section involves the impact of prior information on mental model training. According to Rouse and Morris (1986), prior knowledge can either foster or hinder the learning of new mental models. Specifically, if prior models are correct, they can provide a context in which new material can be interpreted. However, if prior mental models are incorrect, evidence suggests that they are difficult to extinguish (Rouse & Morris, 1986).

In summarizing the discussion of mental model training, several conclusions can be drawn: (a) training is most effective when it includes specific information regarding the procedures and operations involved in a system; (b) particular knowledge organizations can be trained; (c) unaided practice or experience does not guarantee accurate mental model development; (d) when dealing with complex systems and less able learners, instruction can be enhanced by presenting explicit conceptual models of the system; and (e) the effects of prior knowledge (either positive or negative) must be considered in training mental models.

Overall, these findings have several implications for training shared mental models. First, they suggest that it may be possible to instill particular mental models in team members. Second, specific instructional interventions must be instituted to guide practice and experience if specific, accurate mental models are to be developed. Finally, they suggest that the variability among team members' mental models may be reduced via specific instruction. These conclusions are encouraging because they indicate that the method of instruction can be manipulated to foster accurate mental models and reduce variability among team members' models. Research is needed to further refine and expand the understanding of training to foster individual mental model development.

Training Shared Mental Models

Given what we concluded regarding individual mental model training, several propositions regarding training for shared mental models can be drawn. To begin with, it appears that training particular mental models of the task and team is plausible. This does not address the issue, however, of what specific knowledge and organization these models must contain in order to allow accurate performance expectations to be drawn. What is needed, therefore, is research to determine the relationships among mental model format (organization), content, and expectations.

Second, although the value of practice and experience is not questioned, it appears that to be effective, practice must be guided. An obvious way to accomplish this is to develop feedback mechanisms that improve the accuracy of team members' mental models. Related to this, debriefing sessions can be used to involve team members in self-regulation of performance. We maintain that debriefing at the end of an exercise will be successful to the extent that it allows team members to interpret what was happening at critical times, why other team members behaved as they did, and to what extent expectations were correct (Rouse et al., in press).

Other methods for training shared mental models are also worthy of follow-up. Cross-training, for example, may help team members understand the roles and responsibilities of their teammates (Cannon-Bowers & Salas, 1990). Along the same line, providing team members with information regarding the role of other team members in accomplishing the task may enhance the ability to predict and anticipate each other's behavior. Finally, training team leaders to foster shared mental models has additional potential value. Leaders who articulate their own views of the task and team, and who encourage team members to do so as well, may be more successful in creating shared mental models among team members. A potential means of standardizing such behavior is to structure the format of pretask briefings, planning, and strategy sessions.

ASSESSING THE ADEQUACY OF SHARED MENTAL MODELS

An issue of some concern to shared mental model theory, and more generally to TDM performance, involves developing measures of training effectiveness. Specifically, if shared mental models actually affect TDM as we have suggested, a reasonable criterion for team training may be that team members hold shared mental models of the task and team. Unfortunately, it is too early to determine whether this proposition is viable. First, it must be established that shared mental models are critical to TDM as we have suggested. Second, as noted, the relationship between mental model format and resulting expectations must be explicated. Finally, the extent and nature of overlap necessary for optimal TDM performance must be determined. To complicate matters further, methods to measure mental models are not well developed, so an individual's mental model cannot be captured with much certainty at present (see Rouse & Morris, 1986). These questions also bear on our ability to test shared mental model theory.

Although the situation sounds rather bleak, an alternative to measuring team members' mental models directly and then assessing overlap among models is to assess the quality and accuracy of team members' expectations for performance (Rouse et al., in press). Whereas this will not provide a direct test of the hypotheses put forth here, it may provide important information about the team's ability to perform under various conditions. In addition, the degree of overlap in team member expectations is likely to be easier to measure than overlap in the mental models themselves. Given that we have emphasized the role of expectations in shared mental model theory, this may be a viable means to test the theory and provide a criterion for TDM training while more direct methods to measure mental models are being developed.
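As one concrete illustration of this alternative, suppose each member's expectations have been elicited as a set of discrete statements about upcoming task and team events. The sketch below then scores the team by mean pairwise set overlap. The elicitation format, the invented crew statements, and the choice of the Jaccard index are all our assumptions, offered as a minimal starting point rather than a validated measure.

    from itertools import combinations

    def jaccard(a: set, b: set) -> float:
        """Overlap of two expectation sets: |intersection| / |union|."""
        return len(a & b) / len(a | b) if (a or b) else 1.0

    def mean_pairwise_overlap(expectations: dict) -> float:
        """Mean Jaccard similarity over all pairs of team members.

        1.0 = identical expectations; 0.0 = completely disjoint.
        """
        pairs = list(combinations(expectations.values(), 2))
        return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

    # Hypothetical three-person aircrew; the statements are invented.
    crew = {
        "pilot":    {"copilot reads checklist", "engineer reports fuel",
                     "tower clears approach"},
        "copilot":  {"copilot reads checklist", "engineer reports fuel",
                     "pilot briefs approach"},
        "engineer": {"engineer reports fuel", "pilot briefs approach"},
    }
    print(f"Mean pairwise expectation overlap: {mean_pairwise_overlap(crew):.2f}")

A score of this kind could serve as an interim training criterion of the sort discussed above while direct methods to measure mental models mature.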

SUMMARY

In this chapter, we delineated the beginnings of a theory of shared mental models and their relationship to TDM performance. Building on the work of several researchers, we maintained that the ability of team members to coordinate activity, to adapt to changing task and team demands, and to anticipate the needs of the task and team will be enhanced via shared mental models of the task and team. We further refined the theory and addressed several issues that require further study. Finally, we reviewed literature that might lead to the development of methods to foster shared mental model development and suggested alternatives for assessing shared mental models.

Overall, we conclude that adopting the shared mental model position can advance our understanding of how teams make decisions effectively in dynamic, complex, and often ambiguous situations. This is particularly true in environments characterized by stress and/or high workload, because both conditions militate against the use of explicit coordination strategies. Further, as we have described, the shared mental model position has direct implications for training decision-making teams and for assessing a team's likelihood of success.

Research is needed in several areas to realize these potential benefits of shared mental model theory. Specifically, future investigation must:

1. Establish a relationship between shared mental models (and resulting expectations for the task and team) and TDM performance.
2. Determine how the content, format, and organization of mental models affect the nature of expectations derived from them.
3. Develop methods to foster shared mental models among team members through training and system design.
4. Develop methods to measure the extent of overlap in team members' mental models (either directly or through an assessment of expectations).
5. Delineate the boundaries of shared mental model theory, including specification of when shared mental models become a liability in TDM.

REFERENCES

Adelman, L., Zirk, D. A., Lehner, P. E., Moffett, R. J., & Hall, R. (1986). Distributed tactical decision making: Conceptual framework and empirical results. IEEE Transactions on Systems, Man, and Cybernetics, 16, 794-805.
Alba, J. W., & Hasher, L. (1983). Is memory schematic? Psychological Bulletin, 93, 203-231.
Anderson, R. C. (1977). The notion of schemata and the educational enterprise: General discussion of the conference. In R. Anderson, R. Spiro, & W. Montague (Eds.), Schooling and the acquisition of knowledge (pp. 415-451). Hillsdale, NJ: Lawrence Erlbaum Associates.
Athens, M. (1982, September). The expert team of experts approach to command and control (C2) organizations. IEEE Control Systems Magazine, 30-38.
Bayman, P., & Mayer, R. E. (1984). Instructional manipulation of users' mental models for electronic calculators. International Journal of Man-Machine Studies, 20, 189-199.
Bobrow, D. G., & Norman, D. A. (1975). Some principles of memory schemata. In D. G. Bobrow & A. M. Collins (Eds.), Representation and understanding: Studies in cognitive science (pp. 131-149). New York: Academic Press.
Borgman, C. L. (1986). The user's mental model of an information retrieval system: An experiment on a prototype online catalog. International Journal of Man-Machine Studies, 24, 41-64.
Brehmer, B. (1972). Policy conflict as a function of policy similarity and policy complexity. Scandinavian Journal of Psychology, 13, 208-221.
Cannon-Bowers, J. A., & Salas, E. (1990, April). Cognitive psychology and team training: Shared mental models in complex systems. Paper presented at the annual meeting of the Society for Industrial and Organizational Psychology, Miami.
Cannon-Bowers, J. A., Salas, E., & Converse, S. A. (1990). Cognitive psychology and team training: Shared mental models in complex systems. Human Factors Society Bulletin, 33, 1-4.
Collins, A. M., & Loftus, E. F. (1975). A spreading activation theory of semantic processing. Psychological Review, 82, 407-428.
Cream, B. W. (1974). A functional integrated systems trainer for individual and crew coordination training. Proceedings of the Fourth Annual Symposium on Psychology in the Air Force. Colorado Springs: U.S. Air Force Academy.
Cream, B. W., Eggemeier, F. T., & Klein, G. A. (1978). A strategy for development of training devices. Human Factors, 20, 145-158.
Cummings, T. G. (1981). Designing effective work groups. In P. C. Nystrom & W. Starbuck (Eds.), Handbook of organizational design (Vol. 2, pp. 250-271). London: Oxford University Press.
de Kleer, J., & Brown, J. S. (1981). Mental models of physical mechanisms and their acquisition. In J. Anderson (Ed.), Cognitive skills and their acquisition (pp. 285-309). Hillsdale, NJ: Lawrence Erlbaum Associates.
de Kleer, J., & Brown, J. S. (1983). Assumptions and ambiguities in mechanistic mental models. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 155-190). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dyer, J. L. (1984). Team research and team training: A state-of-the-art review. Human Factors Review, 285-323.
Eylon, B. S., & Reif, F. (1984). Effects of knowledge organization on task performance. Cognition and Instruction, 1, 5-44.
Foushee, H. C., Lauber, J. K., Baetge, M. M., & Acomb, D. B. (1986). Crew factors in flight operations: III. The operational significance of exposure to short-haul air transport operations (NASA Technical Memorandum 88322). Moffett Field, CA: National Aeronautics and Space Administration.
Foushee, H. C., & Manos, K. L. (1981). Information transfer within the cockpit: Problems in intracockpit communications. In C. E. Billings & E. S. Cheaney (Eds.), Information transfer problems in the aviation system (NASA Technical Paper 1875, pp. 63-71). Moffett Field, CA: National Aeronautics and Space Administration.
Franz, T. M., McCallum, G. A., Lewis, M. D., Prince, C., & Salas, E. (1990, April). Pilot briefings and aircrew coordination evaluation: Empirical results. Paper presented at the 12th Annual Department of Defense Symposium, U.S. Air Force Academy, Colorado Springs.
Frederiksen, J., & White, B. (1989). An approach to training based upon principled task decomposition. Acta Psychologica, 71, 89-146.
Gabarro, J. J. (1990). The development of working relationships. In J. Galegher, R. E. Kraut, & C. Egido (Eds.), Intellectual teamwork (pp. 79-110). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gentner, D., & Gentner, D. R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 99-129). Hillsdale, NJ: Lawrence Erlbaum Associates.
Glickman, A. S., Zimmer, S., Montero, R. C., Guerette, P. J., Campbell, W. J., Morgan, B. B., & Salas, E. (1987). The evolution of teamwork skills: An empirical assessment with implications for training (Tech. Rep. No. 87-016). Orlando: Naval Training Systems Center.
Hackman, J. R. (1987). The design of work teams. In J. Lorsch (Ed.), Handbook of organizational behavior (pp. 315-342). Englewood Cliffs, NJ: Prentice-Hall.
Hackman, J. R., & Morris, C. G. (1975). Group tasks, group interaction process and group performance effectiveness: A review and proposed integration. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 8, pp. 45-109). New York: Academic Press.
Hammond, K. R. (1965). New directions in research on conflict resolution. Journal of Social Issues, 21, 44-66.
Hemphill, J. K., & Rush, C. H. (1952). Studies in aircrew composition: Measurement of cross-training in B-29 aircrews (AD No. 8958347). Columbus: Ohio State University, Personnel Research Board.
Jagacinski, R. J., & Miller, R. A. (1978). Describing the human operator's internal model of a dynamic system. Human Factors, 20, 425-433.
Janis, I. L. (1972). Victims of groupthink. Boston: Houghton Mifflin.
Johnson-Laird, P. (1983). Mental models. Cambridge, MA: Harvard University Press.
Kieras, D. E. (1988). What mental models should be taught: Choosing instructional content for complex engineering systems. In J. Psotka, L. D. Massey, & S. A. Mutter (Eds.), Intelligent tutoring systems: Lessons learned (pp. 85-111). Hillsdale, NJ: Lawrence Erlbaum Associates.
Kieras, D. E., & Bovair, S. (1984). The role of a mental model in learning to control a device. Cognitive Science, 8, 255-273.
Klaus, D. J., & Glaser, R. (1968). Increasing team proficiency through training: 8 (Tech. Rep. No. AIR E 1-6/68FR). Springfield, VA: Clearinghouse for Federal Scientific & Technical Information.
Klein, G. A. (1989). Recognition-primed decisions. In W. B. Rouse (Ed.), Advances in man-machine systems research (Vol. 5, pp. 47-92). Greenwich, CT: JAI Press.
Klein, G., & Thordsen, M. (1989, June). Recognitional decision making in C2 organizations. Paper presented at the 1989 Symposium on C2 Research, National Defense University, Washington, DC.
Kleinman, D. L., Luh, P. B., Pattipati, K. R., & Serfaty, D. (1992). Mathematical models of team performance: A distributed decision making approach. In R. W. Swezey & E. Salas (Eds.), Teams: Their training and performance. Norwood, NJ: Ablex.
Kleinman, D. L., & Serfaty, D. (1989). Team performance assessment in distributed decision making. In R. Gilson, J. P. Kincaid, & B. Goldiez (Eds.), Proceedings of the Interactive Networked Simulation for Training Conference (pp. 22-27). Orlando: Institute for Simulation and Training.
Kohn, C., Kleinman, D. L., & Serfaty, D. (1987). Distributed resource allocation in a team. Proceedings of the JDL Symposium on Command and Control Research (pp. 221-233). Washington, DC.
Lanzetta, J. T., & Roby, T. B. (1960). The relationship between certain group process variables and group problem solving efficiency. Journal of Social Psychology, 52, 135-148.
Mayer, R. E. (1989). Models for understanding. Review of Educational Research, 59, 43-64.
Mayer, R. E., & Greeno, J. G. (1972). Structurally different learning outcomes produced by different instructional methods. Journal of Educational Psychology, 63, 165-173.
McIntyre, R. M., Morgan, B. B., Jr., Salas, E., & Glickman, A. S. (1988). Team research in the eighties: Lessons learned. Unpublished manuscript, Naval Training Systems Center, Orlando.
Mead, G. H. (1934). Mind, self, and society. Chicago: University of Chicago Press.
Meyer, B. J. F., Brandt, D. M., & Bluth, G. J. (1980). Use of top-level structure in text: Key for reading comprehension of ninth-grade students. Reading Research Quarterly, 16, 72-103.
Moore, J. L., & Gordon, S. C. (1988). Conceptual graphs as instructional tools. Proceedings of the Human Factors Society 32nd Annual Meeting (pp. 1289-1293). Santa Monica, CA: Human Factors Society.
Morgan, B. B., Jr., Glickman, A. S., Woodard, E. A., Blaiwes, A. S., & Salas, E. (1986). Measurement of team behaviors in a Navy environment (Tech. Rep. No. TR-86-014). Orlando: Naval Training Systems Center, Human Factors Division.
Orasanu, J. (1990, July). Shared mental models and crew decision making. Paper presented at the 12th Annual Conference of the Cognitive Science Society, Cambridge, MA.
Orasanu, J., & Salas, E. (in press). Team decision making in complex environments. In G. Klein, J. Orasanu, R. Calderwood, & C. Zsambok (Eds.), Decision making in action: Models and methods. Norwood, NJ: Ablex.
Oser, R., McCallum, G. A., Salas, E., & Morgan, B. B., Jr. (1989). Toward a definition of teamwork: An analysis of critical team behaviors (Tech. Rep. No. TR-89-004). Orlando: Naval Training Systems Center.
Oser, R., Prince, C., & Morgan, B. B., Jr. (1990, October). Differences in aircrew communication content as a function of flight requirement: Implications for operational aircrew training. Poster presented at the 34th Annual Meeting of the Human Factors Society, Orlando.
Prince, C., Chidester, T. R., Bowers, C. A., & Cannon-Bowers, J. A. (1992). Aircrew coordination: Achieving teamwork in the cockpit. In R. Swezey & E. Salas (Eds.), Teams: Their training and performance (pp. 329-353). Norwood, NJ: Ablex.
Rasmussen, J. (1979). On the structure of knowledge: A morphology of mental models in a man-machine system context (Tech. Rep. No. Riso-M-2192). Roskilde, Denmark: Riso National Laboratory.
Rips, L. J., Shoben, E. J., & Smith, E. E. (1973). Semantic distance and the verification of semantic relations. Journal of Verbal Learning and Verbal Behavior, 12, 1-20.
Rouse, W. B., Cannon-Bowers, J. A., & Salas, E. (in press). The role of mental models in team performance in complex systems. IEEE Transactions on Systems, Man, and Cybernetics.
Rouse, W. B., & Morris, N. M. (1986). On looking into the black box: Prospects and limits in the search for mental models. Psychological Bulletin, 100, 349-363.
Rumelhart, D. E., & Ortony, A. (1977). The representation of knowledge in memory. In R. C. Anderson & R. J. Spiro (Eds.), Schooling and the acquisition of knowledge (pp. 99-135). Hillsdale, NJ: Lawrence Erlbaum Associates.
Salas, E., Blaiwes, A. R., Reynolds, R. E., Glickman, A. S., & Morgan, B. B., Jr. (1985). Teamwork from team training: New directions. Proceedings of the 7th Interservice/Industry Training Equipment Conference and Exhibition (pp. 400-406). Washington, DC: American Defense Preparedness Association.
Salas, E., Dickinson, T. L., Converse, S. A., & Tannenbaum, S. I. (1992). Toward an understanding of team performance and training. In R. W. Swezey & E. Salas (Eds.), Teams: Their training and performance (pp. 3-29). Norwood, NJ: Ablex.
Sanderson, P. M. (1989). Verbalizable knowledge and skilled task performance: Association, dissociation, and mental models. Journal of Experimental Psychology, 15, 729-747.
Shavelson, R. J. (1972). Some aspects of correspondence between content structure and cognitive structure in physics instruction. Journal of Educational Psychology, 63, 225-234.
Shavelson, R. J. (1974). Methods for examining representations of a subject-matter structure in a student's memory. Journal of Research in Science Teaching, 11, 231-249.
Smith, C. G. (1971). Scientific performance and the composition of research teams. Administrative Science Quarterly, 16, 486-495.
Smith, K. A., & Salas, E. (1991, March). Training assertiveness: The importance of active participation. Paper presented at the 37th Annual Meeting of the Southeastern Psychological Association, New Orleans.
Stevens, A. L., & Collins, A. (1980). Multiple conceptual models of a complex system. In R. E. Snow, P. A. Federico, & W. E. Montague (Eds.), Aptitude, learning, and instruction (Vol. 2, pp. 177-197). Hillsdale, NJ: Lawrence Erlbaum Associates.
Stout, R., Cannon-Bowers, J. A., Salas, E., & Morgan, B. B., Jr. (1990). Does crew coordination behavior impact performance? Proceedings of the 34th Annual Meeting of the Human Factors Society (pp. 1382-1386). Santa Monica, CA: Human Factors Society.
Sundstrom, E., DeMeuse, K. P., & Futrell, D. (1990). Work teams: Applications and effectiveness. American Psychologist, 45, 120-133.
Tannenbaum, S. I., Beard, R. L., & Salas, E. (in press). Team building and its influence on team effectiveness: An examination of conceptual and empirical developments. In K. Kelley (Ed.), Issues, theory, and research in industrial/organizational psychology. Amsterdam: Elsevier.
Thro, M. P. (1978). Relationships between associative and content structure of physics concepts. Journal of Educational Psychology, 70, 971-978.
Veldhuyzen, W., & Stassen, H. G. (1977). The internal model concept: An application to modeling human control of large ships. Human Factors, 19, 367-380.
Vreuls, D., & Obermayer, R. W. (1985). Human-system performance measurement in training simulators. Human Factors, 27, 241-250.
Wegner, D. (1987). Transactive memory: A contemporary analysis of the group mind. In B. Mullen & G. R. Goethals (Eds.), Theories of group behavior (pp. 185-208). New York: Springer-Verlag.
Wickens, C. D. (1984). Engineering psychology and human performance. Columbus, OH: Merrill.
Wilson, J. R., & Rutherford, A. (1989). Mental models: Theory and application in human factors. Human Factors, 31, 617-634.
Wohl, J. G., Entin, E. E., Kleinman, D. L., & Pattipati, K. R. (1984). Human decision processes in military command and control. In W. B. Rouse (Ed.), Advances in man-machine systems research (Vol. 1, pp. 261-307). Greenwich, CT: JAI Press.
Young, R. M. (1981). The machine inside the machine: Users' models of pocket calculators. International Journal of Man-Machine Studies, 15, 51-85.
Young, R. M. (1983). Surrogates and mappings: Two kinds of conceptual models for interactive devices. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 35-52). Hillsdale, NJ: Lawrence Erlbaum Associates.

CHAPTER 13

TEAM DECISION MAKING AND TECHNOLOGY*

LorRaine Duffy
Naval Command Control and Ocean Surveillance Center, San Diego, CA

*The views expressed in this chapter are solely those of the author and should not be construed as an official position of the United States government.

In 1990, Martin Marietta deployed a satellite into the wrong orbit when engineers instructed the computer programmers to open the bay door to the hatch containing the satellite. The programmers opened the "wrong door," although they had followed the instructions correctly (Associated Press, 1990). Today, $150 million sits dead in orbit around the earth. The total cost of the miscommunication: $500 million.

In 1990, an Avianca airliner, critically low on fuel, crashed in New York while awaiting clearance to land. Ground control had misinterpreted the Colombian pilot's urgent message regarding his fuel shortage and assumed that a fuel emergency had not been reached. Seventy-three people died.

The captain aboard the USS Vincennes had approximately 180 seconds in which to process information from several of his support staff and decide whether to shoot down the (then misidentified) Iranian airbus and protect his crew. Although the decision was militarily correct, 290 civilians died, and questions were raised regarding the inadequacy of our scientific knowledge concerning team decision making and decision support systems in this "real life" setting.

Each of these examples involved more than one decision maker interacting with technologically sophisticated support systems, each of which contributed to the final outcome. Each involved highly skilled, technically competent team members. However, something "went wrong" when these multiple-expert parties attempted to reach a resolution. What factors are important in predicting team effectiveness, and what is the impact of technologically sophisticated systems on these teams? We know a great deal about the answer when discussing individual decision makers (Abelson & Levi, 1985), but very little when discussing technologically supported multiple decision makers (Duffy, 1990; Galegher, 1990).

A DEFINITION OF COMPUTER-SUPPORTED TEAM DECISION MAKING

Although I refer to groups and teams interchangeably, a distinction should be drawn between "real" teams and groups created in laboratory settings for the purposes of research, because they operate under different assumptions and constraints. The study of laboratory-created groups has led to valuable theoretical insight into the group decision-making process; however, whether these insights generalize outside the laboratory is not clear (Michaelsen, Watson, & Black, 1989; Michaelsen, Watson, Schwartzkopf, & Black, 1992; Tindale & Larson, 1992). Seeger (1983) and Tetlock (1985) noted that the academic literature on group decision making is based on data drawn from artificially created groups in a laboratory setting, working on arbitrary tasks. This has led to questionable conclusions regarding the processes that are undertaken by real teams to reach decisions in organizational and political contexts (Janis, 1989). However, descriptive studies of real teams have been unable to systematically test theories regarding process that would be generalizable (Hackman, 1990). Laboratory research and descriptive studies together may provide us with some insight into the nature of team decision making in a technological context.

Team decision making is further complicated when it is supported by technology, such as decision support systems composed of decision aids, informational databases, computers, intercoms, telephones, video, and so forth (Johansen et al., 1991). Decision making, as a term, no longer adequately fits the expanded activities that the team undertakes to solve a problem or reach an intended goal. Intellectual teamwork is possibly a better term to describe team decision making in technologically supported environments because, according to Galegher (1990), it assumes "individuals working together (often across long periods of time) to manipulate information or to create information intensive products" (p. 195). For example, products can include strategic planning documents, integrated engineering designs, or comprehensive medical diagnoses. The term does not focus on the manual movements that characterize assembly line work, or restrict itself to the coordination of physical activities among team members. It is much broader than the consensual decision making that is characteristic of much of the management literature on group decision making. The focus is on the implicit cognitive and coordination strategies and information use that are undertaken by a team to make a decision, reach an intended goal, or solve a problem (Bushnell, Serfaty, & Kleinman, 1987; Kleinman & Serfaty, 1989).

Assume, then, that team and group decision making involves co-acting members with specialized knowledge, interacting to arrive at some valued decision(s) or outcomes. (See Duffy, 1990, for a more complete definition.) Teams have accountable membership (Tetlock, 1985), often work in unpredictable, ambiguous environments, and process information (or enact various functions) for variable lengths of time. As McGrath (1990) noted, "temporal patterns are not fixed in time in the sense that a given set of actions does not always last a certain number of minutes" (p. 26). Team actions can occur iteratively, nonsequentially, and redundantly. To keep in line with what we see in real teams, Galegher's definition is expanded here to include the following requirements: (a) team members who have different and specialized expertise, (b) the interaction of individual and group problem-solving activities, and (c) substantial interpersonal communication in "nonsequential" task time frames (as defined by McGrath, earlier). Given this definition, the team task is more intellectually intensive than the coordinated manual activities typically associated with, for example, industrial teams and command and control teams (Morgan, Glickman, Woodard, Blaiwes, & Salas, 1986; Sundstrom, DeMeuse, & Futrell, 1990).

Given that teams engage in intellectual teamwork as defined here, what kinds of problems beset them, and can technology address those problems? First, though, what types of problems are there?

TEAM DECISION-MAKING PROBLEMS

What kinds of biases affect the decision making of teams engaged in intellectual teamwork? Two broad categories are process "problems" and content "problems." These problems occur as a function of errors, such as slips and mistakes (Norman, 1988), and cognitive biases (Kahneman, Tversky, & Slovic, 1982).

Before a description can be given, a caveat is in order. The point of commission of an error or bias may be undefinable for teamwork. When an individual commits an error or succumbs to a bias, it is usually definable at the moment of behavioral commission, because we cannot see "inside a person's head." When a team commits an error or succumbs to a bias, it is difficult to define the point of commission, because teams can explicitly self-correct behaviors before the end of a teamwork sequence of behaviors and verbal reasoning. This is, in effect, like "seeing inside the team's head." Error, then, can only be reliably defined after the fact (Sticha & Gribben, 1992). Researchers must be very clear about their definition of team error and about the level of analysis at which they are working in order to determine the nature and effects of these errors and biases. It is one thing to assume an error or bias on the part of one team member, based on notions of what would be considered logical behavior or intended action. But when one has a team of decision makers, that assumption is stretched to the limit. Any individual can assume a different perspective on the problem or intention of action and act according to the logic of that perspective or intention, not the logic of the experimenter. The experimenter's definition of the team error or bias is a substantial research issue.

Errors/Bias Due to Process

The team decision-making process, in general, may itself serve as a source of error or bias. One categorization of process includes the following four dimensions (McDonald, 1990): rational, consensual, empirical, and political. Rational decision processes are goal centered, logical, and efficient, and include many of the processes characterized by multiattribute utility theory (Keeney & Raiffa, 1976). Unfortunately, rational decision processes rely on the assumption that knowledge of all attributes and consequences of the problem is definable by the team members or can be derived, and that imagination can be used to define one's knowledge of "future events," making them predictable. Consensual processes demand participation and general agreement on an outcome by all team members; they reflect a strong bias toward democratic processes. Support of this process has been the focus of much of the development of technological support for group decision making in the business community. Empirical processes are based on the use of information or evidence and demand accountability or credibility. These processes might be subsumed under rational processes; one can be rational without being empirical, but one cannot be empirical without being rational. Political decision processes rely heavily on adaptability and external legitimacy. "Satisficing" (Simon, 1955), a strategy of choosing the first "acceptable" solution generated rather than an optimal one, was described for individual decision makers but could be generalized to team decision making.

Whether these different processes are unique to particular types of teams or to particular points in the decision-making process remains unanswered. It does appear that they are not mutually exclusive, and any decision situation can be described as consisting of each process in varying degrees. Emphasizing one process over another may cause a predilection to error. For example, I would assert that the Martin Marietta team was engaged in rational decision making when consensual decision making should have been the focus, because of the high demand for coordination among the team members (even though it was in a rational and empirical context). The Avianca incident seemed to reflect a focus on political and consensual decision making; the air traffic controller was making the quickest satisficing decision possible under his very heavy task load and trying to reach concurrence with the pilot despite language differences. However, more credence should have been given to the empirical processes: the determination of actual facts on fuel load. The Vincennes incident was heavily enmeshed in empirical decision making (determining the identification of the unknown aircraft by correlating different sources of tactical information about it), as well as in the more obvious political decision making, and was overloaded by the demands of both. The overload on human processing may have been unavoidable in this case. We must improve the technological support of the appropriate process when a change in the process type is unfeasible. (Of course, research is needed to empirically support these assertions; we discuss this later.)

Errors/Bias Due to Content

The current literature on content-related errors and biases that is most useful for teams engaged in intellectual teamwork seems to fall into three general categories: informational, normative, and structural. These categories are derived from a mix of the individual and group decision-making literatures. How well the terminology transfers from an individual perspective to a team perspective is a major research issue. As Tindale (this volume) notes, the effects can be quite different for the individual and the group, depending on the bias under investigation.

Informational Errors/Biases. Informational errors/biases are most closely associated with Kahneman et al.'s (1982) description of the types of individual cognitive heuristics and biases. Examples would be representativeness, availability, and the base-rate fallacy. Informational errors are similar to Janis' (1989) depiction of cognitive effects in team decision making. The error occurs at a cognitive level, rather than at a social/interpersonal level. Eekhout and Rouse (1981), Rasmussen (1986), Norman (1988), Reason (1990), and Rouse and Rouse (1983) all provided classification schemes for individual cognitive errors. They range from Norman's slips (resulting from automatic behavior) and mistakes (resulting from conscious deliberation) to Reason's slips, rule-based mistakes, and knowledge-based mistakes. Many of these errors seem to result from erroneous or hidden assumptions raised when an inappropriate "context" (or frame) is used to delineate the information needed for a particular decision. This may relate to the use of schemata, heuristics, scripts, assumptions, and so forth.

Most of the research on informational error and bias has been conducted in laboratories on individuals. Do these often laboratory-induced individual errors transfer to nonlaboratory, often irrational, teams? The illusory correlation effect studied in the laboratory (Chapman & Chapman, 1969) is one area of research that has transferred well from the laboratory to the applied setting, and from the individual to the team. The Chapmans' research found just such effects with individuals on a clinical categorization task; Talcott, Marvin, and Bresnick (1989) found a similar effect, discussed as confirmation bias, with teams of military planners. The base-rate fallacy has also made the transition, from Kahneman and Tversky's (1973) study of individuals to Hinsz, Tindale, Nagao, Davis, and Robertson's (1988) study of the group's susceptibility to it. Satisficing (Simon, 1955) as a strategy can lead to good and bad expert-like decisions for the individual decision maker; this concept has been reworked (with modifications) to depict expert-like team decision making in time-constrained situations, as "recognition-primed" decision making (Klein & Thordsen, 1989).

Informational errors and biases are based on how information is processed and transferred. This suggests that an information-processing perspective on team interaction may be an appropriate model (Hinsz, 1990a; Lord, 1985; Taylor & Crocker, 1981; Wegner, 1987). This perspective can provide a rich source of predictions regarding intellectual teamwork. For example, the engineers and programmers in the Martin Marietta problem may have been victims of a simple case of miscommunication. When the engineers told the programmers to "Open bay door one," they did not know that their referents were different: the engineers were referring to the upper bay, the programmers to the lower bay. At the critical moment, both were thinking of "bay door one." This observation alone does not give us much insight into preventing the problem's future occurrence. But if one thinks of the incident as two teams working with different "mental models" or "schemata" of the problem, it provides software designers with an idea of potential correctives/aids in the decision support systems used by engineers and programmers. (Simply editing a procedural manual cannot solve the problem of one team member not knowing he/she is using the wrong referent at the time of execution.) This is a recent area of development with respect to understanding team interaction (Cannon-Bowers & Salas, 1990; Hinsz, 1990b; Rentsch & Duffy, 1990). If mental models are the appropriate unit of cognitive analysis, encoding as an information-processing function would be an appropriate unit of functional analysis. It also would be interesting to see if software designers could encourage consensual decision processes in this highly rational context, as discussed earlier. Knowing which mental models are represented in individual team members, and how to communicate them to the other team members, would move us far in the direction of preventing potential "miscommunications" and $150 million mistakes.

Another perspective on mental models would be to study the way information is stored in and then retrieved from group memory (Wegner, 1987), which can be affected by coding information in a format congruent to the "story told" format, as in the use of analogies (Janis, 1989), or, in a similar vein, as depicted by Pennington and Hastie's (1992) explanation-based reasoning. If different team members choose different storing strategies, team errors could occur in retrieving and combining the needed information. Surfacing important information from group memory (Wegner, 1987) for group decision making is not as straightforward as individual retrieval. Stasser (1992) and Stasser and Titus (1985) found, in laboratory studies based on an information sampling model, that groups have a tendency to discuss commonly known information rather than uniquely held information. (Unique information is that known only by an individual group member, not by other group members.) Technology in the form of software support for "surfacing" uniquely held information should be a focus of development in group support systems.
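The mechanism behind this finding is easy to see in miniature. The sketch below is our simplified reading of the information sampling logic, with an assumed per-member probability p that any given holder of an item mentions it: the chance that an item ever enters the discussion is then 1 - (1 - p)^k, where k is the number of members who hold it, so fully shared items dominate discussion even when every item is equally relevant.

    # Simplified sketch in the spirit of Stasser and Titus's (1985)
    # information sampling model; p_mention is an assumed parameter.

    def p_surfaces(p_mention: float, holders: int) -> float:
        """Probability that at least one holder mentions the item."""
        return 1.0 - (1.0 - p_mention) ** holders

    p = 0.3  # assumed chance that any single holder raises the item
    for label, k in [("shared by all 5 members", 5), ("held by 1 member", 1)]:
        print(f"Item {label}: surfaces with probability {p_surfaces(p, k):.2f}")
    # Roughly 0.83 for the fully shared item versus 0.30 for the unique one.

This asymmetry is precisely what software aimed at "surfacing" uniquely held information would have to counteract.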

Normative Errors/Biases. A second category of error/bias is more socially based; it includes normative or affiliating influences (Janis, 1989) and is a derivative of more traditional social psychological variables. These are influences specific to social interaction processes or to the expectation of interaction with others. Included is the "groupthink" effect (Janis, 1972; Janis & Mann, 1977), where small group consensus occurs due to pressure from group members to preserve group harmony. The Abilene paradox (Harvey, 1974, 1988) is the "flip" side of this process, wherein there is outward group acceptance of a solution although each member internally does not agree with the socially accepted solution, much along the lines of the false consensus effect (Sanders & Mullen, 1983). This push to agreement may have been the operative social mechanism that drove the decision making of the harried air controller in trying to understand the fuel situation on the Avianca airliner. Rather than lose valuable time querying whether the pilot understood his request for information, the controller assumed that the pilot had understood the request. The pilot in turn assumed the air controller was "handling" the situation and did not query the understanding of the air controller. One could hypothesize that both agreed to the solution (to wait) but internally were not "happy" about it. (Note, too, that from a human factors standpoint, I am not denying the overwhelming criticality of the pilot's lack of knowledge of his own airliner's fuel reserves.)

Incremental decision making is a socially induced bias in the team process as described by Janis (1989), where the latest decision is the "smallest" possible incremental change from the last decision made in the same area. The underlying reason is to "cover one's rear flank." Tetlock (1985) described an "avoid punishment" strategy similar in description to incremental decision making. In other words, a group makes numerous small decisions that are less likely to erupt into trouble or create dissent than one large course-changing decision. Another bias is a group preference for sequential processing in group interaction (such as turn-taking), noted by Galegher (1990) specifically in computer-mediated interacting groups.

Other more commonly known process variables are related to personality and ideological conflicts occurring among team members, which may cause difficulty or misunderstanding in reaching consensus. The most popular is the domination of group interaction by one member, which has been a ripe area of research in computer-mediated group decision making because it is relatively easy to observe the dominant member as he/she is reduced to using only verbal skills to influence the group, rather than much richer behavioral influences, such as posturing, glowering, or selectively attending to favored group members (Kiesler, Siegel, & McGuire, 1984; Nunamaker, Applegate, & Konsynski, 1988).

Structural Errors/Biases. Finally, structural variables that influence the type of error or bias that may appear are derived from the organizational or environmental context in which they occur. Cohen, March, and Olsen (1972) were among the first to note that group decision making is heavily influenced by its organizational context. Mismatches in the more global, systems-theoretic variables, such as organizational hierarchy, lead to poor group decision making, particularly in real-life settings (March & Weissinger-Saylor, 1986). For example, inappropriate lines of authority (e.g., by rank and not by skill) could lead to errors in team decision making (Rochlin, La Porte, & Roberts, 1987), as well as to a failure to coordinate uniquely held expert information (Cicourel, 1990; Hutchins, 1990); that is, "valued" information is not distributed among team members.

Think of a surgical team of doctors consulting on the health care of a patient. The "worth" or "value" of a recommendation for a course of health care is couched in the socially based reputation and perceived competence of the member making the recommendation. Information gains value according to how that doctor is perceived by his/her colleagues in knowledge, ranking, and expertise. This is quite evident in teams engaged in consensual decision making, where democratic processes tend to be followed. In military teams, the same influences may be occurring but are less evident, because the rigid authority structure forces the members to accept a member's information as valid, despite personal feelings about that member's competence or expertise. (However, it is known that a particular member's view may be worked around or ignored because of the perceived incompetence of that member.)

Errors (or poor performance) also can occur because of the effects of task characteristics (as noted by Hammond, Hamm, Grassia, & Pearson, 1987; Stammers & Hallum, 1985), the spatial arrangement of team members in organizations, the degree of technological support, and so on, all of which have little or no research evidence but much anecdotal evidence. The crew of the USS Vincennes was ultimately following organizational doctrine (based on the type of technological support they had, such as the configuration of weapon systems), which left little room for negotiation in attempting to correctly define the problem in a situation with little time, high risk, high uncertainty, and great stress.

Given that we know that team decision making can succumb to any of these errors, what can we do technologically to ensure accuracy? What is the current state of technological support in this area?

TEAM DECISION MAKING AND TECHNOLOGY

Technology is now a fact of life, not only in complex systems exemplified at the beginning of this chapter, but in our everyday lives. We see it in the computers we use, the appliances we have trouble manipulating, and the transportation systems that we take for granted. Interestingly, decision technologists have only recently begun to pay attention to group support systems and collaborative work. Group support systems encompass a wide range of technology (hardware, software, and network) that support group processes and functions. Collaborative work implies any task that requires the multiple inputs of more than one person or machine to reach an intended goal. Getting one piece of needed information from another person or source is not, by definition, collaborative work. Nor does it necessarily imply consensus among team members. Collaborative work implies working jointly, especially on an intellectual endeavor (Webster's New Collegiate Dictionary, 1980). The variety of hardware and software suites that comprise group support systems is growing daily. Their main purpose is to function as quantitative and qualitative group decision aids for the purposes of five functions: communication, group memory /information access, coordination, development, and collaboration, such as brainstorming, group drawing, and group authoring (Nunamaker, 1992). Mechanically, support systems vary along two dimensions: time and place. They support group processes that occur at the same time and place, same time-different place, different time-same place, and different time and place (Johansen et al., 1991). Philosophically, there is another dimension that must be addressed: the time dimension that is the focus of the group decision making itself, or strategic versus tactical focus, to borrow language from the military. Strategic decision making incorporates the most important or overriding goals for military action. In business, the concept is referenced in long range, five year plans and in company's purpose and goals. Strategic planning meetings are a common occurrence in almost any business, as well as in military environments. Tactical decision making involves the actions or means of less magnitude or that involve actions and objects that are a short distance from the base of operations (especially in the U.S. Air Force), as opposed to those of strategy. It tends to be used in business operations to refer to "everyday functioning" or those deci-

Group support systems, such as meeting room environments, focus heavily on supporting the strategic decision making of an organization. Coordination and communication software focus on the tactical decision making that an organization conducts. There is no software or hardware that focuses on the transition from one form to the other. The assumptions that group support systems, particularly meeting support software, make with regard to group decision making are: rationality among decision makers; decision making as a structured linear process; the need for anonymity or a facilitator as a process controller; and the need to replicate the face-to-face decision-making environment. Each of these assumptions can be questioned given the particular circumstances under investigation, and none has had consistent empirical or anecdotal evidence to support it. Pinsonneault and Kraemer (1990) offered an excellent review of the area and noted that, in "eyeballing" the available data, group decision support systems seem to increase the depth of analysis of groups, the level of participation by individual group members, consensus reaching, the confidence and satisfaction with the group decision, and the quality of the decision. They also found support for the assertion that these systems reduce domination of the group by a few members and reduce decision time. Overall, then, the objectives of these group decision support systems are generally supported. The objectives include: (a) provision of available information more accurately, more completely, and with faster access; (b) reduction of coordination effort (obviating process loss); (c) reduction of negative group influence effects (moving the decision making from a normatively influenced one to an information-influenced one); and (d) the increase of a wider audience's view of the same material, under the assumption that this leads to faster common understanding (Kraemer & King, 1988). However, new problems are associated with the preceding gains in process: (a) team members are no longer interacting in a face-to-face format in "real time," with no replacement for the loss of nonverbal information that is gained "in situ"; (b) as a result, the process becomes asynchronous, with unknown impact; (c) the process becomes democratized without the members' knowledge or consent (Kiesler et al., 1984); and (d) there are no quick recovery mechanisms for errors and misinterpretations, particularly because "backchannel communication" has been eliminated (Galegher, 1990). (Backchannel communication occurs in the "background" of primary communication and can be used to quickly query and to confirm understanding among team members.) These group system problems will continue to plague technology-supported team decision making unless they are addressed in the early stages of decision support system development.

For example, teams like those described in the satellite incident would continue to suffer from two of the major problems just listed: no face-to-face communication and no quick recovery mechanisms. However, a group system could begin by addressing the two that could impact the type of problem that the Martin Marietta team experienced: asynchronous processing and democratization of the interaction. For example, if a group support system had provided all members of the satellite deployment team with an interactive graphic view of the rocket carrying the satellite, would the same error have occurred? Such a system would lead to a more synchronous process because interactive graphics enable members to work on the view simultaneously, as in group authoring software. (Everyone can "write" on a common document at essentially the same time.) It also would allow for the development of a more consensual decision process between the team members, because they can interact, not simply react. Could the Avianca disaster have been averted if the two principal participants (the pilot and the air traffic controller) had had a system that more specifically queried their assumptions about the level of fuel left on the airliner (within the given time constraints of the decision)? This is a case where the "backchannel communication" support needs to be improved. The situation would continue to suffer from the lack of face-to-face interaction, but one can see that the supports need not concentrate so much on synchronizing or democratizing the process, because those factors had less impact. Increasing the richness of information (Daft & Lengel, 1986) by increasing backchannel communication is another possible solution. What type of system "fixes" would have given the crew of the USS Vincennes enough information to realize that their team situation assessment had incorporated incorrect information (e.g., the altitude reading of the incoming plane)? "Human factoring" the display panel is only a partial answer. We need to know more about how information is acquired, processed, and distributed among a team of experts before we can truly begin to develop system "fixes." This is another question to be resolved by research on team decision making and technology.

RESEARCH ISSUES

What are some of the main issues that we need to address in order to build a program of research that will answer some of the questions posed earlier? And what has already been developed to attempt to answer these questions? Four major research issues influence the direction of our growing understanding of team decision making in a technological context. The first issue involves simply understanding team/group decision making and cognitive issues in a technological context. Most of the dominant research has been associated with analytical models of decision making by individuals. Less attention has been focused on group/team decision making.

Very little research has been conducted on decision making in a technological context. (See Abelson & Levi, 1985, for a review of individual decision-making issues; Davis & Stasson, 1988, and Levine & Moreland, 1990, for group decision-making issues; and Kraemer & King, 1988, for a review of technologically supported group decision-making issues.) What research there is has been heavily oriented toward generating decision support systems/aids that give better answers to the queries posed. In fact, a neglected area has been the creation of systems that assist in better identification or assessment of the problem or situation (Moreland & Levine, 1992; Raphael, 1991). Real and expert teams often can resolve a problem quite nicely once it is correctly identified. Errors, as well as process loss (Steiner, 1972, 1976), occur when the problem is erroneously defined, assessed, understood, or framed, as in the Martin Marietta and Vincennes examples. There is little research on the process of situation assessment and the cycle of strategic versus tactical decision making that impacts that process. The second issue revolves around the problems raised by normative factors that affect group decision making (group dynamic variables) in a technological context. Research in academic and applied settings has resulted in a large, conflicting base of information. The preponderance of studies and system development focuses on variables that structure group process (specifically, the process of groups in meetings) (Galegher, 1990; Kiesler, Siegel, & McGuire, 1984). Pinsonneault and Kraemer (1990) and Kraemer and King (1988) specifically highlight our lack of understanding regarding technologically supported groups and group dynamics variables. We simply know very little about the normative influences that occur in technological settings because so little research has been conducted in this area. The only factors that have received wide, consistent attention are the "dominant personality" issue and anonymity as a process controller. The overwhelming result is that group systems cause group processes to democratize (Kiesler et al., 1984). Communication effectiveness provides the third set of research issues. This is a broadly defined field, with the preponderance of information coming from researchers looking at the effects of social presence across machine mediums (Short, Williams, & Christie, 1976), information richness (Daft & Lengel, 1986), and psychological distance (Wellens, this volume, 1989, 1990), among others (McGrath & Hollingshead, 1993). The evidence runs from linguistic (exactly how do we say it; Kiesler et al., 1984) to mechanical (in what medium is the information presented; Chapanis, Ochsman, Parrish, & Weeks, 1972; Galegher, 1990; Wellens, 1989) to personal (the effects of cognitive and decision styles in a technological context; Duffy, 1984; Meshkati, 1991). However, as with normative factors, the evidence is piecemeal and conflicting. The trend promoted by software and hardware developers of increasing social presence is seriously questioned by research conducted by Wellens (1989), whose work demonstrates the negative effects of providing too much social presence information between team members.

One of the most pressing needs is for a multidisciplinary approach to answering these questions: researchers in communication, psychology, anthropology, human factors, business, and systems control theory need to begin cooperative efforts. Finally, the most applied perspective comes from those interested in broader organizational and environmental (such as cultural) issues (Huber, 1990). A promising new organizational theory proposed by Huber, Valacich, and Jessup (1993), outlining the impact of group support systems on organizational effectiveness, specifies proposed relationships among organizational variables and organizational effectiveness. The underlying premise is that group support technology, when correctly applied, will improve an organization's effectiveness. It provides a framework specifically tailored to group decision support systems and is a rich source of research ideas. Meshkati (1991) (along the lines of Rochlin et al.'s 1987 assertions and Rasmussen's 1986 theoretical perspective) has made a strong case that error analysis necessarily requires analysis at the individual interface level, the job/task analysis level, and the organizational (communication network) level. No one level of analysis will provide possible solutions or preventives for system errors such as those described in the Vincennes incident, the Three Mile Island accident, and the Chernobyl nuclear plant accident. This multilevel perspective can help guide system developers in assessing the appropriate level of intervention when designing "error-free" decision support systems. Environmental issues are those that involve factors outside the immediate organization. For example, the impact of the culture that provides the context for the organization is poorly understood (Jessup & Valacich, 1993), yet, as described in the Avianca air disaster, it is a critically important variable. Multicultural factors in a technological context have little clear research evidence to date, but much anecdotal information. I would assume that, with the continuing proliferation of cellular/video phones, facsimile machines, group meeting systems, and improvements in videoteleconferencing and digital technology, coupled with an increasingly global economy, this area will only grow as a research focus. The military is experiencing the same trend, as witnessed in Desert Storm. Future battles will be of a combined-force nature (with many different countries playing a part), and collaboration needs to be technologically supported on a second-by-second basis. We must become smart about the effects of culture in the technological context because this information may, at one level, help avert the next military conflict and, at another level, help prevent errors that could lead to fratricide. The three examples of team decision making presented earlier provide an applied context that can highlight some of the areas of research with the highest potential for improving the process and content of team decision making. Understanding team decision making and cognitive processes in a technological context is relevant to all three, precisely because of its broad and encompassing wording.

We know very little about how groups decide, whether in a distributed, technologically supported context (with computer support systems) where members are not face-to-face (as in the satellite and Avianca examples), or in the Vincennes example, where team decision making occurred among members who were co-located (in the same physical location) but were operating in an environment of high uncertainty and time pressure. As discussed earlier, normative (member influence) and communication effectiveness factors appear to be relevant to the satellite and Avianca incidents. Both incidents were clearly affected by the distributed location of the team members, compounded by the lack of common referents. Therefore, developing an understanding of the types of system components that would replace or enhance communication effectiveness (and, secondarily, normative influences) would have immediate payoff. This was less an issue (albeit still an important one) for the Vincennes, because team members were co-located, had common referents to the problem, and had experience working as a team. Organizational and environmental factors were most evident in two of the three examples. They appear least apparent in the satellite example, only because I have assumed that the team members were committed to the same organizational view and brought similar backgrounds to the problem. (Granted, this is a big assumption.) However, these factors seem most obvious in the Avianca and Vincennes incidents and provide a strong focus for possible future research. The cultural context of the Avianca disaster seems self-evident; the military context of the Vincennes provides the organizational perspective that helps explain the types of decisions that would be made in high-uncertainty and high-time-pressure situations. These factors cannot be ignored in trying to build system enhancements to obviate these "errors" in the future. Table 13.1 summarizes this discussion of possible future research areas.

SUMMARY

In this chapter, I have attempted to describe some of the problems that can affect team decision making from a process and content perspective, to outline group decision support system characteristics and their ability to enhance the group decision-making process, and to suggest four research areas that encompass the types of issues that should be addressed. To reiterate, team errors/biases fall into two perspectives: process and content. Process errors were defined as the use of the wrong process for the nature of the problem facing the team. Content errors were further subcategorized into informational, normative, and structural "types." Group support systems and collaborative work issues were described by outlining their objectives and looking at research that supported those objectives. Research on team or group decision-making problems is still an embryonic field with little empirical research; this deficit is further compounded in the group support technology area.

TABLE 13.1
Summary Table of Research Issue Priority for Three Examples of Team Decision Making

Research Issue                      Satellite    Avianca    Vincennes
Decision Making/Cognitive Issues    primary      primary    primary
Normative                           secondary    secondary  unknown
Communication Effectiveness         primary      primary    secondary
Organizational/Environmental        secondary    primary    primary

With regard to the four research issues, several conclusions can be drawn. Overall, there is great practical potential in increasing our understanding of the group decision-making process in the technological context. The academic literature has provided the practitioner with some salient variables that can be studied to discover their impact on group decision making (Jessup & Valacich, 1993). Good applied evidence is accumulating (Galegher, Kraut, & Egido, 1990; Hackman, 1990; Jessup & Valacich, 1993). However, research on normative variables suffers from fragmentation. Often, group dynamic variables are studied experimentally in isolation, when the practitioner knows that their effects are interactive; hence the popularity of anecdotal and experiential evidence (Hackman, 1990; Janis, 1989) over the experimental evidence. Future research on team decision making must address this discrepancy between what we know (the anecdotes) and what we can prove (the experimental evidence). Communication research, by the nature of its topic, has recently begun to have an impact on computer-mediated group decision making. By focusing on what is communicated through the computer medium, researchers are discovering the most salient variables for the improvement of group support technologies (Galegher, Kraut, & Egido, 1990). The most impressive practical literature has come from these researchers. Finally, organizational and cultural issues are only recently being addressed from a psychological (versus a management) point of view (Connolly, 1993). Interestingly, there has been comparatively little written on the interactive nature of group decision making and organizational structure (see Sundstrom et al., 1990). How can technology within organizations be structurally "placed" to enhance the group process? I am referring to both the physical location (Polley & Stone, 1993) and the procedural aspects of incorporating this technology. Huber, Valacich, and Jessup (1993) provided a theory that can guide the development of research on this question. Anecdotally, this could be evidenced by the success of "power down" decision making (lowering authority for decision making to the lowest possible organizational level) with the requisite restructuring of decision-making responsibility, power, and technological supports (Walter Ulmer, personal communication, January 24, 1990).

The ideal solution, of course, is to combine all these perspectives into a single interdisciplinary research program, because the answers within any given experimental paradigm would have the ability to more completely explain each of the other perspectives. The recent growth in the literature on group decision making in a technologically supported context beyond the settings of meetings (Conference on Computer-Supported Cooperative Work [CSCW], 1990) has proven that this important topic is gaining the attention it deserves; and with a little insight and a lot of hard work, the answers that have eluded us will be uncovered.

ACKNOWLEDGMENTS

The author would like to thank Carl Englund and N. John Castellan, Jr., for their insightful comments on earlier versions of this chapter.

REFERENCES

Abelson, R. P., & Levi, A. (1985). Decision making and decision theory. In G. Lindzey & E. Aronson (Eds.), Handbook of social psychology (3rd ed., Vol. 1, pp. 231-309). New York: Random House.
Associated Press (1990, March 21). Miscommunication puts satellite adrift in low, useless orbit. Dayton Daily News, p. 12A.
Bushnell, L., Serfaty, D., & Kleinman, D. (1987). Team information process: The normative-descriptive approach. In S. E. Johnson & A. Levis (Eds.), Science of C2: Coping with uncertainty (pp. 62-72). London: AFCEA International Press.
Cannon-Bowers, J., & Salas, E. (1990, April). Cognitive psychology and team training: Shared mental models in complex systems. Paper presented at the annual meeting of the Society for Industrial and Organizational Psychology, Miami.
Chapanis, A., Ochsman, R. B., Parrish, R. N., & Weeks, G. D. (1972). Studies in interactive communication: I. The effects of four communication modes on the behavior of teams during cooperative problem solving. Human Factors, 14, 487-509.
Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid diagnostic signs. Psychological Bulletin, 74, 271-280.
Cicourel, A. V. (1990). The integration of distributed knowledge in collaborative medical diagnoses. In J. Galegher, R. Kraut, & C. Egido (Eds.), Intellectual teamwork: Social and technological foundations of cooperative work (pp. 221-241). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, M. D., March, J. G., & Olsen, J. P. (1972). Garbage can model of organizational choice. Administrative Science Quarterly, 17, 1-25.
Connolly, T. (1993). Behavioral decision theory and group support systems. In L. Jessup & J. Valacich (Eds.), Group support systems: New perspectives (pp. 270-280). New York: Macmillan.
CSCW (1990). Proceedings of the Conference on Computer-Supported Cooperative Work. New York: Association for Computing Machinery.
Daft, R. L., & Lengel, R. H. (1986). Organizational information requirements, media richness, and structural design. Management Science, 32, 554-571.

Davis, J. H., & Stasson, M. F. (1988). Small group performance: Past and present research trends. Advances in Group Processes, 5, 245-277.
Duffy, L. T. (1984). Leadership and decision making styles. Doctoral dissertation, University of Utah, Salt Lake City.
Duffy, L. T. (1990). Team decision making and group decision support systems. Proceedings of the 1990 Symposium on Command and Control Research. McLean, VA: Science Applications International Corporation.
Eekhout, J. M., & Rouse, W. B. (1981). Human errors in detection, diagnosis, and compensation for failure in the engine control room of a supertanker. IEEE Transactions on Systems, Man, and Cybernetics, 11(12), 813-816.
Galegher, J. (1990). Intellectual teamwork and information technology: The role of information systems in collaborative intellectual work. In J. Carroll (Ed.), Applied social psychology in organizations (pp. 193-216). Hillsdale, NJ: Lawrence Erlbaum Associates.
Galegher, J., Kraut, R. E., & Egido, C. (1990). Intellectual teamwork: Social and technological foundations of cooperative work. Hillsdale, NJ: Lawrence Erlbaum Associates.
Hackman, J. R. (1990). Groups that work (and those that don't). San Francisco: Jossey-Bass.
Hammond, K. R., Hamm, R. M., Grassia, J., & Pearson, T. (1987, Sept.-Oct.). Direct comparison of the efficacy of intuitive and analytical cognition in expert judgement. IEEE Transactions on Systems, Man, and Cybernetics, 17(5), 753-770.
Harvey, J. B. (1974). The Abilene paradox. Organizational Dynamics, 64-83.
Harvey, J. B. (1988). The Abilene paradox. Lexington, MA: Lexington Books.
Hinsz, V. B. (1990a). A conceptual framework for a research program on groups as information processors. Technical report submitted to the Logistics and Human Factors Division, AF Human Resources Laboratory, Brooks AFB, TX.
Hinsz, V. B. (1990b). Considerations in the assessment and evaluation of mental models. Technical report submitted to the Logistics and Human Factors Division, AF Human Resources Laboratory, Brooks AFB, TX.
Hinsz, V. B., Tindale, R. S., Nagao, D. H., Davis, J. H., & Robertson, B. A. (1988). The influence of the accuracy of individuating information on the use of base rate information in probability judgement. Journal of Experimental Social Psychology, 24, 127-145.
Huber, G. P. (1990). A theory of the effects of advanced information technologies on organizational design, intelligence, and decision making. Academy of Management Review, 15(1), 47-71.
Huber, G. P., Valacich, J. S., & Jessup, L. M. (1993). A theory of the effects of group support systems on an organization's nature and decisions. In L. Jessup & J. Valacich (Eds.), Group support systems: New perspectives (pp. 255-269). New York: Macmillan.
Hutchins, E. (1990). The technology of team navigation. In J. Galegher, R. E. Kraut, & C. Egido (Eds.), Intellectual teamwork: Social and technological foundations of cooperative work (pp. 191-220). Hillsdale, NJ: Lawrence Erlbaum Associates.
Janis, I. L. (1972). Victims of groupthink. Boston, MA: Houghton-Mifflin.
Janis, I. L. (1989). Crucial decision making. New York: The Free Press.
Janis, I. L., & Mann, L. (1977). Decision making: A psychological analysis of conflict, choice, and commitment. New York: The Free Press.
Jessup, L., & Valacich, J. (1993). Group support systems: New perspectives. New York: Macmillan.
Johansen, R., Sibbet, D., Benson, S., Martin, A., Mittman, R., & Saffo, P. (1991). Leading business teams: How teams can use technology and group process to enhance performance. Menlo Park, CA: Addison-Wesley.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237-251.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgement under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.

Keeney, R., & Raiffa, H. (1976). Decisions with multiple objectives: Preferences and value tradeoffs. New York: Wiley.
Kiesler, S., Siegel, J., & McGuire, T. W. (1984). Social psychological aspects of computer-mediated communication. American Psychologist, 39(10), 1123-1134.
Klein, G., & Thordsen, M. (1989). Recognitional decision making in C2 organizations. Proceedings of the 1989 Symposium on Command and Control Research (pp. 239-244). McLean, VA: Science Applications International Corporation.
Kleinman, D., & Serfaty, D. (1989, April). Team performance assessment in distributed decision making. In R. Gilson, P. Kincaid, & B. Goldiez (Eds.), Proceedings of Interactive Networked Simulation for Training (pp. 22-27). Orlando, FL: Institute for Simulation and Training, Florida High Technology and Industry Council.
Kraemer, K., & King, J. (1988). Computer-based systems for cooperative work and group decision making. ACM Computing Surveys, 20(2), 115-146.
Levine, J., & Moreland, R. (1990). Progress in small group research. Annual Review of Psychology, 41, 585-634.
Lord, R. G. (1985). An information processing approach to social perceptions, leadership, and behavioral measurement in organizations. In L. L. Cummings & B. M. Staw (Eds.), Research in organizational behavior (Vol. 7, pp. 87-128). Greenwich, CT: JAI Press.
March, J., & Weissinger-Saylor, R. (1986). Ambiguity and command: Organizational perspectives on military decision making. Marshfield, MA: Pitman.
McDonald, P. (1990). Group support technologies. Report written for the Organizational Planning and Development Division, Office of Human Resource Management, Federal Aviation Administration, U.S. Dept. of Transportation, Transportation Systems Center, Strategic Management Division, Cambridge, MA.
McGrath, J. (1990). Time matters in groups. In J. Galegher, R. Kraut, & C. Egido (Eds.), Intellectual teamwork: Social and technological foundations of cooperative work (pp. 23-62). Hillsdale, NJ: Lawrence Erlbaum Associates.
McGrath, J., & Hollingshead, A. (1993). Putting the "group" back in group support systems: Some theoretical issues about dynamic processes in groups with technological enhancements. In L. Jessup & J. Valacich (Eds.), Group support systems: New perspectives (pp. 78-96). New York: Macmillan.
Meshkati, N. (1991). Integration of workstation, job, and team structure design in complex human-machine systems: A framework. International Journal of Industrial Ergonomics, 7, 111-122.
Michaelsen, L. K., Watson, W. E., & Black, R. H. (1989). A realistic test of individual versus group consensus decision making. Journal of Applied Psychology, 74(5), 834-839.
Michaelsen, L. K., Watson, W. E., Schwartzkopf, A., & Black, R. H. (1992). Group decision making: How you frame the question determines what you find. Journal of Applied Psychology, 77(1), 106-108.
Moreland, R., & Levine, J. (1992). Problem identification by groups. In S. Worchel, W. Wood, & J. Simpson (Eds.), Group process and productivity (pp. 17-47). Newbury Park, CA: Sage.
Morgan, B. B., Glickman, A. S., Woodard, E. A., Blaiwes, A. S., & Salas, E. (1986). Measurement of team behaviors in a Navy environment (Tech. Rep. No. NTSA TR-86-0140). Norfolk, VA: Old Dominion University, Center for Applied Psychological Studies.
Norman, D. A. (1988). The psychology of everyday things. New York: Basic Books.
Nunamaker, J. (1992, October). Automating the flow: Groupware goes to work. Corporate Computing, pp. 187-189.
Nunamaker, J., Applegate, L., & Kosynski, B. (1988). Computer aided deliberation: Model management and group decision support. Journal of Operations Research, November-December, 826-848.

Pennington, N., & Hastie, R. (1992). Explaining the evidence: Tests of the story model for juror decision making. Journal of Personality and Social Psychology, 62(2), 189-206.
Pinsonneault, A., & Kraemer, K. L. (1990). The effects of electronic meetings on group process and outcomes: An assessment of the empirical research. European Journal of Operations Research, 46, 143-161.
Polley, R. B., & Stone, P. J. (1993). Flexspace: Making room for collaborative work. In L. Jessup & J. Valacich (Eds.), Group support systems: New perspectives (pp. 169-191). New York: Macmillan.
Raphael, T. (1991). Cognitive flexibility and innovation in the problem definition process for tactical command and control (Tech. Rep. No. D-39W.91). Alexandria, VA: Mystech Associates.
Rasmussen, J. (1986). Information processing and human-machine interaction. Amsterdam: North-Holland.
Reason, J. (1990). Human error. Cambridge: Cambridge University Press.
Rentsch, J. R., & Duffy, L. (1990, September 26-28). An organizational climate and culture perspective on shared mental models of team members. Paper presented at the 1990 International Conference on Self-Managed Work Teams, Denton, TX.
Rochlin, G. I., La Porte, T. R., & Roberts, K. H. (1987). The self-designing high-reliability organization: Aircraft carrier flight operations at sea. Naval War College Review, Autumn, 76-90.
Rouse, W. B., & Rouse, S. H. (1983). Analysis and classification of human error. IEEE Transactions on Systems, Man, and Cybernetics, 13(4), 539-549.
Sanders, G. S., & Mullen, B. (1983). Accuracy in perceptions of consensus: Differential tendencies of people with majority and minority positions. European Journal of Social Psychology, 13, 57-70.
Seeger, J. A. (1983). No innate phases in group problem solving. Academy of Management Review, 8(4), 683-689.
Short, J., Williams, E., & Christie, B. (1976). The social psychology of telecommunications. Chichester, England: Wiley.
Simon, H. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69, 99-118.
Stammers, R. B., & Hallum, J. (1985). Task allocation and balancing of task demands in the multiman-machine system: Some case studies. Applied Ergonomics, 16(4), 251-257.
Stasser, G. (1992). Pooling of unshared information during group discussion. In S. Worchel, W. Wood, & J. Simpson (Eds.), Group process and productivity (pp. 48-68). Newbury Park, CA: Sage.
Stasser, G., & Titus, W. (1985). Pooling of unshared information in group decision making: Biased information sampling during group discussion. Journal of Personality and Social Psychology, 48, 1467-1478.
Steiner, I. D. (1972). Group processes and productivity. New York: Academic Press.
Steiner, I. D. (1976). Paradigms and groups. Advances in Experimental Social Psychology, 19, 251-289.
Sticha, P., & Gribben, M. (1992). Heuristics and biases in team situation assessment: Observation, synthesis, and training principles (Tech. Rep. under Contract No. N61339-91-C-0105). Alexandria, VA: Human Resources Research Organization.
Sundstrom, E., DeMeuse, K., & Futrell, D. (1990). Work teams: Applications and effectiveness. American Psychologist, 45(2), 120-133.
Taylor, S. E., & Crocker, J. (1981). Schematic bases of social information processing. In E. T. Higgins, C. P. Herman, & M. P. Zanna (Eds.), Social cognition: The Ontario Symposium (pp. 153-171). Hillsdale, NJ: Lawrence Erlbaum Associates.
Tetlock, P. E. (1985). Accountability: The neglected social context of judgement and choice. Research in Organizational Behavior, 7, 297-332.
Tindale, R. S., & Larson, J. R. (1992). Assembly bonus effect or typical group performance? A comment on Michaelsen, Watson, and Black (1989). Journal of Applied Psychology, 77(1), 102-105.

Tolcott, M. A., Marvin, F. F., & Bresnick, T. A. (1989). The confirmation bias in evolving decisions. Proceedings of the 1989 Symposium on Command and Control Research (pp. 232-238). McLean, VA: Science Applications International Corporation.
Webster's New Collegiate Dictionary (1980). Springfield, MA: G. & C. Merriam-Webster Co.
Wegner, D. (1987). Transactive memory: A contemporary analysis of the group mind. In B. Mullen & G. Goethals (Eds.), Theories of group behavior (pp. 185-208). New York: Springer-Verlag.
Wellens, A. R. (1989). Effects of telecommunication media upon information sharing and team performance: Some theoretical and empirical observations. IEEE AES Magazine, September, 13-19.
Wellens, A. R. (1990, January). Assessing multi-person and person-machine distributed decision making using an extended psychological distancing model. Final University Resident Research Program Report for the Air Force Office of Scientific Research, Washington, DC.

CHAPTER 14

GROUP SITUATION AWARENESS AND DISTRIBUTED DECISION MAKING: FROM MILITARY TO CIVILIAN APPLICATIONS

A. Rodney Wellens
University of Miami

Situation awareness and distributed decision making are terms conceived within the world of military and industrial support systems. They were born in neighborhoods whose boundaries overlapped but remained distant during their infancy. As they matured they were adopted by different caretakers who sent them into the world searching for applications in complex technological support systems. Occasionally, they were seen darting through the halls of academia, visiting schools of engineering, business, and medicine. Eventually, they started seeing one another at scientific conferences and were ultimately wed by interdisciplinary border travelers who saw their common heritage and potential. As they moved into the 1990s, their hybrid offspring were reintroduced within academic, military, and industrial settings under different names with multiple applications. The purpose of this chapter is to introduce you to one of these conceptual offspring living in a high-tech telecommunications environment. I attempt to trace its lineage, describe its life to date, and speculate on its future. If you are already familiar with situation awareness and distributed decision making, you will no doubt recognize close relatives of this hybrid, including first cousins who have evolved within adjacent research communities. For those not familiar with this literature, the following pages may stimulate your interest sufficiently to want to learn more about it and perhaps even spawn new offspring within your own field of inquiry.

INDIVIDUAL SITUATION AWARENESS AND DECISION MAKING

Mica Endsley (1988) defined situation awareness (SA) as "the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future" (p. 792). Situation awareness has become a "buzzword" in military circles; it can be roughly conceived of as an individual's internal model of the world at any point in time. Situation awareness and situation assessment (the process that ultimately generates awareness) have become central concepts within several recent models of "real time" human decision making. These models relate to dynamically alterable environments where information is constantly changing and frequent monitoring may be necessary to clarify the current state of knowledge. In reviewing the origins of SA within the military flight environment, Press (1986) found that much of the success of the WWI flying ace Oswald Boelcke was attributed to his ability to gain awareness of the enemy before the enemy became aware of him. This required a sustained shift in attention from the cockpit to the surrounding airspace without loss of control of the aircraft or falling prey to the enemy. With the increased speed and sophistication of fighter aircraft over the past several decades, the pilot has been asked to attend to many different kinds of complex navigation, weapons, and control systems while still completing his or her mission. Gone are the days of the open cockpit, white scarf, and goggles. Today the pilot is sealed within a pressurized container full of electronic displays and alarms all vying for attention. According to Endsley (1988), it has become a major design goal for the developers of new aircraft systems to help the pilot achieve a composite picture of the environment required for successful task performance (cf. Dornheim, 1986; Morishige & Retelle, 1985; Person & Steinmetz, 1981). The relationship between SA, workload, pilot decision making, and performance has been conceptualized by Endsley (1988) within the model shown in Fig. 14.1. The pilot's perception of the elements in the environment, as sensed from aircraft displays as well as directly, forms the basis for situation awareness at any point in time. As the pilot puts these elements together to form a higher level holistic picture (gestalt) of the situation, he or she is better able to grasp the present and project future scenarios. The quality and level of a pilot's SA are moderated by his or her native abilities, training and experience, preconceptions and objectives, and ongoing workload. SA forms the critical input to pilot decision making, which is the basis for all subsequent pilot actions. Endsley (1988) further developed the SA concept by suggesting multiple SA zones that extend outward in time and space from the individual. These SA zones are depicted within Fig. 14.2.
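Endsley's three-level definition can be read as a processing pipeline: perceive the elements, comprehend their meaning, and project their near-future status. The sketch below is a minimal, purely illustrative rendering of that reading; all data structures, names, and numbers are hypothetical.

from dataclasses import dataclass

@dataclass
class Element:
    """A perceived element of the environment (Level 1 SA)."""
    name: str
    position: float  # distance from own aircraft, in hypothetical units
    velocity: float  # closing speed; negative means receding

def comprehend(elements):
    """Level 2 SA: attach meaning; here, flag closing contacts as threats."""
    return {e.name: ("threat" if e.velocity > 0 else "benign") for e in elements}

def project(elements, horizon):
    """Level 3 SA: project each element's position 'horizon' time units ahead."""
    return {e.name: e.position - e.velocity * horizon for e in elements}

contacts = [Element("bogey-1", position=40.0, velocity=8.0),
            Element("tanker", position=15.0, velocity=-2.0)]
print(comprehend(contacts))          # {'bogey-1': 'threat', 'tanker': 'benign'}
print(project(contacts, horizon=2))  # positions projected two units ahead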

[Figure 14.1: diagram. Labeled components include the environment, the pilot's perception of elements in the current situation, and a feedback loop.]

FIG. 14.1. Endsley's Aircrew Decision Model showing the relationship of situation awareness to decision making and performance. From "Situation Awareness Global Assessment Technique (SAGAT)" by M. R. Endsley, 1988, Proceedings of the IEEE National Aerospace Electronic Conference, 3, p. 790. Copyright © 1988 by The Institute of Electrical and Electronics Engineers, Inc. Adapted by permission.

[Figure 14.2: diagram of concentric SA zones, labeled from "immediate" to "long-term."]

FIG. 14.2. Endsley's representation of multiple SA zones that extend outward from the individual. From "Situation Awareness Global Assessment Technique (SAGAT)" by M. R. Endsley, 1988, Proceedings of the IEEE National Aerospace Electronic Conference, 3, p. 791. Copyright © 1988 by The Institute of Electrical and Electronics Engineers, Inc. Adapted by permission.

The "immediate" SA zone for the pilot might be comprised of environmental elements within the cockpit, including instrument displays. "Intermediate" elements might include other aircraft, or ground terrain in low-altitude flying, just outside the cockpit. The "long-term" SA zone might represent distant targets or potential threats that may be on the horizon. The relative size and shape of these SA zones would be expected to vary according to a number of information-processing factors related to attention apportionment. For example, increased workload has been found to bring about a kind of attention myopia that limits attention to a few salient events. This may foreshorten the individual's ability to focus on events that lie outside his or her "immediate" SA zone. Similarly, if cognitive resources are being applied to forecast future events, the individual may lose sight of what should be of more immediate concern. Another way of characterizing this might be to think of individuals as being endowed with a kind of "cognitive zoom lens" that allows them to "zoom in" for a close-up view or "pull back" for a wide-angle perspective. The particular focal length chosen will depend on the demands of the situation and the internal state of the perceiver. In their information-processing model of pilot decision making, Wickens and Flach (1988) outlined several additional biases and heuristics that may influence situation assessment and decision making. Notice in Fig. 14.3 that Wickens and Flach place situation assessment downstream from environmental cues and upstream from choice behavior and action. An accurate assessment of a situation often requires the integration of a large number of cues that must be interpreted against a knowledge base in long-term memory. Wickens and Flach emphasized the dynamic interplay between working memory, long-term memory, and cue-seeking behavior that may result in biased information processing and decision making. A pilot's understanding of the current state of the "world" (SA) is dependent on his or her past experiences as well as the relative salience of external cues. It should be obvious by now that the SA concept has thrived within the world of aviation. However, it should also be noted that the SA concept can be readily applied to many different settings. In fact, Ben-Bassat and Freedy (1982) have identified situation assessment tasks as a general family of problem-solving tasks. The generic nature of this family is characterized as a "multiperspective, multimembership, hierarchical, pattern recognition problem." SA constitutes a fundamental stage in many kinds of decision-making problems, including medical diagnosis, battlefield reading, and corporation status assessment. In each of these settings, information from many different sources needs to be assimilated into an overall "big picture" in order to decide what to do next. Once the situation has been identified, decision rules may be applied in order to determine the appropriate course of action. In the case of "expert" decision makers, some theorists like Companion (1990), Klein (1989), and Klein and Klinger (1991) believe that actions may flow almost directly from recognition of familiar patterns to form a very fast "skill-based" or "recognition-primed" course of action.

[Figure 14.3: flow diagram. Labeled components include cues, perception and attention, working memory, long-term memory, situation assessment (diagnosis), hypothesis generation, action generation, risk assessment, and criterion setting. Biases and heuristics flagged: salience bias, representativeness heuristic, "as if" heuristic, availability heuristic, confirmation bias, and framing.]

FIG. 14.3. Wickens and Flach's information-processing model of decision making showing the relationship of situation assessment to various biases and heuristics that may influence choice behavior. From "Information Processing" by C. D. Wickens and J. M. Flach, in Human Factors in Aviation (p. 128), edited by E. L. Wiener and D. C. Nagel, 1988, New York: Academic Press. Copyright © 1988 by Academic Press, Inc. Adapted by permission.


GROUP SITUATION AWARENESS AND DECISION MAKING

It does not take a giant leap of the imagination to see how the SA concept could be applied to group settings. The group dynamics literature is full of research on the effects of "frames of reference" and variable "points of view" on group formation, decision making, conflict, and cooperation (see Hendrick, 1987; McGrath, 1984; Steiner, 1972). By assigning Endsley's SA zones to individual members of a team or working group, one can graphically represent the degree of overlap in immediate and long-term perspectives held in common within the group. In doing this, Wellens (1989a) defined group situation awareness as "the sharing of a common perspective between two or more individuals regarding current environmental events, their meaning and projected future status" (p. 6).

It was assumed at the time the definition was developed that the greater the degree of group situation awareness obtained, the higher the degree of group coordination and task performance that would be observed. This is not to say that all members of a group should strive to obtain totally overlapping SA zones. Indeed, much of the strength of collaboration comes from group members monitoring different parts of the environment and pooling their varying perspectives when necessary. The key to optimal group SA appears to be arranging group members such that enough overlap occurs to maintain group coordination while allowing enough separation to maximize coverage of the relevant environment. Apparently, Endsley (1989) came to a similar conclusion in her definition of "crew SA" as the "degree to which every crew member possesses the SA required for his position." Several studies of airline crew coordination (see Foushee, 1982, 1984; Foushee & Helmreich, 1988) suggest that the sharing of information among crew members is a critical factor in obtaining optimal SA and task performance. Foushee and Manos (1981) reported a tendency for crews who communicate more to perform better. When flight status information is shared, fewer errors occur related to problems such as misreading or missetting instruments, and mishandling engines or fuel systems. Several tragic airline accidents have been attributed to one or more flight crew members ignoring information being communicated or becoming so fixated on a relatively minor problem that they lost sight of the "big picture" required to fly the aircraft. One important aspect of crew communication is recognizing that it takes effort and attentional resources to accomplish, and that this can have both positive and negative effects on task performance (Wellens, 1990). There are many biases that may arise in group settings that can add to the biases already present in individual decision making. For example, in judgmental tasks where there exists no commonly accepted system of logic that would lead to an unambiguously correct decision (Laughlin, 1980), there appears to be a bias toward discussing information that is already shared in common among participants. Presentation of uniquely held information is inhibited, especially when it conflicts with already held beliefs (Stasser & Titus, 1985). When concurrence seeking is high, compliance to the majority (Asch, 1951) and "groupthink" (Callaway & Esser, 1984; Janis, 1972) may replace rational decision making. Similarly, where power hierarchies are present within a group, obedience to authority (Milgram, 1965) may supersede independent judgment. Duffy (1992) has recently reviewed team decision-making biases from a perspective that views groups as information-processing units. Drawing on the work of Hinsz (1990), Hinsz, Tindale, and Vollrath (1991), Lord (1985), Wegner (1987), and others, Duffy notes that groups must perceive, encode, store, and retrieve information in order to accomplish decision-making tasks.

The quality of the group's output will depend not only on the information available to individual group members but also on the shared "mental model" (Hinsz, 1991; Norman, 1983; Rouse & Morris, 1986) present in the group. One aspect of a group mental model is the agreed-upon representation of how the group operates as a system (who is responsible for what, who reports to whom, etc.). Another aspect of the term is the shared understanding of a problem facing the group. Thus, Orasanu (1990) sees a crew's shared mental model as arising out of the articulation of situation assessment (interpreting situation cues) and metacognition (defining the problem and devising a plan for coping with it). This definition suggests the creation of a shared cognitive schema (Thorndyke, 1984) that helps organize thinking about the problem. Casting a situation into a commonly shared frame of reference would seem essential for fusing information into a coherent "big picture." The extent to which this coherent perspective would be expected to occur within a group would depend on the past experience of the group, the uniqueness of the situation, and the quality of communication and trust between team members.
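One simple way to make "overlap" and "coverage" operational is to treat each member's SA zone as the set of environmental elements he or she is currently tracking. The sketch below assumes that simplification (set-valued zones rather than spatial zones) and is illustrative only; the element names are hypothetical.

def sa_overlap(zone_a, zone_b):
    """Jaccard overlap between two members' tracked-element sets."""
    if not zone_a and not zone_b:
        return 1.0
    return len(zone_a & zone_b) / len(zone_a | zone_b)

def coverage(zones, environment):
    """Fraction of the relevant environment tracked by at least one member."""
    tracked = set().union(*zones)
    return len(tracked & environment) / len(environment)

environment = {"weather", "traffic", "fuel", "engines", "navigation"}
captain = {"weather", "traffic", "fuel"}
first_officer = {"fuel", "engines"}

# The design tension described above: enough overlap to stay coordinated,
# enough separation to cover the whole environment.
print(sa_overlap(captain, first_officer))               # 0.25
print(coverage([captain, first_officer], environment))  # 0.8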

DISTRIBUTED DECISION MAKING AND GROUP SITUATION AWARENESS

In making the transition from individual to group decision making, the problem of synchronizing points of view and pooling information was added to the problem of recognizing patterns of events and selecting appropriate responses. If we now consider multiple decision-making units that must act in unison even though they are separated geographically, we have entered the world of distributed decision making. The Committee on Human Factors of the National Research Council (1990) recently proposed a working definition of distributed decision making (DDM) as the task faced by "organizations in which the information and responsibility for decision making is distributed among individuals within the organization, who are often distributed geographically" (p. xi). Sage (1987) described distributed decision-making environments as those where "decision making responsibility and knowledge bases are physically separated and potentially geographically distributed; and where there are multiple agents, each responsible for portions of the decision making effort" (p. 920). Whereas group decision making typically involves proximate group members pooling information to arrive at a common decision, DDM often involves semiautonomous decision makers who may be physically separated while attempting to share information and coordinate their activities to arrive at decisions that satisfy both regional and more global objectives. Typical of a DDM situation is the tactical and strategic decision making that must occur within a military command, control, and communication (C3) environment.

United States military activities in the Persian Gulf demonstrated the need for a high degree of coordination between air, land, and sea forces. Local decisions on the ground had to take into account what was going on overhead as well as out at sea. Other examples of DDM can be drawn from multinational business organizations as well as state and local governmental agencies, public schools, and university systems. Although decentralized organizational structures have been around for many years, what makes DDM unique in the 1990s is the development of high-speed telecommunication and data processing systems, plus the need for quicker response times. Even though decision makers may be physically separated, they can now be brought into electronic proximity with one another by means of various "groupware" (Engelbart & Lehtman, 1988; Galegher & Kraut, 1990; Johansen, 1988) or other teleconferencing tools ranging from shared computer networks to telephone and two-way television conferencing. Recent advances in machine intelligence (Wellens & McNeese, 1987) also make the possibility of sharing decision-making tasks with automated or expert systems feasible. The possibility of creating electronic groups, comprised of either remotely located humans or humans and intelligent machines, that can work together in real time makes the idea of timely "group situation awareness" more viable within a DDM environment. In attempting to find a method of representing the process of multiperson and person-machine decision making within a DDM environment, Wellens and Ergener (1988) extended an information-processing model originally developed by Wohl (1985, 1987) to describe a tactical warning and assessment system. The extended DDM model is depicted in Fig. 14.4. Like Wickens and Flach's (1988) individual decision-making model, Wohl's model placed the integration and assessment phase of decision making after the initial collection of raw data from multiple information sources and before the application of decision rules that allow selection of alternative action plans. All initial raw data, including historical data as well as current events, comprised what Wohl called "information space." After this information is integrated and assessed, a "big picture" emerges that is called "situation space." Finally, "action space" represents the behavioral options available that can affect environmental events, including those that contributed to the original information space. Wellens and Ergener (1988) added a feedback loop from action space to information space to show how actions taken ultimately feed back to information space and allow a reexamination of the situation. Wellens and Ergener also added a "communication bridge" to show how two or more decision-making units might share initial information, differing points of view, and action plans in order to coordinate their activities. The communication media used to create the bridge linking decision-making units would be expected to play a critical role in the development of group situation awareness and effective decision making.
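The extended model's flow can be sketched as an event loop: each unit fuses its local information space with whatever has crossed the communication bridge, forms a situation space, applies decision rules to select actions, and shares its view, with actions feeding back into future information. The sketch below is a hypothetical illustration of this structure only; it is not the CITIES software described later, and all names are invented.

class Bridge:
    """Communication bridge between decision-making units."""
    def __init__(self):
        self.shared = {}
    def share(self, sender, view):
        self.shared[sender] = view

class Unit:
    def __init__(self, name, resources):
        self.name, self.resources = name, resources
    def assess(self, info, bridge):
        # Integration & assessment: fuse the local information space with
        # views shared across the bridge into a local "situation space."
        merged = dict(info)
        for view in bridge.shared.values():
            merged.update(view)
        return merged
    def decide(self, situation):
        # Decision rules: select the events this unit can act on; its
        # "action space" is bounded by the resources it controls.
        return {event: kind for event, kind in situation.items()
                if kind in self.resources}

bridge = Bridge()
fire_rescue = Unit("fire-rescue", {"fire", "medical"})
police_tow = Unit("police-tow", {"crime", "wreck"})

local_info = {"Main St": "fire", "5th Ave": "wreck"}  # information space
for unit in (fire_rescue, police_tow):
    situation = unit.assess(local_info, bridge)  # situation space
    actions = unit.decide(situation)             # action space
    bridge.share(unit.name, situation)           # communication bridge
    print(unit.name, "acts on", actions)         # feedback would update info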

[Figure 14.4: flow diagram. Information space feeds integration and assessment, which yields situation space; decision rules then map situation space onto action space, with a feedback loop from action space back to information space and a communication bridge linking parallel decision-making units.]

FIG. 14.4. Wellens and Ergener's extended DDM model based on Wohl's (1985, 1987) information-processing approach. From "The CITIES Game: A Computer-Based Situation Assessment Task for Studying Distributed Decision Making" by A. R. Wellens and D. Ergener, 1988, Simulation and Games, 19, 306. Copyright © 1988 by Sage Publications. Adapted by permission.

COMMUNICATION NETWORKING AND DISTRIBUTED DECISION MAKING

One potential obstacle facing decision makers in DDM environments is the physical distance separating them from each other and from critical information. As mentioned in the previous section, various kinds of electronic media can be used to help bridge this gap. In attempting to conceptualize differences in electronic media, several authors (e.g., Connors, Harrison, & Akin, 1985; Kaplan, 1977; Korzenny, 1978) have used the idea of communication richness or bandwidth to order media according to their ability to carry information generally associated with face-to-face interactions. Following this theme, Wellens (1986, 1989b) proposed a generic "psychological distancing" model that used a proximity metaphor to order media according to the "immediacy" (Mehrabian, 1971, 1972), "richness" (Lengel & Daft, 1984), and "social presence" (Short, Williams, & Christie, 1976) they provide. Figure 14.5 shows the model's conceptual ordering of various telecommunication media that could be used to provide a communication bridge between decision-making units.

[Figure 14.5: media ordered along a psychological distance axis from CLOSE to REMOTE: face-to-face (kinesic, visual, paralinguistic, and linguistic channels), two-way T.V. (visual, paralinguistic, linguistic), telephone (paralinguistic, linguistic), and computer messaging (linguistic).]

FIG. 14.5. Wellens' Psychological Distancing Model's ordering of various telecommunication media according to the number of communication channels available. From "Effects of Telecommunication Media Upon Information Sharing and Team Performance: Some Theoretical and Empirical Findings" by A. R. Wellens, 1989, IEEE AES Magazine, September, 14. Copyright © 1989 by The Institute of Electrical and Electronics Engineers, Inc. Adapted by permission.

According to the model, as communication bandwidth decreases, the physical representation of an interlocutor is experienced less directly and appears more psychologically remote. In general, the empirical literature suggests that information-"rich" media can be used to support the socioemotional aspects of interaction that promote the establishment of positive interpersonal bonds, cooperation, role differentiation, and compliance. Information-"lean" media tend to increase interpersonal formality and focus interactants on the exchange of task-related factual information. Given the selective filtering capabilities of electronic media, it may be possible to reduce some of the negative impacts of group dynamics on decision making, summarized earlier, while enhancing the positive effects of resource pooling. For example, Johansen, Vallee, and Collins (1978) noted that some users view computer-based conferencing as a "highly cognitive medium that, in addition to providing technological advantages, promotes rationality by providing essential discipline and by filtering out affective components of communication." These authors suggest that under some circumstances users feel computer conferencing may be superior to face-to-face meetings in that it filters out "irrelevant and irrational interpersonal 'noise' and enhances the communication of highly-informed 'pure reason,' a quest of philosophers since ancient times" (p. 386).
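The ordering in Fig. 14.5 follows directly from counting the channels each medium carries. A minimal sketch of that ordering is given below; the channel sets come from the figure, while the numeric "distance" is a hypothetical stand-in for the model's qualitative axis.

MEDIA_CHANNELS = {
    "face-to-face": {"kinesic", "visual", "paralinguistic", "linguistic"},
    "two-way television": {"visual", "paralinguistic", "linguistic"},
    "telephone": {"paralinguistic", "linguistic"},
    "computer messaging": {"linguistic"},
}

def psychological_distance(medium):
    """Fewer channels means a leaner medium and greater psychological distance."""
    return 1.0 / len(MEDIA_CHANNELS[medium])

# Ordered from psychologically close to psychologically remote.
for medium in sorted(MEDIA_CHANNELS, key=psychological_distance):
    print(f"{medium:20s} distance = {psychological_distance(medium):.2f}")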

14.

GROUP SITUATION AWARENESS

277

overall group performance (centralized structures worked best for easy tasks, decentralized structures worked better for difficult tasks). Although these early networking studies broke new ground by pointing out the power of communication structure on group problem solving, they manipulated only the presence or absence of communication links between participants and did not deal with the more complex issue of the particular communication channels employed. On the other hand, Chapanis and his colleagues (Chapanis, 1971; Ochsman & Chapanis, 1974) systematically explored the effects of several different types of media within a series of two-person problem-solving tasks. The objective of this research was to determine the optimal arrangement of communication channels to use when connecting people with intelligent machine aides of the 2001 variety. By design, these studies dealt with highly structured tasks and did not examine media effects within the more dynamic situation assessment settings characteristic of DDM environments. Nevertheless, strong media effects were found with regard to the volume of information transmitted and the speed with which tasks were accomplished. These and other early studies of communication bandwidth effects suggested that the communication bridge selected to connect decision-making units within a DDM environment would be critical to the development of group situation awareness and timely decision making. What was lacking was an experimental paradigm to study these effects in a systematic manner.

AN EXPERIMENTAL PARADIGM FOR STUDYING GROUP SITUATION AWARENESS

In an attempt to better understand the effects of communication media on group situation awareness within a distributed decision-making environment, Wellens and Ergener (1988) developed a computer-based simulation of a civilian command, control, and communication (C3) setting. The C3 Interactive Task for Identifying Emerging Situations (CITIES) provided a functional simulation of many aspects of a metropolitan emergency response ("911") center where information is constantly being received from multiple sources by members of remotely located semiautonomous decision-making units. The task was developed with all the major components of the extended DDM model in mind. Thus, it (a) provided control over all information available (both historical and real-time events) to each decision-making unit, (b) allowed situation awareness to emerge between decision-making units, (c) allowed the manipulation of communication media used to connect decision-making units, (d) provided a finite yet flexible action space, and (e) provided objective feedback regarding the effects of actions taken. The task was embedded within a setting that allowed the monitoring of group information-gathering
activities as well as the recording of actions taken by each decision-making unit. The task and research setting have been described in detail elsewhere (Wellens, 1990; Wellens & Ergener, 1988), so only a brief description is given here.

Task Description

CITIES participants are assigned to one of two physically separated "dispatch centers" that contain computer displays and telecommunication equipment. One dispatch center has the responsibility of assigning fire and rescue resources to emergency events, whereas the other dispatch center assigns police and tow truck resources. Emergency events appear as flashing red icons on computer-generated maps displayed on color television monitors within each dispatch center. The pacing, distribution, and interdependence of events to which teams respond is predetermined by programmed event scenarios. Detailed information about the nature and intensity of emergency events may be obtained by dispatch center personnel by simply touching the flashing event icons on their touch-sensitive television monitors. The computer graphic screen temporarily gives way to a text screen that provides the requested information. Resources, such as fire trucks or police cars, can be allocated to map locations via a similar touch process. The fire-rescue and police-tow centers are instructed to cooperate in meeting their joint goal of preserving life and property in the regions displayed on their computer maps. Resources correctly assigned to events have the effect of reducing the intensity of the events, which would otherwise grow in strength. Event icons turn green and eventually extinguish if appropriate resources have been allocated to them; otherwise, they remain red and eventually turn purple if they grow "out of control." Team performance is measured by a numeric index that takes into account the speed and appropriateness with which resources are assigned. In order to develop an overall "big picture" and coordinate activities, police and fire centers must share information across communication links that are provided by the experimenter. In line with the distancing model described earlier, these links ranged from a narrow-band computer messaging system (Ergener & Wellens, 1985) to a broad-band two-way television system (Wellens, 1978).
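
Although the original CITIES software is not reproduced here, the bookkeeping just described, in which events grow in intensity until matched by an appropriate resource and a numeric index rewards fast, appropriate assignments, can be sketched in a few lines. The Python below is purely illustrative; the growth rate, scoring weights, and class structure are assumptions made for the example, not the actual CITIES parameters.

```python
# Illustrative sketch of CITIES-style event bookkeeping; not the original
# code. The growth rate and scoring weights below are assumed values.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Event:
    kind: str                       # e.g., "fire-rescue" or "police-tow"
    onset: int                      # simulation tick when the icon appeared
    intensity: float = 1.0
    handled_at: Optional[int] = None

@dataclass
class Simulation:
    events: List[Event] = field(default_factory=list)

    def tick(self, growth: float = 0.1) -> None:
        # Events without an appropriate resource grow in strength.
        for e in self.events:
            if e.handled_at is None:
                e.intensity *= 1.0 + growth

    def assign(self, e: Event, resource_kind: str, t: int) -> bool:
        # A correctly matched resource halts growth (icon "turns green").
        if resource_kind == e.kind:
            e.handled_at = t
            return True
        return False

    def performance_index(self) -> float:
        # Reward speed and appropriateness; penalize events left to grow.
        score = 0.0
        for e in self.events:
            if e.handled_at is None:
                score -= e.intensity                   # "out of control"
            else:
                score += 10.0 / (1 + e.handled_at - e.onset)
        return score
```

A scenario driver would call tick() at a fixed interval and feed assign() from the touch-screen interface; the point here is only the shape of the speed-appropriateness tradeoff captured by the index.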

Event Scenarios

Two different event scenarios were written for CITIES, each with a slightly different way of assessing group situation awareness. The first was called the "Visiting Senator" scenario. It consisted of 44 police-tow and 45 fire-rescue events distributed with escalating intensities over a 30-minute time period.
At the beginning of the scenario the police-tow dispatch center received a written itinerary listing the times and places a senator would be visiting the region during their work shift. They were asked to ensure adequate police resources as the senator moved in an orderly fashion around the city map. The remotely located fire-rescue team received no initial information about the senator. As the scenario unfolded, both police-tow and fire-rescue centers responded to emergency events, many of which occurred along the path taken by the senator. Recognizing that these events were linked to the senator's movements would enable the police-tow dispatch center to place resources in advance of the senator's movements in order to reduce the intensity of potentially threatening events. Transmitting information about the senator's movements to the fire-rescue dispatch center would enable that center to similarly discover the pattern and respond appropriately. The measure of group situation awareness in this scenario was the ability of the police-tow and fire-rescue decision makers to accurately report the connection between the senator's movements and major emergency events on a post-session questionnaire.

The second scenario written for CITIES was called the "Three Waves" scenario. It consisted of three 20-minute segments, each containing 16 police-tow and 20 fire-rescue events arranged in temporally offset ascending "waves" of intensity that crested with a major fire or rescue incident. Each wave began with a police or tow event that escalated and eventually "spilled over" into the fire-rescue domain. Messages announcing these events included information that forewarned the police-tow center that fire-rescue involvement might be needed. With good communication between dispatch centers, the fire-rescue center could be made aware of the incident before it became critical, and resources could be preallocated accordingly. The measure of situation awareness in this scenario was the reported extent to which fire-rescue dispatchers preassigned resources to these events. Another measure of situation awareness has recently been added to this scenario and involves a post-session debriefing that requires operators to recall details of selected events throughout the scenario.
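
The offset-wave structure lends itself to a compact description. The generator below produces a "Three Waves"-style schedule; the linear intensity ramp and the five-minute spill-over lag are illustrative assumptions, not the published scenario parameters.

```python
# Illustrative generator for a "Three Waves"-style event schedule. The
# linear ramp and the five-minute spill-over lag are assumed values.
def three_waves(wave_minutes=20, waves=3, n_police=16, n_fire=20, lag=5):
    schedule = []  # each entry: (minute, domain, relative intensity)
    for w in range(waves):
        base = w * wave_minutes
        # Police-tow events open each wave; fire-rescue events lag behind,
        # "spilling over" as the wave builds toward its crest.
        for i in range(n_police):
            t = base + i * (wave_minutes - lag) / n_police
            schedule.append((round(t, 1), "police-tow", 1 + i / n_police))
        for i in range(n_fire):
            t = base + lag + i * (wave_minutes - lag) / n_fire
            schedule.append((round(t, 1), "fire-rescue", 1 + i / n_fire))
    return sorted(schedule)

for minute, domain, intensity in three_waves()[:5]:
    print(minute, domain, round(intensity, 2))
```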

Experimental Manipulations and Outcomes

Several experiments have been conducted using CITIES, some at the University of Miami's Interactive Television facility (Wellens, 1979, 1987; Wellens, Grant, & Brown, 1990) and others at a similar facility constructed at the Armstrong Aerospace Medical Research Laboratory (Wellens, 1990). Although each of these experiments was designed to test multiple hypotheses, only those portions related to group situation awareness are discussed here. The first experiment (Wellens, 1987) was a "test run" of the CITIES procedure. It employed the "Visiting Senator" scenario to assess the effects of several
different kinds of team-to-team communication media on group situation awareness and performance. Within this experiment 40 groups, each comprised of 2 two-person teams, were randomly assigned to 1 of 4 team-to-team communication conditions (two-way television, audio intercom, computer messaging, or no-communication control). Each team consisted of a "dispatcher" (who physically interacted with the CITIES touch-screen map) and an "advisor" (who was to keep abreast of all events and maintain contact with the remote team). Each two-person team was assigned to either a police-tow or a fire-rescue center. Videotapes were made of all within- and between-team communications. Post-session questionnaires were used to assess each team's awareness of the senator's movements being linked to emergency events. Computer logs were scanned by an automated scoring program to yield a numeric index of overall group performance.

Results of the first experiment were somewhat disappointing, but highly informative. Although the majority of the police-tow teams became situationally aware of the connection between the senator's movements and emergency events, none of the remote fire-rescue teams in any of the communication conditions reported such awareness. An examination of the communication patterns within these groups indicated why. First, the majority of communications for both police-tow and fire-rescue teams stayed within-team. Dispatchers and advisors paid close attention to their own areas of responsibility. In fact, advisors often micromanaged their own dispatchers and narrowed their focus to immediate concerns. Second, between-team communications rarely dealt with strategy formation. In most cases, these communications were either simple requests for assistance or comments about how the task was going. No significant differences were found in group performance across communication conditions. Simply increasing between-team communication potential did not guarantee increased communication usage.

A second experiment (reported in Wellens, 1990) was conducted that represented a conceptual replication of the first experiment. Within this experiment, hybrid human-machine teams replaced the all-human advisor-dispatcher teams of the first experiment. A rule-based expert system was developed that took the place of the human dispatcher. This system could automatically respond to incoming emergency events by going through the same steps that the human dispatcher used to perform. Thus, it displayed incoming messages, examined event intensities, and assigned resources accordingly. The remaining human team member was given the role of "supervisor." The supervisor's role was to monitor the expert system and communicate with the other dispatch center to formulate optimizing strategies. The supervisor at each dispatch center could override the local expert system to reallocate or preassign resources under its control. As in the first experiment, 40 groups comprised of 2 teams each were randomly assigned to 1 of 4 team-to-team communication conditions. Teams were again exposed to the "Visiting
Senator" scenario, but events were distributed over a 60-minute time period instead of the original 30-minute time period. This replication, then, optimized the chances of group situation awareness developing between decision-making units by allowing more time to discuss strategy and demanding less in terms of direct manipulation of the system. The results of the second experiment were markedly different from those obtained in the first experiment. Between-team communication bandwidth systematically affected the amount of information exchanged between teams. The number of words exchanged between supervisors significantly increased in stepwise fashion as bandwidth increased from computer messaging to audio intercom to two-way television. Team performance also showed slight increases across these conditions but did not reach statistical significance. Perhaps the most interesting result in the second experiment was the effects of communication media on situation awareness. Recall that the measure of situation awareness in the "Visiting Senator" scenario was the post-session reporting of a connection between the senator's movements and significant emergency events. In theory, all the police-tow teams should have become situationally aware because they had access to all the necessary information (the senator's itinerary plus associated emergency events). The firerescue teams should have become aware as the amount of information exchanged between dispatch centers increased. Figure 14.6 shows the actual outcome for the experiment. Note that whereas the percentage of fire-rescue teams that became aware increased slightly at the higher bandwidths, the percentage of police-tow teams that became aware actually decreased as bandwidth increased. One possible explanation for this finding was that the increased attention demands brought about by broader team-to-team communication bandwidths interfered with the police-tow supervisor's ability to recognize what should have been an obvious pattern. Thus, a modest gain in situation awareness on the part of the remote (fire) team was apparently bought at the expense of decreased awareness on the part of the originating (police) team. Two additional experiments were conducted that used the "Three Waves" scenario (Wellens, 1990; Wellens et al., 1990). The focus of these studies was to examine alternative ways of communicating critical event information from a completely automated dispatch center (no human presence) to a remote dispatch center supervised by a human operator. In each case the automated center received advanced warning of ev~nts that would eventually involve the fire-rescue center. An automated communication system was developed that used either a low-bandwidth computer messaging system or a highbandwidth animated "talking head" to relay information to the remotely located human. Both of these communication systems proved to be effective in increasing the human operator's awareness of developing events and reported ability to preallocate resources. Interestingly, an additional condition in

282

WELLENS

100

75

70

p

F

p

E E

'E ca

0

.Q

0

0

z

0

F

p

p 0

=s:::s

F

~

c(

~ Q)

~

FIG. 14.6. The percentage of police (P) and fire (F) teams in each communication condition that showed evidence of situation awareness. (Data from "Assessing Multi-Person and Person-Machine Distributed Decision Making Using An Extended Psychological Distancing Model" by A. R. Wellens, 1990. WrightPatterson Air Force Base, Ohio: Armstrong Aerospace Medical Research Laboratory.)

which the human operator could simply look "over the shoulder" of the remote expert system by viewing its CITIES map display proved equally effective. Being able to directly monitor the remote system's actions decreased the human operator's dependence on explicit written or verbal messages relayed by the automated system.

The second experiment followed up on the latter finding and examined the effects of time stress on a human operator's attention to a remote dispatcher's display. The "Three Waves" scenario was run with events distributed across the normal 20-minute period, extended to 40 minutes (low stress), or shortened to 10 minutes (high stress) per wave. The dispatcher's head and eye movements were unobtrusively monitored by a concealed television camera. As one might expect, the remote dispatcher spent the majority of his time attending to his own display, but this was exacerbated under increased time stress. Thus, the remote operator spent proportionately less time
visually attending to the auxiliary display when he was placed under high time stress as compared to the low stress condition. This was associated with a significant decrease in reported ability to preallocate resources under high time stress and an accompanying significant decrease in performance. A secondary measure of situation awareness, which assessed operators' recall of details associated with selected events that had occurred during various parts of the scenario, was also included in this experiment. Results showed a trend toward decreased recall of events that occurred under high time stress conditions. Even under relatively low time stress conditions, however, operators' recall was less than perfect (averaging about 60%).

In discussing this relatively low recall rate with subjects and examining videotapes of the sessions, it was discovered that one strategy for coping with time pressure was to streamline the information-gathering process before allocating resources. Thus, if several emergency events occurred in rapid succession, one way for dispatchers to respond to all of them in a timely manner was to allocate resources to them before fully understanding the nature of the event. If the allocated resource proved to be sufficient in handling the event, there was no motivation to go back and explore it further. When responses became "reactive" and less "analytic," there was probably less chance for long-term memory involvement and a lowered probability of recall later.

Lessons Learned

One important lesson learned from the development of this experimental paradigm was an increased appreciation for the fact that group-to-group communication and distributed decision making are far more complex than any model has yet captured. Initially, it was expected that group situation awareness and performance would increase as a function of increased channel capacity between decision-making units. However, it was found that the kind of information presented across available channels was far more important than channel capacity. It now seems obvious that increasing the "social presence" of remote decision makers can introduce unwanted distractions and attention obligations that may actually detract from a team's ability to focus on information already available to it. In some cases, simply presenting an abstract representation of what actions a remote decision maker has made (or intends to make) may provide just the kind of speedy information another decision maker needs, without either party having to take the time or energy to formulate an explicit message.

In comparing human-human teams with hybrid human-machine teams, some additional lessons were learned. First, human pairs who were co-located tended to make decision-making processes "visible" by openly discussing plans before taking action. Single individuals or computer-based expert systems
kept these decision-making strategies "private" and unobservable, sometimes making it difficult for an outside observer to ascertain the underlying reasons for an action. The difficulties that can arise from automated systems that do not "communicate" have been well documented (e.g., Braune, 1990; Wiener, 1988). Second, human pairs appeared to be more "egocentric" in their handling of information. Thus, the "pull" of a proximate human partner appeared to bias attention in the direction of that partner, perhaps at the expense of information that should have been sent to or received from remotely located team members.

It has become obvious to the investigator that group situation awareness, whether it occurs between people or between people and machines, does not just "happen." It takes a concerted effort to share and integrate information in such a way as to detect predictive patterns from a stream of incoming events. Whereas individuals and machines may vary in their ability to share, integrate, and interpret information, there can be little doubt that this process requires information-processing resources that cannot be simultaneously applied to other tasks requiring similar capacities.

An interesting example of the limited-capacity problem occurred when the automated system used in the third experiment was being developed. A method had to be found for a computer to quickly dispatch resources while simultaneously notifying a remote operator of potentially critical events. Eventually, it took three microprocessors running in parallel to accomplish this task in real time. The first processor housed the decision-making software that allocated local resources; the second processor filtered incoming information for cues that would be potentially relevant for the remote team; and the third processor assembled and transmitted the appropriate messages to the remote team based on the second processor's output. All three processors ran programs that were "interrupt driven" so that they had to attend to new data as it arrived. When these three processors were being tested as an integrated system, the investigator intentionally speeded up the flow of data beyond what would normally occur in a programmed scenario just to see what would happen. Interestingly, an early version of the computer-generated graphic representing the human-like "talking head" showed signs of "stress." Thus, when data arrived too quickly it became hesitant in its speech and uncoordinated in its movements. Although this was perfectly understandable in terms of its programming (the data-driven "interrupts" momentarily disrupted subroutines responsible for speech, eye, and mouth movements), it made salient the potential effects of rapidly incoming events on the similarly limited attention capacities of human subjects.
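
The division of labor just described maps naturally onto a producer-consumer pipeline. The following sketch imitates the three-processor arrangement with threads and queues; it is a toy reconstruction, and the event fields, relevance flag, and alert format are invented for illustration rather than taken from the original system.

```python
# Toy reconstruction of the three-processor pipeline described above.
# Stage 1 "dispatches" local resources, stage 2 filters cues relevant to
# the remote team, stage 3 formats outgoing alerts. Fields are assumed.
import queue
import threading

incoming, cues, outbox = queue.Queue(), queue.Queue(), queue.Queue()
STOP = object()  # sentinel that shuts each stage down in turn

def dispatcher():
    while (event := incoming.get()) is not STOP:
        event["dispatched"] = True        # stand-in for resource allocation
        cues.put(event)
    cues.put(STOP)

def cue_filter():
    while (event := cues.get()) is not STOP:
        if event.get("remote_relevant"):  # assumed flag marking spill-overs
            outbox.put(event)
    outbox.put(STOP)

def messenger(log):
    while (event := outbox.get()) is not STOP:
        log.append(f"ALERT: {event['kind']} at {event['where']}")

log = []
stages = [threading.Thread(target=dispatcher),
          threading.Thread(target=cue_filter),
          threading.Thread(target=messenger, args=(log,))]
for s in stages:
    s.start()
for e in ({"kind": "tow", "where": "A3", "remote_relevant": False},
          {"kind": "fire precursor", "where": "C7", "remote_relevant": True}):
    incoming.put(e)
incoming.put(STOP)
for s in stages:
    s.join()
print(log)  # ['ALERT: fire precursor at C7']
```

When events arrive faster than any one stage can service them, its input queue backs up: the software analogue of the "hesitant" talking head described above.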

Another less obvious lesson learned in these experiments was the complex relationship between situation awareness and performance. Whereas one might think that increased SA would automatically be associated with increased performance, this was not always the case. If SA took too long to
develop or consumed time that should have been spent taking action on other events in need of attention, the cost of SA would rapidly outweigh its benefit. Thus, although there was a trend toward increased performance as SA increased, there were instances where SA was bought at the expense of overall performance. More thought needs to go into determining what level of group SA is needed for accomplishing different kinds of group tasks.

CONCEPTUAL AND PRACTICAL ISSUES

An attempt was made in this chapter to trace the evolution of the group situation awareness concept and then describe a paradigm used to study it within a distributed decision-making environment. In coming up with an operational definition of group situation awareness within CITIES, the reader may have noticed that an emphasis was placed on higher level aspects of "long-term" SA related to a team's ability to forecast the future and preassign resources in advance of trouble. Little attention was paid, at least until the last experiment reported, to the more concrete elements of "immediate" SA that allowed teams to maintain moment-by-moment control of their dispatch systems. The decision to emphasize the more cognitively complex aspects of group SA was made with the idea that it would give a clearer indication of a team's awareness of the "big picture." Rather than probe each individual team member's memory for specific elements of the immediate environment, it was decided to study the integration and organization of these elements into an overall gestalt. This was in contrast to studies of individual SA that have traditionally focused on a pilot's ability to recall specific instrument readings or aircraft positions after a flight scenario has been temporarily interrupted (e.g., Endsley, 1988; Fracker, 1988, 1989; Marshak, Kuperman, Ramsey, & Wilson, 1987). Interestingly, those who have suggested expanding ways of measuring individual and group SA within the cockpit (e.g., Harwood, Barnett, & Wickens, 1988; Sarter & Woods, 1991; Tenney, Adams, Pew, Huggins, & Rogers, 1991) have proposed the use of similar dynamic scenarios that contain subtle cues whose importance is not realized until decision points are reached later in the scenario. These scenario-based approaches are all reminiscent of the early simulator experiments conducted at NASA-Ames Research Center (e.g., Foushee & Manos, 1981; Ruffel Smith, 1979).

Clearly, there is a need to measure both the "explicit" and "implicit" (Harwood et al., 1988) aspects of group SA in order to gain a better understanding of how various levels of SA evolve. Recent attempts by the present investigator to supplement global measures of group SA with post-session memory probes of specific events led to some interesting results. The finding that under some conditions operators streamlined the information-gathering
process before allocating resources to emergency events suggested a coping mechanism that may interfere with higher levels of SA. If information gathering is abbreviated for the sake of making timely decisions, the probability of an accurate "big picture" evolving from accumulated stores in long-term memory may be reduced. The tradeoff between timeliness and accuracy has been reported by other investigators using an alternative distributed decision-making paradigm (e.g., Entin, Serfaty, & Williams, 1987; Serfaty, 1990). These investigators reported additional adjustments in response to time stress that include a reduction in communication between team members. This underscores the importance of time pressure and its effect on group situation awareness. As team members streamline their information-gathering activities and shut down their communication with other team members, accurate group SA may eventually collapse. At least one set of authors has identified "time" as the "hallmark" of situation awareness (Harwood et al., 1988), and others have mentioned it as an integral part of the process (Adams & Pew, 1990; Sarter & Woods, 1991). When individuals or groups "go over the SA cliff" (Press, 1986), it can be a result of there simply not being enough time to handle all the tasks facing them.

As a practical matter, how can group SA be better maintained in working groups? Certainly, the answer to this question will depend on the particulars of the group, the task, and the setting. For many kinds of groups, situation awareness is easily maintained because of the physical proximity of team members and the visibility of the task. For example, members of a sailing team may obtain a uniform sense of wind speed and direction directly from the environment they all share in common. They can respond to sudden bursts of wind in a coordinated fashion because they all experience the sensation simultaneously and can see each other's reaction to it. As long as the helmsman can see the marker buoys, the team will most likely be able to maintain its course.

The problem in maintaining group SA is greater for groups whose members do not share the same environment or whose tasks are less visible. An air traffic controller and an airline flight crew must work as a team to coordinate the landing of an aircraft. Their joint effort is complicated by the fact that each is out of sight of the other and each has many tasks to perform that are not directly observable. If something out of the ordinary must be dealt with at either location, assumptions about what tasks are being performed and where the joint effort is headed may be wrong. Clarification of the situation takes deliberate verbal communication that must be added to an already heavy workload.

How can SA be increased in DDM environments? Results of the experiments reported here suggested there is no simple technological "fix." As long as time pressure remains a factor, the answer will most probably come from the intersection of three separate approaches described by Duffy (1992):
training, organization, and additional engineering. Several programs are already underway that are aimed at training team members to become more aware of multiple mental models (e.g., Cannon-Bowers, Salas, & Converse, 1990) and to communicate with one another more effectively (e.g., Lauber & Foushee, 1981; Oser, McCallum, Salas, & Morgan, 1989; Palmer, 1990). Still others are focusing on the speed and efficiency that come with increased expertise (e.g., Klein & Klinger, 1991) and exposure to anticipated patterns of events (e.g., Companion, 1990). Organizational approaches have been attempted that focus on restructuring tasks (e.g., Stammers & Hallam, 1985) and communication systems (e.g., Hiltz & Turoff, 1985) to take into account the increased informational demands being placed on individual team positions. Finally, engineering efforts have been undertaken that focus on how information can be more effectively transmitted and displayed for use under high time stress conditions (e.g., Schwartz & Howell, 1985; Scott & Wickens, 1983; Stollings, 1982; Wilson, McNeese, Brown, & Wellens, 1987).

The theoretical and empirical reports summarized in this chapter are but the beginnings of an interdisciplinary effort to understand group situation awareness and distributed decision making. As behavioral scientists and engineers struggle to cope with problems already manifesting themselves in existing DDM environments, new technologies and applications are appearing on the horizon. New ways of manufacturing products, producing energy, diagnosing illnesses, exploring space, and even conducting science are rapidly emerging. Many of these endeavors envisage increased collaboration between geographically dispersed individuals and machines. It is hoped that our own ability to grasp the "big picture" will help us develop concepts broad enough and sophisticated enough to meet these new challenges.

ACKNOWLEDGMENTS

The author would like to thank the Air Force Office of Scientific Research for sponsoring the development of the CITIES paradigm and for supporting much of the research associated with it. Thanks are also extended to the Armstrong Aerospace Medical Research Laboratory, Human Engineering Division, Wright-Patterson Air Force Base, Ohio, for hosting the project through the University Resident Research Program. The author would like to thank the following people for helping locate many of the technical reports cited in this chapter and for their own contributions to the situation awareness literature: Clifford Brown, LorRaine Duffy, Mica Endsley, Michael McNeese, Lisa Shrestha, Eduardo Salas, and Yvette Tenney.

REFERENCES

Adams, M. J., & Pew, R. W. (1990). Situation awareness in the commercial aircraft cockpit: A cognitive perspective. Proceedings of the IEEE International Digital Avionics Systems Conference, 519-524.
Asch, S. (1951). Effects of group pressure upon the modification and distortion of judgment. In H. Guetzkow (Ed.), Groups, leadership and men. Pittsburgh, PA: Carnegie Press.
Ben-Bassat, M., & Freedy, M. (1982). Knowledge requirements and management in expert decision support systems for (military) situation assessment. IEEE Transactions on Systems, Man and Cybernetics, 12, 479-490.
Braune, R. J. (1990). Automation in commercial transport airplanes: The role of situation awareness. Proceedings of the Human Factors Society 34th Annual Meeting, 22.
Callaway, M., & Esser, J. (1984). Groupthink: Effects of cohesiveness and problem-solving procedures on group decision making. Social Behavior and Personality, 12, 157-164.
Cannon-Bowers, J. A., Salas, E., & Converse, S. (1990). Cognitive psychology and team training: Training shared mental models of computer systems. Human Factors Society Bulletin, 33, 1-4.
Chapanis, A. (1971). Prelude to 2001: Exploration in human communication. American Psychologist, 26, 949-961.
Companion, M. (1990). Training technology for situation awareness: Annual technical project report (Florida High Technology and Industry Council Applied Research Grants Program Research Report). Orlando: Institute for Simulation and Training.
Connors, M., Harrison, A., & Akins, F. (1985). Living aloft: Human requirements for extended spaceflight. Washington, DC: National Aeronautics and Space Administration.
Dornheim, M. A. (1986, June 23). Crew situation awareness drives avionics developments. Aviation Week and Space Technology, 114-116.
Duffy, L. (1992). Team decision making biases: An information processing perspective. In G. A. Klein, J. Orasanu, & R. Calderwood (Eds.), Decision making in action: Models and methods (pp. 234-242). Norwood, NJ: Ablex.
Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). Proceedings of the IEEE National Aerospace and Electronics Conference, 3, 789-795.
Endsley, M. R. (1989). Final report: Situation awareness in an advanced strategic RT mission (NOR DOC 89-32). Hawthorne, CA: Northrop.
Engelbart, D., & Lehtman, H. (1988). Working together. Byte, 13, 245-252.
Entin, E., Serfaty, D., & Williams, P. (1987). Timeliness versus accuracy in team decision making. Presented at the 5th Annual Workshop on Command and Control Decision Aiding, Kansas City, MO.
Ergener, D., & Wellens, A. R. (1985). A split-screen electronic messaging system for Apple II computers. Behavior Research Methods, Instruments, & Computers, 17, 556-564.
Foushee, H. C. (1982). The role of communications, socio-psychological, and personality factors in the maintenance of crew coordination. Aviation, Space, and Environmental Medicine, 53, 1062-1066.
Foushee, H. C. (1984). Dyads and triads at 35,000 feet: Factors affecting group process and aircrew performance. American Psychologist, 39, 885-893.
Foushee, H. C., & Helmreich, R. L. (1988). Group interaction and flight crew performance. In E. L. Wiener & D. C. Nagel (Eds.), Human factors in aviation (pp. 189-227). New York: Academic Press.
Foushee, H. C., & Manos, K. L. (1981). Information transfer within the cockpit: Problems in intracockpit communications. In C. E. Billings & E. S. Cheaney (Eds.), Information transfer problems in the aviation system (NASA Tech. Paper 1875; pp. 63-71). Moffett Field, CA: NASA-Ames Research Center.
Fracker, M. L. (1988). A theory of situation assessment: Implications for measuring situation awareness. Proceedings of the Human Factors Society 32nd Annual Meeting, 102-106.
Fracker, M. L. (1989). Attention allocation in situation awareness. Proceedings of the Human Factors Society 33rd Annual Meeting, 1396-1400.
Galegher, J., & Kraut, R. (1990). Technology for intellectual teamwork: Perspectives on research and design. In J. Galegher & R. Kraut (Eds.), Intellectual teamwork: Social and technological foundations of cooperative work (pp. 1-20). Hillsdale, NJ: Lawrence Erlbaum Associates.
Harwood, K., Barnett, B., & Wickens, C. (1988). Situation awareness: A conceptual and methodological framework. Proceedings of the 11th Symposium of Psychology in the Department of Defense, 316-320.
Hendrick, C. (Ed.). (1987). Group processes. Beverly Hills, CA: Sage.
Hiltz, S. R., & Turoff, M. (1985). Structuring computer-mediated communication systems to avoid information overload. Communications of the ACM, 28, 680-689.
Hinsz, V. B. (1990). A conceptual framework for a research program on groups as information processors. Technical report submitted to the Logistics and Human Factors Division, AF Human Resources Laboratory, Wright-Patterson AFB, OH.
Hinsz, V. B. (1991). Considerations in the specification, assessment and evaluation of mental models of social systems. Unpublished manuscript.
Hinsz, V. B., Tindale, R. S., & Vollrath, D. A. (1991). The emerging conceptualization of groups as information processors. Manuscript submitted for publication.
Janis, I. (1972). Victims of groupthink. Boston: Houghton-Mifflin.
Johansen, R. (1988). Groupware: Computer support for business teams. New York: The Free Press.
Johansen, R., Vallee, J., & Collins, K. (1978). Learning the limits of teleconferencing. In M. Elton, W. Lucas, & D. Conrath (Eds.), Evaluating new telecommunication systems (pp. 385-398). New York: Plenum Press.
Kaplan, K. (1977). Structure and process in interpersonal "distancing." Environmental Psychology and Nonverbal Behavior, 1, 17-29.
Klein, G. (1989, May). Strategies of decision making. Military Review, 56-64.
Klein, G., & Klinger, D. (1991). Naturalistic decision making. CSERIAC Gateway, 2, 1-4.
Korzenny, F. (1978). A theory of electronic propinquity: Mediated communication in organizations. Communication Research, 5, 3-23.
Lauber, J. K., & Foushee, H. C. (1981). Guidelines for line-oriented flight training (Vol. 1, NASA Conference Publication 2184). Moffett Field, CA: NASA-Ames Research Center.
Laughlin, P. R. (1980). Social combination processes of cooperative problem-solving groups on verbal intellective tasks. In M. Fishbein (Ed.), Progress in social psychology (Vol. 1, pp. 127-155). Hillsdale, NJ: Lawrence Erlbaum Associates.
Leavitt, H. J. (1951). Some effects of certain communication patterns on group performance. Journal of Abnormal and Social Psychology, 46, 38-50.
Lengel, R. H., & Daft, R. L. (1984). Exploratory analysis of the relationship between media richness and managerial information processing (TR-DG-08-ONR). College Station, TX: Department of Management, Texas A&M University.
Lord, R. G. (1985). An information processing approach to social perceptions, leadership, and behavioral measurement in organizations. In L. L. Cummings & B. M. Staw (Eds.), Research in organizational behavior (Vol. 7, pp. 87-128). Greenwich, CT: JAI Press.
Marshak, W. P., Kuperman, G., Ramsey, E. G., & Wilson, D. (1987). Situation awareness in map displays. Proceedings of the Human Factors Society 31st Annual Meeting, 533-535.
McGrath, J. E. (1984). Groups: Interaction and performance. Englewood Cliffs, NJ: Prentice-Hall.
Mehrabian, A. (1971). Silent messages. Belmont, CA: Wadsworth.
Mehrabian, A. (1972). Nonverbal communication. New York: Aldine-Atherton.
Milgram, S. (1965). Some conditions of obedience and disobedience to authority. Human Relations, 18, 57-75.
Morishige, R. I., & Retelle, J. (1985, October). Air combat and artificial intelligence. Air Force Magazine, 91-93.
National Research Council. (1990). Distributed decision making: Report of a workshop. Washington, DC: National Academy Press.
Norman, D. (1983). Some observations on mental models. In D. Gentner & A. Stevens (Eds.), Mental models (pp. 7-14). Hillsdale, NJ: Lawrence Erlbaum Associates.
Ochsman, R. B., & Chapanis, A. (1974). Effects of 10 communication modes on behavior of teams during co-operative problem solving. International Journal of Man-Machine Studies, 6, 579-619.
Orasanu, J. M. (1990). Shared mental models and crew decision making (CSL Report 46). Princeton, NJ: Cognitive Science Laboratory, Princeton University.
Oser, R., McCallum, G. A., Salas, E., & Morgan, B. B. (1989). Toward a definition of teamwork: An analysis of critical team behaviors (TR 89-004). Orlando: Naval Training Systems Center, Human Factors Division.
Palmer, E. (1990). Crew situation awareness. Proceedings of the Human Factors Society 34th Annual Meeting, 22.
Person, L. H., & Steinmetz, G. G. (1981, November 9-12). The integration of control and display concepts for improved pilot situation awareness. Flight Safety Foundation International Air Safety Seminar, Acapulco.
Press, M. (1986). Situation awareness: Let's get serious about the clue bird. Unpublished paper.
Rouse, W. B., & Morris, N. M. (1986). On looking into the black box: Prospects and limits in the search for mental models. Psychological Bulletin, 100, 349-363.
Ruffel Smith, H. P. (1979). A simulator study of the interaction of pilot workload with errors, vigilance, and decisions (NASA Technical Memorandum 78482). Moffett Field, CA: NASA-Ames Research Center.
Sage, A. P. (1987). Information systems engineering for distributed decision making. IEEE Transactions on Systems, Man, and Cybernetics, 17, 920-936.
Sarter, N. B., & Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.
Schwartz, D. R., & Howell, W. C. (1985). Optimal stopping performance under graphic and numeric CRT formatting. Human Factors, 27, 433-444.
Scott, B., & Wickens, C. D. (1983). Spatial and verbal displays in a C3 probabilistic information integration task. Proceedings of the Human Factors Society 27th Annual Meeting, 355-358.
Serfaty, D. (1990). Studies in team decision making: Adoption processes in team coordination. Presented at the 98th Annual Convention of the American Psychological Association, Boston.
Shaw, M. E. (1964). Communication networks. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 1, pp. 111-147). New York: Academic Press.
Shaw, M. E. (1978). Communication networks fourteen years later. In L. Berkowitz (Ed.), Group processes (pp. 351-361). New York: Academic Press.
Short, J., Williams, E., & Christie, B. (1976). The social psychology of telecommunications. London: Wiley.
Stammers, R. B., & Hallam, J. (1985). Task allocation and the balancing of task demands in the multi-man-machine system: Some case studies. Applied Ergonomics, 16, 251-257.
Stasser, G., & Titus, W. (1985). Pooling of unshared information in group decision making: Biased information sampling during discussion. Journal of Personality and Social Psychology, 48, 1467-1478.
Steiner, I. D. (1972). Group process and productivity. New York: Academic Press.
Stollings, M. N. (1982). Information processing load of graphic versus alphanumeric weapons format display for advanced fighter cockpits (Master's thesis). Wright State University, Dayton, OH.
Tenney, Y. T., Adams, M. J., Pew, R. W., Huggins, A. W. F., & Rogers, W. H. (1991). A principled approach to the measurement of situation awareness in commercial aviation (NASA Contractor Report; BBN Report No. 7451 under Contract NAS1-18788). Cambridge, MA: BBN Systems & Technologies.
Thorndyke, P. (1984). Applications of schema theory in cognitive research. In J. Anderson & S. Kosslyn (Eds.), Tutorials in learning and memory (pp. 167-191). New York: W. H. Freeman.
Wegner, D. (1987). Transactive memory: A contemporary analysis of the group mind. In B. Mullen & G. Goethals (Eds.), Theories of group behavior (pp. 185-208). New York: Springer-Verlag.
Wellens, A. R. (1978). A device that provides an eye-to-eye video perspective for interactive television. Behavior Research Methods and Instrumentation, 10, 25-26.
Wellens, A. R. (1979). An interactive television laboratory for the study of social interaction. Journal of Nonverbal Behavior, 4, 119-122.
Wellens, A. R. (1986). Use of a psychological distancing model to assess differences in telecommunication media. In L. Parker & C. Olgren (Eds.), Teleconferencing and electronic media (Vol. V, pp. 347-361). Madison: Center for Interactive Programs, University of Wisconsin.
Wellens, A. R. (1987). Effects of telecommunication media upon group decision-making processes within a multi-team situation assessment task (AFOSR Final Report P.O. S-760-6MG-085). Coral Gables, FL: University of Miami.
Wellens, A. R. (1989a, June 16). Effects of communication bandwidth upon group and human-machine situation awareness and performance (Final briefing). Wright-Patterson Air Force Base, OH: Armstrong Aerospace Medical Research Laboratory.
Wellens, A. R. (1989b, September). Effects of telecommunication media upon information sharing and team performance: Some theoretical and empirical observations. IEEE AES Magazine, 13-19.
Wellens, A. R. (1990). Assessing multi-person and person-machine distributed decision making using an extended psychological distancing model (AAMRL-TR-90-006). Wright-Patterson Air Force Base, OH: Armstrong Aerospace Medical Research Laboratory.
Wellens, A. R., & Ergener, D. (1988). The C.I.T.I.E.S. game: A computer-based situation assessment task for studying distributed decision making. Simulation and Games, 19, 304-327.
Wellens, A. R., Grant, B. S., & Brown, C. E. (1990, October). Effects of time stress upon human and machine operators of a simulated emergency response system. Presented at the 34th Annual Meeting of the Human Factors Society, Orlando.
Wellens, A. R., & McNeese, M. D. (1987). A research agenda for the social psychology of intelligent machines. Proceedings of the IEEE National Aerospace and Electronics Conference, 4, 944-950.
Wickens, C. D., & Flach, J. M. (1988). Information processing. In E. L. Wiener & D. C. Nagel (Eds.), Human factors in aviation (pp. 111-155). New York: Academic Press.
Wiener, E. L. (1988). Cockpit automation. In E. L. Wiener & D. C. Nagel (Eds.), Human factors in aviation (pp. 433-461). New York: Academic Press.
Wilson, D. L., McNeese, M. D., Brown, C. E., & Wellens, A. R. (1987). Utility of shared versus isolated work setting for dynamic team decision making (AAMRL-TR-87-072). Wright-Patterson Air Force Base, OH: Armstrong Aerospace Medical Research Laboratory.
Wohl, J. G. (1985). Subtask 4 report "Build a top-level concept" for AW/AA proactive integration and assessment (SP-491, Contract Number BOA-11-860015-39 Task 1). Burlington, MA: Alphatech.
Wohl, J. G. (1987, May 19-21). Proactive data fusion for integrated tactical warning and assessment. Presented at the Strategic C3 Working Group, 55th MORS Symposium, Air University, Maxwell Air Force Base, Montgomery, AL.

CHAPTER 15

NATURALISTIC GROUP DECISION MAKING: OVERVIEW AND SUMMARY

William C. McDaniel
Formerly with Naval Personnel Research and Development Center, San Diego, CA

To provide a structure to my remarks concerning the previously presented research in naturalistic group decision making, I am reminded of a rather pejorative story about attorneys. The story seems appropriate and relevant to decision-making research; however, if I fail to communicate this relevance, I hope you will at least enjoy the story.

It seems there were two psychologists who were hot air balloon enthusiasts. One morning soon after they had launched, a fierce storm came up. Through downdrafts, updrafts, and severe crosswinds, they managed to remain aloft. However, due to fog, rain, and clouds that obscured the ground, after about an hour they were completely lost. Finally, through a break in the clouds, they could see the ground and a man standing in a clearing. The two balloonists descended and one of them shouted, "Where are we?" The man on the ground replied, "You're in a balloon." Perplexed, frustrated, and more than a little confused, the balloonists looked at each other. Finally, one spoke: "What did he say?" The other answered, "He must be a lawyer. What he said made a lot of sense, but I don't think he told us a damn thing."

WHERE ARE WE?

On July 3, 1988, Iran Air Flight 655 took off from Bandar Abbas Airport and headed across the Persian Gulf en route to Dubai. Seven minutes into its flight, Captain William Rogers of the USS VINCENNES, judging the airliner to be
a hostile fighter plane, destroyed it with two missiles. All 290 persons aboard were killed. That episode entailed human judgment, its interface with the highest of high technology, and the terrible cost of an error produced as a consequence. Ken Hammond, then Chairperson of the Society of Judgment and Decision Making, sent a challenge to decision-making researchers to focus their talents on system interfaces and reduce the probability of such incidents. The thread of "Where are we?" runs throughout the fabric of Hammond's message. The challenge is not trivial-indeed as Hammond noted, "One can think of the VINCENNES episode as a not-so-dry run for the main event."

WHAT HE SAID MAKES A LOT OF SENSE, BUT ...

Classical decision-making theory suggests that individuals exhaustively examine all possible alternatives and select the alternative leading to the greatest utility. That alternative was considered to be optimal, all other alternatives being suboptimal. Simon introduced the theory of bounded rationality, which posits that individuals examine a limited number of alternatives and select an alternative that is consequentially "good enough." Both classical and bounded rationality theories are normative; that is, they describe optimal decision making as it should be done. However, individuals do make suboptimal decisions, from either an exhaustive or a limited set of alternatives.

The mainstream approach to suboptimal decision making emphasizes heuristics and biases that affect judgment under uncertainty (Kahneman, Slovic, & Tversky, 1982). Based on a considerable body of individual decision-making research, Slovic (1989) concluded that optimal decisions may be reached through rigorous analysis: structuring the problem, assessing the likelihoods, assessing the importance of the potential outcomes, and integrating all the information about probabilities and values.

The effects of heuristics and biases upon optimal decision making have been called into question. Dreyfus and Dreyfus (1986) critically reviewed previous decision-making research and concluded that observed biases in human judgment are often the result of contrived experimental situations and inexperienced or poorly motivated subjects. Alternatively, they suggested that optimal decision making is the product of skill acquisition and experience rather than a rigorous analysis to overcome biases. Klein (1989) questioned the applicability of rigorous analysis in naturalistic conditions and time-pressured tasks.
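
The contrast between exhaustive maximization and Simon's satisficing can be made concrete in a few lines. The sketch below is a generic illustration; the utility values and the aspiration threshold are arbitrary stand-ins, not figures from any study cited here.

```python
# Generic contrast between classical maximization and Simon's satisficing.
# The utility numbers and the aspiration threshold are arbitrary examples.

def maximize(alternatives, utility):
    # Classical (normative) choice: examine every alternative, take the best.
    return max(alternatives, key=utility)

def satisfice(alternatives, utility, aspiration):
    # Bounded rationality: stop at the first alternative that is "good enough."
    for a in alternatives:
        if utility(a) >= aspiration:
            return a
    return None  # no alternative met the aspiration level

options = ["A", "B", "C", "D"]
u = {"A": 0.2, "B": 0.7, "C": 0.9, "D": 0.5}.get

print(maximize(options, u))        # -> "C" (global optimum)
print(satisfice(options, u, 0.6))  # -> "B" (first "good enough" option)
```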

The VINCENNES Anti-Aircraft Warfare Officer is often attributed with an error due to expectancy bias. The stage had been set for expectancy bias. The Iran-Iraq War and Persian Gulf hostilities had steadily intensified. The campaign in the Central Persian Gulf centered on air strikes against oil facilities and shipping. On May 17, 1987, Iraq conducted an erroneous air attack on the USS STARK. The STARK was severely damaged and experienced high casualties to U.S. Navy personnel. The Commanding Officer and Tactical Action Officer of the STARK received criticism for taking inadequate measures to defend the ship.

The United States commenced escorting Kuwaiti reflagged tankers in 1987. In addition to strikes against neutral shipping by aircraft, Iran conducted ship attacks with surface ships and small boats. Additionally, Iran also placed mine fields across the Persian Gulf and the Gulf of Oman in an effort to sink U.S. warships and stop convoy operations. These mine fields resulted in severe damage to both the BRIDGETON in July 1987 and the USS SAMUEL B. ROBERTS in April 1988. In retaliation for the mining of USS SAMUEL B. ROBERTS, the United States attacked the Iranian Sirri and Sassan offshore oil production facilities in the south Persian Gulf on April 18, 1988. Iran responded by air and surface attacks on U.S. owned or associated oil rigs, platforms, and jack-up rigs. During the engagement with U.S. forces, two Iranian frigates and one missile patrol boat were sunk or severely damaged. Iranian F-4 aircraft were scrambled during the day from Bandar Abbas. USS WAINWRIGHT launched missiles at one of the aircraft, damaging it when the aircraft failed to respond to repeated warnings and continued to close the ship.

Iranian Air Force operating patterns changed significantly, particularly at Bandar Abbas, in the month prior to July 3, 1988. Iranian F-14 aircraft were transferred to Bandar Abbas and perceived as an upgrade in Iranian air capability at Bandar Abbas. Units were cautioned to be on the alert for more aggressive behavior, and the USS VINCENNES was advised of the increased threat the F-14s represented.

On July 3, 1988, the VINCENNES crew was engaged in a threatening battle with Iranian gunboats. Initial information from the Combat Information Center revealed an aircraft was taking off from Bandar Abbas and, due to a system flaw, the aircraft was identified as an F-14. The target aircraft was not flying down the middle of the commercial air corridor as was typical for jetliners in the Persian Gulf and did not heed warnings to change course. Further, an Iranian P-3 aircraft was flying a classical profile for furnishing information to an attack aircraft. There were some reports that the target aircraft was reducing altitude as it closed with the ship. These last reports were erroneous; the airliner never reduced altitude. The VINCENNES' Commanding Officer decided to fire on the approaching aircraft and, 7 minutes after takeoff, Iran Air Flight 655 was destroyed with two Standard missiles. Expectancy was so high that crewmen on the deck of the VINCENNES actually confirmed an F-14 going down in the water.

Klein (1989) questioned the usefulness of explaining poor outcomes in terms of decision biases. Whereas a strong case is made for potential expectancy bias, other biases affecting decision making should also be considered. For
example, if we consider base rates, 98% of the aircraft being warned belonged to Iranian military forces; only 2% were commercial flights. The Anti-Aircraft Warfare Officer was a good Bayesian. If the aircraft had been hostile and damage had been done to the VINCENNES as in the previous USS STARK incident, we would have condemned him for failing to take base rates into account. Thus, we place decision makers in naturalistic groups in a classic Joseph Heller Catch-22 dilemma.
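
The base-rate argument can be made quantitative with Bayes' rule. In the sketch below, the 98%/2% split comes from the text, but the hit and false-alarm rates attached to the identification cues are invented solely for illustration.

```python
# Worked base-rate example using Bayes' rule. The 98%/2% prior split comes
# from the text; the 0.9/0.5 likelihoods are invented for illustration only.
p_military = 0.98            # prior: warned aircraft that were military
p_commercial = 0.02          # prior: warned aircraft that were commercial
p_cues_if_military = 0.90    # assumed: P(observed cues | military)
p_cues_if_commercial = 0.50  # assumed: P(observed cues | commercial)

posterior = (p_cues_if_military * p_military) / (
    p_cues_if_military * p_military + p_cues_if_commercial * p_commercial)
print(f"P(military | cues) = {posterior:.3f}")  # -> 0.989
```

With a prior this lopsided, even weakly diagnostic cues leave the posterior near certainty; this is the sense in which the officer was "a good Bayesian."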

Naturalistic groups or task-oriented teams involve social interaction. The application of social decision-making research to resolving problems of poor decision making offers little direct solution. The majority of research centers on groups analyzing alternatives and reaching consensus. Naturalistic groups rarely reach consensus, and there is rarely a requirement to do so. Naturalistic group decision making is considerably more dynamic than that found in studies of laboratory-created teams. At the outset, in the dynamic naturalistic group, even knowledge of state and initial conditions does not necessarily imply knowledge of state very far into the future. Information is gathered, decisions are made, actions are taken, the effects of actions on the situation are assessed, and new decisions are made.

WHERE ARE WE?

After VINCENNES, have we picked up Hammond's challenge? Where are we now? The previous three chapters represent a sample of efforts to explore the dynamics of decision making in naturalistic groups and teams. Common among these efforts is the recognition that decision making in naturalistic groups is a qualitatively and uniquely different area of decision making. Although previous research efforts in individual and group decision making are useful to naturalistic group decision making, the knowledge generated must be moderated against the background of a highly dynamic team process in rapidly changing environmental situations. Solutions to judgmental errors will be found in three areas of importance to applied psychology: selection, technology, and training. In a practical sense, selection may not be a viable option. Demographics, the sheer numbers of naturalistic groups needed to accomplish diverse functions, and the dynamic development of teams make selection very difficult. Thus, we find ourselves left with two areas to look for solutions: technology and training.

First, let us look at technology. Wellens (this volume) presents information about the dynamics of naturalistic group decision making particularly focused on the human-system interface. Dawes (1979) noted that the human is especially good at perceiving and sorting information, whereas computers and mathematical models are especially good at integrating information. Caution must be exercised in designing technology that we perceive as helpful
in the integration of information. Wellens' research in networking and situational awareness is helpful in looking at member skills in the human-system interface and informational requirements to improve decision-making skills. An important lesson to be remembered is the nonintuitive effect of "communication bandwidth," and the observed result that apparently useful information may degrade performance rather than enhance it. Duffy (this volume) suggests that computer support and automation introduce a new dimension into this research. Obviously, the supply of needed information, and the suppression of irrelevant information, to members of the naturalistic group is a necessary enterprise for the research community.

The crucial issue in the design of complex human-machine systems hinges on the role of the human operator(s) as final decision authority in the control of the system. Requirements for complex systems are approaching the limits of our knowledge about how to integrate humans into systems to ensure that they retain effective control. The problem is deeper than simply improving display and control technology or development of faster processors; an integrated architecture for the human-machine interface and an associated systems engineering methodology are urgently needed to support human-mediated control of complex systems. Human engineering and human factors must transition from the traditional man-machine interface to a mind-machine interface. This concept has been further reinforced in a "technology-push" environment of microprocessors, artificial intelligence, cognitive science, cybernetics, analytical decision theory, and emerging simulation-modeling technology. The automated assistance of mental functions to enhance system performance has become a respectable notion in this environment.

Gibson (1966) suggested humans were not simply passive processors of environmental information; rather, humans actively seek information that provides structure and a sense of the world. Similarly, humans do more than process information and take action based on system sensor signals. Humans can provide the system with information about expectancies, hypotheses, and model parameters of the situation. The system can deploy sensors to either confirm or disconfirm this expected model. Using the inductive power of the human and the deductive power of the machine, the operator can keep ahead of, or at least abreast of, the system. Barrett and Donnell (1990) articulated some considerations and imperatives for real-time expert advisory systems. These considerations were designed into the Knowledgeable Observation Analysis-Linked Advisory System (KOALAS) process architecture that is serving as the foundation of a simulation-based intelligent control testbed under development at Los Alamos National Laboratory. This concept of integrating human information into the system to develop a proactive model of the world situation, rather than simply correlating system information to form a reactive model, represents a new direction in systems design.
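
The confirm/disconfirm loop described here can be illustrated with a toy Bayesian update over competing hypotheses: the human supplies the prior expectancies, and the machine revises them as sensor reports arrive. All hypotheses, probabilities, and report values below are invented for the illustration.

```python
# Toy confirm/disconfirm loop: the human supplies prior expectancies; the
# machine updates them as sensor reports arrive. All numbers are invented.
def update(prior, likelihoods):
    # One Bayesian update step over competing hypotheses.
    posterior = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}

# Operator's expectancy: an attack profile is thought more likely a priori.
beliefs = {"attack_profile": 0.7, "routine_traffic": 0.3}

# Each report gives P(report | hypothesis); these reports disconfirm attack.
reports = [
    {"attack_profile": 0.3, "routine_traffic": 0.8},  # steady altitude
    {"attack_profile": 0.2, "routine_traffic": 0.7},  # inside air corridor
]
for r in reports:
    beliefs = update(beliefs, r)
    print({h: round(p, 2) for h, p in beliefs.items()})
# The running posterior lets the operator watch an expectancy being
# disconfirmed instead of silently confirmed.
```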

Turning now to training, Duffy (this volume) and Cannon-Bowers, Salas, and Converse (this volume) address approaches and concepts that may be useful in training decision-making skills in naturalistic groups. Duffy's expanded list of biases that result in suboptimal decision making provides a greatly expanded research agenda. The introduction of social influence presents a particular challenge to the training community. Deutsch and Gerard (1955) introduced the strategies of information seeking and norm seeking as applied to conformity and social influence. Consider those two strategies in the VINCENNES incident. Commander Robert Carlson, in a letter to Naval Institute Proceedings, suggested that decisional errors were not a problem on board the VINCENNES; rather, the VINCENNES was spoiling for a fight. The crew was aggressive to the point that the VINCENNES was nicknamed "RoboCruiser." Certainly, if the VINCENNES was "norm seeking" toward an aggressive posture, training strategies designed to cope with "information-seeking" groups would lack effectiveness.

Cannon-Bowers et al. (this volume) suggest that shared mental models, or the sharing of perspectives, are key to understanding naturalistic group processes in problem solving and decision making. The impact of framing on these shared perspectives should be considered. Most of us are familiar with the concept of framing as suggested by Kahneman and Tversky (1984). Indeed, their classic example could almost be termed the Parable of the Deadly Virus. We are amazed as individuals read about the 600 infected, the immunization that will save 200, or the 400 who will die, and then reverse their preferences across the two descriptions. Yet this is just a problem on a piece of paper, cold and posing only an intellectual puzzle. In naturalistic groups and real-world problems there is not only a cognitive component but also conative and emotional components. How do these conative and emotional components affect framing of the situation, and do members of naturalistic groups share the same framing effects?
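
A brief worked sketch makes the equivalence of the two frames explicit. The numbers are Kahneman and Tversky's (1984); the code framing is my addition, not part of the original chapter:

```python
# Worked sketch of Kahneman and Tversky's (1984) disease problem.
from fractions import Fraction

INFECTED = 600
p = Fraction(1, 3)   # probability the risky program succeeds completely

# Gain frame: Program A saves 200 for sure; Program B saves all 600
# with probability 1/3 and nobody with probability 2/3.
expected_saved_A = 200
expected_saved_B = p * 600 + (1 - p) * 0

# Loss frame: Program C means 400 die for sure; Program D means nobody
# dies with probability 1/3 and all 600 die with probability 2/3.
expected_dead_C = 400
expected_dead_D = p * 0 + (1 - p) * 600

assert expected_saved_A == expected_saved_B == INFECTED - expected_dead_C
assert expected_dead_C == expected_dead_D
# The prospects are identical, yet most people choose A in the gain frame
# and D in the loss frame -- the reversal of preferences noted above.
```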

In summary, progress is being made in identifying areas that will ultimately improve decision making in naturalistic groups. We have found that the dynamics are more complex, perhaps by an order of magnitude, than previously considered in decision-making research. Social exchange and social influence interact with the complexities of individual problem solving, information processing, and decision making to produce a problem that almost defies structure. We cannot ignore the problem or devote less than our full energy to improving judgment and decision making in naturalistic groups; the signs of the disastrous consequences are too clear. Sir Ronald Fisher noted, in a most sobering statement, "The improbable is inevitable." I watched three Soviet naval vessels sail into San Diego Bay. For a fleeting moment I thought that perhaps improved East-West relations would move the improbable to the impossible. However, recent Middle East events suggest that thought may have been only a quickly passing dream. These events accentuate the urgency of improving judgment and decision making in naturalistic groups.

REFERENCES

Barrett, C. L., & Donnell, M. L. (1990). Real time expert advisory systems: Considerations and imperatives. Information and Decision Technologies, 16.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.
Deutsch, M., & Gerard, H. B. (1955). A study of normative and informational social influences upon individual judgment. Journal of Abnormal and Social Psychology, 51, 629-636.
Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind over machine. New York: The Free Press.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Kahneman, D., & Tversky, A. (1984). Choices, values and frames. American Psychologist, 39, 341-350.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press.
Klein, G. A. (1989). Do decision biases explain too much? Human Factors Society Bulletin, 32, 1-3.
Slovic, P. (1989). Testimony. In U.S. Congress, Iran Air Flight 655 compensation: Hearings before the Defense Policy Panel of the Committee on Armed Services, House of Representatives, One Hundredth Congress, second session, hearings held August 3 and 4, September 9, and October 6, 1988 (pp. 193-217). Washington, DC: U.S. Government Printing Office.

AUTHOR INDEX

A

Aaronson, D. E., 168, 170, 175, 200
Abelson, R. P., 119, 120, 123, 248, 258, 262
Aboul-Ezz, M. E., 31, 39
Abrams, R. A., 102, 107
Acomb, D. B., 230, 232, 243
Adams, M. J., 285, 286, 288, 290
Adelman, L., 230, 242
Akins, F., 275, 288
Alba, J. W., 225, 242
Alfini, J. J., 163, 165, 177, 192, 196, 199, 206, 216
American Law Institute, 170, 171, 172, 173, 174, 198
American Psychiatric Association, 172, 198
Anderson, B. F., 88, 106
Anderson, J., 154, 158, 162, 163, 164, 166
Anderson, N. H., 131, 133
Anderson, R. C., 225, 242
Applegate, L., 254, 264
Arens, R., 174, 177, 182, 198
Argote, L., 109, 110, 123
Arkes, H. R., 4, 6, 7, 9, 10, 11, 14, 15, 17, 18, 19, 102, 106
Asch, S., 272, 288
Ashton, R. H., 31, 37
Associated Press, 247, 262
Athens, M., 229, 242

B

Baetge, M. M., 230, 232, 243
Balla, J. I., 146, 147
Balzer, W. K., 68, 84
Bank, S. C., 171, 200
Bar-Hillel, M., 112, 123, 142, 147
Barnett, B., 285, 286, 289
Barrett, C. L., 297, 299
Bayman, P., 239, 242
Bazelon, D. L., 206, 216
Beach, L. R., 25, 29, 37, 133, 134
Beard, R. L., 223, 245
Bedard, J. C., 22, 37
Behn, R. D., 36, 37
Ben-Bassat, M., 270, 288
Benedek, E. P., 171, 200
Benedict, K., 13, 18
Benson, S., 248, 255, 263
Bercaw, S., 171, 175, 176, 182, 193, 198, 199
Berkely, D., 97, 106
Berman, J. J., 174, 199
Bermant, G., 204, 216
Best, R., 173, 200
Bettman, J. R., 25, 27, 28, 29, 31, 37, 38, 39, 99, 106, 133, 134
Bieber, S., 174, 182, 200
Biggs, S. F., 22, 37
Billings, R. S., 22, 32, 37
Birnbaum, M. H., 112, 123
Bjorkman, M., 46, 63
Black, R. H., 248, 264
Blaiwes, A. R., 221, 245
Blaiwes, A. S., 222, 244, 249, 264
Blinkhorn, R., 9, 10, 11, 18
Blumer, C., 6, 7, 18, 102, 106
Blunt, L. W., 171, 198
Bluth, G. J., 239, 244
Bobrow, D. G., 225, 242
Boehm, L., 4, 14, 18
Bonnie, R. J., 168, 170, 199
Borgida, E., 152, 154, 158, 162, 163, 164, 166, 213, 214, 216
Borgman, C. L., 238, 242
Bovair, S., 238, 244
Bowers, C. A., 225, 237, 244
Braff, J., 174, 193, 200
Brandt, D. M., 239, 244
Braune, R. J., 284, 288
Bray, R. M., 196, 199
Brehmer, B., 56, 63, 67, 68, 69, 70, 71, 72, 84, 85, 89, 102, 106, 107, 230, 242
Bresnick, T. A., 252, 266
Bright, R. D., 52, 54, 55, 63
Brilmayer, L., 138, 140, 142, 147
Brown, C. E., 279, 281, 287, 291
Brown, J. S., 227, 243
Brucato, P. F., 36, 38
Brunswik, E., 44, 63, 67, 85, 88, 106
Buckley, T., 88, 89, 98, 100, 102, 106, 106, 107, 108
Buckley, W., 125, 126, 127, 134
Buede, D. M., 34, 35, 39
Bushnell, L., 249, 262

C

Callahan, L., 170, 198
Callaway, M., 272, 288
Campbell, W. J., 223, 224, 243
Cannon-Bowers, J. A., 221, 223, 224, 225, 228, 237, 240, 241, 242, 244, 245, 252, 262, 287, 288, 298
Carroll, J. S., 126, 133
Casper, J. D., 13, 18
Castellan, N. J., Jr., 43, 45, 48, 49, 57, 58, 59, 61, 63, 67, 81, 85, 127, 131, 134
Chapanis, A., 258, 262, 277, 288, 290
Chapman, J. P., 252, 262
Chapman, L. J., 252, 262
Chidester, T. R., 225, 237, 244
Christensen, C., 4, 18, 102, 106
Christie, B., 258, 265, 275, 290
Christie, S. D., 110, 124
Cicourel, A. V., 254, 262
Clark, R. D., 69, 71, 85
Cohen, L. J., 137, 140, 141, 142, 144, 147
Cohen, M. D., 254, 262
Collins, A., 231, 245
Collins, A. M., 226, 242
Collins, K., 276, 289
Collins, R., 163, 166
Comfort, A., 203, 217
Companion, M., 270, 287, 288
Connolly, T., 261, 262
Connors, M., 275, 288
Converse, S. A., 221, 223, 225, 242, 245, 287, 288, 298
Cooper, R., 89, 106, 107
Cox, M., 163, 165
Cream, B. W., 228, 230, 242
Creyer, E. H., 31, 37
Crocker, J., 252, 265
CSCW, 262
Cummings, T. G., 223, 243
Curley, S. P., 102, 107
Cutler, B. L., 155, 165

D

Daft, R. L., 257, 258, 262, 275, 289
Davis, J. H., 109, 110, 111, 112, 118, 123, 124, 131, 134, 196, 199, 252, 258, 263
Dawes, R. M., 20, 37, 89, 107, 126, 131, 134, 296, 299
de Kleer, J., 227, 243
De Neufville, R., 35, 38
Deane, D. H., 57, 63, 88, 106
Degroot, S. E., 209, 216
DeMeuse, K. P., 223, 245, 249, 261, 265
Desvousges, W., 36, 39
Deutsch, M., 298, 299
Devadas, A., 109, 123
Dexter, H. R., 155, 165
Diamond, S. S., 197, 200
Dickinson, T. L., 221, 223, 245
Doherty, M. E., 68, 84
Doherty, M. L., 32, 38
Domine, R. K., 71, 85
Donnell, M. L., 297, 299
Donnerstein, E., 197, 200
Dornheim, M. A., 268, 288
Dosher, B. A., 20, 25, 39
Dreyfus, H. L., 294, 299
Dreyfus, S. E., 294, 299
Dudycha, A. L., 69, 85, 133, 134
Dudycha, L. W., 89, 107
Duff, M. A., 195, 196, 199
Duffy, L. T., 248, 249, 252, 258, 263, 265, 272, 286, 288, 297, 298
Dunning, D., 98, 107
Duthie, B., 146, 147
Dyer, J. L., 222, 223, 243
Dyer, L., 110, 123

E

Ebbesen, E. B., 204, 217
Eder, R. W., 106, 107
Edgell, S., 43, 46, 48, 49, 50, 52, 54, 55, 56, 57, 58, 59, 60, 61, 63, 65, 66, 132, 134
Edwards, W., 35, 36, 37, 39, 76, 85
Eekhout, J. M., 251, 263
Eggemeier, F. T., 228, 242
Egido, C., 261, 263
Einhorn, H. J., 20, 31, 37, 38, 72, 85, 89, 105, 107, 113, 123, 131, 134, 143, 147
Eisenhardt, K. M., 30, 38
Ekeberg, S. E., 79, 83, 84, 86
Ellis, A. L., 110, 121, 122, 123
Ellsworth, P. C., 155, 165
Elstein, A., 146, 147
Elwork, A., 163, 165, 177, 192, 196, 199, 206, 216
Endsley, M. R., 268, 269, 272, 285, 288
Engelbart, D., 274, 288
Entin, E., 229, 246, 286, 288
Ergener, D., 274, 275, 277, 278, 288, 291
Erickson, J. R., 50, 64
Erlanger, H. A., 203, 216
Esser, J., 272, 288
Eylon, B. S., 239, 243

F

Falk, I., 16, 18
Farthing-Capowich, D., 171, 199
Faust, D., 11, 18, 131, 134
Fedor, D. B., 106, 107
Feinberg, S. E., 137, 148, 209, 216
Filkins, J., 111, 118, 124
Fincham, F. D., 178, 195, 200
Fingarette, H., 175, 199
Finkel, N. J., 171, 175, 176, 178, 182, 193, 195, 196, 198, 199
Fischer, G. W., 34, 38
Fischhoff, B., 12, 18, 35, 38, 90, 96, 97, 98, 107, 113, 123
Fisher, W. N., 36, 39
Fiske, S. T., 153, 165
Flach, J. M., 270, 271, 274, 291
Flanagan, J. C., 79, 85
Ford, J. K., 32, 38
Ford, L. A., 52, 54, 55, 63
Foushee, H. C., 224, 230, 232, 243, 272, 285, 287, 288, 289
Fracker, M. L., 285, 289
Franz, T. M., 237, 243
Frederiksen, J., 239, 243
Freedy, M., 270, 288
Futrell, D., 223, 245, 249, 261, 265

G

Gabarro, J. J., 228, 243
Gaber, B. G., 22, 37
Galegher, J., 248, 249, 254, 256, 258, 261, 263, 274, 289
Gardner, M. R., 179, 180, 199
Gastwirth, J. L., 144, 148
Gentner, D. R., 231, 243
Gerard, H. B., 298, 299
Gerbasi, K. C., 196, 199, 203, 216
Ghose, S., 22, 38
Gibson, J. J., 297, 299
Gigerenzer, G., 100, 107
Glaser, R., 222, 244
Glickman, A. S., 221, 222, 223, 224, 225, 243, 244, 245, 249, 264
Golding, S. L., 178, 195, 200
Goldstein, A., 195, 199
Goldstein, D., 13, 14, 18
Goodman, G., 213, 216
Gordon, G. N., 16, 18
Gordon, S. C., 227, 244
Gradwohl-Nash, J., 14, 15, 18
Granfield, D. D., 177, 198
Grant, B. S., 279, 281, 291
Grassia, J., 254, 263
Greeno, J. G., 231, 244
Gribben, M., 249, 265
Griffin, D. W., 98, 107, 108
Gross, P. H., 119, 120, 123
Guerette, P. J., 223, 224, 243
Guilmette, T., 11, 18

H

Hackett, C., 14, 18
Hackman, J. R., 223, 243, 248, 261, 263
Hafemeister, T., 171, 173, 200
Hagafors, R., 102, 107
Hall, R., 230, 242
Hallam, J., 254, 265, 287, 290
Hamm, R. M., 254, 263
Hammond, K. R., 45, 57, 63, 64, 67, 81, 83, 85, 88, 106, 228, 230, 243, 254, 263
Handel, S. F., 171, 175, 176, 178, 182, 193, 198, 199
Haney, C., 204, 216
Hans, V. P., 203, 216
Harkness, A. R., 17, 18
Harrison, A., 275, 288
Hart, K., 11, 18
Hart, S. L., 102, 107
Harvey, J. B., 253, 263
Harwood, K., 285, 286, 289
Hasher, L., 13, 14, 18, 225, 242
Hasse, A. F., 175, 199
Hastie, R., 22, 39, 109, 123, 127, 134, 203, 204, 205, 206, 216, 217, 253, 265
Hedley-Goode, A., 77, 79, 80, 82, 83, 86
Helgeson, 213, 216
Helmreich, R. L., 272, 288
Hemphill, J. K., 230, 243
Hendrick, C., 271, 289
Henrion, M., 34, 38
Henry, R. A., 98, 103, 104, 106, 108, 110, 122, 123, 127, 134
Hermann, D. H. J., 168, 195, 199
Hikida, R. H., 211, 216
Hiltz, S. R., 287, 289
Hinsz, V. B., 109, 111, 112, 123, 252, 263, 272, 273, 289
Hoch, S. J., 98, 107
Hodapp, W., 16, 18
Hoffman, P. J., 46, 63, 68, 85, 131, 134
Hoffrage, U., 100, 107
Hogarth, R. M., 20, 31, 37, 38, 65, 66, 72, 85, 89, 107, 113, 123, 143, 147
Hollingshead, A., 258, 264
Holt, R. W., 196, 199
Howell, W. C., 287, 290
Huber, G. P., 259, 261, 263
Huber, J., 36, 39
Huber, O., 26, 38
Huggins, A. W. F., 285, 290
Hults, B. M., 32, 38
Humphreys, P., 97, 106
Hursch, C. J., 83, 85
Hutchins, E., 254, 263

I

Iansek, R., 146, 147
Ilgen, D. R., 65, 72, 73, 76, 83, 86, 105, 107
Imwinkelried, E. J., 153, 165

J

Jagacinski, R. J., 227, 243
James, R. M., 175, 176, 177, 197, 198, 199
Janis, I. L., 236, 243, 248, 251, 253, 261, 263, 272, 289
Jeffries, J. C., 168, 170, 199
Jessup, L. M., 259, 261, 263
Johansen, R., 248, 255, 263, 274, 276, 289
Johnson, E. J., 5, 18, 22, 25, 26, 27, 29, 31, 37, 38, 39, 99, 106, 133, 134
Johnson, F. R., 36, 39
Johnson-Laird, P., 225, 226, 243
Jonakait, R. N., 144, 148
Jones, S. D., 79, 83, 84, 86
Joyner, C. A., 4, 14, 15, 18

K

Kadane, J. B., 209, 216
Kadish, M. R., 206, 216
Kadish, S. H., 206, 216
Kagehiro, D., 193, 196, 199
Kahneman, D., 31, 39, 75, 76, 84, 85, 98, 108, 109, 111, 112, 118, 119, 120, 123, 124, 143, 144, 148, 249, 251, 252, 263, 294, 298, 299
Kalven, H., 203, 216
Kaplan, J., 212, 216
Kaplan, K., 275, 289
Karkau, V. T., 56, 64
Kassin, S. M., 155, 165, 195, 199, 206, 216
Kaye, D. H., 139, 148, 209, 211, 212, 216, 217
Keeney, R., 250, 264
Keilitz, I., 171, 174, 199
Kelly, J. R., 13, 18
Keren, G., 90, 107
Kerr, N. L., 110, 123
Kidd, R. F., 139, 148
Kieras, D. E., 238, 243, 244
Kiesler, S., 254, 256, 258, 264
King, J., 256, 258, 264
Klaus, D. J., 222, 244
Klayman, J., 22, 25, 38
Klein, G., 229, 244, 252, 264, 270, 287, 289
Klein, G. A., 227, 228, 242, 244, 294, 295, 299
Klein, N. M., 25, 38
Kleinbolting, H., 100, 107
Kleinman, D. L., 221, 224, 225, 229, 230, 244, 246, 249, 264
Kleinmuntz, B., 131, 134
Kleinmuntz, D. N., 34, 35, 36, 38, 39, 131, 134
Klinger, D., 270, 287, 289
Koch, J., 171, 175, 176, 182, 193, 198, 199
Koehler, J. J., 138, 139, 142, 145, 146, 148, 209, 210, 211, 212, 216, 217
Koele, P., 72, 85
Kohn, C., 224, 244
Kolzow, K., 104, 108
Konecni, V. J., 204, 217
Konsynski, B., 254, 264
Koriat, A., 12, 18, 98, 107
Kornhauser, L., 138, 140, 147
Korzenny, F., 275, 289
Kovera, M. B., 154, 165, 214, 217
Kraemer, K. L., 256, 258, 264, 265
Krantz, D. H., 142, 148
Kraut, R. E., 261, 263
Kraut, R., 274, 289
Kruglanski, A. W., 25, 38
Kuperman, G., 285, 289
Kuylenstierna, J., 69, 72, 85

L

Lai, C., 102, 106
Lakshamanan, M., 9, 10, 11, 18
Landsman, S. A., 153, 165, 166
Lanzetta, J. T., 224, 244
LaPorte, T. R., 254, 259, 265
Larrick, R. P., 8, 9, 18, 25, 38
Larson, J. R., 248, 265
Lauber, J. K., 230, 232, 243, 287, 289
Laughlin, P. R., 110, 121, 122, 123, 272, 289
Leavitt, H. J., 276, 289
Leddo, J., 119, 120, 123
Lehner, P. E., 230, 242
Lehtman, H., 274, 288
Lempert, R. L., 212, 213, 217
Lengel, R. H., 257, 258, 262, 275, 289
Levi, A., 248, 258, 262
Levine, J., 258, 264
Lewis, M. D., 237, 243
Lichtenstein, S., 12, 18, 24, 34, 35, 38, 90, 97, 98, 107
Lilly, G., 151, 166
Lin, S., 98, 108
Lindberg, L., 67, 71, 85
Lindblom, W. D., 171, 176, 183, 200
Linsmeier, T. J., 22, 37
Loftus, E. F., 155, 163, 166, 192, 200, 226, 242
Loh, P. W., 168, 170, 199
Loh, W. D., 204, 205, 217
Lord, R. G., 252, 264, 272, 289
Lorge, I., 109, 110, 123, 131, 134
Luckey, J. W., 174, 199
Luh, P. B., 224, 244
Luus, C. A. E., 110, 124

M

MacCoun, R. J., 197, 199, 203, 217
MacGregor, D. G., 34, 38, 96, 98, 107
Magat, W. A., 36, 38, 39
Mahan, L., 140, 148
Mann, L., 253, 263
Manning, L. L., 75, 86
Manos, K. L., 224, 243, 272, 285, 288
March, J., 254, 264
March, J. G., 254, 262
Marcus, S. A., 22, 37
Marshak, W. P., 285, 289
Martin, A., 248, 255, 263
Martin, A. W., 214, 217
Marvin, F. F., 252, 266
May, R. S., 90, 107
Mayer, C., 170, 198
Mayer, R. E., 231, 238, 239, 242, 244
McCabe, G. P., 127, 129, 134
McCallum, G. A., 223, 224, 237, 243, 244, 287, 290
McCauliff, C. M. A., 140, 148
McClelland, G. H., 29, 38, 81, 85, 88, 106
McCord, M. R., 35, 38
McDonald, P., 250, 264
McGinley, H., 176, 200
McGrath, J., 249, 258, 264
McGrath, J. E., 271, 289
McGraw, B. D., 171, 199
McGuire, M., 204, 216
McGuire, T. W., 254, 256, 258, 264
McIntyre, R. M., 224, 225, 244
McKinley, W., 204, 216
McNeese, M. D., 274, 287, 291
Mead, G. H., 228, 244
Meehl, P. E., 46, 63, 131, 134
Mehrabian, A., 275, 289
Meine, P., 213, 214, 216
Mellers, B. A., 57, 64
Melone, N., 110, 123
Melton, G. B., 168, 171, 172, 179, 180, 199
Meshkati, N., 258, 259, 264
Meyer, B. J. F., 239, 244
Meyer, R. M., 22, 38
Michaelsen, L. K., 248, 264
Mickenberg, I., 171, 199
Miene, P., 152, 154, 158, 162, 163, 164, 166, 213, 214, 216
Milgram, S., 272, 289
Miller, M. J., 57, 64
Miller, R. A., 227, 243
Milojkovic, J. D., 98, 107
Mitchell, T. R., 25, 37, 133, 134
Mittman, R., 248, 255, 263
Moffett, R. J., 230, 242
Monahan, J., 152, 166
Montero, R. C., 223, 224, 243
Moore, D. S., 127, 129, 134
Moore, J. L., 227, 244
Moran, G., 203, 217
Moran, R., 168, 200, 207, 217
Moreland, R., 258, 264
Morgan, B. B., Jr., 221, 222, 223, 224, 225, 230, 243, 244, 245, 249, 264, 287, 290
Morgan, J. N., 8, 9, 18, 25, 38
Morishige, R. I., 268, 290
Morris, C. G., 223, 243
Morris, N. M., 225, 226, 227, 231, 235, 238, 239, 241, 245, 273, 290
Morrissey, J. M., 49, 50, 56, 58, 59, 61, 63
Muchinsky, P. M., 69, 85, 133, 134
Mullen, B., 253, 265
Mullens, P. M., 173, 182, 200
Mullin, T., 34, 38
Mumpower, J., 81, 85

N

Nagao, D. H., 111, 112, 123, 252, 263
Nash, J., 4, 18
National Institute of Justice, 173, 200
National Research Council, 273, 290
Naylor, J. C., 65, 67, 69, 71, 72, 73, 76, 83, 85, 86, 88, 89, 105, 107, 108
Nesson, C., 137, 140, 148
Newell, A., 26, 29, 38, 39
Newman, J. R., 76, 85
Ng, P. C., 52, 53, 54, 55, 63
Nisbett, R. E., 8, 9, 18, 25, 38, 110, 123, 139, 148, 153, 166
Noonan, T. K., 52, 54, 55, 63
Norman, D. A., 225, 242, 249, 251, 264, 273, 290
Nunamaker, J., 254, 255, 264

O

O'Connor, R., Jr., 68, 85
Obermayer, R. W., 228, 245
Ochsman, R. B., 258, 262, 277, 290
Ogloff, J. R. P., 173, 195, 197, 198, 200, 206, 207, 208, 209
Olsen, J. P., 254, 262
Olson, M., 100, 101, 108
Onken, J., 22, 39
Orasanu, J., 221, 222, 223, 225, 227, 229, 230, 233, 234, 236, 237, 238, 244, 273, 290
Ortony, A., 225, 226, 228, 245
Oser, R., 223, 224, 230, 244, 287, 290
Ostrove, N., 204, 217

P

Packer, I., 171, 200
Paese, P. W., 98, 99, 100, 101, 105, 106, 107, 108
Palmer, E., 287, 290
Park, B., 127, 134
Park, R., 151, 152, 154, 158, 162, 163, 164, 165, 166, 213, 214, 216, 217
Parrish, R. N., 258, 262
Pasewark, R. A., 174, 176, 182, 200
Pattipati, K. R., 224, 229, 244, 246
Payne, J. W., 22, 23, 25, 27, 29, 31, 36, 37, 38, 39, 99, 106, 133, 134
Pearson, T., 254, 263
Pennington, N., 109, 123, 203, 204, 206, 216, 217, 253, 265
Penrod, S. D., 109, 123, 154, 155, 163, 165, 166, 203, 214, 216
Perlin, M. L., 168, 200
Person, L. J., 268, 290
Petrella, R. C., 171, 200
Petrila, J., 168, 171, 172, 199
Petrilli, M., 9, 10, 11, 18
Pew, R. W., 285, 286, 288, 290
Pezzo, M., 4, 18
Pfiefer, J. E., 195, 197, 200
Phillips, L. D., 90, 97, 98, 105, 107, 108
Picquet, D., 173, 200
Pinsonneault, A., 256, 258, 265
Politser, P. E., 34, 36, 39
Polley, R. B., 261, 265
Poythress, J., 168, 171, 172, 199
Press, M., 268, 286, 290
Prince, C., 224, 225, 230, 237, 243, 244
Pritchard, R. D., 65, 72, 73, 76, 77, 79, 80, 82, 83, 84, 86, 105, 107

R

Raiffa, H., 250, 264
Rakos, R. F., 153, 165, 166
Ramsey, E. G., 285, 289
Randolph, R., 174, 182, 200
Raphael, T., 258, 265
Rasmussen, J., 231, 244, 251, 259, 265
Ravinder, H. V., 34, 39
Reason, J., 251, 265
Reeves, A. P., 96, 108
Reif, F., 239, 243
Reis, H. T., 197, 199, 203, 216
Rentsch, J. R., 252, 265
Restle, F., 131, 134
Retelle, J., 268
Revelle, W., 22, 39
Reynolds, R. E., 221, 245
Rips, L. J., 226, 245
Roberts, C. F., 178, 195, 200
Roberts, K. H., 254, 259, 265
Robertson, B. A., 252, 263
Roby, T. B., 224, 244
Rochlin, G. I., 254, 259, 265
Rogers, W. H., 285, 290
Rohrbaugh, J., 68, 83, 86
Ronis, D. L., 90, 98, 99, 105, 107, 108
Rorer, L. G., 46, 63, 68, 85
Ross, L., 98, 107, 108, 110, 123, 139, 148, 153, 166
Roth, P. L., 79, 83, 84, 86
Rouse, S. H., 251, 265
Rouse, W. B., 225, 226, 227, 228, 231, 235, 238, 239, 240, 241, 245, 251, 263, 265, 273, 290
Ruffel Smith, H. P., 285, 290
Rumelhart, D. E., 225, 226, 245
Rush, C. H., 230, 243
Russo, J. E., 20, 25, 36, 39
Rutherford, A., 225, 227, 231, 245

S

Saffo, P., 248, 255, 263
Sage, A. P., 273, 290
Saks, M. J., 139, 148, 163, 166, 203, 204, 205, 209, 217
Salas, E., 221, 222, 223, 224, 225, 227, 228, 229, 236, 237, 240, 241, 242, 243, 244, 245, 249, 252, 262, 287, 288, 290, 298
Sales, B. D., 163, 165, 171, 173, 177, 192, 196, 199, 200, 206, 216
Salo, C., 204, 216
Salzburg, S., 212, 213, 217
Sanders, G. S., 253, 265
Sanderson, P. M., 227, 245
Sarter, N. B., 285, 286, 290
Sattath, S., 24, 39
Sauer, R. H., 173, 182, 200
Savage, L. J., 106, 107
Savitsky, J. C., 171, 176, 183, 200
Sawyer, J. E., 70, 72, 73, 75, 76, 77, 79, 80, 82, 83, 86, 126, 127, 131, 133, 134
Sawyer, T. A., 127, 131, 134
Schechtman, S. L., 32, 38
Schenck, E. A., 67, 86
Scherer, L. M., 32, 37
Schkade, D. A., 27, 31, 38
Schmitt, N., 32, 38
Schum, D. A., 214, 217
Schwartz, D. R., 287, 290
Schwartzkopf, A., 248, 264
Schweigert, W., 4, 18
Schweighofer, A., 173, 200
Scott, B., 287, 290
Seabright, M. A., 110, 123
Seeger, J. A., 248, 265
Serfaty, D., 221, 224, 225, 229, 230, 244, 249, 262, 264, 286, 288, 290
Severance, L. J., 163, 166, 192, 200
Shanteau, J. C., 88, 106
Shavelson, R. J., 239, 245
Shaviro, D., 138, 148, 210, 217
Shaw, M. E., 109, 123, 276, 290
Shaw, R., 171, 175, 176, 182, 193, 198, 199
Sheets, C. A., Jr., 57, 64
Sheffey, S., 109, 111, 118, 124
Shinotsuka, H., 90, 105, 108
Shoben, E. J., 226, 245
Short, J., 258, 265, 275, 290
Shugan, S. M., 25, 39
Sibbet, D., 248, 255, 263
Siciliano, C., 9, 10, 11, 18
Siegel, J., 254, 256, 258, 264
Siegel-Jacobs, K., 4, 18
Sigall, H., 204, 217
Simon, H. A., 20, 26, 29, 39, 250, 252, 265
Simon, R. J., 140, 148, 168, 170, 175, 200
Slobogin, C., 168, 171, 172, 199, 200
Slovic, P., 12, 18, 24, 35, 38, 39, 46, 63, 68, 85, 109, 113, 123, 249, 251, 263, 294, 299
Smith, C. G., 236, 245
Smith, E. E., 226, 245
Smith, K. A., 237, 245
Smith, V. K., 36, 39
Smith, V. L., 155, 165, 192, 195, 200
Sniezek, J. A., 69, 71, 72, 86, 87, 88, 89, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 110, 113, 122, 123, 125, 126, 127, 133, 134
Solomon, H., 109, 110, 123, 131, 134
Sor, Y. S., 168, 195, 199
Stammers, R. B., 254, 265, 287, 290
Stassen, H. G., 225, 226, 245
Stasser, G., 110, 123, 130, 134, 253, 265, 272, 290
Stasson, M. F., 258, 263
Steadman, H. J., 170, 174, 193, 198, 200
Steele, W. W., 195, 196, 200
Steiner, I. D., 109, 123, 258, 265, 271, 290
Steinmann, D. O., 67, 85
Steinmetz, G. G., 268, 290
Sternberg, R. J., 142, 148
Stevens, A. L., 231, 245
Stewart, I. D., Jr., 153, 166
Stewart, T. R., 67, 85
Sticha, P., 249, 265
Stock, H. V., 171, 198
Stockburger, D. W., 50, 64
Stollings, M. N., 287, 290
Stone, E., 4, 18
Stone, P. J., 261, 265
Stout, R., 223, 224, 245
Stuebing, K. K., 79, 83, 84, 86
Stuve, T. E., 155, 165
Suggs, D., 177, 199
Summers, D. A., 45, 57, 63, 64, 67, 81, 85
Summers, R. C., 56, 64
Summers, S. A., 56, 64
Sundstrom, E., 223, 245, 249, 261, 265
Sundstrom, G. A., 22, 39
Susman, J., 174, 177, 198
Switzer, F. S., III, 100, 106, 108
Switzer, F. S. C., 98, 99, 100, 101, 108

Smith, K. A., 237, 245 Smith, V. K., 36, 39 Smith, V. L., 155, 165, 192, 195, 200 Sniezek, J. A., 69, 71, 72, 86, 87, 88, 89, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 110, 113, 122, 123, 125, 126, 127, 133, 134 Solomon, H., 109, 110, 123, 131, 134 Sor, Y. S., 168, 195, 199 Stammers, R. B., 254, 265, 287, 290 Stassen, H. G., 225, 226, 245 Stasser, G., 110, 123, 130, 134, 253, 265, 272, 290 Stasson, M. F., 258, 263 Steadman, H. J., 170, 174, 193, 198, 200 Steele, W. W., 195, 196, 200 Steiner, I. D., 109, 123, 258, 265, 271, 290 Steinmann, D. 0., 67, 85 Steinmetz, G. G., 268, 290 Sternberg, R. J., 142, 148 Stevens, A. L., 231, 245 Stewart, I. D., Jr., 153, 166 Stewart, T. R., 67, 85 Sticha, P., 249, 265 Stock, H. V., 171, 198 Stockburger, D. W., 50, 64 Stollings, M. N., 287, 290 Stone, E., 4, /8 Stone, P. J., 261, 265 Stout, R., 223, 224, 245 Stuebing, K. K., 79, 83, 84, 86 Stuve, T. E., 155, 165 Suggs, D., 177, 199 Summers, D. A., 45, 57, 63, 64, 67, 81, 85 Summers, R. C., 56, 64 Summers, S. A., 56, 64 Sundstrom, E., 223, 245, 249, 261, 265 Sundstrom, G. A., 22, 39 Susman, J., 174, 177, 198 Switzer, F. S., Ill, 100, 106, 108 Switzer, F. S. C., 98, 99, 100, 101. 108

T Tanford, J. A., 156, 163, 166 Tanford, S., 163, 165 Tannenbaum, S. 1., 221, 223, 245 Taylor, S. E., 153, 165, 252, 265 Tenny, Y. T., 285, 290 Tetlock, P. E., 248, 249, 253, 265

309

AUTHOR INDEX

Thaler, R. H., 5, 18 Thompson, W. C., 139, 148, 205, 209, 210, 211, 212, 216, 217 Thordsen, M., 229, 244, 252, 264 Thornburgh, E. G., 195, 196, 200 Thorndyke, P., 273, 291 Thorngate, W., 25, 29, 39 Thro, M. P., 239, 245 Tindale, R. S., 109, 110, Ill, 112, 114, 118, 123, 124, 127, 134, 248, 251, 252, 263, 265, 272, 289 Titus, W., 130, 134, 253, 265, 272, 290 Toda, M., 90, 105, 108 Todd, F. J., 45, 64, 83, 85 Tolcott, M. A., 252, 266 Toppino, T., 13, 14, 18 Torgerson, W. S., 46, 64 Trafimow, D., 98, 100, 108 Tribe, L. H., 137, 138, 139, 141, 143, 146, 148 Tucker, L. R., 48, 56, 64, 67, 81, 86 Turnbull, S., 173, 200 Turoll, M., 287, 289 Tversky, A., 20, 24, 31, 35, 39, 75, 76, 84, 85, 98, 108, I 09, Ill, 112, 118, 119, 120, 123,124,143,144,/48,249,251,252, 263, 294, 298, 299

u Ulmer, W., 262

v Valacich, J. S., 259, 261, 263 Vallee, J., 276, 289 Vallone, R. P., 98, 108 Van Duizend, R., 209, 217 Vaupel, J. W., 36, 37 Veldhuyzen, W., 225, 226, 245 Vidmar, N., 203, 204, 216, 217 Vincent, K. R., 146, 147 Viorst, M., 13, 18 Viscusi, W. K., 36, 39 Vollrath, D. A., 272, 289 Von Winterfeldt, D., 35, 36, 39 Vreuls, D., 228, 245

w Walker, L., 152, 166 Wang, D. F., 90, 105, 108

Watson, S. R., 34, 35, 39 Watson, W. E., 248, 264 Webster's New Collegiate Dictionary, 255, 266

Weeks, G. D., 258, 262, 263 Wegner, D., 229, 245, 252, 253, 266, 272, 291

Weiner, E. L., 284, 291 Weissinger-Baylor, R., 254, 264 Weiten, W., 197, 200 Wellens, A. R., 258, 264, 266, 271, 272, 274, 275, 276, 277, 278, 279, 280, 281, 282, 287, 288, 291, 296, 297 Wells, G. L., 140, 148, 155, 166, 211, 218 White, B., 239, 243 Whittemore, K., 173, 200 Wickens, C. D., 225, 235, 245, 270, 271, 274, 287, 289, 290, 291 Williams, E., 258, 265, 275, 290 Williams, P., 286, 288 Willis, W. G., 145, 148 Wilson, D. L., 285, 287, 289, 291 Wilson, D. W., 197, 200 Wilson, J. R., 225, 227, 231, 245 Wissler, R. L., 163, 166 Wohl, J. G., 229, 246, 274, 275, 291 Woodard, E. A., 222, 244, 249, 264 Woods, D. D., 285, 286, 290 Wright, E. F., 110, 124 Wright, G. N., 105, 108 Wright, P. L., 25, 39 Wright, W. F., 31, 39 Wrightsman, L., 195, 199, 204, 206, 216, 218

y Yates, J. F., 90, 98, 99, 102, 105, 107, 108 Yntema, D. B., 46, 64 Young, R. M., 231, 246

z Zeisel, H., 203, 216, 218 Zhu, Y., 90, 105, 108 Zimmer, S., 223, 224, 243 Zirk, D. A., 230, 242 Zuckerman, M., 196, 199, 203, 204, 216

SUBJECT INDEX

A

Abilene paradox, 253
Adaptive decision strategies, efficiency, 33
Adaptive strategies, 26, 29-34
Additive difference, 23, 29
Additive utility, 23, 29
Aircrew Decision Model, 269
Ambiguity, 75
Ambiguity avoidance, 102
Anchoring and adjustment, 98
Avianca Airlines, 247, 253, 257, 259, 260, 261

B

Ballew v. Georgia, 205, 218
Base rates, 139-149, 296
  and Bayesian techniques, 145-146
  at trial, 209-212
  cab problem, 111, 143
  fallacy, 111-113, 252
Bayesian aggregation and DIPS, 146
Bayesian techniques, base rates, 145-146
Bazemore v. Davis, 143
Boelcke, Oswald, 268
Bounded rationality, 294
Brunswik's lens model, see Lens model
Burden of proof,
  convincing evidence, 172
  preponderance of evidence, 172
  reasonable doubt, 172

C

Cab problem, 111, 143
Calibration, 90ff, 96
Cascaded inference, 214
Castellan-Edgell hypothesis generation model, 43, 48-49, 57-59
Catch-22, 296
Causal judgment, 65
Cautionary instruction, 156
Chernobyl nuclear plant, 259
Choice and equal weighting, 20-21
CITIES, 275, 277-281, 285
Cognitive effort, measuring, 26
Cognitive heuristics, 109
Collaborative work, 255-257
Commonwealth v. Vogel, 172, 201
Commonwealth v. York, 172, 201
Communication effectiveness, 258
Compensatory decision processes, 22
Confidence,
  accuracy, 91, 104
  confidence curves, 90ff
    hypothetical, 95ff
    prototypical, 93-94
  confidence-frequency effect, 100-101
  dual-process hypothesis, 100
  group, 102-106
  group discussion, 104
  judgment, 126
  "over-under confidence paradox," 101
Configural information, 46, 132
  measurement, 67-68
  measuring utilization, 49ff
Conjunction errors, 113-122
Contingencies,
  critical incidence interviews, 79
  focus group interviews, 79
  nonlinear, 65-86
Contingency judgment, 35
  cue labeling, 69, 71
Contingent choice behavior, 20-25
Contingent strategy and task complexity, 22
Covariation judgment, 65ff
Cues,
  metric, 45
  nonmetric, 45

D

Day v. Boston and Maine R.R., 137, 142
Decision acceptability, 131
Decision aiding, 34-36
  warning labels, 36
Decision analysis, 34-36
Decision errors,
  group, 109-124
  individual, 109-124
Decision strategies, effort-accuracy framework, 25-28
Decision making,
  and economics, 3ff
  benefits-costs, 22
  group, 258
    Simpson's paradox, 127-131
  individual vs. group, 127
  information seeking, 298
  norm seeking, 298
  see also Group decision making
Desert Storm, 259
Diagnostic Inventory of Personality and Symptoms (DIPS), 146
Distributed decision making, defined, 273
Distributional fairness, 140n
Durham v. United States, 174, 175, 201

E

Economics and decision making, 3ff
Effort and accuracy in choice, 29ff
Effort-accuracy tradeoffs, 31-32
Elementary information processes (EIP), 26ff
  validation, 27ff
Elimination by aspects, 22, 23, 29
Ellsberg paradox, 102
Equal weighting, 29
Error theories, 35
Expectancy bias, 294
Expert testimony, reliability, 155
Eyewitness testimony and hearsay testimony, 152
  see also Hearsay testimony

F

Familiarity and validity, 16
Federal Rules of Evidence, 151
Fisher, Sir Ronald, 298
Functional measurement, 131

G

Group situation awareness & decision making, 271ff
Groups, error checking, 109, 122
Groups, Judge-Advisor System (JAS), 102-106
Groupthink, 236, 253, 272

H

Hearsay evidence, 151-166
  jurors' use, 154ff
  jurors' evaluation, 212-215
  caution, Minnesota, 156
  opinions about, 163-165
Heuristics and choice, 20-21
  anchoring and adjustment, 98
  cognitive heuristics, 109
Hindsight bias, 9-13, 16-17
  debiasing, 12
  Iraqi invasion of Kuwait, 12
  medical diagnosis, 9
Human-human teams, 283
Human-machine interface, 296
Human-machine teams, 283
Hypothesis-generation model, 43ff
  see also Castellan-Edgell model

I

Illusory correlation, 251-252
Image theory, 133
Incentives, 31
Information errors,
  mistakes, 251
  slips, 251
Insanity,
  American Law Institute standard, 170
  burden of proof, 172
  cognitive test, 170
  Durham standard, 174
  Justly Responsible test, 195-196
  M'Naughten standard, 168
  mock juries, 175-177
Intellectual teamwork, 248

J

Juror skepticism effect, 155
Jury instructions, 206ff
  comprehension, 175-178, 186ff
  memory, 190
Jury research,
  Federal Rules of Evidence, 151
  generalization, 204
  and insanity defense, 167-201
  memory, 190
  mock juries, critique, 204

K

KOALAS (Knowledgeable Observation Analysis-Linked Advisory System), 297

L

LaFollette v. Raskin, 170
Lens model, 44, 48, 56, 67, 68, 81, 83, 88, 131
  nonlinear relations, 56
Lexicographic choice, 20-21, 26
Linear models, weighting schemes, 82
Lockhart v. McCree, 205, 218

M

Martin Marietta, 247, 250, 252, 257, 258, 261
Maryland v. Craig, 213, 218
Memory and jury instructions, 190
Memory errors, MCPL, 55-56
Mental models, 221-246, 252, 273
  heuristics, 20-21, 226
  schema, 226
  training, 238-240
  shared, 227ff, 298
    expectations, 234
    evaluation, 240-241
    groupthink, 236
    personality, attitudes, 237
    stress, 229
    surrogate model, 231
    task-action mapping model, 231
    taxonomy, 233
    team mind, 229
    team decision making, 221-246
    training, 240
    transactive memory, 229
    workload, 229
Mind-machine interface, 297
Miranda v. Arizona, 138, 149
Mock juries, critique, 204
  see also Juries
Motivation, judgment, 73
Multiple strategies, 19-39
Multiple Probability Learning (MPL), 68
Multiple-Cue Probability Learning (MCPL), 45ff
  metric, 56
  validity and utilization, 46-48

N

Noncompensatory decision processes, 22
Normative models, problems, 76

O

Organizational behavior, contingency judgment, 72ff
Organizational behavior, covariation judgment, 72ff
Overconfidence, 113-118
  feedback, 113-118

P

Performance norms, contingencies, 77
Persian Gulf, 274
Predictability, 66
  nonlinear relations, 66
Preference reversal, 24
Preferences and decision behavior, 19
Probabilistic evidence, 138
Probabilistic functionalism, 88
Problem solving,
  group, 109
  individual, 109
  individual vs. group, 110
Process tracing, 131
Production rules, 29
Productivity Measurement and Enhancement System (ProMES), 79-83
Prospect theory, 76
Psychological Distancing Model, 276

R

Rational decision making vs. consensual decision making, 250
Recall, schema, 17
Regina v. M'Naughten, 168, 170, 172, 173, 174, 175, 201
Resource allocation, 73
Risk aversion, contingencies and biases, 75

S

Salience, cue, 132
  and meaningfulness, 133
Satisficing, 20-21, 23, 29, 250, 252
Schema,
  jury effects, 195
  recall, 17
  shared, 273
Schleisner v. State, 170, 201
Sensitivity analysis, 35
Simpson's Paradox, 127-131
Sindell v. Abbott Labs, 211, 218
Single-cue Probability Learning (SPL), 68
  see also Multiple-cue probability learning
Situation awareness,
  defined, 268
  distributed decision making, 267-287
Sixth Amendment, cautionary clause, 162, 213
Smith v. Rapid Transit, Inc., 139, 141, 142, 143, 149, 211, 218
Social cueing, 102-103
Social Judgment Theory (SJT), 67, 81, 83
Social Decision Scheme (SDS), 118, 121
State v. Boyd, 138, 149
State v. Carlson, 138, 149
State v. Johnson, 195, 201
State v. Kim, 138, 149
State v. Neilsen, 138, 149
State v. Schwartz, 138, 149
Strategy selection, 25
Subjective uncertainty,
  active choice, 98-99
  alternatives, 98
  confidence, 87ff
  defined, 88
  effort, 99-100
  environmental uncertainty, 89-90
  influence on confidence, 98
  level of aggregation, 100-101
Sunk cost effect, 6, 17
  theater study, 6
  Chicago Tribune, 8

T

Tarasoff v. Regents of University of California, 138, 149
Taskwork, 223
Team,
  defined, 222
  and group distinguished, 223
Team decision making (TDM), 221-242, 247-262
  biases, 249ff
  cognitive biases, 249
  computer supported, 248ff
  defined, 222
Three Mile Island, 259
Time pressure,
  decision weights, 32-33
  strategies, 30
Turner v. U.S., 211, 218

U

Uncertainty,
  defined, 88
  types, 125-126
U.S. v. Hodge and Zweig, 138, 149
U.S. v. Brawner, 196, 201
U.S. v. Hinckley, 167, 170, 201
USS Stark, 295, 296
USS Vincennes, 247, 251, 254, 257, 258, 259, 260, 261, 293, 294, 295, 296, 298

V

Validity effect, 13-16
  propaganda, 16
Virginia v. Hawk, 137, 149

W

Windfall gain, 3ff
  and tax rebates, 5
In re Winship, 172