Applied Probability Models and Intuition 9780989910873

Applied Probability Models and Intuition

ARNOLD I. BARNETT George Eastman Professor of Management Science Professor of Statistics Sloan School of Management Massachusetts Institute of Technology

Dynamic Ideas LLC Belmont, Massachusetts

Dynamic Ideas LLC 43 Lantern Road Belmont, Mass. 02478 USA

Email: [email protected] WWW Information and Orders: http://www.dynamic-ideas.com Cover Design: Patrick Ciano © 2015 Dynamic Ideas LLC

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

Publisher's Cataloging-in-Publication Data
Barnett, Arnold I.
Applied Probability: Models and Intuition
Includes bibliographical references and index
1. Applied Probability. 2. Uncertainty Modeling. 3. Applications. Title.
QA276.12.M643 2015
ISBN: 978-0-9899108-7-3

Dedication

With deep gratitude, I dedicate this book to my wife Harriet, my children Andrew and Lauren (two-legged) and Duncan and Lilly (four-legged), my mother Doris Barnett, and in memory of my father Leon Barnett and my parents-in-law, Beatrice and Shepard Schwartz.

Contents

Preface

CHAPTER 1 The Laws Of Probability
1.1 A Set Theory Primer
1.2 Some Basic Probability
1.3 Conditional Probability
1.4 Games of Chance
1.5 Health and Safety
1.6 Other Matters
The Takeaway Bar
Key Formulas
Some Parting Thoughts
EZ Pass
Further Chapter 1 Exercises

CHAPTER 2 Discrete Probability Distributions
2.1 A Counting Formula
2.2 Random Variables
2.3 Discrete Probability Distributions
2.4 Summary Statistics for a Discrete Random Variable
2.5 The Binomial Distribution
2.6 Some Further Mean and Variance Formulas
2.6 The Geometric Distribution
2.7 The Negative Binomial Distribution
2.8 The Hypergeometric Distribution
2.9 The Poisson Distribution
The Takeaway Bar
Key Formulas
Some Parting Thoughts
Chapter 2 EZ-Pass
Chapter 2 Further Exercises

CHAPTER 3 Continuous Random Variables
3.1 The Cumulative Distribution Function
3.2 The Probability Density Function
3.3 The Uniform Distribution
3.4 The Exponential Distribution
3.5 The Gamma Distribution
3.6 The Normal Distribution
3.7 Other Probability Distributions
3.8 Simulation for a Continuous Random Variable
The Takeaway Bar
Key Formulas
Some Parting Thoughts
Chapter 3 EZ Pass
Further Exercises

CHAPTER 4 Combinations of Random Variables
4.1 Joint Distributions of Random Variables
4.2 Sums of Random Variables
4.3 Some Mean Feats
4.3 Independent Random Variables
4.4 Sweet Sums of Random Variables
4.6 Discrete Transforms and Independent Random Variables
4.7 Moment Generating Functions
4.8 The Central Limit Theorem
4.9 Correlation
4.10 Sums of Correlated Random Variables
The Takeaway Bar
Key Formulas
Some Parting Thoughts
Chapter 4 EZ Pass

Index
Solved Problems

Preface

The study of Probability goes back centuries, and many fine books have been written about it. An author should therefore avoid the bombastic claim that he has produced something totally new. Indeed, if such novelty means divorcing the book from the best thinking that preceded it, then the reader should be hesitant to pick it up. That said, an author has to believe that something worthwhile distinguishes his book from its predecessors. Let me set forth my hopes for this volume, and the reasons I believe it is not redundant of others.

This book is an introductory textbook in Probability, written from the viewpoint of Applied Mathematics. Under that perspective, theoretical rigor is not denigrated, but the book puts a high premium on offering an intuitive overview of key theorems as well as clear evidence of their usefulness. Beyond teaching specific methods, the book strives to create real comfort with the underlying notions of probabilistic thinking. Grasping such ideas is essential if they are to be used with competence and confidence. To achieve these goals, the book aims to provide:

An Exceptionally Large Inventory of Real-Life Examples Drawn from Many Fields

Consistent with the theme of Applied Probability, the book contains well over one hundred diverse and (I hope) interesting problems, which will serve as key components of the pedagogy. The idea is simple: the more readers care about the answer to a problem, the more interest they will take in the analysis that generates the answer. Some introductory textbooks give the impression that the most interesting applications of Probability in the world concern coin flips, die tosses, and urns filled with balls. While some such examples have educational value and I do not avoid them, my emphasis lies elsewhere.

For example, I try to show with probabilistic reasoning why The New York Times erred in interpreting homicide data from Detroit, how quality-control charts work, why genetically recessive populations do not die out, what workload air-traffic controllers face at high-level crossing points, and how the risk of investing in stock options compares to that of investing in the stocks themselves. The aim is to have examples that are consistently stimulating and that will attract the attention of all readers. It would not be helpful if Engineering students lose interest whenever applications from Biology are raised, or vice-versa.

An Unusually Strong Emphasis on Achieving an Intuitive Understanding of the Material

A primary objective of the book is to demonstrate that Probability and Statistics are, more than anything, just crystallizations of common sense. The reader should come away from each lesson understanding not only how a result was obtained but also why the outcome is plausible. I am hardly the first author to see the virtues of intuition, but my background in Physics and Applied Mathematics makes me very uncomfortable with accepting an answer that, while clearly correct, just doesn't seem to make sense.

My treatment of the classic birthday problem illustrates my priorities for this book. In that problem, calculations reveal that birthday overlap in a small group is far more likely than one might initially expect. But if that finding is to be more illuminating than unnerving, it is important to conduct a "post-mortem" and make clear how, if viewed from the right perspective, the result is not really surprising. I offer such a post-mortem, and subsequently address the further question of the extent of birthday overlap one would expect in a group of size N. (The answer to that question is if anything even more surprising than the answer about whether there is any overlap.) Furthermore, to stress the point that the birthday problem is more than fodder for a party game, I present an exercise based on my experience at a paper mill in Wisconsin. The manufacturers were dismayed at what seemed like an uncanny tendency of small random tears in the paper to cluster, thereby reducing paper strength to an unacceptable degree. The explanation for this pattern arose from a direct analogy to the birthday problem.

A Conversational Style of Presentation

I have tried to write the book in an informal and occasionally light-hearted style, although I placed far higher priority on lucidity than entertainment. The reader should want to pick up the book and read it, and sense that the author is there with her as she tries to think the material through. Make no mistake: Probability is a subtle subject, and solving Probability problems is considerably more difficult than watching others solve them. Once one learns the Pythagorean theorem, using it to find the hypotenuse of a right triangle is straightforward. The laws of Probability are rarely so generous: Often one must express the problem in just the right way before the laws become willing to reveal its solution. Beyond mathematics per se, the discussions and practice exercises are meant to encourage the calm and self-confidence that are prerequisites to real mastery of Applied Probability.

The Intended Audience

The book is intended to support a one-semester course in Applied Probability, though it could undergird a one-quarter course if some topics are skipped. (Topics that are not necessary for subsequent material (e.g., z-transforms for discrete random variables) are identified when they are introduced.) The book strives to be suitable for both undergraduate and graduate students who concentrate in Engineering, Business, the Natural Sciences, and the Social Sciences, as well as for Math majors with an interest in applications. As suggested earlier, I am not among those who believe that engineers need learn from a book about "probability for engineers" or that business students need a volume about "probability for business." The laws of probability are the same regardless of the domain of application, and the thinking about (say) a problem in Finance can benefit the Chemistry major as well as the economist.

The book does not minimize the importance of derivations of key results (the author got his doctorate in Mathematics), but it is possible for instructors to deemphasize these derivations and move directly to the results when it seems desirable. The reality is that there is a finite amount of time in the introductory Probability course, and a tradeoff must be made between an emphasis on mathematical rigor and a zestful treatment of applications. This volume clearly tilts towards the latter, but proofs of primary results appear either in the text or in the website that accompanies this book.

As happens at MIT, students who complete this book should be ready to move on to more advanced courses in Applied Probability should they so desire. Though the book does not cover such fascinating topics as Markov Processes and Brownian motion, it immerses the reader in enough probabilistic thinking that she can readily progress to the study of such topics. Indeed, it would be immensely gratifying if this book served to inspire students to take further courses.

The book assumes that the reader has taken a first course in Calculus. Calculus is fundamental to critical concepts like probability density functions and transform methods, and it provides a basis for especially appealing derivations of such key results as the Poisson-distribution formula. But part of the reason for a Calculus requirement goes beyond its immediate utility. Completion of a Calculus course suggests a level of mathematical mastery and maturity that I believe appropriate to achieve comfort with the book's treatment of Applied Probability. Even when Calculus is not required to solve a Probability problem, the reasoning skills associated with instruction in that subject are often important.

Solution Manual

A solution manual that includes solutions to all exercises in the book is available to qualified instructors. Interested parties should contact the publisher at info@dynamic-ideas.com to obtain the solution manual.

Some Acknowledgements

Perhaps the hardest part of writing this book is preparing this section of acknowledgements, because I am fearful of accidentally leaving people out. Over the past several decades, I have taught thousands of students at MIT and worked with nearly 100 teaching assistants, and a huge number of these people have influenced my thinking about what material is important and how to explain it effectively. I have also benefitted greatly from my experience outside the academic world in research and consulting, which has given me a strong sense of the value of Applied Probability in coping with real-world problems (many of which are reflected in the book). The lawyers, medical doctors, engineers, economists, and business and government leaders whom I've been lucky enough to work with are literally too numerous to mention, but my debt to them is enormous.

However, I should single out some of the remarkable individuals from the academic world who have encouraged me and influenced me in important ways. Among faculty, a partial list would include Alfred Blumstein (Carnegie Mellon), Jonathan Caulkins (Carnegie Mellon), David Czerwinski (San Jose State), the late Alvin Drake (MIT), Robert Freund (MIT), Mark Hansen (Berkeley), Kenneth Huang (Singapore Management University), the late Paul Gray (USC), Linda Green (Columbia), Edward Kaplan (Yale), Richard C. Larson (MIT), Daniel J. Kleitman (MIT), Polykarp Kusch (Columbia), John D. C. Little (MIT), Gordon Kaufman (MIT), the late Robert Machol (Northwestern University and the Federal Aviation Administration), Thomas Magnanti (MIT and Singapore Institute of Technology and Design), Susan Martonosi (Harvey Mudd), Donald Morrison (UCLA), Scott Neslin (Dartmouth), the late Gordon Newell (Berkeley), Amedeo Odoni (MIT), Cynthia Rudin (MIT), the late Abraham Siegel (MIT), Robert Shumsky (Dartmouth), and Roy Welsch (MIT). I am especially grateful to the brilliant publisher of Dynamic Ideas LLC, Dimitris Bertsimas (MIT).

Among MIT students, I owe much to Robin Bose, Lincoln Chandler, Jesse Goranson, Bob Jordan, Mark Kamal, Anthony Lofaso, Steve Miller, Arvind Nagarajan, C. S. Venkatakrishnan, and in particular Brian Chang, who prepared most of the solutions to exercises in the book. I am of course grateful to my wife and children, who make my life better with probability 100%.

A Final Thought

If my experience is any guide, gaining skill in Applied Probability makes one a better thinker. Applied Probability requires both logic and discipline, because the laws of Probability are unforgiving of imprecise or superficial reasoning. And, if one learns to avoid sloppy thinking in the context of Applied Probability, it seems to carry over to other domains. Nearly five decades later, I am still influenced by my undergraduate Physics course in Electricity and Magnetism even though I scarcely remember what I learned about capacitors and resistors. I would consider my book a success if it likewise imparts ideas to readers that, years later, help them grapple with both Probability problems and dilemmas far removed from Probability.

To state the obvious: I hope that readers and instructors find this a valuable book, and that they will let me know about changes that could make it better.

Arnold Barnett
Cambridge, Massachusetts
October 2014

CHAPTER 1 The Laws Of Probability

A Birthday Problem

"Wanna bet?" Minerva asked Mendel as they arrived at the party. "There are 60 people here, and I'll give you $100 if no two of us share a birthday. If there is any birthday sharing, you only have to give me $1."

Rather puzzled, Mendel responded "You're kidding, right? There are 365 days in a year, so there's plenty of room for 60 people to spread out their birthdays without overlap. In fact, even if each person had six birthdays, there are enough days that no two people need share a birthday. This bet makes no sense for you unless ... you know the birthdays of some people here, so you're sure there's overlap. Is that it?"

"I don't know any birthdays here except my own," Minerva replied, annoyed at the hint that she was devious. "I base the bet solely on the laws of probability. Are you ready to make the bet you think is so unwise?"

"Sure," Mendel answered, "but I think you're throwing your money away."

To be continued

We all invoke the ideas of probability long before we are exposed to them formally. Few outcomes are definite and, among those things that may or may not happen, some seem more likely than others. Among people with any quantitative background, there is an almost instinctive groping for some numerical measure of the varying degrees of likelihood. And yet, formulating a precise concept of probability is very difficult. If not careful, one can wind up devising a "definition" of probability that tacitly assumes that one has some prior definition. It is frustrating that something so familiar can be so elusive; one could soon start echoing Supreme Court justice Potter Stewart who, in a decision about "decency" in art works, said he couldn't define indecency but knew it when he saw it.

Here we will not embroil ourselves in the centuries-old controversy about what probability is. Rather, we will move briskly to the laws about how to use probabilities once we have them. These are a most remarkable series of laws: their simplicity is hardly suggestive of great power, but they are strong enough to provide the foundation of all Probability and most of Statistics.

1.1 A Set Theory Primer

Because basic probability often invokes the ideas of set theory, we do well to start our work by reviewing those ideas. In simplest terms, a set S is a collection of elements, where the word "elements" is defined broadly enough to include people, objects, events, or bankrupt corporations. Some definitions:

• A subset A of set S contains some (but generally not all) elements in S.
• The intersection of two subsets A and B is the set of elements that are included in both A and B. The intersection is commonly assigned the notation $A \cap B$.
• The union of two subsets A and B is the set of items included in one or both of A and B. The union is usually assigned the notation $A \cup B$.
• An empty set, usually given the symbol $\emptyset$, is a set that has no elements from S in it. For example, if S is the set of integers between 65 and 70, then the set of integers within S that are perfect squares (i.e., integers of the form $x^2$, where x is an integer) has no members. Thus, that set is an empty set. So is the set of integers within S that are multiples of 16.
• Two subsets A and B within S are said to be disjoint if $A \cap B = \emptyset$. For example, if A is the set of even numbers in the range 65-70 and B is the set of odd numbers in that range, then A and B are disjoint.
• Two subsets A and B within S are said to be complementary if two conditions are met: $A \cap B = \emptyset$ and $A \cup B = S$. Complementary sets arise when the elements in S are partitioned into two disjoint sets.
• Subset B is contained in Subset A if all elements of B are also elements of A. This relationship is usually written as $B \subset A$.

Geometrically, the relationship among sets within S is often depicted with a Venn diagram. In Figure 1-1, set A is the oval consisting of the yellow plus purple areas, while set B is the oval comprising the purple plus orange areas. Here the intersection of sets A and B is the purple area, while the union of A and B is anything either yellow, purple, or orange.

FIGURE 1-1 A Venn Diagram in which Set A is Yellow and Purple and Set B is Purple and Orange
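The definitions above can be tried out directly with Python's built-in set type. The sketch below is only an illustration: it assumes that "between 65 and 70" includes both endpoints, and the helper names `are_disjoint` and `are_complementary` are made up for this example.

```python
import math

# S: the integers from 65 through 70 (endpoints assumed to be included).
S = set(range(65, 71))

# Subsets defined by a property of their elements.
perfect_squares = {n for n in S if math.isqrt(n) ** 2 == n}
multiples_of_16 = {n for n in S if n % 16 == 0}
evens = {n for n in S if n % 2 == 0}
odds = S - evens  # everything in S that is not even


def are_disjoint(a, b):
    """Two sets are disjoint when their intersection is empty."""
    return len(a & b) == 0


def are_complementary(a, b, universe):
    """Complementary: disjoint, and their union is the whole set."""
    return are_disjoint(a, b) and (a | b) == universe


print(perfect_squares, multiples_of_16)    # both are empty sets
print(are_disjoint(evens, odds))           # disjoint
print(are_complementary(evens, odds, S))   # and they partition S
print({66, 68} <= evens)                   # {66, 68} is contained in evens
```

In Python, `&` is intersection, `|` is union, and `<=` tests containment, so each line mirrors one of the definitions.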

EXAMPLE 1.1 Biggest of the Big

Let S be the set of cities in Texas with at least 250,000 residents in 2010. The elements of S are:

| City | 2010 Population (Thousands) | Location Within State |
|---|---|---|
| Arlington | 365 | Northeast |
| Austin (state capital) | 790 | Center |
| Corpus Christi | 305 | Southeast |
| Dallas | 1198 | Northeast |
| El Paso | 649 | West |
| Fort Worth | 741 | Northeast |
| Houston | 2099 | Southeast |
| San Antonio | 1327 | Center |

Suppose that:

• Subset A is Texas cities with at least one million inhabitants in 2010
• Subset B is Texas cities in the eastern part of the state
• Subset C is the state capital
• Subset D is west Texas cities with at least one million inhabitants in 2010
• Subset E is the complement of subset A

What cities are contained in $A \cap B$, $A \cup B$, $A \cap C$, $D$, and $E \cap B$?

Solution:

• $A \cap B$ consists of those east Texas cities (northeast plus southeast) with at least one million inhabitants in 2010. They are Dallas and Houston.
• $A \cup B$ consists of those Texas cities that have at least one million residents and/or are located in east Texas. They are Arlington, Corpus Christi, Dallas, Fort Worth, Houston, and San Antonio.
• $A \cap C$ is $\emptyset$, because Austin's 2010 population was below one million.
• $D$ is $\emptyset$, because El Paso's 2010 population was below one million.
• $E \cap B$ consists of east Texas cities with fewer than one million inhabitants. They are Arlington, Corpus Christi, and Fort Worth.
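For readers who like to check such answers by machine, here is a minimal sketch that rebuilds the subsets of Example 1.1 from the table. The dictionary layout and variable names are choices made for this illustration, and "east" is taken to mean Northeast plus Southeast, as in the solution.

```python
# City -> (2010 population in thousands, location within state), from the table.
cities = {
    "Arlington": (365, "Northeast"),
    "Austin": (790, "Center"),
    "Corpus Christi": (305, "Southeast"),
    "Dallas": (1198, "Northeast"),
    "El Paso": (649, "West"),
    "Fort Worth": (741, "Northeast"),
    "Houston": (2099, "Southeast"),
    "San Antonio": (1327, "Center"),
}

S = set(cities)
# A: at least one million inhabitants (1000 thousand).
A = {c for c, (pop, _) in cities.items() if pop >= 1000}
# B: eastern part of the state (northeast plus southeast).
B = {c for c, (_, loc) in cities.items() if loc in ("Northeast", "Southeast")}
# C: the state capital.
C = {"Austin"}
# D: west Texas cities with at least one million inhabitants.
D = {c for c, (pop, loc) in cities.items() if loc == "West" and pop >= 1000}
# E: the complement of A within S.
E = S - A

print(sorted(A & B))  # intersection of A and B
print(sorted(A | B))  # union of A and B
print(A & C, D)       # both empty
print(sorted(E & B))  # east Texas cities below one million
```

The printed sets can be compared line by line with the bullet points of the solution.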

EXERCISE 1.1

What are the elements of $E \cap C$, $E \cup C$, $B \cup (E \cap C)$, $A \cup E$, and $E \cup A$?

Now, on to Probability.


1.2 Some Basic Probability

Probability starts out with the notion of an experiment, which is some process that leads to a single outcome that (generally) is not known in advance with certainty. For example, a coin toss might be the experiment, and the outcome might be either heads or tails. A few more definitions:

• The sample space S is the set of all possible outcomes of an experiment (plus the empty set $\emptyset$).
• An event A is a subset of S.
• Two events A and B are said to be mutually exclusive if the occurrence of one of them precludes the occurrence of the other. They do not share any common elements in the sample space (or, in the notation of set theory, $A \cap B = \emptyset$). Mutually exclusive events are also described as disjoint.
• An elementary event cannot be expressed as the union of two or more disjoint events.
• Following the terminology of set theory, two events A and B are complementary if $A \cap B = \emptyset$ and $A \cup B = S$. In general, we will use the notation $A'$ for the complement of event A.
• For two events A and B, $A \cup B$ is the event that at least one of A and B occurs (and possibly both).

Note that Venn diagrams arise naturally in connection with sample spaces. If A and B are events then, as in Figure 1-1, their intersection is the event that both of them occurred and their union is the event that at least one of them occurred. Only if the intersection is the empty set (i.e. there is no purple region they have in common) are events A and B mutually exclusive.

Coins, Dice, and Urns

Probability courses are sometimes criticized because they seem to suggest that the most interesting events on earth involve coins, dice, and random drawings from urns. In this book, we will consider a somewhat wider range of applications. However, in introducing basic ideas in Probability, it can be highly useful to toss coins or dip into urns. It is inherently simpler to discuss a die toss than to describe why (say) a mortgage-backed security might default. Too much realism in early examples can detract from their probabilistic content, so we will start our journey into Probability on the coin/die/urn circuit.

EXAMPLE 1.2 Double Header?

A coin is tossed twice in a row. Several events associated with the outcomes are:

A: both tosses lead to heads
B: the second toss yields a different outcome than the first
C: both tosses lead to tails
D: the first toss yields heads

(i) What is the sample space here?
(ii) Among these events, which pairs are mutually exclusive?
(iii) Which pairs are not mutually exclusive?
(iv) Are any of these pairs complementary?
(v) Are all possible outcomes of the experiment covered by these events?

Solution

(i) The sample space S consists of all possible outcomes of the double toss, which in simplest form are the four outcomes: (Heads, Heads), (Heads then Tails), (Tails then Heads), (Tails, Tails).
(ii) A and B are mutually exclusive, as are A and C. B and C are also mutually exclusive, as well as C and D.
(iii) A and D and B and D are not mutually exclusive.
(iv) No pairs of events above are complementary, because the union of no two of them equals the full sample space.
(v) Yes: because $A \cup B \cup C = S$, they cover all possibilities.
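The answers in Example 1.2 can also be found by brute-force enumeration. The following sketch lists the four outcomes of the double toss, builds each event as a subset of the sample space, and classifies every pair of events; the variable names are illustrative only.

```python
from itertools import combinations, product

# Sample space: ordered pairs such as ("H", "T") for heads then tails.
S = set(product("HT", repeat=2))

events = {
    "A": {s for s in S if s == ("H", "H")},  # both tosses heads
    "B": {s for s in S if s[0] != s[1]},     # second toss differs from first
    "C": {s for s in S if s == ("T", "T")},  # both tosses tails
    "D": {s for s in S if s[0] == "H"},      # first toss heads
}

exclusive, overlapping, complementary = [], [], []
for x, y in combinations(sorted(events), 2):
    if events[x] & events[y]:
        overlapping.append((x, y))
    else:
        exclusive.append((x, y))
        # Complementary pairs must be disjoint AND exhaust the sample space.
        if events[x] | events[y] == S:
            complementary.append((x, y))

covers_all = set().union(*events.values()) == S

print("mutually exclusive:", exclusive)
print("not mutually exclusive:", overlapping)
print("complementary:", complementary)
print("events cover S:", covers_all)
```

Enumeration of this kind stops being practical once the sample space is large, which is one reason the laws developed next are valuable.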

EXERCISE 1.2

(i) In words, what are events $B \cap D$, $B \cup C$, and $A \cup D$ in Example 1.2?
(ii) For what pairs of events is the intersection the empty set?
(iii) In words, what is the complement of D?
(iv) If E is the event that the second toss yields tails, is $B = E \cup D$? Briefly discuss.

Kolmogorov's Axioms

Probabilities are numbers assigned to individual events, and are chosen to correspond to the intuitive concept of likelihood. Higher probabilities and greater likelihood are meant to go hand in hand. Kolmogorov brought greater precision to the quantification of probability by advancing several axioms that a legitimate assignment of probabilities to events must satisfy:

Kolmogorov's Axioms

Suppose that S is the sample space consisting of all possible events that can arise in an experiment, and that there are n events labeled $A_1$ through $A_n$ that, collectively, include all elements in the sample space. The probabilities assigned to these events must have the following properties:

• $P(A_i) \ge 0$ for all $i$   (1-1)
• $P(\emptyset) = 0$   (1-2)
• $P(S) = P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1$   (1-3)
• If $A_i \cap A_j = \emptyset$, then $P(A_i \cup A_j) = P(A_i) + P(A_j)$   (1-4)

We assume in this chapter that the number of events is either finite or "countably infinite." This last phrase means that we can create a series of events in S of the form $(C_1, C_2, \ldots, C_k, \ldots)$ such that every possible event is a member of that series. (In Chapter 3, we will encounter situations in which the sample space is uncountably infinite. For example, if we are picking a number at random from the interval (0,1) of the real line, then the number of possibilities vastly exceeds the number of positive integers in the set (1, 2, 3, ...).)

These axioms imply some other rules about probabilities. Below we first state some individual rules, and then explain the reasoning that leads to them.

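If $S = A_1 \cup A_2$ and $A_1 \cap A_2 = \emptyset$, then:

$P(A_1) = 1 - P(A_2)$   (1-5)

As noted, $A_1$ and $A_2$ are complementary events. In words, if there are only two possible events and they are mutually exclusive, (1-5) says that the probability of one of them is one minus the probability of the other. That is true because $P(S) = 1$ and, given that $S = A_1 \cup A_2$ and $A_1 \cap A_2 = \emptyset$, we have from (1-4):

$P(A_1) + P(A_2) = P(A_1 \cup A_2) = P(S) = 1$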

If events $(A_1, A_2, \ldots, A_k)$ are pairwise mutually exclusive (i.e., events for which $A_i \cap A_j = \emptyset$ for all $1 \le i \ne j \le k$), then:

$P(A_1 \cup A_2 \cup \cdots \cup A_k) = \sum_{i=1}^{k} P(A_i)$   (1-6)

We can establish this outcome recursively, by breaking events down into pairs that satisfy equation (1-4). For example, if $A_1$, $A_2$, and $A_3$ are pairwise mutually exclusive, then $P(A_1 \cup A_2 \cup A_3) = P(B \cup A_3)$, where $B = A_1 \cup A_2$. (B is a well-defined event: it arises when either $A_1$ or $A_2$ occurs.) Clearly, $B \cap A_3 = \emptyset$: because $A_1 \cap A_3 = \emptyset$ and $A_2 \cap A_3 = \emptyset$, no element of $A_3$ can belong to $A_1 \cup A_2$. But then (1-4) implies that $P(A_1 \cup A_2 \cup A_3) = P(B \cup A_3) = P(B) + P(A_3)$ and, because $A_1 \cap A_2 = \emptyset$, (1-4) also implies that $P(B) = P(A_1 \cup A_2) = P(A_1) + P(A_2)$. Putting it all together, we see that $P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3)$. This argument can be extended to general values of k.
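A small numerical check may make rules (1-5) and (1-6) feel less abstract. The sketch below uses one toss of a die and assumes, purely for illustration, that each face has probability 1/6; the axioms themselves do not require that particular assignment. Exact fractions are used to avoid rounding error.

```python
from fractions import Fraction

# Sample space for one die toss; equal probabilities are an assumption
# made for this example, not something the axioms dictate.
S = frozenset(range(1, 7))
p = {outcome: Fraction(1, 6) for outcome in S}


def prob(event):
    """Probability of an event = sum over its elementary outcomes."""
    return sum((p[o] for o in event), Fraction(0))


# Three pairwise mutually exclusive events.
A1, A2, A3 = {1}, {2, 3}, {6}
assert not (A1 & A2) and not (A1 & A3) and not (A2 & A3)

# Rule (1-6): probability of the union equals the sum of probabilities.
lhs = prob(A1 | A2 | A3)
rhs = prob(A1) + prob(A2) + prob(A3)
print(lhs, rhs)

# Rule (1-5): an event and its complement have probabilities summing to 1.
complement_check = prob(A1) == 1 - prob(S - A1)
print(complement_check)
```

Changing `A2` so that it overlaps `A1` makes the two sides of (1-6) disagree, which shows why the mutual-exclusivity condition matters.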

If A is an event and $B_1, \ldots, B_k$ is a series of mutually exclusive (disjoint) events for which $B_1 \cup B_2 \cup \cdots \cup B_k = S$, then:

$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_k) = \sum_{i=1}^{k} P(A \cap B_i)$   (1-7)

(This rule is called the Law of Alternatives.) Because the B;'s are mutually exclusive, it must be true that the events A n B; and A n Bi are mutually exclusive events. Furthermore, the fact that B1 U B2 . .. U Bn = S means that every point in the sample space contained in A must also be contained in one of the B;'s. Therefore, the A n B;'s are mutually exclusive events the union of which must be A itself. We can therefore apply (1-4) to reach (1-7). Another rule arises quickly. Suppose that A c B. (Recall that c means "is contained in.") Then A n B = A, meaning that: If A c B then P(A c B)

= P(A)

(1-8)

Kolmogorov's axioms (1-1) ➔ (1-4) impose requirements that any viable assignment of probabilities must meet. But they do not imply what particular assignment is appropriate in a given situation. For example, if the sample space consists of exactly two disjoint points 1 and 2, then assigning 70% probability to point 1 and 30% probability to 2 would satisfy Kolmogorov. But so would assigning 43% probability to 1 and 57% to 2. We need to supplement the Kolmogorov axioms in some defensible way that favors some assignments over others. But how to supplement those axioms is a subject that has provoked disagreement.

Three Perspectives on Setting Probabilities

As is ably discussed in DeGroot and Schervish (2002), there are three major interpretations of probability that have been proposed: subjective, classical, and frequency. The authors note that "no single scientific interpretation of probability is accepted by all statisticians, philosophers, and other authorities (because) ... each interpretation of probability that has been proposed by some authorities has been criticized by others." For these reasons, the authors conclude that "the true meaning of probability is still a highly controversial subject." Here we make no attempt to resolve the controversy, but instead offer something of a mixture of the three interpretations to assign probabilities in particular circumstances.

Subjective Probability and Equally Likely Events

The subjective concept of probability entails the use of a numerical scale to reflect varying degrees of likelihood. On a scale from 0 to 1, under which 0 reflects impossibility and 1 reflects certainty, a person chooses a number that reflects her belief about whether event A will occur. The number selected is considered her subjective probability of event A. Asked the question "which would surprise you more: if A occurred or if A didn't occur?" she would be told to choose 1/2 if her answer was "neither outcome would surprise me more than the other." (In that case, she deems A equally likely to occur as not to occur.) The choice of 3/4 would suggest considerable surprise if A doesn't happen, while the choice of 1/8 would imply even greater surprise if A did happen.

Of course, different people might assign very different subjective probabilities to the same event: one person might estimate that there is a 60% chance that Hillary Clinton will be elected President of the United States in 2016, while someone else assigns probability 25% to her election.

The notion of equal likelihood in subjective probabilities is assumed to extend to any two mutually exclusive events. Two mutually exclusive events A and B are considered equally likely if, when asked "which would surprise you more: having A occur or having B occur?," the person assigning probabilities would again answer "neither outcome would surprise me more than the other." (We will not engage in deep probing of what we mean by "surprise," but will rather assume that we have an intuitive understanding of the concept.)

Of course, subjective probability assessments are required to be consistent with Kolmogorov's axioms. It simply would not do, for example, if the probabilities assigned to three mutually exclusive events added up to 1.5.

EXERCISE 1.3

Think about five events that might occur in your life in the next month (including two that are complementary and three that are mutually exclusive) and assign subjective probabilities to them. (For example, if you take a bus or train daily, what is your estimate of the chance that it will be on time throughout the month?) Do your probability assignments satisfy Kolmogorov's axioms and the equal-probability principle (if relevant)? If not, what did you do wrong?

A Classical Axiom about Equally Probable Events

In classical probability, some prior understanding of the words "equally likely" is assumed to exist, perhaps tied to the concept of "surprise" in connection with subjective probability. It is postulated that equally likely events will share the same numerical probability:

Equal Probability Axiom:

If two mutually exclusive events $A$ and $B$ are equally likely, then

$$P(A) = P(B) \tag{1-9}$$

Coupled with Kolmogorov's axioms, (1-9) allows the assignment of probabilities in some situations. For example, we would presumably treat heads (event $A$) and tails (event $B$) as equally likely if a fair coin were tossed, meaning that $P(A) = P(B)$ under (1-9). But because $A$ and $B$ are mutually exclusive and because one of them must occur, they are complementary events. Therefore (1-3) and (1-4) tell us that:

$$P(A \cup B) = P(A) + P(B) = 1.$$

But then $P(A) = P(B)$ means that $P(A) + P(B) = 2P(A) = 1 \Rightarrow P(A) = P(B) = 1/2$.

EXAMPLE 1.3

Sixth Sense

If $P(A_i)$ is the chance of getting the number $i$ on the toss of a fair six-sided die for $i = 1, 2, \ldots, 6$, then what is $P(A_i)$?

Solution

Here the sample space $S$ is the integers from 1 to 6, the only possible outcomes in such a die toss. Because $A_1 \cup A_2 \cup \cdots \cup A_6 = S$ and the $A_i$'s are mutually exclusive, we know from (1-3) and (1-6) that

$$P(A_1 \cup A_2 \cup \cdots \cup A_6) = \sum_{i=1}^{6} P(A_i) = 1.$$

On a fair die, $P(A_i)$ is the same for all $i$; if we call the common value $p$, then $6p = 1$. Thus, for $i = 1, 2, \ldots, 6$, $P(A_i) = 1/6$.

EXERCISE 1.4

Did you know there is such a thing as a 20-sided die? Take a look. On such a die, what is the probability of getting a 3?

In combination with (1-1) ➔ (1-7), the equal-probability axiom can often allow broader calculations of probabilities. We turn now to two examples.

EXAMPLE 1.4

Urnings

Suppose that an urn has 100 balls numbered 1 to 100, and that a ball will be chosen completely at random from the urn (meaning that all these numbers are equally likely to be selected). Find the probability that:

(i) the number selected is 67
(ii) the number selected is even
(iii) the number selected is a perfect square.

Solution

Let $p_i$ be the probability that number $i$ is selected. Because different selections are mutually exclusive events, we know that the probability $Q$ that some number is selected follows:

$$Q = \sum_{i=1}^{100} p_i$$

Of course, $Q = 1$ here. And the equal likelihood assumption means that all the $p_i$'s share some common value $p$. We can determine $p$ by writing:

$$\sum_{i=1}^{100} p_i = \sum_{i=1}^{100} p = 100p = 1 \;\Rightarrow\; p = .01$$

(i) The probability that 67 is selected is .01.

(ii) Of the 100 integers between 1 and 100, 50 are even. The selections of different even numbers are mutually exclusive events. Thus:

$$P(\text{even number selected}) = \sum_{j=1}^{50} P(2j \text{ is selected}) = 50p = 50 \times .01 = 1/2$$

(iii) Of the integers from 1 to 100, 10 are perfect squares: 1, 4, 9, 16, 25, 36, 49, 64, 81, and 100. Each has probability .01 of arising, and the various selections are mutually exclusive events. Thus:

$$P(\text{perfect square selected}) = 10 \times .01 = .10$$
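The counting in Example 1.4 can be checked by brute force. The Python sketch below is only an illustration and is not part of the original solution; the helper name `prob` is hypothetical. It gives each ball probability 1/100 and adds up the probabilities of the mutually exclusive outcomes that make up each event. Direct enumeration finds 10 perfect squares between 1 and 100.

```python
from fractions import Fraction
from math import isqrt

balls = range(1, 101)          # balls numbered 1 to 100
p = Fraction(1, len(balls))    # common probability of each ball under equal likelihood

def prob(event):
    """Sum the probabilities of the mutually exclusive outcomes that satisfy `event`."""
    return sum(p for n in balls if event(n))

p_67 = prob(lambda n: n == 67)
p_even = prob(lambda n: n % 2 == 0)
p_square = prob(lambda n: isqrt(n) ** 2 == n)

print(p_67, p_even, p_square)  # 1/100 1/2 1/10
```

Exact fractions are used instead of floating-point numbers so that the results can be compared with the hand calculation without rounding.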

EXERCISE 1.5

A prime number, you might recall, is an integer greater than one that has the property that its only factors are itself and one. What is the probability that a ball randomly selected from the urn in Example 1.4 has a prime number on it?

EXAMPLE 1.5

Double Urnings

Now suppose that a ball is selected at random from the urn with balls numbered 1 to 100 and that, without replacing that ball in the urn, a second ball is selected at random. What is the probability that:

(i) The sum of the two selections is below five.
(ii) Both balls selected have odd numbers.
(iii) Two consecutive numbers are chosen, in ascending order.

Solution

If (i,j) are the two numbers selected in that order, there are 100 possible values of i and, for each of those values, there are 99 possible values of j. Thus, there are 100 * 99 = 9900 possible pairings of selections. (We are treating (2,5) as different from (5,2).) All 9900 such pairings are equally likely in this random drawing, meaning that, under the reasoning in Example 1.3, each (i,j) pair has a 1/9900 chance of arising.

(i) Of the 9900 pairings, exactly four yield a sum below five: (1,2), (1,3), (3,1) and (2,1). The probability that one of these four mutually exclusive possibilities arises is 4 * (1/9900) = 4/9900 = 1/2475.

(ii) There are fifty possible values of i that are odd and, for each of them, there are 49 odd numbers that could come up for j. Thus, the chance of a "double odd" event is (50 * 49)/9900 = 2450/9900 = .247.

(iii) The chance of getting two consecutive outcomes of the form (i, i + 1) is the sum of chances of the sequences (1,2) and (2,3) and (3,4) ... up to (99,100). In other words, there are 99 sequences of the form (i, i + 1). Given that each has probability 1/9900 and they are mutually exclusive, we can write: P(two consecutive numbers selected) = 99 * (1/9900) = .01.
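The three answers in Example 1.5 can also be confirmed by listing all 9900 ordered pairs. The following Python sketch is an illustrative check, not part of the original solution; `prob` is a hypothetical helper that divides the number of favorable pairs by the total number of equally likely pairs.

```python
from fractions import Fraction
from itertools import permutations

# All ordered pairs (i, j) with i != j: two balls drawn without replacement.
pairs = list(permutations(range(1, 101), 2))
total = len(pairs)  # 100 * 99 = 9900

def prob(event):
    """Probability of an event as (favorable pairs) / (all equally likely pairs)."""
    return Fraction(sum(1 for i, j in pairs if event(i, j)), total)

p_sum_below_five = prob(lambda i, j: i + j < 5)
p_both_odd = prob(lambda i, j: i % 2 == 1 and j % 2 == 1)
p_consecutive = prob(lambda i, j: j == i + 1)

print(total, p_sum_below_five, p_both_odd, p_consecutive)  # 9900 1/2475 49/198 1/100
```

The fraction 49/198 is 2450/9900 in lowest terms, or about .247.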

EXERCISE 1.6

In the double-selection of Example 1.5:

(i) What is the chance that the sum of the numbers chosen is exactly five?
(ii) What number b has the highest chance of being the sum of i and j? (Hint: What sum could come up regardless of the value of i?)
(iii) What is the probability that the sum takes on the value b from part (ii)?

While the equal probability axiom is stated above for mutually exclusive events, we assume that it can be extended to cases in which A and B are not mutually exclusive. For example, two problems that might arise tomorrow morning are that my car will not start (event A) or that the newspaper will not arrive (event B). These events are not mutually exclusive, but suppose that I am told by a soothsayer that exactly one of these events will occur tomorrow. If I would be at a complete loss to guess which one (i.e., if neither resolution of the soothsayer's prophecy would surprise me more than the other), then I am in effect declaring the two events equally likely and the axiom can be taken to require that P(A) = P(B).

The equal-probability axiom has a limitation: it does not directly tell us what probabilities to assign to two complementary events A and B when we do not think A and B equally likely. Here, however, we might consider something of a synthesis of the equal probability axiom and the subjective concept of probability. Suppose again that A is the event that my car will not start the next time I try to use it, and B the complementary event that it will start. Suppose further that I view having the car fail to start as equally likely with getting a 1 when I pick a ball at random from an urn that has 1000 balls, each labeled with a different integer from 1 to 1000. As we know, the equal probability axiom for the urn choice yields a 1/1000 probability of getting a one. And a second application of the equal probability axiom, involving the car and the urn, then obliges me to assign 1/1000 probability to the car failing to start. Given that $P(A \cup B) = P(A) + P(B) = 1$, the choice P(A) = 1/1000 implies that P(B) = 999/1000. In this way, we might indirectly benefit from the equal probability principle even in cases where its direct use for A and B would be flagrantly inappropriate.

With such reasoning, we can combine subjective probability and the equal probability axiom (1-9) to move even further afield. An example:

EXAMPLE 1.6

Easy as A, B, C?

Suppose that sample space S consists of three mutually exclusive events A, B, and C, and suppose that it is believed that:

B is twice as likely as C
A is three times as likely as B

What subjective probability assignment is consistent with these beliefs?

Solution

The way we interpret "twice as likely" is that, whenever either B or C occurs, the chance that it is B rather than C is the same as the chance of getting a black ball in a random pick from an urn with two black balls and one red ball. Similarly, "three times as likely" implies that, when either A or B comes up, the chance that it is A is the chance of selecting a green ball in a random pick from an urn with six green balls and two black balls. A larger urn consistent with both restrictions (as well as with no possibilities other than A, B, and C) would have six green balls, two black balls, and one red ball. Because all nine balls have an equal chance of being selected, that common probability is 1/9. But then:

$$P(A) = 6/9 = 2/3 \qquad P(B) = 2/9 \qquad P(C) = 1/9.$$

This urn is not the only one consistent with the conditions of the problem: we could just as well have had 120 green balls, 40 black balls, and 20 red balls. But all satisfactory urns lead to the same probability assessment, namely, the one above.
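The urn argument in Example 1.6 amounts to turning relative weights into probabilities by dividing each weight by the total. A minimal Python sketch of that step follows; the dictionary `weights` and the choice of C as the baseline with weight 1 are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

# Relative weights implied by the beliefs: C is the baseline,
# B is twice as likely as C, and A is three times as likely as B.
weights = {"C": 1}
weights["B"] = 2 * weights["C"]
weights["A"] = 3 * weights["B"]

total = sum(weights.values())  # number of balls in the smallest consistent urn (9)
probs = {event: Fraction(w, total) for event, w in weights.items()}

print(probs["A"], probs["B"], probs["C"])  # 2/3 2/9 1/9
```

Multiplying every weight by the same constant (for example 20, giving the urn with 120, 40, and 20 balls) leaves the resulting fractions unchanged.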

EXERCISE 1.7

Suppose that sample space S consists of four mutually exclusive events A, B, C, and D and suppose that it is believed that:

B is twice as likely as C
A is three times as likely as B
D is twice as likely as A, B, and C combined

What subjective probability assignment is consistent with these beliefs?

Relative Frequency

Our subjective assignments of probability are sometimes disciplined by the third interpretation of probability, namely, the concept of relative frequency. If the experiment that can result in event A can be conducted a large number of times under independent and identical conditions, the fraction of times that event A occurs should converge to P(A). In other words, probability should correspond to the fraction of time the event occurs "in the long run." We will elaborate on this reasoning in Chapter 4, where we discuss the Law of Large Numbers.

Because of the relative frequency principle, a subjective estimate of probability can sometimes be shown to be incorrect. For example, suppose that someone estimates that a coin will come up heads with probability 7/10. Then if the coin is tossed a million times and it comes up heads 50% of the time, the estimate of 7/10 is contradicted by long-term experience. (More precisely, either one disavows the original estimate or winds up disavowing some of the axioms of probability.)

Of course, it is not always possible to conduct lots of experiments to determine long-term outcomes. If the issue is the probability that Hillary Clinton will be elected the US President in 2016, the idea of arranging ten million independent elections and observing the fraction in which Hillary Clinton wins does not exude realism.
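The long-run behavior described here is easy to watch in a simulation. The sketch below is an illustration only: it uses Python's pseudo-random number generator as a stand-in for a fair coin, and the function name `relative_frequency` and the fixed seed are assumptions made for reproducibility, not anything specified in the text.

```python
import random

def relative_frequency(num_tosses, p_heads=0.5, seed=0):
    """Fraction of heads in `num_tosses` simulated independent tosses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(1 for _ in range(num_tosses) if rng.random() < p_heads)
    return heads / num_tosses

# The fraction of heads typically settles near 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With only 10 tosses the fraction can be far from 1/2; with 100,000 tosses it is typically within a fraction of a percentage point, which is why an estimate of 7/10 would be hard to defend after a long run of such data.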

EXAMPLE 1.7

Rain Man

To assess the accuracy of a weatherman's predictions, five years of his forecasts are assembled. It is discovered that there were 120 days on which he said that there was a 40% probability of rain the next day. In fact, it rained on the following day on 26 occasions. Does that outcome reduce confidence in his 40% forecasts? Discuss briefly.

Solution

Well, yes. 40% of 120 is 48, so the relative frequency principle suggests that the number of subsequent days with rain should have been about twice as large as the 26 observed. Because 26/120 = .217, he would have had better overall accuracy had he estimated the chance of rain as 20% on each of those 120 occasions.
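The text does not say how "overall accuracy" should be measured. One common choice, assumed here purely for illustration, is the mean squared difference between the forecast probability and the 0/1 outcome (often called the Brier score); lower is better. The Python sketch below computes the observed relative frequency and compares a 40% forecast with a 20% forecast on that measure. The function name `mean_squared_error` is hypothetical.

```python
rainy_days = 26      # days on which it actually rained
total_days = 120     # days on which a 40% forecast was issued

observed_frequency = rainy_days / total_days  # about 0.217

def mean_squared_error(forecast):
    """Average squared gap between the forecast probability and the 0/1 outcome."""
    dry_days = total_days - rainy_days
    return (rainy_days * (1 - forecast) ** 2 + dry_days * forecast ** 2) / total_days

mse_40 = mean_squared_error(0.40)
mse_20 = mean_squared_error(0.20)
print(round(observed_frequency, 3), round(mse_40, 4), round(mse_20, 4))
```

Under this assumed measure, the 20% forecast scores about 0.170 against about 0.203 for the 40% forecast, consistent with the conclusion above.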

EXERCISE 1.8

Let's pursue the two tosses of a fair coin in Example 1.2 in the spirit of the relative frequency definition of probability. Taking account of what outcomes are possible (return to Example 1.2 if your memory needs refreshing):

(i) What is the probability of getting exactly one head?
(ii) Exactly two?
(iii) Exactly zero?
(iv) If you had to guess how many heads would come up, what guess would you make?
(v) What is the probability your guess would be correct?
(vi) Does this example suggest that relative frequency is not a reliable guide to the outcome in the short run? Briefly discuss.


Not everyone would accept this particular approach to mixing various interpretations of probability; as noted, the topic evokes a controversy that has gone on far longer than our lives. Once probabilities have been assigned to elementary events, however, there is no controversy about how to use those probabilities to determine the probabilities of more complex events. We need to introduce one more fundamental probability concept: the idea of conditional probability.

Conditional Probability

The conditional probability of $B$ given $A$ is denoted $P(B \mid A)$ and is defined by the rule:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \tag{1-10}$$

The intuitive interpretation of $P(B \mid A)$ is as the probability we assign to B knowing that A has occurred. Event B originally had probability P(B), but the information that event A occurred in the experiment could justify revising the probability of event B, and (1-10) indicates how we should do so. For example, suppose a fair six-sided die is tossed, and events A and B follow:

A: getting an even number
B: getting a four

If we learn that the toss resulted in an even number (but nothing more about the outcome), how should we adjust the probability of event B? The initial probability of B (i.e., P(B)) is 1/6. Because $P(A \cap B) = 1/6$ and $P(A) = 1/2$, (1-10) tells us that $P(B \mid A) = 1/3$.

That revision of the probability makes sense: Given that A occurred (an even number), the outcome of the toss must be a 2, 4, or 6. Because these three outcomes initially were equally likely (and because knowing that the toss came up even does not change that equality), the probability assigned to B should go from 1/6 to 1/3. Under (1-10), it does.

We can derive (1-10) under the relative frequency concept of probability. Over a very large number of experiments ($N$), suppose that $N_A$ of them result in event A and $N_{A \cap B}$ result in both A and B. Then, when A arises, the fraction of time that B arises in the long run is approximated by $N_{A \cap B}/N_A$. For example, if $N = 100{,}000$, $N_A = 60{,}000$ and $N_{A \cap B} = 20{,}000$, then $P(B \mid A) \approx N_{A \cap B}/N_A = 1/3$. But the frequency interpretation of probability also entails the approximations $P(A) \approx N_A/N$ and $P(A \cap B) \approx N_{A \cap B}/N$.

Therefore, $P(B \mid A) \approx N_{A \cap B}/N_A = (N_{A \cap B}/N)/(N_A/N)$, which becomes

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$

EXAMPLE 1.8

Another Die Is Cast

A fair six-sided die is tossed twice. Among the events that might arise are:

A: the first toss yields an even number
B: the second toss yields an odd number
C: the sum of the two outcomes is even

Find $P(A)$, $P(B)$, $P(C)$, $P(B \cap C)$, $P(C \mid B)$, $P(C \mid A \cap B)$, and $P(B \mid C)$.

Solution

The long way to do this is to list the 6 * 6 = 36 possible (i,j) pairs that can result from the tosses, assign probability 1/36 to each, and then add the probabilities for all the mutually exclusive pairings that lead to each event. But we can save time if we recognize that the successive outcomes fall into four categories on the odd/even dimension:

$O_{12}$: (Both Odd)
$O_1E_2$: (Odd then Even)
$E_1O_2$: (Even then Odd)
$E_{12}$: (Both Even)

All these categories are equally likely: there are three odd and three even possibilities on each toss (and we are assuming that the first outcome does not influence the second). Thus $P(O_{12}) = P(O_1E_2) = P(E_1O_2) = P(E_{12}) = 1/4$. As for the events of interest:

$P(A) = P(E_1O_2 \cup E_{12}) = P(E_1O_2) + P(E_{12}) = 1/2$ (because $E_1O_2$ and $E_{12}$ are mutually exclusive)

$P(B) = P(E_1O_2) + P(O_{12}) = 1/2$

$P(C) = P(O_{12}) + P(E_{12}) = 1/2$

$P(B \cap C) = P(O_{12}) = 1/4$

$P(C \mid B) = P(B \cap C)/P(B) = 1/2$

$P(C \mid A \cap B) = P(C \cap (A \cap B))/P(A \cap B) = 0$ because $C$ and $A \cap B$ are disjoint subsets of the sample space, meaning that $P(C \cap (A \cap B)) = 0$.

$P(B \mid C) = P(B \cap C)/P(C) = 1/2$
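The "long way" mentioned at the start of the solution (listing all 36 pairs) takes only a few lines of code. The Python sketch below is an illustrative check of Example 1.8, with hypothetical helper names `prob` and `cond`; `cond` applies definition (1-10) directly, dividing the probability of the joint event by the probability of the conditioning event.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely (first, second) pairs

def prob(event):
    """Probability of an event as (favorable outcomes) / 36."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given), as in (1-10)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

A = lambda o: o[0] % 2 == 0            # first toss even
B = lambda o: o[1] % 2 == 1            # second toss odd
C = lambda o: (o[0] + o[1]) % 2 == 0   # sum even
A_and_B = lambda o: A(o) and B(o)

print(prob(A), prob(B), prob(C))                  # 1/2 1/2 1/2
print(prob(lambda o: B(o) and C(o)))              # 1/4
print(cond(C, B), cond(C, A_and_B), cond(B, C))   # 1/2 0 1/2
```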

All these probabilities make sense, and we could have worked them out in our heads without formal use of conditional probability. But it is comforting that use of (1-10) yields the answers we knew were correct on other grounds. Rest assured, however, that we will be using (1-10) in lots of cases where we cannot work out answers in our heads.

EXERCISE 1.9

In Example 1.8, work out $P(A \cap B)$, $P((A \cap B) \mid C)$ and $P((A \cap B) \mid C^c)$, where $C^c$ is the complement of $C$.

EXAMPLE 1.9

Horse Sense

Besides wagering that a particular horse will win at the race track, one can bet it will be one of the two fastest (i.e., that it will place). The horse that the bettors consider the most likely to win (as evidenced by their wagers) is called the favorite. (Actual) data collected at the Louisville Downs raceway in Kentucky suggested that, 33% of the time, the favorite actually wins and, 29% of the time, it comes in second. Assume that these probabilities prevail in any given race and that the outcomes of different races are independent.

(i) In a particular race, Mendel bets on the favorite to win and Minerva bets that the favorite will place. What is the probability that both their wagers will succeed?
(ii) What is the probability that Minerva will succeed but Mendel will not?
(iii) Given that Minerva succeeds, what is the probability that Mendel will also succeed?

Solution:

(i) If A is the event that Minerva wins her bet and B the event that Mendel wins his, then P(B) = .33 and P(A) = .33 + .29 = .62. Because $B \subset A$, $P(A \cap B) = P(B)$ under (1-8). (In words, both to place and to win, the horse has to win.) Thus, P(both wagers succeed) = $P(A \cap B)$ = P(B) = .33.

(ii) The Law of Alternatives (1-7) says that $P(A) = P(A \cap B) + P(A \cap B^c)$, where $B^c$ is the complement of B (i.e., the event that Mendel doesn't win). Because P(A) = .62 and $P(A \cap B)$ = .33, we see that $P(A \cap B^c) = P(A) - P(A \cap B) = .29$. (We can get the result directly if we recognize that the event "Minerva wins but Mendel doesn't" is the same as the event "the horse came in second.")

(iii) From (1-10), we know that $P(B \mid A) = P(A \cap B)/P(A)$. Because P(A) = .62 and $P(A \cap B)$ = .33, we have $P(B \mid A)$ = .33/.62 = .53. What this means is that, if the race is over and we learn that Minerva won her bet but have no news about Mendel, we should increase our estimate of the chance that he won from 33% to 53%.

EXERCISE 1.10

It is also possible to bet that a horse will show in a race, meaning that it finishes among the three fastest horses. Suppose that the chance the favorite comes in third at Louisville Downs was 10% (we say "was" because Louisville Downs has since closed), and that Asparagus went to the track with Mendel and Minerva and bet that the horse would show in the race in Example 1.9 in which Mendel wagered "win" and Minerva bet "place." If C is the event that Asparagus won his bet:

(i) What are the events $C$, $B \cap C$, and $B \cap C \cap N$ in words?

And what are:

(ii) $P(C)$
(iii) $P(B \cap C)$
(iv) $P(B \cap C \cap N)$
(v) $P(A \mid C)$

Independent Events

Two events $A$ and $B$ are said to be independent if:

$$P(B \mid A) = P(B) \tag{1-11}$$

In other words, knowing that A has occurred does not change our assessment of the likelihood of B. When A and B are independent, (1-11) implies a simple product rule:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = P(B) \;\Rightarrow\; P(A \cap B) = P(A)P(B)$$

If $A$ and $B$ are independent events, then

$$P(A \cap B) = P(A)P(B) \tag{1-12}$$

If A is independent of B, does that mean B is independent of A? Yes. We know that $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$. But if $P(A \cap B) = P(A)P(B)$, then $P(A \mid B) = P(A)$.

In consequence, independence is a two-way street.

EXAMPLE 1.10

Square Won?

Suppose that a fair six-sided die will be tossed once, and let A be the event of getting an even number and B be getting a perfect square. Are A and B independent events?

Solution

Here $P(B \mid A)$ is the chance of getting a perfect square given that the toss yields an even number. The even numbers that can come up are 2, 4, and 6; the perfect squares are 1 and 4. We see that P(A) = 1/2, and $P(A \cap B)$ = P(get a 4) = 1/6. Therefore, $P(B \mid A) = P(A \cap B)/P(A) = 1/3$. Is that quantity equal to P(B), as independence requires? Sure it is, because two of the six equally likely outcomes on the toss (namely, 1 and 4) result in a perfect square, yielding P(B) = 1/3. Because $P(B \mid A) = P(B) = 1/3$, the definition of independence is satisfied. And so is our intuitive sense of what independence means. Knowing that the outcome was even neither raises nor lowers the chance of getting a perfect square: what was originally 2 in 6 became 1 in 3. Nothing lost, nothing gained.
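Independence can be tested mechanically by comparing $P(A \cap B)$ with $P(A)P(B)$, as in (1-12). The Python sketch below does so for Example 1.10 using sets of die faces; the helper `prob` is a hypothetical name and the code is only an illustrative check.

```python
from fractions import Fraction

faces = range(1, 7)
A = {n for n in faces if n % 2 == 0}   # even numbers: {2, 4, 6}
B = {1, 4}                             # perfect squares on a die

def prob(event):
    """Probability of a set of faces when all six faces are equally likely."""
    return Fraction(len(event), len(faces))

p_b_given_a = prob(A & B) / prob(A)                 # definition (1-10)
independent = prob(A & B) == prob(A) * prob(B)      # product rule (1-12)
print(p_b_given_a, prob(B), independent)            # 1/3 1/3 True
```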

EXERCISE 1.11

For a single toss of a fair die, let three events be:

A: getting an odd number
B: getting a perfect square
C: getting a prime number

For each pair of events below, state whether the events are independent, mutually exclusive, or neither independent nor mutually exclusive:

• $A$ and $C$
• $A$ and $B$
• $A \cap B$ and $C$
• $C \cap B$ and $A$
• $C \cap B$ and $A^c$, where $A^c$ is the complement of $A$

(Hint: Remember that one is not considered a prime number.)

EXAMPLE 1.11

Backed Up

Many kinds of equipment have the feature of redundancy, under which backup systems take over certain functions when the primary systems fail. Such arrangements are common with respect to a hospital's electrical supply: if the usual power source fails, secondary generators are supposed to come on-line at once. (Obviously, it is hoped that whatever caused the primary system to fail does not simultaneously cripple the backup system.) Consider two events about a piece of "redundant" equipment for which both the primary and backup systems are up and running now:

$A_1$: the primary system fails in the next hour
$A_2$: the backup system fails in the next hour

Let $p_i = P(A_i)$ for $i = 1, 2$. Suppose that a study of historical data about the equipment leads one to hypothesize that:

(i) $p_1$ and $p_2$ are exactly the same, both of them sharing the common value $p$.
(ii) $A_1$ and $A_2$ are independent events.

Statements (i) and (ii) constitute a probabilistic model about the equipment's reliability. Assume that this model is valid and consider five events:

B: both systems fail in the next hour
C: the backup system does not fail over the next hour
D: the primary system fails in the next hour but not the backup
E: exactly one of the two systems fails in the next hour
F: at least one system fails in the next hour

What is the probability of each of these events?

Solution

P(B) (both systems fail)

B is the same as the event $A_1 \cap A_2$. Because the $A_i$'s are independent, we can use (1-12) to write:

$$P(B) = P(A_1 \cap A_2) = P(A_1)P(A_2)$$

Exploiting the fact that $P(A_i) = p$ for each $i$, we simplify the last equation to $P(B) = p^2$.

P(C) (backup system does not fail)

As defined, the events $C$ and $A_2$ are complementary. Thus, we can use (1-5) to write:

$$P(C) = 1 - p$$

P(D) (the primary system fails but not the backup)

D is the joint event $A_1$ and $C$. To find $P(D) = P(A_1 \cap C)$, we must face a question: While we know that $A_1$ and $A_2$ are independent, can the same be said of $A_1$ and $C$? The answer is yes because, if the failure of the primary system does not affect the chance that the backup fails, it must likewise leave untouched the chance that the backup does not fail. Therefore:

$$P(D) = P(A_1 \cap C) = P(A_1)P(C) = p(1 - p)$$

P(E) (exactly one of the systems fails)

Event D is only one way to get exactly one failure: it is also possible that the backup system fails but not the primary one. If we denote this latter event by Y, then $P(E) = P(D \cup Y)$. Clearly, D and Y are mutually exclusive, so $P(E) = P(D \cup Y) = P(D) + P(Y)$. We know that $P(D) = p(1 - p)$; identical arguments establish that $P(Y)$ also equals $p(1 - p)$, which means that:

$$P(E) = P(D) + P(Y) = p(1 - p) + p(1 - p) = 2p(1 - p)$$

P(F) (at least one system fails)

Here we can use the earlier findings about B and E, which are mutually exclusive, to write:

$$P(F) = P(B \cup E) = P(B) + P(E) = p^2 + 2p(1 - p)$$

EXERCISE 1.12

Suppose that $p$ is the probability that the primary system in Example 1.11 fails in the next hour, and $q$ is the corresponding probability for the backup system ($p \ne q$). Again, failures of the two systems are independent events. Find the probabilities of events B through F in terms of $p$ and $q$.
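As a check on the formulas of Example 1.11 (the common-$p$ case, not the exercise), one can enumerate the four joint outcomes of the two systems and add up the relevant probabilities. The Python sketch below does this for one hypothetical value, $p = 1/10$, chosen only for illustration; the names `joint` and `prob` are likewise assumptions.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 10)  # hypothetical failure probability, chosen only for illustration

# Under independence, each (primary_fails, backup_fails) combination has
# probability equal to the product of the two individual probabilities.
joint = {
    (a1, a2): (p if a1 else 1 - p) * (p if a2 else 1 - p)
    for a1, a2 in product([True, False], repeat=2)
}

def prob(event):
    """Add the probabilities of the joint outcomes that satisfy `event`."""
    return sum(pr for (a1, a2), pr in joint.items() if event(a1, a2))

P_B = prob(lambda a1, a2: a1 and a2)        # both fail
P_C = prob(lambda a1, a2: not a2)           # backup does not fail
P_D = prob(lambda a1, a2: a1 and not a2)    # primary fails, backup does not
P_E = prob(lambda a1, a2: a1 != a2)         # exactly one fails
P_F = prob(lambda a1, a2: a1 or a2)         # at least one fails

print(P_B, P_C, P_D, P_E, P_F)  # 1/100 9/10 9/100 9/50 19/100
```

Each value agrees with the corresponding formula from the example evaluated at $p = 1/10$.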

We're almost ready to turn to a series of probability problems, but first we need to pursue some variants of the conditional probability rule.

Bayes' Theorem

In using (1-10), we assume that $P(A) > 0$. If $P(A) = 0$, after all, the concept of $P(B \mid A)$ is meaningless. (If there are no such things as unicorns, we should not worry about what fraction of them drives Toyota Corollas.) When $P(A) > 0$, we know from (1-10) that, for any two events $A$ and $B$:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \tag{1-13}$$

But another use of (1-10) assuming $P(B) > 0$ yields:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{1-14}$$

(1-14) can be rewritten as:

$$P(A \cap B) = P(B)P(A \mid B) \tag{1-15}$$

Substituting (1-15) into (1-13) yields:

Bayes' Theorem:

$$P(B \mid A) = \frac{P(B)P(A \mid B)}{P(A)} \tag{1-16}$$

The practical value of that formula might seem dubious on the grounds that, unless one knew P(B IA) already, one might seem unlikely to know all three of the quantities needed to deduce it from (1-16). But things are not always what they seem: It turns out that Bayes' Theorem is one of the most powerful results in Probability and Statistics. The great advantage of the Bayesian formula (1-16) is that it allows the revision of probabilities in the light of new information (i.e., from P(B) to P(B IA)). Quite often, a variety of indicators about the likelihood of an event (e.g., that a particular geological basin contains oil) arise at different times. One wants to include the latest findings in the probability estimate while not diminishing too much the weight accorded earlier developments. Bayes' Theorem provides for a natural and accurate synthesis of all of the facts already amassed.
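To make the revision from $P(B)$ to $P(B \mid A)$ concrete, here is a minimal Python sketch of (1-16). All numbers are hypothetical and chosen only for illustration (B is taken to be "the basin contains oil" and A "a test comes back positive"); the function name `bayes_update` is also an assumption. The denominator $P(A)$ is obtained from the Law of Alternatives (1-7) together with (1-15).

```python
def bayes_update(prior_b, p_a_given_b, p_a_given_not_b):
    """Return P(B | A) from P(B), P(A | B), and P(A | not B)."""
    # Law of Alternatives (1-7): P(A) = P(A and B) + P(A and not B),
    # with each joint probability written as in (1-15).
    p_a = prior_b * p_a_given_b + (1 - prior_b) * p_a_given_not_b
    # Bayes' Theorem (1-16)
    return prior_b * p_a_given_b / p_a

# Hypothetical inputs: a 30% prior, and a test that is positive 80% of the
# time when oil is present and 10% of the time when it is not.
posterior = bayes_update(prior_b=0.30, p_a_given_b=0.80, p_a_given_not_b=0.10)
print(round(posterior, 3))  # 0.774
```

If a second indicator later arrives, the posterior from the first update can serve as the prior for the next one, which is one way to read the "synthesis" of successive findings described above (under the added assumption that the indicators are conditionally independent).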

EXAMPLE 1.12

Bayes' Hit?

(Sorry) If $P(A) > 0$ and $P(B) > 0$, does $P(B \mid A) = P(A \mid B)$ mean that $P(B)$ must equal $P(A)$?

Solution: Well, yes and no. First, suppose that $P(B \mid A) > 0$. From Bayes' Theorem, we have

$$P(B \mid A) = \frac{P(B)P(A \mid B)}{P(A)}.$$

Therefore, the equality $P(B \mid A) = P(A \mid B)$ implies that the ratio of $P(B)$ to $P(A)$ is one. Hence, $P(A)$ and $P(B)$ must be equal. But suppose that A and B are mutually exclusive, in which case $P(B \mid A) = P(A \mid B) = 0$. There is no requirement that $P(A)$ and $P(B)$ be equal just because the events are mutually exclusive. And this outcome does not contradict Bayes' Theorem, which simply becomes 0 = 0 in this case.

EXERCISE 1.13

If $P(B \mid A) = 1$ and $P(A) = 3/5$, do we have enough information to determine $P(B)$? Either explain why we do or give two examples that show that $P(B)$ can take on different values even with this information. Also, is it possible that $P(B) = 1/2$ with this information? Discuss briefly.


The Joint Probability Law

Equation (1-15) can be generalized beyond two events A and B. If we have three events A, B, and C, then we can write $D = A \cap B$ and then write:

$$P(A \cap B \cap C) = P(D \cap C) = P(D)P(C \mid D)$$

But (1-15) also implies that $P(D) = P(A \cap B) = P(A)P(B \mid A)$. Therefore:

$$P(A \cap B \cap C) = P(A)P(B \mid A)P(C \mid A \cap B) \tag{1-17}$$

We have in (1-17) a three-event generalization of (1-15). And we can use this reasoning again to get a four-event generalization and, through applying the reasoning recursively, to get a rule that applies for $k$ events $A_1, A_2, \ldots, A_k$, which we call the joint probability law:

Joint Probability Law:

$$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_k \mid A_1 \cap A_2 \cap \cdots \cap A_{k-1}) \tag{1-18}$$

The joint probability law is basically a "sudden death" rule: To get all events $A_1$ through $A_k$, first $A_1$ must occur: otherwise we have already failed. Given that $A_1$ occurred, we require that $A_2$ also occurred; if not, the $k$-event sequence is again doomed. Then we must ask about $A_3$ given $A_1$ and $A_2$, and so on. Equation (1-18) is the probability of surviving all $k$ stages of this discovery process.

If the $A_j$'s are independent, in which case $P(A_j \mid A_1 \cap A_2 \cap \cdots \cap A_{j-1}) = P(A_j)$, then (1-18) becomes a simple product rule:

If $A_1$ through $A_k$ are independent events, then:

$$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)P(A_2) \cdots P(A_k) \tag{1-19}$$
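The "sudden death" reading of the joint probability law can be illustrated with a small urn. The setup below is hypothetical and chosen only to keep the enumeration short: an urn with balls numbered 1 to 10, three balls drawn without replacement, and $A_k$ the event that the $k$-th ball drawn is odd. The Python sketch multiplies the conditional probabilities stage by stage and compares the product with a brute-force count over all ordered draws.

```python
from fractions import Fraction
from itertools import permutations

balls = range(1, 11)                       # hypothetical small urn, balls 1..10
odd = [n for n in balls if n % 2 == 1]     # the 5 odd balls

# Joint probability law: P(A1) * P(A2 | A1) * P(A3 | A1 and A2),
# where Ak is the event that the k-th ball drawn is odd.
chain = Fraction(5, 10) * Fraction(4, 9) * Fraction(3, 8)

# Brute-force check over all equally likely ordered draws of three balls.
draws = list(permutations(balls, 3))
all_odd = sum(1 for d in draws if all(n in odd for n in d))
brute = Fraction(all_odd, len(draws))

print(chain, brute)  # 1/12 1/12
```

Each factor shrinks because every odd ball already drawn leaves one fewer odd ball and one fewer ball overall, so these events are not independent and the simple product rule for independent events would not apply here.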

We have now stated all the basic laws of probability that we will need in this book. There can be debate about which specific formulas should be called "laws." Kolmogorov's Axioms, the equal probability axiom, and the conditional probability rule (1-10) might be viewed as the fundamental laws, because all the others we presented are derived from them. As noted, however, people speak of the Law of Alternatives and the Joint Probability Law. But there is no point engaging in semantic quibbles, for we have work to do to gain greater familiarity with the principles we have discussed. That familiarity should grow as we employ the principles in practical exercises. Indeed, now that all the laws of probability are at hand, a strenuous series of such exercises is most timely.

Games of Chance

1.4

23

Some Probability Problems

We embark in the rest of this chapter on over two dozen probability problems, on subjects ranging from the amusing to the grim. Readers should focus their efforts here on comprehending the solutions proposed, and should not worry if they find themselves doubting that they would have reached the answers on their own. A formula like the Pythagorean theorem is directly applicable as soon as it is known; the laws of probability are rarely so generous. Often one must express the problem in just the right way before the laws become willing to reveal its solution. Although we said we would move beyond coins, dice, and urns in this book, we only partially do so in the first set of exercises, which concern games in which the outcome depends largely on sheer luck. Then we move on to some threats to health and safety, after which we consider several applications of probability in science, business, and government. While most of our examples ask us to answer specific questions, a few involve thinking about answers that were already provided by others.

Games of Chance

EXAMPLE 1.13

Ask Marilyn

Marilyn Vos Savant is listed in the Guinness Book of World Records as the person with the world's highest IQ (intelligence quotient). She has a weekly column in Parade magazine called "Ask Marilyn," which occasionally considers a probability problem that is posed by a reader. One of her readers raised the following question: "At work, we had a contest in which the prize was a new car. The six finalists could choose from among six keys, only one of which would start the car. In an order chosen at random, each person would select a key and try it. If the key didn't work, it would be discarded (and the next person would try). In a game like this, would you want to go first?" While Marilyn hardly needs help to answer the question, what is the answer?

Solution

Suppose that $W_i$ is the event that the ith person to select a key wins the car (i = 1, 2, ..., 6), and let $L_i$ be the event that the ith person does not win. Then, it seems clear that $P(W_1) = 1/6$: the first person to select is simply guessing which key out of the six starts the car. But what are $P(W_2)$ and the other probabilities of winning? It is useful to note that $P(W_2)$ is really $P(L_1 \cap W_2)$: the second player can only win if the first player loses and the second key works. But then we can use (1-15) to write:

$$P(L_1 \cap W_2) = P(L_1)\,P(W_2 \mid L_1)$$

Now, $P(L_1) = 1 - P(W_1) = 1 - 1/6 = 5/6$ (using equation (1-5) for complementary events). As for $P(W_2 \mid L_1)$, the fact that player 1 lost means that the winning key is one of the five he didn't choose. Conditioned on player 1's defeat, the chance that the second player will choose the winning key is 1/5. Therefore,

$$P(L_1 \cap W_2) = \left(\tfrac{5}{6}\right)\left(\tfrac{1}{5}\right) = \tfrac{1}{6}$$

The second player is just as likely to win as the first! What about the third player? For her, we can write:

$$P(L_1 \cap L_2 \cap W_3) = P(L_1)\,P(L_2 \mid L_1)\,P(W_3 \mid L_1 \cap L_2)$$

Once again, $P(L_1) = 5/6$. $P(L_2 \mid L_1) = 1 - P(W_2 \mid L_1) = 1 - 1/5 = 4/5$ (Right?). And $P(W_3 \mid L_1 \cap L_2) = 1/4$, because the third player in this circumstance must choose the winning key out of the four remaining keys. In consequence,

$$P(L_1 \cap L_2 \cap W_3) = \left(\tfrac{5}{6}\right)\left(\tfrac{4}{5}\right)\left(\tfrac{1}{4}\right) = \tfrac{1}{6}$$

Player 3 has the same chance of winning as the first two players. Continuing in this way, we find that all six players have the same probability of winning (namely, 1/6). There is no advantage to going first: Later players are less likely to get the opportunity to choose a key, but more likely to select the correct one if they do get to choose. The calculation makes clear that these two opposite effects exactly cancel out.
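The "continuing in this way" step for players four through six can be carried out mechanically. The following Python sketch repeats the joint-probability argument for every position; the function name `win_probabilities` is an assumption made for this illustration, and the code covers only the one-key-per-player game of Example 1.13.

```python
from fractions import Fraction

def win_probabilities(num_keys):
    """Return [P(W_1), ..., P(W_n)] when n players try n keys in turn.

    P(W_i) = P(L_1 n ... n L_{i-1}) * P(W_i | all earlier players lost).
    """
    probs = []
    all_earlier_lose = Fraction(1)  # P(L_1 n ... n L_{i-1}); equals 1 for i = 1
    for i in range(1, num_keys + 1):
        keys_left = num_keys - (i - 1)
        win_given_reached = Fraction(1, keys_left)
        probs.append(all_earlier_lose * win_given_reached)
        all_earlier_lose *= 1 - win_given_reached
    return probs

six_key_game = win_probabilities(6)
print(six_key_game)  # six entries, each Fraction(1, 6)
```

Each later player is less likely to get a turn but more likely to succeed on it, and the product of the two factors comes out the same at every position.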

EXERCISE 1.14

Suppose that in Example 1.13 there are six players but twelve keys, one of which will start the car. Players select one key at a time and, if none of the first six keys chosen starts the car, the player who went first gets to select a seventh. If the seventh key fails, the second player to choose a key gets to pick the eighth, etc. In this game, is there any advantage to going first?

EXAMPLE 1.14

Second Chance

Marilyn was also asked about another game of chance, namely: A large urn has an equal number of red balls and black balls. A player selects a ball at random and, if it is red, she wins at once. If the ball is black, she gets a second chance to pick: if she gets red this time, she wins; if she gets another black, she loses. If lots of people play this game, will the total number of red balls drawn differ from the total number of black balls?


Solution

In this game, the chance the first ball is red is 50%. If a second ball is drawn, the chance that it is red is again assumed to be 50%. (True: the fact that the first ball was black slightly changes the red/black mix among the remaining balls; we assume, however, that the urn is so large that this effect can be ignored.) To find the chance that a particular player will win, we note that there are two mutually exclusive ways of winning:

Red on the first ball                            50% probability
Black on the first ball and red on the second    25% probability (i.e., 1/2 * 1/2)

Thus, P(player wins) = .5 + .25 = .75. But that wasn't the question! We want to know about the color mix of the balls drawn by a large number of players. Let X be the number of red balls drawn by a particular player. If the player wins, she got exactly one red ball during the game; if she loses, she got none. Given a 75% chance of winning, the relative frequency interpretation of probability implies that we can write:

X = 0    25% of the time
X = 1    75% of the time

Out of every 100 players, therefore, approximately 75 * 1 + 25 * 0 = 75 red balls will be drawn. (Here we are using the relative frequency concept of probability.) That works out to an average of 3/4 red ball per player (75/100).

Now, let Z be the number of black balls drawn by a particular player. If the player wins on the first pick, then Z = 0. If the player wins on the second pick, then Z = 1 (first ball was black and the second red). And, if the player loses, then Z = 2 because both balls selected were black. As for the probabilities of these three outcomes, there is a 50% chance that Z = 0 (first ball red), a 25% chance that Z = 1 (black then red), and a 25% chance that Z = 2 (two black balls). In short:

Z = 0    50% of the time
Z = 1    25% of the time
Z = 2    25% of the time

Again invoking relative frequency, we approximate that, for every 100 players, the total number of black balls is approximately 50 * 0 + 25 * 1 + 25 * 2 = 75. We get an average of 75/100 = 3/4 black balls per player, the same answer as for red balls. Thus, over the long run, the game will yield an equal number of red balls and black balls. For a given player, the number of red balls and black balls need not be equal, but the differences cancel out over large numbers of players.

It turns out that there is an easier way of getting to the answer. In any particular game, the first ball drawn is equally likely to be red or black. But if the game goes on to a second drawing, then that ball is also equally likely to be red or black.


In short, every ball drawn, whether the first or second in a game, has an equal chance of being black or red. This 50/50 rule ensures that red balls and black balls are equally numerous in the long run.

Actually, Marilyn did not receive the problem in exactly the form stated above. Suppose that, in a given country, each couple that wants to have a child must state in advance whether it prefers a boy or a girl. If the first-born is of the desired sex, then the couple is not allowed to have any more children. If the first-born is of the other sex, then the couple can have one more child. The couple must stop after two births, even if it is disappointed the second time. Under these arrangements, Marilyn was asked, what proportion of children will be boys and what proportion girls? Assuming that each child is equally likely to be a boy or a girl (and neglecting the possibility of twins), the answer is 50/50 by the reasoning above.

This example suggests that what might at first seem like a mere game of chance is in reality a probability model for a real-life policy problem. Nor is this problem preposterous: it is based on a real situation, and some countries impose birth-control policies even more restrictive than the one just outlined.
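The per-player averages for the urn game in Example 1.14 can be reproduced with a few lines of Python. This is a minimal sketch under the example's own assumption that each draw is red with the same probability regardless of earlier draws (a very large urn); the function name `expected_balls` and its parameter `p_red` are introduced here for illustration only.

```python
from fractions import Fraction

def expected_balls(p_red):
    """Average (red, black) balls drawn per player in the two-chance game.

    Assumes every draw is red with probability p_red, independently.
    """
    p_black = 1 - p_red
    # X = 1 if the player wins (red first, or black then red), else 0.
    expected_red = 1 * (p_red + p_black * p_red)
    # Z = 0 (red first), 1 (black then red), or 2 (black then black).
    expected_black = 0 * p_red + 1 * (p_black * p_red) + 2 * (p_black * p_black)
    return expected_red, expected_black

red_per_player, black_per_player = expected_balls(Fraction(1, 2))
print(red_per_player, black_per_player)  # 3/4 3/4
```

With an even red/black mix, both averages equal 3/4 ball per player, matching the relative-frequency tally in the solution.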

EXERCISE 1.15

Suppose that the game in Example 1.14 is played with a large urn that contains twice as many red balls as black balls. If the game is played repeatedly for a long time, approximately what percentage of balls drawn will be red?

EXAMPLE 1.15

A Bridge Collapse

This problem concerns the card game Bridge, a game that involves four players (North, South, East, and West), each of whom is dealt thirteen cards. The players compete as two teams: North/South and East/West. In any given round, one team is trying to fulfill a contract while the other is trying to thwart such fulfillment. In a particular game that the author played (as South), he knew that the East/West team members together had a total of four of the 13 clubs in the deck (of 52 cards): 5, 8, 10, and Jack. But he didn't know which player held any particular one of the four clubs. From his point of view, the worst possibility would be a 4-0 split, in which East or West held all four clubs. What is the probability of this unfortunate split (assuming of course a random deal)?

[Figure: the 5, 8, 10, and Jack of clubs. Caption: Does East have all of these cards?]


Solution

Let C be the event that East held all four clubs, D the event that West did so, and E the event that either opponent had them all (i.e., $C \cup D$). The question posed was really a request for $P(E)$. Because C and D are mutually exclusive, we can write:

$$P(E) = P(C \cup D) = P(C) + P(D)$$

Under a random deal, $P(C)$ and $P(D)$ must be equal and therefore

$$P(E) = 2P(C)$$

Determining $P(C)$ therefore becomes our objective. We now define the events $A_1 \rightarrow A_4$ by:

$A_1$: East has the 5 of clubs
$A_2$: East has the 8 of clubs
$A_3$: East has the 10 of clubs
$A_4$: East has the Jack of clubs

Because event C requires all four $A_i$'s to occur, we have the equation:

$$P(C) = P(A_1 \cap A_2 \cap A_3 \cap A_4)$$

or

$$P(C) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)\,P(A_4 \mid A_1 \cap A_2 \cap A_3)$$

Thus, finding the four quantities on the right will lead to $P(C)$ and hence $P(E)$.

Once the problem is cast in an appropriate form, the four factors we need can easily be deduced. From the perspective of South, the East-West team in effect has 26 cards face down, 13 of them belonging to East and 13 to West. Among those 26 cards are the 5, 8, 10, and Jack of Clubs. The question is whether all four of these clubs are within the 13 cards that East possesses. Under the assumption of a random deal, all 26 of the concealed cards are equally likely to be the 5 of clubs. Because East holds 13 of these cards, the chance that his hand includes the 5 is simply 13 in 26. Or, in the notation of the problem, $P(A_1) = 13/26 = 1/2$.

To find $P(A_2 \mid A_1)$, imagine that, acting to end the mystery, East announces that he has the 5 of clubs and proceeds to turn it over. Then each of the 25 cards that remain face down has an equal chance of being the 8 of clubs. But given that East holds only 12 of these cards, the chance that he possesses the 8 as well as the 5 is only 12 in 25. We can therefore write: $P(A_2 \mid A_1) = 12/25 = .48$.

Extending the logic of the last paragraph, the chance East has the 10 of clubs having already revealed he has the 5 and the 8 would be only 11 in 24. (In other words, $P(A_3 \mid A_1 \cap A_2) = 11/24$.) And, generalizing the argument one last time, $P(A_4 \mid A_1 \cap A_2 \cap A_3)$ is 10/23. Performing the necessary multiplications, we get:


$$P(C) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)\,P(A_4 \mid A_1 \cap A_2 \cap A_3) = \left(\tfrac{1}{2}\right)\left(\tfrac{12}{25}\right)\left(\tfrac{11}{24}\right)\left(\tfrac{10}{23}\right) = .048$$

Thus (1-20) tells us that $P(E)$, the overall chance of a 4-0 split, is 2 * .048 = .096, or roughly 1 in 10. In other words, the author did suffer a bad break, but not quite the amazing reversal of fate he so eagerly blamed for his loss.

Before leaving this problem, we might note that $P(A_2 \mid A_1)$ is 12/25, while $P(A_2)$ is .5. The discrepancy, which violates the assumption that $A_1$ and $A_2$ are independent, means that East's having the lowest of the four clubs slightly lessens the chance that he also has the second lowest. The intuitive explanation for the change is that, the more of East's hand that is already accounted for, the fewer are his opportunities for acquiring other particular cards.
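The four-factor product for the 4-0 split can be checked in Python, along with a counting argument that is not used in the text: choose East's 13 cards from the 26 hidden cards and count the hands that contain all four clubs. The variable names are assumptions of this sketch, and the counting cross-check is an added verification rather than the author's method.

```python
from fractions import Fraction
from math import comb

# Sequential argument from the solution: East must hold the 5, then the 8,
# then the 10, then the Jack of clubs.
p_east_all = (Fraction(13, 26) * Fraction(12, 25)
              * Fraction(11, 24) * Fraction(10, 23))

# Counting cross-check: East's hand is 13 of the 26 hidden cards. If the four
# clubs are in it, the other 9 cards come from the remaining 22.
p_east_all_counting = Fraction(comb(22, 9), comb(26, 13))

# Either opponent may hold all four clubs, and the two cases are exclusive.
p_split = 2 * p_east_all

print(p_east_all, round(float(p_east_all), 3))  # 11/230 0.048
print(round(float(p_split), 3))                 # 0.096
```

Both routes give 11/230, so the order in which the four clubs are considered does not matter.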

EXERCISE 1.16

Suppose that East/West had a total of five clubs between them in Example 1.15. What is the probability of a 5-0 split between the two players? Does the answer depend on which five clubs the two players have? (No, it doesn't: do you see why?)

Even when a probability has been computed correctly, it is not always clear how to interpret it. As the next example suggests, an event is sometimes defined in a way that, while technically correct, nonetheless fails to capture the real reason that people took note of the occurrence.

EXAMPLE 1.16

A Coincidence in Connecticut

On both January 8 and January 9, 1998, the number 828 came up in the three-digit Connecticut lottery (with outcomes from 000 to 999). Time magazine shortly thereafter informed its readers that a "one in a million" event had taken place.

(i) In what sense was the Time calculation correct?
(ii) Why might the calculation nonetheless be misleading?

Solution

(i) Time was taking note of the facts that the probability of getting 828 was 1/1000 on January 8 and the same on January 9 and that, assuming the lottery is genuinely random (what a calamity if it is not!), the outcomes on the two days are independent. Thus, one can use (1-12) to write:

$$P(\text{828 on both 1/8/98 and 1/9/98}) = P(\text{828 on 1/8}) \times P(\text{828 on 1/9}) = (1/1000)^2 = \text{1 in 1 million}$$

Thus, the prior probability of the event that occurred was indeed one in a million.

(ii) And yet. Wouldn't Time have been just as excited if (say) the number 453 had come up on 1/8 and 1/9, or if 828 had come up on 1/11 and 1/12? What really attracted Time's interest was that the same number had been selected two days in a row in Connecticut. Perhaps we should focus on that outcome in assessing whether Time had uncovered a rare event.

Let's consider a calendar year, and seek the probability of at least one "back to back" selection during the year. Sometimes the easiest path to a probability is to find the chance that something doesn't happen (i.e., the complement of the event of interest) and then to subtract that number from one. Hence, we define:

C: at least once during a given calendar year, two consecutive days yield the same lottery number in Connecticut (which takes place daily)

$C^c$: the complement of event C (i.e., C does not occur)

We also define the $E_i$'s by:

$E_i$: the number that comes up on day i (i = 2, ..., 365) is different from the one that arises on day i - 1

(We start at i = 2 because we can't have two consecutive days in the same year before January 2.) We seek $P(C)$, and will find it by first finding $P(C^c)$. We have:

$$P(C^c) = P(E_2 \cap E_3 \cap \cdots \cap E_{365}) = P(E_2)\,P(E_3 \mid E_2)\,P(E_4 \mid E_2 \cap E_3) \cdots P(E_{365} \mid E_2 \cap \cdots \cap E_{364})$$

Each of the probabilities on the right must be 999/1000. (Note that the various $E_i$'s are independent: Whether we get the same number on August 7 and August 8 is unrelated to whether August 7 matched August 6.) Thus:

$$P(C^c) = (999/1000)^{364} \approx .69, \text{ and hence } P(C) \approx 1 - .69 = .31.$$
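The complement calculation is easy to reproduce numerically. The sketch below assumes, as the solution does, one drawing per day, 1000 equally likely outcomes, and independence between days; the function name `prob_back_to_back` and its parameters are hypothetical names chosen for this illustration.

```python
def prob_back_to_back(num_days=365, num_outcomes=1000):
    """Chance that some pair of consecutive daily drawings matches in a year.

    There are num_days - 1 consecutive pairs, and each pair fails to match
    with probability (num_outcomes - 1) / num_outcomes.
    """
    p_no_match_all_year = ((num_outcomes - 1) / num_outcomes) ** (num_days - 1)
    return 1 - p_no_match_all_year

p_repeat = prob_back_to_back()
print(round(1 - p_repeat, 2), round(p_repeat, 2))  # 0.69 0.31
```

Changing `num_days` or `num_outcomes` shows how quickly a "rare" coincidence becomes likely once there are many opportunities for it to occur.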

In other words, Time's "one in a million" event has roughly a 1-in-3 chance of coming up in Connecticut sometime during a year. And that is not all. As a national publication, Time would presumably be just as enthusiastic if a comparable event arose in the New Jersey, California, or any other state lottery. There are 39 three-digit lotteries in the United States: if each has a 1-in-3 chance per year of experiencing the same number on two consecutive days, then roughly (1/3) * 39 = 13 such coincidences should occur in the US every year. To be consistent, therefore, Time would have to report an astounding coincidence in some US lottery every few weeks. The real issue is that "one in a million" events are not really infrequent when there are millions of ways they can happen.

Such exciting reports have a way of resurfacing. A nationally-distributed press report on 1/22/09 was headlined:


Nebraska Lottery Draws Same Number Twice in a Row The Odds of such an Odd Occurrence? One in a Million

What had happened was that the number 196 had come up on 1/19/09 and 1/20/09. Note that, in this case, the event was explicitly described as getting the same number twice in a row, not as getting 196 twice in a row. Unlike Time's calculation, this one was incorrect even on its own terms: in a three-digit lottery, the chance that a given drawing repeats the previous day's number is 1 in 1,000, not one in a million.

EXERCISE 1.17

Texas has a three-digit lottery (Pick 3) with drawings twice a day except on Sunday (when there are none). Drawings take place on all holidays that fall between Monday and Saturday. In the year 2015-in which January 1 falls on a Thursday-what is the probability that there will be at least one day in which the same three-digit number arises on both drawings?

EXAMPLE 1.17

Minerva Plays Craps

At many casinos, the dice game Craps is prominently played. One variant of Craps goes as follows:

• The game starts with the toss of two six-sided dice, and the outcome is the sum of the two numbers that come up. If the outcome is 2, 3, or 12, then the player (who we will assume is Minerva) loses at once. If a 7 or 11 comes up, she wins at once. If the outcome is 4, 5, 6, 8, 9, or 10, then the game continues to a second toss of the dice.
• If the game goes on to a second toss, it continues for as many tosses as needed until either a 4 or a 7 comes up (when it ends). Should a 4 eventually come up, Minerva wins; should a 7 come up, she loses.

In other words, a 7 is great news on the first toss but, after that, it is deadly. Under these rules, we consider a few questions:

(i) What is the probability that Minerva wins on the first toss?
(ii) What is the probability that the game continues beyond the first toss?
(iii) What is the probability that Minerva wins immediately on the second toss?
(iv) Assuming that the game does not end on the first toss, what is the probability that Minerva will eventually win?
(v) Before the game starts, what is the probability that Minerva will win (sooner or later)?

Solutions

We do well to start by working out the probabilities of various outcomes on a toss of two fair dice. The sum of the two numbers that arise can range from 2 to 12. As before, we assume for simplicity that one of the dice is red and the other black, and let i be the red number and j be the black number. The sample space consists of the 36 equally likely (i,j) pairs that could come up, each assigned probability 1/36 under the "equally likely" principle (1-9). As for the sum i + j, we have the following chart:

Sum   Ways That Sum Can Arise                     Probability of Getting That Sum
2     (1,1)                                       1/36
3     (1,2), (2,1)                                1/36 + 1/36 = 1/18
4     (1,3), (2,2), (3,1)                         3/36 = 1/12
5     (1,4), (2,3), (3,2), (4,1)                  4/36 = 1/9
6     (1,5), (2,4), (3,3), (4,2), (5,1)           5/36
7     (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)    6/36 = 1/6
8     (2,6), (3,5), (4,4), (5,3), (6,2)           5/36
9     (3,6), (4,5), (5,4), (6,3)                  4/36 = 1/9
10    (4,6), (5,5), (6,4)                         3/36 = 1/12
11    (6,5), (5,6)                                2/36 = 1/18
12    (6,6)                                       1/36

Note that we have added the probabilities for the mutually exclusive ways of reaching each particular sum.

(i) To find P(Minerva wins on first toss), we need merely add the probabilities of the (mutually exclusive) outcomes 7 and 11. The result is

$$P(7 \cup 11) = \frac{1}{6} + \frac{1}{18} = \frac{2}{9}$$

(ii) The chance that the game continues beyond the first toss is the chance of a 4, 5, 6, 8, 9, or 10 on that toss. From the chart above, we see that:

$$P(4 \cup 5 \cup 6 \cup 8 \cup 9 \cup 10) = \frac{1}{12} + \frac{1}{9} + \frac{5}{36} + \frac{5}{36} + \frac{1}{9} + \frac{1}{12} = \frac{24}{36} = \frac{2}{3}$$

(iii) For Minerva to win at once on the second toss, two things must happen: the game must continue past the first toss, and then the second toss must result in a 4. Noting that different tosses are independent in their outcomes, we can write:

$$P(\text{Minerva wins on 2nd toss}) = P(\text{no end on first toss}) \times P(\text{4 on 2nd toss})$$

We know from (ii) that the first factor on the right is 2/3, while the second is 1/12 from the table above. Therefore,

$$P(\text{Minerva wins on 2nd toss}) = \left(\tfrac{2}{3}\right)\left(\tfrac{1}{12}\right) = \tfrac{1}{18}$$
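The chart of dice sums and the answers to (i) through (iii) can be regenerated by brute-force enumeration of the 36 equally likely (i, j) pairs. The Python sketch below does so with exact fractions; the variable names (`p_sum`, `p_win_first`, and so on) are assumptions made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (red, black) pairs produce each sum.
ways = Counter(i + j for i, j in product(range(1, 7), repeat=2))
p_sum = {total: Fraction(count, 36) for total, count in ways.items()}

# (i) Minerva wins at once on a 7 or an 11.
p_win_first = p_sum[7] + p_sum[11]

# (ii) The game continues on a 4, 5, 6, 8, 9, or 10.
p_continue = sum(p_sum[s] for s in (4, 5, 6, 8, 9, 10))

# (iii) The game continues and the second toss is a 4 (independent tosses).
p_win_second = p_continue * p_sum[4]

print(p_win_first, p_continue, p_win_second)  # 2/9 2/3 1/18
```

Enumeration of this kind is a convenient check whenever a sample space is small enough to list in full.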

(iv) If the game continues past the first toss, it might go on for several further tosses but, eventually, it must end. (We cannot forever avoid getting a 4 or a 7.) The tosses that precede the final round are irrelevant to whether Minerva wins: it does not matter if we get (say) 8, 2, 5, 8, and 9 before the toss that ends the game. We can therefore focus on the final round (assuming it was not the first), and ask the question: when a game-ending toss arises, what is the probability that it will be a 4 rather than a 7? We can work out the answer assuming that the game ends on the second toss, knowing that the chance the game ends in Minerva's victory would be the same if the game ended on the third toss or any subsequent toss. We can define events E and M by:

E: the second toss ends the game
M: the second toss gives Minerva a victory.

Then the quantity we seek is $P(M \mid E)$. From Bayes' Theorem (1-15), we can write:

$$P(M \mid E) = \frac{P(M)\,P(E \mid M)}{P(E)}$$

From (iii), we already know that $P(M) = 1/18$. As for $P(E)$, we have:

$$P(E) = P(F) \times P(\text{game ends on second toss} \mid \text{second toss occurs})$$

where

F: game doesn't end on first toss

Given (ii), we have $P(F) = 2/3$. For the conditional probability just above, we need the probability of a 4 or a 7 on the second toss, which is $\frac{1}{12} + \frac{1}{6} = \frac{1}{4}$. Therefore, $P(E) = \frac{2}{3} \times \frac{1}{4} = \frac{1}{6}$. Also, $P(E \mid M) = 1$. In consequence:

$$P(M \mid E) = \frac{\left(\frac{1}{18}\right)(1)}{\frac{1}{6}} = \frac{1}{3}$$

(v) For Minerva's overall chance of winning, we can write:

$$P(\text{Minerva wins}) = P(\text{Minerva wins on 1st toss}) + P(\text{Minerva wins later})$$

The first term on the right is simply the answer to (i), namely, 2/9. For the second term on the right, we combine the results of (ii) and (iv) to get:

$$P(\text{Minerva wins later}) = P(\text{go past first toss}) \times P(\text{Minerva wins} \mid \text{past first toss})$$

Thus:

$$P(\text{Minerva wins later}) = \left(\tfrac{2}{3}\right)\left(\tfrac{1}{3}\right) = \tfrac{2}{9}$$

Adding up:

$$P(\text{Minerva wins}) = \tfrac{2}{9} + \tfrac{2}{9} = \tfrac{4}{9}$$

Interestingly, Minerva's 44% chance of winning (4/9) is equally divided between a 22% chance of winning on the first toss and a 22% chance of winning later. That circumstance is a coincidence.
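Parts (iv) and (v) can be verified in Python in two ways: by the "final toss" shortcut used in the solution, and by summing over the toss on which the deciding 4 arrives. The second route is a cross-check added here, not a step taken in the text, and the variable names are assumptions of this sketch.

```python
from fractions import Fraction

p4, p7 = Fraction(3, 36), Fraction(6, 36)  # chance of a 4 or a 7 on any toss
p_win_first = Fraction(2, 9)               # answer to (i)
p_continue = Fraction(2, 3)                # answer to (ii)

# (iv) Shortcut: given that a toss ends the game, it is a 4 rather than a 7.
p_win_given_continue = p4 / (p4 + p7)

# Cross-check: the 4 arrives after k non-ending tosses, for k = 0, 1, 2, ...
# The series is truncated at 200 terms, so it is only approximately 1/3.
p_neither = 1 - p4 - p7
series = sum(p_neither ** k * p4 for k in range(200))

# (v) Overall chance of winning.
p_win = p_win_first + p_continue * p_win_given_continue

print(p_win_given_continue, p_win)  # 1/3 4/9
print(float(series))                # very close to 0.3333...
```

The agreement between the shortcut and the truncated series illustrates why the tosses before the deciding one can be ignored.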


EXERCISE 1.18

Suppose that Minerva plays a game of Craps that is the same as that in Example 1.17 with one exception: If the game goes to a second toss, she wins if a 5 (rather than a 4) comes up before the next 7. How would the answers in Example 1.17 change (if at all) under this version of Craps?

EXAMPLE 1.18

Miller's Dilemma

While a student at MIT's Sloan School of Management in the late 1970's, Mr. Steve Miller faced a problem at the card table. The game was that variant of seven-card poker in which four of a player's cards are visible to all and the other three are concealed. While it was not apparent from his open cards, Steve had three Kings and a pair of 6's, which in poker parlance is a King-high full house. Under normal circumstances, such a strong hand would almost surely be victorious. But the circumstances were not altogether normal. One of the other players was showing three aces and an 8 (all of different suits; see Figure 1-2 below). While those cards would not in themselves beat a full house, four aces or an ace-high full house (three aces and any pair of other cards, like three aces and two 5's) would do so. Hence if the player's three hidden cards contained the fourth ace, another 8, or any pair, he would have the stronger hand. Anything that improved the hand would lead to victory. In the betting that followed the deal, all players dropped out quickly except Steve and his three-ace opponent. Soon the bet was very large. The author (who was present) wondered whether, given what Steve knew, his confidence that he could win was really rational.

FIGURE 1-2

What Steve Miller saw from his opponent's hand

(The cards above come courtesy of the United States Playing Card Company.)

Under reasonable assumptions, approximately what is the probability that Steve will win the round?

Solution

Literally calculating the probability of Steve's victory would be very difficult: it would depend on all of the cards exposed so far (including those of players who were initially part of the round but became discouraged and dropped out). It would also depend on the attitude of Steve's opponent towards bluffing. But when a problem seems intractable, it is often useful to consider a simplified version of it. So long as that version does not discard the critical details, it can yield a good approximate answer, which is often all that we really need.

In that spirit, we estimate Steve's chance of winning given an amnesia assumption, under which we assess his opponent's prospects having "forgotten" everything about Steve's hand and about the hands of defeated players. We also ignore such phenomena as the opponent's past betting behavior. All that we remember are the three Aces and an 8 that the opponent is showing. This assumption is artificial, of course; given such amnesia, however, the adversary's three concealed cards can be treated as constituting a random sample from the remaining 48 cards. Steve would win only if these three cards all fail to improve the existing hand. In other words, we strive to answer the question:

Given three aces and an 8 (all four cards of different suits) and three random cards from the other 48 in the deck of 52 cards, what is the probability of getting a hand that would beat a King-high full house?

The startling degree of forgetfulness embodied in the amnesia assumption yields a simpler problem, yet even the amnesia calculation can be very complicated unless one goes about it in an advantageous way. One fruitful approach assumes that the opponent looks at his last three cards one at a time, pausing after each revelation to consider whether his hand has been strengthened. (Clearly, the manner in which he inspects his hand has nothing to do with his chance of winning.) We define the events $C_1$, $C_2$, and $C_3$ by:

$C_1$: the fifth card fails to improve the hand over the best possible with the first four (i.e., over three aces)
$C_2$: the sixth card fails to improve the hand over the best possible with the first five
$C_3$: the seventh card fails to improve over the first six.

Note that anything that improves the opponent's existing poker hand (which already contains three aces) will lead to his victory: it will either yield the final ace or some pair of cards that produces an ace-high full house. Thus, if B is the event that Steve's hand is the stronger of the two, its probability under (1-17) follows:

$$P(B) = P(C_1 \cap C_2 \cap C_3) = P(C_1)\,P(C_2 \mid C_1)\,P(C_3 \mid C_1 \cap C_2)$$

Steve can only win, after all, if his opponent is disappointed three times in a row. To determine $P(C_1)$ under amnesia, we note that the fifth card immediately helps only if it is either the fourth ace or one of the three remaining 8's. Thus, only four of the 48 cards remaining in the deck are beneficial. Hence the chance of immediate improvement is 4/48, and $P(C_1)$, the chance of the complementary event of no improvement, must be $1 - \frac{4}{48} = \frac{44}{48}$.

To find $P(C_2 \mid C_1)$, we observe that, if $C_1$ has occurred and the opponent has not improved his hand, then the first five cards consist of three aces, one 8, and one card with some other denomination X. Thus, of the 47 remaining cards, there are seven that would be immediately helpful: one ace, three 8's, and three X's. The chance of improvement is therefore 7/47, and that of no improvement, $P(C_2 \mid C_1)$, is $1 - \frac{7}{47} = \frac{40}{47}$. Extending this argument to the seventh card (for which ten of the 46 remaining cards would help: one ace, three 8's, three X's, and three cards matching the sixth card), we see that $P(C_3 \mid C_1 \cap C_2) = 1 - \frac{10}{46} = \frac{36}{46}$. Collecting these three probabilities and combining them via (1-17), we learn that:

$$P(B) = \left(\tfrac{44}{48}\right)\left(\tfrac{40}{47}\right)\left(\tfrac{36}{46}\right) \approx 0.61$$

This means that Steve's chance of winning is over 3 in 5 under amnesia or, to put it another way, that Steve is roughly one and a half times as likely to win as to lose (!). Given the many ways that the opponent could advance above three aces, this assessment of Steve's prospects seems surprisingly upbeat. But it reflects a cold fact: most of the opponent's potential improvements are remote possibilities, so the cumulative chance that he will gain falls well short of 50%.

Two questions might be welling up now in the reader's mind. The first might be how this "amnesia" result would change if we considered the full information that Steve possessed about the cards that were dealt, including his own seven cards. There is an exercise on the subject (Exercise 1.75) but the brief answer is "not very much." Because Steve held a King-high full house, for example, his adversary's hidden cards could not include two Kings. But because Steve lacked any Queens, the probability was greater that his opponent held a pair of them. In other words, the particulars of Steve's hand made some possibilities that would hurt him less likely, but others more so. What happens is that the details omitted by the amnesia analysis tend to cancel in their effects, and the answer remains reliable even beyond the conditions of its calculation.

The other question is closer to earth: what actually happened in the game? Despite the analysis just performed, Steve lost: his opponent possessed the fourth ace (!). This outcome was especially sad because, of the three ways the opponent could improve his hand (a fourth Ace, another 8, another pair), getting the fourth ace was by far the least likely. Steve had proceeded quite sensibly given what he knew, but even the best decision under imperfect information will sometimes be the wrong one. Readers will be pleased to know, however, that Steve has recovered from his loss and now lives happily in Los Angeles. And he is now head of the US Poker Players Alliance (really!!).
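The amnesia calculation can be confirmed in Python. The first computation repeats the card-by-card product from the solution; the second is a counting cross-check that is not in the text: under amnesia the opponent fails to improve exactly when his three hidden cards contain no ace, no 8, and no pair, that is, three different denominations drawn from the 11 denominations other than ace and 8. The variable names are assumptions of this sketch.

```python
from fractions import Fraction
from math import comb

# Card-by-card argument: no improvement on the fifth, sixth, and seventh cards.
p_steve_wins = Fraction(44, 48) * Fraction(40, 47) * Fraction(36, 46)

# Counting cross-check: pick 3 distinct denominations out of the 11 that are
# neither ace nor 8, with any of 4 suits for each card, out of all 3-card
# subsets of the 48 unseen cards.
p_steve_wins_counting = Fraction(comb(11, 3) * 4 ** 3, comb(48, 3))

print(p_steve_wins, round(float(p_steve_wins), 2))  # 220/361... see below
print(p_steve_wins == p_steve_wins_counting)        # True
```

The printed fraction is shown in lowest terms by Python; its decimal value, about 0.61, is the figure quoted in the solution.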

EXERCISE 1.19

In seven-card poker, a flush arises if at least five of the cards are of the same suit. (An example of a flush would be five hearts, one club, and one spade.) Suppose that the first four cards that Minerva gets at poker are all diamonds, and that she has not yet seen the other three. (She also knows nothing about what cards the other players have.) What is the probability that she will achieve a flush? (There are four suits in the deck of 52 cards.)

EXAMPLE 1.19

Let's Make a Deal

Passionate debates broke out in college classrooms a few years ago about a simple-sounding question from a television quiz show named "Let's Make a Deal." A contestant would be asked to guess which one of three closed doors had a big prize behind it. (The others had nothing behind them.) The emcee (Monty Hall) would then open one door that the contestant had not chosen, which was never the door that concealed the prize. Then the contestant was asked whether she wanted to switch her guess to the remaining door that the emcee hadn't opened. The emcee knew full well which door concealed the prize, and he deliberately opened a losing door that the contestant hadn't selected. If the contestant's initial guess was correct, the emcee chose randomly between the two remaining doors; if the initial guess was wrong, he was forced to open the one remaining door with no prize behind it.

The question: Should the contestant switch or not? For example, if the contestant guessed A and the emcee opened B, should the contestant change her guess to C?

Solution: The issue is clearer if we frame the dilemma correctly as a probability problem. We define events G and F by:

G: the contestant's first guess was correct
F: the contestant would benefit by switching.

What we seek is P(F). We can readily break event F down into the two mutually exclusive ways it can arise under the Law of Alternatives (1-7):

$$P(F) = P(G \cap F) + P(G' \cap F) \tag{1-21}$$

where G' is the complement of G (i.e., the first guess was wrong). Equation (1-21) can be rewritten under (1-15) as:

$$P(F) = P(G)\,P(F \mid G) + P(G')\,P(F \mid G') \tag{1-22}$$

If the prize was placed randomly behind one of the three doors and the contestant had no inside information, then P(G) = 1/3 and P(G') = 2/3. To find the two conditional probabilities $P(F \mid G)$ and $P(F \mid G')$, we assume for simplicity that the contestant chose door A (out of A, B, and C) and the emcee opened B. If the contestant's choice was wrong, the emcee had no option but to open B (because C was the right answer). Thus, switching to C would definitely bring benefits to the contestant, and hence $P(F \mid G') = 1$. If the choice of A was correct, however, then switching would be exactly the wrong decision, meaning that $P(F \mid G) = 0$. Putting it all together via (1-22), we see that:

$$P(F) = (1/3)(0) + (2/3)(1) = 2/3$$

In short, the contestant would do well to take the hint and switch, thereby increasing her chances of winning the prize from the initial 1/3 to 2/3.

The Debate

What is controversial about this analysis? Well, really nothing, but many people reasoned that, if the contestant chose A and the emcee opened B, then the prize could be behind either A or C. Treating those two possibilities as equally likely, these people saw no point in changing the original guess. Ah, but an important distinction must be borne in mind. Had door B opened accidentally at random, then it would be fair to say that A and C, which initially had an equal chance of hiding the prize, continue to have an equal chance. But the emcee is not choosing a door at random; he is operating under a constraint which means that, in the 2/3 of cases when the contestant's first guess was wrong, the emcee's selection provides a sure-fire sign that the prize is behind the remaining door. To put it another way, when the contestant's first guess was wrong, the conditional probability is 100% that switching will lead to the right answer. Following the deterministic rule "always switch" will, therefore, be correct 2/3 of the time. Framing the issue more generally, the emcee's choice provides new information that changes the conditional probabilities about doors A and C and that, when properly incorporated into a probabilistic assessment, argues for a switch.
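The comparison between staying and switching can also be checked by brute-force enumeration. The sketch below is only an illustration, not part of the original analysis. It assumes that the prize location and the first guess are each uniform over the three doors and independent of one another, and the names `DOORS` and `win_probabilities` are invented for this example. The emcee's rule is applied as described in the problem statement.

```python
from fractions import Fraction
from itertools import product

DOORS = ("A", "B", "C")  # hypothetical labels, matching the example in the text


def win_probabilities():
    """Return (P(win if staying), P(win if switching)) as exact fractions."""
    stay = Fraction(0)
    switch = Fraction(0)
    for prize, guess in product(DOORS, DOORS):
        # Assumption: prize and guess are uniform and independent (9 equally likely cases).
        p_case = Fraction(1, 9)
        # The emcee never opens the guessed door or the prize door.
        openable = [d for d in DOORS if d != guess and d != prize]
        for opened in openable:
            # He picks at random when he has two losing doors to choose from.
            p_open = Fraction(1, len(openable))
            remaining = next(d for d in DOORS if d != guess and d != opened)
            if guess == prize:
                stay += p_case * p_open
            if remaining == prize:
                switch += p_case * p_open
    return stay, switch


if __name__ == "__main__":
    stay, switch = win_probabilities()
    print("stay:", stay, "switch:", switch)
```

Because the enumeration uses exact fractions, the two returned values can be compared directly with P(G) and P(F) in the solution.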

EXERCISE 1.20

Imagine a version of "Let's Make a Deal" with four doors, in which the emcee will always open two of the three doors that the contestant didn't choose and which have nothing behind them. Would the contestant do well to switch in this game? Is the case for switching here stronger than in the actual game with three doors?

EXAMPLE 1.20

The Birthday Problem

One of the classic probability problems asks: if M people are assembled in a room, what is the chance no two of them share the same birthday? Approximately what is the answer?

Solution

This problem was the subject of the bet between Mendel and Minerva that started this chapter. The calculation starts off in a routine enough way but it ends on a note of great surprise.


To solve this problem, one must make some assumptions. Here we make the two standard ones:

• There is no prior relationship among the various birthdays (i.e., we are not (say) at a convention of twins).
• Birthdays are uniformly distributed over the year. (This assumption means that, neglecting Leap Year, the chance of being born on any particular date is 1 in 365.)

Suppose that one went around the room asking each of the people his or her birthday. Let $A_i$ be the event that the ith person questioned mentions a date that has not come up before, and let C be the event that all M people have different birthdays. Then

$$P(C) = P(A_1 \cap A_2 \cap \cdots \cap A_M) = P(A_1)\,P(A_2 \mid A_1) \cdots P(A_M \mid A_1 \cap A_2 \cap \cdots \cap A_{M-1}) \tag{1-23}$$

The quantities on the right-hand side of (1-23) can readily be deduced. Obviously, $P(A_1) = 1$, because the first person asked cannot repeat a date that came up previously. The second respondent will announce a new date unless he was born the same day as the first; under uniformity, the chance of such overlap is 1/365, and thus $P(A_2 \mid A_1) = 1 - 1/365 = 364/365$. To find $P(A_3 \mid A_1 \cap A_2)$, we note that if $A_1$ and $A_2$ have both occurred, there are two dates the third person must avoid to prevent repetition. The chance of doing so is 363/365. To find $P(A_k \mid A_1 \cap A_2 \cap \cdots \cap A_{k-1})$, we note that the kth person must avoid the birthdays of the previous k - 1, which are all assumed to be different. We therefore reach the formula:

$$P(A_k \mid A_1 \cap A_2 \cap \cdots \cap A_{k-1}) = \frac{365 - (k - 1)}{365}$$

We can therefore express P(C) in (1-23) by:

$$P(C) = 1 \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365 - (M - 1)}{365} = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{366 - M}{365}$$

or

$$P(C) = \frac{365 \cdot 364 \cdot 363 \cdots (366 - M)}{365^M} \tag{1-24}$$

This is an opportune time to remind readers about the concept of factorials. For any positive integer k, the quantity written as k! and spoken as k-factorial is the product of all the integers from 1 to k. Thus 3! = 6 and 4! = 24. With this definition, the quotient j!/k! for j > k (which includes all whole numbers from 1 to k in both numerator and denominator) is simply the product of all the integers between k + 1 and j. Bearing this last statement in mind, we can rewrite (1-24) as:

$$P(C) = \frac{365!/(365 - M)!}{365^M} \tag{1-25}$$


From the viewpoint of calculation, (1-25) is often friendlier than it seems, for there are superb simple approximations for k! when k is very large. The birthday problem attains its zest when (1-25) is evaluated at particular values of M. When M is as small as 25, the probability of no shared birthdays is only .43. When M = 60, the probability drops to 1 in 170. And for 100 individuals, the chance that all have different birthdays drops to 1 in 3 million. Most people find such statistics astounding. When M = 60, there are six times as many dates as individuals, leaving plenty of room for the people to spread out their birthdays. (Indeed, each one could take up nearly a week.) But while birthday overlap is thus wholly unnecessary, the calculation tells us it is not just likely but almost inevitable. Nor is the outcome some peculiar consequence of the uniform-birthdays assumption. One can show rigorously that the chance of birthday overlap is actually lower under this hypothesis than any other. The reason is that an alternate assumption must involve clumping of birthdays at certain times of the year, a circumstance that would make duplication even more probable.
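Formulas (1-24) and (1-25) are easy to evaluate directly. The following is a minimal sketch under the two stated assumptions (independent, uniform birthdays over 365 days); the function names and the `days` parameter are invented for this illustration. The first function multiplies the factors of (1-24) one at a time, and the second uses the factorial form (1-25) with Python's exact integers.

```python
import math


def prob_all_different(m, days=365):
    """P(C) via (1-24): product of (days - k) / days for k = 0, ..., m - 1."""
    if m > days:
        return 0.0  # more people than dates forces an overlap
    p = 1.0
    for k in range(m):
        p *= (days - k) / days
    return p


def prob_all_different_factorial(m, days=365):
    """P(C) via (1-25): days! / ((days - m)! * days**m), using exact integers."""
    if m > days:
        return 0.0
    return math.factorial(days) / (math.factorial(days - m) * days ** m)


if __name__ == "__main__":
    for m in (25, 60, 100):
        p = prob_all_different(m)
        print(f"M = {m}: P(no shared birthday) = {p:.3g} (about 1 in {1 / p:,.0f})")
```

The loop version avoids the very large intermediate numbers in the factorial form, although Python's arbitrary-precision integers handle both without difficulty.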

A Post Mortem

As we think about the problem a bit more, however, the answer starts to appear less preposterous. Suppose that someone enters a room where 59 others are present (i.e., M = 60) and wonders whether any were born on the same day as she. Because, collectively, their birthdays cover roughly a sixth of the year, the chance that her birthday is shared by someone else is something like 1 in 6. Likewise, each of the 60 people in the room has roughly a 1 in 6 chance of sharing a birthday with someone. But that circumstance reminds us of an analogous situation. One in six is also the chance that a fair die comes up three. If such a die is cast 60 times, the probability that it never comes up three is extremely low. (We would expect roughly 60 * (1/6) = 10 threes to arise.) In the same way, the chance that all 60 people avoid the 1 in 6 risk of birthday overlap should also be very low. The comparison just made is not perfect: different tosses of a die are independent, while the birthday overlaps of different people are not (e.g., if one person shares his birthday with 12 others, each of them must be in the same situation). But the similarity with the die-toss example is more powerful than the difference. The point is that for each of 60 people to avoid a "collision" with all 59 of the others, thousands of potential collisions must be avoided. Doing so takes a lot of luck. The actual answer is telling us that, somewhere along the line, this luck is likely to run out.

Now do you see why Minerva offered Mendel a 100-to-1 bet? The chance that Minerva would win was 99.4% (169/170), which was 169 times as great as the chance that Mendel would win. If they went from party to party with 60 guests, making the same bet for each party, Minerva would win 169 times out of 170, meaning she would take in $169 for every $100 she paid out. Not bad, especially if the parties were fun.
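The "many potential collisions" intuition can be put into rough numbers. The sketch below is an illustration only: it counts the distinct pairs among M people and then, as a simplifying assumption that is not part of the text's exact calculation, treats every pair as if it avoided a match independently with probability 364/365. It also evaluates the die analogy. All function names are invented for this example.

```python
def pair_count(m):
    """Number of distinct pairs of people, i.e. potential birthday collisions."""
    return m * (m - 1) // 2


def approx_prob_no_overlap(m, days=365):
    """Rough estimate: pretend the pairs are independent (an assumption, not exact)."""
    return (1 - 1 / days) ** pair_count(m)


def prob_die_never_shows_face(tosses, sides=6):
    """Chance that one given face never appears in independent tosses of a fair die."""
    return ((sides - 1) / sides) ** tosses


if __name__ == "__main__":
    print("pairs among 60 people:", pair_count(60))
    print("approximate P(no overlap):", approx_prob_no_overlap(60))
    print("P(no three in 60 tosses):", prob_die_never_shows_face(60))
```

The pairwise figure is of the same order of magnitude as the exact 1-in-170 result, which suggests that the sheer number of pairs, rather than any subtlety of the dependence between them, drives the answer.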


EXERCISE 1.21


In a group of 80 people, what is the chance of no birthday overlap?

Health and Safety

Changing pace quite radically, we now move from amusing games to more serious questions about personal health and safety. In each case, we work with actual data.

EXAMPLE 1.21

Length of Life

According to the US National Center for Health Statistics, a baby born in the US in 2004 would have an 88.0% chance of living until age 60 and a 53.9% chance of living until age 80. Based on these data:

• If a baby born in 2004 actually lives until age 60 (in 2064), what is the probability that the baby will live until age 80?
• What is the probability that an American baby born in 2004 will have a lifespan between 60 and 80 years?

Solution

To answer the first question, we note that it really is an exercise in conditional probability. Suppose we define events A and B by:

A: the baby lives until age 60
B: the baby lives until age 80

The question is asking for $P(B \mid A)$. As always, we can cite (1-10) and write:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

P(A) = .88 given the data. As for $P(A \cap B)$, we note that everyone who reaches age 80 has previously reached age 60 (i.e., that $B \subset A$), meaning that $P(A \cap B) = P(B)$ under (1-8). But P(B) = .539 given the data. Hence

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{.539}{.88} = .613$$

Is this last outcome surprising? At birth, the baby has a 53.9% chance of reaching age 80. But if the person is still alive in 2064, the estimated chance of reaching age 80 only increases to 61.3%. One might have expected a greater increase than that: 60 years is 3/4 of 80 years, and making it 3/4 of the way might seem to increase sharply the chance of making it all the way. The problem with that reasoning is that, because P(A) = .88, the first sixty years that have passed are years of low death risk: indeed, the twenty years between ages 60 and 80 entail far higher risk than the first 60 years combined. A person who reaches 60, therefore, has not avoided 3/4 of the risk of not reaching 80, but instead a far lesser proportion.

Which brings us to the second question: what is the chance the baby's lifespan will be between 60 and 80? The temptation is to take the drop between the chance of reaching 60 and the chance of reaching 80, meaning .88 - .539 = .341. But, in Probability, we should be wary of yielding at once to our temptations. Instead, we might define event C by:

C: lifespan between 60 and 80

Let $A^c$ and $B^c$ be the complements of events A and B respectively. Thus, $A^c$ is the event of not reaching 60 and $B^c$ the event of not reaching 80. We can then write:

$$P(B^c) = P(A^c) + P(C)$$

There are, after all, two mutually exclusive ways of not reaching age 80: dying before age 60, or dying between 60 and 80. We have the equivalent equation:

$$P(C) = P(B^c) - P(A^c)$$

We know that $P(B^c) = 1 - P(B) = 1 - .539 = .461$, while $P(A^c) = 1 - .88 = .12$. It follows that P(C) = .461 - .12 = .341. In this instance, therefore, the simple subtraction rule was correct! No one is saying that our gut instincts always lead us astray; they do so often enough, however, that we do well to work out the answer rather than proclaim that it is obvious.

Of course, these statistics are based on longevity patterns that prevailed in 2004. The chance that someone who reached 50 would reach 51, for example, might be estimated by the fraction of Americans who turned 50 in 2003 who were still alive one year later. Advances in medicine over the next half century could increase the likelihood of reaching 80 considerably, meaning that the 53.9% estimate could well be pessimistic. Any probabilistic forecast involves assumptions, and one does well to find out what they are before taking the forecast too seriously.
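The arithmetic of this example can be collected in a few lines. This is a small sketch that uses only the two survival figures quoted from the National Center for Health Statistics; the variable and function names are invented for the illustration.

```python
P_REACH_60 = 0.880  # P(A): chance a 2004 baby reaches age 60
P_REACH_80 = 0.539  # P(B): chance a 2004 baby reaches age 80


def conditional_survival(p_reach_later, p_reach_earlier):
    """P(reach later age | reached earlier age); valid because the later event implies the earlier one."""
    return p_reach_later / p_reach_earlier


p_80_given_60 = conditional_survival(P_REACH_80, P_REACH_60)  # P(B | A)
p_die_before_60 = 1 - P_REACH_60                              # P(A^c)
p_die_before_80 = 1 - P_REACH_80                              # P(B^c)
p_die_between_60_and_80 = p_die_before_80 - p_die_before_60   # P(C)
p_die_by_80_given_60 = 1 - p_80_given_60                      # death risk over ages 60-80 for a survivor to 60

if __name__ == "__main__":
    print(f"P(B | A) = {p_80_given_60:.3f}")
    print(f"P(C)     = {p_die_between_60_and_80:.3f}")
    print(f"Death risk before 60: {p_die_before_60:.3f}; between 60 and 80 given survival to 60: {p_die_by_80_given_60:.3f}")
```

The last line makes the point about uneven risk concrete: it compares the death risk over the first 60 years with the risk over the following 20 years for someone who has reached 60.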

EXERCISE 1.22

Data suggest a 20.8% probability that an American baby born in 2004 will be alive 90 years later. Given that information and the data in Example 1.21, what is the conditional probability that a 2004 baby who lives to age 80 will make it to age 90?

EXAMPLE 1.22

Space Problem

The laws of probability are unforgiving when events are described with imprecise language. In 1979, for example, the US National Aeronautics and Space Administration (NASA) decided to warn people that a Skylab space station had fallen out of orbit and was headed towards earth, where its debris could fall in a populated area. NASA tried to make its announcement less frightening by accompanying it with this risk assessment:

(i) The probability is only 1 in 150 that someone will be hit by falling debris from the Skylab.
(ii) There are four billion people on the earth.

Given (i) and (ii), NASA estimated the chance that any particular individual would be hit by Skylab debris as (1/150) * (1/4 billion), which is 1 in 600 billion. So negligible is that level of risk that the best response might be to forget it entirely. Might there be a flaw in NASA's reasoning?

Solution

Some ambiguity in NASA's risk description raises questions about the statistic 1 in 600 billion. What does it mean to say that there is a 1 in 150 chance that "someone" is hit by debris? The clear implication is that there is a 149 in 150 probability that no one is hit, but how many people are hit given that "someone" is? The answer to that question is crucial to determining the level of individual risk. Let H represent the event that a particular person (e.g., Mendel) is hit and S the event that "someone" is hit. We know from the fourth probability law that:

$$P(H) = P(S)\,P(H \mid S)$$

Now P(S) = 1/150, but what is $P(H \mid S)$? If we assume that the number of people struck by Skylab debris cannot exceed one, then the risk to an individual, $P(H \mid S)$, given that someone was struck is 1 in 4 billion. Thus, as NASA said, P(H) can be estimated as (1/150)(1/4 billion) = 1 in 600 billion.

But why is it certain that the debris can hit at most one person? If a crowded bus gets struck, dozens of people could be hit simultaneously. The possibility of multiple victims affects the estimate of $P(H \mid S)$: if, for example, the number of individuals struck, given that someone is struck, is assigned a 25% chance of being one, a 50% chance of being two, and a 25% chance of being three, then $P(H \mid S)$ would follow:

$$P(H \mid S) = \frac{1}{4}\left(\frac{1}{4\ \text{billion}}\right) + \frac{1}{2}\left(\frac{2}{4\ \text{billion}}\right) + \frac{1}{4}\left(\frac{3}{4\ \text{billion}}\right)$$

In that case $P(H \mid S)$ would be 1 in 2 billion, and P(H) would double to 1 in 300 billion. That number is still infinitesimal, but we can imagine more ominous possibilities in which "someone" is struck and hundreds of lives are endangered. In other words, NASA's assessment, based on the lowest possible value of $P(H \mid S)$, may well have been too optimistic.

Fortunately, the debris fell harmlessly in a remote part of Australia and no one was injured. But the lesson for us is that a word like "someone" is not helpful in describing an event: do we mean "exactly one" or "at least one"? If we are unclear, then we might unintentionally jump in our thinking from one of these phrases to another. Perhaps NASA did that when it initially intended "someone" to mean "at least one" but shifted to "exactly one" later in its risk calculation.
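The dependence of P(H) on the number of victims can be made explicit. The sketch below is illustrative only: `individual_risk` is an invented helper that takes a distribution for the number of people struck given that someone is struck, and it assumes (as the example does) that each of the four billion people is equally likely to be among the victims.

```python
from fractions import Fraction

POPULATION = 4_000_000_000        # figure (ii) in NASA's statement
P_SOMEONE_HIT = Fraction(1, 150)  # figure (i): P(S)


def individual_risk(victim_count_dist, p_someone=P_SOMEONE_HIT, population=POPULATION):
    """P(H) = P(S) * P(H | S), where P(H | S) averages n / population over the victim count n.

    victim_count_dist maps a number of victims to its probability given S.
    """
    if sum(victim_count_dist.values()) != 1:
        raise ValueError("victim-count probabilities must sum to 1")
    p_h_given_s = sum(prob * Fraction(n, population) for n, prob in victim_count_dist.items())
    return p_someone * p_h_given_s


# NASA's implicit assumption: exactly one victim whenever someone is hit.
nasa_case = individual_risk({1: Fraction(1)})

# The alternative considered in the text: one, two, or three victims.
alternative_case = individual_risk({1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)})

if __name__ == "__main__":
    print("NASA case:        1 in", 1 / nasa_case)
    print("Alternative case: 1 in", 1 / alternative_case)
```

Any other victim-count distribution, such as one that allows for a crowded bus, can be passed in the same way to see how quickly the individual risk grows.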


EXERCISE 1.23

What is your estimate of the expected number of people that would be hit by Skylab debris given that at least one person is hit (i.e., of E(H | S))? With that estimate, what is the chance that debris would hit a randomly-chosen person alive in 1979?

EXAMPLE 1.23

Homicide in Detroit

The overall murder rate in the US is many times that of the other Westernized countries with which it is usually grouped. But the suggestion has arisen that, in absolute as opposed to relative terms, the U.S. homicide problem is not so acute. For example, The New York Times argued in 1975 that, "if you live in Detroit, the odds are 2000 to 1 that you will not be killed by one of your fellow citizens. Optimists searching for perspective in the city's homicide statistics insist that these odds are pretty good." (New York Times, 1/30/75; italics added) The Times based its assessment on Detroit's 1974 homicide data: 700 slayings that year out of roughly 1.4 million residents. Might the reassurance offered by this "perspective" be unwarranted?

Solution

In answering this question, we need to consider what precisely The Times did. It computed the proportion of Detroiters slain in 1974 (namely, 700/1,400,000 = 1/2000) and equated that figure to the risk level faced by an average resident. That approach is encouraged by data summaries like the F.B.I. Uniform Crime Reports, which give special weight to the rate statistic "killings per 100,000 inhabitants per year." But a difficulty is inherent in the phrase "per year." Why emphasize annual risk as opposed to that per day, per month, or per decade? A person born in Detroit who continues to live there is in danger of being murdered throughout his life. And if the risk is being sustained over a lifetime, that would seem the natural time frame over which to measure it. To put it another way, the full hazard that Detroit's murders pose to its residents is only reflected in a question like:

For a randomly chosen person born in Detroit who will reside there, what is the chance that homicide will be the ultimate cause of death?

We try now to answer that question, realizing that we can only proceed if we make some initial assumptions:

• Barring homicide, a randomly picked Detroiter just born would have a lifespan of exactly 70 years.
• Extrapolating forward the 1974 statistic, we assume that those who have survived until age i (i = 0, 1, ..., 69) have a 1 in 2000 chance of being murdered at that age.

We are not mathematically obliged to simplify to this extent, but these assumptions provide for a reasonable "baseline" calculation.


Let A be the event that a randomly-chosen newborn is eventually murdered, and N be the complementary event that this person is not murdered and thus dies of natural causes at age 70. If $C_i$ is the event that this person was not slain at age i, then it is true under our assumptions and (1-18) that:

$$P(N) = P(C_0 \cap C_1 \cap \cdots \cap C_{69}) = P(C_0)\,P(C_1 \mid C_0) \cdots P(C_{69} \mid C_0 \cap C_1 \cap \cdots \cap C_{68}) \tag{1-26}$$

Because eventually being slain is the complement of never being murdered, the quantity P(A) that we seek follows:

$$P(A) = 1 - P(N)$$

This last relationship is useful because P(N) can be found directly. The age-invariant 1-in-2000 assumption for Detroit implies that, for each age i, $P(C_i \mid C_0 \cap \cdots \cap C_{i-1}) = 1999/2000$. That circumstance transforms (1-26) into the equation:

$$P(N) = \left(\frac{1999}{2000}\right)^{70}$$

which means that

$$P(A) = 1 - \left(\frac{1999}{2000}\right)^{70}$$

In a formal sense, the analysis is over. But one gets little "feel" for the magnitude of the risk because the numerical answer exists in an opaque form. If we actually work out $(1999/2000)^{70}$ with a calculator, we find that P(N) = .966. But that means that P(A) = .034, which is about one in 29.

One in twenty-nine. This statistic means that more than 3% of Detroit's lifetime residents would fall victim to its lethal violence. This calculation suggests that Detroit's two daily killings do not endanger its citizens only minimally, but instead imply substantial long-term risk. The "optimists" who took consolation in the raw 1974 data had succumbed to the fallacy that, if a risk is small over a short period, it will likewise be so over a much longer one.

Of course, murder risk in Detroit was not the same at all ages, and it depended heavily on such factors as gender, race, and lifestyle. But taking those factors into account would not meaningfully affect the overall 1-in-29 statistic. If some individuals were at much lower risk than the statistic implied, that is balanced by the fact that others were at considerably higher risk. And if the risk of being murdered varied with age, that circumstance means that 1/2000 was too high a risk estimate at some ages but too low at others. The opposite errors largely cancelled out in the lifetime risk calculation.

Was 1974 an unusually violent year, meaning that extrapolating its murder pattern forward was unrealistic? For Detroit, unfortunately not. While homicide in the United States has generally declined in recent years, that is not the case in Detroit: if the calculation above is performed with 2011-13 data, the lifetime mortality risk is, once again, about 1 in 29.
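The lifetime-risk formula is a one-liner once the annual risk and the horizon are fixed. The sketch below simply restates the baseline calculation under its two assumptions (a constant annual murder risk and a fixed 70-year lifespan barring homicide); the function name and its parameters are invented for this illustration.

```python
def lifetime_risk(annual_risk, years=70):
    """P(A) = 1 - (1 - annual_risk) ** years, assuming the same risk at every age."""
    return 1 - (1 - annual_risk) ** years


if __name__ == "__main__":
    detroit_annual = 700 / 1_400_000  # 1974: 700 slayings among about 1.4 million residents
    p = lifetime_risk(detroit_annual)
    print(f"annual risk:   1 in {1 / detroit_annual:,.0f}")
    print(f"lifetime risk: {p:.4f}, about 1 in {1 / p:.0f}")
```

For small annual risks the result is close to, but slightly below, the simple product `years * annual_risk`, because a person can be murdered at most once.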


EXERCISE 1.24

New York City had 8.2 million residents in 2013 and 335 murders. Using the methodology of Example 1.23, estimate the lifetime murder risk for a New York City resident based on the 2013 pattern.

EXAMPLE 1.24

Safe to Fly?

For a somewhat less unnerving look at mortality data, we might consider the risk of flying. The author does data analyses on the topic and, in this electronic age, sometimes receives e-mails from nervous travelers. One example is the following message: "I stopped flying about a year ago, and this has affected my life in a significant manner.... One question: what are the odds of me dying in a plane crash?" What is an approximate answer to that question for US commercial aviation?

Solution

Here we undertake a quick data analysis. We focus on the question "given that a person picks a flight at random within a set of interest, what is the probability that she will die in an aviation accident?" For the set of interest, we choose all scheduled flights by US airlines over the period 2008-12.

First, a general formula. Suppose that N is the number of flights under consideration, and let $X_i$ be the fraction of passengers on flight i (i = 1, 2, ..., N) who perish in an accident. (If the flight lands safely, then $X_i = 0$; if everyone aboard is killed in an accident, then $X_i = 1$; if a crash kills 20% of the passengers, then $X_i = 0.2$.) And let Q be the death risk per (randomly chosen) flight within the set of interest. To determine Q, we note three points:

• The chance of picking any particular flight at random among the N under study is 1/N.
• Given the choice of flight i, the conditional probability of dying in an accident is $X_i$.
• The N different ways of perishing as a result of the "lottery" are mutually exclusive events: one cannot simultaneously pick (say) flight 3 and flight 67.

We can write:

$$Q = P(\text{select flight 1 and perish}) + P(\text{select flight 2 and perish}) + \cdots + P(\text{select flight } N \text{ and perish}) \tag{1-27}$$

By the first two points, the ith term in (1-27) is $(1/N) X_i$, so Q is simply the average of the $X_i$ over the N flights.

With (1-27) in hand, we can turn to data. Over 2008-12, US airlines performed a total of 45 million flights. Of these, exactly one suffered fatalities: a plane crashed in Buffalo, New York on 2/12/09 with no survivors. In other words, N = 45 million, and $X_i = 1$ for one of the flights, while $X_i = 0$ for all the others. It follows that Q = 1 in 45 million.

One in 45 million. That is obviously a low number, but can we get an intuitive sense of how low? We start with a few numbers. In 2009, Barack Obama was inaugurated as the 44th president of the United States. The first president, George Washington, took office in 1789. Since that time, Americans have elected a new president on average once every (2009 - 1789)/43 = 5.1 years. Moreover, in the first eight years of the 21st century, an average of 4.1 million Americans were born each year. Only Americans born in the US are eligible to be elected president. At 4.1 million per year, approximately 21 million Americans are born over a period of 5.1 years. Based on the historical average, one of them will be elected president. In other words, recent data suggest an American kid about to board a plane at a US airport is considerably more likely to grow up to be president than to perish on today's flight (1 in 21 million vs. 1 in 45 million). If becoming president seems an extremely remote possibility, then so is a terrible outcome on the flight.

To be sure, this calculation does not include 9/11, the worst day ever in the history of civil aviation. (There were no terrorist deaths on US aircraft over 2008-12.) The hope is that the events of that day are a horrid aberration that will never be repeated.
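The per-flight risk Q and the presidential comparison involve only a handful of numbers. The following sketch restates them; the helper name `death_risk_per_flight` and the constant names are invented for the illustration, and the inputs are the figures quoted in the example.

```python
def death_risk_per_flight(fatality_fractions, n_flights):
    """Q = (1/N) * sum of X_i. Flights that landed safely have X_i = 0 and may be omitted from the list."""
    return sum(fatality_fractions) / n_flights


N_FLIGHTS = 45_000_000  # US scheduled flights over 2008-12, as quoted in the example
Q = death_risk_per_flight([1.0], N_FLIGHTS)  # one crash with no survivors

YEARS_PER_PRESIDENT = (2009 - 1789) / 43              # average interval between new presidents
BIRTHS_PER_PRESIDENT = 4_100_000 * YEARS_PER_PRESIDENT  # US births during one such interval
P_PRESIDENT = 1 / BIRTHS_PER_PRESIDENT

if __name__ == "__main__":
    print(f"death risk per flight: 1 in {1 / Q:,.0f}")
    print(f"chance of becoming president: 1 in {BIRTHS_PER_PRESIDENT:,.0f}")
    print(f"ratio: {P_PRESIDENT / Q:.1f}")
```

Passing a list such as `[0.2, 1.0]` would handle a period with one partial-fatality crash and one total loss, which is the situation the general formula is designed for.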

EXERCISE 1.25

Over 2007-2010, the airlines of Southern/Western Europe performed approximately 25 million flights. They suffered two fatal crashes: one in 2008 which killed 89% of the passengers and one in 2009 which killed all the passengers. Given these data, work out the death risk per flight for these European airlines over 2007-10. At one flight per day, how many years could a traveler take these airlines before the chance of dying in a crash would reach 50%?

EXAMPLE 1.25

Was Hinckley Schizophrenic?

In March 1981, John Hinckley shot President Reagan in an attempt to assassinate him. At the subsequent trial, no one denied that Hinckley had fired the shots that had wounded four people; the point of contention was whether he was sane when he did so. His lawyers asserted that he was insane because he suffered from schizophrenia. To buttress their case, they sought to introduce into evidence a CAT-scan of Hinckley's brain that revealed it to be atrophied. Because such atrophy is believed to be present in 30% of schizophrenics but only 2% of others, the attorneys depicted the test result as a powerful indicator of Hinckley's deranged condition.

At least one mathematician has suggested, however, that this argument represented a dreadful abuse of statistics. His criticism, which takes note of official estimates that 1.5% of Americans are schizophrenic, centers on the answer to the question: if a randomly chosen American takes a CAT-scan and it shows brain atrophy, what is the probability that he is schizophrenic? Given the numbers above, approximately what is the answer to that question?

Solution

We define several events:

$S$: a randomly-chosen participant in a national CAT-scan suffers from schizophrenia
$S^c$: the complementary event that this participant is not schizophrenic
$A$: the participant's CAT-scan shows brain atrophy

In this notation, the mathematician's question above reduces to a call for the calculation of $P(S \mid A)$. The numbers available for such an endeavor are:

$$P(S) = .015; \quad P(S^c) = 1 - .015 = .985; \quad P(A \mid S) = .30; \quad P(A \mid S^c) = .02$$

From Bayes' Theorem (1-16), we know that

$$P(S \mid A) = \frac{P(S)\,P(A \mid S)}{P(A)} \tag{1-28}$$

Given the numbers in the preceding paragraph, we already know that P(S) = .015 and $P(A \mid S) = .30$. But P(A), the denominator of (1-28), requires some further work. We use the Law of Alternatives (1-7) to break down the event A to write:

$$P(A) = P(S \cap A) + P(S^c \cap A) \tag{1-29}$$

Now we can twice employ (1-15) to restate (1-29) as:

$$P(A) = P(S^c)\,P(A \mid S^c) + P(S)\,P(A \mid S)$$

which means that:

$$P(A) = .985 \cdot .02 + .015 \cdot .30 = .0242$$

Thus:

$$P(S \mid A) = \frac{P(S)\,P(A \mid S)}{P(A)} = \frac{.015 \cdot .30}{.0242} = .186$$
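The Bayes calculation in (1-28) and (1-29) can be written as a small reusable function. This is a sketch for illustration, with an invented function name and parameter names; the inputs are the three figures quoted in the example.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' Theorem: P(H | E) = P(H) P(E | H) / [P(H) P(E | H) + P(not H) P(E | not H)]."""
    numerator = prior * p_evidence_given_h
    denominator = numerator + (1 - prior) * p_evidence_given_not_h  # this is P(E), as in (1-29)
    return numerator / denominator


if __name__ == "__main__":
    p_s_given_a = posterior(prior=0.015, p_evidence_given_h=0.30, p_evidence_given_not_h=0.02)
    print(f"P(S | A) = {p_s_given_a:.3f}")
```

Keeping the prior as an explicit argument makes it easy to see how strongly the answer depends on the rarity of schizophrenia in the population being tested.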

In short, the person has better than a 4 in 5 chance of not being schizophrenic. A relative frequency argument makes the result intuitively clearer (Figure 1-3). Out of every 1000 people tested, roughly 15 will be schizophrenic and the other 985 not so. On average, about 30% of the schizophrenics (i.e., about 15 * .3, or about five) will show atrophy, as will 2% of the nonschizophrenics (i.e., about 985 * .02, or about 20). Thus, of the roughly 25 cases of atrophy arising from these 1000 people (5 + 20), only about one-fifth (5/25) arise from schizophrenics. What is happening is that 2% of a large number (985) dwarfs 30% of a small one (15). The mathematician's concern was that Hinckley's lawyers were emphasizing the higher rate of atrophy among schizophrenics (30% vs. 2%) but ignoring the rarity of schizophrenia in the population (1.5%). Their evidence, he believed, would make it hard for jurors to recognize that only 1/5 of those with atrophy in a national CAT-scan would actually be schizophrenic, meaning that the CAT-scan outcome in itself was not a strong signal of schizophrenia.

[FIGURE 1-3: Of 1000 Americans, 25 Will Show Atrophy, Only 5 of Them Schizophrenic. The diagram splits 1000 Americans into a 1.5% branch and a 98.5% branch.]

Discussion

The mathematician's argument is intriguing, and it suggests the value of probabilistic thinking in preventing the neglect of important information. One can, however, imagine another interpretation of the numbers more favorable to the defendant. Even though our primary interest is in Statistics rather than criminal law, a brief statement of an alternative viewpoint seems useful.

The Bayesian computation makes clear that, among Americans chosen at random for CAT-scans, only about one fifth of those who showed brain atrophy would actually be schizophrenic. But Hinckley's CAT-scan was not part of some national sample; it was performed precisely because he was suspected of schizophrenia. Given that circumstance, should Hinckley be assigned the same prior probability of suffering the malady (1.5 percent) as a hypothetical member of the population?

To consider another perspective, suppose that a juror, after hearing psychiatric testimony and other evidence, believes the probability is P that Hinckley is schizophrenic. (P corresponds to the .015 in the national calculation.) On learning the CAT-scan results, she should revise that probability via (1-28) and (1-29) to the value Q that follows:

$$Q = \frac{.3P}{.3P + .02(1 - P)} \tag{1-30}$$

When (1-30) is used to modify various values of P, the results are most interesting. For P = .5 (which corresponds to the case of complete indecision), Q takes on the far less equivocal value of .94. When P = .75, Q = .98. And even when P is only 1/4, Q takes on the immensely higher value of 5/6. It therefore appears that the CAT-scan results, even if not decisive in their own right, could be potent in conjunction with the other elements of the case (which have already raised P above the population average of .015). And this general situation is not unusual: often a strong case consists of an accumulation of clues that, considered singly, would not be compelling.
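One way to see why the scan is so potent for moderate values of P is to restate (1-30) in terms of odds. Dividing numerator and denominator of (1-30) by .02 shows that the evidence multiplies the odds of schizophrenia by the ratio .30/.02 = 15. This odds form is an equivalent restatement added here for illustration, not the text's own presentation, and the names in the sketch are invented.

```python
LIKELIHOOD_RATIO = 0.30 / 0.02  # P(atrophy | schizophrenic) / P(atrophy | not schizophrenic)


def revised_probability(p, likelihood_ratio=LIKELIHOOD_RATIO):
    """Q from (1-30) via odds: posterior odds = likelihood ratio * prior odds. Requires 0 <= p < 1."""
    prior_odds = p / (1 - p)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    for p in (0.015, 0.25, 0.5, 0.75):
        print(f"P = {p:<5}  Q = {revised_probability(p):.3f}")
```

A prior at even odds becomes 15 to 1 in favor, while a prior of .015 (odds of roughly 1 to 66) stays well below even odds, which reconciles the national calculation with the juror's perspective.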

EXERCISE 1.26

For what initial probability of schizophrenia (P) in Example 1.25 would the revised probability of schizophrenia be 99% given atrophy in a CAT-scan?

EXAMPLE 1.26

Survival Curves

[Figure: A Typical Survival Curve. Vertical axis: 0 to 100; horizontal axis: Time in Months, 0 to 24.]

In medicine, it is often helpful to investigate how long a person survives once diagnosed with (or treated for) a life-threatening condition. A common mechanism for summarizing information on that subject is a survival curve, the preparation of which is an exercise with the Joint Probability Law. Given the data:

            Number of Patients Who:
Period      Survive     Die     Are Lost to Follow Up
1           60          10      20
2           24          6       30
3           18          2       4

What inferences would be drawn under survival-curve methodology about the chance of being alive k periods after treatment, for k = 1, 2, and 3?

Solution

Suppose that Ai is the event that the patient is still alive at the end of the ith time unit after diagnosis. Because one can't be alive at the end of the ith period unless one was alive at the end of all prior periods, we can write:

P(An) = P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) * P(A2|A1) * P(A3|A1 ∩ A2) * ... * P(An|A1 ∩ A2 ∩ ... ∩ An-1)

The category "lost to follow up" contains those individuals who were known to be alive at the start of the period but for whom there is no further information. This category includes patients who were treated for the disease at a hospital far from their homes, meaning that continuing records about their condition were unavailable. It also includes people who cannot provide information about a particular period because they were diagnosed too recently. A patient who was diagnosed three periods ago, for example, provides no follow-up information for the fifth period after diagnosis.

With these data, P(A1) would typically be estimated as 60/70 = .857, with the 20 people lost to follow up in the first period being excluded from the calculation. That exclusion is no problem so long as loss to follow up is uncorrelated with the patient's condition. But that assumption is not always warranted: if (say) patients in good health are especially likely to be lost because they do not seek further medical treatment, then death-rate statistics that exclude them could well be too pessimistic.

Continuing the calculation, P(A2|A1) would be estimated as 24/30 = .8. (The 30 people lost to follow-up and excluded in calculating P(A2|A1) were all among the 60 known to have survived period 1.) Thus P(A2) would be approximated by P(A1)P(A2|A1) = .857 * .8 = .686. P(A3|A1 ∩ A2) would be estimated as 18/20 = .9. In consequence, P(A3) would be estimated by:

P(A3) = P(A1) * P(A2|A1) * P(A3|A1 ∩ A2) = .857 * .8 * .9 = .617

Of course, this 61.7% survival rate was based on a limited number of patients (namely, 70), and might have been different if we had another set of 70 similar patients. Actually, the factor P(A3|A1 ∩ A2) was based on a sample size of only 20; if we had another 20 patients, we might have had 15 survivors and five deaths, yielding an estimate of P(A3|A1 ∩ A2) of .75 rather than .9. Thus, the calculated 61.7% rate is subject to statistical sampling error.

Some people examining the survival chart might consider this analysis too complicated. They might note that, at the end of the three periods, 18 patients were known still to be alive, and another 18 to be dead. Why not simply use these two numbers to estimate the three-period survival rate as 18/36 = 50%?

The best reason is that those lost to follow-up after the first or second periods did provide some information before they left the radar screen. The patients who were lost in the third period, for example, were known to have survived the first two. The conditional probability calculation that reaches 61.7% uses the partial data that these patients provide, all of which was favorable about survival. The 50% calculation, on the other hand, discards that information completely. In general, the early data about these lost patients constitutes relevant information. Better to have had and lost these patients than never to have had them at all.
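The chain of conditional estimates can be checked mechanically. The following Python sketch is one possible way to do so, not a standard routine; the function name is invented for illustration, and the (survived, died) counts are those of the example, with lost-to-follow-up patients excluded from each period's ratio. Carried at full precision, the three-period product comes to about .617.

```python
def survival_curve(periods):
    """Estimate P(A1), P(A2), ... from per-period (survived, died) counts.

    Each conditional survival probability is survived / (survived + died);
    patients lost to follow up during a period are left out of that ratio.
    The running product is the Joint Probability Law applied period by period.
    """
    estimates = []
    cumulative = 1.0
    for survived, died in periods:
        cumulative *= survived / (survived + died)
        estimates.append(cumulative)
    return estimates


counts = [(60, 10), (24, 6), (18, 2)]  # periods 1, 2, 3 of the example
curve = survival_curve(counts)
for k, estimate in enumerate(curve, start=1):
    print(f"Estimated P(alive after {k} periods) = {estimate:.3f}")

# The simpler alternative discussed in the text: survivors over known outcomes.
naive = 18 / (18 + 10 + 6 + 2)
print(f"Naive three-period estimate = {naive:.3f}")
```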

EXERCISE 1.27

Suppose that it is subsequently learned that exactly 33 of the 60 patients initially classified as "lost to follow up" in Example 1.26 were alive at the end of period 3. Given this new information, what is the three-period survival probability for all patients? Can we work out the two-period survival probability with the new information?

1.6 Other Matters

Lots of activities fall between risking our fortunes and risking our lives, and Probability can help in analyzing many of them. Now we consider eight examples on widely-divergent themes.

EXAMPLE 1.27

Color Scheme

The geneticist Gregor Mendel analyzed the inheritance of traits from parents to their offspring. The basic principles are:

• Each of the two parents has two genes on a given dimension (e.g. color), and each offspring gets one gene from each parent.
• Each of a particular parent's two genes has an equal chance of being passed on to the offspring.
• The genetic contributions from the two parents are independent.

There are two different types of genes, dominant and recessive, associated with different traits. (For example, for the colors of peas, yellow is dominant and green is recessive.) If an offspring gets one dominant gene and one recessive one, then it will exhibit the dominant trait, as it will if it gets two dominant genes from its parents. Only if both genes are recessive will the recessive trait show up. Some questions:

(i) Suppose that one parent pea plant has one yellow gene and one green one (i.e. is hybrid), and that the other has two green genes. If they "mate," what is the probability that their offspring (named Zelda) will be green? (Actually, a pea plant is self-pollinating, and can mate with itself. But, as Mendel literally did (the geneticist, not our Mendel), we will create male and female pea-plants for our discussion.)

(ii) Suppose that the overall population of pea plants is 25% all yellow, 50% hybrid yellow and green, and 25% all green in genetic makeup. If Zelda is green and picks a mate at random, what is the probability that they have a green offspring?

(iii) Now suppose that Zelda is green and picks a mate at random from among those that are yellow (her favorite color). What is the probability that their offspring will be green?

(iv) Assuming the 25/50/25 split among pea plants in a given generation, will green peas be less common in the next generation? Are they on a path towards eventually dying out?

Solution:

(i) If the two parents have genetic makeup (Y,G) and (G,G), then their offspring will be green if and only if its genetic makeup is (G,G). (Because green is recessive, a (Y,G) plant will be yellow.) The probability of getting a green gene from the hybrid parent is 50%, and the probability of getting one from the green parent is 100%. Under independence, the probability of both these events is (1/2) * 1 = 1/2.

(ii) If Zelda is green, then her genetic makeup is known to be (G,G). She can have a green offspring in two mutually exclusive ways:

• Her mate is (G,G), and thus a green offspring is assured.
• Her mate is (Y,G), and donates a green gene to the offspring.

Thus:

P(green offspring) = P(mate is GG) + P(mate is YG) * P(G gene from YG)

Given the 25/50/25 population makeup, we have:

P(green offspring) = .25 + .5 * .5 = .50

(iii) Zelda is green. Because her mate is yellow (the dominant color), his genetic makeup is either (Y,Y) or (Y,G). If (Y,Y), then the offspring will surely be yellow; if (Y,G), then the offspring will be green if he donates a green gene to the youth. Thus:

P(green offspring) = P(mate (Y,G) | mate is (Y,G) or (Y,Y)) * P(G gene from YG)

Now (1-10) tells us that:

P(mate (Y,G) | mate (Y,G) or (Y,Y)) = P(mate (Y,G)) / P(mate (Y,G) or (Y,Y))

But, based on the makeup of the pea population, P(mate (Y,G)) = 1/2, while P(mate (Y,G) or (Y,Y)) = P((Y,G) ∪ (Y,Y)) = 3/4. Thus,

P((Y,G) | (Y,G) ∪ (Y,Y)) = (1/2)/(3/4) = 2/3

Therefore P(green offspring) = (2/3) * P(G from YG) = 2/3 * 1/2 = 1/3.

(iv) Well, suppose mating is random in the pea population with respect to color. Given two parents each selected from the 25/50/25 mix, what is the probability that their offspring is green? There are now three mutually exclusive routes to a green kid:

• Both parents are (G,G), and thus a green offspring is assured.
• One parent is (G,G) and the other (Y,G), and both donate a green gene to the offspring.
• Both parents are (Y,G), and both donate a green gene to the offspring.

Or:

P(green offspring) = P(both parents (G,G)) + P(one (G,G) and one (Y,G)) * P(both donate green | (G,G) and (Y,G)) + P(both parents (Y,G)) * P(both donate green | both (Y,G)).

We know that:

• P(both parents (G,G)) = (1/4)^2 = 1/16, and P(both parents (Y,G)) = (1/2)^2 = 1/4.
• P(father (G,G) ∩ mother (Y,G)) = 1/4 * 1/2 = 1/8, as is P(mother (G,G) ∩ father (Y,G)). Therefore, P(one parent (G,G) and the other (Y,G)) = 1/8 + 1/8 = 1/4.
• Also, from (i), we know that P(both donate green gene | one (Y,G) and one (G,G)) = 1/2. And P(both donate green | both (Y,G)) = (1/2)^2 = 1/4.

Using these various numbers in the equation above them, we get:

P(green offspring) = 1/16 + 1/4 * 1/2 + 1/4 * 1/4 = 1/4.

Under random mating, therefore, the next generation will be 25% green, the same as this generation. And the argument can be extended to generations after that. There is no tendency for green pea plants to die out over time, though they will always be a minority of pea plants.
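The answers to parts (i) through (iv) can be cross-checked by brute-force enumeration over parental genotypes. The Python sketch below is only an illustration under the example's assumptions (independent contributions, each gene equally likely to be passed on); all names in it are invented here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Genetic makeup of the population: 25% (Y,Y), 50% (Y,G), 25% (G,G).
POPULATION = {
    ("Y", "Y"): Fraction(1, 4),
    ("Y", "G"): Fraction(1, 2),
    ("G", "G"): Fraction(1, 4),
}


def p_green_gene(genotype):
    """Chance that a parent with this makeup passes on a green gene."""
    return Fraction(genotype.count("G"), 2)


def p_green_offspring(mother_dist, father_dist):
    """P(offspring is (G,G)) when each parent is drawn from its own distribution."""
    total = Fraction(0)
    for (m, pm), (f, pf) in product(mother_dist.items(), father_dist.items()):
        total += pm * pf * p_green_gene(m) * p_green_gene(f)
    return total


def condition_on(dist, keep):
    """Restrict a distribution to genotypes satisfying keep, then renormalize."""
    kept = {g: p for g, p in dist.items() if keep(g)}
    norm = sum(kept.values())
    return {g: p / norm for g, p in kept.items()}


zelda = {("G", "G"): Fraction(1)}
hybrid = {("Y", "G"): Fraction(1)}
yellow_mates = condition_on(POPULATION, lambda g: "Y" in g)  # looks yellow

part_i = p_green_offspring(hybrid, zelda)
part_ii = p_green_offspring(zelda, POPULATION)
part_iii = p_green_offspring(zelda, yellow_mates)
part_iv = p_green_offspring(POPULATION, POPULATION)
print(part_i, part_ii, part_iii, part_iv)
```

The four printed fractions should match 1/2, 1/2, 1/3 and 1/4 from the solution.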

EXERCISE 1.28

If a green pea plant mates with three different pea plants chosen at random in Example 1.27, what is the probability that all of the resulting offspring are green? (Treat the outcomes of different matings as independent.)


EXAMPLE 1.28


Cost Benefit Analysis

When organizations like the US government consider buying new hardware systems, they often ask whether the economic benefits that the system will bring can justify its expense. Often, however, there is substantial uncertainty about the cost of developing and deploying the system, much as there is uncertainty about what benefits will actually arise. In consequence, the question "do the benefits exceed the costs?" is sometimes modified to the less definitive form "what is the probability that the benefits exceed the costs?" Teams of investment analysts pore over available information and work out a probability distribution for X, the cost of building, installing and maintaining the system over its lifespan. The distribution typically takes the form:

X = c1   w.p. p1
    c2   w.p. p2
    ...
    cn   w.p. pn

(w.p. = with probability), where Σ pi = 1 (summing over i = 1, ..., n).

A corresponding probability distribution for economic benefits Z is likewise prepared:

Z = b1   w.p. q1
    b2   w.p. q2
    ...
    bm   w.p. qm

The analysts typically assume that X and Z are independent quantities, which means that, for all i and j, the events "X = ci" and "Z = bj" are independent. Such an assumption is often reasonable. For an early-warning system about earthquakes, for example, the main uncertainty about cost might be the price of developing equipment that meets accuracy requirements, while the key uncertainty about benefits could be the number of earthquakes that occur in a given region over some future period. Given the distributions of X and Z, what is the probability that the benefits exceed the costs?

Solution

With the independence assumption at hand, working out P(Z > X) is conceptually tractable. We note that:

P(X = ci ∩ Z = bj) = pi * qj

and then write:

P(Z > X) = Σ_Q pi * qj, where Q is the set of (i, j) for which bj > ci     (1-31)

In other words, we add up the probabilities of all the mutually exclusive outcomes in which the benefit exceeds the cost. We can rewrite (1-31) by noting that:

P(Z > X) = Σ P(X = ci) * P(Z > ci) = Σ pi * P(Z > ci), with both sums running over i = 1, ..., n     (1-32)

where P(Z > ci) = Σ_U qj, and U is the set of j's for which bj > ci.
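Formulas (1-31) and (1-32) translate directly into a few lines of code. The Python sketch below is illustrative: the function names and the small cost and benefit distributions are made up for the demonstration and do not come from the text. It computes P(Z > X) both ways and confirms that the two agree.

```python
def prob_by_pairs(costs, benefits):
    """(1-31): sum pi * qj over all pairs (i, j) with bj > ci.

    costs and benefits are lists of (value, probability) pairs, and X and Z
    are assumed independent, as in the example.
    """
    return sum(p * q for c, p in costs for b, q in benefits if b > c)


def prob_by_conditioning(costs, benefits):
    """(1-32): sum over i of pi * P(Z > ci)."""
    total = 0.0
    for c, p in costs:
        p_z_exceeds_c = sum(q for b, q in benefits if b > c)
        total += p * p_z_exceeds_c
    return total


# Hypothetical distributions, chosen only to exercise the formulas.
costs = [(10, 0.5), (20, 0.3), (50, 0.2)]
benefits = [(15, 0.4), (30, 0.6)]

print(prob_by_pairs(costs, benefits))
print(prob_by_conditioning(costs, benefits))
```

For these invented numbers the benefit always beats a cost of 10, beats a cost of 20 with probability .6, and never beats a cost of 50, so both functions should return .5 + .3 * .6 = .68.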

EXERCISE 1.29

Suppose that cost is deemed equally likely to take each of the ten values (6, 7, 8, 9, 10, 11, 12, 13, 35, and 40), which reflects a 20% risk that certain aspects of product development will be vastly expensive (35 or 40). The benefits are assumed more predictable, being equally likely to take each of the six values (11, 12, 13, 14, 15, 16). What is the probability that the benefits exceed the costs? (Hint: Use (1-32).)

EXAMPLE 1.29

Double Time (A True Story)

In a semi-conductor factory (fab), a certain machine that processes individual lots of computer chips sometimes gets sick. (It is said to be "on an excursion" when that happens.) The sick machine will continue to process the lots, but every lot handled until the machine is repaired will be defective. The machine's illness is not visible to the naked eye, and can only be diagnosed through detailed inspection. Initially, the company followed a conservative inspection strategy: look at every lot, and fix the machine immediately if there is trouble. That way, one discovers and repairs problems as soon as possible, and minimizes the number of defective lots produced. But management worried that inspections at the plant were too expensive, and called for an across-the-board 50% cut. For this machine, that policy would mean that every second lot would be inspected rather than every lot. Let's define a few quantities:

c = the cost of performing an inspection
a = the cost of repairing a machine when it's ill
b = the cost of reworking one lot of defective chips
p = the probability that the machine will fall ill while processing a particular lot of chips, given that it was OK when it started processing them.


Suppose that a, b, c, and p are known constants, and that the cost of inspection is the same regardless of how many lots of chips are examined at a given time. If the machine falls ill while processing a given lot of chips, we assume that they will be defective. How might one evaluate whether management's proposal makes economic sense?

Solution

We could reformulate the question as: In terms of the four parameters, when is the existing policy less costly per lot produced than the alternate policy with only half as many inspections? We would compare the current policy (inspect every lot) with the alternative that management proposed (inspect every second lot).

Policy 1: Inspect Every Lot

Consider a large number N of consecutive lots processed by the machine. Under the original policy, every lot would be inspected (at cost c), and, under the relative frequency interpretation of probability, fraction p of them would be found defective. (Given the immediate-repair policy, the machine was OK when it started processing a given lot.) If a lot is defective, the cost of repairing the machine and reworking the lot's chips is a + b. Thus, the total cost over N lots of inspecting every one would be Nc + Np(a + b). The average cost per lot, L1, would follow:

L1 = (cN + p(a + b)N)/N = c + p(a + b)     (1-33)

Policy 2: Inspect Every Second Lot

Under the "every second lot" policy, there would be only N/2 inspections over the N lots. Let L2 be the average cost per lot under the revised inspection rule. Two important events are:

D1: The machine falls ill on processing the first lot after the last inspection (in which case two defective lots will be found on the next inspection)
D2: The machine first falls ill while processing the second lot since the last inspection (in which case one defective lot will be found at the next inspection)

The probabilities of these events are within our reach. P(D1) is simply p, the chance of failure one lot after the machine was known to have worked properly. To find P(D2), we note that the event requires the machine to succeed with the first lot since the last inspection, but fail with the second. Given how p was defined, P(D2) = (1 - p)p. When event D2 occurs, the cost of dealing with the failure uncovered is a + b, because the first lot processed since the last inspection was OK. But when D1 occurs, the corresponding cost is a + 2b. All inspections, whether they encounter failures or not, entail a fixed cost of c. Thus, over N lots, the N/2 inspections performed sustain a total cost T2 composed of:

(N/2)c     (the fixed cost of inspections)
+ (N/2)p(a + 2b)     (repair and rework costs for "first-lot" failures)
+ (N/2)p(1 - p)(a + b)     (repair and rework costs for "second-lot" failures)

The average cost per lot L2 therefore follows:

L2 = T2/N = c/2 + p(1 - p)(a + b)/2 + p(a + 2b)/2     (1-34)

Comparing (1-33) to (1-34) indicates that it's not immediately clear which inspection strategy is more economical. If p is small and c is relatively large (i.e. inspections are costly and needed repairs are infrequent), then cutting inspections in half might save more money than the increase in reworkings would cost. But if c is low and p and b are high, then reducing inspections will do more harm than good. The advantage of having formulas is that we can go beyond saying "maybe," and make a specific decision based on the particulars of the situation we face.
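To see how the comparison can tip either way, one can simply evaluate (1-33) and (1-34) for particular parameter values. The Python sketch below does so; the function names and both sets of numbers are hypothetical, chosen only to illustrate the two regimes described above.

```python
def cost_per_lot_every_lot(a, b, c, p):
    """L1 from (1-33): inspect every lot."""
    return c + p * (a + b)


def cost_per_lot_every_second_lot(a, b, c, p):
    """L2 from (1-34): inspect every second lot."""
    return c / 2 + p * (1 - p) * (a + b) / 2 + p * (a + 2 * b) / 2


# Hypothetical case 1: costly inspections, rare failures.
case_1 = dict(a=100, b=50, c=20, p=0.02)
# Hypothetical case 2: cheap inspections, frequent failures.
case_2 = dict(a=100, b=50, c=1, p=0.3)

for label, params in (("case 1", case_1), ("case 2", case_2)):
    l1 = cost_per_lot_every_lot(**params)
    l2 = cost_per_lot_every_second_lot(**params)
    better = "every second lot" if l2 < l1 else "every lot"
    print(f"{label}: L1 = {l1:.2f}, L2 = {l2:.2f}, cheaper to inspect {better}")
```

With the first set of values halving the inspections lowers the cost per lot, while with the second set it raises it slightly.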

EXERCISE 1.30

Suppose that every third lot is inspected in the situation of Example 1.29. What is L3, the expected cost per lot under that strategy?

EXAMPLE 1.30

Where Default Lies

US mortgage-backed securities played a major role in the world financial crisis of 2008. Through a simple example, we suggest how a deviation between a probabilistic model about the behavior of mortgage-holders and their actual behavior pushed some of the world's largest banks towards insolvency. (Mortgage holders are generally people who borrow money to buy a house, and are expected to make monthly payments for a long period to pay back what they borrowed.) Investors in a mortgage-backed bond are promised that they will receive a fixed amount of money at a specified time, provided that enough of those individuals who hold certain mortgages actually pay these mortgages off. If too many mortgage-holders stop making their payments (i.e., go into default), then the bonds become worthless (i.e., themselves go into default). Under the tools of modern finance, the same set of mortgages can underlie several distinct kinds of bonds, which are constructed in ways that imply different risks of default. Suppose that, in a simple world, there are exactly two mortgages of interest, A and B. It is assumed that mortgage A has probability p of defaulting, while mortgage B has probability q. It is further assumed that the defaults of the individual

mortgages are independent events, meaning that there is no tendency of the mortgages to default at the same time. Two bonds, one called senior and the other junior, are created based on mortgages A and B. The senior bond will default only if both mortgages default, while the junior bond will default if one or more of the mortgages A and B defaults. Under this arrangement, the junior bond is riskier than the senior one. True: the two bonds fare the same way if both mortgages default or if neither of them does. But if exactly one mortgage defaults, the senior bond pays off while the junior bond does not. This difference in riskiness is presumably reflected in the prices that investors pay for the bonds.

(i) What is the probability that the senior bond defaults under the assumptions made?
(ii) What is the probability the junior bond defaults?
(iii) Give an example of a situation in which mortgage defaults are not independent events. Answer (i) and (ii) for the situation in (iii).

Solution

The calculations are not difficult under the independence assumption.

(i) P(senior bond defaults) = P(both mortgages default) = pq

(ii) P(junior bond defaults) = P(at least one mortgage defaults) = 1 - P(neither defaults) = 1 - (1 - p)(1 - q) = q + p(1 - q)

(We are using the rule for complementary events here.) When p = .08 and q = .12, for example, the default risk for the senior bond is .08 * .12 = .0096, while the risk for the junior bond is 1 - (.92) * (.88) = .1904.

(iii) Beyond p and q, these calculations relied on the independence assumption about defaults of the different mortgages. But defaults might not be independent, because the economic conditions that make it impossible for the holder of mortgage A to make payments could have a similar effect on the holder of mortgage B. To model that possibility, suppose that q > p and we replace the independence assumption with the following:

• The event "Both A and B will default" has probability p
• "B alone will default" has probability q - p
• "Neither mortgage will default" has probability 1 - q

(iv) Under these new assumptions in (iii), we seek the default probabilities for the senior bond and the junior bond. Note that these assumptions entail a default probability of p for mortgage A and of p + (q - p) = q for mortgage B, as before. The overall frequency of individual defaults has not increased, but the likelihood of a "double default" changes from pq to p. And, with the new probability 1 - q that neither mortgage defaults, the chance that at least one defaults is simply 1 - (1 - q) = q. Under the revised assumption, therefore, the chance the senior bond will default rises from pq to p, while the chance the junior bond will default drops from q + p(1 - q) to q.

In our example with p = .08 and q = .12, the new default probabilities are .08 (senior) and .12 (junior), as compared to the earlier values of, respectively, .0096 and .1904. The upshot is that, when the mortgages tend to move the same way rather than independently, the senior and junior bonds converge towards fairly similar risk levels. What happens is that the junior bond gets safer while the senior bond gets less safe.

This example suggests the importance of the independence assumption to the initial risk assessment. It does not make clear, however, why the failure of independence does any net harm: what the senior bonds lost, after all, the junior bonds gained. The problem in 2008 was that some of the world's largest investment banks were on the wrong end of both bets. The banks held lots of senior bonds, which defaulted at a far higher rate than anticipated, and they were responsible for paying off lots of junior bonds, which defaulted at a lower rate than expected. (When a junior bond does not default, the bank must pay the full amount to the bond owner, who had bought that bond at a steep discount.) Taking in much less and paying out much more is not a recipe for financial stability. One major investment bank, Lehman Brothers, collapsed under the strain, and several others approached collapse and might well have gone under without massive government intervention. (I am indebted to Professor Andrew Lo of MIT for much of the material in this example.)
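The two sets of default probabilities can be placed side by side with a short computation. The Python sketch below is illustrative; the function names are invented here, and the second function encodes the specific dependence assumption of part (iii), under which mortgage A defaults only when mortgage B does (so it requires q > p).

```python
def bond_risks_independent(p, q):
    """Default probabilities (senior, junior) when mortgage defaults are independent."""
    senior = p * q                    # both mortgages default
    junior = 1 - (1 - p) * (1 - q)    # at least one mortgage defaults
    return senior, junior


def bond_risks_linked(p, q):
    """Default probabilities (senior, junior) under the assumptions of part (iii).

    Both default w.p. p, B alone defaults w.p. q - p, neither defaults w.p. 1 - q.
    """
    if not q > p:
        raise ValueError("the part (iii) assumptions require q > p")
    p_both, p_neither = p, 1 - q
    senior = p_both
    junior = 1 - p_neither
    return senior, junior


p, q = 0.08, 0.12
print("independent:", bond_risks_independent(p, q))
print("linked:     ", bond_risks_linked(p, q))
```

For p = .08 and q = .12 the output reproduces the figures in the text: .0096 and .1904 under independence, versus .08 and .12 when the defaults are linked.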

EXERCISE 1.31

Suppose that the two mortgages in Example 1.30 have default risks of p and q, respectively (p < q < 1/2), and defaults of the two mortgages are assumed mutually exclusive events. Under that assumption, what are the default probabilities for the senior and junior bond?

EXAMPLE 1.31

Poll Vault

A two-person election can be viewed as a probabilistic venture where the goal is not majority support among all registered voters, but rather among the subset who actually vote. That circumstance raises some strategic questions for candidates. Suppose that, on election day, it is reliably estimated that candidate A is favored by fraction fA of the registered voters and candidate B by the remaining fraction 1 - fA. (We are in effect assuming that no voters are still undecided.) But it is well-known that not all voters show up at the polls on Election Day. Suppose that historical voting patterns and other relevant information lead to the estimate that fraction QA of A's supporters will actually vote, as will fraction QB of B's supporters. For particular values of fA, QA, and QB, will candidate A win the election?


Solution

We can answer this question if we ask another: if a randomly-selected voter leaving the polls were asked about her choice, what is the probability that her (honest) response would be A? The second question yields the answer to the first because the probability the random voter chose A must equal the percentage of the vote A received at the polls. If 53% of those who voted chose A, for example, then the chance that a randomly-chosen departing voter chose A would be .53. We define three events:

V: A randomly-chosen registered voter will actually vote today
VA: A randomly-chosen registered voter will actually vote today, for candidate A
VB: A randomly-chosen registered voter will actually vote today, for candidate B

What candidate A really needs to know is P(VA|V), which reflects her support among actual voters and ignores support among nonvoters that does not help her. From (1-10) and (1-16), we can write:

P(VA|V) = P(VA ∩ V)/P(V) = P(VA)P(V|VA)/P(V)

Clearly, P(V|VA) = 1, so P(VA|V) follows:

P(VA|V) = P(VA)/P(V)     (1-35)

Under our definitions, P(VA) = P(support A)P(V|support A) = fA QA. Similarly, P(VB) = (1 - fA)QB. Because P(V) = P(VA) + P(VB), we have:

P(V) = fA QA + (1 - fA)QB

Substituting for P(V) and P(VA) in (1-35) yields:

P(VA|V) = fA QA / (fA QA + (1 - fA)QB)     (1-36)

Suppose, for instance, that fA = .52, QA = .7 and QB = .8. Then (1-36) reveals that, under current conditions, A will only get 48.7% of the votes. A is four percentage points ahead of B in overall support (52% vs. 48%) but, because A's supporters are less likely to go to the polls than B's (70% vs. 80%), A will not have a majority of actual voters. How might A respond to this unpleasant projection? If she is to win and it is too late to change fA, she must act to increase QA by a last-minute "get out the vote" drive. (We assume that she is unwilling to terrorize B's supporters and thus cannot easily reduce QB.) Calculations based on (1-36) can suggest the minimum value of QA that would yield victory, or might instead show that, even if QA = 1, the candidate will lose.
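The turnout calculation lends itself to a quick numerical check. In the Python sketch below, which is illustrative only, the function names are invented; the first function evaluates A's share of the votes cast, fA QA / (fA QA + (1 - fA) QB), and the second rearranges the winning condition fA QA > (1 - fA) QB to find the turnout that A's supporters must exceed.

```python
def vote_share_for_a(f_a, q_a, q_b):
    """A's share of the votes actually cast, given support f_a and turnouts q_a, q_b."""
    votes_a = f_a * q_a
    votes_b = (1 - f_a) * q_b
    return votes_a / (votes_a + votes_b)


def turnout_a_must_exceed(f_a, q_b):
    """Value of QA above which A wins a majority of the votes cast.

    A wins when f_a * QA > (1 - f_a) * q_b. A result above 1 would mean that
    A cannot win even with full turnout among her supporters.
    """
    return (1 - f_a) * q_b / f_a


share = vote_share_for_a(0.52, 0.7, 0.8)
threshold = turnout_a_must_exceed(0.52, 0.8)
print(f"A's share of votes cast: {share:.3f}")
print(f"A wins only if QA exceeds {threshold:.3f}")
```

For the numbers in the example, A's share comes out near .487, and her supporters' turnout would have to rise from .70 to somewhat above .738 for her to win.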

EXERCISE 1.32

If fA = .53 and QA = .66 in Example 1.31, what is the range of QB-values for which candidate A wins the election?

Probability Grows on Trees

When computing probabilities, it is sometimes useful to draw diagrams that depict the various sequences under which events can unfold. Such "trees" not only reduce the danger that we will forget some of the sequences, but make it surprisingly easy to find the probability of any particular one. We plant a tree below.

EXAMPLE 1.32

Getting Used

A firm is about to launch a new consumer product that is purchased frequently (e.g., toothpaste), and plans massive advertising on the product's behalf. After analyzing focus-group outcomes and other information, the company estimates that:

• 35% of potential buyers will see the ads
• Of those who see the ads, 40% will try the product
• Of those who try the product after seeing the ads, 45% will become regular users

The firm also estimates that:

• Of those potential buyers who do not see the ads, 25% will try the product anyway
• Of those who try the product despite not having seen the ads, 30% will become regular users

Why are people who tried the product without seeing the ads less likely to become regular users than those who saw the ads and tried the product? Perhaps the ads cleverly stress the product's good points, in ways that make people who saw the ads more likely than others to appreciate the product. With these approximations about customer response at hand, the company wants to estimate several overall probabilities, including:

(i) the chance that a randomly-chosen potential customer will become a regular user of the product
(ii) the chance that someone who sees the ads will become a regular user
(iii) the chance that someone who does not see the ads will not become a regular user
(iv) the fraction of those who become regular users who will have seen the ads

Can you assist the company, using a decision tree?

Solution

There are three (somewhat interrelated) events that are of interest:

A: a potential buyer sees the ad
B: a potential buyer tries the product
C: a potential buyer becomes a regular user

The tree diagram in Figure 1-4 shows the different ways by which a potential buyer either does or does not become a regular user of the product. The various circles in the diagram are called nodes, while the lines that connect them are called branches. Note that different paths from beginning to end represent mutually exclusive events, even if the paths partially overlap (e.g. ABC and ABC^c). Note too that all relevant possibilities are presented. (There is no need, of course, to consider a path from B^c to C, because no one can become a regular user without ever having tried the product.) Figure 1-5 shows the same tree as Figure 1-4, but also lists the probabilities of moving from one circle to the next. In many instances, these numbers are conditional probabilities: the entry on the line between A and B is the chance of moving to B given that one has already reached A. To find the probability of moving via a particular path from one place to another one further on, we need simply multiply the probabilities assigned to the various line segments on the path. We are in effect using the Joint Probability Law (1-17), having listed the needed probabilities one at a time.

[FIGURE 1-4: Climbing the Tree, A Potential Customer Becomes a Regular User or Not. PC = potential customer; A = saw ad; B = tried product; C = regular user; A^c = did not see ad; B^c = did not try product; C^c = did not become regular user.]

[FIGURE 1-5: Transition Probabilities from One Node in the Tree to the Next.]

Now, we use Figure 1-5 to find the four probabilities that we seek. Note that the quantities in question are:

(a) P(C) = P(potential customer becomes regular user)
(b) P(C|A) = P(regular user | saw ad)
(c) P(C^c|A^c) = P(not regular user | didn't see ad)
(d) P(A|C) = P(saw ad | regular user)

We see from Figure 1-4 that:

P(C) = P(PC ➔ A ➔ B ➔ C) + P(PC ➔ A^c ➔ B ➔ C),

and thus that:

P(C) = (.35)(.4)(.45) + (.65)(.25)(.3) = .11

Furthermore,

P(C|A) = P(A ➔ B ➔ C) = (.4)(.45) = .18.

(Note that there was no need here directly to use Bayes' Theorem; we simply multiplied the two branch probabilities along the path from A to C, namely, A ➔ B and B ➔ C.)

Likewise:

P(C^c|A^c) = (.75)(1) + (.25)(.7) = .925

(Here we must consider the two paths from A^c to C^c: A^c ➔ B^c ➔ C^c and A^c ➔ B ➔ C^c.) To find P(A|C), we recall from (1-15) that:

P(A|C) = P(A)P(CIA)/P(C), that is, P(A|C) = P(A)P(C|A)/P(C)

We have already determined P(C) to be .11, and P(C|A) to be .18, and know that P(A) is P(PC ➔ A) = .35. Thus:

P(A|C) = (.35) * (.18)/.11 = .57

(With the unrounded value P(C) = .11175, the quotient is .56.)

Working with probability trees can be relaxing and fruitful (sorry). The tree diagram can get unwieldy as the number of possible sequences gets large; in that case, however, the probability calculations will presumably be tedious regardless of which method is used.
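For readers who prefer to let a computer climb the tree, the path-multiplication rule can be sketched in a few lines of Python. The code below is an illustration rather than a general tree library: the dictionary and function names are invented here, and the branch probabilities are the firm's estimates from the example. Because it does not round P(C) along the way, its value of P(A|C) is about .56.

```python
from itertools import product

# Branch probabilities from the example, keyed by whether the ad was seen.
P_SEE = {True: 0.35, False: 0.65}
P_TRY = {True: 0.40, False: 0.25}       # P(tries product | saw ad or not)
P_REGULAR = {True: 0.45, False: 0.30}   # P(regular user | tried, saw ad or not)


def path_probability(saw, tried, regular):
    """Multiply the branch probabilities along one root-to-leaf path."""
    prob = P_SEE[saw]
    prob *= P_TRY[saw] if tried else 1 - P_TRY[saw]
    if tried:
        prob *= P_REGULAR[saw] if regular else 1 - P_REGULAR[saw]
    else:
        # Nobody becomes a regular user without trying the product.
        prob *= 0.0 if regular else 1.0
    return prob


# Joint probability of every (saw, tried, regular) combination.
paths = {key: path_probability(*key) for key in product([True, False], repeat=3)}

p_c = sum(p for (saw, tried, reg), p in paths.items() if reg)
p_a_and_c = sum(p for (saw, tried, reg), p in paths.items() if saw and reg)
p_c_given_a = p_a_and_c / P_SEE[True]
p_notc_given_nota = (
    sum(p for (saw, tried, reg), p in paths.items() if not saw and not reg)
    / P_SEE[False]
)
p_a_given_c = p_a_and_c / p_c

print(f"P(C) = {p_c:.4f}")
print(f"P(C|A) = {p_c_given_a:.4f}")
print(f"P(C^c|A^c) = {p_notc_given_nota:.4f}")
print(f"P(A|C) = {p_a_given_c:.4f}")
```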

EXERCISE 1.33

In the setting of Example 1.32, suppose that it is estimated that a different advertising campaign will increase by a factor of 1.5 the fraction of potential buyers who see the ad, but will decrease by a factor of 1.2 the chance that someone who saw the ad will try the product and also decrease by a factor of 1.2 the chance that someone who saw the ad and tried the product would become a regular user. Given these estimates, would the new ad campaign be more effective than the current one?

EXAMPLE 1.33

Screen Test

Many airlines use screening procedures to keep dangerous packages out of aircraft luggage compartments while not disrupting the loading of harmless packages. This example, slightly disguised, considers an existing security arrangement in Europe. Each package destined for the airplane goes to a first screening device. If it passes the test there, it is loaded on the plane while, if it fails, it is sent to a second device. At the second device, the procedure is similar: packages that pass are loaded, while those that fail go to a third device. Packages that pass the third screen test are loaded, while those that fail are taken aside. (The latter are subject to further procedures, but they will not travel on the present plane.)

None of the screening devices is perfect, however. For device $i$ ($i = 1, 2, 3$), there are two risks:

(i) With probability $p_i$, it erroneously passes a bag that is dangerous, given that previous devices have correctly rejected the bag. ($p_2$ assumes, for example, that device 1 has rejected a dangerous bag.)

(ii) With probability $q_i$, it erroneously fails a harmless bag that previous devices have also erroneously failed.

We assume that all these probabilities exceed zero but fall below one. Under these conditions:

(i) What is the probability that a dangerous package will be loaded despite this series of screenings?

(ii) What is the probability that a harmless package will mistakenly be kept off the airplane?

(iii) How does the chance that a hazardous bag is loaded under this three-part screening compare with the chance of its being loaded if any one of the devices was used alone for testing?

(iv) Why might this three-part screening be preferable to using only one device?

Solution

The time is right for drawing two decision trees, one for a dangerous package that arrives at the airplane, and the other for a harmless one. We define events $L$, $D$, and $V_i$ according to:

$L$: the package is loaded onto the airplane

$D$: the package is dangerous

$V_i$: the package passes at device $i$

$L^c$, $D^c$, and $V_i^c$ are, respectively, the complements of $L$, $D$, and $V_i$.

In this notation, we are seeking $P(L \mid D)$ and $P(L^c \mid D^c)$, respectively, in parts (i) and (ii) of the problem. A tree helps us visualize the various ways the inspection can unfold, assuming that the package is in fact dangerous. (A second tree covers the case when the package is harmless.)

FIGURE 1-6 Loading Decisions about a Dangerous Package Under a Three-Stage Screening Process

(i) From Figure 1-6, we see quickly that there are three mutually exclusive paths to loading a dangerous package, depending on which screening device gives the OK. The overall loading risk $P(L \mid D)$ can be obtained by adding the probabilities for the three paths:

P(L | D) = P(D ➔ 1Y ➔ L) + P(D ➔ 1N ➔ 2Y ➔ L) + P(D ➔ 1N ➔ 2N ➔ 3Y ➔ L)

Or:

$$P(L \mid D) = p_1 + p_2(1 - p_1) + p_3(1 - p_1)(1 - p_2) \tag{1-37}$$

(ii) As for keeping a harmless package off the plane, the reader can draw the corresponding tree starting with $D^c$. Going down the path ($D^c$ ➔ 1N ➔ 2N ➔ 3N ➔ $L^c$) immediately makes clear that the risk is:

$$P(L^c \mid D^c) = q_1 q_2 q_3 \tag{1-38}$$

(iii) The question asks whether $P(L \mid D)$ for this three-part scheme is greater than the dangerous-package error rates for each of the individual screening devices, namely, $p_1$, $p_2$, and $p_3$. Clearly, $P(L \mid D) > p_1$, because $p_1$ is the first term on the right of (1-37) and the last two terms are positive. But, using algebra on (1-37), we note too that:

$$P(L \mid D) = p_2 + p_1(1 - p_2) + p_3(1 - p_1)(1 - p_2)$$

From this equation, we can conclude that $P(L \mid D) > p_2$. We can similarly show that $P(L \mid D) > p_3$. The upshot is that the combination of devices yields a higher risk of loading a hazardous bag than would any one device on its own! For example, if $p_1 = .02$, $p_2 = .04$, and $p_3 = .01$, then $P(L \mid D) = .069$.

The answer to this comparative question raises another: Are these people out of their minds? Why are they using a complex scheme with three devices that achieves less security against dangerous bags than any one device can provide? The answer lies in our result (1-38) about harmless packages. Obviously, the probability of failing to load a harmless bag (namely, $q_1 q_2 q_3$) is lower, and probably far lower, than the comparable risk $q_i$ for device $i$ on its own (e.g., if $q_1 = q_2 = q_3 = .1$, then $q_1 q_2 q_3 = .001$). If harmless bags enormously outnumber dangerous ones (as they do), considerations of efficiency might argue that it is preferable to accept a higher (though still small) risk of loading a dangerous package than to create frequent and substantial delays for innocent ones. We would all prefer not to make tradeoffs between security and efficiency, but that is not an option in the real world.
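The arithmetic in this example can be checked with a short Python sketch. The function names `loading_risk` and `false_rejection_risk` are illustrative choices and do not come from the text; the inputs are the conditional error probabilities $p_i$ and $q_i$ defined in the example.

```python
from math import prod

def loading_risk(p):
    """P(L | D): chance a dangerous bag is loaded, summing the tree paths."""
    risk = 0.0
    reach = 1.0                  # chance the bag reaches the current device
    for p_i in p:
        risk += reach * p_i      # erroneously passed at this device
        reach *= 1.0 - p_i       # correctly rejected, so it moves on
    return risk

def false_rejection_risk(q):
    """P(L^c | D^c): chance a harmless bag fails every device in turn."""
    return prod(q)

p = [0.02, 0.04, 0.01]
q = [0.1, 0.1, 0.1]
print(round(loading_risk(p), 3))          # 0.069
print(round(false_rejection_risk(q), 3))  # 0.001
```

A dangerous bag is loaded unless every device rejects it, so the same risk can be written as $1 - (1 - p_1)(1 - p_2)(1 - p_3)$, which makes it plain why the three-stage risk exceeds each individual $p_i$.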

EXERCISE 1.34

Consider the arrangement in Example 1.33 in which the third device is eliminated, and the load/no load decision is based solely on what happens with the first two devices. Assuming that the $p_i$'s and $q_i$'s are all positive but less than one, must the probability of erroneous loading in this arrangement fall between the corresponding probabilities for Device 1 alone and for all three devices?

EXAMPLE 1.34

Strong Stuff?

When testing the effectiveness of a new drug, pharmaceutical companies do not simply give the drug to some disease-sufferers and observe the proportion that improves. For one thing, some of these sufferers might have gotten better on their own; for another, the very act of taking medicine can lead a patient to perceive improvements whether or not they truly occur. (This phenomenon, known as the placebo effect, may be especially common if the malady is in part psychological.) For these reasons, a typical drug-effectiveness study will divide the disease-sufferers at random into two groups. Half are given the new pill, while the other half is given a harmless but useless substitute (called the placebo). None of those taking part in the experiment know what type of pill they are receiving. Then, the patient-reported success rate for the new drug is compared with that for the placebo, allowing an assessment of the drug's true potency.

The results of such clinical trials are sometimes given wide publicity. A Time magazine ad for Viagra, for example, stated that 80% of those in an experiment who used Viagra reported improvement, as compared to 25% among those who had the placebo. What do these statistics imply about the effectiveness of Viagra?

Solution

The temptation is to subtract 25 from 80, and thus to estimate Viagra's true effectiveness as 55%. Such a calculation, however, is harsh on Viagra. Among the 80% of Viagra users who reported improvement, some physically benefited from the drug (the genuine beneficiaries), while the remainder reported improvement for other reasons (such as the placebo effect). Thus, the genuine effectiveness of Viagra is less than 80%. But how much less? If $q$ is the fraction of users genuinely helped by the drug, then 80% is roughly the sum of two quantities:

• $q$
• 25% of $1 - q$

Why the second quantity? We know that 25% of those who got the placebo felt better. It seems reasonable to posit that among those who used Viagra but did not physically benefit (namely, those for whom Viagra was in effect a placebo) roughly 25% would likewise report improvements. The 80% of Viagra users who gained, therefore, included all of those directly helped plus about 25% of the rest. These observations imply the equation:

$$q + .25(1 - q) = .8 \tag{1-39}$$

From simple algebra on (1-39), we see that $q \approx .73$. Figure 1-7 shows a probability tree for what happened, showing the progression from Viagra use to the report of success or failure. The 80% of Viagra patients who reported gains included the 73% with genuine benefits (User ➔ Gain ➔ Report Gain) and about 1/4 of the remaining 27% without such benefits (User ➔ No Gain ➔ Report Gain), a path with probability $.27 \times .25 \approx .07$. Of course, 73% is considerably higher than the 55% success rate readers of Time might have inferred from subtracting 25% from 80%.

FIGURE 1-7 A Probability Tree for Viagra Users

Some readers might wonder whether the Viagra success rate should be estimated as 80% rather than 73%, on the grounds that the 7% of users who erroneously believed that they gained because of the drug (i.e., took the path User ➔ No Gain ➔ Report Gain) did derive some benefit, whatever its origin. The problem with this view is that placebo effects are often temporary, meaning that the 7% boost because of "mind over matter" might not persist.
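The placebo adjustment in this example amounts to solving one linear equation, which a brief Python sketch can make explicit. The function name `genuine_fraction` is hypothetical, and the sketch rests on the same assumption as the example: users who are not genuinely helped report improvement at the placebo rate.

```python
def genuine_fraction(reported_drug, reported_placebo):
    """Solve q + reported_placebo * (1 - q) = reported_drug for q."""
    return (reported_drug - reported_placebo) / (1.0 - reported_placebo)

q = genuine_fraction(0.80, 0.25)
print(round(q, 2))               # 0.73
print(round((1 - q) * 0.25, 2))  # 0.07
```

The first figure is the estimated fraction genuinely helped; the second is the probability of the path User ➔ No Gain ➔ Report Gain.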

EXERCISE 1.35

Among those who have a meal that Mendel cooked, all those who found it delicious will tell him so, as will 40% of those who did not find it delicious (out of politeness). If 76% of the guests tell Mendel that the meal was delicious, what percentage actually thought it was?

The Takeaway Bar

Each chapter of this book will end with a Takeaway Bar, which will summarize the key methodological points that have arisen. It will start with a glossary of terms, presented not in alphabetical order but instead in roughly the order in which the terms arose in the chapter. Then the Bar will list the main formulas from the chapter, and after that go on to offer "some parting thoughts," a brief collection of tips and observations. The hope is that the Bar will be helpful for review, and offer "one-stop shopping" for people who want to locate something important from the chapter without rereading it page-by-page. Here is the Bar for Chapter 1.

Glossary

An experiment is a process that yields an outcome/event that is generally not known in advance.

The sample space is the set of all possible events/outcomes of an experiment.

An elementary event arising from an experiment is one of the "building blocks" of the sample space, and cannot be expressed as the union of two or more other events from that experiment.

An event is a subset of the sample space, and may consist of a combination of elementary events.

Kolmogorov's Axioms specify conditions that numerical assignments of probabilities should satisfy.

Subjective probability arises when a person uses a number on a 0-1 scale to express her view about the likelihood of an event.

The equal probability axiom states that two events believed equally likely should be assigned the same probability. Two events are considered equally likely if, roughly speaking, neither would be more surprising than the other.

The relative frequency concept of probability arises from the idea that the probability of an event is essentially the fraction of time it would occur in the long run, if the experiment that could give rise to the event were repeated under independent and identical conditions.

Mutually exclusive events (also known as disjoint events) are events that can never occur together in an experiment. Their intersection in the sample space is the empty set.

Exhaustive events have the property that at least one of them will definitely occur.

Complementary events are two mutually exclusive events, one of which will definitely occur. (Complementary events are exhaustive.)

The conditional probability of event B given event A is the probability assigned to B when it is known that A has occurred.

Bayes' Theorem is a form of the conditional probability law which expresses $P(B \mid A)$, the conditional probability of event B given that event A is known to have occurred, under a formula that uses the original probabilities of A and B ($P(A)$ and $P(B)$, respectively) and also $P(A \mid B)$, the conditional probability of A given B.

Independent events have the characteristic that knowing that one of them has occurred does not change the probability that any of the others occurred.

The joint probability of several events is the probability that all of them occurred.

Key Formulas

• If $P(A)$ is the probability of event $A$, then:

$$0 \le P(A) \le 1$$

(A probability of 1 is identified with virtual certainty, while probability 0 is identified with impossibility.)

• If $A_1, A_2, \ldots, A_n$ are mutually exclusive events, then:

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i)$$

• If $A_1, A_2, \ldots, A_n$ are mutually exclusive and exhaustive events, then:

$$\sum_{i=1}^{n} P(A_i) = 1$$

• If $B$ and $B^c$ are complementary events, then:

$$P(B^c) = 1 - P(B)$$

• If $B$ and $B^c$ are complementary events and $A$ some other event, then:

$$P(A) = P(A \cap B) + P(A \cap B^c)$$

• If $B_1, B_2, \ldots, B_n$ are mutually exclusive and exhaustive events and $A$ is some other event, then:

$$P(A) = \sum_{i=1}^{n} P(A \cap B_i)$$

(This equation is called the Law of Alternatives or the Law of Total Probability.)

• For any two events $A$ and $B$:

$$P(B \mid A) = P(A \cap B)/P(A)$$

• Bayes' Theorem:

$$P(B \mid A) = P(B)\,P(A \mid B)/P(A)$$

(This equation is called the Joint Probability Law.)

• If events $A$ and $B$ are independent, then:

$$P(B \mid A) = P(B); \quad P(A \mid B) = P(A)$$

• If $A_1, A_2, \ldots, A_n$ are independent events, then:

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = \prod_{i=1}^{n} P(A_i)$$

(where $\prod$ means multiplication)
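Several of these formulas can be verified by brute-force enumeration on a small sample space. The sketch below uses two fair dice, an example chosen here for illustration rather than taken from the formula list; the helper `prob` and the events `A` and `B` are likewise illustrative names. Exact fractions avoid any rounding questions.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes for two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on outcomes."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

A = lambda w: w[0] + w[1] == 7   # the sum is seven
B = lambda w: w[0] == 3          # the first die shows three
A_and_B = lambda w: A(w) and B(w)
A_and_not_B = lambda w: A(w) and not B(w)

p_B_given_A = prob(A_and_B) / prob(A)
p_A_given_B = prob(A_and_B) / prob(B)

# Complement rule, then the two-event law of alternatives.
assert prob(lambda w: not B(w)) == 1 - prob(B)
assert prob(A) == prob(A_and_B) + prob(A_and_not_B)
# Bayes' Theorem.
assert p_B_given_A == prob(B) * p_A_given_B / prob(A)
# These two particular events happen to be independent.
assert p_B_given_A == prob(B)

print(prob(A), prob(B), p_B_given_A)   # 1/6 1/6 1/6
```

Enumeration of this kind is practical only for small sample spaces, but it is a handy way to catch an algebra slip before trusting a formula on a harder problem.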

Some Parting Thoughts

• While it is rarely possible to eliminate uncertainty, a more precise description of the uncertainties we face might make us better able to cope with them. The study of Probability allows for more precise descriptions.

• Estimating the probability of an event by adding the probabilities for the different ways that it can happen is correct only when the ways are mutually exclusive.

• Sometimes, it is easier to find the probability that an event will not occur than the probability that it will. When that happens, it might be wise to find the chance the event won't occur and then, under the rule for complementary events, to subtract that probability from one.

• To find the probability that all events in a series occur (i.e., their joint probability), one can multiply their individual probabilities if they are independent. If the events are not independent, one can still find their joint probability through multiplication, provided that one uses appropriate conditional probabilities as factors.

• When it is difficult exactly to determine the probability of some event, it can be useful to approximate that probability under simplifying assumptions.

• When the result of a probability calculation is surprising (and not in error), it is worthwhile trying to understand why the result makes sense. Doing so can improve one's intuition, and might lead to fewer surprises in the future.

• Even the best decision under uncertainty may turn out, when all the facts become known, to be the wrong decision. (Revisit Miller's Dilemma.) But the person who takes the trouble to estimate the risks and does the best possible given those risks has nothing to apologize for.
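Two of these tips (add only over mutually exclusive ways; use the complement when it is easier) can be seen side by side in a small Python sketch. The question posed, the chance of at least one six in four tosses of a fair die, is an illustration chosen here and is not one of the chapter's examples.

```python
from fractions import Fraction

p_six = Fraction(1, 6)
n = 4

# Tempting but wrong: the events "six on toss k" are not mutually
# exclusive, so simply adding their probabilities overcounts.
naive = n * p_six

# Complement rule plus independence of the tosses.
p_no_six = (1 - p_six) ** n
p_at_least_one = 1 - p_no_six

print(naive)            # 2/3
print(p_at_least_one)   # 671/1296
```

The correct value, 671/1296, is about .52, noticeably below the 2/3 obtained by naive addition.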

EZ Pass

1. If a fair six-sided die is tossed repeatedly, the probability that the first three arises on the second toss is:
(A) 1/3 (B) (5/6) * (1/6) (C) (5/6) * (1/5) (D) none of these

2. If a fair six-sided die is tossed repeatedly, the probability that the first three arises before the third toss is:
(A) 1/3 (B) $(1/6) + (1/6)^2$ (C) 1/6 + (1/6)(5/6) (D) none of these

3. If $P(B \mid A) = P(A \mid B)$ and both $P(A)$ and $P(B)$ > 0, then the statement $P(A) = P(B)$ is:
(A) always true (B) sometimes true (C) never true (D) none of these

4. Suppose that $P(B \mid A) = 2/5$, $P(B) = 3/5$, and $P(A) = 1/2$, and $A'$ is the complement of event $A$. Then $P(B \mid A')$ is:
(A) 1/2 (B) 3/5 (C) 2/3 (D) 4/5

5. If $P(A \mid B) = 1$, then:
(A) $A \cup B = A$ (B) $B = A^c$ (C) $A \cap B = A$ (D) $A \cap B = B$

6. If three people are in a room, the probability they all have the same birthday (assuming birthdays are uniform over the year) is:
(A) $(1/365)^3$ (B) $(1/365)^2$ (C) 3/365 (D) none of these

Further Chapter 1 Exercises

While these problems are not intended systematically to increase in difficulty, some are more demanding than others.

EXERCISE 1.36

Two fair six-sided dice (one red, one black) are tossed.

(i) What is the sample space of outcomes?
(ii) What are the elementary events in this experiment?
(iii) Getting a one on the red die can be expressed as the union of six elementary events in this two-dice experiment. What are these six events?

Find the probability that:

(iv) the sum of the outcomes is exactly 2
(v) the sum of the outcomes exceeds 2
(vi) both dice come up with the same number
(vii) both dice emerge with odd numbers

EXERCISE 1.37

Mendel has a slow computer and a dreadful Internet connection. If he sends an e-mail message, there is a probability $q$ that it will be delivered within the next 24 hours. One morning, he sends messages to his bank, to Gloom magazine, and to his history professor. Assuming that the fates of different messages are independent,

(i) What is the chance that all three of Mendel's e-mails arrive within 24 hours?
(ii) What is the chance that the bank and magazine messages arrive but not the message to the professor?
(iii) What is the chance that exactly two of Mendel's messages are delivered within 24 hours?
(iv) Why might the independence assumption be unrealistic in this case?

EXERCISE 1.38

Suppose that events $A_1, A_2, \ldots, A_n$ are mutually exclusive and equally likely, and that it is certain that one of them will occur (i.e., they are exhaustive). Show that $P(A_i) = 1/n$ for $i = 1, \ldots, n$.

EXERCISE 1.39

For a patient with a particular set of symptoms, the probability is 64% that he suffers kidney problems and 44% that he suffers liver problems. Under these circumstances, suffering liver problems and kidney problems:

(A) could be mutually exclusive events but not independent events
(B) could be independent events but not mutually exclusive events
(C) could be both independent and mutually exclusive
(D) could not be independent and could not be mutually exclusive

Choose an answer, and briefly explain your choice.

EXERCISE 1.40

Everyone buying a pizza at JR's pizzeria gets a card that conceals either a J or an R. When the wax covering is rubbed off, the chance is 80% that a J will appear. Different cards can be treated as having independent letters. Once a person has both a J and an R, she is entitled to a free pizza. Note that, because the same letter can come up on several successive purchases, that rule is not equivalent to one free pizza per two paid ones. If Minerva starts patronizing the pizzeria, find the probability that:

(i) her first two purchases yield a J and R, in that order
(ii) she is eligible for a free pizza after her first two purchases
(iii) it takes her exactly three purchases to achieve the bonus
(iv) even after buying five pizzas, she is not eligible for her first complimentary one.

EXERCISE 1.41

We will now derive an important probability formula for $P(A \cup B)$ when $A$ and $B$ are not mutually exclusive. Suppose that $A$ and $B$ are two events, and $A^c$ and $B^c$ their respective complements.

(i) Taking account of the three ways that "A or B" can arise, explain why it is true that:

$$P(A \cup B) = P(A \cap B) + P(A \cap B^c) + P(A^c \cap B)$$

(ii) Explain in simple words why it must also be true that:

$$P(A) = P(A \text{ and } B) + P(A \text{ and } B^c)$$

(iii) Write an expression for $P(B)$ analogous to that in (ii).

(iv) Now, combining the three equations arising in (i) through (iii), show that:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

(v) What happens to this last formula when $A$ and $B$ are mutually exclusive? Are you happy?

(vi) Show that when $A$ and $B$ are independent:

$$P(A \cup B) = P(A) + P(B)(1 - P(A))$$

(vii) Find $P(A \cup B)$ when $P(A) = .4$, $P(B) = .2$, and $P(B \mid A) = .25$.

EXERCISE 1.42

Based on voter samplings a few hours before Election Day starts, a US Presidential candidate is estimated to have a 55% chance of carrying Minnesota (10 electoral votes), a 40% chance of carrying Ohio (20 votes), and a 32% chance of winning California (55 votes). (These probabilities incorporate the statistical uncertainty in results based on small-scale voter samples.) The Presidential-election system in the United States is "all or nothing" at the state level: a candidate either wins all the state's electoral votes or none of them. The candidate is convinced that, with at least 30 electoral votes from these three states, victory is certain; with any fewer, however, defeat is unavoidable. For the purpose of this exercise, treat this assessment as valid.

(i) Identify the eight possible combinations of wins and losses in the three states (e.g., one of them is "win Minnesota, lose Ohio, win California").
(ii) Assuming that the outcomes in the different states are independent, find the probability of each of the combinations listed in (i).
(iii) Should the eight probabilities just computed sum to 1? Briefly explain.
(iv) Which combinations of results in the three states would make the candidate the next President?
(v) What is the probability that, come January 20, she will take the oath of office?

EXERCISE 1.43

Explain why the following relations hold for any two events $A$ and $B$ in sample space $S$:

(i) $(A \cup B)^c = A^c \cap B^c$
(ii) $(A \cap B)^c = A^c \cup B^c$

These identities are known as DeMorgan's Laws.

EXERCISE 1.44

Imagine that a square item that has just been manufactured is (invisibly) divided into 900 squares each of unit area. The original square has 25 small defects that are randomly distributed over its surface. (More specifically, each defect independently has a 1-in-900 chance of falling in any given unit square, while no defect extends beyond one square.)

(i) The item will have to be scrapped if any one of the unit squares contains two or more defects. What is the probability that such scrapping will be needed? (HINT: Happy Birthday.)
(ii) What is the probability that all 25 of the defects fall in the same unit square?
(iii) Why is the answer to (ii) NOT equal to $(1/900)^{25}$? (If that is the answer you gave, why were you wrong?)

EXERCISE 1.45

The New York Times of August 14, 1993 reported that a NASA safety panel estimated that the probability of catastrophic failure on a given space-shuttle flight is 1 in 60. But a NASA official noted that "we've had 30 flights with no catastrophic failures" since the space-shuttle Challenger disaster in 1986. Assuming that the 1-in-60 estimate is accurate and that different space-shuttle flights can be treated as independent in terms of failure:

(i) What is the probability of getting 30 flights in a row without catastrophes?
(ii) Let $X$ be the number of flights after a given time until the first catastrophe. For the positive integer $k$, what is $P(X > k)$? (Hint: What is $P(X > 30)$?)
(iii) What is $P(X = k)$? (Hint: If $X = k$, what were the outcomes on the first $k - 1$ flights?)
(iv) What is $P(X < k)$? (Hint: Complement)
(v) What is $P(X \le k)$?
(vi) The $A$th percentile of $X$ is defined as the number $b$ for which $P(X \le b) = A\%$. What integer $k$ comes closest to being the 95th percentile of $X$? (Yes: work it out using your formula.)

EXERCISE 1.46

On December 29, 2011, The New York Times listed the eight candidates for the Republican nomination for US President in 2012, and then offered a scrambled list of eight quotations, one for each candidate. It asked readers whether they could match quotations with candidates.

(i) If Mendel has no idea who said what and makes a completely random match of quotations to candidates, what is the probability all eight of his answers were correct?
(ii) If Minerva can match three quotations to candidates but guesses at random for the remaining five, what is the probability that all her answers were correct?
(iii) If (as actually happened), this author could match three of the quotations and could eliminate all but two possibilities for a fourth, what is the probability that all his answers were correct? (Assume that he guessed at random for the remaining four quotations, and guessed at random between the two candidates for the quotation that came from one of the two.)

EXERCISE 1.47

In the "Let's Make a Deal" setting in Example 1.19, suppose that the emcee behaves as follows:

• When the player has chosen the wrong door, the emcee opens the one remaining door that has no prize behind it.
• When the player has chosen the correct door, the emcee always opens the leftmost of the two remaining doors. (If these two remaining doors are A and B, he selects A; A and C ➔ A; B and C ➔ B.)

(i) Given this policy by the emcee, what should the player do if he chose A and the emcee chose B?
(ii) If he chose C and the emcee chose B?
(iii) Are there any situations in which the player derives benefit from not switching?
(iv) Is it still true that a player who follows the "always switch" strategy has a 2/3 chance of winning?

EXERCISE 1.48

In the "March Madness" basketball tournament in the United States, two pairs of teams meet in the next-to-last of the "brackets." Then the winners of these games face each other in the final bracket. The winner of that game is the victor in the tournament. Suppose that two teams (East and West) are to compete in the next-to-last bracket, as are North and South. Experts in sports analytics have the following probability estimates about the outcomes for various combinations of teams:

                Probability It Would Win Against:
Team      East      West      North     South
East      -         ?         ?         ?
West      2/5       -         ?         ?
North     1/2       3/5       -         ?
South     3/5       3/5       11/20     -

(i) Assuming that there are no tie games, there is enough information above to replace all the ?'s with numbers. What are they?
(ii) For each team, find the probability that it will win the tournament. (To win, it must first win its next-to-last bracket and then defeat the winner of the other next-to-last bracket.)
(iii) Suppose that it might be possible to change the competitors in the next-to-last bracket (so that (say) East could play South). Could such a rearrangement change the probability that any given team would win the tournament? Briefly but specifically discuss.
(iv) Would it be possible for South and East to come up with a joint proposal for changing the competitors in the next-to-last bracket under which both of them would have a higher probability of winning the tournament? If so, identify the proposal; if not, make clear why such a proposal is impossible.

EXERCISE 1.49

A medical test for malaria is subject to both false positive and false negative errors. Given that a person has malaria, the probability the test will fail to reveal it is .06. And, given that a person does not have malaria, the chance is .09 that the test will suggest the opposite.

(i) Given that Mendel has malaria, what is the chance that his test result will reflect it?
(ii) From earlier information, a physician estimates that Mendel has a 70% chance of suffering from malaria. Based on this estimate, what is the chance Mendel's test result will indicate malaria?
(iii) Given that his test result does indicate malaria, what revision should be made to the original 70% probability that Mendel suffers from the disease?

EXERCISE 1.50

During a financial panic, it is learned that Bank A has a .4 chance of collapsing in the next month and that, independently, Bank B has a .3 chance of failing. Bank C, which has heavy deposits in both A and B, has a .5 probability of failing if one of those banks collapses and a .7 chance if both of them fall apart. Otherwise, Bank C will stay afloat.

(i) List the eight possible outcomes of the panic (with respect to the fates of the three banks).
(ii) Find the probability of each of the events identified in (i).

Taking into account your answer to (ii):

(iii) What is the probability that all three banks survive the panic?
(iv) What is the probability that at least two of the banks go under?
(v) Mendel hears the alarming news about the risks of collapse and immediately transfers his savings from Bank A to Bank C. What is the chance he will derive positive benefit from this move? (Neglect interest payments at such a critical time; the banks surely will.)
(vi) What is the probability Mendel will be worse off because of the transfer?

EXERCISE 1.51

During the 1990's, US jets performed 65 million flights. The proportions killed in fatal accidents among these flights were:

Year    Location          Percentage of Passengers Killed
1990    Detroit           20%
1991    Los Angeles       27%
1991    Colorado          100%
1992    New York          53%
1994    North Carolina    71%
1994    Pittsburgh        100%
1995    Colombia          97%
1996    Florida           100%
1996    New York          100%
1996    Florida           1%
1999    Arkansas          6%

There were no deaths in terrorist attacks on US aviation in the 1990's. Given these statistics, what was the death risk $q$ per randomly-chosen US jet flight in the 1990's?

EXERCISE 1.53

Suppose that it is known that a given test for lead poisoning has a zero false negative rate, and that one per cent of the people who are tested for lead poisoning at a given emergency room actually have it. However, because the test has a false-positive rate, the percentage of people for whom the test comes back positive is not one per cent, but instead two per cent. (The false positive rate is the probability that someone who does not suffer lead poisoning yields a positive test result.) If $q$ is the false positive rate of the test, then the value of $q$ is:

(A) one per cent
(B) 50%
(C) 1/99
(D) cannot be determined from the given information

Choose an answer, and briefly defend it.

EXERCISE 1.54

In one game of the Massachusetts State Lottery, the player picks six integers in the range 1-42. Then six integers in that range are chosen at random. If they exactly correspond to the player's numbers (even if they weren't selected in the order in which he listed them), then he wins the Jackpot.

(i) If Mendel buys one ticket, what is the probability that he will win the Jackpot? (Hint: What is the chance that his first number is selected? And his second?)
(ii) Suppose that one million other tickets were purchased for that lottery, and assume that they can be treated as randomly distributed among all possible combinations of choices. (For example, each of the tickets independently has the same chance of containing (1,2,3,4,5,6) as (5,11,22,25,34,41) or any other combination.) What is the probability that, if Mendel wins the Jackpot, he will have to share the winnings with at least one other person? (Hint: Complement.)
(iii) After the drawing, Mendel hears on the radio that there were one or more Jackpot winners, but he does not hear what the winning numbers were. Given that information (and the fact that there were one million tickets other than his), what is the probability that he won the Jackpot?

EXERCISE 1.55

A study shows that, of the products that recently received patents, 12% later became commercially successful. The study also shows that 20% of these patented products involved biotechnology and that 30% of these biotechnical products were commercially successful.

(i) If Mendel had invested in four recently-patented products that he chose at random, what is the probability that all four were commercially successful?
(ii) Knowing that all four of his randomly-chosen products were commercially successful (and nothing else about his choices), what is the chance that they all involved biotechnology?
(iii) Let $q$ be the fraction of recently patented nonbiotechnological products that were successful. Given the information above, what is the numerical value of $q$? (HINT: Express the overall success rate for recently-patented products in terms of $q$.)

EXERCISE 1.56

When a certain machine breaks down, it must be repaired. But the quality of the repair is variable, which affects the time to the next breakdown:

• The probability of a perfect repair is 0.5. If perfectly repaired, the machine goes exactly twelve months until its next breakdown.
• The probability of an adequate repair is 0.3. If adequately repaired, the machine goes exactly nine months until its next breakdown.
• The probability of a poor repair is 0.2. If poorly repaired, the machine goes exactly six months until its next breakdown.

The time to complete a repair is zero, and a repair occurs the instant the machine breaks down. Different repairs of the machine over time are independent in quality. Assume that, after a repair, one cannot judge its quality (and thus cannot try to improve a poor repair).

(i) Suppose that the machine has just been repaired, and let $X$ be the time until the next repair. What values can $X$ take on with positive probability? What is the probability associated with each such value?
(ii) Let $E$ be the event that the machine is repaired again exactly 24 months from now. One way that $E$ can occur is to have four poor repairs in a row: the repair now, one in six months, one in 12 months, and one in 18 months (i.e., at times 0, 6, 12, and 18). Please list the other ways that $E$ can occur, in terms of exactly when the repairs occur during the next 24 months.
(iii) Suppose that we return in 24 months and find the machine being repaired. Given that circumstance, what is the probability it was also undergoing repair 9 months earlier (i.e., 15 months after the initial repair at $t = 0$)?

EXERCISE 1.57

A new durable good is to reach market next year. (A durable good is one that customers never purchase more than once, because it lasts a very long time.) Its manufacturers are unsure whether to run an advertising campaign on its behalf. They make the following assumptions:

(a) If there is no advertising campaign, then 3% of the potential buyers will purchase the good.
(b) If there is a campaign, then 40% of potential buyers will see the advertisements.
(c) Among those who see the advertisement, 4% will purchase the good.
(d) Among those 60% of potential buyers who do not see the advertisement, 3% will purchase the good.
(e) The profit for each good sold is $100.
(f) The advertising campaign costs $0.50 per potential buyer (whether or not the person purchases the good).

Given all these data, use a decision tree to decide whether the company should run the advertising campaign. Assume that average profit per 1000 potential buyers (subtracting advertising costs if necessary) is the decision criterion.

EXERCISE 1.58

(i) Among people with athlete's foot, 75% of those who took a drug no longer had the condition a week later. Within a control group of other athlete's foot sufferers, 30% no longer had the disease a week later. Given these statistics, estimate p, the probability that the drug cures the condition within a week.
(ii) Suppose that all high-school graduates report that fact on their resumes, while 30% of those who did not graduate high school falsely report on their resumes that they did. If 92% of resumes report high-school graduations, what was the actual graduation rate?


EXERCISE 1.59


Among people who undergo a given cancer treatment, the data about subsequent survival are as follows:

Period   Survive   Die   Lost to Follow Up
1        90        10    20
2        36        14    40
3        28         4     4
4        20         0     8

Under survival-curve methods, what is the estimated probability of surviving three periods with these data? (Those lost to follow up after a given period survived it but provide no data for later periods.)

EXERCISE 1.60

Suppose that animals of a certain kind have genes for brown eyes or blue eyes, with brown being the dominant trait and blue the recessive one. The initial genetic mix for both males and females is 25/50/25, where 50% of the animals are hybrid (one brown gene and one blue one) and the other 50% are equally divided between those with two blue genes and those with two brown ones.

(i) Suppose that each animal mates with another with the same eye color to produce exactly one offspring. What will be the percentage of offspring with each eye color one generation hence?
(ii) Two generations hence?
(iii) Suppose instead that each couple produces two offspring. Will the mix of eye colors be the same as your answer to (i)? Briefly explain.
(iv) Suppose instead that each male mates with one brown-eyed female to produce one offspring. No males mate with blue-eyed females. What will be the mix of eye colors one generation hence? (Males and females are equally numerous, and some brown-eyed females mate with more than one male.)
(v) Two generations hence?

EXERCISE 1.61

Suppose that mortgage A and mortgage B both have a 12% chance of defaulting, and that the probability that B defaults given that A does is 1/2.

(i) Are the defaults by mortgages A and B independent events? Explain.
(ii) What is the probability that A defaults and B does not?
(iii) What is the conditional probability that A defaults given that B does?
(iv) What is the probability that both mortgages default?
(v) Find the probability that exactly one mortgage defaults.
(vi) Now find the probability that neither mortgage defaults.
(vii) Do your answers to (iv), (v), and (vi) add up to one? Should they?


EXERCISE 1.62

In the "poll vault" example 1.31:

(i) Assuming that everyone votes, what is the probability that candidate A wins if \(f_A = .4\)?
(ii) Suppose that the exertions of candidate A on election day increase \(Q_A\), leave \(f_A\) unchanged, and sufficiently energize her opponents that \(Q_B\) increases by the same proportion as \(Q_A\) (i.e., the ratio \(Q_B/Q_A\) remains the same). What is the net effect of these developments on candidate A's share of the vote?
(iii) True or False: If \(f_A/f_B = Q_B/Q_A\), then the outcome on Election Day will be a tie.

EXERCISE 1.63

Jezebel looks at her bridge hand (thirteen cards) and finds she has only one club (the 6, it turns out). She does not yet know anything about her partner's hand or any others:

(i) What is the probability that all the other clubs are in the hands of the other team?
(ii) What is the probability that all but the 9 of clubs are?
(iii) What is the probability that the other team has ten of the twelve remaining clubs?

EXERCISE 1.64

In the "screen test" example, work out P(L | D) if a bag is loaded when and only when it has been approved by both of the first two devices. (In addition to the \(p_i\)'s and \(q_i\)'s defined in the example, let a1 be the conditional probability that the second device erroneously approves a dangerous bag, given that the first device has erroneously approved it. We did not need a1 under the earlier loading rule.) Draw a decision tree as part of your efforts.

EXERCISE 1.65

In the example 1.32 about probability growing on trees:

(i) Which of the following would be more effective in increasing the number of regular users:
• raising the fraction of potential users who see the ad by ten percentage points, or
• raising by a factor of 1.2 the fraction of those who try the product who become regular users? (This factor of 1.2 applies both to those who saw the ad and those who did not.)
(ii) Suppose that it costs c to achieve the ten-percentage-point gain, and d to achieve the factor of 1.2 growth. Which of these strategies costs less per additional regular user? Answer in terms of c and d.
(iii) Is the more cost-effective strategy under (ii) necessarily the best strategy for the company? Briefly discuss.

EXERCISE 1.66

In the "double time" inspection problem 1.29, suppose the policy is changed to inspect every fourth lot of chips. Let X be the number of lots that are found defective on a given inspection.

(i) What values can X take on?
(ii) What are the probabilities that X takes on each of these values?
(iii) Suppose that the machine is found to be defective at a given inspection, but X is not yet known. What is the probability that X = 4?

EXERCISE 1.67

In Framingham, every family has either two or three children. Twins and triplets are nonexistent, each child born is equally likely to be a boy or a girl, and the sex of any child born is independent of that of any older siblings. A behavioralist deduces that a family's total number of children depends on the sex distribution of the first two. The chance that the parents will go on to a third child is r if the first two are of the same sex and s if they are of opposite sexes. Assume that this analysis is correct in the forthcoming exercise.

(i) If a Framingham woman has just had her second child, what is the chance it is her second boy?
(ii) For a randomly chosen family beyond the childbearing ages, what is the probability it has three boys?
(iii) What is the probability it has three kids?
(iv) Given that a family has exactly one boy, what is the probability he was the second child born? (HINT: Consider the five mutually exclusive ways a family can wind up with exactly one boy.)

EXERCISE 1.68

A manufacturer has three machines (A, B, and C) for making bottle tops. The probability is p that any bottle-top produced by machine A or B will not fit the appropriate bottle. For machine C, the corresponding probability is 2p. Before leaving the factory, the bottle-tops produced by all three machines are merged randomly. Each machine produces 1/3 of the total output.

(i) If a bottle-top picked at random from the merged pool is found to be defective, what is the probability it was produced by machine C?
(ii) If a bottle-top picked at random from the pool is found not defective, what is the probability it was produced by machine C?
(iii) Suppose that all 8000 bottle-tops produced one day were examined and 20 were found to be defective. On the basis of this outcome, make a sensible estimate of p. (Hint: Express in terms of p the chance that a randomly-chosen bottle-top is defective.)

EXERCISE 1.69

Those who visit a certain web-site find a banner ad for wine awaiting them. An analysis of data about these visitors finds that:

• 20% of them click onto the ad, and, of those who do:
  • 82% purchase no wine
  • 9% purchase a case of red wine
  • 15% purchase a case of white wine

(Some people purchase both red and white wine, but no one buys more than one case of either.) Please answer the following questions (decision tree optional).

(i) What is the probability that a randomly-chosen visitor to the site will not buy any wine there?
(ii) Given that the visitor purchased no wine, what is the probability that she clicked on the ad?
(iii) Let p be the chance that the visitor purchased red wine only, q be the chance for white wine only, and g for both kinds of wine. Given the numbers above, what are p, q, and g? (Hint: You can write three simultaneous linear equations for these quantities. What is the sum of the three quantities?)
(iv) Estimate the number of cases of wine sold per thousand visitors to the site.

EXERCISE 1.70

Mendel wants to buy a used bike at an on-line auction, and is wondering what to bid. Based on his study of the outcomes of earlier auctions, he (accurately) estimates that the probability P(y) that he will win the bike if he bids $y is given by:

P(y) = .00005y^2 + .005y for y ≤ 100, and P(y) = 1 for y ≥ 100.

For this exercise, assume that his formula is accurate.

(i) Show that both formulas give the same result for y = 100.
(ii) For what bid would he have a 90% chance of winning?
(iii) Let a be the minimum bid for which Mendel would win. (Of course, he doesn't know what a is; it depends on what other bids have been made.) What is the probability that a ≤ 75? (Hint: This is not hard.)
(iv) Suppose that he bids 75 and wins. What is the probability that he would have won had he bid 60?

EXERCISE 1.71

In a large jungle community, 36% of the residents are zebras, 21% are hyenas, and the remaining 43% are elephants. On the question of whether to impose a "trunk tax" on elephants, 62% of the zebras, 56% of the hyenas, and 6% of the elephants are in favor. All those not in favor are opposed.

(i) If two residents of the community are chosen at random, what is the probability that both are elephants?
(ii) If two residents are chosen at random, what is the probability that they both oppose the trunk tax?
(iii) Given that the first two residents chosen both opposed the trunk tax, what is the probability that at least one was an elephant?

EXERCISE 1.72

Suppose that Soothsayer and Afterthought are two financial services and that, historically, each has been right 3/4 of the time. Suppose further that their past errors have been independent (that the errors did not, for example, tend to be simultaneous). Given that both firms make the same forecast for an upcoming period, what is the probability that this forecast is correct? (HINT: Treat this as a conditional probability problem and consider the frequency with which both firms agree.)

EXERCISE 1.73

The Professor will be traveling by rail from Leeds to Wilmslow, and must change trains en route at both Stalybridge and Stockport. Because his "connection times" are very short, he fears that his arrival in Wilmslow will be nowhere close to schedule. Specifically, he estimates the chance of lateness as 30% for each of the Leeds-Stalybridge and Stalybridge-Stockport trains, and 20% for the Stockport-Wilmslow service. (A late train is behind schedule throughout its trip.) All trains that are not late are on schedule; treat the "promptness status" of each train as independent of the others. At either junction, the Professor will miss his connection if the arriving train is late and the departing one is on schedule; otherwise he'll catch the appropriate train.

(i) What is the probability he'll arrive in Wilmslow on schedule?
(ii) What is the probability he will make both his connections? (Why is this question different from (i)?)
(iii) Given that he arrives in Wilmslow on the train he had hoped to catch, what is the chance he does so on time?

EXERCISE 1.74

Suppose that fraction p of the microchips produced by Life Potato Chips are defective, and that a quality inspector selects ten microchips at random. If a microchip is defective, there is a probability q that he will fail to notice the problem and will declare the microchip acceptable. If the microchip is acceptable, however, there is no possibility that the inspector will erroneously declare it defective.

(i) What is the probability that none of the microchips the inspector selects are defective?
(ii) What is the probability that the inspector will declare the first microchip that he picks to be defective?
(iii) Given that he declares all ten microchips acceptable, what is the probability that all of them actually are acceptable?

EXERCISE 1.75

Returning to the setting of Miller's Dilemma in Example 1.18, suppose that Steve's cards consisted of three Kings, two 6's, a 4, and a 9. Taking into account these seven cards as well as the four his opponent showed (A A A 8), we'll proceed in steps to recalculate the chance that Steve has the winning hand. Given what is known about the eleven cards, what is the probability that:

(i) the opponent has the remaining ace
(ii) the opponent has at least one more 8 but not another ace
(iii) the opponent has a pair of 6's but no more 8's or aces
(iv) the opponent has two or more 9's but no more 8's or aces
(v) the opponent has two or more Q's but no more 8's or aces
(vi) Now, using the answers to (i)-(v), find the probability that the opponent's concealed cards improve his hand in some way. (Numerical answer, please.)
(vii) If your answer to (vi) is correct, it should be slightly higher than the .39 arising from the amnesia calculation. Given the details of Steve's hand, is it surprising that taking account of them improves his opponent's prospects? Discuss briefly.


EXERCISE 1.76


"The pass line is the most fundamental bet in craps; almost every player at the table bets on it. If you only understand one bet in craps, it should be this one. The pass line bet starts with a come out roll. If the come out roll is a 7 or 11, then you win even money. If the come out roll is a 2, 3, or 12, then you lose. If any other total is rolled (4, 5, 6, 8, 9, or 10) that total is called "the point." Then, the shooter will roll the dice until he either rolls that same point again, or a seven. If a seven comes before the point, then you lose. If the point is rolled first, then you win even money. The house edge on the pass line is only 1.41%, which is not bad compared to most other bets on the table and other games in the casino." -From "Craps: Strategy and Odds" at Wizard of Odds.com (In an even-money bet, your money is doubled if you win and you lose all your money if you lose.) Suppose that Minerva makes a pass-line bet, and that the dice are fair. (i)

What is the probability that she wins on the come out roll?

For k = 4, ..., 10, let \(A_k\) be the event that number k comes up on the come out roll and that Minerva eventually wins.

(ii) Show that \(P(A_4) = (3/36) * ((3/36)/(9/36)) = 1/36\)
(iii) Find \(P(A_k)\) for k = 5, ..., 10
(iv) What is the overall chance that Minerva will win?
(v) In light of your answer to (iv), how do you interpret the statement that "the house edge on the pass line is only 1.41%"?
(vi) Suppose that Minerva makes 1000 pass-line bets of $10 each in different craps games over the course of a year in Las Vegas. Approximately how much will she win or lose over the year? What will be her average loss (or gain) per game, as a percentage of her bet of $10?

EXERCISE 1.77

The casino game Sim Bo, which is highly popular in Macau, involves a toss of three six-sided dice. If a player bets "high," she wins if the sum of the three outcomes is in the range (10,17) and the three dice do not all show the same number (a triple). In various steps, we will work out the chance of winning with such a bet. Let S be the sum of outcomes on the three dice.

(i) What is the probability that S = 18?
(ii) What triples involve sums in the range (10,17)?
(iii) What is the probability of getting a triple in the range (10,17)?
(iv) When the dice are tossed, suppose one falls to the left, another to the right, and the third in the middle. Does the outcome (X on left, Y in middle, Z on right) have the same chance of showing up as (7 - X on left, 7 - Y in middle, 7 - Z on right)? If S = X + Y + Z = k, then what is U = (7 - X) + (7 - Y) + (7 - Z)?
(v) Do your answers to (iv) imply that P(S = k) = P(S = 21 - k), where 3 ≤ k ≤ 18?
(vi) Does your answer to (v) imply that P(4 ≤ S ≤ 10) = P(11 ≤ S ≤ 17)?
(vii) Now, put the various responses above together to find the chance of winning with a high bet. Note that it is not necessary to work out a difficult quantity like P(S = 11). (Hint: What is P(S = 3) + P(4 ≤ S ≤ 10) + P(11 ≤ S ≤ 17) + P(S = 18)?)

Discrete Probability Distributions

Double Trouble (Based on a True Story)

The operations manager of the air-freight company was mightily upset. The company flew Boeing-727 freighters with eight compartments for containers. Now the government was considering a rule that would bar loading two heavy containers (each weighing more than 9600 pounds) into adjacent compartments. The fear was that, if the plane encountered an extreme wind gust while in a "double heavy" situation, it could suffer a catastrophic loss of control.

The manager accosted the statistician and began a bitter lament. "If we have to weigh each container before loading it, the effect on our efficiency will be disastrous. And the thing is, only 7% of our containers are heavy. So we rarely have two heavies on the same plane, let alone next to one another. Even if we loaded at random without weighing anything, the chance of being in 'double heavy' mode is infinitesimal. Right?"

"I'm not sure it's infinitesimal," the statistician responded.

More than a little annoyed, the manager snapped, "You're the statistician. Why don't you work it out?"

To be continued

Some applications of the laws of probability are of such widespread practical importance that, rather than "reinvent the wheel" every time they come up, it is worth exploring them in detail and devising general formulas. In this chapter, we introduce the potent concepts of probability distribution and random variable, paying special attention to several widely-used distributions. We illustrate the key ideas with lots of examples. Before we turn to probability distributions, we should familiarize ourselves with a counting formula that we will use repeatedly.


CHAPTER 2

Discrete Probability Distributions

2.1 A Counting Formula

Suppose that there are n people, and a committee consisting of k of them must be chosen. How many distinct sets of k people can be selected? (Two sets are distinct so long as they are not identical: two sets with some overlap in membership are still distinct.) For simplicity, suppose that the committee is chosen at random as follows: From an urn containing n balls, one for each of the n people, k balls are selected one at a time. The people named on these k balls become members of the committee.

To get started, suppose that n = 4 (people) and k = 2 (will be chosen). The four people are imaginatively named A, B, C, and D. Then the two-stage urn selection process can yield twelve outcomes:

(A,B)   (B,A)   (C,A)   (D,A)
(A,C)   (B,C)   (C,B)   (D,B)
(A,D)   (B,D)   (C,D)   (D,C)

But the two outcomes (A,B) and (B,A) yield the same committee: it does not matter whether A precedes B in the drawing or B precedes A. When we eliminate the double-counting in the list above, we have six distinct committees: (A,B), (A,C), (A,D), (B,C), (B,D), and (C,D).

What if n = 8 and k = 3? Well, there are eight possibilities for the first name selected and, for each of these, there are seven possibilities for the second name chosen. Thus, there are 8 * 7 = 56 different pairings from the first two selections. For each of these pairings, there are six possibilities for the third name, meaning that the total number of "triplets" that can be selected is 56 * 6 = 336. As before, however, there are several different sequences in which any given set of three names could come up in the drawing. The three individuals (A,B,C) could have been chosen for the committee in six different sequences:

(A,B,C)   (A,C,B)   (B,A,C)   (B,C,A)   (C,A,B)   (C,B,A)

When every possible committee of three people can arise in six different ways during the drawing, the number of distinct committees that can be formed is not 336, but rather 336/6 = 56. OK?

We can extend this reasoning to general values of n and k (k ≤ n). In choosing k names from the urn, there are n possibilities for the first name selected. Once that person is chosen, there are n - 1 possibilities for the second selection. For each of the n(n - 1) pairings of the first two names, there are n - 2 options for the third; for each of the n(n - 1)(n - 2) triples of the first three names, there are n - 3 possibilities for the fourth. We can continue in this way until the last person is selected: once (k - 1) people have been chosen, the number of possibilities for the kth is n - (k - 1) = n - k + 1. If \(Q_{kn}\) is the number of possible outcomes of the drawing (with names listed in the order they were chosen), then:

\[ Q_{kn} = n(n - 1)(n - 2) \cdots (n - k + 1) \]


Noting that n!/(n - k)! is the product of integers from 1 through n divided by the product of integers from 1 through n - k, we can rewrite \(Q_{kn}\) as:

\[ Q_{kn} = \frac{n!}{(n - k)!} \]

But, as noted earlier, these \(Q_{kn}\) listings include a lot of repetition, in which the same k people arise in different order. We need therefore ask: in how many different sequences could a particular set of k people be chosen in the drawing? Among the drawings that yield these k people, there are k possibilities for which of them came up first; for each such possibility, there are k - 1 options for which one came up second. For each of the k(k - 1) pairings for the first two selected, there are k - 2 options for the third person chosen, and therefore k(k - 1)(k - 2) possibilities for the first three selected. We continue the progression, which ends when k - 1 people have been chosen and there is only one person who can be the last. It follows that \(R_k\), the number of different orderings in which a particular set of k people could be selected, is given by:

\[ R_k = k(k - 1)(k - 2) \cdots (1) = k! \]

The number of distinct committees of k people that can be formed in an urn drawing from a group of size n, denoted \(\binom{n}{k}\), is the number of possible outcomes of the drawing (namely, \(Q_{kn}\)) divided through by \(R_k\), the number of orderings in which each particular set of k names could come up. In short, we get:

The number of distinct "committees" of size k that can be formed from a group of size n follows:

\[ \binom{n}{k} = \frac{Q_{kn}}{R_k} = \frac{n!}{k!\,(n - k)!} \]    (2-1)

For example, \(\binom{8}{3}\) = 8!/(3! * 5!) = 40,320/(6 * 120) = 56, which is the answer we reached above for how many distinct committees of size three can be formed among eight people.

Two further points. While we deduced \(\binom{n}{k}\) by talking about choosing committees at random, the total number of distinct committees of size k from a group of size n is the same regardless of how the selection was done. Also, we can interpret the idea of committees broadly, going beyond committees of human beings. For example, suppose we must choose three dates in April when we will perform some unpleasant chore. The number of possible "committees" of three distinct dates in April (e.g., April 6, April 9, and April 23) is \(\binom{30}{3}\), which is (30 * 29 * 28)/(3 * 2 * 1) = 4060.
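The counting argument can be checked by brute force. The sketch below is only an illustration: the function names `ordered_draws` and `committees` are ours, not the text's. It lists the ordered and unordered draws for the four-person case and then evaluates the formulas for the two larger cases discussed above.

```python
from itertools import combinations, permutations
from math import comb, factorial


def ordered_draws(n, k):
    """Q_kn: number of ordered outcomes when k names are drawn from n."""
    return factorial(n) // factorial(n - k)


def committees(n, k):
    """Distinct committees: Q_kn divided by R_k = k! orderings per committee."""
    return ordered_draws(n, k) // factorial(k)


people = "ABCD"
ordered = list(permutations(people, 2))    # (A,B) and (B,A) counted separately
unordered = list(combinations(people, 2))  # each committee counted once

print(len(ordered), len(unordered))            # 12 6
print(ordered_draws(8, 3), committees(8, 3))   # 336 56
print(committees(30, 3), comb(30, 3))          # 4060 4060
```

The standard-library function `math.comb` computes the same quantity directly; the two-step version is shown only because it mirrors the derivation of (2-1).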


EXAMPLE 2.1


New York Story

In the game "Sweet Million" in the New York State lottery, the player chooses six integers in the range 1-40 and pays $1. Then six such integers are selected in a random drawing and, if the player's six numbers correspond exactly to those chosen, the player wins a $1,000,000 jackpot. (Order does not matter: if the player wrote down (8, 14, 19, 33, 35, 38), there is no problem if the numbers arose in the order (33, 8, 35, 14, 19, 38).)

(i) What is the probability that a particular player will win the jackpot on a given bet?
(ii) Is this a good bet?

Solution:

(i) The number of ways the six-number drawing can turn out is \(\binom{40}{6}\), the number of committees of size 6 that can be created from a group of size 40. We then write:

\[ \binom{40}{6} = \frac{40!}{34! \cdot 6!} = 3{,}838{,}380 \]

Given that all possible committees are equally likely outcomes of the drawing, the chance of winning the jackpot is 1 in 3.8 million.

(ii) Invoking the relative frequency concept of probability, the player would win the jackpot, on average, once in every 3.8 million games. In other words, he would spend $3,800,000 to win $1,000,000. Not exactly a good deal. However, there are smaller prizes for getting (say) four of the six numbers chosen, so the overall winnings every 3.8 million games are somewhat higher than $1,000,000 (though still lower than $3,800,000).

EXERCISE 2.1

New York State also has a Lotto game in which the player picks six numbers from 1 to 59, and wins the jackpot if the six numbers selected match the player's numbers (order is not important).

(i) Make a guess: given the chance of winning a jackpot at Sweet Million, what is the chance of winning at Lotto?
(ii) Now work out the answer for Lotto.
(iii) Now go to the New York Lottery website and see whether you got the same answer as they did.
(iv) Suppose that a game "SuperLotto" required that the player list the six numbers selected in the same order. What would be the chance of winning at SuperLotto?
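For readers who want to check the Sweet Million arithmetic of Example 2.1 numerically, here is a minimal sketch. The helper name `jackpot_probability` is ours, and the calculation deliberately ignores the smaller prizes mentioned in the solution, so the expected return it reports is a lower bound.

```python
from fractions import Fraction
from math import comb


def jackpot_probability(pool_size, picks):
    """Chance that one ticket matches all `picks` numbers drawn from `pool_size`."""
    return Fraction(1, comb(pool_size, picks))


p_sweet_million = jackpot_probability(40, 6)

# Expected jackpot winnings per $1 ticket (smaller prizes are ignored here).
expected_jackpot_return = 1_000_000 * p_sweet_million

print(comb(40, 6))                               # 3838380
print(round(float(expected_jackpot_return), 3))  # 0.261
```

On the jackpot alone, then, each dollar bet returns roughly 26 cents in the long run, which is the numerical version of "not exactly a good deal."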

2.2 Random Variables

We start with a definition:

A random variable is a mathematical function that assigns a number to each elementary event in the sample space.

(To put it more mathematically, a random variable is a "mapping" from the sample space to the real numbers.) Different elements of the sample space are not necessarily assigned different numbers: all that is required is that each elementary event receives exactly one number. The specific number that the random variable yields, based on which elementary event occurred, is called the realization of the random variable. Typically, a random variable is designated by a capital letter like X, while a particular realization is denoted by a small letter like x. The quantity P(X = x) is the probability that the realization of X is x.

EXAMPLE 2.2

Beyond the Die Toss

Suppose that a fair six-sided die is tossed, and that random variable X assigns a number to the elementary event (or outcome) k under the rule X = k(6 - k). Then:

(i) For each elementary event of the sample space, identify the corresponding realization of X.
(ii) What are the realizations that X can take on, and what are the probabilities associated with each?
(iii) Can there be a random variable that simply replicates the actual outcome of the die toss? If so, identify it; if not, explain why not.

Solution

(i) The elementary events (outcomes) in the sample space here are the six numbers that can come up in the die toss. If k is the outcome of the die toss, then the realization of X is k(6 - k). A chart of outcome vs. realization is:

Die Outcome (k)   Realization of X (k(6 - k))
1                 5
2                 8
3                 9
4                 8
5                 5
6                 0

(ii) We know the values that X can take on from (i), and we can work out probabilities for X using the fact that each die outcome from 1 to 6 has probability 1/6. We reach the following chart:

Realization of X (x-value)   P(X = x)
0                            1/6 = P(k = 6)
5                            1/3 = P(k = 1) + P(k = 5)
8                            1/3 = P(k = 2) + P(k = 4)
9                            1/6 = P(k = 3)

(iii) Yes: the rule X = k arranges that the realization of X is equal to the die outcome itself. We would have P(X = x) = 1/6 for x = 1, 2, 3, 4, 5, 6.
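The idea that a random variable is a function on the sample space translates directly into code. In the sketch below (the helper `pmf_of` is a hypothetical name of ours), the rule X = k(6 - k) is applied to each equally likely die outcome, and the probabilities of outcomes that share a realization are added together, exactly as in the chart for part (ii).

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of(rule, outcomes):
    """Probabilities of each realization of `rule` over equally likely outcomes."""
    outcomes = list(outcomes)
    prob_each = Fraction(1, len(outcomes))
    pmf = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for k in outcomes:
        pmf[rule(k)] += prob_each  # outcomes sharing a realization pool their probability
    return dict(sorted(pmf.items()))


die = range(1, 7)
pmf_x = pmf_of(lambda k: k * (6 - k), die)

for x, p in pmf_x.items():
    print(x, p)
# 0 1/6
# 5 1/3
# 8 1/3
# 9 1/6
```

Passing the rule `lambda k: k` instead reproduces part (iii): each of the six realizations then has probability 1/6.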

EXERCISE 2.2

Suppose that a six-sided fair die is tossed, and that random variable X assigns to each outcome the value 1 if the outcome is a perfect square (i.e., the square of an integer) and the value 0 otherwise. What realizations of X are possible, and what is the probability associated with each?

EXAMPLE 2.3

Fit to Eat?

There are 12 sealed cups of strawberry yogurt at a supermarket, and Minerva is about to pick two at random to buy. Unbeknownst to her and the store owner, three of the cups contain yogurt that is sour and inedible. Random variable X assigns to each selection that Minerva might make the number of sour cups that she buys.

(i) What are the elementary events associated with Minerva's purchase?
(ii) How many such events are there?
(iii) How many such events involve two sour cups? Exactly one sour cup?
(iv) What are the realizations of X, and what is the probability associated with each one?

Solution

(i) To make things easier, we number the 12 cups from 1 to 12 based on their locations in the display (e.g., left to right). The elementary events can be denoted by (i,j), where i and j are the two cups that Minerva selects (j > i).

(ii) The total number of (i,j) combinations that could arise is \(\binom{12}{2}\), as we discussed in Section 2.1. Under (2-1), \(\binom{12}{2}\) = 12 * 11/2 = 66. Given that Minerva chose two cups at random, all 66 combinations are equally likely.

(iii) Suppose that \(i_1\), \(i_2\), and \(i_3\) are the numbers of the three cups with sour yogurt (\(i_3 > i_2 > i_1\)). There are only three combinations of two cups that are both sour: (\(i_1\), \(i_2\)), (\(i_1\), \(i_3\)), and (\(i_2\), \(i_3\)). But there are 27 combinations in which exactly one of the two cups is sour, namely, \(i_1\) and one of the nine cups that are OK, \(i_2\) and one of the nine, or \(i_3\) and one of the nine.

(iv) Random variable X has three possible realizations: zero, one, or two. The number of (i,j) pairs in which both cups are OK is 66 - 3 - 27 = 36. Each of the distinct (i,j) combinations has a 1/66 chance of arising. We therefore have the chart:

Realization of X (x)   Probability of Occurrence
0                      36/66
1                      27/66
2                      3/66

EXERCISE 2.3

Suppose that there was only one cup with sour yogurt in Example 2.3. Answer parts (i)-(iv) of Example 2.3 under this new condition.
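The counts in the solution to Example 2.3 can also be confirmed by listing all 66 pairs. The sketch below assumes, purely for illustration, that the sour cups are the ones numbered 1, 2, and 3; the function name `sour_count_pmf` is ours, and any other choice of three labels would give the same distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def sour_count_pmf(n_cups, sour_cups, n_bought=2):
    """Distribution of the number of sour cups in a random purchase."""
    selections = list(combinations(range(1, n_cups + 1), n_bought))
    counts = Counter(len(sour_cups & set(sel)) for sel in selections)
    total = len(selections)
    return counts, {x: Fraction(c, total) for x, c in sorted(counts.items())}


# Assumption for illustration: cups 1, 2, and 3 are the sour ones.
counts, pmf = sour_count_pmf(12, {1, 2, 3})

print(sum(counts.values()))               # 66
print(counts[0], counts[1], counts[2])    # 36 27 3
print(pmf[0], pmf[1], pmf[2])             # 6/11 9/22 1/22
```

The fractions are printed in lowest terms, so 36/66 appears as 6/11, 27/66 as 9/22, and 3/66 as 1/22.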

Shorthand for a Random Variable

Often, in a situation such as Example 2.3, we will see wording like "let random variable X be the number of sour cups that Minerva buys." As we know, however, the random variable is the function that assigns to her selection the number of sour cups it includes, while the actual number she buys is the realization of the random variable. In other words, the shortened description conflates the realization of the random variable with the random variable itself. Yet the description does achieve brevity, and it makes clear how the random variable in question was defined. For these reasons, we will adopt this shorthand in the remainder of this chapter and beyond.

Random variables can generally be classified as continuous or discrete.

Discrete random variable

A discrete random variable X assigns the elementary events in the sample space to a discrete set of (separate) points on the real-number line.

Continuous random variable

A continuous random variable X assigns the elementary events to an entire interval (or intervals) of real numbers.


The random variables discussed in Examples 2.2 and 2.3 are discrete. A random variable that reflects the future temperature at some time and place (i.e., which maps the temperature t to the real line under the rule X = t) is continuous, because the temperature itself could be not only (say) 32°F or 33°F but any real number between 32 and 33. We focus on discrete random variables in this chapter and turn to continuous random variables in the next.

Discrete Probability Distributions

A listing of each possible realization of a discrete random variable and the probability that it will arise is called the probability mass function of the random variable. A discrete probability mass function takes the form:

$$X = \begin{cases} a_1 & \text{w.p. } p_1 \\ a_2 & \text{w.p. } p_2 \\ \;\vdots & \\ a_M & \text{w.p. } p_M \end{cases} \qquad \text{(2-2)}$$

where w.p. = with probability.

We will follow the convention that the $a_j$'s are listed in ascending order. In (2-2) it must be the case that:

$$p_j > 0 \text{ for all } j, \quad \text{and that} \quad \sum_{j=1}^{M} p_j = 1$$

The number of possible realizations of a discrete random variable need not be finite (i.e., M could be $\infty$). For example, suppose that a fair coin is to be tossed, and let random variable X be the number of tosses until "heads" first occurs. (Again, we are using shorthand: X maps each possible sequence into the number of tosses until the first head comes up, while the actual number of tosses is the realization of X.) There is no finite upper limit on X here: for any positive integer k, the probability is $(1/2)^k$ that the first k tosses are all "tails," meaning that X will exceed k. The quantity $(1/2)^k$ obviously gets very small as k gets very large, but it never reaches zero when k is finite.

In many instances, one can express the probability mass function more succinctly than with a huge chart like (2-2). For example, it might be the case that $a_k = k$ for $k \ge 0$ while $p_k$ is some simple function g(k). When that happens, the situation can be described in a single line: P(X = k) = g(k) for nonnegative integer values of k.

Another characterization of a discrete random variable is its cumulative distribution function, $F_X(y)$. For each specific value of y (e.g., y = 4):

$$F_X(y) = P(X \le y) \qquad \text{(2-3)}$$

The cumulative distribution function is defined at all y-values, not only those at which $a_j$'s arise. However, $F_X(y)$ is constant between two adjacent values $a_j$ and $a_{j+1}$ and jumps up at the $a_j$'s themselves (Figure 2.1).

FIGURE 2.1 For a discrete random variable, $F_X(y)$ moves up in steps

EXAMPLE 2.4

More Sour Yogurt

What are the probability mass function and cumulative distribution function for random variable X (the number of cups of sour yogurt that Minerva selects) in Example 2.3?

Solution

Given what we determined in Example 2.3 about X, its probability mass function is:

$$X = \begin{cases} 0 & \text{w.p. } 36/66 \\ 1 & \text{w.p. } 27/66 \\ 2 & \text{w.p. } 3/66 \end{cases}$$

The cumulative distribution function of X, $F_X(y)$, follows:

$$F_X(y) = P(X \le y) = \begin{cases} 0 & \text{for } y < 0 \\ 36/66 & \text{for } 0 \le y < 1 \\ 63/66 & \text{for } 1 \le y < 2 \\ 1 & \text{for } y \ge 2 \end{cases}$$
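The step-function behavior of a cumulative distribution function can be sketched in a few lines. The helper name `make_cdf` is an assumption introduced for illustration; it takes the realizations in ascending order together with their probabilities and returns a function that evaluates P(X ≤ y) at any real y.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

def make_cdf(values, probs):
    """Return F(y) = P(X <= y) for a discrete pmf; values must be ascending."""
    running = list(accumulate(probs))  # p_1, p_1 + p_2, ...
    def cdf(y):
        idx = bisect_right(values, y)  # how many realizations a_j satisfy a_j <= y
        return running[idx - 1] if idx > 0 else Fraction(0)
    return cdf

# The sour-yogurt pmf: 0, 1, or 2 sour cups
F = make_cdf([0, 1, 2], [Fraction(36, 66), Fraction(27, 66), Fraction(3, 66)])
for y in (-0.5, 0, 0.9, 1, 1.5, 2, 10):
    print(y, F(y))
```

The printed values stay flat between realizations and jump at 0, 1, and 2. Exact fractions are used so that the final value is exactly one.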

2.5 The Binomial Distribution

(ii) Each trial has exactly two possible outcomes, A and B, which are complementary.

(iii) P(A), the probability of A, takes the same value p on all n trials. P(B) is likewise fixed at 1 - p.

(iv) The n trials are independent of one another. (For example, what happens on the ith trial has nothing to do with what happened on the (i - 1)st trial or any previous trials.)

In this setting, the individual trials are called Bernoulli trials. Though binomial processes are abundant, so are processes that may seem to be binomial but are not. Several repeated flips of a fair coin create a binomial process, with n = number of flips and p = 1/2. But if we classify several successive days by the dichotomy "rain/no rain," then, even if the probability of rain is the same on all days in a given month, a tendency of rainy days to come in streaks would lead us to revise the chance that it rains tomorrow based on the weather today. Such revision would violate the independence assumption stated in (iv).

In a binomial process, the elementary events are the different sequences of A's and B's that could arise over the n trials (e.g., ABABBAB). Suppose that random variable X assigns to each such sequence the number of A's that it contains. In shorthand, however, we say "let random variable X be the number of A's that come up over the n trials." In practical situations, it is often important to know about X. This discrete random variable X is said to follow a binomial distribution. Clearly, X must be an integer that can neither exceed n nor fall below 0. But we want to move beyond that trivial observation to find the chance that X can take on any particular integer value. Put differently, we want the probability mass function of X, namely, P(X = k) for each k from 0 to n. The quantity P(X = k) varies with both n and p, which are called the parameters of the binomial process. We adopt the following conventions:

If random variable X is the number of A's that arise over a binomial process that has n trials and for which P(A) = p on each one, then we say that X has a binomial distribution with parameters n and p. We use the notation X is B(n,p).

The Probability Mass Function of a Binomial Random Variable

Here we specify the formula for P(X = k) when X is B(n,p). To do so, we first define events $A_i$ and $B_i$ as follows:

$A_i$: the ith trial of the binomial process leads to outcome A. (i = 1, 2, ..., n)

$B_i$: the ith trial of the binomial process leads to outcome B.

We determine P(X = k), starting at the extreme values for k and working our way inwards.

Case 1. X = n and X = 0

These two cases are quite straightforward. The outcome X = n can only arise if event A arises on every one of the n trials. We can therefore write:

$$P(X = n) = P(\text{A's come up on all } n \text{ trials}) = P(A_1 \cap A_2 \cap \cdots \cap A_n)$$

Because the different trials of a binomial process are independent, we have:

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n)$$

Moreover, because $P(A_i) = p$ for all i, we can go on to write:

$$P(X = n) = p^n$$

The case X = 0 is analogous, the only difference being that, instead of getting A's on all n trials, we get all B's. Since each of these B's has a probability 1 - p of arising, a modification of the above analysis yields:

$$P(X = 0) = (1 - p)^n$$

Case 2. X = n - 1 and X = 1

To get X = n - 1, we must get A on n - 1 of the trials and B on the remaining one. There are n mutually exclusive ways of getting all A's but one, depending on where the lone B appears within the n trials:

B A A ... A
A B A ... A
A A B ... A
...
A A A ... B

The n different elementary events for which X = n - 1 in a binomial process with n trials.

Each of these sequences has probability $p^{n-1}(1 - p)$, because there are n - 1 factors of p in the multiplicative expression for the probability and one factor of 1 - p. Adding the probabilities of these mutually exclusive events, we get:

$$P(X = n - 1) = n p^{n-1}(1 - p)$$

By analogous reasoning, we reach:

$$P(X = 1) = n p (1 - p)^{n-1}$$

Case 3. X = k

Finding the probability that X takes a particular value k gets more complicated when k moves towards interior values further away from 0 and n. Even for k = 5, where do the five A's fall among the many trials? The first five trials only? The last five? The 3rd, 9th, 27th, 36th, and 88th?

Actually, we answered this question in our discussion about counting in Section 2.1. For any particular outcome of the n trials that yields A exactly k times, we could list the trial numbers on which A arose (in ascending order). The number of different ways of getting exactly k A's in n trials is equal to the number of distinct combinations of k integers in the range from 1 to n. (Right?) But we showed in Section 2.1 that this number is simply $\binom{n}{k}$, which is $\frac{n!}{k!(n-k)!}$. Furthermore, any particular outcome with k A's and n - k B's in n trials has probability $p^k(1 - p)^{n-k}$. Different sequences that meet this criterion are mutually exclusive events: for any two such sequences, there must be at least two individual trials in which one of them saw an A and the other a B. Putting the pieces together, we can write:

The binomial probability mass function is:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \qquad \text{(2-8)}$$

where $\binom{n}{k} = \frac{n!}{k!(n-k)!}$, $0! = 1$, and $0 \le k \le n$.
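Formula (2-8) translates directly into code. The following is a minimal sketch using Python's standard `math.comb`; the function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative assumptions, not taken from the text.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), following (2-8)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(sum(pmf))                                 # total probability, 1 up to rounding
print(pmf[n], p**n)                             # Case 1: P(X = n) = p^n
print(pmf[n - 1], n * p ** (n - 1) * (1 - p))   # Case 2: P(X = n - 1)
```

The last two lines compare the general formula with the special cases worked out earlier, which should agree up to floating-point rounding.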

EXAMPLE 2.10

One Day Flu?

Suppose that a pharmaceutical company claims that it has a new cure for the flu, and that 96% of flu sufferers will fully recover within 24 hours of taking the drug. As a first test on its claim, a clinic will give the drug to 25 people diagnosed with the flu and then check on their condition 24 hours later. Suppose too that the company's claim is perfectly accurate.

(i) Let random variable X be the number of people tested who will recover from the flu within 24 hours. What can we say about the probability distribution of X?
(ii) What is the chance that all 25 people tested will recover within 24 hours?
(iii) What is the chance that the percentage tested who recover will exactly match the claimed percentage?
(iv) What is the probability that only 80% of the people tested recover within 24 hours?

Solution

(i) If we treat different flu sufferers as independent in terms of responsiveness to the drug (which seems reasonable), then X is B(25, .96). Each patient-test constitutes a trial of the drug with a 96% chance of success (as defined by the 24-hour standard) and 4% chance of failure. With independence, all the conditions for a binomial process are met. (We assume that the 96% statistic includes the possibility that the patient would be cured even without the drug.)

(ii) Here we want P(X = 25), which is given under (2-8) by:

$$P(X = 25) = .96^{25} = .360$$

(iii) If X = 24, then the cure rate is 96%, which exactly matches the claim. We write:

$$P(X = 24) = \binom{25}{24}(.96)^{24}(.04) = \frac{25!}{24!\,1!} \cdot .96^{24} \cdot .04 = .96^{24} = .375$$

(iv) The cure rate is 80% if X = 20. We write:

$$P(X = 20) = \binom{25}{20}(.96)^{20}(.04)^{5} = \frac{25!}{20!\,5!} \cdot .96^{20} \cdot .04^{5} = .0024$$

We see that the chance is about 3 in 4 that the observed cure rate will be 96% or higher, and that the chance of falling far short of the true rate of 96% because of "bad luck" with the patients tested is very small. Of course, these calculations are based on treating the 96% figure as correct. If the true cure rate is well below 96%, then P(X = 25), P(X = 24), and P(X = 20) could be drastically different.

EXERCISE 2.10

Find the probability in Example 2.10 that at least 90% of the flu sufferers are cured within a day.

Binomial Table

Working with (2-8) can get tedious in many situations. Suppose, for example, that n = 33, and we want to find P(X ≤ 14). Finding $\binom{n}{k}$ can be unpleasant when (say) n = 33 and k = 14: who wants to work with 14! and the product of integers between 20 and 33? And, whatever p is, we have to multiply $\binom{33}{14}$ by $p^{14}(1 - p)^{19}$. Furthermore, finding P(X = 14) is just the start of our work: we also have to calculate P(X = k) for every integer k from 0 to 13, and then add all the various probabilities together. Wouldn't it be nice if there were electronic calculators that did the work for us?

Well, there are (which should be no surprise, given the practical importance of binomial calculations). Your instructor will tell you which particular binomial calculators are at your disposal, depending on which statistical package you are using. (Excel works binomial probabilities, as do many hand calculators.) Necessarily, the calculator will require that you specify n and p, so that it knows which binomial distribution is appropriate. Then, you have to make clear what you want to know (e.g., P(X = 3)). Such calculators routinely tell you P(X = k) and P(X ≤ k) for a specified value of k.

If you can find P(X ≤ k) for any k, then you can deduce any probability that interests you:

• Because P(X ≤ k) + P(X > k) = 1, it follows that:

$$P(X > k) = 1 - P(X \le k) \qquad \text{(2-9)}$$

• If we want P(c ≤ X ≤ d) for two integers c and d with d > c, then we can write:

$$P(X \le d) = P(c \le X \le d) + P(X \le c - 1) \;\Rightarrow\; P(c \le X \le d) = P(X \le d) - P(X \le c - 1) \qquad \text{(2-10)}$$

To understand (2-10), note that there are two mutually exclusive ways to get X ≤ d: X can be between c and d, or X can fall below c. (Because X is an integer, its highest value below c is c - 1.)

EXAMPLE 2.11

Flu Cure Revisited

Returning to Example 2.10 about the flu drug that yields a 96% chance of recovery within 24 hours, let X again be the number of such recoveries among 25 flu sufferers who receive the drug.

(i) What is P(X ≤ 20)?
(ii) What is P(X ≥ 23)?
(iii) What is P(21 ≤ X ≤ 24)?
(iv) Suppose that X = 20 in the actual test. Might this result reasonably suggest to those conducting the test that the 96% claim is exaggerated? What are the hazards of concluding that the claim is exaggerated because X = 20?

Solution

(i) Because X is B(25, .96), a binomial calculator yields P(X ≤ 20) = .0028 ≈ 1 in 360. (This is consistent with Example 2.10, where P(X = 20) alone was .0024.)
(ii) P(X ≥ 23) = P(X > 22) = 1 - P(X ≤ 22) = 1 - .0765 = .9235
(iii) P(21 ≤ X ≤ 24) = P(X ≤ 24) - P(X ≤ 20) = .6396 - .0028 = .6368
(iv) If the true cure rate is 96%, the chance is only about 1 in 360 (i.e., .0028) that 80% or fewer patients in a sample of 25 would be cured. Under the claim, in other words, the observed outcome would be highly freakish, so reasonable evaluators might be skeptical of the claim given the evidence. However, freakish events do occur, and rejecting the claim because of the evidence could conceivably be a mistake. We will discuss the dilemmas in evaluating experimental results in the Statistics part of the course.
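In place of a packaged binomial calculator, the cumulative probabilities can be computed by summing (2-8) directly. The sketch below assumes nothing beyond the standard library; the function name `binomial_cdf` is introduced here for illustration. It then applies identities (2-9) and (2-10) to the flu-drug setting with n = 25 and p = .96.

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p), summing (2-8) over j = 0..k."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(0, k + 1))

n, p = 25, 0.96  # the flu-drug setting

at_most_20 = binomial_cdf(20, n, p)                               # P(X <= 20)
at_least_23 = 1 - binomial_cdf(22, n, p)                          # via (2-9)
from_21_to_24 = binomial_cdf(24, n, p) - binomial_cdf(20, n, p)   # via (2-10)

print(f"P(X <= 20)       = {at_most_20:.5f}")
print(f"P(X >= 23)       = {at_least_23:.4f}")
print(f"P(21 <= X <= 24) = {from_21_to_24:.4f}")
```

Summing the mass function term by term is adequate for small n; for very large n a statistical package would be the more practical choice.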

2.9 The Poisson Distribution

0, because the accumulation of events over a period of length T follows a binomial process. Each $\Delta$ can be construed as offering a trial in that process, in which outcome A (an event) has probability $\lambda\Delta$, while the complementary outcome B is having no event. With $X \sim B(T/\Delta, \lambda\Delta)$, we have:

$$P(X = k \text{ over period } T) \approx \binom{T/\Delta}{k} (\lambda\Delta)^k (1 - \lambda\Delta)^{(T/\Delta) - k} \qquad \text{(2-32)}$$

Formula (2-32) is valuable because it points out that a Poisson process can be viewed as a binomial process with a very large number of trials. But the very fact that $T/\Delta$ can get extremely large can be problematic (e.g., if T = one day and $\Delta$ = one second, then $T/\Delta$ = 60 * 60 * 24 = 86,400). And, in any case, (2-32) is an approximation, for it ignores the adjustments essentially proportional to $\Delta^2$ tied to two or more events over increment $\Delta$. It would be nice to have an expression for P(X = k) that is both simpler and exact.

Fortunately, such an expression can be derived. The derivation that the author prefers involves the use of differential equations. While many readers of this book have not studied differential equations, what they learned in first-year Calculus should be enough to understand the derivation. We first derive the formula for P(X = 0 over period T) below; then we present the more general formula for P(X = k over period T).

To find P(X = 0 over period T), suppose first that $P_0(t)$ is the probability of zero events over an interval of length t in a Poisson process with parameter $\lambda$. We have the boundary condition that $P_0(0) = 1$. Now consider $P_0(t + \Delta)$, with $\Delta$ very small. $P_0(t + \Delta)$ is related to $P_0(t)$ by the formula

$$P_0(t + \Delta) = P_0(t)\,(1 - \lambda\Delta + \alpha\Delta^2) \qquad \text{(2-33)}$$

where $\alpha$ can be treated as a proportionality constant, which allows the minuscule $\Delta^2$ effect to be included in the equation rather than ignored. This expression is correct because having no events over a period of length $t + \Delta$ requires no events in the first t time units and also no events in the last $\Delta$. Given independence of different time periods in a Poisson process, we multiply the probabilities of no events in the two nonoverlapping intervals to get the chance of no events over $t + \Delta$. Now, (2-33) can be rewritten as:

$$P_0(t + \Delta) - P_0(t) = -\lambda\Delta P_0(t) + \alpha\Delta^2 P_0(t) \qquad \text{(2-34)}$$

Dividing both sides of (2-34) by $\Delta$ and letting $\Delta \to 0$, we get on the left the quantity $P_0'(t)$, which is the first derivative of the function $P_0(w)$, evaluated at t. On the right-hand side, we get $-\lambda P_0(t) + \alpha\Delta P_0(t)$, which approaches $-\lambda P_0(t)$ in the limit as $\Delta \to 0$. We thus convert (2-34) as $\Delta \to 0$ to the differential equation:

$$P_0'(t) = -\lambda P_0(t) \qquad \text{(2-35)}$$

where $P_0'(t)$ = first derivative of $P_0(t)$, evaluated at t.

(2-35) is a differential equation, because it relates the derivative of $P_0(t)$ to $P_0(t)$ itself. While there are general methods for solving differential equations, we can solve this particular one by asking: what function has a derivative that is a constant multiplied by the function itself? We might recall from Calculus that $e^x$ has the striking property that its derivative is the same as itself, and the correlated result that $\beta e^{\gamma x}$ has a derivative of $\beta\gamma e^{\gamma x}$ when $\beta$ and $\gamma$ are constants. In the present setting, the function $\beta e^{-\lambda t}$ has a derivative of $-\lambda\beta e^{-\lambda t}$. It is therefore a solution to (2-35).

The differential equation is solved for any value of the constant $\beta$. But here we face the boundary condition $P_0(0) = 1$, because there cannot possibly be any events over a period of length zero. Satisfying both (2-35) and the boundary condition requires the choice $\beta = 1$. We thus reach $P_0(t) = e^{-\lambda t}$. One last question: is $P_0(t) = e^{-\lambda t}$ the only possible solution to the differential equation and boundary condition?
The theory of differential equations ensures that the answer to this question is "yes." Thus, for an interval of length T, we can write:

$$P(X = 0 \text{ over period } T) = e^{-\lambda T} \qquad \text{(2-36)}$$

The expression for P(X = 0 over period T) in (2-36) is easy to use. Another approach to finding this probability applies the fact that $\lim_{x \to 0} (1 + x)^{1/x} = e$ to the binomial formula (2-32). If your instructor prefers invoking this limiting formula to using the differential equation, she has no disagreement from this author.
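The limiting route can be sketched in a few lines. Setting k = 0 in the binomial formula (2-32) leaves only the factor for "no event on every trial." The substitution $x = -\lambda\Delta$ is an assumption introduced here purely to match the form of the limit; it gives $T/\Delta = -\lambda T / x$.

```latex
\begin{aligned}
P(X = 0 \text{ over period } T)
  &\approx (1 - \lambda\Delta)^{T/\Delta} \\
  &= (1 + x)^{-\lambda T / x}
     \qquad \text{with } x = -\lambda\Delta \\
  &= \left[ (1 + x)^{1/x} \right]^{-\lambda T} \\
  &\longrightarrow e^{-\lambda T}
     \qquad \text{as } \Delta \to 0 \ (\text{so that } x \to 0)
\end{aligned}
```

The limit agrees with the differential-equation result (2-36).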

One advantage of using a differential equation to find P(X = 0 over T) is that the approach generalizes directly for finding P(X = k over T), as we discuss at the website. The more general expression for P(X = k over period T) is:

In a Poisson process with parameter $\lambda$, the probability of exactly k events over a period of length T follows:

$$P(X = k \text{ over period } T) = \frac{(\lambda T)^k e^{-\lambda T}}{k!} \qquad \text{(2-37)}$$

where k is a nonnegative integer.

This formula is interesting for two reasons. Because it yields a positive probability for any nonnegative integer k, it means that a Poisson random variable in principle can take on an infinite number of possible values. Of course, the probabilities drop off dramatically as k increases beyond a certain point, and the sum of the P(X = k)'s for all k's equals one. Furthermore, (2-37) uses $\lambda$ and T only through their product $\lambda T$. Thus, the probability of (say) three events is the same when $\lambda = 3$ and T = 1 as it is when $\lambda = 1$ and T = 3.

EXERCISE 2.26

If P(X = 1) = P(X = 0) in a Poisson process with parameter $\lambda$ observed for a period of length T, what is $\lambda$ in terms of T?

We should play with (2-37) a bit to get familiar with it.
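One way to get familiar with (2-37) is to compare it numerically with the binomial approximation (2-32) as the increment shrinks. The sketch below assumes that T divided by the increment is a whole number; the function names and the values λ = 1, T = 2, k = 3 are illustrative choices, not taken from the text.

```python
from math import comb, exp, factorial

def poisson_pmf(k: int, lam: float, T: float) -> float:
    """P(X = k over period T) from (2-37)."""
    mean = lam * T
    return mean**k * exp(-mean) / factorial(k)

def binomial_approx(k: int, lam: float, T: float, delta: float) -> float:
    """Approximation (2-32): X treated as B(T/delta, lam*delta)."""
    n = round(T / delta)  # number of small increments, assumed to be an integer
    p = lam * delta       # chance of an event in one increment
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam, T, k = 1.0, 2.0, 3  # illustrative values
for delta in (0.5, 0.1, 0.01, 0.001):
    print(delta, binomial_approx(k, lam, T, delta), poisson_pmf(k, lam, T))
```

As the increment gets smaller, the binomial column should settle toward the Poisson value. Swapping the roles of the rate and the period (rate 3 with period 1 versus rate 1 with period 3) leaves `poisson_pmf` unchanged, since only the product enters the formula.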

EXAMPLE 2.25

Poisson Time

Given a Poisson process with $\lambda = 1$, find the probabilities of 0, 1, 2, 3, and 4 or more events if:

(i) T = 1
(ii) T = 2

Solution

(i) For a Poisson process with $\lambda = 1$ and T = 1, we can use a spreadsheet on (2-37) to learn that:

$$P(X = 0) = e^{-\lambda T} = e^{-1} = .37$$
$$P(X = 1) = (\lambda T)e^{-\lambda T} = 1 \cdot e^{-1} = .37$$
$$P(X = 2) = (\lambda T)^2 e^{-\lambda T}/2! = e^{-1}/2 = .18$$
$$P(X = 3) = (\lambda T)^3 e^{-\lambda T}/3! = e^{-1}/6 = .06$$
$$P(X \ge 4) = 1 - (P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)) = .02$$

In this particular Poisson process, 0 and 1 are the most likely outcomes, arising about 3/4 of the time. A graph of P(X = k) vs. k for $\lambda = 1$ and T = 1 looks like:

[Graph: P(X = k) vs. k for $\lambda = 1$ and T = 1; k from 0 to 10 on the horizontal axis, probability from 0 to 0.45 on the vertical axis]

(ii) If we double the length of the period to T = 2 while keeping $\lambda = 1$, the results change:

$$P(X = 0) = e^{-\lambda T} = e^{-2} = .14$$
$$P(X = 1) = (\lambda T)e^{-\lambda T} = 2 \cdot e^{-2} = .27$$
$$P(X = 2) = (\lambda T)^2 e^{-\lambda T}/2! = 4e^{-2}/2 = .27$$
$$P(X = 3) = (\lambda T)^3 e^{-\lambda T}/3! = 8e^{-2}/6 = .18$$
$$P(X \ge 4) = 1 - (P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)) = .14$$

With the period twice as long, we would expect roughly twice as many events as before. It is therefore unsurprising that P(X = 0) and P(X = 1) go down while P(X = 2), P(X = 3), and P(X ≥ 4) go up. The graph of P(X = k) vs. k for $\lambda = 1$ and T = 2 shows how the probability mass is moving outward compared to the earlier case $\lambda = 1$ and T = 1.

[Graph: P(X = k) vs. k for $\lambda = 1$ and T = 2; k from 0 to 10 on the horizontal axis, probability from 0 to 0.35 on the vertical axis]
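The spreadsheet calculation in this example can be reproduced in a short script, which also makes it easy to carry more decimal places. This is a minimal sketch; the helper names `poisson_pmf` and `example_table` are assumptions introduced for illustration.

```python
from math import exp, factorial

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) from (2-37), with mean = lambda * T."""
    return mean**k * exp(-mean) / factorial(k)

def example_table(lam: float, T: float) -> dict:
    """Probabilities of 0, 1, 2, 3 and '4 or more' events."""
    probs = {k: poisson_pmf(k, lam * T) for k in range(4)}
    probs["4+"] = 1 - sum(probs.values())  # complement of the first four terms
    return probs

for T in (1, 2):
    table = example_table(1.0, T)
    print(f"T = {T}:", {k: round(p, 3) for k, p in table.items()})
```

Computing the "4 or more" entry as a complement avoids summing an infinite tail, and it guarantees that the five entries add to one.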

EXERCISE 2.27

Given a Poisson process with $\lambda = 1$, find the probabilities of 0, 1, 2, 3, and 4 or more events if T = 4. Create a graph of P(X = k) vs. k like those in Example 2.25.

The Mean and Variance of a Poisson Random Variable

The mean and variance of a Poisson random variable follow simple rules:

The mean of a Poisson random variable

If X is Poisson with parameter $\lambda$ and period T, then

$$E(X) = \lambda T \qquad \text{(2-38)}$$

The variance of a Poisson random variable

If X is Poisson with parameter $\lambda$ and period T, then

$$\sigma^2(X) = \lambda T \qquad \text{(2-39)}$$

To establish the rule for E(X), we note that:

$$E(X) = \sum_{k=0}^{\infty} k\,P(X = k) = \sum_{k=1}^{\infty} k\,P(X = k) = \sum_{k=1}^{\infty} k\,(\lambda T)^k e^{-\lambda T}/k!$$

$$= \lambda T \sum_{k=1}^{\infty} (\lambda T)^{k-1} e^{-\lambda T}/(k-1)! = \lambda T \sum_{j=0}^{\infty} (\lambda T)^{j} e^{-\lambda T}/j!$$

(where j = k - 1)

But $\sum_{j=0}^{\infty} (\lambda T)^{j} e^{-\lambda T}/j! = 1$ because it is the sum of P(X = j) over all possibilities.

Thus: $E(X) = \lambda T \cdot 1 = \lambda T$
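A numerical check (not a proof) of the mean and variance rules can be sketched as follows. The function name `poisson_moments` and the 200-term cutoff are assumptions made for illustration; the cutoff is more than enough for the small means tried here.

```python
from math import exp

def poisson_moments(mean: float, terms: int = 200):
    """Numerically sum k*P(X=k) and k^2*P(X=k) for a Poisson variable."""
    p = exp(-mean)            # P(X = 0)
    first = second = 0.0
    for k in range(1, terms):
        p *= mean / k         # P(X = k) obtained from P(X = k - 1)
        first += k * p
        second += k * k * p
    return first, second - first**2   # (E(X), variance)

for lam, T in ((1.0, 1.0), (0.02, 5.0), (2.45, 1.0)):
    m, v = poisson_moments(lam * T)
    print(lam * T, m, v)
```

Building each probability from the previous one avoids computing large factorials. In each row the three printed numbers should agree up to rounding.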

EXERCISE 2.28

Verify that $\sigma^2(X) = \lambda T$. (Hint: Use the hint in Exercise 2.12 to rewrite $k^2$.)

When T = 1 in (2-38), we get $E(X) = \lambda$. Thus, $\lambda$ is the average number of events per unit time in the Poisson process. That circumstance allows a simpler interpretation of that process. Suppose that events occur completely at random (with no times more likely than others) but at an average rate of $\lambda$ per unit time. (For example, a city might average 40 births per day, but the timing is random, just as likely between 2 AM and 2:01 AM as between 2 PM and 2:01 PM.) Then those random events occur under a Poisson process with parameters $\lambda$ and T (or, given (2-37), a Poisson process with overall parameter $\lambda T$). We can therefore use (2-37) to find the probability of a particular number of events over a particular period (e.g., the chance of three births between 3 PM and 4 PM). Because events in real life often arise randomly over time, the Poisson formula is of widespread relevance.

Some Applications of the Poisson Distribution

EXAMPLE 2.26

Born Again?

Suppose that births in a large city follow a Poisson process with rate $\lambda$ per minute, and that the last blessed event occurred z minutes ago. What is the probability that the next birth will occur during the next five minutes?

Solution

To answer that question, we define four relevant events:

A: The last birth occurred z minutes ago
B: The next birth will occur during the next five minutes
C: At least one birth occurs over the next five minutes
D: No births occur within the next five minutes

In terms of these four events, the question is asking us to find P(B | A). But C and D are worth contemplating because they make the problem easier to solve. Because of assumption (ii) about the Poisson process, the value of z cannot reveal anything about the timing of the next event; thus, P(B | A) = P(B). That past events in a Poisson process say nothing about future ones is the memoryless property of the process. Memorylessness is sometimes viewed as an intriguing attribute of the Poisson process, but it is embodied in the process by definition.

Because B and C are two different descriptions of the same event, it follows that P(B) = P(C). And because C and D are complementary, we have: P(C) = 1 - P(D). To find P(D), we set T = 5 and use (2-37) to determine P(X = 0). Substitution into the formula yields:

$$P(D) = e^{-5\lambda}$$

Hence $P(B) = P(C) = 1 - P(D) = 1 - e^{-5\lambda}$.

Thus $P(B \mid A) = 1 - e^{-5\lambda}$ for any value of z. If $\lambda = .02$ per minute, for example, then the probability the next birth occurs within five minutes is $1 - e^{-.1} = .095$.
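The calculation in this example reduces to a single expression, sketched below. The function name `prob_birth_within` is an assumption introduced for illustration; note that the time since the last birth, z, is not an argument at all, which is the memoryless property in code form.

```python
from math import exp

def prob_birth_within(lam: float, minutes: float) -> float:
    """P(at least one event in the next `minutes`) = 1 - P(X = 0), using (2-37)."""
    return 1 - exp(-lam * minutes)

# Rate of .02 births per minute, five-minute window
print(round(prob_birth_within(0.02, 5), 3))  # 0.095
```

The same function answers questions about other window lengths by changing the second argument.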

EXERCISE 2.29

Under the Poisson process of Example 2.26, what is the probability that the next birth is between 5 and 10 minutes from now?

EXAMPLE 2.27

Mystery on Ice

Various mathematicians have modeled point scoring in hockey as a Poisson process, in which teams A and B accumulate points under independent processes with rates $\lambda_A$ and $\lambda_B$, respectively. With this model in mind, Professor Richard C. Larson of MIT pondered an aspect of hockey playoff games that surprised him. National Hockey League statistics showed that 28 of 85 recent playoff games went into overtime because the teams were tied at the end of the regulation period. Should such ties arise fully one-third of the time if the Poisson model is correct?

Solution

Over the 85 games in question, the average point score per team per regulation period was 2.45. Both teams in a playoff game are presumably very good; hence, it seems reasonable to assume that their scoring rates per unit time are fairly similar. Indeed, we might as a first approximation assume that their scoring rates per unit time are the same, and treat the regulation period of a playoff game as generating two independent scores, each of which follows a Poisson distribution with parameter 2.45.

Pursuing Professor Larson's question, we could ask (as he did) how often the scores $S_A$ and $S_B$ should be equal, given their distributions. More precisely, if two random variables $S_A$ and $S_B$ are independently Poisson-distributed with rate 2.45, what is the probability that $S_A = S_B$? The two quantities could be equal because both are zero, because both are one, or, in general, because they both equal the same integer k. Thus, considering all the mutually exclusive ways that the scores are equal, we can write:

$$P(S_A = S_B) = P(S_A = 0 \cap S_B = 0) + P(S_A = 1 \cap S_B = 1) + P(S_A = 2 \cap S_B = 2) + \cdots$$

which means under independence that:

$$P(S_A = S_B) = P(S_A = 0)\,P(S_B = 0) + P(S_A = 1)\,P(S_B = 1) + P(S_A = 2)\,P(S_B = 2) + \cdots$$

It follows from (2-37) that:

$$P(S_A = S_B) = (e^{-2.45})^2 + (2.45\,e^{-2.45})^2 + (2.45^2 e^{-2.45}/2)^2 + \cdots$$

Or

$$P(S_A = S_B) = \sum_{k=0}^{\infty} \left(2.45^k e^{-2.45}/k!\right)^2$$

Working with a spreadsheet to sum up the $(2.45^k e^{-2.45}/k!)^2$ terms reveals that the numerical value of $P(S_A = S_B)$ is .19. (Beyond k = 6 or so, the contributions to the sum are utterly negligible.) While it is true that $S_A$ and $S_B$ are both likely to be small integers, the probability is less than one in five that these integers are the same.

Because 19% of 85 is about 16, the 28 ties observed over the 85 playoff games are nearly twice the number expected. Under the Poisson model Professor Larson considered, therefore, the 33% rate of ties is surprisingly high. This outcome raises the possibility that the independence assumption is not realistic. Or perhaps the variability in scores around the overall average of 2.45 is less than is implied by the Poisson distribution.
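The spreadsheet sum in this example can be reproduced directly. The sketch below assumes the same model as the text (two independent Poisson scores with a common mean of 2.45); the function name `tie_probability` and the cutoff of 50 terms are illustrative choices, and the cutoff is far beyond the point where terms matter.

```python
from math import exp, factorial

def tie_probability(mean: float, max_score: int = 50) -> float:
    """P(S_A = S_B) for two independent Poisson scores with the same mean."""
    return sum(
        (mean**k * exp(-mean) / factorial(k)) ** 2  # P(S_A = k) * P(S_B = k)
        for k in range(max_score + 1)
    )

p_tie = tie_probability(2.45)
print(round(p_tie, 3))        # probability of a tie after regulation
print(round(85 * p_tie, 1))   # expected number of ties in 85 games
```

Comparing the expected count with the 28 ties actually observed is what makes the observed rate look surprisingly high under this model.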

EXERCISE 2.30

Under Professor Larson's assumptions in Example 2.27, what is the probability that the two hockey scores in a given playoff game differ by exactly one point? (As a practical matter, we can neglect the possibility that either team scores more than six points.)

EXAMPLE 2.28

Line Item

Another example tied to Professor Larson, this time related to queuing research. Suppose that customers arrive at a single Automatic Teller Machine (ATM) under a Poisson process with mean rate one per minute. The time to complete a service is equally likely to be one minute or one-half minute. Customers are handled in the order first-come first-served; no services are interrupted because of arrivals; the machine serves each customer as soon as it has finished with all previous customers. The owner of the ATM can make inferences about customer waiting times from data about the timings of its transactions.

Suppose that, on a given day, the ATM's transaction data for the period 5:26 PM-5:29 PM looked as follows:

Time (hr/min/sec)    ATM Status
5:26:00              Machine free
5:26:30              Service starts
5:27:00              Service continues
5:27:30              Service ends; new service starts
5:28:00              Service ends; new service starts
5:28:30              Service ends; no new service
5:29:00              Machine free

Let random variable $S_1$ be the number of customer arrivals from just after 5:26:30 to 5:27:30, $S_2$ the number from just after 5:27:30 to 5:28:00, and $S_3$ the number from just after 5:28:00 to 5:28:30. The first arrival in the observation period (which is not counted as part of $S_1$) was at exactly 5:26:30. Some questions:

(i) How many customers were served by the machine over the observation period?
(ii) What is the minimum value $S_1$ could take?
(iii) The values of $S_1 + S_2$ and of $S_3$ are exactly known given the data above. What are they?
(iv) Given the information at hand, what is the conditional probability that $S_1 = 1$?

Solution

(i) The server was free at the beginning and end of the three-minute observation period, and completed three services during the interval. Therefore, a total of three customers arrived over the period (and all were gone when it ended).

(ii) There were no customers in the system just before 5:26:30, when a customer arrival ended the server's free period. When that customer's service ended at 5:27:30, another service started at once, which implies that at least one new customer arrived during the initial service. Hence, $S_1 \ge 1$.

(iii) Given that the server was free once the third service ended, it is clear that $S_3 = 0$. Because there were three customer arrivals in total (the first of which is not counted among the $S_i$'s), it follows that $S_1 + S_2 = 2$.

(iv) The data demonstrate that $S_1 \ge 1$ and that $S_1 + S_2 = 2$. (The $S_i$'s must also be integers.) Thus, if events D and E are defined by:

D: $S_1 = 1$ and E: $S_1 \ge 1$ and $S_1 + S_2 = 2$,

what we seek here is P(D | E). We have the familiar formula:

$$P(D \mid E) = P(D \cap E)/P(E)$$

To achieve both D and E, $S_1$ and $S_2$ must both equal 1 (think about it). Thus,

$$P(D \cap E) = P(S_1 = S_2 = 1) = P(S_1 = 1)\,P(S_2 = 1)$$

= P(one customer arrival over a one-minute interval and one arrival over a separate half-minute interval)

Under (2-37) and independence, we have:

$$P(D \cap E) = \lambda e^{-\lambda}\,(.5\lambda)e^{-.5\lambda} = .5\lambda^2 e^{-1.5\lambda}$$

As for P(E), we have:

$$P(E) = P(E \cap D) + P(E \cap D^c)$$

where $D^c$ = complement of D.

We already know that $P(E \cap D) = .5\lambda^2 e^{-1.5\lambda}$. The event $E \cap D^c$ means that $S_1 \ne 1$, $S_1 \ge 1$, and $S_1 + S_2 = 2$. The only integers satisfying these conditions are $S_1 = 2$ and $S_2 = 0$. Thus:

$$P(E \cap D^c) = P(S_1 = 2 \cap S_2 = 0)$$

But

$$P(S_1 = 2 \cap S_2 = 0) = P(S_1 = 2)\,P(S_2 = 0) = \left[(\lambda^2 e^{-\lambda})/2\right] e^{-.5\lambda} = (\lambda^2 e^{-1.5\lambda})/2$$

In consequence,

$$P(E) = P(E \cap D) + P(E \cap D^c) = \lambda^2 e^{-1.5\lambda}$$

Putting it all together, we reach a surprisingly simple answer:

$$P(D \mid E) = P(D \cap E)/P(E) = (.5\lambda^2 e^{-1.5\lambda})/(\lambda^2 e^{-1.5\lambda}) = 1/2$$

Note that the answer does not depend on $\lambda$. This happens because, in any Poisson process with any rate $\lambda$, the chance of having one event in each of two consecutive intervals (one of unit length and the other of 1/2 unit length) is the same as the probability of two events in the first interval and none in the second. The value of $\lambda$ affects the probability of actually getting two events in 1.5 minutes, with at least one in the first minute. However, once that event is known to have occurred, probabilities conditioned on the event no longer depend on $\lambda$.

Why would this question interest anyone? Well, the owner of the ATM might be concerned about whether people are waiting too long for service. When $S_1 = 2$ and $S_2 = 0$, the second person to arrive between 5:26:30 and 5:27:30 has to wait until the original service ended and then has to wait through the service of the first customer who arrived between 5:26:30 and 5:27:30. If $S_1 = 1$ and $S_2 = 1$, by contrast, then each arriving person goes into service immediately as soon as the customer then using the ATM leaves. Knowing what the transaction data imply about the likelihood of these two possibilities helps in assessing the quality of customer service.
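That the conditional probability is 1/2 regardless of the arrival rate can be checked numerically. The sketch below evaluates the two ways the data could have arisen (one arrival in each interval, or two in the first and none in the second) for several rates; the function names and the rates tried are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) from (2-37), with mean = rate * interval length."""
    return mean**k * exp(-mean) / factorial(k)

def prob_s1_is_one_given_data(lam: float) -> float:
    """P(S1 = 1 | S1 >= 1 and S1 + S2 = 2); S1 spans 1 minute, S2 spans 0.5 minute."""
    both_one = poisson_pmf(1, lam * 1.0) * poisson_pmf(1, lam * 0.5)  # S1 = 1, S2 = 1
    two_zero = poisson_pmf(2, lam * 1.0) * poisson_pmf(0, lam * 0.5)  # S1 = 2, S2 = 0
    return both_one / (both_one + two_zero)

for lam in (0.2, 1.0, 3.0):
    print(lam, prob_s1_is_one_given_data(lam))
```

Each rate changes the numerator and denominator by the same factor, so the ratio stays fixed.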

EXERCISE 2.31

Suppose that the transactional data above were changed at 5:28:30 to "service ends, new service starts," and at 5:29:00 to "service ends, no new service," while the data at all other times remained the same. Now what can we say about $S_1$, $S_2$, and $S_3$? What is the conditional probability that $S_1 = 3$ given the data?

EXAMPLE 2.29

Too Close for Comfort (Based on a true story) Now, a problem about air traffic control. Suppose that two aerial routes-one Eastbound and one Northbound-cross at an altitude of 35,000 feet at junction J (below). In the absence of air-traffic control, suppose that the times at which eastbound planes would arrive at the junction would reflect a Poisson process with parameter A£ (per minute). Likewise, northbound planes would arrive under an independent Poisson process with parameter AN· All planes are jets that move at a speed of 600 miles per hour along their routes.

[Figure: Eastbound and Northbound Air Routes Cross at J]

The US Federal Aviation Administration (FAA) thinks it dangerous if two planes cruising at the same altitude get within 5 miles of one another (in which case they are said to conflict). The idea is that, if a conflict arises, the planes are traveling so fast that they might be within seconds of a collision. With the FAA standard in mind, find the probabilities of three interesting events:

(i) E: the chance that an eastbound plane that has just reached J is in conflict at that moment with a northbound plane
(ii) N: the chance that a northbound plane that has just reached J is at that moment in conflict with an eastbound plane
(iii) EE: the chance that a given eastbound plane that passes through J is at any time in conflict with a northbound plane that passes through J.

Solution

(i) To find P(E), we note that the conflict occurs if, at the time the eastbound plane reaches J, there is a northbound plane within five miles of J. If $E^c$ is the complement of E, then $E^c$ requires that there be no northbound plane within five miles of the junction. It is easier to find $P(E^c)$ than P(E), so we will do so and then invoke the rule $P(E) = 1 - P(E^c)$.

We aren't told anything about planes that are not at the junction, so how can we determine whether a northbound aircraft is within five miles of J? Well, we can exploit the clue that planes travel at 600 miles per hour (which works out to ten miles per minute, or one mile every six seconds). Suppose that a plane is north of J and within five miles of it. Then, the plane must have passed through J within the last thirty seconds. Similarly, if a northbound plane is still south of J but less than five miles away, it will reach J within the next thirty seconds.


Thus, if an eastbound plane reaches J at time t, there will be a conflict at t if any northbound plane passes through J between t - 0.5 (in minutes) and t + 0.5. And there will be no conflict if no northbound plane reaches J over the one-minute interval (t - 0.5, t + 0.5). We can therefore write:

$$P(E^c) = P(\text{no northbound arrivals at } J \text{ over } (t - 0.5,\, t + 0.5)) = e^{-\lambda_N}$$

and thus that $P(E) = 1 - e^{-\lambda_N}$.

(ii) The reasoning is the same as for P(E), with the roles of the two routes interchanged, so we can write:

$$P(N) = 1 - e^{-\lambda_E}$$
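As a quick numerical illustration of parts (i) and (ii), the sketch below evaluates both formulas. The arrival rates are hypothetical values chosen only for illustration, and the function name is an assumption of this sketch.

```python
from math import exp

def conflict_on_arrival(rate_other_route, window_minutes=1.0):
    """P(at least one plane on the other route passes J within the conflict window)."""
    return 1.0 - exp(-rate_other_route * window_minutes)

# Hypothetical rates in planes per minute (not from the text).
lam_E, lam_N = 0.05, 0.10

p_E = conflict_on_arrival(lam_N)  # eastbound plane meets northbound traffic
p_N = conflict_on_arrival(lam_E)  # northbound plane meets eastbound traffic
print(f"P(E) = {p_E:.4f}, P(N) = {p_N:.4f}")
```

With these rates the busier northbound route makes P(E) the larger of the two probabilities.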

It might seem surprising that P(E) and P(N) differ, given that each conflict we are considering involves one eastbound and one northbound plane. If, however, $\lambda_N > \lambda_E$ (for example), then more northbound planes reach J per hour than do eastbound planes. Thus, if equal numbers of northbound and eastbound planes face conflicts, the percentage of conflicts is lower for northbound planes passing through J than for eastbound ones. And P(E) and P(N) reflect these percentages.

(iii) The reader might be wondering about something: what is the difference between P(EE) and P(E)? The definitions of the events differ a bit: E requires that a conflict be in progress when an eastbound plane reaches J; EE requires some east/north conflict, but allows for the possibility that the conflict is already over (or has not yet begun) when the eastbound plane passes through J. Still, does this distinction really matter?

Well, yes. Suppose that, when an eastbound plane arrives at J, there is a northbound plane six miles north of J. The two planes are not then in conflict. But consider the situation twelve seconds earlier, when the northbound plane was four miles north of J and the eastbound plane two miles west of it. The Pythagorean theorem reminds us that the two planes were $\sqrt{20} \approx 4.5$ miles apart at that time (i.e., that they were in conflict, even though they no longer are).

Suppose that a northbound plane is L miles north of J when an eastbound plane reaches J. What was the shortest distance between these two planes? Recognizing that each plane travels at 10 miles per minute, their distance apart w minutes previously is given by:

$$D(w) = \sqrt{(10w)^2 + (L - 10w)^2}$$

To find the value of w at which D(w) is minimized, we can observe that the value of w that minimizes D(w) is the same as that which minimizes $D^2(w) = (10w)^2 + (L - 10w)^2$. We find the minimizing value for $D^2(w)$ by equating to zero its derivative with respect to w. That derivative is $200w - 20L + 200w = 400w - 20L$, meaning that a zero derivative arises when $w = L/20$. (Because the second derivative is positive, we know we have a minimum.) At time $w = L/20$, $D^2(w) = L^2/2$, from which it follows that the minimum distance between the two planes was $L/\sqrt{2}$. (We need not consider times after the eastbound plane reached J, because its subsequent distance from the northbound plane would always exceed L.)

Setting $L/\sqrt{2} = 5$, we conclude that if the northbound plane was less than $5\sqrt{2} \approx 7.07$ miles north of J when the eastbound plane reached J, the two planes are or were in conflict at some time. If the present distance apart is between 5 and 7.07 miles, then the conflict occurred earlier but has ended. By similar reasoning, a conflict will arise in the future if a northbound plane is between 5 and 7.07 miles south of J when an eastbound plane passes through J. At travel speeds of ten miles per minute, an eastbound aircraft reaching J at time t will conflict with any northbound plane that passes through J during the interval (t - 0.707, t + 0.707). It will avoid a conflict if no northbound planes reach J during that interval of 1.414 minutes. In consequence,

$$P(EE) = 1 - P(EE^c) = 1 - e^{-1.414\lambda_N}$$
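The geometry behind part (iii) can be checked by brute force. The sketch below searches over w for the smallest separation and compares it with $L/\sqrt{2}$, then evaluates P(EE). The function names, the grid size, and the sample values of L and $\lambda_N$ are assumptions made for illustration.

```python
from math import exp, hypot, sqrt

SPEED = 10.0  # miles per minute (600 miles per hour)

def separation(w, L):
    """Distance between the planes w minutes before the eastbound plane reaches J,
    when the northbound plane is L miles north of J at that moment."""
    return hypot(SPEED * w, L - SPEED * w)

def min_separation(L, steps=10_000):
    """Smallest separation found on an even grid of w values in [0, L / SPEED]."""
    horizon = L / SPEED
    return min(separation(i * horizon / steps, L) for i in range(steps + 1))

def p_EE(lam_N, radius=5.0):
    """P(an eastbound plane is ever in conflict with a northbound plane)."""
    half_window = radius * sqrt(2) / SPEED  # 0.707 minutes on either side of t
    return 1.0 - exp(-lam_N * 2 * half_window)

L = 6.0  # the six-mile case discussed in the text
print(min_separation(L), L / sqrt(2))  # both close to 4.24 miles, inside the 5-mile limit
print(p_EE(0.10))                      # hypothetical rate of 0.10 northbound planes per minute
```

For any rate, P(EE) exceeds P(E) because the relevant window grows from 1 minute to 1.414 minutes.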

This example assumed the absence of air-traffic control, and random arrivals at J under Poisson processes. In reality, aircraft arrival times at junctions would never be left to chance alone. What these calculations suggest is the frequency at which potentially hazardous situations would arise based on activity levels and operational randomness in the air-traffic system. They therefore indicate the magnitude of the challenge facing air-traffic controllers. The magnificence with which the controllers meet this challenge is suggested by a statistic: between 1990 and 2013, over 11 billion passengers travelled in scheduled commercial aircraft in the United States. The number killed in midair collisions was zero.

EXERCISE 2.32

Suppose that Eastbound jets passing through J travel at speeds of 600 miles per hour, while Northbound planes have speeds of 500 miles per hour. (The discrepancy could arise because Eastbound planes benefit from West-to-East tailwinds, while South-to-North planes receive no comparable benefit.) How would the probabilities P(E), P(N), and P(EE) in Example 2.29 change (if at all) because of this change in Northbound speeds?

This concludes our introduction to discrete random variables, and in particular some of the best known discrete distributions. We turn to continuous random variables in the next chapter.

The Takeaway Bar

Glossary

A random variable is a mathematical function that assigns a real number to each elementary event in the sample space.

The realization of a random variable is the number it actually assumes, based on which elementary event occurs.

A discrete random variable can take on a discrete set of numerical values (e.g., integers) with probabilities that are also specified.

A continuous random variable can take on any value within some interval(s) of real numbers.

The probability mass function of a discrete random variable specifies each realization that can arise with nonzero probability, and the probability associated with each realization.

The cumulative distribution function (CDF) of a discrete random variable X specifies for each number y the probability that the realization of X is at or below y.

The mean of a discrete random variable is a weighted average of its realizations, with the weight of each value equal to the probability assigned to it. Under the Law of Large Numbers, the mean corresponds to the long-term average of outcomes if the experiment that generates the random variable is conducted repeatedly. (The mean is sometimes called the expected value of X.)

The variance of a random variable is a weighted average of squared differences between individual realizations and the variable's mean, with the weight assigned to each squared difference equal to the probability that it arises.

The standard deviation of a random variable is the square root of its variance. The standard deviation is expressed in the same units as the mean, and is a widely used measure of the volatility of the random variable.

The conditional expectation rule expresses the mean of random variable X as a weighted average of its conditional mean values, depending on which of a series of exhaustive events occurs.

The median of a random variable is its 50th percentile (i.e., its halfway mark). Some people prefer the median to the mean as a measure of the "middle" of the distribution.

A memoryless random process has the property that what happened in the past has no relevance to what will happen in the future.

Some Discrete Random Variables

A binomial process involves a series of n independent (Bernoulli) trials, each of which can yield one of two complementary outcomes A and B. P(A), the probability that outcome A arises on a particular trial, is the same for all trials. The binomial random variable X is the number of times that A comes up over the n trials. (More precisely, X assigns to each sequence of trial outcomes the number of A's within that sequence.)

A geometric process likewise involves a series of independent Bernoulli trials, each of which can yield outcome A or its complement B. P(A), the probability that outcome A arises on a particular trial, is the same for all trials. The geometric random variable X is the number of trials until outcome A arises for the first time.

A negative binomial process likewise involves independent Bernoulli trials with outcomes A and B, with its random variable $X_k$ equal to the number of trials until A comes up for the kth time. The geometric distribution is a special case of the negative binomial distribution.

A hypergeometric process is one in which j items are drawn at random without replacement from a finite population that is divided into two distinct groups of known sizes, m and n. The hypergeometric random variable X concerns the number of items selected that come from the first group.

A (time-related) Poisson process is one in which events occur randomly over time, at an average rate of $\lambda$ per unit time. The Poisson random variable X is the number of events that occur over a period of length T.

Key Formulas

• The mean $\mu_X$ of discrete random variable X (also written as E(X)) follows:

$$\mu_X = \sum_{j=1}^{M} a_j p_j$$

where $a_j$ = the jth smallest value that X can assume and $p_j = P(X = a_j)$

• If Z = cX + b where c and b are numerical constants, then:

$$E(Z) = cE(X) + b$$

• Conditional Expectation Rule: If the $A_i$'s are a set of n mutually exclusive and exhaustive events, then:

$$E(X) = \sum_{i=1}^{n} E(X \mid A_i)\, P(A_i)$$

where $E(X \mid A_i)$ is the conditional mean of X given that event $A_i$ occurs


• The variance $\sigma^2(X)$ of discrete random variable X follows:

$$\sigma^2(X) = \sum_{j=1}^{M} p_j (a_j - \mu_X)^2$$

where $a_j$ = the jth smallest value that X can assume and $p_j = P(X = a_j)$

• The standard deviation $\sigma(X)$ of random variable X is the positive square root of its variance: $\sigma(X) = +\sqrt{\sigma^2(X)}$

• If Z = cX + b, then:

$$\sigma(Z) = |c|\,\sigma(X)$$

where |c| is the absolute value of c

Binomial, geometric, negative binomial, hypergeometric, and Poisson random variables can only take on nonnegative integer values.

• If X is binomial with parameters n and p, then:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \quad \text{for } 0 \le k \le n$$

where $\binom{n}{k} = n!/(k!(n - k)!)$

• If X is geometric with parameter p, then:

$$P(X = k) = p(1 - p)^{k-1} \quad \text{for } k \ge 1$$

• If $X_j$ is negative binomial with parameters p and j, then:

$$P(X_j = k) = \binom{k-1}{j-1} p^j (1 - p)^{k-j} \quad \text{for } k \ge j$$

• If X is hypergeometric with parameters (j, m, n), then:

$$P(X = k) = \binom{m}{k}\binom{n}{j-k} \Big/ \binom{m+n}{j}$$

where $0 \le k \le \min(m, j)$


• If X is Poisson with parameters $\lambda$ and T, then:

$$P(X = k) = (\lambda T)^k e^{-\lambda T} / k! \quad \text{for } k \ge 0$$
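The five probability formulas above translate almost directly into code. The following is a minimal sketch using only Python's standard library (`math.comb` requires Python 3.8 or later); the function names and the sample parameter values are illustrative choices rather than anything prescribed by the text.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """P(X = k): outcome A first arises on trial k (k >= 1)."""
    return p * (1 - p)**(k - 1)

def negative_binomial_pmf(k, j, p):
    """P(X_j = k): outcome A arises for the jth time on trial k (k >= j)."""
    return comb(k - 1, j - 1) * p**j * (1 - p)**(k - j)

def hypergeometric_pmf(k, j, m, n):
    """P(X = k): k of the j items drawn come from the group of size m."""
    return comb(m, k) * comb(n, j - k) / comb(m + n, j)

def poisson_pmf(k, lam, T):
    """P(X = k) events over a period of length T at rate lam."""
    return (lam * T)**k * exp(-lam * T) / factorial(k)

# Sanity check: each set of probabilities should add up to (essentially) one.
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
print(sum(hypergeometric_pmf(k, 5, 6, 4) for k in range(6)))
print(sum(poisson_pmf(k, 2.0, 1.5) for k in range(60)))
```

Setting j = 1 in `negative_binomial_pmf` reproduces `geometric_pmf`, which mirrors the statement that the geometric distribution is a special case of the negative binomial.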

Some Major Details about Five Probability Distributions:

Distribution        Parameters    Notation          Mean          Standard Deviation
Binomial            n, p          B(n,p)            np            $\sqrt{np(1 - p)}$
Poisson             $\lambda$, T  P($\lambda$,T)    $\lambda T$   $\sqrt{\lambda T}$
Geometric           p             G(p)              1/p           $\sqrt{1 - p}/p$
Negative Binomial   p, k          NB(p,k)           k/p           $\sqrt{k(1 - p)}/p$
Hypergeometric      j, m, n       H(j,m,n)          jm/(m + n)    Ugly; see (2-30)
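The entries in the table can be compared with means and standard deviations computed directly from the definitions in Key Formulas. The sketch below does this for three of the distributions; the parameter values, the truncation points for the infinite sums, and the helper name `mean_and_sd` are assumptions made for illustration.

```python
from math import comb, exp, factorial, sqrt

def mean_and_sd(pmf):
    """Mean and standard deviation of a discrete variable given as {value: probability}."""
    mu = sum(a * p for a, p in pmf.items())
    variance = sum(p * (a - mu) ** 2 for a, p in pmf.items())
    return mu, sqrt(variance)

n, p = 20, 0.3   # hypothetical binomial and geometric parameters
lam_T = 2.5      # hypothetical value of lambda * T

binomial = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
poisson = {k: lam_T**k * exp(-lam_T) / factorial(k) for k in range(100)}  # truncated sum
geometric = {k: p * (1 - p)**(k - 1) for k in range(1, 400)}              # truncated sum

print(mean_and_sd(binomial), (n * p, sqrt(n * p * (1 - p))))
print(mean_and_sd(poisson), (lam_T, sqrt(lam_T)))
print(mean_and_sd(geometric), (1 / p, sqrt(1 - p) / p))

# Linear transformation Z = cX + b: the mean shifts and scales, the spread scales by |c|.
c, b = -3, 7
transformed = {c * a + b: q for a, q in binomial.items()}
print(mean_and_sd(transformed))
```

In each printed pair the numerically computed values should agree with the closed-form entries from the table to many decimal places.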

Some Parting Thoughts

• The basic laws of probability apply just as well in the domain of probability distributions as they did elsewhere. We can use Bayes' Theorem, for example, to find the conditional probability that a random variable falls in a certain range given certain information about that variable.

• Electronic tables exist for most well-known probability distributions, and they can all but eliminate tedious calculations. These tables arise in statistical packages, as stand-alone tables on the web, or in even mildly sophisticated hand calculators. One has to get used to the eccentricities of particular tables; many of them, for example, use the word "success" when outcome A arises rather than its complement, even though outcome A might be anything but a success (e.g., a disastrous flood).

• Means and standard deviations offer good first descriptions of how random variables behave. But they are only an "executive summary" of what is going on, and not the whole story. The probability mass function and cumulative distribution function offer further detail about a discrete random variable, and allow the calculation of specific probabilities that are often the real focus of interest.

• Many important discrete random processes involve a series of trials, each of which leads either to event A or its complement B. The binomial, geometric, negative binomial and hypergeometric processes all fit this framework. In the binomial, geometric and negative binomial processes, P(A) = p on all trials and different trials have independent outcomes; in the hypergeometric setting, different trials are not independent and P(A) varies across trials based on previous outcomes.


• If events occur completely at random over time except that they occur at a known average rate per unit time, then we are in the domain of the Poisson distribution for the number of events over a given period.

• The conditional expectation rule is a good formula for working out a mean when we know the conditional means that arise over various possibilities.

• In both binomial and hypergeometric processes with moderate or large numbers of trials, the spread of outcomes around their means is generally small relative to the means themselves. That property explains why polling results represent so well the results for entire populations.

• The memoryless property of the geometric and Poisson distributions is a powerful tool in analyzing these processes, a point we should not forget (sorry).

• We should not be upset if our analysis of a random variable generates results that go against our intuition. Rather, we should try to understand why our intuition went astray. If we do that, our intuition will tend to be better the next time.
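The memoryless property mentioned above can be illustrated for the geometric distribution. The sketch below checks that, having already seen s trials without outcome A, the chance of needing more than t further trials equals the unconditional chance of needing more than t trials. The function names and the values of s, t, and p are illustrative assumptions.

```python
def geometric_tail(k, p):
    """P(X > k): none of the first k trials yields outcome A."""
    return (1 - p) ** k

def conditional_tail(s, t, p):
    """P(X > s + t | X > s) for a geometric random variable X."""
    return geometric_tail(s + t, p) / geometric_tail(s, p)

p = 0.2  # hypothetical per-trial probability of outcome A
for s in (0, 5, 50):
    # The conditional tail matches P(X > 3) regardless of the history length s.
    print(s, conditional_tail(s, 3, p), geometric_tail(3, p))
```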

Chapter 2 EZ-Pass

1. Discrete random variable X can take on more than one value. Given this fact, the variance of X:
(A) must be zero (B) must be nonzero (C) might be either zero or nonzero (D) all of these

2. In a discrete random process in which X is a nonnegative integer, P(X = 1) > P(X = 2) is always true if:
(A) X has a binomial distribution (B) X has a Poisson distribution (C) X has a geometric distribution (D) X has a hypergeometric distribution

3. In a memoryless random process, the timing of the next event:
(A) is always known for sure (B) is too unpredictable to allow any probabilistic statements (C) is longer than the time since the last event (D) none of these

4. Independent Bernoulli processes are important in all the following distributions except:
(A) Binomial (B) Geometric (C) Negative Binomial (D) Hypergeometric

5. If a Poisson process with parameter $\lambda$ is observed over two nonoverlapping periods of equal length, the probability that there were more events in the second period than in the first:
(A) must exceed 50% (B) must fall below 50% (C) must equal 50% (D) none of these

6. For a random variable with positive mean, its coefficient of variation:
(A) cannot be zero (B) cannot exceed one (C) cannot exceed the mean (D) none of these

Chapter 2 Further Exercises

EXERCISE 2.33

Minerva has invented a fair eight-sided die. (i) What is the probability of getting two sixes in a row when this die is tossed? (ii) What is the probability of getting an even number on a given toss? (iii) What is the probability that the sum of outcomes on two consecutive tosses is even? (iv) What is the mean outcome for a single toss? (v) What are the variance and standard deviation of the outcome on a single toss? (vi) What is the coefficient of variation of the outcome?

EXERCISE 2.34

Mendel plays spin the bottle, a game that was popular last century. On any given spin, he has a probability p of winning; different spins can be assumed independent. In a total of seven spins:

(i) What is the probability that Mendel wins exactly three times?
(ii) Given that he won exactly three times, what is the probability that he won on both of the first two spins?
(iii) For what value of p does he have a 10% chance of winning exactly three times? (Play with the binomial calculator to get the answer: do not try to solve cubic or quartic equations.)

EXERCISE 2.35

A True/False test with ten questions is to begin in Statistics, but its scoring rules are a bit unusual. If the student chooses "true" or "false," she will get ten points if her answer is correct and will lose six points if her answer is wrong. But if she writes "probably true," she will get seven points if her answer is correct and lose three points if her answer is wrong. The same rules apply if she writes "probably false."

(i) Mendel does not know any of the answers, and will make random true/false guesses on each question based on flipping a coin (heads = true, tails = false). What is the probability that his score will be at least 80%? (Hint: If he gets k answers correct, what is his score in terms of k?)
(ii) Minerva also doesn't know any answers and does her own coin flip. She will answer every question "probably true" or "probably false." What is the probability that her final score is zero?
(iii) Asparagus thinks that the chance is 60% that the answer to the first question is true. In terms of his mean score for the first question, would he be better off writing "true" or "probably true"? Treat his 60% estimate as accurate.

EXERCISE 2.36

Please review Example 2.13, "A World Series." Suppose that the chance that the Maryland Crabs win a game during the series is related to their success during the previous games. More specifically, if k is the number of games played so far in the series and j is the number of Crab wins so far, then the chance the Crabs win the next game is (j + 1)/(k + 2). For example, if they have won the first two out of three games, their chance of winning the fourth is 3/5.

(i) What is the probability that the Crabs win the series in the first four games? Is the probability the same for the Bayous?
(ii) Identify the mutually exclusive sequences by which the Crabs could win on the fifth game, and find the probability of each. Do you sense a pattern?
(iii) Now, work out the probability distribution for random variable X, the number of games that the World Series lasts.
(iv) Compare this distribution with the one for X we found in the text, when we assumed that successes in different games are independent. Do the changes in the distribution make intuitive sense?
(v) Why don't the various games under the current assumptions behave like trials in a binomial process?

EXERCISE 2.37

In the Canadian national election in 2011, the Conservatives won 39.62% of the popular vote, the New Democrats won 30.63%, the Liberals won 18.91%, and minor parties won the remaining votes. Suppose that, just prior to the election, a random sample of 1200 Canadian voters were polled and asked their voting intentions.

(i) What is the probability that the pollster's estimate of the Conservative share of votes would be correct to within two percentage points? (Yes: use an electronic binomial table.)
(ii) What is the probability that the pollster would estimate that the Conservatives would gain at least 40% of the popular vote?
(iii) Given the pollster's estimate of the Conservative vote share is exactly 40%, what is the probability that his estimate of the Liberal share is at most 20%? (Hint: Among voters sampled who do not favor the Conservatives, what fraction support the Liberals?)

EXERCISE 2.38

From the Web:

(i) A study indicates that 4% of American teenagers have tattoos. You randomly sample 30 teenagers. What is the likelihood that exactly 3 will have a tattoo?
(ii) An XYZ cell phone is made from 55 components. Each component has a .002 probability of being defective. What is the probability that an XYZ cell phone will not work perfectly? (Assume independence of different components.)

Courtesy of Professor Hershey Harry Friedman of Brooklyn College

EXERCISE 2.39

Of the packages to be shipped by a particular manufacturing company, 28% are "accidentally" listed as weighing less than they actually do. (The aim is dishonestly to save money.) The freight transporter, which does not know this 28% statistic, decides to check the company's accuracy in weighing by picking 50 packages at random and weighing them. If X is the number of these packages that weigh more than the manufacturer had claimed, then the transporter will estimate the fraction of packages with low weight listings by p = X/50 (e.g., if X = 12, the transporter will assume that 24% of this company's packages weigh more than is listed).

(i) Using a binomial table and the company's true behavior, find the probability that p is exactly correct (i.e., that p = .28).
(ii) What is the probability that p is correct to within ten percentage points?
(iii) Given that p is not exactly correct, what is the probability that p overestimates the manufacturer's true fraction of packages with low weight listings?

EXERCISE 2.40

Please read Example 2.15, "Double Trouble." Suppose that there are six loading positions for containers on a freight plane, and that each container independently has a 10% chance of being heavy.

(i) Find the probability that the plane will be in a "double heavy" situation (as defined in Example 2.15). Give an exact answer; no approximations.
(ii) Find the probability that the plane will be in a double heavy situation in the three loading positions closest to the front of the plane.
(iii) Now find the probability that the plane will be in a double heavy situation in the three loading positions closest to the back of the plane.
(iv) Find the probability that the plane is not in a double heavy situation in either the front three or the back three loading positions, but is in such a situation when all six loading positions are considered. (Hint: This is not hard if you consider how this event can arise.)

EXERCISE 2.41

Suppose that you had to estimate the temperature based on readings from five thermometers, knowing that each one independently has an 80% chance of showing the correct temperature, a 10% chance of being exactly one degree (Fahrenheit) too high, and a 10% chance of being exactly one degree too low.

(i) If you got the five readings (80°, 80°, 81°, 80°, 80°), how would you estimate the temperature? (Assume that your estimate must be an integer.)
(ii) What is the probability that your estimate in (i) is correct?
(iii) Suppose you got the five readings (79°, 81°, 81°, 81°, 81°). What would your estimate be? (Hint: Don't say 81°. That estimate would surely be wrong.)
(iv) What is the probability that your estimate in (iii) is correct?
(v) Suppose in the general case that the true temperature is T°. What is the probability that your decision rule based on the five readings would yield an incorrect answer? (You can assume that T is an integer.)

EXERCISE 2.42

Suppose that we have a series of n independent trials, each of which yields one of three mutually exclusive outcomes A, B, or C. The probabilities of A, B, and C on each trial are $p_A$, $p_B$, and $p_C$, respectively. Let random variable X be the number of A outcomes over the n trials, while Z is the number of B outcomes. (The number of C outcomes is then n - X - Z.) We want an expression for:

$$P(X = k, Z = m), \text{ where } 0 \le k \le n \text{ and } 0 \le m \le n - k$$

(i) What is the probability that the first k outcomes are all A's, the next j are all B's, and the remaining n - j - k are all C's? (Note that $p_C = 1 - p_A - p_B$, and express your answer in terms of $p_A$ and $p_B$.)
(ii) What is the probability that the last k outcomes are all A's, the j previous outcomes are all B's, and the first n - j - k are all C's?
(iii) Is it true that all particular sequences of A's, B's, and C's that yield exactly k A's and j B's have the same probability as the event in (ii)?
(iv) How many different ways can k A's be distributed over n trials?
(v) For each of the individual ways identified in (iv), in how many ways can the m B's be distributed over the remaining n - k trials?

(vi) Once we know the trial numbers of the k A's and the m B's, in how many ways can the n - m - k C's be distributed?

Putting (i)-(vi) together, do you get:

$$P(X = k \cap Z = m) = \binom{n}{k}\binom{n-k}{m}\, p_A^k\, p_B^m\, (1 - p_A - p_B)^{n-m-k}$$

(vii) If the answer to (vi) is yes, congratulations. If not, go back and see what went wrong.
(viii) Show that the answer in (vi) can be simplified to:

$$P(X = k \cap Z = m) = \frac{n!}{k!\,m!\,(n - k - m)!}\, p_A^k\, p_B^m\, (1 - p_A - p_B)^{n-m-k}$$

(ix) This distribution is called the trinomial with parameters $(n, p_A, p_B)$. Briefly explain why this name makes sense.

EXERCISE 2.43

In the British parliamentary election in 2010, the Conservatives got 41.0% of the national vote, the Labour Party got 32.9%, and the Liberal Democrats got 26.1%. (We are neglecting some minor parties.) If a pollster chose 100 voters at random at election time:

(i) Make a guess (to the nearest hundredth) of the probability that the vote shares of all three parties in the poll would all be within two percentage points of their actual vote shares.
(ii) Now, compute the probability of that event to the nearest hundredth, using the formula in Exercise 2.42.

EXERCISE 2.44

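Generalizing the reasoning in Exercise 2.42, we can work out a probability formula for a multinomial distribution. For a series of n independent trials, suppose that events $A_1, \ldots, A_m$ are mutually exclusive and exhaustive, and that $P(A_i) = p_i$ on each trial. Let random variable $X_i$ be the number of times that event $A_i$ occurs over the n trials. Show that:

$$P(X_1 = k_1, \ldots, X_m = k_m) = \frac{n!}{k_1!\,k_2! \cdots k_m!}\, p_1^{k_1} p_2^{k_2} \cdots p_m^{k_m}, \quad \text{where } k_1 + \cdots + k_m = n$$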

EXERCISE 2.45

Among the computer chips produced at a given facility, 2% are mildly defective and an additional 3% are highly defective. The remaining chips are OK. Different chips are independent in terms of their status. If a chip is mildly defective, it costs $20 to repair it to proper condition. For highly defective chips, the cost is $60; for chips that are OK, the cost is zero. Suppose that 40 chips produced at this facility are examined for defects.

(i) What is the probability that exactly two of the chips are defective to some extent? (Hint: What is the probability a given chip is defective to some extent?)
(ii) What is the probability that exactly two chips are defective and that they are examined consecutively (e.g., the third and the fourth chips examined)?
(iii) For the very first chip examined, what are the mean and variance of its repair cost?
(iv) What is the probability that the total repair cost for the 40 chips is exactly $60? (Hint: There are three possible outcomes for each chip, not two. So this is not exactly a binomial process in terms of cost.)

EXERCISE 2.46

A standardized test on vocabulary has 20 multiple-choice questions, with four choices for each question. It was learned in pretesting that, among students like those who will be tested:

25% knew 15 of the words
20% knew 16
20% knew 17
15% knew 18
15% knew 19
5% knew all 20

Suppose that students who take the test will answer correctly for all the words they know and guess completely at random on the words they do not know.

(i) If a student is picked at random among those about to take the test, what is the probability that the student will get exactly 15 answers correct?
(ii) What is the probability that a randomly chosen student will get all 20 answers correct?
(iii) If a student gets all 20 answers correct, what is the probability that she knew all 20 words (and did not depend on lucky guesses)?

EXERCISE 2.47

Suppose that a football league consists of three teams (A, B, C) that each play twelve games each season, six against each of the other two teams. Suppose further that the teams are equally good, and each one has a 50% chance of winning any particular game (while different games are independent). We want to work out the probability that at the end of the season the three teams will be tied, each having won six games. Let random variables W, S, and V be defined by:

W = number of games A wins against B
S = number of games A wins against C
V = number of games B wins against C

(i) In terms of W, S, and V, what are the total numbers of wins for each team?
(ii) If W = 4, what values must S and V take on to achieve a three-way tie over the season?
(iii) What is the probability that W = 4?
(iv) If W = 4, what is the probability of a three-way tie?
(v) Now work out the overall probability of a three-way tie.

EXERCISE 2.48

(i) Returning to Exercise 2.47, suppose that team A is weaker than teams B and C, and only has a 40% chance of winning a game against each (with different games independent). Teams B and C are equally good, with both equally likely to win when they play each other. Now, what is the probability of a three-way tie at the end of the season?
(ii) Returning yet again to Exercise 2.47, suppose that there are four teams that are equally good, and that each one plays each of the others four times over a season. What is the probability of a four-way tie when the season ends?

EXERCISE 2.49

True story: Back in January 1987, the air traveler from Boston to Florida had two options on Eastern Airlines: a $190 round-trip fare with a 50% cancellation penalty, and a $250 fare with a 10% penalty. On January 2, 1987, Minerva, who is about to make a round-trip reservation, estimates that the probability is p that she will have to cancel it.

(i) On that date, what was the probability distribution for the amount of money Eastern would ultimately receive from her under the $190 option? Under the $250 option? (Answer in terms of p.)
(ii) What is the expected amount of money she would ultimately pay Eastern under the two options?
(iii) For what values of p does the $250 option yield the lower expected price? What does this outcome suggest about the preferable option for a typical traveler?

EXERCISE 2.50

In the on-line auction in Exercise 1.70, suppose that Mendel will buy a used bike for $100 locally if his on-line auction bid is not accepted. Recall that, if he bids $b, he estimates the probability he will succeed as $.00005b^2 + .005b$ for $b \le 100$.

(i) Assuming that he bids $75, what is the expected amount that he will wind up paying for the bike?
(ii) Suppose he bids b, where b is between 0 and 100. What is the expected amount he will wind up paying for the bike?
(iii) For what value of b is this expected cost minimized? (Use a spreadsheet, or calculus.)
(iv) Do you think minimizing expected cost is a reasonable objective for Mendel? Briefly explain.

EXERCISE 2.51

The Professor plays the video game Text Twist, in which the player sees six scrambled letters and has to make a six-letter word within two minutes. (Example: HSINWE) He knows from long experience that, in any given round, he has a 98% chance of getting the six-letter word and a 2% chance of not doing so. The outcomes of different rounds are independent.

(i) Let random variable X be the number of rounds until The Professor next fails. What kind of distribution does X have? What is the mean of X? (ii) Let random variable Z be the number of rounds on which The Professor fails among the next 20. What kind of distribution does Z have? What is the mean of Z, and what is P(Z = 0)? (iii) True or false: If The Professor fails exactly once in the next 20 rounds, he is more likely to do so towards the beginning of these rounds than towards the end. Briefly explain. (Hint: This is a conditional probability problem.) (iv) Actually, in Text Twist, the computer will rescramble the letters on command every two seconds: HSINWE, for example, can get changed to WEHISN. If The Professor has no idea what the six-letter word is and keeps rescrambling every two seconds, what is the probability that the correct answer will appear on the screen during the two-minute span? Assume that the six letters are different, that there is only one correct answer, and that different scrambles yield independent answers: it is conceivable, for example, that HSINWE will get scrambled to yield HSINWE again. (Hint: In how many different orderings can the letters HSINWE appear?)

EXERCISE 2.52

Please read Example 2.19, "Mendel's Sure Thing," and answer the following questions:

(i) Suppose that the chance of winning on each of Mendel's "double or nothing" bets is not 50% but rather 49%. What is his probability of winning $1 if he has $1023 when he starts betting? What is his expected return? (ii) If Mendel has 2^Q − 1 dollars and a 49% chance of winning each bet, does his expected return increase as Q goes up? (iii) Does your answer to (ii) make intuitive sense? Briefly explain.

EXERCISE 2.53

While the author first encountered Keno in Las Vegas, it is now part of the lottery in his own Commonwealth of Massachusetts. If the player picks ten integers in the range 1-80 and then 20 numbers in that range are chosen at random, then the chart at the lottery website says the following about winnings on a $1 bet:

Match    Chance of Coming Up    Win
10       1: 8,911,711.18        $100,000
9        1: 163,381.37          $10,000
8        1: 7,384.47            $500
7        1: 620.68              $80
6        1: 87.11               $20
5        1: 19.44               $2
0        1: 21.84               $2

Note that the lottery is chivalrous: if you have the disastrous luck of getting none of your numbers chosen, it will give you a prize!

(i) Verify that the listed probabilities for "zero selected" and "eight selected" are correct. (Please use an electronic hypergeometric table.) (ii) Using both the probabilities and the win amounts that appear in this table, what is the chance that a player will win at least double her $1 bet? (iii) What is the player's expected (mean) winnings on the bet? Do not subtract the $1 it costs to play. (iv) If you were calculating the variance of the player's winnings, would you take any account of the chance that the player loses (or "wins" zero)? If so, how would you do so? If not, are you kidding?

EXERCISE 2.54


In a church with 88 members of the congregation, the pastor wants to ask the members for donations to renovate the church. Before doing so, however, he wants to get a sense of the sentiment on the issue. He will pick eight members of the congregation at random and ask them their views.

Suppose that 75% of the members of the congregation favor the renovations. We want to explore how accurately the views within a sample of size eight will reflect those of the full congregation.

(i) Why is the hypergeometric a better distribution to use in this polling situation than the binomial? (ii) What is the probability that the (percentage) results in the sample will perfectly reflect those in the full congregation? (iii) What is the probability that everyone in the sample will favor the renovations? (iv) What is the probability that no one in the sample will favor the renovations? (v) What is the probability that the sample result will be within ten percentage points of that in the full congregation?

EXERCISE 2.55

Please review Example 2.24, "Under Fire." As we noted, the paper about the analysis reported that 5 of the 44 Manhattan manholes (structures) that experienced serious events in 2007 were among the 1000 ranked most hazardous among the 51,219 manholes. It further reported that 9 of the structures ranked in the top 5000 had serious events in 2007.

(i) Given these results for the top 5000 structures and the top 1000, how many structures with ranks between 1001 and 5000 experienced serious events in 2007? (ii) What is the probability that (exactly) this number of events would have arisen with ranks between 1001 and 5000 if the rankings reflected sheer guesswork?

EXERCISE 2.56

Suppose that random variable X is Poisson distributed with rate λ = 3 and period T = 1/2.

(i) Find P(X = 0), P(X = 1), P(X = 2), P(X = 3), E(X), and σ(X). (ii) Now suppose that Z is Poisson distributed with λ = 3 and T = 1.5. Find P(Z = 0), P(Z = 1), P(Z = 2), P(Z = 3), E(Z), and σ(Z). Do these answers differ from those in (i) in a way that makes general sense? (iii) Suppose that W is Poisson distributed with λ = 4. For what value of T does W have the same distribution as Z in (ii)?

EXERCISE 2.57

The Professor has determined that, when there is no restriction on e-mail, his incoming messages arrive in Poisson manner at an average rate of two per hour. But he is having a dispute with his e-mail system provider, and (correctly) believes that there is a 20% chance that his flow of incoming e-mails will be stopped at 9 AM. When he turns on the computer at 10:30 AM, he sees that no e-mails have arrived since 9 AM.

(i) Given this information, what is the probability that the provider blocked his e-mails starting at 9 AM? (ii) Given this information, what is the probability that at least one message will arrive between 10:30 AM and 10:45 AM? (iii) Suppose that he leaves his office in despair at 10:30 AM, but learns from his assistant at 10:45 that at least one message arrived in the interval 10:30-10:45. Given that information, what is the probability that at least two messages arrived over 10:30-10:45? (Hint: If at least one message arrived, what do we know about whether his provider stopped his service at 9 AM?)

EXERCISE 2.58

Mendel works as a cashier for Trader Jack's grocery store. He is just about to close up and leave when Minerva arrives at his checkout station. Observing the situation, the manager asks Mendel to serve Minerva, and also all other customers who reach his station before Minerva is finished. Then he can go home. Suppose that it takes Mendel exactly one minute to handle each customer, and that customers arrive at his station under a Poisson process with λ = 4/5. Let random variable X be the time (in minutes) until Mendel can go home.

(i) What is P(X = 1)? (ii) What is P(X ~ 5)? (iii) What is E(X)? (iv) What is σ(X)? (v) Suppose that the manager revises the instruction, and says that Mendel should serve all customers who arrive during Minerva's service and, if at least one customer arrives during Minerva's service, all those who arrive during the very first service after Minerva's. Under this revision, what are the answers to (i)-(iii)? (vi) Now suppose that the manager instructs Mendel to serve all customers who arrive after Minerva until the queue of customers drops to zero. For example, if two customers arrive during Minerva's service, he must serve them as well as (say) the three customers who arrive while they are being served. And if two further customers arrive while these three are being served, he must serve them, etc. Find the mean number of customers Mendel will serve after Minerva until he can go home. (Hint: Recall the conditional expectation rule, and also the rule for summing a geometric series.)


EXERCISE 2.59


At a certain maintenance facility, serious errors arise over time under a Poisson process with rate λ = 2.4/year.

(i) What is the probability that the time until the next serious error is more than three years? (ii) What is the probability that there are three serious errors over the next three years AND all of them occur over the first three months? (Assume that all months are exactly 1/12 of a year long.)

For parts (iii) and (iv), assume that when a serious error occurs, workers are extra careful for the next month and there is no chance of a serious error. After that, the usual Poisson process resumes.

(iii) Given that a serious error has just occurred, what is the probability that there are no serious errors over the next five months? (iv) Given that a serious error has just occurred, what is the probability that there is exactly one serious error over the next three years? (Ignore any differences between the last month of the three-year period and the previous months.)

EXERCISE 2.60

(Gruesome example, but based on a real application of the Poisson distribution) Mendel will travel for five miles along a rural road, on which animals killed by cars ("roadkill") can sometimes be seen. On a given day, dead animals are distributed along the road according to a spatial Poisson process with rate λ: in a road stretch of length Δ (Δ very small), the probability of seeing a dead animal is λΔ. On average, dead animals appear once every three miles along this road.

(i) What is the probability that Mendel sees no dead animals on his five-mile trip? (ii) What is the probability that he sees at least one dead animal in each mile he travels (i.e., in the first mile and the second and ... the fifth)? (iii) What is the expected number of dead animals that Mendel sees? (iv) Given that he sees two dead animals in the first mile he drives, what is the expected total number that he sees?

Minerva will make a four-mile trip on the same road: the first mile of her trip is the last mile of Mendel's. She will start her journey on the road just as Mendel is passing, so she drives right behind him.

(v) What is the probability that neither Mendel nor Minerva sees any dead animals? (vi) What is the probability that Mendel and Minerva see the same number of dead animals? (Hint: Review Example 2.27, "Mystery on Ice," and use a spreadsheet to work out the numerical answer.) (vii) Given that Mendel and Minerva see the same number of dead animals, what is the probability that the number is zero?


EXERCISE 2.61


Please read Example 2.28 about ATM queuing. Let random variable X be the number of (the three) customers who waited more than one minute between arrival and the start of service. (i) What is the probability distribution of X? (ii) What are its mean and standard deviation? (iii) Let random variable W be the average of the waiting times for the three customers between arrival and the start of service. Given the information at hand, what are the maximum and minimum values W could take on? (iv) Is P(W ~ 1/2) at least 1/2? (Hint: Look over the last part of Example 2.28.) (v) What is E(W)?

EXERCISE 2.62

Suppose that two aerial routes intersect at J at an angle of 45° as shown. Eastbound planes arrive at J in Poisson manner at rate 12 per hour and travel at 600 miles per hour, while Northeasterly planes arrive at 570 miles per hour in Poisson manner with rate 8 per hour. Planes on the two routes are in conflict if they are within five miles of one another.

(i) At the moment when an Eastbound plane reaches J, what is the probability that it is not in conflict with any Northeasterly plane? (ii) When a Northeasterly plane reaches J, what is the probability that it is not in conflict with any Eastbound plane? (iii) At the moment when an Eastbound plane reaches J, what is the probability that it is in conflict with more than one Northeasterly plane?

Continuous Random Variables

Airline Yield Management

The Vice President of Revenue Management for the large airline was perplexed. The airline had a 5:30 PM flight from Oakland to Orange County in California, which was performed by a 100-seat Embraer 190 jet. Records showed that, on Thursdays, an average of 90 business travelers who would pay the full fare of $190 sought the flight, with some week-to-week random fluctuation in demand. The airline also offered discount tickets for $69, with a strong advance purchase requirement. The question was: how many discount seats should the airline sell for a Thursday flight, not yet knowing how many seats business travelers would want?

"The obvious answer is ten seats," the executive explained to his assistant, "but if we sell ten seats at a discount and business demand that week exceeds 90 seats, we'd be getting $69 for some seats for which we could have gotten $190. Yet if we sell fewer than ten seats at a discount, there will be weeks when business demand is below 90 seats and we'll leave with lots of seats empty. If only there were a scientific way to determine what we should do."

"There is a scientific way," the assistant replied, thinking that he should really be the VP of Revenue Management. "It arises from the concepts of Probability."

To be continued

As we noted in Chapter 2, not all random variables are discrete. A random variable X can also be continuous, in which case it can assume any value in some interval from a to b. (Again, we are using shorthand terminology: we mean that the random variable is a function that assigns elementary events in the sample space to real numbers throughout the continuum from a to b.) There are an uncountably infinite number of real values within that interval and, for a continuous random variable, the probability assigned to any single value is essentially zero. That circumstance does not mean, however, that we cannot speak probabilistically about a continuous variable. It may make little sense to ask the probability that the temperature in Boston at noon next August 15 will be exactly 92° Fahrenheit (to six zillion decimal places). But it is sensible to ask the chance that the temperature will exceed 92°, or that it will fall between 90° and 92°. For continuous random variables, in other words, probabilistic questions are well-posed if they refer to intervals rather than specific values.

3.1 The Cumulative Distribution Function

We have already met the cumulative distribution function for a discrete random variable, though we have rarely used it. For a continuous random variable X and a given number y, we say:

For continuous random variable X, its cumulative distribution function F_X(y) is defined by:

\[ F_X(y) = P(X \le y) \tag{3-1} \]

(For short, F_X(y) is often called the CDF of X.) As before, F_X(y) is the chance that X does not exceed y. The quantity y is not a random variable: it is a numerical constant. For the mid-August Boston temperature, for example, we might be told that F(92) = .85 while F(60) = .02. Those statements would mean that the temperature has an 85% chance of not exceeding 92° but only a 2% chance of not exceeding 60° (and hence a 98% chance of exceeding 60°).

Why are we interested in F_X(y)? The reason is that, if we are told F_X(y) for all values of y, we can deduce any probabilistic quantities about X that we desire. We can learn the chance that X exceeds any number, falls below any number, or falls between any two numbers. If we know F_X(y), we know as much as someone who possesses the probability mass function of a discrete random variable.

To justify this last statement, we start by relating F_X(y), the probability that X is at or below y, to P(X < y). If X is at or below y, one of two mutually exclusive possibilities has arisen: X < y or X = y. Therefore:

\[ P(X \le y) = P(X = y) + P(X < y) \tag{3-2} \]

But for a continuous random variable, P(X = y) is treated as zero, so (3-2) becomes:

\[ F_X(y) = P(X \le y) = P(X < y) \tag{3-3} \]

Equation (3-3) shows that F_X(y) tells us P(X < y) as well as P(X ≤ y), and means that the symbols ≤ and < can be treated as identical in connection with continuous random variables (as can ≥ and >). (Because P(X = y) can exceed zero for discrete random variables, these statements do not apply to them.)

Random variable X must take on some value; it can either exceed y or fail to do so. Hence P(X ≤ y) + P(X > y) = 1, or:

\[ P(X > y) = 1 - P(X \le y) = 1 - F_X(y) \tag{3-4} \]

Thus, F_X(y) can be used to determine the chance that X exceeds any given number. Now consider two numbers c and d, where d > c. The event that X ≤ d can arise in two mutually exclusive ways: X falls below c, or X falls in the range (c,d). In consequence:

\[ P(X \le d) = P(X < c) + P(c \le X \le d) \tag{3-5} \]

But because P(X ≤ d) = F_X(d) by definition and P(X < c) = F_X(c) by (3-3), we can restate (3-5) as:

If d > c, then:

\[ P(c \le X \le d) = F_X(d) - F_X(c) \tag{3-6} \]

(This rule also applies if ≤ is replaced by <.)

... P(X > C_A) = (100 − A)%, where C_A denotes the Ath percentile of X. Perhaps the most famous percentile is the 50th, which is called the median and which competes with the mean as a measure of the "middle" of the distribution. The 95th percentile of X is often used to epitomize a high but not preposterous outcome: X is nineteen times as likely to fall below C_95 as to exceed it (95/5), yet it is not the highest value possible. Likewise, the 5th percentile C_5 is often treated as a low but not bizarre outcome. The distance between the 25th and 75th percentiles of X (i.e., C_75 − C_25) is called the interquartile range of X-values: it has a 50% chance of including X and represents something of a midsection of the distribution. We will work numerical examples momentarily, but first we should meet an important relative of the cumulative distribution function.
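The rules (3-4) and (3-6), together with the idea of a percentile, can be sketched in a few lines of Python. This is only an illustration: the CDF used here, F(y) = y^2 on (0,1), is a hypothetical one that does not come from the text, the function names are invented for the sketch, and the percentile search assumes a continuous, nondecreasing CDF.

```python
from typing import Callable

Cdf = Callable[[float], float]


def prob_exceeds(cdf: Cdf, y: float) -> float:
    """P(X > y) = 1 - F_X(y), as in (3-4)."""
    return 1.0 - cdf(y)


def prob_between(cdf: Cdf, c: float, d: float) -> float:
    """P(c <= X <= d) = F_X(d) - F_X(c) for d > c, as in (3-6)."""
    if d <= c:
        raise ValueError("requires d > c")
    return cdf(d) - cdf(c)


def percentile(cdf: Cdf, a: float, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Bisection search on [lo, hi] for the value y at which F_X(y) = a/100."""
    target = a / 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def example_cdf(y: float) -> float:
    """Hypothetical CDF (an assumption for illustration): F(y) = y^2 on (0, 1)."""
    if y < 0.0:
        return 0.0
    if y > 1.0:
        return 1.0
    return y * y


if __name__ == "__main__":
    print("P(X > 0.5)        =", prob_exceeds(example_cdf, 0.5))
    print("P(0.5 <= X <= 1)  =", prob_between(example_cdf, 0.5, 1.0))
    median = percentile(example_cdf, 50, 0.0, 1.0)
    iqr = percentile(example_cdf, 75, 0.0, 1.0) - percentile(example_cdf, 25, 0.0, 1.0)
    print("median            =", median)
    print("interquartile rng =", iqr)
```

Because everything is computed from the CDF alone, the sketch mirrors the claim that knowing F_X(y) for all y is enough to answer any interval question about X.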

3.2 The Probability Density Function

While the CDF tells us the chance that X doesn't exceed y, it doesn't tell us whether X falls only slightly below y or several miles below. To understand the local behavior of X for a continuous random variable, it is useful to work with its probability density function. The probability density function of X, denoted f_X(y) and often shortened to the pdf, plays a role similar to λ in the Poisson process:

If f_X(y) is the probability density function of continuous random variable X, then:

\[ P(y \le X \le y + \Delta) \approx f_X(y)\,\Delta \quad \text{when } \Delta \text{ is extremely small} \tag{3-8} \]

FIGURE 3.2 A Density Function f_X(y) in which X Shows Up Most Often Near 0, Its Mode

Note that f_X(y) is not itself a probability: it must be multiplied by the length of a very small interval starting at y to yield the chance that X falls in that interval. The quantity f_X(y) must be nonnegative (though it can exceed one). If we plot f_X(y) vs. y, we see from (3-8) that the chance that X falls within a very small Δ starting at y is proportional to the height of the density curve at y. Thus, the peaks of the curve indicate the neighborhoods where X is most likely to arise (Figure 3.2). The place where the density function is maximized is called the mode of X.

If we break the real number line into adjacent segments of length Δ, and then add the probabilities that X falls in each one, the sum must be one. (After all, X must be somewhere, and we are adding the probabilities for all the mutually-exclusive places it can be.) But we might recall that integration in Calculus is a process of summation, which means that, if we use the notation 'dy' rather than Δ for a minuscule increment that approaches zero, we can write:

For any probability density function f_X(y):

\[ \int_{-\infty}^{\infty} f_X(y)\,dy = 1 \tag{3-9} \]

When X is restricted to the finite range (a,b), we can restate (3-9) as follows:

If X is limited to the finite range (a,b), then:

• f_X(y) = 0 outside that range

• \( \int_a^b f_X(y)\,dy = 1 \)    (3-10)
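The summation argument behind (3-9) can be tried numerically. The sketch below assumes a hypothetical density, f(y) = 2(1 − y) on (0,1), chosen only because its height exceeds one near y = 0; the function names are illustrative. It adds up f_X(y)Δ over adjacent segments of length Δ, each probability approximated as in (3-8), and the total approaches one as Δ shrinks.

```python
from typing import Callable


def triangular_pdf(y: float) -> float:
    """Hypothetical density (an assumption): f(y) = 2(1 - y) on [0, 1], so f(0) = 2 > 1."""
    if 0.0 <= y <= 1.0:
        return 2.0 * (1.0 - y)
    return 0.0


def total_probability(pdf: Callable[[float], float], lo: float, hi: float, n_segments: int) -> float:
    """Sum f(y) * delta over n_segments adjacent segments, each starting at y, as in (3-8)."""
    delta = (hi - lo) / n_segments
    return sum(pdf(lo + k * delta) * delta for k in range(n_segments))


if __name__ == "__main__":
    for n in (10, 100, 1000, 100000):
        print(n, "segments ->", total_probability(triangular_pdf, 0.0, 1.0, n))
```

With only ten segments the approximation in (3-8) is crude and the total overshoots one; as the segments become minuscule the total settles at one, even though the density itself is as large as two.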


A major advantage of working with the density function is that it immediately yields formulas for the mean and variance of X. As defined for discrete random variables, the mean is a summation of the form μ = Σ a_i p_i (summed over i = 1, ..., m). Analogously, we define μ for continuous random variable X by multiplying each y-value by the chance that X falls extremely close to it (i.e., between y and y + dy), and then summing up over all possibilities:

When random variable X is continuous, then its mean follows:

\[ \mu = E(X) = \int_{-\infty}^{\infty} y\, f_X(y)\,dy \tag{3-11} \]

Again in analogy with the discrete-variable formula, the variance σ²(X) is given by:

\[ \sigma^2(X) = \int_{-\infty}^{\infty} (y - \mu)^2 f_X(y)\,dy \tag{3-12} \]

The Probability Density Function and the Cumulative Distribution Function

Given the definition of f_X(y), we can write:

\[ P(c \le X \le d) = \int_c^d f_X(y)\,dy \quad (\text{for } d > c). \]

But we have also established in (3-6) that P(c ≤ X ≤ d) = F_X(d) − F_X(c), where F_X(y) is the CDF of X. It follows that:

For continuous random variable X:

\[ \int_c^d f_X(y)\,dy = F_X(d) - F_X(c) \tag{3-13} \]
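As a rough numerical check of (3-11), (3-12), and (3-13), the sketch below uses a hypothetical density f(y) = 3y^2 on (0,1), whose CDF is F(y) = y^3. Neither function comes from the text, and the midpoint-rule integrator is just one simple choice among many.

```python
from typing import Callable


def integrate(func: Callable[[float], float], lo: float, hi: float, n: int = 20000) -> float:
    """Midpoint-rule approximation to the integral of func from lo to hi."""
    step = (hi - lo) / n
    return step * sum(func(lo + (k + 0.5) * step) for k in range(n))


def pdf(y: float) -> float:
    """Hypothetical density (an assumption): f(y) = 3y^2 on [0, 1]."""
    return 3.0 * y * y if 0.0 <= y <= 1.0 else 0.0


def cdf(y: float) -> float:
    """CDF matching the density above: F(y) = y^3 on [0, 1]."""
    if y < 0.0:
        return 0.0
    if y > 1.0:
        return 1.0
    return y ** 3


def mean_of(density: Callable[[float], float], lo: float, hi: float) -> float:
    """Mean via (3-11): integral of y * f(y)."""
    return integrate(lambda y: y * density(y), lo, hi)


def variance_of(density: Callable[[float], float], lo: float, hi: float) -> float:
    """Variance via (3-12): integral of (y - mu)^2 * f(y)."""
    mu = mean_of(density, lo, hi)
    return integrate(lambda y: (y - mu) ** 2 * density(y), lo, hi)


if __name__ == "__main__":
    print("mean     =", mean_of(pdf, 0.0, 1.0))      # exact value is 3/4
    print("variance =", variance_of(pdf, 0.0, 1.0))  # exact value is 3/80
    # (3-13): integrating the pdf from c to d should match F(d) - F(c).
    print(integrate(pdf, 0.2, 0.5), "vs", cdf(0.5) - cdf(0.2))
```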

We could also obtain this equation using the definitions of pdf and CDF and a connection between them that arises from Calculus. We note that:

\[ P(y \le X \le y + dy) = F_X(y + dy) - F_X(y) \approx f_X(y)\,dy \quad (\text{for } dy \text{ very small}) \]

Dividing both sides of this relationship by dy yields:

\[ f_X(y) \approx \frac{F_X(y + dy) - F_X(y)}{dy} \]

As dy → 0, the expression on the right becomes the derivative of F_X(y) with respect to y. In short, the density function at a particular value is the derivative of the cumulative distribution function at that value. We might recall from Calculus that the CDF is therefore the indefinite integral of the pdf.


For a continuous random variable X, the derivative of its CDF F_X(y) with respect to y at any y-value is its probability density function f_X(y) at that value.

Under the circumstances, the relationship

\[ \int_c^d f_X(y)\,dy = F_X(d) - F_X(c) \]

also arises from the Fundamental Theorem of Integral Calculus, which states that the integral of a function between two limits is the change of its indefinite integral between these two limits. The upshot is that pdf and CDF are by no means independent descriptors of the behavior of X; when one of them is known, it directly implies the other.
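The statement that the pdf is the derivative of the CDF can be watched numerically through the difference quotient (F_X(y + dy) − F_X(y))/dy. The sketch below assumes a hypothetical CDF, F(y) = 2y − y^2 on (0,1), whose density is f(y) = 2(1 − y); both are illustrative choices rather than anything taken from the text.

```python
def cdf(y: float) -> float:
    """Hypothetical CDF (an assumption): F(y) = 2y - y^2 on [0, 1]."""
    if y < 0.0:
        return 0.0
    if y > 1.0:
        return 1.0
    return 2.0 * y - y * y


def pdf(y: float) -> float:
    """Derivative of the CDF above: f(y) = 2(1 - y) on [0, 1]."""
    return 2.0 * (1.0 - y) if 0.0 <= y <= 1.0 else 0.0


def difference_quotient(y: float, dy: float) -> float:
    """(F(y + dy) - F(y)) / dy, which should approach f(y) as dy shrinks."""
    return (cdf(y + dy) - cdf(y)) / dy


if __name__ == "__main__":
    y = 0.3
    for dy in (0.1, 0.01, 0.001, 1e-6):
        print(f"dy = {dy:g}: quotient = {difference_quotient(y, dy):.6f}, pdf = {pdf(y):.6f}")
```

For this particular CDF the quotient works out to f(y) − dy, so the gap between the two columns shrinks in step with dy.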

EXAMPLE 3.1

Our First Continuous Distribution

Suppose that random variable X has the following CDF:

\[ F_X(y) = \begin{cases} 0 & \text{for } y < 0 \\ y/2 & \text{for } 0 \le y \le 2 \\ 1 & \text{for } y > 2 \end{cases} \]

(i) What is P(X < 1.5)? (ii) What is P(X > 1.5)? (iii) What is P(0.7 < X < 1.5)? (iv) What is P(4 < X < 5)? (v) What is the 70th percentile of X?

Solution

(i) Given that F_X(1.5) = 1.5/2, we see from (3-3) that P(X < 1.5) = .75.

(ii) From (3-4), we see that P(X > 1.5) = 1 − F_X(1.5) = .25.

(iii) From (3-6), and the facts F_X(1.5) = .75 and F_X(0.7) = .35, we have: P(.7 < X < 1.5) = F_X(1.5) − F_X(.7) = .75 − .35 = .40.

(iv) Because F_X(4) and F_X(5) are both one, (3-6) implies that P(4 < X < 5) must be zero. That outcome makes sense: if it is certain that X does not exceed 4, there could not be a positive probability that X falls between 4 and 5. And F_X(4) = 1 means that the chance is 100% that X does not exceed 4.

(v) C_70, the 70th percentile of X, is the value of y at which F_X(y) = .7. Setting F_X(y) = y/2 = .7, we get C_70 = 1.4. (Clearly, the 70th percentile of the distribution cannot fall below 0 or above 2, so we ignore those regions.)
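The five answers above can be reproduced with a short script. This is only a sketch: the function names are illustrative, and part (v) simply inverts y/2 = .7 by hand rather than using a general-purpose solver.

```python
def cdf(y: float) -> float:
    """CDF from Example 3.1: 0 below 0, y/2 on [0, 2], and 1 above 2."""
    if y < 0.0:
        return 0.0
    if y > 2.0:
        return 1.0
    return y / 2.0


def solve_example_3_1() -> dict:
    """Return the answers to parts (i)-(v), using (3-3), (3-4), and (3-6)."""
    return {
        "i": cdf(1.5),                # P(X < 1.5) = F(1.5)
        "ii": 1.0 - cdf(1.5),         # P(X > 1.5) = 1 - F(1.5)
        "iii": cdf(1.5) - cdf(0.7),   # P(0.7 < X < 1.5) = F(1.5) - F(0.7)
        "iv": cdf(5.0) - cdf(4.0),    # P(4 < X < 5) = F(5) - F(4)
        "v": 2.0 * 0.70,              # solve y/2 = 0.70 for the 70th percentile
    }


if __name__ == "__main__":
    for part, value in solve_example_3_1().items():
        print(f"({part}) {value:.2f}")
```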


EXERCISE 3.1


Your turn. For the CDF in Example 3.1, what is the pdf in the three ranges below 0, 0-2, and above 2? What is the mean of X? What is the variance?

3.3 The Uniform Distribution

When people are asked about some random quantity X, they sometimes say "I'm pretty sure it's between a and b, but I have no idea where it falls in that interval." In effect, these people are assuming that X is equally likely to take on any value in the range (a,b) (where b > a). In that situation, it is said to be uniformly distributed over that range. (In notation, X in this case is said to be U(a,b).) To illustrate how to specify a cumulative distribution function F_X(y) (hereafter CDF) in a particular case, we determine it now when X is uniform.

To find F_X(y) when X is uniform on (a,b), it is convenient to break y-values into three distinct ranges: below a, from a to b, and above b. If y is below a, then F_X(y), the probability that X does not exceed y, must be zero (see Figure 3.3). By similar reasoning, F_X(y) = 1 when y > b (Figure 3.4). The more challenging case is that in which a < y < b. Here X has a nonzero chance of exceeding y but also a nonzero chance of not doing so (Figure 3.5). To work out F_X(y) precisely, we use a geometric argument: since X is equally likely to be anywhere in (a,b), its chance of falling at or below y is equal to the fraction of the line between a and b that falls between a and y (i.e., on the orange portion of Figure 3.5). But that fraction is just (y − a)/(b − a). Hence F_X(y) = (y − a)/(b − a) for a < y < b. Summarizing, we can write:

If X is U(a,b), then its CDF is:

\[ F_X(y) = P(X \le y) = \begin{cases} 0 & \text{for } y < a \\ \dfrac{y - a}{b - a} & \text{for } a < y < b \\ 1 & \text{for } y > b \end{cases} \tag{3-14} \]

FIGURE 3.3 If X must fall between a and b, then X can't fall below y here so F_X(y) = P(X ≤ y) = 0.

FIGURE 3.4 If y exceeds b, then it is certain that X falls below y so F_X(y) = P(X ≤ y) = 1.

FIGURE 3.5 The chance that X doesn't exceed y here is the chance it falls on the orange part of (a,b) rather than the blue part.

When X is U(a,b), a graph of F_X(y) vs. y follows:

FIGURE 3.6 The CDF of X when X is U(a,b)

Note, by the way, that random variable X in Example 3.1 was U(0,2).
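The three-range formula (3-14) translates directly into code. The sketch below also compares it with the fraction of simulated U(a,b) draws that fall at or below y; the simulation uses Python's standard `random` module with a fixed seed, and the interval (2,10) and all function names are illustrative assumptions.

```python
import random


def uniform_cdf(y: float, a: float, b: float) -> float:
    """CDF of U(a, b) from (3-14): 0 below a, (y - a)/(b - a) in between, 1 above b."""
    if b <= a:
        raise ValueError("requires b > a")
    if y < a:
        return 0.0
    if y > b:
        return 1.0
    return (y - a) / (b - a)


def empirical_cdf(y: float, a: float, b: float, n_draws: int = 100000, seed: int = 0) -> float:
    """Fraction of simulated U(a, b) draws that do not exceed y."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if rng.uniform(a, b) <= y)
    return hits / n_draws


if __name__ == "__main__":
    a, b = 2.0, 10.0  # hypothetical interval, chosen only for illustration
    for y in (1.0, 4.0, 8.0, 12.0):
        print(y, uniform_cdf(y, a, b), empirical_cdf(y, a, b))
```

The simulated fractions should sit close to the formula's values, which is the geometric argument restated: the chance of landing at or below y is the share of the interval (a,b) lying to the left of y.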

EXAMPLE 3.2

A Quickie

Suppose that X is U(1,9). What are P(X < 1.2) and P(0.5 < X ... b. In the range (a,b), we need simply differentiate F_X(y) = (y − a)/(b − a), a process that yields the derivative 1/(b − a). In short:


When X is U(a,b), its pdf f_X(y) follows:

\[ f_X(y) = \begin{cases} 0 & \text{when } y < a \\ \dfrac{1}{b - a} & \text{when } a \le y \le b \\ 0 & \text{when } y > b \end{cases} \tag{3-15} \]

When X is U(a,b), the graph of its pdf follows:

FIGURE 3.7 If X is U(a,b), then its probability density function follows the red line.

If we use (3-11) to find the mean of X, we find that E(X) = (a + b)/2. That result is not a surprise: we would expect the long-term average of outcomes spread uniformly between a and b to be (a + b)/2, the middle of the interval from a to b. Use of (3-12) yields the result σ²(X) = (b − a)²/12, meaning that the standard deviation of X follows:

\[ \sigma(X) = \frac{b - a}{\sqrt{12}} \approx .29(b - a). \]

It seems intuitive that the standard deviation of X should be proportional to b − a, but it is not obvious that the proportionality constant is 0.29. But that is why we perform calculations rather than rely on our guesses.
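The constant 0.29 can be checked in a few lines. The sketch below evaluates the closed forms and, as an independent check, approximates the variance integral (3-12) with a midpoint rule; the interval (2,10) and the function names are illustrative assumptions.

```python
import math


def uniform_mean_sd(a: float, b: float) -> tuple:
    """Closed-form mean and standard deviation of U(a, b)."""
    return (a + b) / 2.0, (b - a) / math.sqrt(12.0)


def numeric_sd(a: float, b: float, n: int = 100000) -> float:
    """Standard deviation from (3-12), using a midpoint rule with density 1/(b - a)."""
    mu = (a + b) / 2.0
    step = (b - a) / n
    variance = sum((a + (k + 0.5) * step - mu) ** 2 for k in range(n)) * step / (b - a)
    return math.sqrt(variance)


if __name__ == "__main__":
    print("1/sqrt(12) =", 1.0 / math.sqrt(12.0))  # the proportionality constant
    mean, sd = uniform_mean_sd(2.0, 10.0)
    print("U(2,10): mean =", mean, " sd =", sd, " numeric sd =", numeric_sd(2.0, 10.0))
```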

If X is U(a,b), then:

\[ E(X) = \frac{a + b}{2}, \qquad \sigma(X) = \frac{b - a}{\sqrt{12}} \approx .29(b - a) \tag{3-16} \]

EXAMPLE 3.3

Fire Fight

In some very small towns in the desert, all the buildings line up along one main street. One can approximate locations in such a town by their x-coordinates along a straight line from 0 to L. Suppose that the town's firehouse is located at x = L/3.

(i) If all locations within the town are equally likely to be where a fire starts, what is the probability that the distance from the firehouse to a fire is at most L/3? (ii) What is the probability that the distance is at most L/2? (iii) What is the mean distance from the firehouse to a fire?

Solution

(i) If Z is the location of the fire, then Z is U(0,L) given the description of the town. The firehouse at L/3 is within L/3 of the fire if the fire occurs in the range (0, 2L/3). We can therefore say:

\[ P(\text{distance to firehouse at most } L/3) = P(Z \le 2L/3) = (2/3)L/L = 2/3 \]

(ii) The firehouse at L/3 is within L/2 of the fire if the fire is anywhere in the range (0, L/3 + L/2) = (0, (5/6)L). Therefore:

\[ P(\text{distance to firehouse at most } L/2) = P(Z \le (5/6)L) = (5/6)L/L = 5/6 \]

(iii) The conditional expectation rule (equation (2-18)) is helpful here. If random variable V is the distance from the firehouse to the fire, we can write:

\[ E(V) = E(V \mid Z < L/3)\,P(Z < L/3) + E(V \mid Z \ge L/3)\,P(Z \ge L/3) \]

We could use the conditional probability law to verify that, if Z < L/3, the revised distribution of Z is U(0,L/3). That outcome makes intuitive sense: if all values in (0, L/3) were equally likely when Z could be anywhere from 0 to L, why would some of those values become more likely than others when we learn that Z actually fell in (0,L/3)? In consequence, E(Z | Z < L/3) = L/6. When Z < L/3, the distance V between the firehouse and the fire follows V = L/3 − Z. Thus E(V | Z < L/3) = L/3 − E(Z | Z < L/3) = L/3 − L/6 = L/6. By similar reasoning, E(V | Z > L/3) = L/3. Therefore:

\[ E(V) = E(V \mid Z < L/3)\,P(Z < L/3) + E(V \mid Z > L/3)\,P(Z > L/3) = (L/6)(1/3) + (L/3)(2/3) = 5L/18 \]
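The three answers can be cross-checked numerically without the conditional expectation rule. The sketch below takes L = 1 for convenience, an assumption that loses no generality since every answer scales with L. It computes the probabilities as interval lengths and approximates the mean distance by averaging |z − L/3| over a fine grid of equally likely fire locations; the function names are illustrative.

```python
def prob_within(firehouse: float, radius: float, length: float) -> float:
    """P(|Z - firehouse| <= radius) when Z is U(0, length): covered length / total length."""
    lo = max(0.0, firehouse - radius)
    hi = min(length, firehouse + radius)
    return max(0.0, hi - lo) / length


def mean_distance(firehouse: float, length: float, n: int = 100000) -> float:
    """Midpoint-rule approximation to E|Z - firehouse| when Z is U(0, length)."""
    dz = length / n
    return sum(abs((k + 0.5) * dz - firehouse) for k in range(n)) * dz / length


if __name__ == "__main__":
    L = 1.0  # assumed town length; results scale with L
    print("P(distance <= L/3) =", prob_within(L / 3, L / 3, L))  # expect 2/3
    print("P(distance <= L/2) =", prob_within(L / 3, L / 2, L))  # expect 5/6
    print("mean distance      =", mean_distance(L / 3, L))       # expect 5L/18
```

The grid average agrees with 5L/18 to several decimal places, which is a useful sanity check on the split into the cases Z < L/3 and Z > L/3.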

EXERCISE 3.3

Suppose that, at the moment a fire starts in the town in Example 3.3, the fire truck in Example 3.3 is actually fighting a small brush fire out of town at X = 1.2L. The truck of course leaves immediately for the in-town fire. Assuming as before that the location of the fire is U(0,L), answer questions (i)-(iii) above about the distance the truck will travel.

EXAMPLE 3.4

Tunnel Vision

An electric warning device at a railroad control center reveals that there is rail damage somewhere on a straight four-mile stretch of track. The first three miles are in the open and can be observed from above, but the last mile is in a tunnel. A dispatcher sends a helicopter to look for damage on the outdoor three-mile stretch. However, this search technique is not perfect: even if the helicopter passes over the trouble spot, there is a 20% chance that the crew will not recognize the damage. But if the damage is in the tunnel, it is certain that the helicopter crew will not see it. A few minutes later, the helicopter reports that it has failed to find the damage. Given this information, what is the probability that the problem occurred in the tunnel?

Solution

Let X be the distance in miles from the start of the four-mile stretch to the point of damage, and assume that the tunnel is in the last mile of the stretch. With the limited information at hand, it is reasonable to assume prior to the search that X is U(0,4). If we define events A and B by:

A: the helicopter crew does not see the damage
B: the damage is in the tunnel,

then what we need here is P(B | A). We know that P(B | A) = P(A ∩ B)/P(A) = P(B)P(A | B)/P(A), and that P(A) follows:

\[ P(A) = P(A \cap B) + P(A \cap B^c) = P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B^c) \quad (B^c \text{ is the complement of } B) \]

We can work out the requisite probabilities with the information at hand. Under the uniformity assumption, P(B) = .25, because only the last mile out of four is in the tunnel. (More formally, the tunnel extends from X = 3 to X = 4, and P(3 < X ≤ 4) = F_X(4) − F_X(3) under (3-6). Here F_X(4) = 1, and F_X(3) = (3 − 0)/(4 − 0) = 3/4, so P(3 < X < 4) = P(B) = 1/4.) Similarly, P(B^c) = 1 − P(B) = .75. Because the helicopter crew cannot see into the tunnel, P(A | B) = 1 under the definition of event A. Because of the helicopter's limitations even when the damage is outside, P(A | B^c) = 0.2. Therefore:

P(A ∩ B) = P(B)P(A | B) = .25 * 1 = .25.
P(A) = P(A ∩ B) + P(A ∩ B^c) = .25 * 1 + .75 * (0.2) = .40.

Thus: P(B | A) = P(A ∩ B)/P(A) = .25/.40 = 5/8. The helicopter report thus more than doubles the chance that the problem is in the tunnel (from the original .25 to 5/8 = .625). That may well be bad news, because repairs in the tunnel might be more difficult than repairs out in the open.
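The same Bayes calculation can be wrapped in a small function, which makes it easy to see how the answer moves if the helicopter's miss rate or the tunnel's share of the track were different. The function and parameter names are illustrative; the default values are the ones given in the example.

```python
def posterior_in_tunnel(p_tunnel: float = 0.25, p_miss_outside: float = 0.2) -> float:
    """P(B | A): chance the damage is in the tunnel, given the crew saw nothing.

    p_tunnel is P(B); p_miss_outside is P(A | B^c). Damage inside the tunnel is
    never seen from the helicopter, so P(A | B) = 1.
    """
    p_a_and_b = p_tunnel * 1.0                           # P(A and B)
    p_a_and_not_b = (1.0 - p_tunnel) * p_miss_outside    # P(A and B^c)
    return p_a_and_b / (p_a_and_b + p_a_and_not_b)


if __name__ == "__main__":
    print("P(tunnel | no sighting) =", posterior_in_tunnel())
    # A crew that never misses outdoor damage would make the report conclusive.
    print("with a perfect crew     =", posterior_in_tunnel(p_miss_outside=0.0))
```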

EXERCISE 3.4

Suppose that the repair would cost $1000 if outside the tunnel and $3000 if inside.

(i) What was the mean of the projected repair cost when the warning signal in Example 3.4 first reached the tower?

(ii) What was the revised mean of the repair cost after the helicopter reported not seeing the damage?

EXAMPLE 3.5

Color Scheme (True Story)

In Boston, pedestrians wishing to cross Park Street at Tremont Street face a traffic light with a 100-second cycle: green for seven seconds, then yellow for eleven seconds, and then red for 82 seconds. Those pedestrians who reach the intersection when the light is yellow are not supposed to start crossing. Bostonians being Bostonians, however, a good fraction do scurry across Park Street against a yellow light. Suppose that, among the pedestrians who reach the intersection when the light is yellow, fraction Q of them cross the street immediately. Suppose too that pedestrians arrive at the light at a uniform rate over time, and that none of them cross against a red light. (Even in Boston, there are limits.) Let random variable W be the waiting time at the light for a randomly-chosen pedestrian.

(i) Does W have a discrete or continuous probability distribution?
(ii) What is the probability that the randomly-chosen pedestrian waits zero before crossing the street?
(iii) In terms of Q, what is the mean time the pedestrian waits to cross?
(iv) If Q = 4/5, what is 82 seconds as a percentile of the distribution of W?

Solutions:

(i) Well, W is both continuous and discrete (perhaps we should say that it is a hybrid). There is a positive probability that the waiting time is zero: all people who arrive when the light is green (which it is 7% of the time) have zero wait. On the other hand, the waiting time for a random pedestrian who arrives during a red light is a continuous random variable: in fact, it is U(0,82). A random pedestrian who arrives when the light is yellow waits zero if he crosses right away; otherwise, his waiting time is U(0,11) until the light turns red plus 82 seconds. We should note the general point raised here: It is possible for a random variable to be both continuous and discrete, because it takes some particular values with positive probability but otherwise falls within some interval(s) in which no individual values have positive probability.

(ii) P(W = 0) = .07 + .11Q, because there are two mutually exclusive ways of waiting zero: arriving when the light is green (7% chance), and arriving during a yellow light but crossing immediately (with probability .11Q).

(iii) The conditional expectation rule is useful here. We define three mutually exclusive but exhaustive events:

A1: The pedestrian waits zero
A2: The pedestrian arrives during a yellow light but does not cross
A3: The pedestrian arrives during a red light

We can express E(W), the mean waiting time for the randomly-chosen pedestrian, as:

\[ E(W) = E(W \mid A_1)P(A_1) + E(W \mid A_2)P(A_2) + E(W \mid A_3)P(A_3) \]

Under the uniform arrival rate, the time in the 100-second cycle that a randomly-chosen pedestrian reaches the light is completely random. We know that P(A1) = .07 + .11Q as discussed above, while P(A2) = .11(1 - Q), and P(A3) = .82. We also know that E(W | A1) = 0, and that E(W | A3) = 41 seconds (half of 82; recall that W is U(0,82) when A3 occurs). Furthermore, E(W | A2) = 5.5 + 82 = 87.5 (note that 5.5 is half of 11). Putting it all together, we can write:

\[ E(W) = 0 \times (.07 + .11Q) + 87.5 \times .11(1 - Q) + 41 \times .82 \]

Or:

\[ E(W) = 0 + .11(1 - Q) \times 87.5 + .82 \times 41 = 43.25 - 9.63Q \text{ seconds} \]

(iv) When Q = 4/5 (which is if anything low for Boston), then the only people who wait more than 82 seconds are the .11(1 - Q) of them who arrive during a yellow light and do not cross. If Q = 4/5, then .11(1 - Q) = .022. But if the chance is 2.2% that the random pedestrian's wait exceeds 82 seconds, it follows that 82 seconds is the 97.8th percentile of the distribution of W.

EXAMPLE 3.6

Fuel for Thought

Mendel drives a hybrid car, but it does require gasoline. He can buy gas at two gas stations separated by one mile. On a given morning, he estimates that the price per gallon at either station is equally likely to be anywhere from $3.70 to $4.00, and that the prices at the different stations are independent. (As happens in the US, the prices can involve fractions of a penny.) Mendel does not want to go to both stations to compare the price; rather, his strategy for getting a low price is:

• He goes to the station nearer his home and, if the price per gallon is below $3.84, he buys gas there at once.
• If the price is $3.84 or higher at the first station, he goes to the second one and buys gas there whatever the price.

Some questions related to this strategy:

(i) What is the probability that Mendel will buy gas at the first station?
(ii) What is the probability that he pays less than $3.80 for gas? (All prices here are per gallon.)
(iii) What is the probability that he pays more than $3.90 for gas?
(iv) Given that he buys gas at the first station, on average how much will he pay for it?
(v) Overall, how much on average will he pay for gas?
(vi) Why is the independence assumption questionable in this case?

Solutions:

Let random variables W1 and W2 be the prices at the first and second gas stations, respectively.

(i) Mendel will buy gas at the first station if the price there is below $3.84. Given that the price is U(3.70, 4.00), we see under (3-14) that:

\[ P(W_1 < 3.84) = F_{W_1}(3.84) = (3.84 - 3.70)/(4.00 - 3.70) = 14/30 = .467 \]

(ii) There are two mutually exclusive ways Mendel can end up with a purchase price below $3.80:

• He finds a price of below $3.80 at the first station
• He finds a price more than $3.84 at the first station and goes on to the second, finding a price below $3.80 there.

We can therefore write:

\[ P(\text{Mendel pays less than \$3.80}) = P(W_1 < 3.80) + P(W_1 > 3.84 \cap W_2 < 3.80) = P(W_1 < 3.80) + P(W_1 > 3.84)\,P(W_2 < 3.80) \]

But P(W1 < 3.80) = P(W2 < 3.80) = (3.80 - 3.70)/(4.00 - 3.70) = 1/3, while P(W1 > 3.84) = 1 - P(W1 < 3.84) = .533. Substituting these numbers into the equation above yields a probability of 1/3 + .533 * (1/3) = .511 that Mendel pays less than $3.80.

(iii) If Mendel pays more than $3.90, he did not make the purchase at the first station; rather, he went on to the second but got stuck with a high price. Under his strategy, we can say:

\[ P(\text{Mendel pays more than \$3.90}) = P(W_1 > 3.84 \cap W_2 > 3.90) = P(W_1 > 3.84)\,P(W_2 > 3.90) \]

We already know that P(W1 > 3.84) = .533, while P(W2 > 3.90) = (4 - 3.90)/(4 - 3.70) = 1/3. Thus, P(Mendel pays more than $3.90) is .533 * (1/3) = .178.

(iv) We could derive the result formally using Bayes' Theorem (Exercise 3.28), but intuition suggests that, if Mendel will buy gas at the first station, the price there is U(3.70, 3.84). (We use the same reasoning as in Example 3.3: if $3.71 and $3.82 initially were both equally likely as the selling price, how could knowledge that the price was below $3.84 suddenly make one of those two prices more likely than the other?) The average price assuming a purchase at that station is the middle of the uniform range (3.70, 3.84), namely, $3.77.

(v) Here we can use the conditional expectation rule (2-11) for two complementary events: A1: W1 < 3.84 and A2: W1 ≥ 3.84. If Z is the price that Mendel will pay under his strategy, then:

\[ E(Z) = E(Z \mid A_1)P(A_1) + E(Z \mid A_2)P(A_2) \]

We know from (i) that P(A1) = .467; hence, P(A2) = 1 - P(A1) = .533. And we have just argued that E(Z | A1) = 3.77. As for E(Z | A2), that quantity is $3.85, namely, the middle of the price range (3.70, 4.00) at the second station. Thus, E(Z) = .467 * (3.77) + .533 * (3.85) = $3.81 (rounded to the nearest penny).

(vi) The independence assumption is suspect for more than one reason. The price the gas stations charge is related to what they must pay their suppliers, which is presumably similar at both stations. Moreover, because the two stations compete for customers like Mendel, one of them cannot set its price without taking account of what the other is doing. In this instance, one would expect the two selling prices to exhibit positive correlation, a concept we will encounter in Chapter 4.
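The answers to (i), (ii), and (v) can be sanity-checked by simulating Mendel's strategy many times. What follows is only a sketch in Python under the example's assumptions (independent U(3.70, 4.00) prices); the function name `simulate_gas_price`, the seed, and the number of trials are arbitrary illustrative choices.

```python
import random


def simulate_gas_price(threshold=3.84, low=3.70, high=4.00,
                       trials=200_000, seed=1):
    """Simulate Mendel's two-station strategy; return summary estimates."""
    rng = random.Random(seed)
    bought_first = 0
    paid_below_380 = 0
    total_paid = 0.0
    for _ in range(trials):
        w1 = rng.uniform(low, high)   # price at the first station
        w2 = rng.uniform(low, high)   # independent price at the second
        if w1 < threshold:
            paid = w1                 # buys at once at the first station
            bought_first += 1
        else:
            paid = w2                 # goes on and accepts any price
        if paid < 3.80:
            paid_below_380 += 1
        total_paid += paid
    return (bought_first / trials,
            paid_below_380 / trials,
            total_paid / trials)


p_first, p_cheap, mean_price = simulate_gas_price()
print(p_first, p_cheap, mean_price)
```

With a large number of trials the three estimates should land close to .467, .511, and $3.81. Passing a different `threshold` gives a quick way to explore other thresholds, such as the one in Exercise 3.6.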

EXERCISE 3.6

Suppose Mendel chooses $3.88 rather than $3.84 as his threshold for buying gas at the first station. Now what is his average price for gas?

The Uniform Distribution and Monte Carlo Simulation

It will not surprise readers that, in the 21st century, many situations involving uncertainty are modeled in computer simulations. The computer instantaneously generates a large number of independent picks from a given probability distribution, which are then used as input numbers in an analysis. The good news is that, whatever the distribution, one only needs a random-number generator for the uniform distribution U(0,1) to simulate that distribution. And such U(0,1) generators are routinely available: in Excel, for example, the command rand() yields independent outcomes from U(0,1). (Well, almost: achieving exact randomness is impossible because the algorithms that generate random numbers have structures that very subtly violate the independence assumption. But don't worry about it.) To illustrate how a generator of uniform random numbers can yield random numbers from a discrete distribution, consider the following grim example.

EXAMPLE 3.7

Quitting Time?

Suppose you have a sadistic boss who calls you in and tells you that, by the next morning, he wants you to toss a fair die 10,000 times, and to record the outcomes in the order in which they arose. He wants a spreadsheet of the outcomes. Even if you're very fast and take no rest or meal breaks, you won't finish the assignment without pulling an all-nighter. Is there any way you can deal with the request that won't cause you agony?


Solution

The obvious answer is that you should quit at once. But suppose that this option is not available. There is a way you can fool the boss. You could go to the computer, and ask it to generate outcomes from U(0,1). Then you could write a quick program under which:

Outcomes from 0 to 1/6 translate to the integer 1
Outcomes from 1/6 to 1/3 translate to the integer 2
Outcomes from 1/3 to 1/2 translate to the integer 3
Outcomes from 1/2 to 2/3 translate to the integer 4
Outcomes from 2/3 to 5/6 translate to the integer 5
Outcomes from 5/6 to 1 translate to the integer 6

For example, the output could look like:

U(0,1) outcome    Integer
0.257             2
0.886             6
0.799             5
0.544             4
0.028             1
0.771             5
0.446             3
0.591             4
0.701             5
0.329             2

(Left column consists of hypothetical random numbers from U(0,1), to nearest thousandth)

You get the idea: you are breaking the line segment of 0 to 1 into six non-overlapping intervals of length 1/6, and assigning each segment to a different integer in the range 1 to 6. (Do not worry about whether to assign 1 or 2 if you reach the outcome 1/6: the chance of doing so to several trillion decimal places is essentially zero.) If the original random numbers are U(0,1), then the resulting numbers from 1 to 6 will arise with equal probability. If you proceed in this manner, you will almost instantly get the spreadsheet of 10,000 outcomes you need. It would be impossible for the boss to tell that you didn't actually perform the tosses. In that regard, this process is better than simply making up 10,000 supposedly random outcomes: people are notoriously bad at imitating randomness. If the first number they pick is high, they tend to pick a low number the second time. True, you are being deceptive, and there are issues related to that.

But we have just been introduced to the process by which a U(0,1) generator can yield picks from any discrete distribution. Suppose that random variable X follows the distribution:

\[ X = \begin{cases} a_1 & \text{w.p. } p_1 \\ a_2 & \text{w.p. } p_2 \\ \ \vdots & \\ a_m & \text{w.p. } p_m \end{cases} \]

Then we can generate huge numbers of outcomes from this distribution by first going to the U(0,1) generator, and arranging that:

Outcomes from 0 to p1 are identified with the outcome X = a1
Outcomes from p1 to p1 + p2 are identified with the outcome X = a2
Outcomes from p1 + p2 to p1 + p2 + p3 are identified with the outcome X = a3
...
Outcomes from 1 - pm to 1 are identified with the outcome X = am

Note that, under this process, the interval corresponding to a1 is of length p1, the interval for a2 is of length (p1 + p2) - p1 = p2, and, for any j, the interval for aj is of length pj. Thus, we will get each outcome in its proper proportion, and the successive outcomes are independent. We will say more about simulation later in this chapter (Section 3.8); for now, the important point is that the uniform distribution can help in many situations in which the random process is not uniform.
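The interval-matching procedure just described can be sketched in Python as follows. This is an illustration rather than a prescribed implementation: the helper names `cumulative` and `pick` are hypothetical, and the search over the cumulative probabilities uses the standard-library `bisect_right` function as a convenience.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def cumulative(probs):
    """Right endpoints of the intervals: p1, p1+p2, ..., (about) 1."""
    return list(accumulate(probs))


def pick(values, cum_probs, u):
    """Map a U(0,1) outcome u to the value whose interval contains it."""
    idx = bisect_right(cum_probs, u)
    # Guard against floating-point round-off in the last endpoint.
    return values[min(idx, len(values) - 1)]


# The fair die of Example 3.7: six intervals of length 1/6.
faces = [1, 2, 3, 4, 5, 6]
cum = cumulative([1 / 6] * 6)

# The first five rows of the hypothetical output table in Example 3.7.
table_check = [pick(faces, cum, u) for u in (0.257, 0.886, 0.799, 0.544, 0.028)]

# The boss's 10,000 "tosses" (the seed is arbitrary).
rng = random.Random(2015)
tosses = [pick(faces, cum, rng.random()) for _ in range(10_000)]
counts = {face: tosses.count(face) for face in faces}
print(table_check, counts)
```

The same two helpers work for any discrete distribution: pass the values a1, ..., am and their probabilities p1, ..., pm in place of the die's faces.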

EXERCISE 3.7

Suppose that your boss instead asks that you toss a fair coin six times, record the number of heads, and then repeat the process, for 2000 times in total. How might you simulate this process with a U(0,1) random number generator? Try to generate the outcome for each group of six tosses at once (i.e., use 2000 random numbers, rather than 12,000 individual outcomes which you bundle into groups of size six).

The Exponential Distribution

Another continuous distribution of interest to us is the exponential. For an exponential random variable, all non-negative values are possible. If random variable X has an exponential distribution with parameter λ (λ > 0), then by definition the probability that it exceeds some value y follows:

\[ P(X > y) = e^{-\lambda y} \quad \text{for } y \ge 0 \tag{3-17} \]

Note that, under this formula, P(X > 0) = 1 and P(X > y) → 0 as y → ∞. Given (3-4), the CDF of X is directly available:

When X is exponential with parameter λ, its CDF follows:

\[ F_X(y) = P(X \le y) = 1 - P(X > y) = 1 - e^{-\lambda y} \tag{3-18} \]

Taking the derivative of (3-18) with respect to y, we find that:

When X is exponential with parameter λ, its pdf follows:

\[ f_X(y) = \lambda e^{-\lambda y} \quad \text{for } y \ge 0 \tag{3-19} \]

The density function under (3-19) has its peak value at 0 and then declines steadily, in the fashion of exponential decay (Figure 3.8). This pattern means that, as y increases, the chance that X falls in a small interval of length Δ starting at y is likewise decreasing. The likeliest "neighborhood" for X of length Δ is (0, Δ). Large outcomes for X become increasingly rare under this pdf, but they never die out completely.

FIGURE 3.8 The probability density function for an exponentially-distributed random variable

Using (3-11) and (3-12) with (3-19) and calculus establishes that:

If X is exponential with parameter λ, then:

\[ E(X) = 1/\lambda, \qquad \sigma(X) = 1/\lambda \tag{3-20} \]

With the standard deviation equal to the mean, the coefficient of variation of X, namely Cv(X), is 1 under (3-20). This outcome indicates that the variability of X around its mean is about as large as the mean itself. (Note that the variance of X is \(1/\lambda^2\).) Moreover, (3-17) and (3-20) mean that the probability that X exceeds its mean \(\mu_X\) follows:

When X is exponential with parameter λ:

\[ P(X > \mu_X) = P(X > 1/\lambda) = e^{-\lambda(1/\lambda)} = e^{-1} = 0.37, \text{ regardless of the value of } \lambda \tag{3-21} \]

That outcome is interesting. Because there is a 37% chance of exceeding the mean (and a 63% chance of not doing so), the long-term average is larger than the median of the distribution, the 50th percentile. That happens because very large outcomes, although rare, happen often enough to drag the average up beyond the halfway mark of the distribution. Another interesting outcome is that:

\[ P(X > 2y) = e^{-2\lambda y} = \left(e^{-\lambda y}\right)^2 = \left(P(X > y)\right)^2 \]
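A short numerical check of (3-17) through (3-21) may help fix ideas. The sketch below uses Python's standard `random.expovariate` generator; the choice λ = 0.5, the seed, and the variable names are arbitrary assumptions for illustration. The median formula ln(2)/λ is not stated in the text; it follows from setting (3-18) equal to 1/2.

```python
import math
import random


def exp_survival(y, lam):
    """P(X > y) for an exponential random variable, as in (3-17)."""
    return math.exp(-lam * y)


lam = 0.5                                  # arbitrary illustrative parameter
mean = 1 / lam                             # (3-20)
p_exceed_mean = exp_survival(mean, lam)    # (3-21): e^-1, whatever lam is
median = math.log(2) / lam                 # solves 1 - e^(-lam*y) = 1/2

rng = random.Random(7)
draws = [rng.expovariate(lam) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
frac_above_mean = sum(d > mean for d in draws) / len(draws)

print(p_exceed_mean, median, sample_mean, frac_above_mean)
```

The median comes out below the mean (about 1.39 versus 2 here), in line with the remark that rare large outcomes drag the average above the halfway mark.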

Exponential Distributions are Everywhere

The exponential has often proven to be a good approximation to the distribution of time it takes different people to accomplish something. For example, if the times that various people spend at a given ATM (from the time they insert their cards to the time they leave the machine) are plotted, the graph might well look exponential. Lots of people will have very short service times because they are simply withdrawing cash and are quite practiced at doing so quickly. The smaller number of others who are (say) making deposits take longer, and a few people with complex transactions (withdrawals, deposits, and transfers) will take far longer. One of the earliest practical applications of the exponential distribution involved the telephone system of Copenhagen, Denmark. Agner Erlang observed that the durations of different telephone calls approximately followed an exponential distribution: most were very short, some were of moderate length, and a few were interminable. In Physics, the exponential distribution arises in connection with radioactive decay. In a population of atoms all subject to the same kind of decay (in which the atom emits radiation and is transformed into something else), the time until decay for a given atom follows an exponential distribution. When a Geiger counter records instants of decay in a group of similar atoms, the clicks are very frequent at first (corresponding to short decay times) but then gradually become less and less frequent. Here the exponential rule is not an approximation: it is a fundamental principle of Chemical Physics.

EXAMPLE 3.8

Memory Loss

Suppose that the length of some interval follows an exponential distribution with parameter λ, and suppose it is known that the interval has just exceeded some number c.

(i) What is the conditional probability that the event occurs in the next q time units (q > 0)?

(ii) For example, suppose that the durations of phone calls in minutes are exponentially distributed with parameter λ, and that a given call has just reached the five-minute mark. What is the probability that the call ends in the next ten minutes?

Solution

What we really want here is a conditional probability: P(c < X ≤ c + q | X > c). (If a call that has lasted c minutes ends within the next q, its total length is in the range (c, c + q).) But we can invoke Bayes' Theorem to write:

\[ P(c < X \le c + q \mid X > c) = P(c < X \le c + q \,\cap\, X > c)/P(X > c) \tag{3-22} \]

From (3-17), we have \(P(X > c) = e^{-\lambda c}\). To find P(c < X ≤ c + q ∩ X > c), we first note that, if c < X ≤ c + q, then it automatically follows that X > c. Therefore, what we really seek in the numerator of (3-22) is P(c < X ≤ c + q). To find P(c < X ≤ c + q), we can apply our usual rule (3-6) involving the CDF of X, namely:

\[ P(c < X \le c + q) = F_X(c + q) - F_X(c) = \left(1 - e^{-\lambda(c+q)}\right) - \left(1 - e^{-\lambda c}\right) = e^{-\lambda c} - e^{-\lambda(c+q)} \]

Fasten your seat belt: a powerful result is coming. When we substitute the expressions just cited into Bayes' Theorem, we get:

\[ P(c < X \le c + q \mid X > c) = \left(e^{-\lambda c} - e^{-\lambda(c+q)}\right)/e^{-\lambda c} = 1 - e^{-\lambda q} \]

If X is exponential with parameter λ and q > 0, then:

\[ P(c < X \le c + q \mid X > c) = 1 - e^{-\lambda q} \text{ regardless of the value of } c \tag{3-23} \]

You may be wondering why this result is so powerful. Well, suppose that the exponential process has just started (i.e., c = 0) and we want the chance that it ends within q time units. The answer is simply \(F_X(q) = 1 - e^{-\lambda q}\). But suppose that the process did not just start; rather, it started c time units ago. The answer is exactly the same: the chance the process ends in the next q time units is \(1 - e^{-\lambda q}\) for any value of c. In an exponential process, in other words, the remaining time in the process is totally unrelated to the duration of the process thus far.

That this outcome is striking is clearer if we consider a specific case. Suppose that, as in old Copenhagen, telephone calls are exponential in length, and that they have a 60% chance of lasting less than ten minutes. Then a call that has just started has a 60% chance of ending within ten minutes. But another call that has already gone on for 20 minutes also has the same 60% chance of ending in the next ten. Same for a call that has gone on so far for 45 minutes. If the question "how much longer will this process last?" is asked about an exponential process involving time, then the amount of time it has lasted so far is completely irrelevant to the answer. (In part (ii) of this problem, the probability of ending in the next ten minutes is \(1 - e^{-10\lambda}\), independent of the fact that it has already gone on for five minutes.)
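The Copenhagen illustration can be reproduced directly from (3-22) and the CDF. The following is a small sketch in Python; the function name `prob_ends_within` is hypothetical, and λ is backed out from the stated assumption that a new call has a 60% chance of ending within ten minutes.

```python
import math


def prob_ends_within(q, c, lam):
    """P(c < X <= c + q | X > c), computed from the CDF as in (3-22)."""
    def cdf(y):
        return 1 - math.exp(-lam * y)
    return (cdf(c + q) - cdf(c)) / (1 - cdf(c))


# Choose lam so that a fresh call ends within ten minutes w.p. 0.6:
# 1 - e^(-10 * lam) = 0.6  =>  lam = -ln(0.4) / 10.
lam = -math.log(0.4) / 10

# Calls that have already lasted 0, 20, and 45 minutes.
results = [prob_ends_within(10, c, lam) for c in (0, 20, 45)]
print([round(r, 6) for r in results])
```

All three entries agree (up to floating-point round-off), which is the memoryless property in numerical form.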


When future developments in a random process do not depend on anything in the past, the process is said to be memoryless. We have already described the Poisson process as memoryless; now we know that the exponential process also meets this standard.

EXERCISE 3.8

If the lengths of telephone calls are exponentially distributed in minutes with (positive) parameter λ, then which of the following is more likely: (i) a call that has already gone on two minutes will last an additional ten minutes or (ii) a call that has gone on for 45 minutes will last an additional nine minutes? Does the answer depend on the value of λ?

The Poisson and the Exponential

The exponential distribution plays an extremely important role in the discussion of Poisson processes. Suppose that an event has just occurred in a Poisson process with parameter λ, and let random variable X be the time until the next event. Then we can use (2-24) to write that:

\[ P(X > y) = P(\text{zero events in the next } y \text{ time units}) = e^{-\lambda y} \]

But then X follows an exponential distribution with parameter λ. In other words:

If events arise under a Poisson process with parameter λ, then the time between consecutive events follows an exponential distribution with parameter λ.

The reverse is also true: if the intervals between events are independent and exponential with parameter λ, then the number of events in a period of length T must follow a Poisson distribution with parameters λ and T. The proof of the last statement is quite brief. If the intervals between events are all exponential with parameter λ then, regardless of how long the present interval has gone on so far, the probability of an event in the next Δ (Δ very small) is given under (3-18) by:

\[ P(\text{an event in next } \Delta) = 1 - e^{-\lambda\Delta} \]

Here we are using the fact that the chance of an event in the next Δ is the same as the chance that the current interval ends during that Δ. Because Δ is very small, the Taylor series expansion of \(e^{-\lambda\Delta}\) leads to the approximation:

\[ e^{-\lambda\Delta} \approx 1 - \lambda\Delta \]

where we are ignoring later terms in the expansion, all of which have \(\Delta^2\) as a factor. But then:

\[ P(\text{an event in next } \Delta) = 1 - e^{-\lambda\Delta} \approx \lambda\Delta, \]

which implies that:

\[ P(\text{no event in next } \Delta) \approx 1 - \lambda\Delta \tag{3-24} \]

Moreover, the memoryless property of exponential distributions means that different Δ's are independent in terms of whether they contain an event. The probabilities in (3-24) are exactly the same as those under which we derived the Poisson distribution in Chapter 2. Therefore, the number of events X over a period of length T must follow the Poisson formula:

\[ P(X = k) = \frac{e^{-\lambda T}(\lambda T)^k}{k!} \quad \text{for } k = 0, 1, 2, \ldots \]

In summary:

If the intervals between successive events in a process are independent and exponential with parameter λ, then the total number of events over a period of fixed length T follows a Poisson distribution with rate λ per unit time:

\[ P(X = k) = \frac{e^{-\lambda T}(\lambda T)^k}{k!} \quad \text{for } k = 0, 1, 2, \ldots \tag{3-25} \]

We should note that, under the Poisson assumption, the exponential intervals between different consecutive events are independent. Knowing the time between the first and second event, for example, tells us nothing about the time between the second and the third.
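The link between exponential gaps and Poisson counts can also be seen empirically. The sketch below, in Python, lays down events separated by independent exponential intervals and counts how many fall in a window of length T; the values λ = 2 and T = 3, the seed, and the function names are illustrative assumptions only.

```python
import math
import random


def count_events(lam, horizon, rng):
    """Count events in (0, horizon] when gaps are independent exponentials."""
    t = rng.expovariate(lam)      # time of the first event
    n = 0
    while t <= horizon:
        n += 1
        t += rng.expovariate(lam)  # add the next independent gap
    return n


def poisson_pmf(k, lam, horizon):
    """Poisson probability of exactly k events in a window of given length."""
    mu = lam * horizon
    return math.exp(-mu) * mu ** k / math.factorial(k)


rng = random.Random(11)
lam, horizon, runs = 2.0, 3.0, 50_000
counts = [count_events(lam, horizon, rng) for _ in range(runs)]

mean_count = sum(counts) / runs          # should be near lam * horizon = 6
frac_six = counts.count(6) / runs        # compare with poisson_pmf(6, ...)
print(mean_count, frac_six, poisson_pmf(6, lam, horizon))
```

Comparing the observed fraction for other values of k against `poisson_pmf` gives the same kind of agreement.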

EXERCISE 3.9

If events arise in a Poisson process with parameter λ = 2 per minute, what is the probability that it is at least three minutes until the next event and at least two minutes between the next event and the one after that?

EXAMPLE 3.9

A Battery Charge

Mendel works as a consultant, and needs a battery to power his laptop so he can work during airplane trips. The brochure about the battery he uses says that the average lifespan of the battery is 60 hours; what it does not say, however, is that the lifespan is exponentially distributed with that mean. Because it's traumatic for Mendel if the battery dies during a flight (leaving him unable to work) he replaces a battery when it reaches age 60 (hours) even if it is still working. Of course, if the battery dies before 60 hours, he suffers a trauma and gets a new battery as soon as he lands. Some questions:

(i) Does Mendel's 60-hour rule make any sense?
(ii) What is the probability that a battery will die before it reaches age 60?
(iii) If Mendel has just installed a new battery, how long will it operate on average before he replaces it?
(iv) By what percentage would his long-term cost of buying batteries go down if he only replaced batteries when they died?

Solutions

(i) The rule makes no sense whatever. If lifespans are indeed exponential, then a battery at age 60 has the same projected future lifespan as a new battery. (That could happen because the very fact that it reached 60 means that it is unusually well made, and thus is just as likely as a new battery to last for (say) 60 additional hours.) His strategy means that he is spending more money on batteries than he would if he replaced them only at death, and he is not achieving any reduction in traumatic events.

(ii) In any exponential distribution, we know from (3-21) that the probability is 63% that the outcome falls below the mean. Thus, there is a 63% chance that he will have to replace the battery after it failed in flight and before it reached age 60.

(iii) The average time until Mendel replaces a battery can be calculated using a mathematical trick. The reader should admire the trick, and not feel uneasy if she didn't think of it. We recall the conditional expectation rule involving two complementary events:

\[ E(X) = E(X \mid A_1)P(A_1) + E(X \mid A_2)P(A_2) \]

Suppose that:

X is the battery's time until replacement
A1 is the event that the battery dies before 60
A2 is the event that it is still alive at age 60.

With an exponential distribution with mean 60, we have P(A1) = .63 and P(A2) = .37 from (3-21); we also know that E(X | A2) = 60 under his immediate-replacement policy at age 60. Thus:

\[ E(X) = .63\,E(X \mid A_1) + .37 \times 60 = .63\,E(X \mid A_1) + 22.2 \]

But what is E(X | A1)? Here is where the trickery comes in. Let random variable Z be the battery's lifespan if Mendel's rule is scrapped and it will be kept until it dies. We know that E(Z) = 60, and that E(Z | A1) = E(X | A1) (if the battery dies before 60, Mendel's replacement rule at 60 has no bearing on age at replacement). But we also know that E(Z | A2) = 60 + 60 = 120, because a battery that reaches 60 has an average of 60 hours of remaining life under the memoryless property. We can therefore write:

\[ E(Z) = E(Z \mid A_1)P(A_1) + E(Z \mid A_2)P(A_2) = 60 = .63\,E(Z \mid A_1) + .37 \times 120 = .63\,E(Z \mid A_1) + 44.4 \]

From this last linear equation, we see that E(Z | A1) = (60 - 44.4)/.63 = 24.8. That outcome means that, over lots of batteries, the average lifespan of 60 arises because 63% of them die before 60 at an average age of 24.8, while the remaining 37% die on average at 120. Because E(Z | A1) = E(X | A1), we now know that E(X | A1) = 24.8, and therefore that:

\[ E(X) = 24.8 \times (.63) + 60 \times (.37) = 37.8 \text{ hours} \]

You want to read this solution again? Don't blame you; go right ahead. If you go through step by step, it will make sense.

(iv) Over a very long period of T battery-hours, Mendel will buy approximately T/37.8 batteries under his 60-hour rule, because he replaces batteries every 37.8 hours on average. If he only replaced batteries when they die he would do so on average every 60 battery-hours. His cost is therefore a factor of (T/37.8)/(T/60) = 60/37.8 = 1.59 higher (i.e., is 59% higher) under his policy than under letting batteries last until they die. Put the other way around, dropping the rule would cut his long-term battery cost by 1 - 37.8/60 = 37%. And this 59% extra expense gains him nothing, because replacing old batteries with new ones is useless under exponential lifespans.
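For readers who would like to verify the trick numerically, here is a hedged sketch in Python. It uses a different route from the text: the time until replacement is min(lifespan, 60), and its mean is obtained from the standard identity that the mean of a non-negative variable is the integral of its survival function, which gives (1 - e^(-60λ))/λ. The variable names and seed are illustrative. Because the code does not round P(A1) and P(A2) to .63 and .37, its answers (about 25.1, 37.9, and 1.58) differ slightly from the rounded figures 24.8, 37.8, and 1.59 above.

```python
import math
import random

MEAN_LIFE = 60.0     # brochure's average lifespan, in hours
LIMIT = 60.0         # Mendel replaces any battery that reaches this age
lam = 1 / MEAN_LIFE

# (ii) Chance a battery dies before the limit, from the CDF (3-18).
p_die_early = 1 - math.exp(-lam * LIMIT)

# (iii) Mean of min(lifespan, LIMIT): integral of e^(-lam*y) from 0 to LIMIT.
exact_mean_use = (1 - math.exp(-lam * LIMIT)) / lam

# (iv) Cost under the rule relative to replacing only at death.
cost_ratio = MEAN_LIFE / exact_mean_use

# Monte Carlo cross-check of the same quantities.
rng = random.Random(3)
lifetimes = [rng.expovariate(lam) for _ in range(200_000)]
sim_mean_use = sum(min(x, LIMIT) for x in lifetimes) / len(lifetimes)
early = [x for x in lifetimes if x < LIMIT]
sim_mean_given_early = sum(early) / len(early)

print(p_die_early, exact_mean_use, cost_ratio)
print(sim_mean_use, sim_mean_given_early)
```

Changing `LIMIT` to another age gives the corresponding figures for a different replacement rule, such as the one in Exercise 3.10.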

EXERCISE 3.10

Suppose that Mendel partially mends his ways, and replaces his 60-hour rule with the corresponding 90-hour rule. Answer questions (i) ⇒ (iv) in Example 3.9 under this new rule.

EXAMPLE 3.10

Traffic Congestion

Mendel and Minerva go together to the Bureau of Motor Vehicles (BMV), where they must take care of matters concerning their vehicles and drivers' licenses. They know from experience that the time it will take to complete their business is quite unpredictable: in fact, the transaction time (entry to exit) is exponentially distributed with a mean of 20 minutes (i.e., λ = 1/20) for each of them. Because they are going on different lines and dealing with different people, their transaction times are independent.

Some questions about their visit:

(i) What is the probability that neither of them will be finished within 20 minutes?
(ii) What is the probability that Minerva will be finished in 20 minutes but not Mendel?
(iii) What is the probability that Mendel's transaction time will be at least ten minutes longer than Minerva's?
(iv) Zanzibar is driving over to pick Mendel and Minerva up after they are finished. Because no vehicle standing is allowed outside the BMV, he wants to arrive at time T when the probability is 95% that both of them are finished (and thus are waiting outside). Assuming that they reach the BMV at 1 PM, what time T satisfies his requirement?
(v) When Zanzibar arrives at that time T, what is the probability that Mendel and Minerva are waiting and both of them have waited more than 30 minutes?
(vi) Let random variable Z be the amount of time until the first of them is finished. What is the probability distribution of Z?
(vii) How long on average will it be until both of them are finished?

Solutions

Quite a few questions. But let's get started.

(i) Because their transaction times are independent and exponential with parameter 1/20, we have:

\[ P(\text{both times exceed 20}) = P(\text{Mendel} > 20) \times P(\text{Minerva} > 20) = \left(e^{-\lambda y}\right)^2 \text{ where } \lambda = 1/20 \text{ and } y = 20. \]

Thus, \(P(\text{both exceed 20}) = (e^{-20/20})^2 = (e^{-1})^2 = .37^2 = .14\). (Given that the mean transaction time is itself 20 minutes, we could have cited the general result that an exponential variable has a 37% chance of exceeding its mean.)

(ii) We can write: P(Minerva finishes within 20 minutes but not Mendel) = P(X1 < 20 and X2 > 20), where X1 = Minerva's transaction time and X2 = Mendel's. Under independence of X1 and X2:

\[ P(X_1 < 20 \cap X_2 > 20) = P(X_1 < 20) \times P(X_2 > 20) = (1 - e^{-1})\,e^{-1} = .63 \times .37 = .23 \]

(iii) Here we have to be a bit clever. We want P(X2 ~ X1 + 10), but can express this event as P(A n B) where: A: Minerva finishes first

B: Mendel's time is at least ten minutes longer than Minerva's. Then we have: P(A

n B)

= P(A) * P(BIA)

= P(Minerva first) * P(lO more minutes for Mendell Minerva first)

3.4

203

The Exponential Distribution

Now P(Minerva finishes first) = 1/2, because they both have the same exponential distribution for service time, and thus an equal chance of ending first. (For a continuous distribution, the probability of a tie is zero.) As for P(B IA), we can invoke the memoryless property of the exponential to say that, from the moment that Minerva finishes and regardless of when she finishes, Mendel's remaining time is exponential with parameter 1/20. (Right?) Hence, P(ten more minutes for Mendell Minerva first)

= e-ly for )., = 1/20 and y = 10. Thus = e-112 = .61.

(iv)

In short, P(Minerva first and at least ten more minutes for Mendel) = 1/2 * .61 "'.31 We first want the number r for which P(X1 and X2 < r) = .95. But, given that X1 and X2 are independent, we have:

$$P(X_1 \text{ and } X_2 < r) = P(X_1 < r) \cdot P(X_2 < r) = (1 - e^{-r/20})^2$$

Setting $(1 - e^{-r/20})^2 = .95$, we get $1 - e^{-r/20} = \sqrt{.95} = .975$ and then:

$$e^{-r/20} = .025$$

To solve for r, we take logarithms of both sides of this equation, which yields $-r/20 = \ln(.025)$. Because ln(.025) = -3.69, we reach r = 73.8. But r = 73.8 minutes means that Zanzibar should arrive 73.8 minutes after 1 PM, which implies that T = 2:13:48 PM (i.e., 13 minutes and 48 seconds after 2 PM).

(v) For both Mendel and Minerva to have waited more than 30 minutes for Zanzibar, each of them would have to be finished by 1:43:48 (i.e., 30 minutes before T). So:

$$P(\text{both wait 30+ minutes}) = P(X_1 \text{ and } X_2 < 43.8) = P(X_1 < 43.8) \cdot P(X_2 < 43.8) = (1 - e^{-43.8/20})^2 = (1 - .112)^2 = .79$$

In short, the chance is roughly 4 in 5 that both of them will wait at least a half hour(!). Given the unpredictability of the exponential process, achieving a low risk that Zanzibar will have to wait entails a high risk that Mendel and Minerva will both have long waits.

(vi) This question is easier than it seems. We can write:

$$P(Z > y) = P(\text{time until first completion exceeds } y) = P(\text{both transaction times exceed } y) = e^{-\lambda y} \cdot e^{-\lambda y} = e^{-2\lambda y}$$

But if we go back to our first statements about the exponential distribution, we see something striking: $P(Z > y) = e^{-2\lambda y}$ means that Z itself follows an exponential distribution, although with parameter 2λ rather than λ. Here we are asked to find E(Z), the mean of Z. That quantity is simply 1/(2λ) = 1/(2/20) = 10 minutes. In directional terms, this answer makes sense: if each person has an average transaction time of 20 minutes, then the one who finishes earlier presumably does so in less than 20 minutes. In the exponential case with its high variability, the average advantage is quite large. This outcome about Z is a special case of a powerful result:

If X1 is exponential with parameter λ1 and X2 is independently exponential with parameter λ2, then the minimum of X1 and X2 is exponential with parameter λ1 + λ2.

(3-26)

(vii) This question too is not too bad. If Z is the time until the first person finishes, then we know from (vi) that E(Z) = 10. If W is the remaining time until the second person finishes after the first one does, then, given memorylessness, W follows the original exponential distribution with parameter 1/20. It follows that E(W) = 20 minutes. If it takes on average 10 minutes for the first person to finish and an additional 20 on average for the second person, it seems intuitive that the average total time to completion would be 10 + 20 = 30 minutes. That answer is correct, as we show more formally in Chapter 4 when we discuss the mean of a sum of two random variables. Given that Mendel and Minerva are on average both finished by 1:30, it is no surprise that they generally wait a long time until Zanzibar arrives at 2:13:48.
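For readers who like to check such answers numerically, here is a minimal Python sketch (it is not part of the original solution). It recomputes parts (iii) through (v) directly and then estimates the mean time until the first and the second completion by simulation. The names RATE and simulate, the seed, and the number of trials are illustrative assumptions. If the square root of .95 is not rounded to .975, r comes out near 73.5 minutes rather than 73.8.

```python
import math
import random

RATE = 1 / 20  # lambda: each transaction time is exponential with mean 20 minutes

# (iii) P(Minerva first) * P(Mendel needs 10+ more minutes | Minerva first)
p_iii = 0.5 * math.exp(-RATE * 10)

# (iv) Solve (1 - e^{-r/20})^2 = .95 for r, without rounding sqrt(.95)
r = -math.log(1 - math.sqrt(0.95)) / RATE

# (v) Both finished at least 30 minutes before T, using the text's 73.8 - 30 = 43.8
p_v = (1 - math.exp(-RATE * 43.8)) ** 2


def simulate(trials=200_000, seed=1):
    """Estimate E(min) and E(max) of two independent exponential times."""
    rng = random.Random(seed)
    total_first = 0.0
    total_last = 0.0
    for _ in range(trials):
        x1 = rng.expovariate(RATE)
        x2 = rng.expovariate(RATE)
        total_first += min(x1, x2)
        total_last += max(x1, x2)
    return total_first / trials, total_last / trials


mean_first, mean_last = simulate()
print(f"(iii) {p_iii:.3f}   (iv) r = {r:.1f} minutes   (v) {p_v:.3f}")
print(f"simulated E(first finish) = {mean_first:.2f}, E(both finished) = {mean_last:.2f}")
```

The simulated averages should land close to the 10 and 30 minutes derived in (vi) and (vii), with small run-to-run noise that depends on the seed.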

EXERCISE 3.11

In the situation of Example 3.11, what is the probability that Zanzibar (who arrives at 2:13:48) waits longer for Minerva than Mendel waits for her? (Yes: Mendel might wait zero time for Minerva.) (Hint: This isn't too hard if you think about how that event could arise.)

EXAMPLE 3.12

Express Yourself? (Based on a true story) The downtown passenger entering New York's 8th Avenue subway at 34th Street must choose a different platform for express trains than for local trains. Mendel is traveling from 34th St. downtown to Canal Street, and knows that the travel time there from 34th St. is deterministically 8 minutes on the express and 11 minutes

on the local. However, downtown local trains arrive at 34th St. under a Poisson process with mean 12 per hour, while express trains arrive with mean 6 per hour. Mendel decides to go to the express platform.

(i) What is the probability that Mendel will arrive at Canal St. within ten minutes? (In this and later parts of the problem, neglect the time the train spends at 34th St. station.)
(ii) What is the pdf of the time until the next downtown train (local or express) enters 34th Street station?
(iii) What is the probability that the next downtown train to reach 34th St. will be a local train?
(iv) What is the probability that the next two downtown trains to reach 34th St. will both be local trains?
(v) What is the probability that Mendel arrives at Canal St. earlier because he chose the express than he would have had he chosen the local?

Solution

(i)

Given a transit time of eight minutes on the express, Mendel will arrive at Canal Street within ten minutes if the next downtown express arrives in the next two minutes. We know from Example 3.9 that, in a Poisson process with parameter λ = 6 per hour, the time X until the next express train is exponential with λ = 6 per hour. Given that two minutes is 1/30 of an hour, we can use the CDF of X in (3-18) to write: P(X ≤ 2 minutes) = 1 - e^{-6/30} = 1 - e^{-.2} ≈ .18

...

... if Z > 5, they will both agree to Minerva's hypothesis while, if Z < 5, they will go with Mendel's view? (It is impossible that Z = 5 out of 100 picks; think about it.)
(iii) Suppose that X is the number of picks above 1/2 that arise (out of the 100). What is Z in terms of X?
(iv) What values of X would yield a Z-value greater than 5?
(v) What is the probability that they will erroneously accept Minerva's hypothesis if Mendel is correct? (Hint: What is the distribution of X under Mendel's hypothesis?)

(vi) What is the probability that they will erroneously accept Mendel's hypothesis if Minerva is correct?
(vii) Zanzibar initially believes that Mendel and Minerva each has a 50% chance of being correct. Given that Z = 6, what should be his revised estimate of the probability that Minerva is correct? (Yes. Bayes' Theorem.)

Incidentally, Minerva is wrong. The Excel random number generator has been extensively tested, and it has performed well.

EXERCISE 3.30

Please review Example 3.6, Fuel for Thought. Now suppose that Mendel has three gas stations on his route from home to work, and that the price of gas at each station is equally likely to be anywhere from $3.70 to $4.00 per gallon, with the prices at different stations independent. His strategy for getting a low price is:

• If the price is $3.80 or below at the gas station nearest his home, he fills up his gas tank there.
• If the price exceeds $3.80, he goes on to the second station, and fills up his tank there if the price is $3.90 or below. Otherwise, he proceeds to the third station.
• If he goes on to the third station, he fills up his tank there whatever the price.

(i) Show that, given that Mendel buys gas at the first station, the probability that he paid y or less is (y - 3.70)/(3.80 - 3.70) for y < 3.80. In other words, his price given that he bought at the first station is U(3.70, 3.80).
(ii) For each of the three gas stations, what is the probability that Mendel purchased his gas there?
(iii) For each of the three stations, what is the conditional mean price that Mendel paid given that he bought his gas there?
(iv) What is the overall mean price at which Mendel purchased his gasoline?
(v) Suppose that Mendel is willing to change his threshold price at the second gas station from $3.90. Give an example of a change in that threshold that would reduce his average purchase price.
(vi) Assuming that his second threshold stays at $3.90, are there any changes in the first threshold of $3.80 that would reduce his average purchase price?
(vii) If he is willing to change both his first and second thresholds from $3.80 and $3.90 respectively, what might you suggest he do to lower his average purchase price?
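Parts (iv) through (vii) invite experimenting with the thresholds. One way to do so, sketched below in Python, is to combine the purchase probabilities and conditional means from (ii) and (iii) through the conditional expectation rule. The function name expected_price and its argument names are hypothetical, and the sketch assumes both thresholds lie between the lowest and highest possible prices.

```python
def expected_price(t1, t2, lo=3.70, hi=4.00):
    """Mean purchase price when each station's price is U(lo, hi).

    t1 is the threshold at the first station and t2 at the second;
    the sketch assumes lo <= t1 <= hi and lo <= t2 <= hi.
    """
    width = hi - lo
    p1 = (t1 - lo) / width               # he buys at station 1
    p2 = (1 - p1) * (t2 - lo) / width    # he passes station 1, buys at station 2
    p3 = 1 - p1 - p2                     # he ends up at station 3
    mean1 = (lo + t1) / 2                # price given purchase at station 1
    mean2 = (lo + t2) / 2                # price given purchase at station 2
    mean3 = (lo + hi) / 2                # any price is accepted at station 3
    return p1 * mean1 + p2 * mean2 + p3 * mean3


# Compare Mendel's current rule with a few alternative threshold pairs.
for t1, t2 in [(3.80, 3.90), (3.80, 3.85), (3.75, 3.90), (3.85, 3.85)]:
    print(f"thresholds ({t1:.2f}, {t2:.2f}): mean price {expected_price(t1, t2):.4f}")
```

Trying a grid of threshold pairs in place of the short list above is a quick way to form a conjecture before arguing it analytically.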

EXERCISE 3.31

A classic probability problem asks: if a stick is broken at two completely random places, what is the chance that the three pieces can form a triangle? We will answer that question now in ... pieces.

(i) Before doing any calculations or agonizing, what is your "gut feel" estimate of the answer?

(ii) Suppose that the stick is of length 1, with 0 as its lower end. Let X1 be the location of the first random point chosen (in chronological order), and X2 of the second point. (Assume that X2 can take any value on (0,1) regardless of the value of X1.) What are the distributions of X1 and X2?

Suppose that we define the first segment of the broken stick as between 0 and min(X1, X2), the second segment as between min(X1, X2) and max(X1, X2), and the third segment as between max(X1, X2) and 1.

(iii) For what numerical range of values for X1 and X2 will the first segment be so long that a triangle is impossible? (Hint: Can you make a triangle if the first segment is longer than the sum of the other two?)
(iv) What is the probability that the first segment will be so long as to prevent a triangle?
(v) Argue that the chance the third segment will be so long as to prevent a triangle is the same as the chance that the first segment will do so.
(vi) Now argue that the chance the second segment will prevent a triangle is the same as the chance that the first will do so. (Hint: Is the probability that X1 is between .55 and .55001 and X2 is between .64 and .64001 the same as the probability that X1 is between .64 - .55 = .09 and .09001 and X2 is between .64 and .64001? Can you generalize this argument?)
(vii) Now, what is the overall chance that a triangle is not possible?
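After working through the parts above, a quick simulation can serve as a check on the final answer (and on the gut feel from part (i)). The Python sketch below is an illustration only: the function names, the seed, and the number of trials are assumptions. It uses the fact from the hint in (iii) that a triangle is impossible exactly when one segment is at least as long as the other two combined, that is, when the longest segment is at least 1/2.

```python
import random


def can_form_triangle(x1, x2):
    """True if the three pieces cut at x1 and x2 on a unit stick form a triangle."""
    low, high = min(x1, x2), max(x1, x2)
    pieces = (low, high - low, 1 - high)
    return max(pieces) < 0.5  # no piece as long as the other two combined


def estimate_triangle_chance(trials=100_000, seed=7):
    """Monte Carlo estimate with both break points independent and U(0,1)."""
    rng = random.Random(seed)
    hits = sum(can_form_triangle(rng.random(), rng.random()) for _ in range(trials))
    return hits / trials


triangle_chance = estimate_triangle_chance()
print(f"estimated chance of a triangle: {triangle_chance:.3f}")
print(f"estimated chance of no triangle: {1 - triangle_chance:.3f}")
```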

EXERCISE 3.32

Suppose that crimes in a one-dimensional town of length L are uniformly distributed over the town. When a crime occurs, the location of the police car that will respond to it is also uniformly distributed over the town, and the locations of the crime and the police car are independent random variables.

(i) Given that the crime occurs in (y, y + Δ) for 0 ≤ y ≤ L and Δ very small, what is the expected distance the police car must travel to the scene of the crime?
(ii) What is the overall mean distance the police car must travel? (Hint: Use the conditional expectation rule and the result in (i).)
(iii) What is the standard deviation of the distance the police car must travel? (Hint: First work out the mean squared travel distance.)

EXERCISE 3.33

Suppose we return to the 34th Street subway station in Example 3.12, but one thing has changed. Downtown express trains continue to arrive at the rate of six per hour, but now they do so at intervals of exactly ten minutes. Similarly, downtown locals arrive at twelve per hour but at intervals of exactly five minutes. Mendel arrives at a random time, and knows nothing about the train schedules. Thus, the time he would wait for the next downtown express is U(0,10), and the time until the next downtown local is U(0,5). Assume that the times until the next express and the next local are independent random variables.

(i) If Mendel goes to the express platform, what is the probability that he will reach Canal Street in less than ten minutes? (As in Example 3.12, travel time to Canal St. is exactly eight minutes on the express.)
(ii) What is the probability that the next downtown train to reach 34th St. is a local? More specifically, argue that the answer is

$$\int_0^5 \frac{1}{5} \cdot \frac{10 - y}{10}\, dy$$

and find the numerical answer.
(iii) What is the probability that the next two trains downtown are both locals? (Hint: How should you alter the integral in (ii) to answer this question?)
(iv) What is the probability that Mendel arrives earlier on the express than he would have had he taken the local?

EXERCISE 3.34

In radioactive alpha decay, an atomic nucleus emits an alpha particle and decays into another atom with lower mass number and atomic number. Each nucleus subject to such decay will do so at a time (from now) that is exponentially distributed with parameter λ, and different nuclei are independent in their decay times.

[Figure: Am-241 emits an alpha particle and decays into Np-237.]

(i) Suppose that there are N nuclei simultaneously subject to alpha decay (N large). Approximately how long will it take until half of those nuclei will have decayed? (This time is called the half-life of such nuclei.)

True or False (Please provide both answer and explanation.)

(ii) The time until 3/4 of the nuclei have decayed will be approximately twice the half-life.
(iii) There is no finite time T when it is possible that all N nuclei have already decayed.
(iv) There is no finite time T when it is definite that all N nuclei have already decayed.
(v) The time until the first instance of alpha decay among the N nuclei is exponentially distributed with parameter λN.

EXERCISE 3.35

In a telephone exchange, it is believed that the durations of calls are exponentially distributed with mean length of 10 minutes. Three calls start simultaneously at 1:00:00 PM (the last two digits refer to seconds); assume that the durations of these calls are independent.

(i) What is the probability that all three calls will be finished by 1:10 PM?
(ii) What is the probability that all three calls end between 1:10:00 PM and 1:15:00 PM?
(iii) What is the probability that the longest call lasts beyond 1:15:00 PM? (Hint: Under what circumstances are all calls finished by 1:15:00?)
(iv) Suppose that the second longest call ends at 1:08:30 PM. What is the probability that the longest call is still going on at 1:23:30?

EXERCISE 3.36

Suppose that the intervals between successive buses at a given stop are equally likely to be 3, 4, and 5 minutes.

(i) If Mendel just misses a bus, what is the probability that he will wait at least four minutes until the next bus?
(ii) Suppose that we observe N bus intervals in a row, with N very large. About how many minutes will it take until the Nth bus arrives?
(iii) Of the minutes until the Nth bus arrives, about how many of them occur during three-minute intervals between successive buses? Four-minute intervals? Five-minute intervals?
(iv) Using the results in (ii) and (iii), what fraction of the minutes until the Nth bus arrives are occupied by three-minute intervals? If Minerva arrives at a random time, what is the probability that she arrives during a three-minute interval?
(v) What is the probability that Minerva will arrive in a three-minute interval and will wait less than two minutes for the next bus?
(vi) What is the overall probability that Minerva will wait less than two minutes for the next bus?
(vii) Given that the average time between buses is four minutes, Zanzibar asserts that a person like Minerva who arrives at a random time will wait on average two minutes until the next bus. Is that statement true? (Hints: If Minerva arrives in a five-minute interval, what is her expected wait until the next bus? Remember the conditional expectation rule.)
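The chain of reasoning in parts (ii) through (vii) can be organized as a short calculation, which may be useful for checking your work once you have attempted the exercise. The Python sketch below is an illustration with hypothetical names (random_arrival_summary, interval_probs, threshold). It assumes, as the exercise suggests, that the long-run share of minutes spent in intervals of a given length is proportional to that length times its probability, and that a random arrival is uniformly placed within whichever interval she lands in.

```python
from fractions import Fraction


def random_arrival_summary(interval_probs, threshold):
    """Summarize what a randomly arriving rider experiences.

    interval_probs maps each interval length (minutes) to its probability.
    Returns (share of time by interval length,
             P(wait < threshold),
             mean wait).
    """
    mean_length = sum(Fraction(length) * p for length, p in interval_probs.items())
    share = {}
    p_short_wait = Fraction(0)
    mean_wait = Fraction(0)
    for length, p in interval_probs.items():
        # Long-run fraction of minutes that fall in intervals of this length
        weight = Fraction(length) * p / mean_length
        share[length] = weight
        # Uniform position within the interval: wait < threshold with prob min(threshold, length)/length
        p_short_wait += weight * min(Fraction(threshold), Fraction(length)) / length
        # Mean wait within an interval of this length is half the length
        mean_wait += weight * Fraction(length) / 2
    return share, p_short_wait, mean_wait


bus_intervals = {3: Fraction(1, 3), 4: Fraction(1, 3), 5: Fraction(1, 3)}
share, p_wait_under_two, mean_wait = random_arrival_summary(bus_intervals, 2)
print("share of minutes by interval length:", {k: str(v) for k, v in share.items()})
print("P(wait < 2 minutes):", p_wait_under_two)
print("mean wait in minutes:", mean_wait)
```

Exact fractions are used so that the output can be compared directly with hand calculations.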

EXERCISE 3.37

When Mendel arrives at a certain airport, he travels to town either in a shuttle van or in a bus. The vans have eight seats, and the next van only leaves the airport when all eight seats are filled with passengers. The bus operates on a fixed departure schedule. He wants to choose whichever vehicle will leave the airport first.

(i) Suppose that passengers for the van arrive under a Poisson process with rate 0.5 per minute, and that Mendel notices that four people are already in the van. (If he gets in, he will be the fifth passenger.) It is now 11:50 AM, and the next bus leaves at noon. If he chooses the van, what is the probability that it will depart the airport before the bus?
(ii) Suppose that the bus and the van depart from separate places, and Mendel must make a choice between them before knowing how many people are already in the van. If it is ten minutes before the next bus leaves and he chooses the van, what is the probability the van will leave before the bus? (Assume that the number of passengers he will find on the van is equally likely to be any integer from zero to seven, and that it takes him one-trillionth of a second to reach the van.)
(iii) Now suppose that Mendel has ten minutes until the bus leaves, and that he has time to go to the van stop and then, if he finds too few passengers in the van, to move on to get the bus. His rule is that he will go to the van (in one-trillionth of a second) but will leave for the bus stop if the number of passengers aboard the van is either zero or one. Under that rule, what is the probability that he will choose the vehicle that leaves the airport first?

EXERCISE 3.38

(i) Using an electronic table, find the 90th percentile C90 of each of the following normal distributions: N(0,1), N(20,5), N(-20,60).
(ii) By calculating (C90 - µ)/σ for each of the distributions in (i), see how many standard deviations the 90th percentile is from the mean.
(iii) Now, find the 25th percentile C25 for each of the distributions in (i).
(iv) Now find for each distribution how many standard deviations the 25th percentile is from the mean. (A negative number simply tells how many standard deviations below the mean.)
(v) If you have done your calculations correctly, your answers to (ii) and (iv) for all three distributions should be the same. As we will discuss in Chapter 4, that outcome is not a coincidence.
(vi) Does it follow from (v) that, to find out how many standard deviations a particular percentile of a normal distribution is from its mean, we need merely find the answer for that percentile for the distribution N(0,1)?

EXERCISE 3.39

Read Example 3.18, "California Nightmare."

(i) Using the information and probability model discussed in the example, what is the probability of a major earthquake on the Southern San Andreas Fault over the next twenty years (starting in mid-2014)?
(ii) If there is no major earthquake on the Southern San Andreas over the next twenty years, what is the probability (from that time) of an earthquake in the next twenty years?
(iii) Over how many years from now is there a 50% probability of seeing the next major earthquake on the Southern San Andreas? (Your answer can involve fractions of a year.)

EXERCISE 3.40

For a company that has just come out of Chapter 11 bankruptcy protection, the following statistics apply:

• The chance is 60% that it will return to Chapter 11 bankruptcy in the future.
• Given that it returns to Chapter 11 bankruptcy, the time until it does so is normally distributed with mean two years and standard deviation six months.
• Given that the company returns to Chapter 11 bankruptcy, the probability is 70% that it will be liquidated. (If it does not return to bankruptcy, it will not be liquidated.)

(i) Based on these statistics, the chance that a company that just left Chapter 11 will eventually be liquidated is: (A) 50% (B) 42% (C) 35% (D) 30%

(ii) Based on these statistics, the probability that the company will return to Chapter 11 within two years is: (A) 50% (B) 42% (C) 35% (D) 30%

(iii) Given that the company goes three full years without returning to bankruptcy, the probability that it will do so later is closest to: (A) 1.5% (B) 2.5% (C) 3.6% (D) 3.8%

EXERCISE 3.41

During a bull market the weekly price change of a share of stock X is normally distributed with mean .05P and variance 1, where P is the price at the beginning of the week.

(i) If a share of stock X costs $24 at the beginning of a week, what is the probability the stock goes up that week?
(ii) Given that the stock goes up that week, what is the probability it reaches $27?
(iii) Given that the stock goes up that week, is the probability of further increase the next week more or less than the quantity calculated in (i)? Explain. (No calculations are necessary.)

EXERCISE 3.42

In Exercise 3.26, suppose instead that the temperature increase is assumed normally distributed, with a mean halfway between 3.5 and 7.4 (i.e., at 5.45).

(i) What standard deviation for the increase is consistent with the MIT statement? (Hint: What is 7.4 as a percentile of the distribution of the increase? Look at Exercise 3.38 to see how to determine how many standard deviations that percentile is above the mean.)
(ii) Now, what is the probability that the rise will exceed 5° Celsius?
(iii) Find an interval centered at 5.45 that has a 99% chance of including the actual increase.

EXERCISE 3.43

Suppose that the lifespan of a piece of equipment is normally distributed with mean 10 years and standard deviation 3 years. A particular piece of equipment is now age 11 years. The firm is considering two options:

• Replace it now.
• Wait until age 13 to replace it (unless it dies before then, in which case an unscheduled replacement is necessary).

Suppose that immediate replacement will cost the company $a. Replacement at age 13 (if the equipment lasts that long) will cost $b, where b < a. An unscheduled replacement between now and age 13 (if the equipment dies suddenly during the next two years) will cost $c, where c > a.

(i) What is the probability that, barring replacement now, the equipment will last until age 13?
(ii) What is the expected cost of replacement if the equipment is not replaced now?
(iii) Under what condition on a, b, and c is expected cost lower if the equipment is replaced now than under the age 13 policy?

EXERCISE 3.44

(Based on a true story) If Travelopedia sells at least $3 million of tickets on airline X in a given month, the airline will give it a bonus of $250,000. (The bonus is called an override commission.) If Travelopedia fails to reach $3 million in sales, it gets no bonus at all. Travelopedia projects that it has a 50% chance of meeting the $3 million target if it does nothing. If Travelopedia itself buys $60,000 in tickets on airline X (which it is not allowed to sell to anyone), it projects a 95% chance of meeting its target. If it buys $120,000 in tickets, it projects a 99.9% chance of meeting its target.

(i) Assuming that average net gain tied to the bonus is the standard, what should Travelopedia do? Use a decision tree if you like.
(ii) Suppose that Travelopedia projects that, absent any action on its part, the dollar volume of tickets it can sell in the rest of the month is normally distributed. If it is now short of the $3 million goal by $400,000, what is the mean of that normal distribution given the statements above?
(iii) Given that Travelopedia will have a 95% chance of meeting its target with a $60,000 purchase of useless tickets, what is a reasonable estimate of the standard deviation of the normal distribution in (ii)?
(iv) With the mean and standard deviation estimated in (ii) and (iii), is the probability of meeting its target with a $120,000 purchase of useless tickets very close to 99.9%?

EXERCISE 3.45

Suppose that a 70-room hotel offers rooms for $129 apiece but has an "EarlyBird" nonrefundable rate of $59 that requires booking at least one month in advance. It believes the demand for full-priced rooms on a particular day (which materializes only a few days ahead) is normally distributed with mean 66 and standard deviation 5. (The conversion rule from the normal curve to integers is the same as that in the example "Airline Yield Management.") Demand for EarlyBird rooms is unlimited. For simplicity, assume that all bookings are for exactly one night.

(i) Using the iterative approach of the airline yield-management problem (i.e., first decide whether to accept the first EarlyBird reservation; then, if it is accepted, decide whether to accept the second, etc.), find the number of rooms the hotel should sell at the EarlyBird rate if its goal is to maximize expected revenue per night.
(ii) Now suppose that it is known that each party that books at the EarlyBird rate has a 10% chance of not showing up (though it cannot get a refund in that case). Those who book rooms at full price always show up. If any guest shows up and there are no rooms available (i.e., the hotel is overbooked), the hotel must pay a penalty of $500 as well as give a refund. Would you change your booking strategy in (i) based on this new information? Explain.

EXERCISE 3.46

New Distribution: The Beta

The beta distribution has two parameters traditionally called α and β, and its pdf is given by:

f_X(y) = c y^(α-1) (1 - y)^(β-1) for 0 ≤ y ≤ 1, and f_X(y) = 0 otherwise,

where c is a constant equal to 1 / ∫_0^1 y^(α-1) (1 - y)^(β-1) dy.

(i) Show that, given the expression above for c, ∫_0^1 f_X(y) dy = 1.
(ii) Using a computer program that plots functions, plot y^(α-1) (1 - y)^(β-1) between 0 and 1 for α = β = 2, and also for α = 1, β = 3 and α = 10, β = 4. Do the results suggest that many different shapes of density function between 0 and 1 can be approximated by a beta distribution?
(iii) In the UK, a by-election takes place when a seat in the House of Commons becomes available between general elections. Suppose that a by-election is to take place shortly, and it is initially believed that the fraction of voters supporting the Labour candidate is equally likely to be anywhere from 0 to 1. Then a poll of 100 voters is taken, and he has the support of 42 of them. Using Bayes' Theorem, find the revised pdf for his support level. Is the revised pdf a beta distribution? (Yes.) What are its parameters? (Hint: Recall that polling is modeled as a binomial process.)
(iv) Suppose that the initial pdf for the candidate's support was a beta distribution with α = β = 2. Now what is the revised pdf for his support level? Is that itself a beta distribution? (Yes.) What are its parameters?
(v) Consider the statement "if the prior distribution for a parameter is beta and we get new information about the parameter from a binomial process, the revised (posterior) distribution for the parameter is also beta." Is that statement true? Briefly discuss.
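For part (ii) of the beta exercise, any plotting tool will do. As one possibility, the Python sketch below evaluates the function to be plotted on a grid, approximates the constant c with the trapezoid rule, and prints a rough text picture of each curve. The function names are hypothetical, and the sketch assumes both parameters are at least 1 so that the function stays finite at the endpoints.

```python
def beta_kernel(y, a, b):
    """The unnormalized beta density y^(a-1) * (1-y)^(b-1) on [0, 1]."""
    return y ** (a - 1) * (1 - y) ** (b - 1)


def normalizing_constant(a, b, n=10_000):
    """Approximate c = 1 / integral of the kernel over [0, 1] (trapezoid rule)."""
    h = 1.0 / n
    total = 0.5 * (beta_kernel(0.0, a, b) + beta_kernel(1.0, a, b))
    total += sum(beta_kernel(i * h, a, b) for i in range(1, n))
    return 1.0 / (total * h)


def grid_mode(a, b, steps=100):
    """Grid point in [0, 1] at which the kernel is largest."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda y: beta_kernel(y, a, b))


for a, b in [(2, 2), (1, 3), (10, 4)]:
    c = normalizing_constant(a, b)
    print(f"alpha={a}, beta={b}: c is about {c:.2f}, peak near y={grid_mode(a, b):.2f}")
    for i in range(0, 11):
        y = i / 10
        bar = "#" * round(10 * c * beta_kernel(y, a, b))  # crude text plot of the pdf
        print(f"  y={y:.1f} {bar}")
```

Replacing the text bars with a call to a plotting library of your choice gives smoother pictures; the comparison of shapes is the point of the exercise.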

EXERCISE 3.47

New Distribution: The Lognormal

Suppose that random variable Z is the ratio of an investor's holdings at the end of the year to her holdings at the beginning. A ratio of one would mean that the value did not change (which is far from the worst outcome!). Based on finance theory, it is believed that X, the natural logarithm of Z, is N(0,0.2). Thus, for example, P(X < 0) = 1/2. In this instance, Z is said to be lognormal with parameters (0,.2) (i.e., the mean and standard deviation of the logarithm).

(i) What is P(Z ≤ 1)?
(ii) What is P(0 ≤ Z ≤ 1)? Is P(Z < 1/2) the same as P(Z > 2)?
(iii) What is the probability that the holdings increase in value by at least 50% over the year?
(iv) What is the median of Z? Do you think the mean of Z will exceed its median? (Hint: Do the values of Z that exceed one tend to spread out more than those that fall below one?)
(v) Go to Google or whatever and find an expression for the mean of a random variable whose logarithm is N(µ,σ). Is the answer consistent with what you said in (iv)?

EXERCISE 3.48

New Distribution: The Weibull

The Weibull distribution is often used in operations management and engineering to approximate the lifespan of some item. It has parameters λ and k, has a range from 0 to ∞, and has a CDF given by:

$$F_X(y) = P(X \le y) = 1 - e^{-\lambda y^k}$$

(i) When k = 1, to which distribution that we already know does the Weibull correspond?
(ii) For λ = 1, work out P(0 ≤ X ≤ 1), P(1 ≤ X ≤ 2) and P(2 ≤ X ≤ 3) for k = 1, k = 1/2, and k = 2.
(iii) Now work out: P(1 ≤ X ≤ 2 | X ≥ 1) and P(2 ≤ X ≤ 3 | X ≥ 2) for the three k-values.
(iv) Among Weibull distributions, some k-values are associated with an increasing failure rate with age, some with a decreasing failure rate with age, and one with a constant failure rate with age. Given your results above, how would you classify k = 1/2, k = 1, and k = 2 under this terminology?
(v) Think of one or two items that you commonly use. How do you think their failure rates vary with age? Might some of their lifespans be approximated as Weibull? (This is not a deep question; just think in general terms. Don't be afraid to answer "no.")

EXERCISE 3.49

New Distribution: Student's t

If X has a (Student's) t-distribution with parameter n (for n a positive integer), its pdf follows:

for $-\infty$ ...

... Montana + Wyoming" can occur. There could, for example, be 65 births in Idaho and 64 or fewer in Montana and Wyoming, or 53 births in Idaho and 52 or fewer in Montana and Wyoming. The total number of births in Montana and Wyoming combined ($S_{MW}$) is Poisson distributed with mean 32 + 19 = 51, while the number in Idaho ($S_I$) is Poisson with mean 62. We can write:

$$P(S_I > S_{MW}) = \sum_{k} P(S_I = k) \cdot P(S_{MW} \le k - 1)$$

Evaluating the expression on the right with a spreadsheet and Poisson calculator (allowing k to vary from (say) 40 to 100, outside of which $P(S_I = k) \approx 0$), we reach: $P(S_I > S_{MW}) \approx 0.84$

Because Idaho had a somewhat higher Poisson parameter than Montana and Wyoming combined, it is no surprise that the answer falls considerably above 50%. But we needed the calculation to find out what "considerably" means.
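The spreadsheet evaluation described above can also be sketched in a few lines of Python. The version below is an illustration, not the author's calculation: it assumes the Idaho count and the combined Montana and Wyoming count are independent Poisson variables with means 62 and 51, and it sums over k from 0 to a generous cutoff (k_max, an arbitrary choice) rather than from 40 to 100.

```python
import math


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson variable, computed in logs for numerical stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def prob_first_exceeds(mean_a, mean_b, k_max=300):
    """P(A > B) for independent Poisson A and B: sum over k of P(A = k) * P(B <= k - 1)."""
    total = 0.0
    cdf_b_below_k = 0.0  # running value of P(B <= k - 1)
    for k in range(k_max + 1):
        total += poisson_pmf(k, mean_a) * cdf_b_below_k
        cdf_b_below_k += poisson_pmf(k, mean_b)
    return total


p_idaho_more = prob_first_exceeds(62, 32 + 19)
print(f"P(Idaho > Montana + Wyoming) is about {p_idaho_more:.2f}")
```

Swapping in other means shows how quickly the probability moves away from 50% as the gap between the two Poisson parameters grows.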

EXERCISE 4.11

On a day in which there are exactly 25 births in Wyoming, what is the probability that there are at least four times that number in Idaho and Montana combined?

Normal:

Suppose that $W_1, W_2, \ldots, W_n$ are independent normal random variables with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively. Then the pleasant news is that their sum S is itself normally distributed, with mean $\mu_S = \sum_{k=1}^{n} \mu_k$ and variance $\sigma_S^2 = \sum_{k=1}^{n} \sigma_k^2$. We postpone the proof of this result until we meet moment generating functions later in this chapter. Note that the formula is true for all means and all variances: the addition of independent bell-shaped curves always results in a bell-shaped curve.

If Wk is N(µ1c, 0

(4-40)

Differentiating $\lambda/(\lambda - \theta)$ with respect to θ yields $M_X'(\theta) = \lambda/(\lambda - \theta)^2$, which equals 1/λ when θ = 0. Thus, E(X) = 1/λ, which is what we determined in Chapter 3. The second derivative of $M_X(\theta)$ follows: $M_X''(\theta) = 2\lambda/(\lambda - \theta)^3$, which means that $E(X^2) = M_X''(0) = 2/\lambda^2$. It follows that $\sigma^2(X) = E(X^2) - \mu_X^2 = 1/\lambda^2$, and thus that σ(X) = 1/λ. This outcome reaffirms that the standard deviation of any exponential random variable is the same as its mean.
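The link between derivatives of the moment generating function at zero and the moments themselves can be checked numerically. The Python sketch below is illustrative only: it approximates the first two derivatives of λ/(λ - θ) at θ = 0 with central finite differences instead of differentiating by hand, and the rate λ = 1/20 and the step size h are assumptions chosen for the demonstration.

```python
import math


def mgf_exponential(theta, lam):
    """M_X(theta) = lam / (lam - theta), valid only for theta < lam."""
    if theta >= lam:
        raise ValueError("the exponential MGF is finite only for theta < lam")
    return lam / (lam - theta)


def first_two_moments(mgf, h=1e-4):
    """Approximate M'(0) and M''(0), i.e. E(X) and E(X^2), by central differences."""
    first = (mgf(h) - mgf(-h)) / (2 * h)
    second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2
    return first, second


lam = 1 / 20  # illustrative rate: mean of 20
mean_x, mean_x_squared = first_two_moments(lambda t: mgf_exponential(t, lam))
std_x = math.sqrt(mean_x_squared - mean_x ** 2)
print(f"E(X) is about {mean_x:.3f}; 1/lambda = {1 / lam:.3f}")
print(f"sigma(X) is about {std_x:.3f}; 1/lambda = {1 / lam:.3f}")
```

The finite-difference values carry a small approximation error, so they agree with 1/λ only to a few decimal places.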

EXERCISE 4.23

Find the moment generating function of X when X is gamma with parameters λ and k = 2. Use it to verify that the mean of X is 2/λ.

EXAMPLE 4.24

Normal Moments

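Find the moment generating function for X when X is N(0,1).

Solution

If X is normal with parameters 0 and 1 (i.e., X is N(0,1)), then we can write:

$$M_X(\theta) = E(e^{\theta X}) = \int_{-\infty}^{\infty} e^{\theta y} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\, dy$$

Or

$$M_X(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(y^2/2 - \theta y)}\, dy$$

Do you remember the "completing the square" technique from high school, by which any function of the general form $ay + by^2$ can be rewritten as:

$$ay + by^2 = \left(y\sqrt{b} + \frac{a}{2\sqrt{b}}\right)^2 - \frac{a^2}{4b}$$

We can use it here to rewrite $y^2/2 - \theta y$ as $(y/\sqrt{2} - \theta/\sqrt{2})^2 - \theta^2/2 = (y - \theta)^2/2 - \theta^2/2$. We can therefore express $M_X(\theta)$ as:

$$M_X(\theta) = \frac{e^{\theta^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(y - \theta)^2/2}\, dy$$

Because this last integral is known from Calculus to be $\sqrt{2\pi}$, we have:

If X is N(0,1), its moment generating function $M_X(\theta)$ follows:

$$M_X(\theta) = e^{\theta^2/2} \qquad (4\text{-}41)$$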
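A numerical check of this result is straightforward. The Python sketch below integrates e^{θy} against the N(0,1) density with the trapezoid rule and compares the outcome with e^{θ²/2}. The integration limits of ±20, the number of steps, and the function name are assumptions made for the illustration; the limits are wide enough for the modest θ-values tried here.

```python
import math


def normal_mgf_numeric(theta, lo=-20.0, hi=20.0, n=40_000):
    """Trapezoid-rule approximation of E(e^{theta X}) for X that is N(0,1)."""
    h = (hi - lo) / n

    def integrand(y):
        # e^{theta y} times the standard normal density
        return math.exp(theta * y - y * y / 2) / math.sqrt(2 * math.pi)

    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, n))
    return total * h


for theta in (0.0, 0.5, 1.0, 2.0):
    numeric = normal_mgf_numeric(theta)
    closed_form = math.exp(theta ** 2 / 2)
    print(f"theta={theta}: numeric {numeric:.6f}, closed form {closed_form:.6f}")
```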

EXERCISE 4.24

If X is N(µ, σ), modify the approach in Example 4.24 to show that X has a moment generating function given by:

$$M_X(\theta) = e^{\mu\theta + \sigma^2\theta^2/2}$$

EXAMPLE 4.25

Circular Reasoning

Suppose that a machine produces circular chips, but there is some variability in the radii of these chips. The radius of a randomly-selected chip is normally distributed with mean 1 unit and standard deviation .04 units. Given this information, what are the mean and standard deviation of the area of the chips produced?

Solution

As we may recall, the area $A_R$ of a circle relates to its radius R under the formula $A_R = \pi R^2$. Thus, $E(A_R) = \pi E(R^2)$, and $\sigma^2(A_R) = \pi^2 E(R^4) - \pi^2 [E(R^2)]^2$. Given that R is N(1,.04), we know that $E(R^2) = \sigma_R^2 + \mu_R^2 = .04^2 + 1^2 = 1.0016$. Thus, $E(A_R) = \pi E(R^2) = 3.1416 * 1.0016 = 3.147$. As for $E(R^4)$, we can take the fourth derivative of the moment generating function of R, noting that $M_R(\theta) = e^{\mu\theta + \sigma^2\theta^2/2} = e^{\theta} e^{.0016\,\theta^2/2}$. Evaluated at θ = 0, that fourth derivative (which is equal to $E(R^4)$) is 1.0096. (Be patient in differentiating if you try to verify this outcome.) Therefore, $\sigma^2(A_R) = \pi^2 E(R^4) - \pi^2 [E(R^2)]^2 = \pi^2 (1.0096 - 1.0032) = .063$ and $\sigma(A_R) = \sqrt{.063} = .251$
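Readers who would rather not differentiate four times can check the arithmetic with the short Python sketch below. It is an illustration under one stated substitution: instead of the moment generating function, it uses the standard closed forms for the second and fourth raw moments of a normal variable, E(R²) = µ² + σ² and E(R⁴) = µ⁴ + 6µ²σ² + 3σ⁴. The function name is hypothetical.

```python
import math


def circle_area_moments(mu, sigma):
    """Mean and standard deviation of pi * R^2 when R is normal with mean mu, sd sigma."""
    e_r2 = mu ** 2 + sigma ** 2                                   # E(R^2)
    e_r4 = mu ** 4 + 6 * mu ** 2 * sigma ** 2 + 3 * sigma ** 4    # E(R^4)
    mean_area = math.pi * e_r2
    var_area = math.pi ** 2 * (e_r4 - e_r2 ** 2)
    return mean_area, math.sqrt(var_area)


mean_area, sd_area = circle_area_moments(1.0, 0.04)
print(f"E(area) is about {mean_area:.3f}, sigma(area) is about {sd_area:.3f}")
```

Changing the standard deviation of the radius in the call shows how sensitive the spread in area is to variability in the radius.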

EXERCISE 4.25

Suppose that a chocolate company produces cubes of chocolate. Because of variability in the production process, the length W of the side of the cube is a random variable.

(i) In terms of the moments of W, what are the mean and variance of the cube's volume? (Hint: Do you have a sixth sense about the answer?)
(ii) Suppose that W = 1 + X, where X is exponential with parameter 100 (and thus mean .01). What are the mean and standard deviation of the volume of a chocolate cube? (Hint: find the first six moments of X, and then find $E(W^6)$ in terms of these moments.)

Two More Moments

So far, we have emphasized the mean and variance, the first two moments of a random variable's distribution. But because the moment generating function often makes it tractable to find E(X~ for k larger than two, this is an opportune time to introduce two additional moments that provide further information about the distribution. The third central moment of X's distribution is E((X - µ x)3), while the fourth central moment is E((X- µ x)4) . (Here µ xis the mean of X.) What can we learn from these moments? The third central moment tells us something about how the spread of X-values to the left of the mean compares to that on the right. Note that (X- µ x)3 is negative when X below the mean and positive when Xis above it. It follows that, if the X-values above the mean spread out further than those below (i.e., if the distribution of X has a long right tail), the positive contributions to E((X - µ x)3) can considerably exceed the negative ones. The opposite is expected if X has a long left tail. If the distribution is symmetric about its mean-meaning that the chance of exceeding the mean by amount c is the same as the chance of falling below the mean by c-then E((X- µ x) 3) = 0. A frequently used indicator involving the third central moment is the skewness y:

The skewness $\gamma$ of random variable $X$ is defined by:

$$\gamma = \frac{E((X - \mu_X)^3)}{\sigma_X^3} \tag{4-42}$$

The idea is that the standard deviation already tells us something about the spread in the distribution of $X$, and that we want to compare the asymmetry in the spread on opposite sides of the mean with the overall level of spread. (We use $\sigma_X^3$ in the denominator of $\gamma$ to get a cubic measure of spread against which we might compare the cubic variability in the numerator.) A high positive value of $\gamma$ implies that, even taking account of the spread in the distribution, the spread of measurements to the right of the mean is considerably more pronounced than the spread to the left. (A high negative value has the opposite interpretation.) A value of $\gamma$ close to zero implies that, in the context of the overall spread in the distribution of $X$, the left/right asymmetry in the spread is no big deal. For the fourth central moment, the quantity corresponding to skewness is the kurtosis $\kappa$:

The kurtosis $\kappa$ of random variable $X$ is defined by:

$$\kappa = \frac{E((X - \mu_X)^4)}{\sigma_X^4} \tag{4-43}$$

The idea here is that even two distributions that have the same mean and the same variance can differ appreciably. At the lower extreme, half the data might be heavily concentrated close to $\mu_X - \sigma_X$ and the other half near $\mu_X + \sigma_X$; at the other extreme, some of the data might dribble out many standard deviations away from the mean. Distinguishing the two situations can help in assessing how extreme the outer outcomes in the distribution can be. The kurtosis can help in that assessment. (Sometimes, people define kurtosis as $E((X - \mu_X)^4) / \sigma_X^4 - 3$, for reasons that will become clearer once we perform a few calculations.)
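To make the two indicators concrete, here is a minimal sketch (not from the text) that computes them for a discrete random variable, taking skewness as the third central moment divided by $\sigma_X^3$ and kurtosis as the fourth central moment divided by $\sigma_X^4$. The function name and the two example distributions are illustrative assumptions.

```python
from math import sqrt

def skewness_and_kurtosis(values, probs):
    """Return (skewness, kurtosis) of a discrete random variable
    that takes values[i] with probability probs[i]."""
    mean = sum(p * x for x, p in zip(values, probs))

    def central(k):
        # k-th central moment E((X - mean)^k)
        return sum(p * (x - mean) ** k for x, p in zip(values, probs))

    sd = sqrt(central(2))
    return central(3) / sd ** 3, central(4) / sd ** 4

# Half the mass one standard deviation below the mean, half one above:
# symmetric, and as tightly "two-humped" as a distribution can be.
two_point = skewness_and_kurtosis([-1.0, 1.0], [0.5, 0.5])

# Mostly 0 with an occasional 10: a long right tail.
right_tail = skewness_and_kurtosis([0.0, 10.0], [0.9, 0.1])

print(two_point)
print(right_tail)
```

The first distribution should show zero skewness and the smallest kurtosis in the pair, while the second should show clearly positive skewness and a much larger kurtosis.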

EXERCISE 4.26

Using a moment generating function, find $E(X^3)$ and $E(X^4)$ when $X$ is exponential with parameter $\lambda$. Then use these results with $E(X)$ and $E(X^2)$ (which we determined in Example 4.23) to find the skewness and kurtosis of $X$.

Moment Generating Function of a Random Variable That Is Linear in Another

Given the way skewness and kurtosis are defined, we could find both of them directly for random variable $X$ if we could easily find the moment generating function of $V = X - \mu_X$.

That circumstance raises a broader question: How does the moment generating function of V = a + bX relate to that of X? Fortunately, the relationship is not complicated.

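If $V = a + bX$, we can write:

$$M_V(\theta) = E(e^{\theta V}) = \int_{-\infty}^{\infty} e^{\theta y} f_V(y)\,dy$$

Note that having $V$ between $y$ and $y + dy$ is the same as having $X$ between $(y - a)/b$ and $(y - a)/b + dy/b$. But then

$$P(y \le V \le y + dy) = f_V(y)\,dy = P\big((y - a)/b \le X \le (y - a)/b + dy/b\big) = f_X\big((y - a)/b\big)\,dy/b$$

which means that:

$$M_V(\theta) = \int_{-\infty}^{\infty} e^{\theta y} f_X\big((y - a)/b\big)\,dy/b$$

If we set $w = (y - a)/b$, then $y = a + bw$ and $dw = dy/b$, and we can therefore write:

$$M_V(\theta) = \int_{-\infty}^{\infty} e^{\theta(a + bw)} f_X(w)\,dw = e^{a\theta} \int_{-\infty}^{\infty} e^{b\theta w} f_X(w)\,dw = e^{a\theta} M_X(b\theta)$$

If $V = a + bX$, then:

$$M_V(\theta) = e^{a\theta} M_X(b\theta) \tag{4-44}$$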

In other words, the moment generating function of $V$ is readily expressed in terms of the moment generating function of $X$, so long as we use $b\theta$ rather than $\theta$ in the exponent involving $X$. Through a double use of the Taylor-series expansion for the exponential, we reach:

$$M_V(\theta) = e^{a\theta} M_X(b\theta) = (1 + a\theta + a^2\theta^2/2 + \dots)(1 + b\theta E(X) + b^2\theta^2 E(X^2)/2 + \dots)$$

$$= 1 + \theta\,(a + bE(X)) + \theta^2\,[a^2 + 2abE(X) + b^2 E(X^2)]/2 + \dots$$

The first derivative of $M_V(\theta)$ at $\theta = 0$ is as usual the mean of $V$, but we see that it is also $a + bE(X)$. That is the result $E(V) = a + bE(X)$ that we first saw in Chapter 2. Likewise, the second derivative of $M_V(\theta)$ at $\theta = 0$ is $E(V^2)$, but it is also equal to $a^2 + 2abE(X) + b^2 E(X^2)$, as we also would have expected.
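The relationship between the two moment generating functions is easy to test numerically. The sketch below is an illustration rather than part of the text: it uses a discrete random variable (a fair die) instead of a continuous one, where the same identity $M_V(\theta) = e^{a\theta} M_X(b\theta)$ holds with sums in place of integrals. The constants `a` and `b` are arbitrary choices.

```python
from math import exp, isclose

def mgf(values, probs, theta):
    """M(theta) = E(e^{theta X}) for a discrete random variable."""
    return sum(p * exp(theta * x) for x, p in zip(values, probs))

faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
a, b = -3.5, 2.0                         # arbitrary illustrative constants
v_values = [a + b * x for x in faces]    # V = a + bX keeps the same probabilities

for theta in (-0.5, 0.1, 0.3):
    direct = mgf(v_values, probs, theta)                     # E(e^{theta V})
    via_444 = exp(a * theta) * mgf(faces, probs, b * theta)  # e^{a theta} M_X(b theta)
    assert isclose(direct, via_444)
    print(theta, direct, via_444)
```

Each printed pair should agree to floating-point precision, whatever values of `a`, `b`, and `theta` are tried.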

EXAMPLE 4.27

Two More Normal Moments

Find the skewness and kurtosis of $X$ if $X$ is $N(\mu_X, \sigma_X)$.

Solution

The symmetry of the normal density function means that the third central moment $E((X - \mu_X)^3)$, and with it the skewness, is zero for a normal random variable. (For example, $(X - \mu_X)^3$ when $X$ is three below the mean is the negative of $(X - \mu_X)^3$ when $X$ is three above the mean, and the two possibilities are equally likely.) For the kurtosis, we set $V = X - \mu_X$ and note from (4-44) that $M_V(\theta) = e^{-\theta\mu_X} M_X(\theta)$. (We are applying (4-44) where $a = -\mu_X$ and $b = 1$.) From Exercise 4.25, we can then express $M_V(\theta)$ as:

$$M_V(\theta) = e^{-\theta\mu_X}\, e^{\mu_X\theta}\, e^{\sigma_X^2\theta^2/2} = e^{\sigma_X^2\theta^2/2}$$

Then we take the fourth derivative of $M_V(\theta)$ and, on evaluating that derivative at $\theta = 0$, we get the surprisingly simple outcome that $E((X - \mu_X)^4) = 3\sigma_X^4$. In consequence, the kurtosis $\kappa$ as defined in (4-43) follows:

If $X$ is $N(\mu_X, \sigma_X)$, then its kurtosis is given by:

$$\kappa = 3\sigma_X^4 / \sigma_X^4 = 3 \tag{4-45}$$

What (4-45) says is that every normal distribution, regardless of its mean or standard deviation, has kurtosis three. Because the normal is often viewed as the "baseline" distribution, some people define the kurtosis as $E((X - \mu_X)^4) / \sigma_X^4 - 3$. Under that convention (which we will not adopt), the normal has kurtosis zero, while distributions with positive kurtosis tend to assign higher probabilities to outcomes far from the mean than does a normal distribution with the same standard deviation. Under our own rule, a kurtosis greater than three implies more extreme "outer spread" than the corresponding normal, while a kurtosis less than three means less extreme "outer spread." (The minimum possible kurtosis is one under our definition, and it arises for the discrete distribution with half its probability mass a distance $\sigma_X$ below the mean and the other half a distance $\sigma_X$ above the mean.)
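The claim that the kurtosis is three for every normal distribution can be checked by brute force. The sketch below is an independent numerical check, not the moment generating function argument used above: it integrates $(y - \mu)^4 f(y)$ against the normal density with a midpoint rule. The function name, the step count, and the integration width are illustrative assumptions.

```python
from math import exp, pi, sqrt

def normal_kurtosis(mu, sigma, steps=20_000, width=12.0):
    """Approximate E((X - mu)^4) / sigma^4 for X ~ N(mu, sigma)
    with a midpoint rule on [mu - width*sigma, mu + width*sigma]."""
    lo = mu - width * sigma
    h = 2 * width * sigma / steps            # width of each integration cell
    fourth = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h               # midpoint of cell i
        density = exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))
        fourth += (y - mu) ** 4 * density * h
    return fourth / sigma ** 4

for mu, sigma in [(0.0, 1.0), (5.0, 0.2), (-3.0, 10.0)]:
    print(mu, sigma, normal_kurtosis(mu, sigma))
```

All three parameter choices should give a value extremely close to 3, consistent with (4-45).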

EXERCISE 4.27

Suppose that $X$ is exponential with parameter $\lambda$.

(i) Use (4-44) to relate the moment generating function of $V = X - \mu_X$ to the moment generating function of $X$.

(ii) Then use $M_V(\theta)$ to find the skewness and kurtosis of $X$.

An Important Point

Though a formal proof is beyond the scope of this book, an important theorem is that:

If two random variables V and W have the same moment generating function, then they have the same probability distribution.

(4-46)

This statement makes intuitive sense. The fact that two random variables $V$ and $W$ have the same mean does not imply they are the same; nor does having the same mean and standard deviation. But if the moment generating functions are the same, then $E(W^k)$ is the same as $E(V^k)$ for every nonnegative integer $k$. Moreover, any difference between the pdfs of $W$ and $V$ would seem inevitably to create some discrepancy somewhere when we compare $\int_{-\infty}^{\infty} e^{\theta y} f_V(y)\,dy$ and $\int_{-\infty}^{\infty} e^{\theta y} f_W(y)\,dy$ at every single value of $\theta$. The theorem says that, if no discrepancy exists, then $f_V(y) = f_W(y)$ at every $y$.

This theorem allows us a way to "back into" the probability density function of some random variable of interest. If we can show that its moment generating function is the same as that for another random variable whose pdf we know, then we can be confident that our interesting random variable has the same pdf. Indeed, we can now fulfill a promise we made earlier. We said that, if $X$ is normally distributed, then $V = a + bX$ must also be normally distributed. For simplicity, suppose that $X$ is $N(0,1)$. Then, under (4-41) and (4-44), the moment generating function of $V$ follows: $M_V(\theta) = e^{a\theta} M_X(b\theta) = e^{a\theta} e^{b^2\theta^2/2}$. But you showed in Exercise 4.25 that this is the moment generating function for a normal variable with mean $a$ and standard deviation $b$. For that reason, $V$ must itself be $N(a, b)$. More generally:

If $X$ is $N(\mu_X, \sigma_X)$ and $V = a + bX$, then:

$V$ is $N(a + b\mu_X,\ b\sigma_X)$ (4-47)

Moment Generating Functions for Sums of Independent Random Variables

There are some obvious similarities between the moment generating function and the z-transform. To find the mean, for example, we differentiate in both instances, though we evaluate the derivative at $z = 1$ in the z-transform but at $\theta = 0$ in the moment generating function. And several other properties we ascribed to the z-transform carry over to the moment generating function. For example:

If $X$ and $Z$ are independent and $S = X + Z$, then:

$$M_S(\theta) = M_X(\theta)\, M_Z(\theta) \tag{4-48}$$

The proof of this statement is brief, and appears at the course website. More generally:

If $X_1, X_2, \dots, X_m$ are independent random variables and $S = \sum_{i=1}^{m} X_i$, then:

$$M_S(\theta) = \prod_{i=1}^{m} M_{X_i}(\theta) \tag{4-49}$$

(where $\prod$ refers to multiplication)
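The product rule for sums of independent random variables can be illustrated with a small enumeration. The sketch below is an assumption-laden example rather than the proof mentioned above: it takes (4-48) to be the usual statement $M_S(\theta) = M_X(\theta) M_Z(\theta)$, builds the distribution of $S = X + Z$ for an independent die and coin, and compares both sides at one value of $\theta$. All names in it are hypothetical.

```python
from itertools import product
from math import exp, isclose

def mgf(dist, theta):
    """M(theta) = E(e^{theta X}) for a discrete distribution {value: probability}."""
    return sum(p * exp(theta * x) for x, p in dist.items())

die = {face: 1 / 6 for face in range(1, 7)}   # X: a fair die
coin = {0: 0.5, 1: 0.5}                       # Z: a fair coin, heads counted as 1

# Distribution of S = X + Z; independence means P(X = x, Z = z) = P(x) * P(z).
total = {}
for (x, px), (z, pz) in product(die.items(), coin.items()):
    total[x + z] = total.get(x + z, 0.0) + px * pz

theta = 0.4
lhs = mgf(total, theta)                        # M_S(theta) computed directly
rhs = mgf(die, theta) * mgf(coin, theta)       # M_X(theta) * M_Z(theta)
print(lhs, rhs, isclose(lhs, rhs))
```

If the joint probabilities were not products of the marginals (that is, if $X$ and $Z$ were dependent), the two sides would generally differ.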

This result offers a direct way to show that the sum of independent normals is itself normal. If $X_i$ is $N(\mu_i, \sigma_i)$, then its moment generating function is $e^{\mu_i\theta} e^{\sigma_i^2\theta^2/2}$. But then (4-48) means that $S = \sum_i X_i$ has as its moment generating function $e^{\theta \sum_i \mu_i}\, e^{\theta^2 \sum_i \sigma_i^2 / 2}$ ...

... of the form $W_1 E(W_2 \mid W_1)$.

(vi) Using the correlation coefficient for $W_1$ and $W_2$, and other information about the mean and variance of $W_1$ and $W_2$, work out the variance and standard deviation of $S$, the index of birthday overlap. (Hint: would $E(W_i W_j)$ be the same for all $(i,j)$ pairs with $j > i$? How many such $(i,j)$ pairs are there?)

EXERCISE 4.66

For each of the patterns graphed below, do you think the variables exhibit positive correlation, negative correlation, or no correlation?

[Figure: scatter plots for Exercise 4.66. One panel is an @RISK scatter plot titled "Net Income / 2010 vs Capital Expenses / 2010"; another panel has an axis that appears to be labeled "Husband's Age". The plotted points are not recoverable.]

Index

Note: This index presents important terms and concepts only when they are introduced. (There is no point recording every occasion that a phrase like "mutually exclusive events" appears, and doing so could confuse the reader.) Readers should remember that the glossary and formula listings in the Takeaway Bars of individual chapters offer a good review of what happened in the chapter.

Bayes Theorem, 20-21
Bernoulli Trials, 109
Beta distribution, 227, 249
Binomial distribution, 108, 111
Binomial mean, 114
Binomial process, 108
Binomial variance, 114
Branch of probability tree, 62
Central Limit Theorem, 298, 299
Coefficient of Variation, 102
Complementary events, 4
Complementary sets, 1
Conditional Expectation Rule, 125
Conditional probability, 14
Control chart, 307
Correlation, 312
Correlation coefficient, 313
Correlation and independence, 316
Cost-benefit analysis, 54
Counting formula, 92
Cumulative Distribution Function, 98
Difference of two independent random variables, 267
Discrete approximation, 106
Discrete random variable, 97
Discrete transforms of independent random variables, 277
Discrete transforms of sums of independent random variables, 277
Disjoint events, 4
Disjoint sets, 1
Elementary event, 4
Empty Set, 1
Equally likely events, 8
Equal probability axiom, 8
Event, 4
Exhaustive events, 69
Expected Value, 100
Experiment, 69
Exponential distribution, 194, 195
Exponential mean, 195
Exponential standard deviation, 195
Gamma distribution, 207
Gamma mean, 207
Gamma variance, 207
Geometric distribution, 126, 127
Geometric's mean, 128
Geometric process, 126
Geometric variance, 128
Hypergeometric distribution, 137, 139
Independent and identically distributed (iid) random variables, 275
Independent events, 17
Independent random variables, 264
Intersection (of sets), 1
Joint density function, 254
Joint distributions, 254
Joint probability, 69
Joint probability law, 22
Joint probability mass function, 254
Kolmogorov's axioms, 5-6
Kurtosis, 294
Law of alternatives, 7
Linear function of discrete random variable, 123
Lognormal distribution, 227, 250, 311
Lower control limit, 306
Mean
Mean of a sum of random variables, 260
Mean of an average of iid random variables, 276
Mean of linear combination of random variables, 274
Memoryless random process, 198
Moment generating functions, 290
Monte Carlo simulation, 193
Mutually exclusive events, 4
Negative Binomial distribution, 131, 132
Negative Binomial mean, 133
Negative Binomial variance, 133
Node of probability tree, 62
Normal distribution, 212, 214
Poisson Distribution, 145, 148
Poisson Mean, 150
Poisson Variance, 150
Poisson and Exponential, 198
Probability, 1
Probability Density Function, 180, 181
Probability Mass Function, 98
Random variable, 95
Realization of random variable, 95
Redundancy of equipment, 18
Relative frequency, 12
Sample space, 4
Simulation (Monte Carlo), 193
Simulation, continuous random variable, 228
Simulation, discrete random variable, 194
Skewness, 293
Standard Deviation, 102
Statistical process control, 304
Subjective probability, 7
Sums of a random number of independent random variables, 282
Sums of continuous random variables, 258
Sums of correlated random variables, 325, 326
Sums of discrete random variables, 256
Sums of independent Poisson random variables, 269
Sums of independent normal random variables, 270
Sums of independent binomial random variables, 27
Survival curve, 49
t-distribution, 227, 251
Transforms of sums of a random number of independent random variables, 283
Uniform Distribution, 184
Uniform mean, 186
Uniform variance, 186
Union of sets, 1
Upper control limit, 306
Variance, 101
Variance of a linear combination of independent random variables, 274
Variance of a sum of independent random variables, 266
Variance of an average of iid random variables, 276
Venn diagram, 1
Weibull distribution, 227, 250
z-transform, 278

Solved Problems

1.1 Biggest of the Big
1.2 Double Header?
1.3 Sixth Sense
1.4 Urnings
1.5 Double Urnings
1.6 Easy as A, B, C?
1.7 Rain Man
1.8 Another Die Is Cast
1.9 Horse Sense
1.10 Square Won?
1.11 Backed Up?
1.12 Bayes' Hit?
1.13 Ask Marilyn
1.14 Second Chance
1.15 A Bridge Collapse
1.16 A Coincidence in Connecticut
1.17 Minerva Plays Craps
1.18 Miller's Dilemma
1.19 Let's Make a Deal
1.20 The Birthday Problem
1.21 Length of Life
1.22 Space Problem
1.23 Homicide in Detroit
1.24 Safe to Fly?
1.25 Was Hinckley Schizophrenic?
1.26 Survival Curves
1.27 Color Scheme
1.28 Cost Benefit Analysis
1.29 Double Time
1.30 Where Default Lies
1.31 Poll Vault
1.32 Getting Used
1.33 Screen Test
1.34 Strong Stuff?

2.1 New York Story
2.2 Beyond the Die Toss
2.3 Fit to Eat?
2.4 More Sour Yogurt
2.5 Mean Feat
2.6 A Toss and a Spread
2.7 Still More Sour Yogurt
2.8 Box Score
2.9 Pension Liability
2.10 One Day Flu?
2.11 Flu Cure Revisited
2.12 Auto Insurance
2.13 A World Series
2.14 Arc of Triumph?
2.15 Double Trouble
2.16 Hot Flash
2.17 Another Mean Feat
2.18 Ten Million Miles of Air Time
2.19 Mendel's Sure Thing
2.20 Time to Give
2.21 Risk Assessment
2.22 Back to Keno
2.23 Heat Wave
2.24 Under Fire
2.25 Poisson Time
2.26 Born Again?
2.27 Mystery on Ice
2.28 Line Item
2.29 Too Close for Comfort

3.1 Our First Continuous Distribution
3.2 A Quickie
3.3 Fire Fight
3.4 Tunnel Vision
3.5 Color Scheme
3.6 Fuel for Thought
3.7 Quitting Time?
3.8 Memory Loss
3.9 A Battery Charge
3.10 Traffic Congestion
3.11 Express Yourself
3.12 Smoke Signals
3.13 Perfect Tenth?
3.14 A Gathering Storm
3.15 Keep It Light?
3.16 California Nightmare?
3.17 Airline Yield Management
3.18 A Newsboy's Problem
3.19 Foreclosing Options
3.20 Triangulation
3.21 Another Inversion

4.1 The Die Toss Again
4.2 Imperfect Match?
4.3 The Die Toss Yet Again
4.4 Log Roll
4.5 Die Mean
4.6 Another Birthday Present
4.7 Hyper Sum
4.8 Gamma's Mean
4.9 Heat Wave
4.10 No Name
4.11 Flight Plans
4.12 State of Birth
4.13 Times Fly
4.14 Power Play
4.15 Show Business
4.16 Moving In
4.17 Two by Two
4.18 Sic Bo Express
4.19 Born Yet Again
4.20 Minerva's Bagels
4.21 Poisson Transformed
4.22 Integers Not Welcome
4.23 Exponential Moments
4.24 Normal Moments
4.25 Circular Reasoning
4.26 Two More Normal Moments
4.27 Is Gamma Normal?
4.28 Binomial Successes?
4.29 A Party Game
4.30 Tube Tests
4.31 Mendel Takes the Heat
4.32 Log In
4.33 Three Correlations
4.34 A Partial Correlation
4.35 April Showers
4.36 Of Skill and Luck
4.37 High Times
4.38 With Friends Like These
4.39 Bridge Gaps
4.40 Will Power
4.41 A Diversified Portfolio
4.42 A Paradox in Mendel's Commute