Random Phenomena: Fundamentals of Probability and Statistics for Engineers (Solutions, Instructor Solution Manual) [1 ed.] 9781439820261, 9781420044973, 1420044974


141 20 6MB

English Pages 552 Year 2009

Report DMCA / Copyright

DOWNLOAD PDF FILE

Recommend Papers

Random Phenomena: Fundamentals of Probability and Statistics for Engineers   (Solutions, Instructor Solution Manual) [1 ed.]
 9781439820261, 9781420044973, 1420044974

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

SOLUTIONS MANUAL FOR Random Phenomena:

Fundamentals of Probability and Statistics for Engineers

by

Babatunde A. Ogunnaike

SOLUTIONS MANUAL FOR Random Phenomena:

Fundamentals of Probability and Statistics for Engineers

by Babatunde A. Ogunnaike

CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2011 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Printed in the United States of America on acid-free paper Version Date: 20110815 International Standard Book Number: 978-1-4398-2026-1 (Paperback) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Chapter 1

Exercises Section 1.1 1.1 From the yield data in Table 1.1 in the text, and using the given expression, we obtain s2A s2B

= =

2.05 7.64

from where we observe that s2A is greater than s2B . 1.2 A table of values for di is easily generated; the histogram along with summary statistics obtained using MINITAB is shown in the Figure below. Summary for d A nderson-D arling N ormality Test

-3

0

3

6

A -S quared P -V alue

0.27 0.653

M ean S tDev V ariance S kew ness Kurtosis N

3.0467 3.3200 11.0221 -0.188360 -0.456418 50

M inimum 1st Q uartile M edian 3rd Q uartile M aximum

9

-5.1712 1.0978 2.8916 5.2501 9.1111

95% C onfidence Interv al for M ean 2.1032

3.9903

95% C onfidence Interv al for M edian 1.8908

4.2991

95% C onfidence Interv al for S tD ev 9 5 % C onfidence Inter vals

2.7733

4.1371

Mean Median 2.0

2.5

3.0

3.5

4.0

4.5

Figure 1.1: Histogram for d = YA − YB data with superimposed theoretical distribution 1

2

CHAPTER 1.

¯ is obtained as From the data, the arithmetic average, d, d¯ = 3.05

(1.1)

And now, that this average is positive, not zero, suggests the possibility that YA may be greater than YB . However conclusive evidence requires a measure of intrinsic variability. 1.3 Directly from the data in Table 1.1 in the text, we obtain y¯A = 75.52; y¯B = 72.47; and s2A = 2.05; s2B = 7.64. Also directly from the table of differences, di , generated for Exercise 1.2, we obtain: d¯ = 3.05; however s2d = 11.02, not 9.71. Thus, even though for the means, d¯ = y¯A − y¯B for the variances,

s2d 6= s2A + s2B

The reason for this discrepancy is that for the variance equality to hold, YA must be completely independent of YB so that the covariance between YA and YB is precisely zero. While this may be true of the actual random variable, it is not always strictly the case with data. The more general expression which is valid in all cases is as follows: s2d = s2A + s2B − 2sAB

(1.2)

where sAB is the covariance between yA and yB (see Chapters 4 and 12). In this particular case, the covariance between the yA and yB data is computed as sAB = −0.67 Observe that the value computed for s2d (11.02) is obtained by adding −2sAB to s2A + s2B , as in Eq (1.2). Section 1.2 1.4 From the data in Table 1.2 in the text, s2x = 1.2. 1.5 In this case, with x ¯ = 1.02, and variance, s2x = 1.2, even though the numbers are not exactly equal, within limits of random variation, they appear to be close enough, suggesting the possibility that X may in fact be a Poisson random variable. Section 1.3 1.6 The histograms obtained with bin sizes of 0.75, shown below, contain 10 bins for YA versus 8 bins for the histogram of Fig 1.1 in the text, and 14 bins for YB versus 11 bins in Fig 1.2 in the text. These new histograms show a bit more detail but the general features displayed for the data sets are essentially unchanged. When the bin sizes are expanded to 2.0, things are slightly different,

3

Histogram of YA (Bin size 0.75) 18 16 14

Frequency

12 10 8 6 4 2 0 72.0

73.5

75.0

76.5

78.0

79.5

YA

Histogram of YB (Bin size 0.75) 6

Frequency

5

4

3

2

1

0 67.5

69.0

70.5

72.0

73.5 YB

75.0

76.5

78.0

Figure 1.2: Histogram for YA , YB data with small bin size (0.75) Histogram of YA (Bin size 2.0) 25

Frequency

20

15

10

5

0 72

74

76 YA

78

80

Histogram of YB(Bin Size 2.0) 14 12

Frequency

10 8 6 4 2 0 67

69

71

73 YB

75

77

79

Figure 1.3: Histogram for YA , YB data with larger bin size (2.0)

4

CHAPTER 1.

as shown below. These histograms now contain fewer bins (5 for YA and 7 for YB ); and, hence in general, show less of the true character of the data sets. 1.7 The values computed from the data for y¯A and sA imply that the interval of interest, y¯A ± 1.96sA , is 75.52 ± 2.81, or (72.71, 78.33). From the frequency distribution of Table 1.3 in the text, 48 of the 50 points lie in this range, the excluded points being (i) the single point in the 71.51–72.50 bin and (ii) the single point in the 78.51–79.50 bin. Thus, this interval contains 96% of the data. 1.8 For the YB data, the interval of interest, y¯B ± 1.96sB , is 72.47 ± 5.41, or (67.06, 77.88). From Table 1.4 in the text,we see that approximately 48 of the 50 points lie in this range (excluding the 2 points in the 77.51–78.50 bin). Thus, this interval also contains approximately 96% of the data. 1.9 From Table 1.4 in the text, we observe that the relative frequency associated with x = 4 is 0.033; that associated with x = 5 is 0.017 and 0 thereafter. The implication is that the relative frequency associated with x > 3 = 0.050. Hence, the value of x such that only 5% of the data exceeds this value is x = 3. 1.10 Using µ = 75.52 and σ = 1.43, the theoretical values computed for the function in Eq 1.3 in the text, (for y = 72, 73, . . . , 79) are shown in the table below along with the the corresponding relative frequency values from Table 1.3 in the text.

YA Group 71.51-72.50 72.51-73.50 73.51-74.50 74.51-75.50 75.51-76.50 76.51-77.50 77.51-78.50 78.51-79.50

y 72 73 74 75 76 77 78 79

Theoretical f (y) 0.014 0.059 0.159 0.261 0.264 0.163 0.062 0.014

Relative Frequency 0.02 0.04 0.18 0.34 0.14 0.16 0.10 0.02

TOTAL

50

0.996

1.00

The agreement between the theoretical values and the relative frequency is reasonable but not perfect. 1.11 This time time with µ = 72.47 and σ = 2.76 and for y = 67, 68, 69, . . . , 79, we obtain the table shown below for the YB data (along with the the corresponding relative frequency values from Table 1.4 in the text).

5

YB Group 66.51-67.50 67.51-68.50 68.51-69.50 69.51-70.50 70.51-71.50 71.51-72.50 72.51-73.50 73.51-74.50 74.51-75.50 75.51-76.50 76.51-77.50 77.51-78.50 78.51-79.50

y 67 68 69 70 71 72 73 74 75 76 77 78 79

Theoretical f (y) 0.020 0.039 0.066 0.097 0.125 0.142 0.142 0.124 0.095 0.064 0.038 0.019 0.009

Relative Frequency 0.02 0.06 0.08 0.16 0.04 0.14 0.08 0.12 0.10 0.12 0.00 0.04 0.00

TOTAL

50

0.980

1.00

There is reasonable agreement between the theoretical values and the relative frequency. 1.12 Using λ = 1.02, the theoretical values of the function f (x|λ) of Eq 1.4 in the text at x = 0, 1, 2, . . . 6 are shown in the table below along with the corresponding relative frequency values from Table 1.5 in the text.

X 0 1 2 3 4 5 6

Theoretical f (x|λ = 1.02) 0.3606 0.3678 0.1876 0.0638 0.0163 0.0033 0.0006

Relative Frequency 0.367 0.383 0.183 0.017 0.033 0.017 0.000

TOTAL

1.0000

1.000

The agreement between the theoretical f (x) and the data relative frequency is reasonable. (This pdf was plotted in Fig 1.6 of the text.)

Application Problems 1.13 (i) The following is one way to generate a frequency distribution for this data:

6

CHAPTER 1.

X 1.00-3.00 3.01-5.00 5.01-7.00 7.01-9.00 9.01-11.00 11.01-13.00 13.01-15.00 15.01-17.00 17.01-19.00 19.01-21.00 21.01-23.00 23.01-25.00

Frequency 4 9 11 20 10 9 3 6 6 5 1 1

Relative Frequency 0.047 0.106 0.129 0.235 0.118 0.106 0.035 0.070 0.070 0.059 0.012 0.012

TOTAL

85

0.999

The histogram resulting from this frequency distribution is shown below where we observe that it is skewed to the right. Superimposed on the histogram is a theoretical gamma distribution, which fits the data quite well. The variable in question, time-to-publication, is (a) non-negative, (b) continuous, and (c) has the potential to be a large number (if a paper goes through several revisions before it is finally accepted, or if the reviewers are tardy in completing their reviews in the first place). It is therefore not surprising that the histogram will be skewed to the right as shown. Histogram of x Gamma Shape 3.577 Scale 2.830 N 85

20

Frequency

15

10

5

0 0

4

8

12

16

20

24

x

Figure 1.4: Histogram for time-to-publication data (ii) From this frequency distribution and the histogram, we see that the “most popular” time-to-publication is in the range from 7-9 months (centered at 8 months); from the relative frequency values, we note that 41/85 or 0.482 is the

7 fraction of the papers that took longer than this to publish. 1.14 (i) A plot of the histogram for the 20-sample averages, yi , generated as prescribed is shown in the top panel of the figure below. We note the narrower range occupied by this data set as well as its more symmetric nature. (Superimposed on this histogram is a theoretical normal distribution distribution.) (ii) A histogram of the average of averages, zi , is shown in the bottom panel of the figure. The “averaging” significantly narrows the range of the data and also makes the data set somewhat more symmetric.

Histogram of y Normal 12

Mean StDev N

10.12 0.8088 85

Mean StDev N

10.38 0.7148 85

Frequency

10 8 6 4 2

0 8.5

9.0

9.5

10.0 y

10.5

11.0

11.5

12.0

Histogram of z Normal 20

Frequency

15

10

5

0 9.0

9.6

10.2

10.8

11.4

12.0

z

Figure 1.5: Histogram for time-to-publication data 1.15 (i) Average number of safety incidents per month, x ¯ = 0.500; the associated variance, s2 = 0.511. The frequency table is shown below:

8

CHAPTER 1.

X 0 1 2 3

Frequency 30 12 6 0

Relative Frequency 0.625 0.250 0.125 0.000

TOTAL

48

1.000

The resulting histogram is shown below.

Histogram of SafetyIncidents 30

Frequency

25

20

15

10

5

0 0

1 SafetyIncidents

2

Figure 1.6: Histogram for safety incidents data

(ii) It is reasonable to consider the relative frequency of occurrence of the safety incidents as an acceptable measure of the “chances” of obtaining each indicated number of occurrences: since fr (0) = 0.625, fr (1) = 0.250, fr (2) = 0.125, fr (3) = 0.000 = fr (4) = fr (5), these may then be considered as reasonable estimates of the chances of observing the indicated occurrences. (iii) From the postulated model:

f (x) =

e−0.5 0.5x x!

we obtain the following table which shows the theoretical probability of occurrence side-by-side with the relative frequency data; it indicates that the model actually fits the data quite well.

9

X 0 1 2 3 4 5

Theoretical Probability, f (x) 0.607 0.303 0.076 0.012 0.002 0.000

Relative Frequency 0.625 0.250 0.125 0.000 0.000 0.000

TOTAL

1.000

1.000

(iv) Assuming that this is a reasonable model, then we may use it to compute the “probability” of observing 1, 3, 2, 3 safety incidents (by pure chance alone) respectively over a period of 4 consecutive months. From the theoretical results in (iii) above, we note that the probability of observing 1 incident (by pure chance alone) is a reasonable 0.303; for 2 incidents, the probability is 0.076; it appears that the probability of observing 3 incidents by pure chance alone is rare: 0.012 or 1.2%. Observing another set of 3 incidents just two months after observing the first set of 3 incidents seems to suggest that something more systematic than pure chance alone might be responsible. However, these statements are not meant to be “definitive” or conclusive; they merely illustrates how one may use this model to answer the posed question. 1.16 (i) The histograms for XB and XA are shown below, plotted side-by-side and on the same x-axis scale. The histograms cover the same range (from about 200 to about 360), and the frequencies are similar. Strictly on the basis of a visual inspection, therefore, it is difficult to say anything concrete about the effectiveness of the weight-loss program. It is difficult to spot any real difference between the two histograms.

Histogram of XB, XA 200 XB

240

280

320

XA

4

5

4

Frequency

3 3 2 2 1 1

0

0 200

240

280

320

360

Figure 1.7: Histograms for XB and XA

360

10

CHAPTER 1.

(ii) The histogram of the difference variable, D = XB −XA , shown below, reveals that this variable is not only positive, it actually ranges from about 2 to about 14 lbs. Thus, strictly from a visual inspection of this histogram, it seems obvious that the weight-loss program is effective. The implication of this histogram of the “before”-minus-“after” weight difference is that the “after” weight is consistently lower than the “before” weight (hence the difference variable that is consistently positive). However, this is not obvious from the raw data sets. Histogram of D 10

Frequency

8

6

4

2

0 2

6

10

14

D

Figure 1.8: Histograms for the difference variable D = XB − XA 1.17 The relative frequency table is shown below (obtained by dividing the supplied absolute frequency data by 100, the total number of patients). The resulting frequency distribution plots are also shown below. x 0 1 2 3 4 5

frO 0.32 0.41 0.21 0.05 0.01 0.00

frY 0.08 0.25 0.35 0.23 0.08 0.01

The average number of live births per delivered pregnancy is determined as follows: for the older group, Total no of live births Total no of patients (0 × 32) + (1 × 41) + (2 × 21) + (3 × 5) + (4 × 1) = 1.02 = 100 and in a similar fashion, for the younger group, x ¯O

=

x ¯Y =

201.08 = 2.01 100

11

Freq Distribution: y_O 40

Frequency

30

20

10

0 0

1

2

3

4

5

4

5

x

Freq Distribution: y_Y 40

Frequency

30

20

10

0 0

1

2

3 x

Figure 1.9: Frequency distribution plots for IVF data

12

CHAPTER 1.

The values computed for the average number of live births appears to be higher for the younger group than for the older group; and the frequency distributions also appear to be different in overall shape. Observe that the distribution for the older group shows a peak at x = 1 while the peak for the younger group’s frequency distribution is located at x = 2; furthermore, the distribution for the older group shows a much higher value for x = 0 than that for the younger group. These data sets therefore seem to indicate that the outcomes of the IVF treatments are different for these two groups.

Chapter 2

Exercises Section 2.1 2.1 There are several different ways to solve the equation: τ

dC = −C + C0 δ(t) dt

(2.1)

For example, by Laplace transforms, if the Laplace transform of the indicated function is defined as ˆ C(s) = L{C(t)} then by taking Laplace transforms of each term in this linear equation, one immediately obtains: ˆ ˆ τ sC(s) = −C(s) + C0 since L{δ(t)} = 1. This algebraic equation in the variable s is easily solved for ˆ C(s) to obtain C0 ˆ C(s) = (2.2) τs + 1 from where the inverse Laplace transform yields C(t) =

C0 −t/τ e τ

(2.3)

as expected. 2.2 In terms of the indicated scaled time variable, Eq (2.15) in the text may be written as: ˜ F (t˜) = 1 − e−t and the required plot is shown below. The required percentage of dye molecules with age less than or equal to the mean residence time, τ , is obtained from Eq (2.15) when t = τ , or for t˜ = 1 from above: i.e., F (1) = 1 − e−1 = 0.632 1

2

CHAPTER 2.

1.0

0.8

F

0.6

0.4

0.2

0.0 0

2

4

~ t

6

8

10

Figure 2.1: Cumulative age distribution function as a function of scaled time variable t˜ so that 63.2% of the dye molecules have age less than or equal to τ . 2.3 Via direct integration by parts, letting u = θ and dv = e−θ/τ dθ, one obtains: ½ ¾ Z ∞ ´¯∞ Z ∞ 1 −θ/τ 1 ³ ¯ θe dθ = −θτ e−θ/τ ¯ + τ e−θ/τ dθ τ τ 0 0 0 o 1n −θ/τ ∞ = 0 + τ (−τ e |0 ) = τ (2.4) τ as required. Section 2.2 2.4 The plot of the two pdfs f (x) and f (y) are shown below.

0.4 x, Eq (2.24) y , Eq (2.25)

f(y)

Density

0.3

0.2

0.1 f(x)

0.0 -10

-5

0 X

5

10

Figure 2.2: Probability distribution functions for f (x) and f (y) in Problem 2.4

3 Variable x, whose pdf is shown in the solid line, has a higher degree of uncertainty associated with the determination of any particular outcome. This is because the range of possible values for x is much broader than the range of possible values for y. 2.5 From the given pdf, we obtain f (0) = (0.5)4

=

f (1) = 4(0.5)4

=

f (2) = 6(0.5)4

=

f (3) = 4(0.5)4

=

f (4) = (0.5)4

=

1 16 1 4 3 8 1 4 1 16

Intuitively, one would expect that if the coin is truly fair, then the most likely outcome when a coin is tossed 4 times is 2 heads and 2 tails. Any systematic deviation from this unbiased outcome will be less likely for a fair coin. The result of the computation is consistent with this intuitive notion of fairness. Section 2.3 2.6 In tossing a fair coin once, 1. from the classical (` a-priori ) perspective, the probability of obtaining a head is specified as 1/2 because, of the two mutually exclusive outcomes, (H, T ), one is favorable (H); 2. from the relative frequency (` a-posteriori ) perspective, if the experiment of tossing the fair coin once is repeated a total of nT times, and a head is observed nH times, then the probability of obtaining a head is specified as nnH ; T 3. and from the subjective perspective, the presumption that the coin is fair implies that there is no reason for a head to be observed preferentially over a tail; the probability of obtaining a head is therefore specified as 1/2.

Application Problems 2.7 (a) With two plug flow reactors in series, the overall residence time is obtained as a combination of the residence times in each reactor. First, by the plug flow assumption, θ1 , the time for a dye molecule to traverse the entire length of the first reactor, is precisely l1 A/F ; similarly, θ2 , the time for a dye molecule to traverse the second reactor, is precisely l2 A/F . Therefore, the residence time

4

CHAPTER 2.

for the configuration involving these two reactors in series is given by: θ = θ1 + θ2 =

A(l1 + l2 ) F

(2.5)

(b) With two CSTR’s in series, the residence time distribution for the configuration is related to the dye concentration out of the second reactor, which is influenced directly by the dye concentration from the first reactor. Upon the usual assumption of ideal mixing, we obtain the following model equations from material balance when a bolus of dye, with concentration C0 δ(t), is introduced into the first reactor: dC1 dt dC2 τ2 dt τ1

=

−C1 + C0 δ(t)

(2.6)

=

−C2 + C1

(2.7)

where τi = Vi /F ; i = 1, 2. This set of linear ordinary differential equations can be solved for C2 (t) many different ways (either as a consolidated second order equation, obtained by differentiating the second equation and introducing the first for the resulting dC1 /dt; or simultaneously, using matrix methods; or via Laplace transforms, etc.) By whichever method, the result is: µ ¶³ ´ C0 C2 (t) = e−t/τ1 − e−t/τ2 (2.8) τ1 − τ2 And now, as in the main text, if we define f (θ) as the instantaneous fraction of the initial number of injected dye molecules exiting the reactor at time t = θ, i.e., C2 (t)/C0 , we obtain: µ −t/τ1 ¶ e − e−t/τ2 f (θ) = (2.9) τ1 − τ2 as the required residence time distribution for this ensemble. (c) For a configuration with the PFR first and the CSTR second, if C0 δ(t) is the concentration at the inlet to the PFR, then C1 (t) the concentration out of the PFR is given by: C1 (t) = C0 δ(t − θ1 ) (2.10) where θ1 = l1 A/F is the residence time in the PFR, as obtained in (a) above. When this is introduced into Eq (2.7) above (the model for the second-in-line CSTR), the result is dC2 τ2 = −C2 + C0 δ(t − θ1 ) (2.11) dt an equation best solved using Laplace transforms. Upon taking Laplace transforms, and rearranging, we obtain C0 e−θ1 s ˆ C(s) = τ2 s + 1

(2.12)

5 from where Laplace inversion yields C2 (t) =

C0 −(t−θ1 )/τ2 e τ2

(2.13)

The desired residence time distribution is then obtained as: f (θ) =

1 −(θ−θ1 )/τ2 e τ2

(2.14)

2.8 Let E represent the “event” that the ship in question took evasive action, and C the “event” that the ship counterattacked; let H represent the “event” that the ship was hit. From the relative frequency perspective, and assuming that the observed data set is large enough so that the relative frequencies with which events have occurred can be regarded as approximately equal to the true probabilities of occurrence, we are able to compute the required probabilities as follows. The probability that any attacked warship will be hit regardless of tactical response is obtained as: P (H) =

Total number of ships hit 60 + 62 = = 0.334 Total number of ships attacked 365

(2.15)

Similarly, the probability that a ship taking evasive action is hit is given by: Total number of ships taking evasive action that were hit Total number of ships taking evasive action 60 = 0.333 (2.16) 180

P (HE ) = =

Finally, the probability that a counterattacking ship is hit is given by: P (HC ) = =

Total number of counterattacking ships that were hit Total number of counterattacking ships 62 = 0.335 (2.17) 185

We now observe that all three probabilities are about equal, indicating that in terms of the current classification, the tactical response of the attacked ship does not much matter in determining whether the ship is hit or not. 2.9 (i) Assuming that past performance is indicative of future results, then from the relative frequency perspective, the probability of team A winning a generic game will be given by 9 = 0.6 PG (A) = 15 since team A won 9 out of the 15 games played. Similarly, the probability of team B winning a generic game will be given by: PG (B) =

12 = 0.8 15

6

CHAPTER 2.

(ii) Assuming that the respective proportions of past wins are indicative of each team’s capabilities and remain unchanged when the two teams meet, then P (A) 0.6 3 = = P (B) 0.8 4

(2.18)

Now, along with the constraint, P (A) + P (B) = 1

(2.19)

we have two equations to solve for the two unknown probabilities. First, from Eq (2.19), we have that P (A) 1 +1= P (B) P (B) which, upon introducing Eq (2.18) yields 1 3 +1= 4 P (B) from where the required probabilities are determined as: P (B) =

4 3 ; P (A) = 7 7

(2.20)

Chapter 3

Exercises Section 3.1 3.1 (i) Experiment: Toss the two dice once, at the same time; Trial : A single toss of the two dice; Outcomes: nB , nW , respectively, the number showing on the black die, and on the white die; Sample space: A set consisting of a total of 36 ordered pairs; Ω = {(i, j) : i = 1, 2, . . . , 6; j = 1, 2, . . . , 6} (ii) The simple events associated with the sum S = 7 are: E1 = (1, 6); E2 = (2, 5); E3 = (3, 4); E4 = (4, 3); E5 = (5, 2); E6 = (6, 1), a total of 6 distinct entries because the first entry, nB , is distinguishable from the second. 3.2 (i) A = {(20, 0, 0)}; assuming that the complement of “approve” is “disapprove”, then A∗ = {(0, 20, 0)} (ii) B = {n1 > n0 }; B ∗ = {n1 < n0 } (iii) C = {n2 > n1 }; (iv) D = {n2 > 10} Section 3.2 3.3 From the given sets, obtain: A ∪ B = {x : x = 0, 1, 2, 3, 4 . . .} and A ∩ B = Φ; the null set 3.4 B=

∞ [

Ai = {x : 0 ≤ x ≤ 1}

i=1

1

2

CHAPTER 3.

3.5 Venn diagrams for the LHS and the RHS are acceptable (not shown). Alternatively, use algebra of sets as follows: (i) Let D = (A ∪ B)∗ . Then x ∈ D ⇒ x ∈ / (A ∪ B) ⇒ x ∈ / A and x ∈ / B ⇒ x ∈ A∗ and x ∈ B ∗ , so that ∗ ∗ ∗ ∗ x ∈ (A ∩ B ), implying that D = (A ∩ B ), as required. (ii) Let D = (A ∩ B)∗ . Then x ∈ D ⇒ x ∈ / (A ∩ B), which implies that either x ∈ / A or x ∈ / B so that x ∈ A∗ or x ∈ B ∗ , i.e., x ∈ (A∗ ∪ B ∗ ), implying that D = (A∗ ∪ B ∗ ), as required. (iii) Similarly, let D = A ∩ (B ∪ C). Then x ∈ D ⇒ x ∈ A and (x ∈ B or x ∈ C) ⇒ (x ∈ A and x ∈ B) or (x ∈ A and x ∈ C), i.e., x ∈ (A ∩ B) ∪ (A ∩ C), implying that D = (A ∩ B) ∪ (A ∩ C), as required. (iv) Finally, let D = A ∪ (B ∩ C). Then x ∈ D ⇒ x ∈ A or (x ∈ B and x ∈ C) ⇒ (x ∈ A or x ∈ B) and (x ∈ A or x ∈ C), i.e., x ∈ (A ∪ B) ∩ (A ∪ C), implying that D = (A ∪ B) ∩ (A ∪ C), as required. 3.6 Proceed by expressing the sets A and B in terms of disjoint sets as follows: A B

= (A ∩ B) ∪ (A ∩ B ∗ ) = (B ∩ A) ∪ (B ∩ A∗ )

from which we obtain: P (A) = P (B) =

P (A ∩ B) + P (A ∩ B ∗ ) ⇒ P (A ∩ B ∗ ) = P (A) − P (A ∩ B)(3.1) P (B ∩ A) + P (B ∩ A∗ ) ⇒ P (B ∩ A∗ ) = P (B) − P (A ∩ B)(3.2)

And now, the set A ∪ B, in terms of a union of disjoint sets, is A ∪ B = (A ∩ B) ∪ (A ∩ B ∗ ) ∪ (B ∩ A∗ ) so that: P (A ∪ B) = P (A ∩ B) + P (A ∩ B ∗ ) + P (B ∩ A∗ ) Now substitute Eqs (3.1) and (3.2) to obtain P (A ∪ B) = P (A) + P (B) − P (A ∩ B) as required. 3.7 Assuming that staff members are neither engineers nor statisticians, then the total number of engineers plus statisticians = 100 − 25 = 75. This sum is made up of those that are purely engineers (E), those that are purely statisticians (S) and those that are both engineers and statisticians (B) so that the total number of engineers will be E + B = 50

3 the total number of statisticians will be S + B = 40 but E + S + B = 75 hence obtain B = 15 A Venn diagram representing the supplied information is given below. From

Figure 3.1: Venn diagram for Problem 3.7 here, the required probability, P (B ∗ ) is obtained as P (B ∗ ) = 1 −

15 = 0.85. 100

Section 3.3 3.8 From the supplied information, obtain

Q(A1 ) =

3 µ ¶ µ ¶x X 1 2 x=0

=

3

3

µ ¶µ ¶ 2 1 1 1 = 1+ + 2 + 3 3 3 3 3

80 81

Similarly, Q(A2 )

= =

∞ µ ¶ µ ¶x X 2 1

µ ¶X ∞ µ ¶x 2 1 = ) 3 x=0 3

3 3 x=0 µ ¶µ ¶ 2 1 =1 3 1 − 13

4

CHAPTER 3.

3.9 From the definitions of the sets, obtain Z ∞ P (A) = e−x dx = e−4 4

Z

4

P (A∗ ) = 0

Z P (A ∪ A∗ ) =

e−x dx = 1 − e−4



e−x dx = 1

0

Alternatively, directly from P (A) = e

−4

, obtain

P (A∗ ) = 1 − P (A) = 1 − e−4 and from the fact that the two sets A and A∗ are mutually exclusive and complementary, obtain the final result that P (A ∪ A∗ ) = P (A) + P (A∗ ) = 1. 3.10 As in Problem 3.1, obtain the sample space as: Ω = {(i, j) : i = 1, 2, . . . , 6; j = 1, 2, . . . , 6} a set with 36 elements. By assigning equal probability to each element in the set, determine the required probabilities as follows: (i) Since A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, obtain P (A) = 6/36 = 1/6 (ii) B = {nB < nW } is a set consisting of the 15 off-diagonal elements of the 6 × 6 array of the ordered pair (nB , nw ) for which nB < nW ; hence, P (B) = 15/36 (iii) B ∗ = {nB ≥ nW } is the complementary set to B above, from which we immediately obtain P (B ∗ ) = 1 − 15/36 = 21/36 Alternatively, we may note that B ∗ = {nB ≥ nW } is a set consisting of the 15 diagonal elements for which nB > nW , in addition to the 6 diagonal elements for which nB = nW , yielding the same result. (iv) C = {nB = nW } is a set consisting of the 6 diagonal elements for which nB = nW , so that P (C) = 6/36 = 1/6 (v) D = {nB + nW = 5 or 9} may be represented as a union of two disjoint subsets, D1 ∪ D2 where D1 = {nB + nW = 5} and D2 = {nB + nW = 9}. More specifically, D1 = {(1, 4), (2, 3), (3, 2), (4, 1)} ⇒ P (D1 ) = 4/36 and D2 = {(3, 6), (4, 5), (5, 4), (6, 3)} ⇒ P (D2 ) = 4/36

5 from where we obtain: P (D) = P (D1 ) + P (D2 ) = 2/9 3.11 (i) Assuming that balls of the same color are indistinguishable, the sample space is obtained as: Ω = {(R, G)i , i = 1, 2, . . . , 9; (R, R)i , i = 1, 2, 3; (G, G)i , i = 1, 2, 3} indicating 9 possible realizations of (R, G) outcomes; and 3 realizations each¡ of ¢ (R, R) and (G, G) for a total of 15 elements. (Note that this is the same as 62 , the total number of ways of selecting two items from 6 when the order is not important.) (ii) Let SD be the event that the outcome consists of two balls of different colors. Upon assigning equal probability to each of the 15 elements in the sample space, we obtain the probability of drawing two balls of different colors as P (SD ) =

9 = 0.6 15

(iii) Without loss of generality, let the balls be numbered R1 , R2 , R3 , and G4 , G5 , G6 ; then under the indicated conditions, the sample space will consist of the following sets: SR the set of all red outcomes; SG the set of all greens, and SD , the set of different colored outcomes, i.e., Ω = {SR , SG , SD } where SR = {R1 R2 , R1 R3 , R2 R1 , R2 R3 , R3 R1 , R3 R2 }, with a total of 6 elements, since the numbered balls are now all distinguishable, so that the outcome Ri Rj (indicating that the ball labeled Ri is drawn first, and the one labeled Rj is drawn next) is different from Rj Ri . Similarly, SG = {G1 G2 , G1 G3 , G2 G1 , G2 G3 , G3 G1 , G3 G2 }. Finally, SD contains 18 elements, 9 elements of the form Ri Gj ; i = 1, 2, 3; j = 1, 2, 3; and another 9 of form Gi Rj ; i = 1, 2, 3; j = 1, 2, 3. Again, note that the total number of elements, 30, is the same as the number of distinct permutations of 2 items selected from 6 (when the order of selection is important). Upon assigning equal probability to the 30 elements of Ω, we obtain the required probability as: 18 P (SD ) = = 0.6 30 so that there is no difference between this result and the one obtained in (ii). (There are alternative means of obtaining this same result.) 3.12 (i) The random variable space is: V = {x : x = 0, 1, 2, 3, 4}

6

CHAPTER 3.

(ii) The induced probability set function, PX (A) = P (ΓA ), is, in this case: PX (X PX (X PX (X PX (X

= 0) = 1) = 2)

= = =

9/13, 1/13, 1/13,

= 3)

=

1/13,

PX (X = 4)

=

1/13

(and similarly for the rest of the subsets.) 3.13 The sample space Ω is given by: Ω

=

{HHHH, HHHT, HHT H, HT HH, HHT T, HT HT, HT T H, HT T T, T HHH, T HHT, T HT H, T T HH, T HT T, T T HT, T T T H, T T T T }

with a total of 16 elements. By defining the random variable X as the number of heads, upon assigning equal probabilities to each outcome, we obtain VX = {0, 1, 2, 3, 4} from which the following probabilities are easily obtained: PX (0) PX (1) PX (2) PX (3) PX (4)

= = = = =

1/16 4/16 6/16 4/16 1/16

From the provided distribution function, we obtain, for p = 12 , f (x) =

1 4! x!(4 − x)! 16

so that f (0) = 1/16; f (1) = 4/16; f (2) = 6/16; f (3) = 4/16 and f (4) = 1/16, precisely as obtained earlier. 3.14 Determine first the complementary probability p∗ that all k birthdays are distinct so that no two are the same. The total number of possible combinations of birthdays is (365)k , since any of the 365 days in the “normal” (as opposed to “leap”) year, 1989, can be the birthday for any of the k students in class. Of these, the number of “favorable” cases involves selecting exactly k distinct numbers from 365 without repetition. Since the number of distinct permutations of length k < n from n items is n!/(n − k)!, we have that p∗ = (1 − p) =

1 365! (365 − k)! 365k

7 Sections 3.4 and 3.5 3.15 (i) P (A) = P (E1 ) + P (E2 ) = 0.11 + 0.20 = 0.31 P (B) = P (E2 ) + P (E3 ) + P (E4 ) = 0.54 P (C) = P (E5 ) + P (E6 ) = 0.35 P (D) = P (E1 ) + P (E2 ) + P (E5 ) = 0.51 (ii) (A ∪ B) = {E1 , E2 , E3 , E4 } so that: P (A ∪ B) = P (E1 ) + P (E2 ) + P (E3 ) + P (E4 ) = 0.65 Similarly, P (A ∩ B) = P (E2 ) = 0.2 P (A ∪ D) = P (D) = 0.51 P (A ∩ D) = P (A) = 0.31 P (B ∪ C) = P (B) + P (C) = 0.89 P (B ∩ C) = P (Φ) = 0.0 P (B∩A) P (A) = 0.2/0.31 = 0.645 P (A∩B) P (B) = 0.2/0.54 = 0.370 P (B∩C) P (C) = 0 P (D∩C) P (C) = 0.2/0.35 = 0.571

(iii) P (B|A) = P (A|B) = P (B|C) = P (D|C) =

B and C are mutually exclusive because (B ∩ C) = Φ and P (B|C) = 0. 3.16 Let Bi indicate the event that the ith child is a boy; then, by independence and equiprobability of these events, the probability of interest is obtained as: P (B1 , B2 , B3 ) = P (B1 )P (B2 )P (B3 ) = (0.5)3 = 0.125 Now, under the stated conjecture, the required probability is P (B3 ∩ B1 B2 ) given the following information: P (B3 |B1 B2 ) = 0.8 By definition of conditional probabilities, we know that: P (B3 ∩ B1 B2 ) = P (B3 |B1 B2 )P (B1 B2 ) And now, since by independence and equiprobability, P (B1 B2 ) = 0.52 = 0.25, we now obtain: P (B3 ∩ B1 B2 ) = 0.8 × 0.25 = 0.2 3.17 (i) If B attracts A, then P (B|A) > P (B) By definition, P (B|A) = (3.3) yields

P (B∩A) P (A)

(3.3)

which, when substituted into the LHS in Eq

P (B ∩ A) > P (A)P (B)

8

CHAPTER 3.

And now, since P (B) > 0 and P (B ∩ A) = P (A ∩ B), we obtain P (A ∩ B) > P (A) ⇒ P (A|B) > P (A) P (B) as required. (ii) It is required to show that P (B ∗ |A) < P (B ∗ ) follows from Eq (3.3) above. First, Eq (3.3) implies: 1 − P (B|A) < P (B ∗ ) and, again, since P (B|A) =

P (A∩B) P (A) ,

we have that

P (A) − P (A ∩ B) < P (B ∗ ) P (A)

(3.4)

Now, as a union of two disjoint sets, A = (B ∗ ∩ A) ∪ (A ∩ B) so that,

P (A) = P (B ∗ ∩ A) + P (A ∩ B)

As a result, Eq (3.4) becomes P (B ∗ ∩ A) > P (B ∗ ) ⇒ P (B ∗ |A) < P (B ∗ ) P (A) as required. 3.18 That A and B are independent implies P (A|B) = P (A); P (B|A) = P (B) from where we immediately obtain: 1 − P (A|B) = P (A∗ ); or 1 −

P (A ∩ B) = P (A∗ ) P (A)

(3.5)

And now, because B = (A∗ ∩ B) ∪ (A ∩ B) ⇒ P (B) = P (A∗ ∩ B) + P (A ∩ B) then Eq (3.5) becomes P (A∗ ∩ B) = P (A∗ ); or P (A∗ ∩ B) = P (A∗ )P (B) P (B) (which implies, in its own right, that A∗ is independent of B). Now, because (A∗ ∩ B ∗ ) = (A ∪ B)∗ , so that P (A∗ ∩ B ∗ ) = 1 − P (A ∪ B)

(3.6)

9 and, in terms of a union of disjoint sets, (A ∪ B) = (A∗ ∩ B) ∪ A so that P (A ∪ B) = P (A∗ ∩ B) + P (A) it follows therefore that P (A∗ ∩ B ∗ ) = 1 − P (A∗ ∩ B) − P (A) = P (A∗ ) − P (A∗ ∩ B) Upon introducing Eq (3.6), we obtain, P (A∗ ∩ B ∗ ) = P (A∗ ) − P (A∗ )P (B) = P (A∗ )[1 − P (B)] = P (A∗ )P (B ∗ ) as required. 3.19 By definition of conditional probabilities, P (A ∩ B|A ∪ B) =

P [(A ∩ B) ∩ (A ∪ B)] P (A ∪ B)

(3.7)

The following identities will be useful: (A ∩ B) ∩ (A ∪ B) = (A ∩ B) (A ∩ B) ∩ A = (A ∩ B) (A ∩ B) ∩ B = (A ∩ B) From here, first, Eq (3.7) becomes: P (A ∩ B|A ∪ B) =

P (A ∩ B) P (A ∪ B)

(3.8)

Now, because (A ∪ B) = A ∪ (B ∩ A∗ ), so that P (A ∪ B) = P (A) + P (B ∩ A∗ ) it follows that P (A ∪ B) ≥ P (A), so that Eq (3.8) becomes P (A ∩ B|A ∪ B) ≤

P (A ∩ B) P (A)

and from the identities shown above, P (A ∩ B|A ∪ B) ≤

P (A ∩ B ∩ A) = P (A ∩ B|A) P (A)

Equality holds when P (B ∩ A∗ ) = 0, which will occur when B ⊂ A.

(3.9)

10

CHAPTER 3.

3.20 From the problem definition, we can deduce that (i) The family has two children, each equally likely to be male (M ) or female (F ); (ii) At least one child is male; (iii) Required: the probability that the other child is female. From (i), obtain the sample space as Ω = {M M, M F, F M, F F } a set consisting of 4 equally likely outcomes, where M F is distinct from F M . From (iii) we know that the event of interest is EI = {M F, F M }, that the unknown sibling is female either younger or older; and from (ii), we deduce that the conditioning event is EC = {M M, M F, F M }. From here, therefore, P (EI |EC ) =

P (EI ∩ EC ) P (EI ) 2 = = P (EC ) P (EC ) 3

This result appears counterintuitive at first because one would think (erroneously) that since we already know that one child is male, then there are only two equally likely options left: the unknown sibling is either male or female. This would then lead to the erroneous conclusion that the required probability is 1/2. The error arises because the outcomes M F (the older sibling is male) and F M (the older sibling is female) are separate and distinct, and equally likely; there are therefore in fact three possible outcomes that are consistent with fact (ii) above (at least one male child in the family), not two; and of these possible outcomes, two are “favorable.” 3.21 By independence and by the definition of the conditions required for the series-configured system to function, P (SS ) = P (A)P (B) = 0.99 × 0.9 = 0.891 is the probability that the system functions. With the parallel configuration, and by the definition of the conditions required for the parallel-configured system to fail, P (FP ) = P (FA )P (FB ) = [1 − P (A)][1 − P (B)] is the probability that the parallel-configured system fails. The required probability that the system functions is therefore given by: P (SP ) = 1 − P (FP ) = 1 − (0.01 × 0.1) = 1 − 0.001 = 0.999 Clearly the probability that the parallel-configured system functions is higher. This is reasonable because with such a configuration, one component is redundant, acting as a back-up for the other component. Because only one component is required to function for the entire system to function, even if one fails, the system continues to function if the redundant component still functions. With

11 the series component, the opposite is the case: if one component fails, the entire system fails. 3.22 From the theorem of total probability, P (S) = P (S|Ck )P (Ck ) + P (S|Ck∗ )P (Ck∗ ) and given P (Ck ) = 0.9 (so that P (Ck∗ ) = 0.1), in conjunction with the given conditional probabilities, obtain P (S) = (0.9 × 0.9) + (0.8 × 0.1) = 0.89

APPLICATION PROBLEMS 3.23 (i) The required Lithium toxicity probabilities are obtained as follows: 1. From the table, P (L+ ) = 51/150 = 0.340; 2. P (L+ |A+ ) = 30/47 = 0.638, a moderate value indicating that there is a reasonable chance that the assay will correctly identify high lithium concentrations. 3. P (L+ |A− ) = 21/103 = 0.204, a fairly high (about 20%) chance of missed diagnoses. (ii) The required blood lithium assay probabilities are obtained as follows: 1. P (A+ ) = 47/150 = 0.104; 2. P (A+ |L+ ) = 30/51 = 0.588. This quantity shows the percentage of patients known to have high lithium concentrations that are identified as such by the assay. Now, given a generic function, y = f (x), ∆y, the “response” in y as a result of a change, ∆x in x, is given by ∆y = S∆x

(3.10)

where S = ∂y/∂x, the “local” sensitivity function, indicates how sensitive y is to unit changes in x. In this particular application, by definition of conditional probabilities, P (A+ ∩ L+ ) = P (A+ |L+ )P (L+ ) Here, P (A+ ∩ L+ ) is representative of the theoretical proportion of the entire population with high lithium toxicity that are correctly identified by the assay as such, while P (L+ ) is the proportion of the population with high lithium toxicity. By analogy with Eq (3.10), observe that P (A+ |L+ ) plays the role of the sensitivity function. The computed value of 0.588 (not quite 0.6) indicates that the assay is not overly sensitive.

12

CHAPTER 3. 3. From Bayes’ rule, P (L+ |A+ ) =

P (A+ |L+ )P (L+ ) = P (A+ )

30 51

×

51 150

47 150

=

30 = 0.638 47

as has already been computed directly in (i) above. 3.24 The given information translates as follows: • The sample space is: Ω = {T1 , T2 , T3 , T4 , T5 }, where Ti is the outcome that the polymorph produced at any particular time is of Type i = 1, 2, . . . , 5; • The “probability distribution” is P (T1 ) = 0.3; P (T2 ) = P (T3 ) = P (T4 ) = 0.2 and P (T5 ) = 0.1. • The “events” are the applications: A = {T1 , T2 , T3 }; and B = {T2 , T3 , T4 } (i) The “event” of interest in this case is A = {T1 , T2 , T3 }, so that P (A) = 0.3 + 0.2 + 0.2 = 0.7 (ii) The required probability is P (T2 |B), which, by definition, is: P (T2 |B) =

0.2 P (T2 ∩ B) = = 1/3 P (B) 0.2 + 0.2 + 0.2

(iii) The required probability P (A|B) is obtained in the usual manner, by definition of conditional probability, i.e., P (A|B) =

P (A ∩ B) P (T2 ) + P (T3 ) = = 2/3 P (B) 0.6

(iv) The converse probability, P (B|A), may be obtained by Bayes’ rule: i.e., P (B|A) =

P (A|B)P (B) = P (A)

2 3

× 0.6 = 4/7. 0.7

3.25 Consider a person selected randomly from the population who undergoes this test; define the following events: D: the event that disease (abnormal) cells are present; S: the event that the sample misses the disease (abnormal) cells; W : the event that the test result is wrong; C: the event that the test result is correct. (i) First, observe that the test result can be wrong (a) when disease (abnormal) cells are present but the test fails to identify them; or (b) when there are no abnormal cells present but the test misclassifies normal cells as abnormal. By the theorem of total probability, obtain: P (W ) = P (W |D)P (D) + P (W |D∗ )P (D∗ )

(3.11)

13 From the given information, P (D) = θD , so that P (D∗ ) = 1 − θD ; also P (W |D∗ ) = θm . The only term yet to be determined in this equation is P (W |D); and it is associated with the event that the test is wrong given that disease cells are present—an event consisting of two mutually exclusive parts: (a) when the disease cells are present but the sample misses them, or (b) when the sample actually contains the disease cells but the test fails to identify them. Thus, the required probability is obtained as: P (W |D) = =

P (W ∩ S|D) + P (W ∩ S ∗ |D) θs (1 − θm ) + (1 − θs )θf

(3.12)

Upon introducing Eq (3.12) into Eq (3.11), we obtain P (W ) = {θs (1 − θm ) + (1 − θs )θf } θD + θm (1 − θD ) and the probability that the test is correct, P (C) is obtained as 1 − P (W ). (ii) Let A be the event that an abnormality has been reported. The required probability is P (D∗ |A), which, according to Bayes’ Theorem, is obtained as: P (D∗ |A) =

P (A|D∗ )P (D∗ ) P (A|D∗ )P (D∗ ) = P (A) P (A|D)P (D) + P (A|D∗ )P (D∗ )

(3.13)

All the terms in this expression are known except P (A|D), which, as in (i) above is obtained as follows: P (A|D) = =

P (A ∩ S|D) + P (A ∩ S ∗ |D) θm θf + (1 − θs )(1 − θm )

(3.14)

so that Eq (3.13) becomes: P (D∗ |A) =

θm (1 − θD ) {θm θf + (1 − θs )(1 − θm )} θD + θm (1 − θD )

3.26 With the given information (and θD = 0.02), obtain P (W ) ⇒ P (C)

= (0.1 × 0.9 + 0.9 × 0.05) × 0.02 + 0.1 × 0.98 = 0.1007; = 0.8993

The second probability is: P (D∗ |A) = 0.098/(0.0163 + 0.098) = 0.857 a very high probability that the test will return an abnormality result when there is in fact no abnormality present. A major contributor to this problem is the rather high probability of misclassifying normal cells as abnormal, θm = 0.1. When θm is reduced to a more manageable 0.01, the results are: P (W ) = (0.1 × 0.99 + 0.9 × 0.05) × 0.02 + 0.01 × 0.98 = 0.0127

14

CHAPTER 3.

so that P (C) = 0.987, along with P (D∗ |A) = 0.0098/(0.0178 + 0.0098) = 0.355 Thus, the probability that the test result is correct increases from almost 0.9 to 0.987, while the probability of reporting an abnormality in the absence of a real abnormality drops significantly from 0.857 to 0.355. 3.27 (i) P (Q1 ) = 336 425 = 0.791, also 18 P (Q∗3 ) = 1 − P (Q3 ) = 1 − 425 = 0.958 110 (ii) P (Q1 |M1 ) = 150 = 0.733; also 226 P (Q1 |M2 ∪ M3 ) = (150+76) (180+90) = 275 = 0.822 1 (iii) P (M3 |Q3 ) = 18 = 0.056; also P (M2 |Q2 ) =

33 71

= 0.465

3.28 Let B1 be the event that a tax filer selected at random belongs to the first income bracket (Below $10, 000); B2 , if in the second bracket ($10, 000 − $24, 999); B3 , if in the third bracket ($25, 000 − $49, 999); and B4 , if in the fourth bracket, ($50, 000 and above). Furthermore, let A be the event that the tax filer is audited. Then the given information may be translated as follows: • The percent audited column corresponds to P (A|Bi ); • P (Bi ) is the entry in the “Number of filers” column divided by 89.8 (the total number of filers). (i) By the theorem of total probability, P (A)

=

4 X

P (A|Bi )P (Bi )

i=1

= =

(31.4 × 0.0034) + (30.4 × 0.0092) + (22.2 × 0.0205) + (5.5 × 0.04) 89.8 0.0065

(The same result is obtained by expressing P (A) as the total number audited divided by the total number in the population.) (ii) The required probability, P (B3 ∩ A), is obtained as: P (B3 ∩ A) = P (A|B3 )P (B3 ) =

22.2 × 0.205 = 0.051 89.8

(iii) P (B4 |A) is obtained from P (B4 ∩ A)/P (A) or, equivalently, by expanding the indicated ratio of probabilities into Bayes’ rule, i.e., P (B4 |A) =

P (A|B4 ) P (B4 ∩ A) = P4 = 0.376 P (A) i=1 P (A|Bi )P (Bi )

Chapter 4

Exercises Section 4.1 4.1 Under the specified conditions (no twins), the only two possible outcomes after each single delivery are B, for “Boy,” and G for “Girl”; the desired sample space is therefore: Ω = {BBB, BBG, BGB, GBB, BGG, GBG, GGB, GGG} If X is the random variable representing the total number of girls born to the family, then the random variable space is clearly: VX = {0, 1, 2, 3} Now, given that P (B) = 0.75, implying that P (G) = 0.25, we are able to obtain first, that: P (BBB) = (0.75)3 = 0.422 Thus, in terms of the random variable, X, this event corresponds to X = 0 (no girls); i.e., PX (0) = 0.422. Next, the remaining probabilities for X = 1, 2 and 3 are obtained as follows: PX (1) = P {BBG, BGB, GBB} = P (BBG) + P (BGB) + P (GBB) by virtue of each of the three outcomes being mutually exclusive. And now, since P (BBG) = (0.75)2 × 0.25 = 0.1406 and P (BBG) = P (BGB) = P (GBB), we obtain finally that PX (1) = 0.422 Via similar arguments, we obtain: PX (2) = P (BGG) + P (GBG) + P (GGB) = 3 × 0.047 = 0.141 and PX (3) = P (GGG) = 0.016 The complete probability distribution for all the possible combinations of children that can be born to this family is represented in the table below (with X as the total number of girls). 1

2

CHAPTER 4. X 0 1 2 3 TOTAL

f (x) 0.422 0.422 0.141 0.015 1.000

4.2 In this case, the sample space, Ω, is given by: Ω

= {HHHH, HHHT, HHT H, HT HH, HHT T, HT HT, HT T H, HT T T, T HHH, T HHT, T HT H, T T HH, T HT T, T T HT, T T T H, T T T T }

with a total of 16 elements, ωi ; i = 1, 2, . . . , 16, in the order presented above, with ω1 = HHHH, ω2 = HHHT, . . . , ω16 = T T T T . If X is the total number of tails, then the random variable space is: V = {0, 1, 2, 3, 4} The set A corresponding to the event that X = 2 therefore consists of 6 mutually exclusive outcomes, ω5 , ω6 , ω7 , ω10 , ω11 , ω12 , as defined above. Assuming equiprobable outcomes, as usual, we obtain: P (X = 2) = 6/16 = 3/8 4.3 (i) From the spaces given in Eqs (4.10), and (4.11) in the text, we obtain that the event A, that X = 7, is a set consisting of a total of 6 elementary events E1 = (1, 6); E2 = (2, 5); E3 = (3, 4); E4 = (4, 3); E5 = (5, 2); E6 = (6, 1). With equiprobable outcomes, we obtain therefore that: P (A) = P (X = 7) = 6/36 = 1/6 (ii) The set B, representing the event that X = 6, is: B = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} consisting of 5 elements, so that P (B) = 5/16 The set C can be represented as a union of two disjoint sets, C1 and C2 , where C1 , representing the event that X = 10, is: C1 = {(4, 6), (5, 5), (6, 4)} while C2 , representing the event that X = 11, is: C2 = {(5, 6), (6, 5)}

3 And now, either by forming C explicitly as the union of these two sets, or by virtue of these sets being disjoint, so that P (C) = P (C1 ) + P (C2 ) we obtain, upon assuming equiprobable outcomes, P (C) = 3/36 + 2/36 = 5/36 Section 4.2 4.4 From Eqn (4.11) in the text, which gives the random variable space as V = {2, 3, 4, . . . , 12} the desired complete pdf is obtained as follows: Let A2 represent the event that X = 2, then: A2 = {(1, 1)}, so that P (X = 2) = 1/36. Now, if An represents the event that X = n; n = 2, 3, . . . 12, then similarly, A3 = {(1, 2), (2, 1)}, so that P (X = 3) = 2/36. A4 = {(1, 3), (2, 2), (2, 1)}, so that P (X = 4) = 3/36. A5 = {(1, 4), (2, 3), (3, 2), (4, 1)}, so that P (X = 5) = 4/36. A6 = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}, so that P (X = 6) = 5/36. A7 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, so that P (X = 7) = 6/36. A8 = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}, so that P (X = 8) = 5/36. A9 = {(3, 6), (4, 5), (5, 4), (6, 3)}, so that P (X = 9) = 4/36. A10 = {(4, 6), (5, 5), (6, 4)}, so that P (X = 10) = 3/36. A11 = {(5, 6), (6, 5)}, so that P (X = 11) = 2/36. Finally, A12 = {(6, 6)}, so that P (X = 12) = 1/36. The resulting pdf, f (x), and the cdf, F (x) (obtained cumulatively from the values shown above), are presented in the table below. A plot of the pdf and cdf are shown in Fig 4.1. X 0 1 2 3 4 5 6 7 8 9 10 11 12

f (x) 0 0 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

F (x) 0 0 1/36 3/36 6/36 10/36 15/36 21/36 26/36 30/36 33/36 35/36 36/36

4

CHAPTER 4.

Probability Distribution Function 0.18 0.16 0.14 0.12

f(x)

0.10 0.08 0.06 0.04 0.02 0.00 0

2

4

6

8

10

12

14

12

14

x

Cumulative Distribution Function 1.0

0.8

F(x)

0.6

0.4

0.2

0.0 0

2

4

6

8

10

x

Figure 4.1: The pdf and cdf for the double dice experiment.

5 4.5 (i) From the given table, the required cdf, F (x), is obtained as shown in the table below: x F (x)

1 0.10

2 0.35

3 0.65

4 0.90

5 1.00

(ii) From the given table, obtain: P (X ≤ 3) = 0.65; P (X < 3) = F (2) = 0.35; P (X > 3) = 1 − F (3) = 0.35; P (2 ≤ X ≤ 4) = f (2) + f (3) + f (4) = 0.80. 4.6 By definition, for the discrete random variable, F (x) =

x X

f (i)

i=1

so that F (x − 1) =

x−1 X

f (i)

i=1

from where it is clear that f (x) = F (x) − F (x − 1) For the particular cdf given here, the required pdf is obtained as: ³ x ´k µ x − 1 ¶k f (x) = − ; x = 1, 2, . . . , n n n Specifically for k = 2 and n = 8, the cdf and pdf are given respectively as: ³ x ´2 F (x) = ; x = 1, 2, . . . , 8 8 and f (x) =

³ x ´2 µ

=

µ −

x−1 8

¶2

8 ¶ 2x − 1 ; x = 1, 2, . . . , 8 64

A plot of the pdf and the cdf is shown in Fig 4.2. 4.7 (i) To be a legitimate pdf, the given function, f (x), must satisfy the following condition: Z 1

cxdx = 1 0

6

CHAPTER 4.

Probability Distribution Function 0.25

0.20

f(x)

0.15

0.10

0.05

0.00 0

1

2

3

4

5

6

7

8

9

7

8

9

x

Cumulative Distribution Function 1.0

0.8

F(x)

0.6

0.4

0.2

0.0 0

1

2

3

4

5

6

x

Figure 4.2: The pdf and cdf for the double dice experiment.

7 Upon carrying out the indicated integration, we obtain: ¯1 cx2 ¯¯ =1 2 ¯0 from which we determine easily that c = 2, so that the complete pdf is now given by: ½ 2x 0 < x < 1 f (x) = 0 otherwise From here, we obtain the required cdf as: ½ 2 Z x x F (x) = 2u du = 0 0

0 30 ln 2) = 1 − F (30 ln 2) = 12 Section 4.3 4.9 E(X) for the discrete random variable, X, of Exercise 4.5 is given by: E(X) =

X i

xi f (xi ) = (1×0.1)+(2×0.25)+(3×0.3)+(4×0.25)+(5×0.1) = 3.0

8

CHAPTER 4.

On the other hand, for the continuous random variable in Exercise 4.7, ¯1 Z Z 1 2x3 ¯¯ 2 x(2x)dx = E(X) = xf (x) dx = = ¯ 3 0 3 0 Finally, for the residence time distribution in Eq (4.41), the expected value is given by: Z 1 ∞ −x/τ E(X) = xe dx τ 0 Now, via direct integration by parts, letting u = x and dv = e−x/τ dx, we obtain: ½ ¾ Z ∞ ´¯∞ Z ∞ 1 ³ 1 −x/τ −x/τ ¯ −x/τ xe dx = −xτ e τe dx ¯ + τ τ 0 0 0 o 1n 0 + τ (−τ e−x/τ |∞ = 0 ) =τ τ as required. Thus, τ is the expected (or mean) value of the residence time for the single CSTR whose residence time, x, follows the distribution given in Eq (4.11). 4.10 For the pdf given in Eq (4.140), the absolute convergence condition for the existence of the expected value requires that: ∞ X

|x|f (x) = 4

x=1

∞ X

1 2) = 1 − P (X ≤ 2) = 1 − {f (0) + f (1) + f (2)} In this case, with η/ζ = 3/4.5 = 2/3, we obtain: µ ¶ µ ¶x 1 2 f (x) = 3 3 so that f (0) = 1/3; f (1) = 2/9; f (2) = 4/27; hence, the required probability, that there are more than two cars at the station, is obtained as P (X > 2) = 8/27 The probability that there are no cars, f (0), is 1/3. 4.28 (i) The histogram is plotted in Fig 4.3; it indicates a distribution that is skewed to the right, as is typical of income distributions. (ii) From the data, the mean is obtained as: X x ¯= xi f (xi ) = (2.5 × 0.04) + (7.5 × 0.13) + · · · + (57.5 × 0.01) = $20.7 (×103 ) i

Next, we observe from the data table that, cumulatively, 34% of the population have incomes up to $15,000, and up to $20,000 for 54% of the population. The 50% mark therefore lies somewhere in the [15-20] income group. Since the

24

CHAPTER 4.

20

Frequency (%)

15

10

5

0 2.5

12.5

22.5

32.5

42.5

52.5

x

Figure 4.3: Histogram for US family income in 1979. midpoint of this group is 17.5, we take this value as the median, since no other additional information is available. Thus, the median, xm , is given by: xm = 17.5 From the histogram, or directly from the data table, we see that the [15-20] group (with a mid-point of 17.5) is the “most popular” group, with 20% of the population; the mode, x∗ , is therefore determined as: x∗ = 17.5 Next, the variance and skewness are obtained as follows. X σ2 = (xi − x ¯)2 f (xi ) = 128.76 ⇒ σ = 11.35 i

and, µ3 = E(X − µ)3 =

X (xi − x ¯)3 f (xi ) = 1218.64 i

so that the coefficient of skewness, γ3 , is obtained as: γ3 =

µ3 = 0.8341 σ3

implying that the distribution is positively skewed (as is also evident from the shape of the histogram). (iii) Let L, M , and U represent the outcome that a single individual selected from the population is in the “Lower Class,” the “Middle Class,” and the “Upper Class,” respectively. Then, the following probabilities are easily determined

25 directly from the frequency table and the given income ranges that constitute each group classification: P (L) P (M ) P (U )

= 0.34 = 0.64 = 0.02

From here, we obtain that (a) P (L, L) = P (L)P (L) = 0.1156 (b) P (M, M ) = P (M )P (M ) = 0.4096 (c) P (M, U ) = P (M )P (U ) = 0.0128 (d) P (U, U ) = P (U )P (U ) = 0.0004 (iv) If an individual selected at random is an engineer, the immediate implication is that the individual’s income falls into the [20–55] bracket. From the table, the total percentage of the population in this bracket is 45, composed of 1% in the upper income bracket (those in the bracket from [50–55]), and the remaining 44% in the middle income bracket. The proportion of engineers in these two groups is 0.01 (since engineers make up 1% of the entire population). Thus, the probability that a person selected at random is in the middle class, conditioned upon the fact that this individual is an engineer, is: P (M |E) =

0.44 × 0.01 44 = = 0.978 0.45 × 0.01 45

To compute the converse probability, P (E|M ), one may invoke Bayes’ Theorem, i.e., P (M |E)P (E) P (E|M ) = P (M ) and since P (E) = 0.01, and P (M ) = 0.64, we immediately obtain the required probability as: 0.978 × 0.01 P (E|M ) = = 0.0153 0.64 Observe that these two probabilities, P (M |E) and P (E|M ), are drastically different, but the computed values make sense. First, the exceptionally high value determined for P (M |E) makes sense because the engineers in the population, by virtue of their salaries, are virtually all in the middle class, the only exceptions being a small fraction in the upper class. As a result, if it is given that an individual is an engineer, then it is nearly certain that the individual in question will be in the middle class. The value of P (M |E) reflects this perfectly. On the other hand, P (E|M ) is extremely low because the conditioning “set” is the income bracket: in this case, the defining characteristic is the fact that there are many more individuals in the middle class income bracket that are not engineers (recall that engineers make up only 1% of the total population). Thus if it is given that an individual is in the middle class, the chances that such an individual will be an engineer is quite small. However, because the middle class

26

CHAPTER 4.

is “over-represented” within the group of engineers, it is not surprising that 0.0153, the value determined for P (E|M ), even though comparatively small, is still some 15% higher than the value of 0.01 obtained for the unconditional P (E) in the entire population. 4.29 (i) From the problem statement, observe that xw is to be determined such that no more than 15% of the chips have lifetimes lower than xw ; i.e., the upper limit of xw (a number to be determined as a whole integer) is obtained from P (X ≤ xw ) = 0.15 From the given pdf, if we let η = 1/β, we then have: Z xw 0.15 = ηe−ηx dx 0

=

1 − e−ηxw

which is easily solved for xw , given η = 1/6.25 = 0.16, to obtain: xw = 1.016

(4.11)

as the upper limit. Thus, in whole integers, the warranty should be set at xw = 1 year. (ii) It is possible to use the survival function, S(x), directly for this part of the problem, since what is required is P (X > 3) = 0.85, or S(x) = 0.85 (for x = 3). Either from a direct integration of the given pdf, or from recalling the exact form of the survival function for the random variable whose pdf is given in Eq (4.163), (See Example 4.8 in the text), we obtain: S(x) = e−x/β

(4.12)

so that for x = 3, and S(x) = 0.85, we solve the equation ∗

for β2∗ to obtain:

0.85 = e−3/β2

(4.13)

β2∗ = 1/0.054

(4.14)

The implication is that the target mean life-span should be 1/0.054 = 18.52 years for the next generation chip. From an initial mean life-span of 6.25 years, the implied “fold increase” in mean life-span is 2.96 or about 3-fold. 4.30 (i) The required probability is P (X ≥ 4). For the younger patient, with E(X) = 2.5 (and V ar(X) = 1.25), Markov’s inequality states that: P (X ≥ 4) ≤

2.5 = 0.625 4

For the older patient, with E(X) = 1 (and V ar(X) = 0.8), Markov’s inequality states that: 1 P (X ≥ 4) ≤ = 0.25 4

27 so that with n = 5, the upper bound on the probability of obtaining a set of quadruplets or quintuplets is significantly higher for the younger patient (with p = 0.5) than for the older patient (with p = 0.2). To determine Chebyshev’s inequality for each patient, we know that, for this application: 4−µ µ + kσ = 4 ⇒ k = σ so that for the younger, k is given by: 1.5 k=√ 1.25 and therefore,

1 = 0.556 k2 Thus, in this case, Chebyshev’s inequality states that: P (X ≥ 4) ≤ 0.556 which is a bit tighter than the bound provided by Markov’s inequality. Similarly for the older patient, 3 k=√ 0.8 so that

1 = 0.089 k2 Therefore, Chebyshev’s inequality states in this case that: P (X ≥ 4) ≤ 0.089 which is also much tighter than the bound provided by Markov’s inequality. (ii) From the given pdf, we obtain, first for the younger patient: P (X ≥ 4) = f (4) + f (5) = 0.1875 This shows that while both inequalities are “validated” to be true, in the sense that the actual probability lies within the prescribed bounds, the actual value of 0.1875 is quite far from the upper bounds of 0.625 and 0.556. For the older patient, the actual probability, P (X ≥ 4), is obtained as: P (X ≥ 4) = f (4) + f (5) = 0.0067 Again, this shows that both inequalities are also “validated” as true for this patient; however, the actual value of 0.0067 is also quite far from the upper bounds of 0.25 and 0.089. In both cases Chebyshev’s inequality is sharper than Markov’s.

28

CHAPTER 4.

4.31 Let A be the event that an individual that is currently y years of age survives to 65 and beyond; and let B be the complementary event that this individual does not survive until age 65 (by dying at age y < 65). (i) For a policy based on a fixed premium, α, paid annually beginning at age y, then, over the entire life span of an individual that survives beyond 65, XA , the total amount collected in premiums from such an individual will be: XA = α(65 − y) The corresponding probability that this amount will be collected over this lifetime is PS (y), the probability of survival to age 65 at age y, indicated in the supplied table. On the other hand, in the event that an individual does not survive to age 65, the payout, XB = π(y), is age-dependent, with associated probability, (1 − PS (y)). Thus, the expected revenue per individual, over the individual’s lifetime, is given by: α(65 − y)PS (y) − π(y)(1 − PS (y)) = RE (65 − y)

(4.15)

where RE is the expected revenue per year, per participant, over the duration of his/her participation. When Eq (4.15) is solved for π(y), the result is: π(y) =

(65 − y)(αPS (y) − RE ) 1 − PS (y)

(4.16)

And now, specifically for a fixed annual premium, α = 90.00, and for a target expected (per capita) revenue, RE = 30, the computed values for the agedependent payout are shown in the table below. y 0 10 20 30 35 40 45 50 55 60

PS (y) 0.72 0.74 0.74 0.75 0.76 0.77 0.79 0.81 0.85 0.90

$π(y) 8078.57 7742.31 6334.62 5250.00 4800.00 4271.74 3914.29 3386.84 3100.00 2550.00

(ii) For a policy based instead on a fixed payout, π, the corresponding agedependent annual premium, α(y), is obtained by solving Eq (4.15) for α to yield: µ ¶ π 1 RE + −1 (4.17) α(y) = PS (y) (65 − y) PS (y) Thus, for the indicated specific value π = 8000, with RE = 30 as before, the computed values for the age-dependent annual premiums are shown in the table below.

29 y 0 10 20 30 35 40 45 50 55 60

PS (y) 0.72 0.74 0.74 0.75 0.76 0.77 0.79 0.81 0.85 0.90

$α(y) 89.53 91.65 103.00 116.19 123.68 134.55 144.30 162.14 176.47 211.11

Note that with the fixed, ninety-dollar annual premium policy of part (i), and its resultant declining payout, only individuals for which y = 0 will receive a payout higher than $8000.00. With a fixed payout of $8000.00 for all participants, we observe from the table above that, in this case, individuals for which y = 0 will pay an annual premium that is lower than $90.00; the premiums to be paid by all others are higher than $90.00 and increase with age. (iii) If the target expected revenue is to increase by 50% (from $30 per year, per participant, to $45), using Eqs (4.16) and (4.17) given above respectively for the time-dependent payout (fixed premium), and time-dependent premium (fixed payout), we obtain the results shown in the following table for both the agedependent payout (fixed premium, α = 90), and the age-dependent premium (fixed payout, π = 8000). y

PS (y)

0 10 20 30 35 40 45 50 55 60

0.72 0.74 0.74 0.75 0.76 0.77 0.79 0.81 0.85 0.90

$π(y) (α = 90) 4596.43 4569.23 3738.46 3150.00 2925.00 2641.30 2485.71 2202.63 2100.00 1800.00

$α(y) (π = 8000) 110.36 111.92 123.27 136.19 143.42 154.03 163.29 180.66 194.12 227.78

As expected, when compared with the results in (i) and (ii), we observe that the payouts are now uniformly lower (for the same fixed premium α = 90), and the annual premiums are uniformly higher (for the same fixed payout π = 8000). (iv) We return to the problem formulation in (i) and (ii), this time with each probability of survival increased by 0.05, (retaining the same RE ). Once again, using Eqs (4.16) and (4.17), this time with the new values PS+ (y) for the survival probabilities, the results are shown in the table below.

30

CHAPTER 4. y

PS+ (y)

0 10 20 30 35 40 45 50 55 60

0.77 0.79 0.79 0.80 0.81 0.82 0.84 0.86 0.90 0.95

$π(y) (α = 90) 11,106.50 10,764.30 8807.10 7350.00 6773.70 6083.33 5700.00 5078.60 5100.00 5550.00

$α(y) (π = 8000) 75.72 76.64 85.23 94.64 99.59 106.83 111.91 121.71 122.22 115.79

These results warrant a closer look.

Variable Standard Inc Rev enue Ps(y )+0.05

11000 10000 9000

Payout, $

8000 7000 6000 5000 4000 3000 2000 0

10

20

30 Age, Y

40

50

60

Figure 4.4: Age-dependent payout for fixed annual premium of $90: Standard case, dark circles, solid line; increased revenue case, squares, long dashed line; increased survival probabilities, diamonds, short dashed line.

Because probabilities of survival have increased, cumulatively more money will be paid into the pool by the participants in the long run, since each one will, on average, live longer. It is therefore consistent that the payouts should be higher (for the same fixed premium, α = 90, and the same expected per capita revenue). Similarly, for the same fixed payout, π = 8000, it is consistent that the premiums should be lower. However, something interesting happens at age 55: the payouts that had been decreasing monotonically (for a fixed annual premium) now begin to increase with age; similarly, the annual premiums that had been increasing monotonically (for a fixed payout) now begin to decrease. This is seen clearly in Figs 4.4 and 4.5, which show all the cases investigated thus far in this problem: the standard case results of parts (i) and (ii) (circles, solid

31

250

Variable Standard Inc Rev enue Ps(y ) + 0.05

Annual Premium

200

150

100

0

10

20

30 Age, Y

40

50

60

Figure 4.5: Age-dependent annual premiums for fixed payout of $8000: Standard case, dark circles, solid line; increased revenue case, squares, long dashed line; increased survival probabilities, diamonds, short dashed line.

line); the increased per capita revenue case of part (iii) (squares, long dashed line); and the increased survival probabilities case (diamonds, short dashed line). The results obtained when the probabilities of survival have increased make no sense financially: older participants enrolling at age 55 (and later) should not pay lower premiums or receive higher payouts upon death than those enrolling at 50. The reason for this anomaly is that with longer life expectancies (indicated by the increased probabilities of survival beyond 65), the computational horizon should also be increased commensurately. The entire problem should be reformulated for a survival threshold higher than 65 (beyond which there is no payout).

Chapter 5

Exercises Sections 5.1 and 5.2 5.1 The sample space, Ω, given in Example 5.1 in the text, is: Ω = {HH, HT, T H, T T } consisting of all 4 possible outcomes; or, when these outcomes are represented respectively, as ωi ; i = 1, 2, 3, 4, it may be represented as: Ω = {ω1 , ω2 , ω3 , ω4 } Now, upon defining the two-dimensional random variable X = (X1 , X2 ), where X1 is the total number of heads, and X2 , the total number of tails, we obtain the following mappings as a result: X(ω1 ) = (2, 0); X(ω2 ) = (1, 1); X(ω3 ) = (1, 1); X(ω4 ) = (0, 2) The corresponding random variable space, V , is therefore obtained as: V = {(2, 0), (1, 1), (0, 2)} By assuming equiprobable outcomes, we obtain the following probabilities: PX (0, 2) = P (ω4 ) = 14 PX (1, 1) = P (ω2 ) + P (ω3 ) = PX (2, 0) = P (ω1 ) = 14

1 2

The full pdf is now given, for x1 = 0, 1, 2, and x2 = 0, 1, 2, as follows: f (0, 0) = 0; f (1, 0) = 0; f (2, 0) = 14 f (0, 1) = 0; f (1, 1) = 12 ; f (2, 1) = 0 f (0, 2) = 14 ; f (1, 2) = 0; f (2, 2) = 0 1

2

CHAPTER 5. Note that for this problem, X1 and X2 must satisfy the constraint X1 + X2 = 2

so that events that do not meet this constraint are impossible; the probability of occurrence of such events are therefore zero. The same complete joint pdf, f (x1 , x2 ), may therefore be represented in tabular form as follows: X1 → X2 ↓ 0 1 2

0

1

2

0 0 1/4

0 1/2 0

1/4 0 0

5.2 (i) From the given pdf, we are able to determine the required probabilities as follows: (a) P (X1 ≥ X2 ) = f (1, 1) + f (2, 1) + f (2, 2) = 43 3 (b) P (X1 + X2 = 4) = f (1, 3) + f (2, 2) = 16 9 (c) P (|X1 − X2 | = 1) = f (1, 2) + f (2, 1) + f (2, 3) = 16 7 (d) P (X1 + X2 is even ) = f (1, 1) + f (1, 2) + f (2, 2) = 16 (ii) The joint cdf, F (x1 , x2 ), by definition, is: F (x1 , x2 ) =

x1 X x2 X

f (ξ1 , ξ2 )

ξ1 =1 ξ2 =1

In this specific case, upon introducing the values for the joint pdf, the result is as follows: F (1, 1) = f (1, 1) = 14 F (1, 2) = f (1, 1) + f (1, 2) = 38 7 F (1, 3) = f (1, 1) + f (1, 2) + f (1, 3) = 16 5 F (2, 1) = f (1, 1) + f (2, 1) = 8 F (2, 2) = f (1, 1) + f (1, 2) + f (2, 1) + f (2, 2) = 78 F (2, 3) = f (1, 1) + f (1, 2) + f (1, 3) + f (2, 1) + f (2, 2) + f (2, 3) =

16 16

A plot of this discrete cdf is shown in Fig 5.1. 5.3(i) The required sample space is:   (W, W ), (L, W ), Ω=  (D, W ),

(W, L), (L, L), (D, L),

 (W, D)  (L, D)  (D, D)

a space with a total of nine elements ω1 , ω2 , . . . , ω9 , ordered from left to right and from top to bottom in the array. The first entry in each ordered pair, the

3

Cumulative Distribution Function F(x1,x2)

1.0 0.8 F(x1,x2) 0.6 3

0.4 2

X2

1.0 1

1.5 X1

2.0

Figure 5.1: Cumulative distribution function, F (x1 , x2 ), for Problem 5.2. outcome of the first game, is distinct from the second, the outcome of the second game. Thus, for example, ω2 , representing the outcome that the player wins the first game but loses the second, is distinguishable from ω4 where the reverse takes place, with the player losing the first game and winning the second. (ii) Defining the two-dimensional random variable X = (X1 , X2 ), where X1 is the total number of wins, and X2 is the total number of draws, produces the following mapping: X(ω1 ) = (2, 0); X(ω2 ) = (1, 0); X(ω3 ) = (1, 1); X(ω4 ) = (1, 0); X(ω5 ) = (0, 0); X(ω6 ) = (0, 1); X(ω7 ) = (1, 1); X(ω8 ) = (0, 1); X(ω9 ) = (0, 2) The corresponding random variable space is therefore: V = {(2, 0), (1, 0), (1, 1), (0, 0), (0, 1), (0.2)} a set consisting of 6 elements. By assuming equiprobable outcomes, we obtain the following probabilities: PX (2, 0) = P (ω1 ) = 19 PX (1, 0) = P (ω2 ) + P (ω4 ) = PX (1, 1) = P (ω3 ) + P (ω7 ) = PX (0, 0) = P (ω5 ) = 19 PX (0, 1) = P (ω6 ) + P (ω8 ) = PX (0, 2) = P (ω9 ) = 19

2 9 2 9 2 9

The full joint pdf, f (x1 , x2 ), for x1 = 0, 1, 2, and x2 = 0, 1, 2, is:

4

CHAPTER 5. f (0, 0) = 91 ; f (1, 0) = 29 ; f (2, 0) = 19 f (0, 1) = 92 ; f (1, 1) = 29 ; f (2, 1) = 0 f (0, 2) = 91 ; f (1, 2) = 0; f (2, 2) = 0

where, because of the constraint inherent to this problem, i.e., X1 + X2 ≤ 2 events that do not satisfy this constraint, being impossible, are assigned the commensurate probability of zero. The joint pdf, f (x1 , x2 ), may therefore be represented in tabular form as follows: X1 → X2 ↓ 0 1 2

0

1

2

1/9 2/9 1/9

2/9 2/9 0

1/9 0 0

(iii) If 3 points are awarded for a win, and 1 point for a draw, let Y be the total number of points awarded to a player. It is true, then, that: Y = 3X1 + X2 which leads to the following mapping: Y (ω1 ) = 6; Y (ω2 ) = 3; Y (ω3 ) = 4; Y (ω4 ) = 3; Y (ω5 ) = 0; Y (ω6 ) = 1; Y (ω7 ) = 4; Y (ω8 ) = 1; Y (ω9 ) = 2 with a corresponding random variable space: VY = {0, 1, 2, 3, 4, 6} The resulting pdf, fY (y), is obtained immediately as: fY (0) = 19 ; fY (1) = 29 ; fY (2) = 19 ; fY (3) = 29 ; fY (4) = 29 ; fY (5) = 0; fY (6) =

1 9

From here, the required probability, P (Y ≥ 4), is obtained as: P (Y ≥ 4) = fY (4) + fY (5) + fY (6) =

3 1 , or 9 3

Thus, a player for which all possible two-game combinations in Ω are equally likely, the probability of qualifying for the tournament is 1/3, which is low. 5.4 (i) For Suzie the superior player, with pW = 0.75, pD = 0.2, and pL = 0.05, we obtain the following joint probability distribution.

5 PX (2, 0) = P (ω1 ) = (0.75)2 = 0.5625 PX (1, 0) = P (ω2 ) + P (ω4 ) = 2(0.75 × 0.05) = 0.075 PX (1, 1) = P (ω3 ) + P (ω7 ) = 2(0.75 × 0.2) = 0.3 PX (0, 0) = P (ω5 ) = (0.05)2 = 0.0025 PX (0, 1) = P (ω6 ) + P (ω8 ) = 2(0.2 × 0.05) = 0.02 PX (0, 2) = P (ω9 ) = (0.2)2 = 0.04 The complete joint pdf, fS (x1 , x2 ), for x1 = 0, 1, 2, and x2 = 0, 1, 2, is: fS (0, 0) = 0.0025; fS (1, 0) = 0.075; fS (2, 0) = 0.5625 fS (0, 1) = 0.02; fS (1, 1) = 0.30; fS (2, 1) = 0 fS (0, 2) = 0.04; fS (1, 2) = 0; fS (2, 2) = 0 which may be represented in tabular form as: X1 → X2 ↓ 0 1 2

fS (x1 , x2 ) 0 1 0.0025 0.0200 0.0400

0.075 0.300 0

2 0.5625 0 0

(ii) Similarly, for Meredith the mediocre player, with pW = 0.5, pD = 0.3, and pL = 0.2, the joint probability distribution is as follows. PX (2, 0) = P (ω1 ) = (0.5)2 = 0.25 PX (1, 0) = P (ω2 ) + P (ω4 ) = 2(0.5 × 0.2) = 0.20 PX (1, 1) = P (ω3 ) + P (ω7 ) = 2(0.5 × 0.3) = 0.30 PX (0, 0) = P (ω5 ) = (0.2)2 = 0.04 PX (0, 1) = P (ω6 ) + P (ω8 ) = 2(0.2 × 0.3) = 0.12 PX (0, 2) = P (ω9 ) = (0.3)2 = 0.09 The complete joint pdf, fM (x1 , x2 ), is: fM (0, 0) = 0.04; fM (1, 0) = 0.20; fM (2, 0) = 0.25 fM (0, 1) = 0.12; fM (1, 1) = 0.30; fM (2, 1) = 0 fM (0, 2) = 0.09; fM (1, 2) = 0; fM (2, 2) = 0 In tabular form, fM (x1 , x2 ) is: X1 → X2 ↓ 0 1 2

fM (x1 , x2 ) 0 1 0.04 0.12 0.09

0.20 0.30 0

2 0.25 0 0

6

CHAPTER 5.

(iii) Finally, for Paula the poor player, with pW = 0.2, pD = 0.3, and pL = 0.5, the following is the joint probability distribution. PX (2, 0) = P (ω1 ) = (0.2)2 = 0.04 PX (1, 0) = P (ω2 ) + P (ω4 ) = 2(0.2 × 0.5) = 0.20 PX (1, 1) = P (ω3 ) + P (ω7 ) = 2(0.2 × 0.3) = 0.12 PX (0, 0) = P (ω5 ) = (0.5)2 = 0.25 PX (0, 1) = P (ω6 ) + P (ω8 ) = 2(0.5 × 0.3) = 0.30 PX (0, 2) = P (ω9 ) = (0.3)2 = 0.09 Once more, the complete joint pdf, fP (x1 , x2 ), is: fP (0, 0) = 0.25; fP (1, 0) = 0.20; fP (2, 0) = 0.04 fP (0, 1) = 0.30; fP (1, 1) = 0.12; fP (2, 1) = 0 fP (0, 2) = 0.09; fP (1, 2) = 0; fP (2, 2) = 0 or, in tabular form: X1 → X2 ↓ 0 1 2

fP (x1 , x2 ) 0 1 0.25 0.30 0.09

0.20 0.12 0

2 0.04 0 0

Now, by defining the random variable, Y = 3X1 + X2 , representing the total number of points awarded to each player, then in all cases, the set Q defined as: Q = {Y : y ≥ 4} represents the event that a player qualifies for the tournament (having received at least 4 points). In this case, in terms of the original sample space, Q = {ω1 , ω3 , ω7 } so that PY (Q) = P (ω1 ) + P (ω3 ) + P (ω7 ) = PX (2, 0) + PX (1, 1) Therefore, for Suzie, PY (Y ≥ 4) = 0.5625 + 0.30 = 0.8625 for Meredith, PY (Y ≥ 4) = 0.25 + 0.30 = 0.55 and, for Paula, PY (Y ≥ 4) = 0.04 + 0.12 = 0.16

7 Thus, the probability that the superior player qualifies for the tournament is a reasonably high 0.8625; the probability that the mediocre player qualifies is a moderate 0.55, and for the poor player, the probability of qualifying is a very low 0.16. 5.5 (i) The condition to be satisfied is: Z 1Z 2 cx1 x2 (1 − x2 )dx1 dx2 = 1 0

0

We may now carry out the indicated integrations, first with respect to x1 , and then x2 , i.e., à ¯2 ! Z 1Z 2 Z 1 x21 ¯¯ cx1 x2 (1 − x2 )dx1 dx2 = c x2 (1 − x2 ) dx2 2 ¯0 0 0 0 Z 1 = 2c (x2 − x22 )dx2 0

µ = 2c

¶¯1 x22 x32 ¯¯ − 2 3 ¯0

c =1 3 which is solved for c to yield the required result: =

c=3 The complete joint pdf is therefore given by: ½ 3x1 x2 (1 − x2 ); 0 < x1 < 2; 0 < x2 < 1 f (x1 , x2 ) = 0; elsewhere

(5.1)

(ii) The required probabilities are obtained from Eq (5.1) above as follows: Z 1 Z 2 P (1 < x1 < 2; 0.5 < x2 < 1) = 3 x1 x2 (1 − x2 )dx1 dx2 0.5 1 µZ 2 ¶ Z 1 = 3 x2 (1 − x2 ) x1 dx1 dx2 0.5 1

=

1

Z

9 2

0.5

(x2 − x22 )dx2 =

3 8

Similarly, Z

0.5

Z

2

P (x1 > 1; x2 < 0.5) = 3

x1 x2 (1 − x2 )dx1 dx2 µZ 2 ¶ Z 0.5 3 x2 (1 − x2 ) x1 dx1 dx2 0

=

1

0

=

9 2

Z

0

1 0.5

(x2 − x22 )dx2 =

3 8

8

CHAPTER 5.

Surface Plot of the cdf F(x1,x2)

1.0

0.5 F(x1,x2)

0.0 0

1.0 1

0.5

X1 2

0.0

X2

Figure 5.2: Cumulative distribution function, F (x1 , x2 ) for Problem 5.5. (iii) The cumulative distribution function is obtained from: Z x2 Z x1 F (x1 , x2 ) = f (ξ1 , ξ2 )dξ1 dξ2 0

which, in this specific case, is: F (x1 , x2 )

=

Z

0

x2

Z

x1

3

ξ1 ξ2 (1 − ξ2 )dξ1 dξ2 µZ x1 ¶ = 3 ξ2 (1 − ξ2 ) ξ1 dξ1 dξ2 0 0 µ 2¶ Z x2 x1 dξ2 = 3 ξ2 (1 − ξ2 ) 2 0 µ ¶ 3x21 x22 x3 = − 2 2 2 3 Z0 x2

0

A plot of this cdf is shown in Fig 5.2. 5.6 (i) From the joint pdf given in Exercise 5.5, the marginal pdfs are obtained as follows: Z 1 x1 x2 (1 − x2 )dx2 f1 (x1 ) = 3 0

=

Z

1

3x1

x2 (1 − x2 )dx2 = 0

Thus,

½ f1 (x1 ) =

x1 2 ;

0;

0 < x1 < 2; elsewhere

x1 2

9 Similarly,

Z

2

f2 (x2 ) = 3x2 (1 − x2 )

x1 dx1 = 6x2 (1 − x2 ) 0

Thus,

½ f2 (x2 ) =

6x2 (1 − x2 ); 0 < x2 < 1; 0; elsewhere

From here, it is straightforward to see that: f (x1 , x2 ) = f1 (x1 )f2 (x2 ) so that X1 and X2 are independent. From the marginal pdfs obtained above, the required marginal means, µX1 , µX2 , are determined as follows: Z 2 Z 1 2 2 4 µX1 = x1 f1 (x1 )dx1 = x1 dx1 = 2 0 3 0 and,

Z µX2 =

Z

1

x2 f2 (x2 )dx2 = 6 0

0

2

6x22 (1 − x2 )dx2 =

1 2

(ii) The conditional pdfs are obtained as follows. f (x1 |x2 ) = also: f (x2 |x1 ) =

3x1 x2 (1 − x2 ) x1 f (x1 , x2 ) = = f2 (x2 ) 6x2 (1 − x2 ) 2

f (x1 , x2 ) 3x1 x2 (1 − x2 ) = = 6x2 (1 − x2 ) x1 f1 (x1 ) 2

5.7 (i) From the given pdf, the marginal pdfs are obtained by summing out x2 (to obtain f1 (x1 )) and summing out x1 (to obtain f2 (x2 )). The results are shown in the following table. X1 → X2 ↓ 0 1 2 f1 (x1 )

0

1

2

0 0 1/4 1/4

0 1/2 0 1/2

1/4 0 0 1/4

f2 (x2 ) 1/4 1/2 1/4 1

From these marginal pdfs, the product, f1 (x1 )f2 (x2 ), is obtained as in the following table:

10

CHAPTER 5. f1 (x1 )f2 (x2 ) X1 → 0 1 X2 ↓ 0 1/16 1/8 1 1/8 1/4 2 1/16 1/8

2 1/16 1/8 1/16

which is not the same as the joint pdf, f (x1 , x2 ); hence, X1 and X2 are not independent. (ii) The conditional pdfs are obtained as follows: f (x1 |x2 ) =

f (x1 , x2 ) f2 (x2 )

and, from the indicated pdfs, the result is as shown in the following table: f (x1 |x2 ) X1 → 0 1 X2 ↓ 0 0 0 1 0 1 2 1 0

2 1 0 0

Similarly,

f (x1 , x2 ) f1 (x1 ) and, again, from the indicated pdfs, the result is as shown in the following table: f (x2 |x1 ) =

f (x2 |x1 ) X1 → 0 1 X2 ↓ 0 0 0 1 0 1 2 1 0

2 1 0 0

Observe that: P (X2 = 2|X1 = 0) = 1 = P (X1 = 2|X2 = 0) P (X2 = 1|X1 = 1) = 1 = P (X1 = 1|X2 = 1) P (X2 = 0|X1 = 2) = 1 = P (X1 = 0|X2 = 2) With all other probabilities being zero, it appears as if the experiments are such that, with absolute certainty (and consistently), X1 + X2 = 2 (iii) With the random variables X1 and X2 defined respectively as the total number of heads, and the total number of tails obtained when a coin is tossed twice, we note the following:

11 (a) each random variable space is {0, 1, 2}; and, (b) there are exactly two tosses of the coin involved in each experiment. Therefore, the resulting outcomes must satisfy the constraint X1 + X2 = 2 always: at the end of each experiment, even though the actual outcomes are uncertain, the total number of heads obtained plus the total number of tails obtained must add up to 2, hence the constraint. Thus, the foregoing results are consistent with the stated conjecture. 5.8 From the given pdf, the condition that must be satisfied is: Z

2

Z

1

c 0

e−(x1 +x2 ) dx1 dx2 = 1

0

The indicated integrals may then be carried out to give the following result: Z

2

Z

1

c 0

Z e−(x1 +x2 ) dx1 dx2

2

=

0

Z

0

³ ¯1 ´ e−x2 −e−x1 ¯0 dx2

2

¡ ¢ e−x2 1 − e−1 dx2 0 ¡ ¢¡ ¢ = c 1 − e−1 1 − e−2 = 1 =

which, when solved for c, yields the desired result: c=

1 (1 − e−1 ) (1 − e−2 )

The marginal pdfs are obtained as follows. Z

2

f1 (x1 ) = c

e−(x1 +x2 ) dx2

0

which, upon using the result obtained above for c, simplifies to give: ½ 1 −x1 ; 0 < x1 < 1; (1−e−1 ) e f1 (x1 ) = 0; elsewhere Similarly,

Z f2 (x2 ) = c

1

e−(x1 +x2 ) dx1

0

which simplifies to yield: ½ f2 (x2 ) =

1 −x2 ; (1−e−2 ) e

0;

0 < x2 < 2; elsewhere

We may now observe from here that: f1 (x1 )f2 (x2 ) = f (x1 , x2 )

12

CHAPTER 5.

indicating that X1 and X2 are independent. 5.9 When the range of validity is changed as indicated, the constant, c, is determined to satisfy the condition: Z ∞Z ∞ c e−(x1 +x2 ) dx1 dx2 = 1 0

0

and, upon evaluating the indicated integrals, we obtain c=1 The marginal pdfs are obtained in the usual fashion: Z ∞ f1 (x1 ) = e−(x1 +x2 ) dx2 = e−x1 0

and

Z



f2 (x2 ) =

e−(x1 +x2 ) dx1 = e−x2

0

From here, it is clear that, as in Problem 5.8, f1 (x1 )f2 (x2 ) = f (x1 , x2 ) indicating that X1 and X2 are independent under these conditions, also. The joint pdf in this case, is ½ −(x +x ) e 1 2 ; 0 < x1 < ∞; 0 < x2 < ∞ (5.2) f (x1 , x2 ) = 0; elsewhere

Section 5.3 5.10 (i) The random variable, U (X1 , X2 ) = X1 + X2 , represents the total number of wins and draws. From the pdf obtained in Exercise 5.3, its expected value is obtained as follows: E(X1 + X2 ) =

2 2 X X

(x1 + x2 )f (x1 , x2 )

x2 =0 x1 =0

= (0 + 0)f (0, 0) + (1 + 0)f (1, 0) + · · · + (2 + 2)f (2, 2) 4 6 2 4 = + + = 9 9 9 3 (ii) The random variable, U (X1 , X2 ) = 3X1 + X2 , represents the total number of points awarded to the player, with a minimum of 4 points required for qualification; its expected value is obtained as follows: E(3X1 + X2 ) =

2 2 X X

(3x1 + x2 )f (x1 , x2 )

x2 =0 x1 =0

= 0f (0, 0) + 3f (1, 0) + · · · + 8f (2, 2) 8 12 10 2 + + = = 9 9 9 3

13 Thus, the expected total number of points awarded is 2 23 , which is less than the required minimum of 4; this player is therefore not expected to qualify. 5.11 (i) The required marginal pdfs are obtained from: f1 (x1 ) =

2 X

f (x1 , x2 ); f2 (x2 ) =

x2 =0

2 X

f (x1 , x2 )

x1 =0

The specific marginal pdfs for each of the players are shown in the tables below. X1 → X2 ↓ 0 1 2 f1 (x1 )

For Suzie 1

0

0.0025 0.0200 0.0400 0.0625

X1 → X2 ↓ 0 1 2 f1 (x1 ) X1 → X2 ↓ 0 1 2 f1 (x1 )

0.075 0.300 0 0.375

2

0.5625 0 0 0.5625

f2 (x2 ) 0.64 0.32 0.04 1

For Meredith 0 1 2 0.04 0.12 0.09 0.25 0

0.20 0.30 0 0.50

0.25 0 0 0.25

f2 (x2 ) 0.49 0.42 0.09 1

For Paula 1 2

0.25 0.30 0.09 0.64

0.20 0.12 0 0.32

0.04 0 0 0.04

f2 (x2 ) 0.49 0.42 0.09 1

The marginal means are obtained from these marginal pdfs as follows: µX1 =

2 X x1 =0

x1 f1 (x1 ); µX2 =

2 X

x2 f2 (x2 )

x2 =0

For Suzie, µX1 µX2

= (0 × 0.0625 + 1 × 0.375 + 2 × 0.5625) = 1.5 = (0 × 0.64 + 1 × 0.32 + 2 × 0.04) = 0.4

indicating that, in 2 games, Suzie is expected to win 1.5 games on average, while her expected number of draws is 0.4.

14

CHAPTER 5. For Meredith, µX1 µX2

= (0 × 0.25 + 1 × 0.50 + 2 × 0.25) = 1.0 = (0 × 0.49 + 1 × 0.42 + 2 × 0.09) = 0.6

indicating that, on average, in 2 games, Meredith will win 1.0 game, and draw in 0.6. And for Paula, µX1 µX2

= (0 × 0.64 + 1 × 0.32 + 2 × 0.04) = 0.4 = (0 × 0.49 + 1 × 0.42 + 2 × 0.009) = 0.6

indicating that, in 2 games, Paula is only expected (on average) to win 0.4 games, and draw in 0.6 games. It is interesting to note that these results could also have been obtained directly from the supplied individual probabilities. Recall that for Suzie, the probability of winning a single game, pW , is 0.75, so that in 2 games, the expected number of wins will be 2 × 0.75 = 1.5; the probability of a draw pD , is 0.2, and therefore the expected number of draws in 2 games is 0.4, as obtained above. The same is true for Meredith, (pW = 0.5; pD = 0.3); and Paula, (pW = 0.2; pD = 0.3). (ii) The expectation, E[U (X1 , X2 ) = 3X1 + X2 ], is the expected total number of points awarded to each player; and from the original problem definition, a minimum of 4 is required for qualification. This expectation is obtained from the joint pdf, f (x1 , x2 ), as follows: E(3X1 + X2 ) =

2 2 X X

(3x1 + x2 )f (x1 , x2 )

x2 =0 x1 =0

Upon employing the appropriate pdf for each player, the results are shown below. First, for Suzie, E(3X1 + X2 )

=

=

(0 × 0.0025 + 3 × 0.075 + 6 × 0.5626) +(1 × 0.02 + 4 × 0.30 + 7 × 0) +(2 × 0.04 + 5 × 0 + 8 × 0) 4.9

In similar fashion, we obtain, ½ E(3X1 + X2 ) =

3.6; 1.8;

for Meredith for Paula

Consequently, only Suzie is expected to qualify. It appears, therefore, that this is a tournament meant only for superior players, with the stringent pre-qualifying conditions designed specifically to weed out all but the truly superior players.

15 5.12 From the marginal pdfs obtained in Exercise 5.7, we obtain µ ¶ µ ¶ µ ¶ X 1 1 1 µX1 = x1 f1 (x1 ) = 0 × + 1× + 2× =1 4 2 4 µX2 =

X

x2 f2 (x2 ) =

¶ µ ¶ µ ¶ µ 1 1 1 + 1× + 2× =1 0× 4 2 4

The covariance is obtained from: σ12 = E(X1 X2 ) − µX1 µX2 and since

XX

E(X1 X2 ) =

x2

x1 x2 f (x1 , x2 )

x1

we obtain, from the given joint pdf, f (x1 , x2 ), that: E(X1 X2 ) = so that the covariance is: σ12 =

1 2

1 1 −1=− 2 2

Next, the variances are obtained from: σ12

= E(X1 − µX1 ) =

X

µ ¶ µ ¶ µ ¶ 1 1 1 1 (x1 − 1) f1 (x1 ) = 1 × + 0× + 1× = 4 2 4 2 2

and similarly, σ22 = E(X2 − µX2 ) =

X

(x2 − 1)2 f2 (x2 ) =

1 2

Hence, the correlation coefficient, ρ, is obtained as: −1/2 p = −1 1/2 1/2

ρ= p

with the implication that the two random variables in question, X1 and X2 , are perfectly negatively correlated. 5.13 From the given joint pdfs for each player, we are able to obtain the marginal pdfs from 2 2 X X f1 (x1 ) = f (x1 , x2 ); f2 (x2 ) = f (x1 , x2 ) x2 =0

as follows (also see Exercise 5.11).

x1 =0

16

CHAPTER 5.

X1 → X2 ↓ 0 1 2 f1 (x1 )

For Suzie 1

0

0.0025 0.0200 0.0400 0.0625

0.075 0.300 0 0.375

2 f2 (x2 ) 0.64 0.32 0.04 1

0.5625 0 0 0.5625

For Meredith 0 1 2

X1 → X2 ↓ 0 1 2 f1 (x1 )

0.04 0.12 0.09 0.25

X1 → X2 ↓ 0 1 2 f1 (x1 )

0

0.20 0.30 0 0.50

0.25 0 0 0.25

f2 (x2 ) 0.49 0.42 0.09 1

For Paula 1 2

0.25 0.30 0.09 0.64

0.20 0.12 0 0.32

0.04 0 0 0.04

f2 (x2 ) 0.49 0.42 0.09 1

The marginal means are obtained from these marginal pdfs as follows: µX1 =

2 X

x1 f1 (x1 ); µX2 =

x1 =0

to yield: µX1 and, µX2

2 X

x2 f2 (x2 )

x2 =0

  1.5; 1.0; =  0.4;

for Suzie for Meredith for Paula

  0.4; 0.6; =  0.6;

for Suzie for Meredith for Paula

The variances, σ12 and σ22 , are obtained for each player from: X (xi − µXi )2 fi (xi ); i = 1, 2 σi2 = E(Xi − µXi )2 = xi

Thus, for Suzie, with µX1 = 1.5 and µX2 = 0.4, σ12 = (1.52 × 0.0625) + (0.52 × 0.375) + (0.52 × 0.5625) = 0.375

17 and

σ22 = (0.42 × 0.64) + (0.62 × 0.32) + (1.62 × 0.4) = 0.32

Similarly, for Meredith, with µX1 = 1.0 and µX2 = 0.6, σ12 = (12 × 0.25) + (0 × 0.50) + (12 × 0.25) = 0.50 and

σ22 = (0.62 × 0.49) + (0.42 × 0.42) + (1.42 × 0.09) = 0.42

and for Paula, with µX1 = 0.4 and µX2 = 0.6, σ12 = (0.42 × 0.64) + (0.62 × 0.32) + (1.62 × 0.04) = 0.32 and

σ22 = (0.62 × 0.49) + (0.42 × 0.42) + (1.42 × 0.09) = 0.42

From here, we are now able to calculate the covariances, σ12 , from: σ12 = E(X1 X2 ) − µX1 µX2 to yield, for the various players:   −0.30; −0.30; σ12 =  −0.12;

for Suzie for Meredith for Paula

and the correlation coefficients, ρ, from ρ= to obtain:

  −0.866; −0.655; ρ=  −0.327;

σ12 σ1 σ2 for Suzie for Meredith for Paula

The uniformly negative values obtained for the covariances and correlation coefficients indicate, first, that for all 3 players, the total number of wins and the total number of draws are negatively correlated: i.e., a higher number of wins tends to occur together with a lower number of draws, and vice versa. Keep in mind that there is a third possible outcome (L, a “loss”), so that if a player does not win a game, the other option is not limited to a “draw.” Hence the two variables, X1 and X2 , are not (and cannot be) perfectly correlated—unless the probability of a loss is always zero, which is not the case here. Second, the negative correlation between wins and draws is strongest for Suzie, the superior player; it is moderately strong for Meredith, and much less so for Paula. These values reflect the influence exerted by the third possible outcome (a loss), via the probability of its occurrence. The strong correlation between wins and draws for Suzie indicates that for her, these two outcomes are the most dominant of the three: i.e., she is far more likely to win or draw

18

CHAPTER 5.

(almost exclusively) than lose. If the probability of losing were exactly zero, the correlation coefficient between X1 and X2 would be exactly −1. The moderately strong correlation coefficient for Meredith shows that while the more likely outcomes are a win or a draw, the possibility of losing is just high enough to diffuse the correlation between wins and draws. For Paula, the possibility of losing is sufficiently high to the point of lowering the correlation between wins and draws substantially. 5.14 (i) From the joint pdfs, we obtain the required marginal pdfs as follows: Z

µ

1

f1 (x) =

(x + y)dy = 0

so that: f1 (x) = x +

¶¯1 y 2 ¯¯ xy + 2 ¯0

1 2

Similarly, we obtain: Z

1

f2 (y) =

(x + y)dx = y + 0

1 2

The conditional pdfs are obtained as follows: f (x|y) =

f (x, y) 2(x + y) = f2 (y) 2y + 1

and similarly, f (y|x) =

2(x + y) 2x + 1

We may now observe that f (x|y) 6= f1 (x); f (y|x) 6= f2 (y); and f (x, y) 6= f1 (x)f2 (y) hence, X and Y are not independent. (ii) The marginal means and marginal variances required for computing the covariances and the correlation coefficients, are obtained as follows: ¶ Z 1 Z 1 µ 7 1 µX = xf1 (x)dx = x x+ dx = 2 12 0 0 and by symmetry,

Z

1

µY =

yf2 (y)dy = 0

The variances are obtained as follows: Z 1 Z 2 σX = (x − µX )2 f1 (x)dx = 0

0

1

7 12

µ ¶µ ¶ 7 1 x− x+ dx 12 2

19 and after a bit of fairly basic calculus and algebra, this simplifies to yield: 2 σX =

11 144

σY2 =

11 144

and similarly for Y ,

The covariance is obtained as: σXY = E(XY ) − µX µY and since

Z

1

Z

E(XY ) =

1

xy(x + y)dxdy = 0

0

1 3

we therefore obtain:

1 49 1 − =− 3 144 144 from where the correlation coefficient is obtained as: σXY =

1 −1/144 p =− ρ= p 11 11/144 11/144 indicating slightly negatively correlated random variables.

Application Problems 5.15 (i) First, upon representing the assay results as Y , with two possible outcomes, y1 = A+ , and y2 = A− , and the true lithium status as X, also with two possible outcomes, x1 = L+ and x2 = L− ; and subsequently, upon considering the relative frequencies as representative of the probabilities, the resulting joint pdf, f (x, y), is shown in the table below. The marginal distributions, f1 (x) and f2 (y), are obtained by summing over y, and x, respectively, i.e., X X f1 (x) = f (x, y); f2 (y) = f (x, y). y

Y , Assay Result↓ y1 y2 f1 (x)

x

X, Toxicity Status x1 x2 0.200 0.113 0.140 0.547 0.340 0.660

f2 (y) 0.313 0.687 1.000

The event “test result is correct,” consists of two mutually exclusive events: (a) the test method correctly registers a high lithium concentration (Y = y1 ) for a patient with confirmed lithium toxicity (X = x1 ), or,

20

CHAPTER 5.

(b) the test method correctly registers a low lithium concentration (Y = y2 ) for a patient with no lithium toxicity (X = x2 ). Thus, the required probability, P (R), that the test method produces the right result, is: P (R) = =

P (X = x1 , Y = y1 ) + P (X = x2 , Y = y2 ) 0.200 + 0.547 = 0.747

(ii) From the joint pdf and the marginal pdfs given in the table above, we obtain the required conditional pdfs as follows. f (y2 |x2 ) =

f (x2 , y2 ) 0.547 = = 0.829 f1 (x2 ) 0.660

f (y1 |x2 ) =

f (x2 , y1 ) 0.113 = = 0.171 f1 (x2 ) 0.660

f (y2 |x1 ) =

f (x1 , y2 ) 0.140 = = 0.412 f1 (x1 ) 0.340

In words, • f (y2 |x2 ) is the probability that the test method correctly indicates low lithium concentrations when used on patients confirmed with no lithium toxicity; • f (y1 |x2 ) is the probability that the test method incorrectly indicates high lithium concentrations when used on patients confirmed with no lithium toxicity; and, • f (y2 |x1 ) is the probability that the test method incorrectly indicates low lithium concentrations when used on patients with confirmed lithium toxicity. 5.16 (i) The required probability is P (X2 > X1 ); it is determined from the given joint pdf as follows. Z P (X2 > X1 ) = = = =



Z

x2

1 −(0.2x1 +0.1x2 ) e dx1 dx2 50 0 0 µZ x2 ¶ Z ∞ 1 −0.1x2 −0.2x1 e e dx1 dx2 50 0 0 Z ∞ 1 e−0.1x2 (1 − e−0.2x2 )dx2 10 0 2 3

21 (ii) The converse probability, P (X1 > X2 ), is obtained from: Z ∞Z ∞ 1 −(0.2x1 +0.1x2 ) P (X1 > X2 ) = e dx1 dx2 0 x2 50 µZ ∞ ¶ Z ∞ 1 = e−0.1x2 e−0.2x1 dx1 dx2 50 0 x2 Z ∞ 1 e−0.1x2 e−0.2x2 dx2 = 10 0 1 = 3 Of course, this could also have been obtained from the fact that: P (X1 > X2 ) = 1 − P (X2 > X1 ) (iii) The expected (mean) lifetimes for the various components are obtained as follows. Z ∞ Z ∞ µX1 = x1 f1 (x1 )dx1 ; and µX2 = x2 f2 (x2 )dx2 0

0

with fi (xi ) as the respective marginal pdfs, i = 1, 2. From Example 5.3 in the text, where these marginal pdfs were derived explicitly, we obtain, Z 1 ∞ E(X1 ) = µX1 = x1 e−0.2x1 dx1 = 5 5 0 Z ∞ 1 E(X2 ) = µX2 = x2 e−0.1x2 dx2 = 10 10 0 Thus, the expected lifetime of the control valve, µX2 , is 10 years, while that of the controller hardware electronics, µX1 , is 5 years. The implications are therefore that one should expect to replace the control valve every 10 years, and the controller hardware electronics every 5 years. (iv) From the result in (iii) above, we observe that over the next 20 years, one should expect to replace the control valve twice (at a cost of $10,000 each time), and to replace the control hardware electronics 4 times (at a cost of $20,000 each time). The total cost will then be: C = (2 × 10, 000) + (4 × 20, 000) = 100, 000 Thus, $100,000 should be budgeted over the next 20 years for the purpose of keeping the control system functioning by replacing a malfunctioning component every time it fails. 5.17 (i) From the supplied information, and under the stipulated conditions, the joint pdf, f (x, y), is shown in the table below. The marginal pdfs, f1 (x) and f2 (y), are obtained by summing over y, and summing over x, respectively.

22

CHAPTER 5. Y → X↓ 0 1 f2 (y)

1

2

3

4

f1 (x)

0.06 0.17 0.23

0.20 0.14 0.34

0.13 0.12 0.25

0.10 0.08 0.18

0.49 0.51 1.00

(ii) The required probability is the conditional probability, f (y = 3, 4|x = 0); it is obtained as follows: f (y = 3, 4|x = 0)

=

f (y = 3|x = 0) + f (y = 4|x = 0) f (x = 0, y = 3) f (x = 0, y = 4) + = f1 (x = 0) f1 (x = 0) 0.13 0.10 = + = 0.469 0.49 0.49

(iii) The required expected value is obtained as:

E(C) = 1500 + E(500X − 100Y ) = 1500 +

1 X 4 X (500x − 100y)f (x, y) x=0 y=1

which, upon introducing the joint pdf and appropriate values for x and y, simplifies to yield: E(C) = 1500 + 27 = 1527 Thus, the company should expect to spend, on average, $1527 per worker every year. 5.18 (i) It is not necessary to consider X3 in this joint pdf because of the constraint: X1 + X2 + X3 = 5 so that once X1 and X2 are given, X3 follows automatically. (ii) From the given pdf, f (x1 , x2 ) =

120 0.85x1 0.05x2 0.15−x1 −x2 x1 !x2 !(5 − x1 − x2 )!

for x1 = 0, 1, 2, . . . , 5, and x2 = 0, 1, 2, . . . , 5, we obtain the following table for the joint pdf as well as the marginal pdfs, the latter having been obtained by summing over the appropriate variable. (Note that events for which X1 +X2 > 5 are entirely impossible; accordingly, the probabilities associated with them are zero.)

23 X1 → X2 ↓ 0 1 2 3 4 5 f1 (x1 )

0

1

2

3

4

5

f2 (x2 )

0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000

0.0004 0.0009 0.0006 0.0002 0.0000 0 0.002

0.0072 0.0108 0.0054 0.0009 0 0 0.024

0.0614 0.0614 0.0154 0 0 0 0.138

0.2610 0.1305 0 0 0 0 0.392

0.4437 0 0 0 0 0 0.444

0.774 0.204 0.021 0.001 0.000 0.000 1.000

This joint pdf is plotted in Fig 5.3, where it is seen that most of the “activity” is localized to the region where X1 ≥ 3 and X2 ≤ 2. Joint pdf for Problem 5.18

0.45

0.30 f(x1,x2)

0.15

0.00 5 4

5 3

X2

4 3

2

2

1 0

1

X1

0

Figure 5.3: Joint probability distribution function, f (x1 , x2 ), for Problem 5.18. It is possible to generate a 6 × 6 table of the product, f1 (x1 )f2 (x2 ), and compare this term-by-term to the table shown above for the joint pdf, f (x1 , x2 ), in order to determine whether or not the two pdfs are the same. However, it is sufficient to note, for example, that while f1 (x1 = 5) = 0.444, and f2 (x2 = 1) = 0.204, so that the product, f1 (x1 = 5)f2 (x2 = 1) = 0.091, the joint probability, f (x1 = 5, x2 = 1), is exactly equal to zero, because the outcome X1 = 5 jointly with X2 = 1, cannot occur. And if f (x1 , x2 ) 6= f1 (x1 )f2 (x2 ) at a single point, then the two pdfs cannot be equal at all. Hence, the random variables X1 and X2 are not independent. (iii) The required expected values are obtained from the marginal pdfs as follows. E(X1 ) =

5 X x1 =0

x1 f1 (x1 ) = (0.002 + 0.048 + 0.414 + 1.568 + 2.22) = 4.252

24

CHAPTER 5.

Similarly, E(X2 ) =

5 X

x2 f2 (x2 ) = (0.204 + 0.042 + 0.003) = 0.249

x2 =0

Thus, the expected number of correct results, regardless of the other results, is 4.25; the expected value of false positives (again, regardless of other results), is 0.25. Note that these results could also have been obtained directly from the given individual probabilities, 0.85 for correct results, and 0.05 for false positives. In five repetitions, we would “expect” 5 × 0.85 = 4.25 correct results, and 5 × 0.05 = 0.25 false positives. (iv) The required expected value, E(X1 + X2 ), is obtained as follows. E(X1 + X2 )

=

5 5 X X

(x1 + x2 )f (x1 , x2 )

x2 =0 x1 =0

= 3.4615 + 0.9323 + 0.1166 + 0.0053 = 4.5157 which is slightly larger than E(X1 ) + E(X2 ) = 4.501. These values are different simply because the two random variables are not independent; the values would be identical for independent random variables.

Chapter 6

Exercises 6.1 (i) From the given transformation, Y =

1 X

we obtain the inverse transformation: X=

1 Y

so that from the pdf, f (x), we obtain the required pdf as: 1 1 1 fY (y) = p(1 − p) y −1 ; y = 1, , , . . . , 0 2 3

(ii) By definition, E(Y ) =

X

fY (y) =

y

X

1

yp(1 − p) y −1

y

or, with a more convenient change of variables back to the original x, ∞ X 1 E(Y ) = p (1 − p)x−1 x x=1

If we now let q = (1 − p), we obtain: ∞

E(Y ) =

pX1 x q q x=1 x

(6.1)

Now, by defining the infinite sum indicated above as S, i.e., ∞ X 1 x S= q x x=1

1

(6.2)

2

CHAPTER 6.

we may then observe that upon differentiating once with respect to q, the result is: ∞ X dS 1 = q x−1 = dq 1 − q x=1 Thus, the infinite sum S in Eq (6.2) satisfies the differential equation: dS 1 = dq 1−q which is easily solved to yield: S = − ln(1 − q) When this is substituted into Eq (6.1), the result is: µ ¶ µ ¶ p 1 p 1 E(Y ) = ln = ln q 1−q (1 − p) p Thus, while E(X) = p1 , E(Y ) = E( X1 ) is not p, but a more complicated function. Therefore, as is true with many nonlinear transformations, µ ¶ 1 1 E 6= X E(X)

6.2 To establish that, for the random variable, Y , whose pdf is given by θ

fY (y) =

e−(2 ) y θ ; y = 1, 2, 4, 8, . . . (log2 y)!

the expected value, E(Y ), is given by: E(Y ) = eλ we begin from the definition of the expectation: X E(Y ) = yfY (y) y

and from the given pdf, obtain: X

yfY (y) =

θ

e−(2

)

X y

y

=

yy θ (log2 y)!

· ¸ 21 2θ 22 22θ 23 23θ −(2θ ) e 1+ + + + ... 1 2! 3!+

3 And since, from Eq (6.17) in the main text, θ = log2 λ so that λ = 2θ , we have, X

· ¸ 21 λ 22 λ2 23 λ3 = e−λ 1 + + + + ... 1 2! 3!+ · ¸ (2λ) (2λ)2 (2λ)3 + + + ... = e−λ 1 + 1 2! 3!+

yfY (y)

y

=

(6.3)

e−λ e2λ = eλ

as required. Alternatively, once can go directly from Y = 2X so that E(Y ) = E(2X ) =

X

2x

x

or, upon consolidation: E(Y ) = e−λ

λx e−λ x!

X (2λ)x x

x!

as above in Eq (6.3), with the rest of the result therefore following immediately. 6.3 From the given transformation, Y =

1 −X/β e β

we obtain the inverse transformation: x = ψ(y) = −β ln(βy); 0 < y
2) from fY (y); i.e., P (Y > 2) = 1 − P (Y ≤ 2)

= 1 − (fY (0) + fY (1) + fY (2)) = 1 − (0.105 + 0.237 + 0.267) = 0.391

6.14 This non-square transformation can be made square by introducing any convenient additional “squaring transformation.” We choose Y2 = X2 , so that the complete bivariate transformation is now: Y1 Y2

X1 X2 = X2 =

The corresponding inverse transformation is: x1 x2

= y1 y2 = y2

13 so that the Jacobian of the transformation ¯ ¯ y y1 J = ¯¯ 2 0 1

is: ¯ ¯ ¯ = y2 ¯

and, therefore, |J| = |y2 |. Now, by independence, the joint pdf for X1 and X2 is: fX (x1 , x2 ) =

1 xα−1 xβ−1 e−x1 e−x2 ; 0 < x1 < ∞; 0 < x2 < ∞ 2 Γ(α)Γ(β) 1

From here, and the from the inverse transformation, we obtain the joint pdf for Y1 and Y2 as: fY (y1 , y2 ) = =

1 (y1 y2 )α−1 y2β−1 y2 e−y1 y2 e−y2 ; Γ(α)Γ(β) 1 y α−1 y2α+β−1 e−y2 (y1 +1) Γ(α)Γ(β) 1

We may now integrate out the “extraneous” variable, y2 , to obtain the desired marginal pdf, f1 (y1 ), as follows: Z ∞ 1 α−1 f1 (y1 ) = y y2α+β−1 e−y2 (y1 +1) dy2 Γ(α)Γ(β) 1 0 If we now let C = (1 + y), and introduce a new variable, z = Cy2 so that

1 dz = dy2 C

then: f1 (y1 ) =

1 y α−1 Γ(α)Γ(β) 1

Z 0



1 C α+β−1

z α+β−1 e−z

1 dz C

which simplifies to: 1 1 f1 (y1 ) = y α−1 Γ(α)Γ(β) 1 C α+β

Z



z α+β−1 e−z dz

0

and since the surviving integral term is the definition of the Gamma function, Γ(α + β), we obtain finally that: f1 (y1 ) = as required.

y1α−1 Γ(α + β) Γ(α)Γ(β) (1 + y1 )α+β

14

CHAPTER 6.

6.15 (i) Since the expectation operator, E(.), is a linear operator, then from Eq (6.134), we obtain immediately, E(X) = 0.4E(V ) + 100 = 0.4µV + 100 as required. And since V ar(X) = E(X − µX )2 , with µX = E(X) obtained as above, we note that (X − µX )2

= [(0.4V + 100) − (0.4µV + 100)]2 = 0.16(V − µV )2

where, upon taking expectations, we obtain: 2 σX = V ar(X) = 0.16σV2

(ii) If the expression in Eq (6.134) is considered as a transformation from V to X, then the inverse transformation is: v = 2.5(x − 100) so that |J| = 2.5. Then, from this and the pdf in Eq (6.135), the required pdf, fX (x), is therefore given by: ½ ¾ 1 −(x − µX )2 √ exp fX (x) = 2 2σX σX 2π

6.16 The Taylor series approximation of Eq (6.138) is: Q0 ≈ Q∗0 +

210 (M ∗ )−1/4 (M − M ∗ ) 4

(6.10)

And, given the mean mass, M ∗ = 75, it follows from Eq (6.138), that the corresponding value for the resting metabolic rate, Q∗0 , is: Q∗0 = 70 × (75)3/4 = 1784.00 Thus, Eq (6.10) becomes: Q0 ≈ 1784.00 + 17.84(M − 75) From here, since M ∗ = 75 is E(M ), we obtain: E(Q0 ) ≈ 1784.00 and also:

V ar(Q0 ) ≈ (17.84)2 × V ar(M ) = 3978.32

Chapter 8

Exercises 8.1 By definition, the variance of any random variable, X, is: σ 2 = E[(X − µ)2 ] In the case of the Bernoulli random variable, with µ = p (see Eq (8.11)), σ2

=

1 ∑

(x − p)2 f (x)

x=0

= p2 f (0) + (1 − p)2 f (1) = p2 (1 − p) + (1 − p)2 p = p(1 − p) as required. Similarly, the MGF is obtained as follows: ( ) MX (t) = E etX =

1 ∑

etX f (x)

x=0

= f (0) + et f (1) = (1 − p) + pet as required. Finally, for the characteristic function, ( ) φX (t) = E ejtX

=

1 ∑

ejtX f (x)

x=0

= (1 − p) + pejt as required. 8.2 (i) The required pdf is given in the table below with the plot in the accompanying Fig 8.1. 1

2

CHAPTER 8. x 0 1 2 3 4 5

f (x) 0.222 0.556 0.222 0.000 0.000 0.000

TOTAL

1.000

Distribution Plot Hypergeometric, N=10, Nd=2, n=5 0.6 0.5

f(x)

0.4 0.3 0.2 0.1 0.0 0

1

2

3

4

5

X

Figure 8.1: The hypergeometric random variable pdf, with n = 5, Nd = 2, and N = 10. (ii) P (X > 1) = 1 − P (X ≤ 1) = 0.222 (or, in this specific case, this is the same as P (X = 2)). Also, P (X < 2) = f (0) + f (1) = 0.778. (iii) P (1 ≤ X ≤ 3) = f (1) + f (2) + f (3) = 0.778 8.3 The random variable in question is hypergeometric, with N = 100, Nd = 5, n = 10; the required probability, P (X = 0), is obtained as: f (0) = 0.584 Thus, the probability of accepting the entire crate under the indicated sampling plan is almost 0.6. If the sample size is increased to 20 (from 10), (i.e., n = 20), the result is: f (0) = 0.319 and the probability of acceptance is reduced by more than 45%.

3 8.4 For the binomial random variable, E(X) =

n ∑

xf (x) =

x=0

n ∑

xn! px (1 − p)n−x x!(n − x)! x=0

and, because there is no contribution to the indicated sum when x = 0, this reduces to:

E(X)

n ∑

xn! px (1 − p)n−x x!(n − x)! x=1

=

n ∑

n! px (1 − p)n−x (x − 1)!(n − x)! x=1

=

n ∑

n(n − 1)! ppx−1 (1 − p)n−x (x − 1)!(n − x)! x=1

=

Now, first, by letting y = x − 1, we obtain from here: E(X) = np

n−1 ∑ y=0

(n − 1)! py (1 − p)n−y−1 y!(n − y − 1)!

and finally, by letting m = n − 1, we obtain: E(X)

= np

m ∑

m! py (1 − p)m−y y!(m − y)! y=0

= np

(8.1)

because the term under the sum is precisely the Binomial pdf, so that the indicated sum is identical to 1, hence the result. For the binomial random variable variance, we begin from the fact that: V ar(X) = E[(X − µ)2 ] = E(X 2 ) − µ2 = E(X 2 ) − n2 p2

(8.2)

In this case, we have that: E(X 2 ) = =

n ∑

x2 n! px (1 − p)n−x x!(n − x)! x=0 n ∑

xn(n − 1)! ppx−1 (1 − p)n−x (x − 1)!(n − x)! x=1

= np

m ∑ (y + 1)m! y p (1 − p)m−y y!(m − y)! y=0

(8.3)

where we have made use of the same variable changes employed in determining E(X) above.

4

CHAPTER 8. Observe that Eq (8.3) may now be expanded into the following two terms: 2

E(X ) = np

{[ m ∑

] [m ]} ∑ ym! m! y m−y y m−y p (1 − p) + p (1 − p) y!(m − y)! y!(m − y)! y=0 y=0

And now, from earlier results, we recognize the first term as the expected value of the binomial random variable, Y , with parameters m, p (i.e., it is equal to mp); the second sum is 1; so that the equation simplifies to give: E(X 2 ) = np(mp + 1) = np[(n − 1)p + 1] Upon substituting this into Eq 8.2, we obtain, upon further simplification, the desired result: V ar(X) = np(1 − p) 8.5 The characteristic function for the Bernoulli random variable, Xi , is: jt

φi (t) = [pe + (1 − p)] For the indicated random variable sum, therefore, the corresponding characteristic function, φX (t), will be given by: φX (t) =

n ∏

jt

jt

[pe + (1 − p)] = [pe + (1 − p)]n

i=1

which is precisely the characteristic function for the binomial random variable, establishing the required result. 8.6 The computed hypergeometric pdf, fH (x), and the binomial counterpart, fB (x), are shown in the table below and plotted in Fig 8.2, from where one is able to see the similarities. In the limit as n → ∞, the two pdfs will coincide provided Nd /N = p, as is the case here. x 0 1 2 3 4 5 6 7 8 9 10

fH (x) 0.016 0.135 0.348 0.348 0.135 0.016 0.000 0.000 0.000 0.000 0.000

fB (x) 0.056 0.188 0.282 0.250 0.146 0.058 0.016 0.003 0.000 0.000 0.000

5

0.4

Variable Hy pergeometric Binomial(10,0.25)

f(x)

0.3

0.2

0.1

0.0 0

2

4

6

8

10

X

Figure 8.2: The hypergeometric random variable pdf, with n = 10, Nd = 5, and N = 20 (solid line, circles), and the binomial pdf, with n = 10, p = 0.25 (dashed line, squares).

8.7 From the binomial pdf f (x) =

n! px (1 − p)n−x x!(n − x)!

we obtain that: f (x + 1)

n! px+1 (1 − p)n−x−1 (x + 1)!(n − x − 1)! n! = ppx (1 − p)−1 (1 − p)n−x (x + 1)x! (n−x)! (n−x) ( )( ) n−x p n! = px (1 − p)n−x x+1 1 − p x!(n − x)! =

from which we immediately obtain: ( )( ) n−x p f (x + 1) = f (x) x+1 1−p thereby establishing that: ρ(n, x, p) =

(

n−x x+1

)(

p 1−p

(8.4)

)

Now, because x is not a continuous variable, one cannot “differentiate” f (x) in order to determine x∗ , the value at which a maximum is achieved; however, one can use the finite difference, which, from the result above, is given as: )( ) ] [( p n−x − 1 f (x) f (x + 1) − f (x) = x+1 1−p

6

CHAPTER 8.

We observe from this expression that at the “turning” point, f (x∗ + 1) − f (x∗ ) will be zero, i.e., [( )( ) ] n−x p −1 =0 x+1 1−p which requires that: p(n − x) − (1 − p)(x + 1) =0 (1 − p)(x + 1) When solved for x, the result is: x∗ = (n + 1)p − 1

(8.5)

To confirm that this is a maximum, observe that (a) when x < x∗ , two immediate implications, from Eq (8.5), are that x + 1 < (n + 1)p and also that: (n − x) > (n − x∗ ) = (n + 1)(1 − p) or, alternatively: (n + 1)(1 − p) < (n − x) Since all the quantities involved are positive, these two results combine to yield: (x + 1)(1 − p) < p(n − x) so that, from Eq (8.4), f (x + 1) > f (x) (b) When x > x∗ , it is easy to see that the opposite is the case, with f (x) < f (x + 1). Therefore, x∗ as given in Eq (8.5) is a true maximum. Finally, note that because f (x∗ + 1) − f (x∗ ) must be zero at this maximum, this means that: f (x∗ + 1) = f (x∗ ) with the implication that if x∗ is an integer, (so that (x∗ + 1) is also an integer), then the pdf, f (x), achieves a maximum at both x∗ and x∗ + 1. (For example, for n = 4, and p = 0.2, x∗ = 0 and 1 are the two values at which the binomial pdf attains a maximum; similarly, for n = 7, and p = 0.5, the binomial pdf achieves its maximum at x∗ = 3 and 4.) 8.8 The required conditional pdfs are obtained from the trinomial pdf by definition as follows: f (x1 |x2 ) =

f (x1 , x2 ) f (x1 , x2 ) ; f (x2 |x1 ) = f2 (x2 ) f1 (x1 )

7 where f1 (x1 ) and f2 (x2 ) are marginal distributions of the random variables x1 and x2 respectively, and f (x1 , x2 ) is the joint pdf: f (x1 , x2 ) =

n! px1 px2 (1 − p1 − p2 )n−x1 −x2 x1 !x2 !(n − x1 − x2 )! 1 2

From the results presented in the text in Eqs (8.44) and (8.55), we know that the marginal distributions are given as follows (each being the pdf of a binomial random variable with parameters (n, p1 ) for X1 , and (n, p2 ) for X2 ): f1 (x1 )

=

f2 (x2 )

=

n! px1 (1 − p1 )n−x1 x1 !(n − x1 )! 1 n! px2 (1 − p2 )n−x2 x2 !(n − x2 )! 2

As such, the required conditional pdfs are obtained immediately as follows: f (x1 |x2 ) = f (x2 |x1 ) =

px1 1 (n − x2 )! (1 − p1 − p2 )n−x1 −x2 x1 !(n − x1 − x2 )! (1 − p2 )n−x2 (n − x1 )! px2 2 (1 − p1 − p2 )n−x1 −x2 x2 !(n − x1 − x2 )! (1 − p1 )n−x1

8.9 This two-dimensional random variable is clearly trinomial; the joint pdf in question is therefore given as: f (x1 , x2 ) =

n! 0.75x1 0.2x2 0.05n−x1 −x2 x1 !x2 !(n − x1 − x2 )!

valid for x1 = 0, 1, 2, and x2 = 0, 1, 2; it is zero otherwise. The desired joint pdf computed for specific values of the (x1 , x2 ) ordered pair is shown in the table below. The marginal pdfs, obtained from sums across appropriate rows and down appropriate columns of the joint pdf table are also shown. X1 → X2 ↓ 0 1 2 f1 (x1 )

0

f (x1 , x2 ) 1

2

0.0025 0.0200 0.0400 0.0625

0.075 0.300 0 0.375

0.5625 0 0 0.5625

f2 (x2 ) 0.64 0.32 0.04 1

From here, the required conditional pdfs are computed according to: f (x1 |x2 ) =

f (x1 , x2 ) f (x1 , x2 ) ; f (x2 |x1 ) = f2 (x2 ) f1 (x1 )

to yield the following results: first, for f (x1 |x2 ),

8

CHAPTER 8.

X1 → X2 ↓ 0 1 2

0

f (x1 |x2 ) 1

2

0.0039 0.0625 1.0000

0.1172 0.9375 0

0.8789 0 0

1.0000 1.0000 1.0000

and for f (x2 |x1 ), f (x2 |x1 ) 0 1

X1 → X2 ↓ 0 1 2

0.04 0.32 0.64 1.00

0.20 0.80 0 1.00

2 1.00 0 0 1.00

8.10 (i) To establish the first equivalence (between Eq (8.51) and Eq (8.52)) requires showing that ( ) ( ) x+k−1 x+k−1 = k−1 x Observe that, by definition, ( ) i i! = j!(i − j)! j so that: ( ) ( ) x+k−1 (x + k − 1)! (x + k − 1)! x+k−1 = = = k−1 (k − 1)!x! x!(k − 1)! x as required. Next, when α is an integer, we know from Eq (8.56) that Γ(α) = (α − 1)!; as a result, when written in terms of factorials, Eq (8.52) is: f (x) =

(x + k − 1)! k p (1 − p)x x!(k − 1)!

in terms of the Gamma function, this then becomes: f (x) =

Γ(x + k) k p (1 − p)x Γ(k)x!

as in Eq (8.53). (ii) If X is now defined as the total number of trials required to obtain exactly k successes, then to observe exactly k successes, the following events would have to

9 happen: (a) obtain (x−k) failures and (k −1) successes in the first (x−1) trials; and (b) obtain a success on the xth trial. As such, under these circumstances, ( ) ( ) x − 1 k−1 x−1 k P (X = x) = p (1 − p)x−k × p = p (1 − p)x−k k−1 k−1 is the appropriate probability model. A comparison of this pdf with that in Eq (8.51) shows that the variable x in Eq (8.51) has been replaced by (x − k) in this new equation. This makes perfect sense because, here, X is the total number of trials required to obtain exactly k successes, which includes both the failures and the k successes, whereas in Eq (8.51), X is defined as the number of failures (not trials) before the k th success, in which case, the total number of trials (required to obtain exactly k successes) is X + k. 8.11 From f (x), the pdf for the negative binomial random variable, we obtain that: ( ) f (x + 1) (x + k)!x! x+k = (1 − p) = (1 − p) f (x) (x + k − 1)!(x + 1)! x+1 Thus,

( ρ(k, x, p) =

From here,

[( f (x + 1) − f (x) =

x+k x+1

x+k x+1

)

)

(1 − p) ] (1 − p) − 1 f (x)

so that at the turning point, where f (x + 1) = f (x), we have: (x + k)(1 − p) − (x + 1) = 0 which, when solved for x, yields the result: x∗ =

(1 − p)k − 1 p

Again, note that if x∗ is an integer, then, by virtue of the fact that f (x∗ ) = f (x∗ + 1) at the maximum, the pdf attains a maximum at both x∗ and x∗ + 1. For example, with p = 0.5 and k = 3, f (x) is maximized when x∗ = 1 and x∗ = 2. For the geometric random variable, X is defined as the total number of trials (not failures) required to obtain the first success. Thus, the geometric pdf is obtained from the alternate definition of the negative binomial pdf with k = 1; i.e., from: (x − 1)! f (x) = pk (1 − p)x−k (k − 1)!(x − k)! Thus, in this particular case: ( ) f (x + 1) x = (1 − p) f (x) x−k+1

10

CHAPTER 8.

Specifically for the geometric random variable, with k = 1, we obtain: f (x + 1) = (1 − p) = q f (x) or f (x + 1) = qf (x)

(8.6)

And now, since q =1−p is always such that 0 < q < 1, then Eq (8.6) indicates that the pdf for the geometric random variable is monotonically decreasing, since each succeeding value of f (x) will be smaller than the immediately preceding one. 8.12 (i) For the geometric random variable with pdf: f (x) = pq x−1 the expected value is obtained as: E(X) = = =

∞ ∑

xpq x−1

x=1 ∞ ∑

p xq x q x=1

p q 1 = 2 q (1 − q) p

as required. The variance is obtained from: V ar(X) = E(X 2 ) − (E(X))2 E(X 2 ) is obtained as: E(X 2 ) = = =

∞ ∑

x2 pq x−1

x=1 ∞ ∑

p x2 q x q x=1

1+q p q(1 + q) = q (1 − q)3 p2

so that: V ar(X) =

1 q 1+q − 2 = 2 p2 p p

11 as required. (ii) From the given probabilities, f (2) = p(1 − p) = 0.0475, or f (10)p(1 − p)9 = 0.0315, we obtain the geometric random variable parameter, p, as: p = 0.05 The required probability, P (2 ≤ X ≤ 10), may be obtained several different ways. One way (which is a bit “pedestrian”) is to compute individual probabilities, f (2), f (3), . . . , f (10) and add them; alternatively, we could use the cumulative probability F (10) = P (X ≤ 10), and the fact that P (X ≤ 10) = f (0) + f (1) + P (2 ≤ X ≤ 10) From the computed value of p, we obtain (say from MINITAB, or any other such software package), that: F (10) = P (X ≤ 10) = 0.4013; f (0) = 0; f (1) = 0.05 so that the required probability is obtained as: P (2 ≤ X ≤ 10) = 0.401 − 0.050 = 0.351 (iii) The implications of the supplied information are as follows: E(X) =

1 = 200 ⇒ p = 0.005 p

with the required probability being P (X > 200). From here, we obtain this probability as: P (X > 200) = 1 − P (X ≤ 200) = 1 − 0.633 = 0.367 Thus, 36.7% of the polymer product is expected to have chains longer than 200 units. 8.13 For the given logarithmic series pdf to be a legitimate pdf, the following condition must hold: ∞ ∞ ∑ ∑ px f (x) = α =1 x x=1 x=1 which requires: α= where S(p) =

1 S(p) ∞ ∑ px x=1

x

To evaluate this infinite sum, we first differentiate once with respect to p to obtain: ∞ ∞ ∑ ∑ 1 dS = px−1 = py = dp 1 − p x=1 y=0

12

CHAPTER 8.

which, upon integration, yields the result: S = − ln(1 − p) Thus, the constant α must be given by: −1 ln(1 − p)

α=

as required. The expected value for this random variable is obtained as: E(X) =

∞ ∑

xf (x) = α

x=1

∞ ∑

px =

x=1

αp (1 − p)

as required. The variance is obtained from: V ar(X) = E(X 2 ) − (E(X))2 First, we obtain E(X 2 ) as: E(X 2 ) = α

∞ ∑

xpx =

x=1

so that: V ar(X) =

αp (1 − p)2

αp α2 p2 αp(1 − αp) − = 2 2 (1 − p) (1 − p) (1 − p)2

as required. By definition, the MGF is: tX

M (t) = E(e

∞ ∑ etx

)=α

x=1

x

px

If we represent the indicated sum by Se , i.e., Se =

∞ ∑ etx x=1

then,



x ∞

px

∑ dSe 1∑ x 1 = etx px−1 = r = dp p p x=1 x=1

(

r 1−r

where r = pet Thus, et et dp dSe = ⇒ dSe = t dp 1 − pe 1 − pet

)

13 so that upon integration, we obtain: Se = − ln(1 − pet ) Therefore, M (t) = αSe =

ln(1 − pet ) ln(1 − p)

as required. The characteristic function follows immediately from the arguments above, upon replacing et with ejt . 8.14 The general negative binomial pdf is: f (x) = When p=

(x + k − 1)! k p (1 − p)x x!(k − 1)!

k 1 ) ; ⇒p= ( k+λ 1 + λk

so that: 1−p=

λ k+λ

the general pdf given above becomes: f (x)

(x + k − 1)! λx ( x!(k − 1)!(k + λ)x 1 + λ )k k [ ] (x + k − 1)! λx = (k − 1)!(k + λ)x x! (1 + λ )k

=

(8.7)

k

There terms in the square brackets in Eq (8.7) above may be rewritten as follows: [ ] (x + k − 1)! [k + (x − 1)][k + (x − 2)] · · · k(k − 1)! = x (k − 1)!(k + λ) (k − 1)!(k + λ)x [k + (x − 1)][k + (x − 2)] · · · k = (k + λ)x where the numerator consists of exactly x terms. In the limit as k → ∞ therefore, this ratio tends to 1, so that, in Eq (8.7) above, lim f (x) =

k→∞

λx −λ e x!

which is the pdf for a Poisson random variable. 8.15 From the pdf for the Poisson random variable, f (x) =

λx −λ e x!

14

CHAPTER 8.

we obtain λx+1 −λ e (x + 1)!

f (x + 1) = so that:

λ f (x + 1) = f (x) x+1 Thus, for 0 < λ < 1, we observe immediately that f (x + 1) < f (x) always, so that under these conditions, the Poisson pdf is always monotonically decreasing. But when λ > 1, ( f (x + 1) − f (x) =

) λ − 1 f (x) x+1

so that, at the turning point, where f (x + 1) − f (x) = 0, λ=x+1 implying that the maximum is attained at the value, x∗ , given by x∗ = λ − 1 Observe that if λ is an integer, then x∗ will also be an integer; and by virtue of the fact that when the maximum is attained, f (x + 1) = f (x), the implication is that, under these conditions, the Poisson pdf will achieve a maximum at the two values: x∗ ∗

x +1

= λ−1 = λ

For example, for the Poisson random variable with λ = 3, this result states that the pdf achieves a maximum at x = 2 and x = 3. The computed values of the pdf are plotted for this specific case in Fig 8.3. Note the values of the pdf at these two values of x in relation to values taken by the pdf at other values of x. 8.16 (i) The complete pdf for the indicated binomial random variable, fB (x), and the corresponding pdf for the indicated Poisson random variable, fP (x), are both shown in the table below.

15

Poisson pdf: lambda=3 0.25

0.20

f(x)

0.15

0.10

0.05

0.00 0

2

4

6

8

10

12

14

x

Figure 8.3: Illustrating the maxima of the pdf for a Poisson random variable with λ = 3.

x 0 1 2 3 4 5 6 7 8 9 10

fB (x) X ∼ Bi(10, 0.05) 0.599 0.315 0.075 0.010 0.001 0.000 0.000 0.000 0.000 0.000 0.000

fP (x) X ∼ P(0.5) 0.607 0.303 0.076 0.013 0.002 0.000 0.000 0.000 0.000 0.000 0.000

A plot of these two pdfs is shown in Fig 8.4, where the two are seen to be virtually indistinguishable. (ii) When n = 20 and p = 0.5 for the binomial random variable (note the high probability of “success”), and λ = 10 for the Poisson random variable, the computed values for the resulting pdfs are shown in the table below.

16

CHAPTER 8.

Variable Binomial(10,0.05) Poisson(0.5)

0.6 0.5

f(x)

0.4 0.3 0.2

0.1 0.0 0

2

4

6

8

10

X

Figure 8.4: Comparison of the Bi(10,0.05) pdf with the Poisson(0.5) pdf. x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

fB (x) X ∼ Bi(20, 0.5) 0.000 0.000 0.000 0.001 0.005 0.015 0.037 0.074 0.120 0.160 0.176 0.160 0.120 0.074 0.037 0.015 0.005 0.001 0.000 0.000 0.000

fP (x) X ∼ P(10) 0.000 0.000 0.002 0.008 0.019 0.038 0.063 0.090 0.113 0.125 0.125 0.114 0.095 0.073 0.052 0.035 0.022 0.013 0.007 0.004 0.002

A plot of the two pdfs is shown in Fig 8.5, where we notice that, even though the two pdfs are somewhat similar, the differences between them are more obvious than was the case in part (i) above. The reason for this is that in (i), the prob-

17

0.20

Variable Binomial(20,0.5) Poisson(10)

f(x)

0.15

0.10

0.05

0.00 0

5

10 X

15

20

Figure 8.5: Comparison of the Bi(20,0.5) pdf with the Poisson(10) pdf. ability of “success” is much lower that in part (ii). The fundamental connection between these two random variables is such that the Poisson approximation of the binomial random variable is better for smaller probabilities of success. 8.17 From the characteristic function given in Section 8.7.3 for the Poisson random variable (see also Eq (6.116)), i.e., φXi (t) = eλi [(e

jt

−1)]

and from the results in Eq (6.54) in the main text, we know that φY (t), the cf of the random variable sum: n ∑ Y = Xi i=1

is obtained as: φY (t) =

n ∏

φXi (t) = e(λ1 +λ2 +···+λn )[(e

jt

−1)]



= eλ

[(ejt −1)]

i=1 ∗

where λ is the sum: λ∗ =

n ∑

λi

i=1

By comparison with the cf of the Poisson random variable, X, the expression shown above for φY (t) indicates that Y is also a Poisson random variable, with parameter λ∗ . The corresponding pdf is given by: ∗

e−λ (λ∗ )y ; y = 0, 1, 2, . . . fY (y) = y!

18

CHAPTER 8.

8.18 The required probabilities are obtained as follows: The event of “not experiencing a yarn break in a particular shift” corresponds to x = 0, and therefore, P (X = 0|λ = 3) = 0.05 is the required probability in this case. For the event of “experiencing more than 3 breaks per shift,” P (X > 3|λ = 3) = 1 − P (X ≤ 3) = 1 − 0.647 = 0.353 8.19 This problem involves a Poisson random variable with intensity η = 0.0002 per sq cm. In an area of size 1 sq m, (which is 104 sq cm), λ = 0.0002 × 104 = 2 so that the required probability is obtained as P (X > 2|λ = 2) = 1 − P (X ≤ 2) = 0.323 8.20 The required probabilities are shown in the table below for the given values of λ. λ 0.5 1 2 3

P (X ≤ 2|λ) 0.986 0.920 0.677 0.423

This shows the probability of observing 2 or fewer Poisson events monotonically decreasing as the mean number of occurrence increases. This makes perfect sense because as the mean number of occurrences increases, such conditions favor an increasingly higher number of occurrences, so that commensurately, the probability of observing two or fewer occurrences should decrease. 8.21 Obtaining a total number of y hatchlings from x eggs is akin to observing Y successes from X trials, i.e., Y is a binomial random variable with parameters X (this time, itself a random variable) and p. In the current context, this implies that, given x, the number of eggs, the conditional pdf for Y is: ( ) x y P (Y = y|X = x) = p (1 − p)x−y ; y = 0, 1, 2 . . . , x (8.8) y The total, unconditional pdf for Y is obtained by summing over all the possible values of X, (keep in mind: there can never be more hatchlings than eggs) i.e., P (Y = y) =

∞ ∑

P (Y = y|X = x)P (X = x)

x=y ∞ ( ∑

) ( x −λ ) x y λ e x−y p (1 − p) = y x! x=y ( ( ) ) ∞ ∑ y+k λy+k e−λ = py (1 − p)k y (y + k)! k=0

19 where we have introduced the variable change: x=y+k Observe that in physical terms, if x is the total number of eggs, and y is the number of successful hatchlings, then the newly introduced index, k, is the number of failed hatchlings. The equation above may now be expanded and simplified further to yield: P (Y = y) =

∞ ∑ (y + k)! k=0

y!k!

(pλ)y

λk e−λ (1 − p)k (y + k)!



=

(pλ)y −λ ∑ [(1 − p)λ]k e y! k! k=0

= =

(pλ)y −λ (1−p)λ e e y! e−pλ (pλ)y y!

as required.

Application Problems 8.22 (i) The problem involves a hypergeometric random variable, with N = 15, Nd = 4, and n = 2; the required probabilities are therefore obtained easily from the hypergeometric pdf as follows: (a) P (X = 2) = 0.057 (b) P (X = 0) = 0.524 (c) P (X = 1) = 0.419 (ii) If the problem had been misconstrued as involving a binomial random variable, with the proportion of irregular chips in the lot as the binomial probability of “success,” i.e., Nd 4 p= = = 0.267 N 15 then the required probabilities, obtained from the Bi(n, p) pdf with n = 2, will be: (a) f (2) = 0.071 compared to 0.057 above; (b) f (0) = 0.538 compared to 0.524; (c) f (1) = 0.391 compared to 0.419 8.23 This problem involves a trinomial random variable, with x1 as the total number of pumps working continuously for fewer than 2 years; x2 as the total number of pumps working continuously for 2 to 5 years; and x3 as the the total

20

CHAPTER 8.

number of pumps working for more than 5 years. As specified by the problem, the probabilities of each of these events are, respectively, p1 = 0.3; p2 = 0.5; p3 = 0.2. The appropriate joint pdf is therefore given by: f (x1 , x2 ) =

n! px1 px2 pn−x1 −x2 x1 !x2 !(n − x1 − x2 )! 1 2 3

For this specific problem, n = 8, and x1 = 2, x2 = 5, x3 = 1, we therefore obtain the required probability as: f (2, 5) = 0.095 8.24 (i) The problem statement suggests that 50 2 = N 10 so that N = 250 is a reasonable estimate of the tiger population. The two most important sources of error are: • The low sample size: a sample of 10 is too small for one to be confident that its composition will be representative of the entire population’s composition; • It is also possible that the tagged tigers have not been completely “mixed” in uniformly with the population. Any segregation will lead to biased sampling one way or another (either too few, or too many “tagged” tigers in the sample) and there will be no way of knowing which is which. (ii) The applicable pdf in this case is: f (x|n, p) =

n! px (1 − p)n−x x!(n − x)!

(8.9)

from which we obtain the following table. p 0.1 0.2 0.3

f (x = 2|10, p) 0.194 0.302 0.234

The indication is that of the three postulated values for p, p = 0.2 yields the highest probability of obtaining X = 2 tagged tigers from a sample of 10. On this basis alone, one would then consider that of the three postulated values of p, p = 0.2 seems most “likely” to represent the data. (iii) In general, the pdf of interest is as shown in Eq (8.9) above, with x = 2, n = 10. The optimum of this function can be determined via the usual calculus route as follows: df = 2Cp(1 − p)8 − 8Cp2 (1 − p)7 = 0 dp

21 (where C, a constant, represents the factorials). This expression simplifies immediately to 2Cp(1 − p)7 [(1 − p) − 4p] = 0 so that provided that (1 − p) ̸= 0, and p ̸= 0, this is solved to yield p = 0.2 confirming that indeed p = 0.2 provides an optimum for f . A second derivative evaluated at this value confirms that the optimum is indeed a maximum. 8.25 (i) From the supplied data, it is straightforward to obtain the following empirical frequency distribution:

is

x

Frequency

0 1 2 3 4 5 6

4 8 8 6 3 1 0

Relative frequency fE (x) 0.133 0.267 0.267 0.200 0.100 0.033 0.000

TOTAL

30

1.000

The corresponding histogram is shown in Fig 8.6. The expected value for x ∑ i xi fE (xi ) = 1.967.

(ii) Finding contaminant particles on a silicon wafer is a rare event, and the actual data shows that when such particles are found, they are few in number. This suggests that the underlying phenomenon is Poisson; the postulated model is therefore: f (x) =

e−λ λx x!

Using λ = 2 (the result in (i) above rounded up to the nearest integer), we generate the following theoretical f (x), shown along with the empirical probability distribution in the following table:

22

CHAPTER 8.

Histogram of Number of Flaws on Wafer 9 8 7

Frequency

6 5 4 3 2 1 0 0

1

2 3 Number of Flaws on Wafer

4

5

Figure 8.6: Histogram of silicon wafer flaws.

x 0 1 2 3 4 5 6 7 8

f (x|λ = 2) 0.135 0.271 0.271 0.180 0.090 0.036 0.012 0.003 0.001

fE (x) 0.133 0.266 0.266 0.200 0.100 0.033 0.000 0.000 0.000

TOTAL

0.999

1.000

The theoretical distribution is seen to agree remarkably well with the empirical distribution. (iii) From the theoretical distribution, we obtain that P (X > 2|λ = 2) = 1 − P (X ≤ 2|λ = 2) = 1 − 0.677 = 0.323 so that, on average, about 32.3% of the wafers will contain more than 2 flaws. (The data set shows 1/3 of the sample wafers with 3 or more flaws). Thus, according to the stipulated criterion, this particular process is no longer economically viable. 8.26 The frequency table and histogram generated from the supplied data are shown below:

23 x Pumps 4 5 6 7 8 9 10

Frequency 1 2 1 7 8 9 2

Relative frequency fE (x) 0.009 0.037 0.110 0.224 0.298 0.235 0.083

TOTAL

30

0.996

Histogram of Available Pumps 9 8

Frequency

7 6 5 4 3 2 1 0 4

5

6

7 8 Available Pumps

9

10

Figure 8.7: Histogram of available pumps. The appropriate probability model is binomial, with n = 10 as the number of “trials,” and p as the probability that any particular pump functions properly. From the data, the average number of available pumps (those functioning properly on any particular day) is 7.8 out of a total of 10. This is obtained in the usual fashion by summing up the total number of available pumps each day (234) and dividing by the total number of days (30). From the expression for the mean of a Bi(n, p) random variable, the implication is that np = 7.8 and since n = 10, we obtain: p = 0.78 as an estimate of the binomial random variable probability of “success” in this case, the probability that any particular pump functions properly on any par-

24

CHAPTER 8.

ticular day. From here, we are able to compute the theoretical pdf; this pdf is shown in the table below along with the corresponding empirical data. x Pumps 4 5 6 7 8 9 10

Theoretical pdf f (x|p = 0.78) 0.033 0.067 0.033 0.233 0.267 0.300 0.067

Relative frequency fE (x) 0.009 0.037 0.110 0.224 0.298 0.235 0.083

TOTAL

1.000

0.996

The theoretical pdf (with p = 0.78) and the empirical frequency distribution are compared graphically in Fig 8.8 below where the two are seen to be reasonably close. The binomial model therefore appears to be adequate.

0.30

Variable Empirical pdf Theoretical (p=0.78)

0.25

f(x)

0.20 0.15 0.10 0.05 0.00 4

5

6 7 8 9 X (number of available pumps)

10

Figure 8.8: Empirical distribution of available pumps (solid line; circles) and theoretical binomial pdf with n = 10; p = 0.78 (dashed line; squares).

8.27 (i) Let X represent the total number of pumps functioning at any particular point in time. Then the problem translates to determining the probability that x ≥ 4 for a binomial random variable, where n = 8, and p = 1 − 0.16 = 0.84 is the probability that the selected pump will function (since the probability that the pump will fail is given as 0.16). The required probability is P (X ≥ 4); and since X is discrete, this probability is obtained as: P (X ≥ 4) = 1 − P (X ≤ 3) = 1 − 0.004 = 0.996

25 (ii) In this case, first, we need to determine P (X ≤ 5), which is obtained as 0.123. This indicates that on average, the alarm will go off approximately 12.3 percent of the time. If a “unit of time” is assumed to be a day, then in a period of 30 days, one would expect the alarm to go off 12.3 × 30 = 3.69, approximately 4, times. (Any other reasonable assumption about the “unit of time” is acceptable.) (iii) Currently, with the probability of failure as 0.16—so that the probability of functioning is 0.84—the probability that four or more pumps will fail (which, with n = 8 as the total number of pumps, is equivalent to the probability that 4 or fewer pumps will function) is obtained as: P (X ≤ 4|p = 0.84) = 0.027 If the probability of failure increases to 0.2 so that the probability of functioning decreases to 0.8, we obtain that: P (X ≤ 4|p = 0.8) = 0.056 and the percentage increase, ∆%, is obtained as ∆% =

0.056 − 0.027 = 107.4% 0.027

a surprisingly large percentage increase, given that the increase in the probability of failure (from 0.16 to 0.2) appears to be relatively innocuous. (This portion of the problem could also have been approached from the perspective of pump failures, say with Y as the total number of failed pumps: the results will be the same.) 8.28 (i) The mean number of accidents is obtained as ∑ xi fi x ¯ = ∑i = 0.465 i fi where xi is the number of accidents, and fi , the corresponding observed frequency with which xi accidents occurred. The variance is obtained as: σ 2 = 0.691 For a true Poisson process, the mean and the variance are theoretically equal, and will be close in practice. This is clearly not the case here; in fact, the variance is much larger than the mean, implying that this is an “overdispersed” Poisson-like phenomenon. (ii) Using λ = 0.465, the theoretical Poisson pdf is shown in the following table, and upon multiplying by 647, the total number of subjects, we obtain the indicated predicted frequency. The table also includes the observed frequency for comparison.

26

CHAPTER 8. x, Number of Accidents 0 1 2 3 4 5+

Poisson pdf f (x|λ = 0.465) 0.628 0.292 0.068 0.011 0.001 0.000114

Predicted Frequency 406.403 188.978 43.937 6.810 0.792 0.074

Observed Frequency 447 132 42 21 3 2

From here, other than the values obtained for x = 2, the disagreement between the corresponding predicted and observed frequencies is quite notable. (iii) The relationships between the mean and variance of a negative binomial random variable and the parameters k and p, are: µ = σ2

=

kq p kq p2

from where we obtain the inverse relationships: p=

µ µp ; k= σ2 q

In this specific case, with µ = 0.465 and σ 2 = 0.691, we obtain the following estimates for the corresponding negative binomial pdf parameters: p = 0.673; k = 0.96 (rounded up to k = 1) The resulting pdf for the negative binomial random variable with these parameters, and the corresponding predicted frequency, are shown in the table below along with the observed frequency for comparison. x, Number of Accidents 0 1 2 3 4 5+

Neg Binomial(1,0.673) pdf, f (x) 0.673 0.220 0.072 0.024 0.008 0.003

Predicted Frequency 435.431 142.386 46.560 15.225 4.979 1.628

Observed Frequency 447 132 42 21 3 2

The agreement between the frequency predicted by the negative binomial model and the observed frequency is seen to be quite good—much better than the Poisson model prediction obtained earlier. Fig 8.9 shows a plot of the observed frequency (solid line, circles) compared with the Poisson model prediction (short dashed line, diamonds), and the negative binomial model prediction (long

27 dashed line, squares). This figure provides visual evidence that the negative binomial model prediction agrees much better with the observation than does the Poisson model prediction.

500 Variable ObsFreq PredFreq(NBi) PredFreq(Poisson)

Frequency

400

300

200

100

0 0

1

2 3 X, Number of Accidents

4

5

Figure 8.9: Frequencies of occurrence of accidents: Greenwood and Yule data (solid line; circles), negative binomial model prediction with k = 1; p = 0.673 (long dashed line; squares), and Poisson model prediction with λ = 0.465 (short dashed line; diamonds). (iii) The objective measure of the “goodness-of-fit,” C 2 , defined in Eq (8.100), is computed as follows for each model. First, for the Poisson model: CP2

=

=

(447 − 406.403)2 (132 − 188.978)2 (42 − 43.937)2 + + + 406.403 188.978 43.937 2 2 (21 − 6.810) (5 − 0.866) + 6.810 0.866 70.619

and, for the negative binomial model: 2 CN B

(447 − 435.431)2 (132 − 142.386)2 (42 − 46.560)2 + + + 435.431 142.386 46.560 (5 − 6.604)2 (21 − 15.225)2 + 15.225 6.604 = 4.092 =

We may now observe that the value of the “goodness-of-fit” quantity is much smaller for the negative binomial model than for the Poisson model. From the definition of this quantity, the better the fit, the smaller the value of C 2 . Hence we conclude that the negative binomial model provides a much better fit to the

28

CHAPTER 8.

Greenwood and Yule data than does the Poisson model. 8.29 (i) Let X1 represent the number of children with sickle-cell anemia (SCA), and X2 , the number of children that are carriers of the disease. If n is the total number of children born to the couple, then, the joint probability distribution of the bivariate random variable in question, (X1 , X2 ), is the following trinomial distribution: f (x1 , x2 ) =

n! px1 px2 (1 − p1 − p2 )n−x1 −x2 x1 !x2 !(n − x1 − x2 )! 1 2

In this specific case, n = 4, p1 = 0.25, and p2 = 0.5. (ii) The required probabilities are obtained by substituting appropriate values for x1 and x2 into the joint pdf above; the results are as follows: (a) x1 = 0; x2 = 2; f (x1 , x2 ) = 0.094 (b) x1 = 1; x2 = 2; f (x1 , x2 ) = 0.188 (c) x1 = 2; x2 = 2; f (x1 , x2 ) = 0.094 (iii) In this case, the pdf required for computing the probabilities of interest is the conditional probability, f (x1 |x2 = 1). In general, for the trinomial random variable, the conditional pdf, f (x1 |x2 ), is: f (x1 |x2 ) =

f (x1 , x2 ) (n − x2 )! px1 1 = (1 − p1 − p2 )n−x1 −x2 f2 (x2 ) x1 !(n − x1 − x2 )! (1 − p2 )n−x2

(see Problem 8.8); and in this specific case, with x2 = 1, we obtain: f (x1 |x2 = 1) =

(n − 1)! px1 1 (1 − p1 − p2 )n−x1 −1 x1 !(n − x1 − 1)! (1 − p2 )n−1

which, with n = 4, simplifies to: f (x1 |x2 = 1) =

3! 0.25x1 3!0.53 3−x1 (0.25) = x1 !(3 − x1 )! 0.53 x1 !(3 − x1 )!

The required probabilities are computed from here to yield the following results: (a) x1 = 0; f (x1 = 0|x2 = 1) = 0.125 (b) x1 = 1; f (x1 = 1|x2 = 1) = 0.375 (c) x1 = 2; f (x1 = 2|x2 = 1) = 0.375 (d) x1 = 3; f (x1 = 3|x2 = 1) = 0.125 Note that these four probabilities sum up to 1 since they constitute the entire collection of all possible outcomes once x2 is fixed at 1. 8.30 (i) With X1 as the number of children with the disease, what is required is E(X1 ). Because the marginal distribution of X1 is binomial, Bi(n, p), we know in this case that E(X1 ) = np = 8 × 0.25 = 2. The implication is that the couple can expect to have 2 children with the disease, so that the annual expected cost

29 will the equivalent of US$4000. (ii) Let Z represent the number of crisis episodes that this family will endure per year. This random variable is a composite variable consisting of two parts: first, to experience an episode, the family must have children with the sickle-cell anemia (SCA) disease. From Problem 8.29 above, X1 , the number of children with the disease, is a random variable with a Bi(n = 8, p1 = 0.25) distribution. Secondly, any child with the disease will experience Y “crisis” episodes per year, itself another random variable with a Poisson, P(λ = 1.5) distribution. Thus, Z is a compound random variable, which, as derived in Problem 8.29 above, possesses the distribution: e−p1 λ (p1 λ)z z! In this specific case with p1 = 0.25 and λ = 1.5, the appropriate pdf is: f (z) =

f (z) =

e0.375 (0.375)z z!

from where we obtain f (3) = 0.006 Thus, the probability is 0.006 that this family will endure 3 crisis episodes in one year. 8.31 (i) The phenomenon in question is akin to the situation where one experiences X “failures” (i.e., failure to identify infected patients) before the k th “success” (i.e., successfully identifying only k out of a total of X + k infected patients), with the probability of “success” being 1/3. The appropriate probability model is therefore a negative binomial, N Bi(k, p) distribution, with k = 5 and p = 1/3. The pdf is given by: f (x) =

(x + k − 1)! k p (1 − p)x x!(k − 1)!

From this general expression, one may find the desired maximum, x∗ , analytically, or else numerically—by computing the probabilities for various values of x, and then identifying the value of x for which the computed probability is largest. We consider the analytical route first. From the pdf given above, we observe that: (x + k)! f (x + 1) = pk (1 − p)x+1 (x + 1)!(k − 1)! so that, after some simplification, ( ) x+k f (x + 1) = (1 − p) f (x) x+1 At the maximum—the turning point of this discrete function—the condition, f (x + 1) = f (x), must hold; i.e., in this case: (x + k)(1 − p) = (x + 1)

30

CHAPTER 8.

which, when solved for x, yields: x∗ =

kq − 1 ; (q = 1 − p) p

Of course, if x∗ is an integer, f (x) will also show a maximum at (x∗ + 1) by virtue of the condition for the optimum whereby f (x + 1) = f (x). For this specific problem, with k = 1 and p = 1/3, we obtain that the maximum occurs at 10 −1 x∗ = 3 1 =7 3 ∗

and because this is an integer, x = 8 is also a valid maximum point. Numerically, the probability distribution function may be computed for a negative binomial random variable with k = 1 and p = 1/3 for various values of x; the result is shown in the table below and plotted in Fig 8.10. Observe that the values at which the maximum probability occurs are confirmed indeed to be x = 7 and x = 8, as determined analytically earlier. As such, the “most likely” number of infected but not yet symptomatic patients is 8 (choosing the larger value). The implication is that with 5 already identified, the total population of infected patients is most likely to be 13. x 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

f (x) 0.0041 0.0137 0.0274 0.0427 0.0569 0.0683 0.0759 0.0795 0.0795 0.0765 0.0714 0.0649 0.0577 0.0503 0.0431 0.0364

(ii) With the random variable, X, defined as the number of patients that are infected but not yet identified, the total number of infected patients will be X + k; thus, the required probability is P (X + k > 15) which translates to P (X > 10) and determined as: P (X > 10) = 1 − P (X ≤ 10) = 1 − 0.596 = 0.404

31

0.08 0.07 0.06

f(x)

0.05 0.04 0.03 0.02 0.01 0.00 0

2

4

6

8

10

12

14

16

x

Figure 8.10: Probability distribution function (pdf) for the negative binomial random variable with k = 5, n = 1/3; it is maximized at x = 7 and x = 8. This shows that there is a fairly sizable probability of 0.4 that, with 5 patients already identified as infected, the town may have to declare a state of emergency. 8.32 (i) The appropriate model is the Poisson pdf, with λ = 8.5; the required probability is obtained as: P (X = 10|λ = 8.5) = 0.11 (ii) From the results in Exercise 8.15, we obtain: x∗ = λ − 1 = 7.5 which is not an integer, hence, it is a unique maximum. The “most likely” number of failures is therefore 7.5 per year. The required probability (of having more failures in one year than this “most likely” number of failures) is determined as: P (X ≥ 7.5) = 1 − P (X ≤ 7.5) = 1 − P (X ≤ 7) = 1 − 0.386 = 0.614 (Note: because X can only take integer values, P (X ≤ 7.5) = P (X ≤ 7).) (iii) The required probability, P (X ≥ 13), is obtained as: P (X ≥ 13) = 1 − P (X ≤ 12) = 1 − 0.909 = 0.091 The implication is that, if conditions are “typical,” there is a fairly small chance (just about 9%) that one would see 13 or more failures in one year. If this event then occurs, there is therefore a reasonable chance that things may no longer be “typical,” that something more fundamental may be responsible for causing

32

CHAPTER 8.

such an “unusually” large number of failures in one year. 8.33 (i) Assuming that the indicated frequencies of occurrence for each accident category is representative of event probabilities, the phenomenon in question is multinomial, with the pdf: f (x1 , x2 , x3 , x4 ) =

n! px1 px2 px3 px4 x1 !x2 !x3 !x4 ! 1 2 3 4

where x1 is the number of eye injuries, with p1 = 0.40 as the probability of recording a single eye injury; x2 is the number of hand injuries, with p2 = 0.22 as the probability of recording a single hand injury; x3 is the number of back injuries, with p3 = 0.20 as the probability of recording a single back injury; and finally, x4 is the number of “other” injuries, with p4 = 0.18 as the probability of recording one of the injuries collectively categorized as “other.” With n = 10, (i.e., a total of 10 recorded injuries selected at random, distributed as indicated) the required probability, P (x1 = 4; x2 = 3; x3 = 2; x4 = 1) is obtained as: f (4, 3, 2, 1) =

10! 0.44 0.223 0.202 0.181 = 0.0011 4!3!2!1!

This is the probability that the 10 recorded injuries selected at random are distributed as noted in the problem statement. The probability appears to be very small mostly because there are so many different ways in which 10 injuries can be distributed among the 4 categories. (ii) Since we are concerned here with eye injuries alone, regardless of the other injuries, we recall that the marginal distribution of each component variable, Xi , of a multinomial random variable is a binomial pdf, i.e., in this case, X1 ∼ Bi(n, p1 ). Thus the required probability is obtained from the binomial Bi(5, 0.4) pdf as P (X < 2) = P (X ≤ 1) = 0.337 (iii) Since, once again as in (ii), n = 5, we now require a value for p1 such that: P (X < 2) ≈ 0.9; i.e., P (X < 2) = P (X ≤ 1) = f (0) + f (1) = (1 − p1 )5 + 5P1 (1 − p1 )4 ≈ 0.9 Using MINITAB (or any such program), we determine that for p = 0.11, this cumulative probability is obtained as 0.903. Thus, the target probability to aim for is a reduction of p1 from 0.4 to 0.11. 8.34 (i) The phenomenon of attempting 25 missions before the first accident occurs is akin to attempting x = 25 “trials” before obtaining the first “success” in a process where each trial has two mutually exclusive (i.e., binary) outcomes. This, of course, is the phenomenon underlying the geometric random variable, with the pdf: f (x) = pq x−1

33 In this case, x is the number of missions undertaken prior to the occurrence of the first accident, and p is the probability of the indicated catastrophic accident occurring. Thus, for x = 25 and p = 1/35 = 0.02857, we obtain the required probability as: f (25) = 0.014 (ii) In this case, the required probability is P (X ≤ 25) for the geometric random variable, X; the result is: P (X ≤ 25) = 0.516 with the very illuminating implication that if a catastrophic event such as that experienced on Jan 28, 1986 were to occur, there is more than a 50% chance that it would occur on or before the 25th mission attempt. (iii) If p = 1/60, 000, the probability P (X ≤ 25) becomes: P (X ≤ 25) = 0.0004167 (we deliberately retained so many significant figures to make a point). The indication is that such an event is extremely unlikely. In light of the historical fact that this catastrophic event did in fact occur on the 25th mission attempt, it appears as if the independent NASA study grossly underestimated the value p; the Air Force estimate, on the other hand, definitely appears to be much more representative of the actual value. 8.35 (i) The required probability is obtained from the Poisson distribution, P(λ = 0.75) as: P (X ≤ 2|λ = 0.75) = 0.959 Note: this implies that the probability that there will be 3 or more claims per year on the car in question is 1 − 0.959 = 0.041. (ii) The problem requires determining xu such that: P (X ≥ xu ) ≤ 0.05 In terms of cumulative probabilities, this is equivalent to requiring: 1 − P (X < xu ) ≤ 0.05; or P (X < xu ) ≥ 0.95 The following cumulative probabilities for the Poisson random variable with λ = 0.75 can be obtained from computer packages such as MINITAB: P (X < 4) = P (X ≤ 3) = 0.993 P (X < 3) = P (X ≤ 2) = 0.959 P (X < 2) = P (X ≤ 1) = 0.827 Observe that the smallest value of xu for which P (X < xu ) exceeds 0.95 is 3, so that the desired value of xu is 3. Hence, any car with claims totalling 3 or more in one year will be declared to be of “poor initial quality.”

34

CHAPTER 8.

8.36 By definition, the average is obtained as: ∑ xi Φ(xi ) 3306 x ¯ = ∑i = = 6.599 501 i Φ(xi ) From the result in Exercise 8.13, we know that for this random variable: µ = E(X) = αp/(1 − p); where α =

−1 ln(1 − p)

Thus, given x ¯ = 6.599 as an estimate of µ, we must now solve the following nonlinear equation numerically for p: 6.599 =

−p (1 − p) ln(1 − p)

The result is: p = 0.953 Upon introducing this value into the logarithmic series pdf, f (x) = resulting predicted frequency, obtained as:

αpx x ,

the

ˆ Φ(x) = 501f (x) is shown in the following table, along with the observed frequency, with both frequencies plotted in Fig 8.11. From this table and the plots, the model appears sufficiently adequate.

160

Variable Observ ed Predicted

140 120

Frequency

100 80 60 40 20 0 0

5

10

15

20

25

X

Figure 8.11: Empirical frequency distribution of the Malaya butterfly data (solid line, circles) versus theoretical logarithmic series model, with p = 0.953 (dashed line, squares).

35 No of species x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Observed Frequency Φ(x) 118 74 44 24 29 22 20 19 20 15 12 14 6 12 6 9 9 6 10 10 11 5 3 3

Predicted Frequency ˆ Φ(x) 156.152 74.407 47.273 33.788 25.760 20.458 16.711 13.935 11.805 10.125 8.772 7.663 6.741 5.965 5.306 4.740 4.252 3.827 3.455 3.128 2.839 2.583 2.354 2.150

Chapter 9

Exercises Section 9.1 9.1 (i) The graphs for the discrete geometric G(0.25) and the continuous exponential E(4) distributions are shown in Fig 9.1, where we see that one is a discrete version of the other. The graphs for the additional pairs of distributions: G(0.8) and E(1.25); and G(0.5) and E(2) are shown, respectively, in Figs 9.2 and 9.3, both showing the same characteristics as shown in Fig 9.1. Distribution Plot Distribution p Geometric 0.25

0.25

Distribution Scale Thresh Exponential 4 0

f(x)

0.20

0.15

0.10

0.05

0.00 0

10

20

30 X

40

50

60

X = total number of trials.

Figure 9.1: Comparison of the discrete discrete geometric G(0.25) pdf (shaded bars) and the continuous exponential E(4) pdf (solid line). (ii) The pdf of the geometric random variable G(p) is f (x) = pq x−1 so that f (x + 1) = pq x 1

2

CHAPTER 9.

Distribution Plot 0.9

Distribution p Geometric 0.8

0.8

Distribution Scale Thresh Exponential 1.25 0

0.7

Density

0.6 0.5 0.4 0.3 0.2 0.1 0.0 0

5

10 X

15

20

X = total number of trials.

Figure 9.2: Comparison of the discrete discrete geometric G(0.8) pdf (shaded bars) and the continuous exponential E(1.25) pdf (solid line).

Distribution Plot Distribution p Geometric 0.5

0.5

Distribution Scale Thresh Exponential 2 0

Density

0.4

0.3

0.2

0.1

0.0 0

5

10

15 X

20

25

30

X = total number of trials.

Figure 9.3: Comparison of the discrete discrete geometric G(0.5) pdf (shaded bars) and the continuous exponential E(2) pdf (solid line).

3 from where we see that:

f (x + 1) =q =1−p f (x)

which rearranges to give: f (x + 1) − f (x) = −p f (x)

(9.1)

as required, clearly a finite difference discretization of the expression, df (x) = −p f (x) Integrating both sides of this differential expression with respect to x yields: ln f (x) = −px + C0 from which we obtain:

f (x) = C1 e−px

(9.2)

where the constant C1 = eC0 . Observe now that first, for the expression in Eq (9.2)∫ to be a valid pdf, x must be constrained to the set 0 < x < ∞ (otherwise x f (x)dx will not be finite). We may then evaluate the indicated constant, C1 , from the requirement that for a valid pdf, ∫ ∞ ∫ ∞ f (x)dx = C1 e−px dx = 1 0

0

to yield

{ C1

∞ } −1 −px e =1 p 0

from which we obtain: C1 = p Thus, the resulting pdf is

f (x) = pe−px

(9.3)

which, when compared with Eq (9.10) in the text, is seen to be the pdf for an exponential random variable with parameter β = 1/p, hence establishing the result that the pdf of a geometric random variable G(p) is indeed the discretized version of the pdf of the exponential random variable, E(1/p). 9.2 By definition, the median xm of the exponential random variable, E(β), is obtained from ∫ ∞ ∫ xm 1 −x/β 1 −x/β e dx = e dx β β xm 0 Upon carrying out the indicated integrals, the result is: xm ∞ −e−x/β = −e−x/β 0

xm

4

CHAPTER 9.

which simplifies to yield: (1 − e−xm /β ) = e−xm /β or, 1 = e−xm /β 2 and finally, upon taking natural logs and rearranging, we obtain: xm = β ln 2 as required. By definition, the hazard function is: h(t) =

f (t) 1 − F (t)

with f (t) as the pdf of the exponential random variable, T , and F (t) is the cdf, defined by ∫ t ∫ t 1 −x/β F (t) = f (x)dx = e dx β 0 0 or, upon carrying out the indicated integral, F (t) = 1 − e−t/β Hence, h(t) =

1−

1 −t/β βe (1 − e−t/β )

=

1 =η β

as required. 9.3 The pdfs for the two independent random variables, X1 and X2 , are: f (x1 ) = ηe−ηx1 ; f (x2 ) = ηe−ηx2 so that by their independence, the joint pdf, f (x1 , x2 ), is obtained as: f (x1 , x2 ) = f (x1 )f (x2 ) = η 2 e−η(x1 +x2 ) ; 0 < x1 , x2 < ∞

(9.4)

Now, the given transformation, Y = X1 − X2

(9.5)

is non-square; it may be “squared” by referring to it as Y1 , and introducing any other arbitrary second transformation as Y2 , for example, Y1

= X1 − X2 ; −∞ < y1 < ∞

(9.6)

Y2

= X1 ;

(9.7)

5 The inverse of this “squared” transformation is obtained easily as: x1

= y2

x2

= y2 − y1

at which point we must now exercise some care. This is because, according Eq (9.6), it is possible for y1 to be both positive and negative. Thus, according to the second half of this inverse transformation, the original condition 0 < x2 < ∞ will hold only if the condition 0 < y2 − y1 < ∞ holds, which implies that y1 < y2 < ∞ is the only valid domain for y2 as defined here. The Jacobian of the inverse transformation is 0 1 =1 J = −1 1 from where we now obtain the joint pdf as: f (y1 , y2 ) = η 2 e−η(2y2 −y1 ) ; −∞ < y1 < ∞

(9.8)

y1 < y2 < ∞ The required pdf, f (y1 ), may now be obtained from Eq (9.8) by integrating out the extraneous y2 over the appropriate domain, i.e., ∫ ∞ ∫ ∞ 2 −η(2y2 −y1 ) 2 ηy1 f (y1 ) = η e dy2 = η e e−2ηy2 dy2 y1

y1

And now, when y1 > 0, the integral is evaluated to yield: { ∞ } 1 −2ηy2 2 ηy1 f (y1 ) = η e − e 2η y1 which simplifies to:

η −ηy1 e ; y1 > 0 2 When y1 < 0, the integral simplifies to: f (y1 ) =

f (y1 ) =

η ηy1 e ; y1 < 0 2

(9.9)

(9.10)

The two may be consolidated by using the absolute value function defined as: { y; y > 0 |y| = (9.11) −y; y ≤ 0 to give f (y) = as required.

η −η|y| e ; −∞ < y < ∞ 2

(9.12)

6

CHAPTER 9.

9.4 Directly from the pdf in Eq (9.12) and the formal definitions of the expected value, we obtain ∫ ∞ ∫ η ∞ −η|y| E(Y ) = yf (y)dy = ye dy 2 −∞ −∞ which may be rewritten as: ∫ ∫ η 0 η ∞ −ηy E(Y ) = ye(−η)(−y) dy + ye dy 2 −∞ 2 0 Now, in general,

∫ xeax =

(9.13)

eax (ax − 1) a2

so that Eq (9.13) reduces to: { 0 } ∞ } { η e−ηy η eηy + E(Y ) = (ηy − 1) (−ηy − 1) 2 2 2 η 2 η −∞ 0 = −

1 1 + =0 2η 2η

Similarly, because V ar(Y ) = E[(Y − µy )2 ] where µy = 0 as obtained above, in this case, we obtain that, ∫ ∞ ∫ η ∞ 2 −η|y| V ar(Y ) = E(Y 2 ) = y 2 f (y)dy = y e dy 2 −∞ −∞ And, again, as in Eq (9.13), we may split the integral into two to obtain: ∫ ∫ η 0 2 (−η)(−y) η ∞ 2 −ηy V ar(Y ) = y e dy + y e dy 2 −∞ 2 0 Upon carrying out the indicated integration (by parts, twice), and tidying up, one obtains 1 1 2 V ar(Y ) = 2 + 2 = 2 η η η Alternatively, directly from Eq (9.5), by independence, E(Y ) = E(X1 ) − E(X2 ) and since, for the exponential random variable, E(X1 ) = 1/η = E(X2 ), we obtain immediately that: E(Y ) = 0 as obtained earlier. Similarly, V ar(Y ) = V ar(X1 ) + V ar(X2 )

7 and, since for the exponential random variable, V ar(X1 ) = 1/η 2 = V ar(X2 ), we obtain immediately that: V ar(Y ) =

2 η2

again as obtained earlier. 9.5 For X ∼ E(1), an exponentially distributed random variable with parameter 2 β = 1, µX = 1 and σX = 1, so that σX = 1 also. Hence, (i) P (X − µX ≥ 3σX ) = P (X − 1 ≥ 3) or P (X ≥ 4). This probability can be obtained by direct integration of the pdf, i.e., ∫ ∞ P (X ≥ 4) = e−x dx = 0 + e−4 = 0.0183 4

(ii) P (µX − 2σX < X < µX + 2σX ) = P (−1 < X < 3) = P (0 < X < 3) since X is a nonnegative random variable. This probability may also be obtained by direct integration of the pdf to yield ∫ 3 P (0 < X < 3) = e−x dx = e0 − e−3 = 0.95 0

9.6 (i) To establish the result in Eq (9.17) in the text, we begin by defining the integral ∫ ∞ I(k) = e−z z k−1 dz (9.14) a

Then, via integration by parts, we obtain: I(k) = =

∫ ∞ ∞ −z k−1 e−z a + (k − 1) e−z z k−2 dz ∫ ∞ a k−1 −a a e + (k − 1) e−z z k−2 dz a

which, from Eq (9.14), simplifies to I(k) = ak−1 e−a + (k − 1)I(k − 1)

(9.15)

From here, we obtain that I(k − 1) = ak−2 e−a + (k − 2)I(k − 2)

(9.16)

Upon substituting Eq (9.16) for I(k − 1) in Eq (9.15), the result is I(k) = ak−1 e−a + (k − 1)ak−2 e−a + (k − 1)(k − 2)I(k − 2) Repeated recursive substitutions then produce I(k) = ak−1 e−a + (k − 1)ak−2 e−a + (k − 1)(k − 2)ak−3 e−a + · · · + (k − 1)!

8

CHAPTER 9.

Dividing through by (k − 1)! now yields: I(k) (k − 1)!

ak−1 e−a ak−2 e−a + + ··· + 1 (k − 1)! (k − 2)!

=

k−1 ∑

=

i=0

ai e−a i!

(9.17)

And now, for a = ηt and y = i, we obtain the required expression: 1 (k − 1)!





e−z z k−1 dz =

ηt

k−1 ∑

e−ηt (ηt)y y! y=0

(9.18)

(ii) From the definition: ∫ Γ(α) =



e−y y α−1 dy

(9.19)

0

we obtain: (a) By substituting α = 1 that ∫ ∞ ∞ Γ(1) = e−y dy = −e−y 0 = 1 0

(b) Upon integration by parts, we obtain: ∫ ∞ ∫ ∞ −y α−1 α−1 −y ∞ e−y y α−2 dy Γ(α) = e y dy = −y e 0 + (α − 1) 0 0 ∫ ∞ = (α − 1) e−y y α−2 dy 0

= (α − 1)Γ(α − 1) as required. (c) Finally, from Γ(α) = (α − 1)Γ(α − 1); α > 1;

(9.20)

for integer alpha, we obtain: Γ(α − 1) = (α − 2)Γ(α − 2);

(9.21)

so that upon substituting Eq (9.21) into Eq (9.20) for Γ(α − 1), we obtain Γ(α) = (α − 1)(α − 2)Γ(α − 2); Similarly, with Eq (9.21) implying that Γ(α − 2) = (α − 3)Γ(α − 3), we obtain: Γ(α) = (α − 1)(α − 2)(α − 3)Γ(α − 3);

9 Such recursive substitutions finally yield Γ(α) = (α − 1)! as required. 9.7 (i) From the given pdf, the mean µ of the random variable X, is obtained from: ∫ ∞ 1 µ= x α e−x/β xα−1 dx β Γ(α) 0 where, upon introducing the change of variable, y = x/β, so that dx = βdy, we obtain ∫ ∞ 1 µ = β (α+1) y α e−y dy β α Γ(α) 0 ∫ ∞ β = e−y y α dy Γ(α) 0 β = Γ(α + 1) Γ(α) And now, because Γ(α + 1) = αΓ(α) we obtain the required result: µ = αβ (ii) By definition, σ 2 = E[(X − µ)2 ] = E(X 2 ) − µ2 From the given pdf, and the result obtained in (i) above, we find therefore that: ∫ ∞ 1 σ2 = α xα+1 e−x/β dx − (αβ)2 β Γ(α) 0 Introducing a change of variable, y = x/β, and tidying up a bit, results in: ∫ ∞ β2 σ2 = y α+1 e−y dy − (αβ)2 Γ(α) 0 β 2 Γ(α + 2) = − (αβ)2 Γ(α) = β 2 α(α + 1) − (αβ)2 = αβ 2 as required. (iii) By definition, M (t) = E(etX ) =

1 β α Γ(α)



∞ 0

xα−1 e−(1−βt)x/β dx

10

CHAPTER 9.

The change of variable, y = x/γ where γ=

β (1 − βt)

reduces this integral to M (t) = = =

∫ ∞ 1 γ α y α−1 e−y dy β α Γ(α) 0 ( )α γ β 1 = (1 − βt)−α (1 − βt)α

as required. 9.8 The most direct approach is to use the method of characteristic functions. In this case if Yi is an exponential random variable with parameter β, i.e., Yi ∼ E(β), then, from Eq (9.9) in the text, its characteristic function is φYi (t) =

1 (1 − jβt)

and for the random variable X defined as: X=

α ∑

Yi

i=1

the characteristic function is obtained as: φX (t) =

α ∏

1 1 = (1 − jβt) (1 − jβt)α i=1

But from Eq (9.34) in the text, this characteristic function is recognized as that of the gamma random variable, γ(α, β), hence the required result. 9.9 (i) If Xi , i = 1, 2, . . . , n, are n independent gamma random variables, each with different shape parameters αi but a common scale parameter β, i.e., Xi ∼ γ(αi , β), then for each i, the characteristic function is: φXi (t) =

1 (1 − jβt)αi

so that the characteristic function for the random variable Y defined as: Y =

n ∑ i=1

Xi

11 will be φY (t) =

n ∏

1 1 ∑n = αi i αi (1 − jβt) (1 − jβt) i=1

(9.22)

which is the characteristic function of a gamma random variable, with shape ∑n parameter α∗ = i=1 αi and scale parameter β, i.e., Y ∼ γ(α∗ , β), as required. (ii) We begin by noting that the property stated in Eqs (4.94) and (4.95) in the text for MGFs also apply to characteristic functions (by definition); i.e., if Y = aX + b and φX (t) is the characteristic function for the random variable X, then the corresponding characteristic function for the random variable Y is: φY (t) = ebt φX (at) Thus, given the CF, φXi (t) =

1 (1 − jβt)αi

if we define the random variable Z as Z=c

n ∑

Xi = cY

i=1

then from the result in Eq (9.22) above, φZ (t) = φY (ct) =

1 (1 − jcβt)α∗

which is the characteristic function for a gamma random variable, γ(α∗ , cβ), as required. 9.10 (i) Figure 9.4 shows f (x), the pdf for the X ∼ E(5) random variable (for the single, large CSTR) in the solid line, while the dashed line shows f (y), the pdf for the Y ∼ γ(5, 1) random variable (for 5 identical, standard size CSTRs in series). The mean residence time for X is E(X) for X ∼ E(5); i.e., µX = 5; the corresponding value for Y is µY = 5 × 1 = 5, also. Thus, both configurations have identical mean residence times. (ii) Using MINITAB, one obtains, for the gamma random variable, Y ∼ γ(5, 1), P (Y ≤ 5) = 0.56 and for the exponential random variable, X ∼ E(5) P (X ≤ 5) = 0.632 (See Figs 9.5 and 9.6).

12

CHAPTER 9.

Distribution S cale Thresh E xponential 5 0

0.20

Distribution S hape S cale Thresh G amma 5 1 0

f(x)

0.15

0.10

0.05

0.00 0

10

20 X

30

40

Figure 9.4: Comparison of the pdfs for X ∼ E(5), the residence time distribution for the single, large CSTR (solid line) and for Y ∼ γ(5, 1), the residence time distribution for 5 identical, standard size CSTRs in series (dashed line).

Distribution Plot Gamma, Shape=5, Scale=1, Thresh=0 0.20

0.560

f(y)

0.15

0.10

0.05

0.00

0

5 Y

Figure 9.5: P (Y ≤ 5) for the gamma random variable, Y ∼ γ(5, 1).

13

Distribution Plot Exponential, Scale=5, Thresh=0 0.20

0.15

f(x)

0.632 0.10

0.05

0.00

0

5 X

Figure 9.6: P (X ≤ 5) for the exponential random variable, X ∼ E(5). 9.11 The pdf for the random variable X ∼ γ(α, β), is: f (x) =

1 e−x/β xα−1 β α Γ(α)

For the Inverse Gamma, IG, random variable defined by Y = 1/X, the inverse transformation is: 1 x = ψ(y) = y so that the Jacobian of the transformation will be: J=

1 1 dx = − 2 ; ⇒ |J| = 2 dy y y

As a result, fY (y) = fX (ψ(y))|J| =

1 e(−1/β)/y α β Γ(α)

( )α−1 1 1 y y2

which rearranges to yield: fY (y) =

(1/β)α −(1/β)/y −α−1 e y ;0 < y < ∞ Γ(α)

as required. The mean for this random variable, µY , is obtained as: ∫ ∫ ∞ (1/β)α ∞ −(1/β)/y −α e y dy µY = E(Y ) = yf (y)dy = Γ(α) 0 0

(9.23)

14

CHAPTER 9.

Upon a first change of variable, x = 1/y, we obtain ∫ 0 1 µY = α −e−(x/β) xα−2 dx β Γ(α) ∞ and one more change of variable, z = x/β, yields: ∫ 0 1 µY = α −e−z β α−1 z α−2 dz β Γ(α) ∞ which reduces to: µY =

1 βΓ(α)





e−z z α−2 dz

0

With the integral now recognized as Γ(α − 1), and because Γ(α) = (α − 1)Γ(α − 1) we obtain finally that: µY =

1 β(α − 1)

(9.24)

Determining the mode via the usual calculus method requires taking the derivative of fY (y), setting the result to zero, and solving for y, i.e., from { } d 1 1 −(1/β)/y −2 −α−1 fY (y) = α e y y + e−(1/β)/y (−α − 1)y −α−2 = 0 dy β Γ(α) β This simplifies to: 1 −α−3 y = (α + 1)y −α−2 β which may be solved for y to obtain the mode, y ∗ , as: y∗ =

1 β(α + 1)

To determine the variance, we recall that by definition, σY2 = E(Y 2 ) − µ2Y . In this case, ∫ (1/β)α ∞ −(1/β)/y −α+1 E(Y 2 ) = e y dy Γ(α) 0 where, upon introducing a change of variable, x = 1/y, evaluating the resulting integral in the form of the gamma function Γ(α), and simplifying, result in: E(Y 2 ) =

1 β 2 (α − 1)(α − 2)

And now, from Eq (9.24), we obtain: σY2 =

1 − β 2 (α − 1)(α − 2)

(

1 β(α − 1)

)2 =

1 β 2 (α − 1)2 (α − 2)

15 9.12 (i) If Y ∼ E(β), then the pdf is fY (y) =

1 −y/β e β

For the given transformation, X = Y 1/ζ with the inverse transformation Y = Xζ and the Jacobian, J = ζX ζ−1 Therefore: fX (x) =

1 ζ−1 −xζ /β ζx e β

When compared with Eq (9.55) in the text, we observe that this is the pdf for a W (ζ, β 1/ζ ) random variable. (ii) Conversely, if X ∼ W (ζ, β), then the pdf is: ζ fX (x) = β

( )ζ−1 ζ x e−(x/β) β

Now, for the transformation, Y = Xζ with the inverse transformation and Jacobian given, respectively, as: x = y 1/ζ ; J =

1 (1/ζ−1) y ζ

we obtain that: fY (y) =

ζ 1 (1−1/ζ) −y/β ζ 1 (1/ζ−1) y e y β β ζ−1 ζ

which simplifies to yield: fY (y) =

1 −y/β ζ e βζ

recognized as the pdf of an E(β ζ ) random variable. 9.13 (i) The number of days that each probe is expected to last is determined as the expected value of the Weibull(β = 10; ζ = 2) random variable. From p. 275 in the text, this is obtained as: E(X) = 10Γ(1.5) = 10 × 0.886 = 8.86 days

16

CHAPTER 9.

(ii) The required probability is: P (X > 20), which may be computed (using MINITAB) as P (X > 20) = 1 − P (X ≤ 20) = 0.0183 (iii) The required probability is: P (10 ≤ X ≤ 15), obtained as P (10 ≤ X ≤ 15) = 0.262 (iv) The required probability is: P (X ≤ 10), obtained as P (X ≤ 10) = 0.632 Alternative, one may simply note that for this random variable, β = 10, and, from Eq (9.62) in the text, P (X ≤ β) = 0.632 for all Weibull random variables. These 3 probabilities are also shown graphically in Fig 9.7. 9.14 From the supplied information, we obtain that for this random variable, β ζ = 10; ⇒ β = 100 How long we may expect the component to last is determined from E(X), the expected value of the Weibull random variable; i.e., E(X) = 100Γ(1 + 2) = 200 mins Next, the probability that the component will fail in less than 5 hours is determined by computing P (X < 300), since the random variable is defined in minutes. This probability is determined (using MINITAB) for the Weibull W (100, 1/2) as: P (X < 300) = 0.823 Section 9.2 9.15 Given the relationship Z=

X − µx σx

(i) E(Z) is obtained as: E(Z) =

1 [E(X) − µx ] = 0 σx

as required, since E(X) = µx . Similarly, from the given relationship, V ar(Z) is obtained as: 1 V ar(Z) = 2 [V ar(X)] = 1 σx since V ar(X) = σx2 . (ii) From the given pdf of Z, 2 1 f (z) = √ e−z /2 2π

17

Distribution Plot Weibull, Shape=2, Scale=10, Thresh=0 0.09 0.08 0.07

f(x)

0.06 0.05 0.04 0.03 0.02 0.01 0.0183 0.00

0

20 X

Distribution Plot Weibull, Shape=2, Scale=10, Thresh=0 0.09 0.08 0.07 0.06

f(x)

0.262 0.05 0.04 0.03 0.02 0.01 0.00

0

10

15 X

Distribution Plot Weibull, Shape=2, Scale=10, Thresh=0 0.09 0.632

0.08 0.07

f(x)

0.06 0.05 0.04 0.03 0.02 0.01 0.00

0

10 X

Figure 9.7: P (X > 20) (top), P (10 ≤ X ≤ 15) (middle), and P (X ≤ 10) for a Weibull(10,2) random variable.

18

CHAPTER 9.

we observe that the random variable X—whose pdf we seek—is related to Z— whose pdf is given—according to X = µx + σx2 Z Thus, the inverse transformation in this case is actually the original expression given above (showing Z as a function of X); the corresponding Jacobian of this inverse transformation is obtained as: J=

1 dz = dx σx

Thus, fX (x) =

1 1 −(x−µx )2 /2σx2 √ e σx 2π

which, of course, is the pdf for the Gaussian random variable with mean µx and variance σx2 , confirming Eq (9.91) in the text. 9.16 (i) Using MINITAB, the required probabilities are obtained as follows: P (−1.96σ < X − µ < 1.96σ) = P (80.4 < X < 119.6) = 0.95 and P (−3σ < X − µ < 3σ) = P (70 < X < 130) = 0.997 (ii) Similarly, P (X > 123) = 0.0107; and P (74.2 < X < 126) = 0.990 9.17 (i) Using MINITAB, we obtain that, for Z ∼ N (0, 1), P (Z ≥ z0 ) = 0.05 ⇒ z0 = 1.64 and that P (Z ≥ z0 ) = 0.025 ⇒ z0 = 1.96 (ii) Similarly, P (Z ≤ z0 ) = 0.025 ⇒ z0 = −1.96 and P (Z ≥ z0 ) = 0.10 ⇒ z0 = 1.28 while P (Z ≤ z0 ) = 0.10 ⇒ z0 = −1.28 (iii) Finally, by symmetry P (|Z| ≥ z0 ) = 0.00135 ⇒ P (Z ≤ −z0 ) = 0.000675 and P (Z ≥ z0 ) = 0.000675 so that: z0 = 3.21

19

Distribution Plot Distribution Mean StDev Normal 0 1

0.4

Distribution Loc Scale Logistic 0 1

f(x)

0.3

0.2

0.1

0.0 -5.0

-2.5

0.0 X

2.5

5.0

7.5

Figure 9.8: Comparison of the pdfs for the standard normal distribution (solid line) and for the (standard) logistic distribution (dashed line).

9.18 (i) P (−1.96 < Z < 1.96) = 0.95 and P (−1.645 < Z < 1.645) = 0.900 (ii) P (−2 < Z < 2) = 0.954 and P (−3 < Z < 3) = 0.997 (iii) P (|Z| ≥ 1) = P (Z ≤ −1) + P (Z ≥ 1) = 0.159 + 0.159 = 0.318 9.19 (i) The expected value for this random variable is obtained as ∫ ∞ xe−x E(X) = dx −x )2 −∞ (1 − e It is best to rewrite this in terms of the hyperbolic secant function, i.e., ∫ ∞ (x) x E(X) = sech2 dx 2 −∞ 4 Upon introducing the variable change, y = x/2, we obtain, ∫ ∞ E(X) = y sech2 y dy = 0 −∞

as required. (ii) A plot the (standard) logistic pdf and the standard normal pdf is shown in Fig 9.8. Both distributions are symmetric, with the iconic bell-shape, but the logistic distribution has heavier tails. As a result, the logistic distribution will be more appropriate for those phenomena for which large deviations from the mean are not as rare. 9.20 First, we show that if Xi ∼ N (µ, σ 2 ), then Zi =

Xi − µ σ

20

CHAPTER 9.

has the N (0, 1) distribution. This is accomplished in the usual fashion: the inverse transformation and Jacobian are: xi = σzi + µ; J = σ so that the pdf, fZ (zi ), is obtained as: 2 2 1 1 fZ (zi ) = σ √ e−zi /2 = √ e−zi /2 σ 2π 2π

as required. Next, we show that if Zi ∼ N (0, 1), then Wi = Zi2 has a χ2 (1) distribution. This result was established in Example 6.3, p177 √ of the text, and may simply be invoked here (keeping in mind that Γ(1/2) = π so that Eq (6.41) and Eq (9.43) are equivalent). Finally, we need to obtain fY (y) for the random variable defined as )2 ∑ n ( n ∑ Xi − µ Y = = Wi σ i=1 i=1 where the random variable Wi has a χ2 (1) distribution. From Eq (9.45) in the text, we see that the characteristic function for Wi is: φWi (t) =

1 (1 − j2t)1/2

Thus, the characteristic function for the random variable Y defined above is obtained from here as: φY (t) =

n ∏

φWi (t) =

i=1

n ∏

1 1 = 1/2 (1 − j2t) (1 − j2t)n/2 i=1

which is recognized as the characteristic function of a χ2 (n) random variable, as required. 9.21 If the random variable Y has a normal N (µ, σ 2 ) distribution, then f (y) =

2 2 1 √ e−(y−µ) /2σ σ 2π

For the random variable X defined as X = eY the inverse transformation, and the corresponding Jacobian are: y = ln x; ⇒ J =

1 x

21 as such, the pdf, fX (x) is: fX (x) =

2 2 1 √ e−(ln x−µ) /2σ xσ 2π

which, when compared with Eq (9.143), is recognized as the lognormal distribution with parameters α = µ and β = σ, (or, β 2 = σ 2 ). 9.22 Here, we start with the lognormal L(α, β) random variable, X, with pdf: f (x) =

2 2 1 √ e−(ln x−α) /2β xβ 2π

This time, for the random variable Y defined as Y = ln X the inverse transformation and associated Jacobian are: x = ey ; ⇒ J = ey so that the desired pdf fY (y) is obtained as: fY (y) = ey

1 √

ey β

e−(y−α)

2



/2β 2

=

2 2 1 √ e−(y−α) /2β β 2π

recognized as a normal N (µ, σ 2 ) distribution, with µ = α; σ = β. 9.23 From the values given for the lognormal distribution parameters, i.e., α = 2 0; β = 0.2, we obtain the distribution mean, µX , and variance, σX , as follows: µX

= exp(α + β 2 /2) = 1.02

2 σX

= e2α+β (eβ − 1) = 0.042; (or σx = 0.206)

2

2

The pdf for this lognormal distribution, and that for the Gaussian N (1.02, 0.2062 ) random variable, are shown in Fig 9.9, where we observe that the two pdfs are quite similar. The lognormal distribution (dashed line) is still skewed, but only very slightly so. In general, for small values of x, and especially for relatively small variances, the lognormal distribution approaches the normal distribution. (Note: MINITAB refers to the lognormal distribution’s α parameter as the “location” parameter, and β as the “scale” parameter. As we showed in Figs 9.8 and 9.9 in the text, β is more appropriately considered as a “shape” parameter, whereas α is more appropriately considered as the scale parameter.) 9.24 For the lognormal distributed random variable, with µX = 1.02 and σX = 0.206, the required probability P (µX − 1.96σX < X < µX + 1.96σX ), is P (0.616 < X < 1.414) = 0.954



Figure 9.9: Comparison of the pdfs for the normal distribution N (1.02, 0.2062 ) (solid line) and for the lognormal distribution L(α, β); α = 0; β = 0.2 (dashed line).

Had this random variable been mistakenly assumed to be Gaussian with the same µX = 1.02 and σX = 0.206, the probability would have been obtained as 0.95. This result underscores just how close these two distributions are under the specific conditions being considered here: note that the Gaussian distribution "underestimates" the probability in question by a mere 0.004, less than 0.5%. To two decimal places, the two probabilities are equal.
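For readers who prefer to verify such numbers in code rather than in MINITAB, a minimal Python sketch follows (assuming scipy and numpy are available; any package with lognormal and normal CDFs would do):

```python
# Cross-check of the Exercise 9.24 comparison (scipy/numpy assumed).
import numpy as np
from scipy import stats

alpha, beta = 0.0, 0.2           # lognormal parameters
mu, sigma = 1.02, 0.206          # matching mean and standard deviation
lo, hi = mu - 1.96 * sigma, mu + 1.96 * sigma

lognormal = stats.lognorm(s=beta, scale=np.exp(alpha))
gaussian = stats.norm(mu, sigma)
print(round(lognormal.cdf(hi) - lognormal.cdf(lo), 3))  # ~0.954
print(round(gaussian.cdf(hi) - gaussian.cdf(lo), 3))    # ~0.950
```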

9.25 If Y1 ∼ N(0, b²) and Y2 ∼ N(0, b²), then, by independence, the joint pdf, f(y1, y2), is obtained as:
$$f(y_1, y_2) = \frac{1}{2\pi b^2} e^{-(y_1^2 + y_2^2)/2b^2}$$
The transformation,
$$X = \sqrt{Y_1^2 + Y_2^2}$$

is non-square; redefining this as X1, and introducing a second arbitrary transformation, such as X2 = Y2, results in the following square transformation:
$$X_1^2 = Y_1^2 + Y_2^2; \quad X_2^2 = Y_2^2$$
from which we obtain the inverse transformation:
$$y_1 = \sqrt{x_1^2 - x_2^2}; \quad y_2 = x_2$$

whose Jacobian is:
$$J = \begin{vmatrix} \dfrac{x_1}{\sqrt{x_1^2 - x_2^2}} & \dfrac{-x_2}{\sqrt{x_1^2 - x_2^2}} \\ 0 & 1 \end{vmatrix} = \frac{x_1}{\sqrt{x_1^2 - x_2^2}}$$
provided x1 ≠ x2. Thus, the joint pdf f(x1, x2) is obtained as:
$$f(x_1, x_2) = \frac{1}{2\pi b^2} \frac{x_1}{\sqrt{x_1^2 - x_2^2}} e^{-x_1^2/2b^2}; \quad x_1 \neq x_2$$
To obtain the actual required pdf, f(x1), we must now integrate out the extraneous x2 variable (keeping in mind that the two branches of the inverse transformation, y1 = ±√(x1² − x2²), contribute equally, and that x2 is confined to (−x1, x1)), i.e.,
$$f(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2 = \frac{x_1}{2\pi b^2} e^{-x_1^2/2b^2} \cdot 2\left\{\int_{-x_1}^{0} \frac{dx_2}{\sqrt{x_1^2 - x_2^2}} + \int_{0}^{x_1} \frac{dx_2}{\sqrt{x_1^2 - x_2^2}}\right\}$$
and we obtain:
$$f(x_1) = \frac{x_1}{2\pi b^2} e^{-x_1^2/2b^2} \cdot 2\left\{\left.\arcsin\frac{x_2}{x_1}\right|_{-x_1}^{0} + \left.\arcsin\frac{x_2}{x_1}\right|_{0}^{x_1}\right\} = \frac{x_1}{2\pi b^2} e^{-x_1^2/2b^2}\,(\pi + \pi) = \frac{x_1}{b^2} e^{-x_1^2/2b^2}$$

which is the pdf of the Rayleigh R(b) random variable, as given by Eq (9.153) in the text.

9.26 The pdf for the random variable X with a Rayleigh R(b) distribution is:
$$f(x) = \frac{x}{b^2} e^{-x^2/2b^2}; \quad x > 0; \; b > 0$$
For the random variable Y defined as Y = X², the inverse transformation is:
$$x = +\sqrt{y}$$
because, by definition, the Rayleigh-distributed random variable X is restricted to the positive real line. Thus, the Jacobian of the transformation is:
$$J = \frac{1}{2} y^{-1/2}$$
As such,
$$f_Y(y) = \frac{\sqrt{y}}{b^2} e^{-y/2b^2} \cdot \frac{1}{2\sqrt{y}}$$


which simplifies to give:
$$f_Y(y) = \frac{1}{2b^2} e^{-y/2b^2}$$
recognized as the pdf of an exponential random variable, E(β), with β = 2b².

Section 9.3

9.27 The pdf for the beta B(α, β) random variable X is
$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$$
whose mode, x*, is obtained by solving the expression df(x)/dx = 0 for x, i.e.,
$$\frac{df(x)}{dx} = C\left\{(\alpha-1)x^{\alpha-2}(1-x)^{\beta-1} - (\beta-1)(1-x)^{\beta-2}x^{\alpha-1}\right\} = 0$$
where C, a constant, is the ratio of the indicated gamma functions. For x ≠ 0 and (1 − x) ≠ 0, this expression simplifies to:
$$(\alpha-1)(1-x) = (\beta-1)x$$
which, when solved for x, yields:
$$x^* = \frac{\alpha-1}{\alpha+\beta-2}$$

as required.
(a) From here, we now note that if α + β > 2, so that the denominator is positive, and if, in addition, 0 < α < 1, then the numerator will be negative; as a result, the mode, x*, will be negative, which is impossible for a beta random variable restricted to the [0,1] interval. Hence, no mode exists for a beta random variable when 0 < α < 1 and α + β > 2.
(b) Since the mean for the beta random variable is given as:
$$\mu = \frac{\alpha}{\alpha+\beta}$$
then, when a mode exists, the mode and the mean will coincide when:
$$\frac{\alpha-1}{\alpha+\beta-2} = \frac{\alpha}{\alpha+\beta}$$
or: α(α + β − 2) = (α + β)(α − 1), which simplifies to:
$$\alpha = \beta$$
as required.

9.28 The joint pdf f(x, p) is obtained as:
$$f(x, p) = f(x|p)f(p) = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{x+\alpha-1}(1-p)^{n-x+\beta-1} = C(x)\, p^{\alpha^*-1}(1-p)^{\beta^*-1}$$
where
$$C(x) = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}; \quad \alpha^* = x + \alpha; \quad \beta^* = (n - x) + \beta$$
The desired pdf, f(x), is obtained by integrating out p from the joint pdf, i.e.,
$$f(x) = \int_0^1 f(x, p)\, dp = C(x) \int_0^1 p^{\alpha^*-1}(1-p)^{\beta^*-1}\, dp$$
The easiest way to evaluate this integral is to recognize that the integrand is the kernel of a beta B(α*, β*) pdf for the random variable, P, with the immediate implication that the integral will be the inverse of the pdf's normalizing constant (also known as the "beta function"), i.e.,
$$\int_0^1 p^{\alpha^*-1}(1-p)^{\beta^*-1}\, dp = \frac{\Gamma(\alpha^*)\Gamma(\beta^*)}{\Gamma(\alpha^*+\beta^*)}$$
as a result,

$$f(x) = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha^*)\Gamma(\beta^*)}{\Gamma(\alpha^*+\beta^*)}$$
is the required pdf for the Beta-Binomial random variable; it is sometimes represented in terms of the so-called beta function alluded to above, defined as:
$$B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\, dx = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
providing the alternate representation:
$$f(x) = \binom{n}{x} \frac{B(\alpha^*, \beta^*)}{B(\alpha, \beta)} = \binom{n}{x} \frac{B(\alpha+x, \beta+n-x)}{B(\alpha, \beta)}$$

9.29 For the standard uniform U(0, 1) random variable, the required probability is obtained as:
$$P[a_1 < X < (a_1 + a_2)] = \int_{a_1}^{a_1+a_2} f(x)\, dx = \int_{a_1}^{a_1+a_2} 1\, dx = (a_1 + a_2) - a_1 = a_2$$
as required. In general, for the uniform random variable on the interval (a, b), the required probability is obtained as:
$$P[a_1 < X < (a_1 + a_2)] = \int_{a_1}^{a_1+a_2} f(x)\, dx = \int_{a_1}^{a_1+a_2} \frac{1}{b-a}\, dx = \frac{1}{b-a}\left[(a_1 + a_2) - a_1\right] = \frac{a_2}{b-a}$$


as required.

9.30 (i) The required probability is obtained as:
$$P(X > [\omega a + (1-\omega)b]) = \int_{\omega a + (1-\omega)b}^{b} \left(\frac{1}{b-a}\right) dx = \left.\frac{x}{b-a}\right|_{\omega a + (1-\omega)b}^{b} = \frac{\omega(b-a)}{(b-a)} = \omega$$
(ii) For the specific case where a = 1, b = 3, the pdf is:
$$f(x) = \frac{1}{2}; \quad 1 < x < 3$$
and, by the general result in (i), the required probability is again ω.

9.31 (i) For the pdf f(x) = (α − 1)x^{−α}; x > 1, the mean is obtained as:
$$\mu = E(X) = \int_1^{\infty} x(\alpha-1)x^{-\alpha}\, dx = \frac{\alpha-1}{\alpha-2}; \quad \alpha > 2$$

as required.
(ii) The median, xm, is obtained from:
$$\int_1^{x_m} (\alpha-1)x^{-\alpha}\, dx = \left(\frac{\alpha-1}{1-\alpha}\right)\left.x^{1-\alpha}\right|_1^{x_m} = 1 - x_m^{1-\alpha} = \frac{1}{2}$$
which simplifies to yield:
$$x_m^{\alpha-1} = 2; \quad \text{or } x_m = 2^{1/(\alpha-1)}$$
The variance is obtained from:
$$Var(X) = E(X^2) - \mu^2$$
Now,
$$E(X^2) = \int_1^{\infty} (\alpha-1)x^{2-\alpha}\, dx = \left(\frac{\alpha-1}{\alpha-3}\right); \quad \alpha > 3$$
so that:
$$Var(X) = \left(\frac{\alpha-1}{\alpha-3}\right) - \frac{(\alpha-1)^2}{(\alpha-2)^2}$$
which simplifies to:
$$Var(X) = \frac{(\alpha-1)}{(\alpha-3)(\alpha-2)^2}; \quad \alpha > 3$$

9.32 We use MINITAB to obtain the required probabilities as follows:
(i) P (X ≥ 1) = 0.5
(ii) P (X ≥ 2) = 0.0084; P (X ≤ 0.5) = 0.0084
(iii) P (X ≥ 1.76) = 0.0252; P (X ≤ 0.568) = 0.0252

9.33 (a) For P (X ≥ x0) = 0.025, we use MINITAB to obtain the required results:


(i) For ν1 = ν2 = 49, x0 = 1.76
(ii) For ν1 = ν2 = 39, x0 = 1.89
(iii) For ν1 = ν2 = 29, x0 = 2.10
(iv) Reducing the degrees of freedom results in an increase in the corresponding values of x0.
(b) Similarly, for P (X ≤ x0) = 0.025, we use MINITAB to obtain the required results:
(i) For ν1 = ν2 = 49, x0 = 0.567
(ii) For ν1 = ν2 = 39, x0 = 0.529
(iii) For ν1 = ν2 = 29, x0 = 0.476
(iv) In this case, reducing the degrees of freedom results in a reduction in the corresponding values of x0. Note that the values for x0 here are the reciprocals of the corresponding values of x0 in part (a).

9.34 For the random variable X that has a t(ν) distribution, the value of x0 such that P (X > x0) = 0.025 (so that, by symmetry, P (|X| > x0) = 0.05) is obtained from MINITAB for the various specified values of ν as follows: (i) For ν = 5, x0 = 2.57; (ii) For ν = 25, x0 = 2.06; (iii) For ν = 50, x0 = 2.01; (iv) For ν = 100, x0 = 1.98. For a standard normal random variable, the corresponding value is obtained as x0 = 1.96. Observe that the values of x0 obtained for the t(ν) distribution approach the value obtained for the standard normal distribution as ν increases.

9.35 A plot of the pdfs for a Cauchy C(5, 4) random variable and for a Gaussian N(5, 4) random variable is shown in Fig 9.10. Both distributions are symmetrical around x = 5, but the Cauchy distribution has heavier tails. The required probability is P (X ≥ 5 + 7.84) = P (X > 12.84) for each random variable. The probabilities, obtained using MINITAB, are as follows: for the Gaussian random variable, P (X > 12.84) = 0.025, but for the Cauchy random variable, P (X > 12.84) = 0.150, underscoring the implications of the heavier Cauchy distribution tails. The probability of observing extreme values exceeding 12.84 for the Cauchy C(5, 4) random variable is 6 times the corresponding probability for the Gaussian counterpart. See Fig 9.11.

9.36 A plot of the pdf for the logistic distribution given in Eq (9.186) of Exercise 9.19, and that of the standard Cauchy random variable, is shown in Fig 9.12.
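The MINITAB look-ups in Exercises 9.33–9.35 can be cross-checked with any inverse-CDF routine; the sketch below uses Python's scipy.stats (an assumption, not the tool used in the text):

```python
from scipy import stats

# 9.33(a): upper 2.5% points of F(nu, nu); expect 1.76, 1.89, 2.10
for nu in (49, 39, 29):
    print(nu, round(stats.f.isf(0.025, nu, nu), 2))

# 9.34: 97.5th percentiles of t(nu); expect 2.57, 2.06, 2.01, 1.98,
# approaching the standard normal value 1.96 as nu grows
for nu in (5, 25, 50, 100):
    print(nu, round(stats.t.ppf(0.975, nu), 2))
print(round(stats.norm.ppf(0.975), 2))

# 9.35: tail probabilities beyond 12.84; expect 0.025 (normal), 0.150 (Cauchy)
print(round(stats.norm(5, 4).sf(12.84), 3))
print(round(stats.cauchy(5, 4).sf(12.84), 3))
```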



Figure 9.10: Comparison of the pdfs for the Gaussian N(5, 4) random variable (solid line) and for the Cauchy C(5, 4) random variable (dashed line). Both distributions are symmetric, but the Cauchy distribution has the heavier tails.

APPLICATION PROBLEMS

9.37 The supplied information translates to: P (X > 30) = 0.6 for X ∼ E(β), where the parameter β is unknown. Since, for this exponential random variable,
$$P(X > x) = e^{-x/\beta}$$
the immediate implication is that:
$$e^{-30/\beta} = 0.6$$
which, upon taking natural logs, yields the result:
$$\beta = \frac{-30}{\ln 0.6} = 58.73 \text{ days}$$
With the indicated characteristic that the tornadoes arrive, on average, one every 58.73 days, the expected number of tornadoes to arrive in the next 90 days, say y90, is obtained as:
$$y_{90} = \frac{1}{58.73} \times 90 = 1.53$$



Figure 9.11: Comparison of P (X > 12.84) for the Gaussian N (5, 4) random variable (top), and for the Cauchy C(5, 4) random variable (bottom). The probability for the Cauchy random variable is 6 times the probability for the Gaussian random variable.



Figure 9.12: Comparison of the standard logistic distribution (solid line), and the Cauchy distribution (dashed line). The Cauchy distribution has the slightly heavier tails.

Thus, on average, 1½ tornadoes are expected to arrive in the next 90 days, or 3 in the next 180 days.

9.38 (i) By definition of the component "reliability function," for this particular component, we have that
$$R_i(t) = P(T > t) = \int_t^{\infty} \eta e^{-\eta\theta}\, d\theta$$
from where we obtain:
$$R_i(t) = e^{-\eta t}; \quad \eta = \frac{0.075}{100} \text{ hour}^{-1}$$

as the explicit expression for Ri(t) for this electronic component.
(ii) A system consisting of two identical components in parallel functions if at least one of them functions; this implies that the system does not function only if no component functions. Therefore, the reliability, Rp(t), the probability that this parallel system functions, is 1 − P(no component functions); i.e.,
$$R_p(t) = 1 - [(1 - R_i)(1 - R_i)] = 1 - [(1 - e^{-\eta t})(1 - e^{-\eta t})] = 2e^{-\eta t} - e^{-2\eta t}$$
For η = 0.075/100, therefore,
$$R_p(1000) = 2e^{-0.75} - e^{-1.5} = 0.722$$
is the probability that the parallel system survives at least 1000 hours of operation.
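A short numerical sketch of this calculation (plain Python; the variable names are illustrative):

```python
import math

eta = 0.075 / 100                 # component failure rate, per hour
t = 1000.0                        # mission time, hours

Ri = math.exp(-eta * t)           # single-component reliability
Rp = 2 * Ri - Ri ** 2             # two identical components in parallel
print(round(Rp, 3))               # 0.722
```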


9.39 (i) From the problem statement, observe that xw is to be determined such that no more than 15% of the chips have lifetimes lower than xw; i.e., the upper limit of xw (a number to be determined as a whole integer) is obtained from
$$P(X \le x_w) = 0.15$$
From the given pdf, we then have that:
$$0.15 = \int_0^{x_w} \eta e^{-\eta x}\, dx = 1 - e^{-\eta x_w}$$
which, given η = 0.16, is easily solved for xw to yield:
$$x_w = 1.016$$

as the upper limit. Thus, as a whole integer, the warranty should be set at xw = 1 year.
(ii) It is possible to use the survival function, S(x), directly for this part of the problem, since what is required is P (X > 3) = 0.85, or S(x) = 0.85 (for x = 3). Either from a direct integration of the given pdf, or from recalling the exact form of the survival function for the exponential random variable, obtain:
$$S(x) = e^{-\eta x}$$
so that for x = 3, and S(x) = 0.85, solve
$$0.85 = e^{-3\eta_2^*}$$
for η2* to obtain
$$\eta_2^* = 0.054$$
The implication is that the target mean life-span should be 1/0.054 = 18.46 years for the next-generation chip. From an initial mean life-span of 1/0.16 = 6.25 years, the implied "fold increase" in mean life-span is 2.96, or about 3-fold.
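Both parts of this exercise reduce to solving an exponential survival equation for its rate; a minimal Python sketch (illustrative names) is:

```python
import math

eta = 0.16
x_w = -math.log(1 - 0.15) / eta     # (i) warranty limit from P(X <= x_w) = 0.15
eta2 = -math.log(0.85) / 3.0        # (ii) target rate from P(X > 3) = 0.85
print(round(x_w, 3))                # 1.016 -> warranty of 1 year
print(round(eta2, 3))               # 0.054
print(round(eta / eta2, 2))         # ~2.95, i.e., about a 3-fold increase in mean life-span
```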

9.40 (i) The mean (average) and variance of the CHO cells' inter-origin distance are obtained directly from the data table as:
$$\bar{x} = \sum_{i=1}^{n} x_i f(x_i) = 54.75; \quad s_x^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 f(x_i) = 705.28$$


Figure 9.13: Frequency data on distances between DNA replication origins (inter-origin distances) versus the gamma γ(4.25, 12.88) distribution fit.

(ii) From the expressions for the mean and variance of a gamma-distributed random variable, i.e.,
$$\mu = \alpha\beta; \quad \sigma^2 = \alpha\beta^2$$
using the values obtained in (i) for x̄ and s²x as estimates of the theoretical mean, µ, and variance, σ², respectively, we obtain:
$$54.75 = \alpha\beta; \quad 705.28 = \alpha\beta^2$$
two equations that are easily solved for the two unknowns, to obtain:
$$\alpha = 4.25; \quad \beta = 12.88$$
as estimates for the gamma distribution parameters. A plot of the frequency data versus the gamma γ(4.25, 12.88) pdf is shown in Fig 9.13. Except for the singular data point at x = 45 kb, the model fit is excellent, especially since the model parameters were determined from the data moments, not by actually fitting the gamma pdf to the data directly.
(iii) The values estimated for the gamma distribution imply that the physiological parameter r = 1/β ≈ 1/13, so that, on average, replication origins occur once every 13 kb. Also, k ≈ 4, implying that about 4 "skips" occur before DNA synthesis can begin.
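The method-of-moments step in part (ii) amounts to two lines of algebra; a sketch in Python:

```python
# Method-of-moments gamma fit used in Exercise 9.40, assuming the sample
# mean and variance have already been computed from the frequency table.
xbar, s2 = 54.75, 705.28

beta = s2 / xbar     # since mu = alpha*beta and sigma^2 = alpha*beta^2
alpha = xbar / beta
print(round(alpha, 2), round(beta, 2))   # 4.25 12.88
```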


9.41 (i) For this Weibull(2, 10) random variable, the "most likely" MSL value, which occurs at the mode, is:
$$x^* = \beta(1 - 1/\zeta)^{1/\zeta} = 10(0.5)^{1/2} = 7.07 \text{ months}$$
(ii) The median MSL is obtained as:
$$x_m = \beta(\ln 2)^{1/2} = 8.33 \text{ months}$$
and the expected MSL is obtained as:
$$\mu = \beta\Gamma(1 + 1/\zeta) = 10\Gamma(1.5) = 8.86 \text{ months}$$
The difference between these two values is 0.53 months, approximately 6%.
(iii) The required probability is obtained (using MINITAB) as:
$$P(X > 18) = 0.0392$$
a very low probability.

9.42 (i) With λ ∼ γ(13, 0.25), X, the number of misdiagnoses recorded in a year, follows a Poisson-Gamma mixture model or, equivalently, a negative binomial model, NBi(α, p), where
$$\alpha = 13; \quad p = \frac{1}{1+\beta} = 0.8$$
from where we obtain:
$$P(X = 3) = 0.200; \quad P(X \le 3) = 0.598$$
(Note: MINITAB offers two options for the negative binomial random variable. In one option, X is defined as the number of "non-events" before the kth success, exactly as we defined the random variable in the text; the second option has X as the "total number of trials" before the kth success.)
(ii) If X were misrepresented with a standard Poisson model, then the Poisson mean rate would be determined as

$$\lambda = \frac{1}{1000} \times 3250 = 3.25$$
from where we obtain:
$$P(X = 3|\lambda = 3.25) = 0.222; \quad P(X \le 3|\lambda = 3.25) = 0.591$$

Observe that these probabilities are close to the corresponding probabilities computed in (i) above, with P (X > 3) somewhat overestimated under the Poisson model assumption.
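These probabilities can be cross-checked in Python with scipy.stats (an assumption; scipy's nbinom counts "non-events" before the α-th success, matching the first MINITAB convention noted above):

```python
from scipy import stats

# Poisson-Gamma (negative binomial) model: alpha = 13, p = 0.8
print(round(stats.nbinom.pmf(3, 13, 0.8), 3))   # 0.200
print(round(stats.nbinom.cdf(3, 13, 0.8), 3))   # 0.598

# Misrepresented standard Poisson model: lambda = 3.25
print(round(stats.poisson.pmf(3, 3.25), 3))     # 0.222
print(round(stats.poisson.cdf(3, 3.25), 3))     # 0.591
```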

9.43 For the L(3, 0.5) distribution,
(i) P (X > 20) = 0.503;
(ii) P (X > 1000) = 0.000663, so that less than 0.07% of senior technical researchers received a "Jumbo" bonus.
(iii) P (10 < X < 30) = 0.707, so that 70.7% received the typical bonus in the range $10,000–$30,000.

9.44 The supplied information about the lognormal distribution indicates the following about the pdf parameters:
$$e^{\alpha} = 403; \quad e^{(\alpha - \beta^2)} = 245$$
from which we determine that α = 6 and β = √0.5 = 0.707.
(i) For this lognormal L(6, 0.707) random variable, therefore, P (X > 500) = 0.381, so that 38.1% of the homes in question cost more than $500,000.
(ii) Since P (150 < X < 300) = 0.257, only 25.7% of the homes in question are "affordable."
(iii) The mean of this random variable is obtained as:
$$\mu = e^{(\alpha + \beta^2/2)} = 518.013$$

A plot of the lognormal L(6, 0.707) indicating the mean (x̄ = 518.013) and median (xm = 403) is shown in Fig 9.14. The median appears to be more "representative" of the central location of the distribution, as the mean is "shifted" more to the right of the central portion of the distribution.

9.45 (i) The mode of the beta B(2, 3) distribution is obtained as:
$$x^* = \frac{\alpha-1}{\alpha+\beta-2} = 0.333$$
and since the expected value, E(X), is obtained as:
$$E(X) = \frac{\alpha}{\alpha+\beta} = 0.4$$
the implication is that 40% of students can be expected to fail annually, a very high number.
(ii) The probability that over 90% of the students will pass the examination in any given year is the same as the probability that fewer than 10% will fail the exam. Since X is defined in terms of the proportion that fail the exam, the required probability is obtained as:
$$P(X < 0.1) = 0.053$$



Figure 9.14: Probability distribution function for the lognormal L(6, 0.707) random variable (representing home prices in the upper mid-Atlantic region of the US), indicating the median (solid line) and mean (dashed line) of the random variable.

(iii) If the random variable Y is the proportion of students from an elite college preparatory school who fail the same entrance examination, then Y ∼ B(1, 7), from which we obtain the following results:
$$E(Y) = \frac{1}{8} = 0.125$$
indicating that only 12.5% of this group of students can be expected to fail. In similar fashion, we obtain
$$P(Y < 0.1) = 0.522$$
as the probability that less than 10% of these elite prep school students will fail the examination in any given year, the same as the required probability that over 90% will pass.
(iv) Theoretically, yes: these results mean that the elite college preparatory school does better in getting its students admitted into the highly selective foreign university. The percentage expected to fail is 12.5% for the prep school students, against 40% for the regular school. Also, the chances are better than 50-50 that over 90% of the elite prep school students will pass the examination; the corresponding chances for the regular school students are one tenth of those for the elite prep school students. To buttress this point, Fig 9.15 shows a plot of the two pdfs, for the elite prep school in the solid line, and for the regular school in the dashed line.



Figure 9.15: Comparison of beta distributions for student performance in two different schools: elite prep school students (solid line) and regular school students (dashed line). The probability that a large proportion of the regular school students (e.g., x > 0.6) will fail the examination is much higher than the corresponding probability for the elite prep school students, which is practically zero.

9.46 (i) The theoretical beta B(4.5, 1) distribution characterization implies that
$$E(X) = \frac{\alpha}{\alpha+\beta} = \frac{4.5}{5.5} = 0.818$$
practically identical to the computed all-time success rate of 0.82.
(ii) For the beta B(4.5, 1) random variable,
$$P(X > 0.9) = 0.378$$
a reasonable but not particularly high probability that the place kicker will achieve elite status.
(iii) The table of computed probabilities for the indicated values of the pdf parameter, α, is shown below.

    α     P (X > 0.9|α; β = 1)
    3.5   0.308
    4.0   0.344
    4.5   0.378
    5.0   0.410
    5.5   0.440

A plot of these probabilities as a function of α is shown in Fig 9.16, where the probability of the place kicker achieving elite status is seen to be a monotonically



Figure 9.16: Probability of the place kicker achieving elite status as a function of the beta distribution parameter α.

increasing function of the beta distribution parameter α.

9.47 (i) The expected value of the given inverted beta distribution random variable, Y, is:
$$E(Y) = \frac{\alpha}{\beta-1} = 4.36$$
(ii) It is better to work with the random variable Z defined as Z = X1/(X1 + X2). This implies the following relationship between Z and Y:
$$Z = \frac{Y}{Y+1}$$
Thus, P (Y ≥ 2) = P (Z ≥ 2/3), and since Z ∼ B(α, β), this probability is obtained as:
$$P(Y \ge 2) = P(Z \ge 2/3) = 0.604$$
as the probability that a gene selected at random from this population will be identified as overexpressed.
(iii) Similarly,
$$P(0.5 \le Y \le 2) = P(1/3 \le Z \le 2/3) = 0.372$$
is the probability that there will be insufficient evidence to conclude that there is differential expression.

9.48 (i) Under these conditions,
$$x_{AB} = \frac{s_A^2}{s_B^2} = 0.269$$
so that for the F(49, 49) distribution,
$$P(X \le 0.269) = 0.00000487$$
virtually zero. Thus, the probability is virtually zero that X takes a value as small as, or smaller than, 0.269 when the two variances are in fact the same.
(ii) From the F(49, 49) distribution, we obtain the following results:
$$P(X \le f_1) = 0.025 \Rightarrow f_1 = 0.567$$
$$P(X \ge f_2) = 0.025 \Rightarrow f_2 = 1.763$$

(iii) These results imply that, if the variances of the data obtained from the two processes are in fact the same, the probability of a ratio of the smaller to the larger computed variances being less than 0.567 is a very small 0.025. Note that the value determined for f2 is the reciprocal of f1, implying that the first statement can also be restated as follows: if the variances of the data obtained from the two processes are in fact the same, the probability of a ratio of the larger to the smaller computed variances being greater than 1/0.567 = 1.763 is the same very small 0.025. The actual computed ratio, 0.269, is even smaller than 0.567 (or, alternatively, the inverse, 1/0.269 = 3.717, is even larger than 1.763); thus the plausibility of the equal-variance conjecture is very slim indeed.
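The three quantities in this exercise can be reproduced with any F-distribution routine; for instance, in Python with scipy.stats (an assumption):

```python
from scipy import stats

F = stats.f(49, 49)
print(F.cdf(0.269))              # ~4.9e-06, virtually zero
print(round(F.ppf(0.025), 3))    # f1 = 0.567
print(round(F.isf(0.025), 3))    # f2 = 1.763 = 1/f1
```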

Chapter 10

Exercises

10.1 First, the expression for the entropy of the Bernoulli random variable
$$H(X) = -(1-p)\log_2(1-p) - p\log_2 p$$
is rewritten in terms of natural logs as:
$$H(X) = -(1-p)\frac{\ln(1-p)}{\ln 2} - p\frac{\ln p}{\ln 2}$$
from where, upon differentiation, we obtain that:
$$\frac{dH(X)}{dp} = \frac{\ln(1-p)}{\ln 2} + \frac{1}{\ln 2} - \frac{\ln p}{\ln 2} - \frac{1}{\ln 2} = \frac{\ln(1-p)}{\ln 2} - \frac{\ln p}{\ln 2} \quad (10.1)$$

Setting this to zero and solving for p yields:
$$1 - p = p \Rightarrow p = 0.5$$
And a second differentiation of Eq (10.1) confirms that H(X) indeed achieves a maximum at this value, p = 0.5, as required.

10.2 One approach to this problem is to determine the maximum entropy distribution for the Bernoulli random variable, Xi, and from there derive the resultant pdf for the binomial random variable, X, defined in terms of the Bernoulli random variable according to:

$$X = \sum_{i=1}^{n} X_i \quad (10.2)$$

Without loss of generality, we consider the Bernoulli random variable to take on values 0 and 1, with an unknown pdf described as follows:
$$f(x_i) = \begin{cases} f(x_{i0}); & x_i = 0 \\ f(x_{i1}); & x_i = 1 \end{cases} \quad (10.3)$$


Thus, we seek to find, first, the discrete pdf f(xi) that maximizes:
$$H(X_i) = -\sum_{n=0}^{1} f(x_{in}) \ln f(x_{in}) \quad (10.4)$$
subject to the constraint:
$$\sum_{n=0}^{1} f(x_{in}) = 1 \quad (10.5)$$
As was presented in the text in Section 10.4.1, the Lagrangian functional in this case is obtained as:
$$\Lambda(f) = \sum_{n=0}^{1} [-\ln f(x_{in})]f(x_{in}) - \lambda\left[\sum_{n=0}^{1} f(x_{in}) - 1\right] \quad (10.6)$$

From here, we may employ the method of calculus of variations (illustrated in the text) to determine the following results: first, for f(xi0),
$$\frac{\partial \Lambda}{\partial f(x_{i0})} = -\left[f(x_{i0})\frac{1}{f(x_{i0})} + \ln f(x_{i0})\right] - \lambda = 0 \quad (10.7)$$
which yields, upon simplification,
$$f(x_{i0}) = e^{-(1+\lambda)} = C \quad (10.8)$$
C is a constant, which, even though still unknown, clearly represents the probability that Xi = 0. We may now use the constraint equation, Eq (10.5), to obtain that
$$f(x_{i1}) = 1 - C \quad (10.9)$$
From Eq (10.6), this time for f(xi1), we obtain the result that, also:
$$f(x_{i1}) = e^{-(1+\lambda)} = C \quad (10.10)$$
so that, in principle, the maximum entropy pdf for the Bernoulli random variable must be such that:
$$C = 1 - C; \quad \text{or } C = 0.5$$
For the sake of convention, let p = 1 − C be the probability that Xi = 1; then the maximum entropy distribution for the Bernoulli random variable is obtained as:
$$f(x_i) = \begin{cases} 1-p; & x_i = 0 \\ p; & x_i = 1 \end{cases} \quad (10.11)$$
with p = 0.5 specifically. We may now use concepts from Chapter 6, in particular the method of characteristic functions, to determine that, given
$$\varphi_{X_i}(t) = pe^{jt} + (1-p)$$

as the characteristic function of the Bernoulli random variable, Xi, then the CF for the binomial random variable defined in Eq (10.2) will be:
$$\varphi_X(t) = \left[pe^{jt} + (1-p)\right]^n$$
which, from Chapter 8, is recognized as the CF for the random variable whose pdf is given by:
$$f(x) = \binom{n}{x} p^x (1-p)^{n-x} \quad (10.12)$$
Thus, the maximum entropy distribution for a binomial random variable, X, is as shown in Eq (10.12), specifically with p = 0.5; i.e.,
$$f(x) = \binom{n}{x} 0.5^n$$
when nothing else is known about the random variable except that it produces exactly two outcomes.

10.3 From the given pdf for the geometric random variable,
$$f(x) = pq^{x-1}; \quad x = 1, 2, \ldots$$
we obtain that
$$\ln f(x) = \ln p + (x-1)\ln q$$
As such, the entropy, defined as:
$$H(X) = \sum_{x=1}^{\infty} -\ln f(x)\, f(x)$$

will be obtained as:
$$H(X) = \sum_{x=1}^{\infty} -\left[\ln p + (x-1)\ln q\right] pq^{x-1} = -\ln p \sum_{x=1}^{\infty} pq^{x-1} - \ln q \sum_{x=1}^{\infty} xpq^{x-1} + \ln q \sum_{x=1}^{\infty} pq^{x-1} = -\ln p - \ln q\left(\frac{1}{p}\right) + \ln q$$
where we have taken advantage of the fact that $\sum_{x=1}^{\infty} f(x) = 1$ in general and, in particular, that for the geometric random variable, $\sum_{x=1}^{\infty} xpq^{x-1} = E(X) = 1/p$. This expression further simplifies, upon introducing q = (1 − p), to yield:
$$H_G(X) = -\ln p - \frac{(1-p)\ln(1-p)}{p} = \frac{1}{p}\left[-p\ln p - (1-p)\ln(1-p)\right]$$


Compared with the expression for the Bernoulli random variable's entropy, i.e.,
$$H_B(X) = -p\ln p - (1-p)\ln(1-p)$$
we see that both expressions are related according to:
$$H_G = \frac{1}{p} H_B$$
i.e., the entropy of the geometric random variable is exactly the entropy of the Bernoulli random variable divided by p. Note that since p < 1, the implication is that the geometric random variable always has a higher entropy than the corresponding Bernoulli random variable with the same probability-of-success parameter, p.
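This relationship is easy to confirm numerically; a minimal sketch (Python with numpy, truncating the infinite sum at a point where the tail is negligible):

```python
import numpy as np

p = 0.3
HB = -p * np.log(p) - (1 - p) * np.log(1 - p)   # Bernoulli entropy

x = np.arange(1, 2000)                          # truncated support
f = p * (1 - p) ** (x - 1)                      # geometric pmf
HG = -np.sum(f * np.log(f))                     # entropy by direct summation

print(round(HG, 4), round(HB / p, 4))           # both ~2.0362
```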

10.4 For the exponential random variable with pdf:
$$f(x) = \frac{1}{\beta} e^{-x/\beta}; \quad 0 < x < \infty$$

first, we obtain that:
$$\ln f(x) = \ln\frac{1}{\beta} - \frac{x}{\beta} = -\ln\beta - \frac{x}{\beta}$$
from where the entropy, defined as:
$$H(X) = \int_0^{\infty} -\ln f(x) f(x)\, dx$$
is obtained in this specific case as:
$$H(X) = \int_0^{\infty} \left(\ln\beta + \frac{x}{\beta}\right)\frac{1}{\beta}e^{-x/\beta}\, dx = \ln\beta \int_0^{\infty} \frac{1}{\beta}e^{-x/\beta}\, dx + \frac{1}{\beta}\int_0^{\infty} \frac{x}{\beta}e^{-x/\beta}\, dx$$
And since, from the properties of a pdf:
$$\int_0^{\infty} \frac{1}{\beta}e^{-x/\beta}\, dx = 1$$
and from the expected value of the exponential random variable,
$$\int_0^{\infty} \frac{x}{\beta}e^{-x/\beta}\, dx = \beta$$
the expression for the entropy above simplifies to:
$$H(X) = \ln\beta + \frac{\beta}{\beta} = \ln\beta + 1$$
as required.
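As a quick numerical check, scipy's built-in differential entropy for the exponential distribution (assuming scipy is available) agrees with ln β + 1:

```python
import numpy as np
from scipy import stats

beta = 2.5
print(stats.expon(scale=beta).entropy())   # 1.9163...
print(np.log(beta) + 1)                    # same value: ln(beta) + 1
```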

10.5 First, for the gamma random variable, γ(α, β), with pdf
$$f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} e^{-x/\beta} x^{\alpha-1}; \quad 0 < x < \infty$$
we have that:
$$\ln f(x) = -\alpha\ln\beta - \ln\Gamma(\alpha) - \frac{x}{\beta} + (\alpha-1)\ln x$$
Thus, the entropy, defined as:
$$H(X) = \int_0^{\infty} -\ln f(x) f(x)\, dx$$
is obtained in this specific case as:
$$H(X) = (\ln\Gamma(\alpha) + \alpha\ln\beta)\int_0^{\infty} f(x)\, dx + \frac{1}{\beta}\int_0^{\infty} xf(x)\, dx - (\alpha-1)\int_0^{\infty} \ln x\, f(x)\, dx \quad (10.13)$$
And now, because of the following identities:
$$\int_0^{\infty} f(x)\, dx = 1; \quad \int_0^{\infty} xf(x)\, dx = \alpha\beta; \quad \int_0^{\infty} \ln x\, f(x)\, dx = \psi(\alpha) + \ln\beta$$
where the so-called psi-function, ψ(α), is defined as:
$$\psi(\alpha) = \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$$
Eq (10.13) becomes:
$$H(X) = \alpha\ln\beta + \ln\Gamma(\alpha) + \alpha - (\alpha-1)(\psi(\alpha) + \ln\beta) \quad (10.14)$$
$$= \alpha + \ln\beta + \ln\Gamma(\alpha) + (1-\alpha)\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} \quad (10.15)$$
as required. And now, since the χ²(r) random variable is the gamma γ(r/2, 2) random variable, we obtain immediately from Eq (10.15) that the entropy for this random variable is:
$$H(X) = \frac{r}{2} + \ln 2 + \ln\Gamma(r/2) + (1 - r/2)\frac{\Gamma'(r/2)}{\Gamma(r/2)}$$


10.6 For the Gaussian N(µ, σ²) random variable with pdf
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}$$
we obtain that:
$$\ln f(x) = -\ln\sigma - \frac{1}{2}\ln 2\pi - \frac{(x-\mu)^2}{2\sigma^2}$$
so that the entropy is obtained as:
$$H(X) = \int_{-\infty}^{\infty} -\ln f(x) f(x)\, dx = \left(\ln\sigma + \frac{1}{2}\ln 2\pi\right)\int_{-\infty}^{\infty} f(x)\, dx + \frac{1}{2\sigma^2}\int_{-\infty}^{\infty} (x-\mu)^2 f(x)\, dx = \ln\sigma + \frac{1}{2}\ln 2\pi + \frac{\sigma^2}{2\sigma^2}$$
which simplifies to give
$$H(X) = \ln\left(\sigma\sqrt{2\pi e}\right) \quad (10.16)$$

as required. Observe that this expression depends on σ, the Gaussian distribution standard deviation, but is entirely independent of µ, the mean. This makes sense because µ is a measure of location: changes in its value merely shift the entire distribution without changing the shape. Thus, µ has no effect on the uncertainty associated with a Gaussian distribution. On the other hand, how broad or narrow a Gaussian distribution is depends entirely on σ. As a result, it is entirely reasonable that the entropy will depend on σ. In the limit as σ → ∞, we observe from Eq (10.16) that H(X) → ∞ also. The implication is as follows: as σ → ∞, so that the uncertainty in the Gaussian random variable, X, increases to the worst possible limit of complete uncertainty, the entropy also increases to match.

10.7 For the lognormal L(α, β²) random variable with pdf
$$f(x) = \frac{1}{x\beta\sqrt{2\pi}} \exp\left\{\frac{-(\ln x - \alpha)^2}{2\beta^2}\right\}; \quad 0 < x < \infty$$
it follows that:
$$\ln f(x) = -\ln x - \ln\beta - \frac{1}{2}\ln 2\pi - \frac{(\ln x - \alpha)^2}{2\beta^2}$$
from where we obtain the entropy as follows:
$$H(X) = \int_0^{\infty} -\ln f(x) f(x)\, dx = \left(\ln\beta + \frac{1}{2}\ln 2\pi\right)\int_0^{\infty} f(x)\, dx + \int_0^{\infty} \ln x\, f(x)\, dx + \frac{1}{2\beta^2}\int_0^{\infty} (\ln x - \alpha)^2 f(x)\, dx$$
Since, for the lognormal L(α, β²) random variable, we know from Eq (9.146) in the text that:
$$\int_0^{\infty} \ln x\, f(x)\, dx = E(\ln X) = \alpha; \quad \int_0^{\infty} (\ln x - \alpha)^2 f(x)\, dx = Var(\ln X) = \beta^2$$
the expression for the entropy above therefore simplifies to:
$$H(X) = \ln\beta + \frac{1}{2}\ln 2\pi + \alpha + \frac{\beta^2}{2\beta^2}$$
so that, finally:
$$H(X) = \ln\left(\beta\sqrt{2\pi e}\right) + \alpha \quad (10.17)$$

as required. When this expression is compared to the one in Eq (10.16), we find that β here plays the role of σ in the former expression; and, while the former expression does not involve µ, the current expression for the lognormal random variable depends linearly on α. This observation is not surprising. While values of the mean µ of a Gaussian distribution affect only the location of the distribution, not the shape, α, on the other hand, is a scale parameter for the lognormal random variable: it does not shift the distribution's location but rather scales its magnitude (see Fig 9.9 in the text). Thus, as α increases, the spread (and hence the entropy) of the distribution changes.

10.8 (i) Under the condition G(Xi) = 0; i = 1, 2, . . . , k, f(x) for this discrete random variable becomes:
$$f(x_i) = C; \quad i = 1, 2, \ldots, k$$
and since the constraint,
$$\sum_{i=1}^{k} f(x_i) = 1$$
must hold, the implication is that
$$kC = 1 \Rightarrow C = \frac{1}{k}$$
so that, finally:
$$f(x_i) = \frac{1}{k}; \quad i = 1, 2, \ldots, k$$
(ii) Under the condition G(X) = 0; a < X < b, f(x) for this continuous random variable becomes:
$$f(x) = C; \quad a < x < b$$


and since
$$\int_a^b f(x)\, dx = 1$$
must hold, the implication is that:
$$\int_a^b C\, dx = C(b-a) = 1; \quad \Rightarrow C = \frac{1}{b-a}$$
As such, we obtain, finally:
$$f(x) = \frac{1}{b-a}; \quad a < x < b$$

... ϕ > 1 always. Thus, the Beta-Binomial random variable variance is inflated by ϕ, so that this parameter may rightly be referred to as a "variance inflation" factor. Thus, the Beta-Binomial random variable may be considered as an overdispersed binomial random variable, in precisely the same manner in which the Poisson-Gamma random variable (with the same pdf as the negative binomial random variable) is an overdispersed Poisson random variable. In general, the extent to which the "variance inflation" (or overdispersion) parameter ϕ is different from 1 is the extent to which the binomial model is inadequate (by itself) to describe the data at hand; for cases where ϕ is close to 1, the binomial model will be reasonably adequate, and augmentation with a beta distribution in order to describe the variability in p better may not be necessary.


11.2 Modeling the Clinical Data

The objective is to model the IVF clinical data of Elsner and co-workers using a Beta-Binomial pdf. This involves
• Estimating unknown pdf parameters from the data;
• Drawing insight about patient cohort groups from the parameter estimates; and
• Assessing the model fit.

11.2.1 Parameter Estimation

Compute the mean, x̄, and variance, s², of the complete Elsner data shown in Table 11.2. Determine appropriate estimates for the Beta-Binomial model parameters, α and β, in terms of the values computed for x̄ and s² from the data.

Background

For the purposes of this exercise, the clinical data set shown in Table 11.2 in the text is actually a collection of 6 data sets: for each cohort group, i, (i = 1, 2, . . . , 6), Ni patients each receive the same number, ni, of embryos (with n1 = 1, n2 = 2, . . . , ni = i, . . . , n6 = 6); the data record is ηi(x), the number of patients in cohort group i with pregnancy outcome x. In terms of characterizing the random phenomenon in question, observe that the "experiment" consists of transferring ni embryos to each patient in the ith cohort group; the "outcome," X, is the resulting number of live births for each patient. The "experiment" is replicated Ni times, and the recorded data, the final result, is ηi(x), the total number of times the pregnancy outcome x is obtained. This, of course, is akin to the binomial experiment of "tossing a coin" ni times, recording X, the number of "heads" observed after each experiment, and then repeating the entire experiment Ni times, finally tallying up as ηi(x) the total number of occurrences of the outcome x.

Computational Procedure

The upcoming data analysis is based on applying the following procedure to the data record for each cohort group. First, the raw data in Table 11.2 will be converted to empirical frequencies, fE(x), from which sample means, x̄, and sample variances, s², will be computed in the usual fashion, i.e.:
$$\bar{x} = \sum_j x_j f_E(x_j); \quad s^2 = \sum_j x_j^2 f_E(x_j) - \bar{x}^2$$

These sample values will then be used to obtain method-of-moments estimates of the unknown Beta-Binomial model parameters, α and β, as follows:


Values of π and ϕ will be determined from the values computed for x̄ and s², using the following expressions:
$$\pi = \frac{\bar{x}}{n}; \quad \phi = \frac{s^2}{n\pi(1-\pi)}$$
The estimates thus obtained, π̂ and ϕ̂, will then be used to estimate α and β by solving simultaneously the following two equations:
$$\hat{\pi} = \frac{\alpha}{\alpha+\beta}; \quad \hat{\phi} = \frac{\alpha+\beta+n}{\alpha+\beta+1}$$

This task may be accomplished as follows. First, observe that the second equation is easily rearranged to yield:
$$\hat{\phi} = \frac{1 + n\hat{\pi}/\alpha}{1 + \hat{\pi}/\alpha}$$
which further rearranges to:
$$\alpha\hat{\phi} + \hat{\phi}\hat{\pi} = \alpha + n\hat{\pi}$$
This is now solved for α to yield:
$$\hat{\alpha} = \left(\frac{n - \hat{\phi}}{\hat{\phi} - 1}\right)\hat{\pi}; \quad \hat{\phi} \neq 1 \quad (11.8)$$
Next, because the first equation rearranges to:
$$1 + \frac{\beta}{\alpha} = \frac{1}{\hat{\pi}}$$
we obtain:
$$\beta = \alpha\left(\frac{1-\hat{\pi}}{\hat{\pi}}\right)$$
and upon introducing Eq (11.8), the final result is:
$$\hat{\beta} = \left(\frac{n - \hat{\phi}}{\hat{\phi} - 1}\right)(1 - \hat{\pi}); \quad \hat{\phi} \neq 1 \quad (11.9)$$
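The estimation procedure of Eqs (11.8) and (11.9) is mechanical enough to code directly; the following Python sketch (the function name and layout are illustrative) reproduces the n = 2 cohort estimates obtained below:

```python
def beta_binomial_mom(n, xbar, s2):
    """Method-of-moments estimates (alpha, beta) per Eqs (11.8)-(11.9)."""
    pi = xbar / n
    phi = s2 / (n * pi * (1 - pi))     # binomial variance inflation
    factor = (n - phi) / (phi - 1)     # requires phi != 1
    return factor * pi, factor * (1 - pi)

# Cohort receiving n = 2 embryos (xbar and s2 from Table 11.2):
print(beta_binomial_mom(2, 0.3259, 0.3042))   # ~(1.25, 6.43)
```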

Data Analysis Results

Table 11.1 here shows the clinical data in terms of empirical frequencies, obtained by normalizing each column of the raw data in Table 11.2 in the text by the number of patients in each cohort group (i.e., fEi(x) = ηi(x)/Ni). The result of the data analysis for each cohort group receiving ni = 1, 2, . . . , 6 embryos (i.e., x̄, s², π̂ and ϕ̂, determined as indicated in the previous subsection)


Table 11.1: Empirical frequencies for IVF data: proportion of patients receiving n = 1, 2, . . . , 6 embryos with pregnancy outcome x

    x (delivered
    pregnancy outcome)   fE1(x)   fE2(x)   fE3(x)   fE4(x)   fE5(x)   fE6(x)
    0                    0.9031   0.7164   0.6248   0.6046   0.5957   0.50
    1                    0.0969   0.2413   0.2481   0.2488   0.2766   0.25
    2                    0        0.0423   0.1120   0.1009   0.1064   0.25
    3                    0        0        0.0151   0.0385   0.0213   0
    4                    0        0        0        0.0072   0        0
    5                    0        0        0        0        0        0
    Total                1.000    1.000    1.000    1.000    1.000    1.000

is shown in Table 11.2 here. First, compare the estimates obtained here for π̂ with the corresponding values shown in Table 11.4 in the text for the "overall p̂ est"; they are identical, as expected.
Note also the values of ϕ̂, the estimated binomial "variance inflation" factor. These values indicate that:
1. The cohort group receiving n = 6 embryos shows the least amount of overdispersion (variance inflation), so that the basic binomial model may not require a beta distribution augmentation in representing this data;
2. The cohort groups receiving n = 2 and n = 5 embryos show the next highest amount of overdispersion: noticeable, but still modest. A beta distribution augmentation will improve the plain binomial model prediction somewhat.
3. The cohort groups receiving n = 3 and n = 4 embryos show the highest amount of overdispersion (more than 30% in one case, 50% in the other). The binomial model will be the most inadequate for these data sets and will therefore benefit the most from a beta distribution augmentation.
The Beta-Binomial pdf parameters estimated from these sample statistics are shown in Table 11.3.

11.2.2 Analysis of the Estimated Beta Distributions

Plot the theoretical pdfs for f (p) using the values you determined for the parameters α and β. Discuss what this implies about how the SEPS parameter is distributed in the population involved in the Elsner clinical study.


Table 11.2: Data analysis results for each cohort group receiving n embryos

    n    x̄        s²       π̂       ϕ̂
    1    0.0969   0.0875   0.0969   1.0000
    2    0.3259   0.3042   0.1630   1.1153
    3    0.5174   0.5644   0.1725   1.3182
    4    0.5950   0.7602   0.1487   1.5011
    5    0.5532   0.5876   0.1106   1.1943
    6    0.7500   0.6875   0.1250   1.0476

Table 11.3: Computed Beta-Binomial distribution parameter estimates for each cohort group receiving n embryos

    n    α̂       β̂
    1    –        –
    2    1.251    6.425
    3    0.912    4.374
    4    0.742    4.245
    5    2.167    17.419
    6    13       91



Figure 11.1: Estimated probability distribution for SEPS parameter, p, for the cohort group receiving n = 2 embryos.

Probability Distribution Function Characteristics

The parameter estimates α̂ and β̂ shown in Table 11.3 here belong to the beta pdfs indicated by the data sets as the best representation of how p, the SEPS parameter, is distributed for each cohort group. The parameter p for the cohort group receiving only one embryo, of course, does not admit of a distribution (for each patient, there is only a single embryo involved); the distribution plots for the cohort groups receiving n = 2, 3, 4, 5 and 6 embryos are shown, respectively, in Figs 11.1, 11.2, 11.3, 11.4 and 11.5. These beta pdf results provide the following insights:
1. First, each estimate π̂ shown in Table 11.2 here should be compared with the corresponding pdf plot, since, for each n, π̂ is the expected value of the corresponding pdf plotted here. For example, for n = 2, π̂ = 0.163, while the corresponding complete distribution is shown in Fig 11.1. Thus, while the data average indicates a SEPS parameter value of 0.163 for this cohort group, the Beta-Binomial analysis indicates that a more complete representation is the distribution of parameter values shown in Fig 11.1, for which 0.163 is just the average.
2. In terms of general distribution characteristics, the analysis suggests that the parameter distribution for the cohort group receiving n = 3 embryos is similar to that for the cohort group receiving n = 4 embryos. This class of distributions is characterized by a distinctive left-handed J-shape, with the most likely values for p concentrated around 0, and limited mostly to the region between 0 and 0.3. (Also, these cohort groups have the largest sample sizes, N3 = 661 patients and N4 = 832 patients.) Furthermore,



Figure 11.2: Estimated probability distribution for SEPS parameter, p, for the cohort group receiving n = 3 embryos.


Figure 11.3: Estimated probability distribution for SEPS parameter, p, for the cohort group receiving n = 4 embryos.



Figure 11.4: Estimated probability distribution for SEPS parameter, p, for the cohort group receiving n = 5 embryos.


Figure 11.5: Estimated probability distribution for SEPS parameter, p, for the cohort group receiving n = 6 embryos.


the general characteristics of this class of distributions are distinct from those of the cohort groups receiving n = 2 and n = 5 embryos. This latter group shares a common right-skewed characteristic for which the most likely values of p, even though still low, are concentrated away from 0, not around 0 as is the case with the former group. Note that these observations echo earlier statements concerning the values of ϕ̂: the n = 3 and n = 4 cohort groups, with their 30% or more variance inflation, have similar J-shaped distributions; the n = 2 and n = 5 cohort groups have smaller variance inflation (10–20%), and distributions of similar shape.
3. The distribution for the cohort group receiving n = 6 embryos is unique in that it is the least skewed (almost symmetric, in fact), but it is also the one with the narrowest range: most of the possible probabilities are concentrated in the range from 0.05 to 0.25. This observation appears consistent with the practice of transferring more embryos for patients with the lowest probability of success. Also, as noted earlier, this cohort group, with ϕ̂ = 1.0476, has less than 5% variance inflation; as such, little or no beta distribution augmentation is necessary, so that the binomial model prediction will be very similar to the Beta-Binomial prediction. Nevertheless, keep in mind that the number of patients in this subpopulation is extremely small (N6 = 4 patients). The results obtained for this cohort group should therefore be interpreted with caution.

11.2.3 Model Assessment

Compute probabilities for x = 0, 1, . . . , 6, using the Beta-Binomial model and compare with the data. Use the model to predict delivered pregnancy outcomes for each cohort group and for the entire population; compare the predictions with the Elsner clinical data.

We are now in a position to compute the theoretical probabilities of observing the delivered pregnancy outcomes x = 0, 1, 2, . . . , 6, for each cohort group, using the Beta-Binomial model, which may be conveniently written as:
$$f_{BB}(x|n, \alpha, \beta) = \frac{n!}{x!(n-x)!} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha+x)\Gamma(\beta+n-x)}{\Gamma(\alpha+\beta+n)} \quad (11.10)$$
For each n = 2, 3, . . . , 6, and the corresponding values of α and β shown in Table 11.3, Eq (11.10) is used to determine the predicted probability distribution (see the sketch below). The result is shown in Table 11.4. (Note: there is no Beta-Binomial pdf for the cohort group with n = 1; the result shown is based on the Bernoulli (binomial with n = 1) distribution with the estimated p̂ = 0.0969.) A comparison of this table with the empirical frequency distribution obtained directly from the data (Table 11.1 here) shows how remarkably well these predictions match the data.
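Eq (11.10) is implemented directly by scipy.stats.betabinom (available in scipy 1.4 and later; an assumption about the reader's toolchain), which can serve as a convenient cross-check of the entries in Table 11.4:

```python
from scipy import stats

n, alpha, beta = 2, 1.251, 6.425          # n = 2 cohort, Table 11.3
dist = stats.betabinom(n, alpha, beta)    # pmf matches Eq (11.10)
print([round(dist.pmf(x), 4) for x in range(n + 1)])
# ~[0.716, 0.241, 0.042], agreeing with the fBB2 column of Table 11.4
# (small differences reflect parameter rounding)
```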


Table 11.4: Predicted Beta-Binomial model probability distribution for the IVF data: proportion of patients receiving n = 1, 2, . . . , 6 embryos with pregnancy outcome x

    x       f̂BB1(x)   f̂BB2(x)   f̂BB3(x)   f̂BB4(x)   f̂BB5(x)   f̂BB6(x)
    0       0.9031    0.7158    0.6184    0.6044    0.5878    0.4578
    1       0.0969    0.2412    0.2654    0.2476    0.2973    0.3720
    2       0         0.0423    0.0944    0.1036    0.0922    0.1370
    3       0         0         0.0210    0.0361    0.0198    0.0292
    4       0         0         0         0.0080    0.0028    0.0037
    5       0         0         0         0         0.0002    0.0003
    Total   1.0000    0.9993    0.9992    0.9997    1.0001    1.0000

The corresponding predicted outcome, η̂BB(x), the total number of patients with pregnancy outcome x, is obtained for each cohort group by multiplying each f̂BBi(x) by Ni, the total number of patients in cohort group i, i.e.,
$$\hat{\eta}_{BB_i}(x) = \hat{f}_{BB_i}(x)\, N_i$$
The result, shown in Table 11.5, should be compared with the raw data in Table 11.2 in the text, and the plain binomial model prediction in Table 11.3 in the text. The Beta-Binomial predictions are seen to fit the data uniformly better than the plain binomial model predictions. The following figures show a comparison of the clinical data's empirical distribution versus both the basic binomial model predictions and the Beta-Binomial model predictions, for each cohort group (leaving out the "trivially perfect" n = 1 group). Note especially Figs 11.7 and 11.8, where the binomial model fit is particularly poor in each case. Recall that earlier, strictly on the basis of the estimated values for ϕ̂ (the binomial variance inflation factor), we had stated that the binomial model would be inadequate for these two data sets. This pre-model assessment statement is borne out by these two figures. Another similar pre-model assessment statement is borne out by the result in Fig 11.10 for the n = 6 cohort group. The closeness to 1 of the estimated ϕ̂ for this group prompted the earlier statement that, among other things, the binomial and Beta-Binomial model predictions will be similar; this is indeed the case in Fig 11.10. On the other hand, the poor fit of both models is primarily due to the extremely small sample size, N6 = 4, making it difficult to obtain parameter estimates with reasonable precision. Finally, having seen the Beta-Binomial model fit to data for the separate cohort groups, it is now instructive to see how this translates to a prediction of the overall distribution of pregnancy outcomes (similar to what is represented in Figure 11.1 in the text). The result is shown in Fig 11.11, where the Beta-Binomial model prediction (short dashed line, diamond) is seen to fit the data (solid line, circle) so well that it is hard to differentiate the two. The basic binomial model fit (long dashed line, square) is definitely not as good.

Table 11.5: Beta-Binomial model prediction of the Elsner, et al. data: predicted number of patients receiving n = 1, 2, . . . , 6 embryos with pregnancy outcome x (η̂BBT(x) is the total number of patients with pregnancy outcome x)

    x       η̂BB1(x)   η̂BB2(x)   η̂BB3(x)   η̂BB4(x)   η̂BB5(x)   η̂BB6(x)   η̂BBT(x)
    0       205.004   287.763   408.744   502.760   27.627    1.832     1433.73
    1       21.996    96.968    175.451   205.962   13.975    1.488     515.84
    2       0         16.986    62.423    86.177    4.335     0.548     170.47
    3       0         0         13.853    30.035    0.930     0.117     44.93
    4       0         0         0         6.619     0.1305    0.015     6.77
    5       0         0         0         0         0.009     0.001     0.01
    Total   227       401.717   660.471   831.553   47.007    4.001     2171.75



Figure 11.6: Elsner data empirical distribution (solid line, circle) versus binomial model prediction (long dashed line, square) and Beta-Binomial model prediction (short dashed line, diamond) for the cohort group receiving n = 2 embryos. Note how the data is indistinguishable from the Beta-Binomial fit.


Figure 11.7: Elsner data empirical distribution (solid line, circle) versus binomial model prediction (long dashed line, square) and Beta-Binomial model prediction (short dashed line, diamond) for the cohort group receiving n = 3 embryos. Note the discrepancy between the binomial model fit and the data, and the improved fit provided by the Beta-Binomial model.



Figure 11.8: Elsner data empirical distribution (solid line, circle) versus binomial model prediction (long dashed line, square) and Beta-Binomial model prediction (short dashed line, diamond) for the cohort group receiving n = 4 embryos. Note the discrepancy between the binomial model fit and the data, and the improved fit provided by the Beta-Binomial model.


Figure 11.9: Elsner data empirical distribution (solid line, circle) versus binomial model prediction (long dashed line, square) and Beta-Binomial model prediction (short dashed line, diamond) for the cohort group receiving n = 5 embryos. Note the improvement in model fit provided by the Beta-Binomial model.



Figure 11.10: Elsner data empirical distribution (solid line, circle) versus binomial model prediction (long dashed line, square) and Beta-Binomial model prediction (short dashed line, diamond) for the cohort group receiving n = 6 embryos. With a sample size of only 4 patients, the data is the least “reliable”. Note how similar the binomial and Beta-Binomial fits are.


Figure 11.11: Complete Elsner data (solid line, circle) versus binomial model prediction (long dashed line, square) and Beta-Binomial model prediction (short dashed line, diamond).



Figure 11.12: Complete Elsner data (solid line, circle) versus binomial model prediction using stratified data, as in Fig 11.6 in the text (long dashed line, square), and Beta-Binomial model prediction (short dashed line, diamond).

Fig 11.12 shows the improvement in model fit to data provided by the ad-hoc stratification of the data (as discussed in the text) versus that obtained with the more rigorous Beta-Binomial model. The data stratification clearly improves the binomial model fit, but not as much as does the formal incorporation of a beta distribution to produce the Beta-Binomial model.
A quantitative (and more revealing) evaluation of the overall model fit to the complete Elsner data is provided by a regression analysis of the data ηT(x), the total number of patients with pregnancy outcome x as observed in the clinical study, versus the various model predictions of the same quantity: η̂T(x) for the plain binomial model, η̂TS(x) for the "stratified" binomial model, and η̂TBB(x) for the Beta-Binomial model. Note that for the "perfect" model, such a regression analysis will result in a slope of 1 and an intercept of 0, with an R² value of 100%. The extent to which the results of the regression analysis approach this ideal provides an indication of model fit.
The results are shown in Fig 11.13 for the plain binomial model fit, Fig 11.14 for the "stratified" binomial model fit, and, finally, Fig 11.15 for the Beta-Binomial model fit. In spite of the fact that the data span 3 orders of magnitude, the model fits all appear quite reasonable. The "stratified" binomial model fit provides a slight improvement over the basic binomial fit in terms of R² and R²adj values, and also in terms of S, the root mean squared deviation, which is halved from 98.7 for the plain binomial model to 49.05 for the "stratified" model. However, the regression line's slope and intercept are essentially the same for both models. On the other hand, the regression line's slope and intercept for the Beta-Binomial model predictions are near perfect.


[Fitted line plot: NTData = −5.50 + 1.018 NTh; S = 98.73, R² = 97.1%, R²(adj) = 96.5%]

Figure 11.13: Regression of the complete Elsner data versus the basic binomial model prediction.

In addition, for the Beta-Binomial model, S, the root mean squared deviation, has been reduced by an order of magnitude, and the R² and R²adj values are literally perfect.

11.3 Summary and Conclusions

We have shown how to develop a Beta-Binomial model for the IVF clinical data set, and demonstrated that this mixture model's predictions fit extremely well to the specific data set used in the case study. We also showed that the mixture model provided a much better fit to the data than did a plain binomial model. The data stratification discussed in the text may now be seen as simply an ad-hoc method of characterizing the unavoidable distribution of values for the SEPS parameter, p, for the various cohort groups, a task that is achieved more rigorously with a beta probability distribution. Overall, the analysis presented above confirms two things:
1. That for each single patient, the process of IVF can indeed be idealized as a binomial Bi(n, p) phenomenon, with n as the number of embryos transferred, and the parameter p, associated with the patient-embryo combination, indicative of the probability of success (where "success" is understood to mean a live birth resulting from a single transferred embryo);
2. In analyzing clinical data involving a great variety of patients and cohort groups, while the binomial model remains valid for the individual patient, the entire population will be better represented by a Beta-Binomial model arising from a recognition that the SEPS parameter p will admit of a variety of values and is best represented by a beta B(α, β) distribution.


[Fitted line plot: NTData = −5.15 + 1.017 NThS; S = 49.05, R² = 99.3%, R²(adj) = 99.1%]

Figure 11.14: Regression of the complete Elsner data versus the “stratified” binomial model prediction.

[Fitted line plot: NTData = −0.332 + 1.002 NThBB; S = 7.46, R² = 100.0%, R²(adj) = 100.0%]

Figure 11.15: Regression of the complete Elsner data versus the Beta-Binomial model prediction.

Chapter 12

Exercises

Section 12.1

12.1 (i) The random variable is the number shown on the die's top face. The random variable space is:
$$V_X = \{1, 2, 3, 4, 5, 6\}$$
or the totality of the numbers that can possibly show on the die.
(ii) The population is the entire collection of all the numbers that can possibly be obtained from an infinite repetition of the "experiment" of tossing a single die once. A specific set of numbers obtained from a finite number of such single die tosses will constitute a sample from the population.
(iii) Since the sample space is Ω = {1, 2, 3, 4, 5, 6}, and since, for a fair die, the probability of observing any particular number will be equal, the following uniform probability distribution will be reasonable for this random variable:
$$P_X(X = 1) = P_X(X = 2) = \cdots = P_X(X = 6) = \frac{1}{6}$$
Thus, the postulated probability model is
$$f(x) = \frac{1}{6}; \quad x = 1, 2, 3, 4, 5, 6$$

2

CHAPTER 12.

values {0, 1, 2}), since the total number of games in the series is 2. In addition, because of this very same constraint (that the total number of games must be exactly 2), it must also hold for this random variable that X W + XD ≤ 2 Consequently, the associated random variable space is: VX = {(2, 0), (1, 0), (1, 1), (0, 0), (0, 1), (0, 2)} (ii) The population in this case is the totality of all the outcomes, Xi = (XWi , XDi ), i = 1, 2, . . . , observed when the player participates in an infinite number of two-game matches. A sample of size N from this population will be the specific outcomes X1 , X2 , . . . , XN observed after N such two-game matches, where, for the ith match, Xi = (XWi , XDi ), with the first entry in the ordered pair being the total number of wins, and the last entry, the total number of losses. (iii) Let pW be the probability of winning a single game, and pD , the probability of drawing. (Note that pL , the probability of losing, is obtained immediately as (1 − pW − pD ) since the phenomenon in question consists of only the indicated three mutually exclusive outcomes.) Observe that the two-dimensional random variable, X, has all the characteristics of a trinomial random variable, with n, the number of trials, as 2. Thus, from Chapter 8, we see that a reasonable probability model for this random variable is: f (xW , xD ) =

2! pxW pxD (1 − pW − pD )(2−xW −xD ) xW !xD !xL ! W D

(12.1)

where XL = (2 − xW − xD ). The two unknown parameters are pW and pD . 12.3 The random variable space is VT = {T : 0 < T < ∞} where the random variable, T , represents the time in between incidences. The population is the totality of all possible observable times between occurrences of accidents. A finite collection of actual observed times between accidents in the 10 year period, or a random selection from this, will constitute sample. The exponential E(β) distribution is a reasonable probability model for this random variable, if the accidents are postulated to be rare, Poisson events, occurring at a constant mean rate, η. This distribution has one parameter, β, which is the same as the inverse of η, the constant mean rate of occurrence of the Poisson events (accidents). 12.4 The random variable space is VT = {T : 0 < T < ∞}

The population is the totality of the lifetimes observable from the entire collection of all washing machines of the brand in question. The population parameters are the Weibull distribution characteristic parameters, ζ and β. A sample {xi; i = 1, 2, ..., 50} is obtained by selecting 50 such washing machines (at random), testing them under identical conditions until failure, and noting the time to failure for each one.

Section 12.2
12.5 (i) Qualitative variable, nominal data
(ii) Quantitative variable, continuous data
(iii) Qualitative variable, ordinal data
(iv) Quantitative variable, continuous data
(v) Quantitative variable, continuous data
(vi) Quantitative variable, discrete data

12.6 (i) Qualitative variable, ordinal data
(ii) Quantitative variable, continuous data
(iii) Quantitative variable, discrete data
(iv) Qualitative variable, nominal data
(v) Quantitative variable, continuous data

12.7 (i) Qualitative variable, nominal data
(ii) Quantitative variable, continuous data
(iii) Qualitative variable, nominal data
(iv) Quantitative variable, continuous data
(v) Quantitative variable, discrete data
(vi) Quantitative variable, continuous data
(vii) Qualitative variable, nominal data

Section 12.3
12.8 The bar charts are shown in Fig 12.1, and the pie charts in Fig 12.2.

Figure 12.1: Bar charts showing employers of chemical engineering graduates, categorized by degree: (a) BS, (b) MS, (c) PhD.

Some of the most prominent characteristics of these data sets are as follows:

Figure 12.2: Pie charts showing employers of chemical engineering graduates, categorized by degree: (a) BS, (b) MS, (c) PhD.

• The vast majority of chemical engineers are employed in industry, regardless of degree.
• Virtually all academic jobs are held by chemical engineers with PhDs.
• The percent unemployment drops sharply from BS holders to MS holders, while the difference between the percent unemployed MS and PhD holders is less prominent.
• A large number of MS students go on to graduate/professional schools.

12.9 The Pareto charts are shown in Fig 12.3, from which we are able to make the following observations:
• More than half of all chemical engineering graduates with BS or PhD degrees are employed in industry; the percentage is only slightly less than 50% for those with MS degrees.
• For graduates with BS degrees, more than 80% are employed in industry, go on to graduate/professional school, or work in an unknown/unspecified field.
• For graduates with MS degrees, industry and graduate/professional school account for almost 80% of the employment.

• Industry and academia employ approximately 75% of PhD graduates; adding contributions from graduate/professional school brings the total to almost 90%.

Figure 12.3: Pareto charts showing employers of chemical engineering graduates, categorized by degree: (a) BS, (b) MS, (c) PhD.

12.10 The bar and Pareto charts for this data set are shown in Fig 12.4.

Figure 12.4: Bar chart and Pareto chart of data from a 1983 report showing American participation in different sports.

According to the bar chart, the category with the largest number of participants is swimming, followed by bicycling. Skiing has the fewest participants, only slightly fewer than ice skating. From the Pareto chart, we observe that swimming accounts for almost 20% of the participants. However, there are so many categories that it takes up to 9 of them cumulatively to account for approximately 80% of the participants. The Pareto chart does not show the typical "elbow" followed by a relatively flat approach to 100% cumulative contribution; instead, this chart shows an "almost smooth" and monotonically increasing line, indicating the obvious point that there are many categories, and that many of them have near-equal contributions. This is especially so in the middle, where there are several categories in a row, all with almost identical participation. It is inadvisable to use a pie chart for this data set, primarily because the large number of categories will result in a chart with too many slivers. In addition, the relative percentages for many of the categories are similar, making them indistinguishable in such a crowded pie chart.

12.11 The bar chart is shown in Fig 12.5 and a corresponding Pareto chart in Fig 12.6. One finds 83% of the WTD individuals in 6 of the states (California, Texas, Florida, New York, Illinois, and Pennsylvania).


Figure 12.5: Bar chart of WTD population in various states in 1985.

Figure 12.6: Pareto chart of WTD population in various states in 1985.


Figure 12.7: Histograms of the N, L, G, I data sets.

Section 12.4
12.12 The histograms are shown in Fig 12.7, the box plots in Fig 12.8. The histograms indicate that while the N population data set appears symmetric, the others appear skewed. Even though skewed to the right, the I data set has a very narrow range (from ∼0.05 to ∼0.25) while the L data set occupies a larger range (from 0 to 35). The box plots emphasize the skewness of the I data set the most, as indicated by the "outlier" at 0.25 identified with an asterisk.


Figure 12.8: Box plots of the N, L, G, I data sets.


12.13 The required summary statistics are shown in the table below. Note that in every case, Eq (12.25) in the text holds, i.e., x̄h < x̄g < x̄.

Random variable   x̄        x̄g       xm        x̄h
XN                10.0364   9.9896    9.9987    9.9419
XL                12.8397   7.9599    7.3051    4.2052
XG                11.8109   10.9640   11.4796   10.0484
XI                0.1184    0.1096    0.1011    0.1020

Consistent with the earlier observation in Problem 12.12 that the N data set appears symmetric, we observe that all the computed measures (with the exception of the harmonic mean, x̄h) have approximately the same value of 10.0, a number that appears exactly in the middle of the histogram. In particular, the arithmetic mean and the median are virtually identical and appear to be slightly more representative of the central location of the data set. (These facts are consistent with the characteristics of Gaussian-distributed data; and in fact, the data set was generated from a Gaussian distribution.)

For the L population, the arithmetic mean (12.8) and the harmonic mean (4.2) appear to be far removed from the geometric mean (7.96) and the median (7.30), which are close to each other. From the histogram of this data set, the geometric mean and the median appear to be more representative of the central location than the arithmetic mean (which appears to be too high) or the harmonic mean (which appears to be too low). (The data set was actually generated from a lognormal distribution; and we observe that the conclusions noted here are consistent with the characteristics of lognormally distributed data. In particular, the geometric mean is the most appropriate measure of central location, since what sums are to normally distributed data, products are to lognormally distributed data.)

For the G population data set, even though the measures are somewhat close to one another, the arithmetic mean (11.8) appears high, the harmonic mean (10.0) appears low, and the geometric mean (10.96) appears a bit low, while, upon inspection of the histogram, the median (11.5) appears to be most representative of the central location. (This observation is consistent with the fact that the data set was generated from a gamma distribution with modest variance, hence the less pronounced skewness.)

For the I data set, the median and the harmonic mean are virtually identical, while the arithmetic and geometric means appear relatively larger than the others. An inspection of the histograms indicates that the median and harmonic mean appear to be more representative of the central location. (This is consistent with the fact that the data was generated from an inverse gamma distribution; as such, the harmonic mean will be more representative of the central location of the data set.)

12.14 The data histogram is shown in Fig 12.9.
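The four measures of central location compared in 12.13 are all one-liners in standard numerical libraries. The following is a minimal sketch; the short data vector is hypothetical, standing in for the actual N/L/G/I data sets.

```python
# Sketch: arithmetic, geometric, and harmonic means, plus the median.
import numpy as np
from scipy import stats

x = np.array([9.8, 10.1, 10.3, 9.7, 10.2])  # hypothetical sample

arithmetic = np.mean(x)
geometric  = stats.gmean(x)   # exp(mean(log x)); requires x > 0
harmonic   = stats.hmean(x)   # n / sum(1/x); requires x > 0
median     = np.median(x)

print(arithmetic, geometric, median, harmonic)
# The inequality x̄_h <= x̄_g <= x̄ of Eq (12.25) holds for any positive sample.
```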

Figure 12.9: Histogram of inter-origin distances data of Li et al.

The mean is obtained from:

x̄ = ∑_{i=1}^{n} xi fr(xi)    (12.2)

where fr(xi) is the relative frequency associated with each distance, xi. The result is:

x̄ = 58.95 kb    (12.3)

The variance is obtained from

ŝ² = ∑_{i=1}^{n} (xi − x̄)² fr(xi)    (12.4)

which, upon introducing Eq (12.3) above, yields:

ŝ² = 1,226.4

The median is estimated by finding the value, xn, for which ∑_{i=0}^{n} fr(xi) ≈ 0.5. Observing that

∑_{i=1}^{4} fr(xi) = 0.00 + 0.09 + 0.18 + 0.26 = 0.53

and because x4 = 45, we conclude that the best estimate of the median obtainable from the data set is 45 kb.
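Eqs (12.2) and (12.4) are straightforward weighted sums; the sketch below shows the computation pattern. The bin midpoints and relative frequencies are hypothetical stand-ins for the Li et al. histogram, not the actual data.

```python
# Sketch: mean, variance, and median from a relative-frequency table.
import numpy as np

x  = np.array([15, 30, 45, 60, 75, 90])                # bin midpoints (kb), illustrative
fr = np.array([0.09, 0.18, 0.26, 0.22, 0.15, 0.10])    # relative frequencies (sum to 1)

xbar = np.sum(x * fr)                                  # Eq (12.2)
s2   = np.sum((x - xbar) ** 2 * fr)                    # Eq (12.4)
# Median estimate: first x at which the cumulative relative frequency reaches 0.5
median = x[np.searchsorted(np.cumsum(fr), 0.5)]
print(xbar, s2, median)
```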



Figure 12.10: Scatter plot of city gas mileage versus engine capacity.


Figure 12.11: Scatter plot of city gas mileage versus number of cylinders.

12.15 The scatter plots are shown in Fig 12.10 and Fig 12.11. Each plot is very similar to its corresponding highway mileage counterpart in the text, even down to the same two potential "outliers" identified in Fig 12.16 in the text for the plot versus engine capacity (see Fig 12.10 here). This observation should not be surprising, since city and highway gas mileage are expected to be highly correlated.

12.16 The required 6 pairwise correlations are computed and listed in the table below.

Variables   Correlation Coefficient
X1 and Y1   −0.8504
X2 and Y1   −0.9315
X1 and Y2   −0.7574
X2 and Y2   −0.8980
X1 and X2   0.8911
Y1 and Y2   0.9648
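Such pairwise correlation coefficients can be computed directly with np.corrcoef; the sketch below uses hypothetical numbers for one (engine capacity, highway mpg) pair, not the actual Table 12.5 data.

```python
# Sketch: one pairwise correlation coefficient via numpy.
import numpy as np

x1 = np.array([1.8, 3.0, 4.6, 5.7, 8.0])        # hypothetical engine capacities (L)
y1 = np.array([33.0, 28.0, 22.0, 18.0, 15.0])   # hypothetical highway mpg
print(np.corrcoef(x1, y1)[0, 1])                # strongly negative, as in the table
```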

Engine capacity (X1) is strongly negatively correlated with both highway (Y1) and city gas mileage (Y2), implying that cars with larger engines typically get lower gas mileage, in the city or on the highway. The number of cylinders (X2) is also strongly negatively correlated with both highway and city gas mileage, with the correlations generally stronger in each corresponding case than those for engine capacity. Thus, cars whose engines have more cylinders quite consistently get poorer gas mileage. Engine capacity is strongly positively correlated with the number of cylinders, indicating, not surprisingly, that engines with more cylinders tend to have higher capacities. The strongest correlation by far is that between city and highway gas mileage. This very strong positive correlation has the obvious implication that cars with higher city gas mileage also tend to have higher highway gas mileage.

12.17 Directly from the data given in Table 12.8 in the text, we are able to compute the following characteristics:

x̄1 = x̄2 = x̄3 = x̄4 = 9.0
ȳ1 = ȳ2 = ȳ3 = ȳ4 = 7.5
sx1 = sx2 = sx3 = sx4 = 3.32
sy1 = sy2 = sy3 = sy4 = 2.03

confirming the results in the text. Additionally, ρxy = 0.82 for all four data sets.

12.18 To determine the value of c that minimizes the mean absolute deviation, ϕ1, defined as:

ϕ1 = ∑_{i=0}^{∞} |xi − c| f(xi)

first, we rewrite this function in terms of the two regions of interest, {xi : xi < c} and {xi : xi > c}, by virtue of the absolute value function, for which:

|xi − c| = −(xi − c);  xi < c
           xi − c;     xi > c        (12.5)


As a result:

ϕ1 = ∑_{xi<c} [−(xi − c) f(xi)] + ∑_{xi>c} [(xi − c) f(xi)]

From here, the usual calculus method for determining optima yields:

dϕ1/dc = ∑_{xi<c} f(xi) − ∑_{xi>c} f(xi) = 0

so that the minimizing value of c must satisfy

∑_{xi<c} f(xi) = ∑_{xi>c} f(xi)

i.e., c must divide the distribution into two regions of equal probability; by definition, this value of c is the median.

12.19 [...] 3. Note that: (a) by definition of the mode, x∗, of the discrete pdf, f(x∗) > f(xi) for all i; and (b) by virtue of the unimodality assumption, if this mode occurs at xj, then, in particular,

f(xj) > f(xj+1)
f(xj) > f(xj−1)

4. The conditions of optimality for the discrete objective function, ϕ∞(c), are as follows: if xj is the value that minimizes ϕ∞(c), then, for all possible values of xj, it must be true that:

ϕ∞(xj+1) − ϕ∞(xj) > 0 if and only if f(xj) > f(xj+1)    (12.17)

Similarly,

ϕ∞(xj−1) − ϕ∞(xj) > 0 if and only if f(xj) > f(xj−1)    (12.18)

The implications are as follows: the optimality conditions in Eq (12.7) will be satisfied for some value c = xj if and only if xj meets the requirements in Eqs (12.17) and (12.18) simultaneously. As noted in statement 3 above, this value is the mode of the discrete pdf f(xi), f(x∗); hence the result.
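A quick numerical check of the 12.18 result is easy to run: minimize ϕ1(c) on a grid and compare with the median. The discrete distribution below is hypothetical, chosen only for illustration.

```python
# Sketch: the median minimizes the mean absolute deviation phi1(c).
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5])
f = np.array([0.05, 0.20, 0.30, 0.25, 0.15, 0.05])   # pmf, sums to 1; median = 2

phi1 = lambda c: np.sum(np.abs(x - c) * f)
c_grid = np.linspace(0, 5, 501)
c_star = c_grid[np.argmin([phi1(c) for c in c_grid])]
print(c_star)   # approximately 2, agreeing with the median
```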

APPLICATION PROBLEMS

12.20 (i) The random variable, X, in this case is the number of flaws found on any silicon wafer; the set {xi; i = 1, 2, ..., n}, where n = 20, is the specific number of flaws found on each of the 20 silicon wafers selected from those produced at the site in question. The Poisson distribution is a reasonable probability model for the phenomenon in question here because flaws rarely occur on silicon wafers, and when they do occur, they tend to be few in number. Thus, the total number of flaws per wafer will exhibit the characteristics of a Poisson random variable.
(ii) The computed Poisson model probabilities for the various population parameters are shown below, along with the supplied data cast in terms of the frequency of occurrence of flaws.

x   f(x|λ = 0.5)   f(x|λ = 1.0)   f(x|λ = 1.5)   Data Frequency ϕ(x)   Relative Frequency fE(x) = ϕ(x)/20
0   0.606531       0.367879       0.223130       4                     0.2
1   0.303265       0.367879       0.334695       4                     0.2
2   0.075816       0.183940       0.251021       6                     0.3
3   0.012636       0.061313       0.125511       4                     0.2
4   0.001580       0.015328       0.047067       2                     0.1
5   0.000158       0.003066       0.014120       0                     0.0

A plot of the data frequency distribution is shown in Fig 12.12 along with the various postulated theoretical Poisson distributions, with λ = 0.5 (dashed line, squares), λ = 1.0 (dotted line, diamonds), and λ = 1.5 (dashed-dotted line, triangles). We observe from this plot that λ = 1.5 is the population parameter most representative of the observations.
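The theoretical columns of the table above follow directly from the Poisson pmf; a minimal sketch of their generation (using scipy rather than MINITAB) is:

```python
# Sketch: Poisson pmfs for the three candidate values of lambda.
from scipy.stats import poisson

for lam in (0.5, 1.0, 1.5):
    print(lam, [round(poisson.pmf(x, lam), 6) for x in range(6)])
```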


Figure 12.12: Relative frequency of flaws found on standard-sized silicon wafers (solid line, filled circles) and various postulated theoretical Poisson distributions. The distribution with λ = 1.5 (dashed-dotted line, triangles) is closest to the data.

12.21 (i) The mean is 1.99; the median is 1.36; and the variance is 3.45. The histogram of the data, plotted in Fig 12.13, shows a non-negative variable with a right-skewed distribution. A theoretical exponential distribution fit is superimposed. The shape is not surprising, given that the time between occurrences of single Poisson events can be anywhere from 0 to ∞, with the larger proportion of observations concentrated at short times, and progressively fewer observations at longer times.
(ii) A reasonable model is the exponential E(β) distribution.


Figure 12.13: Histogram of time (in months) between safety violations in toll manufacturing facility and a theoretical exponential model fit.

For the exponential distribution with µ = β = 2, the required probability is obtained as:

P(X > 2) = 1 − P(X ≤ 2) = 0.368

The data set shows 10 values out of 30 that are greater than 2 months, implying an empirical probability of observing X > 2 of 10/30 = 0.3333, which is close enough to the theoretical value of 0.368. The model therefore appears consistent with the data.
(iii) The mean and median for each of the 3 data subsets are shown in the table below; a set of box plots of the three data sets is shown in Fig 12.14.

             Operator A   Operator B   Operator C
Mean         1.87         2.12         1.97
Median       1.36         1.64         0.88

Even though the average time between occurrences of safety violations is smaller for Operator A than for the other operators, the medians and the box plots show no conclusive evidence to support the preconception that "Operator A" is more safety conscious than the other operators.

12.22 From the supplied data, we are able to compute the numerical descriptors shown in the table below.


Figure 12.14: Box plots of the time (in months) between safety violations in toll manufacturing facility for operators A, B and C.

Battery Type   Mean    Median   Std Dev
1              43.00   42.50    2.16
2              42.25   42.50    3.30
3              27.00   27.00    0.82
4              47.50   47.00    2.65
5              30.50   30.00    5.20

Fig 12.15 shows a plot of the data for each battery type. Strictly from these descriptive characteristics (numerical as well as graphical), it appears as if battery types 1, 2 and 4 are better than types 3 and 5, with type 4 being the best.

12.23 A scatter plot of the data is shown in Fig 12.16, where it is obvious that a strong, and fairly predictable, relationship exists between the two variables. The implication is that it should be possible to predict the boiling point of the compounds in this series on the basis of the number of carbon atoms in the compound. The computed correlation coefficient is 0.987, indicating a very strong, positive correlation between the two variables.

12.24 A plot of thermal conductivity versus temperature, shown in Fig 12.17, indicates a positive, linear relationship between these two variables. This conclusion is supported by the correlation coefficient, which is computed as 0.985. Since the relationship between these two variables appears to be essentially linear, it seems reasonable to compute the value of the thermal conductivity at 325℃ by linearly interpolating between the value at 300℃ and the value at


Figure 12.15: Plot of “cold cranking power” for different types of batteries.


Figure 12.16: Boiling point of hydrocarbon compound versus number of carbon atoms in the molecule.

350℃, i.e.,

k(325) = (111.535 + 115.874)/2 = 113.70 W/m-℃

is a reasonable estimate of the thermal conductivity at 325℃.
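The same linear interpolation can be done in one line; the sketch below uses the two bracketing (T, k) pairs quoted in the solution.

```python
# Sketch: linear interpolation of thermal conductivity at 325 deg C.
import numpy as np

k_325 = np.interp(325.0, [300.0, 350.0], [111.535, 115.874])
print(k_325)   # approximately 113.70 W/m-C
```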


Figure 12.17: Thermal conductivity versus temperature.

12.25 A summary of key descriptive statistics for these two data sets is shown in the table below. It appears as if, across the board, the accidents occurring during Period II are noticeably fewer than those occurring in Period I. One possible explanation for this observation is that a safety training program implemented after Period I was effective in reducing the number of accidents observed in Period II.

Period   Mean   Median   Std Dev   Min   Max
I        6.00   5.50     2.49      2     10
II       2.90   2.50     1.89      0     7

12.26 Bar charts for the 1960 and 1980 population distributions are shown in Figs 12.18 and 12.19, respectively. A distinctive feature of these plots is the general wave-like characteristic (excluding the catch-all group of senior citizens 65 years and older). In particular, for the 1960 distribution in Fig 12.18, we observe the "trailing wave" from the < 5 age group, up to, and including, the 15-19 age group. Observe that babies born between 1945 and 1965 would belong to this extended cohort group: the oldest, those born in 1945, would be 15 years old in 1960, while those born between 1945 and 1960 would be between 0 and 15 years (those born between 1960 and 1965 would not appear in this plot for obvious reasons).

Twenty years later, the cohort groups in the 1960 distribution's "trailing wave" have shifted forward in time to become the three groups in the 25-39 age bracket in 1980. Those born between 1960 and 1965 (not part of the 1960 population) would be


Figure 12.18: Age distribution in the US in 1960.


Figure 12.19: Age distribution in the US in 1980.

between 15 and 20 years of age in 1980, contributing to the 15-19 age group bar in Fig 12.19. Thus, the prominent high-peak wave in the 1980 age distribution (between 15 and 39 years of age) is seen to be due to the baby boom children born after World War II. Another distinctive feature is the "senior citizen" population (≥ 65), which is much higher in 1980 than in 1960, indicating an increase in life expectancy over the 20-year span in question.

Chapter 13

Exercises

Section 13.1
13.1 Y1 and Y3 are statistics, since they are functions of only the random sample, not of any unknown population parameters. Because Y2 depends on the unknown population parameter µ, and Y4 depends on the variance σ², they do not qualify as statistics. If the population mean µ is specified, then Y2 will qualify as a statistic.

13.2 Y1 and Y3 are statistics, since the sample mean X̄ is a statistic. Because Y2 depends on the unknown population mean µ, and Y4 depends on the unknown variance σ², these random variables are not statistics.

13.3 (i) For the Gamma(α, β) distribution,

µ = αβ;  σ² = αβ²

a pair of equations readily solved simultaneously to give

β = σ²/µ;  α = µ/β = µ²/σ²

(ii) For the Beta(α, β) distribution,

µ = α/(α + β);  σ² = αβ/[(α + β)²(α + β + 1)]

which, when solved simultaneously for α and β in terms of µ and σ², yields:

α = µ[(1 − µ)µ/σ² − 1]
β = (1 − µ)[(1 − µ)µ/σ² − 1]

(iii) For the Binomial(n, p) distribution,

µ = np;  σ² = np(1 − p)

which can be solved to give

(1 − p) = σ²/µ ⇒ p = 1 − σ²/µ = (µ − σ²)/µ

and

n = µ/p = µ²/(µ − σ²)

Section 13.2
13.4 In general, for n independent random variables Xi; i = 1, 2, ..., n, each with respective characteristic functions φXi(t), if

Y = k1X1 + k2X2 + ... + knXn = ∑_{i=1}^{n} ki Xi

then

φY(t) = φX1(k1t) φX2(k2t) ··· φXn(knt) = ∏_{i=1}^{n} φXi(kit)    (13.1)

Here, for Xi ∼ N(µi, σi²), the characteristic function is

φXi(t) = exp{jµit − (1/2)σi²t²}

so that

φY(t) = ∏_{i=1}^{n} exp{jµi(kit) − (1/2)σi²(kit)²}
      = exp{∑_{i=1}^{n} [jµikit − (1/2)σi²ki²t²]}
      = exp{jµyt − (1/2)σy²t²}

where µy = ∑_{i=1}^{n} kiµi and σy² = ∑_{i=1}^{n} ki²σi². This expression is recognized as the characteristic function of a Gaussian random variable, N(µy, σy²), as required.
(ii) If µi = µ; i = 1, 2, ..., n, and σi² = σ²; i = 1, 2, ..., n, then

µy = ∑_{i=1}^{n} kiµ = (∑_{i=1}^{n} ki) µ
σy² = ∑_{i=1}^{n} ki²σ² = (∑_{i=1}^{n} ki²) σ²

as required.

13.5 From the definition,

Y = (1/n) ∑_{i=1}^{n} Xi

we obtain the mean, µy, as:

µy = E(Y) = (1/n) ∑_{i=1}^{n} E(Xi) = (1/n) ∑_{i=1}^{n} µ = (1/n)(nµ) = µ

Similarly,

σy² = Var(Y) = (1/n²) ∑_{i=1}^{n} Var(Xi) = (1/n²)(nσ²) = σ²/n

as required.

13.6 With Xi ∼ N(µ, σ²); i = 1, 2, ..., n, the independent random variables

Zi = (Xi − µ)/σ;  i = 1, 2, ..., n

have a standard normal N(0, 1) distribution. As shown in Example 6.3 in the text, the pdf for the random variable Yi = Zi² is:

fYi(yi) = (1/√(2π)) e^{−yi/2} yi^{−1/2};  0 < yi < ∞

which we recognize as that of the χ²(r) random variable, with r = 1, recalling that Γ(1/2) = √π. From Eq (9.45) in the text, its characteristic function is

φYi(t) = 1/(1 − j2t)^{1/2}

Thus, the characteristic function of

Y = ∑_{i=1}^{n} [(Xi − µ)/σ]² = ∑_{i=1}^{n} Yi

is obtained as:

φY(t) = ∏_{i=1}^{n} φYi(t) = ∏_{i=1}^{n} 1/(1 − j2t)^{1/2} = 1/(1 − j2t)^{n/2}

which is the characteristic function of the χ²(n)-distributed random variable.

13.7 The characteristic function for each Xi; i = 1, 2, ..., n, is

φXi(t) = 1/(1 − j2t)^{ri/2}

Thus, the characteristic function of Y is

φY(t) = ∏_{i=1}^{n} φXi(t) = ∏_{i=1}^{n} 1/(1 − j2t)^{ri/2} = 1/(1 − j2t)^{r/2}

where r = ∑_{i=1}^{n} ri. This is identified as the characteristic function of the χ²(r)-distributed random variable, as required.

13.8 Let X = ∑_{i=1}^{n} Xi, so that Y = X/n. We know from the reproductive property of the Poisson random variable that X ∼ P(λ∗), where λ∗ = ∑_{i=1}^{n} λ = nλ. As in Example 6.1 of the text, we obtain the pdf of Y as:

fY(y) = (nλ)^{ny} e^{−nλ}/(ny)!;  y = 0, 1/n, 2/n, 3/n, ...

so that nY ∼ P(nλ).

13.9 From Eq (9.34) of the text, the characteristic function for each Xi ∼ γ(α, β); i = 1, 2, ..., n, is

φXi(t) = 1/(1 − jβt)^α

so that

φY(t) = ∏_{i=1}^{n} φXi(t) = (φXi(t))^n = 1/(1 − jβt)^{nα}

which we recognize as the characteristic function of Y ∼ γ(nα, β). This is an example of the reproductive property of the gamma random variable with αi = α.
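The reproductive property in 13.9 is easy to illustrate by simulation: sums of i.i.d. gamma variates should be statistically indistinguishable from draws of a single γ(nα, β) variable. The following is a quick Monte Carlo sketch, not part of the original solution.

```python
# Sketch: sum of n iid gamma(a, b) variables behaves like gamma(n*a, b).
import numpy as np
from scipy.stats import gamma, kstest

rng = np.random.default_rng(0)
n, a, b = 5, 2.0, 1.5
y = rng.gamma(shape=a, scale=b, size=(100_000, n)).sum(axis=1)

# Kolmogorov-Smirnov test against gamma(n*a, b): a large p-value is consistent
print(kstest(y, gamma(a * n, scale=b).cdf).pvalue)
```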

Section 13.3
13.10 From the given definitions, we observe first that

E(X̄) = (1/n) ∑_{i=1}^{n} E(Xi)
E(X̃) = ∑_{i=1}^{n} ωi E(Xi)

and since E(Xi) = µ, and ∑_{i=1}^{n} ωi = 1, then:

E(X̄) = nµ/n = µ
E(X̃) = µ ∑_{i=1}^{n} ωi = µ

as required. Next, also from the given definitions,

Var(X̄) = (1/n²) ∑_{i=1}^{n} σ² = σ²/n
Var(X̃) = ∑_{i=1}^{n} ωi²σ² = σ² ∑_{i=1}^{n} ωi²

To establish the required result that Var(X̄) ≤ Var(X̃), we employ an optimization approach, where we aim to determine the values of ωi required to minimize Var(X̃), subject to the constraint ∑_{i=1}^{n} ωi = 1. This requires setting up the Lagrangian:

Λ = σ² ∑_{i=1}^{n} ωi² − λ[(∑_{i=1}^{n} ωi) − 1]    (13.2)

comprising the function to be minimized, augmented by the Lagrange multiplier, λ, scaling the constraint. The optimum values are then determined by solving the equations:

∂Λ/∂ωi = 0    (13.3)

along with the auxiliary equation

∂Λ/∂λ = 0

which simply recovers the constraint. From Eqs (13.2) and (13.3), we obtain:

∂Λ/∂ωi = 2σ²ωi − λ = 0

which simplifies to:

ωi = λ/(2σ²) = C

where C is a constant. Introducing this into the constraint equation yields:

∑_{i=1}^{n} ωi = ∑_{i=1}^{n} C = 1

or

nC = 1 ⇒ C = 1/n

Because the second derivative, obtained as:

∂²Λ/∂ωi² = 2σ²

is strictly positive, the determined optimum value, ωi∗ = 1/n, is truly a minimum. The implication is that Var(X̃) is minimized when

ωi = 1/n

at which point the values of Var(X̃) and Var(X̄) coincide. Thus, Var(X̃) achieves a minimum value equal to Var(X̄), with any other value of ωi leading to higher variance; hence:

Var(X̄) ≤ Var(X̃)

as required.

13.11 Define the random variable Y as

Y = ∑_{i=1}^{n} Xi

so that X̄ = Y/n. We know from the reproductive property of the Poisson random variable that Y ∼ P(λ∗), where λ∗ = ∑_{i=1}^{n} λ = nλ. From Example 6.1 of the text, we obtain the pdf of X̄ immediately as:

fX̄(x̄) = (nλ)^{nx̄} e^{−nλ}/(nx̄)!;  x̄ = 0, 1/n, 2/n, 3/n, ...

Recall that for the Poisson random variable, the mean µ = λ and the variance σ² = λ. From Eqs (13.32) and (13.33) in the text, we see that

E(X̄) = µX̄ = µ = λ

and

Var(X̄) = σ²X̄ = σ²/n = λ/n

13.12 In general, for the random variable Y defined as

Y = (1/n) ∑_{i=1}^{n} Xi

where each Xi; i = 1, 2, ..., n, has characteristic function φXi(t),

φY(t) = ∏_{i=1}^{n} φXi(t/n)    (13.4)

If the Xi are all identically distributed, then Y = X̄ is the sample mean and Eq (13.4) simplifies to

φX̄(t) = (φXi(t/n))^n    (13.5)

From Eq (9.34) in the text, the characteristic function for each Xi ∼ γ(α, β); i = 1, 2, ..., n, is

φXi(t) = 1/(1 − jβt)^α

so that

φX̄(t) = 1/(1 − jβt/n)^{nα}

which we recognize as the characteristic function of X̄ ∼ γ(nα, β/n). Thus,

E(X̄) = (nα)(β/n) = αβ

and

Var(X̄) = (nα)(β/n)² = αβ²/n

consistent with Eqs (13.32) and (13.33) in the text.

13.13 (i) First, note that

ln X̄g = (1/n) ∑_{i=1}^{n} ln Xi

Now let Yi = ln Xi so that, from the properties of the lognormal distribution, Y ∼ N(α, β²). As a result, Ȳ = ln X̄g is the sample mean of n i.i.d. N(α, β²) Gaussian random variables, from which we conclude that

Ȳ ∼ N(α, β²/n)

Therefore, X̄g = e^{Ȳ} has a lognormal L(α, β²/n) distribution.
(ii) Recalling the properties of the lognormal random variable from Eq (9.146) in the text, we obtain:

E(ln X̄g) = α
Var(ln X̄g) = β²/n


13.14 Since n = 10 samples are obtained from a Gaussian population with mean µ = 100 and variance σ² = 25, we know that the sample mean X̄ ∼ N(100, 25/10) exactly. The desired probabilities can then be obtained directly from MINITAB using the cumulative probability option for a Normal distribution (Calc > Prob Dist > Normal), with mean = 100 and standard deviation = √(25/10) = 1.581. Alternately, one can use the fact that

Z = (X̄ − 100)/√(25/10) ∼ N(0, 1)

Either way, the results are as follows:
(i) P(X̄ ≥ 100) = P(Z ≥ 0) = 1 − FZ(0) = 1 − 0.500 = 0.500.
(ii) P(X̄ ≤ 100) = P(Z ≤ 0) = FZ(0) = 0.500.
(iii) P(X̄ ≥ 104.5) = P(Z ≥ 2.85) = 1 − FZ(2.85) = 1 − 0.998 = 0.002.
(iv) P(96.9 ≤ X̄ ≤ 103.1) = P(−1.96 ≤ Z ≤ 1.96) = FZ(1.96) − FZ(−1.96) = 0.975 − 0.025 = 0.950.
(v) P(98.4 ≤ X̄ ≤ 101.6) = P(−1.01 ≤ Z ≤ 1.01) = FZ(1.01) − FZ(−1.01) = 0.844 − 0.156 = 0.688.

The sample size will make a difference in the distribution used to compute these probabilities, since the distribution's variance depends on the sample size. However, the results obtained in (i) and (ii) will not be affected, because these values remain constant regardless of the variance of the distribution. The sample size will affect the remaining probabilities, since their values depend on the variance of the distribution.

13.15 Now, with the sample variance s² = 24.7, the result that

T = (X̄ − 100)/√(24.7/10) ∼ t(9)

provides us with the sampling distribution to use in determining the required probabilities.
(i) The desired probabilities are therefore obtained from MINITAB using the cumulative probability option for a Student's t distribution (Calc > Prob Dist > t), with 9 degrees of freedom. The results are as follows:
(a) P(X̄ ≥ 100) = P(T ≥ 0) = 1 − FT(0) = 1 − 0.500 = 0.500.
(b) P(X̄ ≤ 100) = P(T ≤ 0) = FT(0) = 0.500.
(c) P(X̄ ≥ 104.5) = P(T ≥ 2.86) = 1 − FT(2.86) = 1 − 0.991 = 0.009.
(d) P(96.9 ≤ X̄ ≤ 103.1) = P(−1.97 ≤ T ≤ 1.97) = FT(1.97) − FT(−1.97) = 0.960 − 0.040 = 0.920.
(e) P(98.4 ≤ X̄ ≤ 101.6) = P(−1.02 ≤ T ≤ 1.02) = FT(1.02) − FT(−1.02) = 0.832 − 0.168 = 0.665.
(ii) With n = 30, this time,

T = (X̄ − 100)/√(24.7/30) ∼ t(29)

so that:
(a) P(X̄ ≥ 100) = P(T ≥ 0) = 1 − FT(0) = 1 − 0.500 = 0.500.
(b) P(X̄ ≤ 100) = P(T ≤ 0) = FT(0) = 0.500.
(c) P(X̄ ≥ 104.5) = P(T ≥ 4.96) = 1 − FT(4.96) = 1 − 1.000 = 0.000.
(d) P(96.9 ≤ X̄ ≤ 103.1) = P(−3.42 ≤ T ≤ 3.42) = FT(3.42) − FT(−3.42) = 0.999 − 0.001 = 0.998.
(e) P(98.4 ≤ X̄ ≤ 101.6) = P(−1.76 ≤ T ≤ 1.76) = FT(1.76) − FT(−1.76) = 0.956 − 0.044 = 0.912.

13.16 The mean µX̄ and variance σ²X̄ of the sample mean of the random variable, X, whose underlying population distribution is unknown, are determined from Eqs (13.32) and (13.33) in the text. In this specific case, the results are as follows:
(i) µX̄ = µ = 50; σ²X̄ = σ²/n = 20/10 = 2.
(ii) µX̄ = µ = 50; σ²X̄ = σ²/n = 20/20 = 1.
(iii) Since the underlying population distribution is unknown, but the mean µ and variance σ² are given, we may make use of the Central Limit Theorem, i.e., that Z = (X̄ − µ)/(σ/√n) has an approximate standard normal N(0, 1) distribution. Thus, to a good approximation for n = 50, we may assume that X̄ ∼ N(µ, σ²/n). The desired probability is therefore determined as follows:

P(|X̄ − 50| ≤ 3) = P(47 ≤ X̄ ≤ 53)
                = P[(47 − 50)/√(20/50) ≤ Z ≤ (53 − 50)/√(20/50)]
                = P(−4.74 ≤ Z ≤ 4.74) = 1.000

implying that it is virtually certain that a sample mean computed from a sample of size n = 50 will not deviate from the true but unknown population mean by more than ±3.
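The MINITAB steps quoted in 13.14-13.16 have direct one-line equivalents in scipy; the following minimal sketch reproduces a representative probability from each problem.

```python
# Sketch: sampling-distribution probabilities via scipy instead of MINITAB.
from math import sqrt
from scipy.stats import norm, t

# 13.14: Xbar ~ N(100, 25/10)
z = norm(loc=100, scale=sqrt(25 / 10))
print(z.sf(104.5))                        # P(Xbar >= 104.5), approx 0.002

# 13.15: T = (Xbar - 100)/sqrt(24.7/10) ~ t(9)
print(t(9).sf(2.86))                      # P(Xbar >= 104.5), approx 0.009

# 13.16(iii): P(|Xbar - 50| <= 3) with sigma^2 = 20, n = 50
print(norm.cdf(4.74) - norm.cdf(-4.74))   # approx 1.000
```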

13.17 Since the underlying distribution is unknown, but the population variance is specified as σ² = 4, Eq (13.34) in the text applies here. As such:
(i) The standard deviation of the mean for n = 50 is

σX̄ = σ/√n = √4/√50 = 0.283

(ii) If n = 25 instead, then

σX̄ = σ/√n = √4/√25 = 2/5 = 0.4

(iii) In general, because

σX̄i = σ/√ni

CHAPTER 13.

¯ 1 and X ¯ 2 obtained respectively using two different sample for sample means X sizes n1 and n2 , then √ σX¯ 1 n2 = (13.6) σX¯ 2 n1 And now, if σX¯ 2 = then from Eq (13.6), we obtain 4=

σX¯ 1 2 n2 n1

In the specific case with n1 = 50, the result is: n2 = 200 Thus, to reduce the standard deviation of the mean by 1/2 √ requires a 4-fold increase in sample size, reflecting the inverse square root (1/ n) dependence of the standard deviation of the mean on sample size. Section 13.4 13.18 Because the sample (of size n = 20) is known to be from a normal population with mean µ = 100 and variance σ 2 = 10, the required sampling distribution is obtained from Eq (13.43) of the text, i.e., C=

(n − 1)S 2 ∼ χ2 (n − 1) σ2

The desired probabilities may then be computed in MINITAB using the cumulative probability option for a Chi-square distribution (Calc > Prob Dist > Chi-Square) with degrees of freedom = 20 − 1 = 19. The results are as follows: (i) ( ) (20 − 1)10 P (S 2 ≤ 10) = P C ≤ = P (C ≤ 19) = FC (19) = 0.543 10 (ii)

( ) (20 − 1)9.5 P (S ≤ 9.5) = P C ≤ = P (C ≤ 18.05) = 0.481 10 2

(iii) For n = 1.5nold = 30, then the degrees of freedom, ν = 29, and ( ) (30 − 1)9.5 P (S 2 ≤ 9.5) = P C ≤ = P (C ≤ 27.55) = 0.458 10 (iv) For n = 0.5nold = 10, then ν becomes 9, and ( ) (10 − 1)9.5 P (S 2 ≤ 9.5) = P C ≤ = P (C ≤ 8.55) = 0.520 10

11 As the sample size increases, the probability of obtaining a sample variance that will be smaller than the true value decreases—implying that, as n increases, one is more likely to obtain a value that is closer to the true population variance than underestimate the true value. 13.19 The required sampling distribution is obtained from Eq (13.43) of the text, i.e., (n − 1)S 2 C= ∼ χ2 (n − 1) σ2 (i) We compute the probabilities in MINITAB using the cumulative probability option for a Chi-square distribution (Calc > Prob Dist > Chi-Square) with ν = 19: ( ) (20 − 1)9.8 = P (C1 ≤ 18.62) = 0.519 P (S12 ≤ 9.8) = P C1 ≤ 10 And, with ν = 29, P (S22

( ) (30 − 1)11.2 ≥ 11.2) = P C2 ≥ = 1−P (C2 ≤ 32.48) = 1−0.701 = 0.299 10

(ii) We use the inverse cumulative probability option for a Chi-square distribution (Calc > Prob Dist > Chi-Square) with ν = 19 to obtain: P (C ≥ χ20 ) = 0.05 ⇒ P (C ≤ χ20 ) = 0.95 ⇒ χ20 = FC−1 (0.95) = 30.14 (iii) With ν = 29, we obtain: P (C ≤ χ20 ) = 0.05 ⇒ χ20 = FC−1 (0.05) = 17.71 13.20 Here, we make use of the result from Eq (13.44) of the text that F =

S12 ∼ F (n1 − 1, n2 − 1) S22

The desired probability is then computed in MINITAB using the cumulative probability option for a Fisher F distribution (Calc > Prob Dist > F) with numerator degrees of freedom = 19, denominator degrees of freedom = 29: ( 2 ) S1 11.2 P ≥ = 1 − P (F ≤ 1.143) = 1 − 0.636 = 0.364 S22 9.8 S2

13.21 Again, we use Eq (13.44) in the text: F = S12 ∼ F (n1 − 1, n2 − 1). The 2 desired probabilities are computed in MINITAB using the cumulative probability option for a Fisher F distribution (Calc > Prob Dist > F) with numerator degrees of freedom = 50, denominator degrees of freedom = 50. (i) ( 2 ) S1 P ≥ 1 = 1 − P (F ≤ 1) = 1 − 0.500 = 0.500 S22

12

CHAPTER 13. ( P

(ii)

( P

S22 ≤1 S12

) = P (F ≤ 1) = 0.500

) S12 ≥ 1.25 = 1 − P (F ≤ 1.25) = 1 − 0.784 = 0.216 S22 ( 2 ) S2 P ≤ 0.8 = P (F ≤ 0.8) = 0.216 S1 s2

This confirms our expectation that P (S22 /S12 ≤ α) = P (S12 /S22 ≥ 1/α).

APPLICATION PROBLEMS 13.22 (i) The data set is a record of the number of rare events occurring over a finite interval of time, precisely the sort of phenomenon for which the Poisson model is ideal. Thus, the Poisson pdf is a reasonable model for representing the data. (ii) From the data set, the sample mean for the combined 10-year data is obtained as ¯ = 4.45 X Using this as an estimate for λ, the following pdf is obtained for describing the population: (4.45)x e−4.45 f (x|λ = 4.45) = x! The required probabilities are shown in the table below: x 1 2 3 4 5 6 7 8 9 10

f (x|λ) λ = 4.45 0.0520 0.1156 0.1715 0.1908 0.1698 0.1260 0.0801 0.0445 0.0220 0.0098

¯ is obtained from the result that (iii) The distribution for X ¯ ∼ P(nλ) nX From here, we observe that ¯ ≥ 3) = P (nX ¯ ≥ 3n) P (X

13 ¯ ≥ 120), where the random variable and with n = 40, we wish to compute P (nX ¯ Y = nX has a Poisson P(nλ) distribution, or, in this case, P(178). From MINITAB, we obtain the required probability as: P (Y ≥ 120) = 1 − P (Y < 120) = 1 − P (Y ≤ 119) = 1 − 0.0000016 = 0.9999984 Similarly, ¯ ≤ 6) = P (nX ¯ ≤ 6n) P (X ¯ ≤ 240) for the random variable or, with n = 40, we wish to compute P (nX ¯ ∼ P(nλ), or P(178). The required probability is obtained as: Y = nX P (Y ≤ 240) = 1.0000 (iv) From the Poisson pdf with λ = 4.45, we obtain: P (X ≤ 6) = 0.8374 P (X ≥ 3) = 1 − P (X < 3) = 1 − P (X ≤ 2) = 1 − 0.1793 = 0.8207 For purposes of this problem, the data sets show: 1. 13 values out of 20 that are less than or equal to 6 in Period I, compared to 18 out of 20 for Period II. 2. 19 values out of 20 that are greater than or equal to 3 in Period I, compared to 10 out of 20 for Period II. or, in tabular form:

Condition X≤6 X≥3

Period I Frequency

Period II Frequency

13 20 19 20

18 20 10 20

And if we assume that data frequency is approximately representative of probability, then we may obtain the following empirical probabilities from the data: P (X ≤ 6) = 0.65 for Period I and 0.9 for Period II; and P (X ≥ 3) = 0.95 for Period I and 0.5 for Period II. These empirical probabilities are not consistent with the corresponding theoretical probabilities—implying that the postulated model is not likely to be plausible. 13.23 From each separate data set, we obtain: ¯ 1 = 6.0 and X ¯ 2 = 2.9 X The exact sampling distributions for these means are obtained from ¯ i ∼ P(nλi ) nX

14

CHAPTER 13.

From here, we observe that ¯ 1 ≥ 3|λ1 = 6.0) = P (nX ¯ 1 ≥ 3n|λ1 = 6.0) P (X ¯1, or, with n = 20, so that nλ1 = 120 and Y = 20X P (Y ≥ 60|nλ1 = 120) = 1 − P (Y ≤ 59|nλ1 = 120) = 1 − 0 = 1 Similarly, ¯ 2 ≤ 6|λ2 = 2.9) = P (nX ¯ 2 ≤ 120|λ2 = 2.9) P (X ¯2, or, with n = 20, so that nλ2 = 58 and Y = 20X P (Y ≤ 120|nλ2 = 58) = 1 Recalling that the sample averages were computed as 6 for Period I and 2.9 (approximately 3) for Period II, the implications of these results are as follows: 1. If these two populations are truly the same, then the observed difference in computed sample means occurred by pure chance alone; the sample mean computed for Period I could equally well have been close to 3 and that for Period II could equally well have been close to 6. 2. However, if one considers the alternative postulate for Period I, that the true population mean is greater than 3 (the sample average value obtained for Period II), the probability that this alternative postulate is true is 1, implying that the true population mean for Period 1 is almost certainly greater than 3. 3. Similarly, the postulate that the true population mean for Period II is less than 6 (the sample average value computed for Period 1) has probability 1 of being true, implying that the true mean for Period II is almost certainly less than 6. Thus, these two populations are more likely to be different because there is overwhelming evidence in support of the alternative postulates: (a) that the population mean for Period I is truly greater than 3, as opposed to the assumption that the true mean takes the same value of 3 as that for Period II; and (b) that the population mean for Period II is truly less than 6, as opposed to the assumption that the true mean takes the same value of 6 as that for Period I. 13.24 From the phenomenological characteristics of the random variable in question, i.e., that X ∼ P(λ), the stated propositions imply that λ1 = 6 for Period I, and λ2 = 3 for Period II. With these specifications, the theoretical probability distributions for the occurrence of safety incidents in each period can be computed from the respective Poisson pdfs; the results are shown in the following table along with a side-by-side comparison with the corresponding empirical distributions, fEI (x) and fEII (x), obtained from the data sets.

15

Scatterplot of fEP1x, f1x vs x Variable fEP1x f1x

0.25

0.20

f1(x)

0.15

0.10

0.05

0.00 0

2

4

6

8

10

x

Figure 13.1: Comparison of empirical distribution (solid line, circles) and theoretical probability distribution (dashed line, squares) of safety incidents in Period I.

x 0 1 2 3 4 5 6 7 8 9 10

Period I Frequency 0 0 1 3 1 5 3 1 2 1 3

fEI (x)

fI (x|λ = 6)

0.00 0.00 0.05 0.15 0.05 0.25 0.15 0.05 0.10 0.05 0.15

0.002479 0.014873 0.044618 0.089235 0.133853 0.160623 0.160623 0.137677 0.103258 0.068838 0.041303

Period II Frequency 1 4 5 2 6 0 0 2 0 0 0

fEII (x)

fII (x|λ = 3)

0.05 0.20 0.25 0.10 0.30 0.00 0.00 0.10 0.00 0.00 0.00

0.049787 0.149361 0.224042 0.224042 0.168031 0.100819 0.050409 0.021604 0.008102 0.002701 0.000810

Comparisons of the empirical and theoretical distributions are shown in Figs 13.1 and 13.2, which show generally good agreement between the data and the theoretical models, especially given that the sample size in each case (n = 20) is quite modest. However, the propositions can be compared with the information in the data more qunatitatively by computing the specific probabilities investigated in Problem 13.22, i.e., P (X ≤ 6) and P (X ≥ 3). For Period I, where the postulated population mean, λ1 = 6, the theoretical probabilities are: P (X ≤ 6) = 0.606 P (X ≥ 3) = 1 − P (X ≤ 2) = 1 − 0.062 = 0.938

16

CHAPTER 13.

Scatterplot of fEP2x, f2x vs x Variable fEP2x f2x

0.30 0.25

f2(x)

0.20 0.15

0.10 0.05 0.00 0

2

4

6

8

10

x

Figure 13.2: Comparison of empirical distribution (solid line, circles) and theoretical probability distribution (dashed line, squares) of safety incidents in Period II.

For Period II, with λ2 = 3, they are: P (X ≤ 6) =

0.966

P (X ≥ 3) =

1 − P (X ≤ 2) = 1 − 0.423 = 0.577

or, in tabular form,

Condition X≤6 X≥3

Period I Theoretical Probability 0.606 0.938

Period II Theoretical Probability 0.966 0.577

When these probabilities are compared with the empirical frequencies obtained in Problem 13.22 (reproduced here with the fractions converted to decimals for convenience), we see that indeed the propositions are consistent with the data. Condition X≤6 X≥3

Period I Frequency 13 20 = 0.65 19 20 = 0.95

Period II Frequency 18 20 = 0.90 10 20 = 0.50

¯ is obtained from the result that 13.25 (i) The exact distribution for X ¯ ∼ P(nλ) nX As such, the required probabilities are obtained from: ¯ i ≥ 4.5) = P (nX ¯ i ≥ 4.5n) = P (20X ¯ i ≥ 90); i = 1, 2 P (X

17 ¯ 1 has a Poisson P(nλ1 ) or, in this case, where the random variable Y1 = nX ¯ P(120), and similarly, Y2 = nX2 has a Poisson P(nλ2 ) or, P(60). From MINITAB, we obtain the required probabilities as: ¯ 1 ≥ 4.5) = 1 − P (Y1 ≤ 89|nλ1 ) = 1 − 0.0019 = 0.9981 P (X ¯ 2 ≥ 4.5) = 1 − P (Y2 ≤ 89|nλ2 ) = 1 − 0.9998 = 0.0002 P (X ¯ when X ∼ P(λ) are as follows: E(X) ¯ = λ, and (ii) The characteristics of X ¯ = λ/n. Thus, if n is large enough, so that the Central Limit Theorem V ar(X) ¯ is obtained as N (λ, λ/n). Thus, in is applicable, the sampling distribution of X ¯ 1 ∼ N (6, 0.3) and X ¯ 2 ∼ N (3, 0.15). These the specific case of this problem, X approximate sampling distributions may now be used to compute the desired probabilities; the results are: ¯ 1 ≥ 4.5) = 0.9969 P (X ¯ 2 ≥ 4.5) = 0.000053 P (X results that are reasonably close to the precise values computed in part (i) above. Thus, the Gaussian approximation, even though not “exact,” provides reasonable results that are very close to the exact probabilities computed using the exact sampling distributions. More importantly, the conclusions one would reach by interpreting these results would be identical, namely that it is virtually certain that the average for Period I exceeds 4.5, while it is virtually impossible that the average for Period II exceeds 4.5. 13.26 (i) From the given data, we obtain the sample average, x ¯ = 30.1. And now, if Xi ∼ E(β), then from results in Chapter 6, or more specifically, from Eq ¯ defined as (9.39) in the text, we know that the random variable X, ∑ ¯= 1 X Xi n i=1 n

has the gamma distribution γ(n, β/n). Thus, the exact sampling distribution ¯ when Xi ∼ E(β) is γ(n, β/n). for X ¯ < 30) using the (ii) Under the given conditions, we require the probability P (X distribution γ(10, 4), since β is specified as 40, and n = 10. Using MINITAB, the result is obtained as: ¯ < 30|β = 40) = 0.2236 P (X (iii) On the other hand, if β is postulated instead as 30, then the implied sam¯ will be γ(10, 3) so that, this time pling distribution for X ¯ < 30|β = 30) = 0.5421 P (X a probability that is more than twice the value of the probability obtained under the postulate that β = 40. Thus, the claim by the independent auditor that

18

CHAPTER 13.

β = 30 appears to be more believable. 13.27 The desired probabilities are computed as follows, for a random variable X ∼ E(β): P (X ≤ 10|β = 40) = 0.2212 P (X ≤ 10|β = 30) = 0.2835 From the data record, we observe that only three values (1,1,9) are less than 10, i.e., the frequency of occurrence of numbers less than 10 in the data record is 3/10. Even though the sample sized n = 10 is small, we see that the theoretical probability computed for β = 30 is closer to the empirical frequency of 3/10 than the theoretical probability computed for β = 40. Thus, it appears as if the value of the population parameter is more likely to be β = 30. 2 2 13.28 (i) The sampling distribution for Y¯A is N (75.5, 1.5 50 ), i.e., N (75.5, 0.212 ), from where we obtain the required probability as:

P (75.0 < Y¯A < 76.0) = 0.9817 (ii) Using the distribution N (75.5, 0.2122 ), the desired probability is obtained as: P (Y¯B ≤ 72.47) = 1.2−46 which is essentially zero. This implies that the probability is practically zero that one would obtain the value 72.47 computed for Y¯B , or something less, when the true value is the same as the value for Y¯A . The implication is that it is virtually impossible that the “no difference” postulate is plausible. (iii) Using the sampling distribution N (75.5, 0.2122 ), the value of η0 for which P (Y¯A ≤ η0 ) = 0.05 is obtained as η0 = 75.15 (see Fig 13.3), a value that is much higher than y¯B = 72.47. Observe, therefore, that the result η0 = 75.15 indicates that there is a very small probability (specifically 0.05) that the true value of Y¯A will be less than 75.15. Now, the fact that this “critical value” is much higher than y¯B = 72.47 implies that it is highly implausible that the YB data, with sample average, y¯B = 72.47, comes from the same population as the YA data, with sample average, y¯A = 75.52—a population with the characteristic that there is only a very small probability (0.05) that its true mean value is less than 75.15. 2

13.29 Under the given conditions, the sampling distribution for Y¯B is N (72.5, 2.5 50 ), i.e., N (72.5, 0.3542 ), from where we obtain the required probabilities as: P (Y¯B ≤ 72.47) = 0.4662 P (72.0 < Y¯B < 73.0) = 0.8422

19

Distribution Plot Normal, Mean=75.5, StDev=0.212 2.0

f(x)

1.5

1.0

0.5 0.05 0.0

75.15

75.5 X

Figure 13.3: Tail area probability for the sampling distribution of Y¯A , N (75.5, 0.2122 ), indicating that P (Y¯A ≤ η0 ) = 0.05 requires that η0 = 75.15.

The implications are that, on the basis of the alternative proposition, (i) there is almost a 50-50 chance that the true value of the population mean is less than or equal to the computed sample average of 72.47; and (ii) there is a very high probability (0.84) that the true value of the population mean lies between 72 and 73, an interval that includes the computed average, 72.47. These results imply that the alternative proposition is highly plausible, especially when compared to the results of Problem 13.28. 13.30 If we assume that the sample size, n, is sufficiently large, and/or that the variability in the ball bearings diameter, X, is approximately normally dis¯ is normal with mean µ = 10 and tributed, then the sampling distribution of X standard deviation: 0.1 σX¯ = √ n Now, the desired probability statement, ¯ ≤ 10.00 + 0.02) = 0.95 P (10.00 − 0.02 ≤ X implies that 1.96σX¯ = 0.02 so that

0.1 0.02 σX¯ = √ = 1.96 n which, when solved for n, yields ( )2 0.02 n= = 96.04 1.96

20

CHAPTER 13.

rounded up to give the required sample size as n = 97 13.31 (i) With n = 4 and the given standard deviation, we obtain: 3σ √ = 0.15 n so that the upper limit will be 10.15 and the lower limit, 9.85. ¯ the mean of the measured ball Upon assuming that the distribution of X, bearing diameter, is well-approximated by a normal distribution, the mean and √ standard deviation of this distribution will be µ = 10 and σx¯ = 0.1/ n2 = 0.05; ¯ < 10.15)], as a result, the required probability is obtained as [1 − P (9.85 ≤ X computed from the N (10, 0.052 ) distribution. Since under these conditions, ¯ < 10.15) = 0.9973 P (9.85 ≤ X then the required probability—that x ¯ falls outside the specified limits when the process is operating as expected—is (1 − 0.9973) = 0.0027. ¯ > 10.15|µ = 10.10) computed from the (ii) The required probability is P (X 2 distribution, N (10.10, 0.05 ). The result is: ¯ > 10.15|µ = 10.10) = 0.1587 P (X 13.32 (i) The sampling distribution of interest is obtained from Eq (13.43) of the text, i.e., (n − 1)S 2 C= ∼ χ2 (n − 1) σ2 so that, with n = 50, as given in Problem 13.28: ( ) 49 × 1.52 2 P (SA ≥ 1.52 ) = P C 2 ≥ = P (C 2 ≥ 49) 1.52 which, computed from the χ2 (49) distribution, yields: 2 P (SA ≥ 1.52 ) = 0.4731 2 (ii) We are required to compute P (SB ≥ 7.62) using the sampling distribution of part (i) with σ 2 = 1.52 . Under these conditions, ( ) 49 × 7.62 2 P (SB ≥ 7.62) = P C 2 ≥ = P (C 2 ≥ 165.95) 1.52

and when computed from the χ2 (49) distribution, the result is: 2 P (SB ≥ 7.62) = 1.2657 × 10−14

virtually zero. The implication is that the postulate that the YB data came from the same population as the YA data is highly implausible, at least from

21 the perspective of the estimates of the population variance. 2 (iii) Under the specified circumstances, we require P (SB ≥ 7.62), computed using the same sampling distribution as in parts (i) and (ii), this time with σ 2 = 2.52 . In this case, ( ) 49 × 7.62 2 P (SB ≥ 7.62) = P C 2 ≥ = P (C 2 ≥ 59.74) 2.52 to yield: 2 P (SB ≥ 7.62) = 0.14

implying that this new alternative postulate is more plausible. 13.33 The sampling distribution of interest here is obtained from the result in Eq (13.44) of the text, i.e., F =

S12 ∼ F (n1 − 1, n2 − 1) S22

This particular problem requires computing: ( 2 ) SB 7.62 P > 2 SA 2.05 using the F (49, 49) distribution. The result is: ( 2 ) SB −6 P 2 > 3.717 = 4.8749 × 10 SA which is virtually zero, and implies that the proposition is highly implausible. 13.34 (i) Under the postulate that the two data sets came from the same population, they can be combined into one set, with the sample mean and standard deviation obtained respectively as: x ¯ = 72; s = 4.834 ¯ > Because the population variance is unknown, the required probability, P (X 74), is to be computed using the sampling distribution, T =

¯ −µ X √ ∼ t(ν) s/ n

In this case, with µ = 70, s = 4.834, and n = 20 for the combined data set, then ( ) 4 ¯ √ P (X > 74) = P T > = P (T > 3.7) 4.834/ 20 which, when computed from a t(ν) distribution with ν = n − 1 = 19 degrees of freedom, yields ¯ > 74) = 0.00075 P (X

22

CHAPTER 13. Now, the individual sample means are obtained as: x ¯A = 70; x ¯B = 74;

As a result, and because of the extremely low probability of obtaining a sample mean of 74 or higher under the postulate in question, these results indicate that the specified postulate is highly implausible. (ii) Under the stated conditions, with the standard deviation for the B data set obtained as sB = 5.395, the desired limits are: 75-3.412 = 71.588, and 75 + ¯ < 78.412), to be computed 3.412 = 78.412. As such, we now seek P (71.588 < X using the random variable: ¯ − 75 X √ T = 5.395/ 10 which possesses a t(9) distribution, i.e., ( ) 71.588 − 75 78.412 − 75 ¯ √ √ P (71.588 < X < 78.412) = P 0, we have retained the only valid root). From the supplied data, whose moments are computed as: M1 = 3.8; M2 = 30.5 the two method of moments estimates are obtained as: pˆ1

=

0.2632

pˆ2

=

0.2402

(ii) The inverse harmonic mean, defined as 1∑ 1 1 = x ˆh n i=1 xi n

is computed for this data set as: 1 = 0.556 x ˆh

11 a value that is substantially higher than the population parameter p = 0.25. Thus, the inverse harmonic mean is not as good an estimate as the two method of moments estimates, both of which are much closer to the true population parameter value. 14.15 From the expression for the mean of the exponential random variable, we obtain: M1 = β so that the first method of moments estimator will be: βˆ1 = M1 and from the expression for the variance, and the relationship between the variance and the first two moments, we obtain: M2 = σ 2 + M1 ⇒ M2 = β 2 + β which, when solved for β, gives the second method of moments estimator as: √ −1 + 1 + 4M2 ˆ β2 = 2 (retaining the only valid root of the quadratic equation, since β > 0). From the supplied data, we obtain the moments as: m1 = 3.912; m2 = 30.402 from which we obtain the following point estimates: βˆ1 βˆ2

= 3.912 = 5.036

The first point estimate is much closer to the true value of β = 4 than the second point estimate. 14.16 The first two moments M1 and M2 for any random variable are related to the theoretical mean and variance as follows: µ = σ2 =

M1 M2 − M12

Specifically for the beta B(α, β) random variable, for which the mean and variance are as given in the text, the desired result may be obtained by first solving the following two equations simultaneously for α and β in terms of µ and σ 2 : α α+β αβ (α + β)2 (α + β + 1)

= µ = σ2

12

CHAPTER 14.

and subsequently replacing µ and σ 2 with the expressions for the moments given above. The result is: [ ] (1 − µ)µ α = µ −1 σ2 [ ] (1 − µ)µ β = (1 − µ) −1 σ2 from where we obtain:

[

] (1 − M1 )M1 − 1 (M2 − M12 ) [ ] (1 − M1 )M1 − 1 (1 − M1 ) (M2 − M12 )

α

= M1

β

=

which may be simplified further if so desired. 14.17 From the relationship between (µ, σ 2 ) for any random variable, and the two moments (M1 , M2 ), µ = M1 σ 2 = M2 − M12 we obtain, specifically for the Rayleigh random variable, that: √ M1 = b π/2 M2 = b2 (2 − π/2) + b2 (π/2) = 2b2 From the first expression, we obtain: ˆb1 = M1 and from the second, ˆb2 =



2/π

√ M2 /2

14.18 Because the first two moments M1 and M2 for any random variable are related to the theoretical mean and variance as follows: µ = M1 σ 2 = M2 − M12 and since, for the gamma random variable, µ = αβ σ 2 = αβ 2 the objective is to solve, simultaneously, the following two equations for α and β in terms of M1 and M2 : M1

=

αβ

M2 − M12

=

αβ 2

13 Upon dividing the second equation by the first, we obtain immediately: β=

M2 − M12 M1

which, upon introduction into the first equation and solving for the surviving α, yields: M12 α= M2 − M12 14.19 The maximum of the given likelihood function, ( ) n X L(p) = p (1 − p)n−X X may be determined first by taking logs, to obtain the log-likelihood function, ( ) n ℓ(p) = ln L(p) = ln + X ln p + (n − X) ln(1 − p) X and then, upon differentiating ℓ(p) with respect to p, and setting the result to zero, we obtain: ∂ℓ(p) X n−X = − =0 ∂p p (1 − p) which simplifies to give: (1 − p)X = p(n − X) and, when solved for p, yields: pˆ =

X n

as required. 14.20 From the pdf for the geometric random variable, f (x; θ) = θ(1 − θ)X−1 the joint pdf (and likelihood function) for the random sample, X1 , X2 , . . . , Xn , is obtained as: ∑ L(θ) = θn (1 − θ) Xi −n From here, by differentiating the log-likelihood with respect to θ and setting the result to zero, we obtain: ∑ ∂log(L) n (Xi − n) = − =0 ∂θ θ 1−θ which, when solved for θ, yields: 1 n = ¯ θˆ = ∑ Xi X

(14.19)

14

CHAPTER 14.

as the MLE for the parameter, θ. Now, upon taking expectations in Eq (14.19), we obtain: ( ) ˆ = E 1 ̸= θ E(θ) ¯ X ˆ ̸= θ. On the other hand, also from so that indeed, θˆ is biased for θ since E(θ) Eq (14.19), we see that: 1 ¯ =X θˆ so that:

( n ) ( ) ∑ nE(Xi ) 1 ¯ E = E(X) = E Xi /n = = E(Xi ) n θˆ i=1

and since, for the geometric random variable, E(Xi ) =

1 θ

( ) 1 1 = E ˆ θ θ

we have that:

so that 1/θˆ is unbiased for 1/θ. 14.21 The likelihood function for a random sample X1 , X2 , . . . , Xn from a general negative binomial N Bi(k, p) population is: ∏ (k + Xi − 1) ∑ L(k, p) = pkn (1 − p) Xi Xi from where the log-likelihood is obtained as: n ∑ (k + Xi − 1) ∑ ln + nk ln p + Xi ln(1 − p) ℓ(k, p) = Xi i i=1

(14.20)

The maximum likelihood estimates for k and p are obtained from: ∂ℓ(k, p) ∂p ∂ℓ(k, p) ∂k

= 0 = 0

by solving both equations simultaneously. Upon taking appropriate derivatives in Eq (14.20), we obtain: ∂ℓ(k, p) ∂p ∂ℓ(k, p) ∂k

kn ∑ Xi − =0 p 1−p i=1 n

=

= n ln p +

n ∑ i=1

[ψ(Xi + k) − ψ(k)] = 0

(14.21) (14.22)

15 where ψ(z) is the so-called psi-function defined as: ψ(z) =

d ln Γ(z) dz

(14.23)

(Eq (14.22) is obtained by using the gamma function representation of factorials.) Unfortunately, because of the presence of the psi functions, there is no closed form analytical solution to Eq (14.22), so that the MLEs must be obtained numerically. However, when k is a fixed, known constant, the MLE for p is obtained by solving Eq (14.21) to yield: pˆ = θˆ =

kn k ∑ = ¯ kn + Xi k+X

Observe from here that:

( ˆ =E E(θ)

k ¯ k+X

(14.24)

) ̸= θ

however, from Eq (14.24), we obtain the inverse as: ¯ ¯ 1 k+X X = =1+ ˆ k k θ so that: E

( ) ¯ 1 E(X) µ =1+ =1+ k k θˆ

where µ is the mean of the negative binomial random variable, given by: µ= Thus,

k(1 − p) p

( ) 1 (1 − p) 1 1 E =1+ = = ˆ p p θ θ

as required. 14.22 The likelihood function for a random sample X1 , X2 , . . . , Xn from a gamma γ(α, β) population is: L(α, β) =

N ∏ i=1

1 β α Γ(α)

e−Xi /β Xiα−1

from where the log-likelihood is obtained as: ℓ(α, β) = −nα ln β − n ln Γ(α) −

n ∑ Xi i=1

β

+ (α − 1)

n ∑ i=1

ln Xi

(14.25)

16

CHAPTER 14.

(i) Now, if α is specified, the MLE for β is obtained by solving the following equation for β: ∑n Xi ∂ℓ(α, β) nα =− + i=12 =0 ∂β β β which, when solved for β, yields: ∑n ¯ Xi X = (14.26) βˆ = i=1 nα α (ii) When α is not specified, obtaining the MLE estimate for this parameter requires maximizing the log-likelihood function with respect to α, as was just done for β. For this, we substitute Eq (14.26) into Eq (14.25) for β to obtain: ( ∑n ) n ∑ i=1 Xi ℓ(α) = −nα ln − n ln Γ(α) − nα + (α − 1) ln (Xi ) (14.27) nα i=1 And now, taking the derivative with respect to α, setting to zero, and simplifying, yields: ( ∑n ) ∑n ln Xi i=1 Xi ln α − ψ(α) = ln − i=1 (14.28) n n where ψ(α) is the so-called psi-function defined as: ψ(z) =

d ln Γ(z) dz

Unfortunately, there is no closed-form solution to Eq (14.28) because of the presence of the psi-function. On the other hand, the method-of-moments estimates (from Problem 14.18, for example) are: βˆ = α ˆ

=

M2 − M12 V ar(x) = M1 M ean(x) M12 M2 − M12

which are entirely different from the MLEs. In particular, note that βˆ above is the ratio of the variance (in the numerator) to the mean (in the denominator); the MLE equivalent in Eq (14.26) involves the mean in the numerator and α—whatever that turns out to be—in the denominator. No such comparison is even possible with α ˆ and the MLE equivalent, since no closed-form analytic expression exists for the latter. Sections 14.4 and 14.5 14.23 (i) The problem involves determining from the following probabilistic statement concerning samples from a normal population, ] [ ¯ −µ X √ < zα/2 = (1 − α) (14.29) P −zα/2 < σ/ n

17 specific values for the indicated interval, given a corresponding value of α. From here, the required interval may be represented as ( ) ¯ ± zα/2 √σ µ=X (14.30) n where zα/2 is determined from Eq (14.29) given α. In the specific case where the desired probability is given as 0.9, implying that α = 0.10, we obtain zα/2 = 1.645 As such, with the given values of x ¯, n and σ, we obtain, from Eq (14.30) that the required interval is: 73.58 < µ < 75.22 (ii) When the desired probability is given as 0.95, so that α = 0.05, we obtain zα/2 = 1.96 with the result that this time, the interval will be: 73.42 < µ < 75.38 (iii) When the population variance is not given, one must now use the sample variance S in place of σ; furthermore, under these circumstances, the required interval will now be represented as ( ) ¯ ± tα/2 (n − 1) √S µ=X (14.31) n where tα/2 (n − 1) is determined from [ ] ¯ −µ X √ < tα/2 (n − 1) = (1 − α) P −tα/2 (n − 1) < S/ n

(14.32)

given α, a probability that is computed from the t(ν) distribution, with ν = n−1 degrees of freedom. And now, when the desired probability is given as 0.9, implying that α = 0.10, we obtain: tα/2 (19) = 1.73 so that from Eq (14.32), the required interval is 73.49 < µ < 75.32 When the desired probability is 0.95, implying that α = 0.05, tα/2 (19) = 2.09 and therefore, from Eq (14.32), the required interval is 73.29 < µ < 75.51

18

CHAPTER 14.

¯ follows a 14.24 Upon the (reasonable) assumption that the sample mean, X, 2 2 Gaussian distribution with mean λ and variance σ /n, where σ is the population variance, then it follows that with probability (1 − α), the unknown population mean value, λ, will lie in the following interval: ) ( ¯ ± tα/2 (n − 1) √S (14.33) λ=X n where S is the sample standard deviation. For the specific data set in question, the standard deviation is obtained as 1.097. With the given value of x ¯ = 1.02, and n = 60, Eq (14.33) is used to generate the following table which shows the required interval for various values of the desired probability. Probability 0.90 0.95 0.99

α 0.10 0.05 0.01

tα/2 (59) 1.67 2.00 2.66

Min(λ) 0.783 0.737 0.643

Max(λ) 1.257 1.303 1.397

14.25 The given expression for C is rearranged to yield: S2 =

C σ2 (n − 1)

where, upon taking expectations, we obtain: E(S 2 ) =

E(C) 2 σ (n − 1)

Now, because C ∼ χ2 (n − 1), we know immediately from the characteristics of the χ2 random variable, that E(C) = (n − 1) with the immediate result that E(S 2 ) = σ 2 so that, indeed, the estimator S 2 is unbiased for the population variance, σ 2 . 14.26 This problem requires determining the tail area probability, α/2, from a standard normal distribution, given zα/2 , from which the confidence level is obtained as (1 − α) × 100%. (i) With zα/2 = 1.645, obtain α/2 = 0.05, so that the implied confidence level is 90%. (ii) With zα/2 = 2.575, obtain α/2 = 0.005, so that the implied confidence level is 99%. (iii) With zα/2 = 3.000, obtain α/2 = 0.0027, so that the implied confidence level is 99.73%. 14.27 (i) We begin by invoking a well-known result:

19 The sum of 2 independent normal random variables X1 ∼ N (µ1 , σ12 ) and X2 ∼ N (µ2 , σ22 ) is another normal random variable, XΣ ∼ 2 N (µΣ , σΣ ) where µΣ 2 σΣ

= µ1 + µ2 = σ12 + σ22

(14.34) (14.35)

This result is straightforward to establish using the characteristic function method. Recall from Eq (9.127) in the text that the characteristic function for a Gaussian random variable with mean µ and variance σ 2 is: { } 1 φ(t) = exp jµt − σ 2 t2 2 Furthermore, in general, if the CF for X1 is φ1 (t), and that for X2 is φ2 (t), then the CF for XΣ = X1 + X2 is φΣ (t) = φ1 (t)φ2 (t) so that from the CF for the Gaussian random variable, we obtain: { } 1 2 2 2 φΣ (t) = exp j(µ1 + µ2 )t − (σ1 + σ2 )t 2 recognized immediately as the CF of a Gaussian random variable with mean and variance precisely as specified in Eqs (14.34) and (14.35) respectively. 2 We may now invoke this result to obtain, immediately, that if X ∼ N (µX , σX ) 2 and Y ∼ N (µY , σY ) then the random variable D = X − Y will have a normal 2 distribution with mean µD and variance σD given by: µD

= µX − µY

2 σD

2 = σX + σY2

i.e., f (d), the pdf for the random variable, D, is: { } 1 −(d − µD )2 √ exp f (d) = 2 2σD σD 2

(14.36)

(14.37)

regardless of sample size. (ii) From Eq (14.51) in the text, we know that the MLE for the mean of the Gaussian random variable X is, in this case, 1∑ ¯ Xi = X n i=1 n

µ ˆX = and for the random variable Y ,

1 ∑ µ ˆY = Yi = Y¯ m i=1 m

20

CHAPTER 14.

To obtain the MLE for the Gaussian random variable, D, we may simply invoke the “invariance property” of the MLE to assert from Eq (14.36) above, that ¯ − Y¯ µ ˆD = µ ˆX − µ ˆY = X Thus, if the mean of the random variable D = X − Y is defined as δXY = (µX − µY ), then the MLE for δXY is ¯ =X ¯ − Y¯ D as required. 14.28 (i) The required interval may be represented as ( ) σ ¯ µ = X ± z0.025 √ n so that within the context of this problem, ( ) σ 1 z0.025 √ = w 2 n (1.96)(2.236) √ = 0.25 n

(14.38)

to yield n = 307.32 which is rounded up to obtain n = 308. Observe that because the population variance, σ 2 , is 5, but we desire to 1 estimate the mean to a level of precision prescribed as an interval 10 th the size of the sample variance, a large sample size will be required in order to achieve such a comparatively stringent level of precision. (ii) When σ 2 is doubled, solving for n in Eq (14.38) above yields n = 614.656, rounded up to n = 615. But for the round-up effect in each case, this sample size would be precisely double that in (i) above. In general, if under one condition the problem characteristics are σ12 , n1 , respectively the population variance, and sample size required to obtain a 95% confidence interval with width w1 , and the corresponding characteristics under a second condition are σ22 , n2 and w2 , it is straightforward to see, from Eq (14.38) above, that: √ √

σ12 σ22

n1 n2

=

w1 w2

(14.39)

so that if the desire is to maintain the same interval width (i.e., w1 = w2 ), then, the relationship between the population variances and sample sizes will be: √ √ σ12 n1 = 2 σ2 n2

21 or

σ12 n1 = σ22 n2

Thus, if the population variance doubles, the requires sample size will also double. In general, if the new population variance is a multiple k of the original population variance, then the sample size, n2 , required to maintain the same 95% confidence interval width will be kn1 . (iii) Under these circumstances, n1 = n2 , and the required 95% CI width, w2 , may be obtained from Eq (14.39) as: √ w2 = w1 a width that is



√ σ22 = w1 2 = 0.707 2 σ1

2 times larger than the original 95% CI width.

14.29 In this case, the required confidence intervals are obtained from the expression ¯ ± tα/2 (n − 1) √S µ=X n with the results shown in the table below. (1 − α) 0.90 0.95 0.99

tα/2 (99) 1.66 1.98 2.63

Min(µ) 10.297 10.258 10.178

Max(µ) 10.703 10.743 10.822

First, we observe that the results in this table show a systematic widening of the resulting confidence interval with increasing desired degree of confidence. All other things being equal, it is perfectly logical that a higher degree of confidence that the unknown parameter will lie in a particular region will require a wider interval. Narrowing the interval will reduce the attendant confidence that it will contain the unknown parameter. 14.30 (i) Assuming that X, the total number of respondents in favor of the proposition, is a binomial Bi(n, p) random variable, then 2 σX = np(1 − p)

Consequently, the associated variance for the estimate pˆ = is obtained as: V ar(ˆ p) =

X n

p(1 − p) V ar(X) = 2 n n

22

CHAPTER 14.

Thus,



p(1 − p) n If we now assume that the sample size, n = 50, is large enough for the standard normal approximation to hold for {(ˆ p − p)/σp }, then the 95% CI for the estimate of the true but unknown parameter will be (√ ) 0.72 × 0.28 θ = 0.72 ± 1.96 = 0.72 ± 0.124 50 σp =

or, equivalently that 0.596 < θ < 0.844. (ii) Under the indicated circumstances, we desire to find n such that: (√ ) 0.8 × 0.2 1.96 = 0.05 n or, n=

1.96 × 0.8 × 0.2 = 245.862 (0.05)2

which may be rounded up to n = 246. Observe that with a sample size, n = 50, the width of the 95% CI is ±0.124; to reduce this CI interval width to ±0.05 (i.e., by more than a half), requires almost quintupling the number of people sampled. (iii) With a confidence level reduction to 90%, the implication is that: z0.05 = 1.64 and for the same desired interval width, the required sample size is obtained as: n=

1.64 × 0.8 × 0.2 = 172.134 (0.05)2

rounded up to n = 173. Reducing the confidence level reduces the required sample size commensurately. The implication is obvious: a higher confidence level in the estimated population proportion requires sampling a larger number of people; if one moderates one’s desired confidence level, then one can sample a smaller number of people and still have a CI of the same width. 14.31 The desired 95% confidence interval is obtained from: √ 2 SX S2 ¯ ¯ δxy = (Y − X) ± zα/2 + Y n m and from the specific supplied information, we obtain: √ 20 35 δxy = (42.4 − 38.8) ± 1.96 + = 3.60 ± 1.46 120 90

23 giving the interval as 2.14 < δxy < 5.06 which does not include zero. 14.32 For this particular problem, we note that the “half-width” of the (symmetric) confidence interval is given by: √ 2 SX S2 1 w1/2 = zα/2 + X2 n1 n2 With the specific values indicated, this becomes: √ 30 2 = 1.96 n which, when solved for n, yields: n = 28.8 rounded up to give n = 29. Section 14.6 14.33 (i) The equation in question is of the form f (θ|x) = CB θx+1 (1 − θ)n−x+2 which is the form of a beta B(α, β) distribution where: α = (x + 2); β = (n − x + 3) Under these circumstances, the normalizing constant, CB , is given by: CB =

Γ(α + β) Γ(n + 5) = Γ(α)Γ(β) Γ(x + 2)Γ(n − x + 3)

If we now write the indicated Gamma functions in factorial form, i.e., Γ(α) = (α − 1)! we obtain: CB =

(n + 4)! (x + 1)!(n − x + 2)!

as required. (ii) From the given expressions for the Bayes and MAP estimates, we obtain: E(θˆB ) = E(θˆ∗ ) =

E(X) + 2 n+5 E(X) + 1 n+3

24

CHAPTER 14.

so that, because E(X) = np = nθ we find that: E(θˆB ) = E(θˆ∗ ) =

nθ + 2 n+5 nθ + 1 n+3

neither of which is exactly equal to θ for finite n, only approaching it as n → ∞. Therefore, both estimates are biased. From the expressions for these three estimates, we are able to obtain the associated variances as follows: First, ˆ = VM LE = V ar(θ)

V ar(X) n2

Also, VB = V ar(θˆB )

=

VM AP = V ar(θˆ∗ )

=

1 V ar(X) (n + 5)2 1 V ar(X) (n + 3)2

From where we deduce that ( VB

= (

VM AP

=

n n+5 n n+3

)2 VM LE )2 VM LE

Observe therefore that for finite n, ( )2 ( )2 n n < 6—not shown in the table above—are obtained from the relevant negative binomial pdf.)

34

CHAPTER 14.

NE → NT ↓ 1 2 3 4 5 6 7 8 9 10 11 12 13

1

2

3

4

5

6

7

0.500 0.250 0.125 0.063 0.031 0.016 0.008 0.004 0.002 0.001 0.000 0.000 0.000

– 0.250 0.250 0.188 0.125 0.078 0.047 0.027 0.016 0.009 0.005 0.003 0.001

– – 0.125 0.188 0.188 0.156 0.117 0.082 0.055 0.022 0.013 0.008 0.005

– – – 0.063 0.125 0.156 0.156 0.137 0.109 0.082 0.040 0.027 0.017

– – – – 0.031 0.078 0.117 0.137 0.137 0.123 0.103 0.060 0.044

– – – – – 0.016 0.047 0.082 0.109 0.123 0.123 0.113 0.079

– – – – – – 0.008 0.027 0.055 0.082 0.103 0.113 0.113

Here are some interesting facts revealed by this table of probabilities about the problem at hand. For example, the table shows that if there is only one early symptomatic patient (in the first month), it is highly unlikely that there will be as many as 7 total infected patients by the end of the year. If there are 2 early symptomatic patients in the first month, the chance that, at the end of the year, there will be just the two, is equal to the chance that there will be a third one, and a very slim chance that there will be as many as 7. If there are 3 early symptomatic patients, there is a low, but non-zero chance that there could be as many as 10 total infected patients. (Observe that this is indeed the case in Year 3.) If there are as many as 7 early symptomatic cases, the chances that there will be 13 total cases is higher than the chances that there will be, say, only 10; however, there is a small, but non-zero, chance that there will be just the observed 7. (This is in fact the case in Year 4.) In addition, the table also indicates, for each value of NE (the early symptomatic patients), a corresponding number NT∗ , the total number of infected with the highest probability of occurrence, along with the associated probability. For example, for NE = 1, NT∗ = 1 with p∗ = 0.5; for NE = 2, NT∗ = 1 or 2, with p∗ = 0.25; for NE = 3, NT∗ = 4 or 5, with p∗ = 0.188; for NE = 5, NT∗ = 8 or 9, with p∗ = 0.137; etc, thus providing an objective means for anticipating (and preparing for) the expected total case load, on the basis of the early symptomatic patients observed in the first month. 14.40 (i) To calculate the sample mean and variance, we assign the observed frequency associated with 5+ totally to x = 5 accidents; we therefore obtain, for the sample average: x ¯=

1∑ xi f (xi ) = 0.465 n i

35 and for the sample variance, s2 =

1∑ (xi − x ¯)2 f (xi ) = 0.691 n

We now observe that for the standard Poisson random variable, x ¯ and s2 should 2 be approximately equal; this is not the case here. In fact, s > x ¯, so that this random variable is “overdispersed.” (ii) The first moment and variance for the negative binomial random variable are related to the population parameters as follows: ( ) 1−p M1 = α p ( ) 1 − p σ 2 = M2 − M12 = α p2 from which we obtain the following expressions for the parameter estimates in terms of the moments: M1 σ2 M12 2 σ − M1

p = α

=

For the specific problem at hand, therefore, α ˆ

= 0.957

pˆ = 0.673 14.41 The appropriate model is the exponential E(β) pdf; the MLE for the unknown population parameter, β, is the sample average, i.e., βˆ = x ¯ In this case, for the three different populations, we obtain, directly from the supplied data: βˆA = x ¯A βˆB = x ¯B

= 1.867 = 2.124

βˆC = x ¯C

= 1.972

To obtain a precise 95% confidence interval, we may simply recall Example 14.14 in the text, where we showed that if X ∼ E(β), then ¯ X ∼ γ(n, 1/n) β Thus, the 95% CI estimate is obtained such that: ( ) ¯ X P ξL < < ξR = 0.95 β

(14.53)

36

CHAPTER 14.

Distribution Plot Gamma, Shape=10, Scale=0.1, Thresh=0 1.4 1.2

Density

1.0 0.8 0.6 0.4 0.2 0.025 0.0

0

0.025 0.480

1.71 X

Figure 14.2: The gamma γ(10, 0.1) distribution showing left and right variates corresponding to tail area probability 0.05: ξL = 0.48; ξR = 1.71.

with the indicated probability determined from the gamma γ(n, 1/n) pdf. In this specific case, since n = 10, we obtain ξL = 0.48 and ξR = 1.71 from the gamma γ(10, 0.1) pdf, as shown Fig 14.2. The expression in Eq (14.53) may be rearranged to yield the desired 95% confidence interval estimate for each computed sample average: x ¯i x ¯i < βi < ξR ξL Upon introducing the values obtained above for ξL and ξR and the computed sample averages, the result is: 1.092 < βA < 3.890 1.242 < βB < 4.425 1.153 < βC < 4.108 We may now observe that these 3 intervals overlap, so that with 95% confidence, there is no evidence that there is any difference between the safety performances of these 3 operators, strictly on the basis of this data set. 14.42 We begin by observing that the aggregate time-to-publication is a sum of the time it takes for several sequential events to happen: getting the manuscript to reviewers and completing the first round of reviews; returning the reviews to the authors and completing the necessary revisions; completing the final review and making a final decision to accept; going to press and publishing the paper (in print and/or on-line). Observe also that each event takes a randomly varying length of time to complete. The appropriate model is therefore the

37 gamma distribution, which represents phenomena involving waiting times until the occurrence of k events. A plot of the data histogram is shown in Fig 14.3, along with a theoretical gamma distribution fit (as obtained via MINITAB), with α = 3.58, β = 2.83; it indicates that the gamma distribution provides a good fit for this phenomenon. Histogram of JanPapers Gamma Shape 3.577 Scale 2.830 N 85

20

Frequency

15

10

5

0 0

4

8

12 16 JanPapers

20

24

Figure 14.3: Histogram of time-to-publication data with the theoretical gamma pdf fit superimposed.

The model prediction of the mode, x∗ , is obtained from the fact that, for the gamma pdf, x∗ = β(α − 1) which, in this case, with α = 3.58 and β = 2.83, yields x∗ = 7.3 months From the histogram, the data mode is seen to occur at the bin centered on x = 8, a value that is quite close to the gamma model estimate. (Note that determining distribution modes empirically from data histograms is subject to the choice of bin sizes and hence the number of bins used to obtain the histogram; different bin sizes may produce different values for the mode.) Upon ranking the 85 data entries from the lowest to the highest value, one finds that the 29th ranked entry is 7.4, implying that a total of 56 papers took longer to publish than the theoretical mode of 7.3 months. The data table therefore implies that the proportion of papers that took longer than x∗ = 7.3 months to publish is 56/85 = 0.659. From MINITAB, we obtain, from the theoretical gamma γ(3.58, 2.83) distribution, P (x > 7.3) = 0.658 see Fig (14.4).

38

CHAPTER 14.

Distribution Plot Gamma, Shape=3.58, Scale=2.83, Thresh=0 0.09 0.08 0.07

f(x)

0.06 0.05 0.04 0.03 0.02

0.658

0.01 0.00

0

7.3 X

Figure 14.4: Theoretical value for P (X > x∗ ) (with x∗ = 7.3) for the time-to-publication gamma pdf. The value, 0.658, is virtually identical to the proportion 56/85=0.659 determined from the raw data table.

Thus, the value obtained for this probability from the theoretical gamma distribution is virtually identical to the value obtained from the data. 14.43 From the given logarithmic series pdf, the likelihood function from a random sample, X1 , X2 , . . . , Xn , is obtained as: ∑

αn p L(p) = ∏n

i=1

Xi

Xi

from where we obtain the log-likelihood function as: ℓ(p) = n ln α + ln p

n ∑

Xi −

i=1

n ∑

ln Xi

i=1

The MLE for p is obtained by solving the equation ∂ℓ(p) =0 ∂p In this case, keeping in mind that α itself is a function of p, specifically, α=

−1 ln(1 − p)

we now obtain: n dα ∂ℓ(p) = + ∂p α dp

∑n i=1

p

(14.54)

Xi

=0

39 or, upon introducing Eq (14.54) for α, dα −n ln(1 − p) + dp to yield ln(1 − p)

∑n i=1

Xi

p

=0

¯ X dα = dp p

(14.55)

Now, from Eq (14.54), we obtain −1 dα = dp [ln(1 − p)]2 (1 − p) so that Eq (14.55) becomes −p ¯ =X (1 − p) ln(1 − p)

(14.56)

which is the equation to be solved for p to obtain the MLE, pˆ. Observe that if we were to introduce Eq (14.54) into this expression, the result will be ¯= X

αp (1 − p)

Of course, this is precisely the expression one would obtain if, in order to obtain the method-of-moments estimate for the unknown parameter p, one introduces the sample mean into the expression for the expected value for this random variable. The MLE and the method-of-moments estimate would therefore coincide precisely, as required. By definition, the sample average is obtained from the supplied data as: ∑ xi Φ(xi ) 3306 x ¯ = ∑i = = 6.599 Φ(x ) 501 i i From Eq (14.56) above, determining the MLE estimate for the population parameter p requires solving the following nonlinear equation numerically for p: 6.599 =

−p (1 − p) ln(1 − p)

The result is: pˆ = 0.953 Upon introducing this value into the logarithmic series pdf, f (x) = resulting predicted frequency, obtained as:

αpx x ,

the

ˆ Φ(x) = 501f (x) is shown in the following table, along with the observed frequency, with both frequencies plotted in Fig 14.5. From this table and the plots, we observe that the model appears sufficiently adequate.

40

CHAPTER 14.

160

Variable Observ ed Predicted

140 120

Frequency

100 80 60 40 20 0 0

5

10

15

20

25

X

Figure 14.5: Empirical frequency distribution of the Malaya butterfly data (solid line, circles) versus theoretical logarithmic series model, with p = 0.953 (dashed line, squares).

No of species x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Observed Frequency Φ(x) 118 74 44 24 29 22 20 19 20 15 12 14 6 12 6 9 9 6 10 10 11 5 3 3

Predicted Frequency ˆ Φ(x) 156.152 74.407 47.273 33.788 25.760 20.458 16.711 13.935 11.805 10.125 8.772 7.663 6.741 5.965 5.306 4.740 4.252 3.827 3.455 3.128 2.839 2.583 2.354 2.150

41 14.44 (i) The sample mean, standard deviation, and variance are obtained from the data as follows: x ¯ = 0.219; s = 0.0113; s2 = 1.277 × 10−4 Thus, with ν = 17 degrees of freedom, the value of t0.025 (ν) is 2.110; the 95% CI for the mean is therefore obtained as: ¯ ± t0.025 (ν) √s = 0.219 ± 0.0056 µ=X n or, 0.2134 < µ < 0.2246 Similarly for the variance, we obtain values for χ20.975 (17) and χ20.025 (17) as 7.56 and 30.2, respectively, so that the 95% CI for the variance is obtained as (n − 1)s2 (n − 1)s2 2 < σ < χ20.975 (17) χ20.975 (17) or

0.719 × 10−4 < σ 2 < 2.871 × 10−4

(ii) The problem requires that we compute P (0.19 < X < 0.25) from a normal pdf with µ = 0.22; σ = 0.011; the result, using MINITAB (see Fig 14.6) is: P (0.19 < X < 0.25) = 0.994

Distribution Plot Normal, Mean=0.22, StDev=0.011 40 0.994

Density

30

20

10

0

0.19

0.22 X

0.25

Figure 14.6: Determining P (0.19 < X < 0.25) for X ∼ N (0.22, 0.011). Observe that this result makes perfect sense. The process is capable of making cylinder heads with standard deviation 0.011; the specifications require

42

CHAPTER 14.

making the cylinder heads to within almost 3 times the standard deviation associated with the manufacturing process. We should therefore expect a fairly high probability of meeting these relatively lax specifications. 14.45 In this case, since the population standard deviation is given, the 95% CI is obtained from σ µ=x ¯ ± z0.025 √ n Thus, with σ = 5, and z0.025 = 1.96, the requirement is now to determine n such that: 1.96 × 5 σ z0.025 √ = √ =1 n n from where we obtain, n = 96.04 Thus, in order to determine a mean daily rate of pollution to within ±1 (or onefifth of the intrinsic standard deviation associated with the measured pollution), with 95% confidence, one requires at least 96 samples. With 6 samples, so that ν = 5 degrees of freedom, and with an estimate of variance obtained as sˆ2 = 30, the 95% confidence interval around this estimate of σ 2 is obtained from: (n − 1)ˆ s2 (n − 1)ˆ s2 2 < σ < χ20.975 (5) χ20.975 (5) and since χ20.975 (5) = 0.831 and χ20.025 (5) = 12.8, the required 95% CI is obtained as: 11.718 < σ 2 < 180.5 Even though this appears to be a very wide interval, nevertheless, it contains the assumed population value of σ 2 = 25. Thus, even though the estimated value appears excessive, it is still consistent with the stipulated population value. 14.46 (i) Replacing 1/β with θ in the exponential pdf yields: f (x) = θe−θx ; 0 < x < ∞ so that the joint pdf for a random sample X1 , X2 , . . . , Xn , will be: f (x1 , x2 , . . . , xn |θ) = θn e−θ

∑ i

Xi

Using the given gamma γ(a, b) distribution as the prior distribution for θ, i.e., f (θ) = Cθa−1 e−θ/b results in the posterior distribution: f (θ|x1 , x2 , . . . , xn ) ∝ Cθa−1 e−θ/b θn e−θ

∑ i

Xi

43 which simplifies to f (θ|x1 , x2 , . . . , xn ) = C2 θn+a−1 e−θ(

∑ i

Xi +1/b)

(14.57)

The constant C2 is a normalizing constant to be determined such that the integral of the pdf is 1. Instead of carrying out the required integration directly, however, we may observe the useful fact that this pdf is precisely of the form of the gamma γ(α, β) pdf with α = (n + a); as such,

and

(∑ C2 =

i

∑ 1 1 = Xi + β b i

)(n+a) Xi + 1b Γ(n + a)

More importantly, this observation allows us to determine the posterior mean, θˆB , directly. Since the mean of the γ(α, β) pdf is αβ, we obtain immediately that: n+a θˆB = ∑ 1 i Xi + b so that, since θ = 1/β, the corresponding estimate βˆB will be: ∑ ¯+ 1 Xi + 1b X nb βˆB = i = n+a 1 + na

(14.58)

Similarly, since the mode of the γ(α, β) pdf is β(α − 1), we obtain the MAP estimate, θˆ∗ , as: n+a−1 θˆ∗ = ∑ 1 i Xi + b so that the corresponding βˆ∗ will be: ∑ ¯+ 1 Xi + 1b X ∗ nb ˆ β = i = n+a−1 1 + (a−1) n

(14.59)

We may now note that: (a) but for the denominators that differ slightly, these two estimates are quite similar; ¯ and, (b) as n → ∞, both estimates βˆB and βˆ∗ approach the MLE, X; (c) for finite n, and a, b > 0, βˆ∗ > βˆB even if only slightly (again because of the terms in the denominators). (ii) From the given data, we obtain the MLE, βˆM L , as: βˆM L = x ¯ = 1.867

44

CHAPTER 14.

and, from Eqs (14.58) and (14.59) above, with a = 2, b = 1, we obtain the following Bayesian estimates: βˆB βˆ∗

= 1.639 = 1.788

These estimates are such that: βˆM L > βˆ∗ > βˆB . (iii) The required gamma pdf plots are shown in Fig 14.7, where we note that the new prior distribution, the gamma γ(3, 1) pdf, favors higher values for the unknown parameter θ, slightly more than does the former prior distribution, the gamma γ(2, 1) pdf. But keep in mind that these are prior distributions for θ, the reciprocal of the parameter of interest, β. Thus, the implication for the parameter β is the reverse: the new prior distribution favors lower values than does the former prior distribution. Distribution Plot Gamma, Scale=1, Thresh=0 0.4

Shape (alpha) 2 3

Density

0.3

0.2

0.1

0.0 0

2

4

6

8

10

Figure 14.7: A comparison of the two prior distributions for θ, the former, γ(2, 1) (solid line), and the new γ(3, 1) (dashed line). Note how the new prior distribution favors higher values of θ, which translates to lower values for β, the reciprocal of θ.

Upon using a = 3 and retaining b = 1 in Eqs (14.58) and (14.59) above, we obtain the following Bayesian estimates associated with this new prior distribution: βˆB βˆ∗

= 1.513 = 1.639

where we see immediately that these new estimates are uniformly less than the corresponding estimates obtained in (ii) above. As noted earlier, this is a direct consequence of the new prior distribution which favors lower values for β.

45 14.47 (i) The gamma γ(α, β) prior pdf for θ is: f (θ) = C1 θα−1 e−θ/β with C1 as the usual normalization constant for the gamma density. Now, given the Poisson pdf, e−θ θx f (x) = x! the sampling distribution (the joint pdf for the random sample X1 , X2 , . . . , Xn ) is obtained as: ∑n e−nθ θ( i=1 xi ) f (x1 , x2 , . . . , xn |θ) = (14.60) x1 !x2 ! . . . xn ! Thus, by Bayes’ theorem, the required posterior distribution is obtained as: ∑

f (θ|x1 , x2 , . . . , xn ) = Cθ(

xi +α−1) −(nθ+θ/β)

e

(14.61)

where the new normalizing constant, C, is: C=

C1 x1 !x2 ! . . . xn !

Now, to obtain the posterior mean, θˆB , from the posterior pdf in Eq (14.61), it is particularly convenient to observe that this pdf is that of a gamma γ(a, b) random variable where: a =

n ∑

xi + α

i=1

b

=

1 n + 1/β

and since the mean of the γ(a, b) random variable is ab, then the required θˆB is: ∑n xi + α ˆ θB = i=1 (14.62) n + 1/β The maximum likelihood estimate, θˆM L , may be obtained from the sampling distribution above (Eq (14.60)) considered as the likelihood function L(θ): ∑n

e−nθ θ( i=1 xi ) L(θ) = x1 !x2 ! . . . xn !

(14.63)

By taking the usual logarithm, differentiating with respect to θ, setting to zero and solving for the value of θ that maximizes this function, (or simply by quoting from the text), one obtains ∑n xi ˆ (14.64) θM L = i=1 n

46

CHAPTER 14.

as the maximum likelihood estimate for θ. And now from Eq (14.62), observe that as α → 0 simultaneously as β → ∞ ∑n xi θˆB → i=1 = θˆM L n as required. (ii) From the given sample data, we obtain 30 ∑

xi = 51;

i=1

With n = 30 along with the prior pdf parameters given as α = 2; β = 2, we may now obtain the following results: (a) The maximum likelihood estimate is: 51 θˆM L = = 1.700 30 (b) From Eq (14.62), the Bayes’ estimate is: 51 + 2 θˆB = = 1.738 30 + 1/2 (c) Finally, using only the first 10 data points gives: 10 ∑

xi = 17;

i=1

so that the MLE will be 17/10, or θˆM L = 1.70 as obtained previously from the full data set. However, this time, the Bayes’ estimate is obtained as: 17 + 2 θˆB = = 1.81 10 + 1/2 We see that with a smaller sample, there is a larger difference between the MLE and the Bayes’ estimate. One effect of an increase in the number of available data points is therefore to shrink the difference between the two estimates. 14.48 (i) Theory: With a sampling distribution for Yk given as: { } −(yk − ηk )2 1 f (yk |ηk ) = √ exp 2σ 2 σ 2π and with a prior distribution for ηk , the unknown true mean value of Yk , specified as N (ηk−1 , v 2 ), i.e., { } 1 −(ηk − ηk−1 )2 f (ηk ) = √ exp 2v 2 v 2π

47 the combination produces the posterior distribution, { } −(yk − ηk )2 (ηk − ηk−1 )2 f (ηk |yk ) ∝ exp − 2σ 2 2v 2 ∝

(14.65)

exp{−u2 }

where u2

= =

(ηk − ηk−1 )2 (yk − ηk )2 + 2σ 2 2v 2 v 2 (yk − ηk )2 + σ 2 (ηk − ηk−1 )2 2σ 2 v 2

which, upon dividing numerator and denominator by (σ 2 + v 2 ), simplifies to: u2 =

(1 − α)(yk − ηk )2 + α(ηk − ηk−1 )2 A = 2 2˜ σ 2˜ σ2

(14.66)

where, as specified in the problem, α

=

σ ˜2

=

σ2 + v2 σ2 v2 1 = = αv 2 ; (or (1 − α)σ 2 ) 1 1 2 + v2 σ + 2 2 σ v

σ2

Next, we focus on the numerator in Eq (14.66), which may be written as: A = =

(yk − ηk )2 + α{(ηk − ηk−1 )2 − (yk − ηk )2 } 2 (yk − ηk )2 + α{2ηk (yk − ηk−1 ) + ηk−1 − yk2 }

which consolidates to: 2 − 2ηk [αηk−1 + (1 − α)yk ] A = (1 − α)yk2 + ηk2 + αηk−1

or, 2 A = (1 − α)yk2 + ηk2 + αηk−1 − 2ηk η˜k

(14.67)

η˜k = αηk−1 + (1 − α)yk

(14.68)

where Since, from Eq (14.68), 2 η˜k2 = α2 ηk−1 + (1 − α)2 yk2 + 2α(1 − α)ηk−1 yk

we may now make use of the following identities: (1 − α)2 α2

= (1 − α) − α(1 − α) = α − α(1 − α)

48

CHAPTER 14.

to obtain: η˜k2

2 2 = αηk−1 + (1 − α)yk2 − α(1 − α)(ηk−1 + yk2 ) + 2α(1 − α)ηk−1 yk

so that: 2 η˜k2 + α(1 − α)(ηk−1 − yk )2 = αηk−1 + (1 − α)yk2

(14.69)

Upon substituting Eq (14.69) into Eq (14.67), we obtain: A

= η˜k2 + ηk2 − 2ηk η˜k + α(1 − α)(ηk−1 − yk )2 2

= (ηk − η˜k ) + α(1 − α)(ηk−1 − yk )2 As a result, we may now return to Eq (14.66), and then Eq (14.65), to obtain: { } { } −(ηk − η˜k )2 (ηk−1 − yk )2 f (ηk |yk ) = C exp exp 2˜ σ2 2(σ 2 + v 2 ) And now, since σ 2 and v 2 are known constants, and at time instant k, yk , and ηk−1 are also known, then the rightmost exponential term above is a constant; therefore, the posterior distribution simplifies to: { } −(ηk − η˜k )2 f (ηk |yk ) = C1 exp (14.70) 2˜ σ2 which is the pdf of a Gaussian distribution with mean η˜k and variance σ ˜ 2 , as required. (ii) Application: Upon applying the filter equation η˜k = α˜ ηk−1 + (1 − α)yk recursively to the given viscosity data for α = 0.2 and α = 0.8, the result is shown in the table below: k 1 2 3 4 5 6 7 8 9 10 11 12 13

η˜k α = 0.2 20.6560 20.8672 21.3414 21.9883 20.2057 21.5691 22.0178 23.8116 20.9703 20.4741 18.7508 19.1422 19.8204

η˜k α = 0.8 20.1640 20.3152 20.5442 20.8653 20.6443 20.8974 21.1439 21.7671 21.4657 21.2426 20.6581 20.3744 20.2976

k 14 15 16 17 18 19 20 21 22 23 24 25

η˜k α = 0.2 18.8841 20.9608 21.6722 22.2064 20.7213 20.4003 21.7601 20.9040 19.9728 20.2106 22.7061 20.2932

η˜k α = 0.8 19.9680 20.2704 20.5863 20.9371 20.8197 20.7197 20.9958 20.9346 20.6957 20.6106 21.1544 20.8616

49 A time sequence plot of the raw data and the two filtered sequences is shown in Fig 14.8. The smaller value of α = 0.2 results in light filtering of the data, so that the filtered values are not too different from the original raw measurement. The larger value of α = 0.8 provides a substantial amount of smoothing to the data while still capturing the essence of what may be considered the true variation present in the data (minus the “noise”). Thus, the more “heavily filtered” data appears to capture more of the true dynamic essence of the data, while the lighter filtered version still appears to contain a substantial amount of “noise.”

Time Series Plot of yk, etak2, etak8 25

Variable yk etak 2 etak 8

24

Data

23 22 21 20 19 18 2

4

6

8

10

12 14 Index

16

18

20

22

24

Figure 14.8: A comparison of raw solution viscosity data (solid line, circles) with two filtered versions: “lighter filtering” with α = 0.2 (long dashed line, squares), and “heavier filtering” with α = 0.8 (short dashed line, diamonds). Note how the more “heavily filtered” data is smoother, appearing to capture more of the true dynamic essence of the data with less “noise”.

14.49 From the expression for the Weibull cdf F (x) = 1 − e−(x/β) we obtain, upon taking logs: ln[1 − F (x)] = −

ζ

( )ζ x β

or,

( )ζ x − ln[1 − F (x)] = β Keeping in mind that F (x) ≤ 1 (and hence, that [1 − F (x)] ≤ 1), so that ln[1 − F (x)] will naturally be negative, upon taking logs once more, we obtain: ln{− ln[1 − F (x)]} = ζ(ln x − ln β)

50

CHAPTER 14.

so that a plot of ln{− ln[1 − F (x)]} versus ln x will produce a straight line with slope ζ and intercept −ζ ln β. From the raw data, the empirical cumulative distribution is obtained using the expression: i − 0.5 F (xi ) ≈ n where i is the rank order of the raw data. With n = 50, the rank and corresponding empirical cdf for each raw data entry xi are determined as shown in the table below. From here, we obtain ln[1 − F (xi )], and yi = ln{− ln[1 − F (xi )]} as shown in the table. A regression of y versus ln x produces Fig 14.9, which, while not perfectly linear shows a reasonably strong linear correlation, with slope 2.319 and intercept −2.198, so that: ζˆ = 2.319; and ζ ln β = 2.198 As a result: ln β =

2.198 = 0.948 ⇒ βˆ = e0.948 = 2.581 2.319

Fitted Line Plot y = - 2.198 + 2.319 lnx 2

S R-Sq R-Sq(adj)

1

0.270424 95.5% 95.4%

0

y

-1 -2 -3 -4 -5 -1.0

-0.5

0.0

0.5 lnx

1.0

1.5

2.0

Figure 14.9: A regression procedure for obtaining approximate estimates of Weibull distribution parameters: y = ln{− ln[1 − F (xi )]}, with F (xi ) as the empirical cdf for data entry xi with rank i.

We now observe that even such approximate estimates βˆ = 2.581 and ζˆ = 2.319 are quite close to the “true” values β = 2.5 and ζ = 2.0 used by Padgett and Spurrier in their analysis.

51 xi 1.4 3.2 2.2 1.8 1.6 3.7 1.6 1.2 0.4 1.1 3.0 0.8 5.1 3.7 2.0 1.4 5.6 2.5 2.5 1.6 1.0 1.7 1.2 0.9 2.1 2.8 1.6 3.5 1.6 1.9 4.9 2.0 2.2 2.8 2.9 3.7 1.2 1.7 4.7 2.8 1.8 1.1 1.3 2.0 2.1 1.6 1.7 4.4 1.8 3.7

i 11.5 40.0 31.5 23.0 15.5 43.5 15.5 8.0 1.0 5.5 39.0 2.0 49.0 43.5 27.0 11.5 50.0 33.5 33.5 15.5 4.0 20.0 8.0 3.0 29.5 36.0 15.5 41.0 15.5 25.0 48.0 27.0 31.5 36.0 38.0 43.5 8.0 20.0 47.0 36.0 23.0 5.5 10.0 27.0 29.5 15.5 20.0 46.0 23.0 43.5

F (xi ) 0.22 0.79 0.62 0.45 0.30 0.86 0.30 0.15 0.01 0.10 0.77 0.03 0.97 0.86 0.53 0.22 0.99 0.66 0.66 0.30 0.07 0.39 0.15 0.05 0.58 0.71 0.30 0.81 0.30 0.49 0.95 0.53 0.62 0.71 0.75 0.86 0.15 0.39 0.93 0.71 0.45 0.10 0.19 0.53 0.58 0.30 0.39 0.91 0.45 0.86

ln[1 − F (xi )] -0.24846 -1.56065 -0.96758 -0.59784 -0.35667 -1.96611 -0.35667 -0.16252 -0.01005 -0.10536 -1.46968 -0.03046 -3.50656 -1.96611 -0.75502 -0.24846 -4.60517 -1.07881 -1.07881 -0.35667 -0.07257 -0.49430 -0.16252 -0.05129 -0.86750 -1.23787 -0.35667 -1.66073 -0.35667 -0.67334 -2.99573 -0.75502 -0.96758 -1.23787 -1.38629 -1.96611 -0.16252 -0.49430 -2.65926 -1.23787 -0.59784 -0.10536 -0.21072 -0.75502 -0.86750 -0.35667 -0.49430 -2.40795 -0.59784 -1.96611

ln xi 0.33647 1.16315 0.78846 0.58779 0.47000 1.30833 0.47000 0.18232 -0.91629 0.09531 1.09861 -0.22314 1.62924 1.30833 0.69315 0.33647 1.72277 0.91629 0.91629 0.47000 0.00000 0.53063 0.18232 -0.10536 0.74194 1.02962 0.47000 1.25276 0.47000 0.64185 1.58924 0.69315 0.78846 1.02962 1.06471 1.30833 0.18232 0.53063 1.54756 1.02962 0.58779 0.09531 0.26236 0.69315 0.74194 0.47000 0.53063 1.48160 0.58779 1.30833

yi -1.39247 0.44510 -0.03295 -0.51444 -1.03093 0.67606 -1.03093 -1.81696 -4.60015 -2.25037 0.38504 -3.49137 1.25463 0.67606 -0.28101 -1.39247 1.52718 0.07586 0.07586 -1.03093 -2.62319 -0.70462 -1.81696 -2.97020 -0.14214 0.21340 -1.03093 0.50726 -1.03093 -0.39550 1.09719 -0.28101 -0.03295 0.21340 0.32663 0.67606 -1.81696 -0.70462 0.97805 0.21340 -0.51444 -2.25037 -1.55722 -0.28101 -0.14214 -1.03093 -0.70462 0.87877 -0.51444 0.67606

52

CHAPTER 14.

14.50 Because the random variables XA , XB , YA , and YB are fractions, the most natural probability model is the beta distribution. The parameters may be estimated by maximum likelihood, although it will also be acceptable if the method of moments is used to estimate the model parameters, α, β from the mean and variance. The results are as follows. Estimation results for XA (Mean = 0.2759; Variance = 0.0166) Parameter Estimate Std. Err. α 3.0319 0.5281 β 7.9978 1.7905 The beta pdf model fit versus the data histogram is shown in Fig 14.10. 3.5

3

Density

2.5

2

1.5

1

0.5

0

0

0.1

0.2

0.3

0.4

0.5

0.6

XA

Figure 14.10: XA data histogram and beta pdf model fit.

Estimation results for YA (Mean = 0.2012; Variance = 0.0149) Parameter Estimate Std. Err. α 1.9683 0.3359 β 7.8140 1.2437 The beta pdf model fit versus the data histogram is shown in Fig 14.11. Estimation results for XB (Mean = 0.3504; Variance = 0.0230) Parameter Estimate Std. Err. α 3.1123 0.5891 β 5.7697 1.1889

53 3.5

3

Density

2.5

2

1.5

1

0.5

0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

YA

Figure 14.11: YA data histogram and beta pdf model fit. The beta pdf model fit versus the data histogram is shown in Fig 14.12. 2.5

Density

2

1.5

1

0.5

0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

XB

Figure 14.12: XB data histogram and beta pdf model fit.

Estimation results for YB (Mean = 0.1371 ; Variance = 0.0128) Parameter α α

Estimate 1.1272 7.0942

Std. Err. 0.2315 1.5612

The beta pdf model fit versus the data histogram is shown in Fig 14.13.

54

CHAPTER 14.

5

Density

4

3

2

1

0

0

0.1

0.2

0.3

0.4

0.5

0.6

YB

Figure 14.13: YB data histogram and beta pdf model fit. Observe from these figures that the beta pdf models fit the data fairly well.

Chapter 15

Exercises Section 15.2 15.1 If µ is the true mean “mooney viscosity” of elastomer produced by the process, then the appropriate null and alternative hypotheses to be tested are: H0 : µ = 44.0 Ha : µ ̸= 44.0 15.2 If µ is the true mean lifetime of the light bulbs in question, then the appropriate null and alternative hypotheses are: H0 : µ = 1000 Ha : µ > 1000 15.3 If µ is the true mean percent reduction in acne in the first week of usage of the medication, then the appropriate null and alternative hypotheses are: H0 : µ = 55 Ha : µ < 55 15.4 Let µA and µB represent the mean lifetime of brand A and brand B car batteries, respectively. Then H0 : µA − µB = 0 Ha : µA − µB ̸= 0 15.5 Let µ be the true mean time (in days) between safety incidents. Then H0 : µ = 30 Ha : µ < 30 15.6 (i) (a) The indicated test is an upper-tailed z-test, with H0 : µ = µ0 Ha : µ > µ0 1

2

CHAPTER 15.

Distribution Plot T, df=5 0.4

f(x)

0.3

0.2

0.1

0.051 0.0

0 X

2.000

Figure 15.1: P (T (ν) ≥ 2.0) = 0.051 is obtained for the t-distribution when ν = 5. (b) The test statistic is Z=

¯ − µ0 X √ σ/ n

(c) The significance level α is obtained with MINITAB using the cumulative probability feature (for a standard normal, N (0, 1) distribution) as: α = P (Z > 1.65) = 1 − P (Z ≤ 1.65) = 0.05 (ii) Recall that the test statistic T =

¯ − µ0 X √ S/ n

has a t(n − 1) distribution. Now, observe that: P (T (ν) > 2.0) = 0.5 depends on the degrees of freedom, ν = n − 1, which depends on n. By trial and error, we vary the degrees of freedom ν and compute p = P (T (ν) ≥ 2.0) until p ≈ 0.051. Fig 15.1 shows that this occurs for ν = 5 so that n=6 15.7 For n = 15, the p-value associated with an upper-tailed t-test statistic of 2.0 is obtained from the cumulative probability feature in MINITAB as P (T (14) ≥ 2.0) = 0.033

3 Next, we wish to determine ν such that the critical value t0.05 (ν) = 1.70, or, equivalently, such that P (T (ν) ≥ 1.70) = 0.05. This can be done using the inverse cumulative probability feature in MINITAB by varying the degrees of freedom until the value of t corresponding to a cumulative probability of 0.05 is close to −1.70. (It can also be done using the cumulative probability feature, varying ν until the cumulative probability corresponding to t = 1.70 is close to 0.95.) The closest match is obtained for ν = 29 so that n = 30 15.8 (i) This is a simple two-tailed hypothesis test with H0 : β = β0 Ha : β ̸= β0 ¯ (ii) The test statistic is X/β. (iii) The significance level is α = 1 − 0.95 = 0.05. (iv) From the reproductive properties of the Gamma random variable (see Eq ¯ ∼ γ(n, β/n), then X/β ¯ (9.39) of the text), note that if X ∼ γ(n, 1/n). ¯ Thus, with n = 20, the distribution of X/β is obtained as γ(20, 0.05). We may now use the inverse cumulative probability feature in MINITAB (for the Gamma distribution with Shape = 20, Scale = 0.05) to determine that F −1 (0.025) = 0.611 and F −1 (0.975) = 1.484 (see Fig 15.2). Therefore, in terms of the test statistic q = x ¯/β, the rejection region is RC = {q|q < 0.611; q > 1.484} Section 15.3 15.9 (i) The appropriate test statistic is: Z=

¯ − µ0 X √ σ/ n

which, from the supplied information, is obtained as: z = −2.0 (ii) If the alternative hypothesis is Ha : µ < µ0 , at the α = 0.1 significance level, the rejection region is obtained by determining the value zα such that P (Z < zα ) = 0.1 where Z ∼ N (0, 1). From MINITAB, the value is obtained as: zα = −1.282 so that the rejection region is Z < −1.282. Since the test statistic, z = −2.0, lies in this rejection region, the null hypothesis, H0 : µ = µ0 , should be rejected.

4

CHAPTER 15.

Distribution Plot Gamma, Shape=20, Scale=0.05, Thresh=0 2.0

f(x)

1.5

1.0

0.5

0.025 0.0

0.025 0.6108

1.484 X

Figure 15.2: Rejection region for the two-sided test of the β parameter of an exponential ¯ E(β) population. For a sample size n = 20, X/β ∼ γ(20, 0.05).

15.10 The 95% confidence interval estimate for the mean is obtained as: √ µ=x ¯ ± 1.96σ/ n and in this particular case: µ = 47.50 ± 2.45 We now observe that the maximum value of this interval, 49.95, is less than the hypothesized mean of 50. Because the 95% confidence interval estimate completely excludes the hypothesized mean, the implication is that at the α = 0.05 significance level, we must reject the null hypothesis that the population mean is equal to the hypothesized mean, µ0 , in favor of the (two-sided) alternative that it is not. This is because the 95% confidence interval is associated with a two-sided test. The p value associated with this (two-sided) test is defined as: p = P (Z ≤ −2) + P (Z ≥ 2); Z ∼ N (0, 1) obtained from MINITAB as p = 0.022 + 0.022 = 0.044 15.11 Since the population variance is unknown, we make use of the T statistic: T =

x ¯ − µ0 47.5 − 50 √ =√ = −1.60 s/ n 39.06/16

5 Now, for the one-sided alternative of Exercise 15.9, Ha : µ < µ0 , at the α = 0.1 significance level, the rejection region is obtained by determining the value tα such that P (T < tα ) = 0.1 where T ∼ t(15). From MINITAB, the value is determined as: tα = −1.341 so that the rejection region is T < −1.341. We now observe that the computed T -statistic, t = −1.60, lies in the rejection region, implying that we must reject the null hypothesis at the α = 0.1 significance level. At the α = 0.05 significance level, the rejection region is obtained from MINITAB as T < −1.753 This time, the computed test statistic does not lie in this rejection region; we therefore have no evidence for rejecting the null hypothesis at the α = 0.05 significance level. 15.12 (i) The null and alternative hypotheses are: H0 : µ = 10 Ha : µ ̸= 10 (ii)The appropriate test statistic is: Z=

¯ − µ0 X √ σ/ n

From the supplied data, we obtain: x ¯ = 9.596 from where we obtain the test statistic as: z=

9.596 − 10 √ = −1.28 1/ 10

(iii) The rejection region in this case is two-sided; it is determined by finding the value zα/2 such that: P (Z < −zα/2 ) or P (Z > zα/2 ) = 0.025; Z ∼ N (0, 1) From MINITAB we obtain: zα/2 = 1.96 such that the rejection region will be Z < −1.96; or Z > 1.96

6

CHAPTER 15.

(iv) The p-value associated with the computed test statistic, z = −1.28, is obtained from P (Z < −1.28) + P (Z > 1.28) = 0.201 (v) Either from the fact that the test statistic does not lie in the rejection region obtained in (iii), or because the associated p-value exceeds the pre-specified significance level α = 0.05, the conclusion is that there is no evidence for rejecting the null hypothesis in favor of the alternative. 15.13 When the population variance is unknown, the hypotheses do not change, but the test statistic is now the t-statistic, T =

¯ − µ0 X √ s/ n

where s is the sample standard deviation. From the supplied data, we obtain s = 1.003, so that the test statistic is: t = −1.27 The rejection region is determined by finding the value tα/2 (9) such that: P (Z < −tα/2 ) or P (Z > tα/2 ) = 0.025; T ∼ t(9) From MINITAB we obtain, for the t distribution with 9 degrees of freedom, tα/2 = 2.62 such that the rejection region will be T < −2.62 or T > 2.62 The associated p-value is obtained as, P (T < −2.62) + P (T > 2.62) = 0.235 Therefore, the conclusion remains the same as in Exercise 15.12: there is no evidence for rejecting the null hypothesis. Note that the value computed for the sample standard deviation, s = 1.003, is virtually identical to the population value σ = 1.0 specified in Exercise 15.12. 15.14 (i) The required test is a one-sided, one-sample z-test; the result from MINITAB is shown below: One-Sample Z Test of mu = 75 vs > 75 The assumed standard deviation = 3

7

N 50

Mean 80.050

SE Mean 0.424

95% Lower Bound 79.352

Z 11.90

P 0.000

The conclusion is that we must reject the null hypothesis since the associated p-value is zero to three decimal places. When the alternative hypothesis is µ0 ̸= 75, the result of the two-sided test carried out in MINITAB is shown below: One-Sample Z Test of mu = 75 vs not = 75 The assumed standard deviation = 3 N 50

Mean 80.050

SE Mean 0.424

95% CI (79.218, 80.882)

Z 11.90

P 0.000

The conclusion is also that we must reject the null hypothesis because the associated p-value is zero. The two tests are fundamentally different (the former is one-sided; the latter is two-sided); however, the results of the tests are also different because, in principle, the p-value associated with the latter test is exactly twice that associated with the former test. Had we computed these p-values to a sufficient number of decimal places, this fact would have been evident. For the first test, p1 = P (Z > 11.90) while for the second test, p2 = P (Z > 11.90) + P (Z < −11.90) and from the symmetry of the standard normal distribution, we observe that p2 = 2p1 . 15.15 The test statistic under the specified conditions is: z=

106 − 100 = 15 40/100

and the significance level for the one-sided test is obtained by computing P (Z > 15); from MINITAB, this is obtained as: P (Z > 15) = 0.000 Section 15.4 15.16 (i) From the supplied information, the required distributions are: ) ( 2 ¯ ∼ N 10, 2.5 X 50 ( ) 3.02 Y¯ ∼ N 12, 50

8

CHAPTER 15.

¯ is defined as (ii) If the random variable D ¯ = Y¯ − X ¯ D then, ¯ = E(Y¯ ) − (X) ¯ = 12 − 10 = 2 E(D) and ¯ = V ar(Y¯ ) + V ar(X) ¯ = V ar(D)

2.52 + 3.02 = 0.305 50

(iii) The required z statistic is: z=

(11.8 − 10.9) − (12 − 10) √ = −1.992 0.305

If this test statistic is used to test the stated hypothesis—which may now be restated as H0 : δ = 0 versus the alternative, Ha : δ > 0—the associated p value is obtained as: p = P (Z > −1.992) = 0.977 Thus, there is no evidence in support of rejecting the null hypothesis (of equality of the means) in favor of the alternative. (iv) The problem with using the z statistic computed in (iii) to test the hypothesis of equality of the two means is that it is the wrong statistic for this particular test. To test equality of the two means, the appropriate statistic will have to be based on δ0 = µY − µX = 0, instead of δ0 = 2 which was used to obtain z = −1.992; i.e., the appropriate z statistic for testing equality should be: (11.8 − 10.9) − 0 √ z= = 1.629 0.305 with the associated p-value: p = P (Z > 1.629) = 0.052 There is now (marginal) evidence, at the α = 0.052 significance level, for rejecting the null hypothesis of equality in favor of the alternative that µY > µX , which would be more consistent with the supplied population information. 15.17 The test statistic is: z=

(15.5 − 13.8) − (0) √ = 1.271 4.22 3.52 + 20 15

To test the null hypothesis that the means are equal, versus the alternative that they are not, we could use either the associated p-value obtained as: p = P (Z > 1.271) + P (Z < −1.271) = 0.204

9 (a value greater than 0.05), or the critical value for the two-sided test for α = 0.05, zα/2 = 1.96, and the fact that 1.271 < 1.96; either way, there is no evidence for rejecting the null hypothesis in favor of the alternative. For the one-sided alternative, Ha : µ1 > µ2 , this time, the associated p-value is obtained as: p = P (Z > 1.271) = 0.102 which is greater than 0.05; also, the critical value, zα = 1.65, is such that 1.271 < 1.65. Once more, the conclusion is that there is no evidence to reject the null hypothesis in favor of the upper-tailed hypothesis. 15.18 The result of an upper-tailed, two-sample t-test is shown below when the two population variances are assumed equal. (It is important in using MINITAB to ensure that the data columns are entered into the dialog box in the correct order so that the test is carried out for the proper alternative, µY > µX , and not µX > µY .) Two-Sample T-Test and CI: Y, X Two-sample T for Y vs X N Mean StDev SE Mean Y 15 10.80 2.55 0.66 X 15 9.51 2.49 0.64 Difference = mu (Y) - mu (X) Estimate for difference: 1.290 95% lower bound for difference: -0.275 T-Test of difference = 0 (vs >): T-Value = 1.40 P-Value = 0.086 DF = 28 Both use Pooled StDev = 2.5200 The implication of this result is that at the α = 0.05 significance level, we find no evidence for rejecting the null hypothesis. Note the values of the two sample standard deviations and of the pooled standard deviation used for the test. When the population variances are not assumed to be equal, the result is: Two-Sample T-Test and CI: Y, X Two-sample T for Y vs X N Mean StDev SE Mean Y 15 10.80 2.55 0.66 X 15 9.51 2.49 0.64 Difference = mu (Y) - mu (X) Estimate for difference: 1.290 95% lower bound for difference: -0.277 T-Test of difference = 0 (vs >): T-Value = 1.40 P-Value = 0.086 DF = 27

10

CHAPTER 15.

Scatterplot of X, Y vs Sample 14

Variable X Y

13 12

Y-Data

11 10 9 8 7 6 5 0

2

4

6

8 Sample

10

12

14

16

Figure 15.3: Plot of data for X and Y against sample number This result is virtually identical to the result obtained earlier. The fundamental difference is that this current test does not use a pooled standard deviation; consequently, the “degrees of freedom” is now 27 instead of 28. Nevertheless, once we observe that the computed sample standard deviations (sX = 2.55 and sY = 2.49) are quite close, so that assuming equality is not unreasonable, we should not be surprised that the key result of both versions of the test (i.e., T = 1.40; p = 0.086) are identical. 15.19 (i) The plot is shown in Fig 15.3,where it is abundantly clear that X and Y are correlated and therefore cannot be treated as independent. (ii) Treated as paired observations, the result of a paired t-test of equality of population means, versus the alternative that µY > µX , is shown below: Paired T-Test and CI: Y, X Paired T for Y - X N Mean StDev SE Mean Y 15 10.803 2.548 0.658 X 15 9.513 2.492 0.643 Difference 15 1.290 0.577 0.149 95% lower bound for mean difference: 1.028 T-Test of mean difference = 0 (vs > 0):T-Value = 8.67 P-Value = 0.000 This time, with a p-value of 0.000, there is strong evidence for rejecting the null hypothesis in favor of the alternative. From the plot in Fig 15.3, it is clear that Y is uniformly greater than X for each sample. Yet the two-sample t-test was unable to detect this difference.

11 The main problems with the standard two-sample t-test carried out in (i) are: (a) the two data sets are not independent; and (b) the variability within each data set is more pronounced than the difference between the data sets. As such, the standard two-sample t-test is inappropriate and will be entirely incapable of detecting the overwhelming evidence of a difference between the data sets. 15.20(i) The result of a standard two-sample t-test is as follows: Two-Sample T-Test and CI: XL1, XL2 Two-sample T for XL1 vs XL2 N Mean StDev SE Mean XL1 10 1.094 0.283 0.089 XL2 10 1.379 0.348 0.11 Difference = mu (XL1) - mu (XL2) Estimate for difference: -0.284 95% CI for difference: (-0.583, 0.015) T-Test of difference = 0 (vs not =): T-Value = -2.01 P-Value = 0.061 DF = 17 With a p-value of 0.061, at the α = 0.05 significance level, we find no evidence for rejecting the null hypothesis. However, we know that the two populations are in fact different; as such, this test—with α = 0.05—has led to an incorrect conclusion. (ii) The result of a standard two-sample t-test on the log-transformed data Y1 and Y2 is as follows: Two-Sample T-Test and CI: Y1, Y2 Two-sample T for Y1 vs Y2 N Mean StDev SE Mean Y1 10 0.062 0.245 0.078 Y2 10 0.295 0.239 0.076 Difference = mu (Y1) - mu (Y2) Estimate for difference: -0.233 95% CI for difference: (-0.461, -0.004) T-Test of difference = 0 (vs not =): T-Value = -2.15 P-Value = 0.047 DF = 17 This time, because the p-value is 0.047, at the α = 0.05 significance level, we find evidence for rejecting the null hypothesis in favor of the alternative that the population means are different. Since we know that the two populations are in fact different, we observe that this test, at the α = 0.05 significance level, has led to the correct conclusion. (iii) The first test led to an incorrect conclusion in part because the normality


assumption does not hold for the data sets. However, the p-value of 0.061 is really not that far from the α = 0.05 significance level; similarly, the p-value of 0.047 for the log-transformed data is also quite close to the critical value of 0.05 employed for the test. At a significance level of 0.1, for example (or any value greater than 0.061), one would have concluded (correctly, it turns out), even from the first test, that the two populations are different.

15.21 The result of the upper-tailed, two-sample t-test is as follows:

Two-Sample T-Test and CI
Sample    N   Mean  StDev  SE Mean
1        90  42.40   5.92     0.62
2       120  38.80   4.47     0.41
Difference = mu (1) - mu (2)
Estimate for difference: 3.600
95% lower bound for difference: 2.367
T-Test of difference = 0 (vs >): T-Value = 4.83  P-Value = 0.000  DF = 159

With a p-value of 0 (actually, p = 1.593 × 10⁻⁶), at the α = 0.05 significance level, we find evidence in support of rejecting the null hypothesis in favor of the alternative. If, instead, the hypothesis is changed to a two-sided one (that the population means are different), the test statistic remains the same at t = 4.83, and the p-value doubles to p = 3.186 × 10⁻⁶, which is still practically zero. For all practical purposes, therefore, the conclusion remains unchanged. For the sake of completeness, the result of the two-sided test is shown below.

Two-Sample T-Test and CI
Sample    N   Mean  StDev  SE Mean
1        90  42.40   5.92     0.62
2       120  38.80   4.47     0.41
Difference = mu (1) - mu (2)
Estimate for difference: 3.600
95% CI for difference: (2.128, 5.072)
T-Test of difference = 0 (vs not =): T-Value = 4.83  P-Value = 0.000  DF = 159

Observe that the result now includes a 95% confidence interval for the estimated difference (which does not contain zero); and, as noted previously, the associated p-value is zero to three decimal places.

Section 15.5

15.22 (i) From a t(99) distribution, we obtain the critical values (upon which the rejection regions will be based) for α = 0.1, 0.05 and 0.01, respectively, as follows (it is also acceptable to use the standard normal distribution and z, based on the valid assumption that the sample size of 100 is large enough to make s sufficiently close to the unknown σ; the results are very similar):

t^c_{0.1} = (−1.660, 1.660); t^c_{0.05} = (−1.984, 1.984); t^c_{0.01} = (−2.626, 2.626)

From the definition of the t-statistic,

T = (X̄ − µ0)/(s/√n)

we obtain the boundaries of the rejection region for the sample mean as:

x̄ = µ0 + t^c s/√n          (15.1)

In this specific case, with the supplied values for µ0, s, and n, we obtain from Eq (15.1) that:

for α = 0.1, the rejection region is x̄ < 11.917 and x̄ > 12.083;
for α = 0.05, the rejection region is x̄ < 11.901 and x̄ > 12.099;
and for α = 0.01, the rejection region is x̄ < 11.869 and x̄ > 12.131.

(ii) With the indicated shift from 12 to 11.9 (δ* = −0.1), the "z-shift" is:

(−0.1 × 0.5)/10 = −0.005

The results of the power computations for α = 0.1, 0.05, and 0.01 follow:

Power and Sample Size
1-Sample t Test
Testing mean = null (versus not = null)
Calculating power for mean = null + difference
Alpha = 0.1  Assumed standard deviation = 0.5
            Sample
Difference    Size     Power
      -0.1     100  0.633759

so that, for α = 0.1,

(1 − β) = 0.634; ⇒ β = 0.366          (15.2)

Next, for α = 0.05,

Power and Sample Size
1-Sample t Test
Testing mean = null (versus not = null)
Calculating power for mean = null + difference
Alpha = 0.05  Assumed standard deviation = 0.5
            Sample
Difference    Size     Power
      -0.1     100  0.508265

so that, for α = 0.05,

(1 − β) = 0.508; ⇒ β = 0.492          (15.3)

Finally, for α = 0.01,

Power and Sample Size
1-Sample t Test
Testing mean = null (versus not = null)
Calculating power for mean = null + difference
Alpha = 0.01  Assumed standard deviation = 0.5
            Sample
Difference    Size     Power
      -0.1     100  0.271186

so that, for α = 0.01,

(1 − β) = 0.271; ⇒ β = 0.729          (15.4)

We now observe that as the value of α (i.e., the α-risk) is lowered from 0.1 through 0.05 to 0.01, the value of β (the β-risk) increases from 0.366 to 0.492 and finally to 0.729, with a commensurate reduction in power from 0.634 to 0.508 and finally to 0.271. Thus, as noted in the text, a reduction in one risk results in an increase in the other.

15.23 (i) δ = 0.513; δ = 0.725
(ii) δ = 0.417; δ = 0.589
(iii) n = 43; n = 19
(iv) n = 28; n = 13
(v) Power = 0.885; Power = 0.609
(vi) Power = 0.997; Power = 0.918

15.24 When the test is a 1-sample t-test, the results are as follows:
(i) δ = 0.525; δ = 0.764
(ii) δ = 0.427; δ = 0.621
(iii) n = 44; n = 21
(iv) n = 30; n = 15
(v) Power = 0.869; Power = 0.565
(vi) Power = 0.996; Power = 0.889

The differences are slight, but in general, things are "worse" across the board for the t-test compared with the corresponding results for the z-test. Specifically, the detectable differences are slightly higher, larger samples are required, and power values are lower.

15.25 The characteristics of the power-and-sample-size analysis are as follows:
• δ, the "difference" to be detected, is 0.25 (because µY2 = 0.25, while the null hypothesis, µY1 = µY2 = 0, is matched against the alternative hypothesis, µY1 ≠ µY2);
• n, the sample size, is 10;
• α = 0.05; σ = 0.25.

The result of the analysis using MINITAB is as follows:

Power and Sample Size
2-Sample t Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 0.25
            Sample
Difference    Size     Power
      0.25      10  0.562007
The sample size is for each group.

The power is obtained as (1 − β) = 0.562. To increase the power to 0.9 or better, a repeat of the analysis, inserting the desired power and removing the sample size, yields the following result (n = 23, with an actual power of 0.912):

Power and Sample Size
2-Sample t Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 0.25
            Sample  Target
Difference    Size   Power  Actual Power
      0.25      23     0.9      0.912498
The sample size is for each group.

Thus, for the test to achieve a power of 0.9 or better requires a sample size of at least 23 for each group, 13 more than was used to generate the data in Exercise 15.20.
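Power-and-sample-size computations of this sort need not be done in MINITAB. The sketch below uses the statsmodels power tools (assuming that library is available); both work in terms of the standardized effect size d = δ/σ, so the values reproduce the MINITAB results of Problems 15.22(ii) and 15.25.

import math
from statsmodels.stats.power import TTestPower, TTestIndPower

# Problem 15.22(ii): power of the one-sample t-test, n = 100, d = 0.1/0.5
# (power is symmetric in the sign of the shift for a two-sided test).
one_sample = TTestPower()
for alpha in (0.10, 0.05, 0.01):
    pw = one_sample.power(effect_size=0.2, nobs=100, alpha=alpha,
                          alternative='two-sided')
    print(f"alpha = {alpha:.2f}: power = {pw:.3f}")   # ~0.634, 0.508, 0.271

# Problem 15.25: sample size per group for power 0.9 (two-sample test, d = 1).
n = TTestIndPower().solve_power(effect_size=1.0, power=0.9, alpha=0.05)
print(math.ceil(n))   # 23, as in the MINITAB output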

15.26 The supplied information is:
• δ = 2;
• desired power: 0.9;
• standard deviations: σ1² = 10; σ2² = 20.

The pooled standard deviation is therefore obtained from:

σp = √{[(n1 − 1)σ1² + (n2 − 1)σ2²]/(n1 + n2 − 2)}

Even though the sample sizes are unknown, we know that n1 = n2 = n, so that:

σp = √{[(n − 1)σ1² + (n − 1)σ2²]/[2(n − 1)]} = √[(σ1² + σ2²)/2] = 3.873

With α = 0.05, the result of the power-and-sample-size analysis is as follows:

Power and Sample Size
2-Sample t Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 3.873
            Sample  Target
Difference    Size   Power  Actual Power
         2      80     0.9      0.900795
The sample size is for each group.

Thus, 80 samples will be required from each population.

15.27 From the expression in Eq (15.98) in the text, we obtain, upon taking logs:

ln n = 2(ln 2.925 − ln ρSN)

from where we obtain:

Sr = ∂ ln n/∂ ln ρSN = −2

as required, establishing that a 1% increase in the signal-to-noise ratio ρSN translates to an (instantaneous) incremental reduction of 2% in sample size requirements. To increase ρSN = δ*/σ in practice, one has two options: (i) reduce σ, the standard deviation associated with the measurements, by investing in a better (more precise) measurement system; or (ii) increase the magnitude of the minimum signal detectable by the test.

Section 15.6

15.28 The hypothesis to be tested is:

H0: σ² = 10
Ha: σ² ≠ 10


Figure 15.4: Rejection region for the χ² test of Problem 15.28.

To use "first principles," we begin by computing the test statistic, in this case the χ² statistic:

c² = (19 × 9.5)/10 = 18.05

And now, we may determine the rejection region for the two-sided hypothesis at the α = 0.05 significance level from a χ²(19) distribution; the result is shown in Fig 15.4, i.e., C² > 32.85 and C² < 8.91. Since the computed test statistic, c² = 18.05, does not lie in this rejection region, we find no evidence for rejecting the null hypothesis in favor of the alternative.

Alternatively, one may use MINITAB directly to carry out the "Chi-Square" test; the result is shown below:

Test and CI for One Variance
Method
Null hypothesis         Sigma-squared = 10
Alternative hypothesis  Sigma-squared not = 10
The chi-square method is only for the normal distribution.
The Bonett method cannot be calculated with summarized data.

Statistics
 N  StDev  Variance
20   3.08      9.50

95% Confidence Intervals
            CI for        CI for
Method      StDev         Variance
Chi-Square  (2.34, 4.50)  (5.49, 20.27)

Tests
                 Test
Method      Statistic  DF  P-Value
Chi-Square      18.05  19    0.962

The test statistic is 18.05, as we had obtained earlier, and the associated p-value is 0.962, indicating a lack of evidence for rejecting the null hypothesis.

15.29 What is required is two separate tests of single variances: (i) that σY1 = 0.25, and (ii) that σY2 = 0.25, each versus the alternative that it is not equal to the specified value. A summary of the two χ² test results from MINITAB is shown below (highlighting only the most relevant aspects of a more detailed set of results):

                            Test
Variable  Method       Statistic  DF  P-Value
Y1        Chi-Square        8.68   9    0.935
Y2        Chi-Square        8.22   9    0.976

The indicated p-values show that there is no evidence for rejecting either null hypothesis that the standard deviations are each equal to 0.25.
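Because the χ² test of a single variance needs only summary statistics, it is simple to code directly. The following sketch (using scipy for the χ² distribution) mirrors MINITAB's two-sided p-value, which doubles the smaller tail area; the example call reproduces the Problem 15.28 result.

from scipy import stats

def chisq_var_test(s2, sigma0_sq, n):
    """Two-sided chi-square test of H0: sigma^2 = sigma0_sq from summary stats."""
    c2 = (n - 1) * s2 / sigma0_sq
    cdf = stats.chi2.cdf(c2, df=n - 1)
    return c2, 2 * min(cdf, 1 - cdf)   # double the smaller tail area

# Problem 15.28: n = 20, s^2 = 9.5, sigma0^2 = 10
c2, p = chisq_var_test(9.5, 10.0, 20)
print(f"C2 = {c2:.2f}, p = {p:.3f}")   # C2 = 18.05, p = 0.962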

15.30 Rejection of an upper-tailed χ² test occurs at the α = 0.05 significance level when the following condition holds:

C² = (n − 1)S²/σ0² > χ²_{0.05}(n − 1)

which, in this specific case, with n = 20, simplifies to yield:

S² > σ0² χ²_{0.05}(19)/19

and since, from MINITAB, χ²_{0.05}(19) = 30.14, the required relationship is:

s² > 1.586 σ0²

15.31 The hypothesis to be tested is:

H0: σ² = 1
Ha: σ² ≠ 1

The result from MINITAB is as follows:

Test and CI for One Variance: SN
Method
Null hypothesis         Sigma-squared = 1
Alternative hypothesis  Sigma-squared not = 1

Statistics
Variable   N  StDev  Variance
SN        10   1.00      1.01

95% Confidence Intervals
                      CI for        CI for
Variable  Method      StDev         Variance
SN        Chi-Square  (0.69, 1.83)  (0.48, 3.35)

Tests
                           Test
Variable  Method      Statistic  DF  P-Value
SN        Chi-Square       9.06   9    0.864

With an associated p-value of 0.864, there is no evidence for rejecting the null hypothesis, leading us to "confirm" the null hypothesis.

15.32 The hypothesis to be tested is:

H0: σ²Y1 = σ²Y2
Ha: σ²Y1 ≠ σ²Y2

The result of the F-test obtained from MINITAB is as follows:

Test and CI for Two Variances: Y1, Y2
Method
Null hypothesis         Variance(Y1) / Variance(Y2) = 1
Alternative hypothesis  Variance(Y1) / Variance(Y2) not = 1
Significance level      Alpha = 0.05

Statistics
Variable   N  StDev  Variance
Y1        10  0.245     0.060
Y2        10  0.239     0.057
Ratio of standard deviations = 1.027
Ratio of variances = 1.055

95% Confidence Intervals
Distribution  CI for StDev    CI for Variance
of Data       Ratio           Ratio
Normal        (0.512, 2.061)  (0.262, 4.248)
Continuous    (0.459, 2.297)  (0.211, 5.276)

Tests
                                          Test
Method                        DF1  DF2  Statistic  P-Value
F Test (normal)                 9    9       1.06    0.938
Levene's Test (any continuous)  1   18       0.01    0.939

The p-value associated with the F-test, 0.938, indicates that there is no evidence for rejecting the null hypothesis.
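The F-test likewise requires only the two sample variances and their degrees of freedom, so a direct computation is straightforward. The sketch below uses scipy's F distribution, again doubling the smaller tail area for the two-sided p-value.

from scipy import stats

def f_var_test(s1_sq, n1, s2_sq, n2):
    """Two-sided F-test of H0: sigma1^2 = sigma2^2 from sample variances."""
    f = s1_sq / s2_sq
    cdf = stats.f.cdf(f, n1 - 1, n2 - 1)
    return f, 2 * min(cdf, 1 - cdf)

# Problem 15.32: s1 = 0.245, s2 = 0.239, n1 = n2 = 10
f, p = f_var_test(0.245**2, 10, 0.239**2, 10)
print(f"F = {f:.2f}, p = {p:.3f}")   # F ~ 1.05, p ~ 0.94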


Figure 15.5: Rejection region for the two-sided F-test of Problem 15.33(i)a, at significance level α = 0.05.

15.33 (i) The rejection region is shown in Fig 15.5 for the two-sided F-test with α = 0.05, i.e.:

f > 2.182 and f < 0.471

For the F-test with α = 0.1, the rejection region is shown in Fig 15.6, i.e.:

f > 1.919 and f < 0.532

15.34 The upper-tailed rejection region, f > f0, for the standard F(12, 11) test at the α = 0.05 significance level is shown in Fig 15.7; i.e., the critical value, f0, for comparing S2² to S1² is:

f0 = 2.79          (15.5)

For comparing S2² to kS1², the rejection region was obtained as f > 5.58. From the definition of the test statistics for carrying out F-tests, the implication is now that:

f0/k = 5.58

so that, from Eq (15.5), we obtain:

k = 0.5


Figure 15.6: Rejection region for the two-sided F-test of Problem 15.33(i)b, at significance level α = 0.10.


Figure 15.7: Rejection region for the standard upper-tailed F-test, with numerator degrees of freedom 12 and denominator degrees of freedom 11, at significance level α = 0.05.


Figure 15.8: Rejection region for the upper-tailed F-test, with ν1 = ν2 = 24, at significance level α = 0.05.

15.35 The test statistic in this case is:

f = 0.58/0.21 = 2.76

For an upper-tailed test, the associated p-value is determined by computing P(F > 2.76), using an F(24, 24) distribution. The result is (see Fig 15.8):

P(F > 2.76) = 0.008

In addition, a summary of a direct F-test using MINITAB is shown below:

                            Test
Method           DF1  DF2  Statistic  P-Value
F Test (normal)   24   24       2.76    0.008

with a p-value of 0.008, as expected. Thus, at the α = 0.05 significance level, we reject the null hypothesis and conclude that the stock volatilities are different.

Section 15.7

15.36 From Eq (5.124) in the text, in this specific case, we have:

I0 = 0.75 ± 3√[(0.75 × 0.25)/100] = 0.75 ± 0.13

so that the interval in question is (0.62, 0.88), which does not include zero. Thus, the sample size is large enough for the large-sample approximation to be valid. The large-sample (normal distribution approximation) test result from MINITAB is shown below:

Test and CI for One Proportion
Test of p = 0.75 vs p not = 0.75
Sample   X    N  Sample p  95% CI                Z-Value  P-Value
1       72  100  0.720000  (0.631998, 0.808002)    -0.69    0.488
Using the normal approximation.

With a p-value of 0.488, we find no evidence for rejecting the null hypothesis at the α = 0.05 significance level.

15.37 (i) Assuming that the sample size, n = 50, is sufficiently large to make a normal approximation valid (which it is, from Eq (5.124) in the text), the required 95% confidence interval is obtained as:

p = 0.63 ± 1.96 σp̂/√50

where

σp̂ = √[p̂(1 − p̂)/50] = 0.068

As such, p = 0.63 ± 0.019 is the required 95% confidence interval.

(ii) Under the stated conditions, the equation to be solved is:

0.1 = 1.96 √[p̂(1 − p̂)]/n

which yields:

n = (1.96/0.1) √[p̂(1 − p̂)] = 9.46 ≈ 10

so that, rounded up to the nearest integer, the required sample size is 10.

(iii) The sample size is obtained by solving the following equation for n:

ℓ = 1.96 √[p̂(1 − p̂)]/n

to yield:

n = (1.96/ℓ) √[p̂(1 − p̂)]

15.38 The presumption that the coin is fair is the same as postulating that p0 = 0.5. The hypothesis to be tested is therefore:

H0: p = 0.5
Ha: p ≠ 0.5

The interval I0 in this case is:

I0 = 0.5 ± 3√[(0.5 × 0.5)/10] = 0.5 ± 0.474

which just barely excludes zero, so that the large-sample approximation is valid. The result of the hypothesis test in MINITAB is shown below:

Test and CI for One Proportion
Test of p = 0.5 vs p not = 0.5
Sample  X   N  Sample p  95% CI                Z-Value  P-Value
1       4  10  0.400000  (0.096364, 0.703636)    -0.63    0.527
Using the normal approximation.
The normal approximation may be inaccurate for small samples.

The observed p-value of 0.527 indicates that, at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis that the coin is fair. The result of the exact test is shown below:

Test and CI for One Proportion
Test of p = 0.5 vs p not = 0.5
Sample  X   N  Sample p  95% CI                Exact P-Value
1       4  10  0.400000  (0.121552, 0.737622)          0.754

The associated p-value is now 0.754, but the conclusion is the same: we find no evidence for rejecting the null hypothesis at the α = 0.05 significance level.

Had the coin been tossed 5 times with 2 heads resulting, the "large-sample validity" interval would become:

I0 = 0.5 ± 3√[(0.5 × 0.5)/5] = 0.5 ± 0.671

which now includes zero. As such, the large-sample approximation would no longer be valid; the exact test must be used under these conditions.
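Both versions of this test are available outside MINITAB as well; the sketch below uses scipy (the exact test assumes scipy ≥ 1.7, where it is exposed as binomtest).

import math
from scipy import stats

# Exact binomial test: 4 heads in 10 tosses, H0: p = 0.5.
res = stats.binomtest(4, n=10, p=0.5, alternative='two-sided')
print(res.pvalue)   # 0.754, matching the MINITAB exact test

# Normal-approximation z-test, for comparison:
p_hat = 0.4
z = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / 10)
print(z, 2 * stats.norm.cdf(z))   # z = -0.63, p = 0.527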

15.39 (i) The result of the two-sided hypothesis test:

H0: p1 = p2
Ha: p1 ≠ p2

is shown below:

Test and CI for Two Proportions
Sample   X    N  Sample p
1       33  100  0.330000
2       27   75  0.360000
Difference = p (1) - p (2)
Estimate for difference: -0.03
95% CI for difference: (-0.172459, 0.112459)

Test for difference = 0 (vs not = 0): Z = -0.41  P-Value = 0.679
Fisher's exact test: P-Value = 0.748

The conclusion from the indicated p-value is that, at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis.

(ii) When the hypothesis is modified to the upper-tailed alternative p2 > p1 or, equivalently, the lower-tailed alternative p1 < p2 (retaining the subscript "1" for the first sample), the result from MINITAB is shown below:

Test and CI for Two Proportions
Sample   X    N  Sample p
1       33  100  0.330000
2       27   75  0.360000
Difference = p (1) - p (2)
Estimate for difference: -0.03
95% upper bound for difference: 0.0895549
Test for difference = 0 (vs < 0): Z = -0.41  P-Value = 0.340
Fisher's exact test: P-Value = 0.399

As expected, the associated p-value, 0.34, is now half of the value obtained for the two-sided test. Thus, again, at the α = 0.05 significance level, we find no evidence to reject the null hypothesis in favor of this new alternative.

15.40 Assuming that the sample size will be large enough for the normal approximation to be valid, the (1 − α) × 100% confidence interval estimate of (p1 − p2) is obtained as:

δ = (p1 − p2) = (p̂1 − p̂2) ± z_{α/2} √[p̂q̂ (1/n1 + 1/n2)]

Thus, in this specific case, with n1 = n2 = n and the postulate that p1 = p2 = 0.3, the required sample size is obtained by solving for n in the equation:

ℓ = z_{α/2} √[0.3 × 0.7 × (2/n)]

i.e.,

n = 0.42 z²_{α/2}/ℓ²

And in the specific case with ℓ = 0.02 and α = 0.05, so that z_{α/2} = 1.96:

n = (0.42 × 1.96²)/0.02² = 4033.68 ≈ 4034

Thus, to be able to detect the difference between two binomial population proportions to a precision of ±0.02, with 95% confidence, when p ≈ 0.3 for each population, requires more than 4000 samples.
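A sketch of both computations in Python follows, using the statsmodels two-proportion z-test (assuming that library is available) together with the sample-size formula just derived.

import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

# Problem 15.39(i): two-sided test of p1 = p2.
z, p = proportions_ztest(count=np.array([33, 27]), nobs=np.array([100, 75]))
print(f"Z = {z:.2f}, p = {p:.3f}")   # Z = -0.41, p = 0.679

# Problem 15.40: sample size per group for half-width l = 0.02 when
# p1 = p2 = 0.3 (so that p*q*(1/n + 1/n) has the factor 0.42/n).
z_crit = stats.norm.ppf(0.975)
l = 0.02
n = 0.42 * z_crit**2 / l**2
print(np.ceil(n))   # ~4034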


Sections 15.8 and 15.9

15.41 (i) The sample mean of the data set is obtained as:

x̄ = 3.91

We know that X̄/β ∼ γ(n, 1/n). In this specific case, with n = 20, the distribution of X̄/β is therefore γ(20, 0.05); we may now obtain, using MINITAB, the values aL, aR such that:

P(aL < X̄/β < aR) = 0.95

The result is:

aL = 0.611; aR = 1.484

so that the 95% confidence interval for X̄/β is 0.611 < X̄/β < 1.484.
The implications are that, at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis that the observations X = 1 and X = 2 belong to the same Poisson population, P(0.5), while there is evidence to reject this null hypothesis in favor of the alternative for the observation X = 3.

15.55 (i) First, we need to determine p for each cohort group as follows:

p = (Total number of successes)/(Total number of trials) = (Σ_i x_i y_i)/(100 × 5)

For each cohort group, the result is:

pO = 102/500 = 0.204
pY = 201/500 = 0.402

Figure 15.14: Tail area probabilities for conducting exact hypothesis tests on the safety incidents data of Problem 15.54(ii), for a random variable X ∼ P(0.5): (a) P(X ≥ 1 | λ = 0.5) = 0.3935; (b) P(X ≥ 2 | λ = 0.5) = 0.0902; (c) P(X ≥ 3 | λ = 0.5) = 0.0144.

We are now required to carry out the following two-sided test of two proportions:

H0: pO = pY
Ha: pO ≠ pY

Test and CI for Two Proportions
Sample    X    N  Sample p
1       102  500  0.204000
2       201  500  0.402000
Difference = p (1) - p (2)
Estimate for difference: -0.198
95% CI for difference: (-0.253628, -0.142372)
Test for difference = 0 (vs not = 0): Z = -6.98  P-Value = 0.000
Fisher's exact test: P-Value = 0.000

The p-value of 0.000 indicates that, at the α = 0.05 significance level, we must reject the null hypothesis of equality in favor of inequality. Thus, there is strong evidence that the single-embryo probability of success parameter is different for each cohort group.

(ii) The hypothesis to be tested is the lower-tailed, single-proportion hypothesis:

H0: pO = 0.3
Ha: pO < 0.3

Test and CI for One Proportion
Test of p = 0.3 vs p < 0.3
                            95% Upper  Exact
Sample    X    N  Sample p      Bound  P-Value
1       102  500  0.204000   0.235916    0.000

The p-value of 0.000 indicates that, at the α = 0.05 significance level, we must reject the null hypothesis in favor of the alternative. Thus, the evidence in the data indicates that pO, the probability of success for the older cohort group, is less than 0.3.

(iii) The hypothesis to be tested is:

H0: pY = 0.3
Ha: pY > 0.3

an upper-tailed, single-proportion test; the result is shown below:

Test and CI for One Proportion
Test of p = 0.3 vs p > 0.3
                            95% Lower  Exact
Sample    X    N  Sample p      Bound  P-Value
1       201  500  0.402000   0.365435    0.000

Again, the p-value of 0.000 indicates that, at the α = 0.05 significance level, we must reject the null hypothesis in favor of the alternative. This time, the evidence in the data indicates that pY, the probability of success for the younger cohort group, is greater than 0.3.

(iv) We start by assuming that the sample size will be large enough for the normal approximation to be valid; as a result, the 95% confidence interval for the estimate of ΠO, the probability of success parameter for the older cohort group, will be:

ΠO = p̂O ± 1.96 √[p̂O(1 − p̂O)/n]

From the supplied information, the problem requires determining n such that:

0.05 = 1.96 √[p̂O(1 − p̂O)/n]

with the result:

n = (1.96/0.05)² p̂O(1 − p̂O)

For the initial estimate p̂O = 0.204, we obtain:

n = 249.53 ≈ 250

Thus, at least 250 patients will be required in the clinical study.

15.56 (i) The required 95% confidence interval is obtained as:

µ = x̄ ± t_{0.025}(ν) s/√n

where t_{0.025}(ν) is the value of the t-distribution (ν degrees of freedom) variate with an upper-tail area probability of 0.025. In this specific case, with ν = 49,

t_{0.025}(49) = 2.01

Thus, from the given sample mean and variance, the 95% confidence interval is:

µ = 9.50 ± 1.85

(ii) The hypothesis to be tested in this case is:

H0: µ = 14
Ha: µ < 14

From the supplied information, the result of a lower-tailed, one-sample t-test is:

One-Sample T
Test of mu = 14 vs < 14
 N   Mean  StDev  SE Mean  95% Upper Bound      T      P
50  9.500  6.500    0.919           11.041  -4.90  0.000

The p-value of 0.000 indicates that, at the α = 0.05 significance level, we must reject the null hypothesis in favor of the alternative. The evidence in the data therefore supports the alternative hypothesis that the mean number of sick days taken by employees is less than 14.00.
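Because only summary statistics are supplied, this test is also easily computed by hand; the following sketch reproduces the T-value and p-value directly from n, x̄, s, and µ0 using scipy's t distribution.

import math
from scipy import stats

n, xbar, s, mu0 = 50, 9.5, 6.5, 14.0
t = (xbar - mu0) / (s / math.sqrt(n))
p = stats.t.cdf(t, df=n - 1)         # lower-tail area
print(f"T = {t:.2f}, p = {p:.2e}")   # T = -4.90, p ~ 6e-06 (0.000 to 3 d.p.)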

(iii) First, we assume that the data constitute a random sample from a normal population; next, the null hypothesis of the test in (ii) is that µ = 14, versus the alternative that µ < 14, while the sample mean is 9.50; as such, the difference to be "detected" is obtained as δ = 9.50 − 14 = −4.5. A power-and-sample-size analysis carried out in MINITAB, using the sample standard deviation as a reasonable estimate of the population parameter, σ, yields the following result:

Power and Sample Size
1-Sample t Test
Testing mean = null (versus < null)
Calculating power for mean = null + difference
Alpha = 0.05  Assumed standard deviation = 6.5
            Sample
Difference    Size     Power
      -4.5      50  0.999268

indicating a power of better than 0.999.

(iv) The required test is a χ² test of the following hypothesis:

H0: σ² = 35
Ha: σ² > 35

The most relevant aspect of the test result is summarized below:

Test and CI for One Variance
Method
Null hypothesis         Sigma-squared = 35
Alternative hypothesis  Sigma-squared > 35

Tests
                 Test
Method      Statistic  DF  P-Value
Chi-Square      59.15  49    0.152

The indicated p-value of 0.152 implies that, at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis. Thus, we find no evidence in the data to suggest that the observed variance is any worse than the typical industry standard.

15.57 Let σ1² be the population variance of measurements obtained using instrument 1, and σ2² the corresponding value for the second instrument. The hypothesis to be tested is:

H0: σ1² = σ2²
Ha: σ1² < σ2²

because the postulated lower precision of instrument 2 translates to a higher value of σ2². The most relevant aspect of the result of the F-test is summarized below:

Test and CI for Two Variances: Inst1, Inst2
Method
Null hypothesis         Variance(Inst1) / Variance(Inst2) = 1
Alternative hypothesis  Variance(Inst1) / Variance(Inst2) < 1
Significance level      Alpha = 0.05

                                          Test
Method                        DF1  DF2  Statistic  P-Value
F Test (normal)                 9    9       0.50    0.157
Levene's Test (any continuous)  1   18       0.19    0.334

The p-value of 0.157 indicates that, at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis. Therefore, the evidence in the data does not support the claim that instrument 2 is less precise than instrument 1.

15.58 Upon assuming that the data sets are from two independent normal populations, the hypothesis to be tested is:

H0: σ1² = σ2²
Ha: σ1² ≠ σ2²

where σ1² is the variance associated with the "shaken" method, and σ2² is associated with the "surface" method. The most relevant aspect of the result of the F-test is summarized below:

Test and CI for Two Variances: Shaken, Surface
                                          Test
Method                        DF1  DF2  Statistic  P-Value
F Test (normal)                 5    5       0.74    0.747
Levene's Test (any continuous)  1   10       0.15    0.710

The p-value of 0.747 indicates that there is no evidence for rejecting the null hypothesis. We are therefore inclined to believe that the variability in the protein content is the same for both methods.

15.59 With the assumption that the normal approximation is valid for describing the distribution of the sample means of YAB and YAC, to test the hypothesis that operator A is more safety conscious than operator B, the

hypothesis should be stated as follows:

H0: µAB = 0
Ha: µAB > 0

because better safety consciousness translates to longer waiting times between occurrences of safety violations, which implies µAB > 0. The result of an upper-tailed, one-sample t-test carried out on YAB = XA − XB is shown below:

One-Sample T: YAB
Test of mu = 0 vs > 0
Variable   N    Mean  StDev  SE Mean  95% Lower Bound      T      P
YAB       10  -0.257  2.110    0.667           -1.480  -0.39  0.645

The indicated p-value, 0.645, implies that, at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis in favor of the alternative. Thus, there is no reason to believe that operator A is any more safety conscious than operator B.

Similarly, to test the hypothesis that operator A is more safety conscious than operator C, the hypothesis should this time be stated as follows:

H0: µAC = 0
Ha: µAC > 0

The result of the t-test is:

One-Sample T: YAC
Test of mu = 0 vs > 0
Variable   N    Mean  StDev  SE Mean  95% Lower Bound      T      P
YAC       10  -0.105  3.085    0.975           -1.893  -0.11  0.542

This time, the indicated p-value is 0.542, but the implication remains the same: at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis in favor of the alternative. Thus, again, there is no reason to believe that operator A is any more safety conscious than operator C.

15.60 Assuming that the samples come from independent and approximately normally distributed populations, the first hypothesis to be tested is:

H0: σA² = σB²
Ha: σA² ≠ σB²

where σA² is the variance associated with the production in Plant A, and σB² is associated with Plant B. The result of the F-test is shown below:

Test and CI for Two Variances
                            Test
Method           DF1  DF2  Statistic  P-Value
F Test (normal)    9    9       0.80    0.739

The p-value of 0.739 indicates that, at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis in favor of the alternative. We conclude, therefore, that the two population variances are likely to be the same.

The next hypothesis, for testing the equality of the mean production output, may be stated as follows:

H0: µA = µB
Ha: µA ≠ µB

and because we have concluded that the variances may be equal, we are justified in employing a pooled standard deviation for the two-sample t-test. The result is shown below:

Two-Sample T-Test and CI
Sample   N   Mean  StDev  SE Mean
1       10  28.00   3.24      1.0
2       10  33.00   3.63      1.1
Difference = mu (1) - mu (2)
Estimate for difference: -5.00
95% CI for difference: (-8.23, -1.77)
T-Test of difference = 0 (vs not =): T-Value = -3.25  P-Value = 0.004  DF = 18
Both use Pooled StDev = 3.4424

The p-value of 0.004 indicates that, at the α = 0.05 significance level, we must reject the null hypothesis in favor of the alternative. We conclude, therefore, that the mean production outputs from the two plants are different.

15.61 Some preliminary points: If we assume that the sampling distribution of the random variable X̄, the random sample average, is approximately normal with a constant variance σ², it is important to recognize:

1. That the key issue involves tests to ascertain whether the observed data can be considered as random samples from a distribution with the pdf N(63.7, 21²);

2. Furthermore, since we are unconcerned with whether the observed averages and standard deviations are greater than or less than the corresponding "standard" values, only whether they are "statistically equal" to the standard or not, two simultaneous two-sided hypothesis tests are required, namely:

H0µ: µ = 63.7
Haµ: µ ≠ 63.7

for the mean, along with

H0σ: σ = 21.0
Haσ: σ ≠ 21.0

for the standard deviation.

3. Finally, observe that σ for the reference distribution is in fact specified, which affects how the test for the sample means will be performed.

In conducting these tests and drawing conclusions from them, note that for the monthly operation to be considered "normal," none of the hypotheses about the mean µ or the standard deviation σ should be rejected; a rejection of either one or the other of these hypotheses for any month immediately forces us to conclude that the operation is "abnormal" at the indicated significance level.

The Test Statistics: Because σ is given, the test statistic for testing hypotheses about the monthly means is:

Z = (X̄ − 63.7)/(σ/√n)          (15.15)

which has a N(0, 1) distribution. (Care must be taken in computing the specific values of this statistic because the value of n varies from month to month.) At the 5% significance level, the rejection region is:

|Z| > z_{0.025} = 1.96          (15.16)

For σ, the test statistic is:

C² = (n − 1)S²/21²          (15.17)

which possesses a χ²(n − 1) distribution. Thus, at the 5% significance level, the rejection region is:

C² < χ²_{0.975}(n − 1) or C² > χ²_{0.025}(n − 1)          (15.18)

While it is not essential to perform all the tests (since in some cases a rejection of a hypothesis concerning one half of the "µ-σ pair" immediately forces a conclusion), the following tables summarize the results of all the tests.

Tests concerning the sample means:

       Computed Value of    In Critical Region,
Month  Test Statistic, z    |z| > 1.96?
Jan     0.447
Feb    -3.828               ×
Mar    -2.153               ×
Apr     1.690
May    -0.554
Jun     2.037               ×

Tests concerning the sample standard deviations:

       Computed Value of                                         In Critical
Month  Test Statistic, c²  n - 1  χ²_0.975(n-1)  χ²_0.025(n-1)   Region?
Jan    25.702              11     3.82           21.92           ×
Feb     8.000               8     2.18           17.53
Mar    12.350               9     2.70           19.02
Apr     7.185              10     3.25           20.48
May     3.143               6     1.24           14.45
Jun     6.708              10     3.25           20.48

We may now combine these two sets of test results into the following table summarizing the conclusions about the monthly status of the ore refining operation:

       Tests on      Tests on Sample      Monthly Operation
Month  Sample Means  Standard Deviations  Status
Jan                  ×                    "Abnormal"
Feb    ×                                  "Abnormal"
Mar    ×                                  "Abnormal"
Apr                                       "Normal"
May                                       "Normal"
Jun    ×                                  "Abnormal"

Thus, at a significance level of 5%, the refinery operation would be considered "abnormal" during the months of January, February, March, and June.
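The month-by-month screening logic can be captured in a short helper function, sketched below with scipy; the example call uses values back-solved from the January row of the tables above (n = 12, x̄ ≈ 66.4, s ≈ 32.1) and is included purely for illustration.

import math
from scipy import stats

def month_status(xbar, s, n, mu0=63.7, sigma0=21.0, alpha=0.05):
    """Flag a month 'Abnormal' if either the z-test on the mean (sigma known)
    or the chi-square test on the standard deviation rejects at level alpha."""
    z = (xbar - mu0) / (sigma0 / math.sqrt(n))
    c2 = (n - 1) * s**2 / sigma0**2
    mean_rej = abs(z) > stats.norm.ppf(1 - alpha / 2)
    var_rej = (c2 < stats.chi2.ppf(alpha / 2, n - 1) or
               c2 > stats.chi2.ppf(1 - alpha / 2, n - 1))
    return "Abnormal" if (mean_rej or var_rej) else "Normal"

print(month_status(66.4, 32.1, 12))   # "Abnormal", as for January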

Chapter 16

Exercises

16.1 (i) From the given expression,

θ̂ = (Σ_{i=1}^n x_i y_i)/(Σ_{i=1}^n x_i²)

we find that:

E(θ̂) = [Σ_{i=1}^n x_i E(y_i)]/(Σ_{i=1}^n x_i²)

But from the problem statement, since E(ε_i) = 0, then E(y_i) = x_i θ. As a result,

E(θ̂) = θ (Σ_{i=1}^n x_i²)/(Σ_{i=1}^n x_i²) = θ

as required.

(ii) By definition and from the given expression, we find that:

Var(θ̂) = [Σ_{i=1}^n x_i² Var(y_i)]/(Σ_{i=1}^n x_i²)² = [σ² Σ_{i=1}^n x_i²]/(Σ_{i=1}^n x_i²)² = σ²/(Σ_{i=1}^n x_i²)

as required.

16.2 The given objective,

S_w(θ) = Σ_{i=1}^n W_i (Y_i − θ)²

may be minimized via the typical calculus route, by taking the derivative with respect to θ, setting it to zero, and solving the resulting equation for θ, i.e.,

∂S_w/∂θ = −2 Σ_{i=1}^n W_i (Y_i − θ) = 0

yielding:

θ̂_ω = (Σ_{i=1}^n W_i Y_i)/(Σ_{i=1}^n W_i)          (16.1)

Upon defining ω_i as:

ω_i = W_i/(Σ_{i=1}^n W_i), so that Σ_{i=1}^n ω_i = 1

Eq (16.1) becomes:

θ̂_ω = Σ_{i=1}^n ω_i Y_i

as required.
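A one-line numerical illustration of this result follows; the data and weights below are made up for the sketch.

import numpy as np

Y = np.array([2.1, 1.9, 2.4, 2.0])
W = np.array([1.0, 1.0, 4.0, 2.0])     # e.g., W_i = 1/sigma_i^2

# The weighted least-squares estimate of a constant is the weighted average:
theta_hat = np.sum(W * Y) / np.sum(W)  # = sum_i omega_i Y_i
print(theta_hat)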

16.3 (i) From the given objective,

S_w(θ) = Σ_{i=1}^n W_i [y_i − x_i θ]²

taking the derivative with respect to θ and equating to zero gives:

∂S_w/∂θ = −2 Σ_{i=1}^n x_i W_i (y_i − x_i θ̂) = 0

or:

Σ_i x_i W_i y_i = (Σ_i W_i x_i²) θ

which, when solved for θ, yields:

θ̂_ω = (Σ_i W_i x_i y_i)/(Σ_i W_i x_i²)          (16.2)

When this is compared with Eq (16.18) in the text, we find that the two expressions are very similar, except that the expression in Eq (16.2) incorporates, with each observation i, the associated weight, W_i.

(ii) By definition,

E(θ̂_ω) = E(Σ_i W_i x_i y_i)/(Σ_i W_i x_i²) = [Σ_i W_i x_i E(y_i)]/(Σ_i W_i x_i²)

since the denominator involves only precisely known (not randomly varying) quantities. And now, from the given one-parameter model and the fact that E(ε_i) = 0, we find that E(y_i) = x_i θ, so that:

E(θ̂_ω) = [Σ_i W_i x_i (x_i θ)]/(Σ_i W_i x_i²) = θ (Σ_i W_i x_i²)/(Σ_i W_i x_i²)

or E(θ̂_ω) = θ, as required.

(iii) Also, by definition of the variance of a random variable,

Var(θ̂_ω) = [Σ_i (W_i x_i)² Var(y_i)]/(Σ_i W_i x_i²)² = [σ² Σ_i W_i (W_i x_i²)]/(Σ_i W_i x_i²)²          (16.3)

If we define:

α_i = (W_i x_i²)/(Σ_i W_i x_i²), so that Σ_i α_i = 1

then Eq (16.3) simplifies to:

Var(θ̂_ω) = (σ² Σ_i α_i W_i)/(Σ_i W_i x_i²)          (16.4)

This may be simplified further as follows: because, by its definition, 0 < α_i < 1, and Σ_i α_i = 1, the quantity Σ_i α_i W_i is a weighted average of the weights, W_i. Thus, if we define:

W̃ = Σ_i α_i W_i

then Eq (16.4) may be simplified to yield:

Var(θ̂_ω) = (σ² W̃)/(Σ_i W_i x_i²)

16.4 (i) From the objective function to be minimized, we obtain the following equations which, when solved simultaneously for the parameters θ0 and θ1, yield the required weighted least-squares estimates:

∂S_ω/∂θ0 = −2 Σ_{i=1}^n W_i [y_i − (θ1 x_i + θ0)] = 0
∂S_ω/∂θ1 = −2 Σ_{i=1}^n W_i x_i [y_i − (θ1 x_i + θ0)] = 0

i.e.,

θ1 Σ_{i=1}^n W_i x_i + θ0 Σ_{i=1}^n W_i = Σ_{i=1}^n W_i y_i          (16.5)
θ1 Σ_{i=1}^n W_i x_i² + θ0 Σ_{i=1}^n W_i x_i = Σ_{i=1}^n W_i x_i y_i          (16.6)

From here, θ0 may be eliminated first as follows: multiply Eq (16.6) by (Σ_{i=1}^n W_i), subtract from it Eq (16.5) multiplied by (Σ_{i=1}^n W_i x_i), and obtain:

θ1 [(Σ_{i=1}^n W_i x_i²)(Σ_{i=1}^n W_i) − (Σ_{i=1}^n W_i x_i)²]
    = (Σ_{i=1}^n W_i)(Σ_{i=1}^n W_i x_i y_i) − (Σ_{i=1}^n W_i x_i)(Σ_{i=1}^n W_i y_i)          (16.7)

And now, as in Eqs (16.31)–(16.33) in the text, let us define the following terms:

S^w_xx = Σ_{i=1}^n (γ_i x_i − γ_i x̃)²          (16.8)
S^w_yy = Σ_{i=1}^n (γ_i y_i − γ_i ỹ)²          (16.9)
S^w_xy = Σ_{i=1}^n (γ_i x_i − γ_i x̃)(γ_i y_i − γ_i ỹ)          (16.10)

where

x̃ = (Σ_{i=1}^n γ_i² x_i)/(Σ_{i=1}^n γ_i²) = (Σ_{i=1}^n W_i x_i)/(Σ_{i=1}^n W_i)
ỹ = (Σ_{i=1}^n γ_i² y_i)/(Σ_{i=1}^n γ_i²) = (Σ_{i=1}^n W_i y_i)/(Σ_{i=1}^n W_i)

Observe, therefore, from Eq (16.8) that:

S^w_xx = Σ_{i=1}^n W_i x_i² − (Σ_{i=1}^n W_i x_i)²/(Σ_{i=1}^n W_i)          (16.11)

and from Eq (16.10),

S^w_xy = Σ_{i=1}^n W_i x_i y_i − (Σ_{i=1}^n W_i x_i)(Σ_{i=1}^n W_i y_i)/(Σ_{i=1}^n W_i)          (16.12)

As such, Eq (16.7) becomes:

θ1 (Σ_{i=1}^n W_i) S^w_xx = (Σ_{i=1}^n W_i) S^w_xy

to yield, finally,

θ̂_{1ω} = S^w_xy/S^w_xx          (16.13)

which is clearly reminiscent of Eq (16.37) in the text. And now, substituting Eq (16.13) into Eq (16.5) for θ1, and solving for θ0, yields:

θ̂_{0ω} = (Σ_{i=1}^n W_i y_i)/(Σ_{i=1}^n W_i) − θ̂_{1ω} (Σ_{i=1}^n W_i x_i)/(Σ_{i=1}^n W_i)

or, from Eqs (16.11) and (16.12),

θ̂_{0ω} = ỹ − θ̂_{1ω} x̃          (16.14)

again reminiscent of Eq (16.38) in the text. We may now observe that every term in the expression for the ordinary least-squares estimate is replaced by its "weighted" counterpart in the corresponding expression for the weighted least-squares estimate: S_xx and S_xy are replaced, respectively, by S^w_xx and S^w_xy; x̄ and ȳ are replaced, respectively, by x̃ and ỹ.

(ii) From Eq (16.13), we observe that:

E(θ̂_{1ω}) = (1/S^w_xx) E(S^w_xy)

and from the definitions in Eqs (16.8) and (16.10), we obtain:

E(θ̂_{1ω}) = (1/S^w_xx) E[Σ_{i=1}^n γ_i (θ1 x_i + θ0 + ε_i)(γ_i x_i − γ_i x̃)]
           = (1/S^w_xx) E[θ1 (Σ_{i=1}^n γ_i² x_i² − Σ_{i=1}^n γ_i² x_i x̃)]
           = (1/S^w_xx) θ1 (Σ_{i=1}^n W_i x_i² − Σ_{i=1}^n W_i x_i x̃)
           = (1/S^w_xx) θ1 S^w_xx = θ1

as required. Also, from Eq (16.14), we obtain:

E(θ̂_{0ω}) = θ0 + θ1 x̃ − θ1 x̃ = θ0

again, as required.

(iii) From Eq (16.13),

Var(θ̂_{1ω}) = [1/(S^w_xx)²] Var(S^w_xy)

But, from the definition of S^w_xy, expanding and collecting terms yields:

Var(S^w_xy) = [Σ_{i=1}^n W_i x_i² − (Σ_{i=1}^n W_i x_i)²/(Σ_{i=1}^n W_i)] σ² = S^w_xx σ²

As a result,

Var(θ̂_{1ω}) = σ²/S^w_xx

which is similar to Eq (16.55) in the text. Also, from Eq (16.14) above,

Var(θ̂_{0ω}) = Var(ỹ) + x̃² σ²/S^w_xx

Now, by definition of ỹ as

ỹ = (Σ_{i=1}^n W_i y_i)/(Σ_{i=1}^n W_i) = Σ_{i=1}^n α_i y_i, with α_i = W_i/(Σ_{i=1}^n W_i) so that Σ_{i=1}^n α_i = 1

we obtain:

Var(ỹ) = (Σ_{i=1}^n α_i²) σ²

As such,

Var(θ̂_{0ω}) = σ² (Σ_{i=1}^n α_i² + x̃²/S^w_xx)

which is also reminiscent of Eq (16.56); in particular, in the special case where W_i = 1 for all i, observe that α_i = 1/n, and therefore:

Σ_{i=1}^n α_i² = n/n² = 1/n

precisely as in Eq (16.56) in the text.
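The formulas in Eqs (16.11)-(16.14) are easy to verify numerically. The sketch below applies them to made-up heteroscedastic data and checks the result against numpy's weighted polyfit, which minimizes the same objective when given weights √W_i.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 + 0.5 * x + rng.normal(0, 1, x.size)
W = 1.0 / (0.5 + 0.1 * x)**2           # e.g., W_i = 1/sigma_i^2

Sw, Swx, Swy = W.sum(), (W * x).sum(), (W * y).sum()
x_t, y_t = Swx / Sw, Swy / Sw          # weighted means x-tilde, y-tilde
Sxx_w = (W * x**2).sum() - Swx**2 / Sw           # Eq (16.11)
Sxy_w = (W * x * y).sum() - Swx * Swy / Sw       # Eq (16.12)
theta1 = Sxy_w / Sxx_w                 # Eq (16.13)
theta0 = y_t - theta1 * x_t            # Eq (16.14)

print(theta0, theta1)
print(np.polyfit(x, y, 1, w=np.sqrt(W)))   # same line: [theta1, theta0]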

16.5 (i) Under the given circumstances, the likelihood function will be:

f(y1, y2, ..., yn | x, θ) = (1/2π)^{n/2} (1/σⁿ) exp{−[Σ_{i=1}^n (y_i − θx_i)²]/(2σ²)}

from where the log-likelihood is obtained as:

ℓ(θ) = −(n/2) ln 2π − n ln σ − [Σ_{i=1}^n (y_i − θx_i)²]/(2σ²)          (16.15)

And now,

∂ℓ/∂θ = Σ_{i=1}^n [2x_i(y_i − θx_i)]/(2σ²) = 0

simplifies to:

Σ_{i=1}^n x_i (y_i − θx_i) = 0

which, when solved for θ, yields:

θ̂_{ML} = (Σ_{i=1}^n x_i y_i)/(Σ_{i=1}^n x_i²)

precisely as in Eq (16.18) of the text.

(ii) With the two-parameter model, the log-likelihood function in Eq (16.15) becomes:

ℓ(θ) = −(n/2) ln 2π − n ln σ − [Σ_{i=1}^n (y_i − θ0 − θ1 x_i)²]/(2σ²)

whose maximization now requires the simultaneous solution of the following equations for θ0 and θ1:

∂ℓ/∂θ0 = −2 Σ_{i=1}^n (y_i − θ0 − θ1 x_i) = 0
∂ℓ/∂θ1 = −2 Σ_{i=1}^n x_i (y_i − θ0 − θ1 x_i) = 0

These equations are identical to Eqs (16.23) and (16.24) in the text; as such, the solutions will be exactly the same as in Eqs (16.37) and (16.38) in the text.

16.6 (i) Under the specified circumstances, the likelihood function will be:

f(y1, y2, ..., yn | x, θ) = (1/2π)^{n/2} (1/Π_{i=1}^n σ_i) exp{−Σ_{i=1}^n (y_i − θx_i)²/(2σ_i²)}

so that the log-likelihood is obtained as:

ℓ(θ) = −(n/2) ln 2π − Σ_{i=1}^n ln σ_i − Σ_{i=1}^n (y_i − θx_i)²/(2σ_i²)          (16.16)

Maximization requires solving the following equation for θ:

∂ℓ/∂θ = Σ_{i=1}^n [2x_i (y_i − θx_i)]/(2σ_i²) = 0

or,

θ Σ_{i=1}^n x_i²/σ_i² = Σ_{i=1}^n x_i y_i/σ_i²

If we now define:

W_i = 1/σ_i²          (16.17)

then,

θ̂ = (Σ_{i=1}^n W_i x_i y_i)/(Σ_{i=1}^n W_i x_i²)

which is the weighted least-squares estimate for the one-parameter model, with the weights as defined in Eq (16.17): the reciprocal of the variance associated with observation i. This is intuitively appealing because less reliable measurements are thereby given less weight in determining the least-squares estimate of the unknown parameter.

(ii) In this case,

ℓ(θ) = −(n/2) ln 2π − Σ_{i=1}^n ln σ_i − Σ_{i=1}^n (y_i − θ0 − θ1 x_i)²/(2σ_i²)

From here, maximization requires solving the following two equations simultaneously for the unknown parameters:

∂ℓ/∂θ0 = Σ_{i=1}^n (1/σ_i²)(y_i − θ0 − θ1 x_i) = 0
∂ℓ/∂θ1 = Σ_{i=1}^n (1/σ_i²) x_i (y_i − θ0 − θ1 x_i) = 0

If we now define W_i as in Eq (16.17), these equations reduce to precisely the same pair as Eqs (16.5) and (16.6) in Problem 16.4(i), so that the solutions will be exactly the same as obtained in Eqs (16.13) and (16.14). As such, the weights are the same as in the one-parameter case.

16.7 By definition,

Var(Θ1) = E(Θ1²) − [E(Θ1)]² = E(Θ1²) − θ1²

since it has already been established that E(Θ1) = θ1. Thus,

Var(Θ1) = E(S_xy²)/S_xx² − θ1²
        = (1/S_xx²) {E[Σ_{i=1}^n Y_i(x_i − x̄)]² − [E(Σ_{i=1}^n Y_i(x_i − x̄))]²}
        = σ²/S_xx

as required. Next, in a similar fashion,

Var(Θ0) = E(Θ0²) − [E(Θ0)]² = E(Θ0²) − θ0²          (16.18)

and since, from the given expression for Θ0,

Θ0² = Ȳ² + x̄² Θ1² − 2x̄ Ȳ Θ1

then,

E(Θ0²) = E(Ȳ²) + x̄² E(Θ1²) − 2x̄ E(Ȳ Θ1)          (16.19)

The first two contributing terms in this equation are determined as follows:

E(Ȳ²) = σ²/n + (θ1 x̄ + θ0)²          (16.20)
x̄² E(Θ1²) = x̄² [σ²/S_xx + θ1²]          (16.21)

the former from a knowledge of Var(Ȳ), the latter from the just-determined Var(Θ1). The third contributing term is determined as follows:

E(Ȳ Θ1) = (1/S_xx) E(Ȳ S_xy) = (1/S_xx) E{Ȳ Σ_{i=1}^n (Y_i − Ȳ)(x_i − x̄)}
        = (1/S_xx) E{Ȳ [Σ_{i=1}^n Y_i (x_i − x̄) − Ȳ Σ_{i=1}^n (x_i − x̄)]}
        = (1/S_xx) E{Ȳ Σ_{i=1}^n Y_i (x_i − x̄)}
        = (1/S_xx) E{Ȳ Σ_{i=1}^n (θ1 x_i + θ0 + ε_i)(x_i − x̄)}
        = (1/S_xx) E{Ȳ θ1 Σ_{i=1}^n x_i (x_i − x̄)}
        = (1/S_xx) E{Ȳ θ1 S_xx} = θ1 E(Ȳ)

so that:

−2x̄ E(Ȳ Θ1) = −2x̄ θ1 E(Ȳ) = −2x̄ θ1 (θ1 x̄ + θ0)          (16.22)

And now, Eq (16.20) + Eq (16.21) + Eq (16.22) yields, upon slight rearrangement:

E(Θ0²) = σ²/n + x̄²σ²/S_xx + (θ1 x̄ + θ0)² − θ1² x̄² − 2θ1θ0 x̄

which, when introduced into Eq (16.18), results in:

Var(Θ0) = σ²/n + x̄²σ²/S_xx + (θ1 x̄ + θ0)² − θ1² x̄² − 2θ1θ0 x̄ − θ0²
        = σ²/n + x̄²σ²/S_xx
        = σ² (1/n + x̄²/S_xx)

16.8 The expressions for the estimates of the two slopes are:

θ̂ = (Σ_{i=1}^n x_i y_i)/(Σ_{i=1}^n x_i²)          (16.23)
θ̂1 = [Σ_{i=1}^n (x_i − x̄)(y_i − ȳ)]/[Σ_{i=1}^n (x_i − x̄)²]          (16.24)

We may expand, rearrange, and simplify Eq (16.24) to yield:

θ̂1 = (Σ_{i=1}^n x_i y_i − nx̄ȳ)/(Σ_{i=1}^n x_i² − nx̄²)
    = (Σ_{i=1}^n x_i y_i)/(Σ_{i=1}^n x_i² − nx̄²) − (nx̄ȳ)/(Σ_{i=1}^n x_i² − nx̄²)          (16.25)

Now, from Eq (16.23),

Σ_{i=1}^n x_i y_i = (Σ_{i=1}^n x_i²) θ̂

so that Eq (16.25) becomes:

θ̂1 = [(Σ_{i=1}^n x_i²)/(Σ_{i=1}^n x_i² − nx̄²)] θ̂ − [nx̄²/(Σ_{i=1}^n x_i² − nx̄²)] (ȳ/x̄)          (16.26)

as required. From here, we may now observe that if

θ̂ = ȳ/x̄

then, upon introducing this into Eq (16.26), one obtains immediately:

θ̂1 = [(Σ_{i=1}^n x_i² − nx̄²)/(Σ_{i=1}^n x_i² − nx̄²)] θ̂

and the two slopes become identical. Eq (16.26) may be rearranged to yield:

[(Σ_{i=1}^n x_i²)/(Σ_{i=1}^n x_i² − nx̄²)] θ̂ = θ̂1 + [nx̄²/(Σ_{i=1}^n x_i² − nx̄²)] (ȳ/x̄)

or,

θ̂ = [(Σ_{i=1}^n x_i² − nx̄²)/(Σ_{i=1}^n x_i²)] θ̂1 + [nx̄²/(Σ_{i=1}^n x_i²)] (ȳ/x̄)          (16.27)

and if α = nx̄²/(Σ_{i=1}^n x_i²), then Eq (16.27) becomes:

θ̂ = α(ȳ/x̄) + (1 − α)θ̂1

as required. In this case, the estimation error is obtained as:

ϵ = θ̂1 − θ̂ = θ̂1 − α(ȳ/x̄) − θ̂1 + αθ̂1 = α(θ̂1 − ȳ/x̄)

= θˆ1 − θˆ

( y¯ ) = θˆ1 − α − θˆ1 + αθˆ1 x ¯ ( y¯ ) = α θˆ1 − x ¯ 16.9 By definition of Syy , and upon rearrangement as specified, i.e., Syy =

n ∑

(yi − y¯)2 =

i=1

n ∑

2

[(yi − yˆi ) − (¯ y − yˆi )]

i=1

further expansion of the RHS terms yields: Syy =

n ∑ [ 2 ] yi − 2yi yˆi − 2yi y¯ + 2yi yˆi + yˆi2 + 2ˆ yi y¯ − 2ˆ yi2 + y¯2 − 2¯ y yˆi + yˆi2 i=1

which may now be consolidated to: Syy =

n ∑ ] [ 2 yi y¯ − y¯2 ) − (yi − 2yi yˆi + yˆi2 ) (ˆ yi − 2ˆ i=1

from where we obtain the required result: n ∑

(yi − y¯)2 =

i=1

n ∑

(ˆ yi − y¯)2 +

i=1

n ∑ i=1

16.10 (i) From Y = θ0 + θ2 xθ3 + ϵ we obtain: ∂Y ∂θ0 ∂Y ∂θ2 ∂Y ∂θ3

= 1 = xθ3 = θ2 xθ3 ln(x)

(yi − yˆi )2

12

CHAPTER 16.

representing a nonlinear regression problem because the sensitivities with respect to θ2 and θ3 are functions of unknown parameters. (ii) From θ1 Y = θ0 + +ϵ x we obtain the following sensitivities: ∂Y ∂θ0 ∂Y ∂θ1

= 1 1 x

=

representing a linear regression problem because none of the sensitivities depends on an unknown parameter. (iii) From √ Y = θ0 + θ1 ex + θ2 sin x + θ3 x + ϵ we obtain: ∂Y ∂θ0 ∂Y ∂θ1 ∂Y ∂θ2 ∂Y ∂θ3

=

1

= ex = =

sin(x) √

x

representing a linear regression problem because none of the sensitivities depends on an unknown parameter, only on data, x. (iv) From Y = θ0 e−θ2 x + ϵ we obtain: ∂Y ∂θ0 ∂Y ∂θ2

= e−θ2 x = −θ0 xe−θ2 x

representing a nonlinear regression problem because both sensitivities are functions of the unknown parameters. (v) From Y = θ0 xθ11 xθ22 + ϵ

13 we obtain: ∂Y ∂θ0 ∂Y ∂θ1 ∂Y ∂θ2

=

xθ11 xθ22

=

θ0 xθ11 ln(x1 )xθ22

=

θ0 xθ11 ln(x2 )xθ22

representing a nonlinear regression problem because all three sensitivities are functions of the unknown parameters. 16.11 (i) The equation θ1

P vap = eθ0 − T +θ3 may be “linearized” by taking natural logarithms and rearranging to obtain: ( ) ( ) ( ) θ1 θ0 −1 vap ln P = θ0 − + T+ T ln P vap θ3 θ3 θ3 clearly of the form: y = β0 + β1 x1 + β2 x2 which may be used to carry out linear regression. However, x1 and x2 (and also y and x2 ) are not independent. The relationships between the transformed model parameters and the original parameters, are: β0 = θ0 −

θ1 θ0 1 ; β1 = ; and β2 = − θ3 θ3 θ3

(ii) Upon taking natural logs, N = θ0 eθ1 t becomes: ln N = ln θ0 + θ1 t of the form: y = β0 + β1 x1 where: β0 = ln θ0 ; and β1 = θ1 (iii) Upon taking natural logs, Q0 = θ0 M θ1 becomes ln Q0 = ln θ0 + θ1 ln M

14

CHAPTER 16.

of the form: y = β0 + β1 x1 where: β0 = ln θ0 ; and β1 = θ1 (iv) Upon taking natural logs, Sh = θ0 Reθ1 Scθ2 becomes ln(Sh) = ln θ0 + θ1 ln Re + θ2 ln Sc which is of the form: y = β0 + β1 x1 + β2 x2 with β0 = ln θ0 ; β1 = θ1 ; and β2 = θ2 16.12 Proving that the “hat” matrix is idempotent requires showing that HH = H. From the given expression, ( )−1 H = X XT X XT we obtain: HH =

[ ( ][ ( ] )−1 )−1 ( )−1 ( )( )−1 X XT X XT X XT X X T = X XT X XT X XT X XT
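These idempotency results can be confirmed numerically on any full-rank design matrix, as in the following sketch (the trace check is an additional standard property of projection matrices, included for illustration).

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 3))                      # arbitrary full-rank design
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(8)

print(np.allclose(H @ H, H))                     # True: H is idempotent
print(np.allclose((I - H) @ (I - H), I - H))     # True: so is I - H
print(np.allclose(np.trace(H), 3.0))             # trace(H) = no. of parameters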

( )−1 = X XT X XT = H as required. In similar fashion, to establish (I − H) as idempotent requires showing that (I − H)(I − H) = (I − H). Since (I − H)(I − H) = I − 2H + HH and since we have just established that HH = H, then it follows that: (I − H)(I − H) = I − 2H + H = (I − H) as required. From Eq (16.157) in the text, i.e., ˆ = Hy y upon pre-multiplying both sides by H, we obtain Hˆ y = HHy = Hy

15 as required (since H is idempotent). Similarly, from e = (I − H)y upon pre-multiplying both sides by (I-H), we obtain (I − H)e = (I − H)(I − H)y = (I − H)y as required (again, since (I - H) is idempotent). 16.13 In standard notation, this problem requires obtaining the least squares estimate of the parameter vector θ from: y = Xθ + ϵ where:



     y1 1 0 0 θ1 y =  y2  ; X =  0 1 0  ; and θ =  θ2  y3 0 0 1 θ3

subject to the constraint: Lθ = v with L = [1 1 1]; and v = 180 Now, given the specific data vector: 

 91 y =  58  33

ˆ is obtained from: the standard least squares solution, θ, ˆ = (XT X)−1 XT y θ which in this specific case simplifies to ˆ=y θ From Eq (16.172) in the text, we obtain the “gain” matrix, Γ, as: [ ( ]−1 ( )−1 )−1 Γ = XT X LT L XT X LT = LT (LLT )−1   1/3 =  1/3  1/3 And since ˆ = (91 + 58 + 33) = 182 Lθ

(16.28)

16

CHAPTER 16.

then, the required constrained least squares solution, from Eq (16.173) in the text, is obtained as: ˆ CLS θ

= y + Γ(180 − 182)      91 2/3 =  58  −  2/3  =  33 2/3   90.333 =  57.333  32.333

271 3 172 3 93 3
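The computation is compact enough to verify directly; the sketch below implements Eqs (16.172) and (16.173) for this three-angle problem, adjusting the measurements so that they sum to exactly 180 degrees.

import numpy as np

y = np.array([91.0, 58.0, 33.0])
X = np.eye(3)
L = np.ones((1, 3))
v = np.array([180.0])

theta_ls = np.linalg.solve(X.T @ X, X.T @ y)    # here simply y
XtX_inv = np.linalg.inv(X.T @ X)
Gamma = XtX_inv @ L.T @ np.linalg.inv(L @ XtX_inv @ L.T)
theta_cls = theta_ls + Gamma @ (v - L @ theta_ls)

print(theta_cls)    # [90.333, 57.333, 32.333]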

 

16.14 (i) From the expression for the ridge regression estimate, ˆ RR = (XT X + k 2 I)−1 XT y θ

(16.29)

we obtain the expected value as: ˆ RR ) = (XT X + k 2 I)−1 XT E(y) E(θ and since E(ϵ) = 0, so that E(y) = Xθ, we see that: ˆ RR ) = (XT X + k 2 I)−1 XT Xθ E(θ ˆ RR ) will equal θ if, and only if, k = 0; however, for k ̸= 0, Now, E(θ ˆ RR ) ̸= θ E(θ ˆ RR is biased for any non-zero value of k. so that θ (ii) From Eq (16.29), we obtain { } ˆ RR ) = E [θ ˆ RR − E(θ ˆ RR )][θ ˆ RR − E(θ ˆ RR )]T Σθˆ RR = V ar(θ = (XT X + k 2 I)−1 XT σ 2 IX(XT X + k 2 I)−1 = (XT X + k 2 I)−1 (XT X)(XT X + k 2 I)−1 σ 2

(16.30)

We may now observe the following about the expression in Eq (16.30): 1. When k = 0, it is identical to the covariance matrix for the regular leastsquares estimate shown in Eq (16.151) in the text; 2. In the limit as k → ∞, Σθˆ RR → 0 Thus, because k 2 > 0, the covariance matrix for the ridge regression estimate will always be smaller in norm than the covariance matrix for the standard least squares estimate. 16.15 (i) Upon introducing the three given values of xi into the expression, ηi = 1 + 2xi

17 we obtain the following as the theoretical responses, ηi : η1 η2

= 4.80 = 5.00

η3

= 5.20

all of which compare well (within the limits of experimental error) with the corresponding observed data, η1

= 4.89

η2 η3

= 4.95 = 5.15

(ii) In matrix form, the two-parameter model is:       ] 1 1.9 [ ϵ1 y1  y2  =  1 2.0  θ0 +  ϵ2  θ1 1 2.1 ϵ3 y3 As such, the matrix X is:

 1.9 2.0  2.1



1 X= 1 1 from where we obtain:

[ XT X =

3.00 6.00

6.00 12.02

] (16.31)

whose determinant is obtained as: |XT X| = 0.06 confirming that the matrix XT X is near-singular. We may also observe that the rows and columns of XT X are almost perfectly linearly dependent. (iii) The least-squares estimates, determined from ˆ = (XT X)−1 XT y θ are obtained in this specific case as: [

θˆ0 θˆ1

]

[ =

2.397 1.300

]

which should be compared to the “true” values: [ ] [ ] θ0 1.000 = θ1 2.000

(16.32)

18

CHAPTER 16.

Note especially how far θˆ0 is from θ0 . (iv) For the indicated values of k 2 , we obtain the following results, ] [ 5.00 6.00 for k 2 = 2; (XT X + k 2 I) = 6.00 14.02 [ ] 4.00 6.00 = for k 2 = 1; 6.00 13.02 [ ] 3.50 6.00 = for k 2 = 0.5 6.00 12.52 Observe that these matrices have become much better conditioned than the original (XT X) matrix in Eq (16.31). From here the vectors of ridge regression estimates for each k 2 are obtained as follows: [ ] [ ] θˆ0RR 0.883 = for k 2 = 2; 1.762 θˆ1RR [ ] 0.941 = for k 2 = 1; 1.871 [ ] 0.977 = for k 2 = 0.5 1.929 which all compare quite well with the true values shown in Eq (16.32). In particular, note that the vector of estimates for k 2 = 0.5 is the closest to the true parameter vector. (v) A plot of the data and the least squares fit (solid line) is shown in Fig 16.1 along with the true line (dashed line) and the ridge regression line (dasheddotted line). Compared to the true line, the ridge regression line is clearly “offset,” showing the bias for which ridge regression estimates are known; however, the slopes of these two lines match very well, while the slope of the standard least squares line is different. As a result, upon extrapolating the three lines back to x = 0, the standard least squares regression line will show an intercept at y = 2.397 (the least squares estimate for θ0 ), while the ridge regression line will show an intercept of 0.977, compared to the true value of 1.000. 16.16 The result of a linear, two-parameter model fit to the supplied data is shown below: Regression Analysis: y16 versus xk The regression equation is y16 = 2.31 + 4.88 xk Predictor Constant xk

Coef 2.3115 4.8784

SE Coef 0.5193 0.8045

T 4.45 6.06

P 0.003 0.001

S = 1.55789 R-Sq = 84.0% R-Sq(adj) = 81.7%

19

Scatterplot of y, etah, eta vs x Variable y etah eta

5.2 5.1

y

5.0 4.9 4.8

4.7 4.6 1.90

1.95

2.00 x

2.05

2.10

Figure 16.1: Data versus standard least squares regression line (y, solid line), true line (η, dashed line), and the ridge regression line (ˆ η , dashed-dotted line) for an ill-conditioned two-parameter model.

Unusual Observations
Obs    xk     y16    Fit  SE Fit  Residual  St Resid
  9  1.00  10.095  7.190   0.958     2.905     2.36R
R denotes an observation with a large standardized residual.

The p-values associated with the parameter estimates indicate that, at the α = 0.05 significance level, both parameters are significantly different from zero; the R² and R²_adj values indicate that a decent amount of the information in the data has been captured by the linear model. However, Fig 16.2, a plot of the model fit to the data, shows clearly that the linear model is inadequate.

The result of the quadratic model fit is shown below:

Regression Analysis: y16 versus xk, xk2
The regression equation is
y16 = 1.05 + 4.88 xk + 3.03 xk2

Predictor    Coef  SE Coef     T      P
Constant   1.0492   0.5031  2.09  0.082
xk         4.8784   0.5141  9.49  0.000
xk2        3.0295   0.9076  3.34  0.016

S = 0.995549  R-Sq = 94.4%  R-Sq(adj) = 92.5%

The p-values associated with the parameter estimates indicate that, at the α = 0.05 significance level, the constant term may not be significant; on the other hand, the linear and quadratic coefficients are both significant. The R² and R²_adj values, as expected, indicate an improvement in the amount of information in the data captured by the quadratic model, compared with that captured by the linear model.


Figure 16.2: Data and linear, two-parameter model fit.

Notwithstanding, a plot of the quadratic model fit to the data in Fig 16.3 shows that the quadratic model, even though better than the linear model, still remains inadequate.

The result of the cubic model fit is shown below:

Regression Analysis: y16 versus xk, xk2, xk3
The regression equation is
y16 = 1.05 + 1.86 xk + 3.03 xk2 + 4.09 xk3

Predictor     Coef  SE Coef      T      P
Constant   1.04922  0.08297  12.65  0.000
xk          1.8646   0.2221   8.40  0.000
xk2         3.0295   0.1497  20.24  0.000
xk3         4.0864   0.2783  14.68  0.000

S = 0.164179  R-Sq = 99.9%  R-Sq(adj) = 99.8%

First, we observe that the p-values associated with the parameter estimates indicate that all four parameters in the model are significant. Next, the R² and R²_adj values indicate a near-perfect fit of the model to the data, which is confirmed by Fig 16.4.

Finally, the result of the quartic model fit is shown below:

Regression Analysis: y16 versus xk, xk2, xk3, xk4
The regression equation is
y16 = 1.06 + 1.86 xk + 2.91 xk2 + 4.09 xk3 + 0.113 xk4


Figure 16.3: Data and quadratic model fit.


Figure 16.4: Data and cubic model fit. Note the near-perfect fit and the correspondingly near-perfect values for R² and R²_adj.

22

Predictor    Coef  SE Coef      T      P
Constant   1.0629   0.1181   9.00  0.001
xk         1.8646   0.2472   7.54  0.002
xk2        2.9133   0.6481   4.49  0.011
xk3        4.0864   0.3098  13.19  0.000
xk4        0.1132   0.6100   0.19  0.862

S = 0.182773  R-Sq = 99.9%  R-Sq(adj) = 99.7%

This time, we note first that there is little or no change in the values of the first four parameter estimates relative to the cubic model; the p-values associated with these parameters indicate that, even under these circumstances, they remain significant. However, the additional parameter, the coefficient of x⁴, is not significant at the α = 0.05 level, implying that this parameter is not needed; it is indistinguishable from zero. The R²_adj value shows a slight reduction compared with that for the cubic model, but not enough to be worth discussing. The key here is the significance (or lack thereof) of the additional parameter, which indicates that the cubic model is in fact sufficient; the additional parameter is therefore unnecessary. The quartic model fit to the data is indistinguishable from the cubic model fit; the plot is therefore not shown.

The conclusion: the most appropriate model is the cubic, requiring four parameters, estimated as:

θ̂0 = 1.05; θ̂1 = 1.86; θ̂2 = 3.03; θ̂3 = 4.09
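The entire model-order comparison can be mimicked with numpy's polyfit, as sketched below; since the y16 data are not reproduced here, a placeholder response is generated from the fitted cubic plus noise of roughly the magnitude of the reported S.

import numpy as np

xk = np.linspace(-1, 1, 9)
rng = np.random.default_rng(4)
# Placeholder response, consistent with the fitted cubic (not the real data):
y16 = 1.05 + 1.86 * xk + 3.03 * xk**2 + 4.09 * xk**3 + rng.normal(0, 0.16, 9)

for deg in (1, 2, 3, 4):
    coef = np.polyfit(xk, y16, deg)          # highest power first
    resid = y16 - np.polyval(coef, xk)
    print(deg, np.round(coef[::-1], 3), f"RSS = {resid @ resid:.4f}")
# The residual sum of squares drops sharply up to the cubic, then levels off.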

16.17 First, we observe that the variable x is already defined as n = 9 equally spaced values on the interval (−1, 1). The required orthogonal polynomial model is of the form:

y(x) = Σ_{i=0}^{4} α_i p_i(x)          (16.33)

where p_i(x), i = 0, 1, ..., 4, are Gram orthogonal polynomials defined on this interval. From the supplied values of x_k, k = 1, 2, ..., 9, and from Eq (16.199) in the text (which provides the formulas for generating the Gram orthogonal polynomials), we obtain the following polynomials, required to determine the desired linear, quadratic, cubic, and quartic polynomial fits (see Fig 16.5):

p0 = [1, 1, 1, 1, 1, 1, 1, 1, 1]ᵀ
p1 = [−1.00, −0.75, −0.50, −0.25, 0.00, 0.25, 0.50, 0.75, 1.00]ᵀ
p2 = [0.583333, 0.145833, −0.166667, −0.354167, −0.416667, −0.354167, −0.166667, 0.145833, 0.583333]ᵀ


[Figure 16.5: Gram orthogonal polynomials p1(x) through p4(x), plotted versus xk, required for Problem 16.17.]

p3 = (−0.26250, 0.13125, 0.24375, 0.16875, 0.00000, −0.16875, −0.24375, −0.13125, 0.26250)^T
p4 = (0.093750, −0.140625, −0.073661, 0.060268, 0.120536, 0.060268, −0.073661, −0.140625, 0.093750)^T

With the sums of squares term, ψi², defined as ψi² = pi^T pi = ∑_{k=1}^{9} pi(xk) pi(xk), the required coefficients are obtained upon introducing the supplied data into Eq (16.206) in the text, i.e.:

α̂0 = [∑_{k=1}^{9} p0(xk) y(xk)] / ψ0² = 20.804/9 = 2.312
α̂1 = [∑_{k=1}^{9} p1(xk) y(xk)] / ψ1² = 18.294/3.75 = 4.878
α̂2 = [∑_{k=1}^{9} p2(xk) y(xk)] / ψ2² = 3.645/1.203 = 3.029
α̂3 = [∑_{k=1}^{9} p3(xk) y(xk)] / ψ3² = 1.422/0.348 = 4.086
α̂4 = [∑_{k=1}^{9} p4(xk) y(xk)] / ψ4² = 0.010/0.0897 = 0.113
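Because the pi are mutually orthogonal, each α̂i is a simple projection and requires no matrix inversion. The sketch below (Python with NumPy) illustrates the calculation; since the y16 data vector is not reproduced here, the fitted cubic from Problem 16.16 is used as a stand-in response, so this is an illustration of the mechanics rather than a reproduction of the exact textbook numbers.

    import numpy as np

    # Gram orthogonal polynomials evaluated at n = 9 equally spaced points on (-1, 1)
    xk = np.linspace(-1.0, 1.0, 9)
    p = np.column_stack([
        np.ones(9),                                          # p0
        xk,                                                  # p1
        [0.583333, 0.145833, -0.166667, -0.354167,           # p2
         -0.416667, -0.354167, -0.166667, 0.145833, 0.583333],
        [-0.26250, 0.13125, 0.24375, 0.16875, 0.00000,       # p3
         -0.16875, -0.24375, -0.13125, 0.26250],
        [0.093750, -0.140625, -0.073661, 0.060268, 0.120536, # p4
         0.060268, -0.073661, -0.140625, 0.093750],
    ])

    # Stand-in response: the fitted cubic from Problem 16.16 (the actual y16
    # data from the text would be used here instead)
    y = 1.05 + 1.86*xk + 3.03*xk**2 + 4.09*xk**3

    # Eq (16.206): alpha_i = (p_i' y) / (p_i' p_i); the columns are orthogonal,
    # so each coefficient is an independent projection
    alpha = (p.T @ y) / np.sum(p**2, axis=0)
    print(np.round(alpha, 3))   # approximately [2.31, 4.88, 3.03, 4.09, 0.00]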


Note that the first 4 coefficients (including the constant α̂0) are all about the same magnitude; the last one, α̂4, is an order of magnitude smaller. Thus, the relative contribution of the 4th-degree polynomial is seen to be much less. This is consistent with the conclusion in Problem 16.16, that a cubic provides the most appropriate fit.

On the other hand, the final set of coefficients for the standard polynomial regression model were obtained as follows:

θ̂0 = 1.05; θ̂1 = 1.86; θ̂2 = 3.03; θ̂3 = 4.09

As one might expect, it appears as if there is no connection between these parameters and those obtained via the orthogonal polynomial representation. However, in this specific case (because the range of the predictor variable x is coincidentally the same as that required for orthogonal polynomial regression), we recall the result of the linear standard regression analysis, y16 = 2.31 + 4.88 xk, whose coefficients are identical to the first two coefficients of the orthogonal polynomial regression model. Subsequent additions of the quadratic, then the cubic, and finally the quartic terms naturally modified these first two leading coefficients in the standard polynomial model. Not so with the orthogonal polynomial model; the parameters remain unchanged as more terms are added, as is characteristic of orthogonal polynomial regression.

APPLICATION PROBLEMS

16.18 (i) The result of the regression analysis is as follows:

Regression Analysis: Expmt versus KG
The regression equation is
Expmt = - 0.0183 + 1.00 KG

Predictor  Coef      SE Coef  T       P
Constant   -0.01826  0.02061  -0.89   0.396
KG         1.00311   0.00667  150.35  0.000

S = 0.00994367  R-Sq = 100.0%  R-Sq(adj) = 100.0%

We may now observe the following concerning this result:

1. The two parameters are estimated as:

θ̂0 = −0.018; and θ̂1 = 1.003

but the associated p-values indicate that θ̂0 is not significantly different from zero (at the α = 0.05 significance level), while θ̂1 is. The implications are therefore that θ̂0 is essentially zero while θ̂1 is essentially 1, a conclusion indicating an essentially perfect match between the KG model prediction and the experimental data.


[Figure 16.6: Experimental data, regression line of the KG model predictions (solid line) and 95% confidence interval (dashed line); fitted line plot of Expmt = −0.01826 + 1.003 KG. The regression line is hardly distinguishable from the tight 95% confidence interval, indicating extremely small residual differences between the model prediction and the data.]

2. Furthermore, the R² and R²adj values indicate perfect correlation between model and experimental data, with little or no variability left unexplained by the regression model.

3. A plot of the regression line versus the data, along with a 95% confidence interval around the regression line, is shown in Fig 16.6; it also indicates a near-perfect fit, consistent with all the points made above.

4. A normal probability plot of the residuals and the 95% confidence interval lines is shown in Fig 16.7; along with the indicated p-value of 0.124 associated with the AD normality test, it confirms the adequacy of the fit provided by the regression model to the data.

The conclusion, therefore, is that strictly on the basis of this regression analysis, the KG model does indeed provide “excellent agreement” with the data.

(ii) For the HCB model, the result of the regression analysis is as follows:

Regression Analysis: Expmt versus HCB
The regression equation is
Expmt = - 0.0540 + 1.02 HCB

Predictor  Coef      SE Coef  T      P
Constant   -0.05397  0.03227  -1.67  0.125
HCB        1.02183   0.01052  97.10  0.000

S = 0.0153925  R-Sq = 99.9%  R-Sq(adj) = 99.9%


[Figure 16.7: Normal probability plot and 95% confidence interval for the residuals resulting from the regression of the KG model predictions versus experimental data (N = 12, AD = 0.548, p-value = 0.124).]

As in (i), we may now also observe the following concerning this result:

1. The two parameters are estimated as:

θ̂0 = −0.054; and θ̂1 = 1.022

but once again, the p-values associated with the parameter estimates indicate that θ̂0 is not significantly different from zero (at the α = 0.05 significance level), while θ̂1 is. The implications: in this case also, θ̂0 is essentially zero while θ̂1 is essentially 1, indicating another near-perfect match between the HCB model prediction and the experimental data. However, we note that θ̂1 for the KG model, 1.003, is slightly closer to the ideal value of 1 than the value of 1.022 estimated for the HCB model.

2. The R² and R²adj values, each 99.9%, also indicate near-perfect correlation between model and experimental data, with very little variability left unexplained by the regression model. But, once again, these values are slightly less than the ideal 100%.

3. A plot of the regression line and 95% confidence interval versus the data, shown in Fig 16.8, is consistent with the points made above.

4. Shown in Fig 16.9, along with the indicated p-value of 0.102 associated with the AD normality test, is a normal probability plot of the residuals and the 95% confidence interval lines. This confirms the adequacy of the fit provided by the regression model to the data.


[Figure 16.8: Experimental data, regression line of the HCB model predictions (solid line), and 95% confidence interval (dashed line); fitted line plot of Expmt = −0.05397 + 1.022 HCB. The regression line is hardly distinguishable from the tight 95% confidence interval, indicating extremely small differences between the model prediction and the data.]

[Figure 16.9: Normal probability plot and 95% confidence interval for the residuals resulting from the regression of the HCB model predictions versus experimental data (N = 12, AD = 0.583, p-value = 0.102).]


In addition to the points made above, s, the residual root-mean-square, is 0.010 for the KG model but 50% higher (at 0.015) for the HCB model. The conclusion therefore is that, yes, strictly on the basis of this regression analysis, the HCB model also provides “excellent agreement” with the data; however, the KG model appears to provide slightly better predictions than the HCB model.

16.19 (i) A scatter plot of fire damage, y, versus distance to the closest fire station, x, suggests a linear relationship of the type:

y = θ0 + θ1 x

A regression analysis carried out in MINITAB produces the following result:

Regression Analysis: Fire Damage, y versus Distance, x
The regression equation is
Fire Damage, y = 11.6 + 4.40 Distance, x

Predictor    Coef    SE Coef  T      P
Constant     11.558  1.351    8.56   0.000
Distance, x  4.4000  0.4246   10.36  0.000

S = 2.00410  R-Sq = 91.5%  R-Sq(adj) = 90.6%

The p-values associated with the parameter estimates imply that both estimates, θ̂0 = 11.56 and θ̂1 = 4.40, are significantly different from zero. Along with the indicated R² and R²adj values, the implication is that the regression model,

y = 11.56 + 4.40x     (16.34)

provides a “reasonable fit” to the data (see Fig 16.10). To evaluate the model fit objectively requires an analysis of the residuals. First, a standard MINITAB “four-in-one” plot of the residuals resulting from the regression is shown in Fig 16.11. Nothing looks out of the ordinary with these residuals, but a more rigorous formal normality test is still necessary. The result of such a test is shown in Fig 16.12, along with an associated p-value of 0.966. The mean value is virtually zero; the 95% confidence interval on the probability plot completely encompasses all the residuals; and along with the indicated p-value, we conclude that the residuals can be considered as normally distributed with zero mean. The model is therefore considered adequate.

(ii) The regression model in Eq (16.34) may now be used to determine expected values, along with appropriate uncertainty intervals, which, in this case, will be the confidence intervals. MINITAB can be used directly for this purpose. The result for house A, located at a distance of x = 5 miles, is obtained as follows:

Predicted Values for New Observations
New Obs  x     Fit     SE Fit  95% CI
1        5.00  33.558  1.072   (31.170, 35.946)


[Figure 16.10: Fire damage data and regression line with 95% confidence interval; fitted line plot of Fire Damage, y = 11.56 + 4.400 Distance, x.]

[Figure 16.11: Qualitative residual analysis: the MINITAB “four-in-one” plot (normal probability plot, residuals versus fits, histogram, residuals versus order) shows nothing out of the ordinary.]


[Figure 16.12: Probability plot of the residuals with 95% confidence interval (N = 12, AD = 0.137, p-value = 0.966).]

Thus, the expected “damage” value for house A (in thousands of dollars) is ŷA = 33.558; the 95% confidence interval is 31.170 < ŷA < 35.946, or

ŷA = 33.558 ± 2.388

For house B, located at a distance of x = 3 miles, the result is:

Predicted Values for New Observations
New Obs  x  Fit     SE Fit  95% CI
1        3  24.758  0.581   (23.464, 26.053)

As such, the expected “damage” value for house B (in thousands of dollars) is ŷB = 24.758; the 95% confidence interval is 23.464 < ŷB < 26.053, or

ŷB = 24.758 ± 1.295

(iii) It is not “safe” to use this model to predict the fire damage to a house C that is 6 miles from the nearest fire station. This is because the data used to develop the model do not extend beyond x = 5.5; as such, any prediction for x = 6 will constitute an ill-advised extrapolation outside the data range. Still, if the model is used for such a prediction, the result can be obtained directly using MINITAB. The appropriate uncertainty interval this time is the prediction interval. The result from MINITAB is as follows:

Predicted Values for New Observations
New Obs  x  Fit     SE Fit  95% CI            95% PI
1        6  37.958  1.447   (34.733, 41.183)  (32.450, 43.467)X

X denotes a point that is an outlier in the predictors.

The predicted “damage” value for house C (in thousands of dollars) is ŷC = 37.958; the 95% prediction interval is 32.450 < ŷC < 43.467, or

ŷC = 37.958 ± 5.509     (16.35)
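These fitted values and interval half-widths can also be reproduced by hand from the fitted model, using s = 2.00410, t0.025(10) = 2.228 for the n = 12 observations, and the matrix P12 = (X^T X)^{−1} quoted in Eq (16.42) of Problem 16.20 below. A sketch (in Python) follows.

    import numpy as np
    from scipy.stats import t

    # From the 12-observation fit: parameter estimates, residual s, and P = (X'X)^{-1}
    theta = np.array([11.558, 4.400])
    s, nu = 2.00410, 10                  # n - 2 = 10 degrees of freedom
    P = np.array([[0.4543, -0.1290], [-0.1290, 0.0449]])
    t_crit = t.ppf(0.975, nu)            # = 2.228

    for dist in (5.0, 3.0, 6.0):         # houses A, B, C
        x0 = np.array([1.0, dist])
        y_hat = x0 @ theta
        se_fit = s * np.sqrt(x0 @ P @ x0)          # for the confidence interval
        se_pred = s * np.sqrt(1.0 + x0 @ P @ x0)   # for the prediction interval
        print(f"x = {dist}: fit = {y_hat:.3f}, "
              f"95% CI +/- {t_crit*se_fit:.3f}, 95% PI +/- {t_crit*se_pred:.3f}")

The printed half-widths match (to rounding) the MINITAB intervals above: ±2.388 and ±1.295 for the confidence intervals at houses A and B, and ±5.509 for the prediction interval at house C.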

16.20 (i) The expression for obtaining the estimates θ̂_{n+1} recursively given θ̂_n, i.e., Eq (16.190) of the text (page 695), is

θ̂_{n+1} = θ̂_n + P_{n+1} x_{n+1} (y_{n+1} − x_{n+1}^T θ̂_n)     (16.36)

To determine P_{n+1} directly from its definition in Eq (16.187) in the text, i.e.,

P_{n+1}^{−1} = P_n^{−1} + x_{n+1} x_{n+1}^T     (16.37)

requires, in principle, a fresh matrix inversion each time a new observation is received. To avoid carrying out this inversion, it is usual to invoke the matrix inversion lemma, a version of which states that if

A = B^{−1} + C D^{−1} C^T     (16.38)

then

A^{−1} = B − B C (D + C^T B C)^{−1} C^T B     (16.39)

If we now let A = P_{n+1}^{−1}; B = P_n; with C = x_{n+1} along with D = 1, a scalar, then by invoking the matrix inversion result, Eq (16.37) becomes:

P_{n+1} = P_n − P_n x_{n+1} (1 + x_{n+1}^T P_n x_{n+1})^{−1} x_{n+1}^T P_n     (16.40)

and because the term to be inverted is a scalar, the result simplifies to:

P_{n+1} = P_n − [P_n x_{n+1} x_{n+1}^T P_n] / (1 + x_{n+1}^T P_n x_{n+1})     (16.41)

which may now be used in Eq (16.36): having obtained P_n = (X^T X)^{−1} once from the original 12-sample data set in Problem 16.19, no matrix inversion is necessary any longer.

Now, from the data, obtain

P_12 = (X^T X)^{−1} =
[  0.4543  −0.1290 ]
[ −0.1290   0.0449 ]     (16.42)

Upon recasting the new data as follows:

x_13^T = [1  3.8]
x_14^T = [1  4.8]
x_15^T = [1  6.1]


one can easily write a computational script which, beginning with P_12, will use Eq (16.41) recursively to determine the matrices P_13, P_14 and P_15 as:

P_13 =
[  0.4532  −0.1277 ]
[ −0.1277   0.0434 ]

P_14 =
[  0.4324  −0.1172 ]
[ −0.1172   0.0381 ]

P_15 =
[  0.3761  −0.0943 ]
[ −0.0943   0.0287 ]

From here, and from Eq (16.36), we obtain the following sequence of parameter estimates, using the appropriate P matrix, x data vector, and fire damage data y, and beginning with θ̂_12 as the estimates obtained earlier from the original set of data:

θ̂_13 = [11.628  4.319]^T     (16.43)
θ̂_14 = [11.102  4.584]^T     (16.44)
θ̂_15 = [10.278  4.919]^T     (16.45)

We observe therefore that as a result of the new data set, the intercept parameter, θ0, has changed from 11.558 to 10.278, an 11.1% decrease, while the slope parameter, θ1, has changed from 4.40 to 4.919, an 11.8% increase.
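A minimal version of such a script is sketched below (in Python with NumPy); the new response values y13, y14, y15 are the fire damage observations supplied with the problem and are therefore left as placeholders in the commented usage lines.

    import numpy as np

    def rls_update(theta, P, x_new, y_new):
        """One recursive least squares step: Eq (16.41), then Eq (16.36)."""
        Px = P @ x_new
        # Eq (16.41): rank-one update of P; the denominator is a scalar
        P_new = P - np.outer(Px, Px) / (1.0 + x_new @ Px)
        # Eq (16.36): correct the estimates using the prediction error
        theta_new = theta + P_new @ x_new * (y_new - x_new @ theta)
        return theta_new, P_new

    # Starting point from the original 12-sample fit of Problem 16.19
    theta12 = np.array([11.558, 4.400])                      # [intercept, slope]
    P12 = np.array([[0.4543, -0.1290], [-0.1290, 0.0449]])   # Eq (16.42)

    # Usage (y13, y14, y15 are the new fire damage values from the problem):
    # xs = [np.array([1.0, 3.8]), np.array([1.0, 4.8]), np.array([1.0, 6.1])]
    # theta, P = theta12, P12
    # for x_new, y_new in zip(xs, [y13, y14, y15]):
    #     theta, P = rls_update(theta, P, x_new, y_new)   # Eqs (16.43)-(16.45)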

(ii) The recalculated expected fire damage values to houses A and B, in light of the new data, are:

ŷA* = 34.88     (16.46)
ŷB* = 25.04     (16.47)

values that fit perfectly within the respective 95% confidence intervals computed earlier, 31.170 < ŷA < 35.946, and 23.464 < ŷB < 26.053. Thus, the observed changes in the expected values are not significant (or practically important). (It is also possible to arrive at the same conclusion by obtaining the 95% confidence intervals for the new expected values. These intervals will be seen to overlap perfectly with the corresponding ones obtained earlier, leading to the same conclusions.)

(iii) With the new data, which now extend to x = 6.1 (so that x = 6 is now within the data range), it is now “safe” to use the updated model to predict the fire damage to house C. The result (along with the prediction interval) is obtained as:

ŷC = 39.793 ± 5.88     (16.48)


[Figure 16.13: Data, regression line and 95% confidence interval for DNA damage as a function of radiation dose; fitted line plot of ln(DNADamage) = 0.3457 + 0.4584 ln(RadDose).]

Note that the predicted value, ŷC = 39.793, does in fact still lie well within the 95% prediction interval obtained earlier.

16.21 (i) A logarithmic transformation of the given expression produces

ln λ = ln θ0 + θ1 ln γ

Consequently, when the regression analysis is carried out with y = ln λ and x = ln γ, the result is as shown below:

Regression Analysis: ln(DNADamage) versus ln(RadDose)
The regression equation is
ln(DNADamage) = 0.346 + 0.458 ln(RadDose)

Predictor    Coef     SE Coef  T     P
Constant     0.3457   0.2129   1.62  0.246
ln(RadDose)  0.45842  0.05727  8.01  0.015

S = 0.402187  R-Sq = 97.0%  R-Sq(adj) = 95.5%

The p-value of 0.246 associated with the estimate of ln θ0 indicates that, at the α = 0.05 significance level, this parameter is not significantly different from zero; i.e., θ̂0 is more likely to be 1. Conversely, the p-value of 0.015 associated with the estimate of θ1 indicates that this parameter is significant. The R² and R²adj values suggest that the model provides a decent explanation of the information contained in the data. A plot of the log-transformed data, the regression line, and the 95% confidence interval is shown in Fig 16.13.


[Figure 16.14: Distribution plot of P(X ≥ 6) for a Poisson random variable with mean λ = 2.956; the shaded tail area is 0.07955.]

(ii) For radiation dose γ = 5.0 (in which case ln γ = 1.609), the model prediction of the extent of DNA damage is:

ln λ = 1.084 ⇒ λ = 2.956

From the supplied information, we deduce that a cell response of n = 3 pulses of p53 corresponds to 2n = 6 strand breaks. The problem therefore requires that we compute P(X ≥ 6) for a Poisson random variable with λ = 2.956, i.e., X ∼ P(2.956). The result is (see Fig 16.14):

P(X ≥ 6 | λ = 2.956) = 0.08
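This tail probability is easily verified outside MINITAB; a one-line check (in Python, using scipy.stats) is sketched below.

    from scipy.stats import poisson

    # P(X >= 6) = 1 - P(X <= 5) for X ~ Poisson(lambda = 2.956);
    # the survival function gives the upper tail directly
    print(poisson.sf(5, 2.956))   # approximately 0.0796, i.e., about 0.08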

16.22 (i) A regression analysis of the data yields the following results:

Regression Analysis: CrackedPistons-y versus Purity-x
The regression equation is
CrackedPistons-y = 59.5 - 58.0 Purity-x

Predictor  Coef     SE Coef  T       P
Constant   59.520   3.422    17.39   0.000
Purity-x   -58.000  3.546    -16.36  0.000

S = 0.148324  R-Sq = 98.5%  R-Sq(adj) = 98.2%

The implication is that the parameter estimates are as follows:

θ̂0 = 59.520; θ̂1 = −58.000

so that the regression equation is:

y = 59.52 − 58.00x     (16.49)


[Figure 16.15: Cracked pistons data, regression line and 95% confidence interval (dashed line); fitted line plot of CrackedPistons-y = 59.52 − 58.00 Purity-x.]

The p-values indicate that both parameters are significantly different from zero, and the R² and R²adj values indicate that the model explains a considerable amount of the information in the data. A plot of the regression fit to the data, shown in Fig 16.15, confirms these observations.

(ii) From the estimated parameters and the resulting regression equation, we may determine the expected value of the average number of cracked pistons per ingot of 96% purity by introducing x = 0.96 into Eq (16.49) to obtain:

y = 3.84

The problem now requires that we test the average number of cracked pistons obtained from the supplied sample data against the hypothesis that x = 0.96, which translates to a postulate that the “true” mean number of cracked pistons is µ0 = 3.84. Upon assuming that the mean number of cracked pistons per ingot will be approximately normally distributed, with a sample size n = 5, and no population variance specified, the appropriate test is a one-sample t-test. The sample average, ȳ, and variance, s², are obtained from the data as:

ȳ = 5.00; s² = 2.00

The null hypothesis is clearly that µ = µ0, where µ0 = 3.84. But we must now exercise care in choosing the appropriate alternative hypothesis, given that the sample average has been determined as 5, a number that is higher than the postulated mean value. The two-sided alternative is not of much interest to us because merely establishing that the mean number of cracked pistons is not equal to the hypothesized 3.84 is not very useful. In determining whether the “purity” claim by the steel mill is reasonable, it is of more interest to determine whether the true “population” mean number of cracked pistons is indeed greater than the postulated 3.84, in light of the high value determined from the sample data. The hypotheses may therefore be stated as follows:

H0: µ = 3.84
Ha: µ > 3.84

The result of a one-sample t-test carried out in MINITAB follows:

One-Sample T
Test of mu = 3.84 vs > 3.84

N  Mean   StDev  SE Mean  95% Lower Bound  T     P
5  5.000  1.414  0.632    3.652            1.83  0.070

The p-value of 0.07 indicates that at the α = 0.05 significance level, there is no evidence for rejecting the null hypothesis.

(iii) With n = 20, if all other sample information remains the same, then the result of the one-sample t-test is as follows:

One-Sample T
Test of mu = 3.84 vs > 3.84

N   Mean   StDev  SE Mean  95% Lower Bound  T     P
20  5.000  1.414  0.316    4.453            3.67  0.001
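Both sets of test results are easy to reproduce from the summary statistics alone; a short sketch (in Python, using the t distribution from scipy.stats) is given below.

    import math
    from scipy.stats import t

    def one_sample_t(ybar, s, n, mu0):
        """One-sided, one-sample t-test of H0: mu = mu0 vs Ha: mu > mu0."""
        se = s / math.sqrt(n)               # standard error of the mean
        t_stat = (ybar - mu0) / se
        p_value = t.sf(t_stat, df=n - 1)    # upper-tail probability
        return t_stat, p_value

    print(one_sample_t(5.0, math.sqrt(2.0), 5, 3.84))    # t = 1.83, p = 0.070
    print(one_sample_t(5.0, math.sqrt(2.0), 20, 3.84))   # t = 3.67, p = 0.001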

This time, the p-value of 0.001 indicates that at the α = 0.05 significance level, there is evidence for rejecting the null hypothesis in favor of the alternative; i.e., the observed mean number of cracked pistons, 5, is significantly higher than the postulated 3.84. Therefore, the steel mill claim is no longer reasonable. The reason for the different result this time around is that with a larger sample, the variability associated with the computed mean (i.e., the standard error of the mean) is now much smaller. The observed difference between the computed sample average and the postulated mean is now seen to be comparatively large. With a smaller sample size, the observed difference was within the limits of random variation associated with the smaller sample size.

16.23 (i) Scatter plots of the city and highway mileage data versus number of cylinders indicate approximately linear relationships. The result of a linear regression analysis of a postulated two-parameter model for highway gasoline mileage versus number of cylinders is shown below.

Regression Analysis: Highway mpg versus Cylinders
The regression equation is
Highway mpg = 32.4 - 1.29 Cylinders


[Figure 16.16: Highway gasoline mileage versus number of cylinders data, regression line and 95% confidence interval (dashed line).]

Predictor  Coef     SE Coef  T      P
Constant   32.359   1.212    26.70  0.000
Cylinders  -1.2918  0.1492   -8.66  0.000

S = 2.23678  R-Sq = 80.6%  R-Sq(adj) = 79.6%

Unusual Observations
Obs  Cylinders  Highway mpg  Fit     SE Fit  Residual  St Resid
6    16.0       14.000       11.690  1.377   2.310     1.31 X

X denotes an observation whose X value gives it large leverage.

A plot of the data, regression line and 95% confidence interval is shown in Fig 16.16. The corresponding result for city gasoline mileage versus number of cylinders is shown below.

Regression Analysis: City mpg versus Cylinders
The regression equation is
City mpg = 24.2 - 1.15 Cylinders

Predictor  Coef     SE Coef  T       P
Constant   24.1797  0.8569   28.22   0.000
Cylinders  -1.1459  0.1055   -10.86  0.000

S = 1.58139  R-Sq = 86.8%  R-Sq(adj) = 86.0%

Unusual Observations


[Figure 16.17: City gasoline mileage versus number of cylinders data, regression line and 95% confidence interval (dashed line).]

Obs  Cylinders  City mpg  Fit    SE Fit  Residual  St Resid
6    16.0       8.000     5.845  0.974   2.155     1.73 X

X denotes an observation whose X value gives it large leverage.

A plot of the data, regression line and 95% confidence interval is shown in Fig 16.17. From the p-values associated with each parameter estimate, we observe that all coefficients are significant (at the α = 0.05 significance level); and the R² and R²adj values for each regression analysis indicate that each model provides a reasonable explanation of the information contained in the data in question.

An objective assessment of any difference that might exist between the corresponding parameters of these different models requires 95% confidence intervals around each parameter estimate. For parameters that are significantly different (at the α = 0.05 significance level), these intervals will not overlap. For this particular problem with n = 20, and having estimated 2 parameters (so that ν = 18 is the number of degrees of freedom), the required 95% confidence interval around each parameter estimate is obtained as

θi = θ̂i ± t0.025(18) × SE(θ̂i)

Upon determining that t0.025(18) = 2.101, we obtain the interval estimates for the highway gasoline mileage data as:

θ0 = 32.35 ± 2.55 ⇒ 29.81 ≤ θ0 ≤ 34.91     (16.50)
θ1 = −1.29 ± 0.31 ⇒ −1.60 ≤ θ1 ≤ −0.98     (16.51)
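The interval arithmetic is the same for every parameter, so it is convenient to script; a brief sketch (in Python, using scipy.stats for the t critical value) follows, shown here for the highway-mileage estimates.

    from scipy.stats import t

    def param_ci(est, se, nu=18, alpha=0.05):
        """Confidence interval theta_hat +/- t_{alpha/2}(nu) * SE(theta_hat)."""
        half_width = t.ppf(1 - alpha / 2, nu) * se   # t_{0.025}(18) = 2.101
        return est - half_width, est + half_width

    # (estimate, standard error) pairs from the highway-mileage MINITAB output
    print(param_ci(32.359, 1.212))    # intercept: approximately (29.81, 34.91)
    print(param_ci(-1.2918, 0.1492))  # slope: approximately (-1.61, -0.98)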


[Figure 16.18: Normal probability plot and 95% confidence interval for the residuals obtained from the regression of highway gasoline mileage versus number of cylinders (N = 20, AD = 0.292, p-value = 0.570). The p-value of 0.570 implies that the residuals appear to be normally distributed, as postulated.]

while for the city mileage data, the interval estimates are:

θ0 = 24.18 ± 1.80 ⇒ 22.38 ≤ θ0 ≤ 25.98     (16.52)
θ1 = −1.15 ± 0.22 ⇒ −1.37 ≤ θ1 ≤ −0.93     (16.53)

We may now observe that the two intervals for θ0, the intercepts, do not overlap. Therefore, at the α = 0.05 significance level, this parameter is different for each model; it is lower for city gasoline mileage, and higher for highway gasoline mileage. On the contrary, the intervals for θ1, the slope, overlap considerably; as such, at the α = 0.05 significance level, we conclude that there is no significant difference in this parameter value for each model. Thus, in conclusion: the gasoline mileage ratings of the automobiles in question vary approximately linearly with the number of cylinders, with a negative slope that is essentially the same for “city” as well as “highway” mileage; the intercept of the linear relationship is higher for highway mileage.

(ii) Figs 16.18 and 16.19 show normal probability plots of the residuals from each regression model (along with corresponding 95% confidence intervals and the p-values associated with each Anderson-Darling test). The conclusion is that even though the R² and R²adj values are modest in each case, there is no evidence for rejecting the null hypothesis that the residuals are normally distributed, implying that these simple models do in fact provide reasonable explanations of how the number of cylinders in a car engine affects the gas mileage in the city and on the highway.

16.24 (i) Scatter plots of the city and highway mileage data versus engine capacity indicate approximately linear relationships, although this time the correlations are not as strong.


[Figure 16.19: Normal probability plot and 95% confidence interval for the residuals obtained from the regression of city gasoline mileage versus number of cylinders (N = 20, AD = 0.325, p-value = 0.503). The p-value of 0.503 implies that the residuals appear to be normally distributed, as postulated.]

The result of a linear regression analysis of a postulated two-parameter model for highway gasoline mileage versus engine capacity is shown below.

Regression Analysis: Highway mpg versus Eng Capacity
The regression equation is
Highway mpg = 30.2 - 1.78 Eng Capacity

Predictor     Coef     SE Coef  T      P
Constant      30.159   1.669    18.07  0.000
Eng Capacity  -1.7754  0.3607   -4.92  0.000

S = 3.31935  R-Sq = 57.4%  R-Sq(adj) = 55.0%

Unusual Observations
Obs  Eng Capacity  Highway mpg  Fit     SE Fit  Residual  St Resid
8    7.00          24.000       17.731  1.269   6.269     2.04 R
9    8.40          22.000       15.246  1.705   6.754     2.37 R

R denotes an observation with a large standardized residual.

A plot of the data, regression line and 95% confidence interval is shown in Fig 16.20. The corresponding result for city gasoline mileage versus engine capacity is shown below.


[Figure 16.20: Highway gasoline mileage versus engine capacity data, regression line and 95% confidence interval (dashed line).]

Regression Analysis: City mpg versus Eng Capacity
The regression equation is
City mpg = 22.8 - 1.70 Eng Capacity

Predictor     Coef     SE Coef  T      P
Constant      22.766   1.150    19.79  0.000
Eng Capacity  -1.7046  0.2486   -6.86  0.000

S = 2.28730  R-Sq = 72.3%  R-Sq(adj) = 70.8%

Unusual Observations
Obs  Eng Capacity  City mpg  Fit    SE Fit  Residual  St Resid
9    8.40          13.000    8.447  1.175   4.553     2.32 R

R denotes an observation with a large standardized residual.

A plot of the data, regression line and 95% confidence interval is shown in Fig 16.21. The p-values associated with each set of parameter estimates indicate that all coefficients are significant (at the α = 0.05 significance level). This time, however, the R² and R²adj values for the highway mileage regression analysis indicate that a fairly large amount of variability in the data remains unexplained by the model; the corresponding values for the city mileage regression analysis are somewhat better.

Any difference that might exist between the corresponding parameters of these different models will be assessed, once again, using 95% confidence intervals around each parameter estimate. Again, with n = 20, and having estimated


[Figure 16.21: City gasoline mileage versus engine capacity data, regression line and 95% confidence interval (dashed line).]

2 parameters, ν = 18, and with t0.025(18) = 2.101, the required 95% confidence intervals around each parameter estimate,

θi = θ̂i ± 2.101 × SE(θ̂i)

are obtained as follows for the highway gasoline mileage data:

θ0 = 30.159 ± 3.507 ⇒ 26.652 ≤ θ0 ≤ 33.666     (16.54)
θ1 = −1.775 ± 0.758 ⇒ −2.533 ≤ θ1 ≤ −1.017     (16.55)

and for the city gasoline mileage data:

θ0 = 22.766 ± 2.416 ⇒ 20.350 ≤ θ0 ≤ 25.182     (16.56)
θ1 = −1.705 ± 0.522 ⇒ −2.227 ≤ θ1 ≤ −1.183     (16.57)

We now observe that because the two intervals for θ0, the intercepts, do not overlap, this parameter is different for each model at the α = 0.05 significance level. As with the number of cylinders, this parameter estimate is lower for city gasoline mileage than for highway gasoline mileage. On the other hand, the intervals for θ1, the slopes, overlap considerably; as such, at the α = 0.05 significance level, we conclude that there is no significant difference in this parameter value for both models. Thus, we conclude, as in Problem 16.23, that the gasoline mileage ratings of the automobiles in question vary approximately linearly with engine capacity, with a negative slope that is essentially the same for “city” as well as “highway” mileage; the intercept of the linear relationship is higher for highway mileage.


[Figure 16.22: Normal probability plot and 95% confidence interval for the residuals obtained from the regression of highway gasoline mileage versus engine capacity (N = 20, AD = 0.265, p-value = 0.655). The p-value of 0.655 implies that the residuals appear to be normally distributed, as postulated.]

Coef -1216.1 2.3989 -0.00045004

SE Coef 242.8 0.2458 0.00005908

T -5.01 9.76 -7.62

S = 46.8013 R-Sq = 98.2% R-Sq(adj) = 97.7%

P 0.002 0.000 0.000

44

CHAPTER 16.

Probability Plot of RES-ENG-CITY Normal - 95% CI 99 Mean StDev N AD P-Value

95 90

-8.17124E-15 2.226 20 0.255 0.691

80

Percent

70 60 50 40 30 20 10 5

1

-8

-6

-4

-2 0 2 RES-ENG-CITY

4

6

8

Figure 16.23: Normal probability plot and 95% confidence interval for the residuals obtained from the regression of city gasoline mileage versus engine capacity.The p-value of 0.691 implies that the residuals appear to be normally distributed, as postulated.

Unusual Observations
Obs  Home Size, x  Elec. Usage, y  Fit     SE Fit  Residual  St Resid
10   2930          1954.0          1949.2  44.7    4.8       0.35 X

X denotes an observation whose X value gives it large leverage.

The p-values associated with the parameters indicate that they are all significantly different from zero; and the R² and R²adj values indicate that a substantial amount of the variability in the data set has been explained by the quadratic model. A plot of the regression line and 95% confidence interval versus the data is shown in Fig 16.24. A normal probability plot of the residuals, shown in Fig 16.25 along with a p-value of 0.422 associated with the Anderson-Darling normality test, indicates that there is no evidence to support rejection of the null hypothesis that these residuals are normally distributed. The conclusion therefore is that the regression model provides a decent fit to the data.

However, note that it is highly unlikely that the amount of electricity used by a home will actually begin to decrease with home size after a certain point, as indicated by this quadratic model. This anomaly is due to the data point x = 2930; y = 1954, which is identified (by MINITAB) as a point with “high leverage,” implying that its x value is considered “extreme” compared with the mean, x̄; as a result, this point will exert an unjustifiably large influence on the parameter estimates. With additional data on larger homes in the x = 2400–3000 sq. ft. range, the effect of this particular home will be moderated, and one will find that the rate at which electricity usage increases as a function of size will be less than the corresponding value for smaller homes, but it will not be


[Figure 16.24: Regression line and 95% confidence interval (dashed line) for electricity usage versus home size; fitted line plot of Elec. Usage, y = −1216 + 2.399 Home Size, x − 0.000450 Home Size, x².]

[Figure 16.25: Normal probability plot and 95% confidence interval for the residuals obtained from the regression of electricity usage versus home size (N = 10, AD = 0.338, p-value = 0.422). The p-value of 0.422 implies that the residuals appear to be reasonably normally distributed.]


negative as erroneously implied by this current model. Thus, one must be careful in using this model to analyze how the size of a home influences the amount of electricity used.

16.26 (i) Upon taking logs in the given equation, we obtain:

log Sh = log θ0 + θ1 log Re + θ2 log Sc

and the result of the regression of log Sh against log Re and log Sc is shown below:

Regression Analysis: LogSh versus LogRe, LogSc
The regression equation is
LogSh = - 1.43 + 0.777 LogRe + 0.455 LogSc

Predictor  Coef     SE Coef  T       P
Constant   -1.4310  0.1104   -12.96  0.000
LogRe      0.77699  0.02847  27.29   0.000
LogSc      0.45531  0.04228  10.77   0.000

S = 0.0297247  R-Sq = 98.2%  R-Sq(adj) = 98.0%

Therefore, the parameter estimates obtained from this abbreviated sample of the original data set are:

θ̂0 = 10^{−1.431} = 0.036; compared to 0.023
θ̂1 = 0.777; compared to 0.830
θ̂2 = 0.455; compared to 0.440

These estimates are seen to compare fairly well with the corresponding results in the original publication.

(ii) A plot of yp, defined as:

yp = log(Sh / Sc^{0.455})

versus log Re, is shown in Fig 16.26, with the regression model expressed as

ŷp = −1.431 + 0.777 log Re

The fit is virtually perfect.

16.27 (i) A scatter plot of the data indicates a quadratic relationship of the type

y = θ0 + θ1 x + θ2 x²

The result of the regression analysis is shown below:


[Figure 16.26: Transformed Gilliland and Sherwood data, yp = log(Sh/Sc^{0.455}), versus log Re (filled circles), and the regression model, ŷp = −1.431 + 0.777 log Re (dashed line), obtained from an abbreviated sample data set.]

Regression Analysis: Peak Power Load versus Temp-F, Temp-F2
The regression equation is
Peak Power Load = 385 - 8.30 Temp-F + 0.0598 Temp-F2

Predictor  Coef      SE Coef   T      P
Constant   385.06    46.87     8.22   0.000
Temp-F     -8.295    1.109     -7.48  0.000
Temp-F2    0.059847  0.006468  9.25   0.000

S = 4.85187  R-Sq = 96.7%  R-Sq(adj) = 96.4%

Unusual Observations
Obs  Temp-F  Peak Power Load  Fit      SE Fit  Residual  St Resid
6    108     189.300          187.243  2.958   2.057     0.53 X
13   97      153.200          143.534  1.199   9.666     2.06 R
22   100     143.600          154.018  1.415   -10.418   -2.24 R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

The resulting model is:

P* = 385.06 − 8.30T + 0.0598T²     (16.58)

The p-values associated with all three parameter estimates indicate that they are all significant; the R² and R²adj values indicate that the regression model captures a substantial amount of the variability contained in the data. A plot


[Figure 16.27: Regression line and 95% confidence interval (dashed line) for peak power load, P*, versus daily high temperature, T °F; fitted line plot of Peak Power Load = 385.1 − 8.295 Temp_F + 0.05985 Temp_F².]

of the data, regression line and 95% confidence interval, shown in Fig 16.27, indicates that the regression model provides a reasonable fit to the data. A normal probability plot of the residuals is shown in Fig 16.28. The p-value of 0.180 associated with the Anderson-Darling normality test indicates that there is no evidence to support rejection of the null hypothesis (that these residuals are approximately normally distributed). The conclusion therefore is that the regression model provides a decent fit to the data.

(ii) The predicted values for T = 65, 85 and 110 are shown below, along with 95% confidence and prediction intervals.

Values of Predictors for New Observations
NewObs  Temp-F  Temp-F2
1       65      4225
2       85      7225
3       110     12100

Predicted Values for New Observations
NewObs  Fit      SE Fit  95% CI              95% PI
1       98.730   2.539   ( 93.521, 103.940)  ( 87.495, 109.966)
2       112.370  1.290   (109.722, 115.017)  (102.068, 122.671)
3       196.746  3.525   (189.514, 203.978)  (184.441, 209.051)XX

XX denotes a point that is an extreme outlier in the predictors.

It is “safe” to use this model to predict peak power load for observation 2, T = 85, because this value is within the range covered by the data on which the regression model is based. However, observations 1 and 3 (T = 65 and T = 110, respectively) fall outside the range of the experimental data.


[Figure 16.28: Normal probability plot and 95% confidence interval for the residuals obtained from the regression of peak power load, P*, versus daily high temperature, T °F (N = 30, AD = 0.512, p-value = 0.180). The p-value of 0.180 implies that the residuals appear to be normally distributed.]

New observation 1, T = 65, falls just to the left of the minimum value of T = 67, while observation 3, T = 110, falls to the right of the maximum experimental value, T = 108. In fact, MINITAB flagged this latter point as “an extreme outlier in the predictors.”

(iii) The required ranges are obtained as follows, along with the width, |ΔP*|, of the corresponding range of P* values:

Temp Range      Peak Power Load Range    |ΔP*|
68 ≤ T1 ≤ 70    97.652 ≤ P1* ≤ 97.724    0.072
83 ≤ T2 ≤ 85    108.851 ≤ P2* ≤ 112.370  3.519
102 ≤ T3 ≤ 104  161.606 ≤ P3* ≤ 169.673  8.067

Note that for identical 2°F temperature ranges, the corresponding ranges of predicted P* values are different, starting out small for low temperature values and increasing with increasing temperature. This is an indication of the sensitivity of P* to changes in temperature at different temperatures: the sensitivity is low at low temperatures and increases as temperature increases.

(iv) One approach to this problem is to determine first the sensitivity function, ∂P*/∂T, at the specified temperatures from the regression equation in Eq (16.58), i.e.,

∂P*/∂T |_{T=Ti} = −8.30 + 0.1196 Ti     (16.59)

for the various specified Ti values. We may then use the fact that:

ΔP* ≈ (∂P*/∂T) ΔT     (16.60)


to determine the required ΔTi. Observe that for any given Ti and the corresponding nominal Pi*, the desired precision of the P* prediction, given as ±2% of P*, translates to a maximum uncertainty range obtained as:

ΔPi* = 0.04 Pi*

Upon using Eq (16.59) to determine ∂P*/∂T, Eq (16.60) is then easily solved to determine ΔTi. The following table summarizes the results:

T    P*       |∂P*/∂T|  |ΔT|
69   97.628   0.0476    82.040
84   110.551  1.746     2.532
103  165.580  4.019     1.648
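The entries of this table follow from Eqs (16.58)-(16.60) by direct substitution; a compact sketch of the arithmetic (in Python, using the full-precision coefficients from the MINITAB output for P* and the rounded sensitivity of Eq (16.59)) is given below.

    def sensitivity_table(temps):
        """Tolerable forecast uncertainty |dT| for +/-2% peak-load precision."""
        rows = []
        for T in temps:
            p_star = 385.06 - 8.295*T + 0.059847*T**2  # Eq (16.58), full coefficients
            dp_dT = -8.30 + 0.1196*T                   # Eq (16.59), the sensitivity
            dP = 0.04 * p_star                         # +/-2% of P* => a 4% total range
            rows.append((T, p_star, abs(dp_dT), dP / abs(dp_dT)))   # Eq (16.60)
        return rows

    for row in sensitivity_table([69, 84, 103]):
        print("T = %d: P* = %.3f, |dP*/dT| = %.4f, |dT| = %.3f" % row)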

The results indicate that at the low temperature, Tl = 69, the sensitivity of P* predictions to uncertainty in temperature T, at 0.0476, is essentially flat. As such, the range of uncertainty in the high temperature forecast that can be tolerated in order to predict peak power to within ±2% of nominal P* is extremely wide. According to the regression model, P* is essentially insensitive to changes in temperature at low temperature values. For the medium temperature, Tm = 84, the increased sensitivity (1.746) dictates that in order to predict P* to within ±2% of the nominal value, only about a 2.5°F uncertainty can be tolerated in the high temperature forecast; for the high temperature, Th = 103, only about a 1.6°F uncertainty can be tolerated in the high temperature forecast, as a result of the much higher sensitivity (4.019).

16.28 A scatter plot of the data does not indicate any particularly strong relationship between protein content and the reflectance measurements. Therefore, we begin with the simplest possible relationship, a linear, two-parameter model of the type:

y = θ0 + θ1 x

The result of the regression analysis for the data set is shown below:

Regression Analysis: Protein Cont, y versus Reflectance, x
The regression equation is
Protein Cont, y = 5.38 + 0.0115 Reflectance, x

Predictor       Coef      SE Coef   T     P
Constant        5.382     2.558     2.10  0.047
Reflectance, x  0.011454  0.006353  1.80  0.085

S = 1.35315  R-Sq = 12.9%  R-Sq(adj) = 8.9%

The p-values associated with each parameter indicate that at the α = 0.05 significance level, the intercept, θ0, is significant, but not the slope, θ1. The R² and R²adj values indicate that the linear regression model provides a very poor explanation of the variability in the data. These observations are confirmed by


[Figure 16.29: Protein content, y, versus reflectance measurements, x, and regression line with 95% confidence (dashed line) and prediction (dotted line) intervals. The indicated model fit is quite poor, and the prediction interval is so wide as to be virtually useless.]

Fig 16.29, which shows that the linear model fit to the data is very poor indeed. The prediction interval is so wide as to be of little use in providing reasonable predictions of protein content for any given reflectance measurement. Upon postulating a quadratic model of the type

y = θ0 + θ1 x + θ2 x²

this time, the result of the regression analysis is shown below:

Regression Analysis: Protein Cont, y vs Reflectance, x, Refl-x2
The regression equation is
Protein Cont, y = 34.0 - 0.126 Reflectance, x + 0.000163 Refl-x2

Predictor       Coef       SE Coef    T      P
Constant        34.01      20.10      1.69   0.105
Reflectance, x  -0.12606   0.09599    -1.31  0.203
Refl-x2         0.0001629  0.0001135  1.44   0.166

S = 1.32166  R-Sq = 20.7%  R-Sq(adj) = 13.1%

From the indicated p-values, no parameter is significant at the α = 0.05 significance level, not even the intercept, θ0; the R² and R²adj values have only improved marginally, and they still indicate that the quadratic model, even with the additional parameter, provides a very poor explanation of the variability contained in the data. Once more, these observations are confirmed by Fig 16.30, which shows that the quadratic model fit to the data is not much better than the linear fit; and the prediction interval is still too wide to provide any useful prediction of protein content from the reflectance measurements.


[Figure 16.30: Protein content, y, versus reflectance measurements, x, and quadratic regression line with 95% confidence (dashed line) and prediction (dotted line) intervals. The indicated quadratic model fit is still quite poor, and the prediction interval is still so wide as to be virtually useless.]

Thus, none of these calibration lines (linear or quadratic) will be useful.

16.29 (i) A scatter plot indicates a linear relationship between rotameter readings and air flow rate; subsequent regression analysis produces the following result:

Regression Analysis: Air Flow, y versus Rotameter, x
The regression equation is
Air Flow, y = - 6.30 + 1.05 Rotameter, x

Predictor     Coef     SE Coef  T      P
Constant      -6.300   6.050    -1.04  0.345
Rotameter, x  1.04571  0.06765  15.46  0.000

S = 7.15893  R-Sq = 98.0%  R-Sq(adj) = 97.5%

The p-values indicate that at the α = 0.05 significance level, the intercept is not significantly different from zero while the slope is; the R² and R²adj values indicate that the linear regression model explains a considerable amount of the variability in the data. The conclusion is therefore that the expression y = 1.05x (leaving out the intercept, which is not significantly different from zero) is a reliable expression to use as a calibration equation.

A MINITAB four-in-one plot of the residuals from this two-parameter linear model fit is shown in Fig 16.31. Although there are only 7 experimental points, there are no indications of any unusual residuals. Isolating the normal probability plot, and including the 95% confidence interval along with a formal AD

[Figure 16.31: MINITAB four-in-one plot of residuals from the linear regression of air flow rate data versus rotameter reading, indicating no unusual residuals.]

normality test, produces the result shown in Fig 16.32. The p-value of 0.268 associated with the formal normality test indicates that at the α = 0.05 significance level, we find no evidence for rejecting the null hypothesis (that the residuals are normally distributed). We conclude from these results, therefore, that the experiments were reasonably well conducted.

(ii) The required plot is shown in Fig 16.33.

(iii) Using the linear regression model obtained in part (i), below is a table of the expected values (along with 95% confidence intervals) for the given rotameter readings.

Predicted Values for New Observations
NewObs  Rotameter, x  Fit    SE Fit  95% CI
1       70.0          66.90  2.79    (59.73, 74.07)
2       75.0          72.13  2.73    (65.12, 79.14)
3       85.0          82.59  2.73    (75.58, 89.60)
4       90.0          87.81  2.79    (80.64, 94.98)
5       95.0          93.04  2.89    (85.61, 100.47)

16.30 (i) The result of the weighted least squares analysis is shown below:

Regression Analysis: ki versus Temp, xi
Weighted analysis using weights in wi
The regression equation is
ki = 37.7 - 0.0107 Temp, xi


[Figure 16.32: Normal probability plot and 95% confidence interval for the residuals obtained from the linear regression of air flow rate data versus rotameter reading (N = 7, AD = 0.395, p-value = 0.268). The p-value of 0.268 implies that the residuals appear to be normally distributed.]

[Figure 16.33: Regression line for air flow rate, y, versus rotameter reading, x, and 95% confidence (dashed line) and prediction (dotted line) intervals. The indicated linear model fit is quite good.]


Predictor  Coef       SE Coef   T      P
Constant   37.7168    0.5012    75.26  0.000
Temp, xi   -0.010700  0.001835  -5.83  0.010

S = 0.963178  R-Sq = 91.9%  R-Sq(adj) = 89.2%

The weighted least-squares estimates are:

θ̂0w = 37.717; θ̂1w = −0.0107

The p-values indicate that both parameters are significantly different from zero at the α = 0.05 significance level; the R² and R²adj values indicate that a decent amount of the information in the data set has been explained by this linear model.

(ii) The result of a standard least-squares regression analysis is shown below:

Regression Analysis: ki versus Temp, xi
The regression equation is
ki = 37.9 - 0.0113 Temp, xi

Predictor  Coef       SE Coef   T      P
Constant   37.8622    0.5168    73.27  0.000
Temp, xi   -0.011257  0.001422  -7.91  0.004

S = 0.547188  R-Sq = 95.4%  R-Sq(adj) = 93.9%

The ordinary least-squares estimates are:

θ̂0 = 37.862; θ̂1 = −0.0113

with the p-values indicating that both parameters are significantly different from zero at the α = 0.05 significance level. The R² and R²adj values are both better than the corresponding values obtained for the weighted regression analysis, and indicate that comparatively more of the information in the data set has been explained by this version of the linear model.

While the two estimates of the slope are hardly distinguishable, the ordinary least squares estimate of the intercept, θ̂0 = 37.862, is slightly higher than the weighted least squares estimate, θ̂0w = 37.717. The most important difference between these two results is in the precision of the estimates. Whereas SE(θ̂0), the standard error associated with the ordinary least squares estimate of the intercept, is 0.517, the corresponding value for the weighted counterpart, 0.501, is slightly better. The reverse is the case with the standard errors of the slopes: SE(θ̂1) = 0.0014 is slightly better than SE(θ̂1w) = 0.0018.

(iii) The plot is shown in Fig 16.34, where we do not see much difference between the two lines. Thus, no one particular regression line provides a clearly better fit to the data. A probability plot of both sets of residuals (along with corresponding 95% confidence intervals) is shown in Fig 16.35. This plot also fails to show any conclusive advantage of one regression fit over the other.
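For completeness, both fits compared above can be reproduced with the closed-form (weighted) least squares solution; a sketch (in Python with NumPy; the xi, ki, and wi columns supplied with the problem are assumed available as arrays) is given below.

    import numpy as np

    def weighted_ls(x, y, w):
        """Closed-form weighted least squares for y = theta0 + theta1*x.

        Minimizes sum_i w_i*(y_i - theta0 - theta1*x_i)^2; with w_i = 1
        this reduces to ordinary least squares.
        """
        X = np.column_stack([np.ones_like(x), x])
        W = np.diag(w)
        XtWX = X.T @ W @ X
        theta = np.linalg.solve(XtWX, X.T @ W @ y)
        # parameter covariance, up to the residual variance factor
        return theta, np.linalg.inv(XtWX)

    # Usage with the problem data (xi, ki, wi not reproduced here):
    # theta_wls, _ = weighted_ls(xi, ki, wi)                 # ~ (37.717, -0.0107)
    # theta_ols, _ = weighted_ls(xi, ki, np.ones_like(xi))   # ~ (37.862, -0.0113)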


[Figure 16.34: Regression lines for the weighted least squares (dash-dotted line) and ordinary least squares (dashed line) fits of thermal conductivity, k, versus temperature, xi, data. Neither regression line provides a clearly better fit.]

[Figure 16.35: Probability plots of the residuals from the least squares fits (weighted least squares residuals: solid line, circles; ordinary least squares residuals: dashed line, squares) of thermal conductivity, k, versus temperature, xi. Both sets of residuals are similar and show no evidence of departure from normality.]

[Figure 16.36: MINITAB four-in-one plot of the residuals obtained from a linear model fit to the heat capacity, Cp, versus temperature data. Note the distinctive quadratic structure evident in the residual versus fit, and residual versus order plots.]

16.31 The result of a linear fit is shown below:

Regression Analysis: Cp versus Temp, K
The regression equation is
Cp = 0.938 + 0.00307 Temp, K

Predictor  Coef        SE Coef     T      P
Constant   0.93837     0.02019     46.47  0.000
Temp, K    0.00307309  0.00008791  34.96  0.000

S = 0.0162103  R-Sq = 98.9%  R-Sq(adj) = 98.8%

The p-values associated with each of the estimated parameters indicate that both parameters are significantly different from zero at the α = 0.05 significance level; the R² and R²adj values indicate that a substantial amount of the information in the data set has been explained by this two-parameter model. However, the residual versus fit, and residual versus order sub-plots in the MINITAB four-in-one plot of the residuals (shown in Fig 16.36) indicate that a distinct quadratic structure remains unexplained by the linear model. Next, a quadratic model fit produces the result shown below:

Regression Analysis: Cp versus Temp, K, Temp2
The regression equation is
Cp = 1.32 - 0.000512 Temp, K + 0.000008 Temp2

[Figure 16.37: MINITAB four-in-one plot of the residuals obtained from a quadratic model fit to the heat capacity, Cp, versus temperature data. While the quadratic structure evident in the residuals of the linear model fit is gone, a bit of a pattern still remains in the residual versus fit, and residual versus order plots.]

Predictor  Coef        SE Coef     T       P
Constant   1.32479     0.01318     100.53  0.000
Temp, K    -0.0005123  0.0001205   -4.25   0.001
Temp2      0.00000797  0.00000027  29.87   0.000

S = 0.00201595  R-Sq = 100.0%  R-Sq(adj) = 100.0%

Unusual Observations
Obs  Temp, K  Cp       Fit      SE Fit   Residual  St Resid
14   280      1.80100  1.80600  0.00082  -0.00500  -2.72 R

R denotes an observation with a large standardized residual.

The p-values associated with the estimated parameters indicate that all three parameters are significantly different from zero at the α = 0.05 significance level; the R² and R²adj values indicate that the quadratic model has captured virtually all the information in the data set. Yet, the MINITAB four-in-one plot of the residuals in Fig 16.37 shows a systematic pattern in the residual versus fit, and residual versus order plots. But the quadratic structure evident in the residuals of the linear model fit is now gone. Finally, the result of a cubic model fit is shown below:

Regression Analysis: Cp versus Temp, K, Temp2, Temp3
The regression equation is
Cp = 1.32 - 0.00045 Temp, K + 0.000008 Temp2 + 0.000000 Temp3

Predictor   Coef        SE Coef     T      P
Constant    1.32029     0.07466     17.69  0.000
Temp, K     -0.000449   0.001039    -0.43  0.673
Temp2       0.00000768  0.00000471  1.63   0.129
Temp3       0.00000000  0.00000001  0.06   0.952

S = 0.00209794   R-Sq = 100.0%   R-Sq(adj) = 100.0%

The following points are worth noting with this result:
(a) The parameter estimates associated with the constant, linear, and quadratic terms have not changed by much, compared to the values obtained for the quadratic model.
(b) However, the p-values associated with the estimated parameters indicate that, with the exception of the constant term, this time none of the three parameters is significantly different from zero at the α = 0.05 significance level.
(c) The R² and R²(adj) values remain unchanged (it is impossible to explain better than 100% of the information contained in the data).
(d) Although not shown, the MINITAB four-in-one plot of the residuals is identical to that shown in Fig 16.37, with the systematic pattern persisting.

The implication is that the addition of the cubic term does nothing substantial for the model in terms of capturing more information contained in the data; worse, the three non-constant coefficients are no longer significant. We may now conclude that, as long as we restrict our choice to polynomial models, it is impossible to do better than a quadratic model for this data set; the persistent pattern in the residuals cannot be explained away by an additional polynomial term. Fig 16.38 shows the fit of the quadratic model

Cp = 1.32 − 0.000512T + 0.000008T²

to the data, along with 95% confidence and prediction intervals. The confidence and prediction intervals are so tight because there is very little variability left unexplained by the quadratic model.

16.32 A logarithmic transformation yields

ln(T − T0) = ln(AK/τ) − t/τ

a linear, two-parameter model of the form ln y = θ0 + θ1 t, relating ln(T − T0) to time, t. A subsequent regression analysis yields the following result:

Regression Analysis: ln(T-T0) versus C1
The regression equation is
ln(T-T0) = 1.40 - 0.205 C1


Figure 16.38: Regression line of a quadratic fit to heat capacity, Cp , versus temperature data, along with 95% confidence (dashed line) and prediction (dotted line) intervals. The indicated model fit is near perfect; both the confidence and prediction intervals are tight and essentially indistinguishable from the regression line because the magnitude of the remaining residual structure is virtually inconsequential.

Predictor  Coef       SE Coef   T       P
Constant   1.39598    0.03751   37.22   0.000
T          -0.204504  0.005202  -39.31  0.000

S = 0.0695670   R-Sq = 99.7%   R-Sq(adj) = 99.6%

The p-values indicate that both estimated parameters are significant, and the R² and R²(adj) values imply that the fitted model provides an excellent explanation of the information contained in the data set. The estimated parameters, θ̂0 = 1.40 and θ̂1 = −0.205, imply that, in terms of the original process parameters,

θ̂0 = ln(AK/τ) = 1.40
θ̂1 = −1/τ = −0.205

These equations are easily solved to produce the following estimates of the process parameters:

K̂ = 1.975;  τ̂ = 4.890

which are very close to the initial “guesstimates” K ≈ 2 and τ ≈ 5.
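As a quick check of this estimation route, the sketch below regenerates the calculation in Python on synthetic step-response data. The values of T0, A, the sampling times, and the noise level are assumptions made purely for illustration; they are not the textbook data.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the problem's step-response data:
# T(t) = T0 + (A*K/tau) * exp(-t/tau), with multiplicative noise.
T0, A = 25.0, 10.0                      # assumed offset and step size
t = np.arange(1.0, 11.0)                # assumed sampling times
T = T0 + np.exp(1.40) * np.exp(-0.205 * t) * rng.lognormal(0.0, 0.02, t.size)

# Linearize: ln(T - T0) = theta0 + theta1 * t, then fit by least squares.
y = np.log(T - T0)
theta1, theta0 = np.polyfit(t, y, 1)    # polyfit returns slope first

# Back out the process parameters: theta1 = -1/tau, theta0 = ln(A*K/tau).
tau = -1.0 / theta1
K = tau * np.exp(theta0) / A
print(f"theta0 = {theta0:.3f}, theta1 = {theta1:.3f}")
print(f"tau = {tau:.3f}, K = {K:.3f}")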

16.33 Antoine's equation,

P^vap = exp{θ0 − θ1/(T + θ3)}     (16.61)

may be “linearized” by taking natural logarithms and rearranging to obtain:

ln P^vap = (θ0 − θ1/θ3) + (θ0/θ3) T + (−1/θ3) T ln P^vap     (16.62)

which is certainly in the form:

y = β0 + β1 x1 + β2 x2     (16.63)

which may be used to carry out linear regression, but for which, clearly, x1 and x2 (and also y and x2) are not independent. Thus, one must be careful in interpreting the results of such a regression analysis; these results are best considered as approximate “starting values” for the original equation's unknown parameters, θ0, θ1 and θ3. Upon adequately transforming the supplied data, and carrying out the indicated regression, one obtains the following results:

Regression Analysis: lnPvap versus T, TlnP
The regression equation is
lnPvap = 1.90 + 0.0729 T - 0.00453 TlnP

Predictor  Coef         SE Coef     T        P
Constant   1.90191      0.00289     657.39   0.000
T          0.0728873    0.0001799   405.24   0.000
TlnP       -0.00453029  0.00002458  -184.34  0.000

S = 0.00441504   R-Sq = 100.0%   R-Sq(adj) = 100.0%

Observe the “unnatural” values for R² and R²(adj), indicative of correlation among the regression variables. The p-values associated with the estimates of the “transformed” model parameters indicate that they are all significant, and the resulting standardized residuals, shown in Fig 16.39, appear reasonably “normal.” From the relationships between the transformed model parameters and the original parameters, i.e.,

β0 = θ0 − θ1/θ3;  β1 = θ0/θ3;  β2 = −1/θ3     (16.64)

we obtain the following inverse relationships:

θ0 = −β1/β2
θ1 = (−1/β2)(−β1/β2 − β0) = β1/β2² + β0/β2     (16.65)
θ3 = −1/β2

from which we may determine the values of the original parameters corresponding to the transformed parameters estimated as β̂0 = 1.9019; β̂1 = 0.0729; β̂2 = −0.00453. The result is:

θ̂0 = 16.089;  θ̂1 = 3131.6;  θ̂3 = 220.74     (16.66)
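The inversion in Eq (16.65) is easy to mis-transcribe, so a small Python sketch may help; it simply evaluates the formulas above using the regression estimates (copied from the MINITAB output), and then defines the resulting Antoine predictor of Eq (16.67).

import numpy as np

# Coefficients of the linearized model, from the regression output above.
b0, b1, b2 = 1.90191, 0.0728873, -0.00453029

# Invert Eq (16.65) to recover the Antoine parameters.
theta0 = -b1 / b2                      # ~ 16.089
theta3 = -1.0 / b2                     # ~ 220.74
theta1 = b1 / b2**2 + b0 / b2          # ~ 3131.6
print(theta0, theta1, theta3)

# Predicted vapor pressure at temperature T (deg C), Eq (16.67).
def p_vap(T):
    return np.exp(theta0 - theta1 / (T + theta3))

print(p_vap(np.array([0.0, 25.0, 50.0])))   # ~ 6.70, 28.4, 92.0 mm Hg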


Figure 16.39: Standardized residuals from the “linearized” Antoine's equation model fit.

How good are these estimates? One way to answer this question is to introduce the estimated values into the original Antoine's equation and use the resulting model to obtain P^vap_pred, the predicted vapor pressures corresponding to the given temperatures, i.e.,

P^vap_pred = exp{16.089 − 3131.6/(T + 220.74)}     (16.67)

A comparison of the model predictions against experimental vapor pressure data should provide an objective evaluation of how good these estimates are. The result of such an exercise is shown in the following table:

P^vap (mm Hg)   P^vap_pred
Data            Prediction
5               5.02
10              9.99
20              19.96
40              39.99
60              59.88
100             99.76
200             200.20
400             401.37
760             763.32
1520            1514.60

indicating excellent agreement between the prediction and actual data. A fitted line plot of data versus prediction is shown in Fig 16.40. Note that, theoretically, the slope should be 1 and the intercept 0; the corresponding actual values obtained for the slope and intercept (1.002 and −0.6143, respectively) are not too far off from these theoretical values, and the fit indicated in the plot is virtually perfect.


Figure 16.40: Comparison of the Antoine's equation model fit (using the approximate parameter estimates obtained via model transformation) to toluene vapor pressure data. “exp-z” is the model prediction computed from Eq (16.67).

To determine the required expected values of P^vap_pred evaluated at the given temperatures, 0, 25, 50, 75, 100, and 125 (°C), first determine the corresponding P^vap_pred values from Eq (16.67), and then obtain the corresponding values for

x2 = T ln P^vap_pred     (16.68)

from where the linearized equation, Eq (16.62), may then be used, in conjunction with the estimated values for the βs, to determine the 95% confidence intervals for the predicted ŷ = ln P^vap_pred. The result of such an exercise, using MINITAB, is summarized as follows:

T     P^vap_pred   x2 = T ln P^vap_pred   ŷ       SE fit   95% CI for ŷ
0     6.699        0.000                  1.902   0.0029   (1.895, 1.909)
25    28.367       83.631                 3.345   0.0018   (3.341, 3.349)
50    92.018       226.099                4.522   0.0020   (4.517, 4.527)
75    244.639      412.484                5.500   0.0020   (5.495, 5.505)
100   558.445      632.516                6.325   0.0020   (6.320, 6.330)
125   1131.349     878.896                7.031   0.0029   (7.024, 7.038)

And now, to determine, from the logarithmically transformed version shown above, the expected values for the vapor pressure along with the corresponding 95% confidence intervals, we simply observe that if the 95% CI for the transformed variable is represented as (ŷL, ŷR), where ŷ = ln P^vap_pred, then the corresponding approximate 95% CI for the original untransformed variable, P^vap_pred, will be (e^ŷL, e^ŷR). The result, obtained from the table given above, is as follows:

T     P^vap_pred   95% CI for e^ŷ = P^vap_pred
0     6.699        (6.653, 6.745)
25    28.367       (28.248, 28.486)
50    92.018       (91.576, 92.461)
75    244.639      (243.473, 245.810)
100   558.445      (555.772, 561.132)
125   1131.349     (1123.521, 1139.231)

16.34 (i) The result of a linear two-parameter model fit to the data is shown below:

Regression Analysis: Height versus “Wingspan”
The regression equation is
Height = 42.6 + 0.725 "Wingspan"

Predictor    Coef     SE Coef  T      P
Constant     42.608   9.421    4.52   0.000
"Wingspan"   0.72521  0.05335  13.59  0.000

S = 3.89986   R-Sq = 84.5%   R-Sq(adj) = 84.0%

Unusual Observations
Obs  "Wingspan"  Height   Fit      SE Fit  Residual  St Resid
33   177         180.000  170.969  0.651   9.031     2.35R

R denotes an observation with a large standardized residual.

The p-values indicate that the two parameters are significant, and the R² and R²(adj) values, although not spectacular, indicate that a reasonable amount of the information contained in the data set has been explained by the linear model. However, the residual plots shown in Fig 16.41, especially the histogram, indicate a distribution of residuals that is somewhat positively skewed, i.e., there are more positive residuals than negative ones, so that the model is more likely to “underestimate” the height than not. A formal normality test of the residuals is shown in the probability plot of Fig 16.42; the associated p-value, 0.047, indicates that at the α = 0.05 significance level, we must reject the null hypothesis that the residuals are normally distributed. This is most likely due to the skewness.

Finally, Fig 16.43 shows a plot of the regression model along with 95% confidence and prediction intervals. The observation marked with a diamond-enclosed cross is unusual: it has an unusually low “wingspan” value associated with an otherwise high height value. This is also the only data point that lies outside the 95% prediction interval. As discussed above, the significance of the model parameters and the R² and R²(adj) values indicate that height is fairly predictable from “wingspan”; however, the residual analysis indicates that the model is more likely to underestimate the height.
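For readers without MINITAB, a hedged Python sketch of the same workflow is given below. The (wingspan, height) sample here is simulated, since the original data set is not reproduced in this manual, so only the procedure, not the exact numbers, should be expected to match.

import numpy as np
from scipy import stats

# Hypothetical (wingspan, height) sample in cm, for illustration only.
rng = np.random.default_rng(7)
w = rng.normal(176, 12, 36)
h = 42.6 + 0.725 * w + rng.normal(0, 3.9, 36)

# Ordinary least-squares fit of Height on "Wingspan".
res = stats.linregress(w, h)
print(f"intercept = {res.intercept:.1f}, slope = {res.slope:.3f}")

# Residual diagnostics: skewness, plus an Anderson-Darling normality check.
resid = h - (res.intercept + res.slope * w)
print("skewness:", stats.skew(resid))
ad = stats.anderson(resid, dist='norm')
print("AD statistic:", ad.statistic)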


Figure 16.41: MINITAB four-in-one plots of the residuals obtained from a linear model fit to the height versus “wingspan” data. The histogram indicates a skewed distribution.


Figure 16.42: Probability plot of the residuals obtained from a linear model fit to the height versus “wingspan” data. The p-value of 0.047 associated with the formal normality test indicates that at the α = 0.05 significance level, we reject the normality hypothesis.


Figure 16.43: Height versus “wingspan” data, linear regression line, and 95% confidence (dashed line) and prediction (dotted line) intervals. The diamond-enclosed cross indicates an observation with an unusually low “wingspan” value associated with an otherwise high height value.

(ii) Since there is no particular fundamental reason for choosing either height or “wingspan” as the predictor, for this part of the problem, we choose “wingspan” as the response, y, and height as the predictor, x, and repeat the regression analysis. The result (which requires some discussion) is shown below:

Regression Analysis: “Wingspan” versus Height
The regression equation is
"Wingspan" = - 22.2 + 1.16 Height

Predictor  Coef     SE Coef  T      P
Constant   -22.25   14.62    -1.52  0.137
Height     1.16464  0.08567  13.59  0.000

S = 4.94212   R-Sq = 84.5%   R-Sq(adj) = 84.0%

Unusual Observations
Obs  Height  "Wingspan"  Fit      SE Fit  Residual  St Resid
33   180     177.000     187.390  1.166   -10.390   -2.16R

R denotes an observation with a large standardized residual.

Before proceeding to use this result, we note the following:

1. The resulting equation, "Wingspan" = - 22.2 + 1.16 Height, is not exactly the same as one would obtain had one simply “inverted” the regression equation obtained earlier. That equation, Height = 42.6 + 0.725 "Wingspan", upon inversion would have yielded (using h for height and w for “wingspan”): w = −58.76 + 1.379h.

2. Furthermore, the p-value of 0.137 associated with the constant term suggests that this parameter is now no longer significantly different from zero.

3. The value obtained for the root mean square error, s = 4.942, is higher than the value of 3.90 obtained for the first regression; but the R² and R²(adj) values are identical for both regressions.

The reason for these observations is that neither variable is error-free, whereas regression technically requires that the “predictor” (the x variable) be strictly deterministic. Thus, missing from these two regression equations are the errors associated with the variable that has been designated the “predictor.” Additionally, σh², the variance associated with the height measurements (easily computed as 95.08), is about 1.6 times smaller than the variance associated with the “wingspan” measurements, determined as σw² = 152.96. Thus, the variability associated with both variables and, in particular, the more significant variability associated with “wingspan” as the response variable, have jointly contributed to the increased uncertainty associated with the estimate of the constant term, and also to the inflated value obtained for s. The R² and R²(adj) values are unchanged because the amount of information explained by the least-squares straight line through a data set remains unchanged regardless of the orientation of the x and y variables.

If this new model is now used for estimating “wingspans” from the given values for height (technically, the insignificant constant term should first be removed and the regression repeated without it), the result is shown below:

Predicted Values for New Observations
NewObs  Height  Fit      SE Fit  95% CI
1       156.5   160.021  1.446   (157.082, 162.960)
2       187.0   195.543  1.645   (192.199, 198.886)

16.35 A logarithmic transformation of Kleiber's law yields

log Q = log θ0 + θ1 log M

A regression analysis of the logarithmically transformed sample data produces the following result:

Regression Analysis: Log10Q versus Log10M-Kg
The regression equation is
Log10Q = 0.454 + 0.685 Log10M-Kg


Figure 16.44: Log-log plot of a sample from Heusner’s data (on basal metabolism rate (BMR) values, Q0 (in Watts), and body mass, M ), the linear regression line, and 95% confidence (dashed line) and prediction (dotted line) intervals.

Predictor  Coef     SE Coef  T      P
Constant   0.45361  0.04834  9.38   0.000
Log10M-Kg  0.68526  0.03055  22.43  0.000

S = 0.210248   R-Sq = 96.5%   R-Sq(adj) = 96.4%

Unusual Observations
Obs  Log10M-Kg  Log10Q  Fit     SE Fit  Residual  St Resid
20   3.56       3.3686  2.8965  0.1084  0.4721    2.62R

R denotes an observation with a large standardized residual.

Fig 16.44 shows a plot of the data and the regression line along with 95% confidence and prediction intervals. With n = 20 and 2 estimated parameters, there are ν = 18 degrees of freedom; and since t0.025(18) = 2.101, the 95% confidence interval estimates are therefore obtained as follows:

log θ0 = 0.454 ± 2.101 × 0.048 = 0.454 ± 0.102
θ1 = 0.685 ± 2.101 × 0.031 = 0.685 ± 0.064

We now observe that the theoretical value of θ1 = 2/3 = 0.666 is very close to the point estimate, θ̂1, obtained above, and lies well within the 95% confidence interval (0.621, 0.749).
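The confidence-interval arithmetic above is easily reproduced; the short sketch below assumes only the slope estimate, its standard error, and ν = 18 from the regression output.

from scipy import stats

# Slope estimate and standard error from the regression output above.
theta1_hat, se = 0.68526, 0.03055
nu = 18                                   # n - 2 degrees of freedom
t_crit = stats.t.ppf(0.975, nu)           # = 2.101

lo, hi = theta1_hat - t_crit * se, theta1_hat + t_crit * se
print(f"95% CI for theta1: ({lo:.3f}, {hi:.3f})")   # ~ (0.621, 0.749)
print("contains 2/3?", lo < 2/3 < hi)                # True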

Chapter 17

Exercises

17.1 A normal probability plot (and 95% confidence interval) of the residuals from the least-squares regression is shown in Fig 17.1. The p-value of 0.314 associated with the Anderson-Darling test indicates that the normality assumption concerning the residual errors appears to be valid. However, note that this validity refers only to the normality of the residuals, not necessarily to the adequacy of the regression model in representing the information contained in the data.


Figure 17.1: Normal probability plot and 95% confidence interval for the residuals from a linear calibration curve of protein content in wheat versus reflectance.

17.2 A standard MINITAB “four-in-one” plot of the residuals resulting from the regression is shown in Fig 17.2. Although there are only 8 data points, the plots show nothing out of the ordinary. A normal probability plot and 95% confidence interval for these residuals is shown in Fig 17.3; the associated p-value of 0.922 indicates that the normality assumption for the regression model errors appears sufficiently adequate.


Figure 17.2: MINITAB “four-in-one” plot of residuals from the regression model.


Figure 17.3: Normal probability plot and 95% confidence interval for the residuals from a linear regression model of thermal conductivity k as a function of temperature T .

17.3 Fig 17.4 shows both probability plots: at the top, when the mean is specified as β = 4; below, when the mean is estimated from the data as β̂ = 3.912. There is very little difference between the two plots. Furthermore, the associated p-values indicate that the exponential distributional assumption appears valid in both cases.


Figure 17.4: Exponential probability plots (and 95% confidence intervals). Top panel: for the E(4) distribution; bottom panel: when β is estimated from data as 3.912. Specifying the population parameter independently makes no difference in this particular case.

We conclude therefore that, in this particular case, knowing the population parameter independently does not make any difference.

17.4 Note: in MINITAB, the lognormal parameter α is referred to as the “location” parameter and β as the “scale” parameter. On the other hand, in the text, we refer to α as the “scale” parameter, and β as the “shape” parameter.

(i) The lognormal probability plots (and 95% confidence intervals) shown in Fig 17.5, and the indicated p-values associated with the Anderson-Darling tests, imply that the postulated lognormal model appears sufficiently adequate for the XL1 and XL2 data.

(ii) By definition, if X ∼ L(α, β²), then ln X ∼ N(α, β²). Thus, a natural logarithmic transformation of the XL1 and XL2 data should produce normally distributed data with the indicated parameters. A normal probability plot of the logarithmically transformed data is shown (along with 95% confidence intervals) in Fig 17.6, where we see that the Anderson-Darling test statistics as well as the associated p-values are identical to the corresponding values obtained from the earlier tests carried out directly on the original data sets.


Figure 17.5: Lognormal probability plots and 95% confidence intervals for XL1 and XL2 data. The p-values indicate that the postulated models appear sufficiently adequate.


Figure 17.6: Normal probability plots and 95% confidence intervals for log-transformed XL1 and XL2 data indicating that the transformed data sets appear to follow the postulated normal distributions. Note: the values of the AD test statistics and the associated p-values are identical to the corresponding values in Fig 17.5.

Thus, the conclusion is the same as in (i), i.e., that the postulated distributions appear to be sufficiently adequate.

17.5 (i) Let Y = X1 X2, where X1 ∼ L(α1, β1²) and X2 ∼ L(α2, β2²). Then,

ln Y = ln X1 + ln X2

Also, from the properties of the lognormal distribution, we know that ln X1 ∼ N(α1, β1²) and ln X2 ∼ N(α2, β2²). We may now apply the characteristic function method for determining the distribution of random variable sums to obtain the following result:

φ_lnY(t) = φ_lnX1(t) φ_lnX2(t)
         = exp{jα1 t − (1/2)β1²t²} exp{jα2 t − (1/2)β2²t²}
         = exp{j(α1 + α2)t − (1/2)(β1² + β2²)t²}


Figure 17.7: Lognormal probability plot and 95% confidence interval for Y = X1 X2 data postulated as Y ∼ L(0.75, 0.3542 ).

Therefore, if we let

αy = α1 + α2     (17.1)
βy² = β1² + β2²     (17.2)

then ln Y ∼ N(αy, βy²), so that Y ∼ L(αy, βy²), as required.

(ii) If X1 ∼ L(0.5, 0.25²) and X2 ∼ L(0.25, 0.25²), then according to Eqs (17.1) and (17.2), we expect Y ∼ L(0.75, 0.354²). Upon generating the product Y = X1 X2 from the supplied data, we obtain the lognormal probability plot (and 95% confidence interval) shown in Fig 17.7. We may now note the following results: first, αy is estimated as 0.7628, and βy as 0.4031 (compared with the theoretical postulates of 0.75 and 0.354 respectively); next, the p-value associated with the AD test that the data set follows a lognormal distribution is given as 0.851. Taken together, these results indicate that the postulated model, Y ∼ L(0.75, 0.354²), is sufficiently adequate.

17.6 (i) Note that in MINITAB, the probability plot options do not include the inverse gamma distribution. Nevertheless, from the fact that if Y is a gamma random variable, then XI = 1/Y is an inverse gamma random variable, we are able to test the validity of the inverse gamma model with a gamma probability plot for 1/XI, the inverse of the inverse gamma data. The probability plots (and 95% confidence intervals) shown in Fig 17.8, and the associated p-values, all suggest that the probability distributions postulated for each of the four variables appear sufficiently adequate.

(ii) Using only the top half of each data set (so that n = 10 in each case), the resulting probability plots (not shown) indicate no change in the validity of the postulated distributions.


Figure 17.8: Normal probability plots (and 95% confidence intervals) for XN, XL, XG, and XI data (n = 20). The plot for XI is a gamma probability plot for the inverse, 1/XI.

Perhaps the only noteworthy change is in the p-value associated with the AD test for the XL data, which is reduced to 0.12 from 0.43; but even this change is not sufficient to alter the conclusion that the model is sufficiently adequate.

17.7 The normal probability plots and 95% confidence intervals for the X and Y data sets, along with the results of the AD tests, are shown in Fig 17.9. The p-values (0.417 and 0.1) indicate that the postulated models are sufficiently adequate in both cases.
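As a minimal illustration of how such Anderson-Darling checks can be run outside MINITAB, the sketch below applies scipy's AD test to a simulated normal sample standing in for the X data (the sample itself is an assumption); note that scipy reports critical values at fixed significance levels rather than the p-values MINITAB prints.

import numpy as np
from scipy import stats

# Hypothetical sample standing in for the X data of Problem 17.7.
rng = np.random.default_rng(0)
x = rng.normal(9.5, 2.5, 15)

result = stats.anderson(x, dist='norm')
print("AD statistic:", result.statistic)
# Compare the statistic against the tabulated critical values directly.
for cv, sl in zip(result.critical_values, result.significance_level):
    print(f"reject normality at {sl}% level: {result.statistic > cv}")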

APPLICATION PROBLEMS

17.8 The result of a “Goodness-of-fit test for Poisson” (obtained with the following sequence in MINITAB: Stat > Basic Statistics > Goodness of Fit Test for Poisson) is shown in the table below for the Period I data.


Figure 17.9: Normal probability plots and 95% confidence intervals for X data (solid line, filled circles), and Y data (dashed line, squares).

Goodness-of-Fit Test for Poisson Distribution
Data column: P1
Poisson mean for P1 = 6

P1    Observed  Poisson Probability  Expected  Contribution to Chi-Sq
<=3   4         0.151204             3.02408   0.31495
4     1         0.133853             2.67705   1.05060
5     5         0.160623             3.21246   0.99465
6     3         0.160623             3.21246   0.01405
7     1         0.137677             2.75354   1.11671
8     2         0.103258             2.06515   0.00206
>=9   4         0.152763             3.05525   0.29214

N   N*  DF  Chi-Sq   P-Value
20  0   5   3.78515  0.581

7 cell(s) (100.00%) with expected value(s) less than 5.

With an indicated p-value of 0.581 associated with the chi-squared test, we conclude that the Poisson model postulated for the Period I data appears sufficiently adequate. The corresponding result for the Period II data is shown below:

Goodness-of-Fit Test for Poisson Distribution
Data column: P2
Poisson mean for P2 = 2.9

P2    Observed  Poisson Probability  Expected  Contribution to Chi-Sq
<=1   5         0.214590             4.29181   0.11686
2     5         0.231373             4.62745   0.02999
3     2         0.223660             4.47320   1.36742
>=4   8         0.330377             6.60753   0.29345

N   N*  DF  Chi-Sq   P-Value
20  0   2   1.80772  0.405

3 cell(s) (75.00%) with expected value(s) less than 5.

This time, the associated p-value is 0.405, indicating that the Poisson model postulated for the Period II data also appears sufficiently adequate.

17.9 A frequency distribution for each period is shown in the following table, from which the histograms in Fig 17.10 are obtained.

x      ϕI(x)  ϕII(x)
0      0      1
1      0      4
2      1      5
3      3      2
4      1      6
5      5      0
6      3      0
7      1      2
8      2      0
9      1      0
10     3      0
TOTAL  20     20

To compare these two empirical distributions against each other using a chi-squared test, it is essential to reconstitute the frequency tables, consolidating the observed frequencies to reduce the number of groups with fewer than 5 entries. One such consolidation is shown below:

x Range  ϕE_I(x)  ϕE_II(x)
0-3      4        12
4-5      6        6
6-8      6        2
≥9       4        0
TOTAL    20       20

The hypotheses to be tested may then be stated as follows:

H0: ϕE_I(x) = ϕE_II(x)
Ha: ϕE_I(x) ≠ ϕE_II(x)


Figure 17.10: Histograms for the raw number of accidents occurring in Period I (P1) and Period II (P2), presented using the same X- and Y -axes scales. Note how very different the two histograms are.

Now, since there is no “theoretical” distribution to which an “empirical” one will be compared, we choose to define the chi-squared test statistic as:

C² = Σ_{i=1}^{4} [ϕE_I(xi) − ϕE_II(xi)]² / ϕE_I(xi)     (17.3)

with the normalization based on ϕE_I(xi). H0 will be rejected at the α significance level if C² > χ²α(ν), or else on the basis of the p-value determined as p = P(C² ≥ c²(x)), using a χ²(ν) distribution, where c²(x) is the specific value computed for the chi-squared test statistic. In this specific case,

c² = 8²/4 + 0²/6 + 4²/6 + 4²/4 = 22.67

Since we have not used a theoretical distribution, so that we have not estimated any parameter from the data, ν = 4 − 1 = 3. The associated p-value is obtained as follows (see Fig 17.11):

P(C² ≥ 22.67 | C² ∼ χ²(3)) = 0.00005

Thus, at the α = 0.05 significance level, we must reject the null hypothesis and conclude that the two empirical distributions are different.
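The computation of c² and its p-value is a one-liner in most environments; a minimal Python sketch using the consolidated frequencies above is given below.

from scipy import stats

# Consolidated frequencies from the table above.
phi_I  = [4, 6, 6, 4]    # Period I (used for normalization)
phi_II = [12, 6, 2, 0]   # Period II

c2 = sum((a - b) ** 2 / a for a, b in zip(phi_I, phi_II))
p = stats.chi2.sf(c2, df=len(phi_I) - 1)
print(f"C^2 = {c2:.2f}, p = {p:.6f}")   # C^2 = 22.67, p ~ 5e-5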


Figure 17.11: P (C 2 ≥ 22.67) for the χ2 (3) distribution. The p-value of 0.00005 indicates that the null hypothesis being tested must be rejected at the α = 0.05 significance level. 17.10 From the supplied data, the sample average and variance are obtained as x ¯ = 54.75 and s2 = 705.28, so that for the gamma random variable, 54.75

= αβ

705.28

= αβ 2

Upon solving these two equations simultaneously for the two unknown parameters (or by simply recalling the solution to Problem 9.40), the parameters for the gamma distribution postulated to represent this data set are estimated as α ˆ = 4.25 βˆ = 12.88 Now, from the theoretical γ(4.25, 12.88) distribution with the pdf f (x) =

1 12.884.25 Γ(4.25)

e−x/12.88 x3.25

we may now obtain ϕ(xi ), the predicted relative frequency values for each value of the supplied inter-origin distances xi , as ϕ(xi ) = f (xi )∆x where, for this data set, ∆x = 15. The result is shown in the table below along with corresponding relative frequency data. The column, C 2 (xi ), contains individual contributions to the chi-squared statistic so that C2 =

n ∑ i=1

C 2 (xi ) = 0.156315

11

Scatterplot of Relative Frequency, fxGamma vs Distance (kb) 0.35

Variable Relativ e Frequency fxGamma

0.30 0.25

f(x)

0.20 0.15 0.10 0.05 0.00 0

20

40

60 80 100 120 x, Distance (kb)

140

160

180

Figure 17.12: Frequency data on distances between DNA replication origins (solid line, circles), f o (xi ), versus theoretical gamma γ(4.25, 12.88) distribution fit, ϕ(xi ).

Distance (kb)  Observed Relative   Theoretical Relative  Contributions to C²
xi             Frequency f°(xi)    Frequency ϕ(xi)       C²(xi)
15             0.02                0.071970              0.0375275
30             0.20                0.213658              0.0008731
45             0.32                0.249022              0.0202307
60             0.16                0.197930              0.0072685
75             0.08                0.127553              0.0177282
90             0.11                0.071987              0.0200734
105            0.03                0.037072              0.0013492
120            0.02                0.017854              0.0002578
135            0.01                0.008170              0.0004100
150            0.00                0.003590              0.0035905
165            0.01                0.001527              0.0470062
Total          0.96                1.00033               0.156315

First, a plot of the observed empirical frequency distribution, f°(xi), compared to the theoretical gamma γ(4.25, 12.88) distribution fit, ϕ(xi), is shown in Fig 17.12. Now, because there are n = 11 data points (eliminating the uninformative and singularity-inducing value for x = 0), and since we have estimated two theoretical population parameters from the data, the reference distribution for the chi-squared statistic is χ²(ν), where ν = 11 − 2 − 1 = 8. The p-value associated with the computed C² = 0.1563 is obtained as (see Fig 17.13):

P(C² ≥ 0.156315 | C² ∼ χ²(8)) = 1.000


Figure 17.13: P(C² ≥ 0.156315) for the χ²(8) distribution. The value of 1.000 indicates strong evidence that the postulated gamma model is adequate.

As such, the gamma model appears to be quite adequate: the value of the chi-squared statistic is remarkably low, and the associated p-value is 1.

17.11 Fig 17.14 shows exponential probability plots (and 95% confidence intervals) for each data set. The p-values associated with the AD tests indicate that the postulated exponential models appear sufficiently adequate.

17.12 To validate this model formally, we first need to determine estimates for the postulated logarithmic series pdf, which will be used to obtain the theoretical predicted frequencies (see Problem 8.36) which, in turn, will subsequently be formally tested against the observed frequency. By definition, the data average is obtained as:

x̄ = Σi xi Φ(xi) / Σi Φ(xi) = 3306/501 = 6.599

From the result in Exercise 8.13, we know that for this random variable, µ = E(X) = αp/(1 − p), where

α = −1/ln(1 − p)

Thus, given x̄ = 6.599 as an estimate of µ, we must now solve the following nonlinear equation numerically for p:

6.599 = −p / [(1 − p) ln(1 − p)]
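A sketch of the numerical solution is given below, here with scipy's brentq root finder; the bracketing interval is an assumption chosen to comfortably contain the root.

import numpy as np
from scipy.optimize import brentq

xbar = 6.599

# E(X) for the logarithmic series distribution, as a function of p.
def mean_logseries(p):
    return -p / ((1.0 - p) * np.log(1.0 - p))

# The mean is increasing in p, so a wide bracket suffices.
p_hat = brentq(lambda p: mean_logseries(p) - xbar, 0.5, 0.999999)
print(f"p_hat = {p_hat:.3f}")   # ~ 0.953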


Figure 17.14: Exponential probability plots and 95% confidence intervals for operators A, B, and C. The p-values indicate that the postulated exponential models appear sufficiently adequate.

The result is p = 0.953. Upon introducing this value into the logarithmic series pdf,

f(x) = αp^x / x

the resulting predicted frequency, obtained as

Φ̂(x) = 501 f(x)

is shown in the following table, along with the observed frequency and C²(x), the individual contributions to the chi-squared statistic.

No of species  Observed        Predicted        Contributions to C²
x              Frequency Φ(x)  Frequency Φ̂(x)   C²(x)
1              118             156.152          12.3357
2              74              74.407           0.0022
3              44              47.273           0.2435
4              24              33.788           3.9922
5              29              25.760           0.3619
6              22              20.458           0.1081
7              20              16.711           0.5408
8              19              13.935           1.3502
9              20              11.805           3.3583
10             15              10.125           1.5845
11             12              8.772            0.8685
12             14              7.663            2.8685
13             6               6.741            0.0915
14             12              5.965            3.0349
15             6               5.306            0.0803
16             9               4.740            2.0159
17             9               4.252            2.5049
18             6               3.827            0.7870
19             10              3.455            4.2835
20             10              3.128            4.7223
21             11              2.839            6.0545
22             5               2.583            1.1686
23             3               2.354            0.1390
24             3               2.150            0.2407
TOTAL          501             474.190          100.232

Fig 17.15 shows a plot of the observed frequency and the model prediction. While there appears to be a qualitatively good fit between the model and the observed data, we now proceed to carry out a formal test, using the chi-squared statistic, C², obtained in the table above. First, there are n = 24 data points, from which we have estimated a single theoretical population parameter; as such, the reference distribution for the chi-squared statistic is χ²(ν), where ν = 24 − 1 − 1 = 22. The p-value associated with the computed C² = 100.232 is therefore obtained as (see Fig 17.16):

P(C² ≥ 100.232 | C² ∼ χ²(22)) = 5.8752 × 10⁻¹²

which is essentially zero. Thus, despite the visual appearance of a good fit in Fig 17.15, the formal model validation test indicates that the logarithmic series model is in fact not entirely valid for this data set. This somewhat surprising result may be due to the fact that for this random variable, E(X) is very sensitive to perturbations in the value of the parameter p, particularly when p ≈ 1, as is the case with this data set. (For example, had the model parameter been estimated as 0.98 instead of 0.953, the C² statistic would improve to 57.1650, a change of almost 43%, although in this case this is still not enough to change the conclusion of the chi-squared test.)


Figure 17.15: Empirical frequency distribution of the Malaya butterfly data (solid line, circles) versus theoretical logarithmic series model, with model parameter estimated as pˆ = 0.953 (dashed line, squares). model parameter had been estimated as 0.98 instead of 0.953, the C 2 statistic improves to 57.1650—a change of almost 43%—although, in this case this is not enough to change the conclusion of the chi-squared test.) 17.13 The nature of the phenomenon in question (waiting time until the occurrence of an event—publication of a manuscript—which itself involves a sequence of other intermediate events) suggests that a gamma distribution model might be appropriate. Fig 17.17 shows the gamma probability plot and 95% confidence interval. The indicated p-value associated with the AD test implies that the postulated gamma model appears sufficiently adequate for this data set. 17.14 A plot of the histogram and corresponding continuous gamma distribution fit is shown in Fig 17.18. The parameters for the gamma distribution (obtained via MINITAB and indicated on the plot), are: α ˆ = 3.577; βˆ = 2.830 Now, using the theoretical distribution, γ(3.577, 2.830), with the pdf f (x) =

1 2.833.577 Γ(3.577)

e−x/2.830 x2.577

the predicted values of ϕ(xi ), the frequency for each value of the supplied timeto-publication xi , are obtained as ϕ(xi ) = 85 × f (xi )∆x


Figure 17.16: P (C 2 ≥ 100.232) for the χ2 (22) distribution. The value of 5.8752 × 10−12 indicates strongly that the logarithmic series model is not adequate for this data set.


Figure 17.17: Gamma probability plot and 95% confidence intervals for “January Papers” data set. The p-value indicates that the postulated gamma model appears sufficiently adequate for this data set.


Figure 17.18: Histogram of the “January Papers” data along with a gamma distribution fit with parameters α = 3.577; β = 2.830.

where, for this data set, ∆x = 2 is the histogram bin size. The result is shown in the table below along with the corresponding relative frequency data and C²(xi), the individual contributions to the chi-squared statistic, so that C² = Σ_{i=1}^{n} C²(xi) = 12.2780. A plot of the observed frequency, f°(xi), and the theoretical counterpart, ϕ(xi), obtained from the gamma distribution, is shown in Fig 17.19.

Range  xi  Observed Frequency f°(xi)  Theoretical Frequency ϕ(xi)  C²(xi)
1-3    2   4                          3.3447                       0.12840
3-5    4   9                          9.8443                       0.07241
5-7    6   11                         13.8055                      0.57011
7-9    8   20                         14.2921                      2.27960
9-11   10  10                         12.5288                      0.51043
11-13  12  9                          9.8864                       0.07948
13-15  14  3                          7.2550                       2.49555
15-17  16  6                          5.0485                       0.17933
17-19  18  6                          3.3733                       2.04525
19-21  20  5                          2.1830                       3.63508
> 21   22  2                          1.3766                       0.28233
Total      85                         82.9383                      12.2780

Now, with n = 11 data points (bins), and given that two population parameters have been estimated from the data set, the reference distribution for the chi-squared statistic is χ²(ν), where ν = 11 − 2 − 1 = 8.


Figure 17.19: Observed frequencies f o (xi ) for the “January Papers” data (dashed line, squares) and corresponding predicted frequencies, ϕ(xi ) (solid line, circles), obtained from the theoretical gamma γ(3.577, 2.830) distribution fit.

The p-value associated with the computed C² = 12.278 is obtained as (see Fig 17.20):

P(C² ≥ 12.278 | C² ∼ χ²(8)) = 0.1392

The conclusion is that the postulated gamma model is sufficiently adequate, a result that is in agreement with that obtained previously in Problem 17.13.

17.15 First, the supplied data is recast as follows, using the mid-range of each bin as xi, and in terms of the corresponding relative frequency, f(xi):

xi    f(xi)
2.5   0.04
7.5   0.13
12.5  0.17
17.5  0.20
22.5  0.16
27.5  0.12
32.5  0.07
37.5  0.04
42.5  0.03
47.5  0.02
52.5  0.01
57.5  0.01

From here, the sample mean, x̄, and variance, s², of the random variable, X, are obtained as:


Figure 17.20: P(C² ≥ 12.278) for the χ²(8) distribution. The p-value, 0.1392, indicates that the postulated gamma model is sufficiently adequate.

x̄ = Σ_{i=1}^{n} xi f(xi) = 20.7

and

s² = [Σ_{i=1}^{n} xi² f(xi)] − x̄² = 128.76

We may now use the method of moments to determine estimates for the lognormal distribution parameters as follows. By definition of the theoretical mean, µ, and variance, σ², of the lognormal random variable,

µ = e^(α+β²/2)     (17.4)
σ² = µ²(e^(β²) − 1)     (17.5)

Upon substituting x̄ for µ and s² for σ² above, and solving the resulting two equations simultaneously for α and β, the result is:

α̂ = 2.899;  β̂ = 0.513  (or β̂² = 0.2627)

We may now use the theoretical lognormal L(2.899, 0.513²) pdf

f(x) = [1 / (0.513x√(2π))] exp{−(ln x − 2.899)² / (2 × 0.513²)}

to obtain the predicted values of ϕ(xi), the percentage of the US population with income level xi, as follows:

ϕ(xi) = 100 × f(xi)∆x


Figure 17.21: Observed frequencies f o (xi ) for income of families in the US in 1979 (dashed line, squares) and corresponding predicted frequencies, ϕ(xi ) (solid line, circles), obtained from the theoretical lognormal L(2.899, 0.5132 ) distribution fit.

where, in this case, the bin size ∆x = 5. The result is shown in the table below along with C²(xi), the individual contributions to the chi-squared statistic. First, a plot of the observed frequency, f°(xi), and the theoretical counterpart, ϕ(xi), obtained from the lognormal distribution, is shown in Fig 17.21. The fit appears remarkably good, except for the unusually low prediction for xi = 2.5. This latter observation is crucial for conducting a formal chi-squared test fairly for this data set.

Range  xi    Observed Frequency f°(xi)  Theoretical Frequency ϕ(xi)  C²(xi)
0-5    2.5   4                          0.0887*                      172.380*
5-10   7.5   13                         11.7425                      0.13466
10-15  12.5  17                         23.8719                      1.97818
15-20  17.5  20                         22.1619                      0.21090
20-25  22.5  16                         15.8347                      0.00173
25-30  27.5  12                         10.1905                      0.32131
30-35  32.5  7                          6.2829                       0.08185
35-40  37.5  4                          3.8161                       0.00886
40-45  42.5  3                          2.3148                       0.20282
45-50  47.5  2                          1.4122                       0.24468
50-55  52.5  1                          0.8696                       0.01955
> 55   57.5  1                          1.4142                       0.12131
Total        100                        100.0000                     3.32585**


Figure 17.22: P (C 2 ≥ 3.32585) for the χ2 (8) distribution. The value of 0.9123 indicates that the postulated lognormal model appears to be sufficiently adequate. it is important to note that for the specific theoretical lognormal distribution in question, capturing the frequency changes at low values of x appropriately requires a finer bin size than the ∆x = 5 used to present the original data. The total percentage value of 4 observed for x in the 0–5 range should have been broken down further over a finer mesh size, say, for example, ∆x = 1, to provide a break down of this percentage value over the smaller intervals x=0–1, 1–2, 2–3, 3–4, 4–5. Unfortunately, the non-availability of observed frequencies on a such a finer mesh scale is directly responsible for the inconsistency of the predicted value at x = 2.5, which is marked with the asterisk “*” in the table. Eliminating the clearly inconsistent data for the range 0–5, removes the “outlier” term C 2 (2.5) = 172.380 from consideration, but also now leaves n = 11 data points (bins); and given that two theoretical population parameters have been estimated from the data set, the reference distribution for the chi-squared test statistic is χ2 (ν), where ν = 11 − 2 − 1 = 8. The p-value associated with the computed C 2 = 3.32585 (marked with ** in the table because it excludes the “outlier”) is obtained from this distribution as (see Fig 17.22) P (C 2 ≥ 3.32585|C 2 ∼ χ2 (8)) = 0.9123 The conclusion is that the distribution of income of families in the US in 1979 (in actual dollars uncorrected for inflation) is reasonably well-represented by the lognormal distribution, provided one leaves out the entry for the very low income range of 0–5 which requires a finer breakdown over the interval. 17.16 The required model parameters are to be obtained from sample mean and variances determined from the supplied data as follows. First, we assign the observed frequency associated with 5+ totally to x = 5 accidents and therefore

22

CHAPTER 17.

obtain, for the sample average: x ¯=

1∑ xi f (xi ) = 0.465 6 i

and for the sample variance, s2 =

1∑ (xi − x ¯)2 f (xi ) = 0.691 6 i

We now observe that for the standard Poisson random variable, x ¯ and s2 should 2 be approximately equal; this is not the case here. In fact, s > x ¯, so that this random variable is “overdispersed.” Still, the maximum likelihood estimate of the Poisson parameter, λ, is the sample average, hence: ˆ = 0.465 λ Next, we use the method of moments to obtain estimates of the negative binomial parameters. By definition, the first two sample moments for the negative binomial random variable are related to the population parameters as follows: ( ) p M1 = α 1−p { } p M2 = α (1 − p)2 from which we obtain the following expressions for the parameter estimates in terms of the moments: p = α

=

M1 σ2 M12 2 σ − M1

For the specific problem at hand, therefore, α ˆ = 0.957; pˆ = 0.673 To carry out a formal chi-squared test, we must now determine for each model, the predicted frequencies, using the appropriate theoretical pdfs and the corresponding parameter estimates obtained above. For the Poisson model, we use the P(0.465) pdf; for the negative binomial model, because the estimate for α (or k) is not an integer, we can either resort to the Gamma function representation (see Eq (8.53) in the text) or else round up the estimate to the nearest integer, in this case α = k = 1, and therefore use the N Bi(1, 0.673) pdf. The results, including contributions to the chi-squared statistic, are shown below, for the Poisson model first, and then for the negative binomial model.

23 Poisson λ = 0.465 f (xi ) 0.628135 0.292083 0.067909 0.010526 0.001224 0.000114 0.99999

Observed Freq f o (xi ) 447 132 42 21 3 2 647

Predicted (Poisson) ϕP (x) 406.403 188.978 43.937 6.810 0.792 0.074 646.994

Neg. Binomial (k = 1, p = 0.673) f (xi ) 0.673000 0.220071 0.071963 0.023532 0.007695 0.002516 0.998777

Observed Freq f o (xi ) 447 132 42 21 3 2 647

Predicted (Neg. Binomial) ϕN B (x) 435.431 142.386 46.560 15.225 4.979 1.628 646.209

xi 0 1 2 3 4 5 Total

xi 0 1 2 3 4 5 Total

CP2 (xi) 4.0553 17.1790 0.0854 29.5653 6.1597 50.4011 107.446

2 CN B (xi) 0.30738 0.75757 0.44664 2.19035 0.78636 0.08500 4.5733

It is informative to compare in a plot, the observed frequencies and the frequencies predicted by each model; such a plot is shown in Fig 17.23. Observe how the frequencies predicted by the negative binomial model are visually indistinguishable from the corresponding observed frequencies; the same is not true for the Poisson model predictions. The formal Chi-squared test is to be carried out using CP2 = 107.446 for the Poisson model: with 6 groups from which we have estimated one parameter, λ, the reference distribution is χ2 (ν) with ν = 4. For the negative binomial model, 2 CN B = 4.5733, and with two estimated parameters, k and p, in this case, ν = 3. The p-values associated with the tests are obtained as follows (see Fig 17.24): P (C 2 ≥ 107.446|C 2 ∼ χ2 (4)) = 0.00; (Poisson) P (C 2 ≥ 4.5733|C 2 ∼ χ2 (3)) = 0.2058; (Negative Binomial) Thus, from the p-value of 0.00 for the Poisson model, at the α = 0.05 significance level, we must reject the null hypothesis that the Poisson model is adequate. On the other hand, the p-value of 0.2058 indicates that the negative binomial model appears to be sufficiently adequate. We therefore conclude that the negative binomial model is more appropriate than the Poisson model. 17.17 Fig 17.25 shows a normal probability plot (and 95% confidence interval) for the supplied data set. The p-value associated with the AD test (p = 0.486) indicates that the normal distribution model is sufficiently adequate.

24

CHAPTER 17.

Scatterplot of f0xiYule, Phi-xNB, Phi-xPoisson vs xi-Yule 500

Variable f0xiYule Phi-xNB Phi-xPoisson

400

f(x)

300

200

100

0 0

1

2

3

4

5

xi-Yule

Figure 17.23: Observed accident frequencies f o (xi ) for the Greenwood-Yule data (solid line, circles) and corresponding predicted frequencies, ϕN B (xi ) (dashed line, squares), obtained from the theoretical negative binomial N Bi(1, 0.673) distribution, and the Poisson P(0.465) (dotted line, diamonds) distribution. Visually, the negative binomial model prediction is virtually indistinguishable from the observed frequencies.

17.18 The reasonableness of the postulated Poisson model is assessed by carrying out a formal “Goodness-of-fit” test, the result of which is shown below:

Goodness-of-Fit Test for Poisson Distribution
Data column: Flaws
Poisson mean for Flaws = 1.8

Flaws  Observed  Poisson Probability  Expected  Contribution to Chi-Sq
0      4         0.165299             3.30598   0.145696
1      4         0.297538             5.95076   0.639492
2      6         0.267784             5.35568   0.077514
3      4         0.160671             3.21341   0.192544
>= 4   2         0.108708             2.17417   0.013952

N   N*  DF  Chi-Sq   P-Value
20  0   3   1.06920  0.785

3 cell(s) (60.00%) with expected value(s) less than 5.

The associated p-value of 0.785 indicates that the postulated Poisson model appears to be sufficiently adequate.


Figure 17.24: P (C 2 ≥ 107.446) for the χ2 (4) distribution (top panel) and P (C 2 ≥ 4.5733) for the χ2 (3) distribution (bottom panel).


Figure 17.25: Normal probability plot and 95% confidence interval for the Mee data set. The p-value of 0.486 indicates that the normality assumption appears to be sufficiently adequate.


17.19 (i) First, we recast the supplied data using the mid-range of each age bracket (bin) as xi, and in terms of the corresponding relative frequency, f(xi) (which requires dividing the 1960 population data by the total 179,322, and the 1980 population data by the total 226,546). The result is shown below.

Age xi  1960 f60(xi)  1980 f80(xi)
2       0.113321      0.072162
7       0.104237      0.073716
12      0.093536      0.080522
17      0.073717      0.093438
22      0.060232      0.094105
27      0.060612      0.086168
32      0.066634      0.077516
37      0.069601      0.061643
42      0.064688      0.051508
47      0.060667      0.048953
52      0.053568      0.051689
57      0.047010      0.051270
62      0.039828      0.044530
> 67    0.092348      0.112781

A plot of these frequency distributions is shown in Fig 17.26. Even by mere visual inspection, it is clear that both distributions are bi-modal and are therefore so obviously not normally distributed that it is unnecessary to conduct a formal test. Nevertheless, we show in Fig 17.27 normal probability plots for the two data sets as obtained in MINITAB using the following sequence:

Graph > Prob Plot > Single > Graph Variable: "Age-xi"
Distribution --> "Normal"
Data Option --> (1960) (or 1980, as the case may be)

Both p-values associated with each AD test are less than 0.005, implying that the postulated normal distribution is inadequate in each case.

(ii) We may carry out a chi-squared test to compare the two observed frequencies by treating one of them (say, for the year 1960) as the “predicted” frequencies. The resulting contributions to the chi-squared statistic, obtained as defined below, yield the result in the following table.

C²(xi) = (f°80 − f°60)² / f°60


Figure 17.26: Age frequency distributions f60 (xi ) (solid line, circles) and f80 (xi ) (dashed line, squares) of the inhabitants of the US in 1960 and 1980 respectively. The distributions are clearly bi-modal and therefore not normally distributed.

Age xi  1960 f60(xi)  1980 f80(xi)  C²(xi)
2       20321         16348         776.769
7       18692         16700         212.287
12      16773         18242         128.657
17      13219         21168         4779.98
22      10801         21319         10242.4
27      10869         19521         6887.21
32      11949         17561         2635.75
37      12481         13965         176.449
42      11600         11669         0.41043
47      10879         11090         4.09238
52      9606          11710         460.839
57      8430          11615         1203.35
62      7142          10088         1215.19
> 67    16560         25550         4880.44
Total   179,322       226,546       33603.8

With n = 14 groups, and no parameter estimated, the reference distribution for the test is χ²(13), so that the associated p-value is obtained as:

P(C² ≥ 33603.8 | C² ∼ χ²(13)) = 0.000

We must therefore reject the null hypothesis (at the α = 0.05 significance level) and conclude that the two age distributions are significantly different.
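The entire part (ii) computation reduces to a few lines; a sketch using the tabulated 1960 and 1980 frequencies is given below.

import numpy as np
from scipy import stats

f60 = np.array([20321, 18692, 16773, 13219, 10801, 10869, 11949,
                12481, 11600, 10879, 9606, 8430, 7142, 16560])
f80 = np.array([16348, 16700, 18242, 21168, 21319, 19521, 17561,
                13965, 11669, 11090, 11710, 11615, 10088, 25550])

# Chi-squared statistic with the 1960 frequencies as the reference.
c2 = ((f80 - f60) ** 2 / f60).sum()
print(f"C^2 = {c2:.1f}")                     # ~ 33603.8
print("p =", stats.chi2.sf(c2, df=13))       # ~ 0.0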


Figure 17.31: Weibull probability plot (and 95% confidence interval) for the Padgett and Spurrier data when the distribution parameters are pre-specified as ζ = 2.0, β = 2.5. With a value p > 0.250 associated with the formal AD test, the implication is that the postulated distribution (along with the specified parameters) is sufficiently adequate.

17.22 (i) A probability plot (and 95% confidence interval) for the Weibull W(2.0, 2.5) distribution, when compared with the Padgett and Spurrier data, is shown in Fig 17.31. Since the p-value is greater than 0.25, the implication is that the postulated Weibull distribution with the specified parameter values is sufficiently adequate and provides a reasonable model for the data in question.

(ii) When the distribution parameters are not specified and must therefore be estimated from the data, the resulting probability plot is shown in Fig 17.32. Even though the parameter estimates, now determined as ζ = 2.059 (as opposed to 2.0) and β = 2.611 (as opposed to 2.5), are not that different from the corresponding values postulated by Padgett and Spurrier, the result of the AD test is different: this time, since the p-value is 0.021, the implication is that the model adequacy is now indeterminable, because 0.01 < p < 0.05. This problem illustrates the fact that obtaining independent parameter estimates may affect the outcome of model validation tests. If the parameter estimates supplied by Padgett and Spurrier are not as representative of the information in the data as are the estimates obtained from the data, one would draw the wrong conclusion that the postulated Weibull model is adequate.

17.23 Fig 17.33 shows the normal probability plot (and 95% confidence interval) for the viscosity measurements. The p-value associated with the AD test, 0.507, indicates that the postulated normal model appears to be sufficiently adequate.
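Although scipy does not reproduce MINITAB's Weibull probability-plot p-values, the maximum-likelihood analogue of part (ii) is easy to sketch. The data below are simulated from W(2.0, 2.5), since the Padgett and Spurrier sample is not reproduced in this manual.

from scipy import stats
import numpy as np

# Hypothetical stand-in for the Padgett and Spurrier data (n = 50).
rng = np.random.default_rng(3)
x = stats.weibull_min.rvs(2.0, scale=2.5, size=50, random_state=rng)

# Maximum-likelihood fit with the location parameter fixed at zero.
shape, loc, scale = stats.weibull_min.fit(x, floc=0)
print(f"shape = {shape:.3f}, scale = {scale:.3f}")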


Figure 17.32: Weibull probability plot (and 95% confidence interval) for the Padgett and Spurrier data when the distribution parameters are estimated from the data as ζ = 2.059, β = 2.611. The p-value, 0.021, implies that the adequacy of the postulated distribution (with the estimated parameters) is now indeterminable.


Figure 17.33: Normal probability plot and 95% confidence interval for the viscosity data set of Holmes and Mergen (1992). The associated p-value of 0.507 indicates that the normality assumption appears to be sufficiently adequate.

Chapter 18

Exercises

18.1 (i) Shown below are the two tables of Dmi, the deviation from the postulated median, η0 = 1, defined in the text as:

Dmi = Xi − η0

along with the associated signs. For X1:

     X1        Dmi   Sign
1.26968    0.26968      +
0.28875   −0.71125      −
0.07812   −0.92188      −
0.45664   −0.54336      −
0.68026   −0.31974      −
2.64165    1.64165      +
0.21319   −0.78681      −
2.11448    1.11448      +
1.43462    0.43462      +
2.29095    1.29095      +
1.52232    0.52232      +
1.45313    0.45313      +
0.65984   −0.34016      −
1.60555    0.60555      +
0.08525   −0.91475      −
0.03254   −0.96746      −
0.75033   −0.24967      −
1.34203    0.34203      +
1.25397    0.25397      +
3.16319    2.16319      +

and for X2:

     X2        Dmi   Sign
1.91282    0.91282      +
1.13591    0.13591      +
0.72515   −0.27485      −
1.19141    0.19141      +
1.34322    0.34322      +
3.18219    2.18219      +
0.88740   −0.11260      −
2.68491    1.68491      +
2.16498    1.16498      +
2.84725    1.84725      +
2.17989    1.17989      +
2.11117    1.11117      +
1.45181    0.45181      +
2.45986    1.45986      +
0.43390   −0.56610      −
0.76736   −0.23264      −
1.16390    0.16390      +
2.01198    1.01198      +
1.80569    0.80569      +
3.77947    2.77947      +

(ii) For the data set X1, the test statistic is determined as T+ = 11, i.e., 55% of the observations have a plus sign; for the data set X2, T+ = 16, i.e., 80% of the observations have a plus sign. Informally, these results indicate that η0 = 1 is a reasonable estimate of the median for data set X1, but not for data set X2. This is because, theoretically, 50% of a sample set of observations from a population with median η0 will have values higher than this median (so that 50% of the Dmi values will have positive signs), with the other 50% having values lower than η0. Values of T+ corresponding to percentages much higher (or much lower) than 50% indicate the possibility that the true median is different from the postulated value.

18.2 (i) Below is the result of a formal sign test (obtained using MINITAB) of the hypothesis H0: η = 1, against Ha: η ≠ 1, for data set X1:

Sign Test for Median: X1
Sign test of median = 1.000 versus not = 1.000
     N  Below  Equal  Above       P  Median
X1  20      9      0     11  0.8238   1.262

The indicated p-value of 0.824 implies that there is no evidence to support rejection of the null hypothesis. As such, η = 1 appears to be a reasonable postulate for the median of the data set X1. For data set X2, the result of a formal sign test (obtained from MINITAB) is shown below.

Sign Test for Median: X2
Sign test of median = 1.000 versus not = 1.000
     N  Below  Equal  Above       P  Median
X2  20      4      0     16  0.0118   1.859

This time, the p-value of 0.012 indicates evidence (at the α = 0.05 significance level) in support of rejecting the null hypothesis in favor of the alternative. Therefore, η = 1 is not considered a reasonable postulate for the median of the data set X2.

(ii) When we consider only the first 10 observations in each data set, we obtain the following results using MINITAB for the formal sign test. First, for the data set X1:

Sign Test for Median: X1-10
Sign test of median = 1.000 versus not = 1.000
        N  Below  Equal  Above       P  Median
X1-10  10      5      0      5  1.0000  0.9750

With a p-value of 1.00 (with precisely 5 values above and 5 values below the median of 0.975), the implication is that there is absolutely no evidence whatsoever in support of rejecting the null hypothesis. This is in agreement with the result in (i) above. For the data set X2, the result of the sign test is:

Sign Test for Median: X2-10
Sign test of median = 1.000 versus not = 1.000
        N  Below  Equal  Above       P  Median
X2-10  10      2      0      8  0.1094   1.628

With a p-value of 0.11, the implication is that this time, at the α = 0.05 significance level, there is no evidence in support of rejecting the null hypothesis in favor of the alternative; even at the somewhat higher α = 0.10 significance level, there is still no evidence against the null hypothesis (in the strictest possible sense). This, of course, contradicts the result obtained earlier in (i) above. Since we know that the two data sets are different (specifically, that the median for X2 is higher by 0.6, on average), we can only conclude that the sample size n = 10 used for the current analysis is too small to enable the detection of the true difference (with any reasonable degree of confidence) using the sign test.
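The sign test is simple enough to reproduce from first principles; a minimal Python/scipy sketch (illustrative only; the helper sign_test is hypothetical, not a library routine):

from scipy.stats import binomtest

def sign_test(data, eta0):
    # Two-sided sign test of H0: median = eta0; ties with eta0 are dropped.
    above = sum(x > eta0 for x in data)
    below = sum(x < eta0 for x in data)
    n = above + below
    # Under H0, the number of '+' signs is Binomial(n, 0.5).
    return binomtest(min(above, below), n, p=0.5).pvalue

# With 9 values below and 11 above the postulated median of 1 (as for X1),
# binomtest(9, 20, p=0.5) reproduces the MINITAB p-value of 0.8238.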

18.3 (i) Upon generating a table of values for the difference, D = X1 − X2, as shown below, one observes that all the values are negative, an absolute tell-tale sign that the two data sets are different, and that it is impossible for ηD, the median of this difference, to be zero.

D = X1 − X2
-0.64314   -0.65757
-0.84716   -0.65804
-0.64703   -0.79197
-0.73477   -0.85431
-0.66296   -0.34865
-0.54054   -0.73482
-0.67421   -0.41357
-0.57043   -0.66995
-0.73036   -0.55172
-0.55630   -0.61628

Still, the result of a formal one-sample Wilcoxon signed rank test, conducted on the computed values of D to test the null hypothesis, H0: ηD = 0, versus the alternative, Ha: ηD ≠ 0, is:

Wilcoxon Signed Rank Test: D12
Test of median = 0.00 versus median not = 0.00
      N  N for Test  Wilcoxon Statistic      P  Estimated Median
D12  20          20                 0.0  0.000           -0.6524

As expected, the Wilcoxon test statistic, W+, the sum of the ranks with positive signs, is exactly 0 in this case (since no value of D is positive). The associated p-value is exactly 0, implying that there is absolutely no evidence in support of the null hypothesis against the alternative. Hence, the data do not support the postulate that ηD = 0, significantly favoring the alternative. This result is in perfect keeping with how the data set X2 was generated. As presented in Exercise 18.1, X2 was generated as:

X2 = X1 + 0.6 + ϵ     (18.1)

where ϵ ~ N(0, 0.15²). The systematic shift, δ = 0.6 + ϵ, was detectable using the one-sample Wilcoxon signed rank test conducted on the difference D = X1 − X2, or, from Eq (18.1), D = −(0.6 + ϵ).

(ii) The result of a Mann-Whitney-Wilcoxon (MWW) two-sample test carried out directly on the two samples is:

Mann-Whitney Test and CI: X1, X2

     N  Median
X1  20   1.262
X2  20   1.859
Point estimate for ETA1-ETA2 is -0.661
95.0 Percent CI for ETA1-ETA2 is (-1.239,-0.044)
W = 333.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0385
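Both tests can also be reproduced in Python/scipy; an illustrative sketch, with x1 and x2 as hypothetical stand-ins for the two data columns (not reproduced here):

import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

d = np.asarray(x1) - np.asarray(x2)        # paired differences D (stand-ins)
w_res = wilcoxon(d)                        # signed rank test of median(D) = 0
u_res = mannwhitneyu(x1, x2, alternative='two-sided')  # two-sample MWW test
print(w_res.statistic, w_res.pvalue)       # statistic = 0 here: no positive D
print(u_res.pvalue)                        # comparable to MINITAB's 0.0385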

The indicated p-value, 0.0385, implies that there is evidence (at the α = 0.05 significance level) in support of rejecting the null hypothesis. We conclude, therefore, that the medians of the two data sets are in fact different.

The difference between these two results is not so much in the conclusions reached (both are in agreement in this regard); it is that the Wilcoxon test on the difference is more sensitive. Had the value added to X1 to generate X2 been smaller than 0.6, and/or had the noise level been higher than σ = 0.15, the MWW test might have been less successful in detecting a significant difference between the medians of the two data sets.

(iii) The results obtained here are perfectly reminiscent of the difference between a paired t-test and a standard two-sample t-test. The one-sample Wilcoxon test performed on the paired difference D is in fact the non-parametric equivalent of the paired t-test. Just as the paired t-test will always be more sensitive than the standard two-sample t-test in detecting differences, so it is that when the data occur naturally as pairs, the one-sample Wilcoxon test performed on the paired difference D will also be more sensitive than the two-sample MWW test.

18.4 (i) The median of X ~ L(α, β), a lognormally distributed random variable, is given as:

m = e^α

so that m1 and m2, the respective theoretical medians for the random variables XL1 ~ L(0, 0.25) and XL2 ~ L(0.25, 0.25), are obtained as:

m1 = η01 = 1.000;  m2 = η02 = 1.284
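These theoretical medians follow directly from m = e^α; as a quick numerical check:

import math

m1 = math.exp(0.00)    # median of L(0, 0.25)
m2 = math.exp(0.25)    # median of L(0.25, 0.25)
print(m1, m2)          # 1.000 and 1.284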

The result of a formal sign test of the hypothesis H0: η01 = 1, against Ha: η01 ≠ 1, for data set XL1, obtained using MINITAB, is shown below:

Sign Test for Median: XL1
Sign test of median = 1.000 versus not = 1.000
      N  Below  Equal  Above       P  Median
XL1  10      4      0      6  0.7539   1.050

Observe that the actual data median for XL1 is 1.050, versus the postulated value of 1.000. With a p-value of 0.7539, the implication of this sign test is that there is no evidence in support of rejecting the null hypothesis that the median for XL1 is 1, as postulated from the theoretical lognormal distribution. The corresponding test of the hypothesis H0: η02 = 1 against Ha: η02 ≠ 1 for data set XL2 is shown below:

Sign Test for Median: XL2
Sign test of median = 1.000 versus not = 1.000
      N  Below  Equal  Above       P  Median
XL2  10      1      0      9  0.0215   1.364



This time, the data median is obtained as 1.364, versus the postulated value of 1.000. With an associated p-value of 0.022, at the α = 0.05 significance level, the sign test indicates that there is evidence to support rejection of the null hypothesis in favor of the alternative. Thus, this test is indeed able to detect that XL2 does not have the same median as XL1.

(ii) A natural log transformation of the data,

Y1 = ln(XL1);  Y2 = ln(XL2)

yields the data shown in the following table.

       Y1          Y2
−0.202201    0.481740
−0.038726    0.147534
 0.032731    0.158393
−0.173809    0.086775
 0.065145    0.244401
 0.293547   −0.085145
−0.253356    0.372409
 0.131265    0.390691
 0.239183    0.770423
 0.527482    0.379233

From the characteristics of the lognormal X ~ L(α, β) random variable, we know that:

E(ln X) = α        (18.2)
Var(ln X) = β²     (18.3)

As a result, the theoretical means and standard deviations for the transformed variables Y1 and Y2 are as follows:

µ1 = 0.00; σ1 = 0.25
µ2 = 0.25; σ2 = 0.25

The first required test for Y1, a z-test of the hypothesis H0: µ1 = 0, versus the alternative Ha: µ1 ≠ 0, given σ1 = 0.25, yields the following results:

One-Sample Z: Y1
Test of mu = 0 vs not = 0
The assumed standard deviation = 0.25
Variable   N    Mean   StDev  SE Mean             95% CI     Z      P
Y1        10  0.0621  0.2455   0.0791  (-0.0928, 0.2171)  0.79  0.432

The transformed data average is 0.062, and since the associated p-value for the z-test is 0.432, we find no evidence to support rejection of the null hypothesis that Y1 is a sample from a N(0, 0.25²) distribution.

The second test, this time for Y2, is a z-test of the hypothesis H0: µ2 = 0, versus the alternative Ha: µ2 ≠ 0, given σ2 = 0.25; the result is:

One-Sample Z: Y2
Test of mu = 0 vs not = 0
The assumed standard deviation = 0.25
Variable   N    Mean   StDev  SE Mean            95% CI     Z      P
Y2        10  0.2946  0.2390   0.0791  (0.1397, 0.4496)  3.73  0.000

This time, the transformed data average is 0.2946, with an associated p-value that is 0 to the third decimal place. The implication is that there is strong evidence in support of rejecting the null hypothesis that the mean, µ2, is 0, in favor of the alternative that it is different from 0.

(iii) There is not much difference between the results of the two tests. Both tests are able to show fairly conclusively that X1 belongs to the postulated population, and that the two data sets are from different populations.

18.5 (i) The result of a Mann-Whitney-Wilcoxon (MWW) test on the equality of the medians of XL1 and XL2, versus the alternative that the medians are different, is shown below:

Mann-Whitney Test and CI: XL1, XL2
      N  Median
XL1  10  1.0503
XL2  10  1.3640
Point estimate for ETA1-ETA2 is -0.2943
95.5 Percent CI for ETA1-ETA2 is (-0.5516,-0.0068)
W = 78.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0452

With the indicated p-value of 0.0452, the implication is that, at the α = 0.05 significance level, there is "borderline" evidence to support rejection of the null hypothesis that the two population medians are equal.

(ii) To carry out the required two-sample z-test, it is necessary to compute the z statistic according to Eq (15.48) in the text and use it to conduct the test from "first principles," since we cannot obtain the required results directly from a software package. This is because MINITAB and most other software packages do not provide a two-sample z-test option, on the justifiable ground that very few (if any) practical problems involve situations where the two population standard deviations are known; but such is the case here. With σ1 and σ2 as determined in Exercise 18.4, with n1 = n2 = 10, and with Ȳ1 = 0.062 and Ȳ2 = 0.295 (determined from the data), we obtain, from Eq (15.48) in the text, that to test the null hypothesis, H0: µ1 − µ2 = 0, against the alternative, Ha: µ1 − µ2 ≠ 0, the test statistic is:

z = −0.233/0.112 = −2.084
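Because the two-sample z-test is rarely packaged, it is convenient to script the first-principles computation; a minimal Python/scipy sketch:

from math import sqrt
from scipy.stats import norm

ybar1, ybar2 = 0.062, 0.295       # sample means of Y1 and Y2
sigma1 = sigma2 = 0.25            # known population standard deviations
n1 = n2 = 10

z = (ybar1 - ybar2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)
p = 2 * norm.sf(abs(z))           # two-sided tail-area probability
print(z, p)                       # z ~ -2.084, p ~ 0.037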



[Figure: standard normal distribution plot, N(0, 1), with two-sided tail areas of 0.01858 marked at z = −2.084 and z = 2.084]

Figure 18.1: Tail area probabilities for the two-sample z-test of Exercise 18.5. The implied p-value corresponding to the z-statistic, z = −2.084, is 2 × 0.01858 = 0.037.

And now, either by noting that the computed value, z = −2.084, is less than the critical value of −1.96 (for α = 0.05; see Table 15.4), or else by determining the p-value associated with this z statistic (see Fig 18.1), i.e., p = 0.037, we conclude that there is evidence to support rejection of the null hypothesis at the α = 0.05 significance level.

(iii) Surprisingly, the two tests (the parametric z-test and the non-parametric MWW test) produce similar results, enabling us to conclude that the two populations from which the data sets XL1 and XL2 are drawn are different. Note, however, that the p-value associated with the non-parametric test is "borderline" in the sense that the actual computed value, 0.0452, is dangerously close to the traditional cut-off significance level of 0.05. The p-value associated with the z-test is not quite as close to the "borderline."

18.6 (i) The result of the required two-sample t-test obtained from MINITAB is shown below:

Two-Sample T-Test and CI: XU, YU
     N   Mean  StDev  SE Mean
XU  10  1.447  0.406     0.13
YU   8  2.066  0.694     0.25
Difference = mu (XU) - mu (YU)
Estimate for difference: -0.619
95% CI for difference: (-1.236, -0.002)
T-Test of difference = 0 (vs not =): T-Value = -2.24  P-Value = 0.049  DF = 10

For this test to be valid, the original population should be approximately Gaussian, especially since the sample sizes are quite small for both samples. With a p-value of 0.049, at the α = 0.05 level, there is "borderline" evidence to support rejection of the null hypothesis that the two population means are equal. However, we come to this conclusion with some reservation (because of how close 0.049 is to the cut-off value of 0.05).

(ii) The result of the MWW test, obtained from MINITAB, is shown below:

Mann-Whitney Test and CI: XU, YU
     N  Median
XU  10   1.550
YU   8   2.160
Point estimate for ETA1-ETA2 is -0.690
95.4 Percent CI for ETA1-ETA2 is (-1.250,0.050)
W = 74.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0685
The test is significant at 0.0684 (adjusted for ties)

The associated p-value, 0.0685, indicates that, at the α = 0.05 significance level, there is no evidence to support rejection of the null hypothesis that the two medians are the same.

(iii) The parametric test, even though not completely "valid" because of the nature of the underlying distributions (uniform, as opposed to the required Gaussian), is based on a comparison of population means; and the t-test is less susceptible to such deviations from the ideal because the distribution of sample means tends to be Gaussian regardless of the underlying population distribution (so long as it is not the pathological Cauchy distribution). As such, the result produced by the parametric test is expected to be more consistent with what we know about how the data was generated. The non-parametric test, on the other hand, which is based on a comparison of population medians, was not as effective in distinguishing between the two populations: even though estimates of sample medians are robust to outliers, they are quite susceptible to fluctuations when sample sizes are small. Thus, while the underlying distribution for the two data sets is uniform, not Gaussian, the distribution of the sample means should still be approximately Gaussian; and because of the limited sample size in each case (10 for XU and 8 for YU), the parametric test (for equality of means) was able to "outperform" the more robust, but less powerful, non-parametric test (for equality of medians).

(iv) The use of an absolute α = 0.05 as an arbiter of significance is dangerous, especially when sample sizes are small and the underlying distributions are non-Gaussian, or unknown. The value of 0.05 is arbitrary, and in this case, a strict adherence to this value cost us the ability to come to the correct conclusion and reject the null hypothesis after carrying out the non-parametric MWW test.
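The two competing analyses can be sketched side by side in Python/scipy (illustrative; xu and yu are hypothetical stand-ins for the XU and YU samples, and scipy's Welch test uses fractional degrees of freedom, so its p-value differs slightly from MINITAB's):

from scipy.stats import ttest_ind, mannwhitneyu

t_res = ttest_ind(xu, yu, equal_var=False)             # Welch two-sample t-test
u_res = mannwhitneyu(xu, yu, alternative='two-sided')  # MWW test of the medians
print(t_res.pvalue)   # close to 0.049: "borderline" rejection of equal means
print(u_res.pvalue)   # close to 0.069: no rejection of equal medians at 0.05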



[Figure: normal probability plot of the residuals; legend: Mean 0.095, StDev 0.4767, N 20, KS 0.126, P-Value >0.150; y-axis: Percent, x-axis: Residuals]

Figure 18.2: Probability plot for the residuals of Exercise 18.8, along with the results of a Kolmogorov-Smirnov normality test. The implied p-value, p > 0.150, indicates that there is no evidence to support rejection of the null hypothesis that the data set is normally distributed.

18.7 The result of the required one-sample sign test obtained from MINITAB is:

Sign Test for Median: S15
Sign test of median = 3.000 versus not = 3.000
      N  Below  Equal  Above       P  Median
S15  15      6      7      2  0.2891   3.000

The resulting p-value, 0.2891, indicates that, at the α = 0.05 significance level, there is no evidence to support rejecting the null hypothesis that the median response is 3. Thus, it seems reasonable to conclude that the population is indifferent to the referendum in question.

18.8 (i) The K-S test carried out in MINITAB (Stat > Basic Statistics > Normality Test, and then select "Kolmogorov-Smirnov" under "Test for Normality") produces the probability plot shown in Fig 18.2, along with a test statistic, D = 0.126, and p > 0.150. Upon selecting the Anderson-Darling option, the same probability plot is generated, along with the test statistic AD = 0.301 and p = 0.546. Both tests indicate that there is no evidence to support rejection of the null hypothesis that the residuals are normally distributed.

(ii) Upon addition of the four "outliers" to the original data set, the K-S test produces nearly the same result as obtained above, with the K-S test statistic now obtained as D = 0.153, along with p = 0.148. On the other hand, the A-D test statistic is now obtained as AD = 0.523, along with p = 0.165.


[Figure: normal probability plot of the sample; legend: Mean 0.677, StDev 2.574, N 20, KS 0.186, P-Value 0.068; y-axis: Percent, x-axis: Sample]

Figure 18.3: Probability plot for the sample data set of Exercise 18.9, along with the results of a Kolmogorov-Smirnov normality test. The implied p-value, p = 0.068, indicates that there is no evidence to support rejection of the null hypothesis that the data set is normally distributed.

Both p-values still indicate a lack of evidence to support rejection of the null hypothesis; however, the A-D test statistic is now much larger, while the K-S test statistic changed only slightly. Nevertheless, the addition of the 4 "outlier" values did not alter our conclusions about the normality of the residuals at all.

18.9 A K-S test on the provided sample produces the probability plot shown in Fig 18.3, along with the K-S test statistic, D = 0.186, and an associated p = 0.068. The conclusion is that, at the α = 0.05 significance level, there is no evidence to support rejection of the null hypothesis that the data in this sample are normally distributed. On the other hand, a probability plot with a 95% CI, and the result of an A-D test (in MINITAB: Graph > Probability Plot > Single), are shown in Fig 18.4: the test statistic is AD = 0.887, with an associated p = 0.019. The conclusion in this case is that there is, in fact, evidence to support rejection of the null hypothesis. A box plot of this data, shown in Fig 18.5, indicates two potential "outlier" values (6.86 and 6.70) that cast doubt on the normality postulate. Since the K-S test is less sensitive to extreme values, the A-D test is more likely to lead to the correct decision for this particular problem.
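Both normality tests have scipy analogues; an illustrative sketch (sample is a hypothetical stand-in for the data of Exercise 18.9; note that scipy's kstest presumes fully specified parameters, so plugging in estimates only approximates MINITAB's K-S procedure):

import numpy as np
from scipy.stats import kstest, anderson

mu_hat, s_hat = np.mean(sample), np.std(sample, ddof=1)  # estimated parameters
ks_stat, ks_p = kstest(sample, 'norm', args=(mu_hat, s_hat))
ad_res = anderson(sample, dist='norm')   # returns the A-D statistic with
                                         # critical values, not a p-value
print(ks_stat, ks_p)
print(ad_res.statistic, ad_res.critical_values)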



[Figure: normal probability plot (95% CI) of the sample; legend: Mean 0.677, StDev 2.574, N 20, AD 0.887, P-Value 0.019; y-axis: Percent, x-axis: Sample]

Figure 18.4: Probability plot for the sample data set of Exercise 18.9, along with a 95% confidence interval, and the results of an Anderson-Darling normality test. The implied p-value, p = 0.019, indicates evidence to support rejection of the null hypothesis that the data set is normally distributed, in direct contradiction to the result of the K-S test. Note the single value (6.70) that lies outside the 95% confidence interval.

[Figure: box plot of the sample, values ranging from approximately −4 to 8]

Figure 18.5: Box plot of the sample data set of Exercise 18.9. Note the two extreme values (6.70 and 6.86) flagged as potential outliers. These extreme values make the data set appear right-skewed, calling into question the normality postulate.


APPLICATION PROBLEMS

18.10 Let DAB represent the difference between the recorded "times" for operator A and for operator B (with DAC as the difference between operator A and operator C). If the two operators being compared have the same safety record, the computed differences should have a median of zero. The following data table of differences is obtained from the supplied data.

  DAB     DAC
-0.63    0.52
-3.06   -1.07
 0.11    2.37
 1.51   -0.73
 3.33    4.66
 0.41    0.14
 0.65   -6.56
-0.21    0.98
-4.07    1.72
-0.61   -3.08

Let ηAB be the median of DAB, and ηAC the median of DAC. The result of a Wilcoxon signed rank test of the hypothesis H0: ηAB = 0, versus the alternative Ha: ηAB ≠ 0, is shown below:

Wilcoxon Signed Rank Test: D-AB
Test of median = 0.00 versus median not = 0.00
       N  N for Test  Wilcoxon Statistic      P  Estimated Median
D-AB  10          10                26.0  0.919           -0.1000

The indicated p-value, p = 0.919, implies that, at the α = 0.05 significance level, there is no evidence to support rejection of the null hypothesis in favor of the alternative. Similarly, the result of a Wilcoxon signed rank test of the hypothesis H0: ηAC = 0, versus the alternative Ha: ηAC ≠ 0, is shown below:

Wilcoxon Signed Rank Test: D-AC
Test of median = 0.00 versus median not = 0.00
       N  N for Test  Wilcoxon Statistic      P  Estimated Median
D-AC  10          10                29.0  0.919            0.1400

Coincidentally, the associated p-value is also p = 0.919, indicating that at the α = 0.05 significance level, there is no evidence to support rejection of the null hypothesis. Thus, according to this series of tests based on the difference between the safety records, there is no evidence that operator "A" is truly more safety conscious than either "B" or "C."
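The paired workflow translates directly to scipy; an illustrative sketch (op_a, op_b, and op_c are hypothetical stand-ins for the three operators' columns):

import numpy as np
from scipy.stats import wilcoxon

d_ab = np.asarray(op_a) - np.asarray(op_b)   # paired differences D_AB
d_ac = np.asarray(op_a) - np.asarray(op_c)   # paired differences D_AC
for d in (d_ab, d_ac):
    res = wilcoxon(d)                        # H0: median of D is 0, two-sided
    print(res.statistic, res.pvalue)         # p near 0.92 in both cases,
                                             # in line with the MINITAB output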



18.11 (i) The result of the first required two-sample t-test carried out in MINITAB is shown below:

Two-Sample T-Test and CI: Operator A, Operator B
Two-sample T for Operator A vs Operator B
             N  Mean  StDev  SE Mean
Operator A  10  1.87   1.49     0.47
Operator B  10  2.12   1.90     0.60
Difference = mu (Operator A) - mu (Operator B)
Estimate for difference: -0.257
95% CI for difference: (-1.868, 1.354)
T-Test of difference = 0 (vs not =): T-Value = -0.34  P-Value = 0.741  DF = 17

The indicated p-value, 0.741, implies that there is no evidence to support rejection of the null hypothesis that the mean time between occurrences of safety violations is the same for operators A and B. The result of the second two-sample t-test is:

Two-Sample T-Test and CI: Operator A, Operator C
Two-sample T for Operator A vs Operator C
             N  Mean  StDev  SE Mean
Operator A  10  1.87   1.49     0.47
Operator C  10  1.97   2.29     0.72
Difference = mu (Operator A) - mu (Operator C)
Estimate for difference: -0.105
95% CI for difference: (-1.948, 1.738)
T-Test of difference = 0 (vs not =): T-Value = -0.12  P-Value = 0.905  DF = 15

Once again, since p = 0.905, the implication is that there is no evidence to support rejection of the null hypothesis, this time that the mean time between occurrences of safety violations is the same for operators A and C. However, these tests may not be entirely valid: the underlying distribution for the original data is known to be exponential, and the sample size, n = 10, is quite small.

(ii) The results of the MWW tests are shown below. For Operators A and B:

Mann-Whitney Test and CI: Operator A, Operator B
             N  Median
Operator A  10   1.360
Operator B  10   1.640
Point estimate for ETA1-ETA2 is -0.205
95.5 Percent CI for ETA1-ETA2 is (-1.500,1.360)
W = 100.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.7337

with the conclusion that there is no evidence to support rejection of the null hypothesis that the two medians are the same (since p = 0.7337). Similarly, for Operators A and C:

Mann-Whitney Test and CI: Operator A, Operator C
             N  Median
Operator A  10   1.360
Operator C  10   0.875
Point estimate for ETA1-ETA2 is 0.135
95.5 Percent CI for ETA1-ETA2 is (-1.080,1.949)
W = 111.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.6776

with the conclusion that there is also no evidence to support rejection of the null hypothesis that the two medians are the same (since p = 0.6776). These tests are valid in the sense that, being non-parametric, they make no assumptions about the underlying distributions. Nevertheless, the results are consistent with those obtained in (i). Whichever test is employed does not seem to matter much for this problem; there simply is not sufficient evidence to conclude that operator A is any more safety conscious than the other two operators.

18.12 (i) The box plots are shown in Fig 18.6, where we observe that the box plot for the "Home" scores is offset higher than that for the "Away" scores. This visual assessment seems to suggest that, on average, the team scores more points at home than it does away.

(ii) The result of a two-sample t-test is:

Two-Sample T-Test and CI: Home-Points, Away-Points
Two-sample T for Home-Points vs Away-Points
              N  Mean  StDev  SE Mean
Home-Points   8  31.3   11.8      4.2
Away-Points   8  20.8   13.2      4.7
Difference = mu (Home-Points) - mu (Away-Points)
Estimate for difference: 10.50
95% CI for difference: (-3.05, 24.05)
T-Test of difference = 0 (vs not =): T-Value = 1.67  P-Value = 0.118  DF = 13

The implication of p = 0.118 is that there is no evidence to support rejecting the null hypothesis that the team's mean offensive productivity at home and away is the same. The assumption necessary to make this test valid is that the data (scores) can be considered random samples from a normal population. It is highly unlikely that this is the case, however, primarily because scores in football games can only take certain non-negative values: some scores are impossible; others are rare. For example, a score of 1 is impossible, and while scores of 0 or 2 or 4 are possible, they are rare. The normality assumption is therefore not likely to be valid.



[Figure: box plots of Points for Home and Away games]

Figure 18.6: Box plots of total points scored at home and away by the Philadelphia Eagles during the 2008/2009 season.

(iii) The result of the MWW test is shown below:

Mann-Whitney Test and CI: Home-Points, Away-Points
              N  Median
Home-Points   8   30.50
Away-Points   8   20.00
Point estimate for ETA1-ETA2 is 10.50
95.9 Percent CI for ETA1-ETA2 is (-5.00,25.00)
W = 83.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1278
The test is significant at 0.1275 (adjusted for ties)

With a p-value of 0.1278, the implication is that, at the α = 0.05 significance level, there is no evidence to support rejection of the null hypothesis that the team's offensive performance at home and away is the same.

(iv) Even though the results of the formal tests indicate otherwise, a visual inspection of the box plot seems to suggest that the team might in fact have done a bit better at home than away. The small sample size is likely the reason for the lack of statistical significance in the observed difference.


[Figure: box plots of point differentials for Home and Away games]

Figure 18.7: Box plots of point differentials for home and away games played by the Philadelphia Eagles during the 2008/2009 season.

18.13 (i) The box plots are shown in Fig 18.7, where the point differentials appear to be better at home than away.

(ii) The result of the required two-sample t-test carried out in MINITAB is:

Two-Sample T-Test and CI: Home-Diff, Away-Diff
Two-sample T for Home-Diff vs Away-Diff
            N  Mean  StDev  SE Mean
Home-Diff   8  16.5   16.8      6.0
Away-Diff   8  -0.6   14.7      5.2
Difference = mu (Home-Diff) - mu (Away-Diff)
Estimate for difference: 17.13
95% CI for difference: (0.05, 34.20)
T-Test of difference = 0 (vs not =): T-Value = 2.17  P-Value = 0.049  DF = 13

With a p-value of 0.049, at the α = 0.05 significance level, there appears to be "borderline" evidence in support of rejecting the null hypothesis, leading to the conclusion that the team's overall performance is better at home than away. The normality assumption necessary for a t-test to be valid may actually be reasonable here, since we are dealing not with absolute scores (which are not normally distributed), but with the difference between two scores, points scored by the Eagles minus those scored by their opponents (which could be normal).



(iii) The result of the required MWW test is shown below:

Mann-Whitney Test and CI: Home-Diff, Away-Diff
            N  Median
Home-Diff   8   16.50
Away-Diff   8   -2.00
Point estimate for ETA1-ETA2 is 17.00
95.9 Percent CI for ETA1-ETA2 is (-2.01,37.99)
W = 84.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1036
The test is significant at 0.1033 (adjusted for ties)

The p-value of 0.1033 indicates that at the α = 0.05 significance level, there is no evidence to support rejection of the null hypothesis; even at the α = 0.10 significance level, in the strictest possible sense, there is still no evidence to support rejection of the null hypothesis (unless one rounds the p-value down to 2 decimal places, and even then, the "evidence" is "borderline").

(iv) Taken at face value, it appears as if the two tests produce different results: the t-test indicates that there is a difference in the performance at home versus away, while the MWW test seems to indicate otherwise. However, upon closer observation, we see that the MWW test, being non-parametric, loses some power, and might in fact have ended up agreeing with the t-test had the sample size been larger. In any event, the box plot seems to indicate that there might in fact be a significant difference in how the team performs at home versus away.

18.14 (i) The 1-Sample Wilcoxon test is appropriate only for data sets from symmetric distributions, and there is no reason to believe that the underlying distribution for the data set in question is symmetric. For one thing, each response in the data set is not truly quantitative, even though it is ordinal; for another, even though there are 5 options from which to choose, so that there is the potential for a symmetric distribution of the data around the "middle" number 3, there is no reason to expect that the actual data will be distributed in any particular fashion, whether uniformly, symmetrically, or skewed left or right. For what it is worth, a histogram of the men's preference data, shown here in Fig 18.8, indicates some skewness to the right. A one-sample sign test will therefore be more appropriate. The result of such a test, used to test the null hypothesis, H0: η = 3, against the alternative, Ha: η ≠ 3, is shown below:

Sign Test for Median: Men
Sign test of median = 3.000 versus not = 3.000
      N  Below  Equal  Above       P  Median
Men  15      6      6      3  0.5078   3.000

The indicated p-value of 0.5078 implies that there is no evidence to support rejection of the null hypothesis that η = 3, i.e., that men are indifferent. We are therefore led to conclude that, as far as the taste test in question is concerned,


Figure 18.8: Histogram of data on men’s preference in a diet cola taste test.

men are indifferent to the diet cola brands tested.

(ii) To test whether or not there is a difference between men and women in their preferences for the diet cola brands, the appropriate test is the MWW test. The result of such a test, for the null hypothesis, H0: ηW − ηM = 0, against the alternative, Ha: ηW − ηM ≠ 0, is shown below:

Mann-Whitney Test and CI: Women, Men
        N  Median
Women  15   4.000
Men    15   3.000
Point estimate for ETA1-ETA2 is 1.000
95.4 Percent CI for ETA1-ETA2 is (-0.000,2.000)
W = 283.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0362
The test is significant at 0.0306 (adjusted for ties)

The p-value of 0.0362 indicates that at the α = 0.05 significance level, there is evidence in support of rejecting the null hypothesis. As such, we are led to conclude that there is indeed a significant difference between men and women in their preference for the different diet cola brands. The median for men is determined as 3 while that for women is 4, with the implication that while men may be indifferent, women tend to show a preference for the name-brand cola. There is therefore evidence in the data to support the first claim (that men are indifferent), but not the second claim.
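A scipy version of this comparison might look as follows (illustrative; women and men are hypothetical stand-ins for the two columns of 1-5 preference ratings):

from scipy.stats import mannwhitneyu

# With heavily tied ordinal ratings, the asymptotic method applies a tie
# correction, analogous to MINITAB's "adjusted for ties" p-value.
res = mannwhitneyu(women, men, alternative='two-sided', method='asymptotic')
print(res.pvalue)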



18.15 The result of the MWW test of the null hypothesis, H0: ηA − ηB = 0, versus the alternative, Ha: ηA − ηB < 0, is shown below. (Note that to be equivalent to the t-test in Example 15.6, the alternative hypothesis in this case must be that the difference in the medians is less than 0.)

Mann-Whitney Test and CI: Method A, Method B
           N  Median
Method A  10  70.000
Method B  10  74.000
Point estimate for ETA1-ETA2 is -4.000
95.5 Percent CI for ETA1-ETA2 is (-9.000,0.999)
W = 81.0
Test of ETA1 = ETA2 vs ETA1 < ETA2 is significant at 0.0378
The test is significant at 0.0374 (adjusted for ties)

The resulting p = 0.0378 (or p = 0.0374, when adjusted for ties) indicates that, at the α = 0.05 significance level, there is evidence in support of rejecting the null hypothesis in favor of the alternative. The median for "Method A" may therefore be considered significantly less than that for "Method B," a result that is consistent with the result in Example 15.6. Since it is plausible that the test scores may actually be approximately normally distributed around the population mean, we would expect the t-test to be somewhat more reliable: when the assumptions underlying parametric tests are valid, such tests are more powerful than their non-parametric counterparts.

18.16 (i) The histogram and the box plot are shown respectively in Figs 18.9 and 18.10, from where it is abundantly clear that this data set is anything but normally distributed, primarily because of the very prominent outliers and the marked asymmetry.

(ii) A formal K-S normality test (carried out in MINITAB) produced the probability plot in Fig 18.11. With the indicated p-value of < 0.001, the conclusion is an emphatic rejection of the null hypothesis that the data set is normally distributed. On the other hand, an A-D normality test produced the results shown, along with a probability plot plus a 95% CI, in Fig 18.12. With p < 0.005, and multiple data points falling outside the 95% CI in the normal probability plot, the conclusion is to reject the null hypothesis that the price fluctuation data is normally distributed.

(iii) A histogram of the price fluctuation data after the outlier has been removed is shown in Fig 18.13, with the corresponding box plot shown in Fig 18.14. While the gross skewness evident in the earlier plots has now been somewhat muted, both the histogram and the box plot still display non-Gaussian characteristics, primarily in the form of persistent skewness (more evident in the histogram) and a few surviving outliers (still evident in the box plot).


[Figure: histogram of the price fluctuations data; x-axis: Price Fluctuations (−0.3 to 0.05), y-axis: Frequency]

Figure 18.9: Histogram of price fluctuations data.

[Figure: box plot of the price fluctuations data, values ranging from approximately −0.25 to 0.05]

Figure 18.10: Box plot of price fluctuations data.




[Figure: normal probability plot of PriceFluctuations; legend: Mean −0.009375, StDev 0.06271, N 20, KS 0.351]

[. . .]

Test of ETA1 = ETA2 vs ETA1 > ETA2 is significant at 0.0001
The test is significant at 0.0001 (adjusted for ties)

With p = 0.0001, the conclusion is that there is strong evidence in support of rejecting the null hypothesis in favor of the alternative. Thus, according to the MWW test, by Period II, there has indeed been a significant improvement in the number of accidents occurring at this facility. Strictly speaking, however, this is actually not a valid application of the MWW test: recall that this test is applicable to continuous distributions, which is clearly not the case for such Poisson-distributed data.

18.18 The objective is to determine which of the car brands, A or B, experiences fewer breakdowns. First, the supplied data (number of breakdowns) is discrete, and quite likely Poisson-distributed. Nevertheless, it may be possible to employ a 2-sample t-test on the grounds that the population means will be reasonably normally distributed for large sample sizes; unfortunately, the sample size here is small, 7 for each brand. The alternative is the non-parametric equivalent, the MWW test; however, this test is really only appropriate for data drawn from continuous distributions, which is not the case here. Nevertheless, these two are the most relevant, if technically not the most "appropriate," tests. A two-sample t-test will involve the following hypotheses:

H0: µA − µB = 0     (18.6)
Ha: µA − µB > 0     (18.7)

which will indicate that brand B should be selected if there is sufficient evidence to reject the null hypothesis that there is no difference between the mean number of breakdowns, µA and µB , experienced by the different brands. The result of such a test is shown below:



Two-Sample T-Test and CI: BrandA, BrandB
Two-sample T for BrandA vs BrandB
         N   Mean  StDev  SE Mean
BrandA   7  11.71   1.60     0.61
BrandB   7   8.71   2.50     0.94
Difference = mu (BrandA) - mu (BrandB)
Estimate for difference: 3.00
95% lower bound for difference: 0.97
T-Test of difference = 0 (vs >): T-Value = 2.67  P-Value = 0.012  DF = 10

The indicated p-value of 0.012 implies that, at the α = 0.05 significance level, there is evidence to reject the null hypothesis in favor of the alternative. The decision will therefore be to select brand B. On the other hand, an MWW test will involve the following hypotheses:

H0: ηA − ηB = 0     (18.8)
Ha: ηA − ηB > 0     (18.9)

which is similar to the t-test hypotheses, but with the means replaced by the medians. As such, this test will also indicate that brand B should be selected if there is sufficient evidence to reject the null hypothesis that there is no difference between the median number of breakdowns, ηA and ηB, experienced by the different brands. The result of such a test is shown below.

Mann-Whitney Test and CI: BrandA, BrandB
         N  Median
BrandA   7  12.000
BrandB   7   9.000
Point estimate for ETA1-ETA2 is 3.000
95.9 Percent CI for ETA1-ETA2 is (1.001,5.000)
W = 70.5
Test of ETA1 = ETA2 vs ETA1 > ETA2 is significant at 0.0127
The test is significant at 0.0120 (adjusted for ties)

Remarkably (and purely coincidentally), this test returns a p-value (0.012, adjusted for ties) that is identical to the one obtained from the t-test, similarly indicating that brand B should be selected. Thus, even though there are genuine reasons for apprehension concerning the strict applicability of the two tests employed for this problem, at least the two results are consistent with each other. A box plot of the data (not shown) seems to lend support to the validity of these results: brand B does indeed appear to experience fewer breakdowns.
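The one-sided versions of both tests are obtained with the alternative argument; an illustrative Python/scipy sketch (brand_a and brand_b are hypothetical stand-ins for the two columns of breakdown counts):

from scipy.stats import ttest_ind, mannwhitneyu

t_res = ttest_ind(brand_a, brand_b, equal_var=False, alternative='greater')
u_res = mannwhitneyu(brand_a, brand_b, alternative='greater')
print(t_res.pvalue, u_res.pvalue)   # p-values near 0.012 in both cases,
                                    # pointing to the selection of brand B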

18.19 (i) The result of the required A-D test is shown in the probability plot of Fig 18.17. The fact that p > 0.250 for the A-D test indicates that there is no evidence to support rejection of the null hypothesis (that the data is gamma distributed). The gamma distribution postulate suggested on phenomenological grounds is thus validated.

[Figure: gamma probability plot (95% CI) of JanPapers; legend: Shape 3.577, Scale 2.830, N 85, AD 0.326, P-Value >0.250; y-axis: Percent]

Figure 18.17: Gamma probability plot of the “January papers” data along with the results of the A-D test. The test result, p > 0.250, indicates that there is no evidence to support rejection of the null hypothesis that, as suggested on phenomenological grounds, the data is gamma distributed.

(ii) The result of a sign test of the null hypothesis that the median is 8, against the alternative that it is not, is as follows:

Sign Test for Median: JanPapers
Sign test of median = 8.000 versus not = 8.000
            N  Below  Equal  Above       P  Median
JanPapers  85     34      0     51  0.0827   8.833

The indicated p-value of 0.0827 implies that at the α = 0.05 significance level, there is no evidence to support rejection of the null hypothesis in favor of the stated alternative. However, the result would be significant at the α = 0.10 level. Thus, while we will fail to reject the null hypothesis at the α = 0.05 significance level, we will reject it if we set α = 0.10.

(iii) When the alternative hypothesis is changed to Ha: η > 8.0, the result of the sign test is the same in every detail as obtained above, except for the single crucial difference that the p-value is now 0.0413, exactly half of the value obtained in the test in (ii) above. Consequently, the conclusion now is that at the α = 0.05 significance level, there is sufficient evidence to reject the null hypothesis in favor of this new alternative. In reconciling this result with the one obtained earlier, we must note two facts: (a) the two-sided test in (ii) does not make a distinction between a true population median that is higher than the postulated value of 8, and one that



is lower. Consequently, the associated p-value will naturally be higher than the value obtained for a one-sided test; (b) with the data median determined as 8.833, should the null hypothesis prove to be invalid, it is more likely that the true population median will be greater than 8. As a result, by posing the problem in this fashion (with Ha: η > 8.0), the Editor-in-Chief is able to determine whether or not the median time to publication is longer than presumed, rather than merely that the presumption of 8 is inaccurate. The latter alternative, Ha: η > 8.0, is thus more relevant to the problem at hand. This is an example of a problem for which the two-sided test is not as meaningful as the appropriately formulated one-sided test.
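The exact halving of the p-value follows directly from the binomial computation underlying the sign test; a quick Python/scipy check, using the 51 of 85 observations that lie above the postulated median:

from scipy.stats import binomtest

p_two = binomtest(51, 85, p=0.5, alternative='two-sided').pvalue  # ~0.0827
p_one = binomtest(51, 85, p=0.5, alternative='greater').pvalue    # ~0.0413
print(p_two, p_one)   # the symmetric Binomial(85, 0.5) null makes the
                      # one-sided p-value exactly half the two-sided value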

18.20 (i) The probability plots (with 95% confidence intervals) and the results of the A-D tests are shown in Figs 18.18 and 18.19. First, a visual inspection of these probability plots indicates that the data sets are not normally distributed. Quantitatively, we observe that the p-values associated with the A-D tests are both less than 0.005. The implication is clearly that, in each case, there is conclusive evidence to support rejection of the null hypothesis (that each data set is normally distributed) at the α = 0.05 significance level, in favor of the alternative (that the data set is not normally distributed).

[Figure: normal probability plot (95% CI) of XRDA; legend: Mean 105.1, StDev 2.810, N 30, AD 4.181]