Fundamentals of Statistics and Probability Theory: A Tutorial Approach vol 2. Statistics [2] 9781493793457

Welcome... Fundamentals of Statistics & Probability Theory, a two volume textbook tutorial created by Howard Dachsl

English Pages [551] Year 2018

Table of contents :
Title Page
Table Of Contents
Inference Theory
Lesson 28 The Central Limit Theorem
Lesson 29 Estimating the Mean µ of a Population
Lesson 30 Statistical Hypotheses
Lesson 31 Hypothesis Testing
Lesson 32 Type I Error
Lesson 33 Type II Error
Lesson 34 Combining Type I and Type II Errors
Lesson 35 The Distribution of P
Lesson 36 Estimating The Proportion p of a Population
Lesson 37 Decision Theory Using P
Lesson 38 The Distribution of Differences of Sample Means
Lesson 39 The Distribution of Differences of Sample
Lesson 40 Small Sampling Theory
Lesson 41 The F Distribution
Lesson 42 Analysis of Variance
Lesson 43 The Chi-Square Distribution
Lesson 44 Correlation and Regression Analysis I
Lesson 45 Correlation and Regression Analysis II
Lesson 46 Non parametric Statistics
Table C The Standard Normal Distribution
Table D The t-Distribution table
Table E F - Distribution
Table F The Chi-Square Distribution
Appendix A Review Frequency Distributions
Appendix B Review Averages
Appendix C Review Measuring Variation
Appendix D Review The Standard Normal Distribution

Fundamentals of Statistics & Probability Theory A Tutorial Approach Vol. 2 Statistics Howard Dachslager, Ph.D. Irvine Valley College

PATHWAYS TO CLEAR LEARNING Learning Step by Step

FUNDAMENTALS OF Statistics & Probability Theory A Tutorial Approach Vol. 2 Statistics Howard Dachslager, Ph.D. Copyright © 2012 by Howard Dachslager. All rights reserved. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception that the program listings may be entered, stored, and executed in a computer system, but they may not be reproduced for publication. Published in the United States of America.

Published by Path Ways To Clear Learning Telephone (949) 375-1675 Web Site: PathwaysToClearLearning.com E-Mail address: [email protected]

ISBN: 978-1493793457

To My Dearest Friends Frankie Besch & Albert Murtz

Table Of Contents

VOLUME I PROBABILITY THEORY

Descriptive Statistics
Lesson 1 Frequency Distributions
Lesson 2 Averages
Lesson 3 Measuring Variation

Set Theory
Lesson 4 Basic Concepts
Lesson 5 Set Operations
Lesson 6 Venn Diagrams
Lesson 7 Cardinality of a Set
Lesson 8 Sample Spaces & Events
Lesson 9 Boolean Algebra of Sets

Probability Theory
Lesson 10 Probability Sample Space
Lesson 11 Sampling Methods
Lesson 12 Conditional Probability
Lesson 13 Bayes Theorem
Lesson 14 Random Variables
Lesson 15 Expectation of a Random Variable
Lesson 16 Variance of a Random Variable
Lesson 17 Factorials & Counting Samples
Lesson 18 The Binomial Sample Space
Lesson 19 The Binomial Random Variable
Lesson 20 The Binomial Distribution
Lesson 21 The Binomial Distribution Table
Lesson 22 The Poisson Distribution
Lesson 23 The Poisson Distribution Table
Lesson 24 The Poisson Approximation to the Binomial Distribution
Lesson 25 The Normal Distribution
Lesson 26 Applications Using The Normal Distribution
Lesson 27 The Normal Approximation To The Binomial Distribution
TABLE A The Cumulative Binomial Distribution
TABLE B The Poisson Distribution
TABLE C The Standard Normal Distribution

VOLUME II STATISTICS

Inference Theory
Lesson 28 The Central Limit Theorem
Lesson 29 Estimating the Mean µ of a Population
Lesson 30 Statistical Hypotheses
Lesson 31 Hypothesis Testing
Lesson 32 Type I Error
Lesson 33 Type II Error
Lesson 34 Combining Type I and Type II Errors
Lesson 35 The Distribution of P
Lesson 36 Estimating The Proportion p of a Population
Lesson 37 Decision Theory Using P
Lesson 38 The Distribution of Differences of Sample Means
Lesson 39 The Distribution of Differences of Sample
Lesson 40 Small Sampling Theory
Lesson 41 The F Distribution
Lesson 42 Analysis of Variance
Lesson 43 The Chi-Square Distribution
Lesson 44 Correlation and Regression Analysis I
Lesson 45 Correlation and Regression Analysis II
Lesson 46 Non parametric Statistics
TABLE C The Standard Normal Distribution
TABLE D The t-Distribution table
TABLE E F - Distribution
TABLE F The Chi-Square Distribution
Appendix A Review Frequency Distributions
Appendix B Review Averages
Appendix C Review Measuring Variation
Appendix D Review The Standard Normal Distribution

Complete solutions to all unsolved problems and supplementary problems are available on the web site: http://www.PathwaysToClearLearning.com

“Brevity is the soul of wit” William Shakespeare Hamlet Act 2, Scene 2

Preface

How to Use this Book

This book is designed to tutor the reader in a first course in statistics and probability theory. It covers all the major topics. Each topic is covered by several lessons. For each lesson, the reader is presented with exercises as examples and problems. For each example, its solution is worked out step by step in detail. Following the examples, solved problems that mimic the examples are also worked out in similar detail. Finally, to test the reader’s understanding, unsolved problems that mimic the examples as well as the solved problems are presented with answers. At the end of each lesson, a set of supplementary problems is given. These problems are more challenging than those in the main body of the tutorial. Detailed solutions of the unsolved problems and supplementary problems can be found in the solution manual.

To see how the tutorial works, let us look at a section of this book. In the book, turn to page 47, Lesson 4, section 1. Let us look at 4.1 - Example 1 (copied below). We see that this example comes with a solution. Next go to the bottom of page 47 for 4.1 - Solved Problem 1 (copied below). We see that this problem also comes with a solution (top of page 48). Note that 4.1 - Example 1 and 4.1 - Solved Problem 1 are the same type of problem and have the same type of solution. Next go to page 48 for 4.1 - Problem 1 (copied below). Here only the answer is given. This problem is also similar to 4.1 - Example 1 and 4.1 - Solved Problem 1. The line "Refer back to 4.1 - Example 1 & 4.1 - Solved Problem 1." reminds the reader to go back to 4.1 - Example 1 and 4.1 - Solved Problem 1 for help in solving the unsolved problem.

Throughout the entire book we use this tutorial method for each example and associated problems.

Why Study Statistics?

Statistics is used to establish meaningful relationships between data and events. The subject can be divided into three areas of study:
• Descriptive Statistics
• Probability Theory
• Inference Theory
We begin this discussion with examples that show what statistics can do and finish with examples that show what statistics cannot do.

What Statistics Can Do.

Descriptive Statistics
Descriptive statistics is used extensively in collecting, organizing and applying data.

Example: Ms. Smith teaches a class in Ancient Greek history at a local senior center. To get a better understanding of her students, she has them fill out a questionnaire about their personal and academic background. Questions such as the student’s age, gender, income, and academic background are asked. This information will be organized in a way that gives Ms. Smith a better understanding of her students.

Example: Mr. Fuente is the track coach at a local high school. At the beginning of each academic year, he has several male and female students try out for the varsity teams. To qualify, the students first need to complete a background questionnaire about themselves and make several runs on the track to determine their speed and endurance at different distances. All the information about the students is collected, organized and evaluated. From this data, students will be selected to participate on the varsity team.

Probability Theory
The tools of probability theory can be used to show and interpret interesting relationships between several events.

Example: On the evening weather news, it was reported that there is a 60 percent chance that it will rain over the weekend. Several questions can be raised about this forecast:
1. How did this number come about?
2. What exactly does 60 percent mean?
Probability theory will help in answering these types of questions.

Example: Mr. Jones is playing a card game. His hand has one king. To win he needs to draw two more kings. What are his chances? Questions like this are not hard to answer when the tools of probability theory are properly applied.

Example: According to a recent study, 7.0% of the population has a lung disease. Of those people who have a lung disease, 90% smoke. Suppose a person who smokes is selected at random from the population. Can we conclude that there is a 90% chance that he or she has a lung disease? Most people would conclude that there is a 90% chance, or at least a very high chance, that the person selected has a lung disease. Proper use of probability theory will show this not to be the case. In fact, one can show that the chance is significantly less than 90%.
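The lung disease example can be checked with Bayes' theorem once the proportion of smokers in the population is known. The study quoted in the example does not give that proportion, so the sketch below assumes, purely for illustration, that 25% of the population smokes. The name `p_smoker` and its value are hypothetical and are not taken from the text.

```python
# Figures taken from the example in the text.
p_disease = 0.07               # P(lung disease)
p_smoker_given_disease = 0.90  # P(smoker | lung disease)

# ASSUMPTION (not in the text): 25% of the whole population smokes.
p_smoker = 0.25

# Bayes' theorem:
# P(disease | smoker) = P(smoker | disease) * P(disease) / P(smoker)
p_disease_given_smoker = p_smoker_given_disease * p_disease / p_smoker

print(f"P(lung disease | smoker) = {p_disease_given_smoker:.3f}")
```

Under this assumed smoking rate the chance comes out at roughly one in four, far below 90%. A different assumed value for `p_smoker` changes the number but not the conclusion, as long as smokers are much more common than people with a lung disease.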

Inference Theory
Inference theory is nothing more than drawing certain conclusions about a large population from samples taken from this population. The proper application of inference theory is perhaps the most important discipline of statistics.

Example: The president of a local chamber of commerce wants to find out the opinions of Orange County residents on the building of a new shopping center. Since asking all the residents is not possible, he takes a sample of 200 residents at random and finds that 52% would support the new shopping center. From this poll several questions could be raised:
1. Can he conclude from the results of the sample that 52% of all residents are in support of a new shopping center?
2. Can he use the figure 52% to estimate the true percentage of residents that support a new shopping center?
3. If he decides to use this figure of 52% as an argument that the majority of the residents support a new shopping center, how far off can his claim be? In fact, what is the chance that a majority do not support a new shopping center?
The application of inference theory will answer many of these types of questions.

Example: A manufacturer has recently developed a new gasoline additive. It claims that, when the additive is added to the fuel tank, the average car will experience a 15% increase in fuel mileage. Assume a consumer group decides to challenge the company’s claim. They run a test on ten automobiles, where five have the additive in their tanks and the others do not. From this test they discover that the cars with the additive did only 10% better than the cars that did not have the additive. Several conclusions can be drawn:
1. The claim of the manufacturer is false.
2. The claim of the manufacturer is not necessarily false, due to uncontrolled circumstances that could have occurred in the test.
3. The manufacturer’s claim is true, since 10% is “close” to 15%.
Whichever of these three conclusions is decided upon, there is always a possibility that an error is made and the conclusion is not true. For example, assume that from the data collected the consumer group rejects the claim of the manufacturer. Even so, there is still a chance that the claim of the manufacturer is valid. Questions such as these are dealt with using inference theory.

What Statistics Cannot Do. Over the years statistics has been successful in solving many of the above questions. Using statistics to establish relationships between several events is a very important part of statistics. However, one serious flaw in the application of statistics is concluding that there are causal relationships between events. Such a use of statistics is highly questionable.

Example: Some recent studies showed that, with all else being equal, married men live longer than single men. Can we conclude that if a man marries, he has a better chance of living longer than if he remains single? If we answer yes to this question, then we assume there is a causal relationship between being married and living longer. The following could be presented to support such a conclusion:
1. Since a married man eats at home, his nutrition is better. Therefore, a married man lives a healthier life style.
2. Because of his responsibilities to his family, he is motivated to live a healthier life style.
3. To help with the family responsibility, his wife will pay close attention to his well being.
Can one present a non causal relationship? Consider the following argument: When selecting a husband, most single women will prefer a single man who is healthy and lives a healthy life style over a man who is not healthy or lives a life style not acceptable to a woman. Therefore, this selection process assures that a higher percentage of men selected for marriage are already healthy and therefore will have a higher chance of living longer than single men. Which is correct? Deciding is easier said than done. The one thing we can safely conclude is that the collection of data and statistical analysis does not give us a clear answer.

Example: The result of a study was released by a large Eastern university. The longevity of one thousand adults who exercised vigorously each week was compared to the longevity of one thousand adults who were sedentary in their life style. The conclusion from this study was that a life style of vigorous exercise was important in living longer. One is tempted to draw a causal relationship between vigorous exercise and longevity. However, is it possible that this study does not warrant such a conclusion?
The following is a simple explanation that points out the flaw in concluding that vigorous exercise causes one to live longer: Adults who are born with good health have a greater interest in exercising than those who are born with poor health. For example, if one is born with asthma, there is less chance that this person will do much vigorous exercise. In fact, perhaps it can be argued that people with asthma simply do not live as long. One can certainly argue both cases. However, it is difficult to see how statistics can determine such causality. As in beauty, causality lies in the eyes of the beholder.

STATISTICAL INFERENCE THEORY Lesson 28 The Central Limit Theorem

A population is a collection of numeric data. Each population can be considered a probability sample space with a given distribution. Inference theory allows one to take an appropriate sample from a given population and, from this sample, make specific judgements about the entire population. For example, assume you are a reporter on the newspaper of a local community college whose enrollment is twenty thousand, and your assignment is to find the average age μ of all the enrolled students. There are two ways you could proceed:
1. By some means, find the age of each of the 20,000 students and average their ages. If this can be done, then you would have computed μ.
2. Take a representative sample of, say, 100 students. Ask each of these students their age and record it. From this sample you can compute the average age X̄.
If the second method is used, then you will use the value X̄ in place of the average μ of the whole population. Such a process is making an inference about the mean μ of the whole population, the entire student body. In order to use X̄ as an estimator we need the central limit theorem, which allows us to examine the distributions of X̄, P and other distributions.
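The two methods can be compared in a small simulation. The sketch below is only an illustration: the population of 20,000 ages is hypothetical (ages drawn uniformly between 17 and 45, which is an assumption and not data from the text), and the variable names are made up for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# HYPOTHETICAL population: 20,000 student ages, uniform between 17 and 45.
population = [random.randint(17, 45) for _ in range(20_000)]

# Method 1: average the ages of all 20,000 students to get mu.
mu = statistics.mean(population)

# Method 2: average the ages of a random sample of 100 students.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"population mean mu = {mu:.2f}")
print(f"sample mean X-bar  = {x_bar:.2f}")
print(f"difference         = {x_bar - mu:+.2f}")
```

Changing the seed gives a different sample and a different sample mean. The central limit theorem describes how these sample means are distributed around μ.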

28.1 - What is the Central Limit Theorem for X̄?
Let {Xk} be a sequence of mutually independent random variables¹ with a common distribution, generated from a sample of size n drawn from a population. Suppose that μ = E(Xk) and σ² = Var(Xk) are finite. Define the random variable:

$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

The central limit theorem states the following about the distribution of the random variable X̄:
1. For a large sample (n ≥ 30), X̄ is approximately normally distributed.
2. The mean of X̄ is $\mu_{\bar{X}} = \mu$.
3. The standard deviation of X̄,

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

is called the standard error of the mean.
4. If σ is known, the distribution of

$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

is approximately normal with mean 0 and standard deviation 1.
5. If σ is not known and s is the standard deviation of the sample, we use s in place of σ and $\frac{s}{\sqrt{n}}$ in place of $\sigma_{\bar{X}}$, and the distribution of

$z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

is approximately normal with mean 0 and standard deviation 1.

28.1 - Example 1: Past records of the student body at a large university show that the mean age is μ = 23.5 years with a standard deviation σ = 3.1 years. A sample of n = 100 students is taken at random. We define X̄ to be the average age of the sample. Find
(a). σ_X̄.
(b). the probability that the average age X̄ of this sample is at least 24 years.
(c). the probability that the average age X̄ is between 23.1 and 24.1 years.
(d). the probability that the average age X̄ is at most 23 years.
Solutions:
➤ (a). We are given that σ = 3.1 and the sample size taken is n = 100. From the Central Limit Theorem, we have

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3.1}{\sqrt{100}} = 0.31$

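➤ (b). Step 1: We use the formula

$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

to find the area under the normal distribution curve for X̄ = 24 (fig. 1: µ = 23.5, σ_X̄ = 0.31).
Step 2: $z = \frac{24 - 23.5}{0.31} \approx 1.61$
Step 3: From the normal distribution tables, P{X̄ ≥ 24} = 0.5 - 0.4463 = 0.0537.

➤ (c). We use the formula

$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

(fig. 2)
Step 1: For X̄ = 23.1, $z = \frac{23.1 - 23.5}{0.31} \approx -1.29$
Step 2: For X̄ = 24.1, $z = \frac{24.1 - 23.5}{0.31} \approx 1.94$
(fig. 3)
From the normal distribution table, the area is P{23.1 ≤ X̄ ≤ 24.1} = 0.4015 + 0.4738 = 0.8753.

➤ (d). We use the formula

$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

For X̄ = 23, $z = \frac{23 - 23.5}{0.31} \approx -1.61$
(fig. 4)
From the table, P{X̄ ≤ 23} = 0.5 - 0.4463 = 0.0537.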
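The four parts of 28.1 - Example 1 can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed table. The variable names are illustrative choices, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 23.5, 3.1, 100

# (a) standard error of the mean: sigma / sqrt(n)
se = sigma / sqrt(n)

# By the central limit theorem, X-bar is approximately normal(mu, se).
x_bar_dist = NormalDist(mu, se)

p_b = 1 - x_bar_dist.cdf(24)                        # (b) P{X-bar >= 24}
p_c = x_bar_dist.cdf(24.1) - x_bar_dist.cdf(23.1)   # (c) P{23.1 <= X-bar <= 24.1}
p_d = x_bar_dist.cdf(23)                            # (d) P{X-bar <= 23}

print(f"(a) standard error = {se:.2f}")
print(f"(b) {p_b:.4f}   (c) {p_c:.4f}   (d) {p_d:.4f}")
```

The results can differ from the table-based answers in the third or fourth decimal place, because the table method rounds z to two decimals before the area is looked up.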

28.1 - Example 2: A local fish packing company packs 50 gallon containers with 100 pounds of fish. Assume each month a government agency randomly selects 49 containers and computes the average weight. If the average weight of these containers is less than 100 pounds, then the company is fined. Find the value μ the company should strive for to ensure that it will not be fined more than 2% of the time. Assume a standard deviation σ = 5 pounds.
Solution:
Step 1: To solve for μ, we use the formula μ = X̄ - zσ_X̄.
Step 2: X̄ = 100
Step 3: $\sigma_{\bar{X}} = \frac{5}{\sqrt{49}} \approx 0.71$ (fig. 5)
Step 4: From the figure, we need to look up the area 0.48 in the normal distribution table: z = -2.05.
Step 5: μ = X̄ - zσ_X̄ = 100 - (-2.05)(0.71) ≈ 100 + 1.46 = 101.46 pounds.

28.1 - Example 3: The American Bubble Company recently purchased a new machine to fill bottles with 16 ounces of spring water. To check if the machine is filling the proper amount of water, they sample 100 bottles each hour. If the average fill of these bottles is less than c* ounces, then the machine is stopped and adjusted. Assuming σ = 0.5 ounces, find c* so that the chance the machine is stopped when properly functioning is 0.01.
Solution:
Step 1: To solve for c*, we use the formula c* = μ + zσ_X̄.
Step 2: μ = 16
Step 3: $\sigma_{\bar{X}} = \frac{0.5}{\sqrt{100}} = 0.05$ (fig. 6)
Step 4: From the figure, we need to look up the area 0.49 in the normal distribution table: z = -2.33.
Step 5: c* = μ + zσ_X̄ = 16 - 2.33(0.05) ≈ 15.88 ounces.
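Examples 2 and 3 both run the z formula backwards: the tail probability is given and the unknown is μ or c*. The sketch below is one way to reproduce them, assuming the inverse normal function `NormalDist().inv_cdf` from the Python standard library stands in for reading the table in reverse. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Example 2: choose mu so that P{X-bar < 100} = 0.02 (sigma = 5, n = 49).
se_fish = 5 / sqrt(49)
z_fish = std_normal.inv_cdf(0.02)        # negative, close to the table value -2.05
mu_target = 100 - z_fish * se_fish       # mu = X-bar - z * sigma_X-bar

# Example 3: choose c* so that P{X-bar < c*} = 0.01 when mu = 16
# (sigma = 0.5, n = 100).
se_water = 0.5 / sqrt(100)
z_water = std_normal.inv_cdf(0.01)       # negative, close to the table value -2.33
c_star = 16 + z_water * se_water         # c* = mu + z * sigma_X-bar

print(f"Example 2: mu = {mu_target:.2f} pounds")
print(f"Example 3: c* = {c_star:.2f} ounces")
```

Because `inv_cdf` does not round z to two decimals, the computed values agree with the worked answers only to within a few hundredths.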

Solved Problems
28.1 - Solved Problem 1: The average life of 100 watt light bulbs produced by a company is µ = 1,890 hours with a standard deviation σ = 150 hours. A sample of n = 400 of these bulbs is selected at random. Find
(a). σ_X̄.
(b). the probability that the average life of this sample is at most 1,900 hours.
(c). the probability that the average life is between 1,900 and 2,000 hours.
(d). the probability that the average life is greater than 1,950 or less than 1,875 hours.
Solutions:
➤ (a). We are given that σ = 150 and the sample size taken is n = 400. From the Central Limit Theorem, we have

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{150}{\sqrt{400}} = 7.5$

➤ (b). Step 1: We use the formula

$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

to find the area under the normal distribution curve for figure 7 (fig. 7: X̄ = 1,900, μ = 1,890, σ_X̄ = 7.50).
Step 2: $z = \frac{1900 - 1890}{7.5} \approx 1.33$
Step 3: From the normal distribution tables, P{X̄ ≤ 1900} = P{z ≤ 1.33} = 0.4082 + 0.5 = 0.9082.

➤ (c). From the normal distribution table (fig. 8),
Step 1: For X̄ = 1,900, $z = \frac{1900 - 1890}{7.5} \approx 1.33$
Step 2: For X̄ = 2,000, $z = \frac{2000 - 1890}{7.5} \approx 14.67$
Step 3: P{1900 ≤ X̄ ≤ 2000} = P{1.33 ≤ z ≤ 14.67} = 0.5 - 0.4082 = 0.0918

➤ (d). We use the formula

$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

Step 1: For X̄ = 1,950, $z = \frac{1950 - 1890}{7.5} = 8$
Step 2: For X̄ = 1,875, $z = \frac{1875 - 1890}{7.5} = -2$
(fig. 9)
P{X̄ ≥ 1950} + P{X̄ ≤ 1875} = P{z ≥ 8} + P{z ≤ -2} = 0 + 0.5 - 0.4772 = 0.0228

28.1 - Solved Problem 2: A machine is filling 1,000 cans hourly with 16 ounces of coffee. Each hour, a sample of 200 cans is randomly selected and checked for weight. If the average weight of these 200 cans is more than 16 ounces, the machine is stopped and adjusted. Assume a standard deviation σ = 1.5 ounces. To what value μ should the company set the machine to ensure that the process will be stopped no more than 5% of the time?
Solution:
Step 1: To solve for μ, we use the formula μ = X̄ - zσ_X̄

Step 2: X̄ = 16 (fig. 10)
Step 3: $\sigma_{\bar{X}} = \frac{1.5}{\sqrt{200}} \approx 0.106$
Step 4: From the figure, we need to look up the area 0.45 in the normal distribution table: z = 1.64.
Step 5: μ = X̄ - zσ_X̄ = 16 - (1.64)(0.106) ≈ 16 - 0.17 = 15.83 ounces.

28.1 - Solved Problem 3: The American Bubble Company recently purchased a new machine to fill bottles with 16 ounces of spring water. To check if the machine is filling the proper amount of water, they sample 100 bottles each hour. If the average fill of these bottles is more than c* ounces, then the machine is stopped and adjusted. Assuming σ = 0.5 ounces, find c* so that the chance the machine is stopped when properly functioning is 0.03.
Solution:
Step 1: To solve for c*, we use the formula c* = μ + zσ_X̄. (fig. 11)
Step 2: μ = 16
Step 3: $\sigma_{\bar{X}} = \frac{0.5}{\sqrt{100}} = 0.05$
Step 4: From the figure, we need to look up the area 0.47 in the normal distribution table: z = 1.88.
Step 5: c* = μ + zσ_X̄ = 16 + 1.88(0.05) = 16 + 0.094 = 16.094 ounces.

Unsolved Problems with Answers
28.1 - Problem 1: A machine bores holes averaging 1 cm in a metal plate, with a standard deviation of 0.01 cm. A sample of 100 plates is taken. Find
(a). σ_X̄.
(b). the probability that the average size hole for this sample is greater than 1.002 cm.
(c). the probability that the average size hole for this sample is between 1.002 and 1.003 cm.
(d). the probability that the average size hole for this sample is between 0.999 and 1.003 cm.
Answers: ➤ (a). σ_X̄ = 0.001 ➤ (b). 0.0228 ➤ (c). 0.0215 ➤ (d). 0.84
⇑ Refer back to 28.1 - Example 1 & 28.1 - Solved Problem 1.

28.1 - Problem 2: A local fish packing company packs 50 gallon containers with 100 pounds of fish. Assume each month the company randomly selects 36 containers and computes the average weight. If the average weight of these containers is more than 100 pounds, then the company has to repack the containers. Find the value μ that will cause the company to repack 10% of the time. Assume a standard deviation σ = 6 pounds.
Answer: µ = 98.72 pounds
⇑ Refer back to 28.1 - Example 2 & 28.1 - Solved Problem 2.

28.1 - Problem 3: A fishing company catches all its fish using nets. Government regulations require that the average length of a fish caught is 15 inches. After each catch, the company samples the length of 49 fish from its nets. If the average length is less than c* inches, all the fish are returned to the water. Assume on a given day that the average length of the catch is 15 inches with a standard deviation of 1.4 inches. Find c* so that the chance is only 5% that the whole catch will be returned to the water.
Answer: c* = 14.67 inches
⇑ Refer back to 28.1 - Example 3 & 28.1 - Solved Problem 3.
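After working the three unsolved problems by hand, the printed answers can be confirmed with a few lines of code. This is a minimal sketch that again assumes `statistics.NormalDist` as a stand-in for the normal table. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Problem 1: mu = 1 cm, sigma = 0.01 cm, n = 100.
se_hole = 0.01 / sqrt(100)                        # (a)
hole = NormalDist(1.0, se_hole)
p1_b = 1 - hole.cdf(1.002)                        # (b)
p1_c = hole.cdf(1.003) - hole.cdf(1.002)          # (c)
p1_d = hole.cdf(1.003) - hole.cdf(0.999)          # (d)

# Problem 2: P{X-bar > 100} = 0.10 with sigma = 6, n = 36.
mu_2 = 100 - std_normal.inv_cdf(0.90) * 6 / sqrt(36)

# Problem 3: P{X-bar < c*} = 0.05 with mu = 15, sigma = 1.4, n = 49.
c_3 = 15 + std_normal.inv_cdf(0.05) * 1.4 / sqrt(49)

print(f"Problem 1: {se_hole:.3f}, {p1_b:.4f}, {p1_c:.4f}, {p1_d:.2f}")
print(f"Problem 2: mu = {mu_2:.2f}")
print(f"Problem 3: c* = {c_3:.2f}")
```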

Supplementary Problems
1. The records of a local men's health club show that the average lifting weight is 178 pounds. A random sample of 100 club members shows that 40% of the men can lift more than 179 pounds. For all members, find the standard deviation σ.
2. A computer selects, with replacement, 36 numbers from the set {0, 1, 2, 3, 4..., 100}.
a. Using the formula

, find μ.

b. Using the formula

, find σ.

c. For this sample, find P{X̄ ≥ 60}.
d. If only one number is selected at random, find P{X ≥ 60}.
e. Use the Central Limit Theorem to find a sample size N for σ_X̄ = 4.86.
f. Find the smallest sample size where P{X̄ ≥ 60} = 0.01.
3. A college's records show that the grade point average (G.P.A.) of all female students is 2.95 with a standard deviation of 0.2, and that the G.P.A. of all male students is 2.94 with a standard deviation of 0.25. A random sample of 200 female students and 100 male students was taken. Find the probability
a. that the average G.P.A. of the sampled female students and male students is greater than 2.97.
b. that the average G.P.A. of the sampled female students or male students is greater than 2.97.
4. The American Bubble Company recently purchased a new machine to fill bottles with 16 ounces of spring water. To check if the machine is filling the proper amount of water, they sample 100 bottles each hour. If the average fill of these bottles is less than 15.85 ounces, then the machine is stopped and adjusted. Assuming σ = 0.7 ounces, find the probability that over a 5 hour period the machine will be stopped 1 time.

For any sequence of discrete random variables X1, X2, …, Xn, we define the joint distribution of any subset Xi, Xj,…, Xr as P{Xi = xk, Xj = xw,…, Xr = xt} = P[{Xi = xk}∩{Xj = xw}∩…∩{Xr = xt}].

5. A fair die is tossed twice. Let X1 equal the outcome on the first toss and X2 the outcome on the second toss.
a. Compute the distribution of X̄ by completing the following table:

b. Compute μ = E(X1), μ = E(X2) and E(X̄).
c. Compute σ²,
d. Show
e. Show

.

f. Compute P{X1 > 3.5} and P{X̄ > 3.5}.
6. Assume a binomial experiment with N independent trials where p is the probability of success on each trial.
a. Show μ = Np.

b.
7. If X and Y are two discrete, independent random variables, show E(XY) = E(X)E(Y).
8. A sequence of mutually independent random variables is called a Bernoulli sequence if P{Xk = 1} = pk and P{Xk = 0} = 1 - pk = qk (k = 1, 2,…, N).
a. If S = X1 + X2 + … + XN, show E(S) = p1 + p2 + … + pN.
b. Show,
9. Assume {Xk} (k = 1,…,n) is a sequence of random variables satisfying the Central Limit Theorem.
a. Show E(X̄) = μ.
b. Show . In lesson 16, problem 13 we showed .

10. Assume the following population Ω = {2,10}. A sample (with replacement) of size N = 30 is taken from this population, where P{Xk = 2} = 1/2 and P{Xk = 10} = 1/2 (k = 1,…,30).
a. Find μ and σ².
b. Define Ω as the population of all averages X̄ generated by all possible samples. Find the size of population Ω.
c. List all 31 distinct numbers of Ω.
d. Find the distribution of X̄ for the population Ω.
e. Find a summation formula for μ_X̄.
f. Using the central limit theorem, evaluate the sum in d.
g. Assume a sample of N = 30 is taken. From the distribution of X̄, find the probability that 4.9 ≤ X̄ ≤ 6.8.
h. Use the central limit theorem to approximate P{4.9 ≤ X̄ ≤ 6.8}.
11. Show the random variable

has mean μ = 0 and σ = 1.

12. Assume s is the standard deviation computed from a sample of size N. Find μ and σ of

where

.

13. In a small European country the law permits a maximum of 4 automobiles per family. Their department of transportation recently did a study and found the following distribution of the number of automobiles owned: 51% of the families own 1 automobile; 23% own 2 automobiles; 17% own 3 automobiles and 9% own 4 automobiles. Recently 100 families renewed their automobile registration. Find
a. μ.
b. σ_X̄.
c. For these 100 families, estimate the probability that on average they own at least 2 automobiles.
14. Assume the following game is played: a fair die is tossed once and the resulting value is recorded.
a. Write out the population.
b. Find μ and σ.
c. If this game is played 64 times, find the probability that the average score is between 4 and 5.
15. Assume the following game is played: five cards are drawn without replacement from an ordinary deck of cards and the number of diamonds is recorded.
a. Write out the population.
b. Find μ and σ.
c. If the game is played 100 times, find the probability that the average number of diamonds drawn is less than 2.

¹ See the supplementary problems in Lesson 15 for the definition.

Statistical Inference Theory Lesson 29 Estimating the Mean μ of a Population

Since μ generally is not known, one of the goals of inference theory is to use X̄ as an estimate of μ. There are two types of estimates: the point estimate and the interval estimate. In either type of estimate, X̄ is substituted in place of μ. This substitution creates an error.

29.1 - What is the error created when using a point estimate?
a. The following formula gives the error created when the standard deviation σ of the population is known:

$e^* = z\,\frac{\sigma}{\sqrt{N}}$

where σ is the standard deviation of the population and N is the sample size.

b. The following formula gives the error created when σ of the population is not known:

$e^* = z\,\frac{s}{\sqrt{N}}$

where s is the standard deviation of the sample of size N.

29.1 - Example 1: A large university wants to estimate the average age of its students. A random sample of size 100 is taken of the student body. From the sample, the average age is X̄ = 23.5 years and s = 2.1 years. Assume X̄ = 23.5 replaces μ.
(a). Find the probability that the error created exceeds 1/2 year.
(b). Find the minimum sample size so that the probability is 0.05 of making an error that exceeds 1/2 year.
Solutions:
➤ (a). It is almost certain that X̄ is smaller or larger than μ (fig. 1). The difference between X̄ and μ is the error e* = ±(X̄ - μ). We need to find the probability that the error e* exceeds 1/2 year.
Step 1: Since we only have the standard deviation of the sample, we use s = 2.1.
Step 2: The sample size is N = 100.
Step 3: $e^* = z\,\frac{s}{\sqrt{N}} = \pm 0.5$
Step 4: Solving for z gives $z = \frac{e^*\sqrt{N}}{s} = \frac{0.5\sqrt{100}}{2.1} \approx 2.38$
Step 5: From the normal distribution table (fig. 2): P{e* > 1/2} = 1 - 0.4913 - 0.4913 = 0.0174
➤ (b). Step 1: For the probability that the error exceeds 1/2 year to be 0.05, we find z for the area 0.5 - 0.05/2 = 0.475: z = 1.96.
Step 2: Since $e^* = z\,\frac{s}{\sqrt{N}}$, we have $0.5 = \frac{1.96(2.1)}{\sqrt{N}}$.
Step 3: Solving for N gives $\sqrt{N} = \frac{1.96(2.1)}{0.5} \approx 8.23$ and N ≈ 68, the minimum sample size.
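Both parts of 29.1 - Example 1 follow from the single relation e* = z·s/√N, solved once for z and once for N. The sketch below retraces the example with `statistics.NormalDist` in place of the table. The variable names are illustrative, and rounding N up with `ceil` is this sketch's reading of "minimum sample size".

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()
s, n, e_star = 2.1, 100, 0.5

# (a) Solve e* = z * s / sqrt(N) for z, then add the two tail areas.
z = e_star * sqrt(n) / s
p_error_exceeds = 2 * (1 - std_normal.cdf(z))

# (b) A two-tailed probability of 0.05 leaves 0.025 in each tail.
z_05 = std_normal.inv_cdf(1 - 0.05 / 2)       # close to the table value 1.96
n_min = ceil((z_05 * s / e_star) ** 2)        # solve e* = z * s / sqrt(N) for N

print(f"(a) z = {z:.2f}, P(error > 0.5) = {p_error_exceeds:.4f}")
print(f"(b) minimum sample size N = {n_min}")
```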

Solved Problems
29.1 - Solved Problem 1: A company that manufactures a new gasoline additive is interested in testing the additive to determine the average additional mileage it will give to consumers. It selects 36 different cars and runs each car for 100 miles. The final results showed that the average increase in mileage was 2.1 miles per gallon with a standard deviation of s = 0.5 miles per gallon. Assume X̄ = 2.1 replaces μ.
(a). Find the probability that the error created exceeds 0.1 miles per gallon.
(b). Find the minimum sample size so that the probability is 0.02 of making an error that exceeds 0.1 miles per gallon.
Solutions:
➤ (a). It is almost certain that X̄ is smaller or larger than μ (fig. 3). The difference between X̄ and μ is the error e* = ±(X̄ - μ). We need to find the probability that the error exceeds 0.1 miles per gallon.
Step 1: Since we only have the standard deviation of the sample, we use s = 0.5.
Step 2: The sample size is N = 36.
Step 3: $e^* = z\,\frac{s}{\sqrt{N}} = \pm 0.1$ miles per gallon.
Step 4: Solving for z gives $z = \frac{e^*\sqrt{N}}{s} = \frac{0.1\sqrt{36}}{0.5} = 1.20$.
Step 5: From the normal distribution table (fig. 4): P{e* > 0.1} = 1 - 0.3849 - 0.3849 = 0.2302.
➤ (b). Step 1: For the probability that the error exceeds 0.1 miles per gallon to be 0.02, we find z for the area 0.5 - 0.02/2 = 0.49: z = 2.33.
Step 2: $e^* = z\,\frac{s}{\sqrt{N}}$, so $0.1 = \frac{2.33(0.5)}{\sqrt{N}}$.
Step 3: Solving for N gives $\sqrt{N} = \frac{2.33(0.5)}{0.1} = 11.65$. Hence N ≈ 136, the minimum sample size.

Unsolved Problems with Answers
29.1 - Problem 1: A machine fills bottles with orange juice with a standard deviation of σ = 1.5 ounces. Each hour a sample of 50 filled bottles is taken. Assume the average from this sample is 12.3 ounces.
(a). Find the probability that the error from the true average fill exceeds 0.5 ounces.
(b). Find the minimum sample size so that the probability is 0.01 of making an error that exceeds 0.5 ounces.
Answers: ➤ (a). 0.02 ➤ (b). N ≈ 60
⇑ Refer back to 29.1 - Example 1 & 29.1 - Solved Problem 1.

29.2 - What is the error created when using a Confidence Interval estimate? The interval estimate of μ is called the confidence interval, given by the following formulas: 1. If σ of the population is known:

X - z(σ/√N) ≤ μ ≤ X + z(σ/√N)

2. If σ is not known, then use s, the standard deviation of the sample:

X - z(s/√N) ≤ μ ≤ X + z(s/√N)

The value z is determined by the desired level of confidence that μ lies within the given interval. 29.2 - Example 1: A large university wants to estimate the average age of its students. A random sample of size 100 is taken of the student body. From the sample, the average age is X = 23.5 years and s = 2.1 years. (a). Find a 90% confidence interval for μ. (b). Find a 95% confidence interval for μ. Solutions: ➤ (a). Since the confidence level is 90%, we use the area 0.90/2 = 0.45 to find z = 1.64. fig. 5

Step 1: s = 2.1 Step 2: N = 100 Step 3: s/√N = 2.1/√100 = 0.21 Step 4: X - z(s/√N) ≤ μ ≤ X + z(s/√N):

23.5 - 1.64(0.21) ≤ μ ≤ 23.5 + 1.64(0.21) which gives 23.16 ≤ μ ≤ 23.84. Step 5: With 90% confidence, μ lies between 23.16 and 23.84. ➤ (b). Since the confidence level is 95%, we use the area 0.95/2 = 0.475 to find z = 1.96. Step 1: s = 2.1 fig. 6

Step 2: N = 100 Step 3: s/√N = 2.1/√100 = 0.21 Step 4: 23.5 - 1.96(0.21) ≤ μ ≤ 23.5 + 1.96(0.21) which gives 23.09 ≤ μ ≤ 23.91. Step 5: With 95% confidence, μ lies between 23.09 and 23.91.
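The interval calculation above can be sketched in Python as follows. This is only an illustration under the same large-sample assumption (s used in place of σ); the function name confidence_interval is hypothetical. Because the code computes z exactly (for example 1.645 instead of the table value 1.64), the end points may differ from the worked answers by about 0.01.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, s, n, confidence):
    """Return (lower, upper) for mu: x_bar -/+ z * s / sqrt(n)."""
    # z leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Values from 29.2 - Example 1: X = 23.5, s = 2.1, N = 100.
for level in (0.90, 0.95):
    low, high = confidence_interval(23.5, 2.1, 100, level)
    print(f"{level:.0%}: {low:.2f} <= mu <= {high:.2f}")
```

Note that a higher confidence level gives a larger z and therefore a wider interval, as the two parts of the example show.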

Solved Problems 29.2 - Problem 1: A company that manufactures a new gasoline additive is interested in testing the additive to determine the average additional mileage it will give to consumers. It selects 36 different cars and runs each car for 100 miles. The final results showed that the average increase of mileage was 2.1 miles per gallon with a standard deviation of s = 0.2 miles per gallon. (a). Find a 92% confidence interval for μ. (b). Find a 99% confidence interval for μ. Solutions: ➤(a).

Since the confidence level is 92%, we use the area 0.92/2 = 0.46 to find z = 1.75. Step 1: X = 2.1 and s = 0.2 fig. 7

Step 2: N = 36 Step 3: e* = z(s/√N) = 1.75(0.2/√36) = 1.75(0.033) = 0.058 Step 4: 2.1 - 0.058 ≤ μ ≤ 2.1 + 0.058 which gives 2.04 ≤ μ ≤ 2.16. Step 5: With 92% confidence, μ lies between 2.04 and 2.16. ➤ (b). Since the confidence level is 99%, we use the area 0.99/2 = 0.495 to find z = 2.57. Step 1: X = 2.1 and s = 0.2 Step 2: N = 36

Step 3: e* = z(s/√N) = 2.57(0.2/√36) = 2.57(0.033) Step 4: 2.1 - 2.57(0.033) ≤ μ ≤ 2.1 + 2.57(0.033) which gives 2.02 ≤ μ ≤ 2.18. Step 5: With 99% confidence, μ lies between 2.02 and 2.18. fig. 8

Unsolved Problems with Answers 29.2 - Problem 1: A machine fills bottles with orange juice with a standard deviation of σ = 1.5 ounces. Each hour a sample of 50 filled bottles is taken. If the average from this sample is 12.3 ounces, (a). Find a 90% confidence interval for μ. (b). Find a 98% confidence interval for μ. Answers: ➤ (a). 11.96 ≤ μ ≤ 12.64 ➤ (b). 11.81 ≤ μ ≤ 12.79 ⇑ Refer back to 29.2 - Example 1 & 29.2 - Solved Problem 1.

29.3 - Determining the Sample Size. In the first section of this lesson, for each example and problem, we derived an estimate for the minimum sample size needed so that a given error is exceeded only with a specified probability. In this section we give the formula needed to derive the same minimum sample size for a given confidence level:

N = (z·σ/e*)²

where e* is the error, z is determined by the confidence of the estimate, and σ is the standard deviation of the population. 29.3 - Example 1: A machine fills bottles with orange juice with a standard deviation of σ = 1.5 ounces. Each hour a sample of filled bottles is to be taken. Find the sample size required to assure, with 90% confidence, that the estimate of μ will not be off by more than (a). 0.1 ounces. (b). 0.2 ounces. Solutions: ➤ (a). Since we want a confidence of 90%, we look up in the normal distribution table the area 0.45. This gives z = 1.64. From the statement of the problem, e* = 0.1 and σ = 1.5. Therefore,

N = (1.64(1.5)/0.1)² = (24.6)² = 605.16, so rounding up gives N ≈ 606.

➤ (b). Since we want a confidence of 90%, we look up in the normal distribution table the area 0.45. This gives z = 1.64. From the statement of the problem, e* = 0.2 and σ = 1.5. Therefore,

N = (1.64(1.5)/0.2)² = (12.3)² = 151.29, so rounding up gives N ≈ 152.

Solved Problems 29.3 - Solved Problem 1: A medical journal needs a sample to determine the true average length of time it takes for male patients to recover from heart surgery. They first did a preliminary study and found the standard deviation to be 72 days. For a confidence of 95%, find the sample size needed to estimate the true mean within (a). 2 days. (b). 10 days. Solutions: ➤(a). Since we want a confidence of 95%, we look up in the normal distribution table the area 0.475. This gives z = 1.96. From the statement of the problem, e* = 2 and σ = 72. Therefore,

N = (1.96(72)/2)² = (70.56)² = 4978.71, so rounding up gives N ≈ 4,979.

➤(b). Since we want a confidence of 95%, we look up in the normal distribution table the area 0.475. This gives z = 1.96. From the statement of the problem, e* = 10 and σ = 72. Therefore,

N = (1.96(72)/10)² = (14.112)² = 199.15, so rounding up gives N ≈ 200.

Unsolved Problems with Answers 29.3 - Problem 1: Over the years a large liberal arts college claims that the standard deviation of their students' grade point average is 0.3. For a confidence of 99%, find the sample size needed to estimate the true mean within (a). 0.05 points. (b). 0.10 points. Answers: ➤(a). N ≈ 239 students ➤(b). N ≈ 59 students ⇑ Refer back to 29.3 - Example 1 & 29.3 - Solved Problem 1.
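The sample-size calculation of this section can be sketched in Python as shown below. It is an illustration only: the function name minimum_sample_size is hypothetical, z is computed exactly and not read from a table, and the result is always rounded up to the next whole number, which is an assumption about how a fractional N should be treated. For these reasons the output can differ by a few units from answers obtained with two-decimal table values of z.

```python
from math import ceil
from statistics import NormalDist


def minimum_sample_size(sigma, e_star, confidence):
    """Smallest whole N for which z * sigma / sqrt(N) <= e_star."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / e_star) ** 2)   # round up: N must be a whole number


# Values from 29.3 - Solved Problem 1: sigma = 72 days, 95% confidence.
for e_star in (2, 10):
    print(e_star, minimum_sample_size(72, e_star, 0.95))
```

Because e* appears squared in the denominator, allowing an error five times larger (10 days instead of 2) reduces the required sample size by a factor of about 25.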

Supplementary Problems 1. A fair coin is tossed 100 times resulting in the value X, which is the number of heads that occurs in the sample. Assume X is used as the estimator of the true average number of heads that should appear in the sample. a. Find the probability that the error created will be 10 heads or more. b. Find the range of X values that can occur with approximate probability 0.90. 2. A recent study of medical schools in California showed that 60% of all medical students are female. A random survey of 200 medical students was taken. Let X be the number of female students in the sample. Assume X is used as the estimator of the true average number of females that should result from this sample. a. Find the probability that X will deviate from μ by 20 females or more. b. Find the range of X values that can occur with approximate probability 0.95. 3. The Clear Water Bottling Company has a machine that fills bottles with spring water distributed according to a normal distribution, where μ = 16.2 ounces and σ = 0.15 ounces. A bottle is said to be under-filled if it contains less than 16 ounces. A random sample of 200 filled bottles is taken and checked for the quantity of spring water in each bottle. Let X be the number of bottles in this sample that contain less than 16 ounces. a. Find the probability that X will deviate from μ by at least 5 under-filled bottles. b. Find the range of X values that can occur with probability 0.90.

4. A Las Vegas casino tested a recently purchased machine that shuffles a card deck containing 52 cards. To test the machine for randomness, it has the machine deal 1000 randomly shuffled 5 card hands. Let X be the number of hands from this sample that contains all black cards. a. Find μ, the average number of hands that contain all black cards. b. Find the probability that X will deviate from μ by at least 5 such hands. c. Find the range of X values that can occur with probability 0.95. 5. Assume a sample of size 1,000 is taken from a binomial population. Let X be the estimator of μ. If 90% of all X have a range from μ of 12 units or less, find p > q, μ and σ of the binomial distribution. 6. From the Central Limit Theorem we have Show

.

.

7. A survey of 100 banks was taken on the interest rates they charge on home mortgages. The following frequency table shows these results:

Find: a. X. b. the standard error of the mean c. the probability that the error e* = X - μ exceeds 0.1%. 8. The Tammy May Weight Reduction Center recently took a random survey of 36 members that have lost over 10 pounds. The following frequency table shows the amount of weight lost:

Find: a. X. b. the standard error of the mean c. the probability that the average weight of these 36 members is off by more than 1 pound from the true average of members that lost more than 10 pounds. 9. In a speed reading class, the following table shows the time it took 100 students to each read a given novel:

Find: a. X. b. the standard error of the mean c. the probability that the average time of these 100 readers to finish the novel is off by more than 10 minutes from the true average time it should take. 10. The determination of a minimum sample size is given by the formula

N = (z·σ/e*)²

where e* is the desired error. a. If e* is changed by a factor a > 0, show the new sample size is N1 = N/a². b. If N = 100 and e* is decreased by 50%, find N1. c. If N = 100 and e* is increased by 10%, find N1. 11. The error is given by

e* = z·σ/√N

a. If N is changed by a factor a > 0, show the new error is e1* = e*/√a. b. If e* = 0.1 and N is decreased by 50%, find e1*. c. If e* = 0.01 and N is increased by 100%, find e1*.

Statistical Inference Theory Lesson 30 Statistical Hypotheses

A very important part of inference theory is the ability to statistically test claims and counter-claims made about a given population. Such claims are made in regard to certain parameters of the population such as the mean μ. A claim about a population parameter is called a null hypothesis, indicated by Ho. A counter-claim is called an alternative hypothesis, indicated by Ha. The following two rules should be strictly followed: Rule 1: For the null hypothesis and the claimed value a, Ho can only take one of the following forms: Ho: μ = a Ho: μ ≥ a Ho: μ ≤ a.

Important: In the claim Ho, μ must always be given a value (that is, Ho always includes the case μ = a). Rule 2: For the alternative hypothesis and counter-claim, Ha can only take one of the following forms: Ha: μ ≠ a Ha: μ > a Ha: μ < a. Important: In the counter-claim Ha, μ must never be given a value (that is, Ha never includes the case μ = a).
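The two rules can be applied mechanically once a verbal claim has been translated into a relation between μ and a. The following Python sketch is only an illustration; the function name state_hypotheses and the use of symbol strings are assumptions made for this example. Deciding which relation a verbal claim actually expresses remains a matter of judgment and is not handled by the code.

```python
def state_hypotheses(claim_relation, a):
    """Return (Ho, Ha) as strings for a claim of the form 'mu <relation> a'.

    Claims that include equality (=, ≥, ≤) can serve as Ho directly (Rule 1).
    Claims that exclude equality (≠, >, <) must become Ha (Rule 2),
    and Ho is then the complementary statement.
    """
    complement = {"=": "≠", "≥": "<", "≤": ">", "≠": "=", ">": "≤", "<": "≥"}
    if claim_relation not in complement:
        raise ValueError(f"unknown relation: {claim_relation!r}")
    other = complement[claim_relation]
    if claim_relation in ("=", "≥", "≤"):
        null_rel, alt_rel = claim_relation, other
    else:
        null_rel, alt_rel = other, claim_relation
    return f"Ho: μ {null_rel} {a}", f"Ha: μ {alt_rel} {a}"


# Illustrative claims: "the mean is 20" and "the mean is more than 20".
print(state_hypotheses("=", 20))
print(state_hypotheses(">", 20))
```

In the second call the claim itself cannot be Ho because it excludes μ = 20, so it is placed in Ha and its complement μ ≤ 20 becomes Ho.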

30.1 - Statistical Hypotheses for μ 30.1 - Example 1: The President of a large university claims that the average age of all male graduate students is 32.25 years old. State Ho and Ha. Solution: Since the claim is that the average age of all male graduate students is 32.25 years old: Ho: μ = 32.25. The counter-claim is that the average age of all male graduate students is not 32.25: Ha: μ ≠ 32.25. 30.1 - Example 2: A CEO of an automobile manufacturer recently released to the local press an announcement that their new sports car gets, on average, 30 miles to the gallon. State Ho and Ha. Solution: Since the CEO has a vested interest in high mileage, the claim that their new sports car gets, on average, 30 miles to the gallon can be interpreted as meaning that the car really gets at least 30 miles to the gallon: Ho: μ ≥ 30.

The counter-claim is that the sports car gets less than 30 miles a gallon: Ha: μ < 30. 30.1 - Example 3: An executive for a large corporation claims that their employees earn more than $8.00 an hour. The union that represents these workers doubts this claim. State Ho and Ha. Solution: In order to assign a value to μ for the null hypothesis, we must change the counter-claim of the union into the claim. Therefore, we can write the claim in two different ways: Ho: μ ≤ 8.00 or Ho: μ = 8.00. It naturally follows that the counter-claim is Ha: μ > 8.00.

Solved Problems 30.1 - Solved Problem 1: A recent study claims that the average age of taxi drivers in Los Angeles is 29.10 years. State Ho and Ha. Solution: The claim is that the average age of all taxi drivers is 29.10 years: Ho: μ = 29.10. The counter-claim is that the average age of all taxi drivers is not 29.10: Ha: μ ≠ 29.10. 30.1 - Solved Problem 2: The Martha Bay Speed Reading Corp. claims that a graduate of their program will read, on average, 5,000 words a minute. State Ho and Ha. Solution:

The company has a vested interest in the reading skills of its students. Therefore, their claim should be Ho: μ ≥ 5,000 or Ho: μ = 5,000. The counter-claim is Ha: μ < 5,000. 30.1 - Solved Problem 3: Under a new study program, a representative of a State prison states that the inmates watch television, on average, less than 54 hours a week. A legislative committee of the State doubts this claim. State Ho and Ha. Solution: To assign a value to μ, the counter-claim of the legislative committee becomes the claim and is used for the null hypothesis, which can be expressed in two different ways: Ho: μ ≥ 54 or Ho: μ = 54. The alternative hypothesis is Ha: μ < 54.

Unsolved Problems with Answers 30.1 - Problem 1: A time study showed that it takes 20 minutes on average to change the oil in an 8-cylinder sedan. State Ho and Ha. Answer: Ho: μ = 20 Ha: μ ≠ 20 ⇑ Refer back to 30.1 - Example 1 & 30.1 - Solved Problem 1.

30.1 - Problem 2: A health store claims that a new blend of minerals they are selling will cause, on average, a loss of weight of at least 10 pounds in 30 days. State Ho and Ha. Answer: Ho: μ ≥ 10 or Ho: μ = 10 Ha: μ < 10. ⇑ Refer back to 30.1 - Example 2 & 30.1 - Solved Problem 2. 30.1 - Problem 3: It is claimed that senior students at a local high school study more than 20 hours a week. State Ho and Ha. Answer: Ho: μ ≤ 20 or Ho: μ = 20 Ha: μ > 20 ⇑ Refer back to 30.1 - Example 3 & 30.1 - Solved Problem 3.

Supplementary Problems 1. Water samples are taken from water used for cooling as it is being discharged from a power plant into a river. It has been determined that as long as the mean temperature of the discharged water is at most 150°F, there will be no negative effects on the river's ecosystem. State Ho and Ha. 2. Historically, evening long distance phone calls from a particular city have averaged 15.20 minutes per call. State Ho and Ha. 3. Stout Electric Co. operates a fleet of trucks giving electrical service to the construction industry. In the past, monthly average maintenance costs have been $75 per truck with a standard deviation of $3.75. Management wishes to determine whether or not the mean monthly maintenance cost has increased. State Ho and Ha. 4. A government testing agency tests a sample of 36 packages of ground beef sold by the Shop & Save Super Market. The label on each package reads: "contains no more than 25% fat". State Ho and Ha. 5. A foundry desires to produce iron castings with an average weight of 20 lbs and a standard deviation of 2 pounds. To decide whether the manufacturing process is operating satisfactorily, a sample of 40 castings is selected each hour from the output of the process and their average weight is determined. State Ho and Ha. 6. The average age of customers of a large manufacturer of men's clothing is 36 years. State Ho and Ha. 7. A soft-drink bottling process is considered to be operating satisfactorily when the mean fill per bottle is μ = 355 ml. State Ho and Ha. 8. Assume a government housing agency has been committed each year to doing N housing projects. Further assume the agency decides to do a study to decide if the number of projects should be changed or kept the same. Their criteria might be the following: i. If their statistical study results in their concluding that the average family income is less than $5,000, then they are justified in increasing the future number of housing projects to a number greater than N. ii. If their statistical study results in their concluding that the average family income is more than $6,000, then they are justified in decreasing the future number of housing projects to a number less than N. iii. However, if their statistical study results in their concluding that the average family income is somewhere between $5,000 and $6,000, then they are justified in keeping the number of projects at N. State Ho and Ha. 9. Federal officials have investigated the problems associated with the disposal of hazardous wastes. The EPA standard for maximum allowable radiation level of drinking water is 15 pCi/l. Suppose that the sample of 11 water specimens from Glen Avon Springs resulted in a sample mean radiation level of 22.5 pCi/l and a standard deviation of 8. State Ho and Ha. 10. A soda manufacturer is interested in determining whether its bottling machine tends to overfill. Each bottle is supposed to contain 12 oz of fluid. A random sample of size 36 is taken from bottles coming off the production line and the contents of each bottle are carefully measured. It is found that the mean amount of soda in the sample of bottles is 12.1 oz and the standard deviation is 0.2 oz. State Ho and Ha. 11. A machine that cuts corks for wine bottles operates so that the diameter of the corks produced is approximately normally distributed with a mean of 0.3 cm and a standard deviation of 0.01 cm. The specifications call for corks with diameters between 0.29 and 0.31 cm. A cork not meeting these specifications is considered defective. State Ho and Ha.

Statistical Inference Theory Lesson 31 Hypothesis Testing

Claims and counter-claims can be statistically tested by taking a sample of the population. There are two types of hypothesis testing: two-sided tests and one-sided tests.

For hypothesis testing, the following steps are taken in the given order, depending on the purpose of the study: 1. A claim (Ho) is made about the true value μ of a population. 2. A counter-claim (Ha) is made about the true value μ of a population. 3. A decision rule is made which is determined by the counter-claim Ha. 4. An appropriate sample is taken from the population and the mean X of the sample is computed.

5. Depending on the value X and the decision rule, only one of the following actions is to be taken: reject the claim Ho and accept the counter-claim Ha, or reject the counter-claim Ha and accept Ho, or reject the counter-claim Ha and reserve judgement on Ho.

31.1 - Two-sided test 31.1 - Example 1: A recent publication of the U.S. Army claims that the average age of new army personnel is μ = 21.5 years with a standard deviation of σ = 2.5 years. To test this claim, a random sample of 100 new personnel is taken. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: D.R.: If 21 ≤ X ≤ 22, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in a mean age of X = 22.1 years, what decision should be made? Solutions: ➤(a). Since the claim is that the average age is 21.5 years: Ho: μ = 21.5. Since we wish to test this claim, the counter-claim is Ha: μ ≠ 21.5.

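➤(b). Step 1: First we compute the standard error of the mean: σ/√N = 2.5/√100 = 0.25, so that z = (21 - 21.5)/0.25 = -2 and z = (22 - 21.5)/0.25 = 2.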

Step 2: From the alternative hypotheses, we have a two-sided test. Step 3: We need to compute P{21 ≤ X ≤ 22}. fig. 1.

From the normal distribution table, we find P{-2 ≤ Z ≤ 2} = 0.4772 + 0.4772 = 0.9544 and P{21 ≤ X ≤ 22} = P{-2 ≤ Z ≤ 2} = 0.9544. Step 4: Therefore, the probability of rejecting Ho and accepting Ha is 1 - 0.9544 = 0.0456 and the probability of rejecting Ha is 0.9544. fig. 2.

➤(c). Since X = 22.1 > 22 and lies outside the interval {21 ≤ X ≤ 22}, we reject Ho and accept Ha. We conclude that the average age of new army personnel is not 21.5. 31.1 - Example 2: At a book publisher's convention, an author of American history textbooks claimed that the average American history textbook contains μ = 850 pages. Assume you wish to test his claim by taking a random sample of 36 American history texts. Assume a standard deviation of σ = 80 pages. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: D.R.: If 840 ≤ X ≤ 860, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = 857 pages, what decision should be made? Solutions: ➤(a). Since the claim is that μ = 850 pages: Ho: μ = 850. Since we wish to test this claim, the counter-claim is

Ha: μ ≠ 850. ➤(b). Step 1: First we compute the standard error of the mean: σ/√N = 80/√36 = 13.33, so that z = (840 - 850)/13.33 = -0.75 and z = (860 - 850)/13.33 = 0.75. Step 2: From the alternative hypothesis, we have a two-sided test. Step 3: We need to compute P{840 ≤ X ≤ 860}.

fig. 3.

From the normal distribution table, we find P{-0.75 ≤ Z ≤ 0.75} = 0.2734 + 0.2734 = 0.5468 and P{840 ≤ X ≤ 860} = P{-0.75 ≤ Z ≤ 0.75} = 0.5468.

Step 4: Therefore, the probability of rejecting Ho and accepting Ha is 1 - 0.5468 = 0.4532 and the probability of rejecting Ha is 0.5468. fig. 4 ➤(c). Since X = 857 and 840 ≤ 857 ≤ 860, we reject Ha and reserve judgement. We have no statistical reason to reject Ho. Solved Problems 31.1 - Solved Problem 1: A large national corporation's past records show that their salespersons travel on average μ = 1,350 miles. They hire a statistician to determine if, during the past year, there has been a significant change in their travel mileage. A sample of 100 salespersons' travel records was taken. Assume the standard deviation is σ = 150 miles. (a). State Ho and Ha. (b). For the following decision rule, if there has been no change in travel mileage, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: D.R.: If 1,300 ≤ X ≤ 1,400, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = 1,287, what decision should be made?

Solutions: ➤ (a). Since the claim is that μ = 1,350 miles, Ho: μ = 1,350. Since we wish to test this claim, the counter-claim is Ha: μ ≠ 1,350. ➤ (b). Step 1: First we compute the standard error of the mean: σ/√N = 150/√100 = 15, so that z = (1,300 - 1,350)/15 = -3.33 and z = (1,400 - 1,350)/15 = 3.33. Step 2: From the alternative hypothesis, we have a two-sided test. Step 3: We need to compute P{1300 ≤ X ≤ 1400}. fig. 5.

From the normal distribution table, we find P{-3.33 ≤ Z ≤ 3.33} = 0.4996 + 0.4996 = 0.9992 and

P{1300 ≤ X ≤ 1400} = P{-3.33 ≤ Z ≤ 3.33} = 0.9992. Step 4: Therefore, the probability of rejecting Ho and accepting Ha is 1 - 0.9992 ≈ 0 and the probability of rejecting Ha is 0.9992 ≈ 1. fig. 6.

➤(c). Since X = 1,287 and 1287 < 1300, we reject Ho and accept Ha. We conclude there has been a significant change in travel distance. 31.1 - Solved Problem 2: A recent study showed that the average number of hours a law student studies to pass the bar is μ = 1,565 with a standard deviation of σ = 122 hours. You wish to take a sample of size 49 and test this claim. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: D.R.: If 1530 ≤ X ≤ 1600, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = 1,558, what decision should be made? Solutions: ➤(a).

Since the claim is that μ = 1,565 hours: Ho: μ = 1,565. Since we wish to test this claim, the counter-claim is Ha: μ ≠ 1,565. ➤(b). Step 1: First we compute the standard error of the mean: σ/√N = 122/√49 = 17.43, so that z = (1530 - 1565)/17.43 = -2.01 and z = (1600 - 1565)/17.43 = 2.01. Step 2: From the alternative hypothesis, we have a two-sided test. Step 3: We need to compute P{1530 ≤ X ≤ 1600}. fig. 7.

From the normal distribution table, we find P{-2.01 ≤ Z ≤ 2.01} = 0.4778 + 0.4778 = 0.9556 and P{1530 ≤ X ≤ 1600} = P{-2.01 ≤ Z ≤ 2.01} = 0.9556.

Step 4: Therefore, the probability of rejecting Ho and accepting Ha is 1 - 0.9556 = 0.0444 and the probability of rejecting Ha is 0.9556. fig. 8.

➤ (c). Since X = 1558 and 1530 ≤ 1558 ≤ 1600, we reject Ha and reserve judgement. There is no reason to reject this claim.
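The probabilities attached to a two-sided decision rule can be computed directly, as in the following Python sketch. It assumes that Ho: μ = μ0 is true and that the sample mean is normal with standard error σ/√N; the function name two_sided_rule_probabilities is illustrative only. Exact normal areas are used, so the last decimal place may differ from table-based answers.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_rule_probabilities(mu0, sigma, n, lower, upper):
    """Assuming Ho: mu = mu0 is true, return
    (P(reject Ho and accept Ha), P(reject Ha)) for the rule
    'reject Ha if lower <= X <= upper; otherwise reject Ho and accept Ha'."""
    sampling_dist = NormalDist(mu0, sigma / sqrt(n))   # distribution of X under Ho
    p_inside = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)
    return 1.0 - p_inside, p_inside


# Values from 31.1 - Example 1: mu0 = 21.5, sigma = 2.5, N = 100, rule 21 <= X <= 22.
p_reject_ho, p_reject_ha = two_sided_rule_probabilities(21.5, 2.5, 100, 21, 22)
print(round(p_reject_ho, 4), round(p_reject_ha, 4))
```

Widening the interval in the decision rule lowers the probability of rejecting a true Ho, which is the effect seen when comparing the examples and solved problems above.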

Unsolved Problems with Answers 31.1 - Problem 1: Over many years, records have shown that in a certain large airport the average number of pieces of passenger luggage that was handled by the airport was 25,500 per day. Management decided to modify the system that the employees used in handling the passengers' luggage. After completion, a random sample of 100 days was taken to find out if there had been any significant changes in the amount of luggage handled. Assume a standard deviation of 5,000. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: D.R.: If 25,000 ≤ X ≤ 26,000, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = 24,900, what decision should be made? Answers: ➤(a). Ho: μ = 25,500 Ha: μ ≠ 25,500 ➤(b). The probability of rejecting Ho and accepting Ha is 0.3174 and the probability of rejecting Ha is 0.6826. ➤(c). Reject Ho and accept Ha. We conclude that this new method has changed the handling of passengers' luggage. ⇑ Refer back to 31.1 - Example 1 & 31.1 - Solved Problem 1. 31.1 - Problem 2: It is claimed that a machine produces 1,000 electronic sockets a minute. To test this claim, the output of this machine is tested over 36 minutes. Assume a standard deviation of 76 sockets a minute. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: D.R.: If 900 ≤ X ≤ 1,100, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = 1,097, what decision should be made? Answers: ➤(a). Ho: μ = 1,000 Ha: μ ≠ 1,000 ➤(b). The probability of rejecting Ho and accepting Ha is approximately 0 and the probability of rejecting Ha is approximately 1. ➤(c). Reject Ha. We conclude that there is no significant change in production. ⇑ Refer back to 31.1 - Example 2 & 31.1 - Solved Problem 2.

31.2 - One-sided test 31.2 - Example 1: In order to attract more winter tourists, a southern Florida resort hotel purchased a magazine advertisement that circulates in the New York City area. Part of the advertisement claimed that the average temperature in the resort area during the month of January is μ = 75 degrees Fahrenheit. To challenge this claim, 36 past years are selected at random. Assume a standard deviation of 7°. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: D.R.: If X < 73°, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = 73.2°, what decision should be made? Solutions: ➤(a). Since the claim is 75°: Ho: μ = 75°. Since we wish to test this claim, the counter-claim is Ha: μ < 75°. ➤(b). Step 1: First, we compute the standard error of the mean: σ/√N = 7/√36 = 1.17, so that z = (73 - 75)/1.17 = -1.71.

Step 2: From the alternative hypotheses, we have a one-sided test.

fig. 9.

Step 3: From the normal distribution table, we find P{Z ≤ -1.71} = 0.5 - 0.4564 = 0.0436. P{X ≤ 73°} = P{Z ≤ -1.71} = 0.0436. Step 4: Therefore, the probability of rejecting Ho and accepting Ha is 0.0436 and the probability of rejecting Ha is 1 - 0.0436 = 0.9564. fig. 10

➤(c). Since X = 73.2° > 73°, we reject Ha and have no statistical basis for rejecting the resort's claim. 31.2 - Example 2: The manufacturer of fluorescent light bulbs advertises that the average life of a light bulb is μ = 15,000 hours of burning. A consumer group doubts this claim. Assume a random sample of 49 bulbs is selected. Also assume a standard deviation of 700 hours. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: Decision Rule: If X < 14,900, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = 14,948, what decision should be made? Solutions: ➤ (a). Since the claim is 15,000 hours: Ho: μ = 15000. Since we wish to test this claim, the counter-claim is Ha: μ < 15000. ➤ (b). Step 1: First we compute the standard error of the mean: σ/√N = 700/√49 = 100, so that z = (14,900 - 15,000)/100 = -1. Step 2: From the alternative hypothesis, we have a one-sided test. fig. 11.

Step 3: From the normal distribution table, we find P{Z ≤ -1} = 0.5 - 0.3413 = 0.1587, P{X ≤ 14900} = P{Z ≤ -1} = 0.1587. fig. 12.

Step 4: Therefore, the probability of rejecting Ho and accepting Ha is 0.1587 and the probability of rejecting Ha is 1 - 0.1587 = 0.8413. ➤ (c). Since X = 14,948 > 14900, we reject Ha and have no statistical basis for rejecting the manufacturer's claim.
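A one-sided rule of the form "reject Ho if X falls below a cutoff" can be sketched in Python as follows. The sketch assumes Ho: μ = μ0 is true when computing the probability, and that the sample mean is normal with standard error σ/√N; the function name lower_tail_rule and the wording of the returned decision are illustrative choices only.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_rule(mu0, sigma, n, cutoff, x_bar):
    """For the rule 'reject Ho and accept Ha if X < cutoff; otherwise reject Ha',
    return (P(reject Ho) assuming Ho: mu = mu0 is true, decision for x_bar)."""
    sampling_dist = NormalDist(mu0, sigma / sqrt(n))   # distribution of X under Ho
    p_reject_ho = sampling_dist.cdf(cutoff)            # area to the left of the cutoff
    decision = "reject Ho, accept Ha" if x_bar < cutoff else "reject Ha"
    return p_reject_ho, decision


# Values from 31.2 - Example 2: mu0 = 15,000, sigma = 700, N = 49,
# cutoff 14,900 and observed sample mean X = 14,948.
p, decision = lower_tail_rule(15000, 700, 49, 14900, 14948)
print(round(p, 4), decision)
```

For a rule of the form "reject Ho if X > cutoff", the corresponding probability is the area to the right of the cutoff, that is, 1 minus the value returned by cdf.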

Solved Problems 31.2 - Solved Problem 1: A U.S. Department of Agriculture study showed that over the last 50 years, cattle ranchers in southern Texas produced on average 25,200 head of cattle per year. As an attempt to increase this yield, 400 cattle ranches decided to feed their cattle a new hybrid of corn. Assume a standard deviation of 2,100 cattle. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: D.R.: If X > 25,300, then reject Ho and accept Ha; otherwise, reject Ha. (c). After one year, following the decision rule, a sample is taken from the population. If the sample resulted in X = 25,524, what decision should be made? Solutions: ➤(a). Since the original claim is 25,200: Ho: μ = 25,200. Since we wish to test this claim, the counter-claim is Ha: μ > 25,200. ➤(b). Step 1: First we compute the standard error of the mean: σ/√N = 2,100/√400 = 105, so that z = (25,300 - 25,200)/105 = 0.95.

Step 2: From the alternative hypotheses, we have a one-sided test. fig. 13.

Step 3: From the normal distribution table, we find P{Z ≥ 0.95} = 0.5 - 0.3289 = 0.1711 and P{X ≥ 25300} = P{Z ≥ 0.95} = 0.1711. fig. 14

Step 4: Therefore, the probability of rejecting Ho and accepting Ha is 0.1711 and the probability of rejecting Ha is 1 - 0.1711 = 0.8289. ➤(c). Since X = 25524 > 25300, we reject Ho and have a statistical basis for accepting that the new feed increases the yield of cattle.

31.2 - Solved Problem 2: The Sally Stone Speed Reading System claims that a person using their system, after six weeks, will be able to read at least 1,200 words a minute. To test this claim, 100 graduating students of this program were tested for their speed in reading. Assume a standard deviation of 90 words a minute. (a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: Decision Rule: If X < 1,100, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = 1,298, what decision should be made? Solutions: ➤(a). Since the original claim is 1,200: Ho: μ = 1200. Since we wish to test this claim, the counter-claim is Ha: μ < 1200. ➤(b). Step 1: First we compute the standard error of the mean: σ/√N = 90/√100 = 9, so that z = (1,100 - 1,200)/9 = -11.11. Step 2: From the alternative hypothesis, we have a one-sided test. fig. 15.

Step 3: From the normal distribution table, we find P{ Z ≤ -11.11} = 0.5 - 0.4999 = 0.0001 P{X ≤ 1100} = P{ Z ≤ -11.11} ≈ 0 fig. 16

Step 4: Therefore, the probability of rejecting Ho and accepting Ha is about 0 and the probability of rejecting Ha is about 1 - 0 = 1. ➤(c). Since X = 1,298 > 1,100, we reject Ha and have no statistical basis for rejecting the claim that the students can read at least 1,200 words a minute.

Unsolved Problem with Answers 31.2 - Problem 1: A certain manufacturing process has been used in the automobile industry to produce a part for the transmission system. This process, on average, takes 5.6 minutes per transmission system. The manufacturer of a new laser machine claims that their machine will decrease the average production time. Using this new machine, a time study was taken to determine the average time to produce 100 transmissions. This study resulted in an average time of 5.1 minutes per transmission system with a standard deviation of 0.45 minutes. (a). State Ho and Ha. (b). For the following decision rule, if the Ho is true, find the probability of rejecting H0 and accepting Ha and also find the probability of rejecting Ha: Decision Rule: If X < 5.5, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the decision rule, a sample is taken from the population. Since the sample resulted in X = 5.1, what decision should be made? Answers: ➤(a). Ho: μ = 5.6 Ha: μ < 5.6 ➤(b). he probability of rejecting Ho and accept Ha is 0.0132 and the probability of rejecting Ha is .09868. ➤(c). Since X = 5.1 < 5.5, reject H0 and accept Ha. There is a statistical basis to conclude that the new machine is superior. ⇑ Refer back to 31.2 - Example 1 & 31.2 - Solved Problem 1. 31.2 - Problem 2: The union claims that the average worker at their Oakland plant earns no more than $8.90 an hour. To test this claim, a sample of 100 workers is taken. Assuming a standard deviation of $1.00.

(a). State Ho and Ha. (b). For the following decision rule, if the claim is true, find the probability of rejecting Ho and accepting Ha and also find the probability of rejecting Ha: Decision Rule: If X > $9.00, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the decision rule, a sample is taken from the population. If the sample resulted in X = $8.80, what decision should be made? Answers: ➤(a). Ho: μ = $8.90 Ha: μ > $8.90 ➤(b). The probability of rejecting Ho and accepting Ha is 0.1587 and the probability of rejecting Ha is 0.8413. ➤(c). Since X = $8.80 < $9.00, reject Ha. There is no statistical basis to reject the union's claim. ⇑ Refer back to 31.2 - Example 2 & 31.2 - Solved Problem 2.

Supplementary Problems 1. A rare mint coin is sold to an auction house. The guarantee on the coin states that the coin is well balanced. To test this claim, the coin is tossed 100 times to check its balance. The following decision rule is used to verify the claim: D.R.: If the number of heads that occurs is between 45 and 55, accept the claim that the coin is well balanced. However, if the number of heads is more than 55 or less than 45 heads, reject the claim that the coin is well balanced. a. State Ho and Ha. b. Write out the Decision rule for Ho and Ha.

c. Find the probability that the claim is accepted (Ho) and Ha rejected when Ho is true. d. Find the probability that the claim is rejected (Ho) and Ha accepted when Ho is true. 2. A spokesperson for a Federal agency stated that 60% of all medical students are female. To test this claim, 200 medical students are selected at random. The following decision rule is used: D.R.: If in this sample, between 112 and 128 students are found to be female, then accept the claim; otherwise reject this claim. a. State Ho and Ha. b. Write out the decision rule for Ho and Ha. c. Find the probability that the claim is accepted (Ho) and Ha rejected when Ho is true. d. Find the probability that the claim is rejected (Ho) and Ha accepted when Ho is true. 3. A machine that fills spring water was recently purchased. The manufacturer of the machine claims that the machine will fill 95% of the bottles with 16 ounces of water. To check out this claim, 36 filled bottles are randomly selected. Since the expected number of bottles from this sample is (36)(0.95) = 34.2, the following decision rule is to be developed for Ho and Ha: D.R.: From the sample, let X represent the number of bottles that contain 16 ounces of water. If 34.2 - e* ≤ X ≤ 34.2 + e* then accept the claim Ho and reject Ha; otherwise reject the claim Ho and accept Ha. a. State Ho and Ha. b. Assume Ho is true. Find e* so that the chance of rejecting Ha and accepting Ho is 0.90.

c. For e*, rewrite the decision rule. 4. A roulette wheel contains numbers 0, 1, 2,…, 36 and the double symbol 00. A Las Vegas casino each week needs to check the balance on its wheel. To test this balance, they spin the wheel 76 times and check the total number of odd numbers that occur. The following decision rule is used to check if the wheel is balanced: D.R.: If the number of odd numbers that occur from the sample is between 32 and 40 then conclude that the wheel is in balance; otherwise conclude it is out of balance. a. State Ho and Ha. b. Write out the decision rule. c. Find the probability that the claim is accepted (Ho) and Ha rejected when Ho is true. d. Find the probability that the claim is rejected (Ho) and Ha accepted when Ho is true. 5. A television rating company reported that, on a typical day, 35% of all viewers of day-time soap operas are males. To verify this report, 500 viewers of soap operas are sampled. The following decision rule is used: D.R.: If, in this sample, between 170 and 180 viewers are males, then accept the claim; otherwise reject this claim. a. State Ho and Ha. b. Write out the decision rule. c. Find the probability that the claim is accepted (Ho) and Ha rejected when Ho is true. d. Find the probability that the claim is rejected (Ho) and Ha accepted when Ho is true. 6. The warranty on a certain brand of tires assumes that 15% of all tires will last more than 35,000 miles. To check out this assumption, a sample is made

of 200 tires that are traded in. The following decision rule is to be developed: D.R.: From the sample let X represent the number of tires that lasted more than 35,000 miles. If 30 - e* ≤ X ≤ 30 + e* then accept the assumption; otherwise reject the assumption. a. State Ho and Ha. b. Assume Ho is true. Find e* so that the chance of accepting Ho is 0.95. c. For e*, rewrite the decision rule. 7. Mr. Jones purchased a computer program to predict the final outcome of football games. The manufacturer claims that the program has a success rate of 60% or more in predicting the outcomes of these games. To test this claim, Mr. Jones checks the predictions for the next 100 games and uses the following decision rule: D.R.: If fewer than 58 games are predicted correctly, Mr. Jones will return the program for a full refund. However, if 58 or more games are predicted correctly, then Mr. Jones will use the program. a. State Ho and Ha. b. Write out the decision rule for Ho and Ha. c. Find the probability that the claim is accepted (Ho) and Ha rejected when Ho is true. d. Find the probability that the claim is rejected (Ho) and Ha accepted when Ho is true. 8. A union representing workers in the automobile industry claims that only 25% of all workers earn more than $12.00 an hour. A representative of the industry doubts this claim. a. State Ho and Ha. To test this claim, 200 workers are randomly sampled. The following decision rule is used: D.R.: Let X represent the number of workers earning more than $12.00. If X ≤ 45 then accept the claim of the union. If X > 45 then reject their claim. b. Find the probability that the claim is accepted (Ho) and Ha rejected when Ho is true. c. Find the probability that the claim is rejected (Ho) and Ha accepted when Ho is true. d. Modify the decision rule so that the probability of rejecting Ho and accepting Ha when Ho is true is about 0.05.

Statistical Inference Theory Lesson 32 Type I Error When rejecting Ho or Ha, two types of errors can occur: Type I and Type II. In this lesson we take up Type I errors. A Type I error occurs when 1. The null hypothesis Ho is true. 2. Ho is rejected and the alternative hypothesis Ha is accepted. The probability that a Type I error occurs is called the level of significance. This probability value is indicated by the symbol α. If a Type I error does not occur, then either 1. the null hypothesis Ho can be accepted, or 2. judgement is reserved, i.e., there is not sufficient statistical evidence to accept Ho or reject Ho.

32.1-Two-sided test 32.1 - Example 1: For a certain population, the following claim and counterclaim was made: Ho: μ = 10 Ha: μ ≠ 10

To test this claim a sample is to be taken from the population after creating the following decision rule: D.R.: If

then reject Ha; otherwise reject Ho and accept Ha.

Assume the sample mean resulted in

.

(a). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha. (b). If μ = 6, did a Type I error occur? (c). If μ = 10, did a Type I error occur? Solutions: ➤(a). Since the sample average is outside the interval, the decision rule requires that you reject Ho and accept Ha.

➤(b). Step 1: Since the population average μ = 6, the claim Ho: μ = 10 is false and the counter-claim Ha: μ ≠ 10 is true. Step 2: Since the sample average is outside the interval, the decision rule requires that you reject Ho and accept Ha.

Step 3: Since you are rejecting Ho, which is false, and accepting Ha, which is true, no Type I error occurred. ➤(c). Step 1: Since the population average μ = 10, the claim Ho: μ = 10 is true and the counter-claim Ha: μ ≠ 10 is false. Step 2: Since the sample average is outside the interval, the decision rule requires that you reject Ho and accept Ha.

Step 3: Since you are rejecting Ho, which is true, and accepting Ha, which is false, a Type I error has occurred.
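The reasoning in parts (b) and (c) follows a fixed pattern: compare what the decision rule did with what is actually true. A minimal Python sketch of that pattern follows. The interval and sample mean of the example are not reproduced in this copy, so the numbers used below (an interval from 9 to 11 and a sample mean of 12) are hypothetical stand-ins chosen only so that the sample mean falls outside the interval. The function names are ours.

```python
def rejects_h0(sample_mean, lower, upper):
    """Two-sided decision rule: reject Ho when the sample mean is outside [lower, upper]."""
    return not (lower <= sample_mean <= upper)


def is_type_one_error(true_mu, claimed_mu, sample_mean, lower, upper):
    """A Type I error occurs only when Ho is true and the rule rejects Ho."""
    h0_is_true = (true_mu == claimed_mu)
    return h0_is_true and rejects_h0(sample_mean, lower, upper)


# Hypothetical numbers: Ho: mu = 10, interval 9 to 11, sample mean 12.
case_b = is_type_one_error(true_mu=6, claimed_mu=10, sample_mean=12, lower=9, upper=11)
case_c = is_type_one_error(true_mu=10, claimed_mu=10, sample_mean=12, lower=9, upper=11)

print("mu = 6:  Type I error?", case_b)
print("mu = 10: Type I error?", case_c)
```

With μ = 6 the rejected Ho was false, so no Type I error is reported. With μ = 10 the rejected Ho was true, so a Type I error is reported.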

32.1 - Example 2: A recent publication of the U.S. Army claims that the average age of new army personnel is μ = 21.5 years with a standard deviation of σ = 2.5 years. To test this claim, a random sample of 100 new personnel is taken. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I error: D.R.: If 21 ≤ X ≤ 22 then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 23.5. If μ = 21.5, would a Type I error occur? (d). Modify the above decision rule so that α = 0.02. Solutions:

➤(a). From 31.1 - Example 1, Lesson 31, we have Ho: μ = 21.5 Ha: μ ≠ 21.5 ➤(b). From 31.1 - Example 1, Lesson 31, we have the probability of rejecting Ho and accepting Ha is 1 - 0.9544 = 0.0456 when Ho is true. Therefore, α = 0.0456. ➤(c). Step 1: Since μ = 21.5, Ho is true. Step 2: The decision rule is: D.R.: If 21 ≤ X ≤ 22, then reject Ha; otherwise reject Ho and accept Ha. Step 3: Since x = 23.5, the decision rule requires us to reject Ho and accept Ha. Since we are rejecting Ho, which is true, we are making a Type I error. ➤(d). Step 1: We write the decision rule as: D.R.: If 21.5 - c* ≤ X ≤ 21.5 + c*, then reject Ha; otherwise reject Ho and accept Ha, where c* = z(σ/√n) = z(2.5/√100) = 0.25z.

Step 2: Since the error can occur on either of the two sides, we use 0.5 - α/2 = 0.5 - 0.01 = 0.49 to find z. From the standard normal table z ≈ 2.33, so c* = 0.25(2.33) ≈ 0.58.

Step 3: 21.5 - 0.58 ≤ X ≤ 21.5 + 0.58 = 20.92 ≤ X ≤ 22.08 Therefore, the decision rule is D.R.: If 20.92 ≤ X ≤ 22.08 then reject Ha; otherwise reject Ho and accept Ha. 32.1 - Example 3: At a book publisher's convention, an author of American history text books claimed that the average American history text book contains μ = 850 pages. Assume you wish to test his claim by taking a random sample of 36 American history texts. Assume a standard deviation of σ = 80 pages. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a Type I error: D.R.: If 840 ≤ X ≤ 860 then reject Ha and accept Ho; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 855. If μ = 843, would a Type I error occur? (d). Modify the above decision rule so that α = 0.01. Solutions: ➤(a). From Example 2, Lesson 31, Ho: μ = 850 Ha: μ ≠ 850 ➤(b). From 31.1- Example 2, Lesson 31, we have the probability of rejecting Ho and accepting Ha is 1 - 0.5468 = 0.4532 when Ho is true. Therefore, α = 0.4532.

➤(c). Step 1: Since μ = 843, Ha is true.

Step 2: The decision rule is If 840 ≤ X ≤ 860 then reject Ha; otherwise reject Ho and accept Ha. Step 3: Since x = 855, the decision rule requires us to reject Ha. Since we are rejecting Ha, which is true, no Type I error occurred.

➤(d). Step 1: We write the decision rule as D.R.: If 850 - c* ≤ X ≤ 850 + c* then reject Ha; otherwise reject Ho and accept Ha, where c* = z(σ/√n) = z(80/√36) ≈ 13.33z.

Step 2: Since the error can occur on either side, we use 0.5 - α/2 = 0.5 - 0.005 = 0.495 to find z. From the standard normal table z ≈ 2.58, so c* ≈ 13.33(2.58) ≈ 34.40. Step 3: (850 - 34.40 ≤ X ≤ 850 + 34.40) = (816 ≤ X ≤ 884), rounded to the nearest page. Therefore, the decision rule is D.R.: If 816 ≤ X ≤ 884 then reject Ha; otherwise reject Ho and accept Ha.
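Parts (b) and (d) of Examples 2 and 3 use the same two calculations: find α from a given interval, and find the interval from a given α. The Python sketch below shows both, assuming the sample mean is normal with mean μ and standard deviation σ/√n. The function names are ours. The code uses exact normal values rather than a rounded table, so results can differ from the text in the last digit.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_alpha(mu0, sigma, n, lower, upper):
    """P(sample mean falls outside [lower, upper]) when Ho: mu = mu0 is true."""
    sampling_dist = NormalDist(mu0, sigma / sqrt(n))
    p_inside = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)
    return 1 - p_inside


def two_sided_interval(mu0, sigma, n, alpha):
    """Interval mu0 - c* to mu0 + c* whose rejection probability under Ho is alpha."""
    z = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    c_star = z * sigma / sqrt(n)
    return mu0 - c_star, mu0 + c_star


# Example 2: mu = 21.5, sigma = 2.5, n = 100
alpha_ex2 = two_sided_alpha(21.5, 2.5, 100, lower=21, upper=22)
interval_ex2 = two_sided_interval(21.5, 2.5, 100, alpha=0.02)

# Example 3: mu = 850, sigma = 80, n = 36
alpha_ex3 = two_sided_alpha(850, 80, 36, lower=840, upper=860)
interval_ex3 = two_sided_interval(850, 80, 36, alpha=0.01)

print(f"Example 2: alpha = {alpha_ex2:.4f}, interval = {interval_ex2}")
print(f"Example 3: alpha = {alpha_ex3:.4f}, interval = {interval_ex3}")
```

Dividing α by two before looking up z is the code's version of Step 2: the error can fall in either tail, so each tail receives half of α.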

Solved Problems 32.1 - Solved Problem 1: For a certain population, the following claim and counter-claim was made: Ho: μ = 55 Ha: μ ≠ 55 To test this claim a sample is to be taken of the population after creating the following decision rule: D.R.: If

then reject Ha and accept Ho; otherwise reject Ho and accept Ha.

Assume the sample mean resulted in

.

(a). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha and accept Ho.

(b). If μ = 52 did a Type I error occur? (c). If μ = 55 did a Type I error occur? Solutions: ➤(a). Since the sample average is outside the interval, the decision rule requires that you reject Ho and accept Ha.

➤(b). Step 1: Since the population average μ = 52, the claim Ho: μ = 55 is false and the counter-claim Ha: μ ≠ 55 is true. Step 2: Since the sample average is outside the interval, the decision rule requires that you reject Ho and accept Ha.

Step 3: Since you are rejecting Ho, which is false, and accepting Ha, which is true, no Type I error occurred. ➤(c). Step 1: Since the population average μ = 55, the claim Ho: μ = 55 is true and the counter-claim Ha: μ ≠ 55 is false. Step 2: Since the sample average is outside the interval, the decision rule requires that you reject Ho and accept Ha.

Step 3: Since you are rejecting Ho, which is true and accepting Ha,which is false, a Type I error has occurred. 32.1 - Solved Problem 2: A large national corporation's past records show that their salespersons travel on average μ = 1,350 miles. They hire a statistician to determine if, during the past year, there has been a significant change in their travel mileage. A sample of 100 salespersons' travel records was taken. Assuming the standard deviation is σ = 150 miles. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I

error: D.R.: If 1,300 ≤ X ≤ 1,400 then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 1,234. If μ = 1,400, would a Type I error occur? (d). Modify the above decision rule so that α = 0.05. Solutions:

➤(a). From 31.1 - Problem 1, Lesson 31, Ho: μ = 1,350 Ha: μ ≠ 1,350 ➤(b). From Solved Problem 1, Lesson 31 we showed the probability of rejecting Ho and accepting Ha is 1 - 0.9992 ≈ 0. Therefore, α = 0. ➤(c). Step 1: Since μ = 1,400, Ho is false and Ha is true. Step 2: The decision rule is: D.R.: If 1,300 ≤ X ≤ 1,400 then reject Ha; otherwise reject Ho and accept Ha.

Step 3: Since x = 1,234, the decision rule requires us to reject Ho and accept Ha. Since we are rejecting Ho, which is false, and accepting Ha, which is true, we are not making a Type I error. ➤(d). Step 1: We write the decision rule as: D.R.: If 1350 - c* ≤ X ≤ 1350 + c* then reject Ha; otherwise reject Ho and accept Ha, where c* = z(σ/√n) = z(150/√100) = 15z.

Step 2: Since the error can occur on either side, we use 0.5 - α/2 = 0.5 - 0.025 = 0.475 to find z. From the standard normal table z = 1.96, so c* = 15(1.96) = 29.4. Step 3: 1350 - 29.4 ≤ X ≤ 1350 + 29.4 = 1320.6 ≤ X ≤ 1379.4

Therefore, the decision rule is D.R.: If 1320.6 ≤ X ≤ 1379.4 then reject Ha; otherwise reject Ho and accept Ha. 32.1 - Solved Problem 3: A recent study showed that the average number of hours a law student studies to pass the bar is μ = 1,565 with a standard deviation of σ = 122 hours. You wish to take a sample of size 49 and test this claim. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I error: D.R.: If 1530 ≤ X ≤ 1600 then reject Ha; otherwise reject Ho and accept Ha. (c).Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 1,540. If μ = 1,565, would a Type I error occur? (d). Modify the above decision rule so that α = 0.10.

Solutions: ➤(a). From 31.1- Solved Problem 2, Lesson 31, Ho: μ = 1,565

Ha: μ ≠ 1,565 ➤(b). From 31.1 - Solved Problem 2, Lesson 31, we showed the probability of rejecting Ho and accepting Ha is 1 - 0.9566 = 0.0434. Therefore α = 0.0434. ➤(c). Step 1: Since μ = 1565, Ho is true. Step 2: The decision rule is D.R.: If 1,530 ≤ X ≤ 1,600 then reject Ha; otherwise reject Ho and accept Ha. Step 3: Since x = 1,540, the decision rule requires us to reject Ha. Since we are rejecting Ha, which is false, a Type I error did not occur. ➤(d). Step 1: We write the decision rule as D.R.: If 1565 - c* ≤ X ≤ 1565 + c*, then reject Ha; otherwise reject Ho and accept Ha, where c* = z(σ/√n) = z(122/√49) ≈ 17.43z.

Step 2: Since the error can occur on either side, we use 0.5 - α/2 = 0.5 - 0.05 = 0.45 to find z. From the standard normal table, z = 1.64, so c* ≈ 17.43(1.64) ≈ 28.6. Step 3: (1565 - 28.6 ≤ X ≤ 1565 + 28.6) = (1536.4 ≤ X ≤ 1593.6) Therefore, the decision rule is D.R.: If 1536.4 ≤ X ≤ 1593.6 then reject Ha; otherwise reject Ho and accept Ha.
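A simulation offers an independent check that a modified decision rule really has the intended α. The Python sketch below revisits the rule from 32.1 - Solved Problem 2(d) (reject Ho unless 1320.6 ≤ X ≤ 1379.4, with μ = 1,350, σ = 150, n = 100). It is an illustration rather than part of the text's method: it assumes the sample mean can be drawn directly from a normal distribution with standard deviation σ/√n, and the function name, the number of trials and the seed are our choices.

```python
import random
from math import sqrt


def simulated_alpha(mu0, sigma, n, lower, upper, trials=20000, seed=0):
    """Fraction of simulated sample means that fall outside [lower, upper]
    when Ho: mu = mu0 is true, i.e. an estimate of the Type I error rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = rng.gauss(mu0, standard_error)
        if not (lower <= sample_mean <= upper):
            rejections += 1
    return rejections / trials


rate = simulated_alpha(mu0=1350, sigma=150, n=100, lower=1320.6, upper=1379.4)
print(f"simulated alpha = {rate:.3f}")
```

The estimated rate should land close to the target α = 0.05, with small differences due to sampling noise in the simulation itself.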

Unsolved Problems with Answers 32.1 - Problem 1: For a certain population, the following claim and counterclaim was made: Ho: μ = 125 Ha: μ ≠ 125 To test this claim a sample is to be taken from the population after creating the following decision rule: D.R. If

then reject Ha ; otherwise reject Ho and accept Ha.

Assume the sample mean resulted in

.

(a). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha and accept Ho. (b). If μ = 131 did a Type I error occur? (c). If μ = 125 did a Type I error occur?

Answers: ➤(a). Reject Ho and accept Ha ➤(b). A Type I error did not occur. ➤(c). A Type I error occurred. ⇑ Refer back to 32.1 - Example 1 & 32.1 - Solved Problem 1. 32.1 - Problem 2: Over many years, records have shown that in a certain large airport the average number of pieces of passenger luggage that was handled by the airport was 25,500 per day. Not satisfied with this number, the directors decided to modify the system that the employees used in handling the passengers' luggage. After completion, a random sample of 100 days was taken to find out if there had been any significant changes in the amount of luggage handled. Assume a standard deviation of 5,000. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I error: D.R.: If 25,000 ≤ X ≤ 26,000 then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule a sample is taken from the population resulting in a sample mean x = 25,897. If μ = 25,500, would a Type I error occur? (d). Modify the above decision rule so that α = 0.05. Answers: ➤(a). Ho: μ = 25,500 Ha: μ ≠ 25,500 ➤(b). α = 0.3174 ➤(c). A Type I error would not occur.

➤(d). D.R.: If 24,520 ≤ X ≤ 26,480 then reject Ha; otherwise reject Ho and accept Ha. ⇑ Refer back to 32.1 - Example 2 & 32.1 - Solved Problem 2. 32.1 - Problem 3: It is claimed that a machine produces 1,000 electric sockets a minute. To test this claim, the output of this machine is tested over 36 minutes. Assume a standard deviation of 76 sockets a minute. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I error: D.R.: If 900 ≤ X ≤ 1,100 then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 1,105. If μ = 1,000, would a Type I error occur? (d). Modify the above decision rule so that α = 0.01. Answers: ➤(a). Ho: μ = 1,000 Ha: μ ≠ 1,000 ➤(b). α ≈ 0. ➤(c). A Type I error would occur. ➤(d). D.R.: If 967.31 ≤ X ≤ 1032.69, then reject Ha; otherwise reject Ho and accept Ha. ⇑ Refer back to 32.1 - Example 3 & 32.1 - Solved Problem 3.

32.2- One-sided test

32.2 - Example 1: For a certain population, the following claim and counterclaim was made: Ho: μ = 10 Ha: μ > 10 To test this claim a sample is to be taken from the population after creating the following decision rule: D.R.: If

then reject Ho and accept Ha; otherwise reject Ha.

Assume the sample mean resulted in

(a). Give an alternative version of Ho. (b). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha. (c). If μ = 13 did a Type I error occur? (d). If μ = 10 did a Type I error occur? Solutions: ➤(a). Step 1: The alternative to Ha is Ho. Step 2: Since Ha: μ > 10, the alternative to μ > 10 is μ ≤ 10. Step 3: Since Ho is the alternative to Ha, the other version for Ho is Ho: μ ≤ 10. ➤(b). Given the sample average, the decision rule requires that you reject Ho and accept Ha.

➤(c). Step 1: Since the population average μ = 13, the claim Ho: μ ≤ 10 is false and the counter-claim Ha: μ > 10 is true. Step 2: Given the sample average, the decision rule requires that you reject Ho and accept Ha.

Step 3: Since you are rejecting Ho, which is false, and accepting Ha, which is true, no Type I error occurred. ➤(d). Step 1: Since the population average μ = 10, the claim Ho: μ ≤ 10 is true and the counter-claim Ha: μ > 10 is false. Step 2: Given the sample average, the decision rule requires that you reject Ho and accept Ha.

Step 3: Since you are rejecting Ho, which is true, and accepting Ha, which is false, a Type I error has occurred. 32.2 - Example 2: In order to attract more winter tourists, a southern Florida resort hotel purchased a magazine advertisement that circulates in the New York City area. Part of the advertisement claimed that the average temperature in the resort area during the month of January is μ = 75 degrees Fahrenheit. To challenge this claim, 36 past years are selected at random. Assume a standard deviation of 7.0. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I error:

D.R.: If X < 73.0 then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 71.0. If μ = 78.0, would a Type I error occur? (d). Modify the above decision rule so that α = 0.05. Solutions: ➤(a). From 31.2 - Example 1, Lesson 31, Ho: μ = 75.0 Ha: μ < 75.0

➤(b). From 31.2 - Example 1, Lesson 31, the probability of rejecting Ho and accepting Ha is 0.0438. Therefore, α = 0.0438. ➤(c). Step 1: Since μ = 78.0, Ho (in its alternative version Ho: μ ≥ 75.0) is true. Step 2: The decision rule is D.R.: If X < 73.0 then reject Ho and accept Ha; otherwise reject Ha.

Step 3: Since x = 71.0, the decision rule requires us to reject Ho and accept Ha. Since we are rejecting Ho, which is true, a Type I error occurs.

➤(d). Step 1: We write the decision rule as D.R.: If X < c* then reject Ho and accept Ha; otherwise, reject Ha. Since μ = 75, c* = 75 + z(σ/√n) = 75 + z(7.0/√36).

Step 2: Since the α error can only occur on the left-hand side, we use 0.5 - α = 0.5 - 0.05 = 0.45 to find z.

From the standard normal table, z = -1.64. Step 3: c* = 75 + (-1.64)(7.0/√36) ≈ 73.09. Therefore, the decision rule is D.R.: If X < 73.09, then reject Ho and accept Ha; otherwise, reject Ha.

32.2 - Example 3: The manufacturer of fluorescent light bulbs advertises that the average life of a light bulb is μ = 15,000 hours of burning. A consumer group doubts this claim. Assume a random sample of 49 bulbs is selected. Assume a standard deviation of 700 hours. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I error: D.R.: If X < 14,900, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 15,125. If μ = 16,100 would a Type I error occur? (d). Modify the above decision rule so that α = 0.10. Solutions:

➤(a). Ho: μ = 15000 Ha: μ < 15000 ➤(b). From 31.2- Example 2, Lesson 31, the probability of rejecting Ho and accepting Ha is 0.1587. Therefore, α = 0.1587. ➤(c). Step 1: Since μ = 16,100, Ho is true. Step 2: The decision rule is: If X < 14,900, then reject Ho and accept Ha; otherwise reject Ha. Step 3: Since x = 15,125, the decision rule requires us to reject Ha. Since we are rejecting Ha, which is false, no Type I error occurs.

➤(d). Step 1: We write the decision rule as D.R.: If X < c*, then reject Ho and accept Ha; otherwise, reject Ha, where c* = 15,000 + z(σ/√n) = 15,000 + z(700/√49) = 15,000 + 100z.

Step 2: Since the α error can occur only on the left-hand side, we use 0.5 - α = 0.5 - 0.10 = 0.40 to find z. From the standard normal table, z = -1.28. Step 3: c* = 15,000 + 100(-1.28) = 14,872. Therefore, the decision rule is D.R.: If X < 14,872, then reject Ho and accept Ha; otherwise, reject Ha.
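For a one-sided test with Ha: μ < μ0, all of α sits in the left tail. The Python sketch below repeats parts (b) and (d) of Example 3 (μ = 15,000, σ = 700, n = 49) under the usual assumption that the sample mean is normal with standard deviation σ/√n. The function names are ours, and exact normal values are used in place of the rounded table.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_alpha(mu0, sigma, n, cutoff):
    """P(sample mean < cutoff) when Ho: mu = mu0 is true (rule: reject Ho if X < cutoff)."""
    return NormalDist(mu0, sigma / sqrt(n)).cdf(cutoff)


def lower_tail_cutoff(mu0, sigma, n, alpha):
    """Cutoff c* such that P(sample mean < c*) = alpha when Ho is true."""
    z = NormalDist().inv_cdf(alpha)  # negative for alpha below 0.5
    return mu0 + z * sigma / sqrt(n)


# Example 3, part (b): reject Ho if X < 14,900
alpha_b = lower_tail_alpha(15000, 700, 49, cutoff=14900)

# Example 3, part (d): choose the cutoff so that alpha = 0.10
cutoff_d = lower_tail_cutoff(15000, 700, 49, alpha=0.10)

print(f"alpha = {alpha_b:.4f}")
print(f"c* = {cutoff_d:.0f}")
```

Note that z comes out negative here, which is why the cutoff lies below the claimed mean of 15,000.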

Solved Problems 32.2 - Solved Problem 1: For a certain population, the following claim and counter-claim was made: Ho: μ = 55 Ha: μ < 55 To test this claim a sample is to be taken from the population after the following decision rule is created: D.R.: If

then reject Ho and accept Ha; otherwise reject Ha and accept Ho.

Assume the sample mean resulted in

.

(a). Give an alternative version of Ho.

(b). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha and accept Ho. (c). If μ = 54.5 did a Type I error occur? (d). If μ = 57.66 did a Type I error occur? Solutions: ➤(a). Step 1: The alternative to Ha is Ho. Step 2: Since Ha: μ < 55, the alternative to μ < 55 is μ ≥ 55. Step 3: Since Ho is the alternative to Ha, the other version for Ho is Ho: μ ≥ 55. ➤(b). Given the sample average, the decision rule requires that you reject Ho and accept Ha.

➤(c). Step 1: Since the population average μ = 54.5, the claim Ho: μ ≥ 55 is false and the counter-claim Ha: μ < 55 is true. Step 2: Given the sample average, the decision rule requires that you reject Ho and accept Ha.

Step 3: Since you are rejecting Ho, which is false and accepting Ha which is true, no Type I error occurred. ➤ (d). Step 1: Since the population average μ = 57.66, the claim Ho: μ ≥ 55 is true and the counter-claim Ha: μ < 55 is false. Step 2: Since the sample average

, the decision rule requires that

you reject Ho and accept Ha. Step 3: Since you are rejecting Ho, which is true and accepting Ha,which is false, a Type I error has occurred. 32.2 - Solved Problem 2: A U.S. Department of Agriculture study showed that over the last 50 years, cattle ranchers in southern Texas produced on average 25,200 head of cattle per year. As an attempt to increase this yield, 400 cattle ranches decided to feed their cattle a new hybrid of corn. After one year, the average number of cattle produced was 25,524. Assuming a standard deviation of 2,100 cattle, (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a Type I error: D.R. If X > 25,300, then reject H0 and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 25,125. If μ = 25,100 would a Type I error occur? (d). Modify the above decision rule so that α = 0.01.

Solutions: ➤(a). Ho: μ = 25,200

Ha: μ > 25,200 ➤(b). From 31.2 - Problem 1, Lesson 31, the probability of rejecting H0 and accepting Ha is 0.1701. Therefore, α = 0.1701. ➤(c). Step 1: Since μ = 25,100, Ho is true. Step 2: The decision rule is D.R.:If X > 25,300, then reject Ho and accept Ha; otherwise reject Ha.

Step 3: Since x = 25,125, the decision rule requires us to reject Ha. Since we are rejecting Ha, which is false, no Type I error occurs. ➤(d). Step 1: We write the decision rule as D.R.: If X > c*, then reject Ho and accept Ha; otherwise, reject Ha, where c* = 25,200 + z(σ/√n) = 25,200 + z(2,100/√400) = 25,200 + 105z.

Step 2: Since the α error can occur only on the right-hand side, we use 0.5 - α = 0.5 - 0.01 = 0.49 to find z.

From the standard normal table, z = 2.33.

Step 3: c* = 25,200 + 105(2.33) = 25,444.65. Therefore, the decision rule is D.R.: If X > 25,444.65, then reject Ho and accept Ha; otherwise, reject Ha.
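When Ha: μ > μ0, the error sits in the right tail instead. The Python sketch below mirrors parts (b) and (d) of Solved Problem 2 (μ = 25,200, σ = 2,100, n = 400), again assuming a normal sample mean with standard deviation σ/√n. The function names are ours. Because exact normal values are used instead of a rounded table, the results differ slightly from the printed 0.1701 and 25,444.65.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_alpha(mu0, sigma, n, cutoff):
    """P(sample mean > cutoff) when Ho: mu = mu0 is true (rule: reject Ho if X > cutoff)."""
    return 1 - NormalDist(mu0, sigma / sqrt(n)).cdf(cutoff)


def upper_tail_cutoff(mu0, sigma, n, alpha):
    """Cutoff c* such that P(sample mean > c*) = alpha when Ho is true."""
    z = NormalDist().inv_cdf(1 - alpha)  # positive for alpha below 0.5
    return mu0 + z * sigma / sqrt(n)


# Solved Problem 2, part (b): reject Ho if X > 25,300
alpha_b = upper_tail_alpha(25200, 2100, 400, cutoff=25300)

# Solved Problem 2, part (d): choose the cutoff so that alpha = 0.01
cutoff_d = upper_tail_cutoff(25200, 2100, 400, alpha=0.01)

print(f"alpha = {alpha_b:.4f}")
print(f"c* = {cutoff_d:.2f}")
```

The only change from the left-tail case is the direction of the inequality and the sign of z.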

32.2 - Solved Problem 3: The Sally Stone Speed Reading System claims that a person using their system, after six weeks, will be able to read at least 1,200 words a minute. To test this claim, 100 graduating students of this program were tested for their speed in reading. Assume a standard deviation of 90 words a minute. (a). State Ho and Ha.

(b). For the following decision rule, find the probability α of making a type I

error: D.R.: If X < 1,100, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 1,070. If μ = 1,630 would a Type I error occur? (d). Modify the above decision rule so that α = 0.05. Solutions: ➤(a). Ho: μ = 1200 Ha: μ < 1200 ➤(b). From 31.2 - Solved Problem 2, Lesson 31, the probability of rejecting Ho and accepting Ha is about 0. Therefore, α ≈ 0. ➤(c). Step 1: Since μ = 1,630, Ho is true. Step 2: The decision rule is D.R.: If X < 1,100, then reject Ho and accept Ha; otherwise reject Ha. Step 3: Since x = 1,070, the decision rule requires us to reject Ho. Since we are rejecting Ho, which is true, a Type I error occurs. ➤(d). Step 1: We write the decision rule as D.R.: If X < c*, then reject Ho and accept Ha; otherwise, reject Ha, where c* = 1,200 + z(σ/√n) = 1,200 + z(90/√100) = 1,200 + 9z.

Step 2: Since the α error can occur only on the left-hand side, we use

0.5 - α = 0.5 - 0.05 = 0.45 to find z. From the standard normal table, z = -1.64. Step 3: c* = 1,200 + 9(-1.64) = 1,185.24. Therefore, the decision rule is D.R.: If X < 1185.24, then reject Ho and accept Ha; otherwise, reject Ha.

Unsolved Problem with Answers 32.2 - Problem 1: For a certain population, the following claim and counterclaim was made: Ho: μ = 125 Ha: μ > 125 To test this claim a sample is to be taken from the population after the following decision rule is created: D.R.: If

then reject Ho and accept Ha; otherwise reject Ha.

Assume the sample mean resulted in

.

(a). Give an alternative version of Ho. (b). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha. (c). If μ = 139.71 did a Type I error occur? (d). If μ = 100 did a Type I error occur? Answers: ➤(a). Ho: μ ≤ 125

➤(b). Reject Ho and accept Ha. ➤(c). A Type I error did not occur. ➤(d). A Type I error did occur. ⇑ Refer back to 32.2 - Example 1 & 32.2 - Solved Problem 1. 32.2 - Problem 2: A certain manufacturing process has been used in the automobile industry to produce a part for the transmission system. This process, on average, takes 5.6 minutes per transmission system. The manufacturer of a new laser machine claims that their machine will decrease the average production time. Using this new machine, a time study was taken to determine the average time to produce 400 transmissions. This study resulted in an average time of 5.1 minutes per transmission system with a standard deviation of 0.45 minutes. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I error: Decision Rule: If X < 5, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = 5.2. If μ = 5.4 would a Type I error occur? (d). Modify the above decision rule so that α = 0.02. Answers: ➤(a). Ho: μ = 5.6 Ha: μ < 5.6 ➤(b). α = 0 ➤(c). No Type I error occurs. ➤(d). D.R.: If X < 5.55 then reject H0 and accept Ha; otherwise, reject Ha.

⇑ Refer back to 32.2 - Example 2 & 32.2 - Solved Problem 2. 32.2 - Problem 3: The union claims that the average worker at their Oakland plant earns no more than $8.90 an hour. To test this claim, a sample of 100 workers is taken. Assume a standard deviation of $1.00. (a). State Ho and Ha. (b). For the following decision rule, find the probability α of making a type I error: D.R.: If X > $9.00, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in a sample mean x = $9.10. If μ = $8.79 would a Type I error occur? (d). Modify the above decision rule so that α = 0.01. Answers: ➤(a). Ho: μ = $8.90 Ha: μ > $8.90 ➤(b). α ≈ 0.16 ➤(c). A Type I error occurs. ➤(d). Decision Rule: If X > $9.13, then reject Ho and accept Ha; otherwise, reject Ha. ⇑ Refer back to 32.2 - Example 3 & 32.2 - Solved Problem 3.

Supplementary Problems 1. A rare mint coin is sold to an auction house. The guarantee on the coin states that the coin is well balanced. To test this claim, the coin is tossed 100 times to check its balance. The following decision rule is used to verify the claim:

D.R.: If the number of heads is more than 55 or less than 45 heads, reject the claim that the coin is well balanced. a. State Ho and Ha. b. Find the probability of a Type I error α. c. Rewrite the decision rule so that α = 0.05 (approximately). 2. A spokesperson for a Federal agency claims that 60% of all medical students are female. To test this claim, 200 medical students are selected at random. The following decision rule is used: D.R.: If in this sample, the number of females is more than 128 or less than 112 then reject this claim. a. State Ho and Ha. b. Find the probability of a Type I error α. c. Rewrite the decision rule so that α = 0.01. 3. A union representing workers in the automobile industry claims that only 25% of all workers earn more than $12.00 an hour. A representative of the industry doubts this claim. To test this claim, 200 workers are randomly sampled. The following decision rule is used: D.R.: Let x represent the number of workers earning more than $12.00. If x > 54 then reject the claim. a. State Ho and Ha. b. Find the probability of a Type I error α. c. Rewrite the decision rule so that α = 0.10 (approximately). 4. A manufacturer of diet medication claims that their product is 92% successful in causing significant weight loss when taken over a 90 day period. To test this claim, 100 people took the medication for 90 days. a. State Ho and Ha.

b. Write the decision rule so that α = 0.05 (approximately). 5. A roulette wheel contains numbers 0, 1, 2,…, 36 and the double symbol 00. A Las Vegas casino each week needs to check the balance on the wheel. To test this balance, they spin the wheel 76 times and check the total number of odd numbers that occur. The following decision rule is used to check if the wheel is balanced: D.R.: If the number of odd numbers that occur, in the sample, is not between 32 and 40 then conclude that the wheel is not in balance. a. State Ho and Ha. b. Find the probability of a Type I error α. c. Rewrite the decision rule so that α = 0.02 (approximately). 6. A television rating company reported that on a typical day, 35% of all viewers of day-time soap operas are males. To verify this report, a sample of 500 viewers of soap operas is taken. The following decision rule is used: D.R.: If, in this sample, the number of male viewers is not between 170 and 180, then reject the claim. a. State Ho and Ha. b. Find the probability of a Type I error α. c. Rewrite the decision rule so that α = 0.03. 7. The warranty on a certain brand of tires assumes that 15% of all tires will last more than 35,000 miles. To check out this assumption, a sample is taken of 200 tires that are traded in. The following decision rule is to be developed: D.R.: From the sample, let X represent the number of tires that lasted more than 35,000 miles. If X ≤ 28 then reject the claim. (a). State Ho and Ha. (b). Find the probability of a Type I error α. (c). Rewrite the decision rule so that α = 0.01.

8. The CEO of a large auto company claims that over the last 5 years only 1% of their trucks have been recalled. To check this claim, 1,000 trucks have been sampled and checked if they have been recalled. a. State Ho and Ha. b. Write the decision rule so that α = 0.05.

Statistical Inference Theory Lesson 33 Type II Error

A Type II error occurs when 1. The alternative hypothesis Ha is true. 2. Ha is rejected. 3. The null hypothesis Ho is accepted or judgement is reserved. The probability that a Type II error occurs is indicated by the symbol β. The following examples and problems are a continuation of those in Lessons 31 and 32.
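The definition above, together with the Type I error of Lesson 32, can be sketched as a small lookup. This is only an illustration: the function name classify_error and its two boolean arguments are hypothetical and do not come from the text.

```python
def classify_error(h0_is_true: bool, h0_rejected: bool) -> str:
    """Name the error, if any, made by a decision about Ho.

    h0_is_true  -- whether the null hypothesis Ho actually holds
    h0_rejected -- whether the decision rule led us to reject Ho (and accept Ha)
    """
    if h0_is_true and h0_rejected:
        return "Type I"    # a true Ho was rejected
    if not h0_is_true and not h0_rejected:
        return "Type II"   # Ha is true, but Ha was rejected
    return "no error"


# Ha is true and the decision rule says "reject Ha": a Type II error.
print(classify_error(h0_is_true=False, h0_rejected=False))
```

The same two questions (which hypothesis is true, and which one did the decision rule reject) are the Step 1 and Step 2 of the worked solutions that follow.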

33.1 - Two-sided test

33.1 - Example 1: For a certain population, the following claim and counter-claim were made: Ho: μ = 10 Ha: μ ≠ 10 To test this claim a sample is to be taken from the population after creating the following decision rule: D.R.: If

then reject Ha; otherwise reject Ho and accept Ha.

Assume the sample mean resulted in

.

(a). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha. (b). If μ = 6 did an error occur? (c). If μ = 10 did an error occur? Solutions: ➤(a). Since the sample average is inside the interval given in the decision rule, the decision rule requires that you reject Ha.

➤(b). Step 1: Since the population average μ = 6, the claim Ho: μ = 10 is false and the counter-claim Ha: μ ≠ 10 is true. Step 2: Since the sample average is inside the interval given in the decision rule, the decision rule requires that you reject Ha. Step 3: Since you are rejecting Ha, which is true, a Type II error occurred. ➤(c). Step 1: Since the population average μ = 10, the claim Ho: μ = 10 is true and the counter-claim Ha: μ ≠ 10 is false. Step 2: Since the sample average is inside the interval given in the decision rule, the decision rule requires that you reject Ha. Step 3: Since you are rejecting Ha, which is false, no error has occurred.

33.1 - Example 2: A recent publication of the U.S. Army claims that the average age of new army personnel is μ = 21.5 years with a standard deviation of σ = 2.5 years. To test this claim, a random sample of 100 new personnel is taken. (a). State Ho and Ha. (b). Assume μ = 20.5. For the following decision rule, find the probability β of making a Type II error: D.R.: If 21 ≤ X ≤ 22, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 21.25. If μ = 20.5, does a Type I or a Type II error occur? (d). For μ = 20.5, modify the decision rule so that β = 0.05. (e). For the modified decision rule, find α. Solutions:

➤(a).

Ho: μ = 21.5 Ha: μ ≠ 21.5 ➤(b). Since μ = 20.5, we need to find β = P{21 ≤ X ≤ 22}. Step 1: P{20.5 ≤ X ≤ 22} = P{0 ≤ z ≤ 6} ≈ 0.5

P{20.5 ≤ X ≤ 21} = P{0 ≤ z ≤ 2} = 0.4772 β = P{21 ≤ X ≤ 22} = 0.5 - 0.4772 = 0.0228 ➤(c). Step 1: Since μ = 20.5, Ha is true. Step 2: The decision rule is: D.R.: If 21 ≤ X ≤ 22, then reject Ha; otherwise reject Ho and accept Ha. Step 3: Since x = 21.25, the decision rule requires us to reject Ha. Since we are rejecting Ha, which is true, we are making a Type II error.

➤(d). Step 1: D.R.: If 21.5 - c* ≤ X ≤ 21.5 + c* then reject Ha; otherwise reject Ho and accept Ha. Step 2: Since μ = 20.5, P{20.5 ≤ X ≤ 21.5} ≈ 0.5. Therefore, P{20.5 ≤ X ≤ 21.5 - c*} = 0.45 and P{21.5 - c* ≤ X ≤ 21.5} = β = 0.05. Step 3: 21.5 - c* = 20.5 + zσX = 20.5 + 1.64(0.25) = 20.91, so c* = 0.59 and 21.5 + c* = 22.09. The decision rule is D.R.: If 20.91 ≤ X ≤ 22.09 then reject Ha; otherwise reject Ho and accept Ha. ➤(e). Step 1: For computing a Type I error, we have μ = 21.5 and, from part (d), the new decision rule: D.R.: If 20.91 ≤ X ≤ 22.09 then reject Ha; otherwise reject Ho and accept Ha. Step 2:

P{20.91 ≤ X ≤ 22.09} = P{0 ≤ Z ≤ 2.36} + P{-2.36 ≤ Z≤ 0} = 2P{0 ≤ Z ≤ 2.36 } = 2(0.4909) = 0.9818 Step 3: α = 1 - 0.9818 ≈ 0.02 33.1 - Example 3: At a book publisher's convention, an author of American history text books claimed that the average American history text book contains μ = 850 pages. Assume you wish to test his claim by taking a random sample of 36 American history texts. Assume a standard deviation of σ = 80 pages.

(a). State Ho and Ha. (b). Assume μ = 870. For the following decision rule, find the probability β of making a Type II error: D.R.: If 840 ≤ X ≤ 860, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 878. If μ = 890, does a Type I or a Type II error occur? Solutions: ➤(a). Ho: μ = 850 Ha: μ ≠ 850 ➤(b). Since μ = 870, we need to find β = P{840 ≤ X ≤ 860}. Step 1: P{860 ≤ X ≤ 870} = P{-0.75 ≤ z ≤ 0} = 0.2734

P{840 ≤ X ≤ 870} = P{-2.25 ≤ z ≤ 0} = 0.4878 β = P{840 ≤ X ≤ 860} = 0.4878 - 0.2734 = 0.2144

➤(c). Step 1: Since μ = 890, Ha is true. Step 2: The decision rule is: D.R.: If 840 ≤ X ≤ 860, then reject Ha; otherwise reject Ho and accept Ha. Step 3: Since x = 878, the decision rule requires us to reject Ho and accept Ha. Since we are accepting Ha, which is true, no error occurs.
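The table look-ups in part (b) of Examples 2 and 3 can be checked numerically. The following is a minimal sketch, assuming the sample mean is normally distributed with standard deviation σ/√N as in the examples; the helper name beta_two_sided is hypothetical, and statistics.NormalDist from the Python standard library stands in for the normal table.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_sided(lower, upper, true_mu, sigma, n):
    """Probability that the sample mean falls in [lower, upper],
    i.e. that Ha is rejected, when the population mean is true_mu."""
    sampling = NormalDist(mu=true_mu, sigma=sigma / sqrt(n))
    return sampling.cdf(upper) - sampling.cdf(lower)


# Example 2(b): D.R. interval [21, 22], true mean 20.5, sigma 2.5, N = 100
beta_example_2 = beta_two_sided(21, 22, true_mu=20.5, sigma=2.5, n=100)

# Example 3(b): D.R. interval [840, 860], true mean 870, sigma 80, N = 36
beta_example_3 = beta_two_sided(840, 860, true_mu=870, sigma=80, n=36)

print(round(beta_example_2, 4), round(beta_example_3, 4))
```

Because the code uses the exact normal distribution rather than z values rounded to two decimals, small differences from table-based answers in the last digit are to be expected in general.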

Solved Problems

33.1 - Solved Problem 1: For a certain population, the following claim and counter-claim were made: Ho: μ = 55 Ha: μ ≠ 55 To test this claim a sample is to be taken from the population after creating the following decision rule: D.R.: If

then reject Ha and accept Ho; otherwise reject Ho and accept Ha.

Assume the sample mean resulted in

.

(a). From the decision rule, which of the following would you do:

i. Reject Ho and accept Ha. ii. Reject Ha. (b). If μ = 52 did an error occur? (c). If μ = 55 did an error occur? Solutions: ➤(a). Since the sample average is inside the interval given in the decision rule, the decision rule requires that you reject Ha and accept Ho.

➤(b). Step 1: Since the population average μ = 52, the claim Ho: μ = 55 is false and the counter-claim Ha: μ ≠ 55 is true. Step 2: Since the sample average is inside the interval given in the decision rule, the decision rule requires that you reject Ha and accept Ho. Step 3: Since you are rejecting Ha, which is true, and accepting Ho, which is false, a Type II error occurred. ➤(c). Step 1: Since the population average μ = 55, the claim Ho: μ = 55 is true and the counter-claim Ha: μ ≠ 55 is false. Step 2: Since the sample average is inside the interval given in the decision rule, the decision rule requires that you reject Ha and accept Ho. Step 3: Since you are rejecting Ha, which is false, and accepting Ho, which is true, no error has occurred.

33.1 - Solved Problem 2: A large national corporation's past records show that their salespersons travel on average μ = 1,350 miles. They hire a statistician to determine if, during the past year, there has been a significant change in their travel mileage. A sample of 100 salespersons' travel records was taken. Assume the standard deviation is σ = 150 miles. (a). State Ho and Ha.

(b). Assume μ = 1,250. For the following decision rule, find the probability β of making a Type II error: D.R.: If 1,300 ≤ X ≤ 1,400, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 1,420. If μ = 1,350, does a Type I or a Type II error occur? (d). For μ = 1,450, modify the decision rule so that β = 0.02. (e). For the modified decision rule, find α.

Solutions: ➤(a). Ho: μ = 1,350 Ha: μ ≠ 1,350 ➤(b). Since μ = 1,250, we need to find β = P{1,300 ≤ X ≤ 1,400}.

Step 1: P{1250 ≤ X ≤ 1,400} = P{0 ≤ z ≤ 10} ≈ 0.5

P{1250 ≤ X ≤ 1,300} = P{0 ≤ z ≤ 3.33} = 0.4996 β = P{1,300 ≤ X ≤ 1,400} = 0.5 - 0.4996 = 0.0004 ≈ 0 ➤(c). Step 1: Since μ = 1350, Ho is true. Step 2: The decision rule is:

D.R.: If 1,300 ≤ X ≤ 1,400, then reject Ha; otherwise reject Ho and accept Ha. Step 3: Since x = 1,420, the decision rule requires us to reject Ho and accept Ha. Since we are rejecting Ho, which is true, we are making a Type I error. ➤(d).

Step 1: D.R.: If 1350 - c* ≤ X ≤ 1350 + c* then reject Ha; otherwise reject Ho and accept Ha. Step 2: Since μ = 1450, P{1350 ≤ X ≤ 1450} ≈ 0.5. Therefore, P{1350 + c* ≤ X ≤ 1450} = 0.48 and P{1350 ≤ X ≤ 1350 + c*} = β = 0.02. Step 3: 1350 + c* = 1450 + zσX = 1450 - 2.05(15) = 1419.25, so c* = 69.25 and 1350 - c* = 1280.75. The decision rule is D.R.: If 1280.75 ≤ X ≤ 1419.25 then reject Ha; otherwise reject Ho and accept Ha. ➤(e). Step 1: For computing a Type I error, we have μ = 1350 and the new decision rule: D.R.: If 1280.75 ≤ X ≤ 1419.25 then reject Ha; otherwise reject Ho and accept Ha. Step 2:

P{1280.75 ≤ X ≤ 1419.25} = P{0 ≤ Z ≤ 4.62} + P{-4.62 ≤ Z ≤ 0} = 2P{0 ≤ Z ≤ 4.62} = 2(0.4999) = 0.9998 Step 3: α = 1 - 0.9998 = 0.0002 ≈ 0 33.1 - Solved Problem 3: A recent study showed that the average number of hours a law student studies to pass the bar is μ = 1,565 with a standard deviation of σ = 122 hours. You wish to take a sample of size 49 and test this claim.

(a). State Ho and Ha. (b). Assume μ = 1,620. For the following decision rule, find the probability β of making a Type II error: D.R.: If 1530 ≤ X ≤ 1600, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 1,420. If μ = 1,600, does a Type I or a Type II error occur?

Solutions: ➤(a). Ho: μ = 1,565 Ha: μ ≠ 1,565 ➤(b). Since μ = 1,620, we need to find β = P{1,530 ≤ X ≤ 1,600}. Step 1: P{1,530 ≤ X ≤ 1,620} = P{-5.16 ≤ z ≤ 0} ≈ 0.5

P{1,600 ≤ X ≤ 1,620} = P{-1.15 ≤ z ≤ 0} = 0.3749

β = P{1,530 ≤ X ≤ 1,600} = 0.5 - 0.3749 = 0.1251 ➤(c). Step 1: Since μ = 1,600, Ha is true. Step 2: The decision rule is: D.R.: If 1,530 ≤ X ≤ 1,600, then reject Ha; otherwise reject Ho and accept Ha. Step 3: Since x = 1,420, the decision rule requires us to reject Ho and accept Ha. Since we are accepting Ha, which is true, no error occurs.
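Parts (d) and (e) of Example 2 and Solved Problem 2 follow one pattern: choose c* so that the edge of the interval nearest the true mean cuts off a tail of size β, then compute α for the symmetric interval around the claimed mean. The sketch below mirrors that pattern under the same approximation the text uses (the far edge of the interval is ignored when computing β). The function name modified_two_sided_rule is hypothetical, and inv_cdf returns a more precise z than the two-decimal table values, so c* differs slightly from the worked answers.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def modified_two_sided_rule(mu0, true_mu, sigma, n, beta):
    """Return (c_star, alpha) for the rule
    'if mu0 - c* <= sample mean <= mu0 + c*, reject Ha'.

    Assumes the true mean lies outside the resulting interval, so that
    only the nearer edge contributes noticeably to beta.
    """
    se = sigma / sqrt(n)                      # standard deviation of the sample mean
    z = STANDARD_NORMAL.inv_cdf(1 - beta)     # z with upper-tail area beta
    c_star = abs(true_mu - mu0) - z * se      # distance from mu0 to the nearer edge
    alpha = 2 * (1 - STANDARD_NORMAL.cdf(c_star / se))
    return c_star, alpha


# Example 2(d),(e): mu0 = 21.5, true mean 20.5, sigma 2.5, N = 100, beta = 0.05
c_example, alpha_example = modified_two_sided_rule(21.5, 20.5, 2.5, 100, 0.05)

# Solved Problem 2(d),(e): mu0 = 1350, true mean 1450, sigma 150, N = 100, beta = 0.02
c_solved, alpha_solved = modified_two_sided_rule(1350, 1450, 150, 100, 0.02)

print(round(c_example, 2), round(alpha_example, 3))
print(round(c_solved, 2), alpha_solved)
```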

Unsolved Problems with Answers 33.1 - Problem 1: For a certain population, the following claim and counterclaim was made: Ho: μ = 125 Ha: μ ≠ 125 To test this claim a sample is to be taken from the population after creating the following decision rule: D.R.: If

then reject Ha ; otherwise reject Ho and accept Ha.

Assume the sample mean resulted in

.

(a). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha. (b). If μ = 131 did an error occur? (c). If μ = 125 did an error occur? Answers:

➤(a). Reject Ha. ➤(b). A Type II error occurred. ➤(c). No error occurred. ⇑ Refer back to 33.1 - Example 1 & 33.1 - Solved Problem 1. 33.1 - Problem 2: Over many years, records have shown that in a certain large airport the average number of pieces of passenger luggage that was handled by the airport was 25,500 per day. Not satisfied with this number, the directors decided to modify the system that the employees used in handling the passengers' luggage. After completion, a random sample of 100 days was taken to find out if there had been any significant changes in the amount of luggage handled. Assume a standard deviation of 5,000. (a). State Ho and Ha. (b). Assume μ = 27,000. For the following decision rule, find the probability β of making a Type II error: D.R.: If 25,000 ≤ X ≤ 26,000, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 25,200. If μ = 26,125, does a Type I or a Type II error occur? (d). For μ = 27,000, modify the decision rule so that β = 0.01. (e). For the modified decision rule, find α. Answers: ➤(a). Ho: μ = 25,500 Ha: μ ≠ 25,500 ➤(b). β = 0.0228 ➤(c). A Type II error occurs.

➤(d). D.R.: If 25,165 ≤ X ≤ 25,835 then reject Ha; otherwise reject Ho and accept Ha. ➤(e). α = 0.5 ⇑ Refer back to 33.1 - Example 2 & 33.1 - Solved Problem 2. 33.1 - Problem 3: It is claimed that a machine produces 1,000 electronic sockets a minute. To test this claim, the output of this machine is tested over 36 minutes. Assume a standard deviation of 76 sockets a minute. (a). State Ho and Ha. (b). Assume μ = 1,125. For the following decision rule, find the probability β of making a Type II error: D.R.: If 900 ≤ X ≤ 1,100, then reject Ha; otherwise reject Ho and accept Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 1,121. If μ = 1,021, does a Type I or a Type II error occur? Answers: ➤(a). Ho: μ = 1,000 Ha: μ ≠ 1,000 ➤(b). β = 0.0244 ➤(c). No error occurs. ⇑ Refer back to 33.1 - Example 3 & 33.1 - Solved Problem 3.

33.2 - One-sided test 33.2 - Example 1: For a certain population, the following claim and counterclaim was made: Ho: μ = 10

Ha: μ > 10 To test this claim a sample is to be taken after creating the following decision rule: D.R.: If

then reject Ho and accept Ha; otherwise reject Ha.

Assume the sample mean resulted in

.

(a). Give an alternative version of Ho. (b). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha. (c). If μ = 13 did an error occur? (d). If μ = 10 did an error occur? Solutions: ➤(a). Step 1: The alternative to Ha is Ho. Step 2: Since Ha: μ > 10, the alternative to μ > 10 is μ ≤ 10. Step 3: Since Ho is the alternative to Ha, the other version for Ho is Ho: μ ≤ 10. ➤(b). Since the sample average does not meet the rejection condition in the decision rule, the decision rule requires that you reject Ha.

➤(c). Step 1: Since the population average μ = 13, the claim Ho: μ ≤ 10 is false and the counter-claim Ha: μ > 10 is true. Step 2: Since the sample average does not meet the rejection condition in the decision rule, the decision rule requires that you reject Ha. Step 3: Since you are rejecting Ha, which is true, a Type II error has occurred. ➤(d). Step 1: Since the population average μ = 10, the claim Ho: μ ≤ 10 is true and the counter-claim Ha: μ > 10 is false. Step 2: Since the sample average does not meet the rejection condition in the decision rule, the decision rule requires that you reject Ha. Step 3: Since you are rejecting Ha, which is false, no error has occurred.

33.2 - Example 2: In order to attract more winter tourists, a southern Florida resort hotel purchased a magazine advertisement that circulates in the New York City area. Part of the advertisement claimed that the average temperature in the resort area, during the month of January is μ = 75 degrees Fahrenheit. To challenge this claim, 36 past years are selected at random. Assume a standard deviation of 7°. (a). State Ho and Ha. (b). Assume μ = 72°. For the following decision rule, find the probability β of making a Type II error: D.R.: If x < 73.09°, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 74.1°. If μ = 70°, does a Type I or a Type II error occur? (d). If μ = 72°, modify the decision rule so that β = 0.05. (e). For the modified decision rule, find α. Solutions:

➤(a). From 32.2 - Example 1, Lesson 32, Ho: μ = 75° Ha: μ < 75°

➤(b). Step 1: From the decision rule, Ha is rejected if the sample results in X ≥ 73.09°. This will result in a Type II error. Therefore, β = P{X ≥ 73.09°} when μ = 72°. Step 2: Since μ = 72°, z = (73.09 - 72)/(7/√36) = 0.93.

Step 3: From the normal distribution table,

P{0 ≤ z ≤ 0.93} = 0.3238 β = P{X ≥ 73.09°} = 0.5 - P{0 ≤ z ≤ 0.93} = 0.5 - 0.3238 = 0.1762 ➤(c). Step 1: Since μ = 70°, Ha is true. Step 2: The decision rule is: D.R.: If X < 73.09°, then reject Ho and accept Ha; otherwise, reject Ha. Step 3: Since x = 74.1°, the decision rule requires us to reject Ha. Since we are rejecting Ha, which is true, we are making a Type II error. ➤(d). Step 1: We write the decision rule as

D.R.: If X < c*, then reject Ho and accept Ha; otherwise, reject Ha where

Step 2: Since the β error will occur on the right-hand side, we use 0.5 - β = 0.5 - 0.05 = 0.45 to find z. From the standard normal table, z = 1.64.

Step 3: c* = 72 + 1.64(7/√36) = 73.91. Therefore, the decision rule is D.R.: If X < 73.91°, then reject Ho and accept Ha; otherwise, reject Ha. ➤(e). If μ = 75°, then α = P{X < 73.91°} Step 1: Therefore, α = P{X < 73.91°} = P{z < -0.93} = 0.5 - 0.3238 = 0.1762 33.2 - Example 3: A consumer group claims that the average life of a light bulb manufactured by a certain company is μ = 15,000 hours. The company denies this claim. Assume a standard deviation of 700 hours. To test this claim, a sample of N = 49 bulbs is taken. (a). State Ho and Ha.

(b). Assume μ = 15,175. For the following decision rule, find the probability β of making a Type II error: D.R.: If X > 15,100, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 15,549.

If μ = 16,111, does a Type I or a Type II error occur? (d). If μ = 15,175, modify the decision rule so that β = 0.01. (e). For the modified decision rule, find α. Solutions: ➤(a). Ho: μ = 15000 Ha: μ > 15000 ➤(b). Step 1: From the decision rule, Ha is rejected if the sample results in X ≤ 15,100. This will result in a Type II error. Therefore, β = P{X ≤ 15,100} when μ = 15,175. Step 2: Since μ = 15,175, z = (15,100 - 15,175)/(700/√49) = -0.75.

Step 3: From the normal distribution table, P{-0.75 ≤ z ≤ 0} = 0.2734 β = P{X ≤ 15,100} = 0.5 - P{-0.75 ≤ z ≤ 0} = 0.5 - 0.2734 = 0.2266 ➤(c). Step 1: Since μ = 16,111, Ha is true. Step 2: The decision rule is: D.R.: If X > 15,100, then reject Ho and accept Ha; otherwise, reject Ha.

Step 3: Since x = 15,549, the decision rule requires us to reject Ho and accept Ha. Since we are accepting Ha, which is true, no error occurs. ➤(d). Step 1: We write the decision rule as D.R.: If X > c*, then reject Ho and accept Ha; otherwise, reject Ha where

Step 2: Since the β error will occur on the left-hand side, we use

0.5 - β = 0.5 - 0.01 = 0.49 to find z. From the standard normal table, z = -2.33.

Step 3: Therefore, the decision rule is D.R.: If X > 14,942, then reject Ho and accept Ha; otherwise, reject Ha. ➤ (e). If μ = 15,000, then α = P{X > 14,942}

Therefore, α = P{X > 14,942} = 0.5 + P{-0.58 ≤ z ≤ 0} = 0.5 + 0.2190 = 0.7190
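For the one-sided rules of Examples 2 and 3, β is a single tail area under the sampling distribution centered at the assumed true mean, and α is a single tail area under the distribution centered at the claimed mean. The sketch below is one way to reproduce parts (b), (d) and (e); the function names and the "less"/"greater" flag are hypothetical, and exact normal probabilities are used, so results agree with the table-based answers only up to rounding of z.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_beta(c, true_mu, sigma, n, reject_h0_if):
    """P{Ha is rejected} when the population mean is true_mu.

    reject_h0_if = "less":    rule is 'if sample mean < c, reject Ho'
    reject_h0_if = "greater": rule is 'if sample mean > c, reject Ho'
    """
    sampling = NormalDist(true_mu, sigma / sqrt(n))
    if reject_h0_if == "less":
        return 1 - sampling.cdf(c)   # Ha rejected when the mean is >= c
    if reject_h0_if == "greater":
        return sampling.cdf(c)       # Ha rejected when the mean is <= c
    raise ValueError("reject_h0_if must be 'less' or 'greater'")


def one_sided_cutoff(true_mu, sigma, n, beta, reject_h0_if):
    """Cut-off c* that gives the requested beta at true_mu."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - beta)   # z with upper-tail area beta
    return true_mu + z * se if reject_h0_if == "less" else true_mu - z * se


def one_sided_alpha(c, mu0, sigma, n, reject_h0_if):
    """P{Ho is rejected} when the population mean equals the claimed mu0."""
    sampling = NormalDist(mu0, sigma / sqrt(n))
    return sampling.cdf(c) if reject_h0_if == "less" else 1 - sampling.cdf(c)


# Example 2: claimed mean 75, assumed true mean 72, sigma 7, N = 36
beta_ex2 = one_sided_beta(73.09, 72, 7, 36, "less")
c_ex2 = one_sided_cutoff(72, 7, 36, 0.05, "less")
alpha_ex2 = one_sided_alpha(73.91, 75, 7, 36, "less")

# Example 3: claimed mean 15,000, assumed true mean 15,175, sigma 700, N = 49
beta_ex3 = one_sided_beta(15100, 15175, 700, 49, "greater")
c_ex3 = one_sided_cutoff(15175, 700, 49, 0.01, "greater")
alpha_ex3 = one_sided_alpha(14942, 15000, 700, 49, "greater")

print(round(beta_ex2, 4), round(c_ex2, 2), round(alpha_ex2, 4))
print(round(beta_ex3, 4), round(c_ex3, 1), round(alpha_ex3, 4))
```

Printing α next to β makes the trade-off in part (e) visible: forcing β down to 0.01 in Example 3 moves the cut-off below the claimed mean and pushes α above one half.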

Solved Problems 33.2 - Solved Problem 1: For a certain population, the following claim and counter-claim was made: Ho: μ = 55 Ha: μ < 55 To test this claim a sample is to be taken after creating the following decision rule:

D.R.: If

then reject Ho and accept Ha; otherwise reject Ha and accept Ho.

Assume the sample mean resulted in

.

(a). Give an alternative version of Ho. (b). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha and accept Ho. (c). If μ = 54.5 did an error occur? (d). If μ = 57.66 did an error occur? Solutions: ➤(a). Step 1: The alternative to Ha is Ho. Step 2: Since Ha: μ < 55, the alternative to μ < 55 is μ ≥ 55. Step 3: Since Ho is the alternative to Ha, the other version for Ho is Ho: μ ≥ 55. ➤(b). Since the sample average does not meet the rejection condition in the decision rule, the decision rule requires that you reject Ha and accept Ho.

➤(c). Step 1: Since the population average μ = 54.5, the claim Ho: μ ≥ 55 is false and the counter-claim Ha: μ < 55 is true. Step 2: Since the sample average does not meet the rejection condition in the decision rule, the decision rule requires that you reject Ha and accept Ho. Step 3: Since you are rejecting Ha, which is true, and accepting Ho, which is false, a Type II error has occurred. ➤(d). Step 1: Since the population average μ = 57.66, the claim Ho: μ ≥ 55 is true and the counter-claim Ha: μ < 55 is false. Step 2: Since the sample average does not meet the rejection condition in the decision rule, the decision rule requires that you reject Ha and accept Ho. Step 3: Since you are rejecting Ha, which is false, and accepting Ho, which is true, no error has occurred.

33.2 - Solved Problem 2: A U.S. Department of Agriculture study showed that over the last 50 years, cattle ranchers in southern Texas produced, on average, 25,200 head of cattle per year. As an attempt to increase this yield, 400 cattle ranches decided to feed their cattle a new hybrid of corn. After one year, the average number of cattle produced was 25,524. Assuming a standard deviation of 2,100 cattle, (a). State Ho and Ha. (b). Assume μ = 25,500. For the following decision rule, find the probability β of making a Type II error: D.R.: If X > 25,300, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 25,540. If μ = 24,980, does a Type I or a Type II error occur? (d). If μ = 25,500, modify the decision rule so that β = 0.10. (e). For the modified decision rule, find α.

Solutions: ➤(a). Ho: μ = 25,200 Ha: μ > 25,200 ➤(b). Step 1: From the decision rule, Ha is rejected if the sample results in X ≤ 25,300. This will result in a Type II error. Therefore, β = P{X ≤ 25,300} when μ = 25,500. Step 2: Since μ = 25,500, z = (25,300 - 25,500)/(2,100/√400) = -1.90.

Step 3: From the normal distribution table, P{-1.90 ≤ z ≤ 0} = 0.4713 β = P{X ≤ 25,300} = 0.5 - P{-1.90 ≤ z ≤ 0} = 0.5 - 0.4713 = 0.0287 ➤(c). Step 1: Since μ = 24,980, Ho is true. Step 2: The decision rule is: D.R.: If X > 25,300, then reject Ho and accept Ha; otherwise, reject Ha. Step 3: Since x = 25,540, the decision rule requires us to reject Ho and accept Ha. Since we are rejecting Ho, which is true, a Type I error occurs. ➤(d). Step 1: We write the decision rule as

D.R.: If X > c*, then reject Ho and accept Ha; otherwise, reject Ha where

Step 2: Since the β error will occur on the left-hand side, we use 0.5 - β = 0.5 - 0.1 = 0.40 to find z. From the standard normal table, z = -1.28. Step 3: c* = 25,500 - 1.28(105) = 25,365.60. Therefore, the decision rule is D.R.: If X > 25,365.60, then reject Ho and accept Ha; otherwise, reject Ha.

➤(e). If μ = 25,200, then α = P{X > 25,365.60} Step 1: Therefore, α = P{X > 25,365.6} = P{z > 1.58} = 0.5 - 0.4429 = 0.0571

33.2 - Solved Problem 3: The Sally Stone Speed Reading System claims that a person using their system, after six weeks, will be able to read at least 1,200 words a minute. To test this claim, 100 graduating students of this program were tested for their speed in reading. Assume a standard deviation of 90 words a minute. (a). State Ho and Ha. (b). Assume μ = 1,000. For the following decision rule, find the probability β of making a Type II error: D.R.: If X < 1,100, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 1,000. If μ = 12,080, does a Type I or a Type II error occur? (d). If μ = 1,000, modify the decision rule so that β = 0.05. (e). For the modified decision rule, find α. Solutions: ➤(a). Ho: μ = 1200 Ha: μ < 1200 ➤(b). Step 1: From the decision rule, Ha is rejected if the sample results in X ≥ 1,100. This will result in a Type II error. Therefore, β = P{X ≥ 1,100} when μ = 1,000.

Step 2: Since μ = 1,000, z = (1,100 - 1,000)/(90/√100) = 11.11.

Step 3: Since z = 11.11 is not on the table,

P{0 ≤ z ≤ 11.11} ≈ 0.5 β = P{X ≥ 1,100} = 0.5 - P{0 ≤ z ≤ 11.11} = 0.5 - 0.5 = 0

➤ (c). Step 1: Since μ = 12,080, Ho is true. Step 2: The decision rule is: D.R.:If X < 1,100, then reject Ho and accept Ha; otherwise, reject Ha. Step 3: Since x = 1,000, the decision rule requires us to reject Ho and accept Ha. Since we are rejecting Ho, which is true, we are making a Type I error. ➤ (d). Step 1: We write the decision rule as D.R.: If X < c*, then reject Ho and accept Ha; otherwise, reject Ha where

Step 2: Since the β error will occur on the right-hand side, we use 0.5 - β = 0.5 - 0.05 = 0.45 to find z. From the standard normal table, z = 1.64.

Step 3: c* = 1000 + 1.64(9) = 1,014.76 Therefore, the decision rule is D.R.: If X < 1,014.76 then reject Ho and accept Ha; otherwise, reject Ha. ➤ (e).

If μ = 1,200, then α = P{X < 1,014.76}.

Step 1: Therefore, α = P{X < 1,014.76} = P{z < -20.58} ≈ 0

Unsolved Problems with Answers 33.2 - Problem 1: For a certain population, the following claim and counter-claim were made: Ho: μ = 125 Ha: μ > 125 To test this claim a sample is to be taken after creating the following decision rule: D.R.: If X ≥ 128 then reject Ho and accept Ha; otherwise reject Ha. Assume the sample mean resulted in (a). Give an alternative version of Ho. (b). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha.

(c). If μ = 139.71 did an error occur? (d). If μ = 100 did an error occur? Answers: ➤(a). Ho: μ ≤ 125 ➤(b). Reject Ha. ➤(c). A Type II error has occurred. ➤(d). No error has occurred. ⇑ Refer back to 33.2 - Example 1 & 33.2 - Solved Problem 1. 33.2 - Problem 2: A certain manufacturing process has been used in the automobile industry to produce a part for the transmission system. This process, on average, takes 5.6 minutes per transmission system. The manufacturer of a new laser machine claims that their machine will decrease the average production time. Using this new machine, a time study was taken to determine the average time to produce 400 transmissions. This study resulted in an average time of 5.1 minutes per transmission system with a standard deviation of 0.45 minutes. (a). State Ho and Ha. (b). Assume μ = 5. For the following decision rule, find the probability β of making a Type II error: D.R.: If X < 5, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = 5.1. If μ = 5.5, does a Type I or a Type II error occur? (d). If μ = 5, modify the decision rule so that β = 0.05. (e). For the modified decision rule, find α. Answers: ➤(a).

Ho: μ = 5.6 Ha: μ < 5.6 ➤(b). β = 0.5 ➤(c). A Type II error. ➤(d). D.R.: If X < 5.04, then reject Ho and accept Ha; otherwise, reject Ha. ➤(e). α = 0 ⇑ Refer back to 33.2 - Example 2 & 33.2 - Solved Problem 2. 33.2 - Problem 3: The union claims that the average worker at their Oakland plant earns no more than $8.90 an hour. To test this claim, a sample of 100 workers is taken. Assuming a standard deviation of $1.00. (a). State Ho and Ha. (b). Assume μ = $9.12. For the following decision rule, find the probability β of making a Type II error: D.R.: If X > $9.00, then reject Ho and accept Ha; otherwise, reject Ha. (c). Following the creation of the decision rule, a sample is taken from the population resulting in x = $9.21. If μ = $9.10, does a Type I or a Type II error occur? (d). If μ = $9.12, modify the decision rule so that β = 0.05. (e). For the modified decision rule, find α. Answers: ➤(a). Ho: μ = $8.90 Ha: μ > $8.90 ➤(b). β = 0.12 ➤(c). No error occurs.

➤(d). D.R.: If X > $8.96, then reject Ho and accept Ha; otherwise, reject Ha. ➤(e). α = 0.2743 ⇑ Refer back to 33.2 - Example 3 & 33.2 - Solved Problem 3.

Supplementary Problems 1. A rare mint coin is sold to an auction house. The guarantee on the coin states that the coin is well balanced. To test this claim, the coin is tossed 100 times to check its balance. The following decision rule is used to verify the claim: D.R.: If the number of heads is between 45 and 55 heads, then accept the claim that the coin is well balanced. Assume the coin is not balanced and the probability of heads on each toss is p = 0.60. a. State Ho and Ha. b. Find the probability of a Type II error β. 2. A spokesperson for a Federal agency stated that 60% of all medical students are female. To test this claim, 200 medical students are selected at random. The following decision rule is used: D.R.: If in this sample, between 112 and 128 medical students are found to be females, then accept this claim. Assume the claim is in error and the true percentage of women attending medical school is 52%. a. State Ho and Ha. b. Find the probability of a Type II error β. 3. A union representing workers in the automobile industry claims that only 25% of all workers earn more than $12.00 an hour. A representative of the industry doubts this claim. To test this claim, 200 workers are randomly sampled. The following decision rule is used: D.R.: Let x represent the number of workers earning more than $12.00. If x ≥

55 then reject the union's claim. Assume that the union's claim is in error and 30% of all workers earn more than $12.00. a. State Ho and Ha. b. Find the probability of a Type II error β. 4. A manufacturer of diet medication claims that their product is 92% successful in causing significant weight loss when taken over a 90 day period. A consumer group doubts this claim. To test this claim, they take a survey of 100 people who took the medication for 90 days. Assume that the real chance that the medication can result in a person losing significant weight is 80%. a. State Ho and Ha. b. Write the decision rule so that β = 0.05. 5. A roulette wheel contains numbers 0, 1, 2,…, 36 and the double symbol 00. A Las Vegas casino each week needs to check the balance on the wheel. To test this balance, they spin the wheel 76 times and check the total number of odd numbers that occur. The following decision rule is used to check if the wheel is balanced: D.R.: If the number of odd numbers that occur is between 32 and 40, then conclude that the wheel is balanced. Assume the wheel is really out of balance and the chance of an odd number per spin is p = 0.62. a. State Ho and Ha. b. Find the probability of a Type II error β. 6. A television rating company reported on a typical day that 35% of all viewers of day-time soap operas are males. To verify this report, a sample of 500 viewers of soap operas are taken. The following decision rule is used: Assume that the claim is in error and that 45% of all viewers are males. D.R.: If, in this sample between 170 and 180 viewers are males, then accept the claim.

a. State Ho and Ha. b. Find the probability of a Type II error β. 7. The warranty on a certain brand of tires assumes that 15% of all tires will last more than 35,000 miles. To check out this assumption, a sample is made of 200 tires that are traded in. The following decision rule is to be developed: D.R.: From the sample, let X represent the number of tires that lasted more than 35,000 miles. If 29 ≤ X then accept the claim. Assume the claim is not correct and the real percentage of tires that last more than 35,000 miles is 11%. a. State Ho and Ha. b. Find the probability of a Type II error β. 8. The CEO of a large auto company claims that over the last 5 years, only 1% of their trucks have been recalled. To check this claim, 1,000 trucks have been sampled and checked if they have been recalled. Assume that the CEO's claim is false and that the true percent of trucks that have been recalled is 2%. a. State Ho and Ha. b. Write the decision rule so that β = 0.03 (approximately). c. Find the probability of a Type I error α. 9. The Double Bubble Water Company purchased a new machine that fills 16 oz. of water. Concerned about the machine overfilling, a quality control engineer, each hour, samples 100 bottles for their quantity. The decision to stop the machine from this sample is based on the following decision rule: D.R.: If the sample results in an average exceeding 16.2 oz, the machine will be stopped. For what values of μ will β ≤ 0.05? Assume σ = 1.5. 10. The following is a continuation of problem 9. The manufacturer of the machine claims that there is only a 10% chance the machine will overfill. Assuming β = 0.05, find the probability that if the machine is shut down, the machine is working properly.

11. For a certain population, the following claim and counter-claim were made: Ho: μ = 10, Ha: μ > 10. To test this claim, a sample of N = 100 is to be taken. Assume σ = 2 and the following decision rule: D.R.: If X̄ > c*, then reject Ho and accept Ha; otherwise, reject Ha.

a. Find c* so that α = 0.01. b. Find the minimal range of values of μ for 0 ≤ β ≤ 1.

Statistical Inference Theory Lesson 34 (Optional)* Combining Type I & Type II Errors

When designing a one-sided test, the Type I and Type II errors can first be decided upon before sampling is actually carried out. The following examples and problems are a continuation of those in Lesson 33.

34.1-Creating decision rules from α and β for one-sided tests

34.1 - Example 1: In order to attract more winter tourists, a southern Florida resort hotel purchased an advertisement in a magazine that circulates in the New York City area. Part of the advertisement claimed that the average temperature in the resort area during the month of January is μ = 75 degrees Fahrenheit. To challenge this claim, N past years are selected at random. Assume a standard deviation of 7°.

(a). State Ho and Ha. (b). For the following decision rule, D.R.: Take a sample of size N. If X̄ < c*, then reject Ho and accept Ha; otherwise, reject Ha. Find N and c* so that α = 0.05 and β = 0.05 when μ = 73°. (c). Restate the decision rule.

Solutions: ➤(a). Ho: μ = 75°

Ha: μ < 75°

➤(b). Case 1: Type I error

Step 1: μ = 75

Step 2: P{X̄ < c* < 75} = 0.05. From the standard normal distribution table, we look up z for 0.5 - 0.05 = 0.45: z = -1.64.

c* = 75 - 1.64(7/√N), equation 1

Case 2: Type II error

Step 1: μ = 73

Step 2: P{73 < c* < X̄} = 0.05. From the standard normal distribution table, we look up z for 0.5 - 0.05 = 0.45: z = 1.64.

c* = 73 + 1.64(7/√N), equation 2

Since c* and N are the same for equation 1 and equation 2, we set the two equations equal and solve first for N:

75 - 1.64(7/√N) = 73 + 1.64(7/√N), so √N = 2(1.64)(7)/2 = 11.48. Rounding √N to the nearest whole number, 11, gives

N = 121. From equation 2,

c* = 73 + 1.64(7/11) ≈ 74

➤(c). D.R.: Take a sample of size 121. If X̄ < 74, then reject Ho and accept Ha; otherwise, reject Ha.

34.1 - Example 2: The manufacturer of fluorescent light bulbs advertises that the average life of a light bulb is μ = 15,000 hours of burning. A consumer group doubts this claim. Assume a random sample of N bulbs is selected and a standard deviation of 700 hours.

(a). State Ho and Ha. (b). For the following decision rule, D.R.: Take a sample of size N. If X̄ < c*, then reject Ho and accept Ha; otherwise, reject Ha. Find N and c* so that α = 0.05 and β = 0.01 when μ = 14,700. (c). Restate the decision rule.

Solutions: ➤(a). Ho: μ = 15,000

Ha: μ < 15,000

➤(b). Case 1: Type I error

Step 1: μ = 15,000

Step 2: P{X̄ < c* < 15,000} = 0.05. From the standard normal distribution table, we look up z for 0.5 - 0.05 = 0.45. Therefore, z = -1.64.

c* = 15,000 - 1.64(700/√N), equation 1

Case 2: Type II error

Step 1: μ = 14,700

Step 2: P{14,700 < c* < X̄} = 0.01. From the standard normal distribution table, we look up z for 0.5 - 0.01 = 0.49: z = 2.33.

c* = 14,700 + 2.33(700/√N), equation 2

Since c* and N are the same for equation 1 and equation 2, we set the two equations equal and solve first for N:

15,000 - 1.64(700/√N) = 14,700 + 2.33(700/√N), so √N = (1.64 + 2.33)(700)/300 ≈ 9.26. Rounding √N to the nearest whole number, 9, gives

N = 81. From equation 2,

c* = 14,700 + 2.33(700/9) ≈ 14,881

➤(c). Decision Rule: Take a sample of size 81. If X̄ < 14,881, then reject Ho and accept Ha; otherwise, reject Ha.

Solved Problems

34.1 - Solved Problem 1: A U.S. Department of Agriculture study showed that over the last 50 years, cattle ranchers in southern Texas produced on average 25,200 head of cattle per year. In an attempt to increase this yield, a sample of N cattle ranches decided to feed their cattle a new hybrid of corn. After one year, the average number of cattle produced was 25,524. Assuming a standard deviation of 2,100 cattle,

(a). State Ho and Ha. (b). For the following decision rule, D.R.: Take a sample of size N. If X̄ > c*, then reject Ho and accept Ha; otherwise, reject Ha. Find N and c* so that α = 0.01 and β = 0.05 when μ = 25,500. (c). Restate the decision rule.

Solutions: ➤(a). Ho: μ = 25,200

Ha: μ > 25,200

➤(b). Case 1: Type I error

Step 1: μ = 25,200

Step 2: P{25,200 < c* < X̄} = 0.01. From the standard normal distribution table, we look up z for 0.5 - 0.01 = 0.49 and find z = 2.33.

c* = 25,200 + 2.33(2,100/√N), equation 1

Case 2: Type II error

Step 1: μ = 25,500

Step 2: P{X̄ < c* < 25,500} = 0.05. From the standard normal distribution table, we look up z for 0.5 - 0.05 = 0.45: z = -1.64.

c* = 25,500 - 1.64(2,100/√N), equation 2

Since c* and N are the same for equation 1 and equation 2, we set the two equations equal and solve first for N:

25,200 + 2.33(2,100/√N) = 25,500 - 1.64(2,100/√N), so √N = (2.33 + 1.64)(2,100)/300 ≈ 27.79. Rounding √N to the nearest whole number, 28, gives

N = 784. From equation 2,

c* = 25,500 - 1.64(2,100/28) ≈ 25,377

➤(c). D.R.: Take a sample of size 784. If X̄ > 25,377, then reject Ho and accept Ha; otherwise, reject Ha.

34.1 - Solved Problem 2: The Sally Stone Speed Reading System claims that a person using their system will, after six weeks, be able to read at least 1,200 words a minute. To test this claim, N graduating students of this program were tested for their reading speed. Assume a standard deviation of 90 words a minute. (a). State Ho and Ha. (b). For the following decision rule,

D.R.: Take a sample of size N. If X̄ < c*, then reject Ho and accept Ha; otherwise, reject Ha. Find N and c* so that α = 0.01 and β = 0.01 when μ = 1,150. (c). Restate the decision rule.

Solutions: ➤(a). Ho: μ = 1,200

Ha: μ < 1,200

➤(b). Case 1: Type I error

Step 1: μ = 1,200

Step 2: P{X̄ < c* < 1,200} = 0.01. From the standard normal distribution table, we look up z for 0.5 - 0.01 = 0.49 and we find z = -2.33.

c* = 1,200 - 2.33(90/√N), equation 1

Case 2: Type II error

Step 1: μ = 1,150

Step 2: P{1,150 < c* < X̄} = 0.01. From the standard normal distribution table, we look up z for 0.5 - 0.01 = 0.49: z = 2.33.

c* = 1,150 + 2.33(90/√N), equation 2

Since c* and N are the same for equation 1 and equation 2, we set the two equations equal and solve first for N:

1,200 - 2.33(90/√N) = 1,150 + 2.33(90/√N), so √N = 2(2.33)(90)/50 ≈ 8.39. Rounding √N to the nearest whole number, 8, gives

N = 64. From equation 2,

c* = 1,150 + 2.33(90/8) ≈ 1,176.21

➤(c). D.R.: Take a sample of size 64. If X̄ < 1,176.21, then reject Ho and accept Ha; otherwise, reject Ha.
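The two-equation method used in the examples above can also be checked numerically. The following is a minimal Python sketch and not part of the original text: the function name one_sided_design is hypothetical, and it uses exact normal quantiles from the standard library instead of the rounded table values, so its unrounded N differs slightly from the reported answers (for 34.1 - Example 1 it gives about 132.6 against the text's N = 121).

```python
from math import sqrt
from statistics import NormalDist


def one_sided_design(mu0, mu1, sigma, alpha, beta):
    """Return (unrounded N, c*) for a one-sided test of Ho: mu = mu0.

    mu1 is the alternative mean at which the Type II error should equal beta.
    (Hypothetical helper written for illustration only.)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    # Setting equation 1 equal to equation 2 and solving for sqrt(N).
    root_n = (z_alpha + z_beta) * sigma / abs(mu0 - mu1)
    # Equation 1: move from mu0 toward mu1 by z_alpha standard errors.
    direction = 1 if mu1 > mu0 else -1
    c_star = mu0 + direction * z_alpha * sigma / root_n
    return root_n ** 2, c_star


# 34.1 - Example 1: mu0 = 75, mu1 = 73, sigma = 7, alpha = beta = 0.05.
n_exact, c_star = one_sided_design(75, 73, 7, 0.05, 0.05)
print(round(n_exact, 1), round(c_star, 2))

# Error probabilities actually achieved by the text's rule (N = 121, c* = 74).
se = 7 / sqrt(121)
alpha_121 = NormalDist(75, se).cdf(74)
beta_121 = 1 - NormalDist(73, se).cdf(74)
print(round(alpha_121, 3), round(beta_121, 3))
```

Under these assumptions, rounding N up to the next whole number keeps both error probabilities at or below their targets, while the rule with N = 121 and c* = 74 leaves each of them a little above 0.05.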

Unsolved Problems with Answers

34.1 - Problem 1: A certain manufacturing process has been used in the automobile industry to produce a part for the transmission system. This process, on average, takes 5.6 minutes per transmission system. The manufacturer of a new laser machine claims that their machine will decrease the average production time. Using this new machine, a time study was taken to determine the average time to produce N transmissions. This study resulted in an average time of 5.1 minutes per transmission system with a standard deviation of 0.45 minutes. (a). State Ho and Ha. (b). For the following decision rule, D.R.: If X̄ < c*, then reject Ho and accept Ha; otherwise, reject Ha. Find N and c* so that α = 0.02 and β = 0.05 when μ = 5.4. (c). Restate the decision rule.

Answers: ➤(a). Ho: μ = 5.6, Ha: μ < 5.6 ➤(b). N = 69, c* = 5.49 ➤(c). D.R.: Take a sample of 69 transmissions. If X̄ < 5.49, then reject Ho and accept Ha; otherwise, reject Ha.

⇑ Refer back to 34.1 - Example 1 & 34.1 - Solved Problem 1.

34.1 - Problem 2: The union claims that the average worker at their Oakland plant earns no more than $8.90 an hour. To test this claim, a sample of N workers is taken. Assume a standard deviation of $1.00.

(a). State Ho and Ha. (b). For the following decision rule, D.R.: If X̄ > c*, then reject Ho and accept Ha; otherwise, reject Ha. Find N and c* so that α = 0.10 and β = 0.05 when μ = $9.00. (c). Restate the decision rule.

Answers: ➤(a). Ho: μ = $8.90, Ha: μ > $8.90 ➤(b). N ≈ 853, c* = $8.94 ➤(c). D.R.: Take a sample of size 853. If X̄ > $8.94, then reject Ho and accept Ha; otherwise, reject Ha.

⇑ Refer back to 34.1 - Example 2 & 34.1 - Solved Problem 2.

34.2 - Creating decision rules from α and β for two-sided tests

34.2 - Example 1: At a book publishers' convention, an author of American history textbooks claimed that the average American history textbook contains μ = 850 pages. Assume you wish to test his claim by taking a random sample of N American history texts. Assume a standard deviation of σ = 80 pages. (a). State Ho and Ha. (b). For the following decision rule, D.R.: If 850 - c* ≤ X̄ ≤ 850 + c*, then reject Ha; otherwise, reject Ho and accept Ha. Find N and c* so that α = 0.05 and β = 0.05 when μ = 870 and P{X̄ < 850} = 0. (c). Restate the decision rule.

Solutions: ➤(a). Ho: μ = 850

Ha: μ ≠ 850

➤(b). Case 1: Type I error, µ = 850

Since α = 0.05 is split between the two tails, each tail has area 0.025. From the standard normal distribution table, we look up z for 0.5 - 0.025 = 0.475 and we find z = 1.96.

c* = 1.96(80/√N), equation 1

Case 2: Type II error, µ = 870

Since we require P{X̄ < 850} = 0, the lower tail is treated as negligible.

For 850 + c* < X̄ < 870, we have P{850 + c* < X̄ < 870} = 0.45, z = -1.64.

850 + c* = 870 - 1.64(80/√N), so c* = 20 - 1.64(80/√N), equation 2

Since equation 1 and equation 2 are equal:

1.96(80/√N) = 20 - 1.64(80/√N), so √N = (1.96 + 1.64)(80)/20 = 14.4, N ≈ 207 and c* = 1.96(80/14.4) ≈ 10.89.

➤(c). D.R.: Take a sample of size N = 207. If 850 - 10.89 ≤ X̄ ≤ 850 + 10.89, then reject Ha; otherwise, reject Ho and accept Ha.

Solved Problem

34.2 - Solved Problem 1: A large national corporation's past records show that their salespersons travel on average μ = 1,350 miles. They hire a statistician to determine if, during the past year, there has been a significant change in their travel mileage. A sample of N salespersons' travel records was taken. Assume the standard deviation is σ = 150 miles. (a). State Ho and Ha. (b). For the following decision rule: D.R.: If 1,350 - c* ≤ X̄ ≤ 1,350 + c*, then reject Ha; otherwise, reject Ho and accept Ha. Find N and c* so that α = 0.05 and β = 0.02 when μ = 1,250 and P{X̄ > 1,350} = 0. (c). Restate the decision rule.

Solutions: ➤(a). Ho: μ = 1,350

Ha: μ ≠ 1,350

➤(b). Case 1: Type I error

Step 1: μ = 1,350

Step 2: From the standard normal distribution table, we look up z for 0.5 - 0.025 = 0.475 and we find z = 1.96.

c* = 1.96(150/√N), equation 1

Case 2: Type II error, µ = 1,250

Since we require P{X̄ > 1,350} = 0, the upper tail is treated as negligible.

For 1,250 < X̄ < 1,350 - c*, we have P{1,250 < X̄ < 1,350 - c*} = 0.48, z = 2.05.

1,350 - c* = 1,250 + 2.05(150/√N), so c* = 100 - 2.05(150/√N), equation 2

Since equation 1 and equation 2 are equal:

1.96(150/√N) = 100 - 2.05(150/√N), so √N = (1.96 + 2.05)(150)/100 ≈ 6.02 ≈ 6, N = 36 and c* = 1.96(150/6) = 49.

➤(c). D.R.: Take a sample of size N = 36. If 1,350 - 49 ≤ X̄ ≤ 1,350 + 49, then reject Ha; otherwise, reject Ho and accept Ha.
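The two-sided calculation can be sketched in the same way. The code below is an illustration only, with hypothetical function names; it assumes, as a simplification, that the tail on the far side of μ0 contributes nothing to β, and it uses exact normal quantiles, so its results differ slightly from table-based values. The second function then computes the Type II error of the resulting rule without that simplification, as a check.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_design(mu0, mu1, sigma, alpha, beta):
    """Return (unrounded N, c*) for the rule 'accept Ho if |X-bar - mu0| <= c*'.

    Assumes the far tail is negligible when computing beta
    (hypothetical helper for illustration).
    """
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    root_n = (z_half_alpha + z_beta) * sigma / abs(mu1 - mu0)
    c_star = z_half_alpha * sigma / root_n
    return root_n ** 2, c_star


def type_two_error(mu0, mu1, sigma, n, c_star):
    """P{mu0 - c* <= X-bar <= mu0 + c*} when the true mean is mu1."""
    dist = NormalDist(mu1, sigma / sqrt(n))
    return dist.cdf(mu0 + c_star) - dist.cdf(mu0 - c_star)


# 34.2 - Example 1: mu0 = 850, mu1 = 870, sigma = 80, alpha = beta = 0.05.
n_exact, c_star = two_sided_design(850, 870, 80, 0.05, 0.05)
beta_check = type_two_error(850, 870, 80, n_exact, c_star)
print(round(n_exact, 1), round(c_star, 2), round(beta_check, 4))
```

For this example the sketch returns an N close to 208 and a c* close to 10.87, in line with the table-based answer of N = 207 and c* = 10.89, and the full Type II error is essentially 0.05, which suggests the neglected tail really is negligible here.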

Unsolved Problem with Answers

34.2 - Problem 1: Over many years, records have shown that in a certain large airport the average number of pieces of passenger luggage handled by the airport was 25,500 per day. Not satisfied with this number, the directors decided to modify the system that the employees used in handling the passengers' luggage. After completion, a random sample of 100 days was taken to find out if there had been any significant change in the amount of luggage handled. Assume a standard deviation of 2,000. (a). State Ho and Ha. (b). Assume μ = 25,500. For the following decision rule: D.R.: If 25,500 - c* ≤ X̄ ≤ 25,500 + c*, then reject Ha; otherwise, reject Ho and accept Ha. Find N and c* so that α = 0.10 and β = 0.01 when μ = 27,700 and P{X̄ < 25,500} = 0. (c). Restate the decision rule.

Answers: ➤(a). Ho: μ = 25,500, Ha: μ ≠ 25,500 ➤(b). N ≈ 380, c* ≈ 82.62 ➤(c). D.R.: Take a sample of size N = 380. If 25,500 - 82.62 ≤ X̄ ≤ 25,500 + 82.62, then reject Ha; otherwise, reject Ho and accept Ha.

⇑ Refer back to 34.2 - Example 1 & 34.2 - Solved Problem 1.

Supplementary Problems 1. Mrs. Pillar is running for reelection to Congress. Her opponent is against free trade. She will favor free trade if 50% or more of her district is in favor of free trade. She hires a political analyst to take a poll of 200 voters from her district to find out the number in favor of free trade. She decides on the following decision rule for supporting or rejecting free trade in her campaign: D.R.: If at least 95 of the voters say they are in favor of free trade, she will state in her election advertisements that she supports free trade. However, if fewer than 95 say they are in favor of free trade, she will state in her election advertisements that she does not support free trade. a. State Ho and Ha. b. Find the probability of a Type I error α.

c. If the true proportion in her district that supports free trade is only 40%, find the probability that she will commit a Type II error β. d. Modify the decision rule so that α = 0.05 when at least 50% of the voters support free trade and β = 0.01 when 40% of the voters support free trade. 2. A Federal agency believes that a national bank discriminates against a certain minority group when approving loans. This group constitutes 17% of the population. The agency decides to check 100 loans at random for evidence of discrimination. They use the following decision rule: D.R.: If 16 or fewer of these loans are issued to members of this minority group, then the agency will conclude that the bank discriminates against this group; otherwise, the agency will reserve judgement. a. State Ho and Ha. b. Find the probability of a Type I error α. c. If the true proportion of loans given to this group is 12%, find the Type II error β. d. Modify the decision rule so that α = 0.02 when at least 17% of the loans are approved for this group and β = 0.05 when 12% of the loans are approved for this group.

*This lesson is not required for the understanding of the sequel and can be omitted.

Statistical Inference Theory Lesson 35 The Distribution of P

35.1-What is the Central Limit Theorem for P?

Assume a Bernoulli experiment with n independent trials and the probability of success equals p for each trial. We define X1, X2, X3, …, Xn to be a sequence of mutually independent random variables where P{Xk = 1} = p and P{Xk = 0} = 1 - p = q on the kth trial (k = 1, 2, 3, …, n). It can be shown that E(Xk) = p and σ²k = pq = σ² (k = 1, 2, …, n). We define

P = X/n,

where X = X1 + X2 + X3 + … + Xn.

The Central Limit Theorem states the following about the distribution of the random variable P:

For a large sample (n ≥ 30), P is approximately normally distributed. The mean of P is

μP = p.

The standard deviation of P is

σP = √(pq/n),

which is called the standard error of proportions. If p is not known, then use the sample proportion in its place:

σP ≈ √(P(1 - P)/n).

If p is known, the distribution of

Z = (P - p)/σP

is approximately normally distributed with mean 0 and standard deviation 1.

35.1 - Example 1: Past records of the student body at a large university show that 65% of the student body is female. A sample of n = 100 students is taken at random. Find (a). σP. (b). the probability that this sample has more than 70% female students. (c). the probability that this sample has less than 62% female students. (d). the probability that this sample has between 70% and 80% female students.

Solutions: ➤(a). We are given that p = 65/100 = 0.65 and the sample size taken is n = 100. From the Central Limit Theorem, we have

σP = √(pq/n) = √((0.65)(0.35)/100) ≈ 0.048.

➤(b). We use the formula z = (P - p)/σP to find the area under the normal distribution curve for figure 3: P = 0.70, p = 0.65, σP = 0.048. Therefore, z = (0.70 - 0.65)/0.048 ≈ 1.04. From the Normal Distribution tables, P{P ≥ 0.70} = 0.5 - 0.3508 = 0.1492. (fig. 1)

➤(c). We use the formula z = (P - p)/σP = (0.62 - 0.65)/0.048 ≈ -0.63. (fig. 2)

P{P ≤ 0.62} = 0.5 - 0.2357 = 0.2643

➤(d). We need to find P{0.70 ≤ P ≤ 0.80}. For P = 0.80, z = (0.80 - 0.65)/0.048 ≈ 3.13.

For P = 0.70, z = (0.70 - 0.65)/0.048 ≈ 1.04.

P{0.70 ≤ P ≤ 0.80} = 0.4991 - 0.3508 = 0.1483. (fig. 3)
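The table look-ups in 35.1 - Example 1 can be cross-checked with a few lines of Python. This is a sketch under the normal approximation described above, not part of the original text; the helper name proportion_dist is hypothetical. Because it does not round σP or z to two or three decimals, its answers differ from the table-based ones in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_dist(p, n):
    """Approximate normal distribution of the sample proportion P
    (mean p, standard deviation sqrt(pq/n)); hypothetical helper."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# 35.1 - Example 1: p = 0.65, n = 100.
dist = proportion_dist(0.65, 100)
sigma_p = dist.stdev
more_than_70 = 1 - dist.cdf(0.70)           # part (b)
less_than_62 = dist.cdf(0.62)               # part (c)
between_70_80 = dist.cdf(0.80) - dist.cdf(0.70)  # part (d)

print(round(sigma_p, 3))
print(round(more_than_70, 4), round(less_than_62, 4), round(between_70_80, 4))
```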

35.1 - Example 2: The Bubble Bottling Company has a machine that fills bottles with 12 ounces of orange juice. According to their quality control engineer, the machine fills 70% of the bottles with 12 ounces or more. In monitoring this filling process, the company uses the following decision rule: D.R.: Every two hours, they randomly select 100 bottles from the production line. If 65% or less of these bottles contain 12 ounces or more, then stop the process and make appropriate adjustments. (a). Assume the machine fills 70% of these bottles with 12 ounces or more. Find the probability that the process will be shut down. (b). Modify the decision rule so that for a sample of n = 100, the machine will only be shut down 1% of the time when p = 0.70.

Solutions: ➤(a). Step 1: Let p = 0.70. Step 2: Let P = 0.65. Step 3: σP = √((0.70)(0.30)/100) ≈ 0.046 Step 4: z = (P - p)/σP = (0.65 - 0.70)/0.046 = -1.09 (fig. 4)

Step 5: From the normal distribution table: P{P ≤ 0.65} = 0.5 - 0.3621 = 0.1379.

➤(b). We rewrite the decision rule first as: D.R.: Every two hours, they randomly select 100 bottles from the production line. If p* or less of these bottles contain 12 ounces or more, then stop the process and make appropriate adjustments. Step 1: We use the formula p* = p + zσP. Step 2: p = 0.70 Step 3: σP = 0.046 (fig. 5)

Step 4: For a lower tail of 0.01, z = -2.33, so p* = p + zσP = 0.70 + (-2.33)(0.046) ≈ 0.593. Step 5: The decision rule now reads: D.R.: Every two hours, they randomly select 100 bottles from the production line. If 59.3% or less of these bottles contain 12 ounces or more, then stop the process and make appropriate adjustments.
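The cutoff p* = p + zσP from part (b) can be computed directly. The sketch below is illustrative only; cutoff_proportion is a hypothetical name, and the lower_tail flag is an added convenience for rules that act on large sample proportions instead of small ones.

```python
from math import sqrt
from statistics import NormalDist


def cutoff_proportion(p, n, tail_prob, lower_tail=True):
    """Cutoff p* such that the sample proportion falls beyond p*
    with probability tail_prob (normal approximation; hypothetical helper)."""
    sigma_p = sqrt(p * (1 - p) / n)
    z = NormalDist().inv_cdf(tail_prob)  # negative when tail_prob < 0.5
    # Lower tail: p* lies below p. Upper tail: mirror it above p.
    return p + z * sigma_p if lower_tail else p - z * sigma_p


# 35.1 - Example 2(b): shut down only 1% of the time when p = 0.70, n = 100.
p_star = cutoff_proportion(p=0.70, n=100, tail_prob=0.01)
print(round(p_star, 3))
```

With exact quantiles the cutoff comes out near 0.593, matching the table-based result.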

Solved Problems

35.1 - Solved Problem 1: Mr. Dow claims that he can predict, 60% of the time, the direction of the stock market. On each of the next 36 trading days, he predicts the direction of the market. Find (a). σP (b). the probability that he is correct more than 65% of the time. (c). the probability that he is correct less than 50% of the time. (d). the probability that he is correct between 48% and 55% of the time.

Solutions:

➤(a). We are given that p = 0.60 and the sample size taken is n = 36. From the Central Limit Theorem, we have:

σP = √(pq/n) = √((0.60)(0.40)/36) ≈ 0.08

➤(b). We use the formula z = (P - p)/σP to find the area under the normal distribution curve for figure 6: P = 0.65, p = 0.60, σP = 0.08. Therefore,

z = (0.65 - 0.60)/0.08 ≈ 0.63

From the Normal Distribution table, P{P > 0.65} = 0.5 - 0.2357 = 0.2643.

➤(c). We use the formula: z = (P - p)/σP = (0.50 - 0.60)/0.08 = -1.25, so P{P < 0.50} = 0.5 - 0.3944 = 0.1056. (fig. 7) (fig. 8)

➤(d). We need to find P{0.48 ≤ P ≤ 0.55}.

For P = 0.48, z = (0.48 - 0.60)/0.08 = -1.50.

For P = 0.55, z = (0.55 - 0.60)/0.08 ≈ -0.63.

P{0.48 ≤ P ≤ 0.55} = 0.4332 - 0.2357 = 0.1975 (fig. 9)

35.1 - Solved Problem 2: Congresswoman Jones takes a survey of 400 voters in her district on their support of a new environmental law. If 55% or more say they support such a law, she will vote for it. (a). If only 48% of the voters in her district support this proposed law, find the probability that she will vote for the law. (b). Modify the decision rule so there is only a 5% chance she will vote for this bill when p = 0.48.

Solutions: ➤(a). Step 1: Let p = 0.48. Step 2: Let P = 0.55. Step 3: σP = √((0.48)(0.52)/400) ≈ 0.025 Step 4: z = (P - p)/σP = (0.55 - 0.48)/0.025 = 2.80 (fig. 10)

Step 5: From the normal distribution table: P{P ≥ 0.55} = 0.5 - 0.4974 = 0.0026.

➤(b). We rewrite the decision rule first as:

D.R.: Congresswoman Jones takes a survey of 400 voters in her district on their support of a new environmental law. If p* or more say they support such a law, she will vote for it. Step 1: We use the formula: p* = p + zσP. Step 2: p = 0.48 Step 3: σP = 0.025 (fig. 11)

Step 4: For an upper tail of 0.05, z = 1.64, so p* = 0.48 + (1.64)(0.025) ≈ 0.52. Step 5: The decision rule now reads: D.R.: Congresswoman Jones takes a survey of 400 voters in her district on their support of a new environmental law. If 52% or more say they support such a law, she will vote for it.

Unsolved Problems with Answers 35.1 - Problem 1: Mrs. Jones claims that she has ESP. To check her claim, 100 cards are placed on a desk. For each card, the side facing up is blank and the down side is marked with the letter A or B. All 100 cards are selected one at a time and she attempts to predict the letter. Assume that on average she can predict correctly 70% of the time. Find

(a). σP (b). the probability that she is correct more than 65% of the time. (c). the probability that she is correct less than 65% of the time. (d). the probability that she is correct between 65% and 75% of the time. Answers: ➤(a). σP ≈ 0.046 ➤(b). 0.8621 ➤(c). 0.1379 ➤(d). 0.7242 ⇑ Refer back to 35.1 - Example 1 & 35.1 - Solved Problem 1. 35.1 - Problem 2: A publishing company of a gardening magazine believes that only 35% of its subscribers are men. A random survey of 400 is taken of its subscribers. If no more than 45% of those sampled are men, the company will increase its efforts to have more men subscribe. (a). If 50% of its subscribers are men, find the probability the company will increase their efforts to obtain more men subscribers. (b). Modify the decision rule so there is only a 1% chance the company will increase their efforts to obtain more men subscribers where p = 0.50. Answers: ➤(a). 0.0228 ➤(b). D.R.: A random survey of 400 is taken of its subscribers. If at most 44% of its subscribers are men, the company will increase its efforts to have more men subscribe. ⇑ Refer back to 35.1 - Example 2 & 35.1 - Solved Problem 2.

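35.2 - Solving Binomial Problems Using P.

35.2 - Example 1: A fair coin is tossed 100 times. Find the probability that

(a). at least 60 heads occur. (b). between 55 and 65 heads occur.

Solutions: ➤(a). To use the distribution of P, we need to convert the problem into proportions: Step 1: Since the coin is fair, p = 0.5. Step 2: "At least 60 heads occur" changes to the event {P ≥ 0.60}. Step 3: σP = √((0.5)(0.5)/100) = 0.05 and z = (0.60 - 0.50)/0.05 = 2.00. Step 4: From the table: P{P ≥ 0.60} = 0.5 - 0.4772 = 0.0228. (fig. 12)

➤(b). Step 1: The event "between 55 and 65 heads occur" converts to {0.55 ≤ P ≤ 0.65}. Step 2: σP = 0.05 (fig. 13)

Step 3: For P = 0.55, z = (0.55 - 0.50)/0.05 = 1.00; for P = 0.65, z = (0.65 - 0.50)/0.05 = 3.00. Step 4: P{0.55 ≤ P ≤ 0.65} = 0.4987 - 0.3413 = 0.1574.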
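Since the distribution of P is only an approximation to the binomial distribution, it can be instructive to compare it with the exact binomial probability. The sketch below is an illustration, not part of the original text, and the function names are hypothetical. It evaluates "at least 60 heads in 100 tosses of a fair coin" both ways; the approximation is applied exactly as in this section, without a continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_at_least(k, n, p):
    """Exact P{X >= k} for a binomial(n, p) count X."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def proportion_at_least(k, n, p):
    """P{P >= k/n} using the normal distribution of the sample proportion."""
    dist = NormalDist(p, sqrt(p * (1 - p) / n))
    return 1 - dist.cdf(k / n)


exact = binomial_at_least(60, 100, 0.5)
approx = proportion_at_least(60, 100, 0.5)
print(round(exact, 4), round(approx, 4))
```

In this case the exact value is somewhat larger than the approximate one, which suggests that answers obtained from the distribution of P are best read as estimates.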

Solved Problems

35.2 - Solved Problem 1: A large shipment of machine tools has 5% defective parts. A sample of 49 tools is taken. Find the probability that (a). at most 5 tools are defective. (b). between 1 and 4 tools are defective.

Solutions: ➤(a). To use the distribution of P, we need to convert the problem into proportions: Step 1: Since 5% are defective, p = 0.05. Step 2: The event "at most 5 tools are defective" changes to the event {P ≤ 5/49}.

Step 3: Step 4: From the table

fig. 14

➤(b). Step 1: The event "between 1 and 4 defective tools" converts to {1/49 ≤ P ≤ 4/49}.

Step 2: fig. 15

Step 3: Step 4:

Unsolved Problems with Answers 35.2 - Problem 1: A recent report from a government agency shows that 65% of businesses are in favor of eliminating capital gains taxes on businesses. In a random sample of 50 business owners, find the probability that (a). less than 25 owners are in favor of eliminating capital gains taxes. (b). between 30 and 45 are in favor of eliminating capital gains taxes. Answers: ➤(a). 0.0125 ➤(b). 0.77 ⇑ Refer back to 35.2 - Example 1 & 35.2 - Solved Problem 1.

Supplementary Problems 1. If a sample of size n is taken, without replacement, from a finite population of size N, then σP = √(pq/n)·√((N - n)/(N - 1)). Assume a computer randomly selects 36 numbers, without replacement, from the integers {1, 2, 3, …, 100}. We define success as selecting an even number. a. Find σP. b. Find the probability that at least 60% of the numbers selected are even. 2. The weights of army officers and naval officers are normally distributed as follows: For army officers, the average weight is 200 pounds with a standard deviation of 10 pounds. For naval officers, the average weight is 210 pounds with a standard deviation of 15 pounds. Assume 36 army officers and 49 naval officers are randomly sampled. Find the probability: a. that at least 75% of the army officers weigh more than 195 pounds and at least 85% of the naval officers weigh more than 195 pounds. b. that at least 75% of the army officers weigh more than 195 pounds or at least 85% of the naval officers weigh more than 195 pounds. 3. A publishing company of a gardening magazine believes that only 35% of its subscribers are men. A random survey of N of its subscribers is taken. If at most 45% of the sampled subscribers are men, the company will increase its efforts to have more men subscribe. If 50% of its subscribers are men, find the appropriate sample size N so that there is only a 2% chance the company will increase its efforts to have more men subscribe. 4. A sample of size N is taken, where P{P ≤ 0.35} = 0.02 and P{P ≥ 0.45} = 0.01. Find p and N. 5. Ms. Jones loves to wager on basketball. When she wins, she wins $100 and

when she loses, she loses $110. Her decision as to which team to wager on is determined by tossing a fair coin: heads she wagers on team A and tails on team B. Each month, for 9 months, she wagers on 90 games. a. Find her expected return for each play, each month and the entire season. b. For any given month, what is the probability that she will lose money? c. Find the probability that she will lose money at the end of each month. 6. Mrs. Pillar is running for reelection to Congress. Her opponent is against free trade. She hires a political analyst to take a poll of 200 voters from her district to find out the number in favor of free trade. She decides on the following decision rule for supporting free trade in her campaign: D.R.: If at least 53% of the voters say they are in favor of free trade, she will state in her election advertisements that she supports free trade. However, if less than 53% say they are in favor of free trade, she will state in her election advertisements that she does not support free trade. a. Assume 55% of all voters in the district support free trade. Find the probability that she will not support free trade in her campaign. b. Assume 50% of all voters in the district support free trade. Find the probability that she will support free trade in her campaign. c. Modify the decision rule so the chance she will support free trade in her campaign is 0.01 even though only 50% of all voters in the district support free trade. d. Find a sample size N and an appropriate decision rule so that with probability 0.01 she will state in her election advertisements that she does not support free trade when p = 0.55, and with probability 0.05 she will state in her election advertisements that she does support free trade when p = 0.5. 7. Assume a Bernoulli experiment with n independent trials and the probability of success equals p for each trial.
We define X1, X2, X3,,…, Xn to be a sequence of mutually independent random variables where P{Xk = 1} = p and P{Xk = 0} = 1 - p = q on the kth trial (k = 1, 2, 3,…, n). Show a. E(Xk) = p.

b. σ²Xk = pq = σ² (k = 1, 2, …, n). c. We define P = X/n,

where X = X1 + X2 + X3 + … + Xn.

In lesson 16, problem 13, we showed

Show the standard deviation of P is σP = √(pq/n).

8. Assume a Bernoulli experiment with n independent trials and the probability of success equals p for each trial. We define X1, X2, X3,,…, Xn to be a sequence of mutually independent random variables where P{Xk = 1} = p and P{Xk = 0} = 1 - p = q on the kth trial (k = 1, 2, 3,…, n). Define Show

. .

(Hint: See Lesson 16, problem 13.) 9. Comparing the normal approximation to the binomial distribution and the distribution of P. Assume a fair coin is tossed 100 times. a. Using the normal approximation to the binomial distribution, find the probability the number of heads is between 45 and 55. b. Find the probability that the number of heads is between 45 and 55 using the distribution of P.

Statistical Inference Theory Lesson 36 Estimating the Proportion of a Population

Since p generally is not known, the goal of inference theory is to use the sample proportion P as an estimate of p. There are two types of estimates: point estimates and interval estimates. In either type of estimate, P is substituted in place of p. This substitution creates an error.

36.1-What is the error created when using a point estimate? The following formula gives the error created when p is replaced by P:

Error = e* = z√(P(1 - P)/N),

where P is the proportion computed from the sample and N is the sample size.

36.1 - Example 1: A large university wants to estimate the percentage of students that have part-time jobs. A survey of 100 students is taken and 60% of these students have part-time jobs. Assume P = 0.60. (a). Find the probability that the error created exceeds 5%. (b). Find the minimum sample size so that the probability is 0.02 of making an error that exceeds 5%.

Solutions: ➤(a). Here P = 0.60 and N = 100. (fig. 2)

The difference between P and p is the error e* = P - p. We need to find the probability that the error exceeds 5%. Step 1: Since P = 0.60 and N = 100, we compute the standard deviation of the sample:

σP = √((0.60)(0.40)/100) ≈ 0.049

Step 2: e* = zσP, so 0.05 = z(0.049). Step 3: Solving for z gives

z = 0.05/0.049 ≈ 1.02

Step 4: From the normal distribution table: (fig. 3)

P{e* > 0.05} = 0.5 - 0.3461 + 0.5 - 0.3461 = 0.1539 + 0.1539 = 0.3078.

➤(b). Step 1: We require the probability that the error exceeds 0.05 to be 0.02. Step 2: Since this probability is to be 0.02, each tail has area 0.02/2 = 0.01. From the normal distribution table for 0.5 - 0.01 = 0.49, z = 2.33. Step 3: 0.05 = 2.33√((0.60)(0.40)/N), so

N = (2.33)²(0.60)(0.40)/(0.05)² ≈ 521
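Part (a) of this example can be reproduced with a short function. The following Python sketch is illustrative only, with a hypothetical function name; it treats the error as two-sided, as in the solution above, and uses the exact normal distribution in place of the rounded table values.

```python
from math import sqrt
from statistics import NormalDist


def prob_error_exceeds(p_sample, n, e):
    """Probability that the estimation error exceeds e in either direction,
    using sigma_P = sqrt(P(1 - P)/N) (hypothetical helper)."""
    sigma_p = sqrt(p_sample * (1 - p_sample) / n)
    z = e / sigma_p
    return 2 * (1 - NormalDist().cdf(z))


# 36.1 - Example 1(a): P = 0.60, N = 100, error of 5%.
prob = prob_error_exceeds(0.60, 100, 0.05)
print(round(prob, 4))
```

The result is close to the table-based value of 0.3078. Calling the same function with a larger n shows how quickly the probability falls as the sample grows.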

Solved Problems

36.1 - Solved Problem 1: A company that manufactures a new gasoline additive is interested in testing the additive to determine the percentage of cars that increase their mileage by 2.1 miles per gallon or more. It selects 36 different cars and runs each car for 100 miles. The final results showed that 48% of these cars increased their mileage by at least 2.1 miles per gallon. Assume P = 0.48 is used to estimate p. (a). Find the probability that the error created exceeds 10%. (b). Find the minimum sample size so that the probability is 0.05 of making an error that exceeds 10%.

Solutions: ➤(a). Here P = 0.48 and N = 36. (fig. 4)

The difference between P and p, P - p, is the error e*. We need to find the probability that the error e* exceeds 0.10. Step 1: Since P = 0.48 and N = 36, we compute the standard deviation of the sample:

σP = √((0.48)(0.52)/36) ≈ 0.083

Step 2: e* = zσP, so 0.10 = z(0.083). Step 3: Solving for z gives

z = 0.10/0.083 ≈ 1.20

Step 4: From the normal distribution table: (fig. 5)

P{e* > 0.10} = 0.5 - 0.3849 + 0.5 - 0.3849 = 0.1151 + 0.1151 = 0.2302

➤(b). Step 1: We require the probability that the error exceeds 0.10 to be 0.05. Step 2: Since the probability of this error is to be 0.05, each tail has area 0.05/2 = 0.025. From the normal distribution table for 0.5 - 0.025 = 0.475, z = 1.96. Step 3: 0.10 = 1.96√((0.48)(0.52)/N), so

N = (1.96)²(0.48)(0.52)/(0.10)² ≈ 96

Unsolved Problems with Answers

36.1 - Problem 1: A machine fills bottles with orange juice. Each hour a sample of 49 filled bottles is taken to determine the percentage of bottles that are filled with more than 16.5 ounces. The sample resulted in 15% of the bottles containing more than 16.5 ounces. Assume we use this 15% as the true percentage of bottles that the machine fills with more than 16.5 ounces. (a). Find the probability that the error is more than 10%. (b). Find the minimum sample size so that the probability is 0.01 of making an error that exceeds 10%. Answers: ➤(a). 0.05 ➤(b). N ≈ 84 ⇑ Refer back to 36.1 - Example 1 & 36.1 - Solved Problem 1.

36.2 - What is the error created when using a Confidence Interval estimate? The interval estimate of p is called the confidence interval, given by the following formulas:

P - zσP ≤ p ≤ P + zσP, where σP = √(P(1 - P)/N).

The value z is determined according to the confidence that p lies within the given interval.

36.2 - Example 1: A large university wants to estimate the percentage of students that have part-time jobs. A survey of 100 students is taken and 60% of these students have part-time jobs. (a). Find a 90% confidence interval for p. (b). Find a 95% confidence interval for p.

Solutions: ➤(a). Since the confidence interval is 90%, we use the area 0.90/2 = 0.45 to find z = 1.64. (fig. 6)

Step 1: Here P = 0.60 Step 2: N = 100 Step 3: σP = √((0.60)(0.40)/100) ≈ 0.049 Step 4: P ± zσP = 0.60 ± z(0.049) Step 5: Since the confidence interval is 0.90, z = 1.64 and 0.60 - 1.64(0.049) ≤ p ≤ 0.60 + 1.64(0.049) Step 6: 0.52 ≤ p ≤ 0.68

➤(b). Since the confidence interval is 95%, we use the area 0.95/2 = 0.475 to find z = 1.96. (fig. 7)

Step 1: Here P = 0.60 Step 2: N = 100 Step 3: σP = √((0.60)(0.40)/100) ≈ 0.049 Step 4: P ± zσP = 0.60 ± z(0.049) Step 5: Since the confidence interval is 0.95, z = 1.96 and 0.60 - 1.96(0.049) ≤ p ≤ 0.60 + 1.96(0.049) Step 6: 0.50 ≤ p ≤ 0.70
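The six steps of this example can be condensed into one function. This is a minimal sketch with a hypothetical function name, using the sample proportion in place of p in the standard error, as the lesson does; the z value comes from the exact normal quantile, so the endpoints may differ from the table-based ones in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_sample, n, confidence):
    """Return (low, high) for the interval P - z*sigma_P <= p <= P + z*sigma_P
    (hypothetical helper)."""
    # Area 'confidence' in the middle leaves (1 - confidence)/2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_sample * (1 - p_sample) / n)
    return p_sample - margin, p_sample + margin


# 36.2 - Example 1: P = 0.60, N = 100.
low_90, high_90 = proportion_confidence_interval(0.60, 100, 0.90)
low_95, high_95 = proportion_confidence_interval(0.60, 100, 0.95)
print(round(low_90, 2), round(high_90, 2))
print(round(low_95, 2), round(high_95, 2))
```

Raising the confidence from 90% to 95% widens the interval, which is the trade-off the two parts of the example illustrate.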

Solved Problems

36.2 - Solved Problem 1: A company that manufactures a new gasoline additive is interested in testing the additive to determine the percentage of cars that increase their mileage by 2.1 miles per gallon or more. It selects 36 different cars and runs each car for 100 miles. The final results showed that 46% of these cars increased their mileage by at least 2.1 miles per gallon. (a). Find a 92% confidence interval for p. (b). Find a 99% confidence interval for p.

Solutions: ➤(a). Since the confidence interval is 92%, we use the area 0.92/2 = 0.46 to find z = 1.75. (fig. 8)

Step 1: Here P = 0.46 Step 2: N = 36 Step 3: σP = √((0.46)(0.54)/36) ≈ 0.083 Step 4: P ± zσP = 0.46 ± z(0.083) Step 5: Since the confidence interval is 0.92, z = 1.75 and 0.46 - 1.75(0.083) ≤ p ≤ 0.46 + 1.75(0.083) Step 6: 0.31 ≤ p ≤ 0.61

➤(b). Since the confidence interval is 99%, we use the area 0.99/2 = 0.495 to find z = 2.57. (fig. 9)

Step 1: Here P = 0.46 Step 2: N = 36 Step 3: σP = √((0.46)(0.54)/36) ≈ 0.083 Step 4: P ± zσP = 0.46 ± z(0.083) Step 5: Since the confidence interval is 0.99, z = 2.57 and 0.46 - 2.57(0.083) ≤ p ≤ 0.46 + 2.57(0.083) Step 6: 0.25 ≤ p ≤ 0.67

Unsolved Problems with Answers 36.2 - Problem 1: A machine fills bottles with orange juice. Each hour a sample of 49 filled bottles is taken to determine the percentage of bottles that are filled with more than 16.5 ounces. The sample resulted in 15% of the bottles containing more than 16.5 ounces. (a). Find a 90% confidence interval for p. (b). Find a 95% confidence interval for p. Answers: ➤(a). 0.07 ≤ p ≤ 0.23 ➤(b). 0.05 ≤ p ≤ 0.25 ⇑ Refer back to 36.2 - Example 1 & 36.2 - Solved Problem 1.

36.3 Determining the Minimum Sample Size. In the first section of this lesson, for each example and problem, we derived an estimate for the minimum sample size needed under the condition that a given error will exceed a given amount with a specified probability. In this section we give the formula needed to derive the same minimum sample size, within a given confidence interval:

N = z²p(1 - p)/(e*)²,

where e* is the error, z is determined by the confidence of the estimate and p is the estimate of the true proportion for the population. However, if no estimate of p is given, then use p = 0.5, which will give the largest sample size N possible.

36.3 - Example 1: Last year a national survey showed that 5% of the population regularly watch soap operas. (a). Find the sample size N needed to estimate, with 90% confidence, the proportion of people that watch soap operas within a 1% error. (b). Assuming that last year's survey is not used, find the sample size N needed to estimate, with 90% confidence, the proportion of people that watch soap operas within a 1% error.

Solutions: ➤(a). Step 1: p = 0.05 Step 2: e* = 0.01 Step 3: Since we want a confidence of 90%, z = 1.64. Step 4: N = (1.64)²(0.05)(0.95)/(0.01)² ≈ 1,278.

➤(b). Step 1: Since last year's estimate is not used, p = 0.5. Step 2: e* = 0.01 Step 3: Since we want a confidence of 90%, z = 1.64. Step 4: N = (1.64)²(0.5)(0.5)/(0.01)² = 6,724.

Solved Problems
36.3 - Solved Problem 1: A medical journal needs a sample to determine the true percentage of male patients that recover from heart surgery within 30 days. They first did a preliminary study and found that 75% of these patients recover within 30 days.
(a). Find the sample size needed to estimate the true proportion with 95% confidence and within 5% error.
(b). Assuming that the preliminary study is not used, find the sample size needed to estimate the true proportion with 95% confidence and within 5% error.
Solutions:
➤(a).
Step 1: p = 0.75
Step 2: e* = 0.05
Step 3: Since we want a confidence of 95%, z = 1.96.
Step 4: N = (1.96)²(0.75)(0.25)/(0.05)² = 288.12, so N ≈ 289.
➤(b).
Step 1: p = 0.50
Step 2: e* = 0.05
Step 3: Since we want a confidence of 95%, z = 1.96.
Step 4: N = (1.96)²(0.50)(0.50)/(0.05)² = 384.16, so N ≈ 385.

Unsolved Problems with Answers 36.3 - Problem 1: Over the years, a large liberal arts college claims that 30% of its students have a grade point average of 3.0. (a). Find the sample size needed to estimate the true proportion of students with a grade point average of 3.0 with a confidence of 99% that the error will not exceed 5%. (b). Compute the sample size where no estimate is used. Answers: ➤(a). N ≈ 555 ➤(b). N ≈ 661 ⇑ Refer back to 36.3 - Example 1 & 36.3 - Solved Problem 1.
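The sample size rule of this section follows from requiring zσP ≤ e*, which gives N ≥ z²p(1 - p)/(e*)². A minimal Python sketch of that rule is shown below; the function name is illustrative, and the z values are the rounded table values used in the text (an assumption that matters, since a more precise z can change N by a few units).

```python
from math import ceil


def minimum_sample_size(z, error, p=0.5):
    """Smallest whole N with z * sqrt(p(1 - p)/N) <= error.

    With no prior estimate, p defaults to 0.5, where p(1 - p) is largest.
    """
    return ceil(z ** 2 * p * (1 - p) / error ** 2)


# 36.3 - Problem 1: 99% confidence (table value z = 2.57), error 0.05.
print(minimum_sample_size(2.57, 0.05, p=0.30))  # 555
print(minimum_sample_size(2.57, 0.05))          # 661
```

Rounding up rather than to the nearest integer keeps the error at or below e*, which matches the answers 555 and 661 given for Problem 1.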

Supplementary Problems The estimator P and the standard error of proportions σP in problem 1 can be

computed from the formulas:

1. A national advertising firm hired by a computer manufacturing company wants to find an estimate for the percentage of families that own computers. They take random samples from the adult population in five cities. The following table gives the results of this survey:

a. Find P.
b. Find σP.
c. Using P to estimate p, find the probability that the error exceeds 2%.
d. Find a 95% confidence interval for p.
2. Assume for the formulas used in problem 1 that all the sample sizes are the same: (N = N1 = N2 = … = Nn)
a. Rewrite the formulas.
b. Find a formula for estimating the sample size, given z, e* and .
3. Mr. Jones wagers on basketball games. Over a 9 month period, he wagered on 20 games per month, with the following monthly percentage wins:

a. Using this table find a 95% confidence interval to estimate his yearly performance. b. Using a point estimate p, how many plays would he have to make each month to be 90% confident that his estimate p is not off by more than 2%? c. For each game he can win $100 or lose $110. Find the probability that he loses money at the end of any given month. d. Find the probability that over 9 months he does not lose money at the end of any month. 4. Show that p = 0.5 will give the largest value of N when estimating a minimum sample size for a given error and confidence interval.

Statistical Inference Theory Lesson 37 Decision Theory Using P

Decisions often have to be made by challenging claims on the value p of a population. We will use hypothesis testing as discussed in previous lessons.

37.1- Real Life Applications 37.1 - Example 1: For a certain population, the following claim and counterclaim was made: Ho: p = 0.50 Ha: p ≠ 0.50. To test this claim, a sample is to be taken from the population after the following decision rule: D.R.: If

then reject Ha; otherwise reject Ho and accept Ha.

Assume the sample proportion resulted in P = 0.62.

(a). From the decision rule, which of the following would you do:
i. Reject Ho and accept Ha.
ii. Reject Ha.
(b). If p = 0.55 did an error occur?
(c). If p = 0.50 did an error occur?
Solution:
➤(a). Since the sample proportion P = 0.62 is outside the interval of the decision rule, the decision rule requires that you reject Ho and accept Ha.
➤(b).
Step 1: Since the population proportion p = 0.55, the claim Ho: p = 0.50 is false and the counter-claim Ha: p ≠ 0.50 is true.
Step 2: Since the sample proportion P = 0.62 is outside the interval of the decision rule, the decision rule requires that you reject Ho and accept Ha.
Step 3: Since you are rejecting Ho, which is false, and accepting Ha, which is true, no error occurred.
➤(c).
Step 1: Since the population proportion p = 0.50, the claim Ho: p = 0.50 is true and the counter-claim Ha: p ≠ 0.50 is false.
Step 2: Since the sample proportion P = 0.62 is outside the interval of the decision rule, the decision rule requires that you reject Ho and accept Ha.
Step 3: Since you are rejecting Ho, which is true, and accepting Ha, which is false, a Type I error has occurred.
37.1 - Example 2: Mrs. Pillar is running for reelection to Congress. Her opponent is against free trade. She will favor free trade if at least 50% of her district is in favor of free trade. She hires a political analyst to take a poll of 200 voters from her district to find out the proportion in favor of free trade. She decides on the following decision rule for supporting or rejecting free trade in her campaign:
D.R.: If at least p* of the voters say they are in favor of free trade, she will state in her election advertisements that she supports free trade. However, if less than p* say they are in favor of free trade, she will state in her election advertisements that she does not support free trade.
(a). State Ho and Ha.
(b). Find p* so that the probability of making a Type I error is α = 0.05. Restate the decision rule.
(c). If the true proportion in her district that support free trade is 40%, find the probability that she will commit a Type II error β.
Solutions:
➤(a). Since she will favor free trade if 50% or more of her district is in favor of free trade,
Ho: p ≥ 0.5
Ha: p < 0.5.
fig. 1

➤(b). A Type I error assumes that Ho is true but rejected.
Step 1: p = 0.5
Step 2: σP = √(p(1 - p)/N) = √((0.5)(0.5)/200) ≈ 0.035
Step 3: We need the formula p* = p + zσP = 0.5 + z(0.035).
fig. 2
Step 4: Since we have α = 0.05, we look up in the normal distribution table the area 0.45 for the z value. Since p* lies below p, from the table z = -1.64.
Step 5: For the decision rule we find: p* = 0.5 - 1.64(0.035) = 0.4426.
D.R.: If at least 44.26% of the voters say they are in favor of free trade, she will state in her election advertisements that she supports free trade. However, if less than 44.26% say they are in favor of free trade, she will state in her election advertisements that she does not support free trade.
➤(c). A Type II error assumes that Ha is true but rejected.
Step 1: p = 0.40
fig. 3
Step 2: σP = √(p(1 - p)/N) = √((0.40)(0.60)/200) ≈ 0.035
Step 3: From (b). we use p* = 0.4426 for our decision rule.
Step 4: We use the formula: z = (p* - p)/σP = (0.4426 - 0.40)/0.035 ≈ 1.22
Step 5: From the normal distribution table: β = P{P ≥ 0.4426} ≈ 0.5 - 0.3888 = 0.1112.
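The two calculations in Example 2 (the cutoff p* for a given α, then β for a given true proportion) can be sketched in Python as follows. This is an illustrative sketch rather than the text's own procedure: the function names are hypothetical, and z is computed exactly instead of being read from the table, so p* comes out near 0.442 rather than 0.4426 and β near 0.11.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sigma_p(p, n):
    """Standard error of the sample proportion when the true proportion is p."""
    return sqrt(p * (1 - p) / n)


def lower_tail_cutoff(p0, n, alpha):
    """Cutoff p* with P{P < p*} = alpha when Ho holds at its boundary p = p0."""
    return p0 + STD_NORMAL.inv_cdf(alpha) * sigma_p(p0, n)


def type_two_error(p_star, p_true, n):
    """beta = P{P >= p*} when the true proportion is p_true (Ho is kept)."""
    return 1 - NormalDist(p_true, sigma_p(p_true, n)).cdf(p_star)


# 37.1 - Example 2: N = 200 voters, alpha = 0.05, true proportion 0.40.
p_star = lower_tail_cutoff(0.5, 200, 0.05)
beta = type_two_error(p_star, 0.40, 200)
print(f"p* = {p_star:.4f}, beta = {beta:.4f}")
```

Note that σP is recomputed with the true proportion when finding β, because the sampling distribution of P changes once p = 0.40 is assumed.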

37.1 - Example 3: According to a recent medical journal, 62% of all students attending medical schools in the United States are women. In testing this claim, a sample of 400 students in medical schools is randomly selected and the following decision rule is used:
D.R.: If the proportion P of the sample is between 0.62 - p* and 0.62 + p* then accept the claim of the medical journal; otherwise reject the claim.
(a). State Ho and Ha.
(b). Find p* so that the probability of making a Type I error is α = 0.02. Also restate the decision rule.
(c). If the true proportion of women attending medical school is really only 50%, find the probability of making a Type II error β.
Solutions:
➤(a). Ho: p = 0.62
Ha: p ≠ 0.62
➤(b). A Type I error assumes that Ho is true but rejected.
Step 1: p = 0.62
fig. 4
Step 2: σP = √(p(1 - p)/N) = √((0.62)(0.38)/400) ≈ 0.024
Step 3: We need the formula: p* = zσP = z(0.024)
fig. 5
Step 4: Since we have α = 0.02, we look up in the normal distribution table the area 0.49 for the z value. From the table z = 2.33.
Step 5: For the decision rule we find p* = (0.024)2.33 ≈ 0.06.
D.R.: If the proportion P of the sample is between 0.62 - 0.06 = 0.56 and 0.62 + 0.06 = 0.68 then accept the claim of the medical journal; otherwise reject the claim.
➤(c). A Type II error assumes that Ha is true but rejected.
Step 1: p = 0.5
fig. 6
Step 2: σP = √(p(1 - p)/N) = √((0.5)(0.5)/400) = 0.025
Step 3: The probability of a Type II error: β = P{0.56 ≤ P ≤ 0.68} where p = 0.5.
Step 4: We use the formulas: z = (0.56 - 0.5)/0.025 = 2.40 and z = (0.68 - 0.5)/0.025 = 7.20
Step 5: From the normal distribution table: β ≈ 0.5 - 0.4918 = 0.0082.

Solved Problems 37.1 - Solved Problem 1: For a certain population, the following claim and counter-claim was made: Ho: p = 0.25 Ha: p > 0.25. To test this claim, a sample is to be taken from the population after the following decision rule: D.R.: If

then reject Ho and accept Ha; otherwise reject Ha.

Assume the sample proportion resulted in

.

(a). Give an alternative version of Ho. (b). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha.

ii. Reject Ha.
(c). If p = 0.30 did an error occur?
(d). If p = 0.20 did an error occur?
Solution:
➤(a).
Step 1: The alternative to Ha is Ho.
Step 2: Since Ha: p > 0.25, the alternative to p > 0.25 is p ≤ 0.25.
Step 3: Since Ho is the alternative to Ha, the other version for Ho is Ho: p ≤ 0.25.
➤(b). Since the sample proportion does not satisfy the condition of the decision rule, the decision rule requires that you reject Ha.
➤(c).
Step 1: Since the population proportion p = 0.30, the claim Ho: p ≤ 0.25 is false and the counter-claim Ha: p > 0.25 is true.
Step 2: Since the sample proportion does not satisfy the condition of the decision rule, the decision rule requires that you reject Ha.
Step 3: Since you are rejecting Ha, which is true, a Type II error has occurred.
➤(d).
Step 1: Since the population proportion p = 0.20, the claim Ho: p ≤ 0.25 is true and the counter-claim Ha: p > 0.25 is false.
Step 2: Since the sample proportion does not satisfy the condition of the decision rule, the decision rule requires that you reject Ha.
Step 3: Since you are rejecting Ha, which is false, no error has occurred.

37.1 - Solved Problem 2: A Federal agency believes that a national bank discriminates against a certain minority group when approving loans. This group constitutes 17% of the population. The agency decides to check 1,000 loans at random for evidence of discrimination. They use the following decision rule:
D.R.: If p* or less of these loans are issued to members of this minority group then the agency will conclude that the bank discriminates against this group; otherwise the agency will reserve judgement.
(a). State Ho and Ha.
(b). Find p* so that the probability of making a Type I error is α = 0.05. Also restate the decision rule.
(c). If the true proportion of loans issued to this group is only 12% find the probability that the agency will commit a Type II error β.
Solutions:
➤(a). Ho: p ≥ 0.17
Ha: p < 0.17
➤(b). A Type I error assumes that Ho is true but rejected.
Step 1: p = 0.17
fig. 7
Step 2: σP = √(p(1 - p)/N) = √((0.17)(0.83)/1000) ≈ 0.012
Step 3: We need the formula: p* = p + zσP = 0.17 + z(0.012)
Step 4: Since we have α = 0.05, we look up in the normal distribution table the area 0.45 for the z value. Since p* lies below p, from the table z = -1.64.
fig. 8
Step 5: For the decision rule we find p* = 0.17 - 1.64(0.012) ≈ 0.15.
D.R.: If 15% or less of these loans are issued to members of this minority group then the agency will conclude that the bank discriminates against this group; otherwise the agency will reserve judgement.
➤(c). A Type II error assumes that Ha is true but rejected.
Step 1: p = 0.12
fig. 9
Step 2: σP = √(p(1 - p)/N) = √((0.12)(0.88)/1000) ≈ 0.010
Step 3: From (b). we use p* = 0.15 for our decision rule.
Step 4: We use the formula: z = (p* - p)/σP = (0.15 - 0.12)/0.010 = 3.00
Step 5: From the normal distribution table: β = P{P > 0.15} ≈ 0.5 - 0.4987 = 0.0013.

37.1 - Solved Problem 3: A large hospital claims that 70% of patients admitted for heart surgery suffer from high blood pressure. To test this claim, the records of 400 patients are examined. To accept this claim the following decision rule is used:
D.R.: If the proportion P of the sample is between 0.70 - p* and 0.70 + p* then accept the claim of the hospital; otherwise reject the claim.
(a). State Ho and Ha.
(b). Find p* so that the probability of making a Type I error is α = 0.01. Also restate the decision rule.
(c). If the true proportion of such patients is 80%, find the probability of making a Type II error β.
Solutions:
➤(a). Ho: p = 0.70
Ha: p ≠ 0.70
➤(b). A Type I error assumes that Ho is true but rejected.
Step 1: p = 0.70
fig. 10
Step 2: σP = √(p(1 - p)/N) = √((0.70)(0.30)/400) ≈ 0.023
Step 3: We need the formula: p* = zσP = z(0.023)
Step 4: Since we have α = 0.01, we look up the area 0.495 in the normal distribution table for the z value. From the table z = 2.57.
Step 5: For the decision rule we find p* = (0.023)2.57 ≈ 0.06:
D.R.: If the proportion P of the sample is between 0.70 - 0.06 = 0.64 and 0.70 + 0.06 = 0.76 then accept the claim of the hospital; otherwise reject the claim.
fig. 11
➤(c). A Type II error assumes that Ha is true but rejected.
Step 1: p = 0.80
fig. 12
Step 2: σP = √(p(1 - p)/N) = √((0.80)(0.20)/400) = 0.02
Step 3: The probability of a Type II error β = P{0.64 ≤ P ≤ 0.76} where p = 0.80.
Step 4: We use the formulas: z = (0.64 - 0.80)/0.02 = -8.00 and z = (0.76 - 0.80)/0.02 = -2.00
Step 5: From the normal distribution table: β ≈ 0.5 - 0.4772 = 0.0228.
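For the two-sided rule of Solved Problem 3, the half-width p* and the Type II error can be sketched in Python. The sketch below is an assumption-laden illustration: the function names are hypothetical, and z is computed from the normal distribution directly rather than taken from the table, so p* prints as roughly 0.059 before rounding to 0.06.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_half_width(p0, n, alpha):
    """Half-width p* with P{|P - p0| > p*} = alpha under Ho: p = p0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sqrt(p0 * (1 - p0) / n)


def beta_two_sided(low, high, p_true, n):
    """beta = P{low <= P <= high} when the true proportion is p_true."""
    dist = NormalDist(p_true, sqrt(p_true * (1 - p_true) / n))
    return dist.cdf(high) - dist.cdf(low)


# 37.1 - Solved Problem 3: N = 400, alpha = 0.01, true proportion 0.80.
p_star = two_sided_half_width(0.70, 400, 0.01)
beta = beta_two_sided(0.64, 0.76, 0.80, 400)
print(f"p* = {p_star:.3f}, beta = {beta:.4f}")
```

The value of β uses the rounded acceptance interval 0.64 to 0.76 from the restated decision rule, so it corresponds to the area below z = -2.00 under the curve centered at 0.80.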

Unsolved Problems with Answers 37.1 - Problem 1: For a certain population, the following claim and counterclaim was made: Ho: p = 0.10 Ha: p < 0.10 To test this claim, a sample is to be taken from the population after the following decision rule: D.R.: If

then reject Ho and accept Ha; otherwise reject Ha.

Assume the sample proportion resulted in

.

(a). Give an alternative version of Ho. (b). From the decision rule, which of the following would you do: i. Reject Ho and accept Ha. ii. Reject Ha. (c). If p = 0.09 did an error occur? (d). If p = 0.11 did an error occur? Answers: ➤(a). Ho: p ≥ 0.10

➤(b). Reject Ha.
➤(c). A Type II error has occurred.
➤(d). No error has occurred.
⇑ Refer back to 37.1 - Example 1 & 37.1 - Solved Problem 1.
37.1 - Problem 2: The warranty on a certain brand of tires assumes that 15% of all tires will last more than 35,000 miles. To check this assumption, a sample is taken of 1,500 tires that are traded in. The following decision rule is to be developed:
D.R.: From the sample, let P represent the proportion of tires that lasted more than 35,000 miles. If p* ≤ P then accept the claim on the warranty; otherwise reject the claim.
(a). State Ho and Ha.
(b). Find p* so that the probability of making a Type I error is α = 0.10. Also restate the decision rule.
(c). If the true proportion of tires that last more than 35,000 miles is only 10%, find the probability of a Type II error β.
Answers:
➤(a).
➤(b). p* ≈ 0.14
D.R.: From the sample, let P represent the proportion of tires that lasted more than 35,000 miles. If 0.14 ≤ P then accept the claim on the warranty; otherwise reject the claim.
➤(c). β = 0
⇑ Refer back to 37.1 - Example 2 & 37.1 - Solved Problem 2.
37.1 - Problem 3: A news broadcaster recently claimed that in the Los Angeles area about 17% of drivers drive without their seat belts. In testing this claim a sample of 500 drivers is randomly selected and the following decision rule is used:
D.R.: If the proportion P of the sample that are driving without seat belts is between 0.17 - p* and 0.17 + p* then accept the claim of the broadcaster; otherwise reject the claim.
(a). State Ho and Ha.
(b). Find p* so that the probability of making a Type I error is α = 0.05. Also restate the decision rule.
(c). If the true proportion of drivers that drive without seat belts is really 23%, find the probability of making a Type II error β.
Answers:
➤(a).
➤(b). p* = 0.04
D.R.: If the proportion P of the sample that are driving without seat belts is between 0.13 and 0.21 then accept the claim of the broadcaster; otherwise reject the claim.
➤(c). β ≈ 0.16
⇑ Refer back to 37.1 - Example 3 & 37.1 - Solved Problem 3.

Supplementary Problems
1. Assume:
Ho: p = 0.60
Ha: p > 0.60
A sample is to be taken of size N. Establish a decision rule so that the probability of a Type I error is α = 0.05 and the probability of a Type II error is β = 0.02 when p = 0.65.
2. Mrs. Pillar is running for reelection to Congress. Her opponent is against free trade. She will favor free trade if more than 50% of her district is in favor of free trade. She hires a political analyst to take a poll of voters from her district to find out the percentage in favor of free trade. She decides on the following decision rule for supporting or rejecting free trade in her campaign:
D.R.: If at least p% of the voters say they are in favor of free trade, she will state in her election advertisements that she supports free trade. However, if less than p% say they are in favor of free trade, she will state in her election advertisements that she does not support free trade.
Modify the decision rule so that α = 0.05 when at most 50% of the voters support free trade and β = 0.01 when 55% of the voters support free trade.
3. Assume:
Ho: p = 0.60
Ha: p > 0.60
A sample is to be taken of size 900. Establish a decision rule so that the probability of a Type I error α equals the probability of a Type II error β, when p = 0.65. Also find α and β.
4. A manufacturing company recently purchased a new machine that makes ball bearings. The company needs to estimate the percentage of defective ball bearings produced. They wish to take a sample of size N and set up a decision rule for future sampling so that the probability of a Type I error is α = 0.05.
a. Find N so that, with 95% confidence, the estimate is within 0.01 of the true value p.
b. Assume sampling was done for N = 9,604 and p = 0.04, the proportion of defective ball bearings. Using p = 0.04, state an appropriate null and alternative hypothesis.
c. Find an appropriate decision rule that will satisfy these requirements.
d. Find p* for α = 0.05.
e. If p = 0.06, find the Type II error β.
5. An article in a national magazine reported that 57% of the male voters were registered Republicans. A political organization decided to hire a statistician to take a national sample to check the accuracy of this report.
a. State Ho and Ha.
b. For the following decision rule:
D.R.: If 0.57 - p* ≤ p ≤ 0.57 + p* then reject Ha; otherwise reject Ho and accept Ha.
Find sample size N and p* for α = 0.05 and β = 0.05 for p = 0.51 and P{p ≥ 0.57} = 0.025.
c. Rewrite the decision rule.
d. Assume the sample is taken using the sample size computed in part b and p = 0.55. If 0.55 is used as an estimate of p, find the probability that the error will exceed 2%.
e. Using N = 824 and p = 0.55, find a 95% confidence interval of p.

Statistical Inference Theory Lesson 38 The Distribution of Differences of Sample Means

Assume that we have two distinct sample spaces S1 and S2 where μ1 and μ2 are the means and σ1 and σ2 are the standard deviations respectively. Assume a sample of size N1 is taken from the sample space S1 which results in the mean random variable X1 and a sample of size N2 is taken from the sample space S2 which results in the mean random variable X2. Further, assume X1 and X2 are independent random variables. The Central Limit Theorem allows us to determine the distribution of Xd = X1 - X2.

38.1 - The Central Limit Theorem Of Xd = X1 - X2
The Central Limit Theorem states the following about the distribution of the random variable Xd = X1 - X2:
1. For a large sample (n ≥ 30), Xd = X1 - X2 is approximately normally distributed.
2. The mean of Xd = X1 - X2 is μd = μ1 - μ2.
3. The standard deviation of Xd = X1 - X2 is σd = √(σ1²/N1 + σ2²/N2).
σd is called the standard error of difference of the means. Here, σ1 and σ2 are the known standard deviations of the respective populations. If these values are not known, the standard deviations s1 and s2 of the samples should be used.
38.1 - Example 1: The average age of students at an Eastern college is 25.5 years old and the average age of students at a Western college is 23.1 years old. A sample of student ages was taken from both colleges. The following table presents the results of this survey:

Find:
(a). Xd = X1 - X2
(b). μd = μ1 - μ2
(c). σd
(d). the error E = Xd - μd
(e). the number of standard deviations Xd is from μd.
Solutions:
➤(a). From the table, X1 = 26.5 and X2 = 24.9. Therefore, Xd = 26.5 - 24.9 = 1.6
➤(b). From the table, μ1 = 25.5 and μ2 = 23.1. Therefore, μd = 25.5 - 23.1 = 2.4
➤(c). From the table, σ1 = 2.1, σ2 = 2.5 and N1 = 100, N2 = 200. Therefore, σd = √((2.1)²/100 + (2.5)²/200) ≈ 0.27
➤(d). E = Xd - μd = 1.6 - 2.4 = -0.8
➤(e). To find the number of standard deviations, we need the formula z = (Xd - μd)/σd.
Therefore, z = -0.8/0.27 ≈ -2.96 and the number of standard deviations is 2.96.
38.1 - Example 2: To estimate the difference of the grade point average between undergraduate and graduate students in Physics at a local university, a sample of grade point averages was taken of 100 undergraduate and graduate physics students. The following table gives the results of this survey:

Assume we wish to estimate μd = μ1 - μ2 by using Xd = X1 - X2 = 2.65 - 3.1 = -0.45 as the estimator. Find the probability that Xd = -0.45 differs from μd by more than 0.1 grade points.
fig. 2
Solution:
Step 1: From the equation: σd = √(σ1²/N1 + σ2²/N2) we find σd ≈ 0.041.
Step 2: The error E = Xd - μd = ±0.1.
Step 3: Using the formula: z = (Xd - μd)/σd
Step 4: From Step 2, zσd = ±0.1 and therefore z = ±0.1/0.041 ≈ ±2.44.
fig. 3
Step 5: From the normal distribution table for z = 2.44, the chance that the error will exceed 0.1 is 0.0073 + 0.0073 = 0.0146.
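The last two steps of Example 2 convert a z value into the area in both tails of the normal curve. A minimal Python sketch of that lookup is given below; the function names are illustrative and the standard library's normal distribution stands in for the printed table, which is an assumption about how a reader might check the result rather than the text's own method.

```python
from statistics import NormalDist


def two_tail_probability(z):
    """Area in both tails beyond -z and +z of the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def prob_error_exceeds(error, sigma_d):
    """P{|Xd - mu_d| > error} when Xd is normal with standard error sigma_d."""
    return two_tail_probability(error / sigma_d)


# Example 2 reaches z = 2.44 for an error of 0.1 grade points.
print(round(two_tail_probability(2.44), 4))
```

The result is about 0.0147, which differs from the 0.0146 above only because the table value 0.0073 is rounded to four decimals before doubling.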

Solved Problems
38.1 - Solved Problem 1: The NBA recently studied the past history of the scores per game of the West Coast and East Coast basketball leagues. They also took random samples from the games played by these two leagues. The following table is a summary of this survey:

Find:
(a). Xd = X1 - X2
(b). μd = μ1 - μ2
(c). σd
(d). the error E = Xd - μd
(e). the number of standard deviations Xd is from μd.
Solutions:
➤(a). From the table: X1 = 79.5 and X2 = 85.4. Therefore, Xd = 79.5 - 85.4 = -5.9
➤(b). From the table: μ1 = 86.2 and μ2 = 101.4. Therefore, μd = 86.2 - 101.4 = -15.2
➤(c). From the table: σ1 = 12.1, σ2 = 10.8, and N1 = 400, N2 = 600. Therefore, σd = √((12.1)²/400 + (10.8)²/600) ≈ 0.75
➤(d). E = Xd - μd = -5.9 - (-15.2) = 9.3
➤(e). To find the number of standard deviations, we need the formula: z = (Xd - μd)/σd
Therefore, z = 9.3/0.75 = 12.4, the number of standard deviations.
38.1 - Solved Problem 2: The union representing auto workers in a manufacturing plant takes a sample of 200 male workers and 400 female workers to estimate the difference in hourly pay. The following table is a summary of this survey:

Assume we wish to estimate μd = μ1 - μ2 by using:
Xd = X1 - X2 = $12.25 - $11.87 = $0.38
as the estimator. Find the probability that Xd = 0.38 differs from μd by more than $0.20.
fig. 4
Solution:
Step 1: From the equation: σd = √(σ1²/N1 + σ2²/N2) we find σd ≈ 0.11.
Step 2: The error E = Xd - μd = ±0.20.
Step 3: Using the formula: z = (Xd - μd)/σd
Step 4: From Step 2, zσd = ±0.20 and therefore z = ±0.20/0.11 ≈ ±1.82.
fig. 5
Step 5: From the normal distribution table for z = 1.82, the chance that the error will exceed 0.2 is 0.0344 + 0.0344 = 0.0688.

Unsolved Problems with Answers
38.1 - Problem 1: Daily climate records over the past 100 years show that the daily average temperature in Miami, Florida is 76.5 degrees and the daily average for San Diego, California is 72.12 degrees. Also, a sample of 36 randomly selected days was taken from the records of both cities. The following table is a summary of this sample:

Find:
(a). Xd = X1 - X2
(b). μd = μ1 - μ2
(c). σd
(d). the error E = Xd - μd
(e). the number of standard deviations Xd is from μd.
Answers:
➤(a). Xd = 9.32
➤(b). μd = 4.38
➤(c). σd = 2.25 degrees
➤(d). E = ± 4.94
➤(e). 2.20, the number of standard deviations.
⇑ Refer back to 38.1 - Example 1 & 38.1 - Solved Problem 1.
38.1 - Problem 2: To estimate the difference in running the 100 yard dash between two rival high schools, the past running records of both schools were checked. From these records, a random sample was taken of the running times. The following table is a summary of this survey:

Assume we wish to estimate μd = μ1 - μ2 by using: Xd = X1 - X2 = 14.55 - 14.11 = 0.44 seconds as the estimator. Find the probability that Xd = 0.44 differs from μd by more than 0.5 seconds.
Answer: 0.04
⇑ Refer back to 38.1 - Example 2 & 38.1 - Solved Problem 2.
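The quantities asked for in parts (c) through (e) of these problems can be computed with a short Python sketch. The function names are illustrative and not from the text; the data are those quoted in the solution to 38.1 - Example 1.

```python
from math import sqrt


def sigma_d(sigma1, n1, sigma2, n2):
    """Standard error of the difference of two independent sample means."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def standard_deviations_from_mean(x_d, mu_d, sd):
    """Number of standard errors separating Xd from mu_d (sign dropped)."""
    return abs(x_d - mu_d) / sd


# 38.1 - Example 1: sigma1 = 2.1, N1 = 100, sigma2 = 2.5, N2 = 200.
sd = sigma_d(2.1, 100, 2.5, 200)
x_d = 26.5 - 24.9   # difference of the sample means
mu_d = 25.5 - 23.1  # difference of the population means
print(round(sd, 2), round(standard_deviations_from_mean(x_d, mu_d, sd), 2))
```

Without intermediate rounding the distance comes out near 2.91 standard errors; the worked solution reports 2.96 because it divides by σd already rounded to 0.27.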

38.2 - Statistical Decision Theory 38.2 - Example 1: An American anthropologist is interested in studying the differences, if any, in weights of males in two Central African tribes. He takes random samples of males in both tribes. The following table is a summary of this survey:

(a). State Ho and Ha.
(b). State the decision rule for α = 0.05.
(c). If μd = 2 pounds, find the probability of a Type II error β.
(d). From the samples is Ho or Ha rejected?
(e). What conclusion can you come to?
Solutions:
➤(a). The anthropologist needs to test if there is any difference in the average weight of males in these two tribes. Therefore,
Ho: μd = 0 (there is no difference in the average weights of the males),
Ha: μd ≠ 0 (there is a difference in the average weights of the males).
➤(b). This is a two-sided test. We first write the decision rule as follows:
D.R.: If -c* ≤ Xd ≤ c* then reject Ha; otherwise reject H0 and accept Ha.
Here we assume Ho: μd = 0.
Step 1: Xd = X1 - X2 = 179.55 - 174.11 = 5.44
Step 2: σd = √(σ1²/N1 + σ2²/N2) ≈ 0.91
fig. 6
Step 3: We use the formula: c* = μd + zσd = 0 + zσd = zσd = z(0.91).
Step 4: Since α = 0.05, and we have a two-sided test, we look up the z value from the area of the normal distribution 0.5 - 0.025 = 0.475 and find z = 1.96.
Step 5: c* = zσd = 1.96(0.91) ≈ 1.78 pounds
Step 6: We restate the decision rule:
D.R.: If -1.78 ≤ Xd ≤ 1.78 then reject Ha; otherwise reject H0 and accept Ha.
➤(c). We use the decision rule in (b) to find the probability of a Type II error if μd = 2 pounds.
fig. 7
Step 1: We use the formula: z = (Xd - μd)/σd to find the area for -1.78 ≤ Xd ≤ 1.78, which equals the probability of a Type II error.
fig. 8
Step 2: For Xd = -1.78: z = (-1.78 - 2)/0.91 ≈ -4.15
Step 3: For Xd = 1.78: z = (1.78 - 2)/0.91 ≈ -0.24
Step 4: From the normal distribution table, the area between the two values of z is 0.4052.
Step 5: Therefore, the probability of a Type II error is β ≈ 0.41.
➤(d). The decision rule that we use to determine Type I and Type II errors is:
D.R.: If -1.78 ≤ Xd ≤ 1.78 then reject Ha; otherwise reject H0 and accept Ha.
Now, the above table shows that X1 = 179.55 pounds and X2 = 174.11 pounds. Therefore, Xd = X1 - X2 = 179.55 - 174.11 = 5.44
Since the value 5.44 falls outside the interval from -1.78 to 1.78, H0 is rejected and Ha is accepted.
➤(e). There is a significant difference in the average weight between the males of the two tribes.
38.2 - Example 2: A major petroleum company claims that a new additive in its oil will significantly increase gas mileage for automobiles. To check this claim, two samples are taken: one from automobiles containing oil with this additive and one from automobiles not containing oil with this additive. The following table summarizes this survey:

(a). State Ho and Ha.
(b). State the decision rule for α = 0.02.
(c). If μd = 1.5 miles per gallon, find the probability of a Type II error β.
(d). From the samples, is H0 or Ha rejected?
(e). What conclusion can you come to?
Solutions:
➤(a). Since we wish to check whether the additive increases gas mileage, we state the following:
Ho: μd = 0 (there is no difference in the average mileage).
Ha: μd > 0 (there is an increase in the average mileage).
➤(b). This is a one-sided test. We first write the decision rule as follows:
D.R.: If Xd ≥ c* then reject H0 and accept Ha; otherwise reject Ha.
Here we assume Ho: μd = 0.
Step 1: Xd = X1 - X2 = 22.67 - 19.66 = 3.01
Step 2: σd = √(σ1²/N1 + σ2²/N2) ≈ 0.63
fig. 9
Step 3: We use the formula: c* = μd + zσd = 0 + zσd = z(0.63).
Step 4: Since α = 0.02, and we have a one-sided test, we look up the z value from the area of the normal distribution 0.5 - 0.02 = 0.48 and find z = 2.05.
Step 5: c* = zσd = 2.05(0.63) ≈ 1.29 miles per gallon.
fig. 10
Step 6: We restate the decision rule:
D.R.: If Xd ≥ 1.29 then reject Ho and accept Ha; otherwise reject Ha.
➤(c). We use the decision rule in (b) to find the probability of a Type II error if μd = 1.5 miles per gallon.
Step 1: We use the formula z = (Xd - μd)/σd to find the area for Xd < 1.29, which equals the probability of a Type II error.
Step 2: For Xd = 1.29, z = (1.29 - 1.5)/0.63 ≈ -0.33
fig. 11
Step 3: From the normal distribution table, the area for z = -0.33 is 0.1293.
Step 4: Therefore, the probability of a Type II error is β ≈ 0.5 - 0.1293 = 0.3707.
➤(d). The decision rule that we use to determine Type I and Type II errors is
D.R.: If Xd ≤ 1.29 then reject Ha; otherwise reject H0 and accept Ha.
The above table shows that X1 = 22.67 miles per gallon and X2 = 19.66 miles per gallon. Therefore: Xd = X1 - X2 = 22.67 - 19.66 = 3.01
Since the value 3.01 > 1.29, H0 is rejected and Ha is accepted.
➤(e). We conclude from this survey that the additive does significantly increase gas mileage.
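The cutoffs and Type II errors of Examples 1 and 2 can be reproduced with a short Python sketch. It assumes the standard errors σd = 0.91 and σd = 0.63 quoted in the solutions, uses hypothetical function names, and computes z directly instead of reading Table C, so the printed values match the worked answers only after rounding to two decimals.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_cutoff(sigma_d, alpha):
    """c* with P{|Xd| > c*} = alpha under Ho: mu_d = 0."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2) * sigma_d


def upper_cutoff(sigma_d, alpha):
    """c* with P{Xd > c*} = alpha under Ho: mu_d = 0."""
    return STD_NORMAL.inv_cdf(1 - alpha) * sigma_d


def beta_between(low, high, mu_d, sigma_d):
    """Type II error: probability that Xd lands where Ha is rejected."""
    dist = NormalDist(mu_d, sigma_d)
    return dist.cdf(high) - dist.cdf(low)


# Example 1: two-sided, sigma_d = 0.91, alpha = 0.05, true mu_d = 2.
c1 = two_sided_cutoff(0.91, 0.05)
beta1 = beta_between(-c1, c1, 2, 0.91)

# Example 2: upper-tailed, sigma_d = 0.63, alpha = 0.02, true mu_d = 1.5.
c2 = upper_cutoff(0.63, 0.02)
beta2 = NormalDist(1.5, 0.63).cdf(c2)

print(f"Example 1: c* = {c1:.2f}, beta = {beta1:.2f}")
print(f"Example 2: c* = {c2:.2f}, beta = {beta2:.2f}")
```

In both cases β is the area of the "reject Ha" region under the normal curve centered at the assumed true μd, not at the hypothesized value 0.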

Solved Problems
38.2 - Solved Problem 1: An environmental group claims that the bald eagles in Northern California and Southern California lay on average a difference of 2 eggs per year. A random selection of the nests of these two regions shows the following results:

(a). State Ho and Ha.
(b). State the decision rule for α = 0.01.
(c). If μd = 0 eggs, find the probability of a Type II error β.
(d). From the samples is Ho or Ha rejected?
(e). What conclusion can we come to?
Solutions:
➤(a). Since the claim is that there is a difference of 2 eggs laid per year:
Ho: μd = 2 eggs
Ha: μd ≠ 2 eggs
➤(b). This is a two-sided test. We first write the decision rule as follows:
D.R.: If 2 - c* ≤ Xd ≤ 2 + c* then reject Ha; otherwise reject H0 and accept Ha.
Here we assume Ho: μd = 2.
Step 1: Xd = X1 - X2 = 23.5 - 20.4 = 3.1 eggs
Step 2: σd = √(σ1²/N1 + σ2²/N2) ≈ 0.45
fig. 12
Step 3: We use the formula: c* = zσd = z(0.45).
Step 4: Since α = 0.01, and we have a two-sided test, we look up the z value from the area of the normal distribution 0.5 - 0.005 = 0.495 and find z = 2.57.
Step 5: c* = zσd = 2.57(0.45) ≈ 1.16
fig. 13
Step 6: We restate the decision rule:
D.R.: If 0.84 ≤ Xd ≤ 3.16 then reject Ha; otherwise reject H0 and accept Ha.
➤(c). If μd = 0 eggs, we use the decision rule in (b) to find the probability of a Type II error.
fig. 14
Step 1: We use the formula z = (Xd - μd)/σd to find the area for 0.84 ≤ Xd ≤ 3.16, which equals the probability of a Type II error.
Step 2: For Xd = 0.84: z = (0.84 - 0)/0.45 ≈ 1.87
Step 3: For Xd = 3.16: z = (3.16 - 0)/0.45 ≈ 7.02
Step 4: From the normal distribution table, the area between the two values of z is 0.5 - 0.4693 = 0.0307.
Step 5: Therefore, the probability of a Type II error is β ≈ 0.03.
➤(d). The decision rule that we use to determine Type I and Type II errors is
D.R.: If 0.84 ≤ Xd ≤ 3.16 then reject Ha; otherwise reject H0 and accept Ha.
Now, the above table shows that X1 = 23.5 eggs and X2 = 20.4 eggs. Therefore, Xd = X1 - X2 = 23.5 - 20.4 = 3.1
Since the value 3.1 falls inside the interval from 0.84 to 3.16, Ha is rejected.
➤(e). The difference in the number of eggs laid is not significantly different from two eggs.
38.2 - Solved Problem 2: In comparing two different weight loss programs, a nutrition company claims that their program will result in an average loss of weight of 5 pounds more than their competitor. The following table summarizes the result of a survey taken from dieters using both plans:

(a). State Ho and Ha.
(b). State the decision rule for α = 0.05.
(c). If μd = 0 pounds loss, find the probability of a Type II error β.
(d). From the samples is H0 or Ha rejected?
(e). What conclusion can we come to?
Solutions:
➤(a). Since the company claims a difference of 5 pounds in their favor, we state the following:
Ho: μd = 5
Ha: μd < 5
➤(b). This is a one-sided test. We first write the decision rule as follows:
D.R.: If Xd ≤ 5 - c* then reject H0 and accept Ha; otherwise reject Ha.
Here we assume Ho: μd = 5.
fig. 15
Step 1: Xd = X1 - X2 = 18.6 - 14.1 = 4.5
Step 2: σd = √(σ1²/N1 + σ2²/N2) ≈ 0.125
Step 3: We use the formula: c* = zσd = z(0.125).
Step 4: Since α = 0.05, and we have a one-sided test, we look up the z value from the area of the normal distribution 0.5 - 0.05 = 0.45, finding z = 1.64.
Step 5: c* = zσd = 1.64(0.125) ≈ 0.21 pounds
fig. 16
Step 6: We restate the decision rule:
D.R.: If Xd ≤ 4.79 pounds, then reject H0 and accept Ha; otherwise reject Ha.
➤(c). We use the decision rule in (b) to find the probability of a Type II error if μd = 0 pounds loss.
fig. 17
Step 1: We use the formula z = (Xd - μd)/σd to find the area for Xd ≥ 4.79, which equals the probability of a Type II error.
Step 2: For Xd = 4.79, z = (4.79 - 0)/0.125 = 38.32
Step 3: From the normal distribution table, the area to the right of z = 38.32 is 0.
Step 4: Therefore, the probability of a Type II error is β ≈ 0.
➤(d). The decision rule that we use to determine Type I and Type II errors is
D.R.: If Xd ≤ 4.79 pounds then reject H0 and accept Ha; otherwise reject Ha.
Now, the above table shows that X1 = 18.6 pounds loss and X2 = 14.1 pounds loss. Therefore, Xd = X1 - X2 = 4.5 pounds loss difference.
Since the value 4.5 pounds is less than 4.79, H0 is rejected and Ha is accepted.
➤(e). We conclude from this survey that the company's weight loss program does not cause a significant weight loss of 5 pounds over its competitor.
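Part (d) of both solved problems applies a restated decision rule to the observed Xd when the hypothesized difference is not zero. A minimal Python sketch of that final step follows; the function names and the returned strings are illustrative, and the values of c* are the rounded ones from the solutions.

```python
def two_sided_decision(x_d, mu0, c_star):
    """Apply 'if mu0 - c* <= Xd <= mu0 + c* then reject Ha'."""
    if mu0 - c_star <= x_d <= mu0 + c_star:
        return "reject Ha"
    return "reject Ho and accept Ha"


def lower_tail_decision(x_d, mu0, c_star):
    """Apply 'if Xd <= mu0 - c* then reject Ho and accept Ha'."""
    if x_d <= mu0 - c_star:
        return "reject Ho and accept Ha"
    return "reject Ha"


# Solved Problem 1: Ho: mu_d = 2, c* = 1.16, observed Xd = 3.1.
print(two_sided_decision(3.1, 2, 1.16))   # reject Ha
# Solved Problem 2: Ho: mu_d = 5, c* = 0.21, observed Xd = 4.5.
print(lower_tail_decision(4.5, 5, 0.21))  # reject Ho and accept Ha
```

Centering the rule on the hypothesized μd, rather than on 0, is the only change from the examples earlier in this section.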

Unsolved Problems with Answers 38.2 - Problem 1: An administrator at a large university claims that the grade point average of physics majors differs from that of math majors. A random sample of grade point averages of both types of students is taken. The following table summarizes this survey:

(a). State Ho and Ha. (b). State the decision rule for α = 0.02. (c). If μd = 0.25 grade point average difference, find the probability of a Type II error β. (d). From the samples is Ho or Ha rejected? (e). What conclusion can we come to? Answers:

➤(a). Ho: μd = 0 Ha: μd ≠ 0 ➤(b). D.R.: If -0.07 ≤ Xd ≤ 0.07 then reject Ha; otherwise reject H0 and accept Ha. ➤(c). β = 0 ➤(d). Reject Ho and accept Ha. ➤(e). There is a significant difference between the grade point averages of Physics and Math majors. ⇑ Refer back to 38.2 - Example 1 & 38.2 - Solved Problem 1. 38.2 - Problem 2: The Federal Trade Commission recently issued a report claiming that a certain foreign computer manufacturer was significantly underpricing its lap top computers when compared to domestic lap top computer manufacturers. A random sample of both types of computers was taken. The following table is a summary of this survey:

(a). State Ho and Ha. (b). State the decision rule for α = 0.05. (c). If μd = -$30, find the probability of a Type II error β. (d). From the samples is Ho or Ha rejected? (e). What conclusion can we come to? Answers:

➤(a). Ho: μd = 0 Ha: μd < 0 ➤(b). D.R.: If Xd < -$12.90 then reject H0 and accept Ha; otherwise reject Ha. ➤(c). β = 0.01 ➤(d). Reject Ho and accept Ha. ➤(e). The foreign price is significantly lower than the domestic prices for lap top computers. ⇑ Refer back to 38.2 - Example 2 & 38.2 - Solved Problem 2.

Supplementary Problems 1. Two machines produce fuses for auto transmissions. Each fuse from machine A weighs on average 0.5 ounces and 0.6 ounces from machine B. A sample of fuses was taken from both machines. The following table is a summary of these samples:

Find the probability that the total weights of the two samples differ by more than 1 pound. 2. Bill and Jim each have a fair coin. They play the following game in a casino: Rule 1: Each tosses his coin 100 times. Rule 2: If Bill tosses more heads than Jim, Bill wins $100. Rule 3: Otherwise, Bill pays $10 to the casino.

Find the expectation of this game for Bill. 3. A tire retail company sells two brands of tires. A study is needed to estimate the average difference in life of these two brands. Samples of wear of both brands are taken. The following table summarizes these samples:

Establish a 95% confidence interval for the average difference in mileage. 4. Much attention has been focused in recent years on merger activity among business firms. Many business analysts are interested in knowing how various characteristics of merged firms compare to those of non-merged firms. A recent published report showed the accompanying sample data on price-earnings ratios of two samples of firms:

You wish to test if there is a difference in the true average price-earnings ratios for all merged and non-merged firms. a. State Ho and Ha. b. Using a 0.01 level of significance, would you say there is a difference in the price-earnings ratios of the two types of companies? c. From the difference of the sample means, establish a 95% interval for the true difference of means. 5. Ms. Floss, an administrator at a local liberal arts college, claims that the grade point average of female students is higher than that of male students. To test this claim, equal size samples are to be taken from the grade point averages of both male and female students. a. State Ho and Ha. b. If the standard deviation of grade point averages is σ1 = 0.38 for females and σ2 = 0.35 for males, find the appropriate sample size and the decision rule where α = 0.05 and β = 0.02 when μd = 0.1 grade point average. 6. It is often reported in the press that women live 4 years, on average, longer than men. You have been hired by a large national life insurance company to do a statistical analysis to determine if the deceased women insured by the company also lived, on average, 4 years longer than the deceased men they insured. a. State Ho and Ha. b. Assume you take random samples from the deceased women and men provided by the insurance company, resulting in the following table:

Using α = 0.05, would you conclude that the deceased insured women lived 4 years longer than the deceased insured men? c. Using the data from the table, estimate with 95% confidence the difference in longevity between women and men. 7. Assume X1, X2 are independent. Show

$$\sigma_d = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}$$

where Xd = X1 - X2.

Statistical Inference Theory Lesson 39 The Distribution of Differences of Sample Proportions

Assume that we have two distinct binomial sample spaces S1 and S2, where p1 and p2 are the proportions of the respective sample spaces. Assume a sample of size N1 is taken from the sample space S1, which results in the proportion random variable P1, and a sample of size N2 is taken from the sample space S2, which results in the proportion random variable P2. Further, assume P1 and P2 are independent random variables. The Central Limit Theorem allows us to examine the distribution of Pd = P1 - P2.

39.1 - The Central Limit Theorem for Pd = P1 - P2 The Central Limit Theorem states the following about the distribution of the

random variable Pd = P1 - P2: 1. For large samples (N ≥ 30), Pd = P1 - P2 is approximately normally distributed. 2. The mean is pd = p1 - p2. 3. The standard deviation of Pd = P1 - P2 is

$$\sigma_d = \sqrt{\frac{p_1(1-p_1)}{N_1} + \frac{p_2(1-p_2)}{N_2}}$$

where σd is called the standard error of the difference of the proportions. Here, p1 and p2 are the known proportions of the respective populations. If these values are not known, then use the sample proportions P1 and P2 in place of p1 and p2 respectively.
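The standard deviation in item 3 is the usual standard error of a difference of two independent proportions, σd = sqrt(p1(1 - p1)/N1 + p2(1 - p2)/N2). As a rough sketch, it can be computed as below; the function names are hypothetical, and the inputs are the two-coin values of 39.1 - Example 1 (N1 = 100, N2 = 150, p1 = p2 = 0.5, P1 = 0.65, P2 = 0.60).

```python
import math


def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of Pd = P1 - P2 for independent samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def standard_deviations_from_mean(P_d, p_d, sigma_d):
    """Number of standard deviations the observed Pd lies from pd."""
    return (P_d - p_d) / sigma_d


# 39.1 - Example 1: two fair coins tossed 100 and 150 times
sigma_d = se_diff_proportions(0.5, 100, 0.5, 150)
z = standard_deviations_from_mean(0.65 - 0.60, 0.5 - 0.5, sigma_d)

print(round(sigma_d, 3), round(z, 2))
```

If p1 and p2 are unknown, the same function can be called with the sample proportions P1 and P2, as the text describes.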

39.1 - Example 1: Mrs. Jones has two fair coins. She tosses one coin 100 times and the other coin 150 times. She wishes to compare the difference in the percentage of heads. The following table presents the results of this survey:

Find: (a). Pd = P1 - P2 (b). pd = p1 - p2 (c). σd (d). Find the error: E = Pd - pd (e). Find the number of standard deviations Pd is from pd. Solutions: ➤(a). From the table, P1 = 0.65 and P2 = 0.60. Therefore, Pd = 0.65 - 0.60 = 0.05. ➤(b). From the table, p1 = 0.5 and p2 = 0.5. Therefore, pd = 0.5 - 0.5 = 0. ➤(c). From the table, p1 = 0.5, p2 = 0.5 and N1 = 100, N2 = 150.

Therefore,

$$\sigma_d = \sqrt{\frac{0.5(0.5)}{100} + \frac{0.5(0.5)}{150}} \approx 0.065$$

➤(d). E = Pd - pd = 0.05 - 0 = 0.05 ➤(e). To find the number of standard deviations, we need the formula:

$$z = \frac{P_d - p_d}{\sigma_d}$$

Therefore,

$$z = \frac{0.05 - 0}{0.065} \approx 0.77,$$

the number of standard deviations. 39.1 - Example 2: At a large University a survey was taken to compare the percent of Math majors and Physics majors that graduate. The survey showed that 91% of all Math majors in comparison to 88% of all Physics majors finally graduated. The following table gives the results of this survey:

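Assume we wish to estimate pd = p1 - p2 by using Pd = P1 - P2 = 0.91 - 0.88 = 0.03 as the estimator. Find the probability that Pd = 0.03 differs from pd by more than 0.05.

Solution: Step 1: From the equation

$$\sigma_d = \sqrt{\frac{P_1(1-P_1)}{N_1} + \frac{P_2(1-P_2)}{N_2}}$$

we find σd ≈ 0.043. Step 2: The error is E = Pd - pd. Step 3: Using the formula

$$z = \frac{P_d - p_d}{\sigma_d} = \frac{E}{\sigma_d}$$

Step 4: From Step 2, E = 0.05

and therefore,

$$z = \frac{0.05}{0.043} \approx 1.16.$$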

Step 5: From the Normal distribution table for z = 1.16, the chance that the error will exceed 0.05 is 0.123 + 0.123 = 0.246. Solved Problems 39.1 - Solved Problem 1: A computer manufacturing firm is planning to purchase a new machine to produce a special chip for its computers. In considering two machines on the market, they discovered that one machine produces 5% defective parts while the less expensive machine produces 7% defective parts. To compare the difference in the percentage of defective chips between these two machines, the company takes a sample from each machine. The following table presents the results of this survey:

Find: (a). Pd = P1 - P2 (b). pd = p1 - p2

(c). σd (d). Find the error: E = Pd - pd (e). Find the number of standard deviations Pd is from pd. Solutions: ➤(a). From the table, P1 = 0.056 and P2 = 0.068. Therefore, Pd = 0.056 - 0.068 = -0.012. ➤(b). From the table, p1 = 0.05 and p2 = 0.07. Therefore, pd = 0.05 - 0.07 = -0.02. ➤(c). From the table, p1 = 0.05, p2 = 0.07 and N1 = 100, N2 = 200. Therefore,

$$\sigma_d = \sqrt{\frac{0.05(0.95)}{100} + \frac{0.07(0.93)}{200}} \approx 0.028$$

➤(d). E = Pd - pd = -0.012 - (-0.02) = 0.008

➤(e). To find the number of standard deviations, we need the formula

$$z = \frac{P_d - p_d}{\sigma_d}$$

Therefore,

$$z = \frac{0.008}{0.028} \approx 0.29,$$

the number of standard deviations. 39.1 - Solved Problem 2: A survey was taken to compare the percent of adult males and females as to their afternoon television viewing habits. The survey showed that 25% of all adult females watched at least one soap opera per day in comparison to 18% of all adult males. The following table gives the results of this survey:

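Assume we wish to estimate pd = p1 - p2 by using Pd = P1 - P2 = 0.25 - 0.18 = 0.07 as the estimator. Find the probability that Pd = 0.07 differs from pd by more than 0.02.

Solution: Step 1: From the equation

$$\sigma_d = \sqrt{\frac{P_1(1-P_1)}{N_1} + \frac{P_2(1-P_2)}{N_2}}$$

we find σd ≈ 0.03.

Step 2: The error is E = Pd - pd. Step 3: Using the formula:

$$z = \frac{P_d - p_d}{\sigma_d} = \frac{E}{\sigma_d}$$

Step 4: From Step 2, E = 0.02 and therefore

$$z = \frac{0.02}{0.03} \approx 0.67.$$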

Step 5: From the normal distribution table for z = 0.67, the chance that the error will exceed 0.02 is 0.2514 + 0.2514 = 0.5028.
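The two-tailed area in Step 5 can be reproduced without the table. This is only a sketch: `prob_error_exceeds` is a hypothetical helper, and σd = 0.03 is assumed here because it is the value that gives z = 0.67 for an error of 0.02.

```python
from statistics import NormalDist


def prob_error_exceeds(error, sigma_d):
    """P(|Pd - pd| > error) under the normal approximation (both tails)."""
    z = error / sigma_d
    return 2 * (1 - NormalDist().cdf(z))


# 39.1 - Solved Problem 2: error 0.02, sigma_d assumed to be 0.03
p = prob_error_exceeds(0.02, 0.03)
print(round(p, 3))
```

The result is close to the tabulated 0.5028; the small gap comes from rounding z to 0.67 before using the table.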

Unsolved Problems with Answers 39.1 - Problem 1: A political action organization takes a survey to determine if there is any difference between adult females and males in their support on environment legislation. The following table summarizes the results of this survey:

Find: (a). Pd = P1 - P2 (b). pd = p1 - p2

(c). σd (d). Find the error: E = Pd - pd

(e). Find the number of standard deviations Pd is from pd. Answers: ➤(a). Pd = 0.24 ➤(b). pd = 0 ➤(c). σd ≈ 0.07 ➤(d). E = ± 0.24 ➤(e). 3.42 ⇑ Refer back to 39.1 - Example 1 & 39.1 - Solved Problem 1. 39.1 - Problem 2: A pharmaceutical firm is testing two new drugs that will lower blood pressure. The following procedure is followed: Test, at random, each drug on a separate group of patients. At the end of the experiment, record the percentage of patients whose blood pressure fell by at least 20%. The following table gives the results of this survey:

Assume we wish to estimate pd = p2 - p1 by using Pd = P2 - P1 = 0.09 as the estimator.

Find the probability that Pd = 0.09 exceeds pd by more than 0.15. Answer: 0.08 ⇑ Refer back to 39.1 - Example 2 & 39.1 - Solved Problem 2.

39.2 - Statistical Decision Theory 39.2 - Example 1: A political scientist is interested in studying the differences, if any, in the percentage of males and females that regularly listen to a famous talk show host. He takes random samples of males and females. The following table is a summary of this survey:

(a). State Ho and Ha. (b). State the decision rule for α = 0.05. (c). If pd = 0.08, find the probability of a Type II error β. (d). From the samples, is Ho or Ha rejected? (e). What conclusion can we come to? Solutions: ➤(a). The political scientist states the null hypothesis that there is no difference. Therefore, Ho: pd = 0 (there is no difference in listening habits between males and females). Ha: pd ≠ 0 (there is a difference in listening habits between males and females). ➤(b).

Step 1: Since we assume Ho: pd = 0, we use the following formula to estimate p:

$$p = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$

and

$$\sigma_d = \sqrt{p(1-p)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$

Step 2: Since N1 = 400 and N2 = 400, using P1 and P2 from the table gives σd ≈ 0.03.

Step 3: This is a two-sided test. We first write the decision rule as follows: D.R.: If -c* ≤ Pd ≤ c* then reject Ha; otherwise reject H0 and accept Ha. Step 4: Here, we assume Ho: pd = 0. We use the formula:

$$c^* = z\sigma_d$$

Step 5: Since α = 0.05, and we have a two-sided test, we look up the z value for the area 0.5 - 0.025 = 0.475 in the normal distribution table: z = 1.96. Step 6: c* = zσd = 1.96(0.03) ≈ 0.06

Step 7: We restate the decision rule: D.R.: If -0.06 ≤ Pd ≤ 0.06 then reject Ha; otherwise reject H0 and accept Ha.

➤(c). We use the decision rule in (b) to find the probability of a Type II error if pd = 0.08.

Step 1: We use the formula:

$$z = \frac{P_d - p_d}{\sigma_d}$$

to find the area for -0.06 ≤ Pd ≤ 0.06, which equals the probability of a Type II error. Step 2: σd = 0.03. Step 3: For pd = 0.08 we need to find the area for -0.06 ≤ Pd ≤ 0.06.

Step 4:

$$z_1 = \frac{-0.06 - 0.08}{0.03} \approx -4.67, \qquad z_2 = \frac{0.06 - 0.08}{0.03} \approx -0.67$$

Step 5: Using the normal distribution table for z = -4.67 and z = -0.67, we have 0.4999 - 0.2486 = 0.2513. Step 6: Therefore, the probability of a Type II error is β ≈ 0.25. ➤(d). The decision rule that we use to determine Type I and Type II errors is D.R.: If -0.06 ≤ Pd ≤ 0.06 then reject Ha; otherwise reject H0 and accept Ha.

Now, from the above table, Pd = P1 - P2 = 0.07.

Since the value 0.07 falls outside the interval from -0.06 to 0.06, H0 is rejected and Ha is accepted. ➤(e). We conclude that there is a significant percentage difference between males and females in their listening habits for this talk show host. 39.2 - Example 2: A major petroleum company claims that a new additive in its oil products will increase the percentage of cars that get more than 25 miles per gallon on the freeways. To check this claim, two samples are taken: one from automobiles containing oil with this additive and one from automobiles not containing oil with this additive. The following table summarizes this survey:

(a). State Ho and Ha. (b). State the decision rule for α = 0.02. (c). If pd = 0.30, find the probability of a Type II error β. (d). From the samples, is H0 or Ha rejected? (e). What conclusion can we come to? Solutions: ➤(a). Since we wish to check whether the additive increases the percentage of cars that get more than 25 miles per gallon, we state the following: Ho: pd = 0 (There is no percentage increase.) Ha: pd > 0 (There is a percentage increase.) ➤(b). Step 1: Since we assume Ho: pd = 0, we use the following formula to estimate p:

$$p = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$

and

$$\sigma_d = \sqrt{p(1-p)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$

Step 2: Since P1 = 0.24, P2 = 0.19, N1 = 36 and N2 = 36,

$$p = \frac{36(0.24) + 36(0.19)}{72} = 0.215, \qquad \sigma_d = \sqrt{0.215(0.785)\left(\frac{1}{36} + \frac{1}{36}\right)} \approx 0.10$$

Step 3: This is a one-sided test. We first write the decision rule as follows: D.R.: If c* ≤ Pd then reject H0 and accept Ha; otherwise reject Ha. Step 4: Here, we assume Ho: pd = 0. We use the formula

$$c^* = z\sigma_d$$

Step 5: Since α = 0.02, and we have a one-sided test, we look up the z value for the area 0.5 - 0.02 = 0.48 in the normal distribution table and find z = 2.05. Step 6: c* = zσd = 2.05(0.10) ≈ 0.21

Step 7: We restate the decision rule: D.R.: If 0.21 ≤ Pd then reject H0 and accept Ha; otherwise reject Ha.
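Part (b) can be sketched in code as follows. The pooled estimate p = (N1·P1 + N2·P2)/(N1 + N2) and the standard error sqrt(p(1 - p)(1/N1 + 1/N2)) are the standard forms and are assumed here to be the formulas the solution refers to; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_proportion(P1, n1, P2, n2):
    """Single estimate of p when H0: pd = 0 is assumed."""
    return (n1 * P1 + n2 * P2) / (n1 + n2)


def se_under_h0(p, n1, n2):
    """Standard error of Pd when both populations share proportion p."""
    return math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def upper_tail_cutoff(sigma_d, alpha):
    """c* for the one-sided rule 'reject H0 if c* <= Pd'."""
    return NormalDist().inv_cdf(1 - alpha) * sigma_d


# 39.2 - Example 2: P1 = 0.24, P2 = 0.19, N1 = N2 = 36, alpha = 0.02
p = pooled_proportion(0.24, 36, 0.19, 36)
sigma_d = se_under_h0(p, 36, 36)
c_star = upper_tail_cutoff(sigma_d, 0.02)

print(round(p, 3), round(sigma_d, 4), round(c_star, 3))
```

Without intermediate rounding the cutoff comes out near 0.20 rather than 0.21; the worked solution rounds σd to 0.10 before multiplying by z = 2.05.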

➤(c). We use the decision rule in (b) to find the probability of a Type II error if pd = 0.30. Step 1: We use the formula

$$z = \frac{P_d - p_d}{\sigma_d}$$

to find the area for

Pd < 0.21, which equals the probability of a Type II error. Step 2: σd = 0.10

Step 3: For pd = 0.30 we need to find the area for Pd < 0.21. Step 4: z = (0.21 - 0.30)/0.10 = -0.9 Step 5: Using the normal distribution table for z = -0.9, we have 0.4999 - 0.3159 = 0.1840. Step 6: Therefore, the probability of a Type II error is β ≈ 0.18.
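The Type II error for this one-sided rule is the area below c* when the true difference is pd. A minimal sketch, using a hypothetical helper name and the rounded values c* = 0.21 and σd = 0.10 from the solution:

```python
from statistics import NormalDist


def beta_upper_tail_rule(c_star, p_d_true, sigma_d):
    """P(Pd < c*) when Pd is normal with mean p_d_true and s.d. sigma_d."""
    return NormalDist(p_d_true, sigma_d).cdf(c_star)


# 39.2 - Example 2(c): c* = 0.21, pd = 0.30, sigma_d = 0.10, so z = -0.9
beta = beta_upper_tail_rule(0.21, 0.30, 0.10)
print(round(beta, 3))
```

The computed area agrees with the table value of about 0.18.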

➤(d). The decision rule that we use to determine Type I and Type II errors is D.R.: If 0.21 ≤ Pd then reject H0 and accept Ha; otherwise reject Ha. Now, the above table shows that P1 = 0.24 and P2 = 0.19. Therefore, Pd = 0.24 - 0.19 = 0.05.

Since the value Pd = 0.05 falls outside the interval 0.21 ≤ Pd,

Ha is rejected. ➤(e). We conclude that there is no significant percentage difference between automobiles using and not using the additive.

Solved Problems 39.2 - Solved Problem 1: Mr. Ito, a statistics instructor, wishes to test the effect of two different text books on the percentage of students in class that earn a grade of B or better. During the fall of 1994, he teaches two sections of statistics where in each section he uses a different text. The following table is a summary of this survey:

(a). State Ho and Ha.

(b). State the decision rule for α = 0.05. (c). If pd = 0.20, find the probability of a Type II error β. (d). From the samples, is Ho or Ha rejected? (e). What conclusion can we come to? Solutions: ➤(a). We start with the null hypothesis that there is no difference. Therefore, Ho: pd = 0 (There is no difference between texts.) Ha: pd ≠ 0 (There is a difference between texts.) ➤(b). Step 1: Since we assume Ho: pd = 0, we use the following formula to estimate p:

$$p = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$

and

$$\sigma_d = \sqrt{p(1-p)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$

Step 2: Since P1 = 0.60, P2 = 0.55, N1 = 75 and N2 = 105,

$$p = \frac{75(0.60) + 105(0.55)}{180} \approx 0.57, \qquad \sigma_d = \sqrt{0.57(0.43)\left(\frac{1}{75} + \frac{1}{105}\right)} \approx 0.07$$

Step 3: This is a two-sided test. We first write the decision rule as follows: D.R.: If -c* ≤ Pd ≤ c* then reject Ha; otherwise reject H0 and accept Ha.

Step 4: Here, we assume Ho: pd = 0. We use the formula:

$$c^* = z\sigma_d$$

Step 5: Since α = 0.05, and we have a two-sided test, we look up the z value for the area 0.5 - 0.025 = 0.475 in the normal distribution table: z = 1.96. Step 6: c* = zσd = 1.96(0.07) ≈ 0.14

Step 7: We restate the decision rule: D.R.: If -0.14 ≤ Pd ≤ 0.14

then reject Ha; otherwise reject H0 and accept Ha.
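For a two-sided rule, the cutoff uses α/2 in each tail, and the Type II error asked for in part (c) is the area between -c* and c* under the alternative. The sketch below uses hypothetical function names and the rounded σd = 0.07 of this problem.

```python
from statistics import NormalDist


def two_sided_cutoff(sigma_d, alpha):
    """c* for the rule 'reject Ha if -c* <= Pd <= c*'."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma_d


def beta_two_sided(c_star, p_d_true, sigma_d):
    """P(-c* <= Pd <= c*) when the true difference is p_d_true."""
    dist = NormalDist(p_d_true, sigma_d)
    return dist.cdf(c_star) - dist.cdf(-c_star)


# 39.2 - Solved Problem 1: sigma_d = 0.07, alpha = 0.05, pd = 0.20 in part (c)
c_star = two_sided_cutoff(0.07, 0.05)
beta = beta_two_sided(0.14, 0.20, 0.07)

print(round(c_star, 2), round(beta, 2))
```

Almost all of β comes from the upper limit, since the lower limit -0.14 lies nearly five standard deviations below pd = 0.20.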

➤(c). We use the decision rule in (b) to find the probability of a Type II error if pd = 0.20. Step 1: We use the formula:

$$z = \frac{P_d - p_d}{\sigma_d}$$

to find the area for -0.14 ≤ Pd ≤ 0.14,

which equals the probability of a Type II error. Step 2: σd = 0.07. Step 3: For pd = 0.20 we need to find the area for -0.14 ≤ Pd ≤ 0.14.

Step 4:

$$z_1 = \frac{-0.14 - 0.20}{0.07} \approx -4.86, \qquad z_2 = \frac{0.14 - 0.20}{0.07} \approx -0.86$$

Step 5: Using the normal distribution table for z = -4.86 and z = -0.86, we have 0.4999 - 0.3051 = 0.1948.

Step 6: Therefore, the probability of a Type II error is β ≈ 0.19. ➤(d). The decision rule that we use to determine Type I and Type II errors is D.R.: If -0.14 ≤ Pd ≤ 0.14, then reject Ha; otherwise reject H0 and accept Ha.

Now, the above table shows that P1 = 0.60 and P2 = 0.55.

Therefore, Pd = 0.60 - 0.55 = 0.05.

Since the value 0.05 falls in the interval from -0.14 to 0.14, Ha is rejected. ➤(e). We conclude that there is no significant percentage difference between the performance of students in the two statistics classes. 39.2 - Solved Problem 2: A federal security agency claims that there is a significant preference for gun control legislation on the East coast in comparison to the West coast. A sample of adults was taken on their opinion from both coasts. The following table is a summary of this survey:

(a). State Ho and Ha. (b). State the decision rule for α = 0.02. (c). If pd = 0.10, find the probability of a Type II error β. (d). From the samples, is Ho or Ha rejected? (e). What conclusion can we come to? Solutions: ➤(a). Ho: pd = 0 (There is zero percentage difference in favor of gun control.) Ha: pd > 0 (There is a significant percentage difference in favor of gun control.)

➤(b). Step 1: Since we assume Ho: pd = 0, we use the following formula to estimate p:

$$p = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$

and

$$\sigma_d = \sqrt{p(1-p)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}$$

Step 2: Since N1 = 100 and N2 = 200, using P1 and P2 from the table gives σd ≈ 0.06.

Step 3: This is a one-sided test. We first write the decision rule as follows: D.R.: If c* ≤ Pd then reject H0 and accept Ha; otherwise reject Ha. Step 4: Here, we assume Ho: pd = 0. We use the formula

$$c^* = z\sigma_d$$

Step 5: Since α = 0.02, and we have a one-sided test, we look up the z value for the area 0.5 - 0.02 = 0.48 in the normal distribution table: z = 2.05. Step 6: c* = zσd = 2.05(0.06) ≈ 0.12

Step 7: We restate the decision rule: D.R.: If 0.12 ≤ Pd then reject H0 and accept Ha; otherwise reject Ha.

➤(c). We use the decision rule in (b) to find the probability of a Type II error if pd = 0.10. Step 1: We use the formula

$$z = \frac{P_d - p_d}{\sigma_d}$$

to find the area for Pd < 0.12, which equals the probability of a Type II error. Step 2: σd = 0.06

Step 3: For pd = 0.10 we need to find the area for Pd < 0.12.

Step 4: z = (0.12 - 0.10)/0.06 ≈ 0.33 Step 5: Using the normal distribution table for z = 0.33, we have 0.5 + 0.1293 = 0.6293.

Step 6: Therefore, the probability of a Type II error is β ≈ 0.63. ➤(d). The decision rule that we use to determine Type I and Type II errors is D.R.: If 0.12 ≤ Pd then reject H0 and accept Ha; otherwise reject Ha.

Now, from the above table, Pd = P1 - P2 = 0.14.

Since the value 0.14 falls inside the interval 0.12 ≤ Pd,

Ho is rejected and Ha is accepted. ➤(e). We conclude that on the East coast, adults are more in favor of gun control legislation than on the West coast.

Unsolved Problems with Answers 39.2 - Problem 1: A national honors society only accepts graduating seniors with an overall grade point average of 3.5 or better. The organization undertakes a study to see if there is a percentage difference between graduating English majors and Economics majors that meet their criteria for membership. The following table summarizes a national random sampling of graduating students majoring in these two disciplines.

(a). State Ho and Ha. (b). State the decision rule for α = 0.01. (c). If pd = 0.10, find the probability of a Type II error β. (d). From the samples is Ho or Ha rejected? (e). What conclusion can we come to?

Answers: ➤(a). Ho: pd = 0 Ha: pd ≠ 0 ➤(b). D.R.: If -0.05 ≤ Pd ≤ 0.05 then reject Ha; otherwise reject H0 and accept Ha. ➤(c). β = 0.006 ➤(d). Reject Ha ➤(e). The study shows that there is no significant difference in the percentage of students that major in English and Economics as far as having a GPA of 3.5 or more. ⇑ Refer back to 39.2 - Example 1 & 39.2 - Solved Problem 1. 39.2 - Problem 2: In a certain South American country, a religious journal claims that a higher percentage of adult Catholics believe in birth control than Protestants do. To test this claim, random samples of both groups were taken. The following table is a summary of this survey:

(a). State Ho and Ha. (b). State the decision rule for α = 0.10. (c). If pd = 0.05, find the probability of a Type II error β. (d). From the samples is H0 or Ha rejected? (e). What conclusion can we come to? Answers:

➤(a). Ho: pd = 0 Ha: pd > 0 ➤(b). D.R.: If 0.05 ≤ Pd then reject H0 and accept Ha; otherwise reject Ha. ➤(c). β = 0.5 ➤(d). Reject Ho and accept Ha. ➤(e). A significantly higher percentage of Catholics support birth control as compared to Protestants in this country. ⇑ Refer back to 39.2 - Example 2 & 39.2 - Solved Problem 2.

Supplementary Problems 1. A computer manufacturing company purchases their mother boards from two different companies. Their statistics research department wants to take samples of these mother boards to estimate the difference in the percentage of defective mother boards. The following table summarizes this survey.

Derive a 95% confidence interval for the true difference in the percentage of defective mother boards. 2. To study the effects of an experimental drug on high blood pressure, 500 patients suffering from hypertension volunteered to test this drug. The 500 patients were divided at random into two equal groups. Without their knowledge, group A (the control group) were given sugar pills and group B were given the medication. After one month, the following table summarizes this study:

a. State Ho and Ha b. For α = 0.05, state an appropriate decision rule. c. From the data in the table, would you conclude the medication is effective? d. If we conclude that pd = 0.14, what is the probability that the error is greater than 1%? e. Assuming N1 = N2, if we conclude that pd = 0.14, what sample size would be needed so that the probability is 0.05 that the error is greater than 1%? f. If N1 = N2 = 250 and

find β. 3. Two machines are used independently to produce electronic parts. Both machines produce about 7% defective parts. During production, if machine 1 produces significantly more defective parts than machine 2, machine 1 is shut down. To test this, equal samples are taken from both machines. a. State Ho and Ha. b. Find the appropriate sample size and the decision rule where α = 0.05 and β = 0.02 when pd = 0.15 - 0.05 = 0.10. 4. Show

.

5. Jack and Jill play the following game: Jack tosses a fair coin 100 times. Jill tosses a biased coin (p = 0.55) 100 times. If Jack tosses more heads than Jill he will win the game; otherwise Jill wins. Find the probability that Jack will win.

Statistical Inference Theory Lesson 40 Small Sampling Theory

A sample from a population is considered small if the sample size N < 30. If the sample is small, then the Central Limit Theorem cannot be used. Under certain conditions, the Student t distribution can be used.

40.1 - What is the Student t Distribution for x? The Student t distribution P{X ≤ x} equals the area under the bell-shaped curve. Even though the figure looks like the Normal distribution, the values under the curve are different. Also, there is more than one Student t distribution, depending on the number of degrees of freedom. The Student t distribution (for short, the t distribution) gives the probability under the shaded area of the curve for different degrees of freedom (df).

If a small sample of size N is taken at random from a population that is approximately normally distributed, then X has approximately a t distribution and a standard deviation $\sigma_{\bar X} = \sigma/\sqrt{N-1}$, or a standard deviation $\sigma_{\bar X} = s/\sqrt{N-1}$, where s is the standard deviation of the sample. The degrees of freedom is df = N - 1. Table D is given for the t distribution whose mean μ = 0 and standard deviation σ = 1. Each line of the table represents a different degree of freedom. To find the appropriate area from this table for X we must change to

$$t = \frac{\bar X - \mu}{\sigma/\sqrt{N-1}}$$

or

$$t = \frac{\bar X - \mu}{s/\sqrt{N-1}}$$

where N - 1 is the degrees of freedom.
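Table D in this lesson is indexed by the area between 0 and t (for example t.475), not by the tail area. The sketch below shows how a confidence level or a significance level maps to that subscript. The helper names are hypothetical, and the small dictionary holds only the Table D entries quoted in this lesson, not the full table.

```python
# Table D entries quoted in this lesson: (df, area between 0 and t) -> t
TABLE_D = {
    (24, 0.40): 1.318,
    (16, 0.475): 2.12,
    (4, 0.495): 4.604,
    (25, 0.45): 1.708,
    (9, 0.475): 2.262,
    (4, 0.45): 2.132,
    (4, 0.475): 2.776,
}


def area_for_confidence(confidence):
    """A 95% two-sided range uses t.475, i.e. half the confidence level."""
    return confidence / 2


def area_for_alpha(alpha, two_sided):
    """A test with error alpha uses 0.5 - alpha (one-sided) or 0.5 - alpha/2."""
    return 0.5 - (alpha / 2 if two_sided else alpha)


def lookup_t(n, area):
    """Look up t for a sample of size n (df = n - 1) and the given area."""
    return TABLE_D[(n - 1, round(area, 3))]


print(lookup_t(17, area_for_confidence(0.95)))            # df = 16, t.475
print(lookup_t(5, area_for_alpha(0.05, two_sided=True)))  # df = 4, t.475
```

A lookup for an entry not quoted in the lesson raises `KeyError`; a statistics library's t quantile function could replace the dictionary if one is available.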

Throughout this lesson we assume all samples come from populations that are approximately normally distributed. 40.1 - Example 1: Assume a sample is taken where N = 25. Find t.40 in Table D for the shaded area in figure 2. Solution: Step 1: Go to the top of the t distribution table D. Step 2: Go across until you find t.40. Step 3: Move down until you reach the line df = 25 - 1 = 24.

Step 4: From the table, t.40 = 1.318. 40.1 - Example 2: A random sample of size N = 17 is taken from a population with a mean μ = 10 and a standard deviation σ = 2. Find the 95% range of all possible values of X. Solution: We use the formula

$$\mu - t\sigma_{\bar X} \le \bar X \le \mu + t\sigma_{\bar X}$$

Step 1: $\sigma_{\bar X} = \sigma/\sqrt{N-1} = 2/\sqrt{16} = 0.5$

Step 2: For 95%, we look up 0.95/2 = 0.475 and df = 17 - 1 = 16 in the t distribution table. Therefore, t.475 = 2.12. Step 3: 10 - 2.12(0.5) ≤ X ≤ 10 + 2.12(0.5) Step 4: All sample means lie in the interval 8.94 ≤ X ≤ 11.06 with 95% confidence.

Solved Problems 40.1 - Solved Problem 1: Assume a sample is taken where N = 5. Find t.495 in Table D for the shaded area in figure 4. Solution: Step 1: Go to the top of the t distribution table D.

Step 2: Go across until you find t.495. Step 3: Move down until you reach the line df = 5 - 1 = 4. Step 4: From the table, t.495 = 4.604. 40.1 - Solved Problem 2: A random sample of size N = 26 is taken from a population with a mean μ = 5 and a standard deviation σ = 0.2. Find the 90% range of all possible values of X. Solution: We use the formula

$$\mu - t\sigma_{\bar X} \le \bar X \le \mu + t\sigma_{\bar X}$$

Step 1: $\sigma_{\bar X} = \sigma/\sqrt{N-1} = 0.2/\sqrt{25} = 0.04$ Step 2: For 90%, we look up 0.90/2 = 0.45 and df = 26 - 1 = 25 in the t distribution table. Therefore, t.45 = 1.708. Step 3: 5 - 1.708(0.04) ≤ X ≤ 5 + 1.708(0.04) Step 4: All sample means lie in the interval 4.93168 ≤ X ≤ 5.06832 with 90% confidence.

Unsolved Problems with Answers

40.1 - Problem 1: Assume a sample is taken where N = 29. Find t.49. Answer: 2.467 ⇑ Refer back to 40.1 - Example 1 & 40.1 - Solved Problem 1. 40.1 - Problem 2: A random sample of size N = 10 is taken from a population with a mean μ = 1 and a standard deviation σ = 1. Find the 80% range of all possible values of X. Answer: 0.54 ≤ X ≤ 1.46 ⇑ Refer back to 40.1 - Example 2 & 40.1 - Solved Problem 2.

40.2 - Estimating µ 40.2 - Example 1: Scientists recently discovered 10 dinosaur eggs in a remote area of Southern California. The mean weight of these eggs is 3.2 pounds with a standard deviation of 1.2 pounds. Find a 95% confidence interval for the mean weight of dinosaur eggs found in this area. Solution: We use the formula

$$\bar X - t\sigma_{\bar X} \le \mu \le \bar X + t\sigma_{\bar X}$$

Step 1: Since we want 95% confidence, we use 0.95/2 = 0.475. Therefore t = t.475. Step 2: For N = 10, df = 10 - 1 = 9. Step 3: $\sigma_{\bar X} = s/\sqrt{N-1} = 1.2/\sqrt{9} = 0.4$

Step 4: For df = 9 and t.475, the t distribution table gives t = 2.262. Step 5: Since X = 3.2, the above formula gives 3.2 - (2.262)(0.4) ≤ μ ≤ 3.2 + (2.262)(0.4). Step 6: From step 5 we have approximately: 2.3 ≤ μ ≤ 4.1.
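The interval in Steps 3 to 6 can be reproduced with a short sketch. It follows this lesson's convention of dividing the sample standard deviation by sqrt(N - 1); many texts instead compute s with an N - 1 denominator and divide by sqrt(N), which gives the same standard error. The function names are hypothetical, and the t value is passed in from Table D rather than computed.

```python
import math


def standard_error(s, n):
    """Standard deviation of the sample mean, using the lesson's s / sqrt(N - 1)."""
    return s / math.sqrt(n - 1)


def confidence_interval(mean, s, n, t):
    """Interval mean - t*se <= mu <= mean + t*se for a table value t."""
    half_width = t * standard_error(s, n)
    return mean - half_width, mean + half_width


# 40.2 - Example 1: N = 10, mean 3.2, s = 1.2, t.475 with df = 9 is 2.262
low, high = confidence_interval(3.2, 1.2, 10, 2.262)
print(round(low, 1), round(high, 1))
```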

Solved Problems 40.2 - Solved Problem 1: A large automobile manufacturing company wishes to test 5 new models to estimate the average gas mileage. After driving these cars for over 1,000 miles, the average mileage was 25.6 miles per gallon with a standard deviation of 2.5 miles per gallon. Find a 90% confidence interval for the true average mileage for these models. Solution: We use the formula

$$\bar X - t\sigma_{\bar X} \le \mu \le \bar X + t\sigma_{\bar X}$$

Step 1: Since we want 90% confidence, we use 0.90/2 = 0.45. Therefore, t = t.45. Step 2: For N = 5, df = 5 - 1 = 4. Step 3: $\sigma_{\bar X} = s/\sqrt{N-1} = 2.5/\sqrt{4} = 1.25$ Step 4: For df = 4 and t.45 the t distribution table gives t = 2.132. Step 5: Since X = 25.6, the above formula gives 25.6 - (2.132)(1.25) ≤ μ ≤ 25.6 + (2.132)(1.25). Step 6: From step 5 we have approximately 22.94 ≤ μ ≤ 28.27 miles per gallon.

Unsolved Problems with Answers

40.2 - Problem 1: Twelve students at a local high school ran the 100 yard dash in an average of 11.3 seconds. Assuming these students are representative of all runners for the high school, find a 95% confidence interval for the true average running time for the 100 yard dash. Assume a standard deviation of 0.15 seconds. Answer: 11.2 ≤ μ ≤ 11.4 ⇑ Refer back to 40.2 - Example 1 & 40.2 - Solved Problem 1.

40.3 - Statistical Decision Theory. 40.3 - Example 1: It is claimed that a religious artifact is at least 1,850 years old. Three separate tests using radioactive carbon dating were made to test this claim. The results of these tests showed an average age of 1,792 years with a standard deviation of 148.5 years. (a). State Ho and Ha. (b). Using an error of α = 0.05, state an appropriate decision rule. (c). From the results of the test, would you reject the claim? Solutions: ➤(a). Since the claim is 'at least 1,850 years', Ho: μ ≥ 1850 Ha: μ < 1850 ➤(b). Since we are testing μ < 1850, we first state the decision rule as: D.R.: If X < c* then reject H0 and accept Ha; otherwise reject Ha. Step 1: We use the formula c* = μ + tσX. Step 2:

$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{N-1}}$$

Step 3: Since N = 3 and σ = 148.5, then

$$\sigma_{\bar X} = \frac{148.5}{\sqrt{2}} \approx 105$$

Step 4: c* = 1850 + t(105). Step 5: To find t we need df = 3 - 1 = 2, and since α = 0.05 for a one-sided test, we find t.45 in the table: t.45 = 2.92. Because the rejection region lies below μ, we use t = -2.92. Step 6: c* = μ + tσX = 1850 - 2.92(105) = 1,543.40 years. Step 7: The decision rule reads now: D.R.: If X < 1,543.40 years then reject H0 and accept Ha; otherwise reject Ha. ➤(c). Since the result of the test was X = 1,792 and 1,792 > 1,543.40, we do not reject the claim that the artifact is at least 1,850 years old.
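The cutoff c* = μ - tσX for this lower-tail test can be checked with a few lines. This is a sketch under the lesson's convention σX = s / sqrt(N - 1); `lower_cutoff` is a hypothetical name, and t = 2.92 is the Table D value for df = 2 quoted in the example.

```python
import math


def lower_cutoff(mu0, s, n, t):
    """c* for the rule 'reject H0 if the sample mean is below c*'."""
    return mu0 - t * s / math.sqrt(n - 1)


# 40.3 - Example 1: claimed age 1850, N = 3, s = 148.5, t.45 (df = 2) = 2.92
c_star = lower_cutoff(1850, 148.5, 3, 2.92)
reject_h0 = 1792 < c_star  # observed average age is 1,792 years

print(round(c_star, 1), reject_h0)
```

With only three measurements the cutoff sits about 300 years below the claimed age, so a sample mean of 1,792 does not lead to rejecting Ho.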

Solved Problems 40.3 - Solved Problem 1: A local zoo recently received 5 elephants from a certain area of Africa. The average weight of these elephants is 2.1 tons with a standard deviation of 0.3 tons. The zoologist taking care of these animals claims that this breed of elephant weighs on average 2.5 tons. (a). State Ho and Ha. (b). Using an error of α = 0.05, state an appropriate decision rule. (c). From the information presented, would you reject the claim? Solutions: ➤(a). Since the claim is 'weighs on average 2.5 tons': Ho: μ = 2.5 tons

Ha: μ ≠ 2.5 tons

➤(b). Since we are testing μ = 2.5, we first state the decision rule as: D.R.: If 2.5 - c* ≤ X ≤ 2.5 + c* then reject Ha; otherwise reject Ho and accept Ha. Step 1: We use the formula c* = tσX. Step 2: $\sigma_{\bar X} = \sigma/\sqrt{N-1}$ Step 3: Since N = 5 and σ = 0.3, then

$$\sigma_{\bar X} = \frac{0.3}{\sqrt{4}} = 0.15$$

Step 4: c* = t(0.15). Step 5: To find t we need df = 5 - 1 = 4, and since α = 0.05 for a two-sided test, we find t.475 in the table: t.475 = 2.776. Step 6: c* = tσX = t(0.15) = 2.776(0.15) = 0.42 tons Step 7: The decision rule reads now:

D.R.: If 2.08 ≤ X ≤ 2.92 then reject Ha; otherwise reject Ho and accept Ha. ➤(c). Since the result of the test was X = 2.1, which lies in this interval, we have no basis for rejecting the zoologist's claim.

Unsolved Problems with Answers 40.3 - Problem 1: The statistical records of a local poultry farm show that the average number of eggs laid per chicken is 38 eggs per month. In an attempt to increase production, they introduce a new feed. After several months, they randomly select 10 chickens and discover that they laid, on average, 40 eggs with a standard deviation of 2.5 eggs. (a). State Ho and Ha. (b). Using an error of α = 0.05, state an appropriate decision rule. (c). From the information presented, would you reject the claim? Answers: ➤(a). Ho: μ = 38 Ha: μ > 38 ➤(b). D.R.: If X ≥ 39.52 then accept that the new feed is effective in increasing production. If X < 39.52 then assume the new feed does not increase production. ➤(c). Since X = 40 > 39.52, we accept that the new feed increases production of eggs. ⇑ Refer back to 40.3 - Example 1 & 40.3 - Solved Problem 1.

40.4 - The t distribution of the difference of means for small samples. Assume two small samples (N < 30) are taken from two distinct sample spaces, each approximately normally distributed, with σ1 = σ2. The distribution of the difference between the sample means

Xd = X1 - X2 is a t distribution with a standard deviation

$$\sigma_d = \sqrt{\frac{N_1 S_1^2 + N_2 S_2^2}{N_1 + N_2 - 2}}\,\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$$

where N1, N2 are the respective sample sizes, S1, S2 are the respective standard deviations of the samples, and N1 + N2 - 2 is the degrees of freedom. 40.4 - Example 1: A major petroleum company claims that a new additive in its oil will significantly increase gas mileage for automobiles. To check this claim, two samples are taken: one from automobiles containing oil with this additive and one from automobiles not containing oil with this additive. The following table summarizes this survey:

(a). State Ho and Ha. (b). State the decision rule for α = 0.05. (c). From the samples, is Ho or Ha rejected? Solutions: ➤(a). Since we wish to check whether the additive increases gas mileage, we state the following:

Ho: μd = 0 (There is no difference in the average mileage.) Ha: μd > 0 (The additive increases the average mileage.) ➤(b). This is a one-sided test. We first write the decision rule as follows: D.R.: Assume Xd = X1 - X2. If Xd ≥ c* then reject Ho and accept Ha; otherwise reject Ha. Here we assume Ho: μd = 0. Step 1: Xd = X1 - X2 = 22.67 - 19.66 = 3.01 Step 2: Computing the standard deviation of the difference from the sample data gives σd = 1.36.

Step 3: We use the formula: c* = μd + tσd = 0 + tσd = t(1.36). Step 4: Since α = 0.05 and we have a one-sided test, we look up the t value in the t distribution table for 10 + 10 - 2 = 18 degrees of freedom: t = 1.734. Step 5: c* = tσd = 1.734(1.36) ≈ 2.36 miles per gallon. Step 6: We restate the decision rule: D.R.: If Xd ≥ 2.36 then reject Ho and accept Ha; otherwise reject Ha. ➤(c). Xd = 22.67 - 19.66 = 3.01 ≥ 2.36. Therefore, reject Ho and accept Ha.
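As a check on Steps 3 to 6, the following Python sketch recomputes the one-sided critical value and applies the decision rule. The function name is illustrative only; the inputs t = 1.734 and σd = 1.36 are the values quoted in the steps above.

```python
def one_sided_critical_value(mu_d, t_value, sigma_d):
    """c* = mu_d + t * sigma_d for a right-tailed test on the difference of means."""
    return mu_d + t_value * sigma_d

# Values taken from the worked example: Ho assumes mu_d = 0.
c_star = one_sided_critical_value(mu_d=0.0, t_value=1.734, sigma_d=1.36)

x_d = 22.67 - 19.66          # observed difference of the sample means
reject_ho = x_d >= c_star    # decision rule: reject Ho when Xd >= c*

print(round(c_star, 2), round(x_d, 2), reject_ho)
```

The observed difference exceeds the critical value, so the sketch reaches the same decision as (c).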

Solved Problems 40.4 - Solved Problem 1: A weight control client tested a new drug on both men and women to see if there is a significant difference in weight loss between these two groups. The following table summarizes this survey:

(a). State Ho and Ha. (b). State the decision rule for α = 0.05. (c). From the samples, is Ho or Ha rejected? Solutions: ➤(a). Since we wish to check whether there is a significant difference in weight loss, we state the following: Ho: μd = 0 (There is no difference in the average weight loss.) Ha: μd ≠ 0 (There is a difference in the average weight loss.) ➤(b). This is a two-sided test. We first write the decision rule as follows: D.R.: Assume Xd = X1 - X2. If -c* ≤ Xd ≤ c* then reject Ha; otherwise, reject Ho and accept Ha. Here we assume Ho: μd = 0. Step 1: Xd = X1 - X2 = 25 - 19 = 6 Step 2: Computing the standard deviation of the difference from the sample data gives σd = 1.42.

Step 3: We use the formula: c* = μd + tσd = 0 + tσd = t(1.42). Step 4: Since α = 0.05 and we have a two-sided test, we look up the t value in the t distribution table for 10 + 15 - 2 = 23 degrees of freedom: t0.475 = 2.069. Step 5: c* = tσd = 2.069(1.42) ≈ 2.94 pounds. Step 6: We restate the decision rule: D.R.: If -2.94 ≤ Xd ≤ 2.94 then reject Ha; otherwise reject Ho and accept Ha. ➤(c). Xd = 25 - 19 = 6 > 2.94, which lies outside the interval. Therefore, reject Ho and accept Ha.

Unsolved Problems with Answers 40.4 - Problem 1: An environmental group claims that the bald eagles in Northern California and Southern California lay on average a difference of 2 eggs per year. A random selection of nests from these two regions shows the following results:

(a). State Ho and Ha. (b). State the decision rule for α = 0.05. (c). From the samples, is Ho accepted or rejected? Answers: ➤(a). Ho: μd = 2, Ha: μd ≠ 2. ➤(b). D.R.: If -1.11 ≤ Xd ≤ 5.11 then reject Ha; otherwise reject Ho and accept Ha. ➤(c). Since Xd = 2.6 lies within this interval, we reject Ha. ⇑ Refer back to 40.4 - Example 1 & 40.4 - Solved Problem 1.

Supplementary Problems 1. At a large university, ten students majoring in English are randomly selected. The following list is their ages: 19, 17, 21, 20, 19, 21, 18, 22, 17, 17. Typically, the average age for English students is 21.5 years old. a. State Ho and Ha. b. Using an error of α = 0.10, state an appropriate decision rule. c. From the information presented, would you reject the claim? 2. A sacred cloth was tested for its age. To test the reliability of such testing, two independent labs were selected to do radioactive carbon dating. Five thread samples were sent to one lab and 4 thread samples were sent to the other lab. The following table summarizes the tests:

(a). Using an error of α = 0.05, state an appropriate decision rule to establish if the two tests are consistent. (b). From the information presented, would you agree the testing is consistent? 3. A machine is designed to fill jars with 16 ounces of coffee. A consumer suspects that the machine is not filling the jars completely. A sample of 8 jars is taken and has a mean of 15.6 ounces and a standard deviation of 0.3 ounces. At α = 0.10, we wish to test the consumer's claim. a. State Ho and Ha. b. The critical value t is (a). 3.77 (b). -3.77 (c). 1.415 (d). -1.415 (e). -1.397. c. Is this sufficient evidence to reject her claim?

1. The Student t distribution was developed by W. S. Gosset, who published his work under the name "Student" during the first part of the 20th century.

Statistical Inference Theory Lesson 41 The F Distribution

The F distribution is used to compare the variances of two different populations. Because the distribution of the difference $\sigma_1^2 - \sigma_2^2$ is very complicated, we consider the ratio $\sigma_2^2/\sigma_1^2$ instead. If this ratio is very large or very small, we can conclude there is a significant difference between the two variances. For simplicity, we assume that $\sigma_2^2 \ge \sigma_1^2$. For testing the difference between these two variances we use the following null and alternative hypotheses: Ho: σ1 = σ2 Ha: σ2 > σ1. Assume we have two samples taken respectively from normally (or approximately normally) distributed populations with variances $\sigma_1^2$, $\sigma_2^2$. If N1 and N2 are the respective sample sizes and S1 and S2 (S2 > S1) the respective standard deviations of the samples, then the F statistic is given by:

$$F = \frac{N_2 S_2^2 / \left[(N_2 - 1)\,\sigma_2^2\right]}{N_1 S_1^2 / \left[(N_1 - 1)\,\sigma_1^2\right]}$$

with d1 = N1 - 1 degrees of freedom (denominator) and d2 = N2 - 1 degrees of freedom (numerator).

The graph for the F distribution is shown on the right. The total area under the curve is one.

41.1 - Applications 41.1 - Example 1: At a small Eastern college, the grade point averages for senior female and senior male students are 3.80 and 3.71 respectively. Even though these female students have a higher grade point average, the administration believes that the deviation from the average is greater for females than for males. From this group of students, a sample of 31 females and 31 males is taken. This sample produced the following results: the female students had a variance of 0.7225 and the male students a variance of 0.3364. (a). State the null and alternative hypotheses. (b). Find F. (c). Find F0.05.

Using a 0.05 level of significance, would you reject the null hypothesis? (d). Find F0.01. Using a 0.01 level of significance, would you reject the null hypothesis? Solutions:
N1 = 31: number of senior male students sampled.
N2 = 31: number of senior female students sampled.
$\sigma_1^2$: grade point variance of senior male students.
$\sigma_2^2$: grade point variance of senior female students.
$s_1^2 = 0.3364$: sample grade point variance of senior male students.
$s_2^2 = 0.7225$: sample grade point variance of senior female students.

➤(a). Ho: $\sigma_1^2 = \sigma_2^2$ Ha: $\sigma_2^2 > \sigma_1^2$ ➤(b). From the null hypothesis, we assume $\sigma_1^2 = \sigma_2^2$. Therefore the equation can be written as

$$F = \frac{N_2 s_2^2/(N_2 - 1)}{N_1 s_1^2/(N_1 - 1)} = \frac{31(0.7225)/30}{31(0.3364)/30} = \frac{0.7225}{0.3364} \approx 2.15$$

➤(c). Step 1: Since N1 = N2 = 31,

d1 = N1 - 1 = 31 - 1 = 30 degrees of freedom. d2 = N2 - 1 = 31 - 1 = 30 degrees of freedom. Step 2: Use the F distribution table E for α = 0.05. Step 3: Since the degrees of freedom for both samples is 30, we find the value F0.05 = 1.84. Step 4: Using a level of significance of α = 0.05 and F = 2.15 > 1.84, we reject Ho: Accept the claim that the variation in grades for these females is greater than the male students. ➤(d). We use all the calculated values from (b). Step 1: Use the F distribution table for α = 0.01. Step 2: Since the degrees of freedom for both samples is 30, we find the value F0.01 = 2.39.

Step 3: Using a level of significance of α = 0.01 and F = 2.15 < 2.39, we do not reject Ho: there is no statistically significant difference between the variation in the female and male grade point averages.
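The computation of F and the two comparisons in (c) and (d) can be reproduced with a few lines of Python. This is a sketch under one stated assumption: each sample variance is rescaled by N/(N - 1) before the ratio is taken, a convention that is consistent with the F values quoted in this lesson. The function name `f_ratio` is illustrative.

```python
def f_ratio(n1, s1_sq, n2, s2_sq):
    """F under Ho (equal population variances), larger sample variance on top.

    Assumption: each sample variance is rescaled by N / (N - 1).
    """
    numerator = n2 * s2_sq / (n2 - 1)
    denominator = n1 * s1_sq / (n1 - 1)
    return numerator / denominator

# Sample 1: male students, sample 2: female students (values from the example).
f = f_ratio(n1=31, s1_sq=0.3364, n2=31, s2_sq=0.7225)

f_05 = 1.84   # table value for 30 and 30 degrees of freedom, alpha = 0.05
f_01 = 2.39   # table value for 30 and 30 degrees of freedom, alpha = 0.01

print(round(f, 2), f > f_05, f > f_01)
```

With equal sample sizes the rescaling factors cancel, so F reduces to the plain ratio of the two sample variances; Ho is rejected at the 0.05 level but not at the 0.01 level.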

Solved Problems 41.1 - Solved Problem 1: It is claimed that a religious artifact is at least 1,850 years old. In 1993, ten separate tests using radioactive carbon dating were taken to test this claim. The results of these tests showed an average age of 1,792 years with a variance of 44,100. In 1995, 8 separate tests were taken. The results of these tests showed an average age of 1,810 years with a variance of 65,025. The research team conducting these tests wishes to see if there is a significant difference in the variation between these results. (a). State the null and alternative hypotheses. (b). Find F. (c). Find F0.05. Using a 0.05 level of significance, would you reject the null hypothesis? (d). Find F0.01. Using a 0.01 level of significance, would you reject the null hypothesis? Solutions: N1 = 10: number of tests taken in 1993. N2 = 8: number of tests taken in 1995. $\sigma_1^2$: population variance of tests in 1993. $\sigma_2^2$: population variance of tests in 1995. $s_1^2 = 44{,}100$: sample variance of tests in 1993. $s_2^2 = 65{,}025$: sample variance of tests in 1995.

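➤(a). Ho: $\sigma_1^2 = \sigma_2^2$ Ha: $\sigma_2^2 > \sigma_1^2$ ➤(b). F = 1.52 ➤(c). Step 1: From the null hypothesis, we assume $\sigma_1^2 = \sigma_2^2$. Therefore the equation can be written as:

$$F = \frac{N_2 s_2^2/(N_2 - 1)}{N_1 s_1^2/(N_1 - 1)} = \frac{8(65{,}025)/7}{10(44{,}100)/9} \approx \frac{74{,}314}{49{,}000} \approx 1.52$$

Step 2: d1 = N1 - 1 = 10 - 1 = 9 degrees of freedom, d2 = N2 - 1 = 8 - 1 = 7 degrees of freedom. Step 3: Use the F distribution table for α = 0.05.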

Step 4: Since the degrees of freedom are 7 for the numerator and 9 for the denominator, we find the value F0.05 = 3.29. Step 5: Using a level of significance of α = 0.05 and F = 1.52 < 3.29, the research team has no basis for rejecting Ho. Statistically there is no reason to assume the variances of these tests are different.

➤(d). We use all the calculated values from (b). Step 1: Use the F distribution table for α = 0.01.

Step 2: Since the degrees of freedom are 7 for the numerator and 9 for the denominator, we find the value F0.01 = 5.61. Step 3: Using a level of significance of α = 0.01 and F = 1.52 < 5.61, the research team has no basis for rejecting Ho. Statistically there is no reason to assume the variances of these tests are different.

Unsolved Problems with Answers 41.1 - Problem 1: In a factory there are two machines that produce ball bearings. Both machines produce ball bearings of equal size. Each day, samples of 100 ball bearings are randomly selected from each machine to check the variance of the production. If the variances of the measurements differ significantly between the samples, then it will be accepted that the machines are not functioning properly. The sample for one machine had variance $S_1^2 = 0.21$ and the other $S_2^2 = 0.33$. (a). State the null and alternative hypotheses. (b). Find F. (c). Find F0.05.

Using a 0.05 level of significance, would you reject the null hypothesis and conclude the machines are not functioning properly? (d). Find F0.01. Using a 0.01 level of significance, would you reject the null hypothesis and conclude the machines are not functioning properly? Answers: ➤(a). Ho: σ12 = σ22 Ha : σ22 > σ12 ➤(b). F = 1.57 ➤(c). F0.05 = 1 Reject the null hypothesis and accept that there is a significant difference in the variance between these two machines. ➤(d). F0.01 = 1 Reject the null hypothesis and accept that there is a significant difference in the variance between these two machines. ⇑ Refer back to 41.1 - Example 1 & 41.1 - Solved Problem 1.

Supplementary Problems 1. In a factory there is a machine that produces ball bearings. A salesperson from a company that produces these machines claims that their latest model has a variance half that of the factory's machine. To check this claim, samples of 100 ball bearings from each machine are taken. The sample from the factory's machine had a variance of 0.21 and the sample from the new model had a variance of 0.13. a. State the null and alternative hypotheses.

b. Find F. c. Using a 0.05 level of significance, state your conclusion. 2. In a factory there are two machines that produce ball bearings. Both machines produce ball bearings of equal size. However, one machine has a production variance of $\sigma_1^2 = 0.020$ and the other machine a variance of $\sigma_2^2 = 0.025$. It is believed that when the machine with the larger variation is lubricated each day with an expensive oil, the variances of the two machines become the same. To test this hypothesis, after the application of this special oil, samples of 100 ball bearings are randomly selected from each machine to check the variance of the production. The results of these samples were $S_1^2 = 0.021$ and $S_2^2 = 0.033$ respectively. a. State the null and alternative hypotheses. b. Find F. c. Find F0.05. Using a 0.05 level of significance, would you reject the null hypothesis? 3. A tourist agency in southern Florida claims that the variation in temperature is the same in January as in February. To test this claim, 21 days are randomly selected in each month and the variances are computed with the following results: the variance in temperature for January was $s_1^2 = 3.50$ and for February the variance was $s_2^2 = 4.10$. a. State Ho and Ha. b. Using a level of significance of α = 0.10, would you reject the agency's claim? c. If $\sigma_1^2 = 2.5$ and $\sigma_2^2 = 4.8$, would you reject the agency's claim?

Statistical Inference Theory Lesson 42 Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a method for testing hypotheses about the differences between three or more population means. In the case where we are testing three population means, Ho and Ha are Ho: μ1 = μ2 = μ3 Ha: μ1 ≠ μ2 or μ1 ≠ μ3 or μ2 ≠ μ3. To use ANOVA, we need to make the following assumptions about the given populations: 1. The populations are assumed to be (approximately) normally distributed. 2. The populations have equal variances. 3. The samples drawn from each population are independent of each other. The following example is a typical problem to be solved by ANOVA: Assume the track coach of a local high school wishes to determine which of three brands of running shoes performs best. He selects 15 track runners to run the 100 yard dash. In this race, each brand is worn by five runners. The following table gives the times recorded in the race for each brand:

42.1 - One-factor classification Problems of this type are called one-factor classification because only one variable (factor) is considered: the brand of running shoes. The levels of the factor are called treatments. For the above example we have three treatments, the three brands of shoes. In this lesson we will study both one-factor and two-factor classification of ANOVA where the samples are of equal size for each treatment. In deciding whether or not to reject Ho, we first must learn to compute three types of variation: 1. Total variation 2. Variation within treatments 3. Variation between treatments The equation for the relationship between total variation ($S_T^2$), variation between treatments ($S_B^2$) and variation within treatments ($S_W^2$) is

$$S_T^2 = S_B^2 + S_W^2$$

42.1 - Example 1: For the example above, compute (a). total variation. (b). variation between treatments.

(c). variation within treatments. Solutions: ➤(a). Step 1: Compute the mean X for all the numbers in the table:

Step 2:



Subtract X from each of the numbers in the table.



Square each of these.



The total variation is the sum of the values computed in the table:

Sum of Table Values: $S_T^2 \approx 5.92$ ➤(b). Step 1: Compute the mean X for each column:

Step 2:



Subtract X (computed in step 1) from each of the Xs’ in the above table.



Square each of these differences.



The between variation is the sum of the values from b multiplied by the number of rows.

➤(c). To compute the within variation, we use the formula:

$$S_W^2 = S_T^2 - S_B^2$$
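The three variations described in this example can be computed mechanically once the table is available. The Python sketch below follows the same steps (grand mean, squared deviations, column means, then subtraction). The data in `table` are hypothetical and are not the running-shoe times; the function name is illustrative.

```python
def one_factor_variations(columns):
    """columns: one list of r observations per treatment (equal sample sizes).

    Returns (total, between, within) variation.
    """
    r = len(columns[0])
    all_values = [x for col in columns for x in col]
    grand_mean = sum(all_values) / len(all_values)

    # Total variation: sum of squared deviations from the grand mean.
    total = sum((x - grand_mean) ** 2 for x in all_values)

    # Between variation: squared deviations of the column means, times r rows.
    between = r * sum((sum(col) / r - grand_mean) ** 2 for col in columns)

    # Within variation from the relationship total = between + within.
    within = total - between
    return total, between, within

# Hypothetical data: three treatments, three observations each.
table = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
total, between, within = one_factor_variations(table)
print(total, between, within)
```

For this small table the grand mean is 4 and the column means are 2, 3 and 7, so the values can be verified by hand before trusting the function on larger tables.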

Solved Problems 42.1 - Solved Problem 1: A large petroleum company wishes to test five new gasoline additives for increased fuel efficiency. Their research department purchased 35 new model sedans and drove each car 100 miles over the same track. Each additive was mixed with the gasoline of seven sedans. The following table is the mileage recorded for each car in this test. Here, mileage is measured for each car as the number of gallons consumed to travel 100 miles.

Compute: (a). total variation. (b). variation between treatments. (c). variation within treatments. Solutions: ➤(a). Step 1: Compute the mean X for all the numbers in the table:

X = 5.63 Step 2:



Subtract X from each of the numbers in the table.



Square each of these.



The total variation is the sum of the values in the table:

Sum of the table values: $S_T^2 \approx 26.15$ ➤(b). Step 1: Compute the mean X for each column:

Step 2:



Subtract X (computed in step 1) from each of the Xs’ in the above table.



Square each of these differences.



The between variation is the sum of the values from b multiplied by the number of rows.

➤(c). To compute the within variation, we use the formula:

$$S_W^2 = S_T^2 - S_B^2$$

Unsolved Problems with Answers 42.1 - Problem 1: A medical research laboratory wishes to test if there is a difference between three different drugs that promote weight loss for women over 200 pounds. The client randomly divides fifteen overweight women into three equal groups. Each group takes only one of the drugs. The following table is the resulting weight loss (in pounds) after 60 days:

Compute: (a). total variation. (b). variation between treatments. (c). variation within treatments. Answers: ➤(a). ➤(b). ➤(c). ⇑ Refer back to 42.1 - Example 1 & 42.1 - Solved Problem 1.

42.2 - Testing Hypotheses on Means Using the F Distribution To test the null and alternative hypotheses Ho: μ1 = μ2 = μ3 = … = μc Ha: μ1 ≠ μ2 or μ1 ≠ μ3 or μ2 ≠ μ3, etc., we use the F distribution. Using the values from analysis of variance, we need to test the F distribution for the value

$$F = \frac{S_B^2/(c - 1)}{S_W^2/\left[c(r - 1)\right]}$$

where c = the number of treatments (number of columns of the tables), r = the sample size for each treatment (number of rows of the tables), d2 = c - 1 degrees of freedom, d1 = c(r - 1) degrees of freedom. 42.2 - Example 1: For 42.1 - Example 1, (a). State Ho and Ha. (b). Compute F. (c). Find F0.05. Would you reject Ho? State your conclusions. (d). Find F0.01. Would you reject Ho? State your conclusions. Solutions:

➤(a). Ho: μA = μB = μC Ha: at least one of the μ values is different from the other two. ➤(b). From 42.1 - Example 1,

➤(c). d2 = c - 1 = 3 - 1 = 2, d1 = c(r - 1) = 3(5 - 1) = 12. From the F distribution table for 0.05, we find F0.05 = 3.89. Since F = 1.61 < 3.89, Ho is not rejected. For a level of significance of 0.05, we have no statistical basis to conclude that the make of the running shoes improves the performance of the runners. ➤(d). d2 = c - 1 = 3 - 1 = 2, d1 = c(r - 1) = 3(5 - 1) = 12. From the F distribution table for 0.01, we find F0.01 = 6.93. Since F = 1.61 < 6.93, Ho is not rejected. For a level of significance of 0.01, we have no statistical basis to conclude that the make of the running shoes improves the performance of the runners.
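The F value for a one-factor classification is the between variation per degree of freedom divided by the within variation per degree of freedom. The Python sketch below shows that calculation with d2 = c - 1 and d1 = c(r - 1); the input numbers are hypothetical (they are not the running-shoe results), and the function name is illustrative.

```python
def one_factor_f(between, within, c, r):
    """Return (F, d2, d1) for c treatments with r observations each."""
    d2 = c - 1            # numerator degrees of freedom
    d1 = c * (r - 1)      # denominator degrees of freedom
    f = (between / d2) / (within / d1)
    return f, d2, d1

# Hypothetical variations for c = 3 treatments and r = 3 observations each.
f, d2, d1 = one_factor_f(between=42.0, within=6.0, c=3, r=3)
print(f, d2, d1)
```

The computed F is then compared with the table value for d2 and d1 degrees of freedom at the chosen level of significance, exactly as in parts (c) and (d).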

Solved Problems

42.2 - Solved Problem 1: For 42.1 - Solved Problem 1, (a). State Ho and Ha. (b). Compute F. (c). Find F0.05. Would you reject Ho? State your conclusions. (d). Find F0.01. Would you reject Ho? State your conclusions. Solutions: ➤(a). Ho: μA = μB = μC = μD = μE Ha: at least one of the μ values is different from the others. ➤(b). From 42.1 - Solved Problem 1,

c = 5 and r = 7,

➤(c). d2 = c -1 = 5 - 1 = 4 d1 = c(r - 1) = 5(7 - 1) = 30 From the F distribution table for 0.05, we find F0.05 = 2.69. Since F = 3.25 > 2.69, Ho is rejected. For a level of significance of 0.05 we have a statistical basis to conclude that the type of additive makes a difference in mileage. ➤(d). d2 = c -1 = 5 - 1 = 4 d1 = c(r - 1) = 5(7 - 1) = 30 From the F distribution table for 0.01, we find F0.01 = 4.02. Since F = 3.25 < 4.02, Ho is not rejected. For a level of significance of 0.01, we have no statistical basis to conclude that the type of additive makes a difference in mileage.

Unsolved Problems with Answers 42.2 - Problem 1: For unsolved 42.1 - Problem 1, (a). State Ho and Ha. (b). Compute F.

(c). Find F0.05. Would you reject Ho? State your conclusions. (d). Find F0.01. Would you reject Ho? State your conclusions. Answers: ➤(a). Ho: μA = μB = μC Ha: at least one of the μ values is different from the other two. ➤(b). F ≈ 3.49 ➤(c). F0.05 = 3.89. Since F = 3.49 < 3.89, we do not reject Ho. There is no statistical basis for assuming there is any difference among the three diet drugs for reducing weight. ➤(d). F0.01 = 6.93. Since F = 3.49 < 6.93, we do not reject Ho. There is no statistical basis for assuming there is any difference among the three diet drugs for reducing weight. ⇑ Refer back to 42.2 - Example 1 & 42.2 - Solved Problem 1.

42.3 - Two-factor classification For two-factor classification, we have two different types of treatments. In table form, one set of treatments is listed in the top row and the second set of treatments is listed in the first column. The following is an example of a two-factor classification of analysis of variance. Assume the track coach of a local high school wishes to determine which of three brands of running shoes performs best for freshman, sophomore, and senior students. The numbers in the table below represent the average running times for the 100 yard dash.

Since we have two different types of treatments, we have two null hypotheses to test: Ho(1): There is no difference between brands of running shoes (between columns). Ho(2): There is no difference between the classes of the runners (between rows). In deciding whether to reject Ho(1) or Ho(2), we first must learn to compute four types of variation: 1. Total variation ($S_T^2$) 2. Variation between rows ($S_R^2$) 3. Variation between columns ($S_C^2$) 4. Variation due to chance ($S_E^2$) The equation for the relationship between these four variations is

$$S_T^2 = S_R^2 + S_C^2 + S_E^2$$

42.3 - Example 1: For the example above, compute: (a). total variation. (b). variation between rows. (c). variation between columns.

(d). random variation. Solutions: ➤(a). Step 1: From the table above, compute the row totals, column totals, row means, column means, table total and mean of the table:

Step 2: Complete the following table by subtracting the grand mean from each value of the table and squaring these differences:

Step 3: Total variation is the total of all these numbers: $S_T^2 \approx 3.69$. ➤(b). The variation between rows ($S_R^2$) is found by summing the values in the following table:

➤(c). The variation between columns ($S_C^2$) is found by summing the values in the following table:

➤(d). To compute the random variation ($S_E^2$), we use the formula:

$$S_E^2 = S_T^2 - S_R^2 - S_C^2$$
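The four variations of a two-factor table can be computed in the same mechanical way. The Python sketch below assumes one observation per cell; the data in `table` are hypothetical (not the running times above), and the function name is illustrative.

```python
def two_factor_variations(table):
    """table: list of rows, one observation per cell.

    Returns (total, between_rows, between_columns, random) variation.
    """
    r = len(table)         # number of rows
    c = len(table[0])      # number of columns
    grand_mean = sum(sum(row) for row in table) / (r * c)

    total = sum((x - grand_mean) ** 2 for row in table for x in row)

    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    # Each row mean summarizes c cells; each column mean summarizes r cells.
    between_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    between_cols = r * sum((m - grand_mean) ** 2 for m in col_means)

    # Random variation is what remains of the total.
    random_variation = total - between_rows - between_cols
    return total, between_rows, between_cols, random_variation

# Hypothetical 3 x 3 table.
table = [[1, 2, 3], [4, 5, 9], [4, 8, 9]]
s_t, s_r, s_c, s_e = two_factor_variations(table)
print(s_t, s_r, s_c, s_e)
```

For this table the grand mean is 5, the row means are 2, 6 and 7, and the column means are 3, 5 and 7, so each of the four values can be confirmed by hand.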

Solved Problems 42.3 - Solved Problem 1: A large petroleum company wishes to test five new gasoline additives for increased fuel efficiency. Their research department purchased 15 new model sedans and drove each car 100 miles, over the same track. Each additive was mixed with the three octane gasolines: regular, premium and super. The following table is the mileage recorded for each car in this test. Here, mileage is measured for each car as to the number of gallons consumed to travel 100 miles.

From this table, compute: (a). total variation. (b). variation between rows. (c). variation between columns. (d). random variation. Solutions: ➤(a). Step 1: From the table above, compute the row totals, column totals, row means, column means, grand total and mean of the grand total:

Step 2: Complete the following table by subtracting the table mean from each value of the table and squaring these differences:

Total variation is the total of all these numbers: $S_T^2 \approx 5.97$ ➤(b). The variation between rows is found by summing the values in the following table:

➤(c). The variation between columns is found by summing the values in the following table:

➤(d). To compute the random variation ($S_E^2$), we use the formula:

$$S_E^2 = S_T^2 - S_R^2 - S_C^2 = 5.97 - 2.9 - 0.99 = 2.08$$

Unsolved Problems with Answers 42.3 - Problem 1: A medical research laboratory wishes to test if there is a difference between three different drugs that promote weight loss for women and men over 200 pounds. The following table is the resulting weight loss (in pounds) after 60 days.

From this table, compute (a). total variation. (b). variation between rows. (c). variation between columns. (d). random variation.

Answers: ➤(a). 49.77 ➤(b). 43.44 ➤(c). 3.37 ➤(d). 2.96 ⇑ Refer back to 42.3 - Example 1 & 42.3 - Solved Problem 1.

42.4 - Testing Hypotheses Between Rows and Between Columns Using the F Distribution We need to test two hypotheses: Ho(1): There is no statistically significant difference between columns. Ho(2): There is no statistically significant difference between rows. To test Ho(1), we use the F distribution where

$$F = \frac{S_C^2/(c - 1)}{S_E^2/\left[(r - 1)(c - 1)\right]} = \frac{(r - 1)\,S_C^2}{S_E^2}$$

with d2 = c - 1 and d1 = (r - 1)(c - 1) degrees of freedom. To test Ho(2), we use the F distribution where

$$F = \frac{S_R^2/(r - 1)}{S_E^2/\left[(r - 1)(c - 1)\right]} = \frac{(c - 1)\,S_R^2}{S_E^2}$$

with d2 = r - 1 and d1 = (r - 1)(c - 1) degrees of freedom. 42.4 - Example 1: For 42.3 - Example 1, (a). Find F. Using a level of significance of 0.05 and 0.01, determine if there is a statistical difference between brands of running shoes. (b). Find F. Using a level of significance of 0.05 and 0.01, determine if there is a statistical difference between class years.

Solutions: ➤(a). Here, we are testing across columns. Step 1: To find F, we use the formula:

$$F = \frac{(r - 1)\,S_C^2}{S_E^2}$$

where r = the number of rows. Step 2: From 42.3 - Example 1, we computed:

Step 3: We have d2 = c - 1 = 3 - 1 = 2 degrees of freedom and d1 = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4 degrees of freedom. Step 4: Using the F distribution table for α = 0.05, F0.05 = 6.94. Step 5: Since F = 0.019 < 6.94, we conclude there is no significant difference between running shoes. Step 6: Using the F distribution table for α = 0.01, F0.01 = 18. Step 7: Since F = 0.019 < 18, we conclude there is no significant difference between running shoes. ➤(b). Here, we are testing down rows. Step 1: To find F, we use the formula:

$$F = \frac{(c - 1)\,S_R^2}{S_E^2}$$

where c = the number of columns. Step 2: From 42.3 - Example 1, we computed: $S_R^2 = 1.98$, $S_E^2 = 1.56$. Since c = 3,

$$F = \frac{(3 - 1)(1.98)}{1.56} \approx 2.54$$

Step 3: We have d2 = r - 1 = 3 - 1 = 2 degrees of freedom and d1 = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4 degrees of freedom. Step 4: Using the F distribution table for α = 0.05, F0.05 = 6.94. Step 5: Since F = 2.54 < 6.94, we conclude there is no significant difference between class years. Step 6: Using the F distribution table for α = 0.01, F0.01 = 18. Step 7: Since F = 2.54 < 18, we conclude there is no significant difference between class years.
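The row test in part (b) can be verified numerically. The Python sketch below assumes the simplified forms F = (c - 1)·S_R²/S_E² for rows and F = (r - 1)·S_C²/S_E² for columns, which are consistent with the value F = 2.54 reported above; the function names are illustrative.

```python
def f_between_rows(s_r, s_e, c):
    """F for the row factor: (c - 1) * S_R^2 / S_E^2 (assumed simplified form)."""
    return (c - 1) * s_r / s_e

def f_between_columns(s_c, s_e, r):
    """F for the column factor: (r - 1) * S_C^2 / S_E^2 (assumed simplified form)."""
    return (r - 1) * s_c / s_e

# Row test from the example: S_R^2 = 1.98, S_E^2 = 1.56, c = 3 columns.
f_rows = f_between_rows(s_r=1.98, s_e=1.56, c=3)

f_05 = 6.94   # table value for 2 and 4 degrees of freedom, alpha = 0.05
f_01 = 18.0   # table value for 2 and 4 degrees of freedom, alpha = 0.01

print(round(f_rows, 2), f_rows > f_05, f_rows > f_01)
```

Since the computed F is below both table values, Ho(2) is not rejected at either level of significance.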

Solved Problems 42.4 - Solved Problem 1: For 42.3 - Solved Problem 1, (a). Find F. Using a level of significance of 0.05 and 0.01, determine if there is a statistical difference between gasoline additives. (b). Find F. Using a level of significance of 0.05 and 0.01, determine if there is a statistical difference between octanes. Solutions: ➤(a). Here, we are testing across columns. Step 1: To find F, we use the formula:

$$F = \frac{(r - 1)\,S_C^2}{S_E^2}$$

where r = the number of rows.

Step 2: From 42.3 - Solved Problem 1, we computed: $S_C^2 = 0.99$ and $S_E^2 = 2.08$. Since r = 3,

$$F = \frac{(3 - 1)(0.99)}{2.08} \approx 0.95$$

Step 3: We have d2 = c - 1 = 5 - 1 = 4 degrees of freedom and d1 = (r - 1)(c - 1) = (3 - 1)(5 - 1) = 8 degrees of freedom.

Step 4: Using the F distribution table for α = 0.05, F0.05 = 3.84.

Step 5: Since F = 0.95 < 3.84, we conclude there is no significant difference between gasoline additives. Step 6: Using the F distribution table for α = 0.01, F0.01 = 7.01. Step 7: Since F = 0.95 < 7.01, we conclude there is no significant difference between gasoline additives. ➤(b). Here, we are testing down rows. Step 1: To find F, we use the formula:

$$F = \frac{(c - 1)\,S_R^2}{S_E^2}$$

where c = the number of columns.

Step 2: From 42.3 - Solved Problem 1, we computed: $S_R^2 = 2.9$, $S_E^2 = 2.08$. Since c = 5, F ≈ 5.57.

Step 3: We have d2 = r - 1 = 3 - 1 = 2 degrees of freedom and d1 = (r - 1)(c - 1) = (3 - 1)(5 - 1) = 8 degrees of freedom. Step 4: Using the F distribution table for α = 0.05, F0.05 = 4.46. Step 5: Since F = 5.57 > 4.46, we conclude there is a significant difference between octanes which affects mileage. Step 6: Using the F distribution table for α = 0.01, F0.01 = 8.65. Step 7: Since F = 5.57 < 8.65, we conclude that at the 0.01 level there is no significant difference between octanes.

Unsolved Problems with Answers 42.4 - Problem 1: For 42.3 - Problem 1, (a). Find F. Using a level of significance of 0.05 and 0.01, determine if there is a statistical difference between diet drugs. (b). Using a level of significance of 0.05 and 0.01, determine if there is a statistical difference between men and women. Answers: ➤(a). Since F = 1.14, using a level of significance of 0.05 and 0.01, we conclude there is no significant difference between diet drugs. ➤(b). F ≈ 29.35. Since F = 29.35 > 19, we conclude there is a significant difference in weight loss between men and women at a 0.05 level of significance. However, at a 0.01 level of significance, there is no significant weight loss difference between men and women. ⇑ Refer back to 42.4 - Example 1 & 42.4 - Solved Problem 1.

Supplementary Problems Ms. Romano teaches three sections of beginning Latin at a local high school. The following table gives the average grades over the past five years.

Using one-factor classification to see if there is a significant difference between sections, find: 1. total variation ($S_T^2$). 2. variation between treatments ($S_B^2$). 3. variation within treatments ($S_W^2$). 4. F. 5. Using a 0.05 level of significance, is there a difference in grades between class sections? Applying two-factor classification to the above table data, find 6. total variation. 7. variation between rows. 8. variation between columns. 9. random variation.

10. F. Using a 0.05 level of significance, is there a difference in grades between class sections? 11. F. Using a 0.05 level of significance, is there a difference in grades between the five years?

Statistical Inference Theory Lesson 43 The Chi-Square Distribution

The chi-square distribution has many applications in statistical analysis. Before discussing these applications, we will first see how to interpret this distribution using the chi-square table F.

43.1 - What is the Chi-square distribution? Assume we have a sequence of n independent normally distributed random variables X1, X2, …, Xn, each with mean μ = 0 and standard deviation σ = 1. We define the chi-square random variable $\chi^2$ as

$$\chi^2 = X_1^2 + X_2^2 + \cdots + X_n^2$$

Associated with each chi-square distribution is the number of degrees of freedom d = n. For a given sample size n, each chi-square random variable $\chi^2$ has a distribution represented by the above graph where 1. The total area under the curve is 1. 2. The values for chi-square $\chi^2$ are found on the horizontal axis.
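The definition of the chi-square random variable as a sum of squared standard normal variables can be explored by simulation. The Python sketch below draws many such sums for d = 9 and looks at two summaries: their average, which should be close to d, and the share of draws at or above 16.9, which is the Table F entry for d = 9 and α = 0.05 (assuming α denotes the area to the right of the tabulated value). The seed, the number of draws and the function name are arbitrary choices for illustration.

```python
import random

def chi_square_draw(d, rng):
    """One draw of X1^2 + ... + Xd^2 with each Xi standard normal."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))

rng = random.Random(0)     # fixed seed so the run is repeatable
d = 9
draws = [chi_square_draw(d, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
share_above = sum(x >= 16.9 for x in draws) / len(draws)

print(round(mean, 2), round(share_above, 3))
```

Because the draws are random, the printed values are only approximate: the mean lands near 9 and the share above 16.9 lands near 0.05.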

Table F gives the chi-square values for degrees of freedom from 1 to 30 and different values of α. When the sample size is larger than 30 (N > 30), we can use the fact that the chi-square distribution approximately equals

where Z is the standard normal random variable with mean 0 and standard deviation 1. If α ≤ 0.5 then Z ≥ 0. If α > 0.5 then Z < 0. 43.1 - Example 1: A sample of size N = 9 is taken. From the chi-square table find $\chi^2$ for α = 0.05. Solution: Step 1: Since N = 9, the number of degrees of freedom is d = 9. Step 2: Since α = 0.05, select from the first row of the table $\chi^2_{0.05}$.

Step 3: Select from the first column the row for d = 9.

Step 4: Match this row with the column of $\chi^2_{0.05}$. The intersection of this column and row is $\chi^2_{0.05} = 16.9$. 43.1 - Example 2: A sample of size N = 16 is taken. From the chi-square table find $\chi^2$ for the shaded area given in the figure. Solution: Step 1: Since N = 16, the number of degrees of freedom is d = 16. Step 2: Since the shaded area is 0.025, we need to compute α = 1 - 0.025 = 0.975.

Step 3: Using α = 0.975, we use $\chi^2_{0.975}$. From the table we find $\chi^2_{0.975} = 6.91$.
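The table lookup in Example 2 can be cross-checked without a table when d is even, because the area to the right of a chi-square value then has a closed form (a finite sum). The Python sketch below uses that closed form; it assumes, as the steps above imply, that α is the area to the right of the tabulated value, and it does not apply to odd d. The function name is illustrative.

```python
import math

def chi_square_right_tail(x, d):
    """Area to the right of x under the chi-square curve, for even d only.

    Uses exp(-x/2) * sum_{i=0}^{d/2 - 1} (x/2)^i / i!.
    """
    if d % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term = 1.0      # the i = 0 term
    total = 1.0
    for i in range(1, d // 2):
        term *= half / i      # builds (x/2)^i / i! from the previous term
        total += term
    return math.exp(-half) * total

alpha = chi_square_right_tail(6.91, 16)   # table value from Example 2
left_area = 1.0 - alpha                   # the shaded area in the example

print(round(alpha, 3), round(left_area, 3))
```

The right-tail area comes out very close to 0.975, so the area to the left of 6.91 is about 0.025, matching the shaded area in the example.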

43.1 - Example 3: A sample of size N = 26 is taken. From the chi-square table find the non-shaded area between χ2 = 17.3 and χ2 = 45.6. Solution: Step 1: Since N = 26, the number of degrees of freedom is d = 26.

Step 2: For d = 26, $\chi^2_{0.90} = 17.3$. Step 3: For d = 26, $\chi^2_{0.01} = 45.6$. Step 4: From step 2 the left-hand shaded area in the figure is 1 - 0.90 = 0.10. Step 5: From step 3 the right-hand shaded area in the figure is 0.01. Step 6: Therefore, the non-shaded area is 1 - (0.10 + 0.01) = 0.89. 43.1 - Example 4: A sample of size N = 100 is taken. Find $\chi^2$ for α = 0.05. Solution: Since our chi-square table only exists for N ≤ 30, we use the formula

where Z is standard normal distribution with mean 0 and standard deviation 1. Step 1: Since α = 0.05, rewrite the above formula as

Step 2: d = 100 Step 3: Since Z is a standard normal distribution random variable, look up the area in the standard normal distribution table for 0.50 - α = 0.45. Step 4: From step 3, Z0.05 = 1.64. Step 5:

Solved Problems 43.1 - Solved Problem 1: A sample of size N = 21 is taken. From the chi-square table find $\chi^2$ for α = 0.01.

Solution: Step 1: Since N = 21, the number of degrees of freedom is d = 21. Step 2: Since α = 0.01, select from the first row of the table $\chi^2_{0.01}$. Step 3: Select from the first column the row for d = 21. Step 4: Match this row with the column of $\chi^2_{0.01}$. The intersection of this column and row is $\chi^2_{0.01} = 38.9$.

43.1 - Solved Problem 2: A sample of size N = 30 is taken. From the chi-square table, find χ² for the shaded area given in this figure. Solution:

Step 1: Since N = 30, the number of degrees of freedom is d = 30. Step 2: Since the shaded area is 0.10, we need to compute α = 1 - 0.10 = 0.90. Step 3: Using α = 0.90, we use $\chi^2_{0.90}$. From the table we find $\chi^2_{0.90} = 20.6$.

43.1 - Solved Problem 3: A sample of size N = 4 is taken. From the chi-square table find the area between χ² = 0.48 and χ² = 11.1.

Solution: Step 1: Since N = 4, the number of degrees of freedom is d = 4. Step 2: For d = 4, $\chi^2_{0.975} = 0.48$.

Step 3: For d = 4, $\chi^2_{0.025} = 11.1$. Step 4: From step 2 the left-hand shaded area in the figure is 1 - 0.975 = 0.025. Step 5: From step 3 the right-hand shaded area in the figure is 0.025.

Step 6: Therefore, the non-shaded area is 1 - (0.025 + 0.025) = 0.95.

43.1 - Solved Problem 4: A sample of size N = 250 is taken. Find χ² for α = 0.95. Solution: Since our chi-square table only covers N ≤ 30, we use the formula

$$\chi^2_{\alpha} = \tfrac{1}{2}\left(Z_{\alpha} + \sqrt{2d - 1}\right)^2$$

where Z is the standard normal distribution with mean 0 and standard deviation 1. Step 1: Since α = 0.95, rewrite the above formula as

$$\chi^2_{0.95} = \tfrac{1}{2}\left(Z_{0.95} + \sqrt{2d - 1}\right)^2$$

Step 2: d = 250. Step 3: Since Z is a standard normal random variable, look up the area in the standard normal distribution table for 0.95 - 0.50 = 0.45. Step 4: From step 3, $Z_{0.95} = -1.64$. Step 5:

$$\chi^2_{0.95} = \tfrac{1}{2}\left(-1.64 + \sqrt{2(250) - 1}\right)^2 = \tfrac{1}{2}(-1.64 + 22.34)^2 \approx 214.2$$

Unsolved Problems with Answers

43.1 - Problem 1: A sample of size N = 21 is taken. From the chi-square table, find χ² for α = 0.005. Answer: $\chi^2_{0.005} = 41.4$ ⇑ Refer back to 43.1 - Example 1 & 43.1 - Solved Problem 1.

43.1 - Problem 2: A sample of size N = 10 is taken. From the chi-square table, find χ² for the shaded area given in the figure.

Answer: χ² = 2.16 ⇑ Refer back to 43.1 - Example 2 & 43.1 - Solved Problem 2.

43.1 - Problem 3: A sample of size N = 14 is taken. From the chi-square table, find the area between χ² = 4.66 and χ² = 7.79. Answer: 0.09 (for d = 14, $\chi^2_{0.99} = 4.66$ and $\chi^2_{0.90} = 7.79$, so the area is 1 - (0.01 + 0.90) = 0.09) ⇑ Refer back to 43.1 - Example 3 & 43.1 - Solved Problem 3.

43.1 - Problem 4: A sample of size N = 35 is taken. Find χ² for α = 0.90. Answer: $\chi^2_{0.90} = 24.7$. ⇑ Refer back to 43.1 - Example 4 & 43.1 - Solved Problem 4.

43.2 - Estimating σ² and σ using a χ² confidence interval.

Assume a random sample of size N is taken from a normal population with variance σ². If the variance of the sample is s², then it can be shown that

$$\chi^2 = \frac{N s^2}{\sigma^2}$$

has a chi-square distribution with d = N - 1 degrees of freedom. From this distribution, we can derive the following confidence intervals for the variance σ² and standard deviation σ of a population:

43.2 - Example 1: The Sweet Water Bottling Company has a machine that fills bottles with 12 ounces of water. Each morning the variance of the machine is set to σ² = 0.1 ounces. During the day, vibrations from the machine's operation can significantly change this variance. To check for significant changes in variance, a sample of N bottles is taken and the sample variance s² is recorded. (a). For N = 16, s² = 0.15 and a 95% confidence interval, estimate the variance σ² and standard deviation σ of the machine.

(b). For N = 100, s² = 0.11 and a 95% confidence interval, estimate the variance σ² and standard deviation σ of the machine. Solutions: ➤(a). Step 1: Since we have a 95% confidence interval, α = (1 - 0.95)/2 = 0.025 and 1 - α = 0.975. Step 2: For N - 1 = 16 - 1 = 15 degrees of freedom, from the table we have $\chi^2_{0.025} = 27.50$, $\chi^2_{0.975} = 6.26$. Step 3: From the above formula:

Step 4: Take the square root of each number in the inequality: ➤(b). Since N = 100, we use the formula:

Step 1: Step 2: Step 3: From the above formula:

Step 4: Taking the square root of both sides gives

Solved Problems

43.2 - Solved Problem 1: A local community college recently took two random samples of grade point averages of its graduating students. To check the consistency of the teachers' grading policies, it needs an estimate of the variance σ² and standard deviation σ. (a). If N = 30 and s² = 0.3, estimate the variance σ² and standard deviation σ of the grade point average using a confidence interval of 90%. (b). If N = 200 and s² = 0.18, estimate the variance σ² and standard deviation σ of the grade point average using a confidence interval of 90%. Solutions: ➤(a). Step 1: Since we have a 90% confidence interval, α = (1 - 0.90)/2 = 0.05 and 1 - α = 0.95. Step 2: For N - 1 = 30 - 1 = 29 degrees of freedom, from the table we have $\chi^2_{0.05} = 42.6$, $\chi^2_{0.95} = 17.7$. Step 3: From the above formula:

Step 4: Taking the square root of both numbers in the inequality gives ➤(b). Since N = 200, we use the formula:

Step 1: Step 2: Step 3: From the above formula:

Step 4: Take the square root of both numbers in the above inequality: 0.38≤ σ ≤ 0.47

Unsolved Problems with Answers 43.2 - Problem 1: A machine drills holes in steel plates. The drilling error has a mean μ = 0.01 mm with σ2 = 0.009. Due to possible vibrations in the machine the drilling accuracy can change. To monitor the accuracy, periodically samples of size N are taken from plates that have been drilled. (a). If N = 25 and s2 = 0.0096, estimate the variance σ2 and standard deviation σ of the accuracy in drilling using a 99% confidence interval. (b). If N = 50 and s2 = 0.0092, estimate the variance σ2 and standard deviation σ of the accuracy in drilling using a 99% confidence interval. Answers: ➤(a). ➤(b). ⇑ Refer back to 43.2 - Example 1 & 43.2 - Solved Problem 1.

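43.3 - Hypothesis testing for a population standard deviation σ.

To test a hypothesis about a population standard deviation σ, we use the formula

$$\chi^2 = \frac{N s^2}{\sigma^2}$$

where N is the sample size, s is the sample standard deviation, σ is the population standard deviation stated in Ho, and χ² has d = N - 1 degrees of freedom.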

43.3 - Example 1: The Sweet Water Bottling company has a machine that fills in bottles 12 ounces of water. Each morning the standard deviation of the machine is set to σ = 0.1 ounces. During the day, the vibrations in the operation of the machine can significantly change the amount filled in each bottle by the machine. To check for significant changes in variance, a sample of 30 bottles is taken and the standard deviation s = 0.12 is recorded. (a). State Ho and Ha. (b). After each sample, in order to decide whether the machine should be shut down for adjustments, assume the following decision rule: D.R.: If χ2 ≥ c* then stop the machine and make appropriate adjustments on the machine. Find c* for a Type I error of α = 0.05. (c). Based on the above decision rule, would the machine be shut down? Explain. Solutions: ➤(a). The concern is that the standard deviation will be larger than σ = 0.1. Therefore, Ho: σ = 0.1 Ha: σ > 0.1 ➤(b). Step 1: Since the alternative hypothesis is σ > 0.1, we use the right-hand side of the chi-square table for α = 0.05.

Step 2: N = 30, s = 0.12, σ = 0.10. Step 3: $\chi^2 = \frac{N s^2}{\sigma^2} = \frac{30(0.12)^2}{(0.10)^2} = 43.2$. Step 4: For d = 30 - 1 = 29 and α = 0.05, the chi-square table gives $\chi^2_{0.05} = 42.6$. Step 5: Therefore, c* = 42.6. Step 6: The decision rule is: D.R.: If χ² ≥ 42.6 then stop the machine and make appropriate adjustments on the machine. ➤(c). Since χ² = 43.2 > 42.6, the machine will be shut down. The value χ² = 43.2 falls in the right-hand tail of the chi-square distribution, so s = 0.12 differs significantly from σ = 0.10: if Ho were true, a value this large would occur in fewer than 5% of samples.
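The decision rule of this example can be sketched in Python. The sketch assumes the test statistic is χ² = N s²/σ², which reproduces the value 43.2 used in part (c). The critical value is passed in by hand from the chi-square table rather than computed, and the function names `chi2_statistic` and `reject_h0` are hypothetical.

```python
def chi2_statistic(n: int, s: float, sigma0: float) -> float:
    """Test statistic N * s^2 / sigma0^2 (assumed form, see the note above)."""
    return n * s ** 2 / sigma0 ** 2


def reject_h0(n: int, s: float, sigma0: float, critical_value: float) -> bool:
    """Right-tailed decision rule: reject Ho when chi-square >= c*."""
    return chi2_statistic(n, s, sigma0) >= critical_value


# Example 1: N = 30, s = 0.12, sigma = 0.10, c* = 42.6 (table, d = 29, alpha = 0.05)
stat = chi2_statistic(30, 0.12, 0.10)
decision = reject_h0(30, 0.12, 0.10, critical_value=42.6)
print(round(stat, 2), decision)
```

Keeping the critical value as an argument mirrors the lesson's workflow: the table lookup stays a separate, visible step.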

Solved Problems

43.3 - Solved Problem 1: The United States Defense Department is considering ordering a new type of anti-tank missile. The manufacturer of these missiles claims that the missiles will hit the area around any enemy tank within a standard deviation of 2 feet. A test sample of 26 missiles fired against tanks produced a sample standard deviation of 2.5 feet. (a). State Ho and Ha. (b). Assume the final decision depends on the result of this sample test. Using the following decision rule: D.R.: If χ² ≥ c* then the missiles will not be purchased. Find c* for a Type I error of α = 0.01. (c). Based on the above decision rule, would the missiles be purchased? Explain. Solutions:

➤(a). The concern is that the standard deviation will be larger than σ = 2 feet. Therefore, Ho: σ = 2 Ha: σ > 2 ➤(b). Step 1: Since the alternative hypothesis is σ > 2, we use the right-hand side of the chi-square table for α = 0.01. Step 2: N = 26, s = 2.5, σ = 2. Step 3: $\chi^2 = \frac{N s^2}{\sigma^2} = \frac{26(2.5)^2}{(2)^2} = 40.63$. Step 4: For d = 26 - 1 = 25 and α = 0.01, the chi-square table gives $\chi^2_{0.01} = 44.3$. Step 5: Therefore, c* = 44.3. Step 6: The decision rule is: D.R.: If χ² ≥ 44.3 then the missiles will not be purchased. ➤(c). Since χ² = 40.63 < 44.3, the missiles will be purchased. The value s = 2.5 feet can be explained by random variation, so there is no significant difference from the standard deviation of 2 feet.

Unsolved Problems with Answers

43.3 - Problem 1: The CEO of a large railroad company claims that the company's trains arrive in New York City with a standard deviation of 1 minute from the official arrival time. To substantiate this claim, the company checked the arrival times of 10 trains into New York City. From this sample they computed a standard deviation of 1.8 minutes. (a). State Ho and Ha.

(b). In supporting or rejecting this claim, use the following decision rule: D.R.: If χ² ≥ c* then the claim is rejected. Find c* for a Type I error of α = 0.10. (c). Based on the above decision rule, would the claim be rejected? Explain. Answers: ➤(a). Ho: σ = 1 Ha: σ > 1 ➤(b). The decision rule is: D.R.: If χ² ≥ 14.7 then reject the claim. ➤(c). Since χ² = 32.4 > 14.7, the value s = 1.8 minutes cannot be explained by random variation. Therefore, the claim is rejected. ⇑ Refer back to 43.3 - Example 1 & 43.3 - Solved Problem 1.

43.4 - Testing For Goodness of Fit

The chi-square distribution can be used to test the discrepancy between observed and expected frequencies. The test statistic is

$$\chi^2 = \sum_{k=1}^{N} \frac{(x_k - e_k)^2}{e_k}$$

where $x_k$ are the observed frequencies, $e_k$ are the expected frequencies, N is the number of categories, and d = N - 1 is the number of degrees of freedom.

To test the data for goodness of fit, the hypotheses are Ho: The data fits the tested distribution. Ha: The data does not fit the tested distribution. The null hypothesis is rejected if $\chi^2 > \chi^2_{\alpha}$ for a given value of α.

43.4 - Example 1: Mr. Goodman recently purchased a rare coin. To test whether the coin is fair, he tossed it 500 times and recorded 275 heads and 225 tails. (a). State Ho and Ha. (b). Complete the table:

(c). Compute χ 2. For α = 0.05, would the null hypothesis be rejected? Explain. Solutions: ➤(a). Ho: The coin is fair. Ha: The coin is biased. ➤(b). To complete the above table, we need to compute each expected values. Step 1: To compute the expected values we assume Ho is true. Step 2: For each toss the probability of heads is p = 0.50 and the probability of tails is q = 0.50 Step 3: Since the number of tosses is 500, the expected values for both heads and tails are 500(0.50) = 250.

Step 4:

➤(c). Step 1: For the above formula we set: x1 = 275, x2 = 225, e1 = 250, e2 = 250, d = 2 - 1 = 1. Step 2: $\chi^2 = \frac{(275 - 250)^2}{250} + \frac{(225 - 250)^2}{250} = 2.5 + 2.5 = 5$. Step 3: For d = 2 - 1 = 1, the chi-square table gives $\chi^2_{0.05} = 3.84$. Step 4: Since 5 > 3.84, we reject Ho and conclude, at the α = 0.05 level of significance, that the coin is not fair.

43.4 - Example 2: A Las Vegas casino recently purchased an electronic machine that simulates the toss of a pair of dice. To test whether the machine follows the odds for dice, it was made to toss a pair of dice 360 times and the sum of each toss was recorded. (a). State Ho and Ha. (b). Complete the table:

(c). Compute χ2. For α = 0.01, would the null hypothesis be rejected? Explain. (d). For α = 0.10, would the null hypothesis be rejected? Explain. Solutions:

➤(a). Ho: The machine is generating random pairs of numbers according to the odds on dice. Ha: The machine is biased. ➤(b). To complete the above table, we need to compute each expected value. Step 1: To compute the expected values we assume Ho is true. Step 2: Let X be the random variable for the sum of a pair of dice. Therefore, P{X = 2} = 1/36, P{X = 3} = 2/36, P{X = 4} = 3/36, P{X = 5} = 4/36, P{X = 6} = 5/36, P{X = 7} = 6/36, P{X = 8} = 5/36, P{X = 9} = 4/36, P{X = 10} = 3/36, P{X = 11} = 2/36, P{X = 12} = 1/36. Step 3: Since the number of tosses is 360, the expected values are as follows: e1 = 360(1/36) = 10, e2 = 360(2/36) = 20, e3 = 360(3/36) = 30, e4 = 360(4/36) = 40, e5 = 360(5/36) = 50, e6 = 360(6/36) = 60, e7 = 360(5/36) = 50, e8 = 360(4/36) = 40, e9 = 360(3/36) = 30, e10 = 360(2/36) = 20, e11 = 360(1/36) = 10. Step 4:

➤(c). Step 1: For the above formula we set x1 = 9, x2 = 21, x3 = 30, x4 = 30, x5 = 60, x6 = 57, x7 = 53, x8 = 35, x9 = 38, x10 = 26, x11 = 1, e1 = 10, e2 = 20, e3 = 30, e4 = 40, e5 = 50, e6 = 60, e7 = 50, e8 = 40, e9 = 30, e10 = 20, e11 = 10. Step 2: $\chi^2 = \sum_{k=1}^{11} \frac{(x_k - e_k)^2}{e_k} = 17.64$.

Step 3: For d = 11 - 1 = 10, the chi-square table gives $\chi^2_{0.01} = 23.2$. Step 4: Since 17.64 < 23.2, we do not reject Ho: at the α = 0.01 level of significance, we have no reason to believe the machine is not generating numbers according to the odds of dice. ➤(d). For d = 11 - 1 = 10, the chi-square table gives $\chi^2_{0.10} = 16.0$. Since 17.64 > 16.0, we reject Ho: at the α = 0.10 level of significance, we have reason to believe the machine is not generating numbers according to the odds of dice.

43.4 - Example 3: According to the theory of human genetics, in a large population, half the children born are boys and the other half are girls. To test this proportion, 1,000 families each with four children were randomly selected. (a). State Ho and Ha. (b). Complete the following table:

(c). compute χ2. For α = 0.025, would you support the hypothesis that there is an equal proportion of boys and girls born? Explain. (d). For α = 0.01, would you support the hypothesis that there is an equal proportion of boys and girls born? Explain. Solutions: ➤(a). Ho: One-half of all children born are boys and other half girls. Ha: The proportion of girls and boys born are not equal. ➤(b). Step 1: Assume Ho is true. Step 2: It is reasonable to assume that the distribution of gender among children is a binomial distribution:

➤(c). Step 1: For the above formula we set: x1 = 75, x2 = 281, x3 = 355, x4 = 220, x5 = 69, e1 = 62.5, e2 = 250, e3 = 375, e4 = 250, e5 = 62.5. Step 2: $\chi^2 = \sum_{k=1}^{5} \frac{(x_k - e_k)^2}{e_k} = 11.69$.

Step 3: For d = 5 - 1 = 4, the chi-square table gives $\chi^2_{0.025} = 11.1$. Step 4: Since 11.69 > 11.1, we reject Ho: at the α = 0.025 level of significance, the data do not support the theory that 50% of children born are boys. ➤(d). For d = 5 - 1 = 4, the chi-square table gives $\chi^2_{0.01} = 13.3$. Since 11.69 < 13.3, we do not reject Ho: at the α = 0.01 level of significance, the data are consistent with the theory that 50% of children born are boys.
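The three examples above all reduce to the same sum of (observed - expected)²/expected over the categories. The short Python sketch below recomputes them. The function name `chi_square` is hypothetical, and the critical values in the comments are simply the table values quoted in the examples, not values computed by the code.

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (x_k - e_k)^2 / e_k over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


# Example 1: 500 coin tosses, expected 250 heads and 250 tails under Ho.
coin = chi_square([275, 225], [250, 250])

# Example 2: 360 tosses of a pair of dice; number of ways to roll each sum 2..12.
ways = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
dice_expected = [360 * w / 36 for w in ways]
dice = chi_square([9, 21, 30, 30, 60, 57, 53, 35, 38, 26, 1], dice_expected)

# Example 3: 1,000 families with four children each.
families = chi_square([75, 281, 355, 220, 69], [62.5, 250, 375, 250, 62.5])

print(coin > 3.84)            # Example 1: d = 1, alpha = 0.05
print(16.0 < dice < 23.2)     # Example 2: rejected at 0.10, not at 0.01
print(11.1 < families < 13.3) # Example 3: rejected at 0.025, not at 0.01
```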

Solved Problems

43.4 - Solved Problem 1: A statistician was hired by a professional basketball team to determine whether the team's wins depend significantly on the day of the week on which the games are played. For 350 winning games he compiled the following counts of wins by day of the week:

(a). State Ho and Ha. (b). Complete the table:

(c). Compute χ². For α = 0.10, would the null hypothesis be rejected? Explain. Solutions: ➤(a). Ho: The day of the week has no effect on the team's ability to win. Ha: The day of the week has an effect on the team's ability to win. ➤(b). To complete the above table, we need to compute each expected value. Step 1: To compute the expected values we assume Ho is true. Step 2: There are 350 games spread over 7 days a week. Step 3: Therefore, the expected number of games won on each day of the week is 350/7 = 50. Step 4:

➤(c). Step 1: For the above formula we set x1 = 61, x2 = 47, x3 = 55, x4 = 38, x5 = 59, x6 = 35, x7 = 55, e1 = 50, e2 = 50, e3 = 50, e4 = 50, e5 = 50, e6 = 50, e7 = 50, d = 7 - 1 = 6. Step 2: $\chi^2 = \frac{121 + 9 + 25 + 144 + 81 + 225 + 25}{50} = \frac{630}{50} = 12.60$.

Step 3: For d = 7 - 1 = 6, the chi-square table gives $\chi^2_{0.10} = 10.6$.

Step 4: Since 12.60 > 10.6, we reject Ho and conclude, at the α = 0.10 level of significance, that the day of the week affects the team's ability to win.

43.4 - Solved Problem 2: A manufacturer of personal computers purchases disk drives from five different companies. According to the claims of these five companies, the following table lists their percentage of defective hard drives:

To monitor these percentages, the company keeps careful records of the defective hard drives for each of the five companies. The following table lists the total number of defective disks:

(a). State Ho and Ha. (b). Complete the table:

(c). Compute χ2. For α = 0.05, would the null hypothesis be rejected? Explain. (d). For α = 0.01, would the null hypothesis be rejected? Explain. Solutions: ➤(a).

Ho: The percentage of defective hard drives claimed by their five manufacturers is correct. Ha: The claim is not correct. ➤(b). To complete the above table, we need to compute each expected values. Step 1: To compute the expected values we assume Ho is true. Step 2: The expected number of defective hard drives is computed in the table below by multiplying row one times row two:

Step 3: The table below gives the number of defective hard drives and the expected number of hard drives per company:

➤(c). Step 1: For the above formula we set: x1 = 108, x2 = 154, x3 = 170, x4 = 121, x5 = 158 e1 = 84.77, e2 = 197.82, e3 = 182.9, e4 = 103.48, e5 = 166.32 Step 2:

Step 3: For d = 5 - 1 = 4, the chi-square table gives $\chi^2_{0.05} = 9.49$. Step 4: Since 20.33 > 9.49, we reject Ho: at the α = 0.05 level of significance, we have reason to believe the percentages of defective hard drives reported by the five companies are not correct. ➤(d). For d = 5 - 1 = 4, the chi-square table gives $\chi^2_{0.01} = 13.3$. Since 20.33 > 13.3, we again reject Ho: even at the α = 0.01 level of significance, we have reason to doubt the percentages of defective drives reported by the five companies.

43.4 - Solved Problem 3: Mr. Jones is the president of a large railroad. He recently claimed that 70% of the time its express trains from Boston to New York City arrive on time. Every day there are five such trips made by these trains. To test this claim, the arrival times were recorded over 3,000 days. (a). State Ho and Ha. (b). Complete the following table:

(c). Compute χ2. For α = 0.025, would you support the hypothesis? Explain.

(d). For α = 0.10, would you support the hypothesis? Solutions: ➤(a). Ho: At least seventy percent of the express trains from Boston to New York City arrive on time. Ha: The claim that at least seventy percent of the express trains from Boston to New York City arrive on time is false. ➤(b). Step 1: Assume Ho is true. Step 2: It is reasonable to assume that the number of trains arriving on time each day has a binomial distribution:

➤(c). Step 1: For the above formula we set: x1 = 15, x2 = 98, x3 = 412, x4 = 925, x5 = 1070, x6 = 480, e1 = 7.29, e2 = 85.05, e3 = 396.9, e4 = 926.1, e5 = 1080.45, e6 = 504.21. Step 2: $\chi^2 = \sum_{k=1}^{6} \frac{(x_k - e_k)^2}{e_k} = 11.96$.

Step 3: For d = 6 - 1 = 5, the chi-square table gives $\chi^2_{0.025} = 12.8$. Step 4: Since 11.96 < 12.8, we do not reject Ho: at the α = 0.025 level of significance, we can't reject the claim that at least 70% of the trains arrive on time. ➤(d). For d = 6 - 1 = 5, the chi-square table gives $\chi^2_{0.10} = 9.24$. Since 11.96 > 9.24, we reject Ho: at the α = 0.10 level of significance, we conclude that the claim that at least 70% of the trains arrive on time is false.
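When the tested distribution is binomial, as in Example 3 and the train problem above, the expected frequencies come from the binomial probabilities multiplied by the number of observations. The following Python sketch shows one way to reproduce them. The function name `binomial_expected` is hypothetical, and the ordering of the categories (0 on-time arrivals first, 5 last) is an assumption that matches the expected values e1 = 7.29 through e6 = 504.21 listed in the solution.

```python
from math import comb


def binomial_expected(trials: int, p: float, total: int) -> list:
    """Expected counts total * C(trials, k) * p^k * (1 - p)^(trials - k), k = 0..trials."""
    return [
        total * comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(trials + 1)
    ]


# Train problem: 5 trips per day, claimed on-time probability 0.7, 3,000 days.
expected = binomial_expected(5, 0.7, 3000)
observed = [15, 98, 412, 925, 1070, 480]
chi2 = sum((x - e) ** 2 / e for x, e in zip(observed, expected))
print([round(e, 2) for e in expected], round(chi2, 2))

# Example 3: 4 children per family, p = 0.5, 1,000 families.
family_expected = binomial_expected(4, 0.5, 1000)
```

The same helper covers both problems; only the number of trials, the probability under Ho and the sample size change.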

Unsolved Problems with Answers 43.4 - Problem 1: A local trucking company leased 7 copy machines for its office. The warranty for each machine states that each machine will be down no more than 5% of the time. Careful records were kept on each machines' down time over a 1,000 hour time period. The following table summarizes these records:

(a). State Ho and Ha. (b). Complete the table

(c). Compute χ². For α = 0.10, would the null hypothesis be rejected? Explain. Answers: ➤(a). Ho: The machines were down no more than 5% of the time. Ha: The machines were down more than 5% of the time. ➤(b).

➤(c). χ² = 8.88. Since 8.88 < 10.6, we do not reject Ho: at the α = 0.10 level of significance, we have no statistical basis for rejecting the leasing company's claim. ⇑ Refer back to 43.4 - Example 1 & 43.4 - Solved Problem 1.

43.4 - Problem 2: For the Presidential election in 1996, the Democratic party took a random survey, in January, of voters' concerns on four issues. The following table was a result of this survey:

Three months later, a second survey of 1,500 voters was taken to see if the above percentages had significantly changed. The following table gives the results of this latest survey:

(a). State Ho and Ha. (b). Complete the table

(c). Compute χ². For α = 0.05, would the null hypothesis be rejected? Explain. (d). For α = 0.005, would the null hypothesis be rejected? Explain. Answers: ➤(a). Ho: The percentages of voters' concerns on the issues have not changed. Ha: The percentages of voters' concerns on the issues have changed. ➤(b).

➤(c). χ² = 29.75. Since 29.75 > 7.81, reject Ho. For α = 0.05, there has been a significant change of opinion. ➤(d). χ² = 29.75. Since 29.75 > 12.8, reject Ho. For α = 0.005, there has been a significant change of opinion. ⇑ Refer back to 43.4 - Example 2 & 43.4 - Solved Problem 2.

43.4 - Problem 3: After ten years Mrs. Billings has decided to sell her 6 unit bed and breakfast inn. In advertising her property, she claims that she has a vacancy rate of 35%. To support her claim, she provides the following information to prospective buyers:

(a). State Ho and Ha. (b). Complete the following table:

(c). Compute χ2. For α = 0.05, would you support the hypothesis? Explain. (d). For α = 0.01, would you support the hypothesis? Answers: ➤(a). Ho: Her vacancy rate is 35%. Ha: Her vacancy rate is not 35%. ➤(b).

➤(c). χ2 ≈ 24.45 Since 24.45 > 12.6, we reject Ho and have a statistical basis for rejecting her claim of a vacancy rate of 35%. ➤(d). Since 24.45 > 16.8 we reject Ho and have a statistical basis for rejecting her claim of a vacancy rate of 35%. ⇑ Refer back to 43.4 - Example 3 & 43.4 - Solved Problem 3.

43.5 - Contingency Tables

A contingency table is made up of r rows and c columns of data. Such tables allow us to test for a dependent relationship between data collected from different populations [1]. To make such a determination we use the chi-square formula:

$$\chi^2 = \sum_{k=1}^{N} \frac{(x_k - e_k)^2}{e_k}$$

where N is the number of cells in the table, $x_k$ is the observed value in each cell, and $e_k$ is the expected value for each cell. For chi-square the number of degrees of freedom is d = (r - 1)(c - 1). For the null hypothesis we always assume the data from the various populations are independent of each other.

43.5 - Example 1: Ms. Jones teaches an introductory course in Statistics. She has been requested by the administration to study the success rate of her students who have at least one year of algebra in comparison to those students who do not have this preparation. She decided to use a contingency table to make this comparison:

(a). State Ho and Ha. (b). Assume Ho is true, complete the expected frequencies for the following table:

(c). Compute χ2. (d). For α = 0.05, would you reject Ho? Solutions: ➤(a).

Ho: A year of algebra and passing statistics are statistically independent. Ha: A year of algebra and passing statistics are statistically dependent. ➤(b). Assuming Ho is true, we should assume that 65% (65/100 ) of all students in her class should pass statistics and 35% should fail.

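➤(c). Step 1: For the formula,

$$\chi^2 = \sum_{k=1}^{N} \frac{(x_k - e_k)^2}{e_k}$$

Step 2: x1 = 37, x2 = 16, x3 = 28, x4 = 19, e1 = 34.45, e2 = 18.55, e3 = 30.55, e4 = 16.45. Step 3:

$$\chi^2 = \frac{(37 - 34.45)^2}{34.45} + \frac{(16 - 18.55)^2}{18.55} + \frac{(28 - 30.55)^2}{30.55} + \frac{(19 - 16.45)^2}{16.45} \approx 1.15$$

Step 4: d = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1. ➤(d). For α = 0.05 and 1 degree of freedom, $\chi^2_{0.05} = 3.84$. Since 1.15 < 3.84, there is no statistical reason to reject Ho. Therefore, we conclude that a one-year algebra background and passing statistics are independent of each other.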
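The expected frequencies in a contingency table follow from the row and column totals: under independence, each cell's expected value is (row total × column total) / grand total. The Python sketch below applies this to Example 1. The table layout is an assumption, since the original table is not reproduced here: the observed values are arranged as two rows (37, 16) and (28, 19), which is the arrangement that gives the expected values 34.45, 18.55, 30.55 and 16.45 used in the solution. The function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return the chi-square statistic and the degrees of freedom (r - 1)(c - 1)."""
    expected = expected_counts(table)
    stat = sum(
        (x - e) ** 2 / e
        for row, exp_row in zip(table, expected)
        for x, e in zip(row, exp_row)
    )
    d = (len(table) - 1) * (len(table[0]) - 1)
    return stat, d


# Example 1 (assumed layout): one row per student group, columns = passed, failed.
observed = [[37, 16], [28, 19]]
expected = expected_counts(observed)
stat, d = chi_square_independence(observed)
print(expected, round(stat, 2), d)
```

The statistic is then compared with the table value for d degrees of freedom, exactly as in part (d).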

Solved Problems

43.5 - Problem 1: Recently a national survey of 1,000 adults was taken. From this survey, data on each person's annual income and educational level was presented in the following table:

Assume we wish to use this data to decide if their level of education and income are statistically related. (a). State Ho and Ha. (b). Assume Ho is true, complete the expected frequencies for the following table:

(c). Compute χ². (d). For α = 0.01, would you reject Ho? Solutions: ➤(a). Ho: Level of income and education are statistically independent.

Ha: Level of income and education are statistically dependent. ➤(b). Assuming Ho is true, we have

➤(c). Step 1: For the formula,

Step 2: x1 = 111, x2 = 27, x3 = 2, x4 = 208, x5 = 41, x6 = 13, x7 = 107, x8 = 197, x9 = 97, x10 = 51, x11 = 100, x12 = 46, e1 = 66.78, e2 = 51.1, e3 = 22.12, e4 = 124.974, e5 = 95.63, e6 = 41.396, e7 = 191.277, e8 = 146.365, e9 = 63.358, e10 = 93.969, e11 = 71.905, e12 = 31.126. Step 3: $\chi^2 = \sum_{k=1}^{12} \frac{(x_k - e_k)^2}{e_k} \approx 275.04$.

Step 4: d = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6. ➤(d). For α = 0.01 and 6 degrees of freedom, $\chi^2_{0.01} = 16.8$. Since 275.04 > 16.8, reject Ho and accept Ha. Therefore, we conclude that there is a statistically dependent relationship between income and education.

Unsolved Problems with Answers 43.5 - Problem.1: A large petroleum company recently tested three new gasoline additives in 100 automobiles of a certain model to determine if there is an association between additives and mileage. The following table is a summary of this test:

(a). State Ho and Ha. (b). Assume Ho is true, complete the expected frequencies for the following table:

(c). Compute χ². (d). For α = 0.05, would you reject Ho? Answers: ➤(a). Ho: The additive used and gasoline mileage performance are statistically independent. Ha: The additive used and gasoline mileage performance are statistically dependent.

➤(b).

➤(c). χ² ≈ 3.72 ➤(d). For α = 0.05 and 6 degrees of freedom, $\chi^2_{0.05} = 12.6$. Since 3.72 < 12.6, we do not reject Ho. Therefore, we conclude that there is no statistically dependent relationship between these additives and improved gasoline mileage. ⇑ Refer back to 43.5 - Example 1 & 43.5 - Solved Problem 1.

Supplementary Problems Assume we have a normal population with a sample random distribution X where N is the sample size, s the standard deviation of the sample and σ the standard deviation of the population. 1. From the formula: Find a formula for s2 and s. 2. For a given sample size N and α, find a general confidence interval formula for s2 and s. 3. The Sweet Water Bottling Company has a machine that fills bottles of water. Each morning the variance of the machine is set to σ2 = 0.1 ounces. During the day, the vibrations in the operation of the machine can significantly change the variance of the machine. To check for significant changes in variation, a sample of 100 bottles is taken and the sample variance

s2 is recorded. Find a 95% confidence interval for s2 and s. 4. Assume a sample of size N = 350. Find χ20.05. 5. Using the formula :

where z is the standard normal distribution, complete the following table for N = 50:

A machine drills holes in metal plates. The diameter tolerance of each hole is σ = 0.001 millimeters. Each hour 50 plates are tested for drill accuracy by computing s. 6. State Ho and Ha. 7. Using the formula:

and significance levels α = 0.02 and 1 - α = 0.98, establish a decision rule for stopping the machine. 8. According to the decision rule, if s² = 0.0009 and σ = 0.001, would the machine be functioning properly? Explain. 9. Assume the drilling precision changed to σ = 0.0015 and a sample of N = 50 resulted in a standard deviation of s = 0.0019. Would the above decision rule shut the machine down?

In a small village in South Africa 250 people became infected with a certain disease. To test the effectiveness of a new drug, half the women and men infected were given the drug while the others infected were given a placebo. The following table gives the final results of this experiment:

10. State Ho and Ha. 11. Compute χ2. 12. For α = 0.05, would you reject Ho?

[1] A statistically dependent relationship between data does not necessarily imply a causal relationship.

Statistical Inference Theory Lesson 44 Correlation & Regression Analysis I

Correlation and regression analysis allow one to determine the following: 1. Does a dependent relationship exist between two or more sets of data? 2. What form does this dependent relationship take? 3. How strong is this dependent relationship? To carry out such a study, we will use the following example throughout this lesson and Lesson 45: Ms. Pool sells residential real estate. To see if there is a relationship between the floor size of a house and its selling price, she takes a random sample from all houses sold in her local area in 1995. The data in the following table is a result of this survey:

We begin by representing this table using a scatter diagram.

44.1 - What is a scatter diagram? A scatter diagram is a graphic representation of the above table in which each point is a pair of numbers (x,y) from the table, with the x values measured on the horizontal axis and the y values measured on the vertical axis. The scatter diagram on the right represents the above table.

44.1 - Example 1: Mr. Jones teaches courses in calculus and physics. At the end of the semester he compared the final grades of seven students that were enrolled in both of his classes. The data in the following table is the grades he collected:

(a). Construct a scatter diagram. (b). What does the diagram show about the relationship between the students' final grades in calculus and physics? Solutions: ➤(a). Step 1: The horizontal axis x represents the calculus grades.

Step 2: The vertical axis y represents the physics grades. Step 3: Mark the ordered pairs (x,y): (36,42.8), (43.7,69.1), (55.9,73), (78.4,65.7), (81,91.2), (85.8,86.1), (92,88.8).

➤(b). Physics grades tend to increase as calculus grades increase, which suggests a positive, roughly linear relationship between the two.

Solved Problems

44.1 - Solved Problem 1: A large manufacturing firm needs to study the relationship between yearly capital equipment expenditure and total yearly manufacturing costs. The following table gives the yearly capital equipment expenditure (in thousands of dollars) and the total yearly manufacturing costs (in thousands of dollars) for six years:

(a). Construct a scatter diagram. (b). What does the diagram show about the relationship between the yearly capital equipment costs and yearly total manufacturing costs? Solutions: ➤(a). Step 1: The horizontal axis x represents the yearly capital expenditure. Step 2: The vertical axis y represents the yearly total costs. Step 3: Mark the ordered pairs (x,y): (18.7,51.25), (32.5,40.20), (5.7,89.30), (7.9,77.60), (21.3,49.60), (13.10,79.90).

➤(b). The total costs decline as capital equipment expenditure increases.

Unsolved Problems with Answers 44.1 - Problem 1: A statistician was hired by a community hospital to study the relationship between male patient's weight and level of cholesterol. A random sample of 10 male patients was taken and the following table is a summary of their weight and cholesterol levels.

(a). Construct a scatter diagram. (b). What does the diagram show about the relationship between the patients' weight and cholesterol level? Answers: ➤(a).

➤(b). From the diagram there appears to be no relationship between the patients' weight and level of cholesterol. ⇑ Refer back to 44.1 - Example 1 & 44.1 - Solved Problem 1.

44.2 - Regression Analysis Using Straight Lines.

To determine the form of a dependent relationship between two sets of data x, y, we will assume that the relationship between the x and y values of the scatter diagram is approximately linear. Under this assumption we estimate the value of y for each value of x by fitting a straight line to the scatter diagram. Since there are infinitely many possible lines, we select the straight line that minimizes the sum of the squared vertical differences between the y values of the scatter diagram and the y values of the line. Such a line is called the least-squares line. The equation for the least-squares line is y* = mx + b where m is the slope of the line, b is the y-intercept, y* are the values of the line used to estimate the given y values of the scatter diagram, and x are the values of the scatter diagram corresponding to the horizontal axis. To find m and b we need to complete the following table for each value of x and y from the scatter diagram:

From these totals we have the formulas:

N = the total pairs of numbers (x,y) in the scattered diagram. Let us fit a least-square straight line to Ms. Pool's scattered diagram data:

Step 1: Completing the table:

Step 2: N = 10 (number of pairs)

Step 3: The equation is y* = mx + b = 9.95x - 168.22. Step 4: The graph of this equation on the scatter diagram is:

44.2 - Example 1: From 44.1 - Example 1, (a). find the equation for the least-squares line. (b). Plot the least-squares line on the scattered diagram. (c). This line can be used to estimate values of y for given x values. If a student received a grade of 70 in calculus, what is the expected physics grade? Solutions: ➤(a). Step 1: We start with the data from the scattered diagram:

Step 2: Next, compute the values for the table:

Step 3: N = 7 (number of pairs)

Step 4: The equation is y* = mx + b = 0.63x + 31.46. ➤(b). The graph of this equation on the scattered diagram is:

➤(c). For x = 70, y* = mx + b = 0.63x + 31.46 = 0.63(70) + 31.46 = 75.56 which is the estimated grade in physics.

Solved Problems 44.2 - Solved Problem 1: From 44.1- Solved Problem 1, (a). find the equation for the least-squares line. (b). Plot the least-squares line on the scattered diagram. (c). This line can be used to estimate values of y for given x values. If for a given year capital expenditure is 10 million dollars, what is the expected total manufacturing costs? Solutions: ➤(a). Step 1: We start with the data from the scattered diagram:

Step 2: Next, compute the values for the table:

Step 3: N = 6 (number of pairs)

Step 4: The equation is y* = mx + b = -1.90x + 96.02. ➤(b). The graph of this equation on the scatter diagram is:

➤(c). For x = 10, y* = mx + b = -1.90x + 96.02 = -1.90(10) + 96.02 = 77.02, or about 77 million dollars, which is the estimated total cost.
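The steps of this solved problem can also be checked numerically. The following Python sketch assumes the usual closed-form least-squares formulas written in terms of the column totals Σx, Σy, Σx2 and Σxy; those formulas and the function name fit_line are assumptions made for illustration, not a transcription of the formulas in the text. The data are the six (x,y) pairs of 44.1 - Solved Problem 1.

```python
def fit_line(pairs):
    """Return (m, b) of the least-squares line y* = m*x + b.

    Assumes the standard formulas built from the column totals.
    """
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


# (capital expenditure, total cost) pairs from 44.1 - Solved Problem 1
data = [(18.7, 51.25), (32.5, 40.20), (5.7, 89.30),
        (7.9, 77.60), (21.3, 49.60), (13.10, 79.90)]

m, b = fit_line(data)
estimate = m * 10 + b  # estimated total cost for x = 10
print(round(m, 2), round(b, 2), round(estimate, 2))
```

Rounded to two decimals, the slope and intercept agree with the line y* = -1.90x + 96.02 found in part (a). The estimate for x = 10 differs from the hand computation only in the second decimal, because the code does not round m and b before substituting.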

Unsolved Problems with Answers 44.2 - Problem 1: From 44.1 - Problem 1, (a). find the equation for the least-squares line. (b). Plot the least-squares line on the scattered diagram. (c). This line can be used to estimate values of y for given x values. If a patient's weight is 200 pounds, what is the expected cholesterol level? Answers: ➤(a). y* = 0.40x + 144.15 ➤(b).

➤(c). 224.15 level. ⇑ Refer back to 44.2 - Example 1 & 44.2 - Solved Problem 1.

44.3 - The Linear Correlation Coefficient. The linear correlation coefficient r measures how closely the least-squares line fits the data (x,y) in the scatter diagram. The formulation of r follows from the variance of the data about the least-squares line, s2y,x, and the variance s2y:

where ȳ is the average of the y values in the scatter diagram and y*k are the estimated values of y computed from the least-squares line. sy,x is called the standard error of the estimate of y on x. Interpretation of s2y,x: The variance s2y,x measures the dispersion of the values in the scatter diagram from the least-squares line. If the least-squares line passes through each of the points in the scatter diagram, then yk = y*k and therefore s2y,x = 0. If there is no relationship between the scatter diagram and the least-squares line, then we have s2y,x = s2y. Interpretation of s2y: The variance s2y measures the dispersion of the y values, independent of the x values.

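Definition of the coefficient of determination: r2 = 1 - s2y,x / s2y is called the coefficient of determination, where 0 ≤ r2 ≤ 1.

Definition of the coefficient of correlation: The value r is called the coefficient of correlation, where r = ±√(1 - s2y,x / s2y) and the sign of r is the sign of the slope m of the least-squares line.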

Important facts about r:
1. -1 ≤ r ≤ 1.
2. If r > 0 then the least-squares line is rising. The x and y data are positively correlated.
3. If r < 0 then the least-squares line is falling. The x and y data are negatively correlated.
4. If r = ±1 then the total variation s2y is completely explained by the least-squares line and we say the correlation is perfect.
5. If r = 0 then none of the total variation is explained by the least-squares line.
6. r2 can be interpreted as the portion of the total variation s2y that is explained by the least-squares line.
7. The coefficient of correlation measures how well the least-squares line fits the scatter diagram. If r is close to ±1, we can assume the x and y values are linearly related.
8. A convenient way to compute r is to complete the table below:

and

Returning again to Ms. Pool's data on floor size and home prices, we complete the following table:

44.3 - Example 1: From 44.1 - Example 1, find r. Solution: The x,y data for this example is given in the table below:

Step 1: Complete the table below:

Solved Problems 44.3 - Solved Problem 1: From 44.1 - Solved Problem 1, find r. Solution: The x,y data for this example is given in the table below:

Step 1: Complete the table below:

Unsolved Problems with Answers 44.3 - Problem 1: From 44.1 - Problem 1, find r. Answer:

r ≈ 0.295. ⇑ Refer back to 44.3 - Example 1 & 44.3 - Solved Problem 1.
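As a numerical companion to this section, the sketch below computes r in two ways for the capital expenditure data of 44.1 - Solved Problem 1: from the two variances s2y,x and s2y, and from the product-moment form. Both variances are assumed here to use the divisor N, and the function name correlation is hypothetical; the text's own computing table may be organized differently.

```python
import math


def correlation(pairs):
    """Return r computed two ways: from the variances and by product moments."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    # least-squares line y* = m*x + b
    m = sxy / sxx
    b = mean_y - m * mean_x

    # variance about the line and variance of y, both with divisor N (assumed)
    s2_yx = sum((y - (m * x + b)) ** 2 for x, y in pairs) / n
    s2_y = syy / n
    r_squared = max(0.0, 1 - s2_yx / s2_y)
    r_from_variances = math.copysign(math.sqrt(r_squared), m)  # sign of the slope

    r_product_moment = sxy / math.sqrt(sxx * syy)
    return r_from_variances, r_product_moment


data = [(18.7, 51.25), (32.5, 40.20), (5.7, 89.30),
        (7.9, 77.60), (21.3, 49.60), (13.10, 79.90)]
r_var, r_pm = correlation(data)
print(round(r_var, 3), round(r_pm, 3))
```

The two values coincide, and r is negative because the least-squares line for this data is falling.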

Supplementary Problems Assume X and Y are two discrete random variables. We define the joint probability distribution of X and Y by f(xj,yk) = P(X = xj, Y = yk) = P[(X = xj)∩(Y = yk)] for j = 1,2, …,m and k = 1,2,…,n The following table is called a joint probability distribution table:

Assume three cards are drawn at random from an ordinary deck of cards. Let X be the random variable that equals the number of kings drawn and Y the random variable that equals the number of queens drawn. 1. Complete a joint probability distribution table. 2. Show that f1(xk) = P(X = k) for k = 0,1,2,3. For discrete random variables X and Y we define two important variances: • Variance of the random variable X:

where μx is the mean E(X).

• Covariance of the random variables X and Y:

3. For problem 1, find the variances of X and Y and their Covariance. 4. We measure correlation between two random variables by the formula

For problem 1, find ρ. Many different curves can be found to fit a scatter diagram. The least-square parabola is commonly used to fit data when it is believed that the x,y values are not linearly related. The following equation is the least square parabola: y* = a + bx + cx2 where a,b,c are determined from the solution of the following three equations: B = Na + Ab + Cc D = Aa + Cb + Ec F = Ca + Eb + Gc, where A,B,C,D,E,F,G can be computed by completing the following table:

5. Find the least-square parabola that best fits the scatter diagram in 44.1 Example 1.

6. To measure the goodness of fit of the least-square parabola, we use the formula:

where

For problem 5, find ρ.

Statistical Inference Theory Lesson 45 Correlation & Regression Analysis II

From Lesson 44, we can assume that the data collected is a random sample taken from a large population. In our first example (Lesson 44), Ms. Pool selected only ten homes to analyze. We can, however, treat these ten homes as a random sample of all houses for sale in her community. Since it is not feasible to collect data pertaining to the entire population, we need to use the sample to make certain inferences about the dependence relationship within the whole population. We will assume the following:
yp = α + βx represents the least-squares line for the population.
ys = a + bx represents the least-squares line for the sample.
ρ is the correlation coefficient of the population.
r is the correlation coefficient of the sample.

45.1 - Estimating yp. Assume we wish to study, for a given population, the dependence relationship between two sets of data x,y. By taking a sample we can easily derive ys = a + bx, which represents the least-squares line for the sample. Assume that we would like to get an estimate of yp, which is the predicted value of the least-squares line of the population for a given value of x. For example, in the residential real estate sales of Lesson 44, a floor size of 3,500 square feet corresponds to x = 35, and the least-squares equation gives ys = m(35) + b = 9.95(35) - 168.22 = 180.03. From this sample, the predicted selling price is $180,030 for houses that have 3,500 square feet. If the y data from the entire population is normally distributed then the following will give us a confidence interval estimate of the predicted selling price of the entire population:

where t has a Student's distribution with d = N - 2 degrees of freedom, N is the sample size, x̄ is the mean of the x values of the sample, sy,x is the standard error of the estimate (see Lesson 44), and s2x is the variance of the x values. 45.1 - Example 1: A statistician was hired by a community hospital to study the relationship between male patients’ weight and level of cholesterol. A random sample of 10 male patients was taken and the following table is a summary of their weight and cholesterol levels.

The least-squares line is ys = 0.40x + 144.15. For x = 250 pounds, find a 95% confidence interval estimate of the patient's projected cholesterol level yp. Solution: Step 1: The number of degrees of freedom is d = 10 - 2 = 8. Step 2: From the Student distribution table, t0.475 = 2.306. Step 3: To compute sy,x complete the following table:

Step 4: Step 5: For x = 250, ys = 0.40(250) + 144.15 = 244.15 Step 6: Step 7: Step 8: Given the above confidence interval,

Step 9: The above simplifies to approximately 147.88 ≤ yp ≤ 340.42

Solved Problem

45.1 - Solved Problem 1: Ms. Pool sells residential real estate. To see if there is a relationship between the floor size of a house and its selling price, she takes a random sample from all houses sold in her local area in 1995. The data in the following table is a result of this survey:

The least-squares line is ys = 9.95x - 168.22. For x = 35, find a 90% confidence interval estimate of the projected selling price yp. Solution: Step 1: The number of degrees of freedom is d = 10 - 2 = 8. Step 2: From the Student distribution table D, t0.45 = 1.86. Step 3: To compute sy,x complete the following table:

Step 4: Step 5: For x = 35, ys = 9.95(35) - 168.22 = 180.03 Step 6: Step 7: Step 8: Given the above confidence interval,

Step 9: The above simplifies to 109.51 ≤ yp ≤ 250.55.

Unsolved Problems with Answers 45.1 - Problem 1: Mr. Jones teaches courses in calculus and physics. At the end of the semester he compared the final grades of seven students that were enrolled in both of his classes. The data in the following table is the grades he collected:

The least-square line is ys = 0.63x + 31.46. For x = 75, find a 90% confidence interval estimate of the projected physics grade yp. Answer: 54.25 ≤ yp ≤ 103.17 ⇑ Refer back to 45.1 - Example 1 & 45.1 - Solved Problem 1.
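The interval estimates of this section can be sketched in code. The version below is an assumption: it uses the standard prediction-interval form ys ± t·s·√(1 + 1/N + (x0 - x̄)²/Σ(x - x̄)²), with s the residual standard error on N - 2 degrees of freedom, which may be arranged differently from the formula in the text. The function name, the sample data, and the choice x0 = 4.5 are hypothetical; the critical value t is read from Table D and passed in.

```python
import math


def prediction_interval(pairs, x0, t_crit):
    """Return (low, y0, high) for the predicted y at x0.

    t_crit is the Student t value for N - 2 degrees of freedom,
    looked up in a t table by the caller.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    b = sxy / sxx            # slope of ys = a + b*x
    a = mean_y - b * mean_x  # intercept

    # residual standard error with N - 2 degrees of freedom (assumed form)
    sse = sum((y - (a + b * x)) ** 2 for x, y in pairs)
    s = math.sqrt(sse / (n - 2))

    y0 = a + b * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return y0 - half_width, y0, y0 + half_width


# hypothetical sample of six (x, y) pairs; t = 2.776 is the 95% value for d = 4
sample = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 11.9)]
low, y0, high = prediction_interval(sample, x0=4.5, t_crit=2.776)
print(round(low, 2), round(y0, 2), round(high, 2))
```

The interval is always centered at ys, which is why the midpoint of each interval quoted in this section equals the value of the least-squares line at the given x.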

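45.2 - Hypothesis Testing for ρ. To test a hypothesis about ρ, we have two cases:

Case 1: Ho: ρ = 0. The distribution of t = r√(n - 2) / √(1 - r2) is a Student distribution with n - 2 degrees of freedom.

Case 2: Ho: ρ = ρ0, where ρ0 ≠ 0. The distribution of R = (1/2) ln[(1 + r)/(1 - r)] is approximately normal with mean μR = (1/2) ln[(1 + ρ0)/(1 - ρ0)] and standard deviation σR = 1/√(n - 3).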

45.2 - Example 1: Modern portfolio investment theory advocates that the optimal strategy for an investor is to purchase stocks in companies that have a low correlation between them. Mr. Smith is interested in purchasing stocks in two different companies. Over a 24 month period the closing monthly prices showed a correlation of r = 0.22. (a). State Ho and Ha. (b). Using a 5% significance level, would Mr. Smith conclude that the two stocks have a correlation significantly different from zero? Solutions: ➤(a). Ho: ρ = 0 Ha: ρ ≠ 0 ➤(b). Step 1: The degree of freedom is d = 24 - 2 = 22. Step 2: Since we have a two-tail test, t.475 = 2.074. Step 3: Step 4: Since 1.06 < 2.074, we cannot reject Ho. Therefore, Mr. Smith could conclude that the two stocks do not have a correlation significantly different from zero.

45.2 - Example 2: In studying the price movements of corn and cattle, Mrs. Jones concludes that the correlation between these two commodity prices is at least 0.50. To test this hypothesis, a study of these monthly prices over a 48 month period resulted in a correlation coefficient r = 0.41. (a). State Ho and Ha. (b). Using a 5% significance level, can Mrs. Jones conclude that the data supports her hypothesis? ➤(a). Ho: ρ = 0.50 Ha: ρ < 0.50 ➤(b). Since Ho: ρ = 0.50, we follow case 2. Step 1: Step 2: Step 3: Step 4: Step 5: Since R and z are normally distributed, and we are using a 5% significance level, the normal distribution table gives z = -1.64. Step 6: Since -0.73 > -1.64, Mrs. Jones has no statistical basis for rejecting her assumption that ρ ≥ 0.50.

Solved Problems 45.2 - Solved Problem 1: In studying the stock price movement and volume of a well known company, Mr. Allen believes that there is a strong correlation between the monthly volume and the change in the price of the company's stock. His conjecture is that the change in volume and price move in the same direction. He decides to test his idea using correlation analysis. Over a 24 month period he compared the monthly volume and the stock prices. From this data he computed r = 0.33. (a). State Ho and Ha. (b). Using a 1% significance level, would Mr. Allen conclude that there is a strong correlation? Solutions: ➤(a). Ho: ρ = 0 Ha: ρ > 0 ➤(b). Step 1: The degree of freedom is d = 24 - 2 = 22. Step 2: Since we have a one-tail test, we have t.49 = 2.508. Step 3: Step 4: Since 1.64 < 2.508, we cannot reject Ho. Therefore, Mr. Allen cannot conclude there is a strong relationship between the movement of stock prices and volume.

45.2 - Solved Problem 2: The mathematics department at a large university recently conducted a study to determine if there has been a significant change in the correlation between the final grades of students who took both first and second semester calculus. Past records have shown ρ = 0.78. A sample of 200 students' grades was taken. From this data r = 0.69. (a). State Ho and Ha. (b). Using a 10% significance level, can we conclude there has been a change in the relationship between grades? ➤(a). Ho: ρ = 0.78 Ha: ρ ≠ 0.78

➤(b). Since Ho: ρ = 0.78, we follow case 2. Step 1: Step 2: Step 3: Step 4: Step 5: Since R and z are normally distributed, and we are using a 10% significance level with a two-tail test, the normal distribution table gives z = ±1.64. Step 6: Since z ≈ -2.77 < -1.64, Ho is rejected and we conclude there has been a significant change in the relationship between grades.

Unsolved Problems with Answers 45.2 - Problem 1: A study of climate changes between the north and south poles was recently completed. Using data over a 20 year period, a value of r = 0.20 was computed. (a). State Ho and Ha. (b). Using a 5% significance level, would one conclude there is a significant relationship? Answers: ➤(a). Ho: ρ = 0 Ha: ρ ≠ 0 ➤(b). Since t = 0.87 < 2.101, we have no statistical basis for rejecting Ho. ⇑ Refer back to 45.2 - Example 1 & 45.2 - Solved Problem 1.

45.2 - Problem 2: Studies have shown that the correlation between the amount of feed chickens consume and the number of eggs laid per week is ρ = 0.80. Using a new brand of feed, 100 randomly selected chickens were studied for their production of eggs. From this study, r = 0.77. (a). State Ho and Ha. (b). Using a 5% significance level, can we conclude there has been a decrease in the number of eggs laid? Answers: ➤(a). Ho: ρ = 0.80 Ha: ρ < 0.80 ➤(b). Since the computed z is greater than -1.64, Ho would not be rejected. We could conclude that there is no significant statistical evidence that there has been a decrease in the production of eggs. ⇑ Refer back to 45.2 - Example 2 & 45.2 - Solved Problem 2.
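The two cases of this section can be sketched as two small functions. The forms used are assumptions based on the standard tests: t = r√(n - 2)/√(1 - r2) for Ho: ρ = 0, and the logarithmic transformation (1/2) ln[(1 + r)/(1 - r)] with standard deviation 1/√(n - 3) for a nonzero hypothesized ρ. The function names are hypothetical; the inputs come from 45.2 - Example 1 and 45.2 - Example 2.

```python
import math


def t_statistic(r, n):
    """Case 1 (Ho: rho = 0): Student t with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def z_statistic(r, rho0, n):
    """Case 2 (Ho: rho = rho0, rho0 nonzero): approximately standard normal."""
    transformed = 0.5 * math.log((1 + r) / (1 - r))
    mean = 0.5 * math.log((1 + rho0) / (1 - rho0))
    sigma = 1 / math.sqrt(n - 3)
    return (transformed - mean) / sigma


t = t_statistic(0.22, 24)        # 45.2 - Example 1
z = z_statistic(0.41, 0.50, 48)  # 45.2 - Example 2
print(round(t, 2), round(z, 2))
```

The t value rounds to the 1.06 used in Example 1. For Example 2 the unrounded computation gives z close to -0.76; the -0.73 in the text appears to result from rounding the intermediate quantities to two decimals, and the decision is the same either way.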

Supplementary Problems Assume ρ ≠ 0. 1. Find a general formula for a confidence interval for ρ. 2. For 45.2 - Example 2, find a 95% confidence interval for ρ. Assume we have two populations. From each population we take a sample and compute the sample correlation coefficients r1 and r2 respectively. The distributions

are normally distributed where

and N1 and N2 are the sample sizes respectively. A research institute recently did a study to see if there is a significant difference between men and women according to their respective correlations of weight and cholesterol. They sampled N1 = 200 women and N2 = 100 men and found r1 = 0.45 and r2 = 0.58. 3. State Ho and Ha. 4. Using a level of significance of α = 0.05, what conclusion would you come to? 5. For a given value of x, we define μxy to be the mean value of all y values. For example, in the residential real estate sales of Lesson 44, for x = 3,500 square feet, μxy would be the average sales price for all homes in Ms. Pool’s community that have a floor size of 3,500 square feet. Since the data x,y from the scatter diagram is a sample taken from a population, we can assume that for each value of x in the sample, ys = a + bx can represent the mean value of the sample y for a given x value. To estimate μxy, we use the following: if the y data from the entire population is normal, then

has a Student distribution with N - 2 degrees of freedom. For 45.1 - Example 1 in this lesson, find the 99% confidence interval for μxy at x = 250 pounds.

Statistical Inference Theory Lesson 46 Non-parametric Statistics

Throughout this book, when studying populations, we frequently need information about the distribution, mean and variance of the populations from which we draw our samples. There are times when such information is not available or not desired. Under such circumstances we use statistical tests called non-parametric tests. These apply when the following hold: 1. The distributions of the populations are not known. 2. The values or estimates of μ and σ2 are not given or desired. 3. The general relationship between populations needs to be determined. The following sections cover the main non-parametric tests. Each of these tests is explained through examples and problems.

46.1-The Sign Test

The sign test can be used to decide if two populations are essentially the same by comparing the resulting samples taken from each of the populations. 46.1 - Example 1: A computer software company writes business application programs. Before a program is sold to the public, they have two computer programmers independently test it in order to detect and correct errors in each program. In January, 1995 the company hired two programmers, working independently, to find errors in ten programs. The company wants to see if there is a significant difference in the accuracy of these two programmers. The following table is the number of errors found by each programmer for each of the 10 programs:

(a). State Ho and Ha. (b). Using α = 0.05, do you conclude there is a significant difference in the performance of these two programmers? Solutions: ➤(a). Ho: p = 0.5, there is no significant difference in the performance of these two programmers. Ha: p ≠ 0.5, there is a significant difference in the performance of these two programmers. ➤(b). Rules for using the sign test: Rule 1: Subtract one row from another and record the number of + and - signs. Eliminate the zeros, if any. Rule 2: Assume Ho is true and apply the binomial distribution. Step 1: Following rule 1 we get,

Step 2: Since we have two zeros, we assume N = 10 - 2 = 8. Step 3: The number of + signs is N+ = 5 and the number of minus signs is N- = 3. Step 4: The following is the cumulative binomial distribution for N = 8 and p = 0.50.

Step 5: Since we have a two-sided test, we use α/2 = 0.025.

Since P{N+ ≥ 5} = 0.3633 > 0.025, we have no statistical significance that would allow us to reject Ho. Therefore, we conclude that the study does not show there is a difference in the two programmers’ performance.

Solved Problems 46.1 - Solved Problem 1: A major petroleum company tested a new gasoline additive for automobiles to determine if the additive significantly improves mileage over their standard additive. The following table is the mileage resulting from driving 6 automobiles of the same model, each having only one gallon of gasoline:

(a). State Ho and Ha. (b). Using α = 0.05, do you conclude there is a significant difference in the performance of these two additives? Solutions: ➤(a). Ho: p = 0.5, there is no significant difference in the performance of these additives. Ha: p > 0.5, the new additive significantly increases gasoline mileage. ➤(b). Rules for using the sign test: Rule 1: Subtract one row from another and record the number of + and - signs. Eliminate the zeros, if any. Rule 2: Assume Ho is true, and apply the binomial distribution. Step 1: Following rule 1 we get,

Step 2: The number of + signs is N+ = 4 and the number of minus signs is N- = 2. Step 3: Assuming Ho is true, we should expect half the cars to show improved gas mileage from the new additive. Step 4: Rule 2 assumes that the distribution of signs is binomial. Since we assume no significant difference, p = 0.50. Step 5: The following is the cumulative binomial distribution for N = 6 and p = 0.50.

Step 6: Since we have a one-sided test, we use α = 0.05. Since P{N+ ≥ 4} = 0.3438 > 0.05, we cannot reject Ho and conclude there is no statistically significant evidence that the additive improved mileage performance.

Unsolved Problems with Answers 46.1 - Problem 1: An agency of the United States Department of Agriculture claims that the average retail price of roast beef in Orange County is less than $2.15 a pound. To test this claim 12 supermarkets were randomly selected in the Orange County area. The following table is the result of this survey:

(a). State Ho and Ha. (b). Using α = 0.01, would you reject the claim that in Orange County the average retail price of roast beef is less than $2.15 a pound? Answers: ➤(a). Let p equal the proportion of supermarkets that charge less than $2.15 a pound. Ho: p ≤ 0.50 Ha: p > 0.50 ➤(b). We have no statistical basis for accepting the Agency’s claim. ⇑ Refer back to 46.1 - Example 1 & 46.1 - Solved Problem 1.
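The two rules of the sign test translate directly into a short computation. The sketch below counts the signs of paired differences and evaluates the upper binomial tail with p = 0.50; the function names and the small paired data set are hypothetical, while the two tail probabilities are those needed in 46.1 - Example 1 (N = 8, N+ = 5) and 46.1 - Solved Problem 1 (N = 6, N+ = 4).

```python
from math import comb


def sign_counts(first, second):
    """Rule 1: count + and - signs of first - second, dropping zeros."""
    diffs = [a - b for a, b in zip(first, second)]
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)
    return plus, minus


def upper_tail(n, k, p=0.5):
    """Rule 2: P{N+ >= k} for a binomial distribution with n trials."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


# hypothetical paired observations, for illustration only
plus, minus = sign_counts([3, 5, 2, 4], [1, 5, 4, 1])
print(plus, minus)

print(round(upper_tail(8, 5), 4))  # tail used in 46.1 - Example 1
print(round(upper_tail(6, 4), 4))  # tail used in 46.1 - Solved Problem 1
```

The two tail probabilities reproduce the values 0.3633 and 0.3438 read from the cumulative binomial tables in this section.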

46.2 - The Mann-Whitney U Test. Assume samples are taken respectively from two populations. The Mann-Whitney test allows us to decide, by ranking the samples, if there is a significant difference between the two populations. The following rules should be followed:
Rule 1: Combine the data from both samples in ascending order.
Rule 2: Rank the data in ascending order. If a tie exists, average the ranking for the numbers in the tie.
Rule 3: Compute the sum of the ranking for each sample.
Rule 4: Compute U = N1N2 + N1(N1 + 1)/2 - R1, where N1 is the size of sample 1, N2 is the size of sample 2, and R1 is the sum of the ranks for sample 1.
If N1 and N2 are both at least 8, the distribution of U is approximately normal with mean μ = N1N2/2 and variance σ2 = N1N2(N1 + N2 + 1)/12.

46.2 - Example 1: Ms. Rogers teaches third year Latin. The following table is a list of the final grades for both boys and girls.

(a). State Ho and Ha. (b). Compute U for the final grades for male students. (c). For α = 0.10, what decision would you come to? Solutions: ➤(a). Ho: There is no significant difference between the final grades of the male and female students. Ha: There is a significant difference between the final grades of the male and female students. ➤(b). We will apply the four rules above. Step 1: Using rule 1 we have

Step 2: Using rule 2 we have

Step 3: Using rule 3, we have

Step 4: Since N1 = 8, N2 = 10, R1 = 82.5,

➤(c). Step 1: Step 2: Step 3: Since U is approximately normal, we use z = (U - μ)/σ, which is normal with mean 0 and variance 1. Step 4: Step 5: Since we have a two-sided test, α/2 = 0.05. Therefore, the corresponding z value is z = -1.64. Step 6: Since -0.57 > -1.64, we cannot reject Ho and conclude there is no significant difference between the final grades of the males and females in the Latin class.

Solved Problems 46.2 - Solved Problem 1: A professional baseball team is considering a new type of bat for future games. To see if this bat is significantly superior to their regular bat, they use a pitching machine to pitch 100 fast balls to each of ten players. The following table is the number of home runs:

(a). State Ho and Ha. (b). Compute U for number of home runs using the regular bat. (c). For α = 0.05, what decision would you come to? Solutions: ➤(a). Ho: There is no significant difference between the two bats. Ha: The new bat is significantly better. ➤(b). We will apply the four rules above. Step 1: Using rule 1 we have

Step 2: Using rule 2 we have

Step 3: Using rule 3, we have

Step 4: Since N1 = 10, N2 = 10, R1 = 87,

➤(c). Step 1:

Step 2: Step 3: Since U is approximately normal, we use z = (U - μ)/σ, which is normal with mean zero and variance 1. Step 4: Step 5: Since we have a one-sided test, α = 0.05. Therefore, the corresponding z value is z = 1.64. Step 6: Since 1.36 < 1.64, we cannot reject Ho and conclude there is no significant difference between the two bats.

Unsolved Problems with Solutions 46.2 - Problem 1: A local high school basketball team wants to compare their team's final scores on their home court against the team's final scores away from home. They believe that the team does better on their home court. From the team's records the coach randomly selected 10 games played on the home court and 10 games played away from home:

(a). State Ho and Ha. (b). Compute U for final scores played at home. (c). For α = 0.05, what decision would you come to? Answers: ➤(a). Ho: There is no difference in the team's total scores between playing at home or away. Ha: The team has greater scores when playing at home. ➤(b). U = 23 ➤(c). We reject Ho and conclude there is a significant difference between the team's scores at home and away. ⇑ Refer back to 46.2 - Example 1 & 46.2 - Solved Problem 1.
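The four rules of the Mann-Whitney test can be sketched as follows. The statistic is assumed to take the form U = N1N2 + N1(N1 + 1)/2 - R1 with mean N1N2/2 and variance N1N2(N1 + N2 + 1)/12; this form is an assumption, chosen because it reproduces the z values quoted in 46.2 - Example 1 and 46.2 - Solved Problem 1. The function names and the six-value illustration are hypothetical.

```python
import math


def average_ranks(values):
    """Rules 1 and 2: ascending ranks, tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def u_and_z(n1, n2, r1):
    """Rule 4: U from the rank sum of sample 1, and its normal approximation."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mu) / sigma


def mann_whitney(sample1, sample2):
    """Rules 1 to 4 applied to two raw samples."""
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:len(sample1)])  # Rule 3: rank sum of sample 1
    return u_and_z(len(sample1), len(sample2), r1)


print(u_and_z(8, 10, 82.5))   # summary values of 46.2 - Example 1
print(u_and_z(10, 10, 87))    # summary values of 46.2 - Solved Problem 1
print(mann_whitney([1, 2, 3], [4, 5, 6]))  # hypothetical data
```

Note that the normal approximation is only recommended when both samples have at least 8 values, so the last call illustrates the mechanics of U rather than a valid test.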

46.3 - The Kruskal-Wallis H Test. The Kruskal-Wallis H test is a generalization of the Mann-Whitney U test. This test allows one to compare three or more populations. Assume we take samples from k populations, where k ≥ 3. The statistic H = [12/(N(N + 1))](R1²/N1 + R2²/N2 + … + Rk²/Nk) - 3(N + 1) has approximately a chi-square distribution with k - 1 degrees of freedom, under the restriction that each Nj ≥ 5, where Nj is the size of the sample taken from the jth population, N = N1 + N2 + … + Nk, and Rj is the sum of the ranks for the jth sample. The ranking and classification for each sample is done exactly as in the Mann-Whitney U test.

46.3 - Example 1: Assume the track coach of a local high school wishes to test, among three brands of running shoes, the best performing shoes. He decides to select 15 track runners to run the 100 yard dash. In this race, each brand is worn by five runners. The following table gives the timing outcome of the race for each brand.

(a). State Ho and Ha (b). Compute H. (c). For α = 0.05 would you conclude there is a significant difference in performance due to the brand of running shoes used? Solutions: ➤(a). Ho: There is no difference in performance due to shoe brands. Ha: There is a difference in performance due to shoe brands. ➤(b). Step 1: From the above table, combine and rank the data:

Step 2: Apply this ranking for each brand:

Step 3: Compute H from the formula: N = N1 + N2 + N3 = 5 + 5 + 5 = 15

➤(c). Step 1: The degrees of freedom is k - 1 = 3 - 1 = 2. Step 2: For α = 0.05, the chi-square table gives us χ2 = 5.99. Step 3: Since H = 2.36 < 5.99, at the α = 0.05 significance level we cannot conclude there is a significant difference in the running performance due to shoe brands.

Solved Problems 46.3 - Solved Problem 1: A large petroleum company wishes to test five new gasoline additives for increased fuel efficiency. Their research department purchased 35 new model sedans and drove each car 100 miles over the same track. Each additive was mixed with the gasoline of seven sedans. The following table is the mileage recorded for each car in this test. Here, mileage is measured for each car as to the number of gallons consumed to travel 100 miles.

(a). State Ho and Ha (b). Compute H. (c). For α = 0.05 would you conclude there is a significant difference in performance due to the additive used? Solutions: ➤(a). Ho: There is no difference in performance due to the additive. Ha : There is a difference in performance due to the additive. ➤(b). Step 1: From the above table, combine and rank the data:

Step 2: Apply this ranking for each additive:

Step 3: Compute H from the formula: N = N1 + N2 + N3 + N4 + N5 = 7 + 7 + 7 + 7 + 7= 35

➤(c). Step 1: The degrees of freedom is 5 - 1 = 4. Step 2: For α = 0.05, the chi-square table gives us χ2 = 9.49. Step 3: Since H = 13.79 > 9.49, at the α = 0.05 significance level we can conclude there is a significant difference in the mileage performance due to different gasoline additives.

Unsolved Problems with Answers 46.3 - Problem 1: A medical research laboratory wishes to test if there is a difference between three different drugs that promote weight loss for women over 200 pounds. The laboratory randomly divides fifteen overweight women into three equal groups. Each group takes only one of the drugs. The following table is the resulting weight loss (in pounds) after 60 days:

(a). State Ho and Ha. (b). Compute H. (c). For α = 0.01 would you conclude there is a significant difference in weight loss due to the drug used? Answers: ➤(a). Ho: There is no significant difference in weight loss due to different drugs. Ha: There is a significant difference in weight loss due to different drugs. ➤(b). H = 6.62 ➤(c). Since H = 6.62 < 9.21, there is no significant difference in weight loss due to different drugs. ⇑ Refer back to 46.3 - Example 1 & 46.3 - Solved Problem 1.
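The computation of H can be sketched in a few lines. The code assumes the standard Kruskal-Wallis statistic H = [12/(N(N + 1))] Σ Rj²/Nj - 3(N + 1) with average ranks for ties and no further tie correction; the function name and the running times are hypothetical and are not the data of the examples above.

```python
def kruskal_wallis_h(samples):
    """Return H for a list of samples (one list of values per population)."""
    pooled = sorted(value for sample in samples for value in sample)

    # average rank of each distinct value in the pooled, sorted data
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1

    n = len(pooled)
    # sum over samples of (rank sum squared) / (sample size)
    total = sum(sum(rank_of[v] for v in sample) ** 2 / len(sample)
                for sample in samples)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# hypothetical 100 yard dash times for three shoe brands, five runners each
brand_a = [10.2, 10.5, 10.1, 10.8, 10.4]
brand_b = [10.3, 10.6, 10.9, 10.7, 11.0]
brand_c = [10.0, 11.2, 10.95, 11.1, 11.3]
h = kruskal_wallis_h([brand_a, brand_b, brand_c])
print(round(h, 2))
```

The resulting H is then compared with the chi-square value for k - 1 degrees of freedom from Table F, exactly as in part (c) of the examples.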

46.4 - Spearman's Rank Correlation. The following formula allows us to compute the correlation between two sets of ranked data x and y: r = 1 - 6(D1² + D2² + … + DN²)/(N(N² - 1)), where Dj is the difference between the ranks of the jth pair and N is the number of pairs of values x,y.

46.4 - Example 1: Ms. Pool sells residential real estate. To see if there is a relationship between the floor size of a house and its selling price, she takes a random sample from all houses sold in her local area in 1995. The data in the following table is a result of this survey:

Find the rank correlation r. Solution: Step 1: List the floor sizes in ascending order and rank:

Step 2: List the selling price in ascending order and rank:

Step 3: We now place the ranking in the above table and compute D and D2:

Step 4: Since N = 10,

Solved Problems 46.4 - Solved Problem 1: Mr. Jones teaches courses in calculus and physics. At the end of the semester he compared the final grades of seven students that were enrolled in both of his classes. The data in the following table is the grades he collected:

Compute r. Solution: Step 1: List the calculus grades in ascending order and rank:

Step 2: List the physics grades in ascending order and rank:

Step 3: We now place the ranking in the above table and compute D and D2:

Step 4: Since N = 7,

Unsolved Problem with Answer 46.4 - Problem 1: A statistician was hired by a community hospital to study the relationship between male patient's weight and level of cholesterol. A random sample of 10 male patients was taken and the following table is a summary of their weight and cholesterol levels:

Compute r. Answer: r ≈ 0.42 ⇑ Refer back to 46.4 - Example 1 & 46.4 - Solved Problem 1.
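The four steps used in this section (rank x, rank y, form D and D2, apply the formula) can be sketched as below. The code assumes the usual form r = 1 - 6ΣD²/(N(N² - 1)) with average ranks for ties; the function names and the five data pairs are hypothetical.

```python
def ascending_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + j) / 2 + 1
        i = j + 1
    return [rank_of[v] for v in values]


def spearman(xs, ys):
    """Rank correlation from the squared rank differences D^2."""
    rank_x = ascending_ranks(xs)
    rank_y = ascending_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# hypothetical paired data: two neighbouring pairs of ranks are swapped
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(spearman(xs, ys))
```

For this illustration the rank differences are 1, 1, 1, 1 and 0, so ΣD² = 4 and r = 1 - 24/120 = 0.8.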

Supplementary Problems

1. Assume the data x,y are related as follows:

For this relationship, compute the Spearman's rank correlation. 2. A coin is tossed 20 times with the following sequence of heads(h) and tails(t): t h h t h h t t t h t h t h h h t t t h. This sequence can be separated into groups of heads and tails: t hh t hh ttt h t h t hhh ttt h Each such group is called a run. For this sequence we have 12 runs. For the theory of runs, we have the following important results: Assume we have a sequence of random trials where each trial results in two possible outcomes say, a,b. Let N1 be the number of a’s, N2 the number of b’s, and R the number of runs. If N1, N2 > 7 then the distribution of R is approximately normally distributed where

For the above sequence compute μ and σ2. 3. A computer chip manufacturer has a machine that produces a new type of computer chip. The company estimates that on average 5% of the chips are defective. To test for production stability, 200 chips are sampled in the order they are manufactured. a. Compute μR and σR. b. State Ho and Ha. c. Using a significant level of α = 0.05, state a decision to test if the production is stable.

d. On a given day a sample of 200 chips is taken. Assume the system is not stable and 8% of the chips are defective. Find the probability that we conclude the system is stable (a Type II error). e. If the system is stable, find the probability that a sample will result in at least 25 runs.
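The idea of a run in Supplementary Problem 2 can be checked with a short Python sketch. Counting runs follows directly from the definition in the problem. The function runs_mean_variance is an assumption on our part: it uses the standard expressions for the mean and variance of R from the runs test, μR = 2N1N2/(N1 + N2) + 1 and σR² = 2N1N2(2N1N2 - N1 - N2)/((N1 + N2)²(N1 + N2 - 1)). The function names are illustrative.

```python
from itertools import groupby


def count_runs(sequence):
    """A run is a maximal block of identical consecutive outcomes."""
    return sum(1 for _ in groupby(sequence))


def runs_mean_variance(n1, n2):
    """Mean and variance of the number of runs R (standard runs-test
    expressions; assumed here, see the lead-in)."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return mean, variance


# The coin-toss sequence from Supplementary Problem 2.
tosses = "t h h t h h t t t h t h t h h h t t t h".split()
n_heads = tosses.count("h")
n_tails = tosses.count("t")

print(count_runs(tosses))  # 12, the number of runs stated in the problem
mu_r, var_r = runs_mean_variance(n_heads, n_tails)
```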

Table C - Standard Normal Distribution

Table D The t-Distribution table

Table E F - Distribution (α = 0.05) (d2 degrees of freedom in the numerator) (d1 degrees of freedom in the denominator)

Table E F - Distribution (α = 0.01) (d2 degrees of freedom in the numerator) (d1 degrees of freedom in the denominator)

Table F - The Chi-Square Distribution

APPENDIX A Descriptive Statistics Review Frequency Distributions

1.1 - What is a Frequency Distribution Table and Histogram? It is difficult to interpret most data in its raw form. One effective way to interpret raw data is to construct a distribution table and histogram where the data is tabulated according to classes. There are two types of classes: (1) individual numeric values and (2) fixed intervals. A histogram is a graphic representation of a distribution table made up of rectangles, where the base of each rectangle represents a numeric or interval class and the height of the rectangle measures the frequency of that class. Example: The following set of raw data is listed:

From this data, we form a distribution table and histogram where the classes are individual values:

Example: The following set of data is listed:

From this data, we form a distribution table and histogram where the classes are the following intervals¹: [0.5,1.5), [1.5,2.5), [2.5,3.5), [3.5,4.5), [4.5,5.5), [5.5,6.5), [6.5,7.5), [7.5,8.5), [8.5,9.5), [9.5,10.5). The interval [0.5,1.5) includes all values from the table greater than or equal to 0.5 but less than 1.5. The interval [1.5,2.5) includes all values from the table greater than or equal to 1.5 but less than 2.5, etc.

To construct a frequency distribution, use the following rules: 1. From the raw data, find the minimum and maximum values. This gives the range of data. 2. Decide on one of two types of classes: single numeric values, or class intervals of a fixed size. Then choose the class values. 1.1 - Example 1: A sample of 30 families in New York City was recently taken. The following data represents the number of children per family:

(a). Construct a frequency distribution table for single value classes. (b). Construct the appropriate histogram. Solutions: ➤ (a). Step 1: Scanning these values, we find the smallest value is 0 and the largest value is 7. Step 2: From the data, we see that one family has no children, one family has one child, ten families have two children, five families have three children, five families have four children, five families have five children, two families have six children and only one family has seven children.

1.1 - Example 2: A survey of hourly wages of fifty employees at a local fast food restaurant resulted in the following data: $6.37, $5.44, $5.29, $6.21, $6.35, $5.86, $5.62, $8.43, $6.85, $7.89 $6.93, $9.27, $4.63, $5.15, $6.50, $5.14, $7.35, $6.21, $5.34, $7.10 $6.77, $5.62, $4.08, $7.10, $6.33, $6.58, $5.98, $5.86, $6.84, $4.06 $5.04, $8.40, $5.93, $4.63, $6.45, $5.20, $5.93, $4.81, $5.99, $4.29 $5.87, $5.11, $6.83, $4.46, $5.34, $6.00, $6.71, $5.09, $5.27, $6.70

(a). Construct a frequency distribution table for the following classes: [3.5,4.5), [4.5,5.5), [5.5,6.5), [6.5,7.5), [7.5,8.5), [8.5,9.5). (b). Construct the appropriate histogram. Solutions: ➤ (a). Scanning the data, there are four employees who earn between $3.50 and $4.49, fourteen employees who earn between $4.50 and $5.49, sixteen employees who earn between $5.50 and $6.49, twelve employees who earn between $6.50 and $7.49, three employees who earn between $7.50 and $8.49 and one employee who earns between $8.50 and $9.49.
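The tally in 1.1 - Example 2 can be reproduced with a few lines of Python. The sketch below counts how many wages fall in each half-open class [low, high) and prints a rough text histogram. The function name class_frequencies is illustrative and not part of the text.

```python
# Hourly wages from 1.1 - Example 2.
wages = [
    6.37, 5.44, 5.29, 6.21, 6.35, 5.86, 5.62, 8.43, 6.85, 7.89,
    6.93, 9.27, 4.63, 5.15, 6.50, 5.14, 7.35, 6.21, 5.34, 7.10,
    6.77, 5.62, 4.08, 7.10, 6.33, 6.58, 5.98, 5.86, 6.84, 4.06,
    5.04, 8.40, 5.93, 4.63, 6.45, 5.20, 5.93, 4.81, 5.99, 4.29,
    5.87, 5.11, 6.83, 4.46, 5.34, 6.00, 6.71, 5.09, 5.27, 6.70,
]

# Class boundaries: [3.5,4.5), [4.5,5.5), ..., [8.5,9.5).
edges = [3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]


def class_frequencies(data, edges):
    """Count the values in each half-open interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts


freq = class_frequencies(wages, edges)

# Frequency table with a text histogram (one * per employee).
for low, high, f in zip(edges, edges[1:], freq):
    print(f"[{low:.2f}, {high:.2f})  {f:2d}  {'*' * f}")
```

The half-open comparison matters for a value such as $6.50, which belongs to [6.5,7.5) and not to [5.5,6.5).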

Solved Problems 1.1 - Solved Problem 1: Ms. Jones recently gave a final examination in Spanish. Twenty five students in her class were surveyed as to the number of hours they studied for the final. The following table is the results of this survey:

(a). Construct a frequency distribution table for single value classes. (b). Construct the appropriate histogram. Solutions: ➤ (a). Step 1: Scanning these values, we find the smallest value is 1 and the largest value is 10. Step 2: From the data, we see that one student studied one hour, two students studied two hours, three students studied three hours, one student studied four hours, three students studied five hours, two students studied six hours, four students studied seven hours, two students studied 8 hours, four students studied nine hours, and three students studied ten hours.

1.1 - Solved Problem 2: The Frozen Foods Company recently developed a new pizza. To check the retail prices the supermarkets are charging for this pizza, it takes a survey of the prices that are charged by 20 supermarkets. The following data was collected: $5.37, $6.11, $4.88, $5.33, $5.54, $5.80, $4.71, $4.85, $5.01, $6.15, $4.78, $5.08, $5.47, $5.89, $6.47, $6.32, $5.77, $6.21, $6.17, $4.52. (a). Construct a frequency distribution table for the following classes: [$4.50,$4.70), [$4.70,$4.90), [$4.90,$5.10), [$5.10,$5.30), [$5.30,$5.50), [$5.50,$5.70), [$5.70,$5.90), [$5.90,$6.10), [$6.10,$6.30), [$6.30,$6.50). (b). Construct the appropriate histogram. Solutions: ➤ (a). Scanning the data, there is one supermarket that charges between $4.50 and $4.69, four supermarkets that charge between $4.70 and $4.89, two supermarkets between $4.90 and $5.09, no supermarkets between $5.10 and $5.29, three supermarkets between $5.30 and $5.49, one supermarket between $5.50 and $5.69, three supermarkets between $5.70 and $5.89, no supermarkets between $5.90 and $6.09, four between $6.10 and $6.29 and two between $6.30 and $6.49.

Unsolved Problems with Answers 1.1 - Problem 1: A die is a six sided cube, where each side is marked with one of the numbers 1 through 6. A pair of these dice is tossed 35 times and the sum of the dice is recorded each time. The following are the numbers recorded:

(a). Construct a frequency distribution table for single value classes. (b). Construct the appropriate histogram. Answers: ➤ (a).

⇑ Refer back to 1.1 - Example 1 & 1.1 - Solved Problem 1

1.1 - Problem 2: Twenty students at a local college tried out for the track team. Each student had to run the one hundred yard dash. The following are the running times, in seconds, for each student: 9.66, 9.41, 9.42, 10.23, 10.57, 10.66, 10.72, 10.74, 9.77, 10.86, 10.39, 10.17, 9.53, 10.74, 11.12, 11.17, 10.54, 10.06, 11.33, 11.00. (a). Construct a frequency distribution table for the following classes: [9.3,9.5), [9.5,9.7), [9.7,9.9), [9.9,10.1), [10.1,10.3), [10.3,10.5), [10.5,10.7), [10.7,10.9), [10.9,11.1), [11.1,11.3), [11.3,11.5). (b). Construct the appropriate histogram. Answers: ➤ (a).

⇑ Refer back to 1.1 - Example 2 & 1.1 - Solved Problem 2

1.2 - Relative-Frequency Distribution The relative-frequency distribution of a class of data is the frequency of the data divided by the total frequency. 1.2 - Example 1: Using the frequency distribution table in 1.1 - Example 1, (a). Construct a relative-frequency distribution. (b). Interpret the meaning of the relative-frequency distribution. Solutions: ➤ (a). Step 1: The frequency distribution is

Step 2: In the above table, divide each number in the second column by 30:

➤ (b). 3% of the families have no children (0.03 = 3%). 3% of the families have one child. 33% of the families have two children. 17% of the families have three children. 17% of the families have four children. 17% of the families have five children. 7% of the families have 6 children. 3% of the families have 7 children.
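The division in Step 2 is easy to check by machine. The following Python sketch takes the frequencies described in 1.1 - Example 1 (1, 1, 10, 5, 5, 5, 2, 1 families with 0 through 7 children) and divides each by the total of 30. The variable names are illustrative.

```python
# Frequencies from 1.1 - Example 1: number of families with 0, 1, ..., 7 children.
children = [0, 1, 2, 3, 4, 5, 6, 7]
frequency = [1, 1, 10, 5, 5, 5, 2, 1]

total = sum(frequency)  # 30 families
relative = [f / total for f in frequency]

# Relative-frequency table: class, frequency, relative frequency, percent.
for c, f, r in zip(children, frequency, relative):
    print(f"{c}  {f:2d}  {r:.2f}  {r:.0%}")
```

The relative frequencies always sum to 1 (up to rounding), which is a useful check on a hand-built table.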

Solved Problems

1.2 - Solved Problem 1: Using the distribution in 1.1 - Solved Problem 1, (a). Construct a relative-frequency distribution. (b). Interpret the meaning of the relative-frequency distribution. Solutions: ➤ (a). Step 1: The frequency distribution is

Step 2: Construct the relative-frequency distribution by dividing each value in the second column by 25:

➤ (b). 4% of the students studied one hour. 8% of the students studied two hours. 12% of the students studied three hours. 4% of the students studied four hours. 12% of the students studied five hours. 8% of the students studied six hours. 16% of the students studied seven hours. 8% of the students studied eight hours. 16% of the students studied nine hours. 12% of the students studied ten hours.

Unsolved Problems with Answers 1.2 - Problem 1: Using the distribution in 1.1 - Example 2, (a). Construct a relative-frequency distribution table. (b). Interpret the meaning of the relative-frequency distribution. Answers: ➤ (a).

➤ (b). 8% of the employees earn between $3.50 and $4.49 an hour. 28% of the employees earn between $4.50 and $5.49 an hour. 32% of the employees earn between $5.50 and $6.49 an hour. 24% of the employees earn between $6.50 and $7.49 an hour. 6% of the employees earn between $7.50 and $8.49 an hour. 2% of the employees earn between $8.50 and $9.49 an hour. ⇑ Refer back to 1.2 - Example 1 & 1.2 - Solved Problem 1

1.3 - Cumulative-Relative-Distribution The Cumulative-Relative-Distribution is the sum of all relative frequencies at and above each line of the relative-distribution table. 1.3 - Example 1: From 1.1 - Example 1, (a). Construct a cumulative relative-distribution table. (b). Interpret this table. Solutions: ➤ (a). Step 1: The distribution table for this example is

Step 2: The relative-frequency distribution table is

Step 3: For each line, sum the numbers at and above in the relative frequency table:

➤ (b). Three percent of the families have no children. Six percent have at most one child. Thirty nine percent have at most two children. Fifty six percent have at most three children. Seventy three percent have at most four children. Ninety percent have at most five children. Ninety seven percent have at most six children. One hundred percent have at most 7 children.
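A running total is all that Step 3 requires. The Python sketch below accumulates the frequencies of 1.1 - Example 1 and divides by the total, so the cumulative relative frequencies are computed without intermediate rounding. Computed this way the percentages are about 3.3, 6.7, 40.0, 56.7, 73.3, 90.0, 96.7 and 100; the slightly smaller figures quoted above (6, 39, 56, 73) are what one gets by adding relative frequencies that were already rounded to two decimals. The variable names are illustrative.

```python
from itertools import accumulate

# Frequencies from 1.1 - Example 1: families with 0, 1, ..., 7 children.
frequency = [1, 1, 10, 5, 5, 5, 2, 1]
total = sum(frequency)

# Cumulative relative frequency: running sum of frequencies divided by the total.
cumulative = [c / total for c in accumulate(frequency)]

for children, c in enumerate(cumulative):
    print(f"at most {children} children: {c:.1%}")
```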

Solved Problems 1.3 - Solved Problem 1: Using the distribution in 1.1 - Solved Problem 1, (a). Construct a cumulative-relative-frequency distribution table. (b). Interpret the meaning of the cumulative-relative-frequency distribution table. Solutions: ➤ (a). Step 1: The frequency distribution is

Step 2: The relative-frequency distribution table is:

Step 3: Sum the values at and above each line of the above table:

➤ (b). Four percent of the students studied one hour. Twelve percent of the students studied at most two hours. Twenty four percent of the students studied at most three hours. Twenty eight percent of the students studied at most four hours. Forty percent of the students studied at most five hours. Forty eight percent of the students studied at most six hours. Sixty four percent of the students studied at most seven hours. Seventy two percent of the students studied at most eight hours. Eighty eight percent of the students studied at most nine hours. One hundred percent of the students studied at most ten hours.

Unsolved Problems with Answers 1.3 - Problem 1: Using the relative distribution in 1.1 - Example 2, (a). Construct a Cumulative-relative-frequency distribution table. (b). Interpret the Cumulative-relative-frequency distribution table. Answers: ➤ (a).

(b). Eight percent earn less than $4.50. Thirty six percent earn less than $5.50. Sixty eight percent earn less than $6.50. Ninety two percent earn less than $7.50. Ninety eight percent earn less than $8.50. One hundred percent earn less than $9.50. ⇑ Refer back to 1.3 - Example 1 & 1.3 - Solved Problem 1

Supplementary Problems 1. A die is tossed 120 times. The following distribution occurred:

a. Interpret the frequency table. b. Construct the histogram.

c. Construct a relative frequency distribution table. d. Construct a cumulative frequency distribution table. e. Interpret the cumulative frequency distribution table. 2. A survey of ten children each in twenty cities as to whether they believe in Santa Claus was performed. The following is the distribution table resulting from this survey:

a. Interpret the distribution table. b. Draw the histogram. c. Construct a relative frequency distribution. d. Interpret the relative frequency distribution. 3. From the grade records at a local college, the following cumulative relative distribution of all students was constructed:

a. Interpret this distribution. b. Construct a relative frequency distribution. c. Interpret this distribution. d. If the enrollment at this college is 10,000, construct a frequency distribution.

¹ Selection of class types and values depends on the application needed.

APPENDIX B Descriptive Statistics Review Averages

Since the collection of data usually consists of several numerical values, there is frequently a need for a single numeric value to represent this data. Such a value is called an average of the data. For most applications, there are three types of averages: mean, median and mode.

2.1 - How are the Mean, Median, and Mode values computed? How to compute the mean value. The mean value is the same as the arithmetic mean of the numeric data. To compute the mean value of data: 1. Add the values. 2. Divide this sum by the number of values. The mean value is represented by X̄.

Example:

How to compute the median value. Case 1: The median value for an odd number of data is the middle value, where the data is arranged in ascending order (low to high). To compute the median value for an odd number of data: 1. Rearrange the data in ascending order. 2. The median value is the middle value. Example: 1, 10, 3, 7, 10, 100, 4. Step 1: Arranging these values in ascending order: 1, 3, 4, 7, 10, 10, 100. Step 2: The number 7 is the middle value. Therefore, 7 is the median value. Case 2: The median value for an even number of data is the average of the middle two numbers, where the data is arranged in ascending order. Example: 1, 10, 3, 7, 10, 100, 4, 33. Step 1: Arranging these values in ascending order: 1, 3, 4, 7, 10, 10, 33, 100. Step 2: The two middle values are 7 and 10. Step 3: The median value is \( (7 + 10)/2 = 8.5 \). Therefore, 8.5 is the median value.

How to compute the mode. The mode of a set of data is the value that occurs the most frequently. Case 1: 1, 10, 3, 7, 10, 100, 4, 33. Here the number 10 appears twice and the other values occur only once.

Therefore, the number 10 is the mode of the data. Case 2: 1, 13, 3, 7, 10, 100, 4, 33. Here all numbers appear once. Therefore, there is no mode. Case 3: 1, 10, 3, 7, 10, 100, 4, 33, 1. Here the numbers 1 and 10 appear twice. The other numbers appear once. Therefore, the data has two modes: 1 and 10. 2.1 - Example 1: This past summer, Ms. Gardener read six books. The following data represents the number of pages in each of these books: 238, 132, 542, 601, 401, 225. (a). Find the mean (b). Find the median (c). Find the mode. Solutions: ➤(a). Step 1: 238 + 132 + 542 + 601 + 401 + 225 = 2139 Step 2: \( \bar{X} = 2139/6 = 356.5 \) pages.

➤(b). Step 1: Arrange the numbers 238, 132, 542, 601, 401, 225 in ascending order: 132, 225, 238, 401, 542, 601. Step 2: Since there are six values, average the middle two numbers: \( (238 + 401)/2 = 319.5 \) pages.

➤(c). Since all the numbers occur only once, there is no mode. 2.1 - Example 2: A sample of 30 families in New York City was recently taken. The following data represents the number of children per family:

(a). Find X̄ (b). Find the median (c). Find the mode. (d). Construct the appropriate histogram and locate these three averages. Solutions: ➤(a). Step 1: Add the 30 values. The sum is 100.

Step 2: \( \bar{X} = 100/30 \approx 3.33 \) children. ➤(b). Step 1: Arrange the numbers in ascending order: 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7 Step 2: Since there is an even number of values, average the two middle values, which are 3 and 3. Step 3: Therefore, the median is \( (3 + 3)/2 = 3 \).

➤(c). From the values arranged in ascending order in (b), we see that the number 2 appears ten times. Therefore, the number 2 is the mode. ➤(d).

Solved Problems 2.1 - Solved Problem 1: At a local community college, nine students applied for a special science scholarship. The following are their grade point averages: 3.8, 3.2, 3.2, 3.1, 4.0, 3.7, 3.5, 3.0, 3.9. (a). Find the mean (b). Find the median (c). Find the mode. Solutions: ➤(a). Step 1: 3.8 + 3.2 + 3.2 + 3.1 + 4.0 + 3.7 + 3.5 + 3.0 + 3.9 = 31.4. Step 2: \( \bar{X} = 31.4/9 \approx 3.49 \).

➤(b). Step 1: Arrange the above numbers in ascending order: 3.0, 3.1, 3.2, 3.2, 3.5, 3.7, 3.8, 3.9, 4.0. Step 2: Since there are nine values, the middle number is 3.5. Therefore, the median value is 3.5. ➤(c). The mode is 3.2 since this number occurs twice.
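The three averages can be checked with Python's standard statistics module. The sketch below uses the grade point averages of 2.1 - Solved Problem 1. The helper modes is illustrative: it follows the convention of this section that data in which every value occurs once has no mode, and that several values tied for the highest count are all modes.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return all most-frequent values; an empty list if every value occurs once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so there is no mode (Case 2)
    return sorted(value for value, count in counts.items() if count == top)


gpas = [3.8, 3.2, 3.2, 3.1, 4.0, 3.7, 3.5, 3.0, 3.9]

print(round(mean(gpas), 2))  # 3.49
print(median(gpas))          # 3.5
print(modes(gpas))           # [3.2]
```

The same helper returns an empty list for 1, 13, 3, 7, 10, 100, 4, 33 (Case 2) and the two modes 1 and 10 for the data of Case 3.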

2.1 - Solved Problem 2: Ms. Jones recently gave a final examination in Spanish. Twenty five students in her class were surveyed as to the number of hours they studied for the final. The following table is the results of this survey:

(a). Find X̄ (b). Find the median (c). Find the mode. (d). Construct the appropriate histogram and locate these three averages. Solutions: ➤(a). Step 1: Add the numbers: 8 + 3 + 5 + 6 + 7 + 2 + 3 + 3 + 8 + 2 + 4 + 5 + 7 + 10 + 9 + 7 + 9 + 9 + 1 + 6 + 9 + 10 + 10 + 7 + 5 = 155 Step 2: Divide 155 by 25: \( 155/25 = 6.2 \).

Step 3: The mean X̄ = 6.2 hours. ➤(b). Step 1: Arrange the data in ascending order: 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10. Step 2: Since there is an odd number of values, the median value is the thirteenth value: 7. Therefore, the median value is 7. ➤(c). The numbers 7 and 9 each occur four times. Therefore, both of these numbers are modes. ➤(d).

Unsolved Problems with Answers 2.1 - Problem 1: A consumer group recently tested eight new automobiles for their mileage per gallon. The following is the mileage attained: 32.7, 31.3, 32.7, 29.7, 21.9, 31.3, 38.1, 37.7 (a). Find the mean (b). Find the median (c). Find the mode(s). Answers: ➤(a). X̄ = 31.9 ➤(b). 32 ➤(c). 32.7, 31.3 ⇑ Refer back to 2.1 - Example 1 & 2.1 - Solved Problem 1

2.1 - Problem 2: A die is a six sided cube, where each side is marked with one of the numbers 1 through 6. A pair of these dice is tossed 35 times and the sum of the dice is recorded each time. The following are the numbers recorded:

(a). Find X̄ (b). Find the median (c). Find the mode. (d). Construct the appropriate histogram and locate these three averages. Answers: ➤(a). X̄ = 6.89 ➤(b). 7 ➤(c). 8 ➤(d).

⇑ Refer back to 2.1 - Example 2 & 2.1 - Solved Problem 2

2.2 - How are the Mean, Median, and Modal values for a frequency distribution computed? The best way to show how to compute these averages is by example. 2.2 - Example 1: A survey of hourly wages of fifty employees at a local fast food restaurant resulted in the following frequency distribution:

(a). Compute the mean of the frequency distribution. (b). Compute the median of the frequency distribution. (c). Compute the mode of the frequency distribution. Solutions: ➤(a). To compute the mean of a frequency distribution we complete the following table:

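1. The values in Col. 3 are the middle value for each interval. For example, the mid-class value for [$3.50,$4.50) equals \( (3.50 + 4.50)/2 = 4.00 \). 2. The mean of the frequency distribution is \( \bar{X} = \frac{\sum fX}{\sum f} \), where f is the frequency of a class and X is its mid-class value.

Therefore, \( \bar{X} = 299/50 = 5.98 \) dollars. ➤(b). We start with the frequency distribution table:

The following rules should be followed to compute the median value of the frequency distribution: 1. From the frequency column, we compute 50/2 = 25 employees. 2. The twenty-fifth employee occurs in the class [$5.50,$6.50). 3. The median value is given by the formula: Median = \( L_1 + \frac{N/2 - F}{f} C = 5.50 + \frac{25 - 18}{16}(1.00) \approx 5.94 \) dollars, where L1 = $5.50 is the lowest value of the median class, N = 50 is the total frequency, F = 18 = 4 + 14 is the total frequency of the classes below the median class, f = 16 is the frequency of the median class and C = $1.00 is the class width. ➤(c). To compute the mode of the frequency distribution we use the formula Mode = \( L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} C \), where C is the width of the class intervals, Δ1 = the largest frequency value of the second column minus the preceding frequency value, Δ2 = the largest frequency value of the second column minus the following frequency value, and L1 is the lowest value of the most frequent class. Here C = $6.50 - $5.50 = $1.00, Δ1 = 16 - 14 = 2, Δ2 = 16 - 12 = 4, L1 = $5.50. Therefore, mode = \( 5.50 + \frac{2}{2 + 4}(1.00) \approx 5.83 \) dollars.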
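The three grouped-data averages of 2.2 - Example 1 can be sketched in Python as follows. The formulas themselves are not reproduced in this copy, so the sketch assumes the standard grouped-data formulas that match the quantities named in the text: mean = Σ(f × mid-class value)/Σf, median = L + ((N/2 - F)/f) × C with F the cumulative frequency below the median class, and mode = L1 + (Δ1/(Δ1 + Δ2)) × C. All variable names are illustrative.

```python
# Frequency distribution of 2.2 - Example 1 (hourly wages, class width $1.00).
lower = [3.5, 4.5, 5.5, 6.5, 7.5, 8.5]  # lower class boundaries
freq = [4, 14, 16, 12, 3, 1]
width = 1.0
n = sum(freq)  # 50 employees

# Mean: weight each mid-class value by its frequency.
mids = [lo + width / 2 for lo in lower]
grouped_mean = sum(f * m for f, m in zip(freq, mids)) / n

# Median: locate the class holding the (n/2)-th value, then interpolate.
half = n / 2
below = 0  # cumulative frequency of the classes below the current one
for lo, f in zip(lower, freq):
    if below + f >= half:
        grouped_median = lo + (half - below) / f * width
        break
    below += f

# Mode: interpolate inside the most frequent class (first one if tied).
k = freq.index(max(freq))
before = freq[k - 1] if k > 0 else 0
after = freq[k + 1] if k + 1 < len(freq) else 0
delta1 = freq[k] - before
delta2 = freq[k] - after
grouped_mode = lower[k] + delta1 / (delta1 + delta2) * width

print(round(grouped_mean, 2), round(grouped_median, 2), round(grouped_mode, 2))
```

When two classes tie for the highest frequency, as in 2.2 - Solved Problem 1, the mode step is simply repeated for each such class.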

Solved Problems 2.2 - Solved Problem 1: The Frozen Foods Company recently developed a new pizza. To check the retail prices the supermarkets are charging for this pizza, a survey of the prices that are charged by 20 supermarkets is taken. The following frequency distribution was collected:

(a). Compute the mean of the frequency distribution. (b). Compute the median of the frequency distribution. (c). Find the mode of the frequency distribution. Solutions: ➤(a). To compute the mean of a frequency distribution, we complete the following table:

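1. The values in Col. 3 are the middle value for each interval. For example, the mid-class value for [$4.50,$4.70) equals \( (4.50 + 4.70)/2 = 4.60 \). 2. The mean of the frequency distribution is \( \bar{X} = \frac{\sum fX}{\sum f} \), where f is the frequency of a class and X is its mid-class value. The following is the completed table:

Therefore, \( \bar{X} = 110.60/20 = 5.53 \) dollars.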

➤(b). We start with the following table:

The following rules should be followed to compute the median value of the frequency distribution:

1. From the frequency column, we compute 20/2 = 10 supermarkets. 2. The tenth supermarket occurs in the class [$5.30,$5.50). 3. The median value is given by the formula: Median = \( 5.30 + \frac{10 - 7}{3}(0.20) = 5.50 \) dollars, where the 7 = 1 + 4 + 2. ➤(c). Here, we have two classes which occur the most frequently: [$4.70,$4.90) and [$6.10,$6.30). To compute the two modes of the frequency distribution, we use the formula Mode = \( L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} C \). For the class [$4.70,$4.90) we have C = $4.90 - $4.70 = $0.20, Δ1 = 4 - 1 = 3, Δ2 = 4 - 2 = 2, L1 = $4.70. Therefore, Mode1 = \( 4.70 + \frac{3}{3 + 2}(0.20) = 4.82 \) dollars.

For the class [$6.10,$6.30) we have C = $6.30 - $6.10 = $0.20, Δ1 = 4 - 0 = 4, Δ2 = 4 - 2 = 2, L1 = $6.10. Therefore, Mode2 = \( 6.10 + \frac{4}{4 + 2}(0.20) \approx 6.23 \) dollars.

Unsolved Problems with Answers 2.2 - Problem 1: Mrs. Clark is the manager of a weight reduction club for women. For the 294 members, she recorded their individual weights at the time they joined the club. The following frequency distribution represents this data:

(a). Compute the mean of the distribution. (b). Compute the median of the distribution. (c). Compute the mode of the distribution. Answers: ➤(a). X̄ = 170.92 pounds ➤(b). 162.18 pounds ➤(c). 154.35 pounds ⇑ Refer back to 2.2 - Example 1 & 2.2 - Solved Problem 1

Supplementary Problems 1. Ms. Cary is a commodity trader. Over the past twenty weeks, she has recorded the end-of-the-week closing prices, where prices are measured in cents per pound. To get a better understanding of the changes in these prices, she needs to compute a five-week moving average of these prices. The second column in the following table is a list of these prices:

The rules for computing the 5-week moving average are as follows: 1. Average the first 5 numbers and record this average in the last column. 2. Drop the first number in the list of prices, compute the average of the next 5 numbers and record this average in the last column. 3. Continue this process to the end of the price data. In the last column we have computed the first three 5-week moving averages. Complete the remainder of this column.

2. For the numbers 4, 12, 654, 132, -10, 13, 0, -125, 13, p, find the value p so that the average of these ten numbers is X̄ = 1.

3. Using the formula \( 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \), find the mean X̄ of the numbers 5, 6, 7, 8, 9, …, 10,000.

4. Using the formula \( 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \), find the mean X̄ of the numbers 9, 16, 25, 36, 49, 64, 81, 100, …, 10,000.

5. Consider the list of data A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. From this set of data, we list all sub-lists consisting of nine numbers: A1: {2, 3, 4, 5, 6, 7, 8, 9, 10}, A2: {1, 3, 4, 5, 6, 7, 8, 9, 10}, A3: {1, 2, 4, 5, 6, 7, 8, 9, 10}, A4: {1, 2, 3, 5, 6, 7, 8, 9, 10}, A5: {1, 2, 3, 4, 6, 7, 8, 9, 10}, A6: {1, 2, 3, 4, 5, 7, 8, 9, 10}, A7: {1, 2, 3, 4, 5, 6, 8, 9, 10}, A8: {1, 2, 3, 4, 5, 6, 7, 9, 10}, A9: {1, 2, 3, 4, 5, 6, 7, 8, 10}, A10: {1, 2, 3, 4, 5, 6, 7, 8, 9}. For the original list of data as well as all sub-lists, compute their mean values.

APPENDIX C Descriptive Statistics Review Measuring Variation

3.1 - What is Data Variation? Data variation is a numeric value which measures the spread of data from the mean x̄. For example, the two sets of numbers 20, 22, 25, 30 and 5, 22, 25, 45 both have the same mean x̄ = 24.25, but the spreads from 24.25 are different since the first group of data is not as varied as the second group of data. The following two histograms graphically demonstrate two sets of data both having a mean x̄ = 100 but different variations.

The following are three common methods for representing the variation of data. The Range The range of data is the difference between the largest and smallest numbers in the data. Example: Assume the data is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Since 10 and 1 are the largest and smallest numbers respectively, the range is 10 - 1 = 9. The Absolute Mean Variation (AMV) The following table shows how the absolute mean variation is computed:

The Standard Deviation (s) The following table shows how the standard deviation is computed:

The following are the rules for computing the standard deviation: Rule 1: Compute x̄ from the data in column 1. Rule 2: The numbers in column 2 are computed using the formula x - x̄. Rule 3: The numbers in column 3 are computed using the formula (x - x̄)². Rule 4: To compute the standard deviation: a. Sum the numbers in column 3. In the table, this sum is 82.5. b. Divide this sum by the total number of values in column 3. This gives 8.25. c. The standard deviation s is the square root of the value computed in b. This gives s ≈ 2.87. Variance

The variance of a set of data is defined as the square of the standard deviation, s². For the above example, the variance is s² = 8.25, the value computed in step b. Of these methods used to compute the variation of a set of data, the variance and standard deviation are the most frequently used. 3.1 - Example 1: Ms. Jones teaches a Latin class at a Senior center. The following data is the ages of her students: 74, 67, 65, 74, 67, 81, 65, 85, 67, 80. Find the (a). range. (b). absolute mean variation. (c). standard deviation. (d). variance. Solutions: ➤(a). The oldest and youngest ages are 85 and 65 respectively. Therefore, the range is 85 - 65 = 20 years. ➤(b). The following table shows how the absolute mean variation is computed.

➤(c). The following table shows how the standard deviation is computed:

➤(d). The variance is s² = 7.02² ≈ 49.25.
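The four measures of 3.1 - Example 1 can be computed in a few lines of Python. The sketch divides by the number of values, as Rule 4 above does. Because the table that defines the absolute mean variation is not reproduced here, the sketch assumes that the AMV is the average of the absolute deviations |x - x̄|. The function name variation_summary is illustrative.

```python
from math import sqrt


def variation_summary(data):
    """Return (range, absolute mean variation, standard deviation, variance)."""
    n = len(data)
    mean = sum(data) / n
    data_range = max(data) - min(data)
    # Assumption: AMV is the mean of the absolute deviations from the mean.
    amv = sum(abs(x - mean) for x in data) / n
    # Rule 4: sum of squared deviations divided by the number of values.
    variance = sum((x - mean) ** 2 for x in data) / n
    return data_range, amv, sqrt(variance), variance


ages = [74, 67, 65, 74, 67, 81, 65, 85, 67, 80]
age_range, age_amv, age_s, age_var = variation_summary(ages)
print(age_range, round(age_amv, 2), round(age_s, 2), round(age_var, 2))
```

Applied to the data 1, 2, ..., 10, the same function gives a variance of 8.25 and a standard deviation of about 2.87, matching Rule 4.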

Solved Problems 3.1 - Solved Problem 1: Rick is a member of the All Star Bowling Team. Last week he bowled the following scores: 187, 167, 201, 185, 167, 210, 205, 167. Find the (a). range. (b). absolute mean variation. (c). standard deviation. (d). variance. Solutions: ➤(a). The highest and lowest scores are 210 and 167 respectively. Therefore the range is 210 - 167 = 43 points. ➤(b). The following table shows how the absolute mean variation is computed.

➤(c). The following table shows how the standard deviation is computed:

➤(d). The variance is s² = 16.76² ≈ 280.90.

Unsolved Problems with Answers 3.1 - Problem 1: A die is tossed 20 times with the following outcomes: 4, 5, 6, 4, 2, 4, 2, 2, 4, 4, 2, 4, 1, 1, 3, 4, 5, 3, 3, 4. Find the (a). range. (b). absolute mean variation. (c). standard deviation. (d). variance. Answers: ➤(a). 5

➤(b). 1.12 ➤(c). 1.31 ➤(d). 1.73 ⇑ Refer back to 3.1 - Example 1 & 3.1 - Solved Problem 1.

3.2 - Computing the Variance and Standard Deviation for Frequency Distributions. The following example of a frequency distribution demonstrates how to compute its standard deviation:

x̄ = 2530/44 = 57.50, s² = 13225/44 ≈ 300.57

3.2 - Example 1: A survey of hourly wages of fifty employees at a local fast food restaurant resulted in the following frequency distribution:

Compute the variance and standard deviation. Solution: The following table computes the standard deviation and variance:

x̄ = $299/50 = $5.98, s² = 62.97/50 ≈ 1.26
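The table computation for 3.2 - Example 1 can be sketched in Python. The sketch assumes, as in the mean calculation of 2.2, that each class is represented by its mid-class value, and it divides by the total frequency. The variable names are illustrative. Computed without intermediate rounding, the sum of the f(x - x̄)² column comes to about 62.98, which agrees with the figure above to within rounding.

```python
from math import sqrt

# Mid-class values and frequencies for the wage distribution of 3.2 - Example 1.
mids = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
freq = [4, 14, 16, 12, 3, 1]
n = sum(freq)

# Mean of the frequency distribution.
mean = sum(f * x for f, x in zip(freq, mids)) / n

# Variance: frequency-weighted squared deviations divided by the total frequency.
variance = sum(f * (x - mean) ** 2 for f, x in zip(freq, mids)) / n
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```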

Solved Problems 3.2 - Solved Problem 1: The Frozen Foods Company recently developed a new pizza. To check the retail prices the supermarkets are charging for this pizza, a survey of the prices that are charged by 20 supermarkets is taken. The following frequency distribution was collected:

Compute the variance and standard deviation. Solution: The following table computes the standard deviation and variance:

x̄ = $110.60/20 = $5.53, s² = 7.14/20 ≈ 0.36

Unsolved Problems with Answers 3.2 - Problem 1: Ms. Clark is the manager of a weight reduction club of 294 women. For the members, she recorded their individual weights at the time they joined the club. The following frequency distribution represents this data:

Find the variance and standard deviation. Answers: s² = 528.92, s = 23.00 ⇑ Refer back to 3.2 - Example 1 & 3.2 - Solved Problem 1.

3.3 - An Application for the Standard Deviation

In Statistics, we frequently are interested in the data that fall within a given number of standard deviations from the mean x̄. 3.3 - Example 1: A sample of 30 families in New York City was recently taken. The following data, listed in ascending order, represents the number of children per family: 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7 Find (a). the mean x̄. (b). the standard deviation. (c). the numbers that are within one standard deviation of x̄. (d). the numbers that are within two standard deviations of x̄. (e). the numbers that are within three standard deviations of x̄. (f). the percent of numbers that are within two standard deviations of x̄. Solutions: ➤(a). To find the mean, add all the above numbers and divide by 30. This gives x̄ ≈ 3.33. ➤(b). The standard deviation is s ≈ 1.65. (This value is obtained by dividing the sum of squared deviations by 30 - 1 = 29. Dividing by 30, as in the rules of 3.1, gives s ≈ 1.62. The numbers selected in (c) through (f) are the same for either value.) ➤(c). To find the numbers that are within one standard deviation of x̄, we select those numbers that are between 3.33 - 1.65 = 1.68 and 3.33 + 1.65 = 4.98. This would include all numbers between 2 and 4 children: 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4.

➤(d). To find the numbers that are within two standard deviations of x̄, we select those numbers that are between 3.33 - 2(1.65) = 0.03 and 3.33 + 2(1.65) = 6.63. This would include all numbers between 1 and 6 children: 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6. ➤(e). To find the numbers that are within three standard deviations of x̄, we select those numbers that are between 3.33 - 3(1.65) = -1.62 and 3.33 + 3(1.65) = 8.28. This would include all 30 numbers. ➤(f). Since 28 out of 30 numbers are within 2 standard deviations of x̄, the percent is 28/30 ≈ 93%.
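The selections in 3.3 - Example 1 can be reproduced with Python's statistics module. The sketch computes the standard deviation with both divisors, n - 1 (statistics.stdev) and n (statistics.pstdev), since the quoted value 1.65 corresponds to the n - 1 divisor. The helper within is illustrative.

```python
from statistics import mean, pstdev, stdev

# Number of children per family, 3.3 - Example 1 (30 families).
children = [0, 1] + [2] * 10 + [3] * 5 + [4] * 5 + [5] * 5 + [6] * 2 + [7]


def within(data, center, s, k):
    """Values lying within k standard deviations s of center."""
    return [x for x in data if center - k * s <= x <= center + k * s]


x_bar = mean(children)
s_sample = stdev(children)       # divisor n - 1, about 1.65
s_population = pstdev(children)  # divisor n, about 1.62

for k in (1, 2, 3):
    selected = within(children, x_bar, s_sample, k)
    share = len(selected) / len(children)
    print(f"within {k} s: {len(selected)} of {len(children)} values ({share:.0%})")
```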

Solved Problems 3.3 - Solved Problem 1: A computer generates the following 25 random numbers ranging from 1 to 10: 1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 9, 9, 9, 10, 10. Find (a). the mean x̄. (b). the standard deviation. (c). the numbers that are within one standard deviation of x̄. (d). the numbers that are within two standard deviations of x̄. (e). the numbers that are within three standard deviations of x̄. (f). the percent of numbers that are within two standard deviations of x̄. Solutions:

➤(a). To find the mean, add all the above numbers and divide by 25. This gives x̄ = 5.92. ➤(b). The standard deviation is s ≈ 2.48. (This value is obtained by dividing the sum of squared deviations by 25 - 1 = 24. Dividing by 25, as in the rules of 3.1, gives s ≈ 2.43.) ➤(c). To find the numbers that are within one standard deviation of x̄, we select those numbers that are between 5.92 - 2.48 = 3.44 and 5.92 + 2.48 = 8.40. This would include all numbers between 4 and 8: 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8. ➤(d). To find the numbers that are within two standard deviations of x̄, we select those numbers that are between 5.92 - 2(2.48) = 0.96 and 5.92 + 2(2.48) = 10.88. This would include all the numbers. ➤(e). To find the numbers that are within three standard deviations of x̄, we select those numbers that are between 5.92 - 3(2.48) = -1.52 and 5.92 + 3(2.48) = 13.36. This would include all 25 numbers. ➤(f). Since all of the 25 numbers are within 2 standard deviations of x̄, the percent of these numbers is 100%.

Unsolved Problems with Answers 3.3 - Problem 1: Ms. Jones recently gave a final examination in Spanish. The thirty students in her class received the following grades: 57.47, 60.05, 60.83, 61.78, 62.70, 62.73, 62.80, 63.16, 63.24, 63.27, 63.31, 63.94, 64.18, 64.68, 64.83,

64.97, 65.18, 65.26, 65.31, 65.51, 65.60, 65.88, 66.31, 66.46, 66.56, 67.21, 67.57, 67.64, 68.44, 69.61. Find the (a). mean x. (b). standard deviation. (c). numbers that are within one standard deviation of x. (d). numbers that are within two standard deviations of x. (e). numbers that are within three standard deviations of x. (f). percent of numbers that are within two standard deviations of x. Answers: ➤(a). x = 64.55 ➤(b). s = 2.57 ➤(c). 62.70, 62.73, 62.80, 63.16, 63.24, 63.27, 63.31, 63.94, 64.18, 64.68, 64.83, 64.97, 65.18, 65.26, 65.31, 65.51, 65.60, 65.88, 66.31, 66.46, 66.56 ➤(d). 60.05, 60.83, 61.78, 62.70, 62.73, 62.80, 63.16, 63.24, 63.27, 63.31, 63.94, 64.18, 64.68, 64.83, 64.97, 65.18, 65.26, 65.31, 65.51, 65.60, 65.88, 66.31, 66.46, 66.56, 67.21, 67.57, 67.64, 68.44, 69.61 ➤(e). All the numbers. ➤(f). 96.67% ⇑ Refer back to 3.3 - Example 1 & 3.3 - Solved Problem 1.

Supplementary Problems 1. For the set of numbers 0,1,2,3,4,5,6,…,100, use the formulas,

a. to find the mean x. b. to find its standard deviation. c. List all numbers that fall within one standard deviation of the mean. d. List all numbers that fall within two standard deviations of the mean. 2. A statistician was hired by a professional basketball team to do a study on the performance of the team. The following is a list, in numeric order, of the final scores of the team's last 50 games:

a. Find the standard deviation. b. For this data complete the following frequency distribution table:

c. For the above distribution, compute the standard deviation.

APPENDIX D Probability Theory Review The Normal Distribution

The normal distribution is the most important continuous distribution. Frequently studied as the bell-shaped curve, this distribution is important in studying the distribution of sample means. The normal distribution P{X ≤ x} equals the shaded area under the bell-shaped curve. The mean is μ and the variance is σ². The positive number σ is called the standard deviation of the normal distribution. Note that 50% of the area under the curve lies to the left of μ and 50% lies to the right. For the normal distribution, the following rules hold: 1. The total area under the curve is 1. 2. The total area under the right side of the curve is 0.5. 3. The total area under the left side of the curve is 0.5. 4. The left and right sides of the curve are symmetric.
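The four rules can be checked numerically. The sketch below is an illustration only: it uses Python's standard `statistics.NormalDist`, and the values μ = 10, σ = 3 and a = 4.5 are hypothetical choices that do not come from the text.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 10, 3
dist = NormalDist(mu=mu, sigma=sigma)

left_area = dist.cdf(mu)        # area to the left of the mean
right_area = 1 - left_area      # area to the right of the mean

# Symmetry: the tail below mu - a matches the tail above mu + a.
a = 4.5
lower_tail = dist.cdf(mu - a)
upper_tail = 1 - dist.cdf(mu + a)

print("left:", left_area, "right:", right_area, "total:", left_area + right_area)
print("lower tail:", round(lower_tail, 6), "upper tail:", round(upper_tail, 6))
```

Any other choice of μ, σ > 0 and a should show the same pattern, since the rules do not depend on the particular parameters.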

25.1 - The Standard Normal Distribution. A random variable Z has a standard normal distribution P{Z ≤ z} equal to the area shaded under the curve of a normal distribution with μ = 0 and variance σ² = 1 (σ = 1). The area under the curve from 0 to z is given in the standard normal table (Table C) for specific values of z.

Given z, find the area shaded under the curve. 25.1 - Example 1 - From the normal distribution Table C, find the shaded area for the figures.

Solution: Step 1: Go to the top of table C. Since 1.64 = 1.6 +.04, select the column marked 0.04. Step 2: Move down the column marked z to the row 1.6. Step 3: The intersection of this row and column is the number 0.4495.

Solution: Step 1: The area from the table for z = 1.64 is 0.4495.

Step 2: Subtract 0.4495 from 0.5. Step 3: The shaded area is 0.5 - 0.4495 = 0.0505.

Solution: Step 1: The shaded area for z = 1.64 is 0.4495. Step 2: Add 0.5 to 0.4495. Step 3: The shaded area is 0.9495.

Solution: Step 1: For z = -1.64, the area is 0.4495. Step 2: For z = 1.64, the area is 0.4495. Step 3: Add 0.4495 + 0.4495 = 0.8990.

Solution: Step 1: Look up the area in the table for z = 1.96. Step 2: The area is 0.4750.

Step 3: Look up the area in the table for z = 1.64. Step 4: The area is 0.4495. Step 5: Subtracting these two areas gives the shaded area under the curve which is 0.0255.
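The table lookups in Example 1 can be reproduced without Table C. The sketch below is not from the original text: it assumes the table entry is the area between 0 and z, which equals 0.5·erf(z/√2), and the function name `area_0_to_z` is hypothetical.

```python
from math import erf, sqrt


def area_0_to_z(z):
    """Area under the standard normal curve between 0 and |z| (the Table C entry)."""
    return 0.5 * erf(abs(z) / sqrt(2))


a164 = area_0_to_z(1.64)
a196 = area_0_to_z(1.96)

print(round(a164, 4))          # area from 0 to 1.64
print(round(0.5 - a164, 4))    # right tail beyond 1.64
print(round(0.5 + a164, 4))    # everything to the left of 1.64
print(round(2 * a164, 4))      # between -1.64 and 1.64
print(round(a196 - a164, 4))   # between 1.64 and 1.96
```

Each printed value corresponds to one of the five figures in the example: a direct lookup, a subtraction from 0.5, an addition to 0.5, a doubling by symmetry, and a difference of two lookups.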

Given the area shaded under the curve, find z. For the next two examples, the area is given and the value z is to be found from table C. 25.1 - Example 2 - For the shaded area in the figures, find z.

Solution: Step 1: Search the area portion of the table for the value 0.4904. Step 2: For this area, we find the z value is z = 2.34.

Solution: Step 1: Compute the area from 0 to z: 0.5 - 0.3783 = 0.1217. Step 2: Look up the value 0.1217 in the area part of Table C. Step 3: The corresponding z value is z = 0.31.

Solution: Step 1: Compute the shaded area: 0.5950 - 0.5 = 0.0950. Step 2: The area 0.0950 is not in the area portion of the table.

Step 3: Take the closest area to 0.0950 which is 0.0948. Step 4: The z value associated with 0.0948 is z = 0.24.
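Going from an area back to z is the inverse of the table lookup. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `z_for_area_0_to_z` is hypothetical, and it returns only the positive z, so the sign still has to be read from the figure as in the text.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_for_area_0_to_z(area):
    """Positive z whose area between 0 and z equals the given table area."""
    return std_normal.inv_cdf(0.5 + area)


# The three areas used in Example 2.
for area in (0.4904, 0.5 - 0.3783, 0.5950 - 0.5):
    print(round(area, 4), "->", round(z_for_area_0_to_z(area), 2))
```

Because `inv_cdf` works with exact areas, there is no need to pick the closest table entry as in the third case; rounding the result to two decimals gives the same z as the table.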

Solved Problems 25.1 - Solved Problem 1: Find the shaded area for the following figures: ➤ (a).

Solution: Using the symmetry of the normal distribution, look up z = 1.96. We find the area is 0.4750. ➤ (b).

➤ (c).

Solution: Step 1: From symmetry of the normal distribution, let z = 1.

Step 2: From the table, we find an area equal to 0.3413. Step 3: Since the total left area is 0.5, then the shaded area is 0.5 - 0.3413 = 0.1587.

Solution: Step 1: Look up in the table the area for z = 2.49. The area is 0.4936. Step 2: Since the right side of the curve has an area equal to 0.5, the total shaded area is 0.4936 + 0.5 = 0.9936. ➤ (d).

Solution:

Step 1: Look up the area for z = 1.96. Step 2: The area is 0.4750. Step 3: The area for z = -1.96 is 0.4750. Step 4: The shaded area is 0.4750 + 0.4750 = 0.95. ➤ (e).

Solution: Step 1: From the table, the area from 0 to -3 is 0.4987. Step 2: From the table, the area from 0 to -2.1 is 0.4821. Step 3: The difference, 0.0166, is the shaded area. 25.1 - Solved Problem 2: For the shaded area in the figure, find z. ➤ (a).

Solution: Step 1: Search the area portion of the table for the value 0.0319. Step 2: For this area, we find the z value is z = -0.08.

➤ (b).

Solution: Step 1: Compute the area from 0 to z: 0.5 - 0.1020 = 0.3980. Step 2: Look up the area part of the table for the value 0.3980. Step 3: The corresponding z value is z = -1.27. ➤ (c).

Solution: Step 1: Compute the area 0.9903 - 0.5 = 0.4903. Step 2: The value 0.4903 does not exist in the area portion of the table. Step 3: The closest value is 0.4904. Step 4: The corresponding z value is -2.34.

Unsolved Problems with Answers 25.1 - Problem 1: Find the shaded area for the figures below. ➤ (a).

➤ (b).

➤ (c).

➤ (d).

➤ (e).

Answers: ➤ (a). 0.4772 ➤ (b). 0.0344 ➤ (c). 0.9990 ➤ (d). 0.9956 ➤ (e). 0.3611 ⇑ Refer back to 25.1 - Example 1 & 25.1 - Solved Problem 1.

25.1 - Problem 2: For the shaded area in the figures below, find z. ➤ (a).

➤ (b).

➤ (c).

Answers: ➤ (a). z = -0.47 ➤ (b). z = -0.05 ➤ (c). z = 0.97

⇑ Refer back to 25.1 - Example 2 & 25.1 - Solved Problem 2.

25.2 - The Normal Distribution

To find an area under a normal distribution, the distribution must be converted to a standard normal distribution by using the formula: P{X ≤ x} = P{Z ≤ z}, where z = (x - μ)/σ. The value z measures the number of standard deviations that x lies from the mean μ (See Lesson 3, Descriptive Statistics).

The position of the z value must match the position of the x value. 25.2 - Example 1: For µ = 2.5, σ = 0.5, and x = 3.5, find the shaded area in the figure.

Solution:

Step 1: z = (x - μ)/σ = (3.5 - 2.5)/0.5 = 2. Step 2: Looking up z = 2 in the table, the area is 0.4772.

25.2 - Example 2: For μ = 10, σ = 5, and x = 3, find the shaded area in the figure.

Solution: Step 1: z = (x - μ)/σ = (3 - 10)/5 = -1.4. Step 2: For z = -1.4, the area from the table is 0.4192. Step 3: The shaded area is 0.5 - 0.4192 = 0.0808.

25.2 - Example 3: For µ = 120, σ = 10, x1 = 125 and x2 = 135, find the shaded area in the figure.

Solution: Step 1: z1 = (125 - 120)/10 = 0.5. Step 2: z2 = (135 - 120)/10 = 1.5. Step 3: For z = 0.5, the area is 0.1915. Step 4: For z = 1.5, the area is 0.4332. Step 5: The shaded area is 0.4332 - 0.1915 = 0.2417.

25.2 - Example 4: For μ = 120, σ = 10, x1 = 110 and x2 = 135, find the shaded area in the figure.

Solution: Step 1: z1 = (110 - 120)/10 = -1. Step 2: z2 = (135 - 120)/10 = 1.5. Step 3: The area associated with z = 1.5 is 0.4332. Step 4: The area associated with z = -1 is 0.3413. Step 5: The shaded area is the sum: 0.3413 + 0.4332 = 0.7745.
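The conversion used in Examples 1 to 4 can be sketched in a few lines. This is an illustration under stated assumptions, not the book's procedure: it takes z = (x - μ)/σ, uses `statistics.NormalDist` in place of Table C, and treats the shaded region in Example 1 as running from μ to x, which is what the answer 0.4772 implies. The function names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


def area_between(x1, x2, mu, sigma):
    """P{x1 <= X <= x2} for a normal X, found by converting to z values."""
    z1, z2 = z_score(x1, mu, sigma), z_score(x2, mu, sigma)
    return std_normal.cdf(z2) - std_normal.cdf(z1)


print(round(area_between(2.5, 3.5, 2.5, 0.5), 4))         # Example 1: from mu to x
print(round(std_normal.cdf(z_score(3, 10, 5)), 4))        # Example 2: left tail
print(round(area_between(125, 135, 120, 10), 4))          # Example 3
print(round(area_between(110, 135, 120, 10), 4))          # Example 4
```

Subtracting two cumulative areas handles both the "same side of the mean" case (Example 3) and the "opposite sides" case (Example 4), so no separate add or subtract rule is needed in code.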

Solved Problems 25.2 - Solved Problem 1: For μ = 12.5, σ = 1.50 and x = 11.5, find the shaded area in the figure.

Solution: Step 1: z = (x - μ)/σ = (11.5 - 12.5)/1.50 ≈ -0.67. Step 2: For z = -0.67, the table gives an area equal to 0.2486. 25.2 - Solved Problem 2: For μ = 100, σ = 50, and x = 130, find the shaded area in the figure.

Solution: Step 1: z = (x - μ)/σ = (130 - 100)/50 = 0.6. Step 2: From the table for z = 0.6, the area is 0.2257. Step 3: The shaded area is 0.5 - 0.2257 = 0.2743.

25.2 - Solved Problem 3: For μ = 5, σ = 2, x1 = 4 and x2 = 3, find the shaded area in the figure.

Solution: Step 1: z1 = (4 - 5)/2 = -0.5. Step 2: z2 = (3 - 5)/2 = -1. Step 3: The area associated with z = -1 is 0.3413. Step 4: The area associated with z = -0.5 is 0.1915.

Step 5: The shaded area is 0.3413 - 0.1915 = 0.1498.

25.2 - Solved Problem 4: For µ = -120, σ = 10, x1 = -130 and x2 = -118, find the shaded area in the figure.

Solution: Step 1: z1 = (-130 - (-120))/10 = -1. Step 2: z2 = (-118 - (-120))/10 = 0.2.

Step 3: The area associated with z = 0.2 is 0.0793. Step 4: The area associated with z = -1 is 0.3413. Step 5: The total area is 0.0793 + 0.3413 = 0.4206.

Unsolved Problems with Answers 25.2 - Problem 1: For μ = 0.5, σ = 0.15, and x = 1, find the shaded area in the figure.

Answer: 0.4996 ⇑ Refer back to 25.2 - Example 1 & 25.2 - Solved Problem 1. 25.2 - Problem 2: For μ = 1, σ = 5, and x = 13, find the shaded area in the figure.

Answer: 0.0082 ⇑ Refer back to 25.2 - Example 2 & 25.2 - Solved Problem 2. 25.2 - Problem 3: For μ = 1000, σ = 500, and x1 = 1300 and x2 = 1400, find the shaded area in the figure.

Answer: 0.0624 ⇑ Refer back to 25.2 - Example 3 & 25.2 - Solved Problem 3. 25.2 - Problem 4: For μ = -12, σ = 1, x1 = -13 and x2 = -11, find the shaded area in the figure.

Answer: 0.6826 ⇑ Refer back to 25.2 - Example 4 & 25.2 - Solved Problem 4.

25.3 - Important Formulas. 1. Z = (X - μ)/σ 2. X = μ + Zσ 3. μ = X - Zσ 4. σ = (X - μ)/Z 25.3 - Example 1: Assume σ = 2 and P{X ≤ 13} = 0.7054. Find μ. Solution: Step 1: We use formula 3: μ = x - zσ.

Step 2: x = 13 Step 3: The area between μ and x is 0.7054 - 0.5 = 0.2054. From the table, for the area 0.2054, z = 0.54. Step 4: μ = x - zσ = 13 - 0.54(2) = 13 - 1.08 = 11.92

25.3 - Example 2: Assume σ = 10, μ = 25 and P{X ≥ x} = 0.0250. Find x. Solution: Step 1: Use formula 2: x = μ + zσ = 25 + z(10).

Step 2: To find z, use the area from the table 0.5 - 0.0250 = 0.4750. Step 3: From the table, z = 1.96

Step 4: Using the equation in Step 1 and z = 1.96 gives x = μ + zσ = 25 + z(10) = 25 + 1.96(10) = 25 + 19.6 = 44.60.

25.3 - Example 3: Assume μ = 4.75 and P{X < 7.51} = 0.9808. Find σ.

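Solution: Step 1: Use formula 4: σ = (x - μ)/z.

Step 2: The area between μ and x is 0.9808 - 0.5 = 0.4808. Using the area portion of the table for 0.4808, we find z = 2.07. Step 3: σ = (7.51 - 4.75)/2.07 = 2.76/2.07 ≈ 1.33.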
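The three rearrangements used in Examples 1 to 3 can be written as small functions. This is a sketch only: the function names are hypothetical, the z values are the table lookups quoted in the examples, and formula 4 is taken here as σ = (x - μ)/z, the rearrangement of z = (x - μ)/σ.

```python
def x_from(mu, z, sigma):
    """Formula 2: x = mu + z * sigma."""
    return mu + z * sigma


def mu_from(x, z, sigma):
    """Formula 3: mu = x - z * sigma."""
    return x - z * sigma


def sigma_from(x, mu, z):
    """Formula 4 (assumed form): sigma = (x - mu) / z, valid only for nonzero z."""
    if z == 0:
        raise ValueError("z must be nonzero to solve for sigma")
    return (x - mu) / z


# z values below are the Table C lookups quoted in Examples 1 to 3.
print(round(mu_from(13, 0.54, 2), 2))           # Example 1: find mu
print(round(x_from(25, 1.96, 10), 2))           # Example 2: find x
print(round(sigma_from(7.51, 4.75, 2.07), 2))   # Example 3: find sigma
```

The guard in `sigma_from` reflects a real limitation: when x equals μ, z is 0 and the data carry no information about σ.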

Solved Problems 25.3 - Solved Problem 1: Assume σ = 10 and P{X ≤ 13} = 0.0336. Find μ.

Solution: Step 1: We use formula 3: μ = x - zσ. Step 2: x = 13 Step 3: Since 13 is to the left of μ, z will be a negative number. Step 4: The area between x and μ is 0.5 - 0.0336 = 0.4664. Using the area portion of the table for 0.4664, we find z = -1.83.

Step 5: μ = x - zσ = 13 - (-1.83)(10) = 13 + 18.3 = 31.3 25.3 - Solved Problem 2: Assume σ = 10, μ = 25 and P{X ≥ x} = 0.9131. Find x.

Solution: Step 1: Use formula 2: x = μ + zσ = 25 + z(10). Step 2: Since P{X ≥ x} = 0.9131 is greater than 0.5, x lies to the left of μ, so z is negative. The area between x and μ is 0.9131 - 0.5 = 0.4131. Step 3: In the area portion of the table, 0.4131 corresponds to z = 1.36, so here z = -1.36. Step 4: Using the equation in Step 1 and z = -1.36 gives x = μ + zσ = 25 + z(10) = 25 + (-1.36)(10) = 25 - 13.6 = 11.4.
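When the given probability is an upper tail P{X ≥ x}, the sign of z follows automatically if the problem is restated as a lower tail. A minimal sketch, assuming `statistics.NormalDist` (Python 3.8 or later) in place of Table C; the function name `x_for_upper_tail` is hypothetical.

```python
from statistics import NormalDist


def x_for_upper_tail(p_upper, mu, sigma):
    """Return x such that P{X >= x} = p_upper for a normal X."""
    # P{X >= x} = p_upper is the same statement as P{X <= x} = 1 - p_upper.
    return NormalDist(mu, sigma).inv_cdf(1 - p_upper)


print(round(x_for_upper_tail(0.9131, 25, 10), 1))   # Solved Problem 2: x below the mean
print(round(x_for_upper_tail(0.0250, 25, 10), 1))   # Example 2: x above the mean
```

An upper-tail probability above 0.5 puts x below the mean (negative z), and one below 0.5 puts x above the mean (positive z), which matches the reasoning in the two worked solutions.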

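25.3 - Solved Problem 3: Assume μ = -5 and P{X < 1} = 0.9808. Find σ. Solution: Step 1: Use formula 4: σ = (x - μ)/z.

Step 2: P{X < 1} = P{X ≤ 1} - P{X = 1} = P{X ≤ 1}, since P{X = 1} = 0 for a continuous distribution.

Step 3: The area between μ and x is 0.9808 - 0.5 = 0.4808. Using the area portion of the table for 0.4808, we find z = 2.07. Step 4: σ = (1 - (-5))/2.07 = 6/2.07 ≈ 2.90.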

Unsolved Problems with Answers 25.3 - Problem 1: Assume σ = 10 and P{X ≤ 13} = 0.5. Find μ. Answer: μ = 13 ⇑ Refer back to 25.3 - Example 1 & 25.3 - Solved Problem 1. 25.3 - Problem 2: Assume σ = 1, μ = 2.5 and P{X ≥ x} = 0.4129. Find x. Answer: 2.72 ⇑ Refer back to 25.3 - Example 2 & 25.3 - Solved Problem 2. 25.3 - Problem 3: Assume μ = 5 and P{X < -7} = 0.1736. Find σ. Answer: σ = 12.77 ⇑ Refer back to 25.3 - Example 3 & 25.3 - Solved Problem 3.

Supplementary Problems Assume Z and X are normally distributed. 1. If P{Z < z} = 0.003, find z. 2. If μ = 2, σ = 5, and P{X < x} = 0.003, find x. 3. Assume two distributions where μ = 10 when P{X < x} = 0.05 and μ = 2 when P{X > x} = 0.01. Find x and σ. 4. Assume P{μ - 2 < X < μ + 2} = 0.2206. Find σ. 5. Find: a. P{μ - 2σ ≤ X ≤ μ + 2σ} b. P{μ - 3σ ≤ X ≤ μ + 3σ} 6. Assume X has a standard deviation σ = a and Y has a standard deviation 2a. Which is larger, P{μ - a ≤ X ≤ μ + a} or P{μ - a ≤ Y ≤ μ + a}? 7. If P{X ≤ -5} = 0.01 and P{X ≥ 7} = 0.05, find μ and σ. 8. From the equation z = (x - μ)/σ, algebraically derive: a. x = μ + zσ b. μ = x - zσ c. σ = (x - μ)/z.

For the following problems, shade the appropriate area under the normal distribution and find the area. 9. P{Z ≤ -2.77} 10. P{Z > 0.13} 11. P{-2.79 ≤ Z ≤ 3.33}

12. P{2.15 ≤ Z ≤ 3.20} For the following problems, shade the appropriate area under the normal distribution and find z. 13. P{Z ≤ z} = 0.9671 14. P{Z ≤ z} = 0.1492 For the following problems, shade the appropriate area under the normal distribution and find the area. 15. P{Z ≤ -2} 16. P{1.13 < Z} 17. P{-3.11 ≤ Z ≤ 3.11} 18. Assume μ = 5, σ = 2. Find P{X ≤ 10}. 19. Assume μ = 15, σ = 3. Find P{10 ≤ X ≤ 19}. 20. Assume μ = 15, σ = 3. Find P{X ≥ 18} + P{X ≤ 12}. 21. If P{X ≥ 10} = 0.4 and P{X ≤ 5} = 0.3, find μ and σ. For the remaining problems, assume Z is a random variable with the standard normal distribution and X is a normally distributed random variable with mean μ and standard deviation σ. 22. Find P(Z ≥ -1 | Z ≤ 1). 23. Find x given P(X ≤ x | X > 2) = 0.70, μ = 4, σ = 2. 24. Find μ, σ, given P(X ≥ 5 | X ≤ 10) = 0.40, P(X ≤ 10) = 0.85. 25. Assume X has a normal distribution. Show that P(X = x) = 0.

About The Author Howard Dachslager received a Ph.D. in mathematics from the University of California, Berkeley where he specialized in real analysis and probability theory. Prior to beginning his doctoral studies at the University of California, Berkeley, he earned a master's degree in economics from the University of Wisconsin. Since completing his Ph.D. in mathematics, he has taught mathematics to a diverse student population on many levels. As a faculty member of the Department of Mathematics at the University of Toronto he prepared and presented undergraduate level courses in mathematics. For several years he taught undergraduate mathematics courses in the Department of Mathematics, University of California, Berkeley. While working in the State Department’s Alliance for Progress program, he taught advanced mathematics courses at a statistics institute in Santiago, Chile. Other teaching experience includes presenting undergraduate and community college mathematics courses. Throughout his teaching career in mathematics, he has always attempted to find and use the most effective teaching methodologies to communicate an understanding of mathematics. Unable to find an appropriate text for use in his courses in statistics and probability theory, and drawing on his own extensive teaching experience, education and training, he developed a tutorial statistics and probability theory text that has significantly improved the performance of students in those courses. By focusing on problem solving, the student can learn to repeat the methodologies involved, reinforcing an understanding of the concepts clearly explained in the text.