1,433 123 18MB
English Pages 832 [1017] Year 2014
A P P L I C AT I O N S Introduction to the Practice of Statistics presents a wide variety of applications from diverse disciplines. The following list indicates the number of Examples and Exercises related to different fields. Note that some items appear in more than one category.
7.14, 7.15, 7.18, 11.1, 12.26, 12.27, 15.8, 15.9, 15.10, 16.4, 16.5, 16.6, 16.8, 16.9, 16.11, 16.12, 16.13, 17.19
Science:
Environment:
1.1, 1.2, 1.40, 1.41, 1.43, 1.44, 1.45, 1.48, 2.17, 3.4, 3.5, 3.6, 4.17, 4.27, 4.31, 4.35, 4.36, 4.37, 4.42, 4.43, 4.46, 6.32, 7.1, 7.2, 7.3, 7.10, 7.11, 8.8, 9.3, 9.4, 9.5, 9.6, 9.7, 9.12, 9.13, 14.6, 14.10, 14.11, 15.1, 15.2, 15.3, 15.4, 15.5, 15.7, 15.11, 15.12, 16.7
3.33, 6.1, 6.17, 6.18, 6.30
Ethics:
Examples by Application
3.34, 3.35, 3.36, 3.37, 3.38, 3.39, 3.40, 3.41, 6.24, 6.25, 6.26
Agriculture:
Health and nutrition:
3.19, 4.19, 14.8, 14.9, 15.13, 15.14
Business and consumers: 1.1, 1.2, 1.15, 1.16, 1.23, 1.24, 1.25, 1.26, 1.27, 1.28, 1.30, 1.31, 1.36, 1.47, 1.48, 2.2, 2.8, 2.9, 2.10, 2.11, 2.14, 2.15, 2.16, 2.24, 2.30, 2.31, 2.40, 2.41, 2.43, 3.1, 3.7, 3.9, 3.11, 3.12, 3.13, 3.14, 3.16, 3.17, 3.23, 3.24, 3.25, 3.29, 3.30, 3.31, 3.38, 4.38, 5.2, 5.3, 5.4, 5.5, 5.16, 5.17, 5.18, 5.21, 5.22, 5.23, 5.24, 5.26, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 6.10, 6.11, 6.12, 6.13, 6.14, 6.20, 8.5, 8.6, 10.11, 10.12, 10.13, 11.2, 12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 12.10, 12.11, 12.12, 12.13, 12.14, 12.15, 12.16, 12.17, 12.18, 12.19, 12.20, 12.21, 12.22, 12.23, 12.24, 12.25, 13.1, 13.2, 14.6, 14.10, 14.11, 15.9, 16.1, 16.2, 16.3, 16.10, 17.2, 17.20, 17.21
Demographics and characteristics of people: 1.36, 1.38, 1.39, 5.1, 5.8, 5.12, 5.13, 5.14, 5.19, 5.25, 5.30, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 6.10, 6.11, 6.12, 6.13, 6.14, 6.15, 6.20, 7.1, 7.2, 7.3, 7.13, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 12.10, 12.11, 12.12, 12.13, 12.14, 12.15, 12.16, 12.17, 12.18, 12.19, 12.20, 12.21, 12.22, 12.23, 12.24, 13.7, 13.8
Economics and Finance:
Humanities and social sciences: 1.32, 3.3, 3.16, 3.27, 3.28, 3.32, 3.40, 3.41, 4.9, 4.10, 4.11, 4.47, 4.48, 5.12, 5.13, 5.14, 6.25, 6.27, 7.7, 7.8, 7.9, 7.12, 8.1, 8.2, 8.3, 8.10, 8.12, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.12, 9.13, 12.3, 13.8, 13.9, 14.1, 14.2, 14.3, 14.4, 14.5, 14.7
International: 1.23, 1.24, 1.25, 1.26, 1.27, 1.36, 1.47, 2.11, 2.15, 2.16, 2.17, 2.26, 3.23, 3.24, 3.25, 5.29, 11.2, 15.6, 16.1, 16.2, 16.3
Law and government data: 3.1, 3.2, 3.3, 3.26, 3.36
Manufacturing, products, and processes:
6.31, 7.4, 7.5, 7.6
Education and child development: 1.3, 1.4, 1.5, 1.6, 1.13, 1.17, 1.20, 1.29, 1.35, 1.37, 1.40, 1.43, 1.44, 1.45, 1.46, 3.2, 3.8, 3.20, 3.21, 3.38, 4.18, 4.22, 4.33, 4.34, 4.45, 6.3, 6.16, 6.19,
1.11, 1.12, 1.13, 1.21, 1.22, 1.33, 2.1, 2.3, 2.4, 2.5, 2.6, 2.12, 2.13, 2.18, 2.19, 2.20, 2.21, 2.22, 2.23, 2.25, 2.27, 2.28, 2.29, 2.32, 2.34, 2.35, 2.36, 2.37, 2.38, 2.39, 2.43, 2.44, 3.4, 3.5, 3.6, 3.15, 3.18, 3.34, 3.37, 3.39, 4.20, 4.21, 4.26, 4.30, 4.39, 4.41, 5.1, 5.15, 5.19, 5.20, 6.2, 6.15, 6.24, 6.29, 7.16, 7.17, 7.19, 7.20, 7.21, 7.22, 7.23, 8.4, 8.5, 8.6, 8.11, 8.13, 9.8, 9.9, 9.10, 9.11, 9.14, 9.15, 9.16, 9.17, 9.18, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.14, 10.15, 10.16, 10.17, 10.18, 10.19, 10.20, 10.21, 10.22, 10.23, 10.24, 10.25, 10.26, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8, 13.10, 16.14
1.19, 1.41, 3.10, 4.32, 6.27,
3.22, 5.11, 5.15, 5.29, 6.17, 6.18, 6.30, 6.32, 10.11, 10.12, 10.13, 17.1, 17.2, 17.3, 17.4, 17.5, 17.6, 17.7, 17.8, 17.9, 17.10, 17.11, 17.12, 17.13, 17.14, 17.15, 17.16, 17.17, 17.18, 17.19, 17.20, 17.21
1.34, 6.28
Sports and leisure:
Students: 1.3, 1.4, 1.5, 1.6, 1.19, 1.20, 1.35, 2.7, 4.32, 4.33, 4.34, 4.45, 5.1, 5.10, 5.19, 5.28, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 6.10, 6.11, 6.12, 6.13, 6.14, 6.15, 6.16, 6.19, 6.20, 7.1, 7.2, 7.3, 8.7, 8.8, 8.9, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.15, 10.25, 10.26, 11.1, 12.3, 16.4, 16.5, 16.6, 16.8, 16.9, 16.13
Technology and the Internet: 1.1, 1.2, 1.7, 1.8, 1.9, 1.10, 2.26, 3.7, 3.11, 3.12, 3.13, 3.17, 4.26, 4.43, 4.47, 4.48, 5.8, 5.9, 5.27, 5.28, 5.30, 6.28, 7.1, 7.2, 7.3, 7.10, 7.11, 8.1, 8.2, 8.3, 8.10, 8.12, 9.1, 9.2, 12.1, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 12.10, 12.11, 12.12, 12.13, 12.14, 12.15, 12.16, 12.17, 12.18, 12.19, 12.20, 12.21, 12.22, 12.23, 12.24, 13.1, 13.2, 14.1, 14.2, 14.3, 14.4, 14.5, 14.7, 15.6, 15.7
Exercises by Application Agriculture: 1.71, 3.64, 3.65, 5.45, 5.46, 5.83, 6.130, 6.131, 12.3, 12.8, 12.13, 13.43, 13.44, 13.45, 13.46, 13.47, 13.48, 13.49, 13.50, 14.31, 14.32, 14.33, 17.75, 17.76
Business and consumers: 1.1, 1.5, 1.8, 1.9, 1.36, 1.37, 1.38, 1.39, 1.47, 1.49, 1.50, 1.53, 1.59, 1.61, 1.62, 1.63, 1.67, 1.68, 1.69, 1.70, 1.77, 1.79, 1.81, 1.156, 1.159, 1.160, 1.161, 1.162, 1.163, 1.174, 2.3, 2.7, 2.10, 2.11, 2.12, 2.13, 2.14, 2.15, 2.16, 2.17, 2.26, 2.27, 2.36, 2.38, 2.39, 2.48, 2.49, 2.51, 2.52, 2.53, 2.54, 2.79, 2.80, 2.91, 2.110, 2.140, 2.149, 2.150, 2.151, 2.152,
2.153, 2.154, 2.155, 2.156, 2.162, 2.163, 2.164, 2.165, 2.166, 2.167, 2.168, 2.172, 3.4, 3.5, 3.9, 3.12, 3.13, 3.15, 3.22, 3.23, 3.27, 3.28, 3.29, 3.30, 3.31, 3.35, 3.39, 3.40, 3.41, 3.42, 3.44, 3.53, 3.59, 3.60, 3.66, 3.76, 3.113, 3.119, 3.123, 3.126, 3.127, 4.38, 4.83, 4.91, 4.92, 4.93, 4.94, 4.121, 4.122, 4.123, 4.124, 4.125, 4.128, 4.141, 4.149, 5.76, 6.1, 6.2, 6.3, 6.6, 6.7, 6.8, 6.22, 6.24, 6.25, 6.29, 6.74, 6.90, 6.91, 6.104, 6.122, 6.124, 6.132, 6.133, 6.140, 7.1, 7.2, 7.3, 7.6, 7.8, 7.9, 7.10, 7.11, 7.16, 7.27, 7.38, 7.42, 7.46, 7.47, 7.71, 7.72, 7.73, 7.77, 7.78, 7.82, 7.83, 7.84, 7.90, 7.91, 7.97, 7.105, 7.106, 7.112, 7.119, 7.120, 7.121, 7.124, 7.130, 7.131, 7.135, 7.140, 7.141, 7.142, 8.1, 8.3, 8.5, 8.6, 8.7, 8.8, 8.9, 8.12, 8.14, 8.15, 8.17, 8.19, 8.30, 8.31, 8.32, 8.37, 8.40, 8.41, 8.44, 8.48, 8.49, 8.50, 8.51, 8.72, 8.84, 8.85, 8.86, 8.101, 9.18, 9.36, 9.37, 9.38, 10.10, 10.11, 10.12, 10.13, 10.14, 10.16, 10.17, 10.18, 10.19, 10.20, 10.21, 10.27, 10.28, 10.30, 11.14, 11.15, 11.16, 11.17, 11.22, 11.23, 11.24, 11.25, 11.26, 11.27, 11.28, 12.14, 12.15, 12.27, 12.35, 12.40, 12.41, 12.42, 12.43, 12.44, 12.69, 13.7, 13.8, 13.13, 13.14, 13.19, 13.20, 13.21, 13.31, 14.3, 14.4, 14.5, 14.6, 14.7, 14.8, 14.11, 14.13, 14.14, 14.26, 14.27, 14.28, 14.34, 14.35, 14.36, 14.41, 14.44, 15.1, 15.2, 15.3, 15.4, 15.5, 15.6, 15.20, 15.21, 15.22, 15.23, 15.24, 15.25, 15.26, 15.27, 15.28, 15.29, 15.32, 15.33, 15.47, 16.1, 16.3, 16.4, 16.9, 16.11, 16.14, 16.15, 16.16, 16.25, 16.30, 16.31, 16.34, 16.47, 16.48, 16.49, 16.53, 16.55, 16.57, 16.61, 16.62, 16.63, 16.64, 16.66, 16.70, 16.73, 16.74, 16.82, 16.83, 16.94, 16.98, 16.99, 17.10, 17.36, 17.38, 17.39, 17.64, 17.69, 17.70, 17.71, 17.72, 17.74, 17.83, 17.84
Demographics and characteristics of people: 1.25, 1.26, 1.27, 1.29, 1.32, 1.33, 2.28, 2.29, 2.30, 2.124, 2.146, 2.147, 2.160, 2.161, 4.56, 5.1, 5.12, 5.13, 5.20, 5.51, 5.52, 5.53, 5.54, 5.55, 5.56, 5.60, 5.62, 5.66, 6.9, 6.10, 6.17, 6.18, 6.23, 6.35, 6.57, 6.66, 7.26, 7.28, 7.144, 8.38, 11.33, 11.34, 11.35, 12.50, 12.63, 13.10, 13.11, 16.11, 16.23
Economics and finance: 5.28, 5.34, 5.58, 5.87, 5.88, 7.87, 10.7, 10.22, 10.25, 10.26, 10.45, 10.46, 10.47, 17.33, 17.34, 17.73
Education and child development: 1.2, 1.3, 1.4, 1.6, 1.7, 1.13, 1.14, 1.17, 1.20, 1.21, 1.22, 1.23, 1.24, 1.43, 1.44, 1.45, 1.48, 1.51, 1.52, 1.54, 1.55, 1.56, 1.57, 1.60, 1.101, 1.102, 1.103, 1.104, 1.105, 1.106, 1.107, 1.108, 1.112, 1.113, 1.114, 1.115, 1.116, 1.117, 1.120, 1.121, 1.132, 1.133, 1.134, 1.135, 1.136, 1.137, 1.138, 1.139, 1.140, 1.141, 1.142, 1.143, 1.173, 1.176, 2.1, 2.21, 2.33, 2.34, 2.45, 2.59, 2.61, 2.73, 2.74, 2.75, 2.76, 2.77, 2.84, 2.87, 2.88, 2.89, 2.90, 2.98, 2.99, 2.100, 2.101, 2.106, 2.113, 2.121, 2.125, 2.137, 2.142, 2.145, 2.157, 2.158, 2.159, 2.175, 2.176, 2.177, 3.10, 3.20, 3.24, 3.25, 3.48, 3.52, 3.82, 3.107, 3.129, 3.132, 3.134, 4.33, 4.46, 4.49, 4.50, 4.99, 4.100, 4.113, 4.114, 4.115, 4.116, 4.117, 4.118, 4.119, 4.120, 4.126, 4.127, 4.142, 4.145, 4.146, 5.22, 5.24, 5.35, 5.64, 5.86, 6.56, 6.64, 6.65, 6.66, 6.71, 6.97, 6.119, 6.123, 6.125, 7.45, 7.50, 7.51, 7.133, 7.137, 7.138, 7.143, 8.59, 8.94, 9.25, 9.26, 9.27, 9.28, 9.33, 9.34, 9.35, 9.39, 9.40, 9.41, 9.42, 9.43, 10.10, 10.11, 10.12, 10.13, 10.14, 10.16, 10.17, 10.18, 10.19, 10.20, 10.21, 10.24, 10.38, 10.44, 10.48, 10.51, 10.52, 10.53, 11.1, 11.3, 11.5, 11.6, 11.13, 11.18, 11.29, 11.30, 11.31, 11.32, 12.13, 12.15, 12.17, 12.20, 12.21, 12.61, 12.62, 12.66, 13.9, 13.32, 13.52, 13.53, 13.54, 13.55, 14.47, 14.48, 14.49, 14.50, 15.8, 15.9, 15.10, 15.11, 15.12, 15.13, 15.16, 15.17, 15.18, 15.19, 15.34, 15.46, 16.18, 16.19, 16.26, 16.27, 16.37, 16.43, 16.45, 16.50, 16.51, 16.52, 16.56, 16.59, 16.65, 16.68, 16.71, 16.72, 16.79, 16.85, 16.86, 16.87, 16.88, 17.79
Environment: 1.30, 1.31, 1.34, 1.35, 1.64, 1.65, 1.66, 1.72, 1.83, 1.88, 1.89, 1.96, 1.100, 1.152, 1.153, 1.156, 2.126, 3.47, 3.49, 3.73, 5.23, 5.45, 5.46, 5.77, 6.35, 6.68, 6.69, 6.116, 6.117, 7.31, 7.77, 7.78, 7.85, 7.86, 7.93, 7.95, 7.96, 7.97, 7.105, 7.106, 7.107, 7.108, 7.110, 7.111, 7.129, 7.132,
8.78, 8.98, 9.55, 10.29, 10.32, 10.34, 10.35, 10.36, 10.37, 10.55, 10.56, 10.57, 11.20, 11.43, 11.44, 11.45, 11.46, 11.48, 11.49, 11.50, 11.51, 12.14, 12.16, 12.18, 13.12, 13.51, 15.24, 15.25, 15.26, 15.28, 15.29, 15.35, 15.45, 16.28, 16.31, 16.74, 16.75
10.33, 10.54, 11.42, 11.47, 11.52, 13.42, 15.27, 15.51,
Ethics: 3.96, 3.97, 3.98, 3.99, 3.100, 3.101, 3.102, 3.103, 3.104, 3.105, 3.106, 3.107, 3.108, 3.109, 3.110, 3.111, 3.112, 3.113, 3.114, 3.115, 3.116, 3.117, 3.119, 3.120, 3.132, 3.133, 3.134, 3.135, 3.136, 3.137, 3.138, 4.113, 4.114, 5.51, 5.52, 5.53, 5.54, 5.55, 5.59, 5.61, 5.81, 6.11, 6.91, 6.92, 6.94, 6.96, 6.100, 6.101, 6.103, 6.104, 7.38, 7.47, 8.33, 8.71, 8.92, 8.93, 8.98, 8.99, 16.98, 16.99
Health and nutrition: 1.15, 1.18, 1.19, 1.73, 1.74, 1.75, 1.82, 1.97, 1.144, 1.145, 1.146, 1.151, 1.157, 1.158, 1.169, 1.170, 1.171, 1.172, 2.2, 2.4, 2.8, 2.18, 2.19, 2.20, 2.35, 2.43, 2.44, 2.62, 2.63, 2.64, 2.66, 2.67, 2.68, 2.69, 2.70, 2.86, 2.92, 2.93, 2.94, 2.95, 2.108, 2.109, 2.115, 2.116, 2.117, 2.118, 2.119, 2.120, 2.123, 2.127, 2.128, 2.129, 2.130, 2.135, 2.136, 2.139, 2.140, 2.141, 2.143, 2.144, 2.170, 2.171, 2.174, 2.178, 3.1, 3.8, 3.11, 3.14, 3.16, 3.18, 3.19, 3.21, 3.26, 3.36, 3.37, 3.38, 3.43, 3.45, 3.75, 3.96, 3.100, 3.101, 3.105, 3.106, 3.109, 3.110, 3.112, 3.114, 3.117, 3.124, 3.130, 3.131, 3.136, 4.29, 4.30, 4.35, 4.39, 4.42, 4.43, 4.44, 4.45, 4.79, 4.82, 4.85, 4.90, 4.107, 4.108, 4.109, 4.111, 4.112, 4.129, 4.130, 4.131, 4.148, 5.21, 5.25, 5.27, 5.32, 5.33, 5.50, 5.56, 5.65, 6.17, 6.18, 6.19, 6.20, 6.23, 6.26, 6.33, 6.34, 6.39, 6.60, 6.67, 6.72, 6.96, 6.99, 6.118, 6.124, 6.125, 6.129, 7.32, 7.34, 7.35, 7.36, 7.37, 7.43, 7.44, 7.48, 7.49, 7.53, 7.61, 7.74, 7.75, 7.76, 7.79, 7.80, 7.89, 7.92, 7.94, 7.98, 7.100, 7.101, 7.103, 7.104, 7.109, 7.121, 7.127, 7.134, 7.136, 7.140, 7.141, 7.142, 8.13, 8.16, 8.18, 8.23, 8.24, 8.42, 8.43, 8.58, 8.59, 8.74, 8.75, 8.76, 8.77, 8.89, 8.90, 9.4, 9.5, 9.6, 9.7, 9.8, 9.14, 9.15, 9.16, 9.17, 9.22, 10.2, 10.5, 10.6, 10.23, 10.42,
10.43, 10.50, 10.58, 10.60, 11.21, 11.36, 11.37, 11.38, 11.40, 11.41, 12.26, 12.31, 12.33, 12.36, 12.39, 12.45, 12.48, 12.49, 12.53, 12.68, 13.18, 13.22, 13.23, 13.24, 13.26, 13.33, 13.36, 13.37, 14.15, 14.16, 14.17, 14.18, 14.20, 14.21, 14.29, 14.30, 14.38, 14.39, 14.40, 14.43, 14.46, 15.7, 15.31, 15.36, 15.42, 15.43, 15.48, 15.49, 15.52, 16.58, 16.67, 16.76, 16.78, 16.84, 17.54, 17.78
10.61, 11.39, 12.32, 12.47, 13.15, 13.25, 13.38, 14.19, 14.37, 14.45, 15.41, 15.50, 16.77,
Humanities and social sciences: 1.76, 1.118, 1.119, 2.107, 2.122, 2.138, 2.139, 3.6, 3.61, 3.63, 3.72, 3.77, 3.78, 3.79, 3.97, 3.98, 3.99, 3.111, 3.115, 3.116, 3.120, 3.133, 3.137, 3.138, 4.12, 4.13, 4.27, 4.28, 4.32, 4.61, 4.65, 4.66, 4.118, 4.119, 4.120, 4.126, 4.127, 4.152, 5.30, 5.31, 5.60, 5.62, 5.66, 6.11, 6.17, 6.18, 6.55, 6.56, 6.62, 6.70, 6.124, 6.140, 7.12, 7.13, 7.14, 7.15, 7.29, 7.33, 7.34, 7.35, 7.55, 7.67, 7.71, 7.90, 7.125, 7.126, 7.128, 7.130, 7.131, 7.139, 8.2, 8.4, 8.12, 8.25, 8.26, 8.28, 8.35, 8.36, 8.71, 8.73, 8.92, 8.93, 8.99, 9.25, 9.26, 9.27, 9.28, 9.40, 9.44, 9.45, 9.49, 10.42, 10.43, 10.50, 10.58, 10.59, 11.19, 11.33, 11.34, 11.35, 12.22, 12.23, 12.24, 12.25, 12.34, 12.36, 12.37, 12.38, 12.46, 12.67, 13.7, 13.8, 13.9, 13.16, 13.17, 13.22, 13.23, 13.24, 13.25, 13.26, 13.27, 13.28, 13.31, 13.35, 13.41, 14.9, 14.10, 15.13, 15.14, 15.15, 15.30, 15.37, 15.38, 15.39, 15.40, 15.45, 16.8, 16.10, 16.40, 16.82, 16.83, 16.89, 16.90, 16.91, 16.92
International: 1.25, 1.26, 1.27, 1.28, 1.29, 1.39, 1.40, 1.41, 1.47, 1.49, 1.61, 1.62, 1.63, 1.73, 1.74, 1.75, 1.119, 1.159, 1.160, 1.161, 1.162, 1.163, 2.15, 2.16, 2.28, 2.29, 2.30, 2.48, 2.53, 2.105, 2.124, 2.146, 2.147, 2.149, 2.150, 2.151, 2.152, 2.153, 2.154, 3.48, 3.66, 4.30, 4.32, 5.52, 5.54, 5.55, 5.59, 5.61, 5.76, 6.72, 6.99, 7.32, 8.23, 8.24, 8.29, 8.60, 8.61, 8.62, 9.41, 9.42, 9.43, 10.39, 10.40, 10.41, 11.33, 11.34, 11.35, 12.37, 12.38, 13.19, 13.20, 13.21, 13.22, 13.23, 13.24, 13.25, 13.26, 14.22, 14.23, 14.35, 14.42, 15.40, 16.3, 16.4,
16.11, 16.16, 16.23, 16.30, 16.34, 16.48, 16.53, 16.57, 16.70
Science:
1.23, 1.24, 1.28, 1.29, 1.48, 1.51, 1.52, 1.54, 1.55, 1.56, 1.57, 1.60, 1.120, 1.121, 1.164, 1.173, 2.1, 2.5, 2.7, 2.21, 2.32, 2.45, 2.59, 2.73, 2.74, 2.75, 2.76, 2.77, 2.89, 2.90, 2.98, 2.99, 2.100, 2.101, 2.125, 2.137, 2.142, 2.157, 2.158, 2.159, 3.15, 3.20, 3.24, 3.25, 3.52, 3.53, 3.54, 3.55, 3.59, 3.74, 3.79, 3.82, 3.107, 3.129, 4.113, 4.114, 4.115, 4.116, 4.117, 4.122, 4.123, 4.124, 4.125, 4.140, 5.11, 5.14, 5.15, 5.24, 5.36, 5.37, 5.49, 5.51, 5.53, 5.56, 5.64, 5.65, 5.68, 5.87, 6.6, 6.7, 6.8, 6.12, 6.13, 6.14, 6.15, 6.21, 6.22, 6.27, 6.28, 6.38, 6.54, 6.55, 6.56, 6.61, 6.63, 6.71, 6.90, 6.119, 6.121, 6.123, 7.7, 7.26, 7.28, 7.42, 7.68, 7.69, 7.70, 7.113, 7.119, 7.125, 7.126, 7.134, 8.10, 8.11, 8.28, 8.29, 8.30, 8.31, 8.32, 8.33, 8.84, 8.85, 8.86, 8.87, 8.88, 8.94, 9.101, 10.2, 10.5, 10.6, 10.10, 10.11, 10.12, 10.13, 10.14, 10.23, 10.31, 10.42, 10.43, 11.1, 11.3, 11.5, 11.6, 11.13, 11.14, 11.15, 11.16, 11.17, 11.18, 11.21, 12.13, 12.15, 12.17, 12.19, 12.22, 12.43, 12.50, 12.63, 13.52, 13.53, 13.54, 13.55, 15.8, 15.9, 15.10, 15.11, 15.12, 15.13, 15.46, 16.19, 16.26, 16.27, 16.43, 16.45, 16.50, 16.51, 16.52, 16.56, 16.80, 16.81, 16.85, 16.86, 16.87, 16.88, 17.4
1.100, 2.22, 2.23, 2.46, 2.47, 2.71, 2.72, 2.96, 2.97, 5.38, 5.74, 5.82, 10.7
Technology and the Internet:
Law and government data: 1.10, 1.30, 1.31, 2.21, 2.160, 2.161, 3.6, 3.67, 3.69, 3.70, 3.111, 3.135, 4.56, 5.29, 5.80, 6.57, 7.113, 9.49, 10.7
Manufacturing, products, and processes: 2.173, 5.17, 5.19, 5.29, 5.50, 5.75, 5.84, 6.30, 6.31, 6.36, 6.39, 6.73, 6.75, 7.27, 7.30, 7.39, 7.40, 7.41, 7.51, 7.52, 10.30, 10.62, 11.53, 11.54, 11.55, 11.56, 11.57, 11.58, 11.59, 11.60, 11.61, 12.51, 12.52, 12.54, 12.55, 12.56, 12.57, 12.58, 12.59, 12.60, 13.7, 13.8, 13.39, 13.40, 15.47, 16.25, 16.37, 16.47, 16.49, 16.84, 17.5, 17.6, 17.7, 17.8, 17.9, 17.11, 17.12, 17.13, 17.14, 17.15, 17.16, 17.17, 17.18, 17.19, 17.20, 17.21, 17.22, 17.29, 17.30, 17.31, 17.32, 17.35, 17.37, 17.40, 17.41, 17.42, 17.43, 17.44, 17.45, 17.46, 17.47, 17.48, 17.49, 17.50, 17.51, 17.57, 17.58, 17.59, 17.60, 17.62, 17.63, 17.65, 17.66, 17.77, 17.80, 17.81, 17.84, 17.86, 17.87, 17.88, 17.89, 17.90, 17.91
Sports and leisure: 1.46, 2.19, 2.20, 2.36, 2.37, 2.44, 2.54, 2.67, 2.69, 2.70, 2.91, 2.95, 2.122, 2.148, 3.2, 3.27, 3.28, 3.29, 3.30, 3.31, 3.32, 3.38, 3.50, 3.51, 3.53, 4.5, 4.6, 4.7, 4.9, 4.16, 4.18, 4.34, 4.36, 4.37, 4.58, 4.59, 4.60, 4.80, 4.81, 4.86, 4.89, 4.95, 4.96, 4.97, 4.98, 4.101, 4.136, 4.137, 4.139, 4.143, 4.144, 4.147, 5.11, 5.26, 5.32, 5.33, 5.40, 5.52, 5.54, 5.55, 5.63, 5.79, 5.85, 5.90, 6.9, 6.10, 6.16, 6.27, 6.28, 6.56, 6.72, 6.99, 6.136, 7.7, 7.11, 7.32, 7.134, 8.60, 8.61, 8.62, 8.63, 8.64, 8.65, 8.66, 8.67, 8.82, 8.83, 8.100, 9.44, 9.45, 10.31, 12.13, 12.15, 12.17, 12.19, 14.1, 14.26, 14.34, 15.1, 15.2, 15.3, 15.4, 15.5, 15.6, 15.20, 15.21, 15.22, 15.23, 16.9, 16.12, 16.17, 16.22, 16.24, 16.46, 16.80, 16.81, 16.94, 16.95, 16.96, 16.97, 17.53
Students: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.11, 1.12, 1.13, 1.14, 1.17, 1.20, 1.21, 1.22,
1.1, 1.16, 1.28, 1.29, 1.36, 1.37, 1.38, 1.40, 1.41, 1.165, 1.166, 1.168, 2.5, 2.28, 2.29, 2.30, 2.105, 2.122, 2.137, 2.155, 2.156, 3.5, 3.10, 3.19, 3.22, 3.25, 3.26, 3.27, 3.28, 3.29, 3.30, 3.31, 3.42, 3.44, 3.50, 3.51, 3.64, 3.65, 3.80, 3.107, 3.119, 3.133, 4.11, 4.25, 4.26, 4.27, 4.28, 4.54, 4.55, 5.1, 5.8, 5.11, 5.12, 5.13, 5.16, 5.18, 5.20, 5.37, 5.52, 5.54, 5.55, 5.59, 5.61, 5.64, 5.76, 6.16, 6.57, 6.121, 7.26, 7.28, 7.58, 7.59, 7.68, 7.69, 7.70, 7.81, 7.82, 7.83, 7.128, 8.1, 8.2, 8.3, 8.4, 8.12, 8.15, 8.19, 8.25, 8.33, 8.60, 8.61, 8.62, 8.63, 8.64, 8.65, 8.66, 8.67, 8.72, 8.79, 8.80, 8.81, 8.82, 8.83, 9.1, 9.2, 9.3, 9.10, 9.33, 9.34, 9.35, 12.14, 12.16, 12.18, 12.20, 12.21, 12.22, 12.50, 12.63, 13.19, 13.20, 13.21, 13.34, 14.9, 14.10, 14.11, 14.13, 14.14, 14.22, 14.23, 14.27, 14.28, 14.42, 15.7, 15.30, 15.37, 15.38, 15.39, 16.10, 16.12, 16.17, 16.22, 16.24, 16.46, 16.61, 16.62, 16.63, 16.64, 16.95, 16.96, 16.97
to the
I N T RO D U C T I O N
PRACTICE
of
STATISTICS EIG HT H EDIT ION
David S. Moore George P. McCabe Bruce A. Craig Purdue University
W. H. Freeman and Company A Macmillan Higher Education Company
SENIOR PUBLISHER: Ruth Baruth ACQUISITIONS EDITOR: Karen Carson MARKETING MANAGER: Steve Thomas DEVELOPMENTAL EDITOR: Katrina Wilhelm SENIOR MEDIA EDITOR: Laura Judge MEDIA EDITOR: Catriona Kaplan ASSOCIATE EDITOR: Jorge Amaral ASSISTANT MEDIA EDITOR: Liam Ferguson PHOTO EDITOR: Cecilia Varas PHOTO RESEARCHER: Dena Digilio Betz COVER DESIGNER: Victoria Tomaselli
Text Designer: Patrice Sheridan PROJECT EDITOR: Elizabeth Geller ILLUSTRATIONS: Aptara®, Inc. PRODUCTION COORDINATOR: Lawrence Guerra COMPOSITION: Aptara®, Inc. PRINTING AND BINDING: QuadGraphics
Library of Congress Control Number: 2013953337 Student Edition Hardcover (packaged with EESEE/CrunchIt! access card): ISBN-13: 978-1-4641-5893-3 ISBN-10: 1-4641-5893-2 Student Edition Looseleaf (packaged with EESEE/CrunchIt! access card): ISBN-13: 978-1-4641-5897-1 ISBN-10: 1-4641-5897-5 Instructor Complimentary Copy: ISBN-13: 978-1-4641-3338-1 ISBN-10: 1-4641-3338-7 © 2014, 2012, 2009, 2006 by W. H. Freeman and Company All rights reserved Printed in the United States of America First printing W. H. Freeman and Company 41 Madison Avenue New York, NY 10010 Houndmills, Basingstoke RG21 6XS, England www.whfreeman.com
BRIEF CONTENTS To Teachers: About This Book
xiii
To Students: What Is Statistics?
xxiii
About the Authors
xxvii
Data Table Index
xxix
Beyond the Basics Index
xxxi
PART I CHAPTER 1
CHAPTER 2
CHAPTER 3
PART II
CHAPTER 4
Looking at Data Looking at Data— Distributions Looking at Data— Relationships Producing Data
1
81 167
Probability and Inference Probability: The Study of Randomness
PART III CHAPTER 9
Analysis of Two-Way Tables 529
CHAPTER 10
Inference for Regression
563
CHAPTER 11
Multiple Regression
611
CHAPTER 12
One-Way Analysis of Variance
643
Two-Way Analysis of Variance
691
CHAPTER 13
Companion Chapters (on the IPS website www.whfreeman.com/ips8e) CHAPTER 14
Logistic Regression
14-1
CHAPTER 15
Nonparametric Tests
15-1
CHAPTER 16
Bootstrap Methods and Permutation Tests
16-1
Statistics for Quality: Control and Capability
17-1
231 CHAPTER 17
CHAPTER 5
Sampling Distributions
301
CHAPTER 6
Introduction to Inference
351
CHAPTER 7
Inference for Distributions 417
CHAPTER 8
Inference for Proportions
Topics in Inference
487
Tables Answers to Odd-Numbered Exercises Notes and Data Sources Photo Credits Index
T-1 A-1 N-1 C-1 I-1
iii
CONTENTS To Teachers: About This Book To Students: What Is Statistics? About the Authors Data Table Index Beyond the Basics Index PART I
xiii xxiii xxvii xxix xxxi
Looking at Data
CHAPTER 1
Looking at Data—Distributions
1
Introduction
1
1.1 Data
2
Key characteristics of a data set Section 1.1 Summary Section 1.1 Exercises
1.2 Displaying Distributions with Graphs Categorical variables: bar graphs and pie charts Quantitative variables: stemplots Histograms Data analysis in action: Don’t hang up on me Examining distributions Dealing with outliers Time plots Section 1.2 Summary Section 1.2 Exercises
1.3 Describing Distributions with Numbers Measuring center: the mean Measuring center: the median Mean versus median Measuring spread: the quartiles The five-number summary and boxplots The 1.5 3 IQR rule for suspected outliers Measuring spread: the standard deviation Properties of the standard deviation Choosing measures of center and spread Changing the unit of measurement Section 1.3 Summary Section 1.3 Exercises
4 8 8
9 10 13 15 18 20 21 23 25 25
30 31 33 34 35 37 39 42 44 45 45 47 48
1.4 Density Curves and Normal Distributions Density curves Measuring center and spread for density curves Normal distributions The 68–95–99.7 rule Standardizing observations Normal distribution calculations Using the standard Normal table Inverse Normal calculations Normal quantile plots Beyond the Basics: Density estimation Section 1.4 Summary Section 1.4 Exercises Chapter 1 Exercises
53 55 56 58 59 61 63 65 67 68 71 72 72 77
CHAPTER 2
Looking at Data—Relationships
81
Introduction
81
2.1 Relationships
81
Examining relationships
83
Section 2.1 Summary Section 2.1 Exercises
86 86
2.2 Scatterplots
87
Interpreting scatterplots The log transformation Adding categorical variables to scatterplots
89 93 94
Beyond the Basics: Scatterplot smoothers
94
Categorical explanatory variables
97
Section 2.2 Summary Section 2.2 Exercises
97 98
2.3 Correlation
103
The correlation r Properties of correlation Section 2.3 Summary Section 2.3 Exercises
103 105 107 107
2.4 Least-Squares Regression
109
Fitting a Line to Data Prediction Least-squares regression Interpreting the regression line
110 112 113 116 v
vi
Contents Facts about least-squares regression Correlation and regression Another view of r2
Section 2.4 Summary Section 2.4 Exercises
2.5 Cautions about Correlation and Regression Residuals Outliers and influential observations Beware of the lurking variable Beware of correlations based on averaged data Beware of restricted ranges Beyond the Basics: Data mining Section 2.5 Summary Section 2.5 Exercises
2.6 Data Analysis for Two-Way Tables The two-way table Joint distribution Marginal distributions Describing relations in two-way tables Conditional distributions Simpson’s paradox Section 2.6 Summary Section 2.6 Exercises
2.7 The Question of Causation Explaining association Establishing causation Section 2.7 Summary Section 2.7 Exercises Chapter 2 Exercises
118 119 120 121 122
126 126 130 132 134 134 135 136 136
139 139 141 142 143 144 146 148 148
152 152 154 156 156 157
How to randomize Cautions about experimentation Matched pairs designs Block designs Section 3.2 Summary Section 3.2 Exercises
3.3 Sampling Design Simple random samples Stratified random samples Multistage random samples Cautions about sample surveys Section 3.3 Summary Section 3.3 Exercises
3.4 Toward Statistical Inference Sampling variability Sampling distributions Bias and variability Sampling from large populations Why randomize? Beyond the Basics: Capture-recapture sampling Section 3.4 Summary Section 3.4 Exercises
3.5 Ethics Institutional review boards Informed consent Confidentiality Clinical trials Behavioral and social science experiments Section 3.5 Summary Section 3.5 Exercises Chapter 3 Exercises
CHAPTER 3 PART II
181 185 186 187 188 189
192 194 196 197 198 201 201
205 207 208 210 212 213 214 215 215
217 219 220 220 222 224 226 226 228
Probability and Inference
Producing Data
167
Introduction
167
CHAPTER 4
3.1 Sources of Data
168 168 169 171
Probability: The Study of Randomness
231
Introduction
231
174 174
4.1 Randomness
231
Anecdotal data Available data Sample surveys and experiments Section 3.1 Summary Section 3.1 Exercises
3.2 Design of Experiments Comparative experiments Randomization Randomized comparative experiments
175 178 179 181
The language of probability Thinking about randomness The uses of probability Section 4.1 Summary Section 4.1 Exercises
233 234 235 235 236
Contents
4.2 Probability Models Sample spaces Probability rules Assigning probabilities: finite number of outcomes Assigning probabilities: equally likely outcomes Independence and the multiplication rule Applying the probability rules Section 4.2 Summary Section 4.2 Exercises
4.3 Random Variables Discrete random variables Continuous random variables Normal distributions as probability distributions Section 4.3 Summary Section 4.3 Exercises
4.4 Means and Variances of Random Variables The mean of a random variable Statistical estimation and the law of large numbers Thinking about the law of large numbers Beyond the Basics: More laws of large numbers
Rules for means The variance of a random variable Rules for variances and standard deviations Section 4.4 Summary Section 4.4 Exercises
4.5 General Probability Rules General addition rules Conditional probability General multiplication rules Tree diagrams Bayes’s rule Independence again Section 4.5 Summary Section 4.5 Exercises Chapter 4 Exercises
236 237 240 242 243 244 248 248 249
252 253 256 259 260 261
263
The central limit theorem A few more facts Beyond the Basics: Weibull distributions Section 5.1 Summary Section 5.1 Exercises
5.2 Sampling Distributions for Counts and Proportions The binomial distributions for sample counts Binomial distributions in statistical sampling Finding binomial probabilities Binomial mean and standard deviation Sample proportions Normal approximation for counts and proportions The continuity correction Binomial formula The Poisson distributions Section 5.2 Summary Section 5.2 Exercises Chapter 5 Exercises
vii 307 313 315 316 316
320 322 324 325 328 329 331 335 336 339 343 344 349
264 267 269
CHAPTER 6
Introduction to Inference
351
271 273 275
Introduction
351
279 280
6.1 Estimating with Confidence
270
282 283 286 289 290 292 293 294 294 297
CHAPTER 5
Sampling Distributions
301
Introduction
301
5.1 The Sampling Distribution of a Sample Mean
303
The mean and standard deviation of x
305
Overview of inference Statistical confidence Confidence intervals Confidence interval for a population mean How confidence intervals behave Choosing the sample size Some cautions
352
353 354 356 358 362 363 365
Beyond the Basics: The bootstrap Section 6.1 Summary Section 6.1 Exercises
367 368 368
6.2 Tests of Significance
372
The reasoning of significance tests Stating hypotheses Test statistics P-values Statistical significance Tests for a population mean Two-sided significance tests and confidence intervals The P-value versus a statement of significance Section 6.2 Summary Section 6.2 Exercises
372 374 375 377 378 382 386 387 390 390
viii
Contents
6.3 Use and Abuse of Tests Choosing a level of significance What statistical significance does not mean Don’t ignore lack of significance Statistical inference is not valid for all sets of data Beware of searching for significance Section 6.3 Summary Section 6.3 Exercises
6.4 Power and Inference as a Decision Power Increasing the power Inference as decision Two types of error Error probabilities The common practice of testing hypotheses Section 6.4 Summary Section 6.4 Exercises Chapter 6 Exercises
394 395 396 397 398 399 400 400
402 405 406 407 408 410 411 411 413
Inference for Distributions
417
Introduction
417
The t distributions The one-sample t confidence interval The one-sample t test Matched pairs t procedures Robustness of the t procedures The power of the t test Inference for non-Normal populations Section 7.1 Summary Section 7.1 Exercises
7.2 Comparing Two Means The two-sample z statistic The two-sample t procedures The two-sample t confidence interval The two-sample t significance test Robustness of the two-sample procedures Inference for small samples Software approximation for the degrees of freedom The pooled two-sample t procedures Section 7.2 Summary Section 7.2 Exercises
Inference for population spread The F test for equality of spread Robustness of Normal inference procedures The power of the two-sample t test Section 7.3 Summary Section 7.3 Exercises Chapter 7 Exercises
473 474 474 477 477 479 480 481
402
CHAPTER 7
7.1 Inference for the Mean of a Population
7.3 Other Topics in Comparing Distributions
418 418 420 422 429 432 434 436
CHAPTER 8
Inference for Proportions
487
Introduction
487
8.1 Inference for a Single Proportion
488
Large-sample confidence interval for a single proportion
489
Beyond the Basics: The plus four confidence interval for a single proportion
493
Significance test for a single proportion Choosing a sample size
495 500
Section 8.1 Summary Section 8.1 Exercises
503 504
8.2 Comparing Two Proportions
508
Large-sample confidence interval for a difference in proportions
509
Beyond the Basics: Plus four confidence interval for a difference in proportions
Significance test for a difference in proportions Beyond the Basics: Relative risk Section 8.2 Summary Section 8.2 Exercises Chapter 8 Exercises
514
516 520 521 522 525
440 441
447 448 450 451 454 455 457 460 461 466 467
PART III
Topics in Inference
CHAPTER 9
Analysis of Two-Way Tables
529
Introduction
529
9.1 Inference for Two-Way Tables
530
The hypothesis: no association Expected cell counts The chi-square test Computations
536 537 537 540
Contents Computing conditional distributions The chi-square test and the z test Models for two-way tables
541 544 545
Beyond the Basics: Meta-analysis Section 9.1 Summary
548 550
9.2 Goodness of Fit
551
Section 9.2 Summary Chapter 9 Exercises
556 557
CHAPTER 10
Inference for Regression
563
Introduction
563
10.1 Simple Linear Regression
564
Statistical model for linear regression Data for simple linear regression Estimating the regression parameters Confidence intervals and significance tests Confidence intervals for mean response Prediction intervals Transforming variables Beyond the Basics: Nonlinear regression Section 10.1 Summary
10.2 More Detail about Simple Linear Regression Analysis of variance for regression The ANOVA F test Calculations for regression inference Inference for correlation Section 10.2 Summary Chapter 10 Exercises
564 566 568 574 576 578 580 582 584
585 586 588 590 597 599 600
CHAPTER 11
Multiple Regression
611
Introduction
611
11.1 Inference for Multiple Regression
612
Population multiple regression equation Data for multiple regression Multiple linear regression model Estimation of the multiple regression parameters Confidence intervals and significance tests for regression coefficients ANOVA table for multiple regression Squared multiple correlation R2
612 613 614 615 616 617 618
11.2 A Case Study Preliminary analysis Relationships between pairs of variables Regression on high school grades Interpretation of results Residuals Refining the model Regression on SAT scores Regression using all variables Test for a collection of regression coefficients Beyond the Basics: Multiple logistic regression Chapter 11 Summary Chapter 11 Exercises
ix
619 619 621 623 624 625 625 627 628 630 631 633 634
CHAPTER 12
One-Way Analysis of Variance
643
Introduction
643
12.1 Inference for One-Way Analysis of Variance
644
Data for one-way ANOVA Comparing means The two-sample t statistic An overview of ANOVA The ANOVA model Estimates of population parameters Testing hypotheses in one-way ANOVA The ANOVA table The F test
644 645 647 647 651 653 655 657 660
12.2 Comparing the Means Contrasts Multiple comparisons Software Power Chapter 12 Summary Chapter 12 Exercises
663 663 668 673 675 677 678
CHAPTER 13
Two-Way Analysis of Variance
691
Introduction
691
13.1 The Two-Way ANOVA Model
692
Advantages of two-way ANOVA The two-way ANOVA model Main effects and interactions
692 696 697
x
Contents
13.2 Inference for Two-Way ANOVA
702
15.3 The Kruskal-Wallis Test
15-28
The ANOVA table for two-way ANOVA
702
Hypotheses and assumptions The Kruskal-Wallis test
15-29 15-29
Section 15.3 Summary Section 15.3 Exercises Chapter 15 Exercises Chapter 15 Notes and Data Sources
15-33 15-33 15-35 15-36
Chapter 13 Summary Chapter 13 Exercises
708 708
Companion Chapters (on the IPS website www.whfreeman.com/ips8e)
CHAPTER 16
CHAPTER 14
Logistic Regression
14-1
Introduction
14-1
14.1 The Logistic Regression Model
14-2
Binomial distributions and odds Odds for two groups Model for logistic regression Fitting and interpreting the logistic regression model
14-2 14-3 14-5
14.2 Inference for Logistic Regression
14-9
Confidence intervals and significance tests Multiple logistic regression
14-10 14-16
Chapter 14 Summary Chapter 14 Exercises Chapter 14 Notes and Data Sources
14-6
14-19 14-20 14-27
CHAPTER 15
Nonparametric Tests
15-1
Introduction
15-1
15.1 The Wilcoxon Rank Sum Test
15-3
The rank transformation The Wilcoxon rank sum test The Normal approximation What hypotheses does Wilcoxon test? Ties Rank, t, and permutation tests
15-4 15-5 15-7 15-9 15-10 15-13
Section 15.1 Summary Section 15.1 Exercises
15-15 15-15
Bootstrap Methods and Permutation Tests
16-1
Introduction
16-1
Software
16-2
16.1 The Bootstrap Idea The big idea: resampling and the bootstrap distribution Thinking about the bootstrap idea Using software Section 16.1 Summary Section 16.1 Exercises
16.2 First Steps in Using the Bootstrap Bootstrap t confidence intervals Bootstrapping to compare two groups Beyond the Basics: The bootstrap for a scatterplot smoother Section 16.2 Summary Section 16.2 Exercises
16.3 How Accurate Is a Bootstrap Distribution? Bootstrapping small samples Bootstrapping a sample median Section 16.3 Summary Section 16.3 Exercises
16-10 16-11
16-13 16-14 16-17 16-20 16-22 16-22
16-24 16-27 16-29 16-30 16-31
16-32
Bootstrap percentile confidence intervals A more accurate bootstrap confidence interval: BCa Confidence intervals for the correlation
16-32
15-18
The Normal approximation Ties Testing a hypothesis about the median of a distribution
15-22 15-23
Section 16.4 Summary Section 16.4 Exercises
15-25
16.5 Significance Testing Using Permutation Tests
15-25 15-25
16-4 16-8 16-9
16.4 Bootstrap Confidence Intervals
15.2 The Wilcoxon Signed Rank Test
Section 15.2 Summary Section 15.2 Exercises
16-3
Using software
16-34 16-36 16-38 16-38
16-42 16-46
Contents Permutation tests in practice Permutation tests in other settings Section 16.5 Summary Section 16.5 Exercises Chapter 16 Exercises Chapter 16 Notes and Data Sources
16-46 16-49 16-52 16-53 16-56 16-58
CHAPTER 17
Statistics for Quality: Control and Capability
17-1
Introduction
17-1
Use of data to assess quality
17-2
17.1 Processes and Statistical Process Control
17-3
Describing processes Statistical process control x charts for process monitoring s charts for process monitoring
17-3 17-6 17-8 17-12
Section 17.1 Summary Section 17.1 Exercises
17-17 17-18
17.2 Using Control Charts
17-22
x and R charts Additional out-of-control rules
17-23 17-24
Setting up control charts Comments on statistical control Don’t confuse control with capability! Section 17.2 Summary Section 17.2 Exercises
xi
17-26 17-31 17-34 17-35 17-36
17.3 Process Capability Indexes
17-41
The capability indexes Cp and Cpk Cautions about capability indexes
17-43 17-46
Section 17.3 Summary Section 17.3 Exercises
17.4 Control Charts for Sample Proportions Control limits for p charts Section 17.4 Summary Section 17.4 Exercises Chapter 17 Exercises Chapter 17 Notes and Data Sources
17-48 17-48
17-52 17-53 17-57 17-57 17-58 17-60
Tables
T-1
Answers to Odd-Numbered Exercises
A-1
Notes and Data Sources
N-1
Photo Credits
C-1
Index
I-1
TO T E A C H E R S About This Book
S
tatistics is the science of data. Introduction to the Practice of Statistics (IPS) is an introductory text based on this principle. We present methods of basic statistics in a way that emphasizes working with data and mastering statistical reasoning. IPS is elementary in mathematical level but conceptually rich in statistical ideas. After completing a course based on our text, we would like students to be able to think objectively about conclusions drawn from data and use statistical methods in their own work. In IPS we combine attention to basic statistical concepts with a comprehensive presentation of the elementary statistical methods that students will find useful in their work. IPS has been successful for several reasons: 1. IPS examines the nature of modern statistical practice at a level suitable for beginners. We focus on the production and analysis of data as well as the traditional topics of probability and inference. 2. IPS has a logical overall progression, so data production and data analysis are a major focus, while inference is treated as a tool that helps us draw conclusions from data in an appropriate way. 3. IPS presents data analysis as more than a collection of techniques for exploring data. We emphasize systematic ways of thinking about data. Simple principles guide the analysis: always plot your data; look for overall patterns and deviations from them; when looking at the overall pattern of a distribution for one variable, consider shape, center, and spread; for relations between two variables, consider form, direction, and strength; always ask whether a relationship between variables is influenced by other variables lurking in the background. We warn students about pitfalls in clear cautionary discussions. 4. IPS uses real examples to drive the exposition. Students learn the technique of least-squares regression and how to interpret the regression slope. But they also learn the conceptual ties between regression and correlation and the importance of looking for influential observations. 5. IPS is aware of current developments both in statistical science and in teaching statistics. Brief optional Beyond the Basics sections give quick overviews of topics such as density estimation, scatterplot smoothers, data mining, nonlinear regression, and meta-analysis. Chapter 16 gives an elementary introduction to the bootstrap and other computer-intensive statistical methods. The title of the book expresses our intent to introduce readers to statistics as it is used in practice. Statistics in practice is concerned with drawing conclusions from data. We focus on problem solving rather than on methods that may be useful in specific settings.
GAISE The College Report of the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Project (http://www.amstat.org/education/ gaise/) was funded by the American Statistical Association to make recommendations for how introductory statistics courses should be taught. This report contains many interesting teaching suggestions and we strongly recommend xiii
xiv
To Teachers: About This Book that you read it. The philosophy and approach of IPS closely reflect the GAISE recommendations. Let’s examine each of the recommendations in the context of IPS. 1. Emphasize statistical literacy and develop statistical thinking. Through our experiences as applied statisticians, we are very familiar with the components that are needed for the appropriate use of statistical methods. We focus on collecting and finding data, evaluating the quality of data, performing statistical analyses, and drawing conclusions. In examples and exercises throughout the text, we emphasize putting the analysis in the proper context and translating numerical and graphical summaries into conclusions. 2. Use real data. Many of the examples and exercises in IPS include data that we have obtained from collaborators or consulting clients. Other data sets have come from research related to these activities. We have also used the Internet as a data source, particularly for data related to social media and other topics of interest to undergraduates. With our emphasis on real data, rather than artificial data chosen to illustrate a calculation, we frequently encounter interesting issues that we explore. These include outliers and nonlinear relationships. All data sets are available from the text website. 3. Stress conceptual understanding rather than mere knowledge of procedures. With the software available today, it is very easy for almost anyone to apply a wide variety of statistical procedures, both simple and complex, to a set of data. Without a firm grasp of the concepts, such applications are frequently meaningless. By using the methods that we present on real sets of data, we believe that students will gain an excellent understanding of these concepts. Our emphasis is on the input (questions of interest, collecting or finding data, examining data) and the output (conclusions) for a statistical analysis. Formulas are given only where they will provide some insight into concepts. 4. Foster active learning in the classroom. As we mentioned above, we believe that statistics is exciting as something to do rather than something to talk about. Throughout the text we provide exercises in Use Your Knowledge sections that ask the students to perform some relatively simple tasks that reinforce the material just presented. Other exercises are particularly suited to being worked and discussed within a classroom setting. 5. Use technology for developing concepts and analyzing data. Technology has altered statistical practice in a fundamental way. In the past, some of the calculations that we performed were particularly difficult and tedious. In other words, they were not fun. Today, freed from the burden of computation by software, we can concentrate our efforts on the big picture: what questions are we trying to address with a study and what can we conclude from our analysis? 6. Use assessments to improve and evaluate student learning. Our goal for students who complete a course based on IPS is that they are able to design and carry out a statistical study for a project in their capstone course or other setting. Our exercises are oriented toward this goal. Many ask about the design of a statistical study and the collection of data. Others ask for a paragraph summarizing the results of an analysis. This recommendation includes the use of projects, oral presentations, article
To Teachers: About This Book
xv
critiques, and written reports. We believe that students using this text will be well prepared to undertake these kinds of activities. Furthermore, we view these activities not only as assessments but also as valuable tools for learning statistics.
Teaching Recommendations We have used IPS in courses taught to a variety of student audiences. For general undergraduates from mixed disciplines, we recommend covering Chapters 1 to 8 and Chapters 9, 10, or 12. For a quantitatively strong audience—sophomores planning to major in actuarial science or statistics—we recommend moving more quickly. Add Chapters 10 and 11 to the core material in Chapters 1 to 8. In general, we recommend de-emphasizing the material on probability because these students will take a probability course later in their program. For beginning graduate students in such fields as education, family studies, and retailing, we recommend that the students read the entire text (Chapters 11 and 13 lightly), again with reduced emphasis on Chapter 4 and some parts of Chapter 5. In all cases, beginning with data analysis and data production (Part I) helps students overcome their fear of statistics and builds a sound base for studying inference. We believe that IPS can easily be adapted to a wide variety of audiences.
The Eighth Edition: What’s New? • Text Organization Each section now begins with the phrase “When you complete this section, you will be able to” followed by a bulleted list of behavioral objectives that the students should be able to master. Exercises that focus on these objectives appear at the beginning of the section exercises. The long introduction to Chapter 1 has been replaced by a short introduction and a new section titled “Data,” which gives an overview of the basic ideas on the key characteristics of a set of data. The same approach has been taken with Chapters 2 and 3, which now have new sections titled “Relationships” and “Sources of Data,” respectively. A short introduction to the Poisson distribution has been added to Section 5.2. Sections 9.1 and 9.2 have been combined with a more concise presentation of the material on computation and models from Section 5.2 of the seventh edition. In Chapter 16, the use of S-PLUS software has been replaced by R. Sections previously marked as optional are no longer given this designation. We have found that instructors make a variety of choices regarding what to include in their courses. General guidelines for different types of students are given in the Teaching Recommendations paragraph above. • Design A new design incorporates colorful, revised figures throughout to aid the students’ understanding of text material. Photographs related to chapter examples and exercises make connections to real-life applications and provide a visual context for topics. More figures with software output have been included. • Exercises and Examples Over 50% of the exercises are new or revised. There are more than 1700 exercises, a slight increase over the total in the seventh edition. To maintain the attractiveness of the examples to students, we have replaced or updated a large number of them. Over 35% of the 422 examples are new or revised. A list of exercises and examples categorized by application area is provided on the inside of the front cover.
xvi
To Teachers: About This Book In addition to the new eighth edition enhancements, IPS has retained the successful pedagogical features from previous editions: CHALLENGE
LOOK BACK
• Look Back At key points in the text, Look Back margin notes direct the reader to the first explanation of a topic, providing page numbers for easy reference. • Caution Warnings in the text, signaled by a caution icon, help students avoid common errors and misconceptions.
CHALLENGE
• Challenge Exercises More challenging exercises are signaled with an icon. Challenge exercises are varied: some are mathematical, some require open-ended investigation, and others require deeper thought about the basic concepts. • Applets Applet icons are used throughout the text to signal where related interactive statistical applets can be found on the IPS website.
USE YOUR KNOWLEDGE
• Use Your Knowledge Exercises We have found these to be a very useful learning tool. Therefore, we have increased the number and variety of these exercises. These exercises are listed, with page numbers, before the sectionending exercises.
Acknowledgments We are pleased that the first seven editions of Introduction to the Practice of Statistics have helped to move the teaching of introductory statistics in a direction supported by most statisticians. We are grateful to the many colleagues and students who have provided helpful comments, and we hope that they will find this new edition another step forward. In particular, we would like to thank the following colleagues who offered specific comments on the new edition: Ali Arab Georgetown University Sanjib Basu Northern Illinois University Mary Ellen Bock Purdue University Max Buot Xavier University Jerry J. Chen Suffolk Community College Pinyuen Chen Syracuse University Scott Crawford University of Wyoming Carolyn K. Cuff Westminster College K. L. D. Gunawardena University of Wisconsin–Oshkosh C. Clinton Harshaw Presbyterian College James Helmreich Marist College Ulrich Hoensch Rocky Mountain College Jeff Hovermill Northern Arizona University Debra L. Hydorn University of Mary Washington
Monica Jackson American University Tiffany Kolba Valparaiso University Sharon Navard College of New Jersey Ronald C. Neath Hunter College of CUNY Esther M. Pearson Lasell College Thomas Pfaff Ithaca College Kathryn Prewitt Arizona State University Robert L. Sims George Mason University Thomas M. Songer Portland Community College Haiyan Su Montclair State University Anatoliy Swishchuk University of Calgary Frederick C. Tinsley Colorado College Terri Torres Oregon Institute of Technology
To Teachers: About This Book
xvii
The professionals at W. H. Freeman and Company, in particular Ruth Baruth, Karen Carson, Katrina Wilhelm, Liam Ferguson, Elizabeth Geller, Vicki Tomaselli, and Lawrence Guerra, have contributed greatly to the success of IPS. In addition, we would like to thank Pamela Bruton, Jackie Miller, and Patricia Humphrey for their valuable contributions to the eighth edition. Most of all, we are grateful to the many friends and collaborators whose data and research questions have enabled us to gain a deeper understanding of the science of data. Finally, we would like to acknowledge the contributions of John W. Tukey, whose contributions to data analysis have had such a great influence on us as well as a whole generation of applied statisticians.
MEDIA AND SUPPLEMENTS W. H. Freeman’s new online homework system, LaunchPad, offers our quality content curated and organized for easy assignability in a simple but powerful interface. We’ve taken what we’ve learned from thousands of instructors and hundreds of thousands of students to create a new generation of W. H. Freeman/Macmillan technology. Curated Units. Combining a curated collection of videos, homework sets, tutorials, applets, and e-Book content, LaunchPad’s interactive units give you a building block to use as is or as a starting point for your own learning units. Thousands of exercises from the text can be assigned as online homework, including many algorithmic exercises. An entire unit’s worth of work can be assigned in seconds, drastically reducing the amount of time it takes for you to have your course up and running. Easily customizable. You can customize the LaunchPad Units by adding quizzes and other activities from our vast wealth of resources. You can also add a discussion board, a dropbox, and RSS feed, with a few clicks. LaunchPad allows you to customize your students’ experience as much or as little as you like. Useful analytics. The gradebook quickly and easily allows you to look up performance metrics for classes, individual students, and individual assignments. Intuitive interface and design. The student experience is simplified. Students’ navigation options and expectations are clearly laid out at all times, ensuring they can never get lost in the system.
Assets integrated into LaunchPad include: Interactive e-Book. Every LaunchPad e-Book comes with powerful study tools for students, video and multimedia content, and easy customization for instructors. Students can search, highlight, and bookmark, making it easier to study and access key content. And teachers can ensure that their classes get just the book they want to deliver: customize and rearrange chapters, add and share notes and discussions, and link to quizzes, activities, and other resources. LearningCurve provides students and instructors with powerful adaptive quizzing, a game-like format, direct links to the e-Book, and instant feedback. The quizzing system features questions tailored specifically to the text and adapts to students’ responses, providing material at different difficulty levels and topics based on student performance. SolutionMaster offers an easy-to-use web-based version of the instructor’s solutions, allowing instructors to generate a solution file for any set of homework exercises. New Stepped Tutorials are centered on algorithmically generated quizzing with step-by-step feedback to help students work their way toward the correct solution. These new exercise tutorials (two to three per chapter) are easily assignable and assessable. xviii
To Teachers: About This Book
xix
CHALLENGE
Statistical Video Series consists of StatClips, StatClips Examples, and Statistically Speaking “Snapshots.” View animated lecture videos, whiteboard lessons, and documentary-style footage that illustrate key statistical concepts and help students visualize statistics in real-world scenarios. New Video Technology Manuals available for TI-83/84 calculators, Minitab, Excel, JMP, SPSS, R, Rcmdr, and CrunchIT!® provide brief instructions for using specific statistical software. Updated StatTutor Tutorials offer multimedia tutorials that explore important concepts and procedures in a presentation that combines video, audio, and interactive features. The newly revised format includes built-in, assignable assessments and a bright new interface. Updated Statistical Applets give students hands-on opportunities to familiarize themselves with important statistical concepts and procedures, in an interactive setting that allows them to manipulate variables and see the results graphically. Icons in the textbook indicate when an applet is available for the material being covered. CrunchIT!® is a web-based statistical program that allows users to perform all the statistical operations and graphing needed for an introductory statistics course and more. It saves users time by automatically loading data from IPS 8e, and it provides the flexibility to edit and import additional data. Stats@Work Simulations put students in the role of the statistical consultant, helping them better understand statistics interactively within the context of real-life scenarios. EESEE Case Studies (Electronic Encyclopedia of Statistical Examples and Exercises), developed by The Ohio State University Statistics Department, teach students to apply their statistical skills by exploring actual case studies using real data. Data files are available in ASCII, Excel, TI, Minitab, SPSS (an IBM Company),* and JMP formats. Student Solutions Manual provides solutions to the odd-numbered exercises in the text. Available electronically within LaunchPad, as well as in print form. Interactive Table Reader allows students to use statistical tables interactively to seek the information they need. Instructor’s Guide with Full Solutions includes teaching suggestions, chapter comments, and detailed solutions to all exercises. Available electronically within LaunchPad, as well as on the IRCD and in print form. Test Bank offers hundreds of multiple-choice questions. Also available on CD-ROM (for Windows and Mac), where questions can be downloaded, edited, and resequenced to suit each instructor’s needs. Lecture PowerPoint Slides offer a detailed lecture presentation of statistical concepts covered in each chapter of IPS. *SPSS was acquired by IBM in October 2009.
xx
To Teachers: About This Book
Additional Resources Available with IPS 8e Companion Website www.whfreeman.com/ips8e This open-access website includes statistical applets, data files, supplementary exercises, and self-quizzes. The website also offers four optional companion chapters covering logistic regression, nonparametric tests, bootstrap methods and permutation tests, and statistics for quality control and capability. Instructor access to the Companion Website requires user registration as an instructor and features all of the open-access student web materials, plus: • Instructor version of EESEE with solutions to the exercises in the student version. • PowerPoint Slides containing all textbook figures and tables. • Lecture PowerPoint Slides
Special Software Packages Student versions of JMP and Minitab are available for packaging with the text. Contact your W. H. Freeman representative for information or visit www.whfreeman.com.
Enhanced Instructor’s Resource CD-ROM, ISBN: 1-4641-3360-3 Allows instructors to search and export (by key term or chapter) all the resources available on the student companion website plus the following: • All text images and tables • Instructor’s Guide with Full Solutions • PowerPoint files and lecture slides • Test Bank files
Course Management Systems W. H. Freeman and Company provides courses for Blackboard, Angel, Desire2Learn, Canvas, Moodle, and Sakai course management systems. These are completely integrated solutions that you can easily customize and adapt to meet your teaching goals and course objectives. Visit macmillanhighered.com/Catalog/other/Coursepack for more information. iClicker is a two-way radio-frequency classroom response solution developed by educators for educators. Each step of i-clicker’s development has been informed by teaching and learning. To learn more about packaging i-clicker with this textbook, please contact your local sales rep or visit www.iclicker. com.
TO S T U D E N T S What Is Statistics?
S
tatistics is the science of collecting, organizing, and interpreting numerical facts, which we call data. We are bombarded by data in our everyday lives. The news mentions movie box-office sales, the latest poll of the president’s popularity, and the average high temperature for today’s date. Advertisements claim that data show the superiority of the advertiser’s product. All sides in public debates about economics, education, and social policy argue from data. A knowledge of statistics helps separate sense from nonsense in this flood of data. The study and collection of data are also important in the work of many professions, so training in the science of statistics is valuable preparation for a variety of careers. Each month, for example, government statistical offices release the latest numerical information on unemployment and inflation. Economists and financial advisers, as well as policy makers in government and business, study these data in order to make informed decisions. Doctors must understand the origin and trustworthiness of the data that appear in medical journals. Politicians rely on data from polls of public opinion. Business decisions are based on market research data that reveal consumer tastes and preferences. Engineers gather data on the quality and reliability of manufactured products. Most areas of academic study make use of numbers and, therefore, also make use of the methods of statistics. This means it is extremely likely that your undergraduate research projects will involve, at some level, the use of statistics.
Learning from Data The goal of statistics is to learn from data. To learn, we often perform calculations or make graphs based on a set of numbers. But to learn from data, we must do more than calculate and plot, because data are not just numbers; they are numbers that have some context that helps us learn from them. Two-thirds of Americans are overweight or obese according to the Centers for Disease Control and Prevention (CDC) website (www.cdc.gov/nchs/nhanes. htm). What does it mean to be obese or to be overweight? To answer this question we need to talk about body mass index (BMI). Your weight in kilograms divided by the square of your height in meters is your BMI. A man who is 6 feet tall (1.83 meters) and weighs 180 pounds (81.65 kilograms) will have a BMI of 81.65/(1.83)2 5 24.4 kg/m2. How do we interpret this number? According to the CDC, a person is classified as overweight if his or her BMI is between 25 and 29 kg/m2 and as obese if his or her BMI is 30 kg/m2 or more. Therefore, twothirds of Americans have a BMI of 25 kg/m2 or more. The man who weighs 180 pounds and is 6 feet tall is not overweight or obese, but if he gains 5 pounds, his BMI would increase to 25.1, and he would be classified as overweight. When you do statistical problems, even straightforward textbook problems, don’t just graph or calculate. Think about the context and state your conclusions in the specific setting of the problem. As you are learning how to do statistical calculations and graphs, remember that the goal of statistics is not calculation for its own sake but gaining understanding from numbers. The calculations and graphs can be automated by a calculator or software, xxi
xxii
To Students: What Is Statistics? but you must supply the understanding. This book presents only the most common specific procedures for statistical analysis. A thorough grasp of the principles of statistics will enable you to quickly learn more advanced methods as needed. On the other hand, a fancy computer analysis carried out without attention to basic principles will often produce elaborate nonsense. As you read, seek to understand the principles as well as the necessary details of methods and recipes.
The Rise of Statistics Historically, the ideas and methods of statistics developed gradually as society grew interested in collecting and using data for a variety of applications. The earliest origins of statistics lie in the desire of rulers to count the number of inhabitants or measure the value of taxable land in their domains. As the physical sciences developed in the seventeenth and eighteenth centuries, the importance of careful measurements of weights, distances, and other physical quantities grew. Astronomers and surveyors striving for exactness had to deal with variation in their measurements. Many measurements should be better than a single measurement, even though they vary among themselves. How can we best combine many varying observations? Statistical methods that are still important were invented in order to analyze scientific measurements. By the nineteenth century, the agricultural, life, and behavioral sciences also began to rely on data to answer fundamental questions. How are the heights of parents and children related? Does a new variety of wheat produce higher yields than the old, and under what conditions of rainfall and fertilizer? Can a person’s mental ability and behavior be measured just as we measure height and reaction time? Effective methods for dealing with such questions developed slowly and with much debate. As methods for producing and understanding data grew in number and sophistication, the new discipline of statistics took shape in the twentieth century. Ideas and techniques that originated in the collection of government data, in the study of astronomical or biological measurements, and in the attempt to understand heredity or intelligence came together to form a unified “science of data.” That science of data—statistics—is the topic of this text.
The Organization of This Book Part I of this book, called simply “Looking at Data,” concerns data analysis and data production. The first two chapters deal with statistical methods for organizing and describing data. These chapters progress from simpler to more complex data. Chapter 1 examines data on a single variable, Chapter 2 is devoted to relationships among two or more variables. You will learn both how to examine data produced by others and how to organize and summarize your own data. These summaries will first be graphical, then numerical, and then, when appropriate, in the form of a mathematical model that gives a compact description of the overall pattern of the data. Chapter 3 outlines arrangements (called designs) for producing data that answer specific questions. The principles presented in this chapter will help you to design proper samples and experiments for your research projects and to evaluate other such investigations in your field of study.
To Students: What Is Statistics?
xxiii
Part II, consisting of Chapters 4 to 8, introduces statistical inference— formal methods for drawing conclusions from properly produced data. Statistical inference uses the language of probability to describe how reliable its conclusions are, so some basic facts about probability are needed to understand inference. Probability is the subject of Chapters 4 and 5. Chapter 6, perhaps the most important chapter in the text, introduces the reasoning of statistical inference. Effective inference is based on good procedures for producing data (Chapter 3), careful examination of the data (Chapters 1 and 2), and an understanding of the nature of statistical inference as discussed in Chapter 6. Chapters 7 and 8 describe some of the most common specific methods of inference, for drawing conclusions about means and proportions from one and two samples. The five shorter chapters in Part III introduce somewhat more advanced methods of inference, dealing with relations in categorical data, regression and correlation, and analysis of variance. Four supplementary chapters, available from the text website, present additional statistical topics.
What Lies Ahead Introduction to the Practice of Statistics is full of data from many different areas of life and study. Many exercises ask you to express briefly some understanding gained from the data. In practice, you would know much more about the background of the data you work with and about the questions you hope the data will answer. No textbook can be fully realistic. But it is important to form the habit of asking, “What do the data tell me?” rather than just concentrating on making graphs and doing calculations. You should have some help in automating many of the graphs and calculations. You should certainly have a calculator with basic statistical functions. Look for keywords such as “two-variable statistics” or “regression” when you shop for a calculator. More advanced (and more expensive) calculators will do much more, including some statistical graphs. You may be asked to use software as well. There are many kinds of statistical software, from spreadsheets to large programs for advanced users of statistics. The kind of computing available to learners varies a great deal from place to place—but the big ideas of statistics don’t depend on any particular level of access to computing. Because graphing and calculating are automated in statistical practice, the most important assets you can gain from the study of statistics are an understanding of the big ideas and the beginnings of good judgment in working with data. Ideas and judgment can’t (at least yet) be automated. They guide you in telling the computer what to do and in interpreting its output. This book tries to explain the most important ideas of statistics, not just teach methods. Some examples of big ideas that you will meet are “always plot your data,” “randomized comparative experiments,” and “statistical significance.” You learn statistics by doing statistical problems. “Practice, practice, practice.” Be prepared to work problems. The basic principle of learning is persistence. Being organized and persistent is more helpful in reading this book than knowing lots of math. The main ideas of statistics, like the main ideas of any important subject, took a long time to discover and take some time to master. The gain will be worth the pain.
ABOUT THE AUTHORS David S. Moore is Shanti S. Gupta Distinguished Professor of Statistics, Emeritus, at Purdue University and was 1998 president of the American Statistical Association. He received his AB from Princeton and his PhD from Cornell, both in mathematics. He has written many research papers in statistical theory and served on the editorial boards of several major journals. Professor Moore is an elected fellow of the American Statistical Association and of the Institute of Mathematical Statistics and an elected member of the International Statistical Institute. He has served as program director for statistics and probability at the National Science Foundation. In recent years, Professor Moore has devoted his attention to the teaching of statistics. He was the content developer for the Annenberg/Corporation for Public Broadcasting college-level telecourse Against All Odds: Inside Statistics and for the series of video modules Statistics: Decisions through Data, intended to aid the teaching of statistics in schools. He is the author of influential articles on statistics education and of several leading texts. Professor Moore has served as president of the International Association for Statistical Education and has received the Mathematical Association of America’s national award for distinguished college or university teaching of mathematics.
George P. McCabe is Associate Dean for Academic Affairs in the College of Science and Professor of Statistics at Purdue University. In 1966, he received a BS degree in mathematics from Providence College and in 1970 a PhD in mathematical statistics from Columbia University. His entire professional career has been spent at Purdue, with sabbaticals at Princeton University, the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Melbourne (Australia), the University of Berne (Switzerland), the National Institute of Standards and Technology (NIST) in Boulder, Colorado, and the National University of Ireland in Galway. Professor McCabe is an elected fellow of the American Association for the Advancement of Science and of the American Statistical Association; he was 1998 Chair of its section on Statistical Consulting. In 2008–2010, he served on the Institute of Medicine Committee on Nutrition Standards for the National School Lunch and Breakfast Programs. He has served on the editorial boards of several statistics journals. He has consulted with many major corporations and has testified as an expert witness on the use of statistics in several cases. Professor McCabe’s research interests have focused on applications of statistics. Much of his recent work has focused on problems in nutrition, including nutrient requirements, calcium metabolism, and bone health. He is the author or coauthor of over 170 publications in many different journals.
xxv
xxvi
About the Authors
Bruce A. Craig is Professor of Statistics and Director of the Statistical Consulting Service at Purdue University. He received his BS in mathematics and economics from Washington University in St. Louis and his PhD in statistics from the University of Wisconsin–Madison. He is an elected fellow of the American Statistical Association and was chair of its section on Statistical Consulting in 2009. He is also an active member of the Eastern North American Region of the International Biometrics Society and was elected by the voting membership to the Regional Committee between 2003 and 2006. Professor Craig has served on the editorial board of several statistical journals and has been a member of several data and safety monitoring boards, including Purdue’s institutional review board. Professor Craig’s research interests focus on the development of novel statistical methodology to address research questions in the life sciences. Areas of current interest are protein structure determination, diagnostic testing, and animal abundance estimation. In 2005, he was named Purdue University Faculty Scholar.
D ATA TA B L E I N D E X IQ test scores for 60 randomly chosen fifth-grade students
16
Service times (seconds) for calls to a customer service center
19
TABLE 1.3
Educational data for 78 seventh-grade students
29
TABLE 2.1
World record times for the 10,000-meter run
102
TABLE 2.2
Four data sets for exploring correlation and regression
125
TABLE 2.3
Two measures of glucose level in diabetics
130
TABLE 2.4
Dwelling permits, sales, and production for 21 European countries
159
TABLE 2.5
Fruit and vegetable consumption and smoking
164
TABLE 7.1
Monthly rates of return on a portfolio (%)
425
TABLE 7.2
Aggressive behaviors of dementia patients
429
TABLE 7.3
Length (in seconds) of audio files sampled from an iPod
437
TABLE 7.4
DRP scores for third-graders
452
TABLE 7.5
Seated systolic blood pressure (mm Hg)
463
TABLE 10.1
In-state tuition and fees (in dollars) for 33 public universities
602
Sales price and assessed value (in $ thousands) of 30 homes in a midwestern city
604
Annual number of tornadoes in the United States between 1953 and 2012
605
TABLE 1.1 TABLE 1.2
TABLE 10.2 TABLE 10.3
2
Watershed area (km ), percent forest, and index of biotic integrity
606
TABLE 12.1
Age at death for North American women writers
686
TABLE 13.1
Safety behaviors of abused women
713
TABLE 13.2
Iron content (mg/100 g) of food cooked in different pots
716
TABLE 13.3
Tool diameter data
716
TABLE 16.1
Degree of Reading Power scores for third-graders
16-44
TABLE 16.2
Aggressive behaviors of dementia patients
16-50
TABLE 16.3
Serum retinol levels in two groups of children
16-55
TABLE 17.1
Twenty control chart samples of water resistance
17-10
TABLE 17.2
Control chart constants
17-14
TABLE 10.4
xxvii
xxviii
Data Table Index TABLE 17.3
Twenty samples of size 3, with x and s
17-19
TABLE 17.4
Three sets of x ’s from 20 samples of size 4
17-20
TABLE 17.5
Twenty samples of size 4, with x and s
17-21
TABLE 17.6
x and s for 24 samples of elastomer viscosity
17-27
TABLE 17.7
x and s for 24 samples of label placement
17-36
TABLE 17.8
x and s for 24 samples of label placement
17-37
TABLE 17.9
Hospital losses for 15 samples of DRG 209 patients
17-38
TABLE 17.10
Daily calibration samples for a Lunar bone densitometer 17-39
TABLE 17.11
x and s for samples of bore diameter
17-40
TABLE 17.12
Fifty control chart samples of call center response times
17-50
TABLE 17.13
Proportions of workers absent during four weeks
17-56
TABLE 17.14
x and s for samples of film thickness
17-59
B E YO N D T H E B A S I C S I N D E X CHAPTER 1
Density estimation
71
CHAPTER 2
Scatterplot smoothers
96
CHAPTER 2
Data mining
135
CHAPTER 3
Capture-recapture sampling
214
CHAPTER 4
More laws of large numbers
270
CHAPTER 5
Weibull distributions
315
CHAPTER 6
The bootstrap
367
CHAPTER 8
The plus four confidence interval for a single proportion
493
CHAPTER 8
The plus four confidence interval for a difference in proportions
493
CHAPTER 8
Relative risk
520
CHAPTER 9
Meta-analysis
548
CHAPTER 10
Nonlinear regression
582
CHAPTER 11
Multiple logistic regression
631
CHAPTER 16
The bootstrap for a scatterplot smoother
16-20
xxix
Looking at Data—Distributions Introduction Statistics is the science of learning from data. Data are numerical or qualitative descriptions of the objects that we want to study. In this chapter, we will master the art of examining data. We begin in Section 1.1 with some basic ideas about data. We will learn about the different types of data that are collected and how data sets are organized. Section 1.2 starts our process of learning from data by looking at graphs. These visual displays give us a picture of the overall patterns in a set of data. We have excellent software tools that help us make these graphs. However, it takes a little experience and a lot of judgment to study the graphs carefully and to explain what they tell us about our data. Section 1.3 continues our process of learning from data by computing numerical summaries. These sets of numbers describe key characteristics of the patterns that we saw in our graphical summaries. A final section in this chapter helps us make the transition from data summaries to statistical models. We learn about using density curves to describe a set of data. The Normal distributions are also introduced in this section. These distributions can be used to describe many sets of data that we will encounter. They also play a fundamental role in the methods that we will use to draw conclusions from many sets of data.
CHAPTER 1.1 Data
1
1.2 Displaying Distributions with Graphs 1.3 Describing Distributions with Numbers 1.4 Density Curves and Normal Distributions
1
2
CHAPTER 1
•
Looking at Data—Distributions
1.1 Data When you complete this section, you will be able to • Give examples of cases in a data set. • Identify the variables in a data set. • Demonstrate how a label can be used as a variable in a data set. • Identify the values of a variable. • Classify variables as categorical or quantitative. • Describe the key characteristics of a set of data. • Explain how a rate is the result of adjusting one variable to create another.
A statistical analysis starts with a set of data. We construct a set of data by first deciding what cases, or units, we want to study. For each case, we record information about characteristics that we call variables.
CASES, LABELS, VARIABLES, AND VALUES Cases are the objects described by a set of data. Cases may be customers, companies, subjects in a study, units in an experiment, or other objects. A label is a special variable used in some data sets to distinguish the different cases. A variable is a characteristic of a case. Different cases can have different values of the variables.
EXAMPLE 1.1 Over 12 billion sold. Apple’s music-related products and services generated $1.8 billion in the third quarter of 2012. Since Apple started marketing iTunes in 2003, they have sold over 12 billion songs. Let’s take a look at this remarkable product. Figure 1.1 is part of an iTunes playlist named IPS. The six songs shown are cases. They are numbered from 1 to 6 in the first column. These numbers are the labels that distinguish the six songs. The following five columns give name (of the song), time (the length of time it takes to play the song), artist, album, and genre.
Some variables, like the name of a song and the artist simply place cases into categories. Others, like the length of a song, take numerical values for which we can do arithmetic. It makes sense to give an average length of time for a collection of songs, but it does not make sense to give an “average” album. We can, however, count the numbers of songs on different albums, and we can do arithmetic with these counts.
1.1 Data
3
FIGURE 1.1 Part of an iTunes playlist, for Example 1.1.
CATEGORICAL AND QUANTITATIVE VARIABLES A categorical variable places a case into one of several groups or categories. A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense. The distribution of a variable tells us what values it takes and how often it takes these values.
EXAMPLE 1.2 Categorical and quantitative variables in iTunes playlist. The IPS iTunes playlist contains five variables. These are the name, time, artist, album, and genre. The time is a quantitative variable. Name, artist, album, and genre are categorical variables. An appropriate label for your cases should be chosen carefully. In our iTunes example, a natural choice of a label would be the name of the song. However, if you have more than one artist performing the same song, or the same artist performing the same song on different albums, then the name of the song would not uniquely label each of the songs in your playlist. A quantitative variable such as the time in the iTunes playlist requires some special attention before we can do arithmetic with its values. The first song in the playlist has time equal to 3:32—that is, 3 minutes and 32 seconds. To do arithmetic with this variable, we should first convert all the values so that they have a single unit. We could convert to seconds; 3 minutes is 180 seconds, so the total time is 180 + 32, or 212 seconds. An alternative would be to convert to minutes; 32 seconds is 0.533 minute, so time written in this way is 3.533 minutes. USE YOUR KNOWLEDGE 1.1 Time in the iTunes playlist. In the iTunes playlist, do you prefer to convert the time to seconds or minutes? Give a reason for your answer. units of measurement
We use the term units of measurement to refer to the seconds or minutes that tell us how the variable time is measured. If we were measuring heights of children, we might choose to use either inches or centimeters. The units of measurement are an important part of the description of a quantitative variable.
4
CHAPTER 1
•
Looking at Data—Distributions
Key characteristics of a data set In practice, any set of data is accompanied by background information that helps us understand the data. When you plan a statistical study or explore data from someone else’s work, ask yourself the following questions: 1. Who? What cases do the data describe? How many cases does the data set contain? 2. What? How many variables do the data contain? What are the exact definitions of these variables? What are the units of measurement for each quantitative variable? 3. Why? What purpose do the data have? Do we hope to answer some specific questions? Do we want to draw conclusions about cases other than the ones we actually have data for? Are the variables that are recorded suitable for the intended purpose?
EXAMPLE 1.3 Data for students in a statistics class. Figure 1.2 shows part of a data set for students enrolled in an introductory statistics class. Each row gives the data on one student. The values for the different variables are in the columns. This data set has eight variables. ID is a label for each student. Exam1, Exam2, Homework, Final, and Project give the points earned, out of a total of 100 possible, for each of these course requirements. Final grades are based on a possible 200 points for each exam and the Final, 300 points for Homework, and 100 points for Project. TotalPoints is the variable that gives the composite score. It is computed by adding 2 times Exam1, Exam2, and Final, 3 times Homework, and 1 times Project. Grade is the grade earned in the course. This instructor used cutoffs of 900, 800, 700, etc. for the letter grades.
USE YOUR KNOWLEDGE 1.2 Who, what, and why for the statistics class data. Answer the who, what, and why questions for the statistics class data set. 1.3 Read the spreadsheet. Refer to Figure 1.2. Give the values of the variables Exam1, Exam2, and Final for the student with ID equal to 104. FIGURE 1.2 Spreadsheet
Excel
for Example 1.3.
A
B
C
D
E
F
G
H
1
ID
Exam1
Exam2
Homework
Final
Project
2
101
89
94
88
87
95
899 B
3
102
78
84
90
89
94
866 B
4
103
71
80
75
79
95
780 C
5
104
95
98
97
96
93
962 A
6
105
79
88
85
88
96
861 B
TotalPoints Grade
1.1 Data
5
1.4 Calculate the grade. A student whose data do not appear on the spreadsheet scored 83 on Exam1, 82 on Exam2, 77 for Homework, 90 on the Final, and 80 on the Project. Find TotalPoints for this student and give the grade earned.
spreadsheet
The display in Figure 1.2 is from an Excel spreadsheet. Spreadsheets are very useful for doing the kind of simple computations that you did in Exercise 1.4. You can type in a formula and have the same computation performed for each row. Note that the names we have chosen for the variables in our spreadsheet do not have spaces. For example, we could have used the name “Exam 1” for the first-exam score rather than Exam1. In some statistical software packages, however, spaces are not allowed in variable names. For this reason, when creating spreadsheets for eventual use with statistical software, it is best to avoid spaces in variable names. Another convention is to use an underscore (_) where you would normally use a space. For our data set, we could use Exam_1, Exam_2, and Final_Exam.
EXAMPLE 1.4 Cases and variables for the statistics class data. The data set in Figure 1.2 was constructed to keep track of the grades for students in an introductory statistics course. The cases are the students in the class. There are eight variables in this data set. These include a label for each student and scores for the various course requirements. There are no units for ID and grade. The other variables all have “points” as the unit.
EXAMPLE 1.5 Statistics class data for a different purpose. Suppose that the data for the students in the introductory statistics class were also to be used to study relationships between student characteristics and success in the course. For this purpose, we might want to use a data set like the spreadsheet in Figure 1.3.
FIGURE 1.3 Spreadsheet
Excel
for Example 1.5.
A
B
C
D
E
F
TotalPoints
Grade
Gender
PrevStat
Year
1
ID
2
101
899
B
F
Yes
4
3
102
866
B
M
Yes
3
4
103
780
C
M
No
3
5
104
962
A
M
No
1
6
105
861
B
F
No
4
6
CHAPTER 1
•
Looking at Data—Distributions Here, we have decided to focus on the TotalPoints and Grade as the outcomes of interest. Other variables of interest have been included: Gender, PrevStat (whether or not the student has taken a statistics course previously), and Year (student classification as first, second, third, or fourth year). ID is a categorical variable, TotalPoints is a quantitative variable, and the remaining variables are all categorical. In our example, the possible values for the grade variable are A, B, C, D, and F. When computing grade point averages, many colleges and universities translate these letter grades into numbers using A 5 4, B 5 3, C 5 2, D 5 1, and F 5 0. The transformed variable with numeric values is considered to be quantitative because we can average the numerical values across different courses to obtain a grade point average. Sometimes, experts argue about numerical scales such as this. They ask whether or not the difference between an A and a B is the same as the difference between a D and an F. Similarly, many questionnaires ask people to respond on a 1 to 5 scale with 1 representing strongly agree, 2 representing agree, etc. Again, we could ask whether or not the five possible values for this scale are equally spaced in some sense. From a practical point of view, however, the averages that can be computed when we convert categorical scales such as these to numerical values frequently provide a very useful way to summarize data.
USE YOUR KNOWLEDGE 1.5 Apartment rentals. A data set lists apartments available for students to rent. Information provided includes the monthly rent, whether or not cable is included free of charge, whether or not pets are allowed, the number of bedrooms, and the distance to the campus. Describe the cases in the data set, give the number of variables, and specify whether each variable is categorical or quantitative.
instrument
rate
Often the variables in a statistical study are easy to understand: height in centimeters, study time in minutes, and so on. But each area of work also has its own special variables. A psychologist uses the Minnesota Multiphasic Personality Inventory (MMPI), and a physical fitness expert measures “VO2 max,” the volume of oxygen consumed per minute while exercising at your maximum capacity. Both of these variables are measured with special instruments. VO2 max is measured by exercising while breathing into a mouthpiece connected to an apparatus that measures oxygen consumed. Scores on the MMPI are based on a long questionnaire, which is also an instrument. Part of mastering your field of work is learning what variables are important and how they are best measured. Because details of particular measurements usually require knowledge of the particular field of study, we will say little about them. Be sure that each variable really does measure what you want it to. A poor choice of variables can lead to misleading conclusions. Often, for example, the rate at which something occurs is a more meaningful measure than a simple count of occurrences.
1.1 Data
7
EXAMPLE 1.6 Comparing colleges based on graduates. Think about comparing colleges based on the numbers of graduates. This view tells you something about the relative sizes of different colleges. However, if you are interested in how well colleges succeed at graduating students whom they admit, it would be better to use a rate. For example, you can find data on the Internet on the six-year graduation rates of different colleges. These rates are computed by examining the progress of first-year students who enroll in a given year. Suppose that at College A there were 1000 first-year students in a particular year, and 800 graduated within six years. The graduation rate is 800 ⫽ 0.80 1000 or 80%. College B has 2000 students who entered in the same year, and 1200 graduated within six years. The graduation rate is 1200 ⫽ 0.60 2000 or 60%. How do we compare these two colleges? College B has more graduates, but College A has a better graduation rate.
USE YOUR KNOWLEDGE 1.6 How should you express the change? Between the first exam and the second exam in your statistics course you increased the amount of time that you spent working exercises. Which of the following three ways would you choose to express the results of your increased work: (a) give the grades on the two exams, (b) give the ratio of the grade on the second exam divided by the grade on the first exam, or (c) take the difference between the grade on the second exam and the grade on the first exam, and express this as a percent of the grade on the first exam. Give reasons for your answer. 1.7 Which variable would you choose. Refer to Example 1-6, on colleges and their graduates. (a) Give a setting where you would prefer to evaluate the colleges based on the numbers of graduates. Give a reason for your choice. (b) Give a setting where you would prefer to evaluate the colleges based on the graduation rates. Give a reason for your choice.
adjusting one variable to create another
In Example 1.6, when we computed the graduation rate, we used the total number of students to adjust the number of graduates. We constructed a new variable by dividing the number of graduates by the total number of students. Computing a rate is just one of several ways of adjusting one variable to create another. We often divide one variable by another to compute a more meaningful variable to study. Example 1.20 (page 22) is another type of adjustment.
8
CHAPTER 1
•
Looking at Data—Distributions Exercises 1.6 and 1.7 illustrate an important point about presenting the results of your statistical calculations. Always consider how to best communicate your results to a general audience. For example, the numbers produced by your calculator or by statistical software frequently contain more digits than are needed. Be sure that you do not include extra information generated by software that will distract from a clear explanation of what you have found.
SECTION 1.1 Summary A data set contains information on a number of cases. Cases may be customers, companies, subjects in a study, units in an experiment, or other objects. For each case, the data give values for one or more variables. A variable describes some characteristic of a case, such as a person’s height, gender, or salary. Variables can have different values for different cases. A label is a special variable used to identify cases in a data set. Some variables are categorical and others are quantitative. A categorical variable places each individual into a category, such as male or female. A quantitative variable has numerical values that measure some characteristic of each case, such as height in centimeters or annual salary in dollars. The key characteristics of a data set answer the questions Who?, What?, and Why?
SECTION 1.1 Exercises For Exercise 1.1, see page 3; for Exercises 1.2 to 1.4, see pages 4–5; for Exercise 1.5, see page 6; and for Exercises 1.6 and 1.7, see page 7.
(c) Set up a spreadsheet that could be used to record the data. Give appropriate column headings and five sample cases.
1.8 Summer jobs. You are collecting information about summer jobs that are available for college students in your area. Describe a data set that you could use to organize the information that you collect.
1.10 How would you rank cities? Various organizations rank cities and produce lists of the 10 or the 100 best based on various measures. Create a list of criteria that you would use to rank cities. Include at least eight variables and give reasons for your choices. Say whether each variable is quantitative or categorical.
(a) What are the cases? (b) Identify the variables and their possible values. (c) Classify each variable as categorical or quantitative. Be sure to include at least one of each. (d) Use a label and explain how you chose it. (e) Summarize the key characteristics of your data set. 1.9 Employee application data. The personnel department keeps records on all employees in a company. Here is the information that they keep in one of their data files: employee identification number, last name, first name, middle initial, department, number of years with the company, salary, education (coded as high school, some college, or college degree), and age. (a) What are the cases for this data set? (b) Describe each type of information as a label, a quantitative variable, or a categorical variable.
1.11 Survey of students. A survey of students in an introductory statistics class asked the following questions: (1) age; (2) do you like to sing? (Yes, No); (3) can you play a musical instrument (not at all, a little, pretty well); (4) how much did you spend on food last week? (5) height. (a) Classify each of these variables as categorical or quantitative and give reasons for your answers. (b) For each variable give the possible values. 1.12 What questions would you ask? Refer to the previous exercise. Make up your own survey questions with at least six questions. Include at least two categorical variables and at least two quantitative variables. Tell which variables are categorical and which are quantitative. Give reasons for your answers. For each variable give the possible values.
1.2 Displaying Distributions with Graphs 1.13 How would you rate colleges? Popular magazines rank colleges and universities on their “academic quality” in serving undergraduate students. Describe five variables that you would like to see measured for each college if you were choosing where to study. Give reasons for each of your choices. 1.14 Attending college in your state or in another state. The U.S. Census Bureau collects a large amount of information concerning higher education.1 For example, the bureau provides a table that includes the following variables: state, number of students from the state who attend college, number of students who attend college in their home state. (a) What are the cases for this set of data? (b) Is there a label variable? If yes, what is it?
9
(c) Identify each variable as categorical or quantitative. (d) Explain how you might use each of the quantitative variables to explain something about the states. (e) Consider a variable computed as the number of students in each state who attend college in the state divided by the total number of students from the state who attend college. Explain how you would use this variable explain something about the states. 1.15 Alcohol-impaired driving fatalities. A report on drunk-driving fatalities in the United States gives the number of alcohol-impaired driving fatalities for each state.2 Discuss at least three different ways that these numbers could be converted to rates. Give the advantages and disadvantages of each.
1.2 Displaying Distributions with Graphs When you complete this section, you will be able to • Analyze the distribution of a categorical variable using a bar graph. • Analyze the distribution of a categorical variable using a pie chart. • Analyze the distribution of a quantitative variable using a stemplot. • Analyze the distribution of a quantitative variable using a histogram. • Examine the distribution of a quantitative variable with respect to the overall pattern of the data and deviations from that pattern. • Identify the shape, center, and spread of the distribution of a quantitative variable. • Identify and describe any outliers in the distribution of a quantitative variable. • Use a time plot to describe the distribution of a quantitative variable that is measured over time.
exploratory data analysis
Statistical tools and ideas help us examine data to describe their main features. This examination is called exploratory data analysis. Like an explorer crossing unknown lands, we want first to simply describe what we see. Here are two basic strategies that help us organize our exploration of a set of data: • Begin by examining each variable by itself. Then move on to study the relationships among the variables. • Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.
10
CHAPTER 1
•
Looking at Data—Distributions We will follow these principles in organizing our learning. This chapter presents methods for describing a single variable. We will study relationships among several variables in Chapter 2. Within each chapter, we will begin with graphical displays, then add numerical summaries for a more complete description.
Categorical variables: bar graphs and pie charts distribution of a categorical variable
The values of a categorical variable are labels for the categories, such as “Yes” and “No.” The distribution of a categorical variable lists the categories and gives either the count or the percent of cases that fall in each category.
EXAMPLE DATA ONLINE
1.7 How do you do online research? A study of 552 first-year college students asked about their preferences for online resources. One question asked them to pick their favorite.3 Here are the results:
Resource
CHALLENGE
Google or Google Scholar Library database or website Wikipedia or online encyclopedia Other Total
Count (n) 406 75 52 19 552
Resource is the categorical variable in this example, and the values are the names of the online resources. Note that the last value of the variable resource is “Other,” which includes all other online resources that were given as selection options. For data sets that have a large number of values for a categorical variable, we often create a category such as this that includes categories that have relatively small counts or percents. Careful judgment is needed when doing this. You don’t want to cover up some important piece of information contained in the data by combining data in this way.
EXAMPLE DATA ONLINE
1.8 Favorites as percents. When we look at the online resources data set, we see that Google is the clear winner. We see that 406 reported Google or Google Scholar as their favorite. To interpret this number, we need to know that the total number of students polled was 552. When we say that Google is the winner, we can describe this win by saying that 73.6% (406 divided by
1.2 Displaying Distributions with Graphs
11
552, expressed as a percent) of the students reported Google as their favorite. Here is a table of the preference percents:
Resource
Percent (%)
Google or Google Scholar Library database or website Wikipedia or online encyclopedia Other Total
73.6 13.6 9.4 3.4 100.0
The use of graphical methods will allow us to see this information and other characteristics of the data easily. We now examine two types of graph.
EXAMPLE bar graph DATA
1.9 Bar graph for the online resource preference data. Figure 1.4 displays the online resource preference data using a bar graph. The heights of the four bars show the percents of the students who reported each of the resources as their favorite.
ONLINE
The categories in a bar graph can be put in any order. In Figure 1.4, we ordered the resources based on their preference percents. For other data sets,
80
online resource preference data, for Example 1.9.
70
CHALLENGE
FIGURE 1.4 Bar graph for the
Preference percent
60 50 40 30 20 10 0 Google
Library Wikipedia Online resource
Other
12
CHAPTER 1
•
Looking at Data—Distributions an alphabetical ordering or some other arrangement might produce a more useful graphical display. You should always consider the best way to order the values of the categorical variable in a bar graph. Choose an ordering that will be useful to you. If you have difficulty, ask a friend if your choice communicates what you expect.
EXAMPLE pie chart DATA ONLINE
1.10 Pie chart for the online resource preference data. The pie chart in Figure 1.5 helps us see what part of the whole each group forms. Here it is very easy to see that Google is the favorite for about three-quarters of the students.
Google 73.6% CHALLENGE
Wikipedia 9.4% Other 3.4%
FIGURE 1.5 Pie chart for the online resource preference data, for Example 1.10.
Library 13.6%
USE YOUR KNOWLEDGE DATA ONLINE
1.16 Compare the bar graph with the pie chart. Refer to the bar graph in Figure 1.4 and the pie chart in Figure 1.5 for the online resource preference data. Which graphical display does a better job of describing the data? Give reasons for your answer.
To make a pie chart, you must include all the categories that make up a whole. A category such as “Other” in this example can be used, but the sum of the percents for all the categories should be 100%. CHALLENGE
This constraint makes bar graphs more flexible. For example, you can use a bar graph to compare the numbers of students at your college majoring in
1.2 Displaying Distributions with Graphs
13
biology, business, and political science. A pie chart cannot make this comparison because not all students fall into one of these three majors.
Quantitative variables: stemplots A stemplot (also called a stem-and-leaf plot) gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. Stemplots work best for small numbers of observations that are all greater than 0.
STEMPLOT To make a stemplot, 1. Separate each observation into a stem consisting of all but the final (rightmost) digit and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit. 2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column. 3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
EXAMPLE DATA VITDG
1.11 How much vitamin D do they have? Your body needs vitamin D to use calcium when building bones. It is particularly important that young adolescents have adequate supplies of this vitamin because their bodies are growing rapidly. Vitamin D in the form 25-hydroxy vitamin D is measured in the blood and represents the stores available for the body to use. The units of measurement are nanograms per milliliter (ng/ml) of blood. Here are some values measured on a sample of 20 adolescent girls aged 11 to 14 years:4
CHALLENGE
16
43
38
48
42
23
36
35
37
34
25
28
26
43
51
33
40
35
41
42
To make a stemplot of these data, use the first digits as stems and the second digits as leaves. Figure 1.6 shows the steps in making the plot. The girl with
1 2 3 4 5
1 2 3 4 5 (a)
6 3 5 86 865 7 435 3 82 30 1 2 1 (b)
1 2 3 4 5
6 3 5 68 34556 78 01 22338 1 (c)
FIGURE 1.6 Making a stemplot of the data in Example 1.11. (a) Write the stems. (b) Go through the data and write each leaf on the proper stem. For example, the values on the 2 stem are 23, 25, 28, and 26 in the order given in the display for the example. (c) Arrange the leaves on each stem in order out from the stem. The 2 stem now has leaves 3, 5, 6, and 8.
14
CHAPTER 1
•
Looking at Data—Distributions a measured value of 16 ng/ml for vitamin D appears on the first stem with a leaf of 6, while the girl with a measured value of 43 ng/ml appears on the stem labeled 4 with a leaf of 3. The lowest value, 16 ng/ml, is somewhat far away from the next-highest value, 23. However, it is not particularly extreme.
USE YOUR KNOWLEDGE DATA
1.17 Make a stemplot. Here are the scores on the first exam in an introductory statistics course for 30 students in one section of the course: STAT
81
73
93
85
75
98
93
55
80
90
92
80
87
90
72
65
70
85
83
60
70
90
75
75
58
68
85
78
80
93
Use these data to make a stemplot. Then use the stemplot to describe the distribution of the first-exam scores for this course. CHALLENGE
back-to-back stemplot
When you wish to compare two related distributions, a back-to-back stemplot with common stems is useful. The leaves on each side are ordered out from the common stem.
EXAMPLE DATA VITDB
CHALLENGE
splitting stem trimming
FIGURE 1.7 A back-to-back stemplot to compare the distributions of vitamin D for samples of adolescent girls and boys, for Example 1.12.
1.12 Vitamin D for boys. Here are the 25-hydroxy vitamin D values for a sample of 20 adolescent boys aged 11 to 14 years: 18
28
28
28
37
31
24
29
8
27
24
12
21
32
27
24
23
33
31
29
Figure 1.7 gives the back-to-back stemplot for the girls and the boys. The values on the left give the vitamin D measures for the girls, while the values on the right give the measures for the boys. The values for the boys tend to be lower than those for the girls. There are two modifications of the basic stemplot that can be helpful in different situations. You can double the number of stems in a plot by splitting each stem into two: one with leaves 0 to 4 and the other with leaves 5 through 9. When the observed values have many digits, it is often best to trim the numbers by removing the last digit or digits before making a stemplot. Girls 6 86 5 3 8 7655 43 83322 1 0 1
0 1 2 3 4 5
Boys 8 28 1 3 4 4 4 7 7 88899 1 1 237
1.2 Displaying Distributions with Graphs
15
You must use your judgment in deciding whether to split stems and whether to trim, though statistical software will often make these choices for you. Remember that the purpose of a stemplot is to display the shape of a distribution. If there are many stems with no leaves or only one leaf, trimming will reduce the number of stems. Let’s take a look at the effect of splitting the stems for our vitamin D data.
EXAMPLE DATA VITDB
1.13 Stemplot with split stems for vitamin D. Figure 1.8 presents the data from Examples 1.11 and 1.12 in a stemplot with split stems. Notice that we needed only one stem for 0 because there are no values between 0 and 4. Girls
CHALLENGE
6 3 86 5 43 8765 5 3322 1 0 8 1
FIGURE 1.8 A back-to-back stemplot with split stems to compare the distributions of vitamin D for samples of adolescent girls and boys, for Example 1.13.
0 1 1 2 2 3 3 4 4 5
Boys 8 2 8 1 3444 7 7 888 9 9 1 1 23 7
USE YOUR KNOWLEDGE 1.18 Which stemplot do you prefer? Look carefully at the stemplots for the vitamin D data in Figures 1.7 and 1.8. Which do you prefer? Give reasons for your answer. 1.19 Why should you keep the space? Suppose that you had a data set for girls similar to the one given in Example 1.11, but in which the observations of 33 ng/ml and 34 ng/ml were both changed to 35 ng/ml. (a) Make a stemplot of these data for girls using split stems. (b) Should you use one stem or two stems for the 30s? Give a reason for your answer. (Hint: How would your choice reveal or conceal a potentially important characteristic of the data?)
Histograms
histogram
Stemplots display the actual values of the observations. This feature makes stemplots awkward for large data sets. Moreover, the picture presented by a stemplot divides the observations into groups (stems) determined by the number system rather than by judgment. Histograms do not have these limitations. A histogram breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class. You can choose any convenient number of classes, but you should always choose classes of equal width.
16
CHAPTER 1
•
Looking at Data—Distributions
TABLE 1.1 IQ Test Scores for 60 Randomly Chosen Fifth-Grade Students 145
139
126
122
125
130
96
110
118
118
101
142
134
124
112
109
134
113
81
113
123
94
100
136
109
131
117
110
127
124
106
124
115
133
116
102
127
117
109
137
117
90
103
114
139
101
122
105
97
89
102
108
110
128
114
112
114
102
82
101
Making a histogram by hand requires more work than a stemplot. Histograms do not display the actual values observed. For these reasons we prefer stemplots for small data sets. The construction of a histogram is best shown by example. Most statistical software packages will make a histogram for you.
EXAMPLE DATA IQ
1.14 Distribution of IQ scores. You have probably heard that the distribution of scores on IQ tests is supposed to be roughly “bell-shaped.” Let’s look at some actual IQ scores. Table 1.1 displays the IQ scores of 60 fifth-grade students chosen at random from one school. 1. Divide the range of the data into classes of equal width. The scores in Table 1.1 range from 81 to 145, so we choose as our classes
CHALLENGE
75 ⱕ IQ score ⬍ 85 85 ⱕ IQ score ⬍ 95 ⯗ 145 ⱕ IQ score ⬍ 155 Be sure to specify the classes precisely so that each individual falls into exactly one class. A student with IQ 84 would fall into the first class, but IQ 85 falls into the second.
frequency frequency table
2. Count the number of individuals in each class. These counts are called frequencies, frequency and a table of frequencies for all classes is a frequency table. Class 75 # IQ score , 85 85 # IQ score , 95 95 # IQ score , 105 105 # IQ score , 115
Count 2 3 10 16
Class 115 # IQ score , 125 125 # IQ score , 135 135 # IQ score , 145 145 # IQ score , 155
Count 13 10 5 1
3. Draw the histogram. First, on the horizontal axis mark the scale for the variable whose distribution you are displaying. That’s the IQ score. The scale runs from 75 to 155 because that is the span of the classes we chose. The vertical axis contains the scale of counts. Each bar represents a class.
1.2 Displaying Distributions with Graphs
17
FIGURE 1.9 Histogram of the IQ scores of 60 fifth-grade students, for Example 1.14. Count of students
15
10
5
0
80
90
100 110 120 130 140 150 IQ score
The base of the bar covers the class, and the bar height is the class count. There is no horizontal space between the bars unless a class is empty, so that its bar has height zero. Figure 1.9 is our histogram. It does look roughly “bell-shaped.” Large sets of data are often reported in the form of frequency tables when it is not practical to publish the individual observations. In addition to the frequency (count) for each class, we may be interested in the fraction or percent of the observations that fall in each class. A histogram of percents looks just like a frequency histogram such as Figure 1.9. Simply relabel the vertical scale to read in percents. Use histograms of percents for comparing several distributions that have different numbers of observations. USE YOUR KNOWLEDGE DATA STAT
1.20 Make a histogram. Refer to the first-exam scores from Exercise 1.17 (page 14). Use these data to make a histogram with classes 50 to 59, 60 to 69, etc. Compare the histogram with the stemplot as a way of describing this distribution. Which do you prefer for these data?
CHALLENGE CHALLENGE
Our eyes respond to the area of the bars in a histogram. Because the classes are all the same width, area is determined by height, and all classes are fairly represented. There is no one right choice of the classes in a histogram. Too few classes will give a “skyscraper” graph, with all values in a few classes with tall bars. Too many will produce a “pancake” graph, with most classes having one or no observations. Neither choice will give a good picture of the shape of the distribution. You must use your judgment in choosing classes to display the shape. Statistical software will choose the classes for you. The software’s choice is often a good one, but you can change it if you want. You should be aware that the appearance of a histogram can change when you change the classes. The histogram function in the One-Variable Statistical Calculator applet on the text website allows you to change the number of classes by dragging with the mouse, so that it is easy to see how the choice of classes affects the histogram.
18
CHAPTER 1
•
Looking at Data—Distributions USE YOUR KNOWLEDGE DATA STAT
1.21 Change the classes in the histogram. Refer to the first-exam scores from Exercise 1.17 (page 14) and the histogram that you produced in Exercise 1.20. Now make a histogram for these data using classes 40 to 59, 60 to 79, and 80 to 99. Compare this histogram with the one that you produced in Exercise 1.20. Which do you prefer? Give a reason for your answer. 1.22 Use smaller classes. Repeat the previous exercise using classes 55 to 59, 60 to 64, 65 to 69, etc.
CHALLENGE
Although histograms resemble bar graphs, their details and uses are distinct. A histogram shows the distribution of counts or percents among the values of a single variable. A bar graph compares the counts of different items. The horizontal axis of a bar graph need not have any measurement scale but simply identifies the items being compared. Draw bar graphs with blank space between the bars to separate the items being compared. Draw histograms with no space, to indicate that all values of the variable are covered. Some spreadsheet programs, which are not primarily intended for statistics, will draw histograms as if they were bar graphs, with space between the bars. Often, you can tell the software to eliminate the space to produce a proper histogram.
Data analysis in action: don’t hang up on me Many businesses operate call centers to serve customers who want to place an order or make an inquiry. Customers want their requests handled thoroughly. Businesses want to treat customers well, but they also want to avoid wasted time on the phone. They therefore monitor the length of calls and encourage their representatives to keep calls short.
EXAMPLE DATA CALLS80
1.15 How long are customer service center calls? We have data on the lengths of all 31,492 calls made to the customer service center of a small bank in a month. Table 1.2 displays the lengths of the first 80 calls.5 Take a look at the data in Table 1.2. In this data set the cases are calls made to the bank’s call center. The variable recorded is the length of each call. The units are seconds. We see that the call lengths vary a great deal. The longest call lasted 2631 seconds, almost 44 minutes. More striking is that 8 of these 80 calls lasted less than 10 seconds. What’s going on?
CHALLENGE
We started our study of the customer service center data by examining a few cases, the ones displayed in Table 1.2. It would be very difficult to examine all 31,492 cases in this way. How can we do this? Let’s try a histogram.
EXAMPLE DATA CALLS
1.16 Histogram for customer service center call lengths. Figure 1.10 is a histogram of the lengths of all 31,492 calls. We did not plot the few lengths greater than 1200 seconds (20 minutes). As expected, the graph shows that
1.2 Displaying Distributions with Graphs
19
TABLE 1.2 Service Times (Seconds) for Calls to a Customer Service Center 77
289
128
59
19
148
157
203
126
118
104
141
290
48
3
2
372
140
438
56
44
274
479
211
179
1
68
386
2631
90
30
57
89
116
225
700
40
73
75
51
148
9
115
19
76
138
178
76
67
102
35
80
143
951
106
55
4
54
137
367
277
201
52
9
700
182
73
199
325
75
103
64
121
11
9
88
1148
2
465
25
most calls last between about 1 and 5 minutes, with some lasting much longer when customers have complicated problems. More striking is the fact that 7.6% of all calls are no more than 10 seconds long. It turned out that the bank penalized representatives whose average call length was too long— so some representatives just hung up on customers to bring their average length down. Neither the customers nor the bank were happy about this. The bank changed its policy, and later data showed that calls under 10 seconds had almost disappeared.
FIGURE 1.10 The distribution of call lengths for 31,492 calls to a bank’s customer service center, for Example 1.16. The data show a surprising number of very short calls. These are mostly due to representatives deliberately hanging up in order to bring down their average call length.
The extreme values of a distribution are in the tails of the distribution. The high values are in the upper, or right, tail, and the low values are in the lower, or left, tail. The overall pattern in Figure 1.10 is made up of the many moderate call lengths and the long right tail of more lengthy calls. The striking deviation from the overall pattern is the surprising number of very short calls in the left tail.
2500 7.6% of all calls are ≤ 10 seconds long 2000 Count of calls
tails
1500
1000
500
0
0
200
400 600 800 1000 Service time (seconds)
1200
20
CHAPTER 1
•
Looking at Data—Distributions Our examination of the call center data illustrates some important principles: • After you understand the background of your data (cases, variables, units of measurement), the first thing to do is plot your data. • When you look at a plot, look for an overall pattern and also for any striking deviations from the pattern.
Examining distributions Making a statistical graph is not an end in itself. The purpose of the graph is to help us understand the data. After you make a graph, always ask, “What do I see?” Once you have displayed a distribution, you can see its important features as follows.
EXAMINING A DISTRIBUTION In any graph of data, look for the overall pattern and for striking deviations from that pattern. You can describe the overall pattern of a distribution by its shape, center, and spread. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern. In Section 1.3, we will learn how to describe center and spread numerically. For now, we can describe the center of a distribution by its midpoint, the value with roughly half the observations taking smaller values and half taking larger values. We can describe the spread of a distribution by giving the smallest and largest values. Stemplots and histograms display the shape of a distribution in the same way. Just imagine a stemplot turned on its side so that the larger values lie to the right. Some things to look for in describing shape are modes unimodal symmetric skewed
• Does the distribution have one or several major peaks, called modes? A distribution with one major peak is called unimodal. • Is it approximately symmetric or is it skewed in one direction? A distribution is symmetric if the values smaller and larger than its midpoint are mirror images of each other. It is skewed to the right if the right tail (larger values) is much longer than the left tail (smaller values). Some variables commonly have distributions with predictable shapes. Many biological measurements on specimens from the same species and sex— lengths of bird bills, heights of young women—have symmetric distributions. Money amounts, on the other hand, usually have right-skewed distributions. There are many moderately priced houses, for example, but the few very expensive mansions give the distribution of house prices a strong right-skew.
EXAMPLE DATA IQ
1.17 Examine the histogram of IQ scores. What does the histogram of IQ scores (Figure 1.9, page 17) tell us? Shape: The distribution is roughly symmetric with a single peak in the center. We don’t expect real data to be perfectly symmetric, so in judging
1.2 Displaying Distributions with Graphs
21
symmetry, we are satisfied if the two sides of the histogram are roughly similar in shape and extent. Center: You can see from the histogram that the midpoint is not far from 110. Looking at the actual data shows that the midpoint is 114. Spread: The histogram has a spread from 75 to 155. Looking at the actual data shows that the spread is from 81 to 145. There are no outliers or other strong deviations from the symmetric, unimodal pattern.
EXAMPLE 1.18 Examine the histogram of call lengths. The distribution of call lengths in Figure 1.10 (page 19), on the other hand, is strongly skewed to the right. The midpoint, the length of a typical call, is about 115 seconds, or just under 2 minutes. The spread is very large, from 1 second to 28,739 seconds. The longest few calls are outliers. They stand apart from the long right tail of the distribution, though we can’t see this from Figure 1.10, which omits the largest observations. The longest call lasted almost 8 hours—that may well be due to equipment failure rather than an actual customer call. USE YOUR KNOWLEDGE DATA STAT
1.23 Describe the first-exam scores. Refer to the first-exam scores from Exercise 1.17 (page 14). Use your favorite graphical display to describe the shape, the center, and the spread of these data. Are there any outliers?
Dealing with outliers
CHALLENGE
In data sets smaller than the service call data, you can spot outliers by looking for observations that stand apart (either high or low) from the overall pattern of a histogram or stemplot. Identifying outliers is a matter for judgment. Look for points that are clearly apart from the body of the data, not just the most extreme observations in a distribution. You should search for an explanation for any outlier. Sometimes outliers point to errors made in recording the data. In other cases, the outlying observation may be caused by equipment failure or other unusual circumstances.
EXAMPLE DATA COLLEGE
1.19 College students. How does the number of undergraduate college students vary by state? Figure 1.11 is a histogram of the numbers of undergraduate students in each of the states.6 Notice that over 50% of the states are included in the first bar of the histogram. These states have fewer than 300,000 undergraduates. The next bar includes another 30% of the states. These have between 300,000 and 600,000 students. The bar at the far right of the histogram corresponds to the state of California, which has 2,685,893 undergraduates. California certainly stands apart from the other states for this variable. It is an outlier.
CHALLENGE
The state of California is an outlier in the previous example because it has a very large number of undergraduate students. Since California has the largest population of all the states, we might expect it to have a large number of undergraduate students. Let’s look at these data in a different way.
22
CHAPTER 1
•
Looking at Data—Distributions 60
Percent of states
50 40 30 20 10
FIGURE 1.11 The distribution of the numbers of undergraduate college students for the 50 states, for Example 1.19.
0 300
600
900 1200 1500 1800 2100 2400 2700 Undergraduates (1000s)
EXAMPLE variation in the populations of the states, for each state we divide the number of undergraduate students by the population and then multiply by 1000. This gives the undergraduate college enrollment expressed as the number of students per 1000 people in each state. Figure 1.12 gives a stemplot of the distribution. California has 60 undergraduate students per 1000 people. This is one of the higher values in the distribution but it is clearly not an outlier.
numbers of undergraduate college students per 1000 people in each of the 50 states, for Example 1.20.
8 1 1 1 1 1 33 55 556667 7 7 7 7 88889 001 1 1 2 224 4 44 566 00001 7 9 1 2 7
COLLEGE
CHALLENGE
FIGURE 1.12 Stemplot of the
3 4 4 5 5 6 6 7 7
DATA
1.20 College students per 1000. To account for the fact that there is large
USE YOUR KNOWLEDGE
(a) Examine the data file and report the names of these four states.
DATA
1.24 Four states with large populations. There are four states with populations greater that 15 million.
COLLEGE
(b) Find these states in the distribution of number of undergraduate students per 1000 people. To what extent do these four states influence the distribution of number of undergraduate students per 1000 people?
CHALLENGE
In Example 1.19 we looked at the distribution of the number of undergraduate students, while in Example 1.20 we adjusted these data by expressing the counts as number per 1000 people in each state. Which way is correct? The answer depends upon why you are examining the data.
1.2 Displaying Distributions with Graphs
23
If you are interested in marketing a product to undergraduate students, the unadjusted numbers would be of interest. On the other hand, if you are interested in comparing states with respect to how well they provide opportunities for higher education to their residents, the population-adjusted values would be more suitable. Always think about why you are doing a statistical analysis, and this will guide you in choosing an appropriate analytic strategy. Here is an example with a different kind of outlier.
EXAMPLE DATA PTH
1.21 Healthy bones and PTH. Bones are constantly being built up (bone formation) and torn down (bone resorption). Young people who are growing have more formation than resorption. When we age, resorption increases to the point where it exceeds formation. (The same phenomenon occurs when astronauts travel in space.) The result is osteoporosis, a disease associated with fragile bones that are more likely to break. The underlying mechanisms that control these processes are complex and involve a variety of substances. One of these is parathyroid hormone (PTH). Here are the values of PTH measured on a sample of 29 boys and girls aged 12 to 15 years:7
CHALLENGE
39
59
30
48 71
31
25
31
71
50
38
63
49
45
33
28
40 127 49
59
50
64
28
46
35
28
19
29
31
The data are measured in picograms per milliliter (pg/ml) of blood. The original data were recorded with one digit after the decimal point. They have been rounded to simplify our presentation here. Here is a stemplot of the data: 1 2 3 4 5 6 7 8 9 10 11 12
9 58889 01 1 1 3589 056899 0099 34 1 1
7
The observation 127 clearly stands out from the rest of the distribution. A PTH measurement on this individual taken on a different day was similar to the rest of the values in the data set. We conclude that this outlier was caused by a laboratory error or a recording error, and we are confident in discarding it for any additional analysis.
Time plots Whenever data are collected over time, it is a good idea to plot the observations in time order. Displays of the distribution of a variable that ignore time order, such as stemplots and histograms, can be misleading when there is systematic change over time.
24
CHAPTER 1
•
Looking at Data—Distributions
TIME PLOT A time plot of a variable plots each observation against the time at which it was measured. Always put time on the horizontal scale of your plot and the variable you are measuring on the vertical scale.
EXAMPLE DATA VITDS
CHALLENGE
1.22 Seasonal variation in vitamin D. Although we get some of our vitamin D from food, most of us get about 75% of what we need from the sun. Cells in the skin make vitamin D in response to sunlight. If people do not get enough exposure to the sun, they can become deficient in vitamin D, resulting in weakened bones and other health problems. The elderly, who need more vitamin D than younger people, and people who live in northern areas where there is relatively little sunlight in the winter, are particularly vulnerable to these problems. Figure 1.13 is a plot of the serum levels of vitamin D versus time of year for samples of subjects from Switzerland.8 The units for measuring vitamin D are nanomoles per liter (nmol/l) of blood. The observations are grouped into periods of two months for the plot. Means are marked by filled-in circles and are connected by a line in the plot. The effect of the lack of sunlight in the winter months on vitamin D levels is clearly evident in the plot.
The data described in the example above are based on a subset of the subjects in a study of 248 subjects. The researchers were particularly concerned about subjects whose levels were deficient, defined as a serum vitamin D level of less than 50 nmol/l. They found that there was a 3.8-fold higher deficiency rate in February–March than in August–September: 91.2% versus 24.3%. To ensure that individuals from this population have adequate levels of vitamin D, some form of supplementation is needed, particularly during certain times of the year.
FIGURE 1.13 Plot of
Vitamin D (nmol/l)
vitamin D versus months of the year, for Example 1.22.
110 100 90 80 70 60 50 40 30 20 10 0 FebMar
AprMay
JunJul AugSep Months
OctNov
DecJan
1.2 Displaying Distributions with Graphs
25
SECTION 1.2 Summary Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them. The distribution of a variable tells us what values it takes and how often it takes these values. Bar graphs and pie charts display the distributions of categorical variables. These graphs use the counts or percents of the categories. Stemplots and histograms display the distributions of quantitative variables. Stemplots separate each observation into a stem and a one-digit leaf. Histograms plot the frequencies (counts) or the percents of equal-width classes of values. When examining a distribution, look for shape, center, and spread and for clear deviations from the overall shape. Some distributions have simple shapes, such as symmetric or skewed. The number of modes (major peaks) is another aspect of overall shape. Not all distributions have a simple overall shape, especially when there are few observations. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them. When observations on a variable are taken over time, make a time plot that graphs time horizontally and the values of the variable vertically. A time plot can reveal changes over time.
SECTION 1.2 Exercises For Exercise 1.16, see page 12; for Exercise 1.17, see page 14; for Exercises 1.18 and 1.19, see page 15; for Exercise 1.20, see page 17; for Exercises 1.21 and 1.22, see page 18; for Exercise 1.23, see page 21; and for Exercise 1.24, see page 22. 1.25 The Titanic and class. On April 15, 1912, on her maiden voyage, the Titanic collided with an iceberg and sank. The ship was luxurious but did not have enough lifeboats for the 2224 passengers and crew. As a result of the collision, 1502 people died.9 The ship had three classes of passengers. The level of luxury and the price of the ticket varied with the class, with first class being the most luxurious. There were 323 passengers in first class, TITANIC 277 in second class, and 709 in third class.10
(b) Compare the pie chart with the bar graph. Which do you prefer? Give reasons for your answer. 1.27 Who survived? Refer to the two previous exercises. The number of first-class passengers who survived was 200. For second and third class, the numbers were 119 and 181, respectively. Create a graphical summary that shows how the survival of passengers depended on class. TITANIC 1.28 Do you use your Twitter account? Although Twitter has more than 500,000,000 users, only about 170,000,000 are active. A study of Twitter account usage defined an active account as one with at least one message posted within a three-month period. Here are the percents TWITTC of active accounts for 20 countries:11
(a) Make a bar graph of these data. (b) Give a short summary of how the number of passengers varied with class.
Country Argentina
25
India
19
South Korea
24
(c) If you made a bar graph of the percents of passengers in each class, would the general features of the graph differ from the one you made in part (a)? Explain your answer.
Brazil
25
Indonesia
28
Spain
29
Canada
28
Japan
30
Turkey
25
Chile
24
Mexico
26
United Kingdom
26
Colombia
26
Netherlands
33
United States
28
France
24
Philippines
22
Venezuela
28
Germany
23
Russia
26
1.26 Another look at the Titanic and class. Refer to the previous exercise. TITANIC (a) Make a pie chart to display the data.
Percent
Country
Percent
Country
Percent
26
CHAPTER 1
•
Looking at Data—Distributions
(a) Make a stemplot of the distribution of percents of active accounts.
(a) Analyze these data using the questions in the previous exercise as a guide.
(b) Describe the overall pattern of the data and any deviations from that pattern.
(b) Compare the patterns in 2010 with those in 2011. Describe any similarities and differences.
(c) Identify the shape, center, and spread of the distribution. (d) Identify and describe any outliers. 1.29 Another look at Twitter account usage. Refer to TWITTC the previous exercise. (a) Use a histogram to summarize the distribution. (b) Use this histogram to answer parts (b), (c), and (d) of the previous exercise. (c) Which graphical display, stemplot or histogram, is more useful for describing this distribution? Give reasons for your answer. 1.30 Energy consumption. The U.S. Energy Information Administration reports data summaries of various energy statistics. Let’s look at the total amount of energy consumed, in quadrillions of British thermal units (Btu), ENERGY for each month in 2011. Here are the data:12
1.33 Least-favorite colors. Refer to the previous exercise. The same study also asked people about their leastfavorite color. Here are the results: orange, 30%; brown, 23%; purple, 13%; yellow, 13%; gray, 12%; green, 4%; white, 4%; red, 1%; black, 0%; and blue, 0%. Make a bar graph of these percents and write a summary of the results. LFAVCOL 1.34 Garbage. The formal name for garbage is “municipal solid waste.” Here is a breakdown of the materials that make up American municipal solid waste:14 GARBAGE
Month
Energy (quadrillion Btu)
January
9.33
July
8.41
February
8.13
August
8.43
March
8.38
September
7.58
Food scraps
34.8
13.9
April
7.54
October
7.61
Glass
11.5
4.6
May
7.61
November
7.81
Metals
22.4
9.0
June
7.92
December
8.60
Paper, paperboard
71.3
28.5
Month
Energy (quadrillion Btu)
1.32 Favorite colors. What is your favorite color? One survey produced the following summary of responses to that question: blue, 42%; green, 14%; purple, 14%; red, 8%; black, 7%; orange, 5%; yellow, 3%; brown, 3%; gray, 2%; and white, 2%.13 Make a bar graph of the percents and write a short summary of the major features of your graph. FAVCOL
Material
Weight (million tons)
Percent of total (%)
Plastics
31.0
12.4
(a) Look at the table and describe how the energy consumption varies from month to month.
Rubber, leather, textiles
20.9
8.4
(b) Make a time plot of the data and describe the patterns.
Wood
15.9
6.4
Yard trimmings
33.4
13.4
(c) Suppose you wanted to communicate information about the month-to-month variation in energy consumption. Which would be more effective, the table of the data or the graph? Give reasons for your answer. 1.31 Energy consumption in a different year. Refer to the previous exercise. Here are the data for 2010: ENERGY
Month
Energy (quadrillion Btu)
January
9.13
July
8.38
February
8.21
August
8.44
March
8.21
September
7.69
April
7.37
October
7.51
May
7.68
November
7.80
June
8.01
December
9.23
Month
Energy (quadrillion Btu)
Other Total
8.6 249.6
3.2 100.0
(a) Add the weights and then the percents for the nine types of material given, including “Other.” Each entry, including the total, is separately rounded to the nearest tenth. So the sum and the total may slightly because of roundoff error. (b) Make a bar graph of the percents. The graph gives a clearer picture of the main contributors to garbage if you order the bars from tallest to shortest. (c) Make a pie chart of the percents. Compare the advantages and disadvantages of each graphical summary. Which do you prefer? Give reasons for your answer.
1.2 Displaying Distributions with Graphs 1.35 Recycled garbage. Refer to the previous exercise. The following table gives the percent of the weight that was recycled for each of the categories. GARBAGE
Material
Weight (million tons)
Percent recycled (%)
27
1.37 Market share for mobiles and tablet browsers. The following table gives the market share for the browsers used on mobiles and tablets. BROWSEM
Search engine
Market share (%)
Search engine
Market share (%)
Safari
61.50
Chrome
1.14
Food scraps
34.8
2.8
Android
26.09
Blackberry
1.09
Glass
11.5
27.1
Opera
Other
3.16
Metals
22.4
35.1
Paper, paperboard
71.3
62.5
Plastics
31.0
8.2
Rubber, leather, textiles
20.9
15.0
Wood
15.9
14.5
Yard trimmings
33.4
57.5
8.6
16.3
Other Total
(b) Make another bar graph where the materials are ordered by the percent recycled, largest percent to smallest percent. (c) Which bar graph, (a) or (b), do you prefer? Give a reason for your answer. (d) Explain why it is inappropriate to use a pie chart to display these data. 1.36 Market share for desktop browsers. The following table gives the market share for the browsers used on desktop computers.15 BROWSED
Market share (%)
Internet Explorer
54.76
Firefox Chrome
(a) Use a bar graph to display the market shares. (b) Use a pie chart to display the market shares. (c) Summarize what these graphical summaries tell you about market shares for browsers on mobiles and tablets. (d) Which graphical display do you prefer? Give reasons for your answer.
249.6
(a) Use a bar graph to display the percent recycled for these materials. Use the order of the materials given in the table above.
Search engine
7.02
Search engine
Market share (%)
1.38 Compare the market shares for browsers. Refer to the previous two exercises. Using the analyses that you have done for browsers for desktops and browsers for mobiles and tablets, write a short report comparing the market shares for these two types of devices. BROWSED, BROWSEM 1.39 Vehicle colors. Vehicle colors differ among regions of the world. Here are data on the most popular colors for vehicles in North America and Europe:16 VCOLORS
Color
North America (%)
Europe (%)
White
23
20
Black
18
25
Silver
16
15
Gray
13
18
Red
10
6
Blue
9
7
Safari
5.33
20.44
Opera
1.67
Brown/beige
5
5
17.24
Other
0.56
Yellow/gold
3
1
Other
3
3
(a) Use a bar graph to display the market shares. (b) Use a pie chart to display the market shares. (c) Summarize what these graphical summaries tell you about market shares for browsers on desktops. (d) Which graphical display do you prefer? Give reasons for your answer.
(a) Make a bar graph for the North America percents. (b) Make a bar graph for the Europe percents. (c) Now, be creative: make one bar graph that compares the two regions as well as the colors. Arrange your graph so that it is easy to compare the two regions.
28
CHAPTER 1
•
Looking at Data—Distributions
1.40 Facebook users by region. The following table gives the numbers of Facebook users by region of the FACER world as of November 2012:17
Region Africa Asia Caribbean
Facebook users (in millions) 40 195 6
Central America
41
Europe
233
Region Middle East North America Oceania/ Australia South America
Facebook users (in millions) 20 173
113
(b) Describe the major features of your graph in a short paragraph. 1.41 Facebook ratios. One way to compare the numbers of Facebook users for different regions of the world is to take into account the populations of these regions. The market penetration for a product is the number of users divided by the number of potential users, expressed as a percent. For Facebook, we use the population as the number of potential users. Here are estimates of the populations in 2012 of the same geographic regions that we studied in the previous FACER exercise:18
Region
Population (in millions)
Africa
1026
Middle East
213
Asia
3900
North America
347
Oceania/ Australia
36
South America
402
Caribbean
39
Central America
155
Europe
818
(c) Use a stemplot to describe these data. You can list any extreme outliers separately from the plot. (d) Describe the major features of these data using your plot and your list of outliers. (e) How effective is the stemplot for summarizing these data? Give reasons for your answer.
14
(a) Use a bar graph to describe these data.
Region
(b) Carefully examine your table, and summarize what it shows. Are there any extreme outliers? Which ones would you classify in this way?
Population (in millions)
(a) Compute the market penetration for each region by dividing the number of users from the previous exercise by the population size given in this exercise. Multiply these ratios by 100 to make the ratios similar to percents, and make a table of the results. Use the values in this table to answer the remaining parts of this exercise.
(f) Explain why the values in the table that you constructed in part (a) are not the same as the percents of the population in each region who are users. 1.42 Sketch a skewed distribution. Sketch a histogram for a distribution that is skewed to the left. Suppose that you and your friends emptied your pockets of coins and recorded the year marked on each coin. The distribution of dates would be skewed to the left. Explain why. 1.43 Grades and self-concept. Table 1.3 presents data on 78 seventh-grade students in a rural midwestern school.19 The researcher was interested in the relationship between the students’ “self-concept” and their academic performance. The data we give here include each student’s grade point average (GPA), score on a standard IQ test, and gender, taken from school records. Gender is coded as F for female and M for male. The students are identified only by an observation number (OBS). The missing OBS numbers show that some students dropped out of the study. The final variable is each student’s score on the Piers-Harris Children’s Self-Concept Scale, a psychological test administered SEVENGR by the researcher. (a) How many variables does this data set contain? Which are categorical variables and which are quantitative variables? (b) Make a stemplot of the distribution of GPA, after rounding to the nearest tenth of a point. (c) Describe the shape, center, and spread of the GPA distribution. Identify any suspected outliers from the overall pattern. (d) Make a back-to-back stemplot of the rounded GPAs for female and male students. Write a brief comparison of the two distributions. 1.44 Describe the IQ scores. Make a graph of the distribution of IQ scores for the seventh-grade students in Table 1.3. Describe the shape, center, and spread
1.2 Displaying Distributions with Graphs of the distribution, as well as any outliers. IQ scores are usually said to be centered at 100. Is the midpoint for these students close to 100, clearly above, or SEVENGR clearly below?
1.45 Describe the self-concept scores. Based on a suitable graph, briefly describe the distribution of selfconcept scores for the students in Table 1.3. Be sure to identify any suspected outliers. SEVENGR
TABLE 1.3 Educational Data for 78 Seventh-Grade Students OBS
GPA
IQ
Gender
Selfconcept
OBS
GPA
IQ
Gender
Selfconcept
001
7.940
111
M
67
043
10.760
123
M
64
002
8.292
107
M
43
044
9.763
124
M
58
003
4.643
100
M
52
045
9.410
126
M
70
004
7.470
107
M
66
046
9.167
116
M
72
005
8.882
114
F
58
047
9.348
127
M
70
006
7.585
115
M
51
048
8.167
119
M
47
007
7.650
111
M
71
050
3.647
97
M
52
008
2.412
97
M
51
051
3.408
86
F
46
009
6.000
100
F
49
052
3.936
102
M
66
010
8.833
112
M
51
053
7.167
110
M
67
011
7.470
104
F
35
054
7.647
120
M
63
012
5.528
89
F
54
055
0.530
103
M
53
013
7.167
104
M
54
056
6.173
115
M
67
014
7.571
102
F
64
057
7.295
93
M
61
015
4.700
91
F
56
058
7.295
72
F
54
016
8.167
114
F
69
059
8.938
111
F
60
017
7.822
114
F
55
060
7.882
103
F
60
018
7.598
103
F
65
061
8.353
123
M
63
019
4.000
106
M
40
062
5.062
79
M
30
020
6.231
105
F
66
063
8.175
119
M
54
021
7.643
113
M
55
064
8.235
110
M
66
022
1.760
109
M
20
065
7.588
110
M
44
024
6.419
108
F
56
068
7.647
107
M
49
026
9.648
113
M
68
069
5.237
74
F
44
027
10.700
130
F
69
071
7.825
105
M
67
028
10.580
128
M
70
072
7.333
112
F
64
029
9.429
128
M
80
074
9.167
105
M
73
030
8.000
118
M
53
076
7.996
110
M
59
031
9.585
113
M
65
077
8.714
107
F
37
032
9.571
120
F
67
078
7.833
103
F
63
033
8.998
132
F
62
079
4.885
77
M
36
034
8.333
111
F
39
080
7.998
98
F
64
035
8.175
124
M
71
083
3.820
90
M
42
036
8.000
127
M
59
084
5.936
96
F
28
037
9.333
128
F
60
085
9.000
112
F
60
038
9.500
136
M
64
086
9.500
112
F
70
039
9.167
106
M
71
087
6.057
114
M
51
040
10.140
118
F
72
088
6.057
93
F
21
041
9.999
119
F
54
089
6.938
106
M
56
29
30
CHAPTER 1
•
Looking at Data—Distributions
1.46 The Boston Marathon. Women were allowed to enter the Boston Marathon in 1972. Here are the times
(in minutes, rounded to the nearest minute) for the winning women from 1972 to 2012:
Year
Time
Year
Time
Year
Time
Year
Time
1972
190
1983
143
1994
142
2005
145
1973
186
1984
149
1995
145
2006
143
1974
167
1985
154
1996
147
2007
149
1975
162
1986
145
1997
146
2008
145
1976
167
1987
146
1998
143
2009
152
1977
168
1988
145
1999
143
2010
146
1978
165
1989
144
2000
146
2011
142
1979
155
1990
145
2001
144
2012
151
1980
154
1991
144
2002
141
1981
147
1992
144
2003
145
1982
150
1993
145
2004
144
Make a graph that shows change over time. What overall pattern do you see? Have times stopped
improving in recent years? If so, when did improvement end? MARATH
1.3 Describing Distributions with Numbers When you complete this section, you will be able to • Describe the center of a distribution by using the mean. • Describe the center of a distribution by using the median. • Compare the mean and the median as measures of center for a particular set of data. • Describe the spread of a distribution by using quartiles. • Describe a distribution by using the five-number summary. • Describe a distribution by using a boxplot and a modified boxplot. • Compare one or more sets of data measured on the same variable by using side-by-side boxplots. • Identify outliers by using the 1.5 3 IQR rule. • Describe the spread of a distribution by using the standard deviation. • Choose measures of center and spread for a particular set of data. • Compute the effects of a linear transformation on the mean, the median, the standard deviation, and the interquartile range.
We can begin our data exploration with graphs, but numerical summaries make our analysis more specific. A brief description of a distribution should include its shape and numbers describing its center and spread. We describe the shape of a distribution based on inspection of a histogram or a stemplot. Now we will learn specific ways to use numbers to measure the center and spread of a distribution. We can calculate these numerical measures for any
1.3 Describing Distributions with Numbers
31
quantitative variable. But to interpret measures of center and spread, and to choose among the several measures we will learn, you must think about the shape of the distribution and the meaning of the data. The numbers, like graphs, are aids to understanding, not “the answer” in themselves.
EXAMPLE 1.23 The distribution of business start times. An entrepreneur faces many DATA TIME24
555567778 2233588 78 6 8 2 06
CHALLENGE
0 1 2 3 4 5 6 7 8 9
4
FIGURE 1.14 Stemplot for the sample of 24 business start times, for Example 1.23.
bureaucratic and legal hurdles when starting a new business. The World Bank collects information about starting businesses throughout the world. They have determined the time, in days, to complete all the procedures required to start a business.20 Data for 184 countries are included in the data file TIME. In this section we will examine data for a sample of 24 of these countries. Here are the data (start times, in days): 13
66
36
12
8
27
6
7
5
7
52
48
15
7
12
94
28
5
13
60
5
5
18
18
The stemplot in Figure 1.14 shows us the shape, center, and spread of the business start times. The stems are tens of days and the leaves are days. The distribution is highly skewed to the right. The largest value, 94, is separated from the rest of the distribution. We could consider this observation to be an outlier, but it appears to be part of a very long right tail. The values range from 5 to 94 days with a center somewhere around 10.
Measuring center: the mean Numerical description of a distribution begins with a measure of its center or average. The two common measures of center are the mean and the median. The mean is the “average value” and the median is the “middle value.” These are two different ideas for “center,” and the two measures behave differently. We need precise recipes for the mean and the median.
THE MEAN x To find the mean x of a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, . . . , xn, their mean is x⫽
x1 ⫹ x2 ⫹ p ⫹ xn n
or, in more compact notation, x⫽
1 x na i
The g (capital Greek sigma) in the formula for the mean is short for “add them all up.” The bar over the x indicates the mean of all the x-values. Pronounce the mean x as “x-bar.” This notation is so common that writers who are discussing data use x, y, etc. without additional explanation. The subscripts on the observations xi are a way of keeping the n observations separate.
32
CHAPTER 1
•
Looking at Data—Distributions
EXAMPLE DATA
1.24 Mean time to start a business. The mean time to start a business is x⫽
TIME24
⫽ ⫽
x1 ⫹ x2 ⫹ p ⫹ xn n 13 ⫹ 66 ⫹ p ⫹ 18 24 567 ⫽ 23.625 24
CHALLENGE
The mean time to start a business for the 24 countries in our data set is 23.6 days. Note that we have rounded the answer. Our goal is to use the mean to describe the center of a distribution; it is not to demonstrate that we can compute with great accuracy. The extra digits do not provide any additional useful information. In fact, they distract our attention from the important digits that are meaningful. Do you think it would be better to report the mean as 24 days? The value of the mean will not necessarily be equal to the value of one of the observations in the data set. Our example of time to start a business illustrates this fact. USE YOUR KNOWLEDGE
DATA TIME25
1.47 Include the outlier. The complete business start time data set with 184 countries has a few with very large start times. In constructing the data set for Example 1.23, a random sample of 25 countries was selected. This sample included the South American country of Suriname, where the start time is 694 days. This country was deleted for Example 1.23. Reconstruct the original random sample by including Suriname. Show that the mean has increased to 50 days. (This is a rounded number. You should report the mean with one digit after the decimal.) The effect of the outlier is to more than double the mean.
CHALLENGE DATA
1.48 Find the mean. Here are the scores on the first exam in an introductory statistics course for 10 students: STAT
81
73
93
85
75
98
93
55
80
90
Find the mean first-exam score for these students.
CHALLENGE
resistant measure
robust measure
Exercise 1.47 illustrates an important weakness of the mean as a measure of center: the mean is sensitive to the influence of a few extreme observations. These may be outliers, but a skewed distribution that has no outliers will also pull the mean toward its long tail. Because the mean cannot resist the influence of extreme observations, we say that it is not a resistant measure of center. A measure that is resistant does more than limit the influence of outliers. Its value does not respond strongly to changes in a few observations, no matter how large those changes may be. The mean fails this requirement because we can make the mean as large as we wish by making a large enough increase in just one observation. A resistant measure is sometimes called a robust measure.
1.3 Describing Distributions with Numbers
33
Measuring center: the median We used the midpoint of a distribution as an informal measure of center in Section 1.2. The median is the formal version of the midpoint, with a specific rule for calculation.
THE MEDIAN M The median M is the midpoint of a distribution. Half the observations are smaller than the median, and the other half are larger than the median. Here is a rule for finding the median: 1. Arrange all observations in order of size, from smallest to largest. 2. If the number of observations n is odd, the median M is the center observation in the ordered list. Find the location of the median by counting 1n ⫹ 12兾2 observations up from the bottom of the list. 3. If the number of observations n is even, the median M is the mean of the two center observations in the ordered list. The location of the median is again 1n ⫹ 12兾2 from the bottom of the list.
Note that the formula 1n ⫹ 12兾2 does not give the median, just the location of the median in the ordered list. Medians require little arithmetic, so they are easy to find by hand for small sets of data. Arranging even a moderate number of observations in order is tedious, however, so that finding the median by hand for larger sets of data is unpleasant. Even simple calculators have an x button, but you will need computer software or a graphing calculator to automate finding the median.
EXAMPLE DATA TIME24
1.25 Median time to start a business. To find the median time to start a business for our 24 countries, we first arrange the data in order from smallest to largest. 5
5
5
5
6
7
7
7
8
12
12
13
13
15
18
18
27
28
36
48
52
60
66
94
CHALLENGE
The count of observations n ⫽ 24 is even. The median, then, is the average of the two center observations in the ordered list. To find the location of the center observations, we first compute n⫹1 25 ⫽ ⫽ 12.5 2 2 Therefore, the center observations are the 12th and 13th observations in the ordered list. The median is location of M ⫽
M⫽
13 ⫹ 13 ⫽ 13 2
34
CHAPTER 1
•
Looking at Data—Distributions Note that you can use the stemplot in Figure 1.14 directly to compute the median. In the stemplot the cases are already ordered and you simply need to count from the top or the bottom to the desired location. USE YOUR KNOWLEDGE DATA TIME25
DATA CHALLENGE DATA
CALLS80
1.49 Include the outlier. Include Suriname, where the start time is 694 days, in the data set, and show that the median is 13 days. Note that with this case included, the sample size is now 25 and the median is the 13th observation in the ordered list. Write out the ordered list and circle the outlier. Describe the effect of the outlier on the median for this set of data. 1.50 Calls to a customer service center. The service times for 80 calls to a customer service center are given in Table 1.2 (page 19). Use these data to compute the median service time. 1.51 Find the median. Here are the scores on the first exam in an introductory statistics course for 10 students:
STAT
81
73
93
85
75
98
93
55
80
90
Find the median first-exam score for these students. CHALLENGE
Mean versus median
CHALLENGE CHALLENGE
Exercises 1.47 and 1.49 illustrate an important difference between the mean and the median. Suriname is an outlier. It pulls the mean time to start a business up from 24 days to 54 days. The median remained at 13 days. The median is more resistant than the mean. If the largest start time in the data set was 1200 days, the median for all 25 countries would still be 13 days. The largest observation just counts as one observation above the center, no matter how far above the center it lies. The mean uses the actual value of each observation and so will chase a single large observation upward. The best way to compare the response of the mean and median to extreme observations is to use an interactive applet that allows you to place points on a line and then drag them with your computer’s mouse. Exercises 1.85 to 1.87 use the Mean and Median applet on the website for this book, whfreeman.com/ ips8e, to compare the mean and the median. The median and mean are the most common measures of the center of a distribution. The mean and median of a symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is farther out in the long tail than is the median. The endowment for a college or university is money set aside and invested. The income from the endowment is usually used to support various programs. The distribution of the sizes of the endowments of colleges and universities is strongly skewed to the right. Most institutions have modest endowments, but a few are very wealthy. The median endowment of colleges and universities in a recent year was $93 million—but the mean endowment was $498 million.21 The few wealthy institutions pull the mean up but do not affect the median. Don’t confuse the “average” value of a variable (the mean) with its “typical” value, which we might describe by the median.
1.3 Describing Distributions with Numbers
35
We can now give a better answer to the question of how to deal with outliers in data. First, look at the data to identify outliers and investigate their causes. You can then correct outliers if they are wrongly recorded, delete them for good reason, or otherwise give them individual attention. The outlier in Example 1.21 (page 23) can be dropped from the data once we discover that it is an error. If you have no clear reason to drop outliers, you may want to use resistant methods in your analysis, so that outliers have little influence over your conclusions. The choice is often a matter for judgment.
Measuring spread: the quartiles
quartile
percentile
A measure of center alone can be misleading. Two countries with the same median family income are very different if one has extremes of wealth and poverty and the other has little variation among families. A drug manufactured with the correct mean concentration of active ingredient is dangerous if some batches are much too high and others much too low. We are interested in the spread or variability of incomes and drug potencies as well as their centers. The simplest useful numerical description of a distribution consists of both a measure of center and a measure of spread. We can describe the spread or variability of a distribution by giving several percentiles. The median divides the data in two; half of the observations are above the median and half are below the median. We could call the median the 50th percentile. The upper quartile is the median of the upper half of the data. Similarly, the lower quartile is the median of the lower half of the data. With the median, the quartiles divide the data into four equal parts; 25% of the data are in each part. We can do a similar calculation for any percent. The pth percentile of a distribution is the value that has p percent of the observations fall at or below it. To calculate a percentile, arrange the observations in increasing order and count up the required percent from the bottom of the list. Our definition of percentiles is a bit inexact because there is not always a value with exactly p percent of the data at or below it. We will be content to take the nearest observation for most percentiles, but the quartiles are important enough to require an exact rule.
THE QUARTILES Q1 AND Q3 To calculate the quartiles: 1. Arrange the observations in increasing order and locate the median M in the ordered list of observations. 2. The first quartile Q1 is the median of the observations whose positions in the ordered list are to the left of the location of the overall median. 3. The third quartile Q3 is the median of the observations whose positions in the ordered list are to the right of the location of the overall median.
36
CHAPTER 1
•
Looking at Data—Distributions Here is an example.
EXAMPLE DATA
1.26 Finding the quartiles. Here is the ordered list of the times to start a business in our sample of 24 countries:
TIME24
5
5
5
5
6
7
7
7
8
12
12
13
13
15
18
18
27
28
36
48
52
60
66
94
CHALLENGE
The count of observations n ⫽ 24 is even, so the median is at position 124 ⫹ 12兾2 ⫽ 12.5, that is, between the 12th and the 13th observation in the ordered list. There are 12 cases above this position and 12 below it. The first quartile is the median of the first 12 observations, and the third quartile is the median of the last 12 observations. Check that Q1 ⫽ 7 and Q3 ⫽ 32. Notice that the quartiles are resistant. For example, Q3 would have the same value if the highest start time was 940 days rather than 94 days. Be careful when several observations take the same numerical value. Write down all the observations and apply the rules just as if they all had distinct values. USE YOUR KNOWLEDGE
DATA
1.52 Find the quartiles. Here are the scores on the first exam in an introductory statistics course for 10 students: 81
STAT
73
93
85
75
98
93
55
80
90
Find the quartiles for these first-exam scores.
EXAMPLE DATACHALLENGE
1.27 Results from software. Statistical software often provides several
TIME24
numerical measures in response to a single command. Figure 1.15 displays such output from Minitab, JMP, and SPSS software for the data on the time to start a business. Examine the outputs carefully. Notice that they give different numbers of significant digits for some of these numerical summaries. Which output do you prefer?
Minitab
CHALLENGE
Descriptive Statistics: TimeToStart Variable TimeToStart
N 24
Mean 23.63
StDev 23.83
Minimum 5.00
Q1 7.00
Median 13.00
Q3 34.00
Welcome to Minitab, press F1 for help.
Maximum 94.00
Editable
(a) Minitab
FIGURE 1.15 Descriptive statistics from (a) Minitab (continued )
1.3 Describing Distributions with Numbers
37
JMP
Distributions TimeToStart Quantiles 100.0%
Summary Statistics 94
Mean
99.5%
94
Std Dev
97.5%
94
Std Err Mean
4.8640252
90.0%
63
Upper 95% Mean
33.687003 13.562997
maximum
23.625 23.82876
75.0%
quartile
34
Lower 95% Mean
50.0%
median
13
N
25.0%
quartile
7
10.0%
5
2.5%
5
0.5% 0.0%
24
5 5
minimum
(b) JMP
*Output1 - IBM SPSS Statistics Viewer
Descriptives Descriptive Statistics N TimeToStart
24
Valid N (listwise)
24
Minimum
Maximum
Mean
Std. Deviation
5
94
23.63
23.829
IBM SPSS Statistics Processor is ready
(c) SPSS
FIGURE 1.15 (b) JMP, and (c) SPSS for the time to start a business, for Example 1.27.
There are several rules for calculating quartiles, which often give slightly different values. The differences are generally small. For describing data, just report the values that your software gives.
The five-number summary and boxplots In Section 1.2, we used the smallest and largest observations to indicate the spread of a distribution. These single observations tell us little about the distribution as a whole, but they give information about the tails of the distribution that is missing if we know only Q1, M, and Q3. To get a quick summary of both center and spread, use all five numbers.
38
CHAPTER 1
•
Looking at Data—Distributions
THE FIVE-NUMBER SUMMARY The five-number summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the fivenumber summary is Minimum Q1 M Q3 Maximum
EXAMPLE DATA CALLS80
1.28 Service center call lengths. Table 1.2 (page 19) gives the service center call lengths for the sample of 80 calls that we discussed in Example 1.15. The five-number summary for these data is 1.0, 54.5, 103.5, 200, and 2631. The distribution is highly skewed. The mean is 197 seconds, a value that is very close to the third quartile.
USE YOUR KNOWLEDGE CHALLENGE DATA CALLS80
1.53 Verify the calculations. Refer to the five-number summary and the mean for service center call lengths given in Example 1.28. Verify these results. Do not use software for this exercise and be sure to show all your work.
DATA
1.54 Find the five-number summary. Here are the scores on the first exam in an introductory statistics course for 10 students: 81
STAT
73
93
85
75
98
93
55
80
90
Find the five-number summary for these first-exam scores. CHALLENGE
The five-number summary leads to another visual representation of a distribution, the boxplot. CHALLENGE
BOXPLOT A boxplot is a graph of the five-number summary. • A central box spans the quartiles Q1 and Q3. • A line in the box marks the median M. • Lines extend from the box out to the smallest and largest observations.
whiskers box-and-whisker plots
The lines extending to the smallest and largest observations are sometimes called whiskers, and boxplots are sometimes called box-and-whisker plots.
1.3 Describing Distributions with Numbers
39
Software provides many varieties of boxplots, some of which use different choices for the placement of the whiskers. When you look at a boxplot, first locate the median, which marks the center of the distribution. Then look at the spread. The quartiles show the spread of the middle half of the data, and the extremes (the smallest and largest observations) show the spread of the entire data set.
EXAMPLE DATA IQ
1.29 IQ scores. In Example 1.14 (page 16), we used a histogram to examine the distribution of a sample of 60 IQ scores. A boxplot for these data is given in Figure 1.16. Note that the mean is marked with a “1” and appears very close to the median. The two quartiles are each approximately the same distance from the median, and the two whiskers are approximately the same distance from the corresponding quartiles. All these characteristics are consistent with a symmetric distribution, as illustrated by the histogram in Figure 1.9.
CHALLENGE DATA
USE YOUR KNOWLEDGE 1.55 Make a boxplot. Here are the scores on the first exam in an introductory statistics course for 10 students: 81
STAT
73
93
85
75
98
93
55
80 90
Make a boxplot for these first-exam scores.
The 1.5 3 IQR rule for suspected outliers CHALLENGE
If we look at the data in Table 1.2 (page 19), we can spot a clear outlier, a call lasting 2631 seconds, more than twice the length of any other call. How can we describe the spread of this distribution? The smallest and largest observations are extremes that do not describe the spread of the majority of the data.
FIGURE 1.16 Boxplot for sample
160
of 60 IQ scores, for Example 1.29.
1Q
140
120 ⴙ
100
80
40
CHAPTER 1
•
Looking at Data—Distributions The distance between the quartiles (the range of the center half of the data) is a more resistant measure of spread than the range. This distance is called the interquartile range.
THE INTERQUARTILE RANGE IQR The interquartile range IQR is the distance between the first and third quartiles: IQR ⫽ Q3 ⫺ Q1
EXAMPLE 1.30 IQR for service center call length data. In Exercise 1.53 (page 38) you verified that the five-number summary for our data on service center call lengths was 1.0, 54.5, 103.5, 200, and 2631. Therefore, we calculate IQR ⫽ Q3 ⫺ Q1 IQR ⫽ 200 ⫺ 54.5 ⫽ 145.5 The quartiles and the IQR are not affected by changes in either tail of the distribution. They are therefore resistant, because changes in a few data points have no further effect once these points move outside the quartiles. However, no single numerical measure of spread, such as IQR, is very useful for describing skewed distributions. The two sides of a skewed distribution have different spreads, so one number can’t summarize them. We can often detect skewness from the five-number summary by comparing how far the first quartile and the minimum are from the median (left tail) with how far the third quartile and the maximum are from the median (right tail). The interquartile range is mainly used as the basis for a rule of thumb for identifying suspected outliers.
THE 1.5 3 IQR RULE FOR OUTLIERS Call an observation a suspected outlier if it falls more than 1.5 ⫻ IQR above the third quartile or below the first quartile.
EXAMPLE DATA CALLS80
1.31 Outliers for call length data. For the call length data in Table 1.2 (page 19), 1.5 ⫻ IQR ⫽ 1.5 ⫻ 145.5 ⫽ 218.25 Any values below 54.5 ⫺ 218.25 ⫽ ⫺163.75 or above 200 ⫹ 218.25 ⫽ 418.25 are flagged as possible outliers. There are no low outliers, but the 8 longest calls are flagged as possible high outliers. Their lengths are 438 465 479 700 700 951 1148 2631
CHALLENGE
1.3 Describing Distributions with Numbers
41
USE YOUR KNOWLEDGE DATA
1.56 Find the IQR. Here are the scores on the first exam in an introductory statistics course for 10 students: 81
STAT
73
93
85
75
98
93
55
80 90
Find the interquartile range and use the 1.5 ⫻ IQR rule to check for outliers. How low would the lowest score need to be for it to be an outlier according to this rule?
CHALLENGE
modified boxplot
side-by-side boxplots
Two variations on the basic boxplot can be very useful. The first, called a modified boxplot, uses the 1.5 ⫻ IQR rule. The lines that extend out from the quartiles are terminated in whiskers that are 1.5 ⫻ IQR in length. Points beyond the whiskers are plotted individually and are classified as outliers according to the 1.5 ⫻ IQR rule. The other variation is to use two or more boxplots in the same graph to compare groups measured on the same variable. These are called side-by-side boxplots. The following example illustrates these two variations.
EXAMPLE DATA POETS
1.32 Do poets die young? According to William Butler Yeats, “She is the Gaelic muse, for she gives inspiration to those she persecutes. The Gaelic poets die young, for she is restless, and will not let them remain long on earth.” One study designed to investigate this issue examined the age at death for writers from different cultures and genders.22 Three categories of writers examined were novelists, poets, and nonfiction writers. We examine the ages at death for female writers in these categories from North America. Figure 1.17 shows modified side-by-side boxplots for the three categories of writers.
CHALLENGE
FIGURE 1.17 Modified side-
100
by-side boxplots for the data on writers’ age at death, for Example 1.32.
Age (years)
80
ⴙ ⴙ ⴙ
60
40
110
20 Nonfiction
Novels Type
Poems
42
CHAPTER 1
•
Looking at Data—Distributions Displaying the boxplots for the three categories of writers lets us compare the three distributions. We see that nonfiction writers tend to live the longest, followed by novelists. The poets do appear to die young! There is one outlier among the nonfiction writers, which is plotted individually along with the value of its label. This writer died at the age of 40, young for a nonfiction writer, but not for a novelist or a poet!
Measuring spread: the standard deviation The five-number summary is not the most common numerical description of a distribution. That distinction belongs to the combination of the mean to measure center and the standard deviation to measure spread, or variability. The standard deviation measures spread by looking at how far the observations are from their mean.
THE STANDARD DEVIATION s The variance s2 of a set of observations is the average of the squares of the deviations of the observations from their mean. In symbols, the variance of n observations x1, x2, . . . , xn is 1x1 ⫺ x 2 2 ⫹ 1x2 ⫺ x 2 2 ⫹ p ⫹ 1xn ⫺ x 2 2 s2 ⫽ n⫺1 or, in more compact notation, s2 ⫽
1 g 1xi ⫺ x 2 2 n⫺1
The standard deviation s is the square root of the variance s2: s⫽
1 g 1xi ⫺ x 2 2 Bn ⫺ 1
The idea behind the variance and the standard deviation as measures of spread is as follows: The deviations xi ⫺ x display the spread of the values xi about their mean x. Some of these deviations will be positive and some negative because some of the observations fall on each side of the mean. In fact, the sum of the deviations of the observations from their mean will always be zero. Squaring the deviations makes the negative deviations positive, so that observations far from the mean in either direction have large positive squared deviations. The variance is the average squared deviation. Therefore, s2 and s will be large if the observations are widely spread about their mean, and small if the observations are all close to the mean.
EXAMPLE DATA METABOL
1.33 Metabolic rate. A person’s metabolic rate is the rate at which the body consumes energy. Metabolic rate is important in studies of weight gain, dieting, and exercise. Here are the metabolic rates of 7 men who took
1.3 Describing Distributions with Numbers _ x = 1600
x = 1439
x = 1792
deviation = –161
1300
1400
1500
43
deviation = 192
1600 Metabolic rate
1700
1800
1900
FIGURE 1.18 Metabolic rates for seven men, with the mean (*) and the deviations of two observations from the mean, for Example 1.33.
part in a study of dieting. (The units are calories per 24 hours. These are the same calories used to describe the energy content of foods.) 1792
1666
1362
1614
1460
1867
1439
Enter these data into your calculator or software and verify that x ⫽ 1600 calories s ⫽ 189.24 calories Figure 1.18 plots these data as dots on the calorie scale, with their mean marked by an asterisk 1*2. The arrows mark two of the deviations from the mean. If you were calculating s by hand, you would find the first deviation as x1 ⫺ x ⫽ 1792 ⫺ 1600 ⫽ 192
Exercise 1.82 asks you to calculate the seven deviations from Example 1.33, square them, and find s2 and s directly from the deviations. Working one or two short examples by hand helps you understand how the standard deviation is obtained. In practice, you will use either software or a calculator that will find s. The software outputs in Figure 1.15 (page 37) give the standard deviation for the data on the time to start a business.
USE YOUR KNOWLEDGE DATA
1.57 Find the variance and the standard deviation. Here are the scores on the first exam in an introductory statistics course for 10 students: STAT
81
73
93
85
75
98
93
55
80
90
Find the variance and the standard deviation for these first-exam scores.
CHALLENGE
The idea of the variance is straightforward: it is the average of the squares of the deviations of the observations from their mean. The details we have just presented, however, raise some questions. Why do we square the deviations? • First, the sum of the squared deviations of any set of observations from their mean is the smallest that the sum of squared deviations from any number can possibly be. This is not true of the unsquared distances. So squared deviations point to the mean as center in a way that distances do not.
44
CHAPTER 1
•
Looking at Data—Distributions • Second, the standard deviation turns out to be the natural measure of spread for a particularly important class of symmetric unimodal distributions, the Normal distributions. We will meet the Normal distributions in the next section. Why do we emphasize the standard deviation rather than the variance? • One reason is that s, not s2, is the natural measure of spread for Normal distributions, which are introduced in the next section. • There is also a more general reason to prefer s to s2. Because the variance involves squaring the deviations, it does not have the same unit of measurement as the original observations. The variance of the metabolic rates, for example, is measured in squared calories. Taking the square root gives us a description of the spread of the distribution in the original measurement units. Why do we average by dividing by n 2 1 rather than n in calculating the variance? • Because the sum of the deviations is always zero, the last deviation can be found once we know the other n ⫺ 1. So we are not averaging n unrelated numbers. Only n ⫺ 1 of the squared deviations can vary freely, and we average by dividing the total by n ⫺ 1.
degrees of freedom
• The number n ⫺ 1 is called the degrees of freedom of the variance or standard deviation. Many calculators offer a choice between dividing by n and dividing by n ⫺ 1, so be sure to use n ⫺ 1.
Properties of the standard deviation Here are the basic properties of the standard deviation s as a measure of spread.
PROPERTIES OF THE STANDARD DEVIATION • s measures spread about the mean and should be used only when the mean is chosen as the measure of center. • s ⫽ 0 only when there is no spread. This happens only when all observations have the same value. Otherwise, s ⬎ 0. As the observations become more spread out about their mean, s gets larger. • s, like the mean x, is not resistant. A few outliers can make s very large. USE YOUR KNOWLEDGE 1.58 A standard deviation of zero. Construct a data set with 5 cases that has a variable with s ⫽ 0. The use of squared deviations renders s even more sensitive than x to a few extreme observations. For example, when we add Suriname to our sample of 24 countries for the analysis of the time to start a business (Example 1.24 and Exercise 1.47), we increase the standard deviation from 23.8 to 137.9! Distributions with outliers and strongly skewed distributions have standard deviations that do not give much helpful information about such distributions.
1.3 Describing Distributions with Numbers
45
USE YOUR KNOWLEDGE DATA TIME24, TIME25
1.59 Effect of an outlier on the IQR. Find the IQR for the time to start a business with and without Suriname. What do you conclude about the sensitivity of this measure of spread to the inclusion of an outlier?
Choosing measures of center and spread
CHALLENGE
How do we choose between the five-number summary and x and s to describe the center and spread of a distribution? Because the two sides of a strongly skewed distribution have different spreads, no single number such as s describes the spread well. The five-number summary, with its two quartiles and two extremes, does a better job.
CHOOSING A SUMMARY The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers. Use x and s for reasonably symmetric distributions that are free of outliers.
Remember that a graph gives the best overall picture of a distribution. Numerical measures of center and spread report specific facts about a distribution, but they do not describe its shape. Numerical summaries do not disclose the presence of multiple modes or gaps, for example. Always plot your data.
Changing the unit of measurement The same variable can be recorded in different units of measurement. Americans commonly record distances in miles and temperatures in degrees Fahrenheit, while the rest of the world measures distances in kilometers and temperatures in degrees Celsius. Fortunately, it is easy to convert numerical descriptions of a distribution from one unit of measurement to another. This is true because a change in the measurement unit is a linear transformation of the measurements.
LINEAR TRANSFORMATIONS A linear transformation changes the original variable x into the new variable xnew given by an equation of the form xnew ⫽ a ⫹ bx Adding the constant a shifts all values of x upward or downward by the same amount. In particular, such a shift changes the origin (zero point) of the variable. Multiplying by the positive constant b changes the size of the unit of measurement.
46
CHAPTER 1
•
Looking at Data—Distributions
EXAMPLE 1.34 Change the units. (a) If a distance x is measured in kilometers, the same distance in miles is xnew ⫽ 0.62x For example, a 10-kilometer race covers 6.2 miles. This transformation changes the units without changing the origin—a distance of 0 kilometers is the same as a distance of 0 miles. (b) A temperature x measured in degrees Fahrenheit must be reexpressed in degrees Celsius to be easily understood by the rest of the world. The transformation is xnew ⫽
5 160 5 1x ⫺ 322 ⫽ ⫺ ⫹ x 9 9 9
Thus, the high of 95°F on a hot American summer day translates into 35°C. In this case a⫽⫺
160 5 and b ⫽ 9 9
This linear transformation changes both the unit size and the origin of the measurements. The origin in the Celsius scale 10°C, the temperature at which water freezes) is 32° in the Fahrenheit scale. Linear transformations do not change the shape of a distribution. If measurements on a variable x have a right-skewed distribution, any new variable xnew obtained by a linear transformation xnew ⫽ a ⫹ bx (for b ⬎ 02 will also have a right-skewed distribution. If the distribution of x is symmetric and unimodal, the distribution of xnew remains symmetric and unimodal. Although a linear transformation preserves the basic shape of a distribution, the center and spread will change. Because linear changes of measurement scale are common, we must be aware of their effect on numerical descriptive measures of center and spread. Fortunately, the changes follow a simple pattern.
EXAMPLE 1.35 Use scores to find the points. In an introductory statistics course, homework counts for 300 points out of a total of 1000 possible points for all course requirements. During the semester there were 12 homework assignments, and each was given a grade on a scale of 0 to 100. The maximum total score for the 12 homework assignments is therefore 1200. To convert the homework scores to final grade points, we need to convert the scale of 0 to 1200 to a scale of 0 to 300. We do this by multiplying the homework scores by 300/1200. In other words, we divide the homework scores by 4. Here are the homework scores and the corresponding final grade points for 5 students: Student
1
2
3
4
5
Score
1056
1080
900
1164
1020
Points
264
270
225
291
255
1.3 Describing Distributions with Numbers
47
These two sets of numbers measure the same performance on homework for the course. Since we obtained the points by dividing the scores by 4, the mean of the points will be the mean of the scores divided by 4. Similarly, the standard deviation of points will be the standard deviation of the scores divided by 4. USE YOUR KNOWLEDGE 1.60 Calculate the points for a student. Use the setting of Example 1.35 to find the points for a student whose score is 850. Here is a summary of the rules for linear transformations.
EFFECT OF A LINEAR TRANSFORMATION To see the effect of a linear transformation on measures of center and spread, apply these rules: • Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (interquartile range and standard deviation) by b. • Adding the same number a (either positive or negative) to each observation adds a to measures of center and to quartiles and other percentiles but does not change measures of spread.
In Example 1.35, when we converted from score to points, we described the transformation as dividing by 4. The multiplication part of the summary of the effect of a linear transformation applies to this case because division by 4 is the same as multiplication by 0.25. Similarly, the second part of the summary applies to subtraction as well as addition because subtraction is simply the addition of a negative number. The measures of spread IQR and s do not change when we add the same number a to all the observations because adding a constant changes the location of the distribution but leaves the spread unaltered. You can find the effect of a linear transformation xnew ⫽ a ⫹ bx by combining these rules. For example, if x has mean x, the transformed variable xnew has mean a ⫹ bx.
SECTION 1.3 Summary A numerical summary of a distribution should report its center and its spread, or variability. The mean x and the median M describe the center of a distribution in different ways. The mean is the arithmetic average of the observations, and the median is their midpoint. When you use the median to describe the center of a distribution, describe its spread by giving the quartiles. The first quartile Q1 has one-fourth of the observations below it, and the third quartile Q3 has three-fourths of the observations below it.
48
CHAPTER 1
•
Looking at Data—Distributions The interquartile range is the difference between the quartiles. It is the spread of the center half of the data. The 1.5 ⴛ IQR rule flags observations more than 1.5 ⫻ IQR beyond the quartiles as possible outliers. The five-number summary consisting of the median, the quartiles, and the smallest and largest individual observations provides a quick overall description of a distribution. The median describes the center, and the quartiles and extremes show the spread. Boxplots based on the five-number summary are useful for comparing several distributions. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the extremes and show the full spread of the data. In a modified boxplot, points identified by the 1.5 ⫻ IQR rule are plotted individually. Side-by-side boxplots can be used to display boxplots for more than one group on the same graph. The variance s2 and especially its square root, the standard deviation s, are common measures of spread about the mean as center. The standard deviation s is zero when there is no spread and gets larger as the spread increases. A resistant measure of any aspect of a distribution is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are. The median and quartiles are resistant, but the mean and the standard deviation are not. The mean and standard deviation are good descriptions for symmetric distributions without outliers. They are most useful for the Normal distributions introduced in the next section. The five-number summary is a better exploratory description for skewed distributions. Linear transformations have the form xnew ⫽ a ⫹ bx. A linear transformation changes the origin if a ⬆ 0 and changes the size of the unit of measurement if b ⬎ 0. Linear transformations do not change the overall shape of a distribution. A linear transformation multiplies a measure of spread by b and changes a percentile or measure of center m into a ⫹ bm. Numerical measures of particular aspects of a distribution, such as center and spread, do not report the entire shape of most distributions. In some cases, particularly distributions with multiple peaks and gaps, these measures may not be very informative.
SECTION 1.3 Exercises For Exercises 1.47 and 1.48, see page 32; for Exercises 1.49 to 1.51, see page 34; for Exercise 1.52, see page 36; for Exercises 1.53 and 1.54, see page 38; for Exercise 1.55, see page 39; for Exercise 1.56, see page 41; for Exercise 1.57, see page 43; for Exercise 1.58, see page 44; for Exercise 1.59, see page 45; and for Exercise 1.60, see page 47. 1.61 Gosset’s data on double stout sales. William Sealy Gosset worked at the Guinness Brewery in Dublin and made substantial contributions to the practice of statistics.23 In his work at the brewery he collected and analyzed a great deal of data. Archives with Gosset’s handwritten tables, graphs, and notes have been preserved at the Guinness Storehouse in Dublin.24 In one
study, Gosset examined the change in the double stout market before and after World War I (1914–1918). For various regions in England and Scotland, he calculated the ratio of sales in 1925, after the war, as a percent of sales in 1913, before the war. Here are the data: STOUT Bristol
94
Glasgow
66
Cardiff
112
Liverpool
140 428
English Agents
78
London
English O
68
Manchester
190
English P
46
Newcastle-on-Tyne
118
English R
111
Scottish
24
1.3 Describing Distributions with Numbers (a) Compute the mean for these data.
49
1.65 Measures of spread for smolts. Refer to the previSMOLTS ous exercise.
(b) Compute the median for these data. (c) Which measure do you prefer for describing the center of this distribution? Explain your answer. (You may include a graphical summary as part of your explanation.) 1.62 Measures of spread for the double stout data. Refer to the previous exercise. STOUT (a) Compute the standard deviation for these data.
(a) Find the standard deviation of the reflectance for these smolts. (b) Find the quartiles of the reflectance for these smolts. (c) Do you prefer the standard deviation or the quartiles as a measure of spread for these data? Give reasons for your preference. 1.66 Are there outliers in the smolt data? Refer to Exercise 1.64. SMOLTS
(b) Compute the quartiles for these data.
(a) Find the IQR for the smolt data.
(c) Which measure do you prefer for describing the spread of this distribution? Explain your answer. (You may include a graphical summary as part of your explanation.)
(b) Use the 1.5 3 IQR rule to identify any outliers. (c) Make a boxplot for the smolt data and describe the distribution using only the information in the boxplot.
1.63 Are there outliers in the double stout data? Refer to Exercise 1.61. STOUT
(d) Make a modified boxplot for these data and describe the distribution using only the information in the boxplot.
(a) Find the IQR for these data.
(e) Make a stemplot for these data.
(b) Use the 1.5 3 IQR rule to identify and name any outliers. (c) Make a boxplot for these data and describe the distribution using only the information in the boxplot. (d) Make a modified boxplot for these data and describe the distribution using only the information in the boxplot. (e) Make a stemplot for these data. (f) Compare the boxplot, the modified boxplot, and the stemplot. Evaluate the advantages and disadvantages of each graphical summary for describing the distribution of the double stout data. 1.64 Smolts. Smolts are young salmon at a stage when their skin becomes covered with silvery scales and they start to migrate from freshwater to the sea. The reflectance of a light shined on a smolt’s skin is a measure of the smolt’s readiness for the migration. Here are the reflectances, in percents, for a sample of 50 smolts:25 SMOLTS
57.6
54.8
63.4
57.0
54.7
42.3
63.6
55.5
33.5
63.3
58.3
42.1
56.1
47.8
56.1
55.9
38.8
49.7
42.3
45.6
69.0
50.4
53.0
38.3
60.4
49.3
42.8
44.5
46.4
44.3
58.9
42.1
47.6
47.9
69.2
46.6
68.1
42.8
45.6
47.3
59.6
37.8
53.9
43.2
51.4
64.5
43.8
42.7
50.9
43.8
(a) Find the mean reflectance for these smolts. (b) Find the median reflectance for these smolts. (c) Do you prefer the mean or the median as a measure of center for these data? Give reasons for your preference.
(f) Compare the boxplot, the modified boxplot, and the stemplot. Evaluate the advantages and disadvantages of each graphical summary for describing the distribution of the smolt reflectance data. 1.67 The value of brands. A brand is a symbol or images that are associated with a company. An effective brand identifies the company and its products. Using a variety of measures, dollar values for brands can be calculated.26 The most valuable brand is Apple, with a value of $76.568 million. Apple is followed by Google at $69.726 million, Coca-Cola at $67.839 million, Microsoft at $57.853 million, and IBM at $57.532 million. For this exercise you will use the brand values (in millions of dollars) for the BRANDS top 100 brands in the data file BRANDS. (a) Graphically display the distribution of the values of these brands. (b) Use numerical measures to summarize the distribution. (c) Write a short paragraph discussing the dollar values of the top 100 brands. Include the results of your analysis. 1.68 Alcohol content of beer. Brewing beer involves a variety of steps that can affect the alcohol content. The data file BEER gives the percent alcohol for 153 domestic brands of beer.27 BEER (a) Use graphical and numerical summaries of your choice to describe these data. Give reasons for your choices. (b) Give the alcohol content and the brand of any outliers. Explain how you determined that they were outliers.
50
CHAPTER 1
•
Looking at Data—Distributions
1.69 Remove the outliers for alcohol content of beer. BEER Refer to the previous exercise. (a) Calculate the mean with and without the outliers. Do the same for the median. Explain how these statistics change when the outliers are excluded. (b) Calculate the standard deviation with and without the outliers. Do the same for the quartiles. Explain how these statistics change when the outliers are excluded. (c) Write a short paragraph summarizing what you have learned in this exercise. 1.70 Calories in beer. Refer to the previous two exercises. The data file also gives the calories per 12 ounces of beverage. BEER (a) Analyze the data and summarize the distribution of calories for these 153 brands of beer. (b) In Exercise 1.68 you identified outliers. To what extent are these brands outliers in the distribution of calories? Explain your answer. 1.71 Potatoes. A quality product is one that is consistent and has very little variability in its characteristics. Controlling variability can be more difficult with agricultural products than with those that are manufactured. The following table gives the weights, in ounces, of the 25 potatoes sold in a 10-pound bag. POTATO 7.6 7.9 8.0 6.9 6.7 7.9 7.9 7.9 7.6 7.8 7.0 4.7 7.6 6.3 4.7 4.7 4.7 6.3 6.0 5.3 4.3 7.9 5.2 6.0 3.7
(a) Summarize the data graphically and numerically. Give reasons for the methods you chose to use in your summaries. (b) Do you think that your numerical summaries do an effective job of describing these data? Why or why not? (c) There appear to be two distinct clusters of weights for these potatoes. Divide the sample into two subsamples based on the clustering. Give the mean and standard deviation for each subsample. Do you think that this way of summarizing these data is better than a numerical summary that uses all the data as a single sample? Give a reason for your answer. 1.72 Longleaf pine trees. The Wade Tract in Thomas County, Georgia, is an old-growth forest of longleaf pine trees (Pinus palustris) that has survived in a relatively undisturbed state since before the settlement of the area by Europeans. A study collected data on 584 of these trees.28 One of the variables measured was the diameter at breast height (DBH). This is the diameter of the tree at 4.5 feet and the units are centimeters (cm). Only trees with DBH greater than 1.5 cm were sampled. Here are the diameters of a random sample of 40 of these trees: PINES
10.5
13.3
26.0
18.3
52.2
9.2
26.1
17.6
40.5
47.2
11.4
2.7
69.3
44.4
16.9
35.7
5.4
44.2
31.8 2.2
4.3
7.8
38.1
2.2
11.4
51.5
4.9
39.7
32.6
51.8
43.6
2.3
44.6
31.5
40.3
22.3
43.3
37.5
29.1
27.9
(a) Find the five-number summary for these data. (b) Make a boxplot. (c) Make a histogram. (d) Write a short summary of the major features of this distribution. Do you prefer the boxplot or the histogram for these data? 1.73 Blood proteins in children from Papua New Guinea. C-reactive protein (CRP) is a substance that can be measured in the blood. Values increase substantially within 6 hours of an infection and reach a peak within 24 to 48 hours. In adults, chronically high values have been linked to an increased risk of cardiovascular disease. In a study of apparently healthy children aged 6 to 60 months in Papua New Guinea, CRP was measured in 90 children.29 The units are milligrams per liter (mg/l). Here are the data from a random sample of 40 of these children: CRP 0.00 3.90
5.64 8.22
73.20 0.00 46.70 0.00
0.00
5.62
3.92 6.81 30.61
0.00 26.41 22.82 0.00
0.00 0.00
4.81 9.57
5.36
0.00
15.74 0.00
0.00 0.00
0.00
9.37 20.78 7.10
0.00
0.00 3.49
5.66 0.00 59.76 12.38 7.89
5.53
(a) Find the five-number summary for these data. (b) Make a boxplot. (c) Make a histogram. (d) Write a short summary of the major features of this distribution. Do you prefer the boxplot or the histogram for these data? 1.74 Does a log transform reduce the skewness? Refer to the previous exercise. With strongly skewed distributions such as this, we frequently reduce the skewness by taking a log transformation. We have a bit of a problem here, however, because some of the data are recorded as 0.00, and the logarithm of zero is not defined. For this variable, the value 0.00 is recorded whenever the amount of CRP in the blood is below the level that the measuring instrument is capable of detecting. The usual procedure in this circumstance is to add a small number to each observation before taking the logs. Transform these data by adding 1 to each observation and then taking the logarithm. Use the questions in the previous exercise as a guide to your analysis, and prepare a summary contrasting this analysis with the one that you performed in the previous exercise. CRP
1.3 Describing Distributions with Numbers 1.75 Vitamin A deficiency in children from Papua New Guinea. In the Papua New Guinea study that provided the data for the previous two exercises, the researchers also measured serum retinol. A low value of this variable can be an indicator of vitamin A deficiency. Here are the data on the same sample of 40 children from this study. The units are micromoles per liter 1mmol/l). 1.15
1.36
0.38
0.34
0.35
0.37
1.17
0.97
0.97
0.67
0.31
0.99
0.52
0.70
0.88
0.36
0.24
1.00
1.13
0.31
1.44
0.35
0.34
1.90
1.19
0.94
0.34
0.35
0.33
0.69
0.69
1.04
0.83
1.11
1.02
0.56
0.82
1.20
0.87
0.41
Analyze these data. Use the questions in the previous two exercises as a guide. VITA 1.76 Luck and puzzle solving. Children in a psychology study were asked to solve some puzzles and were then given feedback on their performance. They then were asked to rate how luck played a role in determining their scores.30 This variable was recorded on a 1 to 10 scale with 1 corresponding to very lucky and 10 corresponding to very unlucky. Here are the scores for 60 children: 1
10
1
10
1
1
10
5
1
1
8
1
10
2
1
9
5
2
1
8
10
5
9
10
10
9
6
10
1
5
1
9
2
1
7
10
9
5
10
10
10
1
8
1
6
10
1
6
10
10
8
10
3
10
8
1
8
10
4
2
Use numerical and graphical methods to describe these data. Write a short report summarizing your LUCK work. 1.77 Median versus mean for net worth. A report on the assets of American households says that the median net worth of U.S. families is $77,300. The mean net worth of these families is $498,800.31 What explains the difference between these two measures of center? 1.78 Create a data set. Create a data set with 9 observations for which the median would change by a large amount if the smallest observation were deleted. 1.79 Mean versus median. A small accounting firm pays each of its six clerks $45,000, two junior accountants $70,000 each, and the firm’s owner $420,000. What is the mean salary paid at this firm? How many of the employees earn less than the mean? What is the median salary? 1.80 Be careful about how you treat the zeros. In computing the median income of any group, some federal agencies omit all members of the group who had no income. Give an example to show that the reported median income of a group can go down even though the group becomes economically better off. Is this also true of the mean income?
51
1.81 How does the median change? The firm in Exercise 1.79 gives no raises to the clerks and junior accountants, while the owner’s take increases to $500,000. How does this change affect the mean? How does it affect the median? 1.82 Metabolic rates. Calculate the mean and standard deviation of the metabolic rates in Example 1.33 (page 42), showing each step in detail. First find the mean x by summing the 7 observations and dividing by 7. Then find each of the deviations xi ⫺ x and their squares. Check that the deviations have sum 0. Calculate the variance as an average of the squared deviations (remember to divide by n ⫺ 12. Finally, obtain s as the square root of METABOL the variance. 1.83 Earthquakes. Each year there are about 900,000 earthquakes of magnitude 2.5 or less that are usually not felt. In contrast, there are about 10 of magnitude 7.0 that cause serious damage.32 Explain why the average magnitude of earthquakes is not a good measure of their impact. 1.84 IQ scores. Many standard statistical methods that you will study in Part II of this book are intended for use with distributions that are symmetric and have no outliers. These methods start with the mean and standard deviation, x and s. For example, standard methods would typically be used for the IQ and GPA data in Table 1.3 (page 29). IQGPA (a) Find x and s for the IQ data. In large populations, IQ scores are standardized to have mean 100 and standard deviation 15. In what way does the distribution of IQ among these students differ from the overall population? (b) Find the median IQ score. It is, as we expect, close to the mean. (c) Find the mean and median for the GPA data. The two measures of center differ a bit. What feature of the data (see your stemplot in Exercise 1.43 or make a new stemplot) explains the difference? 1.85 Mean and median for two observations. The Mean and Median applet allows you to place observations on a line and see their mean and median visually. Place two observations on the line by clicking below it. Why does only one arrow appear? 1.86 Mean and median for three observations. In the Mean and Median applet, place three observations on the line by clicking below it, two close together near the center of the line and one somewhat to the right of these two. (a) Pull the single rightmost observation out to the right. (Place the cursor on the point, hold down a mouse button, and drag the point.) How does the mean behave? How does the median behave? Explain briefly why each measure acts as it does.
52
CHAPTER 1
•
Looking at Data—Distributions
(b) Now drag the rightmost point to the left as far as you can. What happens to the mean? What happens to the median as you drag this point past the other two (watch carefully)? 1.87 Mean and median for five observations. Place five observations on the line in the Mean and Median applet by clicking below it. (a) Add one additional observation without changing the median. Where is your new point? (b) Use the applet to convince yourself that when you add yet another observation (there are now seven in all), the median does not change no matter where you put the seventh point. Explain why this must be true. 1.88 Hummingbirds and flowers. Different varieties of the tropical flower Heliconia are fertilized by different species of hummingbirds. Over time, the lengths of the flowers and the form of the hummingbirds’ beaks have evolved to match each other. Here are data on the lengths in millimeters of three varieties of these flowers on the island of Dominica:33 H. bihai 47.12
46.75
46.81
47.12
46.67
47.43
46.44
46.64
48.07
48.34
48.15
50.26
50.12
46.34
46.94
48.36
H. caribaea red 41.90
42.01
41.93
43.09
41.47
41.69
39.78
40.57
39.63
42.18
40.66
37.87
39.16
37.40
38.20
38.07
38.10
37.97
38.79
38.23
38.87
37.78
38.01
37.02
36.52
36.11
36.03
35.45
38.13
35.17
36.82
36.66
35.68
36.03
34.57
34.63
37.1
Make boxplots to compare the three distributions. Report the five-number summaries along with your graph. What are the most important differences among the three varieties of flowers? HELICON 1.89 Compare the three varieties of flowers. The biologists who collected the flower length data in the previous exercise compared the three Heliconia varieties using statistical methods based on x and s. HELICON (a) Find x and s for each variety. (b) Make a stemplot of each set of flower lengths. Do the distributions appear suitable for use of x and s as summaries? 1.90 Imputation. Various problems with data collection can cause some observations to be missing. Suppose a data set has 20 cases. Here are the values of the variable x for 10 of these cases: IMPUTE 17
6 12 14
20
23
9
12
16
21
(a) Verify that the mean is 15 and find the standard deviation for the 10 cases for which x is not missing. (b) Create a new data set with 20 cases by setting the values for the 10 missing cases to 15. Compute the mean and standard deviation for this data set. (c) Summarize what you have learned about the possible effects of this type of imputation on the mean and the standard deviation. 1.91 Create a data set. Give an example of a small set of data for which the mean is smaller than the third quartile. 1.92 Create another data set. Create a set of 5 positive numbers (repeats allowed) that have median 11 and mean 8. What thought process did you use to create your numbers? 1.93 A standard deviation contest. This is a standard deviation contest. You must choose four numbers from the whole numbers 0 to 20, with repeats allowed. (a) Choose four numbers that have the smallest possible standard deviation.
H. caribaea yellow 36.78
The values for the other 10 cases are missing. One way to deal with missing data is called imputation. The basic idea is that missing values are replaced, or imputed, with values that are based on an analysis of the data that are not missing. For a data set with a single variable, the usual choice of a value for imputation is the mean of the values that are not missing. The mean for this data set is 15.
(b) Choose four numbers that have the largest possible standard deviation. (c) Is more than one choice possible in either (a) or (b)? Explain. 1.94 Deviations from the mean sum to zero. Use the definition of the mean x to show that the sum of the deviations xi ⫺ x of the observations from their mean is always zero. This is one reason why the variance and standard deviation use squared deviations. 1.95 Does your software give incorrect answers? This exercise requires a calculator with a standard deviation button or statistical software on a computer. The observations 30,001
30,002
30,003
have mean x ⫽ 30,002 and standard deviation s ⫽ 1. Adding a 0 in the center of each number, the next set becomes 300,001
300,003
300,003
1.4 Density Curves and Normal Distributions The standard deviation remains s ⫽ 1 as more 0s are added. Use your calculator or computer to calculate the standard deviation of these numbers, adding extra 0s until you get an incorrect answer. How soon did you go wrong? This demonstrates that calculators and computers cannot handle an arbitrary number of digits correctly. 1.96 Compare three varieties of flowers. Exercise 1.88 reports data on the lengths in millimeters of flowers of three varieties of Heliconia. In Exercise 1.89 you found the mean and standard deviation for each variety. Starting from the x- and s-values in millimeters, find the means and standard deviations in inches. (A millimeter is 1/1000 of a meter. A meter is 39.37 inches.) 1.97 Weight gain. A study of diet and weight gain deliberately overfed 12 volunteers for eight weeks. The mean increase in fat was x ⫽ 2.32 kilograms, and the standard deviation was s ⫽ 1.21 kilograms. What are x and s in pounds? (A kilogram is 2.2 pounds.) 1.98 Changing units from inches to centimeters. Changing the unit of length from inches to centimeters multiplies each length by 2.54 because there are
53
2.54 centimeters in an inch. This change of units multiplies our usual measures of spread by 2.54. This is true of IQR and the standard deviation. What happens to the variance when we change units in this way? 1.99 A different type of mean. The trimmed mean is a measure of center that is more resistant than the mean but uses more of the available information than the median. To compute the 10% trimmed mean, discard the highest 10% and the lowest 10% of the observations and compute the mean of the remaining 80%. Trimming eliminates the effect of a small number of outliers. Compute the 10% trimmed mean of the service time data in Table 1.2 (page 19). Then compute the 20% trimmed mean. Compare the values of these measures with the median and the ordinary untrimmed mean. 1.100 Changing units from centimeters to inches. Refer to Exercise 1.72 (page 50). Change the measurements from centimeters to inches by multiplying each value by 0.39. Answer the questions from that exercise and explain the effect of the transformation on these data.
1.4 Density Curves and Normal Distributions When you complete this section, you will be able to • Compare the mean and the median for symmetric and skewed distributions. • Sketch a Normal distribution for any given mean and standard deviation. • Apply the 68–95–99.7 rule to find proportions of observations within 1, 2, and 3 standard deviations of the mean for any Normal distribution. • Transform values of a variable from a general Normal distribution to the standard Normal distribution. • Compute areas under a Normal curve using software or Table A. • Perform inverse Normal calculations to find values of a Normal variable corresponding to various areas. • Assess the extent to which the distribution of a set of data can be approximated by a Normal distribution.
We now have a kit of graphical and numerical tools for describing distributions. What is more, we have a clear strategy for exploring data on a single quantitative variable: 1. Always plot your data: make a graph, usually a stemplot or a histogram. 2. Look for the overall pattern and for striking deviations such as outliers. 3. Calculate an appropriate numerical summary to briefly describe center and spread.
54
CHAPTER 1
•
Looking at Data—Distributions
density curve
Technology has expanded the set of graphs that we can choose for Step 1. It is possible, though painful, to make histograms by hand. Using software, clever algorithms can describe a distribution in a way that is not feasible by hand, by fitting a smooth curve to the data in addition to or instead of a histogram. The curves used are called density curves. Before we examine density curves in detail, here is an example of what software can do.
EXAMPLE DATA
1.36 Density curves for times to start a business and Titanic passenger ages. Figure 1.19 illustrates the use of density curves along with histograms TIME
to describe distributions. Figure 1.19(a) shows the distribution of the times
CHALLENGE
0
20
40
60
80 100 120 Time (days) (a)
0
10
20
30
40 50 Age (years) (b)
60
140
160
180
70
80
90
FIGURE 1.19 (a) The distribution of the time to start a business, for Example 1.36. The distribution is pictured with both a histogram and a density curve. (b) The distribution of the ages of the Titanic passengers, for Example 1.36. These distributions have a single mode with tails of two different lengths.
1.4 Density Curves and Normal Distributions
DATA TITANIC
55
to start a business for 194 countries (see Example 1.23, page 31). The outlier, Surinam, described in Exercise 1.47 (page 32), has been deleted from the data set. The distribution is highly skewed to the right. Most of the data are in the first two classes, with 40 or fewer days to start a business. Exercise 1.25 (page 33) describes data on the class of the ticket of the Titanic passengers, and Figure 1.19(b) shows the distribution of the ages of these passengers. It has a single mode, a long right tail, and a relatively short left tail. A smooth density curve is an idealization that gives the overall pattern of the data but ignores minor irregularities. We turn now to a special class of density curves, the bell-shaped Normal curves.
Density curves CHALLENGE
One way to think of a density curve is as a smooth approximation to the irregular bars of a histogram. Figure 1.20 shows a histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills. Scores of many students on this national test have a very regular distribution. The histogram is symmetric, and both tails fall off quite smoothly from a single center peak. There are no large gaps or obvious outliers. The curve drawn through the tops of the histogram bars in Figure 1.20 is a good description of the overall pattern of the data.
EXAMPLE 1.37 Vocabulary scores. In a histogram, the areas of the bars represent either counts or proportions of the observations. In Figure 1.20(a) we have shaded the bars that represent students with vocabulary scores 6.0 or lower.
2
4 6 8 10 Grade equivalent vocabulary score (a)
12
2
8 10 4 6 Grade equivalent vocabulary score (b)
12
FIGURE 1.20 (a) The distribution of Iowa Test vocabulary scores for Gary, Indiana, seventhgraders, for Example 1.37. The shaded bars in the histogram represent scores less than or equal to 6.0. (b) The shaded area under the Normal density curve also represents scores less than or equal to 6.0. This area is 0.293, close to the true 0.303 for the actual data.
56
CHAPTER 1
•
Looking at Data—Distributions There are 287 such students, who make up the proportion 287/947 5 0.303 of all Gary seventh-graders. The shaded bars in Figure 1.20(a) make up proportion 0.303 of the total area under all the bars. If we adjust the scale so that the total area of the bars is 1, the area of the shaded bars will also be 0.303. In Figure 1.20(b), we have shaded the area under the curve to the left of 6.0. If we adjust the scale so that the total area under the curve is exactly 1, areas under the curve will then represent proportions of the observations. That is, area 5 proportion. The curve is then a density curve. The shaded area under the density curve in Figure 1.20(b) represents the proportion of students with score 6.0 or lower. This area is 0.293, only 0.010 away from the histogram result. You can see that areas under the density curve give quite good approximations of areas given by the histogram.
DENSITY CURVE A density curve is a curve that • is always on or above the horizontal axis and • has area exactly 1 underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that range. The density curve in Figure 1.20 is a Normal curve. Density curves, like distributions, come in many shapes. Figure 1.21 shows two density curves, a symmetric Normal density curve and a right-skewed curve. We will discuss Normal density curves in detail in this section because of the important role that they play in statistics. There are, however, many applications where the use of other families of density curves are essential. A density curve of an appropriate shape is often an adequate description of the overall pattern of a distribution. Outliers, which are deviations from the overall pattern, are not described by the curve.
Measuring center and spread for density curves Our measures of center and spread apply to density curves as well as to actual sets of observations, but only some of these measures are easily seen from the
Median and mean (a)
Mean Median
(b)
FIGURE 1.21 (a) A symmetric Normal density curve with its mean and median marked. (b) A right-skewed density curve with its mean and median marked.
1.4 Density Curves and Normal Distributions
57
FIGURE 1.22 The mean of a density curve is the point at which it would balance. curve. A mode of a distribution described by a density curve is a peak point of the curve, the location where the curve is highest. Because areas under a density curve represent proportions of the observations, the median is the point with half the total area on each side. You can roughly locate the quartiles by dividing the area under the curve into quarters as accurately as possible by eye. The IQR is the distance between the first and third quartiles. There are mathematical ways of calculating areas under curves. These allow us to locate the median and quartiles exactly on any density curve. What about the mean and standard deviation? The mean of a set of observations is their arithmetic average. If we think of the observations as weights strung out along a thin rod, the mean is the point at which the rod would balance. This fact is also true of density curves. The mean is the point at which the curve would balance if it were made out of solid material. Figure 1.22 illustrates this interpretation of the mean. A symmetric curve, such as the Normal curve in Figure 1.21(a), balances at its center of symmetry. Half the area under a symmetric curve lies on either side of its center, so this is also the median. For a right-skewed curve, such as those shown in Figures 1.21(b) and 1.22, the small area in the long right tail tips the curve more than the same area near the center. The mean (the balance point) therefore lies to the right of the median. It is hard to locate the balance point by eye on a skewed curve. There are mathematical ways of calculating the mean for any density curve, so we are able to mark the mean as well as the median in Figure 1.21(b). The standard deviation can also be calculated mathematically, but it can’t be located by eye on most density curves.
MEDIAN AND MEAN OF A DENSITY CURVE The median of a density curve is the equal-areas point, the point that divides the area under the curve in half. The mean of a density curve is the balance point, at which the curve would balance if made of solid material. The median and mean are the same for a symmetric density curve. They both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.
A density curve is an idealized description of a distribution of data. For example, the density curve in Figure 1.20 is exactly symmetric, but the histogram of vocabulary scores is only approximately symmetric. We therefore need to distinguish between the mean and standard deviation of the density curve and the numbers x and s computed from the actual observations. The
58
CHAPTER 1
•
Looking at Data—Distributions
mean m standard deviation s
usual notation for the mean of an idealized distribution is M (the Greek letter mu). We write the standard deviation of a density curve as S (the Greek letter sigma).
Normal distributions Normal curves Normal distributions
One particularly important class of density curves has already appeared in Figures 1.20 and 1.21(a). These density curves are symmetric, unimodal, and bell-shaped. They are called Normal curves, and they describe Normal distributions. All Normal distributions have the same overall shape. The exact density curve for a particular Normal distribution is specified by giving the distribution’s mean m and its standard deviation s. The mean is located at the center of the symmetric curve and is the same as the median. Changing m without changing s moves the Normal curve along the horizontal axis without changing its spread. The standard deviation s controls the spread of a Normal curve. Figure 1.23 shows two Normal curves with different values of s. The curve with the larger standard deviation is more spread out. The standard deviation s is the natural measure of spread for Normal distributions. Not only do m and s completely determine the shape of a Normal curve, but we can locate s by eye on the curve. Here’s how. As we move out in either direction from the center m, the curve changes from falling ever more steeply
to falling ever less steeply
The points at which this change of curvature takes place are located at distance s on either side of the mean m. You can feel the change as you run your finger along a Normal curve, and so find the standard deviation. Remember that m and s alone do not specify the shape of most distributions, and that the shape of density curves in general does not reveal s. These are special properties of Normal distributions.
σ σ
μ
μ
FIGURE 1.23 Two Normal curves, showing the mean m and the standard deviation s.
1.4 Density Curves and Normal Distributions
59
There are other symmetric bell-shaped density curves that are not Normal. The Normal density curves are specified by a particular equation. The height of the density curve at any point x is given by 1 s22p
1 x⫺m 2
e⫺ 2 1
s
2
We will not make direct use of this fact, although it is the basis of mathematical work with Normal distributions. Notice that the equation of the curve is completely determined by the mean m and the standard deviation s. Why are the Normal distributions important in statistics? Here are three reasons. 1. Normal distributions are good descriptions for some distributions of real data. Distributions that are often close to Normal include scores on tests taken by many people (such as the Iowa Test of Figure 1.20, page 55), repeated careful measurements of the same quantity, and characteristics of biological populations (such as lengths of baby pythons and yields of corn). 2. Normal distributions are good approximations to the results of many kinds of chance outcomes, such as tossing a coin many times. 3. Many statistical inference procedures based on Normal distributions work well for other roughly symmetric distributions. However, even though many sets of data follow a Normal distribution, many do not. Most income distributions, for example, are skewed to the right and so are not Normal. Non-Normal data, like nonnormal people, not only are common but are also sometimes more interesting than their Normal counterparts.
The 68–95–99.7 rule Although there are many Normal curves, they all have common properties. Here is one of the most important.
THE 68–95–99.7 RULE In the Normal distribution with mean m and standard deviation s : • Approximately 68% of the observations fall within s of the mean m. • Approximately 95% of the observations fall within 2s of m. • Approximately 99.7% of the observations fall within 3s of m.
Figure 1.24 illustrates the 68–95–99.7 rule. By remembering these three numbers, you can think about Normal distributions without constantly making detailed calculations.
EXAMPLE 1.38 Heights of young women. The distribution of heights of young women aged 18 to 24 is approximately Normal with mean m ⫽ 64.5 inches
60
CHAPTER 1
•
Looking at Data—Distributions
68% of data 95% of data 99.7% of data
FIGURE 1.24 The 68–95–99.7 rule for Normal distributions.
–3
–2
–1
0
1
2
3
and standard deviation s ⫽ 2.5 inches. Figure 1.25 shows what the 68–95–99.7 rule says about this distribution. Two standard deviations equals 5 inches for this distribution. The 95 part of the 68–95–99.7 rule says that the middle 95% of young women are between 64.5 ⫺ 5 and 64.5 ⫹ 5 inches tall, that is, between 59.5 and 69.5 inches. This fact is exactly true for an exactly Normal distribution. It is approximately true for the heights of young women because the distribution of heights is approximately Normal. The other 5% of young women have heights outside the range from 59.5 to 69.5 inches. Because the Normal distributions are symmetric, half of these women are on the tall side. So the tallest 2.5% of young women are taller than 69.5 inches.
N(m, s)
Because we will mention Normal distributions often, a short notation is helpful. We abbreviate the Normal distribution with mean m and standard deviation s as N(m, s). For example, the distribution of young women’s heights is N164.5, 2.52.
FIGURE 1.25 The 68–95–99.7 rule applied to the heights of young women, for Example 1.38. 68% 95% 99.7%
57
59.5
62 67 64.5 Height (inches)
69.5
72
1.4 Density Curves and Normal Distributions
61
USE YOUR KNOWLEDGE 1.101 Test scores. Many states assess the skills of their students in various grades. One program that is available for this purpose is the National Assessment of Educational Progress (NAEP).34 One of the tests provided by the NAEP assesses the reading skills of twelfth-grade students. In a recent year, the national mean score was 288 and the standard deviation was 38. Assuming that these scores are approximately Normally distributed, N(288, 38), use the 68–95–99.7 rule to give a range of scores that includes 95% of these students. 1.102 Use the 68–95–99.7 rule. Refer to the previous exercise. Use the 68–95–99.7 rule to give a range of scores that includes 99.7% of these students.
Standardizing observations As the 68–95–99.7 rule suggests, all Normal distributions share many properties. In fact, all Normal distributions are the same if we measure in units of size s about the mean m as center. Changing to these units is called standardizing. To standardize a value, subtract the mean of the distribution and then divide by the standard deviation.
STANDARDIZING AND z-SCORES If x is an observation from a distribution that has mean m and standard deviation s, the standardized value of x is z⫽
x⫺m s
A standardized value is often called a z-score.
A z-score tells us how many standard deviations the original observation falls away from the mean, and in which direction. Observations larger than the mean are positive when standardized, and observations smaller than the mean are negative. To compare scores based on different measures, z-scores can be very useful. For example, see Exercise 1.134 (page 75), where you are asked to compare an SAT score with an ACT score.
EXAMPLE 1.39 Find some z-scores. The heights of young women are approximately Normal with m ⫽ 64.5 inches and s ⫽ 2.5 inches. The z-score for height is z⫽
height ⫺ 64.5 2.5
62
CHAPTER 1
•
Looking at Data—Distributions A woman’s standardized height is the number of standard deviations by which her height differs from the mean height of all young women. A woman 68 inches tall, for example, has z-score z⫽
68 ⫺ 64.5 ⫽ 1.4 2.5
or 1.4 standard deviations above the mean. Similarly, a woman 5 feet (60 inches) tall has z-score z⫽
60 ⫺ 64.5 ⫽ ⫺1.8 2.5
or 1.8 standard deviations less than the mean height.
USE YOUR KNOWLEDGE 1.103 Find the z-score. Consider the NAEP scores (see Exercise 1.101), which we assume are approximately Normal, N(288, 38). Give the z-score for a student who received a score of 365. 1.104 Find another z-score. Consider the NAEP scores, which we assume are approximately Normal, N(288, 38). Give the z-score for a student who received a score of 250. Explain why your answer is negative even though all the test scores are positive.
We need a way to write variables, such as “height’’ in Example 1.38, that follow a theoretical distribution such as a Normal distribution. We use capital letters near the end of the alphabet for such variables. If X is the height of a young woman, we can then shorten “the height of a young woman is less than 68 inches’’ to “X ⬍ 68.’’ We will use lowercase x to stand for any specific value of the variable X. We often standardize observations from symmetric distributions to express them in a common scale. We might, for example, compare the heights of two children of different ages by calculating their z-scores. The standardized heights tell us where each child stands in the distribution for his or her age group. Standardizing is a linear transformation that transforms the data into the standard scale of z-scores. We know that a linear transformation does not change the shape of a distribution, and that the mean and standard deviation change in a simple manner. In particular, the standardized values for any distribution always have mean 0 and standard deviation 1. If the variable we standardize has a Normal distribution, standardizing does more than give a common scale. It makes all Normal distributions into a single distribution, and this distribution is still Normal. Standardizing a variable that has any Normal distribution produces a new variable that has the standard Normal distribution.
1.4 Density Curves and Normal Distributions
63
THE STANDARD NORMAL DISTRIBUTION The standard Normal distribution is the Normal distribution N(0, 1) with mean 0 and standard deviation 1. If a variable X has any Normal distribution N1m, s2 with mean m and standard deviation s, then the standardized variable Z⫽
X⫺m s
has the standard Normal distribution.
Normal distribution calculations
cumulative proportion
Areas under a Normal curve represent proportions of observations from that Normal distribution. There is no formula for areas under a Normal curve. Calculations use either software that calculates areas or a table of areas. The table and most software calculate one kind of area: cumulative proportions. A cumulative proportion is the proportion of observations in a distribution that lie at or below a given value. When the distribution is given by a density curve, the cumulative proportion is the area under the curve to the left of a given value. Figure 1.26 shows the idea more clearly than words do. The key to calculating Normal proportions is to match the area you want with areas that represent cumulative proportions. Then get areas for cumulative proportions either from software or (with an extra step) from a table. The following examples show the method in pictures.
EXAMPLE 1.40 NCAA eligibility for competition. To be eligible to compete in their first year of college, the National Collegiate Athletic Association (NCAA) requires Division I athletes to meet certain academic standards. These are based on their grade point average (GPA) in certain courses and combined scores on the SAT Critical Reading and Mathematics sections or the ACT composite score.35 For a student with a 3.0 GPA, the combined SAT score must be 800 or higher. Based on the distribution of SAT scores for college-bound students, Cumulative proportion at x = area under curve to the left of x.
FIGURE 1.26 The cumulative proportion for a value x is the proportion of all observations from the distribution that are less than or equal to x. This is the area to the left of x under the Normal curve.
x
64
CHAPTER 1
•
Looking at Data—Distributions we assume that the distribution of the combined Critical Reading and Mathematics scores is approximately Normal with mean 1010 and standard deviation 225.36 What proportion of college-bound students have SAT scores of 800 or more? Here is the calculation in pictures: the proportion of scores above 800 is the area under the curve to the right of 800. That’s the total area under the curve (which is always 1) minus the cumulative proportion up to 800.
=
–
800
area right of 800 0.8247
800
total area 1
5 5
area left of 800 0.1753
2 2
That is, the proportion of college-bound SAT takers with a 3.0 GPA who are eligible to compete is 0.8247, or about 82%. There is no area under a smooth curve that is exactly over the point 800. Consequently, the area to the right of 800 (the proportion of scores . 800) is the same as the area at or to the right of this point (the proportion of scores ⱖ 800). The actual data may contain a student who scored exactly 800 on the SAT. That the proportion of scores exactly equal to 800 is 0 for a Normal distribution is a consequence of the idealized smoothing of Normal distributions for data.
EXAMPLE 1.41 NCAA eligibility for aid and practice. The NCAA has a category of eligibility in which a first-year student may not compete but is still eligible to receive an athletic scholarship and to practice with the team. The requirements for this category are a 3.0 GPA and combined SAT Critical Reading and Mathematics scores of at least 620. What proportion of college-bound students who take the SAT would be eligible to receive an athletic scholarship and to practice with the team but would not be eligible to compete? That is, what proportion have scores between 620 and 800? Here are the pictures:
=
620 800
800
area between 620 and 800 5 0.1338
–
5
620
area left of 800
2
area left of 620
0.1753
2
0.0415
About 13% of college-bound students with a 3.0 GPA have SAT scores between 620 and 800.
CHALLENGE
1.4 Density Curves and Normal Distributions
65
How do we find the numerical values of the areas in Examples 1.40 and 1.41? If you use software, just plug in mean 1010 and standard deviation 225. Then ask for the cumulative proportions for 800 and for 620. (Your software will probably refer to these as “cumulative probabilities.’’ We will learn in Chapter 4 why the language of probability fits.) Sketches of the areas that you want similar to the ones in Examples 1.40 and 1.41 are very helpful in making sure that you are doing the correct calculations. You can use the Normal Curve applet on the text website, whfreeman.com/ ips8e, to find Normal proportions. The applet is more flexible than most software—it will find any Normal proportion, not just cumulative proportions. The applet is an excellent way to understand Normal curves. But, because of the limitations of web browsers, the applet is not as accurate as statistical software. If you are not using software, you can find cumulative proportions for Normal curves from a table. That requires an extra step, as we now explain.
Using the standard Normal table The extra step in finding cumulative proportions from a table is that we must first standardize to express the problem in the standard scale of z-scores. This allows us to get by with just one table, a table of standard Normal cumulative proportions. Table A in the back of the book gives standard Normal probabilities. Table A also appears on the last two pages of the text. The picture at the top of the table reminds us that the entries are cumulative proportions, areas under the curve to the left of a value z.
EXAMPLE 1.42 Find the proportion from z. What proportion of observations on a standard Normal variable Z take values less than 1.47? Solution: To find the area to the left of 1.47, locate 1.4 in the left-hand column of Table A and then locate the remaining digit 7 as .07 in the top row. The entry opposite 1.4 and under .07 is 0.9292. This is the cumulative proportion we seek. Figure 1.27 illustrates this area. Now that you see how Table A works, let’s redo the NCAA Examples 1.40 and 1.41 using the table. FIGURE 1.27 The area under a standard Normal curve to the left of the point z 5 1.47 is 0.9292, for Example 1.42.
Table entry: area = 0.9292.
z = 1.47
66
CHAPTER 1
•
Looking at Data—Distributions
EXAMPLE 1.43 Find the proportion from x. What proportion of college-bound students who take the SAT have scores of at least 800? The picture that leads to the answer is exactly the same as in Example 1.40. The extra step is that we first standardize to read cumulative proportions from Table A. If X is SAT score, we want the proportion of students for whom X ⱖ x, where x ⫽ 800. 1. Standardize. Subtract the mean, then divide by the standard deviation, to transform the problem about X into a problem about a standard Normal Z: X ⱖ 800 X ⫺ 1010 800 ⫺ 1010 ⱖ 225 225 Z ⱖ ⫺0.93 2. Use the table. Look at the pictures in Example 1.40. From Table A, we see that the proportion of observations less than 20.93 is 0.1762. The area to the right of 20.93 is therefore 1 ⫺ 0.1762 ⫽ 0.8238. This is about 82%.
The area from the table in Example 1.43 (0.8238) is slightly less accurate than the area from software in Example 1.40 (0.8247) because we must round z to two places when we use Table A. The difference is rarely important in practice.
EXAMPLE 1.44 Eligibility for aid and practice. What proportion of all students who take the SAT would be eligible to receive athletic scholarships and to practice with the team but would not be eligible to compete in the eyes of the NCAA? That is, what proportion of students have SAT scores between 620 and 800? First, sketch the areas, exactly as in Example 1.41. We again use X as shorthand for an SAT score. 1. Standardize. 620 #
X
, 800
620 ⫺ 1010 X ⫺ 1010 800 ⫺ 1010 ⱕ ⬍ 225 225 225 21.73 #
Z
, 20.93
2. Use the table. area between ⫺1.73 and ⫺0.93 ⫽ 1area left of ⫺0.932 ⫺ 1area left of ⫺1.732 ⫽ 0.1762 ⫺ 0.0418 ⫽ 0.1344 As in Example 1.41, about 13% of students would be eligible to receive athletic scholarships and to practice with the team. Sometimes we encounter a value of z more extreme than those appearing in Table A. For example, the area to the left of z ⫽ ⫺4 is not given in the table.
1.4 Density Curves and Normal Distributions
67
The z-values in Table A leave only area 0.0002 in each tail unaccounted for. For practical purposes, we can act as if there is zero area outside the range of Table A. USE YOUR KNOWLEDGE 1.105 Find the proportion. Consider the NAEP scores, which are approximately Normal, N(288, 38). Find the proportion of students who have scores less than 340. Find the proportion of students who have scores greater than or equal to 340. Sketch the relationship between these two calculations using pictures of Normal curves similar to the ones given in Example 1.40 (page 63). 1.106 Find another proportion. Consider the NAEP scores, which are approximately Normal, N(288, 38). Find the proportion of students who have scores between 340 and 370. Use pictures of Normal curves similar to the ones given in Example 1.41 (page 64) to illustrate your calculations.
Inverse Normal calculations Examples 1.40 to 1.44 illustrate the use of Normal distributions to find the proportion of observations in a given event, such as “SAT score between 620 and 800.’’ We may instead want to find the observed value corresponding to a given proportion. Statistical software will do this directly. Without software, use Table A backward, finding the desired proportion in the body of the table and then reading the corresponding z from the left column and top row.
EXAMPLE 1.45 How high for the top 10%? Scores for college-bound students on the SAT Critical Reading test in recent years follow approximately the N(500,120) distribution.37 How high must a student score to place in the top 10% of all students taking the SAT? Again, the key to the problem is to draw a picture. Figure 1.28 shows that we want the score x with an area of 0.10 above it. That’s the same as area below x equal to 0.90. FIGURE 1.28 Locating the point on a Normal curve with area 0.10 to its right, for Example 1.45. Area = 0.90 Area = 0.10
x = 500 z=0
x=? z = 1.28
68
CHAPTER 1
•
Looking at Data—Distributions Statistical software has a function that will give you the x for any cumulative proportion you specify. The function often has a name such as “inverse cumulative probability.’’ Plug in mean 500, standard deviation 120, and cumulative proportion 0.9. The software tells you that x ⫽ 653.786. We see that a student must score at least 654 to place in the highest 10%. Without software, first find the standard score z with cumulative proportion 0.9, then “unstandardize’’ to find x. Here is the two-step process: 1 Use the table. Look in the body of Table A for the entry closest to 0.9. It is 0.8997. This is the entry corresponding to z ⫽ 1.28. So z ⫽ 1.28 is the standardized value with area 0.9 to its left. 2. Unstandardize to transform the solution from z back to the original x scale. We know that the standardized value of the unknown x is z ⫽ 1.28. So x itself satisfies x ⫺ 500 ⫽ 1.28 120 Solving this equation for x gives x ⫽ 500 ⫹ 11.282 11202 ⫽ 653.6 This equation should make sense: it finds the x that lies 1.28 standard deviations above the mean on this particular Normal curve. That is the “unstandardized’’ meaning of z ⫽ 1.28. The general rule for unstandardizing a z-score is x ⫽ m ⫹ zs USE YOUR KNOWLEDGE 1.107 What score is needed to be in the top 25%? Consider the NAEP scores, which are approximately Normal, N(288, 38). How high a score is needed to be in the top 25% of students who take this exam? 1.108 Find the score that 80% of students will exceed. Consider the NAEP scores, which are approximately Normal, N(288, 38). Eighty percent of the students will score above x on this exam. Find x.
Normal quantile plots The Normal distributions provide good descriptions of some distributions of real data, such as the Iowa Test vocabulary scores. The distributions of some other common variables are usually skewed and therefore distinctly nonNormal. Examples include economic variables such as personal income and gross sales of business firms, the survival times of cancer patients after treatment, and the service lifetime of mechanical or electronic components. While experience can suggest whether or not a Normal distribution is plausible in a particular case, it is risky to assume that a distribution is Normal without actually inspecting the data. A histogram or stemplot can reveal distinctly non-Normal features of a distribution, such as outliers, pronounced skewness, or gaps and clusters. If the stemplot or histogram appears roughly symmetric and unimodal, however, we need a more sensitive way to judge the adequacy of a Normal model.
1.4 Density Curves and Normal Distributions Normal quantile plot
69
The most useful tool for assessing Normality is another graph, the Normal quantile plot. Here is the basic idea of a Normal quantile plot. The graphs produced by software use more sophisticated versions of this idea. It is not practical to make Normal quantile plots by hand. 1. Arrange the observed data values from smallest to largest. Record what percentile of the data each value occupies. For example, the smallest observation in a set of 20 is at the 5% point, the second smallest is at the 10% point, and so on.
Normal scores
2. Do Normal distribution calculations to find the values of z corresponding to these same percentiles. For example, z ⫽ ⫺1.645 is the 5% point of the standard Normal distribution, and z ⫽ ⫺1.282 is the 10% point. We call these values of Z Normal scores. 3. Plot each data point x against the corresponding Normal score. If the data distribution is close to any Normal distribution, the plotted points will lie close to a straight line. Any Normal distribution produces a straight line on the plot because standardizing turns any Normal distribution into a standard Normal distribution. Standardizing is a linear transformation that can change the slope and intercept of the line in our plot but cannot turn a line into a curved pattern.
USE OF NORMAL QUANTILE PLOTS If the points on a Normal quantile plot lie close to a straight line, the plot indicates that the data are Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot. An optional line can be drawn on the plot that corresponds to the Normal distribution with mean equal to the mean of the data and standard deviation equal to the standard deviation of the data. Figures 1.29 and 1.30 are Normal quantile plots for data we have met earlier. The data x are plotted vertically against the corresponding standard Normal z-score plotted horizontally. The z-score scale generally extends from ⫺3 to 3 because almost all of a standard Normal curve lies between these values. These figures show how Normal quantile plots behave.
EXAMPLE DATA
1.46 IQ scores are approximately Normal. Figure 1.29 is a Normal quanIQ
tile plot of the 60 fifth-grade IQ scores from Table 1.1 (page 16). The points lie very close to the straight line drawn on the plot. We conclude that the distribution of IQ data is approximately Normal.
EXAMPLE DATA
1.47 Times to start a business are skewed. Figure 1.30 is a Normal quantile plot of the data on times to start a business from Example 1.23. We have
CHALLENGE
TIME
70
CHAPTER 1
•
Looking at Data—Distributions 150 140 130
IQ
120 110 100 90
FIGURE 1.29 Normal quantile
80 ⫺3
plot of IQ scores, for Example 1.46. This distribution is approximately Normal.
⫺2
⫺1
0 Normal score
1
2
3
excluded Suriname, the outlier that you examined in Exercise 1.47. The line drawn on the plot shows clearly that the plot of the data is curved. We conclude that these data are not Normally distributed. The shape of the curve is what we typically see with a distribution that is strongly skewed to the right. Real data often show some departure from the theoretical Normal model. When you examine a Normal quantile plot, look for shapes that show clear departures from Normality. Don’t overreact to minor wiggles in the plot. When we discuss statistical methods that are based on the Normal model, we are
FIGURE 1.30 Normal quantile
175
plot of 184 times to start a business, with the outlier, Suriname, excluded, for Example 1.47. This distribution is highly skewed.
150
Time (days)
125 100 75 50 25 0 ⫺3
⫺2
⫺1
0 Normal score
1
2
3
1.4 Density Curves and Normal Distributions
71
interested in whether or not the data are sufficiently Normal for these procedures to work properly. We are not concerned about minor deviations from Normality. Many common methods work well as long as the data are approximately Normal and outliers are not present. BEYOND THE BASICS
Density Estimation
density estimator
A density curve gives a compact summary of the overall shape of a distribution. Many distributions do not have the Normal shape. There are other families of density curves that are used as mathematical models for various distribution shapes. Modern software offers more flexible options. A density estimator does not start with any specific shape, such as the Normal shape. It looks at the data and draws a density curve that describes the overall shape of the data. Density estimators join stemplots and histograms as useful graphical tools for exploratory data analysis. Density estimates can capture other unusual features of a distribution. Here is an example.
EXAMPLE
DATA STUBHUB
1.48 StubHub! StubHub! is a website where fans can buy and sell tickets to sporting events. Ticket holders wanting to sell their tickets provide the location of their seats and the selling price. People wanting to buy tickets can choose from among the tickets offered for a given event.38 There were 186 tickets for the NCAA Women’s Final Four Basketball Championship in New Orleans posted for sale on StubHub! on January 2, 2013. A histogram with a density estimate is given in Figure 1.31. The distribution
60
StubHub! price per seat for tickets to the 2013 NCAA Women’s Final Four Basketball Championship in New Orleans, with a density estimate, for Example 1.48.
50
CHALLENGE
FIGURE 1.31 Histogram of
Percent
40
30
20
10
0 0
200
400
600 800 Price ($)
1000
1200
72
CHAPTER 1
•
Looking at Data—Distributions has three peaks, one around $300, another around $600, and a third around $1100. Inspection of the data suggests that these correspond roughly to three different types of seats: lower-level seats, club seats, and special luxury seats.
trimodal distribution bimodal distribution
Many distributions that we have met have a single peak, or mode. The distribution described in Example 1.48 has three modes and is called a trimodal distribution. A distribution that has two modes is called a bimodal distribution. The previous example reminds of a continuing theme for data analysis. We looked at a histogram and a density estimate and saw something interesting. This led us to speculation. Additional data on the type and location of the seats may explain more about the prices than we see in Figure 1.31.
SECTION 1.4 Summary The overall pattern of a distribution can often be described compactly by a density curve. A density curve has total area 1 underneath it. Areas under a density curve give proportions of observations for the distribution. The mean m (balance point), the median (equal-areas point), and the quartiles can be approximately located by eye on a density curve. The standard deviation s cannot be located by eye on most density curves. The mean and median are equal for symmetric density curves, but the mean of a skewed curve is located farther toward the long tail than is the median. The Normal distributions are described by bell-shaped, symmetric, unimodal density curves. The mean m and standard deviation s completely specify the Normal distribution N(m, s). The mean is the center of symmetry, and s is the distance from m to the change-of-curvature points on either side. All Normal distributions satisfy the 68–95–99.7 rule. To standardize any observation x, subtract the mean of the distribution and then divide by the standard deviation. The resulting z-score z ⫽ 1x ⫺ m2兾s says how many standard deviations x lies from the distribution mean. All Normal distributions are the same when measurements are transformed to the standardized scale. If X has the N(m, s) distribution, then the standardized variable Z ⫽ 1X ⫺ m2兾s has the standard Normal distribution N(0, 1). Proportions for any Normal distribution can be calculated by software or from the standard Normal table (Table A), which gives the cumulative proportions of Z ⬍ z for many values of z. The adequacy of a Normal model for describing a distribution of data is best assessed by a Normal quantile plot, which is available in most statistical software packages. A pattern on such a plot that deviates substantially from a straight line indicates that the data are not Normal.
SECTION 1.4 Exercises For Exercises 1.101 and 1.102, see page 61; for Exercises 1.103 and 1.104, see page 62; for Exercises 1.105 and 1.106, see page 67; and for Exercises 1.107 and 1.108, see page 68.
(b) Sketch a distribution that is skewed to the left. Mark the location of the mean and the median.
1.109 Means and medians.
1.110 The effect of changing the standard deviation.
(a) Sketch a symmetric distribution that is not Normal. Mark the location of the mean and the median.
(a) Sketch a Normal curve that has mean 20 and standard deviation 5.
1.4 Density Curves and Normal Distributions (b) On the same x axis, sketch a Normal curve that has mean 20 and standard deviation 10. (c) How does the Normal curve change when the standard deviation is varied but the mean stays the same? 1.111 The effect of changing the mean. (a) Sketch a Normal curve that has mean 20 and standard deviation 5. (b) On the same x axis, sketch a Normal curve that has mean 30 and standard deviation 5. (c) How does the Normal curve change when the mean is varied but the standard deviation stays the same? 1.112 NAEP music scores. In Exercise 1.101 (page 61) we examined the distribution of NAEP scores for the twelfth-grade reading skills assessment. For eighthgrade students the average music score is approximately Normal with mean 150 and standard deviation 35. (a) Sketch this Normal distribution. (b) Make a table that includes values of the scores corresponding to plus or minus one, two, and three standard deviations from the mean. Mark these points on your sketch along with the mean. (c) Apply the 68–95–99.7 rule to this distribution. Give the ranges of reading score values that are within one, two, and three standard deviations of the mean. 1.113 NAEP U.S. history scores. Refer to the previous exercise. The scores for twelfth-grade students on the U.S. history assessment are approximately N(288, 32). Answer the questions in the previous exercise for this assessment. 1.114 Standardize some NAEP music scores. The NAEP music assessment scores for eighth-grade students are approximately N(150, 35). Find z-scores by standardizing the following scores: 150, 140, 100, 180, 230. 1.115 Compute the percentile scores. Refer to the previous exercise. When scores such as the NAEP assessment scores are reported for individual students, the actual values of the scores are not particularly meaningful. Usually, they are transformed into percentile scores. The percentile score is the proportion of students who would score less than or equal to the score for the individual student. Compute the percentile scores for the five scores in the previous exercise. State whether you used software or Table A for these computations. 1.116 Are the NAEP U.S. history scores approximately Normal? In Exercise 1.113, we assumed that the NAEP U.S. history scores for twelfth-grade students are approximately Normal with the reported mean and standard deviation, N(288, 32). Let’s check that assumption. In addition to means and standard deviations, you can find selected percentiles for the NAEP assessments
73
(see previous exercise). For the twelfth-grade U.S. history scores, the following percentiles are reported: Percentile
Score
10%
246
25%
276
50%
290
75%
311
90%
328
Use these percentiles to assess whether or not the NAEP U.S. history scores for twelfth-grade students are approximately Normal. Write a short report describing your methods and conclusions. 1.117 Are the NAEP mathematics scores approximately Normal? Refer to the previous exercise. For the NAEP mathematics scores for twelfth-graders the mean is 153 and the standard deviation is 34. Here are the reported percentiles: Percentile
Score
10%
110
25%
130
50%
154
75%
177
90%
197
Is the N(153, 34) distribution a good approximation for the NAEP mathematics scores? Write a short report describing your methods and conclusions. 1.118 Do women talk more? Conventional wisdom suggests that women are more talkative than men. One study designed to examine this stereotype collected data on the speech of 42 women and 37 men in the United States.39 TALK (a) The mean number of words spoken per day by the women was 14,297 with a standard deviation of 6441. Use the 68–95–99.7 rule to describe this distribution. (b) Do you think that applying the rule in this situation is reasonable? Explain your answer. (c) The men averaged 14,060 words per day with a standard deviation of 9056. Answer the questions in parts (a) and (b) for the men. (d) Do you think that the data support the conventional wisdom? Explain your answer. Note that in Section 7.2 we will learn formal statistical methods to answer this type of question. 1.119 Data from Mexico. Refer to the previous exercise. A similar study in Mexico was conducted with 31 women
74
CHAPTER 1
•
Looking at Data—Distributions
and 20 men. The women averaged 14,704 words per day with a standard deviation of 6215. For men the mean was 15,022 and the standard deviation was 7864. TALKM (a) Answer the questions from the previous exercise for the Mexican study. (b) The means for both men and women are higher for the Mexican study than for the U.S. study. What conclusions can you draw from this observation? 1.120 Total scores. Here are the total scores of 10 students STAT in an introductory statistics course: 62
93 54 76
73
98
64
55
80
71
Previous experience with this course suggests that these scores should come from a distribution that is approximately Normal with mean 72 and standard deviation 10. (a) Using these values for m and s, standardize the scores of these 10 students. (b) If the grading policy is to give a grade of A to the top 15% of scores based on the Normal distribution with mean 72 and standard deviation 10, what is the cutoff for an A in terms of a standardized score? (c) Which of the 10 students earned a grade of A in the course? Show your work. 1.121 Assign more grades. Refer to the previous exercise. The grading policy says that the cutoffs for the other grades correspond to the following: bottom 5% receive F, next 10% receive D, next 40% receive C, and next 30% receive B. These cutoffs are based on the N(72, 10) distribution. (a) Give the cutoffs for the grades in this course in terms of standardized scores. (b) Give the cutoffs in terms of actual total scores. (c) Do you think that this method of assigning grades is a good one? Give reasons for your answer. 1.122 A uniform distribution. If you ask a computer to generate “random numbers’’ between 0 and 1, you will get observations from a uniform distribution. Figure 1.32 graphs the density curve for a uniform distribution. Use areas under this density curve to answer the following questions.
(a) Why is the total area under this curve equal to 1? (b) What proportion of the observations lie below 0.34? (c) What proportion of the observations lie between 0.34 and 0.60? 1.123 Use a different range for the uniform distribution. Many random number generators allow users to specify the range of the random numbers to be produced. Suppose that you specify that the outcomes are to be distributed uniformly between 0 and 5. Then the density curve of the outcomes has constant height between 0 and 5, and height 0 elsewhere. (a) What is the height of the density curve between 0 and 5? Draw a graph of the density curve. (b) Use your graph from (a) and the fact that areas under the curve are proportions of outcomes to find the proportion of outcomes that are less than 1. (c) Find the proportion of outcomes that lie between 0.5 and 2.5. 1.124 Find the mean, the median, and the quartiles. What are the mean and the median of the uniform distribution in Figure 1.32? What are the quartiles? 1.125 Three density curves. Figure 1.33 displays three density curves, each with three points marked on it. At which of these points on each curve do the mean and the median fall? 1.126 Use the Normal Curve applet. Use the Normal Curve applet for the standard Normal distribution to say how many standard deviations above and below the mean the quartiles of any Normal distribution lie. 1.127 Use the Normal Curve applet. The 68–95– 99.7 rule for Normal distributions is a useful approximation. You can use the Normal Curve applet on the text website, whfreeman.com/ips8e, to see how accurate the rule is. Drag one flag across the other so that the applet shows the area under the curve between the two flags.
A B C
A BC
0
1
FIGURE 1.32 The density curve of a uniform distribution, for Exercise 1.122.
AB C
FIGURE 1.33 Three density curves, for Exercise 1.125.
1.4 Density Curves and Normal Distributions (a) Place the flags one standard deviation on either side of the mean. What is the area between these two values? What does the 68–95–99.7 rule say this area is? (b) Repeat for locations two and three standard deviations on either side of the mean. Again compare the 68–95–99.7 rule with the area given by the applet. 1.128 Find some proportions. Using either Table A or your calculator or software, find the proportion of observations from a standard Normal distribution that satisfies each of the following statements. In each case, sketch a standard Normal curve and shade the area under the curve that is the answer to the question. (a) Z ⬎ 1.55 (b) Z ⬍ 1.55 (c) Z ⬎ ⫺0.70 (d) ⫺0.70 ⬍ Z ⬍ 1.55 1.129 Find more proportions. Using either Table A or your calculator or software, find the proportion of observations from a standard Normal distribution for each of the following events. In each case, sketch a standard Normal curve and shade the area representing the proportion. (a) Z ⱕ ⫺1.7 (b) Z ⱖ ⫺1.7 (c) Z ⬎ 1.9 (d) ⫺1.7 ⬍ Z ⬍ 1.9 1.130 Find some values of z. Find the value z of a standard Normal variable Z that satisfies each of the following conditions. (If you use Table A, report the value of z that comes closest to satisfying the condition.) In each case, sketch a standard Normal curve with your value of z marked on the axis. (a) 28% of the observations fall below z. (b) 60% of the observations fall above z. 1.131 Find more values of z. The variable Z has a standard Normal distribution. (a) Find the number z that has cumulative proportion 0.78. (b) Find the number z such that the event Z ⬎ z has proportion 0.22. 1.132 Find some values of z. The Wechsler Adult Intelligence Scale (WAIS) is the most common IQ test. The scale of scores is set separately for each age group, and the scores are approximately Normal with mean 100 and standard deviation 15. People with WAIS scores below 70 are considered developmentally disabled when, for example, applying for Social Security disability benefits. What percent of adults are developmentally disabled by this criterion?
75
1.133 High IQ scores. The Wechsler Adult Intelligence Scale (WAIS) is the most common IQ test. The scale of scores is set separately for each age group, and the scores are approximately Normal with mean 100 and standard deviation 15. The organization MENSA, which calls itself “the high-IQ society,’’ requires a WAIS score of 130 or higher for membership. What percent of adults would qualify for membership? There are two major tests of readiness for college, the ACT and the SAT. ACT scores are reported on a scale from 1 to 36. The distribution of ACT scores is approximately Normal with mean m ⫽ 21.5 and standard deviation s ⫽ 5.4. SAT scores are reported on a scale from 600 to 2400. The distribution of SAT scores is approximately Normal with mean m ⫽ 1498 and standard deviation s ⫽ 316. Exercises 1.134 to 1.143 are based on this information. 1.134 Compare an SAT score with an ACT score. Jessica scores 1825 on the SAT. Ashley scores 28 on the ACT. Assuming that both tests measure the same thing, who has the higher score? Report the z-scores for both students. 1.135 Make another comparison. Joshua scores 17 on the ACT. Anthony scores 1030 on the SAT. Assuming that both tests measure the same thing, who has the higher score? Report the z-scores for both students. 1.136 Find the ACT equivalent. Jorge scores 2060 on the SAT. Assuming that both tests measure the same thing, what score on the ACT is equivalent to Jorge’s SAT score? 1.137 Find the SAT equivalent. Alyssa scores 32 on the ACT. Assuming that both tests measure the same thing, what score on the SAT is equivalent to Alyssa’s ACT score? 1.138 Find an SAT percentile. Reports on a student’s ACT or SAT results usually give the percentile as well as the actual score. The percentile is just the cumulative proportion stated as a percent: the percent of all scores that were lower than or equal to this one. Renee scores 2040 on the SAT. What is her percentile? 1.139 Find an ACT percentile. Reports on a student’s ACT or SAT results usually give the percentile as well as the actual score. The percentile is just the cumulative proportion stated as a percent: the percent of all scores that were lower than or equal to this one. Joshua scores 17 on the ACT. What is his percentile? 1.140 How high is the top 15%? What SAT scores make up the top 15% of all scores? 1.141 How low is the bottom 10%? What SAT scores make up the bottom 10% of all scores? 1.142 Find the ACT quintiles. The quintiles of any distribution are the values with cumulative proportions 0.20, 0.40, 0.60, and 0.80. What are the quintiles of the distribution of ACT scores?
76
CHAPTER 1
•
Looking at Data—Distributions
1.143 Find the SAT quartiles. The quartiles of any distribution are the values with cumulative proportions 0.25 and 0.75. What are the quartiles of the distribution of SAT scores? 1.144 Do you have enough “good cholesterol?’’ Highdensity lipoprotein (HDL) is sometimes called the “good cholesterol’’ because low values are associated with a higher risk of heart disease. According to the American Heart Association, people over the age of 20 years should have at least 40 milligrams per deciliter (mg/dl) of HDL cholesterol.40 U.S. women aged 20 and over have a mean HDL of 55 mg/dl with a standard deviation of 15.5 mg/dl. Assume that the distribution is Normal. (a) What percent of women have low values of HDL (40 mg/dl or less)? (b) HDL levels of 60 mg/dl and higher are believed to protect people from heart disease. What percent of women have protective levels of HDL? (c) Women with more than 40 mg/dl but less than 60 mg/dl of HDL are in the intermediate range, neither very good or very bad. What proportion are in this category? 1.145 Men and HDL cholesterol. HDL cholesterol levels for men have a mean of 46 mg/dl with a standard deviation of 13.6 mg/dl. Answer the questions given in the previous exercise for the population of men. 1.146 Diagnosing osteoporosis. Osteoporosis is a condition in which the bones become brittle due to loss of minerals. To diagnose osteoporosis, an elaborate apparatus measures bone mineral density (BMD). BMD is usually reported in standardized form. The standardization is based on a population of healthy young adults. The World Health Organization (WHO) criterion for osteoporosis is a BMD 2.5 standard deviations below the mean for young adults. BMD measurements in a population of people similar in age and sex roughly follow a Normal distribution. (a) What percent of healthy young adults have osteoporosis by the WHO criterion? (b) Women aged 70 to 79 are of course not young adults. The mean BMD in this age is about ⫺2 on the standard scale for young adults. Suppose that the standard deviation is the same as for young adults. What percent of this older population has osteoporosis? 1.147 Deciles of Normal distributions. The deciles of any distribution are the 10th, 20th, . . . , 90th percentiles. The first and last deciles are the 10th and 90th percentiles, respectively.
deviation 0.15 ounce. What are the first and last deciles of this distribution? 1.148 Quartiles for Normal distributions. The quartiles of any distribution are the values with cumulative proportions 0.25 and 0.75. (a) What are the quartiles of the standard Normal distribution? (b) Using your numerical values from (a), write an equation that gives the quartiles of the N1m, s2 distribution in terms of m and s. 1.149 IQR for Normal distributions. Continue your work from the previous exercise. The interquartile range IQR is the distance between the first and third quartiles of a distribution. (a) What is the value of the IQR for the standard Normal distribution? (b) There is a constant c such that IQR ⫽ cs for any Normal distribution N(m, s). What is the value of c? 1.150 Outliers for Normal distributions. Continue your work from the previous two exercises. The percent of the observations that are suspected outliers according to the 1.5 ⫻ IQR rule is the same for any Normal distribution. What is this percent? 1.151 Deciles of HDL cholesterol. The deciles of any distribution are the 10th, 20th, . . . , 90th percentiles. Refer to Exercise 1.144 where we assumed that the distribution of HDL cholesterol in U.S. women aged 20 and over is Normal with mean 55 mg/dl and standard deviation 15.5 mg/dl. Find the deciles for this distribution. The remaining exercises for this section require the use of software that will make Normal quantile plots. 1.152 Longleaf pine trees. Exercise 1.72 (page 50) gives the diameter at breast height (DBH) for 40 longleaf pine trees from the Wade Tract in Thomas County, Georgia. Make a Normal quantile plot for these data and write a short paragraph interpreting what it describes. PINES 1.153 Three varieties of flowers. The study of tropical flowers and their hummingbird pollinators (Exercise 1.88, page 52) measured the lengths of three varieties of Heliconia flowers. We expect that such biological measurements will have roughly Normal distributions. HELICON (a) Make Normal quantile plots for each of the three flower varieties. Which distribution is closest to Normal?
(a) What are the first and last deciles of the standard Normal distribution?
(b) The other two distributions show the same kind of mild deviation from Normality. In what way are these distributions non-Normal?
(b) The weights of 9-ounce potato chip bags are approximately Normal with mean 9.12 ounces and standard
(c) Compute the mean for each variety. For each flower, subtract the mean for its variety. Make a single data set
Chapter 1 Exercises
77
for all varieties that contains the deviations from the means. Use this data set to create a Normal quantile plot. Examine the plot and summarize your conclusions.
a good way to become familiar with how histograms and Normal quantile plots look when data actually are close to Normal.)
1.154 Use software to generate some data. Use software to generate 200 observations from the standard Normal distribution. Make a histogram of these observations. How does the shape of the histogram compare with a Normal density curve? Make a Normal quantile plot of the data. Does the plot suggest any important deviations from Normality? (Repeating this exercise several times is
1.155 Use software to generate more data. Use software to generate 200 observations from the uniform distribution described in Exercise 1.122. Make a histogram of these observations. How does the histogram compare with the density curve in Figure 1.32? Make a Normal quantile plot of your data. According to this plot, how does the uniform distribution deviate from Normality?
CHAPTER 1 Exercises 1.156 Comparing fuel efficiency. Let’s compare the fuel efficiencies (mpg) of small cars and sporty cars for model year 2013.41 Here are the data: Small Cars 50
45
37
37
37
36
35
34
34
34
34
34
33
33
33
33
34
34
percent of people who regularly eat five or more servings of fruits and vegetables (FruitVeg5). Answer the questions BRFSS given in the previous exercise for this variable. 1.159 Vehicle colors. Vehicle colors differ among types of vehicle in different regions. Here are data on the most popular colors in 2011 for several different regions of the world:43
Sporty Cars 33
32
32
32
32
31
31
31
31
31
31
30
30
30
30
30
30
30
29
29
29
29
29
29
29
Color
North South South America America Europe China Korea Japan percent percent percent percent percent percent
Silver
16
30
15
26
30
19
White
23
17
20
15
25
26
Gray
13
15
18
10
12
9
Give graphical and numerical descriptions of the fuel efficiencies for these two types of vehicles. What are the main features of the distributions? Compare the two distributions and summarize your results in a short paragraph. MPGSS
Black
18
19
25
21
15
20
1.157 Smoking. The Behavioral Risk Factor Surveillance System (BRFSS) conducts a large survey of health conditions and risk behaviors in the United States.42 The BRFSS data file contains data on 23 demographic factors and risk factors for each state. Use the percent of smokers (SmokeEveryDay) for this exercise. BRFSS (a) Prepare a graphical display of the distribution and use your display to describe the major features of the distribution. (b) Calculate numerical summaries. Give reasons for your choices. (c) Write a short paragraph summarizing what the data tell us about smoking in the United States. 1.158 Eat your fruits and vegetables. Nutrition experts recommend that we eat five servings of fruits and vegetables each day. The BRFSS data file described in the previous exercise includes a variable that gives the
Blue
9
1
7
9
4
9
Red
10
11
6
7
4
5
Brown
5
5
5
4
4
4
Yellow
3
1
1
2
1
1
Green
2
1
1
1
1
1
Other
1
0
2
5
4
6
Use the methods you learned in this chapter to compare the vehicle color preferences for the regions of the world presented in this table. Write a report summarizing your findings with an emphasis on similarities and differences across regions. Include recommendations related to marketing and advertising of vehicles in these VCOLORS regions. 1.160 Canadian international trade. The government organization Statistics Canada provides data on many topics related to Canada’s population, resources, economy, society, and culture. Go to the web page statcan.gc.ca/start-debut-eng.html. Under the “Subject’’ tab, choose “International trade.’’ Pick some data from the resources listed and use the methods that you learned in this chapter to create graphical and numerical
78
CHAPTER 1
•
Looking at Data—Distributions
summaries. Write a report summarizing your findings that includes supporting evidence from your analyses. 1.161 Travel and tourism in Canada. Refer to the previous exercise. Under the “Subject’’ tab, choose “Travel and tourism.’’ Pick some data from the resources listed and use the methods that you learned in this chapter to create graphical and numerical summaries. Write a report summarizing your findings that includes supporting evidence from your analyses. 1.162 Internet use. The World Bank collects data on many variables related to development for countries throughout the world.44 One of these is Internet use, expressed as the number of users per 100 people. The data file for this exercise gives 2011 values of this variable for 185 countries. Use graphical and numerical methods to describe this distribution. Write a short report summarizing what the data tell about worldwide Internet use. INETUSE
Display these data in a graph. Write a short summary describing the distribution of subscribers for these 10 providers. Business people looking at this graph see an industry that offers opportunities for larger companies to take over. INETPRO 1.166 Internet service provider ratings. Refer to the previous exercise. The following table gives overall ratings, on a 10-point scale, for these providers. These were posted on the TopTenREVIEWS website.46 INETPRO Service provider
Rating
Service provider
Rating
Comcast
9.25
Charter
7.88
Time Warner
8.60
Verizon
7.63
AT&T
8.53
CenturyLink
7.58
Cox
8.38
SuddenLink
7.38
Optimum
8.20
EarthLink
7.20
1.163 Change Internet use. Refer to the previous exercise. The data file also contains the numbers of users per 100 people for 2010. INETUSE
Display these data in a graph. Write a short summary describing the distribution of ratings for these 10 providers. INETPRO
(a) Analyze the 2010 data.
1.167 What graph would you use? What type of graph or graphs would you plan to make in a study of each of the following issues?
(b) Compute the change in the number of users per 100 people from 2010 to 2011. Analyze the changes. (c) Compute the percent change in the number of users per 100 people from 2010 to 2011. Analyze the percent changes. (d) Write a summary of your analyses in parts (a) to (c). Include a comparison of the changes versus the percent changes. 1.164 Leisure time for college students. You want to measure the amount of “leisure time’’ that college students enjoy. Write a brief discussion of two issues: (a) How will you define “leisure time’’? (b) Once you have defined leisure time, how will you measure Sally’s leisure time this week? 1.165 Internet service. Providing Internet service is a very competitive business in the United States. The numbers of subscribers claimed by the top 10 providers of service were as follows:45 Service provider Comcast Time Warner
Subscribers (millions)
Service provider
Subscribers (millions)
17.0
Charter
5.5
9.7
Verizon
4.3
17.8
CenturyLink
6.4
Cox
3.9
SuddenLink
1.4
Optimum
3.3
EarthLink
1.6
AT&T
(a) What makes of cars do students drive? How old are their cars? (b) How many hours per week do students study? How does the number of study hours change during a semester? (c) Which radio stations are most popular with students? (d) When many students measure the concentration of the same solution for a chemistry course laboratory assignment, do their measurements follow a Normal distribution? 1.168 Spam filters. A university department installed a spam filter on its computer system. During a 21-day period, 6693 messages were tagged as spam. How much spam you get depends on what your online habits are. Here are the counts for some students and faculty in this department (with log-in IDs changed, of course): ID
Count
ID
Count
ID
Count
AA
1818
BB
EE
399
FF
II
251
JJ
ID
Count
1358
CC
389
GG
442
DD
416
304
HH
178
KK
251
158
LL
103
All other department members received fewer than 100 spam messages. How many did the others receive in total? Make a graph and comment on what you learn SPAM from these data.
Chapter 1 Exercises
79
1.169 How much vitamin C do you need? The Food and Nutrition Board of the Institute of Medicine working in cooperation with scientists from Canada have used scientific data to answer this question for a variety of vitamins and minerals.47 Their methodology assumes that needs, or requirements, follow a distribution. They have produced guidelines called dietary reference intakes for different gender-by-age combinations. For vitamin C, there are three dietary reference intakes: the estimated average requirement (EAR), which is the mean of the requirement distribution; the recommended dietary allowance (RDA), which is the intake that would be sufficient for 97% to 98% of the population; and the tolerable upper level (UL), the intake that is unlikely to pose health risks. For women aged 19 to 30 years, the EAR is 60 milligrams per day (mg/d), the RDA is 75 mg/d, and the UL is 2000 mg/d.48
1.172 How much vitamin C do men consume? To evaluate whether or not the intake of a vitamin or mineral is adequate, comparisons are made between the intake distribution and the requirement distribution. Here is some information about the distribution of vitamin C intake, in milligrams per day, for men aged 19 to 30 years:
(a) The researchers assumed that the distribution of requirements for vitamin C is Normal. The EAR gives the mean. From the definition of the RDA, let’s assume that its value is the 97.72 percentile. Use this information to determine the standard deviation of the requirement distribution.
(b) Sketch your Normal intake distribution on the same graph with a sketch of the requirement distribution that you produced in Exercise 1.70.
(b) Sketch the distribution of vitamin C requirements for 19- to 30-year-old women. Mark the EAR, the RDA, and the UL on your plot. 1.170 How much vitamin C do men need? Refer to the previous exercise. For men aged 19 to 30 years, the EAR is 75 milligrams per day (mg/d), the RDA is 90 mg/d, and the UL is 2000 mg/d. Answer the questions in the previous exercise for this population. 1.171 How much vitamin C do women consume? To evaluate whether or not the intake of a vitamin or mineral is adequate, comparisons are made between the intake distribution and the requirement distribution. Here is some information about the distribution of vitamin C intake, in milligrams per day, for women aged 19 to 30 years:49 Percentile (mg/d) Mean 84.1
1st 5th 31
42
19th 25th 48
61
50th
75th
90th
95th
99th
79
102
126
142
179
(a) Use the 5th, the 50th, and the 95th percentiles of this distribution to estimate the mean and standard deviation of this distribution assuming that the distribution is Normal. Explain your method for doing this. (b) Sketch your Normal intake distribution on the same graph with a sketch of the requirement distribution that you produced in part (b) of Exercise 1.69. (c) Do you think that many women aged 19 to 30 years are getting the amount of vitamin C that they need? Explain your answer.
Percentile (mg/d) Mean
1st 5th 19th
122.2
39
55
65
25th
50th
75th
90th
95th
99th
85
114
150
190
217
278
(a) Use the 5th, the 50th, and the 95th percentiles of this distribution to estimate the mean and standard deviation of this distribution assuming that the distribution is Normal. Explain your method for doing this.
(c) Do you think that many men aged 19 to 30 years are getting the amount of vitamin C that they need? Explain your answer. 1.173 Time spent studying. Do women study more than men? We asked the students in a large first-year college class how many minutes they studied on a typical weeknight. Here are the responses of random samples of 30 women and 30 men from the class: STUDY Women
Men
170
120
180
360
240
80
120
30
90
120
180
120
240
170
90
45
30
120
200 75
150
120
180
180
150
150
120
60
240
300
200
150
180
150
180
240
60
120
60
30
120
60
120
180
180
30
230
120
95
150
90
240
180
115
120
0
200
120
120
180
(a) Examine the data. Why are you not surprised that most responses are multiples of 10 minutes? We eliminated one student who claimed to study 30,000 minutes per night. Are there any other responses that you consider suspicious? (b) Make a back-to-back stemplot of these data. Report the approximate midpoints of both groups. Does it appear that women study more than men (or at least claim that they do)? (c) Make side-by-side boxplots of these data. Compare the boxplots with the stemplot you made in part (b). Which to you prefer? Give reasons for your answer. 1.174 Product preference. Product preference depends in part on the age, income, and gender of the consumer.
80
CHAPTER 1
•
Looking at Data—Distributions
A market researcher selects a large sample of potential car buyers. For each consumer, she records gender, age, household income, and automobile preference. Which of these variables are categorical and which are quantitative? 1.175 Two distributions. If two distributions have exactly the same mean and standard deviation, must their histograms have the same shape? If they have the same five-number summary, must their histograms have the same shape? Explain. 1.176 Norms for reading scores. Raw scores on behavioral tests are often transformed for easier comparison. A test of reading ability has mean 70 and standard deviation 10 when given to third-graders. Sixth-graders have mean score 80 and standard deviation 11 on the same test. To provide separate “norms’’ for each grade, we want scores in each grade to have mean 100 and standard deviation 20. (a) What linear transformation will change third-grade scores x into new scores xnew ⫽ a ⫹ bx that have the desired mean and standard deviation? (Use b ⬎ 0 to preserve the order of the scores.) (b) Do the same for the sixth-grade scores. (c) David is a third-grade student who scores 72 on the test. Find David’s transformed score. Nancy is a
sixth-grade student who scores 78. What is her transformed score? Who scores higher within his or her grade? (d) Suppose that the distribution of scores in each grade is Normal. Then both sets of transformed scores have the N(100, 20) distribution. What percent of third-graders have scores less than 75? What percent of sixth-graders have scores less than 75? 1.177 Use software to generate some data. Most statistical software packages have routines for generating values of variables having specified distributions. Use your statistical software to generate 30 observations from the N(25, 8) distribution. Compute the mean and standard deviation x and s of the 30 values you obtain. How close are x and s to the m and s of the distribution from which the observations were drawn? Repeat 19 more times the process of generating 30 observations from the N(25, 8) distribution and recording x and s. Make a stemplot of the 20 values of x and another stemplot of the 20 values of s. Make Normal quantile plots of both sets of data. Briefly describe each of these distributions. Are they symmetric or skewed? Are they roughly Normal? Where are their centers? (The distributions of measures like x and s when repeated sets of observations are made from the same theoretical distribution will be very important in later chapters.)
Looking at Data—Relationships Introduction In Chapter 1 we learned to use graphical and numerical methods to describe the distribution of a single variable. Many of the interesting examples of the use of statistics involve relationships between pairs of variables. Learning ways to describe relationships with graphical and numerical methods is the focus of this chapter. In Section 2.2 we focus on graphical descriptions. The scatterplot is our fundamental graphical tool for displaying the relationship between two quantitative variables. Sections 2.3 and 2.4 move on to numerical summaries for these relationships. Cautions about the use of these methods are discussed in Section 2.5. Graphical and numerical methods for describing the relationship between two categorical variables are presented in Section 2.6. We conclude with Section 2.7, a brief overview of issues related to the distinction between associations and causation.
2.1 Relationships
CHAPTER 2.1 Relationships
2
2.2 Scatterplots 2.3 Correlation 2.4 Least-Squares Regression 2.5 Cautions about Correlation and Regression 2.6 Data Analysis for Two-Way Tables 2.7 The Question of Causation
When you complete this section, you will be able to • Identify the key characteristics of a data set to be used to explore a relationship between two variables. • Categorize variables as response variables or explanatory variables. 81
82
CHAPTER 2
•
Looking at Data—Relationships In Chapter 1 (page 4) we discussed the key characteristics of a data set. Cases are the objects described by a set of data, and a variable is a characteristic of a case. We also learned to categorize variables as categorical or quantitative. For Chapter 2, we focus on data sets that have two variables.
EXAMPLE 2.1 Stress and lack of sleep. Stress is a common problem for college students. Exploring factors that are associated with stress may lead to strategies that will help students to relieve some of the stress that they experience. Recent studies have suggested that a lack of sleep is associated with stress.1 The two variables involved in the relationship here are lack of sleep and stress. The cases are the students who are the subjects for a particular study. When we study relationships between two variables, it is not sufficient to collect data on the two variables. A key idea for this chapter is that both variables must be measured on the same cases. USE YOUR KNOWLEDGE 2.1 Relationship between attendance at class and final exam. You want to study the relationship between the attendance at class and the score on the final for the 30 students enrolled in an elementary statistics class. (a) Who are the cases for your study? (b) What are the variables? (c) Are the variables quantitative or categorical? Explain your answer. We use the term associated to describe the relationship between two variables, such as stress and lack of sleep in Example 2.1. Here is another example where two variables are associated.
EXAMPLE 2.2 Size and price of a coffee beverage. You visit a local Starbucks to buy a Mocha Frappuccino®. The barista explains that this blended coffee beverage comes in three sizes and asks if you want a Tall, a Grande, or a Venti. The prices are $3.75, $4.35, and $4.85, respectively. There is a clear association between the size of the Mocha Frappuccino and its price.
ASSOCIATION BETWEEN VARIABLES Two variables measured on the same cases are associated if knowing the values of one of the variables tells you something about the values of the other variable that you would not know without this information.
2.1 Relationships
83
In the Mocha Frappuccino example, knowing the size tells you the exact price, so the association here is very strong. Many statistical associations, however, are simply overall tendencies that allow exceptions. Some people get adequate sleep and are highly stressed. Others get little sleep and do not experience much stress. The association here is much weaker than the one in the Mocha Frappuccino example.
Examining relationships To examine the relationship between two or more variables, we first need to know some basic characteristics of the data. Here is an example.
EXAMPLE 2.3 Stress and lack of sleep. A study of stress and lack of sleep collected data on 1125 students from an urban midwestern university. Two of the variables measured were the Pittsburgh Sleep Quality Index (PSQI) and the Subjective Units of Distress Scale (SUDS). In this study the cases are the 1125 students studied.2 The PSQI is based on responses to a large number of questions that are summarized in a single variable that has a value between 0 and 21 for each subject. Therefore, we will treat the PSQI as a quantitative variable. The SUDS is a similar scale with values between 0 and 100 for each subject. We will treat the SUDS as a quantitative variable also. In many situations, we measure a collection of categorical variables and then combine them in a scale that can be viewed as a quantitative variable. The PSQI is an example. We can also turn the tables in the other direction. Here is an example.
EXAMPLE 2.4 Hemoglobin and anemia. Hemoglobin is a measure of iron in the blood. The units are grams of hemoglobin per deciliter of blood (g/dl). Typical values depend on age and gender. Adult women typically have values between 12 and 16 g/dl. Anemia is a major problem in developing countries, and many studies have been designed to address the problem. In these studies, computing the mean hemoglobin is not particularly useful. For studies like these, it is more appropriate to use a definition of severe anemia (a hemoglobin level of less than 8 g/dl). Thus, for example, researchers can compare the proportions of subjects who are severely anemic for two treatments rather than the difference in the mean hemoglobin levels. In this situation, the categorical variable, severely anemic or not, is much more useful than the quantitative variable, hemoglobin. When analyzing data to draw conclusions it is important to carefully consider the best way to summarize the data. Just because a variable is measured as a quantitative variable, it does not necessarily follow that the best summary
84
CHAPTER 2
•
Looking at Data—Relationships is based on the mean (or the median). As the previous example illustrates, converting a quantitative variable to a categorical variable is a very useful option to keep in mind.
USE YOUR KNOWLEDGE 2.2 Create a categorical variable from a quantitative variable. Consider the study described in Example 2.3. Some analyses compared three groups of students. The students were classified as having optimal sleep quality (a PSQI of 5 or less), borderline sleep quality (a PSQI of 6 or 7), or poor sleep quality (a PSQI of 8 or more). When the three groups of students are compared, is the PSQI being used as a quantitative variable or as a categorical variable? Explain your answer and describe some advantages to using the optimal, borderline, and poor categories in explaining the results of a study such as this. 2.3 Replace names by ounces. In the Mocha Frappuccino example, the variable size is categorical, with Tall, Grande, and Venti as the possible values. Suppose that you converted these values to the number of ounces: Tall is 12 ounces, Grande is 16 ounces, and Venti is 24 ounces. For studying the relationship between ounces and price, describe the cases and the variables, and state whether each is quantitative or categorical. When you examine the relationship between two variables, a new question becomes important: • Is your purpose simply to explore the nature of the relationship, or do you hope to show that one of the variables can explain variation in the other? Is one of the variables a response variable and the other an explanatory variable?
RESPONSE VARIABLE, EXPLANATORY VARIABLE A response variable measures an outcome of a study. An explanatory variable explains or causes changes in the response variable.
EXAMPLE 2.5 Stress and lack of sleep. Refer to the study of stress and lack of sleep in Example 2.3. Here, the explanatory variable is the Pittsburgh Sleep Quality Index, and the response variable is the Subjective Units of Distress Scale.
USE YOUR KNOWLEDGE 2.4 Sleep and stress or stress and sleep? Consider the scenario described in the previous example. Make an argument for treating the Subjective Units of Distress Scale as the explanatory variable and the Pittsburgh Sleep Quality Index as the response variable.
2.1 Relationships
85
In some studies it is easy to identify explanatory and response variables. The following example illustrates one situation where this is true: when we actually set values of one variable to see how it affects another variable.
EXAMPLE 2.6 How much calcium do you need? Adolescence is a time when bones are growing very actively. If young people do not have enough calcium, their bones will not grow properly. How much calcium is enough? Research designed to answer this question has been performed for many years at events called “Camp Calcium.”3 At these camps, subjects eat controlled diets that are identical except for the amount of calcium. The amount of calcium retained by the body is the major response variable of interest. Since the amount of calcium consumed is controlled by the researchers, this variable is the explanatory variable. When you don’t set the values of either variable but just observe both variables, there may or may not be explanatory and response variables. Whether there are depends on how you plan to use the data.
EXAMPLE 2.7 Student loans. A college student aid officer looks at the findings of the National Student Loan Survey. She notes data on the amount of debt of recent graduates, their current income, and how stressful they feel about college debt. She isn’t interested in predictions but is simply trying to understand the situation of recent college graduates. A sociologist looks at the same data with an eye to using amount of debt and income, along with other variables, to explain the stress caused by college debt. Now, amount of debt and income are explanatory variables, and stress level is the response variable. In many studies, the goal is to show that changes in one or more explanatory variables actually cause changes in a response variable. But many explanatory-response relationships do not involve direct causation. The SAT scores of high school students help predict the students’ future college grades, but high SAT scores certainly don’t cause high college grades.
KEY CHARACTERISTICS OF DATA FOR RELATIONSHIPS A description of the key characteristics of a data set that will be used to explore a relationship between two variables should include • Cases. Identify the cases and how many there are in the data set. • Label. Identify what is used as a label variable if one is present. • Categorical or quantitative. Classify each variable as categorical or quantitative. • Values. Identify the possible values for each variable. • Explanatory or response. If appropriate, classify each variable as explanatory or response.
86
CHAPTER 2
•
Looking at Data—Relationships
independent variable dependent variable
Some of the statistical techniques in this chapter require us to distinguish explanatory from response variables; others make no use of this distinction. You will often see explanatory variables called independent variables and response variables called dependent variables. These terms express mathematical ideas; they are not statistical terms. The concept that underlies this language is that response variables depend on explanatory variables. Because the words “independent” and “dependent” have other meanings in statistics that are unrelated to the explanatory-response distinction, we prefer to avoid those words. Most statistical studies examine data on more than one variable. Fortunately, statistical analysis of several-variable data builds on the tools used for examining individual variables. The principles that guide our work also remain the same: • Start with a graphical display of the data. • Look for overall patterns and deviations from those patterns. • Based on what you see, use numerical summaries to describe specific aspects of the data.
SECTION 2.1 Summary To study relationships between variables, we must measure the variables on the same cases. If we think that a variable x may explain or even cause changes in another variable y, we call x an explanatory variable and y a response variable.
SECTION 2.1 Exercises For Exercise 2.1, see page 82; for Exercises 2.2 and 2.3, see page 84; and for Exercise 2.4, see page 84. 2.5 High click counts on Twitter. A study was done to identify variables that might produce high click counts on Twitter. You and 9 of your friends collect data on all of your tweets for a week. You record the number of click counts, the time of day, the day of the week, the gender of the person posting the tweet, and the length of the tweet. (a) What are the cases for this study? (b) Classify each of the variables as categorical or quantitative. (c) Classify each of the variables as explanatory, response, or neither. Explain your answers. 2.6 Explanatory or response? For each of the following scenarios, classify each of the pair of variables as explanatory or response or neither. Give reasons for your answers. (a) The amount of calcium per day in your diet and the amount of vitamin A per day in your diet. (b) The number of bedrooms in an apartment and the monthly rent of the apartment.
(c) The diameter of an apple and the weight of the apple. (d) The length of time that you spend in the sun and the amount of vitamin D that is produced by your skin. 2.7 Buy and sell prices of used textbooks. Think about a study designed to compare the prices of textbooks for third- and fourth-year college courses in five different majors. For the five majors, you want to examine the relationship between the difference in the price that you pay for a used textbook and the price that the seller gives back to you when you return the textbook. Describe a data set that could be used for this study, and give the key characteristics of the data. 2.8 Protein and carbohydrates. Think about a study designed to examine the relationship between protein intake and carbohydrate intake in the diets of college sophomores. Describe a data set that could be used for this study, and give the key characteristics of the data. 2.9 Can you examine the relationship? For each of the following scenarios, determine whether or not the data would allow you to examine a relationship between two variables. If your answer is Yes, give the key
2.2 Scatterplots
87
characteristics of a data set that could be analyzed. If your answer is No, explain your answer.
point averages of the students who will graduate this year.
(a) The temperature where you live yesterday and the temperature where you live today.
(c) A consumer study reported the price per load and an overall quality score for 24 brands of laundry detergents.
(b) The average high school grade point averages of the first-year students at your college and the college grade
2.2 Scatterplots When you complete this section, you will be able to • Make a scatterplot to examine a relationship between two quantitative variables. • Describe the overall pattern in a scatterplot and any striking deviations from that pattern. • Use a scatterplot to describe the form, direction, and strength of a relationship. • Use a scatterplot to identify outliers. • Identify a linear pattern in a scatterplot. • Explain the effect of a change of units on a scatterplot. • Use a log transformation to change a curved relationship into a linear relationship. • Use different plotting symbols to include information about a categorical variable in a scatterplot.
EXAMPLE DATA LAUNDRY
2.8 Laundry detergents. Consumers Union provides ratings on a large variety of consumer products. They use sophisticated testing methods as well as surveys of their members to create these ratings. The ratings are published in their magazine, Consumer Reports.4 One recent article rated laundry detergents on a scale from 1 to 100. Here are the ratings along with the price per load, in cents, for 24 laundry detergents:5
CHALLENGE
Rating
Price (cents)
Rating
Price (cents)
Rating
Price (cents)
Rating
Price (cents)
61 55 50 46 35 32
17 30 9 13 8 5
59 52 48 46 34 29
22 23 16 13 12 14
56 51 48 45 33 26
22 11 15 17 7 11
55 50 48 36 32 26
16 15 18 8 6 13
We will examine the relationship between rating and price per load for these laundry detergents. We expect that the higher-priced detergents will tend to have higher ratings.
88
CHAPTER 2
•
Looking at Data—Relationships USE YOUR KNOWLEDGE DATA LAUNDRY
2.10 Examine the spreadsheet. Examine the spreadsheet that gives the laundry detergent data in the data file LAUNDRY. (a) How many cases are in the data set? (b) Describe the labels, variables, and values. (c) Which columns represent quantitative variables? Which columns represent categorical variables?
CHALLENGE DATA LAUNDRY
(d) Is there an explanatory variable? A response variable? Explain your answer. 2.11 Use the data set. Using the data set from the previous exercise, create graphical and numerical summaries for the rating and for the price per load. The most common way to display the relationship between two quantitative variables is a scatterplot.
SCATTERPLOT CHALLENGE
A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.
EXAMPLE DATA LAUNDRY
2.9 Laundry detergents. A higher price for a product should be associated with a better product. Therefore, let’s treat price per load as the explanatory variable and rating as the response variable in our examination of the relationship between these two variables. We begin with a graphical display. Figure 2.1 gives a scatterplot that displays the relationship between the response variable, rating, and the explanatory variable, price per load. The plot confirms our idea that a higher price should be associated with a better rating.
FIGURE 2.1 Scatterplot of price
70
CHALLENGE
per load (in cents) versus rating for 24 laundry detergents, for Example 2.9. Rating
60 50 40 30 20 0
10 20 Price per load (cents)
30
2.2 Scatterplots
89
Always plot the explanatory variable, if there is one, on the horizontal axis (the x axis) of a scatterplot. We usually call the explanatory variable x and the response variable y. If there is no explanatory-response distinction, either variable can go on the horizontal axis. Time plots, such as the one in Figure 1.13 (page 24), are special scatterplots where the explanatory variable x is a measure of time.
USE YOUR KNOWLEDGE 2.12 Make a scatterplot. DATA LAUNDRY
CHALLENGE DATA LAUNDRY
(a) Make a scatterplot similar to Figure 2.1 for the laundry detergent data. (b) Two of the laundry detergents are gels. These products are made by the same manufacturer, and one of them has an additive for stain removal. The ratings and prices per load are the same; the rating is 46 and the price is 13. Mark the location of these gels on your plot. (c) Cases with identical values for both variables are generally indistinguishable in a scatterplot. To what extent do you think that this could give a distorted picture of the relationship between two variables for a data set that has a large number of duplicate values? Explain your answer. 2.13 Change the units. (a) Create a spreadsheet for the laundry detergent data with the price per load expressed in dollars. (b) Make a scatterplot for the data in your spreadsheet. (c) Describe how this scatterplot differs from Figure 2.1.
CHALLENGE
Interpreting scatterplots To look more closely at a scatterplot such as Figure 2.1, apply the strategies of exploratory analysis learned in Chapter 1.
EXAMINING A SCATTERPLOT In any graph of data, look for the overall pattern and for striking deviations from that pattern. You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship.
linear relationship
Figure 2.1 shows a clear form: the data lie in a roughly straight-line, or linear, pattern. To help us see this relationship, we can use software to put a straight line through the data. We will see more details about how this is done in Section 2.4.
CHAPTER 2
•
Looking at Data—Relationships
EXAMPLE DATA LAUNDRY
2.10 Scatterplot with a straight line. Figure 2.2 plots the laundry detergent data with a fitted straight line. The line helps us to see and to evaluate the linear form of the relationship. There is a large amount of scatter about the line. Referring to the data given in Example 2.8, we see that for 11 cents per load, one detergent has a rating of 26, while another has a rating of 51, almost twice as large. No clear outliers are evident.
CHALLENGE
70 60 Rating
90
50 40 30 20 0
10 20 Price per load (cents)
30
FIGURE 2.2 Scatterplot of rating versus price per load (in cents) with a fitted straight line, for Example 2.10.
The relationship in Figure 2.2 has a clear direction: laundry detergents that cost more generally have higher ratings. This is a positive association between the two variables.
POSITIVE ASSOCIATION, NEGATIVE ASSOCIATION Two variables are positively associated when above-average values of one tend to accompany above-average values of the other and below-average values also tend to occur together. Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa.
The strength of a relationship in a scatterplot is determined by how closely the points follow a clear form. The overall relationship in Figure 2.2 is fairly moderate. Here is an example of a stronger linear relationship.
2.2 Scatterplots
91
EXAMPLE DATA DEBT
2.11 Debt for 33 countries. The amount of debt owed by a country is a measure of its economic health. The Organisation for Economic Co-operation and Development collects data on the central government debt for many countries. One of their tables gives the debt for countries over several years.6 Figure 2.3 is a spreadsheet giving the government debt for 33 countries that have data for the years 2005 to 2010. Since countries that have large economies tend to have large debts, we have chosen a table that expresses the debt as a percent of the gross domestic product (GDP).
CHALLENGE
FIGURE 2.3 Central government debt in the years 2005 to 2010 for 33 countries, in percent of GDP, for Example 2.11.
Excel
A 1
B
C
D
E
F
G
Debt2005 6.312
Debt2006 5.76
Debt2007 5.181
Debt2008 4.922
Debt2009 8.195
Debt2010 10.966
2
Country Australia
3
Austria
62.116
60.434
57.829
59.319
64.916
65.754
4
Belgium
91.774
87.568
85.295
90.094
94.893
96.789
5
Canada
30.235
27.934
25.183
28.642
35.716
36.073
6
Chile
7.282
5.264
4.097
5.173
6.228
9.185
7
Czech Republic
23.164
24.904
25.24
27.102
32.496
36.625
8
Denmark
39.292
32.715
27.765
32.318
37.891
39.59
9
Estonia
2.091
1.836
1.319
1.761
3.55
3.227
10
Finland
38.17
35.561
31.201
29.452
37.549
41.683
11
France
53.275
52.131
52.118
53.406
61.231
67.418
12
Germany
40.832
41.232
39.55
39.55
44.205
44.403
13
Greece
110.572
107.675
105.674
110.617
127.022
147.839
14
Hungary
58.103
61.971
61.551
67.668
72.79
73.898
15
Iceland
19.378
24.807
23.237
44.175
87.473
81.257
16
Ireland
23.524
20.253
19.834
28.001
47.074
60.703
17
Israel
92.102
82.659
75.948
75.307
77.693
74.714
18
Italy
97.656
97.454
95.627
98.093
106.778
109.015
19
Korea
27.595
30.065
29.651
29.027
32.558
31.935
20
Luxembourg
0.821
1.458
1.419
8.153
8.489
12.578
21
Mexico
20.295
20.583
20.861
24.369
28.086
27.46
22
Netherlands
42.952
39.169
37.552
50.068
49.719
51.845
23
New Zealand
22.069
21.58
20.343
20.721
27.53
30.45
24
Norway
17.173
12.473
11.681
13.905
26.363
26.077
25
Poland
44.764
45.143
42.62
44.686
47.015
49.679
26
Portugal
66.194
67.732
66.622
68.88
78.73
87.962
27
Slovak Republic
33.103
29.164
28.108
26.342
33.749
39.078
28
Slovenia
26.9
25.782
23.207
21.188
33.628
36.023
29
Spain
36.36
32.965
30.019
33.695
46.026
51.693
30
Sweden
46.232
42.242
36.406
35.56
38.098
33.782
31
Switzerland
28.102
25.195
23.216
22.376
20.723
20.24
32
Turkey
51.087
45.498
39.551
40.011
46.35
42.851
33
United Kingdom
43.523
43.185
42.744
61.059
75.27
85.535
34
United States
36.149
36.039
35.703
40.183
53.573
61.274
CHAPTER 2
•
Looking at Data—Relationships 160 140 Debt, 2010 (% GDP)
92
120 100 80 60 40 20 0 0
20
40
60 80 100 Debt, 2009 (% GDP)
120
140
FIGURE 2.4 Scatterplot of debt in 2010 (percent of GDP) versus debt in 2009 (percent of GDP) for 33 countries, for Example 2.11.
Figure 2.4 is a scatterplot of the central government debt in 2010 versus the central government debt in 2009. The scatterplot shows a strong positive relationship between the debt in these two years. USE YOUR KNOWLEDGE 2.14 Make a scatterplot. In our Mocha Frappuccino example, the 12-ounce drink costs $3.75, the 16-ounce drink costs $4.35, and the 24-ounce drink costs $4.85. Explain which variable should be used as the explanatory variable, and make a scatterplot. Describe the scatterplot and the association between these two variables. Can we conclude that the strong linear relationship that we found between the central government debt in 2009 and 2010 is evidence that the debt for each country is approximately the same in the two years? The answer is No. The first exercise below asks you to explore this issue. USE YOUR KNOWLEDGE DATA DEBT
DATA DEBT
2.15 Are the debts in 2009 and 2010 approximately the same? Use the methods you learned in Chapter 1 to examine whether or not the central government debts in 2009 and 2010 are approximately the same. (Hint: Think about creating a new variable that would help you to answer this question.) 2.16 The relationship between debt in 2005 and debt in 2010. Make a plot similar to Figure 2.4 to examine the relationship between debt in 2010 and debt in 2005.
CHALLENGE
(a) Describe the relationship and compare it with the relationship between debt in 2010 and debt in 2009. (b) Answer the question posed in the previous exercise for these data.
CHALLENGE
Of course, not all relationships are linear. Here is an example where the relationship is described by a curve.
2.2 Scatterplots
93
EXAMPLE DATA CALCIUM
CHALLENGE
2.12 Calcium retention. Our bodies need calcium to build strong bones. How much calcium do we need? Does the amount that we need depend on our age? Questions like these are studied by nutrition researchers. One series of studies used the amount of calcium retained by the body as a response variable and the amount of calcium consumed as an explanatory variable.7 Figure 2.5 is a scatterplot of calcium retention in milligrams per day (mg/d) versus calcium intake (mg/d) for 56 children aged 11 to 15 years. A smooth curve generated by software helps us see the relationship between the two variables. There is clearly a relationship here. As calcium intake increases, the body retains more calcium. However, the relationship is not linear. The curve is approximately linear for low values of intake, but then the line curves more and becomes almost level.
Calcium retention (mg/d)
1000 900 800 700 600 500 400 300 200 100 500
1000 1500 Calcium intake (mg/d)
2000
FIGURE 2.5 Scatterplot of calcium retention (mg/d) versus calcium intake (mg/d) for 56 children with a fitted curve, for Example 2.12. There is a positive relationship between these two variables but it is not linear.
transformation
There are many kinds of curved relationships like that in Figure 2.5. For some of these, we can apply a transformation to the data that will make the relationship approximately linear. To do this, we replace the original values with the transformed values and then use the transformed values for our analysis. Transforming data is common in statistical practice. There are systematic principles that describe how transformations behave and guide the search for transformations that will, for example, make a distribution more Normal or a curved relationship more linear.
The log transformation log transformation
The most important transformation that we will use is the log transformation. This transformation can be used for variables that have positive values only. Occasionally, we use it when there are zeros, but in this case we first replace the zero values by some small value, often one-half of the smallest positive value in the data set.
CHAPTER 2
•
Looking at Data—Relationships You have probably encountered logarithms in one of your high school mathematics courses as a way to do certain kinds of arithmetic. Logarithms are a lot more fun when used in statistical analyses. We will use natural logarithms. Statistical software and statistical calculators generally provide easy ways to perform this transformation. Let’s try a log transformation on our calcium retention data. Here are the details.
EXAMPLE DATA CALCIUM
2.13 Calcium retention with logarithms. Figure 2.6 is a scatterplot of the log of calcium retention versus calcium intake. The plot includes a fitted straight line to help us see the relationship. We see that the transformation has worked. Our relationship is now approximately linear. 7
CHALLENGE
Log calcium retention
94
6
5
4 600
800
1000 1200 1400 1600 Calcium intake (mg/d)
1800
2000
FIGURE 2.6 Scatterplot of log calcium retention versus calcium intake, with a fitted line, for 56 children, for Example 2.13. The relationship is approximately linear.
Our analysis of the calcium retention data in Examples 2.12 and 2.13 reminds us of an important issue when describing relationships. In Example 2.12 we noted that the relationship appeared to become approximately flat. Biological processes are consistent with this observation. There is probably a point where additional intake does not result in any additional retention. With our transformed relationship in Figure 2.6, however, there is no leveling off as we saw in Figure 2.5, even though we appear to have a good fit to the data. The relationship and fit apply to the range of data that are analyzed. We cannot assume that the relationship extends beyond the range of the data. Use of transformations and the interpretation of scatterplots are an art that requires judgment and knowledge about the variables that we are studying. Always ask yourself if the relationship that you see makes sense. If it does not, then additional analyses are needed to understand the data.
Adding categorical variables to scatterplots In Example 2.9 (page 88) we looked at the relationship between the rating and the price per load for 24 laundry detergents. A more detailed look at the data
2.2 Scatterplots
95
shows that there are three different types of laundry detergent included in this data set. In Exercise 2.12 we saw that two of the detergents were gels. The other two types are liquid and powder. Let’s examine where these three types of laundry detergents are in our plot.
CATEGORICAL VARIABLES IN SCATTERPLOTS To add a categorical variable to a scatterplot, use a different plot color or symbol for each category.
EXAMPLE plot, we use the symbol “G” for gels, “L” for liquids, and “P” for powders. The scatterplot with these plotting symbols is given in Figure 2.7. The two gels appear in the middle of the plot as a single point because the ratings and prices are identical. There is a tendency for the liquids to be clustered in the upper right of the plot, with high ratings and high prices. In contrast, the powders tend to be in the left, with low ratings and low prices.
CHALLENGE
In this example, we used a categorical variable, type, to distinguish the three types of laundry detergents in our plot. Suppose that the additional variable that we want to investigate is quantitative. In this situation, we sometimes can combine the values into ranges of the quantitative variable, such as high, medium, and low, to create a categorical variable. Careful judgment is needed in using this graphical method. Don’t be discouraged if your first attempt is not very successful. In performing a good data analysis, you will often produce several plots before you find the one that you believe to be the most effective in describing the data.8
70 60
L
L L
L
Rating
DATA LAUNDRY
2.14 Rating versus price and type of laundry detergent. In our scatter-
50
P
P
P PL
G
L P
L
P
40 PP P
30
P P
P P P
P
20 0
10 20 Price per load (cents)
30
FIGURE 2.7 Scatterplot of rating versus price per load (in cents), with a fitted straight line, for 24 laundry detergents, for Example 2.14. The type of detergent is indicated by the plotting symbol; “G” for gel, “L” for liquid, and “P” for powder.
CHAPTER 2
•
Looking at Data—Relationships USE YOUR KNOWLEDGE DATA
2.17 Is a linear relationship the best description? Look carefully at the plot in Figure 2.7.
LAUNDRY
(a) Do you think that the linear relationship we found between rating and price is mostly due to the difference between liquid and powder detergents? Explain your answer.
CHALLENGE
(b) In describing the laundry detergent data would you say that (i) there is a linear relationship between rating and price or (ii) powders cost less and have lower ratings; liquids cost more and have higher ratings; and gels are somewhere in the middle? Give reasons for your answer.
BEYOND THE BASICS
Scatterplot smoothers
algorithms smoothing
The relationship in Figure 2.4 (page 92) appears to be linear. Some statistical software packages provide a tool to help us make this kind of judgment. These use computer-intensive methods called algorithms that calculate a smooth curve that gives an approximate fit to the points in a scatterplot. This is called smoothing a scatterplot. Usually, these methods use a smoothing parameter that determines how smooth the fit will be. You can vary it until you have a fit that you judge suitable for your data. Here is an example.
EXAMPLE DATA DEBT
2.15 Debt for 33 countries with a smooth fit. Figure 2.8 gives the scatterplot that we examined in Figure 2.4 with a smooth fit. Notice that the smooth curve fits almost all the points. However, the curve is too wavy and does not provide a good summary of the relationship. 160 140
CHALLENGE
Debt, 2010 (% GDP)
96
120 100 80 60 40 20 0 0
20
40
60 80 100 Debt, 2009 (% GDP)
120
140
FIGURE 2.8 Scatterplot of debt in 2010 (percent of GDP) versus debt in 2009 (percent of GDP), with a smooth curve fitted to the data, for 33 countries, for Example 2.15. This smooth curve fits the data too well and does not provide a good summary of the relationship.
2.2 Scatterplots
97
160 Debt, 2010 (% GDP)
140 120 100 80 60 40 20 0 0
20
40
60 80 100 Debt, 2009 (% GDP)
120
140
FIGURE 2.9 Scatterplot of debt in 2010 (percent of GDP) versus debt in 2009 (percent of GDP), with a smooth curve fitted to the data, for 33 countries, for Example 2.16. This smooth curve gives a good summary of the relationship. It is approximately linear.
Our first attempt at smoothing the data was not very successful. This scenario happens frequently when we use data analysis methods to learn something from our data. Don’t be discouraged when your first attempt at summarizing data produces unsatisfactory results. Take what you learn and refine your analysis until you are satisfied that you have found a good summary. It is your last attempt, not your first, that is most important.
EXAMPLE DATA
2.16 A better smooth fit for the debt data. By varying the smoothing DEBT
parameter, we can make the curve more or less smooth. Figure 2.9 gives the same data as in the previous figure but with a better smooth fit. The smooth curve is very close to a straight line. In this way we have confirmed our original impression that the relationship between these two variables is approximately linear.
Categorical explanatory variables CHALLENGE
Scatterplots display the association between two quantitative variables. To display a relationship between a categorical variable and a quantitative variable, make a side-by-side comparison of the distributions of the response for each category. Back-to-back stemplots (page 14) and side-by-side boxplots (page 41) are useful tools for this purpose. We will study methods for describing the association between two categorical variables in Section 2.6 (page 139).
SECTION 2.2 Summary A scatterplot displays the relationship between two quantitative variables. Mark values of one variable on the horizontal axis (x axis) and values of the other variable on the vertical axis (y axis). Plot each individual’s data as a point on the graph.
98
CHAPTER 2
•
Looking at Data—Relationships Always plot the explanatory variable, if there is one, on the x axis of a scatterplot. Plot the response variable on the y axis. Plot points with different colors or symbols to see the effect of a categorical variable in a scatterplot. In examining a scatterplot, look for an overall pattern showing the form, direction, and strength of the relationship, and then for outliers or other deviations from this pattern. Form: Linear relationships, where the points show a straight-line pattern, are an important form of relationship between two variables. Curved relationships are other forms to watch for. Direction: If the relationship has a clear direction, we speak of either positive association (high values of the two variables tend to occur together) or negative association (high values of one variable tend to occur with low values of the other variable). Strength: The strength of a relationship is determined by how close the points in the scatterplot lie to a simple form such as a line. To display the relationship between a categorical explanatory variable and a quantitative response variable, make a graph that compares the distributions of the response for each category of the explanatory variable.
SECTION 2.2 Exercises For Exercises 2.10 and 2.11, see page 88; for Exercises 2.12 and 2.13, see page 89; for Exercise 2.14, see page 92; for Exercises 2.15 and 2.16, see page 92; and for Exercise 2.17, see page 96. 2.18 Bone strength. Osteoporosis is a condition where bones become weak. It affects more than 200 million people worldwide. Exercise is one way to produce strong bones and to prevent osteoporosis. Since we use our dominant arm (the right arm for most people) more than our nondominant arm, we expect the bone in our dominant arm to be stronger than the bone in our nondominant arm. By comparing the strengths, we can get an idea of the effect that exercise can have on bone strength. Here are some data on the strength of bones, measured in cm4/1000, for the arms of 15 young men:9 ARMSTR
ID Nondominant 1 2 3 4 5 6 7 8
15.7 25.2 17.9 19.1 12.0 20.0 12.3 14.4
Dominant 16.3 26.9 18.7 22.0 14.8 19.8 13.1 17.5
ID Nondominant Dominant 9 10 11 12 13 14 15
15.9 13.7 17.7 15.5 14.4 14.1 12.3
20.1 18.7 18.7 15.2 16.2 15.0 12.9
Before attempting to compare the arm strengths of the dominant and nondominant arms, let’s take a careful look at the data for these two variables. (a) Make a scatterplot of the data with the nondominant arm strength on the x axis and the dominant arm strength on the y axis. (b) Describe the overall pattern in the scatterplot and any striking deviations from the pattern. (c) Describe the form, direction, and strength of the relationship. (d) Identify any outliers. (e) Is the relationship approximately linear? 2.19 Bone strength for baseball players. Refer to the previous exercise. The study collected arm bone strength information for two groups of young men. The data in the previous exercise were for a control group. The second group in the study comprised men who played baseball. We know that these baseball players use their dominant arm in throwing (those who throw with their nondominant arm were excluded), so they get more arm exercise than the controls. Here are the data for the baseball players: ARMSTR
2.2 Scatterplots
ID Nondominant 16 17 18 19 20 21 22 23
17.0 16.9 17.7 21.2 21.0 14.6 31.5 14.9
Dominant 19.3 19.0 25.2 37.7 40.3 20.8 36.9 21.2
ID Nondominant Dominant 24 25 26 27 28 29 30
15.1 13.5 13.6 20.3 17.3 14.6 22.6
19.4 20.4 17.1 26.5 30.3 17.4 35.0
(a) Make a scatterplot of the data. Give reasons for the choice of which variables to use on the x and y axes. (b) Describe the overall pattern in the scatterplot and any striking deviations from the pattern. (c) Describe the form, direction, and strength of the relationship. (d) Identify any outliers. (e) Is the relationship approximately linear?
Answer the questions in the previous exercise for the baseball players. 2.20 Compare the baseball players with the controls. Refer to the previous two exercises. ARMSTR
2.23 Use a log for the radioactive decay. Refer to the previous exercise. Transform the counts using a log transformation. Then repeat parts (a) through (e) for the transformed data and compare your results with those DECAY from the previous exercise.
(a) Plot the data for the two groups on the same graph using different symbols for the baseball players and the controls.
2.24 Make some sketches. For each of the following situations, make a scatterplot that illustrates the given relationship between two variables.
(b) Use your plot to describe and compare the relationships for the two variables. Write a short paragraph summarizing what you have found.
(a) A weak negative relationship.
2.21 College students by state. In Example 1.19 (page 21) we examined the distribution of undergraduate college students in the United States and displayed the histogram for these data in Figure 1.11. We noted that we could explain some of the variation in this distribution by considering the populations of the states. In Example 1.20, we transformed the number of undergraduate college students into the number of undergraduates per 1000 population. Let’s look at these data a little differently. Let’s examine the relationship between two variables: number of college students and population COLLEGE of the state. (a) Which variable do you choose to be the explanatory variable? Which variable do you choose to be the response variable? Give reasons for your choices. (b) Make a scatterplot of the two variables and write a short paragraph describing the relationship. 2.22 Decay of a radioactive element. Barium-137m is a radioactive form of the element barium that decays very rapidly. It is easy and safe to use for lab experiments in schools and colleges.10 In a typical experiment, the radioactivity of a sample of barium-137m is measured for one minute. It is then measured for three additional oneminute periods, separated by two minutes. So data are recorded at 1, 3, 5, and 7 minutes after the start of the first counting period. The measurement units are counts. Here are the data for one of these experiments:11 DECAY Time Count
1 578
99
3 317
5 203
7 118
(b) No apparent relationship. (c) A strong positive linear relationship. (d) A more complicated relationship. Explain the relationship. 2.25 What’s wrong? Explain what is wrong with each of the following: (a) If two variables are negatively associated, then high values of one variable are associated with high values of the other variable. (b) In a scatterplot we put the response variable on the x axis and the explanatory variable on the y axis. (c) A histogram can be used to examine the relationship between two variables. 2.26 What’s in the beer? The website beer100.com advertises itself as “Your Place for All Things Beer.” One of their “things” is a list of 153 domestic beer brands with the percent alcohol, calories per 12 ounces, and carbohydrates per 12 ounces (in grams).12 BEER (a) Figure 2.10 gives a scatterplot of carbohydrates versus percent alcohol. Give a short summary of what can be learned from the plot. (b) One of the points is an outlier. Use the data file to find the outlier brand of beer. How is this brand of beer marketed compared with the other brands? (c) Remove the outlier from the data set and generate a scatterplot of the remaining data. (d) Describe the relationship between carbohydrates and percent alcohol based on what you see in your scatterplot.
100
CHAPTER 2
•
Looking at Data—Relationships 2.29 Try a log. Refer to the previous exercise.
Carbohydrates (g)
40
INBIRTH
(a) Make a scatterplot of the log of births per 1000 people versus Internet users per 100 people.
30 20
(b) Describe the relationship that you see in this plot and compare it with Figure 2.11.
10
(c) Which plot do you prefer? Give reasons for your answer. 2.30 Make another plot. Refer to Exercise 2.28.
0 0
1
2
3
4
5 6 7 8 Percent alcohol
9
10
11 12
FIGURE 2.10 Scatterplot of carbohydrates versus percent
(c) Make a scatterplot using the transformed variables.
BEER
(a) Make a scatterplot of calories versus percent alcohol using the data set without the outlier.
(d) Compare your new plot with the one in Figure 2.11.
(b) Describe the relationship between these two variables. 2.28 Internet use and babies. The World Bank collects data on many variables related to world development for countries throughout the world. Two of these are Internet use, in number of users per 100 people, and birthrate, in births per 1000 people.13 Figure 2.11 is a scatterplot of birthrate versus Internet use for the 106 countries that have data available for both variables. INBIRTH (a) Describe the relationship between these two variables. (b) A friend looks at this plot and concludes that using the Internet will decrease the number of babies born. Write a short paragraph explaining why the association seen in the scatterplot does not provide a reason to draw this conclusion. 50 Births per 1000 people
(a) Make a new data set that has Internet users expressed as users per 10,000 people and births as births per 10,000 people. (b) Explain why these transformations to give new variables are linear transformations. (Hint: See linear transformations on page 45.)
alcohol for 153 brands of beer, for Exercise 2.26. 2.27 More beer. Refer to the previous exercise.
(e) Why do you think that the analysts at the World Bank chose to express births as births per 1000 people and Internet users as users per 100 people? 2.31 Explanatory and response variables. In each of the following situations, is it more reasonable to simply explore the relationship between the two variables or to view one of the variables as an explanatory variable and the other as a response variable? In the latter case, which is the explanatory variable and which is the response variable? (a) The reading ability of a child and the shoe size of the child. (b) College grade point average and high school grade point average. (c) The rental price of an apartment and the number of square feet in the apartment.
40
(d) The amount of sugar added to a cup of coffee and how sweet the coffee tastes.
30
(e) The temperature outside today at noon and the temperature outside yesterday at noon.
20 10 0 0
10
20
30 40 50 60 70 Internet users per 100 people
80
FIGURE 2.11 Scatterplot of births (per 1000 people) versus Internet users (per 100 people) for 106 countries, for Exercise 2.28.
INBIRTH
90
2.32 Parents’ income and student loans. How well does the income of a college student’s parents predict how much the student will borrow to pay for college? We have data on parents’ income and college debt for a sample of 1200 recent college graduates. What are the explanatory and response variables? Are these variables categorical or quantitative? Do you expect a positive or negative association between these variables? Why? 2.33 Reading ability and IQ. A study of reading ability in schoolchildren chose 60 fifth-grade children at random from a school. The researchers had the children’s scores
Child’s self-estimate of reading ability
2.2 Scatterplots
Child's reading test score
100 80 60 40 20 0 70
80
90
5
4
3 2 1
100 110 120 130 140 150 Child's IQ score
FIGURE 2.12 IQ and reading test scores for 60 fifth-grade children, for Exercise 2.33.
on an IQ test and on a test of reading ability.14 Figure 2.12 plots reading test score (response) against IQ score (explanatory). (a) Explain why we should expect a positive association between IQ and reading score for children in the same grade. Does the scatterplot show a positive association? (b) A group of four points appear to be outliers. In what way do these children’s IQ and reading scores deviate from the overall pattern? (c) Ignoring the outliers, is the association between IQ and reading score roughly linear? Is it very strong? Explain your answers. 2.34 Can children estimate their reading ability? The main purpose of the study cited in Exercise 2.33 was to ask whether schoolchildren can estimate their own reading ability. The researchers had the children’s scores on a test of reading ability. They asked each child to estimate his or her reading level, on a scale from 1 (low) to 5 (high). Figure 2.13 is a scatterplot of the children’s estimates (response) against their reading scores (explanatory). (a) What explains the “stair-step” pattern in the plot? (b) Is there an overall positive association between reading score and self-estimate? (c) There is one clear outlier. What is this child’s selfestimated reading level? Does this appear to over- or underestimate the level as measured by the test? 2.35 Body mass and metabolic rate. Metabolic rate, the rate at which the body consumes energy, is important in studies of weight gain, dieting, and exercise. The
101
0
20 40 60 80 100 Child’s score on a test of reading ability
FIGURE 2.13 Reading test scores for 60 fifth-grade children and the children’s estimates of their own reading levels, for Exercise 2.34. following table gives data on the lean body mass and resting metabolic rate for 12 women and 7 men who are subjects in a study of dieting. Lean body mass, given in kilograms, is a person’s weight leaving out all fat. Metabolic rate is measured in calories burned per 24 hours, the same calories used to describe the energy content of foods. The researchers believe that lean body mass is an important influence on metabolic rate. BMASS Subject
Sex
Mass
Rate
Subject
1
M
62.0
1792
11
2
M
62.9
1666
12
3
F
36.1
995
13
4
F
54.6
1425
5
F
48.5
6
F
42.0
7
M
8
F
Mass
Rate
F
40.3
1189
F
33.1
913
M
51.9
1460
14
F
42.4
1124
1396
15
F
34.5
1052
1418
16
F
51.1
1347
47.4
1362
17
F
41.2
1204
50.6
1502
18
M
51.9
1867
19
M
46.9
1439
9
F
42.0
1256
10
M
48.7
1614
Sex
(a) Make a scatterplot of the data, using different symbols or colors for men and women. (b) Is the association between these variables positive or negative? What is the form of the relationship? How strong is the relationship? Does the pattern of the relationship differ for women and men? How do the male subjects as a group differ from the female subjects as a group?
102
CHAPTER 2
•
Looking at Data—Relationships
2.36 Team value in the NFL. Management theory says that the value of a business should depend on its operating income, the income produced by the business after taxes. (Operating income excludes income from sales of assets and investments, which don’t reflect the actual business.) Total revenue, which ignores costs, should be less important. Debt includes borrowing for the construction of a new arena. The data file NFL gives the value (in millions of dollars), debt (as percent of value), revenue (in millions of dollars), and operating income (in millions of dollars) of the 32 teams in the National Football League (NFL).15 NFL (a) Plot team value against revenue. Describe the relationship. (b) Plot team value against debt. Describe the relationship. (c) Plot team value against operating income. Describe the relationship.
TABLE 2.1
(d) Write a short summary comparing the relationships that you described in parts (a), (b), and (c) of this exercise. 2.37 Records for men and women in the 10K. Table 2.1 shows the progress of world record times (in seconds) for the 10,000-meter run for both men and women.16 TENK (a) Make a scatterplot of world record time against year, using separate symbols for men and women. Describe the pattern for each sex. Then compare the progress of men and women. (b) Women began running this long distance later than men, so we might expect their improvement to be more rapid. Moreover, it is often said that men have little advantage over women in distance running as opposed to sprints, where muscular strength plays a greater role. Do the data appear to support these claims?
World Record Times for the 10,000-Meter Run Men
Record year
Women
Time (seconds)
Record year
Time (seconds)
Record year
Time (seconds)
1912
1880.8
1963
1695.6
1967
2286.4
1921
1840.2
1965
1659.3
1970
2130.5
1924
1835.4
1972
1658.4
1975
2100.4
1924
1823.2
1973
1650.8
1975
2041.4
1924
1806.2
1977
1650.5
1977
1995.1
1937
1805.6
1978
1642.4
1979
1972.5
1938
1802.0
1984
1633.8
1981
1950.8
1939
1792.6
1989
1628.2
1981
1937.2
1944
1775.4
1993
1627.9
1982
1895.3
1949
1768.2
1993
1618.4
1983
1895.0
1949
1767.2
1994
1612.2
1983
1887.6
1949
1761.2
1995
1603.5
1984
1873.8
1950
1742.6
1996
1598.1
1985
1859.4
1953
1741.6
1997
1591.3
1986
1813.7
1954
1734.2
1997
1587.8
1993
1771.8
1956
1722.8
1998
1582.7
1956
1710.4
2004
1580.3
1960
1698.8
2005
1577.3
1962
1698.2
2.3 Correlation
103
2.3 Correlation When you complete this section, you will be able to • Use a correlation to describe the direction and strength of a linear relationship between two quantitative variables. • Interpret the sign of a correlation. • Identify situations where the correlation is not a good measure of association between two quantitative variables. • Identify a linear pattern in a scatterplot. • For describing the relationship between two quantitative variables, identify the roles of the correlation, a numerical summary, and the scatterplot (a graphical summary).
A scatterplot displays the form, direction, and strength of the relationship between two quantitative variables. Linear (straight-line) relations are particularly important because a straight line is a simple pattern that is quite common. We say a linear relationship is strong if the points lie close to a straight line, and weak if they are widely scattered about a line. Our eyes are not good judges of how strong a relationship is. The two scatterplots in Figure 2.14 depict exactly the same data, but the plot on the right is drawn smaller in a large field. The plot on the right seems to show a stronger relationship. Our eyes can be fooled by changing the plotting scales or the amount of white space around the cloud of points in a scatterplot.17 We need to follow our strategy for data analysis by using a numerical measure to supplement the graph. Correlation is the measure we use.
The correlation r We have data on variables x and y for n individuals. Think, for example, of measuring height and weight for n people. Then x1 and y1 are your height and your weight, x2 and y2 are my height and my weight, and so on. For the ith individual, height xi goes with weight yi. Here is the definition of correlation.
160
250
140
200
120
150
y 100
y 100
80
50
60 40
0 60
80
100 x
120
140
0
50
100
x
150
200
250
FIGURE 2.14 Two scatterplots of the same data. The linear pattern in the plot on the right appears stronger because of the surrounding space.
104
CHAPTER 2
•
Looking at Data—Relationships
CORRELATION The correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r. Suppose that we have data on variables x and y for n individuals. The means and standard deviations of the two variables are x and sx for the x-values, and y and sy for the y-values. The correlation r between x and y is r⫽
xi ⫺ x yi ⫺ y 1 a ba b a sx sy n⫺1
As always, the summation sign g means “add these terms for all the individuals.” The formula for the correlation r is a bit complex. It helps us see what correlation is but is not convenient for actually calculating r. In practice you should use software or a calculator that finds r from keyed-in values of two variables x and y. The formula for r begins by standardizing the observations. Suppose, for example, that x is height in centimeters and y is weight in kilograms and that we have height and weight measurements for n people. Then x and sx are the mean and standard deviation of the n heights, both in centimeters. The value xi ⫺ x sx is the standardized height of the ith person. The standardized height says how many standard deviations above or below the mean a person’s height lies. Standardized values have no units—in this example, they are no longer measured in centimeters. Standardize the weights also. The correlation r is an average of the products of the standardized height and the standardized weight for the n people. USE YOUR KNOWLEDGE DATA LAUNDRY
DATA LAUNDRY
2.38 Laundry detergents. Example 2.8 describes data on the rating and price per load for 24 laundry detergents. Use these data to compute the correlation between rating and the price per load. 2.39 Change the units. Refer to the previous exercise. Express the price per load in dollars. (a) Is the transformation from cents to dollars a linear transformation? Explain your answer.
CHALLENGE
(b) Compute the correlation between rating and price per load expressed in dollars. (c) How does the correlation that you computed in part (b) compare with the one you computed in the previous exercise?
CHALLENGE
(d) What can you say in general about the effect of changing units using linear transformations on the size of the correlation?
2.3 Correlation
105
Properties of correlation The formula for correlation helps us see that r is positive when there is a positive association between the variables. Height and weight, for example, have a positive association. People who are above average in height tend to also be above average in weight. Both the standardized height and the standardized weight for such a person are positive. People who are below average in height tend also to have below-average weight. Then both standardized height and standardized weight are negative. In both cases, the products in the formula for r are mostly positive and so r is positive. In the same way, we can see that r is negative when the association between x and y is negative. More detailed study of the formula gives more detailed properties of r. Here is what you need to know in order to interpret correlation: • Correlation makes no use of the distinction between explanatory and response variables. It makes no difference which variable you call x and which you call y in calculating the correlation. • Correlation requires that both variables be quantitative. For example, we cannot calculate a correlation between the incomes of a group of people and what city they live in, because city is a categorical variable. • Because r uses the standardized values of the observations, r does not change when we change the units of measurement (a linear transformation) of x, y, or both. Measuring height in inches rather than centimeters and weight in pounds rather than kilograms does not change the correlation between height and weight. The correlation r itself has no unit of measurement; it is just a number. • Positive r indicates positive association between the variables, and negative r indicates negative association. • The correlation r is always a number between ⫺1 and 1. Values of r near 0 indicate a very weak linear relationship. The strength of the relationship increases as r moves away from 0 toward either ⫺1 or 1. Values of r close to ⫺1 or 1 indicate that the points lie close to a straight line. The extreme values r ⫽ ⫺1 and r ⫽ 1 occur only when the points in a scatterplot lie exactly along a straight line. • Correlation measures the strength of only the linear relationship between two variables. Correlation does not describe curved relationships between variables, no matter how strong they are. CHALLENGE
• Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot. The scatterplots in Figure 2.15 illustrate how values of r closer to 1 or ⫺1 correspond to stronger linear relationships. To make the essential meaning of r clear, the standard deviations of both variables in these plots are equal and the horizontal and vertical scales are the same. In general, it is not so easy to guess the value of r from the appearance of a scatterplot. Remember that changing the plotting scales in a scatterplot may mislead our eyes, but it does not change the standardized values of the variables and therefore cannot change the correlation. To explore how extreme observations can influence r, use the Correlation and Regression applet available on the text website.
106
CHAPTER 2
•
Looking at Data—Relationships
FIGURE 2.15 How the correlation r measures the direction and strength of a linear association.
Correlation r = 0
Correlation r = –0.3
Correlation r = 0.5
Correlation r = –0.7
Correlation r = 0.9
Correlation r = –0.99
Finally, remember that correlation is not a complete description of twovariable data, even when the relationship between the variables is linear. You should give the means and standard deviations of both x and y along with the correlation. (Because the formula for correlation uses the means and standard deviations, these measures are the proper choices to accompany a correlation.) Conclusions based on correlations alone may require rethinking in the light of a more complete description of the data.
EXAMPLE 2.17 Scoring of figure skating in the Olympics. Until a scandal at the 2002 Olympics brought change, figure skating was scored by judges on a scale from 0.0 to 6.0. The scores were often controversial. We have the scores awarded by two judges, Pierre and Elena, to many skaters. How well do they agree? We calculate that the correlation between their scores is r ⫽ 0.9. But the mean of Pierre’s scores is 0.8 point lower than Elena’s mean.
2.3 Correlation
107
These facts in the example above do not contradict each other. They are simply different kinds of information. The mean scores show that Pierre awards lower scores than Elena. But because Pierre gives every skater a score about 0.8 point lower than Elena, the correlation remains high. Adding the same number to all values of either x or y does not change the correlation. If both judges score the same skaters, the competition is scored consistently because Pierre and Elena agree on which performances are better than others. The high r shows their agreement. But if Pierre scores some skaters and Elena others, we must add 0.8 point to Pierre’s scores to arrive at a fair comparison.
SECTION 2.3 Summary The correlation r measures the direction and strength of the linear (straight line) association between two quantitative variables x and y. Although you can calculate a correlation for any scatterplot, r measures only linear relationships. Correlation indicates the direction of a linear relationship by its sign: r ⬎ 0 for a positive association and r ⬍ 0 for a negative association. Correlation always satisfies ⫺1 ⱕ r ⱕ 1 and indicates the strength of a relationship by how close it is to ⫺1 or 1. Perfect correlation, r ⫽ ;1, occurs only when the points lie exactly on a straight line. Correlation ignores the distinction between explanatory and response variables. The value of r is not affected by changes in the unit of measurement of either variable. Correlation is not resistant, so outliers can greatly change the value of r.
SECTION 2.3 Exercises For Exercises 2.38 and 2.39, see page 104. 2.40 Correlations and scatterplots. Explain why you should always look at a scatterplot when you want to use a correlation to describe the relationship between two quantitative variables. 2.41 Interpret some correlations. For each of the following correlations, describe the relationship between the two quantitative variables in terms of the direction and the strength of the linear relationship. (a) r ⫽ 0.0 (b) r ⫽ ⫺0.9 (c) r ⫽ 0.3 (d) r ⫽ 0.8 2.42 When should you not use a correlation? Describe two situations where a correlation would not give a good numerical summary of the relationship between two quantitative variables. Illustrate each situation with a scatterplot and write a short paragraph explaining why the correlation would not be appropriate in each of these situations. 2.43 Bone strength. Exercise 2.18 (page 98) gives the bone strengths of the dominant and the nondominant arms for 15 men who were controls in a study. ARMSTR
(a) Find the correlation between the bone strength of the dominant arm and the bone strength of the nondominant arm. (b) Look at the scatterplot for these data that you made in part (a) of Exercise 2.18 (or make one if you did not do that exercise). Is the correlation a good numerical summary of the graphical display in the scatterplot? Explain your answer. 2.44 Bone strength for baseball players. Refer to the previous exercise. Similar data for baseball players is given in Exercise 2.19 (page 98). Answer parts (a) and (b) of the previous exercise for these data. ARMSTR 2.45 College students by state. In Exercise 2.21 (page 99) you used a scatterplot to display the relationship between the number of undergraduates and the populations of the states. COLLEGE, COL46 (a) What is the correlation between these two variables? (b) Does the correlation give a good numerical summary of the relationship between these two variables? Explain your answer. (c) Eliminate the four states with populations greater than 15 million and find the correlation for the other 46 states. How does this correlation differ from the one that
108
CHAPTER 2
•
Looking at Data—Relationships
you found in part (a)? What does this tell you about how the range of the values of the variables in a data set can affect the magnitude of a correlation? 2.46 Decay of a radioactive element. Data for an experiment on the decay of barium-137m is given in Exercise 2.22 (page 99). DECAY (a) Find the correlation between the radioactive counts and the time after the start of the first counting period. (b) Does the correlation give a good numerical summary of the relationship between these two variables? Explain your answer. 2.47 Decay in the log scale. Refer to the previous exercise and to Exercise 2.23 (page 99), where the counts were transformed by a log. DECAY (a) Find the correlation between the log counts and the time after the start of the first counting period. (b) Does the correlation give a good numerical summary of the relationship between these two variables? Explain your answer. (c) Compare your results for this exercise with those from the previous exercise. 2.48 Thinking about correlation. Figure 2.9 (page 97) is a scatterplot of 2010 debt versus 2009 debt for 33 countries. Is the correlation r for these data near ⫺1, clearly negative but not near ⫺1, near 0, clearly positive but not near 1, or near 1? Explain your answer. Verify your answer by doing the calculation. DEBT 2.49 Brand names and generic products. (a) If a store always prices its generic “store brand” products at 80% of the brand name products’ prices, what would be the correlation between the prices of the brand name products and the store brand products? (Hint: Draw a scatterplot for several prices.)
(d) What important point about correlation does this exercise illustrate? 2.51 Alcohol and carbohydrates in beer. Figure 2.10 (page 100) gives a scatterplot of the percent alcohol versus carbohydrates in 153 brands of beer. Compute BEER the correlation for these data. 2.52 Alcohol and carbohydrates in beer revisited. Refer to the previous exercise. The data that you used to compute the correlation includes an outlier. BEER (a) Remove the outlier and recompute the correlation. (b) Write a short paragraph about the possible effects of outliers on a correlation using this example to illustrate your ideas. 2.53 Internet use and babies. Figure 2.11 (page 100) is a scatterplot of the number of births per 1000 people versus Internet users per 100 people for 106 countries. In Exercise 2.28 (page 100) you described this INBIRTH relationship. (a) Make a plot of the data similar to Figure 2.11 and report the correlation. (b) Is the correlation a good numerical summary for this relationship? Explain your answer. 2.54 NFL teams. In Exercise 2.36 (page 102) you used graphical summaries to examine the relationship between team value and three possible explanatory variables for 32 National Football League teams. Find the correlations for these variables. Do you think that these correlations provide good numerical summaries for the relationships? NFL Explain your answers. 2.55 Use the applet. You are going to use the Correlation and Regression applet to make different scatterplots with 10 points that have correlation close to 0.8. Many patterns can have the same correlation. Always plot your data before you trust a correlation.
(b) If the store always prices its generic products $2 less than the corresponding brand name products, then what would be the correlation between the prices of the brand name products and the store brand products?
(a) Stop after adding the first 2 points. What is the value of the correlation? Why does it have this value no matter where the 2 points are located?
2.50 Strong association but no correlation. Here is a data set that illustrates an important point about correlation: CORR
(b) Make a lower-left to upper-right pattern of 10 points with correlation about r ⫽ 0.8. (You can drag points up or down to adjust r after you have 10 points.) Make a rough sketch of your scatterplot.
X
25
35
45
55
65
Y
10
30
50
30
10
(a) Make a scatterplot of Y versus X. (b) Describe the relationship between Y and X. Is it weak or strong? Is it linear? (c) Find the correlation between Y and X.
(c) Make another scatterplot, this time with 9 points in a vertical stack at the left of the plot. Add one point far to the right and move it until the correlation is close to 0.8. Make a rough sketch of your scatterplot. (d) Make yet another scatterplot, this time with 10 points in a curved pattern that starts at the lower left, rises to the right, then falls again at the far right. Adjust the points up or down until you have a quite smooth curve
2.4 Least-Squares Regression with correlation close to 0.8. Make a rough sketch of this scatterplot also. 2.56 Use the applet. Go to the Correlation and Regression applet. Click on the scatterplot to create a group of 10 points in the lower-right corner of the scatterplot with a strong straight-line negative pattern (correlation about ⫺0.9). (a) Add one point at the upper left that is in line with the first 10. How does the correlation change? (b) Drag this last point down until it is opposite the group of 10 points. How small can you make the correlation? Can you make the correlation positive? A single outlier can greatly strengthen or weaken a correlation. Always plot your data to check for outlying points. 2.57 An interesting set of data. Make a scatterplot of the following data: INTER x
1
2
3
4
10
10
y
1
3
3
5
1
11
Use your calculator to show that the correlation is about 0.5. What feature of the data is responsible for reducing the correlation to this value despite a strong straightline association between x and y in most of the observations? 2.58 High correlation does not mean that the values are the same. Investment reports often include correlations. Following a table of correlations among mutual funds, a report adds, “Two funds can have perfect correlation, yet different levels of risk. For example, Fund A and Fund B may be perfectly correlated, yet Fund A moves 20% whenever Fund B moves 10%.” Write a brief
109
explanation, for someone who knows no statistics, of how this can happen. Include a sketch to illustrate your explanation. 2.59 Student ratings of teachers. A college newspaper interviews a psychologist about student ratings of the teaching of faculty members. The psychologist says, “The evidence indicates that the correlation between the research productivity and teaching rating of faculty members is close to zero.” The paper reports this as “Professor McDaniel said that good researchers tend to be poor teachers, and vice versa.” Explain why the paper’s report is wrong. Write a statement in plain language (don’t use the word “correlation”) to explain the psychologist’s meaning. 2.60 What’s wrong? Each of the following statements contains a blunder. Explain in each case what is wrong. (a) “There is a high correlation between the age of American workers and their occupation.” (b) “We found a high correlation (r ⫽ 1.19) between students’ ratings of faculty teaching and ratings made by other faculty members.” (c) “The correlation between the gender of a group of students and the color of their cell phone was r ⫽ 0.23.” 2.61 IQ and GPA. Table 1.3 (page 29) reports data on 78 seventh-grade students. We expect a positive association between IQ and GPA. Moreover, some people think that self-concept is related to school performance. Examine in detail the relationships between GPA and the two explanatory variables IQ and self-concept. Are the relationships roughly linear? How strong are they? Are there unusual points? What is the effect of removing these points? SEVENGR
2.4 Least-Squares Regression When you complete this section, you will be able to • Draw a straight line on a scatterplot of a set of data, given the equation of the line. • Predict a value of the response variable y for a given value of the explanatory variable x using a regression equation. • Explain the meaning of the term “least squares.” • Calculate the equation of a least-squares regression line from the means and standard deviations of the explanatory and response variables and their correlation. • Read the output of statistical software to find the equation of the leastsquares regression line and the value of r 2. • Explain the meaning of r 2 in the regression setting.
110
CHAPTER 2
•
Looking at Data—Relationships Correlation measures the direction and strength of the linear (straight-line) relationship between two quantitative variables. If a scatterplot shows a linear relationship, we would like to summarize this overall pattern by drawing a line on the scatterplot. A regression line summarizes the relationship between two variables, but only in a specific setting: when one of the variables helps explain or predict the other. That is, regression describes a relationship between an explanatory variable and a response variable.
REGRESSION LINE A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. Regression, unlike correlation, requires that we have an explanatory variable and a response variable.
EXAMPLE DATA FIDGET
2.18 Fidgeting and fat gain. Does fidgeting keep you slim? Some people don’t gain weight even when they overeat. Perhaps fidgeting and other “nonexercise activity” (NEA) explains why—the body might spontaneously increase nonexercise activity when fed more. Researchers deliberately overfed 16 healthy young adults for 8 weeks. They measured fat gain (in kilograms) and, as an explanatory variable, increase in energy use (in calories) from activity other than deliberate exercise—fidgeting, daily living, and the like. Here are the data:18
CHALLENGE
⫺94
⫺57
⫺29
135
143
151
245
355
Fat gain (kg)
4.2
3.0
3.7
2.7
3.2
3.6
2.4
1.3
NEA increase (cal)
392
473
486
535
571
580
620
690
Fat gain (kg)
3.8
1.7
1.6
2.2
1.0
0.4
2.3
1.1
NEA increase (cal)
Figure 2.16 is a scatterplot of these data. The plot shows a moderately strong negative linear association with no outliers. The correlation is r ⫽ ⫺0.7786. People with larger increases in nonexercise activity do indeed gain less fat. A line drawn through the points will describe the overall pattern well.
Fitting a line to data fitting a line
When a scatterplot displays a linear pattern, we can describe the overall pattern by drawing a straight line through the points. Of course, no straight line passes exactly through all the points. Fitting a line to data means drawing a line that comes as close as possible to the points. The equation of a line fitted to the data gives a concise description of the relationship between the response variable y and the explanatory variable x. It is the numerical summary that supports the scatterplot, our graphical summary.
2.4 Least-Squares Regression FIGURE 2.16 Fat gain after
6
Fat gain (kilograms)
8 weeks of overeating plotted against the increase in nonexercise activity over the same period, for Example 2.18.
111
4
2
0
–200
0 200 400 600 800 Nonexercise activity (calories)
1000
STRAIGHT LINES Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A straight line relating y to x has an equation of the form y ⫽ b0 ⫹ b1x In this equation, b1 is the slope, the amount by which y changes when x increases by one unit. The number b0 is the intercept, the value of y when x ⫽ 0. In practice, we will use software to obtain values of b0 and b1 for a given set of data.
EXAMPLE 2.19 Regression line for fat gain. Any straight line describing the nonexercise activity data has the form fat gain ⫽ b0 ⫹ 1b1 ⫻ NEA increase2 In Figure 2.17 we have drawn the regression line with the equation fat gain ⫽ 3.505 ⫺ 10.00344 ⫻ NEA increase2 The figure shows that this line fits the data well. The slope b1 ⫽ ⫺0.00344 tells us that fat gained goes down by 0.00344 kilogram for each added calorie of NEA increase. The slope b1 of a line y ⫽ b0 ⫹ b1x is the rate of change in the response y as the explanatory variable x changes. The slope of a regression line is an important numerical description of the relationship between the two variables. For Example 2.19, the intercept, b0 ⫽ 3.505 kilograms. This value is the estimated
112
CHAPTER 2
•
Looking at Data—Relationships
FIGURE 2.17 A regression line
6
Fat gain (kilograms)
fitted to the nonexercise activity data and used to predict fat gain for an NEA increase of 400 calories, for Examples 2.19 and 2.20.
4
2
o
Predicted gain in fat
0
–200
0 200 400 600 800 Nonexercise activity (calories)
1000
fat gain if NEA does not change. When we substitute the value zero for the NEA increase, the regression equation gives 3.505 (the intercept) as the predicted value of the fat gain. USE YOUR KNOWLEDGE 2.62 Plot the line. Make a sketch of the data in Example 2.18 and plot the line fat gain ⫽ 2.505 ⫺ 10.00344 ⫻ NEA increase2 on your sketch. Explain why this line does not give a good fit to the data.
Prediction prediction
We can use a regression line to predict the response y for a specific value of the explanatory variable x.
EXAMPLE 2.20 Prediction for fat gain. Based on the linear pattern, we want to predict the fat gain for an individual whose NEA increases by 400 calories when she overeats. To use the fitted line to predict fat gain, go “up and over” on the graph in Figure 2.17. From 400 calories on the x axis, go up to the fitted line and over to the y axis. The graph shows that the predicted gain in fat is a bit more than 2 kilograms. If we have the equation of the line, it is faster and more accurate to substitute x ⫽ 400 in the equation. The predicted fat gain is fat gain ⫽ 3.505 ⫺ 10.00344 ⫻ 4002 ⫽ 2.13 kilograms The accuracy of predictions from a regression line depends on how much scatter about the line the data show. In Figure 2.17, fat gains for similar increases in NEA show a spread of 1 or 2 kilograms. The regression line summarizes the pattern but gives only roughly accurate predictions.
2.4 Least-Squares Regression
113
USE YOUR KNOWLEDGE 2.63 Predict the fat gain. Use the regression equation in Example 2.19 to predict the fat gain for a person whose NEA increases by 500 calories.
EXAMPLE 2.21 Is this prediction reasonable? Can we predict the fat gain for someone whose nonexercise activity increases by 1500 calories when she overeats? We can certainly substitute 1500 calories into the equation of the line. The prediction is fat gain ⫽ 3.505 ⫺ 10.00344 ⫻ 15002 ⫽ ⫺1.66 kilograms That is, we predict that this individual loses fat when she overeats. This prediction is not trustworthy. Look again at Figure 2.17. An NEA increase of 1500 calories is far outside the range of our data. We can’t say whether increases this large ever occur, or whether the relationship remains linear at such extreme values. Predicting fat gain when NEA increases by 1500 calories extrapolates the relationship beyond what the data show.
EXTRAPOLATION Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate and should be avoided. USE YOUR KNOWLEDGE 2.64 Would you use the regression equation to predict? Consider the following values for NEA increase: ⫺400, 200, 500, 1000. For each, decide whether you would use the regression equation in Example 2.19 to predict fat gain or whether you would be concerned that the prediction would not be trustworthy because of extrapolation. Give reasons for your answers.
Least-squares regression Different people might draw different lines by eye on a scatterplot. This is especially true when the points are widely scattered. We need a way to draw a regression line that doesn’t depend on our guess as to where the line should go. No line will pass exactly through all the points, but we want one that is as close as possible. We will use the line to predict y from x, so we want a line that is as close as possible to the points in the vertical direction. That’s because the prediction errors we make are errors in y, which is the vertical direction in the scatterplot. The line in Figure 2.17 predicts 2.13 kilograms of fat gain for an increase in nonexercise activity of 400 calories. If the actual fat gain turns out to be 2.3 kilograms, the error is error ⫽ observed gain ⫺ predicted gain ⫽ 2.3 ⫺ 2.13 ⫽ 0.17 kilograms Errors are positive if the observed response lies above the line, and negative if the response lies below the line. We want a regression line that makes these
114
CHAPTER 2
•
Looking at Data—Relationships
FIGURE 2.18 The least-squares idea: make the errors in predicting y as small as possible by minimizing the sum of their squares. Fat gain (kilograms)
4.5
4.0 o
3.5
Predicted y2 = 3.7
o
Error = 3.0 – 3.7 = –0.7 3.0
o
Observed y2 = 3.0 x2 = –57
2.5 –150
–100 –50 0 Nonexercise activity (calories)
50
prediction errors as small as possible. Figure 2.18 illustrates the idea. For clarity, the plot shows only three of the points from Figure 2.17, along with the line, on an expanded scale. The line passes below two of the points and above one of them. The vertical distances of the data points from the line appear as vertical line segments. A “good” regression line makes these distances as small as possible. There are many ways to make “as small as possible” precise. The most common is the least-squares idea. The line in Figures 2.17 and 2.18 is in fact the least-squares regression line.
LEAST-SQUARES REGRESSION LINE The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
Here is the least-squares idea expressed as a mathematical problem. We represent n observations on two variables x and y as 1x1, y1 2, 1x2, y2 2, . . . , 1xn, yn 2 If we draw a line y ⫽ b0 ⫹ b1x through the scatterplot of these observations, the line predicts the value of y corresponding to xi as yˆ i ⫽ b0 ⫹ b1xi. We write yˆ (read “y-hat”) in the equation of a regression line to emphasize that the line gives a predicted response yˆ for any x. The predicted response will usually not be exactly the same as the actually observed response y. The method of least squares chooses the line that makes the sum of the squares of these errors as small as possible. To find this line, we must find the values of the intercept b0 and the slope b1 that minimize 2 2 a 1error2 ⫽ a 1yi ⫺ b0 ⫺ b1xi 2
2.4 Least-Squares Regression
115
for the given observations xi and yi. For the NEA data, for example, we must find the b0 and b1 that minimize 14.2 ⫺ b0 ⫹ 94b1 2 2 ⫹ 13.0 ⫺ b0 ⫹ 57b1 2 2 ⫹ # # # ⫹ 11.1 ⫺ b0 ⫺ 690b1 2 2 These values are the intercept and slope of the least-squares line. You will use software or a calculator with a regression function to find the equation of the least-squares regression line from data on x and y. We will therefore give the equation of the least-squares line in a form that helps our understanding but is not efficient for calculation.
EQUATION OF THE LEAST-SQUARES REGRESSION LINE We have data on an explanatory variable x and a response variable y for n individuals. The means and standard deviations of the sample data are x and sx for x and y and sy for y, and the correlation between x and y is r. The equation of the least-squares regression line of y on x is yˆ ⫽ b0 ⫹ b1x with slope b1 ⫽ r
sy sx
and intercept b0 ⫽ y ⫺ b1x
EXAMPLE 2.22 Check the calculations. Verify from the data in Example 2.18 that the mean and standard deviation of the 16 increases in NEA are x ⫽ 324.8 calories and sx ⫽ 257.66 calories The mean and standard deviation of the 16 fat gains are y ⫽ 2.388 kg and sy ⫽ 1.1389 kg The correlation between fat gain and NEA increase is r ⫽ ⫺0.7786. The leastsquares regression line of fat gain y on NEA increase x therefore has slope sy
1.1389 sx 257.66 ⫽ ⫺0.00344 kg per calorie
b1 ⫽ r
⫽ ⫺0.7786
and intercept b0 ⫽ y ⫺ b1x ⫽ 2.388 ⫺ 1⫺0.003442 1324.82 ⫽ 3.505 kg The equation of the least-squares line is yˆ ⫽ 3.505 ⫺ 0.00344x When doing calculations like this by hand, you may need to carry extra decimal places in the preliminary calculations to get accurate values of the slope and intercept. Using software or a calculator with a regression function eliminates this worry.
116
CHAPTER 2
•
Looking at Data—Relationships
Interpreting the regression line The slope b1 ⫽ ⫺0.00344 kilograms per calorie in Example 2.22 is the change in fat gain as NEA increases. The units “kilograms of fat gained per calorie of NEA” come from the units of y (kilograms) and x (calories). Although the correlation does not change when we change the units of measurement, the equation of the least-squares line does change. The slope in grams per calorie would be 1000 times as large as the slope in kilograms per calorie, because there are 1000 grams in a kilogram. The small value of the slope, b1 ⫽ ⫺0.00344, does not mean that the effect of increased NEA on fat gain is small—it just reflects the choice of kilograms as the unit for fat gain. The slope and intercept of the least-squares line depend on the units of measurement—you can’t conclude anything from their size.
EXAMPLE 2.23 Regression using software. Figure 2.19 displays the basic regression output for the nonexercise activity data from three statistical software packages. Other software produces very similar output. You can find the slope and intercept of the least-squares line, calculated to more decimal places than we need, in each output. The software also provides information that we do not yet need, including some that we trimmed from Figure 2.19. Part of the art of using software is to ignore the extra information that is almost always present. Look for the results that you need. Once you understand a statistical method, you can read output from almost any software. FIGURE 2.19 Regression results for the nonexercise activity data from three statistical software packages: (a) Minitab, (b) SPSS, and (c) JMP. Other software produces similar output.
Minitab
Rgression Analysis: Fat versus NEA
The regression equation is Fat = 3.51 – 0.00344 NEA
Predictor Constant NEA
Coef 3.5051 –0.0034415
S = 0.739853
SE Coef 0.3036 0.0007414
R-Sq = 60.6%
T 11.54 –4.64
P 0.000 0.000
R-Sq(adj) = 57.8%
Analysis of Variance
Source Regression Residual Error
DF 1 14
SS 11.794 7.663
MS 11.794 0.547
Welcome to Minitab, press F1 for help.
(a) Minitab
F 21.55
P 0.000
2.4 Least-Squares Regression
117
*Output1 - IBM SPSS Statistics Viewer Regression [DataSet1] Variables Entered/Removeda Model
Variables Entered
Variables Removed
Method
NEAb
1
Enter
a. Dependent Variable: Fat b. All requested variables entered. Model Summary Model
R
1
.779a
Adjusted R Square
R Square .606
Std. Error of the Estimate
.578
.7399
a. Predictors: (Constant), NEA ANOVAa Sum of Squares
Model 1
Regression
11.794
Residual Total
df
Mean Square 1
11.794
7.663
14
.547
19.458
15
F
Sig.
21.546
.000b
a. Dependent Variable: Fat b. Predictors: (Constant), NEA Coefficientsa Unstandardized Coefficients B Std. Error
Model 1
(Constant)
3.505
.304
NEA
–.003
.001
Standardized Coefficients Beta –.779
t
Sig.
11.545
.000
– 4.642
.000
a. Dependent Variable: Fat IBM SPSS Statistics Processor is ready
(b) SPSS
FIGURE 2.19 Continued
118
CHAPTER 2
•
Looking at Data—Relationships
FIGURE 2.19 Continued
JMP Bivariate Fit of Fat By NEA Linear Fit Fat = 3.5051229 – 0.0034415*NEA Summary of Fit RSquare
0.606149
RSquare Adj
0.578017
Root Mean Square Error
0.739853
Mean of Response
2.3875
Observations (or Sum Wgts)
16
Analysis of Variance Parameter Estimates Term
Estimate
Std Error
t Ratio
Prob>|t|
Intercept
3.5051229
0.303616
11.54
0.8) (b)
1
Similarly, P1X ⱕ 0.52 ⫽ 0.5 P1X ⬎ 0.82 ⫽ 0.2 P1X ⱕ 0.5 or X ⬎ 0.82 ⫽ 0.7 Notice that the last event consists of two nonoverlapping intervals, so the total area above the event is found by adding two areas, as illustrated by Figure 4.9(b). This assignment of probabilities obeys all of our rules for probability.
USE YOUR KNOWLEDGE 4.48 Find the probability. For the uniform distribution described in Example 4.25, find the probability that X is between 0.2 and 0.7.
Probability as area under a density curve is a second important way of assigning probabilities to events. Figure 4.10 illustrates this idea in general form. We call X in Example 4.25 a continuous random variable because its values are not isolated numbers but an entire interval of numbers.
Area = P(A)
Event A
FIGURE 4.10 The probability distribution of a continuous random variable assigns probabilities as areas under a density curve. The total area under any density curve is 1.
4.3 Random Variables
259
CONTINUOUS RANDOM VARIABLE A continuous random variable X takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event.
The probability model for a continuous random variable assigns probabilities to intervals of outcomes rather than to individual outcomes. In fact, all continuous probability distributions assign probability 0 to every individual outcome. Only intervals of values have positive probability. To see that this is true, consider a specific outcome such as P1X ⫽ 0.82 in the context of Example 4.25. The probability of any interval is the same as its length. The point 0.8 has no length, so its probability is 0. Although this fact may seem odd, it makes intuitive, as well as mathematical, sense. The random number generator produces a number between 0.79 and 0.81 with probability 0.02. An outcome between 0.799 and 0.801 has probability 0.002. A result between 0.799999 and 0.800001 has probability 0.000002. You see that as we approach 0.8 the probability gets closer to 0. To be consistent, the probability of an outcome exactly equal to 0.8 must be 0. Because there is no probability exactly at X ⫽ 0.8, the two events 5X ⬎ 0.86 and 5X ⱖ 0.86 have the same probability. We can ignore the distinction between ⬎ and ⱖ when finding probabilities for continuous (but not discrete) random variables.
Normal distributions as probability distributions The density curves that are most familiar to us are the Normal curves. Because any density curve describes an assignment of probabilities, Normal distributions are probability distributions. Recall that N1m, s2 is our shorthand for the Normal distribution having mean m and standard deviation s. In the language of random variables, if X has the N1m, s2 distribution, then the standardized variable Z⫽
X⫺m s
is a standard Normal random variable having the distribution N10, 12.
EXAMPLE
LOOK BACK parameter, statistic, p. 206
4.26 Texting while driving. Texting while driving can be dangerous, but young people want to remain connected. Suppose that 26% of teen drivers text while driving. If we take a sample of 500 teen drivers, what percent would we expect to say that they text while driving?12 The proportion p ⫽ 0.26 is a parameter that describes the population of teen drivers. The proportion pˆ of the sample who say that they text while driving is a statistic used to estimate p. The statistic pˆ is a random variable because repeating the SRS would give a different sample of 500 teen drivers and a different value of pˆ .
260
CHAPTER 4
•
Probability: The Study of Randomness
Area = 0.8740
FIGURE 4.11 Probability in Example 4.26 as area under a Normal density curve.
LOOK BACK Normal distribution calculations, p. 63
p^ = 0.23
p^ = 0.26
p^ = 0.29
The statistic pˆ has approximately the N10.26, 0.01962 distribution. The mean 0.26 of this distribution is the same as the population parameter because pˆ is an unbiased estimate of p. The standard deviation is controlled mainly by the size of the sample. What is the probability that the survey result differs from the truth about the population by no more than 3 percentage points? We can use what we learned about Normal distribution calculations to answer this question. Because p ⫽ 0.26, the survey misses by no more than 3 percentage points if the sample proportion is between 0.23 and 0.29. Figure 4.11 shows this probability as an area under a Normal density curve. You can find it by software or by standardizing and using Table A. From Table A, pˆ ⫺ 0.26 0.23 ⫺ 0.26 0.29 ⫺ 0.26 ⱕ ⱕ b 0.0196 0.0196 0.0196 ⫽ P 1⫺1.53 ⱕ Z ⱕ 1.532 ⫽ 0.9370 ⫺ 0.0630 ⫽ 0.8740
P10.23 ⱕ pˆ ⱕ 0.292 ⫽ P a
About 87% of the time, the sample pˆ will be within 3 percentage points of the parameter p. We began this chapter with a general discussion of the idea of probability and the properties of probability models. Two very useful specific types of probability models are distributions of discrete and continuous random variables. In our study of statistics we will employ only these two types of probability models.
SECTION 4.3 Summary A random variable is a variable taking numerical values determined by the outcome of a random phenomenon. The probability distribution of a random variable X tells us what the possible values of X are and how probabilities are assigned to those values. A random variable X and its distribution can be discrete or continuous. A discrete random variable has possible values that can be given in an ordered list. The probability distribution assigns each of these values a probability between 0 and 1 such that the sum of all the probabilities is exactly 1. The probability of any event is the sum of the probabilities of all the values that make up the event.
4.3 Random Variables
261
A continuous random variable takes all values in some interval of numbers. A density curve describes the probability distribution of a continuous random variable. The probability of any event is the area under the curve and above the values that make up the event. Normal distributions are one type of continuous probability distribution. You can picture a probability distribution by drawing a probability histogram in the discrete case or by graphing the density curve in the continuous case.
SECTION 4.3 Exercises For Exercise 4.46, see page 254; for Exercise 4.47, see page 256; and for Exercise 4.48, see page 258. 4.49 How many courses? At a small liberal arts college, students can register for one to six courses. Let X be the number of courses taken in the fall by a randomly selected student from this college. In a typical fall semester, 5% take one course, 5% take two courses, 13% take three courses, 26% take four courses, 36% take five courses, and 15% take six courses. Let X be the number of courses taken in the fall by a randomly selected student from this college. Describe the probability distribution of this random variable. 4.50 Make a graphical display. Refer to the previous exercise. Use a probability histogram to provide a graphical description of the distribution of X. 4.51 Find some probabilities. Refer to Exercise 4.49. (a) Find the probability that a randomly selected student takes three or fewer courses. (b Find the probability that a randomly selected student takes four or five courses. (c) Find the probability that a randomly selected student takes eight courses. 4.52 Use the uniform distribution. Suppose that a random variable X follows the uniform distribution described in Example 4.25 (page 257). For each of the following events, find the probability and illustrate your calculations with a sketch of the density curve similar to the ones in Figure 4.9 (page 258). (a) The probability that X is less than 0.1. (b) The probability that X is greater than or equal to 0.8. (c) The probability that X is less than 0.7 and greater than 0.5. (d) The probability that X is 0.5. 4.53 What’s wrong? In each of the following scenarios, there is something wrong. Describe what is wrong and give a reason for your answer.
(a) The probabilities for a discrete statistic always add to 1. (b) A continuous random variable can take any value between 0 and 1. (c) Normal distributions are discrete random variables. 4.54 Use of Twitter. Suppose that the population proportion of Internet users who say that they use Twitter or another service to post updates about themselves or to see updates about others is 19%.13 Think about selecting random samples from a population in which 19% are Twitter users. (a) Describe the sample space for selecting a single person. (b) If you select three people, describe the sample space. (c) Using the results of (b), define the sample space for the random variable that expresses the number of Twitter users in the sample of size 3. (d) What information is contained in the sample space for part (b) that is not contained in the sample space for part (c)? Do you think this information is important? Explain your answer. 4.55 Use of Twitter. Find the probabilities for parts (a), (b), and (c) of the previous exercise. 4.56 Households and families in government data. In government data, a household consists of all occupants of a dwelling unit, while a family consists of two or more persons who live together and are related by blood or marriage. So all families form households, but some households are not families. Here are the distributions of household size and of family size in the United States: Number of persons
1
Household probability
0.27
Family probability
0
2
3
4
5
6
7
0.33 0.16
0.14
0.06
0.03
0.01
0.44 0.22
0.20
0.09
0.03
0.02
262
CHAPTER 4
•
Probability: The Study of Randomness
Make probability histograms for these two discrete distributions, using the same scales. What are the most important differences between the sizes of households and families? 4.57 Discrete or continuous? In each of the following situations decide whether the random variable is discrete or continuous and give a reason for your answer. (a) Your web page has five different links, and a user can click on one of the links or can leave the page. You record the length of time that a user spends on the web page before clicking one of the links or leaving the page. (b) The number of hits on your web page. (c) The yearly income of a visitor to your web page. 4.58 Texas hold ’em. The game of Texas hold ’em starts with each player receiving two cards. Here is the probability distribution for the number of aces in two-card hands: Number of aces Probability
0
1
0.8507
0.1448
2 0.0045
(a) Verify that this assignment of probabilities satisfies the requirement that the sum of the probabilities for a discrete distribution must be 1. (b) Make a probability histogram for this distribution. (c) What is the probability that a hand contains at least one ace? Show two different ways to calculate this probability. 4.59 Tossing two dice. Some games of chance rely on tossing two dice. Each die has six faces, marked with 1, 2, . . . , 6 spots called pips. The dice used in casinos are carefully balanced so that each face is equally likely to come up. When two dice are tossed, each of the 36 possible pairs of faces is equally likely to come up. The outcome of interest to a gambler is the sum of the pips on the two up-faces. Call this random variable X. (a) Write down all 36 possible pairs of up-faces. (b) If all pairs have the same probability, what must be the probability of each pair? (c) Write the value of X next to each pair of up-faces and use this information with the result of (b) to give the probability distribution of X. Draw a probability histogram to display the distribution. (d) One bet available in the game called craps wins if a 7 or an 11 comes up on the next roll of two dice. What is the probability of rolling a 7 or an 11 on the next roll? (e) Several bets in craps lose if a 7 is rolled. If any outcome other than 7 occurs, these bets either win or continue to the next roll. What is the probability that anything other than a 7 is rolled?
4.60 Nonstandard dice. Nonstandard dice can produce interesting distributions of outcomes. You have two balanced, six-sided dice. One is a standard die, with faces having 1, 2, 3, 4, 5, and 6 spots. The other die has three faces with 0 spots and three faces with 6 spots. Find the probability distribution for the total number of spots Y on the up-faces when you roll these two dice. 4.61 Spell-checking software. Spell-checking software catches “nonword errors,” which are strings of letters that are not words, as when “the” is typed as “eth.” When undergraduates are asked to write a 250-word essay (without spell-checking), the number X of nonword errors has the following distribution: Value of X
0
1
2
3
4
Probability
0.1
0.3
0.3
0.2
0.1
(a) Sketch the probability distribution for this random variable. (b) Write the event “at least one nonword error” in terms of X. What is the probability of this event? (c) Describe the event X ⱕ 2 in words. What is its probability? What is the probability that X ⬍ 2? 4.62 Find the probabilities. Let the random variable X be a random number with the uniform density curve in Figure 4.9 (page 258). Find the following probabilities: (a) P1X ⱖ 0.302 (b) P1X ⫽ 0.302 (c) P10.30 ⬍ X ⬍ 1.302 (d) P10.20 ⱕ X ⱕ 0.25 or 0.7 ⱕ X ⱕ 0.92 (e) X is not in the interval 0.4 to 0.7 4.63 Uniform numbers between 0 and 2. Many random number generators allow users to specify the range of the random numbers to be produced. Suppose that you specify that the range is to be all numbers between 0 and 2. Call the random number generated Y. Then the density curve of the random variable Y has constant height between 0 and 2, and height 0 elsewhere. (a) What is the height of the density curve between 0 and 2? Draw a graph of the density curve. (b) Use your graph from (a) and the fact that probability is area under the curve to find P1Y ⱕ 1.62. (c) Find P10.5 ⬍ Y ⬍ 1.72. (d) Find P1Y ⱖ 0.952. 4.64 The sum of two uniform random numbers. Generate two random numbers between 0 and 1 and take Y to be their sum. Then Y is a continuous random
4.4 Means and Variances of Random Variables
Height = 1
0
1
2
FIGURE 4.12 The density curve for the sum Y of two random numbers, for Exercise 4.64.
variable that can take any value between 0 and 2. The density curve of Y is the triangle shown in Figure 4.12. (a) Verify by geometry that the area under this curve is 1. (b) What is the probability that Y is less than 1? (Sketch the density curve, shade the area that represents the probability, then find that area. Do this for (c) also.) (c) What is the probability that Y is greater than 0.6? 4.65 How many close friends? How many close friends do you have? Suppose that the number of close friends adults claim to have varies from person to person with
263
mean m ⫽ 9 and standard deviation s ⫽ 2.4. An opinion poll asks this question of an SRS of 1100 adults. We will see in the next chapter that in this situation the sample mean response x has approximately the Normal distribution with mean 9 and standard deviation 0.0724. What is P18 ⱕ x ⱕ 102, the probability that the statistic x estimates the parameter m to within ;1? 4.66 Normal approximation for a sample proportion. A sample survey contacted an SRS of 700 registered voters in Oregon shortly after an election and asked respondents whether they had voted. Voter records show that 56% of registered voters had actually voted. We will see in the next chapter that in this situation the proportion pˆ of the sample who voted has approximately the Normal distribution with mean m ⫽ 0.56 and standard deviation s ⫽ 0.019. (a) If the respondents answer truthfully, what is P10.52 ⱕ pˆ ⱕ 0.602? This is the probability that the statistic pˆ estimates the parameter 0.56 within plus or minus 0.04. (b) In fact, 72% of the respondents said they had voted (pˆ ⫽ 0.722. If respondents answer truthfully, what is P1pˆ ⱖ 0.722? This probability is so small that it is good evidence that some people who did not vote claimed that they did vote.
4.4 Means and Variances of Random Variables When you complete this section, you will be able to • Use a probability distribution to find the mean of a discrete random variable. • Apply the law of large numbers to describe the behavior of the sample mean as the sample size increases. • Find means using the rules for means of linear transformations, sums, and differences. • Use a probability distribution to find the variance and the standard deviation of a discrete random variable. • Find variances and standard deviations using the rules for variances and standard deviations for linear transformations. • Find variances and standard deviations using the rules for variances and standard deviations for sums of and differences between two random variables, for uncorrelated and for correlated random variables. The probability histograms and density curves that picture the probability distributions of random variables resemble our earlier pictures of distributions of data. In describing data, we moved from graphs to numerical measures such as means and standard deviations. Now we will make the same move to expand our descriptions of the distributions of random variables. We
264
CHAPTER 4
•
Probability: The Study of Randomness can speak of the mean winnings in a game of chance or the standard deviation of the randomly varying number of calls a travel agency receives in an hour. In this section we will learn more about how to compute these descriptive measures and about the laws they obey.
The mean of a random variable In Chapter 1 (page 31), we learned that the mean x is the average of the observations in a sample. Recall that a random variable X is a numerical outcome of a random process. Think about repeating the random process many times and recording the resulting values of the random variable. You can think of the value of a random variable as the average of a very large sample where the relative frequencies of the values are the same as their probabilities. If we think of the random process as corresponding to the population, then the mean of the random variable is a parameter of this population. Here is an example.
EXAMPLE 4.27 The Tri-State Pick 3 lottery. Most states and Canadian provinces have government-sponsored lotteries. Here is a simple lottery wager, from the Tri-State Pick 3 game that New Hampshire shares with Maine and Vermont. You choose a three-digit number, 000 to 999. The state chooses a three-digit winning number at random and pays you $500 if your number is chosen. Because there are 1000 three-digit numbers, you have probability 1/1000 of winning. Taking X to be the amount your ticket pays you, the probability distribution of X is Payoff X Probability
$0
$500
0.999
0.001
The random process consists of drawing a three-digit number. The population consists of the numbers 000 to 999. Each of these possible outcomes is equally likely in this example. In the setting of sampling in Chapter 3 (page 194), we can view the random process as selecting an SRS of size 1 from the population. The random variable X is 1 if the selected number is equal to the one that you chose and is 0 if it is not. What is your average payoff from many tickets? The ordinary average of the two possible outcomes $0 and $500 is $250, but that makes no sense as the average because $500 is much less likely than $0. In the long run you receive $500 once in every 1000 tickets and $0 on the remaining 999 of 1000 tickets. The long-run average payoff is $500
1 999 ⫹ $0 ⫽ $0.50 1000 1000
or 50 cents. That number is the mean of the random variable X. (Tickets cost $1, so in the long run the state keeps half the money you wager.)
4.4 Means and Variances of Random Variables
265
If you play Tri-State Pick 3 several times, we would as usual call the mean of the actual amounts you win x. The mean in Example 4.27 is a different quantity—it is the long-run average winnings you expect if you play a very large number of times.
USE YOUR KNOWLEDGE 4.67 Find the mean of the probability distribution. You toss a fair coin. If the outcome is heads, you win $5.00; if the outcome is tails, you win nothing. Let X be the amount that you win in a single toss of a coin. Find the probability distribution of this random variable and its mean.
mean m
expected value
Just as probabilities are an idealized description of long-run proportions, the mean of a probability distribution describes the long-run average outcome. We can’t call this mean x, so we need a different symbol. The common symbol for the mean of a probability distribution is m, the Greek letter mu. We used m in Chapter 1 for the mean of a Normal distribution, so this is not a new notation. We will often be interested in several random variables, each having a different probability distribution with a different mean. To remind ourselves that we are talking about the mean of X, we often write mX rather than simply m. In Example 4.27, mX ⫽ $0.50. Notice that, as often happens, the mean is not a possible value of X. You will often find the mean of a random variable X called the expected value of X. This term can be misleading, for we don’t necessarily expect one observation on X to be close to its expected value. The mean of any discrete random variable is found just as in Example 4.27. It is an average of the possible outcomes, but a weighted average in which each outcome is weighted by its probability. Because the probabilities add to 1, we have total weight 1 to distribute among the outcomes. An outcome that occurs half the time has probability one-half and gets one-half the weight in calculating the mean. Here is the general definition.
MEAN OF A DISCRETE RANDOM VARIABLE Suppose that X is a discrete random variable whose distribution is Value of X
x1
x2
x3
p
xk
Probability
p1
p2
p3
p
pk
To find the mean of X, multiply each possible value by its probability, then add all the products: mX ⫽ x1 p1 ⫹ x2 p2 ⫹ p ⫹ xk pk ⫽ a xi p i
266
CHAPTER 4
•
Probability: The Study of Randomness
EXAMPLE 4.28 The mean of equally likely first digits. If first digits in a set of data all have the same probability, the probability distribution of the first digit X is then First digit X
1
2
3
4
5
6
7
8
9
Probability
1/9
1/9
1/9
1/9
1/9
1/9
1/9
1/9
1/9
The mean of this distribution is 1 1 1 1 1 ⫹2⫻ ⫹3⫻ ⫹4⫻ ⫹5⫻ 9 9 9 9 9 1 1 1 1 ⫹6⫻ ⫹7⫻ ⫹8⫻ ⫹9⫻ 9 9 9 9 1 ⫽ 45 ⫻ ⫽ 5 9
mX ⫽ 1 ⫻
Suppose that the random digits in Example 4.28 had a different probability distribution. In Example 4.12 (page 242) we described Benford’s law as a probability distribution that describes first digits of numbers in many real situations. Let’s calculate the mean for Benford’s law.
EXAMPLE 4.29 The mean of first digits that follow Benford’s law. Here is the distribution of the first digit for data that follow Benford’s law. We use the letter V for this random variable to distinguish it from the one that we studied in Example 4.28. The distribution of V is First digit V
1
2
Probability 0.301 0.176
3
4
5
6
7
8
9
0.125 0.097 0.079 0.067 0.058 0.051 0.046
The mean of V is mV ⫽ 112 10.3012 ⫹ 122 10.1762 ⫹ 132 10.1252 ⫹ 142 10.0972 ⫹ 152 10.0792 ⫹ 162 10.0672 ⫹ 172 10.0582 ⫹ 182 10.0512 ⫹ 192 10.0462 ⫽ 3.441 The mean reflects the greater probability of smaller first digits under Benford’s law than when first digits 1 to 9 are equally likely. Figure 4.13 locates the means of X and V on the two probability histograms. Because the discrete uniform distribution of Figure 4.13(a) is symmetric, the mean lies at the center of symmetry. We can’t locate the mean of the right-skewed distribution of Figure 4.13(b) by eye—calculation is needed. What about continuous random variables? The probability distribution of a continuous random variable X is described by a density curve. Chapter 1 (page 56) showed how to find the mean of the distribution: it is the point at
4.4 Means and Variances of Random Variables
267
X = 5
Probability
0.4 0.3 0.2 0.1 0.0 0
1
2
3
4 5 Outcomes (a)
6
7
8
9
6
7
8
9
V = 3.441
FIGURE 4.13 Locating the mean of a discrete random variable on the probability histogram for (a) digits between 1 and 9 chosen at random; (b) digits between 1 and 9 chosen from records that obey Benford’s law.
Probability
0.4 0.3 0.2 0.1 0.0 0
1
2
3
4 5 Outcomes (b)
which the area under the density curve would balance if it were made out of solid material. The mean lies at the center of symmetric density curves such as the Normal curves. Exact calculation of the mean of a distribution with a skewed density curve requires advanced mathematics.14 The idea that the mean is the balance point of the distribution applies to discrete random variables as well, but in the discrete case we have a formula that gives us this point.
Statistical estimation and the law of large numbers
LOOK BACK sampling distributions, p. 208
We would like to estimate the mean height m of the population of all American women between the ages of 18 and 24 years. This m is the mean mX of the random variable X obtained by choosing a young woman at random and measuring her height. To estimate m, we choose an SRS of young women and use the sample mean x to estimate the unknown population mean m. In the language of Section 3.4 (page 205), m is a parameter and x is a statistic. Statistics obtained from probability samples are random variables because their values vary in repeated sampling. The sampling distributions of statistics are just the probability distributions of these random variables. It seems reasonable to use x to estimate m. An SRS should fairly represent the population, so the mean x of the sample should be somewhere near the mean m of the population. Of course, we don’t expect x to be exactly equal to m, and we realize that if we choose another SRS, the luck of the draw will probably produce a different x. If x is rarely exactly right and varies from sample to sample, why is it nonetheless a reasonable estimate of the population mean m? We gave one answer
268
CHAPTER 4
•
Probability: The Study of Randomness in Section 3.4: x is unbiased and we can control its variability by choosing the sample size. Here is another answer: if we keep on adding observations to our random sample, the statistic x is guaranteed to get as close as we wish to the parameter m and then stay that close. We have the comfort of knowing that if we can afford to keep on measuring more women, eventually we will estimate the mean height of all young women very accurately. This remarkable fact is called the law of large numbers. It is remarkable because it holds for any population, not just for some special class such as Normal distributions.
LAW OF LARGE NUMBERS Draw independent observations at random from any population with finite mean m. Decide how accurately you would like to estimate m. As the number of observations drawn increases, the mean x of the observed values eventually approaches the mean m of the population as closely as you specified and then stays that close.
The behavior of x is similar to the idea of probability. In the long run, the proportion of outcomes taking any value gets close to the probability of that value, and the average outcome gets close to the distribution mean. Figure 4.1 (page 232) shows how proportions approach probability in one example. Here is an example of how sample means approach the distribution mean.
EXAMPLE 4.30 Heights of young women. The distribution of the heights of all young women is close to the Normal distribution with mean 64.5 inches and standard deviation 2.5 inches. Suppose that m ⫽ 64.5 were exactly true. Figure 4.14 shows the behavior of the mean height x of n women chosen at random from a population whose heights follow the N164.5, 2.52 distribution. The graph plots the values of x as we add women to our sample. The FIGURE 4.14 The law of large
66.0
numbers in action. As we take more observations, the sample mean always approaches the mean of the population. Mean of first n
65.5 65.0 64.5 64.0 63.5 63.0 62.5 0
5
10
50 100 1000 Number of observations, n
10,000
4.4 Means and Variances of Random Variables
269
first woman drawn had height 64.21 inches, so the line starts there. The second had height 64.35 inches, so for n ⫽ 2 the mean is CHALLENGE
x⫽
64.21 ⫹ 64.35 ⫽ 64.28 2
This is the second point on the line in the graph. At first, the graph shows that the mean of the sample changes as we take more observations. Eventually, however, the mean of the observations gets close to the population mean m ⫽ 64.5 and settles down at that value. The law of large numbers says that this always happens. USE YOUR KNOWLEDGE 4.68 Use the Law of Large Numbers applet. The Law of Large Numbers applet animates a graph like Figure 4.14 for rolling dice. Use it to better understand the law of large numbers by making a similar graph. The mean m of a random variable is the average value of the variable in two senses. By its definition, m is the average of the possible values, weighted by their probability of occurring. The law of large numbers says that m is also the long-run average of many independent observations on the variable. The law of large numbers can be proved mathematically starting from the basic laws of probability.
Thinking about the law of large numbers The law of large numbers says broadly that the average results of many independent observations are stable and predictable. The gamblers in a casino may win or lose, but the casino will win in the long run because the law of large numbers says what the average outcome of many thousands of bets will be. An insurance company deciding how much to charge for life insurance and a fast-food restaurant deciding how many beef patties to prepare also rely on the fact that averaging over many individuals produces a stable result. It is worth the effort to think a bit more closely about so important a fact. The “law of small numbers” Both the rules of probability and the law of large numbers describe the regular behavior of chance phenomena in the long run. Psychologists have discovered that our intuitive understanding of randomness is quite different from the true laws of chance.15 For example, most people believe in an incorrect “law of small numbers.” That is, we expect even short sequences of random events to show the kind of average behavior that in fact appears only in the long run. Some teachers of statistics begin a course by asking students to toss a coin 50 times and bring the sequence of heads and tails to the next class. The teacher then announces which students just wrote down a random-looking sequence rather than actually tossing a coin. The faked tosses don’t have enough “runs” of consecutive heads or consecutive tails. Runs of the same outcome don’t look random to us but are in fact common. For example, the probability of a run of three or more consecutive heads or tails in just 10 tosses is greater than 0.8.16 The runs of consecutive heads or consecutive tails that appear in real coin tossing (and that are predicted by the mathematics of
270
CHAPTER 4
•
Probability: The Study of Randomness probability) seem surprising to us. Because we don’t expect to see long runs, we may conclude that the coin tosses are not independent or that some influence is disturbing the random behavior of the coin.
EXAMPLE 4.31 The “hot hand” in basketball. Belief in the law of small numbers influences behavior. If a basketball player makes several consecutive shots, both the fans and her teammates believe that she has a “hot hand” and is more likely to make the next shot. This is doubtful. Careful study suggests that runs of baskets made or missed are no more frequent in basketball than would be expected if each shot were independent of the player’s previous shots. Baskets made or missed are just like heads and tails in tossing a coin. (Of course, some players make 30% of their shots in the long run and others make 50%, so a coin-toss model for basketball must allow coins with different probabilities of a head.) Our perception of hot or cold streaks simply shows that we don’t perceive random behavior very well.17 Our intuition doesn’t do a good job of distinguishing random behavior from systematic influences. This is also true when we look at data. We need statistical inference to supplement exploratory analysis of data because probability calculations can help verify that what we see in the data is more than a random pattern. How large is a large number? The law of large numbers says that the actual mean outcome of many trials gets close to the distribution mean m as more trials are made. It doesn’t say how many trials are needed to guarantee a mean outcome close to m. That depends on the variability of the random outcomes. The more variable the outcomes, the more trials are needed to ensure that the mean outcome x is close to the distribution mean m. Casinos understand this: the outcomes of games of chance are variable enough to hold the interest of gamblers. Only the casino plays often enough to rely on the law of large numbers. Gamblers get entertainment; the casino has a business. BEYOND THE BASICS
More laws of large numbers The law of large numbers is one of the central facts about probability. It helps us understand the mean m of a random variable. It explains why gambling casinos and insurance companies make money. It assures us that statistical estimation will be accurate if we can afford enough observations. The basic law of large numbers applies to independent observations that all have the same distribution. Mathematicians have extended the law to many more general settings. Here are two of these. Is there a winning system for gambling? Serious gamblers often follow a system of betting in which the amount bet on each play depends on the outcome of previous plays. You might, for example, double your bet on each spin of the roulette wheel until you win—or, of course, until your fortune is exhausted. Such a system tries to take advantage of the fact that you have
4.4 Means and Variances of Random Variables
271
a memory even though the roulette wheel does not. Can you beat the odds with a system based on the outcomes of past plays? No. Mathematicians have established a stronger version of the law of large numbers that says that, if you do not have an infinite fortune to gamble with, your long-run average winnings m remain the same as long as successive trials of the game (such as spins of the roulette wheel) are independent. What if observations are not independent? You are in charge of a process that manufactures video screens for computer monitors. Your equipment measures the tension on the metal mesh that lies behind each screen and is critical to its image quality. You want to estimate the mean tension m for the process by the average x of the measurements. Alas, the tension measurements are not independent. If the tension on one screen is a bit too high, the tension on the next is more likely to also be high. Many real-world processes are like this—the process stays stable in the long run, but two observations made close together are likely to both be above or both be below the long-run mean. Again the mathematicians come to the rescue: as long as the dependence dies out fast enough as we take measurements farther and farther apart in time, the law of large numbers still holds.
Rules for means You are studying flaws in the painted finish of refrigerators made by your firm. Dimples and paint sags are two kinds of surface flaw. Not all refrigerators have the same number of dimples: many have none, some have one, some two, and so on. You ask for the average number of imperfections on a refrigerator. The inspectors report finding an average of 0.7 dimples and 1.4 sags per refrigerator. How many total imperfections of both kinds (on the average) are there on a refrigerator? That’s easy: if the average number of dimples is 0.7 and the average number of sags is 1.4, then counting both gives an average of 0.7 ⫹ 1.4 ⫽ 2.1 flaws. In more formal language, the number of dimples on a refrigerator is a random variable X that varies as we inspect one refrigerator after another. We know only that the mean number of dimples is mX ⫽ 0.7. The number of paint sags is a second random variable Y having mean mY ⫽ 1.4. (As usual, the subscripts keep straight which variable we are talking about.) The total number of both dimples and sags is another random variable, the sum X ⫹ Y. Its mean mX⫹Y is the average number of dimples and sags together. It is just the sum of the individual means mX and mY. That’s an important rule for how means of random variables behave. Here’s another rule. The crickets living in a field have mean length 1.2 inches. What is the mean in centimeters? There are 2.54 centimeters in an inch, so the length of a cricket in centimeters is 2.54 times its length in inches. If we multiply every observation by 2.54, we also multiply their average by 2.54. The mean in centimeters must be 2.54 ⫻ 1.2, or about 3.05 centimeters. More formally, the length in inches of a cricket chosen at random from the field is a random variable X with mean mX. The length in centimeters is 2.54X, and this new random variable has mean 2.54mX. The point of these examples is that means behave like averages. Here are the rules we need.
272
CHAPTER 4
•
Probability: The Study of Randomness
RULES FOR MEANS OF LINEAR TRANSFORMATIONS, SUMS, AND DIFFERENCES Rule 1. If X is a random variable and a and b are fixed numbers, then ma⫹bX ⫽ a ⫹ bmX Rule 2. If X and Y are random variables, then mX⫹Y ⫽ mX ⫹ mY Rule 3. If X and Y are random variables, then mX⫺Y ⫽ mX ⫺ mY
LOOK BACK
Note that a ⫹ bX is a linear transformation of the random variable X.
linear transformation, p. 45
EXAMPLE 4.32 How many courses? In Exercise 4.49 (page 261) you described the probability distribution of the number of courses taken in the fall by students at a small liberal arts college. Here is the distribution: Courses in the fall Probability
1
2
3
4
5
6
0.05
0.05
0.13
0.26
0.36
0.15
For the spring semester, the distribution is a little different. Courses in the spring Probability
1
2
3
4
5
6
0.06
0.08
0.15
0.25
0.34
0.12
For a randomly selected student, let X be the number of courses taken in the fall semester, and let Y be the number of courses taken in the spring semester. The means of these random variables are mX ⫽ 112 10.052 ⫹ 122 10.052 ⫹ 132 10.132 ⫹ 142 10.262 ⫹ 152 10.362 ⫹ 162 10.152 ⫽ 4.28 mY ⫽ 112 10.062 ⫹ 122 10.082 ⫹ 132 10.152 ⫹ 142 10.252 ⫹ 152 10.342 ⫹ 162 10.122 ⫽ 4.09 The mean course load for the fall is 4.28 courses and the mean course load for the spring is 4.09 courses. We assume that these distributions apply to students who earned credit for courses taken in the fall and the spring semesters. The mean of the total number of courses taken for the academic year is X ⫹ Y. Using Rule 2, we calculate the mean of the total number of courses: mZ ⫽ mX ⫹ mY ⫽ 4.28 ⫹ 4.09 ⫽ 8.37 Note that it is not possible for a student to take 8.37 courses in an academic year. This number is the mean of the probability distribution.
4.4 Means and Variances of Random Variables
273
EXAMPLE 4.33 What about credit hours? In the previous exercise, we examined the number of courses taken in the fall and in the spring at a small liberal arts college. Suppose that we were interested in the total number of credit hours earned for the academic year. We assume that for each course taken at this college, three credit hours are earned. Let T be the mean of the distribution of the total number of credit hours earned for the academic year. What is the mean of the distribution of T ? To find the answer, we can use Rule 1 with a ⫽ 0 and b ⫽ 3. Here is the calculation: mT ⫽ ma⫹bZ ⫽ a ⫹ bmZ ⫽ 0 ⫹ 132 18.372 ⫽ 25.11 The mean of the distribution of the total number of credit hours earned is 25.11. USE YOUR KNOWLEDGE 4.69 Find MY. The random variable X has mean mX ⫽ 8. If Y ⫽ 12 ⫹ 7X, what is mY? 4.70 Find MW. The random variable U has mean mU ⫽ 22, and the random variable V has mean mV ⫽ 22. If W ⫽ 0.5U ⫹ 0.5V, find mW.
The variance of a random variable The mean is a measure of the center of a distribution. A basic numerical description requires in addition a measure of the spread or variability of the distribution. The variance and the standard deviation are the measures of spread that accompany the choice of the mean to measure center. Just as for the mean, we need a distinct symbol to distinguish the variance of a random variable from the variance s2 of a data set. We write the variance of a random variable X as s2X. Once again the subscript reminds us which variable we have in mind. The definition of the variance s2X of a random variable is similar to the definition of the sample variance s2 given in Chapter 1. That is, the variance is an average value of the squared deviation 1X ⫺ mX 2 2 of the variable X from its mean mX. As for the mean, the average we use is a weighted average in which each outcome is weighted by its probability in order to take account of outcomes that are not equally likely. Calculating this weighted average is straightforward for discrete random variables but requires advanced mathematics in the continuous case. Here is the definition.
VARIANCE OF A DISCRETE RANDOM VARIABLE Suppose that X is a discrete random variable whose distribution is Value of X
x1
x2
x3
p
xk
Probability
p1
p2
p3
p
pk
274
CHAPTER 4
•
Probability: The Study of Randomness
and that mX is the mean of X. The variance of X is s2X ⫽ 1x1 ⫺ mX 2 2p1 ⫹ 1x2 ⫺ mX 2 2p2 ⫹ p ⫹ 1xk ⫺ mX 2 2pk ⫽ a 1xi ⫺ mX 2 2pi The standard deviation sX of X is the square root of the variance.
EXAMPLE 4.34 Find the mean and the variance. In Example 4.32 we saw that the distribution of the number X of fall courses taken by students at a small liberal arts college is Courses in the fall Probability
1
2
3
4
5
6
0.05
0.05
0.13
0.26
0.36
0.15
We can find the mean and variance of X by arranging the calculation in the form of a table. Both mX and s2X are sums of columns in this table. xi
pi
xi pi
(xi 2 mx)2pi
1
0.05
0.05
11 ⫺ 4.282 2 10.052 5 0.53792
2
0.05
0.10
12 ⫺ 4.282 2 10.052 5 0.25992
3
0.13
0.39
13 ⫺ 4.282 2 10.132 5 0.21299
4
0.26
1.04
14 ⫺ 4.282 2 10.262 5 0.02038
5
0.36
1.80
15 ⫺ 4.282 2 10.362 5 0.18662
6
0.15
0.90
16 ⫺ 4.282 2 10.152 5 0.44376
mX ⫽ 4.28
s2X 5 1.662
We see that s2X ⫽ 1.662. The standard deviation of X is sX ⫽ 21.662 ⫽ 1.289. The standard deviation is a measure of the variability of the number of fall courses taken by the students at the small liberal arts college. As in the case of distributions for data, the standard deviation of a probability distribution is easiest to understand for Normal distributions.
USE YOUR KNOWLEDGE 4.71 Find the variance and the standard deviation. The random variable X has the following probability distribution: Value of X
$0
3
Probability
0.4
0.6
Find the variance s2X and the standard deviation sX for this random variable.
4.4 Means and Variances of Random Variables
275
Rules for variances and standard deviations
independence
correlation
What are the facts for variances that parallel Rules 1, 2, and 3 for means? The mean of a sum of random variables is always the sum of their means, but this addition rule is true for variances only in special situations. To understand why, take X to be the percent of a family’s after-tax income that is spent, and take Y to be the percent that is saved. When X increases, Y decreases by the same amount. Though X and Y may vary widely from year to year, their sum X ⫹ Y is always 100% and does not vary at all. It is the association between the variables X and Y that prevents their variances from adding. If random variables are independent, this kind of association between their values is ruled out and their variances do add. Two random variables X and Y are independent if knowing that any event involving X alone did or did not occur tells us nothing about the occurrence of any event involving Y alone. Probability models often assume independence when the random variables describe outcomes that appear unrelated to each other. You should ask in each instance whether the assumption of independence seems reasonable. When random variables are not independent, the variance of their sum depends on the correlation between them as well as on their individual variances. In Chapter 2, we met the correlation r between two observed variables measured on the same individuals. We defined (page 104) the correlation r as an average of the products of the standardized x and y observations. The correlation between two random variables is defined in the same way, once again using a weighted average with probabilities as weights. We won’t give the details—it is enough to know that the correlation between two random variables has the same basic properties as the correlation r calculated from data. We use r, the Greek letter rho, for the correlation between two random variables. The correlation r is a number between ⫺1 and 1 that measures the direction and strength of the linear relationship between two variables. The correlation between two independent random variables is zero. Returning to family finances, if X is the percent of a family’s after-tax income that is spent and Y is the percent that is saved, then Y ⫽ 100 ⫺ X. This is a perfect linear relationship with a negative slope, so the correlation between X and Y is r ⫽ ⫺1. With the correlation at hand, we can state the rules for manipulating variances.
RULES FOR VARIANCES AND STANDARD DEVIATIONS OF LINEAR TRANSFORMATIONS, SUMS, AND DIFFERENCES Rule 1. If X is a random variable and a and b are fixed numbers, then s2a⫹bX ⫽ b2s2X Rule 2. If X and Y are independent random variables, then s2X⫹Y ⫽ s2X ⫹ s2Y s2X⫺Y ⫽ s2X ⫹ s2Y This is the addition rule for variances of independent random variables.
276
CHAPTER 4
•
Probability: The Study of Randomness
Rule 3. If X and Y have correlation r, then s2X⫹Y ⫽ s2X ⫹ s2Y ⫹ 2rsXsY s2X⫺Y ⫽ s2X ⫹ s2Y ⫺ 2rsXsY This is the general addition rule for variances of random variables. To find the standard deviation, take the square root of the variance. Because a variance is the average of squared deviations from the mean, multiplying X by a constant b multiplies s2X by the square of the constant. Adding a constant a to a random variable changes its mean but does not change its variability. The variance of X ⫹ a is therefore the same as the variance of X. Because the square of ⫺1 is 1, the addition rule says that the variance of a difference between independent random variables is the sum of the variances. For independent random variables, the difference X ⫺ Y is more variable than either X or Y alone because variations in both X and Y contribute to variation in their difference. As with data, we prefer the standard deviation to the variance as a measure of the variability of a random variable. Rule 2 for variances implies that standard deviations of independent random variables do not add. To combine standard deviations, use the rules for variances. For example, the standard deviations of 2X and ⫺2X are both equal to 2sX because this is the square root of the variance 4s2X.
EXAMPLE 4.35 Payoff in the Tri-State Pick 3 lottery. The payoff X of a $1 ticket in the Tri-State Pick 3 game is $500 with probability 1/1000 and 0 the rest of the time. Here is the combined calculation of mean and variance: xi 0 500
pi
xi pi
0.999
0
0.001
0.5
(xi 2 mX)2pi 10 ⫺ 0.52 2 10.9992 5
0.24975
2
1500 ⫺ 0.52 10.0012 5 249.50025
mX ⫽ 0.5
s2X 5 249.75
The mean payoff is 50 cents. The standard deviation is sX ⫽ 2249.75 ⫽ $15.80. It is usual for games of chance to have large standard deviations because large variability makes gambling exciting. If you buy a Pick 3 ticket, your winnings are W ⫽ X ⫺ 1 because the dollar you paid for the ticket must be subtracted from the payoff. Let’s find the mean and variance for this random variable.
EXAMPLE 4.36 Winnings in the Tri-State Pick 3 lottery. By the rules for means, the mean amount you win is mW ⫽ mX ⫺ 1 ⫽ ⫺$0.50
4.4 Means and Variances of Random Variables
277
That is, you lose an average of 50 cents on a ticket. The rules for variances remind us that the variance and standard deviation of the winnings W ⫽ X ⫺ 1 are the same as those of X. Subtracting a fixed number changes the mean but not the variance. Suppose now that you buy a $1 ticket on each of two different days. The payoffs X and Y on the two tickets are independent because separate drawings are held each day. Your total payoff is X ⫹ Y. Let’s find the mean and standard deviation for this payoff.
EXAMPLE 4.37 Two tickets. The mean for the payoff for the two tickets is mX⫹Y ⫽ mX ⫹ mY ⫽ $0.50 ⫹ $0.50 ⫽ $1.00 Because X and Y are independent, the variance of X ⫹ Y is s2X⫹Y ⫽ s2X ⫹ s2Y ⫽ 249.75 ⫹ 249.75 ⫽ 499.5 The standard deviation of the total payoff is sX⫹Y ⫽ 2499.5 ⫽ $22.35 This is not the same as the sum of the individual standard deviations, which is $15.80 ⫹ $15.80 ⫽ $31.60. Variances of independent random variables add; standard deviations do not. When we add random variables that are correlated, we need to use the correlation for the calculation of the variance, but not for the calculation of the mean. Here is an example.
EXAMPLE 4.38 Utility bills. Consider a household where the monthly bill for natural gas averages $125 with a standard deviation of $75, while the monthly bill for electricity averages $174 with a standard deviation of $41. The correlation between the two bills is ⫺0.55. Let’s compute the mean and standard deviation of the sum of the naturalgas bill and the electricity bill. We let X stand for the natural-gas bill and Y stand for the electricity bill. Then the total is X ⫹ Y. Using the rules for means, we have mX⫹Y ⫽ mX ⫹ mY ⫽ 125 ⫹ 174 ⫽ 299 To find the standard deviation we first find the variance and then take the square root to determine the standard deviation. From the general addition rule for variances of random variables, s2X⫹Y ⫽ s2X ⫹ s2Y ⫹ 2rsXsY ⫽ 1752 2 ⫹ 1412 2 ⫹ 122 1⫺0.552 1752 1412 ⫽ 3923.5
278
CHAPTER 4
•
Probability: The Study of Randomness Therefore, the standard deviation is sX⫹Y ⫽ 23923.5 ⫽ 63 The total of the natural-gas bill and the electricity bill has mean $299 and standard deviation $63.
The negative correlation in Example 4.38 is due to the fact that, in this household, natural gas is used for heating and electricity is used for air-conditioning. So, when it is warm, the electricity charges are high and the natural-gas charges are low. When it is cool, the reverse is true. This causes the standard deviation of the sum to be less than it would be if the two bills were uncorrelated (see Exercise 4.83, on page 281). There are situations where we need to combine several of our rules to find means and standard deviations. Here is an example.
EXAMPLE 4.39 Calcium intake. To get enough calcium for optimal bone health, tablets containing calcium are often recommended to supplement the calcium in the diet. One study designed to evaluate the effectiveness of a supplement followed a group of young people for seven years. Each subject was assigned to take either a tablet containing 1000 milligrams of calcium per day (mg/d) or a placebo tablet that was identical except that it had no calcium.18 A major problem with studies like this one is compliance: subjects do not always take the treatments assigned to them. In this study, the compliance rate declined to about 47% toward the end of the seven-year period. The standard deviation of compliance was 22%. Calcium from the diet averaged 850 mg/d with a standard deviation of 330 mg/d. The correlation between compliance and dietary intake was 0.68. Let’s find the mean and standard deviation for the total calcium intake. We let S stand for the intake from the supplement and D stand for the intake from the diet. We start with the intake from the supplement. Since the compliance is 47% and the amount in each tablet is 1000 mg, the mean for S is mS ⫽ 100010.472 ⫽ 470 Since the standard deviation of the compliance is 22%, the variance of S is s2S ⫽ 10002 10.222 2 ⫽ 48, 400 The standard deviation is sS ⫽ 248, 400 ⫽ 220 Be sure to verify which rules for means and variances are used in these calculations. We can now find the mean and standard deviation for the total intake. The mean is mS⫹D ⫽ mS ⫹ mD ⫽ 470 ⫹ 850 ⫽ 1320
4.4 Means and Variances of Random Variables
279
and the variance is s2S⫹D ⫽ s2S ⫹ s2D ⫹ 2rsSsD ⫽ 12202 2 ⫹ 13302 2 ⫹ 210.682 12202 13302 ⫽ 256, 036 and the standard deviation is sS⫹D ⫽ 2256, 036 ⫽ 506 The mean of the total calcium intake is 1320 mg/d and the standard deviation is 506 mg/d. The correlation in this example illustrates an unfortunate fact about compliance and having an adequate diet. Some of the subjects in this study have diets that provide an adequate amount of calcium while others do not. The positive correlation between compliance and dietary intake tells us that those who have relatively high dietary intakes are more likely to take the assigned supplements. On the other hand, those subjects with relatively low dietary intakes, the ones who need the supplement the most, are less likely to take the assigned supplements.
SECTION 4.4 Summary The probability distribution of a random variable X, like a distribution of data, has a mean MX and a standard deviation SX. The law of large numbers says that the average of the values of X observed in many trials must approach m. The mean M is the balance point of the probability histogram or density curve. If X is discrete with possible values xi having probabilities pi, the mean is the average of the values of X, each weighted by its probability: mX ⫽ x1 p1 ⫹ x2 p2 ⫹ p ⫹ xk pk The variance s2X is the average squared deviation of the values of the variable from their mean. For a discrete random variable, s2X ⫽ 1x1 ⫺ mX 2 2p1 ⫹ 1x2 ⫺ mX 2 2p2 ⫹ p ⫹ 1xk ⫺ mX 2 2pk The standard deviation SX is the square root of the variance. The standard deviation measures the variability of the distribution about the mean. It is easiest to interpret for Normal distributions. The mean and variance of a continuous random variable can be computed from the density curve, but to do so requires more advanced mathematics. The means and variances of random variables obey the following rules. If a and b are fixed numbers, then ma⫹bX ⫽ a ⫹ bmX s2a⫹bX ⫽ b2s2X If X and Y are any two random variables having correlation r, then mX⫹Y ⫽ mX ⫹ mY mX⫺Y ⫽ mX ⫺ mY s2X⫹Y ⫽ s2X ⫹ s2Y ⫹ 2rsXsY s2X⫺Y ⫽ s2X ⫹ s2Y ⫺ 2rsXsY
280
CHAPTER 4
•
Probability: The Study of Randomness If X and Y are independent, then r ⫽ 0. In this case, s2X⫹Y ⫽ s2X ⫹ s2Y s2X⫺Y ⫽ s2X ⫹ s2Y To find the standard deviation, take the square root of the variance.
SECTION 4.4 Exercises For Exercise 4.67, see page 265; for Exercise 4.68, see page 269; for Exercises 4.69 and 4.70, see page 273; and for Exercise 4.71, see page 274. 4.72 Find the mean of the random variable. A random variable X has the following distribution. X
21
0
1
2
Probability
0.3
0.2
0.2
0.3
Find the mean for this random variable. Show your work. 4.73 Explain what happens when the sample size gets large. Consider the following scenarios: (1) You take a sample of two observations on a random variable and compute the sample mean, (2) you take a sample of 100 observations on the same random variable and compute the sample mean, (3) you take a sample of 1000 observations on the same random variable and compute the sample mean. Explain in simple language how close you expect the sample mean to be to the mean of the random variable as you move from Scenario 1 to Scenario 2 to Scenario 3. 4.74 Find some means. Suppose that X is a random variable with mean 20 and standard deviation 5. Also suppose that Y is a random variable with mean 40 and standard deviation 10. Find the mean of the random variable Z for each of the following cases. Be sure to show your work. (a) Z ⫽ 2 ⫹ 10X.
4.76 Find some variances and standard deviations. Suppose that X is a random variable with mean 20 and standard deviation 5. Also suppose that Y is a random variable with mean 40 and standard deviation 10. Find the variance and standard deviation of the random variable Z for each of the following cases. Be sure to show your work. (a) Z ⫽ 2 ⫹ 10X. (b) Z ⫽ 10X ⫺ 2. (c) Z ⫽ X ⫹ Y. (d) Z ⫽ X ⫺ Y. (e) Z ⫽ ⫺3X ⫺ 2Y. 4.77 What happens if the correlation is not zero? Suppose that X is a random variable with mean 20 and standard deviation 5. Also suppose that Y is a random variable with mean 40 and standard deviation 10. Assume that the correlation between X and Y is 0.5. Find the mean of the random variable Z for each of the following cases. Be sure to show your work. (a) Z ⫽ 2 ⫹ 10X. (b) Z ⫽ 10X ⫺ 2. (c) Z ⫽ X ⫹ Y. (d) Z ⫽ X ⫺ Y. (e) Z ⫽ ⫺3X ⫺ 2Y. 4.78 What’s wrong? In each of the following scenarios, there is something wrong. Describe what is wrong and give a reason for your answer.
(b) Z ⫽ 10X ⫺ 2. (c) Z ⫽ X ⫹ Y.
(a) If you toss a fair coin three times and get heads all three times, then the probability of getting a tail on the next toss is much greater than one-half.
(d) Z ⫽ X ⫺ Y. (e) Z ⫽ ⫺3X ⫺ 2Y. 4.75 Find the variance and the standard deviation. A random variable X has the following distribution. X
21
0
1
2
Probability
0.3
0.2
0.2
0.3
Find the variance and the standard deviation for this random variable. Show your work.
(b) If you multiply a random variable by 10, then the mean is multiplied by 10 and the variance is multiplied by 10. (c) When finding the mean of the sum of two random variables, you need to know the correlation between them. 4.79 Servings of fruits and vegetables. The following table gives the distribution of the number of servings of fruits and vegetables consumed per day in a population.
4.4 Means and Variances of Random Variables
Number of servings X Probability
0
1
2
3
4
5
0.3
0.1
0.1
0.2
0.2
0.1
Find the mean for this random variable. 4.80 Mean of the distribution for the number of aces. In Exercise 4.58 (page 262) you examined the probability distribution for the number of aces when you are dealt two cards in the game of Texas hold ’em. Let X represent the number of aces in a randomly selected deal of two cards in this game. Here is the probability distribution for the random variable X: Value of X
0
1
Probability
0.8507
0.1448
2 0.0045
Find mX, the mean of the probability distribution of X. 4.81 Standard deviation of the number of aces. Refer to Exercise 4.80. Find the standard deviation of the number of aces. 4.82 Standard deviation for fruits and vegetables. Refer to Exercise 4.79. Find the variance and the standard deviation for the distribution of the number of servings of fruits and vegetables. 4.83 Suppose that the correlation is zero. Refer to Example 4.38 (page 277).
281
4.86 Toss a four-sided die twice. Role-playing games like Dungeons & Dragons use many different types of dice. Suppose that a four-sided die has faces marked 1, 2, 3, and 4. The intelligence of a character is determined by rolling this die twice and adding 1 to the sum of the spots. The faces are equally likely and the two rolls are independent. What is the average (mean) intelligence for such characters? How spread out are their intelligences, as measured by the standard deviation of the distribution? 4.87 Means and variances of sums. The rules for means and variances allow you to find the mean and variance of a sum of random variables without first finding the distribution of the sum, which is usually much harder to do. (a) A single toss of a balanced coin has either 0 or 1 head, each with probability 1/2. What are the mean and standard deviation of the number of heads? (b) Toss a coin four times. Use the rules for means and variances to find the mean and standard deviation of the total number of heads. (c) Example 4.23 (page 255) finds the distribution of the number of heads in four tosses. Find the mean and standard deviation from this distribution. Your results in parts (b) and (c) should agree.
(b) Is this standard deviation larger or smaller than the standard deviation computed in Example 4.38? Explain why.
4.88 What happens when the correlation is 1? We know that variances add if the random variables involved are uncorrelated (r ⫽ 02, but not otherwise. The opposite extreme is perfect positive correlation (r ⫽ 12. Show by using the general addition rule for variances that in this case the standard deviations add. That is, sX⫹Y ⫽ sX ⫹ sY if rXY ⫽ 1.
4.84 Find the mean of the sum. Figure 4.12 (page 263) displays the density curve of the sum Y ⫽ X1 ⫹ X2 of two independent random numbers, each uniformly distributed between 0 and 1.
4.89 Will you assume independence? In which of the following games of chance would you be willing to assume independence of X and Y in making a probability model? Explain your answer in each case.
(a) The mean of a continuous random variable is the balance point of its density curve. Use this fact to find the mean of Y from Figure 4.12.
(a) In blackjack, you are dealt two cards and examine the total points X on the cards (face cards count 10 points). You can choose to be dealt another card and compete based on the total points Y on all three cards.
(a) Recompute the standard deviation for the total of the natural-gas bill and the electricity bill assuming that the correlation is zero.
(b) Use the same fact to find the means of X1 and X2. (They have the density curve pictured in Figure 4.9, page 258.) Verify that the mean of Y is the sum of the mean of X1 and the mean of X2.
(b) In craps, the betting is based on successive rolls of two dice. X is the sum of the faces on the first roll, and Y the sum of the faces on the next roll.
4.85 Calcium supplements and calcium in the diet. Refer to Example 4.39 (page 278). Suppose that people who have high intakes of calcium in their diets are more compliant than those who have low intakes. What effect would this have on the calculation of the standard deviation for the total calcium intake? Explain your answer.
4.90 Transform the distribution of heights from centimeters to inches. A report of the National Center for Health Statistics says that the heights of 20-year-old men have mean 176.8 centimeters (cm) and standard deviation 7.2 cm. There are 2.54 centimeters in an inch. What are the mean and standard deviation in inches?
282
CHAPTER 4
•
Probability: The Study of Randomness
Insurance. The business of selling insurance is based on probability and the law of large numbers. Consumers buy insurance because we all face risks that are unlikely but carry high cost. Think of a fire destroying your home. So we form a group to share the risk: we all pay a small amount, and the insurance policy pays a large amount to those few of us whose homes burn down. The insurance company sells many policies, so it can rely on the law of large numbers. Exercises 4.91 to 4.94 explore aspects of insurance. 4.91 Fire insurance. An insurance company looks at the records for millions of homeowners and sees that the mean loss from fire in a year is m ⫽ $300 per person. (Most of us have no loss, but a few lose their homes. The $300 is the average loss.) The company plans to sell fire insurance for $300 plus enough to cover its costs and profit. Explain clearly why it would be stupid to sell only 10 policies. Then explain why selling thousands of such policies is a safe business. 4.92 Mean and standard deviation for 10 and for 12 policies. In fact, the insurance company sees that in the entire population of homeowners, the mean loss from fire is m ⫽ $300 and the standard deviation of the loss is s ⫽ $400. What are the mean and standard deviation of the average loss for 10 policies? (Losses on separate policies are independent.) What are the mean and standard deviation of the average loss for 12 policies? 4.93 Life insurance. Assume that a 25-year-old man has these probabilities of dying during the next five years:
Age at death
25
26
27
28
29
Probability
0.00039
0.00044
0.00051
0.00057
0.00060
(a) What is the probability that the man does not die in the next five years? (b) An online insurance site offers a term insurance policy that will pay $100,000 if a 25-year-old man dies within the next five years. The cost is $175 per year. So the insurance company will take in $875 from this policy if the man does not die within five years. If he does die, the company must pay $100,000. Its loss depends on how many premiums the man paid, as follows: Age at death Loss
25
26
27
28
29
$99,825
$99,650
$99,475
$99,300
$99,125
What is the insurance company’s mean cash intake from such polices? 4.94 Risk for one versus thousands of life insurance policies. It would be quite risky for you to insure the life of a 25-year-old friend under the terms of Exercise 4.93. There is a high probability that your friend would live and you would gain $875 in premiums. But if he were to die, you would lose almost $100,000. Explain carefully why selling insurance is not risky for an insurance company that insures many thousands of 25-yearold men.
4.5 General Probability Rules When you complete this section, you will be able to • Apply the five rules of probability. • Apply the general addition rule for unions of two or more events. • Find conditional probabilities. • Apply the multiplication rule. • Use a tree diagram to find probabilities. • Use Bayes’s rule to find probabilities. • Determine whether or not two events that both have positive probability are independent.
Our study of probability has concentrated on random variables and their distributions. Now we return to the laws that govern any assignment of probabilities. The purpose of learning more laws of probability is to be able to give probability models for more complex random phenomena. We have already met and used five rules.
4.5 General Probability Rules
283
PROBABILITY RULES Rule 1. 0 ⱕ P1A2 ⱕ 1 for any event A Rule 2. P1S2 ⫽ 1 Rule 3. Addition rule: If A and B are disjoint events, then P1A or B2 ⫽ P1A2 ⫹ P1B2 Rule 4. Complement rule: For any event A, P1Ac 2 ⫽ 1 ⫺ P1A2 Rule 5. Multiplication rule: If A and B are independent events, then P1A and B2 ⫽ P1A2P1B2
General addition rules Probability has the property that if A and B are disjoint events, then P1A or B2 ⫽ P1A2 ⫹ P1B2. What if there are more than two events, or if the events are not disjoint? These circumstances are covered by more general addition rules for probability.
UNION The union of any collection of events is the event that at least one of the collection occurs. For two events A and B, the union is the event {A or B} that A or B or both occur. From the addition rule for two disjoint events we can obtain rules for more general unions. Suppose first that we have several events—say A, B, and C—that are disjoint in pairs. That is, no two can occur simultaneously. The Venn diagram in Figure 4.15 illustrates three disjoint events. The addition rule for two disjoint events extends to the following law.
ADDITION RULE FOR DISJOINT EVENTS If events A, B, and C are disjoint in the sense that no two have any outcomes in common, then P1one or more of A, B, C2 ⫽ P1A2 ⫹ P1B2 ⫹ P1C2 This rule extends to any number of disjoint events. FIGURE 4.15 The addition rule for disjoint events: P(A or B or C ) 5 P(A) 1 P(B ) 1 P(C ) when events A, B, and C are disjoint.
S C A B
284
CHAPTER 4
•
Probability: The Study of Randomness
EXAMPLE 4.40 Probabilities as areas. Generate a random number X between 0 and 1. What is the probability that the first digit after the decimal point will be odd? The random number X is a continuous random variable whose density curve has constant height 1 between 0 and 1 and is 0 elsewhere. The event that the first digit of X is odd is the union of five disjoint events. These events are 0.10 0.30 0.50 0.70 0.90
ⱕ ⱕ ⱕ ⱕ ⱕ
X X X X X
⬍ ⬍ ⬍ ⬍ ⬍
0.20 0.40 0.60 0.80 1.00
Figure 4.16 illustrates the probabilities of these events as areas under the density curve. Each area is 0.1. The union of the five therefore has probability equal to the sum, or 0.5. As we should expect, a random number is equally likely to begin with an odd or an even digit. 1
0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
1
FIGURE 4.16 The probability that the first digit after the decimal point of a random number is odd is the sum of the probabilities of the 5 disjoint events shown. See Example 4.40.
USE YOUR KNOWLEDGE 4.95 Probability that you roll a 2 or a 4 or a 5. If you roll a die, the probability of each of the six possible outcomes (1, 2, 3, 4, 5, 6) is 1/6. What is the probability that you roll a 2 or a 4 or a 5? If events A and B are not disjoint, they can occur simultaneously. The probability of their union is then less than the sum of their probabilities. As Figure 4.17 suggests, the outcomes common to both are counted twice when we add probabilities, so we must subtract this probability once. Here is the addition rule for the union of any two events, disjoint or not.
4.5 General Probability Rules S
A and B
A
FIGURE 4.17 The union
285
B
of two events that are not disjoint. The general addition rule says that P(A or B) 5 P(A) 1 P(B) 2 P(A and B).
GENERAL ADDITION RULE FOR UNIONS OF TWO EVENTS For any two events A and B, P1A or B2 ⫽ P1A2 ⫹ P1B2 ⫺ P1A and B2
If A and B are disjoint, the event {A or B} that both occur has no outcomes in it. This empty event is the complement of the sample space S and must have probability 0. So the general addition rule includes Rule 3, the addition rule for disjoint events.
EXAMPLE 4.41 Adequate sleep and exercise. Suppose that 40% of adults get enough sleep and 46% exercise regularly. What is the probability that an adult gets enough sleep or exercises regularly? To find this probability, we also need to know the percent who get enough sleep and exercise. Let’s assume that 24% do both. We will use the notation of the general addition rule for unions of two events. Let A be the event that an adult gets enough sleep and let B be the event that a person exercises regularly. We are given that P1A2 ⫽ 0.40, P1B2 ⫽ 0.46, and P1A and B2 ⫽ 0.24. Therefore, P1A or B2 ⫽ P1A2 ⫹ P1B2 ⫺ P1A and B2 ⫽ 0.40 ⫹ 0.46 ⫺ 0.24 ⫽ 0.62 The probability that an adult gets enough sleep or exercises regularly is 0.62, or 62%. USE YOUR KNOWLEDGE 4.96 Probability that your roll is odd or greater than 4. If you roll a die, the probability of each of the six possible outcomes (1, 2, 3, 4, 5, 6) is 1/6. What is the probability that your roll is odd or greater than 4? Venn diagrams are a great help in finding probabilities for unions because you can just think of adding and subtracting areas. Figure 4.18 shows some events and their probabilities for Example 4.41. What is the probability that an adult gets adequate sleep and does not exercise?
286
CHAPTER 4
•
Probability: The Study of Randomness
Sleep No Exercise No 0.38
Sleep Yes Exercise No 0.16
Sleep No Exercise Yes 0.22 Sleep Yes Exercise Yes 0.24
FIGURE 4.18 Venn diagram and probabilities for Example 4.41.
The Venn diagram shows that this is the probability that an adult gets adequate sleep minus the probability that an adult gets adequate sleep and exercises regularly, 0.40 ⫺ 0.24 ⫽ 0.16. Similarly, the probability that an adult does not get adequate sleep and exercises regularly is 0.46 ⫺ 0.24 ⫽ 0.22. The four probabilities that appear in the figure add to 1 because they refer to four disjoint events whose union is the entire sample space.
Conditional probability The probability we assign to an event can change if we know that some other event has occurred. This idea is the key to many applications of probability.
EXAMPLE 4.42 Probability of being dealt an ace. Slim is a professional poker player. He stares at the dealer, who prepares to deal. What is the probability that the card dealt to Slim is an ace? There are 52 cards in the deck. Because the deck was carefully shuffled, the next card dealt is equally likely to be any of the cards that Slim has not seen. Four of the 52 cards are aces. So P1ace2 ⫽
4 1 ⫽ 52 13
This calculation assumes that Slim knows nothing about any cards already dealt. Suppose now that he is looking at 4 cards already in his hand, and that one of them is an ace. He knows nothing about the other 48 cards except that exactly 3 aces are among them. Slim’s probability of being dealt an ace given what he knows is now P1ace 0 1 ace in 4 visible cards2 ⫽
3 1 ⫽ 48 16
Knowing that there is 1 ace among the 4 cards Slim can see changes the probability that the next card dealt is an ace. conditional probability
The new notation P1A 0 B2 is a conditional probability. That is, it gives the probability of one event (the next card dealt is an ace) under the condition that we know another event (exactly 1 of the 4 visible cards is an ace). You can read the bar 0 as “given the information that.”
4.5 General Probability Rules
287
MULTIPLICATION RULE The probability that both of two events A and B happen together can be found by P1A and B2 ⫽ P1A2P1B 0 A2 Here P1B 0 A2 is the conditional probability that B occurs, given the information that A occurs.
USE YOUR KNOWLEDGE 4.97 The probability of another ace. Refer to Example 4.42. Suppose that two of the four cards in Slim’s hand are aces. What is the probability that the next card dealt to him is an ace?
EXAMPLE 4.43 Downloading music from the Internet. The multiplication rule is just common sense made formal. For example, suppose that 29% of Internet users download music files, and 67% of downloaders say they don’t care if the music is copyrighted. So the percent of Internet users who download music (event A2 and don’t care about copyright (event B2 is 67% of the 29% who download, or 10.672 10.292 ⫽ 0.1943 ⫽ 19.43% The multiplication rule expresses this as P1A and B2 ⫽ P1A2 ⫻ P1B 0 A2 ⫽ 10.292 10.672 ⫽ 0.1943 Here is another example that uses conditional probability.
EXAMPLE 4.44 Probability of a favorable draw. Slim is still at the poker table. At the moment, he wants very much to draw two diamonds in a row. As he sits at the table looking at his hand and at the upturned cards on the table, Slim sees 11 cards. Of these, 4 are diamonds. The full deck contains 13 diamonds among its 52 cards, so 9 of the 41 unseen cards are diamonds. To find Slim’s probability of drawing two diamonds, first calculate 9 41 8 P1second card diamond 0 first card diamond2 ⫽ 40 P1first card diamond2 ⫽
Slim finds both probabilities by counting cards. The probability that the first card drawn is a diamond is 9/41 because 9 of the 41 unseen cards are diamonds. If the first card is a diamond, that leaves 8 diamonds among the
288
CHAPTER 4
•
Probability: The Study of Randomness 40 remaining cards. So the conditional probability of another diamond is 8/40. The multiplication rule now says that P1both cards diamonds2 ⫽
9 8 ⫻ ⫽ 0.044 41 40
Slim will need luck to draw his diamonds. USE YOUR KNOWLEDGE 4.98 The probability that the next two cards are diamonds. In the setting of Example 4.42, suppose that Slim sees 23 cards and the only diamonds are the 3 in his hand. What is the probability that the next 2 cards dealt to Slim will be diamonds? This outcome would give him 5 cards from the same suit, a hand that is called a flush. If P1A2 and P1A and B2 are given, we can rearrange the multiplication rule to produce a definition of the conditional probability P1B 0 A2 in terms of unconditional probabilities.
DEFINITION OF CONDITIONAL PROBABILITY When P1A2 ⬎ 0, the conditional probability of B given A is P1B 0 A2 ⫽
P1A and B2 P1A2
Be sure to keep in mind the distinct roles in P1B 0 A2 of the event B whose probability we are computing and the event A that represents the information we are given. The conditional probability P1B 0 A2 makes no sense if the event A can never occur, so we require that P1A2 ⬎ 0 whenever we talk about P1B 0 A2.
EXAMPLE 4.45 College students. Here is the distribution of U.S. college students classified by age and full-time or part-time status: Age (years)
Full-time
Part-time
15 to 19
0.21
0.02
20 to 24
0.32
0.07
25 to 29
0.10
0.10
30 and over
0.05
0.13
Let’s compute the probability that a student is aged 15 to 19, given that the student is full-time. We know that the probability that a student is full-time and aged 15 to 19 is 0.21 from the table of probabilities. But what we want here is a conditional probability, given that a student is full-time. Rather than asking
4.5 General Probability Rules
289
about age among all students, we restrict our attention to the subpopulation of students who are full-time. Let A ⫽ the student is between 15 and 19 years of age B ⫽ the student is a full-time student Our formula is P1A 0 B2 ⫽
P1A and B2 P1B2
We read P1A and B2 ⫽ 0.21 from the table as we mentioned previously. What about P1B2? This is the probability that a student is full-time. Notice that there are four groups of students in our table that fit this description. To find the probability needed, we add the entries: P1B2 ⫽ 0.21 ⫹ 0.32 ⫹ 0.10 ⫹ 0.05 ⫽ 0.68 We are now ready to complete the calculation of the conditional probability: P1A and B2 P1B2 0.21 ⫽ 0.68 ⫽ 0.31
P1A 0 B2 ⫽
The probability that a student is 15 to 19 years of age, given that the student is full-time, is 0.31. Here is another way to give the information in the last sentence of this example: 31% of full-time college students are 15 to 19 years old. Which way do you prefer? USE YOUR KNOWLEDGE 4.99
What rule did we use? In Example 4.45, we calculated P1B2. What rule did we use for this calculation? Explain why this rule applies in this setting.
4.100 Find the conditional probability. Refer to Example 4.45. What is the probability that a student is part-time, given that the student is 15 to 19 years old? Explain in your own words the difference between this calculation and the one that we did in Example 4.45.
General multiplication rules The definition of conditional probability reminds us that in principle all probabilities, including conditional probabilities, can be found from the assignment of probabilities to events that describe random phenomena. More often, however, conditional probabilities are part of the information given to us in a probability model, and the multiplication rule is used to compute P1A and B2. This rule extends to more than two events. The union of a collection of events is the event that any of them occur. Here is the corresponding term for the event that all of them occur.
290
CHAPTER 4
•
Probability: The Study of Randomness
INTERSECTION The intersection of any collection of events is the event that all the events occur. To extend the multiplication rule to the probability that all of several events occur, the key is to condition each event on the occurrence of all the preceding events. For example, the intersection of three events A, B, and C has probability P1A and B and C2 ⫽ P1A2P1B 0 A2P1C 0 A and B2
EXAMPLE 4.46 High school athletes and professional careers. Only 5% of male high school basketball, baseball, and football players go on to play at the college level. Of these, only 1.7% enter major league professional sports. About 40% of the athletes who compete in college and then reach the pros have a career of more than three years. Define these events: A ⫽ 5competes in college6 B ⫽ 5competes professionally6 C ⫽ 5pro career longer than 3 years6 What is the probability that a high school athlete competes in college and then goes on to have a pro career of more than three years? We know that P1A2 ⫽ 0.05 P1B 0 A2 ⫽ 0.017 P1C 0 A and B2 ⫽ 0.4 The probability we want is therefore P1A and B and C2 ⫽ P1A2P1B 0 A2P1C 0 A and B2 ⫽ 0.05 ⫻ 0.017 ⫻ 0.4 ⫽ 0.00034 Only about 3 of every 10,000 high school athletes can expect to compete in college and have a professional career of more than three years. High school students would be wise to concentrate on studies rather than on unrealistic hopes of fortune from pro sports.
Tree diagrams Probability problems often require us to combine several of the basic rules into a more elaborate calculation. Here is an example that illustrates how to solve problems that have several stages.
EXAMPLE 4.47 Online chat rooms. Online chat rooms are dominated by the young. Teens are the biggest users. If we look only at adult Internet users (aged 18 and over), 47% of the 18 to 29 age group chat, as do 21% of the 30 to 49 age group and just 7% of those 50 and over. To learn what percent of all
4.5 General Probability Rules
tree diagram
291
Internet users participate in chat, we also need the age breakdown of users. Here it is: 29% of adult Internet users are 18 to 29 years old (event A1 2, another 47% are 30 to 49 (event A2 2, and the remaining 24% are 50 and over (event A3 2. What is the probability that a randomly chosen adult user of the Internet participates in chat rooms (event C2? To find out, use the tree diagram in Figure 4.19 to organize your thinking. Each segment in the tree is one stage of the problem. Each complete branch shows a path through the two stages. The probability written on each segment is the conditional probability of an Internet user following that segment, given that he or she has reached the node from which it branches. Starting at the left, an Internet user falls into one of the three age groups. The probabilities of these groups P1A1 2 ⫽ 0.29 P1A2 2 ⫽ 0.47 P1A3 2 ⫽ 0.24 mark the leftmost branches in the tree. Conditional on being 18 to 29 years old, the probability of participating in chat is P1C 0 A1 2 ⫽ 0.47. So the conditional probability of not participating is P1C c 0 A1 2 ⫽ 1 ⫺ 0.47 ⫽ 0.53 These conditional probabilities mark the paths branching out from the A1 node in Figure 4.19. The other two age group nodes similarly lead to two branches marked with the conditional probabilities of chatting or not. The probabilities on the branches from any node add to 1 because they cover all possibilities, given that this node was reached. There are three disjoint paths to C, one for each age group. By the addition rule, P1C2 is the sum of their probabilities. The probability of reaching C through the 18 to 29 age group is P1C and A1 2 ⫽ P1A1 2P1C 0 A1 2 ⫽ 0.29 ⫻ 0.47 ⫽ 0.1363
FIGURE 4.19 Tree diagram for
0.47
Chat? C
Probability 0.1363*
0.53
Cc
0.1537
0.21
C
0.0987*
0.79
Cc
0.3713
0.07
C
0.0168*
0.93
Cc
0.2232
Age
Example 4.47. The probability P(C ) is the sum of the probabilities of the three branches marked with asterisks (*).
A1
0.29
Internet user
0.47
A2
0.24 A3
292
CHAPTER 4
•
Probability: The Study of Randomness Follow the paths to C through the other two age groups. The probabilities of these paths are P1C and A2 2 ⫽ P1A2 2P1C 0 A2 2 ⫽ 10.472 10.212 ⫽ 0.0987 P1C and A3 2 ⫽ P1A3 2P1C 0 A3 2 ⫽ 10.242 10.072 ⫽ 0.0168 The final result is P1C2 ⫽ 0.1363 ⫹ 0.0987 ⫹ 0.0168 ⫽ 0.2518 About 25% of all adult Internet users take part in chat rooms. It takes longer to explain a tree diagram than it does to use it. Once you have understood a problem well enough to draw the tree, the rest is easy. Tree diagrams combine the addition and multiplication rules. The multiplication rule says that the probability of reaching the end of any complete branch is the product of the probabilities written on its segments. The probability of any outcome, such as the event C that an adult Internet user takes part in chat rooms, is then found by adding the probabilities of all branches that are part of that event.
USE YOUR KNOWLEDGE 4.101 Draw a tree diagram. Refer to Slim’s chances of a flush in Exercise 4.98 (page 288). Draw a tree diagram to describe the outcomes for the two cards that he will be dealt. At the first stage, his draw can be a diamond or a nondiamond. At the second stage, he has the same possible outcomes but the probabilities are different.
Bayes’s rule There is another kind of probability question that we might ask in the context of thinking about online chat. What percent of adult chat room participants are aged 18 to 29?
EXAMPLE 4.48 Conditional versus unconditional probabilities. In the notation of Example 4.47 this is the conditional probability P1A1 0 C2. Start from the definition of conditional probability and then apply the results of Example 4.46: P1A1 and C2 P1C2 0.1363 ⫽ ⫽ 0.5413 0.2518
P1A1 0 C2 ⫽
Over half of adult chat room participants are between 18 and 29 years old. Compare this conditional probability with the original information (unconditional) that 29% of adult Internet users are between 18 and 29 years old. Knowing that a person chats increases the probability that he or she is young.
4.5 General Probability Rules
293
We know the probabilities P1A1 2, P1A2 2, and P1A3 2 that give the age distribution of adult Internet users. We also know the conditional probabilities P1C 0 A1 2, P1C 0 A2 2, and P1C 0 A3 2 that a person from each age group chats. Example 4.47 shows how to use this information to calculate P1C2. The method can be summarized in a single expression that adds the probabilities of the three paths to C in the tree diagram: P1C2 ⫽ P1A1 2P1C 0 A1 2 ⫹ P1A2 2P1C 0 A2 2 ⫹ P1A3 2P1C 0 A3 2 In Example 4.48 we calculated the “reverse” conditional probability P1A1 0 C2. The denominator 0.2518 in that example came from the previous expression. Put in this general notation, we have another probability law.
BAYES’S RULE Suppose that A1, A2, . . . , Ak are disjoint events whose probabilities are not 0 and add to exactly 1. That is, any outcome is in exactly one of these events. Then if C is any other event whose probability is not 0 or 1, P1Ai 0 C2 ⫽
P1C 0 Ai 2P1Ai 2 P1C 0 A1 2P1A1 2 ⫹ P1C 0 A2 2P1A2 2 ⫹ p ⫹ P1Ak 2P1C 0 Ak 2
The numerator in Bayes’s rule is always one of the terms in the sum that makes up the denominator. The rule is named after Thomas Bayes, who wrestled with arguing from outcomes like C back to the Ai in a book published in 1763. It is far better to think your way through problems like Examples 4.47 and 4.48 than to memorize these formal expressions.
Independence again The conditional probability P1B 0 A2 is generally not equal to the unconditional probability P1B2. That is because the occurrence of event A generally gives us some additional information about whether or not event B occurs. If knowing that A occurs gives no additional information about B, then A and B are independent events. The formal definition of independence is expressed in terms of conditional probability.
INDEPENDENT EVENTS Two events A and B that both have positive probability are independent if P1B 0 A2 ⫽ P1B2
This definition makes precise the informal description of independence given in Section 4.2. We now see that the multiplication rule for independent events, P1A and B2 ⫽ P1A2P1B2, is a special case of the general multiplication rule, P1A and B2 ⫽ P1A2P1B 0 A2, just as the addition rule for disjoint events is a special case of the general addition rule.
294
CHAPTER 4
•
Probability: The Study of Randomness
SECTION 4.5 Summary The complement Ac of an event A contains all outcomes that are not in A. The union {A or B} of events A and B contains all outcomes in A, in B, and in both A and B. The intersection {A and B} contains all outcomes that are in both A and B, but not outcomes in A alone or B alone. The conditional probability P1B 0 A2 of an event B, given an event A, is defined by P1B 0 A2 ⫽
P1A and B2 P1A2
when P1A2 ⬎ 0. In practice, conditional probabilities are most often found from directly available information. The essential general rules of elementary probability are Legitimate values: 0 ⱕ P1A2 ⱕ 1 for any event A Total probability 1: P1S2 ⫽ 1 Complement rule: P1Ac 2 ⫽ 1 ⫺ P1A2 Addition rule: P1A or B2 ⫽ P1A2 ⫹ P1B2 ⫺ P1A and B2 Multiplication rule: P1A and B2 ⫽ P1A2P1B 0 A2 If A and B are disjoint, then P1A and B2 ⫽ 0. The general addition rule for unions then becomes the special addition rule, P1A or B2 ⫽ P1A2 ⫹ P1B2. A and B are independent when P1B 0 A2 ⫽ P1B2. The multiplication rule for intersections then becomes P1A and B2 ⫽ P1A2P1B2. In problems with several stages, draw a tree diagram to organize use of the multiplication and addition rules.
SECTION 4.5 Exercises For Exercise 4.95, see page 284; for Exercise 4.96, see page 285; for Exercise 4.97, see page 287; for Exercise 4.98, see page 288; for Exercises 4.99 and 4.100, see page 289; and for Exercise 4.101, see page 292. 4.102 Find and explain some probabilities. (a) Can we have an event A that has negative probability? Explain your answer. (b) Suppose P1A2 ⫽ 0.2 and P1B2 ⫽ 0.4. Explain what it means for A and B to be disjoint. Assuming that they are disjoint, find the probability that A or B occurs. (c) Explain in your own words the meaning of the rule P1S2 ⫽ 1. (d) Consider an event A. What is the name for the event that A does not occur? If P1A2 ⫽ 0.3, what is the probability that A does not occur? (e) Suppose that A and B are independent and that P1A2 ⫽ 0.2 and P1B2 ⫽ 0.5. Explain the meaning of the event {A and B}, and find its probability.
4.103 Unions. (a) Assume that P1A2 ⫽ 0.4, P1B2 ⫽ 0.3, and P1C2 ⫽ 0.1. If the events A, B, and C are disjoint, find the probability that the union of these events occurs. (b) Draw a Venn diagram to illustrate your answer to part (a). (c) Find the probability of the complement of the union of A, B, and C. 4.104 Conditional probabilities. Suppose that P1A2 ⫽ 0.5, P1B2 ⫽ 0.3, and P1B 0 A2 ⫽ 0.2. (a) Find the probability that both A and B occur. (b) Use a Venn diagram to explain your calculation. (c) What is the probability of the event that B occurs and A does not? 4.105 Find the probabilities. Suppose that the probability that A occurs is 0.6 and the probability that A and B occur is 0.5.
4.5 General Probability Rules (a) Find the probability that B occurs given that A occurs. (b) Illustrate your calculations in part (a) using a Venn diagram. 4.106 Why not? Suppose that P1A2 ⫽ 0.4. Explain why P1A and B2 cannot be 0.5. 4.107 Is the calcium intake adequate? In the population of young children eligible to participate in a study of whether or not their calcium intake is adequate, 52% are 5 to 10 years of age and 48% are 11 to 13 years of age. For those who are 5 to 10 years of age, 18% have inadequate calcium intake. For those who are 11 to 13 years of age, 57% have inadequate calcium intake.19 (a) Use letters to define the events of interest in this exercise. (b) Convert the percents given to probabilities of the events you have defined. (c) Use a tree diagram similar to Figure 4.19 (page 291) to calculate the probability that a randomly selected child from this population has an inadequate intake of calcium. 4.108 Use Bayes’s rule. Refer to the previous exercise. Use Bayes’s rule to find the probability that a child from this population who has inadequate intake is 11 to 13 years old. 4.109 Are the events independent? Refer to the previous two exercises. Are the age of the child and whether or not the child has adequate calcium intake independent? Calculate the probabilities that you need to answer this question, and write a short summary of your conclusion. 4.110 What’s wrong? In each of the following scenarios, there is something wrong. Describe what is wrong and give a reason for your answer. (a) P1A or B2 is always equal to the sum of P1A2 and P1B2. (b) The probability of an event minus the probability of its complement is always equal to 1. (c) Two events are disjoint if P1B 0 A2 ⫽ P1B2. 4.111 Exercise and sleep. Suppose that 40% of adults get enough sleep, 46% get enough exercise, and 24% do both. Find the probabilities of the following events: (a) enough sleep and not enough exercise (b) not enough sleep and enough exercise (c) not enough sleep and not enough exercise (d) For each of parts (a), (b), and (c), state the rule that you used to find your answer. 4.112 Exercise and sleep. Refer to the previous exercise. Draw a Venn diagram showing the probabilities for exercise and sleep.
295
4.113 Lying to a teacher. Suppose that 48% of high school students would admit to lying at least once to a teacher during the past year and that 25% of students are male and would admit to lying at least once to a teacher during the past year.20 Assume that 50% of the students are male. What is the probability that a randomly selected student is either male or would admit to lying to a teacher, during the past year? Be sure to show your work and indicate all the rules that you use to find your answer. 4.114 Lying to a teacher. Refer to the previous exercise. Suppose that you select a student from the subpopulation of those who would admit to lying to a teacher during the past year. What is the probability that the student is female? Be sure to show your work and indicate all the rules that you use to find your answer. 4.115 Attendance at two-year and four-year colleges. In a large national population of college students, 61% attend four-year institutions and the rest attend two-year institutions. Males make up 44% of the students in the four-year institutions and 41% of the students in the twoyear institutions. (a) Find the four probabilities for each combination of gender and type of institution in the following table. Be sure that your probabilities sum to 1. Men
Women
Four-year institution Two-year institution
(b) Consider randomly selecting a female student from this population. What is the probability that she attends a four-year institution? 4.116 Draw a tree diagram. Refer to the previous exercise. Draw a tree diagram to illustrate the probabilities in a situation where you first identify the type of institution attended and then identify the gender of the student. 4.117 Draw a different tree diagram for the same setting. Refer to the previous two exercises. Draw a tree diagram to illustrate the probabilities in a situation where you first identify the gender of the student and then identify the type of institution attended. Explain why the probabilities in this tree diagram are different from those that you used in the previous exercise. 4.118 Education and income. Call a household prosperous if its income exceeds $100,000. Call the household educated if the householder completed college. Select an American household at random, and let A be the event that the selected household is prosperous and B the event
296
CHAPTER 4
•
Probability: The Study of Randomness
that it is educated. According to the Current Population Survey, P1A2 ⫽ 0.138, P1B2 ⫽ 0.261, and the probability that a household is both prosperous and educated is P1A and B2 ⫽ 0.082. What is the probability P1A or B2 that the household selected is either prosperous or educated? 4.119 Find a conditional probability. In the setting of the previous exercise, what is the conditional probability that a household is prosperous, given that it is educated? Explain why your result shows that events A and B are not independent. 4.120 Draw a Venn diagram. Draw a Venn diagram that shows the relation between the events A and B in Exercise 4.118. Indicate each of the following events on your diagram and use the information in Exercise 4.118 to calculate the probability of each event. Finally, describe in words what each event is. (a) {A and B} (b) {Ac and B}
Make a Venn diagram of the events A, B, and C. As in Figure 4.18 (page 286), mark the probabilities of every intersection involving these events and their complements. Use this diagram for Exercises 4.123 to 4.125. 4.123 Find the probability of at least one offer. What is the probability that Julie is offered at least one of the three jobs? 4.124 Find the probability of another event. What is the probability that Julie is offered both the Connecticut and New Jersey jobs, but not the federal job? 4.125 Find a conditional probability. If Julie is offered the federal job, what is the conditional probability that she is also offered the New Jersey job? If Julie is offered the New Jersey job, what is the conditional probability that she is also offered the federal job? 4.126 Academic degrees and gender. Here are the projected numbers (in thousands) of earned degrees in the United States in the 2010–2011 academic year, classified by level and by the sex of the degree recipient:21
(c) {A and Bc } Bachelor’s
(d) {Ac and Bc}
Master’s
Professional
Doctorate
Female
933
502
51
26
4.121 Sales of cars and light trucks. Motor vehicles sold to individuals are classified as either cars or light trucks (including SUVs) and as either domestic or imported. In a recent year, 69% of vehicles sold were light trucks, 78% were domestic, and 55% were domestic light trucks. Let A be the event that a vehicle is a car and B the event that it is imported. Write each of the following events in set notation and give its probability.
Male
661
260
44
26
(a) The vehicle is a light truck.
(c) What is the conditional probability that you choose a woman, given that the person chosen received a professional degree?
(b) The vehicle is an imported car.
(a) Convert this table to a table giving the probabilities for selecting a degree earned and classifying the recipient by gender and the degree by the levels given above. (b) If you choose a degree recipient at random, what is the probability that the person you choose is a woman?
4.122 Job offers. Julie is graduating from college. She has studied biology, chemistry, and computing and hopes to work as a forensic scientist applying her science background to crime investigation. Late one night she thinks about some jobs she has applied for. Let A, B, and C be the events that Julie is offered a job by
(d) Are the events “choose a woman” and “choose a professional degree recipient” independent? How do you know?
A 5 the Connecticut Office of the Chief Medical Examiner
(a) What is the probability that a randomly chosen degree recipient is a man?
B 5 the New Jersey Division of Criminal Justice C 5 the federal Disaster Mortuary Operations Response Team Julie writes down her personal probabilities for being offered these jobs: P1A2 ⫽ 0.7
P1B2 ⫽ 0.5
P1C2 ⫽ 0.3
P1A and B2 ⫽ 0.3
P1A and C2 ⫽ 0.1
P1B and C2 ⫽ 0.1
P1A and B and C2 ⫽ 0
4.127 Find some probabilities. The previous exercise gives the projected number (in thousands) of earned degrees in the United States in the 2010–2011 academic year. Use these data to answer the following questions.
(b) What is the conditional probability that the person chosen received a bachelor’s degree, given that he is a man? (c) Use the multiplication rule to find the probability of choosing a male bachelor’s degree recipient. Check your result by finding this probability directly from the table of counts. 4.128 Conditional probabilities and independence. Using the information in Exercise 4.121, answer these questions.
Chapter 4 Exercises (a) Given that a vehicle is imported, what is the conditional probability that it is a light truck? (b) Are the events “vehicle is a light truck” and “vehicle is imported” independent? Justify your answer. Genetic counseling. Conditional probabilities and Bayes’s rule are a basis for counseling people who may have genetic defects that can be passed to their children. Exercises 4.129 to 4.131 concern genetic counseling settings. 4.129 Albinism. People with albinism have little pigment in their skin, hair, and eyes. The gene that governs albinism has two forms (called alleles), which we denote by a and A. Each person has a pair of these genes, one inherited from each parent. A child inherits one of each parent’s two alleles independently with probability 0.5. Albinism is a recessive trait, so a person is albino only if the inherited pair is aa. (a) Beth’s parents are not albino but she has an albino brother. This implies that both of Beth’s parents have type Aa. Why? (b) Which of the types aa, Aa, AA could a child of Beth’s parents have? What is the probability of each type? (c) Beth is not albino. What are the conditional probabilities for Beth’s possible genetic types, given this fact? (Use the definition of conditional probability.)
297
4.130 Find some conditional probabilities. Beth knows the probabilities for her genetic types from part (c) of the previous exercise. She marries Bob, who is albino. Bob’s genetic type must be aa. (a) What is the conditional probability that a child of Beth and Bob is non-albino if Beth has type Aa? What is the conditional probability of a non-albino child if Beth has type AA? (b) Beth and Bob’s first child is non-albino. What is the conditional probability that Beth is a carrier, type Aa? 4.131 Muscular dystrophy. Muscular dystrophy is an incurable muscle-wasting disease. The most common and serious type, called DMD, is caused by a sex-linked recessive mutation. Specifically, women can be carriers but do not get the disease; a son of a carrier has probability 0.5 of having DMD; a daughter has probability 0.5 of being a carrier. As many as one-third of DMD cases, however, are due to spontaneous mutations in sons of mothers who are not carriers. Toni has one son, who has DMD. In the absence of other information, the probability is 1/3 that the son is the victim of a spontaneous mutation and 2/3 that Toni is a carrier. There is a screening test called the CK test that is positive with probability 0.7 if a woman is a carrier and with probability 0.1 if she is not. Toni’s CK test is positive. What is the probability that she is a carrier?
CHAPTER 4 Exercises 4.132 Repeat the experiment many times. Here is a probability distribution for a random variable X:
4.134 Work with a transformation. Here is a probability distribution for a random variable X:
Value of X
21
2
Value of X
1
2
Probability
0.4
0.6
Probability
0.4
0.6
A single experiment generates a random value from this distribution. If the experiment is repeated many times, what will be the approximate proportion of times that the value is ⫺1? Give a reason for your answer. 4.133 Repeat the experiment many times and take the mean. Here is a probability distribution for a random variable X: Value of X
21
2
Probability
0.2
0.8
A single experiment generates a random value from this distribution. If the experiment is repeated many times, what will be the approximate value of the mean of these random variables? Give a reason for your answer.
(a) Find the mean and the standard deviation of this distribution. (b) Let Y ⫽ 4X ⫺ 2. Use the rules for means and variances to find the mean and the standard deviation of the distribution of Y. (c) For part (b) give the rules that you used to find your answer. 4.135 A different transformation. Refer to the previous exercise. Now let Y ⫽ 4X2 ⫺ 2. (a) Find the distribution of Y. (b) Find the mean and standard deviation for the distribution of Y.
298
CHAPTER 4
•
Probability: The Study of Randomness
(c) Explain why the rules that you used for part (b) of the previous exercise do not work for this transformation. 4.136 Roll a pair of dice two times. Consider rolling a pair of fair dice two times. Let A be the total on the up-faces for the first roll and let B be the total on the up-faces for the second roll. For each of the following pairs of events, tell whether they are disjoint, independent, or neither. (a) A ⫽ 2 on the first roll, B ⫽ 8 or more on the first roll. (b) A ⫽ 2 on the first roll, B ⫽ 8 or more on the second roll. (c) A ⫽ 5 or less on the second roll, B ⫽ 4 or less on the first roll. (d) A ⫽ 5 or less on the second roll, B ⫽ 4 or less on the second roll. 4.137 Find the probabilities. Refer to the previous exercise. Find the probabilities for each event. 4.138 Some probability distributions. Here is a probability distribution for a random variable X: Value of X
2
Probability
0.2
3 0.4
4
For an initial roll of 5 or 9, the odds bet has a winning probability of 2/5 and the payoff for a $10 bet is $15. Similarly, when the initial roll is 6 or 8, the odds bet has a winning probability of 5/11 and the payoff for a $10 bet is $12. Find the mean of the payoff distribution for each of these bets. Then confirm that the bets are fair by showing that the difference between the amount bet and the mean of the distribution of the payoff is zero. 4.140 An ancient Korean drinking game. An ancient Korean drinking game involves a 14-sided die. The players roll the die in turn and must submit to whatever humiliation is written on the up-face: something like “Keep still when tickled on face.” Six of the 14 faces are squares. Let’s call them A, B, C, D, E, and F for short. The other eight faces are triangles, which we will call 1, 2, 3, 4, 5, 6, 7, and 8. Each of the squares is equally likely. Each of the triangles is also equally likely, but the triangle probability differs from the square probability. The probability of getting a square is 0.72. Give the probability model for the 14 possible outcomes. 4.141 Wine tasters. Two wine tasters rate each wine they taste on a scale of 1 to 5. From data on their ratings of a large number of wines, we obtain the following probabilities for both tasters’ ratings of a randomly chosen wine:
0.4 Taster 2
(a) Find the mean and standard deviation for this distribution. (b) Construct a different probability distribution with the same possible values, the same mean, and a larger standard deviation. Show your work and report the standard deviation of your new distribution. (c) Construct a different probability distribution with the same possible values, the same mean, and a smaller standard deviation. Show your work and report the standard deviation of your new distribution. 4.139 A fair bet at craps. Almost all bets made at gambling casinos favor the house. In other words, the difference between the amount bet and the mean of the distribution of the payoff is a positive number. An exception is “taking the odds” at the game of craps, a bet that a player can make under certain circumstances. The bet becomes available when a shooter throws a 4, 5, 6, 8, 9, or 10 on the initial roll. This number is called the “point”; when a point is rolled, we say that a point has been established. If a 4 is the point, an odds bet can be made that wins if a 4 is rolled before a 7 is rolled. The probability of winning this bet is 1/3 and the payoff for a $10 bet is $20 (you keep the $10 you bet and you receive an additional $20). The same probability of winning and the same payoff apply for an odds bet on a 10.
Taster 1
1
2
3
4
5
1
0.03
0.02
0.01
0.00
0.00
2
0.02
0.07
0.06
0.02
0.01
3
0.01
0.05
0.25
0.05
0.01
4
0.00
0.02
0.05
0.20
0.02
5
0.00
0.01
0.01
0.02
0.06
(a) Why is this a legitimate assignment of probabilities to outcomes? (b) What is the probability that the tasters agree when rating a wine? (c) What is the probability that Taster 1 rates a wine higher than 3? What is the probability that Taster 2 rates a wine higher than 3? 4.142 SAT scores. The College Board finds that the distribution of students’ SAT scores depends on the level of education their parents have. Children of parents who did not finish high school have SAT Math scores X with mean 445 and standard deviation 106. Scores Y of children of parents with graduate degrees have mean 566 and standard deviation 109. Perhaps we should standardize to a common scale for equity. Find positive numbers a, b, c, and d such that a ⫹ bX and c ⫹ dY both have mean 500 and standard deviation 100.
Chapter 4 Exercises 4.143 Lottery tickets. Joe buys a ticket in the Tri-State Pick 3 lottery every day, always betting on 956. He will win something if the winning number contains 9, 5, and 6 in any order. Each day, Joe has probability 0.006 of winning, and he wins (or not) independently of other days because a new drawing is held each day. What is the probability that Joe’s first winning ticket comes on the 20th day? 4.144 Slot machines. Slot machines are now video games, with winning determined by electronic random number generators. In the old days, slot machines were like this: you pull the lever to spin three wheels; each wheel has 20 symbols, all equally likely to show when the wheel stops spinning; the three wheels are independent of each other. Suppose that the middle wheel has 8 bells among its 20 symbols, and the left and right wheels have 1 bell each. (a) You win the jackpot if all three wheels show bells. What is the probability of winning the jackpot? (b) What is the probability that the wheels stop with exactly 2 bells showing? The following exercises require familiarity with the material presented in the optional Section 4.5. 4.145 Bachelor’s degrees by gender. Of the 2,325,000 bachelor’s, master’s, and doctoral degrees given by U.S. colleges and universities in a recent year, 69% were bachelor’s degrees, 28% were master’s degrees, and the rest were doctorates. Moreover, women earned 57% of the bachelor’s degrees, 60% of the master’s degrees, and 52% of the doctorates.22 You choose a degree at random and find that it was awarded to a woman. What is the probability that it is a bachelor’s degree? 4.146 Higher education at two-year and four-year institutions. The following table gives the counts of U.S. institutions of higher education classified as public or private and as two-year or four-year:23 Public
Private
Two-year
1000
721
Four-year
2774
672
Convert the counts to probabilities and summarize the relationship between these two variables using conditional probabilities. 4.147 Odds bets at craps. Refer to the odds bets at craps in Exercise 4.139. Suppose that whenever the shooter has an initial roll of 4, 5, 6, 8, 9, or 10, you take the odds. Here are the probabilities for these initial rolls:
Point Probability
4
5
6
8
9
10
3/36
4/36
5/36
5/36
4/36
3/36
299
Draw a tree diagram with the first stage showing the point rolled and the second stage showing whether the point is again rolled before a 7 is rolled. Include a firststage branch showing the outcome that a point is not established. In this case, the amount bet is zero and the distribution of the winnings is the special random variable that has P1X ⫽ 02 ⫽ 1. For the combined betting system where the player always makes a $10 odds bet when it is available, show that the game is fair. 4.148 Weights and heights of children adjusted for age. The idea of conditional probabilities has many interesting applications, including the idea of a conditional distribution. For example, the National Center for Health Statistics produces distributions for weight and height for children while conditioning on other variables. Visit the website cdc.gov/growthcharts/ and describe the different ways that weight and height distributions are conditioned on other variables. 4.149 Wine tasting. In the setting of Exercise 4.141, Taster 1’s rating for a wine is 3. What is the conditional probability that Taster 2’s rating is higher than 3? 4.150 An interesting case of independence. Independence of events is not always obvious. Toss two balanced coins independently. The four possible combinations of heads and tails in order each have probability 0.25. The events A ⫽ head on the first toss B ⫽ both tosses have the same outcome may seem intuitively related. Show that P1B 0 A2 ⫽ P1B2, so that A and B are in fact independent. 4.151 Find some conditional probabilities. Choose a point at random in the square with sides 0 ⱕ x ⱕ 1 and 0 ⱕ y ⱕ 1. This means that the probability that the point falls in any region within the square is the area of that region. Let X be the x coordinate and Y the y coordinate of the point chosen. Find the conditional probability P1Y ⬍ 1兾3 0 Y ⬎ X2. (Hint: Sketch the square and the events Y ⬍ 1兾3 and Y ⬎ X.) 4.152 Sample surveys for sensitive issues. It is difficult to conduct sample surveys on sensitive issues because many people will not answer questions if the answers might embarrass them. Randomized response is an effective way to guarantee anonymity while collecting information on topics such as student cheating or sexual behavior. Here is the idea. To ask a sample of students whether they have plagiarized a term paper
300
CHAPTER 4
•
Probability: The Study of Randomness
while in college, have each student toss a coin in private. If the coin lands heads and they have not plagiarized, they are to answer “No.” Otherwise, they are to give “Yes” as their answer. Only the student knows whether the answer reflects the truth or just the coin toss, but the researchers can use a proper random sample with followup for nonresponse and other good sampling practices. Suppose that in fact the probability is 0.3 that a randomly chosen student has plagiarized a paper. Draw a tree diagram in which the first stage is tossing the coin
and the second is the truth about plagiarism. The outcome at the end of each branch is the answer given to the randomized-response question. What is the probability of a “No” answer in the randomized-response poll? If the probability of plagiarism were 0.2, what would be the probability of a “No” response on the poll? Now suppose that you get 39% “No” answers in a randomized-response poll of a large sample of students at your college. What do you estimate to be the percent of the population who have plagiarized a paper?
Sampling Distributions Introduction Statistical inference draws conclusions about a population or process from data. It emphasizes substantiating these conclusions via probability calculations, as probability allows us to take chance variation into account. We have already examined data and arrived at conclusions many times. How do we move from summarizing a single data set to formal inference involving probability calculations? The foundation for this was described in Section 3.4 (page 205). There, we not only discussed the use of statistics as estimates of population parameters but also described the chance variation of a statistic when the data are produced by random sampling or randomized experimentation. The sampling distribution of a statistic shows how it would vary in these identical repeated data collections. That is, the sampling distribution is a probability distribution that answers the question “What would happen if we did this experiment or sampling many times?” It is these distributions that provide the necessary link between probability and the data in your sample or from your experiment. They are the key to understanding statistical inference.
CHAPTER
5
5.1 The Sampling Distribution of a Sample Mean 5.2 Sampling Distributions for Counts and Proportions
LOOK BACK parameters and statistics, p. 206
LOOK BACK sampling distribution, p. 208
301
302
CHAPTER 5
•
Sampling Distributions Suppose that you plan to survey 1000 students at your university about their sleeping habits. The sampling distribution of the average hours of sleep per night describes what this average would be if many simple random samples of 1000 students were drawn from the population of students at your university. In other words, it gives you an idea of what you are likely to see from your survey. It tells you whether you should expect this average to be near the population mean and whether the variation of the statistic is roughly ⫾2 hours or ⫾2 minutes.
THE DISTRIBUTION OF A STATISTIC A statistic from a random sample or randomized experiment is a random variable. The probability distribution of the statistic is its sampling distribution.
LOOK BACK density curves, p. 56
To help in the transition from probability as a topic in itself to probability as a foundation for inference, in this chapter we will study the sampling distributions of some common statistics. The general framework for constructing a sampling distribution is the same for all statistics, so our focus here will be on those statistics commonly used in inference. Before doing so, however, we need to consider another set of probability distributions that also play a role in statistical inference. Any quantity that can be measured on each member of a population is described by the distribution of its values for all members of the population. This is the context in which we first met distributions, as density curves that provide models for the overall pattern of data. Imagine choosing one individual at random from a population and measuring a quantity. The quantities obtained from repeated draws of one individual from a population have a probability distribution that is the distribution of the population.
EXAMPLE 5.1 Total sleep time of college students. A recent survey describes the distribution of total sleep time among college students as approximately Normal with a mean of 6.78 hours and standard deviation of 1.24 hours.1 Suppose that we select a college student at random and obtain his or her sleep time. This result is a random variable X because prior to the random sampling, we don’t know the sleep time. We do know, however, that in repeated sampling X will have the same N(6.78, 1.24) distribution that describes the pattern of sleep time in the entire population. We call N(6.78, 1.24) the population distribution. POPULATION DISTRIBUTION The population distribution of a variable is the distribution of its values for all members of the population. The population distribution is also the probability distribution of the variable when we choose one individual at random from the population.
5.1 The Sampling Distribution of a Sample Mean
LOOK BACK SRS, p. 194
303
In this example, the population of all college students actually exists, so that we can in principle draw an SRS of students from it. Sometimes our population of interest does not actually exist. For example, suppose that we are interested in studying final-exam scores in a statistics course, and we have the scores of the 34 students who took the course last semester. For the purposes of statistical inference, we might want to consider these 34 students as part of a hypothetical population of similar students who would take this course. In this sense, these 34 students represent not only themselves but also a larger population of similar students. The key idea is to think of the observations that you have as coming from a population with a probability distribution. USE YOUR KNOWLEDGE 5.1 Number of apps on an iOS device. AppsFire is a service that shares the names of the apps on an iOS device with everyone else using the service. This, in a sense, creates an iOS device app recommendation system. Recently, the service drew a sample of 1000 AppsFire users and reported a median of 108 apps per device.2 State the population that this survey describes, the statistic, and some likely values from the population distribution. In the next two sections, we will study the sampling distributions of two common statistics, the sample mean and the sample proportion. The focus will be on the important features of these distributions so that we can quickly describe and use them in the later chapters on statistical inference. We will see that in each case the sampling distribution depends on both the population distribution and the way we collect the data from the population.
5.1 The Sampling Distribution of a Sample Mean When you complete this section, you will be able to
x and the • Explain the difference between the sampling distribution of – population distribution. x for an SRS of size n • Determine the mean and standard deviation of – from a population with mean m and standard deviation s. • Describe how much larger n has to be to reduce the standard deviation of – x by a certain factor. • Utilize the central limit theorem to approximate the sampling distribux and perform various probability calculations. tion of –
A variety of statistics are used to describe quantitative data. The sample mean, median, and standard deviation are all examples of statistics based on quantitative data. Statistical theory describes the sampling distributions of these statistics. However, the general framework for constructing a sampling distribution is the same for all statistics. In this section we will concentrate on the sample mean. Because sample means are just averages of observations, they are among the most frequently used statistics.
CHAPTER 5
•
Sampling Distributions
45
45
40
40
Percent of all means
Percent of calls
304
35 30 25 20 15
35 30 25 20 15
10
10
5
5 0
200
400 600 800 1000 Call lengths (seconds)
1200
(a)
0
200 400 600 800 1000 1200 Mean length of 80 calls (seconds) (b)
FIGURE 5.1 (a) The distribution of lengths of all customer service calls received by a bank in a
month, for Example 5.2. (b) The distribution of the sample means − x for 500 random samples of size 80 from this population. The scales and histogram classes are exactly the same in both panels.
EXAMPLE
DATA CALLS80
CHALLENGE
5.2 Sample means are approximately Normal. Figure 5.1 illustrates two striking facts about the sampling distribution of a sample mean. Figure 5.1(a) displays the distribution of customer service call lengths for a bank service center for a month. There are more than 30,000 calls in this population.3 (We omitted a few extreme outliers, calls that lasted more than 20 minutes.) The distribution is extremely skewed to the right. The population mean is m ⫽ 173.95 seconds. Table 1.2 (page 19) contains the lengths of a random sample of 80 calls from this population. The mean of these 80 calls is x ⫽ 196.6 seconds. If we were to take another sample of size 80, we would likely get a different value of x. This is because this new sample would contain a different set of calls. To find the sampling distribution of x, we take many SRSs of size 80 and calculate x for each sample. Figure 5.1(b) is the distribution of the values of x for 500 random samples. The scales and choice of classes are exactly the same as in Figure 5.1(a), so that we can make a direct comparison. The sample means are much less spread out than the individual call lengths. What is more, the distribution in Figure 5.1(b) is roughly symmetric rather than skewed. The Normal quantile plot in Figure 5.2 confirms that the distribution is close to Normal. This example illustrates two important facts about sample means that we will discuss in this section.
FACTS ABOUT SAMPLE MEANS 1. Sample means are less variable than individual observations. 2. Sample means are more Normal than individual observations. These two facts contribute to the popularity of sample means in statistical inference.
5.1 The Sampling Distribution of a Sample Mean FIGURE 5.2 Normal quantile
230 Sample mean call length (seconds)
plot of the 500 sample means in Figure 5.1(b). The distribution is close to Normal.
305
o o
220 oo ooo o ooo ooooo oooo oo ooo o o o o oooo oo ooooooo ooooo oooooo oooooo oooo oooo ooo oo o o ooo oooo oooooo ooo oooo ooo ooooo ooooo ooooo oooooo oooo oooo oooo oooo oooo oooo ooo oooo oooooo ooooo o o o ooo oooo ooooo oooooo oooooo ooooo ooo ooooo oooooo oooooooo o oooo oooo oooo ooo ooooo oooooo oooooooo ooooo oooo o o o o ooooo oooo oooooooo oooo oooo ooo ooooo oo oo
210 200 190 180 170 160 150 140 oo
130 120
oo
o
o o
o oo
o o
o
o
–3
–2
–1 0 1 Normal score
2
3
The mean and standard deviation of – x The sample mean x from a sample or an experiment is an estimate of the mean m of the underlying population. The sampling distribution of x is determined by the design used to produce the data, the sample size n, and the population distribution. Select an SRS of size n from a population, and measure a variable X on each individual in the sample. The n measurements are values of n random variables X1, X2, . . . , Xn. A single Xi is a measurement on one individual selected at random from the population and therefore has the distribution of the population. If the population is large relative to the sample, we can consider X1, X2, . . . , Xn to be independent random variables each having the same distribution. This is our probability model for measurements on each individual in an SRS. The sample mean of an SRS of size n is x⫽
LOOK BACK rules for means, p. 272
LOOK BACK unbiased estimator, p. 210
1 1X ⫹ X2 ⫹ . . . ⫹ Xn 2 n 1
If the population has mean m, then m is the mean of the distribution of each observation Xi. To get the mean of x, we use the rules for means of random variables. Specifically, mx ⫽
1 1m ⫹ mX2 ⫹ . . . ⫹ mXn 2 n X1
⫽
1 1m ⫹ m ⫹ . . . ⫹ m2 ⫽ m n
That is, the mean of x is the same as the mean of the population. The sample mean x is therefore an unbiased estimator of the unknown population mean m.
306
CHAPTER 5
•
Sampling Distributions
LOOK BACK rules for variances, p. 275
The observations are independent, so the addition rule for variances also applies: 1 2 s 2x ⫽ a b 1sX21 ⫹ s X22 ⫹ . . . ⫹ s 2Xn 2 n 1 2 ⫽ a b 1s 2 ⫹ s 2 ⫹ . . . ⫹ s 2 2 n ⫽
s2 n
With n in the denominator, the variability of x about its mean decreases as the sample size grows. Thus, a sample mean from a large sample will usually be very close to the true population mean m. Here is a summary of these facts.
MEAN AND STANDARD DEVIATION OF A SAMPLE MEAN Let x be the mean of an SRS of size n from a population having mean m and standard deviation s. The mean and standard deviation of x are mx ⫽ m s sx ⫽ 1n How precisely does a sample mean x estimate a population mean m? Because the values of x vary from sample to sample, we must give an answer in terms of the sampling distribution. We know that x is an unbiased estimator of m, so its values in repeated samples are not systematically too high or too low. Most samples will give an x-value close to m if the sampling distribution is concentrated close to its mean m. So the precision of estimation depends on the spread of the sampling distribution. Because the standard deviation of x is s兾 1n, the standard deviation of the statistic decreases in proportion to the square root of the sample size. This means, for example, that a sample size must be multiplied by 4 in order to divide the statistic’s standard deviation in half. By comparison, a sample size must be multiplied by 100 in order to reduce the standard deviation by a factor of 10.
EXAMPLE 5.3 Standard deviations for sample means of service call lengths. The standard deviation of the population of service call lengths in Figure 5.1(a) is s ⫽ 184.81 seconds. The length of a single call will often be far from the population mean. If we choose an SRS of 20 calls, the standard deviation of their mean length is 184.81 ⫽ 41.32 seconds 120 Averaging over more calls reduces the variability and makes it more likely that x is close to m. Our sample size of 80 calls is 4 times 20, so the standard deviation will be half as large: sx ⫽
sx ⫽
184.81 ⫽ 20.66 seconds 180
5.1 The Sampling Distribution of a Sample Mean
307
USE YOUR KNOWLEDGE 5.2 Find the mean and the standard deviation of the sampling distribution. Compute the mean and standard deviation of the sampling distribution of the sample mean when you plan to take an SRS of size 49 from a population with mean 420 and standard deviation 21. 5.3 The effect of increasing the sample size. In the setting of the previous exercise, repeat the calculations for a sample size of 441. Explain the effect of the sample size increase on the mean and standard deviation of the sampling distribution.
The central limit theorem We have described the center and spread of the probability distribution of a sample mean x, but not its shape. The shape of the distribution of x depends on the shape of the population distribution. Here is one important case: if the population distribution is Normal, then so is the distribution of the sample mean.
SAMPLING DISTRIBUTION OF A SAMPLE MEAN If a population has the N1m, s2 distribution, then the sample mean x of n independent observations has the N1m, s兾1n 2 distribution.
central limit theorem
This is a somewhat special result. Many population distributions are not Normal. The service call lengths in Figure 5.1(a), for example, are strongly skewed. Yet Figures 5.1(b) and 5.2 show that means of samples of size 80 are close to Normal. One of the most famous facts of probability theory says that, for large sample sizes, the distribution of x is close to a Normal distribution. This is true no matter what shape the population distribution has, as long as the population has a finite standard deviation s. This is the central limit theorem. It is much more useful than the fact that the distribution of x is exactly Normal if the population is exactly Normal.
CENTRAL LIMIT THEOREM Draw an SRS of size n from any population with mean m and finite standard deviation s. When n is large, the sampling distribution of the sample mean x is approximately Normal: x is approximately N am,
s b 1n
EXAMPLE 5.4 How close will the sample mean be to the population mean? With the Normal distribution to work with, we can better describe how precisely a random sample of 80 calls estimates the mean length of all the calls in the
308
CHAPTER 5
•
Sampling Distributions
LOOK BACK 68–95–99.7 rule, p. 59
population. The population standard deviation for the more than 30,000 calls in the population of Figure 5.1(a) is s ⫽ 184.81 seconds. From Example 5.3 we know s x ⫽ 20.66 seconds. By the 95 part of the 68–95–99.7 rule, about 95% of all samples will have mean x within two standard deviations of m, that is, within ⫾41.32 seconds of m.
USE YOUR KNOWLEDGE 5.4 Use the 68–95–99.7 rule. You take an SRS of size 49 from a population with mean 185 and standard deviation 70. According to the central limit theorem, what is the approximate sampling distribution of the sample mean? Use the 95 part of the 68–95–99.7 rule to describe the variability of x. For the sample size of n ⫽ 80 in Example 5.4, the sample mean is not very precise. The population of service call lengths is very spread out, so the sampling distribution of x has a large standard deviation.
EXAMPLE 5.5 How can we reduce the standard deviation? In the setting of Example 5.4, if we want to reduce the standard deviation of x by a factor of 4, we must take a sample 16 times as large, n ⫽ 16 ⫻ 80, or 1280. Then sx ⫽
184.81 ⫽ 5.166 seconds 11280
For samples of size 1280, about 95% of the sample means will be within twice 5.166, or 10.33 seconds, of the population mean m. USE YOUR KNOWLEDGE 5.5 The effect of increasing the sample size. In the setting of Exercise 5.4, suppose that we increase the sample size to 1225. Use the 95 part of the 68–95–99.7 rule to describe the variability of this sample mean. Compare your results with those you found in Exercise 5.4. Example 5.5 reminds us that if the population is very spread out, the 1n in the standard deviation of x implies that very large samples are needed to estimate the population mean precisely. The main point of the example, however, is that the central limit theorem allows us to use Normal probability calculations to answer questions about sample means even when the population distribution is not Normal. How large a sample size n is needed for x to be close to Normal depends on the population distribution. More observations are required if the shape of the population distribution is far from Normal. For the very skewed call length population, samples of size 80 are large enough. Further study would be needed to see if the distribution of x is close to Normal for smaller samples like n ⫽ 20 or n ⫽ 40. Here is a more detailed study of another skewed distribution.
5.1 The Sampling Distribution of a Sample Mean
0
0
1
309
1
(a)
(b)
FIGURE 5.3 The central limit theorem in action: the sampling distribution of sample means from a strongly non-Normal population becomes more Normal as the sample size increases. (a) The distribution of 1 observation. (b) The distribution of − x for 2 observations. (c) The distribution of − x for 10 observations. (d) The distribution of − x for 25 observations.
0
1
0 (c)
1 (d)
EXAMPLE
exponential distribution
CHALLENGE
5.6 The central limit theorem in action. Figure 5.3 shows the central limit theorem in action for another very non-Normal population. Figure 5.3(a) displays the density curve of a single observation from the population. The distribution is strongly right-skewed, and the most probable outcomes are near 0. The mean m of this distribution is 1, and its standard deviation s is also 1. This particular continuous distribution is called an exponential distribution. Exponential distributions are used as models for how long an iOS device, for example, will last and for the time between text messages sent on your cell phone. Figures 5.3(b), (c), and (d) are the density curves of the sample means of 2, 10, and 25 observations from this population. As n increases, the shape becomes more Normal. The mean remains at m ⫽ 1, but the standard deviation decreases, taking the value 1兾1n. The density curve for 10 observations is still somewhat skewed to the right but already resembles a Normal curve having m ⫽ 1 and s ⫽ 1兾110 ⫽ 0.32. The density curve for n ⫽ 25 is yet more Normal. The contrast between the shape of the population distribution and of the distribution of the mean of 10 or 25 observations is striking. You can also use the Central Limit Theorem applet to study the sampling distribution of x. From one of three population distributions, 10,000 SRSs of a user-specified sample size n are generated, and a histogram of the sample means is constructed. You can then compare this estimated sampling distribution with the Normal curve that is based on the central limit theorem.
310
CHAPTER 5
•
Sampling Distributions
EXAMPLE 5.7 Using the Central Limit Theorem applet. In Example 5.6, we considered sample sizes of n ⫽ 2, 10, and 25 from an exponential distribution. Figure 5.4 shows a screenshot of the Central Limit Theorem applet for the exponential distribution when n ⫽ 10. The mean and standard deviation of this sampling distribution are 1 and 1兾110 ⫽ 0.316, respectively. From the 10,000 SRSs, the mean is estimated to be 1.001 and the estimated standard deviation is 0.319. These are both quite close to the true values. In Figure 5.3(c) we saw that the density curve for 10 observations is still somewhat skewed to the right. We can see this same behavior in Figure 5.4 when we compare the histogram with the Normal curve based on the central limit theorem.
FIGURE 5.4 Screenshot of the
CHALLENGE
Central Limit Theorem applet for the exponential distribution when n 5 10, for Example 5.7.
Try using the applet for the other sample sizes in Example 5.6. You should get histograms shaped like the density curves shown in Figure 5.3. You can also consider other sample sizes by sliding n from 1 to 100. As you increase n, the shape of the histogram moves closer to the Normal curve that is based on the central limit theorem. USE YOUR KNOWLEDGE 5.6 Use the Central Limit Theorem applet. Let’s consider the uniform distribution between 0 and 10. For this distribution, all intervals of the same length between 0 and 10 are equally likely. This distribution has a mean of 5 and standard deviation of 2.89.
5.1 The Sampling Distribution of a Sample Mean
311
CHALLENGE
(a) Approximate the population distribution by setting n ⫽ 1 and clicking the “Generate samples” button. (b) What are your estimates of the population mean and population standard deviation based on the 10,000 SRSs? Are these population estimates close to the true values? (c) Describe the shape of the histogram and compare it with the Normal curve. 5.7 Use the Central Limit Theorem applet again. Refer to the previous exercise. In the setting of Example 5.6, let’s approximate the sampling distribution for samples of size n ⫽ 2, 10, and 25 observations. (a) For each sample size, compute the mean and standard deviation of x. (b) For each sample size, use the applet to approximate the sampling distribution. Report the estimated mean and standard deviation. Are they close to the true values calculated in (a)? (c) For each sample size, compare the shape of the sampling distribution with the Normal curve based on the central limit theorem. (d) For this population distribution, what sample size do you think is needed to make you feel comfortable using the central limit theorem to approximate the sampling distribution of x? Explain your answer.
Now that we know that the sampling distribution of the sample mean x is approximately Normal for a sufficiently large n, let’s consider some probability calculations.
EXAMPLE 5.8 Time between sent text messages. Americans aged 18 to 29 years send
an average of almost 88 text messages a day.4 Suppose that the time X between text messages sent from your cell phone is governed by the exponential distribution with mean m ⫽ 15 minutes and standard deviation s ⫽ 15 minutes. You record the next 50 times between sent text messages. What is the probability that their average exceeds 13 minutes? The central limit theorem says that the sample mean time x (in minutes) between text messages has approximately the Normal distribution with mean equal to the population mean m ⫽ 15 minutes and standard deviation s 15 ⫽ ⫽ 2.12 minutes 150 150 The sampling distribution of x is therefore approximately N115, 2.122. Figure 5.5 shows this Normal curve (solid) and also the actual density curve of x (dashed).
312
CHAPTER 5
•
Sampling Distributions
FIGURE 5.5 The exact distribution (dashed) and the Normal approximation from the central limit theorem (solid) for the average time between text messages sent on your cell phone, for Example 5.8.
5
10 15 20 Average time (minutes)
25
The probability we want is P 1x ⬎ 13.02. This is the area to the right of 13 under the solid Normal curve in Figure 5.5. A Normal distribution calculation gives P 1 x ⬎ 13.02 ⫽ P a
x ⫺ 15 13.0 ⫺ 15 ⬎ b 2.12 2.12
⫽ P1Z ⬎ ⫺0.942 ⫽ 0.8264 The exactly correct probability is the area under the dashed density curve in the figure. It is 0.8271. The central limit theorem Normal approximation is off by only about 0.0007. We can also use this sampling distribution to talk about the total time between the 1st and 51st text message sent from your phone.
EXAMPLE 5.9 Convert the results to the total time. There are 50 time intervals between the 1st and 51st text message. According to the central limit theorem calculations in Example 5.8, P 1x ⬎ 13.02 ⫽ 0.8264
We know that the sample mean is the total time divided by 50, so the event 5x ⬎ 13.06 is the same as the event 550x ⬎ 50113.02 6. We can say that the probability is 0.8264 that the total time is 50(13.0) ⫽ 650 minutes (10.8 hours) or greater. USE YOUR KNOWLEDGE 5.8 Find a probability. Refer to Example 5.8. Find the probability that the mean time between text messages is less than 16 minutes. The exact probability is 0.6944. Compare your answer with the exact one. Figure 5.6 summarizes the facts about the sampling distribution of x in a way that emphasizes the big idea of a sampling distribution. The general framework for constructing the sampling distribution of x is shown on the left.
5.1 The Sampling Distribution of a Sample Mean
313
• Take many random samples of size n from a population with mean M and standard deviation S. • Find the sample mean x for each sample. • Collect all the x’s and display their distribution. The sampling distribution of x is shown on the right. Keep this figure in mind as you go forward.
SRS size n
FIGURE 5.6 The sampling distribution of a sample mean − x has mean m and standard deviation s兾1n. The sampling distribution is Normal if the population distribution is Normal; it is approximately Normal for large samples in any case.
SRS size n
SRS si
ze n
Population mean μ and standard deviation σ
x
σ n
x x
Mean μ
Values of x
A few more facts
LOOK BACK rules for means, p. 272; rules for variances, p. 275
The central limit theorem is the big fact of probability theory in this section. Here are three additional facts related to our investigations that will be useful in describing methods of inference in later chapters. The fact that the sample mean of an SRS from a Normal population has a Normal distribution is a special case of a more general fact: any linear combination of independent Normal random variables is also Normally distributed. That is, if X and Y are independent Normal random variables and a and b are any fixed numbers, aX ⫹ bY is also Normally distributed, and this is true for any number of Normal random variables. In particular, the sum or difference of independent Normal random variables has a Normal distribution. The mean and standard deviation of aX ⫹ bY are found as usual from the rules for means and variances. These facts are often used in statistical calculations. Here is an example.
EXAMPLE 5.10 Getting to and from campus. You live off campus and take the shuttle, provided by your apartment complex, to and from campus. Your time on the shuttle in minutes varies from day to day. The time going to campus X has the N120, 42 distribution, and the time returning from campus Y varies according to the N118, 82 distribution. If they vary independently, what is the probability that you will be on the shuttle for less time going to campus? The difference in times X ⫺ Y is Normally distributed, with mean and variance mX⫺Y ⫽ mX ⫺ mY ⫽ 20 ⫺ 18 ⫽ 2 s2X⫺Y ⫽ s2X ⫹ s2Y ⫽ 42 ⫹ 82 ⫽ 80
314
CHAPTER 5
•
Sampling Distributions Because 180 ⫽ 8.94, X ⫺ Y has the N12, 8.942 distribution. Figure 5.7 illustrates the probability computation: P1X ⬍ Y2 ⫽ P1X ⫺ Y ⬍ 02 ⫽ Pa
1X ⫺ Y2 ⫺ 2 0⫺2 ⬍ b 8.94 8.94
⫽ P1Z ⬍ ⫺0.222 ⫽ 0.4129 Although on average it takes longer to go to campus than return, the trip to campus will take less time on roughly two of every five days.
Probability = 0.4129
FIGURE 5.7 The Normal probability calculation for Example 5.10. The difference in times going to campus and returning from campus (X 2 Y ) is Normal with mean 2 minutes and standard deviation 8.94 minutes.
Going is faster –30
–20
Returning is faster
–10 0 10 Time difference (minutes)
20
30
The second useful fact is that more general versions of the central limit theorem say that the distribution of a sum or average of many small random quantities is close to Normal. This is true even if the quantities are not independent (as long as they are not too highly correlated) and even if they have different distributions (as long as no single random quantity is so large that it dominates the others). These more general versions of the central limit theorem suggest why the Normal distributions are common models for observed data. Any variable that is a sum of many small random influences will have approximately a Normal distribution. Finally, the central limit theorem also applies to discrete random variables. An average of discrete random variables will never result in a continuous sampling distribution, but the Normal distribution often serves as a good approximation. In Section 5.2, we will discuss the sampling distribution and Normal approximation for counts and proportions. This Normal approximation is just an example of the central limit theorem applied to these discrete random variables.
5.1 The Sampling Distribution of a Sample Mean
315
BEYOND THE BASICS
Weibull distributions
Weibull distributions
Our discussion of sampling distributions so far has concentrated on the Normal model to approximate the sampling distribution of the sample mean x. This model is important in statistical practice because of the central limit theorem and the fact that sample means are among the most frequently used statistics. Simplicity also contributes to its popularity. The parameter m is easy to understand, and to estimate it, we use a statistic x that is also easy to understand and compute. There are, however, many other probability distributions that are used to model data in various circumstances. The time that a product, such as a computer hard drive, lasts before failing rarely has a Normal distribution. Earlier we mentioned the use of the exponential distribution to model time to failure. Another class of continuous distributions, the Weibull distributions, is more commonly used in these situations.
EXAMPLE 5.11 Weibull density curves. Figure 5.8 shows the density curves of three members of the Weibull family. Each describes a different type of distribution for the time to failure of a product. 1. The top curve in Figure 5.8 is a model for infant mortality. This describes products that often fail immediately, prior to delivery to the customer. However, if the product does not fail right away, it will likely last a long time. For products like this, a manufacturer might test them and ship only the ones that do not fail immediately. FIGURE 5.8 Density curves for three members of the Weibull family of distributions, for Example 5.11. Time
Time
Time
316
CHAPTER 5
•
Sampling Distributions 2. The middle curve in Figure 5.8 is a model for early failure. These products do not fail immediately, but many fail early in their lives after they are in the hands of customers. This is disastrous—the product or the process that makes it must be changed at once. 3. The bottom curve in Figure 5.8 is a model for old-age wear-out. Most of these products fail only when they begin to wear out, and then many fail at about the same age. A manufacturer certainly wants to know to which of these classes a new product belongs. To find out, engineers operate a random sample of products until they fail. From the failure time data we can estimate the parameter (called the “shape parameter”) that distinguishes among the three Weibull distributions in Figure 5.8. The shape parameter has no simple definition like that of a population proportion or mean, and it cannot be estimated by a simple statistic such as pˆ or x. Two things save the situation. First, statistical theory provides general approaches for finding good estimates of any parameter. These general methods not only tell us how to use x in the Normal settings but also tell us how to estimate the Weibull shape parameter. Second, software can calculate the estimate from data even though there is no algebraic formula that we can write for the estimate. Statistical practice often relies on both mathematical theory and methods of computation more elaborate than the ones we will meet in this book. Fortunately, big ideas such as sampling distributions carry over to more complicated situations.5
SECTION 5.1 Summary The sample mean x of an SRS of size n drawn from a large population with mean m and standard deviation s has a sampling distribution with mean and standard deviation mx ⫽ m s sx ⫽ 1n The sample mean x is an unbiased estimator of the population mean m and is less variable than a single observation. The standard deviation decreases in proportion to the square root of the sample size n. This means that to reduce the standard deviation by a factor of C, we need to increase the sample size by a factor of C 2. The central limit theorem states that for large n the sampling distribution of x is approximately N1m, s兾1n 2 for any population with mean m and finite standard deviation s. This allows us to approximate probability calculations about x using the Normal distribution. Linear combinations of independent Normal random variables have Normal distributions. In particular, if the population has a Normal distribution, so does x.
SECTION 5.1 Exercises For Exercise 5.1, see page 303; for Exercises 5.2 and 5.3, see page 307; for Exercise 5.4, see page 308; for Exercise 5.5, see page308; for Exercises 5.6 and 5.7, see pages 310–311; and for Exercise 5.8, see page 312.
5.9 What is wrong? Explain what is wrong in each of the following statements. (a) If the population standard deviation is 20, then the standard deviation of x for an SRS of 10 observations will be 20兾10 ⫽ 2.
5.1 The Sampling Distribution of a Sample Mean
317
(b) When taking SRSs from a large population, larger sample sizes will result in larger standard deviations of x.
(a) Do you think that the two populations are comparable? Explain your answer.
(c) For an SRS from a large population, both the mean and the standard deviation of x depend on the sample size n.
(b) The AppsFire report provides a footnote stating that their data exclude users who do not use any apps at all. Explain how this might contribute to the difference in the two reported statistics.
5.10 What is wrong? Explain what is wrong in each of the following statements. (a) The central limit theorem states that for large n, the population mean m is approximately Normal. (b) For large n, the distribution of observed values will be approximately Normal. (c) For sufficiently large n, the 68–95–99.7 rule says that x should be within m ⫾ 2s about 95% of the time. 5.11 Generating a sampling distribution. Let’s illustrate the idea of a sampling distribution in the case of a very small sample from a very small population. The population is the 10 scholarship players currently on your women’s basketball team. For convenience, the 10 players have been labeled with the integers 0 to 9. For each player, the total amount of time spent (in minutes) on Facebook during the last week is recorded in the table below. Player Total time (min)
0
1
2
108 63 127
3
4
5
6
210 92 88 161
7 133
8
9
105 168
The parameter of interest is the average amount of time on Facebook. The sample is an SRS of size n ⫽ 3 drawn from this population of players. Because the players are labeled 0 to 9, a single random digit from Table B chooses one player for the sample. (a) Find the mean for the 10 players in the population. This is the population mean m. (b) Use Table B to draw an SRS of size 3 from this population. (Note: You may sample the same player’s time more than once.) Write down the three times in your sample and calculate the sample mean x. This statistic is an estimate of m. (c) Repeat this process 9 more times using different parts of Table B. Make a histogram of the 10 values of x. You are approximating the sampling distribution of x. (d) Is the center of your histogram close to m? Explain why you’d expect it to get closer to m the more times you repeated this sampling process. 5.12 Number of apps on a Smartphone. At a recent Appnation conference, Nielsen reported an average of 41 apps per smartphone among U.S. smartphone subscribers.6 State the population for this survey, the statistic, and some likely values from the population distribution. 5.13 Why the difference? Refer to the previous exercise. In Exercise 5.1 (page 303), a survey by AppsFire reported a median of 108 apps per device. This is very different from the average reported in the previous exercise.
5.14 Total sleep time of college students. In Example 5.1, the total sleep time per night among college students was approximately Normally distributed with mean m ⫽ 6.78 hours and standard deviation s ⫽ 1.24 hours. You plan to take an SRS of size n ⫽ 150 and compute the average total sleep time. (a) What is the standard deviation for the average time? (b) Use the 95 part of the 68–95–99.7 rule to describe the variability of this sample mean. (c) What is the probability that your average will be below 6.9 hours? 5.15 Determining sample size. Refer to the previous exercise. Now you want to use a sample size such that about 95% of the averages fall within ⫾10 minutes (0.17 hours) of the true mean m ⫽ 6.78. (a) Based on your answer to part (b) in Exercise 5.14, should the sample size be larger or smaller than 150? Explain. (b) What standard deviation of x do you need such that 95% of all samples will have a mean within 10 minutes of m? (c) Using the standard deviation you calculated in part (b), determine the number of students you need to sample. 5.16 File size on a tablet PC. A tablet PC contains 8152 music and video files. The distribution of file size is highly skewed. Assume that the standard deviation for this population is 0.82 megabytes (MB). (a) What is the standard deviation of the average file size when you take an SRS of 16 files from this population? (b) How many files would you need to sample if you wanted the standard deviation of x to be no larger than 0.10 MB? 5.17 Bottling an energy drink. A bottling company uses a filling machine to fill cans with an energy drink. The cans are supposed to contain 250 milliliters (ml). The machine, however, has some variability, so the standard deviation of the volume is s ⫽ 0.5 ml. A sample of 4 cans is inspected each hour for process control purposes, and records are kept of the sample mean volume. If the process mean is exactly equal to the target value, what will be the mean and standard deviation of the numbers recorded? 5.18 Average file size on a tablet. Refer to Exercise 5.16. Suppose that the true mean file size of the music and video files on the tablet is 7.4 MB and you plan to take an SRS of n ⫽ 40 files. (a) Explain why it may be reasonable to assume that the average x is approximately Normal even though the population distribution is highly skewed.
318
CHAPTER 5
•
Sampling Distributions
(b) Sketch the approximate Normal curve for the sample mean, making sure to specify the mean and standard deviation. (c) What is the probability that your sample mean will differ from the population mean by more than 0.15 MB?
and standard deviation s ⫽ 5.2. The distribution of scores is only roughly Normal. (a) What is the approximate probability that a single student randomly chosen from all those taking the test scores 27 or higher?
5.19 Can volumes. Averages are less variable than individual observations. It is reasonable to assume that the can volumes in Exercise 5.17 vary according to a Normal distribution. In that case, the mean x of an SRS of cans also has a Normal distribution.
(b) Now consider an SRS of 16 students who took the test. What are the mean and standard deviation of the sample mean score x of these 16 students?
(a) Make a sketch of the Normal curve for a single can. Add the Normal curve for the mean of an SRS of 4 cans on the same sketch.
(d) Which of your two Normal probability calculations in parts (a) and (c) is more accurate? Why?
(b) What is the probability that the volume of a single randomly chosen can differs from the target value by 1 ml or more? (c) What is the probability that the mean volume of an SRS of 4 cans differs from the target value by 1 ml or more? 5.20 Number of friends on Facebook. Facebook recently examined all active Facebook users (more than 10% of the global population) and determined that the average user has 190 friends. This distribution takes only integer values, so it is certainly not Normal. It is also highly skewed to the right, with a median of 100 friends.7 Suppose that s ⫽ 288 and you take an SRS of 70 Facebook users. (a) For your sample, what are the mean and standard deviation of x, the mean number of friends per user? (b) Use the central limit theorem to find the probability that the average number of friends for 70 Facebook users is greater than 250. (c) What are the mean and standard deviation of the total number of friends in your sample? (d) What is the probability that the total number of friends among your sample of 70 Facebook users is greater than 17,500? 5.21 Cholesterol levels of teenagers. A study of the health of teenagers plans to measure the blood cholesterol level of an SRS of 13- to 16-year-olds. The researchers will report the mean x from their sample as an estimate of the mean cholesterol level m in this population. (a) Explain to someone who knows no statistics what it means to say that x is an “unbiased” estimator of m. (b) The sample result x is an unbiased estimator of the population truth m no matter what size SRS the study chooses. Explain to someone who knows no statistics why a large sample gives more trustworthy results than a small sample. 5.22 ACT scores of high school seniors. The scores of your state’s high school seniors on the ACT college entrance examination in a recent year had mean m ⫽ 22.3
(c) What is the approximate probability that the mean score x of these 16 students is 27 or higher?
5.23 Monitoring the emerald ash borer. The emerald ash borer is a beetle that poses a serious threat to ash trees. Purple traps are often used to detect or monitor populations of this pest. In the counties of your state where the beetle is present, thousands of traps are used to monitor the population. These traps are checked periodically. The distribution of beetle counts per trap is discrete and strongly skewed. A majority of traps have no beetles, and only a few will have more than 1 beetle. For this exercise, assume that the mean number of beetles trapped is 0.3 with a standard deviation of 0.8. (a) Suppose that your state does not have the resources to check all the traps, and so it plans to check only an SRS of n ⫽ 100 traps. What are the mean and standard deviation of the average number of beetles x in 100 traps? (b) Use the central limit theorem to find the probability that the average number of beetles in 100 traps is greater than 0.5. (c) Do you think it is appropriate in this situation to use the central limit theorem? Explain your answer. 5.24 Grades in a math course. Indiana University posts the grade distributions for its courses online.8 Students in one section of Math 118 in the fall 2012 semester received 33% A’s, 33% B’s, 20% C’s, 12% D’s, and 2% F’s. (a) Using the common scale A 5 4, B 5 3, C 5 2, D 5 1, F 5 0, take X to be the grade of a randomly chosen Math 118 student. Use the definitions of the mean (page 265) and standard deviation (page 273) for discrete random variables to find the mean m and the standard deviation s of grades in this course. (b) Math 118 is a large enough course that we can take the grades of an SRS of 25 students to be independent of each other. If x is the average of these 25 grades, what are the mean and standard deviation of x? (c) What is the probability that a randomly chosen Math 118 student gets a B or better, P1X ⱖ 32? (d) What is the approximate probability P1x ⱖ 32 that the grade point average for 25 randomly chosen Math 118 students is B or better?
5.1 The Sampling Distribution of a Sample Mean 5.25 Diabetes during pregnancy. Sheila’s doctor is concerned that she may suffer from gestational diabetes (high blood glucose levels during pregnancy). There is variation both in the actual glucose level and in the results of the blood test that measures the level. A patient is classified as having gestational diabetes if her glucose level is above 140 milligrams per deciliter (mg/dl) one hour after a sugary drink is ingested. Sheila’s measured glucose level one hour after ingesting the sugary drink varies according to the Normal distribution with m ⫽ 125 mg/dl and s ⫽ 10 mg/dl. (a) If a single glucose measurement is made, what is the probability that Sheila is diagnosed as having gestational diabetes? (b) If measurements are made instead on three separate days and the mean result is compared with the criterion 140 mg/dl, what is the probability that Sheila is diagnosed as having gestational diabetes? 5.26 A roulette payoff. A $1 bet on a single number on a casino’s roulette wheel pays $35 if the ball ends up in the number slot you choose. Here is the distribution of the payoff X: Payoff X Probability
$0
$35
0.974
0.026
Each spin of the roulette wheel is independent of other spins. (a) What are the mean and standard deviation of X? (b) Sam comes to the casino weekly and bets on 10 spins of the roulette wheel. What does the law of large numbers say about the average payoff Sam receives from his bets each visit? (c) What does the central limit theorem say about the distribution of Sam’s average payoff after betting on 520 spins in a year? (d) Sam comes out ahead for the year if his average payoff is greater than $1 (the amount he bet on each spin). What is the probability that Sam ends the year ahead? The true probability is 0.396. Does using the central limit theorem provide a reasonable approximation? We will return to this problem in the next section. 5.27 Defining a high glucose reading. In Exercise 5.25, Sheila’s measured glucose level one hour after ingesting the sugary drink varies according to the Normal distribution with m ⫽ 125 mg/dl and s ⫽ 10 mg/dl. What is the level L such that there is probability only 0.05 that the mean glucose level of three test results falls above L for Sheila’s glucose level distribution? 5.28 Risks and insurance. The idea of insurance is that we all face risks that are unlikely but carry high cost. Think of a fire destroying your home. So we form a group to share the risk: we all pay a small amount, and the insurance policy pays a large amount to those few of us whose homes burn down. An insurance company
319
looks at the records for millions of homeowners and sees that the mean loss from fire in a year is m ⫽ $250 per house and that the standard deviation of the loss is s ⫽ $1000. (The distribution of losses is extremely rightskewed: most people have $0 loss, but a few have large losses.) The company plans to sell fire insurance for $250 plus enough to cover its costs and profit. (a) Explain clearly why it would be unwise to sell only 12 policies. Then explain why selling many thousands of such policies is a safe business. (b) If the company sells 25,000 policies, what is the approximate probability that the average loss in a year will be greater than $270? 5.29 Weights of airline passengers. In response to the increasing weight of airline passengers, the Federal Aviation Administration told airlines to assume that passen gers average 190 pounds in the summer, including clothing and carry-on baggage. But passengers vary: the FAA gave a mean but not a standard deviation. A reasonable standard deviation is 35 pounds. Weights are not Normally distributed, especially when the population includes both men and women, but they are not very non-Normal. A commuter plane carries 25 passengers. What is the approximate probability that the total weight of the passengers exceeds 5200 pounds? (Hint: To apply the central limit theorem, restate the problem in terms of the mean weight.) 5.30 Trustworthiness and eye color. Various studies have shown that facial appearance affects social interactions. One recent study looked at the relationship between eye color and trustworthiness.9 In this study, there were 238 participants, 78 with brown eyes and 160 with blue or green eyes. Each participant was asked to rate a set of student photos in terms of trustworthiness on a 10-point scale, where 1 means very trustworthy and 10 very untrustworthy. All photos showed a student who was seated in front of a white background and looking directly at the camera with a neutral expression. The photos were cropped so that the eyes were at the same height on each photo and a neckline was visible. Suppose that for the population of all brown-eyed participants, a photo of a blue-eyed female student has a mean score of 5.8 and a standard deviation of 2.5. That same photo for the population of all blue- or green-eyed participants has a mean score of 6.3 and a standard deviation of 2.2. (a) Although each participant’s score is discrete, the mean score for each eye color group will be close to Normal. Why? (b) What are the means and standard deviations of the sample means of the scores for the two eye color groups in this study?
320
CHAPTER 5
•
Sampling Distributions
5.31 Trustworthiness and eye color, continued. Refer to the previous exercise. (a) We can take all 238 scores to be independent because participants are not told each other’s scores. What is the distribution of the difference between the mean scores in the two groups? (b) Find the probability that the mean score for the browneyed group is less than the mean score for the other group. 5.32 Iron depletion without anemia and physical performance. Several studies have shown a link between iron depletion without anemia (IDNA) and physical performance. In one recent study, the physical performance of 24 female collegiate rowers with IDNA was compared with 24 female collegiate rowers with normal iron status.10 Several different measures of physical performance were studied, but we’ll focus here on training-session duration. Assume that training-session duration of female rowers with IDNA is Normally distributed with mean 58 minutes and standard deviation 11 minutes. Training-session duration of female rowers with normal iron status is Normally distributed with mean 69 minutes and standard deviation 18 minutes. (a) What is the probability that the mean duration of the 24 rowers with IDNA exceeds 63 minutes? (b) What is the probability that the mean duration of the 24 rowers with normal iron status is less than 63 minutes? (c) What is the probability that the mean duration of the 24 rowers with IDNA is greater than the mean duration of the 24 rowers with normal iron status? 5.33 Treatment and control groups. The previous exercise illustrates a common setting for statistical inference. This exercise gives the general form of the sampling distribution needed in this setting. We have a sample of n observations from a treatment group and an independent sample of m observations from a control group. Suppose
that the response to the treatment has the N1mX, sX 2 distribution and that the response of control subjects has the N1mY, sY 2 distribution. Inference about the difference mY ⫺ mX between the population means is based on the difference y ⫺ x between the sample means in the two groups. (a) Under the assumptions given, what is the distribution of y? Of x? (b) What is the distribution of y ⫺ x? 5.34 Investments in two funds. Jennifer invests her money in a portfolio that consists of 70% Fidelity 500 Index Fund and 30% Fidelity Diversified International Fund. Suppose that in the long run the annual real return X on the 500 Index Fund has mean 9% and standard deviation 19%, the annual real return Y on the Diversified International Fund has mean 11% and standard deviation 17%, and the correlation between X and Y is 0.6. (a) The return on Jennifer’s portfolio is R ⫽ 0.7X ⫹ 0.3Y. What are the mean and standard deviation of R? (b) The distribution of returns is typically roughly symmetric but with more extreme high and low observations than a Normal distribution. The average return over a number of years, however, is close to Normal. If Jennifer holds her portfolio for 20 years, what is the approximate probability that her average return is less than 5%? (c) The calculation you just made is not overly helpful, because Jennifer isn’t really concerned about the mean return R. To see why, suppose that her portfolio returns 12% this year and 6% next year. The mean return for the two years is 9%. If Jennifer starts with $1000, how much does she have at the end of the first year? At the end of the second year? How does this amount compare with what she would have if both years had the mean return, 9%? Over 20 years, there may be a large difference between the ordinary mean R and the geometric mean, which reflects the fact that returns in successive years multiply rather than add.
5.2 Sampling Distributions for Counts and Proportions When you complete this section, you will be able to • Determine when the count X can be modeled using the binomial distribution. • Determine when the sampling distribution of X can be modeled using the binomial distribution. • Calculate the mean and standard deviation of X when it has the B(n, p) distribution. • Explain the differences in the sampling distributions of a count X and the associated sample proportion pˆ 5X/n. • Determine when one can utilize the Normal approximation to describe the sampling distribution of the count or the sampling distribution of the sample proportion. • Use the Normal approximation for counts and proportions to perform probability calculations about the statistics.
5.2 Sampling Distributions for Counts and Proportions
LOOK BACK categorical variable, p. 3
321
In the previous section, we discussed the probability distribution of the sample mean, which meant a focus on population values that were quantitative. We will now shift our focus to population values that are categorical. Counts and proportions are discrete statistics that describe categorical data. We focus our discussion on the simplest case of a random variable with only two possible categories. Here is an example.
EXAMPLE 5.12 Work hours make it difficult to spend time with children. A sample survey asks 1006 British parents whether they think long working hours are making it difficult to spend enough time with their children.11 We would like to view the responses of these parents as representative of a larger population of British parents who hold similar beliefs. That is, we will view the responses of the sampled parents as an SRS from a population. When there are only two possible outcomes for a random variable, we can summarize the results by giving the count for one of the possible outcomes. We let n represent the sample size, and we use X to represent the random variable that gives the count for the outcome of interest.
EXAMPLE 5.13 The random variable of interest. In our sample survey of British par-
ents, n ⫽ 1006. We will ask each parent in our sample whether he or she feels long working hours make it difficult to spend enough time with their children. The variable X is the number of parents who think that long working hours make it difficult to spend enough time with their children. In this case, X ⫽ 755.
sample proportion
In our example, we chose the random variable X to be the number of parents who think that long working hours make it difficult to spend enough time with their children. We could have chosen X to be the number of parents who do not think that long working hours make it difficult to spend enough time with their children. The choice is yours. Often we make the choice based on how we would like to describe the results in a summary. Which choice do you prefer in this case? When a random variable has only two possible outcomes, we can also use the sample proportion pˆ ⫽ X兾n as a summary.
EXAMPLE 5.14 The sample proportion. The sample proportion of parents surveyed who think that long working hours make it difficult to spend enough time with their children is pˆ ⫽
755 ⫽ 0.75 1006
322
CHAPTER 5
•
Sampling Distributions Notice that this summary takes into account the sample size n. We need to know n in order to properly interpret the meaning of the random variable X. For example, the conclusion we would draw about parent opinions in this survey would be quite different if we had observed X ⫽ 755 from a sample twice as large, n ⫽ 2012.
USE YOUR KNOWLEDGE 5.35 Sexual harassment in middle school and high school. A survey of 1965 students in grades 7 to 12 reports that 48% of the students say they have encountered some type of sexual harassment while at school.12 Give n, X, and pˆ for this survey. 5.36 Seniors who have taken a statistics course. In a random sample of 300 senior students from your college, 63% reported that they had taken a statistics course. Give n, X, and pˆ for this setting. 5.37 Use of the Internet to find a place to live. A poll of 1500 college students asked whether or not they have used the Internet to find a place to live sometime within the past year. There were 1025 students who answered “Yes”; the other 475 answered “No.” (a) What is n? (b) Choose one of the two possible outcomes to define the random variable, X. Give a reason for your choice. (c) What is the value of X? (d) Find the sample proportion, pˆ .
Just like the sample mean, sample counts and sample proportions are commonly used statistics, and understanding their sampling distributions is important for statistical inference. These statistics, however, are discrete random variables and thus introduce us to a new family of probability distributions.
The binomial distributions for sample counts The distribution of a count X depends on how the data are produced. Here is a simple but common situation.
THE BINOMIAL SETTING 1. There is a fixed number of observations n. 2. The n observations are all independent. 3. Each observation falls into one of just two categories, which for convenience we call “success” and “failure.” 4. The probability of a success, call it p, is the same for each observation.
5.2 Sampling Distributions for Counts and Proportions
323
Think of tossing a coin n times as an example of the binomial setting. Each toss gives either heads or tails and the outcomes of successive tosses are independent. If we call heads a success, then p is the probability of a head and remains the same as long as we toss the same coin. The number of heads we count is a random variable X. The distribution of X (and, more generally, the distribution of the count of successes in any binomial setting) is completely determined by the number of observations n and the success probability p.
BINOMIAL DISTRIBUTIONS The distribution of the count X of successes in the binomial setting is called the binomial distribution with parameters n and p. The parameter n is the number of observations, and p is the probability of a success on any one observation. The possible values of X are the whole numbers from 0 to n. As an abbreviation, we say that the distribution of X is B1n, p2.
The binomial distributions are an important class of discrete probability distributions. Later in this section we will learn how to assign probabilities to outcomes and how to find the mean and standard deviation of binomial distributions. That said, the most important skill for using binomial distributions is the ability to recognize situations to which they do and do not apply. This can be done by checking all the facets of the binomial setting.
EXAMPLE 5.15 Binomial examples? (a) Genetics says that children receive genes from each of their parents independently. Each child of a particular pair of parents has probability 0.25 of having type O blood. If these parents have 3 children, the number who have type O blood is the count X of successes in 3 independent trials with probability 0.25 of a success on each trial. So X has the B13, 0.252 distribution. (b) Engineers define reliability as the probability that an item will perform its function under specific conditions for a specific period of time. Replacement heart valves made of animal tissue, for example, have probability 0.77 of performing well for 15 years.13 The probability of failure within 15 years is therefore 0.23. It is reasonable to assume that valves in different patients fail (or not) independently of each other. The number of patients in a group of 500 who will need another valve replacement within 15 years has the B1500, 0.232 distribution. (c) A multicenter trial is designed to assess a new surgical procedure. A total of 540 patients will undergo the procedure, and the count of patients X who suffer a major adverse cardiac event (MACE) within 30 days of surgery will be recorded. Because these patients will receive this procedure from different surgeons at different hospitals, it may not be true that the probability of a MACE is the same for each patient. Thus, X may not have the binomial distribution.
324
CHAPTER 5
•
Sampling Distributions USE YOUR KNOWLEDGE 5.38 Genetics and blood types. Genetics says that children receive genes from each of their parents independently. Suppose that each child of a particular pair of parents has probability 0.5 of having type AB blood. If these parents have 4 children, what is the distribution of the number who have type AB blood? Explain your answer. 5.39 Toss a coin. Toss a fair coin 10 times. Give the distribution of X, the number of heads that you observe.
Binomial distributions in statistical sampling The binomial distributions are important in statistics when we wish to make inferences about the proportion p of “successes” in a population. Here is a typical example.
EXAMPLE 5.16 Audits of financial records. The financial records of businesses may be audited by state tax authorities to test compliance with tax laws. It is too time-consuming to examine all sales and purchases made by a company during the period covered by the audit. Suppose that the auditor examines an SRS of 150 sales records out of 10,000 available. One issue is whether each sale was correctly classified as subject to state sales tax or not. Suppose that 800 of the 10,000 sales are incorrectly classified. Is the count X of misclassified records in the sample a binomial random variable?
LOOK BACK stratified sample, p. 197
Choosing an SRS from a population is not quite a binomial setting. Removing one record in Example 5.16 changes the proportion of bad records in the remaining population, so the state of the second record chosen is not independent of the first. Because the population is large, however, removing a few items has a very small effect on the composition of the remaining population. Successive inspection results are very nearly independent. The population proportion of misclassified records is 800 ⫽ 0.08 p⫽ 10,000 If the first record chosen is bad, the proportion of bad records remaining is 799/9999 5 0.079908. If the first record is good, the proportion of bad records left is 800/9999 5 0.080008. These proportions are so close to 0.08 that for practical purposes we can act as if removing one record has no effect on the proportion of misclassified records remaining. We act as if the count X of misclassified sales records in the audit sample has the binomial distribution B1150, 0.082. Populations like the one described in Example 5.16 often contain a relatively small number of items with very large values. For this example, these values would be very large sale amounts and likely represent an important group of items to the auditor. An SRS taken from such a population will likely include very few items of this type. Therefore, it is common to use a stratified sample in settings like this. Strata are defined based on dollar value of the sale, and within each stratum, an SRS is taken. The results are then combined to obtain an estimate for the entire population.
5.2 Sampling Distributions for Counts and Proportions
325
SAMPLING DISTRIBUTION OF A COUNT A population contains proportion p of successes. If the population is much larger than the sample, the count X of successes in an SRS of size n has approximately the binomial distribution B1n, p2. The accuracy of this approximation improves as the size of the population increases relative to the size of the sample. As a rule of thumb, we will use the binomial sampling distribution for counts when the population is at least 20 times as large as the sample.
Finding binomial probabilities We will later give a formula for the probability that a binomial random variable takes any of its values. In practice, you will rarely have to use this formula for calculations because some calculators and most statistical software packages will calculate binomial probabilities for you.
EXAMPLE 5.17 Probabilities for misclassified sales records. In the audit setting of Example 5.16, what is the probability that the audit finds exactly 10 misclassified sales records? What is the probability that the audit finds no more than 10 misclassified records? Figure 5.9 shows the output from one statistical software system. You see that if the count X has the B1150, 0.082 distribution, P1X ⫽ 102 ⫽ 0.106959 P1X ⱕ 102 ⫽ 0.338427 It was easy to request these calculations in the software’s menus. For the TI-83/84 calculator, the functions binompdf and binomcdf would be used. In R, the functions dbinom and pbinom would be used. Typically, the output supplies more decimal places than we need and uses labels that may not be helpful (for example, “Probability Density Function” when the distribution is discrete, not continuous). But, as usual with software, we can ignore distractions and find the results we need. Minitab
Probability Density Function Binomial with n = 150 and p = 0.08 x 10
P(X = x ) 0.106959
Cumulative Distribution Function Binomial with n = 150 and p = 0.08
FIGURE 5.9 Binomial probabilities for Example 5.17: output from the Minitab statistical software.
x 10
P( X |t|
Pooled Satterthwaite
Equal Unequal
8 7.9601
2.28 2.28
0.0521 0.0523
7.2 Comparing Two Means
459
Excel
A
B
C
t-Test: Two-Sample Assuming Unequal Variances
1 2
Early
3
Late
4
Mean
5
Variance
6
Observations
7
Hypothesized Mean Difference
0
8
Df
9
t Stat
10
P(T t Prob < t
᎐2.27945 7.960145 0.0523 0.9739 0.0261*
᎐10
᎐5
0
5
10
FIGURE 7.15 (Continued ) Figure 7.15 gives outputs for this analysis from several software packages. Although the formats differ, the basic information is the same. All report the sample sizes, the sample means and standard deviations (or variances), the t statistic, and its P-value. All agree that the P-value is small, though some give more detail than others. Software often labels the groups in alphabetical order. Always check the means first and report the statistic (you may need to change the sign) in an appropriate way. We do not need to do that here. Be sure to also mention the size of the effect you observed, such as “The mean weight loss for the early eaters was 6.44 kg higher than for the late eaters.” There are two other things to notice in the outputs. First, SAS and SPSS only give results for the two-sided alternative. To get the P-value for the one-sided alternative, we must first check the mean difference to make sure it is in the proper direction. If it is, we divide the given P-value by 2. Also, SAS and SPSS report the results of two t procedures: a special procedure that assumes that the
460
CHAPTER 7
•
Inference for Distributions
*Output1 - IBM SPSS Statistics Viewer
Group Statistics grp
N
Mean
Std. Deviation
Std. Error Mean
Early
5
11.560
4.3062
1.9258
Late
5
5.120
4.6224
2.0672
Loss
t-test for Equality of Means t
df
Sig. (2tailed)
Std. Error 95% Confidence Interval of Mean the Difference Difference Difference Lower
Upper
Equal variances 2.279 assumed
8
.052
6.4400
2.8252
–.0750
12.9550
Equal variances 2.279 not assumed
7.960
.052
6.4400
2.8252
–.0807
12.9607
Loss
IBM SPSS Statistics Processor is ready
FIGURE 7.15 (Continued )
two population variances are equal and the general two-sample procedure that we have just studied. We don’t recommend the “equal-variances” procedures, but we describe them later, in the section on pooled two-sample t procedures.
Software approximation for the degrees of freedom We noted earlier that the two-sample t statistic does not have a t distribution. Moreover, the distribution changes as the unknown population standard deviations s1 and s2 change. However, the distribution can be approximated by a t distribution with degrees of freedom given by a df ⫽
s21 s22 2 ⫹ b n1 n2
s21 2 s22 2 1 1 a b ⫹ a b n1 ⫺ 1 n1 n2 ⫺ 1 n2
This is the approximation used by most statistical software. It is quite accurate when both sample sizes n1 and n2 are 5 or larger.
EXAMPLE 7.18 Degrees of freedom for directed reading assessment. For the DRP study of Example 7.14, the following table summarizes the data: Group
n
x
s
1
21
51.48
11.01
2
23
41.52
17.15
7.2 Comparing Two Means
461
For greatest accuracy, we will use critical points from the t distribution with degrees of freedom given by the equation above: 17.152 2 11.012 ⫹ b 21 23 df ⫽ 1 11.012 2 1 17.152 2 a b ⫹ a b 20 21 22 23 a
344.486 ⫽ 37.86 9.099 This is the value that we reported in Examples 7.14 and 7.15, where we gave the results produced by software. ⫽
The number df given by the preceding approximation is always at least as large as the smaller of n1 ⫺ 1 and n2 ⫺ 1. On the other hand, the number df is never larger than the sum n1 ⫹ n2 ⫺ 2 of the two individual degrees of freedom. The number df is generally not a whole number. There is a t distribution with any positive degrees of freedom, even though Table D contains entries only for whole-number degrees of freedom. When the number df is small and is not a whole number, interpolation between entries in Table D may be needed to obtain an accurate critical value or P-value. Because of this and the need to calculate df, we do not recommend regular use of this approximation if a computer is not doing the arithmetic. With a computer, however, the more accurate procedures are painless. USE YOUR KNOWLEDGE 7.60 Calculating the degrees of freedom. Assume that s1 ⫽ 13, s2 ⫽ 8, n1 ⫽ 28, and n2 ⫽ 24. Find the approximate degrees of freedom.
The pooled two-sample t procedures There is one situation in which a t statistic for comparing two means has exactly a t distribution. This is when the two Normal population distributions have the same standard deviation. As we’ve done with other t statistics, we will first develop the z statistic and then, from it, the t statistic. In this case, notice that we need to substitute only a single standard error when we go from the z to the t statistic. This is why the resulting t statistic has a t distribution. Call the common—and still unknown—standard deviation of both populations s. Both sample variances s21 and s22 estimate s2. The best way to combine these two estimates is to average them with weights equal to their degrees of freedom. This gives more weight to the sample variance from the larger sample, which is reasonable. The resulting estimator of s2 is s2p ⫽ pooled estimator of s2
1n1 ⫺ 12 s21 ⫹ 1n2 ⫺ 12 s22 n1 ⫹ n2 ⫺ 2
This is called the pooled estimator of s 2 because it combines the information in both samples. When both populations have variance s2, the addition rule for variances says that x1 ⫺ x2 has variance equal to the sum of the individual variances,
462
CHAPTER 7
•
Inference for Distributions which is s2 s2 1 1 ⫹ ⫽ s2 a ⫹ b n1 n2 n1 n2 The standardized difference between means in this equal-variance case is therefore 1x1 ⫺ x2 2 ⫺ 1m1 ⫺ m2 2 z⫽ 1 1 s ⫹ n n B 1 2 This is a special two-sample z statistic for the case in which the populations have the same s. Replacing the unknown s by the estimate sp gives a t statistic. The degrees of freedom are n1 ⫹ n2 ⫺ 2, the sum of the degrees of freedom of the two sample variances. This t statistic is the basis of the pooled two-sample t inference procedures.
THE POOLED TWO-SAMPLE t PROCEDURES Suppose that an SRS of size n1 is drawn from a Normal population with unknown mean m1 and that an independent SRS of size n2 is drawn from another Normal population with unknown mean m2. Suppose also that the two populations have the same standard deviation. A level C confidence interval for m1 ⫺ m2 is 1 1 1x1 ⫺ x2 2 ⫾ t*sp ⫹ n2 B n1 Here t* is the value for the t1n1 ⫹ n2 ⫺ 22 density curve with area C between ⫺t* and t*. To test the hypothesis H0: m1 ⫽ m2, compute the pooled two-sample t statistic x1 ⫺ x2 t⫽ 1 1 sp ⫹ n2 B n1 In terms of a random variable T having the t1n1 ⫹ n2 ⫺ 22 distribution, the P-value for a test of H0 against Ha: m1 ⬎ m2 is P1T ⱖ t2 Ha: m1 ⬍ m2 is P1T ⱕ t2 Ha: m1 ⬆ m2 is 2P1T ⱖ 0 t 0 2
t
t
t
EXAMPLE 7.19 Calcium and blood pressure. Does increasing the amount of calcium in our diet reduce blood pressure? Examination of a large sample of people revealed a relationship between calcium intake and blood pressure, but such observational studies do not establish causation. Animal experiments, however, showed that calcium supplements do reduce blood pressure in rats,
7.2 Comparing Two Means
TABLE 7.5
Seated Systolic Blood Pressure (mm Hg)
Calcium Group
DATA BP_CA
463
Begin
End
107
100
110
114
123
Placebo Group Decrease
Begin
End
Decrease
7
123
124
109
97
21 12
105
24 18
112
113
21
129
112
17
102
105
112
115
23
98
95
23 3
111
116
119
107
106
25 1
114 119
114
25 5
112
102
10
114
112
2
136
125
11
110
121
211
102
104
22
117
118
21
130
133
23
justifying an experiment with human subjects. A randomized comparative experiment gave one group of 10 black men a calcium supplement for 12 weeks. The control group of 11 black men received a placebo that appeared identical. (In fact, a block design with black and white men as the blocks was used. We will look only at the results for blacks, because the earlier survey suggested that calcium is more effective for blacks.) The experiment was double-blind. Table 7.5 gives the seated systolic (heart contracted) blood pressure for all subjects at the beginning and end of the 12-week period, in millimeters of mercury (mm Hg). Because the researchers were interested in decreasing blood pressure, Table 7.5 also shows the decrease for each subject. An increase appears as a negative entry.24
CHALLENGE
As usual, we first examine the data. To compare the effects of the two treatments, take the response variable to be the amount of the decrease in blood pressure. Inspection of the data reveals that there are no outliers. Side-by-side boxplots and Normal quantile plots (Figures 7.16 and 7.17) give a more
FIGURE 7.16 Side-by-side
20
boxplots of the decrease in blood pressure from Table 7.5.
15
Decrease
10 5 0 –5 –10 Calcium
Placebo
464
CHAPTER 7
•
Inference for Distributions
Blood pressure change— placebo group
15 10 5 0 –5 –10 –2
–1
0 Normal score
1
2
–1
0 Normal score
1
2
Blood pressure change— calcium group
20
FIGURE 7.17 Normal quantile
15 10 5 0 –5 –2
plots of the change in blood pressure from Table 7.5.
detailed picture. The calcium group has a somewhat short left tail, but there are no severe departures from Normality that will prevent use of t procedures. To examine the question of the researchers who collected these data, we perform a significance test.
EXAMPLE 7.20 Does increased calcium reduce blood pressure? Take Group 1 to be the calcium group and Group 2 to be the placebo group. The evidence that calcium lowers blood pressure more than a placebo is assessed by testing H0: m1 ⫽ m2 Ha: m1 ⬎ m2 Here are the summary statistics for the decrease in blood pressure: Group
Treatment
n
x
s
1 2
Calcium Placebo
10 11
5.000 20.273
8.743 5.901
The calcium group shows a drop in blood pressure, and the placebo group has a small increase. The sample standard deviations do not rule out equal
7.2 Comparing Two Means
465
population standard deviations. A difference this large will often arise by chance in samples this small. We are willing to assume equal population standard deviations. The pooled sample variance is s2p ⫽ ⫽
1n1 ⫺ 12 s21 ⫹ 1n2 ⫺ 12 s22 n1 ⫹ n2 ⫺ 2 110 ⫺ 128.7432 ⫹ 111 ⫺ 125.9012 ⫽ 54.536 10 ⫹ 11 ⫺ 2
so that sp ⫽ 254.536 ⫽ 7.385 The pooled two-sample t statistic is x1 ⫺ x2
t⫽ sp ⫽
1 1 ⫹ n n B 1 2
5.000 ⫺ 1⫺0.2732 7.385
⫽ df 5 19 p 0.10 t*
1.328
0.05 1.729
1 1 ⫹ B 10 11
5.273 ⫽ 1.634 3.227
The P-value is P1T ⱖ 1.6342, where t has the t1192 distribution. From Table D we can see that P falls between the a ⫽ 0.10 and a ⫽ 0.05 levels. Statistical software gives the exact value P ⫽ 0.059. The experiment found evidence that calcium reduces blood pressure, but the evidence falls a bit short of the traditional 5% and 1% levels. Sample size strongly influences the P-value of a test. An effect that fails to be significant at a specified level a in a small sample can be significant in a larger sample. In the light of the rather small samples in Example 7.20, the evidence for some effect of calcium on blood pressure is rather good. The published account of the study combined these results for blacks with the results for whites and adjusted for pretest differences among the subjects. Using this more detailed analysis, the researchers were able to report a P-value of 0.008. Of course, a P-value is almost never the last part of a statistical analysis. To make a judgment regarding the size of the effect of calcium on blood pressure, we need a confidence interval.
EXAMPLE 7.21 How different are the calcium and placebo groups? We estimate that the effect of calcium supplementation is the difference between the sample means of the calcium and the placebo groups, x1 ⫺ x2 ⫽ 5.273 mm Hg. A 90% confidence interval for m1 ⫺ m2 uses the critical value t* ⫽ 1.729 from the t1192 distribution. The interval is 1x1 ⫺ x2 2 ⫾ t*sp
1 1 1 1 ⫹ ⫽ 冤5.000 ⫺ 1⫺0.2732冥 ⫾ 11.7292 17.3852 ⫹ n2 B n1 B 10 11 ⫽ 5.273 ⫾ 5.579
466
CHAPTER 7
•
Inference for Distributions We are 90% confident that the difference in means is in the interval 1⫺0.306, 10.8522. The calcium treatment reduced blood pressure by about 5.3 mm Hg more than a placebo on the average, but the margin of error for this estimate is 5.6 mm Hg. The pooled two-sample t procedures are anchored in statistical theory and so have long been the standard version of the two-sample t in textbooks. But they require the assumption that the two unknown population standard deviations are equal. As we shall see in Section 7.3, this assumption is hard to verify. The pooled t procedures are therefore a bit risky. They are reasonably robust against both non-Normality and unequal standard deviations when the sample sizes are nearly the same. When the samples are quite different in size, the pooled t procedures become sensitive to unequal standard deviations and should be used with caution unless the samples are large. Unequal standard deviations are quite common. In particular, it is not unusual for the spread of data to increase when its center gets larger. Statistical software often calculates both the pooled and the unpooled t statistics, as in Figure 7.15. USE YOUR KNOWLEDGE 7.61 Timing of food intake revisited. Figure 7.15 (pages 458–460) gives the outputs from four software packages for comparing the weight loss of two groups with different eating schedules. Some of the software reports both pooled and unpooled analyses. Which outputs give the pooled results? What are the pooled t and its P-value? 7.62 Equal sample sizes. The software outputs in Figure 7.15 give the same value for the pooled and unpooled t statistics. Do some simple algebra to show that this is always true when the two sample sizes n1 and n2 are the same. In other cases, the two t statistics usually differ.
SECTION 7.2 Summary Significance tests and confidence intervals for the difference between the means m1 and m2 of two Normal populations are based on the difference x1 ⫺ x2 between the sample means from two independent SRSs. Because of the central limit theorem, the resulting procedures are approximately correct for other population distributions when the sample sizes are large. When independent SRSs of sizes n1 and n2 are drawn from two Normal populations with parameters m1, s1 and m2, s2 the two-sample z statistic 1x1 ⫺ x2 2 ⫺ 1m1 ⫺ m2 2
z⫽
s 21 s 22 ⫹ n2 B n1
has the N10, 12 distribution. The two-sample t statistic t⫽
1x1 ⫺ x2 2 ⫺ 1m1 ⫺ m2 2 s21 s22 ⫹ n2 B n1
does not have a t distribution. However, good approximations are available.
7.2 Comparing Two Means
467
Conservative inference procedures for comparing m1 and m2 are obtained from the two-sample t statistic by using the t1k2 distribution with degrees of freedom k equal to the smaller of n1 ⫺ 1 and n2 ⫺ 1. More accurate probability values can be obtained by estimating the degrees of freedom from the data. This is the usual procedure for statistical software. An approximate level C confidence interval for m1 ⫺ m2 is given by 1x1 ⫺ x2 2 ⫾ t*
s21 s22 ⫹ n2 B n1
Here, t* is the value for the t1k2 density curve with area C between ⫺t* and t*, where k is computed from the data by software or is the smaller of n1 ⫺ 1 and n2 ⫺ 1. The quantity t*
s21 s22 ⫹ n2 B n1
is the margin of error. Significance tests for H0: m1 ⫽ m2 use the two-sample t statistic t⫽
x1 ⫺ x2 s21 s22 ⫹ n2 B n1
The P-value is approximated using the t1k2 distribution where k is estimated from the data using software or is the smaller of n1 ⫺ 1 and n2 ⫺ 1. The guidelines for practical use of two-sample t procedures are similar to those for one-sample t procedures. Equal sample sizes are recommended. If we can assume that the two populations have equal variances, pooled twosample t procedures can be used. These are based on the pooled estimator 1n1 ⫺ 12 s21 ⫹ 1n2 ⫺ 12 s22 n1 ⫹ n2 ⫺ 2 of the unknown common variance and the t1n1 ⫹ n2 ⫺ 22 distribution. We do not recommend this procedure for regular use. s2p ⫽
SECTION 7.2 Exercises For Exercises 7.56 and 7.57, see pages 453–454; for Exercises 7.58 and 7.59, see page 455; for Exercise 7.60, see page 461; and for Exercises 7.61 and 7.62, see page 466. In exercises that call for two-sample t procedures, you may use either of the two approximations for the degrees of freedom that we have discussed: the value given by your software or the smaller of n1 ⫺ 1 and n2 ⫺ 1. Be sure to state clearly which approximation you have used. 7.63 What is wrong? In each of the following situations explain what is wrong and why. (a) A researcher wants to test H0: x1 ⫽ x2 versus the twosided alternative Ha: x1 ⬆ x2. (b) A study recorded the IQ scores of 100 college freshmen. The scores of the 56 males in the study were compared
with the scores of all 100 freshmen using the two-sample methods of this section. (c) A two-sample t statistic gave a P-value of 0.94. From this we can reject the null hypothesis with 90% confidence. (d) A researcher is interested in testing the one-sided alternative Ha: m1 ⬍ m2. The significance test gave t ⫽ 2.15. Since the P-value for the two-sided alternative is 0.036, he concluded that his P-value was 0.018. 7.64 Basic concepts. For each of the following, answer the question and give a short explanation of your reasoning. (a) A 95% confidence interval for the difference between two means is reported as 10.8, 2.32. What can you conclude about the results of a significance test of the null
468
CHAPTER 7
•
Inference for Distributions
hypothesis that the population means are equal versus the two-sided alternative? (b) Will larger samples generally give a larger or smaller margin of error for the difference between two sample means? 7.65 More basic concepts. For each of the following, answer the question and give a short explanation of your reasoning. (a) A significance test for comparing two means gave t ⫽ ⫺1.97 with 10 degrees of freedom. Can you reject the null hypothesis that the m’s are equal versus the two-sided alternative at the 5% significance level? (b) Answer part (a) for the one-sided alternative that the difference between means is negative.
students to look at the relationships among frequency of Facebook use, participation in Facebook activities, time spent preparing for class, and overall GPA.26 Students reported preparing for class an average of 706 minutes per week with a standard deviation of 526 minutes. Students also reported spending an average of 106 minutes per day on Facebook with a standard deviation of 93 minutes; 8% of the students reported spending no time on Facebook. (a) Construct a 95% confidence interval for the average number of minutes per week a student prepares for class. (b) Construct a 95% confidence interval for the average number of minutes per week a student spends on Facebook. (Hint: Be sure to convert from minutes per day to minutes per week.)
7.66 Effect of the confidence level. Assume that x1 ⫽ 100, x2 ⫽ 115, s1 ⫽ 19, s2 ⫽ 16, n1 ⫽ 50, and n2 ⫽ 40. Find a 95% confidence interval for the difference between the corresponding values of m. Does this interval include more or fewer values than a 99% confidence interval would? Explain your answer.
(c) Explain why you might expect the population distributions of these two variables to be highly skewed to the right. Do you think this fact makes your confidence intervals invalid? Explain your answer.
7.67 Trustworthiness and eye color. Why do we naturally tend to trust some strangers more than others? One group of researchers decided to study the relationship between eye color and trustworthiness.25 In their experiment the researchers took photographs of 80 students (20 males with brown eyes, 20 males with blue eyes, 20 females with brown eyes, and 20 females with blue eyes), each seated in front of a white background looking directly at the camera with a neutral expression. These photos were cropped so the eyes were horizontal and at the same height in the photo and so the neckline was visible. They then recruited 105 participants to judge the trustworthiness of each student photo. This was done using a 10-point scale, where 1 meant very untrustworthy and 10 very trustworthy. The 80 scores from each participant were then converted to z-scores, and the average z-score of each photo (across all 105 participants) was used for the analysis. Here is a summary of the results:
All students surveyed were U.S. residents admitted through the regular admissions process at a 4-year, public, primarily residential institution in the northeastern United States (N 5 3866). Students were sent a link to a survey hosted on SurveyMonkey.com, a survey-hosting website, through their universitysponsored email accounts. For the students who did not participate immediately, two additional reminders were sent, 1 week apart. Participants were offered a chance to enter a drawing to win one of 90 $10 Amazon.com gift cards as incentive. A total of 1839 surveys were completed for an overall response rate of 48%.
Eye color
n
x
s
Brown Blue
40 40
0.55 20.38
1.68 1.53
Can we conclude from these data that brown-eyed students appear more trustworthy compared to their blue-eyed counterparts? Test the hypothesis that the average scores for the two groups are the same. 7.68 Facebook use in college. Because of Facebook’s rapid rise in popularity among college students, there is a great deal of interest in the relationship between Facebook use and academic performance. One study collected information on n ⫽ 1839 undergraduate
7.69 Possible biases? Refer to the previous exercise. The authors state:
Discuss how these factors influence your interpretation of the results of this survey. 7.70 Comparing means. Refer to Exercise 7.68. Suppose that you wanted to compare the average minutes per week spent on Facebook with the average minutes per week spent preparing for class. (a) Provide an estimate of this difference. (b) Explain why it is incorrect to use the two-sample t test to see if the means differ. 7.71 Sadness and spending. The “misery is not miserly” phenomenon refers to a person’s spending judgment going haywire when the person is sad. In a study, 31 young adults were given $10 and randomly assigned to either a sad or a neutral group. The participants in the sad group watched a video about the death of a boy’s mentor (from The Champ), and those in the neutral group watched a video on the Great Barrier Reef. After the video, each participant was offered the chance to
7.2 Comparing Two Means trade $0.50 increments of the $10 for an insulated water bottle.27 Here are the data: SADNESS Group
Purchase price ($) 2.00
0.00
1.00
0.50
0.00
469
(b) Test whether these two groups show the same preference for this product. Use a two-sided alternative hypothesis and a significance level of 5%. (c) Construct a 95% confidence interval for the difference in average preference.
Neutral
0.00
0.50
2.00
1.00
0.00
0.00
0.00
0.00
1.00
Sad
3.00
4.00
0.50
1.00
2.50
2.00
1.50
0.00
1.50
1.50
2.50
4.00
3.00
3.50
1.00
3.50
1.00
(a) Examine each group’s prices graphically. Is use of the t procedures appropriate for these data? Carefully explain your answer. (b) Make a table with the sample size, mean, and standard deviation for each of the two groups. (c) State appropriate null and alternative hypotheses for comparing these two groups. (d) Perform the significance test at the a ⫽ 0.05 level, making sure to report the test statistic, degrees of freedom, and P-value. What is your conclusion? (e) Construct a 95% confidence interval for the mean difference in purchase price between the two groups. 7.72 Wine labels with animals? Traditional brand research argues that successful logos are ones that are highly relevant to the product they represent. However, a market research firm recently reported that nearly 20% of all table wine brands introduced in the last three years feature an animal on the label. Since animals have little to do with the product, why are marketers using this tactic? Some researchers have proposed that consumers who are “primed” (in other words, they’ve thought about the image earlier in an unrelated context) process visual information more easily.28 To demonstrate this, the researchers randomly assigned participants to either a primed or a nonprimed group. Each participant was asked to indicate his or her attitude toward a product on a seven-point scale (from 1 5 dislike very much to 7 5 like very much). A bottle of MagicCoat pet shampoo, with a picture of a collie on the label, was the product. Prior to giving this score, however, participants were asked to do a word find where four of the words were common to both groups (pet, grooming, bottle, label) and four were either related to the product image (dog, collie, puppy, woof) or conflicted with the image (cat, feline, kitten, meow). The following table contains the responses listed from smallest to largest. BPREF Group
Brand attitude
Primed
2233344444444445555555
Nonprimed
11223333333333334445
(a) Examine the scores of each group graphically. Is it appropriate to use the two-sample t procedures? Explain your answer.
(d) Write a short summary of your conclusions. 7.73 Drive-thru customer service. QSRMagazine.com assessed 2053 drive-thru visits at quick-service restaurants.29 One benchmark assessed was customer service. Responses ranged from “Rude (1)” to “Very Friendly (5).” The following table breaks down the responses according to two of the chains studied. DRVTHRU Rating Chain
1
2
3
4
5
Taco Bell McDonald’s
5
3
54
109
136
2
22
73
165
100
(a) Comment on the appropriateness of t procedures for these data. (b) Report the means and standard deviations of the ratings for each chain separately. (c) Test whether the two chains, on average, have the same customer satisfaction. Use a two-sided alternative hypothesis and a significance level of 5%. (d) Construct a 95% confidence interval for the difference in average satisfaction. 7.74 Diet and mood. Researchers were interested in comparing the long-term psychological effects of being on a high-carbohydrate, low-fat (LF) diet versus a highfat, low-carbohydrate (LC) diet.30 A total of 106 overweight and obese participants were randomly assigned to one of these two energy-restricted diets. At 52 weeks, 32 LC dieters and 33 LF dieters remained. Mood was assessed using a total mood disturbance score (TMDS), where a lower score is associated with a less negative mood. A summary of these results follows: Group
n
x
s
LC
32
47.3
28.3
LF
33
19.3
25.8
(a) Is there a difference in the TMDS at Week 52? Test the null hypothesis that the dieters’ average mood in the two groups is the same. Use a significance level of 0.05. (b) Critics of this study focus on the specific LC diet (that is, the science) and the dropout rate. Explain why the dropout rate is important to consider when drawing conclusions from this study. 7.75 Comparison of dietary composition. Refer to Example 7.16 (page 456). That study also broke down
470
CHAPTER 7
•
Inference for Distributions
the dietary composition of the main meal. The following table summarizes the total fats, protein, and carbohydrates in the main meal (g) for the two groups: Early eaters (n ⴝ 202)
Late eaters (n ⴝ 200)
x
s
x
s
Fats
23.1
12.5
21.4
8.2
Protein
27.6
8.6
25.7
6.8
Carbohydrates
64.1
21.0
63.5
20.8
(a) Is it appropriate to use the two-sample t procedures that we studied in this section to analyze these data for group differences? Give reasons for your answer. (b) Describe appropriate null and alternative hypotheses for comparing the two groups in terms of fats consumed. (c) Carry out the significance test using a ⫽ 0.05. Report the test statistic with the degrees of freedom and the P-value. Write a short summary of your conclusion. (d) Find a 95% confidence interval for the difference between the two means. Compare the information given by the interval with the information given by the significance test. 7.76 More on dietary composition. Refer to the previous exercise. Repeat parts (b) through (d) for protein and carbohydrates. Write a short summary of your findings. 7.77 Dust exposure at work. Exposure to dust at work can lead to lung disease later in life. One study measured the workplace exposure of tunnel construction workers.31 Part of the study compared 115 drill and blast workers with 220 outdoor concrete workers. Total dust exposure was measured in milligram years per cubic meter (mgⴢ y/m3). The mean exposure for the drill and blast workers was 18.0 mgⴢ y/m3 with a standard deviation of 7.8 mg ⴢ y/m3. For the outdoor concrete workers, the corresponding values were 6.5 mgⴢ y/m3 and 3.4 mgⴢ y/m3. (a) The sample included all workers for a tunnel construction company who received medical examinations as part of routine health checkups. Discuss the extent to which you think these results apply to other similar types of workers. (b) Use a 95% confidence interval to describe the difference in the exposures. Write a sentence that gives the interval and provides the meaning of 95% confidence. (c) Test the null hypothesis that the exposures for these two types of workers are the same. Justify your choice of a one-sided or two-sided alternative. Report the test statistic, the degrees of freedom, and the P-value. Give a short summary of your conclusion. (d) The authors of the article describing these results note that the distributions are somewhat skewed. Do you think that this fact makes your analysis invalid? Give reasons for your answer.
7.78 Not all dust is the same. Not all dust particles that are in the air around us cause problems for our lungs. Some particles are too large and stick to other areas of our body before they can get to our lungs. Others are so small that we can breathe them in and out and they will not deposit in our lungs. The researchers in the study described in the previous exercise also measured respirable dust. This is dust that deposits in our lungs when we breathe it. For the drill and blast workers, the mean exposure to respirable dust was 6.3 mg ⴢ y/m3 with a standard deviation of 2.8 mgⴢ y/m3. The corresponding values for the outdoor concrete workers were 1.4 mgⴢ y/m3 and 0.7 mg ⴢ y/m3. Analyze these data using the questions in the previous exercise as a guide. 7.79 Change in portion size. A study of food portion sizes reported that over a 17-year period, the average size of a soft drink consumed by Americans aged 2 years and older increased from 13.1 ounces (oz) to 19.9 oz. The authors state that the difference is statistically significant with P ⬍ 0.01.32 Explain what additional information you would need to compute a confidence interval for the increase, and outline the procedure that you would use for the computations. Do you think that a confidence interval would provide useful additional information? Explain why or why not. 7.80 Beverage consumption. The results in the previous exercise were based on two national surveys with a very large number of individuals. Here is a study that also looked at beverage consumption, but the sample sizes were much smaller. One part of this study compared 20 children who were 7 to 10 years old with 5 children who were 11 to 13.33 The younger children consumed an average of 8.2 oz of sweetened drinks per day while the older ones averaged 14.5 oz. The standard deviations were 10.7 oz and 8.2 oz, respectively. (a) Do you think that it is reasonable to assume that these data are Normally distributed? Explain why or why not. (Hint: Think about the 68–95–99.7 rule.) (b) Using the methods in this section, test the null hypothesis that the two groups of children consume equal amounts of sweetened drinks versus the two-sided alternative. Report all details of the significance-testing procedure with your conclusion. (c) Give a 95% confidence interval for the difference in means. (d) Do you think that the analyses performed in parts (b) and (c) are appropriate for these data? Explain why or why not. (e) The children in this study were all participants in an intervention study at the Cornell Summer Day Camp at Cornell University. To what extent do you think that these results apply to other groups of children?
7.2 Comparing Two Means 7.81 Study design is important! Recall Exercise 7.58 (page 455). You are concerned that day of the week may affect the number of hits. So to compare the two MySpace page designs, you choose two successive weeks in the middle of a month. You flip a coin to assign one Monday to the first design and the other Monday to the second. You repeat this for each of the seven days of the week. You now have 7 hit amounts for each design. It is incorrect to use the two-sample t test to see if the mean hits differ for the two designs. Carefully explain why. 7.82 New computer monitors? The purchasing department has suggested that all new computer monitors for your company should be flat screens. You want data to assure you that employees will like the new screens. The next 20 employees needing a new computer are the subjects for an experiment. (a) Label the employees 01 to 20. Randomly choose 10 to receive flat screens. The remaining 10 get standard monitors. (b) After a month of use, employees express their satisfaction with their new monitors by responding to the statement “I like my new monitor” on a scale from 1 to 5, where 1 represents “strongly disagree,” 2 is “disagree,” 3 is “neutral,” 4 is “agree,” and 5 stands for “strongly agree.” The employees with the flat screens have average satisfaction 4.8 with standard deviation 0.7. The employees with the standard monitors have average 3.0 with standard deviation 1.5. Give a 95% confidence interval for the difference in the mean satisfaction scores for all employees. (c) Would you reject the null hypothesis that the mean satisfaction for the two types of monitors is the same versus the two-sided alternative at significance level 0.05? Use your confidence interval to answer this question. Explain why you do not need to calculate the test statistic. 7.83 Why randomize? Refer to the previous exercise. A coworker suggested that you give the flat screens to the next 10 employees who need new screens and the standard monitor to the following 10. Explain why your randomized design is better. 7.84 Does ad placement matter? Corporate advertising tries to enhance the image of the corporation. A study compared two ads from two sources, the Wall Street Journal and the National Enquirer. Subjects were asked to pretend that their company was considering a major investment in Performax, the fictitious sportswear firm in the ads. Each subject was asked to respond to the question “How trustworthy was the source in the sportswear company ad for Performax?” on a 7-point scale. Higher values indicated more trustworthiness.34 Here is a summary of the results:
Ad source
n
x
s
Wall Street Journal
66
4.77
1.50
National Enquirer
61
2.43
1.64
471
(a) Compare the two sources of ads using a t test. Be sure to state your null and alternative hypotheses, the test statistic with degrees of freedom, the P-value, and your conclusion. (b) Give a 95% confidence interval for the difference. (c) Write a short paragraph summarizing the results of your analyses. 7.85 Size of trees in the northern and southern halves. The study of 584 longleaf pine trees in the Wade Tract in Thomas County, Georgia, had several purposes. Are trees in one part of the tract more or less like trees in any other part of the tract or are there differences? In Example 6.1 (page 352) we examined how the trees were distributed in the tract and found that the pattern was not random. In this exercise we will examine the sizes of the trees. In Exercise 7.31 (page 443) we analyzed the sizes, measured as diameter at breast height (DBH), for a random sample of 40 trees. Here we divide the tract into northern and southern halves and take random samples of 30 trees from each half. Here are the diameters in centimeters (cm) of the sampled trees: NSPINES
North
South
27.8
14.5
39.1
3.2
58.8
55.5
25.0
5.4
19.0
30.6
15.1
3.6
28.4
15.0
2.2
14.2
44.2
25.7
11.2
46.8
36.9
54.1
10.2
2.5
13.8
43.5
13.8
39.7
6.4
4.8
44.4
26.1
50.4
23.3
39.5
51.0
48.1
47.2
40.3
37.4
36.8
21.7
35.7
32.0
40.4
12.8
5.6
44.3
52.9
38.0
2.6
44.6
45.5
29.1
18.7
7.0
43.8
28.3
36.9
51.6
(a) Use a back-to-back stemplot and side-by-side boxplots to examine the data graphically. Describe the patterns in the data. (b) Is it appropriate to use the methods of this section to compare the mean DBH of the trees in the north half of the tract with the mean DBH of the trees in the south half? Give reasons for your answer. (c) What are appropriate null and alternative hypotheses for comparing the two samples of tree DBHs? Give reasons for your choices. (d) Perform the significance test. Report the test statistic, the degrees of freedom, and the P-value. Summarize your conclusion. (e) Find a 95% confidence interval for the difference in mean DBHs. Explain how this interval provides additional information about this problem.
472
CHAPTER 7
•
Inference for Distributions
7.86 Size of trees in the eastern and western halves. Refer to the previous exercise. The Wade Tract can also be divided into eastern and western halves. Here are the DBHs of 30 randomly selected longleaf pine trees from each half: EWPINES
East
West
23.5
43.5
6.6
11.5
17.2
38.7
2.3
31.5
10.5
13.8
5.2
31.5
22.1
6.7
2.6
6.3
51.1
5.4
23.7 9.0
43.0
8.7
22.8
2.9
22.3
43.8
48.1
46.5
39.8
10.9
17.2
44.6
44.1
35.5
51.0
21.6
44.1
11.2
36.0
42.1
3.2
25.5
36.5
39.0
25.9
20.8
3.2
57.7
43.3
58.0
21.7
35.6
30.9
40.6
30.7
35.6
18.2
2.9
20.4
11.4
Using the questions in the previous exercise, analyze these data. 7.87 Sales of a small appliance across months. A market research firm supplies manufacturers with estimates of the retail sales of their products from samples of retail stores. Marketing managers are prone to look at the estimate and ignore sampling error. Suppose that an SRS of 70 stores this month shows mean sales of 53 units of a small appliance, with standard deviation 12 units. During the same month last year, an SRS of 55 stores gave mean sales of 50 units, with standard deviation 10 units. An increase from 50 to 53 is a rise of 6%. The marketing manager is happy because sales are up 6%. (a) Use the two-sample t procedure to give a 95% confidence interval for the difference in mean number of units sold at all retail stores. (b) Explain in language that the manager can understand why he cannot be certain that sales rose by 6%, and that in fact sales may even have dropped. 7.88 An improper significance test. A friend has performed a significance test of the null hypothesis that two means are equal. His report states that the null hypothesis is rejected in favor of the alternative that the first mean is larger than the second. In a presentation on his work, he notes that the first sample mean was larger than the second mean and this is why he chose this particular one-sided alternative. (a) Explain what is wrong with your friend’s procedure and why. (b) Suppose that he reported t ⫽ 1.70 with a P-value of 0.06. What is the correct P-value that he should report? 7.89 Breast-feeding versus baby formula. A study of iron deficiency among infants compared samples of infants following different feeding regimens. One group contained breast-fed infants, while the infants in another group were fed a standard baby formula without any iron supplements. Here are summary results on blood hemoglobin levels at 12 months of age:35
Group
n
x
s
Breast-fed
23
13.3
1.7
Formula
19
12.4
1.8
(a) Is there significant evidence that the mean hemoglobin level is higher among breast-fed babies? State H0 and Ha and carry out a t test. Give the P-value. What is your conclusion? (b) Give a 95% confidence interval for the mean difference in hemoglobin level between the two populations of infants. (c) State the assumptions that your procedures in parts (a) and (b) require in order to be valid. 7.90 Revisiting the sadness and spending study. In Exercise 7.71 (page 468), the purchase price of a water bottle was analyzed using the two-sample t procedures that do not assume equal standard deviations. Compare the means using a significance test and find the 95% confidence interval for the difference using the pooled methods. How do the results compare with those you obtained in Exercise 7.71? 7.91 Revisiting wine labels with animals. In Exercise 7.72 (page 469), attitudes toward a product were compared using the two-sample t procedures that do not assume equal standard deviations. Compare the means using a significance test and find the 95% confidence interval for the difference using the pooled methods. How do the results compare with those you obtained in Exercise 7.72? BPREF 7.92 Revisiting dietary composition. In Exercise 7.75 (page 469), the total amount of fats was analyzed using the two-sample t procedures that do not assume equal standard deviations. Examine the standard deviations for the two groups and verify that it is appropriate to use the pooled procedures for these data. Compare the means using a significance test and find the 95% confidence interval for the difference using the pooled methods. How do the results compare with those you obtained in Exercise 7.75? 7.93 Revisiting the size of trees. Refer to the Wade Tract DBH data in Exercise 7.85 (page 471), where we compared a sample of trees from the northern half of the tract with a sample from the southern half. Because the standard deviations for the two samples are quite close, it is reasonable to analyze these data using the pooled procedures. Perform the significance test and find the 95% confidence interval for the difference in means using these methods. Summarize your results and compare them with what you found in Exercise 7.85. NSPINES 7.94 Revisiting the food-timing study. Example 7.16 (page 456) gives summary statistics for weight loss in
7.3 Other Topics in Comparing Distributions early eaters and late eaters. The two sample standard deviations are quite similar, so we may be willing to assume equal population standard deviations. Calculate the pooled t test statistic and its degrees of freedom from the summary statistics. Use Table D to assess significance. How do your results compare with the unpooled analysis in the example? 7.95 Computing the degrees of freedom. Use the Wade Tract data in Exercise 7.85 to calculate the software approximation to the degrees of freedom using the formula on page 460. Verify your calculation with software. 7.96 Again computing the degrees of freedom. Use the Wade Tract data in Exercise 7.86 to calculate the software approximation to the degrees of freedom using the formula on page 460. Verify your calculation with software. 7.97 Revisiting the dust exposure study. The data on occupational exposure to dust that we analyzed in Exercise 7.77 (page 470) come from two groups of workers that are quite different in size. This complicates the issue regarding pooling because the sample that is larger will dominate the calculations. (a) Calculate the software approximation to the degrees of freedom using the formula on page 460. Then verify your calculations with software. (b) Find the pooled estimate of the standard deviation. Write a short summary comparing it with the estimates of the standard deviations that come from each group.
473
(c) Find the standard error of the difference in sample means that you would use for the method that does not assume equal variances. Do the same for the pooled approach. Compare these two estimates with each other. (d) Perform the significance test and find the 95% confidence interval using the pooled methods. How do these results compare with those you found in Exercise 7.77? (e) Exercise 7.78 has data for the same workers but for respirable dust. Here the standard deviations differ more than those in Exercise 7.77 do. Answer parts (a) through (d) for these data. Write a summary of what you have found in this exercise. 7.98 Revisiting the small-sample example. Refer to Example 7.17 (page 457). This is a case where the sample sizes are quite small. With only 5 observations per group, we have very little information to make a judgment about whether the population standard deviations are equal. The potential gain from pooling is large when the sample sizes are small. Assume that we will perform a two-sided test using the 5% significance level. EATER (a) Find the critical value for the unpooled t test statistic that does not assume equal variances. Use the minimum of n1 ⫺ 1 and n2 ⫺ 1 for the degrees of freedom. (b) Find the critical value for the pooled t test statistic. (c) How does comparing these critical values show an advantage of the pooled test?
7.3 Other Topics in Comparing Distributions When you complete this section, you will be able to • Perform an F test for the equality of two variances. • Argue why this F test is of very little value in practice. In other words, identify when this test can be used and, more importantly, when it cannot. • Determine the sample size necessary to have adequate power to detect a scaled difference in means of size d.
In this section we discuss three topics that are related to the material that we have already covered in this chapter. If we can do inference for means, it is natural to ask if we can do something similar for spread. The answer is Yes, but there are many cautions. We also discuss robustness and show how to find the power for the two-sample t test. If you plan to design studies, you should become familiar with this last topic.
474
CHAPTER 7
•
Inference for Distributions
Inference for population spread The two most basic descriptive features of a distribution are its center and spread. In a Normal population, these aspects are measured by the mean and the standard deviation. We have described procedures for inference about population means for Normal populations and found that these procedures are often useful for non-Normal populations as well. It is natural to turn next to inference about the standard deviations of Normal populations. Our recommendation here is short and clear: don’t do it without expert advice. We will describe the F test for comparing the spread of two Normal populations. Unlike the t procedures for means, the F test and other procedures for standard deviations are extremely sensitive to non-Normal distributions. This lack of robustness does not improve in large samples. It is difficult in practice to tell whether a significant F-value is evidence of unequal population spreads or simply evidence that the populations are not Normal. Consequently, we do not recommend use of inference about population standard deviations in basic statistical practice.36 It was once common to test equality of standard deviations as a preliminary to performing the pooled two-sample t test for equality of two population means. It is better practice to check the distributions graphically, with special attention to skewness and outliers, and to use the software-based two-sample t that does not require equal standard deviations. In the words of one distinguished statistician, “To make a preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!”37
The F test for equality of spread Because of the limited usefulness of procedures for inference about the standard deviations of Normal distributions, we will present only one such procedure. Suppose that we have independent SRSs from two Normal populations, a sample of size n1 from N1m1, s1 2 and a sample of size n2 from N1m2, s2 2. The population means and standard deviations are all unknown. The hypothesis of equal spread H0: s1 ⫽ s2 is tested against Ha: s1 ⬆ s2 by a simple statistic, the ratio of the sample variances.
THE F STATISTIC AND F DISTRIBUTIONS When s21 and s22 are sample variances from independent SRSs of sizes n1 and n2 drawn from Normal populations, the F statistic F⫽
s21 s22
has the F distribution with n1 ⫺ 1 and n2 ⫺ 1 degrees of freedom when H0: s1 ⫽ s2 is true.
7.3 Other Topics in Comparing Distributions
475
FIGURE 7.18 The density curve for the F(9, 10) distribution. The F distributions are skewed to the right.
F distributions
0
1
2
3
4
5
6
The F distributions are a family of distributions with two parameters: the degrees of freedom of the sample variances in the numerator and denominator of the F statistic. The F distributions are another of R. A. Fisher’s contributions to statistics and are called F in his honor. Fisher introduced F statistics for comparing several means. We will meet these useful statistics in later chapters. Our brief notation will be F1 j, k2 for the F distribution with j degrees of freedom in the numerator and k degrees of freedom in the denominator. The numerator degrees of freedom are always mentioned first. Interchanging the degrees of freedom changes the distribution, so the order is important. The F distributions are not symmetric but are right-skewed. The density curve in Figure 7.18 illustrates the shape. Because sample variances cannot be negative, the F statistic takes only positive values and the F distribution has no probability below 0. The peak of the F density curve is near 1; values far from 1 in either direction provide evidence against the hypothesis of equal standard deviations. Tables of F critical values are awkward because a separate table is needed for every pair of degrees of freedom j and k. Table E in the back of the book gives upper P critical values of the F distributions for P 5 0.10, 0.05, 0.025, 0.01, and 0.001. For example, these critical values for the F19, 102 distribution shown in Figure 7.18 are p
0.10
0.05
0.025
0.01
0.001
F*
2.35
3.02
3.78
4.94
8.96
The skewness of F distributions causes additional complications. In the symmetric Normal and t distributions, the point with probability 0.05 below it is just the negative of the point with probability 0.05 above it. This is not true for F distributions. We therefore require either tables of both the upper and lower tails or a way to eliminate the need for lower-tail critical values. Statistical software that eliminates the need for tables is plainly very convenient. If you do not use statistical software, arrange the F test as follows: 1. Take the test statistic to be F⫽
larger s2 smaller s2
476
CHAPTER 7
•
Inference for Distributions This amounts to naming the populations so that s21 is the larger of the observed sample variances. The resulting F is always 1 or greater. 2. Compare the value of F with the critical values from Table E. Then double the probabilities obtained from the table to get the P-value for the twosided F test. The idea is that we calculate the probability in the upper tail and double to obtain the probability of all ratios on either side of 1 that are at least as improbable as that observed. Remember that the order of the degrees of freedom is important in using Table E.
EXAMPLE DATA BP_CA
7.22 Comparing calcium and placebo groups. Example 7.19 (page 462) recounts a medical experiment comparing the effects of calcium and a placebo on the blood pressure of black men. The analysis (Example 7.20) employed the pooled two-sample t procedures. Because these procedures require equal population standard deviations, it is tempting to first test H0: s1 ⫽ s2 Ha: s1 ⬆ s2
CHALLENGE
The larger of the two sample standard deviations is s ⫽ 8.743 from 10 observations. The other is s ⫽ 5.901 from 11 observations. The two-sided test statistic is therefore F⫽
larger s2 smaller s2
⫽
8.7432 ⫽ 2.20 5.9012
We compare the calculated value F ⫽ 2.20 with critical points for the F19, 102 distribution. Table E shows that 2.20 is less than the 0.10 critical value of the F19, 102 distribution, which is F* ⫽ 2.35. Doubling 0.10, we know that the observed F falls short of the 0.20 significance level. The results are not significant at the 20% level (or any lower level). Statistical software shows that the exact upper-tail probability is 0.118, and hence P ⫽ 0.236. If the populations were Normal, the observed standard deviations would give little reason to suspect unequal population standard deviations. Because one of the populations shows some non-Normality, however, we cannot be fully confident of this conclusion.
USE YOUR KNOWLEDGE 7.99 The F statistic. The F statistic F ⫽ s21兾s22 is calculated from samples of size n1 ⫽ 13 and n2 ⫽ 22. (a) What is the upper critical value for this F when using the 0.05 significance level? (b) In a test of equality of standard deviations against the two-sided alternative, this statistic has the value F ⫽ 2.45. Is this value significant at the 5% level? Is it significant at the 10% level?
7.3 Other Topics in Comparing Distributions
477
Robustness of Normal inference procedures We have claimed that • The t procedures for inference about means are quite robust against nonNormal population distributions. These procedures are particularly robust when the population distributions are symmetric and (for the two-sample case) when the two sample sizes are equal. • The F test and other procedures for inference about variances are so lacking in robustness as to be of little use in practice. Simulations with a large variety of non-Normal distributions support these claims. One set of simulations was carried out with samples of size 25 and used significance tests with fixed level a ⫽ 0.05. The three types of tests studied were the one-sample and pooled two-sample t tests and the F test for comparing two variances. The robustness of the one-sample and two-sample t procedures is remarkable. The true significance level remains between about 4% and 6% for a large range of populations. The t test and the corresponding confidence intervals are among the most reliable tools that statisticians use. Remember, however, that outliers can greatly disturb the t procedures. Also, two-sample procedures are less robust when the sample sizes are not similar. The lack of robustness of the tests for variances is equally remarkable. The true significance levels depart rapidly from the target 5% as the population distribution departs from Normality. The two-sided F test carried out with 5% critical values can have a true level of less than 1% or greater than 11% even in symmetric populations with no outliers. Results such as these are the basis for our recommendation that these procedures not be used.
The power of the two-sample t test
noncentral t distribution
The two-sample t test is one of the most used statistical procedures. Unfortunately, because of inadequate planning, users frequently fail to find evidence for the effects that they believe to be true. Power calculations should be part of the planning of any statistical study. Information from a pilot study or previous research is needed. In Section 7.1, we learned how to find an approximation for the power of the one-sample t test. The basic concepts (three steps) for the two-sample case are the same. Here, we give the exact method, which involves a new distribution, the noncentral t distribution. To perform the calculations, we simply need software to calculate probabilities for this distribution. We first present the method for the pooled two-sample t test, where the parameters are m1 ⫺ m2, and the common standard deviation is s. We then describe modifications to get approximate results when we do not pool. To find the power for the pooled two-sample t test, use the following steps. We consider only the case where the null hypothesis is m1 ⫺ m2 ⫽ 0. 1. Specify (a) an alternative value for m1 ⫺ m2 that you consider important to detect; (b) the sample sizes, n1 and n2; (c) a fixed significance level, a; (d) a guess at the standard deviation, s.
478
CHAPTER 7
•
Inference for Distributions 2. Find the degrees of freedom df 5 n1 ⫹ n2 ⫺ 2 and the value of t* that will lead to rejection of H0.
noncentrality parameter
3. (a) Calculate the noncentrality parameter 0 m1 ⫺ m2 0
d⫽ s
1 1 ⫹ n n B 1 2
(b) Find the power as the probability that a noncentral t random variable with degrees of freedom df and noncentrality parameter d will be greater than t*. In SAS the command is 1-PROBT(tstar,df,delta). In R the command is 1-pt(tstar,df,delta). If you do not have software that can perform this calculation, you can approximate the power as the probability that a standard Normal random variable is greater than t* ⫺ d, that is, P1z ⬎ t* ⫺ d2, and use Table A. Note that the denominator in the noncentrality parameter, s
1 1 ⫹ n2 B n1
is our guess at the standard deviation for the difference between the sample means. Therefore, if we wanted to assess a possible study in terms of the margin of error for the estimated difference, we would examine t* times this quantity. If we do not assume that the standard deviations are equal, we need to guess both standard deviations and then combine these for our guess at the standard deviation: s21 s22 ⫹ n2 B n1 This guess is then used in the denominator of the noncentrality parameter. For the degrees of freedom, the conservative approximation is appropriate.
EXAMPLE 7.23 Planning a new study of calcium versus placebo groups. In Example 7.20 (page 464) we examined the effect of calcium on blood pressure by comparing the means of a treatment group and a placebo group using a pooled two-sample t test. The P-value was 0.059, failing to achieve the usual standard of 0.05 for statistical significance. Suppose that we wanted to plan a new study that would provide convincing evidence—say, at the 0.01 level— with high probability. Let’s examine a study design with 45 subjects in each group 1n1 ⫽ n2 ⫽ 45) to see if this meets our goals. Step 1. Based on our previous results, we choose m1 ⫺ m2 ⫽ 5 as an alternative that we would like to be able to detect with a ⫽ 0.01. For s we use 7.4, our pooled estimate from Example 7.20. Step 2. The degrees of freedom are n1 ⫹ n2 ⫺ 2 ⫽ 88, which leads to t* ⫽ 2.37 for the significance test.
7.3 Other Topics in Comparing Distributions
479
Step 3. The noncentrality parameter is 5
d⫽ 7.4
1 1 ⫹ B 45 45
⫽
5 ⫽ 3.21 1.56
Software gives the power as 0.7965, or 80%. The Normal approximation gives 0.7983, a very accurate result. With this choice of sample sizes, we are just barely below 80% power. If we judge this to be enough power, we can proceed to the recruitment of our samples. With n1 ⫽ n2 ⫽ 45, we would expect the margin of error for a 95% confidence interval 1t* ⫽ 1.99) for the difference in means to be t* ⫻ 7.4
1 1 ⫹ ⫽ 1.99 ⫻ 1.56 ⫽ 3.1 B 45 45
With software it is very easy to examine the effects of variations in a study design. In the preceding example, we might want to examine the power for a ⫽ 0.05 and for smaller sample sizes.
USE YOUR KNOWLEDGE 7.100 Power and m1 2 m2. If you repeat the calculation in Example 7.23 for other values of m1 ⫺ m2 that are larger than 5, would you expect the power to be higher or lower than 0.7965? Why? 7.101 Power and the standard deviation. If the true population standard deviation were 7.1 instead of the 7.4 hypothesized in Example 7.23, would the power for this new experiment be greater or smaller than 0.7965? Explain.
SECTION 7.3 Summary Inference procedures for comparing the standard deviations of two Normal populations are based on the F statistic, which is the ratio of sample variances: F⫽
s21 s22
If an SRS of size n1 is drawn from the x1 population and an independent SRS of size n2 is drawn from the x2 population, the F statistic has the F distribution F1n1 ⫺ 1, n2 ⫺ 12 if the two population standard deviations s1 and s2 are in fact equal. The F test for equality of standard deviations tests H0: s1 ⫽ s2 versus Ha: s1 ⬆ s2 using the statistic F⫽
larger s2 smaller s2
and doubles the upper-tail probability to obtain the P-value.
480
CHAPTER 7
•
Inference for Distributions The t procedures are quite robust when the distributions are not Normal. The F tests and other procedures for inference about the spread of one or more Normal distributions are so strongly affected by non-Normality that we do not recommend them for regular use. The power of the pooled two-sample t test is found by first computing the critical value for the significance test, the degrees of freedom, and the noncentrality parameter for the alternative of interest. These are used to find the power from the noncentral t distribution. A Normal approximation works quite well. Calculating margins of error for various study designs and assumptions is an alternative procedure for evaluating designs.
SECTION 7.3 Exercises For Exercise 7.99, see page 476; and for Exercises 7.100 and 7.101, see page 479. In all exercises calling for use of the F test, assume that both population distributions are very close to Normal. The actual data are not always sufficiently Normal to justify use of the F test. 7.102 Comparison of standard deviations. Here are some summary statistics from two independent samples from Normal distributions: Sample
n
s2
1
11
3.5
2
16
9.1
You want to test the null hypothesis that the two population standard deviations are equal versus the two-sided alternative at the 5% significance level. (a) Calculate the test statistic. (b) Find the appropriate value from Table E that you need to perform the significance test. (c) What do you conclude? 7.103 Revisiting the eating-group comparison. Compare the standard deviations of weight loss in Example 7.16 (page 456). Give the test statistic, the degrees of freedom, and the P-value. Write a short summary of your analysis, including comments on the assumptions for the test. 7.104 A fat intake comparison. Compare the standard deviations of fat intake in Exercise 7.75 (page 469). (a) Give the test statistic, the degrees of freedom, and the P-value. Write a short summary of your analysis, including comments on the assumptions for the test. (b) Assume that the sample standard deviation for the late-eaters group is the value 8.2 given in Exercise 7.75. How large would the standard deviation in the earlyeaters group need to be to reject the null hypothesis of equal standard deviations at the 5% level?
7.105 Revisiting the dust exposure study. The twosample problem in Exercise 7.77 (page 470) compares drill and blast workers with outdoor concrete workers with respect to the total dust that they are exposed to in the workplace. Here it may be useful to know whether or not the standard deviations differ in the two groups. Perform the F test and summarize the results. Are you concerned about the assumptions here? Explain why or why not. 7.106 More on the dust exposure study. Exercise 7.78 (page 470) is similar to Exercise 7.77, but the response variable here is exposure to dust particles that can enter and stay in the lungs. Compare the standard deviations with a significance test and summarize the results. Be sure to comment on the assumptions. 7.107 Revisiting the size of trees in the north and south. The diameters of trees in the Wade Tract for random samples selected from the north and south halves of the tract are compared in Exercise 7.85 (page 471). Is there a statistically significant difference between the standard deviations for these two parts of the tract? Perform the significance test and summarize the results. Does the Normal assumption appear reasonable for these data? NSPINES 7.108 Revisiting the size of trees in the east and west. Tree diameters for the east and west halves of the Wade Tract are compared in Exercise 7.86 (page 472). Using the questions in the previous exercise as a guide, analyze these data. EWPINES 7.109 Revisiting the small-sample example. In Example 7.17 (page 457), we addressed a study with only 5 observations per group. EATER (a) Is there a statistically significant difference between the standard deviations of these two groups? Perform the test using a significance level of 0.05 and state your conclusion. (b) Using Table E, state the value that the ratio of variances would need to exceed for us to reject the null hypothesis (at the 5% level) that the standard deviations
Chapter 7 Exercises are equal. Also, report this value for sample sizes of n ⫽ 4, 3, and 2. What does this suggest about the power of this test when sample sizes are small? 7.110 Planning a study to compare tree size. In Exercise 7.85 (page 471) DBH data for longleaf pine trees in two parts of the Wade Tract are compared. Suppose that you are planning a similar study in which you will measure the diameters of longleaf pine trees. Based on Exercise 7.85, you are willing to assume that the standard deviation for both halves is 20 cm. Suppose that a difference in mean DBH of 10 cm or more would be important to detect. You will use a t statistic and a two-sided alternative for the comparison. (a) Find the power if you randomly sample 20 trees from each area to be compared. (b) Repeat the calculations for 60 trees in each sample. (c) If you had to choose between the 20 and 60 trees per sample, which would you choose? Give reasons for your answer. 7.111 More on planning a study to compare tree size. Refer to the previous exercise. Find the two standard deviations from Exercise 7.85. Do the same for the data in Exercise 7.86, which is a similar setting. These are somewhat smaller than the assumed value that you used
481
in the previous exercise. Explain why it is generally a better idea to assume a standard deviation that is larger than you expect than one that is smaller. Repeat the power calculations for some other reasonable values of s and comment on the impact of the size of s for planning the new study. 7.112 Planning a study to compare ad placement. Refer to Exercise 7.84 (page 471), where we compared trustworthiness ratings for ads from two different publications. Suppose that you are planning a similar study using two different publications that are not expected to show the differences seen when comparing the Wall Street Journal with the National Enquirer. You would like to detect a difference of 1.5 points using a twosided significance test with a 5% level of significance. Based on Exercise 7.84, it is reasonable to use 1.6 as the value of the common standard deviation for planning purposes. (a) What is the power if you use sample sizes similar to those used in the previous study—for example, 65 for each publication? (b) Repeat the calculations for 100 in each group. (c) What sample size would you recommend for the new study?
CHAPTER 7 Exercises 7.113 LSAT scores. The scores of four senior roommates on the Law School Admission Test (LSAT) are 156 133 147 122 Find the mean, the standard deviation, and the standard error of the mean. Is it appropriate to calculate a confidence interval based on these data? Explain why or why not. LSAT 7.114 Converting a two-sided P-value. You use statistical software to perform a significance test of the null hypothesis that two means are equal. The software reports a P-value for the two-sided alternative. Your alternative is that the first mean is greater than the second mean. (a) The software reports t ⫽ 2.08 with a P-value of 0.068. Would you reject H0 at a ⫽ 0.05? Explain your answer. (b) The software reports t ⫽ ⫺2.08 with a P-value of 0.068. Would you reject H0 at a ⫽ 0.05? Explain your answer. 7.115 Degrees of freedom and confidence interval width. As the degrees of freedom increase, the t distributions get closer and closer to the z 1N10, 12) distribution. One way to see this is to look at how the value of t* for
a 95% confidence interval changes with the degrees of freedom. Make a plot with degrees of freedom from 2 to 100 on the x axis and t* on the y axis. Draw a horizontal line on the plot corresponding to the value of z* ⫽ 1.96. Summarize the main features of the plot. 7.116 Degrees of freedom and t*. Refer to the previous exercise. Make a similar plot for a 90% confidence interval. How do the main features of this plot compare with those of the plot in the previous exercise? 7.117 Sample size and margin of error. The margin of error for a confidence interval depends on the confidence level, the standard deviation, and the sample size. Fix the confidence level at 95% and the standard deviation at 1 to examine the effect of the sample size. Find the margin of error for sample sizes of 5 to 100 by 5s—that is, let n ⫽ 5, 10, 15, p , 100. Plot the margins of error versus the sample size and summarize the relationship. 7.118 More on sample size and margin of error. Refer to the previous exercise. Make a similar plot and summarize its features for a 99% confidence interval. 7.119 Which design? The following situations all require inference about a mean or means. Identify each
482
CHAPTER 7
•
Inference for Distributions
as (1) a single sample, (2) matched pairs, or (3) two independent samples. Explain your answers.
Group 1
Group 2
48.86
48.88
50.60
52.63
51.02
52.55
47.99
50.94
54.20
53.02
50.66
50.66
45.91
47.78
48.79
48.44
47.76
48.92
51.13
51.63
(a) Your customers are college students. You are interested in comparing the interest in a new product that you are developing between those students who live in the dorms and those who live elsewhere. (b) Your customers are college students. You are interested in finding out which of two new product labels is more appealing. (c) Your customers are college students. You are interested in assessing their interest in a new product. 7.120 Which design? The following situations all require inference about a mean or means. Identify each as (1) a single sample, (2) matched pairs, or (3) two independent samples. Explain your answers. (a) You want to estimate the average age of your store’s customers. (b) You do an SRS survey of your customers every year. One of the questions on the survey asks about customer satisfaction on a seven-point scale with the response 1 indicating “very dissatisfied” and 7 indicating “very satisfied.” You want to see if the mean customer satisfaction has improved from last year. (c) You ask an SRS of customers their opinions on each of two new floor plans for your store. 7.121 Number of critical food violations. The results of a major city’s restaurant inspections are available through its online newspaper.38 Critical food violations are those that put patrons at risk of getting sick and must immediately be corrected by the restaurant. An SRS of n ⫽ 200 inspections from the more than 16,000 inspections since January 2009 were collected, resulting in x ⫽ 0.83 violations and s ⫽ 0.95 violations. (a) Test the hypothesis that the average number of critical violations is less than 1.5 using a significance level of 0.05. State the two hypotheses, the test statistic, and P-value. (b) Construct a 95% confidence interval for the average number of critical violations and summarize your result. (c) Which of the two summaries (significance test versus confidence interval) do you find more helpful in this case? Explain your answer. (d) These data are integers ranging from 0 to 9. The data are also skewed to the right, with 70% of the values either a 0 or a 1. Given this information, do you think use of the t procedures is appropriate? Explain your answer. 7.122 Two-sample t test versus matched pairs t test. Consider the following data set. The data were actually collected in pairs, and each row represents a pair. PAIRED
(a) Suppose that we ignore the fact that the data were collected in pairs and mistakenly treat this as a twosample problem. Compute the sample mean and variance for each group. Then compute the two-sample t statistic, degrees of freedom, and P-value for the two-sided alternative. (b) Now analyze the data in the proper way. Compute the sample mean and variance of the differences. Then compute the t statistic, degrees of freedom, and P-value. (c) Describe the differences in the two test results. 7.123 Two-sample t test versus matched pairs t test, continued. Refer to the previous exercise. Perhaps an easier way to see the major difference in the two analysis approaches for these data is by computing 95% confidence intervals for the mean difference. (a) Compute the 95% confidence interval using the twosample t confidence interval. (b) Compute the 95% confidence interval using the matched pairs t confidence interval. (c) Compare the estimates (that is, the centers of the intervals) and margins of error. What is the major difference between the two approaches for these data? 7.124 Average service time. Recall the drive-thru study in Exercise 7.73 (page 469). Another benchmark that was measured was the service time. A summary of the results (in seconds) for two of the chains is shown below. Chain
n
x
s
Taco Bell
307
149.69
35.7
McDonald’s
362
188.83
42.8
(a) Is there a difference in the average service time between these two chains? Test the null hypothesis that the chains’ average service time is the same. Use a significance level of 0.05. (b) Construct a 95% confidence interval for the difference in average service time.
Chapter 7 Exercises (c) Lex plans to go to Taco Bell and Sam to McDonald’s. Does the interval in part (b) contain the difference in their service times that they’re likely to encounter? Explain your answer. 7.125 Interracial friendships in college. A study utilized the random roommate assignment process of a small college to investigate the interracial mix of friends among students in college.39 As part of this study, the researchers looked at 238 white students who were randomly assigned a roommate in their first year and recorded the proportion of their friends (not including the first-year roommate) who were black. The following table summarizes the results, broken down by roommate race, for the middle of the first and third years of college. Middle of First Year Randomly assigned
n
x
s
Black roommate
41
0.085
0.134
White roommate
197
0.063
0.112
Middle of Third Year Randomly assigned
n
x
s
Black roommate
41
0.146
0.243
White roommate
197
0.062
0.154
(a) Proportions are not Normally distributed. Explain why it may still be appropriate to use the t procedures for these data. (b) For each year, state the null and alternative hypotheses for comparing these two groups. (c) For each year, perform the significance test at the a ⫽ 0.05 level, making sure to report the test statistic, degrees of freedom, and P-value. (d) Write a one-paragraph summary of your conclusions from these two tests. 7.126 Interracial friendships in college, continued. Refer to the previous exercise. For each year, construct a 95% confidence interval for the difference in means m1 ⫺ m2 and describe how these intervals can be used to test the null hypotheses in part (b) of the previous exercise. 7.127 Alcohol consumption and body composition. Individuals who consume large amounts of alcohol do not use the calories from this source as efficiently as calories from other sources. One study examined the effects of moderate alcohol consumption on body composition and the intake of other foods. Fourteen subjects participated in a crossover design where they either drank wine for the first 6 weeks and then abstained for the next 6 weeks or vice versa.40 During the period when they drank wine, the subjects, on average, lost 0.4 kilograms (kg) of body weight; when they did not
483
drink wine, they lost an average of 1.1 kg. The standard deviation of the difference between the weight lost under these two conditions is 8.6 kg. During the wine period, they consumed an average of 2589 calories; with no wine, the mean consumption was 2575. The standard deviation of the difference was 210. (a) Compute the differences in means and the standard errors for comparing body weight and caloric intake under the two experimental conditions. (b) A report of the study indicated that there were no significant differences in these two outcome measures. Verify this result for each measure, giving the test statistic, degrees of freedom, and the P-value. (c) One concern with studies such as this, with a small number of subjects, is that there may not be sufficient power to detect differences that are potentially important. Address this question by computing 95% confidence intervals for the two measures and discuss the information provided by the intervals. (d) Here are some other characteristics of the study. The study periods lasted for 6 weeks. All subjects were males between the ages of 21 and 50 years who weighed between 68 and 91 kg. They were all from the same city. During the wine period, subjects were told to consume two 135-milliliter (ml) servings of red wine per day and no other alcohol. The entire 6-week supply was given to each subject at the beginning of the period. During the other period, subjects were instructed to refrain from any use of alcohol. All subjects reported that they complied with these instructions except for three subjects, who said that they drank no more than three to four 12-ounce bottles of beer during the no-alcohol period. Discuss how these factors could influence the interpretation of the results. 7.128 Brain training. The assessment of computerized brain-training programs is a rapidly growing area of research. Researchers are now focusing on who this training benefits most, what brain functions can be best improved, and which products are most effective. One study looked at 487 community-dwelling adults aged 65 and older, each randomly assigned to one of two training groups. In one group, the participants used a computerized program for 1 hour per day. In the other, DVD-based educational programs were shown with quizzes following each video. The training period lasted 8 weeks. The response was the improvement in a composite score obtained from an auditory memory/attention survey given before and after the 8 weeks.41 The results are summarized in the following table. Group
n
x
s
Computer program
242
3.9
8.28
DVD program
245
1.8
8.33
484
CHAPTER 7
•
Inference for Distributions
(a) Given that there are other studies showing a benefit of computerized brain training, state the null and alternative hypotheses. (b) Report the test statistic, its degrees of freedom, and the P-value. What is your conclusion using significance level a ⫽ 0.05? (c) Can you conclude that this computerized brain training always improves a person’s auditory memory better than the DVD program? If not, explain why. 7.129 Can mockingbirds learn to identify specific humans? A central question in urban ecology is why some animals adapt well to the presence of humans and others do not. The following results summarize part of a study of the northern mockingbird (Mimus polyglottos) that took place on a campus of a large university.42 For 4 consecutive days, the same human approached a nest and stood 1 meter away for 30 seconds, placing his or her hand on the rim of the nest. On the 5th day, a new person did the same thing. Each day, the distance of the human from the nest when the bird flushed was recorded. This was repeated for 24 nests. The human intruder varied his or her appearance (that is, wore different clothes) over the 4 days. We report results for only Days 1, 4, and 5 here. The response variable is flush distance measured in meters. Day
Mean
s
1
6.1
4.9
4
15.1
7.3
5
4.9
5.3
(a) Explain why this should be treated as a matched design. (b) Unfortunately, the research article does not provide the standard error of the difference, only the standard error of the mean flush distance for each day. However, we can use the general addition rule for variances (page 275) to approximate it. If we assume that the correlation between the flush distance at Day 1 and Day 4 for each nest is r ⫽ 0.40, what is the standard deviation for the difference in distance? (c) Using your result in part (b), test the hypothesis that there is no difference in the flush distance across these two days. Use a significance level of 0.05. (d) Repeat parts (b) and (c) but now compare Day 1 and Day 5, assuming a correlation between flush distances for each nest of r ⫽ 0.30. (e) Write a brief summary of your conclusions. 7.130 The wine makes the meal? In one study, 39 diners were given a free glass of cabernet sauvignon wine to
accompany a French meal.43 Although the wine was identical, half the bottle labels claimed the wine was from California and the other half claimed it was from North Dakota. The following table summarizes the grams of entrée and wine consumed during the meal.
Entrée Wine
Wine label
n
Mean
California
24
499.8
St. dev. 87.2
North Dakota
15
439.0
89.2
California
24
100.8
23.3
North Dakota
15
110.4
9.0
Did the patrons who thought that the wine was from California consume more? Analyze the data and write a report summarizing your work. Be sure to include details regarding the statistical methods you used, your assumptions, and your conclusions. 7.131 Study design information. In the previous study, diners were seated alone or in groups of two, three, four, and, in one case, nine (for a total of n ⫽ 16 tables). Also, each table, not each patron, was randomly assigned a particular wine label. Does this information alter how you might do the analysis in the previous problem? Explain your answer. 7.132 Analysis of tree size using the complete data set. The data used in Exercises 7.31 (page 443), 7.85, and 7.86 (pages 471 and 472) were obtained by taking simple random samples from the 584 longleaf pine trees that were measured in the Wade Tract. The entire data set is given in the WADE data set. Find the 95% confidence interval for the mean DBH using the entire data set, and compare this interval with the one that you calculated in Exercise 7.31. Write a report about these data. Include comments on the effect of the sample size on the margin of error, the distribution of the data, the appropriateness of the Normality-based methods for this problem, and the generalizability of the results to other similar stands of longleaf pine or other kinds of trees in this area of the United States and other areas. WADE 7.133 More on conditions for inference. Suppose that your state contains 85 school corporations and each corporation reports its expenditures per pupil. Is it proper to apply the one-sample t method to these data to give a 95% confidence interval for the average expenditure per pupil? Explain your answer. 7.134 A comparison of female high school students. A study was performed to determine the prevalence of the female athlete triad (low energy availability, menstrual dysfunction, and low bone mineral density) in high school students.44 A total of 80 high school athletes and 80 sedentary students were assessed. The following table summarizes several measured characteristics:
Chapter 7 Exercises
Athletes Characteristic
Sedentary s
x
x
s
Body fat (%)
25.61
5.54
32.51
8.05
Body mass index
21.60
2.46
26.41
2.73
297.13
516.63
580.54
372.77
2.21
1.46
1.82
1.24
Calcium deficit (mg) Glasses of milk/day
(a) For each of the characteristics, test the hypothesis that the means are the same in the two groups. Use a significance level of 0.05 for each test. (b) Write a short report summarizing your results. 7.135 Competitive prices? A retailer entered into an exclusive agreement with a supplier who guaranteed to provide all products at competitive prices. The retailer eventually began to purchase supplies from other vendors who offered better prices. The original supplier filed a legal action claiming violation of the agreement. In defense, the retailer had an audit performed on a random sample of invoices. For each audited invoice, all purchases made from other suppliers were examined and the prices were compared with those offered by the original supplier. For each invoice, the percent of purchases for which the alternate supplier offered a lower price than the original supplier was recorded.45 Here are the data: 0 68
100
0
100
33
34 100 48
78 100
100 79 100 100 100 100 100 100
77 100 38
89 100 100
Report the average of the percents with a 95% margin of error. Do the sample invoices suggest that the original supplier’s prices are not competitive on the average? COMPETE 7.136 Weight-loss programs. In a study of the effectiveness of weight-loss programs, 47 subjects who were at least 20% overweight took part in a group support program for 10 weeks. Private weighings determined each subject’s weight at the beginning of the program and 6 months after the program’s end. The matched pairs t test was used to assess the significance of the average weight loss. The paper reporting the study said, “The subjects lost a significant amount of weight over time, t1462 ⫽ 4.68, p ⬍ 0.01.” It is common to report the results of statistical tests in this abbreviated style.46 (a) Why was the matched pairs statistic appropriate? (b) Explain to someone who knows no statistics but is interested in weight-loss programs what the practical conclusion is. (c) The paper follows the tradition of reporting significance only at fixed levels such as a ⫽ 0.01. In fact, the results are more significant than “p ⬍ 0.01” suggests. What can you say about the P-value of the t test?
485
7.137 Do women perform better in school? Some research suggests that women perform better than men in school, but men score higher on standardized tests. Table 1.3 (page 29) presents data on a measure of school performance, grade point average (GPA), and a standardized test, IQ, for 78 seventh-grade students. Do these data lend further support to the previously found gender differences? Give graphical displays of the data and describe the distributions. Use significance tests and confidence intervals to examine this question, and prepare a short report summarizing your findings. GRADES 7.138 Self-concept and school performance. Refer to the previous exercise. Although self-concept in this study was measured on a scale with values in the data set ranging from 20 to 80, many prefer to think of this kind of variable as having only two possible values: low self-concept or high self-concept. Find the median of the self-concept scores in Table 1.3, and define those students with scores at or below the median to be lowself-concept students and those with scores above the median to be high-self-concept students. Do high-selfconcept students have GPAs that differ from those of lowself-concept students? What about IQ? Prepare a report addressing these questions. Be sure to include graphical and numerical summaries and confidence intervals, and state clearly the details of significance tests. GRADES 7.139 Behavior of pet owners. On the morning of March 5, 1996, a train with 14 tankers of propane derailed near the center of the small Wisconsin town of Weyauwega. Six of the tankers were ruptured and burning when the 1700 residents were ordered to evacuate the town. Researchers study disasters like this so that effective relief efforts can be designed for future disasters. About half the households with pets did not evacuate all their pets. A study conducted after the derailment focused on problems associated with retrieval of the pets after the evacuation and characteristics of the pet owners. One of the scales measured “commitment to adult animals,” and the people who evacuated all or some of their pets were compared with those who did not evacuate any of their pets. Higher scores indicate that the pet owner is more likely to take actions that benefit the pet.47 Here are the data summaries: Group
n
x
s
Evacuated all or some pets
116
7.95
3.62
Did not evacuate any pets
125
6.26
3.56
Analyze the data and prepare a short report describing the results. 7.140 Occupation and diet. Do various occupational groups differ in their diets? A British study of this question compared 98 drivers and 83 conductors of London double-decker buses.48 The conductors’ jobs require more
486
CHAPTER 7
•
Inference for Distributions
physical activity. The article reporting the study gives the data as “Mean daily consumption 1⫾ se).” Here are some of the study results: Drivers
Conductors
Total calories
2821 ⫾ 44
2844 ⫾ 48
Alcohol (grams)
0.24 ⫾ 0.06
0.39 ⫾ 0.11
(a) What does “se” stand for? Give x and s for each of the four sets of measurements. (b) Is there significant evidence at the 5% level that conductors consume more calories per day than do drivers? Use the two-sample t method to give a P-value, and then assess significance. (c) How significant is the observed difference in mean alcohol consumption? Use two-sample t methods to obtain the P-value. (d) Give a 95% confidence interval for the mean daily alcohol consumption of London double-decker bus conductors. (e) Give a 99% confidence interval for the difference in mean daily alcohol consumption between drivers and conductors. 7.141 Occupation and diet, continued. Use of the pooled two-sample t test is justified in part (b) of the previous exercise. Explain why. Find the P-value for the pooled t statistic, and compare it with your result in the previous exercise. 7.142 Conditions for inference. The report cited in Exercise 7.140 says that the distributions of alcohol consumption among the individuals studied are “grossly skew.”
(a) Do you think that this skewness prevents the use of the two-sample t test for equality of means? Explain your answer. (b) Do you think that the skewness of the distributions prevents the use of the F test for equality of standard deviations? Explain your answer. 7.143 Different methods of teaching reading. In the READ data set, the response variable Post3 is to be compared for three methods of teaching reading. The Basal method is the standard, or control, method, and the two new methods are DRTA and Strat. We can use the methods of this chapter to compare Basal with DRTA and Basal with Strat. Note that to make comparisons among three treatments it is more appropriate to use the procedures that we will learn in Chapter 12. READ (a) Is the mean reading score with the DRTA method higher than that for the Basal method? Perform an analysis to answer this question, and summarize your results. (b) Answer part (a) for the Strat method in place of DRTA. 7.144 Sample size calculation. Example 7.13 (page 449) tells us that the mean height of 10-year-old girls is N156.4, 2.72 and for boys it is N155.7, 3.82. The null hypothesis that the mean heights of 10-year-old boys and girls are equal is clearly false. The difference in mean heights is 56.4 ⫺ 55.7 ⫽ 0.7 inch. Small differences such as this can require large sample sizes to detect. To simplify our calculations, let’s assume that the standard deviations are the same, say s ⫽ 3.2, and that we will measure the heights of an equal number of girls and boys. How many would we need to measure to have a 90% chance of detecting the (true) alternative hypothesis?
Inference for Proportions Introduction We frequently collect data on categorical variables, such as whether or not a person is employed, the brand name of a cell phone, or the country where a college student studies abroad. When we record categorical variables, our data consist of counts or of percents obtained from counts. In these settings, our goal is to say something about the corresponding population proportions. Just as in the case of inference about population means, we may be concerned with a single population or with comparing two populations. Inference about one or two proportions is very similar to inference about means, which we discussed in Chapter 7. In particular, inference for both means and proportions is based on sampling distributions that are approximately Normal. We begin in Section 8.1 with inference about a single population proportion. Section 8.2 concerns methods for comparing two proportions.
CHAPTER
8
8.1 Inference for a Single Proportion 8.2 Comparing Two Proportions
487
488
CHAPTER 8
•
Inference for Proportions
8.1 Inference for a Single Proportion When you complete this section, you will be able to • Identify the sample proportion, the sample size, and the count for a single proportion. Use this information to estimate the population proportion. • Describe the relationship between the population proportion and the sample proportion. • Identify the standard error for a sample proportion and the margin of error for confidence level C. • Apply the guidelines for when to use the large-sample confidence interval for a population proportion. • Find and interpret the large-sample confidence interval for a single proportion. • Apply the guidelines for when to use the large-sample significance test for a population proportion. • Use the large-sample significance test to test a null hypothesis about a population proportion. • Find the sample size needed for a desired margin of error.
LOOK BACK sample proportion, p. 321
We want to estimate the proportion p of some characteristic in a large population. For example, we may want to know the proportion of likely voters who approve of the president’s conduct in office. We select a simple random sample (SRS) of size n from the population and record the count X of “successes” (such as “Yes” answers to a question about the president). We will use “success” to represent the characteristic of interest. The sample proportion of successes pˆ ⫽ X兾n estimates the unknown population proportion p. If the population is much larger than the sample (say, at least 20 times as large), the count X has approximately the binomial distribution B1n, p2.1 In statistical terms, we are concerned with inference about the probability p of a success in the binomial setting.
EXAMPLE DATA BREAK
8.1 Take a break from Facebook. A Pew Internet survey reported that 61% of Facebook users have taken a voluntary break from Facebook of several weeks or more at one time or another. The survey contacted 1006 adults living in the United States by landline and cell phone. The 525 people who reported that they were Facebook users were asked, “Have you ever voluntarily taken a break from Facebook for a period of several weeks or more?” A total of 320 responded, “Yes, I have done this.”2 Here, p is the proportion of adults in the population of Facebook users who have taken a break of several weeks or more, and the sample proportion pˆ is
CHALLENGE
pˆ ⫽
X 320 ⫽ 0.6095 ⫽ n 525
8.1 Inference for a Single Proportion
489
Pew uses the sample proportion pˆ to estimate the population proportion p. Pew estimates that 61% of all adult Facebook users in the United States have taken a break from using Facebook for several weeks or more. USE YOUR KNOWLEDGE 8.1 Smartphones and purchases. A Google research study asked 5013 smartphone users about how they used their phones. In response to a question about purchases, 2657 reported that they purchased an item after using their smartphone to search for information about the item.3 (a) What is the sample size n for this survey? (b) In this setting, describe the population proportion p in a short sentence. (c) What is the count X? Describe the count in a short sentence. (d) Find the sample proportion pˆ . 8.2 Past usage of Facebook. Refer to the Pew Internet survey described in Example 8.1. There were 334 Internet users who don’t use Facebook. Of these, 67 reported that they have used Facebook in the past. (a) What is the sample size n for the population of Internet users who don’t use Facebook? (b) In this setting, describe the population proportion p in a short sentence. (c) What is the count X of Internet users who don’t use Facebook but have used Facebook in the past? (d) Find the sample proportion pˆ . If the sample size n is very small, we must base tests and confidence intervals for p on the binomial distributions. These are awkward to work with because of the discreteness of the binomial distributions.4 But we know that when the sample is large, both the count X and the sample proportion pˆ are approximately Normal. We will consider only inference procedures based on the Normal approximation. These procedures are similar to those for inference about the mean of a Normal distribution.
Large-sample confidence interval for a single proportion LOOK BACK Normal approximation for proportions, p. 332
LOOK BACK standard error, p. 418
The unknown population proportion p is estimated by the sample proportion pˆ X兾n. If the sample size n is sufficiently large, pˆ has approximately the Normal distribution, with mean mpˆ p and standard deviation spˆ 1p11 p2兾n. This means that approximately 95% of the time pˆ will be within 2 1p11 p2兾n of the unknown population proportion p. Note that the standard deviation spˆ depends upon the unknown parameter p. To estimate this standard deviation using the data, we replace p in the formula by the sample proportion pˆ . As we did in Chapter 7, we use the term standard error for the standard deviation of a statistic that is estimated from data. Here is a summary of the procedure.
490
CHAPTER 8
•
Inference for Proportions
LARGE-SAMPLE CONFIDENCE INTERVAL FOR A POPULATION PROPORTION Choose an SRS of size n from a large population with an unknown proportion p of successes. The sample proportion is pˆ
X n
where X is the number of successes. The standard error of pˆ is SEpˆ
pˆ 11 pˆ 2 n B
and the margin of error for confidence level C is m z*SEpˆ where the critical value z* is the value for the standard Normal density curve with area C between z* and z*. An approximate level C confidence interval for p is pˆ m Use this interval for 90%, 95%, or 99% confidence when the number of successes and the number of failures are both at least 10.
Table D includes a line at the bottom with values of z* for selected values of C. Use Table A for other values of C.
EXAMPLE 8.2 Inference for Facebook breaks. The sample survey in Example 8.1 found that 320 of a sample of 525 Facebook users took a break from Facebook for several weeks or more. In that example we calculated pˆ 0.6095. The standard error is SEpˆ
pˆ 11 pˆ 2 0.609511 0.60952 0.02129 n B B 525
The z* critical value for 95% confidence is z* 1.96, so the margin of error is m 1.96SEpˆ 11.962 10.021292 0.04173 The confidence interval is pˆ m 0.61 0.04 We are 95% confident that between 57% and 65% of Facebook users took a voluntary break of several weeks or more. In performing these calculations, we have kept a large number of digits for our intermediate calculations. However, when reporting the results, we prefer to use rounded values: for example, 61% with a margin of error of 4%. In this way we focus attention on our major findings. There is no important information to be gained by reporting 0.6095 with a margin of error of 0.04173.
8.1 Inference for a Single Proportion
491
Remember that the margin of error in any confidence interval includes only random sampling error. If people do not respond honestly to the questions asked, for example, your estimate is likely to miss by more than the margin of error. Although the calculations for statistical inference for a single proportion are relatively straightforward and can be done with a calculator or in a spreadsheet, we prefer to use software. FIGURE 8.1 The Facebook break
Excel
data in an Excel spreadsheet for the confidence interval in Example 8.3.
A
B
C
Count
1
Break
2
Yes
320
3
No
205
4
EXAMPLE 8.3 Facebook break confidence interval using software. Figure 8.1 shows a spreadsheet that could be used as input for statistical software that calculates a confidence interval for a proportion for our Facebook break example. Note that 525 is the number of cases for this example. The sheet specifies a value for each of these cases: there are 320 cases with the value “Yes” and 205 cases with the value “No.” An alternative sheet would list all 525 cases with the values for each case. Figure 8.2 gives output from JMP, Minitab, and SAS for these data. Each is a little different but it is easy to find what we need. For JMP, the confidence interval is on the line with “Level” equal to “Yes” under the headings “Lower CL” and “Upper CL.” Minitab gives the output in the form of an interval under the heading “95% CI.” SAS reports the interval calculated in two different ways and uses the labels “95% Lower Conf Limit” and “95% Upper Conf Limit.” FIGURE 8.2 (a) JMP, (b) Minitab, and (c) SAS output for the Facebook break confidence interval in Example 8.3.
JMP Distributions Break Confidence Intervals Level
Count
Prob
Lower CI
Upper CI
1-Alpha
No
205
0.39048
0.349685
0.432859
0.950
Yes
320
0.60952
0.567141
0.650315
0.950
Total
525
(a) JMP
Continued
492
CHAPTER 8
•
Inference for Proportions
FIGURE 8.2 (Continued )
Minitab
Test and CI for One Proportion Sample X 1 320
Sample p 95% CI 0.609524 (0.566319, 0.651489)
N 525
Open a Minitab project file
(b) Minitab
SAS
The SAS System The FREQ Procedure Break Break Frequency Percent
Cumulative Frequency
Cumulative Percent
Yes
320
60.95
320
60.95
No
205
39.05
525
100.00
Binomial Proportion for Break = Yes Proportion
0.6095
ASE
0.0213
95% Lower Conf Limit
0.5678
95% Upper Conf Limit
0.6513
Exact Conf Limits 95% Lower Conf Limit
0.5663
95% Upper Conf Limit
0.6515
Done
(c) SAS
8.1 Inference for a Single Proportion
493
As usual, the output reports more digits than are useful. When you use software, be sure to think about how many digits are meaningful for your purposes. Do not clutter your report with information that is not meaningful. We recommend the large-sample confidence interval for 90%, 95%, and 99% confidence whenever the number of successes and the number of failures are both at least 10. For smaller sample sizes, we recommend exact methods that use the binomial distribution. These are available as the default or as options in many statistical software packages and we do not cover them here. There is also an intermediate case between large samples and very small samples where a slight modification of the large-sample approach works quite well.5 This method is called the “plus four” procedure and is described next.
USE YOUR KNOWLEDGE 8.3 Smartphones and purchases. Refer to Exercise 8.1 (page 489). (a) Find SEpˆ , the standard error of pˆ . (b) Give the 95% confidence interval for p in the form of estimate plus or minus the margin of error. (c) Give the confidence interval as an interval of percents. 8.4 Past usage of Facebook. Refer to Exercise 8.2 (page 489). (a) Find SEpˆ , the standard error of pˆ . (b) Give the 95% confidence interval for p in the form of estimate plus or minus the margin of error. (c) Give the confidence interval as an interval of percents. BEYOND THE BASICS
The plus four confidence interval for a single proportion Computer studies reveal that confidence intervals based on the largesample approach can be quite inaccurate when the number of successes and the number of failures are not at least 10. When this occurs, a simple adjustment to the confidence interval works very well in practice. The adjustment is based on assuming that the sample contains 4 additional observations, 2 of which are successes and 2 of which are failures. The estimator of the population proportion based on this plus four rule is X2 苲 p n4 plus four estimate
This estimate was first suggested by Edwin Bidwell Wilson in 1927, and we call it the plus four estimate. The confidence interval is based on the z statistic obtained by standardizing the plus four estimate 苲 p. Because 苲 p is the sample proportion for our modified sample of size n 4, it isn’t surprising that the distribution of 苲 p is close to the Normal distribution with mean p and standard deviation 1p11 p2兾1n 42. To get a confidence interval, we estimate p by 苲 p in this standard deviation to get the standard error of 苲 p. Here is an example.
494
CHAPTER 8
•
Inference for Proportions
EXAMPLE 8.4 Percent of equol producers. Research has shown that there are many health benefits associated with a diet that contains soy foods. Substances in soy called isoflavones are known to be responsible for these benefits. When soy foods are consumed, some subjects produce a chemical called equol, and it is thought that production of equol is a key factor in the health benefits of a soy diet. Unfortunately, not all people are equol producers; there appear to be two distinct subpopulations: equol producers and equol nonproducers. A nutrition researcher planning some bone health experiments would like to include some equol producers and some nonproducers among her subjects. A preliminary sample of 12 female subjects were measured, and 4 were found to be equol producers. We would like to estimate the proportion of equol producers in the population from which this researcher will draw her subjects. The plus four estimate of the proportion of equol producers is 6 42 苲 0.375 p 12 4 16 For a 95% confidence interval, we use Table D to find z* 1.96. We first compute the standard error SE ⬃p
苲 p 11 苲 p2 B n4 10.3752 11 0.3752 B 16
0.12103 and then the margin of error m z*SE ⬃p 11.962 10.121032 0.237 So the confidence interval is 苲 p m 0.375 0.237 10.138, 0.6122 We estimate with 95% confidence that between 14% and 61% of women from this population are equol producers. Note that the interval is very wide because the sample size is very small. If the true proportion of equol users is near 14%, the lower limit of this interval, there may not be a sufficient number of equol producers in the study if subjects are tested only after they are enrolled in the experiment. It may be necessary to determine whether or not a potential subject is an equol producer. The study could then be designed to have the same number of equol producers and nonproducers.
8.1 Inference for a Single Proportion
495
Significance test for a single proportion LOOK BACK Normal approximation for proportions, p. 332
Recall that the sample proportion pˆ X兾n is approximately Normal, with mean mpˆ p and standard deviation spˆ 1p11 p2兾n. For confidence intervals, we substitute pˆ for p in the last expression to obtain the standard error. When performing a significance test, however, the null hypothesis specifies a value for p, and we assume that this is the true value when calculating the P-value. Therefore, when we test H0: p p0, we substitute p0 into the expression for spˆ and then standardize pˆ . Here are the details. ˇ
ˇ
LARGE-SAMPLE SIGNIFICANCE TEST FOR A POPULATION PROPORTION Draw an SRS of size n from a large population with an unknown proportion p of successes. To test the hypothesis H0: p p0, compute the z statistic z
pˆ p0 p0 11 p0 2 n C
In terms of a standard Normal random variable Z, the approximate P-value for a test of H0 against Ha: p p0 is P1Z z2
Ha: p p0 is P1Z z2
Ha: p ⬆ p0 is 2P1Z 0 z 0 2
z
z
z
We recommend the large-sample z significance test as long as the expected number of successes, np0, and the expected number of failures, n11 p0 2, are both greater than 10.
LOOK BACK sign test for matched pairs, p. 429
If the expected numbers of successes and failures are not both greater than 10, or if the population is less than 20 times as large as the sample, other procedures should be used. One such approach is to use the binomial distribution as we did with the sign test. Here is a large-sample example.
EXAMPLE DATA SUNBLOCK
8.5 Comparing two sunblock lotions. Your company produces a sunblock lotion designed to protect the skin from both UVA and UVB exposure to the sun. You hire a company to compare your product with the product sold by your major competitor. The testing company exposes skin on the backs of a
496
CHAPTER 8
•
Inference for Proportions sample of 20 people to UVA and UVB rays and measures the protection provided by each product. For 13 of the subjects, your product provided better protection, while for the other 7 subjects, your competitor’s product provided better protection. Do you have evidence to support a commercial claiming that your product provides superior UVA and UVB protection? For the data we have n 20 subjects and X 13 successes. The parameter p is the proportion of people who would receive superior UVA and UVB protection from your product. To answer the claim question, we test H0: p 0.5 Ha: p ⬆ 0.5 The expected numbers of successes (your product provides better protection) and failures (your competitor’s product provides better protection) are 20 0.5 10 and 20 0.5 10. Both are at least 10, so we can use the z test. The sample proportion is
pˆ
X 13 0.65 n 20
The test statistic is z
pˆ p0 p0 11 p0 2 n C
0.65 0.5 10.52 10.52 C 20
1.34
From Table A we find P1Z 1.342 0.9099, so the probability in the upper tail is 1 0.9099 0.0901. The P-value is the area in both tails, P 2 0.0901 0.1802. We conclude that the sunblock testing data are compatible with the hypothesis of no difference between your product and your competitor’s product (pˆ 0.65, z 1.34, P 0.182. The data do not support your proposed advertising claim.
Note that we have used the two-sided alternative for this example. In settings like this, we must start with the view that either product could be better if we want to prove a claim of superiority. Thinking or hoping that your product is superior cannot be used to justify a one-sided test. Although these calculations are not particularly difficult to do using a calculator, we prefer to use software. Here are some details.
EXAMPLE DATA SUNBLOCK
8.6 Sunblock significance tests using software. JMP, Minitab, and SAS outputs for the analysis in Example 8.5 appear in Figure 8.3. JMP uses a slightly different way of reporting the results. Two ways of performing the significance test are labeled in the column “Test.” The one that corresponds to the procedure that we have used is on the second line, labeled “Pearson.”
8.1 Inference for a Single Proportion
497
The P-value under the heading “Prob . Chisq” is 0.1797, which is very close to the 0.1802 that we calculated using Table A. Minitab reports the value of the test statistic z, and the P-value is rounded to 0.180. SAS reports the P-value on the last line as 0.1797, the same as the value given in the JMP output.
FIGURE 8.3 (a) JMP, (b) Minitab, and (c) SAS output for the comparison of sunblock lotions in Example 8.5.
JMP Distributions Product Test Probabilities Estim Prob
Hypoth Prob
Theirs
0.35000
0.50000
Yours
0.65000
0.50000
Level
ChiSquare
Test
Prob>Chisq
DF
Likelihood Ratio
1.8280
1
0.1764
Pearson
1.8000
1
0.1797
Method: Fix hypothesized values, rescale omitted Confidence Intervals Level
Count
Prob
Lower CI
Upper CI
1-Alpha
Theirs
7
0.35000
0.181192
0.567146
0.950
Yours
13
0.65000
0.432854
0.818808
0.950
Total
20
Note: Computed using score confidence intervals.
(a) JMP
Minitab
Test and CI for One Proportion Test of p = 0.5 vs p not = 0.5 Sample 1
X 13
N 20
Sample p 95% CI Z-Value 0.650000 (0.440963, 0.859037) 1.34
P-Value 0.180
Using the normal approximation.
Welcome to Minitab, press F1 for help.
(b) Minitab
Continued
498
CHAPTER 8
•
Inference for Proportions
FIGURE 8.3 (Continued )
SAS
The SAS System The FREQ Procedure
Product
Frequency
Percent
Cumulative Frequency
Cumulative Percent
Yours
13
65.00
13
65.00
Theirs
7
35.00
20
100.00
Product
Binomial Proportion for Product = Yours Proportion
0.6500
ASE
0.1067
95% Lower Conf Limit
0.4410
95% Upper Conf Limit
0.8590
Exact Conf Limits 95% Lower Conf Limit
0.4078
95% Upper Conf Limit
0.8461
Test of H0: Proportion = 0.5 ASE under H0
0.1118
Z
1.3416
One-sided Pr > Z
0.0899
Two-sided Pr > Z
0.1797
Sample Size = 20
Done
(c) SAS
8.1 Inference for a Single Proportion
499
USE YOUR KNOWLEDGE 8.5 Draw a picture. Draw a picture of a standard Normal curve and shade the tail areas to illustrate the calculation of the P-value for Example 8.5. 8.6 What does the confidence interval tell us? Inspect the outputs in Figure 8.3. Report the confidence interval for the percent of people who would get better sun protection from your product than from your competitor’s. Be sure to convert from proportions to percents and to round appropriately. Interpret the confidence interval and compare this way of analyzing data with the significance test. 8.7 The effect of X. In Example 8.5, suppose that your product provided better UVA and UVB protection for 15 of the 20 subjects. Perform the significance test and summarize the results. 8.8 The effect of n. In Example 8.5, consider what would have happened if you had paid for twice as many subjects to be tested. Assume that the results would be similar to those in Example 8.5: that is, 65% of the subjects had better UVA and UVB protection with your product. Perform the significance test and summarize the results. In Example 8.5, we treated an outcome as a success whenever your product provided better sun protection. Would we get the same results if we defined success as an outcome where your competitor’s product was superior? In this setting the null hypothesis is still H0: p 0.5. You will find that the z test statistic is unchanged except for its sign and that the P-value remains the same.
USE YOUR KNOWLEDGE 8.9 Redefining success. In Example 8.5 we performed a significance test to compare your product with your competitor’s. Success was defined as the outcome where your product provided better protection. Now, take the viewpoint of your competitor where success is defined to be the outcome where your competitor’s product provides better protection. In other words, n remains the same (20) but X is now 7. (a) Perform the two-sided significance test and report the results. How do these compare with what we found in Example 8.5? (b) Find the 95% confidence interval for this setting, and compare it with the interval calculated when success is defined as the outcome where your product provides better protection. We do not often use significance tests for a single proportion, because it is uncommon to have a situation where there is a precise p0 that we want to test. For physical experiments such as coin tossing or drawing cards from a wellshuffled deck, probability arguments lead to an ideal p0. Even here, however, it can be argued, for example, that no real coin has a probability of heads exactly equal to 0.5. Data from past large samples can sometimes provide a p0 for the null hypothesis of a significance test. In some types of epidemiology research, for example, “historical controls” from past studies serve as the benchmark for evaluating new treatments. Medical researchers argue about
500
CHAPTER 8
•
Inference for Proportions the validity of these approaches, because the past never quite resembles the present. In general, we prefer comparative studies whenever possible.
Choosing a sample size LOOK BACK choosing sample size, p. 364
In Chapter 6, we showed how to choose the sample size n to obtain a confidence interval with specified margin of error m for a Normal mean. Because we are using a Normal approximation for inference about a population proportion, sample size selection proceeds in much the same way. Recall that the margin of error for the large-sample confidence interval for a population proportion is m z*SEpˆ z*
pˆ 11 pˆ 2 n B
Choosing a confidence level C fixes the critical value z*. The margin of error also depends on the value of pˆ and the sample size n. Because we don’t know the value of pˆ until we gather the data, we must guess a value to use in the calculations. We will call the guessed value p*. There are two common ways to get p*: 1. Use the sample estimate from a pilot study or from similar studies done earlier. 2. Use p* 0.5. Because the margin of error is largest when pˆ 0.5, this choice gives a sample size that is somewhat larger than we really need for the confidence level we choose. It is a safe choice no matter what the data later show. Once we have chosen p* and the margin of error m that we want, we can find the n we need to achieve this margin of error. Here is the result.
SAMPLE SIZE FOR DESIRED MARGIN OF ERROR The level C confidence interval for a proportion p will have a margin of error approximately equal to a specified value m when the sample size satisfies na
z* 2 b p*11 p*2 m
Here z* is the critical value for confidence level C, and p* is a guessed value for the proportion of successes in the future sample. The margin of error will be less than or equal to m if p* is chosen to be 0.5. Substituting p* 0.5 into the formula above gives n
1 z* 2 a b 4 m
The value of n obtained by this method is not particularly sensitive to the choice of p* when p* is fairly close to 0.5. However, if the value of p is likely to be smaller than about 0.3 or larger than about 0.7, use of p* 0.5 may result in a sample size that is much larger than needed.
8.1 Inference for a Single Proportion
501
EXAMPLE 8.7 Planning a survey of students. A large university is interested in assessing student satisfaction with the overall campus environment. The plan is to distribute a questionnaire to an SRS of students, but before proceeding, the university wants to determine how many students to sample. The questionnaire asks about a student’s degree of satisfaction with various student services, each measured on a five-point scale. The university is interested in the proportion p of students who are satisfied (that is, who choose either “satisfied” or “very satisfied,” the two highest levels on the five-point scale). The university wants to estimate p with 95% confidence and a margin of error less than or equal to 3%, or 0.03. For planning purposes, it is willing to use p* 0.5. To find the sample size required, n
1 z* 2 1 1.96 2 a b a b 1067.1 4 m 4 0.03
Round up to get n 1068. (Always round up. Rounding down would give a margin of error slightly greater than 0.03.) Similarly, for a 2.5% margin of error, we have (after rounding up) n
1 1.96 2 a b 1537 4 0.025
and for a 2% margin of error, n
1 1.96 2 a b 2401 4 0.02
News reports frequently describe the results of surveys with sample sizes between 1000 and 1500 and a margin of error of about 3%. These surveys generally use sampling procedures more complicated than simple random sampling, so the calculation of confidence intervals is more involved than what we have studied in this section. The calculations in Example 8.7 show in principle how such surveys are planned. In practice, many factors influence the choice of a sample size. The following example illustrates one set of factors.
EXAMPLE 8.8 Assessing interest in Pilates classes. The Division of Recreational Sports (Rec Sports) at a major university is responsible for offering comprehensive recreational programs, services, and facilities to the students. Rec Sports is continually examining its programs to determine how well it is meeting the needs of the students. Rec Sports is considering adding some new programs and would like to know how much interest there is in a new exercise program based on the Pilates method.6 They will take a survey of undergraduate students. In the past, they emailed short surveys to all undergraduate students. The response rate obtained in this way was about 5%. This time they will send emails to a simple random sample of the students and will follow up with additional emails and eventually a phone call to get a higher response rate. Because of limited staff and the work involved with
502
CHAPTER 8
•
Inference for Proportions the follow-up, they would like to use a sample size of about 200 responses. They assume that the new procedures will improve the response rate to 90%, so they will contact 225 students in the hope that these will provide at least 200 valid responses. One of the questions they will ask is “Have you ever heard about the Pilates method of exercise?” The primary purpose of the survey is to estimate various sample proportions for undergraduate students. Will the proposed sample size of n 200 be adequate to provide Rec Sports with the needed information? To address this question, we calculate the margins of error of 95% confidence intervals for various values of pˆ .
EXAMPLE 8.9 Margins of error. In the Rec Sports survey, the margin of error of a 95% confidence interval for any value of pˆ and n 200 is m z*SEpˆ 1.96
pˆ 11 pˆ 2 B 200
0.139 2pˆ 11 pˆ 2 The results for various values of pˆ are pˆ 0.05
0.030
pˆ 0.60
0.10
0.068
0.042
0.70
0.064
0.20
0.056
0.80
0.056
0.30
0.064
0.90
0.042
0.40
0.068
0.95
0.030
0.50
0.070
m
m
Rec Sports judged these margins of error to be acceptable, and they used a sample size of 200 in their survey. The table in Example 8.9 illustrates two points. First, the margins of error for pˆ 0.05 and pˆ 0.95 are the same. The margins of error will always be the same for pˆ and 1 pˆ . This is a direct consequence of the form of the confidence interval. Second, the margin of error varies between only 0.064 and 0.070 as pˆ varies from 0.3 to 0.7, and the margin of error is greatest when pˆ 0.5, as we claimed earlier (page 500). It is true in general that the margin of error will vary relatively little for values of pˆ between 0.3 and 0.7. Therefore, when planning a study, it is not necessary to have a very precise guess for p. If p* 0.5 is used and the observed pˆ is between 0.3 and 0.7, the actual interval will be a little shorter than needed, but the difference will be small. Again it is important to emphasize that these calculations consider only the effects of sampling variability that are quantified in the margin of error. Other sources of error, such as nonresponse and possible misinterpretation of
8.1 Inference for a Single Proportion
503
questions, are not included in the table of margins of error for Example 8.9. Rec Sports is trying to minimize these kinds of errors. They did a pilot study using a small group of current users of their facilities to check the wording of the questions, and for the final survey they devised a careful plan to follow up with the students who did not respond to the initial email.
USE YOUR KNOWLEDGE 8.10 Confidence level and sample size. Refer to Example 8.7 (page 501). Suppose that the university was interested in a 90% confidence interval with margin of error 0.03. Would the required sample size be smaller or larger than 1068 students? Verify this by performing the calculation. 8.11 Make a plot. Use the values for pˆ and m given in Example 8.9 to draw a plot of the sample proportion versus the margin of error. Summarize the major features of your plot.
SECTION 8.1 Summary Inference about a population proportion p from an SRS of size n is based on the sample proportion pˆ X兾n. When n is large, pˆ has approximately the Normal distribution with mean p and standard deviation 1p11 p2兾n. For large samples, the margin of error for confidence level C is m z*SEpˆ where the critical value z* is the value for the standard Normal density curve with area C between z* and z*, and the standard error of pˆ is SEpˆ
pˆ 11 pˆ 2 n B
The level C large-sample confidence interval is pˆ m We recommend using this interval for 90%, 95%, and 99% confidence whenever the number of successes and the number of failures are both at least 10. When sample sizes are smaller, alternative procedures such as the plus four estimate of the population proportion are recommended. The sample size required to obtain a confidence interval of approximate margin of error m for a proportion is found from na
z* 2 b p*11 p*2 m
where p* is a guessed value for the proportion, and z* is the standard Normal critical value for the desired level of confidence. To ensure that the margin of error of the interval is less than or equal to m no matter what pˆ may be, use n
1 z* 2 a b 4 m
504
CHAPTER 8
•
Inference for Proportions Tests of H0: p p0 are based on the z statistic pˆ p0
z B
p0 11 p0 2 n
with P-values calculated from the N10, 12 distribution. Use this procedure when the expected number of successes, np0, and the expected number of failures, n11 p0), are both greater than 10.
SECTION 8.1 Exercises For Exercises 8.1 and 8.2, see page 489; for Exercises 8.3 and 8.4, see page 493; for Exercises 8.5 to 8.8, see page 499; for Exercise 8.9, see page 499; and for Exercises 8.10 and 8.11, see page 503. 8.12 How did you use your cell phone? A Pew Internet poll asked cell phone owners about how they used their cell phones. One question asked whether or not during the past 30 days they had used their phone while in a store to call a friend or family member for advice about a purchase they were considering. The poll surveyed 1003 adults living in the United States by telephone. Of these, 462 responded that they had used their cell phone while in a store within the last 30 days to call a friend or family member for advice about a purchase they were considering.7
8.15 How did you use your cell phone? Refer to Exercise 8.12. (a) Report the sample proportion, the standard error of the sample proportion, and the margin of error for 95% confidence. (b) Are the guidelines for when to use the large-sample confidence interval for a population proportion satisfied in this setting? Explain your answer. (c) Find the 95% large-sample confidence interval for the population proportion. (d) Write a short statement explaining the meaning of your confidence interval.
(a) Identify the sample size and the count.
8.16 Do you eat breakfast? Refer to Exercise 8.13.
(b) Calculate the sample proportion.
(a) Report the sample proportion, the standard error of the sample proportion, and the margin of error for 95% confidence.
(c) Explain the relationship between the population proportion and the sample proportion. 8.13 Do you eat breakfast? A random sample of 200 students from your college are asked if they regularly eat breakfast. Eighty-four students responded that they did eat breakfast regularly. (a) Identify the sample size and the count.
(b) Are the guidelines for when to use the large-sample confidence interval for a population proportion satisfied in this setting? Explain your answer. (c) Find the 95% large-sample confidence interval for the population proportion.
(b) Calculate the sample proportion.
(d) Write a short statement explaining the meaning of your confidence interval.
(c) Explain the relationship between the population proportion and the sample proportion.
8.17 Would you recommend the service to a friend? Refer to Exercise 8.14.
8.14 Would you recommend the service to a friend? An automobile dealership asks all its customers who used their service department in a given two-week period if they would recommend the service to a friend. A total of 230 customers used the service during the two-week period, and 180 said that they would recommend the service to a friend.
(a) Report the sample proportion, the standard error of the sample proportion, and the margin of error for 95% confidence.
(a) Identify the sample size and the count.
(b) Are the guidelines for when to use the large-sample confidence interval for a population proportion satisfied in this setting? Explain your answer.
(b) Calculate the sample proportion.
(c) Find the 95% large-sample confidence interval for the population proportion.
(c) Explain the relationship between the population proportion and the sample proportion.
(d) Write a short statement explaining the meaning of your confidence interval.
8.1 Inference for a Single Proportion 8.18 Whole grain versus regular grain? A study of young children was designed to increase their intake of whole-grain, rather than regular-grain, snacks. At the end of the study the 76 children who participated in the study were presented with a choice between a regular-grain snack and a whole-grain alternative. The whole-grain alternative was chosen by 52 children. You want to examine the possibility that the children are equally likely to choose each type of snack. (a) Formulate the null and alternative hypotheses for this setting. (b) Are the guidelines for using the large-sample significance test satisfied for testing this null hypothesis? Explain your answer. (c) Perform the significance test and summarize your results in a short paragraph. 8.19 Find the sample size. You are planning a survey similar to the one about cell phone use described in Exercise 8.12. You will report your results with a largesample confidence interval. How large a sample do you need to be sure that the margin of error will not be greater than 0.04? Show your work. 8.20 What’s wrong? Explain what is wrong with each of the following: (a) An approximate 90% confidence interval for an unknown proportion p is pˆ plus or minus its standard error. (b) You can use a significance test to evaluate the hypothesis H0: pˆ ⫽ 0.3 versus the one-sided alternative. (c) The large-sample significance test for a population proportion is based on a t statistic. 8.21 What’s wrong? Explain what is wrong with each of the following: (a) A student project used a confidence interval to describe the results in a final report. The confidence level was 115%. (b) The margin of error for a confidence interval used for an opinion poll takes into account the fact that people who did not answer the poll questions may have had different responses from those who did answer the questions. (c) If the P-value for a significance test is 0.50, we can conclude that the null hypothesis has a 50% chance of being true. 8.22 Draw some pictures. Consider the binomial setting with n ⫽ 100 and p ⫽ 0.4. (a) The sample proportion pˆ will have a distribution that is approximately Normal. Give the mean and the standard deviation of this Normal distribution.
505
(b) Draw a sketch of this Normal distribution. Mark the location of the mean. (c) Find a value of x for which the probability is 95% that pˆ is within x of 0.4. Mark the corresponding interval on your plot. 8.23 Country food and Inuits. Country food includes seals, caribou, whales, ducks, fish, and berries and is an important part of the diet of the aboriginal people called Inuits who inhabit Inuit Nunangat, the northern region of what is now called Canada. A survey of Inuits in Inuit Nunangat reported that 3274 out of 5000 respondents said that at least half of the meat and fish that they eat is country food.8 Find the sample proportion and a 95% confidence interval for the population proportion of Inuits whose meat and fish consumption consists of at least half country food. 8.24 Soft drink consumption in New Zealand. A survey commissioned by the Southern Cross Healthcare Group reported that 16% of New Zealanders consume five or more servings of soft drinks per week. The data were obtained by an online survey of 2006 randomly selected New Zealanders over 15 years of age.9 (a) What number of survey respondents reported that they consume five or more servings of soft drinks per week? You will need to round your answer. Why? (b) Find a 95% confidence interval for the proportion of New Zealanders who report that they consume five or more servings of soft drinks per week. (c) Convert the estimate and your confidence interval to percents. (d) Discuss reasons why the estimate might be biased. 8.25 Violent video games. A 2013 survey of 1050 parents who have a child under the age of 18 living at home asked about their opinions regarding violent video games. A report describing the results of the survey stated that 89% of parents say that violence in today’s video games is a problem.10 (a) What number of survey respondents reported that they thought that violence in today’s video games is a problem? You will need to round your answer. Why? (b) Find a 95% confidence interval for the proportion of parents who think that violence in today’s video games is a problem. (c) Convert the estimate and your confidence interval to percents. (d) Discuss reasons why the estimate might be biased.
506
CHAPTER 8
•
Inference for Proportions
8.26 Bullying. Refer to the previous exercise. The survey also reported that 93% of the parents surveyed said that bullying contributes to violence in the United States. Answer the questions in the previous exercise for this item on the survey. 8.27 pˆ and the Normal distribution. Consider the binomial setting with n 50. You are testing the null hypothesis that p 0.3 versus the two-sided alternative with a 5% chance of rejecting the null hypothesis when it is true. (a) Find the values of the sample proportion pˆ that will lead to rejection of the null hypothesis. (b) Repeat part (a) assuming a sample size of n 100. (c) Make a sketch illustrating what you have found in parts (a) and (b). What does your sketch show about the effect of the sample size in this setting? 8.28 Students doing community service. In a sample of 159,949 first-year college students, the National Survey of Student Engagement reported that 39% participated in community service or volunteer work.11 (a) Find the margin of error for 99% confidence. (b) Here are some facts from the report that summarizes the survey. The students were from 617 four-year colleges and universities. The response rate was 36%. Institutions paid a participation fee of between $1800 and $7800 based on the size of their undergraduate enrollment. Discuss these facts as possible sources of error in this study. How do you think these errors would compare with the margin of error that you calculated in part (a)? 8.29 Plans to study abroad. The survey described in the previous exercise also asked about items related to academics. In response to one of these questions, 42% of first-year students reported that they plan to study abroad. (a) Based on the information available, how many students plan to study abroad? (b) Give a 99% confidence interval for the population proportion of first-year college students who plan to study abroad. 8.30 Student credit cards. In a survey of 1430 undergraduate students, 1087 reported that they had one or more credit cards.12 Give a 95% confidence interval for the proportion of all college students who have at least one credit card. 8.31 How many credit cards? The summary of the survey described in the previous exercise reported that 43% of undergraduates had four or more credit cards. Give a
95% confidence interval for the proportion of all college students who have four or more credit cards. 8.32 How would the confidence interval change? Refer to Exercise 8.31. (a) Would a 99% confidence interval be wider or narrower than the one that you found in Exercise 8.31? Verify your results by computing the interval. (b) Would a 90% confidence interval be wider or narrower than the one that you found in that exercise? Verify your results by computing the interval. 8.33 Do students report Internet sources? The National Survey of Student Engagement found that 87% of students report that their peers at least “sometimes” copy information from the Internet in their papers without reporting the source.13 Assume that the sample size is 430,000. (a) Find the margin of error for 99% confidence. (b) Here are some items from the report that summarizes the survey. More than 430,000 students from 730 four-year colleges and universities participated. The average response rate was 43% and ranged from 15% to 89%. Institutions pay a participation fee of between $3000 and $7500 based on the size of their undergraduate enrollment. Discuss these facts as possible sources of error in this study. How do you think these errors would compare with the error that you calculated in part (a)? 8.34 Can we use the z test? In each of the following cases state whether or not the Normal approximation to the binomial should be used for a significance test on the population proportion p. Explain your answers. (a) n 40 and H0: p 0.2. (b) n 30 and H0: p 0.4. (c) n 100 and H0: p 0.15. (d) n 200 and H0: p 0.04. 8.35 Long sermons. The National Congregations Study collected data in a one-hour interview with a key informant—that is, a minister, priest, rabbi, or other staff person or leader.14 One question concerned the length of the typical sermon. For this question 390 out of 1191 congregations reported that the typical sermon lasted more than 30 minutes. (a) Use the large-sample inference procedures to estimate the true proportion for this question with a 95% confidence interval. (b) The respondents to this question were not asked to use a stopwatch to record the lengths of a random
8.1 Inference for a Single Proportion sample of sermons at their congregations. They responded based on their impressions of the sermons. Do you think that ministers, priests, rabbis, or other staff persons or leaders might perceive sermon lengths differently from the people listening to the sermons? Discuss how your ideas would influence your interpretation of the results of this study. 8.36 Confidence level and interval width. Refer to the previous exercise. Would a 99% confidence interval be wider or narrower than the one that you found in that exercise? Verify your results by computing the interval. 8.37 Instant versus fresh-brewed coffee. A matched pairs experiment compares the taste of instant and fresh-brewed coffee. Each subject tastes two unmarked cups of coffee, one of each type, in random order and states which he or she prefers. Of the 50 subjects who participate in the study, 15 prefer the instant coffee. Let p be the probability that a randomly chosen subject prefers fresh-brewed coffee to instant coffee. (In practical terms, p is the proportion of the population who prefer freshbrewed coffee.) (a) Test the claim that a majority of people prefer the taste of fresh-brewed coffee. Report the large-sample z statistic and its P-value. (b) Draw a sketch of a standard Normal curve and mark the location of your z statistic. Shade the appropriate area that corresponds to the P-value. (c) Is your result significant at the 5% level? What is your practical conclusion? 8.38 Annual income of older adults. In a study of older adults, 1444 subjects out of a total of 2733 reported that their annual income was $30,000 or more. (a) Give a 95% confidence interval for the true proportion of subjects in this population with incomes of at least $30,000. (b) Do you think that some respondents might not give truthful answers to a question about their income? Discuss the possible effects on your estimate and confidence interval. 8.39 Tossing a coin 10,000 times! The South African mathematician John Kerrich, while a prisoner of war during World War II, tossed a coin 10,000 times and obtained 5067 heads. (a) Is this significant evidence at the 5% level that the probability that Kerrich’s coin comes up heads is not 0.5? Use a sketch of the standard Normal distribution to illustrate the P-value.
507
(b) Use a 95% confidence interval to find the range of probabilities of heads that would not be rejected at the 5% level. 8.40 Is there interest in a new product? One of your employees has suggested that your company develop a new product. You decide to take a random sample of your customers and ask whether or not there is interest in the new product. The response is on a 1 to 5 scale with 1 indicating “definitely would not purchase”; 2, “probably would not purchase”; 3, “not sure”; 4, “probably would purchase”; and 5, “definitely would purchase.” For an initial analysis, you will record the responses 1, 2, and 3 as “No” and 4 and 5 as “Yes.” What sample size would you use if you wanted the 95% margin of error to be 0.2 or less? 8.41 More information is needed. Refer to the previous exercise. Suppose that after reviewing the results of the previous survey, you proceeded with preliminary development of the product. Now you are at the stage where you need to decide whether or not to make a major investment to produce and market it. You will use another random sample of your customers, but now you want the margin of error to be smaller. What sample size would you use if you wanted the 95% margin of error to be 0.01 or less? 8.42 Sample size needed for an evaluation. You are planning an evaluation of a semester-long alcohol awareness campaign at your college. Previous evaluations indicate that about 20% of the students surveyed will respond “Yes” to the question “Did the campaign alter your behavior toward alcohol consumption?” How large a sample of students should you take if you want the margin of error for 95% confidence to be about 0.08? 8.43 Sample size needed for an evaluation, continued. The evaluation in the previous exercise will also have questions that have not been asked before, so you do not have previous information about the possible value of p. Repeat the preceding calculation for the following values of p*: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9. Summarize the results in a table and graphically. What sample size will you use? 8.44 Are the customers dissatisfied? An automobile manufacturer would like to know what proportion of its customers are dissatisfied with the service received from their local dealer. The customer relations department will survey a random sample of customers and compute a 95% confidence interval for the proportion who are dissatisfied. From past studies, they believe that this proportion will be about 0.2. Find the sample size needed if the margin of error of the confidence interval is to be no more than 0.02.
508
CHAPTER 8
•
Inference for Proportions
8.2 Comparing Two Proportions When you complete this section, you will be able to • Identify the counts and sample sizes for a comparison between two proportions; compute the proportions and find their difference. • Apply the guidelines for when to use the large-sample confidence interval for a difference between two proportions. • Apply the large-sample method to find the confidence interval for a difference between two proportions and interpret the confidence interval. • Apply the guidelines for when to use the large-sample significance test for a difference between two proportions. • Apply the large-sample method to perform a significance test for comparing two proportions and interpret the results of the significance test. • Calculate and interpret the relative risk.
Because comparative studies are so common, we often want to compare the proportions of two groups (such as men and women) that have some characteristic. In the previous section we learned how to estimate a single proportion. Our problem now concerns the comparison of two proportions. We call the two groups being compared Population 1 and Population 2, and the two population proportions of “successes” p1 and p2. The data consist of two independent SRSs, of size n1 from Population 1 and size n2 from Population 2. The proportion of successes in each sample estimates the corresponding population proportion. Here is the notation we will use in this section:
Population
Population proportion
Sample size
Count of successes
Sample proportion
1
p1
n1
X1
2
p2
n2
X2
pˆ 1 X1兾n1 pˆ 2 X2兾n2
To compare the two populations, we use the difference between the two sample proportions: D pˆ 1 pˆ 2
LOOK BACK addition rule for means, p. 272
When both sample sizes are sufficiently large, the sampling distribution of the difference D is approximately Normal. Inference procedures for comparing proportions are z procedures based on the Normal approximation and on standardizing the difference D. The first step is to obtain the mean and standard deviation of D. By the addition rule for means, the mean of D is the difference of the means: mD m pˆ 1 m pˆ 2 p1 p2
8.2 Comparing Two Proportions
LOOK BACK addition rule for variances, p. 275
509
That is, the difference D pˆ 1 pˆ 2 between the sample proportions is an unbiased estimator of the population difference p1 p2. Similarly, the addition rule for variances tells us that the variance of D is the sum of the variances: sD2 s p2ˆ1 s p2ˆ2
p1 11 p1 2 p2 11 p2 2 n1 n2
Therefore, when n1 and n2 are large, D is approximately Normal with mean mD p1 p2 and standard deviation sD
B
p1 11 p1 2 p2 11 p2 2 n1 n2
USE YOUR KNOWLEDGE 8.45 Rules for means and variances. Suppose that p1 0.3, n1 20, p2 0.6, n2 30. Find the mean and the standard deviation of the sampling distribution of p1 p2. 8.46 Effect of the sample sizes. Suppose that p1 0.3, n1 80, p2 0.6, n2 120. (a) Find the mean and the standard deviation of the sampling distribution of p1 p2. (b) The sample sizes here are four times as large as those in the previous exercise while the population proportions are the same. Compare the results for this exercise with those that you found in the previous exercise. What is the effect of multiplying the sample sizes by 4? 8.47 Rules for means and variances. It is quite easy to verify the formulas for the mean and standard deviation of the difference D. (a) What are the means and standard deviations of the two sample proportions pˆ 1 and pˆ 2? (b) Use the addition rule for means of random variables: what is the mean of D pˆ 1 pˆ 2? (c) The two samples are independent. Use the addition rule for variances of random variables: what is the variance of D?
Large-sample confidence interval for a difference in proportions To obtain a confidence interval for p1 p2, we once again replace the unknown parameters in the standard deviation by estimates to obtain an estimated standard deviation, or standard error. Here is the confidence interval we want.
510
CHAPTER 8
•
Inference for Proportions
LARGE-SAMPLE CONFIDENCE INTERVAL FOR COMPARING TWO PROPORTIONS Choose an SRS of size n1 from a large population having proportion p1 of successes and an independent SRS of size n2 from another population having proportion p2 of successes. The estimate of the difference in the population proportions is D pˆ 1 pˆ 2 The standard error of D is SED
pˆ 1 11 pˆ 1 2 pˆ 2 11 pˆ 2 2 n1 n2 B
and the margin of error for confidence level C is m z*SED where the critical value z* is the value for the standard Normal density curve with area C between z* and z*. An approximate level C confidence interval for p1 p2 is Dm Use this method for 90%, 95%, or 99% confidence when the number of successes and the number of failures in each sample are both at least 10.
EXAMPLE DATA FACEBOOK
8.10 Are you spending more time on Facebook? A Pew Internet survey asked 525 Facebook users about changes in the amount of time spent using Facebook over the past year. Here are the data for the response variable, Increase, with values “Yes” and “No,” classified by the explanatory variable, Gender, with values “Men” and “Women.” The cases are the 525 Facebook users who participated in the survey.15 Here are the data:
CHALLENGE
Population
n
X
1 (women)
292
47
pˆ X兾n 0.1610
2 (men)
233
21
0.0901
Total
525
68
0.1295
In this table the pˆ column gives the sample proportions of Facebook users who increased their use of Facebook over the past year. Let’s find a 95% confidence interval for the difference between the proportions of women and of men who increased their time spent on Facebook over the past year. Figure 8.4 shows a spreadsheet that can be used as input to software that can compute the confidence interval. Output from JMP,
8.2 Comparing Two Proportions
511
Excel
A
FIGURE 8.4 Spreadsheet that can be used as input to software that computes the confidence interval for the Facebook data in Example 8.10.
B
C
D
Count
1
Increase
Gender
2
Yes
Female
47
3
No
Female
245
4
Yes
Male
21
5
No
Male
212
E
6 7
Minitab, and SAS is given in Figure 8.5. To perform the computations using our formulas, we first find the difference in the proportions: D ⫽ pˆ 1 ⫺ pˆ 2 ⫽ 0.1610 ⫺ 0.0901 ⫽ 0.0709
FIGURE 8.5 (a) JMP, (b) Minitab, and (c) SAS output for the Facebook time confidence interval in Example 8.10.
JMP
Contingency Analysis of Increase By Gender Tests Two Sample Test for Proportions Proportion Description P(Yes Female)-P(Yes Male)
Difference
Lower 95%
Upper 95%
0.07083
0.013328
0.125969
Adjusted Wald Test
Prob
P(Yes Female)-P(Yes Male) 0
0.0077*
P(Yes Female)-P(Yes Male) 0
0.9923
P(Yes Female)-P(Yes Male) 0
0.0154*
Response Increase category of interest No Yes
(a) JMP
Continued
512
CHAPTER 8
•
Inference for Proportions
FIGURE 8.5 (Continued )
Minitab
Test and CI for Two Proportions Sample 1
X 47
N 292
Sample p 0.160959
2
21
233
0.090129
Difference = p (1) - p (2) Estimate for difference: 0.0708301 95% CI for difference: (0.0148953, 0.126765) Test for difference = 0 (vs not = 0): Z = 2.40
P-Value = 0.016
Welcome to Minitab, press F1 for help.
(b) Minitab
SAS
(Asymptotic) 95% Confidence Limits
(Exact) 95% Confidence Limits
Risk
ASE
Row 1
0.1610
0.0215
0.1188
0.2031
0.1207
0.2082
Row 2
0.0901
0.0188
0.0534
0.1269
0.0567
0.1345
Total
0.1295
0.0147
0.1008
0.1582
0.1020
0.1613
Difference
0.0708
0.0285
0.0149
0.1268
Difference is (Row 1 - Row 2)
Confidence Limits for the Proportion (Risk) Difference Column 1 (Increase = Yes) Proportion Difference = 0.0708 Type Wald
95% Confidence Limits 0.0149
Done
(c) SAS
0.1268
8.2 Comparing Two Proportions
513
Then we calculate the standard error of D: SED
pˆ 1 11 pˆ 1 2 pˆ 2 11 pˆ 2 2 n1 n2 B
10.09012 10.90992 10.16102 10.83902 B 292 233 0.0285
For 95% confidence, we have z* 1.96, so the margin of error is m z*SED 11.962 10.02852 0.0559 The 95% confidence interval is D m 0.0709 0.0559 10.0150, 0.12682 With 95% confidence we can say that the difference in the proportions is between 0.0150 and 0.1268. Alternatively, we can report that the difference between the percent of women who increased their time spent on Facebook over the past year and the percent of men who did so is 7.1%, with a 95% margin of error of 5.6%.
In this example men and women were not sampled separately. The sample sizes are, in fact, random and reflect the gender distributions of the subjects who responded to the survey. Two-sample significance tests and confidence intervals are still approximately correct in this situation. In the example above we chose women to be the first population. Had we chosen men to be the first population, the estimate of the difference would be negative (0.07092. Because it is easier to discuss positive numbers, we generally choose the first population to be the one with the higher proportion.
USE YOUR KNOWLEDGE 8.48 Gender and commercial preference. A study was designed to compare two energy drink commercials. Each participant was shown the commercials in random order and asked to select the better one. Commercial A was selected by 44 out of 100 women and 79 out of 140 men. Give an estimate of the difference in gender proportions that favored Commercial A. Also construct a large-sample 95% confidence interval for this difference. 8.49 Gender and commercial preference, revisited. Refer to Exercise 8.48. Construct a 95% confidence interval for the difference in proportions that favor Commercial B. Explain how you could have obtained these results from the calculations you did in Exercise 8.48.
514
CHAPTER 8
•
Inference for Proportions BEYOND THE BASICS
The plus four confidence interval for a difference in proportions Just as in the case of estimating a single proportion, a small modification of the sample proportions can greatly improve the accuracy of confidence intervals.16 As before, we add 2 successes and 2 failures to the actual data, but now we divide them equally between the two samples. That is, we add 1 success and 1 failure to each sample. We will again call the estimates produced by adding hypothetical observations plus four estimates. The plus four estimates of the two population proportions are X1 1 X2 1 苲 p1 and 苲 p2 n1 2 n2 2 The estimated difference between the populations is 苲 D苲 p1 苲 p2 苲 and the standard deviation of D is approximately s D⬃
p1 11 p1 2 p2 11 p2 2 B n1 2 n2 2
This is similar to the formula for sD, adjusted for the sizes of the modified samples. To obtain a confidence interval for p1 p2, we once again replace the unknown parameters in the standard deviation by estimates to obtain an estimated standard deviation, or standard error. Here is the confidence interval we want.
PLUS FOUR CONFIDENCE INTERVAL FOR COMPARING TWO PROPORTIONS Choose an SRS of size n1 from a large population having proportion p1 of successes and an independent SRS of size n2 from another population having proportion p2 of successes. The plus four estimate of the difference in proportions is 苲 D苲 p 苲 p 1
2
where X1 1 苲 p1 n1 2 苲 The standard error of D is SED⬃
X2 1 苲 p2 n2 2
苲 苲 p1 11 苲 p2 11 苲 p1 2 p2 2 B n1 2 n2 2
and the margin of error for confidence level C is m z*SED⬃
8.2 Comparing Two Proportions
515
where z* is the value for the standard Normal density curve with area C between z* and z*. An approximate level C confidence interval for p1 p2 is 苲 Dm Use this method for 90%, 95%, or 99% confidence when both sample sizes are at least 5.
EXAMPLE 8.11 Gender and sexual maturity. In studies that look for a difference between genders, a major concern is whether or not apparent differences are due to other variables that are associated with gender. Because boys mature more slowly than girls, a study of adolescents that compares boys and girls of the same age may confuse a gender effect with an effect of sexual maturity. The “Tanner score” is a commonly used measure of sexual maturity.17 Subjects are asked to determine their score by placing a mark next to a rough drawing of an individual at their level of sexual maturity. There are five different drawings, so the score is an integer between 1 and 5. A pilot study included 12 girls and 12 boys from a population that will be used for a large experiment. Four of the boys and three of the girls had Tanner scores of 4 or 5, a high level of sexual maturity. Let’s find a 95% confidence interval for the difference between the proportions of boys and girls who have high (4 or 5) Tanner scores in this population. The numbers of successes and failures in both groups are not all at least 10, so the largesample approach is not recommended. On the other hand, the sample sizes are both at least 5, so the plus four method is appropriate. The plus four estimate of the population proportion for boys is X1 1 41 苲 p1 0.3571 n1 2 12 2 For girls, the estimate is X2 1 31 苲 p2 0.2857 n2 2 12 2 Therefore, the estimate of the difference is 苲 D苲 p1 苲 p2 0.3571 0.2857 0.071 苲 The standard error of D is 苲 苲 p1 11 苲 p2 11 苲 p1 2 p2 2 SED⬃ B n1 2 n2 2
10.28572 11 0.28572 10.35712 11 0.35712 B 12 2 12 2
0.1760
516
CHAPTER 8
•
Inference for Proportions For 95% confidence, z* 1.96 and the margin of error is m z*SED⬃ 11.962 10.17602 0.345 The confidence interval is 苲 D m 0.071 0.345 10.274, 0.4162 With 95% confidence we can say that the difference in the proportions is between 0.274 and 0.416. Alternatively, we can report that the difference in the proportions of boys and girls with high Tanner scores in this population is 7.1% with a 95% margin of error of 34.5%.
The very large margin of error in this example indicates that either boys or girls could be more sexually mature in this population and that the difference could be quite large. Although the interval includes the possibility that there is no difference, corresponding to p1 p2 or p1 p2 0, we should not conclude that there is no difference in the proportions. With small sample sizes such as these, the data do not provide us with a lot of information for our inference. This fact is expressed quantitatively through the very large margin of error.
Significance test for a difference in proportions Although we prefer to compare two proportions by giving a confidence interval for the difference between the two population proportions, it is sometimes useful to test the null hypothesis that the two population proportions are the same. We standardize D pˆ 1 pˆ 2 by subtracting its mean p1 p2 and then dividing by its standard deviation sD
p1 11 p1 2 p2 11 p2 2 n1 n2 B
If n1 and n2 are large, the standardized difference is approximately N10, 12. For the large-sample confidence interval we used sample estimates in place of the unknown population values in the expression for sD. Although this approach would lead to a valid significance test, we instead adopt the more common practice of replacing the unknown sD with an estimate that takes into account our null hypothesis H0: p1 p2. If these two proportions are equal, then we can view all the data as coming from a single population. Let p denote the common value of p1 and p2 ; then the standard deviation of D pˆ 1 pˆ 2 is sD
p11 p2 p11 p2 n1 n2 B B
p11 p2 a
1 1 b n1 n2
8.2 Comparing Two Proportions
517
We estimate the common value of p by the overall proportion of successes in the two samples: pˆ pooled estimate of p
number of successes in both samples X1 X2 number of observations in both samples n1 n2
This estimate of p is called the pooled estimate because it combines, or pools, the information from both samples. To estimate sD under the null hypothesis, we substitute pˆ for p in the expression for sD. The result is a standard error for D that assumes H0: p1 p2: SEDp
A
pˆ 11 pˆ 2 a
1 1 b n1 n2
The subscript on SEDp reminds us that we pooled data from the two samples to construct the estimate.
SIGNIFICANCE TEST FOR COMPARING TWO PROPORTIONS To test the hypothesis H0: p1 p2 compute the z statistic z
pˆ 1 pˆ 2 SEDp
where the pooled standard error is SEDp
B
pˆ 11 pˆ 2 a
1 1 b n1 n2
and where the pooled estimate of the common value of p1 and p2 is pˆ
X1 X2 n1 n2
In terms of a standard Normal random variable Z, the approximate P-value for a test of H0 against Ha: p1 p2 is P1Z z2
z
Ha: p1 p2 is P1Z z2 z
Ha: p1 ⬆ p2 is 2P1Z 0 z 0 2
z
This z test is based on the Normal approximation to the binomial distribution. As a general rule, we will use it when the number of successes and the number of failures in each of the samples are at least 5.
518
CHAPTER 8
•
Inference for Proportions
EXAMPLE DATA FACEBOOK TIME
8.12 Gender and Facebook time: the z test. Are men and women equally likely to say that they increased the amount of time that they spend on Facebook over the past year? We examine the data in Example 8.10 (page 510) to answer this question. Here is the data summary:
CHALLENGE
Population
n
X
pˆ X兾n
1 (women)
292
47
0.1610
2 (men)
233
21
0.0901
Total
525
68
0.1295
The sample proportions are certainly quite different, but we will perform a significance test to see if the difference is large enough to lead us to believe that the population proportions are not equal. Formally, we test the hypotheses H0: p1 p2 Ha: p1 ⬆ p2 The pooled estimate of the common value of p is pˆ
47 21 68 0.1295 292 233 525
Note that this is the estimate on the bottom line of the preceding data summary. The test statistic is calculated as follows: SEDp z
B
10.12952 10.87052 a
1 1 b 0.02949 292 233
pˆ 1 pˆ 2 0.1610 0.0901 SEDp 0.02949
2.40 The P-value is 2P1Z 2.402. We can conclude that P 211 0.99182 0.0164. Output from JMP, Minitab, and SAS is given in Figure 8.6. JMP reports the P-value as 0.0154, Minitab reports 0.016, and SAS reports 0.0163. Here is our summary: among the Facebook users in the study, 16.1% of the women and 9.0% of the men said that they increased the time they spent on Facebook last year; the difference is statistically significant (z 2.40, P 0.022.
Do you think that we could have argued that the proportion would be higher for women than for men before looking at the data in this example? This would allow us to use the one-sided alternative Ha: p1 p2. The P-value would be half of the value obtained for the two-sided test. Do you think that this approach is justified?
8.2 Comparing Two Proportions FIGURE 8.6 (a) JMP, (b) Minitab,
519
JMP
and (c) SAS output for the Facebook time significance test in Example 8.10.
Contingency Analysis of Increase By Gender Tests Two Sample Test for Proportions Proportion Description
Difference
Lower 95%
Upper 95%
0.07083
0.013328
0.125969
P(Yes Female)-P(Yes Male) Adjusted Wald Test
Prob
P(Yes Female)-P(Yes Male) 0
0.0077*
P(Yes Female)-P(Yes Male) 0
0.9923
P(Yes Female)-P(Yes Male) 0
0.0154*
Response Increase category of interest No Yes
(a) JMP
Minitab
Test and CI for Two Proportions Sample 1
X 47
N 292
Sample p 0.160959
2
21
233
0.090129
Difference = p (1) - p (2) Estimate for difference: 0.0708301 95% CI for difference: (0.0148953, 0.126765) Test for difference = 0 (vs not = 0): Z = 2.40
P-Value = 0.016
Welcome to Minitab, press F1 for help.
(b) Minitab
Continued
520
CHAPTER 8
•
Inference for Proportions
FIGURE 8.6 (Continued )
SAS
Proportion (Risk) Difference Test H0 : P1 - P2 = 0 Proportion Difference
0.0708
ASE (H0)
0.0295
Z
2.4013
One-sided Pr > Z
0.0082
Two-sided Pr > Z
0.0163
Done
(c) SAS
USE YOUR KNOWLEDGE 8.50 Gender and commercial preference: the z test. Refer to Exercise 8.48 (page 513). Test whether the proportions of women and men who liked Commercial A are the same versus the two-sided alternative at the 5% level. 8.51 Changing the alternative hypothesis. Refer to the previous exercise. Does your conclusion change if you test whether the proportion of men who favor Commercial A is larger than the proportion of females? Explain.
BEYOND THE BASICS
Relative risk
risk relative risk
We summarized the comparison of the increased Facebook time during the past year for women and men by reporting the difference in the proportions with a confidence interval. Another way to compare two proportions is to take the ratio. This approach can be used in any setting and it is particularly common in medical settings. We think of each proportion as a risk that something (usually bad) will happen. We then compare these two risks with the ratio of the two proportions, which is called the relative risk (RR). Note that a relative risk of 1 means that the two proportions, pˆ 1 and pˆ 2, are equal. The procedure for calculating confidence intervals for relative risk is based on the same kind of principles that we have studied, but the details are somewhat more complicated. Fortunately, we can leave the details to software and concentrate on interpretation and communication of the results.
8.2 Comparing Two Proportions
521
EXAMPLE 8.13 Aspirin and blood clots: relative risk. A study of patients who had blood clots (venous thromboembolism) and had completed the standard treatment were randomly assigned to receive a low-dose aspirin or a placebo treatment. The 822 patients in the study were randomized to the treatments, 411 to each. Patients were monitored for several years for the occurrence of several related medical conditions. Counts of patients who experienced one or more of these conditions were reported for each year after the study began.18 The following table gives the data for a composite of events, termed “major vascular events.” Here, X is the number of patients who had a major event. n
X
pˆ X兾n
1 (aspirin)
411
45
0.1095
2 (placebo)
411
73
0.1776
Total
822
118
0.1436
Population
The relative risk is RR
pˆ 1 0.1095 0.6164 pˆ 2 0.1776
Software gives the 95% confidence interval as 0.4364 to 0.8707. Taking aspirin has reduced the occurrence of major events to 62% of what it is for patients taking the placebo. The 95% confidence interval is 44% to 87%.
Note that the confidence interval is not symmetric about the estimate. Relative risk is one of many situations where this occurs.
SECTION 8.2 Summary The large-sample estimate of the difference in two population proportions is D pˆ 1 pˆ 2 where pˆ 1 and pˆ 2 are the sample proportions: pˆ 1
X1 X2 and pˆ 2 n1 n2
The standard error of the difference D is SED
pˆ 1 11 pˆ 1 2 pˆ 2 11 pˆ 2 2 n1 n2 B
The margin of error for confidence level C is m z*SED
522
CHAPTER 8
•
Inference for Proportions where z* is the value for the standard Normal density curve with area C between z* and z*. The large-sample level C confidence interval is Dm We recommend using this interval for 90%, 95%, or 99% confidence when the number of successes and the number of failures in both samples are all at least 10. When sample sizes are smaller, alternative procedures such as the plus four estimate of the difference in two population proportions are recommended. Significance tests of H0: p1 p2 use the z statistic z
pˆ 1 pˆ 2 SEDp
with P-values from the N10, 12 distribution. In this statistic, SEDp
B
pˆ 11 pˆ 2 a
1 1 b n1 n2
and pˆ is the pooled estimate of the common value of p1 and p2: pˆ
X1 X2 n1 n2
Use this test when the number of successes and the number of failures in each of the samples are at least 5. Relative risk is the ratio of two sample proportions: RR
pˆ 1 pˆ 2
Confidence intervals for relative risk are often used to summarize the comparison of two proportions.
SECTION 8.2 Exercises For Exercises 8.45 to 8.47, see page 509; for Exercises 8.48 and 8.49, see page 513; and for Exercises 8.50 and 8.51, see page 520. 8.52 Identify the key elements. For each of the following scenarios, identify the populations, the counts, and the sample sizes; compute the two proportions and find their difference. (a) Two website designs are being compared. Fifty students have agreed to be subjects for the study, and they are randomly assigned to visit one or the other of the websites for as long as they like. For each student the study directors record whether or not the visit lasts for more than a minute. For the first design, 12 students visited for more than a minute; for the second, 5 visited for more than a minute. (b) Samples of first-year students and fourth-year students were asked if they were in favor of a new proposed core curriculum. Among the first-year students, 85 said
“Yes” and 276 said “No.” For the fourth-year students, 117 said “Yes” and 104 said “No.” 8.53 Apply the confidence interval guidelines. Refer to the previous exercise. For each of the scenarios, determine whether or not the guidelines for using the largesample method for a 95% confidence interval are satisfied. Explain your answers. 8.54 Find the 95% confidence interval. Refer to Exercise 8.52. For each scenario, find the large-sample 95% confidence interval for the difference in proportions, and use the scenario to explain the meaning of the confidence interval. 8.55 Apply the significance test guidelines. Refer to Exercise 8.52. For each of the scenarios, determine whether or not the guidelines for using the largesample significance test are satisfied. Explain your answers.
8.2 Comparing Two Proportions 8.56 Perform the significance test. Refer to Exercise 8.52. For each scenario, perform the large-sample significance test, and use the scenario to explain the meaning of the significance test. 8.57 Find the relative risk. Refer to Exercise 8.52. For each scenario, find the relative risk. Be sure to give a justification for your choice of proportions to use in the numerator and the denominator of the ratio. Use the scenarios to explain the meaning of the relative risk. 8.58 Teeth and military service. In 1898 the United States and Spain fought a war over the U.S. intervention in the Cuban War of Independence. At that time the U.S. military was concerned about the nutrition of its recruits. Many did not have a sufficient number of teeth to chew the food provided to soldiers. As a result, it was likely that they would be undernourished and unable to fulfill their duties as soldiers. The requirements at that time specified that a recruit must have “at least four sound double teeth, one above and one below on each side of the mouth, and so opposed” so that they could chew food. Of the 58,952 recruits who were under the age of 20, 68 were rejected for this reason. For the 43,786 recruits who were 40 or over, 3801 were rejected.19 (a) Find the proportion of rejects for each age group. (b) Find a 99% confidence interval for the difference in the proportions. (c) Use a significance test to compare the proportions. Write a short paragraph describing your results and conclusions. (d) Are the guidelines for the use of the large-sample approach satisfied for your work in parts (b) and (c)? Explain your answers. 8.59 Physical education requirements. In the 1920s, about 97% of U.S. colleges and universities required a physical education course for graduation. Today, about 40% require such a course. A recent study of physical education requirements included 354 institutions: 225 private and 129 public. Among the private institutions, 60 required a physical education course, while among the public institutions, 101 required a course.20
523
(e) Use a significance test to compare the private and the public institutions with regard to the physical education requirement. (f) For parts (d) and (e), verify that the guidelines for using the large-sample methods are satisfied. (g) Summarize your analysis of these data in a short paragraph. 8.60 Exergaming in Canada. Exergames are active video games such as rhythmic dancing games, virtual bicycles, balance board simulators, and virtual sports simulators that require a screen and a console. A study of exergaming practiced by students from grades 10 and 11 in Montreal, Canada, examined many factors related to participation in exergaming.21 Of the 358 students who reported that they stressed about their health, 29.9% said that they were exergamers. Of the 851 students who reported that they did not stress about their health, 20.8% said that they were exergamers. (a) Define the two populations to be compared for this exercise. (b) What are the counts, the sample sizes, and the proportions? (c) Are the guidelines for the use of the large-sample confidence interval satisfied? (d) Are the guidelines for the use of the large-sample significance test satisfied? 8.61 Confidence interval for exergaming in Canada. Refer to the previous exercise. Find the 95% confidence interval for the difference in proportions. Write a short statement interpreting this result. 8.62 Significance test for exergaming in Canada. Refer to Exercise 8.60. Use a significance test to compare the proportions. Write a short statement interpreting this result.
(c) What are the statistics?
8.63 Adult gamers versus teen gamers. A Pew Internet Project Data Memo presented data comparing adult gamers with teen gamers with respect to the devices on which they play. The data are from two surveys. The adult survey had 1063 gamers while the teen survey had 1064 gamers. The memo reports that 54% of adult gamers played on game consoles (Xbox, PlayStation, Wii, etc.) while 89% of teen gamers played on game consoles.22
(d) Use a 95% confidence interval to compare the private and the public institutions with regard to the physical education requirement.
(a) Refer to the table that appears on page 508. Fill in the numerical values of all quantities that are known.
(a) What are the explanatory and response variables for this exercise? Justify your answers. (b) What are the populations?
524
CHAPTER 8
•
Inference for Proportions
(b) Find the estimate of the difference between the proportion of teen gamers who played on game consoles and the proportion of adults who played on these devices. (c) Is the large-sample confidence interval for the difference between two proportions appropriate to use in this setting? Explain your answer. (d) Find the 95% confidence interval for the difference. (e) Convert your estimated difference and confidence interval to percents. (f) The adult survey was conducted between October and December 2008, whereas the teen survey was conducted between November 2007 and February 2008. Do you think that this difference should have any effect on the interpretation of the results? Be sure to explain your answer.
(c) Find a value d for which the probability is 0.95 that the difference in sample proportions is within d. Mark these values on your sketch. 8.69 What’s wrong? For each of the following, explain what is wrong and why. (a) A z statistic is used to test the null hypothesis that pˆ 1 pˆ 2. (b) If two sample proportions are equal, then the sample counts are equal. (c) A 95% confidence interval for the difference in two proportions includes errors due to nonresponse. 8.70 pˆ 1 2 pˆ 2 and the Normal distribution. Refer to Exercise 8.68. Assume that all the conditions for that exercise remain the same, with the exception that n2 1200.
8.64 Significance test for gaming on consoles. Refer to the previous exercise. Test the null hypothesis that the two proportions are equal. Report the test statistic with the P-value and summarize your conclusion.
(a) Find the mean and the standard deviation of the distribution of pˆ 1 pˆ 2.
8.65 Gamers on computers. The report described in Exercise 8.63 also presented data from the same surveys for gaming on computers (desktops or laptops). These devices were used by 73% of adult gamers and by 76% of teen gamers. Answer the questions given in Exercise 8.63 for gaming on computers.
(c) Because n2 is very large, we expect pˆ 2 to be very close to 0.5. How close?
8.66 Significance test for gaming on computers. Refer to the previous exercise. Test the null hypothesis that the two proportions are equal. Report the test statistic with the P-value and summarize your conclusion. 8.67 Can we compare gaming on consoles with gaming on computers? Refer to the previous four exercises. Do you think that you can use the large-sample confidence intervals for a difference in proportions to compare teens’ use of computers with teens’ use of consoles? Write a short paragraph giving the reason for your answer. (Hint: Look carefully in the box giving the assumptions needed for this procedure.) 8.68 Draw a picture. Suppose that there are two binomial populations. For the first, the true proportion of successes is 0.3; for the second, it is 0.5. Consider taking independent samples from these populations, 40 from the first and 60 from the second. (a) Find the mean and the standard deviation of the distribution of pˆ 1 pˆ 2. (b) This distribution is approximately Normal. Sketch this Normal distribution and mark the location of the mean.
(b) Find the mean and the standard deviation of the distribution of pˆ 1 0.5.
(d) Summarize what you have found in parts (a), (b), and (c) of this exercise. Interpret your results in terms of inference for comparing two proportions when the sample size of one of the samples is much larger than the sample size of the other. 8.71 Gender bias in textbooks. To what extent do syntax textbooks, which analyze the structure of sentences, illustrate gender bias? A study of this question sampled sentences from 10 texts.23 One part of the study examined the use of the words “girl,” “boy,” “man,” and “woman.” We will call the first two words juvenile and the last two adult. Is the proportion of female references that are juvenile (girl) equal to the proportion of male references that are juvenile (boy)? Here are data from one of the texts: Gender Female Male
n
X ( juvenile)
60
48
132
52
(a) Find the proportion of juvenile references for females and its standard error. Do the same for the males. (b) Give a 90% confidence interval for the difference and briefly summarize what the data show. (c) Use a test of significance to examine whether the two proportions are equal.
Chapter 8 Exercises
525
CHAPTER 8 Exercises 8.72 The future of gamification. Gamification is an interactive design that includes rewards such as points, payments, and gifts. A Pew survey of 1021 technology stakeholders and critics was conducted to predict the future of gamification. A report on the survey said that 42% of those surveyed thought that there would be no major increases in gamification by 2020. On the other hand, 53% said that they believed that there would be significant advances in the adoption and use of gamification by 2020.24 Analyze these data using the methods that you learned in this chapter and write a short report summarizing your work. 8.73 Where do you get your news? A report produced by the Pew Research Center’s Project for Excellence in Journalism summarized the results of a survey on how people get their news. Of the 2342 people in the survey who own a desktop or laptop, 1639 reported that they get their news from the desktop or laptop.25 (a) Identify the sample size and the count. (b) Find the sample proportion and its standard error. (c) Find and interpret the 95% confidence interval for the population proportion. (d) Are the guidelines for use of the large-sample confidence interval satisfied? Explain your answer. 8.74 Is the calcium intake adequate? Young children need calcium in their diet to support the growth of their bones. The Institute of Medicine provides guidelines for how much calcium should be consumed by people of different ages.26 One study examined whether or not a sample of children consumed an adequate amount of calcium based on these guidelines. Since there are different guidelines for children aged 5 to 10 years and those aged 11 to 13 years, the children were classified into these two age groups. Each student’s calcium intake was classified as meeting or not meeting the guideline. There were 2029 children in the study. Here are the data:27 Age (years) Met requirement
5 to 10
11 to 13
No
194
557
Yes
861
417
Identify the populations, the counts, and the sample sizes for comparing the extent to which the two age groups of children met the calcium intake requirement. 8.75 Use a confidence interval for the comparison. Refer to the previous exercise. Use a 95% confidence interval for the comparison and explain what the
confidence interval tells us. Be sure to include a justification for the use of the large-sample procedure for this comparison. 8.76 Use a significance test for the comparison. Refer to Exercise 8.74. Use a significance test to make the comparison. Interpret the result of your test. Be sure to include a justification for the use of the large-sample procedure for this comparison. 8.77 Confidence interval or significance test? Refer to Exercises 8.74 to 8.76. Do you prefer to use the confidence interval or the significance test for this comparison? Give reasons for your answer. 8.78 Punxsutawney Phil. There is a gathering every year on February 2 at Gobbler’s Knob in Punxsutawney, Pennsylvania. A groundhog, always named Phil, is the center of attraction. If Phil sees his shadow when he emerges from his burrow, tradition says that there will be six more weeks of winter. If he does not see his shadow, spring has arrived. How well has Phil predicted the arrival of spring for the past several years? The National Oceanic and Atmospheric Administration has collected data for the 25 years from 1988 to 2012. For each year, whether or not Phil saw his shadow is recorded. This is compared with the February temperature for that year, classified as above or below normal. For 18 of the 25 years, Phil saw his shadow, and for 6 of these years, the temperature was below normal. For the years when Phil did not see his shadow, 2 of these years had temperatures below normal.28 Analyze the data and write a report on how well Phil predicts whether or not winter is over. 8.79 Facebook users. A Pew survey of 1802 Internet users found that 67% use Facebook.29 (a) How many of those surveyed used Facebook? (b) Give a 95% confidence interval for the proportion of Internet users who use Facebook. (c) Convert the confidence interval that you found in part (b) to a confidence interval for the percent of Internet users who use Facebook. 8.80 Twitter users. Refer to the previous exercise. The same survey reported that 16% of Internet users use Twitter. Answer the questions in the previous exercise for Twitter use. 8.81 Facebook versus Twitter. Refer to Exercises 8.79 and 8.80. Can you use the data provided in these two exercises to compare the proportion of Facebook users with the proportion of Twitter users? If your answer is yes, do the comparison. If your answer is no, explain why you cannot make the comparison.
526
CHAPTER 8
•
Inference for Proportions
8.82 Video game genres. U.S. computer and video game software sales were $13.26 billion in 2012.30 A survey of 1102 teens collected data about video game use by teens. According to the survey, the following are the most popular game genres.31 Genre
Examples
Percent who play
Racing
NASCAR, Mario Kart, Burnout
74
Puzzle
Bejeweled, Tetris, Solitaire
72
Sports
Madden, FIFA, Tony Hawk
68
Action
Grand Theft Auto, Devil May Cry, Ratchet and Clank
67
Adventure
Legend of Zelda, Tomb Raider
66
Rhythm
Guitar Hero, Dance Dance Revolution, Lumines
61
Give a 95% confidence interval for the proportion who play games in each of these six genres. 8.83 Too many errors. Refer to the previous exercise. The chance that each of the six intervals that you calculated includes the true proportion for that genre is approximately 95%. In other words, the chance that your interval misses the true value is approximately 5%. (a) Explain why the chance that at least one of your intervals does not contain the true value of the parameter is greater than 5%. (b) One way to deal with this problem is to adjust the confidence level for each interval so that the overall probability of at least one miss is 5%. One simple way to do this is to use a Bonferroni procedure. Here is the basic idea: You have an error budget of 5% and you choose to spend it equally on six intervals. Each interval has a budget of 0.05兾6 0.008. So, each confidence interval should have a 0.8% chance of missing the true value. In other words, the confidence level for each interval should be 1 0.008 0.992. Use Table A to find the value of z* for a large-sample confidence interval for a single proportion corresponding to 99.2% confidence. (c) Calculate the six confidence intervals using the Bonferroni procedure. 8.84 Changes in credit card usage by undergraduates. In Exercise 8.31 (page 506) we looked at data from a survey of 1430 undergraduate students and their credit card use. In the sample, 43% said that they had four or more credit cards. A similar study performed four years earlier by the same organization reported that 32% of the sample said that they had four or more credit cards.32 Assume that the sample sizes for the two studies are the same. Find a 95% confidence interval for the change in the percent of undergraduates who report having four or more credit cards.
8.85 Do the significance test for the change. Refer to the previous exercise. Perform the significance test for comparing the two proportions. Report your test statistic, the P-value, and summarize your conclusion. 8.86 We did not know the sample size. Refer to the previous two exercises. We did not report the sample size for the earlier study, but it is reasonable to assume that it is close to the sample size for the later study. (a) Suppose that the sample size for the earlier study was only 800. Redo the confidence interval and significance test calculations for this scenario. (b) Suppose that the sample size for the earlier study was 2500. Redo the confidence interval and significance test calculations for this scenario. (c) Compare your results for parts (a) and (b) of this exercise with the results that you found in the previous two exercises. Write a short paragraph about the effects of assuming a value for the sample size on your conclusions. 8.87 Student employment during the school year. A study of 1530 undergraduate students reported that 1006 work 10 or more hours a week during the school year. Give a 95% confidence interval for the proportion of all undergraduate students who work 10 or more hours a week during the school year. 8.88 Examine the effect of the sample size. Refer to the previous exercise. Assume a variety of different scenarios where the sample size changes, but the proportion in the sample who work 10 or more hours a week during the school year remains the same. Write a short report summarizing your results and conclusions. Be sure to include numerical and graphical summaries of what you have found. 8.89 Gender and soft drink consumption. Refer to Exercise 8.24 (page 505). This survey found that 16% of the 2006 New Zealanders surveyed reported that they consumed five or more servings of soft drinks per week. The corresponding percents for men and women were 17% and 15%, respectively. Assuming that the numbers of men and women in the survey are approximately equal, do the data suggest that the proportions vary by gender? Explain your methods, assumptions, results, and conclusions. 8.90 Examine the effect of the sample size. Refer to the previous exercise. Assume the following values for the total sample size: 1000, 4000, 10,000. Also assume that the sample proportions do not change. For each of these scenarios, redo the calculations that you performed in the previous exercise. Write a short paragraph summarizing the effect of the sample size on the results.
Chapter 8 Exercises 8.91 Gallup Poll study. Go to the Gallup Poll website gallup.com and find a poll that has several questions of interest to you. Summarize the results of the poll giving margins of error and comparisons of interest. (For this exercise, you may assume that the data come from an SRS.) 8.92 More on gender bias in textbooks. Refer to the study of gender bias and stereotyping described in Exercise 8.71 (page 524). Here are the counts of “girl,” “woman,” “boy,” and “man” for all the syntax texts studied. The one we analyzed in Exercise 8.71 was number 6. GENDERB Text Number 1
2
3
4
5
6
7
8
9
10
Girl
2
5
25
11
2
48
Woman
3
2
31
65
1
12
38
5
48
13
2
13
24
Boy
7
18
14
19
12
5
52
70
6
128
32
Man
27
45
51
138
31
80
2
27
48
95
For each text perform the significance test to compare the proportions of juvenile references for females and males. Summarize the results of the significance tests for the 10 texts studied. The researchers who conducted the study note that the authors of the last 3 texts are women, while the other 7 texts were written by men. Do you see any pattern that suggests that the gender of the author is associated with the results? 8.93 Even more on gender bias in textbooks. Refer to the previous exercise. Let us now combine the categories “girl” with “woman” and “boy” with “man.” For each text calculate the proportion of male references and test the hypothesis that male and female references are equally likely (that is, the proportion of male references is equal to 0.5). Summarize the results of your 10 tests. Is there a pattern that suggests a relation with the gender of the author? 8.94 Changing majors during college. In a random sample of 975 students from a large public university, it was found that 463 of the students changed majors during their college years. (a) Give a 95% confidence interval for the proportion of students at this university who change majors. (b) Express your results from (a) in terms of the percent of students who change majors. (c) University officials concerned with counseling students are interested in the number of students who change majors rather than the proportion. The university has 37,500 undergraduate students. Convert the confidence interval you found in (a) to a confidence interval for the number of students who change majors during their college years.
527
8.95 Sample size and the P-value. In this exercise we examine the effect of the sample size on the significance test for comparing two proportions. In each case suppose that pˆ 1 0.55 and pˆ 2 0.45, and take n to be the common value of n1 and n2. Use the z statistic to test H0: p1 p2 versus the alternative Ha: p1 ⬆ p2. Compute the statistic and the associated P-value for the following values of n: 60, 70, 80, 100, 400, 500, and 1000. Summarize the results in a table. Explain what you observe about the effect of the sample size on statistical significance when the sample proportions pˆ 1 and pˆ 2 are unchanged. 8.96 Sample size and the margin of error. In Section 8.1, we studied the effect of the sample size on the margin of error of the confidence interval for a single proportion. In this exercise we perform some calculations to observe this effect for the two-sample problem. Suppose that pˆ 1 0.8 and pˆ 2 0.6, and n represents the common value of n1 and n2. Compute the 95% margins of error for the difference between the two proportions for n 5 60, 70, 80, 100, 400, 500, and 1000. Present the results in a table and with a graph. Write a short summary of your findings. 8.97 Calculating sample sizes for the two-sample problem. For a single proportion, the margin of error of a confidence interval is largest for any given sample size n and confidence level C when pˆ 0.5. This led us to use p* 0.5 for planning purposes. The same kind of result is true for the two-sample problem. The margin of error of the confidence interval for the difference between two proportions is largest when pˆ 1 pˆ 2 0.5. You are planning a survey and will calculate a 95% confidence interval for the difference between two proportions when the data are collected. You would like the margin of error of the interval to be less than or equal to 0.06. You will use the same sample size n for both populations. (a) How large a value of n is needed? (b) Give a general formula for n in terms of the desired margin of error m and the critical value z*. 8.98 A corporate liability trial. A major court case on the health effects of drinking contaminated water took place in the town of Woburn, Massachusetts. A town well in Woburn was contaminated by industrial chemicals. During the period that residents drank water from this well, there were 16 birth defects among 414 births. In years when the contaminated well was shut off and water was supplied from other wells, there were 3 birth defects among 228 births. The plaintiffs suing the firm responsible for the contamination claimed that these data show that the rate of birth defects was higher when the contaminated well was in use.33 How statistically significant is the evidence? What assumptions does your analysis require? Do these assumptions seem reasonable in this case?
528
CHAPTER 8
•
Inference for Proportions
8.99 Statistics and the law. Castaneda v. Partida is an important court case in which statistical methods were used as part of a legal argument.34 When reviewing this case, the Supreme Court used the phrase “two or three standard deviations” as a criterion for statistical significance. This Supreme Court review has served as the basis for many subsequent applications of statistical methods in legal settings. (The two or three standard deviations referred to by the Court are values of the z statistic and correspond to P-values of approximately 0.05 and 0.0026.) In Castaneda the plaintiffs alleged that the method for selecting juries in a county in Texas was biased against Mexican Americans. For the period of time at issue, there were 181,535 persons eligible for jury duty, of whom 143,611 were Mexican Americans. Of the 870 people selected for jury duty, 339 were Mexican Americans. (a) What proportion of eligible jurors were Mexican Americans? Let this value be p0. (b) Let p be the probability that a randomly selected juror is a Mexican American. The null hypothesis to be tested is H0: p p0. Find the value of pˆ for this problem, compute the z statistic, and find the P-value. What do you conclude? (A finding of statistical significance in this circumstance does not constitute proof of discrimination. It can be used, however, to establish a prima facie case. The burden of proof then shifts to the defense.) (c) We can reformulate this exercise as a two-sample problem. Here we wish to compare the proportion of Mexican Americans among those selected as jurors with the proportion of Mexican Americans among those not selected as jurors. Let p1 be the probability that a randomly selected juror is a Mexican American, and let p2 be the probability that a randomly selected nonjuror is a Mexican American. Find the z statistic and its P-value. How do your answers compare with your results in part (b)? 8.100 Home court advantage. In many sports there is a home field or home court advantage. This means that the home team is more likely to win when playing at
home than when playing at an opponent’s field or court, all other things being equal. Go to the website of your favorite sports team and find the proportion of wins for home games and the proportion of wins for away games. Now consider these games to be a random sample of the process that generates wins and losses. A complete analysis of data like these requires methods that are beyond what we have studied, but the methods discussed in this chapter will give us a reasonable approximation. Examine the home court advantage for your team and write a summary of your results. Be sure to comment on the effect of the sample size. 8.101 Attitudes toward student loan debt. The National Student Loan Survey asked the student loan borrowers in their sample about attitudes toward debt.35 Below are some of the questions they asked, with the percent who responded in a particular way. Assume that the sample size is 1280 for all these questions. Compute a 95% confidence interval for each of the questions, and write a short report about what student loan borrowers think about their debt. (a) “Do you feel burdened by your student loan payments?” 55.5% said they felt burdened. (b) “If you could begin again, taking into account your current experience, what would you borrow?” 54.4% said they would borrow less. (c) “Since leaving school, my education loans have not caused me more financial hardship than I had anticipated at the time I took out the loans.” 34.3% disagreed. (d) “Making loan payments is unpleasant, but I know that the benefits of education loans are worth it.” 58.9% agreed. (e) “I am satisfied that the education I invested in with my student loan(s) was worth the investment for career opportunities.” 58.9% agreed. (f) “I am satisfied that the education I invested in with my student loan(s) was worth the investment for personal growth.” 71.5% agreed.
Analysis of Two-Way Tables Introduction We continue our study of methods for analyzing categorical data in this chapter. Inference about proportions in one-sample and two-sample settings was the focus of Chapter 8. We now study how to compare two or more populations when the response variable has two or more categories and how to test whether two categorical variables are independent. A single statistical test handles both of these cases. The first section of this chapter gives the basics of statistical inference that are appropriate in this setting. A goodness-of-fit test is presented in the second section. The methods in this chapter answer questions such as
CHAPTER
9
9.1 Inference for Two-Way Tables 9.2 Goodness of Fit
• Are men and women equally likely to suffer lingering fear symptoms after watching scary movies like Jaws and Poltergeist at a young age? • Is there an association between texting while driving and automobile accidents? • Does political preference predict whether a person makes contributions online?
529
530
CHAPTER 9
•
Analysis of Two-Way Tables
9.1 Inference for Two-Way Tables When you complete this section, you will be able to • Translate a problem from a comparison of two proportions to an analysis of a 2 3 2 table. • Find the joint distribution, the marginal distributions, and the conditional distributions for a two-way table of counts. • Identify the joint distribution, the marginal distributions, and the conditional distributions for a two-way table from software output. • Distinguish between settings where the goal is to describe a relationship between an explanatory variable and a response variable or to just explain the relationship between two categorical variables. If there are explanatory and response variables, identify them. • Choose appropriate conditional distributions to describe relationships in a two-way table. • Compute expected counts from the counts in a two-way table. • Compute the chi-square statistic and the P-value from the expected counts in a two-way table. Use the P-value to draw your conclusion. • For a 2 3 2 table, explain the relationship between the chi-square test and the z test for comparing two proportions. • Distinguish between two models for two-way tables.
When we studied inference for two proportions in Chapter 8, we started summarizing the raw data by giving the number of observations in each population (n) and how many of these were classified as “successes” 1X2.
EXAMPLE 9.1 Are you spending more time on Facebook? In Example 8.10 (page 510), we compared the proportions of women and men who said that they increased the amount of time that they spent on Facebook during the past year. The following table summarizes the data used in this comparison: Population
n
X
1 (women)
292
47
pˆ ⫽ X兾n 0.1610
2 (men)
233
21
0.0901
Total
525
68
0.1295
DATA
These data suggest that the percent of women who increased the amount of time spent on Facebook is 7.1% larger than the percent of men, with a 95% margin of error of 5.6%. FACE
LOOK BACK two-way table, p. 139
In this chapter we consider a different summary of the data. Rather than recording just the count of those who spent more time on Facebook during the past year, we record counts of all the outcomes in a two-way table.
C
9.1 Inference for Two-Way Tables
531
EXAMPLE DATA FACE
9.2 Two-way table for time spent on Facebook. Here is the two-way table classifying Facebook users by gender and whether or not they increased the amount of time that they spent on Facebook during the past year: Two-way table for time spent on Facebook Gender Increased
CHALLENGE
r 3 c table
Women
Men
Total
Yes
47
21
68
No
245
212
457
Total
292
233
525
We use the term r ⫻ c table to describe a two-way table of counts with r rows and c columns. The two categorical variables in the 2 ⫻ 2 table of Example 9.2 are “Increased” and “Gender.” “Increased” is the row variable, with values “Yes” and “No,” and “Gender” is the column variable, with values “Men” and “Women.” Since the objective in this example is to compare the genders, we view “Gender” as an explanatory variable, and therefore, we make it the column variable. The next example presents another two-way table.
EXAMPLE DATA FRIGHT
9.3 Lingering symptoms from frightening movies. There is a growing body of literature demonstrating that early exposure to frightening movies is associated with lingering fright symptoms. As part of a class on media effects, college students were asked to write narrative accounts of their exposure to frightening movies before the age of 13. More than one-fourth of the respondents said that some of the fright symptoms were still present in waking life.1 The following table breaks down these results by gender: Observed numbers of students
CHALLENGE
Gender Ongoing fright symptoms
Female
Male
Total
No
50
31
Yes
29
7
36
Total
79
38
117
81
The two categorical variables in Example 9.3 are “Ongoing fright symptoms,” with values “Yes” and “No,” and “Gender,” with values “Female” and “Male.” Again we view “Gender” as an explanatory variable and “Ongoing fright symptoms” as a categorical response variable. In Chapter 2 we discussed two-way tables and the basics about joint, marginal, and conditional distributions. We now view those sample distributions as estimates of the corresponding population distributions. Let’s look at some software output that gives these distributions.
532
CHAPTER 9
•
Analysis of Two-Way Tables
EXAMPLE
FIGURE 9.1 Computer output CHALLENGE
for Examples 9.3 and 9.4.
JMP
Contingency Analysis of Gender By Fright
Weight: Count Mosaic Plot Contingency Table
Gender Count Total % Col % Row %
Fright
DATA FRIGHT
9.4 Software output for ongoing fright symptoms. Figure 9.1 shows the output from JMP, Minitab, and SPSS for the fright symptoms data of Example 9.3. For now, we will just concentrate on the different distributions. Later, we will explore other parts of the output.
Female
Male
No
50 42.74 63.29 61.73
31 26.50 81.58 38.27
81 69.23
Yes
29 24.79 36.71 80.56
7 5.98 18.42 19.44
36 30.77
79 67.52
38 32.48
117
Tests N
DF
-LogLike
R Square (U)
117
1
2.1303364
0.0289
Test Likelihood Ratio Pearson Fisher’s Exact Test Left Right 2-Tail
Prob 0.0341* 0.9887 0.0550
ChiSquare
Prob>ChiSq
4.261 4.028
0.0390* 0.0447*
Alternative Hypothesis Prob(Gender=Male) is greater for Fright=No than Yes Prob(Gender=Male) is greater for Fright=Yes than No Prob(Gender=Male) is different across Fright
(a) JMP
9.1 Inference for Two-Way Tables
533
Minitab
Rows: Fright
Columns: Gender
Female
Male
All
No
50 61.73 63.29 42.74 54.69
31 38.27 81.58 26.50 26.31
81 100.00 69.23 69.23 81.00
Yes
29 80.56 36.71 24.79 24.31
7 19.44 18.42 5.98 11.69
36 100.00 30.77 30.77 36.00
All
79 67.52 100.00 67.52 79.00
38 32.48 100.00 32.48 38.00
117 100.00 100.00 100.00 117.00
Cell Contents:
Count % of Row % of Column % of Total Expected count
Pearson Chi-Square = 4.028, DF = 1, P-Value = 0.045 Likelihood Ratio Chi-Square = 4.261, DF = 1, P-Value = 0.039
Clear highlighted area
(b) Minitab
Continued
The three packages use similar displays for the distributions. In the cells of the 2 ⫻ 2 table we find the counts, the conditional distributions of the column variable for each value of the row variable, the conditional distributions of the row variable for each value of the column variable, and the joint distribution. All of these are expressed as percents rather than proportions. Let’s look at the entries in the upper-left cell of the JMP output. We see that there are 50 females whose response is “No” to the fright symptoms question. These 50 represent 42.74% of the study participants. They represent 63.29% of the females in the study. And they represent 61.73% of the people who responded “No” to the fright symptoms question. The marginal distributions are in the rightmost column and the bottom row. Minitab and SPSS give the same information but not necessarily in the same order.
LOOK BACK conditional distributions, p. 144
In Chapter 2, we learned that the key to examining the relationship between two categorical variables is to look at conditional distributions. Let’s do that for the fright symptoms data.
534
CHAPTER 9
•
Analysis of Two-Way Tables
FIGURE 9.1 (Continued )
*Output1 - IBM SPSS Statistics Viewer Crosstabs Fright * Gender Crosstabulation Gender Female Male Fright
Total
No
Count Expected Count % within Fright % within Gender % of Total
50 54.7 61.7% 63.3% 42.7%
31 26.3 38.3% 81.6% 26.5%
81 81.0 100.0% 69.2% 69.2%
Yes
Count Expected Count % within Fright % within Gender % of Total
29 24.3 80.6% 36.7% 24.8%
7 11.7 19.4% 18.4% 6.0%
36 36.0 100.0% 30.8% 30.8%
Count Expected Count % within Fright % within Gender % of Total
79 79.0 67.5% 100.0% 67.5%
38 38.0 32.5% 100.0% 32.5%
117 117.0 100.0% 100.0% 100.0%
Fright
Chi-Square Tests
Value Pearson Chi-Square Continuity Correctionb Likelihood Ratio Fisher’s Exact Test N of Valid Cases
4.028a 3.216 4.261
df 1 1 1
Asymp. Sig. (2-sided)
Exact Sig. (2-sided)
Exact Sig. (1-sided)
.045 .073 .039 .055
117
.034
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 11.69. b. Computer only for a 2x2 table
IBM SPSS Statistics Processor is ready
H: 115, W: 508 pt
(c) SPSS
EXAMPLE DATA FRIGHT
9.5 Two-way table of ongoing fright symptoms and gender. To compare the frequency of lingering fright symptoms across genders, we examine column percents. Here they are, rounded from the output in Figure 9.1 for clarity: Column percents for gender Gender Ongoing fright symptoms Yes
CHALLENGE
No Total
Male
Female
18%
37%
82%
63%
100%
100%
9.1 Inference for Two-Way Tables
535
The “Total” row reminds us that 100% of the male and female students have been classified as having ongoing fright symptoms or not. (The sums sometimes differ slightly from 100% because of roundoff error.) The bar graph in Figure 9.2 compares the percents. The data reveal a clear relationship: 37% of the women have ongoing fright symptoms, as opposed to only 18% of the men.
FIGURE 9.2 Bar graph of the
40 Percent with symptoms
percents of male and female students with ongoing fright symptoms.
30 20 10 0
Female Male Gender
The difference between the percents of students with lingering fears is reasonably large. A statistical test will tell us whether or not this difference can be plausibly attributed to chance. Specifically, if there is no association between gender and having ongoing fright symptoms, how likely is it that a sample would show a difference as large or larger than that displayed in Figure 9.2? In the remainder of this section we discuss the significance test to examine this question.
USE YOUR KNOWLEDGE DATA
9.1 Find two conditional distributions. Use the output in Figure 9.3 (page 536) to answer the following questions. FACE
(a) Find the conditional distribution of increased Facebook time for females. (b) Do the same for males. (c) Graphically display the two conditional distributions.
CHALLENGEDATA
(d) Write a short summary interpreting the two conditional distributions. 9.2 Condition on Facebook time. Refer to Exercise 9.1 (page 530). Use the output in Figure 9.3 (page 536) to answer the following questions. FACE
(a) Find the conditional distribution of gender for those who have increased their Facebook time in the past year. (b) Do the same for those who did not increase their Facebook time.
536
CHAPTER 9
•
Analysis of Two-Way Tables (c) Graphically display the two conditional distributions. (d) Write a short summary interpreting the two conditional distributions. 9.3 Which conditional distributions should you use? Refer to your answers to the two previous exercises. Which of these distributions do you prefer for interpreting these data? Give reasons for your answer.
FIGURE 9.3 Computer output for Exercises 9.1 to 9.3.
Minitab
Rows: Gender
Columns: Increased No
Yes
All
Men
212 90.99 46.39 40.38 202.8
21 9.01 30.88 4.00 30.2
233 100.00 44.38 44.38 233.0
Women
245 83.90 53.61 46.67 254.2
47 16.10 69.12 8.95 37.8
292 100.00 55.62 55.62 292.0
457 87.05 100.00 87.05 457.0
68 12.95 100.00 12.95 68.0
525 100.00 100.00 100.00 525.0
All
Cell Contents:
Count % of Row % of Column % of Total Expected count
Pearson Chi-Square = 5.766, DF = 1, P-Value = 0.016 Likelihood Ratio Chi-Square = 5.939, DF = 1, P-Value = 0.015
Jump to previous command in Session window
The hypothesis: no association The null hypothesis H0 of interest in a two-way table is “There is no association between the row variable and the column variable.” In Example 9.3, this null hypothesis says that gender and having ongoing fright symptoms are not related. The alternative hypothesis Ha is that there is an association between these two variables. The alternative Ha does not specify any particular direction for the association. For two-way tables in general, the alternative includes many different possibilities. Because it includes all sorts of possible associations, we cannot describe Ha as either one-sided or two-sided. In our example, the hypothesis H0 that there is no association between gender and having ongoing fright symptoms is equivalent to the statement that the variables “ongoing fright symptoms” and “gender” are independent.
9.1 Inference for Two-Way Tables
537
For other two-way tables, where the columns correspond to independent samples from c distinct populations, there are c distributions for the row variable, one for each population. The null hypothesis then says that the c distributions of the row variable are identical. The alternative hypothesis is that the distributions are not all the same.
Expected cell counts expected cell counts
To test the null hypothesis in r ⫻ c tables, we compare the observed cell counts with expected cell counts calculated under the assumption that the null hypothesis is true. A numerical summary of the comparison will be our test statistic.
EXAMPLE DATA FRIGHT
CHALLENGE
9.6 Expected counts from software. The observed and expected counts for the ongoing fright symptoms example appear in the Minitab and SPSS computer outputs shown in Figure 9.1 (pages 532–534). The expected counts are given as the last entry in each cell for Minitab and as the second entry in each cell for SPSS. For example, in the cell for males with fright symptoms, the observed count is 7 and the expected count is 11.69 (Minitab) or 11.7 (SPSS). How is this expected count obtained? Look at the percents in the right margin of the tables in Figure 9.1. We see that 30.77% of all students had ongoing fright symptoms. If the null hypothesis of no relation between gender and ongoing fright is true, we expect this overall percent to apply to both men and women. In particular, we expect 30.77% of the men to have lingering fright symptoms. Since there are 38 men, the expected count is 30.77% of 38, or 11.69. The other expected counts are calculated in the same way. The reasoning of Example 9.6 leads to a simple formula for calculating expected cell counts. To compute the expected count of men with ongoing fright symptoms, we multiplied the proportion of students with fright symptoms (36/117) by the number of men (38). From Figure 9.1 we see that the numbers 36 and 38 are the row and column totals for the cell of interest and that 117 is n, the total number of observations for the table. The expected cell count is therefore the product of the row and column totals divided by the table total.
EXPECTED CELL COUNTS expected count ⫽
row total ⫻ column total n
The chi-square test To test the H0 that there is no association between the row and column classifications, we use a statistic that compares the entire set of observed counts with the set of expected counts. To compute this statistic, • First, take the difference between each observed count and its corresponding expected count, and square these values so that they are all 0 or positive.
538
CHAPTER 9
•
Analysis of Two-Way Tables
LOOK BACK standardizing, p. 61
• Since a large difference means less if it comes from a cell that is expected to have a large count, divide each squared difference by the expected count. This is a type of standardization. • Finally, sum over all cells. The result is called the chi-square statistic X 2. The chi-square statistic was proposed by the English statistician Karl Pearson (1857–1936) in 1900. It is the oldest inference procedure still used in its original form.
CHI-SQUARE STATISTIC The chi-square statistic is a measure of how much the observed cell counts in a two-way table diverge from the expected cell counts. The formula for the statistic is X2 ⫽ a
1observed count ⫺ expected count2 2 expected count
where “observed” represents an observed cell count, “expected” represents the expected count for the same cell, and the sum is over all r ⫻ c cells in the table.
chi-square distribution x2
If the expected counts and the observed counts are very different, a large value of X 2 will result. Large values of X 2 provide evidence against the null hypothesis. To obtain a P-value for the test, we need the sampling distribution of X 2 under the assumption that H0 (no association between the row and column variables) is true. The distribution is called the chi-square distribution, which we denote by x 2 1x is the lowercase Greek letter chi). Like the t distributions, the x 2 distributions form a family described by a single parameter, the degrees of freedom. We use x 2 1df2 to indicate a particular member of this family. Figure 9.4 displays the density curves of the x 2 122 and x 2 142 distributions. As you can see in the figure, x 2 distributions take only positive values and are skewed to the right. Table F in the back of the book gives upper critical values for the x 2 distributions.
0
0 (a)
(b) 2
2
FIGURE 9.4 (a) The x (2) density curve. (b) The x (4) density curve.
9.1 Inference for Two-Way Tables
539
CHI-SQUARE TEST FOR TWO-WAY TABLES The null hypothesis H0 is that there is no association between the row and column variables in a two-way table. The alternative hypothesis is that these variables are related. If H0 is true, the chi-square statistic X 2 has approximately a x 2 distribution with 1r ⫺ 12 1c ⫺ 12 degrees of freedom. The P-value for the chi-square test is
P1x 2 ⱖ X 2 2 0
X2
where x 2 is a random variable having the x 2 1df2 distribution with df ⫽ 1r ⫺ 12 1c ⫺ 12. For tables larger than 2 ⫻ 2, we will use this approximation whenever the average of the expected counts is 5 or more and the smallest expected count is 1 or more. For 2 ⫻ 2 tables, we require all four expected cell counts to be 5 or more.2
The chi-square test always uses the upper tail of the x 2 distribution, because any deviation from the null hypothesis makes the statistic larger. The approximation of the distribution of X 2 by x 2 becomes more accurate as the cell counts increase. Moreover, it is more accurate for tables larger than 2 ⫻ 2 tables.
EXAMPLE DATA FRIGHT
9.7 Chi-square significance test from software. The results of the chisquare significance test for the ongoing fright symptoms example appear in the computer outputs in Figures 9.1 (pages 532–534), labeled Pearson or Pearson Chi-Square. Because all the expected cell counts are moderately large (5 or more), the x 2 distribution provides an accurate P-value. We see that X 2 ⫽ 4.03, df ⫽ 1, and P ⫽ 0.045. As a check we verify that the degrees of freedom are correct for a 2 ⫻ 2 table: df ⫽ 1r ⫺ 12 1c ⫺ 12 ⫽ 12 ⫺ 12 12 ⫺ 12 ⫽ 1
CHALLENGE
The chi-square test confirms that the data provide evidence against the null hypothesis that there is no relationship between gender and ongoing fright symptoms. Under H0, the chance of obtaining a value of X 2 greater than or equal to the calculated value of 4.03 is small, 0.045—fewer than 5 times in 100. The test does not provide insight into the nature of the relationship between the variables. It is up to us to see that the data show that women are more likely to have lingering fright symptoms. You should always accompany a chi-square test by percents such as those in Example 9.5 and Figure 9.2 and by a description of the nature of the relationship.
540
CHAPTER 9
•
Analysis of Two-Way Tables
LOOK BACK confounding, p. 173
The observational study of Example 9.3 cannot tell us whether gender is a cause of lingering fright symptoms. The association may be explained by confounding with other variables. For example, other research has shown that there are gender differences in the social desirability of admitting fear.3 Our data don’t allow us to investigate possible confounding variables. Often a randomized comparative experiment can settle the issue of causation, but we cannot randomly assign gender to each student. The researcher who published the data of our example states merely that women are more likely to report lingering fright symptoms and that this conclusion is consistent with other studies.
Computations The calculations required to analyze a two-way table are straightforward but tedious. In practice, we recommend using software, but it is possible to do the work with a calculator, and some insight can be gained by examining the details. Here is an outline of the steps required.
COMPUTATIONS FOR TWO-WAY TABLES 1. Calculate descriptive statistics that convey the important information in the table. Usually these will be column or row percents. 2. Find the expected counts and use these to compute the X 2 statistic. 3. Use chi-square critical values from Table F to find the approximate P-value. 4. Draw a conclusion about the association between the row and column variables.
The following examples illustrate these steps.
EXAMPLE DATA HEALTH
9.8 Health habits of college students. Physical activity generally declines when students leave high school and enroll in college. This suggests that college is an ideal setting to promote physical activity. One study examined the level of physical activity and other health-related behaviors in a sample of 1184 college students.4 Let’s look at the data for physical activity and consumption of fruits. We categorize physical activity as low, moderate, or vigorous and fruit consumption as low, medium, or high. Here is the two-way table that summarizes the data:
CHALLENGE
Physical activity Fruit consumption
Low
Moderate
Vigorous
Total
Low
69
206
294
569
Medium
25
126
170
321
High
14
111
169
294
Total
108
443
633
1184
9.1 Inference for Two-Way Tables
541
The table in Example 9.8 is a 3 3 3 table, to which we have added the marginal totals obtained by summing across rows and columns. For example, the first-row total is 69 1 206 1 294 5 569. The grand total, the number of students in the study, can be computed by summing the row totals (569 1 321 1 294 5 1184) or the column totals (108 1 443 1 633 5 1184). It is easy to make an error in these calculations, so it is a good idea to do both as a check on your arithmetic.
Computing conditional distributions First, we summarize the observed relation between physical activity and fruit consumption. We expect a positive association, but there is no clear distinction between an explanatory variable and a response variable in this setting. If we have such a distinction, then the clearest way to describe the relationship is to compare the conditional distributions of the response variable for each value of the explanatory variable. Otherwise, we can compute the conditional distribution each way and then decide which gives a better description of the data.
EXAMPLE 9.9 Health habits of college students: conditional distributions. Let’s look at the data in the first column of the table in Example 9.8. There were 108 students with low physical activity. Of these, there were 69 with low fruit consumption. Therefore, the column proportion for this cell is 69 ⫽ 0.639 108
DATA
That is, 63.9% of the low physical activity students had low fruit consumption. Similarly, 25 of the low physical activity students has moderate fruit consumption. This percent is 23.1%. 25 ⫽ 0.231 108
HEALTH
In all, we calculate nine percents. Here are the results: Column percents for fruit consumption and physical activity Physical activity CHALLENGE
Fruit consumption
Low
Moderate
Vigorous
Total
Low
63.9
46.5
46.4
48.1
Medium
23.1
28.4
26.9
27.1
High
13.0
25.1
26.7
24.8
Total
100.0
100.0
100.0
100.0
In addition to the conditional distributions of fruit consumption for each level of physical activity, the table also gives the marginal distribution of fruit consumption. These percents appear in the rightmost column, labeled “Total.”
Analysis of Two-Way Tables PhysAct = Low
PhysAct = Moderate
PhysAct = Vigorous
70
70
70
60
60
60
50 40 30 20 10 0
Percent of students
•
Percent of students
CHAPTER 9
Percent of students
542
50 40 30 20 10
Low Medium High Fruit
0
50 40 30 20 10
Low Medium High Fruit
0
Low Medium High Fruit
FIGURE 9.5 Comparison of the distribution of fruit consumption for different levels of physical activity, for Example 9.9.
The sum of the percents in each column should be 100, except for possible small roundoff errors. It is good practice to calculate each percent separately and then sum each column as a check. In this way we can find arithmetic errors that would not be uncovered if, for example, we calculated the column percent for the “High” row by subtracting the sum of the percents for “Low” and “Medium” from 100. Figure 9.5 compares the distributions of fruit consumption for each of the three physical activity levels. For each activity level, the highest percent is for students who consume low amounts of fruit. For low physical activity, there is a clear decrease in the percent when moving from low to medium to high fruit consumption. The patterns for moderate physical activity and vigorous physical activity are similar. Low fruit consumption is still dominant, but the percents for medium and high fruit consumption are about the same for the moderate and vigorous activity levels. The percent of low fruit consumption is highest for the low physical activity students compared with those who have moderate or vigorous physical activity. These plots suggest that there is an association between these two variables.
USE YOUR KNOWLEDGE DATA HEALTH
DATA HEALTH
9.4 Examine the row percents. Refer to the health habits data that we examined in Example 9.8 (page 540). For the row percents, make a table similar to the one in Example 9.9 (page 541). 9.5 Make some plots. Refer to the previous exercise. Make plots of the row percents similar to those in Figure 9.5. 9.6 Compare the conditional distributions. Compare the plots you made in the previous exercise with those given in Figure 9.5. Which set of plots
CHALLENG
9.1 Inference for Two-Way Tables
DATA HEALTH
543
do you think gives a better graphical summary of the relationship between these two categorical variables? Give reasons for your answer. Note that there is not a clear right or wrong answer for this exercise. You need to make a choice and to explain your reasons for making it.
CHALLENGE
We observe a clear relationship between physical activity and fruit consumption in this study. The chi-square test assesses whether this observed association is statistically significant, that is, too strong to occur often just by chance. The test confirms only that there is some relationship. The percents we have compared describe the nature of the relationship. The chi-square test does not in itself tell us what population our conclusion describes. The subjects in this study were college students from four midwestern universities. The researchers could argue that these findings apply to college students in general. This type of inference is important, but it is based on expert judgment and is beyond the scope of the statistical inference that we have been studying.
EXAMPLE DATA HEALTH
9.10 The chi-square significance test for health habits of college students. The first step in performing the significance test is to calculate the expected cell counts. Let’s start with the cell for students with low fruit consumption and low physical activity. Using the formula on page 537, we need three quantities: (1) the corresponding row total, 569, the number of students who have low fruit consumption, (2) the column total, 108, the number of students who have low physical activity, and (3) the total number of students, 1184. The expected cell count is therefore
CHALLENGE
11082 15692 ⫽ 51.90 1184 Note that although any observed count of the number of students must be a whole number, an expected count need not be. Calculations for the other eight cells in the 3 ⫻ 3 table are performed in the same way. With these nine expected counts we are now ready to use the formula for the X 2 statistic on page 538. The first term in the sum comes from the cell for students with low fruit consumption and low physical activity. The observed count is 69 and the expected count is 51.90. Therefore, the contribution to the X 2 statistic for this cell is 169 ⫺ 51.902 2 ⫽ 5.63 51.90 When we add the terms for each of the nine cells, the result is X 2 ⫽ 14.15 Because there are r ⫽ 3 levels of fruit consumption and c ⫽ 3 levels of physical activity, the degrees of freedom for this statistic are df ⫽ 1r ⫺ 12 1c ⫺ 12 ⫽ 13 ⫺ 12 13 ⫺ 12 ⫽ 4
544
CHAPTER 9
df 5 4 p 0.01 2 x 13.28
•
Analysis of Two-Way Tables
0.005 14.86
Under the null hypothesis that fruit consumption and physical activity are independent, the test statistic X 2 has a x 2 142 distribution. To obtain the P-value, look at the df 5 4 row in Table F. The calculated value X 2 ⫽ 14.15 lies between the critical points for probabilities 0.01 and 0.005. The P-value is therefore between 0.01 and 0.005. (Software gives the value as 0.0068.) There is strong evidence 1X 2 ⫽ 14.15, df ⫽ 4, P ⬍ 0.012 that there is a relationship between fruit consumption and physical activity.
We can check our work by adding the expected counts to obtain the row and column totals, as in the table. These are the same as those in the table of observed counts except for small roundoff errors.
USE YOUR KNOWLEDGE DATA HEALTH
DATA HEALTH
9.7 Find the expected counts. Refer to Example 9.10. Compute the expected counts and display them in a 3 ⫻ 3 table. Check your work by adding the expected counts to obtain row and column totals. These should be the same as those in the table of observed counts except for small roundoff errors. 9.8 Find the X 2 statistic. Refer to the previous exercise. Use the formula on page 538 to compute the contributions to the chi-square statistic for each cell in the table. Verify that their sum is 14.15.
CHALLENGE
9.9 Find the P-value. For each of the following give the degrees of freedom and an appropriate bound on the P-value for the X 2 statistic. (a) X 2 ⫽ 19.00 for a 5 ⫻ 4 table (b) X 2 ⫽ 19.00 for a 4 ⫻ 5 table
CHALLENGE
(c) X 2 ⫽ 7.50 for a 2 ⫻ 2 table (d) X 2 ⫽ 1.60 for a 2 ⫻ 2 table
DATA FACE
9.10 Time spent on Facebook: the chi-square test. Refer to Example 9.2 (page 531). Use the chi-square test to assess the relationship between gender and increased amount of time spent on Facebook in the last year. State your conclusion.
The chi-square test and the z test CHALLENGE
A comparison of the proportions of “successes” in two populations leads to a 2 ⫻ 2 table. We can compare two population proportions either by the chisquare test or by the two-sample z test from Section 8.2. In fact, these tests always give exactly the same result, because the X 2 statistic is equal to the square of the z statistic, and x 2 112 critical values are equal to the squares of the corresponding N10, 12 critical values. The advantage of the z test is that we can test either one-sided or two-sided alternatives. The chi-square test always tests the two-sided alternative. Of course, the chi-square test can compare more than two populations, whereas the z test compares only two.
9.1 Inference for Two-Way Tables
545
USE YOUR KNOWLEDGE DATA
9.11 Comparison of conditional distributions. Consider the following 2 ⫻ 2 table.
COMP
Observed counts Explanatory variable Response variable
1
2
Total
CHALLENGE
Yes
75
95
170
No
135
115
250
Total
210
210
420
(a) Compute the conditional distribution of the response variable for each of the two explanatory-variable categories. (b) Display the distributions graphically. (c) Write a short paragraph describing the two distributions and how they differ.
DATA COMP
9.12 Expected cell counts and the chi-square test. Refer to Exercise 9.11. You decide to use the chi-square test to compare these two conditional distributions. (a) What is the expected count for the first cell (observed count is 75)? (b) Computer software gives you X 2 ⫽ 3.95. What are the degrees of freedom for this statistic? (c) Using Table F, give an appropriate bound on the P-value.
CHALLENGE DATA
COMP
9.13 Compare the chi-square test with the z test. Refer to the previous two exercises and the significance test for comparing two proportions (page 517). (a) Set up the problem as a comparison between two proportions. Describe the population proportions, state the null and alternative hypotheses, and give the sample proportions. (b) Carry out the significance test to compare the two proportions. Report the z statistic, the P-value, and your conclusion.
CHALLENGE
(c) Compare the P-value for this significance test with the one that you reported in the previous exercise. (d) Verify that the square of the z statistic is the X 2 statistic given in the previous exercise.
Models for two-way tables The chi-square test for the presence of a relationship between the two variables in a two-way table is valid for data produced from several different study designs. The precise statement of the null hypothesis of “no relationship” in terms of population parameters is different for different designs. We now
546
CHAPTER 9
•
Analysis of Two-Way Tables describe two of these settings in detail. An essential requirement is that each experimental unit or subject is counted only once in the data table. Comparing several populations: the first model Let’s think about the setting of Example 9.8 from a slightly different perspective. Suppose that we are interested in the relationship between physical activity and year of study in college. We will assume that the design called for independent SRSs of students from each of the four years. Here we have an example of separate and independent random samples from each of c populations. The c columns of the two-way table represent the populations. There is a single categorical response variable, physical activity. The r rows of the table correspond to the values of the response variable, physical activity. We know that the z test for comparing the two proportions of successes and the chi-square test for the 2 ⫻ 2 table are equivalent. The r ⫻ c table allows us to compare more than two populations or more than two categories of response, or both. In this setting, the null hypothesis “no relationship between column variable and row variable” becomes H0 : The distribution of the response variable is the same in all c populations. ˇ
ˇ
Because the response variable is categorical, its distribution just consists of the probabilities of its r possible values. The null hypothesis says that these probabilities (or population proportions) are the same in all c populations.
EXAMPLE 9.11 Physical activity: comparing subpopulations based on year of study. In our scenario based on Example 9.8, we compare four populations: Population 1: first-year students Population 2: second-year students Population 3: third-year students Population 4: fourth-year students The null hypothesis for the chi-square test is H0 : The distribution of physical activity is the same in all four populations. ˇ
ˇ
The alternative hypothesis for the chi-square test is Ha : The distribution of physical activity is not the same in all four populations. ˇ
ˇ
The parameters of the model are the proportions of low, moderate, and vigorous physical activity in each of the four years of study.
More generally, if we take an independent SRS from each of c populations and classify each outcome into one of r categories, we have an r ⫻ c table of population proportions. There are c different sets of proportions to be compared. There are c groups of subjects, and a single categorical variable with r possible values is measured for each individual.
9.1 Inference for Two-Way Tables
547
MODEL FOR COMPARING SEVERAL POPULATIONS USING TWO-WAY TABLES Select independent SRSs from each of c populations, of sizes n1, n2, p , nc. Classify each individual in a sample according to a categorical response variable with r possible values. There are c different probability distributions, one for each population. The null hypothesis is that the distributions of the response variable are the same in all c populations. The alternative hypothesis says that these c distributions are not all the same.
Testing independence: the second model A second model for which our analysis of r ⫻ c tables is valid is illustrated by the ongoing fright symptoms study, Example 9.3. There, a single sample from a single population was classified according to two categorical variables.
EXAMPLE 9.12 Ongoing fright symptoms and gender: testing independence. The single population studied is college students. Each college student was classified according to the following categorical variables: “Ongoing fright symptoms,” with possible responses “Yes” and “No,” and “Gender,” with possible responses “Men” and “Women.” The null hypothesis for the chi-square test is H0 : “Ongoing fright symptoms” and “Gender” are independent. ˇ
LOOK BACK multiplication rule, p. 283
LOOK BACK joint distribution, p. 141
LOOK BACK marginal distributions, p. 142
ˇ
The parameters of the model are the probabilities for each of the four possible combinations of values of the row and column variables. If the null hypothesis is true, the multiplication rule for independent events says that these can be found as the products of outcome probabilities for each variable alone.
More generally, take an SRS from a single population and record the values of two categorical variables, one with r possible values and the other with c possible values. The data are summarized by recording the number of individuals for each possible combination of outcomes for the two random variables. This gives an r ⫻ c table of counts. Each of these r ⫻ c possible outcomes has its own probability. The probabilities give the joint distribution of the two categorical variables. Each of the two categorical random variables has a distribution. These are the marginal distributions because they are the sums of the population proportions in the rows and columns. The null hypothesis “no relationship” now states that the row and column variables are independent. The multiplication rule for independent events tells us that the joint probabilities are the products of the marginal probabilities.
548
CHAPTER 9
•
Analysis of Two-Way Tables
EXAMPLE 9.13 The joint distribution and the two marginal distributions. The joint probability distribution gives a probability for each of the four cells in our 2 ⫻ 2 table of “Ongoing fright symptoms” and “Gender.” The marginal distribution for “Ongoing fright symptoms” gives probabilities for each of the two possible categories; the marginal distribution for “Gender” gives probabilities for each of the two possible gender categories. Independence between “Ongoing fright symptoms” and “Gender” implies that the joint distribution can be obtained by multiplying the appropriate terms from the two marginal distributions. For example, the probability that a randomly chosen college student has ongoing fright symptoms and is male is equal to the probability that the student has ongoing symptoms times the probability that the student is male. The hypothesis that “Ongoing fright symptoms” and “Gender” are independent says that the multiplication rule applies to all outcomes.
MODEL FOR EXAMINING INDEPENDENCE IN TWO-WAY TABLES Select an SRS of size n from a population. Measure two categorical variables for each individual. The null hypothesis is that the row and column variables are independent. The alternative hypothesis is that the row and column variables are dependent. You can distinguish between the two models by examining the design of the study. In the independence model, there is a single sample. The column totals and row totals are random variables. The total sample size n is set by the researcher; the column and row sums are known only after the data are collected. For the comparison-of-populations model, on the other hand, there is a sample from each of two or more populations. The column sums are the sample sizes selected at the design phase of the research. The null hypothesis in both models says that there is no relationship between the column variable and the row variable. The precise statement of the hypothesis differs, depending on the sampling design. Fortunately, the test of the hypothesis of “no relationship” is the same for both models; it is the chisquare test. There are yet other statistical models for two-way tables that justify the chi-square test of the null hypothesis “no relation,” made precise in ways suitable for these models. Statistical methods related to the chi-square test also allow the analysis of three-way and higher-way tables of count data. You can find a discussion of these topics in advanced texts on categorical data.5 BEYOND THE BASICS
Meta-analysis Policymakers wanting to make decisions based on research are sometimes faced with the problem of summarizing the results of many studies. These studies may show effects of different magnitudes, some highly significant
9.1 Inference for Two-Way Tables
meta-analysis
549
and some not significant. What overall conclusion can we draw? Meta-analysis is a collection of statistical techniques designed to combine information from different but similar studies. Each individual study must be examined with care to ensure that its design and data quality are adequate. The basic idea is to compute a measure of the effect of interest for each study. These are then combined, usually by taking some sort of weighted average, to produce a summary measure for all of the studies. Of course, a confidence interval for the summary is included in the results. Here is an example.
EXAMPLE 9.14 Do we eat too much salt? Evidence from a variety of sources suggests that diets high in salt are associated with risks to human health. To investigate the relationship between salt intake and stroke, information from 14 studies was combined in a meta-analysis.6 Subjects were classified based on the amount of salt in their normal diet. They were followed for several years and then classified according to whether or not they had developed cardiovascular disease (CVD). A total of 104,933 subjects were studied, and 5161 of them developed CVD. Here are the data from one of the studies:7 Low salt CVD
High salt
88
112
No CVD
1081
1134
Total
1169
1246
LOOK BACK relative risk, p. 520
Let’s look at the relative risk for this study. We first find the proportion of subjects who developed CVD in each group. For the subjects with a low salt intake the proportion who developed CVD is 88 ⫽ 0.0753 1169 or 75 per thousand; for the high-salt group, the proportion is 112 ⫽ 0.0899 1246 or 90 per thousand. We can now compute the relative risk as the ratio of these two proportions. We choose to put the high-salt group in the numerator. The relative risk is 0.0899 ⫽ 1.19 0.0753 Relative risk greater than 1 means that the high-salt group developed more CVD than the low-salt group. When the data from all 14 studies were combined, the relative risk was reported as 1.17 with a 95% confidence interval of (1.02, 1.32). Since this interval does not include the value 1, corresponding to equal proportions in
550
CHAPTER 9
•
Analysis of Two-Way Tables the two groups, we conclude that the higher CVD rates are not the same for the two diets 1P ⬍ 0.052. The high-salt diet is associated with a 17% higher rate of CVD than the low-salt diet. USE YOUR KNOWLEDGE 9.14 A different view of the relative risk. In the previous example, we computed the relative risk for the high-salt group relative to the lowsalt group. Now, compute the relative risk for the low-salt group relative to the high-salt group by inverting the relative risk reported for the meta analysis in Example 9.14, that is, compute 1/1.17. Then restate the last paragraph of the exercise with this change. (Hint: For the lower confidence limit, use 1 divided by the upper limit for the original ratio and do a similar calculation for the upper limit.)
SECTION 9.1 Summary The null hypothesis for r ⫻ c tables of count data is that there is no relationship between the row variable and the column variable. Expected cell counts under the null hypothesis are computed using the formula expected count ⫽
row total ⫻ column total n
The null hypothesis is tested by the chi-square statistic, which compares the observed counts with the expected counts: X2 ⫽ a
1observed ⫺ expected2 2 expected
Under the null hypothesis, X 2 has approximately the x 2 distribution with 1r ⫺ 12 1c ⫺ 12 degrees of freedom. The P-value for the test is P1x 2 ⱖ X 2 2 where x 2 is a random variable having the x 2 1df2 distribution with df ⫽ 1r ⫺ 12 1c ⫺ 12. The chi-square approximation is adequate for practical use when the average expected cell count is 5 or greater and all individual expected counts are 1 or greater, except in the case of 2 ⫻ 2 tables. All four expected counts in a 2 ⫻ 2 table should be 5 or greater. For two-way tables we first compute percents or proportions that describe the relationship of interest. Then, we compute expected counts, the X 2 statistic, and the P-value. Two different models for generating r ⫻ c tables lead to the chi-square test. In the first model, independent SRSs are drawn from each of c populations, and each observation is classified according to a categorical variable with r possible values. The null hypothesis is that the distributions of the row categorical variable are the same for all c populations. In the second model, a single SRS is drawn from a population, and observations are classified according to two categorical variables having r and c possible values. In this model, H0 states that the row and column variables are independent.
9.2 Goodness of Fit
551
9.2 Goodness of Fit When you complete this section, you will be able to • Compute expected counts given a sample size and the probabilities specified by a null hypothesis for a chi-square goodness-of-fit test. • Find the chi-square test statistic and its P-value. • Interpret the results of a chi-square goodness-of-fit significance test.
In the last section, we discussed the use of the chi-square test to compare categorical-variable distributions of c populations. We now consider a slight variation on this scenario where we compare a sample from one population with a hypothesized distribution. Here is an example that illustrates the basic ideas.
EXAMPLE DATA
9.15 Sampling in the Adequate Calcium Today (ACT) study. The ACT ACT
CHALLENGE
study was designed to examine relationships among bone growth patterns, bone development, and calcium intake. Participants were over 14,000 adolescents from six states: Arizona (AZ), California (CA), Hawaii (HI), Indiana (IN), Nevada (NV), and Ohio (OH). After the major goals of the study were completed, the investigators decided to do an additional analysis of the written comments made by the participants during the study. Because the number of participants was so large, a sampling plan was devised to select sheets containing the written comments of approximately 10% of the participants. A systematic sample (see page 204) of every tenth comment sheet was retrieved from each storage container for analysis.8 Here are the counts for each of the six states: Number of study participants in the sample AZ
CA
HI
IN
NV
OH
Total
167
257
257
297
107
482
1567
There were 1567 study participants in the sample. We will use the proportions of students from each of the states in the original sample of over 15,000 participants as the population values.9 Here are the proportions:
AZ
CA
0.105
0.172
Population proportions HI IN NV 0.164
0.188
0.070
OH
Total
0.301
100.000
Let’s see how well our sample reflects the state population proportions. We start by computing expected counts. Since 10.5% of the population is from Arizona, we expect the sample to have about 10.5% from Arizona. Therefore, since the sample has 1567 subjects, our expected count for Arizona is expected count for Arizona ⫽ 0.105115672 ⫽ 164.535
552
CHAPTER 9
•
Analysis of Two-Way Tables Here are the expected counts for all six states: Expected counts AZ
CA
HI
IN
NV
OH
Total
164.54
269.52
256.99
294.60
109.69
471.67
1567.01
USE YOUR KNOWLEDGE
ACT
9.16 Calculate the expected counts. Refer to Example 9.15. Find the expected counts for the other five states. Report your results with three places after the decimal as we did for Arizona.
DATA ACT
9.15 Why is the sum 1567.01? Refer to the table of expected counts in Example 9.15. Explain why the sum of the expected counts is 1567.01 and not 1567.
DATA CHALLENGE CHALLENGE
As we saw with the expected counts in the analysis of two-way tables in Section 9.1, we do not really expect the observed counts to be exactly equal to the expected counts. Different samples under the same conditions would give different counts. We expect the average of these counts to be equal to the expected counts when the null hypothesis is true. How close do we think the counts and the expected counts should be? We can think of our table of observed counts in Example 9.15 as a one-way table with six cells, each with a count of the number of subjects sampled from a particular state. Our question of interest is translated into a null hypothesis that says that the observed proportions of students in the six states can be viewed as random samples from the subjects in the ACT study. The alternative hypothesis is that the process generating the observed counts, a form of systematic sampling in this case, does not provide samples that are compatible with this hypothesis. In other words, the alternative hypothesis says that there is some bias in the way that we selected the subjects whose comments we will examine. Our analysis of these data is very similar to the analyses of two-way tables that we studied in Section 9.1. We have already computed the expected counts. We now construct a chi-square statistic that measures how far the observed counts are from the expected counts. Here is a summary of the procedure:
THE CHI-SQUARE GOODNESS-OF-FIT TEST Data for n observations of a categorical variable with k possible outcomes are summarized as observed counts, n1, n2 , p , nk, in k cells. The null hypothesis specifies probabilities p1, p2, p , pk for the possible outcomes. The alternative hypothesis says that the true probabilities of the possible outcomes are not the probabilities specified in the null hypothesis. For each cell, multiply the total number of observations n by the specified probability to determine the expected counts: expected count ⫽ npi
9.2 Goodness of Fit
553
The chi-square statistic measures how much the observed cell counts differ from the expected cell counts. The formula for the statistic is X2 ⫽ a
1observed count ⫺ expected count2 2 expected count
The degrees of freedom are k ⫺ 1, and P-values are computed from the chi-square distribution. Use this procedure when the expected counts are all 5 or more.
EXAMPLE DATA ACT
9.16 The goodness-of-fit test for the ACT study. For Arizona, the observed count is 167. In Example 9.15, we calculated the expected count, 164.535. The contribution to the chi-square statistic for Arizona is 1observed count ⫺ expected count2 2 expected count
⫽
1167 ⫺ 164.5352 2 ⫽ 0.0369 164.535
CHALLENGE
We use the same approach to find the contributions to the chi-square statistic for the other five states. The expected counts are all at least 5, so we can proceed with the significance test. The sum of these six values is the chi-square statistic, X 2 ⫽ 0.93 The degrees of freedom are the number of cells minus 1: df ⫽ 6 ⫺ 1 ⫽ 5. We calculate the P-value using Table F or software. From Table F, we can determine P ⬎ 0.25. We conclude that the observed counts are compatible with the hypothesized proportions. The data do not provide any evidence that our systematic sample was biased with respect to selection of subjects from different states.
USE YOUR KNOWLEDGE DATA ACT
9.17 Compute the chi-square statistic. For each of the other five states, compute the contribution to the chi-square statistic using the method illustrated for Arizona in Example 9.16. Use the expected counts that you calculated in Exercise 9.16 for these calculations. Show that the sum of these values is the chi-square statistic.
EXAMPLE CHALLENGE DATA
9.17 The goodness-of-fit test from software. Software output from ACT
Minitab and SPSS for this problem is given in Figure 9.6. Both report the P-value as 0.968. Note that the SPSS output includes a column titled “Residual.” For tables of counts, a residual for a cell is defined as residual ⫽
observed count ⫺ expected count 1expected count
Note that the chi-square statistic is the sum of the squares of these residuals. CHALL
554
CHAPTER 9
•
Analysis of Two-Way Tables
FIGURE 9.6 (a) Minitab and (b) SPSS output for Example 9.17.
Minitab
Chi-Square Goodness-of-Fit Test for Observed Counts in Variable: Count Using category names in State
Category AZ CA HI IN NV OH
N 1567
DF 5
Observed
Test Proportion
Expected
Contribution to Chi-Sq
167 257 257 297 107 482
0.105 0.172 0.164 0.188 0.070 0.301
164.535 269.524 256.988 294.596 109.690 471.667
0.036930 0.581954 0.000001 0.019617 0.065969 0.226369
Chi-Sq 0.930840
P-Value 0.968
Open a Minitab project file
(a) Minitab
*Output1 - IBM SPSS Statistics Viewer
Chi-Square Test Frequencies Label Observed N
Expected N
Residual
1
167
164.5
2.5
2
257
269.5
-12.5
3
257
257.0
.0
4
297
294.6
2.4
5
107
109.7
-2.7
6
482
471.7
10.3
Total
1567 Test Statistics Label Chi-Square df Asymp. Sig.
.931a 5 .968
IBM SPSS Statistics Processor is ready
(b) SPSS
H: 22, W: 430 pt
9.2 Goodness of Fit
555
Some software packages do not provide routines for computing the chisquare goodness-of-fit test. However, there is a very simple trick that can be used to produce the results from software that can analyze two-way tables. Make a two-way table in which the first column contains k cells with the observed counts. Add a second column with counts that correspond exactly to the probabilities specified by the null hypothesis, with a very large number of observations. Then perform the chi-square significance test for twoway tables.
USE YOUR KNOWLEDGE DATA MM
9.18 Distribution of M&M colors. M&M Mars Company has varied the mix of colors for M&M’S Plain Chocolate Candies over the years. These changes in color blends are the result of consumer preference tests. Most recently, the color distribution is reported to be 13% brown, 14% yellow, 13% red, 20% orange, 24% blue, and 16% green.10 You open up a 14-ounce bag of M&M’S and find 61 brown, 59 yellow, 49 red, 77 orange, 141 blue, and 88 green. Use a goodnessof-fit test to examine how well this bag fits the percents stated by the M&M Mars Company.
CHALLENGE
EXAMPLE 9.18 The sign test as a goodness-of-fit test. In Example 7.12 (page 439) we used a sign test to examine the effect of the full moon on aggressive behaviors of dementia patients. The study included 15 patients, 14 of whom exhibited a greater number of aggressive behaviors on moon days than on other days. The sign test tests the null hypothesis that patients are equally likely to exhibit more aggressive behaviors on moon days than on other days. Since n ⫽ 15, the sample proportion is pˆ ⫽ 14兾15 and the null hypothesis is H0 : p ⫽ 0.5. To look at these data from the viewpoint of goodness of fit, we think of the data as two counts: patients who had a greater number of aggressive behaviors on moon days and patients who had a greater number of aggressive behaviors on other days. ˇ
ˇ
Counts Moon
Other
Total
14
1
15
If the two outcomes are equally likely, the expected counts are both 7.5 115 ⫻ 0.52. The expected counts are both greater than 5, so we can proceed with the significance test.
556
CHAPTER 9
•
Analysis of Two-Way Tables The test statistic is X2 ⫽
114 ⫺ 7.52 2
⫹
7.5 ⫽ 5.633 ⫹ 5.633 ⫽ 11.27
11 ⫺ 7.52 2 7.5
We have k ⫽ 2, so the degrees of freedom are 1. From Table F we conclude that P ⬍ 0.001.
In Example 7.12, we tested the null hypothesis versus the one-sided alternative that there was a “moon effect.” Within the framework of the goodnessof-fit test, we test only the general alternative hypothesis that the distribution of the counts does not follow the specified probabilities. Note that the P-value in Example 7.12 was calculated using the binomial distribution. The value was 0.000488, approximately one-half of the value that we reported from Table F in Example 9.18.
USE YOUR KNOWLEDGE 9.19 Is the coin fair? In Example 4.3 (page 234) we learned that the South African statistician John Kerrich tossed a coin 10,000 times while imprisoned by the Germans during World War II. The coin came up heads 5067 times. (a) Formulate the question about whether or not the coin was fair as a goodness-of-fit hypothesis. (b) Perform the chi-square significance test and write a short summary of the results.
SECTION 9.2 Summary The chi-square goodness-of-fit test is used to compare the sample distribution of a categorical variable from a population with a hypothesized distribution. The data for n observations with k possible outcomes are summarized as observed counts, n1, n2, p , nk, in k cells. The null hypothesis specifies probabilities p1, p2, p , pk for the possible outcomes. The analysis of these data is similar to the analyses of two-way tables discussed in Section 9.1. For each cell, the expected count is determined by multiplying the total number of observations n by the specified probability pi. The null hypothesis is tested by the usual chi-square statistic, which compares the observed counts, ni, with the expected counts. Under the null hypothesis, X 2 has approximately the x 2 distribution with df 5 k 2 1.
Chapter 9 Exercises
557
CHAPTER 9 Exercises (b) Samples of first-year students and fourth-year students were asked if they were in favor of a new proposed core curriculum. Among the first-year students, 85 said “Yes” and 276 said “No.” For the fourth-year students, 117 said “Yes” and 104 said “No.”
For Exercises 9.1 to 9.3, see pages 535–536; for Exercises 9.4 to 9.6, see pages 542–543; for Exercises 9.7 to 9.10, see page 544; for Exercises 9.11 to 9.13, see page 545; for Exercise 9.14, see page 550; for Exercises 9.15 and 9.16, see page 552; for Exercise 9.17, see page 553; for Exercise 9.18, see page 555; and for Exercise 9.19, see page 556.
9.21 Find the joint distribution, the marginal distributions, and the conditional distributions. Refer to the previous exercise. For each scenario, identify the joint distribution, the marginal distributions, and the conditional distributions.
9.20 Translate each problem into a 2 3 2 table. In each of the following scenarios, translate the problem into one that can be analyzed using a 2 ⫻ 2 table. (a) Two website designs are being compared. Fifty students have agreed to be subjects for the study, and they are randomly assigned to watch one of the designs for as long as they like. For each student the study directors record whether or not the website is watched for more than a minute. For the first design, 12 students watched for more than a minute; for the second, 5 watched for more than a minute.
FIGURE 9.7 Computer output for Exercise 9.22.
9.22 Read the output. Exercise 8.58 (page 523) gives data on individuals rejected for military service in the Cuban War of Independence in 1898 because they did not have enough teeth. In that exercise you compared the rejection rate for those under the age of 20 with the rejection rate for those over 40. Figure 9.7 gives software output for the table that classifies the recruits into six age
Minitab
Tabulated statistics: Reject, Age Using frequencies in Count Rows:
Reject
Columns: Age
15 to 20 20 to 25
40 to 60
All
No
58884 18.17 99.88 17.613 57136 53.5
77992 24.07 99.18 23.328 76216 41.4
55597 17.16 98.04 16.630 54964 7.3
43994 13.58 96.11 13.159 44367 3.1
47569 14.68 94.28 14.229 48902 36.3
39985 12.34 91.32 11.960 42437 141.7
324021 100.00 96.92 96.919 324021 *
Yes
68 0.66 0.12 0.020 1816 1682.8
647 6.28 0.82 0.194 2423 1301.5
1114 10.82 1.96 0.333 1747 229.5
1783 17.31 3.89 0.533 1410 98.5
2887 28.03 5.72 0.864 1554 1142.2
3801 36.90 8.68 1.137 1349 4456.9
10300 100.00 3.08 3.081 10300 *
All
58952 17.63 100.00 17.633 58952 *
56711 16.96 100.00 13.693 56711 *
45777 13.69 100.00 13.693 45777 *
Cell Contents:
78639 23.52 100.00 23.522 78639 *
25 to 30 30 to 35 35 to 40
50456 15.09 100.00 13.693 45777 *
Count % of Row % of Column % of Total Expected count Contribution to Chi-square
Pearson Chi-Square = 9194.724, DF = 5, P-Value = 0.000
Welcome to Minitab, press F1 for help.
43786 13.10 100.00 13.097 43786 *
334321 100.00 100.00 100.000 334321 *
558
CHAPTER 9
•
Analysis of Two-Way Tables
categories. Use the output to find the joint distribution, the marginal distributions, and the conditional distributions for these data. TEETH 9.23 Relationship or explanatory and response variables? In each of the following scenarios, determine whether the goal is to describe the relationship between an explanatory variable and a response variable or to simply describe the relationship between two categorical variables. There may not always be a clear correct answer, but you need to give reasons for the answer you choose. If there are explanatory and response variables, identify them. (a) A large sample of undergraduates is classified by major and year of study. (b) Equal-sized samples of first-year, second-year, thirdyear, and fourth-year undergraduates are selected. Each student is asked “Do you eat five or more servings of fruits or vegetables per day?” (c) Television programs are classified as low, medium, or high for violence content and by morning, afternoon, prime time, or late night for the time of day that they are broadcast.
whether or not they were harassed online. Here are the data for the girls: HARASG Harassed online Harassed in person
Yes
No
Yes No
321 40
200 441
(a) Analyze these data using the method presented in Chapter 8 for comparing two proportions (page 508). (b) Analyze these data using the method presented in this chapter for examining a relationship between two categorical variables in a 2 ⫻ 2 table. (c) Use this example to explain the relationship between the chi-square test and the z test for comparing two proportions. (d) The number of girls reported in this exercise is not the same as the number reported for Exercise 9.25. Suggest a possible reason for this difference. 9.28 Data for the boys. Refer to the previous exercise. Here are the corresponding data for boys: HARASB
(d) The setting of Exercise 9.22, which examines age and rejection rate for military recruits. 9.24 Choose the appropriate conditional distributions. Refer to the previous exercise. For each scenario, choose which conditional distribution you would use to describe the data. Give reasons for your answers. 9.25 Sexual harassment in middle and high schools. A nationally representative survey of students in grades 7 to 12 asked about the experience of these students with respect to sexual harassment.11 One question asked how many times the student had witnessed sexual harassment in school. Here are the data categorized by gender: HARAS1
Girls Boys
Harassed in person
Yes
No
Yes No
183 48
154 578
Using these data, repeat the analyses that you performed for the girls in Exercise 9.27. How do the results for the boys differ from those that you found for girls? 9.29 Repeat your analysis. In part (a) of Exercise 9.27, you had to decide which variable was explanatory and which variable was response when you computed the proportions to be compared.
Never
Once
More than once
(a) Did you use harassed online or harassed in person as the explanatory variable? Explain the reasons for your choice.
140 106
192 125
671 732
(b) Repeat the analysis that you performed in Exercise 9.27 with the other choice for the explanatory variable.
Times witnessed Gender
Harassed online
Find the expected counts for this 2 ⫻ 3 table. 9.26 Do the significance test. Refer to the previous exercise. Compute the chi-square statistic and the P-value. Write a short summary of your conclusions from the analysis of these data. HARAS1 9.27 Sexual harassment online or in person. In the study described in Exercise 9.25, the students were also asked whether or not they were harassed in person and
(c) Summarize what you have learned from comparing the results of using the different choices for analyzing these data. 9.30 Which model? Refer to the four scenarios in Exercise 9.23. For each, determine whether the model corresponds to the comparison of several populations or to the test of independence. Give reasons for your answers. 9.31 Is the die fair? You suspect that a die has been altered so that the outcomes of a roll, the numbers 1 to 6,
Chapter 9 Exercises are not equally likely. You toss the die 600 times and obtain the following results: DIE Outcome
1
2
3
4
5
6
Count
89
82
123
115
100
91
(c) Is it meaningful to interpret the marginal totals or percents for this table? Explain your answer. (d) Analyze the data in your two-way table and summarize the results.
Compute the expected counts that you would need to use in a goodness-of-fit test for these data. 9.32 Perform the significance test. Refer to the previous exercise. Find the chi-square test statistic and its P-value and write a short summary of your conclusions. 9.33 The value of online courses. A Pew Internet survey asked college presidents whether or not they believed that online courses offer an equal educational value when compared with courses taken in the classroom. The presidents were classified by the type of educational institution. Here are the data:12 ONLINE
9.36 Remote deposit capture. The Federal Reserve has called remote deposit capture (RDC) “the most important development the [U.S.] banking industry has seen in years.” This service allows users to scan checks and to transmit the scanned images to a bank for posting.13 In its annual survey of community banks, the American Bankers Association asked banks whether or not they offered this service.14 Here are the results classified by the asset size (in millions of dollars) of the bank: RDCA Offer RDC Asset size
Institution type 4-year private
4-year public
2-year private
For profit
Yes
36
50
66
54
No
62
48
34
45
Response
559
(a) Discuss different ways to plot the data. Choose one way to make a plot and give reasons for your choice. (b) Make the plot and describe what it shows. 9.34 Do the answers depend upon institution type? Refer to the previous exercise. You want to examine whether or not the data provide evidence that the belief that online and classroom courses offer equal educational value varies with the type of institution of the president. ONLINE (a) Formulate this question in terms of appropriate null and alternative hypotheses. (b) Perform the significance test. Report the test statistic, the degrees of freedom, and the P-value. (c) Write a short summary explaining the results. 9.35 Compare the college presidents with the general public. Refer to Exercise 9.33. Another Pew Internet survey asked the general public about their opinions on the value of online courses. Of the 2142 people who participated in the survey, 621 responded “Yes” to the question “Do you believe that online courses offer an equal educational value when compared with courses taken in the classroom?” ONLINE (a) Use the data given in Exercise 9.33 to find the number of college presidents who responded “Yes” to the question. (b) Construct a two-way table that you can use to compare the responses of the general public with the responses of the college presidents.
Yes
No
Under $100
63
309
$101–$200
59
132
112
85
$201 or more
(a) Summarize the results of this survey question numerically and graphically. (b) Test the null hypothesis that there is no association between the size of a bank, measured by assets, and whether or not they offer RDC. Report the test statistic, the P-value, and your conclusion. 9.37 Health care fraud. Most errors in billing insurance providers for health care services involve honest mistakes by patients, physicians, or others involved in the health care system. However, fraud is a serious problem. The National Health Care Anti-fraud Association estimates that approximately $68 billion is lost to health care fraud each year.15 When fraud is suspected, an audit of randomly selected billings is often conducted. The selected claims are then reviewed by experts, and each claim is classified as allowed or not allowed. The distributions of the amounts of claims are frequently highly skewed, with a large number of small claims and a small number of large claims. Since simple random sampling would likely be overwhelmed by small claims and would tend to miss the large claims, stratification is often used. See the section on stratified sampling in Chapter 3 (page 196). Here are data from an audit that used three strata based on the sizes of the claims (small, medium, and large):16 BILLER Stratum
Sampled claims
Number not allowed
Small
57
6
Medium
17
5
5
1
Large
560
CHAPTER 9
•
Analysis of Two-Way Tables
(a) Construct the 3 ⫻ 2 table of counts for these data that includes the marginal totals. (b) Find the percent of claims that were not allowed in each of the three strata. (c) To perform a significance test, combine the medium and large strata. Explain why we do this. (d) State an appropriate null hypothesis to be tested for these data. (e) Perform the significance test and report your test statistic with degrees of freedom and the P-value. State your conclusion. 9.38 Population estimates. Refer to the previous exercise. One reason to do an audit such as this is to estimate the number of claims that would not be allowed if all claims in a population were examined by experts. We have an estimate of the proportion of unallowed claims from each stratum based on our sample. We know the corresponding population proportion for each stratum. Therefore, if we take the sample proportions of unallowed claims and multiply by the population sizes, we would have the estimates that we need. Here are the population sizes for the three strata: Stratum Small Medium Large
online homework) was introduced, and student support outside the classroom was increased. The following table gives data on the DFW rates for the course over three years.17 In Year 1, the traditional course was given; in Year 2, a few changes were introduced; and in Year 3, the course was substantially revised. Year
42.3%
2408
Year 2
24.9%
2325
Year 3
19.9%
2126
Do you think that the changes in this gateway course had an impact on the DFW rate? Write a report giving your answer to this question. Support your answer by an analysis of the data. 9.40 Lying to a teacher. One of the questions in a survey of high school students asked about lying to teachers.18 The following table gives the numbers of students who said that they lied to a teacher at least once during the past year, classified by gender. LIE Gender
3342 58
(a) For each stratum, estimate the total number of claims that would not be allowed if all claims in the strata had been audited. (b) Give margins of error for your estimates. (Hint: You first need to find standard errors for your sample estimates using material presented in Chapter 8 (page 490). Then you need to use the rules for variances from Chapter 4 (page 275) to find the standard errors for the population estimates. Finally, you need to multiply by z* to determine the margins of error.) 9.39 DFW rates. One measure of student success for colleges and universities is the percent of admitted students who graduate. Studies indicate that a key issue in retaining students is their performance in so-called gateway courses. These are courses that serve as prerequisites for other key courses that are essential for student success. One measure of student performance in these courses is the DFW rate, the percent of students who receive grades of D, F, or W (withdraw). A major project was undertaken to improve the DFW rate in a gateway course at a large midwestern university. The course curriculum was revised to make it more relevant to the majors of the students taking the course, a small group of excellent teachers taught the course, technology (including clickers and
Number of students taking course
Year 1
Claims in strata 246
DFW rate
Lied at least once
Male
Female
Yes
3,228
10,295
No
9,659
4,620
(a) Add the marginal totals to the table. (b) Calculate appropriate percents to describe the results of this question. (c) Summarize your findings in a short paragraph. (d) Test the null hypothesis that there is no association between gender and lying to teachers. Give the test statistic and the P-value (with a sketch similar to the one on page 539) and summarize your conclusion. Be sure to include numerical and graphical summaries. 9.41 When do Canadian students enter private career colleges? A survey of 13,364 Canadian students who enrolled in private career colleges was conducted to understand student participation in the private postsecondary educational system.19 In one part of the survey, students were asked about their field of study and about when they entered college. Here are the results: CANF Field of study
Number of students
Time of Entry Right after high school
Later
Trades Design Health Media/IT Service Other
942 584 5085 3148 1350 2255
34% 47% 40% 31% 36% 52%
66% 53% 60% 69% 64% 48%
Chapter 9 Exercises In this table, the second column gives the number of students in each field of study. The next two columns give the marginal distribution of time of entry for each field of study. (a) Use the data provided to make the 6 ⫻ 2 table of counts for this problem. (b) Analyze the data. (c) Write a summary of your conclusions. Be sure to include the results of your significance testing as well as a graphical summary. 9.42 Government loans for Canadian students in private career colleges. Refer to the previous exercise. The survey also asked about how these college students paid for their education. A major source of funding was government loans. Here are the survey percents of Canadian private students who use government loans to finance their education by field of study: CANGOV Field of study Trades Design Health Media/IT Service Other
Number of students
Percent using government loans
942 599 5234 3238 1378 2300
45% 53% 55% 55% 60% 47%
(a) Construct the 6 ⫻ 2 table of counts for this exercise. (b) Test the null hypothesis that the percent of students using government loans to finance their education does not vary with field of study. Be sure to provide all the details of your significance test.
Answer the questions in the previous exercise for these data. 9.44 Why not use a chi-square test? As part of the study on ongoing fright symptoms due to exposure to horror movies at a young age, the following table was created based on the written responses from 119 students. Explain why a chi-square test is not appropriate for this table. Percent of students who reported each problem Type of Problem Bedtime Movie or video Poltergeist 1n ⫽ 292 Jaws 1n ⫽ 232 Nightmare on Elm Street 1n ⫽ 162 Thriller (music video) 1n ⫽ 162 It 1n ⫽ 242 The Wizard of Oz 1n ⫽ 122 E.T. 1n ⫽ 112
9.43 Other funding for Canadian students in private career colleges. Refer to the previous exercise. Another major source of funding was parents, family, or spouse. The following table gives the survey percents of Canadian private students who rely on these sources to finance their education by field of study. CANOTH Field of study Trades Design Health Media/IT Service Other
Number of students
Percent using parents/family/spouse
942 599 5234 3238 1378 2300
20% 37% 26% 16% 18% 41%
Waking
Short term
Enduring
Short term
Enduring
68 39
7 4
64 83
32 43
69
13
37
31
40 64
0 0
27 64
7 50
75 55
17 0
50 64
8 27
9.45 Waking versus bedtime symptoms. As part of the study on ongoing fright symptoms due to exposure to horror movies at a young age, the following table was presented to describe the lasting impact these movies have had during bedtime and waking life: FRITIM
(c) Summarize your analysis and conclusions. Be sure to include a graphical summary. (d) The number of students reported in this exercise is not the same as the number reported in Exercise 9.41. Suggest a possible reason for this difference.
561
Waking symptoms Bedtime symptoms
Yes
No
Yes
36
33
No
33
17
(a) What percent of the students have lasting waking-life symptoms? (b) What percent of the students have both waking-life and bedtime symptoms? (c) Test whether there is an association between wakinglife and bedtime symptoms. State the null and alternative hypotheses, the X 2 statistic, and the P-value. 9.46 Construct a table with no association. Construct a 3 ⫻ 3 table of counts where there is no apparent association between the row and column variables. 9.47 Can you construct the joint distribution from the marginal distributions? Here are the row and
562
CHAPTER 9
•
Analysis of Two-Way Tables
column totals for a two-way table with two rows and two columns: a
b
150
c
d
150
100
200
300
Find two different sets of counts a, b, c, and d for the body of the table. This demonstrates that the relationship between two variables cannot be obtained solely from the two marginal distributions of the variables. 9.48 Which model? Refer to Exercises 9.37, 9.39, 9.40, 9.42, and 9.45. For each, state whether you are comparing two or more populations (the first model for two-way tables) or testing independence between two categorical variables (the second model). 9.49 Are Mexican Americans less likely to be selected as jurors? Refer to Exercise 8.99 (page 528) concerning Castaneda v. Partida, the case where the Supreme Court review used the phrase “two or three standard deviations” as a criterion for statistical significance. Recall that there were 181,535 persons eligible for jury duty, of whom 143,611 were Mexican Americans. Of the 870 people selected for jury duty, 339 were Mexican Americans. We are interested in finding out if there is an association between being a Mexican American and being selected as a juror. Formulate this problem using a two-way table of counts. Construct the 2 ⫻ 2 table using the variables Mexican American or not and juror or not. Find the X 2 statistic and its P-value. Square the z statistic that you obtained in Exercise 8.99 and verify that the result is equal to the X 2 statistic. 9.50 Goodness of fit to a standard Normal distribution. Computer software generated 500 random numbers that should look as if they are from the standard Normal distribution. They are categorized into five groups: (1) less than or equal to ⫺0.6; (2) greater than ⫺0.6 and less than or equal to ⫺0.1; (3) greater than ⫺0.1 and less than or equal to 0.1; (4) greater than 0.1 and less than or equal to 0.6; and (5) greater than 0.6. The counts in the five groups are 139, 102, 41, 78, and 140, respectively. Find the probabilities for these five intervals using Table A. Then compute the expected number for each interval for a sample of 500. Finally, perform the goodness-of-fit test and summarize your results. 9.51 More on the goodness of fit to a standard Normal distribution. Refer to the previous exercise. Use software to generate your own sample of 500 standard Normal random variables, and perform the goodness-offit test. Choose a different set of intervals than the ones used in the previous exercise.
9.52 Goodness of fit to the uniform distribution. Computer software generated 500 random numbers that should look as if they are from the uniform distribution on the interval 0 to 1 (see page 74). They are categorized into five groups: (1) less than or equal to 0.2; (2) greater than 0.2 and less than or equal to 0.4; (3) greater than 0.4 and less than or equal to 0.6; (4) greater than 0.6 and less than or equal to 0.8; and (5) greater than 0.8. The counts in the five groups are 114, 92, 108, 101, and 85, respectively. The probabilities for these five intervals are all the same. What is this probability? Compute the expected number for each interval for a sample of 500. Finally, perform the goodness-of-fit test and summarize your results. 9.53 More on goodness of fit to the uniform distribution. Refer to the previous exercise. Use software to generate your own sample of 800 uniform random variables on the interval from 0 to 1, and perform the goodness-offit test. Choose a different set of intervals than the ones used in the previous exercise. 9.54 Suspicious results? An instructor who assigned an exercise similar to the one described in the previous exercise received homework from a student who reported a P-value of 0.999. The instructor suspected that the student did not use the computer for the assignment but just made up some numbers for the homework. Why was the instructor suspicious? How would this scenario change if there were 2000 students in the class? 9.55 Is there a random distribution of trees? In Example 6.1 (page 352) we examined data concerning the longleaf pine trees in the Wade Tract and concluded that the distribution of trees in the tract was not random. Here is another way to examine the same question. First, we divide the tract into four equal parts, or quadrants, in the east–west direction. Call the four parts Q1 to Q4. Then we take a random sample of 100 trees and count the number of trees in each quadrant. Here are the data: TREEQ Quadrant
Q1
Q2
Q3
Q4
Count
18
22
39
21
(a) If the trees are randomly distributed, we expect to find 25 trees in each quadrant. Why? Explain your answer. (b) We do not really expect to get exactly 25 trees in each quadrant. Why? Explain your answer. (c) Perform the goodness-of-fit test for these data to determine if these trees are randomly scattered. Write a short report giving the details of your analysis and your conclusion.
Inference for Regression Introduction In this chapter we continue our study of relationships between variables and describe methods for inference when there is a single quantitative response variable and a single quantitative explanatory variable. The descriptive tools we learned in Chapter 2—scatterplots, least-squares regression, and correlation— are essential preliminaries to inference and also provide a foundation for confidence intervals and significance tests. We first met the sample mean x in Chapter 1 as a measure of the center of a collection of observations. Later we learned that when the data are a random sample from a population, the sample mean is an estimate of the population mean m. In Chapters 6 and 7, we used x as the basis for confidence intervals and significance tests for inference about m. Now we will follow the same approach for the problem of fitting straight lines to data. In Chapter 2 we met the least-squares regression line yˆ ⫽ b0 ⫹ b1x as a description of a straight-line relationship between a response variable y and an explanatory variable x. At that point we did not distinguish between sample and population. Now we will think of the least-squares line computed from a sample as an estimate of a true regression line for the population. Following the common practice of using Greek letters for population parameters, we will write the population line as b0 ⫹ b1x. This notation reminds us that the intercept of the fitted line b0 estimates the intercept of the population line b0, and the fitted slope b1 estimates the slope of the population line b1.
10
CHAPTER
10.1 Simple Linear Regression
10.2 More Detail about Simple Linear Regression
563
564
CHAPTER 10
•
Inference for Regression The methods detailed in this chapter will help us answer questions such as • Is the trend in the annual number of tornadoes reported in the United States approximately linear? If so, what is the average yearly increase in the number of tornadoes? How many are predicted for next year? • What is the relationship between a female college student’s body mass index and physical activity level measured by a pedometer? • Among North American universities, is there a strong negative correlation between the binge-drinking rate and the average price for a bottle of beer at establishments within a two-mile radius of campus?
10.1 Simple Linear Regression When you complete this section, you will be able to • Describe the simple linear regression model in terms of a population regression line and the deviations of the response variable y from this line. • Interpret linear regression output from statistical software to obtain the least-squares regression line and model standard deviation. • Distinguish the model deviations ei from the residuals ei that are obtained from a least-squares fit to a data set. • Use diagnostic plots to check the assumptions of the simple linear regression model. • Construct and interpret a level C confidence interval for the population intercept and for the population slope. • Perform a level a significance test for the population intercept and for the population slope. • Construct and interpret a level C confidence interval for a mean response and a level C prediction interval for a future observation when x 5 x*.
Statistical model for linear regression Simple linear regression studies the relationship between a response variable y and a single explanatory variable x. We expect that different values of x will produce different mean responses for y. We encountered a similar but simpler situation in Chapter 7 when we discussed methods for comparing two population means. Figure 10.1 illustrates the statistical model for a comparison of blood pressure change in two groups of experimental subjects, one group taking a calcium supplement and the other a placebo. We can think of the treatment (placebo or calcium) as the explanatory variable in this example. This model has two important parts: • The mean change in blood pressure may be different in the two populations. These means are labeled m1 and m2 in Figure 10.1. • Individual changes vary within each population according to a Normal distribution. The two Normal curves in Figure 10.1 describe these responses. These Normal distributions have the same spread, indicating that the population standard deviations are assumed to be equal.
10.1 Simple Linear Regression
565
FIGURE 10.1 The statistical model for comparing responses to two treatments; the mean response varies with the treatment.
ge
n ha c e ur ss m2
e pr d m1 o o
Bl
Calcium Placebo Treatment group
subpopulations
simple linear regression
In linear regression the explanatory variable x is quantitative and can have many different values. Imagine, for example, giving different amounts of calcium x to different groups of subjects. We can think of the values of x as defining different subpopulations, one for each possible value of x. Each subpopulation consists of all individuals in the population having the same value of x. If we conducted an experiment with five different amounts of calcium, we could view these values as defining five different subpopulations. The statistical model for simple linear regression also assumes that for each value of x, the observed values of the response variable y are Normally distributed with a mean that depends on x. We use my to represent these means. In general, the means my can change as x changes according to any sort of pattern. In simple linear regression we assume that the means all lie on a line when plotted against x. To summarize, this model also has two important parts: • The mean of the response variable y changes as x changes. The means all lie on a straight line. That is, my ⫽ b0 ⫹ b1x. • Individual responses y with the same x vary according to a Normal distribution. This variation, measured by the standard deviation s, is the same for all values of x.
population regression line
This statistical model is pictured in Figure 10.2. The line describes how the mean response my changes with x. This is the population regression line. The three Normal curves show how the response y will vary for three different values of the explanatory variable x.
FIGURE 10.2 The statistical model for linear regression; the mean response is a straight-line function of the explanatory variable.
y
my = b 0 + b 1x
x
566
CHAPTER 10
•
Inference for Regression
Data for simple linear regression The data for a linear regression are observed values of y and x. The model takes each x to be a known quantity. In practice, x may not be exactly known. If the error in measuring x is large, more advanced inference methods are needed. The response y for a given x is a random variable. The linear regression model describes the mean and standard deviation of this random variable y. These unknown parameters must be estimated from the data. We will use the following example to explain the fundamentals of simple linear regression. Because regression calculations in practice are always done by statistical software, we will rely on computer output for the arithmetic. In Section 10.2, we give an example that illustrates how to do the work with a calculator if software is unavailable.
EXAMPLE
DATA PABMI
10.1 Relationship between BMI and physical activity. Decrease in physical activity is considered to be a major contributor to the increase in prevalence of overweight and obesity in the general adult population. Because the prevalence of physical inactivity among college students is similar to that of the adult population, many researchers feel that a clearer understanding of college students’ physical activity behaviors is needed to develop early interventions. As part of one study, researchers looked at the relationship between physical activity (PA) measured with a pedometer and body mass index (BMI).1 Each participant wore a pedometer for a week, and the average number of steps taken per day (in thousands) was recorded. Various body composition variables, including BMI (in kilograms per square meter, kg/m2), were also measured. We will consider a sample of 100 female undergraduates.
CHALLENGE
Before starting our analysis, it is appropriate to consider the extent to which the results can reasonably be generalized. In the original study, undergraduate volunteers were obtained at a large southeastern public university through classroom announcements and campus flyers. The potential for bias should always be considered when obtaining volunteers. In this case, the participants were screened, and those with severe health issues, as well as varsity athletes, were excluded. As a result, the researchers considered these volunteers as an SRS from the population of undergraduates at this university. However, they acknowledged the limitations of their study, stating that similar investigations at universities of different sizes and in other climates of the United States are needed. In the statistical model for predicting BMI from physical activity, subpopulations are defined by the explanatory variable, physical activity. We could think about sampling women from this university, each averaging the same number of steps per day—say, 9000. Variation in genetic makeup, lifestyle, and diet would be sources of variation that would result in different values of BMI for this subpopulation.
10.1 Simple Linear Regression
567
EXAMPLE LOOK BACK scatterplot, p. 88
10.2 Graphical display of BMI and physical activity. We start our analysis with a scatterplot of the data. Figure 10.3 is a plot of BMI versus physical activity for our sample of 100 participants. We use the variable names BMI and PA. The least-squares regression line is also shown in the plot. There is a negative association between BMI and PA that appears approximately linear. There is also a considerable amount of scatter about this least-squares regression line.
FIGURE 10.3 Scatterplot of BMI
35 BMI (kg/m2)
versus physical activity (PA) with the least-squares line, for Example 10.2.
30 25 20 15 2
4
6 8 10 12 PA (thousands of steps)
14
Always start with a graphical display of the data. There is no point in fitting a linear model if the relationship does not, at least approximately, appear linear. Now that we have confirmed an approximate linear relationship, we return to predicting BMI for different subpopulations, defined by the explanatory variable physical activity. Our statistical model assumes that the BMI values are Normally distributed with a mean my that depends upon x in a linear way. Specifically, my ⫽ b0 ⫹ b1x This population regression line gives the average BMI for all values of x. We cannot observe this line because the observed responses y vary about their means. The statistical model for linear regression consists of the population regression line and a description of the variation of y about the line. This was displayed in Figure 10.2 with the line and the three Normal curves. The following equation expresses this idea: DATA ⫽ FIT ⫹ RESIDUAL The FIT part of the model consists of the subpopulation means, given by the expression b0 ⫹ b1x. The RESIDUAL part represents deviations of the data from the line of population means. We assume that these deviations are Normally distributed with standard deviation s. We use e (the lowercase Greek letter epsilon) to stand for the RESIDUAL part of the statistical model. A response y is the sum of its mean and a chance deviation e from the mean. These model deviations e represent “noise,” that is, variation in y due to other causes that prevent the observed (x, y)-values from forming a perfectly straight line on the scatterplot.
568
CHAPTER 10
•
Inference for Regression
SIMPLE LINEAR REGRESSION MODEL Given n observations of the explanatory variable x and the response variable y, 1x1, y1 2, 1x2, y2 2, . . . ,1xn, yn 2 the statistical model for simple linear regression states that the observed response yi when the explanatory variable takes the value xi is yi ⫽ b0 ⫹ b1xi ⫹ ei Here b0 ⫹ b1xi is the mean response when x ⫽ xi. The deviations ei are assumed to be independent and Normally distributed with mean 0 and standard deviation s. The parameters of the model are b0, b1, and s.
Because the means my lie on the line my ⫽ b0 ⫹ b1x, they are all determined by b0 and b1. Thus, once we have estimates of b0 and b1, the linear relationship determines the estimates of my for all values of x. Linear regression allows us to do inference not only for subpopulations for which we have data but also for those corresponding to x’s not present in the data. These x-values can be both within and outside the range of observed x’s. However, extreme caution must be taken when performing inference for an x-value outside the range of the observed x’s because there is no assurance that the same linear relationship between my and x holds. Given the simple linear regression model, we will now learn how to do inference about • the slope b1 and the intercept b0 of the population regression line, • the mean response my for a given value of x, and • an individual future response y for a given value of x.
Estimating the regression parameters LOOK BACK least-squares regression, p. 113
The method of least squares presented in Chapter 2 fits a line to summarize a relationship between the observed values of an explanatory variable and a response variable. Now we want to use the least-squares line as a basis for inference about a population from which our observations are a sample. We can do this only when the statistical model just presented holds. In that setting, the slope b1 and intercept b0 of the least-squares line yˆ ⫽ b0 ⫹ b1x estimate the slope b1 and the intercept b0 of the population regression line. Using the formulas from Chapter 2 (page 115), the slope of the leastsquares line is b1 ⫽ r
sy sx
and the intercept is b0 ⫽ y ⫺ b1x
10.1 Simple Linear Regression LOOK BACK correlation, p. 103
residual
569
Here, r is the correlation between y and x, sy is the standard deviation of y, and sx is the standard deviation of x. Notice that if the slope is 0, so is the correlation, and vice versa. We will discuss this relationship more later in the chapter. The predicted value of y for a given value x* of x is the point on the leastsquares line yˆ ⫽ b0 ⫹ b1x*. This is an unbiased estimator of the mean response my when x ⫽ x*. The residual is ei ⫽ observed response ⫺ predicted response ⫽ yi ⫺ yˆ i ⫽ yi ⫺ b0 ⫺ b1xi The residuals ei correspond to the model deviations ei. The ei sum to 0, and the ei come from a population with mean 0. Because we do not observe the ei, we use the residuals to check the model assumptions of the ei. The remaining parameter to be estimated is s, which measures the variation of y about the population regression line. Because this parameter is the standard deviation of the model deviations, it should come as no surprise that we use the residuals to estimate it. As usual, we work first with the variance and take the square root to obtain the standard deviation. For simple linear regression, the estimate of s2 is the average squared residual g e2i n⫺2 g 1yi ⫺ yˆ i 2 2 ⫽ n⫺2
s2 ⫽
LOOK BACK sample variance, p. 42
model standard deviation s
We average by dividing the sum by n ⫺ 2 in order to make s2 an unbiased estimate of s2. The sample variance of n observations uses the divisor n ⫺ 1 for this same reason. The quantity n ⫺ 2 is called the degrees of freedom for s2. The estimate of the model standard deviation s is given by s ⫽ 2s2 We will use statistical software to calculate the regression for predicting BMI from physical activity for Example 10.1. In entering the data, we chose the names PA for the explanatory variable and BMI for the response. It is good practice to use names, rather than just x and y, to remind yourself which variables the output describes.
EXAMPLE 10.3 Statistical software output for BMI and physical activity. Figure 10.4 gives the outputs from three commonly used statistical software packages and Excel. Other software will give similar information. The SPSS output reports estimates of our three parameters as b0 ⫽ 29.578, b1 ⫽ ⫺0.655, and s ⫽ 3.6549. Be sure that you can find these entries in this output and the corresponding values in the other outputs. The least-squares regression line is the straight line that is plotted in Figure 10.3. We would report it as BMI ⫽ 29.578 ⫺ 0.655PA
570
CHAPTER 10
•
Inference for Regression
FIGURE 10.4 Regression output from SPSS, Minitab, Excel, and SAS for the physical activity example.
*Output1 - IBM SPSS Statistics Viewer Regression [DataSet1] Model Summary
Model
R .385a
1
R Square
Adjusted R Square
.149
Std. Error of the Estimate
.140
3.6549
a. Predictors: (Constant), PA Coefficientsa Unstandardized Coefficients Model 1
B (Constant) PA
Standardized Coefficients
Std. Error
29.578
1.412
–.655
.158
t
Beta
–.385
Sig.
20.948
.000
–4.135
.000
a. Dependent Variable: BMI IBM SPSS Statistics Processor is ready
H: 132, W: 320 pt
Minitab
Regression Analysis: BMI versus PA The regression equation is BMI = 29.6 – 0.655 PA Predictor Constant PA
Coef
SE Coef
T
P
29.578
1.412
20.95
0.000
–0.6547
0.1583
–4.13
0.000
S = 3.65488
R-Sq = 14.9%
R-Sq(adj) = 14.0%
Welcome to Minitab, press F1 for help.
with a model standard deviation of s ⫽ 3.655. Note that the number of digits provided varies with the software used and we have rounded the values to three decimal places. It is important to avoid cluttering up your report of the results of a statistical analysis with many digits that are not relevant. Software often reports many more digits than are meaningful or useful. The outputs contain other information that we will ignore for now. Computer outputs often give more information than we want or need. This is
10.1 Simple Linear Regression FIGURE 10.4 (Continued )
Excel
A 1
B
C
D
E
F
G
MS
F
Significance F
SUMMARY OUTPUT
2
Regression Statistics
3 4
Multiple R
5
R Square
0.14854014
6
Adjusted R Square
0.13985178
7
Standard Error
3.65488311
8
Observations
0.38540906
100
9 10
ANOVA df
11 12
Regression
13 14
SS 1
228.3771867
228.3772 17.09644
Residual
98
1309.100713
13.35817
Total
99
1537.4779
7.50303E-05
15 Coefficients Standard Error
16 17
Intercept
18
PA
t Stat
P-Value
Lower 95%
Upper 95%
5.71E-38
26.77622218
32.3802721
29.5782471
1.411978287
20.94809
–0.65468577
0.158336132
–4.13478
7.5E-05 –0.968898666 –0.340472865
SAS
Root MSE
3.65488
R-Square 0.1485
Dependent Mean
23.93900
Adj R-Sq 0.1399
Coeff Var
15.26748
Parameter Estimates DF
Parameter Estimate
Intercept
1
29.57825
1.41198
20.95
F
Model
1
187482771
187482771
10.23
0.0028
Error
38
696437936
18327314
Corrected Total
39
883920707
Standard Error
t Value
Pr > |t|
Parameter Estimates
DF
Parameter Estimate
Variable
Label
Intercept
Intercept
1
11818
2739.85172
4.31
0.0001
PercBorrow
PercBorrow
1
168.98446
52.83425
3.20
0.0028
Done
80
FIGURE 10.15 Scatterplot of average debt (in dollars) at graduation (AvgDebt) versus the percent of students who borrow (PercBorrow), for Exercise 10.10.
10.11 Can we consider this an SRS? Refer to the previous exercise. The report states that Kiplinger’s rankings focus on traditional four-year public colleges with broadbased curricula. Each year, they start with more than 500 schools and then narrow the list down to roughly 120 based on academic quality before ranking them. The data set in the previous exercise is an SRS from their published list of 100 schools. As far as investigating the relationship between average debt and the percent of students who borrow, is it reasonable to consider this to be an SRS from the population of interest? Write a short paragraph explaining your answer. BESTVAL
FIGURE 10.16 SAS output for
30
20
(c) The State University of New York–Fredonia is a school where 86% of the students borrow. Discuss the appropriateness of using this data set to predict the average debt for this school.
Exercise 10.12.
601
602
CHAPTER 10
•
Inference for Regression
10.13 More on predicting college debt. Refer to the previous exercise. The University of Michigan–Ann Arbor is a school where 46% of the students borrow, and the average debt is $27,828. The University of Wisconsin–La Crosse is a school where 69% of the students borrow, and the average debt is $21,420. BESTVAL (a) Using your answer to part (a) of the previous exercise, what is the predicted average debt for a student at the University of Michigan–Ann Arbor? (b) What is the predicted average debt for the University of Wisconsin–La Crosse? (c) Without doing any calculations, would the standard error for the estimated average debt be larger for the University of Michigan–Ann Arbor or the University of Wisconsin–La Crosse? Explain your answer. 10.14 Predicting college debt: other measures. Refer to Exercise 10.10. Let’s now look at AvgDebt and its relationship with all seven measures available in the data set. In addition to the percent of students who borrow (PercBorrow), we have the admittance rate (Admit), the four-year graduation rate (Yr4Grad), in-state tuition after aid (InAfterAid), out-of-state tuition after aid (OutAfterAid), average aid per student (AvgAid), and the number of students per faculty member (StudPerFac). BESTVAL (a) Generate scatterplots of each explanatory variable and AvgDebt. Do all these relationships look linear? Describe what you see. (b) Fit each of the predictors separately and create a table that lists the explanatory variable, model standard deviation s, and the P-value for the test of a linear association. (c) Which variable appears to be the best single predictor of average debt? Explain your answer. 10.15 Importance of Normal model deviations? A general form of the central limit theorem tells us that the sampling distributions of b0 and b1 will be approximately Normal even if the model deviations are not Normally
TABLE 10.1 School Penn State Rutgers Illinois Buffalo Virginia Cal–Irvine Oregon UCLA Iowa North Carolina Florida
distributed. Using this fact, explain why the Normal distribution assumption is much more important for a prediction interval than for the confidence interval of the mean response at x ⫽ x*. 10.16 Public university tuition: 2008 versus 2011. Table 10.1 shows the in-state undergraduate tuition and required fees for 33 public universities in 2008 and 2011.10 TUITION (a) Plot the data with the 2008 in-state tuition (IN08) on the x axis and the 2011 tuition (IN11) on the y axis. Describe the relationship. Are there any outliers or unusual values? Does a linear relationship between the in-state tuition in 2008 and in 2011 seem reasonable? (b) Run the simple linear regression and state the leastsquares regression line. (c) Obtain the residuals and plot them versus the 2008 in-state tuition amounts. Describe anything unusual in the plot. (d) Do the residuals appear to be approximately Normal with constant variance? Explain your answer. (e) The 5 California schools appear to follow the same linear trend as the other schools but have higher-thanpredicted in-state tuition in 2011. Assume that this jump is particular to this state (financial troubles?), and remove these 5 observations and refit the model. How do the model parameters change? (f) If you were to move forward with inference, which of these two model fits would you use? Write a short paragraph explaining your answer. 10.17 More on public university tuition. Refer to the previous exercise. We’ll now move forward with inference using the model fit you chose in part (f) of the previous exercise. TUITION (a) Give the null and alternative hypotheses for examining the linear relationship between 2008 and 2011 in-state tuition amounts.
In-State Tuition and Fees (in dollars) for 33 Public Universities 2008
2011
School
13,706 11,540 12,106 6,285 9,300 8,046 6,435 7,551 6,544 5,397 3,778
15,984 12,754 13,838 7,482 11,786 13,122 8,789 12,686 7,765 7,009 5,657
Pittsburgh Michigan State Minnesota Indiana Cal–Davis Purdue Wisconsin Texas Colorado Kansas Georgia Tech
2008
2011
13,642 10,214 10,756 8,231 8,635 7,750 7,564 8,532 7,278 7,042 6,040
16,132 12,202 13,022 9,524 13,860 9,478 9,665 9,794 9,152 9,222 9,652
School Michigan Maryland Missouri Ohio State Cal–Berkeley Cal–San Diego Washington Nebraska Iowa State Arizona Texas A&M
2008
2011
11,738 8,005 8,467 8,679 7,656 8,062 6,802 6,584 6,360 5,542 7,844
12,634 8,655 8,989 9,735 12,834 13,200 10,574 7,563 7,486 9,286 8,421
Chapter 10 Exercises
603
(b) Write down the test statistic and P-value for the hypotheses stated in part (a). State your conclusions.
annual returns (in percent) on two indexes of stock prices:
(c) Construct a 95% confidence interval for the slope. What does this interval tell you about the annual percent increase in tuition between 2008 and 2011?
MEAN OVERSEAS RETURN ⫽ ⫺0.2 ⫹ 0.32 ⫻ U.S. RETURN
(d) What percent of the variability in 2011 tuition is explained by a linear regression model using the 2008 tuition? (e) Explain why inference on b0 is not of interest for this problem. 10.18 Even more on public university tuition. Refer to the previous two exercises. TUITION (a) The in-state tuition at State U was $5100 in 2008. What is the predicted in-state tuition in 2011? (b) The in-state tuition at Moneypit U was $15,700 in 2008. What is its predicted in-state tuition in 2011? (c) Discuss the appropriateness of using the fitted equation to predict tuition for each of these universities. 10.19 Out-of-state tuition. Refer to Exercise 10.16. In addition to in-state tuition, out-of-state tuition for 2008 (OUT08) and 2011 (OUT11) was also obtained. Repeat parts (a) through (d) of Exercise 10.16 using these tuition rates. Does it appear we can use all the schools for this analysis or are there some unusual observations? Explain your answer. TUITION 10.20 More on out-of-state tuition. Refer to the previous exercise. TUITION (a) Construct a 95% confidence interval for the slope. What does this interval tell you about the annual percent increase in out-of-state tuition between 2008 and 2011? (b) In Exercise 10.17(c) you constructed a similar 95% confidence interval for the annual percent increase in in-state tuition. Suppose that you want to test whether the increase is the same for both tuition types. Given the two slope estimates b1 and standard errors, could we just do a variation of the two-sample t test from Chapter 7? Explain why or why not. 10.21 In-state versus out-of-state tuition. Refer to the previous five exercises. We can also investigate whether there is a linear association between the in-state and out-of-state tuition. Perform a linear regression analysis using the 2011 data, complete with scatterplots and residual checks, and write a paragraph summarizing your findings. TUITION 10.22 U.S. versus overseas stock returns. Returns on common stocks in the United States and overseas appear to be growing more closely correlated as economies become more interdependent. Suppose that the following population regression line connects the total
(a) What is b0 in this line? What does this number say about overseas returns when the U.S. market is flat (0% return)? (b) What is b1 in this line? What does this number say about the relationship between U.S. and overseas returns? (c) We know that overseas returns will vary in years when U.S. returns do not vary. Write the regression model based on the population regression line given above. What part of this model allows overseas returns to vary when U.S. returns remain the same? 10.23 Beer and blood alcohol. How well does the number of beers a student drinks predict his or her blood alcohol content (BAC)? Sixteen student volunteers at Ohio State University drank a randomly assigned number of 12-ounce cans of beer. Thirty minutes later, a police officer measured their BAC. Here are the data:11 Student
1
2
3
4
5
6
7
8
Beers
5
2
9
8
3
7
3
5
BAC
0.10
0.03
0.19
0.12
Student
9
10
11
12
13
14
15
16
Beers
3
5
4
6
5
7
1
4
BAC
0.02
0.05
0.07
0.04 0.095 0.07 0.06
0.10 0.085 0.09
0.01 0.05
The students were equally divided between men and women and differed in weight and usual drinking habits. Because of this variation, many students don’t believe that number of drinks predicts BAC well. BAC (a) Make a scatterplot of the data. Find the equation of the least-squares regression line for predicting BAC from number of beers and add this line to your plot. What is r2 for these data? Briefly summarize what your data analysis shows. (b) Is there significant evidence that drinking more beers increases BAC on the average in the population of all students? State hypotheses, give a test statistic and P-value, and state your conclusion. (c) Steve thinks he can drive legally 30 minutes after he drinks 5 beers. The legal limit is BAC 5 0.08. Give a 90% prediction interval for Steve’s BAC. Can he be confident he won’t be arrested if he drives and is stopped? 10.24 School budget and number of students. Suppose that there is a linear relationship between the number of students x in a school system and the annual budget y. Write a population regression model to describe this relationship.
604
CHAPTER 10
•
Inference for Regression
(a) Which parameter in your model is the fixed cost in the budget (for example, the salary of the principals and some administrative costs) that does not change as x increases?
(a) Now run the simple linear regression for the variables sqrt(rating) and percent of salary devoted to incentive payments.
(b) Which parameter in your model shows how total cost changes when there are more students in the system? Do you expect this number to be greater than 0 or less than 0?
(b) Obtain the residuals and assess whether the assumptions for the linear regression analysis are reasonable. Include all plots and numerical summaries used in doing this assessment.
(c) Actual data from various school systems will not fit a straight line exactly. What term in your model allows variation among schools of the same size x? 10.25 Performance bonuses. In the National Football League (NFL), performance bonuses now account for roughly 25% of player compensation.12 Does tying a player’s salary into performance bonuses result in better individual or team success on the field? Focusing on linebackers, let’s look at the relationship between a player’s end-of-year production rating and the percent of his salary devoted to incentive payments in that same year. PERFPAY (a) Use numerical and graphical methods to describe the two variables and summarize your results. (b) Both variable distributions are non-Normal. Does this necessarily pose a problem for performing linear regression? Explain. (c) Construct a scatterplot of the data and describe the relationship. Are there any outliers or unusual values? Does a linear relationship between the percent of salary and the player rating seem reasonable? Is it a very strong relationship? Explain. (d) Run the simple linear regression and state the leastsquares regression line. (e) Obtain the residuals and assess whether the assumptions for the linear regression analysis are reasonable. Include all plots and numerical summaries used in doing this assessment. 10.26 Performance bonuses, continued. Refer to the previous exercise. PERFPAY
TABLE 10.2
(c) Construct a 95% confidence interval for the square root increase in rating given a 1% increase in the percent of salary devoted to incentive payments. (d) Consider the values 0%, 20%, 40%, 60%, and 80% salary devoted to incentives. Compute the predicted rating for this model and for the one in the previous exercise. For the model in this problem, you will need to square the predicted value to get back to the original units. (e) Plot the predicted values versus the percent and connect those values from the same model. For which regions of percent do the predicted values from the two models differ the most? (f) Based on the comparison of regression models (both predicted values and residuals), which model do you prefer? Explain. 10.27 Sales price versus assessed value. Real estate is typically reassessed annually for property tax purposes. This assessed value, however, is not necessarily the same as the fair market value of the property. Table 10.2 summarizes an SRS of 30 homes recently sold in a midwestern city.13 Both variables are measured in thousands of dollars. SALES (a) Inspect the data. How many homes have a sales price greater than the assessed value? Do you think this trend would be true for the larger population of all homes recently sold? Explain your answer. (b) Make a scatterplot with assessed value on the horizontal axis. Briefly describe the relationship between assessed value and sales price.
Sales Price and Assessed Value (in $ thousands) of 30 Homes in a Midwestern City
Property
Sales price
Assessed value
Property
Sales price
Assessed value
Property
Sales price
Assessed value
1 4 7 10 13 16 19 22 25 28
179.9 281.5 281.5 184.0 185.0 160.0 160.0 190.0 157.0 175.0
188.7 232.4 232.4 180.3 162.3 191.7 181.6 229.7 143.9 181.0
2 5 8 11 14 17 20 23 26 29
240.0 186.0 210.0 186.5 251.0 255.0 200.0 150.5 171.5 159.0
220.4 188.1 211.8 294.7 236.8 245.6 177.4 168.9 201.4 125.1
3 6 9 12 15 18 21 24 27 30
113.5 275.0 210.0 239.0 180.0 220.0 265.0 189.0 157.0 229.0
118.1 240.1 168.0 209.2 123.7 219.3 307.2 194.4 143.9 195.3
Chapter 10 Exercises (c) Report the least-squares regression line for predicting sales price from assessed value. (d) Obtain the residuals and plot them versus assessed value. Property 11 was sold at a price substantially lower than the assessed value. Does this observation appear to be unusual in the residual plot? Approximately how many standard deviations is it away from its predicted value? (e) Remove this observation and redo the least-squares fit. How have the least-squares regression line and model standard deviation changed? (f) Check the residuals for this new fit. Do the assumptions for the linear regression analysis appear reasonable here? Explain your answer. 10.28 Sales price versus assessed value, continued. Refer to the previous exercise. Let’s consider the model fit with Property 11 excluded. SALES (a) Calculate the predicted sales prices for homes currently assessed at $155,000, $220,000, and $285,000. (b) Construct a 95% confidence interval for the slope and explain what this model tells you in terms of the relationship between assessed value and sales price. (c) Explain why inference on the intercept is not of interest. (d) Using the result from part (b), compare the estimated regression line with y ⫽ x, which says that, on average, the sales price is equal to the assessed value. Is there evidence that this model is not reasonable? In other words, is the sales price typically larger or smaller than the assessed value? Explain your answer.
TABLE 10.3
605
10.29 Is the number of tornadoes increasing? The Storm Prediction Center of the National Oceanic and Atmospheric Administration maintains a database of tornadoes, floods, and other weather phenomena. Table 10.3 summarizes the annual number of tornadoes in the United States between 1953 and 2012.14 TWISTER (a) Make a plot of the total number of tornadoes by year. Does a linear trend over years appear reasonable? Are there any outliers or unusual patterns? Explain your answer. (b) Run the simple linear regression and summarize the results, making sure to construct a 95% confidence interval for the average annual increase in the number of tornadoes. (c) Obtain the residuals and plot them versus year. Is there anything unusual in the plot? (d) Are the residuals Normal? Justify your answer. (e) The number of tornadoes in 2004 is much larger than expected under this linear model. Also, the number of tornadoes in 2012 is much smaller than predicted. Remove these observations and rerun the simple linear regression. Compare these results with the results in part (b). Do you think these two observations should be considered outliers and removed? Explain your answer. 10.30 Are the two fuel efficiency measurements similar? Refer to Exercise 7.30 (page 443). In addition to the computer calculating miles per gallon (mpg), the driver also recorded this measure by dividing the miles driven by the number of gallons at fill-up. The
Annual Number of Tornadoes in the United States Between 1953 and 2012
Year
Number of tornadoes
Year
Number of tornadoes
Year
Number of tornadoes
Year
Number of tornadoes
1953
421
1968
660
1983
931
1998
1449
1954
550
1969
608
1984
907
1999
1340
1955
593
1970
653
1985
684
2000
1075
1956
504
1971
888
1986
764
2001
1215
1957
856
1972
741
1987
656
2002
934
1958
564
1973
1102
1988
702
2003
1374
1959
604
1974
947
1989
856
2004
1817
1960
616
1975
920
1990
1133
2005
1265
1961
697
1976
835
1991
1132
2006
1103
1962
657
1977
852
1992
1298
2007
1096
1963
464
1978
788
1993
1176
2008
1692
1964
704
1979
852
1994
1082
2009
1156
1965
906
1980
866
1995
1235
2010
1282
1966
585
1981
783
1996
1173
2011
1692
1967
926
1982
1046
1997
1148
2012
939
606
CHAPTER 10
•
Inference for Regression
driver wants to determine if these calculations are different. MPGDIFF Fill-up 1 2 3 4 5 6 7 8 9 10 Computer 41.5 50.7 36.6 37.3 34.2 45.0 48.0 43.2 47.7 42.2 Driver 36.5 44.2 37.2 35.6 30.5 40.5 40.0 41.0 42.8 39.2 Fill-up 11 12 13 14 15 16 17 18 19 20 Computer 43.2 44.6 48.4 46.4 46.8 39.2 37.3 43.5 44.3 43.3 Driver 38.8 44.5 45.4 45.3 45.7 34.2 35.2 39.8 44.9 47.5
(a) Consider the driver’s mpg calculations as the explanatory variable. Plot the data and describe the relationship. Are there any outliers or unusual values? Does a linear relationship seem reasonable? (b) Run the simple linear regression and state the leastsquares regression line. (c) Summarize the results. Does it appear that the computer and driver calculations are the same? Explain. 10.31 Gambling and alcohol use by first-year college students. Gambling and alcohol use are problematic behaviors for many college students. One study looked at 908 first-year students from a large northeastern university.15 Each participant was asked to fill out the 10-item Alcohol Use Disorders Identification Test (AUDIT) and a 7-item inventory used in prior gambling research among college students. AUDIT assesses alcohol consumption and other alcohol-related risks and problems (a higher score means more risks). A correlation of 0.29 was reported between the frequency of gambling and the AUDIT score. (a) What percent of the variability in AUDIT score is explained by frequency of gambling? (b) Test the null hypothesis that the correlation between the gambling frequency and the AUDIT score is zero. (c) The sample in this study represents 45% of the students contacted for the online study. To what extent do
TABLE 10.4
you think these results apply to all first-year students at this university? To what extent do you think these results apply to all first-year students? Give reasons for your answers. 10.32 Predicting water quality. The index of biotic integrity (IBI) is a measure of the water quality in streams. IBI and land use measures for a collection of streams in the Ozark Highland ecoregion of Arkansas were collected as part of a study.16 Table 10.4 gives the data for IBI, the percent of the watershed that was forest, and the area of the watershed in square kilometers for streams in the original sample with watershed area less than or equal to 70 km2. IBI (a) Use numerical and graphical methods to describe the variable IBI. Do the same for area. Summarize your results. (b) Plot the data and describe the relationship between IBI and area. Are there any outliers or unusual patterns? (c) Give the statistical model for simple linear regression for this problem. (d) State the null and alternative hypotheses for examining the relationship between IBI and area. (e) Run the simple linear regression and summarize the results. (f) Obtain the residuals and plot them versus area. Is there anything unusual in the plot? (g) Do the residuals appear to be approximately Normal? Give reasons for your answer. (h) Do the assumptions for the analysis of these data using the model you gave in part (c) appear to be reasonable? Explain your answer. 10.33 More on predicting water quality. The researchers who conducted the study described in the previous exercise also recorded the percent of the watershed area that was forest for each of the streams.
Watershed Area (km2), Percent Forest, and Index of Biotic Integrity
Area
Forest
IBI
Area
Forest
IBI
Area
21
0
47
29
0
61
31
Forest
IBI
Area
0
39
32
Forest 0
IBI
Area
Forest
IBI
59
34
0
72
34
0
76
49
3
85
52
3
89
2
7
74
70
8
89
6
9
33
28
10
46
21
10
32
59
11
80
69
14
80
47
17
78
8
17
53
8
18
43
58
21
88
54
22
84
10
25
62
57
31
55
18
32
29
19
33
29
39
33
54
49
33
78
9
39
71
5
41
55
14
43
58
9
43
71
23
47
33
31
49
59
18
49
81
16
52
71
21
52
75
32
59
64
10
63
41
26
68
82
9
75
60
54
79
84
12
79
83
21
80
82
27
86
82
23
89
86
26
90
79
16
95
67
26
95
56
26
100
85
28
100
91
Chapter 10 Exercises These data are also given in Table 10.4. Analyze these data using the questions in the previous exercise as a guide. IBI 10.34 Comparing the analyses. In Exercises 10.32 and 10.33, you used two different explanatory variables to predict IBI. Summarize the two analyses and compare the results. If you had to choose between the two explanatory variables for predicting IBI, which one would you prefer? Give reasons for your answer. IBI 10.35 How an outlier can affect statistical significance. Consider the data in Table 10.4 and the relationship between IBI and the percent of watershed area that was forest. The relationship between these two variables is almost significant at the 0.05 level. In this exercise you will demonstrate the potential effect of an outlier on statistical significance. Investigate what happens when you decrease the IBI to 0.0 for (1) an observation with 0% forest and (2) an observation with 100% forest. Write a short summary of what you learn from this exercise. IBI 10.36 Predicting water quality for an area of 40 km2. Refer to Exercise 10.32. IBI (a) Find a 95% confidence interval for the mean response corresponding to an area of 40 km2. (b) Find a 95% prediction interval for a future response corresponding to an area of 40 km2. (c) Write a short paragraph interpreting the meaning of the intervals in terms of Ozark Highland streams. (d) Do you think that these results can be applied to other streams in Arkansas or in other states? Explain why or why not.
607
measurements for the years 1975 to 1987. The variable “lean” represents the difference between where a point on the tower would be if the tower were straight and where it actually is. The data are coded as tenths of a millimeter in excess of 2.9 meters, so that the 1975 lean, which was 2.9642 meters, appears in the table as 642. Only the last two digits of the year were entered into the computer.17 PISA Year 75 76 77 78 79 80 81 82 83 84 85 86 87 Lean 642 644 656 667 673 688 696 698 713 717 725 742 757
(a) Plot the data. Does the trend in lean over time appear to be linear? (b) What is the equation of the least-squares line? What percent of the variation in lean is explained by this line? (c) Give a 99% confidence interval for the average rate of change (tenths of a millimeter per year) of the lean. 10.40 More on the Leaning Tower of Pisa. Refer to the previous exercise. PISA (a) In 1918 the lean was 2.9071 meters. (The coded value is 71.) Using the least-squares equation for the years 1975 to 1987, calculate a predicted value for the lean in 1918. (Note that you must use the coded value 18 for year.) (b) Although the least-squares line gives an excellent fit to the data for 1975 to 1987, this pattern did not extend back to 1918. Write a short statement explaining why this conclusion follows from the information available. Use numerical and graphical summaries to support your explanation. 10.41 Predicting the lean in 2013. Refer to the previous two exercises. PISA
10.37 Compare the predictions. Consider Case 37 in Table 10.4 (8th row, 2nd column). For this case the area is 10 km2 and the percent forest is 63%. A predicted index of biotic integrity based on area was computed in Exercise 10.32, while one based on percent forest was computed in Exercise 10.33. Compare these two estimates and explain why they differ. Use the idea of a prediction interval to interpret these results. IBI
(a) How would you code the explanatory variable for the year 2013?
10.38 Reading test scores and IQ. In Exercise 2.33 (page 100) you examined the relationship between reading test scores and IQ scores for a sample of 60 fifth-grade children. READIQ
(c) To give a margin of error for the lean in 2013, would you use a confidence interval for a mean response or a prediction interval? Explain your choice.
(b) The engineers working on the Leaning Tower of Pisa were most interested in how much the tower would lean if no corrective action was taken. Use the least-squares equation to predict the tower’s lean in the year 2013. (Note: The tower was renovated in 2001 to make sure it does not fall down.)
(b) Rerun the analysis with the four possible outliers removed. Summarize your findings, paying particular attention to the effects of removing the outliers.
10.42 Correlation between binge drinking and the average price of beer. A recent study looked at 118 colleges to investigate the association between the binge-drinking rate and the average price for a bottle of beer at establishments within a two-mile radius of campus.18 A correlation of 20.36 was found. Explain this correlation.
10.39 Leaning Tower of Pisa. The Leaning Tower of Pisa is an architectural wonder. Engineers concerned about the tower’s stability have done extensive studies of its increasing tilt. Measurements of the lean of the tower over time provide much useful information. The following table gives
10.43 Is this relationship significant? Refer to the previous exercise. Test the null hypothesis that the correlation between the binge-drinking rate and the average price for a bottle of beer within a two-mile radius of campus is zero.
(a) Run the regression and summarize the results of the significance tests.
608
CHAPTER 10
•
Inference for Regression
10.44 Does a math pretest predict success? Can a pretest on mathematics skills predict success in a statistics course? The 62 students in an introductory statistics class took a pretest at the beginning of the semester. The leastsquares regression line for predicting the score y on the final exam from the pretest score x was yˆ ⫽ 13.8 ⫹ 0.81x. The standard error of b1 was 0.43. (a) Test the null hypothesis that there is no linear relationship between the pretest score and the score on the final exam against the two-sided alternative. (b) Would you reject this null hypothesis versus the onesided alternative that the slope is positive? Explain your answer. 10.45 Completing an ANOVA table. How are returns on common stocks in overseas markets related to returns in U.S. markets? Consider measuring U.S. returns by the annual rate of return on the Standard & Poor’s 500 stock index and overseas returns by the annual rate of return on the Morgan Stanley Europe, Australasia, Far East (EAFE) index.19 Both are recorded in percents. We will regress the EAFE returns on the S&P 500 returns for the years 1993 to 2012. Here is part of the Minitab output for this regression: The regression equation is EAFE = - 0.168 + 0.845 S&P Analysis of Variance Source
DF
Regression
SS
MS
F
4947.2
Residual Error Total
19
8251.5
Using the ANOVA table format on page 589 as a guide, complete the analysis of variance table. 10.46 Interpreting statistical software output. Refer to the previous exercise. What are the values of the regression standard error s and the squared correlation r2? 10.47 Standard error and confidence interval for the slope. Refer to the previous two exercises. The standard deviation of the S&P 500 returns for these years is 19.09%. From this and your work in the previous exercise, find the standard error for the least-squares slope b1. Give a 95% confidence interval for the slope b1 of the population regression line. 10.48 Grade inflation. The average undergraduate GPA for American colleges and universities was estimated based on a sample of institutions that published this information.20 Here are the data for public schools in that report: Year
1992 1996 2002 2007
GPA
2.85
2.90
2.97
3.01
Do the following by hand or with a calculator and verify your results with a software package. GRADEUP
(a) Make a scatterplot that shows the increase in GPA over time. Does a linear increase appear reasonable? (b) Find the equation of the least-squares regression line for predicting GPA from year. Add this line to your scatterplot. (c) Compute a 95% confidence interval for the slope and summarize what this interval tells you about the increase in GPA over time. 10.49 Significance test of the correlation. A study reported a correlation r ⫽ 0.5 based on a sample size of n ⫽ 15; another reported the same correlation based on a sample size of n ⫽ 25. For each, perform the test of the null hypothesis that r ⫽ 0. Describe the results and explain why the conclusions are different. 10.50 State and college binge drinking. Excessive consumption of alcohol is associated with numerous adverse consequences. In one study, researchers analyzed bingedrinking rates from two national surveys, the Harvard School of Public Health College Alcohol Study (CAS) and the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS).21 The CAS survey was used to provide an estimate of the college binge-drinking rate in each state, and the BRFSS was used to determine the adult binge-drinking rate in each state. A correlation of 0.43 was reported between these two rates for their sample of n ⫽ 40 states. The college binge-drinking rate had a mean of 46.5% and standard deviation 13.5%. The adult binge-drinking rate had a mean of 14.88% and standard deviation 3.8%. (a) Find the equation of the least-squares line for predicting the college binge-drinking rate from the adult bingedrinking rate. (b) Give the results of the significance test for the null hypothesis that the slope is 0. (Hint: What is the relation between this test and the test for a zero correlation?) 10.51 SAT versus ACT. The SAT and the ACT are the two major standardized tests that colleges use to evaluate candidates. Most students take just one of these tests. However, some students take both. Consider the scores of 60 students who did this. How can we relate the two tests? SATACT (a) Plot the data with SAT on the x axis and ACT on the y axis. Describe the overall pattern and any unusual observations. (b) Find the least-squares regression line and draw it on your plot. Give the results of the significance test for the slope. (c) What is the correlation between the two tests? 10.52 SAT versus ACT, continued. Refer to the previous exercise. Find the predicted value of ACT for each observation in the data set. SATACT
Chapter 10 Exercises (a) What is the mean of these predicted values? Compare it with the mean of the ACT scores. (b) Compare the standard deviation of the predicted values with the standard deviation of the actual ACT scores. If least-squares regression is used to predict ACT scores for a large number of students such as these, the average predicted value will be accurate but the variability of the predicted scores will be too small. (c) Find the SAT score for a student who is one standard deviation above the mean 1z ⫽ 1x ⫺ x2兾s ⫽ 12. Find the predicted ACT score and standardize this score. (Use the means and standard deviations from this set of data for these calculations.)
609
(a) Plot weight versus length and weight versus width. Do these relationships appear to be linear? Explain your answer. (b) Run the regression using length to predict weight. Do the same using width as the explanatory variable. Summarize the results. Be sure to include the value of r2. 10.55 Transforming the perch data. Refer to the previous exercise. PERCH (a) Try to find a better model using a transformation of length. One possibility is to use the square. Make a plot and perform the regression analysis. Summarize the results. (b) Do the same for width.
(d) Repeat part (c) for a student whose SAT score is one standard deviation below the mean 1z ⫽ ⫺12.
10.56 Creating a new explanatory variable. Refer to the previous two exercises. PERCH
(e) What do you conclude from parts (c) and (d)? Perform additional calculations for different z’s if needed.
(a) Create a new variable that is the product of length and width. Make a plot and run the regression using this new variable. Summarize the results.
10.53 Matching standardized scores. Refer to the previous two exercises. An alternative to the least-squares method is based on matching standardized scores. Specifically, we set 1y ⫺ y2 sy
⫽
1x ⫺ x2 sx
and solve for y. Let’s use the notation y ⫽ a0 ⫹ a1x for this line. The slope is a1 ⫽ sy兾sx and the intercept is a0 ⫽ y ⫺ a1x. Compare these expressions with the formulas for the least-squares slope and intercept (page 592). SATACT (a) Using the data in the previous exercise, find the values of a0 and a1. (b) Plot the data with the least-squares line and the new prediction line. (c) Use the new line to find predicted ACT scores. Find the mean and the standard deviation of these scores. How do they compare with the mean and standard deviation of the ACT scores? 10.54 Weight, length, and width of perch. Here are data for 12 perch caught in a lake in Finland:22 PERCH Weight (grams)
Length (cm)
Width (cm)
Weight (grams)
Length (cm)
Width (cm)
5.9
8.8
1.4
300.0
28.7
5.1
100.0
19.2
3.3
300.0
30.1
4.6
110.0
22.5
3.6
685.0
39.0
6.9
120.0
23.5
3.5
650.0
41.4
6.0
150.0
24.0
3.6
820.0
42.5
6.6
145.0
25.5
3.8
1000.0
46.6
7.6
In this exercise we will examine different models for predicting weight.
(b) Write a short report summarizing and comparing the different regression analyses that you performed in this exercise and the previous two exercises. 10.57 Index of biotic integrity. Refer to the data on the index of biotic integrity and area in Exercise 10.32 (page 606) and the additional data on percent watershed area that was forest in Exercise 10.33. Find the correlations among these three variables, perform the test of statistical significance, and summarize the results. Which of these test results could have been obtained from the analyses that you performed in Exercises 10.32 and 10.33? IBI 10.58 Food neophobia. Food neophobia is a personality trait associated with avoiding unfamiliar foods. In one study of 564 children who were 2 to 6 years of age, food neophobia and the frequency of consumption of different types of food were measured.23 Here is a summary of the correlations: Type of food
Correlation
Vegetables
⫺0.27
Fruit
⫺0.16
Meat
⫺0.15
Eggs
⫺0.08
Sweet/fatty snacks Starchy staples
0.04 ⫺0.02
Perform the significance test for each correlation and write a summary about food neophobia and the consumption of different types of food. 10.59 A mechanistic explanation of popularity. Previous experimental work has suggested that the serotonin system plays an important and causal role in social status. In other words, genes may predispose individuals
610
CHAPTER 10
•
Inference for Regression
to be popular/likable. As part of a recent study on adolescents, an experimenter looked at the relationship between the expression of a particular serotonin receptor gene, a person’s “popularity,” and the person’s rule-breaking (RB) behaviors.24 RB was measured by both a questionnaire and video observation. The composite score is an equal combination of these two assessments. Here is a table of the correlations: Rule-breaking measure
Popularity
Gene expression
RB.composite
0.28
0.26
RB.questionnaire
0.22
0.23
RB.video
0.24
0.20
Sample 1 1n ⫽ 1232
0.22
RB.questionnaire
0.16
0.24
RB.video
0.19
0.16
(b) Run the regression to predict metabolic rate from lean body mass for the women in the sample and summarize the results. Do the same for the men. 10.61 Resting metabolic rate and exercise, continued. Refer to the previous exercise. It is tempting to conclude that there is a strong linear relationship for the women but no relationship for the men. Let’s look at this issue a little more carefully. METRATE (a) Find the confidence interval for the slope in the regression equation that you ran for the females. Do the same for the males. What do these suggest about the possibility that these two slopes are the same? (The formal method for making this comparison is a bit complicated and is beyond the scope of this chapter.)
Sample 1 Caucasians only 1n ⫽ 962 RB.composite
(a) Make a scatterplot of the data, using different symbols or colors for men and women. Summarize what you see in the plot.
0.23
For each correlation, test the null hypothesis that the corresponding true correlation is zero. Reproduce the table and mark the correlations that have P ⬍ 0.001 with ***, those that have P ⬍ 0.01 with **, and those that have P ⬍ 0.05 with *. Write a summary of the results of your significance tests. 10.60 Resting metabolic rate and exercise. Metabolic rate, the rate at which the body consumes energy, is important in studies of weight gain, dieting, and exercise. The following table gives data on the lean body mass and resting metabolic rate for 12 women and 7 men who are subjects in a study of dieting. Lean body mass, given in kilograms, is a person’s weight leaving out all fat. Metabolic rate is measured in calories burned per 24 hours, the same calories used to describe the energy content of foods. The researchers believe that lean body mass is an important influence on metabolic rate. METRATE Subject
Sex
Mass
Rate
Subject
Sex
Mass
Rate
1
M
62.0
1792
11
F
40.3
1189
2
M
62.9
1666
12
F
33.1
913
3
F
36.1
995
13
M
51.9
1460
4
F
54.6
1425
14
F
42.4
1124
5
F
48.5
1396
15
F
34.5
1052
6
F
42.0
1418
16
F
51.1
1347
7
M
47.4
1362
17
F
41.2
1204
8
F
50.6
1502
18
M
51.9
1867
9
F
42.0
1256
19
M
46.9
1439
10
M
48.7
1614
(b) Examine the formula for the standard error of the regression slope given on page 593. The term in the denominator is 2© 1xi ⫺ x 2 2. Find this quantity for the females; do the same for the males. How do these calculations help to explain the results of the significance tests? (c) Suppose that you were able to collect additional data for males. How would you use lean body mass in deciding which subjects to choose? 10.62 Inference over different ranges of X. Think about what would happen if you analyzed a subset of a set of data by analyzing only data for a restricted range of values of the explanatory variable. What results would you expect to change? Examine your ideas by analyzing the fuel efficiency data described in Example 10.11 (page 581). First, run a regression of MPG versus MPH using all cases. This least-squares regression line is shown in Figure 10.9. Next run a regression of MPG versus MPH for only those cases with speed less than or equal to 30 mph. Note that this corresponds to 3.4 in the log scale. Finally, do the same analysis with a restriction on the response variable. Run the analysis with only those cases with fuel efficiency less than or equal to 20 mpg. Write a summary comparing the effects of these two restrictions with each other and with the complete data set results. MPHMPG
Multiple Regression Introduction In Chapter 10 we presented methods for inference in the setting of a linear relationship between a response variable y and a single explanatory variable x. In this chapter, we use more than one explanatory variable to explain or predict a single response variable. Many of the ideas that we encountered in our study of simple linear regression carry over to the multiple linear regression setting. For example, the descriptive tools we learned in Chapter 2—scatterplots, least-squares regression, and correlation—are still essential preliminaries to inference and also provide a foundation for confidence intervals and significance tests. The introduction of several explanatory variables leads to many additional considerations. In this short chapter we cannot explore all these issues. Rather, we will outline some basic facts about inference in the multiple regression setting and then illustrate the analysis with a case study whose purpose was to predict success in college based on several high school achievement scores.
11
CHAPTER
11.1 Inference for Multiple Regression 11.2 A Case Study
611
612
CHAPTER 11
•
Multiple Regression
11.1 Inference for Multiple Regression When you complete this section, you will be able to • Describe the multiple linear regression model in terms of a population regression line and the deviations of the response variable y from this line. • Interpret regression output from statistical software to obtain the leastsquares regression equation and model standard deviation, multiple correlation coefficient, ANOVA F test, and individual regression coefficient t tests. • Explain the difference between the ANOVA F test and the t tests for individual coefficients. • Interpret a level C confidence interval or significance test for a regression coefficient. • Use diagnostic plots to check the assumptions of the multiple linear regression model.
Population multiple regression equation The simple linear regression model assumes that the mean of the response variable y depends on the explanatory variable x according to a linear equation my ⫽ b0 ⫹ b1x For any fixed value of x, the response y varies Normally around this mean and has a standard deviation s that is the same for all values of x. In the multiple regression setting, the response variable y depends on p explanatory variables, which we will denote by x1, x2,…, xp. The mean response depends on these explanatory variables according to a linear function my ⫽ b0 ⫹ b1x1 ⫹ b2x2 ⫹ p ⫹ bpxp population regression equation
Similar to simple linear regression, this expression is the population regression equation, and the observed values y vary about their means given by this equation. Just as we did in simple linear regression, we can also think of this model in terms of subpopulations of responses. Here, each subpopulation corresponds to a particular set of values for all the explanatory variables x1, x2, …, xp. In each subpopulation, y varies Normally with a mean given by the population regression equation. The regression model assumes that the standard deviation s of the responses is the same in all subpopulations.
EXAMPLE 11.1 Predicting early success in college. Our case study is based on data
collected on science majors at a large university.1 The purpose of the study was to attempt to predict success in the early university years. One measure of success was the cumulative grade point average (GPA) after three semesters. Among the explanatory variables recorded at the time the students enrolled in the university were average high school grades in mathematics (HSM), science (HSS), and English (HSE).
11.1 Inference for Multiple Regression
613
We will use high school grades to predict the response variable GPA. There are p ⫽ 3 explanatory variables: x1 5 HSM, x2 5 HSS, and x3 5 HSE. The high school grades are coded on a scale from 1 to 10, with 10 corresponding to A, 9 to A2, 8 to B1, and so on. These grades define the subpopulations. For example, the straight-C students are the subpopulation defined by HSM 5 4, HSS 5 4, and HSE 5 4. One possible multiple regression model for the subpopulation mean GPAs is GPA ⫽ 0 ⫹ 1HSM ⫹ 2HSS ⫹ 3HSE For the straight-C subpopulation of students, the model gives the subpopulation mean as GPA ⫽ 0 ⫹ 14 ⫹ 24 ⫹ 34
Data for multiple regression The data for a simple linear regression problem consist of observations 1xi, yi 2 of the two variables. Because there are several explanatory variables in multiple regression, the notation needed to describe the data is more elaborate. Each observation or case consists of a value for the response variable and for each of the explanatory variables. Call xij the value of the jth explanatory variable for the ith case. The data are then Case 1: 1x11, x12, p , x1p, y1 2 Case 2: 1x21, x22, p , x2p, y2 2 o Case n: 1xn1, xn2, p , xnp, yn 2 Here, n is the number of cases and p is the number of explanatory variables. Data are often entered into computer regression programs in this format. Each row is a case and each column corresponds to a different variable. The data for Example 11.1, with several additional explanatory variables, appear in this format in the GPA data file. Figure 11.1 shows the first 5 rows entered into an Excel spreadsheet. Grade point average (GPA) is the response variable, followed by p ⫽ 7 explanatory variables. There are a total of n ⫽ 150 students in this data set.
FIGURE 11.1 Format of data set
Excel
for Example 11.1 in an Excel spreadsheet. 1
A
B
C
D
E
F
G
H
I
obs
GPA
HSM
HSS
HSE
SATM
SATCR
SATW
sex
J
2
1
3.84
10
10
10
630
570
590
2
3
2
3.97
10
10
10
750
700
630
1
4
3
3.49
8
10
9
570
510
490
2
5
4
1.95
6
4
8
640
600
610
1
6
5
2.59
8
10
9
510
490
490
2
7
614
CHAPTER 11
•
Multiple Regression USE YOUR KNOWLEDGE 11.1 Describing a multiple regression. Traditionally, demographic and high school academic variables have been used to predict college academic success. One study investigated the influence of emotional health on GPA.2 Data from 242 students who had completed their first two semesters of college were obtained. The researchers were interested in describing how students’ second-semester grade point averages are explained by gender, a standardized test score, perfectionism, selfesteem, fatigue, optimism, and depressive symptomatology. (a) What is the response variable? (b) What is n, the number of cases? (c) What is p, the number of explanatory variables? (d) What are the explanatory variables?
Multiple linear regression model LOOK BACK DATA 5 FIT 1 RESIDUAL, p. 567
We combine the population regression equation and assumptions about variation to construct the multiple linear regression model. The subpopulation means describe the FIT part of our statistical model. The RESIDUAL part represents the variation of observations about the means. We will use the same notation for the residual that we used in the simple linear regression model. The symbol e represents the deviation of an individual observation from its subpopulation mean. We assume that these deviations are Normally distributed with mean 0 and an unknown model standard deviation s that does not depend on the values of the x variables. These are assumptions that we can check by examining the residuals in the same way that we did for simple linear regression.
MULTIPLE LINEAR REGRESSION MODEL The statistical model for multiple linear regression is yi ⫽ b0 ⫹ b1xi1 ⫹ b2xi2 ⫹ p ⫹ bpxip ⫹ ei for i ⫽ 1, 2, . . . , n. The mean response my is a linear function of the explanatory variables: my ⫽ b0 ⫹ b1x1 ⫹ b2x2 ⫹ p ⫹ bpxp The deviations ei are assumed to be independent and Normally distributed with mean 0 and standard deviation s. In other words, they are an SRS from the N10, s2 distribution. The parameters of the model are b0, b1, b2, . . . , bp, and s.
The assumption that the subpopulation means are related to the regression coefficients b by the equation my ⫽ b0 ⫹ b1x1 ⫹ b2x2 ⫹ p ⫹ bpxp
11.1 Inference for Multiple Regression
615
implies that we can estimate all subpopulation means from estimates of the b’s. To the extent that this equation is accurate, we have a useful tool for describing how the mean of y varies with the collection of x’s. We do, however, need to be cautious when interpreting each of the regression coefficients in a multiple regression. First, the b0 coefficient represents the mean of y when all the x variables equal zero. Even more so than in simple linear regression, this subpopulation is rarely of interest. Second, the description provided by the regression coefficient of each x variable is similar to that provided by the slope in simple linear regression but only in a specific situation, namely, when all other x variables are held constant. We need this extra condition because with multiple x variables, it is quite possible that a unit change in one x variable may be associated with changes in other x variables. If that occurs, then the change in the mean of y is not described by just a single regression coefficient. USE YOUR KNOWLEDGE 11.2 Understanding the fitted regression line. The fitted regression equation for a multiple regression is yˆ ⫽ ⫺1.8 ⫹ 6.1x1 ⫺ 1.1x2 (a) If x1 ⫽ 3 and x2 ⫽ 1, what is the predicted value of y? (b) For the answer to part (a) to be valid, is it necessary that the values x1 ⫽ 3 and x2 ⫽ 1 correspond to a case in the data set? Explain why or why not. (c) If you hold x2 at a fixed value, what is the effect of an increase of two units in x1 on the predicted value of y?
Estimation of the multiple regression parameters LOOK BACK least squares, p. 113
Similar to simple linear regression, we use the method of least squares to obtain estimators of the regression coefficients b. The details, however, are more complicated. Let b0, b1, b2, p , bp denote the estimators of the parameters b0, b1, b2, p , bp For the ith observation, the predicted response is yˆ i ⫽ b0 ⫹ b1xi1 ⫹ b2xi2 ⫹ p ⫹ bpxip
LOOK BACK residual, p. 569
The ith residual, the difference between the observed and the predicted response, is therefore ei ⫽ observed response ⫺ predicted response ⫽ yi ⫺ yˆ i ⫽ yi ⫺ b0 ⫺ b1xi1 ⫺ b2xi2 ⫺ p ⫺ bpxip The method of least squares chooses the values of the b’s that make the sum of the squared residuals as small as possible. In other words, the parameter estimates b0, b1, b2, . . . , bp minimize the quantity g 1 yi ⫺ b0 ⫺ b1xi1 ⫺ b2xi2 ⫺ p ⫺ bpxip 2 2
616
CHAPTER 11
•
Multiple Regression The formula for the least-squares estimates is complicated. We will be content to understand the principle on which it is based and to let software do the computations. The parameter s 2 measures the variability of the responses about the population regression equation. As in the case of simple linear regression, we estimate s 2 by an average of the squared residuals. The estimator is s2 ⫽
g e2i n⫺p⫺1
⫽ LOOK BACK degrees of freedom, p. 44
g 1yi ⫺ yˆ i 2 2 n⫺p⫺1
The quantity n ⫺ p ⫺ 1 is the degrees of freedom associated with s2. The degrees of freedom equal the sample size, n, minus p ⫹ 1, the number of b’s we must estimate to fit the model. In the simple linear regression case there is just one explanatory variable, so p ⫽ 1 and the degrees of freedom are n ⫺ 2. To the model standard deviation s we use s ⫽ 2s2
Confidence intervals and significance tests for regression coefficients We can obtain confidence intervals and perform significance tests for each of the regression coefficients bj as we did in simple linear regression. The standard errors of the b’s have more complicated formulas, but all are multiples of s. We again rely on statistical software to do the calculations.
CONFIDENCE INTERVALS AND SIGNIFICANCE TESTS FOR bJ A level C confidence interval for bj is bj ⫾ t*SEbj where SEbj is the standard error of bj and t* is the value for the t1n ⫺ p ⫺ 12 density curve with area C between ⫺t* and t*. To test the hypothesis H0: bj ⫽ 0, compute the t statistic t⫽
bj SEbj
In terms of a random variable T having the t1n ⫺ p ⫺ 12 distribution, the P-value for a test of H0 against Ha: bj ⬎ 0 is P1T ⱖ t2
Ha: bj ⬍ 0 is P1T ⱕ t2
Ha: bj ⬆ 0 is 2P1T ⱖ 0 t 0 2
t
t
t
11.1 Inference for Multiple Regression LOOK BACK confidence intervals for mean response, p. 577 prediction intervals, p. 579
617
Because regression is often used for prediction, we may wish to use multiple regression models to construct confidence intervals for a mean response and prediction intervals for a future observation. The basic ideas are the same as in the simple linear regression case. In most software systems, the same commands that give confidence and prediction intervals for simple linear regression work for multiple regression. The only difference is that we specify a list of explanatory variables rather than a single variable. Modern software allows us to perform these rather complex calculations without an intimate knowledge of all the computational details. This frees us to concentrate on the meaning and appropriate use of the results.
ANOVA table for multiple regression LOOK BACK ANOVA F test, p. 588
In simple linear regression the F test from the ANOVA table is equivalent to the two-sided t test of the hypothesis that the slope of the regression line is 0. For multiple regression there is a corresponding ANOVA F test, but it tests the hypothesis that all the regression coefficients (with the exception of the intercept) are 0. Here is the general form of the ANOVA table for multiple regression:
Source
Degrees of freedom
Sum of squares
Mean square
F
Model Error
p n⫺p⫺1
g 1yˆ i ⫺ y2 2 g 1yi ⫺ yˆ i 2 2
SSM/DFM SSE/DFE
MSM/MSE
n⫺1
g 1yi ⫺ y2 2
SST/DFT
Total
The ANOVA table is similar to that for simple linear regression. The degrees of freedom for the model increase from 1 to p to reflect the fact that we now have p explanatory variables rather than just one. As a consequence, the degrees of freedom for error decrease by the same amount. It is always a good idea to calculate the degrees of freedom by hand and then check that your software agrees with your calculations. In this way you can verify that your software is using the number of cases and number of explanatory variables that you intended. The sums of squares represent sources of variation. Once again, both the sums of squares and their degrees of freedom add: SST ⫽ SSM ⫹ SSE DFT ⫽ DFM ⫹ DFE
LOOK BACK F statistic, p. 588
The estimate of the variance s2 for our model is again given by the MSE in the ANOVA table. That is, s2 5 MSE. The ratio MSM/MSE is an F statistic for testing the null hypothesis H0: b1 ⫽ b2 ⫽ p ⫽ bp ⫽ 0 against the alternative hypothesis Ha: at least one of the bj is not 0 The null hypothesis says that none of the explanatory variables are predictors of the response variable when used in the form expressed by the multiple regression equation. The alternative states that at least one of them is a predictor of the response variable.
618
CHAPTER 11
•
Multiple Regression As in simple linear regression, large values of F give evidence against H0. When H0 is true, F has the F1p, n ⫺ p ⫺ 12 distribution. The degrees of freedom for the F distribution are those associated with the model and error in the ANOVA table. A common error in the use of multiple regression is to assume that all the regression coefficients are statistically different from zero whenever the F statistic has a small P-value. Be sure that you understand the difference between the F test and the t tests for individual coefficients.
ANALYSIS OF VARIANCE F TEST In the multiple regression model, the hypothesis H0: b1 ⫽ b2 ⫽ p ⫽ bp ⫽ 0 is tested against the alternative hypothesis Ha: at least one of the bj is not 0 by the analysis of variance F statistic F⫽
MSM MSE
The P-value is the probability that a random variable having the F1p, n ⫺ p ⫺ 12 distribution is greater than or equal to the calculated value of the F statistic.
Squared multiple correlation R2 For simple linear regression we noted that the square of the sample correlation could be written as the ratio of SSM to SST and could be interpreted as the proportion of variation in y explained by x. A similar statistic is routinely calculated for multiple regression.
THE SQUARED MULTIPLE CORRELATION The statistic g 1yˆ i ⫺ y2 2 SSM R ⫽ ⫽ SST g 1yi ⫺ y2 2 2
is the proportion of the variation of the response variable y that is explained by the explanatory variables x1, x2, . . . , xp in a multiple linear regression.
multiple correlation coefficient
Often, R2 is multiplied by 100 and expressed as a percent. The square root of R2, called the multiple correlation coefficient, is the correlation between the observations yi and the predicted values yˆ i.
11.2 A Case Study
619
USE YOUR KNOWLEDGE 11.3 Significance tests for regression coefficients. As part of a study on undergraduate success among actuarial students a multiple regression was run using 82 students.3 The following table contains the estimated coefficients and standard errors: Variable
Estimate
SE
Intercept
⫺0.764
0.651
SAT Math
0.00156
0.00074
SAT Verbal
0.00164
0.00076
High school rank
1.470
0.430
College placement exam
0.889
0.402
(a) All the estimated coefficients for the explanatory variables are positive. Is this what you would expect? Explain. (b) What are the degrees of freedom for the model and error? (c) Test the significance of each coefficient and state your conclusions. 11.4 ANOVA table for multiple regression. Use the following information and the general form of the ANOVA table for multiple regression on page 617 to perform the ANOVA F test and compute R2.
Source
Degrees of freedom
Model
Sum of squares 75
Error
53
Total
57
594
11.2 A Case Study Preliminary analysis In this section we illustrate multiple regression by analyzing the data from the study described in Example 11.1. The response variable is the cumulative GPA, on a 4-point scale, after three semesters. The explanatory variables previously mentioned are average high school grades, represented by HSM, HSS, and HSE. We also examine the SAT Mathematics (SATM), SAT Critical Reading (SATCR), and SAT Writing (SATW) scores as explanatory variables. We have data for n ⫽ 150 students in the study. We use SAS, Excel, and Minitab to illustrate the outputs that are given by most software. The first step in the analysis is to carefully examine each of the variables. Means, standard deviations, and minimum and maximum values appear in Figure 11.2. The minimum value for high school mathematics (HSM) appears to be rather extreme; it is 18.59 ⫺ 2.002兾1.46 ⫽ 4.51 standard deviations below
620
CHAPTER 11
•
Multiple Regression
FIGURE 11.2 Descriptive statistics for the College of Science student case study.
SAS
The SAS System The MEANS Procedure
Variable
Label
N
Mean
Std Dev
Minimum
Maximum
GPA
GPA
150
2.8421333
0.8178992
0.0300000
4.0000000
SATM
SATM
150
623.6000000
74.8356589
460.0000000
800.0000000
SATCR
SATCR
150
573.8000000
87.6208274
330.0000000
800.0000000
SATW
SATW
150
562.6000000
80.0874522
350.0000000
770.0000000
HSM
HSM
150
8.5866667
1.4617571
2.0000000
10.0000000
HSS
HSS
150
8.8000000
1.3951017
4.0000000
10.0000000
HSE
HSE
150
8.8333333
1.2660601
4.0000000
10.0000000
Done
the mean. Similarly, the minimum value for GPA is 3.43 standard deviations below the mean. We do not discard either of these cases at this time but will take care in our subsequent analyses to see if they have an excessive influence on our results. The mean for the SATM score is higher than the means for the Critical Reading (SATCR) and Writing (SATW) scores, as we might expect for a group of science majors. The three SAT standard deviations are all about the same. Although mathematics scores were higher on the SAT, the means and standard deviations of the three high school grade variables are very similar. Since the level and difficulty of high school courses vary within and across schools, this may not be that surprising. The mean GPA is 2.842 on a 4-point scale, with standard deviation 0.818. Because the variables GPA, SATM, SATCR, and SATW have many possible values, we could use stemplots or histograms to examine the shapes of their distributions. Normal quantile plots indicate whether or not the distributions look Normal. It is important to note that the multiple regression model does not require any of these distributions to be Normal. Only the deviations of the responses y from their means are assumed to be Normal. The purpose of examining these plots is to understand something about each variable alone before attempting to use it in a complicated model. Extreme values of any variable should be noted and checked for accuracy. If found to be correct, the cases with these values should be carefully examined to see if they are truly exceptional and perhaps do not belong in the same analysis with the other cases. When our data on science majors are examined in this way, no obvious problems are evident. The high school grade variables HSM, HSS, and HSE have relatively few values and are best summarized by giving the relative frequencies for each possible value. The output in Figure 11.3 provides these summaries. The distributions are all skewed, with a large proportion of high grades (10 5 A and 9 5 A2.) Again we emphasize that these distributions need not be Normal.
11.2 A Case Study FIGURE 11.3 The distributions of the high school grade variables.
621
SAS
HSM
HSM
Frequency
Percent
Cumulative Frequency
Cumulative Percent
2
1
0.67
1
0.67
5
2
1.33
3
2.00
6
13
8.67
16
10.67
7
14
9.33
30
20.00
8
35
23.33
65
43.33
9
30
20.00
95
63.33
10
55
36.67
150
100.00
HSS
HSS
Frequency
Percent
Cumulative Frequency
Cumulative Percent
4
2
1.33
2
1.33
5
3
2.00
5
3.33
6
4
2.67
9
6.00
7
20
13.33
29
19.33
8
19
12.67
48
32.00
9
39
26.00
87
58.00
10
63
42.00
150
100.00
HSE
HSE
Frequency
Percent
Cumulative Frequency
Cumulative Percent
4
1
0.67
1
0.67
5
1
0.67
2
1.33
6
7
4.67
9
6.00
7
13
8.67
22
14.67
8
28
18.67
50
33.33
9
41
27.33
91
60.67
10
59
39.33
150
100.00
Done
Relationships between pairs of variables LOOK BACK correlation, p. 103
The second step in our analysis is to examine the relationships between all pairs of variables. Scatterplots and correlations are our tools for studying twovariable relationships. The correlations appear in Figure 11.4. The output includes the P-value for the test of the null hypothesis that the population correlation is 0 versus the two-sided alternative for each pair. Thus, we see that
622
CHAPTER 11
•
Multiple Regression
FIGURE 11.4 Correlations among the case study variables.
Minitab
Results for: data Correlations: GPA, HSM, HSS, SATM, SATCR, SATW GPA
HSM
HSS
HSE
HSM
0.420 0.000
HSS
0.443 0.670 0.000 0.000
HSE
0.359 0.485 0.695 0.000 0.000 0.000
SATM
0.330 0.325 0.215 0.134 0.000 0.000 0.008 0.102
SATM SATCR
SATCR 0.251 0.150 0.215 0.259 0.579 0.002 0.067 0.008 0.001 0.000 SATW
0.223 0.072 0.161 0.185 0.551 0.734 0.006 0.383 0.048 0.023 0.000 0.000
Cell Contents:Pearson correlation P-Value
Current Worksheet: data
the correlation between GPA and HSM is 0.42, with a P-value of 0.000 (that is, P ⬍ 0.00052, whereas the correlation between GPA and SATW is 0.22, with a P-value of 0.006. Because of the large sample size, even somewhat weak associations are found to be statistically significant. As we might expect, math and science grades have the highest correlation with GPA 1r ⫽ 0.42 and r ⫽ 0.442, followed by English grades (0.36) and then SAT Mathematics (0.33). SAT Critical Reading (SATCR) and SAT Writing (SATW) have comparable, somewhat weak, correlations with GPA. On the other hand, SATCR and SATW have a high correlation with each other (0.73). The high school grades also correlate well with each other (0.49 to 0.70). SATM correlates well with the other SAT scores (0.58 and 0.55), somewhat with HSM (0.32), less with HSS (0.22), and poorly with HSE (0.13). SATCR and SATW do not correlate well with any of the high school grades (0.07 to 0.26). It is important to keep in mind that by examining pairs of variables we are seeking a better understanding of the data. The fact that the correlation of a particular explanatory variable with the response variable does not achieve statistical significance does not necessarily imply that it will not be a useful (and statistically significant) predictor in a multiple regression. Numerical summaries such as correlations are useful, but plots are generally more informative when seeking to understand data. Plots tell us whether the numerical summary gives a fair representation of the data. For a multiple regression, each pair of variables should be plotted. For the seven variables in our case study, this means that we should examine 21 plots. In general, there are p ⫹ 1 variables in a multiple regression analysis with p explanatory variables, so that p1 p ⫹ 12兾2 plots are required. Multiple regression is a complicated procedure. If we do not do the necessary preliminary work, we are in serious danger of producing useless or misleading results. We leave the task of making these plots as an exercise.
11.2 A Case Study
623
USE YOUR KNOWLEDGE DATA GPA
11.5 Pairwise relationships among variables in the GPA data set. Using a statistical package, generate the pairwise correlations and scatterplots discussed previously. Comment on any unusual patterns or observations.
Regression on high school grades CHALLENGE
To explore the relationship between the explanatory variables and our response variable GPA, we run several multiple regressions. The explanatory variables fall into two classes. High school grades are represented by HSM, HSS, and HSE, and standardized tests are represented by the three SAT scores. We begin our analysis by using the high school grades to predict GPA. Figure 11.5 gives the multiple regression output. The output contains an ANOVA table, some additional descriptive statistics, and information about the parameter estimates. When examining any ANOVA table, it is a good idea to first verify the degrees of freedom. This ensures that we have not made some serious error in specifying the model for the software or in entering the data. Because there are n ⫽ 150 cases, we have DFT ⫽ n ⫺ 1 ⫽ 149. The three explanatory variables give DFM ⫽ p ⫽ 3 and DFE ⫽ n ⫺ p ⫺ 1 ⫽ 150 ⫺ 3 ⫺ 1 ⫽ 146. The ANOVA F statistic is 14.35, with a P-value of ⬍ 0.0001. Under the null hypothesis H0: b1 ⫽ b2 ⫽ b3 ⫽ 0 the F statistic has an F13, 1462 distribution. According to this distribution, the chance of obtaining an F statistic of 14.35 or larger is less than 0.0001. We therefore conclude that at least one of the three regression coefficients for the high school grades is different from 0 in the population regression equation. In the descriptive statistics that follow the ANOVA table we find that Root MSE is 0.726. This value is the square root of the MSE given in the ANOVA table and is s, the estimate of the parameter s of our model. The value of R2 is 0.23. That is, 23% of the observed variation in the GPA scores is explained by linear regression on high school grades. Although the P-value of the F test is very small, the model does not explain very much of the variation in GPA. Remember, a small P-value does not necessarily tell us that we have a strong predictive relationship, particularly when the sample size is large. From the Parameter Estimates section of the computer output we obtain the fitted regression equation
k
GPA ⫽ 0.069 ⫹ 0.123HSM ⫹ 0.136HSS ⫹ 0.058HSE Let’s find the predicted GPA for a student with an A– average in HSM, B1 in HSS, and B in HSE. The explanatory variables are HSM ⫽ 9, HSS ⫽ 8, and HSE ⫽ 7. The predicted GPA is
k
GPA ⫽ 0.069 ⫹ 0.123192 ⫹ 0.136182 ⫹ 0.058172 ⫽ 2.67
624
CHAPTER 11
•
Multiple Regression
FIGURE 11.5 Multiple regression output for regression using high school grades to predict GPA.
SAS Number of Observations Read
150
Number of Observations Used
150
Analysis of Variance Source
DF
Sum of Squares
Mean Square
F Value
Model
3
22.69989
7.56663
14.35
Error
146
76.97503
0.52723
Corrected Total
149
99.67492
Pr > F |t|
Done
Recall that the t statistics for testing the regression coefficients are obtained by dividing the estimates by their standard errors. Thus, for the coefficient of HSM we obtain the t-value given in the output by calculating t⫽
b 0.12325 ⫽ ⫽ 2.25 SEb 0.05485
The P-values appear in the last column. Note that these P-values are for the two-sided alternatives. HSM has a P-value of 0.0262, and we conclude that the regression coefficient for this explanatory variable is significantly different from 0. The P-values for the other explanatory variables (0.0536 for HSS and 0.3728 for HSE) do not achieve statistical significance.
Interpretation of results The significance tests for the individual regression coefficients seem to contradict the impression obtained by examining the correlations in Figure 11.4. In that display we see that the correlation between GPA and HSS is 0.44 and the correlation between GPA and HSE is 0.36. The P-values for both of these correlations are ⬍ 0.0005. In other words, if we used HSS alone in a regression to predict GPA, or if we used HSE alone, we would obtain statistically significant regression coefficients. This phenomenon is not unusual in multiple regression analysis. Part of the explanation lies in the correlations between HSM and the other two
11.2 A Case Study
625
explanatory variables. These are rather high (at least compared with most other correlations in Figure 11.4). The correlation between HSM and HSS is 0.67, and that between HSM and HSE is 0.49. Thus, when we have a regression model that contains all three high school grades as explanatory variables, there is considerable overlap of the predictive information contained in these variables. The significance tests for individual regression coefficients assess the significance of each predictor variable assuming that all other predictors are included in the regression equation. Given that we use a model with HSM and HSS as predictors, the coefficient of HSE is not statistically significant. Similarly, given that we have HSM and HSE in the model, HSS does not have a significant regression coefficient. HSM, however, adds significantly to our ability to predict GPA even after HSS and HSE are already in the model. Unfortunately, we cannot conclude from this analysis that the pair of explanatory variables HSS and HSE contribute nothing significant to our model for predicting GPA once HSM is in the model. Questions like these require fitting additional models. The impact of relations among the several explanatory variables on fitting models for the response is the most important new phenomenon encountered in moving from simple linear regression to multiple regression. In this chapter, we can only illustrate some of the many complicated problems that can arise.
Residuals As in simple linear regression, we should always examine the residuals as an aid to determining whether the multiple regression model is appropriate for the data. Because there are several explanatory variables, we must examine several residual plots. It is usual to plot the residuals versus the predicted values yˆ and also versus each of the explanatory variables. Look for outliers, influential observations, evidence of a curved (rather than linear) relation, and anything else unusual. Again, we leave the task of making these plots as an exercise. The plots all appear to show more or less random noise above and below the center value of 0. If the deviations e in the model are Normally distributed, the residuals should be Normally distributed. Figure 11.6 presents a Normal quantile plot and histogram of the residuals. Both suggest some skewness (shorter right tail) in the distribution. However, given our large sample size, we do not think this skewness is strong enough to invalidate this analysis.
USE YOUR KNOWLEDGE DATA GPA
11.6 Residual plots for the GPA analysis. Using a statistical package, fit the linear model with HSM and HSE as predictors and obtain the residuals and predicted values. Plot the residuals versus the predicted values, HSM, and HSE. Are the residuals more or less randomly dispersed around zero? Comment on any unusual patterns.
Refining the model CHALLENGE
Because the variable HSE has the largest P-value of the three explanatory variables (see Figure 11.5) and therefore appears to contribute the least to our explanation of GPA, we rerun the regression using only HSM and HSS as explanatory
626
CHAPTER 11
•
Multiple Regression
FIGURE 11.6 (a) Normal
2
quantile plot and (b) histogram of the residuals from the high school grades model. There are no important deviations from Normality.
(a)
Residual
1 0 –1 –2 –3 –3 40
–2
–1
0 Normal score
1
2
3
(b)
Percent
30
20
10
0 –2.0
–1.5
–1.0
–0.5 0.0 Residual
0.5
1.0
1.5
variables. The SAS output appears in Figure 11.7. The F statistic indicates that we can reject the null hypothesis that the regression coefficients for the two explanatory variables are both 0. The P-value is still ⬍0.0001. The value of R2 has dropped very slightly compared with our previous run, from 0.2277 to 0.2235. Thus, dropping HSE from the model resulted in the loss of very little explanatory power. The measure s of variation about the fitted equation (Root MSE in the printout) is nearly identical for the two regressions, another indication that we lose very little when we drop HSE. The t statistics for the individual regression coefficients indicate that HSM is still significant 1P ⫽ 0.02402, while the statistic for HSS is larger than before (2.99 versus 1.95) and is now statistically significant 1P ⫽ 0.00322. Comparison of the fitted equations for the two multiple regression analyses tells us something more about the intricacies of this procedure. For the first run we have
k
GPA ⫽ 0.069 ⫹ 0.123HSM ⫹ 0.136HSS ⫹ 0.058HSE whereas the second gives us
k
GPA ⫽ 0.257 ⫹ 0.125HSM ⫹ 0.172HSS
11.2 A Case Study
627
Eliminating HSE from the model changes the regression coefficients for all the remaining variables and the intercept. This phenomenon occurs quite generally in multiple regression. Individual regression coefficients, their standard errors, and significance tests are meaningful only when interpreted in the context of the other explanatory variables in the model.
FIGURE 11.7 Multiple
SAS
regression output for regression using HSM and HSS to predict GPA.
Analysis of Variance Sum of Squares
Source
DF
Model
2
Mean Square
F Value
Pr > F
22.27859 11.13930
21.16
|t|
Intercept
1
0.25696
0.40189
0.64
0.5236
HSM
HSM
1
0.12498
0.05478
2.28
0.0240
HSS
HSS
1
0.17182
0.05740
2.99
0.0032
Variable
Label
Intercept
Done
Regression on SAT scores We now turn to the problem of predicting GPA using the three SAT scores. Figure 11.8 gives the output. The fitted model is
k
GPA ⫽ 0.45797 ⫹ 0.00301SATM ⫹ 0.00080SATCR ⫹ 0.00008SATW The degrees of freedom are as expected: 3, 146, and 149. The F statistic is 6.28, with a P-value of 0.0005. We conclude that the regression coefficients for SATM, SATCR, and SATW are not all 0. Recall that we obtained the P-value ⬍ 0.0001 when we used high school grades to predict GPA. Both multiple regression equations are highly significant, but this obscures the fact that the two models have quite different explanatory power. For the SAT regression, R2 ⫽ 0.1143, whereas for the high school grades model even with only HSM and HSS (Figure 11.7), we have R2 ⫽ 0.2235, a value almost twice as large. Stating that we have a statistically significant result is quite different from saying that an effect is large or important. Further examination of the output in Figure 11.8 reveals that the coefficient of SATM is significant 1t ⫽ 2.81, P ⫽ 0.00562, and that SATCR 1t ⫽ 0.71, P ⫽ 0.47672 and SATW 1t ⫽ 0.07, P ⫽ 0.94792 are not. For a complete analysis we should carefully examine the residuals. Also, we might want to run the analysis without SATW and the analysis with SATM as the only explanatory variable.
628
CHAPTER 11
•
Multiple Regression
FIGURE 11.8 Multiple regression output for regression using SAT scores to predict GPA.
SAS Analysis of Variance Sum of Mean DF Squares Square
Source Model
3
11.38939 3.79646
Error
146
88.28553 0.60470
Corrected Total
149
99.67492
F Value
Pr > F
6.28
0.0005
Root MSE
0.77762
R-Square 0.1143
Dependent Mean
2.84213
Adj R-Sq 0.0961
Coeff Var
27.36049
Parameter Estimates Parameter Standard Variable Label DF Estimate Error Intercept Intercept 1 0.45797 0.56657
t Value
Pr > |t|
0.81 0.4202
SATM
SATM
1
0.00301
0.00107
2.81 0.0056
SATCR SATW
SATCR SATW
1 0.00080324 1 0.00007882
0.00113 0.00120
0.71 0.4767 0.07 0.9479
Done
Regression using all variables We have seen that fitting a model using either the high school grades or the SAT scores results in a highly significant regression equation. The mathematics component of each of these groups of explanatory variables appears to be a key predictor. Comparing the values of R2 for the two models indicates that high school grades are better predictors than SAT scores. Can we get a better prediction equation using all the explanatory variables together in one multiple regression? To address this question we run the regression with all six explanatory variables. The output from SAS, Minitab, and Excel appears in Figure 11.9. Although the format and organization of outputs differ among software packages, the basic results that we need are easy to find. The degrees of freedom are as expected: 6, 143, and 149. The F statistic is 8.95, with a P-value ⬍ 0.0001, so at least one of our explanatory variables has a nonzero regression coefficient. This result is not surprising, given that we have already seen that HSM and SATM are strong predictors of GPA. The value of R2 is 0.2730, which is about 0.05 higher than the value of 0.2235 that we found for the high school grades regression. Examination of the t statistics and the associated P-values for the individual regression coefficients reveals a surprising result. None of the variables are significant! At first, this result may appear to contradict the ANOVA results. How can the model explain over 27% of the variation and have t tests that suggest none of the variables make a significant contribution? Once again it is important to understand that these t tests assess the contribution of each variable when it is added to a model that already has the other five explanatory variables. This result does not necessarily mean that the regression coefficients for the six explanatory variables are all 0. It simply means that the contribution of each variable overlaps considerably with the contribution of the other five variables already in the model.
11.2 A Case Study FIGURE 11.9 Multiple regression output for regression using all variables to predict GPA.
SAS Analysis of Variance Sum of Mean DF Squares Square
Source Model
6
27.21030 4.53505
Error
143
72.46462 0.50675
Corrected Total
149
99.67492
F Value
Pr > F
8.95
|t| –1.93 0.0562
SATM
SATM
1
0.00199
0.00106
1.88 0.0619
SATCR SATW HSM HSS HSE
SATCR SATW HSM HSS HSE
1 0.00015701 1 0.00047398 1 0.09148 1 0.13010 1 0.05679
0.00105 0.00112 0.05718 0.06877 0.06568
0.15 0.42 1.60 1.89 0.86
0.8813 0.6719 0.1119 0.0605 0.3887
Test SAT Results for Dependent Variable GPA Mean Source DF Square F Value Pr > F 3
1.50347
Denominator 143
0.50675
Numerator
2.97 0.0341
Test HS Results for Dependent Variable GPA Mean Source DF Square F Value Pr > F Numerator
3
5.27364
Denominator 143
0.50675
10.41 F
2.89 0.0362
12.2 Comparing the Means FIGURE 12.11 Continued
675
Minitab
One-way ANOVA: Score versus Group DF
SS
MS
F
P
Group
3
24.42
8.14
2.89
0.036
Error
218
613.14
2.81
Total
221
637.56
Source
S = 1.677
Level Blue Brown Down Green
N 67 37 41 77
R-Sq = 3.83%
Mean 3.194 3.724 3.107 3.860
StDev 1.755 1.715 1.525 1.666
Pooled StDev = 1.677
R-Sq (adj) = 2.51% Individual 95% CIs For Mean Based on Pooled StDev ---------+---------+---------+---------+(---------*---------) (---------*---------) (---------*---------) (---------*---------*) ---------+---------+---------+---------+3.00 3.50 4.00 4.50
USE YOUR KNOWLEDGE 12.7 Why no multiple comparisons? Any pooled two-sample t problem can be run as a one-way ANOVA with I ⫽ 2. Explain why it is inappropriate to analyze the data using contrasts or multiple-comparisons procedures in this setting. 12.8 Growth of Douglas fir seedlings. An experiment was conducted to compare the growth of Douglas fir seedlings under three different levels of vegetation control (0%, 50%, and 100%). Twenty seedlings were randomized to each level of control. The resulting sample means for stem volume were 53, 76, and 110 cubic centimeters (cm3), respectively, with sp ⫽ 28 cm3. The researcher hypothesized that the average growth at 50% control would be less than the average of the 0% and 100% levels. (a) What are the coefficients for testing this contrast? (b) Perform the test and report the test statistic, degrees of freedom, and P-value. Do the data provide evidence to support this hypothesis?
Power
LOOK BACK power, p. 477
Recall that the power of a test is the probability of rejecting H0 when Ha is in fact true. Power measures how likely a test is to detect a specific alternative. When planning a study in which ANOVA will be used for the analysis, it is important to perform power calculations to check that the sample sizes are adequate to detect differences among means that are judged to be important. Power calculations also help evaluate and interpret the results of studies in which H0 was not rejected. We sometimes find that the power of the test was so low against reasonable alternatives that there was little chance of obtaining a significant F. In Chapter 7 we found the power for the two-sample t test. One-way ANOVA is a generalization of the two-sample t test, so it is not surprising that the procedure for calculating power is quite similar. Here are the steps that are needed: 1. Specify (a) an alternative (Ha) that you consider important; that is, values for the true population means m1, m2, . . . , mI;
676
CHAPTER 12
•
One-Way Analysis of Variance (b) sample sizes n1, n2, . . . , nI; usually these will all be equal to the common value n; (c) a level of significance a, usually equal to 0.05; and (d) a guess at the standard deviation s. 2. Use the degrees of freedom DFG ⫽ I ⫺ 1 and DFE ⫽ N ⫺ I to find the critical value that will lead to the rejection of H0. This value, which we denote by F*, is the upper a critical value for the F1DFG, DFE2 distribution.
noncentrality parameter
noncentral F distribution
3. Calculate the noncentrality parameter5 g ni 1mi ⫺ m2 2 l⫽ s2 where m is a weighted average of the group means ni m ⫽ g mi N 4. Find the power, which is the probability of rejecting H0 when the alternative hypothesis is true, that is, the probability that the observed F is greater than F*. Under Ha, the F statistic has a distribution known as the noncentral F distribution. SAS, for example, has a function for this distribution. Using this function, the power is Power ⫽ 1 ⫺ PROBF1F*, DFG, DFE, l2 Note that, if the ni are all equal to the common value n, then m is the ordinary average of the mi and n g 1mi ⫺ m2 2 l⫽ s2 If the means are all equal (the ANOVA H0), then l ⫽ 0. The noncentrality parameter measures how unequal the given set of means is. Large l points to an alternative far from H0, and we expect the ANOVA F test to have high power. Software makes calculation of the power quite easy, but tables and charts are also available.
EXAMPLE 12.26 Power of a reading comprehension study. Suppose that a study on reading comprehension for three different teaching methods has 10 students in each group. How likely is this study to detect differences in the mean responses that would be viewed as important? A previous study performed in a different setting found sample means of 41, 47, and 44, and the pooled standard deviation was 7. Based on these results, we will use m1 ⫽ 41, m2 ⫽ 47, m3 ⫽ 44, and s ⫽ 7 in a calculation of power. The ni are equal, so m is simply the average of the mi: 41 ⫹ 47 ⫹ 44 m⫽ ⫽ 44 3 The noncentrality parameter is therefore l⫽
n g 1mi ⫺ m2 2
s2 1102 3 141 ⫺ 442 2 ⫹ 147 ⫺ 442 2 ⫹ 144 ⫺ 442 2 4 ⫽ 49 1102 1182 ⫽ ⫽ 3.67 49
12.2 Comparing the Means
677
Because there are three groups with 10 observations per group, DFG 5 2 and DFE 5 27. The critical value for ␣ ⫽ 0.05 is F* 5 3.35. The power is therefore 1 ⫺ PROBF13.35, 2, 27, 3.672 ⫽ 0.3486 The chance that we reject the ANOVA H0 at the 5% significance level is only about 35%. If the assumed values of the mi in this example describe differences among the groups that the experimenter wants to detect, then we would want to use more than 10 subjects per group.
EXAMPLE 12.27 Changing the sample size. To decide on an appropriate sample size for the experiment described in the previous example, we repeat the power calculation for different values of n, the number of subjects in each group. Here are the results: n
DFG
DFE
F*
l
Power
20
2
57
3.16
7.35
0.65
30
2
87
3.10
11.02
0.84
40
2
117
3.07
14.69
0.93
50
2
147
3.06
18.37
0.97
100
2
297
3.03
36.73
F
296.35 F
group
1
168432.0800
168432.0800
696.65
Z
0.0303
Two-Sided Pr > |Z|
0.0606
t Approximation One-Sided Pr > Z
0.0514
Two-Sided Pr > |Z|
0.1027
Exact Test One-Sided Pr ≥ S
0.0286
Two-Sided Pr ≥ |S – Mean|
0.0571
Z includes a continuity correction of 0.5.
Done
LOOK BACK two-sample t test, p. 454
It is worth noting that the two-sample t test for the one-sided alternative gives essentially the same result as the Wilcoxon test in Example 15.3 (t ⫽ 2.95, P ⫽ 0.016).
The Normal approximation The rank sum statistic W becomes approximately Normal as the two sample sizes increase. We can then form yet another z statistic by standardizing W: z⫽ ⫽
LOOK BACK continuity correction, p. 335
W ⫺ mW sW W ⫺ n1 1N ⫹ 12兾2 2n1n2 1N ⫹ 12兾12
Use standard Normal probability calculations to find P-values for this statistic. Because W takes only whole-number values, the continuity correction improves the accuracy of the approximation.
15-8
CHAPTER 15
•
Nonparametric Tests
EXAMPLE 15.4 The continuity correction. The standardized rank sum statistic W in our baseball example is z⫽
W ⫺ mW 25 ⫺ 18 ⫽ ⫽ 2.02 sW 3.464
We expect W to be larger when the alternative hypothesis is true, so the approximate P-value is P1Z ⱖ 2.022 ⫽ 0.0217 The continuity correction acts as if the whole number 25 occupies the entire interval from 24.5 to 25.5. We calculate the P-value P1W ⱖ 252 as P1W ⱖ 24.52 because the value 25 is included in the range whose probability we want. Here is the calculation: P1W ⱖ 24.52 ⫽ P a
W ⫺ mW 24.5 ⫺ 18 ⱖ b sW 3.464
⫽ P1Z ⱖ 1.8762 ⫽ 0.303 The continuity correction gives a result closer to the exact value P ⫽ 0.0286 (see Figure 15.2).
USE YOUR KNOWLEDGE DATA SPAS
DATA SPAS2
15.5 The P-value for top spas. Refer to Exercises 15.1 and 15.3 (pages 15-4 and 15-6). Find mW, sW, and the standardized rank sum statistic. Then give an approximate P-value using the Normal approximation. What do you conclude? 15.6 The effect of Animal Kingdom on the P-value. Refer to Exercises 15.2 and 15.4 (pages 15-4 and 15-6). Repeat the analysis in Exercise 15.5 using the altered data.
CHALLENGE CHALLENGE
We recommend always using either the exact distribution (from software or tables) or the continuity correction for the rank sum statistic W. The exact distribution is safer for small samples. As Example 15.4 illustrates, however, the Normal approximation with the continuity correction is often adequate.
EXAMPLE
Mann-Whitney test
15.5 Software output. Figure 15.3 shows the output for our data from two additional statistical programs. Minitab gives the Normal approximation, and it refers to the Mann-Whitney test. This is an alternative form of the Wilcoxon rank sum test. SPSS uses the exact calculation for the P-value here but tests the null hypothesis only against the two-sided alternative.
15.1 The Wilcoxon Rank Sum Test FIGURE 15.3 Output from the Minitab and SPSS statistical software for the data in Example 15.1. (a) Minitab uses the Normal approximation for the distribution of W. (b) SPSS gives the exact value for the two-sided alternative.
15-9
Minitab
Mann-Whitney Test and CI: HitsAmer, HitsNat N
Median
HitsAmer
4
20.500
HitsNat
4
12.000
Point estimate for ETA1–ETA2 is 8.500 97.0 Percent CI for ETA1–ETA2 is (–1.001,17.001) W = 25.0 Test of ETA1 = ETA2 vs ETA1 > ETA2 is significant at 0.0303
(a) Minitab *Output1 - IBM SPSS Statistics Viewer Nonparametric Tests Hypothesis Test Summary Null Hypothesis
Test
IndependentSamples The distribution of Hits is the same Mann1 across categories of LeagueN. Whitney U Test
Sig.
.0571
Decision Retain the null hypothesis.
Asymptotic significances are displayed. The significance level is .05. 1Exact
significance is displayed for this test.
IBM SPSS Statistics Processor is ready
H: 22, W: 533 pt.
(b) SPSS
What hypotheses does Wilcoxon test? Our null hypothesis is that the distribution of hits is the same in the two leagues. Our alternative hypothesis is that there are more hits in the American League than in the National League. If we are willing to assume that hits are Normally distributed, or if we have reasonably large samples, we use the twosample t test for means. Our hypotheses then become H0: m1 ⫽ m2 Ha: m1 ⬎ m2
15-10
CHAPTER 15
•
Nonparametric Tests When the distributions may not be Normal, we might restate the hypotheses in terms of population medians rather than means: H0: median1 ⫽ median2 Ha: median1 ⬎ median2 The Wilcoxon rank sum test does test hypotheses about population medians, but only if an additional assumption is met: both populations must have distributions of the same shape. That is, the density curve for hits in the American League must look exactly like that for the National League except that it may be shifted to the left or to the right. The Minitab output in Figure 15.3(a) states the hypotheses in terms of population medians (which it calls “ETA”) and also gives a confidence interval for the difference between the two population medians. The same-shape assumption is too strict to be reasonable in practice. Recall that our preferred version of the two-sample t test does not require that the two populations have the same standard deviation—that is, it does not make a same-shape assumption. Fortunately, the Wilcoxon test also applies in a much more general and more useful setting. It tests hypotheses that we can state in words as H0: The two distributions are the same. Ha: One distribution has values that are systematically larger.
systematically larger
Here is a more exact statement of the systematically larger alternative hypothesis. Take X1 to be hits in the American League and X2 to be hits in the National League. These hits are random variables. That is, for each game in the American League, the number of hits is a value of the variable X1. The probability that the number of hits is more than 15 is P1X1 ⬎ 152. Similarly, P1X2 ⬎ 152 is the corresponding probability for the National League. If the number of American League hits is “systematically larger” than the number of National League hits, getting more hits than 15 should be more likely in the American League. That is, we should have P1X1 ⬎ 152 ⬎ P1X2 ⬎ 152 The alternative hypothesis says that this inequality holds not just for 15 hits but for any number of hits.2 This exact statement of the hypotheses we are testing is a bit awkward. The hypotheses really are “nonparametric” because they do not involve any specific parameter such as the mean or median. If the two distributions do have the same shape, the general hypotheses reduce to comparing medians. Many texts and computer outputs state the hypotheses in terms of medians, sometimes ignoring the same-shape requirement. We recommend that you express the hypotheses in words rather than symbols. “The number of American League hits is systematically higher than the number of National League hits” is easy to understand and is a good statement of the effect that the Wilcoxon test looks for.
Ties The exact distribution for the Wilcoxon rank sum is obtained assuming that all observations in both samples take different values. This allows us to rank them all. In practice, however, we often find observations tied at the same value.
15.1 The Wilcoxon Rank Sum Test average ranks
15-11
What shall we do? The usual practice is to assign all tied values the average of the ranks they occupy. Here is an example with six observations: Observation Rank
153
155
158
158
161
164
1
2
3.5
3.5
5
6
The tied observations occupy the third and fourth places in the ordered list, so they share rank 3.5. The exact distribution for the Wilcoxon rank sum W changes if the data contain ties. Moreover, the standard deviation W must be adjusted if ties are present. The Normal approximation can be used after the standard deviation is adjusted. Statistical software will detect ties, make the necessary adjustment, and switch to the Normal approximation. In practice, software is required if you want to use rank tests when the data contain tied values. It is sometimes useful to use rank tests on data that have very many ties because the scale of measurement has only a few values. Here is an example.
EXAMPLE 15.6 Exergaming in Canada. Exergames are active video games such as rhythmic dancing games, virtual bicycles, balance board simulators, and virtual sports simulators that require a screen and a console. A study of exergaming in students from grades 10 and 11 in Montreal, Canada, examined many factors related to participation in exergaming.3 In Exercise 14.23 (page 14-22) we used logistic regression to examine the relationship between exergaming and time spent viewing television. Here are the data displayed in a two-way table of counts: TV time (hours per day) Exergamer DATA
None
Some but less than 2 hours
2 hours or more
Yes
6
160
115
No
48
616
255
EXERG
USE YOUR KNOWLEDGE DATA CHALLENGE
EXERG
LOOK BACK chi-square test, p. 539
15.7 Analyze as a two-way table. Analyze the exergaming data in Example 15.6 as a two-way table. (a) Compute the percents in the three categories of TV watching for the exergamers. Do the same for those who are not exergamers. Display the percents graphically and summarize the differences in the two distributions.
CHALLENGE
(b) Perform the chi-square test for the counts in the two-way table. Report the test statistic, the degrees of freedom, and the P-value. Give a brief summary of what you can conclude from this significance test.
15-12
CHAPTER 15
•
Nonparametric Tests How do we approach the analysis of these data using the Wilcoxon test? We start with the hypotheses. We have two distributions of TV viewing, one for the exergamers and one for those who are not exergamers. The null hypothesis states that these two distributions are the same. The alternative hypothesis uses the fact that the responses are ordered from no TV to 2 hours or more per day. It states that one of the exerciser groups watches more TV than the other. H0: The amount of time spent viewing TV is the same for students who are exergamers and students who are not. Ha: One of the two groups views more TV than the other. The alternative hypothesis is two-sided. Because the responses can take only three values, there are very many ties. All 54 students who watch no TV are tied. Similarly, all students in each of the other two columns of the table are tied. The graphical display that you prepared in Exercise 15.7 suggests that the exergamers watch more TV than those who are not exergamers. Is this difference statistically significant?
EXAMPLE DATA EXERG
CHALLENGE
15.7 Software output. Look at Figure 15.4, which gives SAS output for the Wilcoxon test. The rank sum for the exergamers (using average ranks for ties) is W ⫽ 187, 747.5. The expected rank sum under the null hypothesis is 168,740.5, so the exergamers have a higher rank sum than we would expect. The Normal approximation test statistic is z ⫽ 4.47 and the two-sided P-value is reported as P ⬍ 0.0001. There is very strong evidence of a difference. Exergamers watch more TV than the students who are not exergamers. We can use our framework of “systematically larger” (page 15-10) to summarize these data. For the exergamers, 98% watch some TV and 41% watch two or more hours per day. The corresponding percents for the students who are not exergamers are 95% and 28%.
In our discussion of TV viewing and exergaming, we have expressed results in terms of the amount of TV watched. In fact, we do not have the actual hours of TV watched by each student in the study. Only data with the hours classified into three groups are available. Many government surveys summarize quantitative data categorized into ranges of values. When summarizing the analysis of data, it is very important to explain clearly how the data are recorded. In this setting, we have chosen to use phrases such as “watch more TV” because they express the findings based on the data available. Note that the two-sample t test would not be appropriate in this setting. If we coded the TV-watching categories as 1, 2, and 3, the average of these coded values would not be meaningful. On the other hand, we frequently encounter variables measured in scales such as “strongly agree,” “agree,” “neither agree nor disagree,” “disagree,” and “strongly disagree.” In these circumstances, many would code the responses with the integers 1 to 5 and then use standard methods such as a t test or ANOVA. Whether to do this or not is a matter of judgment. Rank tests avoid
15.1 The Wilcoxon Rank Sum Test FIGURE 15.4 Output from SAS for the exergaming data, for Example 15.7.
15-13
SAS
The SAS System The NPAR1WAY Procedure Wilcoxon Scores (Rank Sums) for Variable TVN Classified by Variable Exergame Exergame
N
Sum of Scores
Expected Under H0
Std Dev Under H0
Mean Score
Yes
281
187747.50 168740.50
4253.97554
668.140569
No
919
532852.50 551859.50
4253.97554
579.817737
Average scores were used for ties. Wilcoxon Two-Sample Test Statistic (S)
187747.5000
Normal Approximation Z
4.4679
One-Sided Pr > Z
|Z|
Z
|Z|
Z
|Z|
Z
|Z|
0.000000
DiffLow
N 5
N* 5
N for test 5
Wilcoxon Statistic 9.0
P 0.394
Estimated Median 0.1000
(a) Minitab *Output1 - IBM SPSS Statistics Viewer Nonparametric Tests Hypothesis Test Summary Null Hypothesis
Test
One-Sample 1 The median of DiffLow equals 0.00. Wilcoxon Signed Rank Test
Sig. .686
Decision Retain the null hypothesis.
Asymptotic significances are displayed. The significance level is .05.
IBM SPSS Statistics Processor is ready
(b) SPSS (Continued )
15-22
CHAPTER 15
•
FIGURE 15.7 (Continued )
Nonparametric Tests
SAS
Tests for Location: Mu0=0 Test
Statistic
p Value
Student’s t
t
0.634979
Sign
M
– 0.5
Signed Rank
S
1.5
Pr > |t|
0.5599
Pr ≥ |M|
1.0000
Pr ≥ |S|
0.8125
(c) SAS
variable. The two-sided alternative is used. The test statistic for the signed rank test is given as S ⫽ 1.5. This quantity is W ⫹ minus its expected value mW ⫹ ⫽ 7.5, S ⫽ W ⫹ ⫺ mW ⫹ . The P-value is given as P ⫽ 0.8125. Results reported in the three outputs lead us to the same qualitative conclusion: the data do not provide evidence to support the idea that the Story 2 scores are higher than (or not equal to) the Story 1 scores. Different methods and approximations are used to compute the P-values. With larger sample sizes, we would not expect so much variation in the P-values. Note that the t test results reported in SAS also give the same conclusion, P ⫽ 0.5599. When the sampling distribution of a test statistic is symmetric, we can use output that gives a P-value for a two-sided alternative to compute a P-value for a one-sided alternative. Check that the effect is in the direction specified by the one-sided alternative and then divide the P-value by 2.
The Normal approximation The distribution of the signed rank statistic when the null hypothesis (no difference) is true becomes approximately Normal as the sample size becomes large. We can then use Normal probability calculations (with the continuity correction) to obtain approximate P-values for W ⫹ . Let’s see how this works in the storytelling example, even though n ⫽ 5 is certainly not a large sample.
EXAMPLE 15.10 The Normal approximation. For n ⫽ 5 observations, we saw in
Example 15.9 that mW ⫹ ⫽ 7.5. The standard deviation of W ⫹ under the null hypothesis is n1n ⫹ 12 12n ⫹ 12 sW ⫹ ⫽ B 24 ⫽
152 162 1112 B 24
⫽ 213.75 ⫽ 3.708
15.2 The Wilcoxon Signed Rank Test
15-23
The continuity correction calculates the P-value P1W ⫹ ⱖ 92 as P1W ⫹ ⱖ 8.52, treating the value W ⫹ ⫽ 9 as occupying the interval from 8.5 to 9.5. We find the Normal approximation for the P-value by standardizing and using the standard Normal table: P1W ⫹ ⱖ 8.52 ⫽ P a
W ⫹ ⫺ 7.5 8.5 ⫺ 7.5 ⱖ b 3.708 3.708
⫽ P1Z ⱖ 0.272 ⫽ 0.394 Despite the small sample size, the Normal approximation gives a result quite close to the exact value P ⫽ 0.4062. Figure 15.7(b) shows that the approximation is much less accurate without the continuity correction. This output reminds us not to trust software unless we know exactly what it does. USE YOUR KNOWLEDGE DATA SPAS3
DATA SPAS4
15.22 Significance test for top-ranked spas. Refer to Exercise 15.20 (page 15-20). Find mW ⫹ , sW ⫹ , and the Normal approximation for the P-value for the Wilcoxon signed rank test. 15.23 Significance test for lower-ranked spas. Refer to Exercise 15.21 (page 15-20). Find mW ⫹ , sW ⫹ , and the Normal approximation for the P-value for the Wilcoxon signed rank test.
Ties CHALLENGE CHALLENGE
Ties among the absolute differences are handled by assigning average ranks. A tie within a pair creates a difference of zero. Because these are neither positive nor negative, the usual procedure simply drops such pairs from the sample. This amounts to dropping observations that favor the null hypothesis (no difference). If there are many ties, the test may be biased in favor of the alternative hypothesis. As in the case of the Wilcoxon rank sum, ties complicate finding a P-value. Most software no longer provides an exact distribution for the signed rank statistic W ⫹, and the standard deviation sW ⫹ must be adjusted for the ties before we can use the Normal approximation. Software will do this. Here is an example.
EXAMPLE 15.11 Golf scores of a women’s golf team. Here are the golf scores of 12 members of a college women’s golf team in two rounds of tournament play. (A golf score is the number of strokes required to complete the course, so that low scores are better.) Player
1
2
3
4
5
6
7
Round 2
94
85
89
89
81
76
Round 1
89
90
87
95
86
5
25
2
26
25
Difference
8
9
10
11
12
107
89
87
91
88
80
81
102
105
83
88
91
79
25
5
216
4
3
23
1
15-24
CHAPTER 15
•
Nonparametric Tests Negative differences indicate better (lower) scores on the second round. We see that 6 of the 12 golfers improved their scores. We would like to test the hypotheses that in a large population of collegiate women golfers H0: Scores have the same distribution in Rounds 1 and 2. Ha: Scores are systematically lower or higher in Round 2. A Normal quantile plot of the differences (Figure 15.8) shows some irregularity and a low outlier. We will use the Wilcoxon signed rank test.
Difference in golf score
5
0
–5
–10
–15
FIGURE 15.8 Normal quantile plot of the difference in scores for two rounds of a golf tournament, for Example 15.11.
–3
–2
–1
0 1 Normal score
2
3
The absolute values of the differences, with boldface indicating those that are negative, are 5 5
2
6
5
5
5
4
16
3
3
1
Arrange these in increasing order and assign ranks, keeping track of which values were originally negative. Tied values receive the average of their ranks. Absolute value
1
2
3
3
4
5
5
5
5
5
6
16
Rank
1
2
3.5
3.5
5
8
8
8
8
8
11
12
The Wilcoxon signed rank statistic is the sum of the ranks of the negative differences. (We could equally well use the sum of the ranks of the positive differences.) Its value is W ⫹ ⫽ 50.5.
EXAMPLE 15.12 Software output. Here are the two-sided P-values for the Wilcoxon signed rank test for the golf score data from three statistical programs: Program
P-value
Minitab
P ⫽ 0.388
SAS
P ⫽ 0.388
SPSS
P ⫽ 0.363
15.2 The Wilcoxon Signed Rank Test
15-25
All lead to the same practical conclusion: these data give no evidence for a systematic change in scores between rounds. However, the P-value reported by SPSS differs a bit from the other two. The reason for the variation is that the programs use slightly different versions of the approximate calculations needed when ties are present. The exact result depends on which version the software programmer chose to use. For the golf data, the matched pairs t test gives t ⫽ 0.9314 with P ⫽ 0.3716. Once again, t and W ⫹ lead to the same conclusion.
Testing a hypothesis about the median of a distribution Let’s take another look at how the Wilcoxon signed rank test works. We have data for a pair of variables measured on the same individuals. The analysis starts with the differences between the two variables. These differences are what we input to statistical software. At this stage we can think of our data as consisting of a single variable. The Wilcoxon signed rank test tests the null hypothesis that the population median of the differences is zero. The alternative is that the median is not zero. Think about starting the analysis at the stage where we have a single variable and we are interested in testing a hypothesis about the median. The null hypothesis does not necessarily need to be zero. If it is some other value, we simply subtract that value from each observation before we start the analysis. Exercise 15.35 (page 15-27) leads you through the steps needed for this analysis.
SECTION 15.2 Summary The Wilcoxon signed rank test applies to matched pairs studies. It tests the null hypothesis that there is no systematic difference within pairs against alternatives that assert a systematic difference (either one-sided or two-sided). The test is based on the Wilcoxon signed rank statistic W 1, which is the sum of the ranks of the positive (or negative) differences when we rank the absolute values of the differences. The matched pairs t test and the sign test are alternative tests in this setting. P-values for the signed rank test are based on the sampling distribution of W ⫹ when the null hypothesis is true. You can find P-values from special tables, software, or a Normal approximation (with continuity correction).
SECTION 15.2 Exercises For Exercises 15.20 and 15.21, see page 15-20; and for Exercises 15.22 and 15.23, see page 15-23. 15.24 Fuel efficiency. Computers in some vehicles calculate various quantities related to performance. One of these is the fuel efficiency, or gas mileage, usually expressed as miles per gallon (mpg). For one vehicle equipped in this way, the mpg were recorded each time the gas tank was filled, and the computer was then reset. In addition to the computer calculating mpg, the driver also recorded the mpg by dividing the miles driven by the number of gallons at fill-up.9
The driver wants to determine if these calculations are different. MPG8
Fill-up
1
2
3
4
5
6
7
8
Computer
41.5
50.7
36.6
37.3
34.2
45.0
48.0
43.2
Driver
36.5
44.2
37.2
35.6
30.5
40.5
40.0
41.0
(a) For each of the eight fill-ups find the difference between the computer mpg and the driver mpg.
15-26
CHAPTER 15
•
Nonparametric Tests
(b) Find the absolute values of the differences you found in part (a).
(d) The output reports an estimated median. Explain how this statistic is calculated from the data.
(c) Order the absolute values of the differences that you found in part (b) from smallest to largest, and underline those absolute differences that came from positive differences in part (a).
15.30 Number of friends on Facebook. Facebook recently examined all active Facebook users (more than 10% of the global population) and determined that the average user has 190 friends. This distribution takes only integer values, so it is certainly not Normal. It is also highly skewed to the right, with a median of 100 friends.10 Consider the following SRS of n ⫽ 30 Facebook users from your large university. FACEFR
15.25 Find the Wilcoxon signed rank statistic. Using the work that you performed in the previous exercise, find the value of the Wilcoxon signed rank statistic W ⫹ . 15.26 State the hypotheses. Refer to Exercise 15.24. State the null hypothesis and the alternative hypothesis for this setting. 15.27 Find the mean and the standard deviation. Refer to Exercise 15.24. Use the sample size to find the mean and the standard deviation of the sampling distribution of the Wilcoxon signed rank statistic W ⫹ under the null hypothesis. 15.28 Find the P-value. Refer to Exercises 15.24 to 15.27. Find the P-value for the Wilcoxon signed rank statistic using the Normal approximation with the continuity correction. 15.29 Read the output. The data in Exercise 15.24 are a subset of a larger set of data. Figure 15.9 gives Minitab output for the analysis of this larger set of data. MPGCOMP (a) How many pairs of observations are in the larger data set? (b) What is the value of the Wilcoxon signed rank statistic W ⫹ ? (c) Report the P-value for the significance test and give a brief statement of your conclusion.
594
60
417
120
132
176
516
319
734
8
31
325
52
63
537
27
368
11
12
190
85
165
288
65
57
81
257
24
297
148
(a) Use the Wilcoxon signed rank procedure to test the null hypothesis that the median number of Facebook friends for Facebook users at your university is 190. Describe the steps in the procedure and summarize the results. (b) Exercise 7.26 (page 442) asked you to analyze these data using the t procedure. Perform this analysis and compare the results with those that you found in part (a). 15.31 The full moon and behavior. Can the full moon influence behavior? A study observed 15 nursing-home patients with dementia. The number of incidents of aggressive behavior was recorded each day for 12 weeks. Call a day a “moon day” if it is the day of a full moon or the day before or after a full moon. Here are the average numbers of aggressive incidents for moon days and other days for each subject:11 MOON Patient
Moon days
Other days
1
3.33
0.27
2
3.67
0.59
3
2.67
0.32
Minitab
4
3.33
0.19
Wilcoxon Signed Rank Test: Diff
5
3.33
1.26
6
3.67
0.11
7
4.67
0.30
8
2.67
0.40
9
6.00
1.59
10
4.33
0.60
11
3.33
0.65
12
0.67
0.69
13
1.33
1.26
14
0.33
0.23
15
2.00
0.38
Test of median = 0.000000 versus median not = 0.000000
Diff
N 20
N for Test 20
Wilcoxon Statistic 192.0
P 0.001
Estimated Median 2.925
FIGURE 15.9 Minitab output for the fuel efficiency data, for Exercise 15.29.
15.2 The Wilcoxon Signed Rank Test The matched pairs t test (Example 7.7, page 429) gives P ⬍ 0.000015, and a permutation test (Example 16.14, page 16-50) gives P ⫽ 0.0001. Does the Wilcoxon signed rank test, based on ranks rather than means, agree that there is strong evidence that there are more aggressive incidents on moon days? 15.32 Comparison of two energy drinks. Consider the following study to compare two popular energy drinks. For each subject, a coin was flipped to determine which drink to rate first. Each drink was rated on a 0 to 100 scale, with 100 being the highest rating. ENERDR6
1
2
3
4
5
A
43
83
66
87
78
67
B
45
78
64
79
71
62
6
(a) Inspect the data. Is there a tendency for these subjects to prefer one of the two energy drinks? (b) Use the matched pairs t test of Chapter 7 (page 429) to compare the two drinks. (c) Use the Wilcoxon signed rank test to compare the two drinks. (d) Write a summary of your results and explain why the two tests give different conclusions. 15.33 Comparison of two energy drinks with an additional subject. Refer to the previous exercise. Let’s suppose that there is an additional subject who expresses a strong preference for energy drink “A.” Here is the new data set: ENERDR7 Subject 2
3
4
5
6
scores between the pretest and the posttest for 20 teachers: SUMLANG 2
0
6
6
3
3
2
3
26
6
6
6
3
0
1
1
0
2
3
3
(Exercise 7.45, page 446, applies the t test to these data; Exercise 16.59, page 16-49, applies a permutation test based on the means.) Show the assignment of ranks and the calculation of the signed rank statistic W ⫹ for these data. Remember that zeros are dropped from the data before ranking, so that n is the number of nonzero differences within pairs. 15.35 Radon detectors. How accurate are radon detectors of a type sold to homeowners? To answer this question, university researchers placed 12 detectors in a chamber that exposed them to 105 picocuries per liter (pCi/l) of radon.12 The detector readings are as follows: RADON
Subject Drink
15-27
Drink
1
7
A
43
83
66
87
78
67
90
B
45
78
64
79
71
62
60
Answer the questions given in the previous exercise. Write a summary comparing this exercise with the previous one. Include a discussion of what you have learned regarding the choice of the t test versus the Wilcoxon signed rank test for different sets of data. 15.34 A summer language institute for teachers. A matched pairs study of the effect of a summer language institute on the ability of teachers to comprehend spoken French had these improvements in
91.9
97.8
111.4
122.3
105.4
95.0
103.8
99.6
96.6
119.3
104.8
101.7
We wonder if the median reading differs significantly from the true value 105. (a) Graph the data, and comment on skewness and outliers. A rank test is appropriate. (b) We would like to test hypotheses about the median reading from home radon detectors: H0: median ⫽ 105 Ha: median ⬆ 105 To do this, apply the Wilcoxon signed rank statistic to the differences between the observations and 105. (This is the one-sample version of the test.) What do you conclude? 15.36 Vitamin C in wheat-soy blend. The U.S. Agency for International Development provides large quantities of wheat-soy blend (WSB) for development programs and emergency relief in countries throughout the world. One study collected data on the vitamin C content of 5 bags of WSB at the factory and five months later in Haiti.13 Here are the data: WSBVITC Sample
1
2
3
4
5
Before
73
79
86
88
78
After
20
27
29
36
17
We want to know if vitamin C has been lost during transportation and storage. Describe what the data show about this question. Then use a rank test to see whether there has been a significant loss.
15-28
CHAPTER 15
•
Nonparametric Tests
15.3 The Kruskal-Wallis Test* When you complete this section, you will be able to • Describe the setting where the Kruskal-Wallis test can be used. • Specify the null and alternative hypotheses for the Kruskal-Wallis test. • For the Kruskal-Wallace test, use computer output to determine the results of the significance test.
We have now considered alternatives to the matched pairs and two-sample t tests for comparing the magnitude of responses to two treatments. To compare more than two treatments, we use one-way analysis of variance (ANOVA) if the distributions of the responses to each treatment are at least roughly Normal and have similar spreads. What can we do when these distribution requirements are violated?
EXAMPLE DATA WEEDS
15.13 Weeds and corn yield. Lamb’s-quarter is a common weed that interferes with the growth of corn. A researcher planted corn at the same rate in 16 small plots of ground and then randomly assigned the plots to four groups. He weeded the plots by hand to allow a fixed number of lamb’squarter plants to grow in each meter of corn row. These numbers were 0, 1, 3, and 9 in the four groups of plots. No other weeds were allowed to grow, and all plots received identical treatment except for the weeds. Here are the yields of corn (bushels per acre) in each of the plots:14
CHALLENGE
Weeds per meter
Corn yield
Weeds per meter
Corn yield
Weeds per meter
Corn yield
Weeds per meter
Corn yield
0
166.7
1
166.2
3
158.6
9
162.8
0
172.2
1
157.3
3
176.4
9
142.4
0
165.0
1
166.7
3
153.1
9
162.7
0
176.9
1
161.1
3
156.0
9
162.4
The summary statistics are Weeds
n
Mean
Std. dev.
0
4
170.200
5.422
1
4
162.825
4.469
3
4
161.025
10.493
9
4
157.575
10.118
*Because this test is an alternative to the one-way analysis of variance F test, you should first read Chapter 12.
15.3 The Kruskal-Wallis Test*
15-29
The sample standard deviations do not satisfy our rule of thumb that for safe use of ANOVA the largest should not exceed twice the smallest. A careful look at the data suggests that there may be some outliers in the 3 and 9 weeds per meter groups. These are the correct yields for their plots, so we have no justification for removing them. Let’s use a rank test that is not sensitive to outliers.
Hypotheses and assumptions The ANOVA F test concerns the means of the several populations represented by our samples. For Example 15.13, the ANOVA hypotheses are H0: m0 ⫽ m1 ⫽ m3 ⫽ m9 Ha: not all four means are equal Here, m0 is the mean yield in the population of all corn planted under the conditions of the experiment with no weeds present. The data should consist of four independent random samples from the four populations, all Normally distributed with the same standard deviation. The Kruskal-Wallis test is a rank test that can replace the ANOVA F test. The assumption about data production (independent random samples from each population) remains important, but we can relax the Normality assumption. We assume only that the response has a continuous distribution in each population. The hypotheses tested in our example are H0: Yields have the same distribution in all groups. Ha: Yields are systematically higher in some groups than in others. If all the population distributions have the same shape (Normal or not), these hypotheses take a simpler form. The null hypothesis is that all four populations have the same median yield. The alternative hypothesis is that not all four median yields are equal.
The Kruskal-Wallis test Recall the analysis of variance idea: we write the total observed variation in the responses as the sum of two parts, one measuring variation among the groups (sum of squares for groups, SSG) and one measuring variation among individual observations within the same group (sum of squares for error, SSE). The ANOVA F test rejects the null hypothesis that the mean responses are equal in all groups if SSG is large relative to SSE. The idea of the Kruskal-Wallis rank test is to rank all the responses from all groups together and then apply one-way ANOVA to the ranks rather than to the original observations. If there are N observations in all, the ranks are always the whole numbers from 1 to N. The total sum of squares for the ranks is therefore a fixed number no matter what the data are. So we do not need to look at both SSG and SSE. Although it isn’t obvious without some unpleasant algebra, the Kruskal-Wallis test statistic is essentially just SSG for the ranks. We give the formula, but you should rely on software to do the arithmetic. When SSG is large, that is evidence that the groups differ.
15-30
CHAPTER 15
•
Nonparametric Tests
THE KRUSKAL-WALLIS TEST Draw independent SRSs of sizes n1, n2, . . . , nI from I populations. There are N observations in all. Rank all N observations and let Ri be the sum of the ranks for the ith sample. The Kruskal-Wallis statistic is H⫽
R2i 12 ⫺ 31N ⫹ 12 N1N ⫹ 12 a ni
When the sample sizes ni are large and all I populations have the same continuous distribution, H has approximately the chi-square distribution with I ⫺ 1 degrees of freedom. The Kruskal-Wallis test rejects the null hypothesis that all populations have the same distribution when H is large.
We now see that, like the Wilcoxon rank sum statistic, the Kruskal-Wallis statistic is based on the sums of the ranks for the groups we are comparing. The more different these sums are, the stronger is the evidence that responses are systematically larger in some groups than in others. The exact distribution of the Kruskal-Wallis statistic H under the null hypothesis depends on all the sample sizes n1 to nI, so tables are awkward. The calculation of the exact distribution is so time-consuming for all but the smallest problems that even most statistical software uses the chi-square approximation to obtain P-values. As usual, there is no usable exact distribution when there are ties among the responses. We again assign average ranks to tied observations.
EXAMPLE DATA WEEDS
15.14 Perform the significance test. In Example 15.13, there are I ⫽ 4
populations and N ⫽ 16 observations. The sample sizes are equal, ni ⫽ 4. The 16 observations arranged in increasing order, with their ranks, are
Yield Rank
142.4 1
153.1 2
156.0 3
157.3 4
158.6 5
161.1 6
162.4 7
162.7 8
Yield
162.8
165.0
166.2
166.7
166.7
172.2
176.4
176.9
Rank
9
10
11
12.5
12.5
14
15
16
CHALLENGE
There is one pair of tied observations. The ranks for each of the four treatments are Weeds
Ranks
Rank sums
0
10
12.5
14
16
52.5
1
4
6
11
12.5
33.5
3
2
3
5
15
25.0
9
1
7
8
9
25.0
15.3 The Kruskal-Wallis Test*
15-31
The Kruskal-Wallis statistic is therefore H⫽
R2i 12 ⫺ 31N ⫹ 12 N1N ⫹ 12 a ni
⫽
12 52.52 33.52 252 252 a ⫹ ⫹ ⫹ b ⫺ 132 1172 1162 1172 4 4 4 4
⫽
12 11282.1252 ⫺ 51 272
⫽ 5.56 Referring to the table of chi-square critical points (Table F) with df 5 3, we find that the P-value lies in the interval 0.10 ⬍ P ⬍ 0.15. This small experiment suggests that more weeds decrease yield but does not provide convincing evidence that weeds have an effect.
Figure 15.10 displays the output from Minitab, SPSS, and SAS for the analysis of the data in Example 15.14. Minitab gives the H statistic adjusted for ties as H ⫽ 5.57 with 3 degrees of freedom and P ⫽ 0.134. SPSS reports the same P-value. SAS reports a chi-square statistic with 3 degrees of freedom and P ⫽ 0.1344. All agree that there is not sufficient evidence in the data to reject the null hypothesis that the number of weeds per meter has no effect on the yield.
FIGURE 15.10 Output from (a) Minitab, (b) SPSS, and (c) SAS for the Kruskal-Wallis test applied to the weed data, for Example 15.14.
Minitab
Kruskal-Wallis Test: Yield versus Weeds Kruskal-Wallis Test on Yield Weeds
N
Median
Ave Rank
Z
0
4
169.4
13.1
2.24
1
4
163.6
8.4
–0.06
3
4
157.3
6.3
–1.09
9
4
162.6
6.3
–1.09
Overall
16
8.5
H = 5.56
DF = 3
P = 0.135
H = 5.57
DF = 3
P = 0.134
(adjusted for ties)
* NOTE * One or more small samples
(a) Minitab (Continued )
15-32
CHAPTER 15
•
Nonparametric Tests
FIGURE 15.10 (Continued )
*Output1 - IBM SPSS Statistics Viewer Nonparametric Tests Hypothesis Test Summary Null Hypothesis
Test
The distribution of Yield is the same 1 across categories of Weeds.
Sig.
IndependentSamples KruskalWallis Test
Decision
.134
Retain the null hypothesis
Asymptotic significances are displayed. The significance level is .05.
IBM SPSS Statistics Processor is ready
H: 22, W: 345 pt.
(b) SPSS
SAS
The SAS System The NPAR1WAY Procedure Wilcoxon Scores (Rank Sums) for Variable Yield Classified by Variable Weeds
Weeds
N
Sum of Scores
Expected Under H0
Std Dev Under H0
Mean Score
0
4
52.50
34.0
8.240146
13.1250
1
4
33.50
34.0
8.240146
8.3750
3
4
25.00
34.0
8.240146
6.2500
9
4
25.00
34.0
8.240146
6.2500
Average scores were used for ties. Kruskal-Wallis Test Chi-Square
5.5725
DF
3
Pr > Chi-Square
Done
(c) SAS
0.1344
15.3 The Kruskal-Wallis Test*
15-33
SECTION 15.3 Summary The Kruskal-Wallis test compares several populations on the basis of independent random samples from each population. This is the one-way analysis of variance setting. The null hypothesis for the Kruskal-Wallis test is that the distribution of the response variable is the same in all the populations. The alternative hypothesis is that responses are systematically larger in some populations than in others. The Kruskal-Wallis statistic H can be viewed in two ways. It is essentially the result of applying one-way ANOVA to the ranks of the observations. It is also a comparison of the sums of the ranks for the several samples. When the sample sizes are not too small and the null hypothesis is true, H for comparing I populations has approximately the chi-square distribution with I ⫺ 1 degrees of freedom. We use this approximate distribution to obtain P-values.
SECTION 15.3 Exercises 15.37 Number of Facebook friends. An experiment was run to examine the relationship between the number of Facebook friends and the user’s perceived social attractiveness.15 A total of 134 undergraduate participants were randomly assigned to observe one of five Facebook profiles. Everything about the profile was the same except the number of friends, which appeared on the profile as 102, 302, 502, 702, or 902. After viewing the profile, each participant was asked to fill out a questionnaire on the physical and social attractiveness of the profile user. Each attractiveness score is an average of several seven-point questionnaire items, ranging from 1 (strongly disagree) to 7 (strongly agree). In Example 12.3 (page 648), we
FIGURE 15.11 Output from Minitab for the Kruskal-Wallis test applied to the Facebook data, for Exercise 15.39.
analyzed these data using a one-way ANOVA. Explain the setting for this problem. Include the number of groups to be compared, assumptions about independence, and the distribution of the distributions. FRIENDS 15.38 What are the hypotheses? Refer to the previous exercise. What are the null hypothesis and the alternative hypothesis? Explain why a nonparametric procedure is appropriate in this setting. 15.39 Read the output. Figure 15.11 gives the Minitab output for the analysis of the data described in Exercise 15.37. Describe the results given in the output and write a short summary of your conclusions from the analysis.
Minitab
Kruskal-Wallis Test: Score versus Friends Kruskal-Wallis Test on Score Friends
N
Median
Ave Rank
Z
102
24
3.600
46.3
–2.96
302
33
5.000
84.9
2.97
502
26
4.700
72.3
0.70
702
30
4.600
70.7
0.51
21
4.200
54.0
–1.74
902 Overall
134
67.5
H = 16.98
DF = 4
P = 0.002
H = 17.05
DF = 4
P = 0.002
(adjusted for ties)
15-34
CHAPTER 15
•
Nonparametric Tests
15.40 Do we experience emotions differently? In Exercise 12.37 (page 684) you analyzed data related to the way people from different cultures experience emotions. The study subjects were 410 college students from five different cultures. They were asked to record, on a 1 (never) to 7 (always) scale, how much of the time they typically felt eight specific emotions. These were averaged to produce the global emotion score for each participant. Analyze the data using the Kruskal-Wallis test and write a summary of your analysis and conclusions. Be sure to include your assumptions, hypotheses, and the results of the significance test. EMOTION 15.41 Do isoflavones increase bone mineral density? In Exercise 12.45 (page 686) you investigated the effects of isoflavones from kudzu on bone mineral density (BMD). The experiment randomized rats to three diets: control, low isoflavones, and high isoflavones. Here are the data: KUDZU
(a) Use the Kruskal-Wallis test to assess significance and then write a brief summary of what the data show. (b) Because there are only 2 observations per group, we suspect that the common chi-square approximation to the distribution of the Kruskal-Wallis statistic may not be accurate. The exact P-value (from SAS software) is P ⫽ 0.0011. Compare this with your P-value from part (a). Is the difference large enough to affect your conclusion? 15.43 Jumping and strong bones. In Exercise 12.47 (page 687) you studied the effects of jumping on the bones of rats. Ten rats were assigned to each of three treatments: a 60-centimeter “high jump,” a 30-centimeter “low jump,” and a control group with no jumping.17 Here are the bone densities (in milligrams per cubic centimeter) after eight weeks of 10 jumps per day: JUMP Bone density (mg/cm3)
Group 2
Treatment Control
BMD (g/cm )
Control
0.228 0.207 0.234 0.220 0.217 0.228 0.209 0.221 0.204 0.220 0.203 0.219 0.218 0.245 0.210
Low dose
Low jump
0.211 0.220 0.211 0.233 0.219 0.233 0.226 0.228 0.216 0.225 0.200 0.208 0.198 0.208 0.203
High dose 0.250 0.237 0.217 0.206 0.247 0.228 0.245 0.232
High jump
611
621
614
593
593
653
600
554
603
569
635
605
638
594
599
632
631
588
607
596
650
622
626
626
631
622
643
674
643
650
0.267 0.261 0.221 0.219 0.232 0.209 0.255
(a) Use the Kruskal-Wallace test to compare the three diets. (b) How do these results compare with what you find using the ANOVA F statistic? 15.42 Vitamins in bread. Does bread lose its vitamins when stored? Here are data on the vitamin C content (milligrams per 100 grams of flour) in bread baked from the same recipe and stored for 1, 3, 5, or 7 days.16 The 10 observations are from 10 different loaves of bread. BREAD Condition
Vitamin C (mg/100 g)
Immediately after baking
47.62
49.79
One day after baking
40.45
43.46
Three days after baking
21.25
22.34
Five days after baking
13.18
11.65
8.51
8.13
Seven days after baking
The loss of vitamin C over time is clear, but with only 2 loaves of bread for each storage time we wonder if the differences among the groups are significant.
(a) The study was a randomized comparative experiment. Outline the design of this experiment. (b) Make side-by-side stemplots for the three groups, with the stems lined up for easy comparison. The distributions are a bit irregular but not strongly non-Normal. We would usually use analysis of variance to assess the significance of the difference in group means. (c) Do the Kruskal-Wallis test. Explain the distinction between the hypotheses tested by Kruskal-Wallis and ANOVA. (d) Write a brief statement of your findings. Include a numerical comparison of the groups as well as your test result. 15.44 Do poets die young? In Exercise 12.46 (page 686) you analyzed the age at death for female writers. They were classified as novelists, poets, and nonfiction writers. The data are given in Table 12.1 (page 686). POETS (a) Use the Kruskal-Wallace test to compare the three groups of female writers. (b) Compare these results with what you find using the ANOVA F statistic.
Chapter 15 Exercises
15-35
CHAPTER 15 Exercises 15.45 Plants and hummingbirds. Different varieties of the tropical flower Heliconia are fertilized by different species of hummingbirds. Over time, the lengths of the flowers and the forms of the hummingbirds’ beaks have evolved to match each other. Here are data on the lengths in millimeters of three varieties of these flowers on the island of Dominica:18 HBIRDS H. bihai 47.12
46.75
46.81
47.12
46.67
47.43
46.44
46.64
48.07
48.34
48.15
50.26
50.12
46.34
46.94
48.36
H. caribaea red 41.90
42.01
41.93
43.09
41.47
41.69
39.78
40.57
39.63
42.18
40.66
37.87
39.16
37.40
38.20
38.07
38.10
37.97
38.79
38.23
38.87
37.78
38.01
Verizon 1
1
1
1
2
2
1
1
1
1
2
2
1
1
1
1
2
2
1
1
1
1
2
3
1
1
1
1
2
3
1
1
1
1
2
3
1
1
1
1
2
3
1
1
1
1
2
3
1
1
1
1
2
3
1
1
1
1
2
4
1
1
1
1
2
5
1
1
1
1
2
5
1
1
1
1
2
6
1
1
1
1
2
8
1
1
1
1
2
15
1
1
1
2
2
5
5
5
CLEC 1
H. caribaea yellow 36.78
37.02
36.52
36.11
36.03
35.45
38.13
37.10
35.17
36.82
36.66
35.68
36.03
34.57
34.63
1
5
5
5
1
5
(a) Does Verizon appear to give CLEC customers the same level of service as its own customers? Compare the data using graphs and descriptive measures and express your opinion. (b) We would like to see if times are significantly longer for CLEC customers than for Verizon customers. Why would you hesitate to use a t test for this purpose? Carry out a rank test. What can you conclude?
Do a complete analysis that includes description of the data and a rank test for the significance of the differences in lengths among the three species. 15.46 Time spent studying. In Exercise 1.173 (page 50) you compared the time spent studying by men and women. The students in a large first-year college class were asked how many minutes they studied on a typical weeknight. Here are the responses of random samples of 30 women and 30 men from the class: STIME Women
company Verizon to respond to repair calls from its own customers and from customers of a CLEC, another phone company that pays Verizon to use its local lines. Here are the data, which are rounded to the nearest hour: TREPAIR
Men 200
(c) Explain why a nonparametric procedure is appropriate in this setting. Iron-deficiency anemia is the most common form of malnutrition in developing countries. Does the type of cooking pot affect the iron content of food? We have data from a study in Ethiopia that measured the iron content (milligrams per 100 grams of food) for three types of food cooked in each of three types of pots:19 COOK
170
120
180
360
240
80
120
30
90
120
180
120
240
170
90
45
30
120
75
150
120
180
180
150
150
120
60
240
300
200
150
180
150
180
240
60
120
60
30
120
60
120
180
180
30
230
120
95
150
Aluminum
1.77
2.36
1.96
2.14
90
240
180
115
120
0
200
120
120
180
Clay
2.27
1.28
2.48
2.68
Iron
5.27
5.17
4.06
4.22
Aluminum
2.40
2.17
2.41
2.34
Clay
2.41
2.43
2.57
2.48
Iron
3.69
3.43
3.84
3.72
Aluminum
1.03
1.53
1.07
1.30
Clay
1.55
0.79
1.68
1.82
Iron
2.45
2.99
2.80
2.92
(a) Summarize the data numerically and graphically. (b) Use the Wilcoxon rank sum test to compare the men and women. Write a short summary of your results. (c) Use a two-sample t test to compare the men and women. Write a short summary of your results. (d) Which procedure is more appropriate for these data? Give reasons for your answer. 15.47 Response times for telephone repair calls. A study examined the time required for the telephone
Type of Pot
Iron Content Meat
Legumes
Vegetables
15-36
CHAPTER 15
•
Nonparametric Tests
Exercises 15.48 to 15.50 use these data. 15.48 Cooking vegetables in different pots. Does the vegetable dish vary in iron content when cooked in aluminum, clay, and iron pots? COOK (a) What do the data appear to show? Check the conditions for one-way ANOVA. Which requirements are a bit dubious in this setting? (b) Instead of ANOVA, do a rank test. Summarize your conclusions about the effect of pot material on the iron content of the vegetable dish. 15.49 Cooking meat and legumes in aluminum and clay pots. There appears to be little difference between the iron content of food cooked in aluminum pots and food cooked in clay pots. Is there a significant difference between the iron content of meat cooked in aluminum and clay? Is the difference between aluminum and clay significant for legumes? Use rank tests. COOK 15.50 Iron in food cooked in iron pots. The data show that food cooked in iron pots has the highest iron content. They also suggest that the three types of food differ in iron content. Is there significant evidence that the three types of food differ in iron content when all are cooked in iron pots? COOK 15.51 Multiple comparisons for plants and hummingbirds. As in ANOVA, we often want to carry
out a multiple-comparisons procedure following a Kruskal-Wallis test to tell us which groups differ significantly.20 The Bonferroni method (page 670) is a simple method: If we carry out k tests at fixed significance level 0.05/k, the probability of any false rejection among the k tests is always no greater than 0.05. That is, to get overall significance level 0.05 for all of k comparisons, do each individual comparison at the 0.05/k level. In Exercise 15.45 you found a significant difference among the lengths of three varieties of the flower Heliconia. Now we will explore multiple comparisons. HBIRDS (a) Write down all the pairwise comparisons we can make, for example, bihai versus caribaea red. There are three possible pairwise comparisons. (b) Carry out three Wilcoxon rank sum tests, one for each of the three pairs of flower varieties. What are the three two-sided P-values? (c) For purposes of multiple comparisons, any of these three tests is significant if its P-value is no greater than 0.05/3 5 0.0167. Which pairs differ significantly at the overall 0.05 level? 15.52 Multiple comparisons for cooking pots. The previous exercise outlines how to use the Wilcoxon rank sum test several times for multiple comparisons with overall significance level 0.05 for all comparisons together. Apply this procedure to the data used in each of Exercises 15.48 to 15.50. COOK
CHAPTER 15 Notes and Data Sources 1. Condé Nast Traveler readers poll data for 2013, from cntraveler.com/spas/2013/03/best-spas-united-statescaribbean-mexico-cruise-ships.
6. Data provided by Warren Page, New York City Technical College, from a study done by John Hudesman.
2. For purists, here is the precise definition: X1 is stochastically larger than X2 if
7. Data provided by Susan Stadler, Purdue University.
P1X1 ⬎ a2 ⱖ P1X2 ⬎ a2 for all a, with strict inequality for at least one a. The Wilcoxon rank sum test is effective against this alternative in the sense that the power of the test approaches 1 (that is, the test becomes more certain to reject the null hypothesis) as the number of observations increases. 3. Erin K. O’Loughlin et al., “Prevalence and correlates of exergaming in youth,” Pediatrics, 130 (2012), pp. 806–814. 4. From the PEW Internet and American Life website, pewinternet.org/Reports/2013/Civic-Engagement.aspx. 5. From Matthias R. Mehl et al., “Are women really more talkative than men?,” Science, 317, No. 5834 (2007), p. 82. The raw data were provided by Matthias Mehl.
8. Ibid. 9. The vehicle is a 2002 Toyota Prius owned by the third author. 10. Statistics regarding Facebook usage can be found at facebook.com/notes/facebook-data-team/anatomyof-facebook/10150388519243859. 11. These data were collected as part of a larger study of dementia patients conducted by Nancy Edwards, School of Nursing, and Alan Beck, School of Veterinary Medicine, Purdue University. 12. Data provided by Diana Schellenberg, Purdue University School of Health Sciences. 13. These data are from “Results report on the vitamin C pilot program,” prepared by SUSTAIN (Sharing
Chapter 15 Notes and Data Sources United States Technology to Aid in the Improvement of Nutrition) for the U.S. Agency for International Development. The report was used by the Committee on International Nutrition of the National Academy of Sciences/Institute of Medicine to make recommendations on whether or not the vitamin C content of food commodities used in U.S. food aid programs should be increased. The program was directed by Peter Ranum and Françoise Chomé. The second author was a member of the committee. 14. Data provided by Sam Phillips, Purdue University. 15. See Note 10. 16. Data provided by Helen Park. See H. Park et al., “Fortifying bread with each of three antioxidants,” Cereal Chemistry, 74 (1997), pp. 202–206.
15-37
17. Data provided by Jo Welch, Purdue University Department of Foods and Nutrition. 18. We thank Ethan J. Temeles of Amherst College for providing the data. His work is described in Ethan J. Temeles and W. John Kress, “Adaptation in a plant-hummingbird association,” Science, 300 (2003), pp. 630–633. 19. Based on A. A. Adish et al., “Effect of consumption of food cooked in iron pots on iron status and growth of young children: A randomised trial,” The Lancet, 353 (1999), pp. 712–716. 20. For more details on multiple comparisons, see M. Hollander and D. A. Wolfe, Nonparametric Statistical Methods, 2nd ed., Wiley, 1999. This book is a useful reference on applied aspects of nonparametric inference in general.
Bootstrap Methods and Permutation Tests* Introduction The continuing revolution in computing is having a dramatic influence on statistics. The exploratory analysis of data is becoming easier as more graphs and calculations are automated. The statistical study of very large and very complex data sets is now feasible. Another impact of this fast and inexpensive computing is less obvious: new methods apply previously unthinkable amounts of computation to produce confidence intervals and tests of significance in settings that don’t meet the conditions for safe application of the usual methods of inference. Consider the commonly used t procedures for inference about means (Chapter 7) and for relationships between quantitative variables (Chapter 10). All these methods rest on the use of Normal distributions for data. While no data are exactly Normal, the t procedures are useful in practice because they
16
CHAPTER
16.1 The Bootstrap Idea 16.2 First Steps in Using the Bootstrap 16.3 How Accurate Is a Bootstrap Distribution?
16.4 Bootstrap Confidence Intervals 16.5 Significance Testing Using Permutation Tests
*The original version of this chapter was written by Tim Hesterberg, David S. Moore, Shaun Monaghan, Ashley Clipson, and Rachel Epstein, with support from the National Science Foundation under grant DMI-0078706. Revisions have been made by Bruce A. Craig and George P. McCabe. Special thanks to Bob Thurman, Richard Heiberger, Laura Chihara, Tom Moore, and Gudmund Iversen for helpful comments on an earlier version.
16-1
16-2
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
LOOK BACK robust, p. 432
LOOK BACK F test for equality of spread, p. 474
are robust. Nonetheless, we cannot use t confidence intervals and tests if the data are strongly skewed, unless our samples are quite large. Other procedures cannot be used on non-Normal data even when the samples are large. Inference about spread based on Normal distributions is not robust and therefore of little use in practice. Finally, what should we do if we are interested in, say, a ratio of means, such as the ratio of average men’s salary to average women’s salary? There is no simple traditional inference method for this setting. The methods of this chapter—bootstrap confidence intervals and permutation tests—apply the power of the computer to relax some of the conditions needed for traditional inference and to do inference in new settings. The big ideas of statistical inference remain the same. The fundamental reasoning is still based on asking, “What would happen if we applied this method many times?” Answers to this question are still given by confidence levels and P-values based on the sampling distributions of statistics. The most important requirement for trustworthy conclusions about a population is still that our data can be regarded as random samples from the population—not even the computer can rescue voluntary response samples or confounded experiments. But the new methods set us free from the need for Normal data or large samples. They work the same way for many different statistics in many different settings. They can, with sufficient computing power, give results that are more accurate than those from traditional methods. Bootstrap intervals and permutation tests are conceptually simple because they appeal directly to the basis of all inference: the sampling distribution that shows what would happen if we took very many samples under the same conditions. The new methods do have limitations, some of which we will illustrate. But their effectiveness and range of use are so great that they are now widely used in a variety of settings.
Software Bootstrapping and permutation tests are feasible in practice only with software that automates the heavy computation that these methods require. If you are sufficiently expert, you can program at least the basic methods yourself. It is easier to use software that offers bootstrap intervals and permutation tests preprogrammed, just as most software offers the various t intervals and tests. You can expect the new methods to become more common in standard statistical software. This chapter primarily uses R, the software choice of many statisticians doing research on resampling methods.1 There are several packages of functions for resampling in R. We will focus on the boot package, which offers the most capabilities. Unlike software such as Minitab and SPSS, R is not menu driven and requires command line requests to load data and access various functions. All commands used in this chapter are available on the text website. SPSS and SAS also offer preprogrammed bootstrap and permutation methods. SPSS has an auxiliary bootstrap module that contains most of the methods described in this chapter. In SAS, the SURVEYSELECT procedure can be used to do the necessary resampling. The bootstrap macro contains most of the confidence interval methods offered by R. You can find links for downloading these modules or macros on the text website.
16.1 The Bootstrap Idea
16-3
16.1 The Bootstrap Idea When you complete this section, you will be able to • Randomly select bootstrap resamples from a small sample using software and a table of random numbers. • Find the bootstrap standard error from a collection of resamples. • Use computer output to describe the results of a bootstrap analysis of the mean.
Here is the example we will use to introduce these methods.
EXAMPLE DATA TIME50
16.1 Time to start a business. The World Bank collects information about starting businesses throughout the world. They have determined the time, in days, to complete all the procedures required to start a business. For this example, we use the times to start a business for a random sample of 50 countries included in the World Bank survey. Figure 16.1(a) gives a histogram and Figure 16.1(b) gives the Normal quantile plot. The data are strongly skewed to the right. The median is 12 days and the mean is almost twice as large, 23.26 days. We have some concerns about using the t procedures for these data.
CHALLENGE 140
0.6
120
Times (in days)
0.5
Percent
0.4 0.3
100 80 60
0.2
40
0.1
20 0
0.0 0
50
100 Time (in days)
(a)
150
–2
–1
0 1 Normal score
2
(b)
FIGURE 16.1 (a) The distribution of 50 times to start a business. (b) Normal quantile plot of the times to start a business, for Example 16.1. The distribution is strongly right-skewed.
16-4
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
The big idea: resampling and the bootstrap distribution LOOK BACK sampling distribution, p. 302
resamples
sampling with replacement
Statistical inference is based on the sampling distributions of sample statistics. A sampling distribution is based on many random samples from the population. The bootstrap is a way of finding the sampling distribution, at least approximately, from just one sample. Here is the procedure: Step 1: Resampling. In Example 16.1, we have just one random sample. In place of many samples from the population, create many resamples by repeatedly sampling with replacement from this one random sample. Each resample is the same size as the original random sample. Sampling with replacement means that after we randomly draw an observation from the original sample, we put it back before drawing the next observation. Think of drawing a number from a hat and then putting it back before drawing again. As a result, any number can be drawn more than once. If we sampled without replacement, we’d get the same set of numbers we started with, though in a different order. Figure 16.2 illustrates three resamples from a sample of five observations. In practice, we draw hundreds or thousands of resamples, not just three. 23 4 19 9 10 Mean = 13.0
4 19 19 9 9 Mean = 12.0
23 4 19 9 9 Mean = 12.8
4 4 19 19 9 Mean = 11.0
FIGURE 16.2 The resampling idea. The top box is a sample of size n 5 5 from the time to start a business data. The three lower boxes are three resamples from this original sample. Some values from the original sample are repeated in the resamples because each resample is formed by sampling with replacement. We calculate the statistic of interest, the sample mean in this example, for the original sample and each resample.
bootstrap distribution
Step 2: Bootstrap distribution. The sampling distribution of a statistic collects the values of the statistic from the many samples of the population. The bootstrap distribution of a statistic collects its values from the many resamples. The bootstrap distribution gives information about the sampling distribution.
THE BOOTSTRAP IDEA The original sample is representative of the population from which it was drawn. Thus, resamples from this original sample represent what we would get if we took many samples from the population. The bootstrap distribution of a statistic, based on the resamples, represents the sampling distribution of the statistic.
EXAMPLE DATA TIME50
16.2 Bootstrap distribution of mean time to start a business. In Example 16.1, we want to estimate the population mean time to start a business, m, so the statistic is the sample mean x. For our one random sample of 50 times,
16.1 The Bootstrap Idea
16-5
Mean times of resamples (in days)
x ⫽ 23.26 days. When we resample, we get different values of x, just as we would if we took new samples from the population of all times to start a business. We randomly generated 3000 resamples for these data. The mean for the resamples is 23.30 days and the standard deviation is 3.85. Figure 16.3(a) gives a histogram of the bootstrap distribution of the means of 3000 resamples from the time to start a business data. The Normal density curve with the mean 23.30 and standard deviation 3.85 is superimposed on the histogram. A Normal quantile plot is given in Figure 16.3(b). The distribution of the resample means is approximately Normal, although a small amount of skewness is still evident.
35
30
25
20
15
15
20
25
30
Mean times of resamples (in days)
(a)
35
–3
–2
–1 0 1 Normal score
2
3
(b)
FIGURE 16.3 (a) The bootstrap distribution of 3000 resample means from the sample of times to start a business. The smooth curve is the Normal density function for the distribution that matches the mean and standard deviation of the distribution of the resample means. (b) The Normal quantile plot confirms that the bootstrap distribution is somewhat skewed to the right but fits the Normal distribution quite well.
LOOK BACK central limit theorem, p. 307
LOOK BACK mean and standard deviation of x, p. 306
According to the bootstrap idea, the bootstrap distribution represents the sampling distribution. Let’s compare the bootstrap distribution with what we know about the sampling distribution. Shape: We see that the bootstrap distribution is nearly Normal. The central limit theorem says that the sampling distribution of the sample mean x is approximately Normal if n is large. So the bootstrap distribution shape is close to the shape we expect the sampling distribution to have. Center: The bootstrap distribution is centered close to the mean of the original sample, 23.30 days versus 23.26 days for the original sample.
16-6
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
bootstrap standard error
Therefore, the mean of the bootstrap distribution has little bias as an estimator of the mean of the original sample. We know that the sampling distribution of x is centered at the population mean m, that is, that x is an unbiased estimate of m. So the resampling distribution behaves (starting from the original sample) as we expect the sampling distribution to behave (starting from the population). Spread: The histogram and density curve in Figure 16.3(a) picture the variation among the resample means. We can get a numerical measure by calculating their standard deviation. Because this is the standard deviation of the 3000 values of x that make up the bootstrap distribution, we call it the bootstrap standard error of x. The numerical value is 3.85. In fact, we know that the standard deviation of x is s兾 1n , where s is the standard deviation of individual observations in the population. Our usual estimate of this quantity is the standard error of x, s兾 1n , where s is the standard deviation of our one random sample. For these data, s ⫽ 28.20 and s 2n
LOOK BACK central limit theorem, p. 307
⫽
28.20 250
⫽ 3.99
The bootstrap standard error 3.85 is relatively close to the theory-based estimate 3.99. In discussing Example 16.2, we took advantage of the fact that statistical theory tells us a great deal about the sampling distribution of the sample mean x. We found that the bootstrap distribution created by resampling matches the properties of this sampling distribution. The heavy computation needed to produce the bootstrap distribution replaces the heavy theory (central limit theorem, mean, and standard deviation of x) that tells us about the sampling distribution. The great advantage of the resampling idea is that it often works even when theory fails. Of course, theory also has its advantages: we know exactly when it works. We don’t know exactly when resampling works, so that “When can I safely bootstrap?” is a somewhat subtle issue. Figure 16.4 illustrates the bootstrap idea by comparing three distributions. Figure 16.4(a) shows the idea of the sampling distribution of the sample mean x: take many random samples from the population, calculate the mean x for each sample, and collect these x-values into a distribution. Figure 16.4(b) shows how traditional inference works: statistical theory tells us that if the population has a Normal distribution, then the sampling distribution of x is also Normal. If the population is not Normal but our sample is large, we can use the central limit theorem. If m and s are the mean and standard deviation of the population, the sampling distribution of x has mean m and standard deviation s兾 1n. When it is available, theory is wonderful: we know the sampling distribution without the impractical task of actually taking many samples from the population. Figure 16.4(c) shows the bootstrap idea: we avoid the task of taking many samples from the population by instead taking many resamples from a single sample. The values of x from these resamples form the bootstrap distribution. We use the bootstrap distribution rather than theory to learn about the sampling distribution.
16.1 The Bootstrap Idea
SRS of size n
16-7
x–
SRS of size n
x–
SRS of size n · · ·
x– · · ·
POPULATION unknown mean
Sampling distribution (a)
_
/兹n
Theory
Sampling distribution
NORMAL POPULATION unknown mean (b)
One SRS of size n
Resample of size n
x–
Resample of size n
x–
Resample of size n
x– · · ·
· · · POPULATION unknown mean
Bootstrap distribution (c)
FIGURE 16.4 (a) The idea of the sampling distribution of the sample mean x: take very many samples, collect the x-values from each, and look at the distribution of these values. (b) The theory shortcut: if we know that the population values follow a Normal distribution, theory tells us that the sampling distribution of x is also Normal. (c) The bootstrap idea: when theory fails and we can afford only one sample, that sample stands in for the population, and the distribution of x in many resamples stands in for the sampling distribution.
16-8
CHAPTER 16
•
Bootstrap Methods and Permutation Tests USE YOUR KNOWLEDGE DATA TIME6
16.1 A small bootstrap example. To illustrate the bootstrap procedure, let’s bootstrap a small random subset of the time to start a business data: 8
3
10
47
7
32
(a) Sample with replacement from this initial SRS by rolling a die. Rolling a 1 means select the first member of the SRS, a 2 means select the second member, and so on. (You can also use Table B of random digits, responding only to digits 1 to 6.) Create 20 resamples of size n ⫽ 6. CHALLENGE
(b) Calculate the sample mean for each of the resamples. (c) Make a stemplot of the means of the 20 resamples. This is the bootstrap distribution. (d) Calculate the bootstrap standard error. 16.2 Standard deviation versus standard error. Explain the difference between the standard deviation of a sample and the standard error of a statistic such as the sample mean.
Thinking about the bootstrap idea It might appear that resampling creates new data out of nothing. This seems suspicious. Even the name “bootstrap” comes from the impossible image of “pulling yourself up by your own bootstraps.”2 But the resampled observations are not used as if they were new data. The bootstrap distribution of the resample means is used only to estimate how the sample mean of one actual sample of size 50 would vary because of random sampling. Using the same data for two purposes—to estimate a parameter and also to estimate the variability of the estimate—is perfectly legitimate. We do exactly this when we calculate x to estimate m and then calculate s兾 1n from the same data to estimate the variability of x. What is new? First of all, we don’t rely on the formula s兾 1n to estimate the standard deviation of x. Instead, we use the ordinary standard deviation of the many x-values from our many resamples.3 Suppose that we take B resamples and call the means of these resamples x* to distinguish them from the mean x of the original sample. We would then find the mean and standard deviation of the x*’s in the usual way. To make clear that these are the mean and standard deviation of the means of the B resamples rather than the mean x and standard deviation s of the original sample, we use a distinct notation:
LOOK BACK describing distributions with numbers, p. 30
meanboot ⫽ SEboot ⫽
1 x* Ba 2 1 ax* ⫺ meanboot b a BB ⫺ 1
These formulas go all the way back to Chapter 1. Once we have the values x*, we can just ask our software for their mean and standard deviation.
16.1 The Bootstrap Idea
16-9
Because we will often apply the bootstrap to statistics other than the sample mean, here is the general definition for the bootstrap standard error.
BOOTSTRAP STANDARD ERROR The bootstrap standard error SEboot of a statistic is the standard deviation of the bootstrap distribution of that statistic.
Another thing that is new is that we don’t appeal to the central limit theorem or other theory to tell us that a sampling distribution is roughly Normal. We look at the bootstrap distribution to see if it is roughly Normal (or not). In most cases, the bootstrap distribution has approximately the same shape and spread as the sampling distribution, but it is centered at the original sample statistic value rather than the parameter value. In summary, the bootstrap allows us to calculate standard errors for statistics for which we don’t have formulas and to check Normality for statistics that theory doesn’t easily handle. To apply the bootstrap idea, we must start with a statistic that estimates the parameter we are interested in. We come up with a suitable statistic by appealing to another principle that we have often applied without thinking about it.
THE PLUG-IN PRINCIPLE To estimate a parameter, a quantity that describes the population, use the statistic that is the corresponding quantity for the sample.
The plug-in principle tells us to estimate a population mean m by the sample mean x and a population standard deviation s by the sample standard deviation s. Estimate a population median by the sample median and a population regression line by the least-squares line calculated from a sample. The bootstrap idea itself is a form of the plug-in principle: substitute the data for the population and then draw samples (resamples) to mimic the process of building a sampling distribution.
Using software Software is essential for bootstrapping in practice. Here is an outline of the program you would write if your software can choose random samples from a set of data but does not have bootstrap functions: Repeat B times { Draw a resample with replacement from the data. Calculate the resample statistic. Save the resample statistic into a variable. } Make a histogram and Normal quantile plot of the B resample statistics. Calculate the standard deviation of the B statistics.
16-10
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
EXAMPLE DATA TIME50
CHALLENGE
16.3 Using software. R has packages that contain various bootstrap functions so we do not have to write them ourselves. If the 50 times to start a business times are saved as a variable, we can use functions to resample from the data, calculate the means of the resamples, and request both graphs and printed output. We can also ask that the bootstrap results be saved for later access. The function plot.boot will generate graphs similar to those in Figure 16.3 so you can assess Normality. Figure 16.5 contains the default output from a call of the function boot. The variable Time contains the 50 starting times, the function theta is specified to be the mean, and we request 3000 resamples. The original entry gives the mean x ⫽ 23.26 of the original sample. Bias is the difference between the mean of the resample means and the original mean. If we add the entries for bias and original we get the mean of the resample means, meanboot: 23.26 ⫹ 0.04 ⫽ 23.30 The bootstrap standard error is displayed under std.error. All these values except original will differ a bit if you take another 3000 resamples, because resamples are drawn at random.
R Console
ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = Time, statistic = theta, R = 3000) Bootstrap Statistics : original bias t1* 23.26 0.03955333
std. error 3.850817
FIGURE 16.5 R output for the time to start a business bootstrap, for Example 16.3.
SECTION 16.1 Summary To bootstrap a statistic such as the sample mean, draw hundreds of resamples with replacement from a single original sample, calculate the statistic for each resample, and inspect the bootstrap distribution of the resample statistics. A bootstrap distribution approximates the sampling distribution of the statistic. This is an example of the plug-in principle: use a quantity based on the sample to approximate a similar quantity from the population. A bootstrap distribution usually has approximately the same shape and spread as the sampling distribution. It is centered at the statistic (from the original sample) when the sampling distribution is centered at the parameter (of the population). Use graphs and numerical summaries to determine whether the bootstrap distribution is approximately Normal and centered at the original statistic,
16.1 The Bootstrap Idea
16-11
and to get an idea of its spread. The bootstrap standard error is the standard deviation of the bootstrap distribution. The bootstrap does not replace or add to the original data. We use the bootstrap distribution as a way to estimate the variation in a statistic based on the original data.
SECTION 16.1 Exercises (a) Do you think that these data appear to be from a Normal distribution? Give reasons for your answer.
For Exercises 16.1 and 16.2, see page 16-8. 16.3 Gosset’s data on double stout sales. William Sealy Gosset worked at the Guinness Brewery in Dublin and made substantial contributions to the practice of statistics. In Exercise 1.61 (page 48), we examined Gosset’s data on the change in the double stout market before and after World War I (1914–1918). For various regions in England and Scotland, he calculated the ratio of sales in 1925, after the war, as a percent of sales in 1913, before the war. Here are the data for a sample of six of the regions in the original data: STOUT6
(b) Select five resamples from this set of data. (c) Compute the mean for each resample. 16.4 Find the bootstrap standard error. Refer to your work in the previous exercise. STOUT6 (a) Would you expect the bootstrap standard error to be larger, smaller, or approximately equal to the standard deviation of the original sample of six regions? Explain your answer. (b) Find the bootstrap standard error.
Bristol
94
Glasgow
66
English P
46
Liverpool
140
English Agents
78
Scottish
16.5 Read the output. Figure 16.6 gives a histogram and a Normal quantile plot for 3000 resample means from R. Interpret these plots.
24
FIGURE 16.6 R output for
100 t* 80
0.020 0.015
60
0.010
40
0.005 0.000
Density
0.025
120
0.030
the change in double stout sales bootstrap, for Exercise 16.5.
20
60
100 t*
140
–3 –2 –1 0 1 2 3 Normal Score
16-12
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
R Console
ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = stout, statistic = theta, R = 3000) Bootstrap Statistics : original bias t1* 74.66667 -0.2038889
std.error 14.90047
Example 1.48 (page 71). The distribution is clearly not Normal; it has three peaks possibly corresponding to three types of seats. We view these data as coming from a process that gives seat prices for an event such as this. STUBHUB 16.10 Bootstrap distribution of time spent watching videos on a cell phone. The hours per month spent watching videos on cell phones in a random sample of eight cell phone subscribers (Example 7.1, page 421) are 11.9 2.8 3.0 6.2 4.7 9.8 11.1 7.8 The distribution has no outliers, but we cannot assess Normality from such a small sample. VIDEO
FIGURE 16.7 R output for the change in double stout sales bootstrap, for Exercise 16.6.
16.6 Read the output. Figure 16.7 gives output from R for the sample of regions in Exercise 16.3. Summarize the results of the analysis using this output. 16.7 What’s wrong? Explain what is wrong with each of the following statements. (a) The standard deviation of the bootstrap distribution will be approximately the same as the standard deviation of the original sample. (b) The bootstrap distribution is created by resampling without replacement from the original sample. (c) When generating the resamples, it is best to use a sample size smaller than the size of the original sample. (d) The bootstrap distribution is created by resampling with replacement from the population. Inspecting the bootstrap distribution of a statistic helps us judge whether the sampling distribution of the statistic is close to Normal. Bootstrap the sample mean x for each of the data sets in Exercises 16.8 to 16.12 using 2000 resamples. Construct a histogram and a Normal quantile plot to assess Normality of the bootstrap distribution. On the basis of your work, do you expect the sampling distribution of x to be close to Normal? Save your bootstrap results for later analysis. 16.8 Bootstrap distribution of average IQ score. The distribution of the 60 IQ test scores in Table 1.1 (page 16) is roughly Normal (see Figure 1.9) and the sample size is large enough that we expect a Normal sampling distribution. IQ 16.9 Bootstrap distribution of StubHub! prices. We examined the distribution of the 186 tickets for the National Collegiate Athletic Association (NCAA) Women’s Final Four Basketball Championship in New Orleans posted for sale on StubHub! on January 2, 2013, in
16.11 Bootstrap distribution of Titanic passenger ages. In Example 1.36 (page 54) we examined the distribution of the ages of the passengers on the Titanic. There is a single mode around 25, a short left tail, and a long right tail. We view these data as coming from a process that would generate similar data. TITANIC 16.12 Bootstrap distribution of average audio file length. The lengths (in seconds) of audio files found on an iPod (Table 7.3, page 437) are skewed. We previously transformed the data prior to using t procedures. SONGS 16.13 Standard error versus the bootstrap standard error. We have two ways to estimate the standard deviation of a sample mean x: use the formula s兾 1n for the standard error, or use the bootstrap standard error. (a) Find the sample standard deviation s for the 60 IQ test scores in Exercise 16.8 and use it to find the standard error s兾 1n of the sample mean. How closely does your result agree with the bootstrap standard error from your resampling in Exercise 16.8? (b) Find the sample standard deviation s for the StubHub! ticket price data in Exercise 16.9 and use it to find the standard error s兾 1n of the sample mean. How closely does your result agree with the bootstrap standard error from your resampling in Exercise 16.9? (c) Find the sample standard deviation s for the eight video-watching times in Exercise 16.10 and use it to find the standard error s兾 1n of the sample mean. How closely does your result agree with the bootstrap standard error from your resampling in Exercise 16.10? 16.14 Service center call lengths. Table 1.2 (page 19) gives the service center call lengths for a sample of 80 calls. See Example 1.15 (page 18) for more details about these data. CALLS80 (a) Make a histogram of the call lengths. The distribution is strongly skewed. (b) The central limit theorem says that the sampling distribution of the sample mean x becomes Normal as
16.2 First Steps in Using the Bootstrap the sample size increases. Is the sampling distribution roughly Normal for n ⫽ 80? To find out, bootstrap these data using 1000 resamples and inspect the bootstrap distribution of the mean. The central part of the distribution is close to Normal. In what way do the tails depart from Normality? 16.15 More on service center call lengths. Here is an SRS of 10 of the service center call lengths from Exercise 16.14: CALLS10 104 102
35
211
56
325
67
9
179
59
16-13
We expect the sampling distribution of x to be less close to Normal for samples of size 10 than for samples of size 80 from a skewed distribution. (a) Create and inspect the bootstrap distribution of the sample mean for these data using 1000 resamples. Compared with your distribution from the previous exercise, is this distribution closer to or farther away from Normal? (b) Compare the bootstrap standard errors for your two sets of resamples. Why is the standard error larger for the smaller SRS?
16.2 First Steps in Using the Bootstrap When you complete this section, you will be able to • Determine when it is appropriate to use the bootstrap standard error and the t distribution to find a confidence interval. • Use the bootstrap standard error and the t distribution to find a confidence interval.
LOOK BACK bias, p. 179
To introduce the key ideas of resampling and bootstrap distributions, we studied an example in which we knew quite a bit about the actual sampling distribution. We saw that the bootstrap distribution agrees with the sampling distribution in shape and spread. The center of the bootstrap distribution is not the same as the center of the sampling distribution. The sampling distribution of a statistic used to estimate a parameter is centered at the actual value of the parameter in the population, plus any bias. The bootstrap distribution is centered at the value of the statistic for the original sample, plus any bias. The key fact is that the two biases are similar even though the two centers may not be. The bootstrap method is most useful in settings where we don’t know the sampling distribution of the statistic. The principles are • Shape: Because the shape of the bootstrap distribution approximates the shape of the sampling distribution, we can use the bootstrap distribution to check Normality of the sampling distribution. • Center: A statistic is biased as an estimate of the parameter if its sampling distribution is not centered at the true value of the parameter. We can check bias by seeing whether the bootstrap distribution of the statistic is centered at the value of the statistic for the original sample.
bootstrap estimate of bias
More precisely, the bias of a statistic is the difference between the mean of its sampling distribution and the true value of the parameter. The bootstrap estimate of bias is the difference between the mean of the bootstrap distribution and the value of the statistic in the original sample. • Spread: The bootstrap standard error of a statistic is the standard deviation of its bootstrap distribution. The bootstrap standard error estimates the standard deviation of the sampling distribution of the statistic.
16-14
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
Bootstrap t confidence intervals If the bootstrap distribution of a statistic shows a Normal shape and small bias, we can get a confidence interval for the parameter by using the bootstrap standard error and the familiar t distribution. An example will show how this works.
EXAMPLE
DATA
16.4 Grade point averages. A study of college students at a large university looked at grade point average (GPA) after three semesters of college as a measure of success. In Example 11.1 (page 612) we examined predictors of GPA. Let’s take a look at the distribution of the GPA for the 150 students in this study. A histogram is given in Figure 16.8(a). The Normal quantile plot is given in Figure 16.8(b). The distribution is strongly skewed to the left. The Normal quantile plot suggests that there are several students with perfect (4.0) GPAs and one at the lower end of the distribution (0.0). These data are not Normally distributed.
GPA
4 0.12 0.10
3 CHALLENGE
GPA
Percent
0.08 0.06 0.04
2
1
0.02 0
0.00 0
1
2 GPA
3
4
–2
(a)
–1
0 1 Normal score (b)
2
FIGURE 16.8 Histogram and Normal quantile plot for 150 grade point averages, for Example 16.4. The distribution is strongly skewed.
LOOK BACK trimmed mean, p. 53
The first step is to abandon the mean as a measure of center in favor of a statistic that focuses on the central part of the distribution. We might choose the median, but in this case we will use the 25% trimmed mean, the mean of the middle 50% of the observations. The median is the middle observation or the mean of the two middle observations. The trimmed mean often does a better job of representing the average of typical observations than does the median. Our parameter is the 25% trimmed mean of the population of college student GPAs after three semesters at this large university. By the plug-in principle, the statistic that estimates this parameter is the 25% trimmed mean of the sample
16.2 First Steps in Using the Bootstrap
16-15
of 150 students. Because 25% of 150 is 37.5, we drop the 37 lowest and 37 highest GPAs and find the mean of the remaining 76 GPAs. The statistic is x25% ⫽ 2.950 Given the relatively large sample size from this strongly skewed distribution, we can use the central limit theorem to argue that the sampling distribution would be approximately Normal with mean near 2.950. Estimating its standard deviation, however, is a more difficult task. We can’t simply use the standard error of the sample mean based on the remaining 76 observations, as that will underestimate the true variability. Fortunately, we don’t need any distribution facts to use the bootstrap. We bootstrap the 25% trimmed mean just as we bootstrapped the sample mean: draw 3000 resamples of size 150 from the 150 GPAs, calculate the 25% trimmed mean for each resample, and form the bootstrap distribution from these 3000 values. Figure 16.9 shows the bootstrap distribution of the 25% trimmed mean. Here is the summary output from R: ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = GPA, statistic = theta, R = 3000) Bootstrap Statistics : original bias std. error t1* 2.949605 -0.002912 0.0778597
3.2
Means of resamples
3.1
3.0
2.9
2.8
2.7
2.7
2.8
2.9 3.0 3.1 Means of resamples (a)
3.2
3.3
–3
–2
–1
1 0 Normal score
2
3
(b)
FIGURE 16.9 The bootstrap distribution of the 25% trimmed means for 3000 resamples from the GPA data in Example 16.4. The bootstrap distribution is approximately Normal.
16-16
CHAPTER 16
•
Bootstrap Methods and Permutation Tests What do we see? Shape: The bootstrap distribution is close to Normal. This suggests that the sampling distribution of the trimmed mean is also close to Normal. Center: The bootstrap estimate of bias is ⫺0.003, which is small relative to the value 2.950 of the statistic. So the statistic (the trimmed mean of the sample) has small bias as an estimate of the parameter (the trimmed mean of the population). Spread: The bootstrap standard error of the statistic is SEboot ⫽ 0.078 This is an estimate of the standard deviation of the sampling distribution of the trimmed mean. Recall the familiar one-sample t confidence interval (page 421) for the mean of a Normal population: s x ⫾ t*SE ⫽ x ⫾ t* 2n This interval is based on the Normal sampling distribution of the sample mean x and the formula SE ⫽ s兾 1n for the standard error of x. When a bootstrap distribution is approximately Normal and has small bias, we can essentially use the same idea with the bootstrap standard error to get a confidence interval for any parameter.
BOOTSTRAP t CONFIDENCE INTERVAL Suppose that the bootstrap distribution of a statistic from an SRS of size n is approximately Normal and that the bootstrap estimate of bias is small. An approximate level C confidence interval for the parameter that corresponds to this statistic by the plug-in principle is statistic ⫾ t*SEboot where SEboot is the bootstrap standard error for this statistic and t* is the critical value of the t1n ⫺ 12 distribution with area C between ⫺t* and t*.
EXAMPLE DATA GPA
16.5 Bootstrap distribution of the trimmed mean. We want to estimate the 25% trimmed mean of the population of all college student GPAs after three semesters at this large university. We have an SRS of size n ⫽ 150. The software output above shows that the trimmed mean of this sample is x25% ⫽ 2.950 and that the bootstrap standard error of this statistic is SEboot ⫽ 0.078. A 95% confidence interval for the population trimmed mean is therefore x25% ⫾ t*SEboot ⫽ 2.950 ⫾ 12.0002 10.0782 ⫽ 2.950 ⫾ 0.156
CHALLENGE
⫽ 12.794, 3.1062 Because Table D does not have entries for 3n ⫺ 21372 4 ⫺ 1 ⫽ 75 degrees of freedom, we used t* ⫽ 2.000, the entry for 60 degrees of freedom. We are 95% confident that the 25% trimmed mean (the mean of the middle 50%) for the population of college student GPAs after three semesters at this large university is between 2.794 and 3.106.
16.2 First Steps in Using the Bootstrap
16-17
USE YOUR KNOWLEDGE 16.16 Bootstrap t confidence interval. Recall Example 16.2 (page 16-4). Suppose that a bootstrap distribution was created using 3000 resamples and that the mean and standard deviation of the resample means were 23.29 and 3.90, respectively. (a) What is the bootstrap estimate of the bias? (b) What is the bootstrap standard error of x? (c) Assume that the bootstrap distribution is reasonably Normal. Since the bias is small relative to the observed x, the bootstrap t confidence interval for the population mean m is justified. Give the 95% bootstrap t confidence interval for m. DATA SONGS
16.17 Bootstrap t confidence interval for average audio file length. Return to or create the bootstrap distribution resamples on the sample mean for audio file length in Exercise 16.12 (page 16-12). In Example 7.10 (page 437), the t confidence interval was applied to the logarithm of the time measurements. (a) Inspect the bootstrap distribution. Is a bootstrap t confidence interval appropriate? Explain why or why not. (b) Construct the 95% bootstrap t confidence interval.
CHALLENGE
(c) Compare the bootstrap results with the t confidence interval reported in Example 7.11 (page 438).
Bootstrapping to compare two groups
LOOK BACK two-sample t significance test, p. 454
Two-sample problems are among the most common statistical settings. In a two-sample problem, we wish to compare two populations, such as male and female college students, based on separate samples from each population. When both populations are roughly Normal, the two-sample t procedures compare the two population means. The bootstrap can also compare two populations, without the Normality condition and without the restriction to comparison of means. The most important new idea is that bootstrap resampling must mimic the “separate samples” design that produced the original data.
BOOTSTRAP FOR COMPARING TWO POPULATIONS Given independent SRSs of sizes n and m from two populations: 1. Draw a resample of size n with replacement from the first sample and a separate resample of size m from the second sample. Compute a statistic that compares the two groups, such as the difference between the two sample means. 2. Repeat this resampling process thousands of times. 3 Construct the bootstrap distribution of the statistic. Inspect its shape, bias, and bootstrap standard error in the usual way.
16-18
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
EXAMPLE DATA
16.6 Bootstrap comparison of GPAs. In Example 16.4 we looked at grade point average (GPA) after three semesters of college as a measure of success. How do GPAs compare between men and women? Figure 16.10 shows density curves and Normal quantile plots for the GPAs of 91 males and 59 females. The distributions are both far from Normal. Here are some summary statistics:
GPA
CHALLENGE
Gender
n
Male
91
2.784
0.859
Female
59
2.933
0.748
x
s
⫺0.149
Difference
The data suggest that GPAs tend to be slightly higher for females. The mean GPA for females is roughly 0.15 higher than the mean for males.
0.6
4
0.5 3 Male Female GPA
Density
0.4 0.3
2
0.2 1 0.1 0.0
0 0
1
2 GPA
(a)
3
4
–2
–1
0 Normal score
1
2
(b)
FIGURE 16.10 Density curves and Normal quantile plots of the distributions of GPA for males and females, for Example 16.6.
In the setting of Example 16.6 we want to estimate the difference between population means, m1 ⫺ m2. We might be somewhat reluctant to use the twosample t confidence interval because both samples are very skewed. To compute the bootstrap standard error for the difference in sample means x1 ⫺ x2, resample separately from the two samples. Each of our 3000 resamples consists of two group resamples, one of size 91 drawn with replacement from the male data and one of size 59 drawn with replacement from the female data.
16.2 First Steps in Using the Bootstrap
16-19
For each combined resample, compute the statistic x1 ⫺ x2. The 3000 differences form the bootstrap distribution. The bootstrap standard error is the standard deviation of the bootstrap distribution. The boot function in R automates this bootstrap procedure. Here is the R output: STRATIFIED BOOTSTRAP Call: boot(data = gpa, statistic = meanDiff, R = 3000, strata = sex) Bootstrap Statistics : original bias std. error t1* -0.1490259 0.003989901 0.1327419 Figure 16.11 shows that the bootstrap distribution is close to Normal. We can trust the bootstrap t confidence interval for these data. A 95% confidence interval for the difference in mean GPAs (males versus females) is therefore x25% ⫾ t*SEboot ⫽ ⫺0.149 ⫾ 12.0092 10.1332 ⫽ ⫺0.149 ⫾ 0.267 ⫽ 1⫺0.416, 0.1182 Because Table D does not have entries for min1n1 ⫺ 1, n2 ⫺ 12 ⫽ 58 degrees of freedom, we used t* ⫽ 2.009, the entry for 50 degrees of freedom.
Difference in means of resamples
0.4
0.2
0.0
–0.2
–0.4
–0.6 –0.6
–0.4
–0.2
0.0
0.2
Difference in means of resamples (a)
0.4
–3
–2
–1 1 0 Normal score (b)
2
3
FIGURE 16.11 The bootstrap distribution and Normal quantile plot for the differences in means for the GPA data.
16-20
CHAPTER 16
•
Bootstrap Methods and Permutation Tests We are 95% confident that the difference in the mean GPAs of males and females at this large university after three semesters is between ⫺0.416 and 0.118. Because 0 is in this interval, we cannot conclude that the two population means are different. We will discuss hypothesis testing in Section 16.5. In this example, the bootstrap distribution of the difference is close to Normal. When the bootstrap distribution is non-Normal, we can’t trust the bootstrap t confidence interval. Fortunately, there are more general ways of using the bootstrap to get confidence intervals that can be safely applied when the bootstrap distribution is not Normal. These methods, which we discuss in Section 16.4, are the next step in practical use of the bootstrap. USE YOUR KNOWLEDGE
DATA DRP
16.18 Bootstrap comparison of average reading abilities. Table 7.4 (page 452) gives the scores on a test of reading ability for two groups of third-grade students. The treatment group used “directed reading activities” and the control group followed the same curriculum without the activities. (a) Bootstrap the difference in means x1 ⫺ x2 and report the bootstrap standard error.
CHALLENGE
(b) Inspect the bootstrap distribution. Is a bootstrap t confidence interval appropriate? If so, give a 95% confidence interval. (c) Compare the bootstrap results with the two-sample t confidence interval reported in Example 7.14 on page 453.
DATA GPA
16.19 Formula-based versus bootstrap standard error. We have a formula (page 451) for the standard error of x1 ⫺ x2. This formula does not depend on Normality. How does this formula-based standard error for the data of Example 16.6 compare with the bootstrap standard error? BEYOND THE BASICS
The Bootstrap for a Scatterplot Smoother CHALLENGE
The bootstrap idea can be applied to quite complicated statistical methods, such as the scatterplot smoother illustrated in Chapter 2 (page 96).
EXAMPLE 16.7 Do all daily numbers have an equal payoff? The New Jersey Pick-It Lottery is a daily numbers game run by the state of New Jersey. We’ll analyze the first 254 drawings after the lottery was started in 1975.4 Buying a ticket entitles a player to pick a number between 000 and 999. Half the money bet each day goes into the prize pool. (The state takes the other half.) The state picks a winning number at random, and the prize pool is shared equally among all winning tickets. Although all numbers are equally likely to win, numbers chosen by fewer people have bigger payoffs if they win because the prize is shared among fewer tickets. Figure 16.12 is a scatterplot of the first 254 winning numbers and their payoffs. What patterns can we see?
16.2 First Steps in Using the Bootstrap
800
16-21
Smooth Regression line
Payoff ($)
600
FIGURE 16.12 The first 254 winning numbers in the New Jersey Pick-It Lottery and the payoffs for each, for Example 16.7. To see patterns we use least-squares regression (dashed line) and a scatterplot smoother (curve).
400
200
0
200
400
600
800
1000
Number
The straight line in Figure 16.12 is the least-squares regression line. The line shows a general trend of higher payoffs for larger winning numbers. The curve in the figure was fitted to the plot by a scatterplot smoother that follows local patterns in the data rather than being constrained to a straight line. The curve suggests that there were larger payoffs for numbers in the intervals 000 to 100, 400 to 500, 600 to 700, and 800 to 999. Are the patterns displayed by the scatterplot smoother just chance? We can use the bootstrap distribution of the smoother’s curve to get an idea of how much random variability there is in the curve. Each resample “statistic” is now a curve rather than a single number. Figure 16.13 shows the curves that result from applying the smoother to 20 resamples from the 254 data points
FIGURE 16.13 The curves produced by the scatterplot smoother for 20 resamples from the data displayed in Figure 16.12. The curve for the original sample is the heavy line.
800
Original smooth Bootstrap smooths
Payoff ($)
600
400
200
0
200
400
600
Number
800
1000
16-22
CHAPTER 16
•
Bootstrap Methods and Permutation Tests in Figure 16.12. The original curve is the thick line. The spread of the resample curves about the original curve shows the sampling variability of the output of the scatterplot smoother. Nearly all the bootstrap curves mimic the general pattern of the original smoother curve, showing, for example, the same low average payoffs for numbers in the 200s and 300s. This suggests that these patterns are real, not just chance. In fact, when people pick “random” numbers, they tend to choose numbers starting with 2, 3, 5, or 7, so these numbers have lower payoffs. This pattern disappeared after 1976; it appears that players noticed the pattern and changed their number choices.
SECTION 16.2 Summary Bootstrap distributions mimic the shape, spread, and bias of sampling distributions. The bootstrap standard error SEboot of a statistic is the standard deviation of its bootstrap distribution. It measures how much the statistic varies under random sampling. The bootstrap estimate of the bias of a statistic is the mean of the bootstrap distribution minus the statistic for the original data. Small bias means that the bootstrap distribution is centered at the statistic of the original sample and suggests that the sampling distribution of the statistic is centered at the population parameter. The bootstrap can estimate the sampling distribution, bias, and standard error of a wide variety of statistics, such as the trimmed mean, whether or not statistical theory tells us about their sampling distributions. If the bootstrap distribution is approximately Normal and the bias is small, we can give a bootstrap t confidence interval, statistic ⫾ t*SEboot , for the parameter. Do not use this t interval if the bootstrap distribution is not Normal or shows substantial bias. To use the bootstrap to compare two populations, draw separate resamples from each sample and compute a statistic that compares the two groups. Repeat many times and use the bootstrap distribution for inference.
SECTION 16.2 Exercises For Exercises 16.16 and 16.17, see page 16-17; and for Exercises 16.18 and 16.19, see page 16-20. 16.20 Should you use the bootstrap standard error and the t distribution for the confidence interval? For each of the following situations, explain whether or not you would use the bootstrap standard error and the t distribution for the confidence interval. Give reasons for your answers. (a) The bootstrap distribution of the mean is approximately Normal, and the difference between the mean of the data and the mean of the bootstrap distribution is large relative to the mean of the data.
(b) The bootstrap distribution of the mean is approximately Normal, and the difference between the mean of the data and the mean of the bootstrap distribution is small relative to the mean of the data. (c) The bootstrap distribution of the mean is clearly skewed, and the difference between the mean of the data and the mean of the bootstrap distribution is large relative to the mean of the data. (d) The bootstrap distribution of the mean is clearly skewed, and the difference between the mean of the data and the mean of the bootstrap distribution is small relative to the mean of the data.
16.2 First Steps in Using the Bootstrap 16.21 Use the bootstrap standard error and the t distribution for the confidence interval. The observed mean is 112.3, the mean of the bootstrap distribution is 109.8, the standard error is 9.4, and n ⫽ 51. Use the t distribution to find the 95% confidence interval. 16.22 Bootstrap t confidence interval for the StubHub! prices. In Exercise 16.9 (page 16-12) we examined the bootstrap for the prices of tickets to the NCAA Women’s Final Four Basketball Championship in New Orleans. STUBHUB (a) Find the bootstrap t 95% confidence interval for these data. (b) Compare the interval you found in part (a) with the usual t interval. (c) Which interval do you prefer? Give reasons for your answer. 16.23 Bootstrap t confidence interval for the ages of the Titanic passengers. In Exercise 16.11 (page 16-12) we examined the bootstrap for the ages of the Titanic passengers. TITANIC (a) Find the bootstrap t 95% confidence interval for these data. (b) Compare the interval you found in part (a) with the usual t interval. (c) Which interval do you prefer? Give reasons for your answer. 16.24 Bootstrap t confidence interval for time spent watching videos on a cell phone. Return to or re-create the bootstrap distribution of the sample mean for the eight times spent watching videos in Exercise 16.10 (page 16-12). (a) Although the sample is small, verify using graphs and numerical summaries of the bootstrap distribution that the distribution is reasonably Normal and that the bias is small relative to the observed x. VIDEO (b) The bootstrap t confidence interval for the population mean m is therefore justified. Give the 95% bootstrap t confidence interval for m. (c) Give the usual t 95% interval and compare it with your interval from part (b). 16.25 Bootstrap t confidence interval for service center call lengths. Return to or re-create the bootstrap distribution of the sample mean for the 80 service center call lengths in Exercise 16.14 (page 16-12). CALLS80
16-23
(a) What is the bootstrap estimate of the bias? Verify from the graphs of the bootstrap distribution that the distribution is reasonably Normal (some right-skew remains) and that the bias is small relative to the observed x. The bootstrap t confidence interval for the population mean m is therefore justified. (b) Give the 95% bootstrap t confidence interval for m. (c) The only difference between the bootstrap t and usual one-sample t confidence intervals is that the bootstrap interval uses SEboot in place of the formulabased standard error s兾 1n. What are the values of the two standard errors? Give the usual t 95% interval and compare it with your interval from part (b). 16.26 Another bootstrap distribution of the trimmed mean. Bootstrap distributions and quantities based on them differ randomly when we repeat the resampling process. A key fact is that they do not differ very much if we use a large number of resamples. Figure 16.9 (page 16-15) shows one bootstrap distribution of the trimmed mean of the GPA data. Repeat the resampling of these data to get another bootstrap distribution of the trimmed mean. GPA (a) Plot the bootstrap distribution and compare it with Figure 16.9. Are the two bootstrap distributions similar? (b) What are the values of the bias and bootstrap standard error for your new bootstrap distribution? How do they compare with the previous values given on page 16-15? (c) Find the 95% bootstrap t confidence interval based on your bootstrap distribution. Compare it with the previous result in Example 16.5 (page 16-16). 16.27 Bootstrap distribution of the standard deviation s. For Example 16.5 (page 16-16) we bootstrapped the 25% trimmed mean of 150 GPAs. Another statistic whose sampling distribution is unfamiliar to us is the standard deviation s. Bootstrap s for these data. Discuss the shape and bias of the bootstrap distribution. Is the bootstrap t confidence interval for the population standard deviation s justified? If it is, give a 95% confidence interval. GPA 16.28 Bootstrap comparison of tree diameters. In Exercise 7.85 (page 471) you were asked to compare the mean diameter at breast height (DBH) for trees from the northern and southern halves of a land tract using a random sample of 30 trees from each region. NSPINES
16-24
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
(a) Use a back-to-back stemplot or side-by-side boxplots to examine the data graphically. Does it appear reasonable to use standard t procedures?
usual t 95% confidence intervals for the population mean m. 16.30 Bootstrap distribution of the median. We will see in Section 16.3 that bootstrap methods often work poorly for the median. To illustrate this, bootstrap the sample median of the 50 times to start a business that we studied in Example 16.1 (page 16-3). Why is the bootstrap t confidence interval not justified? TIME50
(b) Bootstrap the difference in means xNorth ⫺ xSouth and look at the bootstrap distribution. Does it meet the conditions for a bootstrap t confidence interval? (c) Report the bootstrap standard error and the 95% bootstrap t confidence interval. (d) Compare the bootstrap results with the usual two-sample t confidence interval. 16.29 Bootstrapping a Normal data set. The following data are “really Normal.” They are an SRS from the standard Normal distribution N10, 12, produced by a software Normal random number generator. NORMALD
0.01 20.04 21.02 20.13 20.36 20.03 21.88 20.02 21.01 0.23 20.52
2.40
0.58
0.92 21.38 20.47 20.80
0.08 20.03
0.42 20.31
0.90 21.16
2.29 21.11 22.23
0.56
2.69
1.09
0.10 20.92 20.07 21.76
0.41
0.54
0.08
1.47
0.45
0.34
1.23
0.11
0.75
0.30 20.53 0.51
0.34 20.00
16.31 Bootstrap distribution of the mpg standard deviation. Computers in some vehicles calculate various quantities related to performance. One of these is the fuel efficiency, or gas mileage, usually expressed as miles per gallon (mpg). For one vehicle equipped in this way, the mpg were recorded each time the gas tank was filled, and the computer was then reset. We studied these data in Exercise 7.30 (page 443) using methods based on Normal distributions.5 Here are the mpg values for a random sample of 20 of these 1.21 records: MPG20 1.56
0.32 21.35 22.42
2.47
2.99 21.56
1.27
1.55
0.80 20.59
0.89
22.36
1.27 21.11
0.56 21.12
0.25
0.29
0.99
0.30
0.05
1.44 22.46
0.91
0.48
0.02 20.54
0.51
0.10
(a) Make a histogram and Normal quantile plot. Do the data appear to be “really Normal”? From the histogram, does the N10, 12 distribution appear to describe the data well? Why? (b) Bootstrap the mean. Why do your bootstrap results suggest that t confidence intervals are appropriate? (c) Give both the bootstrap and the formula-based standard errors for x. Give both the bootstrap and
41.5
50.7
36.6
37.3
34.2
45.0
48.0
43.2
47.7
42.2
43.2
44.6
48.4
46.4
46.8
39.2
37.3
43.5
44.3
43.3
In addition to the average mpg, the driver is also interested in how much variability there is in the mpg. (a) Calculate the sample standard deviation s for these mpg values. (b) We have no formula for the standard error of s. Find the bootstrap standard error for s. (c) What does the standard error indicate about how accurate the sample standard deviation is as an estimate of the population standard deviation? (d) Would it be appropriate to give a bootstrap t interval for the population standard deviation? Why or why not?
16.3 How Accurate Is a Bootstrap Distribution? When you complete this section, you will be able to • Describe the effect of the size of the original sample on the variation in bootstrap distributions. • Describe the effect of the number of resamples on the variation in bootstrap distributions.
16.3 How Accurate Is a Bootstrap Distribution?
16-25
We said earlier that “When can I safely bootstrap?” is a somewhat subtle issue. Now we will give some insight into this issue. We understand that a statistic will vary from sample to sample and that inference about the population must take this random variation into account. The sampling distribution of a statistic displays the variation in the statistic due to selecting samples at random from the population. For example, the margin of error in a confidence interval expresses the uncertainty due to sampling variation. In this chapter we have used the bootstrap distribution as a substitute for the sampling distribution. This introduces a second source of random variation: choosing resamples at random from the original sample.
SOURCES OF VARIATION IN A BOOTSTRAP DISTRIBUTION Bootstrap distributions and conclusions based on them include two sources of random variation: 1. Choosing an original sample at random from the population. 2. Choosing bootstrap resamples at random from the original sample.
A statistic in a given setting has only one sampling distribution. It has many bootstrap distributions, formed by the two-step process just described. Bootstrap inference generates one bootstrap distribution and uses it to tell us about the sampling distribution. Can we trust such inference? Figure 16.14 displays an example of the entire process. The population distribution (top left) has two peaks and is far from Normal. The histograms in the left column of the figure show five random samples from this population, each of size 50. The line in each histogram marks the mean x of that sample. These vary from sample to sample. The distribution of the x-values from all possible samples is the sampling distribution. This sampling distribution appears to the right of the population distribution. It is close to Normal, as we expect because of the central limit theorem. The middle column in Figure 16.14 displays the bootstrap distribution of x for each of the five samples. Each distribution was created by drawing 1000 resamples from the original sample, calculating x for each resample, and presenting the 1000 x’s in a histogram. The right column shows the bootstrap distribution of the first sample, repeating the resampling five more times. Compare the five bootstrap distributions in the middle column to see the effect of the random choice of the original sample. Compare the six bootstrap distributions drawn from the first sample to see the effect of the random resampling. Here’s what we see: • Each bootstrap distribution is centered close to the value of x for its original sample. That is, the bootstrap estimate of bias is small in all five
16-26
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
FIGURE 16.14 Five random samples of n 5 50 from the same population, with a bootstrap distribution of the sample mean formed by resampling from each of the five samples. At the right are five more bootstrap distributions from the first sample.
Sampling distribution
Population distribution
Population mean = μ Sample mean = x–
–3
0 μ 3
6
3
0
x– 3
0
x– 3
x–
x–
0
3
0
0
3
x–
x–
3
x–
3
Bootstrap distribution 5 for Sample 1
0
Bootstrap distribution for Sample 5
0
3
Bootstrap distribution 4 for Sample 1
3
x–
x– Bootstrap distribution 3 for Sample 1
Bootstrap distribution for Sample 4
Sample 5
0
0
Bootstrap distribution for Sample 3
Sample 4
0 x– 3
3
0
Sample 3
0 x– 3
x–
Bootstrap distribution 2 for Sample 1
Bootstrap distribution for Sample 2
Sample 2
0
3
Bootstrap distribution for Sample 1
Sample 1
0 x–
μ
0
x–
3
Bootstrap distribution 6 for Sample 1
3
0
x–
3
16.3 How Accurate Is a Bootstrap Distribution?
16-27
cases. Of course, the five x-values vary, and not all are close to the population mean m. • The shape and spread of the bootstrap distributions in the middle column vary a bit, but all five resemble the sampling distribution in shape and spread. That is, the shape and spread of a bootstrap distribution depend on the original sample, but the variation from sample to sample is not great. • The six bootstrap distributions from the same sample are very similar in shape, center, and spread. That is, random resampling adds very little variation to the variation due to the random choice of the original sample from the population. Figure 16.14 reinforces facts that we have already relied on. If a bootstrap distribution is based on a moderately large sample from the population, its shape and spread don’t depend heavily on the original sample and do mimic the shape and spread of the sampling distribution. Bootstrap distributions do not have the same center as the sampling distribution; they mimic bias, not the actual center. The figure also illustrates a fact that is important for practical use of the bootstrap: the bootstrap resampling process (using 1000 or more resamples) introduces very little additional variation. We can rely on a bootstrap distribution to inform us about the shape, bias, and spread of the sampling distribution.
Bootstrapping small samples We now know that almost all the variation in bootstrap distributions for a statistic such as the mean comes from the random selection of the original sample from the population. We also know that in general statisticians prefer large samples because small samples give more variable results. This general fact is also true for bootstrap procedures. Figure 16.15 repeats Figure 16.14, with two important differences. The five original samples are only of size n ⫽ 9, rather than the n ⫽ 50 of Figure 16.14. Also, the population distribution (top left) is Normal, so that the sampling distribution of x is Normal despite the small sample size. Even with a Normal population distribution, the bootstrap distributions in the middle column show much more variation in shape and spread than those for larger samples in Figure 16.14. Notice, for example, how the skewness of the fourth sample produces a skewed bootstrap distribution. The bootstrap distributions are no longer all similar to the sampling distribution at the top of the column. We can’t trust a bootstrap distribution from a very small sample to closely mimic the shape and spread of the sampling distribution. Bootstrap confidence intervals will sometimes be too long or too short, or too long in one direction and too short in the other. The six bootstrap distributions based on the first sample are again very similar. Because we used 1000 resamples, resampling adds very little variation. There are subtle effects that can’t be seen from a few pictures, but the main conclusions are clear.
16-28
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
FIGURE 16.15 Five random samples of n 5 9 from the same population, with a bootstrap distribution of the sample mean formed by resampling from each of the five samples. At the right are five more bootstrap distributions from the first sample.
Population distribution
Sampling distribution
Population mean_ = Sample mean = x
–3
3
Sample 1
_ x
3
_ x
3
Sample 3
_ x
3
3
–3
_ x
–3
3
_ x
_ x
3
–3
_ x
3
–3
_ x
_ x
3
–3
_ x
3
Bootstrap distribution 4 for Sample 1
3
–3
_ x
3
Bootstrap distribution 5 for Sample 1
3
–3
_ x
3
Bootstrap distribution 6 for Sample 1
Bootstrap distribution for Sample 5
_ x
–3
Bootstrap distribution 3 for Sample 1
Bootstrap distribution for Sample 4
Sample 5
–3
_ x
–3
Bootstrap distribution 2 for Sample 1
Bootstrap distribution for Sample 3
Sample 4
–3
3
Bootstrap distribution for Sample 2
Sample 2
–3
Bootstrap distribution for Sample 1
–3
–3
–3
3
–3
_ x
3
16.3 How Accurate Is a Bootstrap Distribution?
16-29
VARIATION IN BOOTSTRAP DISTRIBUTIONS For most statistics, almost all the variation in bootstrap distributions comes from the selection of the original sample from the population. You can reduce this variation by using a larger original sample. Bootstrapping does not overcome the weakness of small samples as a basis for inference. We will describe some bootstrap procedures that are usually more accurate than standard methods, but even they may not be accurate for very small samples. Use caution in any inference—including bootstrap inference—from a small sample. The bootstrap resampling process using 1000 or more resamples introduces very little additional variation.
Bootstrapping a sample median In dealing with the grade point averages in Example 16.5, we chose to bootstrap the 25% trimmed mean rather than the median. We did this in part because the usual bootstrapping procedure doesn’t work well for the median unless the original sample is quite large. Now we will bootstrap the median in order to understand the difficulties. Figure 16.16 follows the format of Figures 16.14 and 16.15. The population distribution appears at top left, with the population median M marked. Below in the left column are five samples of size n ⫽ 15 from this population, with their sample medians m marked. Bootstrap distributions of the median based on resampling from each of the five samples appear in the middle column. The right column again displays five more bootstrap distributions from resampling the first sample. The six bootstrap distributions from the same sample are once again very similar to each other—resampling adds little variation—so we concentrate on the middle column in the figure. Bootstrap distributions from the five samples differ markedly from each other and from the sampling distribution at the top of the column. Here’s why. The median of a resample of size 15 is the eighth-largest observation in the resample. This is always one of the 15 observations in the original sample and is usually one of the middle observations. Each bootstrap distribution repeats the same few values, and these values depend on the original sample. The sampling distribution, on the other hand, contains the medians of all possible samples and is not confined to a few values. The difficulty is somewhat less when n is even, because the median is then the average of two observations. It is much less for moderately large samples, say n ⫽ 100 or more. Bootstrap standard errors and confidence intervals from such samples are reasonably accurate, though the shapes of the bootstrap distributions may still appear odd. You can see that the same difficulty will occur for small samples with other statistics, such as the quartiles, that are calculated from just one or two observations from a sample. There are more advanced variations of the bootstrap idea that improve performance for small samples and for statistics such as the median and quartiles. Unless you have expert advice or undertake further study, avoid bootstrapping the median and quartiles unless your sample is rather large.
16-30
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
FIGURE 16.16 Five random samples of n 5 15 from the same population, with a bootstrap distribution of the sample median formed by resampling from each of the five samples. At the right are five more bootstrap distributions from the first sample.
Population distribution
–4
M
10
Sampling distribution
–4
M
10
Sample 1
–4
m
10
m
10
–4
m
10
m
10
–4
m
10
m
10
–4
m
10
m
–4
m
10
10
–4
m
m
10
10
Bootstrap distribution 4 for Sample 1
–4
m
10
Bootstrap distribution 5 for Sample 1
–4
m
Bootstrap distribution for Sample 5
–4
10
Bootstrap distribution 3 for Sample 1
Bootstrap distribution for Sample 4
Sample 5
–4
m
Bootstrap distribution for Sample 3
Sample 4
–4
–4
Bootstrap distribution for Sample 2
Sample 3
–4
Bootstrap distribution 2 for Sample 1
Bootstrap distribution for Sample 1
Sample 2
–4
Population median = M Sample median = m
10
Bootstrap distribution 6 for Sample 1
–4
m
10
16.3 How Accurate Is a Bootstrap Distribution?
16-31
SECTION 16.3 Summary Almost all the variation in a bootstrap distribution for a statistic is due to the selection of the original random sample from the population. Resampling introduces little additional variation. Bootstrap distributions based on small samples can be quite variable. Their shape and spread reflect the characteristics of the sample and may not accurately estimate the shape and spread of the sampling distribution. Bootstrap inference from a small sample may therefore be unreliable. Bootstrap inference based on samples of moderate size is unreliable for statistics like the median and quartiles that are calculated from just a few of the sample observations.
SECTION 16.3 Exercises 16.32 Variation in the bootstrap distributions. Consider the variation in the bootstrap for each of the following situations with two scenarios, S1 and S2. In comparing the variation, do you expect, in general, that S1 will have less variation than S2, that S2 will have less variation than S1, or that the variation for S1 and S2 will be approximately the same? Give reasons for your answers. Here, we use n for the size of the original sample and B for the number of resamples.
start a business for a random sample of 50 countries. The entire survey included 185 countries. The distribution of times is very non-Normal. A histogram with a smooth density curve is given in Figure 1.19(a) (page 54). However, for this histogram we excluded one country, Suriname, where it takes 694 days to start a business. Exclude Suriname from the data set and use the remaining data for the remaining 184 countries. TIME184
(a) S1: n ⫽ 50, B ⫽ 2000; S2: n ⫽ 50, B ⫽ 4000.
(a) Let’s think of the 184 countries as the population for this exercise. Find the mean m and the standard deviation s for this population.
(b) S1: n ⫽ 10, B ⫽ 2000; S2: n ⫽ 50, B ⫽ 2000. (c) S1: n ⫽ 50, B ⫽ 200; S2: n ⫽ 50, B ⫽ 2000. (d) S1: n ⫽ 10, B ⫽ 2000; S2: n ⫽ 50, B ⫽ 4000. 16.33 Bootstrap versus sampling distribution. Most statistical software includes a function to generate samples from Normal distributions. Set the mean to 26 and the standard deviation to 27. You can think of all the numbers that would be produced by this function if it ran forever as a population that has the N126, 272 distribution. Samples produced by the function are samples from this population. (a) What is the exact sampling distribution of the sample mean x for a sample of size n from this population? (b) Draw an SRS of size n ⫽ 10 from this population. Bootstrap the sample mean x using 2000 resamples from your sample. Give a histogram of the bootstrap distribution and the bootstrap standard error. (c) Repeat the same process for samples of sizes n ⫽ 40 and n ⫽ 160. (d) Write a careful description comparing the three bootstrap distributions and also comparing them with the exact sampling distribution. What are the effects of increasing the sample size? 16.34 The effect of increasing the sample size. The data for Example 16.1 (page 16-3) are the times to
(b) Although we don’t know the shape of the sampling distribution of the sample mean x for a sample of size n from this population, we do know the mean and standard deviation of this distribution. What are they? (c) Draw an SRS of size n ⫽ 10 from this population. Bootstrap the sample mean x using 2000 resamples from your sample. Give a histogram of the bootstrap distribution and the bootstrap standard error. (d) Repeat the same process for samples of sizes n ⫽ 40 and n ⫽ 160. (e) Write a careful description comparing the three bootstrap distributions. What are the effects of increasing the sample size? 16.35 The effect of non-Normality. The populations in the two previous exercises have the same mean and standard deviation, but one is Normal and the other is strongly non-Normal. Based on your work in these exercises, how does non-Normality of the population affect the bootstrap distribution of x? How does it affect the bootstrap standard error? Do either of these effects diminish when we start with a larger sample? Explain what you have observed based on what you know about the sampling distribution of x and the way in which bootstrap distributions mimic the sampling distribution.
16-32
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
16.4 Bootstrap Confidence Intervals When you complete this section, you will be able to • Use the bootstrap distribution to find a bootstrap percentile confidence interval. • Read software output to find the BCa confidence interval.
Until now, we have met just one type of inference procedure based on resampling, the bootstrap t confidence intervals. We can calculate a bootstrap t confidence interval for any parameter by bootstrapping the corresponding statistic. We don’t need conditions on the population or special knowledge about the sampling distribution of the statistic. The flexible and almost automatic nature of bootstrap t intervals is appealing—but there is a catch. These intervals work well only when the bootstrap distribution tells us that the sampling distribution is approximately Normal and has small bias. How well must these conditions be met? What can we do if we don’t trust the bootstrap t interval? In this section we will see how to quickly check t confidence intervals for accuracy, and we will learn alternative bootstrap confidence intervals that can be used more generally than the bootstrap t.
Bootstrap percentile confidence intervals Confidence intervals are based on the sampling distribution of a statistic. If a statistic has no bias as an estimator of a parameter, its sampling distribution is centered at the true value of the parameter. We can then get a 95% confidence interval by marking off the central 95% of the sampling distribution. The t critical values in a t confidence interval are a shortcut to marking off the central 95%. This shortcut doesn’t work under all conditions—it depends both on lack of bias and on Normality. One way to check whether t intervals (using either bootstrap or formula-based standard errors) are reasonable is to compare them with the central 95% of the bootstrap distribution. The 2.5 and 97.5 percentiles mark off the central 95%. The interval between the 2.5 and 97.5 percentiles of the bootstrap distribution is often used as a confidence interval in its own right. It is known as a bootstrap percentile confidence interval.
BOOTSTRAP PERCENTILE CONFIDENCE INTERVALS The interval between the 2.5 and 97.5 percentiles of the bootstrap distribution of a statistic is a 95% bootstrap percentile confidence interval for the corresponding parameter. Use this method when the bootstrap estimate of bias is small.
The conditions for safe use of bootstrap t and bootstrap percentile intervals are a bit vague. We recommend that you check whether these intervals are reasonable by comparing them with each other. If the bias of the bootstrap distribution is small and the distribution is close to Normal, the bootstrap t and percentile confidence intervals will agree closely.
16.4 Bootstrap Confidence Intervals
16-33
Percentile intervals, unlike t intervals, do not ignore skewness. Percentile intervals are therefore usually more accurate, as long as the bias is small. Because we will soon meet a much more accurate bootstrap interval, our recommendation is that when bootstrap t and bootstrap percentile intervals do not agree closely, neither type of interval should be used.
EXAMPLE 16.8 Bootstrap percentile confidence interval for the trimmed mean. In Example 16.5 (page 16-16) we found that a 95% bootstrap t confidence interval for the 25% trimmed mean of GPA for the population of college students after three semesters at this large university is between 2.794 and 3.106. The bootstrap distribution in Figure 16.9 shows a small bias and, though closely Normal, is a bit skewed. Is the bootstrap t confidence interval accurate for these data? We can use the quantile function in R to compute the needed percentiles of our 3000 resamples. For this bootstrap distribution, the 2.5 and 97.5 percentiles are 2.793 and 3.095, respectively. These are the endpoints of the 95% bootstrap percentile confidence interval. This interval is quite close to the bootstrap t interval. We conclude that both intervals are reasonably accurate. The bootstrap t interval for the trimmed mean of GPA in Example 16.8 is x25% ⫾ t*SEboot ⫽ 2.950 ⫾ 0.156 We can learn something by also writing the percentile interval starting at the statistic x25% ⫽ 2.950. In this form, it is 2.950 ⫺ 0.157, 2.950 ⫹ 0.145 Unlike the t interval, the percentile interval is not symmetric—its endpoints are different distances from the statistic. The slightly greater distance to the 2.5 percentile reflects the slight left-skewness of the bootstrap distribution. USE YOUR KNOWLEDGE 16.36 Determining the percentile endpoints. What percentiles of the bootstrap distribution are the endpoints of a 99% bootstrap percentile confidence interval? How do they change for a 90% bootstrap percentile confidence interval? DATA TIME50
16.37 Bootstrap percentile confidence interval for time to start a business. Consider the random subset of the time to start a business data in Exercise 16.1 (page 16-3). Bootstrap the sample mean using 2000 resamples. (a) Make a histogram and a Normal quantile plot. Does the bootstrap distribution appear close to Normal? Is the bias small relative to the observed sample mean? (b) Find the 95% bootstrap t confidence interval.
CHALLENGE
(c) Give the 95% confidence percentile interval and compare it with the interval in part (b).
16-34
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
A more accurate bootstrap confidence interval: BCa
accurate
Any method for obtaining confidence intervals requires some conditions in order to produce exactly the intended confidence level. These conditions (for example, Normality) are never exactly met in practice. So a 95% confidence interval in practice will not capture the true parameter value exactly 95% of the time. In addition to “hitting” the parameter 95% of the time, a good confidence interval should divide its 5% of “misses” equally between high misses and low misses. We will say that a method for obtaining 95% confidence intervals is accurate in a particular setting if 95% of the time it produces intervals that capture the parameter and if the 5% of misses are equally shared between high and low misses. Perfect accuracy isn’t available in practice, but some methods are more accurate than others. One advantage of the bootstrap is that we can to some extent check the accuracy of the bootstrap t and percentile confidence intervals by examining the bootstrap distribution for bias and skewness and by comparing the two intervals with each other. The interval in Example 16.8 reveals a slight leftskewness, but not enough to invalidate inference. In general, the t and percentile intervals may not be sufficiently accurate when • the statistic is strongly biased, as indicated by the bootstrap estimate of bias. • the sampling distribution of the statistic is clearly skewed, as indicated by the bootstrap distribution and by comparing the t and percentile intervals. Most confidence interval procedures are more accurate for larger sample sizes. The t and percentile procedures improve only slowly: they require 100 times more data to improve accuracy by a factor of 10. (Recall the 1n in the formula for the usual one-sample t interval.) These intervals may not be very accurate except for quite large sample sizes. There are more elaborate bootstrap procedures that improve faster, requiring only 10 times more data to improve accuracy by a factor of 10. These procedures are quite accurate unless the sample size is very small.
BCa CONFIDENCE INTERVALS The bootstrap bias-corrected accelerated (BCa) interval is a modification of the percentile method that adjusts the percentiles to correct for bias and skewness. This method is accurate in a wide variety of settings, has reasonable computation requirements (by modern standards), and does not produce excessively wide intervals. The BCa intervals are among the most widely used intervals. Since this interval is related to the percentile method, it is still based on the key ideas of resampling and the bootstrap distribution. Now that you understand these concepts, you should always use this more accurate method (or an alternative like tilting intervals) if your software offers it. The details of producing confidence intervals are quite technical.6 The BCa method requires more than 1000 resamples for high accuracy. We recommend that you use 5000 or more resamples. Don’t forget that even BCa confidence intervals should be used cautiously when sample sizes are small, because there are not enough data to accurately determine the necessary corrections for bias and skewness.
16.4 Bootstrap Confidence Intervals
16-35
EXAMPLE DATA GPA
CHALLENGE
16.9 The BCa confidence interval for the ratio of variances. In Example 16.6 (page 16-18), we compared the GPA means of men and women using a 95% bootstrap t confidence interval. Because 0 was contained in the interval, we concluded that there was not enough evidence to state that the two means were different. Suppose we also want to compare the variances. Figure 16.10 (page 16-18) suggests that the spread among the male GPAs is larger than that of the females. The ratio of the male sample variance to the female sample variance is 1.321. Can we conclude there is a difference? In Section 7.3, we discussed an F test for the equality of spread but also warned that this approach was very sensitive to non-Normal data. Because our GPA data are heavily skewed, we cannot trust this test and instead will use the bootstrap. Specifically, we’ll form a 95% confidence interval for s21兾s22. Figure 16.17 shows the bootstrap distribution of the ratio of sample variances s21兾s22. We see strong skewness in the bootstrap distribution and therefore in the sampling distribution. This is not unexpected. Recall that if the data are Normal and the variances are equal, we’d expect this ratio to follow an F distribution. The bootstrap t and percentile intervals aren’t reliable when the sampling distribution of the statistic is skewed. Figure 16.18 shows software output that includes the percentile and BCa intervals. The bootstrap t interval is closely related to the Normal interval that is also supplied. The basic confidence interval is another method based on the percentiles of the bootstrap distribution that we will not discuss here. The BCa interval is 11.321 ⫺ 0.456, 1.321 ⫹ 0.9142 ⫽ 10.865, 2.2352 and the percentile interval is 11.321 ⫺ 0.468, 1.321 ⫹ 0.8802 ⫽ 10.853, 2.2012 In this case the percentile and BCa intervals are similar, but the BCa is shifted slightly, as it has adjusted for the bias, which was estimated at 0.054. Both intervals are strongly asymmetrical: the upper endpoint is about twice as far from the sample ratio as the lower endpoint. This reflects the strong right-skewness of the bootstrap distribution. R Console
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 5000 bootstrap replicates CALL : boot.ci(boot.out = gpa2.boot) Intervals : Level Normal 95% (0.608, 1.926)
0.5
1.0 1.5 2.0 2.5 3.0 Ratio of variances (male to female) of resamples
FIGURE 16.17 The bootstrap distribution of the ratio of sample variances of 5000 resamples from the data in Example 16.6.
Basic (0.441, 1.788)
Level Percentile BCa 95% (0.853, 2.201) (0.865, 2.235) Calculations and Intervals on Original Scale
FIGURE 16.18 R output for bootstrapping the ratio of variances for the GPA data.
16-36
CHAPTER 16
•
Bootstrap Methods and Permutation Tests The output in Figure 16.18 also shows that both endpoints of the lessaccurate intervals (bootstrap t via the Normal interval and the percentile interval) are too low. These intervals miss the population ratio on the low side too often (more than 2.5% of the time) and miss on the high side too seldom. They give a biased picture of where the true ratio is likely to be.
Confidence intervals for the correlation The bootstrap allows us to find confidence intervals for a wide variety of statistics. So far, we have looked at the sample mean, trimmed mean, the difference between two means, and the ratio of sample variances using a variety of different bootstrap confidence intervals. The choice of interval depended on the shape of the bootstrap distribution and the desired accuracy. Now we will bootstrap the correlation coefficient. This is our first use of the bootstrap for a statistic that depends on two related variables. As with the difference between two means, we must pay attention to how we should resample.
EXAMPLE DATA LAUNDRY
16.10 Correlation between price and rating. Consumers Union provides ratings on a large variety of consumer products. They use sophisticated testing methods as well as surveys of their members to create these ratings. The ratings are published in their magazine, Consumer Reports. An article in Consumer Reports rated laundry detergents on a scale from 1 to 100. Here are the ratings along with the price per load, in cents, for 24 laundry detergents:
CHALLENGE
Rating
Price (cents)
Rating
Price (cents)
Rating
Price (cents)
Rating
Price (cents)
61
17
59
22
56
22
55
16
55
30
52
23
51
11
50
15
50
9
48
16
48
15
48
18
46
13
46
13
45
17
36
8
35
8
34
12
33
7
32
6
32
5
29
14
26
11
26
13
In Example 2.8 (page 87) we examined the relationship between rating and price per load for these laundry detergents. We expect that the higherpriced detergents will tend to have higher ratings. The scatterplot in Figure 16.19 shows that the higher-priced products do tend to have better ratings, but the relationship is not particularly strong. The correlation is 0.671. Let’s use the bootstrap to find a 95% confidence interval for the population correlation.
Our confidence interval will also provide a test of the null hypothesis that the population correlation is zero. If the 95% confidence interval does not include zero, we can reject the null hypothesis in favor of the two-sided alternative.
16.4 Bootstrap Confidence Intervals
16-37
70
Rating
60 50 40 30
FIGURE 16.19 Scatterplot of price per load (in cents) versus rating for 24 laundry detergents, for Example 16.10.
20 0
10 20 Price per load (cents)
30
Although we would expect the correlation to be positive, we could be surprised and find that it is negative. It is important to keep in mind that we cannot use what we learned by looking at the scatterplot to formulate our alternative hypothesis. How shall we resample from the laundry detergent data? Because each observation consists of the price and the rating for one product, we resample products. Resampling prices and ratings separately would lose the connection between a product’s price and its rating. Software such as R automates proper resampling. Once we have produced a bootstrap distribution by resampling, we can examine the distribution and construct a confidence interval in the usual way. We need no special formulas or procedures to handle the correlation. Figure 16.20 shows the bootstrap distribution and Normal quantile plot for the sample correlation for 5000 resamples from the 24 laundry detergents in our sample. The bootstrap distribution is skewed to the left with relatively small bias. We’ll need to check whether a 95% bootstrap t confidence interval is reasonable here.
0.9
Correlation coefficient
0.8 0.7 0.6 0.5 0.4 0.3 0.3
0.4
0.5
0.6
0.7
0.8
Correlation coefficient of resamples
(a)
0.9
–4
–2
0 Normal score
2
(b)
FIGURE 16.20 The bootstrap distribution and Normal quantile plot for the correlation r for 5000 resamples from the laundry detergent data set.
4
16-38
CHAPTER 16
•
Bootstrap Methods and Permutation Tests The bootstrap standard error is SEboot ⫽ 0.086. The t interval using the bootstrap standard error is r ⫾ t*SEboot ⫽ 0.671 ⫾ 12.0742 10.0862 ⫽ 0.671 ⫾ 0.178 ⫽ 10.493, 0.8492 The 95% bootstrap percentile interval is 12.5 percentile, 97.5 percentile2 ⫽ 10.485, 0.8272 ⫽ 10.671 ⫺ 0.186, 0.671 ⫹ 0.1562 The two confidence intervals are not too different. If you feel this discrepancy is acceptable, you might want to use the percentile interval to account for the skewness in the bootstrap distribution. While the confidence intervals give a wide range for the population correlation, both of them include only positive values. Thus, these data provide significant evidence that there is a positive relationship between a laundry detergent’s rating and its price per load.
SECTION 16.4 Summary Both bootstrap t and (when they exist) traditional z and t confidence intervals require statistics with small bias and sampling distributions close to Normal. We can check these conditions by examining the bootstrap distribution for bias and lack of Normality. The bootstrap percentile confidence interval for 95% confidence is the interval from the 2.5 percentile to the 97.5 percentile of the bootstrap distribution. Agreement between the bootstrap t and percentile intervals is an added check on the conditions needed by the t interval. Do not use t or percentile intervals if these conditions are not met. When bias or skewness is present in the bootstrap distribution, use a BCa interval. The t and percentile intervals are inaccurate under these circumstances unless the sample sizes are very large. The BCa confidence intervals adjust for bias and skewness and are generally accurate except for small samples.
SECTION 16.4 Exercises For Exercises 16.36 and 16.37, see page 16-33. 16.38 Find the 95% bootstrap percentile confidence interval. The mean of a sample is x ⫽ 218.3 and the standard deviation is s ⫽ 55.2. The mean of the bootstrap distribution is x ⫽ 220.2 and the standard deviation is s ⫽ 11.3. A bootstrap distribution has the following percentiles: Percentile 0.01
0.025
0.05
0.10
0.50
0.90
0.95
0.975
0.99
193
198
202
206
220
234
238
242
246
Find the 95% bootstrap percentile confidence interval.
16.39 Summarize the output. Figures 16.21 and 16.22 show software output from R with information about a bootstrap analysis. Summarize the information in the output. Be sure to include the BCa confidence interval. 16.40 Confidence interval for the average IQ score. The distribution of the 60 IQ test scores in Table 1.1 (page 16) is roughly Normal, and the sample size is large enough that we expect a Normal sampling distribution. We will compare confidence intervals for the population mean IQ m based on this sample. IQ (a) Use the formula s兾 1n to find the standard error of the mean. Give the 95% t confidence interval based on this standard error.
16.4 Bootstrap Confidence Intervals 2.0
2.5
1.5
2.0
1.0
t*
Density
FIGURE 16.21 R graphical output for Exercise 16.39.
0.5
1.0
0.0
0.5 0.5
FIGURE 16.22 Output from R with bootstrap confidence intervals, for Exercise 16.39.
1.5
1.0
1.5 t*
2.0
–3 –2 –1 0 1 2 3 Normal score
2.5
R Console
ORDINARY NONPARAMETRIC BOOTSTRAP Call : boot(data = bc, statistic = theta, R = 5000)
Bootstrap Statistics : original bias t1* 1.20713 0.04544967
std. error 0.2336016
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 5000 bootstrap replicates CALL : boot.ci(boot.out = corr1.boot) Intervals : Level Normal 95% (0.704, 1.620)
Basic (0.653, 1.554)
Level Percentile 95% (0.860, 1.762)
BCa (0.766, 1.671)
16-39
16-40
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
(b) Bootstrap the mean of the IQ scores. Make a histogram and a Normal quantile plot of the bootstrap distribution. Does the bootstrap distribution appear Normal? What is the bootstrap standard error? Give the 95% bootstrap t confidence interval. (c) Give the 95% confidence percentile and BCa intervals. Make a graphical comparison by drawing a vertical line at the original sample mean x and displaying the three intervals vertically, one above the other. How well do your four confidence intervals agree? Was bootstrapping needed to find a reasonable confidence interval, or was the formula-based confidence interval good enough? 16.41 Confidence interval for a Normal data set. In Exercise 16.29 (page 16-24) you bootstrapped the mean of a simulated SRS from the standard Normal distribution N10, 12 and found the 95% standard t and bootstrap t confidence intervals for the mean. NORMALD (a) Find the 95% bootstrap percentile confidence interval. Does this interval confirm that the t intervals are acceptable? (b) We know that the population mean is 0. Do the confidence intervals capture this mean? 16.42 Using bootstrapping to check traditional methods. Bootstrapping is a good way to check if traditional inference methods are accurate for a given sample. Consider the following data: DATA30 98
107
113
104
94
100
107
98
112
97
99
95
97
90
109
102
89
101
93
95
95
87
91
101
119
116
91
95
95
104
(a) Examine the data graphically. Do they appear to violate any of the conditions needed to use the one-sample t confidence interval for the population mean?
for the trimmed mean, given in the discussion of Example 16.4 (page 16-14). Does the comparison suggest any skewness? GPA 16.44 More on using bootstrapping to check traditional methods. Continue to work with the data given in Exercise 16.42. DATA30 (a) Find the 95% BCa confidence interval. (b) Does your opinion of the robustness of the onesample t confidence interval change when comparing it with the BCa interval? (c) To check the accuracy of the one-sample t confidence interval, would you generally use the bootstrap percentile or the BCa interval? Explain. 16.45 BCa interval for the correlation coefficient. Find the 95% BCa confidence interval for the correlation between price and rating, from the data in Example 16.10 (page 16-36). Is this more accurate interval in general agreement with the 95% bootstrap t and percentile intervals? Do you still agree with the judgment in the discussion of Example 16.10 that the simpler intervals are adequate? LAUNDRY 16.46 Bootstrap confidence intervals for the average audio file length. In Exercise 16.17 (page 16-17), you found a bootstrap t confidence interval for the population mean m. Careful examination of the bootstrap distribution reveals a slight skewness in the right tail. Is this something to be concerned about? Bootstrap the mean and give all three 95% bootstrap confidence intervals: t, percentile, and BCa. Make a graphical comparison by displaying the three intervals vertically, one above the other. Discuss what you see. SONGS
(d) Find the 95% bootstrap percentile interval. Does it agree with the two t intervals? What do you conclude about the accuracy of the one-sample t interval here?
16.47 Bootstrap confidence intervals for service center call lengths. The distribution of the call center lengths that you used in Exercise 16.25 (page 16-23) is strongly skewed. In that exercise you found a bootstrap t confidence interval for the population mean m, even though some skewness remains in the bootstrap distribution. Bootstrap the mean length and give all three bootstrap 95% confidence intervals: t, percentile, and BCa. Make a graphical comparison by drawing a vertical line at the original sample mean x and displaying the three intervals horizontally, one above the other. Discuss what you see: Do bootstrap t and percentile agree? Does the more accurate interval agree with the two simpler methods? CALLS80
16.43 Comparing bootstrap confidence intervals. The graphs in Figure 16.9 (page 16-15) do not appear to show any important skewness in the bootstrap distribution of the trimmed mean for Example 16.4. Compare the bootstrap percentile and bootstrap t intervals
16.48 Bootstrap confidence intervals for the standard deviation. We would like a 95% confidence interval for the standard deviation s of 150 GPAs. In Exercise 16.27 (page 16-23) we considered the bootstrap t interval. Now we have a more accurate method. Bootstrap s and report all three
(b) Calculate the 95% one-sample t confidence interval for this sample. (c) Bootstrap the data, and inspect the bootstrap distribution of the mean. Does it suggest that a t interval should be reasonably accurate? Calculate the bootstrap t 95% interval.
16.4 Bootstrap Confidence Intervals 95% bootstrap confidence intervals: t, percentile, and BCa. Make a graphical comparison by drawing a vertical line at the original s and displaying the three intervals vertically, one above the other. Discuss what you see: Do bootstrap t and percentile agree? Does the more accurate interval agree with the two simpler methods? What interval would you use in a report on GPAs at this college? GPA 16.49 The effect of decreasing the sample size. Exercise 16.15 (page 16-13) gives an SRS of 10 of the service center call lengths from Table 1.2. Describe the bootstrap distribution of x from this sample. Give a 95% confidence interval for the population mean m based on these data and a method of your choice. Describe carefully how your result differs from the intervals in Exercise 16.47, which use the larger sample of 80 call lengths. CALLS10 16.50 Bootstrap confidence interval for the GPA data. The GPA data for females from Example 16.6 (page 16-18) are strongly skewed to the left and have a cluster of observations at 4. GPA (a) Bootstrap the mean of the data. Based on the bootstrap distribution, which bootstrap confidence intervals would you consider for use? Explain your answer. (b) Find all three bootstrap confidence intervals. How do the intervals compare? Briefly explain the reasons for any differences. In particular, what kind of errors would you make in estimating the mean GPA by using a t interval or a percentile interval instead of a BCa interval? 16.51 Bootstrap confidence intervals for the difference in GPAs. Example 16.6 (page 16-18) considers the difference in mean GPAs of men and women. The bootstrap distribution appeared reasonably Normal. Give the 95% BCa confidence interval for the difference in mean GPAs. Is this interval comparable to the bootstrap t interval calculated in the example? GPA 16.52 The correlation between GPA and high school math grades. The study described in Example 16.4 (page 16-14) used high school grades to predict GPA. For this exercise, we will look at the correlation between GPA and high school math grades. GPA (a) Describe the distribution of GPAs. Do the same for high school math grades. (b) Describe the relationship between GPA and high school math grades. (c) Generate 2000 resamples and use these to obtain the bootstrap distribution for the correlation. (d) Describe the shape and bias of the bootstrap distribution. Does use of the simpler bootstrap confidence intervals (t and percentile) appear to be justified?
16-41
(e) Find all three 95% bootstrap confidence intervals: t, percentile, and BCa. Make a graphical comparison by drawing a vertical line at the original correlation r and displaying the three intervals vertically, one above the other. Discuss what you see. Does it still appear that the simpler intervals are justified? What confidence interval would you include in a report describing the relationship between GPA and high school math grades? 16.53 The correlation between debts. Figure 2.4 (page 92) shows a strong positive relationship between debt in 2010 and debt in 2009 for 33 countries. Use the bootstrap to perform statistical inference for these data. DEBT (a) Describe the shape and bias of the bootstrap distribution. Do you think that a simple bootstrap inference (t and percentile confidence intervals) is justified? Explain your answer. (b) Give the 95% BCa and bootstrap percentile confidence intervals for the population correlation. Do they (as expected) agree closely? Do these intervals provide significant evidence at the 5% level that the population correlation is not 0? 16.54 Bootstrap distribution for the slope b1. Describe carefully how to resample from data on an explanatory variable x and a response variable y to create a bootstrap distribution for the slope b1 of the least-squares regression line. 16.55 Predicting ratings of laundry detergents. Refer to Example 16.10 (page 16-36). LAUNDRY (a) Find the least-squares regression line for predicting rating from price. (b) Bootstrap the regression line and give a 95% confidence interval for the slope of the population regression line. (c) Compare the bootstrap results with the usual method for finding a confidence interval for a regression slope. 16.56 Predicting GPA. Continue your study of GPA and high school math grades, begun in Exercise 16.52, by performing a regression to predict GPA using high school math grades as the explanatory variable. GPA (a) Plot the residuals against the math grades and make a Normal quantile plot of the residuals. Do these plots suggest that inference based on the usual simple linear regression model may be inaccurate? Give reasons for your answer. (b) Examine the bootstrap distribution of the slope b1 of the least-squares regression line. Based on what you see, what do you recommend regarding the use of bootstrap t or bootstrap percentile intervals? Give reasons for your recommendation.
16-42
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
(c) Give the 95% BCa confidence interval for the slope b1 of the population regression line. Compare this with the standard 95% confidence interval based on Normality, the bootstrap t interval, and the bootstrap percentile interval. Using the BCa interval as a standard, which of the other intervals are adequately accurate for practical use?
distribution suggest that even the bootstrap t interval will be accurate? Give a reason for your answer. (c) Find the standard 95% t confidence interval for b1 and also the BCa, bootstrap t, and bootstrap percentile confidence intervals. What do you conclude about the accuracy of the two t intervals?
16.57 Predicting debt in 2010 from debt in 2009. Continue your study of the relationship between debt in 2009 and debt in 2010 for 33 countries, begun in Exercise 16.53. Run the regression to predict debt in 2010 using debt in 2009 as the explanatory variable. DEBT
16.58 The effect of outliers. We know that outliers can strongly influence statistics such as the mean and the least-squares line. Example 7.7 (page 429) describes a matched pairs study of disruptive behavior by dementia patients. The differences in Table 7.2 show several low values that may be considered outliers. MOON
(a) Plot the residuals against the explanatory variable and make a Normal quantile plot of the residuals. Do the residuals appear to be Normal? Explain your answer.
(a) Bootstrap the mean of the differences with and without the three low values. How do these values influence the shape and bias of the bootstrap distribution?
(b) Examine the shape and bias of the bootstrap distribution of the slope b1 of the least-squares line. Does this
(b) Give the BCa confidence interval from both bootstrap distributions. Discuss the differences.
16.5 Significance Testing Using Permutation Tests When you complete this section, you will be able to • Outline the steps needed for a permutation test for comparing two means. • Outline the steps needed for a permutation test for a matched pairs study. • Outline the steps needed for a permutation test for the relationship between two quantitative variables.
LOOK BACK tests of significance, p. 372
Significance tests tell us whether an observed effect, such as a difference between two means or a correlation between two variables, could reasonably occur “just by chance” in selecting a random sample. If not, we have evidence that the effect observed in the sample reflects an effect that is present in the population. The reasoning of tests goes like this: 1. Choose a statistic that measures the effect you are looking for. 2. Construct the sampling distribution that this statistic would have if the effect were not present in the population. 3. Locate the observed statistic on this distribution. A value in the main body of the distribution could easily occur just by chance. A value in the tail would rarely occur by chance and so is evidence that something other than chance is operating.
LOOK BACK null hypothesis, p. 374
The statement that the effect we seek is not present in the population is the null hypothesis, H0. Assuming the null hypothesis is true, the probability that we would observe a statistic value as extreme or more extreme than the one we did observe is the P-value. Figure 16.23 illustrates the idea of a P-value.
16.5 Significance Testing Using Permutation Tests
Sampling distribution when H0 is true
FIGURE 16.23 The P-value of a statistical test is found from the sampling distribution the statistic would have if the null hypothesis were true. It is the probability of a result at least as extreme as the value we actually observed. LOOK BACK P-value, p. 377
16-43
P-value
Observed statistic
Small P-values are evidence against the null hypothesis and in favor of a real effect in the population. The reasoning of statistical tests is indirect and a bit subtle but is by now familiar. Tests based on resampling don’t change this reasoning. They find P-values by resampling calculations rather than from formulas and so can be used in settings where traditional tests don’t apply. Because P-values are calculated acting as if the null hypothesis were true, we cannot resample from the observed sample as we did earlier. In the absence of bias, resampling from the original sample creates a bootstrap distribution centered at the observed value of the statistic. If the null hypothesis is in fact not true, this value may be far from the parameter value stated by the null hypothesis. We must estimate what the sampling distribution of the statistic would be if the null hypothesis were true. That is, we must obey this rule:
RESAMPLING FOR SIGNIFICANCE TESTS To estimate the P-value for a test of significance, estimate the sampling distribution of the test statistic when the null hypothesis is true by resampling in a manner that is consistent with the null hypothesis.
EXAMPLE DATA DRP
16.11 “Directed reading activities.” Do new “directed reading activities” improve the reading ability of elementary school students, as measured by their Degree of Reading Power (DRP) scores? A study assigns students at random to either the new method (treatment group, 21 students) or traditional teaching methods (control group, 23 students). The DRP scores at the end of the study appear in Table 16.1.7 In Example 7.15 (page 454) we applied the two-sample t test to these data. To apply resampling, we will start with the difference between the sample means as a measure of the effect of the new activities:
CHALLENGE
statistic ⫽ xtreatment ⫺ xcontrol The null hypothesis H0 for the resampling test is that the teaching method has no effect on the distribution of DRP scores. If H0 is true, the DRP scores
16-44
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
permutation test
in Table 16.1 do not depend on the teaching method. Each student has a DRP score that describes that child and is the same no matter which group the child is assigned to. The observed difference in group means just reflects the accident of random assignment to the two groups. Now we can see how to resample in a way that is consistent with the null hypothesis: imitate many repetitions of the random assignment of students to treatment and control groups, with each student always keeping his or her DRP score unchanged. Because resampling in this way scrambles the assignment of students to groups, tests based on resampling are called permutation tests, from the mathematical name for scrambling a collection of things.
TABLE 16.1 Degree of Reading Power Scores for Third-Graders Treatment group
Control group
24
61
59
46
42
33
46
37
43
44
52
43
43
41
10
42
58
67
62
57
55
19
17
55
71
49
54
26
54
60
28
43
53
57
62
20
53
48
49
56
33
37
85
42
Here is an outline of the permutation test procedure for comparing the mean DRP scores in Example 16.11:
permutation resample
• Choose 21 of the 44 students at random to be the treatment group; the other 23 are the control group. This is an ordinary SRS, chosen without replacement. It is called a permutation resample. • Calculate the mean DRP score in each group, using the students’ DRP scores in Table 16.1. The difference between these means is our statistic.
permutation distribution
• Repeat this resampling and calculation of the statistic hundreds of times. The distribution of the statistic from these resamples estimates the sampling distribution under the condition that H0 is true. It is called a permutation distribution. • Consider the value of the statistic actually observed in the study, xtreatment ⫺ xcontrol ⫽ 51.476 ⫺ 41.522 ⫽ 9.954 Locate this value on the permutation distribution to get the P-value. Figure 16.24 illustrates permutation resampling on a small scale. The top box shows the results of a study with four subjects in the treatment group and two subjects in the control group. A permutation resample chooses an SRS of four of the six subjects to form the treatment group. The remaining two are the control group. The results of three permutation resamples appear below the original results, along with the statistic (difference in group means) for each.
16.5 Significance Testing Using Permutation Tests
16-45
24, 61 | 42, 33, 46, 37 x1 – x2 = 42.5 – 39.5 = 3.0
33, 46 | 24, 61, 42, 37 x1 – x2 = 39.5 – 41 = –1.5
33, 61 | 24, 42, 46, 37 x1 – x2 = 47 – 37.25 = 9.75
37, 42 | 24, 61, 33, 46 x1 – x2 = 39.5 – 41 = –1.5
FIGURE 16.24 The idea of permutation resampling. The top box shows the outcome of a study with four subjects in one group and two in the other. The boxes below show three permutation resamples. The values of the statistic for many such resamples form the permutation distribution.
EXAMPLE DATA
16.12 Permutation test for the DRP study. Figure 16.25 shows the perDRP
CHALLENGE
mutation distribution of the difference in means based on 1000 permutation resamples from the DRP data in Table 16.1. This is a resampling estimate of the sampling distribution of the statistic when the null hypothesis H0 is true. As H0 suggests, the distribution is centered at 0 (no effect). The solid vertical line in the figure marks the location of the statistic for the original sample, 9.954. Use the permutation distribution exactly as if it were the sampling distribution: the P-value is the probability that the statistic takes a value at least as extreme as 9.954 in the direction given by the alternative hypothesis. We seek evidence that the treatment increases DRP scores, so the alternative hypothesis is that the distribution of the statistic xtreatment ⫺ xcontrol is centered not at 0 but at some positive value. Large values of the statistic are evidence against the null hypothesis in favor of this one-sided alternative.
Observed Mean
FIGURE 16.25 The permutation distribution of the difference between the treatment mean and the control mean based on the DRP scores of 44 students, for Example 16.12. The dashed line marks the mean of the permutation distribution: it is very close to zero, the value specified by the null hypothesis. The solid vertical line marks the observed difference in means, 9.954. Its location in the right tail shows that a value this large is unlikely to occur when the null hypothesis is true.
P-value
–15
–10
–5
0
5
10
15
16-46
CHAPTER 16
•
Bootstrap Methods and Permutation Tests The permutation test P-value is the proportion of the 1000 resamples that give a result at least as great as 9.954. A look at the resampling results finds that 14 of the 1000 resamples gave a value of 9.954 or larger, so the estimated P-value is 14/1000, or 0.014.
Figure 16.25 shows that the permutation distribution has a roughly Normal shape. Because the permutation distribution approximates the sampling distribution, we now know that the sampling distribution is close to Normal. When the sampling distribution is close to Normal, we can safely apply the usual two-sample t test. The t test in Example 7.15 gives P ⫽ 0.013, very close to the P-value from the permutation test.
Using software In principle, you can program almost any statistical software to do a permutation test. It is more convenient to use software that automates the process of resampling, calculating the statistic, forming the permutation distribution, and finding the P-value. The package perm in R contains functions that allow you to request permutation tests. The permutation distribution in Figure 16.25 is one output. Another is this summary of the test results: Exact Permutation Test Estimated by Monte Carlo data: trtgrp and ctrlgrp p-value = 0.0154 alternative hypothesis: true mean trtgrp - mean ctrlgrp is greater than 0 sample estimates: mean trtgrp - mean ctrlgrp 9.954451 p-value estimated from 5000 Monte Carlo replications 99 percent confidence interval on p-value: 0.01110640 0.02024333 By giving “greater” as the alternative hypothesis, the output makes it clear that 0.015 is the one-sided P-value. This estimate of the P-value is more precise than the 0.014 estimate because it is based on 5000 rather than 1000 resamples.
Permutation tests in practice LOOK BACK two-sample t test, page 454
Permutation tests versus t tests. We have analyzed the data in Table 16.1 both by the two-sample t test (in Chapter 7) and by a permutation test. Comparing the two approaches brings out some general points about permutation tests versus traditional formula-based tests. • The hypotheses for the t test are stated in terms of the two population means, H0 : mtreatment ⫺ mcontrol ⫽ 0 Ha : mtreatment ⫺ mcontrol ⬎ 0 ˇ
ˇ
ˇ
ˇ
16.5 Significance Testing Using Permutation Tests
16-47
The permutation test hypotheses are more general. The null hypothesis is “same distribution of scores in both groups,” and the one-sided alternative is “scores in the treatment group are systematically higher.” These more general hypotheses imply the t hypotheses if we are interested in mean scores and the two distributions have the same shape. • The plug-in principle says that the difference in sample means estimates the difference in population means. The t statistic starts with this difference. We used the same statistic in the permutation test, but that was a choice: we could use the difference in 25% trimmed means or any other statistic that measures the effect of treatment versus control. • The t test statistic is based on standardizing the difference in means in a clever way to get a statistic that has a t distribution when H0 is true. The permutation test works directly with the difference in means (or some other statistic) and estimates the sampling distribution by resampling. No formulas are needed. • The t test gives accurate P-values if the sampling distribution of the difference in means is at least roughly Normal. The permutation test gives accurate P-values even when the sampling distribution is not close to Normal. The permutation test is useful even if we plan to use the two-sample t test. Rather than relying on Normal quantile plots of the two samples and the central limit theorem, we can directly check the Normality of the sampling distribution by looking at the permutation distribution. Permutation tests provide a “gold standard” for assessing two-sample t tests. If the two P-values differ considerably, it usually indicates that the conditions for the two-sample t don’t hold for these data. Because permutation tests give accurate P-values even when the sampling distribution is skewed, they are often used when accuracy is very important. Here is an example.
EXAMPLE DATA GPA
16.13 Permutation test for GPAs. In Example 16.6 (page 16-18), we looked at the difference in mean GPAs of male and female students. Figure 16.10 (page 16-18) shows both distributions. Because the distributions are skewed and the sample sizes are somewhat different, a two-sample t test might be inaccurate. Based on the summary statistics,
CHALLENGE
Gender
n
x
s
Male Female Difference
91 59
2.784 2.933 20.149
0.859 0.748
the t statistic is ⫺1.12 with either 58 or 135.73 degrees of freedom. The P-value is roughly 0.26 in either case. We perform permutation tests with 5000 resamples using R. We use the difference in means, x1 ⫺ x2, as our test statistic. This is done by randomly regrouping the total set of GPAs into two groups that are the same sizes as the two original samples. This is consistent with the null hypothesis that
16-48
CHAPTER 16
•
Bootstrap Methods and Permutation Tests gender has no effect on GPA. Each GPA appears once in the data of each resample, but some GPAs move from the male to the female group, and vice versa. We calculate the test statistic for each resample and create its permutation distribution. The P-value is the proportion of the resamples with statistics that exceed the observed statistic. A 99% confidence interval for the P-value based on the 5000 resamples is (0.256, 0.309). This interval contains the P-value for the t test. The skewness and differing sample sizes do not have an impact here primarily because the sample sizes are relatively large. If you read Chapter 15 on nonparametric tests, you will find there more comparison of permutation tests with rank tests as well as tests based on Normal distributions. Data from an entire population. A subtle difference between confidence intervals and significance tests is that confidence intervals require the distinction between sample and population, but tests do not. If we have data on an entire population—say, all employees of a large corporation—we don’t need a confidence interval to estimate the difference between the mean salaries of male and female employees. We can calculate the means for all men and for all women and get an exact answer. But it still makes sense to ask, “Is the difference in means so large that it would rarely occur just by chance?” A test and its P-value answer that question. Permutation tests are a convenient way to answer such questions. In carrying out the test we pay no attention to whether the data are a sample or an entire population. The resampling assigns the full set of observed salaries at random to men and women and builds a permutation distribution from repeated random assignments. We can then see if the observed difference in mean salaries is so large that it would rarely occur if gender did not matter.
LOOK BACK two-sample t test, page 454
LOOK BACK Robustness of two-sample procedures, p. 455
When are permutation tests valid? The two-sample t test starts from the condition that the sampling distribution of x1 ⫺ x2 is Normal. This is the case if both populations have Normal distributions, and it is approximately true for large samples from non-Normal populations because of the central limit theorem. The central limit theorem helps explain the robustness of the two-sample t test. The test works well when both populations are symmetric, especially when the two sample sizes are similar. The permutation test completely removes the Normality condition. However, resampling in a way that moves observations between the two groups requires that the two populations are identical when the null hypothesis is true— that not only their means are the same but also their spreads and shapes. Our preferred version of the two-sample t allows different standard deviations in the two groups, so the shapes are both Normal but need not have the same spread. In Example 16.13, the distributions are skewed but we do not rule out the t test because of the central limit theorem. The permutation test is valid if the GPA distributions for males and females have the same shape, so that they are identical under the null hypothesis that the centers (the means) are the same. Based on Figure 16.10 (page 16-18), it appears that the distribution for the males has a little more spread than the distribution for the females. Fortunately, the permutation test is robust. That is, it gives accurate P-values when the two
16.5 Significance Testing Using Permutation Tests
16-49
population distributions have somewhat different shapes, such as when they have slightly different standard deviations. Sources of variation. Just as in the case of bootstrap confidence intervals, permutation tests are subject to two sources of random variability: the original sample is chosen at random from the population, and the resamples are chosen at random from the sample. Again as in the case of the bootstrap, the added variation due to resampling is usually small and can be made as small as we like by increasing the number of resamples. The number of resamples on which a permutation test is based determines the number of decimal places and precision in the resulting P-value. Tests based on 1000 resamples give P-values to three places (multiples of 0.001), with a margin of error of 2 2P11 ⫺ P2兾1000 equal to 0.014 when the true onesided P-value is 0.05. If higher precision is needed or your computer is sufficiently fast, you may choose to use 10,000 or more resamples. USE YOUR KNOWLEDGE 16.59 Is a permutation test valid? Suppose a professor wants to compare the effectiveness of two different instruction methods. By design, one method is more team oriented, so he expects the variability in individual tests scores for this method to be smaller. Is it valid to use a permutation test to compare the mean scores of the two methods? Explain. 16.60 Declaring significance. Suppose that a one-sided permutation test based on 250 permutation resamples resulted in a P-value of 0.04. What is the approximate standard deviation of the distribution? Would you feel comfortable declaring the results significant at the 5% level? Explain.
Permutation tests in other settings The bootstrap procedure can replace many different formula-based confidence intervals, provided that we resample in a way that matches the setting. Permutation testing is also a general method that we can adapt to various settings.
GENERAL PROCEDURE FOR PERMUTATION TESTS To carry out a permutation test based on a statistic that measures the size of an effect of interest: 1. Compute the statistic for the original data. 2. Choose permutation resamples from the data without replacement in a way that is consistent with the null hypothesis of the test and with the study design. Construct the permutation distribution of the statistic from its values in a large number of resamples. 3. Find the P-value by locating the original statistic on the permutation distribution.
16-50
CHAPTER 16
•
Bootstrap Methods and Permutation Tests Permutation test for matched pairs. The key step in the general procedure for permutation tests is to form permutation resamples in a way that is consistent with the study design and with the null hypothesis. Our examples to this point have concerned two-sample settings. How must we modify our procedure for a matched pairs design?
EXAMPLE
DATA MOON
CHALLENGE
16.14 Permutation test for full-moon study. Can the full moon influence behavior? A study observed 15 nursing-home patients with dementia. The number of incidents of aggressive behavior was recorded each day for 12 weeks. Call a day a “moon day” if it is the day of a full moon or the day before or after a full moon. Table 16.2 gives the average number of aggressive incidents for moon days and other days for each subject.8 These are matched pairs data. In Example 7.7 (page 429), the matched pairs t test found evidence that the mean number of aggressive incidents is higher on moon days (t ⫽ 6.45, df 5 14, P ⬍ 0.001). The data show some signs of nonNormality. We want to apply a permutation test. The null hypothesis says that the full moon has no effect on behavior. If this is true, the two entries for each patient in Table 16.2 are two measurements of aggressive behavior made under the same conditions. There is no distinction between “moon days” and “other days.” Resampling in a way consistent with this null hypothesis randomly assigns one of each patient’s two scores to “moon” and the other to “other.” We don’t mix results for different subjects, because the original data are paired. The permutation test (like the matched pairs t test) uses the difference in means xmoon ⫺ xother. Figure 16.26 shows the permutation distribution of this statistic from 10,000 resamples. None of these resamples produces a difference as large as the observed difference, xmoon ⫺ xother ⫽ 2.433. The estimated one-sided P-value is less than 1 in a thousand. We report this result as P ⬍ 0.0001. There is strong evidence that aggressive behavior is more common on moon days.
TABLE 16.2 Aggressive Behaviors of Dementia Patients Patient
Moon days
Other days
Patient
Moon days
Other days
1
3.33
0.27
9
6.00
1.59
2
3.67
0.59
10
4.33
0.60
3
2.67
0.32
11
3.33
0.65
4
3.33
0.19
12
0.67
0.69
5
3.33
1.26
13
1.33
1.26
6
3.67
0.11
14
0.33
0.23
7
4.67
0.30
15
2.00
0.38
8
2.67
0.40
16.5 Significance Testing Using Permutation Tests
16-51
Observed Mean
FIGURE 16.26 The permutation distribution for the mean difference (moon days minus other days) from 10,000 paired resamples from the data in Table 16.2, for Example 16.14.
–2.5
–2.0
–1.5
–1.0
–0.5 0.0 0.5 Difference in means
1.0
1.5
2.0
2.5
The permutation distribution in Figure 16.26 is close to Normal, as a Normal quantile plot confirms. The matched pairs t test is therefore reliable and agrees with the permutation test that the P-value is very small. Permutation test for the significance of a relationship. Permutation testing can be used to test the significance of a relationship between two variables. For example, in Example 16.10 we looked at the relationship between price and rating of laundry detergents. The null hypothesis is that there is no relationship. In that case, prices are assigned to detergents for reasons that have nothing to do with rating. We can resample in a way consistent with the null hypothesis by permuting the observed ratings among` the detergents at random. Take the correlation as the test statistic. For every resample, calculate the correlation between the prices (in their original order) and ratings (in the reshuffled order). The P-value is the proportion of the resamples with correlation larger than the original correlation. When can we use permutation tests? We can use a permutation test only when we can see how to resample in a way that is consistent with the study design and with the null hypothesis. We now know how to do this for the following types of problems: • Two-sample problems when the null hypothesis says that the two populations are identical. We may wish to compare population means, proportions, standard deviations, or other statistics. You may recall from Section 7.3 that traditional tests for comparing population standard deviations work very poorly. Permutation tests are a much better choice.
16-52
CHAPTER 16
•
Bootstrap Methods and Permutation Tests • Matched pairs designs when the null hypothesis says that there are only random differences within pairs. A variety of comparisons is again possible. • Relationships between two quantitative variables when the null hypothesis says that the variables are not related. The correlation is the most common measure of association, but not the only one. These settings share the characteristic that the null hypothesis specifies a simple situation such as two identical populations or two unrelated variables. We can see how to resample in a way that matches these situations. Permutation tests can’t be used for testing hypotheses about a single population, comparing populations that differ even under the null hypothesis, or testing general relationships. In these settings, we don’t know how to resample in a way that matches the null hypothesis. Researchers are developing resampling methods for these and other settings, so stay tuned. When we can’t do a permutation test, we can often calculate a bootstrap confidence interval instead. If the confidence interval fails to include the null hypothesis value, then we reject H0 at the corresponding significance level. This is not as accurate as doing a permutation test, but a confidence interval estimates the size of an effect as well as giving some information about its statistical significance. Even when a test is possible, it is often helpful to report a confidence interval along with the test result. Confidence intervals don’t assume that a null hypothesis is true, so we use bootstrap resampling with replacement rather than permutation resampling without replacement.
SECTION 16.5 Summary Permutation tests are significance tests based on permutation resamples drawn at random from the original data. Permutation resamples are drawn without replacement, in contrast to bootstrap samples, which are drawn with replacement. Permutation resamples must be drawn in a way that is consistent with the null hypothesis and with the study design. In a two-sample design, the null hypothesis says that the two populations are identical. Resampling randomly reassigns observations to the two groups. In a matched pairs design, randomly permute the two observations within each pair separately. To test the hypothesis of no relationship between two variables, randomly reassign values of one of the two variables. The permutation distribution of a suitable statistic is formed by the values of the statistic in a large number of resamples. Find the P-value of the test by locating the original value of the statistic on the permutation distribution. When they can be used, permutation tests have great advantages. They do not require specific population shapes such as Normality. They apply to a variety of statistics, not just to statistics that have a simple distribution under the null hypothesis. They can give very accurate P-values, regardless of the shape and size of the population (if enough permutations are used). It is often useful to give a confidence interval along with a test. To create a confidence interval, we no longer assume that the null hypothesis is true, so we use bootstrap resampling rather than permutation resampling.
16.5 Significance Testing Using Permutation Tests
16-53
SECTION 16.5 Exercises For Exercises 16.59 and 16.60, see page 16-49. 16.61 Marketing cell phones. You have two prototypes of a new cell phone and designed an experiment to help you decide which one to market. Forty students were randomly assigned to use one of the two phones for two weeks. Their overall satisfaction with the phone is recorded on a subjective scale with a range of 1 to 100. Outline the steps needed to compare the means for the two phones using a permutation test. 16.62 Marketing cell phones. Refer to the previous exercise. Suppose that you had each of the 40 students use both phones. Outline the steps needed to compare the means for the two phones using a permutation test. 16.63 Characteristics of cell phones. Refer to Exercise 16.61. Before asking the students to provide an overall satisfaction rating, they were asked to provide ratings for several characteristics of the cell phone. Two of these were satisfaction with the screen and satisfaction with the keyboard. Outline the steps needed to evaluate the relationship between these two variables for the first phone using a permutation test. 16.64 Compare the correlations. Refer to the previous exercise. Suppose that you calculate the correlation between satisfaction with the screen and satisfaction with the keyboard for each phone. Outline the steps needed to compare these two correlations using a permutation test. 16.65 A small-sample permutation test. To illustrate the process, let’s perform a permutation test by hand for a small random subset of the DRP data (Example 16.11, page 16-43). Here are the data: Treatment group
57
53
Control group
19
37
41
42
(a) Calculate the difference in means xtreatment ⫺ xcontrol between the two groups. This is the observed value of the statistic. (b) Resample: Start with the 6 scores and choose an SRS of 2 scores to form the treatment group for the first resample. You can do this by labeling the scores from 1 to 6 and using consecutive random digits from Table B or by rolling a die. Using either method, be sure to skip repeated digits. A resample is an ordinary SRS, without replacement. The remaining 4 scores are the control group. What is the difference in group means for this resample? (c) Repeat Step (b) 20 times to get 20 resamples and 20 values of the statistic. Make a histogram of the distribution of these 20 values. This is the permutation distribution for your resamples.
(d) What proportion of the 20 statistic values were equal to or greater than the original value in part (a)? You have just estimated the one-sided P-value for the original 6 observations. (e) For this small data set, there are only 15 possible permutations of the data. As a result, we can calculate the exact P-value by counting the number of permutations with a statistic value greater than or equal to the original value and then dividing by 15. What is the exact P-value here? How close was your estimate? 16.66 Product labels with animals? Participants in a study were asked to indicate their attitude toward a product on a seven-point scale (from 1 5 dislike very much to 7 5 like very much). A bottle of MagicCoat pet shampoo, with a picture of a collie on the label, was the product. Prior to indicating this preference, subjects were randomly assigned to two groups and were asked to do a word find. Four of the words were common to both groups and four were either related to the product image or conflicted with the image. The group with words related to the product image were considered primed. In Exercise 7.72 (page 469) the mean scores were compared using the two-sample t procedures. Let’s use a permutation test for the comparison. Here are the data: BRANDPR Group
Brand Attitude
Primed
2 2 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5
Nonprimed
1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 5
(a) Examine the scores of each group graphically. Is it appropriate to use the two-sample t procedures? Explain your answer. (b) Perform the two-sample t test to compare the group means. Use a two-sided alternative hypothesis and a significance level of 5%. (c) Perform a permutation test to compare the group means. Summarize your results and conclusions. (d) Write a short summary comparing your results in parts (b) and (c). Which method do you recommend for these data? Give reasons for your answer. 16.67 Timing of food intake. Examples 7.16 and 7.17 (pages 456 and 457) examine data on an experiment to compare weight loss in subjects who were classified as early eaters or late eaters, based on the timing of their main meal. In Example 7.17, the following data were analyzed: FOOD10 Group
Weight loss (kg)
Early eater
6.3
15.1
9.4
16.8
10.2
Late eater
7.8
0.2
1.5
11.5
4.6
16-54
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
(a) State appropriate null and alternative hypotheses for these data.
for the permutation test? Do your tests in parts (a) and (c) lead to the same practical conclusion?
(b) Report the result of the pooled two-sample t test.
16.72 Compare the medians. Refer to the previous exercise. Use a permutation test to compare the medians. Write a short summary of your results and conclusions. Include a comparison of what you found here with what you found in the previous exercise. FRENCH
(c) Perform a permutation test to compare the two means and report the results. Compare the P-value for this test with the P-value for the t test in part (b). (d) Find a BCa confidence interval for the difference in means. How is this interval related to your results in part (c)? 16.68 Standard deviation of the estimated P-value. The estimated P-value for the DRP study (Example 16.12, page 16-45) based on 1000 resamples is P ⫽ 0.015. Suppose that we obtained the same P-value based on 4000 resamples. What is the approximate standard deviation of each of these P-values? 16.69 When is a permutation test valid? You want to test the equality of the means of two populations. Sketch density curves for two populations for which (a) a permutation test is valid but a t test is not. (b) both permutation and t tests are valid. (c) a t test is valid but a permutation test is not. 16.70 Testing the correlation between debts. In Exercise 16.53 (page 16-41), we assessed the significance of the correlation between debt in 2009 and debt in 2010 for 33 countries by creating bootstrap confidence intervals. If a 95% confidence interval does not cover 0, the observed correlation is significantly different from 0 at the a ⫽ 0.05 level. Let’s do a test that provides a P-value. Carry out a permutation test and give the P-value. What do you conclude? Is your conclusion consistent with your work in Exercise 16.53 (page 16-41)? DEBT 16.71 Assessing a summer language institute. Exercise 7.45 (page 446) gives data on a study of the effect of a summer language institute on the ability of high school language teachers to understand spoken French. This is a matched pairs study, with scores for 20 teachers at the beginning (pretest) and end (posttest) of the institute. We conjecture that the posttest scores are higher on the average. FRENCH
16.73 Testing the correlation between price and rating. Example 16.10 (page 16-36) uses the bootstrap to find a confidence interval for the correlation between price and rating for 24 laundry detergents. Let’s use a permutation test to examine this correlation. LAUNDRY (a) State the null and alternative hypotheses. (b) Perform a permutation test based on the sample correlation. Report the P-value and draw a conclusion. 16.74 Comparing mpg calculations. Exercise 7.39 (page 445) gives data on a comparison of driver and computer mpg calculations. This is a matched pairs study, with mpg values for 20 fill-ups. MPG20 (a) Carry out the matched pairs t test. That is, state hypotheses, calculate the test statistic, and give its P-value. (b) A permutation test can help check the t test result. Carry out the permutation test for the difference in means in a matched pairs setting, using 10,000 resamples. What is the P-value for the permutation test? Does this test and the test in part (a) lead to the same practical conclusion? 16.75 Comparing the average northern and southern tree diameter. In Exercise 7.107 (page 480), the standard deviations of tree diameters for the northern and southern regions of the tract were compared. This test is unreliable because it is sensitive to non-Normality of the data. Perform a permutation test using the F statistic (ratio of sample variances) as your statistic. What do you conclude? Are the two tests comparable? NSPINES
(b) Make a Normal quantile plot of the gains: posttest score—pretest score. The data have a number of ties and a low outlier. A permutation test can help check the t test result.
16.76 Comparing serum retinol levels. The formal medical term for vitamin A in the blood is serum retinol. Serum retinol has various beneficial effects, such as protecting against fractures. Medical researchers working with children in Papua New Guinea asked whether recent infections reduce the level of serum retinol. They classified children as recently infected or not on the basis of other blood tests and then measured serum retinol. Of the 90 children in the sample, 55 had been recently infected. Table 16.3 gives the serum retinol levels for both groups, in micromoles per liter.9 RETINOL
(c) Carry out the permutation test for the difference in means in a matched pairs setting, using 9999 resamples. The Normal quantile plot shows that the permutation distribution is reasonably Normal. What is the P-value
(a) The researchers are interested in the proportional reduction in serum retinol. Verify that the mean for infected children is 0.620 and that the mean for uninfected children is 0.778.
(a) Carry out the matched pairs t test. That is, state hypotheses, calculate the test statistic, and give its P-value.
16.5 Significance Testing Using Permutation Tests
16-55
TABLE 16.3 Serum Retinol Levels (mmol/l) in Two Groups of Children Not infected
Infected
0.59
1.08
0.88
0.62
0.46
0.39
0.68
0.56
1.19
0.41
0.84
0.37
1.44
1.04
0.67
0.86
0.90
0.70
0.38
0.34
0.97
1.20
0.35
0.87
0.35
0.99
1.22
1.15
1.13
0.67
0.30
1.15
0.38
0.34
0.33
0.26
0.99
0.35
0.94
1.00
1.02
1.11
0.82
0.81
0.56
1.13
1.90
0.42
0.83
0.35
0.67
0.31
0.58
1.36
0.78
0.68
0.69
1.09
1.06
1.23
1.17
0.35
0.23
0.34
0.49
0.69
0.57
0.82
0.59
0.24
0.41
0.36
0.36
0.39
0.97
0.40
0.40
0.24
0.67
0.40
0.55
0.67
0.52
0.23
0.33
0.38
0.33
0.31
0.35
0.82
(b) There is no standard test for the null hypothesis that the ratio of the population means is 1. We can do a permutation test on the ratio of sample means. Carry out a one-sided test and report the P-value. Briefly describe the center and shape of the permutation distribution. Why do you expect the center to be close to 1? 16.77 Methods of resampling. In Exercise 16.76, we did a permutation test for the hypothesis “no difference between infected and uninfected children” using the ratio of mean serum retinol levels to measure “difference.” We might also want a bootstrap confidence interval for the ratio of population means for infected and uninfected children. Describe carefully how resampling is done for the permutation test and for the bootstrap, paying attention to the difference between the two resampling methods. RETINOL 16.78 Podcast downloads. A 2006 Pew survey of Internet users asked whether or not they had downloaded a podcast at least once. The survey was repeated with different users in 2008. For the 2006 survey, 198 of the 2822 Internet users reported that they had downloaded at least one podcast. In the 2008 survey, the results were 295 of 1553 users. We want to use these sample data to test equality of the population proportions of successes. Carry out a permutation test. Describe the permutation distribution. Give the P-value and report your conclusion. 16.79 Gender and GPA. In Exercise 16.51 (page 16-41) we used the bootstrap to compare the mean GPA scores for men and women. GPA (a) Use permutation methods to compare the means for men and women. (b) Use permutation methods to compare the standard deviations for men and women.
(c) Write a short paragraph summarizing your results and conclusions. 16.80 Sadness and spending. A study of sadness and spending randomized subjects to watch videos designed to produce sad or neutral moods. Each subject was given $10, and after watching the video, he or she was asked to trade $0.50 increments of their $10 for an insulated bottle of water. Here are the data: SADNESS Group Neutral
Sad
Purchase price ($) 0.00
2.00
0.00
1.00
0.50
0.00
0.50
2.00
1.00
0.00
0.00
0.00
0.00
1.00
3.00
4.00
0.50
1.00
2.50
2.00
1.50
0.00
1.50
1.50
2.50
4.00
3.00
3.50
1.00
3.50
1.00
(a) Use the two-sample t significance test (page 454) to compare the means of the two groups. Summarize your results. (b) Use the pooled two-sample t significance test (page 462) to compare the means of the two groups. Summarize your results. (c) Use a permutation test to compare the two groups. Summarize your results. (d) Discuss the differences among the results you found for parts (a), (b), and (c). Which method do you prefer? Give reasons for your answer. 16.81 Comparing the variances for sadness and spending. Refer to the previous example. Some treatments in randomized experiments such as this can cause variances to be different. Are the variances of the neutral and sad subjects equal? SADNESS
16-56
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
(a) Use the F test for equality of variances (page 474) to answer this question. Summarize your results. (b) Compare the variances using a permutation test. Summarize your results. (c) Write a short paragraph comparing the F test with the permutation test for these data. 16.82 Comparing two operators. Exercise 7.43 (page 445) gives these data on a delicate measurement of total body bone mineral content made by two operators on the same eight subjects:10 OPERAT
Subject Operator
1
2
3
4
5
6
7
8
1
1.328 1.342 1.075 1.228 0.939 1.004 1.178 1.286
2
1.323 1.322 1.073 1.233 0.934 1.019 1.184 1.304
Do permutation tests give good evidence that measurements made by the two operators differ systematically? If so, in what way do they differ? Do two tests, one that compares centers and one that compares spreads.
CHAPTER 16 Exercises 16.83 Gender and GPA. In Example 16.5 (page 16-16) you used the bootstrap to find a 95% confidence interval for the 25% trimmed mean of GPA. Let’s change the statistic of interest to the 5% trimmed mean. Using Example 16.5 as a guide, find the corresponding 95% confidence interval. Compare this interval with the one in Example 16.5. GPA 16.84 Change the trim. Refer to the previous exercise. Change the statistic of interest to the 10% trimmed mean. Answer the questions in the previous exercise and also compare your new interval with the one you found there. GPA 16.85 Compare the correlations. In Exercise 16.51 (page 16-41) we compared the mean GPA for men and women using the bootstrap. In Exercise 16.52 we used the bootstrap to examine the correlation between GPA and high school math grades. Let’s find the correlations for men and women separately and ask whether there is evidence that they differ. GPA (a) Find the correlation between GPA and high school math grades for the men. Use the bootstrap to find a 95% confidence interval for the population correlation. (b) Repeat part (a) for the women. (c) Use the bootstrap to test the null hypothesis that the population correlations for men and women are the same, rMen ⫽ r Women .
to ask the question about whether or not the relationship between GPA and high school math grades is the same for men and women. Answer the questions from the previous exercise using the slope. Compare the results that you find here with those you found in the previous exercise. GPA 16.87 Bootstrap confidence interval for the difference in proportions. Refer to Exercise 16.78 (page 16-55). We want a 95% confidence interval for the change from 2006 to 2008 in the proportions of Internet users who report that they have downloaded a podcast at least once. Bootstrap the sample data. Give all three bootstrap confidence intervals (t, percentile, and BCa). Compare the three intervals and summarize the results. Which intervals would you recommend? Give reasons for your answer. 16.88 Bootstrap confidence interval for the ratio. Here is one conclusion from the data in Table 16.3, described in Exercise 16.76: “The mean serum retinol level in uninfected children was 1.255 times the mean level in the infected children. A 95% confidence interval for the ratio of means in the population of all children in Papua New Guinea is . . . .” RETINOL (a) Bootstrap the data and use the BCa method to complete this conclusion.
(d) Summarize your findings.
(b) Briefly describe the shape and bias of the bootstrap distribution. Does the bootstrap percentile interval agree closely with the BCa interval for these data?
16.86 Use the regression slope. Refer to the previous exercise, where we used correlations to address the question of whether or not the relationship between GPA and high school math grades is the same for men and women. In Exercise 16.56 (page 16-42) we used the bootstrap to examine the slope of the least-squares regression line for predicting GPA using high school math grades. Let’s compute the slope separately for men and women and ask whether or not they differ. This is another way
16.89 Poetry: an occupational hazard. According to William Butler Yeats, “She is the Gaelic muse, for she gives inspiration to those she persecutes. The Gaelic´ poets die young, for she is restless, and will not let them remain long on earth.” One study designed to investigate this issue examined the age at death for writers from different cultures and genders.11 In Example 1.32 (page 41) we examined the distributions of the age at death for female novelists, poets, and
Chapter 16 Exercises nonfiction writers. Figure 1.17 shows modified sideby-side boxplots for the three categories of writers. The poets do appear to die young! Note that there is an outlier among the nonfiction writers. This writer died at the age of 40, young for a nonfiction writer, but not for a novelist or a poet! Let’s use the methods of this chapter to compare the ages at death for poets and nonfiction writers. POETS (a) Use numerical and graphical summaries to describe the distribution of age at death for the poets. Do the same for the nonfiction writers. (b) Use the methods of Chapter 7 (page 454) to compare the means of the two distributions. Summarize your findings. (c) Use the bootstrap methods of this chapter to compare the means of the two distributions. Summarize your findings. 16.90 Medians for the poets. Refer to the previous exercise. Use the bootstrap methods of this chapter to compare the medians of the two distributions. Summarize your findings and compare them with what you found in part (c) of the previous exercise. POETS 16.91 Permutation test for the poets. Refer to Exercise 16.89. Answer part (c) of that exercise using the permutation test. Summarize your findings and compare them with what you found in Exercise 16.89. POETS 16.92 Variance for poets. Refer to Exercises 16.89 and 16.91. (a) Instead of comparing means, compare variances. Summarize your findings. (b) Explain how questions about the equality of standard deviations are related to questions about the equality of variances. (c) Use the results of this exercise and the previous three exercises to address the question of whether or not the distributions of the poets and nonfiction writers are the same. POETS 16.93 Bootstrap confidence interval for the median. Your software can generate random numbers that have the uniform distribution on 0 to 1. Figure 4.9 (page 258) shows the density curve. Generate a sample of 50 observations from this distribution. (a) What is the population median? Bootstrap the sample median and describe the bootstrap distribution. (b) What is the bootstrap standard error? Compute a 95% bootstrap t confidence interval. (c) Find the 95% BCa confidence interval. Compare with the interval in (b). Is the bootstrap t interval reliable here? 16.94 Are female personal trainers, on average, younger? A fitness center employs 20 personal trainers.
16-57
Here are the ages in years of the female and male personal trainers working at this center: TRAIN Male
25
26
23 32 35 29 30 28 31 32 29
Female
21
23
22 23 20 29 24 19 22
(a) Make a back-to-back stemplot. Do you think the difference in mean ages will be significant? (b) A two-sample t test gives P ⬍ 0.001 for the null hypothesis that the mean age of female personal trainers is equal to the mean age of male personal trainers. Do a two-sided permutation test to check the answer. (c) What do you conclude about using the t test? What do you conclude about the mean ages of the trainers? 16.95 Adult gamers versus teen gamers. A Pew survey compared adult and teen gamers on where they played games. For the adults, 54% of 1063 survey participants played on game consoles such as Xbox, PlayStation, and Wii. For teens, 89% of 1064 survey participants played on game consoles. Use the bootstrap to find a 95% confidence interval for the difference between the teen proportion who play on consoles and the adult proportion. 16.96 Use a ratio for adult gamers versus teen gamers. Refer to the previous exercise. In many settings, researchers prefer to communicate the comparison of two proportions with a ratio. For gamers who play on consoles, they would report that teens are 1.65 (89/54) times more likely to play on consoles. Use the bootstrap to give a 95% confidence interval for this ratio. 16.97 Another way to communicate the result. Refer to the previous two exercises. Here is another way to communicate the result: teen gamers are 65% more likely to play on consoles than adult gamers. (a) Explain how the 65% is computed. (b) Use the bootstrap to give a 95% confidence interval for this estimate. (c) Based on this exercise and the previous two, which of the three ways is most effective for communicating the results? Give reasons for your answer. 16.98 Insurance fraud? Jocko’s Garage has been accused of insurance fraud. Data on estimates (in dollars) made by Jocko and another garage were obtained for 10 damaged vehicles. Here is what the investigators found: GARAGE Car
1
2
3
4
5
Jocko’s
1375
1550
1250
1300
900
Other
1250
1300
1250
1200
950
Car
6
7
8
9
10
Jocko’s
1500
1750
3600
2250
2800
Other
1575
1600
3300
2125
2600
16-58
CHAPTER 16
•
Bootstrap Methods and Permutation Tests
(a) Compute the mean estimate for Jocko and the mean estimate for the other garage. Report the difference in the means and the 95% standard t confidence interval. Be sure to choose the appropriate t procedure for your analysis and explain why you made this choice.
numerical and graphical summaries, your estimate, the 95% t confidence interval, the 95% bootstrap confidence interval, and an explanation for all choices (such as whether you chose to examine the mean or the median, bootstrap options, etc.).
(b) Use the bootstrap to find the confidence interval. Be sure to give details about how you used the bootstrap, which options you chose, and why.
(b) Compute the mean of Jocko’s estimates and the mean of the estimates made by the other garage. Divide Jocko’s mean by the mean for the other garage. Report this ratio and find a 95% confidence interval for this quantity. Be sure to justify choices that you made for the bootstrap.
(c) Compare the t interval with the bootstrap interval. 16.99 Other ways to look at Jocko’s estimates. Refer to the previous exercise. Let’s consider some other ways to analyze these data. GARAGE (a) For each damaged vehicle, divide Jocko’s estimate by the estimate from the other garage. Perform your analysis on these data. Write a short report that includes
(c) Using what you have learned in this exercise and the previous one, how would you summarize the comparison of Jocko’s estimates with those made by the other garage? Assume that your audience knows very little about statistics but a lot about insurance.
CHAPTER 16 Notes and Data Sources 1. Information about this free software is available at r-project.org. 2. The origin of this quaint phrase is Rudolph Raspe, The Singular Adventures of Baron Munchausen, 1786. Here is the passage, from the edition by John Carswell, Heritage Press, 1952: “I was still a couple of miles above the clouds when it broke, and with such violence I fell to the ground that I found myself stunned, and in a hole nine fathoms under the grass, when I recovered, hardly knowing how to get out again. Looking down, I observed that I had on a pair of boots with exceptionally sturdy straps. Grasping them firmly, I pulled with all my might. Soon I had hoist myself to the top and stepped out on terra firma without further ado.” 3. In fact, the bootstrap standard error underestimates the true standard error. Bootstrap standard errors are generally too small by a factor of roughly 21 ⫺ 1兾n. This factor is about 0.95 for n ⫽ 10 and 0.98 for n ⫽ 25, so we ignore it in this elementary exposition. 4. The 254 winning numbers and their payoffs are republished here by permission of the New Jersey State Lottery Commission. 5. The vehicle is a 2002 Toyota Prius owned by the third author. 6. The standard advanced introduction to bootstrap methods is B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, 1993. For tilting
intervals, see B. Efron, “Nonparametric standard errors and confidence intervals” (with discussion), Canadian Journal of Statistics, 36 (1981), pp. 369–401; and T. J. DiCiccio and J. P. Romano, “Nonparametric confidence limits by resampling methods and least favourable families,” International Statistical Review, 58 (1990), pp. 59–76. 7. This example is adapted from Maribeth C. Schmitt, “The effects of an elaborated directed reading activity on the metacomprehension skills of third graders,” PhD dissertation, Purdue University, 1987. 8. These data were collected as part of a larger study of dementia patients conducted by Nancy Edwards, School of Nursing, and Alan Beck, School of Veterinary Medicine, Purdue University. 9. Data provided by Francisco Rosales of the Department of Nutritional Sciences, Pennsylvania State University. See Francisco Rosales et al., “Relation of serum retinol to acute phase proteins and malarial morbidity in Papua New Guinea children,” American Journal of Clinical Nutrition, 71 (2000), pp. 1580–1588. 10. These data were collected in connection with a bone health study at Purdue University and were provided by Linda McCabe. 11. The data were provided by James Kaufman. The study is described in James C. Kaufman, “The cost of the muse: poets die young,” Death Studies, 27 (2003), pp. 813–821. The quote from Yeats appears in this article.
Statistics for Quality: Control and Capability Introduction Quality is a broad concept. Often it refers to a degree or grade of excellence. For example, you may feel that a restaurant serving filet mignon is a higherquality establishment than a fast-food outlet that primarily serves hamburgers. You may also consider a name-brand sweater of higher quality than one sold at a discount store. In this chapter, we consider a narrower concept of quality: consistently meeting standards appropriate for a specific product or service. The fast-food outlet, for example, may serve high-quality hamburgers. The hamburgers are freshly grilled and served promptly at the right temperature every time you visit. Similarly, the discount store sweaters may be high quality because they are consistently free of defects and the tight knit helps them keep their shape wash after wash. Statistically minded management can assess this concept of quality through sampling. For example, the fast-food outlet could sample hamburgers and measure the time from order to being served as well as the temperature and tenderness of the burgers. This chapter discusses the methods used to
17
CHAPTER
17.1 Processes and Statistical Process Control 17.2 Using Control Charts
17.3 Process Capability Indexes 17.4 Control Charts for Sample Proportions
17-1
17-2
CHAPTER 17
•
Statistics for Quality: Control and Capability monitor the quality of a product or service and effectively detect changes in the process that may affect its quality.
Use of data to assess quality Organizations are (or ought to be) concerned about the quality of the products and services they offer. What they don’t know about quality can hurt them: rather than make complaints that an alert organization could use as warnings, customers often simply leave when they feel they are receiving poor quality. A key to maintaining and improving quality is systematic use of data in place of intuition or anecdotes. Here are two examples.
EXAMPLE 17.1 Membership renewal process. Sometimes data that are routinely produced make a quality problem obvious. The internal financial statements of a professional society showed that hiring temporary employees to enter membership data was causing expenditures above budgeted levels each year during the several months when memberships were renewed. Investigation led to two actions. Membership renewal dates were staggered across the year to spread the workload more evenly. More important, outdated and inflexible data entry software was replaced by a modern system that was much easier to use. Result: permanent employees could now process renewals quickly, eliminating the need for temps and also reducing member complaints.
EXAMPLE
LOOK BACK time plot, p. 23
LOOK BACK regression line, p. 110
LOOK BACK comparative experiments, p. 178
LOOK BACK sampling distributions, p. 208
17.2 Response time process. Systematic collection of data helps an organization to move beyond dealing with obvious problems. Motorola measures the performance of its services and manufactured products. They track, for example, the average time from a customer’s call until the problem is fixed, month by month. The trend should be steadily downward as ways are found to speed response.
Because using data is a key to improving quality, statistical methods have much to contribute. Simple tools are often the most effective. Motorola’s service centers calculate mean response times each month and make a time plot. A scatterplot and perhaps a regression line can show how the time to answer telephone calls to a corporate call center influences the percent of callers who hang up before their calls are answered. The design of a new product such as a smartphone may involve interviewing samples of consumers to learn what features they want included and using randomized comparative experiments to determine the best interface. This chapter focuses on just one aspect of statistics for improving quality: statistical process control. The techniques are simple and are based on sampling distributions, but the underlying ideas are important and a bit subtle.
17.1 Processes and Statistical Process Control
17-3
17.1 Processes and Statistical Process Control When you complete this section, you will be able to • Describe a process using a flowchart and a cause-and-effect diagram. • Explain what is meant by a process being in control by distinguishing common and special cause variation. x chart and utilize the • Compute the center line and control limits for an – chart for process monitoring. • Compute the center line and control limits for an s chart and utilize the chart for process monitoring. x and s charts in terms of what they monitor and which • Contrast the – should be interpreted first.
In thinking about statistical inference, we distinguish between the sample data we have in hand and the wider population that the data represent. We hope to use the sample to draw conclusions about the population. In thinking about quality improvement, it is often more natural to speak of processes rather than populations. This is because work is organized in processes. Here are some examples: • Processing an application for admission to a university and deciding whether or not to admit the student. • Reviewing an employee’s expense report for a business trip and issuing a reimbursement check. • Hot forging to shape a billet of titanium into a blank that, after machining, will become part of a medical implant for hip, knee, or shoulder replacement. Each of these processes is made up of several successive operations that eventually produce the output—an admission decision, a reimbursement check, or a metal component.
PROCESS A process is a chain of activities that turns inputs into outputs.
We can accommodate processes in our sample-versus-population framework: think of the population as containing all the outputs that would be produced by the process if it ran forever in its present state. The outputs produced today or this week are a sample from this population. Because the population doesn’t actually exist now, it is simpler to speak of a process and of recent output as a sample from the process in its present state.
Describing processes The first step in improving a process is to understand it. If the process is at all complex, even the people involved with it may not have a full picture of how the activities interact in ways that influence quality. A brainstorming session is in order: bring people together to gain an understanding of the process.
17-4
CHAPTER 17
•
Statistics for Quality: Control and Capability
flowchart
cause-and-effect diagram
This understanding is often presented graphically using two simple tools: flowcharts and cause-and-effect diagrams. A flowchart is a picture of the stages of a process. Many organizations have formal standards for making flowcharts. Because flowcharts are not statistical graphics, we will informally illustrate their use in an example and not insist on a specific format. A cause-and-effect diagram organizes the logical relationships between the inputs and stages of a process and an output. Sometimes the output is successful completion of the process task; sometimes it is a quality problem that we hope to solve. A good starting outline for a cause-and-effect diagram appears in Figure 17.1. The main branches organize the causes and serve as a skeleton for detailed entries. You can see why these are sometimes called “fishbone diagrams.” Once again we will illustrate the diagram by example rather than insist on a specific format.1
Environment
Material
Equipment
Effect
FIGURE 17.1 An outline for a cause-and-effect diagram. Group causes under these main headings in the form of branches.
Personnel
Methods
EXAMPLE 17.3 Flowchart and cause-and-effect diagram of a hot-forging process. Hot forging involves heating metal to a plastic state and then shaping it by applying thousands of pounds of pressure to force the metal into a die (a kind of mold). Figure 17.2 is a flowchart of a typical hot-forging process.2 A process improvement team, after making and discussing this flowchart, came to several conclusions: • Inspecting the billets of metal received from the supplier adds no value. Insist that the supplier be responsible for the quality of the material. This then eliminates the inspection step. • If possible, buy the metal billets already cut to rough length and deburred by the supplier. This would eliminate the cost of preparing the raw material. • Heating the metal billet and forging (pressing the hot metal into the die) are the heart of the process. The company should concentrate attention here. The team then prepared a cause-and-effect diagram (Figure 17.3) for the heating and forging part of the process. The team members shared their specialist knowledge of the causes in their area, resulting in a more complete picture than any one person could produce. Figure 17.3 is a simplified version of the actual diagram. We have given some added detail for the “hammer stroke” branch under “equipment” to illustrate the next level of branches. Even this requires some knowledge of hot forging to understand. Based on detailed discussion of the diagram, the team decided what variables to measure and at what stages of the process to measure them. Producing well-chosen data is the key to improving the process.
17.1 Processes and Statistical Process Control
17-5
Receive the material
Check for size and metallurgy O.K.
No
Scrap
Yes Cut to the billet length
Yes
Deburr
Check for size O.K.
No
Oversize
No
Scrap
Yes Heat billet to the required temperature
Forge to the size
Flash trim and wash
Shot blast
FIGURE 17.2 Flowchart of the hot-forging process in Example 17.3. Use this as a model for flowcharts: decision points appear as diamonds, and other steps in the process appear as rectangles. Arrows represent flow from step to step.
Check for size and metallurgy O.K.
No
Scrap
Yes Bar code and store
We will apply statistical methods to a series of measurements made on a process. Deciding what specific variables to measure is an important step in quality improvement. Often we use a “performance measure” that describes an output of a process. A company’s financial office might record the percent of errors that outside auditors find in expense account reports or the number of data entry errors per week. The personnel department may measure the time to process employee insurance claims or the percent of job offers that are accepted. In the case of complex processes, it is wise to measure key
CHAPTER 17
•
Statistics for Quality: Control and Capability
Equipment
Billet temperature
gh t
ir e A sur es pr
Air quality
Die position
t gh
Dust in the die
ei
Billet size
Hammer stroke H
Billet metallurgy
Humidity
W ei
Material
St
Environment
ra in se ga t u ug p
e
17-6
Die temperature
Handling from furnace to press
FIGURE 17.3 Simplified causeand-effect diagram of the hot-forging process in Example 17.3. Good cause-and-effect diagram require detailed knowledge of the specific process.
Temperature setup
Hammer force and stroke Loading Kiss blocks accuracy setup Die position and lubrication Billet
Personnel
Methods
Good forged item
preparation
steps within the process rather than just final outputs. The process team in Example 17.3 might recommend that the temperature of the die and of the billet be measured just before forging. USE YOUR KNOWLEDGE 17.1 Describing your process. Choose a process that you know well, preferably from a job you have held. If you lack experience with actual business processes, choose a personal process such as making macaroni and cheese or brushing your teeth. Make a flowchart of the process. Make a cause-and-effect diagram that presents the factors that lead to successful completion of the process. 17.2 What variables to measure? Based on your description of the process in Exercise 17.1, suggest specific variables that you might measure in order to (a) assess the overall quality of the process. (b) gather information on a key step within the process.
Statistical process control The goal of statistical process control is to make a process stable over time and then keep it stable unless planned changes are made. You might want, for example, to keep your weight constant over time. A manufacturer of machine parts wants the critical dimensions to be the same for all parts. “Constant over time” and “the same for all” are not realistic requirements. They ignore the fact that all processes have variation. Your weight fluctuates from day to day; the critical dimension of a machined part varies a bit from item to item; the time to process a college admission application is not the same for all applications. Variation occurs in even the most precisely made product due to small
17.1 Processes and Statistical Process Control
common cause
special cause
17-7
changes in the raw material, the behavior of the machine or operator, and even the temperature in the plant. Because variation is always present, we can’t expect to hold a variable exactly constant over time. The statistical description of stability over time requires that the pattern of variation remain stable, not that there be no variation in the variable measured. In the language of statistical quality control, a process that is in control has only common cause variation. Common cause variation is the inherent variability of the process, due to many small causes that are always present. When the normal functioning of the process is disturbed by some unpredictable event, special cause variation is added to the common cause variation. We hope to be able to discover what lies behind special cause variation and eliminate that cause to restore the stable functioning of the process.
EXAMPLE 17.4 Common and special cause variation. Imagine yourself doing the same task repeatedly, say folding a circular, stuffing it into a stamped envelope, and sealing the envelope. The time to complete this task will vary a bit, and it is hard to point to any one reason for the variation. Your completion time shows only common cause variation. Now the telephone rings. You answer, and though you continue folding and stuffing while talking, your completion time rises beyond the level expected from common causes alone. Answering the telephone adds special cause variation to the common cause variation that is always present. The process has been disturbed and is no longer in its normal and stable state.
LOOK BACK sampling distributions, p. 302
Control charts work by distinguishing the always-present common cause variation in a process from the additional variation that suggests that the process has been disturbed by a special cause. A control chart sounds an alarm when it sees too much variation. This is accomplished through a combination of graphical and numerical descriptions of data with use of sampling distributions. Control charts were invented in the 1920s by Walter Shewhart at the Bell Telephone Laboratories.3 The most common application of control charts is to monitor the performance of industrial and business processes. The same methods, however, can be used to check the stability of quantities as varied as the ratings of a television show, the level of ozone in the atmosphere, and the gas mileage of your car.
STATISTICAL CONTROL A variable that continues to be described by the same distribution when observed over time is said to be in statistical control, or simply in control. Control charts are statistical tools that monitor a process and alert us when the process has been disturbed so that it is now out of control. This is a signal to find and correct the cause of the disturbance.
17-8
CHAPTER 17
•
Statistics for Quality: Control and Capability USE YOUR KNOWLEDGE 17.3 Considering common and special cause variation. In Exercise 17.1 (page 17-6), you described a process that you know well. What are some sources of common cause variation in this process? What are some special causes that might, at times, drive the process out of control? 17.4 Examples of special cause variation in arrival times. Lex takes a 7:45 A.M. shuttle to campus each morning. Her apartment complex is near a major road and is two miles from campus. Her arrival time to campus varies a bit from day to day but is generally stable. Give several examples of special causes that might raise Lex’s arrival time on a particular day.
– x charts for process monitoring
chart setup
process monitoring
When you first apply control charts to a process, the process may not be in control. Even if it is in control, you don’t yet understand its behavior. You will have to collect data from the process, establish control by uncovering and removing special causes, and then set up control charts to maintain control. We call this the chart setup stage. Later, when the process has been operating in control for some time, you understand its usual behavior and have a long run of data from the process. You keep control charts to monitor the process because a special cause could erupt at any time. We will call this process monitoring.4 Although in practice chart setup precedes process monitoring, the big ideas of control charts are more easily understood in the process-monitoring setting. We will start there and then discuss the more complex process improvement setting. Consider a quantitative variable x that is an important measure of quality. The variable might be the diameter of a part, the number of envelopes stuffed in an hour, or the time to respond to a customer call. If this process is in control, the variable x is described by the same distribution over time. For now, we’ll assume this distribution is Normal.
PROCESS-MONITORING CONDITIONS The measured quantitative variable x has a Normal distribution. The process has been operating in control for a long period, so that we know the process mean m and the process standard deviation s that describe the distribution of x as long as the process remains in control.
LOOK BACK law of large numbers, p. 268
In practice, we must estimate the process mean and standard deviation from past data on the process. Under the process-monitoring conditions, we have numerous observations and the process has remained in control. The law of large numbers tells us that estimates from past data will be very close to the truth about the process. That is, at the process-monitoring stage we can act as if we know the true values of m and s.
17.1 Processes and Statistical Process Control
–x chart
LOOK BACK sampling distribution of x, p. 307.
center line
LOOK BACK 68–95–99.7 rule, p. 59
control limits
17-9
Note carefully that m and s describe the center and spread of our variable x only as long as the process remains in control. A special cause may at any time disturb the process and change the mean, the standard deviation, or both. To make control charts, begin by taking small samples from the process at regular intervals. For example, we might measure 4 or 5 consecutive parts or the response times to 4 or 5 consecutive customer calls. There is an important idea here: the observations in a sample are so close together in time that we can assume that the process is stable during this short period. Variation within a single sample gives us a benchmark for the common cause variation in the process. The process standard deviation s refers to the standard deviation within the time period spanned by one sample. If the process remains in control, the same s describes the standard deviation of observations across any time period. Control charts help us decide whether this is the case. We start with the x chart, which is based on plotting the means of the successive samples. Here is the outline: 1. Take samples of size n from the process at regular intervals. Plot the means x of these samples against the order in which the samples were taken. 2. We know that the sampling distribution of x under the process-monitoring conditions is Normal with mean m and standard deviation s兾 1n. Draw a solid center line on the chart at height m. 3. The 99.7 part of the 68–95–99.7 rule for Normal distributions says that, as long as the process remains in control, 99.7% of the values of x will fall between m ⫺ 3s兾 1n and m ⫹ 3s兾 1n. Draw dashed control limits on the chart at these heights. The control limits mark off the range of variation in sample means that we expect to see when the process remains in control. If the process remains in control and the process mean and standard deviation do not change, we will rarely observe an x outside the control limits. Such an x would be a signal that the process has been disturbed.
EXAMPLE
DATA
17.5 Monitoring the water resistance of fabric. A manufacturer of outdoor sportswear must control the water resistance and breathability of their jackets. Water resistance is measured by the amount of water (depth in millimeters) that can be suspended above the fabric before water seeps through. For their jackets, this test is done along the seams and zipper, where the resistance is likely the weakest. For one particular style of jacket, the manufacturing process has been stable with mean resistance m ⫽ 2750 mm and process standard deviation s ⫽ 430 mm. Each four-hour shift, an operator measures the resistance on a sample of 4 jackets. Table 17.1 gives the last 20 samples. The table also gives the mean x and the standard deviation s for each sample. The operator did not have to calculate these—modern measuring equipment often comes equipped with software that automatically records x and s and even produces control charts.
H2ORES
Figure 17.4 is an x control chart for the 20 water resistance samples in Table 17.1. We have plotted each sample mean from the table against its sample number. For example, the mean of the first sample is 2534 mm, and
CHA
17-10
CHAPTER 17
•
Statistics for Quality: Control and Capability
TABLE 17.1 Twenty Control Chart Samples of Water Resistance (depth in mm) Sample
Depth measurements
Sample mean
Standard deviation
1
2345
2723
2345
2723
2534
218
2
3111
3058
2385
2862
2854
330
3
2471
2053
2526
3161
2553
457
4
2154
2968
2742
2568
2608
344
5
3279
2472
2833
2326
2728
425
6
3043
2363
2018
2385
2452
428
7
2689
2762
2756
2402
2652
170
8
2821
2477
2598
2728
2656
150
9
2608
2599
2479
3453
2785
449
10
3293
2318
3072
2734
2854
425
11
2664
2497
2315
2652
2532
163
12
1688
3309
3336
3183
2879
797
13
3499
3342
2923
3015
3195
271
14
2352
2831
2459
2631
2568
210
15
2573
2184
2962
2752
2618
330
16
2351
2527
3006
2976
2715
327
17
2863
2938
2362
2753
2729
256
18
3281
2726
3297
2601
2976
365
19
3164
2874
3730
2860
3157
407
20
2968
3505
2806
2598
2969
388
4000
Sample mean
3500
UCL
3000
2500 LCL 2000
FIGURE 17.4 The –x chart for the water resistance data of Table 17.1. No points lie outside the control limits.
1500 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Sample number
17.1 Processes and Statistical Process Control
17-11
this is the value plotted for Sample 1. The center line is at m ⫽ 2750 mm. The upper and lower control limits are m⫹3 m⫺3
s 2n s 2n
⫽ 2750 ⫹ 3 ⫽ 2750 ⫺ 3
430 24 430 24
⫽ 2750 ⫹ 645 ⫽ 3395 mm
(UCL)
⫽ 2750 ⫺ 645 ⫽ 2105 mm
(LCL)
As is common, we have labeled the control limits UCL for upper control limit and LCL for lower control limit.
EXAMPLE x control chart. Figure 17.4 is a typical x chart for a pro17.6 Reading an – cess in control. The means of the 20 samples do vary, but all lie within the range of variation marked out by the control limits. We are seeing the common cause variation of a stable process. Figures 17.5 and 17.6 illustrate two ways in which the process can go out of control. In Figure 17.5, the process was disturbed by a special cause sometime between Sample 12 and Sample 13. As a result, the mean resistance for Sample 13 falls above the upper control limit. It is common practice to mark all out-of-control points with an “x” to call attention to them. A search for the cause begins as soon as we see a point out of control. Investigation finds that the seam sealer device has slipped, resulting in more sealer being applied. This is good for water resistance but harms the jacket’s breathability. When the problem is corrected, Samples 14 to 20 are again in control. Figure 17.6 shows the effect of a steady upward drift in the process center, starting at Sample 11. You see that some time elapses before x is out of control (Sample 18). The one-point-out rule works better for detecting sudden large disturbances than for detecting slow drifts in a process. 4000
x
Sample mean
3500
UCL
3000
2500 LCL 2000
FIGURE 17.5 The –x chart is identical to that in Figure 17.4 except that a special cause has driven – x for Sample 13 above the upper control limit. The out-of-control point is marked with an x.
1500 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Sample number
17-12
CHAPTER 17
•
Statistics for Quality: Control and Capability 4000
Sample mean
3500
x UCL
x
x
3000
2500 LCL
FIGURE 17.6 The first 10 points
on this –x chart are as in Figure 17.4. The process mean drifts upward after Sample 10, and the sample means – x reflect this drift. The points for Samples 18, 19, and 20 are out of control.
2000
1500 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Sample number
USE YOUR KNOWLEDGE 17.5 An x control chart for sandwich orders. A sandwich shop owner takes a daily sample of five consecutive sandwich orders at a random time during the lunch rush and records the time it takes to complete each order. Past experience indicates that the process mean should be m ⫽ 90 seconds and the process standard deviation should be s ⫽ 24 seconds. Calculate the center line and control limits for an x control chart. 17.6 Changing the sample size n or the unit of measure. Refer to Exercise 17.5. What happens to the center line and control limits if (a) the owner samples four consecutive sandwich orders? (b) the owner samples six consecutive sandwich orders? (c) the owner uses minutes rather than seconds as the units?
s charts for process monitoring The x charts in Figures 17.4, 17.5, and 17.6 were easy to interpret because the process standard deviation remained fixed at 430 mm. The effects of moving the process mean away from its in-control value (2750 mm) are then clear to see. We know that even the simplest description of a distribution should give both a measure of center and a measure of spread. So it is with control charts. We must monitor both the process center, using an x chart, and the process spread, using a control chart for the sample standard deviation s. The standard deviation s does not have a Normal distribution, even approximately. Under the process-monitoring conditions, the sampling distribution of s is skewed to the right. Nonetheless, control charts for any statistic
17.1 Processes and Statistical Process Control
17-13
are based on the “plus or minus three standard deviations” idea motivated by the 68–95–99.7 rule for Normal distributions. Control charts are intended to be practical tools that are easy to use. Standard practice in process control therefore ignores such details as the effect of non-Normal sampling distributions. Here is the general control chart setup for a sample statistic Q (short for “quality characteristic”).
THREE-SIGMA CONTROL CHARTS To make a three-sigma (3s) control chart for any statistic Q: 1. Take samples from the process at regular intervals and plot the values of the statistic Q against the order in which the samples were taken. 2. Draw a center line on the chart at height mQ, the mean of the statistic when the process is in control. 3. Draw upper and lower control limits on the chart three standard deviations of Q above and below the mean. That is, UCL ⫽ mQ ⫹ 3sQ LCL ⫽ mQ ⫺ 3sQ Here sQ is the standard deviation of the sampling distribution of the statistic Q when the process is in control. 4. The chart produces an out-of-control signal when a plotted point lies outside the control limits. We have applied this general idea to x charts. If m and s are the process mean and standard deviation, the statistic x has mean mx ⫽ m and standard deviation sx ⫽ s兾 1n. The center line and control limits for x charts follow from these facts. What are the corresponding facts for the sample standard deviation s? Study of the sampling distribution of s for samples from a Normally distributed process characteristic gives these facts: 1. The mean of s is a constant times the process standard deviation s, that is, ms ⫽ c4s. 2. The standard deviation of s is also a constant times the process standard deviation, ss ⫽ c5s. The constants are called c4 and c5 for historical reasons. Their values depend on the size of the samples. For large samples, c4 is close to 1. That is, the sample standard deviation s has little bias as an estimator of the process standard deviation s. Because statistical process control often uses small samples, we pay attention to the value of c4. Following the general pattern for three-sigma control charts, 1. The center line of an s chart is at c4s. 2. The control limits for an s chart are at UCL ⫽ ms ⫹ 3ss ⫽ c4s ⫹ 3c5s ⫽ 1c4 ⫹ 3c5 2s ⫽ B6s LCL ⫽ ms ⫺ 3ss ⫽ c4s ⫺ 3c5s ⫽ 1c4 ⫺ 3c5 2s ⫽ B5s
17-14
CHAPTER 17
•
Statistics for Quality: Control and Capability That is, the control limits UCL and LCL are also constants times the process standard deviation. These constants are called (again for historical reasons) B6 and B5. We don’t need to remember that B6 ⫽ c4 ⫹ 3c5 and B5 ⫽ c4 ⫺ 3c5, because tables give us the numerical values of B6 and B5.
x– AND s CONTROL CHARTS FOR PROCESS MONITORING5 Take regular samples of size n from a process that has been in control with process mean m and process standard deviation s. The center line and control limits for an x chart are UCL ⫽ m ⫹ 3 CL ⫽ m
s 2n
LCL ⫽ m ⫺ 3
s
2n The center line and control limits for an s chart are UCL ⫽ B6s CL ⫽ c4s LCL ⫽ B5s The control chart constants c4, B5, and B6 depend on the sample size n.
Table 17.2 gives the values of the control chart constants c4, c5, B5, and B6 for samples of sizes 2 to 10. This table makes it easy to draw s charts. The table has no B5 entries for samples smaller than n ⫽ 6. The lower control limit for an s chart is zero for samples of sizes 2 to 5. This is a consequence of the fact that s has a right-skewed distribution and takes only values greater than zero. The point three standard deviations above the mean (UCL) lies on the long right side of the distribution. The point three standard deviations below the mean (LCL) on the short left side is below zero, so we say that LCL 5 0.
TABLE 17.2 Control Chart Constants Sample size n
c4
c5
B5
B6
2
0.7979
0.6028
2.606
3
0.8862
0.4633
2.276
4
0.9213
0.3889
2.088
5
0.9400
0.3412
1.964
6
0.9515
0.3076
0.029
1.874
7
0.9594
0.2820
0.113
1.806
8
0.9650
0.2622
0.179
1.751
9
0.9693
0.2459
0.232
1.707
10
0.9727
0.2321
0.276
1.669
17.1 Processes and Statistical Process Control
17-15
EXAMPLE DATA H2ORES
CHALLENGE
17.7 Interpreting an s chart for the waterproofing process. Figure 17.7 is the s chart for the water resistance data in Table 17.1. The samples are of size n ⫽ 4 and the process standard deviation in control is s ⫽ 430 mm. The center line is therefore CL ⫽ c4s ⫽ 10.92132 14302 ⫽ 396 mm The control limits are UCL ⫽ B6s ⫽ 12.0882 14302 ⫽ 898 LCL ⫽ B5s ⫽ 102 14302 ⫽ 0 Figures 17.4 and 17.7 go together: they are the x and s charts for monitoring the waterproofing process. Both charts are in control, showing only common cause variation within the bounds set by the control limits. 1200
Sample standard deviation
1000
UCL
800 600 400 200
FIGURE 17.7 The s chart for the water resistance data of Table 17.1. Both the s chart and the – x chart (Figure 17.4) are in control.
0
LCL 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Sample number
Figures 17.8 and 17.9 are x and s charts for the water resistance process when a new and poorly trained operator takes over the seam application between Samples 10 and 11. The new operator introduces added variation into the process, increasing the process standard deviation from its in-control value of 430 mm to 600 mm. The x chart in Figure 17.8 shows one point out of control. Only on closer inspection do we see that the spread of the x’s increases after Sample 10. In fact, the process mean has remained unchanged at 2750 mm. The apparent lack of control in the x chart is entirely due to the larger process variation. There is a lesson here: it is difficult to interpret an x chart unless s is in control. When you look at x and s charts, always start with the s chart. The s chart in Figure 17.9 shows lack of control starting at Sample 11. As usual, we mark the out-of-control points by an “x.” The points for Samples 13 and 15 also lie above the UCL, and the overall spread of the sample points is much greater than for the first 10 samples. In practice, the s chart would call for action after Sample 11. We would ignore the x chart until the special cause (the new operator) for the lack of control in the s chart has been found and removed by training the operator.
17-16
CHAPTER 17
•
Statistics for Quality: Control and Capability 4000
Sample mean
3500
x
UCL
3000
2500 LCL 2000
FIGURE 17.8 The –x chart for water resistance when the process variability increases after Sample 10. The –x chart does show the increased variability, but the s chart is clearer and should be read first.
1500 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Sample number
1200
FIGURE 17.9 The s chart for water resistance when the process variability increases after Sample 10. Increased within-sample variability is clearly visible. Find and remove the s-type special cause before reading the –x chart.
Sample standard deviation
1000
UCL
x
x
x
800 600 400 200 0
LCL 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Sample number
Example 17.7 suggests a strategy for using x and s charts in practice. First examine the s chart. Lack of control on an s chart is due to special causes that affect the observations within a sample differently. New and nonuniform raw material, a new and poorly trained operator, and mixing results from several machines or several operators are typical “s-type” special causes. Once the s chart is in control, the stable value of the process standard deviation s means that the variation within samples serves as a benchmark for detecting variation in the level of the process over the longer time periods between samples. The x chart, with control limits that depend on s, does this. The x chart, as we saw in Example 17.7, responds to s-type causes as well as
17.1 Processes and Statistical Process Control
17-17
to longer-range changes in the process, so it is important to eliminate s-type special causes first. Then the x chart will alert us to, for example, a change in process level caused by new raw material that differs from that used in the past or a gradual drift in the process level caused by wear in a cutting tool.
EXAMPLE 17.8 Special causes and their effect on control charts. A large health maintenance organization (HMO) uses control charts to monitor the process of directing patient calls to the proper department or doctor’s receptionist. Each day at a random time, 5 consecutive calls are recorded electronically. The first call today is handled quickly by an experienced operator, but the next goes to a newly hired operator who must ask a supervisor for help. The sample has a large s, and lack of control signals the need to train new hires more thoroughly. The same HMO monitors the time required to receive orders from its main supplier of pharmaceutical products. After a long period in control, the x chart shows a systematic shift downward in the mean time because the supplier has changed to a more efficient delivery service. This is a desirable special cause, but it is nonetheless a systematic change in the process. The HMO will have to establish new control limits that describe the new state of the process, with smaller process mean m. The second setting in Example 17.8 reminds us that a major change in the process returns us to the chart setup stage. In the absence of deliberate changes in the process, process monitoring uses the same values of m and s for long periods of time. One exception is common: careful monitoring and removal of special causes as they occur can permanently reduce the process s. If the points on the s chart remain near the center line for a long period, it is wise to update the value of s to the new, smaller value.
SECTION 17.1 Summary Work is organized in processes, chains of activities that lead to some result. We use flowcharts and cause-and-effect diagrams to describe processes. All processes have variation. If the pattern of variation is stable over time, the process is in statistical control. Control charts are statistical plots intended to warn when a process is out of control. Standard 3s control charts plot the values of some statistic Q for regular samples from the process against the time order of the samples. The center line is at the mean of Q. The control limits lie three standard deviations of Q above and below the center line. A point outside the control limits is an out-of-control signal. For process monitoring of a process that has been in control, the mean and standard deviation are based on past data from the process and are updated regularly. When we measure some quantitative characteristic of the process, we use x and s charts for process control. The s chart monitors variation within individual samples. If the s chart is in control, the x chart monitors variation from sample to sample. To interpret the charts, always look first at the s chart.
17-18
CHAPTER 17
•
Statistics for Quality: Control and Capability
SECTION 17.1 Exercises For Exercises 17.1 and 17.2, see page 17-6; for Exercises 17.3 and 17.4, see page 17-8; and for Exercises 17.5 and 17.6, see page 17-12.
DRG
Percent of losses
104
5.2
17.7 Constructing a flowchart. Consider the process of calling in a sandwich order for delivery to your apartment. Make a flowchart of this process, making sure to include steps that involve Yes/No decisions.
107
10.1
109
7.7
116
13.7
148
6.8
17.8 Determining sources of common and special cause variation. Refer to the previous exercise. The time it takes from deciding to order a sandwich to receiving the sandwich will vary. List several common causes of variation in this time. Then list several special causes that might result in unusual variation.
209
15.2
403
5.6
430
6.8
462
9.4
17.9 Constructing a Pareto chart. Comparisons are easier if you order the bars in a bar graph by height. A bar graph ordered from tallest to shortest bar is sometimes called a Pareto chart, after the Italian economist who recommended this procedure. Pareto charts are often used in quality studies to isolate the “vital few” categories on which we should focus our attention. Here is an example. Painting new auto bodies is a multistep process. There is an “electrocoat” that resists corrosion, a primer, a color coat, and a gloss coat. A quality study for one paint shop produced this breakdown of the primary problem type for those autos whose paint did not meet the manufacturer’s standards: Problem
Percent
Electrocoat uneven—redone
4
Poor adherence of color to primer
5
Lack of clarity in color
2
“Orange peel” texture in color “Orange peel” texture in gloss Ripples in color coat Ripples in gloss coat
32 1 28 4
Uneven color thickness
19
Uneven gloss thickness
5
Total
100
What percent of total losses do these 9 DRGs account for? Make a Pareto chart of losses by DRG. Which DRGs should the hospital study first when attempting to reduce its losses? 17.11 Making a Pareto chart. Continue the study of the process of calling in a sandwich order (Exercise 17.7). If you kept good records, you could make a Pareto chart of the reasons (special causes) for unusually long order times. Make a Pareto chart of these reasons. That is, list the reasons based on your experience and chart your estimates of the percent each reason explains. 17.12 Control limits for label placement. A rum producer monitors the position of its label on the bottle by sampling 4 bottles from each batch. One quantity measured is the distance from the bottom of the bottle neck to the top of the label. The process mean should be m ⫽ 2 inches. Past experience indicates that the distance varies with s ⫽ 0.1 inches. (a) The mean distance x for each batch sample is plotted on an x control chart. Calculate the center line and control limits for this chart. (b) The sample standard deviation s for each batch’s sample is plotted on an s control chart. What are the center line and control limits for this chart? 17.13 More on control limits for label placement. Refer to the previous exercise. What happens to the center line and control limits for the x and s control charts if (a) the distributor samples 10 bottles from each batch?
Make a Pareto chart. Which stage of the painting process should we look at first? 17.10 Constructing another Pareto chart. A large hospital finds that it is losing money on surgery due to inadequate reimbursement by insurance companies and government programs. An initial study looks at losses broken down by diagnosis. Government standards place cases into Diagnostic Related Groups (DRGs). For example, major joint replacements are DRG 209. Here is what the hospital finds:
(b) the distributor samples 2 bottles from each batch? (c) the distributor uses centimeters rather than inches as the units? 17.14 Control limits for air conditioner thermostats. A maker of auto air conditioners checks a sample of 6 thermostatic controls from each hour’s production. The thermostats are set at 728F and then placed in a chamber where the temperature is raised gradually. The temperature at which the thermostat turns on the air conditioner
17.1 Processes and Statistical Process Control is recorded. The process mean should be m 5 728F. Past experience indicates that the response temperature of properly adjusted thermostats varies with s 5 0.68F. (a) The mean response temperature x for each hour’s sample is plotted on an x control chart. Calculate the center line and control limits for this chart. (b) The sample standard deviation s for each hour’s sample is plotted on an s control chart. What are the center line and control limits for this chart? 17.15 Control limits for a meat-packaging process. A meat-packaging company produces 1-pound packages of ground beef by having a machine slice a long circular cylinder of ground beef as it passes through the machine. The timing between consecutive cuts will alter the weight of each section. Table 17.3 gives the weight of three consecutive sections of ground beef taken each hour over two 10-hour days. Past experience indicates that the process mean is 1.014 lb and the weight varies with s ⫽ 0.019 lb. MEATWGT (a) Calculate the center line and control limits for an x chart. (b) What are the center line and control limits for an s chart for this process?
(c) Create the x and s charts for these 20 consecutive samples. (d) Does the process appear to be in control? Explain. 17.16 Causes of variation in the time to respond to an application. The personnel department of a large company records a number of performance measures. Among them is the time required to respond to an application for employment, measured from the time the application arrives. Suggest some plausible examples of each of the following. (a) Reasons for common cause variation in response time. (b) s-type special causes. (c) x-type special causes. 17.17 Control charts for a tablet compression process. A pharmaceutical manufacturer forms tablets by compressing a granular material that contains the active ingredient and various fillers. The hardness of a sample from each lot of tablets is measured in order to control the compression process. The process has been operating in control with mean at the target value m ⫽ 11.5 kiloponds (kp) and
TABLE 17.3 Twenty Samples of Size 3, with x– and s Sample
17-19
Weight (pounds)
x
s
1
0.999
1.071
1.019
1.030
0.0373
2
1.030
1.057
1.040
1.043
0.0137
3
1.024
1.020
1.041
1.028
0.0108
4
1.005
1.026
1.039
1.023
0.0172
5
1.031
0.995
1.005
1.010
0.0185
6
1.020
1.009
1.059
1.029
0.0263
7
1.019
1.048
1.050
1.039
0.0176
8
1.005
1.003
1.047
1.018
0.0247
9
1.019
1.034
1.051
1.035
0.0159
10
1.045
1.060
1.041
1.049
0.0098
11
1.007
1.046
1.014
1.022
0.0207
12
1.058
1.038
1.057
1.051
0.0112
13
1.006
1.056
1.056
1.039
0.0289
14
1.036
1.026
1.028
1.030
0.0056
15
1.044
0.986
1.058
1.029
0.0382
16
1.019
1.003
1.057
1.026
0.0279
17
1.023
0.998
1.054
1.025
0.0281
18
0.992
1.000
1.067
1.020
0.0414
19
1.029
1.064
0.995
1.029
0.0344
20
1.008
1.040
1.021
1.023
0.0159
17-20
CHAPTER 17
•
Statistics for Quality: Control and Capability
TABLE 17.4 Three Sets of x–’s from 20 Samples of Size 4 Sample
Data set A
Data set B
Data set C
1
11.602
11.627
11.495
2
11.547
11.613
11.475
3
11.312
11.493
11.465
4
11.449
11.602
11.497
5
11.401
11.360
11.573
6
11.608
11.374
11.563
7
11.471
11.592
11.321
8
11.453
11.458
11.533
9
11.446
11.552
11.486
10
11.522
11.463
11.502
11
11.664
11.383
11.534
12
11.823
11.715
11.624
13
11.629
11.485
11.629
14
11.602
11.509
11.575
15
11.756
11.429
11.730
16
11.707
11.477
11.680
17
11.612
11.570
11.729
18
11.628
11.623
11.704
19
11.603
11.472
12.052
20
11.816
11.531
11.905
estimated standard deviation s ⫽ 0.2 kp. Table 17.4 gives three sets of data, each representing x for 20 successive samples of n ⫽ 4 tablets. One set of data remains in control at the target value. In a second set, the process mean m shifts suddenly to a new value. In a third, the process mean drifts gradually. PILL (a) What are the center line and control limits for an x chart for this process? (b) Draw a separate x chart for each of the three data sets. Mark any points that are beyond the control limits. (c) Based on your work in part (b) and the appearance of the control charts, which set of data comes from a process that is in control? In which case does the process mean shift suddenly, and at about which sample do you think that the mean changed? Finally, in which case does the mean drift gradually? 17.18 More on the tablet compression process. Exercise 17.17 concerns process control data on the hardness of tablets for a pharmaceutical product. Table 17.5 gives data for 20 new samples of size 4, with the x and s for each sample. The process has
been in control with mean at the target value m ⫽ 11.5 kp and standard deviation s ⫽ 0.2 kp. PILL1 (a) Make both x and s charts for these data based on the information given about the process. (b) At some point, the within-sample process variation increased from s ⫽ 0.2 to s ⫽ 0.4. About where in the 20 samples did this happen? What is the effect on the s chart? On the x chart? (c) At that same point, the process mean changed from m ⫽ 11.5 to m ⫽ 11.7. What is the effect of this change on the s chart? On the x chart? 17.19 Control limits for a milling process. The width of a slot cut by a milling machine is important to the proper functioning of a hydraulic system for large tractors. The manufacturer checks the control of the milling process by measuring a sample of six consecutive items during each hour’s production. The target width for the slot is m ⫽ 0.850 inch. The process has been operating in control with center close to the target and s ⫽ 0.002 inch. What center line and control limits should be drawn on the s chart? On the x chart?
17.1 Processes and Statistical Process Control
17-21
TABLE 17.5 Twenty Samples of Size 4, with x– and s Sample 1
Hardness (kp) 11.193
11.915
11.391
11.500
x
s
11.500
0.3047
2
11.772
11.604
11.442
11.403
11.555
0.1688
3
11.606
11.253
11.458
11.594
11.478
0.1642
4
11.509
11.151
11.249
11.398
11.326
0.1585
5
11.289
11.789
11.385
11.677
11.535
0.2362
6
11.703
11.251
11.231
11.669
11.463
0.2573
7
11.085
12.530
11.482
11.699
11.699
0.6094
8
12.244
11.908
11.584
11.505
11.810
0.3376
9
11.912
11.206
11.615
11.887
11.655
0.3284
10
11.717
11.001
11.197
11.496
11.353
0.3170
11
11.279
12.278
11.471
12.055
11.771
0.4725
12
12.106
11.203
11.162
12.037
11.627
0.5145
13
11.490
11.783
12.125
12.010
11.852
0.2801
14
12.299
11.924
11.235
12.014
11.868
0.4513
15
11.380
12.253
11.861
12.242
11.934
0.4118
16
11.220
12.226
12.216
11.824
11.872
0.4726
17
11.611
11.658
11.977
10.813
11.515
0.4952
18
12.251
11.481
11.156
12.243
11.783
0.5522
19
11.559
11.065
12.186
10.933
11.435
0.5681
20
11.106
12.444
11.682
12.378
11.902
0.6331
17.20 Control limits for a dyeing process. The unique colors of the cashmere sweaters your firm makes result from heating undyed yarn in a kettle with a dye liquor. The pH (acidity) of the liquor is critical for regulating dye uptake and hence the final color. There are five kettles, all of which receive dye liquor from a common source. Twice each day, the pH of the liquor in each kettle is measured, giving a sample of size 5. The process has been operating in control with m ⫽ 4.24 and s ⫽ 0.137. (a) Give the center line and control limits for the s chart. (b) Give the center line and control limits for the x chart. 17.21 Control charts for a mounting-hole process. Figure 17.10 reproduces a data sheet from a factory that makes electrical meters.6 The sheet shows measurements of the distance between two mounting holes for 18 samples of size 5. The heading informs us that the measurements are in multiples of 0.0001 inch above 0.6000 inch. That is, the first measurement, 44, stands for 0.6044 inch. All the measurements end in 4. Although we don’t know why this is true, it is clear that in effect the measurements were made to the nearest 0.001 inch, not to the nearest 0.0001 inch. Based on long experience with this process, you are keeping
control charts based on m ⫽ 43 and s ⫽ 12.74. Make s and x charts for the data in Figure 17.10 and describe the state of the process. MOUNT 17.22 Identifying special causes on control charts. The process described in Exercise 17.20 goes out of control. Investigation finds that a new type of yarn was recently introduced. The pH in the kettles is influenced by both the dye liquor and the yarn. Moreover, on a few occasions a faulty valve on one of the kettles had allowed water to enter that kettle; as a result, the yarn in that kettle had to be discarded. Which of these special causes appears on the s chart and which on the x chart? Explain your answer. 17.23 Determining the probability of detection. An x chart plots the means of samples of size 4 against center line CL 5 715 and control limits LCL 5 680 and UCL 5 750. The process has been in control. (a) What are the process mean and standard deviation? (b) The process is disrupted in a way that changes the mean to m ⫽ 700. What is the probability that the first sample after the disruption gives a point beyond the control limits of the x chart?
17-22
CHAPTER 17
•
Statistics for Quality: Control and Capability
Excel
A
B C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Part No. Chart No. 32506 1 Specification limits Part name (project) Operation (process) Metal frame Distance between mounting holes 0.6054" ± 0.0010" Operator Machine Gage Unit of measure Zero equals R-5 0.0001" 0.6000"
VARIABLES CONTROL CHART ( X & R)
Date Time Sample measurements
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 2 3 4 5
Average, X Range, R
3/7
3/8
8:30 10:30 11:45 1:30
8:15 10:15 11:45 2:00
44 44 44 44 64
64 44 34 34 54
34 44 54 44 54
4:00
8:30 10:00 11:45 1:30
2:30
3:30 4:30
24 54 44 34 34
34 44 44 34 34
34 44 34 64 34
54 24 54 44 44
54 34 24 44 54
70 30 30 30 30
10
30 30 20 30 30 30 40 40
44 34 34 54 54 14 64 64 54 84 34 34 34 54 44 44 44 44 44 34
20 30 20 20
3/9 3:00
64 34 54 44 44
54 44 24 54 24
44 24 34 34 44
24 24 54 44 44
5:30
54 54 34 24 74 64 44 34 54 44
19
FIGURE 17.10 A process control record sheet kept by operators, for Exercise 17.21. This is typical of records kept by hand when measurements are not automated. We will see in the next section why such records mention – x and R control charts rather than – x and s charts. (c) The process is disrupted in a way that changes the mean to m ⫽ 700 and the standard deviation to s ⫽ 10. What is the probability that the first sample after the disruption gives a point beyond the control limits of the x chart? 17.24 Alternative control limits. American and Japanese practice uses 3s control charts. That is, the control limits are three standard deviations on either side of the mean. When the statistic being plotted has a Normal distribution, the probability of a point outside the limits is about 0.003 (or about 0.0015 in each direction) by the 68–95–99.7 rule (page 59). European practice uses control limits placed so that the probability of a point outside the limits when in control is 0.001 in each direction.
For a Normally distributed statistic, how many standard deviations on either side of the mean do these alternative control limits lie? 17.25 2s control charts. Some special situations call for 2s control charts. That is, the control limits for a statistic Q will be mQ ⫾ 2sQ. Suppose that you know the process mean m and standard deviation s and will plot x and s from samples of size n. (a) What are the 2s control limits for an x chart? (b) Find expressions for the upper and lower 2s control limits for an s chart in terms of the control chart constants c4 and c5 introduced on page 17-13.
17.2 Using Control Charts When you complete this section, you will be able to • Implement various out-of-control rules when interpreting control charts. • Set up a control chart (that is, tentative control limits and center line) based on past data. • Identify rational subgroups when deciding how to choose samples. • Distinguish between the natural tolerances for a product and the control limits for a process, as well as between capability and control. We are now familiar with the ideas behind all control charts as well as the details of making x and s charts. This section discusses a variety of topics related to using control charts in practice.
17.2 Using Control Charts
17-23
– x and R charts
sample range
R chart
We have seen that it is essential to monitor both the center and the spread of a process. Control charts were originally intended to be used by factory workers with limited knowledge of statistics in the era before even calculators, let alone software, were common. In that environment, the standard deviation is too difficult to calculate. The x chart for center was therefore used with a control chart for spread based on the sample range rather than the sample standard deviation. The range R of a sample is just the difference between the largest and smallest observations. It is easy to find R without a calculator. Using R rather than s to measure the spread of samples replaces the s chart with an R chart. It also changes the x chart because the control limits for x use the estimated process spread. Because the range R uses only the largest and smallest observations in a sample, it is less informative than the standard deviation s calculated from all the observations. For this reason, x and s charts are now preferred to x and R charts. R charts, however, remain common because it is easier for workers to understand R than s. In this short introduction, we concentrate on the principles of control charts, so we won’t give the details of constructing x and R charts. These details appear in any text on quality control.7 If you meet a set of x and R charts, remember that the interpretation of these charts is just like the interpretation of x and s charts.
EXAMPLE 17.9 Example of a typical process control technology. Figure 17.11 is a display produced by custom process control software attached to a laser micrometer. In this demonstration prepared by the software maker, the FIGURE 17.11 Output for operators, from the Laser Manager software by System Dynamics, Inc. The software prepares control charts directly from measurements made by a laser micrometer. Compare the hand record sheet in Figure 17.10. (Image provided by Gordon A. Feingold, System Dynamics, Inc. Used by permission.)
17-24
CHAPTER 17
•
Statistics for Quality: Control and Capability micrometer is measuring the diameter in millimeters of samples of pens shipped by an office supply company. The software controls the laser, records measurements, makes the control charts, and sounds an alarm when a point is out of control. This is typical of process control technology in modern manufacturing settings. The software presents x and R charts rather than x and s charts. The R chart monitors within-sample variation (just like an s chart), so we look at it first. We see that the process spread is stable and well within the control limits. Just as in the case of s, the LCL for R is 0 for the samples of size n ⫽ 5 used here. The x chart is also in control, so process monitoring will continue. The software will sound an alarm if either chart goes out of control. USE YOUR KNOWLEDGE 17.26 What’s wrong? For each of the following, explain what is wrong and why. (a) The R chart monitors the center of the process. (b) The R chart is commonly used because the range R is more informative than the standard deviation s. (c) Use of the range R to monitor process spread does not alter the construction of the control limits for the x chart.
Additional out-of-control rules So far, we have used only the basic “one point beyond the control limits” criterion to signal that a process may have gone out of control. We would like a quick signal when the process moves out of control, but we also want to avoid “false alarms,” signals that occur just by chance when the process is really in control. The standard 3s control limits are chosen to prevent too many false alarms, because an out-of-control signal calls for an effort to find and remove a special cause. As a result, x charts are often slow to respond to a gradual drift in the process center. We can speed the response of a control chart to lack of control—at the cost of also enduring more false alarms—by adding patterns other than “one-pointout” as rules. The most common step in this direction is to add a runs rule to the x chart.
OUT-OF-CONTROL SIGNALS x and s or x and R control charts produce an out-of-control signal if (a) One-point-out: A single point lies outside the 3s control limits of either chart. (b) Run: The x chart shows 9 consecutive points above the center line or 9 consecutive points below the center line. The signal occurs when we see the 9th point of the run.
17.2 Using Control Charts
17-25
EXAMPLE 17.10 Effectiveness of the runs rule. Figure 17.12 reproduces the x chart from Figure 17.6. The process center began a gradual upward drift at Sample 11. The chart shows the effect of the drift—the sample means plotted on the chart move gradually upward, with some random variation. The one-point-out rule does not call for action until Sample 18 finally produces an x above the UCL. The runs rule reacts slightly more quickly: Sample 17 is the 9th consecutive point above the center line. FIGURE 17.12 The –x chart for
4000
water resistance data when the process center drifts upward, for Example 17.10. The "run of 9" signal gives an out-of-control warning at Sample 17. Sample mean
3500
x UCL
x
x x
3000
2500 LCL 2000
1500 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Sample number
It is a mathematical fact that the runs rule responds to a gradual drift more quickly (on the average) than the one-point-out rule does. The motivation for a runs rule is that when a process is in control, half the points on an x chart should lie above the center line and half below. That’s true on the average in the long term. In the short term, we will see runs of points above or below, just as we see runs of heads or tails in tossing a coin. To determine how long a run must be to suggest that the process center has moved, we once again concern ourselves with the cost of false alarms. The 99.7 part of the 68–95–99.7 rule says that we will get a point outside the 3s control limits about 3 times for every 1000 points plotted when the process is in control. The chance of 9 straight points above the center line when the process is in control is 11兾22 9 ⫽ 1兾512, or about 2 per 1000. The chance for a run of 9 below the center line is the same. Combined, that’s about 4 false alarms per 1000 plotted points overall when the process is in control. This is very close to the false-alarm rate for one-point-out. There are many other patterns that can be added to the rules for responding to x and s or x and R charts. In our enthusiasm to detect various special kinds of loss of control, it is easy to forget that adding rules always increases the frequency of false alarms. Frequent false
17-26
CHAPTER 17
•
Statistics for Quality: Control and Capability alarms are so annoying that the people responsible for responding soon begin to ignore out-of-control signals. It is better to use only a few out-ofcontrol rules and to reserve rules other than one-point-out and runs for processes that are known to be prone to specific special causes for which there are tailor-made detection rules.8 USE YOUR KNOWLEDGE 17.27 What’s wrong? For each of the following, explain what is wrong and why. (a) For the one-point-out rule, you could reduce the frequency of false alarms by using 2s control limits. (b) In speeding up the response of a control chart to lack of control, we decrease the frequency of false alarms. (c) The runs rule is designed to quickly detect a large and sudden shift in the process. 17.28 The effect of special cause variation. Is each of the following examples of a special cause most likely to first result in (i) one-pointout on the s or R chart, (ii) one-point-out on the x chart, or (iii) a run on the x chart? In each case, briefly explain your reasoning. (a) An etching solution deteriorates as more items are etched. (b) Buildup of dirt reduces the precision with which parts are placed for machining. (c) A new customer service representative for a Spanish-language help line is not a native speaker and has difficulty understanding customers. (d) A data entry employee grows less attentive as his shift continues.
Setting up control charts When you first encounter a process that has not been carefully studied, it is quite likely that the process is not in control. Your first goal is to discover and remove special causes and so bring the process into control. Control charts are an important tool. Control charts for process monitoring follow the process forward in time to keep it in control. Control charts at the chart setup stage, on the other hand, look back in an attempt to discover the present state of the process. An example will illustrate the method.
EXAMPLE DATA
17.11 Monitoring the viscosity of a material. The viscosity of a material is VISC
its resistance to flow when under stress. Viscosity is a critical characteristic of rubber and rubber-like compounds called elastomers, which have many uses in consumer products. Viscosity is measured by placing specimens of the material above and below a slowly rotating roller, squeezing the assembly, and recording the drag on the roller. Measurements are in “Mooney units,” named after the inventor of the instrument.
CHAL
17.2 Using Control Charts
17-27
TABLE 17.6 x and s for 24 Samples of Elastomer Viscosity (in Mooneys) Sample
x
s
Sample
x
s
1
49.750
2.684
13
47.875
1.118
2
49.375
0.895
14
48.250
0.895
3
50.250
0.895
15
47.625
0.671
4
49.875
1.118
16
47.375
0.671
5
47.250
0.671
17
50.250
1.566
6
45.000
2.684
18
47.000
0.895
7
48.375
0.671
19
47.000
0.447
8
48.500
0.447
20
49.625
1.118
9
48.500
0.447
21
49.875
0.447
10
46.250
1.566
22
47.625
1.118
11
49.000
0.895
23
49.750
0.671
12
48.125
0.671
24
48.625
0.895
A specialty chemical company is beginning production of an elastomer that is supposed to have viscosity 45 ⫾ 5 Mooneys. Each lot of the elastomer is produced by “cooking” raw material with catalysts in a reactor vessel. Table 17.6 records x and s from samples of size n ⫽ 4 lots from the first 24 shifts as production begins.9 An s chart therefore monitors variation among lots produced during the same shift. If the s chart is in control, an x chart looks for shift-to-shift variation. Estimating m We do not know the process mean m and standard deviation s. What shall we do? Sometimes we can easily adjust the center of a process by setting some control, such as the depth of a cutting tool in a machining operation or the temperature of a reactor vessel in a pharmaceutical plant. In such cases it is common to simply take the process mean m to be the target value, the depth or temperature that the design of the process specifies as correct. The x chart then helps us keep the process mean at this target value. There is less likely to be a “correct value” for the process mean m if we are monitoring response times to customer calls or data entry errors. In Example 17.11, we have the target value 45 Mooneys, but there is no simple way to set viscosity at the desired level. In such cases, we want the m we use in our x chart to describe the center of the process as it has actually been operating. To do this, take the mean of all the individual measurements in the past samples. Because the samples are all the same size, this is just the mean of the sample x’s. The overall “mean of the sample means” is therefore usually called x. For the 24 samples in Table 17.6, 1 149.750 ⫹ 49.375 ⫹ p ⫹ 48.6252 24 1161.125 ⫽ ⫽ 48.380 24
x⫽
Estimating s It is almost never safe to use a “target value” for the process standard deviation s because it is almost never possible to directly adjust process variation. We must estimate s from past data. We want to combine the
17-28
CHAPTER 17
•
Statistics for Quality: Control and Capability sample standard deviations s from past samples rather than use the standard deviation of all the individual observations in those samples. That is, in Example 17.11, we want to combine the 24 sample standard deviations in Table 17.6 rather than calculate the standard deviation of the 96 observations in these samples. The reason is that it is the within-sample variation that is the benchmark against which we compare the longer-term process variation. Even if the process has been in control, we want only the variation over the short time period of a single sample to influence our value for s. There are several ways to estimate s from the sample standard deviations. Software may use a somewhat sophisticated method and then calculate the control limits for you. Here, we use a simple method that is traditional in quality control because it goes back to the era before software. If we are basing chart setup on k past samples, we have k sample standard deviations s1, s2, . . . , sk. Just average these to get 1 s ⫽ 1s1 ⫹ s2 ⫹ p ⫹ sk 2 k For the viscosity example, we average the s-values for the 24 samples in Table 17.6, 1 s⫽ 12.684 ⫹ 0.895 ⫹ p ⫹ 0.8952 24 24.156 ⫽ ⫽1.0065 24
LOOK BACK mean of s, p. 17-13
Combining the sample s-values to estimate s introduces a complication: the samples used in process control are often small (size n ⫽ 4 in the viscosity example), so s has some bias as an estimator of s. The estimator s inherits this bias. A proper estimate of s corrects this bias. Thus, our estimator is s sˆ ⫽ c4 We get control limits from past data by using the estimates x and sˆ in place of the m and s used in charts at the process-monitoring stage. Here are the results.10
– x AND s CONTROL CHARTS USING PAST DATA Take regular samples of size n from a process. Estimate the process mean m and the process standard deviation s from past samples by mˆ ⫽ x 1or use at target value2 sˆ ⫽
s c4
The center line and control limits for an x chart are UCL ⫽ mˆ ⫹ 3
sˆ 2n
CL ⫽ mˆ LCL ⫽ mˆ ⫺ 3
sˆ 2n
17.2 Using Control Charts
17-29
The center line and control limits for an s chart are UCL ⫽ B6sˆ CL ⫽ c4sˆ ⫽ s LCL ⫽ B5sˆ If the process was not in control when the samples were taken, these should be regarded as trial control limits.
Chart setup We are now ready to outline the chart setup procedure for the elastomer viscosity. Step 1. As usual, we look first at an s chart. For chart setup, control limits are based on the same past data that we will plot on the chart. Based on Table 17.6, s ⫽ 1.0065 s 1.0065 sˆ ⫽ ⫽ ⫽ 1.0925 c4 0.9213 So the center line and control limits for the s chart are UCL ⫽ B6sˆ ⫽ 12.0882 11.09252 ⫽ 2.281 CL ⫽ s ⫽ 1.0065 LCL ⫽ B5sˆ ⫽ 102 11.09252 ⫽ 0 Figure 17.13 is the s chart. The points for Shifts 1 and 6 lie above the UCL. Both are near the beginning of production. Investigation finds that the reactor operator made an error on one lot in each of these samples. The error changed the viscosity of those two lots and increased s for each of the samples. The error will not be repeated now that the operators have gained experience. That is, this special cause has already been removed. 4.0 3.5 3.0 Standard deviation
FIGURE 17.13 The s chart based on past data for the viscosity data of Table 17.6. The control limits are based on the same s-values that are plotted on the chart. Points 1 and 6 are out of control.
2.5
x
x UCL
2.0 1.5 1.0 0.5 0.0 2
4
6
8
10 12 14 16 Sample number
18
20
22
24
17-30
CHAPTER 17
•
Statistics for Quality: Control and Capability Step 2. Remove the two values of s that were out of control. This is proper because the special cause responsible for these readings is no longer present. From the remaining 22 shifts s ⫽ 0.854 and sˆ ⫽
0.854 ⫽ 0.927 0.9213
The new s chart center line and control limits are UCL ⫽ B6sˆ ⫽ 12.0882 10.9272 ⫽ 1.936 CL ⫽ s ⫽ 0.854 LCL ⫽ B5sˆ ⫽ 102 10.9272 ⫽ 0 We don’t show this chart, but you can see from Table 17.6 and Figure 17.13 that none of the remaining s-values lies above the new, lower UCL; the largest remaining s is 1.566. If additional points were out of control, we would repeat the process of finding and eliminating s-type causes until the s chart for the remaining shifts is in control. In practice, this is often a challenging task. Step 3. Once s-type causes have been eliminated, make an x chart using only the samples that remain after dropping those that had out-of-control s-values. For the 22 remaining samples, we calculate x ⫽ 48.4716 and we know that sˆ ⫽ 0.927. The center line and control limits for the x chart are UCL ⫽ x ⫹ 3
sˆ 2n
⫽ 48.4716 ⫹ 3
0.927 24
⫽ 49.862
CL ⫽ x ⫽ 48.4716 LCL ⫽ x ⫺ 3
sˆ 2n
⫽ 48.4716 ⫺ 3
0.927 24
⫽ 47.081
Figure 17.14 is the x chart. Shifts 1 and 6 were already dropped. Seven of the remaining 22 points are beyond the 3s limits, four high and three low. FIGURE 17.14 The –x chart
54
52 Sample mean
based on past data for the viscosity data of Table 17.6. The samples for Shifts 1 and 6 have been removed because s-type special causes active in those samples are no longer active. The – x chart shows poor control.
50
UCL
x
x
x
x
48 LCL
x x x
46
44 2
4
6
8
10 12 14 16 Sample number
18
20
22
24
17.2 Using Control Charts
17-31
Although within-shift variation is now stable, there is excessive variation from shift to shift. To find the cause, we must understand the details of the process, but knowing that the special cause or causes operate between shifts is a big help. If the reactor is set up anew at the beginning of each shift, that’s one place to look more closely. Step 4. Once the x and s charts are both in control (looking backward), use the estimates mˆ and sˆ from the points in control to set tentative control limits to monitor the process going forward. If it remains in control, we can update the charts and move to the process-monitoring stage.
USE YOUR KNOWLEDGE 17.29 Updating control chart limits. Suppose that when the process improvement project of Example 17.11 (page 17-26) is complete, the points remaining after removing special causes have x ⫽ 47.2 and s ⫽ 1.03. What are the center line and control limits for the x and s charts you would use to monitor the process going forward? DATA MEATWGT
17.30 More on updating control chart limits. In Exercise 17.15, control limits for the weight of ground beef were obtained using historical results. Using Table 17.3 (page 17-19), estimate the process m and process s. Do either of these values suggest a change in the process center and spread?
Comments on statistical control CHALLENGE
Having seen how x and s (or x and R) charts work, we can turn to some important comments and cautions about statistical control in practice. Focus on the process rather than on the product This is perhaps the fundamental idea in statistical process control. We might attempt to attain high quality by careful inspection of the finished product and reviewing every outgoing invoice and expense account payment. Inspection of finished products can ensure good quality, but it is expensive. Perhaps more important, final inspection often comes too late: when something goes wrong early in a process, much bad product may be produced before final inspection discovers the problem. This adds to the expense, because the bad product must then be scrapped or reworked. The small samples that are the basis of control charts are intended to monitor the process at key points, not to ensure the quality of the particular items in the samples. If the process is kept in control, we know what to expect in the finished product. We want to do it right the first time, not inspect and fix finished product. Choosing the “key points” at which we will measure and monitor the process is important. The choice requires that you understand the process well enough to know where problems are likely to arise. Flowcharts and cause-andeffect diagrams can help. It should be clear that control charts that monitor only the final output are often not the best choice. Rational subgroups The interpretation of control charts depends on the distinction between x-type special causes and s-type special causes. This distinction in turn depends on how we choose the samples from which we calculate
17-32
CHAPTER 17
•
Statistics for Quality: Control and Capability
rational subgroup
s (or R). We want the variation within a sample to reflect only the item-to-item chance variation that (when in control) results from many small common causes. Walter Shewhart, the founder of statistical process control, used the term rational subgroup to emphasize that we should think about the process when deciding how to choose samples.
EXAMPLE 17.12 Selecting the sample. A pharmaceutical manufacturer forms tablets by compressing a granular material that contains the active ingredient and various fillers. To monitor the compression process, we will measure the hardness of a sample from each 10 minutes’ production of tablets. Should we choose a random sample of tablets from the several thousand produced in a 10-minute period? A random sample would contain tablets spread across the entire 10 minutes. It fairly represents the 10-minute period, but that isn’t what we want for process control. If the setting of the press drifts or a new lot of filler arrives during the 10 minutes, the spread of the sample will be increased. That is, a random sample contains both the short-term variation among tablets produced in quick succession and the longer-term variation among tablets produced minutes apart. We prefer to measure a rational subgroup of 5 consecutive tablets every 10 minutes. We expect the process to be stable during this very short time period, so that variation within the subgroups is a benchmark against which we can see special cause variation. Samples of consecutive items are rational subgroups when we are monitoring the output of a single activity that does the same thing over and over again. Several consecutive items is the most common type of sample for process control. When the stream of product contains output from several machines or several people, however, the choice of samples is more complicated. Do you want to include variation due to different machines or different people within your samples? If you decide that this variation is common cause variation, be sure that the sample items are spread across machines or people. If all the items in each sample have a common origin, s will be small and the control limits for the x chart will be narrow. Points on the x chart from samples representing different machines or different people will often be out of control, some high and some low. There is no formula for deciding how to form rational subgroups. You must think about causes of variation in your process and decide which you are willing to think of as common causes that you will not try to eliminate. Rational subgroups are samples chosen to express variation due to these causes and no others. Because the choice requires detailed process knowledge, we will usually accept samples of consecutive items as being rational subgroups. Just remember that real processes are messier than textbooks suggest. Why statistical control is desirable To repeat, if the process is kept in control, we know what to expect in the finished product. The process mean m and standard deviation s remain stable over time, so (assuming Normal variation) the 99.7 part of the 68–95–99.7 rule tells us that almost all measurements on
17.2 Using Control Charts
natural tolerances
17-33
individual products will lie in the range m ⫾ 3s. These are sometimes called the natural tolerances for the product. Be careful to distinguish m ⫾ 3s, the range we expect for individual measurements, from the x chart control limits m ⫾ 3s兾 1n, which mark off the expected range of sample means.
EXAMPLE 17.13 Estimating the tolerances for the water resistance study. The process of waterproofing the jackets has been operating in control. The x and s charts were based on m ⫽ 2750 mm and s ⫽ 430 mm. The s chart in Figure 17.7 and a calculation (see Exercise 17.35, page 17-37) suggest that the process s is now less than 430 mm. We may prefer to calculate the natural tolerances from the recent data on 20 samples (80 jackets) in Table 17.1. The estimate of the mean is x ⫽ 2750.7, very close to the target value. Now a subtle point arises. The estimate sˆ ⫽ s兾c4 used for past-data control charts is based entirely on variation within the samples. That’s what we want for control charts, because within-sample variation is likely to be “pure common cause” variation. Even when the process is in control, there is some additional variation from sample to sample, just by chance. So the variation in the process output will be greater than the variation within samples. To estimate the natural tolerances, we should estimate s from all 80 individual jackets rather than by averaging the 20 within-sample standard deviations. The standard deviation for all 80 jackets is s ⫽ 383.8 For a sample of size 80, c4 is very close to 1, so we can ignore it. We are therefore confident that almost all individual jackets will have a water resistance reading between x ⫾ 3s ⫽ 2750.7 ⫾ 132 1383.82 ⬟ 2750.7 ⫾ 1151.4 We expect water resistance measurements to vary between 1599 and 3902 mm. You see that the spread of individual measurements is wider than the spread of sample means used for the control limits of the x chart. The natural tolerances in Example 17.13 depend on the fact that the water resistance of individual jackets follows a Normal distribution. We know that the process was in control when the 80 measurements in Table 17.1 were made, so we can use them to assess Normality. Figure 17.15 is a Normal quantile plot of these measurements. There are no strong deviations from Normality. All 80 observations, including the one point that may appear suspiciously low in Figure 17.15, lie within the natural tolerances. Examining the data strengthens our confidence in the natural tolerances. Because we can predict the performance of the waterproofing process, we can tell the buyers of our jackets what to expect. What is more, if a process is in control, we can see the effect of any changes we make. A process operating out of control is erratic. We can’t do reliable statistical studies on such a process, and if we make a change in the process, we can’t clearly see the results of the change—they are hidden by erratic special cause variation. If we want to improve a process, we must first bring it into control so that we have a stable starting point from which to improve.
17-34
CHAPTER 17
•
Statistics for Quality: Control and Capability
Water resistance (mm)
3500
FIGURE 17.15 Normal quantile plot for the 80 water resistance measurements of Table 17.1. Calculations about individual measurements, such as natural tolerances, depend on approximate Normality.
3000
2500
2000
–3
–2
–1
0 1 Normal score
2
3
Don’t confuse control with capability! A process in control is stable over time and we know how much variation the finished product will show. Control charts are, so to speak, the voice of the process telling us what state it is in. There is no guarantee that a process in control produces products of satisfactory quality. “Satisfactory quality” is measured by comparing the product to some standard outside the process, set by technical specifications, customer expectations, or the goals of the organization. These external standards are unrelated to the internal state of the process, which is all that statistical control pays attention to.
CAPABILITY Capability refers to the ability of a process to meet or exceed the requirements placed on it.
Capability has nothing to do with control—except for the very important point that if a process is not in control, it is hard to tell if it is capable or not.
EXAMPLE 17.14 Assessing the capability of the waterproofing process. An outfitting company is a large buyer of this jacket. They informed us that they need water resistance levels between 1000 and 4000 mm. Although the waterproofing process is in control, we know (Example 17.13) that almost all jackets will have water resistance levels between 1599 and 3902 mm. The process is capable of meeting the customer’s requirement. Figure 17.16 compares the distribution of water resistance levels for individual jackets with the customer specifications. The distribution of water
17.2 Using Control Charts
17-35
New specifications Old specifications
FIGURE 17.16 Comparison of the distribution of water resistance (Normal curve) with original and tightened specifications, for Example 17.14. The process in its current state is not capable of meeting the new specifications.
1000
1500
2750 3500 Water resistance (mm)
4000
resistance is approximately Normal, and we estimate its mean to be very close to 2750 mm and the standard deviation to be about 384 mm. The distribution is safely within the specifications. Times change, however. The outfitting company demands more similarity in jackets and decides to require that the water resistance level lie between 1500 and 3500 mm. These new specification limits also appear in Figure 17.16. The process is not capable of meeting the new requirements. The process remains in control. The change in its capability is entirely due to a change in external requirements. Because the waterproofing process is in control, we know that it is not capable of meeting the new specifications. That’s an advantage of control, but the fact remains that control does not guarantee capability. We will discuss numerical measures of capability in Section 17.3. Managers must understand, that if a process that is in control does not have adequate capability, fundamental changes in the process are needed. The process is doing as well as it can and displays only the chance variation that is natural to its present state. Slogans to encourage the workers or disciplining the workers for poor performance will not change the state of the process. Better training for workers is a change in the process that may improve capability. New equipment or more uniform material may also help, depending on the findings of a careful investigation.
SECTION 17.2 Summary An R chart based on the range of observations in a sample is often used in place of an s chart. Interpret x and R charts exactly as you would interpret x and s charts. It is common to use out-of-control rules in addition to “one point outside the control limits.” In particular, a runs rule for the x chart allows the chart to respond more quickly to a gradual drift in the process center.
17-36
CHAPTER 17
•
Statistics for Quality: Control and Capability Control charts based on past data are used at the chart setup stage for a process that may not be in control. Start with control limits calculated from the same past data that you are plotting. Beginning with the s chart, narrow the limits as you find special causes, and remove the points influenced by these causes. When the remaining points are in control, use the resulting limits to monitor the process. Statistical process control maintains quality more economically than inspecting the final output of a process. Samples that are rational subgroups are important to effective control charts. A process in control is stable, so that we can predict its behavior. If individual measurements have a Normal distribution, we can give the natural tolerances. A process is capable if it can meet the requirements placed on it. Control (stability over time) does not in itself imply capability. Remember that control describes the internal state of the process, whereas capability relates the state of the process to external specifications.
SECTION 17.2 Exercises For Exercise 17.26, see page 17-24; for Exercises 17.27 and 17.28, see page 17-26; and for Exercises 17.29 and 17.30, see page 17-31. 17.31 Setting up a control chart. In Exercise 17.12 (page 17-18) the x and s control charts for the placement of the rum label were based on historical results. Suppose that a new labeling machine has been purchased and new control limits need to be determined. Table 17.7 contains the means and standard deviations of the first 24 batch samples. We will use these to determine tentative control limits. LABEL (a) Estimate the center line and control limits for the s chart using all 24 samples.
(b) Does the variation within samples appear to be in control? If not, remove any out-of-control samples and recalculate the limits. We’ll assume that any out-of-control samples are due to the operators adjusting to the new machine. (c) Using the remaining samples, estimate the center line and control limits for the x chart. Again remove any outof-control samples and recalculate. (d) How do these control limits compare with the ones obtained in Exercise 17.12? 17.32 Setting up another control chart. Refer to the previous exercise. Table 17.8 contains another set of 24 samples. Repeat parts (a) to (c) using this data set. LABEL1
TABLE 17.7 x and s for 24 Samples of Label Placement (in inches) Sample
x
s
Sample
x
s
1
1.9824
0.0472
13
1.9949
0.0964
2
2.0721
0.0479
14
2.0287
0.0607
3
2.0031
0.0628
15
1.9391
0.0481
4
2.0088
0.1460
16
1.9801
0.1133
5
2.0445
0.0850
17
1.9991
0.0482
6
2.0322
0.0676
18
1.9834
0.0572
7
2.0209
0.0651
19
2.0348
0.0734
8
1.9927
0.1291
20
1.9935
0.0584
9
2.0164
0.0889
21
1.9866
0.0628
10
2.0462
0.0662
22
1.9599
0.0829
11
2.0438
0.0554
23
2.0018
0.0541
12
2.0269
0.0493
24
1.9954
0.0566
17.2 Using Control Charts
17-37
TABLE 17.8 x and s for 24 Samples of Label Placement (in inches) Sample
x
s
Sample
x
s
1
2.0309
0.1661
13
1.9907
0.0620
2
2.0066
0.1366
14
1.9612
0.0748
3
2.0163
0.0369
15
2.0312
0.0421
4
2.0970
0.1088
16
2.0293
0.0932
5
1.9499
0.0905
17
1.9758
0.0252
6
1.9859
0.1683
18
2.0255
0.0728
7
1.9456
0.0920
19
1.9574
0.0186
8
2.0213
0.0478
20
2.0320
0.0151
9
1.9621
0.0489
21
1.9775
0.0294
10
1.9529
0.0456
22
1.9612
0.0911
11
1.9995
0.0519
23
2.0042
0.0365
12
1.9927
0.0762
24
1.9933
0.0293
17.33 Control chart for an unusual sampling situation. Invoices are processed and paid by two clerks, one very experienced and the other newly hired. The experienced clerk processes invoices quickly. The new hire often refers to the procedures handbook and is much slower. Both are quite consistent, so that their times vary little from invoice to invoice. Suppose that each daily sample of four invoice-processing times comes from only one of the clerks. Thus, some samples are from one and some from the other clerk. Sketch the x chart pattern that will result. 17.34 Altering the sampling plan. Refer to Exercise 17.33. Suppose instead that each sample contains an equal number of invoices from each clerk. (a) Sketch the x and s chart patterns that will result. (b) The process in this case will appear in control. When might this be an acceptable conclusion? 17.35 Reevaluating the process parameters. The x and s control charts for the waterproofing example were based on m ⫽ 2750 mm and s ⫽ 430 mm. Table 17.1 (page 17-10) gives the 20 most recent samples from this process. H2ORES (a) Estimate the process m and s based on these 20 samples. (b) Your calculations suggest that the process s may now be less than 430 mm. Explain why the s chart in Figure 17.7 (page 17-15) suggests the same conclusion. (If this pattern continues, we would eventually update the value of s used for control limits.) 17.36 Estimating the control chart limits from past data. Table 17.9 gives data on the losses (in dollars) incurred by a hospital in treating
DRG 209 (major joint replacement) patients.11 The hospital has taken from its records a random sample of 8 such patients each month for 15 months. DRG (a) Make an s control chart using center lines and limits calculated from these past data. There are no points out of control. (b) Because the s chart is in control, base the x chart on all 15 samples. Make this chart. Is it also in control? 17.37 Efficient process control. A company that makes cellular phones requires that their microchip supplier practice statistical process control and submit control charts for verification. This allows the company to eliminate inspection of the microchips as they arrive, a considerable cost savings. Explain carefully why incoming inspection can safely be eliminated. 17.38 Determining the tolerances for losses from DRG 209 patients. Table 17.9 gives data on hospital losses for samples of DRG 209 patients. The distribution of losses has been stable over time. What are the natural tolerances within which you expect losses on nearly all such patients to fall? DRG 17.39 Checking the Normality of losses. Do the losses on the 120 individual patients in Table 17.9 appear to come from a single Normal distribution? Make a Normal quantile plot and discuss what it shows. Are the natural tolerances you found in the previous exercise trustworthy? Explain your answer. DRG 17.40 The percent of products that meet specifications. If the water resistance readings of individual jackets follow a Normal distribution, we can describe capability by giving the percent of jackets that meet specifications. The old specifications for water resistance are 1000 to 4000 mm. The new specifications are 1500 to 3500 mm.
17-38
CHAPTER 17
•
Statistics for Quality: Control and Capability
TABLE 17.9 Hospital Losses for 15 Samples of DRG 209 Patients Sample
Loss (dollars)
Sample mean
Standard deviation
1
6835
5843
6019
6731
6362
5696
7193
6206
6360.6
521.7
2
6452
6764
7083
7352
5239
6911
7479
5549
6603.6
817.1
3
7205
6374
6198
6170
6482
4763
7125
6241
6319.8
749.1
4
6021
6347
7210
6384
6807
5711
7952
6023
6556.9
736.5
5
7000
6495
6893
6127
7417
7044
6159
6091
6653.2
503.7
6
7783
6224
5051
7288
6584
7521
6146
5129
6465.8
1034.3
7
8794
6279
6877
5807
6076
6392
7429
5220
6609.2
1104.0
8
4727
8117
6586
6225
6150
7386
5674
6740
6450.6
1033.0
9
5408
7452
6686
6428
6425
7380
5789
6264
6479.0
704.7
10
5598
7489
6186
5837
6769
5471
5658
6393
6175.1
690.5
11
6559
5855
4928
5897
7532
5663
4746
7879
6132.4
1128.6
12
6824
7320
5331
6204
6027
5987
6033
6177
6237.9
596.6
13
6503
8213
5417
6360
6711
6907
6625
7888
6828.0
879.8
14
5622
6321
6325
6634
5075
6209
4832
6386
5925.5
667.8
15
6269
6756
7653
6065
5835
7337
6615
8181
6838.9
819.5
Because the process is in control, we can estimate (Example 17.13) that water resistance has mean 2750 mm and standard deviation 384 mm. H2ORES (a) What percent of jackets meet the old specifications? (b) What percent meet the new specifications? 17.41 Improving the capability of the process. Refer to the previous exercise. The center of the specifications for waterproofing is 2500 mm, but the center of our process is 2750 mm. We can improve capability by adjusting the process to have center 2500 mm. This is an easy adjustment that does not change the process variation. What percent of jackets now meet the new specifications? 17.42 Monitoring the calibration of a densitometer. Loss of bone density is a serious health problem for many people, especially older women. Conventional X-rays often fail to detect loss of bone density until the loss reaches 25% or more. New equipment such as the Lunar bone densitometer is much more sensitive. A health clinic installs one of these machines. The manufacturer supplies a “phantom,” an aluminum piece of known density that can be used to keep the machine calibrated. Each morning, the clinic makes two measurements on the phantom before measuring the first patient. Control charts based on these measurements alert the operators if the machine has lost calibration. Table 17.10 contains data for the first 30 days of operation.12 The units are grams per square centimeter (for technical reasons, area rather than volume is measured). DENSITY (a) Calculate x and s for the first 2 days to verify the table entries for those quantities.
(b) What kind of variation does the s chart monitor in this setting? Make an s chart and comment on control. If any points are out of control, remove them and recompute the chart limits until all remaining points are in control. (That is, assume that special causes are found and removed.) (c) Make an x chart using the samples that remain after you have completed part (b). What kind of variation will be visible on this chart? Comment on the stability of the machine over these 30 days based on both charts. 17.43 Determining the natural tolerances for the distance between holes. Figure 17.10 (page 17-22) displays a record sheet for 18 samples of distances between mounting holes in an electrical meter. In Exercise 17.21 (page 17-21), you found that Sample 5 was out of control on the process-monitoring s chart. The special cause responsible was found and removed. Based on the 17 samples that were in control, what are the natural tolerances for the distance between the holes? MOUNT 17.44 Determining the natural tolerances for the densitometer. Remove any samples in Table 17.10 that your work in Exercise 17.42 showed to be out of control on either chart. Estimate the mean and standard deviation of individual measurements on the phantom. What are the natural tolerances for these measurements? DENSITY 17.45 Determining the percent of meters that meet specifications. The record sheet in Figure 17.10 gives the specifications as 0.6054 ⫾ 0.0010 inch. That’s 54 ⫾ 10 as the data are coded on the record. Assuming that the distance varies Normally from meter to meter, about what DENSITY percent of meters meet the specifications?
17.2 Using Control Charts
17-39
TABLE 17.10 Daily Calibration Samples for a Lunar Bone Densitometer Day
Measurements (g/cm2)
x
s
1
1.261
1.260
1.2605
0.000707
2
1.261
1.268
1.2645
0.004950
3
1.258
1.261
1.2595
0.002121
4
1.261
1.262
1.2615
0.000707
5
1.259
1.262
1.2605
0.002121
6
1.269
1.260
1.2645
0.006364
7
1.262
1.263
1.2625
0.000707
8
1.264
1.268
1.2660
0.002828
9
1.258
1.260
1.2590
0.001414
10
1.264
1.265
1.2645
0.000707
11
1.264
1.259
1.2615
0.003536
12
1.260
1.266
1.2630
0.004243
13
1.267
1.266
1.2665
0.000707
14
1.264
1.260
1.2620
0.002828
15
1.266
1.259
1.2625
0.004950
16
1.257
1.266
1.2615
0.006364
17
1.257
1.266
1.2615
0.006364
18
1.260
1.265
1.2625
0.003536
19
1.262
1.266
1.2640
0.002828
20
1.265
1.266
1.2655
0.000707
21
1.264
1.257
1.2605
0.004950
22
1.260
1.257
1.2585
0.002121
23
1.255
1.260
1.2575
0.003536
24
1.257
1.259
1.2580
0.001414
25
1.265
1.260
1.2625
0.003536
26
1.261
1.264
1.2625
0.002121
27
1.261
1.264
1.2625
0.002121
28
1.260
1.262
1.2610
0.001414
29
1.260
1.256
1.2580
0.002828
30
1.260
1.262
1.2610
0.001414
17.46 Assessing the Normality of the densitometer measurements. Are the 60 individual measurements in Table 17.10 at least approximately Normal, so that the natural tolerances you calculated in Exercise 17.44 can be trusted? Make a Normal quantile plot (or another graph if your software is limited) and discuss what you see. DENSITY 17.47 Assessing the Normality of the distance between holes. Make a Normal quantile plot of the 85 distances in the data file MOUNT that remain after removing Sample 5. How does the plot reflect the limited
precision of the measurements (all of which end in 4)? Is there any departure from Normality that would lead you to discard your conclusion from Exercise 17.43? (If your software will not make Normal quantile plots, use a hisMOUNT togram to assess Normality.) 17.48 Determining the natural tolerances for the weight of ground beef. Table 17.3 (page 17-19) gives data on the weight of ground beef sections. Since the distribution of weights has been stable, use the data in Table 17.3 to construct the natural tolerances within which you expect almost all the weights to fall. MEATWGT
17-40
CHAPTER 17
•
Statistics for Quality: Control and Capability
17.49 Assessing the Normality of the weight measurements. Refer to the previous exercise. Do the weights of the 60 individual sections in Table 17.3 appear to come from a single Normal distribution? Make a Normal quantile plot and discuss whether the natural tolerances you MEATWGT found in the previous exercise are trustworthy. 17.50 Control charts for the bore diameter of a bearing. A sample of 5 skateboard bearings is taken near the end of each hour of production. Table 17.11 gives x and s for the first 21 samples, coded in units of 0.001 mm from the target value. The specifications allow a range of ⫾0.004 mm about the target (a range of ⫺4 to ⫹4 as BEARINGS coded). (a) Make an s chart based on past data and comment on control of short-term process variation.
(c) Inadequate air conditioning on a hot day allows the temperature to rise during the afternoon in an office that prepares a company’s invoices. 17.52 Deming speaks. The following comments were made by the quality guru W. Edwards Deming (1900–1993).13 Choose one of these sayings. Explain carefully what facts about improving quality the saying attempts to summarize. (a) “People work in the system. Management creates the system.” (b) “Putting out fires is not improvement. Finding a point out of control, finding the special cause and removing it, is only putting the process back to where it was in the first place. It is not improvement of the process.” (c) “Eliminate slogans, exhortations and targets for the workforce asking for zero defects and new levels of productivity.”
(b) Because the data are coded about the target, the process mean for the data provided is m ⫽ 0. Make an x chart and comment on control of long-term process variation. What special x-type cause probably explains the lack of control of x? 17.51 Detecting special cause variation. Is each of the following examples of a special cause most likely to first result in (i) a sudden change in level on the s or R chart, (ii) a sudden change in level on the x chart, or (iii) a gradual drift up or down on the x chart? In each case, briefly explain your reasoning. (a) An airline pilots’ union puts pressure on management during labor negotiations by asking its members to “work to rule” in doing the detailed checks required before a plane can leave the gate. (b) Measurements of part dimensions that were formerly made by hand are now made by a very accurate laser system. (The process producing the parts does not change— measurement methods can also affect control charts.)
17.53 Monitoring the winning times of the Boston Marathon. The Boston Marathon has been run each year since 1897. Winning times were highly variable in the early years, but control improved as the best runners became more professional. A clear downward trend continued until the 1980s. Sam plans to make a control chart for the winning times from 1980 to the present. Calculation from the winning times from 1980 to 2013 gives x ⫽ 129.52 minutes and s ⫽ 2.19 minutes Sam draws a center line at x and control limits at x ⫾ 3s for a plot of individual winning times. Explain carefully why these control limits are too wide to effectively signal unusually fast or slow times. 17.54 Monitoring weight. Joe has recorded his weight, measured at the gym after a workout, for several years. The mean is 181 pounds and the standard deviation is
x and s for Samples of Bore Diameter TABLE 17.11 – Sample
x
s
Sample
x
s
1
0.0
1.225
12
0.8
3.899
2
0.4
1.517
13
2.0
1.581
3
0.6
2.191
14
0.2
2.049
4
1.0
3.162
15
0.6
2.302
5
20.8
2.280
16
1.2
2.588
6
21.0
2.345
17
2.8
1.924
7
1.6
1.517
18
2.6
3.130
8
1.0
1.414
19
1.8
2.387
9
0.4
2.608
20
0.2
2.775
10
1.4
2.608
21
1.6
1.949
11
0.8
1.924
17.3 Process Capability Indexes 1.7 pounds, with no signs of lack of control. An injury keeps Joe away from the gym for several months. The data below give his weight, measured once each week for the first 16 weeks after he returns from the injury: Week
1
Weight 185.2 Week
9
Weight 181.1
2
3
4
5
6
7
8
185.5
186.3
184.3
183.1
180.8
183.8
182.1
10
11
12
13
14
15
16
180.1
178.7
181.2
183.1
180.2
180.8
182.2
17-41
Joe wants to plot these individual measurements on a control chart. When each “sample” is just one measurement, shortterm variation is estimated by advanced techniques.14 The short-term variation in Joe’s weight is estimated to be about s ⫽ 1.6 pounds. Joe has a target of m ⫽ 181 pounds for his weight. Make a control chart for his measurements, using control limits m ⫾ 2s. It is common to use these narrower limits on an “individuals chart.” Comment on individual points out of control and on runs. Is Joe’s weight stable or does it change systematically over this period? JOEWGT
17.3 Process Capability Indexes When you complete this section, you will be able to • Estimate the percent of product that meets specifications using the Normal distribution. • Explain why the percent of product meeting specifications is not a good measure of capability. • Compute and interpret the Cp and Cpk capability indexes. • Identify issues that affect the interpretation of capability indexes.
Capability describes the quality of the output of a process relative to the needs or requirements of the users of that output. To be more precise, capability relates the actual performance of a process in control, after special causes have been removed, to the desired performance. Suppose, to take a simple but common setting, that there are specifications set for some characteristic of the process output. The viscosity of the elastomer in Example 17.11 (page 17-26) is supposed to be 45 ⫾ 5 Mooneys. The speed with which calls are answered at a corporate customer service call center is supposed to be no more than 30 seconds. In this setting, we might measure capability by the percent of output that meets the specifications. When the variable we measure has a Normal distribution, we can estimate this percent using the mean and standard deviation estimated from past control chart samples. When the variable is not Normal, we can use the actual percent of the measurements in the samples that meet the specifications.
EXAMPLE 17.15 What is the probability of meeting specifications? (a) Before concluding the process improvement study begun in Example 17.11, we found and fixed special causes and eliminated from our data the samples on which those causes operated. The remaining viscosity measurements have x ⫽ 48.7 and s ⫽ 0.85. Note once again that to draw conclusions about viscosity for individual lots we estimate the standard deviation s from all individual lots, not from the average s of sample standard deviations. The specifications call for the viscosity of the elastomer to lie in the range 45 ⫾ 5. A Normal quantile plot shows the viscosities to be quite Normal.
17-42
CHAPTER 17
•
Statistics for Quality: Control and Capability
LSL USL
Figure 17.17(a) shows the Normal distribution of lot viscosities with the specification limits 45 ⫾ 5. These are marked LSL for lower specification limit and USL for upper specification limit. The percent of lots that meet the specifications is about 40 ⫺ 48.7 50 ⫺ 48.7 ⱕZⱕ b 0.85 0.85 ⫽ P 1⫺10.2 ⱕ Z ⱕ 1.532 ⫽ 0.937
P140 ⱕ viscosity ⱕ 502 ⫽ P a
Roughly 94% of the lots meet the specifications. If we can adjust the process center to the center of the specifications, m ⫽ 45, it is clear from Figure 17.17(a) that essentially 100% of lots will meet the specifications. (b) Times to answer calls to a corporate customer service center are usually right-skewed. Figure 17.17(b) is a histogram of the times for 300 calls to the call center of a large bank.15 The specification limit of 30 seconds is marked USL. The median is 20 seconds, but the mean is 32 seconds. Of the 300 calls, 203 were answered in no more than 30 seconds. That is, 203/300 5 68% of the times meet the specifications. FIGURE 17.17 Comparing dis-
LSL
40
USL
41
42
43
44 45 46 47 48 Viscosity (Mooneys)
49
50
51
52
(a) 200
150
Calls
tributions of individual measurements with specifications for, Example 17.15. (a) Viscosity has a Normal distribution. The capability is poor but will be good if we can properly center the process. (b) Response times to customer calls have a right-skewed distribution and only an upper specification limit. Capability is again poor.
USL
100
50
0 0
25 50 75 100 125 150 175 200 225 250 275 300 325 350
Call pickup time (seconds) (b)
17.3 Process Capability Indexes
17-43
Process B
LSL
USL
Process A
FIGURE 17.18 Two distributions for part diameters. All the parts from Process A meet the specifications, but a higher proportion of parts from Process B have diameters close to the target.
Target
Turns out, however, that the percent meeting specifications is a poor measure of capability. Figure 17.18 shows why. This figure compares the distributions of the diameter of the same part manufactured by two processes. The target diameter and the specification limits are marked. All the parts produced by Process A meet the specifications, but about 1.5% of those from Process B fail to do so. Nonetheless, Process B appears superior to Process A because it is less variable: much more of Process B’s output is close to the target. Process A produces many parts close to LSL and USL. These parts meet the specifications, but they will likely fit and perform more poorly than parts with diameters close to the center of the specifications. A distribution like that for Process A might result from inspecting all the parts and discarding those whose diameters fall outside the specifications. That’s not an efficient way to achieve quality. We need a way to measure process capability that pays attention to the variability of the process (smaller is better). The standard deviation does that, but it doesn’t measure capability because it takes no account of the specifications that the output must meet. Capability indexes start with the idea of comparing process variation with the specifications. Process B will beat Process A by such a measure. Capability indexes also allow us to measure process improvement—we can continue to drive down variation, and so improve the process, long after 100% of the output meets specifications. Continual improvement of processes is our goal, not just reaching “satisfactory” performance. The real importance of capability indexes is that they give us numerical measures to describe ever-better process quality.
The capability indexes Cp and Cpk Capability indexes are numerical measures of process capability that, unlike percent meeting specifications, have no upper limit such as 100%. We can use capability indexes to measure continuing improvement of a process. Of course, reporting just one number has limitations. What is more, the usual indexes are based on thinking about Normal distributions. They are not meaningful for distinctly non-Normal output distributions like the call center response times in Figure 17.17(b).
17-44
CHAPTER 17
•
Statistics for Quality: Control and Capability
CAPABILITY INDEXES Consider a process with specification limits LSL and USL for some measured characteristic of its output. The process mean for this characteristic is m and the standard deviation is s. The capability index Cp is Cp ⫽
USL ⫺ LSL 6s
The capability index Cpk is 0 m ⫺ nearer spec limit 0 3s Set Cpk ⫽ 0 if the process mean m lies outside the specification limits. Large values of Cp or Cpk indicate more capable processes. Cpk ⫽
Capability indexes start from the fact that Normal distributions are in practice about 6 standard deviations wide. That’s the 99.7 part of the 68–95–99.7 rule. Conceptually, Cp is the specification width as a multiple of the process width 6s. When Cp ⫽ 1, the process output will just fit within the specifications if the center is midway between LSL and USL. Larger values of Cp are better—the process output can fit within the specs with room to spare. But a process with high Cp can produce poor-quality product if it is not correctly centered. Cpk remedies this deficiency by considering both the center m and the variability s of the measurements. The denominator 3s in Cpk is half the process width. It is the space needed on either side of the mean if essentially all the output is to lie between LSL and USL. When Cpk ⫽ 1, the process has just this much space between the mean and the nearer of LSL and USL. Again, higher values are better. Cpk is the most common capability index, but starting with Cp helps us see how the indexes work.
EXAMPLE 17.16 A comparison of the Cp and Cpk indexes. Consider the series of pictures in Figure 17.19. We might think of a process that machines a metal part. Measure a dimension of the part that has LSL and USL as its specification limits. As usual, there is variation from part to part. The dimensions vary Normally with mean m and standard deviation s. Figure 17.19(a) shows process width equal to the specification width. That is, Cp ⫽ 1. Almost all the parts will meet specifications if, as in this figure, the process mean m is at the center of the specs. Because the mean is centered, it is 3s from both LSL and USL, so Cpk ⫽ 1 also. In Figure 17.19(b), the mean has moved down to LSL. Only half the parts will meet the specifications. Cp is unchanged because the process width has not changed. But Cpk sees that the center m is right on the edge of the specifications, Cpk ⫽ 0. The value remains 0 if m moves outside the specifications. In Figures 17.19(c) and (d), the process s has been reduced to half the value it had in (a) and (b). The process width 6s is now half the specification
17.3 Process Capability Indexes
17-45
width, so Cp ⫽ 2. In Figure 17.19(c) the center is just 3 of the new s’s above LSL, so that Cpk ⫽ 1. Figure 17.19(d) shows the same smaller s accompanied by mean m correctly centered between LSL and USL. Cpk rewards the process for moving the center from 3s to 6s away from the nearer limit by increasing from 1 to 2. You see that Cp and Cpk are equal if the process is properly centered. If not, Cpk is smaller than Cp.
(a) Cp = 1 Cpk = 1
(c) Cp = 2 Cpk = 1
3σ
μ
LSL
3σ
USL
LSL
(b) Cp = 1 Cpk = 0
μ
USL
(d) Cp = 2 Cpk = 2 6σ
LSL
USL
LSL
μ
USL
FIGURE 17.19 How capability indexes work. (a) Process centered, process width equal to specification width. (b) Process off-center, process width equal to specification width. (c) Process off-center, process width equal to half the specification width. (d) Process centered, process width equal to half the specification width.
EXAMPLE 17.17 Computing Cp and Cpk for the viscosity process. Figure 17.17(a) compares the distribution of the viscosities of lots of elastomers with the specifications LSL 5 40 and USL 5 50. The distribution here, as is always true in practice, is estimated from past observations on the process. The estimates are mˆ ⫽ x ⫽ 48.7 sˆ ⫽ s ⫽ 0.85 Because capability describes the distribution of individual measurements, we once more estimate s from individual measurements rather than using the estimate s兾c4 that we employ for control charts.
17-46
CHAPTER 17
•
Statistics for Quality: Control and Capability These estimates may be quite accurate if we have data on many past lots. Estimates based on only a few observations may, however, be inaccurate because statistics from small samples can have large sampling variability. This important point is often not appreciated when capability indexes are used in practice. To emphasize that we can only estimate the indexes, we ˆ p and C ˆ pk for values calculated from sample data. They are write C USL ⫺ LSL 6sˆ 50 ⫺ 40 10 ⫽ ⫽ ⫽ 1.96 162 10.852 5.1
ˆp ⫽ C
ˆ pk ⫽ C ⫽
0 mˆ ⫺ nearer limit 0 3sˆ 50 ⫺ 48.7 1.3 ⫽ ⫽ 0.51 132 10.852 2.55
ˆ p ⫽ 1.96 is quite satisfactory because it indicates that the process width is C ˆ pk reflects the only about half the specification width. The small value of C fact that the process center is not close to the center of the specs. If we can move the center m to 45, then Cˆ pk will also be 1.96. USE YOUR KNOWLEDGE 17.55 Specification limits versus control limits. The manager you report to is confused by LSL and USL versus LCL and UCL. The notations look similar. Carefully explain the conceptual difference between specification limits for individual measurements and control limits for x. 17.56 Interpreting the capability indexes. Sketch Normal curves that represent measurements on products from a process with (a) Cp ⫽ 1.0 and Cpk ⫽ 0.5. (b) Cp ⫽ 1.0 and Cpk ⫽ 1.0. (c) Cp ⫽ 2.0 and Cpk ⫽ 1.0.
Cautions about capability indexes Capability indexes are widely used, especially in manufacturing. Some large manufacturers even set standards, such as Cpk ⱖ 1.33, that their suppliers must meet. That is, suppliers must show that their processes are in control (through control charts) and also that they are capable of high quality (as measured by Cpk). There are good reasons for requiring Cpk: it is a better description of process quality than “100% of output meets specs,” and it can document continual improvement. Nonetheless, it is easy to trust Cpk too much. We will point to three possible pitfalls. How to cheat on Cpk Estimating Cpk requires estimates of the process mean m and standard deviation s. The estimates are usually based on samples measured in order to keep control charts. There is only one reasonable estimate of m. This is the mean x of all measurements in recent samples, which is the same as the mean x of the sample means.
17.3 Process Capability Indexes
17-47
There are two different ways of estimating s, however. The standard deviation s of all measurements in recent samples will usually be larger than the control chart estimate s兾c4 based on averaging the sample standard deviations. For Cpk, the proper estimate is s because we want to describe all the variation in the process output. Larger Cpk’s are better, and a supplier wanting to satisfy a customer can make Cpk a bit larger simply by using the smaller estimate s兾c4 for s. That’s cheating. Non-Normal distributions Many business processes, and some manufacturing processes as well, give measurements that are clearly right-skewed rather than approximately Normal. Measuring the times required to deal with customer calls or prepare invoices typically gives a right-skewed distribution— there are many routine cases and a few unusual or difficult situations that take much more time. Other processes have “heavy tails,” with more measurements far from the mean than in a Normal distribution. Process capability concerns the behavior of individual outputs, so the central limit theorem effect that improves the Normality of x does not help us. Capability indexes are therefore more strongly affected by non-Normality than are control charts. It is hard to interpret Cpk when the measurements are strongly non-Normal. Until you gain experience, it is best to apply capability indexes only when Normal quantile plots show that the distribution is at least roughly Normal. Sampling variation We know that all statistics are subject to sampling variation. If we draw another sample from the same process at the same time, we get slightly different x and s due to the luck of the draw in choosing samples. In process control language, the samples differ due to the common cause variation that is always present. Cp and Cpk are in practice calculated from process data because we don’t know the true process mean and standard deviation. That is, these capability indexes are statistics subject to sampling variation. A supplier under pressure from a large customer to measure Cpk often may base calculations on small ˆ pk can differ from the true samples from the process. The resulting estimate C process Cpk in either direction.
EXAMPLE 17.18 Can we adequately measure Cpk? Suppose that the process of waterproofing is in control at its original level. Water resistance measurements are Normally distributed with mean m ⫽ 2750 mm and standard deviation s ⫽ 430 mm. The tightened specification limits are LSL 5 1500 and USL 5 3500, so the true capability is Cpk ⫽
3500 ⫺ 2750 ⫽ 0.58 132 14302
Suppose also that the manufacturer measures 4 jackets each four-hour shift ˆ pk at the end of 8 shifts. That is, C ˆ pk uses measurements and then calculates C from 32 jackets. ˆ pk’s from this setFigure 17.20 is a histogram of 24 computer-simulated C ting. They vary from 0.44 to 0.84, almost a two-to-one spread. It is clear that 32 measurements are not enough to reliably estimate Cpk. ˆ pk unless it is based on at least As a very rough rule of thumb, don’t trust C 100 measurements.
17-48
CHAPTER 17
•
Statistics for Quality: Control and Capability 12
10
8
6
4
FIGURE 17.20 Capability indexes estimated from sample will vary from sample to sample The histo^ gram shows the variation in C pk in 24 samples, each of size 32, for Example 17.18. The process capability is in fact Cpk 5 0.58.
2
0 0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Estimated capability index
SECTION 17.3 Summary Capability indexes measure process variability (Cp) or process center and variability (Cpk) against the standard provided by external specifications for the output of the process. Larger values indicate higher capability. Interpretation of Cp and Cpk requires that measurements on the process output have a roughly Normal distribution. These indexes are not meaningful unless the process is in control so that its center and variability are stable. Estimates of Cp and Cpk can be quite inaccurate when based on small numbers of observations, due to sampling variability. You should mistrust estimates not based on at least 100 measurements.
SECTION 17.3 Exercises
(a) The original specifications for water resistance were LSL 5 1000 mm and USL 5 4000 mm. Estimate Cp and Cpk for this process.
17.58 Capability indexes for the waterproofing process, continued. We could improve the performance of the waterproofing process discussed in the previous exercise by making an adjustment that moves the center of the process to m ⫽ 2500 mm, the center of the specifications. We should do this even if the original specifications remain in force, because this will require less sealer and therefore cost less. Suppose that we succeed in moving m to 2500 with no change in the process variability s, estimated by s ⫽ 383.8.
(b) A major customer tightened the specifications to LSL 5 1500 mm and USL 5 3500 mm. Now what are ˆ p and C ˆ pk? C
ˆ p and C ˆ pk with the original specifications? (a) What are C Compare the values with those from part (a) of the previous exercise.
For Exercises 17.55 and 17.56, see page 17-46. 17.57 Capability indexes for the waterproofing process. Table 17.1 (page 17-10) gives 20 process control samples of the water resistance of a particular outdoor jacket. In Example 17.13, we estimated from these samples that mˆ ⫽ x ⫽ 2750.7 mm and sˆ ⫽ s ⫽ 383.8 mm.
17.3 Process Capability Indexes ˆ p and Cˆ pk with the tightened specifica(b) What are C tions? Again compare with the previous results. 17.59 Capability indexes for the meat-packaging process. Table 17.3 (page 17-19) gives 20 process control samples of the weight of ground beef sections. The lower and upper specifications for the 1-pound sections are 0.96 and 1.10. MEATWGT (a) Using these data, estimate Cp and Cpk for this process. (b) What may be a reason for the specifications being centered at a weight that is slightly greater than the desired 1 pound? 17.60 Can we improve the capability of the meatpackaging process? Refer to Exercise 17.59. The average weight of each section can be increased (or decreased) by increasing (or decreasing) the time between slices of the machine. Based on the results of the previous exercise, would a change in the slicing-time interval improve capability? If so, what value of the average weight should the ˆ p and C ˆ pk with this company seek to attain, and what are C new process mean? 17.61 Capability of a characteristic with a uniform distribution. Suppose that a quality characteristic has the uniform distribution on 0 to 1. Figure 17.21 shows the density curve. You can see that the process mean (the balance point of the density curve) is m ⫽ 1兾2. The standard deviation turns out to be s ⫽ 0.289. Suppose also that LSL 5 1兾4 and USL 5 3兾4. (a) Mark LSL and USL on a sketch of the density curve. What is Cpk? What percent of the output meets the specifications? (b) For comparison, consider a process with Normally distributed output having mean m ⫽ 1兾2 and standard deviation s ⫽ 0.289. This process has the same Cpk that you found in part (a). What percent of its output meets the specifications?
17-49
17.62 An alternative estimate for Cpk of the waterˆ pk proofing process. In Exercise 17.58(b) you found C for specifications LSL 5 1500 and USL 5 3500 using the standard deviation s ⫽ 383.8 for all 80 individual jackets in Table 17.1. Repeat the calculation using the control ˆ pk to be chart estimate sˆ ⫽ s兾c4. You should find this C slightly larger. 17.63 Estimating capability indexes for the distance between holes. Figure 17.10 (page 17-22) displays a record sheet on which operators have recorded 18 samples of measurements on the distance between two mounting holes on an electrical meter. Sample 5 was out of control on an s chart. We remove it from the data after the special cause has been fixed. In Exercise 17.47 (page 17-39), you saw that the measurements are reasonably Normal. MOUNT (a) Based on the remaining 17 samples, estimate the mean and standard deviation of the distance between holes for the population of all meters produced by this process. Make a sketch comparing the Normal distribution with this mean and standard deviation with the specification limits 54 ⫾ 10. ˆ pk based on the data? How would ˆ p and C (b) What are C you characterize the capability of the process? (Mention both center and variability.) 17.64 Calculating capability indexes for the DRG 209 hospital losses. Table 17.9 (page 17-38) gives data on a hospital’s losses for 120 DRG 209 patients, collected as 15 monthly samples of 8 patients each. The process has been in control and losses have a roughly Normal distribution. The hospital decides that suitable specification limits for its loss in treating one such patient are LSL 5 $4500 and USL 5 $7500. DRG (a) Estimate the percent of losses that meet the specifications.
(c) What general fact do your calculations illustrate? (b) Estimate Cp. (c) Estimate Cpk.
height = 1
0
1
FIGURE 17.21 Density curve for the uniform distribution on 0 to 1, for Exercise 17.61.
17.65 Assessing the capability of the skateboard bearings process. Recall the skateboard bearings process described in Exercise 17.50 (page 17-40). The bore diameter has specifications 17.9920, 8.0002 mm. The process is monitored by x and s charts based on samples of 5 consecutive bearings each hour. Control has recently been excellent. The 200 individual measurements from the past week’s 40 samples have x ⫽ 7.996 mm s ⫽ 0.0023 mm
17-50
CHAPTER 17
•
Statistics for Quality: Control and Capability
A Normal quantile plot shows no important deviations from Normality. (a) What percent of bearings will meet specifications if the process remains in its current state? (b) Estimate the capability index Cpk. 17.66 Will these actions help the capability? Based on the results of the previous exercise, you conclude that the capability of the bearing-making process is inadequate. Here are some suggestions for improving the capability of this process. Comment on the usefulness of each action suggested. (a) Narrowing the control limits so that the process is adjusted more often. (b) Additional training of operators to ensure correct operating procedures. (c) A capital investment program to install new fabricating machinery. (d) An award program for operators who produce the fewest nonconforming bearings. (e) Purchasing more uniform (and more expensive) metal stock from which to form the bearings. 17.67 Cp and “six-sigma.” A process with Cp ⱖ 2 is sometimes said to have “six-sigma quality.” Sketch the specification limits and a Normal distribution of individual measurements for such a process when it is properly centered. Explain from your sketch why this is called sixsigma quality. 17.68 More on “six-sigma quality.” The originators of the “six-sigma quality” idea reasoned as follows. Shortterm process variation is described by s. In the long term, the process mean m will also vary. Studies show that in most manufacturing processes, ⫾1.5s is adequate to allow for changes in m. The six-sigma standard is intended to allow the mean m to be as much as 1.5s away from the center of the specifications and still meet high standards for percent of output lying outside the specifications. (a) Sketch the specification limits and a Normal distribution for process output when Cp ⫽ 2 and the mean is 1.5s away from the center of the specifications. (b What is Cpk in this case? Is six-sigma quality as strong a requirement as Cpk ⱖ 2? (c) Because most people don’t understand standard deviations, six-sigma quality is usually described as guaranteeing a certain level of parts per million of output that fails to meet specifications. Based on
your sketch in part (a), what is the probability of an outcome outside the specification limits when the mean is 1.5s away from the center? How many parts per million is this? (You will need software or a calculator for Normal probability calculations, because the value you want is beyond the limits of the standard Normal table.) Table 17.12 gives the process control samples that lie behind the histogram of call center response times in Figure 17.17(b) on page 17-42. A sample of 6 calls is recorded each shift for quality improvement purposes. The time from the first ring until a representative answers the call is recorded. Table 17.12 gives data for 50 shifts, 300 calls total. Exercises 17.69 to 17.71 make use of this setting. 17.69 Choosing the sample. The 6 calls each shift are chosen at random from all calls received during the shift. Discuss the reasons behind this choice and those behind a choice to time 6 consecutive calls. 17.70 Constructing and interpreting the s chart. Table 17.12 also gives x and s for each of the 50 samples. (a) Make an s chart and check for points out of control. (b) If the s-type cause responsible is found and removed, what would be the new control limits for the s chart? Verify that no points s are now out of control. (c) Use the remaining 46 samples to find the center line and control limits for an x chart. Comment on the control (or lack of control) of x. (Because the distribution of response times is strongly skewed, s is large and the control limits for x are wide. Control charts based on Normal distributions often work poorly when measurements are strongly skewed.) 17.71 More on interpreting the s chart. Each of the 4 out-of-control values of s in part (a) of the previous exercise is explained by a single outlier, a very long response time to one call in the sample. You can see these outliers in Figure 17.17(b). What are the values of these outliers, and what are the s-values for the 4 samples when the outliers are omitted? (The interpretation of the data is, unfortunately, now clear. Few customers will wait 5 minutes for a call to be answered, as the customer whose call took 333 seconds to answer did. We suspect that other customers hung up before their calls were answered. If so, response time data for the calls that were answered don’t adequately picture the quality of service. We should now look at data on calls lost before being answered to see a fuller picture.)
17.3 Process Capability Indexes
17-51
TABLE 17.12 Fifty Control Chart Samples of Call Center Response Times Sample 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Time (seconds) 59 38 46 25 6 17 9 8 12 42 63 12 43 9 21 24 27 7 15 16 7 32 31 4 28 111 83 276 4 23 14 22 30 101 13 20 21 8 15 20 16 18 43 67 118 71 12 4 18 4
13 12 44 7 9 17 9 10 82 19 5 4 37 26 14 11 10 28 14 65 44 52 8 46 6 6 27 14 29 22 111 11 7 55 11 83 5 70 7 4 47 22 20 20 18 85 11 63 55 3
2 46 4 10 122 9 10 41 97 14 21 111 27 5 19 10 32 22 34 6 14 75 36 23 46 3 2 30 21 19 20 53 10 18 22 25 14 56 9 16 97 244 77 4 1 24 13 14 13 17
24 17 74 46 8 15 32 13 33 21 11 37 65 10 44 22 96 17 5 5 16 11 25 58 4 83 56 8 23 66 7 20 11 20 15 10 22 8 144 20 27 19 22 28 35 333 19 22 11 11
11 77 41 78 16 24 9 17 76 12 47 12 32 30 49 43 11 9 38 58 4 11 14 5 28 27 26 7 4 51 7 14 9 77 2 34 10 26 11 124 61 10 7 5 78 50 16 43 6 6
18 12 22 14 15 70 68 50 56 44 8 24 3 27 10 70 29 24 29 17 46 17 85 54 11 6 21 12 14 60 87 41 9 14 14 23 68 7 109 16 35 6 33 7 35 11 91 25 13 17
Sample mean
Standard deviation
21.2 33.7 38.5 30.0 29.3 25.3 22.8 23.2 59.3 25.3 25.8 33.3 34.5 17.8 26.2 30.0 34.2 17.8 22.5 27.8 21.8 33.0 33.2 31.7 20.5 39.3 35.8 57.8 15.8 40.2 41.0 26.8 12.7 47.5 12.8 32.5 23.3 29.2 49.2 33.3 47.2 53.2 33.7 21.8 47.5 95.7 27.0 28.5 19.3 9.7
19.93 25.56 23.73 27.46 45.57 22.40 23.93 17.79 32.11 14.08 23.77 39.76 20.32 10.98 16.29 22.93 31.71 8.42 13.03 26.63 18.49 25.88 27.45 24.29 16.34 46.34 28.88 107.20 10.34 21.22 45.82 16.56 8.59 36.16 6.49 25.93 22.82 27.51 60.97 44.80 28.99 93.68 24.49 24.09 43.00 119.53 31.49 21.29 17.90 6.31
17-52
CHAPTER 17
•
Statistics for Quality: Control and Capability
17.4 Control Charts for Sample Proportions When you complete this section, you will be able to x chart. • Know when to use a p chart rather than an – • Compute the center line and control limits for a p chart and utilize the chart for process monitoring.
We have considered control charts for just one kind of data: measurements of a quantitative variable in some meaningful scale of units. We describe the distribution of measurements by its center and spread and use x and s or x and R charts for process control. There are control charts for other statistics that are appropriate for other kinds of data. The most common of these is the p chart for use when the data are proportions.
p CHART A p chart is a control chart based on plotting sample proportions pˆ from regular samples from a process against the order in which the samples were taken.
EXAMPLE 17.19 Examples of the p chart. Here are two examples of the usefulness of p charts: Manufacturing. Measure two dimensions of a part and also grade its surface finish by eye. The part conforms if both dimensions lie within their specifications and the finish is judged acceptable. Otherwise, it is nonconforming. Plot the proportion of nonconforming parts in samples of parts from each shift. School absenteeism. An urban school system records the percent of its eighth-grade students who are absent three or more days each month. Because students with high absenteeism in eighth grade often fail to complete high school, the school system has launched programs to reduce absenteeism. These programs include calls to parents of absent students, public-service messages to change community expectations, and measures to ensure that the schools are safe and attractive. A p chart will show if the programs are having an effect. The manufacturing example illustrates an advantage of p charts: they can combine several specifications in a single chart. Nonetheless, p charts have been rendered outdated in many manufacturing applications by improvements in typical levels of quality. When the proportion of nonconforming parts is very small, even large samples of parts will rarely contain any bad parts. The sample proportions will almost all be 0, so that plotting them is uninformative.
17.4 Control Charts for Sample Proportions
17-53
It is better to choose important measured characteristics—voltage at a critical circuit point, for example—and keep x and s charts. Even if the voltage is satisfactory, quality can be improved by moving it yet closer to the exact voltage specified in the design of the part. The school absenteeism example is a management application of p charts. More than 19% of all American eighth-graders miss three or more days of school per month, and this proportion is higher in large cities and for certain ethnic groups.16 A p chart will be useful. Proportions of “things going wrong” are often higher in business processes than in manufacturing, so that p charts are an important tool in business.
Control limits for p charts We studied the sampling distribution of a sample proportion pˆ in Chapter 5. The center line and control limits for a 3s control chart follow directly from the facts stated there, in the box on page 330. We ought to call such charts “pˆ charts” because they plot sample proportions. Unfortunately, they have always been called p charts in quality control circles. We will keep the traditional name but also keep our usual notation: p is a process proportion and pˆ is a sample proportion.
p CHART USING PAST DATA Take regular samples from a process that has been in control. The samples need not all have the same size. Estimate the process proportion p of “successes” by p⫽
total number of successes in past samples total number of opportunities in these samples
The center line and control limits for a p chart for future samples of size n are UCL ⫽ p ⫹ 3
p11 ⫺ p 2 n B
CL ⫽ p LCL ⫽ p ⫺ 3
p11 ⫺ p 2 n B
Common out-of-control signals are one sample proportion pˆ outside the control limits or a run of 9 sample proportions on the same side of the center line.
If we have k past samples of the same size n, then p is just the average of the k sample proportions. In some settings, you may meet samples of unequal size—differing numbers of students enrolled in a month or differing numbers of parts inspected in a shift. The average p estimates the process proportion p even when the sample sizes vary. Note that the control limits use the actual size n of a sample.
17-54
CHAPTER 17
•
Statistics for Quality: Control and Capability
EXAMPLE 17.20 Monitoring employees’ absences. Unscheduled absences by clerical and production workers are an important cost in many companies. Reducing the rate of absenteeism is therefore an important goal for a company’s human relations department. A rate of absenteeism above 5% is a serious concern. Many companies set 3% absent as a desirable target. You have been asked to improve absenteeism in a production facility where 12% of the workers are now absent on a typical day. You first do some background study—in greater depth than this very brief summary. Companies try to avoid hiring workers who are likely to miss work often, such as substance abusers. They may have policies that reward good attendance or penalize frequent absences by individual workers. Changing those policies in this facility will have to wait until the union contract is renegotiated. What might you do with the current workers under current policies? Studies of absenteeism by clerical and production workers who do repetitive, routine work under close supervision point to unpleasant work environment and harsh or unfair treatment by supervisors as factors that increase absenteeism. It’s now up to you to apply this general knowledge to your specific problem.
First, collect data. Daily absenteeism data are already available. You carry out a sample survey that asks workers about their absences and the reasons for them (responses are anonymous, of course). Workers who are more often absent complain about their supervisors and about the lighting at their workstations. Female workers complain that the rest rooms are dirty and unpleasant. You do more data analysis: • A Pareto chart of average absenteeism rate for the past month broken down by supervisor (Figure 17.22) shows important differences among
FIGURE 17.22 Pareto chart of
25
Average percent of workers absent
the average absenteeism rate for workers reporting to each of 12 supervisors.
20
15
10
5
0
I
D
A
L
F
G
K
Supervisor
C
J
B
E
H
17.4 Control Charts for Sample Proportions
17-55
supervisors. Only supervisors B, E, and H meet the level of 5% or less absenteeism. Workers supervised by I and D have particularly high rates. • Another Pareto chart (not shown) by type of workstation shows that a few types of workstation have high absenteeism rates. Now you take action. You retrain all the supervisors in human relations skills, using B, E, and H as discussion leaders. In addition, a trainer works individually with supervisors I and D. You ask supervisors to talk with any absent worker when he or she returns to work. Working with the engineering department, you study the workstations with high absenteeism rates and make changes such as better lighting. You refurbish the rest rooms (for both genders even though only women complained) and schedule more frequent cleaning.
EXAMPLE 17.21 Are your actions effective? You hope to see a reduction in absenteeism. To view progress (or lack of progress), you will keep a p chart of the proportion of absentees. The plant has 987 production workers. For simplicity, you just record the number who are absent from work each day. Only unscheduled absences count, not planned time off such as vacations. Each day you will plot pˆ ⫽
number of workers absent 987
You first look back at data for the past three months. There were 64 workdays in these months. The total workdays available for the workers was 1642 19872 ⫽ 63, 168 person-days ˇ
Absences among all workers totaled 7580 person-days. The average daily proportion absent was therefore p⫽
total days absent total days available for work
7580 ⫽ 0.120 63, 168 The daily rate has been in control at this level. These past data allow you to set up a p chart to monitor future proportions absent: ⫽
ˇ
UCL ⫽ p ⫹ 3
p11 ⫺ p 2 10.1202 10.8802 ⫽ 0.120 ⫹ 3 n B B 987
⫽ 0.120 ⫹ 0.031 ⫽ 0.151 CL ⫽ p ⫽ 0.120 LCL ⫽ p ⫺ 3
p11 ⫺ p 2 10.1202 10.8802 ⫽ 0.120 ⫺ 3 n B B 987
⫽ 0.120 ⫺ 0.031 ⫽ 0.089 Table 17.13 gives the data for the next four weeks. Figure 17.23 is the p chart.
17-56
CHAPTER 17
•
Statistics for Quality: Control and Capability
TABLE 17.13 Proportions of Workers Absent During Four Weeks Day
M
T
W
Th
F
M
T
W
Th
F
Workers absent
129
121
117
109
122
119
103
103
89
105
0.131
0.123
0.119
0.110
0.124
0.121
0.104
0.104
0.090
0.106
Day
M
T
W
Th
F
M
T
W
Th
F
Workers absent
99
92
83
92
92
115
101
106
83
98
0.100
0.093
0.084
0.093
0.093
0.117
0.102
0.107
0.084
0.099
Proportion pˆ
Proportion pˆ
0.18
Proportion absent
0.16
FIGURE 17.23 The p chart for daily proportion of workers absent over a four-week period, for Example 17.21. The lack of control shows an improvement (decrease) in absenteeism. Update the chart to continue monitoring the process.
UCL
0.14 0.12
x
0.10
x
LCL 0.08
x
x
x
x x
0.06 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Day
Figure 17.23 shows a clear downward trend in the daily proportion of workers who are absent. Days 13 and 19 lie below LCL, and a run of 9 days below the center line is achieved at Day 15 and continues. The points marked “x” are therefore all out of control. It appears that a special cause (the various actions you took) has reduced the absenteeism rate from around 12% to around 10%. The last two weeks’ data suggest that the rate has stabilized at this level. You will update the chart based on the new data. If the rate does not decline further (or even rises again as the effect of your actions wears off), you will consider further changes. Example 17.21 is a bit oversimplified. The number of workers available did not remain fixed at 987 each day. Hirings, resignations, and planned vacations change the number a bit from day to day. The control limits for a day’s pˆ depend on n, the number of workers that day. If n varies, the control limits will move in and out from day to day. Software will do the extra arithmetic needed for a different n each day, but as long as the count of workers remains close to 987, the greater detail will not change your conclusion. A single p chart for all workers is not the only, or even the best, choice in this setting. Because of the important role of supervisors in absenteeism, it
Chapter 17 Exercises
17-57
would be wise to also keep separate p charts for the workers under each supervisor. These charts may show that you must reassign some supervisors.
SECTION 17.4 Summary There are control charts for several different types of process measurements. One important type is the p chart for sample proportions pˆ . The interpretation of p charts is very similar to that of x charts. The out-ofcontrol rules used are also the same.
SECTION 17.4 Exercises 17.72 Constructing a p chart for absenteeism. After inspecting Figure 17.23, you decide to monitor the next four weeks’ absenteeism rates using a center line and control limits calculated from the second two weeks’ data recorded in Table 17.13. Find p for these 10 days and give the new values of CL, LCL, and UCL. (Until you have more data, these are trial control limits. As long as you are taking steps to improve absenteeism, you have not reached the process-monitoring stage.) 17.73 Constructing a p chart for unpaid invoices. The controller’s office of a corporation is concerned that invoices that remain unpaid after 30 days are damaging relations with vendors. To assess the magnitude of the problem, a manager searches payment records for invoices that arrived in the past 10 months. The average number of invoices is 2635 per month, with relatively little month-to-month variation. Of all these invoices, 957 remained unpaid after 30 days. (a) What is the total number of opportunities for unpaid invoices? What is p? (b) Give the center line and control limits for a p chart on which to plot the future monthly proportions of unpaid invoices. 17.74 Constructing a p chart for mishandled baggage. The Department of Transportation reports that 3.09 of every 1000 passengers on domestic flights of the 10 largest U.S. airlines file a report of mishandled baggage.17 Starting with this information, you plan to sample records for 2500 passengers per day at a large airport to monitor the effects of efforts to reduce mishandled baggage. What are the initial center line and control limits for a chart of the daily proportion of mishandled baggage reports? (You will find that LCL , 0. Because proportions pˆ are always 0 or positive, take LCL 5 0.) 17.75 Constructing a p chart for damaged eggs. An egg farm wants to monitor the effects of some new handling procedures on the percent of eggs arriving at the packaging center with cracked or broken shells. In the past, 2.31% of the eggs were damaged. A machine will
allow the farm to inspect 500 eggs per hour. What are the initial center line and control limits for a chart of the hourly percent of damaged eggs? 17.76 More on constructing a p chart for damaged eggs. Refer to Exercise 17.75. Suppose that there are two machine operators, each working four-hour shifts. The first operator is very skilled and can inspect 500 eggs per hour. The second operator is less experienced and can inspect only 400 eggs per hour. Construct a p chart for an eight-hour day showing the appropriate center line and control limits. 17.77 Constructing a p chart for missing or deformed rivets. After completion of an aircraft wing assembly, inspectors count the number of missing or deformed rivets. There are hundreds of rivets in each wing, but the total number varies depending on the aircraft type. Recent data for wings with a total of 38,370 rivets show 194 missing or deformed. The next wing contains 1520 rivets. What are the appropriate center line and control limits for plotting the pˆ from this wing on a p chart? 17.78 Constructing the p chart limits for incorrect or illegible prescriptions. A regional chain of retail pharmacies finds that about 1% of prescriptions it receives from doctors are incorrect or illegible. The chain puts in place a secure online system that doctors’ offices can use to enter prescriptions directly. It hopes that fewer prescriptions entered online will be incorrect or illegible. A p chart will monitor progress. Use information about past prescriptions to set initial center line and control limits for the proportion of incorrect or illegible prescriptions on a day when the chain fills 90,000 online prescriptions. What are the center line and control limits for a day when only 45,000 online prescriptions are filled? 17.79 Calculating the p chart limits for school absenteeism. Here are data from an urban school district on the number of eighth-grade students with three or more unexcused absences from school during each month of a school year. Because the total number of eighth-graders changes a bit from month to month, these totals are also given for each month.
17-58
Month
CHAPTER 17
•
Statistics for Quality: Control and Capability
Sept. Oct. Nov. Dec. Jan. Feb. Mar. Apr. May June
Students
911
947
939
942
918
920
931
925
902
883
Absent
291
349
364
335
301
322
344
324
303
344
(a) Find p. Because the number of students varies from month to month, also find n, the average per month. (b) Make a p chart using control limits based on n students each month. Comment on control. (c) The exact control limits are different each month because the number of students n is different each month. This situation is common in using p charts. What are the exact limits for October and June, the months with the largest and smallest n? Add these limits to your p chart, using short lines spanning a single month. Do exact limits affect your conclusions? 17.80 p chart for a high-quality process. A manufacturer of consumer electronic equipment makes full use not only of statistical process control but also of automated testing equipment that efficiently tests all completed products. Data from the testing equipment show that finished products have only 2.9 defects per million opportunities.
(a) What is p for the manufacturing process? If the process turns out 5000 pieces per day, how many defects do you expect to see per day? In a typical month of 24 working days, how many defects do you expect to see? (b) What are the center line and control limits for a p chart for plotting daily defect proportions? (c) Explain why a p chart is of no use at such high levels of quality. 17.81 More on monitoring a high-quality process. Because the manufacturing quality in the previous exercise is so high, the process of writing up orders is the major source of quality problems: the defect rate there is 8000 per million opportunities. The manufacturer processes about 500 orders per month. (a) What is p for the order-writing process? How many defective orders do you expect to see in a month? (b) What are the center line and control limits for a p chart for plotting monthly proportions of defective orders? What is the smallest number of bad orders in a month that will result in a point above the upper control limit?
CHAPTER 17 Exercises 17.82 Describing a process that is in control. A manager who knows no statistics asks you, “What does it mean to say that a process is in control? Is being in control a guarantee that the quality of the product is good?” Answer these questions in plain language that the manager can understand. 17.83 Constructing a Pareto chart. You manage the customer service operation for a maker of electronic equipment sold to business customers. Traditionally, the most common complaint is that equipment does not operate properly when installed, but attention to manufacturing and installation quality will reduce these complaints. You hire an outside firm to conduct a sample survey of your customers. Here are the percents of customers with each of several kinds of complaints: Category Accuracy of invoices Clarity of operating manual
Percent 25 8
Complete invoice
24
Complete shipment
16
Correct equipment shipped
15
Ease of obtaining invoice adjustments/credits
33
Equipment operates when installed Meeting promised delivery date Sales rep returns calls Technical competence of sales rep
6 11 4 12
(a) Why do the percents not add to 100%? (b) Make a Pareto chart. What area would you choose as a target for improvement? 17.84 Choice of control chart. What type of control chart or charts would you use as part of efforts to assess quality? Explain your choices. (a) Time to get security clearance (b) Percent of job offers accepted (c) Thickness of steel washers (d) Number of dropped calls per day 17.85 Interpreting signals. Explain the difference in the interpretation of a point falling beyond the upper control limit of the x chart versus a point falling beyond the upper control limit of an s chart. 17.86 Selecting the appropriate control chart and limits. At the present time, about 5 out of every 1000 lots of material arriving at a plant site from outside vendors are rejected because they do not meet specifications. The plant receives about 350 lots per week. As part of an effort to reduce errors in the system of placing and filling orders, you will monitor the proportion of rejected lots each week. What type of control chart will you use? What are the initial center line and control limits?
Chapter 17 Exercises You have just installed a new system that uses an interferometer to measure the thickness of polystyrene film. To control the thickness, you plan to measure 3 film specimens every 10 minutes and keep x and s charts. To establish control, you measure 22 samples of 3 films each at 10-minute intervals. Table 17.14 gives x and s for these samples. The units are millimeters 3 1024. Exercises 17.87 to 17.91 are based on this process improvement setting. 17.87 Constructing the s chart. Calculate control limits for s, make an s chart, and comment on control of shortterm process variation. THICK 17.88 Recalculating the x– and s charts. Interviews with the operators reveal that in Samples 1 and 10 mistakes in operating the interferometer resulted in one high-outlier thickness reading that was clearly incorrect. Recalculate x and s after removing Samples 1 and 10. Recalculate UCL for the s chart and add the new UCL to your s chart from the previous exercise. Control for the remaining samples is excellent. Now find the appropriate center line and control limits for an x chart, make the x chart, and comment on control. THICK 17.89 Capability of the film thickness process. The specifications call for film thickness 830 ⫾ 25 mm ⫻ 10 ⫺4. (a) What is the estimate sˆ of the process standard deviation based on the sample standard deviations (after removing Samples 1 and 10)? Estimate the capability ratio Cp and comment on what it says about this process. (b) Because the process mean can easily be adjusted, Cp is more informative than Cpk. Explain why this is true. (c) The estimate of Cp from part (a) is probably too optimistic as a description of the film produced. Explain why.
17.90 Calculating the percent that meet specifications. Examination of individual measurements shows that they are close to Normal. If the process mean is set to the target value, about what percent of films will meet the specifications? THICK 17.91 More on the film thickness process. Previously, control of the process was based on categorizing the thickness of each film inspected as satisfactory or not. Steady improvement in process quality has occurred, so that just 15 of the last 5000 films inspected were unsatisfactory. THICK (a) What type of control chart would be used in this setting, and what would be the control limits for a sample of 100 films? (b) The chart in part (a) is of little practical value at current quality levels. Explain why. 17.92 Probability of an out-of-control signal. There are other out-of-control rules that are sometimes used with x charts. One is “15 points in a row within the 1s level.” That is, 15 consecutive points fall between m ⫺ s兾 1n and m ⫹ s兾 1n. This signal suggests either that the value of s used for the chart is too large or that careless measurement is producing results that are suspiciously close to the target. Find the probability that the next 15 points will give this signal when the process remains in control with the given m and s. 17.93 Probability of another out-of-control signal. Another out-of-control signal is when four out of five successive points are on the same side of the center line and farther than s兾 1n from it. Find the probability of this event when the process is in control.
TABLE 17.14 – x and s for Samples of Film Thickness (mm 3 1024) Sample
17-59
x
s
Sample
x
s
1
848
20.1
12
823
12.6
2
832
1.1
13
835
4.4
3
826
11.0
14
843
3.6
4
833
7.5
15
841
5.9
5
837
12.5
16
840
3.6
6
834
1.8
17
833
4.9
7
834
1.3
18
840
8.0
8
838
7.4
19
826
6.1
9
835
2.1
20
839
10.2
10
852
18.9
21
836
14.8
11
836
3.8
22
829
6.7
17-60
CHAPTER 17
•
Statistics for Quality: Control and Capability
CHAPTER 17 Notes and Data Sources 1. Texts on quality management give more detail about these and other simple graphical methods for quality problems. The classic reference is Kaoru Ishikawa, Guide to Quality Control, Asian Productivity Organization, 1986.
of Irving W. Burr, Statistical Quality Control Methods, Marcel Dekker, 1976.
2. The flowchart and a more elaborate version of the cause-and-effect diagram for Example 17.3 were prepared by S. K. Bhat of the General Motors Technical Center as part of a course assignment at Purdue University.
10. The control limits for the s chart based on past data are commonly given as B4s and B3s. That is, B4 ⫽ B6兾c4 and B3 ⫽ B5兾c4. This is convenient for users, but we choose to minimize the number of control chart constants students must keep straight and to emphasize that process-monitoring and past-data charts are exactly the same except for the source of m and s.
3. Walter Shewhart’s classic book, Economic Control of Quality of Manufactured Product (Van Nostrand, 1931), organized the application of statistics to improving quality.
11. Simulated data based on information appearing in Arvind Salvekar, “Application of six sigma to DRG 209,” found at the Smarter Solutions website, www. smartersolutions.com.
4. We have adopted the terms “chart setup” and “process monitoring” from Andrew C. Palm’s discussion of William H. Woodall, “Controversies and contradictions in statistical process control,” Journal of Quality Technology, 32 (2000), pp. 341–350. Palm’s discussion appears in the same issue, pp. 356–360. We have combined Palm’s stages B (“process improvement”) and C (“process monitoring”) in writing for beginners because the distinction between them is one of degree.
12. Data provided by Linda McCabe, Purdue University.
5. It is common to call these “standards given” x and s charts. We avoid this term because it easily leads to the common and serious error of confusing control limits (based on the process itself) with standards or specifications imposed from outside. 6. Data provided by Charles Hicks, Purdue University. 7. See, for example, Chapter 3 of Stephen B. Vardeman and J. Marcus Jobe, Statistical Quality Assurance Methods for Engineers, Wiley, 1999. 8. The classic discussion of out-of-control signals and the types of special causes that may lie behind special control chart patterns is the AT&T Statistical Quality Control Handbook, Western Electric, 1956. 9. The data in Table 17.6 are adapted from data on viscosity of rubber samples appearing in Table P3.3
13. The first two Deming quotations are from Public Sector Quality Report, December 1993, p. 5. They were found online at deming.eng.clemson.edu/pub/den/files/ demqtes.txt. The third quotation is part of the 10th of Deming’s “14 points of quality management,” from his book Out of the Crisis, MIT Press, 1986. 14. Control charts for individual measurements cannot use within-sample standard deviations to estimate shortterm process variability. The spread between successive observations is the next best thing. Texts such as that cited in Note 7 give the details. 15. The data in Figure 17.17(b) are simulated from a probability model for call pickup times. That pickup times for large financial institutions have median 20 seconds and mean 32 seconds is reported by Jon Anton, “A case study in benchmarking call centers,” Purdue University Center for Customer-Driven Quality, no date. 16. These 2011 statistics can be found at nces.ed.gov/ programs/digest/d12/tables/dt12_187.asp. 17. Data obtained from “Air travel consumer report,” Office of Aviation Enforcement and Proceedings, March 2013.
TA B L E S Table A Standard Normal Probabilities Table B Random Digits Table C Binomial Probabilities Table D t Distribution Critical Values Table E F Critical Values Table F x2 Distribution Critical Values
T-1
T-2
Tables
Probability
Table entry for z is the area under the standard Normal curve to the left of z.
TABLE A
z
Standard Normal probabilities
z
.00
.01
.02
.03
.04
.05
.06
.07
.08
.09
23.4 23.3 23.2 23.1 23.0 22.9 22.8 22.7 22.6 22.5 22.4 22.3 22.2 22.1 22.0 21.9 21.8 21.7 21.6 21.5 21.4 21.3 21.2 21.1 21.0 20.9 20.8 20.7 20.6 20.5 20.4 20.3 20.2 20.1 20.0
.0003 .0005 .0007 .0010 .0013 .0019 .0026 .0035 .0047 .0062 .0082 .0107 .0139 .0179 .0228 .0287 .0359 .0446 .0548 .0668 .0808 .0968 .1151 .1357 .1587 .1841 .2119 .2420 .2743 .3085 .3446 .3821 .4207 .4602 .5000
.0003 .0005 .0007 .0009 .0013 .0018 .0025 .0034 .0045 .0060 .0080 .0104 .0136 .0174 .0222 .0281 .0351 .0436 .0537 .0655 .0793 .0951 .1131 .1335 .1562 .1814 .2090 .2389 .2709 .3050 .3409 .3783 .4168 .4562 .4960
.0003 .0005 .0006 .0009 .0013 .0018 .0024 .0033 .0044 .0059 .0078 .0102 .0132 .0170 .0217 .0274 .0344 .0427 .0526 .0643 .0778 .0934 .1112 .1314 .1539 .1788 .2061 .2358 .2676 .3015 .3372 .3745 .4129 .4522 .4920
.0003 .0004 .0006 .0009 .0012 .0017 .0023 .0032 .0043 .0057 .0075 .0099 .0129 .0166 .0212 .0268 .0336 .0418 .0516 .0630 .0764 .0918 .1093 .1292 .1515 .1762 .2033 .2327 .2643 .2981 .3336 .3707 .4090 .4483 .4880
.0003 .0004 .0006 .0008 .0012 .0016 .0023 .0031 .0041 .0055 .0073 .0096 .0125 .0162 .0207 .0262 .0329 .0409 .0505 .0618 .0749 .0901 .1075 .1271 .1492 .1736 .2005 .2296 .2611 .2946 .3300 .3669 .4052 .4443 .4840
.0003 .0004 .0006 .0008 .0011 .0016 .0022 .0030 .0040 .0054 .0071 .0094 .0122 .0158 .0202 .0256 .0322 .0401 .0495 .0606 .0735 .0885 .1056 .1251 .1469 .1711 .1977 .2266 .2578 .2912 .3264 .3632 .4013 .4404 .4801
.0003 .0004 .0006 .0008 .0011 .0015 .0021 .0029 .0039 .0052 .0069 .0091 .0119 .0154 .0197 .0250 .0314 .0392 .0485 .0594 .0721 .0869 .1038 .1230 .1446 .1685 .1949 .2236 .2546 .2877 .3228 .3594 .3974 .4364 .4761
.0003 .0004 .0005 .0008 .0011 .0015 .0021 .0028 .0038 .0051 .0068 .0089 .0116 .0150 .0192 .0244 .0307 .0384 .0475 .0582 .0708 .0853 .1020 .1210 .1423 .1660 .1922 .2206 .2514 .2843 .3192 .3557 .3936 .4325 .4721
.0003 .0004 .0005 .0007 .0010 .0014 .0020 .0027 .0037 .0049 .0066 .0087 .0113 .0146 .0188 .0239 .0301 .0375 .0465 .0571 .0694 .0838 .1003 .1190 .1401 .1635 .1894 .2177 .2483 .2810 .3156 .3520 .3897 .4286 .4681
.0002 .0003 .0005 .0007 .0010 .0014 .0019 .0026 .0036 .0048 .0064 .0084 .0110 .0143 .0183 .0233 .0294 .0367 .0455 .0559 .0681 .0823 .0985 .1170 .1379 .1611 .1867 .2148 .2451 .2776 .3121 .3483 .3859 .4247 .4641
Tables
T-3
Probability
Table entry for z is the area under the standard Normal curve to the left of z.
TABLE A
z
Standard Normal probabilities (continued)
z
.00
.01
.02
.03
.04
.05
.06
.07
.08
.09
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4
.5000 .5398 .5793 .6179 .6554 .6915 .7257 .7580 .7881 .8159 .8413 .8643 .8849 .9032 .9192 .9332 .9452 .9554 .9641 .9713 .9772 .9821 .9861 .9893 .9918 .9938 .9953 .9965 .9974 .9981 .9987 .9990 .9993 .9995 .9997
.5040 .5438 .5832 .6217 .6591 .6950 .7291 .7611 .7910 .8186 .8438 .8665 .8869 .9049 .9207 .9345 .9463 .9564 .9649 .9719 .9778 .9826 .9864 .9896 .9920 .9940 .9955 .9966 .9975 .9982 .9987 .9991 .9993 .9995 .9997
.5080 .5478 .5871 .6255 .6628 .6985 .7324 .7642 .7939 .8212 .8461 .8686 .8888 .9066 .9222 .9357 .9474 .9573 .9656 .9726 .9783 .9830 .9868 .9898 .9922 .9941 .9956 .9967 .9976 .9982 .9987 .9991 .9994 .9995 .9997
.5120 .5517 .5910 .6293 .6664 .7019 .7357 .7673 .7967 .8238 .8485 .8708 .8907 .9082 .9236 .9370 .9484 .9582 .9664 .9732 .9788 .9834 .9871 .9901 .9925 .9943 .9957 .9968 .9977 .9983 .9988 .9991 .9994 .9996 .9997
.5160 .5557 .5948 .6331 .6700 .7054 .7389 .7704 .7995 .8264 .8508 .8729 .8925 .9099 .9251 .9382 .9495 .9591 .9671 .9738 .9793 .9838 .9875 .9904 .9927 .9945 .9959 .9969 .9977 .9984 .9988 .9992 .9994 .9996 .9997
.5199 .5596 .5987 .6368 .6736 .7088 .7422 .7734 .8023 .8289 .8531 .8749 .8944 .9115 .9265 .9394 .9505 .9599 .9678 .9744 .9798 .9842 .9878 .9906 .9929 .9946 .9960 .9970 .9978 .9984 .9989 .9992 .9994 .9996 .9997
.5239 .5636 .6026 .6406 .6772 .7123 .7454 .7764 .8051 .8315 .8554 .8770 .8962 .9131 .9279 .9406 .9515 .9608 .9686 .9750 .9803 .9846 .9881 .9909 .9931 .9948 .9961 .9971 .9979 .9985 .9989 .9992 .9994 .9996 .9997
.5279 .5675 .6064 .6443 .6808 .7157 .7486 .7794 .8078 .8340 .8577 .8790 .8980 .9147 .9292 .9418 .9525 .9616 .9693 .9756 .9808 .9850 .9884 .9911 .9932 .9949 .9962 .9972 .9979 .9985 .9989 .9992 .9995 .9996 .9997
.5319 .5714 .6103 .6480 .6844 .7190 .7517 .7823 .8106 .8365 .8599 .8810 .8997 .9162 .9306 .9429 .9535 .9625 .9699 .9761 .9812 .9854 .9887 .9913 .9934 .9951 .9963 .9973 .9980 .9986 .9990 .9993 .9995 .9996 .9997
.5359 .5753 .6141 .6517 .6879 .7224 .7549 .7852 .8133 .8389 .8621 .8830 .9015 .9177 .9319 .9441 .9545 .9633 .9706 .9767 .9817 .9857 .9890 .9916 .9936 .9952 .9964 .9974 .9981 .9986 .9990 .9993 .9995 .9997 .9998
T-4
Tables
TABLE B
Random digits
Line 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
19223 73676 45467 52711 95592 68417 82739 60940 36009 38448 81486 59636 62568 45149 61041 14459 38167 73190 95857 35476 71487 13873 54580 71035 96746 96927 43909 15689 36759 69051 05007 68732 45740 27816 66925 08421 53645 66831 55588 12975 96767 72829 88565 62964 19687 37609 54973 00694 71546 07511
95034 47150 71709 38889 94007 35013 57890 72024 19365 48789 69487 88804 70206 32992 77684 26056 98532 32533 07118 55972 09984 81598 81507 09001 12149 19931 99477 14227 58984 64817 16632 55259 41807 78416 55658 44753 66812 68908 99404 13258 35964 50232 42628 88145 12633 59057 86278 05977 05233 88915
05756 99400 77558 93074 69971 15529 20807 17868 15412 18338 60513 04634 40325 75730 94322 31424 62183 04470 87664 39421 29077 95052 27102 43367 37823 36089 25330 06565 68288 87174 81194 84292 65561 18329 39100 77377 61421 40772 70708 13048 23822 97892 17797 83083 57857 66967 88737 19664 53946 41267
28713 01927 00095 60227 91481 72765 47511 24943 39638 24697 09297 71197 03699 66280 24709 80371 70632 29669 92099 65850 14863 90908 56027 49497 71868 74192 64359 14374 22913 09517 14873 08796 33302 21337 78458 28744 47836 21558 41098 45144 96012 63408 49376 69453 95806 83401 74351 65441 68743 16853
96409 27754 32863 40011 60779 85089 81676 61790 85453 39364 00412 19352 71080 03819 73698 65103 23417 84407 58806 04266 61683 73592 55892 72719 18442 77567 40085 13352 18638 84534 04197 43165 07051 35213 11206 75592 12609 47781 43563 72321 94591 77919 61762 46109 09931 60705 47500 20903 72460 84569
12531 42648 29485 85848 53791 57067 55300 90656 46816 42006 71238 73089 22553 56202 14526 62253 26185 90785 66979 35435 47052 75186 33063 96758 35119 88741 16925 49367 54303 06489 85576 93739 93623 37741 19876 08563 15373 33586 56934 81940 65194 44575 16953 59505 02150 02384 84552 62371 27601 79367
42544 82425 82226 48767 17297 50211 94383 87964 83485 76688 27649 84898 11486 02938 31893 50490 41448 65956 98624 43742 62224 87136 41842 27611 62103 48409 85117 81982 00795 87201 45195 31685 18132 04312 87151 79140 98481 79177 48394 00360 50842 24870 88604 69680 43163 90597 19909 22725 45403 32337
82853 36290 90056 52573 59335 47487 14893 18883 41979 08708 39950 45785 11776 70915 32592 61181 75532 86382 84826 11937 51025 95761 81868 91596 39244 41903 36071 87209 08727 97245 96565 97150 09547 68508 31260 92454 14592 06928 51719 02428 53372 04178 12724 00900 58636 93600 67181 53340 88692 03316
Tables
TABLE B
T-5
Random digits (continued)
Line 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
03802 77320 07886 87065 42090 55494 16698 16297 22897 98163 43400 97341 64578 11022 81232 36843 84329 27788 99224 38075 87368 40512 81636 26411 80011 92813 70348 24005 85063 11532 59618 92965 85116 15106 03638 97971 49345 87370 88296 79485 40830 32006 37569 56680 05172 74782 85288 68309 26461 42672
29341 35030 56866 74133 09628 67690 30406 07626 17467 45944 25831 46254 67197 79124 43939 84798 80081 85789 00850 73239 49451 00681 57578 94292 09937 87503 72871 52114 55810 73186 03914 50837 27684 10411 31589 48932 18305 88099 95670 92200 24979 76302 85187 79003 08100 27005 93264 12060 88346 67680
29264 77519 39648 21117 54035 88131 96587 68683 17638 34210 06283 88153 28310 49525 23840 51167 69516 41592 43737 52555 55771 44282 54286 06340 57195 63494 63419 26224 10470 92541 05208 39921 14597 90221 07871 45792 76213 89695 74932 99401 23333 81221 44692 23361 22316 03894 61409 14762 52430 42376
80198 41109 69290 70595 93879 81800 65985 45335 70043 64158 22138 62336 90341 63078 05995 44728 78934 74472 75202 46342 48343 47178 27216 97762 33906 71379 57363 39078 08029 06915 84088 84661 85747 49377 25792 63993 82390 87633 65317 54473 37619 00693 50706 67094 54495 98038 03404 58002 60906 95023
12371 98296 03600 22791 98441 11188 07165 34377 36243 76971 16043 21112 37531 17229 84589 20554 14293 96773 44753 13365 51236 08139 58758 37033 94831 76550 29685 80798 30025 72954 20426 82514 01596 44369 85823 95635 77412 76987 93848 34336 56227 95197 53161 15019 60005 20627 09649 03716 74216 82744
13121 18984 05376 67306 04606 28552 50148 72941 13008 27689 15706 35574 63890 32165 06788 55538 92478 27090 63236 02182 18522 78693 80358 85968 10056 45984 43090 15220 29734 10167 39004 81899 25889 28185 55400 28753 97401 85503 43988 82786 95941 75044 69027 63261 29532 40307 55937 81968 96263 03971
54969 60869 58958 28420 27381 25752 16201 41764 83993 82926 73345 99271 52630 01343 76358 27647 16479 24954 14260 30443 73670 34715 84115 94165 42211 05481 18763 43186 61181 12142 84582 24565 41998 80959 56026 46069 50650 26257 47597 05457 59494 46596 88389 24543 18433 47317 60843 57934 69296 96560
43912 12349 22720 52067 82637 21953 86792 77038 22869 75957 26238 45297 76315 21394 26622 32708 26974 41474 73686 53229 23212 75606 84568 46514 65491 50830 31714 00976 72090 26492 87317 60874 15635 76355 12193 84635 71755 51736 83044 60343 86539 11628 60313 52884 18057 92759 66167 32624 90107 55148
T-6
Tables
TABLE C Binomial probabilities n Entry is P 1X ⫽ k2 ⫽ a b pk 11 ⫺ p2 n⫺k k p n
k
.01
.02
.03
.04
.05
.06
.07
.08
.09
2
0 1 2
.9801 .0198 .0001
.9604 .0392 .0004
.9409 .0582 .0009
.9216 .0768 .0016
.9025 .0950 .0025
.8836 .1128 .0036
.8649 .1302 .0049
.8464 .1472 .0064
.8281 .1638 .0081
3
0 1 2 3
.9703 .0294 .0003
.9412 .0576 .0012
.9127 .0847 .0026
.8847 .1106 .0046 .0001
.8574 .1354 .0071 .0001
.8306 .1590 .0102 .0002
.8044 .1816 .0137 .0003
.7787 .2031 .0177 .0005
.7536 .2236 .0221 .0007
4
0 1 2 3 4
.9606 .0388 .0006
.9224 .0753 .0023
.8853 .1095 .0051 .0001
.8493 .1416 .0088 .0002
.8145 .1715 .0135 .0005
.7807 .1993 .0191 .0008
.7481 .2252 .0254 .0013
.7164 .2492 .0325 .0019
.6857 .2713 .0402 .0027 .0001
5
0 1 2 3 4 5
.9510 .0480 .0010
.9039 .0922 .0038 .0001
.8587 .1328 .0082 .0003
.8154 .1699 .0142 .0006
.7738 .2036 .0214 .0011
.7339 .2342 .0299 .0019 .0001
.6957 .2618 .0394 .0030 .0001
.6591 .2866 .0498 .0043 .0002
.6240 .3086 .0610 .0060 .0003
6
0 1 2 3 4 5 6 0 1 2 3 4 5 6 7
.9415 .0571 .0014
.8858 .1085 .0055 .0002
.8330 .1546 .0120 .0005
.7828 .1957 .0204 .0011
.7351 .2321 .0305 .0021 .0001
.6899 .2642 .0422 .0036 .0002
.6470 .2922 .0550 .0055 .0003
.6064 .3164 .0688 .0080 .0005
.5679 .3370 .0833 .0110 .0008
.9321 .0659 .0020
.8681 .1240 .0076 .0003
.8080 .1749 .0162 .0008
.7514 .2192 .0274 .0019 .0001
.6983 .2573 .0406 .0036 .0002
.6485 .2897 .0555 .0059 .0004
.6017 .3170 .0716 .0090 .0007
.5578 .3396 .0886 .0128 .0011 .0001
.5168 .3578 .1061 .0175 .0017 .0001
0 1 2 3 4 5 6 7 8
.9227 .0746 .0026 .0001
.8508 .1389 .0099 .0004
.7837 .1939 .0210 .0013 .0001
.7214 .2405 .0351 .0029 .0002
.6634 .2793 .0515 .0054 .0004
.6096 .3113 .0695 .0089 .0007
.5596 .3370 .0888 .0134 .0013 .0001
.5132 .3570 .1087 .0189 .0021 .0001
.4703 .3721 .1288 .0255 .0031 .0002
7
8
Tables
T-7
TABLE C Binomial probabilities (continued) n Entry is P 1X ⫽ k2 ⫽ a b pk 11 ⫺ p2 n⫺k k p n
k
.10
.15
.20
.25
.30
.35
.40
.45
.50
2
0 1 2
.8100 .1800 .0100
.7225 .2550 .0225
.6400 .3200 .0400
.5625 .3750 .0625
.4900 .4200 .0900
.4225 .4550 .1225
.3600 .4800 .1600
.3025 .4950 .2025
.2500 .5000 .2500
3
0 1 2 3
.7290 .2430 .0270 .0010
.6141 .3251 .0574 .0034
.5120 .3840 .0960 .0080
.4219 .4219 .1406 .0156
.3430 .4410 .1890 .0270
.2746 .4436 .2389 .0429
.2160 .4320 .2880 .0640
.1664 .4084 .3341 .0911
.1250 .3750 .3750 .1250
4
0 1 2 3 4
.6561 .2916 .0486 .0036 .0001
.5220 .3685 .0975 .0115 .0005
.4096 .4096 .1536 .0256 .0016
.3164 .4219 .2109 .0469 .0039
.2401 .4116 .2646 .0756 .0081
.1785 .3845 .3105 .1115 .0150
.1296 .3456 .3456 .1536 .0256
.0915 .2995 .3675 .2005 .0410
.0625 .2500 .3750 .2500 .0625
5
0 1 2 3 4 5
.5905 .3280 .0729 .0081 .0004
.4437 .3915 .1382 .0244 .0022 .0001
.3277 .4096 .2048 .0512 .0064 .0003
.2373 .3955 .2637 .0879 .0146 .0010
.1681 .3602 .3087 .1323 .0284 .0024
.1160 .3124 .3364 .1811 .0488 .0053
.0778 .2592 .3456 .2304 .0768 .0102
.0503 .2059 .3369 .2757 .1128 .0185
.0313 .1563 .3125 .3125 .1562 .0312
6
0 1 2 3 4 5 6
.5314 .3543 .0984 .0146 .0012 .0001
.3771 .3993 .1762 .0415 .0055 .0004
.2621 .3932 .2458 .0819 .0154 .0015 .0001
.1780 .3560 .2966 .1318 .0330 .0044 .0002
.1176 .3025 .3241 .1852 .0595 .0102 .0007
.0754 .2437 .3280 .2355 .0951 .0205 .0018
.0467 .1866 .3110 .2765 .1382 .0369 .0041
.0277 .1359 .2780 .3032 .1861 .0609 .0083
.0156 .0938 .2344 .3125 .2344 .0937 .0156
7
0 1 2 3 4 5 6 7
.4783 .3720 .1240 .0230 .0026 .0002
.3206 .3960 .2097 .0617 .0109 .0012 .0001
.2097 .3670 .2753 .1147 .0287 .0043 .0004
.1335 .3115 .3115 .1730 .0577 .0115 .0013 .0001
.0824 .2471 .3177 .2269 .0972 .0250 .0036 .0002
.0490 .1848 .2985 .2679 .1442 .0466 .0084 .0006
.0280 .1306 .2613 .2903 .1935 .0774 .0172 .0016
.0152 .0872 .2140 .2918 .2388 .1172 .0320 .0037
.0078 .0547 .1641 .2734 .2734 .1641 .0547 .0078
8
0 1 2 3 4 5 6 7 8
.4305 .3826 .1488 .0331 .0046 .0004
.2725 .3847 .2376 .0839 .0185 .0026 .0002
.1678 .3355 .2936 .1468 .0459 .0092 .0011 .0001
.1001 .2670 .3115 .2076 .0865 .0231 .0038 .0004
.0576 .1977 .2965 .2541 .1361 .0467 .0100 .0012 .0001
.0319 .1373 .2587 .2786 .1875 .0808 .0217 .0033 .0002
.0168 .0896 .2090 .2787 .2322 .1239 .0413 .0079 .0007
.0084 .0548 .1569 .2568 .2627 .1719 .0703 .0164 .0017
.0039 .0313 .1094 .2188 .2734 .2188 .1094 .0312 .0039 (Continued )
T-8
Tables
TABLE C Binomial probabilities (continued) n Entry is P 1X ⫽ k2 ⫽ a b pk 11 ⫺ p2 n⫺k k p k
.01
.02
.03
.04
.05
.06
.07
.08
.09
9
0 1 2 3 4 5 6 7 8 9
.9135 .0830 .0034 .0001
.8337 .1531 .0125 .0006
.7602 .2116 .0262 .0019 .0001
.6925 .2597 .0433 .0042 .0003
.6302 .2985 .0629 .0077 .0006
.5730 .3292 .0840 .0125 .0012 .0001
.5204 .3525 .1061 .0186 .0021 .0002
.4722 .3695 .1285 .0261 .0034 .0003
.4279 .3809 .1507 .0348 .0052 .0005
10
0 1 2 3 4 5 6 7 8 9 10
.9044 .0914 .0042 .0001
.8171 .1667 .0153 .0008
.7374 .2281 .0317 .0026 .0001
.6648 .2770 .0519 .0058 .0004
.5987 .3151 .0746 .0105 .0010 .0001
.5386 .3438 .0988 .0168 .0019 .0001
.4840 .3643 .1234 .0248 .0033 .0003
.4344 .3777 .1478 .0343 .0052 .0005
.3894 .3851 .1714 .0452 .0078 .0009 .0001
12
0 1 2 3 4 5 6 7 8 9 10 11 12
.8864 .1074 .0060 .0002
.7847 .1922 .0216 .0015 .0001
.6938 .2575 .0438 .0045 .0003
.6127 .3064 .0702 .0098 .0009 .0001
.5404 .3413 .0988 .0173 .0021 .0002
.4759 .3645 .1280 .0272 .0039 .0004
.4186 .3781 .1565 .0393 .0067 .0008 .0001
.3677 .3837 .1835 .0532 .0104 .0014 .0001
.3225 .3827 .2082 .0686 .0153 .0024 .0003
15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
.8601 .1303 .0092 .0004
.7386 .2261 .0323 .0029 .0002
.6333 .2938 .0636 .0085 .0008 .0001
.5421 .3388 .0988 .0178 .0022 .0002
.4633 .3658 .1348 .0307 .0049 .0006
.3953 .3785 .1691 .0468 .0090 .0013 .0001
.3367 .3801 .2003 .0653 .0148 .0024 .0003
.2863 .3734 .2273 .0857 .0223 .0043 .0006 .0001
.2430 .3605 .2496 .1070 .0317 .0069 .0011 .0001
n
Tables
T-9
TABLE C Binomial probabilities (continued) n Entry is P 1X ⫽ k2 ⫽ a b pk 11 ⫺ p2 n⫺k k p k
.10
.15
.20
.25
.30
.35
.40
.45
.50
9
0 1 2 3 4 5 6 7 8 9
.3874 .3874 .1722 .0446 .0074 .0008 .0001
.2316 .3679 .2597 .1069 .0283 .0050 .0006
.1342 .3020 .3020 .1762 .0661 .0165 .0028 .0003
.0751 .2253 .3003 .2336 .1168 .0389 .0087 .0012 .0001
.0404 .1556 .2668 .2668 .1715 .0735 .0210 .0039 .0004
.0207 .1004 .2162 .2716 .2194 .1181 .0424 .0098 .0013 .0001
.0101 .0605 .1612 .2508 .2508 .1672 .0743 .0212 .0035 .0003
.0046 .0339 .1110 .2119 .2600 .2128 .1160 .0407 .0083 .0008
.0020 .0176 .0703 .1641 .2461 .2461 .1641 .0703 .0176 .0020
10
0 1 2 3 4 5 6 7 8 9 10
.3487 .3874 .1937 .0574 .0112 .0015 .0001
.1969 .3474 .2759 .1298 .0401 .0085 .0012 .0001
.1074 .2684 .3020 .2013 .0881 .0264 .0055 .0008 .0001
.0563 .1877 .2816 .2503 .1460 .0584 .0162 .0031 .0004
.0282 .1211 .2335 .2668 .2001 .1029 .0368 .0090 .0014 .0001
.0135 .0725 .1757 .2522 .2377 .1536 .0689 .0212 .0043 .0005
.0060 .0403 .1209 .2150 .2508 .2007 .1115 .0425 .0106 .0016 .0001
.0025 .0207 .0763 .1665 .2384 .2340 .1596 .0746 .0229 .0042 .0003
.0010 .0098 .0439 .1172 .2051 .2461 .2051 .1172 .0439 .0098 .0010
12
0 1 2 3 4 5 6 7 8 9 10 11 12
.2824 .3766 .2301 .0852 .0213 .0038 .0005
.1422 .3012 .2924 .1720 .0683 .0193 .0040 .0006 .0001
.0687 .2062 .2835 .2362 .1329 .0532 .0155 .0033 .0005 .0001
.0317 .1267 .2323 .2581 .1936 .1032 .0401 .0115 .0024 .0004
.0138 .0712 .1678 .2397 .2311 .1585 .0792 .0291 .0078 .0015 .0002
.0057 .0368 .1088 .1954 .2367 .2039 .1281 .0591 .0199 .0048 .0008 .0001
.0022 .0174 .0639 .1419 .2128 .2270 .1766 .1009 .0420 .0125 .0025 .0003
.0008 .0075 .0339 .0923 .1700 .2225 .2124 .1489 .0762 .0277 .0068 .0010 .0001
.0002 .0029 .0161 .0537 .1208 .1934 .2256 .1934 .1208 .0537 .0161 .0029 .0002
15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
.2059 .3432 .2669 .1285 .0428 .0105 .0019 .0003
.0874 .2312 .2856 .2184 .1156 .0449 .0132 .0030 .0005 .0001
.0352 .1319 .2309 .2501 .1876 .1032 .0430 .0138 .0035 .0007 .0001
.0134 .0668 .1559 .2252 .2252 .1651 .0917 .0393 .0131 .0034 .0007 .0001
.0047 .0305 .0916 .1700 .2186 .2061 .1472 .0811 .0348 .0116 .0030 .0006 .0001
.0016 .0126 .0476 .1110 .1792 .2123 .1906 .1319 .0710 .0298 .0096 .0024 .0004 .0001
.0005 .0047 .0219 .0634 .1268 .1859 .2066 .1771 .1181 .0612 .0245 .0074 .0016 .0003
.0001 .0016 .0090 .0318 .0780 .1404 .1914 .2013 .1647 .1048 .0515 .0191 .0052 .0010 .0001
.0000 .0005 .0032 .0139 .0417 .0916 .1527 .1964 .1964 .1527 .0916 .0417 .0139 .0032 .0005
n
(Continued )
T-10
Tables
TABLE C Binomial probabilities (continued) p n
k
.01
.02
.03
.04
.05
.06
.07
.08
.09
20
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
.8179 .1652 .0159 .0010
.6676 .2725 .0528 .0065 .0006
.5438 .3364 .0988 .0183 .0024 .0002
.4420 .3683 .1458 .0364 .0065 .0009 .0001
.3585 .3774 .1887 .0596 .0133 .0022 .0003
.2901 .3703 .2246 .0860 .0233 .0048 .0008 .0001
.2342 .3526 .2521 .1139 .0364 .0088 .0017 .0002
.1887 .3282 .2711 .1414 .0523 .0145 .0032 .0005 .0001
.1516 .3000 .2818 .1672 .0703 .0222 .0055 .0011 .0002
p n
k
.10
.15
.20
.25
.30
.35
.40
.45
.50
20
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
.1216 .2702 .2852 .1901 .0898 .0319 .0089 .0020 .0004 .0001
.0388 .1368 .2293 .2428 .1821 .1028 .0454 .0160 .0046 .0011 .0002
.0115 .0576 .1369 .2054 .2182 .1746 .1091 .0545 .0222 .0074 .0020 .0005 .0001
.0032 .0211 .0669 .1339 .1897 .2023 .1686 .1124 .0609 .0271 .0099 .0030 .0008 .0002
.0008 .0068 .0278 .0716 .1304 .1789 .1916 .1643 .1144 .0654 .0308 .0120 .0039 .0010 .0002
.0002 .0020 .0100 .0323 .0738 .1272 .1712 .1844 .1614 .1158 .0686 .0336 .0136 .0045 .0012 .0003
.0000 .0005 .0031 .0123 .0350 .0746 .1244 .1659 .1797 .1597 .1171 .0710 .0355 .0146 .0049 .0013 .0003
.0000 .0001 .0008 .0040 .0139 .0365 .0746 .1221 .1623 .1771 .1593 .1185 .0727 .0366 .0150 .0049 .0013 .0002
.0000 .0000 .0002 .0011 .0046 .0148 .0370 .0739 .1201 .1602 .1762 .1602 .1201 .0739 .0370 .0148 .0046 .0011 .0002
Tables
T-11
Probability p
Table entry for p and C is the critical value t* with probability p lying to its right and probability C lying between 2t* and t*.
t*
TABLE D t distribution critical values Upper-tail probability p df
.25
.20
.15
.10
.05
.025
.02
.01
.005
.0025
.001
.0005
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 40 50 60 80 100 1000 z*
1.000 0.816 0.765 0.741 0.727 0.718 0.711 0.706 0.703 0.700 0.697 0.695 0.694 0.692 0.691 0.690 0.689 0.688 0.688 0.687 0.686 0.686 0.685 0.685 0.684 0.684 0.684 0.683 0.683 0.683 0.681 0.679 0.679 0.678 0.677 0.675 0.674
1.376 1.061 0.978 0.941 0.920 0.906 0.896 0.889 0.883 0.879 0.876 0.873 0.870 0.868 0.866 0.865 0.863 0.862 0.861 0.860 0.859 0.858 0.858 0.857 0.856 0.856 0.855 0.855 0.854 0.854 0.851 0.849 0.848 0.846 0.845 0.842 0.841
1.963 1.386 1.250 1.190 1.156 1.134 1.119 1.108 1.100 1.093 1.088 1.083 1.079 1.076 1.074 1.071 1.069 1.067 1.066 1.064 1.063 1.061 1.060 1.059 1.058 1.058 1.057 1.056 1.055 1.055 1.050 1.047 1.045 1.043 1.042 1.037 1.036
3.078 1.886 1.638 1.533 1.476 1.440 1.415 1.397 1.383 1.372 1.363 1.356 1.350 1.345 1.341 1.337 1.333 1.330 1.328 1.325 1.323 1.321 1.319 1.318 1.316 1.315 1.314 1.313 1.311 1.310 1.303 1.299 1.296 1.292 1.290 1.282 1.282
6.314 2.920 2.353 2.132 2.015 1.943 1.895 1.860 1.833 1.812 1.796 1.782 1.771 1.761 1.753 1.746 1.740 1.734 1.729 1.725 1.721 1.717 1.714 1.711 1.708 1.706 1.703 1.701 1.699 1.697 1.684 1.676 1.671 1.664 1.660 1.646 1.645
12.71 4.303 3.182 2.776 2.571 2.447 2.365 2.306 2.262 2.228 2.201 2.179 2.160 2.145 2.131 2.120 2.110 2.101 2.093 2.086 2.080 2.074 2.069 2.064 2.060 2.056 2.052 2.048 2.045 2.042 2.021 2.009 2.000 1.990 1.984 1.962 1.960
15.89 4.849 3.482 2.999 2.757 2.612 2.517 2.449 2.398 2.359 2.328 2.303 2.282 2.264 2.249 2.235 2.224 2.214 2.205 2.197 2.189 2.183 2.177 2.172 2.167 2.162 2.158 2.154 2.150 2.147 2.123 2.109 2.099 2.088 2.081 2.056 2.054
31.82 6.965 4.541 3.747 3.365 3.143 2.998 2.896 2.821 2.764 2.718 2.681 2.650 2.624 2.602 2.583 2.567 2.552 2.539 2.528 2.518 2.508 2.500 2.492 2.485 2.479 2.473 2.467 2.462 2.457 2.423 2.403 2.390 2.374 2.364 2.330 2.326
63.66 9.925 5.841 4.604 4.032 3.707 3.499 3.355 3.250 3.169 3.106 3.055 3.012 2.977 2.947 2.921 2.898 2.878 2.861 2.845 2.831 2.819 2.807 2.797 2.787 2.779 2.771 2.763 2.756 2.750 2.704 2.678 2.660 2.639 2.626 2.581 2.576
127.3 14.09 7.453 5.598 4.773 4.317 4.029 3.833 3.690 3.581 3.497 3.428 3.372 3.326 3.286 3.252 3.222 3.197 3.174 3.153 3.135 3.119 3.104 3.091 3.078 3.067 3.057 3.047 3.038 3.030 2.971 2.937 2.915 2.887 2.871 2.813 2.807
318.3 22.33 10.21 7.173 5.893 5.208 4.785 4.501 4.297 4.144 4.025 3.930 3.852 3.787 3.733 3.686 3.646 3.611 3.579 3.552 3.527 3.505 3.485 3.467 3.450 3.435 3.421 3.408 3.396 3.385 3.307 3.261 3.232 3.195 3.174 3.098 3.091
636.6 31.60 12.92 8.610 6.869 5.959 5.408 5.041 4.781 4.587 4.437 4.318 4.221 4.140 4.073 4.015 3.965 3.922 3.883 3.850 3.819 3.792 3.768 3.745 3.725 3.707 3.690 3.674 3.659 3.646 3.551 3.496 3.460 3.416 3.390 3.300 3.291
50%
60%
70%
80%
90%
95%
96%
98%
99%
99.5%
99.8%
99.9%
Confidence level C
T-12
Tables
Probability p
Table entry for p is the critical value F* with probability p lying to its right.
TABLE E
F*
F critical values
Degrees of freedom in the denominator
Degrees of freedom in the numerator p
1
2
3
4
5
6
7
8
9
1
.100 .050 .025 .010 .001
39.86 161.45 647.79 4052.2 405284
49.50 199.50 799.50 4999.5 500000
53.59 215.71 864.16 5403.4 540379
55.83 224.58 899.58 5624.6 562500
57.24 230.16 921.85 5763.6 576405
58.20 233.99 937.11 5859.0 585937
58.91 236.77 948.22 5928.4 592873
59.44 238.88 956.66 5981.1 598144
59.86 240.54 963.28 6022.5 602284
2
.100 .050 .025 .010 .001
8.53 18.51 38.51 98.50 998.50
9.00 19.00 39.00 99.00 999.00
9.16 19.16 39.17 99.17 999.17
9.24 19.25 39.25 99.25 999.25
9.29 19.30 39.30 99.30 999.30
9.33 19.33 39.33 99.33 999.33
9.35 19.35 39.36 99.36 999.36
9.37 19.37 39.37 99.37 999.37
9.38 19.38 39.39 99.39 999.39
3
.100 .050 .025 .010 .001
5.54 10.13 17.44 34.12 167.03
5.46 9.55 16.04 30.82 148.50
5.39 9.28 15.44 29.46 141.11
5.34 9.12 15.10 28.71 137.10
5.31 9.01 14.88 28.24 134.58
5.28 8.94 14.73 27.91 132.85
5.27 8.89 14.62 27.67 131.58
5.25 8.85 14.54 27.49 130.62
5.24 8.81 14.47 27.35 129.86
4
.100 .050 .025 .010 .001
4.54 7.71 12.22 21.20 74.14
4.32 6.94 10.65 18.00 61.25
4.19 6.59 9.98 16.69 56.18
4.11 6.39 9.60 15.98 53.44
4.05 6.26 9.36 15.52 51.71
4.01 6.16 9.20 15.21 50.53
3.98 6.09 9.07 14.98 49.66
3.95 6.04 8.98 14.80 49.00
3.94 6.00 8.90 14.66 48.47
5
.100 .050 .025 .010 .001
4.06 6.61 10.01 16.26 47.18
3.78 5.79 8.43 13.27 37.12
3.62 5.41 7.76 12.06 33.20
3.52 5.19 7.39 11.39 31.09
3.45 5.05 7.15 10.97 29.75
3.40 4.95 6.98 10.67 28.83
3.37 4.88 6.85 10.46 28.16
3.34 4.82 6.76 10.29 27.65
3.32 4.77 6.68 10.16 27.24
6
.100 .050 .025 .010 .001
3.78 5.99 8.81 13.75 35.51
3.46 5.14 7.26 10.92 27.00
3.29 4.76 6.60 9.78 23.70
3.18 4.53 6.23 9.15 21.92
3.11 4.39 5.99 8.75 20.80
3.05 4.28 5.82 8.47 20.03
3.01 4.21 5.70 8.26 19.46
2.98 4.15 5.60 8.10 19.03
2.96 4.10 5.52 7.98 18.69
7
.100 .050 .025 .010 .001
3.59 5.59 8.07 12.25 29.25
3.26 4.74 6.54 9.55 21.69
3.07 4.35 5.89 8.45 18.77
2.96 4.12 5.52 7.85 17.20
2.88 3.97 5.29 7.46 16.21
2.83 3.87 5.12 7.19 15.52
2.78 3.79 4.99 6.99 15.02
2.75 3.73 4.90 6.84 14.63
2.72 3.68 4.82 6.72 14.33
Tables
T-13
Probability p
Table entry for p is the critical value F* with probability p lying to its right.
TABLE E
F*
F critical values (continued) Degrees of freedom in the numerator
10
12
15
20
25
30
40
50
60
120
1000
60.19 241.88 968.63 6055.8 605621
60.71 243.91 976.71 6106.3 610668
61.22 245.95 984.87 6157.3 615764
61.74 248.01 993.10 6208.7 620908
62.05 249.26 998.08 6239.8 624017
62.26 250.10 1001.4 6260.6 626099
62.53 251.14 1005.6 6286.8 628712
62.69 251.77 1008.1 6302.5 630285
62.79 252.20 1009.8 6313.0 631337
63.06 253.25 1014.0 6339.4 633972
63.30 254.19 1017.7 6362.7 636301
9.39 19.40 39.40 99.40 999.40
9.41 19.41 39.41 99.42 999.42
9.42 19.43 39.43 99.43 999.43
9.44 19.45 39.45 99.45 999.45
9.45 19.46 39.46 99.46 999.46
9.46 19.46 39.46 99.47 999.47
9.47 19.47 39.47 99.47 999.47
9.47 19.48 39.48 99.48 999.48
9.47 19.48 39.48 99.48 999.48
9.48 19.49 39.49 99.49 999.49
9.49 19.49 39.50 99.50 999.50
5.23 8.79 14.42 27.23 129.25
5.22 8.74 14.34 27.05 128.32
5.20 8.70 14.25 26.87 127.37
5.18 8.66 14.17 26.69 126.42
5.17 8.63 14.12 26.58 125.84
5.17 8.62 14.08 26.50 125.45
5.16 8.59 14.04 26.41 124.96
5.15 8.58 14.01 26.35 124.66
5.15 8.57 13.99 26.32 124.47
5.14 8.55 13.95 26.22 123.97
5.13 8.53 13.91 26.14 123.53
3.92 5.96 8.84 14.55 48.05
3.90 5.91 8.75 14.37 47.41
3.87 5.86 8.66 14.20 46.76
3.84 5.80 8.56 14.02 46.10
3.83 5.77 8.50 13.91 45.70
3.82 5.75 8.46 13.84 45.43
3.80 5.72 8.41 13.75 45.09
3.80 5.70 8.38 13.69 44.88
3.79 5.69 8.36 13.65 44.75
3.78 5.66 8.31 13.56 44.40
3.76 5.63 8.26 13.47 44.09
3.30 4.74 6.62 10.05 26.92
3.27 4.68 6.52 9.89 26.42
3.24 4.62 6.43 9.72 25.91
3.21 4.56 6.33 9.55 25.39
3.19 4.52 6.27 9.45 25.08
3.17 4.50 6.23 9.38 24.87
3.16 4.46 6.18 9.29 24.60
3.15 4.44 6.14 9.24 24.44
3.14 4.43 6.12 9.20 24.33
3.12 4.40 6.07 9.11 24.06
3.11 4.37 6.02 9.03 23.82
2.94 4.06 5.46 7.87 18.41
2.90 4.00 5.37 7.72 17.99
2.87 3.94 5.27 7.56 17.56
2.84 3.87 5.17 7.40 17.12
2.81 3.83 5.11 7.30 16.85
2.80 3.81 5.07 7.23 16.67
2.78 3.77 5.01 7.14 16.44
2.77 3.75 4.98 7.09 16.31
2.76 3.74 4.96 7.06 16.21
2.74 3.70 4.90 6.97 15.98
2.72 3.67 4.86 6.89 15.77
2.70 3.64 4.76 6.62 14.08
2.67 3.57 4.67 6.47 13.71
2.63 3.51 4.57 6.31 13.32
2.59 3.44 4.47 6.16 12.93
2.57 3.40 4.40 6.06 12.69
2.56 3.38 4.36 5.99 12.53
2.54 3.34 4.31 5.91 12.33
2.52 3.32 4.28 5.86 12.20
2.51 3.30 4.25 5.82 12.12
2.49 3.27 4.20 5.74 11.91
2.47 3.23 4.15 5.66 11.72 (Continued )
T-14
Tables
TABLE E
F critical values (continued)
Degrees of freedom in the denominator
Degrees of freedom in the numerator p
1
2
3
4
5
6
7
8
9
8
.100 .050 .025 .010 .001
3.46 5.32 7.57 11.26 25.41
3.11 4.46 6.06 8.65 18.49
2.92 4.07 5.42 7.59 15.83
2.81 3.84 5.05 7.01 14.39
2.73 3.69 4.82 6.63 13.48
2.67 3.58 4.65 6.37 12.86
2.62 3.50 4.53 6.18 12.40
2.59 3.44 4.43 6.03 12.05
2.56 3.39 4.36 5.91 11.77
9
.100 .050 .025 .010 .001
3.36 5.12 7.21 10.56 22.86
3.01 4.26 5.71 8.02 16.39
2.81 3.86 5.08 6.99 13.90
2.69 3.63 4.72 6.42 12.56
2.61 3.48 4.48 6.06 11.71
2.55 3.37 4.32 5.80 11.13
2.51 3.29 4.20 5.61 10.70
2.47 3.23 4.10 5.47 10.37
2.44 3.18 4.03 5.35 10.11
10
.100 .050 .025 .010 .001
3.29 4.96 6.94 10.04 21.04
2.92 4.10 5.46 7.56 14.91
2.73 3.71 4.83 6.55 12.55
2.61 3.48 4.47 5.99 11.28
2.52 3.33 4.24 5.64 10.48
2.46 3.22 4.07 5.39 9.93
2.41 3.14 3.95 5.20 9.52
2.38 3.07 3.85 5.06 9.20
2.35 3.02 3.78 4.94 8.96
11
.100 .050 .025 .010 .001
3.23 4.84 6.72 9.65 19.69
2.86 3.98 5.26 7.21 13.81
2.66 3.59 4.63 6.22 11.56
2.54 3.36 4.28 5.67 10.35
2.45 3.20 4.04 5.32 9.58
2.39 3.09 3.88 5.07 9.05
2.34 3.01 3.76 4.89 8.66
2.30 2.95 3.66 4.74 8.35
2.27 2.90 3.59 4.63 8.12
12
.100 .050 .025 .010 .001
3.18 4.75 6.55 9.33 18.64
2.81 3.89 5.10 6.93 12.97
2.61 3.49 4.47 5.95 10.80
2.48 3.26 4.12 5.41 9.63
2.39 3.11 3.89 5.06 8.89
2.33 3.00 3.73 4.82 8.38
2.28 2.91 3.61 4.64 8.00
2.24 2.85 3.51 4.50 7.71
2.21 2.80 3.44 4.39 7.48
13
.100 .050 .025 .010 .001
3.14 4.67 6.41 9.07 17.82
2.76 3.81 4.97 6.70 12.31
2.56 3.41 4.35 5.74 10.21
2.43 3.18 4.00 5.21 9.07
2.35 3.03 3.77 4.86 8.35
2.28 2.92 3.60 4.62 7.86
2.23 2.83 3.48 4.44 7.49
2.20 2.77 3.39 4.30 7.21
2.16 2.71 3.31 4.19 6.98
14
.100 .050 .025 .010 .001
3.10 4.60 6.30 8.86 17.14
2.73 3.74 4.86 6.51 11.78
2.52 3.34 4.24 5.56 9.73
2.39 3.11 3.89 5.04 8.62
2.31 2.96 3.66 4.69 7.92
2.24 2.85 3.50 4.46 7.44
2.19 2.76 3.38 4.28 7.08
2.15 2.70 3.29 4.14 6.80
2.12 2.65 3.21 4.03 6.58
15
.100 .050 .025 .010 .001
3.07 4.54 6.20 8.68 16.59
2.70 3.68 4.77 6.36 11.34
2.49 3.29 4.15 5.42 9.34
2.36 3.06 3.80 4.89 8.25
2.27 2.90 3.58 4.56 7.57
2.21 2.79 3.41 4.32 7.09
2.16 2.71 3.29 4.14 6.74
2.12 2.64 3.20 4.00 6.47
2.09 2.59 3.12 3.89 6.26
16
.100 .050 .025 .010 .001
3.05 4.49 6.12 8.53 16.12
2.67 3.63 4.69 6.23 10.97
2.46 3.24 4.08 5.29 9.01
2.33 3.01 3.73 4.77 7.94
2.24 2.85 3.50 4.44 7.27
2.18 2.74 3.34 4.20 6.80
2.13 2.66 3.22 4.03 6.46
2.09 2.59 3.12 3.89 6.19
2.06 2.54 3.05 3.78 5.98
17
.100 .050 .025 .010 .001
3.03 4.45 6.04 8.40 15.72
2.64 3.59 4.62 6.11 10.66
2.44 3.20 4.01 5.19 8.73
2.31 2.96 3.66 4.67 7.68
2.22 2.81 3.44 4.34 7.02
2.15 2.70 3.28 4.10 6.56
2.10 2.61 3.16 3.93 6.22
2.06 2.55 3.06 3.79 5.96
2.03 2.49 2.98 3.68 5.75
Tables
TABLE E
T-15
F critical values (continued) Degrees of freedom in the numerator
10
12
15
20
25
30
40
50
60
120
1000
2.54 3.35 4.30 5.81 11.54
2.50 3.28 4.20 5.67 11.19
2.46 3.22 4.10 5.52 10.84
2.42 3.15 4.00 5.36 10.48
2.40 3.11 3.94 5.26 10.26
2.38 3.08 3.89 5.20 10.11
2.36 3.04 3.84 5.12 9.92
2.35 3.02 3.81 5.07 9.80
2.34 3.01 3.78 5.03 9.73
2.32 2.97 3.73 4.95 9.53
2.30 2.93 3.68 4.87 9.36
2.42 3.14 3.96 5.26 9.89
2.38 3.07 3.87 5.11 9.57
2.34 3.01 3.77 4.96 9.24
2.30 2.94 3.67 4.81 8.90
2.27 2.89 3.60 4.71 8.69
2.25 2.86 3.56 4.65 8.55
2.23 2.83 3.51 4.57 8.37
2.22 2.80 3.47 4.52 8.26
2.21 2.79 3.45 4.48 8.19
2.18 2.75 3.39 4.40 8.00
2.16 2.71 3.34 4.32 7.84
2.32 2.98 3.72 4.85 8.75
2.28 2.91 3.62 4.71 8.45
2.24 2.85 3.52 4.56 8.13
2.20 2.77 3.42 4.41 7.80
2.17 2.73 3.35 4.31 7.60
2.16 2.70 3.31 4.25 7.47
2.13 2.66 3.26 4.17 7.30
2.12 2.64 3.22 4.12 7.19
2.11 2.62 3.20 4.08 7.12
2.08 2.58 3.14 4.00 6.94
2.06 2.54 3.09 3.92 6.78
2.25 2.85 3.53 4.54 7.92
2.21 2.79 3.43 4.40 7.63
2.17 2.72 3.33 4.25 7.32
2.12 2.65 3.23 4.10 7.01
2.10 2.60 3.16 4.01 6.81
2.08 2.57 3.12 3.94 6.68
2.05 2.53 3.06 3.86 6.52
2.04 2.51 3.03 3.81 6.42
2.03 2.49 3.00 3.78 6.35
2.00 2.45 2.94 3.69 6.18
1.98 2.41 2.89 3.61 6.02
2.19 2.75 3.37 4.30 7.29
2.15 2.69 3.28 4.16 7.00
2.10 2.62 3.18 4.01 6.71
2.06 2.54 3.07 3.86 6.40
2.03 2.50 3.01 3.76 6.22
2.01 2.47 2.96 3.70 6.09
1.99 2.43 2.91 3.62 5.93
1.97 2.40 2.87 3.57 5.83
1.96 2.38 2.85 3.54 5.76
1.93 2.34 2.79 3.45 5.59
1.91 2.30 2.73 3.37 5.44
2.14 2.67 3.25 4.10 6.80
2.10 2.60 3.15 3.96 6.52
2.05 2.53 3.05 3.82 6.23
2.01 2.46 2.95 3.66 5.93
1.98 2.41 2.88 3.57 5.75
1.96 2.38 2.84 3.51 5.63
1.93 2.34 2.78 3.43 5.47
1.92 2.31 2.74 3.38 5.37
1.90 2.30 2.72 3.34 5.30
1.88 2.25 2.66 3.25 5.14
1.85 2.21 2.60 3.18 4.99
2.10 2.60 3.15 3.94 6.40
2.05 2.53 3.05 3.80 6.13
2.01 2.46 2.95 3.66 5.85
1.96 2.39 2.84 3.51 5.56
1.93 2.34 2.78 3.41 5.38
1.91 2.31 2.73 3.35 5.25
1.89 2.27 2.67 3.27 5.10
1.87 2.24 2.64 3.22 5.00
1.86 2.22 2.61 3.18 4.94
1.83 2.18 2.55 3.09 4.77
1.80 2.14 2.50 3.02 4.62
2.06 2.54 3.06 3.80 6.08
2.02 2.48 2.96 3.67 5.81
1.97 2.40 2.86 3.52 5.54
1.92 2.33 2.76 3.37 5.25
1.89 2.28 2.69 3.28 5.07
1.87 2.25 2.64 3.21 4.95
1.85 2.20 2.59 3.13 4.80
1.83 2.18 2.55 3.08 4.70
1.82 2.16 2.52 3.05 4.64
1.79 2.11 2.46 2.96 4.47
1.76 2.07 2.40 2.88 4.33
2.03 2.49 2.99 3.69 5.81
1.99 2.42 2.89 3.55 5.55
1.94 2.35 2.79 3.41 5.27
1.89 2.28 2.68 3.26 4.99
1.86 2.23 2.61 3.16 4.82
1.84 2.19 2.57 3.10 4.70
1.81 2.15 2.51 3.02 4.54
1.79 2.12 2.47 2.97 4.45
1.78 2.11 2.45 2.93 4.39
1.75 2.06 2.38 2.84 4.23
1.72 2.02 2.32 2.76 4.08
2.00 2.45 2.92 3.59 5.58
1.96 2.38 2.82 3.46 5.32
1.91 2.31 2.72 3.31 5.05
1.86 2.23 2.62 3.16 4.78
1.83 2.18 2.55 3.07 4.60
1.81 2.15 2.50 3.00 4.48
1.78 2.10 2.44 2.92 4.33
1.76 2.08 2.41 2.87 4.24
1.75 2.06 2.38 2.83 4.18
1.72 2.01 2.32 2.75 4.02
1.69 1.97 2.26 2.66 3.87 (Continued )
T-16
Tables
TABLE E
F critical values (continued)
Degrees of freedom in the denominator
Degrees of freedom in the numerator p
1
2
3
4
5
6
7
8
9
18
.100 .050 .025 .010 .001
3.01 4.41 5.98 8.29 15.38
2.62 3.55 4.56 6.01 10.39
2.42 3.16 3.95 5.09 8.49
2.29 2.93 3.61 4.58 7.46
2.20 2.77 3.38 4.25 6.81
2.13 2.66 3.22 4.01 6.35
2.08 2.58 3.10 3.84 6.02
2.04 2.51 3.01 3.71 5.76
2.00 2.46 2.93 3.60 5.56
19
.100 .050 .025 .010 .001
2.99 4.38 5.92 8.18 15.08
2.61 3.52 4.51 5.93 10.16
2.40 3.13 3.90 5.01 8.28
2.27 2.90 3.56 4.50 7.27
2.18 2.74 3.33 4.17 6.62
2.11 2.63 3.17 3.94 6.18
2.06 2.54 3.05 3.77 5.85
2.02 2.48 2.96 3.63 5.59
1.98 2.42 2.88 3.52 5.39
20
.100 .050 .025 .010 .001
2.97 4.35 5.87 8.10 14.82
2.59 3.49 4.46 5.85 9.95
2.38 3.10 3.86 4.94 8.10
2.25 2.87 3.51 4.43 7.10
2.16 2.71 3.29 4.10 6.46
2.09 2.60 3.13 3.87 6.02
2.04 2.51 3.01 3.70 5.69
2.00 2.45 2.91 3.56 5.44
1.96 2.39 2.84 3.46 5.24
21
.100 .050 .025 .010 .001
2.96 4.32 5.83 8.02 14.59
2.57 3.47 4.42 5.78 9.77
2.36 3.07 3.82 4.87 7.94
2.23 2.84 3.48 4.37 6.95
2.14 2.68 3.25 4.04 6.32
2.08 2.57 3.09 3.81 5.88
2.02 2.49 2.97 3.64 5.56
1.98 2.42 2.87 3.51 5.31
1.95 2.37 2.80 3.40 5.11
22
.100 .050 .025 .010 .001
2.95 4.30 5.79 7.95 14.38
2.56 3.44 4.38 5.72 9.61
2.35 3.05 3.78 4.82 7.80
2.22 2.82 3.44 4.31 6.81
2.13 2.66 3.22 3.99 6.19
2.06 2.55 3.05 3.76 5.76
2.01 2.46 2.93 3.59 5.44
1.97 2.40 2.84 3.45 5.19
1.93 2.34 2.76 3.35 4.99
23
.100 .050 .025 .010 .001
2.94 4.28 5.75 7.88 14.20
2.55 3.42 4.35 5.66 9.47
2.34 3.03 3.75 4.76 7.67
2.21 2.80 3.41 4.26 6.70
2.11 2.64 3.18 3.94 6.08
2.05 2.53 3.02 3.71 5.65
1.99 2.44 2.90 3.54 5.33
1.95 2.37 2.81 3.41 5.09
1.92 2.32 2.73 3.30 4.89
24
.100 .050 .025 .010 .001
2.93 4.26 5.72 7.82 14.03
2.54 3.40 4.32 5.61 9.34
2.33 3.01 3.72 4.72 7.55
2.19 2.78 3.38 4.22 6.59
2.10 2.62 3.15 3.90 5.98
2.04 2.51 2.99 3.67 5.55
1.98 2.42 2.87 3.50 5.23
1.94 2.36 2.78 3.36 4.99
1.91 2.30 2.70 3.26 4.80
25
.100 .050 .025 .010 .001
2.92 4.24 5.69 7.77 13.88
2.53 3.39 4.29 5.57 9.22
2.32 2.99 3.69 4.68 7.45
2.18 2.76 3.35 4.18 6.49
2.09 2.60 3.13 3.85 5.89
2.02 2.49 2.97 3.63 5.46
1.97 2.40 2.85 3.46 5.15
1.93 2.34 2.75 3.32 4.91
1.89 2.28 2.68 3.22 4.71
26
.100 .050 .025 .010 .001
2.91 4.23 5.66 7.72 13.74
2.52 3.37 4.27 5.53 9.12
2.31 2.98 3.67 4.64 7.36
2.17 2.74 3.33 4.14 6.41
2.08 2.59 3.10 3.82 5.80
2.01 2.47 2.94 3.59 5.38
1.96 2.39 2.82 3.42 5.07
1.92 2.32 2.73 3.29 4.83
1.88 2.27 2.65 3.18 4.64
27
.100 .050 .025 .010 .001
2.90 4.21 5.63 7.68 13.61
2.51 3.35 4.24 5.49 9.02
2.30 2.96 3.65 4.60 7.27
2.17 2.73 3.31 4.11 6.33
2.07 2.57 3.08 3.78 5.73
2.00 2.46 2.92 3.56 5.31
1.95 2.37 2.80 3.39 5.00
1.91 2.31 2.71 3.26 4.76
1.87 2.25 2.63 3.15 4.57
Tables
TABLE E
T-17
F critical values (continued) Degrees of freedom in the numerator
10
12
15
20
25
30
40
50
60
120
1000
1.98 2.41 2.87 3.51 5.39
1.93 2.34 2.77 3.37 5.13
1.89 2.27 2.67 3.23 4.87
1.84 2.19 2.56 3.08 4.59
1.80 2.14 2.49 2.98 4.42
1.78 2.11 2.44 2.92 4.30
1.75 2.06 2.38 2.84 4.15
1.74 2.04 2.35 2.78 4.06
1.72 2.02 2.32 2.75 4.00
1.69 1.97 2.26 2.66 3.84
1.66 1.92 2.20 2.58 3.69
1.96 2.38 2.82 3.43 5.22
1.91 2.31 2.72 3.30 4.97
1.86 2.23 2.62 3.15 4.70
1.81 2.16 2.51 3.00 4.43
1.78 2.11 2.44 2.91 4.26
1.76 2.07 2.39 2.84 4.14
1.73 2.03 2.33 2.76 3.99
1.71 2.00 2.30 2.71 3.90
1.70 1.98 2.27 2.67 3.84
1.67 1.93 2.20 2.58 3.68
1.64 1.88 2.14 2.50 3.53
1.94 2.35 2.77 3.37 5.08
1.89 2.28 2.68 3.23 4.82
1.84 2.20 2.57 3.09 4.56
1.79 2.12 2.46 2.94 4.29
1.76 2.07 2.40 2.84 4.12
1.74 2.04 2.35 2.78 4.00
1.71 1.99 2.29 2.69 3.86
1.69 1.97 2.25 2.64 3.77
1.68 1.95 2.22 2.61 3.70
1.64 1.90 2.16 2.52 3.54
1.61 1.85 2.09 2.43 3.40
1.92 2.32 2.73 3.31 4.95
1.87 2.25 2.64 3.17 4.70
1.83 2.18 2.53 3.03 4.44
1.78 2.10 2.42 2.88 4.17
1.74 2.05 2.36 2.79 4.00
1.72 2.01 2.31 2.72 3.88
1.69 1.96 2.25 2.64 3.74
1.67 1.94 2.21 2.58 3.64
1.66 1.92 2.18 2.55 3.58
1.62 1.87 2.11 2.46 3.42
1.59 1.82 2.05 2.37 3.28
1.90 2.30 2.70 3.26 4.83
1.86 2.23 2.60 3.12 4.58
1.81 2.15 2.50 2.98 4.33
1.76 2.07 2.39 2.83 4.06
1.73 2.02 2.32 2.73 3.89
1.70 1.98 2.27 2.67 3.78
1.67 1.94 2.21 2.58 3.63
1.65 1.91 2.17 2.53 3.54
1.64 1.89 2.14 2.50 3.48
1.60 1.84 2.08 2.40 3.32
1.57 1.79 2.01 2.32 3.17
1.89 2.27 2.67 3.21 4.73
1.84 2.20 2.57 3.07 4.48
1.80 2.13 2.47 2.93 4.23
1.74 2.05 2.36 2.78 3.96
1.71 2.00 2.29 2.69 3.79
1.69 1.96 2.24 2.62 3.68
1.66 1.91 2.18 2.54 3.53
1.64 1.88 2.14 2.48 3.44
1.62 1.86 2.11 2.45 3.38
1.59 1.81 2.04 2.35 3.22
1.55 1.76 1.98 2.27 3.08
1.88 2.25 2.64 3.17 4.64
1.83 2.18 2.54 3.03 4.39
1.78 2.11 2.44 2.89 4.14
1.73 2.03 2.33 2.74 3.87
1.70 1.97 2.26 2.64 3.71
1.67 1.94 2.21 2.58 3.59
1.64 1.89 2.15 2.49 3.45
1.62 1.86 2.11 2.44 3.36
1.61 1.84 2.08 2.40 3.29
1.57 1.79 2.01 2.31 3.14
1.54 1.74 1.94 2.22 2.99
1.87 2.24 2.61 3.13 4.56
1.82 2.16 2.51 2.99 4.31
1.77 2.09 2.41 2.85 4.06
1.72 2.01 2.30 2.70 3.79
1.68 1.96 2.23 2.60 3.63
1.66 1.92 2.18 2.54 3.52
1.63 1.87 2.12 2.45 3.37
1.61 1.84 2.08 2.40 3.28
1.59 1.82 2.05 2.36 3.22
1.56 1.77 1.98 2.27 3.06
1.52 1.72 1.91 2.18 2.91
1.86 2.22 2.59 3.09 4.48
1.81 2.15 2.49 2.96 4.24
1.76 2.07 2.39 2.81 3.99
1.71 1.99 2.28 2.66 3.72
1.67 1.94 2.21 2.57 3.56
1.65 1.90 2.16 2.50 3.44
1.61 1.85 2.09 2.42 3.30
1.59 1.82 2.05 2.36 3.21
1.58 1.80 2.03 2.33 3.15
1.54 1.75 1.95 2.23 2.99
1.51 1.70 1.89 2.14 2.84
1.85 2.20 2.57 3.06 4.41
1.80 2.13 2.47 2.93 4.17
1.75 2.06 2.36 2.78 3.92
1.70 1.97 2.25 2.63 3.66
1.66 1.92 2.18 2.54 3.49
1.64 1.88 2.13 2.47 3.38
1.60 1.84 2.07 2.38 3.23
1.58 1.81 2.03 2.33 3.14
1.57 1.79 2.00 2.29 3.08
1.53 1.73 1.93 2.20
1.50 1.68 1.86 2.11
2.92
2.78 (Continued )
T-18
Tables
TABLE E
F critical values (continued) Degrees of freedom in the numerator p
2
3
4
5
6
7
8
9
.100
2.89
2.50
.050 .025 .010 .001
4.20 5.61 7.64 13.50
3.34 4.22 5.45 8.93
2.29 2.95 3.63 4.57 7.19
2.16 2.71 3.29 4.07 6.25
2.06 2.56 3.06 3.75 5.66
2.00 2.45 2.90 3.53 5.24
1.94 2.36 2.78 3.36 4.93
1.90 2.29 2.69 3.23 4.69
1.87 2.24 2.61 3.12 4.50
29
.100 .050 .025 .010 .001
2.89 4.18 5.59 7.60 13.39
2.50 3.33 4.20 5.42 8.85
2.28 2.93 3.61 4.54 7.12
2.15 2.70 3.27 4.04 6.19
2.06 2.55 3.04 3.73 5.59
1.99 2.43 2.88 3.50 5.18
1.93 2.35 2.76 3.33 4.87
1.89 2.28 2.67 3.20 4.64
1.86 2.22 2.59 3.09 4.45
30
.100 .050 .025 .010 .001
2.88 4.17 5.57 7.56 13.29
2.49 3.32 4.18 5.39 8.77
2.28 2.92 3.59 4.51 7.05
2.14 2.69 3.25 4.02 6.12
2.05 2.53 3.03 3.70 5.53
1.98 2.42 2.87 3.47 5.12
1.93 2.33 2.75 3.30 4.82
1.88 2.27 2.65 3.17 4.58
1.85 2.21 2.57 3.07 4.39
40
.100 .050 .025 .010 .001
2.84 4.08 5.42 7.31 12.61
2.44 3.23 4.05 5.18 8.25
2.23 2.84 3.46 4.31 6.59
2.09 2.61 3.13 3.83 5.70
2.00 2.45 2.90 3.51 5.13
1.93 2.34 2.74 3.29 4.73
1.87 2.25 2.62 3.12 4.44
1.83 2.18 2.53 2.99 4.21
1.79 2.12 2.45 2.89 4.02
50
.100 .050 .025 .010 .001
2.81 4.03 5.34 7.17 12.22
2.41 3.18 3.97 5.06 7.96
2.20 2.79 3.39 4.20 6.34
2.06 2.56 3.05 3.72 5.46
1.97 2.40 2.83 3.41 4.90
1.90 2.29 2.67 3.19 4.51
1.84 2.20 2.55 3.02 4.22
1.80 2.13 2.46 2.89 4.00
1.76 2.07 2.38 2.78 3.82
60
.100 .050 .025 .010 .001
2.79 4.00 5.29 7.08 11.97
2.39 3.15 3.93 4.98 7.77
2.18 2.76 3.34 4.13 6.17
2.04 2.53 3.01 3.65 5.31
1.95 2.37 2.79 3.34 4.76
1.87 2.25 2.63 3.12 4.37
1.82 2.17 2.51 2.95 4.09
1.77 2.10 2.41 2.82 3.86
1.74 2.04 2.33 2.72 3.69
100
.100 .050 .025 .010 .001
2.76 3.94 5.18 6.90 11.50
2.36 3.09 3.83 4.82 7.41
2.14 2.70 3.25 3.98 5.86
2.00 2.46 2.92 3.51 5.02
1.91 2.31 2.70 3.21 4.48
1.83 2.19 2.54 2.99 4.11
1.78 2.10 2.42 2.82 3.83
1.73 2.03 2.32 2.69 3.61
1.69 1.97 2.24 2.59 3.44
200
.100 .050 .025 .010 .001
2.73 3.89 5.10 6.76 11.15
2.33 3.04 3.76 4.71 7.15
2.11 2.65 3.18 3.88 5.63
1.97 2.42 2.85 3.41 4.81
1.88 2.26 2.63 3.11 4.29
1.80 2.14 2.47 2.89 3.92
1.75 2.06 2.35 2.73 3.65
1.70 1.98 2.26 2.60 3.43
1.66 1.93 2.18 2.50 3.26
1000
.100 .050 .025 .010 .001
2.71 3.85 5.04 6.66 10.89
2.31 3.00 3.70 4.63 6.96
2.09 2.61 3.13 3.80 5.46
1.95 2.38 2.80 3.34 4.65
1.85 2.22 2.58 3.04 4.14
1.78 2.11 2.42 2.82 3.78
1.72 2.02 2.30 2.66 3.51
1.68 1.95 2.20 2.53 3.30
1.64 1.89 2.13 2.43 3.13
28
Degrees of freedom in the denominator
1
Tables
TABLE E
T-19
F critical values (continued) Degrees of freedom in the numerator
10
12
15
20
25
30
40
50
60
120
1000
1.84 2.19 2.55 3.03 4.35
1.79 2.12 2.45 2.90 4.11
1.74 2.04 2.34 2.75 3.86
1.69 1.96 2.23 2.60 3.60
1.65 1.91 2.16 2.51 3.43
1.63 1.87 2.11 2.44 3.32
1.59 1.82 2.05 2.35 3.18
1.57 1.79 2.01 2.30 3.09
1.56 1.77 1.98 2.26 3.02
1.52 1.71 1.91 2.17 2.86
1.48 1.66 1.84 2.08 2.72
1.83 2.18 2.53 3.00 4.29
1.78 2.10 2.43 2.87 4.05
1.73 2.03 2.32 2.73 3.80
1.68 1.94 2.21 2.57 3.54
1.64 1.89 2.14 2.48 3.38
1.62 1.85 2.09 2.41 3.27
1.58 1.81 2.03 2.33 3.12
1.56 1.77 1.99 2.27 3.03
1.55 1.75 1.96 2.23 2.97
1.51 1.70 1.89 2.14 2.81
1.47 1.65 1.82 2.05 2.66
1.82 2.16 2.51 2.98 4.24
1.77 2.09 2.41 2.84 4.00
1.72 2.01 2.31 2.70 3.75
1.67 1.93 2.20 2.55 3.49
1.63 1.88 2.12 2.45 3.33
1.61 1.84 2.07 2.39 3.22
1.57 1.79 2.01 2.30 3.07
1.55 1.76 1.97 2.25 2.98
1.54 1.74 1.94 2.21 2.92
1.50 1.68 1.87 2.11 2.76
1.46 1.63 1.80 2.02 2.61
1.76 2.08 2.39 2.80 3.87
1.71 2.00 2.29 2.66 3.64
1.66 1.92 2.18 2.52 3.40
1.61 1.84 2.07 2.37 3.14
1.57 1.78 1.99 2.27 2.98
1.54 1.74 1.94 2.20 2.87
1.51 1.69 1.88 2.11 2.73
1.48 1.66 1.83 2.06 2.64
1.47 1.64 1.80 2.02 2.57
1.42 1.58 1.72 1.92 2.41
1.38 1.52 1.65 1.82 2.25
1.73 2.03 2.32 2.70 3.67
1.68 1.95 2.22 2.56 3.44
1.63 1.87 2.11 2.42 3.20
1.57 1.78 1.99 2.27 2.95
1.53 1.73 1.92 2.17 2.79
1.50 1.69 1.87 2.10 2.68
1.46 1.63 1.80 2.01 2.53
1.44 1.60 1.75 1.95 2.44
1.42 1.58 1.72 1.91 2.38
1.38 1.51 1.64 1.80 2.21
1.33 1.45 1.56 1.70 2.05
1.71 1.99 2.27 2.63 3.54
1.66 1.92 2.17 2.50 3.32
1.60 1.84 2.06 2.35 3.08
1.54 1.75 1.94 2.20 2.83
1.50 1.69 1.87 2.10 2.67
1.48 1.65 1.82 2.03 2.55
1.44 1.59 1.74 1.94 2.41
1.41 1.56 1.70 1.88 2.32
1.40 1.53 1.67 1.84 2.25
1.35 1.47 1.58 1.73 2.08
1.30 1.40 1.49 1.62 1.92
1.66 1.93 2.18 2.50 3.30
1.61 1.85 2.08 2.37 3.07
1.56 1.77 1.97 2.22 2.84
1.49 1.68 1.85 2.07 2.59
1.45 1.62 1.77 1.97 2.43
1.42 1.57 1.71 1.89 2.32
1.38 1.52 1.64 1.80 2.17
1.35 1.48 1.59 1.74 2.08
1.34 1.45 1.56 1.69 2.01
1.28 1.38 1.46 1.57 1.83
1.22 1.30 1.36 1.45 1.64
1.63 1.88 2.11 2.41 3.12
1.58 1.80 2.01 2.27 2.90
1.52 1.72 1.90 2.13 2.67
1.46 1.62 1.78 1.97 2.42
1.41 1.56 1.70 1.87 2.26
1.38 1.52 1.64 1.79 2.15
1.34 1.46 1.56 1.69 2.00
1.31 1.41 1.51 1.63 1.90
1.29 1.39 1.47 1.58 1.83
1.23 1.30 1.37 1.45 1.64
1.16 1.21 1.25 1.30 1.43
1.61 1.84 2.06 2.34 2.99
1.55 1.76 1.96 2.20 2.77
1.49 1.68 1.85 2.06 2.54
1.43 1.58 1.72 1.90 2.30
1.38 1.52 1.64 1.79 2.14
1.35 1.47 1.58 1.72 2.02
1.30 1.41 1.50 1.61 1.87
1.27 1.36 1.45 1.54 1.77
1.25 1.33 1.41 1.50 1.69
1.18 1.24 1.29 1.35 1.49
1.08 1.11 1.13 1.16 1.22
T-20
Tables
Probability p
Table entry for p is the critical value (x2)* with probability p lying to its right.
TABLE F
( χ 2)*
x2 distribution critical values Tail probability p
df
.25
.20
.15
.10
.05
.025
.02
.01
.005
.0025
.001
.0005
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 40 50 60 80 100
1.32 2.77 4.11 5.39 6.63 7.84 9.04 10.22 11.39 12.55 13.70 14.85 15.98 17.12 18.25 19.37 20.49 21.60 22.72 23.83 24.93 26.04 27.14 28.24 29.34 30.43 31.53 32.62 33.71 34.80 45.62 56.33 66.98 88.13 109.1
1.64 3.22 4.64 5.99 7.29 8.56 9.80 11.03 12.24 13.44 14.63 15.81 16.98 18.15 19.31 20.47 21.61 22.76 23.90 25.04 26.17 27.30 28.43 29.55 30.68 31.79 32.91 34.03 35.14 36.25 47.27 58.16 68.97 90.41 111.7
2.07 3.79 5.32 6.74 8.12 9.45 10.75 12.03 13.29 14.53 15.77 16.99 18.20 19.41 20.60 21.79 22.98 24.16 25.33 26.50 27.66 28.82 29.98 31.13 32.28 33.43 34.57 35.71 36.85 37.99 49.24 60.35 71.34 93.11 114.7
2.71 4.61 6.25 7.78 9.24 10.64 12.02 13.36 14.68 15.99 17.28 18.55 19.81 21.06 22.31 23.54 24.77 25.99 27.20 28.41 29.62 30.81 32.01 33.20 34.38 35.56 36.74 37.92 39.09 40.26 51.81 63.17 74.40 96.58 118.5
3.84 5.99 7.81 9.49 11.07 12.59 14.07 15.51 16.92 18.31 19.68 21.03 22.36 23.68 25.00 26.30 27.59 28.87 30.14 31.41 32.67 33.92 35.17 36.42 37.65 38.89 40.11 41.34 42.56 43.77 55.76 67.50 79.08 101.9 124.3
5.02 7.38 9.35 11.14 12.83 14.45 16.01 17.53 19.02 20.48 21.92 23.34 24.74 26.12 27.49 28.85 30.19 31.53 32.85 34.17 35.48 36.78 38.08 39.36 40.65 41.92 43.19 44.46 45.72 46.98 59.34 71.42 83.30 106.6 129.6
5.41 7.82 9.84 11.67 13.39 15.03 16.62 18.17 19.68 21.16 22.62 24.05 25.47 26.87 28.26 29.63 31.00 32.35 33.69 35.02 36.34 37.66 38.97 40.27 41.57 42.86 44.14 45.42 46.69 47.96 60.44 72.61 84.58 108.1 131.1
6.63 9.21 11.34 13.28 15.09 16.81 18.48 20.09 21.67 23.21 24.72 26.22 27.69 29.14 30.58 32.00 33.41 34.81 36.19 37.57 38.93 40.29 41.64 42.98 44.31 45.64 46.96 48.28 49.59 50.89 63.69 76.15 88.38 112.3 135.8
7.88 10.60 12.84 14.86 16.75 18.55 20.28 21.95 23.59 25.19 26.76 28.30 29.82 31.32 32.80 34.27 35.72 37.16 38.58 40.00 41.40 42.80 44.18 45.56 46.93 48.29 49.64 50.99 52.34 53.67 66.77 79.49 91.95 116.3 140.2
9.14 11.98 14.32 16.42 18.39 20.25 22.04 23.77 25.46 27.11 28.73 30.32 31.88 33.43 34.95 36.46 37.95 39.42 40.88 42.34 43.78 45.20 46.62 48.03 49.44 50.83 52.22 53.59 54.97 56.33 69.70 82.66 95.34 120.1 144.3
10.83 13.82 16.27 18.47 20.51 22.46 24.32 26.12 27.88 29.59 31.26 32.91 34.53 36.12 37.70 39.25 40.79 42.31 43.82 45.31 46.80 48.27 49.73 51.18 52.62 54.05 55.48 56.89 58.30 59.70 73.40 86.66 99.61 124.8 149.4
12.12 15.20 17.73 20.00 22.11 24.10 26.02 27.87 29.67 31.42 33.14 34.82 36.48 38.11 39.72 41.31 42.88 44.43 45.97 47.50 49.01 50.51 52.00 53.48 54.95 56.41 57.86 59.30 60.73 62.16 76.09 89.56 102.7 128.3 153.2
A N S W E R S TO O D D - N U M B E R E D E X E R C I S E S CHAPTER 1 1.1 Working in seconds means avoiding decimals and fractions. 1.3 Exam1 5 95, Exam2 5 98, Final 5 96. 1.5 Cases: apartments. Five variables: rent (quantitative), cable (categorical), pets (categorical), bedrooms (quantitative), distance to campus (quantitative). 1.7 Answers will vary. (a) For example, number of graduates could be used for similar-sized colleges. (b) One possibility might be to compare graduation rates between private and public colleges. 1.9 (a) Individual employees. (b) Employee ID number, last name, first name, and middle initial are labels. Department and Education level are categorical variables. Years with the company, Salary, and Age are quantitative. 1.11 Age: quantitative, possible values 16 to ? (what would the oldest student’s age be?). Sing: categorical, yes/no. Play: categorical, no, a little, pretty well. Food: quantitative, possible values $0 to ? (what would be the most a person might spend in a week?). Height: quantitative, possible values 2 to 9 feet (check the Guinness World Records). 1.13 Answers will vary. A few possibilities are graduation rate, student/professor ratio, and job placement rate. 1.15 Answers will vary. One possibility is alcoholimpaired fatalities per 100,000 residents. This allows comparing states with different populations; however, states with large seasonal populations (like Florida) might be overstated. 1.17 Scores range from 55 to 98. The center is about 80. Very few students scored less than 70. 1.19 (a) The first line for the 3 (30) stem is now blank. (b) Use two stems, even though one is blank. Seeing the gap is useful. 1.21 The larger classes hide a lot of detail; there are now only three bars in the histogram. 1.23 A stemplot or histogram can be used; the distribution is unimodal and left-skewed, centered near 80, and range from 55 to 98. There are no apparent outliers. 1.25 (b) Second class had the fewest passengers. Third class had the most; over twice as many as first class. (c) A bar graph of relative frequency would have the same features. 1.27 A bar graph would be appropriate because each class is now a “whole” of interest.
1.29 (b) The overall pattern is unimodal (one major peak). The shape is roughly symmetric with center about 26 and spread from 19 to 33. There appears to be one possible low outlier. 1.31 (a) 2010 still has the highest usage in December and January. (b) The patterns are very similar, but we don’t see the increase between February and March that occurred in 2011; consumption in May was slightly higher in 2010. These differences are most likely due to weather. 1.33 For example, opinions about least-favorite color are somewhat more varied than for favorite color. Interestingly, purple is liked and disliked by about the same percentage of people. 1.35 (c) Preferences will vary, but the ordered bars make it easier to pick out similar categories. The most frequently recycled types (Paper and Trimmings) stand out in both graphs. (d) We cannot make a pie chart because each garbage type is a “whole.” 1.37 Mobile browsers are dominated by Safari (on iPads and iPhones). Android has about one-fourth of the market. All others are minor players. 1.39 (a and b) Black is clearly more popular in Europe than in North America. The most popular four colors account for at least 70% of cars in both regions. (c) One possibility is to cluster the bars for the two regions together by color. 1.41 (a) Region
% FB users
Region
Africa
3.9
Middle East
Asia
5.0
North America
Caribbean
15.4
Central America
26.5
Europe
28.5
% FB users 9.4 49.9
Oceania/Australia
38.9
South America
28.1
(b) For example, when looking only at the absolute number of Facebook users, Europe is the leading region; however, when expressed as a percent of the population, North America has the most Facebook users. (d) The shape of the distribution might be right-skewed (there are numerical gaps between 28 and 38 and between 38 and 49). The center of the distribution is about 26% (Central America). This stemplot does not really indicate any major outliers. (e) Answers will vary, but one possibility is that the scaling in the stemplot actually hides the gaps in the distribution. (f) One possibility is that both the population and number of Facebook users are rounded.
A-1
A-2
Answers to Odd-Numbered Exercises
1.43 (a) Four variables: GPA, IQ, and self-concept are quantitative; gender is categorical. (c) Unimodal and skewed to the left, centered near 7.8, spread from 0.5 to 10.8. (d) There is more variability among the boys; in fact, there seem to be two groups of boys: those with GPAs below 5 and those with GPAs above 5. 1.45 Unimodal and skewed to the left, centered near 59.5; most scores are between 35 and 73, with a few below that and one high score of 80 (probably not quite an outlier). 1.47 The new mean is 50.44 days. 1.49 The sorted data are 5
5
5
5
6
7
7
7
8
12
12
13
13
15
18
18
27
28
36
48
52
60
66
94
$7.5 million. (b) Using software, we find the numerical summaries shown below. Mean
StDev
Min
Q1
Med
Q3
Max
13,830
16,050
3731
4775
7516
15,537
77,839
(c) Answers will vary, but due to the severe right-skew, this distribution is best described by the five-number summary. 1.69 (a) With all the data, x ⫽ 5.23 and M 5 4.9. Removing the outliers, we have x ⫽ 4.93 and M 5 4.8. (b) With all the data, s 5 1.429; Q1 5 4.4, Q3 5 5.6. Removing the outliers, we have s 5 0.818, Q1 5 4.4, and Q3 5 5.5. 1.71 (a) With a small data set, a stemplot is reasonable. There are clearly two clumps of data. Summary statistics are shown below.
694
Adding the outlier adds another observation but does not change the median at all. 1.51 M 5 83. 1.53 x ⫽ 196.575 minutes (the value 197 in the text was rounded). The quartiles and median are in positions 20.5, 40.5, and 60.5. Q1 5 54.5, M 5 103.5, Q3 5 200. 1.55 Use the five-number summary from Exercise 1.54 (55, 75, 83, 93, 98). Be sure to give the plot a consistent, number-line axis. 2
1.57 s ⫽ 159.34 and s ⫽ 12.62. 1.59 Without Suriname, IQR 5 25; with Suriname, IQR 5 35. The IQR increases because there is one additional large observation, but it does not increase as much as the sample mean does. 1.61 (a) x ⫽ 122.92. (b) M 5 102.5. (c) The data set is right-skewed with an outlier (London), so the median is a better measure of center. 1.63 (a) IQR 5 62. (b) Outliers are below 226 or above 222. London is an outlier. (c) The first three quarters are about equal in length; the last (upper quarter) is extremely long. (d) The main part of the distribution is relatively symmetric; there is one extreme high outlier. (f) For example, the stemplot and the boxplot both indicate the same shape: relatively symmetric with an extremely high outlier. 1.65 (a) s ⫽ 8.80. (b) Q1 5 43.8 and Q3 5 57.0. (c) For example, if you think that the median is the better center in Exercise 1.64, that statistic should be paired with the quartiles and not with the standard deviation. 1.67 (a) A histogram of the data shows a strong right-skew. Half the companies have values less than
Mean
StDev
Min
Q1
Med
Q3
Max
6.424
1.400
3.7
4.95
6.7
7.85
8
(b) Because of the clusters of data, one set of numerical summaries will not be adequate. (c) After separating the data, we have for the smaller weights: Mean
StDev
Min
Q1
Med
Q3
Max
4.662
0.501
3.7
4.4
4.7
5.075
5.3
And for the larger weights: Mean
StDev
Min
Q1
Med
Q3
Max
7.253
0.740
6
6.5
7.6
7.9
8
1.73 (a) 0, 0, 5.09, 9.47, 73.2. (d) Answers will vary. The distribution is unimodal and strongly skewed to the right with five high outliers. 1.75 This distribution is unimodal and right-skewed and has no outliers. The five-number summary is 0.24, 0.355, 0.75, 1.03, 1.9. 1.77 Some people, such as celebrities and business executives, make a very large amount of money and have very large assets (think Bill Gates of Microsoft, Warren Buffett, Oprah, etc.). 1.79 The mean is $92,222.22. Eight of the employees make less than this. M 5 $45,000. 1.81 The median doesn’t change, but the mean increases to $101,111.11. 1.83 The average would be 2.5 or less (an earthquake that usually isn’t felt). These do little or no damage. 1.85 For n 5 2 the median is also the average of the two values. 1.87 (a) Place the new point at the current median.
Answers to Odd-Numbered Exercises 1.89 (a) Bihai: x ⫽ 47.5975, s ⫽ 1.2129. Red: x ⫽ 39.7113, s ⫽ 1.7988. Yellow: x ⫽ 36.1800, s ⫽ 0.9753 (all in mm). (b) Bihai and red appear to be right-skewed (although it is difficult to tell with such small samples). Skewness would make these distributions unsuitable for x and s.
1.117 Using the N(153, 34) distribution, we find the values corresponding to the given percentiles as given below (using Table A). The actual scores are very close to the percentiles of the Normal distribution.
1.91 Take six or more numbers, with the smallest number much smaller than Q1.
Percentile
Score
Score with N(153, 34)
10%
110
109
1.93 (a) Any set of four identical numbers works. (b) 0, 0, 20, 20 is the only possible answer.
25%
130
130
50%
154
153
1.95 Answers will vary with the technology. With newer technology, it is very hard to make this fail, until you reach the limits of the length of the number of digits allowed.
75%
177
176
90%
197
197
1.97 x ⫽ 5.104 pounds and s 5 2.662 pounds. 1.99 Full data set: x ⫽ 196.575 and M 5 103.5 minutes. The 10% and 20% trimmed means are x ⫽ 127.734 and x ⫽ 111.917 minutes, respectively. 1.101 212 to 364.
1.119 (a) Ranges are given in the table. Women
1.105 z 5 1.37. Using Table A, the proportion below 340 is 0.9147, and the proportion at or above is 0.0853. Using technology, the proportion below 340 is 0.9144. 1.107 x ⫽ m ⫹ zs. From Table A, we find the area to the left of z 5 0.67 is 0.7486 and the area to the left of z 5 0.68 is 0.7517. (Technology gives z 5 0.6745.) If we approximate as z 5 0.675, we have x 5 313.65, or about 314. 1.109 (a) In symmetric distributions, the mean and median are equal to each other. Examples are an equilateral triangle and a rectangle. (b) In left-skewed distributions, the mean is less than the median. 1.111 (c) The distributions look the same, only shifted. 1.113 (c) The table below indicates the desired ranges. Low
High
68%
256
320
95%
224
352
99.7%
192
384
Men
68%
8489 to 20,919
7158 to 22,886
95%
2274 to 27,134
2706 to 30,750
23941 to 33,349
28570 to 38,614
99.7%
1.103 z ⫽ 2.03.
A-3
In both cases, some of the lower limits are negative, which does not make sense; this happens because the women’s distribution is skewed, and the men’s distribution has an outlier. Contrary to the conventional wisdom, the men’s mean is slightly higher, although the outlier is at least partly responsible for that. (b) The means suggest that Mexican men and women tend to speak more than people of the same gender from the United States. 1.121 (a) F: 21.645. D: 21.04. C: 0.13. B: 1.04. (b) F: below 55.55. D: between 55.55 and 61.6. C: between 61.6 and 73.3. B: between 73.3 and 82.4. A: above 82.4. (c) Opinions will vary. 1.123 (a) 1y5 5 0.2. (b) 1y5 5 0.2. (c) 2y5 5 0.4. 1.125 (a) Mean is C, median is B (the right-skew pulls the mean to the right). (b) Mean A, median A. (c) Mean A, median B (the left-skew pulls the mean to the left). 1.127 (a) The applet shows an area of 0.6826 between 21.000 and 1.000, while the 68–95–99.7 rule rounds this to 0.68. (b) Between 22.000 and 2.000, the applet reports 0.9544 (rather than the rounded 0.95 from the 68–95– 99.7 rule). Between 23.000 and 3.000, the applet reports 0.9974 (rather than the rounded 0.997). 1.129 (a) 0.0446. (b) 0.9554. (c) 0.0287. (d) 0.9267.
1.115 Value
Percentile (Table A)
Percentile (Software)
150
50
50
140
38.6
38.8
100
7.6
7.7
180
80.5
80.4
230
98.9
98.9
1.131 (a) 0.77. (b) 0.77. 1.133 2.28%, or 0.0228. 1.135 Anthony has a z-score of 21.48. Joshua’s z-score is 20.83. Joshua’s score is higher. 1.137 About 2111. 1.139 20th percentile.
A-4
Answers to Odd-Numbered Exercises
1.141 About 1094 and lower. 1.143 1285, 1498, and 1711 (rounded to the nearest integer). 1.145 (a) From Table A, 33% of men have low values of HDL. (Software gives 32.95%.) (b) From Table A, 15.15% of men have protective levels of HDL. (Software gives 15.16%.) (c) 51.85% of men are in the intermediate range for HDL. (Software gives 51.88%.) 1.147 (a) 61.2816. (b) 8.93 and 9.31 ounces. 1.149 (a) 1.3490. (b) c 5 1.3490. 1.151 Percentile
10%
20%
30%
40%
50%
HDL level
35.2
42.0
46.9
51.1
55
Percentile
60%
70%
80%
90%
HDL level
58.9
63.1
68.0
74.8
1.153 (a) The yellow variety is the nearest to a straight line. (b) The other two distributions are both slightly right-skewed, and the bihai variety appears to have a couple of high outliers. (c) The deviations do not appear to be Normal. They seem to be right-skewed. 1.155 Histograms will suggest (but not exactly match) Figure 1.32. The uniform distribution does not extend as low or as high as a Normal distribution. 1.157 (a) The distribution appears to be roughly Normal, apart from two possible low and two possible high outliers. (b) The outliers on either end would inflate the standard deviation. The five-number summary is 8.5, 13.15, 15.4, 17.8, 23.8. (c) For example, smoking rates are typically 12% to 20%. Which states are high, and which are low? 1.159 For example, white is least popular in China, and silver is less common in Europe. Silver, white, gray, and black dominate the market worldwide. 1.163 (a) The distribution of 2010 Internet users is right-skewed. The five-number summary is 0.21, 10.31, 31.40, 55.65, 95.63. (b) The distribution of the change in users is right-skewed. The five-number summary is 21.285, 0.996, 2.570, 4.811, 22.000. (c) The percent change is also right-skewed. Two countries effectively tripled their Internet penetration (but it’s still minuscule). The five-number summary is 212.50, 5.58, 10.75, 20.70, 327.32. 1.165 A bar graph is appropriate (there are other providers besides the 10 largest; we don’t know who they are). There are two major providers and several smaller ones. 1.167 (a) For car makes (a categorical variable), use either a bar graph or a pie chart. For car age (a quantitative variable), use a histogram, stemplot, or boxplot.
(b) Study time is quantitative, so use a histogram, stemplot, or boxplot. To show change over time, use a time plot (average hours studied against time). (c) Use a bar graph or pie chart to show radio station preferences. (d) Use a Normal quantile plot to see whether the measurements follow a Normal distribution. 1.169 s ⫽ 7.50. 1.171 (a) One option is to say m 5 81.55 (the average of the given 50th percentile and mean). The 5th percentile is 42 5 81.55 2 1.645s. The 95th percentile is 142 5 81.55 1 1.645s. If we average the two estimates, we would have s ⫽ 30.4. (c) From the two distributions, over half of women consume more vitamin C than they need, but some consume far less. 1.173 (a) Not only are most responses multiples of 10; many are multiples of 30 and 60. The students who claimed 360 minutes (6 hours) and 300 minutes (5 hours) may have been exaggerating. (b) Women seem to generally study more (or claim to), as there are none that claim less than 60 minutes per night. The center (median) for women is 170; for men the median is 120 minutes. (c) Opinions will vary. 1.175 No to both questions; no summary can exactly describe a distribution that can include any number of values. 1.177 Simulation results will vary.
CHAPTER 2 2.1 (a) The 30 students. (b) Attendance and score on the final exam. (c) Score on the final is quantitative. Attendance is most likely quantitative: number of classes attended (or missed). 2.3 Cases: cups of Mocha Frappuccino. Variables: size and price (both quantitative). 2.5 (a) Tweets. (b) Click count and length of tweet are quantitative. Day of week and gender are categorical. Time of day could be quantitative (as hr:min) or categorical (if morning, afternoon, etc.). (c) Click count is the response. The others could all be potentially explanatory. 2.7 Answers will vary. Some possible variables are condition, number of pages, and binding type (hardback or paperback), in addition to purchase price and buyback price. Cases are the individual textbooks; one might be interested in predicting buyback price based on other variables. 2.9 (a) Temperatures are usually similar from one day to the next (recording temperatures at noon each day, for example). One variable that would help is whether a front (cold or warm) came through. (b) No relationship. These are different individuals. (c) Answers will vary. It’s possible that quality and price are related but not certain.
Answers to Odd-Numbered Exercises 2.11 Price per load looks right-skewed. Quality rating has two different clusters of values. Variable Rating
Mean StDev Min 43.88 10.77
PricePerLoad 14.21
5.99
Q1
26 33.5 5
Med
Q3 Max
47 51.5
10 13.5
17
61 30
2.13 (a) Divide each price by 100 to convert to dollars. (c) The only difference is the scaling of the x axis. 2.15 For example, a new variable might be the ratio of the 2010 and 2009 debts. 2.17 (a) All the liquid detergents are at the upper right, and the powder detergents are at the lower left. (b) Answers will vary. 2.19 (b) The overall pattern is linear and increasing. There is one possible outlier at the upper right, far from the other points. (c) The relationship is roughly linear, increasing, and moderately strong. (d) The baseball player represented by the point at the far right is not as strong in his dominant arm as other players. (e) Other than the one outlier, the relationship is approximately linear. 2.21 (a) Population should be the explanatory variable and college students the response. (b) The graph shows a strong, linear, increasing relationship with one high outlier in both undergraduates and population (California). 2.23 (b and c) The relationship is very strong, linear, and decreasing. (d) There do not appear to be any outliers. (e) The relationship is linear. 2.25 (a) The description is for variables that are positively related. (b) The response variable is plotted on the y axis, and the explanatory on the x axis. (c) A histogram shows the distribution of a single variable, not the relationship between two variables. 2.27 (b) The relationship is linear, increasing, and much stronger than the relationship between carbohydrates and percent alcohol. 2.29 (b) The plot is much more linear. 2.31 (a) Examine for a relationship. (b) Use high school GPA as explanatory and college GPA as response. (c) Use square feet as explanatory and rental price as response. (d) Use amount of sugar as explanatory and sweetness as response. (e) Use temperature yesterday at noon as explanatory and temperature today at noon as response. 2.33 (a) In general, we expect more intelligent children to be better readers, and less intelligent children to be weaker readers. The plot does show this positive association. (b) These four have moderate IQs but poor reading scores. (c) Roughly linear but weak (much scatter). 2.35 (b) The association is positive and linear. Overall, the relationship is strong, but it is stronger for women than for men. Male subjects generally have both greater lean body mass and higher metabolic rates than women.
A-5
2.37 (a) Both show fairly steady improvement. Women have made more rapid progress but have not improved since 1993, while men’s records may be dropping more rapidly in recent years. (b) The data support the first claim but do not seem to support the second. 2.39 (a) This is a linear transformation. Dollars 5 0 1 0.01 3 cents. (b) r 5 0.671. (c) They are the same. (d) Changing units does not change r. 2.41 (a) No linear relationship. (There could be a nonlinear relationship, though.) (b) Strongly linear and negative. (c) Weakly linear and positive. (d) Strongly linear and positive. 2.43 (a) r 5 0.905. (b) Correlation is a good summary for these data. The pattern is linear and appears to be strong. There is, however, one outlier at the upper right. 2.45 (a) r 5 0.984. (b) The correlation may be a good summary for these data because the scatterplot is strongly linear. California, however, is an outlier that strengthens the relationship (makes r closer to 1). (c) Eliminate California, Texas, Florida, and New York. r 5 0.971. Expanding the range of values can strengthen a relationship (if the new points follow the rest of the data). 2.47 (a) r 5 20.999. (b) Correlation is a good numerical summary here because the scatterplot is very strongly linear. (c) You must be careful; there can be a strong correlation between two variables even when the relationship is curved. 2.49 The correlation would be 1 in both cases. These are purely linear relationships. 2.51 r 5 0.521. 2.53 (a) r 5 20.730. (b) The relationship is curved; birthrate declines with increasing Internet use until about 40 Internet users per 100 people. After that, there is a steady overall birthrate. Correlation is not a good numerical summary for this relationship. 2.55 (a) r 5 61 for a line. (c) Leave some space above your vertical stack. (d) The curve must be higher at the right than at the left. 2.57 The correlation is r ⫽ 0.481. The correlation is greatly lowered by the one outlier. Outliers tend to have fairly strong effects on correlation; it is even stronger here because there are so few observations. 2.59 There is little linear association between research and teaching—for example, knowing a professor is a good researcher gives little information about whether she is a good or a bad teacher. 2.61 Both relationships are somewhat linear; GPA/IQ (r 5 0.634) is stronger than GPA/self-concept (r 5 0.542). The two students with the lowest GPAs stand out in both plots; a few others stand out in at least one plot. Generally speaking, removing these points raises r, except for the lower-left point in the self-concept plot.
A-6
Answers to Odd-Numbered Exercises
2.63 1.785 kilograms. 2.65 Expressed as percents, these fractions are 64%, 16%, 4%, 0%, 9%, 25%, and 81%. 2.67 The relationship is roughly linear. Bone strength in the dominant arm increases about 1.373 units for every unit increase in strength in the nondominant arm.
2.85 (a) 36. (b) When x increases one unit, y increases by 8 (the value of the slope). (c) The intercept is 12. 2.87 IQ and GPA: r1 ⫽ 0.634. Self-concept and GPA: r2 ⫽ 0.542. IQ does a slightly better job. 2.89 When x ⫽ x, yˆ ⫽ a ⫹ bx ⫽ 1y ⫺ bx2 ⫹ bx ⫽ y. 2.91 Scatterplots and correlations were found in Exercises 2.36 and 2.54. The regression equations
2.71 (a–c)
are Value 5 1073.87 1 (1.74 3 Debt), with r2 5 0.5%;
7
2.69 22.854 cm4兾1000.
7
Time
Count
Predicted
1
578
528.1
Difference
Squared difference
and Value 5 872.6 1 (5.695 3 Income), with r2 5 79.4%. 2.93 The residuals sum to 0.01.
49.9
2490.01
2.95 The residuals are 24.93, 25.09, 0.01, and 7.71. 2.97 (a–b)
3
317
378.7
261.7
3806.89
5
203
229.3
226.3
691.69
7
118
79.9
38.1
1451.61
(d) Count 5 500 2 (100 3 time) Time
Count
Difference
Squared difference
1
578
400
178
31,684
3
317
200
117
13,689
5
203
0
203
41,209
7
118
2200
318
101,124
Predicted
Value 5 2262.4 1 (4.966 3 Revenue), with r2 5 92.7%;
7
Count 5 602.8 2 (74.7 3 time)
2.73 yˆ 5 215,294.868 1 0.0533x, or, in context,
7
Students ⫽ ⫺15,294.868 ⫹ 0.0533Population. 2.75 (a) 304,505.13 students. (b) 299,247.47 students. (c) Including the states with the largest populations (and largest numbers of undergraduates) increases the estimate by about 5000 students.
7
2.77 (a) Students ⫽ 8491.907 ⫹ 0.048Population. (b) r 2 5 0.942. (c) About 94.2% of the variability in number of undergraduates is explained by the regression on population. (d) The numerical output does not tell us whether the relation is linear.
7
2.79 (a) Carbs ⫽ 2.606 ⫹ 1.789PercentAlcohol. (b) r2 5 0.271. 2.81 (a) All correlations are approximately 0.816 or 0.817, and the regression lines are yˆ ⫽ 3.000 ⫹ 0.500x. We predict yˆ ⫽ 8 when x ⫽ 10. (c) This regression formula is only appropriate for Set A. 2.83 (a) The added point is an outlier that does not follow the pattern of the rest of the data. It is an outlier in the x direction but not in y. (b) The new regression equation is yˆ 5 27.56 1 0.1031x. (c) r2 5 0.052. This added point is influential both to the regression equation (both the intercept and slope changed substantially from yˆ 5 17.38 1 0.6233x) and the correlation.
Time
LogCount
Predicted
Residual
1
6.35957
6.332444
3
5.75890
5.811208
5
5.31321
5.289972
0.023238
7
4.77068
4.768736
0.001944
0.027126 20.05231
2.99 (f) In the log scale, California is no longer an outlier anywhere, nor is it influential. 2.101 (c) One data point stands out in both graphs; it is West Virginia, with the largest positive residual. The next-largest positive residual belongs to Iowa. These do not seem to be influential. (d and e) Using the log data removes California as a potentially influential outlier. The data are more equally spread across the range. One possible disadvantage of using the log data is that explaining this to people could be difficult. 2.103 (a) If the line is pulled toward the influential point, the observation will not necessarily have a large residual. (b) High correlation is always present if there is causation. (c) Extrapolation is using a regression to make predictions for x-values outside the range of the data (here, using 20, for example). 2.105 Internet use does not cause people to have fewer babies. Possible lurking variables are economic status of the country, education levels, etc. 2.107 For example, a reasonable explanation is that the cause-and-effect relationship goes in the other direction: doing well makes students or workers feel good about themselves rather than vice versa. 2.109 The explanatory and response variables were “consumption of herbal tea” and “cheerfulness/health.” The most important lurking variable is social interaction; many of the nursing-home residents may have been lonely before the students started visiting. 2.111 (a) Drawing the “best line” by eye is a very inaccurate process. But with practice, you can get better.
Answers to Odd-Numbered Exercises
A-7
2.113 The plot should show a positive association when either group of points is viewed separately and should show a large number of bachelor’s degree economists in business and graduate degree economists in academia.
2.129 3.0% of Hospital A’s patients died, compared with 2.0% at Hospital B.
2.115 1278 met the requirements; 751 did not meet requirements.
2.133 For example, causation might be a negative association between the temperature setting on a stove and the time required to boil a pot of water (higher setting, less time). Common response might be a positive association between SAT scores and grade point average. Both of these will respond positively to a person’s IQ. An example of confounding might be a negative association between hours of TV watching and grade point average. Once again, people who are naturally smart could finish required work faster and have more time for TV; those who aren’t as smart could become frustrated and watch TV instead of doing homework.
2.117 Divide the cell count by the total for the table. 2.119 417兾974 5 0.4281 (which rounds to 43%). 2.121 (a) Drivers education course (yes/no) is the explanatory variable. The number of accidents is the response. (b) Drivers ed would be the column (x) variable, and number of accidents would be the row (y) variable. (c) There are 6 cells. For example, the first row, first column entry could be the number who took drivers ed and had 0 accidents. 2.123 (a) Age is the explanatory variable. “Rejected” is the response. With the dentistry available at that time, it’s reasonable to think that as a person got older, he would have lost more teeth. (b) ,20
20–25
25–30
30–35
35–40
.40
Yes
0.0002
0.0019
0.0033
0.0053
0.0086
0.0114
No
0.1761
0.2333
0.1663
0.1316
0.1423
0.1196
(c) Marginal distribution of “Rejected” Yes
No
0.03081
0.96919
2.131 In general, choose a to be any number from 0 to 200, and then all the other entries can be determined.
2.135 This is a case of confounding: the association between dietary iron and anemia is difficult to detect because malaria and helminths also affect iron levels in the body. 2.137 For example, students who choose the online course might have more self-motivation or better computer skills. 2.139 No; self-confidence and improving fitness could be a common response to some other personality trait, or high self-confidence could make a person more likely to join the exercise program. 2.141 Patients suffering from more serious illnesses are more likely to go to larger hospitals and may require more time to recuperate afterward. 2.143 People who are overweight are more likely to be on diets and so choose artificial sweeteners.
Marginal distribution of age ,20
20–25
25–30
30–35
35–40
.40
0.1763
0.2352
0.1696
0.1369
0.1509
0.1310
2.145 This is an observational study—students choose their “treatment” (to take or not take the refresher sessions). 2.147 (a) The tables are shown below. Female Titanic passengers
(d) The conditional distribution of Rejected given Age, because we have said Age is the explanatory variable. (e) In the table, note that all columns sum to 1. ,20
20–25
25–30
30–35
35–40
.40
Yes
0.0012
0.0082
0.0196
0.0389
0.0572
0.0868
No
0.9988
0.9918
0.9804
0.9611
0.9428
0.9132
Class
Survived
1
2
3
Total
139
94
106
339
Died
5
12
110
127
Total
144
106
216
466
Male Titanic passengers 2.125 Students with GPAs less than 2.0 are much more likely to enroll for 11 or fewer credits (68.5%). Students with GPAs above 3.0 are most likely to enroll for 15 or more credits (66.6%). 2.127 (a) 50.5% get enough sleep; 49.5% do not. (b) 32.2% get enough sleep; 67.8% do not. (c) Those who exercise more than the median are more likely to get enough sleep.
Class 1 Survived
2
3
Total
61
25
75
161
Died
118
146
418
682
Total
179
171
493
843
A-8
Answers to Odd-Numbered Exercises
2.149 (a) This is a negative relationship, mostly due to two outliers. (b) r 5 20.839. This would not be a good numerical summary for this relationship.
2.153 (a) The relationship is weakly increasing and linear. We almost seem to have two sets of data: five countries with high production and the rest. One country with Dwelling Permit Index approximately 225 (Canada) might be influential. (b) The equation is
7
2.155 A stacked bar graph clearly shows that offering the RDC service depends on size of the bank. Larger banks are much more likely to offer the service than smaller ones. 2.157 (a) The marginal totals are SsBL: 1688; SME: 911; AH: 801; Ed: 319; and Other: 857. By country, the marginal totals are Canada: 176; France: 672; Germany: 218; Italy: 321; Japan: 645; U.K.: 475; U.S.: 2069. (b) Canada: 0.0385; France: 0.1469; Germany: 0.0476; Italy: 0.0701; Japan: 0.1410; U.K.: 0.1038; U.S.: 0.4521. (c) SsBL: 0.3689; SME: 0.1991; AH: 0.1750; Ed: 0.0697; Other: 0.1873. 2.159 A school that accepts weaker students but graduates a higher-than-expected number of them would have a positive residual, while a school with a stronger incoming class but a lower-than-expected graduation rate would have a negative residual. It seems reasonable to measure school quality by how much benefit students receive from attending the school. 2.163 (a) The residuals are positive at the beginning and end, and negative in the middle. (b) The behavior of the residuals agrees with the curved relationship seen in Figure 2.34. 2.165 (a) The regression equation for predicting salary
7
from year is Salary ⫽ 41.253 ⫹ 3.9331 Year; for Year 25, the predicted salary is 139.58 thousand dollars, or about $139,600. (b) The log salary regression equation is
7
lnSalary ⫽ 3.8675 ⫹ 0.04832 Year. At Year 25 the predicted salary is e5.0755 5 160.052, or about $160,050.
2.169 Number of firefighters and amount of damage are common responses to the seriousness of the fire. 2.171 (b) The regression line PctCollEd ⫽ 4.033 ⫹ 0.906FruitVeg5 generally describes the relationship. There is one outlier at the upper right of the scatterplot (Washington, DC). (d) While the scatterplot and regression support a positive association between college degrees and eating fruits and vegetables, association is not causation. 2.173 The scatterplot of MOR against MOE shows a moderate positive linear association. The regression
7
Production ⫽ 110.96 ⫹ 0.0732 DwellPermit. (c) 122.672. (d) e 5 213.672. (e) r2 5 2.0%. Both indicate very weak relationships, but this is weaker.
2.167 (a) The regression equation is 2013Salary ⫽ 6523 ⫹ 0.97291 ⫻ 12012Salary2. (b) The residuals appear rather random, but we note that the largest positive residuals are on either end of the scatterplot. The largest negative residual is for the next-to-highest 2012–2013 salaried person.
7
2.151 (b) In Figure 2.33, we can see that the three territories have smaller proportions of their populations over 65 than the provinces. The two areas with the largest percents of the population under 15 are Nunavut and Northwest Territories.
(c) Although both predictions involve extrapolation, the second is more reliable because it is based on a linear fit to a linear relationship. (d) Interpreting relationships without a plot is risky.
7
(b) 96.53% of first-class females survived, 88.68% of second-class females survived, and 49.07% of thirdclass females survived. Survival depended on class. (c) 34.08% survival among first class, 14.62% survival among second class, and 15.21% survival among third class. Once again, survival depended on class. (d) Females overall had much higher survival rates than males.
equation is MOR ⫽ 2653 ⫹ 0.004742MOE; this regression explains r2 ⫽ 0.6217, or about 62% of the variation in MOR. So we can use MOE to get fairly good (though not perfect) predictions of MOR. 2.175 (a) Admit
Deny
Male
490
310
Female
400
300
(b) Males: 61.25% admitted. Females: 57.14% admitted. (c) Business school: 66.67% of males, 66.67% of females. Law school: 45% of males, 50% of females. (d) Most male applicants apply to the business school, where admission is easier. More women apply to the law school, which is more selective. 2.177 If we ignore “year,” Department A teaches 61.5% small classes while Department B teaches 39.6% small classes. However, in upper-level classes, 77.5% of A’s classes are small and 83.3% of B’s classes are small. Department A has 77.8% of its classes as upper-level, while only 33.96% of B’s classes are upper level.
CHAPTER 3 3.1 Answers will vary. One possibility is that the friend has a weak immune system. 3.3 Answers will vary, but the individual’s denial is clearly insufficient evidence to conclude that he did not use performance enhancing drugs.
Answers to Odd-Numbered Exercises 3.5 For example, who owns the web site? Do they have data to back up this statement, and if so, what was the source of those data?
A-9
3.31 Yes; each customer (who returns) will get both treatments.
3.7 Available data are from prior studies. They might be from either observational studies or experiments.
3.33 (a) Shopping patterns may differ on Friday and Saturday. (b) Responses may vary in different states. (c) A control is needed for comparison.
3.9 This is not an experiment (running them until the batteries die is not assigning treatments) or a sample survey.
3.35 For example, new employees should be randomly assigned to either the current program or the new one.
3.11 This is an experiment. Explanatory variable: apple form (juice or whole fruit); response variable: how full the subject felt. 3.13 The data were likely from random samples of cans of tuna.
3.37 (a) Factors: calcium dose and vitamin D dose. There are nine treatments (each calcium/vitamin D combination). (b) Assign 20 students to each group, with 10 of each gender. (d) Placebo is the 0 mg calcium, 0 mg vitamin D treatment.
3.15 (a) Anecdotal data. (b) This is a sample survey but likely biased. (c) Still a survey but random. (d) Answers will vary.
3.39 There are nine treatments. Choose the number of letters in each group, and send them out at random times over several weeks.
3.17 In Exercise 3.14: extra milk and no extra milk. In Exercise 3.16: no pills, pills without echinacea, pills with echinacea but subjects weren’t told, and pills with echinacea that were labeled as containing echinacea.
3.41 (a) Population 5 1 to 150, sample size 5 25, then click “Reset” and “Sample.” (b) Without resetting, click “Sample” again. (c) Continue to “Sample” from those remaining.
3.19 Treatments are the four coaching types that were actively assigned to the experimental units (subjects), who were 204 people. Factor is type of coaching, with four levels: increase fruit and vegetable intake and physical activity, decrease fat and sedentary leisure, decrease fat and increase physical activity, and increase fruit and vegetable intake and decrease sedentary leisure. Response is the measure of diet and exercise improvement after three weeks. This experiment had a very high completion rate.
3.43 Design (a) is an experiment, while (b) is an observational study; with the first, any difference in colon health between the two groups could be attributed to the treatment (bee pollen or not).
3.21 With 719 subjects, randomly assign 180 to each of the first three treatments and 179 to the last (echinacea with the labeled bottle). Afterward, compare diet and exercise improvement. 3.23 Answers will vary due to software. 3.25 (a) Experimental units were the 30 students. They are human, so we can use “subjects.” (b) Only one “treatment,” so not comparative. One possibility is to randomly assign half to the online homework system and half to “standard” homework. (c) One possibility is grade on an exam over the material from that month. 3.27 (a) Experimental units (subjects): people who go to the web site. Treatments: description of comfort or showing discounted price. Response variable: shoe sales. (b) Comparative, because of two treatments. (c) One option to improve: randomly assign morning and afternoon treatments. (d) Placebo (no special description or price) could give a “baseline” sales figure. 3.29 Starting on line 101, using 1 to 5 as morning and 6 to 0 as afternoon for comfort description, we have 19223 95034: comfort description is displayed in the afternoon on Days 2, 6, and 8 and in the morning on the other days.
3.45 (a) Randomly assign half the girls to get highcalcium punch; the other half will get low-calcium punch. Observe how each group processes the calcium. (b) Half receive high-calcium punch first; the rest get low-calcium punch first. For each subject, compute the difference in the response variable for each level. Matched pairs designs give more precise results. (c) The first five subjects are 35, 39, 16, 04, and 26. 3.47 Answers will vary. For example, the trainees and experienced professionals could evaluate the same water samples. 3.49 Population: forest owners from this region. Sample: the 348 returned questionnaires. Response rate: 348兾772 5 45%. Additionally, we would like to know the sample design (among other things). 3.51 Answers will vary depending on use of software or starting point in Table B. 3.53 (a) Season ticket holders. (b) 98 responses received. (c) 98兾150 5 65.3%. (d) 34.7%. (e) One possibility is to offer incentives (free hotdog?). 3.55 (a) Answers will vary depending on use of software. (b) Software is usually more efficient than Table B. 3.57 (a) The population is all items/individuals of potential interest. (b) Many people probably will not realize that dihydrogen monoxide is water. (c) In a public setting, few people will admit to cheating.
A-10
Answers to Odd-Numbered Exercises
3.59 Population: all local businesses. Sample: the 72 returned questionnaires. Nonresponse rate: 55%. 3.61 Note that the numbers add to 100% down the columns; that is, 39% is the percent of Fox viewers who are Republicans, not the percent of Republicans who watch Fox.
3.87 Answers will vary due to computer simulation. You should have a mean close to 0 and a standard deviation close to 1. 3.89 (a) The larger sample size should have a smaller standard deviation (less variability).
3.63 Labeled in alphabetical order, using line 126: 31 (Village Manor), 08 (Burberry), 19 (Franklin Park), 03 (Beau Jardin), and 25 (Pemberley Courts).
3.91 (a) Population: Students at four-year colleges in the U.S. Sample: 17,096 students. (b) Population: restaurant workers. Sample: 100 workers. (c) Population: 584 longleaf pine trees. Sample: 40 trees.
3.65 Population 5 1 to 200, sample size 5 20, then click “Reset” and “Sample.” Selections will vary.
3.93 The histograms should be centered at about 0.6 with standard deviation about 0.1.
3.67 Labeling the tracts in numerical order from 01 (block 1000) to 44 (block 3025), the selected random digits are labeled 21 (block 3002), 37 (block 3018), 18 (block 2011), 44, 23, 19, 10, 33, and 31.
3.95 Answers will vary due to computer simulation.
3.69 Answers will vary. Beginning on line 110, from Group 1 (labeled 1 through 6), select 3 and 4. Continuing from there, from Group 2 (labeled 01 through 12), select 08 and 05. Continuing from there, from Group 3 (labeled 01 through 26), select 13, 09, and 04.
3.97 (a) Nonscientists might have different viewpoints and raise different concerns from those considered by scientists. 3.99 Answers will vary. This question calls for a reasoned opinion. 3.101 Answers will vary. This question calls for a reasoned opinion.
3.71 Each student has chance 1兾40 of being selected, but the sample is not an SRS, because the only possible samples have exactly one name from the first 40, one name from the second 40, and so on.
3.103 No. Informed consent needs clear information on what will be done.
3.73 Number the parcels 01 through n for each forest type. Using Table B, select Climax 1: 05, 16, 17, 40, and 20; Climax 2: 19, 45, 05, 32, 19, and 41; Climax 3: 04, 19, and 25; Secondary: 29, 20, 43, and 16.
3.107 Answers will vary. This question calls for a reasoned opinion.
3.75 Each individual has a 1-in-8 chance of being selected. 3.77 (a) Households without telephones or with unlisted numbers. Such households would likely be made up of poor individuals, those who choose not to have phones, and those who do not wish to have their phone number published. Households with only cell phones are also not included. (b) Those with unlisted numbers. Or only cell phones. 3.79 The female and male students who responded are the samples. The populations are all college undergraduates (males and females) who could be judged to be similar to the respondents. This report is incomplete; a better one would give numbers about who responded, as well as the actual response rate. 3.81 The larger sample would have less sampling variability. 3.83 Answers will vary due to computer simulation. You should have a mean close to 0.5 and a standard deviation close to 0.204. 3.85 Answers will vary due to computer simulation. You should have a mean close to 0.5 and a standard deviation close to 0.08.
3.105 Answers will vary. This question calls for a reasoned opinion.
3.109 The samples should be randomly ordered for analysis. 3.111 Interviews conducted in person cannot be anonymous. 3.113 Answers will vary. This question calls for a reasoned opinion. 3.115 (a) Informed consent requires informing respondents about how the data will be used, how long the survey will take, etc. 3.117 Answers will vary. This question calls for a reasoned opinion. 3.119 Answers will vary. This question calls for a reasoned opinion. 3.121 (a) You need information about a random selection of his games, not just the ones he chooses to talk about. (b) These students may have chosen to sit in the front; all students should be assigned to their seats. 3.123 This is an experiment because treatments are assigned. Explanatory variable: price history (steady or fluctuating). Response variable: price the subject expects to pay.
Answers to Odd-Numbered Exercises 3.127 Randomly choose the order in which the treatments (gear and steepness combination) are tried. 3.129 (a) One possibility: full-time undergraduate students in the fall term on a list provided by the registrar. (b) One possibility: a stratified sample with 125 students from each class rank. (c) Nonresponse might be higher with mailed (or emailed) questionnaires; telephone interviews exclude some students and may require repeated calling for those who do not answer; face-to-face interviews might be too costly. The topic might also be subject to response bias. 3.131 Use a block design: separate men and women, and randomly allocate each gender among the six treatments.
A-11
4.21 (a) Not equally likely (check the web). (b) Equally likely. (c) This could depend on the intersection; is the turn onto a one-way street? (d) Not equally likely. 4.23 (a) The probability that both of two disjoint events occur is 0. (b) Probabilities must be no more than 1. (c) P(Ac) 5 0.65. 4.25 There are 6 possible outcomes: S 5 {link 1, link 2, link 3, link 4, link 5, leave}. 4.27 (a) 0.172. (b) 0.828. 4.29 (a) 0.03, so the sum equals 1. (b) 0.55. 4.31 (a) The probabilities sum to 2. (b) Legitimate (for a nonstandard deck). (c) Legitimate (for a nonstandard die).
3.133 CASI will typically produce more honest responses to embarrassing questions.
4.33 (a) 0.28. (b) 0.88.
3.135 Answers will vary. This question calls for a reasoned opinion.
4.35 Take each blood type probability and multiply by 0.84 and by 0.16. For example, the probability for A-positive blood is (0.42)(0.84) 5 0.3528.
3.137 Answers will vary. This question calls for a reasoned opinion.
4.37 (a) 0.006. (b) 0.001.
CHAPTER 4
4.41 Observe that P(A and Bc) 5 P(A) 2 P(A and B) 5 P(A) 2 P(A)P(B).
4.1 The proportion of heads is 0.5. In this case, we did get exactly 10 heads (this will NOT happen every time).
4.43 (a) Either B or O. (b) P(B) 5 0.75, and P(O) 5 0.25.
4.3 (a) This is random. We can discuss the probability (chance) that the temperature would be between 30 and 35 degrees, for example. (b) Depending on your school, this is not random. At my university, all student IDs begin with 900. (c) This is random. The probability of an ace in a single draw is 4y52 if the deck is well shuffled.
4.39 0.5160.
4.45 (a) 0.25. (b) 0.015625; 0.140625. 4.47 Possible values: 0, 1, 2. Probabilities: 1y4, 1y2, 1y4. 4.49 x P(X 5 x)
1
2
3
4
5
6
0.05
0.05
0.13
0.26
0.36
0.15
4.5 Answers will vary depending on your set of 25 rolls. 4.7 If you hear music (or talking) one time, you will almost certainly hear the same thing for several more checks after that. 4.9 The theoretical probability is 0.5177. What were the results of your “rolls”? 4.11 One possibility: from 0 to ____ hours (the largest number should be big enough to include all possible responses). 4.13 0.80 (add the probabilities of the other four colors and subtract from 1). 4.15 0.681. 4.17 1y4, or 0.25. 4.19 (a) S 5 {Yes, No}. (b) S 5 {0, 1, 2, . . . , n} where n is large enough to include a really busy tweeter. (c) S 5 [18, 75] is one possibility. This is given as an interval because age is a continuous variable. (d) S 5 {Accounting, Archeology, . . .}. This list could be very long.
4.51 (a) 0.23. (b) 0.62. (c) 0. 4.53 (a) Discrete random variables. (b) Continuous random variables can take values from any interval. (c) Normal random variables are continuous. 4.55 (a) P(T) 5 0.19. (b) P(TTT) 5 0.0069, P(TTT c) 5 P(TT cT) 5 P(T cTT) 5 0.0292, P(TT cT c) 5 P(T cTT c) 5 P(T cT cT) 5 0.1247, and P(T cT cT c) 5 0.5314. (c) P(X 5 3) 5 0.0069, P(X 5 2) 5 0.0876, P(X 5 1) 5 0.3741, and P(X 5 0) 5 0.5314. 4.57 (a) Continuous. (b) Discrete. (c) Discrete. 4.59 (a) Note that, for example, “(1, 2)” and “(2, 1)” are distinct outcomes. (b) 1y36. (c) For example, four pairs add to 5, so P(X 5 5) 5 4y36 5 1y9. (d) 2y9. (e) 5y6. 4.61 (b) P(X $ 1) 5 0.9. (c) “No more than two nonword errors.” P(X # 2) 5 0.7; P(X , 2) 5 0.4. 4.63 (a) The height should be 1y2. (b) 0.8. (c) 0.6. (d) 0.525. 4.65 Very close to 1.
A-12
Answers to Odd-Numbered Exercises
4.67 Possible values: $0 and $5. Probabilities: 0.5 and 0.5. Mean: $2.50.
4.115 (a) The four entries are 0.2684, 0.3416, 0.1599, 0.2301. (b) 0.5975.
4.69 mY 5 68.
4.117 For example, the probability of selecting a female student is 0.5717; the probability that she comes from a 4-year institution is 0.5975.
4.71 s2X ⫽ 2.16 and sX 5 1.47. 4.73 As the sample size gets larger, the standard deviation decreases. The mean for 1000 will be much closer to m than the mean for 2 (or 100) observations. 4.75
s2X
⫽ 1.45 and sX 5 1.204.
4.119 P1A 0 B2 ⫽ 0.3142. If A and B were independent, then P(A 0 B) would equal P(A). 4.121 (a) P(Ac) 5 0.69. (b) P(A and B) 5 0.08. 4.123 1.
4.77 (a) 202. (b) 198. (c) 60. (d) 220. (e) 2140.
4.125 (a) P(B 0 C) 5 1y3. P(C 0 B) 5 0.2.
4.79 Mean 5 2.2 servings.
4.127 (a) P1M2 ⫽ 0.3959. (b) P1B 0 M2 ⫽ 0.6671. (c) P1M2 P1B2 ⫽ 0.2521, so these are not independent.
4.81 0.373 aces. 4.83 (a) $85.48. (b) This is larger; the negative correlation decreased the variance. 4.85 The exercise describes a positive correlation between calcium intake and compliance. Because of this, the variance of total calcium intake is greater than the variance we would see if there were no correlation. 4.87 (a) m 5 s 5 0.5. (b) m4 5 2 and s4 5 1. 4.89 (a) Not independent. (b) Independent. 4.91 If 1 of the 10 homes were lost, it would cost more than the collected premiums. For many policies, the average claim should be close to $300. 4.93 (a) 0.99749. (b) $623.22. 4.95 1y2 5 0.5. 4.97 2y48 5 1y24. 4.99 The addition rule for disjoint events. 4.101 With 23 cards seen, there are 29 left to draw from. The four probabilities are 45y406, 95y406, 95y406, and 171y406. 4.103 (a) 0.8. (b) 0.2. 4.105 (a) 5y6 5 0.833. 4.107 (a) A 5 5 to 10 years old, B 5 11 to 13 years old, C 5 adequate calcium intake, I 5 inadequate calcium intake. (b) P(A) 5 0.52, P(B) 5 0.48, P(I 0 A) 5 0.18, P(I 0 B) 5 0.57. (c) P(I) 5 0.3672. 4.109 Not independent. P(I 0 A) 5 0.18, P(I 0 B) 5 0.57. These are different. 4.111 (a) 0.16. (b) 0.22. (c) 0.38. (d) For (a) and (b), use the addition rule for disjoint events; for (c), use the addition rule, and note that S cand E c 5 (S or E)c. 4.113 0.73; use the addition rule.
4.129 (a) Her brother has allele type aa, and he got one allele from each parent. (b) P(aa) 5 0.25, P(Aa) 5 0.5, P(AA) 5 0.25. (c) P(AA 0 not aa) 5 1y3, P(Aa 0 not aa) 5 2y3. 4.131 0.9333. 4.133 Close to mX 5 1.4. 4.135 (a) Possible values 2 and 14, with probabilities 0.4 and 0.6, respectively. (b) mY 5 9.2 and sY ⫽ 5.8788. (c) There are no rules for a quadratic function of a random variable; we must use definitions. 4.137 (a) P(A) 5 1y36 and P(B) 5 15y36. (b) P(A) 5 1y36 and P(B) 5 15y36. (c) P(A) 5 10y36 and P(B) 5 6y36. (d) P(A) 5 10y36 and P(B) 5 6y36. 4.139 For example, if the point is 4 or 10, the expected gain is (1y3)(120) 1 (2y3)(210) 5 0. 4.141 (a) All probabilities are greater than or equal to 0, and their sum is 1. (b) 0.61. (c) Both probabilities are 0.39. 4.143 0.005352. 4.145 0.6817. 4.147 P(no point) 5 1y3. The probability of winning (losing) an odds bet is 1y36 (1y18) on 4 or 10, 2y45 (1y15) on 5 or 9, 25y396 (5y66) on 6 or 8. 4.149 0.1622. 4.151 P(Y , 1y3 0 Y . X) 5 1y9.
CHAPTER 5 5.1 Population: iPhone users. Statistic: a median of 108 apps per device. Likely values will vary. 5.3 mx ⫽ 420, sx ⫽ 1. 5.5 About 95% of the time, x is between 181 and 189. 5.7 (a) Each sample size has mx ⫽ 1. For n 5 2, sx ⫽ 0.707. For n 5 10, sx ⫽ 0.316. For n 5 25, sx ⫽ 0.2.
Answers to Odd-Numbered Exercises 5.9 (a) The standard deviation for n 5 10 will be sx ⫽ 20兾 210. (b) Standard deviation decreases with increasing sample size. (c) mx always equals m. 5.11 (a) m 5 125.5. (b) Answers will vary. (c) Answers will vary. (d) The center of the histogram represents an average of averages. 5.13 (a) Both populations are smartphone users. They likely are comparable. (b) Excluding those with no apps will increase the median because you are eliminating individuals. 5.15 (a) Larger. (b) We need sx ⱕ 0.085. (c) The smallest sample size that will fit this criterion is n 5 213. 5.17 mx ⫽ 250. sx ⫽ 0.25. 5.19 (b) To be more than 1 ml away from the target value means the volume is less than 249 or more than 251. Using symmetry, P 5 2P(X , 249) 5 2P(z , 22) 5 2(0.0228) 5 0.0456. (c) P 5 2P(X , 249) 5 2P(z , 24) ¯ 0. (Software gives 0.00006.) 5.21 (a) x is not systematically higher than or lower than m. (b) With large samples, x is more likely to be close to m. 5.23 (a) mx ⫽ 0.3. sx ⫽ 0.08. (b) 0.0062. (c) n 5 100 is a large enough sample to be able to use the central limit theorem. 5.25 (a) 0.0668. (b) 0.0047. 5.27 134.5 mgydl. 5.29 0.0051. 5.31 (a) N(0.5, 0.332). (b) 0.0655. Software gives a probability of 0.0661. 5.33 (a) y has a N1mY, sY兾 2m2 distribution, and x has a N1mX, sX兾 2n2 distribution. (b) y ⫺ x has a Normal distribution with mean mY 2 mX and standard deviation 2s 2y ⫹ s 2x. 5.35 n 5 1965. X 5 0.48 3 1965 5 943. pˆ ⫽ 0.48. 5.37 (a) n 5 1500. (b) Answers and reasons will vary. (c) If the choice is “Yes,” X 5 1025. (d) For “Yes,” pˆ ⫽ 1025兾1500 ⫽ 0.683. 5.39 B(10, 0.5). 5.41 (a) P(X 5 0) 5 0.0467 and P(X $ 4) 5 0.1792. (b) P(X 5 6) 5 0.0467 and P(X # 2) 5 0.1792. (c) The number of “failures” in the B(6, 0.4) distribution has the B(6, 0.6) distribution. With 6 trials, 0 successes is equivalent to 6 failures, and 4 or more successes is equivalent to 2 or fewer failures. 5.43 (a) 0.9953. (b) 0.8415. Using software gives 0.8422. 5.45 (a) 0.1563. (b) 0.7851.
A-13
5.47 (a and b) The coin is fair. The probabilities are still P(H) 5 P(T) 5 0.5. Separate flips are independent (coins have no “memory”), so regardless of the results of the first four tosses, the fifth is equally likely to be a head or a tail. (c) The parameters for a binomial distribution are n and p. (d) This is best modeled with a Poisson distribution. 5.49 (a) A B(200, p) distribution seems reasonable for this setting (even though we do not know what p is). (b) This setting is not binomial; there is no fixed value of n. (c) A B(500, 1y12) distribution seems appropriate for this setting. (d) This is not binomial because separate cards are not independent. 5.51 (a) The distribution of those who say they have stolen something is B(10, 0.2). The distribution of those who do not say they have stolen something is B(10, 0.8). (b) X is the number who say they have stolen something. P(X $ 4) 5 1 2 P(X # 3) 5 0.1209. 5.53 (a) Stole: m 5 2; did not steal: m 5 8. (b) s ⫽ 1.265. (c) If p 5 0.1, s ⫽ 0.949. If p 5 0.01, s ⫽ 0.315. As p gets smaller, the standard deviation becomes smaller. 5.55 (a) P(X # 7) 5 0.0172 and P(X # 8) 5 0.0566, so 7 is the largest value of m. (b) P(X # 5) 5 0.0338 and P(X # 6) 5 0.0950, so 5 is the largest value of m. (c) The probability will decrease. 5.57 The count of 5s among n random digits has a binomial distribution with p 5 0.1. (a) 0.4686. (b) m 5 4. 5.59 (a) n 5 4, p 5 0.7. (b) x P(X 5 x)
0
1
2
3
4
0.0081
0.0756
0.2646
0.4116
0.2401
(c) m ⫽ 410.72 ⫽ 2.8, and s ⫽ 2410.7211 ⫺ 0.72 ⫽ 0.9165. 5.61 (a) Because (0.7)(300) 5 210 and (0.3)(300) 5 90, the approximate distribution is 10.7210.32 ⫽ 0.0265b. P10.67 ⬍ pˆ ⬍ 0.732 ⫽ B 300 0.7416 (Software gives 0.7424). (b) If p 5 0.9, the distribution of pˆ is approximately N(0.9, 0.0173). P10.87 ⬍ pˆ ⬍ 0.932 ⫽ 0.9164 (Software gives 0.9171). (c) As p gets closer to 1, the probability of being within 60.03 of p increases. pˆ ⬃ N a0.7,
5.63 (a) The mean is m 5 p 5 0.69, and the standard deviation is s ⫽ 2p11 ⫺ p2兾n ⫽ 0.0008444. (b) m 6 2s gives the range 68.83% to 69.17%. (c) This range is considerably narrower than the historical range. In fact, 67% and 70% correspond to z 5 223.7 and z 5 11.8. 5.65 (a) pˆ ⫽ 0.28. (b) 0.0934 using Table A. Software gives 0.0927 without rounding intermediate values. (c) Answers will vary.
A-14
Answers to Odd-Numbered Exercises
5.67 (a) p 5 1y4 5 0.25. (b) P(X $ 10) 5 0.0139. (c) m 5 np 5 5 and s ⫽ 23.75 ⫽ 1.9365 successes. (d) No. The trials would not be independent. 5.69 (a) X, the count of successes, has the B(900, 1y5) distribution, with mean m 5 180 and s 5 12 successes. (b) For pˆ , the mean is mpˆ ⫽ p ⫽ 0.2 and spˆ ⫽ 0.01333. ˆ ⬎ 0.242 ⫽ 0.0013. (d) 208 or more successes. (c) P1 p 5.71 (a) 0.1788. (b) 0.0594. (c) 400. (d) Yes.
6.13 Margins of error: 17.355, 12.272, 8.677, and 6.136; interval width decreases with increasing sample size. 6.15 (a) She did not divide the standard deviation by 2500 ⫽ 22.361. (b) Confidence intervals concern the population mean. (c) 0.95 is a confidence level, not a probability. (d) The large sample size affects the distribution of the sample mean (by the central limit theorem), not the individual ratings.
5.73 Y has possible values 1, 2, 3, . . . P(first ⱓ appears on toss k) 5 (5y6)k21(1y6).
6.17 (a) The margin of error is 0.244; the interval is 5.156 to 5.644. (b) The margin of error is 0.321; the interval is 5.079 to 5.721.
5.75 (a) m 5 50. (b) The standard deviation is s ⫽ 250 ⫽ 7.071. P1X ⬎ 602 ⫽ 0.0793. Software gives 0.0786.
6.19 Margin of error, 2.29 Uyl. Interval, 10.91 to 15.49 Uyl.
5.77 (a) x has (approximately) an N(123 mg, 0.04619 mg) distribution. (b) P1x ⱖ 1242 is essentially 0. 5.79 (a) Approximately Normal with mean mx ⫽ 2.13 and standard deviation sx ⫽ 0.159. (b) P1x ⬍ 22 ⫽ 0.2061. Software gives 0.2068. (c) Yes, because n 5 140 is large. 5.81 0.0034. 5.83 If the carton weighs between 755 and 830 g, then the average weight of the 12 eggs must be between 755兾12 ⫽ 62.92 and 830兾12 ⫽ 69.17 g. The distribution of the mean weight is N166, 6兾 212 ⫽ 1.7322. P162.92 ⬍ x ⬍ 69.172 ⫽ 0.9288. 5.85 (a) He needs 14.857 (really 15) wins. (b) m 5 13.52 and s ⫽ 3.629. (c) Without the continuity correction, P1X ⱖ 152 ⫽ 0.3409. With the continuity correction, we have P1X ⱖ 14.52 ⫽ 0.3936. The continuity correction is much closer. 5.87 (a) pˆ F is approximately N(0.82, 0.01921), and pˆ M is approximately N(0.88, 0.01625). (b) When we subtract two independent Normal random variables, the difference is Normal. The new mean is the difference between the two means (0.88 2 0.82 5 0.06), and the new variance is the sum of the variances (0.000369 1 0.000264 5 0.000633), so pˆ M ⫺ pˆ F is approximately N(0.06, 0.02516). (c) 0.0087 (software: 0.0085). 5.89 P1Y ⱖ 2002 ⫽ P1Y兾500 ⱖ 0.42 ⫽ P1Z ⱖ 4.562 ⫽ 0.
6.21 Scenario A has a smaller margin of error; less variability in a single class rank. 6.23 (a) 618.98. (b) 618.98. 6.25 No; this is a range of values for the mean rent, not for individual rents. 6.27 (a) 11.03 to 11.97 hours. (b) No; this is a range of values for the mean time spent, not for individual times. (c) The sample size is large (n 5 1200 students surveyed). 6.29 (a) Not certain (only 95% confident). (b) We obtained the interval 86.5% to 88.5% by a method that gives a correct result 95% of the time. (c) About 0.51%. (d) No (only random sampling error). 6.31 x ⫽ 18.3515 kpl; the margin of error is 0.6521 kpl. 6.33 n 5 73. 6.35 No; confidence interval methods can be applied only to an SRS. 6.37 (a) 0.7738. (b) 0.9774. 6.39 H0: m 5 1.4 gycm2; Ha: m ⬆ 1.4 g兾cm2. 6.41 P 5 0.0702 (Software gives 0.0703). 6.43 (a) 1.645. (b) z . 1.645. 6.45 (a) z 5 1.875. (b) P 5 0.0301 (Software gives 0.0304). (c) P 5 0.0602 (Software gives 0.0608). 6.47 (a) No. (b) Yes.
CHAPTER 6
6.49 (a) Yes. (b) No. (c) To reject, we need P , a.
6.1 sx ⫽ $0.40.
6.51 (a) P 5 0.031 and P 5 0.969. (b) We need to know whether the observed data (for example, x) are consistent with Ha. (If so, use the smaller P-value.)
6.3 $0.80. 6.7 The margin of error will be halved. 6.9 n 5 285. 6.11 The students who did not respond are (obviously) not represented in the results. They may be more (or less) likely to use credit cards.
6.53 (a) Population mean, not sample mean. (b) H0 should be that there is no change. (c) A small P-value is needed for significance. (d) Compare P, not z, with a. 6.55 (a) H0: m 5 77; Ha: m ⬆ 77. (b) H0: m 5 20 seconds; Ha: m . 20 seconds. (c) H0: m 5 880 ft2; Ha: m , 880 ft2.
Answers to Odd-Numbered Exercises
A-15
6.57 (a) H0: m 5 $42,800; Ha: m . $42,800. (b) H0: m 5 0.4 hr; Ha: m ⬆ 0.4 hr.
6.93 The first test was barely significant at a 5 0.05, while the second was significant at any reasonable a.
6.59 (a) P 5 0.9545. (b) P 5 0.0455. (c) P 5 0.0910.
6.95 A significance test answers only question (b).
6.61 P 5 0.09 means there is some evidence for the wage decrease, but it is not significant at the a 5 0.05 level.
6.97 (a) The differences observed might occur by chance even if SES had no effect. (b) This tells us that the statistically insignificant test result did not occur merely because of a small sample size.
6.63 The difference was large enough that it would rarely arise by chance. Health issues related to alcohol use are probably discussed in the health and safety class. 6.65 The report can be made for public school students but not for private school students. Not finding a significant increase is not the same as finding no difference. 6.67 z ⫽ 4.14, so P 5 0.00003 (for a two-sided alternative). 6.69 H0: m 5 100; Ha: m ⬆ 100; z 5 5.75; significant (P , 0.0001). 6.71 (a) z 5 2.13, P 5 0.0166. (b) The important assumption is that this is an SRS. We also assume a Normal distribution, but this is not crucial provided there are no outliers and little skewness.
6.99 (a) P 5 0.2843. (b) P 5 0.1020. (c) P 5 0.0023. 6.101 With a larger sample, we might have significant results. 6.107 n should be about 100,000. 6.109 Reject the fifth (P 5 0.002) and eleventh (P , 0.002), because the P-values are both less than 0.05y12 5 0.0042. 6.111 Larger samples give more power. 6.113 Higher; larger differences are easier to detect. 6.115 (a) Power decreases. (b) Power decreases. (c) Power increases.
6.73 (a) H0: m 5 0 mpg; Ha: m ⬆ 0 mpg, where m is the mean difference. (b) z ⫽ 4.07, which gives a very small P-value.
6.117 Power: about 0.99.
6.75 (a) H0: m 5 0.61 mg; Ha: m . 0.61 mg. (b) Yes. (c) No.
6.121 (a) Hypotheses: “subject should go to college” and “subject should join workforce.” Errors: recommending college for someone who is better suited for the workforce, and recommending work for someone who should go to college.
6.77 x ⫽ 0.8 is significant, but 0.7 is not. Smaller a means that x must be farther away. 6.79 x ⱖ 0.3 will be statistically significant. With a larger sample size, x close to m0 will be significant. 6.81 Changing to the two-sided alternative multiplies each P-value by 2.
6.119 Power: 0.4641.
6.123 (a) For example, if m is the mean difference in scores, H0: m 5 0; Ha: m ⬆ 0. (b) No. (c) For example: Was this an experiment? What was the design? How big were the samples? 6.125 (a) For boys:
x
P
x
P
0.1
0.7518
0.6
0.0578
Energy (kJ)
0.2
0.5271
0.7
0.0269
Protein (g)
0.3
0.3428
0.8
0.0114
Calcium (mg)
315.33 to 332.87
0.4
0.2059
0.9
0.0044
0.5
0.1139
1
0.0016
Energy (kJ)
2130.7 to 2209.3
6.83 Something that occurs “fewer than 1 time in 100 repetitions” must also occur “fewer than 5 times in 100 repetitions.” 6.85 Any z with 2.576 , |z| , 2.807. 6.87 P . 0.25. 6.89 0.05 , P , 0.10; P 5 0.0602. 6.91 To determine the effectiveness of alarm systems, we need to know the percent of all homes with alarm systems and the percent of burglarized homes with alarm systems.
2399.9 to 2496.1 24.00 to 25.00
(b) For girls: Protein (g) Calcium (mg)
21.66 to 22.54 257.70 to 272.30
(c) The confidence interval for boys is entirely above the confidence interval for girls for each food intake. 6.129 (a) 4.61 to 6.05 mgydl. (b) z 5 1.45, P 5 0.0735; not significant. 6.131 (b) 26.06 to 34.74 mgyl. (c) z 5 2.44, P 5 0.0073. 6.133 (a) Under H0, x has an N(0%, 5.3932%) distribution. (b) z 5 1.28, P 5 0.1003. (c) Not significant.
A-16
Answers to Odd-Numbered Exercises
6.135 It is essentially correct. 6.137 Find x, then take x ⫾ 1.9614兾 2122 ⫽ x ⫾ 2.2632. 6.139 Find x, then compute z ⫽ 1x ⫺ 232兾 14兾 2122. Reject H0 based on your chosen significance level.
CHAPTER 7 7.1 (a) $13.75. (b) 15. 7.3 $570.70 to $629.30. 7.5 (a) Yes. (b) No. 7.7 4.19 to 10.14 hours per month. 7.9 Using t* 5 2.776 from Table D: 0.685 to 22.515. Software gives 0.683 to 22.517. 7.11 The sample size should be sufficient to overcome any non-Normality, but the mean m might not be a useful summary of a bimodal distribution. 7.13 The power is about 0.9192. 7.15 The power is about 0.9452. 7.17 (a) t* 5 2.201. (b) t* 5 2.086. (c) t* 5 1.725. (d) t* decreases with increasing sample size and increases with increasing confidence. 7.19 t* 5 1.753 (or 21.753). 7.21 For the alternative m , 0, the answer would be the same (P 5 0.034). For the alternative m . 0, the answer would be P 5 0.966. 7.23 (a) df 5 26. (b) 1.706 , t , 2.056. (c) 0.05 , P , 0.10. (d) t 5 2.01 is not significant at either level. (e) From software, P ⫽ 0.0549. 7.25 It depends on whether x is on the appropriate side of m0. 7.27 (a) H0: m 5 4.7; Ha: m ⬆ 4.7. t 5 14.907 with 0.002 , P , 0.005 (software gives P 5 0.0045). (b) 4.8968% to 5.0566%. (c) Because our confidence interval is entirely within the range of 4.7% to 5.3%, it appears that Budweiser is meeting the required standards. 7.29 (a) H0: m 5 10; Ha: m , 10. (b) t ⫽ ⫺5.26, df 5 33, P , 0.0001. 7.31 (a) Distribution is not Normal; it has two peaks and one large value. (b) Maybe; we have a large sample but a small population. (c) 27.29 6 5.717, or 21.57 to 33.01 cm. (d) One could argue for either answer. 7.33 (a) Yes; the sample size is large. (b) t 5 22.115. Using Table D, we have 0.02 , P , 0.04, while software gives P 5 0.0381. 7.35 H0: m 5 45 versus Ha: m . 45. t ⫽ 5.457. Using df 5 49, P ¯ 0; with df 5 40, P , 0.0005.
7.37 (a) t 5 5.13, df 5 15, P , 0.001. (b) With 95% confidence, the mean NEAT increase is between 191.6 and 464.4 calories. 7.39 (a) H0: mc 5 md; Ha: mc ⬆ md. (b) t ⫽ 4.358, P ⫽ 0.0003; we reject H0. 7.41 (a) H0: m 5 925; Ha: m . 925. t ⫽ 3.27 (df 5 35), P ⫽ 0.0012. (b) H0: m 5 935; Ha: m . 935. t ⫽ 0.80, P ⫽ 0.2146. (c) The confidence interval includes 935 but not 925. 7.43 (a) The differences are spread from 20.018 to 0.020 g. (b) t 5 20.347, df 5 7, P 5 0.7388. (c) 20.0117 to 0.0087 g. (d) They may be representative of future subjects, but the results are suspect because this is not a random sample. 7.45 (a) H0: m 5 0; Ha: m . 0. (b) Slightly left-skewed; x ⫽ 2.5 and s 5 2.893. (c) t 5 3.865, df 5 19, P 5 0.00052. (d) 1.15 to 3.85. 7.47 For the sign test, P 5 0.0898; not quite significant, unlike Exercise 7.38. 7.49 H0: median 5 0; Ha: median ⬆ 0; P 5 0.7266. This is similar to the t test P-value. 7.51 H0: median 5 0; Ha: median . 0; P 5 0.0013. 7.53 Reject H0 if 0 x 0 ⱖ 0.00677. The power is about 7%. 7.55 n . 26. (The power is about 0.7999 when n 5 26.) 7.57 220.3163 to 0.3163; do not reject H0. 7.59 Using df 5 14, Table D gives 0.04 , P , 0.05. 7.61 SAS and SPSS give t 5 2.279, P 5 0.052. 7.63 (a) Hypotheses should involve m1 and m2. (b) The samples are not independent. (c) We need P to be small (for example, less than 0.10) to reject H0. (d) t should be negative to reject H0 with this alternative. 7.65 (a) No (in fact, P ⫽ 0.07712. (b) Yes 1P ⫽ 0.03852. 7.67 H0: mBrown 5 mBlue and Ha: mBrown . mBlue. t ⫽ 2.59. Software gives P ⫽ 0.0058. Table D gives 0.005 , P , 0.01. 7.69 The nonresponse is (3866 2 1839)y3866 5 0.5243, or about 52.4%. What can we say about those who do (or do not) respond despite the efforts of the researchers? 7.71 (a) While the distributions do not look particularly Normal, they have no extreme outliers or skewness. (b) xN ⫽ 0.5714, sN ⫽ 0.7300, nN 5 14; xS ⫽ 2.1176, sS ⫽ 1.2441, nS 5 17. (c) H0: mN 5 mS; Ha: mN , mS. (d) t 5 24.303, so P ⫽ 0.0001 1df ⫽ 26.52 or P , 0.0005 (df 5 13). (e) 22.2842 to 20.8082 1df ⫽ 26.52 or 22.3225 to 20.7699 (df 5 13).
Answers to Odd-Numbered Exercises
A-17
7.73 (a) Although the data are integers, the sample sizes are large. (b) Taco Bell: x ⫽ 4.1987, s ⫽ 0.8761, n 5 307. McDonald’s: x ⫽ 3.9365, s ⫽ 0.8768, n 5 362. (c) t ⫽ 3.85, P ⫽ 0.0001 1df ⫽ 649.42 or P , 0.005 (df 5 100). (d) 0.129 to 0.396 (df 5 649.4) or 0.128 to 0.391 (df 5 306) or 0.127 to 0.397 (df 5 100).
(e) df 5 121.503; sp 5 1.734; SE1 5 0.2653 and SE2 5 0.1995. t 5 24.56, df 5 333, P , 0.0001.
7.75 (a) Assuming we have SRSs from each population, this seems reasonable. (b) H0: mEarly 5 mLate and Ha: mEarly ⬆ mLate. (c) SED ⫽ 1.0534, t ⫽ 1.614, P ⫽ 0.1075 1df ⫽ 347.42 or P ⫽ 0.1081 (df 5 199). (d) 20.372 to 3.772 (df 5 347.7) or 20.377 to 3.777 (df 5 199) or 20.390 to 3.790 (df 5 100).
7.103 F 5 1.106, df 5 199 and 201. Using Table D (df 5 120 and 200), P . 0.200. (Software gives P 5 0.4762.)
7.77 (a) This may be near enough to an SRS if this company’s working conditions were similar to those of other workers. (b) 9.99 to 13.01 mg # yym3. (c) t 5 15.08, P , 0.0001 with either df 5 137 or 114. (d) The sample sizes are large enough that skewness should not matter. 7.79 You need either sample sizes and standard deviations or degrees of freedom and a more accurate value for the P-value. The confidence interval will give us useful information about the magnitude of the difference. 7.81 This is a matched pairs design.
7.99 (a) F* 5 2.25. (b) Significant at the 10% level but not at the 5% level. 7.101 A smaller s would yield more power.
7.105 F 5 5.263 with df 5 114 and 219; P , 0.0001. The authors described the distributions as somewhat skewed, so the Normality assumption may be violated. 7.107 F 5 1.506 with df 5 29 and 29; P 5 0.2757. The stemplots in Exercise 7.85 did not appear to be Normal. 7.109 (a) F ⫽ 1.12; do not reject H0. (b) The critical values are 9.60, 15.44, 39.00, and 647.79. With small samples, these are low-power tests. 7.111 Using a larger s for planning the study is advisable because it provides a conservative (safe) estimate of the power. 7.113 x ⫽ 139.5, s ⫽ 15.0222, sx ⫽ 7.5111.We cannot consider these four scores to be an SRS.
7.83 The next 10 employees who need screens might not be an independent group—perhaps they all come from the same department, for example.
7.115 As df increases, t* approaches 1.96.
7.85 (a) The north distribution (five-number summary 2.2, 10.2, 17.05, 39.1, 58.8 cm) is right-skewed, while the south distribution (2.6, 26.1, 37.70, 44.6, 52.9) is left-skewed. (b) The methods of this section seem to be appropriate. (c) H0: mN 5 mS; Ha: mN ⬆ mS. (d) t 5 22.63 with df 5 55.7 (P 5 0.011) or df 5 29 (P 5 0.014). (e) Either 219.09 to 22.57 or 219.26 to 22.40 cm.
7.119 (a) Two independent samples. (b) Matched pairs. (c) Single sample.
7.87 (a) Either 20.90 to 6.90 units (df 5 122.5) or 20.95 to 6.95 units (df 5 54). (b) Random fluctuation may account for the difference in the two averages. 7.89 (a) H0: mB 5 mF; Ha: mB . mF; t 5 1.654, P 5 0.053 (df 5 37.6) or P 5 0.058 (df 5 18). (b) 20.2 to 2.0. (c) We need two independent SRSs from Normal populations. 7.91 sp 5 0.9347; t ⫽ ⫺3.636, df 5 40, P ⫽ 0.0008; 21.6337 to 20.4663. Both results are similar to those for Exercise 7.72. 7.93 sp 5 15.96; t 5 22.629, df 5 58, P 5 0.0110; 219.08 to 22.58 cm. All results are nearly the same as in Exercise 7.85. 7.95 df 5 55.725. 7.97 (a) df 5 137.066. (b) sp 5 5.332 (slightly closer to s2, from the larger sample). (c) With no assumption, SE1 5 0.7626; with the pooled method, SE2 5 0.6136. (d) t 5 18.74, df 5 333, P , 0.0001. t and df are larger, so the evidence is stronger (although it was quite strong before).
7.117 Margins of error decrease with increasing sample size.
7.121 (a) H0: m 5 1.5; Ha: m , 1.5. t ⫽ ⫺9.974, P ¯ 0. (b) 0.697 to 0.962 violations. (d) The sample size should be large enough to make t procedures safe. 7.123 (a) 23.008 to 1.302 (Software gives 22.859 to 1.153). (b) 21.761 to 0.055. 7.125 (a) We are looking at the average proportion for samples of n 5 41 and 197. (b) H0: mB 5 mW and Ha: mB ⬆ mW . (c) For First Year: t ⫽ 0.982. With df 5 52.3, P ⫽ 0.3305. For Third Year: t ⫽ 2.126, df ⫽ 46.9, P ⫽ 0.0388. 7.127 (a) Body weight: mean 20.7 kg, SE 2.298 kg. Caloric intake: mean 5 14 cal, SE 5 56.125 cal. (b) t1 5 20.305 (body weight) and t2 5 0.249 (caloric intake), both df 5 13, both P-values are about 0.8. (c) 25.66 to 4.26 kg and 2107.23 to 135.23 cal. 7.129 (a) At each nest, the same mockingbird responded on each day. (b) 6.9774 m. (c) t ⫽ 6.32, P , 0.0001. (d) 5.5968 m. (e) t ⫽ ⫺0.973, P ⫽ 0.3407. 7.131 How much a person eats may depend on how many people he or she is sitting with. 7.133 No; what we have is nothing like an SRS. 7.135 77.76% 6 13.49%, or 64.29% to 91.25%.
A-18
Answers to Odd-Numbered Exercises
7.137 GPA: t 5 20.91, df 5 74.9 (P 5 0.1839) or 30 (0.15 , P , 0.20). Confidence interval: 21.33 to 0.5. IQ: t 5 1.64, df 5 56.9 (P 5 0.0530) or 30 (0.05 , P , 0.10). Confidence interval: 21.12 to 11.36.
8.27 (a) Values of pˆ outside the interval 0.1730 to 0.4720. (b) Values outside the interval 0.210 to 0.390.
7.139 t 5 3.65, df 5 237.0 or 115, P , 0.0005. 95% confidence interval for the difference: 0.78 to 2.60.
8.31 0.4043 to 0.4557.
7.141 t 5 20.3533, df 5 179, P 5 0.3621.
8.29 (a) About 67,179 students. (b) 0.4168 to 0.4232.
8.33 (a) 60.001321. (b) Other sources of error are much more significant than sampling error.
7.143 Basal: x ⫽ 41.0455, s 5 5.6356. DRTA: x ⫽ 46.7273, s 5 7.3884. Strat: x ⫽ 44.2727, s 5 5.7668. (a) t ⫽ 2.87, P , 0.005. Confidence interval for difference: 1.7 to 9.7 points. (b) t ⫽ 1.88, P , 0.05. Confidence interval for difference: 20.24 to 6.7 points.
8.35 (a) pˆ ⫽ 0.3275; 0.3008 to 0.3541. (b) Speakers and listeners probably perceive sermon length differently.
CHAPTER 8
8.39 (a) z 5 1.34, P 5 0.1802. (b) 0.4969 to 0.5165.
8.1 (a) n 5 5013 smartphone users. (b) p is the proportion of smartphone users who have used the phone to search for information about a product that they purchased. (c) X 5 2657. (d) pˆ ⫽ 0.530.
8.41 n 5 9604.
8.3 (a) 0.0070. (b) 0.530 6 0.014. (c) 51.6% to 54.4%. 8.5 Shade above 1.34 and below 21.34. 8.7 pˆ ⫽ 0.75, z 5 2.24, P 5 0.0250. 8.9 (a) z 5 21.34, P 5 0.1802 (Software gives P 5 0.1797). (b) 0.1410 to 0.5590—the complement of the interval shown in Figure 8.3. 8.11 The plot is symmetric about 0.5, where it has its maximum. 8.13 (a) p is the proportion of students at your college who regularly eat breakfast. n 5 200, X 5 84. (b) pˆ ⫽ 0.42. (c) We estimate that the proportion of all students at the university who eat breakfast is about 0.42 (42%). 8.15 (a) pˆ ⫽ 0.461, SEpˆ ⫽ 0.0157, m ⫽ 0.0308. (b) Yes. (c) 0.4302 to 0.4918. (d) We are 95% confident that between 43% and 49.2% of cell phone owners used their cell phone while in a store to call a friend or family member for advice about a purchase. 8.17 (a) pˆ ⫽ 0.7826, SEpˆ ⫽ 0.0272, m ⫽ 0.0533. (b) This was not an SRS; they asked all customers in the twoweek period. (c) 0.7293 to 0.8359. 8.19 n at least 597. 8.21 (a) The confidence level cannot exceed 100%. (In practical terms, the confidence level must be less than 100%.) (b) The margin of error only accounts for random sampling error. (c) P-values measure the strength of the evidence against H0, not the probability of it being true. 8.23 pˆ ⫽ 0.6548; 0.6416 to 0.6680. 8.25 (a) X 5 934.5, which rounds to 935. We cannot have fractions of respondents. (b) Using 89%, 0.8711 to 0.9089. (c) 87.1% to 90.9%.
8.37 (a) H0: p 5 0.5 versus Ha: p . 0.5; pˆ ⫽ 0.7. z ⫽ 2.83, P 5 0.0023. (c) The test is significant at the 5% level (and the 1% level as well).
8.43 The sample sizes are 55, 97, 127, 145, 151, 145, 127, 97, and 55; take n 5 151. 8.45 Mean 5 20.3, standard deviation 5 0.1360. 8.47 (a) Means p1 and p2, standard deviations 2p1 11 ⫺ p1 2兾n1 and 2p2 11 ⫺ p2 2兾n2. (b) p1 2 p2. (c) p1(1 2 p1)yn1 1 p2(1 2 p2)yn2. 8.49 The interval for qW 2 qM is 20.0030 to 0.2516. 8.51 The sample proportions support the alternative hypothesis pm . pw; P 5 0.0287. 8.53 (a) Only 5 of 25 watched the second design for more than a minute; this does not fit the guidelines. (b) It is reasonable to assume that the sampled students were chosen randomly. No information was given about the size of the institution; are there more than 20(361) 5 7220 first-year students and more than 20(221) 5 4420 fourth-year students? There were more than 15 each “Yes” and “No” answers in each group. 8.55 (a) Yes. (b) Yes. 8.57 (a) RR (watch more than one minute) 5 2.4. (b) RR (“Yes” answer) 5 2.248. 8.59 (a) Type of college is explanatory; response is requiring physical education. (b) The populations are private and public colleges and universities. (c) X1 5 101, n1 5 129, pˆ 1 5 0.7829, X2 5 60, n2 5 225, pˆ 2 5 0.2667. (d) 0.4245 to 0.6079. (e) H0: p1 5 p2 and 60 ⫹ 101 ⫽ 0.4548. z ⫽ 9.39, Ha: p1 ⬆ p2. We have pˆ ⫽ 225 ⫹ 129 P ¯ 0. (f) All counts are greater than 15. Were these random samples? 8.61 0.0363 to 0.1457. 8.63 (a) n1 5 1063, pˆ 1 ⫽ 0.54, n2 5 1064, pˆ 2 ⫽ 0.89. (We can estimate X1 ⫽ 574 and X2 ⫽ 947.) (b) 0.35. (c) Yes;
Answers to Odd-Numbered Exercises large, independent samples from two populations. (d) 0.3146 to 0.3854. (e) 35%; 31.5% to 38.5%. (f) A possible concern: adults were surveyed before Christmas.
8.101 (a) 0.5278 to 0.5822. (b) 0.5167 to 0.5713. (c) 0.3170 to 0.3690. (d) 0.5620 to 0.6160. (e) 0.5620 to 0.6160. (f) 0.6903 to 0.7397.
8.65 (a) n1 5 1063, pˆ 1 ⫽ 0.73, n2 5 1064, pˆ 2 ⫽ 0.76. (We can estimate X1 ⫽ 776 and X2 ⫽ 809.) (b) 0.03. (c) Yes; large, independent samples from two populations. (d) 20.0070 to 0.0670. (e) 3%; 20.7% to 6.7%. (f) A possible concern: adults were surveyed before Christmas.
CHAPTER 9
8.67 No; we need independent samples from different populations. 8.69 (a) H0 should refer to p1 and p2. (b) Only if n1 5 n2. (c) Confidence intervals account for only sampling error.
9.1 (a) Yes: 47y292 5 0.161, No: 245y292 5 0.839. (b) Yes: 21y233 5 0.090, No: 212y233 5 0.910. (d) Females are somewhat more likely than males to have increased the time they spend on Facebook. 9.5 Among all three fruit consumption groups, vigorous exercise is most likely. Incidence of low exercise decreases with increasing fruit consumption. 9.7 Physical Activity
8.71 (a) pˆ F ⫽ 0.8, SE ⫽ 0.05164; pˆ M ⫽ 0.3939, SE ⫽ 0.04253. (b) 0.2960 to 0.5161. (c) z 5 5.22, P ¯ 0.
Fruit
Low
Medium
Vigorous
Total
Low
51.9
212.9
304.2
569
Medium
29.3
120.1
171.6
321
8.73 (a) n 5 2342, x 5 1639. (b) pˆ ⫽ 0.6998. SE ⫽ 0.0095. (c) 0.6812 to 0.7184. (d) Yes.
High
26.8
110.0
157.2
443
633
8.75 We have large samples from two independent populations (different age groups). pˆ 1 ⫽ 0.8161, pˆ 2 ⫽ 0.4281. SED ⫽ 0.0198. The 95% confidence interval is 0.3492 to 0.4268.
Total
8.87 0.6337 to 0.6813. 8.89 H0: pF 5 pM and Ha: pF ⬆ pM. XM ⫽ 171 and XF ⫽ 150. pˆ ⫽ 0.1600, SEDp ⫽ 0.0164. z ⫽ 1.28, P 5 0.2009. 8.93 All pˆ -values are greater than 0.5. Texts 3, 7, and 8 have (respectively) z 5 0.82, P 5 0.4122; z 5 3.02, P 5 0.0025; and z 5 2.10, P 5 0.0357. For the other texts, z $ 4.64 and P , 0.00005. 8.95 The difference becomes more significant as sample size increases. With n 5 60, P 5 0.2713; with n 5 500, P 5 0.0016, for example. 2
8.97 (a) n 5 534. (b) n 5 (z*ym) y2. 8.99 (a) p0 5 0.7911. (b) pˆ ⫽ 0.3897, z 5 229.1; P is tiny. (c) pˆ 1 ⫽ 0.3897, pˆ 2 ⫽ 0.7930, z 5 229.2; P is tiny.
294 1184
9.11 (a) Explanatory variable Response
8.81 There was only one sample, not two independent samples. Many people use both.
8.85 pˆ ⫽ 0.375, SED ⫽ 0.01811, z ⫽ 6.08, P , 0.0001.
108
9.9 (a) df 5 12, 0.05 , P , 0.10. (b) df 5 12, 0.05 , P , 0.10. (c) df 5 1, 0.005 , P , 0.01. (d) df 5 1, 0.20 , P , 0.25.
8.79 (a) 1207. (b) 0.6483 to 0.6917. (c) About 64.8% to 69.2%.
8.83 (a) We have six chances to make an error. (b) Use z* 5 2.65 (software: 2.6383). (c) 0.705 to 0.775, 0.684 to 0.756, 0.643 to 0.717, 0.632 to 0.708, 0.622 to 0.698, and 0.571 to 0.649.
A-19
1
2
Yes
0.357
0.452
No
0.643
0.548
Total
1.000
1.000
(c) Explanatory variable value 1 had proportionately fewer “yes” responses. 9.13 (a) pi 5 proportion of “Yes” responses in group i. H0: p1 5 p2, Ha: p1 ⬆ p2. pˆ ⫽ 175 ⫹ 952兾 1210 ⫹ 2102 ⫽ 0.4048. z ⫽ ⫺1.9882, P 5 0.0468. We fail to reject H0. (c) The P-values agree. (d) z2 5 (21.9882)2 5 3.9529. 9.15 Roundoff error. 9.17 The contributions for the other five states are CA
HI
IN
NV
OH
0.5820
0.0000
0.0196
0.0660
0.2264
X 2 5 0.9309. 9.19 (a) H0: P(head) 5 P(tail) 5 0.5 versus Ha: H0 is incorrect (the probabilities are not 0.5). (b) X2 5 1.7956, df 5 1, P 5 0.1802.
A-20
Answers to Odd-Numbered Exercises 9.27 (a) H0: p1 5 p2 versus Ha: p1 ⬆ p2, where the proportions of interest are those for persons harassed in person. pˆ 1 ⫽ 321兾361 ⫽ 0.8892, pˆ 2 ⫽ 200兾641 ⫽ 0.3120, pˆ ⫽ 521兾1002 ⫽ 0.5200. z ⫽ 17.556, P ¯ 0. (b) H0: there is no association between being harassed online and in person versus Ha: There is a relationship. X 2 5 308.23, df 5 1, P ¯ 0. (c) 17.5562 5 308.21, which agrees with X2 to within roundoff error. (d) One possibility is eliminating girls who said they have not been harassed.
9.21 (a) Joint Distribution: Site 1
Site 2
Total
More than 1 min
0.24
0.10
0.34
Less than 1 min
0.26
0.40
0.66
Total
0.50
0.50
1.00
The conditional distributions are Site 1
Site 2
Total
More than 1 min
0.7059
0.2941
1.0000
Less than 1 min
0.3939
0.6061
1.0000
9.29 (a) The solution to Exercise 9.27 used “harassed online” as the explanatory variable. (b) Changing to use “harassed in person” for the two-proportions z test gives pˆ 1 ⫽ 0.6161, pˆ 2 ⫽ 0.0832, pˆ ⫽ 0.3603. We again compute z ⫽ 17.556, P ¯ 0. No changes will occur in the chi-square test. (c) If two variables are related, the test statistic will be the same regardless of which is viewed as explanatory.
and Site 1
Site 2
More than 1 min
0.48
0.20
Less than 1 min
0.52
0.80
9.31 Ei 5 100 for each face of the die.
Total
1.00
1.00
9.33 (a) One might believe that opinion depended on the type of institution. (b) Presidents at 4-year public institutions are roughly equally divided about online courses, with presidents at 2-year public institutions slightly in favor. 4-year private school presidents are definitely not in agreement, while those at private 2-year schools seem to think online courses are equivalent to face-to-face courses.
(b) Joint Distribution 1st year
4th year
Total
Yes
0.1460
0.2010
0.3471
No
0.4742
0.1787
0.6529
Total
0.6203
0.3797
1.0000
9.35 (a) 206. (b) We have separate samples, so the twoway table is
The conditional distributions are
Presidents
4th year
Total
0.4208
0.5792
1.0000
Yes
206
621
1.0000
No
189
1521
Yes No
0.7263
0.2737
and 1st year
4th year
Yes
0.2355
0.5294
No
0.7645
0.4706
Total
1.0000
1.0000
(c) The column totals for this table are the two sample sizes. The row totals might be seen as giving an overall opinion on the value of online courses. (d) H0: The opinions on the value of online courses are the same for college presidents and the general public versus Ha: The opinions are different. X2 5 81.41, df 5 1, P ¯ 0.
9.23 (a) Describe a relationship. (b) Describe a relationship. (c) Time of day might explain the violence content of TV programs. (d) Age would explain bad teeth. 9.25 Times Witnessed Gender
Public
1st year
Never
Once
More than once
Total
Girls
125.503
161.725
715.773
1003
Boys
120.497
155.275
687.227
Total
246
317
1403
963 1966
9.37 (a) For example, in the “small” stratum, 51 claims were allowed, 6 were not allowed, and the total number of claims was 57. Altogether, there were 79 claims; 67 were allowed and 12 were not. (b) 10.5% (small claims), 29.4% (medium), and 20.0% (large) were not allowed. (c) In the 3 3 2 table, the expected count for large/not allowed is too small. (d) There is no relationship between claim size and whether a claim is allowed. (e) X2 5 3.456, df 5 1, P 5 0.063. 9.39 There is strong evidence of a change (X2 5 308.3, df 5 2, P , 0.0001).
Answers to Odd-Numbered Exercises 9.41 (a) For example, among those students in trades, 320 enrolled right after high school, and 622 later. (b) In addition to the given percents, 39.4% of these students enrolled right after high school. (c) X2 5 276.1, df 5 5, P , 0.0001. 9.43 (a) For example, among those students in trades, 188 relied on parents, family, or spouse, and 754 did not. (b) X2 5 544.8, df 5 5, P , 0.0001. (c) In addition to the given percents, 25.4% of all students relied on family support.
9.47 Start by setting a equal to any number from 0 to 100. 2
9.49 X 5 852.433, df 5 1, P , 0.0005.
CHAPTER 10 10.1 (a) 3.1. (b) The slope of 3.1 means the average value of y increases by 3.1 units for each unit increase in x. (c) 82.6. (d) 72.2 to 93.0. 10.3 (a) t ⫽ 1.895, df 5 25 2 2 5 23. From Table D, we have 0.05 , P , 0.10 (software gives 0.0707). (b) t ⫽ 2.105, df 5 25 2 2 5 23. From Table D, 0.04 , P , 0.05 (0.0464 from software). (c) t ⫽ 3.091, df 5 98. Using df 5 80 in Table D, 0.002 , P , 0.005 (0.0026 from software). 10.5 m 5 0.7 kgym2. At x 5 5.0, the margin of error will be larger.
7
10.7 (b) The fitted line is Spending ⫽ ⫺4900.5333 ⫹ 2.4667 Year. (Note: Rounding on this exercise can make a big difference in results.) (c) The residuals are (with enough decimal places in slope and intercept) 20.1, 0.2, 20.1. s ⫽ 0.2449. (d) The model is y 5 b0 1 b1x 1 . We have estimates bˆ 0 ⫽ ⫺4900.5333, bˆ 1 ⫽ 2.4667, and sˆ 1ei 2 ⫽ 0.2449. (e) s1b1 2 ⫽ 0.0577. df 5 1, so t* 5 12.71. The 95% CI is 1.733 to 3.200.
10.11 Kiplinger narrows down the number of colleges; these are an SRS from that list, not from the original 500 four-year public colleges. 10.13 (a) $19,591.29. (b) $23,477.93. (c) La Crosse is farther from the center of the x distribution.
10.19 (a) The relationship is strong (little scatter), increasing, and fairly linear; however, there may be a bit of curve at each end. (b) OUT11 ⫽ 1075 ⫹ 1.15 OUT08 (or yˆ ⫽ 1075 ⫹ 1.15x2. (d) No overt problems are noted, even though the Normal plot wiggles around the line. 10.21 The scatterplot shows a weak, increasing relationship between in-state and out-of-state tuition rates for 2011. Minnesota appears to be an outlier, with an in-state tuition of $13,022 and an out-of-state tuition of $18,022. The regression equation is OUT11 ⫽ 17,160 ⫹ 1.017 IN11 (or yˆ ⫽ 17,160 ⫹ 1.017x). The scatterplot of residuals against x shows no overt problems (except the low outlier for Minnesota); the Normal quantile plot also shows no problems, although we note that several schools seem to have similar residuals (slightly more than $5000). 10.23 (a) yˆ ⫽ ⫺0.0127 ⫹ 0.0180x, r2 ⫽ 80.0%. (b) H0: b1 5 0; Ha: b1 . 0; t 5 7.48, P , 0.0001. (c) The predicted mean is 0.07712; the interval is 0.040 to 0.114. 10.25 (a) Both distributions are sharply right-skewed; the five-number summaries are 0%, 0.31%, 1.43%, 17.65%, 85.01% and 0, 2.25, 6.31, 12.69, 27.88. (b) No; x and y do not need to be Normal. (c) There is a weak positive linear relationship. (d) yˆ ⫽ 6.247 ⫹ 0.1063x. (e) The residuals are right-skewed. 10.27 (a) 17 of these 30 homes sold for more than their assessed values. (b) A moderately strong, increasing linear relationship. (c) yˆ ⫽ 66.95 ⫹ 0.6819x. (d) The outlier point is still an outlier in this plot; it is almost three standard deviations below its predicted value. (e) The new equation is yˆ ⫽ 37.41 ⫹ 0.8489x. s ⫽ 31.41 decreased to s ⫽ 26.80. (f) There are no clear violations of the assumptions. 10.29 (a) The plot could be described as increasing and roughly linear, or possibly curved; it almost looks as if there are two lines; one for years before 1980 and one after that. 2012 had an unusually low number of tornadoes, while 2004 had an unusually high
7
10.9 (a) b0, b1, and s are the parameters. (b) H0 should refer to b1. (c) The confidence interval will be narrower than the prediction interval.
10.17 (a) H0: b1 5 0 and Ha: b1 . 0. It does not seem reasonable to believe that tuition will decrease. (b) From software, t 5 13.94, P , 0.0005 (df 5 26). (c) Using df 5 26 from Table D, 0.9675 ⫾ 2.05610.069392 ⫽ 0.825 to 1.110. (d) r2 5 88.2%. (e) Inference on b0 would be extrapolation; there were no colleges close to $0 tuition in 2008.
7
9.55 (a) We expect each quadrant to contain one-fourth of the 100 trees. (b) Some random variation would not surprise us. (c) X2 5 10.8, df 5 3, P 5 0.0129.
10.15 Prediction intervals concern individuals instead of means. Departures from the Normal distribution assumption would be more severe here (in terms of how the individuals vary around the regression line).
7
9.45 (a) 57.98%. (b) 30.25%. (c) To test “There is no relationship between waking and bedtime symptoms” versus “There is a relationship,” we find X 2 ⫽ 2.275, df 5 1, P ⫽ 0.132.
A-21
number. (b) Tornadoes ⫽ ⫺27,432 ⫹ 14.312 Year (or yˆ ⫽ ⫺27,432 ⫹ 14.312x). The 95% confidence interval is 14.312 6 2.009(1.391) using df 5 50. (c) We see what seems to be an increasing amount of scatter in later years. (d) Based on the Normal quantile plot, we can assume that the residuals are Normally distributed.
A-22
Answers to Odd-Numbered Exercises
(e) After eliminating 2004 and 2012 from the data set,
7
the new equation is Tornadoes ⫽ ⫺27,458 ⫹ 14.324 Year. These years are not very influential to the regression (the slope and intercept changed very little). 10.31 (a) 8.41%. (b) t ⫽ 9.12, P , 0.0001. (c) The students who did not answer might have different characteristics.
7
10.33 (a) x (percent forested) is right-skewed; x ⫽ 39.3878%, sx 5 32.2043%. y (IBI) is left-skewed; y ⫽ 65.9388, sy 5 18.2796. (b) A weak positive association, with more scatter in y for small x. (c) yi 5 b0 1 b1 xi 1 ei, i 5 1,2, . . . , 49; ei are independent N(0, s) variables. (d) H0: b1 5 0; Ha: b1 ⬆ 0. (e) IBI ⫽ 59.9 ⫹ 0.153 Area; s 5 17.79. For testing the hypotheses in (d), t 5 1.92 and P 5 0.061. (f) Residual plot shows a slight curve. (g) Residuals are left-skewed. 10.35 The first change decreases P (that is, the relationship is more significant) because it accentuates the positive association. The second change weakens the association, so P increases (the relationship is less significant). 10.37 Area 5 10, yˆ ⫽ 57.52; using forest 5 63, yˆ ⫽ 69.55. Both predictions have a lot of uncertainty (the prediction intervals are about 70 units wide).
7
10.39 (a) It appears to be quite linear. (b) Lean ⫽ ⫺61.12 ⫹ 9.3187 Year; r2 5 98.8%. (c) 8.36 to 10.28 tenths of a millimeter/year. 10.41 (a) 113. (b) The prediction is 991.89 mm beyond 2.9 m, or about 3.892 m. (c) Prediction interval. 10.43 t ⫽ ⫺4.16, df 5 116, P , 0.0001. 10.45 DFM 5 1, DFE 518, SSE 5 3304.3. MSM 5 4947.2, MSE 5 183.572, F 5 26.95. 10.47 The standard error is 0.1628; the confidence interval is 0.503 to 1.187.
7
11.1 (a) Second semester GPA. (b) n 5 242. (c) p 5 7. (d) Gender, standardized test score, perfectionism, selfesteem, fatigue, optimism, and depressive symptomatology. 11.3 (a) Math GPA should increase when any explanatory variable increases. (b) DFM 5 4, DFE 5 77. (c) All four coefficients are significantly different from 0 (although the intercept is not). 11.5 The correlations are found in Figure 11.4. The scatterplots for the pairs with the largest correlations are easy to pick out. The whole-number scale for high school grades causes point clusters in those scatterplots. 11.7 Using Table D: (a) 20.0139 to 12.8139. (b) 0.5739 to 12.2261. (c) 0.2372 to 9.3628. (d) 0.6336 to 8.9664. Software gives 0.6422 to 8.9578. 11.9 (a) H0 should refer to b2. (b) Squared multiple correlation. (c) Small P implies that at least one coefficient is different from 0. 11.11 (a) yi ⫽ b0 ⫹ b1xi1 ⫹ b2xi2 ⫹ . . . ⫹ b7xi7 ⫹ ei, where i 5 1, 2, . . . , 142, and i are independent N(0, s) random variables. (b) The sources of variation are model (DFM 5 p 5 7), error (DFE 5 n 2 p 2 1 5 134), and total (DFT 5 n 2 1 5 141). 11.13 (a) The fitted model is GPA ⫽ ⫺0.847 ⫹ 0.00269SATM ⫹ 0.229HSS. (b) GPA 5 ⫺0.887 ⫹
0.00237SATM ⫹ 0.0850HSM ⫹ 0.173HSS. (c) GPA 5 ⫺1.11 ⫹ 0.00240SATM ⫹ 0.0827HSM ⫹ 0.133HSS ⫹
7
10.53 (a) a1 5 0.02617, a0 5 22.7522. (c) Mean 5 21.1333 and standard deviation 5 4.7137—the same as for the ACT scores.
CHAPTER 11
7
outlier (SAT 420, ACT 21). (b) ACT ⫽ 1.63 ⫹ 0.0214SAT, t 5 10.78, P , 0.0005. (c) r 5 0.8167.
10.61 (a) 95% confidence interval for women: 14.73 to 33.33. For men: 29.47 to 42.97. These intervals overlap quite a bit. (b) For women: 22.78. For men: 16.38. The women’s slope standard error is smaller in part because it is divided by a large number. (c) Choose men with a wider variety of lean body masses.
7
10.51 (a) Strong positive linear association with one
10.59 The three smallest correlations (0.16 and 0.19) are the only ones that are not significant (P 5 0.1193 and 0.0632). The first correlation (0.28) has the smallest P-value (0.0009). The next four, and the largest correlation in the Caucasian group, have P , 0.001. The remainder have P , 0.01.
7
10.49 For n 5 15, t ⫽ 2.08 and P 5 0.0579. For n 5 25, t ⫽ 2.77 and P 5 0.0109. Finding the same correlation with more data points is stronger evidence that the observed correlation is not just due to chance.
t 5 1.92, P 5 0.061 (Exercise 10.33). Area and percent forested: r 5 20.2571, t 5 21.82, P 5 0.075.
0.0644HSE. (d) GPA ⫽ 0.257 ⫹ 0.125HSM ⫹ 0.172HSS.
7
MSE
R2
P(x1)
P(x2)
P(x3)
(a)
0.506
25.4%
0.001
0.000
width: Weight ⫽ ⫺98.99 ⫹ 18.732SqWid, s ⫽ 65.24, r2 ⫽ 0.965. Both scatterplots look more linear.
(b)
0.501
26.6%
0.004
0.126
0.002
(c)
0.501
27.1%
0.004
0.137
0.053
(d)
0.527
22.4%
0.024
0.003
7
10.55 (a) For squared length: Weight ⫽ ⫺117.99 ⫹ 0.4970SqLen, s ⫽ 52.76, r2 5 0.977.(b) For squared
10.57 IBI and area: r 5 0.4459, t 5 3.42, P 5 0.001 (from Exercise 10.32). IBI and percent forested: r 5 0.2698,
P(x4)
0.315
The “best” model is the model with SATM and HSS.
Answers to Odd-Numbered Exercises
A-23
7
11.15 The first variable to leave is InAfterAid (P-value 5 0.465). Fitting the new model gives OutAfterAid (P-value 5 0.182) as the next to leave. AvgAid (P-value 5 0.184) leaves next. At that point, all variables are significant predictors. The model
7
is AvgDebt ⫽ ⫺9521 ⫹ 118Admit ⫹ 102Yr4Grad ⫹ 661StudPerFac ⫹ 130PercBorrow. 11.17 (a) 8 and 786. (b) 7.84%; this model is not very predictive. (c) Males and Hispanics consume energy drinks more frequently. Consumption increases with risk-taking scores. (d) Within a group of students with identical (or similar) values of those other variables, energy-drink consumption increases with increasing jock identity and increasing risk taking. 11.19 (a) Model 1: DFE 5 200. Model 2: DFE 5 199. (b) t 5 3.09, P 5 0.0023. (c) For gene expression: t 5 2.44, P 5 0.0153. For RB: t 5 3.33, P 5 0.0010. (d) The relationship is still positive. When gene expression increases by 1, popularity increases by 0.204 in Model 1 and by 0.161 in Model 2 (with RB fixed).
7
11.39 The correlations are 0.840 (LVO1 and LVO2), 0.774 (LVO1 and LOC), and 0.755 (LVO1 and LTRAP). Regression equations, t statistics, R2,
7
and s for each model: LVO⫹ ⫽ 4.38 ⫹ 0.706LOC; t 5 6.58, P , 0.0005; R2 5 0.599, s 5 0.3580. LVO⫹ ⫽ 4.26 ⫹ 0.430LOC ⫹ 0.424LTRAP; t 5 2.56, P 5 0.016; t 5 2.06, P 5 0.048; R2 5 0.652, s 5 0.3394. LVO⫹ ⫽ 0.872 ⫹ 0.392LOC ⫹ 0.028LTRAP ⫹ 0.672LVO⫺; t 5 3.40, P 5 0.002; t 5 0.18, P 5 0.862; t 5 5.71, P , 0.0005; R2 5 0.842, s 5 0.2326. As before, this suggests a model without LTRAP: LVO⫹ ⫽ 0.832 ⫹ 0.406LOC ⫹ 0.682LVO⫺; t 5 4.93, P , 0.0005; t 5 6.57, P , 0.0005; R2 5 0.842, s 5 0.2286. 11.41 Regression equations, t statistics, R2, and s
7
for each model: LVO⫺ ⫽ 5.21 ⫹ 0.441LOC; t 5 3.59, P 5 0.001; R2 5 0.308, s 5 0.4089. LVO⫺ ⫽ 5.04 ⫹ 0.057LOC ⫹ 0.590 LTRAP; t 5 0.31, P 5 0.761; t 5 2.61, P 5 0.014; R2 5 0.443, s 5 0.3732.
7
LVO⫹ ⫽ 1.57 ⫺ 0.293LOC ⫹ 0.245LTRAP ⫹ 0.813LVO⫹; t 5 22.08, P 5 0.047; t 5 1.47, P 5 0.152; t 5 5.71,
7
11.31 (a) OVERALLi 5 b0 1 b1 PEERi 1 b2 FtoSi 1 b3 CtoFi 1 ei, where ei are independent N(0, s) random
(b) VO⫹ ⫽ 58 ⫹ 6.41OC ⫹ 53.9TRAP. Coefficient of OC is not significantly different from 0 (t 5 1.25, P 5 0.221), but coefficient of TRAP is significantly different from 0 (t 5 3.50, P 5 0.002). This is consistent with the correlations found in Exercise 11.36.
7
11.29 (a) PEER is left-skewed; the other two variables are irregular. (b) PEER and FtoS are negatively correlated (r 5 20.114); FtoS and CtoF are positively correlated (r ⫽ 0.580); the other correlation is very small.
large OC. VO⫹ ⫽ 334 ⫹ 19.5OC, t ⫽ 4.73, P , 0.0005. Plot of residuals against OC is slightly curved.
7
11.27 (a) $86.87 to $154.91 million. (b) $89.94 to $154.99 million. (c) The intervals are very similar.
11.37 (a) Plot suggests greater variation in VO1 for
7
7
11.25 (a) USRevenuei 5 b0 1 b1 Budgeti 1 b2 Openingi 1 b3 Theatersi 1 b4 Opinioni 1 ei, where i 5 1, 2, . . . , 35; ei are independent N(0, s) random variables. (b) USRevenue 5 267.72 1 0.1351Budget 1 3.0165Opening 2 0.00223 Theaters 1 10.262Opinion. (c) The Dark Knight may be influential. The spread of the residuals appears to increase with Theaters. (d) 98.1%.
11.35 (a) Refer to your regression output. (b) For example, the t statistic for the GINI coefficient grows from t 5 20.42 (P 5 0.675) to t 5 4.25 (P , 0.0005). The DEMOCRACY t is 3.53 in the third model (P , 0.0005) but drops to 0.71 (P 5 0.479) in the fourth model. (c) A good choice is to use GINI, LIFE, and CORRUPT: all three coefficients are significant, and R2 5 77.0% is nearly the same as for the fourth model from Exercise 11.34.
7 7
11.23 (a) Budget and Opening are right-skewed; Theaters and Opinion are roughly symmetric (slightly left-skewed). Five-number summaries for Budget and Opening are appropriate; mean and standard deviation could be used for the other two variables. (b) All relationships are positive. The Budget/Theaters and Opening/ Theaters relationships appear to be curved; the others are reasonably linear. The correlations between Budget, Opening, and Theaters are all greater than 0.7. Opinion is less correlated with the other three variables—about 0.4 with Budget and Opening and only 0.156 with Theaters.
11.33 (a) For example: All distributions are skewed to varying degrees—GINI and CORRUPT to the right, the other three to the left. CORRUPT and DEMOCRACY have the most skewness. (b) GINI is negatively correlated to the other four variables (ranging from 20.396 to 20.050), while all other correlations are positive and more substantial (0.525 or more).
7
11.21 (a) BMI ⫽ 23.4 ⫺ 0.682 (PA 2 8.614) 1 0.102 (PA 2 8.614)2, (b) R2 5 17.7%. (c) The residuals look roughly Normal and show no obvious remaining patterns. (d) t 5 1.83, df 5 97, P 5 0.070.
variables. (b) OVERALL ⫽ 18.85 ⫹ 0.5746 PEER ⫹ 0.0013 FtoS ⫹ 0.1369 CtoF. (c) PEER: 0.4848 to 0.6644. FtoS: 20.0704 to 0.0730. CtoF: 0.0572 to 0.2166. The FtoS coefficient is not significantly different from 0. (d) R2 ⫽ 72.2%, s ⫽ 7.043.
P , 0.0005; R2 5 0.748, s 5 0.2558. LVO⫺ ⫽ 1.31 ⫺ 0.188LOC ⫹ 0.890LVO⫹; t 5 21.52, P 5 0.140; t 5 6.57, P , 0.0005; R2 5 0.728, s 5 0.2611.
A-24
Answers to Odd-Numbered Exercises
7
11.43 (a) yi 5 b0 1 b1xi1 1 b2xi2 1 b3xi3 1 b4xi4 1 ei, where i 5 1, 2 , . . . , 69; ei are independent N(0, s) random variables. (b) PCB 5 0.94 1 11.87x1 1 3.76x2 1 3.88x3 1 4.18x4. All coefficients are significantly different from 0, although the constant 0.937 is not (t 5 0.76, P 5 0.449). R2 5 0.989, s 5 6.382. (c) The residuals appear to be roughly Normal, but with two outliers. There are no clear patterns when plotted against the explanatory variables.
7
11.45 (a) PCB ⫽ ⫺1.02 ⫹ 12.64 PCB52 ⫹ 0.31 PCB118 ⫹ 8.25 PCB138, R2 5 0.973, s 5 9.945. (b) b2 5 0.313, P 5 0.708. (c) In Exercise 11.43, b2 5 3.76, P , 0.0005. 11.47 The model is yi 5 b0 1 b1xi1 1 b2xi2 1 b3xi3 1 b4xi4 1 ei, where i 5 1, 2, . . . , 69; ei are independent N(0, s) random variables. Regression gives
7
TEQ ⫽ 1.06 ⫺ 0.097 PCB52 ⫹ 0.306 PCB118 ⫹ 0.106 PCB138 ⫺ 0.004 PCB180 with R2 5 0.677. Only the constant (1.06) and the PCB118 coefficient (0.306) are significantly different from 0. Residuals are slightly right-skewed and show no clear patterns when plotted with the explanatory variables. 11.49 (a) The correlations are all positive, ranging from 0.227 (LPCB28 and LPCB180) to 0.956 (LPCB and LPCB138). LPCB28 has one outlier (Specimen 39) when plotted with the other variables; except for that point, all scatterplots appear fairly linear. (b) All correlations are higher with the transformed data. 11.51 It appears that a good model is LPCB126 and LPCB28 (R2 5 0.768). Adding more variables does not appreciably increase R2 or decrease s. 11.53 x, M, s, and IQR for each variable: Taste: 24.53, 20.95, 16.26, 23.9. Acetic: 5.498, 5.425, 0.571, 0.656. H2S: 5.942, 5.329, 2.127, 3.689. Lactic: 1.442, 1.450, 0.3035, 0.430. None of the variables show striking deviations from Normality. Taste and H2S are slightly right-skewed, and Acetic has two peaks. There are no outliers.
7
11.55 Taste ⫽ ⫺61.5 ⫹ 15.6Acetic; t 5 3.48, P 5 0.002. The residuals seem to have a Normal distribution but are positively associated with both H2S and Lactic.
7
11.57 Taste ⫽ ⫺29.9 ⫹ 37.7Lactic; t 5 5.25, P , 0.0005. The residuals seem to have a Normal distribution; there are no striking patterns for residuals against the other variables.
7
11.59 Taste ⫽ ⫺26.9 ⫹ 3.80Acetic ⫹ 5.15H2S. For the coefficient of Acetic, t 5 0.84 and P 5 0.406. This model is not much better than the model with H2S alone; Acetic and H2S are correlated (r ⫽ 0.618), so Acetic does not add significant information if H2S is included.
7
11.61 Taste ⫽ ⫺28.9 ⫹ 0.33Acetic ⫹ 3.91H2S ⫹ 19.7Lactic. The coefficient of Acetic is not significantly different from 0 (P 5 0.942). Residuals of this regression appear
to be Normally distributed and show no patterns in scatterplots with the explanatory variables. It appears that the H2S/Lactic model is best.
CHAPTER 12 12.1 (a) H0 says the population means are all equal. (b) Experiments are best for establishing causation. (c) ANOVA is used to compare means. ANOVA assumes all variances are equal. (d) Multiple-comparisons procedures are used when we wish to determine which means are significantly different but have no specific relations in mind before looking at the data. 12.3 (a) Yes: 7y4 5 1.75 , 2. (b) 16, 25, and 49. (c) 31.2647. (d) 5.5915. 12.5 (a) This is the description of between-group variation. (b) The sums of squares will add. (c) s is a parameter. (d) A small P means the means are not all the same, but the distributions may still overlap. 12.7 Assuming the t (ANOVA) test establishes that the means are different, contrasts and multiple comparisons provide no further useful information. 12.9 (a) df 5 3 and 20. In Table E, 3.10 , 3.18 , 3.86. (c) 0.025 , P , 0.05. (d) We can conclude only that at least one mean is different from the others. 12.11 (a) df are 3 and 60. F 5 2.54. 2.18 , F , 2.76, so 0.050 , P , 0.100. (Software gives P ⫽ 0.0649.) (b) df are 2 and 24. F ⫽ 4.047. 3.40 , F , 4.32, so 0.025 , P , 0.050. (Software gives P ⫽ 0.0306.) 12.13 (a) Response: egg cholesterol level. Populations: chickens with different diets or drugs. I 5 3, n1 5 n2 5 n3 5 25, N 5 75. (b) Response: rating on five-point scale. Populations: the three groups of students. I 5 3, n1 5 31, n2 5 18, n3 5 45, N 5 94. (c) Response: quiz score. Populations: students in each TA group. I 5 3, n1 5 n2 5 n3 5 14, N 5 42. 12.15 For all three situations, we test H0: m1 5 m2 5 m3; Ha: at least one mean is different. (a) DFM 2, DFE 72, DFT 74. F(2, 72). (b) DFM 2, DFE 91, DFT 93. F(2, 91). (c) DFM 2, DFE 39, DFT 41. F(2, 39). 12.17 (a) This sounds like a fairly well-designed experiment, so the results should at least apply to this farmer’s breed of chicken. (b) It would be good to know what proportion of the total student body falls in each of these groups—that is, is anyone overrepresented in this sample? (c) Effectiveness teaching one topic (power calculations) might not reflect overall effectiveness. 12.19 (a) df 5 4 and 178. (b) 5 1 146 5 151 athletes were used. (c) For example, the individuals could have been outliers in terms of their ability to withstand the water bath pain. In the case of either low or high outliers, their removal would lessen the standard deviation for
Answers to Odd-Numbered Exercises their sport and move that sports mean (removing a high outlier would lower the mean and removing a low outlier would raise the mean). 12.21 (a) c 5 mPandP 2 1y4(mText 1 mEmail 1 mFB 1 mMSN). (b) H0: c 5 0 versus Ha: c . 0. (c) t ⫽ 1.894 with df 5 138. P 5 0.0302. 12.23 (a) The table below gives the sample sizes, means, and standard deviations. Food
n
x
s
Comfort
22
4.887
0.573
Organic
20
5.584
0.594
Control
20
5.082
0.622
(b) Comfort food is relatively symmetric. Organic food has its most prevalent values at the extremes. Control could be called left-skewed (it does not look very symmetric). 12.25 (a) The means are not all equal for the three groups. Organic appears to differ from both Comfort and Control; Comfort and Control are not significantly different from each other. (b) The decrease in variability for the three groups and the curve in the Normal quantile plot might make us question Normality. 12.27 (a) I 5 3, N 5 120, so df 5 2 and 117. (b) From Table E, P , 0.001. Using software, P 5 0.0003. (c) We really shouldn’t generalize these results beyond what might occur in similar shops in Mexico. 12.29 (a) F can be made very small (close to 0), and P close to 1. (b) F increases, and P decreases. 12.31 (a) Group
n
x
s
Control
35
21.01
11.50
Group
34
210.79
11.14
Individual
35
23.71
9.08
(b) Yes; 2(9.08) 5 18.16 . 11.50. (c) Control is closest to a symmetric distribution; Individual seems left-skewed. However, with sample sizes at least 34 in each group, moderate departures from Normality are not a problem. 12.33 (a) The new group means and standard deviations will be the old means and standard deviations divided by 2.2. (b) Dividing by a constant will not change the Normality of the data. The test statistic is F 5 7.77 with P-value 0.001. These are exactly the same values obtained in Exercise 12.32. 12.35 (a) Based on the sample means, fiber is cheapest and cable is most expensive. (b) Yes; the ratio is 1.55. (c) df 5 2 and 44; 0.025 , P , 0.050, or P 5 0.0427.
A-25
12.37 (a) The variation in sample size is some cause for concern, but there can be no extreme outliers in a 1-to-7 scale, so ANOVA is probably reliable. (b) Yes: 1.26y1.03 5 1.22 , 2. (c) F(4, 405), P 5 0.0002. (d) Hispanic Americans are highest, Japanese are in the middle, the other three are lowest. 12.39 (a) Activity seems to increase with both drugs, and Drug B appears to have a greater effect. (b) Yes; the standard deviation ratio is 1.49. sp ⫽ 3.487. (c) df 5 4 and 20. (d) 0.05 , P , 0.10; software gives P 5 0.0642. 12.41 (a) c1 5 m2 2 (m1 1 m4)y2. (b) c2 5 (m1 1 m2 1 m4)y 3 2 m3. 12.43 (a) Yes; the ratio is 1.25. sp 5 0.7683. (b) df 5 2 and 767; P , 0.001. (c) Compare faculty to the student average: c 5 m2 2 (m1 1 m3)y2. We test H0: c 5 0; Ha: c . 0. We find c 5 0.585, t 5 5.99, and P , 0.0001. 12.45 (a) All three distributions show no particular skewness. Control: n 5 15, x ⫽ 0.21887, s 5 0.01159 gycm2. Low dose: n 5 15, x ⫽ 0.21593, s 5 0.01151 gycm2. High dose: n 5 15, x ⫽ 0.23507, s 5 0.01877 gycm2. (b) All three distributions appear to be nearly Normal. (c) F 5 7.72, df 5 2 and 42, P 5 0.001. (d) For Bonferroni, t** 5 2.49 and MSD 5 0.0131. The high-dose mean is significantly different from the other two. (e) High doses increase bone mineral density. 12.47 (a) Control: n 5 10, x ⫽ 601.10, s 5 27.36 mgycm3. Low jump: n 5 10, x ⫽ 612.50, s 5 19.33 mgycm3. High jump: n 5 10, x ⫽ 638.70, s 5 16.59 mgycm3. Pooling is reasonable. (b) F 5 7.98, df 5 2 and 27, P 5 0.002. We conclude that not all means are equal. 12.49 (a) c1 5 m1 2 (m2 1 m4)y2 and c2 5 (m3 2 m2) 2 (m5 2 m4). (b) c1 5 23.9, SEc1 ⫽ 2.1353, c2 5 2.35, and SEc2 ⫽ 3.487. (c) The first contrast is significant (t 5 21.826), but the second is not (t 5 20.674). 12.51 (a) ECM1: n 5 3, x ⫽ 65.0%, s 5 8.66%. ECM2: n 5 3, x ⫽ 63.33%, s 5 2.89%. ECM3: n 5 3, x ⫽ 73.33%, s 5 2.89%. MAT1: n 5 3, x ⫽ 23.33%, s 5 2.89%. MAT2: n 5 3, x ⫽ 6.67%, s 5 2.89%. MAT3: n 5 3, x ⫽ 11.67%, s 5 2.89%. Pooling is risky because 8.66y2.89 . 2. (b) F 5 137.94, df 5 5 and 12, P , 0.0005. We conclude that the means are not the same. 12.53 (a) c1 5 m5 2 0.25(m1 1 m2 1 m3 1 m4). c2 5 0.5(m1 1 m2) 20.5(m3 1 m4). c3 5 (m1 2 m2) 2 (m3 2 m4). (b) From Exercise 12.26, we have sp 5 18.421. c1 5 14.65, c2 5 6.1, and c3 5 20.5. sc1 ⫽ 4.209, sc2 ⫽ 3.874, and sc3 ⫽ 3.784. (c) t1 ⫽ 3.48. t2 ⫽ 1.612. t3 ⫽ ⫺0.132. t114,0.975 5 1.980. Two-tailed P-values are 0.0007, 0.1097, and 0.8952. The first two contrasts are significant and the third is not.
A-26
Answers to Odd-Numbered Exercises
12.55 (a) The plot shows granularity (which varies between groups), but that should not make us question independence; it is due to the fact that the scores are all integers. (b) The ratio of the largest to the smallest standard deviations is less than 2. (c) Apart from the granularity, the quantile plots are reasonably straight. (d) Again, apart from the granularity, the quantile plots look pretty good.
(c) The test statistics have an F distribution. (d) If the sample sizes are not the same, the sums of squares may not add.
12.57 (a) c1 5 (m1 1 m2 1 m3)y3 2 m4, c2 5 (m1 1 m2)y2 2 m3, c3 5 m1 2 m2. (b) The pooled standard deviation is sp 5 1.1958. SEc1 ⫽ 0.2355, SEc2 ⫽ 0.1413, SEc3 ⫽ 0.1609. (c) Testing H0: ci 5 0; Ha: ci ⬆ 0 for each contrast, we find c1 5 212.51, t1 5 253.17, P1 , 0.0005; c2 5 1.269, t2 5 8.98, P2 , 0.0005; c3 5 0.191, t3 5 1.19, P3 ⫽ 0.2359. The Placebo mean is significantly higher than the average of the other three, while the Keto mean is significantly lower than the average of the two Pyr means. The difference between the Pyr means is not significant (meaning the second application of the shampoo is of little benefit).
13.7 (a) The factors are gender (I 5 2) and age (J 5 3). The response variable is the percent of pretend play. N 5 (2)(3)(11) 5 66. (b) The factors are time after harvest (I 5 5) and amount of water (J 5 2). The response variable is the percent of seeds germinating. N 5 30. (c) The factors are mixture (I 5 6) and freezing/thawing cycles (J 5 3). The response variable is the strength of the specimen. N 5 54. (d) The factors are training programs (I 5 4) and the number of days to give the training (J 5 2). The response variable is not specified but presumably is some measure of the training’s effectiveness. N 5 80.
12.59 The means all increase by 5%, but everything else (standard deviations, standard errors, and the ANOVA table) is unchanged.
13.9 (a) The same students were tested twice. (b) The interactions plot shows a definite interaction; the control group’s mean score decreased, while the expressivewriting group’s mean increased. (c) No. 2(5.8) 5 11.6 , 14.3.
12.61 All distributions are reasonably Normal, and standard deviations are close enough to justify pooling. For PRE1, F 5 1.13, df 5 2 and 63, P 5 0.329. For PRE2, F 5 0.11, df 5 2 and 63, P 5 0.895. Neither set of pretest scores suggests a difference in means.
7
12.63 Score ⫽ 4.432 ⫺ 0.000102 Friends. The slope is not significantly different from 0 (t 5 20.28, P 5 0.782), and the regression explains only 0.1% of the variation in score. Residuals suggest a possible curved relationship. 12.67 (b) Answers will vary with choice of Ha and desired power. For example, with m1 5 m2 5 4.4, m3 5 5, s 5 1.2, three samples of size 75 will produce power 0.78. 12.69 The design can be similar, although the types of music might be different. Bear in mind that spending at a casual restaurant will likely be less than at the restaurants examined in Exercise 12.40; this might also mean that the standard deviations could be smaller. Decide how big a difference in mean spending you want to detect, then do some power computations.
CHAPTER 13 13.1 (a) Two-way ANOVA is used when there are two factors. (b) Each level of A should occur with all three levels of B. (c) The RESIDUAL part of the model represents the error. (d) DFAB 5 (I 2 1)(J 2 1). 13.3 (a) Reject H0 when F is large. (b) Mean squares equal the sum of squares divided by degrees of freedom.
13.5 (a) N 5 36. DFA 5 2, DFB 5 1, DFAB 5 2, DFE 5 30, so F has 2 and 30 degrees of freedom. (c) P . 0.10. (d) Interaction is not significant; the interaction plot should have roughly parallel lines.
13.11 (a) Recall from Chapter 12 that ANOVA is robust against reasonable departures from Normality, especially when sample sizes are similar (and as large as these). (b) Yes. 1.62y0.82 5 1.98 , 2. The ANOVA table is below. Source
DF
SS
Age
6
31.97
Gender
1
44.66
MS 5.328 44.66
6
13.22
2.203
Error
232
280.95
1.211
Total
245
370.80
Age 3 Gender
F
P
4.400
0.0003
36.879
0.0000
1.819
0.0962
13.13 (a) There appears to be an interaction; a thankyou increases repurchase intent for those with short history and decreases it for customers with long history. (b) The marginal means for history (6.245 and 7.45) convey the fact that repurchase intent is higher for customers with long history. The thank-you marginal means (6.61 and 7.085) are less useful because of the interaction. 13.15 (a) The plot suggests a possible interaction. (b) By subjecting the same individual to all four treatments, rather than four individuals to one treatment each, we reduce the within-groups variability. 13.17 (a) We’d expect reaction times to slow with older individuals. If bilingualism helps brain functioning, we would not expect that group to slow as much as the
Answers to Odd-Numbered Exercises
A-27
monolingual group. The expected interaction is seen in the plot; mean total reaction time for the older bilingual group is much less than for the older monolingual group; the lines are not parallel. (b) The interaction is just barely not significant (F 5 3.67, P 5 0.059). Both main effects are significant (P 5 0.000).
4.871. M mean: 4.485. R mean: 5.246. LR minus LM: 0.63. NR minus NM: 0.892. Mean GITH levels are lower for M than for R; there is not much difference between L and N. The difference between M and R is greater among rats who had normal chromium levels in their diets (N).
13.19 (a) There may be an interaction; for a favorable process, a favorable outcome increases satisfaction quite a bit more than for an unfavorable process (12.32 versus 10.24). (b) This time, the increase in satisfaction from a favorable outcome is less for a favorable process (10.49 versus 11.32). (c) There seems to be a three-factor interaction, because the interactions in parts (a) and (b) are different.
13.35 (a) sp ⫽ $38.14, df 5 105. (b) Yes; the largestto-smallest ratio is 1.36. (c) Individual sender, $70.90; group sender, $48.85; individual responder, $59.75; group responder, $60.00. (d) There appears to be an interaction; individuals send more money to groups, while groups send more money to individuals. (e) P 5 0.0033, P 5 0.9748, and P 5 0.1522. Only the main effect of sender is significant.
13.21 Humor slightly increases satisfaction (3.58 with no humor, 3.96 with humor). The process and outcome effects are greater: favorable process, 4.75; unfavorable process, 2.79; favorable outcome, 4.32; unfavorable outcome, 3.22.
13.37 Yes; the iron-pot means are the highest, and F for testing the effect of the pot type is very large.
13.23 The largest-to-smallest ratio is 1.26, and the pooled standard deviation is 1.7746. 13.25 Except for female responses to purchase intention, means decreased from Canada to the United States to France. Females had higher means than men in almost every case, except for French responses to credibility and purchase intention (a modest interaction). 13.27 (a) Intervention, 11.6; control, 9.967; baseline, 10.0; 3 months, 11.2; 6 months, 11.15. Overall, 10.783. The row means suggest that the intervention group showed more improvement than the control group. (b) Interaction means that the mean number of actions changes differently over time for the two groups. 13.29 With I 5 3, J 5 2, and 6 observations per cell, we have DFA 5 2, DFB 5 1, DFAB 5 2, and DFE 5 30. 3.32 , 3.45 , 4.18, so 0.025 , PA , 0.05 (software gives 0.0448). 2.49 , 2.88, so PB . 0.10 (software gives 0.1250). 1.14 , 2.49, so PAB . 0.10 (software gives 0.3333). The only significant effect is the main effect for factor A. 13.31 (a) There is little evidence of an interaction. (b) sp ⫽ 0.1278. (c) c1 5 (mnew,city 1 mnew,hw)y2 2 (mold,city 1 mold,hw)y2. c2 5 mnew,city 2 mnew,hw . c3 5 mold,hw 2 mold,city . (d) By subjecting the same individual to all four treatments, rather than four individuals to one treatment each, we reduce the within-groups variability. 13.33 (b) There seems to be a fairly large difference between the means based on how much the rats were allowed to eat but not very much difference based on the chromium level. There may be an interaction: the NM mean is lower than the LM mean, while the NR mean is higher than the LR mean. (c) L mean: 4.86. N mean:
13.39 (a) In the order listed in the table: x11 ⫽ 25.0307, s11 5 0.0011541; x12 ⫽ 25.0280, s12 5 0; x13 ⫽ 25.0260, s13 5 0; x21 ⫽ 25.0167, s21 5 0.0011541; x22 ⫽ 25.0200, s22 5 0.002000; x23 ⫽ 25.0160, s23 5 0; x31 ⫽ 25.0063, s31 5 0.001528; x32 ⫽ 25.0127, s32 5 0.0011552; x33 ⫽ 25.0093, s33 5 0.0011552; x41 ⫽ 25.0120, s41 5 0; x42 ⫽ 25.0193, s42 5 0.0011552; x43 ⫽ 25.0140, s43 5 0.004000; x51 ⫽ 24.9973, s51 5 0.001155; x52 ⫽ 25.0060, s52 5 0; x53 ⫽ 25.0003, s53 5 0.001528. (b) Except for Tool 1, mean diameter is highest at Time 2. Tool 1 had the highest mean diameters, followed by Tool 2, Tool 4, Tool 3, and Tool 5. (c) FA 5 412.94, df 5 4 and 30, P , 0.0005. FB 5 43.60, df 5 2 and 30, P , 0.0005. FAB 5 7.65, df 5 8 and 30, P , 0.0005. (d) There is strong evidence of a difference in mean diameter among the tools (A) and among the times (B). There is also an interaction (specifically, Tool 1’s mean diameters changed differently over time compared with the other tools). 13.41 (a) All three F-values have df 5 1 and 945; the P-values are ,0.001, ,0.001, and 0.1477. Gender and handedness both have significant effects on mean lifetime, but there is no interaction. (b) Women live about 6 years longer than men (on the average), while righthanded people average 9 more years of life than lefthanded people. Handedness affects both genders in the same way, and vice versa. 13.43 (a) and (b) The first three means and standard deviations are x1,1 ⫽ 3.2543, s1,1 5 0.2287; x1,2 ⫽ 2.7636, s1,2 5 0.0666; x1,3 ⫽ 2.8429, s1,3 5 0.2333. The standard deviations range from 0.0666 to 0.3437, for a ratio of 5.16—larger than we like. (c) For Plant, F 5 1301.32, df 5 3 and 224, P , 0.0005. For Water, F 5 9.76, df 5 6 and 224, P , 0.0005. For interaction, F 5 5.97, df 5 18 and 224, P , 0.0005. 13.45 The seven F statistics are 184.05, 115.93, 208.87, 218.37, 220.01, 174.14, and 230.17, all with df 5 3 and 32 and P , 0.0005.
A-28
Answers to Odd-Numbered Exercises
13.47 Fresh: Plant, F 5 81.45, df 5 3 and 84, P , 0.0005; Water, F 5 43.71, df 5 6 and 84, P , 0.0005; interaction, F 5 1.79, df 5 18 and 84, P 5 0.040. Dry: Plant, F 5 79.93, df 5 3 and 84, P , 0.0005; Water, F 5 44.79, df 5 6 and 84, P , 0.0005; interaction, F 5 2.22, df 5 18 and 84, P 5 0.008. 13.49 The twelve F statistics are fresh biomass: 15.88, 11.81, 62.08, 10.83, 22.62, 8.20, and 10.81; dry biomass: 8.14, 26.26, 22.58, 11.86, 21.38, 14.77, and 8.66, all with df 5 3 and 15 and P , 0.003. 13.51 (a) Gender: df 5 1 and 174. Floral characteristic: df 5 2 and 174. Interaction: df 5 2 and 174. (b) Damage to males was higher for all characteristics. For males, damage was higher under characteristic level 3, while for females, the highest damage occurred at level 2. (c) Three of the standard deviations are at least half as large as the means. Because the response variable (leaf damage) had to be nonnegative, this suggests that these distributions are right-skewed. 13.53 Men in CS: n 5 39, x ⫽ 7.79487, s 5 1.50752. Men in EOS: n 5 39, x ⫽ 7.48718, s 5 2.15054. Men in Other: n 5 39, x ⫽ 7.41026, s 5 1.56807. Women in CS: n 5 39, x ⫽ 8.84615, s 5 1.13644. Women in EOS: n 5 39, x ⫽ 9.25641, s 5 0.75107. Women in Other: n 5 39,
x ⫽ 8.61539, s 5 1.16111. The means suggest that females have higher HSE grades than males. For a given gender, there is not too much difference among majors. Normal quantile plots show no great deviations from Normality, apart from the granularity of the grades (most evident among women in EO). In the ANOVA, only the effect of gender is significant (F 5 50.32, df 5 1 and 228, P , 0.0005). 13.55 Men in CS: n 5 39, x ⫽ 526.949, s 5 100.937. Men in EOS: n 5 39, x ⫽ 507.846, s 5 57.213. Men in Other: n 5 39, x ⫽ 487.564, s 5 108.779. Women in CS: n 5 39, x ⫽ 543.385, s 5 77.654. Women in EOS: n 5 39, x ⫽ 538.205, s 5 102.209. Women in Other: n 5 39, x ⫽ 465.026, s 5 82.184. The means suggest that students who stay in the sciences have higher mean SATV scores than those who end up in the Other group. Female CS and EO students have higher scores than males in those majors, but males have the higher mean in the Other group. Normal quantile plots suggests some rightskewness in the “Women in CS” group and also some non-Normality in the tails of the “Women in EO” group. Other groups look reasonably Normal. In the ANOVA, only the effect of major is significant (F 5 9.32, df 5 2 and 228, P , 0.0005).
N OT E S A N D D ATA S O U R C E S CHAPTER 1 1. See census.gov. 2. From State of Drunk Driving Fatalities in America 2010, available at centurycouncil.org.
17. Data for November 2012, from internetworldstats. com/facebook.htm. 18. See previous note. 19. Data provided by Darlene Gordon, Purdue University.
3. James P. Purdy, “Why first-year college students select online research sources as their favorite,” First Monday, 17, No. 9 (September 3, 2012). See firstmonday.org.
20. Data for 1980 to 2012 are available from the World Bank at data.worldbank.org/indicator/IC.REG.DURS. Data for 2012 were used for this example.
4. Data collected in the lab of Connie Weaver, Department of Foods and Nutrition, Purdue University, and provided by Linda McCabe.
21. See, for example, http://www.nacubo.org/Research.
5. Haipeng Shen, “Nonparametric regression for problems involving lognormal distributions,” PhD dissertation, University of Pennsylvania, 2003. Thanks to Haipeng Shen and Larry Brown for sharing the data. 6. From the Digest of Education Statistics at the website of the National Center for Education Statistics, nces.ed.gov/programs/digest. 7. See Note 4. 8. Based on Barbara Ernst et al., “Seasonal variation in the deficiency of 25–hydroxyvitamin D3 in mildly to extremely obese subjects,” Obesity Surgery, 19 (2009), pp. 180–183. 9. More information about the Titanic can be found at the website for the Titanic Project in Belfast, Ireland, at titanicbelfast.com/Home.aspx. 10. Data describing the passengers on the Titanic can be found at lib.stat.cmu.edu/S/Harrell/data/descriptions/ titanic.html. 11. See semiocast.com/publications/2012_01_31_Brazil_ becomes_2nd_country_on_Twitter_superseds_Japan. 12. Data for 2011 from Table 1.1 in the U.S. Energy Information Administration’s December 2012 Monthly Energy Review, available at eia.gov/totalenergy/data/ monthly/pdf/mer.pdf. 13. From the Color Assignment website of Joe Hallock, joehallock.com/edu/COM498/index.html. 14. U.S. Environmental Protection Agency, Municipal Solid Waste Generation, Recycling, and Disposal in the United States: Tables and Figures for 2010. 15. November 2012 report from marketshare.hitslink.com. 16. Color popularity for 2011 from the Dupont Automotive Color report; see dupont.com/Media_ Center/en_US/color_popularity.
22. The data were provided by James Kaufman. The study is described in James C. Kaufman, “The cost of the muse: Poets die young,” Death Studies, 27 (2003), pp. 813–821. The quote from Yeats appears in this article. 23. See, for example, the bibliographic entry for Gosset in the School of Mathematics and Statistics of the University of St. Andrews, Scotland, MacTutor History of Mathematics archive at www.history.mcs.st-andrews. ac.uk/Biographies/Gosset.html. 24. These and other data that were collected and used by Gosset can be found in the Guinness Archives in Dublin. See guinness-storehouse.com/en/Archive.aspx. 25. These data were provided by Krista Nichols, Department of Biological Sciences, Purdue University. 26. From the Interbrand website; see interbrand.com/ en/best-global-brands. 27. From beer100.com/beercalories.htm on January 4, 2013. 28. See Noel Cressie, Statistics for Spatial Data, Wiley, 1993. 29. Data provided by Francisco Rosales of the Department of Nutritional Sciences, Pennsylvania State University. 30. Data provided by Betsy Hoza, Department of Psychological Sciences, University of Vermont. 31. Net worth for 2010 from the Federal Reserve Bulletin, 98, No. 2 (2012), p. 17. 32. For more information about earthquakes, see the U.S. Geological Service website at usgs.gov. 33. We thank Ethan J. Temeles of Amherst College for providing the data. His work is described in Ethan J. Temeles and W. John Kress, “Adaptation in a planthummingbird association,” Science, 300 (2003), pp. 630–633.
N-1
N-2
Notes and Data Sources
34. The National Assessment of Educational Progress (NAEP) is conducted by the National Center for Education Statistics (NCES). The NAEP is a large assessment of student knowledge in a variety of subjects. See nces.ed.gov/nationsreportcard/naepdata. 35. See the NCAA Eligibility Center Quick Reference Sheet, available at fs.ncaa.org/Docs/eligibility_center/ Quick_Reference_Sheet.pdf. 36. Distributions for SAT scores can be found at the College Board website, research.collegeboard.org/content/satdata-tables. 37. See previous note. 38. See stubhub.com. 39. From Matthias R. Mehl et al., “Are women really more talkative than men?,” Science, 317, No. 5834 (2007), p. 82. The raw data were provided by Matthias Mehl. 40. From the American Heart Association website, americanheart.org. 41. From fueleconomy.gov. 42. From cdc.gov/brfss. The data were collected in 2011, with the exception of the fruits and vegetables variable, which is from 2009, the most recent year when this variable was included in the survey. 43. See Note 16. 44. See worldbank.org. These data are among the files available under “Data,” “Indicators.” 45. Data for 2013 were downloaded from isp-review. toptenreviews.com. 46. See previous note. 47. The Institute of Medicine website, www.iom.edu, provides links to reports related to dietary reference intakes as well as other health and nutrition topics. 48. Dietary Reference Intakes for Vitamin C, Vitamin E, Selenium and Carotenoids, National Academy of Sciences, 2000. 49. See previous note.
CHAPTER 2 1. Hannah G. Lund et al., “Sleep patterns and predictors of disturbed sleep in a large population of college students,” Adolescent Health, 46, No. 2 (2010), pp. 97–99. 2. See previous note. 3. See cfs.purdue.edu/FN/campcalcium/public.htm for information about the 2010 camp. 4. See consumersunion.org/about. 5. “Best laundry detergents,” Consumer Reports, November 2011, pp. 8–9.
6. OECD StatExtracts, Organisation for Economic Co-operation and Development, downloaded on January 8, 2013, from stats.oecd.org/wbos. 7. These studies were conducted by Connie Weaver, Department of Nutrition Science, Purdue University, over the past 20 years. The data for this example were provided by Linda McCabe. More details concerning this particular study and references to other related studies are given in Lu Wu, “Calcium requirements and metabolism in Chinese-American boys and girls,” Journal of Bone Mineral Research, 25, No. 8 (2010), pp. 1842–1849. 8. A sophisticated treatment of improvements and additions to scatterplots is W. S. Cleveland and R. McGill, “The many faces of a scatterplot,” Journal of the American Statistical Association, 79 (1984), pp. 807–822. 9. Stewart Warden et al., “Throwing induces substantial torsional adaption within the midshaft humerus of male baseball players,” Bone, 45 (2009), pp. 931–941. The data were provided by Stewart Warden, Department of Physical Therapy, School of Health and Rehabilitation Sciences, Indiana University. 10. See spectrumtechniques.com/isotope_generator.htm. 11. These data were collected under the supervision of Zach Grigsby, Science Express Coordinator, College of Science, Purdue University. 12. See beer100.com/beercalories.htm. 13. See worldbank.org. 14. James T. Fleming, “The measurement of children’s perception of difficulty in reading materials,” Research in the Teaching of English, 1 (1967), pp. 136–156. 15. Data for 2012 from forbes.com/nfl-valuations/. 16. From en.wikipedia.org/wiki/10000_metres. 17. A careful study of this phenomenon is W. S. Cleveland, P. Diaconis, and R. McGill, “Variables on scatterplots look more highly correlated when the scales are increased,” Science, 216 (1982), pp. 1138–1141. 18. Data from a plot in James A. Levine, Norman L. Eberhardt, and Michael D. Jensen, “Role of nonexercise activity thermogenesis in resistance to fat gain in humans,” Science, 283 (1999), pp. 212–214. 19. Frank J. Anscombe, “Graphs in statistical analysis,” American Statistician, 27 (1973), pp. 17–21. 20. From the website of the National Center for Education Statistics, nces.ed.gov. 21. Debora L. Arsenau, “Comparison of diet management instruction for patients with non-insulin dependent diabetes mellitus: Learning activity package vs. group instruction,” master’s thesis, Purdue University, 1993.
Notes and Data Sources 22. The facts in Exercise 2.100 come from Nancy W. Burton and Leonard Ramist, Predicting Success in College: Classes Graduating since 1980, Research Report No. 2001-2, The College Board, 2001.
N-3
38. The counts reported were calculated using counts of the numbers of banks in the different regions and the percents given in the ABA report.
24. See iom.edu.
39. Education Indicators: An International Perspective, Institute of Education Studies, National Center for Education Statistics; see nces.ed.gov/surveys/international.
25. Based on a study described in Corby C. Martin et al., “Children in school cafeterias select foods containing more saturated fat and energy than the Institute of Medicine recommendations,” Journal of Nutrition, 140 (2010), pp. 1653–1660.
40. Information about this procedure was provided by Samuel Flanigan of U.S. News & World Report. See usnews.com/usnews/rankguide/rghome.htm for a description of the variables used to construct the ranks and for the most recent ranks.
26. You can find a clear and comprehensive discussion of numerical measures of association for categorical data in Chapter 2 of Alan Agresti, Categorical Data Analysis, 2nd ed., Wiley, 2002.
41. From the Social Security website, ssa.gov/OACT/ babynames.
27. Edward Bumgardner, “Loss of teeth as a disqualification for military service,” Transactions of the Kansas Academy of Science, 18 (1903), pp. 217–219.
43. We thank Zhiyong Cai of Texas A&M University for providing the data. The data are from work performed in connection with his PhD dissertation in the Department of Forestry and Natural Resources, Purdue University.
23. See Note 19.
28. Based on reports prepared by Andy Zehner, vice president for Student Affairs, Purdue University. 29. Data are from the NOAA Satellite and Information Service at ncdc.noaa.gov/special-reports/groundhogday.php. 30. From M.-Y. Chen et al., “Adequate sleep among adolescents is positively associated with health status and health-related behaviors,” BMC Public Health, 6, No. 59 (2006); available from biomedicalcentral.com/14712458/6/59. 31. M. S. Linet et al., “Residential exposure to magnetic fields and acute lymphoblastic leukemia in children,” New England Journal of Medicine, 337 (1997), pp. 1–7. 32. The Health Consequences of Smoking: 1983, U.S. Public Health Service, 1983. 33. Dennis Bristow et al., “Thirty games out and sold out for months! An empirical examination of fan loyalty to two Major League Baseball teams,” Journal of Management Research, 2, No. 1 (2010), E2; available at macrothink.org/jmr. 34. See www12.statcan.ca/english/census06/analysis/ agesex/ProvTerr1.cfm. 35. OECD StatExtracts, Organisation for Economic Co-operation and Development, downloaded on June 29, 2008, from stats.oecd.org/wbos. 36. For an overview of remote deposit capture, see remotedepositcapture.com/overview/rdc.overview.aspx. 37. From the “Community Bank Competitiveness Survey,” 2008, ABA Banking Journal. The survey is available at nxtbook.com/nxtbooks/sb/ababj-compsurv08/ index.php.
42. See cdc.gov/brfss/. The data file BRFSS contains several variables from this source.
44. Although these data are fictitious, similar though less simple situations occur. See P. J. Bickel and J. W. O’Connell, “Is there a sex bias in graduate admissions?,” Science, 187 (1975), pp. 398–404.
CHAPTER 3 1. See norc.uchicago.edu. 2. Stewart Warden et al., “Throwing induces substantial torsional adaption within the midshaft humerus of male baseball players,” Bone, 45 (2009), pp. 931–941. 3. Corby C. Martin et al., “Children in school cafeterias select foods containing more saturated fat and energy than the Institute of Medicine recommendations,” Journal of Nutrition, 140 (2010), pp. 1653–1660. 4. Based on “Look, no hands: Automatic soap dispensers,” Consumer Reports, February 2013, p. 11. 5. From “Did you know,” Consumer Reports, February 2013, p. 10. 6. Bruce Barrett et al., “Echinacea for treating the common cold,” Annals of Internal Medicine, 153 (2010), pp. 769–777. 7. For a full description of the STAR program and its follow-up studies, go to heros-inc.org/star.htm. 8. See Note 6. 9. Bonnie Spring et al., “Multiple behavior changes in diet and activity,” Archives of Internal Medicine, 172, No. 10 (2012), pp. 789–796.
N-4
Notes and Data Sources
10. Based on Gerardo Ramirez and Sian L. Beilock, “Writing about testing worries boosts exam performance in the classroom,” Science, 331 (2011), p. 2011. Although we describe the experiment as not including a control group, the researchers who conducted this study did, in fact, use one.
September 12, 1998. Many other examples appear in T. W. Smith, “That which we call welfare by any other name would smell sweeter,” Public Opinion Quarterly, 51 (1987), pp. 75–83.
11. A general discussion of failures of blinding is Dean Ferguson et al., “Turning a blind eye: The success of blinding reported in a random sample of randomised, placebo controlled trials,” British Medical Journal, 328 (2004), p. 432.
25. From pewresearch.org on November 10, 2009.
12. Based on a study conducted by Sandra Simonis under the direction of Professor Jon Harbor from the Purdue University Department of Earth, Atmospheric Sciences, and Planetary.
28. John C. Bailar III, “The real threats to the integrity of science,” Chronicle of Higher Education, April 21, 1995, pp. B1–B2.
13. Based on a study conducted by Tammy Younts directed by Professor Deb Bennett of the Purdue University Department of Educational Studies. For more information about Reading Recovery, see readingrecovery.org/. 14. Based on a study conducted by Rajendra Chaini under the direction of Professor Bill Hoover of the Purdue University Department of Forestry and Natural Resources. 15. From the Hot Ringtones list at billboard.com/ on January 28, 2013. 16. From the Rock Songs list at billboard.com/ on January 28, 2013. 17. From the online version of the Bureau of Labor Statistics, Handbook of Methods, modified April 17, 2003, at bls.gov. The details of the design are more complicated than we describe. 18. For more detail on the material of this section and complete references, see P. E. Converse and M. W. Traugott, “Assessing the accuracy of polls and surveys,” Science, 234 (1986), pp. 1094–1098. 19. From census.gov/cps/methodology/nonresponse. html on January 29, 2013. 20. From www3.norc.Org/GSSWebsite/FAQs/ on January 29, 2013.
24. From gallup.com on November 10, 2009.
26. From thefuturescompany.com/ on January 29, 2013. 27. From aauw.org/act/laf/library/harassment_stats.cfm on January 30, 2013.
29. The difficulties of interpreting guidelines for informed consent and for the work of institutional review boards in medical research are a main theme of Beverly Woodward, “Challenges to human subject protections in U.S. medical research,” Journal of the American Medical Association, 282 (1999), pp. 1947–1952. The references in this paper point to other discussions. 30. Quotation from the Report of the Tuskegee Syphilis Study Legacy Committee, May 20, 1996. A detailed history is James H. Jones, Bad Blood: The Tuskegee Syphilis Experiment, Free Press, 1993. 31. Dr. Hennekens’s words are from an interview in the Annenberg/Corporation for Public Broadcasting video series Against All Odds: Inside Statistics. 32. See ftc.gov/opa/2009/04/kellogg.shtm. 33. On February 12, 2012, the CBS show 60 Minutes reported the latest news on this study, which was published in the Journal of Clinical Oncology in 2007. See cbsnews.com/video/watch/?id57398476n. 34. From Randi Zlotnik Shaul et al., “Legal liabilities in research: Early lessons from North America,” BMJ Medical Ethics, 6, No. 4 (2005), pp. 1–4. 35. See previous note. 36. The report was issued in February 2009 and is available from ftc.gov/os/2009/02/P085400behavadreport.pdf.
21. See pewresearch.org/about. 22. See “Assessing the representativeness of public opinion surveys,” May 15, 2012, from people-press. org/2012/05/15. 23. Sex: Tom W. Smith, “The JAMA controversy and the meaning of sex,” Public Opinion Quarterly, 63 (1999), pp. 385–400. Welfare: from a New York Times/ CBS News Poll reported in the New York Times, July 5, 1992. Scotland: “All set for independence?” Economist,
CHAPTER 4 1. An informative and entertaining account of the origins of probability theory is Florence N. David, Games, Gods and Gambling, Charles Griffin, London, 1962. 2. Color popularity for 2011 from the Dupont Automotive Color report; see dupont.com/Media_center/en_US/ color_popularity.
Notes and Data Sources 3. You can find a mathematical explanation of Benford’s law in Ted Hill, “The first-digit phenomenon,” American Scientist, 86 (1996), pp. 358–363; and Ted Hill, “The difficulty of faking data,” Chance, 12, No. 3 (1999), pp. 27–31. Applications in fraud detection are discussed in the second paper by Hill and in Mark A. Nigrini, “I’ve got your number,” Journal of Accountancy, May 1999, available online at aicpa.org/pubs/jofa/joaiss.htm. 4. Royal Statistical Society news release, “Royal Statistical Society concerned by issues raised in Sally Clark case,” October 23, 2001, at www.rss.org.uk. For background, see an editorial and article in the Economist, January 22, 2004. The editorial is entitled “The probability of injustice.” 5. See cdc.gov/mmwr/preview/mmwrhtml/ mm57e618a1.htm. 6. See the previous note. 7. From funtonia.com/top_ringtones_chart.asp. This website gives popularity scores based on download activity on the Internet. These scores were converted to probabilities for this exercise by dividing each popularity score by the sum of the scores for the top ten ringtones. 8. See bloodbook.com/world-abo.html for the distribution of blood types for various groups of people. 9. From Statistics Canada, www.statcan.ca. 10. We use x both for the random variable, which takes different values in repeated sampling, and for the numerical value of the random variable in a particular sample. Similarly, s and pˆ stand both for random variables and for specific values. This notation is mathematically imprecise but statistically convenient.
N-5
The variance of a continuous random variable X is the average squared deviation of the values of X from their mean, found by the integral
冮
s2X ⫽ 1x ⫺ m2 2 f 1x2 dx 15. See A. Tversky and D. Kahneman, “Belief in the law of small numbers,” Psychological Bulletin, 76 (1971), pp. 105–110, and other writings of these authors for a full account of our misperception of randomness. 16. Probabilities involving runs can be quite difficult to compute. That the probability of a run of three or more heads in 10 independent tosses of a fair coin is (1/2) 1 (1/128) 5 0.508 can be found by clever counting. A general treatment using advanced methods appears in Section XIII.7 of William Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed., Wiley, 1968. 17. R. Vallone and A. Tversky, “The hot hand in basketball: On the misperception of random sequences,” Cognitive Psychology, 17 (1985), pp. 295–314. A later series of articles that debate the independence question is A. Tversky and T. Gilovich, “The cold facts about the ‘hot hand’ in basketball,” Chance, 2, No. 1 (1989), pp. 16–21; P. D. Larkey, R. A. Smith, and J. B. Kadane, “It’s OK to believe in the ‘hot hand,’” Chance, 2, No. 4 (1989), pp. 22–30; and A. Tversky and T. Gilovich, “The ‘hot hand’: Statistical reality or cognitive illusion?” Chance, 2, No. 4 (1989), pp. 31–34. 18. Based on a study discussed in S. Atkinson, G. McCabe, C. Weaver, S. Abrams, and K O’Brien, “Are current calcium recommendations for adolescents higher than needed to achieve optimal peak bone mass? The controversy,” Journal of Nutrition, 138, No. 6 (2008), pp. 1182–1186.
11. We will consider only the case in which X takes a finite number of possible values. The same ideas, implemented with more advanced mathematics, apply to random variables with an infinite but still countable collection of values.
19. Based on a study described in Corby C. Martin et al., “Children in school cafeterias select foods containing more saturated fat and energy than the Institute of Medicine recommendations, “ Journal of Nutrition, 140 (2010), pp. 1653–1660.
12. Based on a Pew Internet report, “Teens and distracted driving,” available from pewinternet.org/ Reports/2009/Teens-and-Distracted-Driving.aspx.
20. Based on The Ethics of American Youth–-2008, available from the Josephson Institute, charactercounts.org/ programs/reportcard.
13. See pewinternet.org/Reports/2009/17-Twitter-andStatus-Updating-Fall-2009.aspx.
21. See nces.ed.gov/programs/digest. Data are from the 2012 Digest of Education Statistics.
14. The mean of a continuous random variable X with density function f(x) can be found by integration:
22. From the 2012 Statistical Abstract of the United States, Table 299.
冮
mX ⫽ x f 1x2 dx This integral is a kind of weighted average, analogous to the discrete-case mean mX ⫽ a xP1X x 2
23. Ibid., Table 278.
CHAPTER 5 1. K. M. Orzech et al., “The state of sleep among college students at a large public university,” Journal of American College Health, 59 (2011), pp. 612–619.
N-6
Notes and Data Sources
2. The description of the 2011 survey and results can be found at blog.appsfire.com/infographic-ios-apps-vsweb-apps. 3. Haipeng Shen, “Nonparametric regression for problems involving lognormal distributions,” PhD dissertation, University of Pennsylvania, 2003. Thanks to Haipeng Shen and Larry Brown for sharing the data.
18. This 2011 survey was performed by Ipsos MediaCT right before a new Copyright Amendment Act went into effect in New Zealand. Results of this survey can be found at www.copyright.co.nz/News/2195/. 19. Lydia Saad, “Americans’ preference for smaller families edges higher,” Gallup Poll press release, June 30, 2011, www.gallup.com.
4. Findings are from the Time Mobility Poll run between June 29 and July 28, 2012. The results were published in the August 27, 2012, issue of Time.
20. A summary of Larry Wright’s study can be found at www.nytimes.com/2009/03/04/sports/ basketball/04freethrow.html.
5. Statistical methods for dealing with time-to-failure data, including the Weibull model, are presented in Wayne Nelson, Applied Life Data Analysis, Wiley, 1982.
21. Barbara Means et al., “Evaluation of evidencebased practices in online learning: A meta-analysis and review of online learning studies,” U.S. Department of Education, Office of Planning, Evaluation, and Policy Development, 2010.
6. Findings are from Nielsen’s “State of the Appnation— a year of change and growth in U.S. Smartphones,” posted May 16, 2012, on blog.nielsen.com/nielsenwire/. 7. Statistics regarding Facebook usage can be found at www.facebook.com/notes/facebook-data-team/ anatomy-of-facebook/10150388519243859.
22. Dafna Kanny et al., “Vital signs: Binge drinking among women and high school girls—United States, 2011,” Morbidity and Mortality Weekly Report, January 8, 2013.
8. From the grade distribution database of the Indiana University Office of the Registrar, gradedistribution. registrar.indiana.edu.
23. Information was obtained from “Price comparisons of wireline, wireless and internet services in Canada and with foreign jurisdictions,” Canadian Radio-Television and Telecommunications Commission, April 6, 2012.
9. Karel Kleisner et al., “Trustworthy-looking face meets brown eyes,” PLoS ONE, 8, No. 1 (2013), e53285, doi:10.1371/journal.pone.0053285.
24. This information can be found at www.census.gov/ genealogy/names/dist.all.last.
10. Diane M. Dellavalle and Jere D. Haas, “Iron status is associated with endurance performance and training in female rowers,” Medicine and Science in Sports and Exercise, 44, No. 8 (2012), pp. 1552–1559. 11. Results of this and other questions from this 2011 survey can be found at http://www.mumsnet.com/ surveys/pressure-on-children-and-parents. 12. Crossing the Line: Sexual Harassment at School, a report from the American Association of University Women Educational Foundation published in 2011. See www.aauw.org/. 13. S. A. Rahimtoola, “Outcomes 15 years after valve replacement with a mechanical vs. a prosthetic valve: Final report of the Veterans Administration randomized trial,” American College of Cardiology, www.acc.org/ education/online/trials/acc2000/15yr.htm.
CHAPTER 6 1. Noel Cressie, Statistics for Spatial Data, Wiley, 1993. The significance test result that we report is one of several that could be used to address this question. See pp. 607–609 of the Cressie book for more details. 2. The 2010–2011 statistics for California were obtained from the California Department of Education website, dq.cde.ca.gov. 3. Based on information reported in “How America pays for college 2012,” found online at www.siena.edu/uploadedfiles/ home/SallieMaeHowAmericaPays2012.pdf. 4. See Note 3. This total amount includes grants, scholarships, loans, and assistance from friends and family.
14. The full online clothing store ratings are featured in the December 2008 issue of Consumer Reports and online at www.ConsumerReports.org.
5. Average starting salary taken from the September 2012 salary survey by the National Association of Colleges and Employers.
15. The results of this 2012 survey can be found at www. theaa.com/newsroom/news-2012/streetwatch-october2012-fewer-potholes.html.
6. The standard reference here is Bradley Efron and Robert J. Tibshirani, An Introduction to the Bootstrap, Chapman Hall, 1993. A less technical overview is in Bradley Efron and Robert J. Tibshirani, “Statistical data analysis in the computer age,” Science, 253 (1991), pp. 390–395.
16. The results of this 2012 survey can be found at josephsoninstitute.org. 17. A description and summary of this 2012 survey can be found at www.ipsos-na.com/news-polls/pressrelease. aspx?id55537.
7. See www.thekaraokeechannel.com/online/#. 8. These annual surveys can be found at www.apa.org/ news/press/releases/stress/index.aspx.
Notes and Data Sources 9. C. M. Weaver et al., “Quantification of biochemical markers of bone turnover by kinetic measures of bone formation and resorption in young healthy females,” Journal of Bone and Mineral Research, 12 (1997), pp. 1714–1720. 10. See Note 5. 11. Euna Hand and Lisa M. Powell, “Consumption patterns of sugar-sweetened beverages in the United States,” Journal of the Academy of Nutrition and Dietetics, 113, No. 1 (2013), pp. 43–53. 12. See the 2012 press release from the Student Monitor, at www.studentmonitor.com. 13. Elizabeth Mendes, “U.S. job satisfaction struggles to recover to 2008 levels,” Gallup News Service, May 31, 2011. Found at www.gallup.com/poll/. 14. The vehicle is a 2002 Toyota Prius. 15. Regional cost-of-living rates are often computed using the Department of Labor, Bureau of Labor Statistics, metropolitan-area consumer price indexes. These can be found at www.bls.gov/cpi. 16. See Note 11. 17. M. Garaulet et al., “Timing of food intake predicts weight loss effectiveness,” International Journal of Obesity, 1 (2013), pp. 1–8. 18. Giacomo DeGiorgi et al., “Be as careful of the company you keep as of the books you read: Peer effects in education and on the labor market,” National Bureau of Economic Research, working paper 14948 (2009). 19. Seung-Ok Kim, “Burials, pigs, and political prestige in neolithic China,” Current Anthropology, 35 (1994), pp. 119–141. 20. These data were collected in connection with the Purdue Police Alcohol Student Awareness Program run by Police Officer D. A. Larson. 21. National Assessment of Educational Progress, The Nation’s Report Card, Mathematics 2011. 22. Matthew A. Lapierre et al., “Background television in the homes of U.S. children,” Pediatrics, 130, No. 5 (2012), pp. 839–846. 23. Sogol Javaheri et al., “Sleep quality and elevated blood pressure in adolescents,” Circulation, 118 (2008), pp. 1034–1040. 24. Victor Lun et al., “Evaluation of nutritional intake in Canadian high-performance athletes,” Clinical Journal of Sports Medicine, 19, No. 5 (2009), pp. 405–411. 25. R. A. Fisher, “The arrangement of field experiments,” Journal of the Ministry of Agriculture of Great Britain, 33 (1926), p. 504, quoted in Leonard J. Savage, “On rereading R. A. Fisher,” Annals of Statistics, 4 (1976), p. 471. Fisher’s
N-7
work is described in a biography by his daughter: Joan Fisher Box, R. A. Fisher: The Life of a Scientist, Wiley, 1978. 26. The editorial was written by Phil Anderson. See British Medical Journal, 328 (2004), pp. 476–477. A letter to the editor on this topic by Doug Altman and J. Martin Bland appeared shortly after. See “Confidence intervals illuminate absence of evidence,” British Medical Journal, 328 (2004), pp. 1016–1017. 27. A. Kamali et al., “Syndromic management of sexuallytransmitted infections and behavior change interventions on transmission of HIV-1 in rural Uganda: A community randomised trial,” Lancet, 361 (2003), pp. 645–652. 28. T. D. Sterling, “Publication decisions and their possible effects on inferences drawn from tests of significance— or vice versa,” Journal of the American Statistical Association, 54 (1959), pp. 30–34. Related comments appear in J. K. Skipper, A. L. Guenther, and G. Nass, “The sacredness of 0.05: A note concerning the uses of statistical levels of significance in social science,” American Sociologist, 1 (1967), pp. 16–18. 29. For a good overview of these issues, see Bruce A. Craig, Michael A. Black, and Rebecca W. Doerge, “Gene expresssion data: The technology and statistical analysis,” Journal of Agricultural, Biological, and Environmental Statistics, 8 (2003), pp. 1–28. 30. Erick H. Turner et al., “Selective publication of antidepressant trials and its influence on apparent efficacy,” New England Journal of Medicine, 358 (2008), pp. 252–260. 31. Robert J. Schiller, “The volatility of stock market prices,” Science, 235 (1987), pp. 33–36. 32. Padmaja Ayyagari and Jody L. Sindelar, “The impact of job stress on smoking and quitting: Evidence from the HRS,” National Bureau of Economic Research, working paper 15232 (2009). 33. Corby K. Martin et al., “Children in school cafeterias select foods containing more saturated fat and energy than the Institute of Medicine recommendations,” Journal of Nutrition, 140 (2010), pp. 1653–1660. 34. Data from Joan M. Susic, “Dietary phosphorus intakes, urinary and peritoneal phosphate excretion and clearance in continuous ambulatory peritoneal dialysis patients,” MS thesis, Purdue University, 1985. 35. Mugdha Gore and Joseph Thomas, “Store image as a predictor of store patronage for nonprescription medication purchases: A multiattribute model approach,” Journal of Pharmaceutical Marketing & Management, 10 (1996), pp. 45–68. 36. Greg L. Stewart et al., “Exploring the handshake in employment interviews,” Journal of Applied Psychology, 93 (2008), pp. 1139–1146.
N-8
Notes and Data Sources
CHAPTER 7 1. Average hours per month obtained from “The CrossPlatform Report, 3rd Quarter 2012,” Nielsen Company (2013). 2. C. Don Wiggins, “The legal perils of ‘underdiversification’— a case study,” Personal Financial Planning, 1, No. 6 (1999), pp. 16–18. 3. These data were collected as part of a larger study of dementia patients conducted by Nancy Edwards, School of Nursing, and Alan Beck, School of Veterinary Medicine, Purdue University. 4. These recommendations are based on extensive computer work. See, for example, Harry O. Posten, “The robustness of the one-sample t-test over the Pearson system,” Journal of Statistical Computation and Simulation, 9 (1979), pp. 133–149; and E. S. Pearson and N. W. Please, “Relation between the shape of population distribution and the robustness of four simple test statistics,” Biometrika, 62 (1975), pp. 223–241. 5. The data were obtained on August 24, 2006, from an iPod owned by George McCabe, Jr. 6. The method is described in Xiao-Hua Zhou and Sujuan Gao, “Confidence intervals for the log-normal mean,” Statistics in Medicine, 16 (1997), pp. 783–790. 7. You can find a practical discussion of distributionfree inference in Myles Hollander and Douglas A. Wolfe, Nonparametric Statistical Methods, 2nd ed., Wiley, 1999. 8. Statistics regarding Facebook usage can be found at www.facebook.com/notes/facebook-data-team/ anatomy-of-facebook/10150388519243859. 9. A description of the lawsuit can be found at www. cnn.com/2013/02/26/business/california-anheuserbusch-lawsuit/index.html. 10. See Note 1. 11. Christine L. Porath and Amir Erez, “Overlooked but not untouched: How rudeness reduces onlookers’ performance on routine and creative tasks,” Organizational Behavior and Human Decision Processes, 109 (2009), pp. 29–44. 12. The vehicle is a 2002 Toyota Prius owned by the third author. 13. Niels van de Ven et al., “The return trip effect: Why the return trip often seems to take less time,” Psychonomic Bulletin and Review, 18, No. 5 (2011), pp. 827–832. 14. Sujata Sethi et al., “Study of level of stress in the parents of children with attention-deficit/hyperactivity disorder,” Journal of Indian Association for Child and Adolescent Mental Health, 8, No. 2 (2012), pp. 25–37. 15. James A. Levine, Norman L. Eberhardt, and Michael D. Jensen, “Role of nonexercise activity thermogenesis
in resistance to fat gain in humans,” Science, 283 (1999), pp. 212–214. Data for this study are available from the Science website, www.sciencemag.org. 16. These data were collected in connection with a bone health study at Purdue University and were provided by Linda McCabe. 17. Data provided by Joseph A. Wipf, Department of Foreign Languages and Literatures, Purdue University. 18. Data from Wayne Nelson, Applied Life Data Analysis, Wiley, 1982, p. 471. 19. Summary information can be found at the National Center for Health Statistics website, www.cdc.gov/nchs/ nhanes.htm. 20. Detailed information about the conservative t procedures can be found in Paul Leaverton and John J. Birch, “Small sample power curves for the two sample location problem,” Technometrics, 11 (1969), pp. 299–307; in Henry Scheffé, “Practical solutions of the Behrens-Fisher problem,” Journal of the American Statistical Association, 65 (1970), pp. 1501–1508; and in D. J. Best and J. C. W. Rayner, “Welch’s approximate solution for the BehrensFisher problem,” Technometrics, 29 (1987), pp. 205–210. 21. This example is adapted from Maribeth C. Schmitt, “The effects of an elaborated directed reading activity on the metacomprehension skills of third graders,” PhD dissertation, Purdue University, 1987. 22. See the extensive simulation studies in Harry O. Posten, “The robustness of the two-sample t test over the Pearson system,” Journal of Statistical Computation and Simulation, 6 (1978), pp. 295–311. 23. M. Garaulet et al., “Timing of food intake predicts weight loss effectiveness,” International Journal of Obesity, advance online publication, January 29, 2013, doi:10.1038/ijo.2012.229. 24. This study is reported in Roseann M. Lyle et al., “Blood pressure and metabolic effects of calcium supplementation in normotensive white and black men,” Journal of the American Medical Association, 257 (1987), pp. 1772–1776. The individual measurements in Table 7.5 were provided by Dr. Lyle. 25. Karel Kleisner et al., “Trustworthy-looking face meets brown eyes,” PLoS ONE, 8, No. 1 (2013), e53285, doi:10.1371/journal.pone.0053285. 26. Reynol Junco, “Too much face and not enough books: The relationship between multiple indices of Facebook use and academic performance,” Computers in Human Behavior, 28, No. 1 (2011), pp. 187–198. 27. C. E. Cryfer et al., “Misery is not miserly: Sad and self-focused individuals spend more,” Psychological Science, 19 (2008), pp. 525–530.
Notes and Data Sources 28. A. A. Labroo et al., “Of frog wines and frowning watches: Semantic priming, perceptual fluency, and brand evaluation,” Journal of Consumer Research, 34 (2008), pp. 819–831. 29. The 2012 study can be found at www.qsrmagazine. com/content/2012-drive-thru-. 30. Grant D. Brinkworth et al., “Long-term effects of a very low-carbohydrate diet and a low-fat diet on mood and cognitive function,” Archives of Internal Medicine, 169 (2009), pp. 1873–1880. 31. B. Bakke et al., “Cumulative exposure to dust and gases as determinants of lung function decline in tunnel construction workers,” Occupational Environmental Medicine, 61 (2004), pp. 262–269. 32. Samara Joy Nielsen and Barry M. Popkin, “Patterns and trends in food portion sizes, 1977–1998,” Journal of the American Medical Association, 289 (2003), pp. 450–453. 33. Gordana Mrdjenovic and David A. Levitsky, “Nutritional and energetic consequences of sweetened drink consumption in 6- to 13-year-old children,” Journal of Pediatrics, 142 (2003), pp. 604–610. 34. David Han-Kuen Chu, “A test of corporate advertising using the elaboration likelihood model,” MS thesis, Purdue University, 1993. 35. M. F. Picciano and R. H. Deering, “The influence of feeding regimens on iron status during infancy,” American Journal of Clinical Nutrition, 33 (1980), pp. 746–753. 36. The problem of comparing spreads is difficult even with advanced methods. Common distribution-free procedures do not offer a satisfactory alternative to the F test, because they are sensitive to unequal shapes when comparing two distributions. A good introduction to the available methods is W. J. Conover, M. E. Johnson, and M. M. Johnson, “A comparative study of tests for homogeneity of variances, with applications to outer continental shelf bidding data,” Technometrics, 23 (1981), pp. 351–361. Modern resampling procedures often work well. See Dennis D. Boos and Colin Brownie, “Bootstrap methods for testing homogeneity of variances,” Technometrics, 31 (1989), pp. 69–82.
N-9
and metabolism in healthy free-living males,” Journal of the American College of Nutrition, 16 (1997), pp. 134–139. 41. G. E. Smith et al., “A cognitive training program based on principles of brain plasticity: Results from the Improvement in Memory with Plasticity-Based Adaptive Cognitive Training (IMPACT) study,” Journal of the American Geriatrics Society, epub (2009) 57, No. 4, pp. 594–603. 42. Douglas J. Levey et al., “Urban mockingbirds quickly learn to identify individual humans,” Proceedings of the National Academy of Sciences, 106 (2009), pp. 8959–8962. 43. B. Wansink et al., “Fine as North Dakota wine: Sensory expectations and the intake of companion foods,” Physiology & Behavior, 90 (2007), pp. 712–716. 44. Anne Z. Hoch et al., “Prevalence of the female athlete triad in high school athletes and sedentary students,” Clinical Journal of Sports Medicine, 19 (2009), pp. 421–428. 45. This exercise is based on events that are real. The data and details have been altered to protect the privacy of the individuals involved. 46. Based loosely on D. R. Black et al., “Minimal interventions for weight control: A cost-effective alternative,” Addictive Behaviors, 9 (1984), pp. 279–285. 47. These data were provided by Professor Sebastian Heath, School of Veterinary Medicine, Purdue University. 48. J. W. Marr and J. A. Heady, “Within- and betweenperson variation in dietary surveys: Number of days needed to classify individuals,” Human Nutrition: Applied Nutrition, 40A (1986), pp. 347–364.
CHAPTER 8 1. The actual distribution of X based on an SRS from a finite population is the hypergeometric distribution. Details regarding this distribution can be found in Sheldon M. Ross, A First Course in Probability, 8th ed., Prentice Hall, 2010. 2. From pewinternet.org/Reports/2013/Coming-andgoing-on-facebook.aspx, February 5, 2013. 3. Results of the survey are available at slideshare.net/ duckofdoom/google-research-about-mobile-internetin-2011.
37. G. E. P. Box, “Non-normality and tests on variances,” Biometrika, 40 (1953), pp. 318–335. The quote appears on page 333.
4. Details of exact binomial procedures can be found in Myles Hollander and Douglas Wolfe, Nonparametric Statistical Methods, 2nd ed., Wiley, 1999.
38. This city’s restaurant inspection data can be found at www.jsonline.com/watchdog/dataondemand/.
5. See A. Agresti and B. A. Coull, “Approximate is better than ‘exact’ for interval estimation of binomial proportions,” American Statistician, 52 (1998), pp. 119–126. A detailed theoretical study is Lawrence D. Brown, Tony Cai, and Anirban DasGupta, “Confidence intervals for a binomial proportion and asymptotic expansions,” Annals of Statistics, 30 (2002), pp. 160–201.
39. Braz Camargo et al., “Interracial friendships in college,” Journal of Labor Economics, 28 (2010), pp. 861–892. 40. Based on Loren Cordain et al., “Influence of moderate daily wine consumption on body weight regulation
N-10
Notes and Data Sources
6. See, for example, pilatesmethodalliance.org. 7. See pewinternet.org/Reports/2013/in-store-mobilecommerce.aspx. 8. Heather Tait, Aboriginal Peoples Survey, 2006: Inuit Health and Social Conditions, Social and Aboriginal Statistics Division, Statistics Canada, 2008. Available from statcan.gc.ca/pub. 9. See southerncross.co.nz/about-the-group/mediareleases/2013.aspx. 10. See commonsensemedia.org/sites/default/files/ full_cap-csm_report_results-1-7-13.pdf. 11. See “National Survey of Student Engagement, the College Student Report 2009,” available online at nsse. iub.edu/index.cfm. 12. This survey and others that study issues related to college students can be found at nelliemae.com. 13. See Note 11. 14. Information about the survey can be found online at saint-denis.library.arizona.edu/natcong. 15. See Note 2. 16. See Alan Agresti and Brian Caffo, “Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures,” American Statistician, 45 (2000), pp. 280–288. The plus four interval is a bit conservative (true coverage probability is higher than the confidence level) when p1 and p2 are equal and close to 0 or 1, but the traditional interval is much less accurate and has the fatal flaw that the true coverage probability is less than the confidence level. 17. J. M. Tanner, “Physical growth and development,” in J. O. Forfar and G. C. Arneil, Textbook of Paediatrics, 3rd ed., Churchill Livingston, 1984, pp. 1–292. 18. Based on T. A. Brighton et al., “Low-dose aspirin for preventing recurrent venous thromboembolism,” New England Journal of Medicine, 367, No. 21 (2012), pp. 1979–1987. The analysis in the published manuscript used a slightly more complicated summary, called the hazard ratio, to compare the treatments. 19. Edward Bumfardner, “Loss of teeth as a disqualification for military service,” Transactions of the Kansas Academy of Science, 18 (1903), pp. 217–219. 20. B. J. Bradley et al., “Historical perspective and current status of the physical education requirement at American 4-year colleges and universities,” Research Quarterly for Exercise and Sport, 83, No. 4 (2012), pp. 503–512. 21. Erin K. O’Loughlin et al., “Prevalence and correlates of exergaming in youth,” Pediatrics, 130 (2012), pp. 806–814.
22. From a Pew Internet Project Data Memo by Amanda Lenhart et al., dated December 2008. Available at pewinternet.org. 23. From Monica Macaulay and Colleen Brice, “Don’t touch my projectile: Gender bias and stereotyping in syntactic examples,” Language, 73, No. 4 (1997), pp. 798–825. The first part of the title is a direct quote from one of the texts. 24. The report, dated May 18, 2012, is available from pewinternet.org/Reports/2012/Future-of-Gamification/ Overview.aspx. 25. From the Pew Research Center’s Project for Excellence in Journalism, The State of the News Media 2012, available from stateofthemedia.org/?src5prc-headline. 26. See iom.edu. 27. Based on a study described in Corby C. Martin et al., “Children in school cafeterias select foods containing more saturated fat and energy than the Institute of Medicine recommendations,” Journal of Nutrition, 140 (2010), pp. 1653–1660. 28. Data are from the NOAA Satellite and Information Service at ncdc.noaa.gov/special-reports/groundhogday.php. 29. From pewinternet.org/,/media//Files/ Reports/2013/PIP_SocialMediaUsers.pdf. 30. From forbes.com/sites/ericsavitz/2013/01/11/totallypwned-2012-u-s-video-game-retail-sales-tumble-22. 31. From the Entertainment Software Association website at theesa.com/facts. 32. See Note 12. 33. See S. W. Lagakos, B. J. Wessen, and M. Zelen, “An analysis of contaminated well water and health effects in Woburn, Massachusetts,” Journal of the American Statistical Association, 81 (1986), pp. 583–596, and the following discussion. This case is the basis for the movie A Civil Action. 34. This case is discussed in D. H. Kaye and M. Aickin (eds.), Statistical Methods in Discrimination Litigation, Marcel Dekker, 1986; and D. C. Baldus and J. W. L. Cole, Statistical Proof of Discrimination, McGraw-Hill, 1980. 35. See Note 12.
CHAPTER 9 1. From J. Cantor, “Long-term memories of frightening media often include lingering trauma symptoms,” poster paper presented at the Association for Psychological Science Convention, New York, May 26, 2006.
Notes and Data Sources 2. When the expected cell counts are small, it is best to use a test based on the exact distribution rather than the chi-square approximation, particularly for 2 ⫻ 2 tables. Many statistical software systems offer an “exact” test as well as the chi-square test for 2 ⫻ 2 tables. 3. From E. Y. Peck, “Gender differences in film-induced fear as a function of type of emotion measure and stimulus content: A meta-analysis and laboratory study,” PhD dissertation, University of Wisconsin–Madison. 4. D.-C. Seo et al., “Relations between physical activity and behavioral and perceptual correlates among midwestern college students,” Journal of Americal College Health, 56, No. 2 (2007), pp. 187–197. 5. See, for example, Alan Agresti, Categorical Data Analysis, 2nd ed., Wiley, 2007. 6. From P. Strazzullo et al., “Salt intake, stroke, and cardiovascular disease: a meta-analysis of prospective studies,” British Medical Journal, 339 (2009), pp. 1–9. The meta-analysis combined data from 14 study cohorts taken from 10 different studies. 7. N. R. Cook et al., “Long term effects of dietary sodium reduction on cardiovascular disease outcomes: Observational follow-up of the trials of the hypertension prevention (TOHP),” British Medical Journal, 334 (2007), pp. 1–8. 8. The sampling procedure was designed by George McCabe. It was carried out by Amy Conklin, an undergraduate honors student in the Department of Foods and Nutrition at Purdue University. 9. The analysis could also be performed by using a twoway table to compare the states of the selected and notselected students. Since the selected students are a relatively small percent of the total sample, the results will be approximately the same. 10. See the M&M Mars website at us.mms.com/us/ about/products for this and other information. 11. Catherine Hill and Holly Kearl, Crossing the Line: Sexual Harassment at School, American Association of University Women, Washington, DC, 2011. 12. Based on pewsocialtrends.org/files/2011/08/onlinelearning.pdf. 13. For an overview of remote deposit capture, see remotedepositcapture.com/overview/rdc.overview.aspx. 14. From the Community Bank Competitiveness Survey, 2008, ABA Banking Journal. The survey is available at nxtbook.com/nxtbooks/sb/ababj-compsurv08/index.php. 15. See nhcaa.org. 16. These data are a composite based on several actual audits of this type.
N-11
17. Data provided by Professor Marcy Towns of the Purdue University Department of Chemistry. 18. Based on The Ethics of American Youth–2008, available from the Josephson Institute at charactercounts. org/programs/reportcard. 19. From the Survey of Canadian Career College Students Phase II: In-School Student Survey, 2008. This report is available from hrsdc.gc.ca/eng/publications_ resources.
CHAPTER 10 1. Data based on Michael L. Mestek et al., “The relationship between pedometer-determined and self-reported physical activity and body composition variables in college-aged men and women,” Journal of American College Health, 57 (2008), pp. 39–44. 2. The vehicle is a Pontiac transport van. 3. Information regarding bone health can be found in “Osteoporosis: Peak bone mass in women,” last reviewed in January 2012 and available at www. niams.nih.gov/Health_Info/Bone/Osteoporosis/ bone_mass.asp. 4. The data were provided by Linda McCabe and were collected as part of a large study of women’s bone health and another study of calcium kinetics, both directed by Professor Connie Weaver of the Department of Foods and Nutrition, Purdue University. 5. These data were provided by Professor Wayne Campbell of the Purdue University Department of Foods and Nutrition. 6. For more information about nutrient requirements, see the Institute of Medicine publications on Dietary Reference Intakes available at www.nap.edu. 7. The method is described in Chapter 2 of M. Kutner et al., Applied Linear Statistical Models, 5th ed., Irwin, 2004. 8. National Science Foundation, Division of Science Resources Statistics, Academic Research and Development Expenditures: Fiscal Year 2009, Detailed Statistical Tables NSF 11-313, Arlington, VA, 2011. Available at www.nsf. gov/statistics/nsf11313/. 9. This annual report can be found at www.kiplinger. com. 10. Tuition rates for 2008 and 2011 were obtained from www.findthebest.com. 11. These are part of the data from the EESEE story “Blood Alcohol Content,” found on the text website, www.whfreeman.com/ips8e.
N-12
Notes and Data Sources
12. M. Mondello and J. Maxcy, “The impact of salary dispersion and performance bonuses in NFL organizations,” Management Decision, 47 (2009), pp. 110–123. These data were collected from www.cbssports.com/nfl/ playerrankings/regularseason/ and content.usatoday. com/sports/football/nfl/salaries/. 13. Selling price and assessment value available at php. jconline.com/propertysales/propertysales.php. 14. Data available at www.ncdc.noaa.gov. 15. Matthew P. Martens et al., “The co-occurrence of alcohol use and gambling activities in first-year college students,” Journal of American College Health, 57 (2009), pp. 597–602. 16. Based on Dan Dauwalter’s master’s thesis in the Department of Forestry and Natural Resources at Purdue University. More information is available in Daniel C. Dauwalter et al., “An index of biotic integrity for fish assemblages in Ozark Highland streams of Arkansas,” Southeastern Naturalist, 2 (2003), pp. 447–468. These data were provided by Emmanuel Frimpong. 17. G. Geri and B. Palla, “Considerazioni sulle più recenti osservazioni ottiche alla Torre Pendente di Pisa,” Estratto dal Bollettino della Società Italiana di Topografia e Fotogrammetria, 2 (1988), pp. 121–135. Professor Julia Mortera of the University of Rome provided valuable assistance with the translation. 18. M. Kuo et al., “The marketing of alcohol to college students: The role of low prices and special promotions,” American Journal of Preventive Medicine, 25, No. 3 (2003), pp. 204–211. 19. Rates can be found in “Annual Return of Key Indices (1993–2012),” available at www.lazardnet.com. 20. These data can be found in the report titled “Grade inflation at American colleges and universities,” at www. gradeinflation.com. 21. Toben F. Nelson et al., “The state sets the rate: The relationship among state-specific college binge drinking, stat binge drinking rates, and selected state alcohol control policies,” American Journal of Public Health, 95, No. 3 (2005), pp. 441–446. 22. Data on a sample of 12 of 56 perch in a data set contributed to the Journal of Statistics Education data archive www.amstat.org/publications/jse/ by Juha Puranen of the University of Helsinki. 23. L. Cooke et al., “Relationship between parental report of food neophobia and everyday food consumption in 2–6-year-old children,” Appetite, 41 (2003), pp. 205–206. 24. Alexandra Burt, “A mechanistic explanation of popularity: Genes, rule breaking, and evocative geneenvironment correlations,” Journal of Personality and Social Psychology, 96 (2009), pp. 783–794.
CHAPTER 11 1. This data set is similar to those used at Purdue University to assess academic success. 2. Mary E. Pritchard and Gregory S. Wilson, “Predicting academic success in undergraduates,” Academic Exchange Quarterly, 11 (2007), pp. 201–206. 3. R. M. Smith and P. A. Schumacher, “Predicting success for–actuarial students in undergraduate mathematics courses,” College Student Journal, 39, No. 1 (2005), pp. 165–177. 4. Based on Leigh J. Maynard and Malvern Mupandawana, “Tipping behavior in Canadian restaurants,” International Journal of Hospitality Management, 28 (2009), pp. 597–603. 5. Kathleen E. Miller, “Wired: Energy drinks, jock identity, masculine norms, and risk taking,” Journal of American College Health, 56 (2008), pp. 481–489. 6. From a table entitled “Largest Indianapolis-area architectural firms,” Indianapolis Business Journal, December 16, 2003. 7. The data were obtained from the Internet Movie Database (IMDb), available at www.imdb.com on April 20, 2010. 8. The 2009 table of 200 top universities can be found at www.timeshighereducation.co.uk. 9. The results were published in C. M. Weaver et al., “Quantification of biochemical markers of bone turnover by kinetic measures of bone formation and resorption in young healthy females,” Journal of Bone and Mineral Research, 12 (1997), pp. 1714–1720. The data were provided by Linda McCabe. 10. This data set was provided by Joanne Lasrado of the Purdue University Department of Foods and Nutrition. 11. These data are based on experiments performed by G. T. Lloyd and E. H. Ramshaw of the CSIRO Division of Food Research, Victoria, Australia. Some results of the statistical analyses of these data are given in G. P. McCabe, L. McCabe, and A. Miller, “Analysis of taste and chemical composition of cheddar cheese, 1982–83 experiments,” CSIRO Division of Mathematics and Statistics Consulting Report VT85/6; and in I. Barlow et al., “Correlations and changes in flavour and chemical parameters of cheddar cheeses during maturation,” Australian Journal of Dairy Technology, 44 (1989), pp. 7–18.
CHAPTER 12 1. R. Kanai et al., “Online social network size is reflected in human brain structure,” Proceedings of the Royal Society—Biological Sciences, 297 (2012), pp. 1327–1334.
Notes and Data Sources 2. Based on Stephanie T. Tong et al., “Too much of a good thing? The relationship between number of friends and interpersonal impressions on Facebook,” Journal of Computer-Mediated Communication, 13 (2008), pp. 531–549. 3. This rule is intended to provide a general guideline for deciding when serious errors may result by applying ANOVA procedures. When the sample sizes in each group are very small, this rule may be a little too conservative. For unequal sample sizes, particular difficulties can arise when a relatively small sample size is associated with a population having a relatively large standard deviation. 4. Penny M. Simpson et al., “The eyes have it, or do they? The effects of model eye color and eye gaze on consumer ad response,” Journal of Applied Business and Economics, 8 (2008), pp. 60–71. 5. Several different definitions for the noncentrality parameter of the noncentral F distribution are in use. When I 5 2, the l defined here is equal to the square of the noncentrality parameter d that we used for the two-sample t test in Chapter 7. Many authors prefer f ⫽ 2l兾I. We have chosen to use l because it is the form needed for the SAS function PROBF. 6. Bryan Raudenbush et al., “Pain threshold and tolerance differences among intercollegiate athletes: Implication of past sports injuries and willingness to compete among sports teams,” North American Journal of Psychology, 14 (2012), pp. 85–94. 7. Eileen Wood et al., “Examining the impact of off-task multi-tasking with technology on real-time classroom learning,” Computers & Education, 58 (2012), pp. 365–374. 8. Kendall J. Eskine, “Wholesome foods and wholesome morals? Organic foods reduce prosocial behavior and harshen moral judgments,” Social Psychological and Personality Science, 2012, doi: 10.1177/1948550612447114. 9. Adam I. Perlman et al., “Massage therapy for osteoarthritis of the knee: A randomized dose-finding trial,” PLoS ONE, 7, No. 2 (2012), e30248, doi:10.1371/journal. pone.0030248. 10. Jesus Tanguma et al., “Shopping and bargaining in Mexico: The role of women,” Journal of Applied Business and Economics, 9 (2009), pp. 34–40. 11. Jeffrey T. Kullgren et al., “Individual- versus groupbased financial incentives for weight loss,” Annals of Internal Medicine, 158, No. 7 (2013), pp. 505–514. 12. Corinne M. Kodama and Angela Ebreo, “Do labels matter? Attitudinal and behavioral correlates of ethnic and racial identity choices among Asian American undergraduates,” College Student Affairs Journal, 27, No. 2 (2009), pp. 155–175.
N-13
13. Sangwon Lee and Seonmi Lee, “Multiple play strategy in global telecommunication markets: An empirical analysis,” International Journal of Mobile Marketing, 3 (2008), pp. 44–53. 14. Christie N. Scollon et al., “Emotions across cultures and methods,” Journal of Cross-cultural Psychology, 35 (2004), pp. 304–326. 15. Adrian C. North et al., “The effect of musical style on restaurant consumers’ spending,” Environment and Behavior, 35 (2003), pp. 712–718. 16. Woo Gon Kim et al., “Influence of institutional DINESERV on customer satisfaction, return intention, and word-of-mouth,” International Journal of Hospitality Management, 28 (2009), pp. 10–17. 17. The experiment was performed in Connie Weaver’s lab in the Purdue University Department of Foods and Nutrition. The data were provided by Berdine Martin and Yong Jiang. 18. The data were provided by James Kaufman. The study is described in James C. Kaufman, “The cost of the muse: Poets die young,” Death Studies, 27 (2003), pp. 813–821. The quote from Yeats appears in this article. 19. Data provided by Jo Welch of the Purdue University Department of Foods and Nutrition. 20. Steve Badylak et al., “Marrow-derived cells populate scaffolds composed of xenogeneic extracellular matrix,” Experimental Hematology, 29 (2001), pp. 1310–1318. 21. This exercise is based on data provided from a study conducted by Jim Baumann and Leah Jones of the Purdue University School of Education.
CHAPTER 13 1. See www.who.int/topics/malaria/en/ for more information about malaria. 2. This example is based on a 2009 study described at clinicaltrials.gov/ct2/show/NCT00623857. 3. We present the two-way ANOVA model and analysis for the general case in which the sample sizes may be unequal. If the sample sizes vary a great deal, serious complications can arise. There is no longer a single standard ANOVA analysis. Most computer packages offer several options for the computation of the ANOVA table when cell counts are unequal. When the counts are approximately equal, all methods give essentially the same results. 4. Euna Hand and Lisa M. Powell, “Consumption patterns of sugar-sweetened beverages in the United States,” Journal of the Academy of Nutrition and Dietetics, 113, No. 1 (2013), pp. 43–53.
N-14
Notes and Data Sources
5. Rick Bell and Patricia L. Pliner, “Time to eat: The relationship between the number of people eating and meal duration in three lunch settings,” Appetite, 41 (2003), pp. 215–218. 6. Karolyn Drake and Jamel Ben El Hine, “Synchronizing with music: Intercultural differences,” Annals of the New York Academy of Sciences, 99 (2003), pp. 429–437. 7. Example 13.10 is based on a study described in P. D. Wood et al., “Plasma lipoprotein distributions in male and female runners,” in P. Milvey (ed.), The Marathon: Physiological, Medical, Epidemiological, and Psychological Studies, New York Academy of Sciences, 1977. 8. Gerardo Ramirez and Sian L. Beilock, “Writing about testing worries boosts exam performance in the classroom,” Science, 331 (2011), pp. 211–213. 9. Felix Javier Jimenez-Jimenez et al., “Influence of age and gender in motor performance in healthy adults,” Journal of the Neurological Sciences, 302 (2011), pp. 72–80.
American, Canadian, and French college students,” Journal of American College Health, 57 (2008), pp. 143–149. 17. Judith McFarlane et al., “An intervention to increase safety behaviors of abused women,” Nursing Research, 51 (2002), pp. 347–354. 18. Gad Saad and John G. Vongas, “The effect of conspicuous consumption on men’s testosterone levels,” Organizational Behavior and Human Decision Processes, 110 (2009), pp. 80–92. 19. Klaus Boehnke et al., “On the interrelation of peer climate and school performance in mathematics: A German-Canadian-Israeli comparison of 14-year-old school students,” in B. N. Setiadi, A. Supratiknya, W. J. Lonner, and Y. H. Poortinga (eds.), Ongoing Themes in Psychology and Culture (Online Ed.), International Association for Cross-Cultural Psychology. 20. Data provided by Julie Hendricks and V. J. K. Liu of the Department of Foods and Nutrition, Purdue University.
10. Tomas Brodin et al., “Dilute concentrations of a psychiatric drug alter behavior of fish from natural populations,” Science, 339 (2013), pp. 814–815.
21. Lijia Lin et al., “Animated agents and learning: Does the type of verbal feedback they provide matter?” Computers and Education, 2013, doi: 10.1016/j. compedu.2013.04.017.
11. Vincent P. Magnini and Kiran Karande, “The influences of transaction history and thank you statements in service recovery,” International Journal of Hospitality Management, 28 (2009), pp. 540–546.
22. Tamar Kugler et al., “Trust between individuals and groups: Groups are less trusting than individuals but just as trustworthy,” Journal of Economic Psychology, 28 (2007), pp. 646–657.
12. Brian Wansink et al., “The office candy dish: Proximity’s influence on estimated and actual consumption,” International Journal of Obesity, 30 (2006), pp. 871–875.
23. Based on A. A. Adish et al., “Effect of consumption of food cooked in iron pots on iron status and growth of young children: A randomised trial,” Lancet, 353 (1999), pp. 712–716.
13. Data based on Brian T. Gold et al., “Lifelong bilingualism maintains neural efficiency for cognitive control in aging,” Journal of Neuroscience, 33, No. 2 (2013), pp. 387–396. 14. Annette N. Senitko et al., “Influence of endurance exercise training status and gender on postexercise hypotension,” Journal of Applied Physiology, 92 (2002), pp. 2368–2374. 15. Willemijn M. van Dolen, Ko de Ruyter, and Sandra Streukens, “The effect of humor in electronic service encounters,” Journal of Economic Psychology, 29 (2008), pp. 160–179. 16. Jane Kolodinsky et al., “Sex and cultural differences in the acceptance of functional foods: A comparison of
24. Based on a problem from Renée A. Jones and Regina P. Becker, Department of Statistics, Purdue University. 25. For a summary of this study and other research in this area, see Stanley Coren and Diane F. Halpern, “Lefthandedness: A marker for decreased survival fitness,” Psychological Bulletin, 109 (1991), pp. 90–106. 26. Data provided by Neil Zimmerman of the Purdue University School of Health Sciences. 27. See I. C. Feller et al., “Sex-biased herbivory in Jackin-the-pulpit (Arisaema triphyllum) by a specialist thrips (Heterothrips arisaemae),” in Proceedings of the 7th International Thysanoptera Conference, Reggio Callabrio, Italy, pp. 163–172.
P H OTO C R E D I T S CHAPTER 1
PAGE 324
D. Hurst/Alamy
PAGE 330
NetPhotos/Alamy
PAGE 1
Jordan Siemens/Getty Images
PAGE 4
Alamy
PAGE 10
© Carl Skepper/Alamy
CHAPTER 6
PAGE 63
Mitchell Layton/Getty Images
PAGE 351
Alamy
PAGE 354
© Syracuse Newspapers/Caroline Chen/ The Image Works
PAGE 376
Alamy
PAGE 383
Joe Raedle/Getty Images
PAGE 386
Olivier Voisin/Photo Researchers
PAGE 408
Photo by The Photo Works
CHAPTER 2 PAGE 81
Sam Edwards/Getty Images
PAGE 82
Alamy
PAGE 87
© Kristoffer Tripplaar/Alamy
PAGE 154
Alamy
CHAPTER 7
CHAPTER 3 PAGE 167
Thinkstock
PAGE 170
U.S. Department of Education Institute of Education Sciences National Center for Education Statistics
PAGE 417
Getty Images/Blend Images RM
PAGE 429
Richard Kail/Photo Researchers, Inc.
PAGE 437
© Oramstock/Alamy
PAGE 449
Robert Warren/Getty Images
PAGE 176
Alamy
PAGE 452
Getty Images/Photo Researchers
PAGE 180
© Alex Segre/Alamy
PAGE 456
Serge Krouglikoff/Getty Images
PAGE 192
hartcreations/iStockphoto
PAGE 195
© Ann E Parry/Alamy
CHAPTER 8
PAGE 199
GSS
PAGE 487
iStockphoto/Thinkstock
PAGE 214
© blickwinkel/Alamy
PAGE 494
Photolibrary
PAGE 222
National Archives and Records Administration NARA
PAGE 496
Alamy
PAGE 501
iStockphoto
CHAPTER 4 PAGE 231
Jgroup/Dreamstime.com
PAGE 239
© MBI/Alamy
PAGE 241
Norlito/iStockphoto
PAGE 246
Profimedia.CZ a.s./Alamy
PAGE 272
skynesher/iStockphoto
PAGE 285
© Randy Faris/Corbis
CHAPTER 5
CHAPTER 9 PAGE 529
© Image Source/Alamy
PAGE 530
© Pixellover RM 6/Alamy
PAGE 541
Alamy
PAGE 549
Alamy
CHAPTER 10 PAGE 563
Jack Hollingsworth/Photodisc/Getty
PAGE 566
Ruth Jenkinson/Getty Images
PAGE 301
Digital Vision/Thinkstock
PAGE 581
© Drive Images/Alamy
PAGE 306
Jacob Wackerhausen/iStockphoto
PAGE 583
PAGE 321
Istockphoto/Thinkstock
Doncaster and Bassetlaw Hospitals/ Science Source
C-1
C-2
Photo Credits
CHAPTER 11
CHAPTER 15
PAGE 611
Barry Austin Photography/Getty Images
PAGE 631
© Radius Images/Alamy
CHAPTER 12 PAGE 643
© Monkey Business Images Ltd/ Dreamstime.com
PAGE 645
© Ingram Publishing/Alamy
PAGE 673
Thinkstock
PAGE 15-1
Steven King/Icon SMI 258/Steven King/ Icon SMI/Newscom
PAGE 15-4
DWC/Alamy
PAGE 15-11
© Jeff Greenberg/Alamy
PAGE 15-18
© Iofoto/Dreamstime
PAGE 15-23
Photo by Jason Barnette; courtesy of Purdue University
CHAPTER 16 CHAPTER 13
PAGE 16-1
Digital Vision/Thinkstock
PAGE 691
© Jupiter Images/Getty Images
PAGE 16-14
Digital Vision/Thinkstock
PAGE 695
Professor Pietro M. Motta/Photo Researchers
PAGE 16-50
istockphoto
PAGE 698
© Patti McConville/Alamy
CHAPTER 17
PAGE 700
© Banana Stock/Agefotostock
PAGE 17-1
CHAPTER 14 PAGE 14-1
© Blend Images/Alamy
PAGE 14-12
Nigel Cattlin/Alamy
PAGE 14-17
© ZUMA Press, Inc./Alamy
Pressmaster/Shutterstock
PAGE 17-4
Michael Rosenfeld/Getty Images
PAGE 17-8
Alamy
PAGE 17-9
George Frey/Bloomberg via Getty Images
PAGE 17-12
© Jeff Greenberg/The Image Works
INDEX Acceptance sampling, 406 Alternative hypothesis. See Hypothesis, alternative ACT college entrance examination, 75, 318, 608–609 Adequate Calcium Today (ACT) study, 551 Analysis of variance (ANOVA) one-way, 644–677 regression, 586–589, 617–618 two-way, 692–706, 708 Analysis of variance table one-way, 657–662 regression, 589, 617 two-way, 702–703 Anonymity, 221–222 Aggregation, 148 Applet Central Limit Theorem, 309, 310, 311 Confidence Interval, 357, 358, 371, 413 Correlation and Regression, 105, 108, 109, 126, 138 Law of Large Numbers, 236, 269 Mean and Median, 34, 51, 52 Normal Approximation to Binomial, 333 Normal Curve, 65, 74 One-variable statistical calculator, 17 One-Way ANOVA, 660, 682 Probability, 232, 236 Simple Random Sample, 191, 195, 203, 217 Statistical Power, 411, 412 Statistical Significance, 393, 394 Two-variable statistical calculator, 126 Probability, 217, 232, 236, 346 AppsFire, 303, 317 Association, 83–84, 536 and causation, 134, 136, 152–153 negative, 90, 98 positive, 90, 98 Attention deficit hyperactivity disorder (ADHD), 444 Available data, 169, 174
Bar graph, 11, 25 Bayes’s rule, 292–293 Behavioral and social science experiments, 224–226 Benford’s law, 242–243, 254 Bias see also Unbiased estimator in a sample, 194, 198, 200, 201, 210, 211, 212, 215 in an experiment, 179, 188, 207 of a statistic, 210–211, 215 Binomial coefficient, 337, 343 Binomial distribution. See Distribution, binomial
Binomial setting, 322, 343, 14-1 Block, 187, 188 Bonferroni procedure, 402, 670–671 Bootstrap, 367, See also Chapter 16 Boston Marathon, 30, 17-40 Boxplot, 37–38, 48 modified, 41, 48 side-by-side, 41, 48, 643, 649 Buffon, Count, 234
Canadian Internet Use Survey (CIUS), 14-25 Capability, 17-34 Capture-recapture sampling, 214 Case, 2, 8, 613 Categorical data. See Variable, categorical Causation, 134, 136, 152–155, 156, 176 Cause-and-effect diagram, 17-4 Cell, 140, 693 Census, 171, 174, 636 Census Bureau, 9, 203, 204, 349, 391 Center of a distribution, 20, 25, 47 Centers for Disease Control and Prevention, 163, 166, 230, 347, 608 Central limit theorem, 3, 307–313, 314, 316 Chi-square distribution. See Distribution, chi-square Chi-square statistic, 538, 550 and the z statistic, 544–545 goodness of fit test, 552–553 Clinical trials, 222 Clusters in data, 50, 95 Coefficient of determination, 662. See Correlation, multiple Coin tossing, 233, 323, 331, 335, 339 Column variable. See Variable, row and column Common response, 153, 156 Complement of an event. See Event, complement Condé Nast Traveler magazine, 15-4, 15-20 Conditional distribution. See Distribution, conditional Conditional probability. See Probability, conditional Confidence interval, 356–358, 368 bootstrap, 16-14–16-16, 16-32–16-38 cautions, 365–367 for multiple comparisons, 672 for odds ratio, 14-10, 14-19 for slope in a logistic regression, 14-10, 14-19 relation to two-sided tests, 387–388 t for a contrast, 665 t for difference of means, 451–454, 467 pooled, 462 t for matched pairs, 431
t for mean response in regression, 576–578, 585 t for one mean, 420–421, 441 t for regression parameters, 574, 584, 616, 633 z for one mean, 358–361 z for one proportion large sample, 490, 503 plus four, 493, 503 z for difference of proportions large sample, 510, 522 plus four, 514, 522 simultaneous, 672 Confidence level, 356, 368 Confidentiality, 220–221, 226 Confounding, 153–154, 156, 173, 174, 429 Consumer Behavior Report 14-25–14-26 Consumer Report on Eating Share Trend (CREST) 631, 14-24 Consumer Reports National Research Center, 330 Consumers Union, 87, 16-37 Continuity correction, 335–336, 343, 15-7 Contrast, 650, 663–668, 678 Control chart, 17-7, 17-17 individuals chart, 17-41 p chart, 17-52–17-57 R chart, 17-23, 17-35 s_ chart, 17-12–17-17 x chart, 17-8–17-12, 17-14, 17-17 Control group, 172, 178, 179, 188, 320, 397, 400, 686 Correlation, 103–104, 107, 275 and regression, 119, 121 based on averaged data, 134, 136 between random variables, 275 bootstrap confidence interval, 16-36–16-38 cautions about, 126–134 multiple, 618. See Coefficient of determination nonsense, 134 inference for, 597–599 population, 597 properties, 105 squared, 119–120, 121, 588, 600 test for, 597, 600 Count, 10, 487 see also Frequency distribution of, 320–325, 339–342, 343 Critical value, 389, 390 of chi-square distribution, 539, Table F of F distribution, 474–476, Table E of standard normal distribution, 65, 359, 389, Table A of t distribution, 419–420, Table D Cumulative proportion, 63–65, 72 standard normal, 63, Table A
I-1
I-2
Index
Data, 2 Anecdotal, 168, 174 Available, 169, 174 Data mining, 135 Decision analysis, 406–411 Degree of Reading Power, 452–455, 16-43–16-46 Degrees of freedom, 44 approximation for, 451, 460, 467 of chi-square distribution, 538 of chi-square test, 539 of F distribution, 474 of one-way ANOVA, 658–659 of t distribution, 419, 441 of two-way ANOVA, 697–698, 702–703 of regression ANOVA, 587, 589, 600 of regression t, 574, 577, 579, 584 of regression s2, 569 of two-sample t, 451, 460, 462 Deming, W. Edwards, 17-40 Density curve, 54–56, 72, 257–259, 261 Density estimation, 71 Department of Transportation, 17-57 Design, 183. see also Experiments block, 187–188 experimental, 179 repeated-measures, 701, 709, 711, 713, 714 sampling, 192–201 Direction of a relationship, 89, 98 Disjoint events. See Event, disjoint Distribution, 3, 25 bimodal, 72 binomial, 322–327, 343, 14-2, Table C formula, 336–338, 343 Normal approximation, 331–335, 343 use in the sign test, 438–439 bootstrap, 16-24–16-30 of categorical variable, 10 chi-square, 538, 550, Table F conditional, 144, 148, 533 describing, 20 examining, 20 exponential, 309 geometric, 348 F, 474–475, Table E joint, 141–142, 148 jointly Normal, 597 marginal, 142–143, 148 noncentral F, 676 noncentral t, 477 Normal, 58–59, 72, 257–261 for probabilities, 257–261 standard, 63, 72, Table A Poisson, 339–342 population, 302 probability. See Probability distribution of quantitative variable, 13–18 sampling. See Sampling distribution skewed, 2, 25 symmetric, 20, 25
t, 419–420 Table D trimodal, 72 tails, 19 uniform, 257–258 unimodal, 20 Weibull, 315–316 Distribution-free procedure, 436. See also Chapter 15 Double-blind experiment, 185, 188 Dual X-ray absorptiometry scanner, 375, 445–446, 447, 17-38–17-39 Estimation, 267–268 Ethics, 167, 217–226 Excel, 4, 5, 91, 158, 184, 196, 427, 459, 491, 511, 571, 613, 630, 674, 707 Expected value, 265 see also Mean of a random variable Expected cell count, 537, 550, 552, 556 Experiment, 171, 172, 174 block design, 187, 188 cautions about, 185–186 comparative,178, 188 completely randomized, 184 matched pairs, 186, 188 principles, 181 units, 175, 188 Explanatory variable. See Variable, explanatory Exploratory data analysis, 9, 25, 167 Extrapolation, 113, 121 Event, 239, 248 complement of, 240, 249, 283, 294 disjoint, 240, 249, 240, 248, 283 empty, 285 independent, 240, 249 intersection, 290 union, 283 F distribution. See Distribution, F F test one-way ANOVA, 660 regression ANOVA, 588, 618 for collection of regression coefficients, 630–631, 637 for standard deviations, 474 two-way ANOVA, 703 Facebook, 28, 317, 318, 442, 465, 468, 488–493, 510–511, 518–520, 525, 530–531, 648–662, 663–673, 680, 687, 689, 14-2–14-4, 14-6–14-8, 15-26, 15-33 Factor, experimental, 175, 188, 643, 692–696 Federal Aviation Administration (FAA), 319 Fisher, Sir R. A., 396, 411, 475 Fitting a line, 110–111 Five-number summary, 37–38, 48 Flowchart, 17-4–17-5 Form of a relationship, 89, 98
Frequency, 16, 25 Frequency table, 16 Gallup-Healthways Well-Being Index, 370 Gallup Poll, 202, 345, 346 Genetic counseling, 297 Genomics, 399 Goodness of fit, 551–556 Gosset, William, 48, 420, 16-11 Health and Retirement Study (HRS), 413 Histogram, 15–18, 25 Hypothesis alternative, 374–375, 390 one-sided, 375, 390 two-sided, 375, 390 null, 374, 390 Hypothesis testing, 410–411. See also Significance test Independence, 235 in two-way tables, 547–548, 550 of events, 244–245, 293, 294 of random variables, 275 Indicator variable, 635–636, 710, 14-4 Inference, statistical. See Statistical inference Influential observation, 129–130, 136, 573, 625 Informed consent, 220, 226 Institutional review board (IRB), 219, 226 Instrument, 6 Interaction, 696, 697–701 Intercept of a line, 111 of least-squares line, 115, 121, 565 Internet Movie Database (IMDb), 637 Intervention, 173 Intersection of events, 290, 294 Interquartile range (IQR), 39–40, 48 iPod, 437 iTunes, 2–3 JMP, 118, 146, 459, 491, 497, 511, 519, 532, 14-14 Karaoke Channel, 369 Kerrich, John, 234 Key characteristics of a data set, 4, 8 Key characteristics of data for relationships, 85 Kruskal-Wallis test, 15-28–15-33 Label, 2, 3, 8 Law of large numbers, 267–270, 279 Law School Admission Test (LSAT), 401, 481 Leaf, in a stemplot, 13, 25 Leaning Tower of Pisa, 607 Least significant difference, 670
Index Least squares, 114, 615–616 Least squares regression line, 113–115, 121, 563, 584 Level of a factor, 175, 188, 653, 675, 692–697 Line, equation of, 111 least-squares, 114, 568 Linear relationship, 89, 98 Linear transformation. See Transformation, linear Logarithm transformation. See Transformation, logarithm Logistic regression, 631–632. See also Chapter 14 Logit, 14-5 Lurking variable. See Variable, lurking
Main effect, 696, 697–701 Major League Baseball, 15-3 Mann-Whitney test, 15-5 Margin of error, 211, 215, 356, 362, 368 for a difference in two means, 451, 467 for a difference in two proportions, 510, 521 for a single mean, 359–360, 368, 420–421, 441 for a single proportion, 490, 503 Marginal means, 699, 708 Matched pairs design, 186, 188 inference for, 429–432, 438–440, 15-20, 15-25 Mean, 31 of binomial distribution, 328, 343 of density curve, 56–57, 72 of difference of sample means, 449 of difference of sample proportions, 508 of normal distribution, 58–59 of random variable, 264–265, 279 rules for, 271–271, 279 of sample mean, 305–306, 316 of sample proportion, 489 trimmed, 53 versus median, 34 Mean square in one-way ANOVA, 659–661, 678 in two-way ANOVA, 702–703 in multiple linear regression, 617, 634 in simple linear regression, 587, 589 Median, 33 inference for, 438–440 of density curve, 56–57, 72 Mendel, Gregor, 246–247 Meta-analysis, 548–549, 551 Minitab, 116, 145, 325, 426, 427, 438, 492, 497, 512, 519, 533, 536, 554, 557, 570, 608, 622, 629, 674, 675, 681, 707, 14-4, 14-11, 14-14, 14-16, 14-18, 14-21, 14-22, 15-9, 15-21, 15-26, 15-31, 15-33 Mode, 20, 25
Motorola, 17-2 Multiple comparisons, 668–673 National AIDS Behavioral Surveys, 345 National Assessment of Educational Progress (NAEP), 61, 73, 392 National Association of Colleges and Employers (NACE), 365, 370 National Congregations Study, 506 National Crime Victimization Survey, 713 National Football League, 102, 604 National Health and Nutrition Examination Survey (NHANES), 383, 698 National Oceanic and Atmospheric Administration (NOAA), 605 National Science Foundation (NSF), 599 National Student Loan Survey, 528 New Jersey Pick-It Lottery, 16-20–16-22 Neyman, Jerzy, 410 Nielsen Company, 421, 443 Noncentrality parameter for t, 478 for F, 676 Nonparametric procedure, 436, 438–440. See also Chapter 15 Nonresponse, 198, 201 Normal distribution. See Distribution, Normal Normal distribution calculations, 63–68, 72 Normal probability plot. See Normal quantile plot Normal scores, 69 Normal quantile plot, 68–70, 72 Null hypothesis. See Hypothesis, null
Observational study, 172 Odds, 632, 14-2, 14-5, 14-19 Odds ratio, 632, 14-7, 14-19 Outcomes, 175, 188 Out-of-control rules, 17-24–17-26 Outliers, 20, 21, 25, 15-1 1.5 3 IQR criterion, 39–40, 48 regression, 130–131 Parameter, 206, 215 Pareto chart, 17-18, 17-54, 17-83 Pearson, Egon, 410 Pearson, Karl, 234 Percent, 10, 487 Percentile, 35 Permutation tests, 16-42–16-52 Pew survey 488, 504, 510, 523, 525, 559, 14-2, 14-20, 15-15, 16-55, 16-57 Pie chart, 12, 25 Placebo effect, 178 Plug-in principle, 16-9, 16-10 Pooled estimator of population proportion, 517
I-3
of ANOVA variance, 654, 659 of variance in two samples, 461 Population, 171, 192, 201, 206 Population distribution. See Distribution, population Power, 402–404, 411 and Type II error, 410 increasing, 405–406 of one-way ANOVA, of t test one-sample, 434–435 two-sample, 477–479 of z test, 402–405 Prediction, 110, 112, 121 Prediction interval, 578–580, 585, 617 Probability, 233, 235 conditional, 286–288, 294 equally likely outcomes, 243–244 finite sample space, 242 Probability distribution, 253, 259, 260 mean of, 264–265, 272 standard deviation of, 274, 275–276 variance of, 272–276 Probability histogram, 254 Probability model, 237, 248 Probability rules, 240 addition, 240, 249, 283, 285, 294 complement, 240, 249, 283, 294 general, 282–286, 294 multiplication, 244–245, 248, 283, 294 Probability sample. See Sample, probability Process capability indexes, 17-41–17-47 Proportion, sample, 321, 343, 488, 503 distribution of, 330–332, 343, 489 inference for a single proportion, 488–503 inference for comparing two proportions, 508–521 Punxsutawney Phil, 149–150, 525 P-value, 377 Quartiles, 35 of a density curve, 57 R, 340, 16-10, 16-11, 16-12, 16-15, 16-19, 16-35, 16-39, 16-46 Randomization consequences of, 213 experimental, 179, 188 how to, 181–184 Random digits, 182, 188, Table B Random number generator, 386 Random phenomenon, 233, 235 Random variable, 252–253, 260 continuous, 256–257, 261 discrete, 253, 260 mean of, 264–265 standard deviation of, 274 variance of, 273–274
I-4
Index
Randomized comparative experiment, 181 Randomized response survey, 299–300 Ranks, 15-4, 15-15 Rate, 6 Regression, 109–121 and correlation, 119, 121 cautions about, 126–136 deviations, 567, 568, 586, 614 interpretation, 116 least-squares, 113–115, 121, 568, 615 multiple, 612–618 nonlinear, 582–584 simple linear, 109–121, 564–599 Regression equation, population, 612 Regression line, 110, 121 population, 565, 584 Relative risk, 520, 522 Reliability, 323 Resample, 367. See also Chapter 16 Residual, 126–127, 136, 569, 584, 615, 633, 652, 653 plots, 128, 136, 572–573, 625 Resistant measure, 32, 48 Response bias, 200, 201 Response rate, 193 Response variable. See Variable, response Ringtone, 196 Robustness, 32, 432–433, 455–456, 477, 15-1 Roundoff error, 26 Row variable. See Variable, row and column Sallie Mae, 360 Sample, 171, 192, 201, 206 cautions about, 198–200 design of, 192–201 multistage, 197–198 probability, 196–201 proportion, 321, 488 simple random (SRS), 194, 201 stratified, 196–197, 201 systematic, 204 Sample size, choosing confidence interval for a mean, 363–365 confidence interval for a proportion, 500 one-way ANOVA, 675–677 t test, one-sample, 434–435 t test, two-sample, 477–479 Sample space, 237, 248 finite, 242 Sample survey, 171, 174, 192, 201 Sampling distribution, 208–209, 215 of difference of means, 449 of regression estimators, 574 of sample count, 325, 332, 339, 343–344 of sample mean, 307, 316, 433 of sample proportion, 332, 343
Sampling variability, 207 SAS, 458, 478, 492, 498, 512, 520, 571, 601, 674, 676, 704, 705, 14-17, 15-7, 15-13, 15-16, 15-22, 15-32 SAT college entrance examination, 67, 75–76, 354, 608–609, 619–622, 627–631, 635, 718 Scatterplot, 87–89, 97 adding categorical variables to, 94–95 smoothing, 96 Shape of a distribution, 20, 25 Shewhart, Walter, 17-7, 17-32 Sign test, 438–440 Significance level, 379, 395–398 Significance, statistical, 378–382 and Type I error, 409 Significance test, 372–390 chi-square for two-way table, 539, 550 relation to z test, 544 chi-square for goodness of fit, 552–553, 556 F test in one-way ANOVA, 660–662 F test in regression, 588–590, 600, 618 F test for a collection of regression coefficients, 630–631, 637 F test for standard deviations, 474–476 F tests in two-way ANOVA, 703 Kruskal-Wallis test, 15-28–15-23 relationship to confidence intervals, 386–388 t test for a contrast, 665 t test for correlation, 597–599, 600 t test for one mean, 422–424 t test for matched pairs, 429–431 t test for two means, 454, 466–467 pooled, 462 t test for regression coefficients, 574, 584 t tests for multiple comparisons, 670 use and abuse, 394–400 Wilcoxon rank sum test, 15-5 Wilcoxon signed rank test, 15-20 z test for one mean, 383, 390 z test for one proportion, 495, 504 z test for logistic regression slope, 14-10, 14-19 z test for two proportions, 517, 522 z test for two means, 450, 466 Simple random sample. See Sample, simple random Simpson’s paradox, 146–147, 148 Simulation, 207 Simultaneous confidence intervals, 672 68–95–99.7 rule, 59–60, 72 Skewed distribution. See Distribution, skewed Slope of a line, 111 of least-squares line, 115, 121, 568 Small numbers, law of, 269–270 Spread of a distribution, 20, 25, 35, 42, 47 Spreadsheet, 5. see also Excel
SPSS, 117, 123, 124, 145, 426, 427, 460, 534, 554, 570, 589, 598, 656, 657, 668, 671, 14-14, 14-18, 15-9, 15-21, 15-32 Standard & Poor’s 500-Stock Index, 425–426 Standard deviation, 42, 48 See also Variance of binomial distribution, 329, 343 of density curve, 57–58, 72 of difference between sample means, 449–450 of difference between sample proportions, 509 of Normal distribution, 58 of Poisson distribution, 339, 344 of regression intercept and slope, 593 pooled for two samples, 462 in ANOVA, 654 properties, 44 of random variable, 274, 279 rules for, 275–276, 279 of sample mean, 306, 316 of sample proportion, 418, 489 Standard error, 418 bootstrap, 16-6, 16-8–16-9 of a contrast, 665 of a difference of sample proportions, 510, 521 for regression prediction, 595, 600 of regression intercept and slope, 593, 600 of mean regression response, 595, 600 of a sample mean, 418, 440 of a sample proportion, 418, 489, 503 Standard Normal distribution. See Distribution, standard Normal Standardized observation, 61, 72 Statistic, 206, 215 Statistical inference, 167, 205–213, 352–353 for Nonnormal populations, 436–440. See also Chapter 15 for small samples, 457–460 Statistical process control, Chapter 17 Statistical significance. See Significance, statistical Stem-and-leaf plot. See Stemplot Stemplot, 13, 25 back-to-back, 14 splitting stems, 14 trimming, 14 Strata, 197, 201. See also Sample, stratified Strength of a relationship, 89, 98. See also Correlation StubHub! 71–72, 16-12, 16-23 Student Monitor, 370 Subjects, experimental, 175, 188
Index Subpopulation, 565, 612–613 Sums of squares in one-way ANOVA, 658–659 in two-way ANOVA, 702–703 in multiple linear regression, 617 in simple linear regression, 586–587 Survey of Study Habits and Attitudes (SSHA), 393 Systematically larger, 15-10 Symmetic distribution. See Distribution, symmetric
t distribution. See Distribution, t t inference procedures for contrasts, 665 for correlation, 597 for matched pairs, 429–431 for multiple comparisons, 670 for one mean, 421, 423 for two means, 450–454 for two means, pooled, 461–462 for regression coefficients, 574, 616 for regression mean response, 577 for regression prediction, 579 robustness of, 432–433, 455–456 Tails of a distribution. See Distribution, tails Test of significance. See Significance test Test statistic, 375–376 Testing hypotheses. See Significance test The Times Higher Education Supplement, 638 Three-way table, 148 Ties, 15-10–15-11 Time plot, 23–24, 25 Titanic, 25, 54, 149, 157, 16-12, 16-23
Transformation, 93 linear, 45–47, 48 logarithm, 93, 436, 582 rank, 15-4 Treatment, experimental, 172, 174, 175, 178, 188 Tree diagram, 290–291, 294 Tuskegee study, 222–223 Twitter, 25–26, 261, 525 Two-sample problems, 448 Two-way table, 139–140, 148, 530 data analysis for, 139–148 inference for, 530–550 models for, 545–548, 550 relationships in 143–144 Type I and II errors, 407–408 Unbiased estimator, 210–211, 215 Undercoverage, 198, 201 Unimodal distribution. See Distribution, unimodal Union of events, 283, 294 Unit of measurement, 3, 45 Unit, experimental, 175 U.S. Agency for International Development, 15-27 U.S. Department of Education, 346 Value of a variable, 2, 8 Variability, 47, 211 Variable, 2, 8 categorical, 3, 8, 97, 487 dependent, 86 explanatory, 84, 86, 97 independent, 86 lurking, 133, 136, 176
I-5
quantitative, 3, 8 response, 84, 86 row and column, 140, 148 Variance, 42, 48 of a difference between two sample means, 449 of a difference between two sample proportions, 509 of a random variable, 273–274, 279 a pooled estimator, 462, 467 rules for, 275–276, 279 of a sample mean, 306 Variation among groups, 658, 678 between groups, 647, 678 common cause, 17-7 special cause, 17-7 within group, 647, 658, 678, Venn diagram, 240 Voluntary response, 194
Wald statistic, 14-10, 14-20 Whiskers, 38 Wilcoxon rank sum test, 15-3–15-15 Wilcoxon signed rank test, 15-18–15-25 Wording questions, 200, 201 World Bank, 31, 78, 100, 16-3 World Database of Happiness, 638
z -score, 61, 72 z statistic for one proportion, 495 for two proportions, 517 one-sample for mean, 419, 440, 440 two-sample for means, 448–450
FORMULAS AND KEY IDEAS CARD • The median of a density curve. The equal-areas point, the point that divides the area under the curve in half.
CHAPTER 1 • The mean x. If the n observations are x1, x2, p , xn, their mean is x1 ⫹ x2 ⫹ p ⫹ xn x⫽ n • The median M. Arrange all observations in order of size, from smallest to largest. If the number of observations n is odd, the median M is the center observation in the ordered list. Find the location of the median by counting 1n ⫹ 12兾2 observations up from the bottom of the list. If the number of observations n is even, the median M is the mean of the two center observations in the ordered list. The location of the median is again 1n ⫹ 12兾2 from the bottom of the list. • The quartiles Q1 and Q3. Arrange the observations in increasing order and locate the median M in the ordered list of observations. Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median. Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median. • The five-number summary. The smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five-number summary is Minimum Q1 M Q3 Maximum • A boxplot. A graph of the five-number summary. A central box spans the quartiles Q1 and Q3. A line in the box marks the median M. Lines extend from the box out to the smallest and largest observations. • The interquartile range (IQR). The distance between the first and third quartiles, IQR ⫽ Q3 ⫺ Q1 • The 1.5 ⴛ IQR rule for outliers. Call an observation a suspected outlier if it falls more than 1.5 ⫻ IQR above the third quartile or below the first quartile. • The variance s2. For n observations x1, x2, p , xn, s2 ⫽
1x1 ⫺ x2 2 ⫹ 1x2 ⫺ x2 2 ⫹ p ⫹ 1xn ⫺ x2 2 n⫺1
• The standard deviation s. Is the square root of the variance s 2. • Effect of a linear transformation. Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (interquartile range and standard deviation) by b. Adding the same number a (either positive or negative) to each observation adds a to measures of center and to quartiles and other percentiles but does not change measures of spread. • Density curve. Is always on or above the horizontal axis and has area exactly 1 underneath it.
• The mean of a density curve. The balance point at which the curve would balance if made of solid material. • The 68–95–99.7 rule. In the Normal distribution with mean and standard deviation , approximately 68% of the observations fall within of the mean , approximately 95% of the observations fall within 2 of , and approximately 99.7% of the observations fall within 3 of . • Standardizing and z-scores. If x is an observation from a distribution that has mean and standard deviation , z⫽
x⫺
• The standard Normal distribution. The Normal distribution N10, 12 with mean 0 and standard deviation 1. If a variable X has any Normal distribution N1, 2 with mean and standard deviation , then the standardized variable Z⫽
X⫺
has the standard Normal distribution. • Use of Normal quantile plots. If the points on a Normal quantile plot lie close to a straight line, the plot indicates that the data are Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
CHAPTER 2 • Response variable, explanatory variable. A response variable measures an outcome of a study. An explanatory variable explains or causes changes in the response variables. • Scatterplot. A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual. • Positive association, negative association. Two variables are positively associated when above-average values of one tend to accompany above-average values of the other and below-average values also tend to occur together. Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa. • Correlation. The correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r. Suppose that we have data on variables x and y for n individuals. The means and standard deviations of the two variables are x and sx for the x-values, and y and sy for the y-values. The correlation r
FORMULAS AND KEY IDEAS CARD between x and y is r⫽
xi ⫺ x yi ⫺ y 1 a ba b n⫺1a sx sy
• Straight lines. Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A straight line relating y to x has an equation of the form y ⫽ b0 ⫹ b1x In this equation, b1 is the slope, the amount by which y changes when x increases by one unit. The number b0 is the intercept, the value of y when x ⫽ 0. • Equation of the least-squares regression line. We have data on an explanatory variable x and a response variable y for n individuals. The means and standard deviations of the sample data are x and sx for x and y and sy for y, and the correlation between x and y is r. The equation of the leastsquares regression line of y on x is yˆ ⫽ b0 ⫹ b1x with slope b1 ⫽ rsy兾sx and intercept b0 ⫽ y ⫺ b1x. • r2 in regression. The square of the correlation, r2, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x. • Residuals. A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is, residual ⫽ y ⫺ yˆ . • Outliers and influential observations in regression. An outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction of a scatterplot have large regression residuals, but other outliers need not have large residuals. An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line. • Simpson’s paradox. An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox. • Confounding. Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
CHAPTER 3 • Anecdotal evidence. Anecdotal evidence is based on haphazardly selected individual cases, which often come to our attention because they are striking in some way. These cases need not be representative of any larger group of cases.
• Available data. Available data are data that were produced in the past for some other purpose but that may help answer a present question. • Observation versus experiment. In an observational study we observe individuals and measure variables of interest but do not attempt to influence the responses. In an experiment we deliberately impose some treatment on individuals and we observe their responses. • Experimental units, subjects, treatment. The individuals on which the experiment is done are the experimental units. When the units are human beings, they are called subjects. A specific experimental condition applied to the units is called a treatment. • Bias. The design of a study is biased if it systematically favors certain outcomes. • Principles of experimental design. 1. Compare two or more treatments. 2. Randomize—use impersonal chance to assign experimental units to treatments. 3. Repeat each treatment on many units to reduce chance variation in the results. • Statistical significance. An observed effect so large that it would rarely occur by chance is called statistically significant. • Random digits. A table of random digits is a list of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 that has the following properties: The digit in any position in the list has the same chance of being any one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The digits in different positions are independent in the sense that the value of one has no influence on the value of any other. • Block design. A block is a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a block design, the random assignment of units to treatments is carried out separately within each block. • Population and sample. The entire group of individuals that we want information about is called the population. A sample is a part of the population that we actually examine in order to gather information. • Voluntary response sample. A voluntary response sample consists of people who choose themselves by responding to a general appeal. Voluntary response samples are biased because people with strong opinions, especially negative opinions, are most likely to respond. • Simple random sample. A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. • Probability sample. A probability sample is a sample chosen by chance. We must know what samples are possible and what chance, or probability, each possible sample has. • Stratified random sample. To select a stratified random sample, first divide the population into groups of similar
FORMULAS AND KEY IDEAS CARD individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample. • Undercoverage and nonresponse. Undercoverage occurs when some groups in the population are left out of the process of choosing the sample. Nonresponse occurs when an individual chosen for the sample can’t be contacted or does not cooperate. • Parameters and statistics. A parameter is a number that describes the population. A parameter is a fixed number, but in practice we do not know its value. A statistic is a number that describes a sample. The value of a statistic is known when we have taken a sample, but it can change from sample to sample. We often use a statistic to estimate an unknown parameter. • Sampling distribution. The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. • Bias and variability. Bias concerns the center of the sampling distribution. A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated. The variability of a statistic is described by the spread of its sampling distribution. This spread is determined by the sampling design and the sample size n. Statistics from larger probability samples have smaller spreads. • Managing bias and variability. To reduce bias, use random sampling. When we start with a list of the entire population, simple random sampling produces unbiased estimates—the values of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter. To reduce the variability of a statistic from an SRS, use a larger sample. You can make the variability as small as you want by taking a large enough sample. • Population size doesn’t matter. The variability of a statistic from a random sample does not depend on the size of the population, as long as the population is at least 100 times larger than the sample. • Basic data ethics. The organization that carries out the study must have an institutional review board that reviews all planned studies in advance in order to protect the subjects from possible harm. All individuals who are subjects in a study must give their informed consent before data are collected. All individual data must be kept confidential. Only statistical summaries for groups of subjects may be made public.
CHAPTER 4 • Randomness and probability. We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions. The probability of any outcome of a
random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions. • Sample space. The sample space S of a random phenomenon is the set of all possible outcomes. • Event. An event is an outcome or a set of outcomes of a random phenomenon. That is, an event is a subset of the sample space. • Probability rules. Rule 1. The probability P1A2 of any event A satisfies 0 ⱕ P1A2 ⱕ 1. Rule 2. If S is the sample space in a probability model, then P1S2 ⫽ 1. Rule 3. Two events A and B are disjoint if they have no outcomes in common and so can never occur together. If A and B are disjoint, P1A or B2 ⫽ P1A2 ⫹ P1B2. This is the addition rule for disjoint events. Rule 4. The complement of any event A is the event that A does not occur, written as Ac. The complement rule states that P1Ac 2 ⫽ 1 ⫺ P1A2. • Probabilities in a finite sample space. Assign a probability to each individual outcome. These probabilities must be numbers between 0 and 1 and must have sum 1. The probability of any event is the sum of the probabilities of the outcomes making up the event. • Equally likely outcomes. If a random phenomenon has k possible outcomes, all equally likely, then each individual outcome has probability 1兾k. The probability of any event A is P1A2 ⫽ 1count of outcomes in A2兾k. • The multiplication rule for independent events. Rule 5. Two events A and B are independent if knowing that one occurs does not change the probability that the other occurs. If A and B are independent, P1A and B2 ⫽ P1A2P1B2. This is the multiplication rule for independent events. • Random variable. A random variable is a variable whose value is a numerical outcome of a random phenomenon. • Discrete random variable. A discrete random variable X has a finite number of possible values. The probability distribution of X lists the values and their probabilities: Value of X
x1
x2
x3
p
xk
Probability
p1
p2
p3
p
pk
The probabilities pi must satisfy two requirements: 1. Every probability pi is a number between 0 and 1. 2. p1 ⫹ p2 ⫹ p ⫹ pk ⫽ 1. Find the probability of any event by adding the probabilities pi of the particular values xi that make up the event. • Continuous random variable. A continuous random variable X takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event. • Mean of a discrete random variable. Suppose that X is a discrete random variable whose distribution is Value of X
x1
x2
x3
p
xk
Probability
p1
p2
p3
p
pk
FORMULAS AND KEY IDEAS CARD To find the mean of X, multiply each possible value by its probability, then add all the products: X ⫽ x1p1 ⫹ x2p2 ⫹ p ⫹ xkpk • Law of large numbers. Draw independent observations at random from any population with finite mean . Decide how accurately you would like to estimate . As the number of observations drawn increases, the mean x of the observed values eventually approaches the mean of the population as closely as you specified and then stays that close. • Rules for means. Rule 1. If X is a random variable and a and b are fixed numbers, then a⫹bX ⫽ a ⫹ bX . Rule 2. If X and Y are random variables, then X⫹Y ⫽ X ⫹ Y . • Variance of a discrete random variable. Suppose that X is a discrete random variable whose distribution is Value of X Probability
x1 p1
x2 p2
x3
p
xk
p3
p
pk
and that X is the mean of X. The variance of X is 2X ⫽ 1x1 ⫺ X 2 2p1 ⫹ 1x2 ⫺ X 2 2p2 ⫹ p ⫹ 1xk ⫺ X 2 2pk • Rules for variances and standard deviations. Rule 1. If X is a random variable and a and b are fixed numbers, then 2a⫹bX ⫽ b22X . Rule 2. If X and Y are independent random variables, then 2X⫹Y ⫽ 2X ⫹ 2Y and 2X⫺Y ⫽ 2X ⫹ 2Y . This is the addition rule for variances of independent random variables. Rule 3. If X and Y have correlation , then 2X⫹Y ⫽ 2X ⫹ 2Y ⫹ 2XY and 2X⫺Y ⫽ 2X ⫹ 2Y ⫺ 2XY . This is the general addition rule for variances of random variables. To find the standard deviation, take the square root of the variance. • Rules of probability. Rule 1. 0 ⱕ P1A2 ⱕ 1 for any event A. Rule 2. P1S2 ⫽ 1. Rule 3. Addition rule: If A and B are disjoint events, then P1A or B2 ⫽ P1A2 ⫹ P1B2. Rule 4. Complement rule: For any event A, P1Ac 2 ⫽ 1 ⫺ P1A2. Rule 5. Multiplication rule: If A and B are independent events, then P1A and B2 ⫽ P1A2P1B2. • Union. The union of any collection of events is the event that at least one of the collection occurs. • Addition rule for disjoint events. If events A, B, and C are disjoint in the sense that no two have any outcomes in common, then P1one or more of A, B, C2 ⫽ P1A2 ⫹ P1B2 ⫹ P1C2. This rule extends to any number of disjoint events. • General addition rule for unions of two events. For any two events A and B, P1A or B2 ⫽ P1A2 ⫹ P1B2 ⫺ P1A and B2. • Multiplication rule. The probability that both of two events A and B happen together can be found by P1A and B2 ⫽ P1A2P1B 0 A2. Here P1B 0 A2 is the conditional probability that B occurs, given the information that A occurs. • Definition of conditional probability. When P1A2 ⬎ 0, the and B2 conditional probability of B given A is P1B 0 A2 ⫽ P1AP1A2 .
• Intersection. The intersection of any collection of events is the event that all of the events occur. • Bayes’s rule. Suppose that A1, A2, …, Ak are disjoint events whose probabilities are not 0 and add to exactly 1. That is, any outcome is in exactly one of these events. Then if C is any other event whose probability is not 0 or 1, P1Ai 0 C2 ⫽
P1C 0 Ai 2P1Ai 2 P1C 0 A1 2P1A1 2 ⫹ p ⫹ P1Ak 2P1C 0 Ak 2
• Independent events. Two events A and B that both have positive probability are independent if P1B 0 A2 ⫽ P1B2.
CHAPTER 5 • The sample mean x of an SRS of size n drawn from a large population with mean and standard deviation has a sampling distribution with mean x ⫽ and standard deviation x ⫽ 兾 1n. • Linear combinations of independent Normal random variables have Normal distributions. In particular, if the population has a Normal distribution, so does x. • The central limit theorem states that for large n the sampling distribution of x is approximately N1, 兾 1n2 for any population with mean and finite standard deviation . This includes populations of both continuous and discrete random variables. • The binomial distribution. A count X of successes has the binomial distribution B1n, p2 when there are n trials, all independent, each resulting in a success or a failure, and each having the same probability p of a success. The mean of X is X ⫽ np and the standard deviation is X ⫽ 2np11 ⫺ p2. • The sample proportion of success pˆ ⫽ X兾n has mean pˆ ⫽ p and standard deviation pˆ ⫽ 2p11 ⫺ p2兾n. It is an unbiased estimator of the population proportion p. • The sampling distribution of the count of successes. The B1n, p2 distribution is a good approximation to the sampling distribution of the count of successes in an SRS of size n from a large population containing proportion p of successes. We will use this approximation when the population is at least 20 times larger than the sample. • The sampling distribution of the sample proportion. The sampling distribution of pˆ is not binomial but the B1n, p2 distribution can be used to do probability calculations about pˆ by restating them in terms of the count X. We will use the B1n, p2 distribution when the population is at least 20 times larger than the sample. • The Normal approximation to the binomial distribution says that if X is a count having the B1n, p2 distribution, then when n is large, X is approximately N1np, 2np11 ⫺ p22. In addition, the sample proportion pˆ ⫽ X兾n is N1p, 2p11 ⫺ p2兾n2. We will use these approximations when np ⱖ 10 and n11 ⫺ p2 ⱖ 10. The continuity correction improves the accuracy of the Normal approximations.
FORMULAS AND KEY IDEAS CARD CHAPTER 6 • Confidence interval. The purpose of a confidence interval is to estimate an unknown parameter with an indication of how accurate the estimate is and of how confident we are that the result is correct. Any confidence interval has two parts: an interval computed from the data and a confidence level. The interval often has the form estimate ⫾ margin of error. • Confidence level. The confidence level states the probability that the method will give a correct answer. That is, if you use 95% confidence intervals, in the long run 95% of your intervals will contain the true parameter value. When you apply the method once, you do not know if your interval gave a correct value (this happens 95% of the time) or not (this happens 5% of the time). • Confidence interval for the mean . For a Normal population with known standard deviation , a level C confidence interval for the mean is given by x ⫾ m, where the margin of error m ⫽ z* 1n . Here z* is obtained from the standard Normal distribution such that the probability is C that a standard Normal random variable takes a value between ⫺z* and z*. • Margin of error. Other things being equal, the margin of error of a confidence interval decreases as the confidence level C decreases, the sample size n increases, and the population standard deviation decreases. The sample size n required to obtain a confidence interval of specified margin of error m for a Normal mean is n ⫽ 1z*兾m2 2, where z* is the critical point for the desired level of confidence. • A test of significance is intended to assess the evidence provided by data against a null hypothesis H0 in favor of an alternative hypothesis Ha. The hypotheses are stated in terms of population parameters. Usually H0 is a statement that no effect or no difference is present, and Ha says that there is an effect or difference. The difference can be in a specific direction (one-sided alternative) or in either direction (two-sided alternative). • The test statistic and P-value. The test of significance is based on a test statistic. The P-value is the probability, computed assuming that H0 is true, that the test statistic will take a value at least as extreme as that actually observed. Small P-values indicate strong evidence against H0. Calculating Pvalues requires knowledge of the sampling distribution of the test statistic when H0 is true. If the P-value is as small or smaller than a specified value ␣, the data are statistically significant at significance level ␣. • Significance test concerning an unknown mean . Significance tests for the hypothesis H0 : ⫽ 0 are based on the z statistic, z ⫽ 1 x ⫺ 0 2兾 1兾 1n 2. This z test assumes an SRS of size n, known population standard deviation , and either a Normal population or a large sample. ˇ
ˇ
CHAPTER 7 • Standard error. When the standard deviation of a statistic is estimated from the data, the result is called the standard
error of the statistic. The standard error of the sample mean s x is SEx ⫽ 1n . • The t distributions. Suppose that an SRS of size n is drawn from an N1, 2 population. The one-sample t statistic t ⫽ 1x ⫺ 2兾 1s兾 1n2 has the t distribution with n ⫺ 1 degrees of freedom. • The one-sample t confidence interval. Consider an SRS of size n drawn from a population having unknown mean . A level C confidence interval for is x ⫾ t*s兾 1n, where t* is the value for the t1n ⫺ 12 density curve with area C between ⫺t* and t*. The quantity t*s兾 1n is the margin of error. This interval is exact when the population distribution is Normal and is approximately correct for large n in other cases. • The one-sample t test. Suppose that an SRS of size n is drawn from a population having unknown mean . To test the hypothesis H0 : ⫽ 0, compute the one-sample t statistic t ⫽ 1x ⫺ 0 2兾 1s兾 1n2. P-values or fixed significance levels are computed from the t1n ⫺ 12 distribution. ˇ
ˇ
• Matched pairs t procedures. These one-sample procedures are used to analyze matched pairs data by first taking the differences within each matched pair to produce a single sample. • Robustness of t procedures. The t procedures are relatively robust against non-Normal populations. The t procedures are useful for non-Normal data when 15 ⱕ n ⬍ 40 unless the data show outliers or strong skewness. When n ⱖ 40, the t procedures can be used even for clearly skewed distributions. • Power of a t test. The power of the t test is calculated like that of the z test, using an approximate value for both and s. • Sign test. The sign test is a distribution-free test because it uses probability calculations that are correct for a wide range of population distributions. The sign test for “no treatment effect” in matched pairs counts the number of positive differences. The P-value is computed from the B1n, 1兾22 distribution, where n is the number of non-0 differences. The sign test is less powerful than the t test in cases where use of the t test is justified. • The two-sample t test. Suppose that an SRS of size n1 is drawn from a Normal population with unknown mean 1 and that an independent SRS of size n2 is drawn from another Normal population with unknown mean 2. To test the hypothesis H0 : 1 ⫽ 2, compute the two-sample t statistic t ⫽ 1x1 ⫺ x2 2兾 1 2s21兾n1 ⫹ s22兾n2 2 and use P-values or critical values for the t1k2 distribution, where the degrees of freedom k either are approximated by software or are the smaller of n1 ⫺ 1 and n2 ⫺ 1. ˇ
ˇ
• The two-sample t test. Suppose that an SRS of size n1 is drawn from a Normal population with unknown mean 1 and that an independent SRS of size n2 is drawn from another Normal population with unknown mean 2. The confidence interval for 1 ⫺ 2 is given by 1x1 ⫺ x2 2 ⫾
FORMULAS AND KEY IDEAS CARD t* 2s21兾n1 ⫹ s22兾n2. This interval has confidence level at least C no matter what the population standard deviations may be. Here, t* is the value for the t1k2 density curve with area C between ⫺t* and t*, where the degrees of freedom k either are approximated by software or are the smaller of n1 ⫺ 1 and n2 ⫺ 1. • Pooled two-sample t procedures. If we can assume that the two populations have equal variances, pooled two-sample t procedures can be used. These are based on the pooled estimator s2p ⫽ 11n1 ⫺ 12s21 ⫹ 1n2 ⫺ 12s22 2兾 1n1 ⫹ n2 ⫺ 22 of the unknown common variance and the t1n1 ⫹ n2 ⫺ 22 distribution.
of successes and the number of failures in each sample are both at least 10. • Significance test for comparing two proportions. To test the hypothesis H0 : p1 ⫽ p2 compute the z statistic z ⫽ 1pˆ 1 ⫺ pˆ 2 2兾SEDp where the pooled standard error is SEDp ⫽ 2pˆ 11 ⫺ pˆ 211兾n1 ⫹ 1兾n2 2 and where pˆ ⫽ 1X1 ⫹ X2 2兾 1n1 ⫹ n2 2. In terms of a standard Normal random variable Z, the Pvalue for a test of H0 against Ha : p1 ⬎ p2 is P1Z ⱖ z2, Ha : p1 ⬍ p2 is P1Z ⱕ z2, and Ha : p1 ⬆ p2 is 2P1Z ⱖ 0z0 2. ˇ
ˇ
ˇ
ˇ
ˇ
ˇ
ˇ
ˇ
CHAPTER 9 CHAPTER 8 • Large-sample confidence interval for a population proportion. Choose an SRS of size n from a large population with an unknown proportion p of successes. The sample proportion is pˆ ⫽ X兾n, where X is the number of successes. The standard error of pˆ is SEpˆ ⫽ 2pˆ 11 ⫺ pˆ 2兾n and the margin of error for confidence level C is m ⫽ z*SEpˆ , where the critical value z* is the value for the standard Normal density curve with area C between ⫺z* and z*. An approximate level C confidence interval for p is pˆ ⫾m. Use this interval for 90%, 95%, or 99% confidence when the number of successes and the number of failures are both at least 10. • Large-sample significance test for a population proportion. Draw an SRS of size n from a large population with an unknown proportion p of successes. To test the hypothesis H0 : p ⫽ p0, compute the z statistic, z ⫽ 1pˆ ⫺ p0 2兾 2p0 11 ⫺ p0 2兾n. In terms of a standard Normal random variable Z, the approximate P-value for a test of H0 against Ha : p ⬎ p0 is P1Z ⱖ z2, Ha : p ⬍ p0 is P1Z ⱕ z2, and Ha : p ⬆ p0 is 2P1Z ⱖ 0 z0 2. ˇ
ˇ
ˇ
ˇ
ˇ
ˇ
ˇ
ˇ
• Sample size for desired margin of error. The level C confidence interval for a proportion p will have a margin of error approximately equal to a specified value m when the sample size satisfies n ⫽ 1z*兾m2 2p*11 ⫺ p*2. Here z* is the critical value for confidence C, and p* is a guessed value for the proportion of successes in the future sample. The margin of error will be less than or equal to m if p* is chosen to be 0.5. The sample size required when p* ⫽ 0.5 is n ⫽ 11兾421z*兾m2 2. • Large-sample confidence interval for comparing two proportions. Choose an SRS of size n1 from a large population having proportion p1 of successes and an independent SRS of size n2 from another population having proportion p2 of successes. The estimate of the difference in the population proportions is D ⫽ pˆ 1 ⫺ pˆ 2. The standard error of D is SED ⫽ 21pˆ 1 11 ⫺ pˆ 1 2兾n1 2 ⫹ 1pˆ 2 11 ⫺ pˆ 2 2兾n2 2 and the margin of error for confidence level C is m ⫽ z*SED, where the critical value z* is the value for the standard Normal density curve with area C between ⫺z* and z*. An approximate level C confidence interval for p1 ⫺ p2 is D ⫾ m. Use this method for 90%, 95%, or 99% confidence when the number
• Chi-square statistic. The chi-square statistic is a measure of how much the observed cell counts in a two-way table diverge from the expected cell counts. The formula for the statistic is X2 ⫽ a
1observed count ⫺ expected count2 2 expected count
where “observed” represents an observed cell count, “expected” represents the expected count for the same cell, and the sum is over all r ⫻ c cells in the table. • Chi-square test for two-way tables. The null hypothesis H0 is that there is no association between the row and column variables in a two-way table. The alternative is that these variables are related. If H0 is true, the chi-square statistic X2 has approximately a 2 distribution with 1r ⫺ 121c ⫺ 12 degrees of freedom. The P-value for the chi-square test is P12 ⱖ X2 2, where 2 is a random variable having the 2 1df 2 distribution with df ⫽ 1r ⫺ 121c ⫺ 12. • Expected cell counts. Expected count 5 (row total 3 column total2兾n. • The chi-square goodness of fit test. Data for n observations of a categorical variable with k possible outcomes are summarized as observed counts, n1, n2, p , nk in k cells. A null hypothesis specifies probabilities p1, p2, p , pk for the possible outcomes. For each cell, multiply the total number of observations n by the specified probability to determine the expected counts: expected count ⫽ npi. The chisquare statistic measures how much the observed cell counts differ from the expected cell counts. The formula for the statistic is X2 ⫽ a
1observed count ⫺ expected count2 2 expected count
The degrees of freedom are k ⫺ 1, and P-values are computed from the chi-square distribution.
CHAPTER 10 • Simple linear regression. The statistical model for simple linear regression assumes the means of the response
FORMULAS AND KEY IDEAS CARD variable y fall on a line when plotted against x, with the observed y’s varying Normally about these means. For n observations, this model can be written yi ⫽ 0 ⫹ 1xi ⫹ ⑀i, where i ⫽ 1, 2, p , n, and the ⑀i are assumed to be independent and Normally distributed with mean 0 and standard deviation . Here 0 ⫹ 1xi is the mean response when x ⫽ xi. The parameters of the model are 0, 1, and . • Estimation of model parameters. The population regression line intercept and slope, 0 and 1, are estimated by the intercept and slope of the least-squares regression line, b0 and b1. The parameter is estimated by s ⫽ 2 ge2i 兾 1n ⫺ 22, where the ei are the residuals ei ⫽ yi ⫺ yˆ i. • Confidence interval and significance test for 1. A level C confidence interval for population slope 1 is b1 ⫾ t*SEb1 where t* is the value for the t1n ⫺ 22 density curve with area C between ⫺t* and t*. The test of the hypothesis H0: 1 ⫽ 0 is based on the t statistic t ⫽ b1兾SEb1 and the t1n ⫺ 22 distribution. This tests whether there is a straight-line relationship between y and x. There are similar formulas for confidence intervals and tests for 0, but these are meaningful only in special cases. • Confidence interval for the mean response. The estimated mean response for the subpopulation corresponding to the value x* of the explanatory variable is ˆ y ⫽ b0 ⫹ b1x*. A level C confidence interval for the mean response is ˆ y ⫾ t*SEˆ where t* is the value for the t1n ⫺ 22 density curve with area C between ⫺t* and t*. • Prediction interval for the estimated response. The estimated value of the response variable y for a future observation from the subpopulation corresponding to the value x* of the explanatory variable is yˆ ⫽ b0 ⫹ b1x*. A level C prediction interval for the estimated response is yˆ ⫾ t*SEyˆ where t* is the value for the t1n ⫺ 22 density curve with area C between ⫺t* and t*. The standard error for the prediction interval is larger than that for the confidence interval because it also includes the variability of the future observation around its subpopulation mean.
CHAPTER 11 • Multiple linear regression. The statistical model for multiple linear regression with response variable y and p explanatory variables x1, x2, p , xp is yi ⫽ 0 ⫹ 1xi1 ⫹ 2xi2 ⫹ p ⫹ pxip ⫹ ⑀i where i ⫽ 1, 2, p , n. The ⑀i are assumed to be independent and Normally distributed with mean 0 and standard deviation . The parameters of the model are 0, 1, 2, p , p, and . • Estimation of model parameters. The multiple regression equation predicts the response variable by a linear relationship with all the explanatory variables: yˆ ⫽ b0 ⫹ b1x1 ⫹ b2x2 ⫹ p ⫹ bpxp. The ’s are estimated by b0, b1, b2, p , bp, which are obtained by the method of least squares. The parameter is estimated by s ⫽ 1MSE ⫽ 2 ge2i 兾 1n ⫺ p ⫺ 12 where the ei are the residuals, ei ⫽ yi ⫺ yˆ i.
• Confidence interval for j . A level C confidence interval for j is bj ⫾ t*SEbj where t* is the value for the t1n ⫺ p ⫺ 12 density curve with area C between ⫺t* and t*. The test of the hypothesis H0 : j ⫽ 0 is based on the t statistic t ⫽ bj兾SEbj and the t1n ⫺ p ⫺ 12 distribution. The estimate bj of j and the test and confidence interval for j are all based on a specific multiple linear regression model. The results of all of these procedures change if other explanatory variables are added to or deleted from the model. ˇ
ˇ
• The ANOVA F test. The ANOVA table for a multiple linear regression gives the degrees of freedom, sum of squares, and mean squares for the model, error, and total sources of variation. The ANOVA F statistic is the ratio MSM/MSE and is used to test the null hypothesis H0 : 1 ⫽ 2 ⫽ p ⫽ p ⫽ 0. If H0 is true, this statistic has an F1p, n ⫺ p ⫺ 12 distribution. ˇ
ˇ
• Squared multiple correlation. The squared multiple correlation is given by the expression R2 ⫽ SSM兾SST and is interpreted as the proportion of the variability in the response variable y that is explained by the explanatory variables x1, x2, p , xp in the multiple linear regression.
CHAPTER 12 • One-way analysis of variance (ANOVA) is used to compare several population means based on independent SRSs from each population. The populations are assumed to be Normal with possibly different means and the same standard deviation. To do an analysis of variance, first compute sample means and standard deviations for all groups. Sideby-side boxplots give an overview of the data. Examine Normal quantile plots (either for each group separately or for the residuals) to detect outliers or extreme deviations from Normality. Compute the ratio of the largest to the smallest sample standard deviation. If this ratio is less than 2 and the Normal quantile plots are satisfactory, ANOVA can be performed. • ANOVA F test. An analysis of variance table organizes the ANOVA calculations. Degrees of freedom, sums of squares, and mean squares appear in the table. The F statistic is the ratio MSG/MSE and is used to test the null hypothesis that the population means are all equal. The alternative hypothesis is true if there are any differences among the population means. The F1I ⫺ 1, N ⫺ I2 distribution is used to compute the P-value. • Contrasts. Specific questions formulated before examination of the data can be expressed as contrasts. A contrast is a combination of population means of the form c ⫽ gaii where the coefficients ai sum to 0. The corresponding sample contrast is c ⫽ gaixi. The standard error of c is SEc ⫽ sp 2 ga2i 兾ni. Tests and confidence intervals for contrasts provide answers to these specific questions. • Multiple comparisons. To perform a multiple-comparisons procedure, compute t statistics for all pairs of means using
FORMULAS AND KEY IDEAS CARD the formula tij ⫽ 1xi ⫺ xj 2兾 1sp 21兾ni ⫹ 1兾nj 2. If 0 tij 0 ⱖ t** we declare that the population means i and j are different. Otherwise, we conclude that the data do not distinguish between them. The value of t** depends upon which multiplecomparisons procedure we choose.
CHAPTER 13 • Two-way analysis of variance is used to compare population means when populations are classified according to two factors. ANOVA assumes that the populations are Normal with possibly different means and the same standard deviation and that independent SRSs are drawn from each population. As with one-way ANOVA, preliminary analysis in-
cludes examination of means, standard deviations, and Normal quantile plots. • ANOVA table and F tests. ANOVA separates the total variation into parts for the model and error. The model variation is separated into parts for each of the main effects and the interaction. These calculations are organized into an ANOVA table. Pooling is used to estimate the within-group variance. F statistics and P-values are used to test hypotheses about the main effects and the interaction. • Marginal means are calculated by taking averages of the cell means across rows and columns. Careful inspection of the cell means is necessary to interpret statistically significant main effects and interactions. Plots are a useful aid.
Probability p
Table entry for p and C is the critical value t* with probability p lying to its right and probability C lying between 2t* and t*.
t*
TABLE D t distribution critical values Upper-tail probability p df
.25
.20
.15
.10
.05
.025
.02
.01
.005
.0025
.001
.0005
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 40 50 60 80 100 1000 z*
1.000 0.816 0.765 0.741 0.727 0.718 0.711 0.706 0.703 0.700 0.697 0.695 0.694 0.692 0.691 0.690 0.689 0.688 0.688 0.687 0.686 0.686 0.685 0.685 0.684 0.684 0.684 0.683 0.683 0.683 0.681 0.679 0.679 0.678 0.677 0.675 0.674
1.376 1.061 0.978 0.941 0.920 0.906 0.896 0.889 0.883 0.879 0.876 0.873 0.870 0.868 0.866 0.865 0.863 0.862 0.861 0.860 0.859 0.858 0.858 0.857 0.856 0.856 0.855 0.855 0.854 0.854 0.851 0.849 0.848 0.846 0.845 0.842 0.841
1.963 1.386 1.250 1.190 1.156 1.134 1.119 1.108 1.100 1.093 1.088 1.083 1.079 1.076 1.074 1.071 1.069 1.067 1.066 1.064 1.063 1.061 1.060 1.059 1.058 1.058 1.057 1.056 1.055 1.055 1.050 1.047 1.045 1.043 1.042 1.037 1.036
3.078 1.886 1.638 1.533 1.476 1.440 1.415 1.397 1.383 1.372 1.363 1.356 1.350 1.345 1.341 1.337 1.333 1.330 1.328 1.325 1.323 1.321 1.319 1.318 1.316 1.315 1.314 1.313 1.311 1.310 1.303 1.299 1.296 1.292 1.290 1.282 1.282
6.314 2.920 2.353 2.132 2.015 1.943 1.895 1.860 1.833 1.812 1.796 1.782 1.771 1.761 1.753 1.746 1.740 1.734 1.729 1.725 1.721 1.717 1.714 1.711 1.708 1.706 1.703 1.701 1.699 1.697 1.684 1.676 1.671 1.664 1.660 1.646 1.645
12.71 4.303 3.182 2.776 2.571 2.447 2.365 2.306 2.262 2.228 2.201 2.179 2.160 2.145 2.131 2.120 2.110 2.101 2.093 2.086 2.080 2.074 2.069 2.064 2.060 2.056 2.052 2.048 2.045 2.042 2.021 2.009 2.000 1.990 1.984 1.962 1.960
15.89 4.849 3.482 2.999 2.757 2.612 2.517 2.449 2.398 2.359 2.328 2.303 2.282 2.264 2.249 2.235 2.224 2.214 2.205 2.197 2.189 2.183 2.177 2.172 2.167 2.162 2.158 2.154 2.150 2.147 2.123 2.109 2.099 2.088 2.081 2.056 2.054
31.82 6.965 4.541 3.747 3.365 3.143 2.998 2.896 2.821 2.764 2.718 2.681 2.650 2.624 2.602 2.583 2.567 2.552 2.539 2.528 2.518 2.508 2.500 2.492 2.485 2.479 2.473 2.467 2.462 2.457 2.423 2.403 2.390 2.374 2.364 2.330 2.326
63.66 9.925 5.841 4.604 4.032 3.707 3.499 3.355 3.250 3.169 3.106 3.055 3.012 2.977 2.947 2.921 2.898 2.878 2.861 2.845 2.831 2.819 2.807 2.797 2.787 2.779 2.771 2.763 2.756 2.750 2.704 2.678 2.660 2.639 2.626 2.581 2.576
127.3 14.09 7.453 5.598 4.773 4.317 4.029 3.833 3.690 3.581 3.497 3.428 3.372 3.326 3.286 3.252 3.222 3.197 3.174 3.153 3.135 3.119 3.104 3.091 3.078 3.067 3.057 3.047 3.038 3.030 2.971 2.937 2.915 2.887 2.871 2.813 2.807
318.3 22.33 10.21 7.173 5.893 5.208 4.785 4.501 4.297 4.144 4.025 3.930 3.852 3.787 3.733 3.686 3.646 3.611 3.579 3.552 3.527 3.505 3.485 3.467 3.450 3.435 3.421 3.408 3.396 3.385 3.307 3.261 3.232 3.195 3.174 3.098 3.091
636.6 31.60 12.92 8.610 6.869 5.959 5.408 5.041 4.781 4.587 4.437 4.318 4.221 4.140 4.073 4.015 3.965 3.922 3.883 3.850 3.819 3.792 3.768 3.745 3.725 3.707 3.690 3.674 3.659 3.646 3.551 3.496 3.460 3.416 3.390 3.300 3.291
50%
60%
70%
80%
90%
95%
96%
98%
99%
99.5%
99.8%
99.9%
Confidence level C