english Pages [50] Year 2023
Loose-Leaf Print Companion: Use with your digital solution
BLACK
BAYLEY
CASTILLO
BUSINESS STATISTICS FOR CONTEMPORARY DECISION MAKING Fourth Canadian Edition
Cover Design: Wiley Cover Image: © echo3005 / Shutterstock
www.wiley.com
Business Statistics For Contemporary Decision-Making Fourth Canadian Edition
KEN BLACK University of Houston—Clear Lake
TIFFANY BAYLEY Western University
IGNACIO CASTILLO Wilfrid Laurier University
Brief Contents
ABOUT THE AUTHORS
PREFACE
1 Introduction to Statistics and Business Analytics
2 Visualizing Data with Charts and Graphs
3 Descriptive Statistics
4 Probability
5 Discrete Distributions
6 Continuous Distributions
7 Sampling and Sampling Distributions
8 Statistical Inference: Estimation for Single Populations
9 Statistical Inference: Hypothesis Testing for Single Populations
10 Statistical Inferences About Two Populations
11 Analysis of Variance and Design of Experiments
12 Correlation and Simple Regression Analysis
13 Multiple Regression Analysis
14 Building Multiple Regression Models
15 Time-Series Forecasting and Index Numbers
16 Analysis of Categorical Data
17 Nonparametric Statistics
18 Statistical Quality Control
19 Decision Analysis
APPENDIX A Tables
APPENDIX B Making Inferences About Population Parameters: A Brief Summary
GLOSSARY
INDEX
Get complete eBook Order by email at [email protected] Black4CE_FM.indd 9
22/03/23 6:11 PM
Get complete eBook Order by email at [email protected]
Contents
ABOUT THE AUTHORS
PREFACE

1 Introduction to Statistics and Business Analytics
Decision Dilemma: Statistics Describe the State of Business in India’s Countryside
Introduction
1.1 Basic Statistical Concepts
1.2 Variables, Data, and Data Measurement
Thinking Critically About Statistics in Business Today 1.1
1.3 Big Data
1.4 Business Analytics
1.5 Data Mining and Data Visualization
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Canadian Farmers Dealing with Stress
Big Data Case
Using Software for Data Analysis
Answers

2 Visualizing Data with Charts and Graphs
Decision Dilemma: Energy Consumption Around the World
Introduction
2.1 Frequency Distributions
2.2 Quantitative Data Graphs
2.3 Qualitative Data Graphs
Thinking Critically About Statistics in Business Today 2.1
2.4 Charts and Graphs for Two Variables
2.5 Visualizing Time-Series Data
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Southwest Airlines and WestJet Airlines Ltd.
Big Data Case
Using Software for Data Analysis
Answers

3 Descriptive Statistics
Decision Dilemma: Laundry Statistics
Introduction
3.1 Measures of Central Tendency
3.2 Measures of Variability
Thinking Critically About Statistics in Business Today 3.1
Thinking Critically About Statistics in Business Today 3.2
3.3 Measures of Shape
3.4 Business Analytics Using Descriptive Statistics
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Coca-Cola Develops the African Market
Big Data Case
Using Software for Data Analysis
Answers

4 Probability
Decision Dilemma: Education, Gender, and Employment
Introduction
4.1 Introduction to Probability
4.2 Structure of Probability
4.3 Marginal, Union, Joint, and Conditional Probabilities
4.4 Addition Laws
4.5 Multiplication Laws
4.6 Conditional Probability
Thinking Critically About Statistics in Business Today 4.1
4.7 Revision of Probabilities: Bayes’ Rule
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Bluewater Recycling Association Offers Bigger Bins
Big Data Case
Answers
5 Discrete Distributions
Decision Dilemma: Life with a Cellphone
Introduction
5.1 Discrete Versus Continuous Distributions
5.2 Describing a Discrete Distribution
5.3 Binomial Distribution
Thinking Critically About Statistics in Business Today 5.1
5.4 Poisson Distribution
Thinking Critically About Statistics in Business Today 5.2
5.5 Hypergeometric Distribution
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Whole Foods Market Grows Through Mergers and Acquisitions
Big Data Case
Using Software for Data Analysis
Answers

6 Continuous Distributions
Decision Dilemma: Canadian National (CN) Railway
Introduction
6.1 Uniform Distribution
6.2 Normal Distribution
Thinking Critically About Statistics in Business Today 6.1
6.3 Using the Normal Curve to Approximate Binomial Distribution Problems
6.4 Exponential Distribution
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Mercedes Goes after Younger Buyers
Big Data Case
Using Software for Data Analysis
Answers

7 Sampling and Sampling Distributions
Decision Dilemma: Spending Behaviour on Cough and Cold Medicine
Introduction
7.1 Sampling
Thinking Critically About Statistics in Business Today 7.1
7.2 Sampling Distribution of a Sample Mean
7.3 Sampling Distribution of a Sample Proportion
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: 3M
Big Data Case
Using Software for Data Analysis
Answers

8 Statistical Inference: Estimation for Single Populations
Decision Dilemma: Batteries and Bulbs: How Long Do They Last?
Introduction
8.1 Estimating the Population Mean Using the z Statistic (σ Known)
8.2 Estimating the Population Mean Using the t Statistic (σ Unknown)
Thinking Critically About Statistics in Business Today 8.1
8.3 Estimating the Population Proportion
Thinking Critically About Statistics in Business Today 8.2
8.4 Estimating the Population Variance
8.5 Estimating Sample Size
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Solutions Your Organized Living Store
Big Data Case
Using Software for Data Analysis
Answers

9 Statistical Inference: Hypothesis Testing for Single Populations
Decision Dilemma: Business Referrals and Social Media Influencers
Introduction
9.1 Introduction to Hypothesis Testing
9.2 Testing Hypotheses About a Population Mean Using the z Statistic (σ Known)
9.3 Testing Hypotheses About a Population Mean Using the t Statistic (σ Unknown)
9.4 Testing Hypotheses About a Proportion
Thinking Critically About Statistics in Business Today 9.1
9.5 Testing Hypotheses About a Variance
9.6 Solving for Type II Errors
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: A&W’s New Menu Targets Meat Alternatives
Big Data Case
Using Software for Data Analysis
Answers

10 Statistical Inferences About Two Populations
Decision Dilemma: L.L. Bean
Introduction
10.1 Hypothesis Testing and Confidence Intervals About the Difference in Two Means Using the z Statistic: Population Variances Known
10.2 Hypothesis Testing and Confidence Intervals About the Difference in Two Means Using the t Statistic: Independent Samples with Population Variances Unknown
Thinking Critically About Statistics in Business Today 10.1
10.3 Statistical Inferences for Two Related Populations
10.4 Statistical Inferences About Two Population Proportions
10.5 Testing Hypotheses About Two Population Variances
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Seitz LLC: Producing Quality Gear-Driven and Linear-Motion Products
Big Data Case
Using Software for Data Analysis
Answers

11 Analysis of Variance and Design of Experiments
Decision Dilemma: Job and Career Satisfaction of Foreign Self-Initiated Expatriates
Introduction
11.1 Introduction to Design of Experiments
11.2 The Completely Randomized Design (One-Way ANOVA)
Thinking Critically About Statistics in Business Today 11.1
11.3 Multiple Comparison Tests
11.4 The Randomized Block Design
11.5 A Factorial Design (Two-Way ANOVA)
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: ASCO Valve Canada’s RedHat Valve
Big Data Case
Using Software for Data Analysis
Answers

12 Correlation and Simple Regression Analysis
Decision Dilemma: Predicting International Hourly Wages by the Price of a Big Mac™
Introduction
12.1 Correlation
Thinking Critically About Statistics in Business Today 12.1
12.2 Introduction to Simple Regression Analysis
12.3 Determining the Equation of the Regression Line
12.4 Residual Analysis
12.5 Standard Error of the Estimate
12.6 Coefficient of Determination
12.7 Hypothesis Tests for the Slope of the Regression Model and for the Overall Model
12.8 Estimation
12.9 Using Regression to Develop a Forecasting Trend Line
12.10 Interpreting the Output
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Caterpillar, Inc.
Big Data Case
Using Software for Data Analysis
Answers

13 Multiple Regression Analysis
Decision Dilemma: Will You Like Your New Job?
Introduction
13.1 The Multiple Regression Model
13.2 Significance Tests of the Regression Model and Its Coefficients
13.3 Residuals, Standard Error of the Estimate, and R²
Thinking Critically About Statistics in Business Today 13.1
13.4 Interpreting Multiple Regression Computer Output
13.5 Using Regression Analysis: Some Caveats
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Starbucks Introduces Debit Card
Big Data Case
Using Software for Data Analysis
Answers

14 Building Multiple Regression Models
Decision Dilemma: Predicting CEO Salaries
14.1 Nonlinear Models: Mathematical Transformation
Thinking Critically About Statistics in Business Today 14.1
14.2 Indicator (Dummy) Variables
14.3 Model Building: Search Procedures
14.4 Multicollinearity
14.5 Logistic Regression
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Ceapro Turns Oats into Beneficial Products
Big Data Case
Using Software for Data Analysis
Answers

15 Time-Series Forecasting and Index Numbers
Decision Dilemma: Forecasting Air Pollution
Introduction
15.1 Introduction to Forecasting
15.2 Smoothing Techniques
15.3 Trend Analysis
Thinking Critically About Statistics in Business Today 15.1
15.4 Seasonal Effects
15.5 Autocorrelation and Autoregression
15.6 Index Numbers
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Dofasco Changes Its Style
Big Data Case
Using Software for Data Analysis
Answers

16 Analysis of Categorical Data
Decision Dilemma: Selecting Suppliers in the Electronics Industry
Introduction
16.1 Chi-Square Goodness-of-Fit Test
16.2 Contingency Analysis: Chi-Square Test of Independence
Thinking Critically About Statistics in Business Today 16.1
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Foot Locker in the Shoe Mix
Big Data Case
Using Software for Data Analysis
Answers

17 Nonparametric Statistics
Decision Dilemma: How Is the Doughnut Business Doing?
Introduction
17.1 Runs Test
17.2 Mann-Whitney U Test
Thinking Critically About Statistics in Business Today 17.1
17.3 Wilcoxon Matched-Pairs Signed Rank Test
17.4 Kruskal-Wallis Test
17.5 Friedman Test
17.6 Spearman’s Rank Correlation
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Schwinn
Big Data Case
Using Software for Data Analysis
Answers

18 Statistical Quality Control
Decision Dilemma: Italy’s Piaggio Makes a Comeback
Introduction
18.1 Introduction to Quality Control
Thinking Critically About Statistics in Business Today 18.1
18.2 Process Analysis
18.3 Control Charts
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formulas / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Catalyst Paper Introduces Microsoft Dynamics CRM
Big Data Case
Answers

19 Decision Analysis
Decision Dilemma: Decision-Making at the CEO Level
Introduction
19.1 The Decision Table and Decision-Making Under Certainty
19.2 Decision-Making Under Uncertainty
Thinking Critically About Statistics in Business Today 19.1
19.3 Decision-Making Under Risk
19.4 Revising Probabilities in Light of Sample Information
Decision Dilemma Solved
Key Considerations
Why Statistics Is Relevant
Summary of Learning Objectives / Key Terms / Formula / Supplementary Problems / Exploring the Databases with Business Analytics
Case: Fletcher-Terry: On the Cutting Edge
Big Data Case
Answers

APPENDIX A Tables
APPENDIX B Making Inferences About Population Parameters: A Brief Summary
GLOSSARY
INDEX
CHAPTER 1
Introduction to Statistics and Business Analytics

LEARNING OBJECTIVES
The primary objective of Chapter 1 is to introduce you to the world of statistics and analytics, thereby enabling you to:
1.1 Define important statistical terms, including population, sample, and parameter, as they relate to descriptive and inferential statistics.
1.2 Explain the difference between variables, measurement, and data, and compare the four different levels of data: nominal, ordinal, interval, and ratio.
1.3 Explain the differences between the four dimensions of big data.
1.4 Compare and contrast the three categories of business analytics.
1.5 Describe the data mining and data visualization processes.
Decision Dilemma
Kailash Kumar/Getty Images
Statistics Describe the State of Business in India’s Countryside
India is the second-most populous country in the world, with more than 1.4 billion people. Nearly 70% of the people live in rural areas, scattered about the countryside in 600,000 villages. In fact, it may be said that more than one in every ten people in the world live in rural India. While it has a per capita income of US$1.50 per day, rural India, which has been described in the past as poor and semi-literate, now contributes about one-half of the country’s gross national product (GNP). However, rural India still has the most households in the world without electricity. Despite its poverty and economic disadvantages, there are compelling reasons for companies to market their goods and
services to rural India. This market has been steadily growing. There is increasing agricultural productivity, leading to growth in disposable income, and there is a reduction in the gap between the tastes of urban and rural customers. The literacy level is increasing, and people are becoming more conscious of their lifestyles and of opportunities for a better life. Agriculture is no longer the main source of income in rural India, with only 23% coming from farming. Nearly three-quarters of rural India has a mobile connection. Rural consumers are saving, on average, 25% of their income, thereby presenting a significant opportunity for an expansion in the use of and demand for banking and investment products. In addition, there has been an increased consumer priority on health and hygiene products. The rural Fast-Moving Consumer Goods market in India is expected to grow from US$23.63 billion in fiscal year 2018 to US$220 billion by 2025. Because of such factors, many global and Indian firms, such as Microsoft, General Electric, Nestlé, Kellogg’s, Colgate-Palmolive, Hindustan-Unilever, Godrej, Nirma, Novartis, Dabur, Tata, Hero, Bajaj, and Vodafone India, among many others, have entered the rural Indian market with enthusiasm. Marketing to rural customers often involves persuading them to try products that they may not have used before. Rural India is a huge, relatively untapped market for businesses. However, entering such a market is not without risks and obstacles. The dilemma facing companies is whether to enter this marketplace and, if so, to what extent and how.
Managerial, Statistical, and Analytical Questions
1. Are the statistics presented in this report exact figures or estimates?
2. How and where could the business analysts have gathered such data?
3. In measuring the potential of the rural Indian marketplace, what other statistics could have been gathered?
4. What levels of data measurement are represented by data on rural India?
5. How can managers use these and other statistics to make better decisions about entering this marketplace?
6. What big data might be available on rural India?
Sources: Adapted from “India—Population 2021,” statisticstimes.com, December 9, 2021; “India’s Rural-Urban Divide: Village Worker Earns Less Than Half of City Peer,” financialexpress.com, December 12, 2019; “Living in the Dark: 240 Million Indians Have No Electricity,” bloomberg.com, January 24, 2017; “Only 23% of Rural Income from Farming, Reveals NABARD 2016-2017 Survey,” indianexpress.com, November 1, 2021; “Agriculture No Longer Main Rural Income Source,” business-standard.com, November 16, 2017; “Tring Tring! Nearly Three-Fourths of Rural India Has a Mobile Connection Now,” thebetterindia.com, January 24, 2019; “Online Now the Second Largest Consumed Media after TV in Rural India: Kantar-Dialogue Factory,” mediabrief.com, September 8, 2021; “Fast Moving Consumer Goods (FMCG),” readkong.com, January 2021.
Introduction

Every minute of the working day, decisions are made by businesses around the world that determine whether companies will be profitable and grow or stagnate and die. Most of these decisions are made with the assistance of information gathered about the marketplace, the economic and financial environment, the workforce, the competition, and other factors. Such information usually comes in the form of data or is accompanied by data. Business statistics and business analytics provide the tools by which such data are collected, analyzed, summarized, and presented to facilitate the decision-making process, and both business statistics and business analytics play an important role in the ongoing saga of decision-making within the dynamic world of business.

There is a wide variety of uses and applications of statistics in business. Several examples follow.

• A survey of 3,893 Canadians aged 18 or older conducted by the Bank of Canada indicated that 80% of Canadians have no plans to become cashless in the next five years, whereas 12% of Canadians are already cashless.¹
• According to an Ipsos survey of 1,001 Canadians, 44% intend to ask for a raise later this year in response to high rates of inflation and rising interest rates.²
• According to Statistics Canada, 84.4% of Canadians over 15 years of age have a smartphone for personal use. For 53.2% of Canadians, checking their smartphone is the first thing they do upon waking up in the morning. This habit is more apparent in younger age brackets, with 74.0% and 77.3% of males and females, respectively, between 15 and 24 waking up to their smartphones, compared to 24.8% and 20.3% of males and females, respectively, aged 65 and over.³
• A Deloitte Retail “Green” survey of 1,080 adults revealed that 54% agreed that plastic, non-compostable shopping bags should be banned.
• A J.D. Power survey of 6,699 Canadians found that satisfaction with credit card customer service had fallen from 811 to 794 (out of 1,000) over a one-year period, with 22% of respondents indicating that they had switched their primary card to avoid paying annual fees.⁴
• In Deloitte’s Food Consumer Survey of approximately 1,000 respondents, 23% were satisfied with their online grocery shopping experience. The 18- to 34-year-old age group was more satisfied (28%) than the national average, but still purchased 84% of their monthly groceries in bricks-and-mortar stores.⁵

Note that, in most of these examples, business analysts have conducted a study and provided rich and interesting information that can be used in business decision-making. In this text, we will examine several types of graphs for visualizing data as we study ways to arrange or structure data into forms that are both meaningful and useful to decision-makers. We will learn about techniques for sampling from a population that allow studies of the business world to be conducted in a less expensive and more timely manner. We will explore various ways to forecast future values and we will examine techniques for predicting trends. This text also includes many statistical and analytics tools for testing hypotheses and for estimating population values. These and many other exciting statistics and statistical techniques await us on this journey through business statistics and analytics. Let’s begin!

1 Heng Chen, Walter Engert, Marie-Hélène Felt, Kim Huynh, Gradon Nicholls, Daneal O’Habib, and Julia Zhu, “Cash and COVID-19: The Impact of the Second Wave in Canada,” Bank of Canada, July 2021, https://www.bankofcanada.ca/2021/07/staff-discussion-paper-2021-12/.
2 Darrell Bricker, “85% of Canadians Are Concerned That Inflation Will Make Things Less Affordable,” Ipsos, June 22, 2022, https://www.ipsos.com/en-ca/news-polls/85-percent-of-canadians-concernedinflation-less-affordable.
3 Statistics Canada, “Smartphone Personal Use and Selected Smartphone Habits by Gender and Age Group,” Table 22-10-0143-01, June 22, 2021, https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=2210014301.
4 J.D. Power, “Credit Card Customers in Canada Spending Less but Have Higher Satisfaction, J.D. Power Finds,” September 16, 2021, https://www.jdpower.com/business/press-releases/2021-canada-credit-cardsatisfaction-study.
5 Deloitte, “The Conflicted Consumer—2021 Food Consumer Survey,” 2021, https://www2.deloitte.com/ca/en/pages/consumer-business/articles/future-of-food-a-canadian-perspective.html.

1.1 Basic Statistical Concepts

LEARNING OBJECTIVE 1.1
Define important statistical terms, including population, sample, and parameter, as they relate to descriptive and inferential statistics.

statistics: A science dealing with the collection, analysis, interpretation, and presentation of numerical data.
population: A collection of persons, objects, or items of interest.
census: A process of gathering data from the whole population for a given measurement of interest.
sample: A portion of the whole.
descriptive statistics: Statistics that have been gathered on a group to describe or reach conclusions about that same group.
inferential statistics: Statistics that have been gathered from a sample and used to reach conclusions about the population from which the sample was taken.
parameter: A descriptive measure of the population.
statistic: A descriptive measure of a sample.

Business statistics, like many areas of study, has its own language. It is important to begin our study with an introduction of some basic concepts in order to understand and communicate about the subject. We begin with a discussion of the word statistics. This word has many different meanings in our culture. Webster’s Third New International Dictionary gives a comprehensive definition of statistics as a science dealing with the collection, analysis, interpretation, and presentation of numerical data. Viewed from this perspective, statistics includes all the topics presented in this text. Figure 1.1 captures the key elements of business statistics.

FIGURE 1.1 The Key Elements of Statistics (Collect data, Analyze data, Interpret data, Present findings)

The study of statistics can be organized in a variety of ways. One of the main ways is to subdivide statistics into two branches: descriptive statistics and inferential statistics. To understand the difference between descriptive and inferential statistics, definitions of population and sample are helpful. Webster’s Third New International Dictionary defines population as a collection of persons, objects, or items of interest. The population can be a widely defined category, such as “all automobiles,” or it can be narrowly defined, such as “all Ford Escape crossover vehicles produced from Year 1 to Year 2.” A population can be a group of people, such as “all workers employed by Microsoft,” or it can be a set of objects, such as “all Toyota RAV4s produced in February of Year 1 by Toyota Canada at the Woodstock, Ontario, plant.” The analyst defines the population to be whatever he or she is studying. When analysts gather data from the whole population for a given measurement of interest, they call it a census. Most people are familiar with the Canadian Census. Every five years, the government attempts to measure all persons living in this country. As another example, if an analyst is interested in ascertaining the grade point average for all students at the University of Toronto, one way to do so is to conduct a census of all students currently enrolled there.

A sample is a portion of the whole and, if properly taken, is representative of the whole. For various reasons (explained in Chapter 7), analysts often prefer to work with a sample of the population instead of the entire population. For example, in conducting quality control experiments to determine the average life of smartphone batteries, a smartphone battery manufacturer might randomly sample only 75 batteries during a production run. Because of time and money limitations, a human resources manager might take a random sample of 40 employees instead of using a census to measure company morale.

If a business analyst is using data gathered on a group to describe or reach conclusions about that same group, the statistics are called descriptive statistics. For example, if an instructor produces statistics to summarize a class’s examination results and uses those statistics to reach conclusions about that class only, the statistics are descriptive. The instructor can use these statistics to discuss class average, talk about the range of class scores, or present any other data measurements for the class based on the test. Most athletic statistics, such as batting averages, save percentages, and first downs, are descriptive statistics because they are used to describe an individual or team effort. Many of the statistical data generated by businesses are descriptive. They might include number of employees on vacation during June, average salary at the Edmonton office, corporate sales for the current fiscal year, average managerial satisfaction score on a company-wide census of employee attitudes, and average return on investment for lululemon for the first ten years of operations.

Another type of statistics is called inferential statistics.
If an analyst gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken, the statistics are inferential statistics. The data gathered from the sample are used to infer something about a larger group. Inferential statistics are sometimes referred to as inductive statistics. The use and importance of inferential statistics continue to grow. One application of inferential statistics is in pharmaceutical research. Some new drugs are expensive to produce, and therefore tests must be limited to small samples of patients. Utilizing inferential statistics, analysts can design experiments with small, randomly selected samples of patients and attempt to reach conclusions and make inferences about the population. Market analysts use inferential statistics to study the impact of advertising on various market segments. Suppose a soft drink company creates an advertisement depicting a dispensing machine that talks to the buyer, and market analysts want to measure the impact of the new advertisement on various age groups. The analyst could stratify the population into age categories ranging from young to old, randomly sample each stratum, and use inferential statistics to determine the effectiveness of the advertisement for the various age groups in the population. The advantage of using inferential statistics is that they enable the analyst to effectively study a wide range of phenomena without having to conduct a census. Most of the topics discussed in this text pertain to inferential statistics. A descriptive measure of the population is called a parameter. Parameters are usually denoted by Greek letters. Examples of parameters are population mean (μ), population variance (σ2), and population standard deviation (σ). A descriptive measure of a sample is called a statistic. Statistics are usually denoted by Roman letters. 
Examples of statistics are sample mean (x̄), sample variance (s²), and sample standard deviation (s). Differentiation between the terms parameter and statistic is important only in the use of inferential statistics. A business analyst often wants to estimate the value of a parameter or conduct tests about the parameter. However, the calculation of parameters is usually either impossible or infeasible because of the amount of time and money required to take a census. In such cases, the business analyst can take a random sample of the population, calculate a statistic on the sample, and infer by estimation the value of the parameter. The basis for inferential statistics, then, is the ability to make decisions about parameters without having to complete a census of the population. For example, a manufacturer of washing machines would probably want to determine the average number of loads that a new machine can wash before it needs repairs. The
parameter is the population mean or average number of washes per machine before repair. A company analyst takes a sample of machines, computes the number of washes before repair for each machine, averages the numbers, and estimates the population value or parameter by using the statistic, which in this case is the sample average. Figure 1.2 demonstrates the inferential process. Inferences about parameters are made under uncertainty. Unless parameters are computed directly from the population, the statistician never knows with certainty whether the estimates or inferences made from samples are true. In an effort to estimate the level of confidence in the result of the process, statisticians use probability statements. For this and other reasons, part of this text is devoted to probability (Chapter 4).
FIGURE 1.2 Process of Applying Inferential Statistics to Estimate a Population Mean (μ): select a random sample from the population, then calculate the sample mean x̄ (statistic) to estimate μ (parameter).
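The inferential process in Figure 1.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 washing machines, the mean of about 5,000 loads, the spread of 400 loads, and the sample size of 50 are all hypothetical values simulated for the example, not data from the text.

```python
import random
import statistics

# Hypothetical population: loads washed before first repair for 10,000 machines.
# These values are simulated for illustration only; they are not real data.
rng = random.Random(42)
population = [rng.gauss(5000, 400) for _ in range(10_000)]

# Parameter: computed from the entire population (what a census would give).
mu = statistics.mean(population)

# Statistic: computed from a random sample of 50 machines.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (parameter) = {mu:.1f}")
print(f"sample mean (statistic)     = {x_bar:.1f}")
print(f"estimation error            = {x_bar - mu:.1f}")
```

In practice the analyst sees only the sample, so the estimation error printed here is unknown. That is why inferences about parameters are accompanied by probability statements.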
Concept Check Fill in the blanks. 1. Descriptive statistics can be used to _____________ the data to describe a data sample either numerically or graphically. 2. Statistical inference is inference about a _____________ from a random data _____________ drawn from it.
1.2 Variables, Data, and Data Measurement LEARNING OBJECTIVE 1.2 Explain the difference between variables, measurement, and data, and compare the four different levels of data: nominal, ordinal, interval, and ratio. Business statistics is about measuring phenomena in the business world and organizing, analyzing, and presenting the resulting numerical information in such a way that better, more informed business decisions can be made. Most business statistics studies contain variables, measurements, and data. In business statistics, a variable is a characteristic of any entity being studied that is capable of taking on different values. Some examples of variables in business might include return on investment, advertising dollars, labour productivity, stock price, historic cost, total sales, market share, age of worker, earnings per share, kilometres driven to work, time spent in store shopping, and many, many others. In business statistics studies, most variables produce a measurement that can be used for analysis. A measurement occurs when a standard process is used to assign numbers to particular attributes or characteristics of a variable. Many measurements are obvious, such as the time a customer spends shopping in a store, the age of the worker, or the number of kilometres driven to work. However, some measurements, such as labour productivity, customer satisfaction, and return on investment, have to be defined by the business analyst or by experts within the field. Once such measurements are recorded and stored, they can be denoted as “data.” It can be said that data are recorded measurements. The processes of measuring and data gathering are basic to all that we do in business statistics and analytics. It is data that are analyzed by business statisticians and analysts in order to learn more about the variables being studied. 
Sometimes, sets of data are organized into databases as a way to store them or as a means for more conveniently analyzing data or comparing variables. Valid data are the lifeblood of business statistics and business analytics, and it is
variable A characteristic of any entity being studied that is capable of taking on different values. measurement What occurs when a standard process is used to assign numbers to particular attributes or characteristics of a variable. data Recorded measurements.
FIGURE 1.3 Hierarchy of Levels of Data, from highest to lowest level of data measurement: Ratio, Interval, Ordinal, Nominal
Thinking Critically About Statistics in Business Today 1.1 Cellular Phone Use in Japan The Communications and Information Network Association of Japan conducts an annual study of cellular phone use in Japan. A recent survey was taken as part of this study using a sample of 1,200 cellphone users split evenly between men and women and almost equally distributed over six age brackets. The survey was administered in the greater Tokyo and Osaka metropolitan areas. The study produced several interesting findings. Of the respondents, 98.3% said that their main-use terminal was a smartphone, while 1.6% said it was a feature phone, a mobile phone that incorporates features such as the ability to access the internet and store and play music but lacks the advanced functionality of a smartphone. Slightly more than 32% of respondents used multiple devices. Of those that did, almost 83% used tablets. Smartphone users in the survey reported that the two decisive factors in purchasing a new smartphone were purchase price (81.0%) and monthly payment cost (87.6%). Over half of all respondents used their device for mobile cashless payment services.
Things to Ponder 1. In what way was this study an example of inferential statistics? 2. What is the population of this study? 3. What are some of the variables being studied? 4. How might a study such as this yield information that is useful to business decision-makers? Source: Adapted from “CIAJ Releases Report on the Study of Mobile Phone Use,” Communications and Information Network Association of Japan (CIAJ), December 9, 2021, www.ciaj.or.jp/en/news/news2020/1071.html.
important that the business analyst pay thoughtful attention to the creation of meaningful, valid data before embarking on analysis and reaching conclusions (see Thinking Critically About Statistics in Business Today 1.1). Immense volumes of numerical data are gathered by businesses every day, representing myriad items. For example, numbers represent dollar costs of items produced, geographical locations of retail outlets, masses of shipments, and rankings of employees at yearly reviews. Not all such data should be analyzed in the same way statistically because the entities represented by the numbers are different. For this reason, the business analyst needs to know the level of data measurement represented by the numbers being analyzed. The disparate use of numbers can be illustrated by the numbers 40 and 80, which could represent the masses of two objects being shipped, the ratings received on a consumer test by two different products, or the hockey jersey numbers of a centre and a winger. Although 80 kg is twice as much as 40 kg, the winger is probably not twice as big as the centre! Averaging the two masses seems reasonable but averaging the hockey jersey numbers makes no sense. The appropriateness of the data analysis depends on the level of measurement of the data gathered. The phenomenon represented by the numbers determines the level of data measurement. Four common levels of data measurement are: 1. Nominal 2. Ordinal 3. Interval 4. Ratio Nominal is the lowest level of data measurement, followed by ordinal, interval, and ratio. Ratio is the highest level of data, as shown in Figure 1.3.
Nominal Level nominal‐level data The lowest level of data measurement; used only to classify or categorize.
The lowest level of data measurement is the nominal level. Numbers representing nominal-level data (the word level is often omitted) can be used only to classify or categorize. Employee identification numbers are an example of nominal data. The numbers are used only to differentiate employees and not to make a value statement about them. Many demographic questions in surveys result in data that are nominal because the questions are used for classification only. The following is an example of a question that would result in nominal data:
Which of the following employment classifications best describes your area of work? a. Educator b. Construction worker c. Manufacturing worker d. Lawyer e. Doctor f. Other Suppose that, for computing purposes, an educator is assigned a 1, a construction worker is assigned a 2, a manufacturing worker is assigned a 3, and so on. These numbers should be used only to classify respondents. The number 1 does not denote the top classification. It is used only to differentiate an educator (1) from a lawyer (4) or any other occupation. Other types of variables that often produce nominal-level data are marital status, religion, language, geographic location, and employment status. Social insurance numbers, telephone numbers, and employee ID numbers are further examples of nominal data. Statistical techniques that are appropriate for analyzing nominal data are limited. However, some of the more widely used statistics, such as the chi-square statistic, can be applied to nominal data, often producing useful information.
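As a minimal sketch of how such coded responses might be handled, the following Python example counts category frequencies, which is a legitimate use of nominal data. The ten responses are hypothetical, and codes 5 and 6 simply continue the "and so on" numbering described above.

```python
from collections import Counter

# Coding scheme from the survey question above; the order of the codes is arbitrary.
CODES = {1: "Educator", 2: "Construction worker", 3: "Manufacturing worker",
         4: "Lawyer", 5: "Doctor", 6: "Other"}

# Hypothetical coded responses from ten respondents (illustrative only).
responses = [1, 4, 4, 2, 6, 1, 4, 3, 5, 4]

# Counting how often each category occurs is a valid use of nominal data.
counts = Counter(responses)
for code, n in sorted(counts.items()):
    print(f"{CODES[code]:<22}{n}")

# The most frequent category is meaningful for nominal data.
modal_code, modal_count = counts.most_common(1)[0]
print("Most common:", CODES[modal_code])

# An average of the codes can be computed, but it has no interpretation:
# it does not describe a "typical" occupation.
meaningless_mean = sum(responses) / len(responses)
```

Frequency counts of this kind are also the raw material for techniques such as the chi-square statistic mentioned above.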
Ordinal Level Ordinal-level data measurement is higher than the nominal level. In addition to having the nominal-level capabilities, ordinal-level measurement can be used to rank or order objects. For example, using ordinal data, a supervisor can evaluate three employees by ranking their productivity with the numbers 1 through 3. The supervisor could identify one employee as the most productive, one as the least productive, and one as somewhere in between by using ordinal data. However, the supervisor could not use ordinal data to establish that the intervals between the employees ranked 1 and 2 and between the employees ranked 2 and 3 are equal; that is, the supervisor could not say that the differences in the amount of productivity between the workers ranked 1, 2, and 3 are necessarily the same. With ordinal data, the distances between consecutive numbers are not always equal. Some Likert-type scales on questionnaires are considered by many analysts to be ordinal in level. The following is an example of such a scale: This computer tutorial is
not helpful 1
somewhat helpful 2
moderately helpful 3
very helpful 4
extremely helpful 5
ordinal‐level data Next‐higher level of data from nominal‐level data; can be used to order or rank items, objects, or people.
When this survey question is coded for the computer, only the numbers 1 through 5 will remain, not the descriptions. Virtually everyone would agree that a 5 is higher than a 4 on this scale and that it is possible to rank responses. However, most respondents would not consider the differences between not helpful, somewhat helpful, moderately helpful, very helpful, and extremely helpful to be equal. Mutual funds are sometimes rated in terms of investment risk by using measures of default risk, currency risk, and interest rate risk. These three measures are applied to investments by rating them as high, medium, or low risk. Suppose high risk is assigned a 3, medium risk a 2, and low risk a 1. If a fund is awarded a 3 rather than a 2, it carries more risk, and so on. However, the differences in risk between categories 1, 2, and 3 are not necessarily equal. Thus, these measurements of risk are only ordinal-level measurements. Another example of the use of ordinal numbers in business is the ranking of the 50 best employers in Canada in Report on Business magazine. The numbers ranking the companies are only ordinal in measurement. Certain statistical techniques are specifically suited to ordinal data, but many other techniques are not appropriate for use on such data. Because nominal and ordinal data are often derived from imprecise measurements such as demographic questions, the categorization of people or objects, or the ranking of items, nominal and ordinal data are nonmetric data and are sometimes referred to as qualitative data.
nonmetric data Nominal‐ and ordinal‐level data. Also called qualitative data.
Interval Level
interval‐level data Next to highest level of data. These data have all the properties of ordinal‐level data, but in addition, intervals between consecutive numbers have meaning.
ratio‐level data Highest level of data measurement; contains the same properties as interval‐level data, with the additional property that zero has meaning and represents the absence of the phenomenon being measured.
metric data Interval‐ and ratio‐level data. Also called quantitative data.
parametric statistics A class of statistical techniques that contains assumptions about the population and that is used only with interval‐ and ratio‐level data.
nonparametric statistics A class of statistical techniques that makes few assumptions about the population and is particularly applicable to nominal‐ and ordinal‐level data.
Interval-level data measurement is the next to the highest level of data, in which the distances between consecutive numbers have meaning and the data are always numerical. The distances represented by the differences between consecutive numbers are equal; that is, interval data have equal intervals. An example of interval measurement is Celsius temperature. With Celsius temperature numbers, the temperatures can be ranked, and the increases in heat between consecutive readings, such as 20°, 21°, and 22°, are the same. In addition, with interval-level data, the zero point is a matter of convention or convenience and not a natural or fixed zero point. Zero is just another point on the scale and does not mean the absence of the phenomenon. For example, zero degrees Celsius is not the lowest possible temperature. Some other examples of interval-level data are the percentage change in employment, the percentage return on a stock, and the dollar change in share price.
Ratio Level Ratio-level data measurement is the highest level of data measurement. Ratio data have the same properties as interval data, but ratio data have an absolute zero and the ratio of two numbers is meaningful. The notion of absolute zero means that zero is fixed, and the zero value in the data represents the absence of the characteristic being studied. The value of zero cannot be arbitrarily assigned because it represents a fixed point. This definition enables the statistician to create ratios with the data. Examples of ratio data are height, mass, time, volume, and Kelvin temperature. With ratio data, an analyst can state that 180 kg of mass is twice as much as 90 kg or, in other words, make a ratio of 180:90. Many of the data gathered by machines in industry are ratio data. Other examples in the business world that are ratio level in measurement are production cycle time, work measurement time, passenger distance, number of trucks sold, complaints per 10,000 flyers, and number of employees. With ratio-level data, no b factor is required in converting units from one measurement to another, that is, y = ax. As an example, in converting height from metres to feet, 1 m = 3.28 ft. Because interval- and ratio-level data are usually gathered by precise instruments often used in production and engineering processes, in standardized testing, or in standardized accounting procedures, they are called metric data and are sometimes referred to as quantitative data.
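The role of the b factor can be illustrated with a short Python sketch. The metres-to-feet factor of 3.28 comes from the text. The Celsius-to-Fahrenheit conversion (F = 1.8C + 32) is not discussed in the passage and is added here only as a familiar example of a conversion of the form y = ax + b.

```python
def metres_to_feet(m):
    # Ratio-level conversion of the form y = a*x (no b term); 1 m = 3.28 ft as in the text.
    return 3.28 * m

def celsius_to_fahrenheit(c):
    # Interval-level conversion of the form y = a*x + b; the b = 32 shifts the zero point.
    return 1.8 * c + 32

# Ratio data: "twice as tall" holds in either unit.
ratio_m = 2.0 / 1.0
ratio_ft = metres_to_feet(2.0) / metres_to_feet(1.0)

# Interval data: 20 degrees C is not "twice as hot" as 10 degrees C,
# because the ratio changes when the unit changes (68 F / 50 F).
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Differences, however, remain comparable on an interval scale.
diff_1 = celsius_to_fahrenheit(21) - celsius_to_fahrenheit(20)
diff_2 = celsius_to_fahrenheit(22) - celsius_to_fahrenheit(21)

print(ratio_m, ratio_ft)  # the ratio survives the unit change
print(ratio_c, ratio_f)   # the ratio does not survive the unit change
print(diff_1, diff_2)     # equal intervals stay equal
```

Because the zero point of a ratio scale is fixed, a change of units only rescales the data, so statements such as "twice as much" keep their meaning.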
Comparison of the Four Levels of Data Figure 1.4 shows the relationships of the usage potential among the four levels of data measurement. The concentric squares denote that each higher level of data can be analyzed by any of the techniques used on lower levels of data but, in addition, can be used in other statistical techniques. Therefore, ratio data can be analyzed by any statistical technique applicable to the other three levels of data plus some others. Nominal data are the most limited data in terms of the types of statistical analysis that can be used with them. Ordinal data allow the analyst to perform any analysis that can be done with nominal data and some additional ones. With ratio data, an analyst can make ratio comparisons and appropriately do any analysis that can be performed on nominal, ordinal, or interval data. Some statistical techniques require ratio data and cannot be used to analyze other levels of data. Statistical techniques can be separated into two categories: parametric statistics and nonparametric statistics. Parametric statistics require that data be interval or ratio. If the data are nominal or ordinal, nonparametric statistics must be used. Nonparametric statistics can also be used to analyze interval or ratio data. This text focuses largely on parametric statistics, with the exception of Chapter 16 and Chapter 17, which present nonparametric techniques. Thus, much of the material in this text requires interval or ratio data. Figure 1.5 contains a summary of metric data and nonmetric data.
FIGURE 1.4 Usage Potential of Various Levels of Data (concentric squares labelled Ratio, Interval, Ordinal, Nominal)
FIGURE 1.5 Metric vs. Nonmetric Data
Metric data
• Higher-level data
• Interval and ratio
• Quantitative data
• Can use parametric statistics
Nonmetric data
• Lower-level data
• Nominal and ordinal
• Qualitative data
• Must use nonparametric statistics
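The nesting of the four levels can be sketched as a small lookup in Python. The operations listed for each level (mode, median, mean, and so on) are an assumption based on common practice and are not a list given in the text. The names `Level`, `UNLOCKED_AT`, `allowed_operations`, and `is_metric` are hypothetical and used only for illustration.

```python
from enum import IntEnum

class Level(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Illustrative (not exhaustive) operations first available at each level.
# These entries are an assumption based on common practice, not taken from the text.
UNLOCKED_AT = {
    Level.NOMINAL: ["count categories", "mode"],
    Level.ORDINAL: ["rank", "median"],
    Level.INTERVAL: ["differences", "mean"],
    Level.RATIO: ["ratios"],
}

def allowed_operations(level):
    """Return every operation available at `level`, including those inherited from lower levels."""
    ops = []
    for lower in Level:
        if lower <= level:
            ops.extend(UNLOCKED_AT[lower])
    return ops

def is_metric(level):
    """Interval and ratio data are metric (quantitative); nominal and ordinal are nonmetric."""
    return level >= Level.INTERVAL

for level in Level:
    kind = "metric" if is_metric(level) else "nonmetric"
    print(f"{level.name:<9}{kind:<11}{', '.join(allowed_operations(level))}")
```

Each row of the printed output contains everything in the row above it plus something new, which mirrors the concentric squares of Figure 1.4.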
Concept Check Match the level with the correct measurement.
1. Nominal
2. Ordinal
3. Interval
4. Ratio
a. Measures with a fixed zero that means “no quantity”; a constant interval (distance on the scale)
b. Measures require a fixed distance, but the zero point is arbitrary
c. Measures classified by shared attributes (characteristics)
d. Measures by orderings (ranks)
Demonstration Problem 1.1 The health-care system is constantly changing. Because of the growing pressure to deliver a high level of service at increasingly low costs and the need to determine how providers can better serve their patients, hospital administrators sometimes administer a quality satisfaction survey to their patients after their release from hospital. The following types of questions are sometimes asked on such a survey. What level of data measurement will these questions result in? 1. How long ago were you released from the hospital? 2. Which type of unit were you in for most of your stay? ___Coronary care unit ___Intensive care unit ___Maternity care unit ___Medical unit ___Pediatric/children’s unit ___Surgical unit 3. How serious was your condition when you were first admitted to the hospital? ___Critical
___Serious
___Moderate
___Minor
4. Rate the skill of your doctor: ___Excellent
___Very Good
___Good
___Fair
___Poor
5. Rate the nursing care on the following scale from 1 to 7: Poor 1 2 3 4 5 6 7 Excellent Solution Question 1 is a time measurement with an absolute zero and is therefore a ratio-level measurement. A person who has been out of the hospital for two weeks has been out twice as long as someone who has been out of the hospital for one week. Question 2 yields nominal data because the patient is asked only to categorize the type of unit he or she was in. This question does not require a hierarchy or ranking of the type of unit. Questions 3 and 4 are likely to result in ordinal-level data. Suppose a number is assigned to each descriptor in each of these two questions. For question 3, “critical” might be assigned a 4, “serious” a 3, “moderate” a 2, and “minor” a 1. Certainly, the higher the number, the more serious the patient’s condition is. Thus, these responses can be ranked by selection. However, the increases in importance from 1 to 2 to 3 to 4 are not necessarily equal. This same logic applies to the numerical values assigned in question 4. Question 5 displays seven numerical choices with equal distances between the numbers shown on the scale and no adjective descriptors assigned to the numbers. Many analysts would declare this to be interval-level measurement because of the equal distance between numbers and the absence of a true zero on this scale. Other analysts might argue that, because of the imprecision of the scale and the vagueness of selecting values between “poor” and “excellent,” the measurement is only ordinal in level.
1.3 Big Data LEARNING OBJECTIVE 1.3 Explain the differences between the four dimensions of big data.
big data A large amount of either organized or unorganized data from different sources that is difficult to process using traditional data management and processing applications, and is analyzed to make an informed decision or evaluation.
In the current world of business, the data available to decision-makers to help them produce better business outcomes are growing exponentially. Growing sources of data are available from the internet, social media, governments, transportation systems, health care, environmental organizations, and a plethora of business data sources, among others. Business data include, but are not limited to, consumer information, labour statistics, financials, product and resource tracking information, supply chain information, operations information, and human resource information. According to vCloud, 2.5 quintillion bytes of data are created every day. As a business example, from its millions of products and hundreds of millions of customers, Walmart alone collects multi-terabytes of new data every day, which are then added to its petabytes of historical data.6 The advent of such growth in the amount and types of data available to analysts, data scientists, and business decision-makers has resulted in a new term, “Big Data.” Big data has been defined as a collection of large and complex data sets from different sources that are difficult to process using traditional data management and processing applications.7 In addition, big data can be seen as a large amount of either organized or unorganized data that are analyzed to make an informed decision or evaluation.8 All data are not created in the same way, nor do they represent the same things. Thus, analysts recognize that there are at least four characteristics or dimensions associated with big data.9 These are: 1. Variety 2. Velocity 3. Veracity 4. Volume
variety Refers to the many different forms of data based on data sources. velocity Refers to the speed at which the data are available and at which they can be processed. veracity Has to do with the quality, correctness, and accuracy of data. volume Has to do with the ever‐increasing size of data and databases.
Variety refers to the many different forms of data based on data sources. A wide variety of data are available from such sources as mobile phones, videos, text, retail scanners, internet searches, government documents, multimedia, empirical research, and many others. Data can be structured (such as databases or Excel sheets) or unstructured (such as writing and photographs). Velocity refers to the speed at which the data are available and can be processed. The velocity characteristic of data is important in ensuring that data are current and updated in real time.10 Veracity of data has to do with data quality, correctness, and accuracy.11 Data lacking veracity may be imprecise, unrepresentative, inferior, and untrustworthy. In using such data to better understand business decisions, it might be said that the result is “Garbage In, Garbage Out.” Veracity indicates reliability, authenticity, legitimacy, and validity in the data. Volume has to do with the ever-increasing size of the data and databases. Big data produces vast amounts of data, as exemplified by the Walmart example mentioned above. A fifth characteristic or dimension of data that is sometimes considered is value. Analysis of data that does not generate value makes no contribution to an organization.12
6 “How Walmart Makes Data Work for Its Customers,” SAS.com, 2016, at www.sas.com/en_us/insights/articles/analytics/how‐walmart‐makes‐data‐work‐for‐its‐customers.html.
7 Shih‐Chia Huang, Suzanne McIntosh, Stanislav Sobolevsky, and Patrick C. K. Hung, “Big Data Analytics and Business Intelligence in Industry,” Information Systems Frontiers 19 (2017): 1229–32.
8 Ibid.
9 Hans W. Ittmann, “The Impact of Big Data and Business Analytics on Supply Chain Management,” Journal of Transport and Supply Chain Management 9, no. 1 (2015): a165.
10 Huang et al., “Big Data Analytics and Business Intelligence in Industry.”
11 Ittmann, “The Impact of Big Data and Business Analytics on Supply Chain Management.”
12 Roger H. L. Chiang, Varun Grover, Ting-Peng Lian, and Zhang Dongsong, “Special Issue: Strategic Value of Big Data and Business Analytics,” Journal of Management Information Systems 35, no. 2: 383–87.
Along with this unparalleled explosion of data, major advances have been made in computing performance, network functioning, data handling, device mobility, and other areas. When businesses capitalize on these developments, new opportunities become available for discovering greater and deeper insights that could produce significant improvements in the way businesses operate and function. So how do businesses go about capitalizing on this potential?
1.4 Business Analytics LEARNING OBJECTIVE 1.4 Compare and contrast the three categories of business analytics. To benefit from the challenges, opportunities, and potentialities presented to business decision-makers by big data, the new field of “business analytics” has emerged. There are different definitions of business analytics in the literature. However, one that most accurately describes the intent of the approach here is that business analytics is the application of processes and techniques that transform raw data into meaningful information to improve decision-making.13 Because big data sources are so large and complex, new questions have arisen that cannot always be effectively answered with traditional methods of analysis.14 In light of this, new methodologies and processing techniques have been developed, giving birth to a new era in decision-making referred to as the “business analytics period.”15 Other names sometimes used to refer to business analytics are business intelligence, data science, data analytics, analytics, and even big data, but the goal in all cases is to convert data into actionable insight for more timely and accurate decision-making.16 For example, when applied to the business environment, data analytics becomes synonymous with business analytics.17 Business analytics provides added value to data, resulting in deeper and broader business insights that can be used to improve decision-makers’ understanding in all aspects of business, as shown in Figure 1.6. There are many new opportunities for employment in business analytics, yet there is presently a shortage of available talent in the field. Today’s executives say that they are seeking data-driven leaders. They are looking for people who can work comfortably with both data and people: managers who can build useful models from data and also lead the team that will put them into practice.18
business analytics The application of processes and techniques that transform raw data into meaningful information to improve decision‐making.
FIGURE 1.6 Business Analytics Add Value to Data (labels: Big data, Business analytics, Better decision-making, Value added)
Categories of Business Analytics It might be said that the mission of business analytics is to apply processes and techniques to transform raw data into meaningful information. There is a plethora of such techniques available, drawing from such areas as statistics, operations research, mathematical modelling, data mining, and artificial intelligence, to name a few. The various techniques and approaches provide different information to decision-makers. Accordingly, the business analytics community has organized and classified business analytic tools into three main categories: Descriptive Analytics, Predictive Analytics, and Prescriptive Analytics.
13 Coleen R. Wilder and Ceyhun O. Ozgur, “Business Analytics Curriculum for Undergraduate Majors,” Informs 15, no. 2 (January 2015): 180–87, doi.org/10.1287/ited.2014.0134.
14 Dursun Delen and Hamed M. Zolbanin, “The Analytics Paradigm in Business Research,” Journal of Business Research 90 (2018): 186–95.
15 M. J. Mortenson, N. F. Doherty, and S. Robinson, “Operational Research from Taylorism to Terabytes: A Research Agenda for the Analytics Age,” European Journal of Operational Research 241, no. 3 (2015): 583–95.
16 R. Sharda, D. Delen, and E. Turban, Business Intelligence Analytics and Data Science: A Managerial Perspective (Upper Saddle River, NJ: Pearson, 2017).
17 C. Aasheim, S. Williams, P. Rutner, and A. Gardner, “Data Analytics vs. Data Science: A Study of Similarities and Differences in Undergraduate Programs Based on Course Descriptions,” Journal of Information Systems Education 26, no. 2 (2015): 103–15.
18 Dan LeClair, “Integrating Business Analytics in the Marketing Curriculum: Eight Recommendations,” Marketing Education Review 28, no. 1 (2018): 6–13.
Descriptive Analytics
descriptive analytics Often the first step in the analytics process, it takes traditional data and describes what has happened or is happening in a business. It can be used to condense big data into smaller, more useful, data. Sometimes referred to as reporting analytics.
The simplest and perhaps the most commonly used of the three categories of business analytics is descriptive analytics. Often the first step in the analytics process, descriptive analytics takes traditional data and describes what has happened or is happening in a business. It can be used to condense big data into smaller, more useful data.19 Also referred to as reporting analytics, descriptive analytics can be used to discover hidden relationships in the data and identify undiscovered patterns.20 Descriptive analytics drills down in the data to uncover useful and important details, and mines data for trends.21 At this step, visualization can be a key technique for presenting information. Much of what is taught in a traditional introductory statistics course could be classified as descriptive analytics, including descriptive statistics, frequency distributions, discrete distributions, continuous distributions, sampling distributions, and statistical inference.22 We could probably add correlation and various clustering techniques to the mix, along with data mining and data visualization techniques.
Predictive Analytics
predictive analytics The second step in the analytics process, which finds relationships in data that are not readily apparent with descriptive analytics.
The next step in data reduction is predictive analytics, which finds relationships in the data that are not readily apparent with descriptive analytics.23 With predictive analytics, patterns or relationships are extrapolated forward in time, and the past is used to make predictions about the future.24 Predictive analytics also provides answers that move beyond using the historical data as the principal basis for decisions.25 It builds and assesses algorithmic models that are intended to make empirical rather than theoretical predictions, and are designed to predict future observations.26 Predictive analytics can help managers develop likely scenarios.27 Topics in predictive analytics can include regression, time-series, forecasting, simulation, data mining (see Section 1.5), statistical modelling, machine-learning techniques, and others. They can also include classifying techniques, such as decision-tree models and neural networks.28
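One of the simplest ways to extrapolate a pattern forward in time is to fit a straight-line trend to past observations and extend it one period ahead. The sketch below does this with ordinary least squares; the quarterly sales values and the function name `fit_line` are assumptions made for illustration, and regression itself is treated properly in later chapters.

```python
# Hypothetical quarterly sales (in $ thousands); illustrative values only.
quarters = [1, 2, 3, 4, 5, 6]
sales = [102, 108, 121, 129, 141, 149]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(quarters, sales)

# Use the past pattern to predict the next (seventh) quarter.
forecast_q7 = intercept + slope * 7

print(f"trend: sales = {intercept:.2f} + {slope:.2f} * quarter")
print(f"forecast for quarter 7: {forecast_q7:.1f}")
```

The forecast is only as good as the assumption that the historical trend continues, which is why predictive models are assessed on how well they predict new observations.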
Prescriptive Analytics
prescriptive analytics Takes uncertainty into account, recommends ways to mitigate risks, and tries to see what the effect of future decisions will be in order to adjust the decisions before they are made.
The final stage of business analytics is prescriptive analytics, which is still in its early stages of development.29 Prescriptive analytics follows descriptive and predictive analytics in an attempt to find the best course of action under certain circumstances.30 The goal is to examine current trends and likely forecasts and use that information to make better decisions.31 Prescriptive analytics takes uncertainty into account, recommends ways to mitigate risks, and tries to see what the effect of future decisions will be in order to adjust the decisions before they are actually made.32 It does this by exploring a set of possible actions based on descriptive and predictive analyses of complex data, and then suggesting courses of action.33 Prescriptive analytics evaluates data and determines new ways to operate while balancing all constraints,34 at the same time continually and automatically processing new data to improve recommendations and provide better decision options.35 It not only foresees what will happen and when, but also suggests why it will happen and recommends how to act in order to take advantage of the predictions.36 Prescriptive analytics uses a set of mathematical techniques that computationally determine an optimal action or decision given a complex set of objectives, requirements, and constraints.37 Topics in prescriptive analytics come from the fields of management science or operations research and are generally aimed at optimizing the performance of a system, using tools such as mathematical programming, simulation, network analysis, and others.38
19 Jeff Bertolucci, “Big Data Analytics: Descriptive vs. Predictive vs. Prescriptive,” Information Week, December 31, 2018, www.informationweek.com/big-data/big-data.
20 C. K. Praseeda and B. L. Shivakumar, “A Review of Trends and Technologies in Business Analytics,” International Journal of Advanced Research in Computer Science 5, no. 8 (November–December 2014): 225–29.
21 Watson IoT, “Descriptive, Predictive, Prescriptive: Transforming Asset and Facilities Management with Analytics,” IBM paper, 2017, www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=TIW14162USEN.
22 Wilder and Ozgur, “Business Analytics Curriculum for Undergraduate Majors.”
23 B. Daniel, “Big Data and Analytics in Higher Education: Opportunities and Challenges,” British Journal of Educational Technology 46, no. 5 (2015): 904–20, onlinelibrary.wiley.com/doi/abs/10.1111/bjet.12230.
24 Praseeda and Shivakumar, “A Review of Trends and Technologies in Business Analytics.”
25 Ibid.
26 Delen and Zolbanin, “The Analytics Paradigm in Business Research.”
27 Watson IoT, “Descriptive, Predictive, Prescriptive.”
28 Sharda, Delen, and Turban, Business Intelligence, Analytics, and Data Science.
29 Sharda, Delen, and Turban, Business Intelligence, Analytics, and Data Science.
30 Delen and Zolbanin, “The Analytics Paradigm in Business Research.”
31 Sharda, Delen, and Turban, Business Intelligence, Analytics, and Data Science.
32 Praseeda and Shivakumar, “A Review of Trends and Technologies in Business Analytics.”
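To illustrate the prescriptive idea of determining a best action given objectives and constraints, the sketch below searches every feasible production plan for a small two-product shop and keeps the most profitable one. The products, profits, and resource limits are hypothetical, and exhaustive search is used only because the example is tiny; practical problems rely on mathematical programming techniques instead.

```python
from itertools import product

# Hypothetical problem data (illustrative only).
PROFIT = {"chairs": 30, "tables": 50}      # profit per unit, in dollars
LABOUR_HOURS = {"chairs": 2, "tables": 4}  # labour hours needed per unit
WOOD_UNITS = {"chairs": 3, "tables": 2}    # units of wood needed per unit
LABOUR_AVAILABLE = 40                      # labour hours on hand
WOOD_AVAILABLE = 30                        # units of wood on hand


def is_feasible(chairs, tables):
    """A plan is feasible if it respects both resource constraints."""
    labour = LABOUR_HOURS["chairs"] * chairs + LABOUR_HOURS["tables"] * tables
    wood = WOOD_UNITS["chairs"] * chairs + WOOD_UNITS["tables"] * tables
    return labour <= LABOUR_AVAILABLE and wood <= WOOD_AVAILABLE


def profit(chairs, tables):
    """Objective: total profit of a production plan."""
    return PROFIT["chairs"] * chairs + PROFIT["tables"] * tables


def best_plan(max_units=20):
    """Enumerate all plans up to max_units each and return the best feasible one."""
    candidates = (
        (chairs, tables)
        for chairs, tables in product(range(max_units + 1), repeat=2)
        if is_feasible(chairs, tables)
    )
    return max(candidates, key=lambda plan: profit(*plan))


plan = best_plan()
print(f"recommended plan (chairs, tables): {plan}, profit: ${profit(*plan)}")
```

The output is a recommended action rather than a description or a forecast, which is what distinguishes prescriptive analytics from the other two categories.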
1.5 Data Mining and Data Visualization
LEARNING OBJECTIVE 1.5 Describe the data mining and data visualization processes.
The dawning of the big data era has given rise to new and promising prospects for improving business decisions and outcomes. Two main components in the process of transforming the mountains of data now available into useful business information are data mining and data visualization.
Data Mining
data mining The process of collecting, exploring, and analyzing large volumes of data in an effort to uncover hidden patterns and/or relationships that can be used to enhance business decision-making.
In the field of business, data mining is the process of collecting, exploring, and analyzing large volumes of data in an effort to uncover hidden patterns and/or relationships that can be used to enhance business decision-making. In short, data mining is a process used by companies to turn raw data into meaningful information that can potentially lead to some business advantage. Data mining allows business people to discover and interpret useful information that will help them make more knowledgeable decisions and better serve their customers and clients.
Figure 1.7 displays the process of data mining, which involves finding the data, converting the data into useful forms, loading the data into a holding or storage location, managing the data, and making the data accessible to business analytics users. Data scientists often refer to the first three steps of this process as ETL (extract, transform, and load).
FIGURE 1.7 Process of Data Mining
[Flow diagram: Extract data, then Transform data, then Load data, then Manage data, then Make data accessible to business analyst]
Extracting involves locating the data by asking the question “Where can it be found?” Countless pieces of data are unearthed the world over in a great variety of forms and from multiple disparate sources. Because of this, extracting data can be the most time-consuming step.39 After a set of data has been located and extracted, it must be transformed or converted into a usable form. Included in this transformation may be a sorting process that determines which data are useful and which are not. In addition, data are typically “cleaned” by removing corrupt or incorrect records and identifying incomplete, incorrect, or irrelevant parts of the data.40 Often the data are sorted into columns and rows to improve usability and searchability.41 After the set of data is extracted and transformed, it is loaded into an end target, which is often a database. Data are often managed through a database management system, which is a software system that enables users to define, create, maintain, and control access to the database.42 Lastly, the ultimate goal of the process of data mining is to make the data accessible and usable to the business analyst.
33 Watson IoT, “Descriptive, Predictive, Prescriptive.”
34 Daniel, “Big Data and Analytics in Higher Education.”
35 A. Basu, “Five Pillars of Prescriptive Analytics Success,” Analytics (March/April 2013): 8–12, analytics-magazine.org/executive-edge-five-pillars-of-prescriptive-analytics-success/.
36 Praseeda and Shivakumar, “A Review of Trends and Technologies in Business Analytics.”
37 I. Lustig, B. Dietrich, C. Johnson, and C. Dziekan, “The Analytics Journey,” Analytics Magazine 3, no. 6 (2010): 11–13.
38 Sharda, Delen, and Turban, Business Intelligence, Analytics, and Data Science.
39 Shirley Zhao, “What Is ETL? (Extract, Transform, Load),” paper from Experian Data Quality, October 20, 2017, www.edq.com/blog/what-is-etl-extract-transform-load/.
40 S. Wu, “A Review on Coarse Warranty Data and Analysis,” Reliability Engineering and System Safety 114 (2013): 1–11, doi.org/10.1016/j.ress.2012.12.021.
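The extract, transform, and load steps described in this section can be sketched in a few lines of Python using only the standard library. Everything specific here is an assumption made for illustration: the raw records, the column names, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for whatever end target an organization actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical raw export. The second record is incomplete and the
# fourth has a non-numeric amount (illustrative data only).
RAW_CSV = """customer,region,amount
Avery,East,120.50
Blake,,75.00
Casey,West,210.00
Devon,East,n/a
"""


def extract(text):
    """Extract: read the raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop incomplete or corrupt records and convert types."""
    cleaned = []
    for row in records:
        if not row["customer"] or not row["region"]:
            continue  # incomplete record
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # corrupt amount
        cleaned.append((row["customer"], row["region"], amount))
    return cleaned


def load(rows, connection):
    """Load: store the cleaned rows in a database table."""
    connection.execute(
        "CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)

# Make the data accessible: a simple query an analyst might run.
totals = connection.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Keeping the three steps as separate functions mirrors the process in Figure 1.7 and makes it easy to change one step, such as the cleaning rules, without touching the others.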
Data Visualization
data visualization The study of the visual representation of data employed to convey data or information by imparting it as visual objects displayed in graphics.
As business organizations amass large reservoirs of data, one swift and easy way to obtain an overview of the data is through data visualization, which has been described as perhaps the most useful component of business analytics, and the element that makes it truly unique.43 So what is data visualization? Generally, data visualization is any attempt made by data analysts to help individuals better understand data by putting it in a visual context. Specifically, data visualization is the study of the visual representation of data and is employed to convey data or information by imparting it as visual objects displayed in graphics. Interpretation of data is vital to unlocking the potential value it holds and to making the most informed decisions.44 Using visual techniques to convey information hidden in data allows a broader audience with a wider range of backgrounds to view and understand its meaning. Data visualization tools help make data-driven insights accessible to people at all levels throughout an organization and can reveal surprising patterns and connections, resulting in improved decision-making. To communicate information clearly and efficiently, data visualization uses statistical graphics, plots, information graphics, and other tools. Numerical data may be encoded, using dots, lines, or bars, to visually communicate a quantitative message, thereby making complex data more accessible, understandable, and usable.45
Visualization Example
As an example of data visualization, consider the top five organizations in the manufacturing industry to receive Canadian government funding and their respective amounts funded (in dollars) in a recent year, displayed in Table 1.1.46 One of the leading companies to develop data visualization software is Tableau©. Figure 1.8 is a bubble chart of the Table 1.1 data, developed using Tableau©. One of the advantages of using visualization to display data is the variety of types of graphs or charts that can be produced. Figure 1.9 contains a Tableau©-produced bar chart of the same information to give the user a different perspective.
TABLE 1.1 Top Five Manufacturing Firms to Receive Canadian Government Funding
Company Name                  Dollars Funded
FCA Canada Inc. (Chrysler)    $85,800,000
Bombardier Inc.               $54,150,000
Produits Kruger               $39,500,000
Sonaca Montréal Inc.          $23,250,000
Hanwha L&C Canada Inc.        $15,000,000
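To show how numerical data can be encoded as bars, the following sketch draws a plain-text bar chart of the Table 1.1 amounts. It is not how the Tableau© figures were produced; the function name `text_bar_chart` and the scale of one mark per $5 million are choices made here for illustration.

```python
# Data from Table 1.1 (company name, dollars funded).
funding = [
    ("FCA Canada Inc. (Chrysler)", 85_800_000),
    ("Bombardier Inc.", 54_150_000),
    ("Produits Kruger", 39_500_000),
    ("Sonaca Montréal Inc.", 23_250_000),
    ("Hanwha L&C Canada Inc.", 15_000_000),
]


def text_bar_chart(rows, dollars_per_mark=5_000_000):
    """Return one line per row, with bar length proportional to the value."""
    width = max(len(name) for name, _ in rows)
    lines = []
    for name, dollars in rows:
        bar = "#" * round(dollars / dollars_per_mark)
        lines.append(f"{name:<{width}} | {bar} ${dollars / 1e6:.2f}M")
    return lines


chart = text_bar_chart(funding)
print("\n".join(chart))
```

Even in this crude form, the relative bar lengths make the gap between the largest and smallest recipients visible at a glance, which is harder to see in a column of figures.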
41 Zhao, “What Is ETL?”
42 Thomas M. Connolly and Carolyn E. Begg, Database Systems: A Practical Approach to Design, Implementation, and Management, 6th ed. (Harlow, UK: Pearson, 2014), 64.
43 James R. Evans, Business Analytics: Methods, Models, and Decisions, 2nd ed. (Boston: Pearson, 2016), 7.
44 R. Roberts, R. Laramee, P. Brookes, G.A. Smith, T. D’Cruze, and M.J. Roach, “A Tale of Two Visions—Exploring the Dichotomy of Interest between Academia and Industry in Visualisation,” in Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 3 (Setúbal, Portugal: SciTePress, 2018), 319–26.
45 Stephen Few, “Eenie, Meenie, Minie, Moe: Selecting the Right Graph for Your Message,” paper from Perceptual Edge, 2004, www.perceptualedge.com/articles/ie/the_right_graph.pdf.
46 “Top 50 Companies That Received Canadian Government Funding,” The Globe and Mail, July 19, 2017, www.theglobeandmail.com/report-on-business/top-50-report-on-business-the-funding-portal/article19192109/.
FIGURE 1.8 Bubble Chart of the Top Five Manufacturing Firms to Receive Canadian Government Funding
[Bubble chart with one labelled bubble per company: FCA Canada Inc. (Chrysler), Bombardier Inc., Produits Kruger, Sonaca Montréal Inc., Hanwha L&C Canada Inc.]
FIGURE 1.9 Bar Chart of the Top Five Manufacturing Firms to Receive Canadian Government Funding
[Horizontal bar chart. Vertical axis: Company name. Horizontal axis: Dollars funded, 0M to 90M.]
Decision Dilemma Solved
Statistics Describe the State of Business in India’s Countryside
Several statistics were reported in the Decision Dilemma about rural India. The authors of the sources from which the Decision Dilemma was drawn never stated whether the reported statistics were based on actual data drawn from a census of rural India households or on estimates taken from a sample of rural households. If the data came from a census, then the totals, averages, and percentages presented in the Decision Dilemma are parameters. If, on the other hand, the data were gathered from samples, then they are statistics.
Although governments do conduct censuses and at least some of the reported numbers could be parameters, more often than not such data are gathered from samples of people or items. For example, in rural India, the government, academicians, or business analysts could have taken random samples of households, gathering consumer statistics that are then used to estimate population parameters, such as percentage of households with a mobile connection, and so forth.
In conducting research on a topic like consumer consumption in rural India, a wide variety of statistics can be gathered that represent several levels of data. For example, ratio-level measurements of items such as income, number of children, age of household heads, number of heads of livestock, and grams of toothpaste consumed per year might be obtained. On the other hand, if analysts use a Likert-type scale (1-to-5 measurements) to gather responses about rural Indian consumers’ interests, likes, and preferences, an ordinal-level measurement would be obtained, as with the ranking of products or brands in market research studies. Other variables, such as geographic location, occupation, or religion, are usually measured with nominal data.
The decision to enter the rural Indian market is not just a marketing decision. It involves production and operations capacity, scheduling issues, transportation challenges, financial commitments, managerial growth or reassignment issues, accounting issues (accounting for rural India may differ from techniques used in urban markets), information systems, and other related areas. With so much on the line, company decision-makers need as much relevant information available as possible.
In this Decision Dilemma, it is obvious to the decision-maker that rural India is still quite poor and illiterate. Its capacity as a market is great. The statistics on the increasing sales of a few personal-care products look promising. What are the future forecasts for the earning power of people in rural India? Will major cultural issues block the adoption of the types of products that companies want to sell there? The answers to these and many other interesting and useful questions can be obtained by the appropriate use of statistics. The approximately 900 million people living in rural India certainly make it a market segment worth studying further.
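The distinction between a parameter and a statistic can be made concrete with a small simulation. In the sketch below, the population of households and its 60% mobile-connection rate are invented for illustration and are not figures from the Decision Dilemma; the point is only that the parameter uses every household, while the statistic uses a random sample to estimate it.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 households, 1 = has a mobile connection.
# The 60% rate is an assumed value for illustration, not a reported figure.
POPULATION_SIZE = 100_000
population = [1] * 60_000 + [0] * 40_000

# Parameter: computed from every member of the population (a census).
parameter = sum(population) / POPULATION_SIZE

# Statistic: computed from a random sample and used to estimate the parameter.
sample = random.sample(population, 1_000)
statistic = sum(sample) / len(sample)

print(f"population proportion (parameter): {parameter:.3f}")
print(f"sample proportion (statistic):     {statistic:.3f}")
```

Drawing a different sample would give a slightly different statistic, whereas the parameter stays fixed; that sampling variation is what inferential statistics is designed to account for.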
Key Considerations
With the abundance and proliferation of statistical data, potential misuse of statistics in business dealings is a concern. It is, in effect, unethical business behaviour to use statistics out of context. Unethical business people might use only selective data from studies to underscore their point, omitting statistics from the same studies that argue against their case. The results of statistical studies can be misstated or overstated to gain favour.
This chapter noted that, if data are nominal or ordinal, then only nonparametric statistics are appropriate for analysis. The use of parametric statistics to analyze nominal and/or ordinal data is wrong and could be considered under some circumstances to be unethical. In this text, each chapter contains a section on ethics that discusses how businesses can misuse the techniques presented in the chapter in an unethical manner. As both users and producers, business students need to be aware of the potential ethical pitfalls that can occur with statistics.
Why Statistics Is Relevant
In the contemporary business world, good decisions are driven by data. In all areas of business, amazing amounts of diverse data are available for interpretation and quantitative insight. Business managers and professionals are increasingly required to justify decisions on the basis of data analysis. Thus, the ability to extract useful information from data is one of the most important, and marketable, skills business managers and professionals can acquire. This text is an introduction to the theory and, more importantly, the methods used to intelligently collect, analyze, and interpret data relevant to business decision-making.
Summary of Learning Objectives
LEARNING OBJECTIVE 1.1 Define important statistical terms, including population, sample, and parameter, as they relate to descriptive and inferential statistics.
Statistics is an important decision-making tool in business and is used in virtually every area of business. In this text, the word statistics is defined as the science of gathering, analyzing, interpreting, and presenting data. The study of statistics can be subdivided into two main areas: descriptive statistics and inferential statistics. Descriptive statistics result from gathering data from a body, group, or population and reaching conclusions only about that group. Inferential statistics are generated from the process of gathering sample data from a group, body, or population and reaching conclusions about the larger group from which the sample was drawn.
LEARNING OBJECTIVE 1.2 Explain the difference between variables, measurement, and data, and compare the four different levels of data: nominal, ordinal, interval, and ratio.
Most business statistics studies contain variables, measurements, and data. A variable is a characteristic of any entity being studied that is capable of taking on different values. Examples of variables might include monthly household food spending, time between arrivals at a restaurant, and patient satisfaction rating. A measurement is produced when a standard process is used to assign numbers to particular attributes or characteristics of a variable. Measurements of monthly household food spending might be taken in dollars, time between arrivals might be measured in minutes, and patient satisfaction might be measured using a 5-point scale. Data are recorded measurements. It is data that are analyzed by business statisticians in order to learn more about the variables being studied.
The appropriate type of statistical analysis depends on the level of data measurement, which can be (1) nominal, (2) ordinal, (3) interval, or (4) ratio. Nominal is the lowest level and is used only to classify data, such as geographic location, language, or social insurance number. The next level is ordinal, which provides rank-ordering measurements in which the intervals between consecutive numbers do not necessarily represent equal distances. Interval is the next highest level of data measurement, in which the distances represented by consecutive numbers are equal. The highest level of data measurement is ratio. This has all the qualities of interval measurement, but ratio data contain an absolute zero, and ratios between numbers are meaningful. Interval and ratio data are sometimes called metric or quantitative data. Nominal and ordinal data are sometimes called nonmetric or qualitative data.
Two major types of inferential statistics are (1) parametric statistics and (2) nonparametric statistics. Use of parametric statistics requires interval or ratio data and certain assumptions about the distribution of the data. The techniques presented in this text are largely parametric. If data are only nominal or ordinal in level, nonparametric statistics must be used.
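One way to see why the level of data matters is to restrict each data set to the summaries that make sense at its level. The sketch below follows a common introductory convention (mode for nominal data, median added for ordinal data, mean added for interval and ratio data); the lookup table, the function `summarize`, and all of the sample values are assumptions made here for illustration rather than rules stated in this text.

```python
from statistics import mean, median, mode

# Summaries treated as meaningful at each level of data measurement.
# Each level permits everything allowed at the levels listed before it.
MEANINGFUL_SUMMARIES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}

SUMMARY_FUNCTIONS = {"mode": mode, "median": median, "mean": mean}


def summarize(values, level):
    """Apply only the summaries that are meaningful for the given level."""
    return {
        name: SUMMARY_FUNCTIONS[name](values)
        for name in MEANINGFUL_SUMMARIES[level]
    }


# Hypothetical examples (illustrative values only).
area_codes = [416, 604, 416, 514, 416]                 # nominal: labels only
satisfaction = [4, 5, 3, 4, 2, 4, 5]                   # ordinal: 5-point scale
monthly_food_spending = [620.0, 540.0, 710.0, 620.0]   # ratio: dollars

nominal_summary = summarize(area_codes, "nominal")
ordinal_summary = summarize(satisfaction, "ordinal")
ratio_summary = summarize(monthly_food_spending, "ratio")

print(nominal_summary)
print(ordinal_summary)
print(ratio_summary)
```

Averaging the area codes would produce a number, but not a meaningful one, which is the practical reason for checking the level of data before choosing a technique.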
LEARNING OBJECTIVE 1.3 Explain the differences between the four dimensions of big data.
Exponential growth in the amount and types of available data has given rise to a new term, big data, which is a collection of large and complex data sets from different sources that are difficult to process using traditional data management and processing applications. There are four dimensions of big data: (1) variety, (2) velocity, (3) veracity, and (4) volume. Variety refers to the many different forms of data, velocity refers to the speed at which the data are available and can be processed, veracity has to do with data quality and accuracy, and volume has to do with the ever-increasing size of data.
LEARNING OBJECTIVE 1.4 Compare and contrast the three categories of business analytics.
Business analytics is a relatively new field of business dealing with the challenges, opportunities, and potentialities available to business analysts through big data. Business analytics is the application of processes and techniques that transform raw data into meaningful information to improve decision-making. There are three categories of business analytics: (1) descriptive analytics, (2) predictive analytics, and (3) prescriptive analytics.
LEARNING OBJECTIVE 1.5 Describe the data mining and data visualization processes.
Two main components in the process of transforming the mountains of data now available are data mining and data visualization. Data mining is the process of collecting, exploring, and analyzing large volumes of data in an effort to uncover hidden patterns and/or relationships that can be used to enhance business decision-making. One effective way of communicating data to a broader audience of people is data visualization. Data visualization is any attempt made by data analysts to help individuals better understand data by putting it in a visual context. Specifically, data visualization is the study of the visual representation of data and is employed to convey data or information by imparting it as visual objects displayed in graphics.
Key Terms
big data
business analytics
census
data
data mining
data visualization
descriptive analytics
descriptive statistics
inferential statistics
interval-level data
measurement
metric data
nominal-level data
nonmetric data
nonparametric statistics
ordinal-level data
parameter
parametric statistics
population
predictive analytics
prescriptive analytics
ratio-level data
sample
statistic
statistics
variable
variety
velocity
veracity
volume
Supplementary Problems
1.1. Give a specific example of data that might be gathered from each of the following business disciplines: accounting, finance, human resources, marketing, operations and supply chain management, information systems, production, and management. An example in the marketing area might be “number of sales per month by each salesperson.”
1.2. Provide examples of data that can be gathered for decision-making purposes from each of the following industries: manufacturing, insurance, travel, retailing, communications, computing, agriculture, banking, and health care. An example in the travel industry might be the cost of business travel per day in various European cities.
1.3. Give an example of descriptive statistics in the recorded music industry. Give an example of how inferential statistics could be used in the recorded music industry. Compare the two examples. What makes them different?
1.4. Suppose you are an operations manager for a plant that manufactures batteries. Give an example of how you could use descriptive statistics to make better managerial decisions. Give an example of how you could use inferential statistics to make better managerial decisions.
1.5. There are many types of information that might help the manager of a large department store run the business more efficiently and better understand how to improve sales. Think about this in such areas as sales, customers, human resources, inventory, suppliers, and so on. List five variables that might produce information that could aid the manager in his or her job. Write a sentence or two describing each variable, and briefly discuss some numerical observations that might be generated for each variable.
1.6. Suppose you are the owner of a medium-sized restaurant in a small city. What are some variables associated with different aspects of the business that might be helpful to you in making business decisions about the restaurant? Name four of these variables. For each variable, briefly describe a numerical observation that might be the result of measuring the variable.
1.7. Classify each of the following as nominal, ordinal, interval, or ratio data.
a. The time required to produce each tire on an assembly line
b. The number of litres of milk a family drinks in a month
c. The ranking of four machine operators in your plant according to experience
d. The telephone area codes of clients in Canada
e. The age of each of your employees
f. The dollar sales at the local pizza house each month
g. An employee’s identification number
h. The response time of an emergency unit
1.8. Classify each of the following as nominal, ordinal, interval, or ratio data.
a. The ranking of a company by Report on Business magazine’s Top 1000
b. The number of tickets sold at a movie theatre on any given night
c. The identification number on a questionnaire
d. Per capita income
e. The trade balance in dollars
f. Profit/loss in dollars
g. A company’s tax identification number
h. The Standard & Poor’s bond ratings of cities based on the following scales:
Rating                                  Grade
Highest quality                         AAA
High quality                            AA
Upper medium quality                    A
Medium quality                          BBB
Somewhat speculative                    BB
Low quality, speculative                B
Low grade, default possible             CCC
Low grade, partial recovery possible    CC
Default, recovery unlikely              C
1.9. The Mapletech Manufacturing Company makes electric wiring, which it sells to contractors in the construction industry. Approximately 900 electrical contractors purchase wire from Mapletech annually. Mapletech’s director of marketing wants to determine electrical contractors’ satisfaction with Mapletech’s wire. She develops a questionnaire that yields a satisfaction score of between 10 and 50 for participant responses. A random sample of 35 of the 900 contractors is asked to complete a satisfaction survey. The satisfaction scores for the 35 participants are averaged to produce a mean satisfaction score.
a. What is the population for this study?
b. What is the sample for this study?
c. What is the statistic for this study?
d. What would be a parameter for this study?
Exploring the Databases with Business Analytics
See the databases in Wiley’s online resources
Six major databases constructed for this text can be used to apply the business analytics and statistical techniques presented in this course. These databases are located in Wiley’s online course, and each one is available in a few electronic formats for your convenience. These six databases represent a wide variety of business areas: the stock market, international labour, finance, energy, and agri-business. The data are gathered from such reliable sources as Statistics Canada, the Toronto Stock Exchange, Yahoo Finance, the Fraser Institute, and the Global Environment Outlook (GEO) Data Portal. Four of the six databases contain time-series data that can be especially useful in forecasting and regression analysis. Here is a description of each database, along with information that may help you to interpret outcomes.
International Labour Database
This time-series database contains the civilian unemployment rates in percent from seven countries (G7) presented yearly over a 30-year period. The data are published by the Bureau of Labor Statistics of the U.S. Department of Labor. The countries are Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States.
Canadian Stock Market Database
The stock market database contains seven variables on the Toronto Stock Exchange. Weekly observations for a period of five years yield a total of 257 observations per variable. The variables are Composite Index, Energy Index, Financial Index, Health Index, Utility Index, I.T. Index, and Gold Index. These data were obtained directly from TMX Inc.
Canadian RRSP Contribution Database
The registered retirement savings plan database contains five variables: Number of Tax Filers, Total RRSP Contributors, Average Age of RRSP Contributors, Total RRSP Contributions ($ × 1,000), and Median RRSP Contributions ($). There are 156 entries for each variable in this database, representing 12 years of data for each of 10 provinces and 3 territories in Canada. The data in this database were obtained from Statistics Canada.
Financial Database
The financial database contains observations on 12 variables for 100 companies. The variables are Industry Group, Type of Industry, Total Revenues, Price, Average Yield, Dividend Growth, Average Price/Earnings (P/E) Ratio, Dividends per Share, Total Debt/Total Equity, Price/Cash Flow, Price/Book, and One-Year Total Return. The data were gathered from the Toronto Stock Exchange. The companies represent seven different types of industries. The variable “Type” displays a company’s industry type as follows:
1 = Real Estate
2 = Financial Institutions
3 = Chemicals
4 = Mining, Electric, Oil & Gas, Pipelines
5 = Telecommunications, Retail, Commercial Services, Auto Parts & Equipment, Transportation
6 = Insurance
7 = Food, Pharmaceuticals, Electronics, Media, Aerospace/Defence, Hand/Machine Tools, Agriculture, Iron/Steel, Holding Companies, Engineering and Construction, Machinery—Diversified, Forest Products and Paper
Energy Resource Database
The energy resource database consists of data for North America and Europe on nine variables related to energy supply and emission of carbon dioxide over a period of 35 years. The database is adapted from the Global Environment Outlook (GEO) Data Portal. The nine variables are the seven total primary energy supplies of Solar, Wind, Tide, and Wave; Nuclear; Natural Gas; Coal and Coal Products; Crude Oil; Hydro; and Petroleum Products, plus the two sources of emissions of CO2: Manufacturing Industries and Construction, and Transportation.
Agri-Business Canada Database The agri-business time-series database contains the monthly weight (in thousands of pounds) of seven different grains over an eight-year period. Each of the seven variables represents 94 months of data. The seven grains are Wheat; Wheat, Excluding Durum; Durum Wheat; Oats; Barley; Flaxseed; and Canola. The data are published by Statistics Canada under Agri-business.
Use the databases to answer the following question.
1. In the Financial Database, what is the level of data for each of the following variables?
a. Type of Industry
b. Average Price
c. Price/Cash Flow
Case Canadian Farmers Dealing with Stress
Discussion
Farming is an extremely demanding job whose success depends largely on the dedication of the farmers. In order to tackle the rigorous tasks of the trade, farmers must be in good physical and mental condition. The University of Guelph conducted a study in 2017–18 which showed that Canadian farmers experienced higher levels of anxiety and depression and displayed lower levels of help-seeking behaviour than the general Canadian populace. The study indicated that Canadian farmers were at a heightened risk of suicide compared to the rest of the population. The Canadian Agricultural Safety Association (CASA) also recently sponsored research on the stress level of Canadian farmers. Western Opinion Research Inc. conducted the research study, which was completed by 1,100 farmers across Canada. The survey asked farmers to rate their stress level, ranging from “stressed” to “somewhat stressed” to “very stressed.” The results varied and revealed the following: two-thirds of farmers indicated feeling “stressed,” 45% indicated feeling “somewhat stressed,” and one in five indicated feeling “very stressed.” The results of the survey also revealed that the major causes of stress among farmers were (a) financial concerns related to prices of commodities, (b) diseases affecting livestock, and (c) finances regarding general farm expenses. The results also revealed that a large percentage of farmers (35%) were interested in having access to more stress-related resources in order to help alleviate their stress level. This study allowed CASA to realize that stress within the farming industry is a major issue, making it imperative to take action and offer stress counselling resources to farmers. These resources include (a) confidential meetings with a health care professional, (b) over-the-phone consultations with a health care professional, and (c) workshops such as retreats that focus on relaxation techniques and role playing with other farmers.
Over the last few years, CASA has made great progress to help reduce the stress level of Canadian farmers. For example, successful counselling services that deal with farmers and their problems were established using phone, email, and online chat help lines. These include the Manitoba Farm, Rural & Northern Support Services and the Saskatchewan Farm Stress Line. These services are well recognized and are having great success in helping farmers, mainly because they are staffed by paid professional counsellors who all have farming backgrounds, making the farmers feel more comfortable because they are connecting with someone who understands their work-related issues.
Think of the market research that was conducted by CASA.
1. What are some of the populations that CASA might have been interested in measuring for these studies? Did CASA attempt to contact entire populations? What samples were taken? In light of these two questions, how was the inferential process used by CASA in its market research? Can you think of any descriptive statistics that might have been used by CASA in its decision-making process?
2. In the various market research efforts made by CASA to determine stress experienced by farmers, some of the possible measurements appear in the following list. Categorize these by level of data. Think of some other measurements that CASA analysts might have taken to help them in this research effort and categorize them by level of data.
a. Ranking of the level of stress on a stress test
b. Number of farmers who ask for professional help
c. Number of farmers who are aware of professional help resources
d. Number of farmers who try to manage stress on their own
e. Number of farmers who are interested in having access to more stress-related resources
f. Number of farmers who are close to being out of business
g. Number of farmers who would prefer dealing with stress on their own
h. Number of farmers who would prefer dealing with stress with a professional over the telephone
i. Number of farmers who would prefer dealing with stress with a professional in person
j. Age of survey respondent
k. Gender of survey respondent
l. Geographical region of survey respondent
m. Amount of time farmers spend dealing with their stress
n. Rating of the most stress-related factors on a scale from 1 to 10, where 1 is the least stress-related factor and 10 is the most stress-related factor
o. Rating of the reasons why farmers do not seek more help for stress on a scale from 1 to 10, where 1 is the least important reason and 10 is the most important reason
Big Data Case Virtually every chapter in this text will end with a big data case. Unlike the chapter cases, which feature a wide variety of companies and business scenarios, the big data case will be based on data from a single industry, U.S. hospitals. Drawing from the American Hospital Association database of over 2,000 hospitals, we will use business analytics to mine data on these hospitals in different ways for each chapter based on analytics presented in that chapter.
The hospital database features data on 12 variables, thereby offering over 24,000 observations to analyze. Hospital data for each of the 50 states are contained in the database with qualitative data representing state, region of the country, type of ownership, and type of hospital. Quantitative measures include Number of Beds, Number of Admissions, Census, Number of Outpatients, Number of Births, Total Expenditures, Payroll Expenditures, and Personnel.
Using Software for Data Analysis
Statistical Analysis Using Excel
The advent of the modern computer opened up many new opportunities for statistical analysis. The computer allows for storage, retrieval, and transfer of large data sets. Furthermore, computer software has been developed to analyze data by means of sophisticated statistical techniques. Some widely used statistical techniques, such as multiple regression, are so tedious and cumbersome to compute manually that they were of little practical use to analysts before computers were developed. In this textbook, the end-of-chapter “Using the Computer” feature will show you how you can use software to apply the statistical techniques you learn in each chapter.
• Business statisticians use many popular statistical software packages, including MINITAB, SAS, and SPSS. Many computer spreadsheet software packages can also analyze data statistically. In this text, when it is appropriate to use a computer, we will use Microsoft Excel for data analysis.
• Excel is by far the most commonly used spreadsheet for PCs, making it the obvious choice for basic statistical analysis. Excel can perform a variety of calculations and includes a large collection of statistical functions. The Data Analysis ToolPak provides a further suite of statistical macro-functions.
• We note, however, that although Excel is a fine spreadsheet, it is not a professional statistical data analysis package: there are some important limitations. Key limitations to keep in mind include the following: missing values in Excel are difficult to handle when performing data analysis; data organization differs according to analysis, sometimes forcing you to reorganize your data; some output and charts produced by Excel are of poor quality from a statistical point of view and are sometimes inadequately labelled; and there is no record of how an analysis was done if the Data Analysis ToolPak is used.
• Despite these limitations, Excel is the most commonly used package in the business environment and is the package you are most likely to use in your professional life.
• It is important to remember that a statistical software package is not a replacement for a thorough understanding of correct statistical methods. The business analyst is responsible for determining the most appropriate statistical methods for a given business problem. Simply relying on convenient software tools that may be at hand without thinking through the most appropriate approach can lead to errors, oversights, and poor decisions. One of the goals of this text is to show you when and how to use statistical methods to provide information that can be used in business decisions.
Answers
Concept Check 1.1
1. Descriptive statistics can be used to summarize the data to describe a data sample either numerically or graphically.
2. Statistical inference is inference about a population from a random data sample drawn from it.
Concept Check 1.2
1. c. 2. d. 3. b. 4. a.
Odd-Numbered Supplementary Problems
1.7. a. ratio b. ratio c. ordinal d. nominal e. ordinal f. ratio g. nominal h. ratio
1.9. a. 900 electric contractors b. 35 contractors c. average score for 35 participants d. average score for all 900 electric contractors
CHAPTER 2
Visualizing Data with Charts and Graphs
LEARNING OBJECTIVES
The overall objective of Chapter 2 is for you to master several techniques for summarizing and visualizing data, thereby enabling you to:
2.1 Explain the difference between grouped and ungrouped data and construct a frequency distribution from a set of data. Explain what the distribution represents.
2.2 Describe and construct different types of quantitative data graphs, including histograms, frequency polygons, ogives, and stem-and-leaf plots. Explain when these graphs should be used.
2.3 Describe and construct different types of qualitative data graphs, including pie charts, bar charts, and Pareto charts. Explain when these graphs should be used.
2.4 Display and analyze two variables simultaneously using cross-tabulation and scatter plots.
2.5 Describe and construct a time-series graph. Visually identify any trends in the data.
Decision Dilemma As most people suspect, the United States is the number one consumer of oil in the world, followed by China, India, Saudi Arabia, Japan, and the Russian Federation. (Canada ranks ninth, with a consumption that is below that of Brazil and above that of Germany.) China, however, is the world’s largest consumer of coal, with India coming in at number two, followed by the United States, Japan, and South Africa. (Canada ranks 22nd, below Brazil and above the Czech Republic.) The annual oil and coal consumption figures for six of the top total energy-consuming nations in the world, according to figures released by the BP Statistical Review of World Energy for a recent year, are as follows.
Energy Consumption Around the World
| Country | Oil Consumption (thousands of barrels) | Coal Consumption (exajoules) |
|---|---|---|
| U.S. | 17,178 | 9.20 |
| China | 14,225 | 82.27 |
| India | 4,669 | 17.54 |
| Japan | 3,268 | 4.57 |
| Russian Federation | 3,238 | 3.27 |
| Brazil | 2,323 | 0.58 |
Managerial, Statistical, and Analytical Questions Suppose you are an energy industry analyst and you are asked to prepare a brief report showing the leading energy-consumption countries in both oil and coal. 1. What is the best way to display the data on energy consumption in a report? Are the raw data enough? Can you effectively display the data graphically? 2. Is there a way to graphically display oil and coal figures together so that readers can visually compare countries on their consumptions of the two different energy sources?
Source: BP Statistical Review of World Energy, 2021. https://www.bp.com/content/dam/bp/business-sites/en/global/corporate/pdfs/energy-economics/statistical-review/bp-stats-review-2021-full-report.pdf.
Introduction In this era of seemingly boundless big data, the application of business analytics has great potential to unearth business knowledge and intelligence that can substantially improve business decision-making. A key objective of business analytics is to convert data into deeper and broader actionable insights and understandings for all aspects of business. One of the first steps is to visualize the data through graphs and charts, thereby providing business analysts with an overview of the data and a glimpse into any underlying relationships. In this chapter, we will study how to visually represent data in order to convey information that can unlock potentialities for making better business decisions. Using visuals to convey information hidden in the data allows a broader audience with a wide range of backgrounds to understand its meaning. Data visualization tools can reveal surprising patterns and connections, making data-driven insights accessible to people at all levels of an organization. In business decision-making, graphical depictions of data are often much more effective communication tools than tables of numbers. In addition, key characteristics of graphs often suggest appropriate choices among potential numerical methods (discussed in later chapters) for analyzing data. A first step in exploring and analyzing data is to reduce important and sometimes expansive data to a graphic picture that is clear, concise, and consistent with the message of the original data. Converting data to graphics can be creative and artful. This chapter focuses on graphical tools for summarizing and presenting data. Charts and graphs discussed in detail in Chapter 2 include histograms, frequency polygons, ogives, dot plots, stem-and-leaf plots, bar charts, pie charts, and Pareto charts for one-variable data, and cross-tabulation tables and scatter plots for two-variable numerical data. 
In addition, there is a section on time-series graphs for displaying data gathered over time. Box-and-whisker plots are discussed in Chapter 3. A non-exhaustive list of other graphical tools includes the line plot, area plot, bubble chart, treemap, map chart, Gantt chart, bullet chart, doughnut chart, circle view, highlight table, matrix plot, marginal plot, probability distribution plot, individual value plot, and contour plot. Recall the four levels of data measurement discussed in Chapter 1: nominal, ordinal, interval, and ratio. The lowest levels of data, nominal and ordinal, are referred to as qualitative data, while the highest levels of data, interval and ratio, are referred to as quantitative data. In this chapter, Section 2.2 will present quantitative data graphs and Section 2.3 will present qualitative data graphs. Note that the data visualization software Tableau© divides data variables into measures and dimensions based on whether data are qualitative or quantitative.
2.1 Frequency Distributions LEARNING OBJECTIVE 2.1 Explain the difference between grouped and ungrouped data and construct a frequency distribution from a set of data. Explain what the distribution represents. Raw data, or data that have not been summarized in any way, are sometimes referred to as ungrouped data. As an example, Table 2.1 contains raw data showing 60 years of unemployment rates for Canada. Data that have been organized into a frequency distribution are called grouped data. Table 2.2 presents a frequency distribution for the data displayed in Table 2.1. The distinction between ungrouped and grouped data is important because statistics are calculated differently for each type of data. Several of the charts and graphs presented in this chapter are constructed from grouped data. One particularly useful tool for grouping data is the frequency distribution, which is a summary of data presented in the form of class intervals and frequencies. How is a frequency distribution constructed from raw data? That is, how are frequency distributions like the one displayed in Table 2.2 constructed from raw data like those presented in Table 2.1? Frequency distributions are relatively easy to construct. Although some guidelines help in their construction, frequency distributions vary in final shape and design, even when the original raw data are identical. In a sense, frequency distributions are constructed according to individual business analysts’ tastes.
ungrouped data Raw data, or data that have not been summarized in any way. grouped data Data that have been organized into a frequency distribution. frequency distribution A summary of data presented in the form of class intervals and frequencies.
TABLE 2.1 Unemployment Rates for Canada over 60 Years (Ungrouped Data)
6.0   6.4   12.0   9.5   6.0
7.0   6.3   11.3   9.6   6.1
7.1   5.6   10.5   9.1   8.3
5.9   5.4   9.6   8.3   8.1
5.5   7.1   8.8   7.6   7.5
4.7   7.1   7.8   6.8   7.3
3.9   8.0   7.5   7.2   7.1
3.6   8.4   8.1   7.7   6.9
4.1   7.5   10.3   7.6   6.9
4.8   7.5   11.2   7.2   7.0
4.7   7.6   11.4   6.8   6.3
5.9   11.0   10.4   6.3   5.8
TABLE 2.2 Frequency Distribution of 60 Years of Unemployment Data for Canada (Grouped Data)
| Class Interval | Frequency |
|---|---|
| 2–under 4 | 2 |
| 4–under 6 | 10 |
| 6–under 8 | 29 |
| 8–under 10 | 11 |
| 10–under 12 | 7 |
| 12–under 14 | 1 |
range The difference between the largest and the smallest values in a set of numbers.
When constructing a frequency distribution, the business analyst should first determine the range of the raw data. The range is often defined as the difference between the largest and smallest numbers. The range for the data in Table 2.1 is 8.4 (12.0 − 3.6). The second step in constructing a frequency distribution is to determine how many classes it will contain. One guideline is to select between 5 and 15 classes. If the frequency distribution contains too few classes, the data summary may be too general to be useful. Too many classes may result in a frequency distribution that does not aggregate the data enough to be helpful. The final number of classes is arbitrary. The business analyst arrives at a number by examining the range and determining a number of classes that will span the range adequately and be meaningful to the user. The data in Table 2.1 were grouped into six classes for Table 2.2. After selecting the number of classes, the business analyst must determine the width of the class interval. An approximation of the class width can be calculated by dividing the range by the number of classes. For the data in Table 2.1, this approximation is 8.4/6, or 1.4. Normally, the number is rounded up to the next whole number, which in this case is 2. The frequency distribution must start at a value equal to or lower than the lowest number of the ungrouped data and end at a value equal to or higher than the highest number. The lowest unemployment rate is 3.6 and the highest is 12.0, so the business analyst starts the frequency distribution at 2 and ends it at 14. Table 2.2 contains the completed frequency distribution for the data in Table 2.1. Class endpoints are selected so that no value of the data can fit into more than one class. The class interval expression “under” in the distribution of Table 2.2 avoids this problem.
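The construction steps described above can be sketched in code. The following Python example is only an illustration under stated assumptions: the function name `frequency_distribution` and its parameters are hypothetical (they come neither from the text nor from Excel), and the starting value (2), class width (2), and number of classes (6) are the choices made for Table 2.2.

```python
import math

# Table 2.1: 60 years of Canadian unemployment rates (ungrouped data)
rates = [
    6.0, 6.4, 12.0, 9.5, 6.0,
    7.0, 6.3, 11.3, 9.6, 6.1,
    7.1, 5.6, 10.5, 9.1, 8.3,
    5.9, 5.4, 9.6, 8.3, 8.1,
    5.5, 7.1, 8.8, 7.6, 7.5,
    4.7, 7.1, 7.8, 6.8, 7.3,
    3.9, 8.0, 7.5, 7.2, 7.1,
    3.6, 8.4, 8.1, 7.7, 6.9,
    4.1, 7.5, 10.3, 7.6, 6.9,
    4.8, 7.5, 11.2, 7.2, 7.0,
    4.7, 7.6, 11.4, 6.8, 6.3,
    5.9, 11.0, 10.4, 6.3, 5.8,
]


def frequency_distribution(data, start, width, n_classes):
    """Count values in the classes 'start + k*width' to under 'start + (k+1)*width'.

    Hypothetical helper written for this illustration only.
    """
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the class intervals")
        counts[k] += 1
    return counts


n_classes = 6                                  # analyst's choice (between 5 and 15)
data_range = max(rates) - min(rates)           # 12.0 - 3.6
approx_width = data_range / n_classes          # about 1.4
width = math.ceil(approx_width)                # rounded up to the next whole number
start = 2                                      # at or below the smallest value, 3.6

counts = frequency_distribution(rates, start, width, n_classes)
for k, count in enumerate(counts):
    lower = start + k * width
    print(f"{lower}-under {lower + width}: {count}")
```

Run as written, the printed counts should agree with the Frequency column of Table 2.2. A different starting value or class width would give a different but equally valid summary of the same raw data.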
Class Midpoint class midpoint For any given class interval of a frequency distribution, the value halfway across the class interval; the average of the two class endpoints. Also called class mark.
The midpoint of each class interval is called the class midpoint and is sometimes referred to as the class mark. It is the value halfway across the class interval and can be calculated as the average of the two class endpoints. For example, in the distribution of Table 2.2, the midpoint of the class interval 4–under 6 is 5, or (4 + 6)/2.
Class Beginning Point = 4
Class Width = 2
$$\text{Class Midpoint} = 4 + \frac{1}{2}(2) = 5$$
The class midpoint is important because it becomes the representative value for each class in most group statistics calculations. The third column in Table 2.3 contains the class midpoints for all classes of the data from Table 2.2.
TABLE 2.3
Class Midpoints, Relative Frequencies, and Cumulative Frequencies for Unemployment Data
| Interval | Frequency | Class Midpoint | Relative Frequency | Cumulative Frequency |
|---|---|---|---|---|
| 2–under 4 | 2 | 3 | 0.0333 | 2 |
| 4–under 6 | 10 | 5 | 0.1667 | 12 |
| 6–under 8 | 29 | 7 | 0.4833 | 41 |
| 8–under 10 | 11 | 9 | 0.1833 | 52 |
| 10–under 12 | 7 | 11 | 0.1167 | 59 |
| 12–under 14 | 1 | 13 | 0.0167 | 60 |
| Totals | 60 | | 1.0000 | |
Relative Frequency relative frequency The proportion of the total frequencies that fall into any given class interval in a frequency distribution.
Relative frequency is the proportion of the total frequency that is in any given class interval in a frequency distribution. Relative frequency is the individual class frequency divided by the total frequency. For example, from Table 2.3, the relative frequency for the class interval 6–under 8 is 29/60 = 0.4833. We consider the relative frequency here to prepare for the study of probability in
Chapter 4. Indeed, if values are selected randomly from the data in Table 2.1, the probability of drawing a number that is 6–under 8 is 0.4833, the relative frequency for that class interval. The fourth column of Table 2.3 lists the relative frequencies for the frequency distribution of Table 2.2.
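As a small illustration of the relative frequency calculation, the sketch below divides each class frequency from Table 2.2 by the total frequency. The variable names are illustrative only, and rounding to four decimal places is an assumption chosen to match the presentation in Table 2.3.

```python
# Class frequencies from Table 2.2, in class order (2-under 4 ... 12-under 14)
frequencies = [2, 10, 29, 11, 7, 1]

total = sum(frequencies)  # 60 observations in all

# Relative frequency = class frequency / total frequency
relative = [round(f / total, 4) for f in frequencies]

for f, r in zip(frequencies, relative):
    print(f"{f:>2} / {total} = {r:.4f}")
```

The third value printed corresponds to the class 6–under 8, whose relative frequency the text interprets as the probability of randomly drawing a value from that class.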
Cumulative Frequency The cumulative frequency is a running total of frequencies through the classes of a frequency distribution. The cumulative frequency for each class interval is the frequency for that class interval added to the preceding cumulative total. In Table 2.3, the cumulative frequency for the first class is the same as the class frequency: 2. The cumulative frequency for the second class interval is the frequency of that interval (10) plus the frequency of the first interval (2), which yields a new cumulative frequency of 12. This process continues through the last interval, at which point the cumulative total equals the sum of the frequencies (60). The concept of cumulative frequency is used in many areas, including sales cumulated over a fiscal year, sports scores during a contest (cumulated points), years of service, points earned in a course, and costs of doing business over a period of time. The last column in Table 2.3 gives cumulative frequencies for the data in Table 2.2.
cumulative frequency A running total of frequencies through the classes of a frequency distribution.
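The running total described above can be sketched with Python's standard-library `itertools.accumulate`. This is a minimal illustration using the frequencies of Table 2.2; the variable names are assumptions made for the example.

```python
from itertools import accumulate

# Class frequencies from Table 2.2, in class order
frequencies = [2, 10, 29, 11, 7, 1]

# Each entry is the class frequency added to the preceding cumulative total
cumulative = list(accumulate(frequencies))

for f, c in zip(frequencies, cumulative):
    print(f"frequency {f:>2} -> cumulative {c:>2}")
```

The last cumulative value equals the sum of all the frequencies, which provides a quick check that no class was skipped.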
Demonstration Problem 2.1 The following data are the average weekly interest rates for a 60-week period.
7.29   7.03   7.14   6.77   6.35   7.16   6.78   6.79   7.07   7.03
6.69   7.02   7.40   7.16   6.96   6.87   6.80   7.10   7.13   6.95
6.98   7.56   6.75   6.87   7.11   7.08   7.24   7.34   7.47   7.31
7.39   7.28   6.97   6.90   6.57   6.96   6.70   6.57   6.88   6.84
7.11   6.95   7.23   7.31   7.00   7.02   7.40   7.12   7.16   7.16
7.30   7.17   6.96   6.78   7.30   6.99   6.94   7.29   7.05   6.84
Construct a frequency distribution for these data. Calculate and display the class midpoints, relative frequencies, and cumulative frequencies for this frequency distribution.
Solution How many classes should this frequency distribution contain? The range of the data is 1.21 (7.56 − 6.35). If seven classes are used, each class width is approximately:
$$\text{Class Width} = \frac{\text{Range}}{\text{Number of Classes}} = \frac{1.21}{7} = 0.173$$
If a class width of 0.20 is used, a frequency distribution can be constructed with endpoints that are more uniform looking and allow presentation of the information in categories more familiar to interest rate users. The first class endpoint must be 6.35 or lower to include the smallest value; the last endpoint must be 7.56 or higher to include the largest value. In this case, the frequency distribution begins at 6.30 and ends at 7.70. The resulting frequency distribution, class midpoints, relative frequencies, and cumulative frequencies are listed in the following table.
| Class Interval | Frequency | Class Midpoint | Relative Frequency | Cumulative Frequency |
|---|---|---|---|---|
| 6.30–under 6.50 | 1 | 6.40 | 0.0167 | 1 |
| 6.50–under 6.70 | 3 | 6.60 | 0.0500 | 4 |
| 6.70–under 6.90 | 12 | 6.80 | 0.2000 | 16 |
| 6.90–under 7.10 | 18 | 7.00 | 0.3000 | 34 |
| 7.10–under 7.30 | 16 | 7.20 | 0.2666 | 50 |
| 7.30–under 7.50 | 9 | 7.40 | 0.1500 | 59 |
| 7.50–under 7.70 | 1 | 7.60 | 0.0167 | 60 |
| Totals | 60 | | 1.0000 | |
The frequencies and relative frequencies of these data reveal the interest rate classes that are likely to occur during the period. Most of the interest rates (55 of the 60) are in the classes starting with 6.70–under 6.90 and going through 7.30–under 7.50. The rates with the greatest frequency, 18, are in the 6.90–under 7.10 class.
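The solution to Demonstration Problem 2.1 can be checked with a short script. The sketch below is one possible approach, not the text's method: it converts each rate to whole hundredths before classifying it, an assumption made here so that values sitting exactly on a class endpoint (such as 7.10) are not misplaced by floating-point rounding. All names are illustrative.

```python
from itertools import accumulate

# The 60 weekly interest rates from Demonstration Problem 2.1
weekly_rates = [
    7.29, 7.03, 7.14, 6.77, 6.35, 7.16, 6.78, 6.79, 7.07, 7.03,
    6.69, 7.02, 7.40, 7.16, 6.96, 6.87, 6.80, 7.10, 7.13, 6.95,
    6.98, 7.56, 6.75, 6.87, 7.11, 7.08, 7.24, 7.34, 7.47, 7.31,
    7.39, 7.28, 6.97, 6.90, 6.57, 6.96, 6.70, 6.57, 6.88, 6.84,
    7.11, 6.95, 7.23, 7.31, 7.00, 7.02, 7.40, 7.12, 7.16, 7.16,
    7.30, 7.17, 6.96, 6.78, 7.30, 6.99, 6.94, 7.29, 7.05, 6.84,
]

# Class layout expressed in hundredths: start 6.30, width 0.20, seven classes
START, WIDTH, N_CLASSES = 630, 20, 7

counts = [0] * N_CLASSES
for rate in weekly_rates:
    hundredths = round(rate * 100)            # e.g. 7.10 -> 710
    counts[(hundredths - START) // WIDTH] += 1

midpoints = [(START + k * WIDTH + WIDTH // 2) / 100 for k in range(N_CLASSES)]
relative = [round(c / len(weekly_rates), 4) for c in counts]
cumulative = list(accumulate(counts))

for k in range(N_CLASSES):
    lower = (START + k * WIDTH) / 100
    upper = (START + (k + 1) * WIDTH) / 100
    print(f"{lower:.2f}-under {upper:.2f}: {counts[k]:>2}  "
          f"{midpoints[k]:.2f}  {relative[k]:.4f}  {cumulative[k]:>2}")
```

The frequencies, midpoints, and cumulative frequencies should agree with the solution table. For the 7.10–under 7.30 class, conventional rounding of 16/60 gives 0.2667, whereas the table shows 0.2666, presumably so that the displayed column sums to exactly 1.0000.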
Concept Check Fill in the blanks. The value halfway across the class interval (calculated as the average of the two class endpoints) is called the class ______. For each class, the individual class frequency divided by the total frequency is called the ______ frequency. Moreover, a running total of frequencies through the classes of a frequency distribution is called the ______ frequency.
2.1 Problems
2.1 The following data represent the afternoon high temperatures for 50 construction days during a year in Toronto.
6   21   18   8   19   21   23   3   9   −4
13   29   −12   −4   7   −1   17   8   17   29
−9   4   27   −9   2   −8   4   2   7   −8
3   26   2   2   −5   18   24   12   −1   16
−1   3   11   −9   27   −11   16   6   −1   1
a. Construct a frequency distribution for the data using five class intervals.
b. Construct a frequency distribution for the data using 10 class intervals.
c. Examine the results of (a) and (b) and comment on the usefulness of the frequency distribution in terms of temperature summarization capability.
2.2 A packaging process is supposed to fill small boxes of raisins with approximately 50 raisins so that each box will have the same mass. However, the number of raisins in each box will vary. Suppose 100 boxes of raisins are randomly sampled, the raisins counted, and the following data are obtained.
57   51   53   52   50   60   51   51   52   52
44   53   45   57   39   53   58   47   51   48
49   49   44   54   46   52   55   54   47   53
49   52   49   54   57   52   52   53   49   47
51   48   55   53   55   47   53   43   48   46
54   46   51   48   53   56   48   47   49   57
55   53   50   47   57   49   43   58   52   44
46   59   57   47   61   60   49   53   41   48
59   53   45   45   56   40   46   49   50   57
47   52   48   50   45   56   47   47   48   46
Construct a frequency distribution for these data. What does the frequency distribution reveal about the box fills?
2.3 The owner of a fast-food restaurant ascertains the ages of a sample of customers. From these data, the owner constructs the frequency distribution shown. For each class interval of the frequency distribution, determine the class midpoint, the relative frequency, and the cumulative frequency.
| Class Interval | Frequency |
|---|---|
| 0–under 5 | 6 |
| 5–under 10 | 8 |
| 10–under 15 | 17 |
| 15–under 20 | 23 |
| 20–under 25 | 18 |
| 25–under 30 | 10 |
| 30–under 35 | 4 |
What does the relative frequency tell the fast-food restaurant owner about customer ages?
2.4 The human resources manager for a large company commissions a study in which the employment records of 500 company employees are examined for absenteeism during the past year. The business analyst conducting the study organizes the data into a frequency distribution to assist the human resources manager in analyzing the data. The frequency distribution is shown. For each class of the frequency distribution, determine the class midpoint, the relative frequency, and the cumulative frequency.
| Class Interval | Frequency |
|---|---|
| 0–under 2 | 218 |
| 2–under 4 | 207 |
| 4–under 6 | 56 |
| 6–under 8 | 11 |
| 8–under 10 | 8 |
2.5 List three specific uses of cumulative frequencies in business.
2.2 Quantitative Data Graphs LEARNING OBJECTIVE 2.2 Describe and construct different types of quantitative data graphs, including histograms, frequency polygons, ogives, and stem-and-leaf plots. Explain when these graphs should be used. Data graphs can generally be classified as quantitative or qualitative. Quantitative data graphs are plotted along a numerical scale, and qualitative graphs are plotted using non-numerical
categories. In this section, we will examine four types of quantitative data graphs: (1) histogram, (2) frequency polygon, (3) ogive, and (4) stem-and-leaf plot.
Histograms One of the more widely used types of graphs for quantitative data is the histogram. A histogram is a series of contiguous rectangles that represents the frequency of data in given class intervals. If the class intervals used along the horizontal axis are equal, then the heights of the rectangles represent the frequency of values in a given class interval. If the class intervals are unequal, then the areas of the rectangles can be used for relative comparisons of class frequencies. Construction of a histogram involves labelling the x-axis with the class endpoints and the y-axis with the frequencies, drawing a horizontal line segment from class endpoint to class endpoint at each frequency value, and connecting each line segment vertically from the frequency value to the x-axis to form a series of rectangles. Figure 2.1 is a histogram of the frequency distribution in Table 2.2. A histogram is a useful tool for differentiating the frequencies of class intervals. A quick glance at a histogram reveals which class intervals produce the highest frequency totals. Figure 2.1 clearly shows that the class interval 6–under 8 yields by far the highest frequency count (29). Examination of the histogram reveals where large increases or decreases occur between classes, such as from the 2–under 4 class to the 4–under 6 class, an increase of 8, and from the 6–under 8 class to the 8–under 10 class, a decrease of 18. Note that the scales used along the x- and y-axes for the histogram in Figure 2.1 are almost identical. However, because ranges of meaningful numbers for the two variables being graphed often differ considerably, the histogram may have different scales on the two axes. Figure 2.2 shows what the histogram of unemployment rates would look like if the scale on the y-axis was more compressed than that on the x-axis. Notice that there is less difference in the height of the rectangles used to represent the frequencies in Figure 2.2.
It is important that the user of the graph clearly understand the scales used for the axes of a histogram. Otherwise, a graph’s creator can lie with statistics by stretching or compressing a graph to make a point.1
histogram A type of vertical bar chart constructed by graphing line segments for the frequencies of classes across the class intervals and connecting each to the x-axis to form a series of rectangles.
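To make the idea of a histogram concrete without assuming any plotting software, the sketch below renders the frequency distribution of Table 2.2 as rows of symbols and computes the change in frequency from each class to the next. It is a rough text-only stand-in for the chart, with bars drawn horizontally rather than as vertical rectangles, and the function name `text_histogram` is hypothetical.

```python
# Class intervals and frequencies from Table 2.2
labels = ["2-under 4", "4-under 6", "6-under 8",
          "8-under 10", "10-under 12", "12-under 14"]
frequencies = [2, 10, 29, 11, 7, 1]


def text_histogram(labels, frequencies, symbol="*"):
    """Return one text row per class; bar length equals the class frequency."""
    pad = max(len(label) for label in labels)
    return [f"{label:<{pad}} | {symbol * f} ({f})"
            for label, f in zip(labels, frequencies)]


for row in text_histogram(labels, frequencies):
    print(row)

# Change in frequency between adjacent classes (positive means an increase)
changes = [later - earlier
           for earlier, later in zip(frequencies, frequencies[1:])]
print(changes)
```

The longest row belongs to the 6–under 8 class, and the list of changes contains the increase of 8 and the decrease of 18 mentioned in the text. Because all class widths are equal here, bar length alone carries the comparison; with unequal widths, the areas of the bars would have to be compared instead.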
Using Histograms to Get an Initial Overview of the Data Because of the widespread availability of computers and statistical software packages to business decision-makers, the histogram continues to grow in importance for yielding information about the shape of the distribution of a large database, the variability of the data, the central location of the data, and outlier data. Although most of these concepts are presented in Chapter 3, the notion of the histogram as an initial tool to access these data characteristics is presented here.
FIGURE 2.1 Histogram of Canadian Unemployment Data (x-axis: Unemployment Rates for Canada; y-axis: Frequency)
FIGURE 2.2 Histogram of Canadian Unemployment Data (y-axis compressed) (x-axis: Unemployment Rates for Canada; y-axis: Frequency)
1 It should be pointed out that Excel uses the term histogram to refer to a frequency distribution. However, if you check Chart Output in the Excel histogram dialogue box, a graphical histogram is also created.
FIGURE 2.3 Histogram of Stock Volumes (x-axis: Stock Volumes (in millions); y-axis: Frequency)
FIGURE 2.4 Normal Distribution
For example, suppose a financial decision-maker wants to use data to reach some conclusions about the stock market. Figure 2.3 shows a histogram of 324 stock volume observations, that is, amounts of shares traded in a given day. What can we learn from this histogram? Virtually all stock market volumes fall between 78 million and 1 billion shares. The distribution takes on a shape that is high on the left end and tapered to the right. In Chapter 3, we will learn that the shape of this distribution is skewed toward the right end. In statistics, it is often useful to determine whether data are approximately normally distributed (bell-shaped curve), as shown in Figure 2.4. We can see by examining the histogram in Figure 2.3 that the stock market volume data are not normally distributed. Although the centre of the histogram is located near 500 million shares, a large portion of stock volume observations falls in the lower end of the data, somewhere between 100 million and 400 million shares. In addition, the histogram shows some outliers in the upper end of the distribution. Outliers are data points that appear outside the main body of observations and may represent phenomena that differ from those represented by other data points. By looking closely at the histogram, we notice a few data observations near 1 billion. One could conclude that, on a few stock market days, an unusually large volume of shares is traded. These and other insights can be gleaned by examining the histogram and show that histograms play an important role in the initial analysis of data.
Frequency Polygons
frequency polygon A graph constructed by plotting a dot for the frequencies at the class midpoints and connecting the dots.
A frequency polygon, like the histogram, is a graphical display of class frequencies. However, instead of using rectangles like a histogram, in a frequency polygon each class frequency is plotted as a dot at the class midpoint, and the dots are connected by a series of line segments. Construction of a frequency polygon begins, as with a histogram, by scaling class endpoints along the x-axis and frequency values along the y-axis. A dot is plotted for the associated frequency value at each class midpoint. Connecting these midpoint dots completes the graph. Figure 2.5 shows a frequency polygon of the distribution data from Table 2.2 produced in Excel. The information gleaned from frequency polygons and histograms is similar. As with the histogram, changing the scales of the axes can compress or stretch a frequency polygon, which affects the user’s impression of what the graph represents.
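The dots of a frequency polygon sit at class midpoints, so the first construction step is to turn class endpoints into midpoints. The following is a minimal sketch in Python, assuming the class endpoints 2, 4, ..., 14 that appear on the axes of the unemployment figures; the variable names are illustrative only.

```python
# Class endpoints assumed from the axis labels of the unemployment figures.
endpoints = [2, 4, 6, 8, 10, 12, 14]

# Each class runs from one endpoint to the next; its midpoint is the
# average of the two endpoints.
midpoints = [(lower + upper) / 2 for lower, upper in zip(endpoints, endpoints[1:])]

print(midpoints)  # [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
```

Pairing each midpoint with its class frequency gives the points to plot and connect.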
Ogives
ogive A cumulative frequency polygon; plotted by graphing a dot at each class endpoint for the cumulative or decumulative frequency value and connecting the dots.
An ogive (o-jive) is a cumulative frequency polygon. Construction begins by labelling the x-axis with the class endpoints and the y-axis with the frequencies. However, the use of cumulative frequency values requires that the scale along the y-axis be great enough to include the frequency total. A dot of zero frequency is plotted at the beginning of the first class, and construction proceeds by marking a dot at the end of each class interval for the cumulative value. Connecting the dots then completes the ogive. Figure 2.6 presents an ogive produced in Excel for the data in Table 2.2.
FIGURE 2.5 Frequency Polygon of the Unemployment Data (x-axis: Class Midpoints; y-axis: Frequency)
FIGURE 2.6 Ogive of the Unemployment Data (x-axis: Class Endpoints; y-axis: Cumulative Frequency)
Ogives are most useful when the decision-maker wants to see running totals. For example, if a comptroller is interested in controlling costs, an ogive could depict cumulative costs over a fiscal year. Steep slopes in an ogive can be used to identify sharp increases in frequencies. In Figure 2.6, a particularly steep slope occurs in the 6–under 8 class, signifying a large jump in class frequency totals.
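The running totals behind an ogive can be sketched in a few lines of Python. The class frequencies below are hypothetical values chosen for illustration (they are not the Table 2.2 data); only the procedure, a zero at the start of the first class followed by cumulative totals at each class end, follows the description of ogive construction.

```python
from itertools import accumulate

# Hypothetical class endpoints and frequencies (illustrative, not Table 2.2).
endpoints = [2, 4, 6, 8, 10, 12, 14]
frequencies = [4, 10, 25, 12, 6, 3]

# Ogive points: zero at the beginning of the first class, then the running
# total at the upper endpoint of each class.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(endpoints, cumulative))

# The steepest segment of the ogive belongs to the class with the largest frequency.
steepest = max(range(len(frequencies)), key=lambda i: frequencies[i])

print(ogive_points)
print(f"Steepest slope: {endpoints[steepest]}-under {endpoints[steepest + 1]} class")
```

The last cumulative value equals the frequency total, which is why the y-axis of an ogive must extend at least that far.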
Stem-and-Leaf Plots
Another way to organize raw data into groups is by a stem-and-leaf plot. This technique is simple and provides a unique view of the data. A stem-and-leaf plot is constructed by separating the digits for each number of the data into two groups, a stem and a leaf. The leftmost digits are the stem and consist of the higher-valued digits. The rightmost digits are the leaves and contain the lower values. If a set of data has only two digits, the stem is the value on the left and the leaf is the value on the right. For example, if 34 is one of the numbers, the stem is 3 and the leaf is 4. For numbers with more than two digits, division of stem and leaf is a matter of analyst preference.
Table 2.4 contains scores from an examination on plant safety policy and rules given to a group of 35 job trainees. A stem-and-leaf plot of these data is displayed in Table 2.5. One advantage of such a distribution is that the instructor can readily see whether the scores are in the upper or lower end of each bracket and also determine the spread of the scores. A second advantage of stem-and-leaf plots is that the values of the original raw data are retained (whereas most frequency distributions and graphic depictions use the class midpoint to represent the values in a class).
stem-and-leaf plot A plot of numbers constructed by separating each number into two groups, a stem and a leaf. The leftmost digits are the stems and the rightmost digits are the leaves.
TABLE 2.4 Safety Examination Scores for Plant Trainees
86  77  91  60  55
76  92  47  88  67
23  59  72  75  83
77  68  82  97  89
81  75  74  39  67
79  83  70  78  91
68  49  56  94  81
TABLE 2.5 Stem-and-Leaf Plot for Plant Safety Examination Data
Stem  Leaf
2     3
3     9
4     7 9
5     5 6 9
6     0 7 7 8 8
7     0 2 4 5 5 6 7 7 8 9
8     1 1 2 3 3 6 8 9
9     1 1 2 4 7
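The construction of Table 2.5 from the Table 2.4 scores can be sketched in Python as follows. This is one possible approach, assuming two-digit values so that the tens digit is the stem and the units digit is the leaf; the function name stem_and_leaf is illustrative.

```python
from collections import defaultdict

# Safety examination scores from Table 2.4.
scores = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67, 23, 59, 72, 75, 83, 77, 68, 82,
          97, 89, 81, 75, 74, 39, 67, 79, 83, 70, 78, 91, 68, 49, 56, 94, 81]

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) with ordered units digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 34 -> stem 3, leaf 4
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every leaf is kept, the original scores can be rebuilt from the plot, which is the second advantage noted for stem-and-leaf plots.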
Demonstration Problem 2.2
The following data represent the costs (in dollars) of a sample of 30 postal mailings by a company.
3.67   1.83   6.72   3.34   5.10
2.75  10.94   7.80   4.95   6.45
9.15   1.93   5.47   5.42   4.65
5.11   3.89   4.15   8.64   1.97
3.32   7.20   3.55   4.84   2.84
2.09   2.78   3.53   4.10   3.21
Using dollars as a stem and cents as a leaf, construct a stem-and-leaf plot of the data.
Solution
Stem  Leaf
1     83 93 97
2     09 75 78 84
3     21 32 34 53 55 67 89
4     10 15 65 84 95
5     10 11 42 47
6     45 72
7     20 80
8     64
9     15
10    94
Concept Check
1. What is the first step in constructing a graph of statistical data?
2. Draw the outline of a histogram for each of the following descriptions.
   a. A set of quiz scores where the quiz was very easy
   b. The last digit of the winning lottery numbers for a year
   c. The average mass of a healthy adult measured monthly over the course of two years
3. When are ogives most useful to decision-makers?
4. What is the key advantage of stem-and-leaf plots over histograms?
2.2 Problems
2.6 Assembly times in minutes for components must be understood in order to “level” the stages of a production process. Construct both a histogram and a frequency polygon for the following assembly time data, and comment on the key characteristics of the distribution.
Class Interval   Frequency
30–under 32       5
32–under 34       7
34–under 36      15
36–under 38      21
38–under 40      34
40–under 42      24
42–under 44      17
44–under 46       8
2.7 A call centre is trying to better understand staffing requirements. It records the number of calls received during the evening shift for 78 evenings and obtains the information given below. Construct a histogram of the data and comment on the key characteristics of the distribution. Construct a frequency polygon and compare it with the histogram. Which do you prefer, and why?
Class Interval   Frequency
10–under 20       9
20–under 30       7
30–under 40      10
40–under 50       6
50–under 60      13
60–under 70      18
70–under 80      15
2.8 Construct an ogive for the following data.
Class Interval   Frequency
3–under 6         2
6–under 9         5
9–under 12       10
12–under 15      11
15–under 18      17
18–under 21       5
2.9 A real estate group is investigating the price for condominiums of a given size in Atlantic Canada. The following sales prices (in $ thousands) were obtained in one region of a city. Construct a stem-and-leaf plot for the following data using two digits for the stem. Comment on the key characteristics of the distribution.
212  257  243  218  253  273
239  271  261  238  227  220
255  226
240  266  249  254  270  226
218  234  230  249  257  239
222  239  246  250  261  258
249  219  263  263  238  259
265  255  235  229  240  230
224  260  229  221  239  262
2.10 The following data represent the number of passengers per flight in a sample of 50 flights from Toronto, Ontario, to Detroit, Michigan.
23  25  69  48  45
46  20  34  46  47
66  47  35  23  49
67  28  60  38  19
13  16  37  52  32
58  38  52  50  64
19  44  80  17  27
17  29  59  57  61
65  48  51  41  70
17  29  33  77  19
Construct a stem-and-leaf plot for these data. What does the stem-and-leaf plot tell you about the number of passengers per flight?
2.11 The Airports Council International (ACI) publishes data on the world’s busiest airports. Shown below is a histogram constructed from ACI data on the number of passengers that enplaned and deplaned in a recent year (pre-COVID-19 pandemic). As an example, Atlanta’s Hartsfield-Jackson International Airport was the busiest airport in the world, with 110,531,300 passengers. Toronto’s Pearson International Airport was the busiest airport in Canada with 50,496,804 passengers. What are some observations that you can make from the graph? Describe the top 30 airports in the world using this histogram.
[Histogram for Problem 2.11: x-axis: Annual Number of Passengers (millions); y-axis: Frequency]
2.12 Every year, a hundred or so boats go fishing for Alaskan king crabs for three or four weeks off the Bering Strait. To catch these king crabs, large pots are baited and left on the sea bottom, often a few hundred metres deep. Because of the investment in boats, equipment, personnel, and supplies, fishing for king crabs can be financially risky if not enough crabs are caught. Thus, as pots are pulled and emptied, there is great interest in how many legal king crabs (males of a certain size) there are in any given pot. Suppose the number of legal king crabs is reported for each pot during a season and recorded. In addition, suppose that 200 of these pots are randomly selected and the numbers per pot are used to create the ogive shown below. Study the ogive and comment on the number of legal king crabs per pot.
[Ogive for Problem 2.12: x-axis: Number of Legal King Crabs per Pot; y-axis: Number of Pots]
2.3 Qualitative Data Graphs
LEARNING OBJECTIVE 2.3 Describe and construct different types of qualitative data graphs, including pie charts, bar charts, and Pareto charts. Explain when these graphs should be used.
In contrast to quantitative data graphs, which are plotted along a numerical scale, qualitative graphs are plotted using non-numerical categories. In this section, we will examine three types of qualitative data graphs: (1) pie charts, (2) bar charts, and (3) Pareto charts.
Pie Charts
pie chart A circular depiction of data where the area of the whole pie represents 100% of the data being studied and slices represent a percentage breakdown of the sublevels.
A pie chart is a circular depiction of data where the area of the whole pie represents 100% of the data and slices represent a percentage breakdown of the sublevels. Pie charts show the relative magnitudes of parts to a whole. They are widely used in business, particularly to depict such things as budget categories, market share, and time and resource allocations. However, the use of pie charts is minimized in the sciences and technology because they can lead to less accurate judgements than are possible with other types of graphs.2 Generally, it is more difficult for the viewer to interpret the relative size of angles in a pie chart than to judge the length of rectangles in a histogram or the relative distance of a frequency polygon dot from the x-axis. In the feature Thinking Critically About Statistics in Business Today 2.1, “Where Are Soft Drinks Sold?,” graphical depictions of the percentage of sales by place are displayed by both a pie chart and a vertical bar chart.
2 William S. Cleveland, The Elements of Graphing Data (Monterey, CA: Wadsworth Advanced Books and Software, 1985).
Thinking Critically About Statistics in Business Today 2.1
Where Are Soft Drinks Sold?
Beverages that contain more than 1% by weight of flavours are considered to be soft drinks, including flavoured bottled water, soda water, seltzer water, and tonic water. The soft drink market is extremely large in Canada and worldwide. Based on data from Statista, in a recent year, Canada ranked eleventh in the global soft drinks market, with a revenue of around US$12 billion. Aside from Pepsi and Coca-Cola, popular soft drinks in Canada include Canada Dry, Crush, Big 8, and Clearly Canadian. Where are soft drinks sold? The following data from Bernstein Research indicate that the four leading places for soft drink sales in the U.S. are supermarkets, soda fountains or snack bars, convenience stores/gas stations, and vending machines.
Place of Sales                    Percentage
Supermarkets                      44
Soda fountains                    24
Convenience stores/gas stations   16
Vending machines                  11
Mass merchandisers                 3
Drugstores                         2
These data can be displayed graphically in several ways. Displayed here are a pie chart and a bar chart of the data. Some statisticians prefer the histogram or the bar chart over the pie chart because they believe it is easier to compare categories that are similar in size with the histogram or the bar chart rather than the pie chart.
[Pie chart of the percentages above: Supermarket 44%, Soda fountain 24%, Convenience/Gas 16%, Vending 11%, Mass merch. 3%, Drugstore 2%]
[Vertical bar chart of the same data: x-axis: Place; y-axis: Percent]
Things to Ponder
1. How might this information be useful to large soft drink companies?
2. How might the packaging of soft drinks differ according to the top four places where soft drinks are sold?
3. How might the distribution of soft drinks differ between the various places where soft drinks are sold?
The pie chart is constructed by determining the proportion of the subunit to the whole. Table 2.6 contains the number of barrels refined by day for the top petroleum-refining companies in the U.S. in a recent year. To construct a pie chart from these data, convert the raw figures to proportions by dividing each company’s barrels per day by the total barrels per day. This proportion is analogous to the relative frequency computed for frequency distributions. The pie chart in Figure 2.7 depicts the data from Table 2.6.
TABLE 2.6 Number of Barrels Refined by Day for the Top Petroleum-refining Companies in the U.S. in a Recent Year
Company              Barrels per Day   Proportion
Marathon Petroleum   3,024,715         0.3057
Valero Energy        2,181,300         0.2204
Phillips 66          1,919,300         0.1940
Exxon Mobil          1,732,124         0.1750
Chevron              1,037,660         0.1049
Totals               9,895,099         1.0000
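The proportions in Table 2.6 can be reproduced with a short calculation. The sketch below, in Python, divides each company’s barrels per day by the total; the slice angles are an added illustration (a slice’s angle is its proportion of 360 degrees) and the variable names are not from the text.

```python
# Barrels refined per day, from Table 2.6.
barrels_per_day = {
    "Marathon Petroleum": 3_024_715,
    "Valero Energy": 2_181_300,
    "Phillips 66": 1_919_300,
    "Exxon Mobil": 1_732_124,
    "Chevron": 1_037_660,
}

total = sum(barrels_per_day.values())

# Proportion of the whole for each company, rounded to four decimals as in the table.
proportions = {company: round(barrels / total, 4)
               for company, barrels in barrels_per_day.items()}

# Central angle of each pie slice in degrees.
angles = {company: 360 * barrels / total
          for company, barrels in barrels_per_day.items()}

for company in barrels_per_day:
    print(f"{company:<20} {proportions[company]:.4f} {angles[company]:6.1f} degrees")
```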
FIGURE 2.7 Pie Chart of Barrels of Refined Oil by Day for the Top Petroleum-refining Companies in the U.S. in a Recent Year (slices: Marathon Petroleum 31%, Valero Energy 22%, Phillips 66 19%, Exxon Mobil 18%, Chevron 10%)
Bar Charts
bar chart Chart containing two or more categories along one axis and a series of bars, one for each category, along the other axis.
Another widely used qualitative data graphing technique is the bar chart or bar graph. A bar graph or chart contains two or more categories along one axis and a series of bars, one for each category, along the other axis. Typically, the length of the bar represents the magnitude of the measure (amount, frequency, money, percentage, etc.) for each category. The bar chart is qualitative because the categories are non-numerical, and it may be either horizontal or vertical. A bar chart generally is constructed from the same type of data that are used to produce a pie chart. However, an advantage of using a bar chart over a pie chart for a given set of data is that, for categories that are close in value, it is considered easier to see the difference in the bars of bar charts than to discriminate between pie slices.
As an example, consider the data in Table 2.7 regarding how much the average university student spends on some back-to-school items. To construct a bar chart from these data, use the categories electronics, clothing and accessories, residence furnishings, school supplies, and miscellaneous. Bars for each of these categories are made using the dollar figures given in the table. The resulting bar chart is shown in Figure 2.8.
TABLE 2.7 Back-to-School Spending by the Average University Student
Category                   Amount Spent ($)
Electronics                211.89
Clothing and accessories   134.40
Residence furnishings       90.90
School supplies             68.47
Miscellaneous               93.72
FIGURE 2.8 Bar Chart of Back-to-School Spending (horizontal bars by category; x-axis: Amount Spent ($))
Demonstration Problem 2.3
According to the National Retail Federation and Center for Retailing Education at the University of Florida, the four main sources of inventory shrinkage are employee theft, shoplifting, administrative error, and vendor fraud. The estimated annual dollar amount in shrinkage (in US$ millions) associated with each of these sources is shown below. Construct a pie chart to depict these data.
Employee theft         $17,918.6
Shoplifting             15,191.9
Administrative error     7,617.6
Vendor fraud             2,553.6
Total                  $43,281.7
Solution
Convert each raw dollar amount to a proportion by dividing each individual amount by the total.
Employee theft         17,918.6/43,281.7 = 0.414
Shoplifting            15,191.9/43,281.7 = 0.351
Administrative error    7,617.6/43,281.7 = 0.176
Vendor fraud            2,553.6/43,281.7 = 0.059
Total                                      1.000
[Pie chart: Employee theft 41%, Shoplifting 35%, Administrative error 18%, Vendor fraud 6%]
Using the raw data above, we can also produce the following bar chart.
[Horizontal bar chart by source of shrinkage; x-axis: Shrinkage (US$ millions)]
FIGURE 2.9 Pareto Chart for Electric Motor Problems (bars: Poor wiring, Short in coil, Defective plug, Bearings seized; y-axis: Percentage of Total)
FIGURE 2.10 Pareto Chart for Electric Motor Problems (same bars; primary axis: Count; secondary axis: Cumulative Percentage)
Pareto Charts
Pareto chart A vertical bar chart in which the number and types of defects for a product or service are graphed in order of magnitude from greatest to least.
A third type of qualitative data graph is a Pareto chart, which could be viewed as a particular application of the bar chart. One of the important aspects of the quality movement in business is the constant search for causes of problems in products and processes. A graphical technique for displaying problem causes is Pareto analysis. Pareto analysis is a quantitative tallying of the number and types of defects that occur with a product or service. Analysts use this tally to produce a vertical bar chart that displays the most common types of defects, ranked in order of occurrence from left to right. The bar chart is called a Pareto chart. Pareto charts were named after an Italian economist, Vilfredo Pareto, who observed more than 100 years ago that most of Italy’s wealth was controlled by a few families who were the major drivers behind the Italian economy. Quality expert J. M. Juran applied this notion to the quality field by observing that poor quality can often be addressed by attacking a few major causes that result in most of the problems. A Pareto chart enables decision-makers responsible for quality management to separate the most important defects from trivial defects, which helps them set priorities for needed quality improvement work. Suppose the number of electric motors being rejected by inspectors for a company has been increasing. Company officials examine the records of several hundred motors in which at least one defect was found, to determine which defects occurred most frequently. They find that 40% of the defects involved poor wiring, 30% involved a short in the coil, 25% involved a defective plug, and 5% involved seizing of bearings. Figure 2.9 is a Pareto chart constructed from this information. It shows that the three main problems with defective motors—poor wiring, a short in the coil, and a defective plug—account for 95% of the problems. From the Pareto chart, decision-makers can formulate a logical plan for reducing the number of defects. 
Company officials and workers would probably begin to improve quality by examining the segments of the production process that involve the wiring. Next, they would study the construction of the coil, then examine the plugs used and the plug-supplier process. Figure 2.10 is a different rendering of this Pareto chart. In addition to the bar chart analysis (primary axis), the Pareto analysis contains a cumulative percentage line graph (secondary axis). Observe the slopes on the line graph. The steepest slopes represent the more frequently occurring problems. As the slopes level off, the problems occur less frequently. The line graph gives the decision-maker another tool for determining which problems to solve first.
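The ranking and the cumulative percentage line of a Pareto chart can be computed directly from the defect percentages in the electric motor example. The following Python sketch is one way to do it; the dictionary and variable names are illustrative.

```python
from itertools import accumulate

# Share of defects by cause, from the electric motor example (percent).
defects = {
    "Short in coil": 30,
    "Bearings seized": 5,
    "Poor wiring": 40,
    "Defective plug": 25,
}

# Pareto order: most frequent cause first.
ranked = sorted(defects.items(), key=lambda item: item[1], reverse=True)

# Running total of the ranked shares gives the cumulative percentage line.
cumulative = list(accumulate(share for _, share in ranked))

for (cause, share), running in zip(ranked, cumulative):
    print(f"{cause:<16} {share:>3}% {running:>4}%")
```

The third cumulative value, 95, matches the observation that the three main problems account for 95% of the defects.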
Concept Check
1. True or false? Pie charts are an effective way of displaying data if the intent is to compare the size of a slice with the whole pie, rather than comparing the slices among themselves.
2. What type of chart do you think could be used to help answer the following questions?
   a. What are the main issues that our business is facing?
   b. What 20% of sources are causing 80% of our quality control problems?
   c. Where should we focus our efforts to achieve the largest improvement?
3. What is the key difference between histograms and bar charts?
2.3 Problems
2.13 The top seven airlines in the world by passengers carried in a recent year were American Airlines with 215.18 million, Delta Airlines with 204.00 million, Southwest Airlines with 162.68 million, United Airlines with 162.44 million, Ryanair with 152.40 million, China Southern Airlines with 151.63 million, and Lufthansa Group with 145.20 million. Construct a pie chart and a bar chart to depict this information.
2.14 The following is a list of the six largest pharmaceutical companies in the world ranked by revenue (in US$ billions). Use this information to construct a pie chart and a bar chart to represent these six companies and their revenues.
Pharmaceutical Firm   Revenue (US$ billions)
Johnson & Johnson     93.78
Sinopharm             81.77
Pfizer                81.29
Roche                 67.83
AbbVie                56.20
Novartis              51.63
2.15 Shown here is the policyadvisor.com list of the top five Canadian insurance firms with their annual premiums in a recent year. Construct a bar chart to display these data. Construct a pie chart to represent these data and label the slices with the appropriate percentages. Comment on the effectiveness of using a pie chart to display the total annual premiums for these top insurance firms.
Insurance Firm       Annual Premiums ($ billions)
Manulife             39.2
Canada Life          36.45
Sun Life             20.9
Desjardins            9.2
iA Financial Group    8.6
2.16 The Canada Beef Export Federation reports that the top five destinations for Canadian beef in a recent year were the U.S. with $2,360 million, Japan with $346 million, Mexico with $114 million, Hong Kong/Macau with $108 million, and China with $82 million. Construct a pie chart to depict this information.
2.17 An airline uses a call centre to take reservations. It has been receiving an unusually high number of customer complaints about its reservation system. The company conducted a survey of customers, asking them whether they had encountered any of the following problems in making reservations: busy signal, disconnection, poor connection, too long a wait to talk to someone, could not get through to an agent, or transferred to the wrong person. Suppose a survey of 744 complaining customers resulted in the following frequency tally.
Number of Complaints   Complaint
184                    Too long a wait
 10                    Transferred to the wrong person
 85                    Could not get through to an agent
 37                    Got disconnected
420                    Busy signal
  8                    Poor connection
Construct a Pareto chart from this information to display the various problems encountered in making reservations.
2.4 Charts and Graphs for Two Variables
LEARNING OBJECTIVE 2.4 Display and analyze two variables simultaneously using cross-tabulation and scatter plots.
It is very common in business statistics to want to analyze two variables simultaneously in an effort to gain insight into a possible relationship between them. For example, business analysts might be interested in the relationship between years of experience and amount of productivity in a manufacturing facility or in the relationship between a person’s technology usage and their age. Business analysts have many techniques for exploring such relationships. Two of the more elementary tools for observing the relationships between two variables are cross-tabulation and scatter plots.
Cross-Tabulation
cross-tabulation A process for producing a two-dimensional table that displays the frequency counts for two variables simultaneously.
Cross-tabulation is a process for producing a two-dimensional table that displays the frequency counts for two variables simultaneously. As an example, suppose a job satisfaction survey of a randomly selected sample of 177 bankers is taken in the banking industry. The bankers are asked how satisfied they are with their job using a 1 to 5 scale where 1 denotes very dissatisfied, 2 denotes dissatisfied, 3 denotes neither satisfied nor dissatisfied, 4 denotes satisfied, and 5 denotes very satisfied. In addition, each banker is asked to report his or her age by using one of three categories: under 30 years, 30 to 50 years, and over 50 years. Table 2.8 displays how some of the data might look as they are gathered. Note that age and level of job satisfaction are recorded for each banker. By tallying the frequency of responses for each combination of categories between the two variables, the data are cross-tabulated according to the two variables. For instance, in this example, there is a tally of how many bankers rated their level of satisfaction as 1 and were under 30 years of age, there is a tally of how many bankers rated their level of satisfaction as 2 and were under 30 years of age, and so on until frequency tallies are determined for each possible combination of the two variables. Table 2.9 shows the completed cross-tabulation table for the banker survey. A cross-tabulation table is sometimes referred to as a contingency table, and Excel calls such a table a PivotTable.
Scatter Plot
scatter plot A plot or graph of the pairs of data from a simple regression analysis.
A scatter plot is a two-dimensional graph plot of pairs of points from two numerical variables. The scatter plot is a graphical tool that is often used to examine possible relationships between two variables.
TABLE 2.8 Banker Data Observations by Job Satisfaction and Age
Banker   Level of Job Satisfaction   Age
1        4                           53
2        3                           37
3        1                           24
4        2                           28
5        4                           46
6        5                           62
7        3                           41
8        3                           32
9        4                           29
.        .                           .
.        .                           .
.        .                           .
177      3                           51
TABLE 2.9 Cross-Tabulation Table of Banker Data
                            Age Category
Level of Job Satisfaction   Under 30   30–50   Over 50   Total
1                            7          3       0         10
2                           19         14       3         36
3                           28         17      12         57
4                           11         22      16         49
5                            2          9      14         25
Total                       67         65      45        177
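The tallying that produces a cross-tabulation such as Table 2.9 can be sketched in Python. Only the ten observations actually listed in Table 2.8 are used here, so the counts below are a small illustration and will not match the full 177-banker table; the function name age_category and the category labels are illustrative.

```python
from collections import Counter

# (job satisfaction, age) for the ten bankers listed in Table 2.8.
observations = [(4, 53), (3, 37), (1, 24), (2, 28), (4, 46),
                (5, 62), (3, 41), (3, 32), (4, 29), (3, 51)]

def age_category(age):
    """Map an age in years to one of the three survey categories."""
    if age < 30:
        return "Under 30"
    if age <= 50:
        return "30-50"
    return "Over 50"

# Tally each (satisfaction level, age category) combination.
counts = Counter((level, age_category(age)) for level, age in observations)

columns = ["Under 30", "30-50", "Over 50"]
for level in range(1, 6):
    row = [counts[(level, column)] for column in columns]
    print(level, row, "total:", sum(row))
```

A Counter returns zero for combinations that never occur, so empty cells need no special handling.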
As an example of two numerical variables, consider the data in Table 2.10. Displayed are the base salary and total compensation (including cash bonuses, stock-based bonuses, options-based bonuses, pension value, and any other payments aside from base salary) for the 25 highest-paid Canadian CEOs in a recent year. Do these two numerical variables exhibit any relationship? It might seem logical that when the base salary is high, the total CEO compensation would be high as well. However, the scatter plot of these data, displayed in Figure 2.11, shows somewhat mixed results. The apparent tendency is that total CEO compensation falls mostly in the range between $10,000 and $30,000 (in $ thousands) for the entire range of base salaries, with a single exception that takes the highest total CEO compensation for a relatively low base salary.

TABLE 2.10 Total Compensation and Base Salary for the 25 Highest-Paid Canadian CEOs

Total Compensation ($ thousands)    Base Salary ($ thousands)
45,306                              282
26,903                              742
24,320                              2,415
22,248                              1,433
22,109                              436
20,264                              140
18,941                              2,048
17,761                              2,146
17,055                              1,546
16,816                              1,601
16,381                              1,700
16,037                              1,031
15,189                              1,271
14,910                              1,274
14,697                              1,585
14,552                              1,466
14,203                              1,850
13,569                              196
13,566                              0 ($1.00)
13,481                              1,500
13,081                              1,850
13,080                              1,250
13,055                              4,776
13,036                              1,110
12,872                              1,744
Source: Adapted from David Macdonald, “Another Year in Paradise, CEO Pay in 2020,” Canadian Centre for Policy Alternatives, January 2022. https://policyalternatives.ca/publications/reports/another-year-paradise.
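The reading of the scatter plot given in the text can be checked directly against the values in Table 2.10. The sketch below is illustrative only: it counts how many total-compensation figures fall in the 10,000 to 30,000 band (in $ thousands) and isolates the exception. The $1.00 base salary is entered as 0.001 thousand, which is an assumption about how to encode that row, and `plot_ceo_scatter` is a hypothetical helper that requires the third-party matplotlib package; it is defined but not called here.

```python
# Table 2.10 values in $ thousands: (total compensation, base salary).
# Assumption: the $1.00 base salary is encoded as 0.001 thousand.
ceo_pay = [
    (45306, 282), (26903, 742), (24320, 2415), (22248, 1433), (22109, 436),
    (20264, 140), (18941, 2048), (17761, 2146), (17055, 1546), (16816, 1601),
    (16381, 1700), (16037, 1031), (15189, 1271), (14910, 1274), (14697, 1585),
    (14552, 1466), (14203, 1850), (13569, 196), (13566, 0.001), (13481, 1500),
    (13081, 1850), (13080, 1250), (13055, 4776), (13036, 1110), (12872, 1744),
]

LOW, HIGH = 10_000, 30_000  # band quoted in the text, in $ thousands

# Split the CEOs by whether total compensation lies inside the band.
in_band = [(t, b) for t, b in ceo_pay if LOW <= t <= HIGH]
outside = [(t, b) for t, b in ceo_pay if not (LOW <= t <= HIGH)]


def plot_ceo_scatter(path="ceo_scatter.png"):
    """Save a scatter plot using the same axes as Figure 2.11 (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display required
    import matplotlib.pyplot as plt

    totals = [t for t, _ in ceo_pay]
    bases = [b for _, b in ceo_pay]
    fig, ax = plt.subplots()
    ax.scatter(totals, bases)
    ax.set_xlabel("Total Compensation ($ thousands)")
    ax.set_ylabel("Base Salary ($ thousands)")
    fig.savefig(path)
    plt.close(fig)


print(len(ceo_pay), "CEOs,", len(in_band), "inside the band; outside:", outside)
```

With these data, 24 of the 25 CEOs fall inside the band, and the one exception pairs the highest total compensation (45,306) with a base salary of only 282, which matches the description of the plot. Calling `plot_ceo_scatter()` in an environment with matplotlib installed writes the corresponding image to the given path.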
FIGURE 2.11 Scatter Plot of Total Compensation and Base Salary for the 25 Highest-Paid Canadian CEOs
[Scatter plot. Horizontal axis: Total Compensation ($ thousands), 0 to 50,000. Vertical axis: Base Salary ($ thousands), 0 to 5,000.]
Concept Check
1. Draw the outline of a scatter plot for each of the following sets of two numerical variables.
a. The mass and height of 30 people selected at random
b. The mass and IQ of 30 people selected at random
c. The number of advertising dollars spent by a company and the total sales revenue
2.4 Problems
2.18 The U.S. National Oceanic and Atmospheric Administration, National Marine Fisheries Service, publishes data on the quantity and value of domestic fishing in the U.S. The quantity (in millions of kilograms) of fish caught and used for human food and for industrial products (oil, bait, animal food, etc.) over a decade follows. Is a relationship evident between the quantity used for human food and the quantity used for industrial products for a given year? Construct a scatter plot of the data. Examine the plot and discuss the strength of the relationship of the two variables.

Human Food    Industrial Product
1,661         1,285
1,612         1,105
1,493         1,401
1,472         1,455
1,509         1,417
1,497         1,347
1,542         1,199
1,794         1,341
2,085         1,184
2,820         1,027
2.19 Are the advertising dollars spent by a company related to total sales revenue? The following data represent the advertising dollars and the sales revenues for various companies in a given industry during a recent year. Construct a scatter plot of the data from the two variables and discuss the relationship between the two variables.

Advertising ($ millions)    Sales ($ millions)
4.2                         155.7
1.6                         87.3
6.3                         135.6
2.7                         99.0
10.4                        168.2
7.1                         136.9
5.5                         101.4
8.3                         158.2
2.20 It seems logical that the number of days per year that an employee is late for work is at least somewhat related to the employee’s job satisfaction. Suppose 10 employees are asked to record how satisfied they are with their job on a scale from 0 to 10, with 0 denoting completely unsatisfied and 10 denoting completely satisfied. Suppose also that through human resource records, it is determined how many days each of these employees was tardy last year. The scatter plot graphs the job satisfaction scores of each employee against the number of days he or she was tardy. What information can you glean from the scatter plot? Does there appear to be any relationship between job satisfaction and tardiness? If so, how might they appear to be related?