English Pages 648 [577] Year 2020
Basic Bivariate Techniques
REBECCA M. WARNER
Applied Statistics I Third Edition
To my students: Past, present, and future.
Sara Miller McCune founded SAGE Publishing in 1965 to support the dissemination of usable knowledge and educate a global community. SAGE publishes more than 1000 journals and over 800 new books each year, spanning a wide range of subject areas. Our growing selection of library products includes archives, data, case studies and video. SAGE remains majority owned by our founder and after her lifetime will become owned by a charitable trust that secures the company’s continued independence.
Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne
Applied Statistics I Basic Bivariate Techniques Third Edition
Rebecca M. Warner
Professor Emerita, University of New Hampshire
SAGE
Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne
Copyright © 2021 by SAGE Publications, Inc. All rights reserved. Except as permitted by U.S. copyright law, no part of this work may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without permission in writing from the publisher. All third-party trademarks referenced or depicted herein are included solely for the purpose of illustration and are the property of their respective owners. Reference to these trademarks in no way indicates any relationship with, or endorsement by, the trademark owner.
FOR INFORMATION:
SAGE Publications, Inc. 2455 Teller Road Thousand Oaks, California 91320 E-mail: [email protected]
SAGE Publications Ltd. 1 Oliver's Yard 55 City Road London, EC1Y 1SP United Kingdom
SAGE Publications India Pvt. Ltd. B 1/11 Mohan Cooperative Industrial Area Mathura Road, New Delhi 110 044 India
SAGE Publications Asia-Pacific Pte. Ltd. 18 Cross Street #10-10/11/12 China Square Central Singapore 048423
ISBN 978-1-5063-5280-0
This book is printed on acid-free paper.
Printed in the United States of America
20 21 22 23 24 10 9 8 7 6 5 4 3 2 1
SPSS is a registered trademark of International Business Machines Corporation. Excel is a registered trademark of Microsoft Corporation. All Excel screenshots in this book are used with permission from Microsoft Corporation.
Acquisitions Editor: Helen Salmon
Editorial Assistant: Megan O'Heffernan
Content Development Editor: Chelsea Neve
Production Editor: Laureen Gleason
Copy Editor: Jim Kelly
Typesetter: Hurix Digital
Proofreader: Scott Oney
Indexer: Michael Ferreira
Cover Designer: Gail Buschman
Marketing Manager: Shari Countryman
Detailed Contents Preface Acknowledgments About the Author Chapter 1 - Evaluating Numerical Information 1.1 Introduction 1.2 Guidelines for Numeracy 1.3 Source Credibility 1.3.1 Self-Interest or Bias 1.3.2 Bias and “Cherry-Picking” 1.3.3 Primary, Secondary, and Third-Party Sources 1.3.4 Communicator Credentials and Skills 1.3.5 Track Record for Truth-Telling 1.4 Message Content
1.4.1 Anecdotal Versus Numerical Information 1.4.2 Citation of Supporting Evidence 1.5 Evaluating Generalizability 1.6 Making Causal Claims 1.6.1 The “Post Hoc, Ergo Propter Hoc” Fallacy 1.6.2 Correlation (by Itself) Does Not Imply Causation 1.6.3 Perfect Correlation Versus Imperfect Correlation 1.6.4 “Individual Results Vary” 1.6.5 Requirements for Evidence of Causal Inference 1.7 Quality Control Mechanisms in Science 1.7.1 Peer Review
1.7.2 Replication and Accumulation of Evidence 1.7.3 Open Science and Study Preregistration
1.8 Biases of Information Consumers 1.8.1 Confirmation Bias (Again) 1.8.2 Social Influence and Consensus
1.9 Ethical Issues in Data Collection and Analysis 1.9.1 Ethical Guidelines for Researchers: Data Collection 1.9.2 Ethical Guidelines for Statisticians: Data Analysis and Reporting
1.10 Lying With Graphs and Statistics 1.11 Degrees of Belief 1.12 Summary
Chapter 2 - Basic Research Concepts 2.1 Introduction 2.2 Types of Variables 2.2.1 Overview 2.2.2 Categorical Variables 2.2.3 Quantitative Variables 2.2.4 Ordinal Variables 2.2.5 Variable Type and Choice of Analysis 2.2.6 Rating Scale Variables 2.2.7 Scores That Represent Counts
2.3 Independent and Dependent Variables 2.4 Typical Research Questions 2.4.1 Are X and Y Correlated? 2.4.2 Does X Predict Y? 2.4.3 Does X Cause Y?
2.5 Conditions for Causal Inference 2.6 Experimental Research Design 2.7 Nonexperimental Research Design 2.8 Quasi-Experimental Research Designs 2.9 Other Issues in Design and Analysis 2.10 Choice of Statistical Analysis (Preview) 2.11 Populations and Samples: Ideal Versus Actual Situations
2.11.1 Ideal Definition of Population and Sample 2.11.2 Two Real-World Research Situations Similar to the Ideal Population and Sample Situation 2.11.3 Actual Research Situations That Are Not Similar to Ideal Situations 2.12 Common Problems in Interpretation of Results Appendix 2A: More About Levels of Measurement
Appendix 2B: Justification for the Use of Likert and Other Rating Scales as Quantitative Variables (in Some Situations) Chapter 3 - Frequency Distribution Tables 3.1 Introduction 3.2 Use of Frequency Tables for Data Screening 3.3 Frequency Tables for Categorical Variables 3.4 Elements of Frequency Tables 3.4.1 Frequency Counts (n or f)
3.4.2 Total Number of Scores in a Sample (N) 3.4.3 Missing Values (if Any) 3.4.4 Proportions 3.4.5 Percentages
3.4.6 Cumulative Frequencies or Cumulative Percentages 3.5 Using SPSS to Obtain a Frequency Table 3.6 Mode, Impossible Score Values, and Missing Values 3.7 Reporting Data Screening for Categorical Variables 3.8 Frequency Tables for Quantitative Variables 3.8.1 Ungrouped Frequency Distribution
3.8.2 Evaluation of Score Location Using Cumulative Percentage 3.8.3 Grouped or Binned Frequency Distributions 3.9 Frequency Tables for Categorical Versus Quantitative Variables 3.10 Reporting Data Screening for Quantitative Variables 3.11 What We Hope to See in Frequency Tables for Categorical Variables 3.11.1 Categorical Variables That Represent Naturally Occurring Groups
3.11.2 Categorical Variables That Represent Treatment Groups
3.12 What We Hope to See in Frequency Tables for Quantitative Variables 3.13 Summary
Appendix 3A: Getting Started in IBM SPSS® Version 25 3.A.1 The Bare Minimum: Using an Existing SPSS Data File to Obtain, Print, and Save Results 3.A.2 Moving Between Windows in SPSS
3.A.3 Creating a File and Entering Data
3.A.4 Defining Variable Names and Properties of Variables Appendix 3B: Missing Values in Frequency Tables Appendix 3C: Dividing Scores Into Groups or Bins Chapter 4 - Descriptive Statistics 4.1 Introduction 4.2 Questions About Quantitative Variables 4.3 Notation
4.4 Sample Median 4.5 Sample Mean (M) 4.6 An Important Characteristic of M: The Sum of Deviations From M = 0 4.7 Disadvantage of M: It Is Not Robust Against Influence of Extreme Scores 4.8 Behavior of Mean, Median, and Mode in Common Real-World Situations 4.8.1 Example 1: Bell-Shaped Distribution 4.8.2 Example 2: Bimodal or Polarized Distribution 4.8.3 Example 3: Skewed Distribution 4.8.4 Example 4: No Clear Mode 4.9 Choosing Among Mean, Median, and Mode 4.10 Using SPSS to Obtain Descriptive Statistics for a Quantitative Variable 4.11 Minimum, Maximum, and Range: Variation Among Scores 4.12 The Sample Variance s² 4.12.1 Step 1: Deviation of Each Score From the Mean 4.12.2 Step 2: Sum of Squared Deviations 4.12.3 Step 3: Degrees of Freedom 4.12.4 Putting the Pieces Together: Computing a Sample Variance 4.13 Sample Standard Deviation (s or SD) 4.14 How a Standard Deviation Describes Variation Among Scores in a Frequency Table 4.15 Why Is There Variance? 4.16 Reports of Descriptive Statistics in Journal Articles 4.17 Additional Issues in Reporting Descriptive Statistics 4.18 Summary
Appendix 4A: Order of Arithmetic Operations
Appendix 4B: Rounding
Chapter 5 - Graphs: Bar Charts, Histograms, and Boxplots 5.1 Introduction 5.2 Pie Charts for Categorical Variables 5.3 Bar Charts for Frequencies of Categorical Variables 5.4 Good Practice for Construction of Bar Charts 5.5 Deceptive Bar Graphs 5.6 Histograms for Quantitative Variables 5.7 Obtaining a Histogram Using SPSS 5.8 Describing and Sketching Bell-Shaped Distributions 5.9 Good Practices in Setting Up Histograms 5.10 Boxplot (Box and Whiskers Plot) 5.10.1 How to Set Up a Boxplot by Hand 5.10.2 How to Obtain a Boxplot Using SPSS 5.11 Telling Stories About Distributions 5.12 Uses of Graphs in Actual Research 5.13 Data Screening: Separate Bar Charts or Histograms for Groups 5.14 Use of Bar Charts to Represent Group Means
5.15 Other Examples 5.15.1 Scatterplots 5.15.2 Maps 5.15.3 Historical Example 5.16 Summary
Chapter 6 - The Normal Distribution and z Scores
6.1 Introduction 6.2 Locations of Individual Scores in Normal Distributions 6.3 Standardized or z Scores 6.3.1 First Step in Finding a z Score for X: The Distance of X From M 6.3.2 Second Step: Divide the (X - M) Distance by SD to Obtain a Unit-Free or Standardized Distance of Score From the Mean 6.4 Converting z Scores Back Into X Units 6.5 Understanding Values of z 6.6 Qualitative Description of Normal Distribution Shape 6.7 More Precise Description of Normal Distribution Shape 6.8 Areas Under the Normal Distribution Curve Can Be Interpreted as Probabilities 6.9 Reading Tables of Areas for the Standard Normal Distribution 6.10 Dividing the Normal Distribution Into Three Regions: Lower Tail, Middle, and Upper Tail 6.11 Outliers Relative to a Normal Distribution 6.12 Summary of First Part of Chapter 6.13 Why We Assess Distribution Shape 6.14 Departure From Normality: Skewness 6.15 Another Departure From Normality: Kurtosis 6.16 Overall Normality 6.17 Practical Recommendations for Preliminary Data Screening and Descriptions of Scores for Quantitative Variables 6.18 Reporting Information About Distribution Shape, Missing Values, Outliers, and Descriptive Statistics for Quantitative Variables 6.19 Summary
Appendix 6A: The Mathematics of the Normal Distribution Appendix 6B: How to Select and Remove Outliers in SPSS Appendix 6C: Quantitative Assessments of Departure From Normality 6.C.1 Index for Skewness 6.C.2 Index for Kurtosis 6.C.3 Test for Overall Departure From Normal Distribution Shape Appendix 6D: Why Are Some Real-World Variables Approximately Normally Distributed? Appendix 6E: Saving z Scores for All Cases
Chapter 7 - Sampling Error and Confidence Intervals 7.1 Descriptive Versus Inferential Uses of Statistics 7.2 Notation for Samples Versus Populations 7.3 Sampling Error and the Sampling Distribution for Values of M 7.3.1 What Is Sampling Error? 7.3.2 Sampling Error in a Classroom Demonstration
7.3.3 Sampling Error in Monte Carlo Simulations 7.4 Prediction Error 7.5 Sample Versus Population (Revisited) 7.5.1 Representative Samples 7.5.2 Convenience Samples 7.6 The Central Limit Theorem: Characteristics of the Sampling Distribution of M 7.7 Factors That Influence Population Standard Error (σM)
7.8 Effect of N on Value of the Population Standard Error 7.9 Describing the Location of a Single Outcome for M Relative to Population Sampling Distribution (Setting Up a z Ratio)
7.10 What We Do When σ Is Unknown 7.11 The Family of t Distributions 7.12 Tables for t Distributions 7.13 Using Sampling Error to Set Up a Confidence Interval 7.14 How to Interpret a Confidence Interval
7.15 Empirical Example: Confidence Interval for Body Temperature 7.16 Other Applications for Confidence Intervals 7.16.1 CIs Can Be Obtained for Other Sample Statistics (Such as Proportions) 7.16.2 Margin of Error in Political Polls 7.17 Error Bars in Graphs of Group Means 7.18 Summary
Chapter 8 - The One-Sample t Test: Introduction to Statistical Significance Tests 8.1 Introduction 8.2 Significance Tests as Yes/No Questions About Proposed Values of Population Means 8.3 Stating a Null Hypothesis 8.4 Selecting an Alternative Hypothesis 8.5 The One-Sample t Test 8.6 Choosing an Alpha (α) Level 8.7 Specifying Reject Regions on the Basis of α, Ha, and df 8.8 Questions for the One-Sample t Test 8.9 Assumptions for the Use of the One-Sample t Test 8.10 Rules for the Use of NHST 8.11 First Analysis of Mean Driving Speed Data (Using a Nondirectional Test) 8.12 SPSS Analysis: One-Sample t Test for Mean Driving Speed (Using a Nondirectional or Two-Tailed Test) 8.13 “Exact” p Values 8.14 Reporting Results for a Two-Tailed One-Sample t Test 8.15 Second Analysis of Driving Speed Data Using a One-Tailed or Directional Test
8.16 Reporting Results for a One-Tailed One-Sample t Test
8.17 Advantages and Disadvantages of One-Tailed Tests 8.18 Traditional NHST Versus New Statistics Recommendations
8.19 Things You Should Not Say About p Values 8.20 Summary
Chapter 9 - Issues in Significance Tests: Effect Size, Statistical Power, and Decision Errors 9.1 Beyond p Values 9.2 Cohen's d: An Effect Size Index 9.3 Factors That Affect the Size of t Ratios 9.4 Statistical Significance Versus Practical Importance 9.5 Statistical Power 9.6 Type I and Type II Decision Errors 9.7 Meanings of “Error” 9.8 Use of NHST in Exploratory Versus Confirmatory Research 9.9 Inflated Risk for Type I Decision Error for Multiple Tests 9.10 Interpretation of Null Outcomes 9.11 Interpretation of Statistically Significant Outcomes 9.11.1 Sampling Error 9.11.2 Human Error
9.11.3 Misleading p Values 9.12 Understanding Past Research 9.13 Planning Future Research 9.14 Guidelines for Reporting Results 9.15 What You Cannot Say 9.16 Summary
Appendix 9A: Further Explanation of Statistical Power Chapter 10 - Bivariate Pearson Correlation 10.1 Research Situations Where Pearson's r Is Used 10.2 Correlation and Causal Inference 10.3 How Sign and Magnitude of r Describe an X, Y Relationship
10.4 Setting Up Scatterplots 10.5 Most Associations Are Not Perfect 10.6 Different Situations in Which r = .00
10.7 Assumptions for Use of Pearson's r 10.7.1 Sample Must Be Similar to Population of Interest 10.7.2 X, Y Association Must Be Reasonably Linear 10.7.3 No Extreme Bivariate Outliers 10.7.4 Independent Observations for X and Independent Observations for Y 10.7.5 X and Y Must Be Appropriate Variable Types 10.7.6 Assumptions About Distribution Shapes 10.8 Preliminary Data Screening for Pearson’s r
10.9 Effect of Extreme Bivariate Outliers 10.10 Research Example 10.11 Data Screening for Research Example 10.12 Computation of Pearson's r 10.13 How Computation of Correlation Is Related to Pattern of Data Points in the Scatterplot 10.14 Testing the Hypothesis That ρ0 = 0
10.15 Reporting Many Correlations and Inflated Risk for Type I Error 10.15.1 Call Results Exploratory and De-emphasize or Avoid Statistical Significance Tests 10.15.2 Limit the Number of Correlations 10.15.3 Replicate or Cross-Validate Correlations 10.15.4 Bonferroni Procedure: Use More Conservative Alpha Level for Tests of Individual Correlations 10.15.5 Common Bad Practice in Reports of Numerous Significance Tests
10.15.6 Summary: Reporting Numerous Correlations 10.16 Obtaining Confidence Intervals for Correlations 10.17 Pearson’s r and r² as Effect Sizes and Partition of Variance 10.18 Statistical Power and Sample Size for Correlation Studies 10.19 Interpretation of Outcomes for Pearson’s r
10.19.1 When r Is Not Statistically Significant 10.19.2 When r Is Statistically Significant 10.19.3 Sources of Doubt 10.19.4 The Problem of Spuriousness 10.20 SPSS Example: Relationship Survey 10.21 Results Sections for One and Several Pearson's r Values 10.22 Reasons to Be Skeptical of Correlations 10.23 Summary
Appendix 10A: Nonparametric Alternatives to Pearson’s r 10.A.1 Spearman's r
Appendix 10B: Setting Up a 95% CI for Pearson's r by Hand Appendix 10C: Testing Significance of Differences Between Correlations Appendix 10D: Some Factors That Artifactually Influence Magnitude of r Appendix 10E: Analysis of Nonlinear Relationships Appendix 10F: Alternative Formula to Compute Pearson’s r
Chapter 11 - Bivariate Regression 11.1 Research Situations Where Bivariate Regression Is Used
11.2 New Information Provided by Regression 11.3 Regression Equations and Lines 11.4 Two Versions of Regression Equations 11.4.1 Raw-Score Regression Equation 11.4.2 Standardized Regression Equation 11.4.3 Comparing the Two Forms of Regression 11.5 Steps in Regression Analysis 11.6 Preliminary Data Screening 11.7 Formulas for Bivariate Regression Coefficients 11.8 Statistical Significance Tests for Bivariate Regression 11.9 Confidence Intervals for Regression Coefficients 11.10 Effect Size and Statistical Power 11.11 Empirical Example Using SPSS: Salary Data 11.12 SPSS Output: Salary Data 11.13 Results Section: Hypothetical Salary Data 11.14 Plotting the Regression Line: Salary Data 11.15 Using a Regression Equation to Predict Score for Individual (Joe’s Heart Rate Data)
11.16 Partition of Sums of Squares in Bivariate Regression 11.17 Why Is There Variance (Revisited)? 11.18 Issues in Planning a Bivariate Regression Study 11.19 Plotting Residuals 11.20 Standard Error of the Estimate 11.21 Summary
Appendix 11A: Review: How to Graph a Line From Two Points Obtained From an Equation
Appendix 11B: OLS Derivation of Equation for Regression Coefficients Appendix 11C: Alternative Formula for Computation of Slope Appendix 11D: Fully Worked Example: Deviations and SS Chapter 12 - The Independent-Samples t Test 12.1 Research Situations Where the Independent-Samples t Test Is Used 12.2 A Hypothetical Research Example 12.3 Assumptions for Use of Independent-Samples t Test 12.3.1 Y Scores Are Quantitative 12.3.2 Y Scores Are Independent of Each Other Both Between and Within Groups 12.3.3 Y Scores Are Sampled From Normally Distributed Populations With Equal Variances 12.3.4 No Outliers Within Groups 12.3.5 Relative Importance of Violations of These Assumptions 12.4 Preliminary Data Screening: Evaluating Violations of Assumptions and Getting to Know Your Data 12.5 Computation of Independent-Samples t Test 12.6 Statistical Significance of Independent-Samples t Test 12.7 Confidence Interval Around M1 - M2
12.8 SPSS Commands for Independent-Samples t Test 12.9 SPSS Output for Independent-Samples t Test 12.10 Effect Size Indexes for t
12.10.1 M1 - M2 12.10.2 Eta Squared (η²) 12.10.3 Point Biserial r (rpb) 12.10.4 Cohen's d
12.10.5 Computation of Effect Sizes for Heart Rate and Caffeine Data 12.10.6 Summary of Effect Sizes 12.11 Factors That Influence the Size of t 12.11.1 Effect Size and N 12.11.2 Dosage Levels for Treatment, or Magnitudes of Differences for Participant Characteristics, Between Groups 12.11.3 Control of Within-Group Error Variance 12.11.4 Summary for Design Decisions 12.12 Results Section 12.13 Graphing Results: Means and CIs 12.14 Decisions About Sample Size for the Independent-Samples t Test 12.15 Issues in Designing a Study 12.15.1 Avoiding Potential Confounds 12.15.2 Decisions About Type or Dosage of Treatment 12.15.3 Decisions About Participant Recruitment and Standardization of Procedures 12.15.4 Decisions About Sample Size 12.16 Summary
Appendix 12A: A Nonparametric Alternative to the Independent-Samples t Test
Chapter 13 - One-Way Between-Subjects Analysis of Variance 13.1 Research Situations Where One-Way ANOVA Is Used 13.2 Questions in One-Way Between-S ANOVA
13.3 Hypothetical Research Example 13.4 Assumptions and Data Screening for One-Way ANOVA 13.5 Computations for One-Way Between-S ANOVA
13.5.1 Overview 13.5.2 SSbetween: Information About Distances Among Group Means 13.5.3 SSwithin: Information About Variability of Scores Within Groups 13.5.4 SStotal: Information About Total Variance in Y Scores 13.5.5 Converting Each SS to a Mean Square and Setting Up an F Ratio 13.6 Patterns of Scores and Magnitudes of SSbetween and SSwithin 13.7 Confidence Intervals for Group Means
13.8 Effect Sizes for One-Way Between-S ANOVA
13.9 Statistical Power Analysis for One-Way Between-S ANOVA
13.10 Planned Contrasts 13.11 Post Hoc or “Protected” Tests 13.12 One-Way Between-S ANOVA in SPSS
13.13 Output From SPSS for One-Way Between-S ANOVA
13.14 Reporting Results From One-Way Between-S ANOVA
13.15 Issues in Planning a Study 13.16 Summary
Appendix 13A: ANOVA Model and Division of Scores Into Components Appendix 13B: Expected Value of F When H0 Is True
Appendix 13C: Comparison of ANOVA and t Test Appendix 13D: Nonparametric Alternative to One-Way Between-S ANOVA: Independent-Samples Kruskal-Wallis Test
Chapter 14 - Paired-Samples t Test 14.1 Independent- Versus Paired-Samples Designs 14.2 Between-S and Within-S or Paired-Groups Designs 14.3 Types of Paired Samples 14.3.1 Naturally Occurring Pairs (Different but Related Persons in the Two Samples) 14.3.2 Creation of Matched Pairs 14.4 Hypothetical Study: Effects of Stress on Heart Rate 14.5 Review: Data Organization for Independent Samples 14.6 New: Data Organization for Paired Samples 14.7 A First Look at Repeated-Measures Data 14.8 Calculation of Difference (d) Scores 14.9 Null Hypothesis for Paired-Samples t Test 14.10 Assumptions for Paired-Samples t Test 14.11 Formulas for Paired-Samples t Test 14.12 SPSS Paired-Samples t Test Procedure 14.13 Comparison Between Results for Independent-Samples and Paired-Samples t Tests 14.14 Effect Size and Power 14.15 Some Design Problems in Repeated-Measures Analyses 14.15.1 Order Effects 14.15.2 Counterbalancing to Control for Order Effects 14.15.3 Carryover Effects 14.15.4 Problems Due to Outside Events and Changes in Participants Across Time 14.16 Results for Paired-Samples t Test: Stress and Heart Rate 14.17 Further Evaluation of Assumptions 14.18 Summary
Appendix 14A: Nonparametric Alternative to Paired-Samples t: Wilcoxon Signed Rank Test
Chapter 15 - One-Way Repeated-Measures Analysis of Variance 15.1 Introduction 15.2 Null Hypothesis for Repeated-Measures ANOVA 15.3 Preliminary Assessment of Repeated-Measures Data 15.4 Computations for One-Way Repeated-Measures ANOVA 15.5 Use of SPSS Reliability Procedure for One-Way Repeated-Measures ANOVA 15.6 Partition of SS in Between-S Versus Within-S ANOVA 15.7 Assumptions for Repeated-Measures ANOVA 15.7.1 Scores on Outcome Variables Are Quantitative and Approximately Normally Distributed Without Extreme Outliers 15.7.2 Relationships Among the Repeated-Measures Variables Should Be Linear Without Bivariate Outliers 15.7.3 Population Variances of Contrasts Should Be Equal (Sphericity Assumption) 15.7.4 Assumption of No Person-by-Treatment Interaction 15.8 Choices of Contrasts in GLM Repeated Measures 15.8.1 Simple Contrasts 15.8.2 Repeated Contrasts 15.8.3 Polynomial Contrasts 15.8.4 Other Contrasts Available in the SPSS GLM Procedure 15.9 SPSS GLM Procedure for Repeated-Measures ANOVA 15.10 Output of GLM Repeated-Measures ANOVA 15.11 Paired-Samples t Tests as Follow-Up 15.12 Results 15.13 Effect Size 15.14 Statistical Power 15.15 Counterbalancing in Repeated-Measures Studies 15.16 More Complex Designs 15.17 Summary
Appendix 15A: Test for Person-by-Treatment Interaction
Appendix 15B: Nonparametric Analysis for Repeated Measures (Friedman Test)
Chapter 16 - Factorial Analysis of Variance 16.1 Research Situations Where Factorial Design Is Used 16.2 Questions in Factorial ANOVA 16.3 Null Hypotheses in Factorial ANOVA 16.3.1 First Null Hypothesis: Test of Main Effect for Factor A 16.3.2 Second Null Hypothesis: Test of Main Effect for Factor B 16.3.3 Third Null Hypothesis: Test of the A x B Interaction 16.4 Screening for Violations of Assumptions 16.5 Hypothetical Research Situation 16.6 Computations for Between-S Factorial ANOVA 16.7 Computation of SS and df in Two-Way Factorial ANOVA 16.8 Effect Size Estimates for Factorial ANOVA 16.9 Statistical Power 16.10 Follow-Up Tests 16.10.1 Nature of a Two-Way Interaction 16.10.2 Nature of Main Effect Differences 16.11 Factorial ANOVA Using the SPSS GLM Procedure 16.12 SPSS Output 16.13 Results 16.14 Design Decisions and Magnitudes of SS Terms 16.14.1 Distances Between Group Means (Magnitudes of SSA and SSB) 16.14.2 Number of Scores Within Each Group or Cell 16.14.3 Variability of Scores Within Groups or Cells (Magnitude of MSwithin) 16.15 Summary
Appendix 16A: Fixed Versus Random Factors
Appendix 16B: Weighted Versus Unweighted Means Appendix 16C: Unequal Cell n’s in Factorial ANOVA: Computing Adjusted Sums of Squares 16.C.1 Partition of Variance in Orthogonal Factorial ANOVA 16.C.2 Partition of Variance in Nonorthogonal Factorial ANOVA Appendix 16D: Model for Factorial ANOVA
Appendix 16E: Computation of Sums of Squares by Hand Chapter 17 - Chi-Square Analysis of Contingency Tables 17.1 Evaluating Association Between Two Categorical Variables 17.2 First Example: Contingency Tables for Titanic Data 17.3 What Is Contingency? 17.4 Conditional and Unconditional Probabilities 17.5 Null Hypothesis for Contingency Table Analysis 17.6 Second Empirical Example: Dog Ownership Data
17.7 Preliminary Examination of Dog Ownership Data 17.8 Expected Cell Frequencies If H0 Is True 17.9 Computation of Chi Squared Significance Test 17.10 Evaluation of Statistical Significance of χ² 17.11 Effect Sizes for Chi Squared 17.12 Chi Squared Example Using SPSS 17.13 Output From Crosstabs Procedure 17.14 Reporting Results 17.15 Assumptions and Data Screening for Contingency Tables 17.15.1 Independence of Observations 17.15.2 Minimum Requirements for Expected Values in Cells 17.15.3 Hypothetical Example: Data With One or More Values of E < 5 17.15.4 Four Ways to Handle Tables With Small Expected Values 17.15.5 How to Remove Groups
17.15.6 How to Combine Groups 17.16 Other Measures of Association for Contingency Tables 17.17 Summary
Appendix 17A: Margin of Error for Percentages in Surveys
Appendix 17B: Contingency Tables With Repeated Measures: McNemar Test
Appendix 17C: Fisher Exact Test Appendix 17D: How Marginal Distributions for X and Y Constrain Maximum Value of φ Appendix 17E: Other Uses of χ² Chapter 18 - Selection of Bivariate Analyses and Review of Key Concepts 18.1 Selecting Appropriate Bivariate Analyses
18.2 Types of Independent and Dependent Variables (Categorical Versus Quantitative) 18.3 Parametric Versus Nonparametric Analyses 18.4 Comparisons of Means or Medians Across Groups (Categorical IV and Quantitative DV) 18.5 Problems With Selective Reporting of Evidence and Analyses 18.6 Limitations of Statistical Significance Tests and p Values 18.7 Statistical Versus Practical Significance 18.8 Generalizability Issues 18.9 Causal Inference 18.10 Results Sections 18.11 Beyond Bivariate Analyses: Adding Variables 18.11.1 Factorial ANOVA and Repeated-Measures ANOVA 18.11.2 Control Variables 18.11.3 Moderator Variables 18.11.4 Too Many Variables? 18.12 Some Multivariable or Multivariate Analyses 18.13 Degree of Belief Appendices Appendix A: Proportions of Area Under a Standard Normal Curve Appendix B: Critical Values for t Distribution Appendix C: Critical Values of F Appendix D: Critical Values of Chi-Square Appendix E: Critical Values of the Pearson Correlation Coefficient Appendix F: Critical Values of the Studentized Range Statistic Appendix G: Transformation of r (Pearson Correlation) to Fisher's Z Glossary References
Preface
The set of bivariate techniques covered in this book (analyses with one predictor and one outcome) are the same as those in most introductory textbooks. This book provides an applied perspective. What does an applied perspective involve? Textbooks often use well-behaved data (without missing values, outliers, or violations of assumptions). This book introduces, early on, the idea that real data have problems. Discussion of ways in which actual practice differs from ideal situations helps students understand statistics in the context of real-world research. Here are examples: Textbooks describe random samples from clearly defined populations, while researchers often work with convenience samples. Textbooks usually present one significance test in isolation, whereas research reports often include numerous analyses, accompanied by increased risk for Type I error. This book includes discussion of these problems.
Each chapter begins with a simple question: What kinds of questions can this analysis answer? Chapters include fully worked examples with by-hand computation for small data sets, screenshots for SPSS menu selections and output, and results sections. Technical and supplemental information, including nonparametric alternatives, is provided in appendices at the ends of most chapters.
This book devotes less space to rarely used techniques (such as frequency polygons and methods to locate medians in grouped frequency distributions) and more space to real-world decisions made during data analysis (such as outlier detection and evaluation of distribution shape). Connections are made between design decisions and results; for instance, students will see that choice of dosage levels, control over within-group variance, and sample size influence the obtained magnitude of t and F ratios (along with sampling error, of course). Traditional use of statistical significance tests is covered. However, consistent with the New Statistics guidelines, there is greater emphasis on confidence intervals, effect sizes, and the need to document decisions made during analysis. Limitations of p values are discussed in nontechnical terms. Discussion also focuses on common researcher behaviors that affect p values (e.g., running numerous analyses and reporting only a few).
A distinction is made between “statistical significance” and practical or clinical or everyday “significance” or importance (i.e., a small p value does not necessarily indicate a strong treatment effect).
Students are encouraged to think in terms of “degree of belief” rather than yes/no decisions. To paraphrase David Hume, a wise person proportions belief to the evidence.
Notation and presentation are consistent with Volume II (Applied Statistics II: Multivariable and Multivariate Techniques [Warner, 2020]).
Digital Resources
Instructor and student support materials are available for download from e. SAGE edge offers a robust online environment featuring an impressive array of free tools and resources for review, study, and further explorations, enhancing use of the textbook by students and teachers.
teachers, students, and readers; please e-mail her at [email protected] with comments, corrections, or
SAGE edge for students provides a personalized approach to help you accomplish your coursework goals in an easy-to-use learning environment. Resources include the following:
* Mobile-friendly eFlashcards to strengthen your understanding of key terms * Datasets for completing in-chapter exercises * Links to web resources, including video tutorials and creativelectures, to support and enhance your learning
SAGEedge for instructors supports your teaching by providing resources that are easy to integrate into your curriculum. SAGE edge includes the following:
* Editable, chapter-specific PowerPoint® slides covering key information that offer you flexibility in creating multimedia presentations
* Test banks for each chapter with a diverse range of prewritten questions, which can be loaded into your LMSto help you assess students' progress and understanding
* Tables andfigures pulled from the book that you can download to add to handouts and assignments
* Answers to in-text comprehension questions, perfect for assessing in-class work or take-home assignments Finally, in response to feedback from instructors for R content to mirror the SPSS coverage in this book, SAGE has commissioned Az R Companionfor Applied Statistics Tby Danney Rasco. This short supplement can be bundled with this main
textbook. The author welcomes communication from 3% Pagexxiof 624 » Location 425 of 15772
suggestions.
Acknowledgments

Writers depend on many people for intellectual preparation and moral support. My understanding of statistics was shaped by exceptional teachers, including the late Morris de Groot at Carnegie Mellon University, and my dissertation advisers at Harvard, Robert Rosenthal and David Kenny. Several people who have most strongly influenced my thinking are writers I know only through their books and journal articles. I want to thank all the authors whose work is cited in the reference list. Authors whose work has particularly influenced my understanding include Jacob and Patricia Cohen, Barbara Tabachnick, Linda Fidell, James Jaccard, Richard Harris, Geoffrey Keppel, and James Stevens.

Special thanks are due to reviewers who provided exemplary feedback on first drafts of the chapters:

For the first edition:
David J. Armor, George Mason University
Michael D. Biderman, University of Tennessee at Chattanooga
Susan Cashin, University of Wisconsin-Milwaukee
Ruth Childs, University of Toronto
Young-Hee Cho, California State University, Long Beach
Jennifer Dunn, Center for Assessment
William A. Fredrickson, University of Missouri-Kansas City
Robert Hanneman, University of California, Riverside
Andrew Hayes, The Ohio State University
Lawrence G. Herringer, California State University, Chico
Jason King, Baylor College of Medicine
Patrick Leung, University of Houston
Scott E. Maxwell, University of Notre Dame
W. James Potter, University of California, Santa Barbara
Kyle L. Saunders, Colorado State University
Joseph Stevens, University of Oregon
James A. Swartz, University of Illinois at Chicago
Keith Thiede, University of Illinois at Chicago

For the second edition:
Diane Bagwell, University of West Florida
Gerald R. Busheé, George Mason University
Evita G. Bynum, University of Maryland Eastern Shore
Ralph Carlson, The University of Texas Pan American
John J. Convey, The Catholic University of America
Kimberly A. Kick, Dominican University
Tracey D. Matthews, Springfield College
Hideki Morooka, Fayetteville State University
Daniel J. Mundfrom, New Mexico State University
Shanta Pandey, Washington University
Beverly L. Roberts, University of Florida
Jim Schwab, University of Texas at Austin
Michael T. Scoles, University of Central Arkansas
Carla J. Thompson, University of West Florida
Michael D. Toland, University of Kentucky
Paige L. Tompkins, Mercer University

For the third edition:
Linda M. Bajdo, Wayne State University
Timothy Ford, University of Oklahoma
Beverley Hale, University of Chichester
Dan Ispas, Illinois State University
Jill A. Jacobson, Queen's University
Seung-Lark Lim, University of Missouri, Kansas City
Karla Hamlen Mansour, Cleveland State University
Paul F. Tremblay, University of Western Ontario
Barry Trunk, Capella University

I also thank the editorial and publishing team at SAGE, including Helen Salmon, Chelsea Neve, Megan O'Heffernan, and Laureen Gleason, who provided extremely helpful advice, support, and encouragement. Copy editor Jim Kelly merits special thanks for his attention to detail.

Many people provided moral support, particularly my late parents, David and Helen Warner; and friends and colleagues at UNH, including Ellen Cohn, Ken Fuld, Jack Mayer, and Anita Remig. I hope this book is worthy of the support they have given me. Of course, I am responsible for any errors and omissions that remain. Last but not least, I want to thank all my students, who have also been my teachers. Their questions continually prompt me to search for better explanations, and I am still learning.

Dr. Rebecca M. Warner
Professor Emerita
Department of Psychology
University of New Hampshire
About the Author

Rebecca M. Warner is Professor Emerita at the University of New Hampshire. She has taught statistics in the UNH Department of Psychology and elsewhere for 40 years. Her courses have included Introductory and Intermediate Statistics as well as seminars in Multivariate Statistics, Structural Equation Modeling, and Time-Series Analysis. She received a UNH Liberal Arts Excellence in Teaching Award, is a Fellow of both the Association for Psychological Science and the Society of Experimental Social Psychology, and is a member of the American Psychological Association, the International Association for Statistical Education, and the Society for Personality and Social Psychology. She has consulted on statistics and data management for the World Health Organization in Geneva, Project Orbis, and other organizations; and served as a visiting faculty member at Shandong Medical University in China. Her previous book, The Spectral Analysis of Time-Series Data, was published in 1998. She has published articles on statistics, health psychology, and social psychology in numerous journals, including the Journal of Personality and Social Psychology. She has served as a reviewer for many journals, including Psychological Bulletin, Psychological Methods, Personal Relationships, and Psychometrika. She received a BA from Carnegie Mellon University in social relations in 1973 and a PhD in social psychology from Harvard in 1978. She writes historical fiction and is a hospice volunteer along with her Pet Partner certified Italian greyhound Benny, who is also the world’s greatest writing buddy.
12.10.5 Computation of Effect Sizes for Heart Rate and Caffeine Data
12.10.6 Summary of Effect Sizes
12.11 Factors That Influence the Size of t
12.11.1 Effect Size and N
12.11.2 Dosage Levels for Treatment, or Magnitudes of Differences for Participant Characteristics, Between Groups
12.11.3 Control of Within-Group Error Variance
12.11.4 Summary for Design Decisions
12.12 Results Section
12.13 Graphing Results: Means and CIs
12.14 Decisions About Sample Size for the Independent-Samples t Test
12.15 Issues in Designing a Study
12.15.1 Avoiding Potential Confounds
12.15.2 Decisions About Type or Dosage of Treatment
12.15.3 Decisions About Participant Recruitment and Standardization of Procedures
12.15.4 Decisions About Sample Size
12.16 Summary
Appendix 12A: A Nonparametric Alternative to the Independent-Samples t Test
Chapter 13 - One-Way Between-Subjects Analysis of Variance
13.1 Research Situations Where One-Way ANOVA Is Used
13.2 Questions in One-Way Between-S ANOVA
13.3 Hypothetical Research Example
13.4 Assumptions and Data Screening for One-Way ANOVA
13.5 Computations for One-Way Between-S ANOVA
13.5.1 Overview
13.5.2 SSbetween: Information About Distances Among Group Means
13.5.3 SSwithin: Information About Variability of Scores Within Groups
13.5.4 SStotal: Information About Total Variance in Y Scores
13.5.5 Converting Each SS to a Mean Square and Setting Up an F Ratio
13.6 Patterns of Scores and Magnitudes of SSbetween and SSwithin
13.7 Confidence Intervals for Group Means
13.8 Effect Sizes for One-Way Between-S ANOVA
13.9 Statistical Power Analysis for One-Way Between-S ANOVA
13.10 Planned Contrasts
13.11 Post Hoc or “Protected” Tests
13.12 One-Way Between-S ANOVA in SPSS
13.13 Output From SPSS for One-Way Between-S ANOVA
13.14 Reporting Results From One-Way Between-S ANOVA
13.15 Issues in Planning a Study
13.16 Summary
Appendix 13A: ANOVA Model and Division of Scores Into Components
Appendix 13B: Expected Value of F When H0 Is True
Appendix 13C: Comparison of ANOVA and t Test
Appendix 13D: Nonparametric Alternative to One-Way Between-S ANOVA: Independent-Samples Kruskal-Wallis Test
Chapter 14 - Paired-Samples t Test
14.1 Independent- Versus Paired-Samples Designs
Self-interest of information providers is not always obvious. Many webpages offer “sponsored content”: paid messages from advertisers that look like news articles but in fact promote the interests of advertisers. For instance, a new diet pill might be presented as “news” when in fact the article is an advertisement. Communicator self-interest raises concerns about credibility of messages.

1.3.2 Bias and “Cherry-Picking”

Communicators generally cannot (or do not) present all available information. Selection of information by communicators can be influenced by confirmation bias, a preference for information that confirms preexisting beliefs or ideas. Biased selection of evidence can be informally called cherry-picking. Information and ideas that are excluded may be as important as information that is included.

As an example of cherry-picking, suppose 20 studies show no association between consuming meat and cancer risk, and 3 studies do show an association. A journalist might report only the 3 studies that showed an association or might report only the single most recent study. Whether the bias was intentional or not, the article will not provide an accurate summary of research results.

When scientists write literature reviews (reviews of past research), they are expected to discuss all past relevant research. Literature reviews are included in the introductions to most primary source research reports; literature reviews can also be stand-alone papers or books.

1.3.3 Primary, Secondary, and Third-Party Sources

An old game called “telephone” illustrates the problem of distance from a source. People form a line; the first person whispers a message to the second person, the second person whispers it to the third, and so forth. When the final message is compared with the original message, there are changes and distortions. Transmission of information can introduce errors because of each person’s biases or misunderstandings.

In science, a primary source is a research report written by a researcher who has firsthand knowledge of behaviors and events in a study. Primary source reports (sometimes called articles or papers) are published in journals.² Primary source data may also appear in books written for science audiences.

A secondary source is a description or summary of past research, created by someone who did not experience the reported data collection or observations firsthand. In many disciplines, secondary sources are scholarly books. Some journal articles are also secondary sources because they only review past research and do not present new data about which their authors have firsthand knowledge. Literature reviews in the introductions to science journal articles are secondhand discussions of past studies. (In the sciences, literature refers to past published research.)

Unfortunately, primary source reports are usually long and difficult to read (particularly for readers unfamiliar with statistics and technical terms). Language in research reports is sometimes unnecessarily obscure. Some full-length science research reports are published on the web as open-access materials; anyone can view these. However, many publishers require fees or subscriptions for access. The consequence is that many people can't easily understand most primary source information in science and sometimes cannot even gain access to it.
Much content on websites for news organizations is third-party content. This is content written by someone who may have examined only secondary sources or other thirdhand content, such as news reports or press releases. Often, third-party content is authored by someone who has no technical knowledge of the research field and statistical methods. Examples include articles published by news organizations. These articles usually don’t provide complete or accurate information about research results.

In the past, editors of prestigious newspapers required reporters to fact-check claims carefully. Increasingly, news reports on the web are paraphrases of, or uncritical reposting of, third-party content from other news sources. Some mass media news sources specifically disclaim responsibility for accuracy. Here is an example; many other news organizations post similar disclaimers:

CNN is a distributor (and not a publisher or creator) of content supplied by third parties and users. ... Neither CNN nor any third-party provider of information guarantees the accuracy, completeness, or usefulness of any content. ... (CNN, 2018)

Communicators can provide better quality information when they are closer to original sources of information, and they are likely to provide better quality information when they assume responsibility for accuracy.

In everyday life, most of us rely on thirdhand information most of the time. Because so much of what we think we know is based on thirdhand information, we should not be overly confident about things we think we know.

1.3.4 Communicator Credentials and Skills

Communicators are more believable when they have training and background related to information in the message. Researchers generally have credentials that provide evidence of this training and background, including advanced degrees such as a PhD or MD, affiliations with respected organizations such as universities, and publications in high-quality science journals.

Some journalists have strong credentials in science, but many do not. People who do not have training in statistics can easily misunderstand studies that use statistical terms such as logistic regression and odds ratios.

Celebrity status is not a meaningful credential. Famous media personalities, such as Dr. Oz³ and other self-appointed lifestyle or health experts, may base recommendations on incomplete or incorrect information.

Scientific research reports include source information (authors, university affiliations, and so forth). News reports and websites sometimes do not include source information; they provide no basis to evaluate self-interest, distance from information source, and credentials. Guidelines for evaluation of websites are provided by Kiely and Robertson (2016) and Montecino (1998).
1.3.5 Track Record for Truth-Telling

There are independent, nonpartisan organizations that evaluate communicator track records for truth-telling in journalism, for example, the Pulitzer Prize-winning site www.politifact.com. PolitiFact rates statements as true, mostly true, half true, mostly false, false, and “pants on fire” (extremely false). Other respected fact-checking sites are www.snopes.com and www.factcheck.org. These fact-checkers do the work that information consumers usually don’t have the time to do.

Information published in scientific journals can be incorrect because of fraud; fraud in science is rare, but it has occurred. A notorious example was a claim by Andrew Wakefield that vaccines cause autism (discussed by Godlee, Smith, & Marcovitch, 2011). There are severe penalties for fraud or plagiarism in science, including forced retraction of publications, withdrawal of research funds, loss of reputation, and job dismissal. Rare instances of fraud in science can be identified by a web search for the researcher name and terms such as fraud. Information consumers should be skeptical of information from sources with poor records for truth-telling.

1.4 Message Content

1.4.1 Anecdotal Versus Numerical Information

Anecdote means “story,” often about an individual person or situation. First-person accounts are often called testimonials. Audiences may find narrative stories or anecdotes more persuasive and memorable than numerical information.

There are many potential problems with anecdotes (anecdotal evidence). Sometimes individual situations are not reported accurately (for example, advertisements for weight loss products often include falsified before and after photos). Even when anecdotal evidence is accurate, it is difficult to know whether the experience shown is generalizable: Has this experience happened to many other people, or was this a unique situation? Diet product advertisers are required to acknowledge this and typically do so in a tiny footnote: “Individual results may vary.”

In science, a detailed report of an individual person or situation is called a case study. The study of unique cases, such as the brain damage suffered by railway worker Phineas Gage (Kihlstrom, 2010; Twomey, 2010), can be valuable. However, generalizability concerns are still relevant.

Anecdotal evidence can dramatize genuine problems. However, anecdotal evidence can also dramatize and promote incorrect beliefs. It is obviously easy to cherry-pick anecdotes. Supporting evidence in the form of systematic numerical information can provide a more accurate overview of evidence than anecdotal reports.

1.4.2 Citation of Supporting Evidence

In science, identification of outside sources of evidence is done by citation. Author names and years of publication are included in the text (to identify sources of ideas and evidence), and complete information to locate each source is included in a reference list. Citation has two purposes. First, it gives credit to others for their ideas and evidence; this avoids plagiarism, which occurs if authors present ideas or contributions of other people as if they were the authors’ own new contributions. Second, it shows how the present study builds upon an existing body of evidence.
A message is more believable when it includes or refers to specific supporting evidence. In science, the most complete and detailed supporting evidence appears in primary source research reports in science journals. Documentation of information sources is typically less detailed and systematic in journalism and mass media. (The best science journalists provide references or links to primary source research reports.)

It is possible for a writer or an advertiser to claim a spurious air of authority by citing numerous sources. However, a long list of references does not guarantee accuracy. On closer examination, readers may find that communicators have cherry-picked, misinterpreted, or misrepresented evidence; cited sources that are not relevant to the topic; or referred only to opinion pieces that do not actually contain evidence.

To evaluate the quality of evidence, we need to know how it was collected. Collection of evidence in science is systematic; that is, there are rules and procedures that specify what researchers should do to gather evidence and limit the kinds of interpretations they are permitted to make. Rules for statistical analysis are an important part of this.

1.5 Evaluating Generalizability

Researchers and journalists usually want to generalize about their findings. In other words, instead of just saying: “45% of the respondents I talked to said they plan to vote for candidate X,” they want to say something like “45% of all registered voters plan to vote for candidate X.” Generalizability concerns whether a researcher can claim that results obtained in a specific sample would be the same for a population of interest. Results from a sample can be generalized to an actual population of interest if the sample is representative of the population; representativeness can often be obtained using random or systematic methods to select the sample. Results from an accidental or a convenience sample may be generalizable to a hypothetical population if the sample resembles that hypothetical population. Results from a biased sample are not generalizable. In experiments, generalizability also depends on similarity of type and dosages of experimental treatment to real-world experiences with the treatment variable, setting, and other factors.

Polling organizations, such as Gallup, collect public opinion information in ways that provide a good basis for generalization. They use large samples (usually at least 1,000 individuals) and obtain these samples using combinations of random and systematic selection so that the people who responded to the survey resemble the larger population (such as all registered voters) in terms of age, income, and so forth (Gallup, n.d.).

When journalists report information from polls and demographic studies, they are (once again) in a position to cherry-pick. Because of differences in procedures and types of people contacted, various polling organizations may report different predictions about presidential candidate preference. A journalist who wants to make a case to support Candidate X may report only the poll in which Candidate X had the highest approval ratings.

In behavioral and social science, the problem of generalizability can have a different form. A researcher may want to know whether cognitive behavioral therapy (CBT) reduces depression. Typically, studies examine small to moderate numbers of cases, for instance, 35 patients who receive CBT and 35 who do not. To generalize results about effects of CBT to a large hypothetical
17.7 Preliminary Examination of Dog Ownership Data
17.8 Expected Cell Frequencies If H0 Is True
17.9 Computation of Chi Squared Significance Test
17.10 Evaluation of Statistical Significance of χ²
17.11 Effect Sizes for Chi Squared
17.12 Chi Squared Example Using SPSS
17.13 Output From Crosstabs Procedure
17.14 Reporting Results
17.15 Assumptions and Data Screening for Contingency Tables
17.15.1 Independence of Observations
17.15.2 Minimum Requirements for Expected Values in Cells
17.15.3 Hypothetical Example: Data With One or More Values of E < 5
17.15.4 Four Ways to Handle Tables With Small Expected Values
17.15.5 How to Remove Groups
17.15.6 How to Combine Groups
17.16 Other Measures of Association for Contingency Tables
17.17 Summary
Appendix 17A: Margin of Error for Percentages in Surveys
Appendix 17B: Contingency Tables With Repeated Measures: McNemar Test
Appendix 17C: Fisher Exact Test
Appendix 17D: How Marginal Distributions for X and Y Constrain Maximum Value of φ
Appendix 17E: Other Uses of χ²
Chapter 18 - Selection of Bivariate Analyses and Review of Key Concepts
18.1 Selecting Appropriate Bivariate Analyses
18.2 Types of Independent and Dependent Variables (Categorical Versus Quantitative)
18.3 Parametric Versus Nonparametric Analyses
18.4 Comparisons of Means or Medians Across Groups (Categorical IV and Quantitative DV)
18.5 Problems With Selective Reporting of Evidence and Analyses
18.6 Limitations of Statistical Significance Tests and p Values
18.7 Statistical Versus Practical Significance
18.8 Generalizability Issues
18.9 Causal Inference
18.10 Results Sections
18.11 Beyond Bivariate Analyses: Adding Variables
18.11.1 Factorial ANOVA and Repeated-Measures ANOVA
18.11.2 Control Variables
18.11.3 Moderator Variables
18.11.4 Too Many Variables?
18.12 Some Multivariable or Multivariate Analyses
18.13 Degree of Belief
Appendices
Appendix A: Proportions of Area Under a Standard Normal Curve
Appendix B: Critical Values for t Distribution
Appendix C: Critical Values of F
Appendix D: Critical Values of Chi-Square
Appendix E: Critical Values of the Pearson Correlation Coefficient
Appendix F: Critical Values of the Studentized Range Statistic
Appendix G: Transformation of r (Pearson Correlation) to Fisher's Z
Glossary
References
association is not sufficient by itself to prove causation because, even if X and Y covary, this co-occurrence may be due to the influence of one or more other variables; one of those other variables might be the real cause of X, or of Y, or both. In this example, heat or temperature might cause (or at least predict) ice cream purchase and homicide. The effects of rival explanatory variables can be reduced or eliminated in well-controlled experiments and reduced by statistical controls. Mere co-occurrence is not enough evidence to make a causal inference.

Sometimes the need to look for a different explanation is obvious (as in the ice cream/homicide example). It would be absurd to argue that ice cream causes homicide. However, the need to consider rival explanations also arises in situations that are not so obviously silly. In the diet soft drink/weight gain example, it is conceivable that artificial sweeteners have causal effects on appetite or metabolism that really do lead to weight gain, even though the artificial sweeteners contain zero (or negligible) calories. However, the other explanation (that drinking diet beverages leads people to indulge in other high-calorie foods) is also plausible. (It is also conceivable that both these explanations are partly correct.) Both experimental and nonexperimental studies, with humans and nonhuman animals, would be helpful in sorting out the relations among variables and whether any of the associations are causal.

1.6.3 Perfect Correlation Versus Imperfect Correlation

Perfect co-occurrence (perfect correlation or statistical association) is rare. Consider the genetic mutation for hemophilia (Table 1.1). If a male child inherits this genetic mutation, he will have hemophilia. Most other heritable diseases do not show this perfect association. (For female children, effects of the hemophilia gene are ruled out by information on the other X chromosome.)

Table 1.1

                              Has hemophilia    Does not have hemophilia
Hemophilia gene is present    100%              0%
Hemophilia gene is absent     0%                100%

If a male child does not inherit the gene for hemophilia, he will not have hemophilia. In logical terms, the mutated gene is both necessary and sufficient for the disease. The mutated gene is necessary for hemophilia because a person can’t get hemophilia without it. The mutated gene is sufficient for hemophilia, because if a person has it, he always has hemophilia. In other words, hemophilia always occurs when the mutated gene is present and never occurs when the mutated gene is absent.

Table 1.2 [Hand washing and getting sick. Only one row survives in the source: Person does not wash hands regularly, 23%, 67%.]

Most associations in behavioral and social sciences and medicine are not perfect. Consider this hypothetical example for a behavior (washing or not washing hands) and a disease outcome (getting sick). Table 1.2 shows an imperfect association. Only 25% of regular hand washers got sick, while 67% of those who don’t regularly wash their hands got sick. While most people who washed their hands did not get sick, hand washing did not guarantee that they could avoid getting sick. The association between lung cancer and smoking is also not perfect. The risk for getting lung cancer
is much higher for smokers than for nonsmokers. However, a few nonsmokers do get lung cancer, and many smokers do not get lung cancer.

In situations where associations are not perfect, it is likely that other variables are involved. Behaviors or conditions that sometimes (but not always) precede disease are usually called “risk factors” rather than causes. Smoking is a risk factor for lung cancer. Some diseases have numerous risk factors (for example, risk for heart disease is related to smoking, body weight, sex, age, high blood pressure, and other factors). We call behaviors that reduce risk for a negative outcome “protective factors.” For example, hand washing is a protective factor against getting sick.

1.6.4 “Individual Results Vary”

Unless there is a perfect correlation (as in the hemophilia example), statistical associations or correlations between variables do not predict exact outcomes for all individuals. Consider the results of a study by Judge and Cable (2004), informally reported in Dittman (July/August 2014). They reported that taller persons tend to earn more money (that is, height is correlated with salary). This is not a perfect correlation. If you are short, that does not necessarily mean that you will earn very little. Mark Zuckerberg (the founder of Facebook) is reported to be 5'7", but that did not prevent him from becoming one of the wealthiest men in the world. If you think about the implications correlations might have for your own outcomes, realize that individual outcomes differ when correlations are not perfect.

1.6.5 Requirements for Evidence of Causal Inference

Training in research methods and statistics provides the skills scientists need to think carefully about the evidence needed to support causal claims. Mass media journalists often rely on secondary sources or third-party content. By the time information filters through multiple communication links, details about the nature of the evidence and concerns about limitations that affect the ability to generalize and make causal inferences are often lost. Third-party content often does not provide accurate information about generalizability and potential causality.

1.7 Quality Control Mechanisms in Science

1.7.1 Peer Review

The science research process has mechanisms for information quality control. The most important mechanism is peer review. Researchers submit research reports to science journals (also called academic journals) for consideration (see note 2). The editor sends papers to peer reviewers (peers are expert researchers in the same field). Reviewers provide detailed criticism of studies, including evaluation of their research methods. On the basis of reviews, editors decide whether to reject a paper as inadequate, ask authors to revise the paper to correct errors or deficiencies, or (very rarely) accept the paper with only minor corrections. Papers are rarely accepted in their initially submitted form. Rejection rates for some journals are 80% or higher.

Peer review is fallible. Reviewers can also be subject to confirmation bias (they are more likely
to favor conclusions consistent with their own beliefs). Reviewers may not notice all of the problems in a research report. However, peer
review weeds out much poorly conducted
components such as preregistration of research
research and improves the quality of published
plans and sharing details of data and methods. For
papers. The community of scientists in effect
further discussion, see Cumming and Calin-
systematically polices the work of all individual
Jageman (2016).
scientists.
1.8 Biases of Information
1.7.2 Replication and
Accumulation of Evidence A second important mechanism for data quality
control in academic research is replication. Replication meansSior redoing a study.
This can be an ex
(keeping all
methods the same) ora (changing elements of the study,such as location,
measures,or type of participants,to evaluate whether the same results occur in different situations). We should not treat findings from any
one study as a conclusive answer to a research question. Any single study may have unique problems or flaws. In an ideal world, before we accept aresearch claim, we should have a substantial body of good-quality and consistent
Consumers 1.8.1 Confirmation Bias (Again) Information consumers or receivers also tend to select evidence consistent with their preexisting
beliefs. Media consumers need to be aware that they can systematically miss kinds of information (which may be of high or low quality) when they select news sources they like. Ratings of many
web news sources on a continuum from left/liberal to right/conservative, along with assessment of accuracy, are provided at
https://mediabiasfactcheck.com/politifact/. News sources that are extremely far left or far right tend to beless accurate.
evidence to back up that claim; this can be
Because of confirmation bias, people can get
obtained from replications.
stuck: They continueto believe “facts” that aren’t
Peer review and replication in science are fallible. However, they providethe best ongoing quality control checks we have. In contrast to science, there are few quality control mechanisms for most mass media communication.
1.7.3 Open Science and Study Preregistration There are recentinitiatives to improve the reproducibility and quality of research results in biomedicine, psychology, and other fields (Begley & Ioannidis, 2015; Open Science Collaboration, 2015). The Open Science model includes
5% Page9 of 624 - Location 735 of15772
true, and ideas that are wrong, because they never expose themselves to information that might prompt them to consider different possibilities. Consumers of mass media usually avoid evidence that challenges their beliefs. Philosopher of science Karl Popper argued that scientists also need to examine evidence that might falsify their beliefs. Scientists and people in general should consider evidence that challenges their beliefs.
1.8.2 Social Influence and Consensus
Should we believe something simply because many people, particularly those whom we know and respect, believe it? Not necessarily. Some incorrect beliefs are widely reported in mass media and held by millions of people. My personal favorite conspiracy theory is that alien reptiles control U.S. politics. Bump (2013) reported that more than 12 million people, or 4%, of the U.S. population said that they believed this theory in 2012-2013. To be clear, I strongly disbelieve that we are ruled by alien reptiles. (I am also not sure whether to believe Bump's report that 12 million people really believe this; surveys are not always accurate.)
Consensus among science researchers can enhance the believability of a claim. However, even in science, consensus does not always guarantee accuracy. Experts can turn out to be wrong. For example, there was a consensus among nutrition researchers that eggs are bad for health because of their cholesterol content. Some recent research suggests that this widely held belief may be incorrect (Gray & Griffin, 2009; see note 4), but the issue continues to be controversial. A belief shared by millions of people is not necessarily wrong. However, consensus is neither necessary nor sufficient evidence that information is correct.
1.9 Ethical Issues in Data Collection and Analysis
1.9.1 Ethical Guidelines for Researchers: Data Collection
Ethical issues arise when collecting data about people and nonhuman animals. For psychologists, the American Psychological Association has codes of ethics that protect the well-being of subjects (Campbell, Vasquez, Behnke, & Kinscherff, 2009). Research that involves human participants is evaluated by an institutional review board; research that involves nonhuman animals is evaluated by an institutional animal care and use committee. Ethical codes govern research in other areas such as biomedicine. Data collection cannot begin until ethics board approval of procedures has been obtained. Adherence to those rules is an ethical obligation for researchers. We should not harm the people or entities we study.
As an example of potential harm to a research participant, suppose that a study reveals that a person has a history of addiction. If that information gets into the hands of potential landlords or employers, it could have an impact on that person’s search for housing and jobs. Researchers must keep such records confidential. Researchers also have an ethical responsibility to think about the potential impact of their research (both positive and negative) on public policy and the behavior of organizations and individuals.
1.9.2 Ethical Guidelines for Statisticians: Data Analysis and Reporting
The GAISE report states, “Students should demonstrate an awareness of ethical issues associated with sound statistical practice” (GAISE College Report ASA Revision Committee, 2016). A separate document (American Statistical Association, 2015) discusses ethical issues in detail. Here is a list of ethical practices for data analysts, paraphrased from the American Statistical Association's ethics document. You will be reminded about these issues as you continue through the book.
1. Ensure that numbers are accurate. Fully disclose data handling procedures (such as
enhancing use of the textbook by students and teachers.
SAGE edge for students provides a personalized approach to help you accomplish your coursework goals in an easy-to-use learning environment. Resources include the following:
* Mobile-friendly eFlashcards to strengthen your understanding of key terms
* Datasets for completing in-chapter exercises
* Links to web resources, including video tutorials and creative lectures, to support and enhance your learning
SAGE edge for instructors supports your teaching by providing resources that are easy to integrate into your curriculum. SAGE edge includes the following:
* Editable, chapter-specific PowerPoint® slides covering key information that offer you flexibility in creating multimedia presentations
* Test banks for each chapter with a diverse range of prewritten questions, which can be loaded into your LMS to help you assess students' progress and understanding
* Tables and figures pulled from the book that you can download to add to handouts and assignments
* Answers to in-text comprehension questions, perfect for assessing in-class work or take-home assignments
Finally, in response to feedback from instructors for R content to mirror the SPSS coverage in this book, SAGE has commissioned An R Companion for Applied Statistics I by Danney Rasco. This short supplement can be bundled with this main textbook.
The author welcomes communication from teachers, students, and readers; please e-mail her at [email protected] with comments, corrections, or suggestions.
medicine. There are many questions in medicine (such as what causes autoimmune disorders) for which medical research does not have good answers (Fox, 2003).
It is useful to think about scientific knowledge in terms of degree of belief instead of certainty. The philosopher David Hume said that “a wise [person] ... proportions his [or her] belief to the evidence” (Schmidt, 2004). Degree of belief should be based on systematically collected supporting evidence. When there is little evidence (for example, results from only one study), people should not have strong belief in a claim. As additional good-quality evidence accumulates, degree of belief can increase. People should revise degree of belief upward or downward as new (good-quality) evidence becomes available.
This rating scale illustrates the concept of degree of belief. The use of a five-point scale and the exact verbal descriptions for each numerical rating are arbitrary.
Probably untrue
Maybe untrue
Not sure; insufficient evidence
Maybe true
Probably true
Fairly often, the best answer to research or public policy questions is that we do not have enough high-quality evidence to be confident that we know the correct answer. We should never assume that numerical results of one single study or mass media report are conclusive.
1.12 Summary
Here are some questions to keep in mind when evaluating numerical (and other) information.
1. Is there evidence of communicator bias or self-interest?
2. Is evidence cherry-picked to fit the communicator’s argument?
3. Is the communicator far from the information source or not well qualified to evaluate the information?
4. Does the communicator have a good record for truth-telling?
5. What types of evidence are included? Anecdotes? Citations of specific, credible sources?
6. Have you considered your own possible biases as an information consumer? Do you accept information uncritically because it confirms what you already believe? Are you influenced by what other people believe?
7. Do data come from people (or cases) who resemble the population of interest? Are results generalizable?
8. Are causal inferences drawn when there is not enough information to prove a causal association? Remember that imperfect correlation or co-occurrence does not indicate causation.
9. Has information been subjected to quality control? (In science, this includes peer review and replication.)
10. Is the presentation of information deceptive (e.g., lying graphs)?
11. What ethical issues are at stake in the conduct and application of the research?
12. Is your degree of belief proportional to the quantity of good-quality and consistent evidence? (You should never believe a claim on the basis of just one scientific study or one journalism report.)
Sometimes the best answer to questions such as “Are eggs harmful to cardiovascular health?” is that we don’t have enough evidence yet to answer the question. Unfortunately, lack of evidence does not prevent some communicators from making premature claims. When claims are made on the basis of limited evidence, contradiction and confusion often arise. It is better to reserve judgment until a large quantity of good-quality evidence is available. One single media report, or one single science report, is not “proof.” Even if you do not plan to be a researcher, you can benefit from thinking like a scientist and statistician about numerical evidence you encounter in everyday life. Some decisions have high stakes. For example, you may need to decide whether to undertake a risky but potentially beneficial medical treatment. Ideally, you should have accurate information about potential outcomes. The higher the stakes, the more you need to know how to obtain trustworthy information.
The take-home message from this chapter is: We all know a lot less than we think we do, because most of us rely heavily on third-party content that has little or no information quality control. All of us (scientists, journalists, and information consumers) should be cautious about degree of belief. Sometimes the best answer to a question is: We don’t have enough good-quality evidence.
Courses in statistics and research methods teach you good practice in evaluation and presentation of evidence.
Comprehension Questions
1. What is cherry-picking of evidence, and why is it deceptive? (Can you think of a book or media report that seems to present cherry-picked evidence?)
2. Give examples of self-interest that might make a communicator less believable.
3. Why is distance to original source of information an important factor when you evaluate message credibility?
4. What does it mean to say that a correlation (or association) between variables is imperfect?
5. Give an example of a risk factor, and a protective factor, not discussed in the chapter.
6. Why is the existence of a correlation (existence of co-occurrence or association) between X and Y not enough evidence for us to say that X causes Y?
7. What is the post hoc, ergo propter hoc fallacy? (Give an example you have seen, different from the one in this chapter.)
8. What is confirmation bias?
9. What quality control mechanisms are used in science?
10. What is peer review? How can it improve the credibility of science reporting?
11. What is research replication? How can this improve the quality of evidence in science? How do exact replication and conceptual replication differ?
12. A researcher might say “the results of this one study prove” something. Is this justified?
13. What (approximate) degree of belief should you have on the basis of only one study?
Notes
1 Scientists are expected to be objective when they select information to report. However, scientists tend to focus selectively on information consistent with the most widely accepted existing theories; Kuhn and Hacking (2012) called this “selection of significant fact.”
2 Numerous predatory, for-profit online journal publishers have emerged in recent years. It has become more difficult to determine whether online publications are credible. Research reports published in predatory journals are not valued by professional colleagues and universities. Beall’s List of Predatory Journals and Publishers names many publishers that are almost certainly predatory (https://beallslist.weebly.com). Additional warning signs that a publisher may be predatory:
* The journal invites you to submit your undergraduate or graduate thesis for publication (particularly if the journal title is not in your discipline or field).
* The journal offers to publish your paper without peer review.
* The journal asks you to pay for publication. (However, many legitimate publishers charge author fees to make journal articles open access on the web; therefore, a request for payment is not always an indication that a journal is predatory.)
If you are not sure whether a journal or publisher is predatory, search for the journal or publisher name along with the term predatory. You can also ask mentors, advisers, or colleagues.
3 About half of Dr. Oz's medical advice is not supported by medical research (Belluz, 2014). Dr. Oz was investigated in a congressional hearing and paid large settlements in lawsuits for false advertising (Cohen, 2015).
4 This video about an imaginary time-traveling dietician makes fun of changes in dietary recommendations across the decades: https://www.youtube.com/watch?v=5UaWVg1SsA.
Digital Resources
Find free study tools to support your learning, including eFlashcards, data sets, and web resources, on the accompanying website at
needed to understand later topics throughout the book.
3.2 Use of Frequency Tables for Data Screening
We look at frequency tables to get to know the data and to identify potential errors and problems with data before we do other analyses.
Introductory statistics textbooks often present students with sample data sets that are assumed not to have errors or missing information. In real-world applications, data often have problems, and it is important to look for them. These problems include:
* Information is sometimes missing for some members of a sample.
* Some scores can be unusually large or small; unusual or extreme scores can be problematic in some analyses.
* Some groups contain too few cases for meaningful analyses.
* Real datasets often contain mistakes (incorrect, or even impossible, score values).
Implausible or incorrect score values can arise in many ways. If a person is asked to report hair color and reports “plaid,” that is an unlikely response. If a heart rate is recorded as 275 beats per minute, the heart rate monitor is probably malfunctioning. However, a score value can appear plausible and still be incorrect; if a heart rate monitor is not properly calibrated, a person whose heart rate is given as 110 beats per minute might really have a heart rate of 95 beats per minute.
In an ideal world, researchers would proofread every single number in the data file against original data sources, if these exist. However, original data sources are sometimes not available, and complete proofreading of data may be extremely time-consuming and costly. At a minimum, spot checks (checking some score values in SPSS against original sources of data) provide an opportunity to detect problems that might be more widespread throughout the data set and would require much closer checking. If you find scores that are clearly impossible or at least highly unlikely, the best option is to obtain valid scores from other sources if that is possible. If a student reports a grade point average (GPA) of 6 when college GPAs are on a 0-to-4 scale, and you have access to university records and can find that student’s GPA, you could use the university record to replace the incorrect self-reported value. If a respondent reports large numbers of silly or impossible values, you might decide to drop that person’s data entirely.
There is increasing concern about completeness and transparency in data reporting (Simmons, Nelson, & Simonsohn, 2011). Research reports should include information about problems detected during preliminary screening. This information is often obtained from frequency tables and graphs (such as histograms). The numbers or percentages of incorrect scores, extreme scores, and missing values should be reported. Authors also need to specify what, if anything, was done to remedy these problems. You might say something like “Data for five students were dropped because they reported unlikely or inconsistent information” or “Data from three sessions had to be dropped because of equipment malfunction.” Whatever problems with data you find, and whatever actions you take, you need to keep a detailed record and include this information in published research reports.
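
The screening steps described in this section can also be sketched outside SPSS. The short Python example below is only an illustration under stated assumptions: the GPA scores are hypothetical, missing values are assumed to be coded as None, and the helper names frequency_table and flag_out_of_range are invented for this sketch rather than taken from this book or from any statistics package. It tabulates each score value with its frequency, valid percent, and cumulative percent, counts missing values, and flags scores outside the possible 0-to-4 range.

```python
from collections import Counter


def frequency_table(scores):
    """Build (value, frequency, valid percent, cumulative percent) rows.

    Missing scores are assumed to be coded as None and are counted separately.
    """
    valid = [s for s in scores if s is not None]
    counts = Counter(valid)
    n_valid = len(valid)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / n_valid, 1),
            round(100 * running / n_valid, 1),
        ))
    return rows, len(scores) - n_valid


def flag_out_of_range(scores, low, high):
    """Return (position, score) pairs for scores outside the possible range."""
    return [(i, s) for i, s in enumerate(scores)
            if s is not None and not (low <= s <= high)]


# Hypothetical self-reported GPAs on a 0-to-4 scale; None marks a missing value.
gpa = [3.2, 2.8, 6.0, 3.5, None, 2.8, 4.0, 3.2, 2.8, None]

rows, n_missing = frequency_table(gpa)
impossible = flag_out_of_range(gpa, 0.0, 4.0)

print("value  freq  valid%  cum%")
for value, freq, pct, cum in rows:
    print(f"{value:5.1f}  {freq:4d}  {pct:6.1f}  {cum:5.1f}")
print("missing values:", n_missing)
print("impossible scores (position, value):", impossible)
```

In this hypothetical batch, the table shows a GPA of 6.0 in its last row, and the same score is flagged as impossible. The counts of missing and impossible scores are the kind of information that should be recorded and reported.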
About the Author
Rebecca M. Warner is Professor Emerita at the University of New Hampshire. She has taught statistics in the UNH Department of Psychology and elsewhere for 40 years. Her courses have included Introductory and Intermediate Statistics as well as seminars in Multivariate Statistics, Structural Equation Modeling, and Time-Series Analysis. She received a UNH Liberal Arts Excellence in Teaching Award, is a Fellow of both the Association for Psychological Science and the Society of Experimental Social Psychology, and is a member of the American Psychological Association, the International Association for Statistical Education, and the Society for Personality and Social Psychology. She has consulted on statistics and data management for the World Health Organization in Geneva, Project Orbis, and other organizations; and served as a visiting faculty member at Shandong Medical University in China. Her previous book, The Spectral Analysis of Time-Series Data, was published in 1998. She has published articles on statistics, health psychology, and social psychology in numerous journals, including the Journal of Personality and Social Psychology. She has served as a reviewer for many journals, including Psychological Bulletin, Psychological Methods, Personal Relationships, and Psychometrika. She received a BA from Carnegie Mellon University in social relations in 1973 and a PhD in social psychology from Harvard in 1978. She writes historical fiction and is a hospice volunteer along with her Pet Partner certified Italian greyhound Benny, who is also the world’s greatest writing buddy.
mean of 80.9.
This example illustrates two things:
* When one very high score is added to this sample, the value of M increases (while the values of the median and mode do not change). This demonstrates that the mean is less robust against the impact of extreme scores than the median and mode.
* With one or more extremely high scores added, the value of the sample mean M is higher than the median; and in this example, M is actually higher than the majority of the individual scores in the sample. Under these circumstances the sample mean M is not a very good way to describe “average” or typical responses. Note that adding an extremely low score will make the mean smaller than the median.
4.8 Behavior of Mean, Median, and Mode in Common Real-World Situations
This section previews the use of graphs to represent score frequencies for quantitative variables (graphs are discussed more extensively in Chapter 5). Figure 4.4 shows a frequency table for a set of hypothetical scores. A corresponding histogram presents the same information graphically; the height of each bar in the histogram corresponds to the frequency of that score (i.e., the number of people who had that score value).
4.8.1 Example 1: Bell-Shaped Distribution
First let’s consider a hypothetical batch of scores for which the mean, median, and mode have similar values. Suppose you have a survey question that asks people to rate their degree of agreement with this statement: “I think that the U.S. economy is doing well.” Response options are scores of 1 = strongly disagree (SD), 2 = disagree (D), 3 = neutral (N), 4 = agree (A), and 5 = strongly agree (SA). We might obtain a frequency distribution like the one in Figure 4.4. Note that the answer given by the largest number of people corresponds to 3 (neutral), the next highest frequency responses were 2 (disagree) and 4 (agree), and the most extreme responses, 1 (strongly disagree) and 5 (strongly agree), were uncommon. For now, we will call this pattern a “bell-shaped” distribution. (Later, we'll talk more formally about normal distributions.) Bell-shaped distributions tend to have values of the mean, median, and mode that are close to one another.
In the graph in the lower part of Figure 4.4, the number above the bar for each score value (such as 6) corresponds to the frequency of that score in the table (in the upper part of Figure 4.4). For example, in this hypothetical data set, a score of 1 had a frequency of 6. A score of 3 had a frequency of 33 (i.e., 33 people chose the answer 3). The histogram or graph at the bottom of Figure 4.4 represents the same information about frequencies using bars with heights that correspond to frequency. This distribution can be informally defined as bell shaped; there is a peak in the middle, and the pattern is symmetrical; that is, the left-hand side of the distribution is approximately a mirror image of the right-hand side.
Figure 4.4 Hypothetical Likert Scale Ratings With Bell-Shaped Frequency Distribution: (a) Frequency Table and (b) Corresponding Histogram
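
The two ideas in this part of the chapter (the mean, median, and mode agree for a symmetric bell-shaped distribution, and the mean is less robust than the median and mode when an extreme score is added) can be checked with a few lines of Python. This is a rough sketch under stated assumptions, not the data behind Figure 4.4: only the frequencies for scores of 1 (6) and 3 (33) come from the text, the remaining frequencies are assumed so that the distribution is symmetric, and the small batch of seven scores with an added extreme score of 300 is entirely hypothetical. The helper name expand is invented for this example.

```python
from statistics import mean, median, multimode


def expand(freq):
    """Turn a {score: frequency} table into a flat list of scores."""
    scores = []
    for value, count in sorted(freq.items()):
        scores.extend([value] * count)
    return scores


# Bell-shaped ratings: frequencies for 1 and 3 are from the text;
# the frequencies for 2, 4, and 5 are ASSUMED for illustration.
bell_freq = {1: 6, 2: 20, 3: 33, 4: 20, 5: 6}
bell = expand(bell_freq)
print("bell shaped:", mean(bell), median(bell), multimode(bell))

# Hypothetical small sample, then the same sample with one extreme score added.
sample = [70, 72, 73, 73, 73, 75, 76]
with_extreme = sample + [300]

for label, scores in [("original", sample), ("with extreme score", with_extreme)]:
    print(label, "mean:", round(mean(scores), 2),
          "median:", median(scores),
          "mode(s):", multimode(scores))
```

For the assumed bell-shaped table, all three indexes equal 3. In the small sample, adding the score of 300 leaves the median and mode at 73 but pulls the mean up to 101.5, which is higher than every score except the extreme one.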
in Figure 4.5 is an example of bimodal or polarized ratings (i.e., scores tend to be at one extreme or the other). Most people gave a rating of 1 (strongly disagree) or 5 (strongly agree). The highest mode is for a rating of 5. The frequency for a rating of 1 was almost as high. Very few people gave ratings between these extremes. Figure 4.5 shows the frequency table and histogram for this hypothetical outcome.
Figure 4.5 Hypothetical Likert Scale Ratings for Polarized or Bimodal Responses: (a) Frequency Table and (b) Histogram
(a) Likertpolarized
Rating   Frequency   Percent   Valid Percent   Cumulative Percent
1        24          36.9      36.9            36.9
2        5           7.7       7.7             44.6
3        4           6.2       6.2             50.8
4        4           6.2       6.2             56.9
5        28          43.1      43.1            100.0
Total    65          100.0     100.0
(In the table, the percentages 36.9 and 43.1 are circled.)
(b) Histogram. Horizontal axis: Degree of agreement, from 1 to 5. Vertical axis: Frequency, from 0 to 30. Bar heights are 24, 5, 4, 4, and 28.
Note: Mean = 3.11, median = 3, and modes are 5 and 1.
In this example, because the distribution in Figure 4.5 is bimodal, with one mode at the highest possible score and a second mode at the lowest possible score, neither the mean (M = 3.11) nor the median (Mdn = 3) describes typical or average response very well. In fact, very few people gave ratings close to 3. We get a better sense of “typical” responses if we report the two modes. People either love liberal policies or hate them. The point of this example is that in some frequency distributions, the mean and median may not be good ways to describe typical or average response.
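
The values reported for Figure 4.5 can be reproduced from the frequencies in its table (24, 5, 4, 4, and 28 for ratings 1 through 5). The Python sketch below is offered only as an illustration; the variable names are invented for this example. It computes the mean and median, lists the two most frequent ratings, and shows how few respondents chose the rating nearest the mean.

```python
from statistics import mean, median

# Frequencies for ratings 1 to 5, as listed in the Figure 4.5 table.
polarized_freq = {1: 24, 2: 5, 3: 4, 4: 4, 5: 28}

# Expand the frequency table into one score per respondent.
ratings = [value for value, count in polarized_freq.items() for _ in range(count)]

m = round(mean(ratings), 2)
mdn = median(ratings)

# The two most frequent ratings (the two peaks of the bimodal distribution).
two_peaks = [value for value, _ in
             sorted(polarized_freq.items(), key=lambda item: item[1], reverse=True)[:2]]

# Percentage of respondents who actually chose the rating nearest the mean.
pct_rating_3 = round(100 * polarized_freq[3] / len(ratings), 1)

print("N:", len(ratings))
print("mean:", m, "median:", mdn)
print("two most frequent ratings:", two_peaks)
print("percent who chose 3:", pct_rating_3)
```

The mean (3.11) and median (3) both land on a rating that only about 6% of respondents chose, which is why reporting the two peaks (5 and 1) describes these data better.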
4.8.3 Example 3: Skewed Distribution
Some variables represent counts of events or behaviors, for example, How many children do you have? How many speeding tickets have you received? Distributions for variables like these often have many responses of 0, 1, or 2 (with a smallest possible value of 0). However, the highest responses can be 8, 10, or more. For these types of variables, the shape of a distribution is often asymmetrical or skewed. A frequency table of hypothetical answers to the question “How many children do you want to have in the future?” appears in Figure 4.6.
The distribution in Figure 4.6 is described as “positively skewed” because there is a longer (and thinner) tail at the positive end of the distribution. In this positively skewed distribution, there are a few extreme scores at the high end (e.g., the persons who said they wanted 11 and 16 children). In this example, the mean of 2 is not a good indication of typical responses (more than half of the people in this sample
Figure 4.6 Frequency Distribution for Hypothetical Scores on Number of Children Wanted: (a) Frequency Table and (b) Histogram
(a) children
Value    Frequency   Percent   Valid Percent   Cumulative Percent
0        21          31.8      31.8            31.8
1        17          25.8      25.8            57.6
2        9           13.6      13.6            71.2
3        8           12.1      12.1            83.3
4        4           6.1       6.1             89.4
5        3           4.5       4.5             93.9
7        1           1.5       1.5             95.5
8        1           1.5       1.5             97.0
11       1           1.5       1.5             98.5
16       1           1.5       1.5             100.0
Total    66          100.0     100.0
(b) Histogram. Horizontal axis: Number of children, from 0 to 15. Vertical axis: Frequency, from 0 to 25. Bar heights are 21, 17, 9, 8, 4, 3, 1, 1, 1, and 1.
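
As a rough check on the skewed example, the Python sketch below recomputes the mean, median, and mode from the Figure 4.6 frequencies. Two of the score values in the table are hard to read in this copy; they are ASSUMED here to be 7 and 8, which is consistent with the reported cumulative percentages and with the mean of 2 stated in the text. The variable names are invented for this illustration.

```python
from statistics import mean, median, mode

# Number of children wanted -> frequency.
# Values 0-5, 11, and 16 are from Figure 4.6; the values 7 and 8 are ASSUMED.
children_freq = {0: 21, 1: 17, 2: 9, 3: 8, 4: 4, 5: 3, 7: 1, 8: 1, 11: 1, 16: 1}

# Expand the frequency table into one score per respondent.
wanted = [value for value, count in children_freq.items() for _ in range(count)]
n = len(wanted)

m = mean(wanted)
mdn = median(wanted)
mo = mode(wanted)

# Cumulative percentage of respondents who wanted 0 or 1 children.
pct_zero_or_one = round(100 * (children_freq[0] + children_freq[1]) / n, 1)

print("N:", n)
print("mean:", m, "median:", mdn, "mode:", mo)
print("percent wanting 0 or 1 children:", pct_zero_or_one)
```

Under these assumptions the mean (2) is higher than the median (1) and the mode (0), the ordering expected for a positively skewed distribution, and 57.6% of respondents gave an answer below the mean.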
sometimes cannot even gain access to it.
Much content on websites for news organizations is third-party content. This is content written by someone who may have examined only secondary sources or other thirdhand content, such as news reports or press releases. Often, third-party content is authored by someone who has no technical knowledge of the research field and statistical methods. Examples include articles published by news organizations. These articles usually don’t provide complete or accurate information about research results.
In the past, editors of prestigious newspapers required reporters to fact-check claims carefully. Increasingly, news reports on the web are paraphrases of, or uncritical reposting of, third-party content from other news sources. Some mass media news sources specifically disclaim responsibility for accuracy. Here is an example; many other news organizations post similar disclaimers:
CNN is a distributor (and not a publisher or creator) of content supplied by third parties and users. ... Neither CNN nor any third-party provider of information guarantees the accuracy, completeness, or usefulness of any content. ... (CNN, 2018)
Communicators can provide better quality information when they are closer to original sources of information, and they are likely to provide better quality information when they assume responsibility for accuracy.
In everyday life, most of us rely on thirdhand information most of the time. Because so much of what we think we know is based on thirdhand information, we should not be overly confident about things we think we know.
1.3.4 Communicator Credentials and Skills
Communicators are more believable when they have training and background related to information in the message. Researchers generally have credentials that provide evidence of this training and background, including advanced degrees such as a PhD or MD, affiliations with respected organizations such as universities, and publications in high-quality science journals.
Some journalists have strong credentials in science, but many do not. People who do not have training in statistics can easily misunderstand studies that use statistical terms such as logistic regression and odds ratios.
Celebrity status is not a meaningful credential. Famous media personalities, such as Dr. Oz (see note 3) and other self-appointed lifestyle or health experts, may base recommendations on incomplete or incorrect information.
Scientific research reports include source information (authors, university affiliations, and so forth). News reports and websites sometimes do not include source information; they provide no basis to evaluate self-interest, distance from information source, and credentials. Guidelines for evaluation of websites are provided by Kiely and Robertson (2016) and Montecino (1998).
1.3.5 Track Record for Truth-Telling
There are independent, nonpartisan organizations that evaluate communicator track records for truth-telling in journalism, for example, the Pulitzer Prize-winning site
science data. Negative skewness is possible (with a few extreme scores at the low end) but less common.
3. If a distribution is bell shaped or approximately normal, the values of the mean, median, and mode will be close together. The mean is a good way to describe central tendency for bell-shaped distributions; the median and mode will have similar values.
4. When in doubt, or if the situation is complicated, it may be better to report the entire frequency distribution (and/or histogram) along with values for the mean, median, and one or more modes.
Good practice:
* Do preliminary data screening by examining a frequency distribution table and graph to evaluate whether the mean, median, and/or mode(s) are better ways to describe central tendency.
  o If implausible score values appear, go back and reexamine the data to correct errors.
* Note the number of missing values.
* State whether extreme scores or multiple modes were detected (or whether the distribution is approximately normal).
* State clearly what statistic is used (mean, median, or mode) to describe average responses.
Bad practice:
* Obtain a mean, median, or mode without examining a frequency table or graph.
* Select the index of central tendency value that “fits the narrative.” For example, if you want to report a high average, you can select whichever of these three statistics has the highest value, whether it makes sense or not. This is deceptive.
* Fail to make clear which index of central tendency is reported, and fail to note potential problems with it.
Chapter 1 mentioned “lying with statistics.” Reports of central tendency can be deceptive when they present only selected information that creates the impression the author wants to create. When an author wants readers to think, “Wow, that average is really high,” the author might choose to report the highest of the three values (mean, median, or mode). Conversely, if the author wants readers to think, “Wow, that average is really low,” the author might choose to report the lowest value among mean, median, and mode. An author who cherry-picks the highest “average” is presenting misleading (although perhaps not technically false) information.
4.10 Using SPSS to Obtain Descriptive Statistics for a Quantitative Variable
Previous sections discussed statistics for central tendency; the following sections discuss statistics to describe variability. In this section, SPSS is used to obtain all these descriptive statistics (to describe both central tendency and variability) from data in the file named temphr10.sav using the SPSS frequencies procedure.
To run Frequencies, make these menu selections (as in the example in Chapter 3): Analyze → Descriptive Statistics → Frequencies. This opens the main dialog box for the frequencies procedure; in this window, move the variable hr into the Variables window. Click the Statistics button in the top right-hand corner of the main dialog box for the frequencies procedure to open the Frequencies: Statistics dialog box (shown on the right-hand side of Figure 4.7). There is a checkbox menu; click these checkboxes as shown to select central tendency statistics and statistics to describe variability (in the area headed “Dispersion”). The statistics that describe variability are explained in upcoming sections. Click Continue to exit from the Frequencies: Statistics box and return to the main Frequencies dialog box, and click OK in the main dialog box to run the analysis. Output appears in Figure 4.8.
Figure 4.7 SPSS Frequencies: Statistics Dialog Box to Obtain Descriptive Statistics for Quantitative Variables
The image is a combination screenshot that shows how to select descriptive statistics as well as an SPSS statistics dialog box. In the first part of the image, a close-up of the menu bar of a spreadsheet shows different navigation buttons including Analyze, Graphs, Utilities, Extensions, Window, and Help. On clicking the Analyze button, a drop-down menu with the following options has opened: reports, descriptive statistics, Bayesian statistics, tables, compare means, general linear model, generalized linear models, mixed models, correlate, regression, loglinear, classify, dimension reduction, scale, nonparametric tests, forecasting, survival, multiple response, simulation, quality control, ROC curve, spatial and temporal modeling, and IBM SPSS Amos. The descriptive statistics item has been selected, leading to another menu with the following options: frequencies, descriptives, explore, crosstabs, TURF analysis, ratio, P-P plots, and Q-Q plots.
The second part of the image shows the Frequencies: Statistics dialog box. On the top left are the Percentile Values with the following checkboxes: Quartiles, Cut points for equal groups (with an empty slot to fill in for the number of equal groups), and Percentile(s). The top right has a Central Tendency section with the following check options: mean, median, mode, and sum. The first three have been checked. The bottom left has a Dispersion segment with the following check options: std. deviation, minimum, variance, maximum, range, and S.E. mean. All except for S.E. mean have been checked. On the bottom right is a checkbox that states Values are group midpoints. Below this is a section titled Characterize Posterior Dist. with two checkboxes: skewness and kurtosis. At the bottom are buttons for Continue, Cancel, and Help.
The values for mean, median, and mode in Figure 4.8 agree with the values obtained in earlier sections by hand, and they are close together. This example verifies the by-hand computations for mean and median done in previous sections for the same set of scores.
The next section describes variability or variation in quantitative scores. You will see how
Figure 4.8 Output for Descriptive Statistics for Hypothetical Heart Rate Data in temphr10.sav
Statistics (hr)
N Valid           10
N Missing         0
Mean              73.10
Median            73.50
Mode              75
Std. Deviation    5.666
Variance          32.100
Range             20
Minimum           62
Maximum           82
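As a cross-check outside SPSS, the same descriptive statistics can be reproduced with a few lines of Python. This is only a sketch: it assumes the ten hr scores are typed in by hand from the hypothetical temphr10.sav data used in this chapter, and the names hr and summary are chosen here for illustration (they are not SPSS names).

```python
import statistics

# Hypothetical heart rate scores from temphr10.sav, typed in by hand (assumption)
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]

summary = {
    "n": len(hr),
    "mean": statistics.mean(hr),
    "median": statistics.median(hr),
    "mode": statistics.mode(hr),
    "sd": statistics.stdev(hr),            # sample SD (divisor N - 1)
    "variance": statistics.variance(hr),   # sample variance (divisor N - 1)
    "range": max(hr) - min(hr),
    "min": min(hr),
    "max": max(hr),
}

# Print each statistic to three decimal places, similar to SPSS output
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

The printed values should agree with the SPSS output for these scores; any difference would suggest a data entry error rather than a difference in formulas, because both use N − 1 as the divisor for the variance.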
population of “all depressed persons,” ideally, we would want a random sample drawn from that population. However, participants are often convenience samples, that is, people who were easy to recruit.
It is important to know what kinds of people were (and were not) included in a study. For example, if a drug study finds evidence that a new medication is effective and safe for healthy young men, that does not necessarily mean that the drug is also effective and safe for women, elders, children, and other kinds of people not included in the study.
Be careful not to overgeneralize results, particularly when there is little information about the types and numbers of people (or cases) included. It makes sense to generalize information from a small group to some larger population only when people in the group resemble the population of interest. This is discussed further in Chapter 2 in sections about samples and populations.
In science communication, authors are expected to discuss limitations that must be considered before drawing any conclusions. Limitations include the number and kinds of people (or cases) included in a study. Science writing should make limitations of evidence clear; media reporting often does not.
1.6 Making Causal Claims
In everyday life, and in science, we often want to know about causal connections. Consider a question raised by Wootson (2017). Do diet (artificially sweetened) soft drinks cause weight gain? If you are concerned about weight gain, and if artificially sweetened soft drinks cause weight gain, then you might consider avoiding diet soft drinks to avoid weight gain. However, it is possible that the association reported in some studies did not arise because of any direct causal impact of diet soft drinks on weight. Perhaps when people drink diet soft drinks, they feel free to indulge in other high-calorie foods, and perhaps it is those other high-calorie foods, not the soft drinks in and of themselves, that cause weight gain. If that is the correct explanation, then what you need to do to avoid weight gain is to avoid consuming high-calorie foods (rather than reduce diet soda consumption).
Causal explanations are attractive because they tie events together in meaningful ways. This is useful in science as well as everyday life. Sometimes when a cause-effect relationship is known, it suggests what we can do to change outcomes.
Demonstrating that two events are causally connected can be difficult, because there are often rival possible explanations. Well-controlled experiments can rule out many rival explanations. In everyday life, people sometimes jump to conclusions about causality on the basis of insufficient evidence.
1.6.1 The “Post Hoc, Ergo Propter Hoc” Fallacy
News commentators frequently offer causal explanations for events (e.g., the stock market went down because of a blizzard the previous day). This explanation is often just an opinion of the news commentator. The stock market might have gone down for other reasons (including random variations). This is an example of a common fallacy, post hoc, ergo propter hoc. This Latin phrase means “after this, therefore, because of this.” This (incorrect) reasoning goes like this: If Event A happens, and then Event B happens, then A must have caused B. Before we
higher or lower than other people’s. Statistical analyses you will learn later in the course provide ways to evaluate how much of the individual differences in hr might be related to each variable, such as anxiety.
4.12.2 Step 2: Sum of Squared Deviations
Next, we need to summarize information about distances from the mean across all the people in the sample. You might think that you could summarize information by summing the deviations, the values of (X − M), across all people in the data set. However, recall from Section 4.6 that this sum of deviations from the mean is always zero. It might occur to you that this problem could be avoided by summing the absolute values of these deviations. However, there is another approach that yields more useful results.
Here we introduce another tool in the statistician’s bag of tricks. When deviations sum to zero, squaring the deviations makes all the terms in this sum positive.
Notice that we square each individual deviation first; then we add those squared deviations. Appendix 4A describes rules about precedence in the order of arithmetic operations. Operations that are enclosed in parentheses are done before operations outside the parentheses. For example, if you see the expression Σ(X²), you square the value of each X, and then sum the squared values. If you see the expression (ΣX)², you sum the values of X and then square that sum.
Sometimes textbook examples use numbers that give a whole-number result for SS; however, in real data, SS is usually not a whole number.
We return to the question: How much do people’s scores in a sample vary or differ relative to the sample mean? In words, the answer to this question is: We find out how far each X score is from the mean by computing a deviation, we square each deviation, then we sum the squared deviations to summarize information about distance from the mean. This gives the formula for SS, the sum of squared deviations of scores from their mean:
$$SS = \sum \left[(X - M)^2\right]. \tag{4.5}$$
A different version of the formula for SS is often given in introductory textbooks:
$$SS = \sum (X^2) - \left[\left(\sum X\right)^2 / N\right]. \tag{4.6}$$
Equation 4.5 makes it easier to see what information about scores is included when you compute SS. Equation 4.6 is easier for by-hand computation of SS from scores. They yield the same results.
To summarize information about individual score distances from the mean: First, we square each person’s deviation from the mean. (Squaring a negative value yields a positive value, so squaring deviations gets rid of the problem that positive and negative deviations would cancel each other out by summing to 0.) Then we sum those squared deviations. The resulting sum is called the sum of squares (or sum of squared deviations), abbreviated SS. In upcoming steps, SS will be used to compute sample variance and standard deviation.
Appendix 4B reviews rounding. I suggest that you retain at least three decimal places during computations. Final results for most statistics are often rounded to two decimal places. See Appendix 4B for a discussion of rounding.
In Figure 4.9 (data from temphr10.sav) the squared deviation from the mean for each individual person appears in the last column (the variable named deviationsq). Adding the scores for deviationsq gives the value of SS for this data set: SS = 288.90. For larger data sets, it is more convenient to have a computer program do this.
Figure 4.9 Deviations and Squared Deviations of Heart Rate Scores From Mean
hr      deviation    deviationsq
62      -11.10       123.21
69      -4.10        16.81
70      -3.10        9.61
71      -2.10        4.41
73      -0.10        0.01
74      0.90         0.81
75      1.90         3.61
75      1.90         3.61
80      6.90         47.61
82      8.90         79.21
Sum     0.00         288.90
Note that SS cannot be a negative number (because we are summing squared deviations, and squared numbers cannot be negative).
Other factors being equal, SS tends to be larger when:
1. The individual (X − M) deviations from the mean are larger in absolute value.
2. The number of squared deviations included in the sum increases.
The minimum possible value of SS (which is 0) occurs when all the X scores are equal and, therefore, equal to M. For example, in the set of scores [73, 73, 73, 73, 73], the SS term would equal 0. There is no limit, in practice, for the maximum value of SS.
To interpret SS as information about variability, we need to correct for the fact that SS tends to be larger when the number of squared deviations included in the sum is large. Dividing by N, the number of scores in the sample, seems like the obvious solution. However, this does not provide the best answer.
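The sum of squared deviations described in this section can be checked with a short Python sketch. It assumes the ten hr scores from the hypothetical temphr10.sav data are typed in by hand, and it computes SS two ways: from squared deviations (the form in Equation 4.5) and from the by-hand computational form (Equation 4.6). The variable names are illustrative only.

```python
# Hypothetical heart rate scores from temphr10.sav, typed in by hand (assumption)
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]
n = len(hr)
mean = sum(hr) / n

# Step 1: deviation of each score from the sample mean
deviations = [x - mean for x in hr]

# Equation 4.5: square each deviation first, then sum the squared deviations
ss_from_deviations = sum(d ** 2 for d in deviations)

# Equation 4.6: sum of squared scores minus (sum of scores) squared, divided by N
ss_computational = sum(x ** 2 for x in hr) - (sum(hr) ** 2) / n

print("sum of deviations:", round(sum(deviations), 6))  # zero, apart from rounding error
print("SS (Equation 4.5):", round(ss_from_deviations, 2))
print("SS (Equation 4.6):", round(ss_computational, 2))
```

Both versions should give SS = 288.90 for these scores. The first makes the order of operations explicit (square, then sum); the second squares the sum of scores only after the summation is complete.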
4.12.3 Step 3: Degrees of Freedom
It might seem logical to divide SS by N to correct for the increase in size of SS as N increases. However, this yields values that are slightly too small; Gosset (discussed in Tankard, 1984) worked out the reason for the problem and discovered a simple solution. When we look at the pieces of information used to compute SS (i.e., the deviation of each score from the sample mean), it is possible to see that we do not have N independent deviations (or pieces of information) available to compute the SS; in fact, we have only (N − 1) pieces of information.
To explain why deviations from the mean in a sample of N scores provide only (N − 1) independent pieces of information about distance from the mean, recall that the sum of all deviations of scores from the mean must equal 0. Suppose we have N = 3 scores in a sample (call these scores X1, X2, and X3) and that their mean is M.
First, we convert each X score into a deviation by subtracting the sample mean M. We know that the sum of these deviations must equal zero. That yields this simple equation:
$$(X_1 - M) + (X_2 - M) + (X_3 - M) = 0.$$
We can rearrange this equation by subtracting (X3 − M) from both sides; the equation becomes:
$$(X_1 - M) + (X_2 - M) = -(X_3 - M).$$
When we compute (X1 − M) + (X2 − M) (on the left side of the equation), this determines the value that the remaining deviation, (X3 − M), must have: it is the negative of that sum. Only the first two deviations are “free to vary,” that is, free to take on any possible value. Once we know the value of any two of the deviations, the value of the last deviation is determined (it must be whatever number is needed to make the sum of all deviations equal 0). This is only a demonstration, not a formal proof.
This modified divisor, N − 1, is called the degrees of freedom (df). The df term tells us how many of the deviations are “free to vary.” The use of df instead of N as a divisor is another frequently used tool in the statistician's bag of tricks. Later analyses also use df terms, although df often has different values than (N − 1) in other situations. Degrees of freedom for the SS and sample variance are obtained using Equation 4.7:
$$df = (N - 1). \tag{4.7}$$
4.12.4 Putting the Pieces Together: Computing a Sample Variance
The variance for a sample is usually denoted s². A sample variance is obtained by dividing SS by its degrees of freedom:
$$s^2 = SS/(N - 1) \quad \text{or} \quad SS/df. \tag{4.8}$$
(Some textbooks use S² to denote a sample variance calculated as SS/N. In actual practice, this notation is almost never used when statistics are applied to real-world data, and you will not see S² again in this book.)
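The sample variance in Equation 4.8 can be sketched in Python as follows. The hr scores are assumed to be the ten hypothetical values from temphr10.sav, typed in by hand; the names ss, df, s2, and ss_over_n are illustrative. The SS/N version is included only to show that it gives a slightly smaller value than SS/df.

```python
import statistics

# Hypothetical heart rate scores from temphr10.sav, typed in by hand (assumption)
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]
n = len(hr)
mean = sum(hr) / n

ss = sum((x - mean) ** 2 for x in hr)  # sum of squared deviations
df = n - 1                             # Equation 4.7
s2 = ss / df                           # Equation 4.8: sample variance
ss_over_n = ss / n                     # SS/N version, slightly smaller than s2

print("df:", df)
print("s2 = SS/df:", round(s2, 3))
print("SS/N:", round(ss_over_n, 3))

# Python's statistics.variance also divides by N - 1; pvariance divides by N
print("statistics.variance:", round(statistics.variance(hr), 3))
print("statistics.pvariance:", round(statistics.pvariance(hr), 3))
```

For these scores, SS/df should match the variance reported by SPSS (32.100), and the library function statistics.variance uses the same N − 1 divisor.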
Return to the data in Figure 4.9. The first column shows heart rate scores for each person. The second column shows the deviation of each person’s score from the mean (the variable name is deviation). The third column shows each
available when we compute SS or another
review weeds out much poorly conducted research and improves the quality of published papers. The community of scientists in effect systematically polices the work of all individual scientists.
1.7.2 Replication and Accumulation of Evidence
A second important mechanism for data quality control in academic research is replication. Replication means repeating or redoing a study. This can be an exact replication (keeping all methods the same) or a replication that changes elements of the study, such as location, measures, or type of participants, to evaluate whether the same results occur in different situations. We should not treat findings from any one study as a conclusive answer to a research question. Any single study may have unique problems or flaws. In an ideal world, before we accept a research claim, we should have a substantial body of good-quality and consistent evidence to back up that claim; this can be obtained from replications.
Peer review and replication in science are fallible. However, they provide the best ongoing quality control checks we have. In contrast to science, there are few quality control mechanisms for most mass media communication.
1.7.3 Open Science and Study Preregistration
There are recent initiatives to improve the reproducibility and quality of research results in biomedicine, psychology, and other fields (Begley & Ioannidis, 2015; Open Science Collaboration, 2015). The Open Science model includes components such as preregistration of research plans and sharing details of data and methods. For further discussion, see Cumming and Calin-Jageman (2016).
1.8 Biases of Information Consumers
1.8.1 Confirmation Bias (Again)
Information consumers or receivers also tend to select evidence consistent with their preexisting beliefs. Media consumers need to be aware that they can systematically miss kinds of information (which may be of high or low quality) when they select news sources they like. Ratings of many web news sources on a continuum from left/liberal to right/conservative, along with assessment of accuracy, are provided at https://mediabiasfactcheck.com/politifact/. News sources that are extremely far left or far right tend to be less accurate.
Because of confirmation bias, people can get stuck: They continue to believe “facts” that aren’t true, and ideas that are wrong, because they never expose themselves to information that might prompt them to consider different possibilities. Consumers of mass media usually avoid evidence that challenges their beliefs. Philosopher of science Karl Popper argued that scientists also need to examine evidence that might falsify their beliefs. Scientists and people in general should consider evidence that challenges their beliefs.
1.8.2 Social Influence and Consensus
Should we believe something simply because many people, particularly those whom we know
when you compute the following. The values of M and SD can be combined to set up ranges of score values; that is, we can combine information about the mean and information about typical distances from the mean. This can be done using integer multiples of SD, such as M ± 1 × SD and M ± 2 × SD.
For M = 64.5 and SD = 2.5, we obtain the following values for the hypothetical female height data:
$$\begin{aligned}
M - 2 \times SD &= 64.5 - 5 = 59.5.\\
M - 1 \times SD &= 64.5 - 2.5 = 62.\\
M + 0 \times SD &= 64.5 + 0 = 64.5.\\
M + 1 \times SD &= 64.5 + 2.5 = 67.\\
M + 2 \times SD &= 64.5 + 5 = 69.5.
\end{aligned}$$
The shorter vertical arrow next to the frequency table in Figure 4.10 extends from M − (1 × SD) to M + (1 × SD). This corresponds to the frequencies enclosed in the smaller ellipse. The longer vertical arrow ranges from M − (2 × SD) to M + (2 × SD), score values from 59.5 to 69.5. This corresponds to scores in the larger ellipse. Most women in the sample had heights that were included in the range M − (2 × SD) to M + (2 × SD); only three women (2.5%) had scores below 59.5, and only two women (1.7%) had scores above 69.5.
In words: When we combine information about distance from the mean (SD) with the location of the mean (M), we obtain information about the range of values within which most of the X scores lie; this is called the range rule. The range rule works only for bell-shaped distributions, as in the present example.
Figure 4.10 Hypothetical Data for Female Height in Inches for N = 120 Women With M = 64.5 and SD = 2.5
The image is a combination of a table and a graph that shows hypothetical data for female height. The table has four columns: valid count (height), frequency, percent, and cumulative percent. Details are below:
58; 1; .8; .8
59; 2; 1.7; 2.5
60; 3; 2.5; 5.0
61; 8; 6.7; 11.7
62; 14; 11.7; 23.3
63; 12; 10; 33.3
64; 20; 16.7; 50
65; 16; 13.3; 63.3
66; 18; 15; 78.3
67; 15; 12.5; 90.8
68; 5; 4.2; 95
69; 4; 3.3; 98.3
70; 2; 1.7; 100
Total; 120; 100
There are two ellipses over the figures: one covers the percent values 11.7, 10, 16.7, 13.3, 15, and 12.5, and the second covers a larger set of percent values including 2.5, 6.7, 11.7, 10, 16.7, 13.3, 15, 12.5, 4.2, and 3.3.
The graph in the second part of the image shows the X and Y axes as well as the 1 × SD and 2 × SD lines. The following figures are mentioned alongside the graph:
* M minus 2 × SD equals 59.5
* M minus 1 × SD equals 62
* M equals 64.5
* M plus 1 × SD equals 67
* M plus 2 × SD equals 69.5
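The score values at M ± 1 × SD and M ± 2 × SD can be computed with a small Python sketch. The function name sd_bands and the variable names are hypothetical, introduced here only for illustration; the values M = 64.5, SD = 2.5, N = 120, and the counts of women below 59.5 (three) and above 69.5 (two) are the ones reported for the hypothetical height data.

```python
def sd_bands(mean, sd, multiples=(-2, -1, 0, 1, 2)):
    """Return the score values at mean + k * sd for each multiple k."""
    return [mean + k * sd for k in multiples]


m, sd = 64.5, 2.5
multiples = (-2, -1, 0, 1, 2)
bands = sd_bands(m, sd, multiples)

for k, value in zip(multiples, bands):
    print(f"M {k:+d} x SD = {value}")

# Share of the sample between M - 2 x SD and M + 2 x SD,
# using the counts reported for the height data
n_total = 120
n_below, n_above = 3, 2
share_within = (n_total - n_below - n_above) / n_total
print(f"Within 2 SD of the mean: {share_within:.1%}")
```

With these inputs, 115 of the 120 women (about 95.8%) fall between 59.5 and 69.5 inches, which is the sense in which “most” scores lie within 2 × SD of the mean for a bell-shaped distribution.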
Here are some approximate (not exact) relationships of SD with data values that can help you understand what SD = 2.5 tells us.
In the preceding example, the range for height scores (70 − 58) was 12. The range rule suggests that, for a bell-shaped distribution, the range is often a little more than 4 × SD. For these data, 4 × SD = 4 × 2.5 = 10. Turning this statement around, the range rule suggests that SD is often a little less than one quarter of the range. Knowing that SD is related to range may help you understand SD. Remember that the range rule works only for bell-shaped distributions.
* The value of SD tells us about typical distances of scores from the sample mean.
* Few scores are lower than 2 × SD units below M or higher than 2 × SD units above M. In other words, 2 × SD is a large distance from the mean; only a small percentage of scores are that far away from M.
* If a research report tells you that the distribution of scores is close to normal with known values for M and SD, this is sufficient information for you to guess the range.
* Using SD = 2.5, individual deviations from the mean height that were 2.5 inches or less in size (either positive or negative deviations) were very common.
* Almost all people had deviations from the mean that were less than 2 × SD in absolute value; 2 × SD = 2 × 2.5 = 5 inches. To say this another way, almost all women had heights between 59.5 and 69.5 inches.
Good practice: To choose the most appropriate statistics to describe central tendency and variability, the data analyst should examine a frequency distribution table or graph. If the distribution is approximately normal, M and SD are good ways to describe these. If the distribution is clearly non-normal, Mdn and interquartile range may be preferred. For distributions that are not bell shaped, see the next chapter for better ways to describe variation among scores.
4.15 Why Is There Variance?
Why do scores differ across people? This is the most fundamental question in applied statistics. For data about humans, the question becomes: What makes people different? Why do some people have higher, and some people lower, heart rates? Why are some people taller and others shorter? Some characteristics do not differ across people (they are constant). Most people have five fingers on each hand. The rare exceptions are people who have genes for a different number of fingers, or people who have lost fingers because of injury. However, characteristics such as heart rate do differ across persons and situations. Suppose you measure hr for all members of a group. Some persons will have low hr; their hr may be lower than average because they are physically fit and do not smoke. Others have high hr; these elevated hr scores might be due to anxiety or caffeine consumption. A first goal of statistical analysis is to quantify or describe how much people differ. Range, variance, and standard deviation provide this information.
We will consider a more interesting question in upcoming chapters: Can we explain or predict these differences in heart rate? Can we understand why people differ? You probably already have some intuitions about factors that are related to hr, for example, smoking and
deletion of cases or replacement of missing values) that could alter conclusions.
2. Make the limitations of the type of statistical analysis clear. (As each new analysis is introduced, you will learn about its limitations.)
3. Avoid behaviors that can lead to errors (including, but not limited to, cherry-picking a few results).
4. Avoid misleading presentations (such as “lying graphs”; see Section 1.10).
5. Avoid language that obscures results.
6. Do not overgeneralize. Do not make strong claims about characteristics of a population when your sample does not resemble that population.
Real-world problems in applications of data analysis are often not clear in introductory courses; students learn to do one analysis at a time using one small set of numbers. In actual practice, data analysts often work with large sets of messy data. Data analysts need to make many choices that involve difficult judgment calls. This book points out differences between the ideal use of statistics in artificially simplified situations and the actual application of statistics to real-world data. Sometimes decisions about “best practice” are difficult.
As Harris (2001) said, “Statistics is a form of social control over the professional behavior of researchers. The ultimate justification for any statistical procedure lies in the kinds of research behavior it encourages or discourages.” Science has rules and standards about good practice in collection, analysis, and presentation of evidence. These are discussed throughout this book.
Researchers should be aware that press releases from universities sometimes overhype research findings (Resnick, 2019).
This book discusses good practices in applied statistics that can potentially improve the clarity and honesty of research reports. When communicators present information in misleading, unclear, or dishonest ways, they risk loss of credibility, trust, and respect, not just for themselves but for the professions of statistics and science. When information consumers rely on incorrect information, they may make poor decisions.
1.10 Lying with Graphs and Statistics
The most extreme form of lying with statistics is fabrication or falsification of data; this is rare. However, some common research practices slant information presentation in ways that can be called “lying with statistics.” The classic book How to Lie With Statistics (Huff, 1954) presented numerous examples.
Deceptive bar graphs are among the most common ways information communicators mislead information consumers. If you will be an information producer, you need to know how to set up “honest” bar graphs. When you are an information consumer, you need to know how to examine graphs to make sure that they are not misleading. Chapter 5 provides examples of clear versus misleading graphs and guidelines for evaluation of graphs.
1.11 Degrees of Belief
People rarely have time to collect all necessary information. Even for questions in science, we often do not have enough information to be confident about conclusions. Uncertainty is more common than people realize, even in areas such as
who are not familiar with the variables will find this helpful to evaluate the obtained scores.
Here are other things a summary table might include: the minimum and maximum scores obtained in the sample, numbers of missing values for each variable, and information about reliability for each variable. If you do research in a specific area, look at tables in published research reports to see if additional information is usually included in summary tables for descriptive statistics.
Table 4.1 [Table body not recoverable from the source. Row groups: well-being variables (LS, PA), health behavior variables (sleep quality), and diet variables (Sugar, NCIfv); columns include possible minimum and possible maximum scores.]
Note. LS is life satisfaction from the Satisfaction With Life Scale; higher scores indicate greater satisfaction. PA is positive affect using the PANAS scale; higher scores indicate more positive mood. Sleep quality was rated on a 1-to-5 scale; 5 indicates the best sleep quality. Sugar is an estimate of daily calorie intake from sugar-containing beverages.
NCIfv is the number of servings of fruit and vegetables in a typical day on the basis of a National Cancer Institute food frequency questionnaire, with responses recoded on a scale from 0 to 8. The modal response was 0. Because of this, and because this variable is the most important predictor variable in this study, the entire frequency distribution should be presented in a separate table (not shown in this chapter). For a categorical variable such as sex, report proportions of male and female respondents as descriptive information (not the mean and standard deviation of scores for sex).
When a study includes many groups and/or many variables, all groups and all variables should be identified and reported in descriptive tables. This lets readers know if you have selectively excluded some groups or variables from the analyses you report later.
4.18 Summary
Research reports often describe scores on quantitative data using the sample mean M, the standard deviation SD (or s), and the variance s². Readers tend to assume that scores for quantitative variables have an approximately bell-shaped distribution (if they are not informed otherwise), and they interpret the descriptive statistics accordingly.
The “bag of tricks” used to compute many statistics is actually quite small, and you have seen several of these tricks in this chapter:
* When a sum of deviations would be zero, square terms before summing them.
* When correcting for the number of deviations (or pieces of information) included in a sum, divide by df instead of by N.
* To put information back into the original terms of measurement, take the square root.
These “tricks” are used again in many future analyses.
You have seen that the sample mean is not always the best description of central tendency. In some frequency distributions, M is much larger (or smaller) than the median, and the magnitude of the mean is influenced strongly by a few extreme scores. When frequencies have more than one mode, or are skewed, M is sometimes not the best description of the “typical” response. When you report a mean, you need to tell readers something about the shape of the frequency distribution to provide the background information needed to understand potential problems with the mean.

Statistics books provide so many examples of bell-shaped distributions that students may assume that all data have this distribution shape. However, many common kinds of variables do not have bell-shaped distributions. Graphs, discussed in Chapter 5, can be used to evaluate whether scores have a bell-shaped distribution or some other distribution shape. We should not assume that all distribution shapes are bell shaped. When reporting information about variables, remember that readers may assume a bell-shaped distribution if you do not explain clearly that the distribution shape is different.

If you read mass media reports about “averages,” you need to know whether average was estimated using the mode, median, or mean; under some circumstances, these three descriptive statistics can yield very different values.

The next chapter provides further information about obtaining and interpreting graphs of frequency distributions and additional questions we can ask about distributions of scores on a quantitative variable.

Appendix 4A: Order of Arithmetic Operations

Many equations combine two or more arithmetic operations, for example, ΣX² includes both squaring and summing X scores. When operations are combined, the result often differs depending upon the order in which operations are done.

Consider this set of scores: X = [1, 3, 5, 2]. If you square each X value and then sum the squared values, you would obtain (1 + 9 + 25 + 4) = 39. If you sum the X's and then square that sum, you would obtain (1 + 3 + 5 + 2)² = 11² = 121. It is important to know which arithmetic operation to do first.

There are rules of precedence (order) for arithmetic operations (see http://mathworld.wolfram.com/Precedence.html). When I present equations I explain in words the order in which computations should be done, and often, I use extra parentheses to make this clear in the equation. When an expression appears within parentheses, such as (X − 5), do that operation first. If you see Σ(X²), square each X value first, and then sum the squared X values: (1 + 9 + 25 + 4) = 39. If you see (ΣX)², sum the X values first, and then square the sum: (1 + 3 + 5 + 2)² = 11² = 121.

Be aware that if you do arithmetic operations in the wrong order, you can obtain answers that are incorrect by huge amounts.

Appendix 4B: Rounding

Computer programs often provide numbers given to several decimal places. Each number that comes after a decimal point represents one decimal place. For example, the number 4.171 has three decimal places.

If you do by-hand computations, you should retain at least three decimal places during your computations to minimize rounding error. Final results are usually rounded to a small number of decimal places, often two decimal places. The preferred number of decimal places to report differs across disciplines and may differ across variables. Use common sense. It would be silly to say that the average American gets 7.481 hours of
sleep per night; it would make more sense to report this as 7.5 hours. If you are in doubt, report more decimal places than you think reviewers or editors or readers will want; these can always be rounded later. Use past research in your area of interest as a guide for the number of decimal places to report.

Here are simple rules for rounding. If a final digit is greater than 5, the digit before it is increased by one unit when you round (this is “rounding up”). For example, 3.86 would be rounded to 3.9. If the final digit is less than 5, the digit before it is left the same when rounding (this is called “rounding down”); for example, 3.83 would be rounded to 3.8. If the final digit is exactly 5, you can toss a coin to decide whether to round up or down. In many journal articles, and for many statistics such as M, results are presented to one or two decimal places. One exception is that p values, introduced in later chapters, are often reported to three decimal places.

Comprehension Questions

1. Fill in each blank using either mean or median.
   1. The ______ is the value for which 50% of people in the sample have scores above, and 50% in the sample have scores below.
   2. The ______ is the value for which the sum of deviations equals 0.
   3. If extremely high scores are present, the ______ may be so high that most people in the sample have scores that fall below it.
   4. If you change one or two of the highest scores in the sample to higher values, the value of the ______ will not change, but the value of the ______ will get higher.
   5. True or false: The mode, median, and mean can be equal, but they do not have to be equal.
2. Consider the following small set of scores. Each number represents the number of siblings reported by each of the N = 6 persons in the sample: X scores are [0, 1, 1, 1, 2, 7].
   1. What is the median for these scores?
   2. What is the mode for these scores?
   3. Compute the mean (M) for this set of six scores.
   4. Do you think the median or the mean is a better way to describe the “typical” number of siblings? Why?
   5. Why is the mean higher than the median for this batch of data?
   6. Compute the six deviations from the mean (X − M), and list these six deviations.
   7. Sum these six deviations. What is the sum of the six deviations? Is this outcome a surprise?
   8. Now calculate the sum of squared deviations (SS) for this set of six scores.
   9. Compute the sample variance, s², for this set of six scores.
   10. When you compute s², why should you divide SS by (N − 1) rather than by N?
   11. Finally, compute the sample standard deviation (denoted by either s or SD).
   12. Write a sentence in which you summarize the information about this variable (including N, M, SD, Min, and Max).
3. In your own words, what does SS tell us about a set of data? Under what circumstances will the value of SS equal 0? Can SS ever be negative? Why or why not?
4. What would the value of SS be for this set of scores: [103, 103, 103, 103, 103, 103]? (You should not need to do any computations.)
5. Think about the SS values you might obtain if you computed SS for these two samples:
   • Sample A: X = [103, 156, 200, 300, 98]
   • Sample B: X = [101, 102, 103, 102, 101]
   1. Which sample will have a larger SS value? (You should not need to calculate SS to answer this.)
   2. Will a sample that has a larger SS value also have larger values for s² and s (assuming N is the same)? Explain your answer briefly.
6. Consider a quantitative variable (such as body temperature given either in degrees Fahrenheit or Celsius). List all the descriptive statistics information you could present to describe results about central tendency and variability for temperature.
7. Suppose that IQ scores are normally distributed with M = 100 and SD = 15. Use the range rule to approximate the sample range on the basis of values of M and SD.
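Readers who work outside SPSS may find it useful to check the order-of-operations point from Appendix 4A in code. The following is a minimal Python sketch, not part of the book's SPSS workflow; it uses the appendix scores X = [1, 3, 5, 2], and the variable names are illustrative assumptions. It also applies the three computational “tricks” from this chapter (square the deviations, divide by df, take the square root).

```python
import math

# Scores from the Appendix 4A example.
X = [1, 3, 5, 2]
N = len(X)

# Sum of squared scores: square each X value first, then sum.
sum_of_squares = sum(x ** 2 for x in X)   # 1 + 9 + 25 + 4

# Squared sum: sum the X values first, then square the total.
square_of_sum = sum(X) ** 2               # (1 + 3 + 5 + 2) squared

# The three "tricks": square deviations, divide by df, take the square root.
M = sum(X) / N                            # sample mean
SS = sum((x - M) ** 2 for x in X)         # sum of squared deviations
df = N - 1                                # degrees of freedom
variance = SS / df                        # sample variance
sd = math.sqrt(variance)                  # sample standard deviation

print("Sum of X squared:", sum_of_squares)
print("Square of summed X:", square_of_sum)
print("M, SS, variance, SD:", M, SS, round(variance, 3), round(sd, 3))
```

The two results (39 versus 121) differ only because of the order in which squaring and summing are done, which is the point made in the appendix.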
Digital Resources
Find free study tools to support your learning, including eFlashcards, data sets, and web resources, on the accompanying website at
Graphs: Bar Charts, Histograms, and Boxplots

5.1 Introduction

Information about scores that was presented in the form of frequency tables in Chapters 3 and 4 can be presented in simple graphs. This chapter describes some widely used types of graphs: pie charts and bar charts for categorical variables, and histograms and boxplots for quantitative variables. Each approach (frequency table vs. graph) has advantages and potential disadvantages:

1. An advantage of frequency tables is that they provide exact information about the numbers or percentages of persons who had each score value. The corresponding disadvantage of graphs is that when they are poorly labeled, it is difficult to identify exact numbers and percentages.
2. A disadvantage of graphs is that they can be constructed in ways that create deceptive impressions. Frequency tables generally are not deceptive.
3. An advantage of graphs is that they provide appealing visual information that grabs readers’ attention; this is particularly useful in mass media reports, PowerPoint or Prezi presentations, and poster presentations at professional conferences. A disadvantage of frequency tables is that they do not have much visual appeal.
4. An advantage of graphs for quantitative variables (such as histograms) is that they provide easily understandable information about distribution shape. This chapter describes several distribution shapes commonly seen in real data (examples appear in Tables 5.1 and 5.2). The bell-shaped curve (more formally, the normal distribution or Gaussian distribution) is of particular interest. The normal distribution will be discussed further in Chapter 6. A disadvantage of frequency tables is that it can be difficult (although it is possible) to evaluate distribution shape by inspection of a frequency table.

Ideally, preliminary data screening includes frequency tables (Chapter 3), descriptive statistics (Chapter 4), and graphs (the present chapter). Frequency tables are rarely included in published research reports. Graphs of frequency distributions are not often reported in journal articles, although they can be. Information in frequency tables can be used to label graphs accurately.

SPSS does not produce publication-quality graphics. For beginners, this is not a major problem; the graphs are adequate for preliminary data screening. Advanced users may prefer other programs to generate graphics. The R supplement for this book (Rasco, 2020) demonstrates use of the ggplot procedure; this produces better quality graphics. I modified most SPSS graphics in this book by editing to increase font sizes and add information.

In real-world data analysis, descriptive statistics, frequency tables, and graphs should be examined before a data analyst conducts the main analysis that is of primary interest (such as a t test or analysis of variance [ANOVA]). These provide information needed for preliminary data screening. Published research reports typically include only a few sentences about preliminary screening (if they mention it at all). Hoekstra, Kiers, and Johnson (2012) noted that many
authors don’t report much about data screening; they argue that the validity of statistical results is often questionable because assumptions required for statistical analysis are not satisfied (and often not even checked). Potential violations of some of the assumptions that are introduced later can be assessed by examining graphs.

5.2 Pie Charts for Categorical Variables

Pie charts are almost universally despised by scientists, and you are unlikely to see them in academic journals; however, they are popular in mass media, so you should be familiar with them. Consider the frequency table for hypothetical scores for the categorical variable marital status (Figure 5.1).

Recall that the “Cumulative Percent” column automatically provided by SPSS makes no sense for categorical variables. Focus on the “Frequency” and “Percent” columns. To request a pie chart, use the familiar Frequencies procedure, beginning with these SPSS menu selections: Analyze > Descriptive Statistics > Frequencies. Click the Charts button to open the Frequencies: Charts dialog box in Figure 5.2; within that window, select the radio button for “Pie charts,” then click Continue and OK. Edited pie chart output appears in Figure 5.3.

Figure 5.1 Frequency Table for Hypothetical Marital Status Scores

maritalstatus    Frequency   Percent   Valid Percent   Cumulative Percent
never married        20        47.6        47.6              47.6
engaged               4         9.5         9.5              57.1
married              11        26.2        26.2              83.3
divorced              4         9.5         9.5              92.9
widowed               3         7.1         7.1             100.0
Total                42       100.0       100.0

Figure 5.2 Use of Frequencies: Charts Dialog Box to Request a Pie Chart
[Screenshot: the Frequencies dialog box with the variable maritalstatus selected and “Display frequency tables” checked. In the Frequencies: Charts dialog box, the chart type options are None, Bar charts, Pie charts, and Histograms; “Pie charts” is selected, and chart values are set to Frequencies.]

Figure 5.3 Pie Chart for Hypothetical Marital Status Data, N = 42
[Pie chart with five slices: never married, married, divorced, engaged, and widowed. The largest slice is never married, followed by married, engaged, divorced, and widowed.]

The frequency table in Figure 5.1 tells us that the group with the largest number of members is “never married”; this corresponds to the solid “slice” in the pie chart. The frequency table has a great advantage over the pie chart; it provides exact frequencies and percentages, while the pie chart only approximates group sizes (unless the slices are labeled using numbers or percentages).

Kopf (2015) reviewed reasons why many data analysts hate pie charts. For example, people are not good at estimating percentages from the areas of the slices. Pie charts require the use of colors (or textures such as dots or stripes) to differentiate slices; most science journals do not publish figures in color. Tufte (2001), who authored several books about excellence in graphing, regards most multicolored figures as unsightly; he argues that graphs should use as little ink as possible to provide complete information.

Pie charts have only two virtues. They provide colorful slides in presentations, and this is something that some data analysts (in marketing, for example) may like. Also, they lend themselves well to humor. (Search online for “funny pie charts” to find examples, or create your own comic version. Perhaps you can persuade your instructor to give a prize or extra credit for the most comical or ingenious examples.) If you become a science researcher, you will probably never use pie charts.

5.3 Bar Charts for Frequencies of Categorical Variables

The SPSS Frequencies procedure, which was used in previous chapters to obtain frequency tables, can also provide charts (or graphs). To open the Frequencies dialog box that appears in Figure 5.4, make these menu selections: Analyze > Descriptive Statistics > Frequencies. Click the Charts button on the right-hand side of the Frequencies dialog box to open the Frequencies: Charts box; a bar chart is obtained by selecting the radio button for “Bar charts” in the Frequencies: Charts dialog box (also shown in Figure 5.4). The Y axis may be given in frequencies (number of cases) or percentages. Click Continue to return to the main Frequencies dialog box. Click OK to run the procedure.

The hypothetical marital status scores in Figure 5.1 were used to set up the bar graph in Figure 5.5. The height of each bar represents group size. I edited the bar graph produced by SPSS (using the SPSS Chart Editor and Microsoft Paint) in the following ways: I increased font sizes for the X and Y axis labels and added the exact number of cases per group (from the frequency table) above each bar.

Figure 5.4 Frequencies Dialog Box and Frequencies: Charts Dialog Box
[Screenshot: the Frequencies dialog box with the variable marital selected and a “Display frequency tables” check box. In the Frequencies: Charts dialog box, “Bar charts” is selected from the chart type options (None, Bar charts, Pie charts, Histograms), and chart values are set to Frequencies.]

Figure 5.5 Bar Chart for Hypothetical Marital Status Groups, Total N = 42
[Bar chart. X axis: marital status. Y axis: frequency. Values labeled above the bars: never married, 20; engaged, 4; married, 11; divorced, 4; widowed, 3.]
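The percentages used to label a chart accurately can be recomputed from the group counts alone. The sketch below is a plain Python illustration (not SPSS output, and not the book's procedure); the counts are copied from Figure 5.1, and the function name `frequency_table` is a hypothetical helper introduced here.

```python
# Group counts copied from Figure 5.1 (hypothetical marital status data).
counts = {
    "never married": 20,
    "engaged": 4,
    "married": 11,
    "divorced": 4,
    "widowed": 3,
}


def frequency_table(counts):
    """Return (total N, rows); each row is (label, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, freq in counts.items():
        running += freq
        percent = round(100 * freq / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((label, freq, percent, cumulative))
    return total, rows


total, rows = frequency_table(counts)
print(f"{'group':<15}{'freq':>6}{'percent':>9}{'cum. percent':>14}")
for label, freq, percent, cumulative in rows:
    print(f"{label:<15}{freq:>6}{percent:>9}{cumulative:>14}")
print(f"{'Total':<15}{total:>6}")
```

As noted in the text, the cumulative percent column is computed here only to mirror the SPSS table; it has no useful interpretation for a categorical variable such as marital status.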
5.4 Good Practice for Construction of Bar Charts

Bar charts and other graphs should provide accurate information that is easy to understand. It is easier for readers to understand graphs when they follow simple rules and conventional standards.

1. A separate bar represents the frequency (or proportion or percentage of cases) for each group. The height of the bar corresponds to the number or frequency in each group (or the proportion or percentage of cases in each group). The labels on the Y axis should make clear whether frequency, proportion, or percentage is reported. However, the relative heights of the bars are the same no matter which label is used. (Usually bars are vertical, but it is possible to set up bar charts in which bars are horizontal.)
2. Names of groups are specified by labels on the X axis.
3. Bars should have equal widths. (This rule is not always followed.)
4. The height of the graph (Y axis) is usually less than the width of the X axis (the height of Y is often about 75% the length of X).
5. The Y axis begins at 0 (or at another minimum value of Y).
6. The top of each bar is labeled with an exact numerical value (a frequency or a percentage). SPSS does not do this for you; I added this information using SPSS Chart Editor.
7. Information about total N must be provided.
8. In a footnote or the body of the text, the source of data should be stated. Readers tend to assume that numbers are based on new data collected by the researcher; if there is another source (such as Gallup polls or the U.S. census), that source must be identified.
9. Bars in bar graphs for categorical variables usually do not touch one another. (This reminds readers that bars represent distinct groups.)

When you generate bar charts for frequencies in SPSS, many of these good form requirements are taken care of by default (e.g., bars are equal widths, and the Y axis begins at 0).

5.5 Deceptive Bar Graphs

The most common way to make a bar chart for group frequencies “lie” is to set up the Y axis so that it does not start at 0. To illustrate this deception, I modified the graph in Figure 5.5 so that the Y axis begins at 2 (instead of 0). The modified bar chart in Figure 5.6 is potentially misleading because people tend to look at the ratio of bar heights (or bar areas) when they compare group sizes; people often do not pay close attention to the specific values indicated on the Y axis. In Figure 5.6, the differences in group sizes appear larger than in Figure 5.5. In Figure 5.6, the never married group appears to have about 10 times as many members as the widowed group (measure the height of the bar for never married and divide this by the height of the bar for the widowed group). When actual group sizes are considered (20 never married, 3 widowed), the never married group is only about 7 times as large as the widowed group.

Figure 5.6 An Example of Bad Practice: Deceptive Bar Chart for Frequency of Marital Status
[Bar chart. X axis: marital status (never married, engaged, married, divorced, widowed). Y axis: frequency. Group sizes: never married, 20; engaged, 4; married, 11; divorced, 4; widowed, 3.]

Figure 5.7 Deceptive Bar Chart: Use of Cartoons Instead of Bars to Represent Frequencies
[Chart. X axis: year (2009 and 2019). Y axis: number of new houses built, ranging from 0 to 10,000.]
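The distortion described in Section 5.5 can be expressed as simple arithmetic: when the Y axis starts above 0, the drawn height of each bar is its value minus the axis starting value, so the ratio of drawn heights no longer equals the ratio of group sizes. The Python sketch below illustrates this under that assumption; `apparent_ratio` is a hypothetical helper, and the computed ratios are arithmetic on the group sizes (20 and 3), not measurements taken from the printed Figure 5.6, so they need not match what a ruler gives on that figure.

```python
def apparent_ratio(larger, smaller, axis_start=0.0):
    """Ratio of drawn bar heights when the Y axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller group size")
    return (larger - axis_start) / (smaller - axis_start)


never_married, widowed = 20, 3

# Compare the ratio of drawn heights for several axis starting values.
for start in (0, 1, 2):
    ratio = apparent_ratio(never_married, widowed, axis_start=start)
    print(f"Y axis starts at {start}: drawn height ratio = {ratio:.1f}")
```

With the axis starting at 0, the drawn ratio equals the true ratio of group sizes (about 7). Raising the starting value inflates the apparent difference, which is why good practice is to begin the Y axis at 0.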
mean of 80.9.

This example illustrates two things:
• When one very high score is added to this sample, the value of M increases (while the value of the median and mode do not change). This demonstrates that the mean is less robust against the impact of extreme scores than the median and mode.
• With one or more extremely high scores added, the value of the sample mean M is higher than the median; and in this example, M is actually higher than the majority of the individual scores in the sample. Under these circumstances the sample mean M is not a very good way to describe “average” or typical responses. Note that adding an extremely low score will make the mean smaller than the median.

4.8 Behavior of Mean, Median, and Mode in Common Real-World Situations

This section previews the use of graphs to represent score frequencies for quantitative variables (graphs are discussed more extensively in Chapter 5). Figure 4.4 shows a frequency table for a set of hypothetical scores. A corresponding histogram presents the same information graphically; the height of each bar in the histogram corresponds to the frequency of that score (i.e., the number of people who had that score value).

4.8.1 Example 1: Bell-Shaped Distribution

First let’s consider a hypothetical batch of scores for which the mean, median, and mode have similar values. Suppose you have a survey question that asks people to rate their degree of agreement with this statement: “I think that the U.S. economy is doing well.” Response options are scores of 1 = strongly disagree (SD), 2 = disagree (D), 3 = neutral (N), 4 = agree (A), and 5 = strongly agree (SA). We might obtain a frequency distribution like the one in Figure 4.4. Note that the answer given by the largest number of people corresponds to 3 (neutral), the next highest frequency responses were 2 (disagree) and 4 (agree), and the most extreme responses, 1 (strongly disagree) and 5 (strongly agree), were uncommon. For now, we will call this pattern a “bell-shaped” distribution. (Later, we'll talk more formally about normal distributions.) Bell-shaped distributions tend to have values of the mean, median, and mode that are close to one another.

In the graph in the lower part of Figure 4.4, the number above the bar for each score value (such as 6) corresponds to the frequency of that score in the table (in the upper part of Figure 4.4). For example, in this hypothetical data set, a score of 1 had a frequency of 6. A score of 3 had a frequency of 33 (i.e., 33 people chose the answer 3). The histogram or graph at the bottom of Figure 4.4 represents the same information about frequencies using bars with heights that correspond to frequency. This distribution can be informally defined as bell shaped; there is a peak in the middle, and the pattern is symmetrical; that is, the left-hand side of the distribution is approximately a mirror image of the right-hand side.

Figure 4.4 Hypothetical Likert Scale Ratings With Bell-Shaped Frequency Distribution: (a) Frequency Table and (b) Corresponding Histogram
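The claim that a symmetric, bell-shaped set of ratings has a mean, median, and mode that nearly coincide can be checked with a short Python sketch. Only two frequencies are stated in the text (a score of 1 had a frequency of 6, and a score of 3 had a frequency of 33); the frequencies used below for scores 2, 4, and 5 are assumptions chosen to make the distribution symmetric, so this is an illustration and not a reproduction of Figure 4.4.

```python
import statistics

# Frequencies for Likert scores 1-5.
# From the text: score 1 -> 6, score 3 -> 33.
# ASSUMED for illustration only: score 2 -> 20, score 4 -> 20, score 5 -> 6.
frequencies = {1: 6, 2: 20, 3: 33, 4: 20, 5: 6}

# Expand the frequency table into a list of individual scores.
scores = [score for score, freq in frequencies.items() for _ in range(freq)]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

print("N =", len(scores))
print("mean =", mean, "median =", median, "mode =", mode)
```

Because the assumed frequencies are perfectly symmetric around 3, all three statistics equal 3 here; with real bell-shaped data they would usually be close rather than identical.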
[Tables 5.1 and 5.2: examples of distribution shapes. The graphics were not recoverable; the row labels and notes that survive are listed below.]
• Bell-shaped, or normal, or Gaussian.
• “Spiky” but close to normal: Reasonably close to normal (I'd call this “spiky,” but that is not a term you would use in a research report).
• Approximately normal with outliers at one or both ends.
• Reverse J-shaped: One mode is at or near 0. (This could be called severely positively skewed, but it's more extreme.)
• Uniform distribution: Occurs if X scores are ranks (e.g., in this example five cases are ranked, with ranks of 1, 2, 3, 4, and 5).
• Bimodal distribution with modes at end points: Possible situation: A degree of agreement question for which opinions are strongly polarized (SD = strongly disagree, D = disagree, N = neutral, A = agree, SA = strongly agree).
• Positively skewed: More “weight” at the low end and a longer, thinner tail at the high end.
• [Label lost] ... normally distributed samples with means that are far apart.
• Negatively skewed: Longer, thinner tail at the low end of the distribution. Negatively skewed distributions are not very common in behavioral science data.
• Trimodal: Three modes that are not at end points. (Could be scores for three normally distributed samples that have different means.)
• This distribution does not look like any specific distribution shape.
However, when we look at real data, we often see distributions that do not look anything like normal or bell-shaped curves. Table 5.2 shows distributions with shapes that are clearly not close to bell-shaped or normal. Tables 5.1 and 5.2 do not include all possible distribution shapes; there are many others.

To decide whether one or more of these distribution shapes best describe the data in your sample, you can obtain a histogram and compare it with the examples in these tables. Visual examination of a histogram is usually sufficient to make reasonable evaluations about distribution shapes. In Chapter 6, you'll see that there are quantitative methods to evaluate how well data fit a specific distribution shape; however, these are rarely used in practice.

The bell-shaped distribution in row 1 of Table 5.1 is discussed extensively in statistics. Informally, we can describe this bell-shaped distribution shape as follows.
• There is a “hump” in the center of a bell-shaped distribution. In a perfectly normal distribution, the mean, median, and mode are exactly equal and correspond to the center of the distribution (and all correspond to the top of the hump).
• Frequencies (the heights of the bars in the histogram) decline gradually as scores become either larger or smaller than the mean, median, or mode; this creates a shape something like a bell.
• The distribution is symmetrical around the mean. That is, the upper half of the histogram is a mirror image of the lower half.

Comprehension questions will ask you to examine histograms and evaluate whether the distribution is bell-shaped with minor variations or is described better by quite different distribution shapes. This is a somewhat subjective judgment call. Sometimes the best decision is to say that none of the common distribution shapes is a good description for a histogram you obtain for your data. People who often work with specific types of variables (such as reaction time) will learn the specific distribution shapes for those variables.

5.7 Obtaining a Histogram Using SPSS

The hypothetical female height data in femaleheight.sav are used to set up a histogram. You may have wondered why height and temperature are used as variables in early examples. Each of these variables can be given in different units (for example, height can be inches or centimeters; temperature can be given in degrees Fahrenheit or Celsius). (The United States is one of very few nations that still uses nonmetric units such as inches.) The following example shows how to obtain histograms; examples demonstrate that converting units of measurement from inches to centimeters does not change the shape of the frequency distribution (although unit conversion does change the values of M, SD, and other descriptive statistics).

You will find it useful to be able to convert scores from one unit to another, and to do other computations. The SPSS Compute Variable command can be used to do this and has many additional potential uses. In this situation, we use this command to obtain (approximate) height in centimeters by multiplying height in inches by 2.54. To open the Compute Variable dialog box in Figure 5.8, select these menu options: Transform > Compute Variable. In the left-hand window, type the name of the new variable (in this example, heightcm). In the right-hand window, type a numerical expression that includes the name of one (or more) existing variable(s) that is used to assign values to the new variable (in this example, the numerical expression is “2.54*heightinch”). After you click OK, the new variable heightcm will appear as a new column on the right-hand side of your data worksheet.

Now let’s compare the distributions of height in inches and height in centimeters. The familiar Frequencies procedure (used to obtain descriptive statistics and pie and bar charts for categorical variables) can be used to request histograms for quantitative variables. Use these SPSS menu selections: Analyze > Descriptive Statistics > Frequencies. Move both variables (heightinch and heightcm) into the Variable(s) pane. Click the Charts button and select the radio button for “Histograms.” You may also want to check the box for “Show normal curve on histogram.” Click the Statistics button and use checkboxes in the Frequencies: Statistics dialog box to choose the desired descriptive statistics. Click OK to run the procedure. Output for descriptive statistics appears in Figure 5.9 and the histograms in Figure 5.10.

Figure 5.8 SPSS Compute Statement to Convert Height From Inches to Centimeters
[Screenshot: the SPSS data window for femaleheight.sav with the Transform menu open and Compute Variable selected. In the Compute Variable dialog box, the Target Variable field contains heightcm and the Numeric Expression field contains 2.54 * heightinch. The dialog box also shows a keypad, a Function group list, an optional case selection (If) condition, and the buttons OK, Paste, Reset, Cancel, and Help.]

Figure 5.9 Descriptive Statistics for Hypothetical Female Heights in Inches and Centimeters
Statistics            heightinch    heightcm
N  Valid                 120           120
N  Missing                 0             0
Mean                     64.48      163.7665
Median                   64.50      163.8300
Mode                     64         162.56
Std. Deviation            2.463       6.25614
Variance                  6.067      39.139
Minimum                  58         147.32
Maximum                  70         177.80
Percentiles  25          63.00      160.0200
             50          64.50      163.8300
             75          66.00      167.6400
Figure 5.10 Histograms for Hypothetical Female Height Data
In the first diagram, the X axis denotes the height in inches, which ranges from 58 to 72, rising in increments of 2. The Y axis denotes the frequency and ranges from 0 to 20, rising in increments of 5. The SD has been specified as 2.5 on either side of the mean. A curve drawn through each of the bars of the histogram is approximately bell shaped. The second diagram's X axis denotes the height in centimeters and ranges from 140 to 180, rising in increments of 10. The Y axis denotes the frequency and ranges from 0 to 20, rising in increments of 5. The SD has been specified as 6.25 on either side of the mean. A curve drawn through each of the bars of the histogram is approximately bell shaped.
As you might expect, transformation of scores from inches to cm changed all values for descriptive statistics, such as mean and standard deviation. For example, the mean for height in centimeters is 2.54 times the mean for height in inches. Each descriptive statistic for height in centimeters is 2.54 times the corresponding statistic for height in inches (except that the variance for height in centimeters is 2.54² = 6.4516 times the variance in inches).
Did this transformation change the shape of the distribution? Figure 5.10 shows that the distributions of height scores given in inches and centimeters have identical shapes, even though individual scores and descriptive statistics such as M and SD are in different units, and the units along the X axis differ.
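The effect of the inches-to-centimeters conversion on descriptive statistics can be checked with a few lines of code. The sketch below is illustrative only: the five heights are made-up values (they are not the scores in femaleheight.sav), and the variable names are hypothetical.

```python
import math
import statistics

CM_PER_INCH = 2.54

# Made-up heights in inches (illustrative; not the femaleheight.sav scores).
height_inch = [61, 63, 64, 66, 68]
height_cm = [x * CM_PER_INCH for x in height_inch]

mean_in, mean_cm = statistics.mean(height_inch), statistics.mean(height_cm)
sd_in, sd_cm = statistics.stdev(height_inch), statistics.stdev(height_cm)
var_in, var_cm = statistics.variance(height_inch), statistics.variance(height_cm)

# Mean and SD are multiplied by 2.54; variance is multiplied by 2.54 squared.
print(math.isclose(mean_cm, CM_PER_INCH * mean_in))
print(math.isclose(sd_cm, CM_PER_INCH * sd_in))
print(math.isclose(var_cm, CM_PER_INCH ** 2 * var_in))

# Each score's distance from the mean in SD units is unchanged by the
# conversion, which is why the two histograms have the same shape.
z_in = [(x - mean_in) / sd_in for x in height_inch]
z_cm = [(x - mean_cm) / sd_cm for x in height_cm]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(z_in, z_cm)))
```

All four lines should print True for any set of scores, because multiplying every score by a constant rescales the X axis without moving scores relative to one another.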
I have marked M and SD in the two histograms above. The sample mean is approximately in the middle, marked by the letter M on the X axis.
Recall that SD summarizes information about distances of scores from the mean; SD is shown as horizontal arrows that indicate distance from the mean of X. For height in inches, SD was 2.5 inches. The end points of the arrows that indicate the distance of one SD below M and one SD above M are:
Lower end point: M − 1 SD = 64.5 − 2.5 = 62
Upper end point: M + 1 SD = 64.5 + 2.5 = 67
In Chapter 6, you'll learn about the mathematical definition of normal distribution shape (expressed in the form of a somewhat complicated equation). That equation generates the smooth curves superimposed on the histograms above.
5.8 Describing and Sketching Bell-Shaped Distributions
When sample data are approximately normally distributed, you need only three pieces of information to specify the distribution, communicate information about it to someone else, and/or draw a sketch of that distribution. These pieces of information are:
1. The distribution shape (normal).
2. The sample mean M.
3. The sample standard deviation SD.
For example, scores on many IQ tests are normally distributed with a mean of 100 and a standard deviation of 15 (or sometimes 16). This is enough information to sketch the shape of the distribution, as shown in Figure 5.11.
Figure 5.11 Sketch Based on Three Pieces of Information: Normal Shape, M = 100, SD = 15
The X axis ranges from 55 to 145, rising in increments of 15. The mean is the highest point of the curve, at 100, and the curve is symmetrical on both sides. There are three arrows on either side of the mean.
The range rule (from the previous chapter) will help you identify the approximate locations of the minimum and maximum values on the X axis, and this range is divided into six parts (the range is approximately equal to 6 × SD if the distribution is normal). Note that you won't be able to label the Y axis in this graph. You can label the following seven points along the X axis if you know M and SD. The seven X axis values marked in Figure 5.11 are calculated from M and SD as follows:
M − 3 × SD = 100 − (3 × 15) = 55
M − 2 × SD = 100 − (2 × 15) = 70
M − 1 × SD = 100 − (1 × 15) = 85
M + 0 × SD = 100 + 0 = 100
M + 1 × SD = 100 + (1 × 15) = 115
M + 2 × SD = 100 + (2 × 15) = 130
M + 3 × SD = 100 + (3 × 15) = 145
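The seven X axis values can also be generated in code. This is a minimal sketch; the function name landmarks is hypothetical and is used here only for illustration.

```python
def landmarks(m, sd):
    """Return the seven X axis values M - 3 SD, ..., M, ..., M + 3 SD."""
    return [m + k * sd for k in range(-3, 4)]

points = landmarks(100, 15)
print(points)  # [55, 70, 85, 100, 115, 130, 145]

# The distance between the first and last landmark is 6 x SD (the range rule).
print(points[-1] - points[0])  # 90
```

The same function can be used with the height example: landmarks(64.5, 2.5) gives the points from 57 to 72 inches.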
Score locations relative to the mean can be approximately described as follows (we will describe distance from the mean more precisely in Chapter 6). A score can be called “not very far from the mean” if it lies within the range M − 1 SD to M + 1 SD. For example, an IQ of 110 is not very far from the mean. An X score can be called “far from the mean” if it is below M − 2 SD or above M + 2 SD. For example, an IQ of 135 is far above the mean, and an IQ of 69 is far below the mean. A score can be called “unusually far from the mean” if it is less than M − 3 SD or greater than M + 3 SD. For example, an IQ of 50 is unusually far below the mean, while an IQ of 150 is unusually far above the mean.
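These verbal labels can be expressed as a small function. This is a sketch under stated assumptions: the function name is hypothetical, and the wording returned for scores between 1 and 2 SD from the mean is my own placeholder, because the rough labels above do not name that zone.

```python
def describe_distance(x, m, sd):
    """Rough verbal label for how far score x lies from the mean m."""
    distance_in_sd = abs(x - m) / sd
    if distance_in_sd > 3:
        return "unusually far from the mean"
    if distance_in_sd > 2:
        return "far from the mean"
    if distance_in_sd <= 1:
        return "not very far from the mean"
    # Placeholder wording (an assumption): this zone has no label in the text.
    return "between 1 and 2 SD from the mean"

for iq in (110, 135, 69, 50, 150):
    print(iq, describe_distance(iq, m=100, sd=15))
```

With M = 100 and SD = 15, this labels 110 as not very far, 135 and 69 as far, and 50 and 150 as unusually far, matching the examples above.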
Another way to look at this: If you had a set of 1,000 IQ scores with M = 100 and SD = 15, and you selected one case at random, the most likely outcome would be an IQ in the range from 85 to 115. You could obtain a case with an IQ in the range 150 and up, but that would be an unusual or unlikely outcome.
If you know your own IQ, or any specific IQ score, you can locate that score on the X axis, and immediately see the following: Is your IQ score above or below the mean M? Is it far from the mean, or unusually far from the mean? Refer to Figure 5.11. An IQ of 90 is below the mean, but it is not very far from the mean. An IQ of 160 is above the mean, and it is unusually far from the mean (in other words, very few people have IQ scores equal to or greater than 160).
5.9 Good Practices in Setting up Histograms
Most rules for good practice in bar chart construction also apply to the construction of histograms:
1. A separate bar represents the frequency (or proportion or percentage of cases) for each score value (or for a range of score values, as described later in this section).
2. The height of each bar corresponds to the number or frequency in each group (or the proportion or percentage of cases in each group).
3. Labels on the Y axis should make clear whether frequency, proportion, or percentage is reported. (However, the relative heights of the bars are the same regardless of labels.)
4. Score values are specified by labels on the X axis.
5. Bars should have equal widths.
6. The height of the graph (Y axis) is usually less than the width of the X axis.
7. The Y axis begins at 0.
8. In bar charts, it is good practice to label the top of each bar with an exact numerical value (a frequency or a percentage). There may not be enough space on a histogram to include such labels. Clearly labeled tick marks on the Y axis help readers evaluate frequencies.
9. Information about total N must be provided.
10. In a footnote or the body of the text, the source of data should be stated. Readers tend to assume that numbers are based on new data collected by the researcher; if there is another source (such as Gallup polls or the U.S. census), that
science data. Negative skewness is possible (with a few extreme scores at the low end) but less common.
3. If a distribution is bell shaped or approximately normal, the values of the mean, median, and mode will be close together. The mean is a good way to describe central tendency for bell-shaped distributions; the median and mode will have similar values.
4. When in doubt, or if the situation is complicated, it may be better to report the entire frequency distribution (and/or histogram) along with values for the mean, median, and one or more modes.
Good practice:
* Do preliminary data screening by examining a frequency distribution table and graph to evaluate whether the mean, median, and/or mode(s) are better ways to describe central tendency.
* If implausible score values appear, go back and reexamine the data to correct errors.
* Note the number of missing values.
* State whether extreme scores or multiple modes were detected (or whether the distribution is approximately normal).
* State clearly what statistic is used (mean, median, or mode) to describe average responses.
Bad practice:
* Obtain a mean, median, or mode without examining a frequency table or graph.
* Select the index of central tendency value that “fits the narrative.” For example, if you want to report a high average, you can select whichever of these three statistics has the highest value, whether it makes sense or not. This is deceptive.
* Fail to make clear which index of central tendency is reported, and fail to note potential problems with it.
Chapter 1 mentioned “lying with statistics.” Reports of central tendency can be deceptive when they present only selected information that creates the impression the author wants to create. When an author wants readers to think, “Wow, that average is really high,” the author might choose to report the highest of the three values (mean, median, or mode). Conversely, if the author wants readers to think, “Wow, that average is really low,” the author might choose to report the lowest value among mean, median, and mode. An author who cherry-picks the highest “average” is presenting misleading (although perhaps not technically false) information.
4.10 Using SPSS to Obtain Descriptive Statistics for a Quantitative Variable
Previous sections discussed statistics for central tendency; the following sections discuss statistics to describe variability. In this section, SPSS is used to obtain all these descriptive statistics (to describe both central tendency and variability) from data in the file named temphr10.sav using the SPSS frequencies procedure.
To run Frequencies, make these menu selections (as in the example in Chapter 3): <Analyze> → <Descriptive Statistics> → <Frequencies>. This opens the main dialog box for the frequencies procedure; in this window, move the variable hr into the Variables window. Click the Statistics button in the top right-hand corner of the main dialog box for the frequencies procedure to open
The X axis ranges from 10 to 40, rising in increments of 10, and the Y axis has just one number, 1250. The histogram is a big bar stretching from 15 to 40 along the X axis and reaching up to the single value 1250 on the Y axis.
Figure 5.13 Large Number of Bins (“Too Many” Bins) With Jagged Distribution Shape
The X axis denotes the BMI and ranges from 10 to 40, rising in increments of 10. The Y axis denotes the frequency and ranges from 0 to 60, rising in increments of 10. There are several bars, signifying many bins, and the distribution has several spikes in the center, with the right end becoming almost flat. The curve drawn through the bars resembles a normal distribution that is positively skewed.
Figure 5.14 Optimal Number of Bins Determined by SPSS for BMI Histogram
The X axis denotes the BMI and ranges from 10 to 40, rising in increments of 10. The Y axis denotes the frequency and ranges from 0 to 200, rising in increments of 50. There are several bars, signifying the bins, and the distribution spikes in the center, with the right end becoming almost flat. The curve drawn through the bars is approximately normal except for a few outliers at the high end of the distribution. The bars for 18.5 and 24.9 have been specifically marked out.
Figure 5.14 shows the histogram for BMI scores when SPSS was allowed to decide on the “optimal” number of bins. I marked the clinical cutoffs for normal BMI (18.5 and 24.9) on this histogram as points of reference.
In Figure 5.14, it is clear that the distribution shape for BMI was approximately normal except for a few outliers at the high end of the
distribution, that is, a few cases with unusually high BMI. The SPSS default choice for the number and widths of bins provided a relatively smooth histogram to use for evaluation of the distribution shape for this data set. (SPSS does not publish the details of how this decision is made, and rules for this can be complex.)
When your variable can be evaluated in terms of clinical guidelines (a BMI between 18.5 and 24.9 is generally described as indicating healthy body weight), it can be useful to evaluate distribution shape relative to these clinical cutoffs. A large proportion of students had BMI scores in the “healthy” range. A fairly substantial minority of students had BMI scores that would be judged overweight or very overweight; a few had BMI scores that would be described as underweight. (A frequency table would provide the information needed to find the exact percentages of persons who were over- or underweight.) This distribution is positively skewed; it has a longer tail at the high end. Skewness is discussed further in Chapter 6.
It is desirable to have bins that correspond to the same ranges of score values, but this is not feasible in some situations. Figure 5.15 shows a histogram for real data: the percentages of households whose annual incomes fall into ranges such as less than $5,000, between $5,001 and $10,000, and so forth. In the histogram in Figure 5.15, each bar (except
at the upper end of the income distribution, many additional bars would be needed to represent the full range of incomes in the United States. If you drew an X axis wide enough to include all these additional bars, the graph would have to be at least five times wider than shown in Figure 5.15. To avoid that problem, information about incomes greater than $200,000 was compressed into two bars. When looking at graphs like this, readers need to notice how the last few bars were defined. A first impression might be that there is a mode for incomes between $200,000 and $205,000, but this impression is incorrect. In fact, there is an extremely long and thin tail for this income distribution (the distribution is extremely positively skewed).
Figure 5.15 Annual Household Income in the United States in 2010
[Histogram; the Y axis shows % of households, from 1% to 6%.]
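The way a catch-all final bar can create a false impression of a mode can be illustrated with a small sketch. Everything here is hypothetical: the incomes are made-up numbers (not the 2010 household data), the bin width of $25,000 is chosen only to keep the output short, and bin_counts is an illustrative helper rather than anything from SPSS.

```python
def bin_counts(values, width, cutoff):
    """Equal-width bins below `cutoff`, plus one catch-all bin for the rest."""
    counts = [0] * (cutoff // width + 1)   # last slot is the catch-all bin
    for v in values:
        if v >= cutoff:
            counts[-1] += 1
        else:
            counts[v // width] += 1
    return counts

# Made-up incomes: most are low, and six are scattered along a long thin tail.
incomes = ([12_000] * 6 + [37_000] * 9 + [62_000] * 7 + [88_000] * 4
           + [120_000] * 2 + [160_000]
           + [210_000, 260_000, 340_000, 450_000, 700_000, 950_000])

counts = bin_counts(incomes, width=25_000, cutoff=200_000)
print(counts)  # [6, 9, 7, 4, 2, 0, 1, 0, 6]

# With equal-width bins all the way up, the same six incomes would fall
# into six different bins, far apart from one another.
tail_bins = sorted(v // 25_000 for v in incomes if v >= 200_000)
print(tail_bins)  # [8, 10, 13, 18, 28, 38]
```

The final count of 6 looks like a second peak, but it comes only from lumping the whole tail into one bar; with equal-width bins no bin in the tail would hold more than one case.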
, as shown in Figure 5.18. In most real-life situations, researchers want to compare boxplots for the same variable for two or more groups, as in the following example. BMI is an index of body weight corrected for height. Using data in the file bmi.sav, we will examine BMI scores separately for men and women. In the first Boxplot dialog box (Figure 5.19), highlight the box for “Simple” boxplot and select the radio button for “Summaries for groups of cases.” In the Define Simple Boxplot: Summaries for Groups of Cases dialog box (in Figure 5.20), the name of the variable for the plot (heightinches) is moved into the variables list. The resulting boxplot graph appears in Figure 5.21.
On the basis of output from the SPSS frequencies procedure (not shown here), the median BMI was 23 for men and 22 for women. There were numerous outliers for both groups, mostly higher BMI scores. You need to know that when two scores have the same value, SPSS draws just one circle. Each circle indicates the row number in the SPSS data file where the outlier score is located. You can determine the number of scores identified as outliers by counting these numbers. To find the number of nonextreme outliers, count the case numbers for the open circles and ignore the case numbers for the asterisks. An outlier that is not extreme is denoted using an open circle, while outliers labeled as extreme appear as asterisks.
Figure 5.18 Menu Selections to Access Boxplot Dialog Box
At the top of the spreadsheet are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, extensions, window, and help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options. The graphs menu option has been clicked and a drop-down menu shows the following: chart builder, graphboard template chooser, Weibull plot, compare subgroups, regression variable plots, and legacy dialogs. Legacy dialogs has been selected, leading to the next group of menu options: bar, 3-D bar, line, area, pie, high-low, boxplot, error bar, population pyramid, scatter or dot, and histogram. The spreadsheet has five columns and 15 rows filled with numerical data.
Figure 5.19 Initial Boxplot Dialog Box: Select “Simple” and “Summaries for groups of cases”
There are two types of boxplot choices available: simple and clustered. The data in the chart can be of two types: summaries for groups of cases and summaries of separate variables. The box for “Simple” boxplot and the radio button for “Summaries for groups of cases” have been selected. At the bottom of the dialog box are buttons for the following: Define, Cancel, and Help.
Figure 5.20 Define Simple Boxplot: Compare BMI Scores for Male Versus Female Groups
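The distinction between outliers drawn as open circles and extreme outliers drawn as asterisks can be sketched in code. This sketch rests on assumptions that are not stated in this section: it uses the common boxplot convention that a score more than 1.5 box lengths (interquartile ranges) beyond the edge of the box is an outlier and a score more than 3 box lengths beyond it is extreme, and it computes quartiles with Python's "inclusive" method, which may differ slightly from the hinges SPSS uses. The BMI values are made up.

```python
import statistics

def classify_outliers(scores):
    """Map row number (1-based) to 'outlier' or 'extreme' for flagged scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    box_length = q3 - q1
    labels = {}
    for row, x in enumerate(scores, start=1):
        beyond_box = max(q1 - x, x - q3)   # distance past the nearer box edge
        if beyond_box > 3 * box_length:
            labels[row] = "extreme"        # would be drawn as an asterisk
        elif beyond_box > 1.5 * box_length:
            labels[row] = "outlier"        # would be drawn as an open circle
    return labels

# Made-up BMI scores (illustrative; not the bmi.sav data).
bmi = [20, 21, 22, 22, 23, 23, 24, 25, 26, 31, 40]
flags = classify_outliers(bmi)
print(flags)  # {10: 'outlier', 11: 'extreme'}
```

Counting the entries labeled 'outlier' corresponds to counting the open circles in the plot, and counting the entries labeled 'extreme' corresponds to counting the asterisks.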
descriptive statistics for variation (including minimum, maximum, range, variance, and standard deviation) can be obtained by hand and how they are interpreted.
4.11 Minimum, Maximum, and Range: Variation Among Scores
The simplest way to describe variation among scores begins by rank-ordering scores from lowest to highest. The lowest score value is the minimum (often abbreviated as Min); the highest score value is the maximum (Max). As noted in Chapter 3, the range is maximum − minimum. For the heart rate data in Figure 4.1, Min = 62, Max = 82, and range = 20. Why does this information matter? It helps us characterize the variety of people we have in the sample.
When a variable has real-world uses, clinical or other interpretation guidelines can help us understand what the minimum and maximum scores in a sample tell us. For example, guidelines published by the Mayo Clinic state that the normal adult resting heart rate ranges from approximately 60 to 100 beats per minute. A well-conditioned athlete might have a heart rate of about 50 beats per minute. The people in this hypothetical sample all have hr scores within the lower half of the normal range. This tells us that the sample consisted of people with heart rates in the low normal range, and this suggests a sample of persons with good cardiovascular fitness. If the sample had Min hr = 90 and Max hr = 120, this would indicate that many or most of the members of the sample have unusually high heart rates.
When a frame of reference for the evaluation of scores is available, it should be used when characterizing the sample. For example, if depression is assessed, one might ask, Are some scores high enough to warrant diagnoses of mild, moderate, or severe depression? In a study of a new antidepressant drug, for example, readers would want to know whether most patients were mildly or severely depressed.
4.12 The Sample Variance s²
We can obtain more useful information about variability by using information for all the individual scores. If all people had the same heart rate score, there would be no variance (e.g., a sample with hr scores of 72, 72, 72, …, 72 will have variance of 0). Variance in hr exists when people have different values of hr. Variability is evaluated by examining how far individual people's scores are from the mean.
4.12.1 Step 1: Deviation of Each Score From the Mean
Equation 4.2 appeared earlier, and it is repeated here as Equation 4.4. The first step in calculation of variance is to compute the deviation of each person's score from the sample mean M. (X − M) answers the question, How far is a person's X score above or below the mean?
Deviation of individual X score from mean = (X − M).   (4.4)
For the data in temphr10, the deviation of the first X score from the mean is (70 − 73.1), that is, the score for the first case minus the mean of hr scores.
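The minimum, maximum, range, and deviation from the mean can be obtained in a few lines of code. This is a minimal sketch: the ten heart rate values are the ones listed in Figure 4.9, shown here in sorted order (the order of cases in temphr10.sav is assumed to differ, since the text treats 70 as the first score), and the variable names are hypothetical.

```python
# Heart rate scores as listed in Figure 4.9 (sorted order is an assumption).
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]

minimum, maximum = min(hr), max(hr)
score_range = maximum - minimum
print(minimum, maximum, score_range)  # 62 82 20

mean_hr = sum(hr) / len(hr)
print(mean_hr)  # 73.1

# Equation 4.4: deviation of an individual score from the mean, (X - M).
deviation_of_70 = round(70 - mean_hr, 1)
print(deviation_of_70)  # -3.1
```

The negative sign of the deviation indicates that a heart rate of 70 is below the sample mean.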
Why do some people have higher, and some people lower, hr scores? Because people have different characteristics, such as physical fitness, smoking, and anxiety, that make their heart rates
outliers (but not as extreme outliers). Three scores (on rows 126, 157, and 197) were identified as extremely high outliers. The case on row 3 lies in between these groups of scores. You can determine whether the in-between BMI score on row 3 is an extreme outlier by comparing the BMI value in the data file on row 3 (which is 34) with the BMI values for the two neighboring values. The BMI score on row 157 is also 34, and case number 157 is not tagged as an extreme outlier in this boxplot; therefore row 3 would also not be identified as an extreme outlier.
We can report results for the male BMI boxplot as follows. Values obtained from the SPSS frequencies procedure, not shown here, are used to identify the exact values for the 25th, 50th, and 75th percentiles and the minimum and maximum scores. For men, median BMI was 23; 50% of male BMI scores were between 22 and 25. There were 2 low-end outliers for male BMI; neither was extreme. There were 13 high-end outliers; 10 were not extreme and 3 were extreme outliers. For men, minimum BMI was 16 and maximum BMI was 41.
Median BMI for women was 22; 50% of female BMI scores were between 20 and 23. The female group had no low-end outliers for BMI. There were five nonextreme high-end outliers (rows 204, 302, 318, 353, and 398). There were also two extreme high-end outliers; these BMI scores appear on rows 290 and 374. Minimum BMI for women was 17 and maximum BMI was 33.
If we compare men and women, it appears that men tend to have higher BMIs than women.
It would also be useful to examine the frequency distributions for BMI using suggested clinical cutoffs to evaluate the percentage of persons whose BMIs were within the range 18.5 to 24.9, which is considered healthy. Histograms would help to evaluate distribution shapes.
5.11 Telling Stories About Distributions
After you examine graphs such as histograms or boxplots, you should be able to tell an honest and reasonably complete story about the pattern you see. Imagine this game: Your task is to get a person who has not seen the histogram or other graph to draw a picture of the graph, based only on verbal information you provide. You win the game if you and your partner can do this more quickly and accurately than other teams. Ready? Go!
If you have a roughly normal or bell-shaped distribution, you can communicate this to your partner very quickly with three pieces of information (normal, M, SD). That should be sufficient for your partner to sketch a graph. If the distribution appears somewhat normal but with some variations, such as positive skewness or outliers (see Table 5.1 for examples), you need to add that information (for example, three outliers at the high end). On the other hand, if your distribution does not resemble a bell-shaped curve (see Table 5.2), you need different stories or pieces of information. It may be sufficient to say “reverse J-shaped” or bimodal or uniform. However, you will need to give your partner more information (for example, the maximum score was 10). Distributions that have one or more modes and non-normal shapes require more information. Where was each mode
located? Were some modes higher than others? Think about what your results mean.
Figure 5.22 Histogram for Polarized Degree of Agreement Ratings
Rating                   Percentage of responses
1 = strongly disagree    52%
2 = disagree             10%
3 = neutral               2%
4 = agree                 6%
5 = strongly agree       30%
Note: Agreement with the statement “The current U.S. president is doing an excellent job” was rated using the response options 1 = strongly disagree, 2 = disagree, 3 = neutral or don't know, 4 = agree, and 5 = strongly agree.
How can the hypothetical results in Figure 5.22 be described? Opinion is highly polarized; that is, people are at either the negative or positive extreme in this hypothetical example. There are two modes (52% of people strongly disagree and 30% strongly agree). Very few people chose intermediate levels of agreement. Most people strongly disagree, but the number of people who strongly agree is a substantial minority. Use of a mean or median (a value somewhere around 2.5) to describe central tendency would be misleading in this situation; 2.5 is near the neutral point, but very few people chose ratings near neutral. A concise way to communicate this would be: “Fifty-two percent strongly disagreed with this statement, 30% strongly agreed, and very small percentages of people chose intermediate levels of agreement. Opinion was strongly polarized.” If the author of a research report makes a blanket statement that all variables had approximately normal distributions, or allows readers to assume that all distributions were normal, and then tells readers that the mean degree of agreement with this statement was 2.5, this information by itself provides a misleading description of the results.
5.12 Uses of Graphs in Actual Research
1. Data screening: Identify potential errors or problems with data (such as recording errors, implausible scores, and missing values). Researchers need to report the number of scores that are problematic and indicate what they did to correct these problems. For beginning students, it may be sufficient to report the percentage of missing scores and the number of outliers and extreme outliers for each variable. I suggest that beginning students run analyses with outliers included and with outliers excluded; if results are substantially the same, report one of these analyses and add a footnote to indicate that the other analysis yielded similar results. For
both beginning and advanced students, keep a record of any problems you detect in data, and anything that you do to deal with the problems. Discussion of better ways to handle outliers and missing values is provided in Volume II (Warner, 2020).
2. Evaluation of whether assumptions for analyses are violated: When you learn statistical techniques such as t tests, ANOVA, and regression, you will see that each analysis is based on some assumptions. Some analyses work fairly well, under certain circumstances, when their assumptions are violated; others do not. There is a widespread, but not exactly accurate, belief that scores in samples need to be normally distributed to satisfy the assumptions for many common analyses. I think it would be more accurate to say that, in practice, some kinds of departure from normality in the sample (such as the presence of extreme outliers, or reverse J-shaped or polarized distributions) create problems in many common analyses. The ways the violations of assumptions and rules can lead to incorrect conclusions are discussed in later chapters about significance tests.
3. Report information needed to characterize and describe your sample: For categorical variables, this is often in sentence form, for example, “The sample consisted of 100 male and 150 female university students, with a mean age of 19.1 years.”
Here are some of the stories (or descriptions) about distributions that might appear in a research report.
1. You might say, “The histogram appears approximately normal with no extreme outliers.” You can state this in the “Data Screening” section of your research report. (For some statistics you will need to check additional assumptions.)
2. You might need to say, “The histogram appears approximately normal except for a specific number of outliers.” In this situation you face the “what to do with outliers” problem. Ideally, you decide what to do with outliers prior to data collection. You need to document the number of outliers and what you decided to do with them (such as drop from analysis, recode into different values, or leave them in). Do not experiment with different ways of handling outliers until you find results you like; this is p-hacking.
3. You might need to say, “The distribution is very skewed, and skewness cannot be corrected by modifying or removing a few outliers.” Only if it is conventional in your field, only if values differ by orders of magnitude, and only if planned ahead, log or other nonlinear transformations may be applied to the data, with analyses using log(X) instead of X.
4. In some situations that involve outliers, nonparametric analysis may be preferable. When scores are converted to ranks, extreme outliers and skewness are not problems. (Newer robust techniques, not covered in this book, may be better choices; Field, 2018.)
5. If a distribution looks nothing like a normal distribution (e.g., uniform, J-shaped, U-shaped, mode at zero), proceed with caution. Entirely different analyses than the ones in this book may be required.
5.13 Data Screening: Separate Bar Charts or Histograms for Groups
Appendix 4B reviews rounding. I suggest that you retain at least three decimal places during computations. Final results for most statistics are often rounded to two decimal places. See Appendix 4B for a discussion of rounding.
In Figure 4.9 (data from temphr10.sav) the squared deviation from the mean for each individual person appears in the last column (the variable named deviationsq). Adding the scores for deviationsq gives the value of SS for this data set: SS = 288.90. For larger data sets, it is more convenient to have a computer program do this.
Figure 4.9 Deviations and Squared Deviations of Heart Rate Scores From Mean
hr      deviation    deviationsq
62       −11.10        123.21
69        −4.10         16.81
70        −3.10          9.61
71        −2.10          4.41
73        −0.10          0.01
74         0.90          0.81
75         1.90          3.61
75         1.90          3.61
80         6.90         47.61
82         8.90         79.21
Sum        0.00        288.90
real data, SS is usually not a whole number.
Note that SS cannot be a negative number (because we are summing squared deviations, and squared numbers cannot be negative).
Other factors being equal, SS tends to be larger when:
1. The individual (X − M) deviations from the mean are larger in absolute value.
2. The number of squared deviations included in the sum increases.
The minimum possible value of SS (which is 0) occurs when all the X scores are equal and, therefore, equal to M. For example, in the set of scores [73, 73, 73, 73, 73], the SS term would equal 0. There is no limit, in practice, for the maximum value of SS.
To interpret SS as information about variability, we need to correct for the fact that SS tends to be larger when the number of squared deviations included in the sum is large. Dividing by N, the number of scores in the sample, seems like the obvious solution. However, this does not provide the best answer.
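The sum of squared deviations can be computed directly from the scores. The sketch below uses the ten heart rate values shown in Figure 4.9; the function name sum_of_squares is hypothetical and is used only for illustration.

```python
def sum_of_squares(scores):
    """SS: the sum of squared deviations of scores from their mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

# Heart rate scores from Figure 4.9.
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]

ss = sum_of_squares(hr)
print(round(ss, 2))  # 288.9

# When all scores are equal, every deviation is 0 and so is SS.
print(sum_of_squares([73, 73, 73, 73, 73]))  # 0.0
```

Rounding is applied only when the result is printed, in keeping with the advice to retain extra decimal places during computation.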
4.12.3 Step 3: Degrees of Freedom
It might seem logical to divide SS by N to correct for the increase in size of SS as N increases. However, this yields values that are slightly too small; Gosset (discussed in Tankard, 1984) worked out the reason for the problem and discovered a simple solution. When we look at the pieces of information used to compute SS (i.e., the deviation of each score from the sample mean), it
described as approximately normal. Among the three outliers identified in the boxplots, the only one that stands out clearly in the histograms is the male height of 78 inches (6′6″, or about 198 cm). This is unusually tall, but the number is not so large that you would think it impossible.
Figure 5.24 Separate Boxplots for Height for Female and Male Groups (Data From malefemaleht.sav)
There are two boxplots in the image indicating heights for female and male groups. The X axis denotes the sex, whether male or female, and the Y axis denotes the height in inches. This ranges from 55 to 80, rising in increments of 5. The female boxplot has a median of 64 and one low-end outlier. This figure lies at a lower plane than the male boxplot. The male boxplot, with median around 70, has one high-end outlier and one low-end outlier.
Figure 5.25 Command to Organize Output by Groups
The image is a screenshot of the menu bar in SPSS. At the top of the spreadsheet are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, extensions, window, and help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options. The Data button has been selected, and the following options are visible in the drop-down menu: define variable properties, set measurement levels of unknown, copy data properties, new custom attribute, define date and time, define multiple response sets, identify duplicate cases, compare datasets, sort cases, sort variables, transpose, adjust string widths across files, merge files, restructure, rake weights, propensity score matching, case control matching, aggregate, copy dataset, and split into files.
The split file dialog box has a large box for the variable, which has been filled with the variable Heightinch. On the right are checkboxes such as: analyze all cases, do not create groups; compare groups; and organize output by groups. The last has been checked.
is possible to see that we do not have N independent deviations (or pieces of information) available to compute the SS; in fact, we have only (N − 1) pieces of information.

To explain why deviations from the mean in a sample of N scores provide only (N − 1) independent pieces of information about distance from the mean, recall that the sum of all deviations of scores from the mean must equal 0. Suppose we have N = 3 scores in a sample (call these scores X1, X2, and X3) and that their mean is M.

First, we convert each X score into a deviation by subtracting the sample mean M. We know that the sum of these deviations must equal zero. That yields this simple equation:

$$(X_1 - M) + (X_2 - M) + (X_3 - M) = 0.$$

We can rearrange this equation by subtracting (X3 − M) from both sides; the equation becomes:

$$(X_1 - M) + (X_2 - M) = -(X_3 - M).$$

When we compute (X1 − M) + (X2 − M) (on the left side of the equation), this determines the value that the remaining deviation, (X3 − M), must have. Only the first two deviations are “free to vary,” that is, free to take on any possible value. Once we know the value of any two of the deviations, the value of the last deviation is determined (it must be whatever number is needed to make the sum of all deviations equal 0). This is only a demonstration, not a formal proof.

This modified divisor, N − 1, is called the degrees of freedom (df). The df term tells us how many of the deviations are “free to vary.” The use of df instead of N as a divisor is another frequently used tool in the statistician's bag of tricks. Later analyses also use df terms, although df often has different values than (N − 1) in other situations. Degrees of freedom for the SS and sample variance are obtained using Equation 4.7:

$$df = (N - 1). \tag{4.7}$$

4.12.4 Putting the Pieces Together: Computing a Sample Variance

The variance for a sample is usually denoted s². A sample variance is obtained by dividing SS by its degrees of freedom:

$$s^2 = SS/(N - 1) \text{ or } SS/df. \tag{4.8}$$

(Some textbooks use S² to denote a sample variance calculated as SS/N. In actual practice, this notation is almost never used when statistics are applied to real-world data, and you will not see S² again in this book.)
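The degrees-of-freedom argument and Equations 4.7 and 4.8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three scores and the function name `sample_variance` are hypothetical and do not come from the book's data files.

```python
from statistics import mean, variance

def sample_variance(scores):
    """Return (deviations, SS, df, s2) for a list of scores (hypothetical helper)."""
    n = len(scores)
    m = mean(scores)
    deviations = [x - m for x in scores]   # (X - M) for each score
    ss = sum(d ** 2 for d in deviations)   # SS: sum of squared deviations
    df = n - 1                             # Equation 4.7
    s2 = ss / df                           # Equation 4.8
    return deviations, ss, df, s2

scores = [2, 4, 9]  # hypothetical data: N = 3, M = 5
deviations, ss, df, s2 = sample_variance(scores)

print("deviations:", deviations, "sum:", sum(deviations))
print("SS:", ss, "df:", df, "s2:", s2)

# Only N - 1 deviations are free to vary: the third is fixed by the first two.
third_deviation = -(deviations[0] + deviations[1])
print("third deviation implied by the others:", third_deviation)

# Cross-check: the standard library's variance() also divides SS by N - 1.
print("matches statistics.variance:", s2 == variance(scores))
```

Dividing by N instead of N − 1 would give a smaller value (here 26/3 rather than 26/2), which is the "slightly too small" problem described in the text.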
available when we compute SS or another

Return to the data in Figure 4.9. The first column shows heart rate scores for each person. The second column shows the deviation of each person's score from the mean (the variable name is deviation). The third column shows each
maps to show the spread of obesity in the United States over time. A PowerPoint presentation that shows a series of maps from 1985 to 2010 appears at https://www.cdc.gov/obesity/downloads/obesity_trends_2010.ppt. Figure 5.30 shows a more recent graph for prevalence of obesity in the United States in 2017. States shaded darker gray have higher percentages of obesity. (The corresponding map online at https://www.cdc.gov/obesity/data/prevalencemaps.html is keyed in color.) At a glance you can see several features of the data. High rates of obesity occurred in the deep south, Iowa, and West Virginia. Colorado, Hawaii, and the District of Columbia had low rates. U.S. residents can see how obesity rates in their states compare with those of other states.

5.15.3 Historical Example

Most people think of Florence Nightingale as a pioneer of nursing; her work also had an enormous impact on medicine and hospital design (Lienhard, 2002). During the Crimean War, she sent reports to Britain about the number of soldiers who died each month and their causes of death. She used polar diagrams (this is not currently a popular form of graph) to communicate this information. Figure 5.31 is adapted from part of her graphics (Nightingale, 1858). Her major finding was that far more soldiers were dying from preventable diseases (sometimes acquired in the military hospitals) than from wounds. Up until the 19th century, this was true in many wars. The point she wanted to make was that far more sanitary conditions and better nutrition were needed to keep the army (and civilian populations) healthy. This was not something the War Department wanted to hear. Nevertheless, she persisted.

Figure 5.31 Florence Nightingale's Graph: Number of British Soldiers Who Died in the Crimean War During Each Month Divided Into Three Causes of Death

Diagram of the causes of mortality in the ARMY in the EAST. April 1854 to March 1855.
Source: Public domain.

In the diagram, each month has a separate slice of a pie. The length and width of each slice vary based on the number of deaths. Each slice is subdivided based on the cause of death: battle, disease, and other causes. The main cause of the deaths seems to be disease. The data for the months from April 1854 to March 1855 are covered in the diagram. April to June had very low deaths. July was slightly higher, while August and September were higher than the previous months. There was a dip in October, but November levels are similar to those of
You can describe distribution shape by thinking about the answers to these questions. Some of these descriptions are not mutually exclusive. For example, a positively skewed distribution may also have high-end outliers, and it may have a large mode at zero.

In a typical research report, authors would like to be able to say something like this at the beginning of the “Results” section: “All quantitative variables were approximately normally distributed with no extreme outliers.” Real data often do not behave so nicely, of course. An author might have to say something more like this: “Number of doctor visits had a reverse J-shaped distribution with five high-end outliers.”

Comprehension Questions

1. In the bar graphs in most of this chapter (except those in Section 5.14), the height of the Y axis provides what information?
2. Suppose you generate a bar graph using SPSS. You also have a frequency table for the same data. What information from the frequency table might you add to the bar graph to make the information in the bar graph more precise?
3. What is a common practice that can make a bar graph deceptive? Can you think of at least one other way bar graphs can be made deceptive?
4. What can you see in a histogram of quantitative scores that is less easy to see in a frequency table?
5. Consider the histogram in Figure 5.32.
   1. What were the minimum and maximum number of servings of fruits and vegetables people said they ate per day? What was the range?
   2. What was the modal amount of fruit and vegetable consumption?
   3. Diet experts often recommend at least five servings of fruits and vegetables per day. How well are the people in this sample doing at meeting that standard?
   4. What percentage of persons reported eating one serving per day? This is a frustrating question to answer, given this bar chart. If you had access to these data, what other SPSS output would you want to see to answer this question precisely?
6. Briefly describe, in your own words, three things you look for to decide whether a histogram looks like a “reasonably normal” distribution.
7. Describe the shape of each of the histograms in Table 5.3. Sometimes more than one term can be applied; for example, skewed distributions may also have outliers.
8. What type of plot appears in Figure 5.33? What do the values on the Y axis correspond to? (Score values? Frequencies?) What information can you report from this plot? There are omissions in labeling. What labels could be added to this chart?

Figure 5.32 Results From Warner, Frye, Morrell, and Carey (2017): Number of Servings of Fruits and Vegetables Eaten on a Typical Day, N = 1,250
The X axis represents the number of servings of fruits and vegetables per day, and the Y axis represents percentage. There are 8 bars.

Figure 5.33 Figure for Comprehension Question 8: What Is It?
when you compute the following. The values of M and SD can be combined to set up ranges of score values; that is, we can combine information about the mean and information about typical distances from the mean. This can be done using integer multiples of SD, such as M ± 1 × SD and M ± 2 × SD.

For M = 64.5 and SD = 2.5, we obtain the following:

$$M - 2 \times SD = 64.5 - 5 = 59.5.$$
$$M - 1 \times SD = 64.5 - 2.5 = 62.$$
$$M + 0 \times SD = 64.5 + 0 = 64.5.$$
$$M + 1 \times SD = 64.5 + 2.5 = 67.$$
$$M + 2 \times SD = 64.5 + 5 = 69.5.$$
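The same ranges can be computed programmatically. The short Python sketch below assumes the values M = 64.5 and SD = 2.5 given in the text; the function name `sd_ranges` is hypothetical and introduced only for this illustration.

```python
def sd_ranges(m, sd, multiples=(1, 2)):
    """Return {k: (M - k*SD, M + k*SD)} for each integer multiple k (hypothetical helper)."""
    return {k: (m - k * sd, m + k * sd) for k in multiples}

ranges = sd_ranges(64.5, 2.5)  # M and SD for the hypothetical female height data

for k, (low, high) in ranges.items():
    print(f"M - {k} x SD = {low}, M + {k} x SD = {high}")
```

Passing other multiples, for example `multiples=(1, 2, 3)`, extends the same calculation to M ± 3 × SD.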
The shorter vertical arrow next to the frequency table in Figure 4.10 extends from M − (1 × SD) to M + (1 × SD). This corresponds to the frequencies enclosed in the smaller ellipse. The longer vertical arrow ranges from M − (2 × SD) to M + (2 × SD), score values from 59.5 to 69.5. This corresponds to scores in the larger ellipse. Most women in the sample had heights that were included in the range M − (2 × SD) to M + (2 × SD); only three women (2.5%) had scores below 59.5, and only two women (1.7%) had scores above 69.5.

In words: When we combine information about distance from the mean (SD) with the location of the mean (M), we obtain information about the range of values within which most of the X scores lie; this is called the range rule. The range rule works only for bell-shaped distributions, as in the present example.

Figure 4.10 Hypothetical Data for Female Height in Inches for N = 120 Women With M = 64.5 and SD = 2.5
The image is a combination of a table and a graph that shows hypothetical data for female height. The table has four columns: height value, frequency, percent, and cumulative percent. Details are below; values for the hypothetical female height data:

58; 1; .8; .8
59; 2; 1.7; 2.5
60; 3; 2.5; 5.0
61; 8; 6.7; 11.7
62; 14; 11.7; 23.3
63; 12; 10; 33.3
64; 20; 16.7; 50
65; 16; 13.3; 63.3
66; 18; 15; 78.3
67; 15; 12.5; 90.8
68; 5; 4.2; 95
69; 4; 3.3; 98.3
70; 2; 1.7; 100
Total; 120; 100

There are 2 ellipses over the figures; one covers the percent values 11.7, 10, 16.7, 13.3, 15, and 12.5, and the second covers a larger set of percent values including 2.5, 6.7, 11.7, 10, 16.7, 13.3, 15, 12.5, 4.2, and 3.3.

The graph in the second part of the image shows the X and Y axes as well as the 1 × SD and 2 × SD lines. The following figures are mentioned alongside the graph:
* M − 2 × SD equals 59.5
* M − 1 × SD equals 62
* M equals 64.5
* M + 1 × SD equals 67
* M + 2 × SD equals 69.5
The Normal Distribution and z Scores

6.1 Introduction

In the previous chapter, you learned to evaluate score location by examining cumulative percentages in frequency tables. You can obtain information such as the percentage of persons who have scores below a specific value of X by examining cumulative percentages in frequency tables.

You already know something about score locations in everyday life. To evaluate how tall you are, you look at other people of the same sex and ask, Are most of them taller or shorter than I am? If you see that more than half of them are shorter, you know your height is above average. If something like 90% of people are shorter than you, you know you are much taller than average.

We will need a method to describe locations in distributions that can be generalized to more situations (and that does not require all the information in a frequency table). When distributions have an approximately normal shape, we can evaluate locations of specific X outcomes quickly by converting X values into a unit-free index of distance from the mean. The only information we need for that is M and SD for the distribution of X scores.

6.2 Locations of Individual Scores in Normal Distributions

The new method for score location introduced in this chapter involves two steps: First, we compute a z score that corresponds to the original X score. A z score, also called a standard score, describes the distance of an X score (which might be in dollars, kilograms, or degrees Celsius) from the sample mean, in unit-free or standardized terms. Then, we use a table of areas for the standard normal distribution to look up the percentage of scores that fall below that z score. This method works well only if the distribution shape for scores is reasonably close to normal.

To do this, we need to define normal distribution shape more precisely. A normal distribution, also called a Gaussian distribution, appears approximately bell shaped in a histogram. However, many bell-shaped curves do not correspond exactly to normal distributions. What defines a normal distribution is a fixed relationship between distance from the mean and area under the curve. This relationship is given in detail in tables of the standard normal distribution. Appendix 6A provides a brief explanation of the mathematics of the normal distribution. The area below the value of z that corresponds to an X score in a normal distribution is roughly equivalent to the cumulative percentage of scores below that X value in the frequency table.

6.3 Standardized or z Scores

A z score is an index of the distance of an X score from the sample mean that has been converted into unit-free or standardized terms. Suppose that X is height; X scores can be given in different units, such as inches or centimeters. When we convert an X score into a z score, we obtain an index of distance from the mean that is not related to the original units of measurement.
6.3.1 First Step in Finding a z Score for X: The Distance of X From M

The first step toward evaluating the location of a specific score is to find the distance (or deviation) of the X score from the sample mean M in the original units of measurement, such as inches. That distance, also called a deviation from the mean, is (X − M). You have seen this term before. Deviation of individual score X from a sample mean M is:

$$(X - M). \tag{6.1}$$

For example, I am 62 in. tall (X = 62). Let's assume the mean height of women in a sample is M = 64.5 in., and the standard deviation SD = 2.5. For me, (X − M) = (62 − 64.5) = −2.5. I am 2.5 in. below average height for women in the sample.

The sign of (X − M) tells you whether X is below the mean (if X − M is negative) or above the mean (if X − M is positive). This is part of the information we want. However, the value of (X − M) doesn't tell us what percentage of persons are shorter or taller than X inches.

6.3.2 Second Step: Divide the (X − M) Distance by SD to Obtain a Unit-Free or Standardized Distance of Score From the Mean

To evaluate how far an individual X score is from M, we can compute a z score (also called a standard score or standardized score):

$$z = (X - M)/SD. \tag{6.2}$$

The values of M and SD differ depending on the unit of measurement (e.g., feet, centimeters, or inches). When we convert X to z, we obtain z scores that are independent of the original unit of measurement. We can say that z scores are standardized or unit free.

Standardization is a very frequently used tool in the statistician's bag of tricks. You will see this again in many future situations.

As an example, consider one individual female height score (my own), given in both inches and centimeters. The example in Table 6.1 demonstrates that we end up with the same z score even if the units of measurement for X differ. The left-hand column in Table 6.1 provides all the needed information in inches, and the right-hand column gives the corresponding information in centimeters. At the bottom of each column, a z score is computed using the values of X, M, and SD. Note that you convert inches to centimeters by multiplying by 2.54. A woman whose height is 62 in. is 157.48 cm tall.

Table 6.1

                  Inches                      Centimeters
X                 62                          157.48
M                 64.5                        163.83
SD                2.5                         6.35
z = (X − M)/SD    (62 − 64.5)/2.5 = −1.00     (157.48 − 163.83)/6.35 = −1.00

The point of this example is that the value of z is the same (within rounding error) whether X (height) is given in inches or centimeters. I am 62 in. tall (or approximately 157.5 cm). Whether height is given in inches or centimeters, I am 1 standard deviation below the average height for
women in this example. This is a demonstration (not a proof) that the value of z does not depend on original units of measurement.

I suggest that you obtain a z score for your own height. For female height in inches, use M = 64.5 and SD = 2.5; for male height in inches, use M = 67.5 and SD = 2.5. To convert inches to centimeters, multiply these values by 2.54. Your z score tells you whether you are above or below average in height relative to the imaginary data in this example. (You can find estimates of male and female height for many different nations online, and use these values if you want to compare your height with national averages.)

6.4 Converting z Scores Back Into X Units

If you know that scores are normally distributed, and you know the values of z, M, and SD, you can convert a z score back into the original score by “reversing” the operations in Equation 6.2. First you multiply z by SD, then you add M, as in Equation 6.3:

$$X = (z \times SD) + M. \tag{6.3}$$

If I know that height is normally distributed, that my z score is −1, and that for height in inches, M = 64.5 and SD = 2.5, then I can find X: X = (−1 × 2.5) + 64.5 = 62.

6.5 Understanding Values of z

A z score can be verbally interpreted. My height is 62 in., and relative to the values of M and SD in the previous section, this corresponds to z = −1.00. A z score of −1.00 tells me that this height is 1 standard deviation below the mean. More generally, once we have a z value, we can say, This score is z standard deviations below the mean (if z is negative) or this score is z standard deviations above the mean (if z is positive).

We don't know yet whether a distance of z = −1.00 is not very far, or very far, below the mean. Is z = −1.00 so far below the mean that when people see me, they think, wow, that's the shortest woman I've ever seen? We need a way to evaluate whether the absolute value of z indicates a notably large, or small, difference from average.

If scores for the variable of interest, such as height, are normally distributed, we can use graphs or tables of z scores for a standard normal distribution to find areas that lie below (or above) z. These are interpreted like cumulative percentages. If I want to compare my height with other heights in a normally distributed sample, I obtain approximately the same information about location if I look at the cumulative percentage in a frequency table or the area below z in a normal distribution. To evaluate location using cumulative percentage, I needed a lot of information (all the scores and frequencies in a frequency table). To evaluate score location using z scores, I need only three pieces of information: the information that the distribution shape is normal, with mean M and standard deviation SD.

That leads to the next question: How do we evaluate whether a distribution of scores is approximately normal?
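The computations in Equations 6.2 and 6.3 can be sketched in Python as follows. This is a minimal illustration rather than anything from the book: the function names `z_score` and `x_from_z` and the constant `CM_PER_INCH` are hypothetical, and the height values are the ones used in the example (X = 62, M = 64.5, SD = 2.5, in inches).

```python
import math

CM_PER_INCH = 2.54  # conversion factor stated in the text

def z_score(x, m, sd):
    """Equation 6.2: distance of X from M in SD units."""
    return (x - m) / sd

def x_from_z(z, m, sd):
    """Equation 6.3: convert a z score back into original X units."""
    return (z * sd) + m

x_in, m_in, sd_in = 62, 64.5, 2.5

# z computed from inches
z_inches = z_score(x_in, m_in, sd_in)

# z computed after converting X, M, and SD to centimeters
z_cm = z_score(x_in * CM_PER_INCH, m_in * CM_PER_INCH, sd_in * CM_PER_INCH)

print("z from inches:", z_inches)
print("z from centimeters (equal within rounding error):", round(z_cm, 2))
print("same z in both units:", math.isclose(z_inches, z_cm))

# Reversing the operations recovers the original score in inches.
print("X recovered from z:", x_from_z(z_inches, m_in, sd_in))
```

Because X, M, and SD are all multiplied by the same conversion factor, the factor cancels in (X − M)/SD, which is why z does not depend on the unit of measurement.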
6.6 Qualitative Description of Normal Distribution Shape

The term normal has a different meaning in
physical fitness. When we go on to bivariate analyses, we will ask how hr scores are statistically related to other variables, such as amount of anxiety or stress. Results of these analyses can lead to inferences that stress predicts, or perhaps influences, heart rate.

In later chapters you'll see that the overall variance for a variable such as hr can be divided (or partitioned) into proportions of variance that can be predicted from or are related to other variables (such as physical fitness, smoking, anxiety, and caffeine use). Some variables may predict large proportions of variance in heart rate (possibly these are the variables that have the strongest influence on hr). For those of us who are excited about statistics, this is where the fun begins; this is where we can make discoveries or test past research claims about discoveries. Other variables may predict little or none of the variance in hr.

4.16 Reports of Descriptive Statistics in Journal Articles

Most journal articles report descriptive statistics for numerous variables. Information about categorical variables (that describe groups in the study) can usually be provided in sentence form. Usually information for numerous quantitative variables is summarized in table form. The following data are from Warner, Frye, Morrell, and Carey (2017). The predictor variable of most interest was number of servings of fruit and vegetables consumed per day (NCIfv, servings of fruits and vegetables from a National Cancer Institute food frequency questionnaire). Past research suggested that people who eat more fruits and vegetables tend to have higher scores on measures of well-being such as life satisfaction and positive mood. The outcome variable of most interest was life satisfaction (LS). Before doing analyses to evaluate whether NCIfv predicts LS, we need to know about the behavior of scores for each of these variables. This survey was completed by 492 students from a university in New England, including 152 male and 340 female students. They were recruited from introductory courses, 79 from a nutrition course and 413 from psychology classes. All participants were between ages 18 and 24; the modal age was 18. Descriptive statistics for quantitative variables appear in Table 4.1.

Tables of descriptive statistics often use abbreviated names for variables that are used throughout the paper. Notes at the bottom of the table identify the variables and provide additional information about them. Direction of scoring must be clear (for example, we need to know that a score of 5 indicates better sleep, rather than more sleep problems). It is helpful to list variables in sets (in this example, a list of well-being outcome measures, a list of behavioral predictors, and a list of dietary predictors). An earlier “Methods” section in the research report would provide more information about how variables were measured. Information about distribution shapes should be included; this is discussed in Chapter 5.

4.17 Additional Issues in Reporting Descriptive Statistics

Many additional kinds of information can be included in summary tables. The minimum information usually provided for each quantitative variable is M and SD. Table 4.1 included the possible minimum and maximum scores for each variable, on the basis of the way scores were obtained for these variables. Readers
summing the probabilities for the “slices” above z = +1.00: 13.59% + 2.14% + .13% = 15.86%. That is, 15.86% of cases in a perfectly normal distribution have z values greater than +1.00. Note that z = 0.00 corresponds to the mean of this distribution. The sum of all the slices in Figure 6.1 is 100%. The sum of the slices above the mean (above z = 0.00) is 50%, and the area below z = 0.00 is also 50%.

Figure 6.1 Areas in Normal Distribution That Correspond to z Scores

At the top is a scale that shows the percentage of area that falls under the curve between different z scores. Minus 1 to plus 1 is about 68 percent. Minus 2 to plus 2 is about 95 percent, and minus 3 to plus 3 is about 99.7 percent. There are 6 z scores, 3 each on the positive and negative sides of 0. The area covered is:
* 0 to plus 1 and 0 to minus 1 each correspond to 34.13 percent
* Plus 1 to plus 2 and minus 1 to minus 2 each correspond to 13.59 percent
* Plus 2 to plus 3 and minus 2 to minus 3 each correspond to 2.14 percent
* The outer edges beyond minus 3 and beyond plus 3 each correspond to .13 percent

Because the distribution is perfectly symmetrical (a mirror image), the percentage of area below a specific negative value of z (such as z = −1.00) is the same as the percentage of area that is above the corresponding positive value of z (in this example, z = +1.00); that is, 15.86% of the area lies above z = +1.00, and 15.86% of the area lies below z = −1.00.

Because the total area under the curve is 100%, once we know the percentage of cases that lie above a value of z, we can find the percentage of cases below z by subtraction. Because 15.86% of scores lie above z = +1.00, we know that (100% − 15.86%) = 84.14% of cases lie below z = +1.00.

6.8 Areas Under the Normal Distribution Curve Can Be Interpreted as Probabilities

If you were to draw a case at random from a normally distributed population of scores, the probability that it would have a z score greater than z = +1.00 is 15.86%. The probability that a randomly drawn case will have z > 0.00 is 50%. In other words, areas can be interpreted as probabilities. For integer values of z such as z = +1.00, the diagram in Figure 6.1 can be used to answer questions about area and probabilities. However, z values are often not integers. Areas that correspond to other (noninteger) values of z can be obtained from tables of the standard normal distribution, as discussed in the next section.

To summarize information about areas in the standard normal distribution:
* The total area under the curve is 100%.
* The area below the mean = 50%; the area above the mean = 50%. The mean is z = 0.
* The area above a specific value of +z, such as z = 1.96, is the same as the area below −z (−1.96).
* Areas can be combined by addition and subtraction.
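The areas for the slices in Figure 6.1 can be checked numerically. The sketch below is an illustration, not the book's method: it assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library; the helper name `area_between` is hypothetical. Note that the exact area above z = +1.00 is about 15.87%; the 15.86% in the text comes from summing slice values that were already rounded.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, SD 1

def area_between(z_low, z_high):
    """Proportion of area under the standard normal curve between two z values."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Slices on the positive side of the mean, as percentages
slice_0_to_1 = 100 * area_between(0, 1)
slice_1_to_2 = 100 * area_between(1, 2)
slice_2_to_3 = 100 * area_between(2, 3)
slice_beyond_3 = 100 * (1 - std_normal.cdf(3))

for label, pct in [("0 to +1", slice_0_to_1), ("+1 to +2", slice_1_to_2),
                   ("+2 to +3", slice_2_to_3), ("beyond +3", slice_beyond_3)]:
    print(f"{label}: {pct:.2f}%")

# Symmetry: area above +1.00 equals area below -1.00
area_above_1 = 1 - std_normal.cdf(1.0)
area_below_minus_1 = std_normal.cdf(-1.0)
print(f"above z = +1.00: {100 * area_above_1:.2f}%")
print(f"below z = -1.00: {100 * area_below_minus_1:.2f}%")

# Subtraction: area below +1.00 is the total area minus the area above it
print(f"below z = +1.00: {100 * (1 - area_above_1):.2f}%")
```

Interpreted as probabilities, these are the chances that a randomly drawn case from a normally distributed population falls in each range of z.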
Standard normal distribution tables generally give area in terms of proportion; people often talk about areas in terms of percentages. To convert proportion to percentage, multiply proportion by 100.

6.9 Reading Tables of Areas for the Standard Normal Distribution

The equation in Appendix 6A can be used to generate normal distributions for any values of the mean and standard deviation that you want. For example, a normal distribution for IQ scores would have a mean of 100 and a standard deviation of 15. The standard normal distribution has a mean of 0 and a standard deviation of 1; it corresponds to a distribution of z scores. Figure 6.1 provides only areas related to integer values of z. In practice we will often need areas that correspond to noninteger values. More detailed information about z score distances from the mean, and areas under the normal distribution, is given in tables of the standard normal distribution. See the table in Appendix A at the back of this book. Part of that table appears in Figure 6.2 (for selected values of z that range from 1.83 to 2.12). Figure 6.3 shows enlarged versions of the diagrams that appear at the top and bottom of the table; these diagrams indicate which slices or areas correspond to the numbers in the table.

For each value of z, the table provides two kinds of information about z score location. Column A lists the z values. Column B gives the area between z = 0.00 and the z value you want to evaluate. (Recall that z = 0.00 corresponds to X = M.) Column C gives the area that lies beyond the z value you want to evaluate (out in the tail of the distribution).

There are several ways to use this table to describe the location of a score with z = +1.96. Here is the easiest.

Suppose we want to know the proportion of area that lies above, and the proportion that lies below, z = +1.96.

Locate the value of z = 1.96 in column A in Figure 6.2. The corresponding number in column C, the “tail area,” is .025.

We can convert from proportion to percentage by multiplying by 100; 2.5% of the area in this distribution lies above z = +1.96. By subtraction, 97.5% of the area in this distribution lies below z = +1.96. The percentage of area below a z score is like the cumulative percentage in a frequency table. We could say this score is at the 97.5th percentile.

This tells us that a person who has a z score of +1.96 has an unusually high score. We can also think in terms of probability. If a person is randomly selected from this distribution of scores, there is a 2.5% probability that the person will have a higher score, and a 97.5% probability that the person will have a lower score, than z = +1.96. (We can convert z scores back into units for X if we want to make these statements in terms of X score values.)

When z is negative, use the diagrams at the bottom of the table to identify which slices of area in the distribution correspond to ranges of z values. Because the distribution is symmetrical, we know the following:

The area between z = 0.00 and z = +1.96 is the same as the area between z = 0.00 and z = −1.96
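The table lookup described in this section can be reproduced with a short Python sketch. This is an illustration under stated assumptions: it relies on `statistics.NormalDist` from the standard library (Python 3.8 or later), and the function name `table_columns` is hypothetical; the column labels B and C follow the description in the text.

```python
from statistics import NormalDist

def table_columns(z):
    """Return (B, C) for a z value, mirroring the table layout described in the text.

    B is the area between z = 0.00 and |z|; C is the tail area beyond |z|.
    The absolute value is used because the distribution is symmetrical.
    """
    area_below = NormalDist().cdf(abs(z))  # area below |z| in the standard normal
    column_b = area_below - 0.5
    column_c = 1 - area_below
    return column_b, column_c

b, c = table_columns(1.96)
print(f"z = 1.96: B = {b:.4f}, C = {c:.4f}")

# Convert proportions to percentages
percent_above = 100 * c
percent_below = 100 * (0.5 + b)
print(f"{percent_above:.1f}% of the area lies above z = +1.96")
print(f"{percent_below:.1f}% of the area lies below z = +1.96")

# Symmetry: a negative z gives the same B and C values
print("same areas for z = -1.96:", table_columns(-1.96) == table_columns(1.96))
```

For a negative z, the tail area C describes the area below z rather than above it, which matches the diagrams at the bottom of the table.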
mode, or are skewed, M is sometimes not the best description of the “typical” response. When you report a mean, you need to tell readers something about the shape of the frequency distribution to provide the background information needed to understand potential problems with the mean.

Statistics books provide so many examples of bell-shaped distributions that students may assume that all data have this distribution shape. However, many common kinds of variables do not have bell-shaped distributions. Graphs, discussed in Chapter 5, can be used to evaluate whether scores have a bell-shaped distribution or some other distribution shape. We should not assume that all distribution shapes are bell shaped. When reporting information about variables, remember that readers may assume a bell-shaped distribution if you do not explain clearly that the distribution shape is different.

If you read mass media reports about “averages,” you need to know whether average was estimated using the mode, median, or mean; under some circumstances, these three descriptive statistics can yield very different values.

The next chapter provides further information about obtaining and interpreting graphs of frequency distributions and additional questions we can ask about distributions of scores on a quantitative variable.

Appendix 4A: Order of Arithmetic Operations

Many equations combine two or more arithmetic operations; for example, ΣX² includes both squaring and summing X scores. When operations are combined, the result often differs depending upon the order in which operations are done.

Consider this set of scores: X = [1, 3, 5, 2]. If you square each X value and then sum the squared values, you would obtain (1 + 9 + 25 + 4) = 39. If you sum the X's and then square that sum, you would obtain (1 + 3 + 5 + 2)² = 11² = 121. It is important to know which arithmetic operation to do first.

There are rules of precedence (order) for arithmetic operations (see http://mathworld.wolfram.com/Precedence.html). When I present equations I explain in words the order in which computations should be done, and often, I use extra parentheses to make this clear in the equation. When an expression appears within parentheses, such as (X − 5), do that operation first. If you see Σ(X²), square each X value first, and then sum the squared X values: (1 + 9 + 25 + 4) = 39. If you see (ΣX)², sum the X values first, and then square the sum: (1 + 3 + 5 + 2)² = 11² = 121.

Be aware that if you do arithmetic operations in the wrong order, you can obtain answers that are incorrect by huge amounts.

Appendix 4B: Rounding

Computer programs often provide numbers given to several decimal places. Each number that comes after a decimal point represents one decimal place. For example, the number 4.171 has three decimal places.

If you do by-hand computations, you should retain at least three decimal places during your computations to minimize rounding error. Final results are usually rounded to a small number of decimal places, often two decimal places. The preferred number of decimal places to report differs across disciplines and may differ across variables. Use common sense. It would be silly to say that the average American gets 7.481 hours of
The values of z from 1.83 to 2.12 and the values under B and C have been shown as a table. At the top and bottom, there are figures showing the extent of area covered by B and C. One specific z value, 1.96, which corresponds to .4750 for B and .0250 for C, has been highlighted.
Table values:
z       B        C
1.83    0.4664   0.0336
1.84    0.4671   0.0329
1.85    0.4678   0.0322
1.86    0.4686   0.0314
1.87    0.4693   0.0307
1.88    0.4699   0.0301
1.89    0.4706   0.0294
1.90    0.4713   0.0287
1.91    0.4719   0.0281
1.92    0.4726   0.0274
1.93    0.4732   0.0268
1.94    0.4738   0.0262
1.95    0.4744   0.0256
1.96    0.4750   0.0250
1.97    0.4756   0.0244
1.98    0.4761   0.0239
1.99    0.4767   0.0233
2.00    0.4772   0.0228
2.01    0.4778   0.0222
2.02    0.4783   0.0217
2.03    0.4788   0.0212
2.04    0.4793   0.0207
2.05    0.4798   0.0202
2.06    0.4803   0.0197
2.07    0.4808   0.0192
2.08    0.4812   0.0188
2.09    0.4817   0.0183
2.10    0.4821   0.0179
2.11    0.4826   0.0174
2.12    0.4830   0.0170
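As a rough check on the table, the B and C areas can be recomputed with Python's standard-library `statistics.NormalDist`. The helper name `table_areas` is hypothetical; B is taken to be the area between 0 and z, and C the area beyond z, as in the diagrams that accompany the table.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # standard normal: M = 0, SD = 1


def table_areas(z):
    """Return (B, C) for z >= 0: area between 0 and z, and area beyond z."""
    area_below = std_normal.cdf(z)  # total area to the left of z
    b = area_below - 0.5            # subtract the 50% of area that lies below 0
    c = 1.0 - area_below            # what remains in the upper tail
    return round(b, 4), round(c, 4)


for z in (1.83, 1.95, 1.96, 2.10):
    b, c = table_areas(z)
    print(f"z = {z:.2f}  B = {b:.4f}  C = {c:.4f}")
```

For each z, B and C sum to .5000, which is a quick way to spot a misprinted table entry.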
Figure 6.3 Detail From Standard Normal Distribution Table
There are four diagrams that show the area between 0 and z as well as beyond z for positive and negative values of z. The first diagram highlights the area between 0 and a positive value of z in a normal distribution diagram. The area to the left of 0 has been marked as Area below 0 equals 50 percent. The second diagram, to the right of the first, highlights the area beyond positive z in a normal distribution diagram. This has been shown as the Area above positive z. The third diagram, below the first, highlights the area between 0 and a negative value of z in a normal distribution diagram. The fourth diagram, to the right of the third, highlights the area beyond negative z in a normal distribution diagram.
Textbooks sometimes drill students in the use of the normal distribution table with questions such as “What percentage of area lies between z = -1.00 and z = +2.00?” These artificial examples do not correspond to the kinds of questions that are of real interest to data analysts.
Data analysts usually want to answer a simple question: Is an X score or other outcome close to, far from, or extremely far away from the mean? Data analysts sometimes choose different numerical values to define “far from.” The following z values are common ways of thinking about distance from the mean.
* Values between z = -1.00 and z = +1.00 are “close” to the mean.
* Values between z = -2.00 and z = +2.00 (but outside the range -1.00 and +1.00) are “in between” close and far from the mean.
* Values below z = -2.00 or above z = +2.00 are “far from” the mean.
* Values below z = -3.00 or above z = +3.00 are “very far from” the mean.
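The rule-of-thumb labels for distance from the mean can be sketched as a small Python function. The function name is hypothetical, and the treatment of scores that fall exactly on a cutoff (for example, z = 2.00 exactly) is an assumption; the text does not say which label such boundary values receive.

```python
def distance_label(z):
    """Label a z score using the cutoffs 1, 2, and 3 (in absolute value)."""
    size = abs(z)  # only distance from the mean matters, not direction
    if size <= 1.0:
        return "close"
    if size <= 2.0:
        return "in between"
    if size <= 3.0:
        return "far"
    return "very far"


for z in (0.4, -1.5, 2.3, -3.6):
    print(z, distance_label(z))
```

Other researchers may prefer different cutoffs; changing the three constants is enough to apply a different convention.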
A normal curve divided into these areas appears in Figure 6.4. Individual researchers are free to use other values of z as criteria for distances.
Figure 6.4 Areas That Are Close, Far, and Very Far From the Mean (in z Score Units)
The X axis denotes the z score and ranges from minus 3 to plus 3, with 0 as the center. The area between minus 1 and plus 1 is under the highest part of the curve and has been termed Close to M. The area between minus 1 and minus 2 and plus 1 and plus 2 has been termed Between. The area between minus 2 and minus 3 and plus 2 and plus 3 has been termed Far. The outer edges beyond minus 3 and plus 3 have been termed Very far.
6.10 Dividing the Normal Distribution Into Three Regions: Lower Tail, Middle, and Upper Tail
Researchers are often interested in the situation where the areas beyond ±z sum to exactly 5%. A normal distribution can be divided into three areas:
* 2.5% of the area below -z, the “lower tail,”
* 95% of the area in the center, and
* 2.5% of the area above +z, the “upper tail.”
These areas appear in Figure 6.5. The exact value of z that “cuts off” 2.5% of area in each tail, with 95% of the area in the center, is z = ±1.96.
Figure 6.5 Normal Distribution Divided Into Areas Below z = -1.96, Between z = -1.96 and +1.96, and Above z = +1.96
The X axis denotes the z score and ranges from minus 3 to 1.96, with 0 as the center. The area between plus 1.96 and minus 1.96 has been termed Middle 95 percent. The area beyond minus 1.96 is called the Bottom 2.5 percent and beyond plus 1.96 has been termed Top 2.5 percent.
Another common way to divide the distribution into lower tail, center, and upper tail appears in Figure 6.6.
In Chapter 7 (on confidence intervals) we will focus on the range of values that is “not very far from the mean,” that is, the middle 95%. There is a 95% chance that a randomly selected case will have a score that lies in the center area. In Chapter 8 (on significance tests), we focus on the areas in the lower and upper tails. There is a 2.5% chance that a randomly selected score will lie in the lower tail and a 2.5% chance that a randomly selected score will lie in the upper tail. These two areas combined describe outcomes that can be called “far away from” the mean. You should develop a sense that z scores larger than 2 in absolute value (the rounded value for 1.96) indicate that an outcome is usually considered far from the mean (and therefore unusual or unlikely). Also, z-score values larger than 3 in absolute value are very far from the mean (and therefore very unusual or unlikely).
Figure 6.6 Bottom .5%, Middle 99%, and Top .5% of Normal Distribution
The X axis denotes the z score and ranges from minus 2.576 to plus 2.576, with 0 as the center. The area between plus 2.576 and minus 2.576 has been termed Middle 99 percent. The area beyond minus 2.576 is 0.5 percent and beyond plus 2.576 is 0.5 percent.
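The cutoff values 1.96 and 2.576 need not be taken on faith. A minimal sketch, using the standard-library `NormalDist.inv_cdf` method, finds the z value that leaves a chosen middle area in the center of the standard normal distribution; the function name `central_cutoff` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults to M = 0, SD = 1


def central_cutoff(middle_area):
    """Return the positive z that leaves `middle_area` between -z and +z."""
    tail = (1.0 - middle_area) / 2.0        # area in each tail
    return std_normal.inv_cdf(1.0 - tail)   # z with that much area above it


print(round(central_cutoff(0.95), 2))   # 1.96
print(round(central_cutoff(0.99), 3))   # 2.576
```

Splitting the leftover area equally between the two tails is what makes the lower and upper cutoffs mirror images of each other.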
6.11 Outliers Relative to a Normal Distribution
A score that has a very large distance from the mean (and therefore a very large absolute value of z) is called an outlier. It is possible to use z scores to identify scores as outliers. Tabachnick and Fidell (2018) suggested that scores with z values less than -3.29 or greater than +3.29 can be called outliers. Scores can be identified as outliers using other criteria, for example, location in a boxplot. Boxplots and z scores may not identify the same scores as outliers. Many other rules can be used to identify outliers (Aguinis, Gottfredson, & Joo, 2013).
Outliers create problems in many statistical analyses. For example, the value of the sample mean M is not robust against the effect of outliers. When you see outliers in a sample, at a minimum, you need to report:
* The method you used to identify cases as outliers
* The number of outliers
* The decisions you made about handling outliers
There are several possible ways to handle outliers, and none of them is a perfect solution:
1. Leave the outliers in.
2. Remove outliers from the data set before analysis (using methods described in Appendix 6B).
3. Modify the values of outliers (i.e., change the score value of an outlier to the next nearest score value that is not an outlier; this is called Winsorizing; Aguinis et al., 2013).
4. Use a nonparametric analysis that can reduce the effects of outliers; for instance, report the median instead of the mean.
5. Use robust statistical methods (these are beyond the scope of this book; see Field and Wilcox, 2017, for an introduction).
Ideally you decide the method you will use to identify outliers, and the method you will use to handle them, before you collect data. For example, you could use z scores (which work well for normally distributed samples) or boxplots (which are preferable for non-normally distributed samples) to identify outliers. You must describe the criteria for outliers, the number of outliers, and the handling of outliers in the research report.
It can be useful as a student exercise to “experiment” with outliers in data (data that you will not publish!). You can evaluate how results of analyses change when outliers are retained versus removed. In actual research, you should commit to decisions ahead of time. It is bad practice to “experiment” with outliers in data you plan to publish. You should not drop outliers in various ways until you obtain the outcome you want, and then report one final outcome without explaining that it was “cherry-picked” from a large number of different analyses.
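The z-score rule for outliers can be tried out on made-up data, in the spirit of the student exercise described above. The sketch below is an illustration only: the heart rate values are hypothetical, the function names are not from the book, and the 3.29 cutoff is the Tabachnick and Fidell (2018) criterion cited in the text.

```python
from statistics import mean, stdev


def to_z(x, m, sd):
    """Convert an X score to a z score: distance from M in SD units."""
    return (x - m) / sd


def from_z(z, m, sd):
    """Convert a z score back to the original X score."""
    return m + z * sd


def flag_outliers(scores, cutoff=3.29):
    """Return scores whose z score exceeds `cutoff` in absolute value."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD (divides by N - 1)
    return [x for x in scores if abs(to_z(x, m, sd)) > cutoff]


# Hypothetical data: 24 ordinary heart rates and one extreme value.
heart_rates = [72, 74, 76] * 8 + [130]
outliers = flag_outliers(heart_rates)
retained = [x for x in heart_rates if x not in outliers]

print(outliers)                      # the single extreme score
print(mean(heart_rates))             # M with the outlier retained
print(mean(retained))                # M with the outlier removed
```

Comparing the two means shows why M is described as not robust against outliers. In very small samples no score can reach |z| > 3.29, so the rule is only informative when N is reasonably large.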
6.12 Summary of First Part of Chapter
At this point you should be able to do the following.
* Convert an X score (for example, height in centimeters) into a z score, given values of M and SD.
* Given a z score and values of M and SD, find the original X score.
* Given a diagram of the normal distribution (as in Figure 6.1), find the percentage of area above or below any integer value of z, or the percentage of area between any two integer values of z.
* Using the table of the normal distribution in Appendix A at the end of the book, find the percentage of area above or below any noninteger value of z, or the percentage of area between any two noninteger values of z. However, this is less important for further work in statistics than understanding the idea of dividing a normal distribution into regions (lower tail, center, and upper tail).
* Decide whether an X value is far away from the mean, on the basis of its absolute value of z. I suggest that you call values of z greater than 2 in absolute value “far from” the mean and values of z greater than 3 in absolute value “very far from” the mean.
6.13 Why We Assess Distribution Shape
You should always examine histograms or boxplots or other graphs for scores on quantitative variables before you do additional analyses. These graphs provide information you need to do the following things:
1. Describe distribution shapes for your variables.
2. Detect outliers.
3. Evaluate whether data meet requirements and assumptions for statistical analyses you plan to do.
The third point (evaluating possible violations of assumptions) will be discussed for each new analysis when it is introduced; you do not need to worry about it now.
Refer back to Tables 5.1 and 5.2 to see examples of histograms that represent common distribution shapes. Table 5.1 shows some approximately normal distributions with slight departures from normal shape, such as outliers and mild to moderate skewness. Other histograms were clearly non-normal (such as the uniform and reverse J-shaped distributions). At this point, when you look at a histogram for sample data, try to find a good match for your histogram shape in these tables. It is okay if you cannot find a match. Some distributions in samples don’t have any simple shape.
1. Distributions that resemble those in Table 5.1 can be judged “reasonably normal” in shape, with appropriate modifications to descriptions such as “moderately positively skewed.”
2. Distributions that resemble those in Table 5.2 are not at all close to normal in shape. Some of these distribution shapes can cause serious problems if they are analyzed using the basic bivariate techniques in this book and may require different and more advanced analyses.
3. Whether a distribution is approximately normal in shape or not, pay attention to outliers. Outliers can have a disproportionate impact on results. You must acknowledge the presence of outliers and decide what to do with them (even if your decision is to leave them in).
4. It is possible to do quantitative tests for departure from normality, as described in Appendix 6A at the end of this chapter.
However, quantitative tests for skewness, kurtosis, and overall departure from normal distribution shape are often not very useful in practice. Results of these tests often depend more on sample size than on degree of departure from normality (these tests almost always signal problems with distribution shape, even for distributions that are similar to normal, when samples are large, for example, N > 200). Furthermore, some statistical tests (not all tests) are fairly robust to violations of assumptions about normal distribution of scores in the population.
6.14 Departure From Normality: Skewness
One common departure from ideal normal distribution shape is skewness. Skewness is asymmetry; an ideal normal distribution is perfectly symmetrical. If you could “fold” a normal distribution along the line that corresponds to the mean, the two halves would match. Skewness describes the degree to which a histogram deviates from perfect symmetry. We say that a distribution is positively skewed if it is “heavy” at the lower end and has a longer, thin tail at the upper end. Conversely, we say that a distribution is negatively skewed if it has a longer, thinner tail at the lower end. Figure 6.7 shows schematic examples of positive and negative skewness. Visual examination of a histogram is often sufficient to decide whether there is notable skewness. A quantitative index of skewness can be requested from SPSS (see Appendix 6C), but it isn’t needed in most situations.
Some common situations cause data to be skewed. For example, there may be a lower limit to score values (a person cannot have fewer than 0 children) or an upper limit to scores (a student cannot obtain more than 100% correct on an exam). Variables such as annual income tend to be strongly positively skewed because minimum income is 0, but there is virtually no limit to income at the upper end of the distribution. Figure 6.8 shows substantial positive skewness (along with a possible floor effect and high-end outliers). A floor effect occurs when there is a limit to possible scores at the low end of a distribution. For example, a student cannot earn an exam score less than 0 points. If an exam is extremely difficult, many students will earn very low scores, and few students will earn high scores, as in the hypothetical distribution in Figure 6.8. For most students, the quiz was too hard.
Figure 6.7 Examples of Distribution Shapes for Positive, Zero, and Negative Skewness
Positive skewness (long tail on high end): SPSS skewness > 0
Not skewed (perfect symmetry): SPSS skewness = 0
Negative skewness (long tail on low end): SPSS skewness < 0
[...] mean) than for an ideal normal distribution. The patterns of scores in the center of platykurtic (sometimes described as “flatter” than normal) and leptokurtic (sometimes described as more “peaked” than normal) distributions can vary in ways that do not correspond to the graphs that appear in some textbooks. It is inaccurate to describe kurtosis simply as degree of “peakedness” (Westfall, 2014).
In practical applications of statistics, you can ignore kurtosis. Visual examination of distribution shape in histograms and boxplots provides more useful information about related potential problems, such as extreme outliers. More complete information about kurtosis (for curious readers) is provided in Appendix 6C.
6.16 Overall Normality
[...] > 200), these tests almost always indicate significant departures from normality. The results of these tests of normality often depend more on sample size than on distribution shape (University College London, Great Ormond Street Institute of Child Health, 2010). In most situations, simple visual examination of a histogram is enough to evaluate whether sample data are reasonably normally distributed. Quantitative tests for overall departure from normal distribution shape (essentially, comparing the shape of the histogram in your sample with an ideal normal distribution) appear in Appendix 6C.
Some textbooks say that a normal distribution of scores in a sample is a required assumption for the use of many common statistics. Strictly speaking, that is incorrect (Field, 2018). (An assumption involved in developing many of the statistics you will use was that scores were randomly sampled from a normally distributed population, but we usually don’t have enough information to evaluate distribution shape in the population.)
In practice, some departures from normal distribution shape, such as extreme outliers, do cause problems in data analysis. Distribution shape is discussed in later chapters when it is important for specific analyses.
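To make the sign convention for skewness in Section 6.14 concrete, here is a minimal sketch of a simple moment-based skewness index. This is an assumption-laden illustration, not the SPSS computation: statistical packages typically apply a small-sample adjustment, so their values will differ somewhat, although the sign (positive, zero, or negative) should agree. The data sets are hypothetical.

```python
from statistics import mean


def skewness(scores):
    """Moment-based skewness: mean cubed deviation divided by SD cubed."""
    m = mean(scores)
    n = len(scores)
    second_moment = sum((x - m) ** 2 for x in scores) / n  # population variance
    third_moment = sum((x - m) ** 3 for x in scores) / n   # keeps the sign of deviations
    return third_moment / second_moment ** 1.5


symmetric = [1, 2, 3, 4, 5]
long_upper_tail = [20, 25, 30, 35, 40, 200]   # e.g., incomes with one very high value
long_lower_tail = [-x for x in long_upper_tail]

print(skewness(symmetric))        # 0.0 for a perfectly symmetric set
print(skewness(long_upper_tail))  # positive
print(skewness(long_lower_tail))  # negative
```

Cubing the deviations is what preserves direction: a few very large positive deviations dominate the sum when the long tail is at the high end.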
6.17 Practical Recommendations for Preliminary Data Screening and Descriptions of Scores for Quantitative Variables
When you work with quantitative variables, you should do the following things.
* In all research, decide the value of N before you begin to collect data. (Do not collect data, repeatedly analyze it, collect more data because you are not happy with results, and then stop at a point where you have results you like.)
* Choose the method for outlier identification (such as boxplots or z scores) before you collect data.
* Establish rules for inclusion or exclusion of cases ahead of data collection. (For example, you may want to include a limited range of ages, or only right-handed persons, in your sample.)
* Decide how you will handle outliers before you collect data.
* If you anticipate skewness, think about what you might do to reduce skewness ahead of time. In many cases, if skewness is not extreme, you don't need to do anything about it.
* Collect data.
* Obtain a frequency table; identify impossible or questionable score values and note percentage of missing values.
* Obtain a histogram and visually examine it to evaluate distribution shape and skewness. Unless skewness is extreme, you probably don’t need to do anything about it.
* To evaluate outliers, obtain a boxplot and/or z scores for all cases. Either boxplots or z scores can be used to identify outliers. Note the number and locations of outliers.
* Your decision whether to use mean or median (as well as choices among later statistics) may depend on distribution shape and whether outliers are present.
* Document every decision you made.
6.18 Reporting Information About Distribution Shape, Missing Values, Outliers, and Descriptive Statistics for Quantitative Variables
You use all the information discussed in Chapters 3 through 6 to describe the behavior of each quantitative variable early in your research report. Try to communicate the pattern of information as clearly as possible. Information about distribution shape can be summarized in statements such as:
Heart rates were approximately normally distributed, with N = 100, M = 74, and SD = 4.5. There were no missing values. Using z > 3.29 in absolute value as the criterion for identifying outliers, there were no outliers.
The initial data set had N = 340 heart rate scores, with M = 76 and SD = 6.5. There were 20 missing values. Using z > 3.29 in absolute value as the criterion for identifying outliers, there were 10 outliers, all at the upper end of the distribution. On the basis of prior plans for data handling, the 20 missing values and 10 outliers were removed from the data set, leaving N = 310 cases for analysis. For these 310 cases, M = 68 and SD = 5.7.
Number of daily servings of fruit and vegetables had a possible range of scores from 0 to 8. Scores were not normally distributed;
each bar.
Figure 5.4 Frequencies Dialog Box and Frequencies: Charts Dialog Box
There are two boxes, and the one on the left has a variable titled marital. Below is a check box named display frequency tables. At the bottom are option buttons for the following: OK, Paste, Reset, Cancel and Help. On the right are the buttons Statistics, Charts, Format and Help. The Charts option has been depressed. The Frequencies: Charts dialog box has four chart type options: none, bar charts, pie charts and histograms. The bar charts option has been checked. The chart values tab has two choices, frequencies and percentages. Frequencies has been selected. At the bottom are the option buttons Continue, Cancel and Help.
Figure 5.5 Bar Chart for Hypothetical Marital Status Groups, Total N = 42
The X axis denotes the marital status of never married, engaged, married, divorced and widowed. The Y axis denotes the frequencies. The details are as follows:
never married: 20
engaged: 4
married: 11
divorced: 4
widowed: 3
5.4 Good Practice for Construction of Bar Charts
Bar charts and other graphs should provide accurate information that is easy to understand. It is easier for readers to understand graphs when they follow simple rules and conventional standards.
1. A separate bar represents the frequency (or proportion or percentage of cases) for each group. The height of the bar corresponds to the number or frequency in each group (or the proportion or percentage of cases in each group). The labels on the Y axis should make clear whether frequency, proportion, or percentage is reported. However, the relative heights of the bars are the same no matter which label is used. (Usually bars are vertical, but it is possible to set up bar charts in which bars are horizontal.)
2. Names of groups are specified by labels on the X axis.
3. Bars should have equal widths. (This rule is not always followed.)
4. The height of the graph (Y axis) is usually less than the width of the X axis (the height of Y is often about 75% the length of X).
5. The Y axis begins at 0 (or at another minimum value of Y.
distribution shape, we need to provide much more information to provide a complete description of different scores or responses. In some situations, we may need to report a complete frequency table to provide full information.
2. Problems in the distribution: Information about distribution shape is needed to identify potential problems such as outliers and skewness.
3. Describing quantitative variables in research reports: If there are few variables, you might summarize information about each variable in a sentence such as “Scores on X were approximately normally distributed,” or “Scores on X were extremely positively skewed, with 3% missing values, and two low-end outliers identified by location in a boxplot.” If there are many variables, a table could summarize this information.
The skills you need to remember from this chapter are:
* How to convert an X score into a z score (given values of M and SD).
* How to convert a z score back into an X score (given values of M and SD).
* How to find the percentage of area above or below any value of z or the percentage of area between any two values of z.
* How to identify outliers.
* How to summarize information about a quantitative variable, including at least distribution shape, missing values, outliers, and descriptive statistics such as M and SD.
The material in this chapter is extremely important; two widely used statistical procedures (confidence intervals and statistical significance tests) depend on understanding the way areas of normal distributions (and other similar distributions) are used to identify “common” versus “uncommon” (or rare or unexpected) outcomes.
Appendix 6A: The Mathematics of the Normal Distribution
A function is an equation that generates values for a Y variable on the basis of values of one or more X variables. The very simple function to convert height in inches (X) to height in centimeters (Y) is Y = 2.54 × X. This is a linear function; if you plot values of Y (vertical axis) against values of X (horizontal axis), the equation corresponds to a straight line, as shown in Figure 6.11. The equation (function) for the normal (or Gaussian) distribution is much more complicated, and it generates a curve (not a straight line). The equation uses a lot of notation you have not seen yet. The key things to notice are that:
* Y represents the height of the curve (on the vertical axis)
* (X - μ) represents the distance of an X score from the mean (on the horizontal axis of the plot of the function)
Equation 6.4 generates a value for Y (the height of the distribution) as a function of the distance of an X score from the mean. Other [...]
$$Y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$$ (6.4)
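As a hedged illustration of Appendix 6A, the normal (Gaussian) function can be written out directly and compared against the standard-library implementation. The function names and the example values (μ = 100, σ = 15) are assumptions chosen for the sketch, not values from the book.

```python
import math
from statistics import NormalDist


def inches_to_cm(x):
    """The simple linear function from the text: Y = 2.54 * X."""
    return 2.54 * x


def normal_height(x, mu, sigma):
    """Height Y of the normal curve at score x, for mean mu and SD sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)  # depends only on distance from mu
    return coefficient * math.exp(exponent)


mu, sigma = 100.0, 15.0  # hypothetical population mean and SD
for x in (70.0, 100.0, 115.0):
    by_hand = normal_height(x, mu, sigma)
    library = NormalDist(mu, sigma).pdf(x)
    print(x, round(by_hand, 6), round(library, 6))
```

Because (X - μ) is squared, scores equally far above and below the mean give the same height, which is why the curve is symmetric, and the height is greatest at X = μ.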
6. The top of each bar is labeled with an exact numerical value (a frequency or a percentage). SPSS does not do this for you; I added this information using SPSS Chart Editor.
7. Information about total N must be provided.
8. In a footnote or the body of the text, the source of data should be stated. Readers tend to assume that numbers are based on new data collected by the researcher; if there is another source (such as Gallup polls or the U.S. census), that source must be identified.
9. Bars in bar graphs for categorical variables usually do not touch one another. (This reminds readers that bars represent distinct groups.)
When you generate bar charts for frequencies in SPSS, many of these good form requirements are taken care of by default (e.g., bars are equal widths, and the Y axis begins at 0).
5.5 Deceptive Bar Graphs
The most common way to make a bar chart for group frequencies “lie” is to set up the Y axis so that it does not start at 0. To illustrate this deception, I modified the graph in Figure 5.5 so that the Y axis begins at 2 (instead of 0). The modified bar chart in Figure 5.6 is potentially misleading because people tend to look at the ratio of bar heights (or bar areas) when they compare group sizes; people often do not pay close attention to the specific values indicated on the Y axis. In Figure 5.6, the differences in group sizes appear larger than in Figure 5.5. In Figure 5.6, the never married group appears to have about 10 times as many members as the widowed group (measure the height of the bar for never married and divide this by the height of the bar for the widowed group). When actual group sizes are considered (20 never married, 3 widowed), the never married group is only about 7 times as large as the widowed group.
Figure 5.6 An Example of Bad Practice: Deceptive Bar Chart for Frequency of Marital Status
The X axis denotes the marital status of never married, engaged, married, divorced and widowed. The Y axis denotes the frequencies. The details are as follows:
never married: 20
engaged: 4
married: 11
divorced: 4
widowed: 3
Figure 5.7 Deceptive Bar Chart: Use of Cartoons Instead of Bars to Represent Frequencies
The X axis denotes the year, 2009 and 2019, and the Y axis denotes the number of new houses built and ranges from 0 to 10,000.
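The distortion produced by a Y axis that does not begin at 0 (Section 5.5) can be quantified with a small sketch. The function name is hypothetical, and the calculation assumes that each bar is drawn from the axis minimum up to its frequency; the ratio a reader measures on a printed chart such as Figure 5.6 depends on exactly how that chart was drawn, so it need not match this idealized number.

```python
def apparent_ratio(count_a, count_b, axis_start=0.0):
    """Ratio of drawn bar heights when the Y axis begins at `axis_start`."""
    return (count_a - axis_start) / (count_b - axis_start)


never_married, widowed = 20, 3  # frequencies from Figure 5.5

honest = apparent_ratio(never_married, widowed)                  # axis starts at 0
truncated = apparent_ratio(never_married, widowed, axis_start=2) # axis starts at 2

print(round(honest, 1))   # the true ratio of group sizes
print(truncated)          # the exaggerated ratio of drawn heights
```

When the axis starts at 0 the ratio of bar heights equals the ratio of group sizes; raising the axis minimum shrinks the shorter bar proportionally more, which inflates the visual contrast.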
The X axis denotes the z score and ranges from minus 3 to plus 3, with 0 as the center. The standard deviation has been provided as 1 and the mean is 0. The area on either side of the mean, that is, between 1 and 0 on the right as well as between minus 1 and 0 on the left, is equal to 34.13 percent. The area between plus 1 and plus 2 on the right and minus 1 and minus 2 on the left is equal to 13.59 percent each. The area between plus 2 and plus 3 on the right and minus 2 and minus 3 on the left is equal to 2.14 percent each. The area beyond plus 3 and minus 3 on either side is 0.13 percent. At the top of the figure, three lines show the area under the curve. The area under minus 1 to plus 1 is 68 percent. The area under minus 2 to plus 2 is 95 percent. The area under minus 3 and plus 3 is around 99 percent.
Appendix 6B: How to Select and Remove Outliers in SPSS
If a researcher decides on rules for the identification and removal of outliers before looking at the data, and detects outliers using these rules, the following SPSS commands can be used to remove (filter out) outliers.
In the following example, SPSS Select Cases commands are used to retain temperatures that are below 100 degrees Fahrenheit (and temporarily filter out any temperatures higher than this). To do this, make the following menu selections: <Data> → <Select Cases>. In the Select Cases dialog box (Figure 6.14), click the radio button for “If condition is satisfied.” Then click the If button immediately below that to open the second Select Cases: If dialog box.
Next you will see the Select Cases: If dialog box in Figure 6.15. Type in the logical expression “temp_Fahrenheit < 100.” A logical expression generally includes a variable; operators such as greater than, equal to, or less than; and specific numerical values (see Table 6.2). The full command this creates is “Select cases if temp_Fahrenheit is less than 100.” By implication, cases with values of temp_Fahrenheit greater than or equal to 100 are not selected. Data for the cases that satisfy this condition will be included in later analyses. Cases that do not meet this condition (that is, persons with temperatures of 100 or above) will be excluded from future analyses.
Under the Output heading in the Select Cases dialog box in Figure 6.14, I left the radio button selection as “Filter out unselected cases.” If you choose “Delete unselected cases,” cases will be removed permanently. Permanent deletion is usually not a good idea.
A research report must include information about any cases that are selected out. The number of cases, the score values, and the reason for selecting them out should be stated. Usually scores are removed because they are outliers, but there can be other reasons to remove scores.
When you look at the data file in Figure 6.16 you'll see that the row numbers for two excluded cases (with temperature scores of 101.3 and 100.4) are marked out with cross hatches. If the frequencies procedure is run to obtain the sample mean M, those two values will not be included.
Figure 6.14 Select Cases Dialog Box
On the left are variables, namely, sex, hr, temp underscore Fahrenheit, and temp underscore Celsius. The right has options to select cases. There are five choices: all cases, if condition is satisfied, random sample of cases, based on time or case range, and use filter variable. The second choice, If condition is satisfied, has been selected. Below this is the output section. There are three choices here: filter out unselected cases, copy selected cases to a new dataset, and delete unselected cases. The first option has been selected. A statement “Current Status: Do not filter cases” is below this. At the bottom of the dialog box are option buttons for the following: OK, Paste, Reset, Cancel and Help.
Figure 6.15 Select Cases: If Dialog Box
On the left is a set of variables, namely, sex, hr, temp underscore Fahrenheit, and temp underscore Celsius. Temp underscore Fahrenheit has been selected. A box on the right shows the expression temp underscore Fahrenheit less than 100. Below this is a keypad with numbers and special characters that is used to input the expression. At the extreme right is a box showing Function groups, of which the following are visible: all, arithmetic, CDF and noncentral CDF, conversion, current date or time, date arithmetic and date creation.
Table 6.2
-eeo NE
al Notegual
pT
Less than
> or GT me &orAND 0
Greater than or equal to late whether both conditions hold Evaluate whetherane orboth oftheconditions hold
Figure 6.16 Temperature Data File With Cases Removed by Select Cases Procedure Marked by
Cross Hatches
TE “Unid Isa 1PSSui Dt tr
This could be done with a logical expression such
ド や - — — = incon tt = = = 3 5 5
Be
2 2 2 1 1 2
Ph
ョ 7 ァ 5 70 ヵ
2 empFovennet|_ 2 tempCelis
1013 1004 ses ss En En
ЕТ) 2000 nn 3750 a a
as “temp_Fahrenheit > 97 AND temp_Fahrenheit < 100.” This would include scores only if they are both greater than 97 and less than 100 and
0 о 1 1 ① ①
excludescores outsidethat range. Remember that you must report the number of
From that point on, when you run analyses (such as the frequencies procedure to obtain statistics such as the mean), the few cases with temperatures equal to or greater than 100 will not be included. They have not been deleted from the data file, only temporarily excluded. If you want to stop excluding the cases with outliers, you need to go back to the Select Cases dialog box and select the radio button for “All cases,” as shown in Figure
6.17.
Figure 6.17 “Select Cases” Radio Button to Select All Cases (Stop Excluding Outliers)
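The logic of the Select Cases filter can be sketched outside SPSS. The following is a minimal illustration in Python, not the SPSS procedure itself; the temperature scores and the function name select_cases are hypothetical and chosen only for this example. As in SPSS, the excluded cases are set aside rather than deleted, so the number of excluded cases can still be reported.

```python
# Hypothetical temperature scores; the variable mirrors temp_Fahrenheit in the text.
temps = [96.8, 97.4, 98.2, 98.6, 99.1, 100.0, 100.8]


def select_cases(scores, lower=97.0, upper=100.0):
    """Split scores using the rule: score > lower AND score < upper."""
    selected = [x for x in scores if x > lower and x < upper]
    excluded = [x for x in scores if not (x > lower and x < upper)]
    return selected, excluded


selected, excluded = select_cases(temps)
print(len(selected), "cases kept;", len(excluded), "cases temporarily excluded")
print(len(temps), "cases remain in the original data")
```

Because the original list is never modified, "selecting all cases" again simply means going back to temps.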
120, the t distribution becomes very close to the standard normal distribution). As df increases, the critical values of t decrease; by the time df > 120, critical values of t converge to those of the standard normal distribution. For df > 120, 2.5% of the distribution lies below -1.96, the middle 95% lies between -1.96 and +1.96, and 2.5% of the distribution lies above +1.96. When N is large, the amount of additional sampling error created by using SD to estimate σ becomes negligible.
Figure 7.7 Lower Tail, Middle 95%, and Upper Tail of Normal Distribution
Source: Abridged from Fisher and Yates (1974, Table V). The image is an extract from the critical values for the t distribution and has been adapted from the table by Fisher and Yates. The table lists different confidence intervals and levels of significance for one-tailed and two-tailed tests. It also shows the df values that result in the critical values. Details are below:
The image is a diagram of the normal distribution that shows the lower tail, middle 95 percent, and upper tail. The X axis has 0 as the center. Minus 1.96 and plus 1.96 have also been marked. The area between plus 1.96 on the right and minus 1.96 on the left is equal to 95 percent. The area beyond plus 1.96 and minus 1.96 on either side is 2.5 percent each. A statement below the diagram mentions that the most extreme 5 percent is the sum of areas in the lower and upper tails beyond z equals 1.96.
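The areas described for Figure 7.7 can be checked numerically. This is a small sketch that uses NormalDist from the Python standard library; it is offered only as an illustration and is not part of the SPSS workflow used in this book.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

lower_tail = z.cdf(-1.96)             # area below z = -1.96
middle = z.cdf(1.96) - z.cdf(-1.96)   # area between -1.96 and +1.96
upper_tail = 1 - z.cdf(1.96)          # area above z = +1.96
print(round(lower_tail, 3), round(middle, 3), round(upper_tail, 3))

# Going the other way: the z value that cuts off the top 2.5% of the area.
z_cutoff = z.inv_cdf(0.975)
print(round(z_cutoff, 2))
```

The two tail areas are each about .025, which is why the cutoff of 1.96 is used for the middle 95% of a normal distribution.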
Confidence interval (%)                   80      90      95      98      99
Level of significance, one-tailed test    0.10    0.05    0.025   0.01    0.005
Level of significance, two-tailed test    0.20    0.10    0.05    0.02    0.01
df
1                                         3.078   6.314   12.706  31.821  63.657
2                                         1.886   2.920   4.303   6.965   9.925
3                                         1.638   2.353   3.182   4.541   5.841
4                                         1.533   2.132   2.776   3.747   4.604
5                                         1.476   2.015   2.571   3.365   4.032
6                                         1.440   1.943   2.447   3.143   3.707
7                                         1.415   1.895   2.365   2.998   3.499
8                                         1.397   1.860   2.306   2.896   3.355
9                                         1.383   1.833   2.262   2.821   3.250
10                                        1.372   1.812   2.228   2.764   3.169

The eighth row of the df and the 95 percent confidence interval column have been circled.

Figure 7.9 Division of Area for Distribution With 8 df Into Bottom 2.5%, Middle 95%, and Top 2.5%
The image is a diagram of a t distribution with 8 df that shows the percentage of area under the curve. The area between plus 2.034 on the right and minus 2.034 on the left is equal to the middle 95 percent. The area beyond plus 2.034 is the upper 2.5 percent and beyond minus 2.034 is the lower 2.5 percent.

The value of C is usually 95%; it corresponds to the percentage of area we use for the middle of the distribution when we look up cutoff values for t. C can be other values, such as 90% or 99%.

7.13 Using Sampling Error to Set Up a Confidence Interval

The following pieces of information are needed to set up a confidence interval (CI):
* C, an arbitrarily selected confidence level.
* The values of M, SD, and N from the sample data.

We need to do the following to find the lower and upper limits of a confidence interval on the basis of a sample mean M:
1. Find df = N - 1.
2. Look up the (absolute) critical value from a t distribution that corresponds to the middle C% of the t distribution with N - 1 df. The critical value of t can be obtained from the table in Appendix B at the end of this book and is sometimes denoted t_critical C%.
3. Calculate SE_M (using SD and N from the sample).
4. Find the lower limit of the 95% CI:

$$M - (t_{\text{critical } C\%} \times SE_M) \tag{7.8}$$

5. Find the upper limit of the 95% CI:

$$M + (t_{\text{critical } C\%} \times SE_M) \tag{7.9}$$

These equations convert the t values that correspond to the middle 95% of the distribution of t back into the raw-score units in which the mean was given.

An example: Suppose that N = 25. We use the t distribution with 24 df. Suppose we want a 95% level of confidence. We locate the value from the table of the t distribution in Appendix B at the end of the book for a 95% level of confidence and 24 df: t_critical 95% = 2.064. We have values of M = 50, SD = 10, and SE_M = 10/√25 = 10/5 = 2. The confidence interval is calculated as follows:

Lower limit of the 95% CI: M - (t_critical 95% × SE_M) = 50 - 2.064 × 2 = 50 - 4.128 = 45.872.
Upper limit of the 95% CI: M + (t_critical 95% × SE_M) = 50 + 2.064 × 2 = 50 + 4.128 = 54.128.

This can be reported as: 95% CI [45.872, 54.128].

Other factors being equal, these factors make confidence intervals wider:
* A higher level of confidence, for example, use of C = 99% instead of C = 95%
* Smaller N
* A larger value of SD

We prefer to have narrow confidence intervals. To make confidence intervals narrower, we could do any of the following things (other factors being equal):
* Choose a lower level of confidence (such as 90% or 68% instead of 95%). However, researchers are reluctant to make the level of confidence too low; 95% confidence is the most widely used value.
* Increase the sample size N.
* Decrease SD. (Chapter 12, on the independent-samples t test, describes things researchers can do in some situations that may decrease SD. However, in many situations, researchers have little control over SD.)

7.14 How to Interpret a Confidence Interval

The language used to interpret CIs is tricky. It is incorrect to say that a 95% CI computed using data from a single sample has a 95% chance of including μ. (It either does or it doesn’t, and we have no way to be certain which situation we have for an individual sample.)

It is more accurate to think about a CI as a statement about expected outcomes in the long run, across hundreds or thousands of different samples from the same population. For a 95% CI, approximately 95% of the CIs that are set up using the procedures described in this chapter are expected to include the true population mean μ between the lower and the upper limits. Approximately 5% of these CIs will not contain μ.

Cumming and Finch (2005) suggested this as a way to think about CIs: “a range of plausible values for μ; values outside the CI are relatively implausible ... [the] data are compatible with any value of μ within the CI but relatively incompatible with any value outside it.”

A problem with CIs is that, like M and SD, they vary across samples. Here is a thought experiment that illustrates the problem. If you randomly select 18 samples (each of size 25) from the same population, the values of M and SD will vary across these samples. That implies that the upper and lower boundaries of the 95% CIs will also vary across samples, as in the hypothetical example in Figure 7.10. Each vertical line with whiskers represents the lower and upper bounds of the CI for 1 of the 18 samples. The circle in the middle of each CI represents the mean for that sample; the circle is filled if the CI for that sample includes the true value of μ and open if the CI for that sample does not include the true value of μ. The true value of μ for the population corresponds to the horizontal line.
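The five steps listed in Section 7.13 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a replacement for the table lookup: the function name confidence_interval is hypothetical, and the critical value of t is supplied by hand from a printed t table (2.064 is the tabled two-tailed .05 value for 24 df, which marks off the middle 95%).

```python
import math


def confidence_interval(mean, sd, n, t_critical):
    """Return (lower, upper) limits: mean -/+ t_critical * SE_M."""
    se_m = sd / math.sqrt(n)         # step 3: standard error of the mean
    half_width = t_critical * se_m   # distance from M to each limit
    return mean - half_width, mean + half_width


# Worked example from Section 7.13: N = 25 (df = 24), M = 50, SD = 10.
lower, upper = confidence_interval(mean=50, sd=10, n=25, t_critical=2.064)
print(round(lower, 3), round(upper, 3))
```

Changing n, sd, or t_critical in the call shows directly how sample size, variability, and level of confidence make the interval wider or narrower.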
In this example, 16 of the 18 CIs included μ, while the other 2 CIs did not include μ. If we had CIs for thousands of samples, 95% of them would be expected to include μ; the other 5% would not include μ. The 95% confidence level is a prediction about how many CIs out of thousands would include μ.

Figure 7.10 Hypothetical Outcomes for 18 Confidence Intervals
Source: Adapted from Cumming and Finch (2005).
The image shows a hypothetical outcome for 18 confidence intervals (Samples 1 through 18). The X axis is the vertical axis with its center as mu, and the outcomes are indicated by circles in the center with vertical lines on either side. Each vertical line represents the lower and upper bounds of the confidence intervals for each of the 18 samples. Most of the circles and lines for the confidence intervals included the mu, except for two samples, which have been circled. While one lies above the mu, the other falls below the mu. The image has been adapted from Cumming and Finch.
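The long-run interpretation can be tried out with a small simulation. The sketch below is only an illustration: the population values (μ = 100, σ = 15), the number of samples, and the random seed are arbitrary assumptions, and 2.064 is the tabled critical value of t for the middle 95% with 24 df.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

MU, SIGMA = 100.0, 15.0   # assumed (hypothetical) population mean and SD
N = 25                    # size of each sample; df = 24
T_CRITICAL = 2.064        # tabled t for the middle 95% with 24 df
N_SAMPLES = 2000          # number of samples drawn from the population

hits = 0
for _ in range(N_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = statistics.mean(sample)
    se_m = statistics.stdev(sample) / math.sqrt(N)  # sample SD uses N - 1
    lower, upper = m - T_CRITICAL * se_m, m + T_CRITICAL * se_m
    if lower <= MU <= upper:
        hits += 1  # this sample's CI includes the true population mean

coverage = hits / N_SAMPLES
print(f"{coverage:.3f} of the {N_SAMPLES} CIs included mu")
```

The proportion printed should be close to .95, although it will rarely be exactly .95 for any finite number of samples.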
7.15 Empirical Example: Confidence Interval for Body Temperature

Most of us assume that normal or average healthy adult body temperature is 98.6°F. In 1868, Wunderlich (cited in Mackowiak, Wasserman, & Levine, 1992) summarized data from over 1 million temperature measurements for 25,000 patients; he concluded that the “normal healthy” body temperature was 98.6°F or 37°C. Until fairly recently, that value has not been questioned; few studies of normal body temperature have been done. Mackowiak et al. (1992) believed that it would be useful to examine new data because instrumentation for taking body temperature has changed since the 19th century. Shoemaker (1996) created an artificial data set in which the score values led to conclusions like those of Mackowiak et al. Data adapted from Shoemaker are used for the analyses in this section. It might seem that finding average body temperature for human populations would be easy, but it’s a more complex question than it appears. For readable discussions, see Cook (2018) and Maril (2018). A more recent study of temperature data collected through smart phone crowdsourcing is reported by Hausman et al. (2018).

Values of N, M, and SD for the Fahrenheit temperature scores in the file shoemaker.sav were obtained using the SPSS frequencies procedure (menu selections are not repeated from earlier chapters). Results appear in Figure 7.11. The first thing to notice is that the sample mean in Figure 7.11, M = 98.25, is lower than the population mean that people generally believe (98.6). The difference is (98.25 - 98.6) = -.35. This sample mean is about a third of a degree lower than the generally accepted value. (Note that if you look up Shoemaker's article, numerical values differ from the ones I reported; I modified his data slightly.) Is this difference or inconsistency small enough that we can dismiss it, or large enough that we should pay attention to it? Further information is needed.

Figure 7.11 Descriptive Statistics for Temperature in Fahrenheit in shoemaker.sav

Statistics: temp_Fahrenheit
N Valid               130
N Missing             0
Mean                  98.254
Std. Error of Mean    .0667
Std. Deviation        .7603

To evaluate whether the sample mean is consistent with an estimate of μ = 98.6, we will set up a 95% CI and ask whether 98.6 is included in that CI (or not). Notice that the standard error of the mean (SE_M) reported by SPSS is .0667. You can confirm this by hand: SE_M = SD/√N = .0667. To obtain the limits of the 95% CI, a new SPSS procedure is introduced (the one-sample t test). This procedure will be used more extensively in Chapter 8. Make the following menu selections: Analyze > Compare Means > One-Sample T Test. When the dialog box for the one-sample t test appears, as in Figure 7.12, move the name of the variable of interest into the list of variables to be analyzed. Leave the box “Test Value” containing the default value of 0. Then click OK.

Figure 7.12 Use of SPSS One-Sample t Test Procedure to Obtain Mean and 95% Confidence Interval: Initial Menu Selections and Dialog Box for One-Sample t Using 0 as Test Value
The image is a screenshot of the procedure to use the SPSS One-Sample t Test. At the top of the spreadsheet are the following menu buttons: analyze, graphs, utilities, extensions, window, and help. Below these buttons are icon buttons for table editing options. On the clicking of the Analyze button, a drop-down menu with the following options has opened: reports, descriptive statistics, Bayesian statistics, tables, compare means, general linear model, generalized linear models, mixed models, correlate, regression, loglinear, classify, dimension reduction, scale, non-parametric tests, forecasting, survival, multiple response, simulation, quality control, ROC curve, and spatial and temporal modelling. The compare means menu has been clicked and the following menu options are visible: means, one-sample T test, independent-samples T test, summary independent-samples T test, paired-samples T test, and one-way ANOVA.
The one-sample T test dialog box is also open. This has a set of variables on the left: sex, hr, temp underscore Celsius, zscore open bracket temp underscore Fahrenheit close bracket, and zscore open bracket temp underscore Celsius close bracket. There is an option to move the required variable to the box on the right for test variables. Temp underscore Fahrenheit is in this box. At the right is a button to control Options. The test value can be changed. At present, it has been set to 0. At the bottom of the dialog box are option buttons for the following: OK, Paste, Reset, Cancel, and Help.

The output appears in Figure 7.13. The area enclosed in the ellipse in Figure 7.13 shows the lower and upper limits for the 95% CI for mean temperature in degrees Fahrenheit. Note that this CI does not include the value that most people think of as average body temperature (98.6). The Shoemaker temperature data suggest that the true population mean for body temperature may be lower than the conventionally assumed value of 98.6.

Figure 7.13 Output for One-Sample t Test Procedure for Body Temperature Data (Test Value = 0)
The image is the output for the one-sample T test procedure. The statistics are in one table and the output is in another table. Both of them have been provided below:

One-Sample Statistics: temp_Fahrenheit
N                  130
Mean               98.254
Std. Deviation     .7603
Std. Error Mean    .0667

One-Sample Test (Test Value = 0): temp_Fahrenheit
t                  1473.387
df                 129
Sig. (2-tailed)    .000
Mean Difference    98.2542
95% Confidence Interval of the Difference: Lower 98.122, Upper 98.386

The 95 percent confidence interval of the difference has been circled.

Using the language suggested by Cumming and Finch (2005), a brief interpretation is: “Our 95% CI for body temperature in degrees Fahrenheit [98.12, 98.39] is a range of plausible values for population mean body temperature. Values outside this CI are relatively implausible.” To report and interpret this result more extensively, we can say,

On the basis of this sample of N = 130 temperature measurements, with M = 98.254 and SD = .7603, the 95% CI for body temperature in degrees Fahrenheit was [98.12, 98.39]. The value that people usually assume for population mean body temperature, 98.6, does not fall within this range of plausible values for μ. The results of this (hypothetical) study are inconsistent with the claim that μ = 98.6. However, they do not conclusively disprove that μ = 98.6. The low value of M in this sample might have occurred because of sampling error. Information from additional studies is needed to evaluate whether the true population mean for body temperature is lower than 98.6. We should always look for replications using large and representative samples before we draw conclusions; data from one small study do not prove that μ = 98.6 is incorrect. However, results reported by Mackowiak et al. (1992) were somewhat inconsistent with that belief. The assumption that mean body temperature in a normal healthy adult population is 98.6°F deserves a second look; that value was based on a kind of instrumentation to measure body temperature that is no longer used.

We can create graphs to display confidence intervals. When CIs are graphed, they are called error bars. However, note that lines that are called “error bars” in published graphs do not always represent confidence intervals; sometimes error bars correspond to SD or SE. Titles for the graphs should make it clear what the error bars represent. From the main SPSS menu, select Graphs > Legacy Dialogs > Error Bar. The Error Bar dialog box appears in Figure 7.14. Choose “Simple” and “Summaries of separate variables,” then click Define to open the next dialog box, in Figure 7.15. Enter the name of the variable for which you want error bars. There is a pull-down menu, “Bars Represent,” initially set to “Confidence interval for mean,” that allows you to specify whether you want bars to represent the CI (or SD or SE); leave this at the default selection for CI. The output appears in Figure 7.16.

Figure 7.14 First Dialog Box for Error Bar (Confidence Interval) Graph
The image is a dialog box for the error bar graph that allows for choosing the type of graph as well as how the data is summarized. The dialog box has two chart types: simple and clustered. The Simple option has been chosen. The data in the chart can be arranged in two ways: summaries for groups of cases and summaries of separate variables. The second option has been chosen. At the bottom of the dialog box, there are three radio buttons: define, cancel, and help. The Define button has been selected.

Figure 7.15 Second Dialog Box to Define Error Bar
The image is a second dialog box to define error bar and shows how to represent the error bars in the graph. On the left are a set of variables, which can be chosen and moved to the box on the right. The variables available are sex, hr, temp underscore Celsius, zscore open bracket temp underscore Fahrenheit close bracket, and zscore open bracket temp underscore Celsius close bracket. The variable temp underscore Fahrenheit has been moved to the error bar variable box. There are two radio buttons on the side: titles and options. There is a drop-down menu that allows a choice of what the bars represent. Here the bars represent confidence interval for mean. The level can also be chosen and 95 percent is the level currently.

Figure 7.16 Graph of 95% CI for Temperature Data (Degrees Fahrenheit)
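The limits that SPSS reports for the temperature data, and that Figure 7.16 plots as an error bar, can be checked by hand. The sketch below is an illustration only. It uses the values of N, M, and SD from the SPSS output, and it assumes that 1.979 is an adequate approximation of the critical value of t for the middle 95% with df = 129 (a printed table may list only nearby df values).

```python
import math

# Values reported by SPSS for temp_Fahrenheit in shoemaker.sav.
n, mean, sd = 130, 98.254, 0.7603

# Assumption: approximate critical t for the middle 95% with df = 129.
t_critical = 1.979

se_m = sd / math.sqrt(n)           # standard error of the mean
lower = mean - t_critical * se_m   # lower limit of the 95% CI
upper = mean + t_critical * se_m   # upper limit of the 95% CI

print(f"SE_M = {se_m:.4f}")
print(f"95% CI [{lower:.2f}, {upper:.2f}]")
print("98.6 inside the CI?", lower <= 98.6 <= upper)
```

The hand calculation agrees with the SPSS limits to two decimal places, and the conventionally assumed value of 98.6 falls above the upper limit.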
95% confidence interval for temperature in Fahrenheit
The image is an output graph of the 95 percent confidence interval for temperature data. In the image, which is the 95 percent confidence interval for temperature in Fahrenheit, the temperature is the Y axis and ranges from 98.1 to 98.4. The mean has been indicated as 98.25, and the upper and lower limits of the line have also been shown. The upper limit is close to 98.38 and the lower limit is around 98.12.

7.16 Other Applications for Confidence Intervals

7.16.1 CIs Can Be Obtained for Other Sample Statistics (Such as Proportions)

The sample mean, M, is not the only statistic that has a sampling distribution and a known standard error. The sampling distributions for many other statistics are known; thus, it is possible to identify an appropriate sampling distribution and to estimate the standard error and set up CIs for many other sample statistics, such as Pearson’s r. Political polls (and sometimes opinion polls) often report statistics such as percentages, and it is possible to set up CIs for percentage estimates.

7.16.2 Margin of Error in Political Polls

In many political and opinion polls, respondents are asked to state which among two or more alternatives they prefer (for example, pass or reject Proposition 13, which calls for legalization of recreational use of marijuana; intention to vote for Candidate A, B, or C). In these situations, the sample statistic of interest is a percentage (e.g., the proportion of respondents who say that they intend to vote for A, B, or C or who say they don’t know). It is possible to set up a 95% CI for a sample percentage taking N in the sample into account. However, a margin of error reported for polling results usually corresponds to a 68% CI. As N increases, margin of error decreases.

The lower and upper limits of a 68% CI for a sample percentage are

$$\text{Lower limit} = (\% - \text{margin of error}) \tag{7.10}$$

$$\text{Upper limit} = (\% + \text{margin of error}) \tag{7.11}$$

It is possible for margin of error to be related to a different level of confidence. Unfortunately, the
definition of margin of error is often not stated specifically in media reports. Often the margin of error reported in media corresponds to a 68% confidence interval.

As an example, suppose that 54% of those polled say that they plan to vote for Candidate A, with a margin of error of ±2%. This implies that the 68% CI ranges from 52% to 56% in favor of Candidate A. Plausible estimates of the population proportion lie within this range. If more than 50% of the vote is required to win the election, a CI from 52% to 56% indicates that it is plausible (but not certain) that Candidate A will win.

Consider a different scenario, in which the proportion of people who plan to vote for B is 33 ± 2%, and the proportion of those who plan to vote for C is 35 ± 3%. This translates into CIs of 31% to 35% (for Candidate B) and 32% to 38% (for Candidate C). Candidate C may be ahead of Candidate B by a small amount, but that small difference could easily be due to sampling error.

7.17 Error Bars in Graphs of Group Means

Error bars can be superimposed on bar graphs in which the heights of bars correspond to group means. Consider the hypothetical example in Figure 7.17. Students are asked to rate their degree of agreement with the statement “I feel guilty when I eat foods I know are unhealthy” on a five-point scale (1 = strongly disagree to 5 = strongly agree). Mean scores are calculated for male and female students. For male students, N = 4, M = 3.25, SD = .957, and SE_M = .479; for female students, N = 3, M = 4.33, SD = .577, and SE_M = .333. (For each group, you should be able to calculate SE_M from N and SD.)

Look at the correspondence between the descriptive statistics and the graph. The height of each bar corresponds to the mean rating for a group. The end points of the 95% CI error bars can be found by multiplying t_critical 95% (using a t distribution with df = 2 for the female and df = 3 for the male group) by the value of SE_M and identifying this distance below and above the group mean. This type of bar graph is very common in research reports in which group means are compared. Keep in mind that the error bars on this type of graph can represent a 95% CI but might represent SD or SE_M; look for that information in the figure title or note.

Figure 7.17 Bar Graph for Group Means With 95% CI Error Bars
The image shows two bars in a graph that represent group means with 95 percent CI error bars. The bars represent student agreement with the statement “I feel guilty when I eat foods I know are unhealthy.” The X axis shows the Male and Female groups, and a note below the graph reads “Error bars: 95% CI.”

By now you should be able to understand the nature of the differences between female and male students either by comparing values of M or by examining the bar graph. Which group had higher mean guilt about unhealthy foods?
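Both the poll intervals in Section 7.16.2 and the error bars in Section 7.17 follow the same pattern: a point estimate plus and minus some distance. The sketch below illustrates that pattern with the numbers given in the text. The helper name interval is hypothetical, and the critical t values (3.182 for df = 3 and 4.303 for df = 2) are taken from the two-tailed .05 column of a printed t table.

```python
import math


def interval(center, half_width):
    """Return (lower, upper) = center -/+ half_width."""
    return center - half_width, center + half_width


# Poll results: (percentage, margin of error) for Candidates A, B, and C.
polls = {"A": (54, 2), "B": (33, 2), "C": (35, 3)}
poll_cis = {name: interval(pct, moe) for name, (pct, moe) in polls.items()}
print(poll_cis)

# If the intervals for B and C overlap, C's lead could be sampling error.
b_low, b_high = poll_cis["B"]
c_low, c_high = poll_cis["C"]
overlap = b_low <= c_high and c_low <= b_high
print("B and C overlap:", overlap)

# Group means: (N, M, SD, critical t for the middle 95% with df = N - 1).
groups = {"male": (4, 3.25, 0.957, 3.182), "female": (3, 4.33, 0.577, 4.303)}
group_cis = {}
for name, (n, m, sd, t_crit) in groups.items():
    se_m = sd / math.sqrt(n)                      # SE_M from N and SD
    lower, upper = interval(m, t_crit * se_m)     # end points of the error bar
    group_cis[name] = (se_m, lower, upper)
    print(name, round(se_m, 3), round(lower, 2), round(upper, 2))
```

With samples this small, the error bars are very wide; the upper end point for the female group even exceeds the maximum possible rating of 5, which is a reminder that these intervals are rough when N is tiny.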
The X axis denotes the sex, male and female. The Y axis denotes the mean height, and ranges from 60 to 70. There are two bars, female and male. The female bar has a height of 64, while the male bar has a height of 69.

One difference you may notice is that, in this chart, the Y axis begins at 60 (instead of 0, which was the recommended value for the Y axis origin when bar charts were used for frequencies). Here's why. For group frequencies, 0 cases per group is a possible value. For means of variables such as adult height, 0 is not a possible value of height. It makes sense to choose a value of Y that is below the minimum height in the sample, but higher than 0, for a bar chart in which bars represent means.

If you read research reports, you are more likely to encounter bar charts that represent group means than bar charts for group sizes or frequencies. You will learn more about setup and interpretation of this type of bar chart in chapters about the independent-samples t test and ANOVA.

5.15 Other Examples

5.15.1 Scatterplots

In some studies, researchers want to evaluate whether scores on one quantitative variable (extraversion) are related to scores on another quantitative variable (physical energy). A preliminary graph called a scatterplot is used to examine the relationship between variables prior to doing statistical analyses such as correlation or regression. An example of a scatterplot appears in Figure 5.29. In this hypothetical study, each person provided self-report scores for extraversion (rated on a scale from 1, not at all extraverted, to 5, highly extraverted) and for energy (1 = very low energy, 6 = very high energy). Each data point in the scatterplot represents the combination of scores on extraversion (on the X axis) and energy (on the Y axis) for one case. For example, the case marked with a circle in Figure 5.29 represents a person with an extraversion score of 4 and an energy score of 3.

The three ellipses in Figure 5.29 identify areas of the graph that can be compared. On the left, an ellipse encloses energy scores for people whose scores on extraversion were low (below 2). On the right, an ellipse encloses the energy scores for persons whose extraversion ratings were high (above 4). You can see that for the people with low scores on extraversion, energy scores also tended to be low. For persons with high scores for extraversion, energy scores tended to be high. People with moderate scores on one variable also had moderate scores on the other variable. This is an example of a positive linear relationship. In a later chapter this kind of relationship between two quantitative variables will be assessed using Pearson correlation.

Figure 5.29 Scatterplot of Physical Energy Scores (Y Axis) With Extraversion Scores (X Axis)
5. How does SE_M differ from σ_M?
6. What is SE_M? What does the value of SE_M tell you about the typical magnitude of sampling error?
   * As SD increases, how does the size of SE_M change (assuming N stays the same)?
   * As N increases, how does the size of SE_M change (assuming SD stays the same)?
7. How is a t distribution like a standard normal distribution? How is it different?
8. Under what circumstances should a t distribution be used rather than the standard normal distribution to look up areas or probabilities associated with distances from the mean?
9. Consider the following questions about CIs: A researcher tests emotional intelligence (EI) for a random sample of children selected from a population of all students who are enrolled in a school for gifted children. The researcher wants to estimate the mean EI for the entire school. Let's suppose that a researcher wants to set up a 95% CI for IQ scores using the following information:
   The sample mean M = 130.
   The sample standard deviation SD = 15.
   The sample size N = 120.
   df = N - 1 = 119.
   For the values given above, the limits of the 95% CI are as follows:
   Lower limit = 130 - 1.96 × 1.37 = 127.31.
   Upper limit = 130 + 1.96 × 1.37 = 132.69.
   The following exercises ask you to experiment to see how changing some of the values involved in computing the CI influences the width of the CI. Recalculate the CI for the emotional IQ information in the preceding question to see how the lower and upper limits (and the width of the CI) change as you vary the N in the sample (and leave all the other values the same).
   1. What are the upper and lower limits of the CI and the width of the 95% CI if all the other values remain the same (M = 130, SD = 15), but you change the value of N to 16? Note that when you change N, you need to change two things: the computed value of SE_M and the degrees of freedom used to look up the critical values for t.
   2. What are the upper and lower limits of the CI and the width of the 95% CI if all the other values remain the same, but you change the value of N to 25?
   3. What are the upper and lower limits of the CI and the width of the 95% CI if all the other values remain the same (M = 130, SD = 15), but you change the value of N to 49?
   4. On the basis of the numbers you reported for sample size N of 16, 25, and 49, how does the width of the CI change as N (the number of cases in the sample) increases?
   5. What are the upper and lower limits and the width of this CI if you change the confidence level to 80% (and continue to use M = 130, SD = 15, and N = 49)?
   6. What are the upper and lower limits and the width of the CI if you change the confidence level to 99% (continue to use M = 130, SD = 15, and N = 49)?
   7. How does changing the level of confidence from 80% to 99% affect the width of the CI?

Digital Resources
Find free study tools to support your learning, including eFlashcards, data sets, and web resources, on the accompanying website at
You can describe distribution shape by thinking about the answers to these questions. Some of these descriptions are not mutually exclusive. For example, a positively skewed distribution may also have high-end outliers, and it may have a large mode at zero.

In a typical research report, authors would like to be able to say something like this at the beginning of the “Results” section: “All quantitative variables were approximately normally distributed with no extreme outliers.” Real data often do not behave so nicely, of course. An author might have to say something more like this: “Number of doctor visits had a reverse J-shaped distribution with five high-end outliers.”

Comprehension Questions

1. In the bar graphs in most of this chapter (except those in Section 5.14), the height of the Y axis provides what information?
2. Suppose you generate a bar graph using SPSS. You also have a frequency table for the same data. What information from the frequency table might you add to the bar graph to make the information in the bar graph more precise?
3. What is a common practice that can make a bar graph deceptive? Can you think of at least one other way bar graphs can be made deceptive?
4. What can you see in a histogram of quantitative scores that is less easy to see in a frequency table?
5. Consider the histogram in Figure 5.32.
   1. What were the minimum and maximum number of servings of fruits and vegetables people said they ate per day? What was the range?
   2. What was the modal amount of fruit and vegetable consumption?
   3. Diet experts often recommend at least five servings of fruits and vegetables per day. How well are the people in this sample doing at meeting that standard?
   4. What percentage of persons reported eating one serving per day? This is a frustrating question to answer, given this bar chart. If you had access to these data, what other SPSS output would you want to see to answer this question precisely?
6. Briefly describe, in your own words, three things you look for to decide whether a histogram looks like a “reasonably normal” distribution.
7. Describe the shape of each of the histograms in Table 5.3. Sometimes more than one term can be applied; for example, skewed distributions may also have outliers.
8. What type of plot appears in Figure 5.33? What do the values on the Y axis correspond to? (Score values? Frequencies?) What information can you report from this plot? There are omissions in labeling. What labels could be added to this chart?

Figure 5.32 Results From Warner, Frye, Morrell, and Carey (2017): Number of Servings of Fruits and Vegetables Eaten on a Typical Day, N = 1,250
[Histogram: Y axis from 0% to 50%; X axis, number of servings of fruits and vegetables per day, from 0 to 8]
The X axis represents the number of servings
8.2 Significance Tests as Yes/No Questions About Proposed Values of Population Means

Sometimes real-world data analysts have access to data for an entire population of interest. Statistical significance tests are not needed when information is available for the entire population. Significance tests are used when we want to make inferences (or estimates or guesses) about unknown population characteristics such as μ using only data from a sample.

In Chapter 7, the body temperature data in the file shoemaker.sav were used to set up a CI for mean body temperature in degrees Fahrenheit. The 95% CI based on those data did not include the value of 98.6°F that most people believe is the mean temperature for healthy human populations. In this chapter, we begin by proposing (or hypothesizing) that μ = 98.6°F; then we examine sample data to decide whether that proposed value of μ is, or is not, plausible.

The procedure for NHST involves familiar operations: computing descriptive statistics such as M, SD, and SE_M and looking up critical or cutoff values of t in a table for t with df = (N − 1). New steps involve setting up null and alternative hypotheses about proposed values of μ. Each individual step is simple; however, it can be difficult for beginning students to keep all these steps in mind. It is important to go through these steps “by hand”; the more you repeat them, the better you may understand the logic.

After you escape into the “real world” and write research reports, you will not have to write out all the logic involved in NHST; most of the logic will be implicit. Research reports rarely provide detailed information about all the steps that are outlined in this chapter. SPSS and similar programs will generate final numerical results for you; you won't need to do arithmetic and table lookups. However, you need to understand the logic so that you can understand the meaning and limitations of p values and NHST.

The following sections describe the steps that are involved in NHST.

8.3 Stating a Null Hypothesis

The term hypothesis can refer to a verbal statement (e.g., “I think my partner is cheating”). For statistical significance tests, hypotheses correspond to equations. To set up a yes/no question about a proposed value for μ, the unknown population mean, we begin by stating a null hypothesis (H₀) in this form:

$$H_0: \mu = \mu_{hyp} \tag{8.1}$$

In this equation, μ_hyp is always replaced by a specific numerical value. Using 98.6°F as the specific numerical value for μ_hyp, the null hypothesis for the study of Mackowiak, Wasserman, and Levine (1992) is:

$$H_0: \mu = 98.6\,^{\circ}\mathrm{F}$$

In words, this null hypothesis says, “I hypothesize that the true population mean body temperature equals 98.6°F.” Depending on the variable that is examined in the study, the proposed value for the population mean stated in the null hypothesis could have other values, such as a driving speed of 35 mph, a diameter of 10 cm, or an IQ of 100 points. Most books refer to the H₀ equality statement in Equation 8.1 as a null hypothesis. It makes more sense to think of this as a hypothesis that can potentially be nullified or rejected on the basis of the obtained sample mean.

On the basis of information from the sample temperature data (N, M, SD) we will be able to make one of two decisions:

• Reject H₀. If we reject H₀, that is equivalent to saying that we do not believe μ_hyp (which is 98.6°F in the body temperature example) is a plausible value for μ.
• Do not reject H₀. If we do not reject H₀, that is equivalent to saying that we cannot rule out μ_hyp as a plausible value for μ.

We cannot say “Accept H₀.” This would be logically equivalent to saying “I have proved that μ exactly equals μ_hyp (98.6°F).” The logic used in NHST does not provide support for that kind of conclusion. If a research report says “accept H₀,” the author has misunderstood statistical significance testing. Never, never say “accept H₀”!

Neither decision (reject H₀ or do not reject H₀) can be made with certainty when we have only sample data. For either decision, reject or do not reject, there is a risk that the decision is wrong. In theory, NHST provides ways to evaluate the risk or probability of a Type I decision error (a decision to reject H₀ when H₀ is correct). Note that a researcher can make a Type I decision error even if he or she has done everything correctly. Uncertainty about decisions is inherent in the process of using sample data to make inferences about populations.

Note that the logic of NHST differs from the way we reason about evidence in everyday life. In everyday life, a person thinks of a hypothesis (such as “My dating partner is cheating on me”) and then looks for evidence to support that hypothesis. In everyday life, we tend to look for confirmatory evidence, that is, evidence that supports our initial hypotheses (Abelson & Rosenberg, 1958). NHST requires us to look for disconfirmatory evidence. In effect, in many research situations, researchers set up a null hypothesis that they don’t believe and then look for evidence to reject that null hypothesis. This requires us to think in terms of double negatives (e.g., I have evidence against a null hypothesis that I want to believe is wrong). This setup is counterintuitive; it differs from our natural inclinations in everyday reasoning.

The logic of NHST focuses on evidence that is inconsistent with a null hypothesis (or, to be more precise, evidence we would be unlikely to obtain if H₀ is correct). The first step in NHST is setting up a nullifiable hypothesis. An example of a nullifiable hypothesis in NHST is H₀: μ = 98.6°F. The evidence that would lead us to doubt or reject H₀ is a value of M that is “very far” from μ_hyp (i.e., a sample mean very different from 98.6°F). The computations made in statistical significance tests make it possible to quantify precisely what we mean by “very far.”

Often (but not always) data analysts hope to reject (or “nullify”) H₀. In many studies, a researcher specifies a null hypothesis he or she does not believe and then hopes to obtain evidence to reject that null hypothesis. Sometimes rejecting H₀ means that, from the researcher’s point of view, the study was a success.
8.4 Selecting an Alternative Hypothesis

Two hypotheses are needed for NHST: a null hypothesis (denoted H₀) and an alternative hypothesis (denoted H_alt or sometimes H₁). As noted previously, the equation for a null hypothesis is of the form H₀: μ = μ_hyp (where μ_hyp is a specific value chosen by the data analyst, such as 98.6°F).

Note that a null hypothesis could be incorrect in any of three different ways: μ could be unequal to μ_hyp, greater than μ_hyp, or less than μ_hyp. (True population mean body temperature could be ≠ 98.6°F, > 98.6°F, or < 98.6°F.) Alternative hypotheses are statements of alternative realities: In the body temperature example, if H₀ is incorrect, what range of outcomes for sample mean temperature would you expect? Each version of H_alt specifies a different range. For a one-sample test, a data analyst selects one of the following three alternative hypotheses.

Alternative Hypothesis 1: The population mean is hypothesized to differ from μ_hyp, but we do not specify a direction of difference.

The equation for a two-tailed or nondirectional alternative hypothesis is:

$$H_{alt\,1}: \mu \neq \mu_{hyp} \tag{8.2}$$

This version of H_alt is called two-tailed because we will reject H₀ for values of M that are either much higher or much lower than μ_hyp. These values of M correspond to t values in either the lower or upper tail of the t distribution. To evaluate distance from μ_hyp (such as 98.6°F), values of M are converted into t ratios, and then t ratios are used to assess distance from the mean in unit-free terms. (Use of a t ratio to assess the distance of M from a hypothesized value of μ is analogous to the use of a z score to assess the distance of a single X score from a sample mean.)

The two-tailed version of H_alt is also called nondirectional because the direction of difference between M and the specific value of μ_hyp (such as 98.6°F) is not specified. It is called two-tailed because we reject H₀ for values of M or t that correspond to either the lower or upper tail of a t distribution. In practice, the terms two-tailed and nondirectional can be used interchangeably.

Using this nondirectional or two-tailed version of H_alt, the researcher collects data, examines sample M, and rejects H₀ if the sample mean M is either far above or far below μ_hyp. In this example, we would reject H₀: μ = 98.6°F as implausible if we obtain a sample M that is either much lower or much higher than 98.6°F. Later in the chapter you'll see how we quantify what we mean by “much higher”: Exactly how far away from μ_hyp does M need to be to reject H₀? Can we reject H₀: μ = 98.6°F if we obtain M = 99.0°F? M = 94.5°F? M = 101.3°F?

When we can specify the expected direction of difference, we can use one of the following one-tailed or directional alternative hypotheses. These tests are called directional because they specify one of two possible directions in which μ might differ from μ_hyp. They are called one-tailed because for H_alt 2, H₀ is rejected only for outcome values of M and t that fall in the upper tail, and for H_alt 3, H₀ is rejected only for outcome values of M and t that fall in the lower tail. The terms one-tailed and directional can be used interchangeably. The direction of difference should be stated when test results are reported.

Alternative Hypothesis 2: The population mean is hypothesized to be higher than μ_hyp.

If the researcher thinks that the true population mean may be larger than μ_hyp (for example, that true population mean for body temperature is higher than 98.6°F), the directional alternative hypothesis is as follows; we would reject H₀: μ = 98.6°F only for a sample value of M that is much higher than 98.6°F:

$$H_{alt\,2}: \mu > \mu_{hyp} \quad (\text{e.g., } \mu > 98.6\,^{\circ}\mathrm{F}) \tag{8.3}$$

Alternative Hypothesis 3: The population mean is hypothesized to be lower than μ_hyp.

If the analyst expects that the true population mean is lower than the hypothesized value (e.g., that population mean body temperature is lower than 98.6°F), the equation for a one-tailed or directional alternative hypothesis takes this form:

$$H_{alt\,3}: \mu < \mu_{hyp} \tag{8.4}$$

... hypothesis (and often data analysts want to reject the null hypothesis). I suggest that you use H_alt 1 (the two-tailed test or nondirectional alternative hypothesis) in most situations. When you learn about other statistics later, such as the F test, you will find that some tests are always one tailed. The choice between one- and two-tailed H_alt options is an issue primarily for t tests.

8.5 The One-Sample t Test

We need to quantify the distance of M from μ_hyp precisely so that we can decide whether M is “very far” from μ_hyp. We want the distance to be in unit-free terms so that we can evaluate it as a large or small distance by looking at standardized distributions of z or t values.

In earlier chapters, when we wanted to specify the distance of an individual X score from the sample mean M, we computed a z score, z = (X − M)/SD. Because z was unit free, we could look up the value of z in a table of the standard normal distribution (Appendix A at the end of the book) to evaluate areas below or above z, to decide whether the X score was very far away from M. For example, if an X score was in the top 2% of the area of the normal distribution, we could say that it was unusually high. (This works only if the X scores are normally distributed.)

To evaluate the distance of M from μ_hyp, we do ...

We will reject H₀ (that μ equals some specific value, such as 98.6°F) if the t ratio tells us that M is very far from μ_hyp. To do this, we need to define reject and do not reject regions in terms of specific values for t. To define the reject region(s) for values of t, we need to know these three things:

• Choice of H_alt. This tells us whether to include only one tail or both tails in the reject region.
• Choice of α. This tells us how much area is included in one or both tails of the t distribution; often α is 5% or 1%.
• Sample df (N − 1). This tells us which t distribution to use to find critical values that cut off tail areas.

Suppose your sample has N = 16 (df = 15); that you use H_alt: μ ≠ μ_hyp; and you choose α = .05. The reject regions correspond to α = .05, two tailed. Thus, you need the critical values that divide a t distribution with 15 df into the bottom 2.5%, middle 95%, and top 2.5% areas. Critical values of t can be found in the t distribution table in Appendix B at the end of the book. An excerpt ...

... μ_hyp), the reject region consists of only the upper tail; we have α = .05, one tailed (upper tail only). Look under “Level of Significance for One-Tailed Test” in the column “.05” to find the critical value for df = 15; this critical value is 1.753. We reject H₀ if t > 1.753. This reject region appears in Figure 8.3b.

Because the t distribution is symmetrical, once you know that +1.753 identifies the top 5%, you also know that t = −1.753 corresponds to the bottom 5%. If H_alt: μ < μ_hyp, we reject H₀ for values of t below −1.753, as shown in Figure 8.3c. The values of t used to label the reject and do not reject regions for the three different versions of H_alt appear in Figures 8.3a, 8.3b, and 8.3c. (Reject regions could also be given in terms of values of M, but it is more conventional to think about them in terms of values of t.) The reject regions in Figure 8.3 correspond to values of M that are so far away from μ_hyp (with distance between M and μ_hyp expressed in terms of the unit-free t test) that they would be very unlikely to occur if H₀ is true.

Later you will see that there is an easier way to decide whether to reject or not reject H₀ than comparing an obtained value of t with these reject regions from a t distribution. You can just examine p values in SPSS output instead of t
values. The reject/do not reject decision becomes quite simple when you do this:

• If obtained p < α, reject H₀ (the outcome is called “statistically significant”).
• If obtained p > α, do not reject H₀ (the outcome is called “not statistically significant,” sometimes abbreviated ns).

The α level is selected by a data analyst before looking at the data. Often α is set at .05. The p value is obtained from your computer output.

SPSS reports a p value as “Sig.” SPSS usually reports two-tailed p values (for tests such as t ratios that can be either one or two tailed). If you use a two-tailed alternative hypothesis, just evaluate whether the SPSS “Sig.” or p value is less than α. If you use a one-tailed alternative hypothesis, you need to convert the two-tailed SPSS “Sig.” or p value into a one-tailed p value.

Figure 8.2 Excerpt From Table of t Distribution (From Appendix B at End of Book)

CRITICAL VALUES FOR t DISTRIBUTION

Confidence interval (%)                     80      90      95      98      99
Level of significance, one-tailed test      0.10    0.05    0.025   0.01    0.005
Level of significance, two-tailed test      0.20    0.10    0.05    0.02    0.01
df = 12                                     1.356   1.782   2.179   2.681   3.055
df = 13                                     1.350   1.771   2.160   2.650   3.012
df = 14                                     1.345   1.761   2.145   2.624   2.977
df = 15                                     1.341   1.753   2.131   2.602   2.947
df = 16                                     1.337   1.746   2.120   2.583   2.921
df = 17                                     1.333   1.740   2.110   2.567   2.898

[The table is adapted from the table by Fisher and Yates. In the figure, the 95% confidence interval, the .05 level of significance for the one-tailed and two-tailed tests, the row for df = 15, and the values 1.753 and 2.131 are circled.]

Figure 8.3 Reject Regions for Two-Tailed and One-Tailed t Tests (Example: df = 15)
[Three t distribution diagrams. (a) Two-tailed or nondirectional test: reject H₀ if t < −2.131 or t > +2.131 (the lower 2.5% and upper 2.5% of the area); do not reject H₀ if t is between −2.131 and +2.131. (b) One-tailed or directional test, upper tail: reject H₀ if t > +1.753 (the upper 5% of the area). (c) One-tailed or directional test, lower tail: reject H₀ if t < −1.753 (the lower 5% of the area).]

A one-tailed p value is half of the corresponding two-tailed p value. If SPSS says that (the two-tailed) p = .06, then the corresponding one-tailed p value is .03. For a one-tailed or directional t test, compare the one-tailed p value (in this example, p = .03) with α. You must also check that the direction of difference of M from μ_hyp is consistent with the direction of difference in your alternative hypothesis.

To avoid possible confusion between one- and two-tailed p values, and for other reasons, I recommend that you use nondirectional (two-tailed) tests in most situations.
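The table lookups and the p value conversion described above can also be checked numerically. The following is a minimal Python sketch that uses only the standard library; it is not how SPSS or Appendix B produce their values. The function names (t_pdf, upper_tail_area, critical_t, reject_region, one_tailed_p) are illustrative assumptions, and tail areas are approximated by numerical integration, so results should agree with the table only to about three decimal places. The rule applied when the observed direction contradicts the alternative hypothesis (one-tailed p = 1 − p/2) is a standard result that the text does not state.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2000):
    """Approximate area above t, using Simpson's rule on the interval 0 to |t|."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_above_abs = 0.5 - total * h / 3  # half the distribution minus area from 0 to |t|
    return area_above_abs if t >= 0 else 1 - area_above_abs

def critical_t(tail_area, df):
    """Positive t value that cuts off tail_area in the upper tail (found by bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > tail_area:
            lo = mid  # too much area above mid, so the cutoff is larger
        else:
            hi = mid
    return (lo + hi) / 2

def reject_region(alternative, alpha, df):
    """Describe the reject region for 'two-tailed', 'greater', or 'less'."""
    if alternative == "two-tailed":
        c = critical_t(alpha / 2, df)  # alpha is split between the two tails
        return f"reject H0 if t < {-c:.3f} or t > {c:.3f}"
    c = critical_t(alpha, df)          # all of alpha goes in one tail
    if alternative == "greater":
        return f"reject H0 if t > {c:.3f}"
    return f"reject H0 if t < {-c:.3f}"

def one_tailed_p(two_tailed_p, t, alternative):
    """Convert a two-tailed p value to one-tailed ('greater' or 'less').

    Halving applies only when the sign of t matches the direction in the
    alternative hypothesis; otherwise 1 - p/2 is used (an added assumption,
    not taken from the text).
    """
    consistent = (t > 0) if alternative == "greater" else (t < 0)
    return two_tailed_p / 2 if consistent else 1 - two_tailed_p / 2

if __name__ == "__main__":
    for alt in ("two-tailed", "greater", "less"):
        print(alt, "->", reject_region(alt, 0.05, 15))
    print(one_tailed_p(0.06, 2.0, "greater"))
```

For df = 15 and α = .05, the printed cutoffs should be close to the values 2.131 and 1.753 shown in Figure 8.2.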
8.8 Questions for the One-Sample t Test

The question examined by the one-sample t test can be worded three different ways. For the body temperature example, we use 98.6°F as the value for μ_hyp.

1. Can we reject H₀: μ = μ_hyp? (The decision can be either to reject or not reject H₀.)
2. Is μ_hyp a plausible value for μ? (The decision can be either yes or no.)
3. Is M significantly different from μ_hyp? (The decision can be either that M is significantly different from μ_hyp or that M is not significantly different from μ_hyp.)

The third version of the question is most consistent with the ways NHST is usually reported for most statistical significance tests discussed later in the book.

8.9 Assumptions for the Use of the One-Sample t Test

• Scores for the X variable must be quantitative. (If they are not, it makes no sense to compute a mean.)
• Scores for the X variable should be independent of one another. (In the following example with driving speeds, the speeds would be nonindependent if there were heavy traffic or if cars were racing one another. The independence assumption was discussed in Chapter 2.) If scores are not independent, the estimate of SD may be too small.
• Some sources state that the distribution of X scores in the sample must be normal. Technically, this is not correct. The assumptions made when this test was developed were that scores are normally distributed in the population, and the scores in the sample were randomly selected from the population. We usually have little information about the distribution of scores in populations. We usually have convenience samples, instead of random samples from the population of interest.
• Pay attention to non-normal features of data that could make the sample mean a poor way to describe central tendency, such as extreme outliers, a mode at zero, and bimodal distributions with modes far apart. If M does not make sense to describe scores in the sample, then the one-sample t test won't make sense either.

Violations of assumptions can lead to p values that underestimate the true risk for Type I error.

8.10 Rules for the Use of NHST

If you want to make yes/no decisions, you should do things in the correct sequence. Before you collect data, decide on N, decide on procedures for the identification and handling of outliers, formulate the null and alternative hypotheses, and select the α level. Do one significance test (or a small number of tests). Do not run dozens or hundreds of tests and then hand-pick a few with small p values to report.

After you have done significance tests, do not go back and rerun tests with variations in procedure to see if you can obtain different results. For example, do not change from a two-tailed to a one-tailed test, do not change the α level, do not drop outliers and rerun the analysis, and do not collect more data and rerun the analysis. Running large numbers of analyses in search of small p values is called p-hacking.
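A short simulation can make the cost of running many tests concrete. The sketch below is an illustration under stated assumptions, not a procedure from the text: it draws samples of N = 9 from a hypothetical normal population whose mean really is 35 (so H₀ is true; the population SD of 6 is arbitrary), applies a two-tailed one-sample t test with the α = .05 critical value of 2.306 for 8 df, and counts how often H₀ is rejected. The function chance_of_any_rejection assumes the tests are independent, which real analyses of one data set usually are not.

```python
import math
import random

def one_sample_t(scores, mu_hyp):
    """t ratio for a one-sample test: (M - mu_hyp) / SE_M."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return (mean - mu_hyp) / (sd / math.sqrt(n))

def type1_rate(n_sims=20000, n=9, crit=2.306, seed=1):
    """Proportion of simulated samples in which a TRUE H0 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(n_sims):
        # Hypothetical population: normal, mean 35, SD 6 (H0: mu = 35 is true).
        sample = [rng.gauss(35, 6) for _ in range(n)]
        if abs(one_sample_t(sample, 35)) > crit:
            rejections += 1
    return rejections / n_sims

def chance_of_any_rejection(alpha, k):
    """Chance of at least one Type I error across k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

if __name__ == "__main__":
    print("Type I error rate for a single planned test:", type1_rate())
    for k in (1, 5, 20, 100):
        print(k, "tests ->", round(chance_of_any_rejection(0.05, k), 3))
```

A single planned test rejects a true H₀ in roughly 5% of samples, as α = .05 promises. When 20 independent tests are run and only the smallest p values are reported, the chance of at least one “significant” result when every null hypothesis is true rises to about 64%.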
Violations of rules can also lead to p values that underestimate the true risk for Type I decision error.

Unfortunately, in real-world research, violations of rules and assumptions are fairly common. Therefore we should not have too much faith in p values.

8.11 First Analysis of Mean Driving Speed Data (Using a Nondirectional Test)

We are now ready to apply the one-sample t test using a hypothetical example. Suppose that a cranky resident of a college town is upset about students’ driving speeds. The posted speed limit is 35 mph. The citizen plans to gather data on driving speed to evaluate whether she can plausibly complain to the police that the actual average driving speed for the population of all student drivers is significantly different from the posted speed limit. (In a later example, a one-tailed test using a directional alternative hypothesis is used.)

In the traditional approach to NHST, it is important to decide on N, α, and the nature of the alternative hypothesis before the collection and analysis of data.

Step 1: Decide on N, the number of cases in the sample. For this example, N = 9 cars.

Step 2: Decide on the acceptable risk for Type I error (I use the popular value of α = .05).

Step 3: State the null hypothesis about population driving speed:

$$H_0: \mu = 35 \text{ mph}$$

Step 4: State the alternative hypothesis:

$$H_{alt}: \mu \neq 35$$

If the cranky resident has not specifically stated the direction of difference for the alternative hypothesis, a nondirectional or two-tailed alternative hypothesis is used. (If the resident uses SPSS to run her data analysis, the p value provided by SPSS implicitly assumes this nondirectional or two-tailed alternative hypothesis.)

Step 5: Specify the reject regions. Use the t distribution with (N − 1 =) 8 df. The values of t that correspond to the bottom 2.5% and top 2.5% of the area of a t distribution with 8 df can be found in the table in Appendix B at the end of the book. Values of −2.306 and +2.306 divide a t distribution with 8 df into the lowest 2.5%, middle 95%, and top 2.5%, as shown in Figure 8.4. Drawing a diagram similar to Figure 8.4 can be helpful when you are identifying reject regions.

H₀ will be rejected if t < −2.306 or if t > +2.306.

Step 6: Collect data. To evaluate the null hypothesis that the mean population speed equals the 35-mph speed limit, the resident uses a radar detection device to clock speeds for a sample of nine cars that pass her house and computes the descriptive statistics for this sample (N, M, and SD). Ideally, these cars would be randomly sampled from the population of all passing cars. However, it would be difficult to obtain a true random sample in this situation. For this question, the cars should be driven by students (the population of interest). Ideally, the sample would not include only red cars or cars driven by intoxicated students coming home from weekend parties. The nature of cases included in the sample can limit the generalizability of findings (Simons, Shoda, & Lindsay, 2017). Data for this hypothetical example are in the file carspeed.sav.

Figure 8.4 Reject Regions for α = .05, Two Tailed, With 8 df, Corresponding to Shaded Areas
[Diagram of a t distribution with 8 df. The shaded reject regions are the area below t = −2.306 (lower 2.5%) and the area above t = +2.306 (upper 2.5%), each labeled “Reject H₀.” The area between −2.306 and +2.306 is 95%.]

Step 7: Obtain descriptive statistics. For small data sets, this can be done by hand. Descriptive statistics can also be obtained using the SPSS frequencies procedure discussed in Chapter 3. For this hypothetical data set, M = 39, SD = 6.103, N = 9, and SE_M = SD/√N = 6.103/3 = 2.034.

Step 8: Find the t ratio and its df. The one-sample t ratio can be calculated by hand or obtained using the SPSS one-sample t procedure. On the basis of the null hypothesis, μ_hyp = 35. Given N, for a one-sample t test, df = N − 1 = 8. From the previous step, M = 39 and SE_M = 2.034. Combining this information, we have t = (M − μ_hyp)/SE_M = (39 − 35)/2.034 = 4/2.034 = 1.966. Screenshots of the output of SPSS’s one-sample t test procedure appear in Figures 8.5 and 8.6. Enter the value for μ_hyp (which is 35, in this example) into the space for “Test Value.”

Step 9: Find the CI for M that corresponds to the selected α level. Current recommendations for reporting from many sources call for inclusion of CI information when significance tests are reported. To obtain the CI for M, the one-sample t test procedure is run a second time (using test value = 0, as demonstrated in Chapter 7).

For the distribution on the left side of Figure 8.1, the middle area under the distribution corresponds to C (95%); the combined areas in the upper and lower tails correspond to α (2.5% + 2.5% = 5%). Thus, the distribution is divided into the lower 2.5%, the middle 95%, and the top 2.5%. C + α = 1.00, the entire area. The equation to obtain level of confidence C for a CI that corresponds to the α level used for a two-tailed t test is:

$$\text{Level of confidence} = C = 100 \times (1 - \alpha) \tag{8.6}$$

Because α is given as a proportion, we subtract α from 1; to turn this difference into a percentage, we multiply it by 100. For α = .05, two tailed, the corresponding level of confidence = 100 × (1 − α) = 100 × (1 − .05) = 95%. Thus, if your test uses α = .05, two tailed, the corresponding CI is 95%. (CIs do not correspond to one-tailed α values.)

Step 10: Compare the obtained value of t from Step 8 with the reject regions in Step 5. In this example, t = +1.966 (with 8 df) falls in the “do not reject” region. This tells us that the obtained mean speed (M = 39) was not high enough for the citizen to reject the null hypothesis that the population mean driving speed for the entire population of students is 35 mph.

8.12 SPSS Analysis: One-Sample t Test for Mean Driving Speed (Using a Nondirectional or Two-Tailed Test)

The SPSS one-sample t procedure was used in the previous chapter (where it was used to set up a 95% CI for M); screenshots for the menu selections appeared there. You can use the same procedure to perform the one-sample t test for M (using μ_hyp as the test value). Make the following menu selections: ... → ... . Enter the value of μ_hyp specified in the null hypothesis into the space for “Test Value”; in this example, μ_hyp is 35. Output appears in Figure 8.5. (We will ignore the CI information in Figure 8.5 and focus on the t test.)

In the output in Figure 8.5, the obtained value of t = 1.966. This agrees with the value of t reported above from by-hand computation. This t ratio has 8 df (df = N − 1, where N = 9). The value under the heading “Mean Difference” refers to the numerator of the t ratio, that is, M − μ_hyp. Using M = 39 and μ_hyp = 35, the difference between sample mean speed and hypothesized mean speed is (39 − 35) = 4. The sample mean was 4 mph higher than the hypothesized population mean of 35 mph.

The confidence interval in Figure 8.5 is for the difference between M and μ_hyp (not for M). The 95% CI for (M − μ_hyp) is [−.69, +8.69].

8.13 “Exact” p Values

A new piece of information appears in the SPSS output in Figure 8.5. In the column headed “Sig. (2-tailed)” we find the “exact” p value that corresponds to the obtained value of t. This p value is the sum of the two tail areas that lie beyond the obtained t values of ±1.966. “Exact” is in quotation marks because many common data analysis practices result in p values that greatly underestimate the true risk for Type I error that p is supposed to estimate. The p value in computer output is exact only in the sense that it corresponds exactly to the tail area(s) using the obtained t value to “cut off” the tails.

Figure 8.5 Output From the One-Sample t Test Procedure for Hypothetical Driving Speed Data Using Test Value = 35
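The by-hand computations in Steps 7 through 10, together with the two-tailed p value and the 95% CI for the difference, can be reproduced from the summary statistics alone. The following Python sketch uses only the standard library and is not the SPSS procedure; the function names are illustrative assumptions, and tail areas are approximated by numerical integration. Because M and SD are entered here as rounded values, the p value should come out near the .0845 reported in the text, with small differences possible in later decimals.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2000):
    """Approximate area above a nonnegative t, using Simpson's rule on 0 to t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_t(tail_area, df):
    """Positive t value that cuts off tail_area in the upper tail (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t_from_summary(n, mean, sd, mu_hyp, alpha=0.05):
    """One-sample t test (two-tailed) computed from N, M, and SD."""
    df = n - 1
    se = sd / math.sqrt(n)                         # Step 7: SE_M = SD / sqrt(N)
    t = (mean - mu_hyp) / se                       # Step 8: t = (M - mu_hyp) / SE_M
    p_two_tailed = 2 * upper_tail_area(abs(t), df) # sum of the two tail areas
    crit = critical_t(alpha / 2, df)               # Step 5: cutoff for each tail
    diff = mean - mu_hyp
    ci_diff = (diff - crit * se, diff + crit * se) # CI for (M - mu_hyp)
    return {
        "df": df, "se": se, "t": t, "p_two_tailed": p_two_tailed,
        "critical_t": crit, "ci_diff": ci_diff,
        "reject_h0": abs(t) > crit,                # Step 10
    }

if __name__ == "__main__":
    # Summary statistics for the hypothetical driving speed data in the text.
    result = one_sample_t_from_summary(n=9, mean=39.0, sd=6.103, mu_hyp=35.0)
    for name, value in result.items():
        print(name, "=", value)
```

Working from summary statistics keeps the sketch independent of the data file carspeed.sav, whose individual scores are not listed in this section.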
[SPSS output table: One-Sample Statistics.]

[Diagram of a t distribution with 8 df showing the reject region for α = .05, one tailed. The area to the left of t = +1.86 (95%) is labeled “Do not reject H₀.” The area beyond t = +1.86 in the right tail (5%) is labeled “Reject H₀.”]

The p value in SPSS output provides an easier way to make the decision whether to reject H₀. You can reject H₀ if the exact p value on the SPSS output is less than the α level you selected. (You need to specify whether the test is one tailed or two tailed.)

In this example, the exact two-tailed p = .0845 (.04225 of the area lies below t = −1.966, and .04225 of the area lies above t = +1.966). The α, two tailed, criterion for statistical significance was set at .05. Because p is larger than α, we do not reject H₀.

Obviously, it is much easier to make reject/do not reject decisions on the basis of values of p than values of t and reject regions.

Reject rules in terms of obtained p value, using α = .05, can be stated as follows:

• If p < .05, reject the null hypothesis. More generally, reject H₀ if p < α.
• If p > .05, do not reject the null hypothesis. More generally, do not reject H₀ if p > α.

Proponents of the New Statistics suggest that we report the exact p value from the SPSS output (e.g., p = .0845, two tailed) and avoid making yes/no decisions about a null hypothesis. In other words, we don’t state that we reject or do not reject the null hypothesis; we don’t say that the result is statistically significant or not statistically significant. Reporting an exact p value makes it possible for readers who still prefer the traditional approach to NHST to make their own decisions whether an outcome is “significant” or not. Reporting an exact p value also avoids the following problem: What can you say if p = .051 or p = .06? For an outcome such as p = .051, you should not say that the outcome was “almost” significant. Reporting exact p values reminds us that values of p represent a continuum and that we do not have to think of .05 as a “cliff.”

8.14 Reporting Results for a Two-Tailed One-Sample t Test

When you report results for significance tests in research papers, much of the logic is implicit. For example, you convey the information that you used H_alt: μ ≠ 35 by saying that the p value is two-tailed. The example “Results” section below follows the New Statistics guidelines: Report an
exact p value; do not state a decision whether the result is “statistically significant.”
Results
A one-sample t test was conducted to assess whether mean speed for a sample of N = 9 cars differed from the posted speed limit of 35 mph. For this sample, M = 39, SD = 6.103, and SE_M = 2.034. The one-sample t statistic was t(8) = 1.966, p = .0845, two tailed. Cars in this sample drove an average of 4 mph faster than the posted speed limit. The 95% CI for this difference was [-.69, +8.69].

A person who prefers traditional NHST reasoning could go on to say that, using α = .05, two tailed, as the criterion for statistical significance, this difference was not statistically significant. Proponents of the New Statistics advise against this yes/no kind of thinking.

When scores are given in meaningful units, it is useful to think about differences in terms of those units. In this example, sample mean driving speed exceeded the speed limit by 4 mph. In the United States, police usually do not bother to give speeding tickets unless driving speed is at least 5 mph above the speed limit (and often much higher than that). From a practical or real-world perspective, a sample mean speed only 4 mph above the posted limit is negligible. We could say this outcome has no practical significance, and it is not statistically significant. In Chapter 9, you will learn how to add effect size information when you report significance tests.

Here are several things to notice about "Results" sections.
• For t tests, you must specify whether the reported p value is based on a two-tailed or one-tailed (nondirectional or directional) alternative hypothesis.
• Older textbooks sometimes reported whether p was less than or greater than a chosen α level, for example, p < .05 or p > .05, or sometimes ns as an abbreviation for not significant. Reporting an exact p value from SPSS (e.g., p = .0845, two tailed) is now preferred.
• If you don't specify a choice of α level within the "Results" section or earlier in a research report, readers generally assume α = .05, and they may use that to draw their own yes/no conclusions about the null hypothesis.

8.15 Second Analysis of Driving Speed Data Using a One-Tailed or Directional Test
Let's return to the car speed data. Wait! The cranky resident is really interested only in the possibility that students are driving faster on average than 35 mph (not slower). The resident could decide to do a one-tailed test. These would be the null and alternative hypotheses:

H0: μ = 35.
Halt: μ > 35.

Using a one-tailed or directional alternative hypothesis does not change any of the computations for t or df that were used for a two-tailed test, but it does mean we need to consider a different reject region. For this version of the alternative hypothesis we reject H0 only if the value of t is in the upper tail (that is, if M is far above 35). For α = .05, one tailed, with the reject region in the upper tail, the reject region appears in Figure 8.6.

Figure 8.6 One-Tailed Reject Region for Halt: μ > μhyp, α = .05, One Tailed, df = 8
For this directional version of Halt, the decision rule becomes: Reject H0 if obtained t > +1.86. If obtained t < +1.86, do not reject H0. We now examine the obtained t value compared with this one-tailed decision rule. From the same SPSS output in Figure 8.5, the obtained t was +1.966. This did not fall into the reject region using the two-tailed test, but for a one-tailed test, t = +1.966 falls in the upper tail reject region.

The SPSS output reports a two-tailed p value, as noted earlier. You can obtain the one-tailed p by taking half of the two-tailed p. For the driving speed example, SPSS reported p = .085, two tailed (.0845 before rounding). (Some SPSS procedures allow you to request one-tailed p values as an option, but many procedures produce two-tailed p values by default.) The corresponding one-tailed p value = .0845/2 = .04225.

For the one-tailed test (Halt: μ > 35), the decision to reject H0 could be based on either:
• the obtained t of 1.966 falls within the one-tailed reject region at the upper end of the distribution,
and/or
• the one-tailed p value (calculated by taking half of the SPSS reported two-tailed p value) is .04225, which is less than the α of .05.
Whether the decision is made on the basis of the t value or the p value, the results are the same.

8.16 Reporting Results for a One-Tailed One-Sample t Test
Using a one-tailed test, we can report the test result as follows:

Results
A one-sample t test was conducted to assess whether mean speed for a sample of N = 9 cars differed from the posted speed limit of 35 mph. The alternative hypothesis was that the mean population speed was greater than 35 mph. For this sample, M = 39, SD = 6.103, and SE_M = 2.034. The result was t(8) = 1.966, p = .04225, one tailed. Cars in this sample drove an average of 4 mph faster than the posted speed limit. The 95% CI for this difference was [-.69, +8.69].

Authors who prefer the traditional approach to NHST would go on to say that, using α = .05, one tailed, as the criterion, this difference would be judged statistically significant.

Note that everything in the write-up is the same as for the two-tailed test, except for the reported p value (now one-tailed) and any verbal statement about statistical significance.
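The two write-ups for the driving speed data can be checked from the summary statistics alone. The following is a minimal sketch, not SPSS's procedure: it assumes only M, SD, N, and the hypothesized mean are available, and it obtains tail areas by numerically integrating the t density with Simpson's rule. The function names (`t_pdf`, `upper_tail_area`, `one_sample_t`) are hypothetical helpers introduced for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    tail = 0.5 - area_0_to_a          # area beyond +|t|
    return tail if t >= 0 else 1 - tail

def one_sample_t(mean, mu_hyp, sd, n):
    """Return t, df, two-tailed p, and one-tailed p for Halt: mu > mu_hyp."""
    se = sd / math.sqrt(n)            # standard error of the mean
    t = (mean - mu_hyp) / se
    df = n - 1
    p_two = 2 * upper_tail_area(abs(t), df)
    p_one = upper_tail_area(t, df)
    return t, df, p_two, p_one

# Driving speed example: M = 39, hypothesized mean = 35, SD = 6.103, N = 9
t_obt, df, p_two, p_one = one_sample_t(mean=39, mu_hyp=35, sd=6.103, n=9)
print(f"t({df}) = {t_obt:.3f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

Because t is positive here, the one-tailed p is exactly half of the two-tailed p; it falls below .05 while the two-tailed p does not. Values computed from rounded summary statistics may differ from SPSS output in the fourth decimal place.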
8.17 Advantages and Disadvantages of One-Tailed Tests
Figure 6.3 Detail From Standard Normal Distribution Table

There are four diagrams that show the area between 0 and z as well as beyond z for positive and negative values of z. The first diagram highlights the area between 0 and a positive value of z in a normal distribution; the area to the left of 0 is marked "area below 0 = 50%." The second diagram, to the right of the first, highlights the area beyond positive z, labeled "area above +z." The third diagram, below the first, highlights the area between 0 and a negative value of z. The fourth diagram, to the right of the third, highlights the area beyond negative z.

Textbooks sometimes drill students in the use of the normal distribution table with questions such as "What percentage of area lies between z = -1.00 and z = +2.00?" These artificial examples do not correspond to the kinds of questions that are of real interest to data analysts.

Data analysts usually want to answer a simple question: Is an X score or other outcome close to, far from, or extremely far away from the mean? Data analysts sometimes choose different numerical values to define "far from." The following z values are common ways of thinking about distance from the mean.
• Values between z = -1.00 and z = +1.00 are "close" to the mean.
• Values between z = -2.00 and z = +2.00 (but outside the range -1.00 and +1.00) are "in between" close and far from the mean.
• Values below z = -2.00 or above z = +2.00 are "far from" the mean.
• Values below z = -3.00 or above z = +3.00 are "very far from" the mean.

A normal curve divided into these areas appears in Figure 6.4. Individual researchers are free to use other values of z as criteria for distances.

Figure 6.4 Areas That Are Close, Far, and Very Far From the Mean (in z Score Units)

6.10 Dividing the Normal Distribution Into Three Regions: Lower Tail, Middle, and Upper Tail
Researchers are often interested in the situation where the areas beyond ±z sum to exactly 5%. A normal distribution can be divided into three areas:
• 2.5% of the area below -z, the "lower tail,"
• 95% of the area in the center, and
• 2.5% of the area above +z, the "upper tail."
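The areas that go with these z cutoffs can be computed directly instead of being read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, SD 1

def area_between(z_low, z_high):
    """Proportion of the normal curve between two z values."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

close = area_between(-1, 1)                   # within 1 SD of the mean
in_between = area_between(-2, 2) - close      # between 1 and 2 SDs from the mean
far = 1 - area_between(-2, 2)                 # beyond 2 SDs (both tails)
very_far = 1 - area_between(-3, 3)            # beyond 3 SDs (both tails)

# The z value that leaves 2.5% in each tail, so that the two tails sum to 5%
z_crit = std_normal.inv_cdf(0.975)

print(f"close: {close:.4f}, in between: {in_between:.4f}")
print(f"far: {far:.4f}, very far: {very_far:.4f}")
print(f"z for 95% middle area: {z_crit:.2f}")
```

The last line recovers the familiar cutoff of about ±1.96 for the three-region division (2.5%, 95%, 2.5%).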
additional information and avoid using terms such as significant and nonsignificant. Their recommendation is based on concerns about the misuse and misinterpretations of p values (among other things).

I think many of you may find the New Statistics approach attractive. You don't need to set up reject regions! You don't have to judge your study a failure if p > .05!

At least one journal (Basic and Applied Social Psychology) no longer accepts reports of p values (Trafimow & Marks, 2015). However, the New Statistics view has not entirely replaced traditional thinking (at least not yet). My current recommendation is to report "exact" p values, but don't place too much faith in them, and always include confidence interval and effect size information. You will learn about effect size in the next chapter.

8.19 Things You Should Not Say About p Values
1. If SPSS shows "Sig. (2-tailed)" as .000, do not say that p = .000. A p value is a risk for Type I error, and theoretically, this risk is never zero. The tails of t distributions are infinite; tail areas are never exactly zero, they just become smaller and smaller as t increases. If SPSS shows "Sig. (2-tailed)" as .000, report this as "p < .001, two tailed."
2. Given small p values such as p

"marginally significant" or "approaches significance" or is "close to significant" or "trends toward significance." This will make readers and reviewers cringe, whether they advocate traditional use of significance tests or prefer the New Statistics approach. In the minds of traditionalists, p is either less than .05 or it isn't. It is either significant or not. (To paraphrase the late Groucho Marx, "Close is no cigar.") From the perspective of the New Statistics, just say that p = .052, without invoking an α = .05 criterion to decide what the p value means.

8.20 Summary
Most of this chapter outlines procedures used in traditional approaches to interpretation of p values. Statistics textbooks prior to 2000 generally presented a traditional approach to significance testing, with a strong focus on yes/no significance tests. (Some books still do.) In recent years, advocates of the New Statistics have urged us to move away from yes/no decisions and to focus more on confidence intervals and effect size information. Effect sizes are discussed in the next chapter.

Although proponents of the New Statistics (e.g., Cumming, 2014) do not necessarily dismiss p values as completely useless, they make the following recommendations.

.05) and "significant" outcomes is discussed.
7. Guidelines for reporting results are provided, along with a list of things you should not say.
values; these verbal labels are only approximate.)

Table 9.1
No effect | Small effect | Medium effect | Large effect
Medium effect: d = .50 (e.g., d between .20 and .79)

9.2 Cohen's d: An Effect Size Index
An effect size provides information about the size of differences between group means, or the impact of treatments, that is independent of sample size and often in unit-free terms that can be compared across studies. The effect size Cohen's d provides an index that assesses the magnitude of the difference between M and μhyp independent of sample size. Its magnitude (like that of other effect size indexes) is not related to N. SPSS does not provide Cohen's d as part of the output of t tests. However, it provides the information you need to compute Cohen's d by hand: M, the test value μhyp, and SD. For the one-sample t test:

d = (M - μhyp)/SD.

Cohen's d effect size can be calculated for the driving speed data; the one-sample t test for these data was discussed in Chapter 8. For these data, M = 39, μhyp (test value) = 35, and SD = 6.103. For this example, Cohen's d = (M - μhyp)/SD = (39 - 35)/6.103 = .655. We can say that M is about .66 or two thirds of a standard deviation above μhyp of 35. Using Cohen's standards, d = .66 for the driving speed study would be called a medium effect size. Mean speed (39 mph) observed in the study was two thirds of a standard deviation higher than the proposed or hypothesized value of mean speed (35 mph). That difference was not statistically significant when a two-tailed test was used; it was significant, p < .05, when a one-tailed test was used.

Result may or may not be statistically significant
Result is usually statistically significant
identifies the t distribution used to look up critical values also increases. Use α = .05, two tailed, as the criterion for significance.

For N = 9, t = .50 × √9 = .50 × 3 = 1.50 with 8 df; not statistically significant.
For N = 36, t = .50 × √36 = .50 × 6 = 3.00 with 35 df; statistically significant.
For N = 100, t = .50 × √100 = .50 × 10 = 5.00 with 99 df; statistically significant.
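The arithmetic in these three cases, and the Cohen's d value for the driving speed data, can be reproduced with a few lines of code. This is a minimal sketch of the relationship t = d × √N for the one-sample case; the function names are hypothetical and chosen only for this illustration.

```python
import math

def cohens_d(mean, mu_hyp, sd):
    """Cohen's d for a one-sample design: (M - mu_hyp) / SD."""
    return (mean - mu_hyp) / sd

def t_from_d(d, n):
    """One-sample t ratio implied by effect size d and sample size N."""
    return d * math.sqrt(n)

# Same effect size, three different sample sizes
for n in (9, 36, 100):
    print(f"N = {n:3d}: t = {t_from_d(0.50, n):.2f} with {n - 1} df")

# Driving speed data: M = 39, hypothesized mean = 35, SD = 6.103, N = 9
d_speed = cohens_d(39, 35, 6.103)
t_speed = t_from_d(d_speed, 9)
print(f"d = {d_speed:.3f}, t = {t_speed:.3f}")
```

Holding d fixed at .50, the only thing that changes across the three lines of output is N, which is what drives t from 1.50 up to 5.00.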
An effect size of d = .5 would not be judged statistically significant for N = 9 but would be judged statistically significant for higher values of N.

A group of undergraduates got upset when I showed them this. "That's cheating!" "The researcher can make t come out (almost) any way he or she wants!" That's correct, within certain limits. A t ratio is not a fact of nature. The magnitude of t is at least partly the result of decisions you made when you set up the study (such as the decision about sample size).

The dependence of t on N is useful (when we want to take sampling error into account) but potentially problematic (when we want to evaluate effect size independent of sample size). When values of N are very large, unless effect size information is provided, it can be difficult to evaluate how much of the size of t is due to large

9.4 Statistical Significance Versus Practical Importance
The term significant means something different in statistics than in everyday use. In everyday use, the word significant usually means large, substantial, of practical or clinical value, or worthy of notice. By contrast, statistical significance has a specific technical meaning; outcomes of studies are judged "statistically significant" when results would be unlikely to arise just from sampling error, on the basis of the logic of NHST.

It is useful to distinguish between "statistical significance" and clinical or practical significance (Kirk, 1996). A result that is statistically significant may be too small to have much real-world value. A difference between M and μhyp can be statistically significant and yet be too small in actual units to have much practical or clinical significance, as in the car speed example. Statistical significance alone is not a guarantee of practical significance or usefulness (Vacha-Haase, 2001). We evaluate statistical significance by examining a test statistic (such as a t ratio) and accompanying information such as df and p value.
also need to ask, What kinds of people were included in the study? Were the participants doing additional things, such as exercise and diet modification? How long did they take the drug and in what dose? How long was weight loss maintained after the drug was stopped? Was there a control group that did not receive the drug? And so forth.

Do not use the phrase "highly significant" to describe research outcomes with small p values. That language leads people to believe the results of a study have great practical or clinical importance, when in fact p < .001 can arise when a small effect is combined with a very large sample size. When you see the phrase "highly significant" in media reports, be skeptical. You need more information (such as the actual difference between means, or Cohen's d) to evaluate whether the results of the study indicate that an intervention or treatment had strong, or even noticeable, effects.

9.5 Statistical Power
In most (although not all) applications of NHST, researchers hope to reject H0. Statistical power is defined as the probability of obtaining a value of t that is large enough to reject H0 when H0 is actually false. Refer back to Table 9.2 to see four possible outcomes when decisions are made whether to reject or not reject a null hypothesis. The outcome of interest, at this point, is the one in the upper right-hand corner of the table: the probability of correctly rejecting H0 when H0 is false, which is called statistical power.

Researchers want statistical power to be reasonably high; often, statistical power of .80 is suggested as a reasonable goal.

Recall that we can reject H0 when the obtained value of t is sufficiently large. Equation 9.2 is repeated here to remind you how the magnitude of t in a sample is related to sample effect size and sample size.

t = d × √N. (9.2)

In Equation 9.2, d represents Cohen's d, and N is sample size. This equation suggests that if we want to obtain a large value of t in a future study, in theory, we could do that by examining a large effect size (d) or by using a large N or both. However, any value of d we guess for population effect size may be incorrect, and even if we did know d, the magnitude of t in a future study will also be affected by sampling error. We cannot simply put values of d and N into Equation 9.2 and solve the equation for t and assume that our study will result in that value of t. In practice, values of t (like values of M) vary because of sampling error. The logic used to estimate statistical power given values of d and N is discussed in Appendix 9A. In practice, tables can be used to look up estimated statistical power for combinations of planned values of N and guessed values of d (Cohen, 1988, 1992a, 1992b). An example of a statistical power table, adapted from Jaccard and Becker (2009), appears in Table 9.3. Given an estimate for the population value of Cohen's d and for planned sample size N, you can look up expected statistical power in the body of the table. Alternatively, you can look down the column for an estimated population effect size, find the cell for power = .80, and look at the N for that row to find the minimum N required. This table applies only to tests that use α = .05, two tailed. Different tables would be needed for other α levels or one-tailed tests.

For example, suppose that a researcher believes that the magnitude of difference she is trying to detect using a one-sample t test corresponds to a population effect size of Cohen's d = .50 and plans to use α = .05, two tailed. The researcher can read down the column of values for estimated power under the column headed d = .50 until reaching the table entry of .80. Then, she would look to the left (of this value of .80) for the corresponding value of N. On the basis of the values in Table 9.3, the value of N required to have statistical power of about .80 to detect an effect size of d = .5 in a one-sample t test with α = .05, two tailed, is between 30 and 40.

Table 9.3
Source: Reprinted with permission from Dr. Victor Bissonnette (2019).

The true strength of the population effect size we are trying to detect is not known. For example, the degree to which the actual population mean μ differs from the hypothesized value, μhyp, as indexed by the population value of Cohen's d, is not known in advance of the study. If we knew the answer to that question, we would not need to do a study!

The sample size needed for adequate statistical power can be approximated only by making an educated guess about the true magnitude of the effect, as indexed by d. If the guess about the population effect size d is wrong, then the estimate of power based on that guess will also be wrong. Information from past studies can often be used to make at least approximate estimates of population effect size.

Statistical power analysis is useful when planning a future study. It is important to think about whether the expected effect size, alpha level, and sample size provide you with a reasonably large chance (reasonably high power) to obtain a statistically significant outcome. People who write proposals to compete for research funds from government grant agencies are generally required to include a rationale for decisions about planned sample size on the basis of power. There are several places to obtain information for statistical power analysis. Jaccard and Becker (2009) provide power tables for some additional situations. SPSS has an add-on procedure for statistical power, and numerous other computer programs (some free) can do power analyses. Free online power calculators are widely available (for example, at http://powerandsamplesize.com/Calculators/).

Usually researchers rely on computer programs instead of tables for power analysis. A researcher provides program input information about type of analysis (e.g., a one-sample t test), planned α level, whether a one- or two-tailed test is desired, and expected effect size. Programs usually provide either the estimated power for an input value of N or the minimum N needed to achieve a requested level of power.

You should not report a post hoc
That is, do not look up your obtained Cohen's d
A histogram of the number of correct answers on an 8-item quiz. The X axis shows the number of correct answers and ranges from 0 to 8. The Y axis shows frequency and ranges from 0 to 250. There are nine bars, one for each possible score. The heights of the bars from left to right are 210, 70, 60, 45, 30, 15, 20, 15, 20. A curve follows the bars; its tail on the right is long and the curve is higher toward the left.

Figure 6.9 shows substantial negative skewness and a possible ceiling effect. In Figure 6.9 most scores are piled up near 100 points (out of 100 possible points). A ceiling occurs when an exam is "too easy" for most students.

Figure 6.9 Example of Negative Skewness: Hypothetical Exam Scores on a Scale From 0 to 100

A histogram of exam scores (variable negskew). The X axis shows the scores on an exam, with labels from 0 to 90. The Y axis shows frequency and ranges from 0 to 30. There are fifteen bars visible on the histogram. Most of the bars on the left are close to 0, and the ones closer to the right are higher. A curve follows the bars; its tail on the left is long and the curve is higher toward the right.

Visual examination is usually sufficient to evaluate skewness. Skewness should be mentioned when data are described in research reports. Positive skewness is common in real data. Sometimes an appearance of skewness is due to a few high-end outliers. An index to describe degree of skewness is available; see Appendix 6A for further discussion.

What should you do if you see skewness in your sample data?
• Include mention of skewness in your description of the distribution.
• If skewness is not extreme (as in the examples in Table 5.1), you may not need to do anything to try to get rid of skewness. If skewness is extreme (as in Figures 6.8 and 6.9), you may want to consider options such as outlier removal to reduce skewness.
• Decisions about the identification and removal of outliers should be made before you collect data. If you make these decisions after you peek at your data, you must explain this when you report information about your data.
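Visual examination is usually enough, but a numerical index can back it up. The sketch below computes one common skewness index, the third central moment divided by the 1.5 power of the second central moment, from the bar heights described for the 8-item quiz histogram. This formula is an assumption made for illustration; the index in Appendix 6A and the value reported by SPSS may use a slightly different (sample-adjusted) formula.

```python
def skewness_from_counts(values, counts):
    """Moment-based skewness (m3 / m2**1.5) from a frequency table."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    m2 = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in zip(values, counts)) / n
    return m3 / m2 ** 1.5

# Quiz scores 0 through 8 with the bar heights read from the histogram description
scores = list(range(9))
counts = [210, 70, 60, 45, 30, 15, 20, 15, 20]

quiz_skew = skewness_from_counts(scores, counts)
# Reversing the counts piles scores up at the high end, like an exam that is too easy
mirrored_skew = skewness_from_counts(scores, counts[::-1])

print(f"quiz skewness: {quiz_skew:.2f}")
print(f"mirrored skewness: {mirrored_skew:.2f}")
```

A positive value goes with a long right tail, as in the quiz histogram; the mirrored data give the same magnitude with a negative sign, matching the long left tail in Figure 6.9.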
either committed a Type II error or has reported a correct decision not to reject H0. (The researcher can never be sure which.) We want the probability or risk for both types of error to be low, that is, we want both α and β to be low. When a data analyst selects an α level, such as α = .05, that choice theoretically sets an upper limit for the risk for Type I error. If α is set at .05, then in theory, we have a maximum risk of 5% for Type I error. However, the limit of risk for Type I error works in practice only if the assumptions and rules for NHST are followed, and in many situations, they are not. The actual risk for Type I error in many research situations is often much higher than the nominal (selected) α level.

The risk for Type II error, β, cannot be exactly known; but we know something about factors that tend to make β larger or smaller. In the previous section we talked about statistical power: the probability of rejecting H0 when it is false. Power is (1 - β), and we want power to be high, usually on the order of .80.

Table 9.5
Columns give the actual state of the world; rows give the researcher's decision.

Researcher decides to reject H0; claims that the weight loss drug works.
• H0 is true (the weight loss drug really does not work): Type I error with risk α. Reject H0 when H0 is true. The drug does not work, but the researcher claims that it does. Patients who take the drug will not benefit.
• H0 is false (the weight loss drug really does work): Correct decision. The drug works, and the researcher claims that it works. For patients who take the drug, a benefit.

Researcher decides not to reject H0; does not claim that the drug works.
• H0 is true (the weight loss drug really does not work): Correct decision, although maybe not the decision the researcher hoped for. The drug does not work and the researcher does not claim that it works. Often this type of result does not get published, and that is unfortunate. Other researchers may do studies to see if this drug works, not knowing that there is already evidence suggesting it may not work.
• H0 is false (the weight loss drug really does work): Type II error with unknown risk β. The researcher did not reject H0 when H0 is false. The drug really does work, but the researcher does not claim that it works. The study probably doesn't get published; a missed opportunity. The drug may not be approved for use with patients, even though it works. This is likely to happen when studies are "underpowered," that is, the N of cases is too small to detect the effect of interest.

What does it mean for H0 to be false? H0 is true only if μ is exactly equal to 0 (or exactly equal to the proposed value in the null hypothesis, such as 98.6 or 35 or 100 in previous examples). However, H0 can be false in billions of ways. If we consider H0: μ = 35, H0 is false if μ really equals any number other than 35 (e.g., 45, 12, 35.01, 99, 34.3, and so forth). H0 can be false to varying degrees; in a sense, H0: μ = 35 is "less false" if μ is really 35.2 or 34.9 than if μ is really 30 or 51. Population effect size is the degree to which H0 is false. For example, if Cohen's d (for the difference between the real and hypothesized population means) is d = 1.00, this indicates that the difference between hypothesis and reality is large; if d = .05, this indicates that the difference between hypothesis and reality is small. The values of β and (1 - β) vary depending on the population effect size. We never know the exact population effect size, but we can think about the values of β and (1 - β) that we would expect, in theory, for possible different values of d and for fixed decisions about N and α. Appendix 9A explains this in more detail.
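To see how β and (1 - β) change across guessed values of d for fixed N and α, a rough calculation can stand in for a power table. The sketch below is an approximation and not the method behind Table 9.3 or Appendix 9A: it assumes the test statistic is approximately normal with mean d × √N, which is slightly optimistic for small samples compared with exact calculations based on the t distribution. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5   # expected size of the test statistic, as in t = d * sqrt(N)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def min_n_for_power(d, target=0.80, alpha=0.05, n_max=10_000):
    """Smallest N whose approximate power reaches the target, or None."""
    for n in range(2, n_max + 1):
        if approx_power(d, n, alpha) >= target:
            return n
    return None

# Fixed N = 30 and alpha = .05: power (1 - beta) rises as the guessed d rises
for d in (0.20, 0.50, 0.80):
    power = approx_power(d, 30)
    print(f"d = {d:.2f}: power = {power:.2f}, beta = {1 - power:.2f}")

print("approximate minimum N for d = .50:", min_n_for_power(0.50))
```

Under this approximation the minimum N for d = .50 falls between 30 and 40, in line with the table lookup described earlier; a dedicated power program would be the better choice for an actual study plan.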
These are the factors that influence B,risk for
Cohen's 4. They decide on an adequate level of
Type II error (and also 1 —B, statistical power):
statistical power, 1 —B, often .80 They look up
ヶ As aincreases,B decreases.However, researchers are reluctantto increase a, risk for Type I error. Increasing a is not a common
way to try to reduce risk for Type II error. « As samplesize Nincreases, risk for Type II error B decreases, and statistical power
increases. This is consistent with intuitions you probably have by now: You have a higher probability to reject Zp when samplesize is large. e As population effect size such as Cohen's d increases, risk for Type II error B decreases, and statistical power increases. Design
decisions that are often under researcher control are related to effect size. This is discussed more extensively in Chapter 12 on the independent-samples test, a test you are more likely to use and a situation that will be easier for you to think about. These are the factors that influence risk for Type I
error, a:
these numbers in a table for statistical power to find the minimum value of Wthat will provide the desired level of power under those conditions. (Or they inputthis information into a statistical power calculating program.)
9.7 Meanings of “Error” Note that the term error has different meanings in everyday life than the term errorin statistics. In everydaylife, error means mistake. For example, if astudent adds a set of numbers incorrectly when calculating a sample mean, that is an error in the everyday sense: a mistake. The assumptions and rules involved in NHST were designed to keep the risks for committing each of these kinds of error low. However, even a researcher who follows all the rules exactlystill hasrisk for decision errors. In statistics, we talk about many kindsof error, and each has a technical definition. So far you have learned about sampling error. Because of sampling error, the values of means vary across
e The a level that the data analyst chooses as
samples drawn from the same population.
criterion for statistical significance. e Adherence to the assumptions and rules for
Sampling error is not a “mistake.” This is just the way the world works. Prediction error has also
NHST. If there are violations of assumptions
been mentioned: If the mean from a single sample
and rules, the true risk for Type I error is
is used to estimate an unknown population mean,
often much higher than a.
it will probably not exactly equal the population mean uy; if we use Mto estimate u, we will make a
If a study has an A'too small to have a reasonable
prediction error. In this chapter you learned about
chance to detect an effect (to reject Zp when Apis
two new kinds of error; these are the two kinds of
false), it is called underpowered. Researchers try
error that can occur when making a reject/do not
to avoid underpowered studies by using the
reject decision about a null hypothesis.
statistical power analysis methods in the previous
(Additional types of error arise later, such as
section. They decide on the type of statistical
measurementerror.)
analysis, the alpha level, and the nature of the test (one vs. two tailed). They make educated guesses about possible population effect size, such as
37% Page 223 of 624 - Location 5659 of 15772
Of course, people who handle data can make mistakes (errors, in the everyday sense of the
word): errors in computation or copying numerical values or interpreting numbers. Mistakes may be surprisingly common in published research reports (Green et al., 2018).

The technical types of error that arise in statistics (such as sampling error and prediction error) do not arise because the data analyst has made a mistake. Procedures such as statistical significance tests involve inherent uncertainty. Even when a data analyst has done all the steps correctly, the data analyst can make a decision error, such as rejecting H0 when it is true. This kind of error is unavoidable in inferential statistics. We can't get rid of it no matter how careful we are, but we can try to reduce the risk for error, and we must take risk for error into account when we report results.

9.8 Use of NHST in Exploratory Versus Confirmatory Research

In a confirmatory study, a researcher usually has a small number of hypotheses. These may have been selected during earlier exploratory research, or specified by a theory, or they may be variations of hypotheses in previous confirmatory studies. Confirmatory studies are often (but not always) experiments. Confirmatory studies often have few variables and a limited number of statistical significance tests. This is the context in which Fisher and colleagues developed the logic for NHST. Researchers may face fewer temptations to violate some of the rules of NHST in confirmatory studies than in exploratory research. However, there still are many ways to violate rules and assumptions for NHST in confirmatory research, for example, by trying out different methods of handling outliers and switching from two-tailed to one-tailed tests.

In exploratory studies, research questions are often open ended. For example, in a nonexperimental survey, an analyst may evaluate many variables to see which one(s) best predict an outcome such as life satisfaction. Fishing for predictors in a large set of "candidate" variables potentially opens up a much wider range of ways to violate rules for NHST.

Some journals seem to accord greater value to confirmatory studies than to exploratory work. Perhaps because of this, there is a temptation for researchers who have done exploratory studies (who have tried out many different combinations of variables, rules for identification, handling of outliers, etc.) to cherry-pick a small set of results and write research reports that make it sound as if the study were confirmatory.

Exploratory and confirmatory studies both have value. In many research areas, truly confirmatory studies are possible only after a period of exploratory work. However, reporting hand-picked p values from large numbers of tests in exploratory studies violates a fundamental rule for the use of NHST: Do only a small number of significance tests. When a small number of selected results from an exploratory study are reported as if they were obtained through a confirmatory study, p values can greatly underestimate the true risk for Type I error.

A specific study may provide information to do both confirmatory and exploratory analyses. When this is the case, the first part of a "Results" section can report a limited number of analyses for which the researcher had specific hypotheses in advance. A later section titled "Exploratory Results" can report additional interesting results that were not predicted in advance. In general, we should not place much faith in p
Preliminary Data Screening and Descriptions of Scores for Quantitative Variables

When you work with quantitative variables, you should do the following things.

* In all research, decide the value of N before you begin to collect data. (Do not collect data, repeatedly analyze it, collect more data because you are not happy with results, and then stop at a point where you have results you like.)
* Choose the method for outlier identification (such as boxplots or z scores) before you collect data.
* Establish rules for inclusion or exclusion of cases ahead of data collection. (For example, you may want to include a limited range of ages, or only right-handed persons, in your sample.)
* Decide how you will handle outliers before you collect data.
* If you anticipate skewness, think about what you might do to reduce skewness ahead of time. In many cases, if skewness is not extreme, you don't need to do anything about it.
* Collect data.
* Obtain a frequency table; identify impossible or questionable score values and note percentage of missing values.
* Obtain a histogram and visually examine it to evaluate distribution shape and skewness. Unless skewness is extreme, you probably don't need to do anything about it.
* To evaluate outliers, obtain a boxplot and/or z scores for all cases. Either boxplots or z scores can be used to identify outliers. Note the number and locations of outliers.
* Your decision whether to use mean or median (as well as choices among later statistics) may depend on distribution shape and whether outliers are present.
* Document every decision you made.

6.18 Reporting Information About Distribution Shape, Missing Values, Outliers, and Descriptive Statistics for Quantitative Variables

You use all the information discussed in Chapters 3 through 6 to describe the behavior of each quantitative variable early in your research report. Try to communicate the pattern of information as clearly as possible. Information about distribution shape can be summarized in statements such as:

Heart rates were approximately normally distributed, with N = 100, M = 74, and SD = 4.5. There were no missing values. Using z > 3.29 in absolute value as the criterion for identifying outliers, there were no outliers.

The initial data set had N = 340 heart rate scores, with M = 76 and SD = 6.5. There were 20 missing values. Using z > 3.29 in absolute value as the criterion for identifying outliers, there were 10 outliers, all at the upper end of the distribution. On the basis of prior plans for data handling, the 20 missing values and 10 outliers were removed from the data set, leaving N = 310 cases for analysis. For these 310 cases, M = 68 and SD = 5.7.

Number of daily servings of fruit and vegetables had a possible range of scores from 0 to 8. Scores were not normally distributed;
9.11 Interpretation of Statistically Significant Outcomes

Reports of "statistically significant" outcomes should also be viewed with caution. It is important to understand that a "statistically significant" outcome can be obtained even when H0 is correct.

Here are some common reasons why a decision to reject H0 and call a test result "statistically significant" may be incorrect.

9.11.1 Sampling Error

A statistically significant outcome may arise because of sampling error. That is, even when the null hypothesis H0: μ = μ_hyp is correct, some values of the sample mean that are quite far away from μ_hyp can arise just because of sampling error or chance. By definition, when the nominal alpha level is set at .05, values of M that are far enough away from μ_hyp to meet the criterion for the decision to reject H0 occur about 5% of the time when the null hypothesis is actually correct.

9.11.2 Human Error

Human error in computation and reporting of statistics is common (Green et al., 2018). Usually errors are in favor of a researcher's preferred outcome (people rarely recheck their numbers when they have results they like).

9.11.3 Misleading p Values

Obtained p values can underestimate the true risk for Type I error. The decision to reject H0 is often made by noticing whether obtained p is less than .05. If the obtained p value underestimates the true risk for Type I error, then the decision to reject H0 may be incorrect.

9.12 Understanding Past Research

When you read past research, think about these questions.

Were too many significance tests done for p values to be believable? There is no universally agreed upon rule about the number of tests that is acceptable. I suggest that if you see more than 10 p values in a research report, you should begin to suspect that at least a few of them are due to Type I error. Ideally, authors should acknowledge this problem (inflated risk for Type I error when multiple tests are performed) in the discussion sections of papers. If an author reports that an important variable was measured 12 different ways and then reports statistically significant results for only 1 of these measures, you might suspect that the other 11 measures did not turn out to be significant when they were examined.
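
To see why a long list of p values invites suspicion, it can help to work out how quickly the risk of at least one Type I error grows with the number of tests. The sketch below assumes that every null hypothesis is true and that the tests are independent; that is an idealization (12 measures of the same variable would usually be correlated), and the function name is illustrative rather than taken from the text.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests,
    assuming every null hypothesis is true (an idealized assumption)."""
    return 1 - (1 - alpha) ** n_tests


# Nominal alpha of .05 per test, for 1, 10, and 12 tests.
for k in (1, 10, 12):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

Under these assumptions, the risk of at least one false rejection is about .40 for 10 tests and about .46 for 12 tests, far above the nominal .05 for a single test.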
Sometimes numerous tests are done but not included in a paper. The use of too many significance tests is problematic whether you see them in the published paper or not.

Evaluate p values critically. Realize that violations of assumptions and rules (that probably are not explicitly reported in most research reports) can make p values poor estimates of the true risk for Type I error.

Realize that a very small p value does not necessarily imply that the effect is large in practical or clinical terms.

Look for effect size information. If effect size is not reported, there should be sufficient information for you to calculate this by hand. All you need to find Cohen's d is M, SD, and μ_hyp (the proposed or hypothesized value of μ). Also evaluate whether the effect size is large enough to have any practical or clinical importance. When variables are measured in meaningful units, M - μ_hyp is useful information.

Look for confidence intervals.

Ask if it is reasonable to generalize from the types of cases in this study to larger populations in the real world. Ask if the situation in the study is comparable with real-world situations.

9.13 Planning Future Research

Research methods textbooks specific to your field of interest provide much information about planning research. From the perspective of NHST, here are some important issues.

Make decisions ahead of time about significance tests (test statistic, α level, directional or nondirectional test).

Make decisions ahead of time about the identification and handling of outliers.

Estimate the population effect size. Effect sizes from past studies (your own past research or other people's) may be used to do this. It is better to underestimate population effect size than to overestimate it.

Use your estimated effect size and type of test (e.g., one-sample t test, α = .05, two tailed) to look up estimated power for your effect size and planned N. Or, using .80 for power, figure out the minimum N needed to have 80% power.

9.14 Guidelines for Reporting Results

The information to include in a research report depends on the specific test. For a one-sample t test, include N, M, SD, df, SE_M, t, and (exact) p; whether p is one tailed or two tailed; effect size information such as Cohen's d and/or M - μ_hyp; and a CI for M (or for M - μ_hyp). The following elements should be included in a written report for a one-sample t test.

* A statement of what test was done, for what variable.
* Sample size (N), M, SD, and SE_M.
* The CI for M (or the CI for the M - μ_hyp difference).
* Obtained t with its df and exact p. State whether p is one tailed or two tailed.
* Traditionally, a statement of whether a test was statistically significant and/or whether the null hypothesis can be rejected has usually been included. Proponents of the New Statistics suggest that we should avoid yes/no thinking and instead focus on confidence intervals and effect sizes.
* Effect size (such as Cohen's d) and, if units of measurement are interpretable, a difference such as M - μ_hyp may also be useful as information about practical significance.

Here is an example of a complete "Results" section for a one-sample t test that includes all information listed above.
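
Before turning to that example, note that its numbers can be recomputed from N, M, SD, and the hypothesized mean alone. The sketch below does this for the speed example (N = 9, M = 39, SD = 6.103, μ_hyp = 35). The helper names, the Simpson's-rule integration of the t density, and the use of the tabled critical value 2.306 for df = 8 are illustrative assumptions, not part of the text; a statistics package would normally do this work.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# Summary statistics from the speed example.
n, m, sd, mu_hyp = 9, 39.0, 6.103, 35.0
t_crit = 2.306  # tabled two-tailed critical t for alpha = .05, df = 8

df = n - 1
se_m = sd / math.sqrt(n)                       # standard error of M
t_obt = (m - mu_hyp) / se_m                    # obtained t ratio
p_two = 2 * (1 - t_cdf(abs(t_obt), df))        # two-tailed p value
cohen_d = (m - mu_hyp) / sd                    # effect size
ci = (m - t_crit * se_m, m + t_crit * se_m)    # 95% CI for M

print(f"SE_M = {se_m:.3f}, t({df}) = {t_obt:.3f}, p = {p_two:.4f}")
print(f"d = {cohen_d:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Small differences in the last decimal place of p can arise from rounding of the summary statistics.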
Results

A one-sample t test was conducted to assess whether mean speed for a sample of N = 9 cars differed from the posted speed limit of 35 mph. A two-tailed test was used. For this sample, M = 39, SD = 6.103, and SE_M = 2.034. The 95% CI for M was [34.31, 43.69]. The result was t(8) = 1.966, p = .0845, two tailed. Cohen's d effect size was .66; by Cohen's standards, this represents a medium effect. However, the obtained 4 mph difference between the sample mean (M = 39) and the posted speed limit (35 mph) was too small to have much practical importance.

We could add that, using α = .05, two tailed, this difference was not statistically significant.

A discussion section following these results should consider limitations such as the following:

* An accidental sample may not be representative of (similar to) the population of all drivers in this town. If the sample contained mostly male (rather than female) drivers, or was obtained mostly during rush hour, the sample mean may overestimate driving speed for cars more generally.
* This sample size (N = 9) is too small to draw meaningful conclusions.
* This report makes no mention of screening for outliers (was one driver clocked at 90 mph?).

You may be able to think of additional questions.

9.15 What You Cannot Say

A major problem with p values is that they cannot answer the question we really want to answer. We would like to know something about the probability that the null (or the alternative) hypothesis is correct, given the information in our sample data. Instead, a p value tells us (often very inaccurately) about the probability of obtaining the values of M and t we got in our sample, given that the null hypothesis is correct (Cohen, 1994). I don't suggest that you try to say that in a research report (it may confuse your readers). Here are examples of things you should not say.

Never make any of the following statements:

* p = .000
* p was "highly" significant
* p was "almost" significant (or synonymous terms such as "close to" or "marginally" significant)

For "small" p values, such as p = .04, we cannot say:

* Results were not due to chance, or could not be explained by chance (we don't know that!)
* Results will replicate in future studies
* H0 is false
* We accept (or have proved) the alternative hypothesis
* Because p is small, this is an important difference

We also cannot use (1 - p), for example (1 - .04 = .96), to make probability statements such as:

* There is a 96% chance that results will replicate
* There is a 96% chance that the null hypothesis is false
For p values on the order of p = .37, we cannot say, "Accept the null hypothesis."

The language we use to report results should not overstate the strength of the evidence, imply large effect sizes in the absence of careful evaluation of effect size, overgeneralize the findings, or imply causality when rival explanations cannot be ruled out. We should never say, "This study proves that..." Any one study has limitations. As suggested in Chapter 1, it is better to think in terms of degrees of belief. As we obtain increasing amounts of good-quality evidence, we may become more confident of a belief. We should also pay attention to inconsistent evidence that would reduce our belief.

We can say things such as:

* The evidence in this study is consistent with the hypothesis that ...
* The evidence in this study is not consistent with the hypothesis that ...

Hypothesis can be replaced by similar terms, such as prediction.

9.16 Summary

In practice, many assumptions and rules for the use of NHST are frequently violated. Samples are often not randomly selected from real populations of interest or evaluated for their representativeness relative to real-world populations. Researchers often report large numbers of significance tests. The desire to obtain statistically significant results can tempt researchers to engage in "data fishing"; researchers may "torture" their data until it confesses (Mills, 1993). For example, they may run many different analyses or delete extreme scores until they obtain statistically significant results. When any of these violations of rules and assumptions are present, reported p values do not accurately represent the true risk for incorrectly rejecting H0.

In addition to difficulties and disputes about the logic of statistical significance testing, there are additional reasons why the results of a single study should not be interpreted as conclusive evidence that the null hypothesis is either true or false. A study can be flawed in many ways that make the results uninformative, and even when a study is well designed and carefully conducted, statistically significant outcomes sometimes arise just by chance. Therefore, the results of a single study should never be treated as conclusive evidence. To have enough evidence to be confident that we know how variables are related, it is necessary to have many replications of a result based on methodologically rigorous studies.

Despite logical and practical problems with NHST, most experts do not recommend that NHST and reports of p values should be entirely abandoned. NHST can help researchers evaluate whether chance or sampling error are likely explanations for an observed outcome of a study.

We can't completely get rid of risk for error, no matter how well we behave. But we should avoid behaviors that we know make our risk for error worse. These behaviors have been given many names (p-hacking, fishing, data torturing, questionable research practices). I will remind you of these problems as you learn additional statistical tests.

Thou shalt not place too much faith in p values.

Appendix 9A: Further Explanation of Statistical Power
We can incorporate sampling error into understanding statistical power by visualizing two sampling distributions. The first describes the sampling distribution for M (and for t) if H0 is correct. The second describes the sampling distribution for M if μ equals a specific value different than the value specified in H0. In the following example, let's consider testing hypotheses about intelligence scores. Suppose that the null hypothesis is

H0: μ = 100,

the sample standard deviation SD = 15, and the sample size is N = 10 (therefore df = 9). This gives us:

SE_M = 15/√N = 15/√10 = 15/3.162 = 4.74.

From the table of critical values for the t distribution, which appears in Appendix B at the end of the book, the critical values of t for α = .05, two tailed, and df = 9 are t = +2.262 and t = -2.262. The critical values of M would therefore be

100 - (2.262 × 4.74) = 89.28, and
100 + (2.262 × 4.74) = 110.72.

In other words, we would reject H0: μ = 100 if we obtain a sample mean M that is less than 89.28 or greater than 110.72.

To evaluate statistical power, we need to think about two different possible distributions of outcomes for M, the sample mean. The first is the distribution of outcomes that would be expected if H0 were true; the "reject regions" for the statistical significance test are based on this first distribution. The second is the distribution of outcomes for M that we would expect to see if the effect size = 1.00, that is, if the real population mean (115) were 1 standard deviation above the hypothesized population mean of 100 points.

For this example, let's work with an effect size of Cohen's d = 1.00. Now let's suppose that the actual population mean is 115. This would make the value of Cohen's d = [μ - μ_hyp]/SD = [115 - 100]/15 = 1.00.

The upper panel of Figure 9.1 shows the expected distribution of outcome values of t given H0: μ = 100, Ha: μ ≠ 100, df = 9, and α = .05 (two tailed). Using df = 9, we can find the critical values of t from the table in Appendix B; for α = .05 (two tailed) with 9 df we would reject H0 for values of t > 2.262 and for values of t < -2.262.

The lower panel of Figure 9.1 shows how these critical values of t correspond to values of M. A value of t can be converted back into the original units of measurement. The value of M that corresponds to a critical or cutoff value of t is M = μ_hyp + (t_critical × SE_M). For example, a t value of 2.262 corresponds to a sample mean M of 110.72. The reject regions for H0 can be given in terms of obtained values of M. We would reject H0: μ = 100 for values of M > 110.72 and for values of M < 89.28.

If t = (M - μ_hyp)/SE_M, then M = μ_hyp + t_critical × SE_M.

The preceding discussion shows how the distribution of outcomes for M that is (theoretically) expected when H0 is assumed to be true is used to work out the reject regions for H0 (in terms of values of t or M).
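
The arithmetic in this example can be checked with a short script. The sketch below recomputes SE_M and the critical values of M, and then estimates how often M would exceed the upper critical value if μ were really 115. It follows the appendix's simplification of treating the distribution of M around 115 as a t distribution with 9 df and ignoring the lower reject region; an exact power calculation would use the noncentral t distribution. The helper names and the Simpson's-rule integration are illustrative assumptions, and 2.262 is the tabled critical value quoted above.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


mu_hyp, mu_actual = 100.0, 115.0   # hypothesized and assumed actual means
sd, n = 15.0, 10
t_crit = 2.262                     # tabled critical t, alpha = .05 two tailed, df = 9

se_m = sd / math.sqrt(n)                   # about 4.74
m_lower = mu_hyp - t_crit * se_m           # lower critical value of M
m_upper = mu_hyp + t_crit * se_m           # upper critical value of M

# Distance from the assumed actual mean to the upper critical value, in SE units.
t_distance = (m_upper - mu_actual) / se_m  # about -.90
power = 1 - t_cdf(t_distance, n - 1)       # area to the right of t_distance

print(f"SE_M = {se_m:.2f}, reject if M < {m_lower:.2f} or M > {m_upper:.2f}")
print(f"t = {t_distance:.2f}, approximate power = {power:.2f}")
```

Because the script does not round SE_M to 4.74 before multiplying, its critical values can differ from the hand calculation in the second decimal place. The approximate power comes out close to .80.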
Figure 9.1 Illustration of Statistical Power and Risk for Type II Error (β)

[Upper panel: Distribution of values of M if H0 is true (H0: μ = 100) and SE_M = 4.74, with α/2 = .025 shaded in each tail. Lower panel: Distribution of values of M if μ = 115 and SE_M = 4.74, with power (1 - β) = .80 shaded. Critical value of M = 110.72.]

Note: The upper distribution shows how values of M are expected to be distributed if H0 is true, that is, μ = 100. The shaded regions in the upper and lower tails of the upper distribution correspond to the reject regions for the test of H0: μ = 100. The lower distribution shows how values of M would be distributed if the population mean μ is actually 115; on the basis of this distribution, we see that if μ is really 115, then about 80% of the outcomes for M would be expected to exceed the critical value of M (110.72).

The image is a combination diagram with two graphs that illustrates statistical power and risk for Type II error.

1. The first diagram shows the distribution of values of M if H0: μ = 100 and SE_M = 4.74. The graph is that of a normal distribution where the X axis ranges from 90 to 110. The critical value of M is 110.72. The tail region that corresponds to α/2 at the tail end of the distribution on either side equals .025. This has been shaded.
2. The second diagram shows the distribution of values of M if μ = 115 and SE_M = 4.74. The graph is that of a normal distribution where the X axis ranges from 110 to 125. The critical value of M is 110.72. The region that corresponds to power (1 - β) = .80 of the distribution has been shaded. This is the entire region to the right of 110.72.

The next step is to ask what values of M would be expected to occur if H0 is false (one of many ways that H0 can be false is if μ is actually equal to 115). An actual population mean of μ = 115 corresponds to a Cohen's d effect size of 1 (i.e., the actual population mean 115 is 1 standard deviation higher than the value of μ_hyp = 100 given in the null hypothesis).

The lower panel of Figure 9.1 illustrates the theoretical sampling distribution of M if the population mean is really equal to 115. We would expect most values of M to be fairly close to 115 if the real population mean is 115, and we can use SE_M to predict the amount of sampling error that is expected to arise for values of M across many samples.

The final step involves asking this question: On the basis of the distribution of outcomes for M that would be expected if μ is really equal to 115 (as shown in the bottom panel of Figure 9.1), how often would we expect to obtain values of the sample mean M that are larger than the critical value of M = 110.72 (as shown in the upper panel of Figure 9.1)? Note that values of M below the lower critical value of M = 89.28 would occur so rarely when μ really is equal to 115 that we can ignore this set of possible outcomes.

To work out the probability of obtaining a sample mean M greater than 110.72 when actual μ = 115, we find the t ratio that tells us the distance between the "real" population mean, μ = 115, and the critical value of M = 110.72. This value is t = (M - μ)/SE_M = (110.72 - 115)/4.74 = -.90. The likelihood that we will obtain a sample value for M that is large enough to be judged statistically significant given the decision rule developed previously (i.e., reject H0 for M > 110.72) can now be evaluated by finding the proportion of the area in a t distribution with 9 df that lies to the right of t = -.90. Tables of the ... of M shown in Figure 9.1.

Comprehension Questions

1. What is a Type I error?
2. What factors influence the magnitude of risk for Type I error?
3. What is a Type II error?
Figure 6.16 Temperature Data File With Cases Removed by Select Cases Procedure Marked by Cross Hatches

The image is a scatterplot for perfect negative correlation where r equals -1. The X axis denotes hours_study and ranges from 0 to 6. The Y axis denotes errors_exam and ranges from 0 to 10. There are 8 data points visible: (0, 10); (1, 9); (2, 8); (3, 7); (4, 6); (5, 5); (6, 4); (7, 3). The line formed by joining the data points is a straight line from the top left to the bottom right.

10.5 Most Associations Are Not Perfect

In behavioral and social science research, data rarely have correlations near -1.00 or +1.00; values of r tend to be below .30 in absolute value. When scores are positively associated (but not perfectly linearly related), they tend to fall within a roughly cigar-shaped ellipse in a scatterplot, as shown in Figure 10.4.

To see how patterns in scatterplots change as the absolute value of r decreases, consider the following scatterplots that show hypothetical data for SAT score (a college entrance exam in the United States, X) as a predictor of first-year college grades (Y).

Figure 10.5 shows a scatterplot that corresponds to a correlation of +.75 between SAT score (X predictor) and college grade point average (GPA) (Y outcome). The association tends to be linear (GPA increases as SAT score increases), but it is not perfectly linear. If we draw a line through the center of the entire cluster of data points, it is a straight line with a positive slope (higher GPAs go with higher SAT scores). However, many of the data points are not very close to the line.

It is useful to think about the way average values of Y differ across selected values of X (such as low, medium, and high values of X). The vertical ellipses in Figure 10.5 identify groups of people with low, medium, and high SAT scores. You can see that the group of people with low SAT scores has a mean GPA of 1.1, while the group of students with high SAT scores has a mean GPA of 3.6. We can't predict each person's GPA exactly from SAT score, but we can see that the average GPA is higher when SAT score is high than when SAT score is low.

If the correlation between GPA and SAT score is about +.50, the scatterplot will look like the one in Figure 10.6. In real-world studies, correlations between SAT and GPA tend to be about +.50 (Stricker, 1991). The difference in mean GPA for the low versus high SAT score groups in the graph for r = +.50 is less than it was in the graph for r = +.75. Also, within the low, medium, and high SAT score groups, GPA varies more when r = +.50 than when r = +.75. SAT scores are less closely related to GPA when r = .50 than when r = .75.

Now consider what the scatterplot looks like when correlation is even smaller, for example, r = +.20. Many correlations in behavioral science research reports are about this magnitude. A scatterplot for r = +.20 appears in Figure 10.7. In this scatterplot, points tend to be even farther away from the line than when r = +.50, and mean GPA does not differ much for the low SAT versus high SAT groups. For correlations below about r = .50, it becomes difficult to detect any association by visual examination of the scatterplot.

Figure 10.4 Ellipse Drawn Around Scores in an X, Y Scatterplot With a Strong Positive Correlation
The image is an ellipse drawn around a scatterplot. The scatterplot has strong positive correlation. The X axis represents SAT scores and ranges from 300 to 800. The Y axis represents GPA and ranges from 1 to 4. The scatter points mostly lie within an elliptical area within 500 to 700 on the X axis and between 2 and 4 on the Y axis. The points show positive correlation, as a rise in X axis levels also seems to indicate a rise in Y axis levels. There are a couple of outliers, but most points lie within the ellipse.

Figure 10.5 Scatterplot: Hypothetical Association Between GPA and SAT Score Corresponding to r = +.75

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r = +.75. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4. There are three ellipses within which most of the data points are clustered. There are a few outliers, but most points lie within the ellipses. The ellipses are almost vertical. For a mean GPA of 1.1, the first ellipse has 8 data points. They are clustered around the 1 GPA and 400 SAT score levels. The second ellipse is for a mean GPA of 2.4. The data points are clustered around the 2 to 3 GPA range and the 500 to 600 SAT score levels. There are around 14 such data points. The third ellipse is for a mean GPA of 3.6. Here, data points are fewer, just around 5, and are clustered around the 4 GPA level and 700 SAT level. A straight line through the means of the three ellipses shows strong positive correlation.

Figure 10.6 Scatterplot for Hypothetical GPA and SAT Score With r = +.50

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r = +.50. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4. There are three ellipses within which most of the data points are clustered. There are many outliers, but several points lie within the ellipses. The ellipses are vertical. For a mean GPA of 1.4, the first ellipse has 6 data points. They are clustered around the 2 GPA and 400 SAT score levels. The second ellipse is for a mean GPA of 2.0. The data points are clustered around the 1 to 3 GPA range and the 500 to 600 SAT score levels. There are around 18 such data points. There are many points close to the ellipse, but not contained within it. The third ellipse is for a mean GPA of 2.6. Here, data points are fewer, just around 5, and are clustered around the 3 GPA level and 700 SAT level. A straight line drawn between the means of the ellipses is almost linear.

Figure 10.7 Hypothetical Scatterplot for r = +.20

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r = +.20. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4. There are two ellipses within which many data points are clustered. There are many outliers, and several of these points lie between the ellipses. For a mean GPA of 2.1, the first ellipse has 8 data points. They are clustered around the 1 to 3 GPA and 400 SAT score levels. The second ellipse is for a mean GPA of 2.4. The data points are clustered around the 1.5 to 3.5 GPA range and the 650 to 700 SAT score levels. There are around 5 such data points. Most of the other points lie between the two ellipses and not inside them, while a straight line drawn between the means of both ellipses is almost horizontal.
10.6 Different Situations in Which 7 = .00
Finally, consider what scatterplots can look like
when 7is close to 0. An 7of 0 tells us that there is no linear relationship between Xand ¥. However, there are two different ways 7 close to O can happen. If Yand Yare completely unrelated, »will
beclose to 0. If Yand Yhave a nonlinear or curvilinearrelationship, 7 can also be close to 0.
Figure 10.8 shows a scatterplot for a situation in which X is not related to Y at all. If SAT scores were completely unrelated to GPA, the results would look like Figure 10.8. The two groups (low and high SAT scores) have the same mean GPA, and mean GPA for each of these groups is equal to mean GPA for all persons in the sample. Also note that the overall distribution of points in the scatterplot is approximately circular in shape (instead of elliptical).

However, an r of 0 does not always correspond to a situation where X and Y are completely unrelated. An r close to 0 can be found when there is a strong but not linear association between X and Y. Figure 10.9 shows hypothetical data for an association sometimes found in research on anxiety (X) and task performance (such as exam scores, Y). The plot shows a strong, but not linear, association between anxiety and exam score. An inverse U-shaped curve corresponds closely to the pattern of change in Y. In this example, students very low in anxiety obtain low exam scores (perhaps they are not motivated to study and do not concentrate). Students with medium levels of anxiety have high mean exam scores (they are motivated to study). However, students with the highest level of anxiety also have low exam scores; at high levels of anxiety, panic may set in and students may do poorly on exams. Pearson's r is close to 0 for the data in this plot. Pearson's r does not tell us anything about the strength of this type of association, and it is not an appropriate analysis for this situation. An example of a different curvilinear function appears in Figure 10.10 (height in feet, Y, and grade in school, X). In this example, r would be large and positive; however, a straight line is not a good description of the pattern. Height increases rapidly from Grades 4 through 7; after that, height increases slowly and levels off. If you flip Figures 10.9 and 10.10 upside down, they correspond to other possible curvilinear patterns.

Figure 10.8 An r of 0 That Represents No Association Between X and Y

The image is that of a circle and two ellipses drawn around a scatterplot that shows no relationship between X and Y. The r equals 0. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4. There are two ellipses within which many data points are clustered. There are many outliers, and several of these points lie between the ellipses. A larger circle encircles most of the points as well as the ellipses. For a mean GPA of 2.4, the first ellipse has 9 data points. They are clustered around the 1 to 3 GPA and 400 SAT score levels.
cube is positive; when (X − M_X) is negative, its cube is negative. Skewness therefore provides information about the comparative magnitudes of positive versus negative deviations from the mean. A positive value for the skewness index indicates more extreme scores at the upper end of the distribution. SPSS provides a skewness index (along with the standard error of skewness, or SE_skewness) that can be used to test whether skewness differs significantly from zero. However, in most research situations, visual examination of a histogram is sufficient to evaluate skewness. To decide whether skewness is severe, you can divide skewness by the standard error of skewness given in SPSS's output for descriptive statistics and evaluate this ratio using standards for z scores. If the z ratio is greater than 3 in absolute value, it indicates "statistically significant" skewness. Consider the fruit and vegetable servings data that appeared in Figure 6.10. For the daily number of servings of fruits and vegetables variable (NCIfv), skewness = 1.273, SE_skewness = .110, and 1.273/.110 = 11.57. This distribution is very positively skewed (it has a longer tail on the upper end). When skewness is present, values of the median and mode are usually not close together, and the curve for a normal distribution does not fit well. Descriptive statistics, including the skewness index, appear in Figure 6.18.

6.C.2 Index for Kurtosis

Kurtosis has been widely misunderstood; it is sometimes described as information about "peakedness" of distribution shape. That is incorrect (Westfall, 2014). Thinking about the computational formula helps us see why. A common formula for kurtosis is:

$$\text{Kurtosis} = \frac{\sum (X - M_X)^4}{N \times SD^4} \tag{6.6}$$

When deviations from the mean are taken to the fourth power, greater weight is given to extreme scores. Kurtosis provides more information about extreme scores in the tails than about the shape of the peak. Different distribution shapes can arise for varying degrees of kurtosis. Westfall (2014) offers examples to demonstrate that kurtosis does not provide information about the shape of distribution peaks.

Figure 6.18 Descriptive Statistics, Including Skewness, for Daily Fruit and Vegetable Consumption Data

Statistics for NCIFV: N valid = 492; N missing = 0; Mean = 1.86; Median = 1.00; Std. Deviation = 2.327; Skewness = 1.273; Std. Error of Skewness = .110.
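The skewness check described for the fruit and vegetable data is simple arithmetic, and it can be written out as a short sketch. The two input values are the ones reported in the text for NCIfv; the variable names and the cutoff constant are only illustrative.

```python
# Values reported in the text for the NCIfv variable.
skewness = 1.273
se_skewness = 0.110

# Ratio of the skewness index to its standard error, evaluated like a z score.
z_ratio = skewness / se_skewness

# Rule of thumb from the text: |z| > 3 indicates "statistically significant" skewness.
Z_CUTOFF = 3
is_significant = abs(z_ratio) > Z_CUTOFF

print(f"z ratio = {z_ratio:.2f}; exceeds cutoff: {is_significant}")
```

As the text notes, a histogram is usually enough in practice; the ratio is a supplement, not a replacement, for looking at the distribution.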
among observations and the data collection methods that tend to create problems with this assumption, refer to Chapter 2. When people in the sample have not had opportunities to influence one another, this assumption is usually met. When this assumption is violated, values of r and significance tests of r can be incorrect.

10.7.5 X and Y Must Be Appropriate Variable Types

Some textbooks say that both X and Y must be quantitative variables for Pearson's r; that is the most common situation when Pearson's r is reported. However, Pearson's r can also be used if either X or Y, or both, is a dichotomous variable (for example, if X represents membership in just two groups). When one or both variables are dichotomous, Pearson's r can be reported with different names. If we correlate the dichotomous variable sex with the quantitative variable height, this is called a point biserial r (r_pb). If we correlate the dichotomous variable sex with the dichotomous variable political party coded 1 = Republican, 2 = non-Republican, that correlation is called a phi coefficient, denoted φ.

If X and/or Y have three or more categories, Pearson's r cannot be used, because it is possible for the pattern of means on a Y variable to show a nonlinear increase or decrease across groups defined by a categorical X variable. For example, if X is political party membership (coded 1 = Democrat, 2 = Republican, 3 = Socialist, and so forth), and Y is a rating of the president's performance, we cannot expect changes in Y across values of X to be linear. Consider the scatterplot in Figure 10.11 that represents an association between sex (X) and height (Y). Pearson's r requires that the association between X and Y be linear. When an X predictor variable has only two possible values, the only association that X can have with Y is linear.

In the bivariate regression chapter, you will see that the X independent variable can be either quantitative or dichotomous. However, the Y dependent variable in regression analysis cannot be dichotomous; it must be quantitative. Correlation analysis does not require us to distinguish variables as independent versus dependent; regression does require that distinction.

10.7.6 Assumptions About Distribution Shapes

Textbooks sometimes say that the joint distribution of X and Y must be bivariate normal and/or that X and Y must each be normally distributed. In practice this is often difficult to evaluate. The bivariate normal distribution is not discussed further here. In practice, it is more important that X and Y have similar distribution shapes (Tabachnick & Fidell, 2018). When X and Y have different distribution shapes, values of r in the sample are restricted to a narrower range, for example, −.40 to +.40.

Figure 10.11 Scatterplot for Dichotomous Predictor (Sex) and Quantitative Outcome (Height)
Note: M1 = mean male height; M2 = mean female height.

The image is a scatter plot for a dichotomous predictor such as sex and a quantitative outcome like height. The X axis has two points: one for M1 or males and the second for M2 or females. The Y axis shows the height and ranges from 58 to 76. There are 10 data points for males and an equal number for females. These points appear to be in straight vertical lines for each sex. When the midpoints of the two lines are joined, a downward-sloping line emerges.

When there are problems with assumptions, nonparametric alternative correlations such as Spearman's r or Kendall's tau may be preferred (see Appendix 10A and Kendall, 1962). These also require assumption of linearity, but they are likely to be less influenced by bivariate outliers.

10.8 Preliminary Data Screening for Pearson's r

The following information is needed to evaluate the assumptions. To evaluate representativeness of the sample and independence of observations, you need to know how the sample was obtained and how data were collected (see Chapter 2). In addition:

Examine frequency tables to evaluate problems with missing values and/or outliers and/or implausible values for X or Y (as for all analyses). Document numbers of missing values and outliers and how they are handled.
Obtain histograms for X and Y. Evaluate whether distribution shapes are reasonably normal and X and Y have similar shapes. For a dichotomous variable, the closest to normal shape is a 50/50 split in group membership.
Obtain an X, Y scatterplot. This is the most important part of data screening for correlation. The scatterplot is used to evaluate linearity and identify potential bivariate outliers.

Pearson's r is not robust against violations of most of its assumptions. A statistic such as the median is described as robust if departures from assumptions and/or the presence of outliers do not have much impact on the value of that sample statistic. Partly because r is affected badly by violations of its assumptions and additional problems discussed later, sample sizes for r should be large, ideally at least N = 50 or 100, and data screening should include evaluation of bivariate outliers.
10.9 Effect of Extreme Bivariate Outliers

Prior chapters discussed methods for the detection of univariate outliers (i.e., outliers in the distribution of a single variable) through examination of histograms and boxplots. In correlation, we also need to consider possible bivariate outliers. These do not necessarily have extreme values on X or on Y (although they may). A bivariate outlier often represents an unusual combination of values of X and Y. For example, if X is height and Y is weight, height of 6 ft and body weight of 120 lb would be a very unusual combination of values, even though these are not extreme scores in histograms for height and weight. If you visualize the location of points in your scatterplot as a cloud, a bivariate outlier is an isolated data point that lies outside that cloud.

Figure 10.12 shows an extreme bivariate outlier (the circled point at the upper right). For this scatterplot, r = +.64. If the outlier is removed, and a new scatterplot is set up for the remaining data, the plot in Figure 10.13 is obtained; when the outlier is excluded, r = −.11 (not significantly different from 0). This example illustrates that the presence of a bivariate outlier can inflate the value of a correlation. It is not desirable to have the result of an analysis depend so much on one outlier score.

The presence of an outlier does not always increase the magnitude of r; consider the example in Figure 10.14. In this example, when the circled outlier at the lower right of the plot is included, r = +.532; when it is excluded, r = +.86. When this bivariate outlier is included, it decreases the value of r. These examples demonstrate that decisions to retain or exclude outliers can have substantial impact on r values.

Figure 10.12 Scatterplot That Includes a Bivariate Outlier

The image is of a scatterplot that includes a bivariate outlier. The X axis ranges from 0 to 60 and the Y axis also ranges from 0 to 60. The bivariate outlier is a data point at (60, 58). This lies outside a circle that encloses the other data points. The other data points lie close to the region bounded by 10 and 20 on the X and Y axes.

Figure 10.13 Subset of Data From Figure 10.12 After Outlier Is Removed

Note: With the bivariate outlier included, Pearson's r(48) = +.64, p < .001; with the bivariate outlier removed, Pearson's r(47) = −.10, not significant.

The image is of a scatterplot after a bivariate outlier has been eliminated. The X axis ranges from 0 to 25 and the Y axis ranges from 0 to 20. The data points lie in a circle close to the region bounded by 10 and 15 on the X and Y axes. A note below the graph mentions the following: With the bivariate outlier included, Pearson's r(48) equals plus .64, p less than .001; with the bivariate outlier removed, Pearson's r(47) equals minus .10, not significant.
It is dishonest to run two correlations (one that includes and one that excludes bivariate outliers) and then report only the larger correlation. It can be acceptable to report both correlations so that readers can see the effect of the outlier. Decisions about identification and handling of outliers should be made before data are collected.

Figure 10.14 A Bivariate Outlier That Deflates the Size of r

Note: With the bivariate outlier included, Pearson's r(48) = +.532, p < .001; with the bivariate outlier removed, Pearson's r(47) = +.86, p …

… .25) is large. Guidelines are
summarized in Table 10.3. Below r of about .10, a correlation represents a relation between variables that we can detect in statistical analyses using large samples, but the relation is so weak that it is not noticeable in everyday life. When r = .30, relations between variables may be strong enough that we can detect them in everyday life. When r is above .50, relations may be easily noticeable in everyday life. These guidelines for effect size labels are well known and generally accepted by researchers in social and behavioral sciences, but they are not set in stone. In other research situations, an effect size index other than r and/or different cutoff values for small, medium, and large effects may be preferable (Fritz, Morris, & Richler, 2012). When findings are used to make important decisions that affect people's lives (such as whether a new medical treatment produces meaningful improvements in patient outcomes), additional information is needed; see further discussion about effect size in Chapter 12, on the independent-samples t test.

10.17 Pearson's r and r² as Effect Sizes and Partition of Variance

Both Pearson's r and r² are indexes of effect size. They are standardized (their values do not depend on the original units of measurement of X and Y), and they are independent of sample size N. Sometimes r² is called the coefficient of determination; I prefer to avoid that term because it suggests causality, and as noted earlier, correlation is not sufficient evidence for causality. An r² estimates the proportion of variance in Y that can be predicted from X (or, equivalently, the proportion of variance in X that is predictable from Y). Proportion of predicted variance (r²) can be diagramed by overlapping circles, as shown in Figure 10.22. Each circle represents the total variance of one variable. The area of overlap between circles is proportional to r², the shared or predicted variance. The remaining area of each circle corresponds to 1 − r²; this represents the proportion of variance in Y that is not predictable from X.

Figure 10.22 Overlapping Circles: Proportions of Areas Correspond to r² and (1 − r²)
The image shows two overlapping circles. The area of overlap between the circles is proportional to r squared. In circle X on the left, the area outside the overlap represents 1 minus r squared. In circle Y on the right, the area outside the overlap also represents 1 minus r squared.

As shown in Figure 10.22, r² and (1 − r²) provide a partition of variance for the scores in the sample. For example, the variance in Y scores can be partitioned into a proportion that is predictable from X (r²) and a proportion that is not predictable from X (1 − r²). An r² is often referred to as "explained" or predicted variance in Y; (1 − r²) is error variance or variance that cannot be predicted from X. In everyday life, error usually means "mistake" (and mistakes sometimes do happen when statistics are calculated and reported). The term error means several different things in statistics. In the context of correlation and many other statistical analyses, error refers to the collective influence of several kinds of factors that include other predictor or causal variables not included in the study, problems with measurements of X and/or Y, and randomness.

Suppose the Y variable is first-year college GPA, and X is SAT score. Correlations between these variables are on the order of .4 to .5 in many studies. If r = .5, then r² = .25, so 25% of the variance in GPA is predictable, and (1 − r²) = 75% of the variance in GPA is error variance, or variance that is not predicted by SAT score. In statistics, error refers to all other variables that are not included in the analysis that may influence GPA. For example, GPA may also depend on variables such as amount of time each student spends partying and drinking, difficulty of the courses taken by the student, amount of time spent on outside jobs, life stress, physical illness, and a potentially endless list of other factors. If we have not measured these other variables and have not included them in our data analysis, we have no way to evaluate their effects.

Table 10.3 (effect size labels, from largest to smallest): Large effect (noticeable difference in real life); Between medium and large; Medium effect; Between small and medium effect; Small effect (statistically significant in large samples but not noticeable or detectable in everyday life); Between small and no effect; No effect.

Source: Based on Cohen (1988).

Why did Cohen choose .30 as the criterion for a medium effect? I suspect it was because correlations of approximately r = .30 and below are common in research in areas such as personality and social psychology. For example, Mischel (1968) remarked that values of r greater than .30 are rare in personality research. In some fields, such as psychophysics and behavior analysis research in psychology, proportions of explained variance tend to be much higher (sometimes on the order of 90%). The effect size guidelines in Table 10.3 would not be used in research fields where stronger effects are common.

In practice, you will want to compare your r and r² values with those obtained by other researchers who study similar variables. This will give you some idea of how your effect sizes compare with those in other studies in your research domain.
By now, you may be thinking, If we could measure these other variables and include them in the analysis, then the percentage of variance in GPA …

… detecting a population effect size ρ of .50. In statistical power analysis, power of .80 is used as the goal (i.e., you want an 80% chance of rejecting H0 if H0 is false).

Using Table 10.4, it is possible to look up the minimum N of participants required to obtain adequate statistical power for different population correlation values. For example, let α = .05, two tailed; set the desired level of statistical power at .80 or 80%; and assume that the true population value of the correlation is ρ = .5. This implies a population ρ² of .25. From Table 10.4, a minimum of N = 28 subjects would be required to have power of 80% to obtain a significant sample result if the true population correlation is ρ = .50. Note that for smaller effects (e.g., a ρ² value on the order of .05), sample sizes need to be substantially larger; in this case, N = 153 would be needed to have power of .80.

Table 10.4

Source: Adapted from Jaccard and Becker (2009).

Post hoc (or postmortem) power analyses should not be conducted. In other words, if you found a sample r² of .03 using an N of 10, do not say, "The r in my study would have been statistically significant if I had N of 203" (or some other larger value of N).

Even if power analysis suggests that a smaller sample size is adequate, it is generally a good idea to have at least N = 100 cases where correlations are reported, to avoid situations where there is not enough information to evaluate whether assumptions (such as normality and linearity) are satisfied and situations where one or two extreme outliers can have a large effect on the size of the sample correlation. The following is slightly paraphrased from Schönbrodt (2011):

From my experience as a personality psychologist, I do not trust correlations with N < 80. … N of 100-120 is better. In this region, correlations get stable (this is of course only a rule of thumb and certainly depends on the magnitude of the correlation). The p value by itself is bad guidance, as in small samples the CIs are very huge ... the CI for r = .34 (with N = 35) goes from .008 to .60, which is "no association" to "a strong association." Furthermore, r is rather susceptible to outliers, which is even more serious in small samples.

Guidelines about sample size are not chiseled into stone. When data points are difficult to obtain, sometimes researchers have no choice but to use small Ns. Be aware that small Ns are not ideal and that results obtained using small samples will have wide confidence intervals and may not replicate closely in later studies.

10.19 Interpretation of Outcomes for Pearson's r

10.19.1 When r Is Not Statistically Significant

If r does not differ significantly from 0, this does not prove there is no association between X and Y …
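The confidence interval quoted from Schönbrodt (2011) in the sample size discussion above (r = .34 with N = 35) can be reproduced approximately. The source does not say how that interval was computed; the sketch below assumes the usual Fisher r-to-z method with a 95% interval, and the function name `pearson_r_ci` is just an illustrative label.

```python
import math


def pearson_r_ci(r, n, z_crit=1.96):
    """Approximate CI for a correlation via the Fisher r-to-z transformation.

    Assumes the Fisher z method; z_crit = 1.96 gives a 95% interval.
    """
    z = math.atanh(r)                # transform r to Fisher z
    se = 1.0 / math.sqrt(n - 3)      # standard error of Fisher z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


lower, upper = pearson_r_ci(0.34, 35)
print(f"95% CI for r = .34, N = 35: {lower:.3f} to {upper:.2f}")
```

Under these assumptions the limits round to about .008 and .60, in line with the quoted values, and show how little a correlation from 35 cases pins down the population value.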
Chapter 7 Sampling Error and Confidence Intervals

7.1 Descriptive Versus Inferential Uses of Statistics

Up to this point, we have used statistics such as M and SD only to describe scores in small samples. In some real-life situations, such as evaluation of exam scores for a class of students, that is all the data analyst wants to do. For example, a teacher may report summary information such as the mean, median, minimum, maximum, and standard deviation of scores in his or her class. However, teachers typically do not use this information to make inferences about students outside the class. When the use of statistics is limited to description of a sample, that is called a descriptive use of statistics. When instructors engage in descriptive use of statistics, they report their results something like this: "In the sample of students in my classroom, N = 36, M = 85, and SD = 10." Then the instructor stops and makes no statements about larger populations of students beyond the students included in the class.

In scientific studies, however, researchers almost always want to say something about a population of cases beyond the cases included in the study. Here is a simple hypothetical example of an inferential use of statistics. A researcher wants to estimate (make an inference about) population mean length of lizards for the entire population of lizards on an island. Suppose it is not possible to locate every lizard. A biologist captures a sample of N = 25 lizards and finds mean length M = 2 in. The researcher can say, "The mean length in my sample is 2 in." However, the biologist probably wants to say something about the mean length for the population of all lizards on the island.

Two problems must be considered when using information from a sample to make inferences about a population. One issue, discussed earlier, is representativeness of the sample. Is the sample similar to the population of interest? We should be careful not to generalize results from a study to a population that includes many kinds of people that were not included in the study. Now we consider a second problem that arises when using a sample to make inferences about a population: the problem of sampling error. Different samples, drawn from the same population, usually have different sample means. Variation in values of M across different samples from the same population is called sampling error. How much can we believe the mean from any one sample? If M is close to the population mean, it is a good estimate of that population mean; if it is far, then it is not a good estimate. We need to have some idea how far any individual value of M is likely to be from the population mean.

This may appear to be an unanswerable question. How can we say anything about the distance of a sample mean M from the population mean if we don't know the population mean? However, this question can be answered by creating artificial (imaginary) populations of scores for which we do know the population mean, drawing many different samples from those imaginary populations, and examining the distributions of values of M across all of these samples (for example, by setting up a histogram for values of M).
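The thought experiment just described can be sketched as a small simulation. Everything in it is invented for illustration: an imaginary population of 10,000 lizard lengths with mean near 2 in. and SD near 0.5 in., from which 1,000 samples of N = 25 are drawn. The fixed random seed only makes the run repeatable.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

# Imaginary population of lizard lengths (inches) with a known mean.
population = [random.gauss(2.0, 0.5) for _ in range(10_000)]
population_mean = mean(population)

# Draw many samples of N = 25 and record the mean M of each sample.
sample_size = 25
sample_means = [
    mean(random.sample(population, sample_size)) for _ in range(1_000)
]

# How much do values of M vary from sample to sample (sampling error)?
spread_of_means = stdev(sample_means)

print(f"population mean        = {population_mean:.3f}")
print(f"mean of sample means   = {mean(sample_means):.3f}")
print(f"SD of sample means     = {spread_of_means:.3f}")
print(f"smallest, largest M    = {min(sample_means):.3f}, {max(sample_means):.3f}")
```

A histogram of `sample_means` would show most values of M clustered near the population mean, with the spread indicating how far a single sample mean typically falls from it.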
7.2 Notation for Samples Versus Populations
A spurious correlation may occur because of chance or coincidence. Spurious correlations arise because a third variable (sometimes referred to as a "confounded variable" or "lurking variable") is involved. For the example of ice cream sales and homicide, the confounded variable is temperature. In hotter months, homicide rates increase, and ice cream sales increase (Peters, 2013).

However, correlations that seem silly are not always spurious. Some evidence suggests that consumption of diet drinks is related to weight gain (Wootson, 2017). At first glance this may seem silly. Diet drinks have few or no calories; how could they influence body weight? In this case, I'm not sure whether the correlation is spurious or not. On one hand, some artificial sweeteners might alter metabolism in ways that promote weight gain. If that is the case, then this correlation is not spurious: the artificial sweeteners may cause weight gain. On the other hand, it is possible that weight gain increases diet drink consumption (i.e., people who worry about their weight may switch to diet drinks) or that consumption of diet drinks is related to confounded or lurking variables, such as exercise. People who don't exercise may gain weight; if people who don't exercise also consume diet drinks, then consumption of diet drinks will have a spurious (not directly causal) correlation with weight gain.

When correlations are puzzling or difficult to explain, more research is needed to decide whether they might indicate real relationships between variables. If a study were done in which all participants had the same daily calorie consumption, and participants were randomly divided into groups that did and did not consume diet drinks, then if the diet drink group gained more weight, this outcome would be consistent with the hypothesis that diet drinks cause weight gain. Further research would then be needed to figure out a possible mechanism, for example, specific ways in which diet drinks might change metabolism.

As a beginning statistics student, you have not yet learned statistical techniques that can be used to assess more subtle forms of spuriousness. Even using these methods can lead to mistaken judgments about whether a correlation is spurious or not.

When unexpected or odd or even silly correlations arise, researchers should not go through mental gymnastics trying to explain them. In a study of folk culture, a researcher once reported that nations with high milk production (X) also scored high in ornamentation of folk song style (Y). There was a positive correlation between X and Y. The author suggested that the additional protein provided by milk provided the energy to generate more elaborate song style. This is a forced and unlikely explanation. For beginning students in statistics, here is my advice: Be skeptical about what you read; be careful what you say. There are many reasons why sample correlations may not be good estimates of population correlations. Spurious correlations can happen; large correlations sometimes turn up in situations where variables are not really related to each other in any meaningful way. A decision to call a correlation statistically significant can be a Type I error; a decision to call a correlation not significant can be a Type II error. (Later you will learn that correlation values also depend on which other variables are included in the analysis.)

Researchers often select X and Y variables for correlation analysis because they believe X and Y have a meaningful, perhaps causal, association. However, correlation does not provide a rigorous way to test this belief.

When you report correlations, use language that is consistent with their limitations. Avoid using terms such as proof, and do not say that X causes, influences, or determines Y when your data come from nonexperimental research.
10.20 SPSS Example: Relationship Survey
The file called love.sav is used in the following example. Table 10.1 lists the names and characteristics of variables. To obtain a Pearson correlation, the menu selections (from the menu bar above the data worksheet) are Analyze → Correlate → Bivariate, as shown in Figure 10.23. The term bivariate means that each requested correlation involves two (bi means two) variables. These menu selections open the Bivariate Correlations dialog box, shown in Figure 10.24.

The data analyst uses the cursor to highlight the names of at least two variables in the left-hand pane (which lists all the variables in the active data file) for correlations. Then, the user clicks on the arrow button to move selected variable names into the list of variables to be analyzed. In this example, the variables to be correlated are named commit (commitment) and intimacy. Other boxes can be checked to determine whether significance tests are to be displayed and whether two-tailed or one-tailed p values are desired. To run the analyses, click the OK button.
Figure 10.23 Correlations
Menu Selections
for
Bivariate
The image is a screenshot from SPSSthat helps in menu selections for bivariate correlations. The details are below; At thetopofthe spreadsheetarethefollowing menu buttons;file, edit, view,data, transform, analyze, graphs, utilities, extensions, window and help Below these buttons are icon buttonsfor table editing options. On the clicking of the Analyze button,a dropdown menu withthefollowing options has opened; reports, descriptive statistics, Bayesian statistics, tables, compare means, general linear model, generalizedlinear ‘models, mixed models, correlate, regression, loglinear, classify, dimension reduction, scale, non-parametric tests, forecasting, survival, multiple response, simulation, quality control, ROC curve, and spatial and temporal modelling. An arrow next to correlate showsthat this has
been depressed. The following menuoptions have opened; Bivariate,Partial, Distances and Canonical correlation. Bivariate has been indicated by an arrow. The output from this procedure is displayed in
43% Page 264 of 624 - Location 6717 of 15772
Figure 10.25, which shows the value of the
variable with itself is 1 (by definition). Only one of
Pearson correlation (7= +.745), the p value (which
the four cells in Figure 10.25 contains useful
would be reported as p< .001, two tailed), and the
information. If you have only one correlation, it
number of data pairs the correlation was based on
makes sense to reportit in sentence form (“The
(W= 118). The degrees of freedom for this
correlation between commitment and intimacy
correlation are given by V-2, so in this example,
was r(116) = +.745, p < .001, two tailed”). The value in parentheses after r is usually assumed to be the df, unless clearly stated otherwise.
the correlation has 116 df. (A common student mistake is confusion of df with N. You may report either of these as information about sample size, but in this example, note that df = 116 [N − 2] and N = 118.)

Figure 10.24 SPSS Dialog Box for Bivariate Correlations
[Dialog box options shown: Correlation Coefficients (Pearson, Kendall's tau-b, Spearman); Test of Significance (Two-tailed, One-tailed); Flag significant correlations.]

Figure 10.25 Output for One Pearson Correlation

Correlations
                                  intimacy   commit
intimacy   Pearson Correlation    1          .745**
           Sig. (2-tailed)                   .000
           N                      118        118
commit     Pearson Correlation    .745**     1
           Sig. (2-tailed)        .000
           N                      118        118
**. Correlation is significant at the 0.01 level (2-tailed).

Note that the same correlation appears twice in Figure 10.25. The correlation of intimacy with commit (.745) is the same as the correlation of commit with intimacy (.745). The correlation of a

It is possible to run correlations among many pairs of variables. The SPSS Bivariate Correlations dialog box that appears in Figure 10.26 includes a list of five variables: intimacy, commit, passion, length (of relationship), and times (the number of times the person has been in love). If the data analyst enters a list of five variables, as shown in this example, SPSS runs the bivariate correlations among all possible pairs of these five variables (as shown in Figure 10.27). If there are k variables, the number of possible different pairs of variables is given by [k × (k − 1)]/2. In this example with k = 5 variables, (5 × 4)/2 = 10 different correlations are reported in Figure 10.27. Note that because correlation is “symmetrical” (i.e., the correlation between X and Y is the same as the correlation between Y and X), the correlations that appear in the upper right-hand corner of the table in Figure 10.27 are the same as those that appear in the lower left-hand corner. When such tables are presented in journal articles, usually only the correlations in the upper right-hand corner are shown.

Figure 10.26 Bivariate Correlations Dialog Box: Correlations Among All Variables in List
The image is a dialog box that shows how to select bivariate correlations among all variables in the list. On the left is the set of variables that includes gender, genpart, and attach. On the right are the selected variables: intimacy, commit, passion, times, and length. Times has been highlighted. Below this are options to select correlation coefficients, including Pearson, Kendall's tau-b, and Spearman. Pearson has been chosen. The test of significance (one tailed or two tailed) is chosen through radio buttons. Two tailed has been selected. A check box labeled Flag significant correlations has been ticked. On the right is an Options button. At the bottom of the dialog box are buttons for OK, Paste, Reset, Cancel, and Help.

Figure 10.27 Correlation Output: All Variables in List (in Figure 10.26)
[SPSS Correlations table: Pearson correlation, Sig. (2-tailed), and N for every pair of the variables intimacy, commit, passion, length, and times.]
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Figure 10.28 Initial SPSS Syntax Generated by Paste Button
The image is the SPSS syntax generated by the Paste button. At the top of the sheet are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, add-ons, run, tools, window, and help. Below these are icon buttons to open a file, save, print, go back and forward, and other table editing options. On the left, the statements Dataset activate and Correlations are seen. On the right, the editor shows the SPSS commands for correlation that were generated by the user's menu selections and are now pasted here. The statements are as follows:
1. Dataset activate dataset6
2. Correlations
3. /variables = intimacy commit passion length times
4. /print = twotail nosig
5. /missing = pairwise.

Correlations among long lists of variables can generate huge tables, and often researchers want to obtain smaller tables. Suppose a data analyst wants to obtain summary information about the correlations between a set of three outcome variables Y1, Y2, and Y3 (intimacy, commitment, and passion) and two predictor variables X1 and X2 (length and times). To do this, we need to paste and edit SPSS syntax.

Look again at the Bivariate Correlations dialog box in Figure 10.26; there is a button labeled Paste. Clicking the Paste button opens a new window, called a Syntax window, and pastes the SPSS commands (or syntax) for correlation that were generated by the user’s menu selections into this window. The initial SPSS Syntax window appears in Figure 10.28. Syntax can be saved, printed, or edited. It is useful to save syntax to document what you have done or to rerun analyses later. In this example, we will edit the syntax; in Figure 10.29, the SPSS keyword WITH has been placed within the list of variable names so that the list of variables in the CORRELATIONS command now reads, “intimacy commit passion WITH length times.” It does not matter whether the SPSS commands are in uppercase or lowercase; the word WITH appears in uppercase characters in this example to make it easy to see. In this example, each variable in the first list (intimacy, commit, and passion) is correlated with each variable in the second list (length and times). Variables can be grouped by kinds of variables. Length and times are objective information about relationship history that could be thought of as predictors; intimacy, commitment, and passion are subjective ratings of relationship quality that could be thought of as outcomes. This results in a table of six correlations, as shown in Figure 10.30.

Figure 10.29 Edited SPSS Syntax to Obtain Selected Pairs of Correlations
The image is the edited SPSS syntax, shown in the same Syntax window layout as Figure 10.28. The statements are as follows:
1. Dataset activate dataset6
2. Correlations
3. /variables = intimacy commit passion WITH length times
4. /print = twotail nosig
5. /missing = pairwise.
If many variables are included in the list for the bivariate correlation procedure, the resulting table of correlations can be large. It is often useful to set up smaller tables for subsets of correlations that are of interest, using the WITH command to designate which variables should be paired.
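The pair counts discussed in this section can be checked with a few lines of Python. This is only an illustrative sketch, not SPSS syntax: the helper name `n_pairs` is made up for this example, and the two variable lists simply mirror the “intimacy commit passion WITH length times” specification.

```python
from itertools import combinations, product


def n_pairs(k):
    """Number of distinct pairs among k variables: k(k - 1)/2."""
    return k * (k - 1) // 2


variables = ["intimacy", "commit", "passion", "length", "times"]

# All possible pairs, as when one list of five variables is entered.
all_pairs = list(combinations(variables, 2))

# Analogue of "intimacy commit passion WITH length times":
# every variable in the first list is paired with every variable
# in the second list, and no pairs are formed within a list.
outcomes = ["intimacy", "commit", "passion"]
predictors = ["length", "times"]
with_pairs = list(product(outcomes, predictors))

print(n_pairs(len(variables)), len(all_pairs))  # 10 10
print(len(with_pairs))                          # 6
```

Splitting the list this way is what reduces the 10 correlations of the full table to the 6 correlations of interest.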
Note that judgments about the significance of p values indicated by asterisks in SPSS output are not adjusted to correct for the inflated risk for Type I error that arises when large numbers of significance tests are reported. If a researcher wants to control or limit the risk for Type I error, this can be done by using Bonferroni-corrected per-comparison alpha levels to decide which, if any, of the p values reported by SPSS can be judged statistically significant. For example, to hold the EWα level to .05, the PCα level used to test the six correlations in Table 10.5 could be set to α = .05/6 = .008. Using this Bonferroni-corrected PCα level, none of the correlations would be judged statistically significant.

As a reader, you can generally assume that statistical significance assessments have not been corrected for inflated risk for Type I error unless the author explicitly says this was done.

[Table 10.5: correlations of intimacy, commitment, and passion with length of relationship and number of times in love.]
Note: Judgments about statistical significance were based on Bonferroni-corrected per-comparison alphas. To achieve EWα = .05 for this set of six correlations, each correlation was evaluated using PCα = .05/6 = .008. By this criterion, none of the six correlations can be judged statistically significant.
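The Bonferroni decision described here is simple arithmetic, and a short Python sketch may make it concrete. The function name `bonferroni_alpha` is hypothetical; the p values are the two-tailed values reported in Figure 10.30.

```python
def bonferroni_alpha(experimentwise_alpha, n_tests):
    """Per-comparison alpha: experimentwise alpha divided by the number of tests."""
    return experimentwise_alpha / n_tests


# Two-tailed p values as reported in Figure 10.30.
p_values = {
    ("intimacy", "length"): .058,
    ("intimacy", "times"): .934,
    ("commit", "length"): .033,
    ("commit", "times"): .929,
    ("passion", "length"): .432,
    ("passion", "times"): .670,
}

pc_alpha = bonferroni_alpha(.05, len(p_values))

# A correlation is judged significant only if p is below the corrected alpha.
significant = [pair for pair, p in p_values.items() if p < pc_alpha]

print(round(pc_alpha, 3))  # 0.008
print(significant)         # []
```

Note that the commit and length correlation (p = .033) would be flagged by SPSS at the uncorrected .05 level but does not meet the corrected criterion.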
Figure 10.30 SPSS Correlation Output From Edited Syntax in Figure 10.29

Correlations
                                  length     times
intimacy   Pearson Correlation    .175       -.008
           Sig. (2-tailed)        .058       .934
           N                      118        115
commit     Pearson Correlation    .197*      .008
           Sig. (2-tailed)        .033       .929
           N                      118        115
passion    Pearson Correlation    .074       -.041
           Sig. (2-tailed)        .432       .670
           N                      114        111
*. Correlation is significant at the 0.05 level (2-tailed).

Note: Variables in the first list (intimacy, commitment, passion) are correlated with variables in the second list (length of present dating relationship, number of times in love). The p values in Figure 10.30 were not corrected for inflated risk of Type I error.

10.21 Results Sections for One and Several Pearson’s r Values
Following is an example of a “Results” section that presents the results of one correlation analysis.

Results
A Pearson correlation was performed to assess whether levels of intimacy in dating relationships could be predicted from levels of commitment on a self-report survey administered to 118 college students currently involved in dating relationships. Commitment and intimacy scores were obtained by summing items on two of the scales from Sternberg’s (1997) Triangular Love Scale; the range of possible scores was from 15 (low levels of commitment or intimacy) to 75 (high levels of commitment or intimacy). Examination of histograms indicated that both variables had negatively skewed distributions. Scores tended to be
Table 10.54
high on both variables, possibly because of social desirability response bias and a ceiling effect (most participants reported very
positive evaluations of their relationships). Skewness was not judged severe enough to require data transformation or removal of outliers.

The scatterplot of intimacy with commitment showed a positive linear relationship. There was one bivariate outlier with unusually low scores for both intimacy and commitment; this outlier was retained. The correlation between intimacy and commitment was statistically significant, r(116) = +.75, p < .001 (two tailed). The r² was .56; about 56% of the variance in intimacy could be predicted from levels of commitment. This is a strong effect. The 95% CI was [.659, .819]. This relationship remained strong and statistically significant, r(108) = +.64, p < .001, two tailed, even when outliers with scores less than 56 on intimacy and 49 on commitment were removed from the sample.

number of times the participant has been in love, on the basis of a self-report survey administered to 118 college students currently involved in dating relationships. Intimacy, commitment, and passion scores were obtained by summing items on scales from Sternberg’s (1997) Triangular Love Scale; the range of possible scores was 15 to 75 on each of the three scales. Examination of histograms indicated that the distribution shapes were not close to normal for any of these variables; distributions of scores were negatively skewed for intimacy, commitment, and passion. Most scores were near the high end of the scale, which indicated the existence of ceiling effects, and there were a few isolated outliers at the low ends of the scales. Skewness was not judged severe enough to require data transformation or removal of outliers.

Scatterplots suggested that relationships between pairs of variables were (weakly) linear. The six Pearson correlations are
Note that if SPSS reports p = .000, report this as p
When the dialog box for the one-sample t test appears, as in Figure 7.12, move the name of the variable of interest into the list of variables to be analyzed. Leave the box “Test Value” containing the default value of 0. Then click OK.

Figure 7.11 Descriptive Statistics for Temperature in Fahrenheit in shoemaker.sav

Statistics
temp_Fahrenheit
N    Valid             130
     Missing           0
Mean                   98.254
Std. Error of Mean     .0667
Std. Deviation         .7603

temperature data collected through smart phone crowdsourcing is reported by Hausman et al. (2018).

Values of N, M, and SD for the Fahrenheit temperature scores in the file shoemaker.sav were obtained using the SPSS frequencies procedure (menu selections are not repeated from earlier chapters). Results appear in Figure 7.11. The first thing to notice is that the sample mean in Figure 7.11, M = 98.25, is lower than the population mean that people generally believe (98.6). The difference is (98.25 − 98.6) = −.35. This sample mean is about a third of a degree lower than the generally accepted value. (Note that if you look up Shoemaker's article, numerical values
amount of water. However, high-income households may include people who live in small but expensive apartments (who may not use very much water) and people with huge estates (who fill swimming pools and water vast lawns). The vertical arrows indicate the approximate range or variance of scores for low- versus high-income groups. Variance in water use is much greater for high values of income than low values of income. In this situation, Pearson’s r would not be a complete description; information about the differences in variances in amount of water use for different levels of income would also be needed.

Figure 11.3 Hypothetical Data: Water Use Plotted by Household Income Showing Heteroscedasticity
The image is a scatterplot that displays heteroscedasticity. The X axis denotes household income and the Y axis the gallons of water used. There are several scatter dots spread all over the graph area, and the distance between them increases as they move from left to right. Two areas of the scatter dots have been circled. The first is at the bottom left and is labeled small variance; the second is at the top right and is labeled large variance. The small variance section shows less gap between the data points, and the large variance section has large gaps between the data points.

Violation of this assumption would result in larger prediction errors in the outcome variable (water use) for people with high incomes than for people with low incomes.

11.7 Formulas for Bivariate Regression Coefficients
We want to find values of $b_0$ and b that give us the equation that falls as close as possible to all the points in the scatterplot. Another way to say this is that we want coefficients that minimize the prediction errors. The prediction error or residual for each case is just the difference between each person's actual value of Y and the value of Y predicted using the regression line:

(11.4)
$$\text{Prediction error for person } i = (Y_i - Y'_i).$$

If Ann's actual salary Y is $40,000, and her predicted salary, $Y'$, is $43,000, the prediction error is (40,000 − 43,000) = −3,000. Her actual salary is $3,000 lower than the salary predicted for her using the regression line.

The “best” regression equation is the one that provides predicted values ($Y'$) that are as close as possible to the actual Y. We need to summarize information about the magnitude of prediction errors across all persons in the sample. Can we just add up the prediction errors? No. I will tell you (without proof) that the sum of the prediction errors across all cases, $\sum (Y_i - Y'_i)$, always equals 0. This should not surprise you; you have seen that deviations often sum to zero. Therefore, adding prediction errors won't provide useful information. We encountered the same problem when we wanted to compute a sample variance, because the sum of deviations of X scores from the sample mean was also 0 in that situation. Recall that the “bag of tricks” in statistics is fairly small. The solution that was used to compute the variance of X was to square each deviation and then sum the squared deviations. The same trick is used here. We compute SSE (sum of squared prediction errors) as follows:

(11.5)
$$SSE = \sum \left[(Y_i - Y'_i)^2\right].$$

Equation 11.5 says that we calculate a prediction error for each individual case, square each prediction error, then sum the squared errors for all cases. We want to obtain the values of $b_0$ and b that minimize SSE. The method used by mathematical statisticians to derive the formulas to compute $b_0$ and b is called ordinary least squares (OLS). Many other statistics you'll learn are also based on OLS estimation. Appendix 11B explains how the formulas for $b_0$ and b that yield the minimum SSE (the smallest prediction errors) were obtained.

Fortunately, because optimal formulas for the best values of $b_0$ and b are known, you don’t have to solve this problem every time you want to do a regression. In practice, here is how to calculate the estimated values of b and $b_0$: First, find the means and standard deviations for X and Y and their bivariate correlation $r_{XY}$. (By now, these computations should be familiar.) Once you have the values of $M_X$, $M_Y$, $s_X$, $s_Y$, and $r_{XY}$, the estimate for the raw-score slope b to predict Y from X is:

(11.6)
$$b = r \frac{s_Y}{s_X}.$$

See Appendix 11C for an alternative equation to compute the estimate of b.

Note that the ratio of standard deviations in this equation is $s_Y/s_X$ or, more generally, $s_{\text{dependent}}/s_{\text{independent}}$. To understand this ratio, you can think of it this way. When you want to predict Y scores from X scores:
* First you need to divide X scores by $s_X$ to take them out of the X score units of measurement.
* You multiply that result by r; this provides information about the sign and strength of linear association between X and Y scores.
* Then you multiply by $s_Y$ to convert the result into units of the Y outcome variable.

We also need to adjust predictions to take into account the differences between the means of X and Y. The intercept $b_0$ (the predicted value of Y when X = 0, or the point on the graph where the regression line crosses the Y axis) can be computed from the means of X and Y, and the raw-score slope b, as follows:

(11.7)
$$b_0 = M_Y - b \times M_X,$$

where $M_Y$ is the mean of the Y scores in the sample and $M_X$ is the mean of the X scores in the sample. Including $b_0$ in a regression equation adjusts for differences between the means of X and Y.
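The computations in Equations 11.4 through 11.7 can be sketched in Python. This is a minimal illustration, not part of the SPSS workflow: the helper names and the six data points are hypothetical (they are not the salary.sav data), and the sample standard deviation is computed with N − 1 in the denominator.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / sqrt(sxx * syy)


def regression_coefficients(x, y):
    """Raw-score slope and intercept from Equations 11.6 and 11.7."""
    b = pearson_r(x, y) * sample_sd(y) / sample_sd(x)  # b = r * (sY / sX)
    b0 = mean(y) - b * mean(x)                         # b0 = MY - b * MX
    return b0, b


# Hypothetical data: years of experience and salary in dollars.
years = [1, 3, 5, 7, 9, 11]
salary = [35000, 38000, 47000, 50000, 61000, 60000]

b0, b = regression_coefficients(years, salary)

# Prediction errors (Equation 11.4) and their sum of squares (Equation 11.5).
residuals = [y - (b0 + b * x) for x, y in zip(years, salary)]
sse = sum(e ** 2 for e in residuals)

print(b0, b)
print(sum(residuals))  # zero apart from floating-point rounding
print(sse)
```

Changing b or b0 slightly in either direction and recomputing SSE gives a larger value, which is what it means for these coefficients to be the least squares solution.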
Equation 11.6 makes it clear that b is essentially a rescaled version of r. Unlike the unit-free Pearson’s r, which has a range from −1 to +1, the range of b depends on the units in which X and Y are measured.

Notice several implications of Equation 11.6:
* If r equals 0, then b also equals 0.
* The sign of b is determined by the sign of r.
* The magnitude of the raw-score b coefficient depends on the standard deviations of both variables. Values of b can range from extremely small into the thousands and above.
* As $s_Y$ (the standard deviation of the outcome variable) increases, holding $s_X$ constant, b increases.
* On the other hand, as $s_X$ (the standard deviation of the predictor variable) decreases, holding $s_Y$ constant, b also increases.
* Because b is a rescaled version of r, factors that can inflate or deflate r (discussed in Appendix 10D) can also influence the magnitude of b.

The β (beta coefficient) to predict $z_Y$ from $z_X$ is β = r. The standard-score (z-score) version of the regression equation does not require an intercept to adjust for means of the variables, because z scores have means of 0. Because β = r, factors that can inflate or deflate r also influence the magnitude of β. Because the magnitudes of b and β are influenced by many of the same problems that can make sample rs poor estimates of the true population correlation ρ, comparisons of values of b or β (across samples or predictor variables) should only be made with great caution.

11.8 Statistical Significance Tests for Bivariate Regression
The null hypothesis for a regression with one predictor variable can be stated as follows: $H_0$: b = 0. (We can also test the null hypothesis $H_0$: $b_0$ = 0, but this is usually not of interest.)

When you run the SPSS regression procedure, you will obtain a t ratio to test this null hypothesis. As in earlier situations, this t test has the following form:

(11.8)
$$t = \frac{\text{Sample statistic} - \text{Hypothesized parameter}}{SE_{\text{sample statistic}}} = \frac{b - 0}{SE_b},$$

with (N − 2) df. An estimate of $SE_b$ is needed to set up the t ratio. SPSS provides this estimate; if you need to calculate it by hand, the formula is as follows:

(11.9)
$$SE_b = \sqrt{\frac{\sum (Y - Y')^2 / (N - 2)}{\sum (X - M_X)^2}}.$$

This t ratio is evaluated relative to a t distribution with N − 2 df, where N is the number of cases. SPSS provides this t test along with a two-tailed p value. SPSS also provides a test of whether the intercept $b_0$ equals zero; however, this test is rarely of interest and usually is not reported.
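For readers who want to see the hand calculation, the following Python sketch computes the standard error of the slope and the t ratio from raw scores. It is an illustration under stated assumptions: the function name `slope_t_test` and the data are hypothetical, the standard error uses the usual OLS form (mean squared residual divided by the sum of squared X deviations, then the square root), and the slope is computed as the ratio of summed cross-products to summed squared X deviations, which is algebraically the same as r multiplied by the ratio of standard deviations.

```python
from math import sqrt


def slope_t_test(x, y):
    """Return b, SE_b, t, and df for a bivariate regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squared X deviations
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    b0 = my - b * mx
    sse = sum((c - (b0 + b * a)) ** 2 for a, c in zip(x, y))
    se_b = sqrt((sse / (n - 2)) / sxx)                    # standard error of the slope
    t = (b - 0) / se_b                                    # hypothesized slope is 0
    return b, se_b, t, n - 2


# Hypothetical data: years of experience and salary in dollars.
years = [1, 3, 5, 7, 9, 11]
salary = [35000, 38000, 47000, 50000, 61000, 60000]

b, se_b, t, df = slope_t_test(years, salary)
print(b, se_b, t, df)
```

The resulting t would be compared with the critical value from a t distribution with N − 2 df, here 4 df.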
11.9 Confidence Intervals for Regression Coefficients
Using $SE_b$, the upper and lower limits of the 95% CI around b can be calculated using the usual formula:

(11.10)
$$\text{Lower limit} = b - t_{\text{crit}} \times SE_b,$$

(11.11)
$$\text{Upper limit} = b + t_{\text{crit}} \times SE_b,$$

where $t_{\text{crit}}$ is the critical value of t that separates the bottom 2.5%, the middle 95%, and the top 2.5% of the area in a t distribution with N − 2 df. SPSS provides the upper and lower limits of the 95% CI for the b coefficient.

11.10 Effect Size and Statistical Power
For a bivariate regression, the assessment of statistical power can be done using the same statistical power tables as those used for Pearson’s r. In general, if the researcher assumes that the strength of the squared correlation between X and Y in the population is weak, the number of cases required to have power of .80 or higher is rather large. Tabachnick and Fidell (2018) suggested that the ratio of cases (N) to number of predictor variables (k) should be on the order of N > 50 + 8k or N > 104 + k (whichever is larger) for regression analysis. This implies that N should be at least 105 when using one predictor variable. This is consistent with sample size suggestions from Schönbrodt (2011), discussed in Chapter 10. Even if statistical power tables may suggest that N < 100 can give adequate statistical power for significance tests of b and r, it is preferable to have N > 100.

11.11 Empirical Example Using SPSS: Salary Data
A small set of hypothetical data for the empirical example is given in the SPSS file salary.sav (with N = 50 cases). The predictor, X, is the number of years employed at a company; the outcome, Y, is the annual salary in dollars. The research question is whether salary changes in a systematic (linear) way as years of job experience or job seniority increases. In other words, how many dollars more salary can an individual expect to earn for each additional year of employment?

To run a bivariate linear regression, make the following menu selections: <Analyze> → <Regression> → <Linear> (as shown in Figure 11.4). This opens the main dialog box for the SPSS Linear Regression procedure, which appears in Figure 11.5. The name of the dependent variable (salary) and the name of the predictor variable (years) were moved into the panes marked “Dependent” and “Independent(s)” in this main dialog box. It is possible at this point to click OK and run a regression analysis; however, for this example, the following additional selections were made. The Statistics button was clicked to open the Linear Regression: Statistics dialog box; a checkbox in this window was marked to request values of the 95% CI for b, the slope to predict raw scores on Y from raw scores on X (Figure 11.6).

Figure 11.4 SPSS Menu Selections for Linear Regression
The Residuals section is below this. Here there are check options for Durbin-Watson and casewise diagnostics. Both have been left unmarked. At the bottom are option buttons for continue, cancel, and help.

11.12 SPSS Output: Salary Data
To see the equivalence between Pearson's r and parts of the results of the bivariate regression, Pearson's r between years and salary was obtained using the SPSS correlations procedure; results appear in Figure 11.7.

Complete SPSS regression output includes additional information (discussed in Volume II [Warner, 2020]). Figure 11.8 shows the results needed to find the proportion of predicted and unpredicted variance (R² and 1 − R²) and to write out the two versions of the regression equations (raw score and standardized). From the top of Figure 11.8, the proportion of variance in salary that can be predicted from years of experience is r² or R², that is, .688 or about 69%. When regression includes more than one predictor, multiple R tells us how well the entire set of predictor variables can predict Y. In this example, the regression equation has only one predictor. When there is only one predictor variable, Pearson’s r between X and Y is the same as multiple R for the equation that uses X to predict Y. (You can ignore the other information in the top panel of Figure 11.8 for now. The standard error of the estimate is discussed later in the chapter and is not usually included in research reports. The adjusted R² value is only used when a regression has more than one predictor variable.)

Figure 11.7 Pearson's r for Years and Salary

Correlations
                                                        Unstandardized
                                                        Predicted Value   salary
Unstandardized Predicted Value   Pearson Correlation    1                 .830**
                                 Sig. (2-tailed)                          .000
                                 N                      50                50
salary                           Pearson Correlation    .830**            1
                                 Sig. (2-tailed)        .000
                                 N                      50                50
**. Correlation is significant at the 0.01 level (2-tailed).

Figure 11.8 Selected SPSS Linear Regression Output: Prediction of Salary From Years at Work

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .830(a)   .688       .682                10407.343
a. Predictors: (Constant), years
b. Dependent Variable: salary

Coefficients (dependent variable: salary)
* (Constant): B = 31416.72; 95% confidence interval for B, lower bound = 26528.786, upper bound = 36304.646
* years: B = 2829.572, Std. Error = 274.838, Beta = .830, t = 10.295, Sig. = .000; 95% confidence interval for B, lower bound = 2276.972, upper bound = 3382.171

Table 11.1
[Relabeled coefficient table: intercept $b_0$ = 31,416.72, 95% CI [26,528.79, 36,304.65]; slope b = 2,829.57, 95% CI [2,276.97, 3,382.17].]

Table 11.1 relabels and rearranges the elements of the coefficient table in the SPSS output so that you can relate them to terms in the textbook. The top panel of the SPSS output in Figure 11.7 gives results for R (capital R is called multiple R).

On the basis of information in Table 11.1 we can write the unstandardized regression equation to predict salary in dollars from experience in years, as follows:

$$Y' = 31{,}416.72 + 2{,}829.57 \times \text{years}.$$
example, 0 is a possible value for years of experience. In this example, $31,416.72 represents starting salary. The slope b tells us the predicted increase in salary (in dollars) for each additional one-unit increase in experience; this is $2,829.57. This corresponds to the average salary raise per year. The value of the beta (β) coefficient also appears in Figure 11.8. SPSS denotes the column for the standardized coefficients with the word Beta. In this example, β = .83. It is not just a coincidence that β = r = R. This always happens when regression includes only one predictor variable.
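The values reported in Figure 11.8 can be tied back to the formulas for the t ratio and the 95% CI with a short Python check. This is only a sketch: the constant names are made up for the example, and the critical t of 2.0106 for 48 df is an assumed table value, not something reported in the SPSS output.

```python
# Values taken from the SPSS output in Figure 11.8.
B0 = 31416.72     # intercept (predicted salary at 0 years)
B = 2829.572      # raw-score slope (dollars per year)
SE_B = 274.838    # standard error of the slope
N = 50

# Assumption: two-tailed critical t for alpha = .05 with N - 2 = 48 df,
# taken from a t table rather than from the SPSS output.
T_CRIT_DF48 = 2.0106


def predicted_salary(years):
    """Unstandardized regression equation: Y' = b0 + b * years."""
    return B0 + B * years


t_ratio = (B - 0) / SE_B
lower = B - T_CRIT_DF48 * SE_B
upper = B + T_CRIT_DF48 * SE_B

print(predicted_salary(0), predicted_salary(10))
print(round(t_ratio, 3))  # 10.295, matching the t reported by SPSS
print(lower, upper)       # close to the SPSS limits 2276.972 and 3382.171
```

The small discrepancy in the CI limits comes from rounding the critical value of t to four decimal places.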
We can use the value of β to write the following standardized regression equation:
.05. (However, keep in mind that some readers and reviewers continue to think that way.)
Comprehension Questions
1. Rerun the analysis for the carspeed.sav data and compare results across analyses (the chapter reports results for 95% CI and α = .05, two tailed).
   1. Use α = .01, two tailed, and a 99% CI.
   2. Use α = .10, two tailed, and a 90% CI.
   As α increases, does it become easier or more difficult to reject H0?
2. Using the data in shoemakertemp.sav, test the null hypothesis H0: μ = 98.6°F using a nondirectional alternative hypothesis. Conduct a one-sample t test using α = .05, two tailed, as the criterion for significance. Also obtain the 95% CI. Can you reject H0: μ = 98.6°F? How does this result compare with the CI obtained for the same data in the previous chapter?
3. What is a null hypothesis?
4. Describe three possible alternative hypotheses.
5. What is an alpha level? What determines the value of α?
6. Sketch reject regions for each of the following situations:
   [Figure: panels labeled by situation (Situation 1, Situation 2, ..., Situation 6).]
7. What is the difference between a directional and a nondirectional significance test?
8. Other factors being equal, which type of significance test requires a value of t that is larger (in absolute value) to reject H0: a directional or a nondirectional test?
9. When a researcher reports a p value, p stands for “probability” or risk. What probability or risk does this refer to?
10. Do we typically want p to be large or small?
11. What is the conventional standard for an “acceptably small” p value?

Digital Resources
Find free study tools to support your learning, including eFlashcards, data sets, and web resources, on the accompanying website at
(−2.754)²/[(−2.754)² + 18] = 7.585/(7.585 + 18) = .296 ≈ .30.

About 30% of the variance in heart rate in this study was predictable from caffeine dose.

To obtain r_pb:
Take the square root of η²;

To obtain Cohen's d:
First find s_p from s1 and s2. When n1 = n2, we can use Equation 12.12 (if ns are not equal, use Equation 12.11):

s_p² = (s1² + s2²)/2 = (7.208² + 9.085²)/2 = (51.955 + 82.537)/2 = 134.492/2 = 67.246.

To obtain s_p, take the square root of s_p²: √67.246 = 8.200. Then d = (M1 − M2)/s_p = −10.1/8.200 = −1.23.

This value of d tells us that the mean of the no-caffeine group was 1.23 standard deviations lower than the mean of the caffeine group (and the mean of the caffeine group was 1.23 standard deviations higher than the mean of the no-caffeine group).

Using Cohen's standards to evaluate effect size in Table 12.1, all these values are judged to be large to very large effect sizes.

12.10.6 Summary of Effect Sizes
Table 12.1
Table 12.1 summarizes the characteristics of these effect sizes. Effect size values do not depend on N. By comparison, the magnitude of the independent-samples t ratio does depend on N. If other factors are held constant, as N increases, t also increases in absolute magnitude. In a few respects, t is similar to some effect sizes: it is unit free or standardized and not in the original units of measurement; it has a sign that indicates the direction of the relationship (which group mean is higher). By itself, t cannot be interpreted as a proportion of variance; however, t and df can be converted into η², which does provide information about proportion of variance. A t ratio does not have a limited range of possible values. Neither a t ratio nor its accompanying p value provides information about effect size.

* Researchers report t, df, and p as information about statistical significance; these numbers do not tell us anything about effect size. On the basis of t and p, we make judgments only about statistical significance (and not about significance or importance in practical, clinical, or real-world domains).
* Researchers should also report one or more of the effect sizes listed above as information about strength or size of effect (independent of sample size). Kirk (1996) suggested that we can interpret these values in terms of clinical or practical or real-world “significance.” Unfortunately both researchers and research consumers sometimes confuse statistical significance (p < .05) with practical, clinical, or real-world “significance.” I prefer to speak of practical, clinical, or real-world importance (and avoid use of the potentially confusing term significance).
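The effect size conversions worked through for the caffeine example (η² from t and df, then r_pb and Cohen's d) can be reproduced with a brief Python sketch. The function names are hypothetical, and the pooled standard deviation shown is the equal-n version only; the unequal-n formula is not sketched here.

```python
from math import sqrt


def eta_squared(t, df):
    """Proportion of variance: t squared / (t squared + df)."""
    return t ** 2 / (t ** 2 + df)


def pooled_sd_equal_n(s1, s2):
    """Pooled standard deviation when the two groups have equal n."""
    return sqrt((s1 ** 2 + s2 ** 2) / 2)


def cohens_d(mean_difference, pooled_sd):
    return mean_difference / pooled_sd


# Values from the caffeine and heart rate example in the text.
eta2 = eta_squared(-2.754, 18)
r_pb_magnitude = sqrt(eta2)  # magnitude only; the sign follows the direction of the difference
s_p = pooled_sd_equal_n(7.208, 9.085)
d = cohens_d(-10.1, s_p)

print(round(eta2, 2))  # 0.3
print(round(s_p, 3))   # 8.2
print(round(d, 2))     # -1.23
```

Because η² is computed from a squared t, it carries no sign; d keeps the sign of the mean difference.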
In original units of measurement? Standardized/unit free? Dependent on N? Sign that indicates direction of difference? Fixed range of possible values?
Interpretas proportion ofvariance? Interpret as information about strength of association, independent of sample size?
fes No No Yes No
No Yes, if meaningful units
EPC Yes No No Otol
Yes Yes
Yes Yes No No scanbe Yes assigned tor Nobutd>t in absolute value is in research No = xs Yes
Note: Most effect sizes can be converted into other effect sizes, but additional information is often required.
a. Unequal ns in the groups restrict the range of possible values for rpb. The greater the inequality of ns, the smaller the possible absolute value of rpb.
We can also ask whether a finding has theoretical value or importance. If variable X accounts for more than 50% of the variance in a Y outcome, we might decide that variable X should be included in our theory about what causes Y. On the other hand, if variable X can account for only 1% of the variance in Y (even if it is a “statistically significant” predictor of Y), we would want to include more useful explanatory variables in a theory that attempts to explain Y. There is no clear cutoff for a minimum proportion of explained variance. Cohen (1988) suggested guidelines for interpretations of effect sizes; Table 12.2 summarizes these labels. You may want to compare this with Table 10.3 in Chapter 10, which includes some additional information about the way effect sizes are related to whether effects are detectable in everyday life. These labels are based on recommendations made by Cohen for the evaluation of effect sizes in social and behavioral research; however, in other research domains, it might make sense to use different labels for effect size (that is, require larger values of rpb and other effect sizes before calling them “medium” or “large” effects). Effect size guidelines suggested by Cohen differ slightly when given in terms of different effect size indexes.

Table 12.2
Verbal label         d      r      r²
Very large effect    2.0    .707   .500
                     1.5    .600   .360
                     1.2    .514   .265
                     1.0    .447   .200
                     .9     .410   .168
Large effect         .8     .371   .138
                     .7     .330   .109
                     .6     .287   .083
Medium effect        .5     .243   .059
                     .4     .196   .038
                     .3     .148   .022
Small effect         .2     .100   .010
                     .1     .050   .002
No effect            .0     .000   .000
Source: Adapted from Cohen (1988).
Note: The cutoff points for verbal labels are approximate. For d > .50, effects may be detectable in everyday life (for instance, the sex difference in height, with d = 2.00, is something people notice in everyday life).

Effect sizes have three major uses:
1. At least one index of effect size should be reported with every statistical significance test. For the independent-samples t test it is common to report η², rpb, or Cohen's d. When the dependent variable is measured in meaningful units, discussion should also focus on the M1 − M2 difference as a way to
think about the clinical or practical or real-world importance of the finding.
2. When you plan future research, you can use effect sizes from past research to estimate the minimum sample size you need to have adequate statistical power in your planned study. This is called statistical power analysis. Usually people want to have at least 80% power (i.e., approximately 80% chance of obtaining a statistically significant outcome for the guessed value of population effect size, such as η²). When a study has such small ns that there is a very low probability of obtaining a statistically significant outcome given the population effect size, it is called underpowered.
3. When an author summarizes past research, he or she obtains and combines (averages) effect size information for each of dozens or hundreds of studies. This is called meta-analysis. For example, we might want to know whether mean depression after therapy for patients differs across numerous studies that compare client-centered therapy (treatment) with no therapy (control). An effect size such as Cohen's d or rpb provides important information about the direction of difference (there might be a few studies in which mean depression was lower for the no-therapy group). If past studies have not reported effect sizes, effect sizes can almost always be obtained from other numerical results in the papers. In meta-analysis, it is important to include direction of effect.
12.11 Factors that Influence the Size of t
12.11.1 Effect Size and N
Formulas for statistical significance tests such as the independent-samples t test can be written in a way that makes it clear that the t test combines information about effect size and sample size or N or df (Rosenthal & Rosnow, 1991). In words:
(12.22) Magnitude of t-test ratio = Effect size × Sample size of study.
If effect size is held constant, the expected magnitude of t increases as N increases. If N is held constant, the expected magnitude of t increases as effect size increases. With a little bit of thought it should be clear that:
When effect size and N are both very large, the value of t will almost always be large enough to judge the outcome statistically significant (and values of p will be very small).
When effect size and N are both extremely small, the value of t will almost always be too small to judge the outcome statistically significant (and values of p will be large).
In practice, when effect size is very small, you need a larger N to have a reasonable chance of obtaining a statistically significant outcome. When the effect size is very large, you may be able to obtain a statistically significant outcome using quite a small sample. A specific formula for the independent-samples t test given by Rosenthal and Rosnow (1991) is:
(12.23)
$$t = d \times \frac{\sqrt{df}}{2}$$
where d is Cohen's d, calculated as:
$$d = \frac{M_1 - M_2}{s_p}$$
If we substitute the formula for Cohen's d into Equation 12.23, we have:
(12.24)
$$t = \frac{M_1 - M_2}{s_p} \times \frac{\sqrt{df}}{2}$$
The specific values of t that occur in studies will vary because of sampling error. This equation tells us that if we hold other terms in the equation constant:
• As df (sample size) goes up, t tends to increase (and p tends to become smaller).
• As (M1 − M2) goes up, t tends to increase (and p tends to become smaller).
On the other hand:
• As sp goes up, t tends to decrease (and p tends to increase).
Notice an important implication of Equation 12.24. Even when effect sizes such as (M1 − M2) or Cohen's d are extremely small, as long as they do not turn out to be exactly zero in your sample, you can judge even very small mean differences statistically significant for larger values of N. You cannot use Equation 12.24 to predict your outcome value of t exactly from sample size and effect size, because this equation doesn't take sampling error into account, and we don't know population effect size. However, you can substitute different values into Equation 12.24 to get a sense of how increase in sample size makes it possible to detect very small effect sizes (i.e., judge them to be statistically significant).
For instance, research that compares mean IQ for single-birth children (Group 1) with mean IQ for identical twins (Group 2) yields sample means of about M1 = 100 and M2 = 99. (A 1-point difference in IQ is not noticeable in everyday life; you might notice IQ score differences of 20 or 30 points.) For most IQ tests, s = 15. Using Equation 12.24, we can compare potential differences in outcomes for a study with df = 100 versus a study with df = 10,000. With df = 100, the 1-point mean IQ difference is unlikely to yield a t value large enough to be statistically significant. When df = 10,000, the obtained t ratio is likely to be large enough to judge this 1-point difference statistically significant. (The t values are not exact; this equation does not take sampling error into account.)
$$t = \frac{100 - 99}{15} \times \frac{\sqrt{100}}{2} = .0667 \times 5 = .3335$$
$$t = \frac{100 - 99}{15} \times \frac{\sqrt{10{,}000}}{2} = .0667 \times 50 = 3.335$$
In practice, researchers sometimes can control sample size; sometimes they can control the magnitude of the other two elements in Equation 12.24. Decisions about “dosage level” or type of treatment often can increase the M1 − M2 difference. Decisions about the kinds of people to include in the study and the degree of standardization of data collection situations can influence the magnitude of sp, the within-group standard deviation.
Researchers do not always have control over sample size. Sometimes researchers do not have
funds to pay participants, treatments or data collection procedures are very costly, or the study has to be completed in a very short period of time. When a researcher knows that the sample cannot be large, he or she needs to think about ways to increase the (M1 − M2) difference and/or decrease sp.
On the other hand, sometimes the results of large-N studies are reported in misleading ways. When N is very large, an effect can be judged statistically significant even when the effect size is too small to be of any real life or clinical or practical importance. Consider the twin versus individual child IQ study again. When the difference between mean IQs is tested in samples of 10,000 or more, it is almost always statistically significant. However, this difference could be deemed too small to be of any practical or clinical importance.
Unfortunately, researchers who conduct large-N studies and obtain p values < .001 sometimes call their results “extremely significant.” (Do not say that!) Here's the problem. In everyday life, when we use the word significant, we mean large or worthy of notice (or at the very least detectable). When we hear the word significant we tend to assume that differences between groups are large enough to matter to people and clinicians. Calling the results of a study “highly significant” can mislead many readers into thinking that the effects are large enough to be valuable or at least noticeable in real life.
Statistical significance and practical or clinical importance do not always go together, particularly when N is extremely large. Here's how to avoid confusion:
• Emphasize effect sizes in reports (instead of statistical significance tests).
• Explain effect sizes clearly and evaluate them honestly.
• Discuss simple information such as M1 − M2 when units of measurement are meaningful.
• Never say “extremely significant.”
12.11.2 Dosage Levels for Treatment, or Magnitudes of Differences for Participant Characteristics, Between Groups
The value of M1 − M2 can be affected by design decisions that involve the types of groups, types of treatment, or dosages of treatment for the two groups. Consider these two hypothetical studies of caffeine effects on heart rate:
Study A: Group 1 receives 0 mg caffeine, Group 2 receives 50 mg caffeine
Study B: Group 1 receives 0 mg caffeine, Group 2 receives 500 mg caffeine
Assuming that caffeine does have an effect on heart rate, we would expect the means for heart rate to be much farther apart in Study B than in Study A. By increasing the difference between treatment dosage amounts, researchers can often increase M1 − M2 and, therefore, other factors being equal, increase t.
Studies of naturally occurring groups can also be thought of in these terms. Suppose you want to study age group (X) differences in mean reaction time (Y).
Study A: Group 1 is ages 20-29, Group 2 is ages 30-39
Study B: Group 1 is ages 20-29, Group 2 is ages 70-79
Other factors being equal, you would expect mean reaction times to differ much more in Study B than in Study A.
Researchers must be very careful about something else that can influence the magnitude of the M1 − M2 difference: confounds of other variables with type or dosage of treatment. In the 0 mg caffeine versus 150 mg caffeine study, if the people in the 0 mg caffeine group have heart rate measured in a very relaxing setting, while those in the 150 mg group are assessed in a stressful setting, there is a complete confound between stress and caffeine dosage. Whether it is statistically significant or not, we cannot interpret a large M1 − M2 difference as information about the effects of caffeine. Some or all heart rate differences might be due to the amount of stress in the situation. In this example, a confound of high stress with high caffeine would make the M1 − M2 difference larger. Some confounds may make an M1 − M2 difference smaller (for example, if heart rate was measured by a nasty and threatening experimenter in the 0 mg caffeine group and by a relaxed and friendly experimenter in the 150 mg caffeine group, the effects of caffeine and the confound might cancel each other out and lead to a small M1 − M2 difference). The presence of one or more confounds makes an M1 − M2 difference, and the t ratio based on that difference, uninterpretable.
12.11.3 Control of Within-Group Error Variance
Researcher decisions can also influence sp, the pooled or averaged within-group standard deviation. The within-group standard deviation sp is often called experimental error. Experimental error tends to be large in drug studies where participants within each treatment group differ from one another on characteristics such as age, anxiety, history of drug use, and so forth. Experimental error is also large if participants within the same treatment groups are tested in different ways in different situations. Consider the caffeine/heart rate study again: Group 1 receives no caffeine, and Group 2 receives 150 mg caffeine. Now consider these different scenarios.
Study A: Participants within both groups are very similar in age, health, and amount of past caffeine consumption; all are nonsmokers; all have average fitness; none are evaluated during midterms or final exams; and none are tested by an anxious experimenter.
Study B: Participants within both groups vary in age, health, and amount of past caffeine consumption; some smoke, some do not; they have varying levels of aerobic fitness; some are tested during midterms and finals, others before spring break; and several different experimenters interact with the participants, some of whom are much more anxious than others.
In Study A, if participant characteristics are very similar or homogeneous, and experimental procedures are standardized and consistent, participants in each group should not show much variation in heart rates. Thus, in Study A, sp should be relatively small. On the other hand, in Study B, people who are in the same treatment group have different health backgrounds and are tested under different circumstances; you would expect wide variation in their heart rates. In Study
B, sp would be relatively large. If other factors (effect size and N) are held constant, there would be a better chance of obtaining a large t value for Study A than for Study B. Recruiting similar participants can help with statistical power, but it also reduces generalizability of findings. The participants in Study A are not diverse.
12.11.4 Summary for Design Decisions
Members of my undergraduate class became upset when I explained the way research design decisions can affect the values of t. They said, “You mean you can make a study turn out any way you want?” The answer is, within some limits, yes. The independent-samples t test is likely to be large for these situations and decisions. (For each factor, such as N, add the condition “other factors being equal.”)
• N is large (a very large N study can yield a statistically significant t ratio even if the population effect is very small).
• Population effect size such as η² is large (this is often related to treatment dosages or types of participants being compared).
• M1 − M2 is large (however, M1 − M2 is not interpretable if confounds are present).
• sp is small (this happens when participant characteristics and assessment situations are homogeneous within groups).
Depending on their research questions and resources, the degree to which researchers can control each of these factors may vary.
12.12 Results Section
Following is an example of a “Results” section for the study of the effect of caffeine consumption on heart rate.
Results
An independent-samples t test was performed to assess whether mean heart rate differed significantly for a group of 10 participants who consumed no caffeine (Group 1) compared with a group of 10 participants who consumed 150 mg of caffeine. Preliminary data screening indicated that scores on heart rate were reasonably normally distributed within groups. There were two high-end outliers in Group 1, but they were not extreme; outliers were retained in the analysis. The mean heart rates differed significantly, t(18) = −2.75, p = .013, two tailed. Mean heart rate for the no-caffeine group (M = 57.8, SD = 7.2) was about 10 beats per minute lower than mean heart rate for the caffeine group (M = 67.9, SD = 9.1). The effect size, as indexed by η², was .30; this is a very large effect. The 95% CI for the difference between sample means, M1 − M2, had a lower bound of −17.81 and an upper bound of −2.39. This study suggests that consuming 150 mg of caffeine may significantly increase heart rate, with an increase on the order of 10 bpm.
The assumption of homogeneity of variance was assessed using the Levene test, F = 1.57, p = .226; this indicated no significant violation of the equal variance assumption. Readers generally assume that the equal variances assumed version of the t test (also called the pooled-variances t test) was used unless otherwise stated. If you see df reported to several decimal places, this tells you that the equal variances not assumed t test was used.
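The main numbers in a write-up like this one can be reproduced from the group summary statistics. The following is a minimal sketch, not SPSS output: it uses the unrounded standard deviations from the Cohen's d calculation earlier in the chapter (7.208 and 9.085), the standard conversion η² = t²/(t² + df) (assumed to be what the chapter's η² formula states), and a critical t of 2.101 for df = 18 that is hard-coded from a standard t table because the Python standard library has no t distribution. Variable names are hypothetical.

```python
import math

# Summary statistics for the hypothetical caffeine study
n1, m1, s1 = 10, 57.8, 7.208   # Group 1: no caffeine
n2, m2, s2 = 10, 67.9, 9.085   # Group 2: 150 mg caffeine

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of M1 - M2

t = (m1 - m2) / se_diff            # pooled-variances t ratio
eta_sq = t**2 / (t**2 + df)        # proportion of variance (assumed formula)

# Assumption: two-tailed critical t for alpha = .05, df = 18, from a t table
T_CRIT_DF18 = 2.101
lower = (m1 - m2) - T_CRIT_DF18 * se_diff
upper = (m1 - m2) + T_CRIT_DF18 * se_diff

print(f"t({df}) = {t:.2f}, eta squared = {eta_sq:.2f}")
print(f"95% CI for M1 - M2: [{lower:.2f}, {upper:.2f}]")
```

Because the inputs are rounded summary values, the interval limits may differ from software output in the second decimal place.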
12.13 Graphing Results: Means and CIs
Cumming and Finch (2005) suggested that authors should emphasize confidence intervals along with effect sizes. Graphs of CIs help focus reader attention on these. Several types of CI graphs can be presented for the independent-samples t test. We could set up a graph of the CI for the (M1 − M2) difference using either an error bar or a bar chart. The lower and upper limits of this CI are provided in the independent-samples t test output. It is more common to show a CI for each of the group means (M1 and M2). This can be done with either the SPSS error bar or bar chart procedure. To obtain an error bar graph for M1 and M2, make the menu selections shown in Figure 12.15, Figure 12.16, and Figure 12.17.
In Figure 12.18 the separate vertical lines for each group (no caffeine, 150 mg caffeine) have two features. The dot represents the group mean. The T-shaped bars identify the lower and upper limits of the 95% CI for each group. Be careful when you examine error bar plots in journals or conference posters. Error bars that resemble the ones in Figure 12.18 sometimes represent the mean ± 1 standard deviation, or the mean ± 1 SEM, instead of a 95% CI. Graphs should be clearly labeled so that viewers know what the error bars represent.
Cumming and Finch (2005) pointed out that when two 95% CIs, like the ones in Figure 12.18, do not overlap, you know that the t test for the difference between group means must be statistically significant using α = .05, two tailed. On the other hand, if the CIs do overlap, it is possible that the t test that compares group means may be statistically significant (because the CI for [M1 − M2] has a larger df than the CIs for M1 and for M2).
Figure 12.15 SPSS Menu Selections for Error Bar Procedure
The image is an SPSS menu selection to obtain the error bar procedure for the file hrcaffeine.sav.
At the top of the spreadsheet are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, extensions, window and help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options.
The graphs menu has been opened and the following selections are visible: chart builder, graphboard template chooser, Weibull plot, compare subgroups and legacy dialogs. The legacy dialogs menu has been opened to show the following menu options: bar, 2-D bar, line, area, pie, high-low, box plot, error bar, population pyramid, scatter or dot and histogram.
There is some data visible on the spreadsheet. This has been reproduced below:
Caffeine, hr
1,51 1,66 1,58 1,58 1,53 1,48 1,57 1,73 1,56 1,58
2,72 2,57 2,78 2,61 2,66 2,54 2,64 2,82 2,71 2,74
Figure 12.16 Error Bar Dialog Box
The image is a dialog box in SPSS to create an Error bar.
There are two choices available: simple and clustered. The Simple option has been highlighted.
The data in the chart area can be shown as a summary in two ways: either as summaries for groups of cases or as summaries of separate variables. The first option has been selected.
There are radio buttons for Define, Cancel and Help at the bottom of the dialog box.
Figure 12.17 Define Simple Error Bar: Summaries for Groups of Cases Dialog Box
The image is a dialog box to define error bar as summaries for groups of cases.
On the left is space for variables, which can be chosen and moved to the box on the right. The variable hr has been moved to the right-side variable section. The category axis can also be defined. Here it has been specified as caffeine. There are two radio buttons on the side: titles and options. There is a drop-down menu that allows a choice of what the bars represent. Here the bars represent confidence interval for mean. The level can also be chosen and 95 percent is the level currently. The Panel by option for rows and columns can be specified although currently they are blank. A check box to nest variables for rows and columns is also present. The Template option has a check box that states: “Use chart specification from” and allows one to select the required file. At the bottom of the dialog box are options buttons for the following: OK, Paste, Reset, Cancel and Help.
Figure 12.18 Error Bars for Mean Heart Rates in Hypothetical Caffeine Experiment
The image shows an error bar graph for the caffeine experiment.
The X axis denotes whether the bar represents no caffeine or 150 mg caffeine.
The Y axis represents the 95 percent CI for hr and ranges from 50 to 75.
There are two separate vertical lines for each group. A dot in each line represents the group mean and this is 58 for the no caffeine bar and 68 for the 150 mg caffeine bar. The upper and lower bounds are also shown as horizontal lines across the top and the bottom of the vertical line. This is 53 and 63 for the no caffeine bar and 63 and 75 for the 150 mg caffeine bar.
A bar chart is another way to represent information about CIs. The menu selections to open the bar chart procedure were shown earlier (<Graphs> → <Legacy Dialogs> → <Bar>). In the Define Simple Bar: Summaries for Groups of Cases dialog box, in Figure 12.19, select the radio button for “Other statistic (e.g., mean)” and move the dependent variable name (heart rate) into the box labeled “Variable.” It will appear as MEAN([hr]). The height of each bar will correspond to the mean heart rate for one group. Enter the name of the group or category variable into the box labeled “Category Axis.” Click the Options button. In the Options dialog box, also shown in Figure 12.19, check the box for “Display error bars.” Leave the default radio button selection under “Confidence Intervals” as 95.0 for “Level (%),” unless otherwise desired. This will produce a 95% CI for each group mean.
The resulting bar chart appears in Figure 12.20.
By default, SPSS uses 0 as the starting value for the Y axis. When bar charts were used to represent the frequency of cases for each group earlier, using 0 as the lowest value for Y was recommended; cutting out large portions of the Y axis that represent possible values for Y can yield a graph that exaggerates the magnitude of group sizes.
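The group means and 95% CIs that the error bar and bar chart procedures display can also be computed directly from the heart rate scores listed with Figure 12.15. The following is a minimal sketch rather than a description of what SPSS does internally: the helper name `mean_ci` is hypothetical, and the critical t value of 2.262 (two tailed, α = .05, df = 9) is hard-coded from a standard t table because the Python standard library has no t distribution.

```python
import math
import statistics

# Heart rate scores from the hrcaffeine data (1 = no caffeine, 2 = 150 mg)
hr = {
    "no caffeine": [51, 66, 58, 58, 53, 48, 57, 73, 56, 58],
    "150 mg caffeine": [72, 57, 78, 61, 66, 54, 64, 82, 71, 74],
}

# Assumption: two-tailed critical t for alpha = .05 with df = n - 1 = 9
T_CRIT_DF9 = 2.262

def mean_ci(scores, t_crit):
    """Return (mean, lower limit, upper limit) of the CI for one group mean."""
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))  # SE of the mean
    return m, m - t_crit * se, m + t_crit * se

for label, scores in hr.items():
    m, lo, hi = mean_ci(scores, T_CRIT_DF9)
    print(f"{label}: M = {m:.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Each interval here uses only that group's own standard deviation and df = 9, which is why the CI for the difference between means (based on df = 18) cannot be read directly off the two separate intervals.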
caffeine bar rises to about 68. Both bars have error bars embedded in them.
When bars represent group means, starting the Y axis at 0 often does not make sense. For heart rate, it would make sense to use the lowest value for heart rate that you could call a normal healthy heart rate as your minimum. In this situation it would be reasonable to use a value such as 40 as the lowest value marked on the axis. This change can be made in the chart editor (commands are not shown). The edited bar chart appears in Figure 12.21.
Figure 12.21 Edited Bar Chart: 95% CIs for Two Group Means
This is an edited bar chart with 95 percent CI for two group means. There are two bars in the image indicating presence or absence of caffeine. The X axis denotes the absence of caffeine as well as 150 mg caffeine and the Y axis denotes the mean hr. This ranges from 40 to 80, rising in increments of 10. The no caffeine bar rises to 58. The 150 mg caffeine bar rises to about 68. Both bars have error bars embedded in them. Error bars: 95% CI.
12.14 Decisions About Sample Size for the Independent-Samples t Test
Statistical power analysis provides a more formal way to address this question: How does the probability of obtaining a t ratio large enough to reject the null hypothesis (H0: μ1 = μ2) vary as a function of sample size and effect size? Statistical power is the probability of obtaining a test statistic large enough to reject H0 when H0 is false. Researchers generally want to have a reasonably high probability of rejecting the null hypothesis; power of 80% is sometimes used as a reasonable guideline. Cohen (1988) provided tables that can be used to look up power as a function of effect size and n or to look up n as a function of effect size and the desired level of power.
An example of a power table that can be used to look up the minimum required n per group to obtain adequate statistical power is given in Table 12.3. This table assumes that the researcher will use the conventional α = .05, two tailed, criterion for significance. For other alpha levels, tables can be found in Jaccard and Becker (2009) and Cohen (1988). To use this table, the researcher must first decide on the desired level of power (power of .80 is often taken as a reasonable minimum). Then, the researcher needs to make an educated guess about the population effect size that the study is designed to detect. In an area where similar studies have already been done, the researcher may calculate η² values on the basis of the t or F ratios reported in published studies and then use the average effect size from past research as an estimate of the population effect size. (Recall that η² can be calculated by hand from the values of t and df using Equation 12.19 if the value of η² is not reported in the journal article.) If no similar past studies have been done, the researcher can
down the column of values for estimated power under the column headed d = .50 until reaching the table entry of .80. Then, she would look to the left (of this value of .80) for the corresponding value of N. On the basis of the values in Table 9.3, the value of N required to have statistical power of about .80 to detect an effect size of d = .5 in a one-sample t test with α = .05, two tailed, is between 30 and 40.
Table 9.3
Source: Reprinted with permission from Dr. Victor Bissonnette (2019).
The true strength of the population effect size we are trying to detect is not known. For example, the degree to which the actual population mean μ differs from the hypothesized value, μhyp, as indexed by the population value of Cohen's d, is not known in advance of the study. If we knew the answer to that question, we would not need to do a study!
The sample size needed for adequate statistical power can be approximated only by making an educated guess about the true magnitude of the effect, as indexed by d. If the guess about the population effect size d is wrong, then the estimate of power based on that guess will also be wrong. Information from past studies can often be used to make at least approximate estimates of population effect size.
Statistical power analysis is useful when planning a future study. It is important to think about whether the expected effect size, alpha level, and sample size provide you with a reasonably large chance (reasonably high power) to obtain a statistically significant outcome. People who write proposals to compete for research funds from government grant agencies are generally required to include a rationale for decisions about planned sample size on the basis of power. There are several places to obtain information for statistical power analysis. Jaccard and Becker (2009) provide power tables for some additional situations. SPSS has an add-on procedure for statistical power, and numerous other computer programs (some free) can do power analyses. Free online power calculators are widely available (for example, at http://powerandsamplesize.com/Calculators/).
Usually researchers rely on computer programs instead of tables for power analysis. A researcher provides program input information about type of analysis (e.g., a one-sample t test), planned α level, whether a one- or two-tailed test is desired, and expected effect size. Programs usually provide either the estimated power for an input value of N or the minimum N needed to achieve a requested level of power.
You should not report a post ho
That is, do not look up your obtained Cohen's d
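The power table lookup described for the one-sample t test can be roughly approximated in code. This is a minimal sketch under a stated simplification: it uses the normal (z) approximation to the sampling distribution instead of the noncentral t distribution that power tables and dedicated programs use, so it slightly overstates power for small N. The function name `approx_power_one_sample` is hypothetical.

```python
import math
from statistics import NormalDist

def approx_power_one_sample(d, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample test for effect size d
    and sample size n, using the normal approximation (an assumption;
    exact power uses the noncentral t distribution)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    shift = d * math.sqrt(n)            # expected value of the test statistic
    # Probability of landing in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (20, 30, 40, 50):
    print(f"N = {n}: approximate power = {approx_power_one_sample(0.5, n):.2f}")
```

For d = .5 the approximation crosses .80 somewhere between N = 30 and N = 40, which agrees with the range read from the table; a guessed d that is too optimistic would make the real power lower than this calculation suggests.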
and smaller than it should be in other situations.
equivalent.
Suppose that you want to know whether patients
Self-selection into treatment is problematic. If
have lower mean anxiety scores after Rogerian
your study includes a meditation training group
therapy (Group 1) or Freudian psychodynamic
and a control group, and participants are allowed
therapy (Group 2). Suppose that these two types
to choose their groups, you will probably have
of therapy are given by different therapists (Dr. Goodman does the Rogerian therapy and Dr. Deadwood does the psychodynamic therapy). This would be a perfect confound between therapist personality and ability and type of therapy. If the Group 1 patients do better than those in Group 2, we cannot tell whether this is due to differences in the type of therapy or differences between the two doctors. This is a perfect or complete confound, and it makes the results of this study uninterpretable. The M1 − M2 difference can be due to type of therapy, personality and ability of the therapist, or both. (Even if Dr. Goodman did the therapy in both groups, there could be problems, because she might have greater faith in one type of therapy than the other, and this could produce placebo or expectancy effects.)
Confounds do not have to be complete confounds to be problematic. Consider a group of patients in a drug study. If the drug group has 55% women and the placebo group has only 39% women, there is a partial confound between type of drug and sex. M1 might differ from M2 because the M1 group includes more women, while the M2 group includes more men, instead of or in addition to any drug effects.
Confounds can be obvious, but sometimes they are subtle. Random assignment of participants to groups is supposed to make groups equivalent in composition, but sometimes this doesn't work as well as expected. When background information is available about participants, it's good to compare the groups to see whether they are
different kinds of people in the meditation group than in the no-treatment control group.
12.15.2 Decisions About Type or Dosage of Treatment
Researcher decisions about the types or amounts of treatments (or other group characteristics) can influence the M1 − M2 difference between means. Usually, researchers want to maximize this difference. However, there are limits. We cannot give human beings 10,000 mg of caffeine to maximize the effects of caffeine on heart rate (for ethical as well as practical reasons). It would not be useful to give rats amounts of artificial sweetener that would correspond to human consumption of 50 diet sodas per day, because that dosage would not correspond to any real-world situation. If naturally occurring groups are compared (for example, older adults vs. younger adults), it will usually be easier to find differences when groups differ substantially. For instance, a study that compares reaction time between a group of persons ages 60 to 70 and a group of persons ages 20 to 30 is more likely to find a difference than a study that compares a group of persons in their 20s with a group in their 30s.
12.15.3 Decisions About Participant Recruitment and Standardization of Procedures
Researcher decisions about types of participants to recruit, and about standardization of procedures, can affect the magnitude of sp, the pooled or averaged within-group standard deviation. Recruiting homogeneous participants such as 18-year-old healthy men helps keep sp low (compared with studies with wider ranges of age and health), but it also limits the potential generalizability of results. It is a good idea to standardize situations and testing procedures to keep sp small, but rigid protocols can result in experiences that make the situation feel even more artificial.
12.15.4 Decisions About Sample Size
Sometimes participants or cases are difficult or costly to obtain. A neuroscience study might involve surgical procedures and lengthy training and testing procedures. In such situations, standardization of procedures and optimal choice of treatment dosage levels is particularly important.
When researchers have access to very large Ns (on the order of tens of thousands), there is a different problem. Even effects that are extremely small (when evaluated by looking at M1 − M2, or η², or Cohen's d) can be statistically significant when N is very large. Researchers should resist the temptation to overemphasize statistical significance in these situations. Clear information about effect size should be provided in terms readers can understand. This is particularly important when important real-life decisions (such as medical decisions) are at stake.
One possible reason why researchers have been slow to adopt the reporting of effect size information and CIs is that effect sizes are often embarrassingly low, while CIs are often embarrassingly wide.
To summarize: Researcher decisions about treatment type and dosage, and the presence of confounds, will affect the magnitude of M1 − M2. Confounds make M1 − M2 differences uninterpretable even if they are statistically significant. Researcher decisions about participant recruitment and procedures can reduce the magnitude of sp, but may also reduce generalizability. Very low Ns result in underpowered studies, that is, studies in which a statistically significant t value is unlikely even if the null hypothesis is false. Very large Ns can lead to situations in which effects that are too small to have any real-world practical or clinical importance are judged statistically significant. In between these extremes, statistical power tables can help researchers evaluate the sample sizes needed for adequate statistical power.
12.16 Summary
This chapter discussed a simple and widely used statistical test (the independent-samples t test) and provided additional information about effect size, statistical power, and factors that affect the size of t. The t test is sometimes used by itself to report results in relatively simple studies that compare group means on a few outcome variables; it is also used as a follow-up in more complex designs that involve larger numbers of groups or outcome variables.
A t test value (and corresponding effect sizes) is not a fact of nature. Researchers have some control over factors that influence the size of t in both experimental and nonexperimental research situations. Because the size of t depends to a great extent on our research decisions, we should be cautious about making inferences about the strength of effects in the real world on the basis of the obtained effect sizes in our samples.
For the independent-samples t test researchers often report one of the following effect size measures: Cohen's d, rpb, or η². Eta squared is an effect size commonly used to do power analysis for future similar studies. When researchers want to summarize information across many past studies (as in a meta-analysis), rpb (often just called r) is often the effect size of choice. Past research has not always included effect size information, but readers can usually calculate effect sizes from the information in published journal articles.
Notice that the independent-samples t test, like correlation and regression, provides a partition of the total variance in Y outcome scores into two parts; η² is the proportion of variance in Y that differs between groups (variance that may be due to different types or amounts of treatment). In regression, r² was the proportion of variance in Y that could be linearly predicted from X. Similarly, (1 − r²) was the proportion of variance in Y that could not be linearly predicted from X; for the independent-samples t test, (1 − η²) is the proportion of variance in Y that is not predictable from group membership or from the score on the predictor variable.
The r² and η² are both called proportion of predicted (or sometimes explained) variance. Predicted variance is variance in Y that is related to scores on the X predictor variable. By contrast, (1 − r²) and (1 − η²) are the parts of the variance in Y that are not predictable from the X independent variable. These are interpreted as proportions of error variance.
The term error in everyday life means “mistake.” In statistics, error has many different meanings, depending on context. Errors in prediction don’t happen because the data analyst made a mistake (although mistakes in data analysis can happen, of course). Errors in prediction happen because many other variables, other than the X variable used as a predictor, influence the scores on the Y outcome variable. Error refers collectively to all the variables in the world that are related to Y, but that we did not control in the study or include in the statistical analysis. This may clarify why proportions of error variance are so high in most research! Error also includes any chance or random or unpredictable elements in Y. If you go on to learn about analyses that include multiple predictor variables, you will see that use of multiple predictors sometimes reduces the proportion of error variance.
To describe the problem of error variance another way, consider the tongue-in-cheek Harvard Law of Animal Behavior: “Under carefully controlled experimental circumstances, an animal will behave as it damned well pleases.”
This chapter was long and detailed because it introduces issues that arise when comparing means across groups; many of the following chapters describe analyses that also compare means across groups. This set of analyses is called analysis of variance. The same issues (assumptions, data screening, effect size, and so forth) continue to be important for those analyses, and I'll often refer you back to this chapter for more complete discussion.
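The effect sizes named in the summary can be obtained from the values that most journal articles report. The following is a minimal sketch in Python, not a procedure from this book's SPSS examples: the function names are invented for illustration, and the means, standard deviations, and group sizes are hypothetical. It uses the standard relations η² = t²/(t² + df), Cohen's d = (M1 − M2)/sp, and rpb as the signed square root of η².

```python
import math

def pooled_sd(s1, s2, n1, n2):
    """Pooled within-group standard deviation, sp."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def independent_t(m1, m2, s1, s2, n1, n2):
    """Equal-variances-assumed t ratio and its df (n1 + n2 - 2)."""
    sp = pooled_sd(s1, s2, n1, n2)
    se_diff = sp * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se_diff, n1 + n2 - 2

def eta_squared(t, df):
    """Proportion of variance in Y associated with group membership."""
    return t**2 / (t**2 + df)

def cohens_d(m1, m2, sp):
    """Mean difference expressed in pooled standard deviation units."""
    return (m1 - m2) / sp

# Hypothetical summary statistics, chosen only for illustration.
m1, m2, s1, s2, n1, n2 = 60.0, 50.0, 10.0, 10.0, 9, 9

t, df = independent_t(m1, m2, s1, s2, n1, n2)
eta2 = eta_squared(t, df)
r_pb = math.copysign(math.sqrt(eta2), t)  # sign follows the direction of M1 - M2
d = cohens_d(m1, m2, pooled_sd(s1, s2, n1, n2))

print(f"t({df}) = {t:.2f}, eta squared = {eta2:.3f}, r_pb = {r_pb:.3f}, d = {d:.2f}")
```

Because η² depends only on t and df, a reader can recover it from a published t test even when the article reports no effect size.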
Appendix 12A: A Nonparametric Alternative to the Independent-Samples t Test
either committed a Type II error or has reported a correct decision not to reject H0. (The researcher can never be sure which.) We want the probability or risk for both types of error to be low, that is, we want both α and β to be low. When a data analyst selects an α level, such as α = .05, that choice theoretically sets an upper limit for the risk for Type I error. If α is set at .05, then in theory, we have a maximum risk of 5% for Type I error. However, the limit of risk for Type I error works in practice only if the assumptions and rules for NHST are followed, and in many situations, they are not. The actual risk for Type I error in many research situations is often much higher than the nominal (selected) α level.
Actual State of the World by Researcher Decision (weight loss drug example)
Researcher decides to reject H0 and claims that the weight loss drug works.
  The drug really does not work (H0 is true): Type I error with risk α. The researcher rejects H0 when H0 is true. The drug does not work, but the researcher claims that it does. Patients who take the drug will not benefit.
  The drug really does work (H0 is false): [cell text not recoverable from this copy]
Researcher decides not to reject H0 and does not claim that the drug works.
  The drug really does not work (H0 is true): Correct decision, although maybe not the decision the researcher hoped for. The drug does not work and the researcher does not claim that it works. Often this type of result does not get published, and that is unfortunate. Other researchers may do studies to see if this drug works, not knowing that there is already evidence suggesting it may not work.
  The drug really does work (H0 is false): Type II error with unknown risk. The researcher did not reject H0 when H0 is false. The drug really does work, but the researcher does not claim that it works. The study probably doesn't get published; a missed opportunity. The drug may not be approved for use with patients, even though it works. This is likely to happen when studies are "underpowered," that is, the N of cases is too small to detect the effect of interest.
The risk for Type II error, β, cannot be exactly known; but we know something about factors that tend to make β larger or smaller. In the previous section we talked about statistical power: the probability of rejecting H0 when it is false. Power is (1 − β), and we want power to be high, usually on the order of .80.
Table 9.5
What does it mean for H0 to be false? H0 is true only if μ is exactly equal to 0 (or exactly equal to the proposed value in the null hypothesis, such as 98.6 or 35 or 100 in previous examples). However, H0 can be false in billions of ways. If we consider H0: μ = 35, H0 is false if μ really equals any number other than 35 (e.g., 45, 12, 35.01, 99, 34.3, and so forth). H0 can be false to varying degrees; in a sense, H0: μ = 35 is “less false” if μ is really 35.2 or 34.9 than if μ is really 30 or 51. Population effect size is the degree to which H0 is false. For example, if Cohen's d (for the difference between the real and hypothesized population means) is d = 1.00, this indicates that the difference between hypothesis and reality is large; if d = .05, this indicates that the difference between hypothesis and reality is small. The values of β and (1 − β) vary depending on the population effect size. We never know the exact population effect size, but we can think about the values of β and (1 − β) that we would expect, in theory, for possible different values of d and for fixed decisions about N and α. Appendix 9A explains this in more detail.
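To see how β and (1 − β) change with the population effect size when N and α are held fixed, the short Python sketch below may help. It is an illustration under stated assumptions, not the method used for the power tables in this book: it replaces the t distribution with a normal (z) approximation, which tends to overstate power somewhat when N is small, and the function name `approx_power` and the choice of N = 25 are hypothetical.

```python
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    """Approximate two-tailed power for a one-sample test of H0: mu = mu_hyp.

    d is the population effect size (Cohen's d) and n is the sample size.
    Normal approximation: the test statistic is treated as z, not t.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for a two-tailed test
    shift = d * n ** 0.5                # expected value of the test statistic
    upper = 1 - z.cdf(z_crit - shift)   # P(reject in the upper tail)
    lower = z.cdf(-z_crit - shift)      # P(reject in the lower tail)
    return upper + lower

# Fixed decisions about N and alpha; several possible values of d.
for d in (0.05, 0.20, 0.50, 1.00):
    power = approx_power(d, n=25)
    print(f"d = {d:.2f}: power = {power:.2f}, beta = {1 - power:.2f}")
```

When d = 0 (H0 is exactly true), the function returns α, which is the risk of Type I error rather than power; as d grows, β shrinks and power approaches 1.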
across groups. Compare medians across groups. This option has been checked. Customize analysis. Description: compare medians across groups using the Median Test for k samples.
Figure 12.24 Specification of Variables (Assignment to Fields)
The image shows the Fields tab of the Nonparametric Tests: Two or More Independent Samples dialog box.
There are two choices in the check boxes: use predefined roles and use custom field assignment. The second option has been checked.
There are two boxes on either side below this. The one on the left depicts Fields from which the selected choice can be moved to the box on the right that is named Test Field. This is currently populated by hr.
The Groups option is below this and is populated by caffeine.
At the bottom of the dialog box are buttons for the following: Run, Paste, Reset, Cancel, and Help.
Figure 12.25 Mann-Whitney U Test Results
The image is a table that provides the results of the Mann-Whitney U test. The summary states the following:
* Null hypothesis: The distribution of hr is the same across categories of caffeine.
* Test: Independent-samples Mann-Whitney U test
* Sig.: .023 (superscript 1)
* Decision: Reject the null hypothesis.
There are a couple of notes below the table that state the following: Asymptotic significances are displayed. The significance level is .05. The superscript 1 states: Exact significance is displayed for this test.
Details for computation of Mann-Whitney U are not presented here. When sample sizes are reasonably large (i.e., N > 30 for the entire data set), the Mann-Whitney U test begins by converting the Y scores (ignoring group membership) into ranks. These ranks replace the original Y scores in the two samples or groups. Nonparametric tests vary in the way they handle tied ranks. The null hypothesis is that these distributions of ranks are the same across the two groups.
SPSS does not display the Mann-Whitney U statistic, only the corresponding p value. When the hrcaffeine.sav data were analyzed, the result was p = .023. The distribution of heart rate ranks in the two samples (no caffeine vs. caffeine) differed significantly. Whether the independent-samples t test (Figure 12.11) or the Mann-Whitney U test (Figure 12.25) is used to compare heart rate across groups, in this data set, the conclusion was the same. That does not always happen. Parametric tests such as the t test may have greater statistical power than corresponding nonparametric tests in some situations, but parametric tests are not always more powerful.
Your decision whether to perform an independent-samples t test or a Mann-Whitney U test will depend on the most common practices in your discipline. Many journals accept the independent-samples t as an appropriate analysis even when some assumptions (such as normality of distribution shapes) are violated. If practitioners in your research area prefer to report nonparametric statistics such as Mann-Whitney U, it is probably better to follow common practice.
Comprehension Questions
1. Suppose you read the following in a journal: “The group means did not differ significantly, t(30.1) = 1.68, p > .05, two tailed.” You notice that the ns in the groups were n1 = 40 and n2 = 55.
   1. What degrees of freedom would you normally expect a t test to have when n1 = 40 and n2 = 55?
   2. How do you explain why the degrees of freedom reported here differ from the value you just calculated?
2. What type of effect can a variable that is confounded with the treatment variable in a two-group experimental study have on the obtained value of the t ratio?
   1. It always makes t larger.
   2. It usually makes t smaller.
   3. It can make t either larger or smaller or leave it unchanged.
   4. It generally has no effect on the size of the t ratio.
3. Which of these pieces of information would be sufficient to calculate an η² effect size from a reported independent-samples t test?
   1. s1², s2² and n1, n2
   2. t and df
   3. The M1 − M2 difference and sp
   4. None of the above
4. Aronson and Mills (1959) conducted an experiment to see whether people’s liking for a group is influenced by the severity of initiation. They reasoned that when people willingly undergo a severe initiation to become members of a group, they are motivated to think that the group membership must be worthwhile. Otherwise, they would experience cognitive dissonance: Why put up with severe initiation for the sake of a group membership that is worthless? In their experiment, participants were randomly assigned to one of three treatment groups: Group 1 (control) had no initiation. Group 2 (mild) had a mildly embarrassing initiation (reading words related to sex out loud). Group 3 (severe) had a severely embarrassing initiation (reading sexually explicit words and obscene words out loud).
After the initiation, each person listened to a standard tape-recorded discussion among the group that they would now supposedly be invited to join; this was made to be as dull and banal as possible. Then, they were asked to rate how interesting they thought the discussion was. The researchers expected that people who had undergone the most embarrassing initiation would evaluate the discussion most positively. In the table below, a higher score represents a more positive evaluation.
[Table: mean, SD, and N of evaluation scores by experimental condition (control, mild, severe), followed by t ratios for the comparisons control versus severe, mild versus severe, and control versus mild. The numeric entries are not recoverable from this copy.]
Source: Data from Aronson and Mills (1959).
   1. Were the researchers’ predictions upheld? In simple language, what was found?
   2. Calculate an effect size (η²) for each of these three t ratios and interpret these.
5. True or false: The size of η² tends to increase as n1 and n2 increase.
6. A researcher reports that the η² effect size for her study is very large (η² = .64), but the t value she reports is quite small and not statistically significant. What inference can you make about this situation?
   1. The researcher has made a mistake: If η² is this large, then t must be significant.
   2. The ns in the groups are probably rather large.
   3. The ns in the groups are probably rather small.
   4. None of the above inferences is correct.
7. A researcher collects data on married couples to see whether men and women differ in their mean levels of marital satisfaction. For each married couple, both the husband and wife fill out a scale that measures their level of marital satisfaction; the researcher carries out an independent-samples t test to test the null hypothesis that male and female levels of marital satisfaction do not differ. Is this analysis appropriate? Give reasons for your answer.
8. A researcher plans to do a study to see whether people who eat an all-carbohydrate meal have different scores on a mood scale than people who eat an all-protein meal. Thus, the manipulated independent variable is the type of meal (1 = carbohydrate, 2 = protein). In past research, the effect sizes that have been reported for the effects of food on mood have been on the order of η² = .15. On this basis and assuming that the researcher plans to use α = .05, two tailed, and wants to have power of about .80, what is the minimum group size that is needed (i.e., how many subjects would be needed in each of the two groups)?
9. Which of the following would be sufficient information for you to calculate an independent-samples t?
   1. s1², s2²; M1, M2; and n1, n2
   2. d and df
   3. n1, n2, the M1 − M2 difference, and sp
   4. None of the above
   5. Any of the above (a, b, or c)
10. What changes in research design tend to reduce the magnitude of the within-group variances (s1², s2²)? What advantage does a researcher get from decreasing the magnitude of these variances? What disadvantage arises when these variances are reduced?
11. The statistic that is most frequently used to describe the relation between a dichotomous group membership variable and scores on a continuous variable is the independent-samples t. Name two other statistics that can be used to describe the relationship between these kinds of variables (these are effect sizes; later you will see that an F ratio can also be reported in this situation).
12. Suppose that a student conducts a study in which the manipulated independent variable is the level of white noise (60 vs. 65 dB). Ten participants are assigned to each level of noise; these participants vary widely in age, hearing acuity, and study habits. The outcome variable is performance on a verbal learning task (how many words on a list of 25 words each participant remembers). The t value obtained is not statistically significant. What advice would you give to this student about ways to redesign this study that might improve the chances of detecting an effect of noise on verbal learning recall?
13. The table below shows data obtained in a small experiment run by one of my research methods classes to evaluate the possible effects of food on mood. This was done as a between-subjects study; each participant was randomly assigned to Group 1 (an all-carbohydrate lunch) or Group 2 (an all-protein lunch). One hour after eating lunch, each participant rated his or her mood, with higher scores indicating more agreement with that mood. Select one of the mood outcome variables, enter the data into SPSS, examine a histogram to see if the scores appear to be normally distributed, and conduct an independent-samples t test to see if mean moods differed significantly between groups. Write up your results in the form of an APA-style “Results” section, including a statement about effect size. Be certain to state the nature of the relationship between food and mood (i.e., did eating carbohydrates make people more or less calm than eating protein)? Also, as an exercise, if you wanted to design a better study to assess the possible impact of food on mood, what would you add to this study? What would you change? (For background about research on the possible effects of food on mood, refer to Spring, Chiodo, & Bowen, 1987.)
Data for Question 13:
[Table: food type and ratings of calm, anxious, sleepy, and alert for each participant. The row and column layout is not recoverable from this copy; the surviving cell values, in extraction order, are: 5 8 3 4 9 5 4 3 2 4 8 2 3 5 5 5 4 1 2 2 E o 3 1 9 4 6 0 3 4 2 2 2 3 4 2 3 6 4 9 9 4 2 2 2 3 1 o]
Food type was coded 1 = carbohydrate, 2 = protein.
Note: The moods calm, anxious, sleepy, and alert were rated on a 15-point scale: 0 = not at all, 15 = extremely.
14. A test of emotional intelligence was given to
241 women and 89 men. The results were as follows: For women, M = 96.62, SD = 10.34; for men, M = 89.33, SD = 11.61. Was this difference statistically significant (α = .05, two tailed)? How large was the effect, as indexed by Cohen's d and by η²? (For this result, refer to Brackett, Mayer, & Warner, 2004.)
15. What is the null hypothesis for an independent-samples t test?
16. In what situations should the paired-samples t test be used rather than the independent-samples t test?
17. What information does the F ratio in the SPSS output for an independent-samples t test provide? That is, what assumption does it test?
18. Explain briefly why there are two different versions of the t test in the SPSS output and how you decide which one is more appropriate.
19. What is η²? How is it computed, and how is it interpreted?
Notes
1 It would be nonsense to compute means for categorical dependent variables.
2 Recall that whenever SS or s² is computed, each SS term has a corresponding df. When an SS term is computed, deviations from a sample mean are squared and then summed. Because deviations must sum to 0 within a sample, if a sample has n members, only n − 1 of the deviations from the mean are “free to vary.” Once we know any n − 1 deviations from the mean, the last deviation must be whatever value is required to make the sum of all deviations equal 0. For the independent-samples t test there are two SS terms, one for each group. For Sample 1, SS1 has n1 − 1 df. For Sample 2, SS2 has n2 − 1 df. The combined df is the sum of these two df terms: the overall df for the independent-samples t test = (n1 − 1) + (n2 − 1), which can also be written n1 + n2 − 2. For the total N in the study (N = n1 + n2), df = N − 2. We “lose” 1 degree of freedom for each mean that is estimated.
3 The null hypothesis for the Levene test F ratio is that the variances of the populations that correspond to the two samples are equal (H0: σ1² = σ2²). The Levene test F ratio, based on values of SD1 and SD2, is large if the sample data suggest that this assumption of equal variance is violated. If the p value associated with Levene’s F is less than .05, the data analyst may consider use of the “equal variances not assumed” version of the independent-samples t test. If the p value associated with Levene’s F is greater than .05, the data analyst can use the “equal variances assumed” version of the independent-samples t test. You'll learn more about F ratios later. F ratios are reported as one tailed (never two tailed), so it is not necessary to specify one or two tailed when you report F.
4 The “equal variances not assumed” or “separate variances” version of the t test is included in the SPSS output. However, the equal variances not assumed t is rarely reported, because the independent-samples t test is robust against violations of the homogeneity of variance assumption. There are two differences in the computation of this t ratio. First, instead of pooling the two within-group variances, the two within-group variances are kept separate when the standard error term, SE(M1 − M2), is calculated:
$$SE_{M_1 - M_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{12.25}$$
Second, a (downwardly) adjusted df term is used to evaluate the significance of t. Usually this df′ will not be an integer value; it will be smaller than n1 + n2 − 2. The larger the difference in the magnitude of the variances and the ns, the greater the downward adjustment of the degrees of freedom. Computation of adjusted degrees of freedom (df′) for the equal variances not assumed test (from the SPSS algorithms webpage at https://www.ibm.com/support/pages/ibm-spss-statistics-25-documentation; scroll down to “Algorithms” to download the PDF file of SPSS algorithms):
$$z_1 = \frac{1}{n_1 - 1}\left(\frac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}\right)^2, \qquad z_2 = \frac{1}{n_2 - 1}\left(\frac{s_2^2/n_2}{s_1^2/n_1 + s_2^2/n_2}\right)^2$$
$$df' = \frac{1}{z_1 + z_2}$$
where s1² and s2² are the within-group variances of the scores relative to their group means and N is the total number of scores in the two groups combined, N = n1 + n2. I view this test as a relic from a time when some statisticians were much more worried about violations of certain assumptions than most are today. However, someday you may encounter a statistical conservative who wants you to use this procedure or an exam that has a question about this. Also, when sample variances differ substantially, the df for the equal variances not assumed test is smaller than the df for the equal variances assumed test, and the p value for the equal variances not assumed version of the test is larger than the p value for the equal variances assumed test. In other words, you are less likely to be able to reject the null hypothesis using the equal variances not assumed t test.
5 Tables for effect size labels can differ from this one (for example, some tables use lower or higher values of r to correspond to a medium effect). Some statisticians prefer to set the bar higher (that is, to require a higher value of r than in Cohen’s table to judge an effect size “large”). Another reason tables differ: Most tables are developed by starting with a list of easy-to-remember effect size values for one effect size (such as .10, .20, .30, etc., for Cohen's d). Then they convert the d values into other effect size units (such as rpb). The values for rpb will have more decimal places and won't be as easy to remember. A different table might start with easy-to-remember values of rpb and then convert those to Cohen's d. This would make the values that correspond to small, medium, or large effect size slightly different between the two tables. However, these values are only approximate, so the discrepancies are not important.
6 In everyday life people define significance as important, noteworthy, or large enough to be of value. A result can be “statistically significant” even if it has little practical, clinical, or everyday value or importance. When we read that an outcome is “statistically significant,” we should not assume that the outcome is large, noteworthy, and valuable in clinical practice. It might be, but we have to look at effect size, not a statistical significance test, to evaluate the real-world or clinical value of a research result.
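The separate-variances computation described in Note 4 can be checked by hand with a few lines of Python. This is a sketch under stated assumptions rather than SPSS's own code: the function name `separate_variances_t` and the summary statistics are hypothetical, and the adjusted df is written in the algebraically equivalent Welch-Satterthwaite form.

```python
import math

def separate_variances_t(m1, s1, n1, m2, s2, n2):
    """t ratio and adjusted df for the 'equal variances not assumed' test."""
    a = s1**2 / n1                      # variance of M1
    b = s2**2 / n2                      # variance of M2
    se_diff = math.sqrt(a + b)          # variances kept separate, not pooled
    t = (m1 - m2) / se_diff
    # Adjusted df: equal to 1 / (z1 + z2) in the notation of Note 4.
    df_adj = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df_adj

# Hypothetical data: equal ns, very unequal standard deviations.
t, df_adj = separate_variances_t(m1=60.0, s1=5.0, n1=10, m2=50.0, s2=15.0, n2=10)
print(f"t = {t:.2f}, adjusted df = {df_adj:.2f}, unadjusted df = {10 + 10 - 2}")

# With equal SDs and equal ns, no downward adjustment occurs.
_, df_equal = separate_variances_t(60.0, 10.0, 10, 50.0, 10.0, 10)
print(f"adjusted df with equal variances = {df_equal:.2f}")
```

In the first call the adjusted df falls well below n1 + n2 − 2 = 18, which illustrates why the p value for this version of the test is larger when sample variances differ substantially.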
Digital Resources Find free study tools to support your learning,
including eFlashcards, data sets, and web resources, on the accompanying website at
necessarily imply that the effect is large in practical or clinical terms.
Look for effect size information. If effect size is not reported, there should be sufficient information for you to calculate this by hand. All you need to find Cohen’s d is M, SD, and μhyp (the proposed or hypothesized value of μ). Also evaluate whether the effect size is large enough to have any practical or clinical importance. When variables are measured in meaningful units, M − μhyp is useful information.
Look for confidence intervals.
Ask if it is reasonable to generalize from the types of cases in this study to larger populations in the real world. Ask if the situation in the study is comparable with real-world situations.
9.13 Planning Future Research
Research methods textbooks specific to your field of interest provide much information about planning research. From the perspective of NHST, here are some important issues.
Make decisions ahead of time about significance tests (test statistic, α level, directional or nondirectional test).
Make decisions ahead of time about the identification and handling of outliers.
Estimate the population effect size. Effect sizes from past studies (your own past research or other people’s) may be used to do this. It is better to underestimate population effect size than to overestimate it.
Use your estimated effect size and type of test (e.g., one-sample t test, α = .05, two tailed) to look up estimated power for your effect size and planned N. Or, using .80 for power, figure out the minimum N needed to have 80% power.
9.14 Guidelines for Reporting Results
The information to include in a research report depends on the specific test. For a one-sample t test, include N, M, SD, df, SEM, t, and (exact) p; whether p is one tailed or two tailed; effect size information such as Cohen's d and/or M − μhyp; and a CI for M (or for M − μhyp). The following elements should be included in a written report for a one-sample t test.
* A statement of what test was done, for what variable.
* Sample size (N), M, SD, and SEM.
* The CI for M (or the CI for the M − μhyp difference).
* Obtained t with its df and exact p. State whether p is one tailed or two tailed.
* Traditionally, a statement of whether a test was statistically significant and/or whether the null hypothesis can be rejected has usually been included. Proponents of the New Statistics suggest that we should avoid yes/no thinking and instead focus on confidence intervals and effect sizes.
* Effect size (such as Cohen's d) and, if units of measurement are interpretable, a difference such as M − μhyp may also be useful as information about practical significance.
Here is an example of a complete “Results” section for a one-sample t test that includes all information listed above.
Comparisons among several group means could be made by calculating t tests for each pairwise comparison among the means of these four treatment groups. However, as described earlier, doing numerous significance tests leads to an inflated risk for Type I error. If a study includes k groups, there are k(k − 1)/2 pairs of means; thus, for a set of four groups, the researcher would need to do (4 × 3)/2 = 6 different t tests to make all possible pairwise comparisons. If α = .05 is used as the criterion for significance for each test, and the researcher conducts six significance tests, the probability that this set of six decisions contains at least one instance of Type I error is greater than .05.

One way that ANOVA limits the risk for Type I error is by obtaining a single omnibus F test that examines all possible comparisons among means in the study. Researchers often want to examine selected pairwise comparisons of means as a follow-up analysis to obtain more information about the pattern of differences among groups.

13.2 Questions in One-Way Between-S ANOVA

The overall null hypothesis for one-way ANOVA is that the means of the k populations that correspond to the groups in the study are all equal:

(13.1)  $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

When each group has been exposed to different types or dosages of a treatment, as in a typical experiment, this null hypothesis corresponds to an assumption that the treatment has no effect on the outcome variable. The alternative hypothesis in this situation is not that all population means are unequal; the alternative hypothesis is that there is at least one inequality between one pair of means in the set.

The best question ever asked by a student in any of my statistics classes was deceptively simple: "Why is there variance?" In the hypothetical experimental study described in the following section, the outcome variable is a self-report measure of anxiety, and the group membership variable is type of stress. We want to know, How much of the variance in anxiety can be predicted from type of stress? Is stress a major reason why anxiety scores differed among persons in this study? Why do some persons report more anxiety than other persons? To what extent are the differences in amount of anxiety systematically associated with the independent variable (type of stress), and to what extent are differences in the amount of self-reported anxiety due to other factors (such as trait levels of anxiety, physiological arousal, drug use, sex, other anxiety-arousing events that each participant has experienced on the day of the study, etc.)?

In statistics, the term error usually does not mean the same thing as in everyday life. In everyday life, we use the word error to mean "mistake." In ANOVA, the term error refers to the parts of scores that cannot be predicted from type of treatment or group membership. The part of anxiety scores that we cannot predict from type of stress is presumably due to the effects of other variables that we have not included in the study, such as personality, other upsetting events that may have happened to the person just before the study, recent use of drugs such as alcohol, caffeine, and tobacco, and possibly a multitude of other unknown variables.
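The inflated risk for Type I error that motivates the omnibus test can be made concrete with a short calculation. The sketch below counts the pairwise comparisons for k groups and reports two reference values: the Bonferroni upper bound (number of tests times alpha) and the value that would hold if the tests were independent. Pairwise t tests on the same set of groups are not independent, so the second number is only an approximation; the function names are illustrative and do not come from the text.

```python
from math import comb


def n_pairwise(k):
    """Number of distinct pairs among k group means: k(k - 1)/2."""
    return comb(k, 2)


def familywise_risk(k, alpha=0.05):
    """Return (number of tests, Bonferroni upper bound, value under independence).

    The independence figure is an assumption made for illustration only;
    pairwise tests that share groups are correlated.
    """
    m = n_pairwise(k)
    upper_bound = min(1.0, m * alpha)
    if_independent = 1 - (1 - alpha) ** m
    return m, upper_bound, if_independent


m, upper_bound, if_independent = familywise_risk(4)
print(m, round(upper_bound, 3), round(if_independent, 3))  # 6 0.3 0.265
```

For four groups, the risk of at least one Type I error across the six tests lies well above the nominal .05 under either figure, which is the point made in the text.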
Questions in ANOVA:

1. The first question in one-way ANOVA is this: When all group means are considered as a set, are there any significant differences between means? An overall F ratio will tell us whether there are any significant differences among the group means, but it does not tell us which specific means differ. It is possible that each group mean differs from every other group mean, but it is also possible that only one or a few pairs of means differ.
2. The second question in one-way ANOVA is this: Which specific pairs (or combinations) of group means differ significantly? There are two ways to answer this question. A data analyst either decides which comparisons are of interest ahead of time and sets up planned contrasts or explores data using post hoc follow-up tests to see which means differ significantly. Both approaches are discussed in this chapter.
   * Planned contrasts (sometimes just called contrasts and also called a priori comparisons) can be set up to examine a limited number of differences between means that the data analyst has decided ahead of time are of interest. These are called unprotected (that is, not protected against inflated risk for Type I error) because, except for limiting the number of significance tests, there are no other corrections for inflated risk for Type I error.
   * Post hoc tests (such as the Tukey honestly significant difference [HSD] test) can be used to examine many or all the possible comparisons among means. These are called protected tests because most of them use more conservative per comparison criteria for statistical significance (like per comparison alpha [PC] in the Bonferroni procedure).

Later in your study of statistics, you will discover that many analyses involve a similar approach: first, an omnibus test that includes all groups and/or all variables, then follow-up analyses to evaluate which groups or which variables show significant differences.

13.3 Hypothetical Research Example

Suppose that an experiment is done to compare the effects of four situations: Group 1 is tested in a "no-stress," baseline situation; Group 2 does a mental arithmetic task; Group 3 does a stressful social role play; and Group 4 does a mock job interview. For this study, the X variable is a categorical variable with codes 1, 2, 3, and 4 that represent which of these four types of stress each participant received. This categorical X predictor variable is called a factor; in this case, the factor is called "type of stress"; the four levels of this factor correspond to no stress, mental arithmetic, stressful role play, and a mock job interview. At the end of each session, the participants self-report their anxiety on a scale that ranges from 0 = no anxiety to 20 = extremely high anxiety. Scores on anxiety are, therefore, scores on a quantitative Y outcome variable. Imagine that there is a convenience sample of N = 28 participants. (Capital N denotes the total number of participants in the study.) Imagine that participants were randomly assigned to one of the four levels of stress. This results in k = 4 groups with n = 7 participants in each group, for a total of N = 28 participants in the entire study. Lowercase n indicates the number of cases per group. The SPSS Data View worksheet that contains data for this imaginary study appears in Figure 13.1, and data are available in the SPSS file stress_anxiety.sav. The goal of data analysis is to find out:

1. Whether mean anxiety levels differed across these four situations.
2. Which situations elicited the highest and lowest anxiety.
3. Which treatment group means differed significantly from the baseline (no-stress) condition.
4. Whether mean anxiety differed among the mental arithmetic, role play, and mock job interview stress situations.

Figure 13.1 Data View Worksheet for Stress and Anxiety Study in stress_anxiety.sav

Case  stress  anxiety
1     1       10
2     1       10
3     1       12
4     1       11
5     1       7
6     1       7
7     1       12
8     2       17
9     2       14
10    2       14
11    2       13
12    2       11
13    2       17
14    2       14
15    3       15
16    3       11
17    3       12
18    3       14
19    3       16
20    3       17
21    3       10
22    4       16
23    4       20
24    4       14
25    4       16
26    4       19
27    4       16
28    4       18
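For readers who want to follow the later by-hand computations outside SPSS, the worksheet in Figure 13.1 can be entered directly. This is a minimal sketch in Python using only the standard library; the variable names (`anxiety_by_group`, `group_means`, `grand_mean`) are illustrative and are not part of the SPSS file.

```python
from statistics import mean

# Anxiety scores from Figure 13.1, keyed by the stress code (1 to 4).
anxiety_by_group = {
    1: [10, 10, 12, 11, 7, 7, 12],    # no stress (baseline)
    2: [17, 14, 14, 13, 11, 17, 14],  # mental arithmetic
    3: [15, 11, 12, 14, 16, 17, 10],  # stressful role play
    4: [16, 20, 14, 16, 19, 16, 18],  # mock job interview
}

# Mean anxiety within each group, and the grand mean across all 28 scores.
group_means = {code: mean(scores) for code, scores in anxiety_by_group.items()}
all_scores = [y for scores in anxiety_by_group.values() for y in scores]
grand_mean = mean(all_scores)

for code, m in group_means.items():
    print(f"Group {code}: n = {len(anxiety_by_group[code])}, M = {m:.2f}")
print(f"N = {len(all_scores)}, grand mean = {grand_mean:.2f}")
```

The group means printed here (9.86, 14.29, 13.57, and 17.00) and the grand mean (13.68) are the values used in the computations later in the chapter.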
13.4 Assumptions and Data Screening for One-Way ANOVA

The assumptions for one-way ANOVA are the same as those described for the independent-samples t test. The scores on the dependent variable must be quantitative. Observations must be independent of one another, both within and between groups. Ideally, scores should be approximately normally distributed within each group, and variances should be approximately equal across groups. ANOVA, like the t test, is robust against violations of the normality and equal variance assumptions if within-group n's are reasonably large. Finally, there should not be extreme outliers.

Preliminary screening involves the same procedures as for the t test: Histograms can be examined separately for each group to assess normality of distribution shape; boxplots for groups can identify potential outliers within groups. The Levene test (or another test of homogeneity of variance) can be requested as part of the output and used to assess whether the homogeneity of variance assumption is violated. Because preliminary data screening for one-way between-S ANOVA uses the same procedures as those shown in Chapter 12, on the independent-samples t test, these procedures are not repeated here.

13.5 Computations for One-Way Between-S ANOVA

13.5.1 Overview

ANOVA begins with familiar statistics. For each group, we obtain M, s, SS, and n. Recall that a sum of squares or SS is obtained by finding M for the group of interest, computing a (Y − M) deviation for each individual Y score, squaring the deviation for each score, and summing the squared deviations. For the independent-samples t test, we needed to find SS only for Groups 1 and 2. In ANOVA, several different forms of SS are obtained. SStotal is obtained by:

* Finding the grand mean for the entire data set, denoted MY.
* Obtaining the (Y − MY) deviation for every score in the data set.
* Squaring each deviation.
* Summing the squared deviations.

For a batch of data, recall that the sample variance s² = SS/df. For SStotal, df = N − 1, where N is the total number of scores in the entire data set. We could use SStotal to find the total variance s² for all Y scores in the study; however, to find out what proportion of variance in Y is related to group membership, it is more convenient to focus on SS than s². In ANOVA, an SS divided by its df is usually called a mean square (MS).

A one-way ANOVA divides SStotal into two sources of variance, often called SSbetween groups and SSwithin groups. The formulas to obtain the latter two SS terms can appear confusing, so let's just focus on the information provided by each term.
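The four steps for SStotal listed in the overview, and the relation s² = SS/df, can be sketched in a few lines. The example assumes the 28 anxiety scores from Figure 13.1; the helper name `sum_of_squares` is illustrative.

```python
from statistics import mean, variance

# The 28 anxiety scores from Figure 13.1 (group membership is ignored for SStotal).
scores = [
    10, 10, 12, 11, 7, 7, 12,    # stress code 1
    17, 14, 14, 13, 11, 17, 14,  # stress code 2
    15, 11, 12, 14, 16, 17, 10,  # stress code 3
    16, 20, 14, 16, 19, 16, 18,  # stress code 4
]


def sum_of_squares(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((y - center) ** 2 for y in values)


grand_mean = mean(scores)                       # step 1: grand mean MY
ss_total = sum_of_squares(scores, grand_mean)   # steps 2 to 4: deviate, square, sum
df_total = len(scores) - 1                      # df = N - 1
total_variance = ss_total / df_total            # s² = SS/df

print(round(grand_mean, 2), round(ss_total, 2), round(total_variance, 2))
# 13.68 304.11 11.26
```

Dividing SStotal by N − 1 gives the same value as the usual sample variance (for instance, `statistics.variance(scores)`), which is why the text describes a mean square as, in effect, a variance.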
SSbetween groups tells us how far the values of M1, M2, ..., Mk are from the grand mean. If the group means are all exactly equal, SSbetween groups will be 0. SS terms can never be negative, and there is no fixed upper limit for values. A "large" value of SSbetween groups (also called SSbetween) tells us that:

* group means are far away from the grand mean, and/or
* group means are far away from one another.

What information do we need to consider to decide whether SSbetween is "large"?

First, we need to divide SS by its df. Deviations of group means from the grand mean, like deviations of individual scores from a sample mean, must sum to 0. If there are k group means, only the first k − 1 deviations of group means from the grand mean are free to vary. Thus, for SSbetween, df = k − 1 (where k is the number of groups). Dividing an SS by its df corrects for the number of independent deviations used to calculate the SS. An SS divided by its df is called a mean square. MSbetween is, in effect, the variance of the group means. For technical reasons, statisticians do not refer to MS as a variance (but essentially, that's what it is).

Second, we need to compare SSbetween with information about error variance or within-group variance. The error variance term is called SSwithin. There are several ways to compute SSwithin. The easiest way to think about it is this: First, find SS for the set of scores within each treatment group. For Treatment Group 1, find the group mean, M1; compute the deviation of each Y score in that group from M1; square the deviations; and sum the squared deviations. This yields SS1, and this tells us about variation of scores within Group 1. For a study with k = 4 groups and n = 7 cases within each group, you obtain the following: SS1, SS2, SS3, and SS4.

SSwithin is the sum of four within-group SS terms: SS1 + SS2 + SS3 + SS4. The df for MSwithin is the sum of df1, df2, df3, and df4; this can also be written as n1 + n2 + n3 + n4 − k, or N − k, where N is the total number of persons in the study and k is the number of groups.

If SS1 (or the SS for any group) = 0, that tells us that all scores within Group 1 were equal to one another. As the value of SS1 gets larger, we have evidence that a sample of people who received the same treatment have different score values, and these differences are due to other variables that influenced the outcome. In the hypothetical study of stress and anxiety, anxiety scores may be influenced by recent drug use, depression, or events in the lab.

After you calculate SStotal, SSbetween, and SSwithin, you will find that this equality holds (as long as you have not made arithmetic errors):

(13.2)  $SS_{total} = SS_{between} + SS_{within}$

This equation describes the partition (division) of total variation of Y into two sources of variance: differences among group means (SSbetween) and differences among scores within the same treatment groups (SSwithin). We hope that most of the variation between groups is due to the different types or amounts of treatment received by groups, and we usually hope that SSbetween will be large. We know that SSwithin provides information about response differences among people who received the same type of treatment and that SSwithin tells us about magnitude of experimental error; we want SSwithin to be small.

Recall that one of the effect sizes for t was η² and
that η² was the proportion of variance of Y scores that is predictable from or related to group membership. In one-way ANOVA, η² = SSbetween/SStotal. Thus, the SS terms provide effect size information.

To obtain a statistical significance test, we set up an F ratio:

(13.3)  $F = MS_{between}/MS_{within}$

Because it is a ratio of MS terms, F cannot be negative. F would be zero if all group means were equal. There is no fixed upper limit for values of F. To decide whether F is large enough to be statistically significant, we need to find a critical value of F from the table in Appendix C at the end of this book. The reject region for F is always one tailed (values in the top 5% of an F distribution, for instance). To locate the critical value that corresponds to the top 5% of the distribution, you need to know about df. The independent-samples t test required only one df term. An F ratio compares two different MS terms, and each of those MS terms has its own df, so we need to specify two different df terms:

(13.4)  $df_{between} = k - 1$

(13.5)  $df_{within} = N - k$

where k is the number of groups and N is the total number of cases.

To summarize: The by-hand computation for one-way ANOVA (with k groups and a total of N observations) involves the following steps. Complete formulas are provided in the following sections.

1. Compute SSbetween, SSwithin, and SStotal.
2. Find effect size: η² = SSbetween/SStotal.
3. Compute MSbetween by dividing SSbetween by its df, k − 1.
4. Compute MSwithin by dividing SSwithin by its df, N − k.
5. Compute an F ratio: MSbetween/MSwithin.
6. Compare this F value obtained with the critical value of F from a table of the F distribution with (k − 1) and (N − k) df (using the table in Appendix C at the end of the book that corresponds to the desired alpha level; for example, the first table provides critical values for α = .05). If the F value obtained exceeds the tabled critical value of F for the predetermined alpha level and the applicable degrees of freedom, reject the null hypothesis that all the population means are equal.

In practice, these computations are done by programs such as SPSS; you can decide whether the outcome is statistically significant by examining the p value for the F test and evaluate effect size by calculating an η².

13.5.2 SSbetween: Information About Distances Among Group Means

The following notation will be used:

Let k be the number of groups in the study.
Let n1, n2, ..., nk be the number of scores in
Groups 1, 2, ..., k.
Let Yij be the score of subject j in Group i (i = 1, 2, ..., k).
Let M1, M2, ..., Mk be the means of scores in Groups 1, 2, ..., k.
Let N be the total N in the entire study; N = n1 + n2 + ... + nk.
Let MY be the grand mean of all scores in the study (i.e., the total of all the individual scores, divided by N, the total number of scores).

Once we have calculated the means of each individual group (M1, M2, ..., Mk) and the grand mean MY, we can summarize information about the distances of the group means, Mi, from the grand mean, MY, by computing SSbetween as follows:

(13.6)  $SS_{between} = \sum_{i=1}^{k} n_i (M_i - M_Y)^2 = n_1 (M_1 - M_Y)^2 + n_2 (M_2 - M_Y)^2 + \cdots + n_k (M_k - M_Y)^2$

For the hypothetical data in Figure 13.1, the mean anxiety scores for Groups 1 through 4 were as follows: M1 = 9.86, M2 = 14.29, M3 = 13.57, and M4 = 17.00. The grand mean on anxiety, MY, is 13.68. Each group had n = 7 scores. Therefore, for this study,

SSbetween = 7 × (9.86 − 13.68)² + 7 × (14.29 − 13.68)² + 7 × (13.57 − 13.68)² + 7 × (17.00 − 13.68)²
= 7 × (−3.82)² + 7 × (.61)² + 7 × (−.11)² + 7 × (3.32)²
= 7 × 14.5924 + 7 × .3721 + 7 × .0121 + 7 × 11.0224.

SSbetween ≈ 182 (this agrees with the value of SSbetween in the SPSS output presented in Figure 13.8 except for a small amount of rounding error).

13.5.3 SSwithin: Information About Variability of Scores Within Groups

To summarize information about the variability of scores within each group, we compute MSwithin. For each group, for groups numbered i = 1, 2, ..., k, we first find the sum of squared deviations of scores relative to each group mean, SSi. The SS for scores within Group i is found by taking this sum:

(13.7)  $SS_i = \sum_{j=1}^{n_i} (Y_{ij} - M_i)^2$

That is, for each of the k groups, find the deviation of each individual score from the group mean; square and sum these deviations for all the scores in the group. These within-group SS terms for Groups 1, 2, ..., k are summed across the groups to obtain the total SSwithin:

(13.8)  $SS_{within} = \sum_{i=1}^{k} SS_i = SS_1 + SS_2 + \cdots + SS_k$

For this data set, we can find the SS term for Group 1 (for example) by taking the sum of the squared deviations of each individual score in Group 1 from the mean of Group 1, M1. The values
are shown for by-hand computations; it can be instructive to do this as a spreadsheet, entering the value of the group mean for each participant as a new variable and computing the deviation of each score from its group mean and the squared deviation for each participant.

SS1 = (10 − 9.86)² + (10 − 9.86)² + (12 − 9.86)² + (11 − 9.86)² + (7 − 9.86)² + (7 − 9.86)² + (12 − 9.86)².

SS1 = 26.86.

For the four groups of scores in the data set in Figure 13.1, these are the values of SS for each group: SS1 = 26.86, SS2 = 27.43, SS3 = 41.71, and SS4 = 26.00.

Thus, the total value of SSwithin for this set of data is SSwithin = SS1 + SS2 + SS3 + SS4 = 26.86 + 27.43 + 41.71 + 26.00 = 122.00.

13.5.4 SStotal: Information About Total Variance in Y Scores

We can also find SStotal; this involves taking the deviation of every individual score from the grand mean, squaring each deviation, and summing the squared deviations across all scores and all groups:

(13.9)  $SS_{total} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - M_Y)^2$

The grand mean MY = 13.68. The SStotal term includes 28 squared deviations, one for each participant in the data set, as follows:

SStotal = (10 − 13.68)² + (10 − 13.68)² + (12 − 13.68)² + ... + (18 − 13.68)² = 304.

As noted earlier, the SSbetween and SSwithin terms will sum to SStotal:

(13.10)  $SS_{total} = SS_{between} + SS_{within}$

For these data, SStotal = 304, SSbetween = 182, and SSwithin = 122, so the sum of SSbetween and SSwithin equals SStotal (because of rounding error, these values differ slightly from the values that appear in the SPSS output in Section 13.13).

13.5.5 Converting Each SS to a Mean Square and Setting Up an F Ratio

An F ratio is a ratio of two mean squares. A mean square is the ratio of a sum of squares to its degrees of freedom, MS = SS/df. Note that the formula for a sample variance is also SS/df. MS terms in ANOVA are similar to variances, but they are not called variances for technical reasons.

The df terms for the two MS terms in a one-way between-S ANOVA are based on k, the number of groups, and N, the total number of scores in the entire study (where N = n1 + n2 + ... + nk). The between-group SS was obtained by summing the deviations of each of the k group means from the grand mean; only the first k − 1 of these deviations are free to vary, so the between-groups df = k − 1, where k is the number of groups.

(13.11)  $df_{between} = k - 1$

In ANOVA, the mean square between groups is
calculated by dividing SSbetween by its degrees of freedom:

(13.12)  $MS_{between} = \dfrac{SS_{between}}{k - 1}$

For the data in the hypothetical study of stress and anxiety, SSbetween = 182, dfbetween = 4 − 1 = 3, and MSbetween = 182/3 = 60.7.

The df for each SS within-group term is given by n − 1, where n is the number of participants in each group. Thus, in this example, SS1 had n − 1 or df = 6. When we form SSwithin, we add up SS1 + SS2 + ... + SSk. There are (n − 1) df associated with each SS term, and there are k groups, so the total dfwithin = k × (n − 1). This can also be written as

(13.13)  $df_{within} = N - k$

where N is the total number of scores (n1 + n2 + ... + nk) and k is the number of groups. We obtain MSwithin by dividing SSwithin by its corresponding df:

(13.14)  $MS_{within} = \dfrac{SS_{within}}{N - k}$

For the hypothetical stress and anxiety data in Figure 13.1, MSwithin = 122/24 = 5.083.

Finally, we can set up a test statistic for the null hypothesis H0: μ1 = μ2 = ... = μk by taking the ratio of MSbetween to MSwithin:

(13.15)  $F = \dfrac{MS_{between}}{MS_{within}}$

Figure 13.2 Reject Region for F Distribution With 3 and 24 df Using α = .05

The horizontal axis of the graph shows the value of F, with tick marks up to 3.0, and the vertical axis shows the height of the F distribution curve. The curve is positively skewed: it peaks at about F = 0.5 and then drops gradually, approaching the baseline at about F = 3.5. The area under the curve to the right of a vertical line drawn at F = 3.01 is shaded and labeled "reject region."

For the stress and anxiety data, F = 60.702/5.083 = 11.94. This F ratio is evaluated using the F distribution with (k − 1) and (N − k) df. For this data set, k = 4 and N = 28, so df values for the F ratio are 3 and 24.

An F distribution has a shape that differs from the normal or t distribution. Because an F is a ratio of two mean squares and MS cannot be less than 0, the minimum possible value of F is 0. On the other hand, there is no fixed upper limit for the value of F. Therefore, the distribution of F tends to be
positively skewed, with a lower limit of 0, as in Figure 13.2. The reject region for significance tests with F ratios consists of only one tail (at the upper end of the distribution). The first table in Appendix C at the end of the book shows the critical values of F for α = .05. The second and third tables in Appendix C provide critical values of F for α = .01 and α = .001. In the hypothetical study of stress and anxiety, the F ratio has df equal to 3 and 24. Using α = .05, the critical value of F from the first table in Appendix C with df = 3 in the numerator (across the top of the table) and df = 24 in the denominator (along the left-hand side of the table) is 3.01. Thus, in this situation, the α = .05 decision rule for evaluating statistical significance is to reject H0 when values of F > +3.01 are obtained. A value of 3.01 cuts off the top 5% of the area in the right-hand tail of the F distribution with df equal to 3 and 24, as shown in Figure 13.2. The obtained F = 11.94 would therefore be judged statistically significant.

13.6 Patterns of Scores and Magnitudes of SSbetween and SSwithin

It is important to understand what information about pattern in the data is contained in these SS and MS terms. SSbetween is a function of the distances among the group means (M1, M2, ..., Mk); the farther apart these group means are, the larger SSbetween tends to be. Most researchers hope to find significant differences among groups, and therefore, they want SSbetween (and F) to be relatively large. SSwithin is the total of squared within-group deviations of scores from group means. SSwithin would be 0 in the unlikely event that all scores within each group were equal to one another. The greater the variability of scores within each group, the larger the value of SSwithin.

Consider the example shown in Table 13.1, which shows hypothetical data for which SSbetween would be 0 (because all the group means are equal); however, SSwithin is not 0 (because the scores vary within groups). Table 13.2 shows data for which SSbetween is not 0 (group means differ) but SSwithin is 0 (scores do not vary within groups). Table 13.3 shows data for which both SSbetween and SSwithin are nonzero. Finally, Table 13.4 shows a pattern of scores for which both SSbetween and SSwithin are 0.

[Table 13.1: SSbetween and SSwithin example; group means M1 = 6, M2 = 6, M3 = 6]

[Table 13.2: SSbetween and SSwithin example]

[Table 13.4: SSwithin and SSbetween example]
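The four patterns described in this section can be checked numerically. The small data sets below are made up for illustration (they are not the entries of Tables 13.1 to 13.4, which are not reproduced here), and the function name `ss_terms` is likewise an assumption of this sketch.

```python
def ss_terms(groups):
    """Return (SSbetween, SSwithin) for a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    # SSbetween: weighted squared distances of group means from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSwithin: squared distances of scores from their own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return ss_between, ss_within


# Hypothetical patterns (three groups of three scores each).
patterns = {
    "equal means, scores vary":      [[4, 6, 8], [5, 6, 7], [3, 6, 9]],
    "means differ, no within var.":  [[3, 3, 3], [5, 5, 5], [7, 7, 7]],
    "means differ, scores vary":     [[2, 3, 4], [4, 5, 6], [6, 7, 8]],
    "all scores identical":          [[7, 7, 7], [7, 7, 7], [7, 7, 7]],
}

for label, groups in patterns.items():
    ss_b, ss_w = ss_terms(groups)
    print(f"{label}: SSbetween = {ss_b:.1f}, SSwithin = {ss_w:.1f}")
```

In the second pattern MSwithin is 0, so the F ratio is undefined (division by zero); in the fourth pattern both terms are 0. Neither situation is likely with real data.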
13.7 Confidence Intervals for Group Means

Once we know the mean, variance, and n for each group, we can set up a confidence interval (CI) around the mean for each group or a CI for any difference between a pair of group means. Procedures for CIs were reviewed in Chapter 12, on the independent-samples t test, and are not repeated here.

13.8 Effect Sizes for One-Way Between-S ANOVA

By comparing the sizes of these SS terms that represent variability of scores between and within groups, we can make a summary statement about the comparative size of the effects of the independent and extraneous variables. The proportion of the total variability (SStotal) that is due to between-group differences is given by

(13.16)  $\eta^2 = \dfrac{SS_{between}}{SS_{total}}$

In the context of a well-controlled experiment, these between-group differences in scores are, presumably, due primarily to the manipulated independent variable; in a nonexperimental study that compares naturally occurring groups, this proportion of variance is reported only to describe the magnitudes of differences between groups, and it is not interpreted as evidence of causality. An eta squared (η²) is an effect size index given as a proportion of variance; if η² = .50, then 50% of the variance in the Yij scores is related to between-group differences. This is the same eta squared that was introduced in the previous chapter as an effect size index for the independent-samples t test; verbal labels that can be used to describe effect sizes are provided in Table 12.2. If the scores in a two-group t test are partitioned into components using the logic just described here and then summarized by creating sums of squares, the η² value obtained will be identical to the η² that was calculated from the t and df terms.

It is also possible to calculate eta squared from the F ratio and its df; this is useful when reading journal articles that report F tests without providing effect size information:

(13.17)  $\eta^2 = \dfrac{df_{between} \times F}{df_{between} \times F + df_{within}}$

An eta squared is interpreted as the proportion of variance in scores on the Y outcome variable that is predictable from group membership (i.e., from the score on X, the predictor variable). Suggested verbal labels for eta squared effect sizes were given in Table 12.2.
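The by-hand steps of Section 13.5 and the two routes to eta squared (Equations 13.16 and 13.17) can be combined in one short sketch. It assumes the Figure 13.1 data and takes the α = .05 critical value of 3.01 for 3 and 24 df from the text rather than computing it; the function and variable names are illustrative.

```python
def one_way_anova(groups):
    """By-hand one-way between-S ANOVA following Equations 13.6 to 13.15."""
    all_scores = [y for g in groups for y in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Anxiety scores from Figure 13.1, one list per stress condition.
groups = [
    [10, 10, 12, 11, 7, 7, 12],
    [17, 14, 14, 13, 11, 17, 14],
    [15, 11, 12, 14, 16, 17, 10],
    [16, 20, 14, 16, 19, 16, 18],
]
r = one_way_anova(groups)

# Eta squared from the SS terms (13.16) and from F and its df (13.17).
eta_sq_from_ss = r["ss_between"] / r["ss_total"]
eta_sq_from_f = (r["df_between"] * r["F"]) / (r["df_between"] * r["F"] + r["df_within"])

F_CRITICAL_05 = 3.01  # tabled value for 3 and 24 df quoted in the text
print(f"SSbetween = {r['ss_between']:.2f}, SSwithin = {r['ss_within']:.2f}, "
      f"SStotal = {r['ss_total']:.2f}")
print(f"F({r['df_between']}, {r['df_within']}) = {r['F']:.2f}, "
      f"reject H0: {r['F'] > F_CRITICAL_05}")
print(f"eta squared = {eta_sq_from_ss:.3f} (from SS), {eta_sq_from_f:.3f} (from F)")
```

Because the code works with unrounded means, SSbetween and SStotal come out slightly above the by-hand values of 182 and 304; the two eta squared values agree with each other, as Equation 13.17 implies.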
One alternative effect size measure sometimes used in ANOVA is called omega squared (ω²) (see Hays, 1994). The eta squared index describes the proportion of variance due to between-group differences in the sample, but it is a biased estimate of the proportion of variance that is theoretically due to differences among the populations. The ω² index is essentially a (downwardly) adjusted version of eta squared that provides a more conservative estimate of variance among population means; however, eta squared is more widely used in statistical power analysis and as an effect size measure in the literature. Cohen's f is yet another effect size, often used in statistical power analysis; f² = η²/(1 − η²).

13.9 Statistical Power Analysis for One-Way Between-S ANOVA

Table 13.5 is an example of a statistical power table that can be used to make decisions about sample size when planning a one-way between-S ANOVA with k = 3 groups and α = .05. Using Table 13.5, given the number of groups, the number of participants, the predetermined alpha level, and the anticipated population effect size estimated by eta squared, the researcher can look up the minimum n of participants per group that is required to obtain various levels of statistical power. The researcher needs to make an educated guess: How large an effect is expected in the planned study? If similar studies have been conducted in the past, the eta squared values from past research can be used to estimate effect size; if not, the researcher may have to make a guess on the basis of less exact information. The researcher chooses the alpha level (usually .05), calculates dfbetween (which equals k − 1, where k is the number of groups in the study), and decides on the desired level of statistical power (usually .80, or 80%). Using this information, the researcher can use the tables in Cohen (1988) or in Jaccard and Becker (2009) to look up the minimum sample size per group that is needed to achieve the power of 80%. For example, using Table 13.5, for an alpha level of .05, a study with three groups and dfbetween = 2, a population eta squared value of .15, and a desired level of power of .80, the minimum number of participants required per group would be 19.

[Table 13.5: statistical power table; entries not legible in this copy]

Source: Adapted from Jaccard and Becker (2009).
Note: Each table entry corresponds to the minimum n required in each group to obtain the level of statistical power shown.

Java applets are available on the web for statistical power analysis; typically, if the user identifies a Java applet that is appropriate for the specific analysis (such as between-S one-way ANOVA) and enters information about alpha, the number of groups, population effect size, and desired level of power, the applet provides the minimum per group sample size required to achieve the user-specified level of statistical power.

13.10 Planned Contrasts

The idea behind planned contrasts is that the researcher identifies a limited number of comparisons between group means before looking at the data. The test statistic that is used for each comparison is essentially identical to a t ratio, except that the denominator is usually based on the MSwithin for the entire ANOVA, rather than just the variances for the two groups involved in the comparison. Sometimes an F is reported for the significance of each contrast, but F is equivalent to t² in situations where only two group means are compared or where a contrast
suppose that the researcher has a study in which
has only ① が For the means of Groups a and b, the null hypothesis for a simple contrast between M, and My is as follows:
$$H_0: \mu_a = \mu_b$$

or

$$H_0: \mu_a - \mu_b = 0.$$

The test statistic can be in the form of a t test:

$$t = \frac{M_a - M_b}{\sqrt{MS_{\text{within}} \times (2/n)}},$$

where n is the number of cases within each group in the ANOVA. (If the ns are unequal across groups, then an average value of n is used; usually, this is the harmonic mean of the ns.)

Note that this is essentially equivalent to an ordinary t test. In a t test, the measure of within-group variability is $s_p^2$; in a one-way ANOVA, information about within-group variability is contained in the term $MS_{\text{within}}$. In cases where an F is reported as a significance test for a contrast between a pair of group means, F is equivalent to $t^2$. The df for this t test equal N − k, where N is the total number of cases in the entire study and k is the number of groups.

When a researcher uses planned contrasts, it is possible to make other kinds of comparisons that may be more complex in form than a simple pairwise comparison of means. For instance, suppose there are four groups; Group 1 receives a placebo, and Groups 2 to 4 all receive different antidepressant drugs. One hypothesis that may be of interest is whether the average depression score combined across the three drug groups is significantly lower than the mean depression score in Group 1, the group that received only a placebo.

The null hypothesis that corresponds to this comparison can be written in any of the following ways:

$$H_0: \mu_1 = \frac{\mu_2 + \mu_3 + \mu_4}{3},$$

which can be stated:

$$H_0: \mu_1 - \frac{\mu_2 + \mu_3 + \mu_4}{3} = 0.$$

In words, this null hypothesis says that when we combine the means using certain weights (such as +1, −1/3, −1/3, and −1/3), the resulting composite is predicted to have a value of 0. This is equivalent to saying that the mean outcome averaged or combined across Groups 2 to 4 (which received three different types of medication) is equal to the mean outcome in Group 1 (which received no medication). Weights that define a contrast among group means are called contrast coefficients. Usually, contrast coefficients are constrained to sum to 0, and the coefficients themselves are usually given as integers for reasons of simplicity. If we multiply this set of contrast coefficients by 3 (to get rid of the fractions), we obtain the following set of contrast coefficients that can be used to see if the combined mean of Groups 2 to 4 differs from the mean of Group 1: (+3, −1, −1, −1).
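The pairwise comparison and the rescaling of contrast weights described above can be sketched in a few lines of Python. This is only an illustration: the group means, the value of MS_within, and the group size below are hypothetical numbers, and the function name is not taken from any statistics package.

```python
import math
from fractions import Fraction


def pairwise_contrast_t(mean_a, mean_b, ms_within, n_per_group):
    """t ratio for two group means, with MS_within as the error variance."""
    standard_error = math.sqrt(ms_within * 2.0 / n_per_group)
    return (mean_a - mean_b) / standard_error


# Hypothetical study: k = 4 groups with n = 10 cases each.
k_groups = 4
n_per_group = 10
total_n = k_groups * n_per_group

t_ratio = pairwise_contrast_t(mean_a=20.0, mean_b=15.0,
                              ms_within=25.0, n_per_group=n_per_group)
df_error = total_n - k_groups   # df = N - k
f_equivalent = t_ratio ** 2     # an F for the same pairwise contrast equals t squared

# Fractional weights for "Group 1 versus the average of Groups 2 to 4",
# multiplied by 3 to obtain integer contrast coefficients.
fractional_weights = [Fraction(1), Fraction(-1, 3), Fraction(-1, 3), Fraction(-1, 3)]
integer_weights = [int(w * 3) for w in fractional_weights]

print(round(t_ratio, 3), df_error, round(f_equivalent, 3))
print(integer_weights, sum(integer_weights))
```

Both versions of the weights sum to 0, so multiplying by 3 changes only the scale of the contrast, not the hypothesis it tests.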
itself, does not imply causation.

If X causes Y, we would expect to find a statistical relationship between X and Y using the appropriate bivariate statistic (such as r, t test, chi squared, or other analyses). Evidence that X and Y co-occur or are statistically related is a necessary condition for any claim that X might cause or influence Y. Statistical association is a necessary, but not sufficient, condition for causal inference. We need to be able to rule out rival explanations before we claim that X causes Y. The additional evidence needed to make causal inferences was discussed in Chapter 2.

When we interpret correlation results, we must be careful not to use causal-sounding language unless other conditions for causal inference are met. We should not report correlation results using words such as cause, determine, and influence unless data come from a carefully designed study that makes it possible to rule out rival explanations.

10.3 How Sign and Magnitude of r Describe an X, Y Relationship

Before you obtain a correlation, you need to examine an X, Y scatterplot to see if the association between X and Y is approximately linear. Pearson's r provides useful information only about linear relationships. Additional assumptions required for Pearson's r will be discussed later.

Values of r can range from −1.00 through 0 to +1.00. If assumptions for the use of r are satisfied, then the value of Pearson's r tells us two things. The sign of r tells us the direction of the association. For positive values of r, as values of X increase, values of Y also tend to increase. For negative values of r, as values of X increase, values of Y tend to decrease. This tells us the nature or direction of the association. The absolute magnitude of r (without the plus or minus sign) indicates the strength of the association. If r is near 0, there is little or no association between X and Y. As r increases in absolute value, there is a stronger association.

10.4 Setting up Scatterplots

Initial evaluation of linearity is based on visual examination of scatterplots. Consider the data in the file perfect linear association scatter data.sav in Figure 10.1. In this imaginary data, number of cars sold (X) is the predictor of a salesperson's salary (Y). A scatterplot can be set up by hand. If you already know how to set up a scatterplot and graph a straight line, you may skip to Section 10.5.

To create a scatterplot for the X variable cars_sold and the Y variable salary, set up a graph with values of 0 through 10 marked on the X axis (this corresponds to the range of scores for cars_sold, the predictor), and $10,000 through $25,000 marked on the Y axis (the range of scores for salary, the outcome) as shown in Figure 10.2. If one variable is clearly the predictor or causal variable, that variable is placed on the X axis; in this example, cars_sold predicts salary. To graph one data point, look at one line of data in the file. The ninth line has X = 8 for number of cars sold and Y = $22,000 for salary. Locate the value of X (number of cars sold = 8) on the horizontal axis. Then, move straight up from that value of X to the corresponding value of Y, salary, which is $22,000. Place a dot at the location for that combination of values of X and Y. When pairs of X, Y scores are placed in the graph for all 11 cases,
coefficients. First, you list the coefficients for Contrasts 1 and 2 (make sure that each set of coefficients sums to 0, or this shortcut will not produce valid results).

Contrast 1: (−2, −1, 0, +1, +2)

Contrast 2: (+1, −1, 0, 0, 0)

You cross-multiply each pair of corresponding coefficients (i.e., the coefficients that are applied to the same group) and then sum these cross products. In this example, you get

Contrast 1 (C1)   Contrast 2 (C2)   Cross product of C1 × C2
−2                +1                (−2)(+1) = −2
−1                −1                (−1)(−1) = +1
 0                 0                (0)(0) = 0
+1                 0                (+1)(0) = 0
+2                 0                (+2)(0) = 0
Sum = 0           Sum = 0           Sum of cross products = −1

In this case, the sum of the cross products is −1. This means that the two contrasts above are not independent or orthogonal; some of the information that they contain about differences among means is redundant. Consider a second example that illustrates a situation in which the two contrasts are orthogonal or independent:

Linear (C1)       Curvilinear (C2)  Cross product of C1 × C2
−2                +1                (−2)(+1) = −2
−1                 0                (−1)(0) = 0
 0                −2                (0)(−2) = 0
+1                 0                (+1)(0) = 0
+2                +1                (+2)(+1) = +2
Sum = 0           Sum = 0           Sum of cross products = 0

In this second example, the curvilinear contrast is orthogonal to the linear trend contrast.

In a one-way ANOVA with k groups, it is possible to have up to (k − 1) orthogonal contrasts. The preceding discussion of contrast coefficients assumed that the groups in the one-way ANOVA had equal ns. When the ns in the groups are unequal, it is necessary to adjust the values of the contrast coefficients so that they take unequal group size into account; this is done automatically in programs such as SPSS.

13.11 Post Hoc or “Protected” Tests

If the researcher wants to make all possible comparisons among groups or does not have a theoretical basis for choosing a limited number of comparisons before looking at the data, it is possible to use test procedures that limit the risk for Type I error by using “protected” tests. Protected tests use a more stringent criterion than would be used for planned contrasts in judging whether any given pair of means differs significantly. One method for setting a more stringent test criterion is the Bonferroni procedure, described in Chapter 10. The Bonferroni procedure requires that the data analyst use a more conservative (smaller) alpha level to judge whether each individual comparison between group means is statistically significant. For instance, in a one-way ANOVA with k = 5 groups, there are k × (k − 1)/2 = 10 possible pairwise comparisons of group means. If the researcher wants to limit the overall experiment-wise risk for Type I error ($EW_\alpha$) for the entire set of 10 comparisons to .05, one possible way to achieve this is to set the $PC_\alpha$ level for each individual significance test between means at $EW_\alpha$/(number of post hoc tests to be performed). For example, if the experimenter wants an experiment-wise α of .05 when doing k = 10 post hoc comparisons between groups, the alpha level for each individual test would be set at $EW_\alpha$/k, or .05/10, or .005 for each individual test. The t test could be calculated using the same formula as for an ordinary t test, but it would be judged significant only if its obtained p value were less than .005. The Bonferroni procedure is extremely conservative, and many researchers prefer less conservative methods of limiting the risk for Type I error. (One way to make the Bonferroni procedure less conservative is to set the experiment-wise alpha to some higher value, such as .10.)

Dozens of post hoc or protected tests have been developed to make comparisons among means in ANOVA that were not predicted in advance. Some of these procedures are intended for use with a limited number of comparisons; other tests are used to make all possible pairwise comparisons among group means. Some of the better known post hoc tests include the Scheffé test, the Newman-Keuls test, and the Tukey HSD test. The Tukey HSD test has become popular because it is moderately conservative and easy to apply; it can be used to perform all possible pairwise comparisons of means and is available as an option in widely used computer programs such as SPSS. The menu for the SPSS one-way ANOVA procedure includes the Tukey HSD test as one of many options for post hoc tests; SPSS calls it the Tukey procedure.
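The Bonferroni arithmetic in this section can be checked with a short Python sketch. The function names are illustrative only (they are not part of SPSS or any library); the numbers are the ones used in the text, k = 5 groups and an experiment-wise alpha of .05.

```python
def n_pairwise_comparisons(k_groups):
    """Number of possible pairwise comparisons among k group means: k(k - 1)/2."""
    return k_groups * (k_groups - 1) // 2


def bonferroni_alpha(experimentwise_alpha, n_tests):
    """Per-comparison alpha: the experiment-wise alpha divided by the number of tests."""
    return experimentwise_alpha / n_tests


n_tests = n_pairwise_comparisons(5)                    # five groups, as in the text
per_comparison_alpha = bonferroni_alpha(0.05, n_tests)

print(n_tests, round(per_comparison_alpha, 4))
```

Each individual comparison is then judged significant only if its p value falls below the per-comparison alpha.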
The Tukey HSD test (and several similar post hoc tests) uses a different method of limiting the risk for Type I error. Essentially, the Tukey HSD test uses the same formula as a t ratio, but the resulting test ratio is labeled q rather than t to remind the user that it should be evaluated using a different sampling distribution. The Tukey HSD test and several related post hoc tests use critical values from a distribution called the “Studentized range statistic,” and the test ratio is often denoted by the letter q:

$$q = \frac{M_a - M_b}{\sqrt{MS_{\text{within}}/n}},$$
(13.19)

where a and b denote any two groups. Values of the q ratio are compared with critical values from tables of the Studentized range statistic (see the table in Appendix F at the end of the book). The Studentized range statistic is essentially a modified version of the t distribution. Like t, its distribution depends on the numbers of subjects within groups, but the shape of this distribution also depends on k, the number of groups. As the number of groups (k) increases, the number of pairwise comparisons also increases. To protect against inflated risk for Type I error, larger differences between group means are required for rejection of the null hypothesis as k increases. The distribution of the Studentized range statistic is broader and flatter than the t distribution and has thicker tails; thus, when it is used to look up critical values of q that cut off the most extreme 5% of the area in the upper and lower tails, the critical values of q are larger than the corresponding critical values of t.

This formula for the Tukey HSD test could be applied by computing a q ratio for each pair of sample means and then checking to see if the obtained q for each comparison exceeded the critical value of q from the table of the Studentized range statistic. However, in practice, a computational shortcut is often preferred. The formula is rearranged so that the cutoff for judging a difference between groups to be statistically significant is given in terms of differences between means rather than in terms of values of a q ratio.

$$HSD = q_{\text{critical}} \times \sqrt{\frac{MS_{\text{within}}}{n}}.$$
(13.20)

Then, if the obtained difference between any pair of means (such as $M_a - M_b$) is greater in absolute value than this HSD, this difference between means is judged statistically significant.

An HSD criterion is computed by looking up the appropriate critical value of q, the Studentized range statistic, from a table of this distribution (see the table in Appendix F). The critical q value is a function of both n, the average number of subjects per group, and k, the number of groups in the overall one-way ANOVA. As in other test situations, most researchers use the critical value of q that corresponds to α = .05, two tailed. This critical q value obtained from the table is multiplied by the error term to yield HSD. This HSD is used as the criterion to judge each obtained difference between sample means. The researcher then computes the absolute value of the difference between each pair of group means ($M_1 - M_2$), ($M_1 - M_3$), and so forth. If the absolute value of a difference between group means exceeds the HSD value just calculated, then that pair of group means is judged to be significantly different.

When a Tukey HSD test is requested from SPSS, SPSS provides a summary table that shows all possible pairwise comparisons of group means and reports whether each of these comparisons is significant. If the overall F for the one-way ANOVA is statistically significant, it implies that there should be at least one significant contrast among group means. However, it is possible to have situations in which a significant overall F is followed by a set of post hoc tests that do not reveal any significant differences among means. This can happen because protected post hoc tests are somewhat more conservative and thus require slightly larger between-group differences as a basis for a decision that differences are statistically significant, than the overall one-way ANOVA.

13.12 One-Way Between-S ANOVA in SPSS

To run the one-way between-S ANOVA procedure in SPSS, make the following menu selections from the menu bar at the top of the Data View worksheet, as shown in Figure 13.3: Analyze → Compare Means → One-Way ANOVA. This opens the dialog box in Figure 13.4. Enter the name of one (or several) dependent variables into the pane labeled “Dependent List”; enter the name of the categorical variable that provides group membership information into the box labeled “Factor.” For this example, additional windows were accessed by clicking on the buttons marked Post Hoc, Contrasts, and Options. The screenshots that correspond to this series of dialog boxes appear in Figures 13.4 through 13.7.

Figure 13.3 SPSS Menu Selections for One-Way Between-S ANOVA

The details are as follows.

“Analyze” is the sixth tab from the left on the menu bar on top.
“Compare means” is the fifth option from the top of the drop-down menu.

“One-way ANOVA” is the last option of six given. Arrows are shown against options “compare means” and “one-way ANOVA.”

Figure 13.4 One-Way ANOVA Dialog Box

The details are as follows.

On the left: An unlabeled pane.

On the top center: A pane labelled “dependent list” with the entry “anxiety.” To its right, three buttons labelled “contrasts,” “post hoc,” and “options.”

Lower center: A pane labelled “factor” with the entry “stress.”

Bottom row: Five buttons labelled “OK,” “paste,” “reset,” “cancel,” and “help.”

Figure 13.5 One-Way ANOVA: Post Hoc Multiple Comparisons Dialog Box

The details are as follows.

The pane on top labelled “equal variances assumed” shows 14 choices in three columns, of which “Tukey” has been checked, the second choice from the top in the central column. The pane below is labelled “equal variances not assumed,” and shows 4 choices. A dialog box below that, labelled “significance level,” has the entry “0.05.”

In the lower margin are three buttons: continue; cancel; and help.

Figure 13.6 Specification of a Planned Contrast
One-Way ANOVA: Contrasts

Group                  Contrast coefficient
1 No stress            +3
2 Mental arithmetic    −1
3 Stress role play     −1
4 Mock job interview   −1

The null hypothesis about a weighted linear composite of means that is represented by this set of contrast coefficients:

$$(+3)\mu_1 + (-1)\mu_2 + (-1)\mu_3 + (-1)\mu_4 = 0$$

or

$$\mu_1 - \frac{\mu_2 + \mu_3 + \mu_4}{3} = 0$$

or

$$\mu_1 = \frac{\mu_2 + \mu_3 + \mu_4}{3}.$$

The details are as follows, from the top downward.

• An option “polynomial” left unchecked.
• A pane labeled “contrast 1 of 1” with a box labelled “coefficients.” Options include 3; minus 1; minus 1; minus 1.
• Coefficient total is shown as 0.000.
• To the right of the pane is a button labelled “next.”
• At the bottom are three buttons: continue; cancel; and help.
From the menu of post hoc tests, this example uses the one SPSS calls “Tukey” (this corresponds to the Tukey HSD test). To define a contrast that compares the mean of Group 1 (no stress) with the mean of the three stress treatment groups combined, these contrast coefficients are entered one at a time: +3, −1, −1, −1. From the list of options, “Descriptive” statistics and “Homogeneity of variance test” were selected by placing checks in the boxes next to the names of these tests.

Figure 13.7 One-Way ANOVA: Options Dialog Box
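Before contrast coefficients such as +3, −1, −1, −1 are entered, the two checks described in the discussion of orthogonal contrasts (each set sums to 0; the cross products of two sets sum to 0 if the sets are orthogonal) can be scripted. The sketch below is a minimal illustration for the equal-n case only. The function names are hypothetical, and the curvilinear set (+1, 0, −2, 0, +1) is assumed here to be the usual quadratic trend set for five groups.

```python
def sums_to_zero(coefficients):
    """A valid set of contrast coefficients sums to 0."""
    return sum(coefficients) == 0


def cross_product_sum(contrast_1, contrast_2):
    """Sum of cross products of coefficients applied to the same groups."""
    if len(contrast_1) != len(contrast_2):
        raise ValueError("Both contrasts must have one coefficient per group.")
    return sum(c1 * c2 for c1, c2 in zip(contrast_1, contrast_2))


def are_orthogonal(contrast_1, contrast_2):
    """With equal ns, two contrasts are orthogonal if their cross products sum to 0."""
    return cross_product_sum(contrast_1, contrast_2) == 0


linear = [-2, -1, 0, 1, 2]
pairwise = [1, -1, 0, 0, 0]
curvilinear = [1, 0, -2, 0, 1]     # assumed quadratic trend coefficients
stress_contrast = [3, -1, -1, -1]  # no stress versus the three stress groups

print(cross_product_sum(linear, pairwise), are_orthogonal(linear, pairwise))
print(cross_product_sum(linear, curvilinear), are_orthogonal(linear, curvilinear))
print(sums_to_zero(stress_contrast))
```

When the ns are unequal, this simple check does not apply as written, because the coefficients must first be adjusted for group size.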
The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r equals plus .5. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4.

There are three ellipses within which most of the data points are clustered. There are many outliers, but several points lie within the ellipses. The ellipses are vertical.

For a mean GPA of 1.4, the first ellipse has 6 data points. They are clustered around the 2 GPA and 400 SAT score levels.

The second ellipse is for a mean GPA of 2.0. The data points are clustered around the 1 to 3 GPA range and the 500 to 600 SAT score levels. There are around 18 such data points. There are many points close to the ellipse, but not contained within it.

The third ellipse is for a mean GPA of 2.6. Here, data points are fewer, just around 5, and are clustered around the 3 GPA level and 700 SAT level.

A straight line drawn between the means of the ellipses is almost linear.

Figure 10.7 Hypothetical Scatterplot for r = +.20

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r equals plus .2. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4.

There are two ellipses within which many data points are clustered. There are many outliers, and several of these points lie between the ellipses.

For a mean GPA of 2.1, the first ellipse has 8 data points. They are clustered around the 1 to 3 GPA and 400 SAT score levels.

The second ellipse is for a mean GPA of 2.4. The data points are clustered around the 1.5 to 3.5 GPA range and the 650 to 700 SAT score levels. There are around 5 such data points.

Most of the other points lie between the two ellipses and not inside them, while a straight line drawn between the means of both ellipses is almost horizontal.

10.6 Different Situations in Which r = .00
Descriptives: ANXIETY

The table reports N, mean, standard deviation, standard error, 95% confidence interval for the mean (lower and upper bound), minimum, and maximum for each stress group (None, Mental arithmetic, Stress role play, Mock job interview) and for the total sample.

Test of Homogeneity of Variances: ANXIETY

Levene statistic   df1   df2   Sig.
.453               3     24    .718

ANOVA: ANXIETY

                 Sum of squares   df   Mean square   F
Between groups   182.107          3    60.702        11.94
Within groups    122.000          24   5.083
Total            304.107          27

Figure 13.9 SPSS Output for Planned Contrasts

Contrast Coefficients

           STRESS
Contrast   None   Mental arithmetic   Stress role play   Mock job interview
1          3      −1                  −1                 −1

Contrast Tests

ANXIETY                           Contrast   Value of contrast   Std. error   t        df
Assume equal variances            1          −15.29              2.952        −5.176   24
Does not assume equal variances   1          −15.29              2.832        −5.397   11.058

Figure 13.10 shows the results for the Tukey HSD tests that compared all possible pairs of group means. The table “Multiple Comparisons” gives the difference between means for all possible pairs of means (note that each comparison appears twice; that is, Group a is compared with Group b, and in another row, Group b is compared with Group a). Examination of the “Sig.” or p values indicates that several of the pairwise comparisons were significant at the .05 level. The results are displayed in a more easily readable form in the last panel under the heading “Homogeneous Subsets.” Each subset consists of group means that were not significantly different from one another using the Tukey test. The no-stress group was in a subset by itself; in other words, it had significantly lower mean anxiety than any of the three stress intervention groups. The second subset consisted of the stress role play and mental arithmetic groups, which did not differ significantly in anxiety. The third subset consisted of the mental arithmetic and mock job interview groups.

Figure 13.10 SPSS Output for Post Hoc Test (Tukey HSD)

Multiple Comparisons (dependent variable: ANXIETY; Tukey HSD): the table reports the mean difference, standard error, Sig., and 95% confidence interval (lower and upper bound) for each pair of stress groups. Note: Asterisk signifies the mean difference is significant at the 0.05 level.

Homogeneous Subsets (ANXIETY; Tukey HSD; uses harmonic mean sample size equal to 7.000): the table reports N and the mean for each stress group within subsets for alpha equal to .05. Means for groups in homogeneous subsets are displayed.

Note that it is possible for a group to belong to more than one subset; the anxiety score for the mental arithmetic group was not significantly different from the stress role play or the mock job interview groups. However, because the stress role play group differed significantly from the mock job interview group, these three groups did not form one subset.

Note also that it is possible for all the Tukey HSD comparisons to be nonsignificant even when the overall F for the one-way ANOVA is statistically significant. This can happen because the Tukey HSD test requires a slightly larger difference between means to achieve significance.

In this imaginary example, as in some research studies, the outcome measure (anxiety) is not a standardized test for which we have norms. The numbers by themselves do not tell us whether the mock job interview participants were moderately anxious or twitching, stuttering wrecks. Studies that use standardized measures can make comparisons with test norms to help readers understand whether the group differences were large enough to be of clinical or practical importance. Alternatively, qualitative data about the behavior of participants can also help readers understand how substantial the group differences were.
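The HSD shortcut from Section 13.11 (multiply the critical q by the error term, the square root of MS_within/n, then compare each absolute difference between means with the result) can be sketched for this example. MS_within = 5.083 and n = 7 come from the SPSS output. Two inputs are assumptions made for illustration: the critical value q = 3.90 is taken from a standard Studentized range table for k = 4 groups, df = 24, and α = .05, and the group means are the approximate values read from the bar chart in Figure 13.11, not the exact SPSS means.

```python
import math
from itertools import combinations

MS_WITHIN = 5.083    # mean square within groups, from the ANOVA table
N_PER_GROUP = 7      # cases per group (harmonic mean sample size = 7.000)
Q_CRITICAL = 3.90    # assumed table value: Studentized range, k = 4, df = 24, alpha = .05

# Approximate group means (assumption: read from the bar chart, not exact output).
group_means = {
    "None": 10.0,
    "Mental arithmetic": 14.0,
    "Stress role play": 13.5,
    "Mock job interview": 17.0,
}

# HSD = q_critical * sqrt(MS_within / n)
hsd = Q_CRITICAL * math.sqrt(MS_WITHIN / N_PER_GROUP)

# A pair of means differs significantly if the absolute difference exceeds HSD.
significant_pairs = [
    (a, b)
    for a, b in combinations(group_means, 2)
    if abs(group_means[a] - group_means[b]) > hsd
]

print(round(hsd, 2))
for pair in significant_pairs:
    print(pair)
```

With these approximate means, the no-stress group differs from each of the three stress groups, and stress role play differs from mock job interview, which agrees with the homogeneous subsets described in the text.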
Figure 13.11 Bar Chart for Group Means With 95% Confidence Intervals

The horizontal axis of the graph shows the different categories for which stress has been measured, and the vertical axis shows the mean anxiety. The error bars show a confidence interval of 95%.

The details are presented here in a table, with all values approximated from the graph.

Stress               Mean   Lower CI boundary   Upper CI boundary
None                 10     8                   12
Mental arithmetic    14     12.5                16.5
Stress role play     13.5   11                  16
Mock job interview   17     15                  19

SPSS one-way ANOVA does not provide an effect size measure, but this can easily be calculated by

anxiety and stress.

Results

A one-way between-S ANOVA was done to compare the mean scores on an anxiety scale (0 = not at all anxious, 20 = extremely anxious) for participants who were randomly assigned to one of four groups: Group 1, control group/no stress; Group 2, mental arithmetic; Group 3, stressful role play; and Group 4, mock job interview. Examination of a histogram of anxiety scores indicated that the scores were approximately normally distributed with no extreme outliers. Prior to the analysis, the Levene test for homogeneity of variance was used to examine whether there were serious violations of the homogeneity of variance assumption across groups, but no significant violation was found, F(3, 24) = .453, p = .72. The overall F for the one-way ANOVA was statistically significant, F(3, 24) = 11.94, p
N > 50 + 8k or N > 104 + k (whichever is larger) for regression analysis. This implies that N should be at least 105 when using one predictor variable. This is consistent with sample size suggestions from Schönbrodt (2011), discussed in Chapter 10. Even if statistical power tables may suggest that N < 100 can give adequate statistical power for significance tests of b and r, it is preferable to have N > 100.

scores on Y from raw scores on X (Figure 11.6).

(salary) and the name of the predictor variable

Figure 11.4 SPSS Menu Selections for Linear Regression
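The sample size rule of thumb quoted above for regression (50 + 8k or 104 + k, whichever is larger, where k is the number of predictors) is easy to tabulate. The sketch below treats both expressions as the smallest acceptable N, which reproduces the value of 105 given in the text for one predictor; that reading of the inequality, and the function name, are assumptions.

```python
def minimum_n_for_regression(k_predictors):
    """Larger of the two rule-of-thumb values, 50 + 8k and 104 + k."""
    return max(50 + 8 * k_predictors, 104 + k_predictors)


# Suggested minimum N for one to eight predictors.
for k in range(1, 9):
    print(k, minimum_n_for_regression(k))
```

For a small number of predictors the 104 + k term is the larger of the two; the 50 + 8k term takes over once k reaches 8.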
by any overlap is labelled lower case c.

A note reads: Partition of sum of squares Y for an orthogonal factorial ANOVA.

A legend for the area reads as follows:

• SS subscript A is lower case a
• SS subscript B is lower case b
• SS subscript A times B is lower case a, b
• SS subscript within is lower case c

The total area of the circle that represents Y corresponds to $SS_{\text{total}}$, and the overlap between the circles that represent Factor A and Y is equivalent to $SS_A$. The proportion of explained variance (represented by the overlap between Y and A) is equivalent to $SS_A/SS_{\text{total}}$. When we have an orthogonal factorial design, there is no confound between the group memberships on the A and B factors; the predictive Factors A and B do not compete to explain the same variance. The A × B interaction is also orthogonal to (uncorrelated with) the main effects of the A and B factors. Therefore, when we diagram the variance-partitioning situation for a factorial ANOVA (as in Figure 16.16), there is no overlap between the circles that represent A, B, and the A × B interaction. Because these group membership variables are independent of each other in an orthogonal design, there is no need to statistically control for any correlation between predictors.

16.C.2 Partition of Variance in Nonorthogonal Factorial ANOVA

When the ns in the cells are not balanced, it implies that group memberships (on the A and B factors) are not independent; in such situations, there is a confound or correlation between the A and B factors, and probably also between the main effects and the A × B interaction. In a nonorthogonal factorial ANOVA, the predictor variables (or factors) are correlated; they compete to explain some of the same variance. This is like a situation you will encounter when you learn to use more than one predictor in regression analysis. Correlated predictors in regression compete to explain some of the same variance in the Y outcome variable, and confounded factors compete to explain the same variance in the dependent variable in factorial ANOVA. The variance-partitioning problem in a nonorthogonal factorial ANOVA is illustrated by the diagram in Figure 16.17. In discussions of multiple regression, a similar problem arises in partition of variance. When we want to predict scores on Y from intercorrelated continuous predictor variables $X_1$ and $X_2$, we need to take into account the overlap in variance that could be explained by these variables. In fact, we can use variance-partitioning strategies similar to those that can be used in multiple regression to compute the SS terms in a nonorthogonal factorial ANOVA. There are several ways to handle the problem of variance partitioning in regression analysis, and the same logic can be used to evaluate variance partitioning in nonorthogonal factorial ANOVA.

Figure 16.17 Partition of Sums of Squares in Nonorthogonal Factorial ANOVA
The Residuals section is below this. Here there are check options for Durbin-Watson and casewise diagnostics. Both have been left unmarked.

At the bottom are option buttons for continue, cancel, and help.

11.12 SPSS Output: Salary Data

To see the equivalence between Pearson's r and parts of the results of the bivariate regression, Pearson's r between years and salary was obtained using the SPSS correlations procedure; results appear in Figure 11.7.

Complete SPSS regression output includes additional information (discussed in Volume II [Warner, 2020]). Figure 11.8 shows the results needed to find the proportion of predicted and unpredicted variance ($R^2$ and $1 - R^2$) and to write out the two versions of the regression equations (raw score and standardized). From the top of Figure 11.8, the proportion of variance in salary that can be predicted from years of experience is $r^2$ or $R^2$, that is, .688 or about 69%. When regression includes more than one predictor, multiple R tells us how well the entire set of predictor variables can predict Y. In this example, the regression equation has only one predictor. When there is only one predictor variable, Pearson's r between X and Y is the same as multiple R for the equation that uses X to predict Y. (You can ignore the other information in the top panel of Figure 11.8 for now. The standard error of the estimate is discussed later in the chapter and is not usually included in research reports. The adjusted $R^2$ value is only used when a regression has more than one predictor variable.)

Table 11.1 relabels and rearranges the elements of the coefficient table in the SPSS output so that you can relate them to terms in the textbook. The top panel of the SPSS output in Figure 11.7 gives results for R (capital R is called multiple R).

On the basis of information in Table 11.1 we can write the unstandardized regression equation to predict salary in dollars from experience in years, as follows:

$$Y' = 31{,}416.72 + 2{,}829.57 \times \text{years}.$$

Figure 11.7 Pearson's r for Years and Salary

Correlations

                                                       Unstandardized Predicted Value   salary
Unstandardized Predicted Value   Pearson Correlation   1                                .830**
                                 Sig. (2-tailed)                                        .000
                                 N                     50                               50
salary                           Pearson Correlation   .830**                           1
                                 Sig. (2-tailed)       .000
                                 N                     50                               50

** Correlation is significant at the 0.01 level (2-tailed).
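The unstandardized regression equation and the proportion of predicted variance reported in this section can be reproduced with a few lines of Python. The intercept, slope, and correlation are the values quoted in the text; the function name and the choice of 10 years as an example are illustrative assumptions.

```python
INTERCEPT = 31416.72   # predicted salary (dollars) at 0 years of experience
SLOPE = 2829.57        # predicted change in salary (dollars) per additional year


def predicted_salary(years):
    """Raw-score regression equation: Y' = intercept + slope * years."""
    return INTERCEPT + SLOPE * years


r = 0.830                      # Pearson's r between predicted and actual salary (Figure 11.7)
r_squared = r ** 2             # proportion of variance in salary that is predicted
unpredicted = 1 - r_squared    # proportion of variance that is not predicted

salary_at_10_years = predicted_salary(10)

print(round(salary_at_10_years, 2))
print(round(r_squared, 3), round(unpredicted, 3))
```

Squaring the rounded r of .830 gives a value close to, but not exactly, the .688 reported by SPSS, which is computed from the unrounded correlation.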
methods produce identical results. By default, the SPSS GLM procedure uses the Type III computation; you click the Custom Models button in the main GLM dialog box to select other types of computation methods for sums of squares, such as SS Type I, and to specify the order of entry of predictors and indicate whether some terms, such as interactions, should be included or excluded from the model.

Other optional methods of computing sums of squares can be requested from the SPSS GLM custom model (SS Types II and IV). These are more complicated and are rarely used.

Appendix 16D: Model for Factorial ANOVA

The theoretical model for ANOVA is an equation that represents each individual score as a sum of effects of all the theoretical components in the model. In one-way ANOVA, the model for the score of Person j in Treatment Group $A_i$ can be represented as follows:

$Y_{ij} = \mu_Y + \alpha_i + \varepsilon_{ij}$,

where $\mu_Y$ is the grand mean of Y, $\alpha_i$ represents the “effect” of the ith level of the A factor on people's scores, and $\varepsilon_{ij}$ represents the residual or error, which captures any unique factors that influenced each person’s score. The $\mu_Y$ term is estimated by the sample grand mean $M_Y$; the $\alpha_i$ term is estimated by $M_{A_i} - M_Y$, the distance of each sample group mean from the sample grand mean; and the residual $\varepsilon_{ij}$ is estimated by the difference between the individual $Y_{ij}$ score and the mean for the ith level of Factor A, that is, $\varepsilon_{ij} = Y_{ij} - M_{A_i}$.

In a two-way factorial ANOVA, we need to add a second term to this model to represent the main effect for a second factor (B) and, also, a third term to represent a possible interaction between the A and B factors. The following effect components can be estimated for each participant once we have calculated the grand mean and all of the cell, row, and column means. The following equations show the population parameter (e.g., $\alpha_i$) and the information from the sample used to estimate it (e.g., $M_{A_i} - M_Y$).

Let $\alpha_i$ be the effect of Level i for Factor A:

(16.26)
$\alpha_i = M_{A_i} - M_Y$.

Let $\beta_j$ be the effect of Level j for Factor B:

(16.27)
$\beta_j = M_{B_j} - M_Y$.

Let $\alpha\beta_{ij}$ be the interaction effect for the ij cell:

(16.28)
$\alpha\beta_{ij} = M_{AB_{ij}} - M_Y - \alpha_i - \beta_j$.

Let $\varepsilon_{ijk}$ be the residual, or unexplained part, of each individual score:
(16.29)
$\varepsilon_{ijk} = Y_{ijk} - M_{AB_{ij}}$.

For each observation in an A × B factorial model, each individual observed $Y_{ijk}$ score corresponds to an additive combination of these theoretical effects:

(16.30)
$Y_{ijk} = \mu_Y + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}$.

The “no-interaction” null hypothesis is equivalent to the assumption that for all cells, this $\alpha\beta$ term is equal to or close to zero. For a two-way factorial analysis, the null hypothesis (of no interaction) can be written as follows:

(16.31)
$H_0$: $\alpha\beta_{11} = \alpha\beta_{12} = \alpha\beta_{21} = \alpha\beta_{22} = 0$.

If we do not find a statistically significant F for the interaction, scores can be adequately predicted from the reduced (no-interaction) model, also called the “additive model”:

(16.32)
$Y_{ijk} = \mu_Y + \alpha_i + \beta_j + \varepsilon_{ijk}$.

The equation for this reduced (no-interaction) model says that the Y score for each person and/or the mean of Y for each cell can be predicted from just the additive main effects of the A and B factors. When there is no interaction, we do not need to add an adjustment factor ($\alpha\beta$) to predict the mean for each cell. The $\alpha\beta$ effect represents something “different” that happens for particular combinations of levels of A with levels of B, which cannot be anticipated simply by summing their main effects. Thus, the null hypothesis of “no interaction” can be written algebraically: $H_0$: $\alpha\beta_{ij} = 0$, for all i and j.

The theoretical terms (the $\mu$, $\alpha$, $\beta$, $\alpha\beta$, and $\varepsilon$ effects) may be easier to comprehend when you see that each observed symptom score can be separated into estimates of these components. That is, we can obtain a numerical estimate for each effect for each participant. The following example uses the social support, stress, and symptoms data in the SPSS file socialsupportstress.sav that was discussed earlier in this chapter.

Each individual Y score can be represented as the sum of the following components:

Y = G_MEAN + A_EFF + B_EFF + AB_EFF + RESIDUAL

or

Symptoms = G_MEAN + A_EFF + B_EFF + AB_EFF + RESIDUAL

In the SPSS data set scorecomponents.sav, shown in Figure 16.18, each individual person’s score is divided into the components described in Table 16.7.

For all persons in the study, the value of G_MEAN is the same. It is the grand mean for symptom scores in the entire study. For all persons in the same social support group, A_EFF is the same. The A effect is the mean of the A group the person belongs to minus the grand mean. For all persons in the same stress group, B_EFF is the same.

For all persons in the same cell, the AB_EFF is the same. The AB effect is based on the mean for the cell that the person belongs to minus the grand mean, the A effect, and the B effect. The AB cell effect is calculated as follows: AB_EFF = $M_{AB_{ij}}$ − ($M_Y$ + A_EFF + B_EFF). Members of each cell have the same value for AB_EFF.

For each person, there is a unique value of RESIDUAL (deviation of individual score from cell mean). It is obtained by subtracting the cell mean from the individual score.
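The decomposition just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eight scores in `CASES` are invented (they are not the social support data), the design is assumed to be balanced, and the function name `score_components` is hypothetical. The component names follow the labels used in the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical balanced 2 x 2 data: (level of A, level of B, Y score).
# These eight scores are invented for illustration; they are not the book's data.
CASES = [
    (1, 1, 2), (1, 1, 4),
    (1, 2, 6), (1, 2, 8),
    (2, 1, 1), (2, 1, 3),
    (2, 2, 10), (2, 2, 14),
]

COMPONENTS = ("G_MEAN", "A_EFF", "B_EFF", "AB_EFF", "RESIDUAL")


def score_components(cases):
    """Split each Y score into G_MEAN, A_EFF, B_EFF, AB_EFF, and RESIDUAL."""
    grand = mean(y for _, _, y in cases)
    a_scores = defaultdict(list)
    b_scores = defaultdict(list)
    cell_scores = defaultdict(list)
    for a, b, y in cases:
        a_scores[a].append(y)
        b_scores[b].append(y)
        cell_scores[(a, b)].append(y)

    rows = []
    for a, b, y in cases:
        a_eff = mean(a_scores[a]) - grand            # A group mean minus grand mean
        b_eff = mean(b_scores[b]) - grand            # B group mean minus grand mean
        cell_mean = mean(cell_scores[(a, b)])
        ab_eff = cell_mean - (grand + a_eff + b_eff)  # what the main effects miss
        residual = y - cell_mean                     # individual score minus cell mean
        rows.append({
            "Y": y,
            "G_MEAN": grand,
            "A_EFF": a_eff,
            "B_EFF": b_eff,
            "AB_EFF": ab_eff,
            "RESIDUAL": residual,
        })
    return rows


if __name__ == "__main__":
    for row in score_components(CASES):
        reproduced = sum(row[name] for name in COMPONENTS)
        print(row, "reproduced score:", reproduced)
```

For every case, the five components add back up to the original score, which is the property the appendix relies on.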
Notice that if you sum the terms G_MEAN, A_EFF, B_EFF, AB_EFF, and RESIDUAL for each person, you can exactly reproduce each person’s original symptom score. The reproduced scores appear in the last column of Figure 16.18.

Suppose that Subject 10 is Joe. On the basis of Figure 16.18, we can say that Joe's symptom score of 14 is made up of the components summarized in Table 16.8.

Figure 16.18 Score Components for Each Individual Case in the Social Support Data

The spreadsheet shows 10 columns, with the headings as follows.
1. Serial number
2. socsup_a
3. stress_b
4. symptom
5. G_MEAN
6. A_EFF
7. B_EFF
8. AB_EFF
9. RESIDUAL
10. Reproduced symptom
Twenty rows of data have been entered under each column, of which the data in row number 10 has been highlighted.
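Table 16.7

Effect | Corresponding Population Parameter | Sample Estimate | Name of Sample Estimate in computationofss.sav File
Effect of Level i of Factor A | $\alpha_i$ | $M_{A_i} - M_Y$ | A_EFF
Effect of Level j of Factor B | $\beta_j$ | $M_{B_j} - M_Y$ | B_EFF
Interaction effect for combination of Level i of A with Level j of B | $\alpha\beta_{ij}$ | $M_{AB_{ij}} - M_{A_i} - M_{B_j} + M_Y$ | AB_EFF
Residual or error for each participant | $\varepsilon_{ijk}$ | $Y_{ijk} - M_{AB_{ij}}$ | RESIDUAL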
Table 16.8

Component | Value | Interpretation
G_MEAN | 6.65 | Grand mean.
A_EFF | +1.65 | Effect of Joe's being a member of the low-social support group.
B_EFF | +2.75 | Effect of Joe's being a member of the high-stress group.
AB_EFF | +1.75 | Effect of Joe's being in the low-support/high-stress group (interaction effect).
RESIDUAL | +1.20 | Joe's individual tendency to report more symptoms than other people in the same circumstances.
Sum | 14 | Sum of the grand mean, Joe's effects for A, B, and AB group memberships, and Joe's residual reproduces his symptom score of 14.

Appendix 16E: Computation of Sums of Squares by Hand

Notation and formulas for computation of SS vary across textbooks. Equation 16.33 shows a commonly used formula to calculate SS by hand from data.

(16.33)
$SS_A = n \times b \times \sum (M_{A_i} - M_Y)^2$,

where
n is the number of cases in each cell,
b is the number of levels of the B factor,
$M_{A_i}$ is the mean of Group $A_i$, and
$M_Y$ is the grand mean.

In words, Equation 16.33 says:
• Find the difference between the mean of Group A and the grand mean ($M_{A_i}$ and $M_Y$) for each group defined by the A factor.
• Square these differences.
• Sum the squared differences across all A groups (a is the number of groups defined by the A factor).
• Multiply this sum by n × b (that is, by the number of cases in each group defined by the A factor).

A different formula for $SS_A$ is provided in Equation 16.4: $SS_A = \sum (M_{A_i} - M_Y)^2$. In words, Equation 16.4 says:
• For each individual case, subtract the grand mean $M_Y$ from the mean of the A group ($M_{A_i}$) that the person belongs to; this difference is denoted ($M_{A_i} - M_Y$).
• Compute this difference for each person in the data set.
• Square the difference for each person.
• Sum these squared differences across all values of i, j, and k, where i is the level of A, j is the level of B, and k is subject number within each group. In other words, sum these squared deviations across all cases in the entire study.
• The sum is $SS_A$.

Equations 16.33 and 16.4 yield the same values for $SS_A$. (There can be small differences due to rounding error.) If you must do by-hand computation, Equation 16.33 may be preferred because it involves fewer steps than Equation 16.4. However, it is helpful to understand that SS values summarize information about ANOVA model score components such as A_EFF. I believe that Equation 16.4 makes that concept clearer. Figure 16.19 shows that after you obtain score components (A_EFF, B_EFF, etc.) for individual persons in the study, only two more steps are needed to obtain sums of squares. You need to square the effect (such as A_EFF) for each case and then sum these squared effects across all persons in the study.

Figure 16.19 Score Components in Factorial ANOVA (A_EFF), Squared Score Components (A_EFFSQ), and Computation of SS Terms ($SS_A$)
The screenshot shows a spreadsheet with the original data and score components in 10 columns on the left, and the corresponding squared deviations on the right. Each SS term ($SS_A$, $SS_{A \times B}$, $SS_{within}$, and so on) is the sum of one column of squared deviations.
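The claim that Equations 16.33 and 16.4 give the same $SS_A$ can be checked numerically. The sketch below assumes a balanced design and uses eight invented scores (not the social support data); the function names `ss_a_by_groups` and `ss_a_by_cases` are hypothetical labels for the two routes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical balanced 2 x 2 data: (level of A, level of B, Y score).
# Invented for illustration only.
CASES = [
    (1, 1, 2), (1, 1, 4),
    (1, 2, 6), (1, 2, 8),
    (2, 1, 1), (2, 1, 3),
    (2, 2, 10), (2, 2, 14),
]


def _a_group_means(cases):
    """Return the mean Y score for each level of the A factor."""
    scores = defaultdict(list)
    for a, _, y in cases:
        scores[a].append(y)
    return {a: mean(ys) for a, ys in scores.items()}


def ss_a_by_groups(cases):
    """Equation 16.33: n x b x sum over A groups of (M_Ai - M_Y)^2."""
    grand = mean(y for _, _, y in cases)
    a_means = _a_group_means(cases)
    b_levels = {b for _, b, _ in cases}
    n_per_cell = len(cases) // (len(a_means) * len(b_levels))  # balanced design assumed
    return n_per_cell * len(b_levels) * sum((m - grand) ** 2 for m in a_means.values())


def ss_a_by_cases(cases):
    """Equation 16.4: square A_EFF for every case, then sum across all cases."""
    grand = mean(y for _, _, y in cases)
    a_means = _a_group_means(cases)
    return sum((a_means[a] - grand) ** 2 for a, _, _ in cases)


if __name__ == "__main__":
    print("SS_A from group means:", ss_a_by_groups(CASES))
    print("SS_A from case-level effects:", ss_a_by_cases(CASES))
```

The second function mirrors the spreadsheet approach: one squared effect per row, then a column sum.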
The SPSS file fullcomputationofss.sav, shown in Figure 16.19, includes the effect components for all 20 individual cases that appear in Figure 16.18. The values in the last four columns (e.g., A_EFFSQ) were obtained by squaring these components. The sums of squares are obtained by adding these squared effects across all 20 persons.

What can we learn by going through all the computations in Figure 16.19? A fundamental concept in ANOVA is that we can break each individual score into components that represent the strength of the effect of each independent variable. The computations in Figure 16.19 demonstrate that SS terms are just sums of squared effects such as A_EFF, B_EFF, and so on. (At least, this is the implicit theory behind the ANOVA model in Appendix 16C. It is possible to think of real-world issues that would make breaking scores up into components associated with individual factors more complicated.) We can also divide or partition $SS_{total}$ into summary information about variability due to effects of each independent variable. This example included effect of A (social support), B (stress), and A × B (their interaction).

Each effect (such as A_EFF) is a deviation of one mean from another mean, or a deviation of an individual score from a group mean. We can summarize information about magnitudes of effects by squaring and summing them. (As in earlier computations of SS terms, we have to square deviations before we sum them, because deviations from means sum to 0.) Sum of squares tells us which factors correspond to large components of individual stress scores and which factors correspond to small components, when we summarize information across all cases in the study.

Usually we hope for at least one independent variable SS term for A, B, and their interaction to be large because this suggests that the corresponding factor is a useful predictor of Y scores or perhaps a cause of Y. We usually hope that $SS_{within}$, also called $SS_{residual}$, will be small. A key concept in statistics is partition of variance. ANOVA makes it possible to partition, or divide, $SS_{total}$ into SS terms that represent effects of A, B, A × B, and residual or within-group variability. The $\eta^2$ term for each SS (Section 16.8), like $r^2$ in correlation or regression, allows us to say something about the proportion of variance in Y scores that we can predict using the independent variables in our research. A major goal of research is usually to account for a reasonably large proportion of variance in dependent variables.

Comprehension Questions

1. Consider the following actual data from a study by Lyon and Greenberg (1991). The first factor in their factorial ANOVA was family background; female participants were classified into two groups (Group 1: codependent, women with an alcoholic parent; Group 2: non-codependent, women with nonalcoholic parents). Members of these two groups were randomly assigned to one of two conditions; they were asked to donate time to help a man who was described to them as either Mr. Wrong (exploitative, selfish, and dishonest) or Mr. Right (nurturant, helpful). The researchers predicted that women from a non-codependent/nonalcoholic family background would be more helpful to a person described as nurturant and helpful, whereas women from a
codependent/alcoholic family background would be more helpful to a person described as needy, exploitative, and selfish. The table of means below represents the amount of time donated in minutes in each of the four cells of this 2 × 2 factorial design. In each cell, the first entry is the mean, and the standard deviation is given in parentheses. The n in each cell was 12.

$A_1$ (codependent family background) 133.84 (54.24) 12.502
$A_2$ (non-codependent family background) 0000.)

The reported F ratios were as follows:

$F_A$(1, 44) = 9.89, p < .003.
$F_B$(1, 44) = 4.99, p < .03.
$F_{A \times B}$(1, 44) = 43.64, p < .0001.

   1. Calculate an $\eta^2$ effect size for each of these effects (A and B main effects and the A × B interaction). (Recall that $\eta^2 = df_{between} \times F / [df_{between} \times F + df_{within}]$.)
   2. Calculate the row means, column means, and grand mean from these cell means.
   3. Set up a table of cell means, or a bar chart of cell means, or a line plot of cell means.
   4. Write up a “Results” section that presents these findings and provides an interpretation of the results.
   5. What were the values of the 12 individual scores in the A,/B, group? How do you know them? (Scores on the dependent variable cannot be negative numbers in this example.)

2. Do the following analyses using the hypothetical data below. In this imaginary experiment, participants were randomly assigned to receive either no caffeine (1) or 150 mg of caffeine (2) and to a no-exercise condition (1) or half an hour of exercise on a treadmill (2). The dependent variable was heart rate in beats per minute. Data are also in the SPSS file caffeineexercisehr.sav.
   1. Compute the row, column, and grand means by hand. Set up a table that shows the mean and n for each group in this design.
   2. Run a factorial ANOVA using the SPSS GLM procedure. Verify that the values of the SS terms in the SPSS GLM output agree with the SS values you obtained from your spreadsheet. Make sure that you request cell means, a test of homogeneity of variance, and a plot of cell means (as in the example in this chapter).
   3. What null hypothesis is tested by the Levene statistic? What does this test tell you about possible violations of an assumption for ANOVA?
   4. Write up a “Results” section. What conclusions would you reach about the possible effects of caffeine and exercise on heart rate? Is there any indication of an interaction?

3. Consider these tables of cell means. Which one shows a possible interaction, and which one does not show any evidence of an interaction?
The image is a view of a scatter plot chart after adding a fit line in the SPSS chart editor. At the top are the menu buttons, such as File, Edit, View, Options, Elements, and Help. Below are buttons for editing and other chart functions.

The main chart appears on the screen. The X axis denotes the years and ranges from 0 to 25. The Y axis denotes the salary and ranges from 0 to 100000. The data points are spread through the chart; however, many are close to the region within 10 on the X axis and between 20000 and 60000 on the Y axis. A straight line is drawn through the data points on the chart. The equation of the line as shown on the chart is:

Y' = 31,416.72 + 2,829.57 × X.

On the right of the chart is a dialog box that allows the properties to be changed. The Fit Line tab has been selected, and the following options can be edited: check boxes for Display spikes and Suppress intercept, both of which have been left unchecked, and options for Fit Method, which show a series of different types of fit lines. These are Mean of Y, Linear, Loess, Quadratic, and Cubic. Linear has been checked. Below this is a space to indicate the percentage of points to fit and Kernel. The confidence interval choices are None, Mean, or Individual. The first option has been selected. The percentage has been set to 95 percent. A ticked check box allows for attaching a label to the line. At the bottom of the dialog box are buttons for Apply, Close, and Help.

11.15 Using a Regression Equation to Predict Score for Individual (Joe’s Heart Rate Data)

In an earlier chapter, this question was raised: What might explain why some people have higher heart rates than average? Bivariate regression provides a way to answer this question (keeping in mind that no one batch of data, and no one analysis, can provide a definitive answer). If mean heart rate for a sample is 81.2 beats per minute, and Joe’s heart rate is 88 beats per minute, that tells us Joe's heart rate is (88 − 81.2) = 6.8 beats per minute above average. Can we explain why Joe's heart rate is 6.8 beats per minute higher than average, or predict that his heart rate is this far above average? Bivariate regression provides a way to think about this. Data for this example are in Figure 11.12 and in the file named joeshr.sav. For now, look only at the first two columns. Suppose Joe is a member of a sample, and all 10 members of the sample have scores for anxiety (the X independent variable) and hr (the Y dependent variable). We'll assume
Chi-Square Analysis of Contingency Tables

17.1 Evaluating Association Between Two Categorical Variables

Recall that the choice among bivariate analyses depends upon types of measurement for the X independent and Y dependent variables.
1. When the X predictor variable is categorical and Y is quantitative, t tests or analysis of variance (ANOVA) can be used to evaluate how means for Y differ across groups on the basis of X.
2. When both variables are quantitative, and if X and Y are linearly related, Pearson’s r and bivariate regression can be used to evaluate how scores are associated.
3. This chapter discusses situations in which both X and Y are categorical variables. Chi squared ($\chi^2$) is the most widely used statistic for this case.

We begin by setting up a contingency table. This can be done using the SPSS crosstabs procedure. A contingency table has one row for each value of X and one column for each value of Y. The cell entries, called observed frequencies, tell us how many people were in each group.

17.2 First Example: Contingency Tables for Titanic Data

Here is an example based on data from the sinking of the Titanic. The X variable is passenger class (1 = first, 2 = second, 3 = third). The Y variable is whether the person did or did not survive the sinking (1 = died, 2 = survived). This table uses data for only female passengers (i.e., sex is held constant). Detailed information about casualties was published after the Titanic sank (Mersey & Gough-Calthorpe, 1912). Probably you have seen at least one of the films that dramatize this disaster, and you have some idea how things turned out for people who were first-class passengers versus those in third class. These data are in the SPSS file Titanic.sav.

A note about table setup: In my examples, if one variable can be viewed as a risk factor or protective factor or predictor or cause, I use that as the row variable in the contingency table. As in correlation, there are situations where there is no clear reason to call one variable a predictor and the other an outcome. When there is a basis to call one variable a predictor (or to think of it as a potential cause), I use that variable to define rows in the table. That is not an ironclad rule. In the Titanic data, class of passage was established earlier in time than survival, so the table was set up using class of passage as the X row variable and survival status as the column variable. This table has three rows (one row each for first-, second-, and third-class passengers) and two columns (died or survived). Each female passenger could be identified as a member of just one of the six groups (e.g., a passenger in first class who died). The number in each cell, O, is the observed number of persons in one of the six cells of the table (e.g., a woman in first class who died). The total number of passengers in each class is denoted by $n_1$, $n_2$, $n_3$; for example, $n_1$ is the total number of passengers in first class. The total
number of passengers in each column (i.e., the numbers who died vs. survived) are denoted $c_1$ (Y = 1, died) and $c_2$ (Y = 2, survived). The values of the n's and the c's are called marginal frequencies or marginal totals (because they are in the right and bottom margins of the table).

O denotes the observed number of persons in each cell. Numerical subscripts can be used to identify each cell. In general, $O_{ij}$ is the number of persons in the cell in row i and column j of the table. For example, $O_{32}$ is the number of persons in the cell in row 3 and column 2. These are persons with scores of X = 3 and Y = 2, that is, third-class passengers who survived.

N is the total number of persons in the table. Note that N can be obtained by summing all values of n, or all values of c, or all values of O.

Contingency tables are described by number of rows and number of columns. Table 17.1 is a 3 × 2 table. Number of cells is given by number of rows multiplied by the number of columns, so in this table, there are six cells (number of rows × number of columns = 3 × 2).

First we need to know, What was the marginal distribution of scores for the X (row) variable? In other words, how were the passengers distributed on the categorical variable class? Within each row in the table, we can add the two cells to obtain a row total; for example, the total number of women in first class ($n_1$) is the sum of number of women in first class who died and the number of women in first class who survived: 4 + 140 = 144 = $n_1$. These numbers can be expressed as proportions (or percentages) by dividing them by the table total N. In an earlier chapter, you saw that a distribution of scores for a categorical variable (such as class) could be graphed as a bar chart. We can divide each marginal row total ($n_1$, $n_2$, and $n_3$) by the table total N to obtain the proportion for each class, as shown in Table 17.2. Cell frequencies are omitted from Table 17.2 to highlight which numbers are the focus. Proportions can be multiplied by 100 to obtain percentages. For example, 36% of the female passengers were in first class.

Similarly, we ask, What was the marginal distribution of scores for the column variable? For the Titanic data, the question is, How many died and how many survived? Within each column in the table, we can add the frequencies in the three cells to obtain a column total. The total number of women who died ($c_1$) is the sum of women in first class who died plus the number of women in second class who died plus the number of women in third class who died: 4 + 13 + 89 = 106 = $c_1$. These column totals appear in Table 17.3. To find out what percentage of women died, divide $c_1$ by N; this is .26 or 26%. To find the percentage of women who survived, divide $c_2$ by N to obtain .74; 74% of all women survived.

Table 17.1 Titanic
                        Y = 1 (died)     Y = 2 (survived)     Row total
First class, X = 1      $O_{11}$ = 4     $O_{12}$ = 140       $n_1$ = number in first class = 144
Second class, X = 2     $O_{21}$ = 13    $O_{22}$ = 80        $n_2$ = number in second class = 93
Third class, X = 3      $O_{31}$ = 89    $O_{32}$ = 76        $n_3$ = number in third class = 165
Column total            $c_1$ = number who died = 106    $c_2$ = number who survived = 296    N = total number = 402

Table 17.2

First class, X = 1      Number of first-class women: 144     Proportion in first class: 144/402 = .36
Second class, X = 2     Number of second-class women: 93     Proportion in second class: 93/402 = .23
Third class, X = 3      Number of third-class women: 165     Proportion in third class: 165/402 = .41
Total N of women = 402. Sum of percentages: 36% + 23% + 41% = 100%.

Table 17.3

$c_1$ = number who died = 106        Proportion who died = 106/402 = .26
$c_2$ = number who survived = 296    Proportion who survived = 296/402 = .74
N = total number = 402

On the basis of these marginal distributions, we know that:
• More women were in third class than in first or second class (although the group sizes did not differ greatly).
• If we ignore passenger class, most women (74%) survived.

17.3 What Is Contingency?

If you have seen films about the Titanic sinking, you know that a 74% survival rate did not apply equally to all women. You probably understand that women in third class had a lower chance of survival than women in first class. Comparison of row percentages in this table will tell us how much these survival rates differed. If percentage of persons who survived differs across the three passenger classes, we can say that survival was contingent on passenger class. Contingent means “related to” or “predictable from.” Tables similar to Table 17.1 are called contingency tables because the observed frequencies in the cells can be used to evaluate whether survival status (Y) is contingent upon passenger class (X). We can assess whether there is contingency by examining selected percentages calculated from observed frequencies in this table. Later, we can assess whether contingency can be judged statistically significant. Like bivariate Pearson correlation, contingency is information on whether Y can be predicted from X.

To look for possible contingency between the row and column variables, we examine percentages within each row of the table. (In all examples that follow, the independent variable corresponds to rows in the table.) The row percentages will tell us whether the proportion or percentage of women who survived (those with scores of Y = 2) differs across the three passenger class groups (with scores of X = 1, X = 2, and X = 3). We examine the data for each of the three passenger class
groups separately and compute the proportion who died and the proportion who survived separately within each class. Think of each row in Table 17.4 as a separate group and examine the fates of women in each passenger class separately. (The row for Group 1, women in first class, is shaded to highlight that this is one of three separate groups.) What percentage of women in first class survived? What percentage of women in third class survived? To obtain row proportions, the observed frequency in each cell, O, is divided by the corresponding row total n. For example, to find the percentage of all women in first class who survived, we divide the number of survivors in first class ($O_{12}$ = 140) by the total number of women in first class ($n_1$ = 144). The proportion of women in first class who survived is 140/144 = .972; if we convert this to a percentage, we can say that 97.2% of women in first class survived.

(17.1)
Row proportion = Observed value/Corresponding row total n.

(17.2)
Row percentage = Row proportion × 100.

Table 17.4 shows the row percentages for each of the three passenger classes. Proportions of death versus survival were calculated separately within each of the three passenger classes. Within each passenger group, percentages of those who died and survived sum to 100% (within rounding error). We can make a comparison at this point. More than 97% of women in first class survived, while only about 46% of women in third class survived.

Table 17.4

          Y = 1 (died)                              Y = 2 (survived)                               Row total
X = 1     2.8% of women in first class died.        97.2% of women in first class survived.        $n_1$ = 144; 2.8% + 97.2% = 100%
X = 2     14% of women in second class died.        86% of women in second class survived.         $n_2$ = 93; 14% + 86% = 100%
X = 3     54% of women in third class died.         46% of women in third class survived.          $n_3$ = 165; 54% + 46% = 100%

We can interpret row percentages as probabilities. For a woman in first class, probability of survival was about 97%, while for a woman in third class, probability of survival was only 46%. This is a substantial difference in outcomes. Later in this chapter you will learn how to test whether the differences in percentages are large enough to be judged statistically significant. Thinking in terms of effect size, we should also ask whether a difference between percentages is large enough to matter. If the survival rates for first and third class were 97.1% versus 89.9%, this difference could be viewed as small.

We can also obtain column percentages in tables. A column percentage for a cell is obtained by dividing the n of cases in that cell by the total number of cases in that column. A set of column percentages for Table 17.4 would answer the question, Among those who died, what were the percentages of women passengers in first, second, and third classes? For example, among the 106 women who died, only 4 (4/106 = 3.8%) were first-class passengers.

17.4 Conditional and Unconditional Probabilities

The value of .26 or 26% deaths for all women in the Titanic data (the marginal proportion of death in the bottom margin in Table 17.1) is an example of an unconditional probability. That is, if we ignore passenger class, what percentage of all women died? This is a rate or risk for death that is not conditioned on, or limited by, or dependent on, or statistically related to, passenger class membership.

Pr(specific outcome) can be used to denote unconditional probabilities of specific events. For the Titanic data, these are the unconditional probabilities of two outcomes: death and survival.

Pr(death) or Pr(Y = 1) = .26 (or 26%).
Pr(survival) or Pr(Y = 2) = .74 (or 74%).

Because death and survival are the only possible outcomes, these unconditional probabilities must sum to 1.0 (or 100%), within rounding error.

When we look at probabilities within one selected row of the table (for example, the row for women in first class) we find values of .028 (or 2.8%) for
death and .972 (97.2%) for survival. These row percentages are called conditional probabilities. Given the condition that a woman was in first class (e.g., if we decide to look only at data for first-class passengers), her probability of death was only 2.8%, and her probability of survival was 97.2%. If we specify other conditions (for instance, examine only women in third class), conditional probabilities are different.

Conditional probability is denoted using a vertical line. Before the vertical line, we identify the outcome of interest (for example, woman is dead; in other words, she has a score Y = 1). After the vertical line, we identify the condition that we assume, or the group that we select (for example, woman is in third class, X = 3). The conditional probability of death given that a woman is in third class can be written:

Pr(death | third class) or Pr(Y = 1 | X = 3).

To obtain this conditional probability, divide the number of women who were in third class and died ($O_{31}$) by the total number of women in third class ($n_3$): (89/165) = .539 or 54%.

Here is a general formula for conditional probability:

(17.3)
Pr(Y = b | X = a) = (Number of people who have scores of Y = b and X = a) / (Number of people who have scores of X = a),

where a corresponds to any possible score value for X (such as 1, 2, or 3), and b corresponds to any possible score value for Y (1 or 2). The values used to compute conditional probability for each group in the table appear in Table 17.5. (The numbers in Table 17.5 are the same as in Table 17.4, but they are now labeled formally as conditional probabilities.)

Table 17.5 Titanic

          Y = 1 (died)                                                                   Y = 2 (survived)
X = 1     Pr(died | first class) = Pr(Y = 1 | X = 1) = $O_{11}/n_1$ = .028               Pr(survived | first class) = Pr(Y = 2 | X = 1) = $O_{12}/n_1$ = .972
X = 2     Pr(died | second class) = Pr(Y = 1 | X = 2) = $O_{21}/n_2$ = .14               Pr(survived | second class) = Pr(Y = 2 | X = 2) = $O_{22}/n_2$ = .86
X = 3     Pr(died | third class) = Pr(Y = 1 | X = 3) = $O_{31}/n_3$ = .54                Pr(survived | third class) = Pr(Y = 2 | X = 3) = $O_{32}/n_3$ = .46
Unconditional probability of death, Pr(death): Pr(Y = 1) = .26
Unconditional probability of survival, Pr(survived): Pr(Y = 2) = .74

By now you have probably noticed that the conditional probability of death for women in first class (2.8%) is much lower than the conditional probability of death for women in third class (54%). The conditional probability of death for passengers in second class falls in between. Passenger class was related to chance of survival. A higher percentage of women in third class than in first class died, or equivalently, a lower percentage of women in third class survived.

17.5 Null Hypothesis for Contingency Table Analysis

To evaluate possible contingency (or predictive relationship) between the X and Y variables, we compare conditional probabilities across groups. If the probability of death were the same for all women, regardless of whether they had first-, second-, or third-class tickets, then the conditional probabilities of death for first, second, and third class would be equal. All conditional probabilities would also be equal to the unconditional probability. Here is the formal null hypothesis for the Titanic data. If survival were not contingent on or related to class of passage, then the expected result would be:

$H_0$: Pr(dead | first class) = Pr(dead | second class) = Pr(dead | third class) = Pr(dead) = .26 or 26%.
Because there are only two possible outcome values for Y in this example (died vs. survived), we could state an equivalent null hypothesis as follows:

H0: Pr(survived | first class) = Pr(survived | second class) = Pr(survived | third class) = Pr(survived) = .74 or 74%.

Here is a more general statement of the null hypothesis for tables of all sizes. For all possible values of b and a:

(17.4)
$$H_0\colon \Pr(Y = b \mid X = a) = \Pr(Y = b).$$

In words, this equation says that the conditional probability Pr(Y = b | X = a) equals the unconditional probability Pr(Y = b). The alternative hypothesis is that there is at least one difference between conditional probabilities and unconditional probability somewhere in the table.

17.6 Second Empirical Example: Dog Ownership Data

A study by Friedmann, Katcher, Lynch, and Thomas (1980) reported data about 92 men who had a first heart attack. The researchers were interested in variables that would predict survival 1 year after the heart attack. Each man was asked numerous questions about his lifestyle, including whether he owned a dog (X, coded 0 = no, 1 = yes). At the end of a 1-year follow-up period, the researchers recorded whether each man had survived; this was the outcome or dependent variable (Y, coded 0 = dead, 1 = survived). They examined whether dog ownership was related to (or predictive of) survival. This was an early example of research about possible health benefits of pet ownership. Data from the actual study are in the file dog.sav. The values of O in each cell in Table 17.6 are the observed number (or frequency) of cases in each of the four possible groups: non-dog owners who died, non-dog owners who survived, dog owners who died, and dog owners who survived. The values of n0 and n1 are the marginal row frequencies (total number of nonowners and owners). The values of c0 and c1 are the marginal column frequencies (total number who died, total number who survived).

Table 17.6
Owns dog        Dead (Y = 0)             Survived (Y = 1)              Row total
No (X = 0)      O = 11                   O = 28                        n0 = number who do not own dogs = 39
Yes (X = 1)     O = 3                    O = 50                        n1 = number who own dogs = 53
Column total    c0 = number dead = 14    c1 = number survived = 78     N = 92
Source: Friedmann et al. (1980).

The survival status codes are different in the dog data than in the Titanic data. It is acceptable to use any numerical values to describe levels of categorical data; survival status was coded 1, 2 for the Titanic data and 0, 1 for the dog owner data. For categorical variables, numbers are only labels for group membership. The choice of numerical labels for group membership makes no difference in the results. Small integer values are most convenient.

The dog ownership data set is used to demonstrate complete analysis of a contingency table. Here are the steps included:

1. Examine expected frequencies to evaluate whether data are appropriate for χ² analysis.
2. Obtain the table of observed frequencies from the SPSS crosstabs procedure and examine marginal distribution for X, marginal distribution for Y, and row percentages. The marginal distribution for Y provides information about unconditional probability of each Y outcome. Row percentages provide information about conditional probabilities.
3. Obtain χ² or another significance test to evaluate the null hypothesis that whether a man owns a dog is unrelated to whether he survives. On the basis of the value of χ² (along with df and α level), we can judge the outcome of the study statistically significant or not statistically significant.
4. Obtain effect size information: A φ (phi) coefficient can be obtained for a 2 x 2 table, and Cramer's V can be used for tables with more rows or columns. Value of φ can be interpreted like Pearson's r.
5. Evaluate the nature of the association by comparing row percentages (conditional probabilities) across groups. In this example the question is whether dog owners have a different probability of survival than men who don't own dogs.

Confidence intervals for proportions and percentages are rarely included in research reports, although these can be obtained. In contrast, opinion poll results (for example, the proportion of voters who favor passing a law) are often reported with a margin of error similar to a confidence interval. See Appendix 17A for discussion.

17.7 Preliminary Examination of Dog Ownership Data

Think first about the nature of association you might expect between dog ownership and survival. There are three possibilities. First, dog owners might be more likely to die (perhaps dog walking causes overexertion or increases risk for falling). Second, dog owners might be less likely to die (perhaps the companionship of a dog and/or the beneficial mild exercise provide health benefits). Third, there might be no association of dog ownership with survival. Which outcome do you expect to see? The null hypothesis in this example is:

H0: Pr(dead | own dog) = Pr(dead | don't own dog) = Pr(dead).

You should be able to state that null hypothesis in words. Which terms in this null hypothesis are conditional probabilities, and which is an unconditional probability? The outcome of the study appears in Table 17.7; row proportions are included. On the basis of the numbers in this table, were the results of this study consistent with what you expected?

About 28% of men who didn't own dogs died, while only about 6% of the dog owners were dead by the end of the year. It appears that dog owners had better survival outcomes. This study was not an experiment, so even if it suggests an association between these variables, we can't say that dog ownership causes better survival outcomes. Observed cell frequencies (like other sample data) will vary because of sampling error. We will need a statistical significance test to evaluate whether, taking sampling error into account, the difference between death rates of 6% and 28% is substantial enough to be judged statistically significant.
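The row proportions discussed above can be checked by hand. The following is a minimal Python sketch, not part of the original SPSS workflow; it uses the observed counts from the dog ownership data (11, 28, 3, 50), and the dictionary layout and function names are illustrative assumptions.

```python
# Observed frequencies for the dog ownership data.
# Keys are (dog, survived): dog 0 = no, 1 = yes; survived 0 = dead, 1 = survived.
observed = {(0, 0): 11, (0, 1): 28, (1, 0): 3, (1, 1): 50}


def conditional_probability(table, y, x):
    """Pr(Y = y | X = x): cell count divided by the row total for X = x."""
    row_total = sum(n for (xi, _), n in table.items() if xi == x)
    return table[(x, y)] / row_total


def unconditional_probability(table, y):
    """Pr(Y = y): column total divided by the total number of cases."""
    column_total = sum(n for (_, yi), n in table.items() if yi == y)
    return column_total / sum(table.values())


p_dead_nonowner = conditional_probability(observed, y=0, x=0)  # 11 / 39
p_dead_owner = conditional_probability(observed, y=0, x=1)     # 3 / 53
p_dead = unconditional_probability(observed, y=0)              # 14 / 92

print(f"Pr(dead | no dog) = {p_dead_nonowner:.3f}")
print(f"Pr(dead | dog)    = {p_dead_owner:.3f}")
print(f"Pr(dead)          = {p_dead:.3f}")
```

Under the null hypothesis the two conditional probabilities would both be close to the unconditional probability; here they are about .28 and .06, compared with about .15 overall.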
Table 17.7
Now let's look at the other parts of Joe's hr score. We know that part of his score was not predicted by the regression equation. How much of his score was predicted? The predicted part of his score corresponds to (Y′ - M_Y). If we knew nothing about other variables that might predict hr, the best predicted hr for all persons in the sample would be M_Y, mean hr. The Y′ predicted value is an adjustment to the prediction; how much higher or lower than M_Y would we expect each person's score to be when we generate a predicted heart rate using anxiety? We now have three pieces of information:

Joe's actual hr, Y = 88
Joe's predicted hr, Y′ = 84.18 (rounded to 84.2)
The mean hr for the sample, M_Y = 81.2

We can use these numbers to divide Joe's total deviation from the sample mean (Y - M_Y) into two parts:

Total deviation of Joe's heart rate from mean = (Y - M_Y) = (88 - 81.2) = 6.8. Joe's hr is 6.8 beats per minute above the average in the sample.

Difference between Joe's actual and predicted hr = (Y - Y′) = (88 - 84.2) = 3.8. Joe's actual hr was about 4 beats per minute higher than the value predicted by the regression equation from his anxiety. This was the part of Joe's hr that the regression did not predict or explain. This is Joe's residual or prediction error.

(Y′ - M_Y) is the difference between Joe's predicted heart rate (from the regression) and the "default" prediction we would make in the absence of other information. If no other information were available, we would predict his hr to be M_Y, the sample mean. This deviation is (Y′ - M_Y) = (84.2 - 81.2) = 3. In words, the regression equation predicted, on the basis of his anxiety, that Joe's heart rate would be 3 beats per minute faster than the sample mean.

You can check to see that Joe's total deviation (column 1) equals the sum of the residual (column 2) and the predicted part (column 3) of the score:

Column 1: (Y - M_Y) = (88 - 81.2) = 6.8. Joe's total deviation from the mean.
Column 2: (Y - Y′) = (88 - 84.2) = 3.8. The part of Joe's hr that anxiety could not predict; error or residual for Joe.
Column 3: (Y′ - M_Y) = (84.2 - 81.2) = 3. The part of Joe's hr that anxiety could predict.

We can write an equation to summarize the components or pieces of Joe's hr score:

Joe's hr = M_Y for entire sample + (Y - Y′) for Joe + (Y′ - M_Y) for Joe.
Joe's hr = 81.2 + 3.8 + 3.

In words, Joe's heart rate can be constructed from the sample mean (81.2), plus the part of Joe's score that could not be predicted from his anxiety (3.8), plus the part of Joe's score that could be predicted from anxiety (3).

We can do this for every individual in the sample (see fully worked example in Appendix 11D). Note that the values of (Y - Y′) and (Y′ - M_Y) will differ for other persons, and in many cases, one or both of these score components will be negative numbers. You may find it helpful to calculate predicted scores and deviations for a few other cases and compare your results with those in Appendix 11D.

We can locate Joe's score in a scatterplot, as shown in Figure 11.16. Joe's actual and predicted scores
column, the sums must equal the original marginal frequencies in the table. You should verify that the E values sum to the row and column totals in the original table. This check appears in Table 17.9. You should also verify that if you calculate row percentages on the basis of the values of E, the row percentages are the same for each row. This check appears in Table 17.10. The percentage of dead persons on the basis of values of E is the same (.15 or 15%) for the non-dog owners as for the dog owners. In other words, values of E represent the imaginary situation in which conditional probabilities are equal across groups as specified in the null hypothesis.

Table 17.8
Owns dog        Dead (Y = 0)               Survived (Y = 1)             Row total
No (X = 0)      E = 39 x 14/92 = 5.9       E = 39 x 78/92 = 33.1        39
Yes (X = 1)     E = 53 x 14/92 = 8.1       E = 53 x 78/92 = 44.9        53
Total           14                         78                           92

Table 17.9
Owns dog        Dead (Y = 0)               Survived (Y = 1)             Row total
No (X = 0)      E = 5.9                    E = 33.1                     5.9 + 33.1 = 39
Yes (X = 1)     E = 8.1                    E = 44.9                     8.1 + 44.9 = 53
Total           5.9 + 8.1 = 14             33.1 + 44.9 = 78             92

Table 17.10
Owns dog        Dead (Y = 0)               Survived (Y = 1)
No (X = 0)      5.9/39 = .15 or 15%        33.1/39 = .85 or 85%
Yes (X = 1)     8.1/53 = .15 or 15%        44.9/53 = .85 or 85%
Total           14/92 = .15 or 15%         78/92 = .85 or 85%

17.9 Computation of Chi Squared Significance Test

We want to know how much the observed frequency (O) in each cell differs from the expected frequency (E). The values of O_ij for all four cells appear in Table 17.6; the values of E_ij for all four cells appear in Table 17.9. Table 17.11 shows the O - E difference within each cell. If H0 is true, and dog ownership is unrelated to survival, these (O - E) deviations should be close to 0. In simple terms, if H0 is true, we would expect observed outcomes (values of O) to be close to the hypothesized outcomes (values of E). As with other sample statistics, these deviations will vary because of sampling error.

Table 17.11
                      Dead (Y = 0)                  Survived (Y = 1)               Row total
Does not have dog     (O - E) = 11 - 5.9 = +5.1     (O - E) = 28 - 33.1 = -5.1     0
Does have dog         (O - E) = 3 - 8.1 = -5.1      (O - E) = 50 - 44.9 = +5.1     0
Column total          0                             0

It is not a coincidence that we obtain the same value (5.1) for all four cells in Table 17.11, except that some have plus and some have minus signs. The sum of (O - E) must be 0 across each row and down each column. Therefore, if we know any one (O - E) deviation in this 2 x 2 table, we can fill in the values of (O - E) for the other three cells. Table 17.12 is a brief exercise to show that, given the constraint that (O - E) must sum to zero for each row and column, knowing just one value of (O - E) is sufficient to fill in the other cell values.

Table 17.12
Fill in the values for each question mark, given that values must sum to 0 in each row and column. Rows: Does not have dog (X = 0); Does have dog (X = 1); Column total.

In general, for a table with r rows and c columns:

(17.6)
$$df = (r - 1) \times (c - 1).$$

Because the dog ownership data have only two rows and two columns, the df for χ² for this table is (2 - 1) x (2 - 1) = 1. Only one of the four (O - E) differences is free to vary. This is analogous to a situation you have seen before. When you computed a sum of squares (SS term) in order to obtain a sample variance, you learned that only the first N - 1 deviations of Y scores from the sample mean are free to vary (N is the number of scores in the sample). Once you know any (N - 1) deviations, the last remaining deviation has a fixed value; it does not provide additional independent information about variance. We call N - 1 the df for a sample variance. For a contingency table, the df is based not on the number of cases but on the number of rows and columns in the table.

In addition, note that we cannot summarize information about (O - E) across all four cells in the table simply by summing these deviations, because the sum would be 0. You saw the same problem before when computing an SS term to find a sample variance. As noted earlier, the bag of tricks in statistics is small. For both χ² and SS, we solve the problem that deviations sum to 0 in the same way: We square the deviations before we sum them. Computation of χ² requires one new thing: we need to scale each squared deviation to take numbers of cases into account. This is done by dividing the (O - E)² term for each cell by the value of E for that cell. (E indirectly provides information about number of cases in the sample.)

Combining all these operations gives us the formula to compute the χ² test of association for a contingency table. The values of (O - E) are found in Table 17.11. Note that the number of terms in the sum equals the number of cells in the table.

(17.7)
$$\chi^2 = \sum \left[ (O - E)^2 / E \right].$$

The df for χ² is:

(17.8)
$$df = (r - 1) \times (c - 1),$$

where r is the number of rows and c is the number of columns in the table.

For the dog ownership data, we obtain the following value of χ²:

$$\chi^2 = \frac{(50 - 44.9)^2}{44.9} + \frac{(28 - 33.1)^2}{33.1} + \frac{(3 - 8.1)^2}{8.1} + \frac{(11 - 5.9)^2}{5.9}$$
$$= \frac{(5.1)^2}{44.9} + \frac{(-5.1)^2}{33.1} + \frac{(-5.1)^2}{8.1} + \frac{(5.1)^2}{5.9} = 8.85.$$

(The value 8.85 is based on unrounded expected frequencies; the rounded values shown here give approximately 8.98.)

Because r (number of rows) = 2 and c (number of columns) = 2,

df = (2 - 1) x (2 - 1) = 1.

17.10 Evaluation of Statistical Significance of χ²

As usual, there are two ways to evaluate statistical significance. Because SPSS output has a "Sig." or p value, you can compare the obtained p with a preselected α level, often α = .05. If p < .05, chi squared is large enough to be judged statistically significant. You do not have to state that the test is one-tailed; readers will assume that it is. Alternatively, you could look up critical values for χ², based on your sample df, in the table in Appendix D at the end of this book and evaluate whether your obtained χ² value exceeds the tabled critical value.

It is useful to visualize distribution shapes when you think about reject regions (e.g., values of χ² for which you reject H0). Up until now the distributions you have considered most often, the normal and t distributions, have been approximately bell shaped. In contrast, χ² is positively skewed. It has a minimum possible value of 0 (and therefore a fixed lower limit). There is no fixed upper limit to possible values. Similar to t distributions, the exact shape of χ² distributions differs depending on df. Figure 17.1 shows the χ² distribution with 1 df. For 1 df, the critical value of χ² = 3.84 identifies the boundary of the upper tail that corresponds to 5% of the area. We can reject H0 using α = .05 if the obtained χ² exceeds 3.84.

Figure 17.1 Chi Squared Distribution With df = 1
The graph shows 0 marked at the origin between the horizontal and vertical axes, and a downward sloping concave curve drawn between the two axes. A line is drawn upward from the horizontal axis to the graph line at a point marked 3.84 toward the right end of the axis. The area under the graph line to the right of this line is shaded, with a label that reads: upper 5% tail.

For the dog owner and survival status data, with an obtained χ² of 8.85 with df = 1, we can reject the null hypothesis and say that there is a statistically significant difference in survival outcomes between nonowners and owners of dogs. Dog owners had a significantly lower proportion of deaths (.06) than nonowners (.28).

17.11 Effect Sizes for Chi Squared

For 2 x 2 tables like the dog owner data example, the most common effect size is the phi coefficient (φ). Phi can be computed from the cell frequencies in a 2 x 2 table. For tables with more than two rows or two columns, a similar statistic, Cramer's V, is a widely used effect size. There are two ways to compute φ; it can be obtained directly from the observed values in the cells of the table, or it can be calculated from χ². Table 17.13 shows how frequencies of cases in the four cells of a 2 x 2 table are labeled to compute φ. As in discussion of Pearson correlation, cases are called concordant if they have high scores on both X and Y or low scores on both X and Y. Cases are called discordant if they have low scores on one variable and high scores on the other variable. Textbooks and SPSS sometimes differ in the order in which rows and columns are presented. To compute φ, make sure that the values you use for b and c in the formula for φ correspond to concordant values.
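The hand computations for the dog ownership data (expected frequencies, (O - E) deviations, χ², df, and the comparison with the critical value 3.84) can be reproduced in a few lines. This is a minimal Python sketch using only the standard library, not the SPSS procedure; the variable names are illustrative, and the p value shortcut through `math.erfc` is valid only when df = 1.

```python
import math

# Observed counts: rows = dog (no, yes); columns = (dead, survived).
observed = [[11, 28],
            [3, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# (O - E) deviations; each row and each column of these sums to 0.
deviations = [[o - e for o, e in zip(o_row, e_row)]
              for o_row, e_row in zip(observed, expected)]

# Chi squared: sum over all cells of (O - E)^2 / E.
chi_sq = sum(d ** 2 / e
             for d_row, e_row in zip(deviations, expected)
             for d, e in zip(d_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only, the upper-tail area of chi squared is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2)) if df == 1 else None

min_expected = min(min(row) for row in expected)

print(f"smallest expected frequency = {min_expected:.2f}")
print(f"chi squared = {chi_sq:.3f}, df = {df}, p = {p_value:.4f}")
print("reject H0 at alpha = .05:", chi_sq > 3.84)
```

Because the sketch keeps the expected frequencies unrounded, its result agrees with the 8.85 reported in the text rather than with the value obtained from expected frequencies rounded to one decimal place.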
Assuming observed cell frequencies a through d are as shown in Table 17.13 (i.e., a and d
(17.10)
$$\chi^2 = N \times \phi^2.$$

When tables have more than two rows or columns, φ cannot be used; a different effect size is needed. The most widely reported effect size for the chi-square test of association is Cramer's V. Cramer's V can be calculated for contingency tables with any number of rows and columns. Values of Cramer's V range from 0 to 1. Values close to 0 indicate no association; values close to 1 indicate a strong association. Cramer's V does not provide information about direction of association (i.e., whether higher scores on X go with higher scores for Y).

(17.11)
$$V = \sqrt{\frac{\chi^2}{N \times m}},$$

where χ² is computed from Equation 17.7, N is the total number of scores in the sample, and m is the minimum of [(number of rows - 1), (number of columns - 1)]. Like Pearson's r, Cramer's V is a symmetrical index of association; that is, it does not matter whether the row or column variable is the independent variable. Unlike Pearson's r, Cramer's V does not require a linear association between scores on the X and Y variables, nor does it have a sign. For a 2 x 2 table, Cramer's V equals the absolute value of φ.

17.12 Chi Squared Example Using SPSS

For the dog ownership data from the study by Friedmann et al. (1980), the lowest expected value was E = 5.9, and χ² was a reasonable analysis. The data in the SPSS file dog.sav show you how data appear in an SPSS file when you have the categorical scores for each of the variables for each individual. To enter the Friedmann et al. dog owner and survival data into SPSS, one column was used to represent each person's score on the variable named dog (coded 0 = did not own dog, 1 = owned dog), and a second column was used to enter each person's score for the variable survived (0 = did not survive for 1 year after heart attack, 1 = survived for at least 1 year). As in earlier examples, each row represents the scores for one person. The complete data set for this SPSS example is in the SPSS data file dog.sav. The number of rows with scores of 1 on both variables in this data set corresponds to the number of survivors who owned dogs.

The SPSS menu selections to run the crosstabs procedure are as follows. From the top-level menu, make these menu selections, as shown in Figure 17.2: Analyze > Descriptive Statistics > Crosstabs.

This opens the SPSS dialog box for the crosstabs procedure, shown in Figure 17.3. The names of the row and column variables were placed in the appropriate windows. All examples in this chapter use the independent variable as the row variable. In this example, the row variable corresponds to the score on the predictor variable (dog), and the column variable corresponds to the score on the outcome variable (survived). The Statistics button was clicked to access the menu of optional statistics to describe the pattern of association in this table, as shown in Figure 17.4. The optional statistics selected included χ², φ, and Cramer's V. (Other available statistics are for different specific situations; e.g., the McNemar test, described in Appendix 17B, is for paired samples; the tau statistics are used when categorical variables provide ordinal information.) In addition, the Cells button in the main Crosstabs dialog box was clicked to open the Crosstabs: Cell Display menu, which appears in Figure 17.5. In addition to the observed frequency for each cell, both the expected frequency for each cell and row percentages were requested.
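As a check on the effect sizes requested from SPSS for these data, the absolute value of φ and Cramer's V can be computed from χ² and N. The sketch below is a hedged illustration in Python, not SPSS output; it takes the Pearson χ² of 8.851 and N = 92 for the dog ownership data as given, and the variable names are assumptions made for the example.

```python
import math

chi_sq = 8.851          # Pearson chi squared for the dog ownership data
n = 92                  # total number of cases
n_rows, n_cols = 2, 2   # table dimensions

# For a 2 x 2 table, chi squared = N * phi^2, so |phi| = sqrt(chi squared / N).
# The sign of phi depends on how the cells are ordered and is not recovered here.
phi_abs = math.sqrt(chi_sq / n)

# Cramer's V = sqrt(chi squared / (N * m)), with m = min(rows - 1, columns - 1).
m = min(n_rows - 1, n_cols - 1)
cramers_v = math.sqrt(chi_sq / (n * m))

print(f"|phi| = {phi_abs:.3f}")
print(f"Cramer's V = {cramers_v:.3f}")
```

For a 2 x 2 table m = 1, so the two values coincide; both round to .310.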
Figure 17.2 Menu Selections for SPSS Crosstabs Procedure
Screenshot of dog.sav [DataSet1] in the IBM SPSS Statistics Data Editor. The details are as follows. A drop-down menu from the task bar shows several options, of which "descriptive statistics" is selected, leading to another drop-down menu. The second drop-down menu has several options, of which "crosstabs," the fourth from the top, is selected. Arrows are drawn pointing toward "descriptive statistics" and "crosstabs."

Figure 17.3 SPSS Crosstabs Dialog Box
The details are as follows.
Pane on the left: blank
Central panes, from the top downward: Row(s): entry shown: "dog"; Column(s): entry shown: "survived"; Layer 1 of 1: blank
Buttons to the right: Statistics; Cells; Format
Check boxes at the bottom: Display clustered bar charts; Suppress tables

Figure 17.4 SPSS Crosstabs: Statistics Dialog Box
The details of all the check boxes, of which only two are checked, are as follows.
Chi-square (checked)
Correlations
Nominal: Contingency coefficient; Phi and Cramer's V (checked); Lambda; Uncertainty coefficient
Ordinal: Gamma; Somers' d; Kendall's tau-b; Kendall's tau-c
Nominal by interval: Eta
Kappa
Risk
McNemar
Cochran's and Mantel-Haenszel statistics
Buttons at the bottom: Continue; Cancel; Help

Figure 17.5 SPSS Crosstabs: Cell Display Dialog Box
The details are as follows.
Counts: Observed (checked); Expected (checked); Hide small counts
z-test: Compare column proportions
Percentages: Row (checked); Column; Total
Residuals: Unstandardized; Standardized; Adjusted standardized
Noninteger weights: Round cell counts (selected); Round case weights; Truncate cell counts; Truncate case weights; No adjustments
Buttons at the bottom: Continue; Cancel; Help

17.13 Output From Crosstabs Procedure

The output from the crosstabs procedure for these data appears in Figure 17.6. Notice that SPSS uses different terminology than most textbooks; the differences are summarized in Table 17.15.

The top panel in Figure 17.6 shows the contingency table with observed and expected cell frequencies and row percentages. The second panel reports the obtained value of χ² (8.85) and additional tests. The third panel reports symmetrical measures of association (effect sizes) including the values of φ (.310) and Cramer's V (also .310). Like Pearson's r, the values of these effect sizes do not differ depending on which variable is treated as independent.

If you have only observed cell frequencies, and do not have the rows of data for individual participants, you can use one of many online calculators to obtain χ². Figure 17.7 shows an example.

Figure 17.6 SPSS Crosstabs Output From Dog and Survival Status Data
Table 1 (dog by survived):
dog                    Statistic         Did not survive    Survived    Total
Does not own a dog     Count             11                 28
                       Expected count    5.9                33.1
                       % within dog      28.2%              71.8%
Owns a dog             Count             3                  50
                       Expected count    8.1
                       % within dog      5.7%               94.3%
Total                  Count             14                 78
                       % within dog      15.2%
Table 2 (chi-square tests): Pearson chi-square value 8.851, df 1, significance (2-sided) .003. N of valid cases: 92. Note a: 0 cells (0.0%) have expected count less than 5. Note b: Computed only for a 2 by 2 table.
Symmetric measures: Nominal by nominal: Phi, value .310, approximate significance .003; Cramer's V, value .310, approximate significance .003. N of valid cases: 92.

Table 17.15
Textbook term                                  SPSS term
E or expected frequency                        Expected count
Chi square or chi squared or χ²                Pearson chi-square
Row percentage                                 % within dog (or within other row or value of independent variable)

Figure 17.7 Online Calculator for χ² Using Cell Frequencies as Input
Source: https://www.icalcu.com/stat/chissqtest.htm

The details are as follows.
Chi-Square calculator
You can type any rows/columns of numbers separated by space or comma. You can also copy and paste any rows/columns of numbers from a table (excel, word, or other software). An example is listed below:
5786
6875
Within box:
11 space 28
3 space 50
Buttons below: Calculate; Reset
Chi-square value: 8.851085698691
Degrees of freedom: 1
P value: 0.002929146051
Rows X columns: 2 by 2

17.14 Reporting Results

The following kinds of information should be included in "Results" sections. Some of this can be in earlier parts of a report.

* Were assumptions for use of the analysis satisfied? (For χ², we do not want values of E in the table to be less than 5, as discussed in the next section.)
* What kinds of persons were included in the sample?
* What statistical significance test was done, with what df, and what p value was obtained? For contingency tables, the most commonly reported test is χ².
* What was the effect size? For contingency tables this is usually φ or Cramer's V. Values of φ are sufficiently similar to Pearson's r that similar values can be called small, medium, and large effects.
* What was the nature of the relationship? This is described by comparing row percentages or conditional probabilities. (In this example, the survival rate for dog owners was higher than for nonowners.)

Results

A survey was done to assess variables that might predict survival for 1 year after a first heart attack. The study included 92 men. The smallest expected value was 5.93; therefore, χ² was judged to be an appropriate analysis. Only one predictor of survival is reported here: dog ownership. Table 17.6 shows the observed cell frequencies for dog ownership and survival status. Of the 53 dog owners, 3 did not survive; of the 39 nonowners of dogs, 11 did not survive. This was a statistically significant association: χ²(1, N = 92) = 8.85, p
other terms used in SPSS output, such as asymptotic, unless you report results to a mathematical statistician or obtain χ² values from more complex methods such as structural equation modeling. (Minimum values for E are discussed in the next section. The values in parentheses following χ² are df and then table total N.)

17.15 Assumptions and Data Screening for Contingency Tables

17.15.1 Independence of Observations

Both the Titanic data and the dog ownership data are completely between-S. That is, each person can be a member of only one group on each of the categorical variables (i.e., each person can be alive or dead, but not both). The design required for the χ² test of contingency must not have repeated measures or paired samples. It is possible to have cross-tabulated data in repeated measures studies. For paired samples or repeated measures, the McNemar (1947) test can be used, provided variables are dichotomous (only two categories). See Appendix 17B for details.

17.15.2 Minimum Requirements for Expected Values in Cells

Most sources say that χ² can be used to analyze
5 and that no cell should have an expected value less than 1. These criteria are related to the need for a reasonably large number of cases in each row and each column. Low values of E tell us indirectly that there are small marginal totals for one or more rows or columns. In these situations, if you change the cell group membership for just one case, the results for χ² can change substantially. It is undesirable to have a research outcome for which results would change dramatically if the group memberships of just one or a few participants were different. Here is a hypothetical example.

17.15.3 Hypothetical Example: Data With One or More Values of
> procedure was used to obtain a histogram and descriptive statistics for each group.

Figure 12.2 Using the SPSS Split File Command to Obtain Output for Separate Groups
The image is a screenshot of the SPSS split file command that separates output for different groups based on specific criteria. At the top are the menu buttons such as; view,

caffeine. To evaluate whether the independence of observations is violated, we need to know the research situation. If we know that each participant is tested under only one treatment condition and that there was no matching or pairing of participants for the samples, then the assumption that scores are independent between groups should be satisfied. If we know that each participant was tested individually and that the participants did not have any chance to influence one another's levels of physiological arousal or heart rate, then the assumption that observations are independent within groups should be satisfied. Data analysts can evaluate whether scores within each sample have reasonably normal distribution shapes and no extreme outliers and whether the
variables by a researcher and experimental control over other variables that might influence the outcomes or responses of participants. Often experiments involve comparisons of mean scores on one or more outcome variables across groups that have received different types or amounts of treatments.

Exploratory studies: Studies that include large numbers of variables and may evaluate large numbers of hypotheses, including hypotheses that arise during the process of data examination.

External validity: In psychology, the degree to which research results can be generalized to participants, settings, and materials beyond those included in the study. Note that internal validity is related to causal inference; external validity is related to generalizability.

f: The frequency of cases in a group. See also frequency and n.

was named the F ratio in honor of Sir Ronald Fisher, one of the major contributors to the development of modern statistics.)

Factor: In the context of analysis of variance, a categorical predictor variable is usually called a factor. In an experiment, the levels of a factor typically correspond to different types of treatment, or different dosage levels of the same treatment, administered to participants by the researcher. In nonexperimental studies, the levels of a factor can correspond to different naturally occurring groups, such as political party or religious affiliation.

Factorial design: A design in which there is more than one factor or categorical predictor variable.

Fixed factor: A factor in an analysis of variance is "fixed" if the levels of the factor that are included in the study include all the possible levels for that factor or if the levels of the factor included in the study are systematically selected to cover the entire range of "dosage
Fratio: In analysis of variance and other analyses, an F ratio is obtained by taking a mean square that
represents
variability
that
can
be
predicted from the independent variable (in the case of analysis of variance, this provides information about differences among group means) and dividing it by another mean square that provides information about the variability that is due to other variables or “error.” If Fis much greater than 1 and if it
levels” that is of interest to the researcher. For example, if we code gender as 1 = male, 2 = female and use these two levels of gender in a factorial study, gender would be treated as a fixed factor. If we select equally spaced dosagelevels of caffeine that cover the entire range of interest (e.g., 0, 100, 200, 300, and 400 mg), then caffeine would be treated as a
fixed factor.
Floor effect:
exceeds the tabulated critical values for 7, the
When scores have a fixed lower limit, such as
researcher concludes that the independent
0 on an exam, and when many scores are
variable
predictive
close to that minimum possible value, there is
relationship with the outcome variable. (It
a floor effect. If this distribution occurs for
has
a
significant
examination scores, it suggests that the examination was too difficult. A floor effect is undesirable.
Frequency: The frequency of cases in a group is the same as the n of cases in a group. Later in the book, n is used instead of f to report group size.
Frequency distribution table: A list of all possible scores on a variable, along with the number of persons who received each possible score, is called a frequency distribution. For example, a frequency table for the variable type of tobacco used could be as follows:
Type of Tobacco    Number (Frequency) of Persons
None               43
Cigarette          41
Pipe               6
Chewing tobacco    11
Galton board: A physical device that demonstrates the distribution of outcomes for Bernoulli trials (binary chance decisions). Also called a quincunx.
Gaussian distribution: See normal distribution.
General linear model (GLM): The most general case of this model is one in which one or several predictor variables (which may be categorical or continuous) are used to predict outcomes on one or several outcome variables (which may be categorical or continuous). Most of the analyses taught in introductory statistics courses are special cases of this more general model; for example, in one-way analysis of variance, scores on one continuous outcome variable are predicted from one categorical variable. All these analyses involve the computation of similar terms (e.g., sums of squares).
Generalizability of results: The degree to which a researcher can claim that results obtained in a specific sample would be the same for a population of interest. Results from a sample can be generalized to an actual population of interest if the sample is representative of the population; representativeness can often be obtained using random or systematic methods to select the sample. Results from an accidental or a convenience sample may be generalizable to a hypothetical population if the sample resembles that hypothetical population. Results from a biased sample are not generalizable. In experiments, generalizability also depends on similarity of type and dosages of experimental treatment to real-world experiences with the treatment variable, setting, and other factors.
Grand mean: The mean for all the scores in an entire study, denoted by M_Y or M_grand.
Harmonic mean of n's: A method of computing an average n across groups that have unequal ns.
Hinges: The hinges are the 25th and 75th percentile points for a distribution of scores on quantitative variables shown in a boxplot or box and whiskers plot.
Histogram:
A graph that provides information about the number or proportion of people with a particular score, for each possible score or interval of score values on a quantitative variable (X). Typically, X score values are indicated by tick marks on the X axis. For each X score, the height of the vertical bar corresponds to the frequency (or proportion) of people in the sample who have that score value for X (or a score for X that falls within the indicated score interval). The reference scale used to evaluate the information provided by the height of each bar is indicated on the Y axis, which is usually labeled in terms of either frequencies or proportions of cases. Conventionally, the bars in a histogram touch one another.
Homogeneity of variance: The assumption that variances of the populations being compared (using the t test or analysis of variance) are equal. For a t test or analysis of variance, possible violations of this assumption can be detected using the Levene test or other test statistics that compare the sample variances across groups. In regression or correlation, homogeneity of variance refers to an assumption of uniform variance of Y scores across levels of X.
Hypothetical or imaginary population: The (often imprecisely defined) population to which generalizations are made when research is based on an accidental or a convenience sample. For instance, a researcher who studies the effect of caffeine on anxiety in a convenience sample of college students may want to generalize the results to all healthy young adults; this broader population is purely hypothetical.
Inferential use of statistics: We use statistics in an inferential way when we estimate population characteristics (such as μ) from sample statistics (such as M) or when we extrapolate beyond the cases in the study to some larger hypothetical population. When we make inferences about populations on the basis of information in samples, we must take sampling error into account. This is often done by setting up confidence intervals or conducting statistical significance tests. This is in contrast to descriptive uses of statistics, in which we use statistics such as M only to describe the data in the sample. In most published research reports, researchers hope to be able to say something about populations beyond the cases in the study, so they generally use inferential methods.
Inner fences: The ends of the whiskers in a boxplot mark the inner fences.
Institutional animal care and use committee (IACUC): Research procedures involving nonhuman animal participants must be approved by an IACUC before data collection.
Institutional review board (IRB): An IRB reviews and evaluates all proposed research that involves human participants in the United States. Researchers must obtain IRB approval before collecting data from human participants. The corresponding committee that reviews and evaluates research that involves nonhuman animal subjects is the institutional animal care and use committee (IACUC).
Interaction effect: This is a pattern of cell means in a factorial ANOVA that is different from what would be predicted by summing the grand mean, the row effect, and the column effect. When there is a significant interaction, the lines that connect cell means in a graph are not parallel; in other words, for members of the A1 group, changes in scores on the dependent variable across levels of the B factor are not the same as the changes in the A2 group. Interaction effects correspond to a pattern of cell means that cannot be reproduced just by summing the main effects of the row and column factors. Interaction is equivalent to moderation.
Intercept: See b0.
Internal validity: The degree to which results from a study can be used as evidence of a causal connection between variables. Typically, well-controlled experiments can provide stronger support for causal inference than nonexperimental studies.
Interrater reliability: An assessment of consistency or agreement for two or more raters, coders, or observers. If the ratings involve categorical variables, percentage agreement and Cohen's kappa (κ) may be used to quantify agreement; if ratings involve dichotomous (yes/no) judgments or quantitative ratings, then Cronbach's alpha or KR-20 may be used to assess reliability.
Interquartile range (IQR): The distance between the 25th and 75th percentiles in a boxplot. This range includes the middle 50% of scores.
Latin square: A Latin square has k rows and k columns (where k is the number of levels for the within-S factor). Each row corresponds to a different order of treatment presentation. Each treatment appears once in each row and once in each column. Type of treatment is not confounded with order of presentation. Ideally, each treatment would follow each other treatment only once (to control for carryover effects).
Level of confidence: When setting up a confidence interval, the level of confidence, C, is usually arbitrarily set at 95% or 90%. If all assumptions for use of confidence intervals are correct, then in the long run, if we set up thousands of confidence intervals using samples from the same population, C% of the confidence intervals are expected to contain μ, and (1 - C%) are expected not to contain μ.
Levels of a factor: Each group in an analysis of variance corresponds to a level of the factor. Depending on the nature of the study, levels of a factor may correspond to different amounts of treatment (for instance, if a researcher manipulates the dosage of caffeine, the levels of the caffeine factor could be 0, 100, 200, and 300 mg of caffeine). In other cases, levels of a factor may correspond to qualitatively different types of treatment (for instance, a study might compare Rogerian, cognitive behavioral, and Freudian therapy as the three levels of a factor called "type of psychotherapy"). In some studies where naturally occurring groups are compared, the levels of a factor correspond to naturally occurring group memberships (e.g., gender, political party).
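The long-run interpretation in the level of confidence entry can be checked with a small simulation. This is a minimal sketch, not a procedure from the book: the population values (μ = 100, σ = 15), the sample size, the number of replications, the function name `ci_coverage`, and the use of a known-σ z interval (critical value 1.96 for C = 95%) are all assumptions chosen for illustration.

```python
import random
import statistics

def ci_coverage(mu=100.0, sigma=15.0, n=25, reps=5000, z_crit=1.96, seed=1):
    """Proportion of simulated 95% z intervals that contain mu.

    All default values are hypothetical; sigma is treated as known so that
    the interval is M +/- z_crit * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = z_crit * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        # Count the interval as a "hit" when it captures the population mean.
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / reps

coverage = ci_coverage()
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```

With these settings the printed proportion should land close to .95, and the remaining intervals miss μ, which is what the entry describes for C = 95%.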
Likert scale: Rensis Likert, a sociologist, devised this rating scale format. Respondents are asked to report their degree of agreement with a statement about an attitude or a belief using a multiple-point rating scale (usually five points, with labels that range from 1 = strongly disagree to 5 = strongly agree). In practice, rating scales often have more than five points, and points may be labeled for things other than degree of agreement, for instance, reports of behavior frequency.
Linear: Let b stand for the slope of a line in a scatterplot of X, Y scores. The association between X and Y is perfectly linear if, for each one-unit increase in X score, there is a constant and consistent increase of b units in the Y score, as in Figure 10.1. An association is approximately linear if a one-unit increase in X is associated with an average increase of b units in the Y score.
Literature review: In science, a review of relevant past scientific literature. Usually focuses on published research, but unpublished results may also be discussed.
MANOVA: See multivariate analysis of variance.
Margin of error: Surveys or polls of attitudes or voter intentions often report results in terms of sample proportions. The margin of error reported for percentages in polls is often (but not always) plus or minus one standard error for the proportion estimate. For example, a polling expert might say that 52% of registered voters who were contacted said that they favored passage of Proposition 21, with a margin of error of ±3%.
Marginal frequencies: In a contingency table, these are the total numbers of cases in each row or each column, obtained by summing cell frequencies within each row or column.
Mauchly's sphericity test: A test of the sphericity assumption for repeated-measures analysis of variance, required only when there are more than two levels of the repeated-measures factor. The null hypothesis is that the contrasts (C1 - C2), (C2 - C3), and so on, have equal variances. If the sphericity assumption is violated, the F ratio for standard repeated-measures analysis of variance is biased; its p value will underestimate the true risk for Type I error. Possible remedies for violations of this assumption are either (a) use of corrected df based on either the Greenhouse-Geisser or Huynh-Feldt procedure or (b) multivariate analysis of variance (for more advanced students). Violation of this assumption creates more serious problems than violation of the (similar) homogeneity of variance assumption in between-S analysis of variance and should not be ignored.
Mean (M): A measure of central tendency that is obtained by summing the scores in a sample and dividing by the number of scores.
Median: A measure of central tendency that is obtained by ranking the scores in a sample from lowest to highest and identifying the score that has 50% of the scores below it and 50% of the scores above it.
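The mean and median entries can be sketched directly from their verbal definitions. The scores below are made up for illustration, and the rule used for an even number of scores (averaging the two middle values) is a common convention that the glossary entry does not itself state.

```python
import statistics

def mean(scores):
    """Sum the scores and divide by the number of scores."""
    return sum(scores) / len(scores)

def median(scores):
    """Rank the scores and take the middle one.

    For an even number of scores this averages the two middle values,
    which is an assumed convention, not something the entry specifies.
    """
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

# Hypothetical scores with one extreme value at the high end.
scores = [2, 3, 3, 4, 5, 6, 40]
m = mean(scores)
mdn = median(scores)
print(f"M = {m}, Mdn = {mdn}")
```

In this sample the single extreme score pulls the mean well above the median, which is one reason both statistics are often reported together.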
Meta-analysis: Combining effect size information from multiple studies, examining mean and variance of effect size, and sometimes searching for variables that explain why effect sizes are larger in some studies than in others.
Missing value: A number (or blank) in a cell in an SPSS data sheet that represents a missing response is called a system missing value; such values are excluded from computations.
Mixed models: In this book, the term mixed models refers to analyses of variance that include both within-S and between-S factors. The term can be defined more broadly to include other types of designs.
Mode: A measure of central tendency that is obtained by finding the score in a sample that has the highest frequency of occurrence. A frequency distribution can have more than one mode.
Multiple-point rating scale: Rating scale items may have any number of response alternatives, and many different types of labels can be used for response alternatives (such as frequencies of behaviors or intensities of feelings). A rating scale is often called a Likert scale; the original format proposed by Likert involved rating degree of agreement on a five-point scale.
Multivariate analysis of variance (MANOVA): This is a multivariate generalization of univariate analysis of variance. Like analysis of variance, it involves comparisons of means across groups; however, analysis of variance examines means on just one outcome variable, Y, while multivariate analysis of variance compares a vector or list of means on p outcome variables across groups (Y1, Y2, ..., Yp).
N: The total number of observations in a sample.
n: The number of cases in a group.
Necessary and sufficient: X is a necessary and sufficient condition for Y if both of these statements are true: Y occurs only after X has occurred, and Y always occurs after X has occurred.
Necessary but not sufficient: X is a necessary but not sufficient condition for Y if both of these statements are true: Y can occur only if X has happened, but, when X happens, Y does not always happen.
Negatively skewed: An asymmetric distribution that has a longer tail at the low (or negative) end of the distribution is said to be negatively skewed.
Nominal variable: At a nominal level of measurement, numbers serve only as names or labels for group membership and do not convey any information about rank order or quantity. See also categorical variable.
Nondirectional test: A significance test that uses an alternative hypothesis that does not specify a directional difference. For H0: μ = 100, the nondirectional alternative hypothesis is H1: μ ≠ 100. For a nondirectional test, the rejection regions include both the upper and lower tails of the z or t distribution. See also two-tailed test.
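To make the nondirectional test entry concrete, the sketch below computes a two-tailed p value for a z statistic under H0: μ = 100. The sample mean, the population standard deviation (treated as known), the sample size, and the .05 alpha level are hypothetical values chosen for illustration, not figures from the book.

```python
import math

def z_statistic(m, mu0, sigma, n):
    """z = (M - mu0) / (sigma / sqrt(N)), with sigma treated as known."""
    return (m - mu0) / (sigma / math.sqrt(n))

def two_tailed_p(z):
    """Area in both tails of the standard normal beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical study: M = 106, sigma = 15, N = 25, H0: mu = 100.
z = z_statistic(m=106.0, mu0=100.0, sigma=15.0, n=25)
p = two_tailed_p(z)
reject = p < 0.05  # assumed alpha level
print(f"z = {z:.2f}, two-tailed p = {p:.4f}, reject H0: {reject}")
```

Because the rejection region is split across both tails, the same z would yield half this p value in a one-tailed test whose alternative hypothesis points in the observed direction.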
Nonequivalent control group: When individual participants cannot be randomly assigned to treatment and/or control groups, we often find that these groups are nonequivalent; that is, they are unequal on their scores on many participant characteristics prior to the administration of treatment. Even when a random assignment of participants to groups occurs, sometimes nonequivalence among groups occurs because of "unlucky randomization." If it is not possible to use experimental controls (such as matching) to ensure equivalence, analysis of covariance (ANCOVA) is often used to try to correct for or remove this type of nonequivalence. However, the statistical control for one or more covariates in ANCOVA is not guaranteed to correct for all sources of nonequivalence; also, if assumptions of ANCOVA are violated, the adjustments it makes for covariates may be incorrect.
Nonexperimental research design: In nonexperimental research, the investigator does not manipulate an independent variable and does not have experimental control over other variables that might influence the outcome of the study.
Normal distribution: The mathematical definition of a normal distribution is given in Appendix 6A. Analysts typically call an empirical distribution seen in a histogram "approximately normal" if its shape approximates that of a bell curve. Also called the Gaussian distribution.
Null hypothesis (H0): An algebraic statement that some population parameter has a specific value. For example, the null hypothesis for the one-sample z test is usually of the form H0: μ = c, where c is a specific numerical value. In other words, H0 is the assumption that the population mean on a variable corresponds to a specific numerical value c. In the evaluation of mean human body temperature, the null hypothesis is μ = 98.6°F.
Null-hypothesis significance testing (NHST): Null hypothesis significance testing involves the selection of an alpha level to limit the risk of Type I error, statements of null and alternative hypotheses, and the evaluation of obtained research results (such as a t ratio) to decide whether to reject the null hypothesis. In theory, if all the assumptions for NHST are satisfied, the risk for Type I error is (theoretically) equal to alpha. In practice, some assumptions of NHST are often violated, so obtained p values may not accurately estimate the true risk for Type I error in many studies.
Numeracy: Skills needed to evaluate simple numerical or statistical information.
Odds ratio: An odds ratio is a ratio of the odds for members of two different groups, often used to summarize information about outcomes when the outcome variable is a true dichotomy. The odds themselves are also a ratio. In the study of survival status among owners and nonowners of dogs, we could set up an odds ratio to describe how much more likely survival is for a dog owner than for a nonowner by taking the ratio of the odds of
survival for dog owners (16.67) to the odds of survival for a nonowner (2.545). This ratio, 16.67/2.54 = 6.56, tells us that in this sample, the odds of survival were more than 6 times as high for dog owners as for nonowners.
Omnibus test: A test of the significance of an overall model (such as a multiple regression) that includes all predictor variables. For example, the F ratio that tests the null hypothesis that multiple R = 0 is the omnibus test for a multiple regression. An F test that tests whether all population means that correspond to the groups in a study are equal to one another is the omnibus test in a one-way analysis of variance.
One-sample t test: This test uses information from a sample (its mean, SD, and N) to decide whether a specific hypothesized value for the unknown population mean μ appears to be plausible.
One-tailed test: A significance test that uses an alternative hypothesis that specifies an alternative value of μ that is either less than or greater than the value of μ stated in the null hypothesis. For example, if H0: μ = 100, then the two possible directional alternative hypotheses are H1: μ < 100 and H1: μ > 100. The rejection region consists of just one tail of the normal or t distribution. The lower tail is used as the rejection region for H1: μ < 100, and the upper tail is used as the rejection region for H1: μ > 100. One-tailed tests are used when there is a directional alternative hypothesis. See also directional test.
Order effects: In repeated-measures studies, if each participant receives two or more different treatments, it is possible that participant responses depend on the order in which treatments are administered as well as the type of treatment. Factors such as practice, boredom, fatigue, and sensitization to what is being measured may lead to differences in the outcome measures taken at Times 1, 2, and so forth. If all participants experience the treatments in the same order, there is a confound between order effect and treatment effect. To prevent this confound, in most repeated measures studies, researchers vary the order in which treatments are administered.
Ordinal variables: In Stevens's (1946, 1951) description of levels of measurement, ordinal variables are those that contain information only about rank.
Orthogonal: This term means that contrasts are independent or uncorrelated. If you correlate a pair of variables (O1 and O2) that are coded as orthogonal contrasts, the correlation between O1 and O2 should be 0.
Orthogonal factorial ANOVA: This is a factorial design in which the numbers of cases in the cells are equal (or proportional in such a way that the percentage of members in each B group is equal across levels of A). In an orthogonal design, the effects of factors are not confounded.
Outer fences: These are a feature of boxplots that usually do not appear on the graph. The outer fences fall at Mdn + 3 × IQR and Mdn - 3 × IQR (or the
actual minimum and maximum if these are closer to the median than these limits). Individual scores that lie beyond the outer fences are marked as extreme outliers (using asterisks).
Outlier: A score that is extreme or unusual relative to the sample distribution. There are many standards that may be used to decide which scores are outliers; for example, a researcher might judge any case with a z score greater than 3.3 to be an outlier. Alternatively, scores that lie outside the outer fences of a boxplot might also be designated as outliers.
p (or p value): The area in one or two tails of a distribution (such as a t distribution); p represents the theoretical probability of obtaining a research result (such as a t value) equal to or greater than the one obtained in the study, when H0 is correct. It thus represents the "surprise value" (Hays, 1973) of the following result: If H0 is correct, how surprising or unlikely is the outcome of the study? When a small p value is obtained (p less than a preselected α value), then the researcher may decide to reject the null hypothesis. Another definition for p would be the answer to the following question: How likely is it that, when H0 is correct, a study would yield a result (such as a t value) equal to or greater than the observed t value just due to sampling error? A p value provides an accurate estimate of the risk for Type I error only when all the assumptions required for null-hypothesis significance testing are satisfied.
Paired-samples t test: A form of the t test that is appropriate when scores come from a repeated-measures study, a pretest-posttest design, matched samples, or other designs where scores are paired in some manner. Also called the correlated-samples t test or direct-difference t test.
Parameter: In the context in this book, a parameter is a quantitative description of distribution shape for a population. Each parameter can be estimated by a sample statistic. For example, μ is the population mean, M is the sample mean, σ is the population standard deviation, and SD is the sample standard deviation. In other contexts the term parameter can have different meanings.
Partition of variance: The variability of scores (as indexed by their sum of squares) can be partitioned or separated into two parts: the variance explained by between-group differences (or treatment) and the variance not predictable from group membership (due to extraneous variables). The ratio of between-group (or explained) variation to total variation, eta squared, is called the proportion of explained variance. Researchers usually hope to explain or predict a reasonably large proportion of variance for the outcome variable. Partition of SS is introduced in discussion of one-way analysis of variance, and estimated partitions of variance are also provided by regression and multivariate analyses.
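The partition of variance entry can be illustrated with a few lines of arithmetic. The sketch below splits the total sum of squares for a one-way design into between-group and within-group parts and reports eta squared; the three small groups of scores and the function name `partition_ss` are invented for illustration.

```python
def partition_ss(groups):
    """Return (SS_total, SS_between, SS_within, eta squared) for a one-way design."""
    all_scores = [y for g in groups for y in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Total variability: squared deviations of every score from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    # Between-group part: squared deviations of group means from the grand mean,
    # weighted by group size.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    # Within-group part: squared deviations of scores from their own group mean.
    ss_within = sum((y - gm) ** 2
                    for g, gm in zip(groups, group_means) for y in g)

    eta_squared = ss_between / ss_total
    return ss_total, ss_between, ss_within, eta_squared

# Hypothetical scores for three treatment groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ss_total, ss_between, ss_within, eta_squared = partition_ss(groups)
print(ss_total, ss_between, ss_within, eta_squared)
```

The between-group and within-group sums of squares add up to the total, so eta squared can be read as the share of total variability that is predictable from group membership.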
Pearson product-moment correlation: See Pearson's r.
Pearson's r (or Pearson product-moment correlation): A parametric correlation statistic that provides information about the strength of a relationship between two quantitative variables; it should be used only when the variables are normally distributed, linearly related, and at least approximately at the interval/ratio level of measurement. When not otherwise specified, the term correlation usually refers to the Pearson product-moment correlation.
Peer review: When scientific research reports are submitted to journals, they are sent to "peers" (other researchers) for review. This is a quality control mechanism that usually prevents the publication of poor-quality research information.
Percentage (%): Percentage is obtained by multiplying proportion by 100.
Percentile rank: The cumulative percentage of scores below an X score in a frequency table (not including scores exactly equal to X) can be reported as percentile rank. The percentage of area below z in a standard normal distribution can also be reported as percentile rank.
Person effects: In paired-samples or repeated-measures data, we can calculate a mean for each person (combining scores across all times or treatment conditions). If these means differ substantially across persons, we have evidence that there are individual differences in the response variable (such as heart rate).
Person × Treatment interaction: If persons show different responses to treatments (e.g., one person's heart rate increases in response to pain, another person's heart rate decreases, and another person's heart rate does not change), it indicates a Person × Treatment interaction. Repeated-measures ANOVA assumes no Person × Treatment interaction.
p-hacking: Searching for smaller p values, often by running numerous different analyses using different decisions about which cases, variables, or groups to include; see Wicherts et al. (2016) for a list of p-hacking practices. The final reported p values obtained after p-hacking are not believable; they greatly underestimate the true risk for Type I error.
Phi coefficient (φ): A correlation (also an effect size) that indexes the strength of association between scores on two true dichotomous variables; it is equivalent to Pearson's r.
Plagiarism: When authors present ideas or contributions of other people as if they were the authors' own new contributions.
Planned contrast: See contrast.
Point biserial correlation (r_pb): A correlation that is used to show how a true dichotomous variable is related to a quantitative variable; it is equivalent to Pearson's r.
Pooled-variances t test: When there is no evidence that the homogeneity of variance assumption is violated, this is the version of the t test that is preferred. (This is the version of t that is usually taught in introductory statistics courses.)
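As a hedged illustration of the pooled-variances t test entry, the sketch below uses the standard textbook formula, in which the two sample variances are combined into one pooled estimate weighted by degrees of freedom. The two groups of scores and the function name `pooled_variances_t` are hypothetical; the glossary entry itself does not give the formula.

```python
import math
import statistics

def pooled_variances_t(group1, group2):
    """Return (t, df) for an independent-samples t test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    var1, var2 = statistics.variance(group1), statistics.variance(group2)

    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff, df

# Hypothetical scores for two independent groups.
t, df = pooled_variances_t([4, 5, 6, 7, 8], [1, 2, 3, 4, 5])
print(f"t({df}) = {t:.2f}")
```

Pooling is reasonable only when the homogeneity of variance assumption looks tenable, which is the condition the entry attaches to this version of the test.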
Population: In ideal descriptions of statistics, a set of scores for the entire population of interest, from which a sample (subset) of scores is selected, often randomly. In the ideal world of statistics, we begin with a population whose members can be identified and then select a sample. In actual research practice, we often begin with an easily accessible convenience sample of cases and then try to make inferences to some broader hypothetical population.
Positively skewed: An asymmetric distribution that has a longer tail at the high (or positive) end of the distribution is said to be positively skewed.
Post hoc, ergo propter hoc: Latin for "After this, therefore because of this (or caused by this)." A common logical fallacy. This fallacy (error) occurs when a person assumes that a prior event caused a later event, in the absence of any other information about how the events might be related.
Post hoc power analysis: Don't do this. If you conduct a statistical significance test and have a sample value of d, and if you then use the power table to find power as a function of d and N in your study, do not use a post hoc power analysis to say "the results would have been statistically significant if N were larger." (You can use an effect size from a completed study to evaluate statistical power for a future planned study, of course.)
Post hoc test: See protected test.
Practical significance: A subjective judgment as to whether the value of the M1 - M2 observed difference between group means in a study represents a difference that is large enough to be of any practical or clinical significance; in practice, to make this judgment, a person needs to know something about the possible range of values on the outcome variable and how much these changes are valued by people or clinicians. See also clinical or practical significance.
Prediction error: If a researcher uses a sample value of M to predict or estimate μ, and the value of M is not equal to μ, the difference between M and μ tells us the magnitude and direction of prediction error. In regression analyses, prediction errors are usually called residuals. Other things being equal, prediction errors tend to decrease as N increases.
Preliminary data screening: Examination of frequency tables and graphs to examine data before doing the analysis of primary interest; this makes it possible to see potential problems such as extreme scores, non-normal distribution shape, and nonlinearity.
Primary source: In science, a primary source is a research report written by a researcher who has firsthand knowledge about data collection and analysis.
Proportion (p): A proportion is obtained by dividing the n in a group or category by the total N in the entire sample. Also called relative frequency.
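The proportion entry, together with the earlier percentage entry, amounts to two short computations. A minimal sketch follows; the category labels, the frequencies, and the function name `proportions` are made up for illustration.

```python
def proportions(freqs):
    """Divide the n in each category by the total N in the sample."""
    total_n = sum(freqs.values())
    return {category: n / total_n for category, n in freqs.items()}

# Hypothetical frequency table for one categorical variable.
freqs = {"yes": 30, "no": 50, "undecided": 20}
props = proportions(freqs)
# A percentage is the proportion multiplied by 100.
percents = {category: 100 * p for category, p in props.items()}
print(props)
print(percents)
```

Because every case falls in exactly one category, the proportions sum to 1 and the percentages sum to 100.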
Protected test: A test that reduces the risk for Type I error by using more conservative procedures. Some examples include the Bonferroni procedure and the Tukey honestly significant difference test. Also called a post hoc test.
Protective factor: Something that is associated with lower risk for diseases or problems. For example, hand washing is a protective factor for getting colds and other common diseases. Protective factors are statistically related to disease or problem outcomes. However, the association is not perfect. Engaging in a protective behavior usually does not reduce the risk for disease or problem to zero. Not engaging in a protective behavior does not predict that the disease or problem is certain to occur. For many diseases and problems, there are multiple protective factors.
Proximal similarity model: When a researcher suggests that results from a convenience sample can be generalized to a hypothetical population that has characteristics similar to those of cases in the sample, the researcher is implicitly relying on similarity (rather than random sampling) to justify making generalizations.
p value: See p (or p value).
Quantitative variable: A variable that contains information about the quantity or amount of some underlying characteristic, for example, age in years or salary in dollars. This includes the levels of measurement that Stevens (1946, 1951) called interval and ratio.
Quasi-experimental research design: A research design that involves pretest–posttest comparisons, or comparisons of treatment groups, but lacks one or more of the features of a true, well-controlled experiment. For example, in a true experiment, participants are randomly or systematically assigned to treatment groups in a way that makes the groups as nearly equivalent as possible with respect to participant characteristics; in a quasi-experiment, random assignment of participants to treatments is often not possible, and therefore, the groups are often not equivalent with respect to participant characteristics prior to treatment. In a true experiment, the intervention is under the control of the researcher; a quasi-experiment often assesses the impact of an intervention that is not under the direct control of the researchers. An example of a quasi-experiment is a study that compares mean scores on an outcome variable for two groups that have received different treatments, in a situation where the researcher did not have control over the assignment of participants to groups and/or does not have control over other variables that might influence the outcome of the study. Analysis of covariance is often used to analyze data from quasi-experimental designs that include pretest–posttest comparisons and/or nonequivalent control groups.
Quincunx: See Galton board.
Random assignment of participants to groups or conditions: A way of assigning members of a sample to two or more treatment groups, such that each member of the sample has an equal chance of being included in each group. Note that this is not the same thing as random sampling of participants from a population.
Random factor: A factor in an analysis of variance is considered random if the levels included in the study represent an extremely small proportion of all the possible levels for that factor. For example, if a researcher randomly selected 10 photographs as stimuli in a perception task, the factor that corresponded to the 10 individual photographs would be a random factor. In a factorial analysis of variance, the F ratio to test the significance for a factor that is crossed with a random factor is based on an error term that involves an interaction between factors.
Random sampling of participants from a population: A random sample is a subset of cases from a population selected in a manner that gives each member of the population an equal chance of being included in the sample. Random sampling from a population should enhance the generalizability of results to that population. Note that this is not the same thing as random assignment of participants to groups or conditions.
Range: The difference between the highest and lowest values of a variable.
Range rule: When data are approximately normally distributed, the range is approximately 4 times the standard deviation; the standard deviation is approximately one quarter of the range.
Regression coefficient: See regression slope.
Regression slope: In a regression that predicts a raw-score Y from a raw score for X, the regression slope b is the average number of units of increase in predicted Y score for each one-unit increase in X. This b slope is also called a regression coefficient. (When a z score for Y is predicted from the z score for X, the standardized regression slope is denoted β.)
Reliability: A measure is reliable if it provides stable and consistent results across occasions of measurement. For example, if you weigh yourself on a bathroom scale several times, and the weights are all similar, the bathroom scale provides reliable measures. If the weights differ substantially, the bathroom scale provides unreliable measurements.
Repeated measures: A design in which each participant is tested at every point in time or under every treatment condition; because the same participants contribute scores for all the treatments, participant characteristics are held constant across treatments (which avoids confounds), and variance due to participant characteristics can, in theory, be removed from the error term used to assess the significance of differences of treatment group means.
Replication: We replicate a study by reproducing or repeating it. Scientists can do an exact replication or reproduction of a study, using the same methods, or a conceptual replication with variations in procedure. If results cannot be replicated by other researchers, this does not necessarily prove that the original study was wrong. However,
risk for Type I error. The computations in this section are for the equal variances assumed version of the test. You will probably never use the equal variances not assumed version of the t test.
When the “equal variances assumed” version of the t test is used, the variances within the two groups are pooled or averaged, and this average is called s²pooled, or sp². The term pooled just means averaged. To obtain sp², the pooled or averaged within-group variance, we average the two within-group variances s₁² and s₂². The first version of the formula works whether n₁ = n₂ or not. It “weights” the variances by the sample sizes (that is, sp² will be closer to s² for the group with the larger n).
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \tag{12.11}$$
If n₁ = n₂, this formula reduces to the following. This version of the formula makes it even clearer that sp² is the average of s² for the two groups:
$$s_p^2 = \frac{s_1^2 + s_2^2}{2} \tag{12.12}$$
The alternative to the pooled-variances or equal variances assumed independent-samples t test procedure is the equal variances not assumed (also called separate variances) t test procedure.* The formula for the standard error of M₁ − M₂ for the equal variances not assumed test keeps the two variances separate instead of pooling them; this test also uses a downwardly adjusted df. You probably will never need to use the equal variances not assumed t test, but it appears in your SPSS output whether you request it or not.
Given the values of sp², n₁, and n₂, we can calculate the standard error of the difference between sample means:
$$SE_{M_1 - M_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} \tag{12.13}$$
A t ratio generally has the following form:
$$t = \frac{\text{Sample statistic} - \text{Hypothesized parameter}}{SE_{\text{sample statistic}}}$$
For the independent-samples t test, the value of (μ₁ − μ₂) is usually hypothesized to be 0. If the difference between means is 0, that is equivalent to a null hypothesis that caffeine has no effect on heart rate; that is, mean heart rate is the same whether people receive caffeine or not.
Next we calculate the independent-samples t ratio:
$$t = \frac{M_1 - M_2}{SE_{M_1 - M_2}} \tag{12.14}$$
Then calculate the degrees of freedom for the independent-samples t ratio:
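As an illustration of Equations 12.11 through 12.14 and the df rule n₁ + n₂ − 2, the following Python sketch computes the pooled variance, the standard error of the difference, the t ratio, and df for two small groups. The heart rate scores and the function name pooled_t_test are hypothetical, chosen so that the arithmetic is easy to check by hand; they are not the data analyzed in this chapter.

```python
import math
from statistics import mean, variance


def pooled_t_test(group1, group2):
    """Equal-variances-assumed (pooled) independent-samples t ratio."""
    group1 = [float(x) for x in group1]
    group2 = [float(x) for x in group2]
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    # Sample variances use n - 1 in the denominator.
    s1_sq, s2_sq = variance(group1), variance(group2)

    # Equation 12.11: weight each variance by its degrees of freedom.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    # Equation 12.13: standard error of the difference between means.
    se_diff = math.sqrt(sp_sq / n1 + sp_sq / n2)
    # Equation 12.14: t ratio, with the hypothesized difference set to 0.
    t = (m1 - m2) / se_diff
    df = n1 + n2 - 2
    return {"sp_sq": sp_sq, "se_diff": se_diff, "t": t, "df": df}


# Hypothetical heart rates (beats per minute), not data from the book.
no_caffeine = [58, 60, 62, 64, 66]
caffeine = [68, 70, 72, 74, 76]
result = pooled_t_test(no_caffeine, caffeine)
print(result)
```

Because the two hypothetical groups have equal n, sp² here is simply the average of the two within-group variances, as in Equation 12.12.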
always completely successful in controlling for rival explanatory variables, but in principle, a well-controlled experiment makes it possible to rule out many rival explanatory variables. In nonexperimental research, it is typically the case that for any predictor variable of interest (X₁), many other potential predictors of Y are correlated with or confounded with X₁. The existence of numerous rival explanations that cannot be completely ruled out is the primary reason why we say “correlation does not indicate causation.” This caveat can be worded more precisely: “Correlational (or nonexperimental) research does not provide a basis for making confident causal inferences, because there is no way to completely rule out all possible rival explanatory variables in nonexperimental studies.” A nonexperimental researcher can identify a few rival, important explanatory variables and attempt to control for their influence by doing partial correlations; however, at best, this can be done for only a few rival variables, whereas in real-world situations, there can potentially be hundreds of rival explanatory variables that would need to be ruled out before we could conclude that X₁ influences Y.
Robust: A statistic is considered robust if problems with the data (such as extreme scores) do not make the statistic a poor estimate. The mean is not robust against the effect of extreme scores.
Row percentage: A row percentage for a cell in a contingency table is found by dividing the cell n by the total number of cases in that row.
Sample: In formal or ideal descriptions of research methods, a sample is a subset of cases drawn from the population of interest (often using random sampling). In actual practice, samples often consist of readily available cases (convenience samples) that were not drawn from a well-defined broader population.
Sampling distribution of M: The theoretical distribution of obtained values for M when thousands of samples (all the same size) are randomly selected from the same population. The shape, center, and variability of values for M are predicted by the central limit theorem and can be demonstrated empirically using Monte Carlo simulations.
Sampling error: When hundreds or thousands of random samples are drawn from the same population, and a sample statistic such as M is calculated for each sample, the value of M varies across samples. This variation across samples is called sampling error. It occurs because, just by chance, some samples contain a few unusually high or low scores.
Sampling model: When random (and/or systematic) methods are used to select a sample from a population, such that the sample is representative of the population, the sampling model is the justification for making generalizations from sample to population (Trochim, 2006).
Science journal: A periodical that publishes peer-reviewed scientific research reports. Also called an academic or a professional journal.
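The entries for sampling distribution of M and sampling error mention Monte Carlo simulation. A minimal sketch of such a simulation follows, assuming a normally distributed population with hypothetical values μ = 100 and σ = 15 and samples of n = 25; the function name and all numeric settings are illustrative choices, not values from the book. It draws many random samples, computes M for each, and compares the spread of the M values with the value σ/√n predicted by the central limit theorem.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, n_samples, seed=0):
    """Draw n_samples random samples of size n and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Hypothetical population and sample size.
means = simulate_sample_means(mu=100.0, sigma=15.0, n=25, n_samples=5000)
center = statistics.fmean(means)    # should be close to mu
spread = statistics.stdev(means)    # empirical standard error of M
predicted_se = 15.0 / 25 ** 0.5     # sigma / sqrt(n) = 3.0

print(f"mean of M values: {center:.2f}")
print(f"SD of M values:   {spread:.2f} (predicted {predicted_se:.2f})")
```

The variation among the simulated M values is sampling error; increasing n in this sketch should shrink that variation, since the predicted standard error is σ/√n.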
$$df = n_1 + n_2 - 2 \tag{12.15}$$
(For the equal variances not assumed t test, df is calculated using a complicated formula; it is smaller than n₁ + n₂ − 2, and it is usually given to two or more decimal places.)
12.6 Statistical Significance of Independent-Samples t Test
I recommend that you report the exact p value for the equal variances assumed (or pooled-variances) version of the t test. This is a two-tailed test. For example, if “Sig.” as reported by SPSS is .032, report p = .032, two tailed. Remember that if SPSS gives you a “Sig.” value of .000, you should report this as p < .001. A p value estimates risk for Type I error, and that risk can never be 0.
A two-tailed exact p value corresponds to the combined areas of the upper and lower tails of the t distribution that lie beyond the obtained sample values of ±t.
If you want to report your outcome as a significance test using the conventional α = .05, two tailed, level of significance, an obtained p value less than .05 is interpreted as evidence that the t value is large enough so that it would be unlikely to occur by chance (because of sampling error) if the null hypothesis were true. In other words, if we set α = .05, two tailed, as the criterion for statistical significance, an obtained p value less than .05 leads us to conclude that the two group means are significantly different.
If an analyst decides to use a one-tailed (directional) test before peeking at the data, a one-tailed p value can be obtained by dividing the two-tailed p in the SPSS output by 2 (e.g., if two-tailed p = .06, then the corresponding one-tailed p = .03). In this situation, the analyst must also check that the direction of difference between the means corresponds to the difference in the alternative hypothesis. If Halt: μ₁ > μ₂, the null hypothesis can be rejected if M₁ > M₂ but not if M₁ < M₂.
I have used annoying quotation marks for “exact” p. I do this as a reminder that the “exact” value of p given by programs such as SPSS, often reported to 3 decimal places, is not necessarily correct. When assumptions are violated (and they often are), the p values given by a computer program often greatly underestimate the true risk for Type I decision error.
A judgment about statistical significance can also be made directly from the obtained value of t, its df, and the α level. If t is large enough to exceed the tabled critical values of t for n₁ + n₂ − 2 df, the null hypothesis of equal means is rejected, and the researcher concludes that there is a significant difference between the means. In the preceding empirical example of data from an experiment on the effects of caffeine on heart rate, n₁ = 10 and n₂ = 10; therefore, df = n₁ + n₂ − 2 = 18. If we use α = .05, two tailed, then from the table of critical values of t in Appendix B at the end of this book, the reject regions for this test (given in terms of obtained values of t) would be as follows:
Reject H₀ if obtained t < −2.101.
Reject H₀ if obtained t > +2.101.
Note that these values of t also correspond to the middle 95% of the area of a t distribution with 18 df. These t values (“critical” values) are also needed to set up a confidence interval (CI) for M₁ − M₂.
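The relation between an obtained t, its df, and the two-tailed and one-tailed p values described in this section can be sketched in Python. This is an illustrative approximation, not the procedure SPSS uses: it integrates the t density numerically with Simpson's rule, using only the standard library. The function names (t_pdf, two_tailed_p, one_tailed_p, format_p) are hypothetical helpers introduced for this example.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t_obtained, df, steps=2000):
    """Combined area in both tails beyond +/-|t| (Simpson's rule; steps must be even)."""
    upper = abs(t_obtained)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


def one_tailed_p(t_obtained, df, predicted_sign):
    """Halve the two-tailed p only when t is in the predicted direction."""
    p2 = two_tailed_p(t_obtained, df)
    if t_obtained * predicted_sign > 0:
        return p2 / 2
    return 1 - p2 / 2


def format_p(p):
    """Report an 'exact' p to 3 decimals; never report p = .000."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


# With df = 18, t = 2.101 sits at the edge of the two-tailed .05 reject region.
p = two_tailed_p(2.101, 18)
print(format_p(p), "two tailed")
print(format_p(one_tailed_p(2.101, 18, predicted_sign=+1)), "one tailed")
```

Note that the one-tailed helper returns a large p when the obtained t falls in the direction opposite to the prediction, which matches the point that the null hypothesis cannot be rejected in that case.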
SSE.
Sum of squared errors, calculated using Equation 11.5.
Standard error of the estimate: In regression, this corresponds to the standard deviation of the distribution of actual Y scores relative to the predicted Y′ at each individual X score value. SEest provides information about the typical magnitude of the prediction error (difference between actual and predicted Y) in regression. Smaller values of SEest are associated with smaller prediction errors and, thus, with more accurate predictions of Y. Also denoted sY.X.
Standard normal distribution: A normal distribution with mean = 0 and standard deviation and variance = 1. See also normal distribution.
Standard regression: A method of regression in which all predictor variables are entered into the equation at one step, and the proportion of variance uniquely explained by each predictor is assessed controlling for all other predictors. It is also called simultaneous (or sometimes direct) regression.
Standard score: The distance of an individual score from the mean of a distribution expressed in unit-free terms (i.e., in terms of the number of standard deviations from the mean). If μ and σ are known, the z score is given by z = (X − μ)/σ. When μ and σ are not known, a distance from the mean can be computed using the corresponding sample statistics, M and s (or SD). If the distribution of scores has a normal shape, a table of the standard normal distribution can be used to assess how distance from the mean (given in z score units) corresponds to proportions of area or proportions of cases in the sample that correspond to distances of z units above or below the mean.
Standardization: The term standardization has two different meanings in this book. In data analysis, standardization refers to the conversion of scores in original units of measurement (e.g., pounds, degrees, inches) into unit-free z scores (see Chapter 6). In experimental design and measurement, standardization means keeping data collection procedures as similar as possible across all participants or cases (see Chapter 2). See experimental control over other situational factors or extraneous variables.
Standardized scores: These are scores expressed in z-score units, that is, as unit-free distances from the mean. For example, the standardized score version of X, zX, is obtained as zX = (X − MX)/sX.
Statistical control: When information is available about at least one additional variable (Z), it is possible to evaluate the relationship between an X predictor and a Y outcome variable using statistical methods to partial out or remove variation associated with Z. Sometimes controlling for Z makes the association between X and Y appear stronger, but there are many ways that the inclusion of a control variable can change our understanding of the way X and Y may be related. In paired-samples designs, the control variable is “persons.”
Statistical significance: Statistical significance is evaluated by looking at a p value associated with a test statistic. If p
necessary to click the Define Groups button; this opens the Define Groups dialog box that appears in Figure 12.10. Enter the code numbers that identify the groups that are to be compared (in this case, the codes are 1 for the 0-mg caffeine group and 2 for the 150-mg caffeine group; however, different numbers can be used to identify groups). Click the OK button to run the specified tests. The output for the independent-samples t test appears in Figure 12.11.
Figure 12.8 SPSS Menu Selections to Obtain Independent-Samples t Test
Figure 12.9 Screenshot of SPSS Dialog Box for Independent-Samples t Test
The image is the dialog box for the independent-samples t test for the hr variable. On the left is the space for variables, which is blank. On the right the test variable has been specified as hr. The grouping variable has been specified as caffeine. There is a Define Groups button right below this. There is an Options button on the right. At the bottom of the dialog box are buttons for the following: OK, Paste, Reset, Cancel, and Help.
Figure 12.10 SPSS Define Groups Dialog Box for Independent-Samples t Test
The SPSS dialog box to define groups has been shown in the image. There are two options to select from. The first states Use specified values and provides values to the different groups. Here the Group 1 value is 1 and the Group 2 value is 2. The second option is the Cut point option. The first option has been selected. At the bottom there are three buttons: Continue, Cancel, and Help.
Figure 12.11 Output From SPSS Independent-Samples t Test Procedure
test.
Type I error: A decision to reject H₀ when H₀ is correct.
Type I sum of squares: This method of variance partitioning in the SPSS GLM procedure is essentially equivalent to the method of variance partitioning in sequential or hierarchical multiple regression. Each predictor is assessed controlling only for other predictors that are entered at the same step or in earlier steps of the analysis.
Type II error: A decision not to reject H₀ when H₀ is incorrect.
Unconditional probability: The overall probability of some outcome (such as survival vs. death) for the entire sample, ignoring membership on any other categorical variables. For the dog owner survival data in Chapter 17, the unconditional probability of survival is the total number of survivors in the entire sample (78) divided by the total N in the entire study (92), which is equal to .85.
Underpowered: A study is underpowered if the sample size is too small (relative to the effect size) to have a reasonable chance of rejecting H₀ when H₀ is false.
Uniform distribution: A distribution where all values of scores for the X variable have equal frequencies or proportions.
Unit free: Scores are unit free (also called standardized) when they have been converted into units that have a mean of 0 and a standard deviation of 1. When individual X scores such as height in inches are converted to z scores, they become unit free.
Unlucky randomization: Sometimes even when random assignment of cases to groups is used, just by chance, the groups end up being different in some way.
Unprotected test: A significance test that does not use more conservative decision rules to decide whether multiple comparisons between group means are statistically significant (i.e., more conservative than the decision rules for a single independent-samples t test). Protection refers to protection against the inflated risk for Type I error that arises when multiple significance tests are performed. For example, if several t tests are done after an analysis of variance, using the same decision rules as for a single independent-samples t test, these are unprotected tests. Post hoc (also called protected) tests use modified and more conservative decision rules to evaluate statistical significance; these decision rules protect against inflated risk for Type I error.
Variability: A set of scores has variability if any individual X scores differ from the mean. (Variability is the same as variation.)
Variable: A characteristic that varies across cases. For example, humans differ on variables such as blood pressure, height, and age.
Variation: A set of scores has variation if any of the individual X scores differ from the mean. (In contrast, if each score equals every other score, there is no variation in score values.)
Weighted mean: This is a mean that combines information across several groups or cells (such as the mean for all the scores in one row of a factorial analysis of variance) and is calculated by weighting each cell mean by its corresponding number, n, of cases. For example, if Group 1 has n₁ cases and a mean of M₁ and Group 2 has n₂ cases and a mean of M₂, the weighted mean Mweighted is calculated as follows: Mweighted = [(n₁M₁) + (n₂M₂)]/(n₁ + n₂).
Whiskers: In a box and whiskers plot, these are the vertical lines that extend beyond the hinges out to the adjacent values. Any scores that lie beyond the whiskers are labeled as outliers.
Within-S: See repeated measures.
*ZPRED: The standardized or z-score version of the predicted value of Y (Y′) from a multiple regression. This is one of the new variables that can be computed and saved into the SPSS worksheet in SPSS multiple regression. Also called ZPR_1.
*ZRESID: The standardized or z-score version of the residuals from a multiple regression (Y − Y′). If any of these lie outside the range that includes the middle 99% of the standard normal distribution, these cases should be examined as possible multivariate outliers. Also called ZRE_1.
z score: The formula to calculate a standard score or z score is z = (X − M)/SD. A distribution of z scores has M = 0 and SD = 1. See also standard score and standardized scores.
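Two formulas from the glossary entries above, the weighted mean and the z score, can be checked with a short Python sketch. The group sizes, means, and scores below are hypothetical, and the sketch assumes that SD is the sample standard deviation (n − 1 in the denominator), which the glossary entry does not state explicitly.

```python
from statistics import mean, stdev


def weighted_mean(ns, means):
    """Weight each group mean by its number of cases, then divide by total n."""
    total_n = sum(ns)
    return sum(n * m for n, m in zip(ns, means)) / total_n


def z_scores(scores):
    """Convert raw scores to unit-free z scores: z = (X - M) / SD."""
    m = mean(scores)
    sd = stdev(scores)  # assumption: sample SD with n - 1 in the denominator
    return [(x - m) / sd for x in scores]


# Hypothetical groups: n1 = 10 with M1 = 20, n2 = 30 with M2 = 40.
print(weighted_mean([10, 30], [20.0, 40.0]))

# Hypothetical raw scores; the resulting z scores have M = 0 and SD = 1.
z = z_scores([2.0, 4.0, 6.0])
print(z, mean(z), stdev(z))
```

The weighted mean in this example lies closer to the mean of the larger group, whereas the unweighted average of the two group means would fall exactly halfway between them.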
References
Abelson, R. P., & Rosenberg, M. J. (1958). Symbolic psycho-logic: A model of attitudinal cognition. Behavioral Science, 3, 1-13.
Aguinis, H., Gottfredson, R. K., & Joo, H. (2013). Best-practice recommendations for defining, identifying, and handling outliers. Organizational Research Methods, 16(2), 270-301. doi: 10.1177/1094428112470848
American Psychological Association. (2009). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.
American Statistical Association. (2015). Ethical guidelines for statistical practice. Retrieved from http://www.amstat.org/asa/files/pdfs/EthicalGuidelines.pdf
Anderson, C. A., & Bushman, B. J. (2001). Effects of violent video games on aggressive behavior, aggressive cognition, aggressive affect, physiological arousal, and prosocial behavior: A meta-analytic review of the scientific literature. Psychological Science, 12, 353-359.
Aronson, E., & Mills, J. (1959). The effect of severity of initiation on liking for a group. Journal of Abnormal and Social Psychology, 59, 177-181.
Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19(1), 3-11. doi: 10.2466/pr0.1966.1.3
Baum, A., Gatchel, R. J., & Schaeffer, M. A. (1983). Emotional, behavioral, and physiological effects of chronic stress at Three Mile Island. Journal of Consulting and Clinical Psychology, 51, 565-572.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory-II. San Antonio, TX: Psychological Corporation.
Begley, C. G., & Ioannidis, J. P. A. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circulation Research, 116, 116-126. doi: 10.1161/CIRCRESAHA.114.303819
Belluz, J. (2014, December 17). Scientists tallied up all the advice on Dr. Oz's show. Half of it was baseless or wrong. Vox. Retrieved from https://www.vox.com/2014/12/17/7410535/dr-oz-advice
Bewick, V., Cheek, L., & Ball, J. (2004). Statistics Review 8: Qualitative data – Tests of association. Critical Care, 8, 46-53.
Bissonnette, V. (2019). Resources for the learning and teaching of statistics and behavioral science. Retrieved from https://sites.berry.edu/vbissonnette/index/statistical-tables/
Boneau, C. A. (1960). The effects of violations of assumptions underlying the t test. Psychological Bulletin, 57(1), 49-64.
Boston Children’s Hospital. (2014, October 5). Number of genes linked to height revealed by study. Science Daily. Retrieved from https://www.sciencedaily.com/releases/2014/10/141005134909.htm
Brackett, M. A., Mayer, J. D., & Warner, R. M. (2004). Emotional intelligence and its relation to everyday behaviour. Personality and Individual Differences, 36, 1387-1402.
Brannon, L., Feist, J., & Updegraff, J. A. (2017). Health psychology: An introduction to behavior and health (9th ed.). Boston: Cengage.
Bump, P. (2013, April 2). 12 million Americans believe lizard people rule our country. The Atlantic. Retrieved from https://www.theatlantic.com/national/archive/2013/04/12-million-americans-believe-lizard-people-run-our-country/316706/
Burish, T. G. (1981). EMG biofeedback in the treatment of stress-related disorders. In C. Prokop & L. Bradley (Eds.), Medical psychology (pp. 395-421). New York: Academic Press.
Campbell, D. T., & Stanley, J. (1963). Experimental and quasi-experimental designs for research. New York: Wadsworth.
Campbell, D. T., & Stanley, J. S. (2001). Experimental and quasi-experimental designs for research (2nd ed.). Boston: Houghton Mifflin.
Campbell, L., Vasquez, M., Behnke, S., & Kinscherff, R. (2009). APA ethics code commentary and case illustrations. Washington, DC: American Psychological Association.
Carifio, J., & Perla, R. (2008). Resolving the 50-year debate around using and misusing Likert scales. Medical Education, 42, 1150-1152.
Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment (Quantitative applications in the social sciences, No. 17). Beverly Hills, CA: Sage.
CNN. (2018, October 11). CNN terms of use. Retrieved from https://www.theatlantic.com/national/archive/2013/04/12-million-americans-believe-lizard-people-run-our-country/316706/
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1992a). A power primer. Psychological Bulletin, 112(1), 155-159. doi: 10.1037/0033-2909.112.1.155
Cohen, J. (1992b). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98-101. doi: 10.1111/1467-8721.ep10768783
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2013). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Hillsdale, NJ: Lawrence Erlbaum.
for the independent-samples t test; this discussion includes only the most widely reported.
12.10.1 M₁ − M₂
When the dependent variable Y is measured in meaningful units, the difference between sample means can be useful information (Pek & Flora, 2018), although many authors do not refer to that difference as an effect size. The difference between means can sometimes be interpreted as information about practical, clinical, or everyday importance. In this hypothetical example, people who consumed 150 mg of caffeine (about one cup of coffee) had heart rates about 10 beats per minute higher than those who did not consume caffeine. That is a noticeable difference, but not large enough that people need to be worried about it.
To make judgments about clinical or practical significance of differences between means, we need to understand the meanings of different score values; even then, people can have different subjective evaluations. Imagine a situation in which people who receive chemotherapy for a specific type of cancer live on average 3 weeks longer than people who decline chemotherapy. Apart from the question of whether this difference is statistically significant, we have the question, How much practical value does a 3-week difference have? A medical researcher might be pleased to find a treatment that extends life by 3 weeks. As a patient, however, I might not want to undergo possibly severe negative side effects unless the average extension of life was 2 or 3 months.
In situations like this, clinicians and patients should remember that group averages often do not predict individual outcomes well. If median improvement in length of survival is 3 months, half of the patients in the study had shorter, and half had longer, improvements in length of survival. Ability to generalize results from a study to your own personal situation should also take into account how similar you are, and how similar your disease condition is, to persons included in the study.
In the extremes it may be easy to say whether a treatment such as a weight loss pill has practical or real-world significance. Most people would not think that a mean weight loss of 1 lb is enough to be meaningful or valuable. On the other hand, most people might think that a mean weight loss of 30 lb is enough to have practical, clinical, or real-world value. For in-between amounts of weight loss, people may differ in how much they think is sufficient to be of value, relative to costs and risks of the treatment.
When variables are not measured in meaningful units, M₁ − M₂ may not provide useful real-world information (although it may still be interesting to compare values of M₁ − M₂ across different studies that use the same measures). For example, suppose you are told that female teachers receive average teaching evaluation scores of 24, while male teachers receive average evaluation scores of 27. You can see that the mean rating is higher for male than female teachers in this example, but you would need much more information to evaluate whether the difference is large. It is usually helpful to know the possible minimum and possible maximum score value and the actual minimum and maximum values found in the sample (this information is sometimes not included, but it should be). Other effect size indexes use standard deviation or variance of scores to evaluate effect size.
The value of M₁ − M₂ is not related to sample size
Friedmann, E., Katcher, A. H., Lynch, J. J., & Thomas, S. A. (1980). Animal companions and one year survival of patients after discharge from a coronary care unit. Public Health Reports, 95, 307-312.
Frigge, M., Hoaglin, D. C., & Iglewicz, B. (1989). Some implementations of the box plot. American Statistician, 43(1), 50-54.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2-18. doi: 10.1037/a0024338
GAISE College Report ASA Revision Committee. (2016). Guidelines for assessment and instruction in statistics education (GAISE): College report 2016. Retrieved from http://www.amstat.org/education/gaise
Gaito, J. (1980). Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87, 564-567.
Gallup. (n.d.). Methodology center: Scientifically proven methodology and rigorous research standards. Retrieved from http://www.gallup.com/178685/methodology-center.aspx
Glen, S. (2013, December 3). Choose bin sizes for histograms in easy steps + Sturge's rule. Retrieved from http://www.statisticshowto.com/choose-bin-sizes-statistics/
Godlee, F., Smith, J., & Marcovitch, H. (2011). Wakefield's article linking MMR vaccine and autism was fraudulent. British Medical Journal, 342, 7452. doi: 10.1136/bmj.c7452
Grande, T. (2015, May 13). “Visual binning” features on SPSS. Retrieved from https://www.youtube.com/watch?v=tAdmnPegsig
Gray, J., & Griffin, B. (2009). Eggs and dietary cholesterol: Dispelling the myth. Nutrition Bulletin, 34(1), 66-70. doi: 10.1111/j.1467-3010.2008.01735.x
Gray, M. (1985). Legal perspectives on sex equity in faculty employment. Journal of Social Issues, 41(4), 121-134.
Green, C. D., Abbas, S., Belliveau, A., Beribisky, N., Davidson, I. J., DiGiovanni, J., ... Wainewright, L. M. (2018). Statcheck in Canada: What proportion of CPA journal articles contain errors in the reporting of p-values? Canadian Psychology, 59(3), 203-210. doi: 10.1037/cap0000139
Greenland, S., Maclure, M., Schlesselman, J. J., Poole, C., & Morgenstern, H. (1991). Standardized regression coefficients: A further critique and review of some alternatives. Epidemiology, 2, 387-392.
Greenland, S., Schlesselman, J. J., & Criqui, M. H. (1986). The fallacy of employing standardized coefficients and correlations as measures of effect. American Journal of Epidemiology, 123, 203-208.
Grimm, K. J., & Ram, N. (2016). Growth modeling: Structural equation and multilevel modeling approaches. Thousand Oaks, CA: Sage.
Guthrie, R. V. (2004). Even the rat was white: A historical view of psychology (2nd ed.). Boston: Allyn & Bacon.
Harker, L., & Keltner, D. (2001). Expressions of positive emotion in women’s college yearbook pictures and their relationship to personality and life outcomes across adulthood. Journal of Personality and Social Psychology, 80, 112-124.
Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
Hausman, J. S., Berna, R., Gujral, N., Ayubi, S., Hawkins, J., Brownstein, J. S., & Dedeoglu, F. (2018). Using smartphone crowdsourcing to redefine normal and febrile temperatures in adults: Results from the Feverprints study. Journal of General Internal Medicine, 33, 2046-2047. doi: 10.1007/s11606-018-4610-8
Hays, W. (1994). Statistics (5th ed.). Fort Worth, TX: Harcourt Brace.
Hays, W. L. (1973). Statistics for the social sciences (2nd ed.). New York: Holt, Rinehart.
Henry, P. J. (2008). College sophomores in the laboratory redux: Influences of a narrow data base on social psychology’s view of the nature of prejudice. Psychological Inquiry, 19(2), 49-71. doi: 10.1080/10478400802049936
Hodges, J. L., Jr., Krech, D., & Crutchfield, R. S. (1975). Statlab: An empirical introduction to statistics. New York: McGraw-Hill.
Hoekstra, R., Kiers, H.A.L., & Johnson, A. (2012, May 14). Are assumptions of well-known statistical tests checked, and why (not)? Frontiers in Psychology, 3, Article 137. doi: 10.3389/fpsy.2012.00137
Hogg, R. V., Tanis, E., & Zimmerman, D. (2014). Probability and statistical inference (9th ed.). Boston: Pearson.
Howell, D. C. (1992). Statistical methods for psychology (3rd ed.). Boston: PWS-Kent.
Huff, D. (1954). How to lie with statistics. New York: Norton.
Huff, D., & Geis, I. (1993). How to lie with statistics (Reissue ed.). New York: W. W. Norton.
Jaccard, J., & Becker, M. A. (2009). Statistics for the behavioral sciences (5th ed.). Pacific Grove, CA: Wadsworth Cengage Learning.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532. doi: 10.1177/0956797611430953
Kendall, M. (1962). Rank correlation methods (3rd ed.). New York: Hafner.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196-217. doi: 10.1207/s15327957pspr0203_4
Keys, A. B. (1980). Seven countries: A multivariate analysis of death and coronary heart disease. Cambridge, MA: Harvard University Press.
Kiely, E., & Robertson, L. (2016, November 18). How to spot fake news. FactCheck.org. Retrieved from http://www.factcheck.org/2016/11/how-to-spot-fake-news/
Kihlstrom, J. F. (2010). Social neuroscience: The footprints of Phineas Gage. Social Cognition, 28(6), 757-783. doi: 10.1521/soco.2010.28.6.757
Kirk, R. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759.
Kline, R. B. (2013). Beyond significance testing: Reforming data analysis in behavioral research (2nd ed.). Washington, DC: American Psychological Association.
Kling, K. C., Hyde, J. S., Showers, C. J., & Buswell, B. N. (1999). Gender differences in self-esteem: A meta-analysis. Psychological Bulletin, 125, 470-500.
Koch, G. G. (1982). Intraclass correlation coefficient. In S. Kotz & N. L. Johnson, Encyclopedia of statistical sciences (pp. 213-217). New York: John Wiley.
Kopf, D. (2015, October 5). Should you ever use a pie chart? Retrieved from https://priceonomics.com/should-you-ever-use-a-pie-chart/
Kuhn, T. S., & Hacking, I. (2012). The structure of scientific revolutions: 50th anniversary edition (4th ed.). Chicago: University of Chicago Press.
Kumar, G.N.S. (2015, March 15). Visual binning in SPSS. Retrieved from https://www.youtube.com/watch?v=WHuXyVaRPvM
Lenhard, J. (2006). Models and statistical inference: The controversy between Fisher and Neyman-Pearson. British Journal of Philosophy of Science, 57(1), 69-91. doi: 10.1093/bjps/axi152
Lenth, R. V. (2018). Java applets for power and sample size. Retrieved September 3, 2019, from http://www.stat.uiowa.edu/~rlenth/Power
Lienhard, J. (2002). No. 1712: Nightingale’s graph. Retrieved from https://www.uh.edu/engines/epi1712.htm
Lindeman, R. H., Merenda, P. F., & Gold, R. Z. (1980). Introduction to bivariate and multivariate analysis. Glenview, IL: Scott, Foresman.
Lowry, R. (2019). The confidence interval of rho. Retrieved from http://vassarstats.net/rho.html
Lyon, D., & Greenberg, J. (1991). Evidence of codependency in women with an alcoholic parent: Helping out Mr. Wrong. Journal of Personality and Social Psychology, 61, 435-439.
Mackowiak, P. A., Wasserman, S. S., & Levine, M. M. (1992). A critical appraisal of 98.6 degrees F, the upper limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich. JAMA, 268, 1578-1580.
Maril, C. C. (2018, August 29). 98.6 degrees is a normal body temperature, right? Not quite. Wired. Retrieved from https://www.wired.com/story/98-degrees-is-a-normal-body-temperature-right-not-quite/
Maronna, R. A., Martin, R. D., Yohai, V. J., & Salibián-Barrera, M. (2019). Robust statistics: Theory and methods (with R). Hoboken, NJ: John Wiley.
McGill, R., Tukey, J. W., & Larsen, W. A. (1978). Variations of box plots. American Statistician, 32(1), 12-16.
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153-157. doi: 10.1007/BF02295996
Mersey, J.C.B., & Gough-Calthorpe, A. (1912). Report of a formal investigation into the circumstances attending the foundering on the 15th April, 1912, of the British Steamship “Titanic,” of Liverpool, after striking ice in or near latitude 41° 46’ N., longitude 50° 14’ W., North Atlantic Ocean, whereby loss of life ensued. London: His Majesty's Stationery Office.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156-166.
Mills, J. L. (1993). Data torturing. New England Journal of Medicine, 329, 1196-1199.
Mischel, W. (1968). Personality and assessment. New York: John Wiley.
Montecino, V. (1998). Criteria to evaluate the credibility of WWW resources. Retrieved from https://mason.gmu.edu/~montecin/web-eval-sites.htm
Mooney, K. M. (1990). Assertiveness, family history of hypertension, and other psychological and biophysical variables as predictors of cardiovascular reactivity to social stress. Dissertation Abstracts International, 51(3-B), 1548-1549.
Myers, J. L., & Well, A. D. (1995). Research design and statistical analysis. Mahwah, NJ: Lawrence Erlbaum.
Nightingale, F. (1858). Notes on matters affecting the health, efficiency, and hospital administration of the British Army founded chiefly on the experience of the late war. London: Harrison & Sons.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. doi: 10.1126/science.aac4716
Pearson, E. S., & Hartley, H. O. (Eds.). (1970). Biometrika tables for statisticians (3rd ed., Vol. 1). Cambridge, UK: Cambridge University Press.
Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208-225. doi: 10.1037/met0000126
Peters, J. (2013, July 13). When ice cream sales rise, so do homicides. Coincidence, or will your next cone murder you? Slate. Retrieved from http://www.slate.com/blogs/crime/2013/07/09/warm_weather_homicide_rates_when_ice_cream_sales_rise_homicides_rise_coincidence.html
Pickering, T. G., Gerin, W., & Schwartz, A. R. (2002). What is the white-coat effect and how should it be measured? Blood Pressure Monitoring, 7(6), 293-300.
Pierce, R. (2017, March 29). Quincunx (Galton board). Retrieved May 9, 2019, from http://www.mathisfun.com/data/quincunx.html
Polya, G. (2014). Mathematics and plausible reasoning: Two volumes in one. New York: Martino Fine Books. (Original work published 1954)
Rasco, D. (2020). Companion volume for R. Thousand Oaks, CA: Sage.
Record, R. G., McKeown, T., & Edwards, J. H. (1970). An investigation of the difference in measured intelligence between twins and single births. Annals of Human Genetics, 34, 11-20.
Resnick, B. (2019). Hyped-up science erodes trust. Here's how researchers can fight back. Vox. Retrieved from https://www.vox.com/science-and-health/2019/6/11/18652225/hype-science-press-releases
Rosenthal, R. (1966). Experimenter effects in behavioral research. New York: Appleton-Century-Crofts.
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: Methods and data analysis (2nd ed.). New York: McGraw-Hill.
Rubin, Z. (1970). Measurement of romantic love. Journal of Personality and Social Psychology, 16, 265-273.
Rubin, Z. (1976). On studying love: Notes on the researcher-subject relationship. In M. P. Golden (Ed.), The research experience (pp. 508-513). Itasca, IL: Peacock.
Sawilowsky, S. S., & Blair, R. C. (1992). A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychological Bulletin, 111(2), 352-360. doi: 10.1037/0033-2909.111.2.352
Schmidt, C. M. (2004). David Hume: Reason in history. Philadelphia: Pennsylvania State University Press.
Schönbrodt, F. (2011). What is a reasonable sample size for correlation analysis? Retrieved from https://stats.stackexchange.com/questions/15842/what-is-a-reasonable-sample-size-for-correlation-analysis-for-both-overall-and-s
Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on psychology's view of human nature. Journal of Personality and Social Psychology, 51, 515-530.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Shoemaker, A. L. (1996). What’s normal? Temperature, gender, and heart rate. Journal of Statistics Education, 4. Retrieved June 27, 2006, from https://www.tandfonline.com/doi/full/10.1080/10691898.1996.11910512
Sigall, H., & Ostrove, N. (1975). Beautiful but dangerous: Effects of offender attractiveness and nature of the crime on juridic judgment. Journal of Personality and Social Psychology, 31, 410-414.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life after p-hacking. In S. Botti & A. Labroo (Eds.), Advances in consumer research. Duluth, MN: Association for Consumer Research.
Simons, D. J., Shoda, Y., & Lindsay, S. (2017). Constraints on generalizability (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123-1128. doi: 10.1177/17546911708630
Snedecor, G. W., & Cochran, W. G. (1989). Statistical methods (8th ed.). Ames: Iowa State University Press.
Spring, B., Chiodo, J., & Bowen, D. J. (1987). Carbohydrates, tryptophan, and behavior: A methodological review. Psychological Bulletin, 102, 234-256.
Statistical Consultants Ltd. (2012, April 23). Titanic survival data. Retrieved from http://www.statisticalconsultants.co.nz/blog/titanic-survival-data.html
Sternberg, R. J. (1997). Construct validation of a triangular love scale. European Journal of Social Psychology, 27, 313-335.
Stevens, S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Stevens, S. (1951). Mathematics, measurement, and psychophysics. In S. Stevens (Ed.), Handbook of experimental psychology (pp. 1-49). New York: John Wiley.
Stricker, L. J. (1991). Current validity of 1975 and 1985 SATs: Implications for validity trends since the mid-1970s. Journal of Educational Measurement, 28(2), 93-98.
Tabachnick, B. G., & Fidell, L. S. (2018). Using multivariate statistics (7th ed.). Boston: Pearson.
Tankard, J. W. (1984). The statistical pioneers. Cambridge, MA: Schenkman.
standard deviations. It helps us visualize how much overlap there is between two distributions of scores. The following examples illustrate small versus large values of Cohen's d. Figure 12.13 shows a small effect size. Data from numerous studies suggest that men tend to have self-esteem scores about .22 (two tenths) SD higher than those of women (i.e., Cohen's d = .22). This is a small effect. Figure 12.13 shows the overlap between these two distributions of scores. The normal distribution on the left represents self-esteem scores for women, with the mean located at d = 0. The distribution on the right represents self-esteem scores for men, with the mean located at d = .22.
Figure 12.13 Small Cohen’s d Effect Size and Overlap of Female (Left) Versus Male (Right) Distributions of Self-Esteem Scores
[Figure: two overlapping normal curves plotted against d. Cohen's d = 0.22; Overlap = 83.7%.]
Source: Kling, Hyde, Showers, and Buswell (1999).
Note: Across numerous studies, the average difference in self-esteem between male and female samples is estimated to be about .22; mean self-esteem for men is typically about two tenths of a standard deviation higher than mean self-esteem of women.
Figure 12.14 Large Cohen's d Effect Size and Overlap of Female (Left) Versus Male (Right) Distributions of Heights
[Figure: two overlapping normal curves plotted against d. Cohen's d = 2.00; Overlap = 18.9%.]
Source: http://en.wikipedia.org/wiki/Effect_size.
Note: From samples of men and women in the United Kingdom, mean height for men = 1,755 mm, and mean height for women = 1,620 mm. The standard deviation for height = 67.5 mm. Therefore Cohen's d = (Mmale − Mfemale)/s = (1,755 − 1,620)/67.5 = 2.00.
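The arithmetic in the note to Figure 12.14 can be checked with a short sketch. The means and standard deviation are the values given in the note. The overlap function is an assumption: the text does not say how the overlap percentages were obtained, and the sketch uses 1 minus Cohen's U1 (the nonoverlap index for two equal-variance normal distributions), which reproduces the 18.9% reported for d = 2.00 and gives roughly 84% for d = .22, close to the reported 83.7%.

```python
from math import erf, sqrt


def cohens_d(mean_1, mean_2, sd):
    """Cohen's d: difference between two means in standard deviation units."""
    return (mean_1 - mean_2) / sd


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def overlap_from_d(d):
    """Overlap as 1 - Cohen's U1 (an assumed definition, not stated in the text)."""
    p = normal_cdf(abs(d) / 2.0)
    u1 = (2.0 * p - 1.0) / p  # proportion of the combined area that does not overlap
    return 1.0 - u1


# Height example: values from the note to Figure 12.14 (in mm).
d_height = cohens_d(1755, 1620, 67.5)
overlap_height = overlap_from_d(d_height)

# Self-esteem example: d = .22 as reported for Figure 12.13.
overlap_self_esteem = overlap_from_d(0.22)

print(f"Height: d = {d_height:.2f}, overlap = {overlap_height:.1%}")
print(f"Self-esteem: d = 0.22, overlap = {overlap_self_esteem:.1%}")
```

Other definitions of overlap exist and give different percentages for the same d, so the function above should be read as one plausible reconstruction of the figures' values.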
Wilkinson, L., & Task Force on Statistical Inference, APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed.). New York: McGraw-Hill.
Wootson, C. R., Jr. (2017, July 18). Diet drinks are associated with weight gain, new research suggests. The Washington Post. Retrieved from https://www.washingtonpost.com/news/to-your-health/wp/2017/07/18/diet-drinks-are-associated-with-weight-gain-new-research-suggests/?utm_term=.83b6d025e6b5
Zumbo, B. D., & Zimmerman, D. W. (1993). Is the selection of statistical methods governed by level of measurement? Canadian Psychology, 34, 390-400.
Index
Accidental sample, 30
Aggregated data, 283
Alpha (α) level
  choosing of, 198-200
  reject regions, 200-202
Alternative hypothesis (Ha)
  reject regions, 200-202
  selection of, 195-197
American Statistical Association Guidelines for Assessment and Instruction in Statistics Education, 1, 10
Analysis
  bivariate. See Bivariate analyses
  definition of, 16
  selection of, 28-29
  selective reporting of, 567
  variable type based on, 18
Analysis of covariance (ANCOVA), 507
Analysis of variance (ANOVA)
  assumptions for, 377-378
  computations for, 378-383
  confidence intervals for group means, 385
  contrast coefficients, 388
  data screening for, 377-378
  description of, 306
  division of scores into components, 400-403
  effect sizes for, 385-386
  error in, 375
  expected F value, 403-405
  factorial, 374. See also Factorial ANOVA
  factor in, 374
  hypothetical research example for, 375
  independent-samples t test and, 405-406
  Kruskal-Wallis test versus, 407-408
  nonparametric alternative to, 407-408
  null hypothesis for, 375, 403-405
  planned contrasts, 387-389, 399, 467
  post hoc or protected tests, 390-391, 467
  preliminary data screening for, 377-378
  questions in, 375-376
  repeated-measures. See Repeated-measures ANOVA
  reporting results from, 397-398
  research situations for, 374-375
  source table, 452
  in SPSS, 391-394
  in SPSS, output, 394-397
  SSbetween, 380-381, 383-385
  SStotal, 381-382
  SSwithin, 381, 383-385
  SSwithin groups, 378
  statistical power analysis for, 386-387
  study, planning of, 398-399
  summary of, 399-400
  t test versus, 374, 405-406
Anecdotal evidence, 4
Anecdotes, 4
ANOVA. See Analysis of variance
A priori comparisons, 376
Areas
  under normal distribution, 138-140
  z scores and, 138
Area under the curve, 153
Arithmetic operations, order of, 94
Artifacts, 253
Associations, 237-239
Asymptotic output, 543
Attenuation (of correlation) due to unreliability, 282
Attrition, 432
Bar charts
  construction of, 101-102
  data screening uses of, 122-124
  deceptive, 11, 102-103
  for frequencies of categorical variables, 100-101
  group means represented with, 125-126
Bar graphs, 11
Bell-shaped distribution
  communication about, 120
  description of, 78-79, 109-111, 138
  illustration of, 103-104
  mean for, 82
  sketching of, 109-111
Bernoulli trials, 161
Beta (β, risk for Type II error), 221, 223, 231, 299
Between-S, 28, 374, 413, 417, 437
  See also Independent-samples t test
Between-S factorial ANOVA, 487-489
Bias
  confirmation, 2, 9-10
  in correlation, 278
Bimodal distribution, 79-80, 106
Binned frequency distribution, 48-49
Binning, 48, 113
Binomial distribution, 161, 163
Bivariate analyses
  dependent variables for, 563-564
  independent variables for, 563-564
  nonparametric, 564-565
  parametric, 564-565
  results section of, 569-570
  selection of, 563
  variables added to, 570-572
Bivariate outliers, 242, 244-246, 455
Bivariate Pearson correlation. See Pearson's r
Bivariate regression
  advantages of, 291
  comparing two forms of regression, 295-296
  information provided by, 290-291
  partition of sums of squares in, 312-313
  planning of study, 314-315
  predictor variables in, 314
  preliminary data screening for, 297-298
  questions for answering, 296-297
  regression equations, 291-296
  regression line, 291-292, 296, 304
  research situations using, 290
  statistical significance tests for, 300
Bivariate regression coefficients, 298-299
Bonferroni procedure, 256-257, 390, 468
Boxplot
  definition of, 115
  outer fences in, 117
  setting up, 115-117
  SPSS for obtaining, 117-120, 123
Carryover effects, 431, 474
Case number, 44
Cases, 16
Case study, 4
Categorical variables
  bar charts for frequencies of, 100-101
  data screening for, 46
  dependent, 563-564
  description of, 17, 38, 571-572
  frequency distribution tables for, 40, 49-50
  independent, 563-566
  mode for, 45
  naturally occurring groups, 50
  pie charts for, 99-100
  treatment groups, 50
Causal claims
  description of, 6
  “post hoc, ergo propter hoc” fallacy, 6
Causal inference
  conditions for, 20-21
  correlation and, 235
  description of, 568-569
  evidence of, requirements for, 8-9
Ceiling effect, 147
Central limit theorem, 169-173, 176
Central tendency measures
  description of, 72
  lying with, 83
  See also specific measure
Change scores, 423
Cherry-picking, 2
Chi-square (χ²) test
  computation of, 533-535
  description of, 28, 532
  effect size indexes for, 552
  effect sizes, 536-538
  expected cell frequencies if H0 is true, 532-533
  as “goodness of fit” index, 559
  one-way, 559
  results, reporting, 542-543
  SPSS, 538-540
  SPSS, crosstabs procedure, 540-542
  statistical significance of, 535-536
  in structural equation modeling, 560
  uses of, 558-560
Citation, 4
Clinical significance, 217
Close to the mean, 180
Coefficient of determination, 258
Cognitive behavioral therapy, 5
Cohen’s d, 214-215, 217-218, 348-349, 353
Communicators
  credentials of, 3
  skills of, 3
Complete counterbalancing, 473
Completely crossed, 481
Computer simulations, 33
Conceptual replication, 9
Concordant pairs, 253
Conditional probabilities, 528-529
Confidence intervals
  around M1 − M2, 342
  body temperature example of, 184-187
  description of, 143, 169
  example of, 184-187
  graphing of, 357-358
  for group means, 385
  independent-samples t test, 357-358
  interpreting of, 183
  95%, for Pearson's r, 273-274
  for regression coefficients, 300
  sample statistics obtained using, 187
  sampling error used to set up, 181-182
Confidence levels
  for correlations, 257
  in research reports, 257
Confirmation bias, 2, 9-10
Confirmatory evidence, 195
Confirmatory studies, null-hypothesis significance testing in, 224
Confounded variables, 21
Confounds, 21, 23, 363
Consensus, 10
Construct validity, 314
Contingency, defined, 526-528
Contingency tables
  association, measures of, 552
  assumptions for, 543-551
  chi-square analysis of. See Chi-square (χ²) test
  conditional probabilities, 528-529
  data screening for, 543-551
  description of, 524
  examples of, 524-526, 530-532
  expected cell frequencies if H0 is true, 532-533
  expected values in cells, minimum requirements for, 543-544
  Fisher exact test, 556-557
  groups, combining, 547-551
  groups, removing, 544-547
  marginal distributions for X and Y constrain maximum value of φ, 557-558
  McNemar test, 553-556
  null hypothesis for, 529-533
  observations, independence of, 543
  with repeated measures, 553-556
  2 × 2, 557
  unconditional probability, 528
Contrast coefficients, 388, 393
Contrasts
  description of, 376
  in general linear model, 456-460
  polynomial, 458
  population variance of, 455
  repeated, 457-458
  simple, 457
Control variables, 571-572
Convenience sample, 30, 172
Correlation
  alpha level for tests of, 256
  attenuation of, due to unreliability, 282
  bias in, 278
  causal inference and, 235
  causation and, 6
  computation of, 252-253
  confidence levels for, 257
  cross-validation of, 256
  differences between, testing significance of, 274-275
  limiting the number of, 256
  magnitude of, 283
  meaning of, 6
  as necessary but not sufficient condition, 7
  Pearson product-moment, 6
  perfect, 7-8
  point biserial, 347
  replication of, 256
  skepticism about, 270-271
  spurious, 263-264, 281
Correlational study, 24
Counterbalancing, 472-474
Covariance, 285-286
Cramer's V, 536, 540
Cronbach’s alpha reliability, 450
Cubic trends, 459
Cumulative percentage, 48, 137
Data
  aggregated, 283
  definition of, 16
  repeated-measures, 419-420, 444
Data analysis, 376
Data collection, ethical issues in, 10-11
Data organization
  for independent-samples t test, 418-419
  for paired-samples t test, 419
Data reporting, 39
Data screening
  for ANOVA, 377-378
  bar charts, 122-124
  for categorical variables, 46
  for frequency distribution tables, 39-40
  graphs for, 121
  preliminary, 149
Dataset, 16
Data torturing, 572
Deceptive bar graphs, 11, 102-103
Degree of belief, 12, 573
Degrees of freedom (df)
  description of, 88-89, 179
  in factorial ANOVA, 489-493
  reject regions, 200-202
Dependent variables, 19, 314, 563-564
Descriptive statistics
  in journal articles, 92-93
  notation, 73
  quantitative variables, 72, 150-151
  reporting of, 92
  SPSS use of, for obtaining quantitative variable, 83-85
Descriptive use of statistics, 167
Deviation from the mean, 136
Dichotomous variable, 242
Difference (d) scores, 420-421
Directional test, 196, 202
Disconfirmatory evidence, 195
Discordant pairs, 253
Distribution
  bell-shaped. See Bell-shaped distribution
  bimodal, 79-80, 106
  binned frequency, 48-49
  F, 383
  frequency. See Frequency distribution
  Gaussian, 103, 163
  grouped frequency, 48-49
  J-shaped, 105, 120
  normal. See Normal distribution
  skewed, 80-82
  trimodal, 106
  uniform, 105
Distribution shapes, 98, 150-151
Effect size
  for analysis of variance, 385-386
  for chi-square test, 536-538
  computation of, 349-350
  description of, 214-216, 226, 300
  in factorial ANOVA, 493-494
  for independent-samples t test, 345-353, 429
  indexes, 258
  interpretation of, 351
  N and, 353-355
  for paired-samples t test, 429-430
  Pearson's r and r² as, 258-261
  for repeated-measures ANOVA, 470
  summary of, 350-353
  unit-free, 570
Effect size indexes
  for chi-square test, 552
  Cohen's d, 348-349
  eta squared, 346-347, 364
  for independent-samples t test, 345-353
  M1 − M2, 345-346
  point biserial r, 347-348
Equal variances assumed version of the t test, 340, 349, 357
Equal variances not assumed t test, 333, 372
Error
  definition of, 223, 375
  prediction, 171-172, 223
  sampling. See Sampling error
  technical types of, 224
  Type I, 220-223
  Type II, 220-223
Error bars, in graphs of group means, 188-189, 357-359
Errors in interpretation, 16
Error variance
  description of, 260
  within-group, 356
Eta squared (η²)
  calculation of, from F ratio, 385
  description of, 346-347, 364
Ethical issues, in data collection, 10-11
Evidence
  replication of, 9
  selective reporting of, 567
  supporting, 4-5
Exact replication, 9
Experimental controls, 21, 23, 569
Experimental error, 356
Experimental research design
  control group in, 21
  definition of, 21
  experimental control in, 22
  quasi-, 25-26
Experiments, quasi-, 25-26
Experiment-wise alpha (EW), 256
“Experiment-wise” error rate, 468, 496
Exploratory studies, 224
External validity, 27-28
Extreme bivariate outliers, 244-246
Extreme scores, 77-78
Extreme values, 198
Factor, in ANOVA, 374
Factorial ANOVA
  assumptions, violations, 486
  A × B, test of, 484-485
  between-S, 487-489
  components in, 520
  degrees of freedom calculation in, 489-493
  description of, 571
  effect size estimates in, 493-494
  fixed factors, 508
  F ratio in, 483, 506
  group means, 505-506
  hypothetical research situation, 486-487
  main effect differences, 496
  main effect for Factor A, null hypothesis for, 484
  main effect for Factor B, null hypothesis for, 484
  model for, 515-518
  nonexperimental research situations using, 482
  nonorthogonal, partition of variance in, 513-515
  null hypotheses in, 484-485
  orthogonal, 486, 489, 511-513
  questions in, 482-483
  random factors, 508
  research situations using, 481-482
  results, 504-505
  SPSS GLM procedure for, 496-499
  SPSS output, 499-504
  statistical power in, 494-495
  summary of, 507
  sum of squares in, 489-493, 505-506, 518-520
  two-way, 515
  two-way interaction, 495-496
  2 × 2, 495, 504
  unequal cell n's in, 508, 510-515
  unweighted means, 508-510
  weighted means, 508-510
Factorial design, 481
F distribution, 383
Fisher, R. A., 559
Fisher exact test, 556-557
Fisher’s Z, 273
Fixed factors, 508
Floor effect, 146
F ratio
  description of, 371, 379, 382, 399, 426
  expected value, 403-405
  in factorial ANOVA, 483, 506
Frequency, 40-41
Frequency counts, 40-41
Frequency distribution
  binned, 48-49
  grouped, 48-49
  ungrouped, 46-48
Frequency distribution tables
  for categorical variables, 40, 49-50
  cumulative percentage, 48
  for data screening, 39-40
  elements of, 40-42
  frequency counts, 40-41
  missing values, 41, 44, 63-65
  overview of, 37-39
  percentages in, 41-42
  proportions, 41
  for quantitative variables, 39, 46-50
  SPSS for obtaining, 42-44
  total number of scores in a sample, 41
  ungrouped, 46-49
  variation among scores in, standard deviation for describing, 90-91
Friedman one-way ANOVA test, 476-478
Function
  definition of, 152
  linear, 152
Gallup, 5, 29
Galton board, 161-162
Gaussian distribution, 103, 163
  See also Normal distribution
Generalizability, 5, 568
Generalization, 16
General linear model
  contrasts in, 456-460
  definition of, 449
  simple contrasts in, 457
  SPSS procedure, for repeated-measures ANOVA, 460-464
  variables added to, 474-475
GLM. See General linear model
Goodness of fit index, 559
Graphs
  bar, 11
  lying with, 11
  maps as formats for, 127
  research uses of, 121-122
Greenhouse-Geisser df, 465-466
Grouped frequency distribution, 48-49
Group means
  bar charts used to represent, 125-126
  comparisons among, 375
  distances among, information about, 380-381
  error bars in graphs of, 188-189
  factorial ANOVA, 505-506
Harmonic mean of n's, 387, 412
Higher order polynomials, 459
Histograms
  examples of, 132-133, 145
  for groups, 123
  negatively skewed, 249
  for quantitative variables, 103-107
  setting up, 111-115
  SPSS used to obtain, 107-109, 248
Homogeneity of variance, 332, 456
Homoscedasticity, 297
Human error, 226
Hume, David, 12
Huynh-Feldt procedure, 466
Hypothesis
  alternative, 195-197
  null. See Null hypothesis (H0)
Hypothetical or imaginary population, 30
Imaginary population, 30
Imperfect association, 8
Independent-samples t ratio, 340-341, 350
Independent-samples t test
  ANOVA and, 405-406
  assumptions for use of, 332-338
  computation of, 338-341
  confidence interval around M1 − M2, 342
  confidence intervals, 357-358
  confounds, 363
  data organization for, 418-419
  description of, 28-29, 122, 413
  effect size indexes for, 345-353, 429
  formula for, 353
  groups in, 414
  hypothetical research example for, 331
  Mann-Whitney U test versus, 365, 367
  nonparametric alternative to, 365-367
  null hypothesis for, 403, 422
  outliers within groups, 333-334, 337
  paired-samples t test versus, 413, 426-429
  preliminary data screening, 335-338
  research situations for, 329-331
  Results section, 357
  sample size for, decisions about, 361-364
  SPSS commands for, 342-344
  SPSS output for, 344-345
  SS partition in, 453-455
  statistical power for, 362
  statistical significance of, 341-342
  study design, issues in, 363-364
  summary of, 364-365
  terms for, values of, 332-338
Independent variables, 19, 314, 563-564, 571-573
Index
  effect size. See Effect size indexes
  for kurtosis, 158-159
  for skewness, 157-158
Inferential use of statistics, 167
Institutional animal care and use committee, 10
Interaction effect, 487
Intercept (b0), 291
Internal validity
  description of, 27-28
  threats to, 432
Interpretation, errors in, 16
Interquartile range, 116
Interval level of measurement, 32
Journal articles, descriptive statistics in, 92-93
J-shaped distribution, 105, 120
Kendall's tau correlations, 272-273, 563
Kolmogorov-Smirnov test, 148, 159
Kruskal-Wallis test, 407-408
Kurtosis
  description of, 148
  formula for, 158
  index for, 158-159
Latin squares, 473-474
Leptokurtic distribution shape, 148
Level of confidence, 181
Levels of a factor, 482
Levels of measurement, 31-33
  interval, 32
  nominal, 31-32
  ordinal, 32
  ratio, 32-33
  in SPSS, 61-63
Levene F test, 333, 345, 357, 371, 377, 394, 499
Likert scale
  description of, 18, 79
  justification for using, 33-34
  questionnaire item on, 79
Linear function, 152, 458
Linear relationships, 235
Linear trend contrast, 389
Literature reviews, 2
Logistic regression, 3
Lower tail, 142-143, 180
M1 − M2, 345-346
Main effects-only model, 486
Mann-Whitney U test, 362, 365, 367, 407, 564
Maps, 127-128
Marginal frequencies, 525
Margin of error
  description of, 188
  for percentages in surveys, 553
Matched pairs, 416-417
Mauchly’s sphericity test, 456, 465-466
Maximum scores, 85-86
McNemar test, 553-556, 563
Mean
  advantages of, 76-77
  for bell-shaped distributions, 82
  definition of, 73
  deviation from, 76, 136
  of difference scores, 420
  disadvantages of, 77-78
  obtaining of, 74-75
  in real-world situations, 78
  sum of deviations from M = 0, 75-77
  when to choose, 82-83
  See also Group means
Median
  definition of, 73
  obtaining of, 73-74
  in real-world situations, 78
  when to choose, 82-83
Meta-analysis, 353
Minimum scores, 85-86
Missing values
  frequency distribution tables, 41, 44, 63-65
  SPSS, 61
Mixed models, 507
Mode
  for categorical variables, 45
  in real-world situations, 78
  when to choose, 82-83
Moderator variables, 572
Monte Carlo simulation
  sampling distribution of M, 175
  sampling error in, 171
Multiple-point rating scales, 18
Multivariable analyses, 573
Multivariate analyses, 573
Multivariate analysis of variance (MANOVA), 507, 573
Naturally occurring groups, 50
Naturally occurring pairs, 415-416
Necessary but not sufficient, 7
Negatively skewed distribution, 105
Negatively skewed histograms, 249
New Statistics approach, 210
Nominal level of measurement, 31-32
Nominal variables, 17
Nonadditivity, 476
Nondirectional test, 196, 202, 204-206
  See also Two-tailed test
Nonequivalent control group, 25
Nonexperimental research design, 24-25
Nonlinear relationships, 284-285
Nonorthogonal factorial ANOVA, 513-515
Nonparametric alternatives
  to ANOVA, 407-408
  to independent-samples t test, 365-367
  to paired-samples t test, 438-440
  to Pearson's r, 271-273
  to repeated-measures ANOVA, 476-478
Nonparametric analyses, 564-565
No person-by-treatment interaction, 456
Normal distribution
  areas under, 138-140
  definition of, 135
  description of, 103-105, 120
  development of, 163
  locations of individual scores in, 135
  lower tail of, 142-143, 180
  mathematics of, 152-154
  middle area of, 142, 180
  negatively skewed, 146-147
  outliers relative to, 144-145
  positively skewed, 104, 146-147
  real-world variables, 160-163
  skewness of, 146-147
  standard, 135, 140-141, 154
  upper tail of, 142-143, 180
Normal distribution shape
  description of, 138-139
  overall departure from, 159-160
Normality
  departure from, 157-158
  description of, 148-149
Null hypothesis (H0)
  ANOVA, 403-405
  contingency table analysis, 529-533
  expected F value when true, 403-405
  in factorial ANOVA, 484
  false, 222
  formal, 254
  for independent-samples t test, 403, 422
  “no-interaction,” 516
  for paired-samples t test, 422
  planned contrasts, 387-388
  rejection of, 218, 220, 225
  for repeated contrasts, 457
  for repeated-measures ANOVA, 444
  rho (ρ0 = 0), 254-255
  stating of, 194-195
Null-hypothesis significance testing (NHST)
  alternative hypothesis, 195-197
  in confirmatory studies, 224
  definition of, 193
  disconfirmatory evidence, 195
  in exploratory studies, 224
  logic of, 194-195, 210, 255
  null hypothesis, 194-195, 218
  rules for using, 203-204
  traditional approach to, 210
  Type I error in, 221-222
  Type II error in, 221
Null outcomes, 225
Numeracy guidelines, 1
Odds ratios, 3, 552
OLS derivation of equation for regression coefficients, 321-323
Omnibus test, 375
One-sample t test
  assumptions for, 203
  description of, 193, 197-198, 329
  equation for, 215
  one-tailed, reporting results for, 209
  questions for, 203
  reporting results, 227
  SPSS analysis, 206
  two-tailed, reporting results for, 207-208
One-tailed p values, 201
One-tailed test
  advantages of, 209-210
  description of, 196
  disadvantages of, 209-210
  driving speed data analysis using, 208-209
  one-sample t test, 209
  reject region, 208
  two-tailed tests versus, 209-210
One-way between-subjects (between-S) ANOVA. See Analysis of variance
One-way repeated-measures ANOVA. See Repeated-measures ANOVA
Open Science model, 9
Order effects, 430-431
Order of arithmetic operations, 94
Ordinal level of measurement, 32
Ordinal variables, 17-18
Orthogonal contrasts, 389
Orthogonal factorial ANOVA, 486, 489, 511-513
Outcome variables, 314
Outer fences, 117
Outliers, 314
  bivariate, 242, 244-246, 455
  definition of, 144
  independent-samples t test, 333-334, 337
  normal distribution and, 144-145
  Pearson's r and, 242
  in SPSS, 154-157
Paired-samples t test
  advantages of, 437
  assumptions for, 422-423, 433-437
  data organization for, 419
  designs, 414
  difference (d) scores, 420-421
  effect size for, 429-430
  as follow-up, 468-469
  formulas for, 423-424
  hypothetical study, 417-418
  independent-samples t test versus, 413, 426-429
  matched pairs, 416-417
  naturally occurring pairs, 415-416
  nonparametric alternative to, 438-440
  null hypothesis for, 422
  paired samples, 415-417
  repeated-measures ANOVA versus, 443
  results for, 426-429, 433
  SPSS procedure, 424-426
  summary of, 437-438
  terms for, values of, 428
  variance in, 429
  Wilcoxon signed rank test versus, 438-440
Parametric analyses, 564-565
Parametric statistics, 564
Partition of sums of squares, 312-313
Partition of variance
  definition of, 520
  in nonorthogonal factorial ANOVA, 513-515
Pearson product-moment correlation. See Pearson's r
Pearson's r
  artifacts that affect, 253
  assumptions for, 242-244
  bivariate outliers and, 242, 244-246
  computation of, 251-252, 285-286
  definition of, 251, 290
  deflation of, 275-276
  description of, 29
  distribution shapes, 243-244
  Fisher’s Z conversion of, 273
  formula for, 285-286
  magnitude of, 235, 275-283
  95% confidence interval for, 273-274
  nonparametric alternatives to, 271-273
  outcomes for, 262-264
  overestimation of, 278-280
  phi coefficient interpreted as, 537
  preliminary data screening for, 244
  r and r² as effect sizes, 258-261
  research example of, 246-250
  research situations for, 234
  results sections for, 269-270
  Spearman's r versus, 271-273
  statistical power for, 261
  statistical significance of, 262
  t ratio from, 255
  when r = 0.0, 240-241
Peer review, 9
Percentages
  cumulative, 48, 137
  in frequency distribution tables, 41-42
  in surveys, margin of error for, 553
Percentile rank, 48
Perfect correlation, 7-8
Perfect negative correlation, 237
Person effects, 427
Person × Treatment interaction, 449, 475-476
p-hacking, 213, 564, 572
Phi coefficient (φ), 536
Pie charts
  for categorical variables, 99-100
  disadvantages of, 100
Plagiarism, 4
Planned contrasts, 376, 399, 467
  ANOVA, 387-389, 399, 467
  null hypothesis, 387-388
Platykurtic distribution shape, 148
Plotting residuals, 315-318
Point biserial r, 347-348
Political polls, 188
Polling organizations, 5, 29
Polynomial contrasts, 458
Pooled-variances t test, 340, 349, 357
Popper, Karl, 10
Population
  definition of, 29
  hypothetical, 30
  imaginary, 30
  notation for, 168-169
  sample and, 16, 29
  sample versus, 172
Population effect size, 220, 222
Population sampling distribution, 176-177
Population standard deviation (σ), 177-178
Population standard error (σM)
  factors that influence, 173
  N effects on value of, 173-176
Population variance of contrasts, 455
Positively skewed distribution, 104, 146
“Post hoc, ergo propter hoc” fallacy, 6
Post hoc power analysis, 220, 262, 363
Post hoc tests, 376, 467
Practical significance
  description of, 208, 217
  statistical significance versus, 567-568
Prediction error, 171-172, 223, 298
Predictor variables, 314
Preliminary data screening
  for ANOVA, 377-378
  for bivariate regression, 297-298
  description of, 149
  for independent-samples t test, 335-338
  for Pearson's r, 244
Primary source, 2
Probability
  conditional, 528-529
  unconditional, 528
Proportions, in frequency distribution tables, 41
Protected tests, 376, 390-391, 467
“Protective factors,” 8
Proximal similarity model, 30
p values
  critical evaluation of, 226
  definition of, 198
  exact, 206-207
  limitations of, 567
  misleading, 226
  one-tailed, 202
  problems with, 213
  things not to say about, 211, 228
  two-tailed, 201, 209
q ratio, 390-391
Quadratic trends, 458-460
Quality control, 9
Quantitative variables
  dependent, 563-566
  description of, 17-18, 38
  frequency distribution tables for, 39, 46-50
  histograms for, 103-107
  independent, 563-564
  questions about, 72
  SPSS use of descriptive statistics for, 83-85
Quasi-experimental research design, 25-26
Quasi-experiments, 25-26
Quincunx, 161-162
Quintic trends, 459-460
Random assignment of participants to groups or conditions, 22-23
Random factors, 508
Random sampling of participants from a population, 22
Range, 85-86
Range rule, 90
Rating scales
  description of, 18
  justification for using, 33-34
Ratio level of measurement, 32-33
Raw-score prediction equation, 309
Raw-score regression equation, 294, 308
Real-world variables, 160-163
Regression coefficients
  confidence intervals for, 300
  description of, 291
  OLS derivation of equation for, 321-323
Regression equations
  description of, 291-296, 307
  graphing a line from two points obtained from, 320-321
Regression line, 291-292, 296, 304
Regression slope, 291
Reject regions
  definition of, 198
  specifying of, 199-202
Reliability, 314
Repeated contrasts, 457-458
Repeated measures, contingency tables with, 553-556
Repeated-measures analyses
  carryover effects in, 431
  order effects in, 430-431
  participants, 431-432
Repeated-measures ANOVA
  advantages of, 472, 475
  assumptions for, 455-456
  computations for, 446-449
  counterbalancing in, 472-474
  data, preliminary assessment of, 444-446
  description of, 571
  effect size, 470
  Friedman one-way ANOVA test versus, 476-478
  GLM, 460-464
  GLM, contrasts in, 456-460
  GLM, output of, 464-468
  GLM, variables added to, 474-475
  nonparametric alternative to, 476-478
  null hypothesis for, 444
  overview of, 443-445
  paired-samples t test versus, 443
  Person × Treatment interaction, 475-476
  results for, 469-470
  SPSS reliability procedure for, 449-453
  SS partition in, 453-455
  statistical power of, 470-472
  summary of, 475
Repeated-measures data, 419-420
Replication, 9
Representative sample, 30, 172
Research
  analysis of variance, 374-375
  bivariate regression uses in, 290
  future, planning of, 227
  graph uses in, 121-122
  past, understanding of, 226
Research designs
  description of, 21
  experimental. See Experimental research design
Researcher credentials, 3
Research questions, 19-20
Research reports
  confidence levels in, 257
  description of, 571
  language in, 2-3
  peer review of, 9
Residuals
  definition of, 298
  plotting, 315-318
  standardized, 317
Results
  ANOVA, 397-398
  bivariate analyses, 569-570
  chi-square tests, 542-543
  factorial ANOVA, 504-505
  generalizability of, 5
  independent-samples t test, 357
  interpretation of, problems in, 31
  one-sample t test, 207-209, 227
  paired-samples t test, 426-429, 433
  Pearson’s r, 269-270
  repeated-measures ANOVA, 469-470
Reverse J-shaped distribution, 105, 146
“Risk factors,” 8
Rival explanatory variables, 21
Robust analyses, 564
Rounding, 94-95
Row percentages, 526-528
Sample
  accidental, 30
  convenience, 30, 172
  definition of, 29
  notation for, 168-169
  population and, 16, 29
  population versus, 172
B, sp would be relatively large. If other factors (effect size and N) are held constant, there would be a better chance of obtaining a large t value for Study A than for Study B. Recruiting similar participants can help with statistical power, but it also reduces generalizability of findings. The participants in Study A are not diverse.

12.11.4 Summary for Design Decisions

Members of my undergraduate class became upset when I explained the way research design decisions can affect the values of t. They said, “You mean you can make a study turn out any way you want?” The answer is, within some limits, yes. The independent-samples t ratio is likely to be large for these situations and decisions. (For each factor, such as N, add the condition “other factors being equal.”)

• N is large (a very large N study can yield a statistically significant t ratio even if the population effect is very small).
• Population effect size such as η² is large (this is often related to treatment dosages or types of participants being compared).
• M1 − M2 is large (however, M1 − M2 is not interpretable if confounds are present).
• sp is small (this happens when participant characteristics and assessment situations are homogeneous within groups).

Depending on their research questions and resources, the degree to which researchers can control each of these factors may vary.

12.12 Results Section

Following is an example of a “Results” section for the study of the effect of caffeine consumption on heart rate.

Results

An independent-samples t test was performed to assess whether mean heart rate differed significantly for a group of 10 participants who consumed no caffeine (Group 1) compared with a group of 10 participants who consumed 150 mg of caffeine. Preliminary data screening indicated that scores on heart rate were reasonably normally distributed within groups. There were two high-end outliers in Group 1, but they were not extreme; outliers were retained in the analysis. The mean heart rates differed significantly, t(18) = −2.75, p = .013, two-tailed. Mean heart rate for the no-caffeine group (M = 57.8, SD = 7.2) was about 10 beats per minute lower than mean heart rate for the caffeine group (M = 67.9, SD = 9.1). The effect size, as indexed by η², was .30; this is a very large effect. The 95% CI for the difference between sample means, M1 − M2, had a lower bound of −17.81 and an upper bound of −2.39. This study suggests that consuming 150 mg of caffeine may significantly increase heart rate, with an increase on the order of 10 bpm.

The assumption of homogeneity of variance was assessed using the Levene test, F = 1.57, p = .226; this indicated no significant violation of the equal variance assumption. Readers generally assume that the equal variances assumed version of the t test (also called the pooled-variances t test) was used unless otherwise stated. If you see df reported to several decimal places, this tells you that the equal variances not assumed t test was used.
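The numbers in the caffeine Results example can be roughly checked from the reported summary statistics alone. The Python sketch below is illustrative and not part of the SPSS workflow in the text: the function name `pooled_t` and all variable names are hypothetical, the critical value 2.101 for df = 18 is assumed from a standard t table rather than computed, and η² is obtained with the usual conversion η² = t²/(t² + df). The last lines rerun the same calculation with a larger N and with a smaller sp, other factors held equal, to illustrate two of the design factors listed in Section 12.11.4.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variances t ratio from summary statistics (hypothetical helper)."""
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two within-group variances.
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two sample means.
    se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se_diff
    return t, df, math.sqrt(sp2), se_diff


# Summary statistics reported in the Results example (no caffeine vs. 150 mg).
t, df, sp, se_diff = pooled_t(57.8, 7.2, 10, 67.9, 9.1, 10)

T_CRIT_18 = 2.101  # assumption: two-tailed .05 critical t for df = 18, from a t table
diff = 57.8 - 67.9
ci_lower = diff - T_CRIT_18 * se_diff
ci_upper = diff + T_CRIT_18 * se_diff
eta_sq = t ** 2 / (t ** 2 + df)

print(f"t({df}) = {t:.2f}, sp = {sp:.2f}")
print(f"95% CI for M1 - M2: [{ci_lower:.2f}, {ci_upper:.2f}]")
print(f"eta squared = {eta_sq:.2f}")

# Design decisions, other factors being equal:
# doubling N per group, or halving both SDs (smaller sp), enlarges |t|.
t_big_n, *_ = pooled_t(57.8, 7.2, 20, 67.9, 9.1, 20)
t_small_sp, *_ = pooled_t(57.8, 3.6, 10, 67.9, 4.55, 10)
print(f"n = 20 per group: t = {t_big_n:.2f}")
print(f"SDs halved:       t = {t_small_sp:.2f}")
```

With the same means and standard deviations, the t ratio grows in absolute value when N increases or when sp shrinks, which is the sense in which design decisions can change how a study "turns out."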
  paired-samples t test procedure, 424-426, 468
  Q-Q plot, 160
  relationship survey example, 264-269
  reliability procedure, for repeated-measures ANOVA, 449-453
  salary data, 301-305
  Save Output As dialog box in, 51, 55
  statistical power, 362
  Tukey HSD test, 391, 396
  variables, properties, 58-63
  variables names, defining, 58-63
  Wilcoxon signed rank test output, 440
  windows, moving between, 54-56
  z scores, saving, 164-165
SPSS GLM procedures
  description of, 449
  effect sizes, 493
  factorial ANOVA using, 496-499
  repeated-measures ANOVA, 460-464
  source tables, 492-493
Spurious correlation, 263-264, 281
SSbetween, 380-381, 383-385
SSerror, 445
SStotal, 381-382, 487
SStreatment, 445
SSwithin, 381, 383-385
SSwithin groups, 378
Standard deviation
  computation of, 115
  population, 177-178
  sample (s), 89
  variation among scores in frequency table, 90-91
Standard error of the difference, 340
Standard error of the estimate (SEest), 318-319
Standard error (SE) for Md, 424, 427
Standardization, 23, 136
Standardized regression equation, 294-295
Standardized residuals, 317
Standard normal distribution, 135, 154
  kurtosis for, 159
  reading tables of areas for, 140-141
Standard regression, 514
Standard score, 135
  See also z scores (standard scores)
Statistical analysis, 28-29
Statistical association, 7
Statistical control, 21, 446, 569
Statistically significant outcomes
  human error as cause of, 226
  interpretation of, 225-226
  sampling error as cause of, 226
Statistical power
  assessment of, 300
  for correlation studies, 261-262
  definition of, 261, 361
  description of, 218-220, 229-232
  in factorial ANOVA, 494-495
  repeated-measures ANOVA, 470-472
Statistical power analysis
  for ANOVA, 386-387
  description of, 352, 361
Statistical power tables, 218-219, 300, 361
Statistical significance
  chi-square test, 535-536
  definition of, 217, 372
  independent-samples t test, 341-342
  Pearson’s r with, 262
  practical significance versus, 567-568
Statistical significance tests
  avoidance of, 255-256
  description of, 193
  formulas for, 353
  limitations of, 567
  logic of, 193
  number of, 226
  uncertainty in, 224
Statisticians, 10-11
Statistics
12.13 Graphing Results: Means and CIs
Cumming and Finch (2005) suggested that authors should emphasize confidence intervals along with effect sizes. Graphs of CIs help focus reader attention on these. Several types of CI graphs can be presented for the independent-samples t test. We could set up a graph of the CI for the (M1 − M2) difference using either an error bar or a bar chart. The lower and upper limits of this CI are provided in the independent-samples t test output. It is more common to show a CI for each of the group means (M1 and M2). This can be done with either the SPSS error bar or bar chart procedure. To obtain an error bar graph for M1 and M2, make the menu selections shown in Figure 12.15, Figure 12.16, and Figure 12.17.
In Figure 12.18 the separate vertical lines for each group (no caffeine, 150 mg caffeine) have two features. The dot represents the group mean. The T-shaped bars identify the lower and upper limits of the 95% CI for each group. Be careful when you examine error bar plots in journals or conference posters. Error bars that resemble the ones in Figure 12.18 sometimes represent the mean ± 1 standard deviation, or the mean ± 1 SE, instead of a 95% CI. Graphs should be clearly labeled so that viewers know what the error bars represent.
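To see why the labeling matters, the three kinds of error bars can be compared numerically. The Python sketch below is only an illustration, not the SPSS error bar procedure: it uses the group means, standard deviations, and group sizes reported in the Section 12.12 Results example, the helper name `error_bar_limits` is hypothetical, and the critical value 2.262 for df = 9 is assumed from a standard t table.

```python
import math

T_CRIT_9 = 2.262  # assumption: two-tailed .05 critical t for df = n - 1 = 9, from a t table


def error_bar_limits(mean, sd, n, t_crit):
    """Lower and upper limits for three common error bar definitions (hypothetical helper)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return {
        "mean ± 1 SD": (mean - sd, mean + sd),
        "mean ± 1 SE": (mean - se, mean + se),
        "95% CI": (mean - t_crit * se, mean + t_crit * se),
    }


# (M, SD, n) for each group, taken from the Results example.
groups = {
    "no caffeine": (57.8, 7.2, 10),
    "150 mg caffeine": (67.9, 9.1, 10),
}

limits = {
    name: error_bar_limits(m, sd, n, T_CRIT_9)
    for name, (m, sd, n) in groups.items()
}

for name, bars in limits.items():
    print(name)
    for label, (lower, upper) in bars.items():
        print(f"  {label:12s} [{lower:.2f}, {upper:.2f}]")
```

For groups of this size, the ± 1 SE bars are the narrowest and the ± 1 SD bars the widest, with the 95% CI in between, so the same data can look quite different depending on which definition a graph uses.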
Figure 12.15 SPSS Menu Selections for Error Bar Procedure
The image shows the SPSS menu selections used to obtain the error bar procedure for the file hrcaffeine.sav.
At the top of the spreadsheet are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, extensions, window, and help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options.
The graphs menu has been opened and the following selections are visible: chart builder, graphboard template chooser, Weibull plot, compare subgroups, and legacy dialogs. The legacy dialogs menu has been opened to show the following menu options: bar, 2-D bar, line, area, pie, high-low, box plot, error bar, population pyramid, scatter or dot, and histogram.
There is some data visible on the spreadsheet. This has been reproduced below:
Caffeine, hr
1, 51
1, 66
1, 58
1, 58
1, 53
Variability, 83-84
Variables
  analysis based on, 18
  categorical. See Categorical variables
  confounded, 21
  control, 571-572
  definition of, 16
  dependent, 19, 314, 563-566
  dichotomous, 242
  independent, 19, 314, 563-566
  moderator, 572
  nominal, 17
  nonexperimental design with, 24
  ordinal, 17-18
  outcome, 314
  predictor, 314
  quantitative. See Quantitative variables
  rating scale, 18
  real-world, 160-163
  types of, 17-19
Variance
  homogeneity of, 332
  partition of. See Partition of variance
  reasons for, 91-92, 313-314
  sphericity of, 456
Weighted means, 508-510
Wilcoxon signed rank test, 438-440
Within-group error variance, 356
Within-S, 28, 413, 418, 443
  See also Repeated-measures ANOVA
*ZPRED, 315
z ratio, 176-177
*ZRESID, 315
z scores (standard scores)
  areas and, 138
  computation of, 135
  definition of, 135
  finding of, 136-137
  saving of, 164-165
  standardization of, 136
  unit free, 136
  values of, 137
  X scores converted into, 136