Applied Statistics I: Basic Bivariate Techniques [3 ed.] 1506352804, 9781506352800

Rebecca M. Warner’s bestselling Applied Statistics: From Bivariate Through Multivariate Techniques has been split into t


English Pages 648 [577] Year 2020


Basic Bivariate Techniques

REBECCA M. WARNER


Applied Statistics I Third Edition


To my students: Past, present, and future.


Sara Miller McCune founded SAGE Publishing in 1965 to support the dissemination of usable knowledge and educate a global community. SAGE publishes more than 1000 journals and over 800 new books each year, spanning a wide range of subject areas. Our growing selection of library products includes archives, data, case studies and video. SAGE remains majority owned by our founder and after her lifetime will become owned by a charitable trust that secures the company’s continued independence.

Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne


Applied Statistics I Basic Bivariate Techniques Third Edition

Rebecca M. Warner
Professor Emerita, University of New Hampshire

SAGE
Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne


Copyright © 2021 by SAGE Publications, Inc. All rights reserved. Except as permitted by U.S. copyright law, no part of this work may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without permission in writing from the publisher. All third-party trademarks referenced or depicted herein are included solely for the purpose of illustration and are the property of their respective owners. Reference to these trademarks in no way indicates any relationship with, or endorsement by, the trademark owner.

SPSS is a registered trademark of International Business Machines Corporation. Excel is a registered trademark of Microsoft Corporation. All Excel screenshots in this book are used with permission from Microsoft Corporation.

FOR INFORMATION:

SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

SAGE Publications Ltd.
1 Oliver's Yard
55 City Road
London, EC1Y 1SP
United Kingdom

SAGE Publications India Pvt. Ltd.
B 1/11 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Asia-Pacific Pte. Ltd.
18 Cross Street #10-10/11/12
China Square Central
Singapore 048423

ISBN 978-1-5063-5280-0

This book is printed on acid-free paper.

20 21 22 23 24 10 9 8 7 6 5 4 3 2 1

Printed in the United States of America

Acquisitions Editor: Helen Salmon
Editorial Assistant: Megan O'Heffernan
Content Development Editor: Chelsea Neve
Production Editor: Laureen Gleason
Copy Editor: Jim Kelly
Typesetter: Hurix Digital
Proofreader: Scott Oney
Indexer: Michael Ferreira
Cover Designer: Gail Buschman
Marketing Manager: Shari Countryman


Detailed Contents Preface Acknowledgments About the Author

Chapter 1 - Evaluating Numerical Information 1.1 Introduction 1.2 Guidelines for Numeracy 1.3 Source Credibility 1.3.1 Self-Interest or Bias 1.3.2 Bias and “Cherry-Picking” 1.3.3 Primary, Secondary, and Third-Party Sources 1.3.4 Communicator Credentials and Skills 1.3.5 Track Record for Truth-Telling 1.4 Message Content 1.4.1 Anecdotal Versus Numerical Information 1.4.2 Citation of Supporting Evidence 1.5 Evaluating Generalizability 1.6 Making Causal Claims 1.6.1 The “Post Hoc, Ergo Propter Hoc” Fallacy 1.6.2 Correlation (by Itself) Does Not Imply Causation 1.6.3 Perfect Correlation Versus Imperfect Correlation 1.6.4 “Individual Results Vary” 1.6.5 Requirements for Evidence of Causal Inference 1.7 Quality Control Mechanisms in Science 1.7.1 Peer Review

1.7.2 Replication and Accumulation of Evidence 1.7.3 Open Science and Study Preregistration


1.8 Biases of Information Consumers 1.8.1 Confirmation Bias (A…) 1.8.2 Social Influence and Consensus

1.9 Ethical Issues in Data Collection and Analysis 1.9.1 Ethical Guidelines for Researchers: Data Collection 1.9.2 Ethical Guidelines for Statisticians: Data Analysis and Reporting

1.10 Lying With Graphs and Statistics 1.11 Degrees of Belief 1.12 Summary

Chapter 2 - Basic Research Concepts 2.1 Introduction 2.2 Types of Variables 2.2.1 Overview 2.2.2 Categorical Variables 2.2.3 Quantitative Variables 2.2.4 Ordinal Variables 2.2.5 Variable Type and Choice of Analysis 2.2.6 Rating Scale Variables 2.2.7 Scores That Represent Counts

2.3 Independent and Dependent Variables 2.4 Typical Research Questions 2.4.1 Are X and Y Correlated? 2.4.2 Does X Predict Y? 2.4.3 Does X Cause Y? 2.5 Conditions for Causal Inference 2.6 Experimental Research Design 2.7 Nonexperimental Research Design 2.8 Quasi-Experimental Research Designs 2.9 Other Issues in Design and Analysis 2.10 Choice of Statistical Analysis (Preview) 2.11 Populations and Samples: Ideal Versus Actual Situations

2.11.1 Ideal Definition of Population and Sample 2.11.2 Two Real-World Research Situations Similar to the Ideal Population and Sample Situation 2.11.3 Actual Research Situations That Are Not Similar to Ideal Situations 2.12 Common Problems in Interpretation of Results Appendix 2A: More About Levels of Measurement

Appendix 2B: Justification for the Use of Likert and Other Rating Scales as Quantitative Variables (in Some Situations)

Chapter 3 - Frequency Distribution Tables 3.1 Introduction 3.2 Use of Frequency Tables for Data Screening 3.3 Frequency Tables for Categorical Variables 3.4 Elements of Frequency Tables 3.4.1 Frequency Counts (n or f) 3.4.2 Total Number of Scores in a Sample (N) 3.4.3 Missing Values (if Any) 3.4.4 Proportions 3.4.5 Percentages 3.4.6 Cumulative Frequencies or Cumulative Percentages 3.5 Using SPSS to Obtain a Frequency Table 3.6 Mode, Impossible Score Values, and Missing Values 3.7 Reporting Data Screening for Categorical Variables 3.8 Frequency Tables for Quantitative Variables 3.8.1 Ungrouped Frequency Distribution 3.8.2 Evaluation of Score Location Using Cumulative Percentage 3.8.3 Grouped or Binned Frequency Distributions 3.9 Frequency Tables for Categorical Versus Quantitative Variables 3.10 Reporting Data Screening for Quantitative Variables 3.11 What We Hope to See in Frequency Tables for Categorical Variables 3.11.1 Categorical Variables That Represent Naturally Occurring Groups 3.11.2 Categorical Variables That Represent Treatment Groups 3.12 What We Hope to See in Frequency Tables for Quantitative Variables 3.13 Summary

Appendix 3A: Getting Started in IBM SPSS Version 25 3.A.1 The Bare Minimum: Using an Existing SPSS Data File to Obtain, Print, and Save Results 3.A.2 Moving Between Windows in SPSS

3.A.3 Creating a File and Entering Data

3.A.4 Defining Variable Names and Properties of Variables

Appendix 3B: Missing Values in Frequency Tables

Appendix 3C: Dividing Scores Into Groups or Bins

Chapter 4 - Descriptive Statistics 4.1 Introduction 4.2 Questions About Quantitative Variables 4.3 Notation 4.4 Sample Median 4.5 Sample Mean (M) 4.6 An Important Characteristic of M: The Sum of Deviations From M = 0 4.7 Disadvantage of M: It Is Not Robust Against Influence of Extreme Scores 4.8 Behavior of Mean, Median, and Mode in Common Real-World Situations 4.8.1 Example 1: Bell-Shaped Distribution 4.8.2 Example 2: Bimodal or Polarized Distribution 4.8.3 Example 3: Skewed Distribution 4.8.4 Example 4: No Clear Mode 4.9 Choosing Among Mean, Median, and Mode 4.10 Using SPSS to Obtain Descriptive Statistics for a Quantitative Variable 4.11 Minimum, Maximum, and Range: Variation Among Scores 4.12 The Sample Variance s² 4.12.1 Step 1: Deviation of Each Score From the Mean 4.12.2 Step 2: Sum of Squared Deviations 4.12.3 Step 3: Degrees of Freedom 4.12.4 Putting the Pieces Together: Computing a Sample Variance 4.13 Sample Standard Deviation (s or SD) 4.14 How a Standard Deviation Describes Variation Among Scores in a Frequency Table 4.15 Why Is There Variance? 4.16 Reports of Descriptive Statistics in Journal Articles 4.17 Additional Issues in Reporting Descriptive Statistics 4.18 Summary

Appendix 4A: Order of Arithmetic Operations

Appendix 4B: Rounding

Chapter 5 - Graphs: Bar Charts, Histograms, and Boxplots 5.1 Introduction 5.2 Pie Charts for Categorical Variables 5.3 Bar Charts for Frequencies of Categorical Variables 5.4 Good Practice for Construction of Bar Charts 5.5 Deceptive Bar Graphs 5.6 Histograms for Quantitative Variables 5.7 Obtaining a Histogram Using SPSS 5.8 Describing and Sketching Bell-Shaped Distributions 5.9 Good Practices in Setting Up Histograms 5.10 Boxplot (Box and Whiskers Plot) 5.10.1 How to Set Up a Boxplot by Hand 5.10.2 How to Obtain a Boxplot Using SPSS 5.11 Telling Stories About Distributions 5.12 Uses of Graphs in Actual Research 5.13 Data Screening: Separate Bar Charts or Histograms for Groups 5.14 Use of Bar Charts to Represent Group Means 5.15 Other Examples 5.15.1 Scatterplots 5.15.2 Maps 5.15.3 Historical Example 5.16 Summary

Chapter 6 - The Normal Distribution and z Scores 6.1 Introduction 6.2 Locations of Individual Scores in Normal Distributions 6.3 Standardized or z Scores 6.3.1 First Step in Finding a z Score for X: The Distance of X From M 6.3.2 Second Step: Divide the (X - M) Distance by SD to Obtain a Unit-Free or Standardized Distance of Score From the Mean 6.4 Converting z Scores Back Into X Units 6.5 Understanding Values of z 6.6 Qualitative Description of Normal Distribution Shape 6.7 More Precise Description of Normal Distribution Shape 6.8 Areas Under the Normal Distribution Curve Can Be Interpreted as Probabilities 6.9 Reading Tables of Areas for the Standard Normal Distribution 6.10 Dividing the Normal Distribution Into Three Regions: Lower Tail, Middle, and Upper Tail 6.11 Outliers Relative to a Normal Distribution 6.12 Summary of First Part of Chapter 6.13 Why We Assess Distribution Shape 6.14 Departure From Normality: Skewness 6.15 Another Departure From Normality: Kurtosis 6.16 Overall Normality 6.17 Practical Recommendations for Preliminary Data Screening and Descriptions of Scores for Quantitative Variables 6.18 Reporting Information About Distribution Shape, Missing Values, Outliers, and Descriptive Statistics for Quantitative Variables 6.19 Summary

Appendix 6A: The Mathematics of the Normal Distribution

Appendix 6B: How to Select and Remove Outliers in SPSS

Appendix 6C: Quantitative Assessments of Departure From Normality 6.C.1 Index for Skewness 6.C.2 Index for Kurtosis 6.C.3 Test for Overall Departure From Normal Distribution Shape

Appendix 6D: Why Are Some Real-World Variables Approximately Normally Distributed?

Appendix 6E: Saving z Scores for All Cases

Chapter 7 - Sampling Error and Confidence Intervals 7.1 Descriptive Versus Inferential Uses of Statistics 7.2 Notation for Samples Versus Populations 7.3 Sampling Error and the Sampling Distribution for Values of M 7.3.1 What Is Sampling Error? 7.3.2 Sampling Error in a Classroom Demonstration 7.3.3 Sampling Error in Monte Carlo Simulations 7.4 Prediction Error 7.5 Sample Versus Population (Revisited) 7.5.1 Representative Samples 7.5.2 Convenience Samples 7.6 The Central Limit Theorem: Characteristics of the Sampling Distribution of M 7.7 Factors That Influence Population Standard Error (σM) 7.8 Effect of N on Value of the Population Standard Error 7.9 Describing the Location of a Single Outcome for M Relative to Population Sampling Distribution (Setting Up a z Ratio) 7.10 What We Do When σ Is Unknown 7.11 The Family of t Distributions 7.12 Tables for t Distributions 7.13 Using Sampling Error to Set Up a Confidence Interval 7.14 How to Interpret a Confidence Interval 7.15 Empirical Example: Confidence Interval for Body Temperature 7.16 Other Applications for Confidence Intervals 7.16.1 CIs Can Be Obtained for Other Sample Statistics (Such as Proportions) 7.16.2 Margin of Error in Political Polls 7.17 Error Bars in Graphs of Group Means 7.18 Summary

Chapter 8 - The One-Sample t Test: Introduction to Statistical Significance Tests 8.1 Introduction 8.2 Significance Tests as Yes/No Questions About Proposed Values of Population Means 8.3 Stating a Null Hypothesis 8.4 Selecting an Alternative Hypothesis 8.5 The One-Sample t Test 8.6 Choosing an Alpha (α) Level 8.7 Specifying Reject Regions on the Basis of α, Ha, and df 8.8 Questions for the One-Sample t Test 8.9 Assumptions for the Use of the One-Sample t Test 8.10 Rules for the Use of NHST 8.11 First Analysis of Mean Driving Speed Data (Using a Nondirectional Test) 8.12 SPSS Analysis: One-Sample t Test for Mean Driving Speed (Using a Nondirectional or Two-Tailed Test) 8.13 “Exact” p Values 8.14 Reporting Results for a Two-Tailed One-Sample t Test 8.15 Second Analysis of Driving Speed Data Using a One-Tailed or Directional Test 8.16 Reporting Results for a One-Tailed One-Sample t Test 8.17 Advantages and Disadvantages of One-Tailed Tests 8.18 Traditional NHST Versus New Statistics Recommendations 8.19 Things You Should Not Say About p Values 8.20 Summary

Chapter 9 - Issues in Significance Tests: Effect Size, Statistical Power, and Decision Errors 9.1 Beyond p Values 9.2 Cohen's d: An Effect Size Index 9.3 Factors That Affect the Size of t Ratios 9.4 Statistical Significance Versus Practical Importance 9.5 Statistical Power 9.6 Type I and Type II Decision Errors 9.7 Meanings of “Error” 9.8 Use of NHST in Exploratory Versus Confirmatory Research 9.9 Inflated Risk for Type I Decision Error for Multiple Tests 9.10 Interpretation of Null Outcomes 9.11 Interpretation of Statistically Significant Outcomes 9.11.1 Sampling Error 9.11.2 Human Error

9.11.3 Misleading p Values 9.12 Understanding Past Research 9.13 Planning Future Research 9.14 Guidelines for Reporting Results 9.15 What You Cannot Say 9.16 Summary

Appendix 9A: Further Explanation of Statistical Power Chapter 10 - Bivariate Pearson Correlation 10.1 Research Situations Where Pearson's r Is Used 10.2 Correlation and Causal Inference 10.3 How Sign and Magnitude of r Describe an X, Y Relationship

10.4 Setting Up Scatterplots 10.5 Most Associations Are Not Perfect 10.6 Different Situations in Which r = .00 10.7 Assumptions for Use of Pearson's r 10.7.1 Sample Must Be Similar to Population of Interest 10.7.2 X, Y Association Must Be Reasonably Linear 10.7.3 No Extreme Bivariate Outliers 10.7.4 Independent Observations for X and Independent Observations for Y 10.7.5 X and Y Must Be Appropriate Variable Types 10.7.6 Assumptions About Distribution Shapes 10.8 Preliminary Data Screening for Pearson’s r 10.9 Effect of Extreme Bivariate Outliers 10.10 Research Example 10.11 Data Screening for Research Example 10.12 Computation of Pearson's r 10.13 How Computation of Correlation Is Related to Pattern of Data Points in the Scatterplot 10.14 Testing the Hypothesis That ρ0 = 0 10.15 Reporting Many Correlations and Inflated Risk for Type I Error 10.15.1 Call Results Exploratory and De-emphasize or Avoid Statistical Significance Tests 10.15.2 Limit the Number of Correlations 10.15.3 Replicate or Cross-Validate Correlations 10.15.4 Bonferroni Procedure: Use More Conservative Alpha Level for Tests of Individual Correlations 10.15.5 Common Bad Practice in Reports of Numerous Significance Tests 10.15.6 Summary: Reporting Numerous Correlations 10.16 Obtaining Confidence Intervals for Correlations 10.17 Pearson’s r and r² as Effect Sizes and Partition of Variance 10.18 Statistical Power and Sample Size for Correlation Studies 10.19 Interpretation of Outcomes for Pearson’s r 10.19.1 When r Is Not Statistically Significant 10.19.2 When r Is Statistically Significant 10.19.3 Sources of Doubt 10.19.4 The Problem of Spuriousness 10.20 SPSS Example: Relationship Survey 10.21 Results Sections for One and Several Pearson's r Values 10.22 Reasons to Be Skeptical of Correlations 10.23 Summary

Appendix 10A: Nonparametric Alternatives to Pearson’s r 10.A.1 Spearman's r

Appendix 10B: Setting Up a 95% CI for Pearson's r by Hand Appendix 10C: Testing Significance of Differences Between Correlations Appendix 10D: Some Factors That Artifactually Influence Magnitude of r Appendix 10E: Analysis of Nonlinear Relationships Appendix 10F: Alternative Formula to Compute Pearson’s r

Chapter 11 - Bivariate Regression 11.1 Research Situations Where Bivariate Regression Is Used 11.2 New Information Provided by Regression 11.3 Regression Equations and Lines 11.4 Two Versions of Regression Equations 11.4.1 Raw-Score Regression Equation 11.4.2 Standardized Regression Equation 11.4.3 Comparing the Two Forms of Regression 11.5 Steps in Regression Analysis 11.6 Preliminary Data Screening 11.7 Formulas for Bivariate Regression Coefficients 11.8 Statistical Significance Tests for Bivariate Regression 11.9 Confidence Intervals for Regression Coefficients 11.10 Effect Size and Statistical Power 11.11 Empirical Example Using SPSS: Salary Data 11.12 SPSS Output: Salary Data 11.13 Results Section: Hypothetical Salary Data 11.14 Plotting the Regression Line: Salary Data 11.15 Using a Regression Equation to Predict Score for Individual (Joe’s Heart Rate Data) 11.16 Partition of Sums of Squares in Bivariate Regression 11.17 Why Is There Variance (Revisited)? 11.18 Issues in Planning a Bivariate Regression Study 11.19 Plotting Residuals 11.20 Standard Error of the Estimate 11.21 Summary

Appendix 11A: Review: How to Graph a Line From Two Points Obtained From an Equation

Appendix 11B: OLS Derivation of Equation for Regression Coefficients

Appendix 11C: Alternative Formula for Computation of Slope

Appendix 11D: Fully Worked Example: Deviations and SS

Chapter 12 - The Independent-Samples t Test 12.1 Research Situations Where the Independent-Samples t Test Is Used 12.2 A Hypothetical Research Example 12.3 Assumptions for Use of Independent-Samples t Test 12.3.1 Y Scores Are Quantitative 12.3.2 Y Scores Are Independent of Each Other Both Between and Within Groups 12.3.3 Y Scores Are Sampled From Normally Distributed Populations With Equal Variances 12.3.4 No Outliers Within Groups 12.3.5 Relative Importance of Violations of These Assumptions 12.4 Preliminary Data Screening: Evaluating Violations of Assumptions and Getting to Know Your Data 12.5 Computation of Independent-Samples t Test 12.6 Statistical Significance of Independent-Samples t Test 12.7 Confidence Interval Around M1 - M2 12.8 SPSS Commands for Independent-Samples t Test 12.9 SPSS Output for Independent-Samples t Test 12.10 Effect Size Indexes for t 12.10.1 M1 - M2 12.10.2 Eta Squared (η²) 12.10.3 Point Biserial r (rpb) 12.10.4 Cohen's d 12.10.5 Computation of Effect Sizes for Heart Rate and Caffeine Data 12.10.6 Summary of Effect Sizes 12.11 Factors That Influence the Size of t 12.11.1 Effect Size and N 12.11.2 Dosage Levels for Treatment, or Magnitudes of Differences for Participant Characteristics, Between Groups 12.11.3 Control of Within-Group Error Variance 12.11.4 Summary for Design Decisions 12.12 Results Section 12.13 Graphing Results: Means and CIs 12.14 Decisions About Sample Size for the Independent-Samples t Test 12.15 Issues in Designing a Study 12.15.1 Avoiding Potential Confounds 12.15.2 Decisions About Type or Dosage of Treatment 12.15.3 Decisions About Participant Recruitment and Standardization of Procedures 12.15.4 Decisions About Sample Size 12.16 Summary

Appendix 12A: A Nonparametric Alternative to the Independent-Samples t Test

Chapter 13 - One-Way Between-Subjects Analysis of Variance 13.1 Research Situations Where One-Way ANOVA Is Used 13.2 Questions in One-Way Between-S ANOVA 13.3 Hypothetical Research Example 13.4 Assumptions and Data Screening for One-Way ANOVA 13.5 Computations for One-Way Between-S ANOVA 13.5.1 Overview 13.5.2 SSbetween: Information About Distances Among Group Means 13.5.3 SSwithin: Information About Variability of Scores Within Groups 13.5.4 SStotal: Information About Total Variance in Y Scores 13.5.5 Converting Each SS to a Mean Square and Setting Up an F Ratio 13.6 Patterns of Scores and Magnitudes of SSbetween and SSwithin 13.7 Confidence Intervals for Group Means 13.8 Effect Sizes for One-Way Between-S ANOVA 13.9 Statistical Power Analysis for One-Way Between-S ANOVA 13.10 Planned Contrasts 13.11 Post Hoc or “Protected” Tests 13.12 One-Way Between-S ANOVA in SPSS 13.13 Output From SPSS for One-Way Between-S ANOVA 13.14 Reporting Results From One-Way Between-S ANOVA 13.15 Issues in Planning a Study 13.16 Summary

Appendix 13A: ANOVA Model and Division of Scores Into Components

Appendix 13B: Expected Value of F When H0 Is True

Appendix 13C: Comparison of ANOVA and t Test

Appendix 13D: Nonparametric Alternative to One-Way Between-S ANOVA: Independent-Samples Kruskal-Wallis Test

Chapter 14 - Paired-Samples t Test 14.1 Independent- Versus Paired-Samples Designs

14.2 Between-S and Within-S or Paired-Groups Designs 14.3 Types of Paired Samples 14.3.1 Naturally Occurring Pairs (Different but Related Persons in the Two Samples) 14.3.2 Creation of Matched Pairs 14.4 Hypothetical Study: Effects of Stress on Heart Rate 14.5 Review: Data Organization for Independent Samples 14.6 New: Data Organization for Paired Samples 14.7 A First Look at Repeated-Measures Data 14.8 Calculation of Difference (d) Scores 14.9 Null Hypothesis for Paired-Samples t Test 14.10 Assumptions for Paired-Samples t Test 14.11 Formulas for Paired-Samples t Test 14.12 SPSS Paired-Samples t Test Procedure 14.13 Comparison Between Results for Independent-Samples and Paired-Samples t Tests 14.14 Effect Size and Power 14.15 Some Design Problems in Repeated-Measures Analyses 14.15.1 Order Effects 14.15.2 Counterbalancing to Control for Order Effects 14.15.3 Carryover Effects 14.15.4 Problems Due to Outside Events and Changes in Participants Across Time 14.16 Results for Paired-Samples t Test: Stress and Heart Rate 14.17 Further Evaluation of Assumptions 14.18 Summary

Appendix 14A: Nonparametric Alternative to Paired-Samples t: Wilcoxon Signed Rank Test

Chapter 15 - One-Way Repeated-Measures Analysis of Variance 15.1 Introduction 15.2 Null Hypothesis for Repeated-Measures ANOVA 15.3 Preliminary Assessment of Repeated-Measures Data 15.4 Computations for One-Way Repeated-Measures ANOVA 15.5 Use of SPSS Reliability Procedure for One-Way Repeated-Measures ANOVA 15.6 Partition of SS in Between-S Versus Within-S ANOVA 15.7 Assumptions for Repeated-Measures ANOVA 15.7.1 Scores on Outcome Variables Are Quantitative and Approximately Normally Distributed Without Extreme Outliers 15.7.2 Relationships Among the Repeated-Measures Variables Should Be Linear Without Bivariate Outliers 15.7.3 Population Variances of Contrasts Should Be Equal (Sphericity Assumption) 15.7.4 Assumption of No Person-by-Treatment Interaction 15.8 Choices of Contrasts in GLM Repeated Measures 15.8.1 Simple Contrasts 15.8.2 Repeated Contrasts 15.8.3 Polynomial Contrasts 15.8.4 Other Contrasts Available in the SPSS GLM Procedure 15.9 SPSS GLM Procedure for Repeated-Measures ANOVA 15.10 Output of GLM Repeated-Measures ANOVA 15.11 Paired-Samples t Tests as Follow-Up 15.12 Results 15.13 Effect Size 15.14 Statistical Power 15.15 Counterbalancing in Repeated-Measures Studies 15.16 More Complex Designs 15.17 Summary

Appendix 15A: Test for Person-by-Treatment Interaction

Appendix 15B: Nonparametric Analysis for Repeated Measures (Friedman Test)

Chapter 16 - Factorial Analysis of Variance 16.1 Research Situations Where Factorial Design Is Used 16.2 Questions in Factorial ANOVA 16.3 Null Hypotheses in Factorial ANOVA 16.3.1 First Null Hypothesis: Test of Main Effect for Factor A 16.3.2 Second Null Hypothesis: Test of Main Effect for Factor B 16.3.3 Third Null Hypothesis: Test of the A x B Interaction 16.4 Screening for Violations of Assumptions 16.5 Hypothetical Research Situation 16.6 Computations for Between-S Factorial ANOVA 16.7 Computation of SS and df in Two-Way Factorial ANOVA 16.8 Effect Size Estimates for Factorial ANOVA 16.9 Statistical Power 16.10 Follow-Up Tests 16.10.1 Nature of a Two-Way Interaction 16.10.2 Nature of Main Effect Differences 16.11 Factorial ANOVA Using the SPSS GLM Procedure 16.12 SPSS Output 16.13 Results 16.14 Design Decisions and Magnitudes of SS Terms 16.14.1 Distances Between Group Means (Magnitudes of SSA and SSB) 16.14.2 Number of Scores Within Each Group or Cell 16.14.3 Variability of Scores Within Groups or Cells (Magnitude of MSwithin) 16.15 Summary

Appendix 16A: Fixed Versus Random Factors

Appendix 16B: Weighted Versus Unweighted Means

Appendix 16C: Unequal Cell n’s in Factorial ANOVA: Computing Adjusted Sums of Squares 16.C.1 Partition of Variance in Orthogonal Factorial ANOVA 16.C.2 Partition of Variance in Nonorthogonal Factorial ANOVA

Appendix 16D: Model for Factorial ANOVA

Appendix 16E: Computation of Sums of Squares by Hand Chapter 17 - Chi-Square Analysis of Contingency Tables 17.1 Evaluating Association Between Two Categorical Variables 17.2 First Example: Contingency Tables for Titanic Data 17.3 What Is Contingency? 17.4 Conditional and Unconditional Probabilities 17.5 Null Hypothesis for Contingency Table Analysis 17.6 Second Empirical Example: Dog Ownership Data

17.7 Preliminary Examination of Dog Ownership Data 17.8 Expected Cell Frequencies If H0 Is True 17.9 Computation of Chi Squared Significance Test 17.10 Evaluation of Statistical Significance of χ² 17.11 Effect Sizes for Chi Squared 17.12 Chi Squared Example Using SPSS 17.13 Output From Crosstabs Procedure 17.14 Reporting Results 17.15 Assumptions and Data Screening for Contingency Tables 17.15.1 Independence of Observations 17.15.2 Minimum Requirements for Expected Values in Cells 17.15.3 Hypothetical Example: Data With One or More Values of E < 5 17.15.4 Four Ways to Handle Tables With Small Expected Values 17.15.5 How to Remove Groups 17.15.6 How to Combine Groups 17.16 Other Measures of Association for Contingency Tables 17.17 Summary

Appendix 17A: Margin of Error for Percentages in Surveys

Appendix 17B: Contingency Tables With Repeated Measures: McNemar Test

Appendix 17C: Fisher Exact Test

Appendix 17D: How Marginal Distributions for X and Y Constrain Maximum Value of φ

Appendix 17E: Other Uses of χ²

Chapter 18 - Selection of Bivariate Analyses and Review of Key Concepts 18.1 Selecting Appropriate Bivariate Analyses


18.2 Types of Independent and Dependent Variables (Categorical Versus Quantitative) 18.3 Parametric Versus Nonparametric Analyses 18.4 Comparisons of Means or Medians Across Groups (Categorical IV and Quantitative DV) 18.5 Problems With Selective Reporting of Evidence and Analyses 18.6 Limitations of Statistical Significance Tests and p Values 18.7 Statistical Versus Practical Significance 18.8 Generalizability Issues 18.9 Causal Inference 18.10 Results Sections 18.11 Beyond Bivariate Analyses: Adding Variables 18.11.1 Factorial ANOVA and Repeated-Measures ANOVA 18.11.2 Control Variables 18.11.3 Moderator Variables 18.11.4 Too Many Variables? 18.12 Some Multivariable or Multivariate Analyses 18.13 Degree of Belief

Appendices Appendix A: Proportions of Area Under a Standard Normal Curve Appendix B: Critical Values for t Distribution Appendix C: Critical Values of F Appendix D: Critical Values of Chi-Square Appendix E: Critical Values of the Pearson Correlation Coefficient Appendix F: Critical Values of the Studentized Range Statistic Appendix G: Transformation of r (Pearson Correlation) to Fisher's Z

Glossary

References


Preface

The set of bivariate techniques covered in this book (analyses with one predictor and one outcome) are the same as those in most introductory textbooks. This book provides an applied perspective. What does an applied perspective involve? Textbooks often use well-behaved data (without missing values, outliers, or violations of assumptions). This book introduces, early on, the idea that real data have problems. Discussion of ways in which actual practice differs from ideal situations helps students understand statistics in the context of real-world research. Here are examples: Textbooks describe random samples from clearly defined populations, while researchers often work with convenience samples. Textbooks usually present one significance test in isolation, whereas research reports often include numerous analyses, accompanied by increased risk for Type I error. This book includes discussion of these problems.

Each chapter begins with a simple question: What kinds of questions can this analysis answer? Chapters include fully worked examples with by-hand computation for small data sets, screenshots for SPSS menu selections and output, and results sections. Technical and supplemental information, including nonparametric alternatives, is provided in appendices at the ends of most chapters.

This book devotes less space to rarely used techniques (such as frequency polygons and methods to locate medians in grouped frequency distributions) and more space to real-world decisions made during data analysis (such as outlier detection and evaluation of distribution shape). Connections are made between design decisions and results; for instance, students will see that choice of dosage levels, control over within-group variance, and sample size influence the obtained magnitude of t and F ratios (along with sampling error, of course). Traditional use of statistical significance tests is covered. However, consistent with the New Statistics guidelines, there is greater emphasis on confidence intervals, effect sizes, and the need to document decisions made during analysis. Limitations of p values are discussed in nontechnical terms. Discussion also focuses on common researcher behaviors that affect p values (e.g., running numerous analyses and reporting only a few).

A distinction is made between “statistical significance” and practical or clinical or everyday “significance” or importance (i.e., a small p value does not necessarily indicate a strong treatment effect).

Students are encouraged to think in terms of “degree of belief” rather than yes/no decisions. To paraphrase David Hume, a wise person proportions belief to the evidence.

Notation and presentation are consistent with Volume II (Applied Statistics II: Multivariable and Multivariate Techniques [Warner, 2020]).

Digital Resources

Instructor and student support materials are available for download. SAGE edge offers a robust online environment featuring an impressive array of free tools and resources for review, study, and further explorations, enhancing use of the textbook by students and teachers.

SAGE edge for students provides a personalized approach to help you accomplish your coursework goals in an easy-to-use learning environment. Resources include the following:

* Mobile-friendly eFlashcards to strengthen your understanding of key terms
* Datasets for completing in-chapter exercises
* Links to web resources, including video tutorials and creative lectures, to support and enhance your learning

SAGE edge for instructors supports your teaching by providing resources that are easy to integrate into your curriculum. SAGE edge includes the following:

* Editable, chapter-specific PowerPoint® slides covering key information that offer you flexibility in creating multimedia presentations
* Test banks for each chapter with a diverse range of prewritten questions, which can be loaded into your LMS to help you assess students' progress and understanding
* Tables and figures pulled from the book that you can download to add to handouts and assignments
* Answers to in-text comprehension questions, perfect for assessing in-class work or take-home assignments

Finally, in response to feedback from instructors for R content to mirror the SPSS coverage in this book, SAGE has commissioned An R Companion for Applied Statistics I by Danney Rasco. This short supplement can be bundled with this main textbook.

The author welcomes communication from teachers, students, and readers; please e-mail her at [email protected] with comments, corrections, or suggestions.

Acknowledgments

Writers depend on many people for intellectual preparation and moral support. My understanding of statistics was shaped by exceptional teachers, including the late Morris de Groot at Carnegie Mellon University, and my dissertation advisers at Harvard, Robert Rosenthal and David Kenny. Several people who have most strongly influenced my thinking are writers I know only through their books and journal articles. I want to thank all the authors whose work is cited in the reference list. Authors whose work has particularly influenced my understanding include Jacob and Patricia Cohen, Barbara Tabachnick, Linda Fidell, James Jaccard, Richard Harris, Geoffrey Keppel, and James Stevens.

Special thanks are due to reviewers who provided exemplary feedback on first drafts of the chapters:

For the first edition:

David J. Armor, George Mason University
Michael D. Biderman, University of Tennessee at Chattanooga
Susan Cashin, University of Wisconsin-Milwaukee
Ruth Childs, University of Toronto
Young-Hee Cho, California State University, Long Beach
Jennifer Dunn, Center for Assessment
William A. Fredrickson, University of Missouri-Kansas City
Robert Hanneman, University of California, Riverside
Andrew Hayes, The Ohio State University
Lawrence G. Herringer, California State University, Chico
Jason King, Baylor College of Medicine
Patrick Leung, University of Houston
Scott E. Maxwell, University of Notre Dame
W. James Potter, University of California, Santa Barbara
Kyle L. Saunders, Colorado State University
Joseph Stevens, University of Oregon
James A. Swartz, University of Illinois at Chicago
Keith Thiede, University of Illinois at Chicago

For the second edition:

Diane Bagwell, University of West Florida
Gerald R. Busheé, George Mason University
Evita G. Bynum, University of Maryland Eastern Shore
Ralph Carlson, The University of Texas Pan American
John J. Convey, The Catholic University of America
Kimberly A. Kick, Dominican University
Tracey D. Matthews, Springfield College
Hideki Morooka, Fayetteville State University
Daniel J. Mundfrom, New Mexico State University
Shanta Pandey, Washington University
Beverly L. Roberts, University of Florida
Jim Schwab, University of Texas at Austin
Michael T. Scoles, University of Central Arkansas
Carla J. Thompson, University of West Florida
Michael D. Toland, University of Kentucky
Paige L. Tompkins, Mercer University

For the third edition:

Linda M. Bajdo, Wayne State University
Timothy Ford, University of Oklahoma
Beverley Hale, University of Chichester
Dan Ispas, Illinois State University
Jill A. Jacobson, Queen's University
Seung-Lark Lim, University of Missouri, Kansas City
Karla Hamlen Mansour, Cleveland State University
Paul F. Tremblay, University of Western Ontario
Barry Trunk, Capella University

I also thank the editorial and publishing team at SAGE, including Helen Salmon, Chelsea Neve, Megan O'Heffernan, and Laureen Gleason, who provided extremely helpful advice, support, and encouragement. Copy editor Jim Kelly merits special thanks for his attention to detail.

Many people provided moral support, particularly my late parents, David and Helen Warner; and friends and colleagues at UNH, including Ellen Cohn, Ken Fuld, Jack Mayer, and Anita Remig. I hope this book is worthy of the support they have given me. Of course, I am responsible for any errors and omissions that remain. Last but not least, I want to thank all my students, who have also been my teachers. Their questions continually prompt me to search for better explanations, and I am still learning.

Dr. Rebecca M. Warner
Professor Emerita
Department of Psychology
University of New Hampshire

About the Author

Rebecca M. Warner is Professor Emerita at the University of New Hampshire. She has taught statistics in the UNH Department of Psychology and elsewhere for 40 years. Her courses have included Introductory and Intermediate Statistics as well as seminars in Multivariate Statistics, Structural Equation Modeling, and Time-Series Analysis. She received a UNH Liberal Arts Excellence in Teaching Award, is a Fellow of both the Association for Psychological Science and the Society of Experimental Social Psychology, and is a member of the American Psychological Association, the International Association for Statistical Education, and the Society for Personality and Social Psychology. She has consulted on statistics and data management for the World Health Organization in Geneva, Project Orbis, and other organizations; and served as a visiting faculty member at Shandong Medical University in China. Her previous book, The Spectral Analysis of Time-Series Data, was published in 1998. She has published articles on statistics, health psychology, and social psychology in numerous journals, including the Journal of Personality and Social Psychology. She has served as a reviewer for many journals, including Psychological Bulletin, Psychological Methods, Personal Relationships, and Psychometrika. She received a BA from Carnegie Mellon University in social relations in 1973 and a PhD in social psychology from Harvard in 1978. She writes historical fiction and is a hospice volunteer along with her Pet Partner certified Italian greyhound Benny, who is also the world’s greatest writing buddy.

12.10.5 Computation of Effect Sizes for Heart Rate and Caffeine Data
12.10.6 Summary of Effect Sizes
12.11 Factors That Influence the Size of t
12.11.1 Effect Size and N
12.11.2 Dosage Levels for Treatment, or Magnitudes of Differences for Participant Characteristics, Between Groups
12.11.3 Control of Within-Group Error Variance
12.11.4 Summary for Design Decisions
12.12 Results Section
12.13 Graphing Results: Means and CIs
12.14 Decisions About Sample Size for the Independent-Samples t Test
12.15 Issues in Designing a Study
12.15.1 Avoiding Potential Confounds
12.15.2 Decisions About Type or Dosage of Treatment
12.15.3 Decisions About Participant Recruitment and Standardization of Procedures
12.15.4 Decisions About Sample Size
12.16 Summary
Appendix 12A: A Nonparametric Alternative to the Independent-Samples t Test
Chapter 13 - One-Way Between-Subjects Analysis of Variance
13.1 Research Situations Where One-Way ANOVA Is Used
13.2 Questions in One-Way Between-S ANOVA
13.3 Hypothetical Research Example
13.4 Assumptions and Data Screening for One-Way ANOVA
13.5 Computations for One-Way Between-S ANOVA
13.5.1 Overview
13.5.2 SSbetween: Information About Distances Among Group Means
13.5.3 SSwithin: Information About Variability of Scores Within Groups
13.5.4 SStotal: Information About Total Variance in Y Scores
13.5.5 Converting Each SS to a Mean Square and Setting Up an F Ratio
13.6 Patterns of Scores and Magnitudes of SSbetween and SSwithin
13.7 Confidence Intervals for Group Means
13.8 Effect Sizes for One-Way Between-S ANOVA
13.9 Statistical Power Analysis for One-Way Between-S ANOVA
13.10 Planned Contrasts
13.11 Post Hoc or “Protected” Tests
13.12 One-Way Between-S ANOVA in SPSS
13.13 Output From SPSS for One-Way Between-S ANOVA
13.14 Reporting Results From One-Way Between-S ANOVA
13.15 Issues in Planning a Study
13.16 Summary
Appendix 13A: ANOVA Model and Division of Scores Into Components
Appendix 13B: Expected Value of F When H0 Is True
Appendix 13C: Comparison of ANOVA and t Test
Appendix 13D: Nonparametric Alternative to One-Way Between-S ANOVA: Independent-Samples Kruskal-Wallis Test
Chapter 14 - Paired-Samples t Test
14.1 Independent- Versus Paired-Samples Designs

Self-interest of information providers is not always obvious. Many webpages offer “sponsored content”: paid messages from advertisers that look like news articles but in fact promote the interests of advertisers. For instance, a new diet pill might be presented as “news” when in fact the article is an advertisement. Communicator self-interest raises concerns about credibility of messages.

1.3.2 Bias and “Cherry-Picking”

Communicators generally cannot (or do not) present all available information. Selection of information by communicators can be influenced by confirmation bias, a preference for information that confirms preexisting beliefs or ideas. Biased selection of evidence can be informally called cherry-picking. Information and ideas that are excluded may be as important as information that is included.

As an example of cherry-picking, suppose 20 studies show no association between consuming meat and cancer risk, and 3 studies do show an association. A journalist might report only the 3 studies that showed an association or might report only the single most recent study. Whether the bias was intentional or not, the article will not provide an accurate summary of research results.

When scientists write literature reviews (reviews of past research), they are expected to discuss all past relevant research. Literature reviews are included in the introductions to most primary source research reports; literature reviews can also be stand-alone papers or books.

1.3.3 Primary, Secondary, and Third-Party Sources

An old game called “telephone” illustrates the problem of distance from a source. People form a line; the first person whispers a message to the second person, the second person whispers it to the third, and so forth. When the final message is compared with the original message, there are changes and distortions. Transmission of information can introduce errors because of each person’s biases or misunderstandings.

In science, a primary source is a research report written by a researcher who has firsthand knowledge of behaviors and events in a study. Primary source reports (sometimes called articles or papers) are published in journals.² Primary source data may also appear in books written for science audiences.

A secondary source is a description or summary of past research, created by someone who did not experience the reported data collection or observations firsthand. In many disciplines, secondary sources are scholarly books. Some journal articles are also secondary sources because they only review past research and do not present new data about which their authors have firsthand knowledge. Literature reviews in the introductions to science journal articles are secondhand discussions of past studies. (In the sciences, literature refers to past published research.)

Unfortunately, primary source reports are usually long and difficult to read (particularly for readers unfamiliar with statistics and technical terms). Language in research reports is sometimes unnecessarily obscure. Some full-length science research reports are published on the web as open-access materials; anyone can view these. However, many publishers require fees or subscriptions for access. The consequence is that many people can't easily understand most primary source information in science and sometimes cannot even gain access to it.

Much content on websites for news organizations is third-party content. This is content written by someone who may have examined only secondary sources or other thirdhand content, such as news reports or press releases. Often, third-party content is authored by someone who has no technical knowledge of the research field and statistical methods. Examples include articles published by news organizations. These articles usually don’t provide complete or accurate information about research results.

In the past, editors of prestigious newspapers required reporters to fact-check claims carefully. Increasingly, news reports on the web are paraphrases of, or uncritical reposting of, third-party content from other news sources. Some mass media news sources specifically disclaim responsibility for accuracy. Here is an example; many other news organizations post similar disclaimers:

CNN is a distributor (and not a publisher or creator) of content supplied by third parties and users. ... Neither CNN nor any third-party provider of information guarantees the accuracy, completeness, or usefulness of any content. ... (CNN, 2018)

Communicators can provide better quality information when they are closer to original sources of information, and they are likely to provide better quality information when they assume responsibility for accuracy.

In everyday life, most of us rely on thirdhand information most of the time. Because so much of what we think we know is based on thirdhand information, we should not be overly confident about things we think we know.

1.3.4 Communicator Credentials and Skills

Communicators are more believable when they have training and background related to information in the message. Researchers generally have credentials that provide evidence of this training and background, including advanced degrees such as a PhD or MD, affiliations with respected organizations such as universities, and publications in high-quality science journals.

Some journalists have strong credentials in science, but many do not. People who do not have training in statistics can easily misunderstand studies that use statistical terms such as logistic regression and odds ratios.

Celebrity status is not a meaningful credential. Famous media personalities, such as Dr. Oz³ and other self-appointed lifestyle or health experts, may base recommendations on incomplete or incorrect information.

Scientific research reports include source information (authors, university affiliations, and so forth). News reports and websites sometimes do not include source information; they provide no basis to evaluate self-interest, distance from information source, and credentials. Guidelines for evaluation of websites are provided by Kiely and Robertson (2016) and Montecino (1998).

1.3.5 Track Record for Truth-Telling

There are independent, nonpartisan organizations that evaluate communicator track records for truth-telling in journalism, for example, the Pulitzer Prize-winning site www.politifact.com. PolitiFact rates statements as true, mostly true, half true, mostly false, false, and “pants on fire” (extremely false). Other respected fact-checking sites are www.snopes.com and www.factcheck.org. These fact-checkers do the work that information consumers usually don’t have the time to do.

Information published in scientific journals can be incorrect because of fraud; fraud in science is rare, but it has occurred. A notorious example was a claim by Andrew Wakefield that vaccines cause autism (discussed by Godlee, Smith, & Marcovitch, 2011). There are severe penalties for fraud or plagiarism in science, including forced retraction of publications, withdrawal of research funds, loss of reputation, and job dismissal. Rare instances of fraud in science can be identified by a web search for the researcher name and terms such as fraud. Information consumers should be skeptical of information from sources with poor records for truth-telling.

1.4 Message Content

1.4.1 Anecdotal Versus Numerical Information

Anecdote means “story,” often about an individual person or situation. First-person accounts are often called testimonials. Audiences may find narrative stories or anecdotes more persuasive and memorable than numerical information.

There are many potential problems with anecdotes (anecdotal evidence). Sometimes individual situations are not reported accurately (for example, advertisements for weight loss products often include falsified before and after photos). Even when anecdotal evidence is accurate, it is difficult to know whether the experience shown is generalizable: Has this experience happened to many other people, or was this a unique situation? Diet product advertisers are required to acknowledge this and typically do so in a tiny footnote: “Individual results may vary.”

In science, a detailed report of an individual person or situation is called a case study. The study of unique cases, such as the brain damage suffered by railway worker Phineas Gage (Kihlstrom, 2010; Twomey, 2010), can be valuable. However, generalizability concerns are still relevant.

Anecdotal evidence can dramatize genuine problems. However, anecdotal evidence can also dramatize and promote incorrect beliefs. It is obviously easy to cherry-pick anecdotes. Supporting evidence in the form of systematic numerical information can provide a more accurate overview of evidence than anecdotal reports.

1.4.2 Citation of Supporting Evidence

In science, identification of outside sources of evidence is done by citation. Author names and years of publication are included in the text (to identify sources of ideas and evidence), and complete information to locate each source is included in a reference list. Citation has two purposes. First, it gives credit to others for their ideas and evidence; this avoids plagiarism, which occurs if authors present ideas or contributions of other people as if they were the authors’ own new contributions. Second, it shows how the present study builds upon an existing body of evidence.

A message is more believable when it includes or refers to specific supporting evidence. In science, the most complete and detailed supporting evidence appears in primary source research reports in science journals. Documentation of information sources is typically less detailed and systematic in journalism and mass media. (The best science journalists provide references or links to primary source research reports.)

It is possible for a writer or an advertiser to claim a spurious air of authority by citing numerous sources. However, a long list of references does not guarantee accuracy. On closer examination, readers may find that communicators have cherry-picked, misinterpreted, or misrepresented evidence; cited sources that are not relevant to the topic; or referred only to opinion pieces that do not actually contain evidence.

To evaluate the quality of evidence, we need to know how it was collected. Collection of evidence in science is systematic; that is, there are rules and procedures that specify what researchers should do to gather evidence and limit the kinds of interpretations they are permitted to make. Rules for statistical analysis are an important part of this.

1.5 Evaluating Generalizability

Researchers and journalists usually want to generalize about their findings. In other words, instead of just saying: “45% of the respondents I talked to said they plan to vote for candidate X,” they want to say something like “45% of all registered voters plan to vote for candidate X.” Generalizability is the degree to which a researcher can claim that results obtained in a specific sample would be the same for a population of interest. Results from a sample can be generalized to an actual population of interest if the sample is representative of the population; representativeness can often be obtained using random or systematic methods to select the sample. Results from an accidental or a convenience sample may be generalizable to a hypothetical population if the sample resembles that hypothetical population. Results from a biased sample are not generalizable. In experiments, generalizability also depends on similarity of type and dosages of experimental treatment to real-world experiences with the treatment variable, setting, and other factors.

Polling organizations, such as Gallup, collect public opinion information in ways that provide a good basis for generalization. They use large samples (usually at least 1,000 individuals) and obtain these samples using combinations of random and systematic selection so that the people who responded to the survey resemble the larger population (such as all registered voters) in terms of age, income, and so forth (Gallup, n.d.).

When journalists report information from polls and demographic studies, they are (once again) in a position to cherry-pick. Because of differences in procedures and types of people contacted, various polling organizations may report different predictions about presidential candidate preference. A journalist who wants to make a case to support Candidate X may report only the poll in which Candidate X had the highest approval ratings.
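The polling discussion above mentions samples of about 1,000 respondents. As a rough illustration of why samples of that size are treated as informative, the sketch below computes an approximate 95% margin of error for a sample percentage. It assumes simple random sampling and the normal approximation; real polls use weighting and more complex designs, so their published margins may differ. The function name `margin_of_error` and the 45% figure are illustrative choices, not taken from any actual poll.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p(1 - p) / n), which assumes
    a simple random sample; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical poll: 45% of n = 1,000 respondents favor Candidate X.
p_hat, n = 0.45, 1000
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"n = {n}: 45% +/- {100 * moe:.1f} points "
      f"({100 * low:.1f}% to {100 * high:.1f}%)")

# The same percentage from a small sample of 35 is far less precise.
moe_small = margin_of_error(p_hat, 35)
print(f"n = 35: 45% +/- {100 * moe_small:.1f} points")
```

Under these assumptions, the interval for the sample of 1,000 is only a few percentage points wide, while the interval for the sample of 35 is several times wider. A narrow margin of error does not address bias: a large but unrepresentative sample can still give a misleading estimate.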

In behavioral and social science, the problem of generalizability can have a different form. A researcher may want to know whether cognitive behavioral therapy (CBT) reduces depression. Typically, studies examine small to moderate numbers of cases, for instance, 35 patients who receive CBT and 35 who do not. To generalize results about effects of CBT to a large hypothetical

17.7 Preliminary Examination of Dog Ownership Data
17.8 Expected Cell Frequencies If H0 Is True
17.9 Computation of Chi Squared Significance Test
17.10 Evaluation of Statistical Significance of χ²
17.11 Effect Sizes for Chi Squared
17.12 Chi Squared Example Using SPSS
17.13 Output From Crosstabs Procedure
17.14 Reporting Results
17.15 Assumptions and Data Screening for Contingency Tables
17.15.1 Independence of Observations
17.15.2 Minimum Requirements for Expected Values in Cells
17.15.3 Hypothetical Example: Data With One or More Values of E < 5
17.15.4 Four Ways to Handle Tables With Small Expected Values
17.15.5 How to Remove Groups
17.15.6 How to Combine Groups
17.16 Other Measures of Association for Contingency Tables
17.17 Summary
Appendix 17A: Margin of Error for Percentages in Surveys
Appendix 17B: Contingency Tables With Repeated Measures: McNemar Test
Appendix 17C: Fisher Exact Test
Appendix 17D: How Marginal Distributions for X and Y Constrain Maximum Value of φ
Appendix 17E: Other Uses of χ²
Chapter 18 - Selection of Bivariate Analyses and Review of Key Concepts
18.1 Selecting Appropriate Bivariate Analyses
18.2 Types of Independent and Dependent Variables (Categorical Versus Quantitative)
18.3 Parametric Versus Nonparametric Analyses
18.4 Comparisons of Means or Medians Across Groups (Categorical IV and Quantitative DV)
18.5 Problems With Selective Reporting of Evidence and Analyses
18.6 Limitations of Statistical Significance Tests and p Values
18.7 Statistical Versus Practical Significance
18.8 Generalizability Issues
18.9 Causal Inference
18.10 Results Sections
18.11 Beyond Bivariate Analyses: Adding Variables
18.11.1 Factorial ANOVA and Repeated-Measures ANOVA
18.11.2 Control Variables
18.11.3 Moderator Variables
18.11.4 Too Many Variables?
18.12 Some Multivariable or Multivariate Analyses
18.13 Degree of Belief
Appendices
Appendix A: Proportions of Area Under a Standard Normal Curve
Appendix B: Critical Values for t Distribution
Appendix C: Critical Values of F
Appendix D: Critical Values of Chi-Square
Appendix E: Critical Values of the Pearson Correlation Coefficient
Appendix F: Critical Values of the Studentized Range Statistic
Appendix G: Transformation of r (Pearson Correlation) to Fisher's Z
Glossary
References

association is not sufficient by itself to prove causation because, even if X and Y covary, this co-occurrence may be due to the influence of one or more other variables; one of those other variables might be the real cause of X, or of Y, or both. In this example, heat or temperature might cause (or at least predict) ice cream purchase and homicide. The effects of rival explanatory variables can be reduced or eliminated in well-controlled experiments and reduced by statistical controls. Mere co-occurrence is not enough evidence to make a causal inference.

Sometimes the need to look for a different explanation is obvious (as in the ice cream/homicide example). It would be absurd to argue that ice cream causes homicide. However, the need to consider rival explanations also arises in situations that are not so obviously silly. In the diet soft drink/weight gain example, it is conceivable that artificial sweeteners have causal effects on appetite or metabolism that really do lead to weight gain, even though the artificial sweeteners contain zero (or negligible) calories. However, the other explanation (that drinking diet beverages leads people to indulge in other high-calorie foods) is also plausible. (It is also conceivable that both these explanations are partly correct.) Both experimental and nonexperimental studies, with humans and nonhuman animals, would be helpful in sorting out the relations among variables and whether any of the associations are causal.

1.6.3 Perfect Correlation Versus Imperfect Correlation

Perfect co-occurrence (perfect correlation or statistical association) is rare. Consider the genetic mutation for hemophilia (Table 1.1). If a male child inherits this genetic mutation, he will have hemophilia. Most other heritable diseases do not show this perfect association. (For female children, effects of the hemophilia gene are ruled out by information on the other X chromosome.)

Table 1.1

                                           Hemophilia    No hemophilia
Hemophilia gene is present                    100%             0%
Hemophilia gene is absent                       0%           100%

If a male child does not inherit the gene for hemophilia, he will not have hemophilia. In logical terms, the mutated gene is both necessary and sufficient for the disease. The mutated gene is necessary for hemophilia because a person can’t get hemophilia without it. The mutated gene is sufficient for hemophilia, because if a person has it, he always has hemophilia. In other words, hemophilia always occurs when the mutated gene is present and never occurs when the mutated gene is absent.

Table 1.2

                                           Got sick    Did not get sick
Person washes hands regularly                 25%            75%
Person does not wash hands regularly          67%            33%

Most associations in behavioral and social sciences and medicine are not perfect. Consider this hypothetical example for a behavior (washing or not washing hands) and a disease outcome (getting sick). Table 1.2 shows an imperfect association. Only 25% of regular hand washers got sick, while 67% of those who don’t regularly wash their hands got sick. While most people who washed their

hands did not getsick, hand washing did not guarantee that they could avoid getting sick. The association between lung cancer and smoking is also not perfect. The risk for getting lung cancer

is much higher for smokers than for nonsmokers.

Training in research methods and statistics

However, a few nonsmokers do get lung cancer,

provides the skills scientists need to think

and many smokers do not get lung cancer.

carefully about the evidence needed to support

In situations where associations are not perfect, it

is likely that other variables are involved. Behaviors or conditions that sometimes (but not always) precededisease are often usually called

“riskfactors” rather than causes. Smoking is a risk factor for lung cancer. Some diseases have numerous risk factors (for example, risk for heart disease is related to smoking, body weight, sex, age, high blood pressure, and other factors). We call behaviors that reduce risk for a negative

outcome “protectivefactors.” For example, hand washing is a protective factor against getting sick.

1.6.4 “Individual Results Vary” Unless there is a perfect correlation (as in the hemophilia example), statistical associations or correlations between variables do not predict

exact outcomes for all individuals. Consider the results of a study by Judge and Cable (2004), informally reported in Dittman (July/ August 2014). They reported that taller persons tend to earn more money (thatis, height is correlated with salary). This is not a perfect correlation. If you are short, that does not necessarily mean that you will earn verylittle. Mark Zuckerberg (the founder of Facebook)is reported to be 5'7”, but that did not prevent him from becoming one of the wealthiest men in the world. If youthink about the implications correlations might have for your own outcomes, realize that individual

outcomes differ when correlations are not perfect.

causal claims. Mass media journalists often rely on secondary sources or third-party content. Bythe time information filters through multiple communication links, details about the nature of

the evidence and concerns aboutlimitations that affect the ability to generalize and make causal inferences are often lost. Third-party content often does not provide accurate information about generalizability and potential causality.

1.7 Quality Control Mechanisms in Science

1.7.1 Peer Review

The science research process has mechanisms for information quality control. The most important mechanism is peer review. Researchers submit research reports to science journals (also called academic journals) for consideration (see note 2). The editor sends papers to peer reviewers (peers are expert researchers in the same field). Reviewers provide detailed criticism of studies, including evaluation of their research methods. On the basis of reviews, editors decide whether to reject a paper as inadequate, ask authors to revise the paper to correct errors or deficiencies, or (very rarely) accept the paper with only minor corrections. Papers are rarely accepted in their initially submitted form. Rejection rates for some journals are 80% or higher.

Peer review is fallible. Reviewers can also be subject to confirmation bias (they are more likely

1.6.5 Requirements for Evidence of Causal Inference

to favor conclusions consistent with their own beliefs). Reviewers may not notice all of the problems in a research report. However, peer

review weeds out much poorly conducted

components such as preregistration of research

research and improves the quality of published

plans and sharing details of data and methods. For

papers. The community of scientists in effect

further discussion, see Cumming and Calin-

systematically polices the work of all individual

Jageman (2016).

scientists.

1.8 Biases of Information

1.7.2 Replication and

Accumulation of Evidence A second important mechanism for data quality

control in academic research is replication. Replication means repeating or redoing a study. This can be an exact replication (keeping all methods the same) or a conceptual replication (changing elements of the study, such as location, measures, or type of participants, to evaluate whether the same results occur in different situations). We should not treat findings from any

one study as a conclusive answer to a research question. Any single study may have unique problems or flaws. In an ideal world, before we accept a research claim, we should have a substantial body of good-quality and consistent

Consumers 1.8.1 Confirmation Bias (Again) Information consumers or receivers also tend to select evidence consistent with their preexisting

beliefs. Media consumers need to be aware that they can systematically miss kinds of information (which may be of high or low quality) when they select news sources they like. Ratings of many

web news sources on a continuum from left/liberal to right/conservative, along with assessment of accuracy, are provided at

https://mediabiasfactcheck.com/politifact/. News sources that are extremely far left or far right tend to be less accurate.

evidence to back up that claim; this can be

Because of confirmation bias, people can get

obtained from replications.

stuck: They continue to believe “facts” that aren’t

Peer review and replication in science are fallible. However, they provide the best ongoing quality control checks we have. In contrast to science, there are few quality control mechanisms for most mass media communication.

1.7.3 Open Science and Study Preregistration

There are recent initiatives to improve the reproducibility and quality of research results in biomedicine, psychology, and other fields (Begley & Ioannidis, 2015; Open Science Collaboration, 2015). The Open Science model includes


true, and ideas that are wrong, because they never expose themselves to information that might prompt them to consider different possibilities. Consumers of mass media usually avoid evidence that challenges their beliefs. Philosopher of science Karl Popper argued that scientists also need to examine evidence that might falsify their beliefs. Scientists and people in general should consider evidence that challenges their beliefs.

1.8.2 Social Influence and Consensus Should we believe something simply because many people, particularly those whom we know

media and held by millions of people. My personal

evaluated by an institutional review board; research that involves nonhuman animals is evaluated by an institutional animal care and use

favorite conspiracy theory is that alien reptiles

committee. Ethical codes govern research in

control U.S. politics. Bump (2013) reported that

other areas such as biomedicine. Data collection

more than 12 million people, or 4%, of the U.S.

cannot begin until ethics board approval of

population said that they believed this theory in

procedures has been obtained. Adherence to those

2012-2013. To be clear, I strongly disbelieve that

rules is an ethical obligation for researchers. We

we are ruled by alien reptiles. (I am also not sure

should not harm the people or entities we study.

and respect, believe it? Not necessarily. Some incorrect beliefs are widely reported in mass

whether to believe Bump's report that 12 million people really believe this; surveys are not always accurate.)

As an example of potential harm to a research participant, suppose that a study reveals that a person has a history of addiction. If that

Consensus among science researchers can

information gets into the hands of potential

enhance the believability of a claim. However,

landlords or employers, it could have an impact

even in science, consensus does not always guarantee accuracy. Experts can turn out to be wrong. For example, there was a consensus

on that person’s search for housing and jobs.

among nutrition researchers that eggs are bad for

health because of their cholesterol content. Some recent research suggests that this widely held belief may be incorrect (Gray & Griffin, 2009; see note 4),

but the issue continues to be controversial. A belief shared by millions of people is not necessarily

wrong. However, consensus is neither necessary nor sufficient evidence that information is correct.

1.9 Ethical Issues in Data Collection and Analysis

1.9.1 Ethical Guidelines for Researchers: Data Collection

Ethical issues arise when collecting data about people and nonhuman animals. For psychologists, the American Psychological Association has codes of ethics that protect the well-being of subjects (Campbell, Vasquez, Behnke, & Kinscherff, 2009). Research that involves human participants is


Researchers must keep such records confidential. Researchers also have an ethical responsibility to think about the potential impact of their research (both positive and negative) on public policy and the behavior of organizations and individuals.

1.9.2 Ethical Guidelines for Statisticians: Data Analysis and Reporting

The GAISE report states, “Students should demonstrate an awareness of ethical issues associated with sound statistical practice” (GAISE College Report ASA Revision Committee, 2016). A separate document (American Statistical Association, 2015) discusses ethical issues in detail. Here is a list of ethical practices for data analysts, paraphrased from the American Statistical Association's ethics document. You will be reminded about these issues as you continue through the book.

1. Ensure that numbers are accurate. Fully disclose data handling procedures (such as

enhancing use of the textbook by students and

teachers, students, and readers; please e-mail her

teachers.

at [email protected] with comments, corrections, or

SAGE edge for students provides a personalized approach to help you accomplish your coursework goals in an easy-to-use learning environment. Resources include the following:

* Mobile-friendly eFlashcards to strengthen your understanding of key terms
* Datasets for completing in-chapter exercises
* Links to web resources, including video tutorials and creative lectures, to support and enhance your learning

SAGE edge for instructors supports your teaching by providing resources that are easy to integrate into your curriculum. SAGE edge includes the following:

* Editable, chapter-specific PowerPoint® slides covering key information that offer you flexibility in creating multimedia presentations
* Test banks for each chapter with a diverse range of prewritten questions, which can be loaded into your LMS to help you assess students' progress and understanding
* Tables and figures pulled from the book that you can download to add to handouts and assignments
* Answers to in-text comprehension questions, perfect for assessing in-class work or take-home assignments

Finally, in response to feedback from instructors for R content to mirror the SPSS coverage in this book, SAGE has commissioned An R Companion for Applied Statistics I by Danney Rasco. This short supplement can be bundled with this main

textbook. The author welcomes communication from

suggestions.

medicine. There are many questions in medicine (such as what causes autoimmune disorders) for which medical research does not have good

3. Is the communicator far from the

It is useful to think about scientific knowledge in

terms of degrees of belief instead of certainty. The philosopher David Hume said that “a wise [person] ... proportions his [or her] belief to the evidence” (Schmidt, 2004). Degree of belief should be based

on supporting evidence. When

there is little evidence (for example, results from only one study), people should not have strong belief in a claim. As additional good-quality increase. People should revise degree of belief upward or downward as new (good-quality)

evaluate the information?
4. Does the communicator have a good record for truth-telling?
5. What types of evidence are included? Anecdotes? Citations of specific, credible sources?
6. Have you considered your own possible biases as an information consumer? Do you accept information uncritically because it confirms by what other people believe?
7. Do data come from people (or cases) who resemble the population of interest? Are

evidence becomes available.

results generalizable?

This rating scale illustrates the concept of degree of belief. The use of a five-point scale and the exact verbal descriptions for each numerical rating are arbitrary.

Maybe untrue

information source or not well qualified to

when you already believe? Are you influenced

evidence accumulates, degree of belief can

Probably untrue

2. Is evidence cherry-picked to fit the communicator’s argument?

answers (Fox, 2003).

systematically collected su)

self-interest?

8. Are causal inferences drawn when there is not enough information to prove a causal association? Remember that imperfect

correlation or co-occurrence does not indicate causation.

Not sure; insufficient evidence

Maybe true

Probably true

9. Has information been subjected to quality control? (In science, this includes peer review and replication.)

Fairly often, the best answer to research or public policy questions is that we do not have enough high-quality evidence to be confident that we know the correct answer. We should never assume that numerical results of one single study or mass media report are conclusive.

10. Is the presentation of information deceptive (e.g., lying graphs)?

11. What ethical issues are at stake in the conduct and application of the research? 12. Is your degree of belief proportional to the quantity of good quality and consistent evidence? (You should never believe a claim

1.12 Summary

on the basis of just one scientific study or one journalism report.)

Here are some questions to keep in mind when

Sometimes the best answer to questions such as

evaluating numerical (and other) information.

“Are eggs harmful to cardiovascular health?” is

1. Is there evidence of communicator bias or


that we don’t have enough evidence yet to answer the question. Unfortunately, lack of evidence does

not prevent some communicators from making

information an important factor when you

premature claims. When claims are made on the

evaluate message credibility?

basis of limited evidence, contradiction and

4. What does it mean to say that a correlation

confusion often arise. It is better to reserve

(or association) between variables is

judgment until a large quantity of good-quality

imperfect?

evidence is available. One single media report, or one single science report, is not “proof.” Even if you do not plan to be a researcher, you can benefit from thinking like a scientist and statistician about numerical evidence you encounter in everyday life. Some decisions have high stakes. For example, you may need to decide whether to undertake a risky but potentially beneficial medical treatment. Ideally, you should have accurate information about potential outcomes. The higher the stakes, the more you need to know how to obtain trustworthy

information.

5. Give an example of a risk factor, and a protective factor, not discussed in the chapter.
6. Why is the existence of a correlation (existence of co-occurrence or association) between X and Y not enough evidence for us to say that X causes Y?
7. What is the post hoc, ergo propter hoc fallacy? (Give an example you have seen, different from the one in this chapter.)
8. What is confirmation bias?
9. What quality control mechanisms are used in science?
10. What is peer review? How can it improve the

The take-home message from this chapter is: We all know a lot less than we think we do, because most of us rely heavily on third-party content that has little or no information quality control. All of us (scientists, journalists, and information consumers) should be cautious about degree of belief. Sometimes the best answer to a question is: We don’t have enough good quality evidence.

Courses in statistics and research methods teach

credibility of science reporting?
11. What is research replication? How can this improve the quality of evidence in science? How do exact replication and conceptual replication differ?
12. A researcher might say “the results of this one study prove” something. Is this justified?
13. What (approximate) degree of belief should you have on the basis of only one study?

you good practice in evaluation and presentation

of evidence.

Notes

Comprehension Questions

1 Scientists are expected to be objective when they

1. What is cherry-picking of evidence, and why is it deceptive? (Can you think of a book or media report that seems to present cherrypicked evidence?) 2. Give examples of self-interest that might

make a communicator less believable. 3. Why is distance to original source of


select information to report. However, scientists

tend to focus selectively on information consistent with the most widely accepted existing theories; Kuhn and Hacking (2012) called this “selection of significant fact.”

2 Numerous predatory, for-profit online journal publishers have emerged in recent years. It has

become more difficult to determine whether

online publications are credible. Research reports

including eFlashcards, data sets, and web

published in predatory journals are not valued by

resources, on the accompanying website at

professional colleagues and universities. Beall’s List of Predatory Journals and Publishers names many publishers that are almost certainly predatory (https://beallslist.weebly.com). Additional warning signs that a publisher may be predatory:

* The journal invites you to submit your undergraduate or graduate thesis for publication (particularly if the journal title is not in your discipline or field).
* The journal offers to publish your paper without peer review.
* The journal asks you to pay for publication. (However, many legitimate publishers charge author fees to make journal articles open access on the web; therefore, a request for payment is not always an indication that a journal is predatory.)

If you are not sure whether a journal or publisher is predatory, search or along with the term predatory. You can also ask mentors, advisers, or colleagues.

3 About half of Dr. Oz's medical advice is not supported by medical research (Belluz, 2014). Dr. Oz was investigated in a congressional hearing and paid large settlements in lawsuits for false advertising (Cohen, 2015).

4 This video about an imaginary time-traveling dietician makes fun of changes in dietary

recommendations across the decades: https://www.youtube.com/watch?v=5UaWVg1SsA.

Digital Resources Find free study tools to support your learning,


needed to understand later topics throughout the

original data sources, if these exist. However,

book.

original data sources are sometimes not available, and complete proofreading of data may be

3.2 Use of Frequency Tables for Data Screening

We look at frequency tables to get to know the data and to identify potential errors and problems with data before we do other analyses. This

extremely time-consuming and costly. At a minimum, spot checks (checking some score values in SPSS against original sources of data) provide an opportunity to detect problems that might be more widespread throughout the data set and would require much closer checking. If you find scores that are clearly impossible or at

Introductory statistics textbooks often present

least highly unlikely, the best option is to obtain

students with sample data sets that are assumed

valid scores from other sources if that is possible.

not to have errors or missing information. In real-

If a student reports a grade point average (GPA) of

world applications, data often have problems, and

6 when college GPAs are on a 0-to-4 scale, and you

it is important to look for them. These problems

have access to university records and can find

include:

that student’s GPA, you could use the university

* Information is sometimes missing for some members of a sample.
* Some scores can be unusually large or small; unusual or extreme scores can be problematic in some analyses.
* Some groups contain too few cases for meaningful analyses.
* Real datasets often contain mistakes (incorrect, or even impossible, score values).

Implausible or incorrect score values can arise in many ways. If a person is asked to report hair color and reports “plaid,” that is an unlikely response. If a heart rate is recorded as 275 beats per minute, the heart rate monitor is probably malfunctioning. However, a score value can appear plausible and still be incorrect; if a heart rate monitor is not properly calibrated, a person whose heart rate is given as 110 beats per minute might really have a heart rate of 95 beats per minute.

record to replace the incorrect self-reported value. If a respondent reports large numbers of silly or impossible values, you might decide to drop that person’s data entirely. There is increasing concern about completeness and transparency in data reporting (Simmons, Nelson, & Simonsohn, 2011). Research reports should include information about problems detected during preliminary screening. This information is often obtained from frequency tables and graphs (such as histograms). The

numbers or percentages of incorrect scores, extreme scores, and missing values should be reported. Authors also need to specify what, if anything, was done to remedy these problems. You might say something like “Data for five students were dropped because they reported unlikely or inconsistent information” or “Data from three sessions had to be dropped because of equipment malfunction.” Whatever problems with data you find, and whatever actions you take,

In an ideal world, researchers would proofread every single number in the data file against


you need to keep a detailed record and include this information in published research reports.
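The screening steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the GPA values are hypothetical, `None` is used here to stand for a missing response, and the variable names are invented for this sketch (the book itself uses SPSS frequency tables for this purpose).

```python
from collections import Counter

# Hypothetical self-reported GPAs on a 0-to-4 scale; None marks a missing response.
gpa_scores = [3.2, 3.5, None, 2.8, 6.0, 3.5, 4.0, None, 3.2, 3.5]

VALID_MIN, VALID_MAX = 0.0, 4.0

# Frequency table: how many times each distinct value (including missing) occurs.
frequency_table = Counter(gpa_scores)

# Counts to report in the write-up: missing values and impossible values.
missing_count = frequency_table[None]
impossible_values = sorted(
    score
    for score in frequency_table
    if score is not None and not (VALID_MIN <= score <= VALID_MAX)
)

# Print the table in score order, with missing responses listed last.
for score in sorted(s for s in frequency_table if s is not None):
    print(f"{score:>7}  {frequency_table[score]}")
print(f"missing  {missing_count}")
print(f"Impossible values: {impossible_values}")
```

Whatever is found this way (here, two missing responses and one impossible GPA of 6.0) would then be recorded and reported, along with any remedy applied.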

About the Author

Rebecca M. Warner is Professor Emerita at the University of New Hampshire. She has taught statistics in the UNH Department of Psychology and elsewhere for 40 years. Her courses have included Introductory and Intermediate Statistics as well as seminars in Multivariate Statistics, Structural Equation Modeling, and Time-Series Analysis. She received a UNH Liberal Arts Excellence in Teaching Award, is a Fellow of both the Association for Psychological Science and the Society of Experimental Social Psychology, and is a member of the American Psychological Association, the International Association for Statistical Education, and the Society for Personality and Social Psychology. She has consulted on statistics and data management for the World Health Organization in Geneva, Project Orbis, and other organizations; and served as a visiting faculty member at Shandong Medical University in China. Her previous book, The Spectral Analysis of Time-Series Data, was published in 1998. She has published articles on statistics, health psychology, and social psychology in numerous journals, including the Journal of Personality and Social Psychology. She has served as a reviewer for many journals, including Psychological Bulletin, Psychological Methods, Personal Relationships, and Psychometrika. She received a BA from Carnegie Mellon University in social relations in 1973 and a PhD in social psychology from Harvard in 1978. She writes historical fiction and is a hospice volunteer along with her Pet Partner certified Italian greyhound Benny, who is also the world’s greatest writing buddy.

mean of 80.9.

for which the mean, median, and mode have

This example illustrates two things:

* When one very high score is added to this sample, the value of M increases (while the values of the median and mode do not change). This demonstrates that the mean is less robust against the impact of extreme scores than the median and mode.
* With one or more extremely high scores added, the value of the sample mean M is higher than the median; and in this example, M is actually higher than the majority of the individual scores in the sample. Under these circumstances the sample mean M is not a very good way to describe “average” or typical responses. Note that adding an extremely low score will make the mean smaller than the median.
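The effect of a single extreme score can be checked with a short sketch. The scores below are hypothetical and are not the data from the example in the text; they are chosen only so that the arithmetic is easy to follow.

```python
from statistics import mean, median, mode

# Hypothetical scores (not the book's data set).
scores = [70, 72, 75, 75, 78, 80]

# The same sample with one extremely high score added.
with_outlier = scores + [250]

before = (mean(scores), median(scores), mode(scores))
after = (mean(with_outlier), median(with_outlier), mode(with_outlier))

# How many of the individual scores fall below the new mean?
below_new_mean = sum(1 for s in with_outlier if s < after[0])

print("mean, median, mode before:", before)
print("mean, median, mode after: ", after)
print(f"{below_new_mean} of {len(with_outlier)} scores are below the new mean")
```

In this sketch the median and mode stay at 75 while the mean moves from 75 to 100, so the mean ends up above every score except the extreme one.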

4.8 Behavior of Mean, Median, and Mode in Common Real-World Situations

similar values. Suppose you have a survey question that asks people to rate their degree of agreement

with this statement: “I think that the U.S. economy is doing well.” Response options are scores of 1 = strongly disagree (SD), 2 = disagree (D), 3 = neutral (N), 4 = agree (A), and 5 = strongly agree (SA). We might obtain a frequency distribution

like the one in Figure 4.4. Note that the answer given by the largest number of people corresponds to 3 (neutral), the next highest frequency responses were 2 (disagree) and 4 (agree), and the most extreme responses, 1 (strongly disagree) and 5 (strongly agree), were uncommon. For now, we will call this pattern a “bell-shaped” distribution. (Later, we'll talk more formally about normal distributions.) Bell-shaped distributions tend to have values of the mean, median, and mode that are close to one another. In the graph in the lower part of Figure 4.4, the number above the bar for each score value (such as 0) corresponds to the frequency of that score in the table (in the upper part of Figure 4.4). For example, in this hypothetical data set, a score of 1

This section previews the use of graphs to

had a frequency of 6. A score of 3 had a frequency

represent score frequencies for quantitative

of 33 (i.e., 33 people chose the answer 3). The

variables (graphs are discussed more extensively

histogram or graph at the bottom of Figure 4.4

in Chapter 5). Figure 4.4 shows a frequency table

represents the same information about

for a set of hypothetical scores. A corresponding

frequencies using bars with heights that

histogram presents the same information

correspond to frequency. This distribution can be

graphically; the height of each bar in the

informally defined as bell shaped; there is a peak

histogram corresponds to the frequency of that

in the middle, and the pattern is symmetrical;

score (i.e., the number of people who had that

that is, the left-hand side of the distribution is

score value).

approximately a mirror image of the right-hand

side.

4.8.1 Example 1: Bell-Shaped Distribution

Figure 4.4 Hypothetical Likert Scale Ratings With Bell-Shaped Frequency Distribution: (a) Frequency Table and (b) Corresponding Histogram

First let’s consider a hypothetical batch of scores


in Figure 4.5 is an example of bimodal or polarized ratings (i.e., scores tend to be at one extreme or the other). Most people gave a rating of 1 (strongly disagree) or 5 (strongly agree). The highest mode is for a rating of 5. The frequency for a rating of 1 was almost as high. Very few people gave ratings between these extremes. Figure 4.5 shows the frequency table and histogram for this hypothetical outcome.

Figure 4.5 Hypothetical Likert Scale Ratings for Polarized or Bimodal Responses: (a) Frequency Table and (b) Histogram

(a) Likertpolarized

Value   Frequency   Percent   Valid Percent   Cumulative Percent
1       24          36.9      36.9            36.9
2       5           7.7       7.7             44.6
3       4           6.2       6.2             50.8
4       4           6.2       6.2             56.9
5       28          43.1      43.1            100.0
Total   65          100.0     100.0

In the table, the percentages 36.9 and 43.1 are circled.

(b) Histogram: the horizontal axis shows degree of agreement (1 to 5); the vertical axis shows frequency (0 to 30). The five bar heights are 24, 5, 4, 4, and 28.

Note: Mean = 3.11, median = 3, and modes are 5 and 1.

In this example, because the distribution in Figure 4.5 is bimodal, with one mode at the highest possible score and a second mode at the lowest possible score, neither the mean (M = 3.11) nor the median (Mdn = 3) describes typical or average response very well. In fact, very few people gave ratings close to 3. We get a better sense of “typical” responses if we report the two modes. People either love liberal policies or hate them. The point of this example is that in some frequency distributions, the mean and median may not be good ways to describe typical or average response.
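The values reported for Figure 4.5 can be reproduced from its frequency table with a minimal sketch. The frequencies are the ones shown in the figure; the variable names are assumptions made for this illustration.

```python
from statistics import mean, median

# Frequencies from Figure 4.5: rating -> number of respondents.
frequencies = {1: 24, 2: 5, 3: 4, 4: 4, 5: 28}

# Expand the table back into the 65 individual scores.
scores = [rating for rating, count in frequencies.items() for _ in range(count)]

sample_mean = mean(scores)
sample_median = median(scores)

# Proportion of respondents at the two extremes versus at the midpoint.
share_at_extremes = (frequencies[1] + frequencies[5]) / len(scores)
share_at_midpoint = frequencies[3] / len(scores)

print(f"N = {len(scores)}")
print(f"M = {sample_mean:.2f}, Mdn = {sample_median}")
print(f"Ratings of 1 or 5: {share_at_extremes:.0%}; ratings of 3: {share_at_midpoint:.0%}")
```

The mean and median both land near 3, a rating that only 4 of the 65 respondents chose, while 80% of respondents chose 1 or 5. This is why the two modes describe these responses better than either the mean or the median.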

4.8.3 Example 3: Skewed Distribution

Some variables represent counts of events or behaviors, for example, How many children do you have? How many speeding tickets have you received? Distributions for variables like these often have many responses of 0, 1, or 2 (with a smallest possible value of 0). However, the highest responses can be 8, 10, or more. For these types of variables, the shape of a distribution is often asymmetrical or skewed. A frequency table of hypothetical answers to the question “How many children do you want to have in the future?” appears in Figure 4.6.

Figure 4.6 Frequency Distribution for Hypothetical Scores on Number of Children Wanted: (a) Frequency Table and (b) Histogram

(a) children

Value   Frequency   Percent   Valid Percent   Cumulative Percent
0       21          31.8      31.8            31.8
1       17          25.8      25.8            57.6
2       9           13.6      13.6            71.2
3       8           12.1      12.1            83.3
4       4           6.1       6.1             89.4
5       3           4.5       4.5             93.9
7       1           1.5       1.5             95.5
8       1           1.5       1.5             97.0
11      1           1.5       1.5             98.5
16      1           1.5       1.5             100.0
Total   66          100.0     100.0

(b) Histogram: the horizontal axis shows number of children (0 to 15, in increments of 5); the vertical axis shows frequency (0 to 25, in increments of 5). The 10 bar heights are 21, 17, 9, 8, 4, 3, 1, 1, 1, and 1.

The distribution in Figure 4.6 is described as “positively skewed” because there is a longer (and thinner) tail at the positive end of the distribution. In this positively skewed distribution, there are a few extreme scores at the high end (e.g., the persons who said they wanted 11 and 16 children). In this example, the mean of 2 is not a good indication of typical responses (more than half of the people in this sample

sometimes cannot even gain access to it.

about things we think we know.

Much content on websites for news organizations

1.3.4 Communicator Credentials and Skills

is third-party content. This is content written by someone who may have examined only secondary sources or other thirdhand content, such as news reports or press releases. Often, third-party

Communicators are more believable when they

content is authored by someone who has no

have training and background related to

technical knowledge of the research field and

information in the message. Researchers

statistical methods. Examples include articles

generally have credentials that provide evidence

published by news organizations. These articles

of this training and background, including

usually don’t provide complete or accurate

advanced degrees such as a PhD or MD, affiliations

information about research results.

with respected organizations such as universities,

In the past, editors of prestigious newspapers

Some journalists have strong credentials in

required reporters to fact-check claims carefully.

science, but many do not. People who do not have

Increasingly, news reports on the web are

training in statistics can easily misunderstand

paraphrases of, or uncritical reposting of, third-

studies that use statistical terms such as logistic

party content from other news sources. Some

regression and odds ratios.

and publications in high-quality science journals.

mass media news sources specifically disclaim responsibility for accuracy. Here is an example;

Celebrity status is not a meaningful credential.

many other news organizations post similar

Famous media personalities, such as Dr. Oz (see note 3) and

disclaimers:

other self-appointed lifestyle or health experts, may base recommendations on incomplete or

CNN is a distributor (and not a publisher or

incorrect information.

creator) of content supplied by third parties

Scientific research reports include source

and users. ... Neither CNN nor any third-party

information (authors, university affiliations, and

provider of information guarantees the

so forth). News reports and websites sometimes

accuracy, completeness, or usefulness of any

do not include source information; they provide

content. ... (CNN, 2018)

no basis to evaluate self-interest, distance from information source, and credentials. Guidelines

Communicators can provide better quality

for evaluation of websites are provided by Kiely

information when they are closer to original sources

and Robertson (2016) and Montecino (1998).

of information, and they are likely to provide better quality information when they assume responsibility for accuracy. In everyday life, most of us rely on thirdhand

information most of the time. Because so much of what we think we know is based on thirdhand information, we should not be overly confident


1.3.5 Track Record for Truth-Telling

There are independent, nonpartisan organizations that evaluate communicator track records for truth-telling in journalism, for example, the Pulitzer Prize-winning site

science data. Negative skewness is possible (with a few extreme scores at the low end) but less common.

3. If a distribution is bell shaped or approximately normal, the values of the mean, median, and mode will be close together. The mean is a good way to describe central tendency for bell-shaped distributions; the median and mode will have similar values.

4. When in doubt, or if the situation is complicated, it may be better to report the entire frequency distribution (and/or histogram) along with values for the mean, median, and one or more modes.

Good practice:

* Do preliminary data screening by examining a frequency distribution table and graph to evaluate whether the mean, median, and/or mode(s) are better ways to describe central tendency.
* If implausible score values appear, go back and reexamine the data to correct errors.
* Note the number of missing values.
* State whether extreme scores or multiple modes were detected (or whether the distribution is approximately normal).
* State clearly what statistic is used (mean, median, or mode) to describe average responses.

Bad practice:

* Obtain a mean, median, or mode without examining a frequency table or graph.
* Select the index of central tendency value that “fits the narrative.” For example, if you want to report a high average, you can select whichever of these three statistics has the highest value, whether it makes sense or not. This is deceptive.
* Fail to make clear which index of central tendency is reported, and fail to note potential problems with it.

Chapter 1 mentioned “lying with statistics.” Reports of central tendency can be deceptive when they present only selected information that creates the impression the author wants to create. When an author wants readers to think, “Wow, that average is really high,” the author might choose to report the highest of the three values (mean, median, or mode). Conversely, if the author wants readers to think, “Wow, that average is really low,” the author might choose to report the lowest value among mean, median, and mode. An author who cherry-picks the highest “average” is presenting misleading (although perhaps not technically false) information.

4.10 Using SPSS to Obtain Descriptive Statistics for a Quantitative Variable

Previous sections discussed statistics for central tendency; the following sections discuss statistics to describe variability. In this section, SPSS is used to obtain all these descriptive statistics (to describe both central tendency and variability) from data in the file named temphr10.sav using the SPSS frequencies procedure.

To run Frequencies, make these menu selections (as in the example in Chapter 3): Analyze > Descriptive Statistics > Frequencies. This opens the main dialog box for the frequencies procedure; in this window, move the variable hr into the Variables window. Click the Statistics button in the top right-hand corner of the main dialog box for the frequencies procedure to open the Frequencies: Statistics dialog box (shown on


the right-hand side of Figure 4.7). There is a checkbox menu; click these checkboxes as shown to select central tendency statistics and statistics to describe variability (in the area headed “Dispersion”). The statistics that describe variability are explained in upcoming sections. Click Continue to exit from the Frequencies: Statistics box and return to the main Frequencies dialog box, and click OK in the main dialog box to run the analysis. Output appears in Figure 4.8.

Figure 4.7 SPSS Frequencies: Statistics Dialog Box to Obtain Descriptive Statistics for Quantitative


The image is a combination screenshot that shows how to select descriptive statistics as well as an SPSS statistics dialog box. In the first part of the image, a closeup of the taskbar of a spreadsheet shows different navigation buttons including Analyze, Graphs, Utilities, Extensions, Window, and Help. On clicking the Analyze button, a drop-down menu with the following options has


IBM SPSS amos.

The reports tab has been depressed leading to another menu with the following options: frequencies, descriptives, explore, crosstabs, turf analysis, ratio, p-p plots, and q-q plots. The second part of the image shows the Frequencies: Statistics dialog box. On the top left are the Percentile Values with the following check boxes: Quartiles, Cut points for equal groups (with an empty slot to fill in for the number of equal groups), and Percentile(s). The top right has a Central Tendency section with the following check options: mean, median, mode, and sum. The first three have been checked. The bottom left has a Dispersion segment with the following check options: std. deviation, minimum, variance, maximum, range, and S.E. mean. All except for S.E. mean have been checked. On the bottom right is a check box that states Values are group midpoints. Below this is a section titled Characterize Posterior Dist. with two check boxes: skewness and kurtosis. At the bottom are buttons for continue, cancel

and help. The values for mean, median, and mode in Figure 4.8 agree with the values obtained in earlier sections by hand, and they are close together. This

the same set of scores.

Figure 4.8 Output for Descriptive Statistics for Hypothetical Heart Rate Data in temphr10.sav

Statistics
hr
N   Valid             10
    Missing            0
Mean               73.10
Median             73.50
Mode                  75
Std. Deviation     5.666
Variance          32.100
Range                 20
Minimum               62
Maximum               82

The image is a table titled Statistics with the following information for hr: N Valid - 10; N Missing - 0; Mean - 73.1; Median - 73.5; Mode - 75; Std. Deviation - 5.666; Variance - 32.1; Range - 20; Minimum - 62;


opened; reports, descriptive statistics, Bayesian statistics, tables, compare means, general linear model, generalizedlinear models, mixed models, correlate, regression, loglinear, classify, dimension reduction, scale, non-parametric tests, forecasting, survival, multiple response, simulation, quality control, ROC curve, spatial and temporal modeling and

Maximum - 82
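The central tendency and range values in Figure 4.8 can also be checked outside SPSS. The following is a minimal sketch in Python, assuming the ten heart rate scores are the ones listed in Figure 4.9 for temphr10.sav; the variable names are illustrative and are not part of the SPSS output.

```python
import statistics

# Heart rate scores typed in by hand from Figure 4.9 (temphr10.sav).
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]

n = len(hr)                        # number of valid scores
mean = statistics.mean(hr)         # arithmetic mean (M)
median = statistics.median(hr)     # midpoint of the sorted scores
mode = statistics.mode(hr)         # most frequently occurring score
score_range = max(hr) - min(hr)    # maximum minus minimum

print(f"N = {n}")
print(f"Mean = {mean:.2f}, Median = {median:.2f}, Mode = {mode}")
print(f"Minimum = {min(hr)}, Maximum = {max(hr)}, Range = {score_range}")
```

If the scores are entered correctly, the printed values should agree with the corresponding rows of the SPSS Statistics table.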

example verifies the by-hand computations for

The next section describes variability or variation

mean and median done in previous sections for

in quantitative scores. You will see how


population of “all depressed persons,” ideally, we

possible that the association reported in some

would want a random sample drawn from that

studies did not arise because of any direct causal

population. However, participants are often

impact of diet soft drinks on weight. Perhaps

convenience samples, that is, people who were

when people drink diet soft drinks, they feel free

easy to recruit.

to indulge in other high-calorie foods, and

It is important to know what kinds of people were (and were not) included in a study. For example, if a drug study finds evidence that a new medication is effective and safe for healthy young men, that does not necessarily mean that the drug is also effective and safe for women, elders, children,

perhaps it is those other high-calorie foods, not the soft drinks in and of themselves, that cause weight gain. If that is the correct explanation, then what you need to do to avoid weight gain is to avoid consuming high-calorie foods (rather than reduce diet soda consumption).

and other kinds of people not included in the

Causal explanations are attractive because they tie

study.

events together in meaningful ways. This is useful

Be careful not to overgeneralize results, particularly when there is little information about the types and numbers of people (or cases)

in science as well as everyday life. Sometimes when a cause-effect relationship is known, it suggests what we can do to change outcomes.

included. It makes sense to generalize information

Demonstrating that two events are causally

from a small group to some larger population only

connected can be difficult, because there are often

when people in the group resemble the population of

rival possible explanations. Well-controlled

interest. This is discussed further in Chapter 2 in

experiments can rule out many rival explanations.

sections about samples and populations.

In science communication, authors are expected to discuss limitations that must be considered before drawing any conclusions. Limitations include the number and kinds of people (or cases) included in a study. Science writing should make limitations of evidence clear; media reporting often does not.

1.6 Making Causal Claims

In everyday life, and in science, we often want to know about causal connections. Consider a question raised by Wootson (2017). Do diet (artificially sweetened) soft drinks cause weight gain? If you are concerned about weight gain, and if artificially sweetened soft drinks cause weight gain, then you might consider avoiding diet soft drinks to avoid weight gain. However, it is

In everyday life, people sometimes jump to conclusions about causality on the basis of insufficient evidence.

1.6.1 The “Post Hoc, Ergo Propter Hoc” Fallacy

News commentators frequently offer causal explanations for events (e.g., the stock market


went down because of a blizzard the previous day). This explanation is often just an opinion of the news commentator. The stock market might have gone down for other reasons (including random variations). This is an example of a common fallacy: post hoc, ergo propter hoc. This Latin phrase means “after this, therefore, because of this.” The (incorrect) reasoning goes like this: If Event A happens, and then Event B happens, then A must have caused B. Before we

higher or lower than other people’s. Statistical analyses you will learn later in the course provide ways to evaluate how much of the individual differences in hr might be related to each variable, such as anxiety.

4.12.2 Step 2: Sum of Squared Deviations

Next, we need to summarize information about distances from the mean across all the people in the sample. You might think that you could summarize information by summing the deviations, the values of (X − M), across all people in the data set. However, recall from Section 4.6 that this sum of deviations from the mean is always zero. It might occur to you that this problem could be avoided by summing the absolute values of these deviations. However, there is another approach that yields more useful results.

Here we introduce another tool in the statistician’s bag of tricks. When deviations sum to zero, squaring the deviations makes all the terms in this sum positive.

To summarize information about individual score distances from the mean: First, we square each person’s deviation from the mean. (Squaring a negative value yields a positive value, so squaring deviations gets rid of the problem that positive and negative deviations would cancel each other out by summing to 0.) Then we sum those squared deviations. The resulting sum is called the sum of squares (or sum of squared deviations), abbreviated SS. In upcoming steps, SS will be used to compute sample variance and standard deviation.

We return to the question: How much do people’s scores in a sample vary or differ relative to the sample mean? In words, the answer to this question is: We find out how far each X score is from the mean by computing a deviation, we square each deviation, then we sum the squared deviations to summarize information about distance from the mean. This gives the formula for SS, the sum of squared deviations of scores from their mean:

(4.5)

SS = \sum [(X - M)^2].

A different version of the formula for SS is often given in introductory textbooks:

(4.6)

SS = \sum (X^2) - [(\sum X)^2 / N].

Equation 4.5 makes it easier to see what information about scores is included when you compute SS. Equation 4.6 is easier for by-hand computation of SS from scores. They yield the same results.

Notice that we square each individual deviation first; then we add those squared deviations. Appendix 4A describes rules about precedence in the order of arithmetic operations. Operations that are enclosed in parentheses are done before operations outside the parentheses. For example, if you see the expression \sum (X^2), you square the value of each X, and then sum the squared values. If you see the expression (\sum X)^2, you sum the values of X and then square that sum.

Sometimes textbook examples use numbers that give a whole-number result for SS; however, in real data, SS is usually not a whole number.

Appendix 4B reviews rounding. I suggest that you retain at least three decimal places during computations. Final results for most statistics are

often rounded to two decimal places. See Appendix 4B for a discussion of rounding.

In Figure 4.9 (data from temphr10.sav) the squared deviation from the mean for each individual person appears in the last column (the variable named deviationsq). Adding the scores for deviationsq gives the value of SS for this data set: SS = 288.90. For larger data sets, it is more convenient to have a computer program do this.

Figure 4.9 Deviations and Squared Deviations of Heart Rate Scores From Mean

hr      deviation    deviationsq
62       -11.10        123.21
69        -4.10         16.81
70        -3.10          9.61
71        -2.10          4.41
73         -.10           .01
74          .90           .81
75         1.90          3.61
75         1.90          3.61
80         6.90         47.61
82         8.90         79.21
Sum        0.00        288.90

The image is a table that shows heart rate values, deviations, and squared deviations.

Note that SS cannot be a negative number (because we are summing squared deviations, and squared numbers cannot be negative).

Other factors being equal, SS tends to be larger when:

1. The individual (X − M) deviations from the mean are larger in absolute value.
2. The number of squared deviations included in the sum increases.

The minimum possible value of SS (which is 0) occurs when all the X scores are equal and, therefore, equal to M. For example, in the set of scores [73, 73, 73, 73, 73], the SS term would equal 0. There is no limit, in practice, for the maximum value of SS.

To interpret SS as information about variability, we need to correct for the fact that SS tends to be larger when the number of squared deviations included in the sum is large. Dividing by N, the number of scores in the sample, seems like the obvious solution. However, this does not provide the best answer.
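The SS computation for the heart rate data can be sketched in a few lines of Python. This is an illustration rather than the SPSS procedure: it assumes the ten hr scores shown in Figure 4.9, and the variable names are hypothetical. It checks that the deviations sum to zero, computes SS from the definitional formula (Equation 4.5) and from the computational formula (Equation 4.6), and keeps the sum of squared scores separate from the square of the summed scores to make the order of operations explicit.

```python
# Heart rate scores typed in by hand from Figure 4.9 (temphr10.sav).
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]

n = len(hr)
m = sum(hr) / n                            # sample mean M

# Step 1: deviation of each score from the mean; these sum to zero.
deviations = [x - m for x in hr]
sum_dev = sum(deviations)

# Equation 4.5: square each deviation first, then sum the squares.
ss_definitional = sum(d ** 2 for d in deviations)

# Equation 4.6: needs both the sum of squared scores and the squared sum.
sum_of_squares = sum(x ** 2 for x in hr)   # square each X, then sum
square_of_sum = sum(hr) ** 2               # sum the X values, then square
ss_computational = sum_of_squares - square_of_sum / n

print(f"Sum of deviations = {sum_dev:.2f}")
print(f"SS (Equation 4.5) = {ss_definitional:.2f}")
print(f"SS (Equation 4.6) = {ss_computational:.2f}")
```

Both formulas should give the same SS apart from tiny floating-point differences, which is why the results are printed rounded to two decimal places.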

4.12.3 Step 3: Degrees of Freedom

It might seem logical to divide SS by N to correct for the increase in size of SS as N increases. However, this yields values that are slightly too small; Gosset (discussed in Tankard, 1984) worked out the reason for the problem and discovered a simple solution. When we look at the pieces of information used to compute SS (i.e., the deviation of each score from the sample mean), it is possible to see that we do not have N independent deviations (or pieces of information) available to compute the SS; in fact, we have only (N − 1) pieces of information.

To explain why deviations from the mean in a sample of N scores provide only (N − 1) independent pieces of information about distance from the mean, recall that the sum of all deviations of scores from the mean must equal 0. Suppose we have N = 3 scores in a sample (call these scores X1, X2, and X3) and that their mean is M.

First, we convert each X score into a deviation by subtracting the sample mean M. We know that the sum of these deviations must equal zero. That yields this simple equation:

(X_1 - M) + (X_2 - M) + (X_3 - M) = 0.

We can rearrange this equation by subtracting (X3 − M) from both sides; the equation becomes:

(X_1 - M) + (X_2 - M) = -(X_3 - M).

When we compute (X1 − M) + (X2 − M) (on the left side of the equation), this determines the value that the remaining deviation, (X3 − M), must have: it is the negative of that sum. Only the first two deviations are “free to vary,” that is, free to take on any possible value. Once we know the value of any two of the deviations, the value of the last deviation is determined (it must be whatever number is needed to make the sum of all deviations equal 0). This is only a demonstration, not a formal proof.

This modified divisor, N − 1, is called the degrees of freedom (df). The df term tells us how many of the deviations are “free to vary.” The use of df instead of N as a divisor is another frequently used tool in the statistician’s bag of tricks. Later analyses also use df terms, although df often has different values than (N − 1) in other situations. Degrees of freedom for the SS and sample variance are obtained using Equation 4.7:

(4.7)

df = (N - 1).

4.12.4 Putting the Pieces Together: Computing a Sample Variance

The variance for a sample is usually denoted s². A sample variance is obtained by dividing SS by its degrees of freedom:

(4.8)

s^2 = SS/(N - 1) or SS/df.

(Some textbooks use S² to denote a sample variance calculated as SS/N. In actual practice, this notation is almost never used when statistics are applied to real-world data, and you will not see S² again in this book.)
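As a rough check on Equations 4.7 and 4.8, the sketch below computes the sample variance for the heart rate scores in Figure 4.9 by dividing SS by df, and compares it with SS/N. The data list and variable names are assumptions made for this illustration; `statistics.variance` and `statistics.pvariance` are Python standard-library functions that divide by N − 1 and by N, respectively. The square root of the variance is included because Figure 4.8 also reports a standard deviation.

```python
import math
import statistics

# Heart rate scores typed in by hand from Figure 4.9 (temphr10.sav).
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]

n = len(hr)
m = sum(hr) / n
ss = sum((x - m) ** 2 for x in hr)   # sum of squared deviations (Equation 4.5)

df = n - 1                           # degrees of freedom (Equation 4.7)
s2 = ss / df                         # sample variance (Equation 4.8)
ss_over_n = ss / n                   # dividing by N gives a smaller value
sd = math.sqrt(s2)                   # back to the original units of measurement

print(f"df = {df}, SS = {ss:.2f}")
print(f"SS/df = {s2:.3f}   SS/N = {ss_over_n:.3f}")
print(f"SD = {sd:.3f}")

# Cross-check against the standard library.
print(f"statistics.variance  = {statistics.variance(hr):.3f}")
print(f"statistics.pvariance = {statistics.pvariance(hr):.3f}")
```

The SS/df result should match the Variance row in Figure 4.8, and the square root should match the Std. Deviation row to three decimal places.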

Return to the data in Figure 4.9. The first column shows heart rate scores for each person. The second column shows the deviation of each person’s score from the mean (the variable name is deviation). The third column shows each

available when we compute SS or another

review weeds out much poorly conducted

components such as preregistration of research

research and improves the quality of published

plans and sharing details of data and methods. For

papers. The community of scientists in effect

further discussion, see Cumming and Calin-

systematically polices the work of all individual

Jageman (2016).

scientists.

1.8 Biases of Information

1.7.2 Replication and Accumulation of Evidence

A second important mechanism for data quality control in academic research is replication. Replication means repeating or redoing a study. This can be an exact replication (keeping all methods the same) or a replication that changes elements of the study, such as location, measures, or type of participants, to evaluate whether the same results occur in different situations. We should not treat findings from any one study as a conclusive answer to a research question. Any single study may have unique problems or flaws. In an ideal world, before we accept a research claim, we should have a substantial body of good-quality and consistent

Consumers

1.8.1 Confirmation Bias (Again)

Information consumers or receivers also tend to select evidence consistent with their preexisting

beliefs. Media consumers need to be aware that they can systematically miss kinds of information (which may be of high or low quality) when they select news sources they like. Ratings of many

web news sources on a continuum from left/liberal to right/conservative, along with assessment of accuracy, are provided at

https://mediabiasfactcheck.com/politifact/. News sources that are extremely far left or far right tend to be less accurate.

evidence to back up that claim; this can be

Because of confirmation bias, people can get

obtained from replications.

stuck: They continue to believe “facts” that aren’t

Peer review and replication in science are fallible. However, they provide the best ongoing quality control checks we have. In contrast to science, there are few quality control mechanisms for most mass media communication.

1.7.3 Open Science and Study Preregistration

There are recent initiatives to improve the reproducibility and quality of research results in biomedicine, psychology, and other fields (Begley & Ioannidis, 2015; Open Science Collaboration, 2015). The Open Science model includes


true, and ideas that are wrong, because they never expose themselves to information that might prompt them to consider different possibilities. Consumers of mass media usually avoid evidence that challenges their beliefs. Philosopher of science Karl Popper argued that scientists also need to examine evidence that might falsify their beliefs. Scientists and people in general should consider evidence that challenges their beliefs.

1.8.2 Social Influence and Consensus

Should we believe something simply because many people, particularly those whom we know

when you compute the following. The values of M and SD can be combined to set up ranges of score values; that is, we can combine information about the mean and information about typical distances from the mean. This can be done using integer multiples of SD, such as M ± 1 × SD and M ± 2 × SD.

For M = 64.5 and SD = 2.5, we obtain the following:

M − 2 × SD = 64.5 − 5 = 59.5.
M − 1 × SD = 64.5 − 2.5 = 62.
M ± 0 × SD = 64.5 ± 0 = 64.5.
M + 1 × SD = 64.5 + 2.5 = 67.
M + 2 × SD = 64.5 + 5 = 69.5.

The shorter vertical arrow next to the frequency table in Figure 4.10 extends from M − (1 × SD) to M + (1 × SD). This corresponds to the frequencies enclosed in the smaller ellipse. The longer vertical arrow ranges from M − (2 × SD) to M + (2 × SD), score values from 59.5 to 69.5. This corresponds to scores in the larger ellipse. Most women in the sample had heights that were included in the range M − (2 × SD) to M + (2 × SD); only three women (2.5%) had scores below 59.5, and only two women (1.7%) had scores above 69.5.
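The arithmetic behind these ranges is simple enough to sketch in Python. The values M = 64.5, SD = 2.5, N = 120, and the counts of three scores below and two scores above the 2 × SD limits come from the text; the variable names are illustrative only.

```python
# Summary values for the hypothetical female height data (from the text).
m = 64.5    # sample mean, in inches
sd = 2.5    # sample standard deviation, in inches
n = 120     # number of women in the sample

# Score values at integer multiples of SD below and above the mean.
bounds = {k: m + k * sd for k in (-2, -1, 0, 1, 2)}
for k, value in bounds.items():
    print(f"M {k:+d} x SD = {value}")

# Counts reported in the text for scores outside M - 2 x SD to M + 2 x SD.
below = 3   # scores below 59.5
above = 2   # scores above 69.5
within = n - below - above
pct_within = 100 * within / n
print(f"{within} of {n} scores ({pct_within:.1f}%) lie within 2 x SD of M")
```

Under these assumptions, all but five of the 120 heights fall between the two outer limits, which is what “most of the scores” means in this example.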

In words: When we combine information about distance from the mean (SD) with the location of the mean (M), we obtain information about the range of values within which most of the X scores lie; this is called the range rule. The range rule works only for bell-shaped distributions, as in the present example.

Figure 4.10 Hypothetical Data for Female Height in Inches for N = 120 Women With M = 64.5 and SD = 2.5


The image is a combination of a table and a graph that shows hypothetical data for female height. The table has four columns: valid count, frequency, percent, and cumulative percent. Details are below:

values for the hypothetical female height data:

58; 1; .8; .8
59; 2; 1.7; 2.5
60; 3; 2.5; 5.0
61; 8; 6.7; 11.7
62; 14; 11.7; 23.3
63; 12; 10; 33.3
64; 20; 16.7; 50
65; 16; 13.3; 63.3
66; 18; 15; 78.3
67; 15; 12.5; 90.8
68; 5; 4.2; 95
69; 4; 3.3; 98.3
70; 2; 1.7; 100
Total; 120; 100

There are 2 circles over the figures; one covers the percent values 11.7, 10, 16.7, 13.3, 15, and 12.5, and the second covers a larger set of percent values including 2.5, 6.7, 11.7, 10, 16.7, 13.3, 15, 12.5, 4.2, and 3.3.

The graph in the second part of the image shows the X and Y axes as well as the 1 × SD and 2 × SD lines. The following figures are mentioned alongside the graph:

* M minus 2 × SD equals 59.5
* M minus 1 × SD equals 62
* M equals 64.5
* M plus 1 × SD equals 67
* M plus 2 × SD equals 69.5

Here are some approximate (not exact) relationships of SD with data values that can help you understand what SD = 2.5 tells us.

In the preceding example, the range for height scores (70 − 58) was 12. The range rule suggests that, for a bell-shaped distribution, the range is often a little more than 4 × SD. For these data, 4 × SD = 4 × 2.5 = 10. Turning this statement around, the range rule suggests that SD is often a little less than one quarter of the range. Knowing that SD is related to range may help you understand SD. Remember that the range rule works only for bell-shaped distributions.

* The value of SD tells us about typical distances of scores from the sample mean.
* Few scores are lower than 2 × SD units below M or higher than 2 × SD units above M. In other words, 2 × SD is a large distance from the mean; only a small percentage of scores are that far away from M.
* If a research report tells you that the distribution of scores is close to normal with known values for M and SD, this is sufficient information for you to guess the range.
* Using SD = 2.5, individual deviations of height from the mean of 2.5 inches or less (either positive or negative deviations) were very common.
* Almost all people had deviations from the mean that were less than 2 × SD in absolute value; 2 × SD = 2 × 2.5 = 5 inches. To say this another way, most women had heights between 59.5 and 69.5 inches.

Good practice: To choose the most appropriate statistics to describe central tendency and variability, the data analyst should examine a frequency distribution table or graph. If the distribution is approximately normal, M and SD are good ways to describe these. If the distribution is clearly non-normal, Mdn and interquartile range may be preferred. For distributions that are not bell shaped, see the next chapter for better ways to describe variation among scores.

4.15 Why is There Variance?

Why do scores differ across people? This is the most fundamental question in applied statistics. For data about humans, the question becomes: What makes people different? Why do some people have higher, and some people lower, heart rates? Why are some people taller and others shorter? Some characteristics do not differ across people (they are constant). Most people have five fingers on each hand. The rare exceptions are people who have genes for a different number of fingers, or people who have lost fingers because of injury. However, characteristics such as heart rate do differ across persons and situations. Suppose you measure hr for all members of a group. Some persons will have low hr; their hr may be lower than average because they are physically fit and do not smoke. Others have high hr; these elevated hr scores might be due to anxiety or caffeine consumption. A first goal of statistical analysis is to quantify or describe how much people differ. Range, variance, and standard deviation provide this information. We will consider a more interesting question in upcoming chapters: Can we explain or predict these differences in heart rate? Can we


understand why people differ? You probably already have some intuitions about factors that are related to hr, for example, smoking and

deletion of cases or replacement of missing

This book discusses good practices in applied

values) that could alter conclusions.

statistics that can potentially improve the clarity

2. Make the limitations of the type of statistical

and honesty of research reports. When

analysis clear. (As each new analysis is

communicators present information in

introduced, you will learn about its

misleading, unclear, or dishonest ways, they risk

limitations.)

loss of credibility, trust, and respect, not just for

3. Avoid behaviors that can lead to errors

themselves but for the professions of statistics

(including, but not limited to, cherry-picking

and science. When information consumers rely

a few results).

on incorrect information, they may make poor

4. Avoid misleading presentations (such as

decisions.

“lying graphs”; see Section 1.10).
5. Avoid language that obscures results.
6. Do not overgeneralize. Do not make strong claims about characteristics of a population when your sample does not resemble that population.

1.10 Lying with Graphs and Statistics

The most extreme form of lying with statistics is fabrication or falsification of data; this is rare.

Real-world problems in applications of data

However, some common research practices slant

analysis are often not clear in introductory

information presentation in ways that can be

courses; students learn to do one analysis at a time

called “lying with statistics.” The classic book How

using one small set of numbers. In actual practice,

to Lie With Statistics (Huff, 1954) presented

data analysts often work with large sets of messy

numerous examples.

data. Data analysts need to make many choices that involve difficult judgment calls. This book points out differences between the ideal use of statistics in artificially simplified situations and the actual application of statistics to real-world data. Sometimes decisions about “best practice”

Deceptive bar graphs are among the most common ways information communicators mislead information consumers. If you will be an information producer, you need to know how to set up “honest” bar graphs. When you are an

are difficult.

information consumer, you need to know how to

As Harris (2001) said, “Statistics is a form of social

misleading. Chapter 5 provides examples of clear

control over the professional behavior of

versus misleading graphs and guidelines for

researchers. The ultimate justification for any

evaluation of graphs.

examine graphs to make sure that they are not

statistical procedure lies in the kinds of research behavior it encourages or discourages.” Science has rules and standards about good practice in

1.11 Degrees of Belief

collection, analysis, and presentation of evidence.

People rarely have time to collect all necessary

These are discussed throughout this book.

information. Even for questions in science, we

Researchers should be aware that press releases from universities sometimes overhype research findings (Resnick, 2019).


often do not have enough information to be confident about conclusions. Uncertainty is more common than people realize, even in areas such as

who are not familiar with the variables will find

deviation of scores for sex).

this helpful to evaluate the obtained scores.

When a study includes many groups and/or many

Here are other things a summary table might

variables, all groups and all variables should be

include: the minimum and maximum scores

identified and reported in descriptive tables. This

obtained in the sample, numbers of missing

lets readers know if you have selectively excluded

values for each variable, and information about

some groups or variables from the analyses you

reliability for each variable. If you do research in a

report later.

specific area, look at tables in published research reports to see if additional information is usually included in summary tables for descriptive statistics.

Research reports often describe scores on quantitative data using the sample mean M, the

Table 4.1

Well-being variables 5 РА Health behavior variables Sleepquality Diet variables

4.18 Summary

23.46 SE

640 125

Possible Mi

Possible Max

5 10

5 so

330

standard deviation SD (or s), and the variance s2.

Readers tend to assume that scores for quantitative variables have an approximately bell-shaped distribution (if they are not informed otherwise), and they interpret the descriptive statistics accordingly. The “bag of tricks” used to compute many

сн

a

o

Note. LS is life satisfaction from theSatisfaction With Life

Scale; higherscores indicate greatersatisfaction. PA is positive affect using the PANASscale;higher scores indicate morepositive mood. Sleepquality wasrated on a 1-to-5 scale; 5 indicates thebest sleep quality. Sugar isan estimateof dailycalorie intake from sugar-containing

beverages.

8

statistics is actually quite small, and you have seen several of these tricks in this chapter: « When a sum of deviations would be zero, square terms before summing them. * When correcting for the number of deviations (or pieces of information)

2 NCIfv is the number of servings of fruit and vegetables in a typical day on the basis of a National Cancer Institute food frequency questionnaire, with responses recoded on a

included in a sum, divide by @finstead of by N.

e To put information back into the original

terms of measurement, take the square root.

scale from 0 to 8. The modal response was O.

These “tricks” are used again in many future

Because of this, and because thisvariable is

analyses.

the most important predictor variable in this study, the entire frequencydistribution should be presented in a separate table (not shown in this chapter). For a categorical variable such as sex, report proportions of male and female respondents as descriptive information (not the mean and standard

18% Page 92 of 624 + Location 2723 of 15772

You have seen that the sample mean is not always the best description of central tendency. In some frequency distributions, M is much larger (or smaller) than the median, and the magnitude of the mean is influenced strongly by a few extreme scores. When frequencies have more than one mode, or are skewed, M is sometimes not the best description of the “typical” response. When you report a mean, you need to tell readers something about the shape of the frequency distribution to provide the background information needed to understand potential problems with the mean.

Statistics books provide so many examples of bell-shaped distributions that students may assume that all data have this distribution shape. However, many common kinds of variables do not have bell-shaped distributions. Graphs, discussed in Chapter 5, can be used to evaluate whether scores have a bell-shaped distribution or some other distribution shape. We should not assume that all distribution shapes are bell shaped. When reporting information about variables, remember that readers may assume a bell-shaped distribution if you do not explain clearly that the distribution shape is different.

If you read mass media reports about “averages,” you need to know whether average was estimated using the mode, median, or mean; under some circumstances, these three descriptive statistics can yield very different values.

The next chapter provides further information about obtaining and interpreting graphs of frequency distributions and additional questions we can ask about distributions of scores on a quantitative variable.

Appendix 4A: Order of Arithmetic Operations

Many equations combine two or more arithmetic operations, for example, ΣX² includes both squaring and summing X scores. When operations are combined, the result often differs depending upon the order in which operations are done. Consider this set of scores: X = [1, 3, 5, 2]. If you square each X value and then sum the squared values, you would obtain (1 + 9 + 25 + 4) = 39. If you sum the X's and then square that sum, you would obtain (1 + 3 + 5 + 2)² = 11² = 121. It is important to know which arithmetic operation to do first.

There are rules of precedence (order) for arithmetic operations (see http://mathworld.wolfram.com/Precedence.html). When I present equations I explain in words the order in which computations should be done, and often, I use extra parentheses to make this clear in the equation. When an expression appears within parentheses, such as (X - 5), do that operation first. If you see Σ(X²), square each X value first, and then sum the squared X values: (1 + 9 + 25 + 4) = 39. If you see (ΣX)², sum the X values first, and then square the sum: (1 + 3 + 5 + 2)² = 11² = 121.

Be aware that if you do arithmetic operations in the wrong order, you can obtain answers that are incorrect by huge amounts.

Appendix 4B: Rounding

Computer programs often provide numbers given to several decimal places. Each number that comes after a decimal point represents one decimal place. For example, the number 4.171 has three decimal places.

If you do by-hand computations, you should retain at least three decimal places during your computations to minimize rounding error. Final results are usually rounded to a small number of decimal places, often two decimal places. The preferred number of decimal places to report differs across disciplines and may differ across variables. Use common sense. It would be silly to say that the average American gets 7.481 hours of sleep per night; it would make more sense to report this as 7.5 hours. If you are in doubt, report more decimal points than you think reviewers or editors or readers will want; these can always be rounded later. Use past research in your area of interest as a guide for the number of decimal places to report.

Here are simple rules for rounding. If a final digit is greater than 5, the digit before it is increased by one unit when you round (this is “rounding up”). For example, 3.86 would be rounded to 3.9. If the final digit is less than 5, the digit before it is left the same when rounding (this is called “rounding down”); for example, 3.83 would be rounded to 3.8. If the final digit is exactly 5, you can toss a coin to decide whether to round up or down. In many journal articles, and for many statistics such as M, results are presented to one or two decimal places. One exception is that p values, introduced in later chapters, are often reported to three decimal places.

Comprehension Questions

1. Fill in each blank using either mean or median.
   1. The ______ is the value for which 50% of people in the sample have scores above, and 50% in the sample have scores below.
   2. The ______ is the value for which the sum of deviations equals 0.
   3. If extremely high scores are present, the ______ may be so high that most people in the sample have scores that fall below it.
   4. If you change one or two of the highest scores in the sample to higher values, the value of the ______ will not change, but the value of the ______ will get higher.
   5. True or false: The mode, median, and mean can be equal, but they do not have to be equal.
2. Consider the following small set of scores. Each number represents the number of siblings reported by each of the N = 6 persons in the sample: X scores are [0, 1, 1, 1, 2, 7].
   1. What is the median for these scores?
   2. What is the mode for these scores?
   3. Compute the mean (M) for this set of six scores.
   4. Do you think the median or the mean is a better way to describe the “typical” number of siblings? Why?
   5. Why is the mean higher than the median for this batch of data?
   6. Compute the six deviations from the mean (X - M), and list these six deviations.
   7. Sum these six deviations. What is the sum of the six deviations? Is this outcome a surprise?
   8. Now calculate the sum of squared deviations (SS) for this set of six scores.
   9. Compute the sample variance, s², for this set of six scores.
   10. When you compute s², why should you divide SS by (N - 1) rather than by N?
   11. Finally, compute the sample standard deviation (denoted by either s or SD).
   12. Write a sentence in which you summarize the information about this variable (including N, M, SD, Min, and Max).
3. In your own words, what does SS tell us about a set of data? Under what circumstances will the value of SS equal 0? Can SS ever be negative? Why or why not?
4. What would the value of SS be for this set of scores: [103, 103, 103, 103, 103, 103]? (You should not need to do any computations.)
5. Think about the SS values you might obtain if you computed SS for these two samples:
   • Sample A: X = [103, 156, 200, 300, 98]
   • Sample B: X = [101, 102, 103, 102, 101]
   1. Which sample will have a larger SS value? (You should not need to calculate SS to answer this.)
   2. Will a sample that has a larger SS value also have larger values for s² and s (assuming N is the same)? Explain your answer briefly.
6. Consider a quantitative variable (such as body temperature given either in degrees Fahrenheit or Celsius). List all the descriptive statistics information you could present to describe results about central tendency and variability for temperature.
7. Suppose that IQ scores are normally distributed with M = 100 and SD = 15. Use the range rule to approximate the sample range on the basis of values of M and SD.
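The order-of-operations example in Appendix 4A can also be checked with a few lines of code. The sketch below is not part of the SPSS workflow used in this book; it assumes Python, and the variable names are illustrative. It uses the same scores, X = [1, 3, 5, 2], to show that Σ(X²) and (ΣX)² are different quantities.

```python
# Scores from Appendix 4A.
X = [1, 3, 5, 2]

# Sigma(X^2): square each score first, then sum the squares.
sum_of_squares = sum(x ** 2 for x in X)

# (Sigma X)^2: sum the scores first, then square the sum.
square_of_sum = sum(X) ** 2

print(sum_of_squares)  # 39
print(square_of_sum)   # 121
```

The parentheses in the code play the same role as the extra parentheses in the equations: they make the order of the two operations explicit.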

Digital Resources

Find free study tools to support your learning, including eFlashcards, data sets, and web resources, on the accompanying website at

Graphs: Bar Charts, Histograms, and Boxplots

5.1 Introduction

Information about scores that was presented in the form of frequency tables in Chapters 3 and 4 can be presented in simple graphs. This chapter describes some widely used types of graphs: pie charts and bar charts for categorical variables, and histograms and boxplots for quantitative variables. Each approach (frequency table vs. graph) has advantages and potential disadvantages:

1. An advantage of frequency tables is that they provide exact information about the numbers or percentages of persons who had each score value. The corresponding disadvantage of graphs is that when they are poorly labeled, it is difficult to identify exact numbers and percentages.
2. A disadvantage of graphs is that they can be constructed in ways that create deceptive impressions. Frequency tables generally are not deceptive.
3. An advantage of graphs is that they provide appealing visual information that grabs readers’ attention; this is particularly useful in mass media reports, PowerPoint or Prezi presentations, and poster presentations at professional conferences. A disadvantage of frequency tables is that they do not have much visual appeal.
4. An advantage of graphs for quantitative variables (such as histograms) is that they provide easily understandable information about distribution shape. This chapter describes several distribution shapes commonly seen in real data (examples appear in Tables 5.1 and 5.2). The bell-shaped curve (more formally, the normal distribution or Gaussian distribution) is of particular interest. The normal distribution will be discussed further in Chapter 6. A disadvantage of frequency tables is that it can be difficult (although it is possible) to evaluate distribution shape by inspection of a frequency table.

Ideally, preliminary data screening includes frequency tables (Chapter 3), descriptive statistics (Chapter 4), and graphs (the present chapter). Frequency tables are rarely included in published research reports. Graphs of frequency distributions are not often reported in journal articles, although they can be. Information in frequency tables can be used to label graphs accurately.

SPSS does not produce publication-quality graphics. For beginners, this is not a major problem; the graphs are adequate for preliminary data screening. Advanced users may prefer other programs to generate graphics. The R supplement for this book (Rasco, 2020) demonstrates use of the ggplot procedure; this produces better quality graphics. I modified most SPSS graphics in this book by editing to increase font sizes and add information.

In real-world data analysis, descriptive statistics, frequency tables, and graphs should be examined before a data analyst conducts the main analysis that is of primary interest (such as a t test or analysis of variance [ANOVA]). These provide information needed for preliminary data screening. Published research reports typically include only a few sentences about preliminary screening (if they mention it at all). Hoekstra, Kiers, and Johnson (2012) noted that many authors don’t report much about data screening; they argue that the validity of statistical results is often questionable because assumptions required for statistical analysis are not satisfied (and often not even checked). Potential violations of some of the assumptions that are introduced later can be assessed by examining graphs.

5.2 Pie Charts for Categorical Variables

Pie charts are almost universally despised by scientists, and you are unlikely to see them in academic journals; however, they are popular in mass media, so you should be familiar with them. Consider the frequency table for hypothetical scores for the categorical variable marital status (Figure 5.1).

Recall that the “Cumulative Percent” column automatically provided by SPSS makes no sense for categorical variables. Focus on the “Frequency” and “Percent” columns. To request a pie chart, use the familiar Frequencies procedure, beginning with these SPSS menu selections: Analyze → Descriptive Statistics → Frequencies. Click the Charts button to open the Frequencies: Charts dialog box in Figure 5.2; within that window, select the radio button for “Pie charts,” then click Continue and OK. Edited pie chart output appears in Figure 5.3.

Figure 5.1 Frequency Table for Hypothetical Marital Status Scores

maritalstatus     Frequency   Percent   Valid Percent   Cumulative Percent
never married         20        47.6        47.6              47.6
engaged                4         9.5         9.5              57.1
married               11        26.2        26.2              83.3
divorced               4         9.5         9.5              92.9
widowed                3         7.1         7.1             100.0
Total                 42       100.0       100.0

Figure 5.2 Use of Frequencies: Charts Dialog Box to Request a Pie Chart

There are two boxes, and the one on the right has a variable titled maritalstatus. Below is a selected check box named display frequency tables. At the bottom are options buttons for the following: OK, Paste, Reset, Cancel and Help. On the right are the buttons Statistics, Charts, Format and Help. The Charts option has been depressed. The Frequencies: Charts dialog box has four chart type options: none, bar charts, pie charts and histograms. The pie charts option has been checked. The chart values tab has two choices, frequencies and percentages. Frequencies has been selected. At the bottom are the option buttons Continue, Cancel and Help.

Figure 5.3 Pie Chart for Hypothetical Marital Status Data, N = 42

There are five slices: never married, married, divorced, engaged and widowed. The largest is the never married slice, followed by married, engaged, divorced and widowed.

The frequency table in Figure 5.1 tells us that the group with the largest number of members is “never married”; this corresponds to the solid “slice” in the pie chart. The frequency table has a great advantage over the pie chart; it provides exact frequencies and percentages, while the pie chart only approximates group sizes (unless the slices are labeled using numbers or percentages).

Kopf (2015) reviewed reasons why many data analysts hate pie charts. For example, people are not good at estimating percentages from the areas of the slices. Pie charts require the use of colors (or textures such as dots or stripes) to differentiate slices; most science journals do not publish figures in color. Tufte (2001), who authored several books about excellence in graphing, regards most multicolored figures as unsightly; he argues that graphs should use as little ink as possible to provide complete information.

Pie charts have only two virtues. They provide colorful slides in presentations, and this is something that some data analysts (in marketing, for example) may like. Also, they lend themselves well to humor. (Search online for “funny pie charts” to find examples, or create your own comic version. Perhaps you can persuade your instructor to give a prize or extra credit for the most comical or ingenious examples.) If you become a science researcher, you will probably never use pie charts.

5.3 Bar Charts for Frequencies of Categorical Variables

The SPSS Frequencies procedure, which was used in previous chapters to obtain frequency tables, can also provide charts (or graphs). To open the Frequencies dialog box that appears in Figure 5.4, make these menu selections: Analyze → Descriptive Statistics → Frequencies. Click the Charts button on the right-hand side of the Frequencies dialog box to open the Frequencies: Charts box; a bar chart is obtained by selecting the radio button for “Bar charts” in the Frequencies: Charts dialog box (also shown in Figure 5.4). The Y axis may be given in frequencies (number of cases) or percentages. Click Continue to return to the main Frequencies dialog box. Click OK to run the procedure.

The hypothetical marital status scores in Figure 5.1 were used to set up the bar graph in Figure 5.5. The height of each bar represents group size. I edited the bar graph produced by SPSS (using the SPSS Chart Editor and Microsoft Paint) in the following ways: I increased font sizes for the X and Y axis labels and added the exact number of cases per group (from the frequency table) above each bar.

Figure 5.4 Frequencies Dialog Box and Frequencies: Charts Dialog Box

There are two boxes, and the one on the left has a variable titled marital. Below is a check box named display frequency tables. At the bottom are options buttons for the following: OK, Paste, Reset, Cancel and Help. On the right are the buttons Statistics, Charts, Format and Help. The Charts option has been depressed. The Frequencies: Charts dialog box has four chart type options: none, bar charts, pie charts and histograms. The bar charts option has been checked. The chart values tab has two choices, frequencies and percentages. Frequencies has been selected. At the bottom are the option buttons Continue, Cancel and Help.

Figure 5.5 Bar Chart for Hypothetical Marital Status Groups, Total N = 42

Group frequencies shown above the bars: never married: 20; engaged: 4; married: 11; divorced: 4; widowed: 3.
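The percentages reported for the marital status groups follow directly from the group frequencies. As a rough check outside SPSS, the sketch below (Python is an assumption here, and the variable names are illustrative) converts the five frequencies into percentages of the total N. It deliberately leaves out cumulative percentages, because these are not meaningful for a categorical variable.

```python
# Group frequencies for the hypothetical marital status data (total N = 42).
counts = {
    "never married": 20,
    "engaged": 4,
    "married": 11,
    "divorced": 4,
    "widowed": 3,
}

total_n = sum(counts.values())

# Percent of cases in each group, rounded to one decimal place.
percents = {group: round(100 * freq / total_n, 1) for group, freq in counts.items()}

for group, freq in counts.items():
    print(f"{group:<15}{freq:>4}{percents[group]:>8.1f}")
print(f"{'Total':<15}{total_n:>4}")
```

Whether a bar chart is labeled with these percentages or with the raw frequencies, the relative heights of the bars are the same.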

5.4 Good Practice for Construction of Bar Charts

Bar charts and other graphs should provide accurate information that is easy to understand. It is easier for readers to understand graphs when they follow simple rules and conventional standards.

1. A separate bar represents the frequency (or proportion or percentage of cases) for each group. The height of the bar corresponds to the number or frequency in each group (or the proportion or percentage of cases in each group). The labels on the Y axis should make clear whether frequency, proportion, or percentage is reported. However, the relative heights of the bars are the same no matter which label is used. (Usually bars are vertical, but it is possible to set up bar charts in which bars are horizontal.)
2. Names of groups are specified by labels on the X axis.
3. Bars should have equal widths. (This rule is not always followed.)
4. The height of the graph (Y axis) is usually less than the width of the X axis (the height of Y is often about 75% the length of X).
5. The Y axis begins at 0 (or at another minimum value of Y).
6. The top of each bar is labeled with an exact numerical value (a frequency or a percentage). SPSS does not do this for you; I added this information using SPSS Chart Editor.
7. Information about total N must be provided.
8. In a footnote or the body of the text, the source of data should be stated. Readers tend to assume that numbers are based on new data collected by the researcher; if there is another source (such as Gallup polls or the U.S. census), that source must be identified.
9. Bars in bar graphs for categorical variables usually do not touch one another. (This reminds readers that bars represent distinct groups.)

When you generate bar charts for frequencies in SPSS, many of these good form requirements are taken care of by default (e.g., bars are equal widths, and the Y axis begins at 0).

5.5 Deceptive Bar Graphs

The most common way to make a bar chart for group frequencies “lie” is to set up the Y axis so that it does not start at 0. To illustrate this deception, I modified the graph in Figure 5.5 so that the Y axis begins at 2 (instead of 0). The modified bar chart in Figure 5.6 is potentially misleading because people tend to look at the ratio of bar heights (or bar areas) when they compare group sizes; people often do not pay close attention to the specific values indicated on the Y axis. In Figure 5.6, the differences in group sizes appear larger than in Figure 5.5. In Figure 5.6, the never married group appears to have about 10 times as many members as the widowed group (measure the height of the bar for never married and divide this by the height of the bar for the widowed group). When actual group sizes are considered (20 never married, 3 widowed), the never married group is only about 7 times as large as the widowed group.

Figure 5.6 An Example of Bad Practice: Deceptive Bar Chart for Frequency of Marital Status

The X axis denotes the marital status of never married, engaged, married, divorced and widowed. The Y axis denotes the frequencies. The details are as follows: never married: 20; engaged: 4; married: 11; divorced: 4; widowed: 3.

Figure 5.7 Deceptive Bar Chart: Use of Cartoons Instead of Bars to Represent Frequencies

The X axis denotes the year, 2009 and 2019, and the Y axis denotes the number of new houses built and ranges from 0 to 10,000.

mean of 80.9.

This example illustrates two things:

• When one very high score is added to this sample, the value of M increases (while the value of the median and mode do not change). This demonstrates that the mean is less robust against the impact of extreme scores than the median and mode.
• With one or more extremely high scores added, the value of the sample mean M is higher than the median; and in this example, M is actually higher than the majority of the individual scores in the sample. Under these circumstances the sample mean M is not a very good way to describe “average” or typical responses. Note that adding an extremely low score will make the mean smaller than the median.

4.8 Behavior of Mean, Median, and Mode in Common Real-World Situations

This section previews the use of graphs to represent score frequencies for quantitative variables (graphs are discussed more extensively in Chapter 5). Figure 4.4 shows a frequency table for a set of hypothetical scores. A corresponding histogram presents the same information graphically; the height of each bar in the histogram corresponds to the frequency of that score (i.e., the number of people who had that score value).

4.8.1 Example 1: Bell-Shaped Distribution

First let’s consider a hypothetical batch of scores for which the mean, median, and mode have similar values. Suppose you have a survey question that asks people to rate their degree of agreement with this statement: “I think that the U.S. economy is doing well.” Response options are scores of 1 = strongly disagree (SD), 2 = disagree (D), 3 = neutral (N), 4 = agree (A), and 5 = strongly agree (SA). We might obtain a frequency distribution like the one in Figure 4.4. Note that the answer given by the largest number of people corresponds to 3 (neutral), the next highest frequency responses were 2 (disagree) and 4 (agree), and the most extreme responses, 1 (strongly disagree) and 5 (strongly agree), were uncommon. For now, we will call this pattern a “bell-shaped” distribution. (Later, we'll talk more formally about normal distributions.) Bell-shaped distributions tend to have values of the mean, median, and mode that are close to one another. In the graph in the lower part of Figure 4.4, the number above the bar for each score value (such as 0) corresponds to the frequency of that score in the table (in the upper part of Figure 4.4). For example, in this hypothetical data set, a score of 1 had a frequency of 6. A score of 3 had a frequency of 33 (i.e., 33 people chose the answer 3). The histogram or graph at the bottom of Figure 4.4 represents the same information about frequencies using bars with heights that correspond to frequency. This distribution can be informally defined as bell shaped; there is a peak in the middle, and the pattern is symmetrical; that is, the left-hand side of the distribution is approximately a mirror image of the right-hand side.

Figure 4.4 Hypothetical Likert Scale Ratings With Bell-Shaped Frequency Distribution: (a) Frequency Table and (b) Corresponding Histogram
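The claim that a bell-shaped distribution has similar values for the mean, median, and mode can be illustrated with a small computation. The sketch below assumes Python rather than SPSS. Only the frequencies for ratings 1 (6 people) and 3 (33 people) come from the example above; the frequencies for ratings 2, 4, and 5 are assumed values, chosen only to make the distribution symmetrical.

```python
import statistics

# Frequency table for ratings 1 to 5. The counts for ratings 1 and 3 are from
# the text; the counts for ratings 2, 4, and 5 are assumptions for illustration.
freq = {1: 6, 2: 20, 3: 33, 4: 20, 5: 6}

# Expand the frequency table into one score per respondent.
scores = [rating for rating, count in freq.items() for _ in range(count)]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

print(f"N = {len(scores)}, mean = {mean}, median = {median}, mode = {mode}")
```

With these symmetrical frequencies, all three statistics equal 3. Real data are rarely perfectly symmetrical, so in practice the three values tend to be close rather than identical.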

Tables 5.1 and 5.2 [histogram images not recoverable; the surviving descriptions of distribution shapes are listed below]

• Reverse J-shaped. One mode is at or near 0. (This could be called severely positively skewed, but it is more extreme.)
• Bell-shaped, or normal, or Gaussian.
• “Spiky” but close to normal. Reasonably close to normal (I’d call this “spiky,” but that is not a term you would use in a research report).
• Uniform distribution. Occurs if X scores are ranks (e.g., in this example five cases are ranked, with ranks of 1, 2, 3, 4, and 5).
• Approximately normal with outliers at one or both ends.
• Bimodal distribution with modes at end points. Possible situation: a degree of agreement question for which opinions are strongly polarized (SD = strongly disagree, D = disagree, N = neutral, A = agree, SA = strongly agree).
• Positively skewed. More “weight” at the low end and a longer, thinner tail at the high end.
• [description partly lost] ... normally distributed samples with means that are far apart.
• Negatively skewed. Longer, thinner tail at the low end of the distribution. Negatively skewed distributions are not very common in behavioral science data.
• Trimodal. Three modes that are not at end points. (Could be scores for three normally distributed samples that have different means.)
• This distribution does not look like any specific distribution shape.

However, when we look at real data, we often see distributions that do not look anything like normal or bell-shaped curves. Table 5.2 shows distributions with shapes that are clearly not close to bell-shaped or normal. Tables 5.1 and 5.2 do not include all possible distribution shapes; there are many others.

To decide whether one or more of these distribution shapes best describe the data in your sample, you can obtain a histogram and compare it with the examples in these tables. Visual examination of a histogram is usually sufficient to make reasonable evaluations about distribution shapes. In Chapter 6, you'll see that there are quantitative methods to evaluate how well data fit a specific distribution shape; however, these are rarely used in practice.

The bell-shaped distribution in row 1 of Table 5.1 is discussed extensively in statistics. Informally, we can describe this bell-shaped distribution shape as follows.

• There is a “hump” in the center of a bell-shaped distribution. In a perfectly normal distribution, the mean, median, and mode are exactly equal and correspond to the center of the distribution (and all correspond to the top of the hump).
• Frequencies (the heights of the bars in the histogram) decline gradually as scores become either larger or smaller than the mean, median, or mode; this creates a shape something like a bell.
• The distribution is symmetrical around the mean. That is, the upper half of the histogram is a mirror image of the lower half.

Comprehension questions will ask you to examine histograms and evaluate whether the distribution is bell-shaped with minor variations or is described better by quite different distribution shapes. This is a somewhat subjective judgment call. Sometimes the best decision is to say that none of the common distribution shapes is a good description for a histogram you obtain for your data. People who often work with specific types of variables (such as reaction time) will learn the specific distribution shapes for those variables.

5.7 Obtaining a Histogram Using SPSS

The hypothetical female height data in femaleheight.sav are used to set up a histogram. You may have wondered why height and temperature are used as variables in early examples. Each of these variables can be given in different units (for example, height can be inches or centimeters; temperature can be given in degrees Fahrenheit or Celsius). (The United States is one of very few nations that still uses nonmetric units such as inches.) The following

example shows how to obtain histograms; examples demonstrate that converting units of measurement from inches to centimeters does not change the shape of the frequency distribution (although unit conversion does change the values of M, SD, and other descriptive statistics).

You will find it useful to be able to convert scores from one unit to another, and to do other computations. The SPSS Compute Variable command can be used to do this and has many additional potential uses. In this situation, we use this command to obtain (approximate) height in centimeters by multiplying height in inches by 2.54. To open the Compute Variable dialog box in Figure 5.8, select these menu options: Transform → Compute Variable. In the left-hand window, type the name of the new variable (in this example, heightcm). In the right-hand window, type a numerical expression that includes the name of one (or more) existing variable(s) that is used to assign values to the new variable (in this example, the numerical expression is “2.54*heightinch”). After you click OK, the new variable heightcm will appear as a new column on the right-hand side of your data worksheet.

Now let’s compare the distributions of height in inches and height in centimeters. The familiar Frequencies procedure (used to obtain descriptive statistics and pie and bar charts for categorical variables) can be used to request histograms for quantitative variables. Use these SPSS menu selections: Analyze → Descriptive Statistics → Frequencies. Move both variables (heightinch and heightcm) into the Variable(s) pane. Click the Charts button and select the radio button for “Histograms.” You may also want to check the box for “Show normal curve on histogram.” Click the Statistics button and use checkboxes in the Frequencies: Statistics dialog box to choose the desired descriptive statistics. Click OK to run the procedure. Output for descriptive statistics appears in Figure 5.9 and the histograms in Figure 5.10.

Figure 5.8 SPSS Compute Statement to Convert Height From Inches to Centimeters

At the top of the spreadsheet, titled femaleheight.sav, are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, extensions, window and help. Below these buttons are icon buttons to open a file, save, print, and other table editing options.

The Transform menu button, on being clicked, results in a drop-down menu with the following options: compute variable, programmability transformation, count values within cases, shift values, recode into same variables, recode into different variables, automatic recode, create dummy variables, visual binning, rank cases, date and time wizard, create time series, replace missing values, random number generators and run pending transforms. The compute variable button has been depressed, leading to a dialog box to compute variables. At the top left, there is a box titled Target variable, where heightcm has been filled in the field. Below this is a Type and label button, which has two entries: heightinch and heightcm. Heightinch has been selected. On the right, a numeric expression field has the entry 2.54*heightinch. A keypad with standard numbers and symbols is below this. On the right is a Function group section with the following entries: all, arithmetic, CDF and noncentral CDF, conversion, current date or time, date arithmetic, and date creation. Below this is an empty box titled Functions and special variables. An IF statement box has the statement Optional case selection condition. At the bottom of the dialog box are options buttons for the following: OK, Paste, Reset, Cancel and Help.

Figure 5.9 Descriptive Statistics for Hypothetical Female Heights in Inches and Centimeters

Statistics

                     heightinch    heightcm
N Valid              120           120
N Missing            0             0
Mean                 64.48         163.7665
Median               64.50         163.8300
Mode                 64            162.56
Std. Deviation       2.463         6.25614
Variance             6.067         39.139
Minimum              58            147.32
Maximum              70            177.80
Percentiles   25     63.00         160.0200
              50     64.50         163.8300
              75     66.00         167.6400

Figure 5.10 Histograms for Hypothetical Female Height Data

In the first histogram, the X axis denotes height in inches and ranges from 58 to 72, rising in increments of 2. The Y axis denotes frequency and ranges from 0 to 20, rising in increments of 5. The SD is marked as 2.5 on either side of the mean. A curve drawn through the bars of the histogram is approximately bell shaped. In the second histogram, the X axis denotes female height in centimeters and ranges from 140 to 180, rising in increments of 10. The Y axis denotes frequency and ranges from 0 to 20, rising in increments of 5. The SD is marked as 6.25 on either side of the mean. A curve drawn through the bars of the histogram is approximately bell shaped.

As you might expect, transformation of scores from inches to centimeters changed all values for descriptive statistics, such as the mean and standard deviation. For example, the mean for height in centimeters is 2.54 times the mean for height in inches. Each descriptive statistic for height in centimeters is 2.54 times the corresponding statistic for height in inches (except that the variance for height in centimeters is 2.54² times the variance for height in inches, because variance is expressed in squared units).

Did this transformation change the shape of the distribution? Figure 5.10 shows that the distributions of height scores given in inches and centimeters have identical shapes, even though individual scores and descriptive statistics such as M and SD are in different units, and the units along the X axis differ.
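The effect of the inches-to-centimeters conversion on descriptive statistics can be checked with a few lines of Python. This is only a sketch: the nine height values are made up for illustration (they are not the femaleheight.sav data), and the variable names are hypothetical. The standard library `statistics` module is used for the sample SD and sample variance.

```python
import statistics

# Hypothetical heights in inches (illustrative values, not the book's data file).
heights_inch = [58, 61, 63, 64, 64, 65, 66, 68, 70]

CM_PER_INCH = 2.54
heights_cm = [CM_PER_INCH * x for x in heights_inch]

mean_in = statistics.mean(heights_inch)
mean_cm = statistics.mean(heights_cm)
sd_in = statistics.stdev(heights_inch)       # sample SD
sd_cm = statistics.stdev(heights_cm)
var_in = statistics.variance(heights_inch)   # sample variance
var_cm = statistics.variance(heights_cm)

# Mean and SD are multiplied by 2.54; variance is multiplied by 2.54 squared.
print(mean_cm / mean_in)
print(sd_cm / sd_in)
print(var_cm / var_in, CM_PER_INCH ** 2)
```

The first two ratios should come out very close to 2.54 and the third very close to 6.4516, which matches the pattern in Figure 5.9 (for example, 6.067 × 6.4516 is about 39.14).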

I have marked M and SD in the two histograms above. The sample mean is approximately in the middle, marked by the letter M on the X axis.

Recall that SD summarizes information about distances of scores from the mean; SD is shown as horizontal arrows that indicate distance from the mean of X. For height in inches, SD was 2.5 inches. The end points of the arrows that indicate the distance of one SD below M and one SD above M are:

Lower end point: M - 1 SD = 64.5 - 2.5 = 62
Upper end point: M + 1 SD = 64.5 + 2.5 = 67

In Chapter 6, you'll learn about the mathematical definition of normal distribution shape (expressed in the form of a somewhat complicated equation). That equation generates the smooth curves superimposed on the histograms above.

5.8 Describing and Sketching Bell-Shaped Distributions

When sample data are approximately normally distributed, you need only three pieces of information to specify the distribution, communicate information about it to someone else, and/or draw a sketch of that distribution. These pieces of information are:

1. The distribution shape (normal).
2. The sample mean M.
3. The sample standard deviation SD.

For example, scores on many IQ tests are normally distributed with a mean of 100 and a standard deviation of 15 (or sometimes 16). This is enough information to sketch the shape of the distribution, as shown in Figure 5.11.

Figure 5.11 Sketch Based on Three Pieces of Information: Normal Shape, M = 100, SD = 15

The X axis ranges from 55 to 145, rising in increments of 15. The mean is the highest point of the curve, at 100, and the curve is symmetrical on both sides. There are three arrows on either side of the mean.

The range rule (from the previous chapter) will help you identify the approximate locations of the minimum and maximum values on the X axis, and this range is divided into six parts (the range is approximately equal to 6 × SD if the distribution is normal). Note that you won't be able to label the Y axis in this graph. You can label the following seven points along the X axis if you know M and SD. The seven X axis values marked in Figure 5.11 are calculated from M and SD as follows:

M - 3 × SD    100 - (3 × 15) = 55
M - 2 × SD    100 - (2 × 15) = 70
M - 1 × SD    100 - (1 × 15) = 85
M + 0 × SD    100 + 0 = 100
M + 1 × SD    100 + (1 × 15) = 115
M + 2 × SD    100 + (2 × 15) = 130
M + 3 × SD    100 + (3 × 15) = 145
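The seven tick marks above follow one pattern, M + k × SD for k from -3 to 3, so they can be generated in a loop. The sketch below is an illustration only; the function name `normal_sketch_ticks` is hypothetical and not part of SPSS or of the book's materials.

```python
def normal_sketch_ticks(mean, sd):
    """Return the seven X axis values M - 3 SD, ..., M + 3 SD."""
    return [mean + k * sd for k in range(-3, 4)]

ticks = normal_sketch_ticks(100, 15)
print(ticks)  # [55, 70, 85, 100, 115, 130, 145]

# Range rule check: lowest to highest tick spans 6 x SD.
print(ticks[-1] - ticks[0])  # 90
```

Calling the same function with the height values (for example, a mean of 64.5 and an SD of 2.5) gives the tick marks for a sketch of the height distribution.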

Score locations relative to the mean can be approximately described as follows (we will describe distance from the mean more precisely in Chapter 6). A score can be called “not very far from the mean” if it lies within the range from M - 1 SD to M + 1 SD. For example, an IQ of 110 is not very far from the mean. An X score can be called “far from the mean” if it is below M - 2 SD or above M + 2 SD. For example, an IQ of 135 is far above the mean, and an IQ of 69 is far below the mean. A score can be called “unusually far from the mean” if it is less than M - 3 SD or greater than M + 3 SD. For example, an IQ of 50 is unusually far below the mean, while an IQ of 150 is unusually far above the mean.
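These rough verbal labels can be written as a small decision rule. The sketch below is one possible reading of the paragraph above, not a standard procedure: the function name `describe_distance` is hypothetical, and because the text gives no label for scores between 1 and 2 SD from the mean, the wording used for that case is an assumption.

```python
def describe_distance(score, mean, sd):
    """Label a score using the rough 1, 2, and 3 SD cutoffs from the text."""
    distance_in_sd = abs(score - mean) / sd
    direction = "above" if score > mean else "below"
    if distance_in_sd > 3:
        return f"unusually far {direction} the mean"
    if distance_in_sd > 2:
        return f"far {direction} the mean"
    if distance_in_sd <= 1:
        return "not very far from the mean"
    # Assumption: the text does not name the zone between 1 and 2 SD.
    return f"between 1 and 2 SD {direction} the mean"

for iq in (110, 135, 69, 50, 150):
    print(iq, describe_distance(iq, 100, 15))
```

With M = 100 and SD = 15, the five IQ examples from the text receive the same labels the text gives them.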

Another way to look at this: If you had a set of 1,000 IQ scores with M = 100 and SD = 15, and you selected one case at random, the most likely outcome would be an IQ in the range from 85 to 115. You could obtain a case with an IQ in the range 150 and up, but that would be an unusual or unlikely outcome.

If you know your own IQ, or any specific IQ score, you can locate that score on the X axis and immediately see the following: Is your IQ score above or below the mean M? Is it far from the mean, or unusually far from the mean? Refer to Figure 5.11. An IQ of 90 is below the mean, but it is not very far from the mean. An IQ of 160 is above the mean, and it is unusually far from the mean (in other words, very few people have IQ scores equal to or greater than 160).

5.9 Good Practices in Setting up Histograms

Most rules for good practice in bar chart construction also apply to the construction of histograms:

1. A separate bar represents the frequency (or proportion or percentage of cases) for each score value (or for a range of score values, as described later in this section).
2. The height of each bar corresponds to the number or frequency in each group (or the proportion or percentage of cases in each group).
3. Labels on the Y axis should make clear whether frequency, proportion, or percentage is reported. (However, the relative heights of the bars are the same regardless of labels.)
4. Score values are specified by labels on the X axis.
5. Bars should have equal widths.
6. The height of the graph (Y axis) is usually less than the width of the X axis.
7. The Y axis begins at 0.
8. In bar charts, it is good practice to label the top of each bar with an exact numerical value (a frequency or a percentage). There may not be enough space on a histogram to include such labels. Clearly labeled tick marks on the Y axis help readers evaluate frequencies.
9. Information about total N must be provided.
10. In a footnote or the body of the text, the source of data should be stated. Readers tend to assume that numbers are based on new data collected by the researcher; if there is another source (such as Gallup polls or the U.S. census), that

science data. Negative skewness is possible (with a few extreme scores at the low end) but less common.
3. If a distribution is bell shaped or approximately normal, the values of the mean, median, and mode will be close together. The mean is a good way to describe central tendency for bell-shaped distributions; the median and mode will have similar values.
4. When in doubt, or if the situation is complicated, it may be better to report the entire frequency distribution (and/or histogram) along with values for the mean, median, and one or more modes.

Good practice:

* Do preliminary data screening by examining a frequency distribution table and graph to evaluate whether the mean, median, and/or mode(s) are better ways to describe central tendency.
* If implausible score values appear, go back and reexamine the data to correct errors.
* Note the number of missing values.
* State whether extreme scores or multiple modes were detected (or whether the distribution is approximately normal).
* State clearly what statistic is used (mean, median, or mode) to describe average responses.

Bad practice:

* Obtain a mean, median, or mode without examining a frequency table or graph.
* Select the index of central tendency value that “fits the narrative.” For example, if you want to report a high average, you can select whichever of these three statistics has the highest value, whether it makes sense or not. This is deceptive.
* Fail to make clear which index of central tendency is reported, and fail to note potential problems with it.

Chapter 1 mentioned “lying with statistics.” Reports of central tendency can be deceptive when they present only selected information that creates the impression the author wants to create. When an author wants readers to think, “Wow, that average is really high,” the author might choose to report the highest of the three values (mean, median, or mode). Conversely, if the author wants readers to think, “Wow, that average is really low,” the author might choose to report the lowest value among mean, median, and mode. An author who cherry-picks the highest “average” is presenting misleading (although perhaps not technically false) information.

4.10 Using SPSS to Obtain Descriptive Statistics for a Quantitative Variable

Previous sections discussed statistics for central tendency; the following sections discuss statistics to describe variability. In this section, SPSS is used to obtain all these descriptive statistics (to describe both central tendency and variability) from data in the file named temphr10.sav using the SPSS frequencies procedure.

To run Frequencies, make these menu selections (as in the example in Chapter 3): Analyze > Descriptive Statistics > Frequencies. This opens the main dialog box for the frequencies procedure; in this window, move the variable hr into the Variables window. Click the Statistics button in the top right-hand corner of the main dialog box for the frequencies procedure to open

The X axis ranges from 10 to 40, rising in increments of 10, and the Y axis has just one number, 1250. The histogram is a single big bar stretching from 15 to 40 along the X axis and reaching up to the single value 1250 on the Y axis.

Figure 5.13 Large Number of Bins (“Too Many” Bins) With Jagged Distribution Shape

The X axis denotes BMI and ranges from 10 to 40, rising in increments of 10. The Y axis denotes frequency and ranges from 0 to 60, rising in increments of 10. There are several bars, signifying many bins, and the distribution has several spikes in the center, with the right end becoming almost flat. The curve drawn through the bars resembles a normal distribution that is positively skewed.

Figure 5.14 Optimal Number of Bins Determined by SPSS for BMI Histogram

The X axis denotes BMI and ranges from 10 to 40, rising in increments of 10. The Y axis denotes frequency and ranges from 0 to 200, rising in increments of 50. There are several bars, signifying the bins, and the distribution spikes in the center, with the right end becoming almost flat. The curve drawn through the bars is approximately normal except for a few outliers at the high end of the distribution. The bars for 18.5 and 24.9 have been specifically marked out.

Figure 5.14 shows the histogram for BMI scores when SPSS was allowed to decide on the “optimal” number of bins. I marked the clinical cutoffs for normal BMI (18.5 and 24.9) on this histogram as points of reference.

In Figure 5.14, it is clear that the distribution shape for BMI was approximately normal except for a few outliers at the high end of the distribution, that is, a few cases with unusually high BMI. The SPSS default choice for the number and widths of bins provided a relatively smooth histogram to use for evaluation of the distribution shape for this data set. (SPSS does not publish the details of how this decision is made, and rules for this can be complex.)

When your variable can be evaluated in terms of clinical guidelines (a BMI between 18.5 and 24.9 is generally described as indicating healthy body weight), it can be useful to evaluate distribution shape relative to these clinical cutoffs. A large proportion of students had BMI scores in the “healthy” range. A fairly substantial minority of students had BMI scores that would be judged overweight or very overweight; a few had BMI scores that would be described as underweight. (A frequency table would provide the information needed to find the exact percentages of persons who were over- or underweight.) This distribution is positively skewed; it has a longer tail at the high end. Skewness is discussed further in Chapter 6.

It is desirable to have bins that correspond to the same ranges of score values, but this is not feasible in some situations. Figure 5.15 shows a histogram for real data: the percentages of households whose annual incomes fall into ranges such as less than $5,000, between $5,001 and $10,000, and so forth. In the histogram in Figure 5.15, each bar (except

at the upper end of the income distribution, many additional bars would be needed to represent the full range of incomes in the United States. If you drew an X axis wide enough to include all these additional bars, the graph would have to be at least five times wider than shown in Figure 5.15. To avoid that problem, information about incomes greater than $200,000 was compressed into two bars. When looking at graphs like this, readers need to notice how the last few bars were defined. A first impression might be that there is a mode for incomes between $200,000 and $205,000, but this impression is incorrect. In fact, there is an extremely long and thin tail for this income distribution (the distribution is extremely positively skewed).

Figure 5.15 Annual Household Income in the United States in 2010

The Y axis denotes the percentage of households, with tick marks from 1% to 6%.

, as shown in Figure 5.18.

In most real-life situations, researchers want to compare boxplots for the same variable for two or more groups, as in the following example. BMI is an index of body weight corrected for height. Using data in the file bmi.sav, we will examine BMI scores separately for men and women. In the first Boxplot dialog box (Figure 5.19), highlight the box for “Simple” boxplot and select the radio button for “Summaries for groups of cases.” In the Define Simple Boxplot: Summaries for Groups of Cases dialog box (in Figure 5.20), the name of the variable for the plot (heightinches) is moved into the variables list. The resulting boxplot graph appears in Figure 5.21.

On the basis of output from the SPSS frequencies procedure (not shown here), the median BMI was 23 for men and 22 for women. There were numerous outliers for both groups, mostly higher BMI scores. You need to know that when two scores have the same value, SPSS draws just one circle. Each circle is labeled with the row number in the SPSS data file where the outlier score is located. You can determine the number of scores identified as outliers by counting these numbers. To find the number of nonextreme outliers, count the case numbers for the open circles and ignore the case numbers for the asterisks. An outlier that is not extreme is denoted using an open circle, while outliers labeled as extreme appear as asterisks.

Figure 5.18 Menu Selections to Access Boxplot Dialog Box

At the top of the spreadsheet are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, extensions, window, and help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options.

The graphs menu option has been clicked and a drop-down menu shows the following: chart builder, graphboard template chooser, Weibull plot, compare subgroups, regression variable plots, and legacy dialogs. Legacy dialogs has been selected, leading to the next group of menu options: bar, 3-D bar, line, area, pie, high-low, boxplot, error bar, population pyramid, scatter or dot, and histogram. The spreadsheet has five columns and 15 rows filled with numerical data.

Figure 5.19 Initial Boxplot Dialog Box: Select “Simple” and “Summaries for groups of cases”

There are two types of boxplot choices available: simple and clustered. The data in the chart can be of two types: summaries for groups of cases and summaries of separate variables. The box for “Simple” boxplot and the radio button for “Summaries for groups of cases” have been selected. At the bottom of the dialog box are buttons for the following: Define, Cancel, and Help.

Figure 5.20 Define Simple Boxplot: Compare BMI Scores for Male Versus Female Groups

descriptive statistics for variation (including minimum, maximum, range, variance, and standard deviation) can be obtained by hand and how they are interpreted.

4.11 Minimum, Maximum, and Range: Variation Among Scores

The simplest way to describe variation among scores begins by rank-ordering scores from lowest to highest. The lowest score value is the minimum (often abbreviated as Min); the highest score value is the maximum (Max). As noted in Chapter 3, the range is maximum - minimum. For the heart rate data in Figure 4.1, Min = 62, Max = 82, and range = 20. Why does this information matter? It helps us characterize the variety of people we have in the sample.

When a variable has real-world uses, clinical or other interpretation guidelines can help us understand what the minimum and maximum scores in a sample tell us. For example, guidelines published by the Mayo Clinic state that the normal adult resting heart rate ranges from approximately 60 to 100 beats per minute. A well-conditioned athlete might have a heart rate of about 50 beats per minute. The people in this hypothetical sample all have hr scores within the lower half of the normal range. This tells us that the sample consisted of people with heart rates in the low normal range, and this suggests a sample of persons with good cardiovascular fitness. If the sample had Min hr = 90 and Max hr = 120, this would indicate that many or most of the members of the sample have unusually high heart rates.

When a frame of reference for the evaluation of scores is available, it should be used when characterizing the sample. For example, if depression is assessed, one might ask, Are some scores high enough to warrant diagnoses of mild, moderate, or severe depression? In a study of a new antidepressant drug, for example, readers would want to know whether most patients were mildly or severely depressed.

4.12 The Sample Variance s²

We can obtain more useful information about variability by using information for all the individual scores. If all people had the same heart rate score, there would be no variance (e.g., a sample with hr scores of 72, 72, 72, …, 72 will have variance of 0). Variance in hr exists when people have different values of hr. Variability is evaluated by examining how far individual people's scores are from the mean.

4.12.1 Step 1: Deviation of Each Score From the Mean

Equation 4.2 appeared earlier, and it is repeated here as Equation 4.4. The first step in calculation of variance is to compute the deviation of each person's score from the sample mean M. (X - M) answers the question, How far is a person's X score above or below the mean?

Deviation of individual X score from mean = (X - M).    (4.4)

For the data in temphr10, the deviation of the first X score from the mean is (70 - 73.1), that is, the score for the first case minus the mean of hr scores.
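The minimum, maximum, range, and deviations from the mean described above can be reproduced with a short Python sketch. The ten hr values are taken from Figure 4.9, listed from lowest to highest; the order of cases in temphr10.sav itself may differ, and the variable names here are hypothetical.

```python
# hr scores from Figure 4.9, sorted from lowest to highest.
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]

minimum = min(hr)
maximum = max(hr)
score_range = maximum - minimum
mean = sum(hr) / len(hr)

# Equation 4.4: deviation of each X score from the mean, (X - M).
deviations = [x - mean for x in hr]

print(minimum, maximum, score_range)  # 62 82 20
print(round(mean, 1))                 # 73.1
print(round(70 - mean, 1))            # -3.1
```

The deviations always sum to zero (apart from tiny floating-point error), which is why the next step in the calculation of variance squares them before summing.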

Why do some people have higher, and some people lower, hr scores? Because people have different characteristics, such as physical fitness, smoking, and anxiety, that make their heart rates

outliers (but not as extreme outliers). Three scores (on rows 126, 157, and 197) were identified as extremely high outliers. The case on row 3 lies in between these groups of scores. You can determine whether the in-between BMI score on row 3 is an extreme outlier by comparing the BMI value in the data file on row 3 (which is 34) with the BMI values for the two neighboring values. The BMI score on row 157 is also 34, and case number 157 is not tagged as an extreme outlier in this boxplot; therefore row 3 would also not be identified as an extreme outlier.

We can report results for the male BMI boxplot as follows. Values obtained from the SPSS frequencies procedure, not shown here, are used to identify the exact values for the 25th, 50th, and 75th percentiles and the minimum and maximum scores. For men, median BMI was 23; the middle 50% of male BMI scores were between 22 and 25. There were 2 low-end outliers for male BMI; neither was extreme. There were 13 high-end outliers; 10 were not extreme and 3 were extreme outliers. For men, minimum BMI was 16 and maximum BMI was 41.

Median BMI for women was 22; the middle 50% of female BMI scores were between 20 and 23. The female group had no low-end outliers for BMI. There were five nonextreme high-end outliers (rows 204, 302, 318, 353, and 398). There were also two extreme high-end outliers; these BMI scores appear on rows 290 and 374. Minimum BMI for women was 17 and maximum BMI was 33.

If we compare men and women, it appears that men tend to have higher BMIs than women.

It would also be useful to examine the frequency distributions for BMI using suggested clinical cutoffs to evaluate the percentage of persons whose BMIs were within the range 18.5 to 24.9, which is considered healthy. Histograms would help to evaluate distribution shapes.

5.11 Telling Stories About Distributions

After you examine graphs such as histograms or boxplots, you should be able to tell an honest and reasonably complete story about the pattern you see. Imagine this game: Your task is to get a person who has not seen the histogram or other graph to draw a picture of the graph, based only on verbal information you provide. You win the game if you and your partner can do this more quickly and accurately than other teams. Ready? Go!

If you have a roughly normal or bell-shaped distribution, you can communicate this to your partner very quickly with three pieces of information (normal, M, SD). That should be sufficient for your partner to sketch a graph. If the distribution appears somewhat normal but with some variations, such as positive skewness or outliers (see Table 5.1 for examples), you need to add that information (for example, three outliers at the high end). On the other hand, if your distribution does not resemble a bell-shaped curve (see Table 5.2), you need different stories or pieces of information. It may be sufficient to say “reverse J-shaped” or bimodal or uniform. However, you will need to give your partner more information (for example, the maximum score was 10). Distributions that have one or more modes and non-normal shapes require more information. Where was each mode located? Were some modes higher than others? Think about what your results mean.

Figure 5.22 Histogram for Polarized Degree of Agreement Ratings

The image is a histogram that shows the distribution of responses to the statement “The current U.S. president is doing an excellent job.” The X axis denotes the responses: strongly disagree (SD, 1), disagree (D, 2), neutral (N, 3), agree (A, 4), and strongly agree (SA, 5). The Y axis denotes the percentage of responses. There are five bars, and their heights are:

* Strongly disagree: 52 percent
* Disagree: 10 percent
* Neutral: 2 percent
* Agree: 6 percent
* Strongly agree: 30 percent

Note: Agreement with the statement “The current U.S. president is doing an excellent job” was rated using the response options 1 = strongly disagree, 2 = disagree, 3 = neutral or don’t know, 4 = agree, and 5 = strongly agree.

How can the hypothetical results in Figure 5.22 be described? Opinion is highly polarized; that is, people are at either the negative or positive extreme in this hypothetical example. There are two modes (52% of people strongly disagree and 30% strongly agree). Very few people chose intermediate levels of agreement. Most people strongly disagree, but the number of people who strongly agree is a substantial minority. Use of the mean (about 2.5) to describe central tendency would be misleading in this situation; 2.5 is near the neutral point, but very few people chose ratings near neutral. A concise way to communicate this would be: “Fifty-two percent strongly disagreed with this statement, 30% strongly agreed, and very small percentages of people chose intermediate levels of agreement. Opinion was strongly polarized.” If the author of a research report makes a blanket statement that all variables had approximately normal distributions, or allows readers to assume that all distributions were normal, and then tells readers that the mean degree of agreement with this statement was 2.5, this information by itself provides a misleading description of the results.

5.12 Uses of Graphs in Actual Research

1. Data screening: Identify potential errors or problems with data (such as recording errors, implausible scores, and missing values). Researchers need to report the number of scores that are problematic and indicate what they did to correct these problems. For beginning students, it may be sufficient to report the percentage of missing scores and the number of outliers and extreme outliers for each variable. I suggest that beginning students run analyses with outliers included and with outliers excluded; if results are substantially the same, report one of these analyses and add a footnote to indicate that the other analysis yielded similar results. For both beginning and advanced students, keep a record of any problems you detect in data, and anything that you do to deal with the problems. Discussion of better ways to handle outliers and missing values is provided in Volume II (Warner, 2020).
2. Evaluation of whether assumptions for analyses are violated: When you learn statistical techniques such as t tests, ANOVA, and regression, you will see that each analysis is based on some assumptions. Some analyses work fairly well, under certain circumstances, when their assumptions are violated; others do not. There is a widespread, but not exactly accurate, belief that scores in samples need to be normally distributed to satisfy the assumptions for many common analyses. I think it would be more accurate to say that, in practice, some kinds of departure from normality in the sample (such as the presence of extreme outliers, or reverse J-shaped or polarized distributions) create problems in many common analyses. The ways the violations of assumptions and rules can lead to incorrect conclusions are discussed in later chapters about significance tests.
3. Report information needed to characterize and describe your sample: For categorical variables, this is often in sentence form, for example, “The sample consisted of 100 male and 150 female university students, with a mean age of 19.1 years.”

Here are some of the stories (or descriptions) about distributions that might appear in a research report.

1. You might say, “The histogram appears approximately normal with no extreme outliers.” You can state this in the “Data Screening” section of your research report. (For some statistics you will need to check additional assumptions.)
2. You might need to say, “The histogram appears approximately normal except for a specific number of outliers.” In this situation you face the “what to do with outliers” problem. Ideally, you decide what to do with outliers prior to data collection. You need to document the number of outliers and what you decided to do with them (such as drop from analysis, recode into different values, or leave them in). Do not experiment with different ways of handling outliers until you find results you like; this is p-hacking.
3. You might need to say, “The distribution is very skewed, and skewness cannot be corrected by modifying or removing a few outliers.” Only if it is conventional in your field, only if values differ by orders of magnitude, and only if planned ahead, log or other nonlinear transformations may be applied to the data, so that analyses use log(X) instead of X.
4. In some situations that involve outliers, nonparametric analysis may be preferable. When scores are converted to ranks, extreme outliers and skewness are not problems. (Newer robust techniques, not covered in this book, may be better choices; Field, 2018.)
5. If a distribution looks nothing like a normal distribution (e.g., uniform, J-shaped, U-shaped, mode at zero), proceed with caution. Entirely different analyses than the ones in this book may be required.

5.13 Data Screening: Separate Bar Charts or Histograms for Groups

Appendix 4B reviews rounding. I suggest that you retain at least three decimal places during computations. Final results for most statistics are often rounded to two decimal places.

In Figure 4.9 (data from temphr10.sav), the squared deviation from the mean for each individual person appears in the last column (the variable named deviationsq). Adding the scores for deviationsq gives the value of SS for this data set: SS = 288.90. For larger data sets, it is more convenient to have a computer program do this.

Figure 4.9 Deviations and Squared Deviations of Heart Rate Scores From Mean

hr      deviation    deviationsq
62      -11.10       123.21
69       -4.10        16.81
70       -3.10         9.61
71       -2.10         4.41
73       -0.10         0.01
74        0.90         0.81
75        1.90         3.61
75        1.90         3.61
80        6.90        47.61
82        8.90        79.21
Sum       0.00       288.90

real data, SS is usually not a whole number. Note that SS cannot be a negative number (because we are summing squared deviations, and squared numbers cannot be negative).

Other factors being equal, SS tends to be larger when:

1. The individual (X - M) deviations from the mean are larger in absolute value.
2. The number of squared deviations included in the sum increases.

The minimum possible value of SS (which is 0) occurs when all the X scores are equal and, therefore, equal to M. For example, in the set of scores [73, 73, 73, 73, 73], the SS term would equal 0. There is no limit, in practice, for the maximum value of SS.

To interpret SS as information about variability, we need to correct for the fact that SS tends to be larger when the number of squared deviations included in the sum is large. Dividing by N, the number of scores in the sample, seems like the obvious solution. However, this does not provide the best answer.
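The SS computation for the heart rate scores in Figure 4.9 can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure used for the figure; the scores are typed in by hand, and the variable names (hr, deviations, squared_deviations, ss) are chosen here for the example.

```python
# Heart rate scores from Figure 4.9, entered by hand for this sketch.
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]

n = len(hr)
m = sum(hr) / n  # sample mean M

# Deviation of each score from the mean, (X - M), and its square.
deviations = [x - m for x in hr]
squared_deviations = [d ** 2 for d in deviations]

# SS is the sum of the squared deviations.
ss = sum(squared_deviations)

print(f"M  = {m:.2f}")
print(f"SS = {ss:.2f}")
```

Apart from floating-point rounding, the deviations sum to zero and SS matches the value of 288.90 reported for Figure 4.9.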

described as approximately normal. Among the three outliers identified in the boxplots, the only one that stands out clearly in the histograms is the male height of 78 inches (6 ft 6 in., or about 198 cm). This is unusually tall, but the number is not so large that you would think it impossible.

Figure 5.24 Separate Boxplots for Height for Female and Male Groups (Data From malefemaleht.sav)

There are two boxplots in the image indicating heights for female and male groups. The X axis denotes sex (female or male), and the Y axis denotes height in inches, ranging from 55 to 80 in increments of 5. The female boxplot has a median of 64 and one low-end outlier, and it sits lower on the Y axis than the male boxplot. The male boxplot, with a median around 70, has one high-end outlier and one low-end outlier.

Figure 5.25 Command to Organize Output by Groups

The image is a screenshot of the menu bar in SPSS. At the top of the spreadsheet are the following menu buttons: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Extensions, Window, and Help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options. The Data button has been pressed, and the following options are visible in the drop-down menu: define variable properties, set measurement levels of unknown, copy data properties, new custom attribute, define date and time, define multiple response sets, identify duplicate cases, compare datasets, sort cases, sort variables, transpose, adjust string widths across files, merge files, restructure, rake weights, propensity score matching, case control matching, aggregate, copy dataset, and split into files.

The Split File dialog box has a large box for the variable, which has been filled with the variable Heightinch. On the right are options such as analyze all cases, do not create groups; compare groups; and organize output by groups. The last has been checked.

4.12.3 Step 3: Degrees of Freedom

It might seem logical to divide SS by N to correct for the increase in size of SS as N increases. However, this yields values that are slightly too small; Gosset (discussed in Tankard, 1984) worked out the reason for the problem and discovered a simple solution. When we look at the pieces of information used to compute SS (i.e., the deviation of each score from the sample mean), it is possible to see that we do not have N independent deviations (or pieces of information) available to compute the SS; in fact, we have only (N - 1) pieces of information.

To explain why deviations from the mean in a sample of N scores provide only (N - 1) independent pieces of information about distance from the mean, recall that the sum of all deviations of scores from the mean must equal 0. Suppose we have N = 3 scores in a sample (call these scores X_1, X_2, and X_3) and that their mean is M.

First, we convert each X score into a deviation by subtracting the sample mean M. We know that the sum of these deviations must equal zero. That yields this simple equation:

$$(X_1 - M) + (X_2 - M) + (X_3 - M) = 0.$$

We can rearrange this equation by subtracting (X_3 - M) from both sides; the equation becomes:

$$(X_1 - M) + (X_2 - M) = -(X_3 - M).$$

When we compute (X_1 - M) + (X_2 - M) (on the left side of the equation), this fixes the value that the remaining deviation, (X_3 - M), must have (the same size, with the opposite sign). Only the first two deviations are “free to vary,” that is, free to take on any possible value. Once we know the value of any two of the deviations, the value of the last deviation is determined (it must be whatever number is needed to make the sum of all deviations equal 0). This is only a demonstration, not a formal proof.

available when we compute SS or another

This modified divisor, N - 1, is called the degrees of freedom (df). The df term tells us how many of the deviations are “free to vary.” The use of df instead of N as a divisor is another frequently used tool in the statistician's bag of tricks. Later analyses also use df terms, although df often has different values than (N - 1) in other situations. Degrees of freedom for the SS and sample variance are obtained using Equation 4.7:

$$df = (N - 1). \tag{4.7}$$

4.12.4 Putting the Pieces Together: Computing a Sample Variance

The variance for a sample is usually denoted $s^2$. A sample variance is obtained by dividing SS by its degrees of freedom:

$$s^2 = SS/(N - 1) \text{ or } SS/df. \tag{4.8}$$

(Some textbooks use $S^2$ to denote a sample variance calculated as SS/N. In actual practice, this notation is almost never used when statistics are applied to real-world data, and you will not see $S^2$ again in this book.)

Return to the data in Figure 4.9. The first column shows heart rate scores for each person. The second column shows the deviation of each person’s score from the mean (the variable name is deviation). The third column shows each
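The degrees of freedom argument and Equations 4.7 and 4.8 can be checked numerically. The sketch below is an illustration only: it reuses the ten heart rate scores from Figure 4.9 (typed in by hand), the variable names are chosen here for the example, and the comparison against Python's statistics.variance function is an added cross-check rather than anything used in this book.

```python
import statistics

# Heart rate scores from Figure 4.9, entered by hand for this sketch.
hr = [62, 69, 70, 71, 73, 74, 75, 75, 80, 82]
n = len(hr)
m = sum(hr) / n

deviations = [x - m for x in hr]

# Only N - 1 deviations are free to vary. Because all deviations must sum
# to zero, the last one equals minus the sum of the other N - 1 deviations.
last_deviation_implied = -sum(deviations[:-1])

ss = sum(d ** 2 for d in deviations)  # sum of squared deviations
df = n - 1                            # Equation 4.7
s_squared = ss / df                   # Equation 4.8

print(f"last deviation: actual {deviations[-1]:.2f}, implied {last_deviation_implied:.2f}")
print(f"df = {df}, sample variance = {s_squared:.2f}")

# statistics.variance also divides by N - 1, so it should agree with SS/df.
print(f"statistics.variance(hr) = {statistics.variance(hr):.2f}")
```

For these data SS = 288.90 and df = 9, so the sample variance is 288.90/9 = 32.10.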

maps to show the spread of obesity in the United States over time. A PowerPoint presentation that shows a series of maps from 1985 to 2010 appears at https://www.cdc.gov/obesity/downloads/obesity_trends_2010.ppt. Figure 5.30 shows a more recent graph for prevalence of obesity in the United States in 2017. States shaded darker gray have higher percentages of obesity. (The corresponding map online at https://www.cdc.gov/obesity/data/prevalencemaps.html is keyed in color.) At a glance you can see several features of the data. High rates of obesity occurred in the Deep South, Iowa, and West Virginia. Colorado, Hawaii, and the District of Columbia had low rates. U.S. residents can see how obesity rates in their states compare with those of other states.

5.15.3 Historical Example

Most people think of Florence Nightingale as a pioneer of nursing; her work also had an enormous impact on medicine and hospital design (Lienhard, 2002). During the Crimean War, she sent reports to Britain about the number of soldiers who died each month and their causes of death. She used polar diagrams (this is not currently a popular form of graph) to communicate this information. Figure 5.31 is adapted from part of her graphics (Nightingale, 1858). Her major finding was that far more soldiers were dying from preventable diseases (sometimes acquired in the military hospitals) than from wounds. Up until the 19th century, this was true in many wars. The point she wanted to make was that far more sanitary conditions and better nutrition were needed to keep the army (and civilian populations) healthy. This was not something the War Department wanted to hear. Nevertheless, she persisted.

Figure 5.31 Florence Nightingale’s Graph: Number of British Soldiers Who Died in the Crimean War During Each Month Divided Into Three Causes of Death

Diagram of the causes of mortality in the army in the East, April 1854 to March 1855.

Source: Public domain.

In the diagram, each month has a separate slice of a pie. The length and width of the pie varied based on the number of deaths. The pie is subdivided based on the cause of death, mainly reasons of battle, disease, and other causes. The main cause of the deaths seems to be disease. The data for the months from April 1854 to March 1855 are covered in the diagram. April to June had very low deaths. July was slightly higher, while August and September were higher than the previous months. There was a dip in October, but November levels are similar to those of

You can describe distribution shape by thinking about the answers to these questions. Some of these descriptions are not mutually exclusive. For example, a positively skewed distribution may also have high-end outliers, and it may have a large mode at zero.

In a typical research report, authors would like to be able to say something like this at the beginning of the “Results” section: “All quantitative variables were approximately normally distributed with no extreme outliers.” Real data often do not behave so nicely, of course. An author might have to say something more like this: “Number of doctor visits had a reverse J-shaped distribution with five high-end outliers.”

Comprehension Questions

1. In the bar graphs in most of this chapter (except those in Section 5.14), the height of the Y axis provides what information?
2. Suppose you generate a bar graph using SPSS. You also have a frequency table for the same data. What information from the frequency table might you add to the bar graph to make the information in the bar graph more precise?
3. What is a common practice that can make a bar graph deceptive? Can you think of at least one other way bar graphs can be made deceptive?
4. What can you see in a histogram of quantitative scores that is less easy to see in a frequency table?
5. Consider the histogram in Figure 5.32.
   1. What were the minimum and maximum number of servings of fruits and vegetables people said they ate per day? What was the range?
   2. What was the modal amount of fruit and vegetable consumption?
   3. Diet experts often recommend at least five servings of fruits and vegetables per day. How well are the people in this sample doing at meeting that standard?
   4. What percentage of persons reported eating one serving per day? This is a frustrating question to answer, given this bar chart. If you had access to these data, what other SPSS output would you want to see to answer this question precisely?
6. Briefly describe, in your own words, three things you look for to decide whether a histogram looks like a “reasonably normal” distribution.
7. Describe the shape of each of the histograms in Table 5.3. Sometimes more than one term can be applied; for example, skewed distributions may also have outliers.
8. What type of plot appears in Figure 5.33? What do the values on the Y axis correspond to? (Score values? Frequencies?) What information can you report from this plot? There are omissions in labeling. What labels could be added to this chart?

Figure 5.32 Results From Warner, Frye, Morrell, and Carey (2017): Number of Servings of Fruits and Vegetables Eaten on a Typical Day, N = 1,250

The X axis represents the number of servings of fruits and vegetables per day, and the Y axis represents the percentage (from 0% to 50%). There are 8 bars, and their values are as follows: 0: 42; 1: 12; 2: 11; 3: 10; 4: ; 5:

Figure 5.33 Figure for Comprehension Question 8: What Is It?

when you compute the following. The values of M and SD can be combined to set up ranges of score values; that is, we can combine information about the mean and information about typical distances from the mean. This can be done using integer multiples of SD, such as M ± 1 × SD and M ± 2 × SD.

For M = 64.5 and SD = 2.5, we obtain the following values for the hypothetical female height data:

$$
\begin{aligned}
M - 2 \times SD &= 64.5 - 5 = 59.5 \\
M - 1 \times SD &= 64.5 - 2.5 = 62 \\
M \pm 0 \times SD &= 64.5 \pm 0 = 64.5 \\
M + 1 \times SD &= 64.5 + 2.5 = 67 \\
M + 2 \times SD &= 64.5 + 5 = 69.5
\end{aligned}
$$

The shorter vertical arrow next to the frequency table in Figure 4.10 extends from M - (1 × SD) to M + (1 × SD). This corresponds to the frequencies enclosed in the smaller ellipse. The longer vertical arrow ranges from M - (2 × SD) to M + (2 × SD), that is, score values from 59.5 to 69.5. This corresponds to scores in the larger ellipse. Most women in the sample had heights that were included in the range M - (2 × SD) to M + (2 × SD); only three women (2.5%) had scores below 59.5, and only two women (1.7%) had scores above 69.5.

In words: When we combine information about distance from the mean (SD) with the location of the mean (M), we obtain information about the range of values within which most of the X scores lie; this is called the range rule. The range rule works only for bell-shaped distributions, as in the present example.

Figure 4.10 Hypothetical Data for Female Height in Inches for N = 120 Women With M = 64.5 and SD = 2.5

The image is a combination of a table and a graph that shows hypothetical data for female height. The table has four columns: valid (height value), frequency, percent, and cumulative percent. Details are below.

Height   Frequency   Percent   Cumulative Percent
58         1           .8         .8
59         2          1.7        2.5
60         3          2.5        5.0
61         8          6.7       11.7
62        14         11.7       23.3
63        12         10.0       33.3
64        20         16.7       50.0
65        16         13.3       63.3
66        18         15.0       78.3
67        15         12.5       90.8
68         5          4.2       95.0
69         4          3.3       98.3
70         2          1.7      100.0
Total    120        100.0

There are two ellipses over the figures: one covers the percent values 11.7, 10, 16.7, 13.3, 15, and 12.5, and the second covers a larger set of percent values including 2.5, 6.7, 11.7, 10, 16.7, 13.3, 15, 12.5, 4.2, and 3.3.

The graph in the second part of the image shows the X and Y axes as well as the 1 × SD and 2 × SD lines. The following figures are mentioned alongside the graph: M - 2 × SD = 59.5; M - 1 × SD = 62; M = 64.5; M + 1 × SD = 67; M + 2 × SD = 69.5.
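The range rule arithmetic for the hypothetical female height data can be sketched as follows. This is a minimal illustration; the function name range_bounds is hypothetical (introduced only for this example), and the counts of scores outside the M ± 2 × SD range are the ones stated in the text (three below 59.5, two above 69.5).

```python
def range_bounds(m, sd, k):
    """Return the lower and upper ends of the range M - k*SD to M + k*SD."""
    return m - k * sd, m + k * sd

m, sd = 64.5, 2.5  # hypothetical female height data, in inches

for k in (1, 2):
    low, high = range_bounds(m, sd, k)
    print(f"M - {k} x SD = {low}, M + {k} x SD = {high}")

# Counts reported in the text for the M +/- 2 x SD range.
n_total = 120
n_below = 3  # scores below 59.5
n_above = 2  # scores above 69.5
pct_within = 100 * (n_total - n_below - n_above) / n_total
print(f"{pct_within:.1f}% of the 120 scores lie within M +/- 2 x SD")
```

With these counts, 115 of the 120 scores (about 95.8%) fall between 59.5 and 69.5, which is what "most of the scores" means in the range rule for this bell-shaped example.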

The Normal Distribution and z Scores

6.1 Introduction

In the previous chapter, you learned to evaluate score location by examining cumulative percentages in frequency tables, which give information such as the percentage of persons who have scores below a specific value of X.

You already know something about score locations in everyday life. To evaluate how tall you are, you look at other people of the same sex and ask, Are most of them taller or shorter than I am? If you see that more than half of them are shorter, you know your height is above average. If something like 90% of people are shorter than you, you know you are much taller than average.

We will need a method to describe locations in distributions that can be generalized to more situations (and that does not require all the information in a frequency table). When distributions have an approximately normal shape, we can evaluate locations of specific X outcomes quickly by converting X values into a unit-free index of distance from the mean. The only information we need for that is M and SD for the distribution of X scores.

6.2 Locations of Individual Scores in Normal Distributions

The new method for score location introduced in this chapter involves two steps: First, we compute a z score that corresponds to the original X score. A z score, also called a standard score, is the distance of an X score (which might be in dollars, kilograms, or degrees Celsius) from the sample mean, in unit-free or standardized terms. Then, we use a table of areas for the standard normal distribution to look up the percentage of scores that fall below that z score. This method works well only if the distribution shape for scores is reasonably close to normal.

To do this, we need to define normal distribution shape more precisely. A normal distribution, also called a Gaussian distribution, appears approximately bell shaped in a histogram. However, many bell-shaped curves do not correspond exactly to normal distributions. What defines a normal distribution is a fixed relationship between distance from the mean and area under the curve. This relationship is given in detail in tables of the standard normal distribution. Appendix 6A provides a brief explanation of the mathematics of the normal distribution. The area below the value of z that corresponds to an X score in a normal distribution is roughly equivalent to the cumulative percentage of scores below that X value in the frequency table.

6.3 Standardized or z Scores

A z score is an index of the distance of an X score from the sample mean that has been converted into unit-free or standardized terms. Suppose that X is height; X scores can be given in different units, such as inches or centimeters. When we convert an X score into a z score, we obtain an index of distance from the mean that is not related to the original units of measurement.

6.3.1 First Step in Finding a z Score for X: The Distance of X From M

The first step toward evaluating the location of a specific score is to find the distance (or deviation) of the X score from the sample mean M in the original units of measurement, such as inches. That distance, also called a deviation from the mean, is (X - M). You have seen this term before. Deviation of individual score X from a sample mean M is:

$$(X - M). \tag{6.1}$$

For example, I am 62 in. tall (X = 62). Let’s assume the mean height of women in a sample is M = 64.5 in., and the standard deviation SD = 2.5. For me, (X - M) = (62 - 64.5) = -2.5. I am 2.5 in. below average height for women in the sample.

The sign of (X - M) tells you whether X is below the mean (if X - M is negative) or above the mean (if X - M is positive). This is part of the information we want. However, the value of (X - M) doesn’t tell us what percentage of persons are shorter or taller than X inches.

6.3.2 Second Step: Divide the (X - M) Distance by SD to Obtain a Unit-Free or Standardized Distance of Score From the Mean

To evaluate how far an individual X score is from M, we can compute a z score (also called a standard score or standardized score):

$$z = (X - M)/SD. \tag{6.2}$$

The values of M and SD differ depending on the unit of measurement (e.g., feet, centimeters, or inches). When we convert X to z, we obtain z scores that are independent of the original unit of measurement. We can say that z scores are standardized or unit free.

Standardization is a very frequently used tool in the statistician’s bag of tricks. You will see this again in many future situations.

As an example, consider one individual female height score (my own), given in both inches and centimeters. The example in Table 6.1 demonstrates that we end up with the same z score even if the units of measurement for X differ. The left-hand column in Table 6.1 provides all the needed information in inches, and the right-hand column gives the corresponding information in centimeters. At the bottom of each column, a z score is computed using the values of X, M, and SD. Note that you convert inches to centimeters by multiplying by 2.54. A woman whose height is 62 in. is 157.48 cm tall.

Table 6.1 (recoverable entries: M = 163.8 in the centimeters column; z = -1.00 in both columns)

The point of this example is that the value of z is the same (within rounding error) whether X height is given in inches or centimeters. I am 62 in. tall (or approximately 157.5 cm). Whether height is given in inches or centimeters, I am 1 standard deviation below the average height for women in this example. This is a demonstration (not a proof) that the value of z does not depend on original units of measurement.

I suggest that you obtain a z score for your own height. For female height in inches, use M = 64.5 and SD = 2.5; for male height in inches, use M = 67.5 and SD = 2.5. To convert inches to centimeters, multiply these values by 2.54. Your z score tells you whether you are above or below average in height relative to the imaginary data in this example. (You can find estimates of male and female height for many different nations online, and use these values if you want to compare your height with national averages.)

6.4 Converting z Scores Back Into X Units

If you know that scores are normally distributed, and you know the values of z, M, and SD, you can convert a z score back into the original score by “reversing” the operations in Equation 6.2. First you multiply z by SD, then you add M, as in Equation 6.3:

$$X = (z \times SD) + M. \tag{6.3}$$

If I know that height is normally distributed, that my z score is -1, and that for height in inches, M = 64.5 and SD = 2.5, then I can find X: X = (-1 × 2.5) + 64.5 = 62.

6.5 Understanding Values of z

A z score can be verbally interpreted. My height is 62 in., and relative to the values of M and SD in the previous section, this corresponds to z = -1.00. A z score of -1.00 tells me that this height is 1 standard deviation below the mean. More generally, once we have a z value, we can say, This score is z standard deviations below the mean (if z is negative) or this score is z standard deviations above the mean (if z is positive).

We don't know yet whether a distance of z = -1.00 is not very far, or very far, below the mean. Is z = -1.00 so far below the mean that when people see me, they think, wow, that's the shortest woman I've ever seen? We need a way to evaluate whether the absolute value of z indicates a notably large, or small, difference from average.

If scores for the variable of interest, such as height, are normally distributed, we can use graphs or tables of z scores for a standard normal distribution to find areas that lie below (or above) z. These are interpreted like cumulative percentages. If I want to compare my height with other heights in a normally distributed sample, I obtain approximately the same information about location if I look at the cumulative percentage in a frequency table or the area below z in a normal distribution. To evaluate location using cumulative percentage, I needed a lot of information (all the scores and frequencies in a frequency table). To evaluate score location using z scores, I need only three pieces of information: the information that the distribution shape is normal, with mean M and standard deviation SD.

That leads to the next question: How do we evaluate whether a distribution of scores is approximately normal?
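Equations 6.2 and 6.3, and the claim that z does not depend on the unit of measurement, can be checked with a short Python sketch. This is an illustration under stated assumptions: the function names z_score and x_from_z are hypothetical (they are not SPSS commands), and the values of X, M, and SD are the ones used in the height example in this chapter.

```python
CM_PER_INCH = 2.54

def z_score(x, m, sd):
    """Equation 6.2: z = (X - M) / SD."""
    return (x - m) / sd

def x_from_z(z, m, sd):
    """Equation 6.3: X = (z * SD) + M."""
    return z * sd + m

# Height example in inches: X = 62, M = 64.5, SD = 2.5.
x_in, m_in, sd_in = 62.0, 64.5, 2.5
z_in = z_score(x_in, m_in, sd_in)

# The same three values converted to centimeters.
x_cm, m_cm, sd_cm = (v * CM_PER_INCH for v in (x_in, m_in, sd_in))
z_cm = z_score(x_cm, m_cm, sd_cm)

print(f"inches:      X = {x_in:.2f}, M = {m_in:.2f}, SD = {sd_in:.2f}, z = {z_in:.2f}")
print(f"centimeters: X = {x_cm:.2f}, M = {m_cm:.2f}, SD = {sd_cm:.2f}, z = {z_cm:.2f}")

# Converting z back into inches recovers the original score.
print(f"X from z = -1: {x_from_z(-1.0, m_in, sd_in):.1f} in.")
```

Multiplying X, M, and SD by the same constant (2.54) changes the numerator and denominator of Equation 6.2 by the same factor, so the ratio, and therefore z, is unchanged apart from rounding.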


6.6 Qualitative Description of Normal Distribution Shape

The term normal has a different meaning in

physical fitness. When we go on to bivariate analyses, we will ask how hr scores are statistically related to other variables, such as amount of anxiety or stress. Results of these analyses can lead to inferences that stress predicts, or perhaps influences, heart rate.

In later chapters you'll see that the overall variance for a variable such as hr can be divided (or partitioned) into proportions of variance that can be predicted from or are related to other variables (such as physical fitness, smoking, anxiety, and caffeine use). Some variables may predict large proportions of variance in heart rate (possibly these are the variables that have the strongest influence on hr). For those of us who are excited about statistics, this is where the fun begins; this is where we can make discoveries or test past research claims about discoveries. Other variables may predict little or none of the variance in hr.

4.16 Reports of Descriptive Statistics in Journal Articles

Most journal articles report descriptive statistics for numerous variables. Information about categorical variables (that describe groups in the study) can usually be provided in sentence form. Usually information for numerous quantitative variables is summarized in table form. The following data are from Warner, Frye, Morrell, and Carey (2017). The predictor variable of most interest was number of servings of fruit and vegetables consumed per day (NCIfv, servings of fruits and vegetables from a National Cancer Institute food frequency questionnaire). Past research suggested that people who eat more fruits and vegetables tend to have higher scores on measures of well-being such as life satisfaction and positive mood. The outcome variable of most interest was life satisfaction (LS). Before doing analyses to evaluate whether NCIfv predicts LS, we need to know about the behavior of scores for each of these variables. This survey was completed by 492 students from a university in New England, including 152 male and 340 female students. They were recruited from introductory courses, 79 from a nutrition course and 413 from psychology classes. All participants were between ages 18 and 24; the modal age was 18. Descriptive statistics for quantitative variables appear in Table 4.1.

Tables of descriptive statistics often use abbreviated names for variables that are used throughout the paper. Notes at the bottom of the table identify the variables and provide additional information about them. Direction of scoring must be clear (for example, we need to know that a score of 5 indicates better sleep, rather than more sleep problems). It is helpful to list variables in sets (in this example, a list of well-being outcome measures, a list of behavioral predictors, and a list of dietary predictors). An earlier “Methods” section in the research report would provide more information about how variables were measured. Information about distribution shapes should be included; this is discussed in Chapter 5.

4.17 Additional Issues in Reporting Descriptive Statistics

Many additional kinds of information can be included in summary tables. The minimum information usually provided for each quantitative variable is M and SD. Table 4.1 included the possible minimum and maximum scores for each variable, on the basis of the way scores were obtained for these variables. Readers

summing the probabilities for the “slices” above z = +1.00: 13.59% + 2.14% + .13% = 15.86%. That is, 15.86% of cases in a perfectly normal distribution have z values greater than +1.00. Note that z = 0.00 corresponds to the mean of this distribution. The sum of all the slices in Figure 6.1 is 100%. The sum of the slices above the mean (above z = 0.00) is 50%, and the area below z = 0.00 is also 50%.

Figure 6.1 Areas in Normal Distribution That Correspond to z Scores

At the top is a scale that shows the percentage of area that falls under the curve between different z scores: -1 to +1 is about 68%, -2 to +2 is about 95%, and -3 to +3 is about 99.7%. There are 6 z scores, 3 each on the positive and negative sides of 0. The areas covered are:
* 0 to +1 and 0 to -1 each correspond to 34.13%.
* +1 to +2 and -1 to -2 each correspond to 13.59%.
* +2 to +3 and -2 to -3 each correspond to 2.14%.
* The outer edges beyond -3 and beyond +3 each correspond to .13%.

Because the distribution is perfectly symmetrical (a mirror image), the percentage of area below a specific negative value of z (such as z = -1.00) is the same as the percentage of area that is above the corresponding positive value of z (in this example, z = +1.00); that is, 15.86% of the area lies above z = +1.00, and 15.86% of the area lies below z = -1.00.

Because the total area under the curve is 100%, once we know the percentage of cases that lie above a value of z, we can find the percentage of cases below z by subtraction. Because 15.86% of scores lie above z = +1.00, we know that (100% - 15.86%) = 84.14% of cases lie below z = +1.00.

6.8 Areas Under the Normal Distribution Curve Can Be Interpreted as Probabilities

If you were to draw a case at random from a normally distributed population of scores, the probability that it would have a z score greater than z = +1.00 is 15.86%. The probability that a randomly drawn case will have z > 0.00 is 50%. In other words, areas can be interpreted as probabilities. For integer values of z such as z = +1.00, the diagram in Figure 6.1 can be used to answer questions about area and probabilities. However, z values are often not integers. Areas that correspond to other (noninteger) values of z can be obtained from tables of the standard normal distribution, as discussed in the next section.

To summarize information about areas in the standard normal distribution:
* The total area under the curve is 100%.
* The area below the mean = 50%; the area above the mean = 50%. The mean is z = 0.
* The area above a specific value of +z, such as z = 1.96, is the same as the area below -z (-1.96).
* Areas can be combined by addition and subtraction.
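The area facts summarized above can be checked numerically. The sketch below is an illustration that uses NormalDist from Python's standard statistics module (available in Python 3.8 and later) in place of the diagram in Figure 6.1; the variable names are chosen here for the example. Because the code works from the exact normal curve rather than from slice percentages rounded to two decimals, its results can differ from sums of the rounded slices in the second decimal place.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, SD 1

# Area below and above z = +1.00 (above is found by subtraction).
area_below_1 = std_normal.cdf(1.0)
area_above_1 = 1 - area_below_1

# Symmetry: the area below z = -1.00 equals the area above z = +1.00.
area_below_minus_1 = std_normal.cdf(-1.0)

# Areas between -k and +k for k = 1, 2, 3 (about 68%, 95%, 99.7%).
area_within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(f"area above z = +1.00: {100 * area_above_1:.2f}%")
print(f"area below z = -1.00: {100 * area_below_minus_1:.2f}%")
print(f"area below z = +1.00: {100 * area_below_1:.2f}%")
for k, area in area_within.items():
    print(f"area between -{k} and +{k}: {100 * area:.1f}%")
```

Each printed area can also be read as a probability: for example, the area above z = +1.00 is the probability that a randomly drawn case from a normal population has a z score greater than +1.00.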

Standard normal distribution tables generally give area in terms of proportion; people often talk about areas in terms of percentages. To convert proportion to percentage, multiply proportion by 100.

6.9 Reading Tables of Areas for the Standard Normal Distribution

The equation in Appendix 6A can be used to generate normal distributions for any values of the mean and standard deviation that you want. For example, a normal distribution for IQ scores would have a mean of 100 and a standard deviation of 15. The standard normal distribution has a mean of 0 and a standard deviation of 1; it corresponds to a distribution of z scores. Figure 6.1 provides only areas related to integer values of z. In practice we will often need areas that correspond to noninteger values. More detailed information about z score distances from the mean, and areas under the normal distribution, is given in tables of the standard normal distribution. See the table in Appendix A at the back of this book. Part of that table appears in Figure 6.2 (for selected values of z that range from 1.83 to 2.12). Figure 6.3 shows enlarged versions of the diagrams that appear at the top and bottom of the table; these diagrams indicate which slices or areas correspond to the numbers in the table.

For each value of z, the table provides two kinds of information about z score location. Column A lists the z values. Column B gives the area between z = 0.00 and the z value you want to evaluate. (Recall that z = 0.00 corresponds to X = M.) Column C gives the area that lies beyond the z value you want to evaluate (out in the tail of the distribution).

There are several ways to use this table to describe the location of a score with z = +1.96. Here is the easiest.

Suppose we want to know the proportion of area that lies above, and the proportion that lies below, z = +1.96.

Locate the value of z = 1.96 in column A in Figure 6.2. The corresponding number in column C, the “tail area,” is .025.

We can convert from proportion to percentage by multiplying by 100; 2.5% of the area in this distribution lies above z = +1.96. By subtraction, 97.5% of the area in this distribution lies below z = +1.96. The percentage of area below a z score is like the cumulative percentage in a frequency table. We could say this score is at the 97.5th percentile.

This tells us that a person who has a z score of +1.96 has an unusually high score. We can also think in terms of probability. If a person is randomly selected from this distribution of scores, there is a 2.5% probability that the person will have a higher score, and a 97.5% probability that the person will have a lower score, than z = +1.96. (We can convert z scores back into units for X if we want to make these statements in terms of X score values.)

When z is negative, use the diagrams at the bottom of the table to identify which slices of area in the distribution correspond to ranges of z values. Because the distribution is symmetrical, we know the following:

The area between z = 0.00 and z = +1.96 is the same as the area between z = 0.00 and z = -1.96
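The two areas that the table reports for each z value can be reproduced with a short Python sketch. This is an illustration only: the function name table_row is hypothetical, the labels "column B" and "column C" follow the description of the table given above, and NormalDist comes from Python's standard statistics module (Python 3.8 and later) rather than from the table in Appendix A.

```python
from statistics import NormalDist

def table_row(z):
    """Return (z, area between 0 and z, tail area beyond z) for z >= 0."""
    std_normal = NormalDist()                 # mean 0, SD 1
    area_mean_to_z = std_normal.cdf(z) - 0.5  # column B
    tail_area = 1 - std_normal.cdf(z)         # column C
    return z, area_mean_to_z, tail_area

z, col_b, col_c = table_row(1.96)
print(f"z = {z:.2f}, column B = {col_b:.3f}, column C = {col_c:.3f}")

# Convert proportions to percentages by multiplying by 100.
pct_above = 100 * col_c
pct_below = 100 - pct_above
print(f"{pct_above:.1f}% of the area lies above z, {pct_below:.1f}% lies below")

# By symmetry, the area between 0 and -1.96 equals the area between 0 and +1.96.
area_zero_to_minus_z = 0.5 - NormalDist().cdf(-1.96)
print(f"area between 0 and -1.96 = {area_zero_to_minus_z:.3f}")
```

Columns B and C always add up to .50, because together they cover the half of the distribution that lies on one side of the mean.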

mode, or are skewed, M is sometimes not the best description of the “typical” response. When you report a mean, you need to tell readers something about the shape of the frequency distribution to provide the background information needed to understand potential problems with the mean.

Statistics books provide so many examples of bell-shaped distributions that students may assume that all data have this distribution shape. However, many common kinds of variables do not have bell-shaped distributions. Graphs, discussed in Chapter 5, can be used to evaluate whether scores have a bell-shaped distribution or some other distribution shape. We should not assume that all distribution shapes are bell shaped. When reporting information about variables, remember that readers may assume a bell-shaped distribution if you do not explain clearly that the distribution shape is different.

If you read mass media reports about “averages,” you need to know whether average was estimated using the mode, median, or mean; under some circumstances, these three descriptive statistics can yield very different values.

Consider this set of scores: X = [1, 3, 5, 2]. If you square each X value and then sum the squared values, you would obtain (1 + 9 + 25 + 4) = 39. If you sum the X's and then square that sum, you would obtain (1 + 3 + 5 + 2)² = 11² = 121. It is important to know which arithmetic operation to do first.

There are rules of precedence (order) for arithmetic operations (see http://mathworld.wolfram.com/Precedence.html). When I present equations I explain in words the order in which computations should be done, and often, I use extra parentheses to make this clear in the equation. When an expression appears within parentheses, such as (X - 5), do that operation first. If you see Σ(X²), square each X value first, and then sum the squared X values: (1 + 9 + 25 + 4) = 39. If you see (ΣX)², sum the X values first, and then square the sum: (1 + 3 + 5 + 2)² = 11² = 121.

Be aware that if you do arithmetic operations in the wrong order, you can obtain answers that are incorrect by huge amounts.

Appendix 4B: Rounding

The next chapter provides further information

Computer programs often provide numbers given

about obtaining and interpreting graphs of

to several decimal places. Each number that

frequency distributions and additional questions

comes after a decimal point represents one

we can ask about distributions of scores on a

decimal place. For example, the number 4.171 has

quantitative variable.

three decimal places.

Appendix 4A: Order of Arithmetic Operations

If you do by-hand computations, you should retain at least three decimal places during your computations to minimize rounding error. Final results are usually rounded to a small number of

Many equations combine two or more arithmetic

decimal places, often two decimal places. The

operations, for example, XX? includes both

preferred number of decimal places to report

squaring and summing X scores. When operations

differs across disciplines and may differ across

are combined,the result often differs depending

variables. Use common sense. It would besilly to

upon the order in which operations are done.

say that the average American gets 7.481 hours of
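The order-of-operations example from Appendix 4A can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for this example and are not part of the text.

```python
# Scores from the Appendix 4A example.
scores = [1, 3, 5, 2]

# Sum of squared scores: square each X value first, then sum.
sum_of_squares = sum(x ** 2 for x in scores)

# Squared sum of scores: sum the X values first, then square the sum.
square_of_sum = sum(scores) ** 2

print(sum_of_squares)  # 39
print(square_of_sum)   # 121
```

The two results differ by a large amount even for four small scores, which is why the parentheses in an equation matter.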


The values of z from 1.83 to 2.12 and the values under B and C have been shown as a table. At the top and bottom, there are figures showing the extent of area covered by B and C. One specific z value, 1.96, which corresponds to .4750 for B and .0250 for C, has been highlighted.

Table values:

z (A)    B         C
1.83     0.4664    0.0336
1.84     0.4671    0.0329
1.85     0.4678    0.0322
1.86     0.4686    0.0314
1.87     0.4693    0.0307
1.88     0.4699    0.0301
1.89     0.4706    0.0294
1.90     0.4713    0.0287
1.91     0.4719    0.0281
1.92     0.4726    0.0274
1.93     0.4732    0.0268
1.94     0.4738    0.0262
1.95     0.4744    0.0256
1.96     0.4750    0.0250
1.97     0.4756    0.0244
1.98     0.4761    0.0239
1.99     0.4767    0.0233
2.00     0.4772    0.0228
2.01     0.4778    0.0222
2.02     0.4783    0.0217
2.03     0.4788    0.0212
2.04     0.4793    0.0207
2.05     0.4798    0.0202
2.06     0.4803    0.0197
2.07     0.4808    0.0192
2.08     0.4812    0.0188
2.09     0.4817    0.0183
2.10     0.4821    0.0179
2.11     0.4826    0.0174
2.12     0.4830    0.0170
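The Column B and Column C areas in the table excerpt can be reproduced in software. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `table_areas` is hypothetical, and it assumes a nonnegative z value, as in the excerpt.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def table_areas(z):
    """Return (Column B, Column C) areas for a nonnegative z value."""
    area_below = standard_normal.cdf(z)   # total area below z
    column_b = area_below - 0.5           # area between z = 0.00 and z
    column_c = 1.0 - area_below           # area beyond z
    return round(column_b, 4), round(column_c, 4)

b, c = table_areas(1.96)
print(b, c)

# Percentile rank for z = +1.96: the 50% of area below z = 0 plus Column B.
print(round(50 + 100 * b, 1))
```

Because the distribution is symmetrical, the same two areas apply to z = -1.96, with "beyond" meaning below rather than above.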

Figure 6.3 Detail From Standard Normal Distribution Table

There are four diagrams that show the area between 0 and z as well as beyond z for positive and negative values of z. The first diagram highlights the area between 0 and positive value of z in a normal distribution diagram. The area to the left of 0 has been marked as Area below 0 equals 50 percent. The second diagram, to the right of the first, highlights the area beyond positive z in a normal distribution diagram. This has been shown as the Area above positive z. The third diagram, below the first, highlights the area between 0 and negative value of z in a normal distribution diagram. The fourth diagram, to the right of the third, highlights the area beyond negative z in a normal distribution diagram.

Textbooks sometimes drill students in the use of the normal distribution table with questions such as “What percentage of area lies between z = -1.00 and z = +2.00?” These artificial examples do not correspond to the kinds of questions that are of real interest to data analysts.

Data analysts usually want to answer a simple question: Is an X score or other outcome close to, far from, or extremely far away from the mean?

Data analysts sometimes choose different numerical values to define “far from.” The following z values are common ways of thinking about distance from the mean.

* Values between z = -1.00 and z = +1.00 are “close” to the mean.
* Values between z = -2.00 and z = +2.00 (but outside the range -1.00 and +1.00) are “in between” close and far from the mean.
* Values below z = -2.00 or above z = +2.00 are “far from” the mean.
* Values below z = -3.00 or above z = +3.00 are “very far from” the mean.
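The distance labels for z values can be expressed as a small function. This is a minimal sketch, not a standard routine: the function names are hypothetical, the mean and standard deviation in the usage line are invented for illustration, and the handling of values exactly at 1, 2, or 3 is an assumption because the text does not specify it.

```python
def z_score(x, mean, sd):
    """Convert an X score to a z score, given M and SD."""
    return (x - mean) / sd

def describe_distance(z):
    """Label a z score using the cutoffs 1, 2, and 3 in absolute value."""
    size = abs(z)
    if size <= 1.0:
        return "close"
    if size <= 2.0:
        return "in between"
    if size <= 3.0:
        return "far from"
    return "very far from"

# Hypothetical example: a height of 195 cm when M = 170 and SD = 10.
z = z_score(195, 170, 10)
print(z, describe_distance(z))
```

Individual researchers may prefer other cutoffs, so the three boundary values are best treated as adjustable.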

6.10 Dividing the Normal Distribution Into Three Regions: Lower Tail, Middle, and Upper Tail

A normal curve divided into these areas appears in Figure 6.4. Individual researchers are free to use other values of z as criteria for distances.

Figure 6.4 Areas That Are Close, Far, and Very Far From the Mean (in z Score Units)

The X axis denotes the z score and ranges from minus 3 to plus 3, with 0 as the center. The area between minus 1 and plus 1 is under the highest part of the curve and has been termed Close to M. The area between minus 1 and minus 2 and plus 1 and plus 2 has been termed Between. The area between minus 2 and minus 3 and plus 2 and plus 3 has been termed Far. The outer edges beyond minus 3 and plus 3 have been termed Very far.

Researchers are often interested in the situation where the areas beyond ±z sum to exactly 5%. A normal distribution can be divided into three areas:

* 2.5% of the area below -z, the “lower tail,”
* 95% of the area in the center, and
* 2.5% of the area above +z, the “upper tail.”

These areas appear in Figure 6.5. The exact value of z that “cuts off” 2.5% of area in each tail, with 95% of the area in the center, is z = ±1.96.

Figure 6.5 Normal Distribution Divided Into Areas Below z = -1.96, Between z = -1.96 and +1.96, and Above z = +1.96

The X axis denotes the z score and ranges from minus 3 to 1.96, with 0 as the center. The area between plus 1.96 and minus 1.96 has been termed Middle 95 percent. The area beyond minus 1.96 is called the Bottom 2.5 percent and beyond plus 1.96 has been termed Top 2.5 percent.

Another common way to divide the distribution into lower tail, center, and upper tail appears in Figure 6.6.

Figure 6.6 Bottom .5%, Middle 99%, and Top .5% of Normal Distribution

The X axis denotes the z score and ranges from minus 2.576 to plus 2.576, with 0 as the center. The area between plus 2.576 and minus 2.576 has been termed Middle 99 percent. The area beyond minus 2.576 is .5 percent and beyond plus 2.576 is .5 percent.

In Chapter 7 (on confidence intervals) we will focus on the range of values that is “not very far from the mean,” that is, the middle 95%. There is a 95% chance that a randomly selected case will have a score that lies in the center area. In Chapter 8 (on significance tests), we focus on the areas in the lower and upper tails. There is a 2.5% chance that a randomly selected score will lie in the lower tail and a 2.5% chance that a randomly selected score will lie in the upper tail. These two areas combined describe outcomes that can be called “far away from” the mean. You should develop a sense that z scores larger than 2 in absolute value (the rounded value for 1.96) indicate that an outcome is usually considered far from the mean (and therefore unusual or unlikely). Also, z score values larger than 3 in absolute value are very far from the mean (and therefore very unusual or unlikely).
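The cutoff values 1.96 and 2.576 can be recovered from the middle area they enclose. The following sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `central_cutoff` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def central_cutoff(middle_proportion):
    """Return the positive z that leaves equal tails around the middle area."""
    tail = (1.0 - middle_proportion) / 2.0   # area in each tail
    return standard_normal.inv_cdf(1.0 - tail)

print(round(central_cutoff(0.95), 3))  # 1.96
print(round(central_cutoff(0.99), 3))  # 2.576
```

The same function gives cutoffs for any other middle area a researcher might choose, such as 0.90.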

6.11 Outliers Relative to a Normal Distribution

A score that has a very large distance from the mean (and therefore a very large absolute value of z) is called an outlier. It is possible to use z scores to identify scores as outliers. Tabachnick and Fidell (2018) suggested that scores with z values less than -3.29 or greater than +3.29 can be called outliers. Scores can be identified as outliers using other criteria, for example, location in a boxplot. Boxplots and z scores may not identify the same scores as outliers. Many other rules can be used to identify outliers (Aguinis, Gottfredson, & Joo, 2013).

Outliers create problems in many statistical analyses. For example, the value of the sample mean M is not robust against the effect of outliers. When you see outliers in a sample, at a minimum, you need to report:

* The method you used to identify cases as outliers
* The number of outliers
* The decisions you made about handling outliers

There are several possible ways to handle outliers, and none of them is a perfect solution:

1. Leave the outliers in.
2. Remove outliers from the data set before analysis (using methods described in Appendix 6B).
3. Modify the values of outliers (i.e., change the score value of an outlier to the next nearest score value that is not an outlier; this is called Winsorizing; Aguinis et al., 2013).
4. Use a nonparametric analysis that can reduce the effects of outliers; for instance, report the median instead of the mean.
5. Use robust statistical methods (these are beyond the scope of this book; see Field and Wilcox, 2017, for an introduction).

Ideally you decide the method you will use to identify outliers, and the method you will use to handle them, before you collect data. For example, you could use z scores (which work well for normally distributed samples) or boxplots (which are preferable for non-normally distributed samples) to identify outliers. You must describe the criteria for outliers, the number of outliers, and the handling of outliers in the research report.

It can be useful as a student exercise to “experiment” with outliers in data (data that you will not publish!). You can evaluate how results of analyses change when outliers are retained versus removed. In actual research, you should commit to decisions ahead of time. It is bad practice to “experiment” with outliers in data you plan to publish. You should not drop outliers in various ways until you obtain the outcome you want, and then report one final outcome without explaining that it was “cherry-picked” from a large number of different analyses.
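As a student exercise of the kind described for outliers, the z score criterion of 3.29 in absolute value can be tried on a small data set. This is a sketch under stated assumptions: the heart rate scores are invented for illustration, the function name `flag_outliers` is hypothetical, and the sample standard deviation (dividing by N - 1) is used.

```python
from statistics import mean, stdev

def flag_outliers(scores, cutoff=3.29):
    """Return scores whose z score exceeds the cutoff in absolute value."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD, dividing by N - 1
    return [x for x in scores if abs((x - m) / sd) > cutoff]

# Hypothetical heart rates with one extreme high-end score.
scores = [72, 74, 76, 73, 75, 74, 77, 71, 75, 74,
          73, 76, 72, 75, 74, 73, 76, 74, 75, 140]

outliers = flag_outliers(scores)
retained = [x for x in scores if x not in outliers]

# Report the criterion, the number of outliers, and the effect on M.
print("criterion: |z| > 3.29; outliers found:", outliers)
print("M with outliers:", round(mean(scores), 2))
print("M without outliers:", round(mean(retained), 2))
```

Comparing the two means shows how strongly a single extreme score can pull M, which is why the identification rule and the handling decision both need to be reported.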

6.12 Summary of First Part of Chapter

At this point you should be able to do the following.

* Convert an X score (for example, height in centimeters) into a z score, given values of M and SD.
* Given a z score and values of M and SD, find the original X score.
* Given a diagram of the normal distribution (as in Figure 6.1), find the percentage of area above or below any integer value of z, or the percentage of area between any two integer values of z.
* Using the table of the normal distribution in Appendix A at the end of the book, find the percentage of area above or below any noninteger value of z, or the percentage of area between any two noninteger values of z. However, this is less important for further work in statistics than understanding the idea of dividing a normal distribution into regions (lower tail, center, and upper tail).
* Decide whether an X value is far away from the mean, on the basis of its absolute value of z. I suggest that you call values of z greater than 2 in absolute value “far from” the mean and values of z greater than 3 in absolute value “very far from” the mean.

6.13 Why We Assess Distribution Shape

You should always examine histograms or boxplots or other graphs for scores on quantitative variables before you do additional analyses. These graphs provide information you need to do the following things:

1. Describe distribution shapes for your variables.
2. Detect outliers.
3. Evaluate whether data meet requirements and assumptions for statistical analyses you plan to do.

The third point (evaluating possible violations of assumptions) will be discussed for each new analysis when it is introduced; you do not need to worry about it now.

Refer back to Tables 5.1 and 5.2 to see examples of histograms that represent common distribution shapes. Table 5.1 shows some approximately normal distributions with slight departures from normal shape, such as outliers and mild to moderate skewness. Other histograms were clearly non-normal (such as the uniform and reverse J-shaped distributions). At this point, when you look at a histogram for sample data, try to find a good match for your histogram shape in these tables. It is okay if you cannot find a match. Some distributions in samples don’t have any simple shape.

1. Distributions that resemble those in Table 5.1 can be judged “reasonably normal” in shape, with appropriate modifications to descriptions such as “moderately positively skewed.”
2. Distributions that resemble those in Table 5.2 are not at all close to normal in shape. Some of these distribution shapes can cause serious problems if they are analyzed using the basic bivariate techniques in this book and may require different and more advanced analyses.
3. Whether a distribution is approximately normal in shape or not, pay attention to outliers. Outliers can have a disproportionate impact on results. You must acknowledge the presence of outliers and decide what to do with them (even if your decision is to leave them in).
4. It is possible to do quantitative tests for departure from normality, as described in Appendix 6A at the end of this chapter.

However, quantitative tests for skewness, kurtosis, and overall departure from normal distribution shape are often not very useful in practice. Results of these tests often depend more on sample size than on degree of departure from normality (these tests almost always signal problems with distribution shape, even for distributions that are similar to normal, when samples are large, for example, N > 200). Furthermore, some statistical tests (not all tests) are fairly robust to violations of assumptions about normal distribution of scores in the population.

6.14 Departure From Normality: Skewness

One common departure from ideal normal distribution shape is skewness. Skewness is asymmetry; an ideal normal distribution is perfectly symmetrical. If you could “fold” a normal distribution along the line that corresponds to the mean, the two halves would match. Skewness describes the degree to which a histogram deviates from perfect symmetry. We say that a distribution is positively skewed if it is “heavy” at the lower end and has a longer, thin tail at the upper end. Conversely, we say that a distribution is negatively skewed if it has a longer, thinner tail at the lower end. Figure 6.7 shows schematic examples of positive and negative skewness. Visual examination of a histogram is often sufficient to decide whether there is notable skewness. A quantitative index of skewness can be requested from SPSS (see Appendix 6C), but it isn’t needed in most situations.

Figure 6.7 Examples of Distribution Shapes for Positive, Zero, and Negative Skewness

Some common situations cause data to be skewed. For example, there may be a lower limit to score values (a person cannot have fewer than 0 children) or an upper limit to scores (a student cannot obtain more than 100% correct on an exam). Variables such as annual income tend to be strongly positively skewed because minimum income is 0, but there is virtually no limit to income at the upper end of the distribution. Figure 6.8 shows substantial positive skewness (along with a possible floor effect and high-end outliers). A floor effect occurs when there is a limit to possible scores at the low end of a distribution. For most students, the quiz was too hard. For example, a student cannot earn an exam score less than 0 points. If an exam is extremely difficult, many students will earn very low scores, and few students will earn high scores, as in the hypothetical distribution in Figure 6.8.
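To see how a quantitative skewness index behaves, the sketch below computes one common version, the bias-adjusted sample skewness. It is an assumption here that this matches the index SPSS reports; the function name and the three small data sets are invented for illustration. The sign convention is the one described for positive and negative skewness: greater than 0 for a long tail at the high end, less than 0 for a long tail at the low end.

```python
from statistics import mean, stdev

def sample_skewness(scores):
    """Bias-adjusted sample skewness (assumed, not confirmed, to match SPSS)."""
    n = len(scores)
    m = mean(scores)
    sd = stdev(scores)  # sample SD, dividing by N - 1
    cubed = sum(((x - m) / sd) ** 3 for x in scores)
    return n / ((n - 1) * (n - 2)) * cubed

# Hypothetical data sets.
long_high_tail = [0, 0, 1, 1, 1, 2, 2, 3, 5, 9]     # e.g., number of children
symmetric = [1, 2, 3, 4, 5]
long_low_tail = [10 - x for x in long_high_tail]    # mirror image

for label, data in [("long tail on high end", long_high_tail),
                    ("symmetric", symmetric),
                    ("long tail on low end", long_low_tail)]:
    print(label, round(sample_skewness(data), 3))
```

In practice a histogram of the same scores usually tells you what you need to know; the index mainly confirms the direction of the asymmetry.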

Figure 6.7 panels: Positive skewness (long tail on high end), SPSS skewness > 0. Not skewed (perfect symmetry), SPSS skewness = 0. Negative skewness (long tail on low end), SPSS skewness < 0.

mean) than for an ideal normal distribution. The patterns of scores in the center of platykurtic (sometimes described as “flatter” than normal) and leptokurtic (sometimes described as more “peaked” than normal) distributions can vary in ways that do not correspond to the graphs that appear in some textbooks. It is inaccurate to describe kurtosis simply as degree of “peakedness” (Westfall, 2014).

In practical applications of statistics, you can ignore kurtosis. Visual examination of distribution shape in histograms and boxplots provides more useful information about related potential problems, such as extreme outliers. More complete information about kurtosis (for curious readers) is provided in Appendix 6C.

6.16 Overall Normality

[...] > 200), these tests almost always indicate significant departures from normality. The results of these tests of normality often depend more on sample size than on distribution shape (University College London, Great Ormond Street Institute of Child Health, 2010). In most situations, simple visual examination of a histogram is enough to evaluate whether sample data are reasonably normally distributed. Quantitative tests for overall departure from normal distribution shape (essentially, comparing the shape of the histogram in your sample with an ideal normal distribution) appear in Appendix 6C.

Some textbooks say that a normal distribution of scores in a sample is a required assumption for the use of many common statistics. Strictly speaking, that is incorrect (Field, 2018). (An assumption involved in developing many of the statistics you will use was that scores were randomly sampled from a normally distributed population, but we usually don’t have enough information to evaluate distribution shape in the population.)

In practice, some departures from normal distribution shape, such as extreme outliers, do cause problems in data analysis. Distribution shape is discussed in later chapters when it is important for specific analyses.

6.17 Practical Recommendations for Preliminary Data Screening and Descriptions of Scores for Quantitative Variables

When you work with quantitative variables, you should do the following things.

* In all research, decide the value of N before you begin to collect data. (Do not collect data, repeatedly analyze it, collect more data because you are not happy with results, and then stop at a point where you have results you like.)
* Choose the method for outlier identification (such as boxplots or z scores) before you collect data.
* Establish rules for inclusion or exclusion of cases ahead of data collection. (For example, you may want to include a limited range of ages, or only right-handed persons, in your sample.)
* Decide how you will handle outliers before you collect data.
* If you anticipate skewness, think about what you might do to reduce skewness ahead of time. In many cases, if skewness is not extreme, you don't need to do anything about it.
* Collect data.
* Obtain a frequency table; identify impossible or questionable score values and note percentage of missing values.
* Obtain a histogram and visually examine it to evaluate distribution shape and skewness. Unless skewness is extreme, you probably don’t need to do anything about it.
* To evaluate outliers, obtain a boxplot and/or z scores for all cases. Either boxplots or z scores can be used to identify outliers. Note the number and locations of outliers.
* Your decision whether to use mean or median (as well as choices among later statistics) may depend on distribution shape and whether outliers are present.
* Document every decision you made.

6.18 Reporting Information About Distribution Shape, Missing Values, Outliers, and Descriptive Statistics for Quantitative Variables

You use all the information discussed in Chapters 3 through 6 to describe the behavior of each quantitative variable early in your research report. Try to communicate the pattern of information as clearly as possible. Information about distribution shape can be summarized in statements such as:

Heart rates were approximately normally distributed, with N = 100, M = 74, and SD = 4.5. There were no missing values. Using z > 3.29 in absolute value as the criterion for identifying outliers, there were no outliers.

The initial data set had N = 340 heart rate scores, with M = 76 and SD = 6.5. There were 20 missing values. Using z > 3.29 in absolute value as the criterion for identifying outliers, there were 10 outliers, all at the upper end of the distribution. On the basis of prior plans for data handling, the 20 missing values and 10 outliers were removed from the data set, leaving N = 310 cases for analysis. For these 310 cases, M = 68 and SD = 5.7.

Number of daily servings of fruit and vegetables had a possible range of scores from 0 to 8. Scores were not normally distributed;

each bar.

Figure 5.4 Frequencies Dialog Box and Frequencies: Charts Dialog Box

There are two boxes, and the one on the left has a variable titled marital. Below is a check box named display frequency tables. At the bottom are option buttons for the following: OK, Paste, Reset, Cancel and Help. On the right are the buttons Statistics, Charts, Format and Help. The Charts option has been depressed. The Frequencies: Charts dialog box has four chart type options: none, bar charts, pie charts and histograms. The bar charts option has been checked. The chart values tab has two choices, frequencies and percentages. Frequencies has been selected. At the bottom are the option buttons Continue, Cancel and Help.

Figure 5.5 Bar Chart for Hypothetical Marital Status Groups, Total N = 42

The X axis denotes the marital status of never married, engaged, married, divorced and widowed. The Y axis denotes the frequencies. The details are as follows: never married: 20; engaged: 4; married: 11; divorced: 4; widowed: 3.

5.4 Good Practice for Construction of Bar Charts

Bar charts and other graphs should provide accurate information that is easy to understand. It is easier for readers to understand graphs when they follow simple rules and conventional standards.

1. A separate bar represents the frequency (or proportion or percentage of cases) for each group. The height of the bar corresponds to the number or frequency in each group (or the proportion or percentage of cases in each group). The labels on the Y axis should make clear whether frequency, proportion, or percentage is reported. However, the relative heights of the bars are the same no matter which label is used. (Usually bars are vertical, but it is possible to set up bar charts in which bars are horizontal.)
2. Names of groups are specified by labels on the X axis.
3. Bars should have equal widths. (This rule is not always followed.)
4. The height of the graph (Y axis) is usually less than the width of the X axis (the height of Y is often about 75% the length of X).
5. The Y axis begins at 0 (or at another minimum value of Y.

distribution shape, we need to provide much more information to provide a complete description of different scores or responses. In some situations, we may need to report a complete frequency table to provide full information.

2. Problems in the distribution: Information about distribution shape is needed to identify potential problems such as outliers and skewness.

3. Describing quantitative variables in research reports: If there are few variables, you might summarize information about each variable in a sentence such as “Scores on X were approximately normally distributed,” or “Scores on X were extremely positively skewed, with 3% missing values, and two low-end outliers identified by location in a boxplot.” If there are many variables, a table could summarize this information.

The skills you need to remember from this chapter are:

* How to convert an X score into a z score (given values of M and SD).
* How to convert a z score back into an X score (given values of M and SD).
* How to find the percentage of area above or below any value of z or the percentage of area between any two values of z.
* How to identify outliers.
* How to summarize information about a quantitative variable, including at least distribution shape, missing values, outliers, and descriptive statistics such as M and SD.

The material in this chapter is extremely important; two widely used statistical procedures (confidence intervals and statistical significance tests) depend on understanding the way areas of normal distributions (and other similar distributions) are used to identify “common” versus “uncommon” (or rare or unexpected) outcomes.

Appendix 6A: The Mathematics of the Normal Distribution

A function is an equation that generates values for a Y variable on the basis of values of one or more X variables. The very simple function to convert height in inches (X) to height in centimeters (Y) is Y = 2.54 × X. This is a linear function; if you plot values of Y (vertical axis) against values of X (horizontal axis), the equation corresponds to a straight line, as shown in Figure 6.11. The equation (function) for the normal (or Gaussian) distribution is much more complicated, and it generates a curve (not a straight line). The equation uses a lot of notation you have not seen yet. The key things to notice are that:

* Y represents the height of the curve (on the vertical axis)
* (X - μ) represents the distance of an X score from the mean (on the horizontal axis of the plot of the function)

Equation 6.4 generates a value for Y (the height of the distribution) as a function of the distance of an X score from the mean. Other

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^{2}}{2\sigma^{2}}} \qquad (6.4)$$
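The normal curve function described in Appendix 6A can be evaluated directly. The sketch below assumes that Equation 6.4 is the usual normal (Gaussian) density, with μ as the mean and σ as the standard deviation; the function name `normal_height` is hypothetical.

```python
import math

def normal_height(x, mu, sigma):
    """Height Y of the normal curve at score x (assumed form of Equation 6.4)."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Heights for the standard normal curve (mu = 0, sigma = 1) at z = -3, ..., +3.
for z in range(-3, 4):
    print(z, round(normal_height(z, 0.0, 1.0), 4))
```

The printed heights are largest at the mean and identical at equal distances above and below it, because the function depends only on the squared distance (X - μ)².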

6. The top of each bar is labeled with an exact numerical value (a frequency or a percentage). SPSS does not do this for you; I added this information using SPSS Chart Editor.
7. Information about total N must be provided.
8. In a footnote or the body of the text, the source of data should be stated. Readers tend to assume that numbers are based on new data collected by the researcher; if there is another source (such as Gallup polls or the U.S. census), that source must be identified.
9. Bars in bar graphs for categorical variables usually do not touch one another. (This reminds readers that bars represent distinct groups.)

When you generate bar charts for frequencies in SPSS, many of these good form requirements are taken care of by default (e.g., bars are equal widths, and the Y axis begins at 0).

5.5 Deceptive Bar Graphs

The most common way to make a bar chart for group frequencies “lie” is to set up the Y axis so that it does not start at 0. To illustrate this deception, I modified the graph in Figure 5.5 so that the Y axis begins at 2 (instead of 0). The modified bar chart in Figure 5.6 is potentially misleading because people tend to look at the ratio of bar heights (or bar areas) when they compare group sizes; people often do not pay close attention to the specific values indicated on the Y axis. In Figure 5.6, the differences in group sizes appear larger than in Figure 5.5. In Figure 5.6, the never married group appears to have about 10 times as many members as the widowed group (measure the height of the bar for never married and divide this by the height of the bar for the widowed group). When actual group sizes are considered (20 never married, 3 widowed), the never married group is only about 7 times as large as the widowed group.

Figure 5.6 An Example of Bad Practice: Deceptive Bar Chart for Frequency of Marital Status

The X axis denotes the marital status of never married, engaged, married, divorced and widowed. The Y axis denotes the frequencies. The details are as follows: never married: 20; engaged: 4; married: 11; divorced: 4; widowed: 3.

Figure 5.7 Deceptive Bar Chart: Use of Cartoons Instead of Bars to Represent Frequencies

The X axis denotes the year, 2009 and 2019, and the Y axis denotes the number of new houses built and ranges from 0 to 10,000.

The X axis denotesthe z score and ranges from minus3 to plus 3, with as the center. The standard deviation hasbeen providedas 1 and the meanis 0.

second Select Cases: If dialog box.

The area on either side of the mean, that is, between 1 and 0 on the right as well as between minus 1 and 0 on the left, is equal to 34.13 percent each.

The area between plus 1 and plus 2 on the right and minus 1 and minus 2 on the left is equal to 13.59 percent each. The area between plus 2 and plus 3 on the right and minus 2 and minus 3 on the left is equal to 2.14 percent each. The area beyond plus 3 and minus 3 on either side is 0.13 percent. At the top of the figure, three lines show the area under the curve. The area under minus 1 to plus 1 is 68 percent. The area under minus 2 to plus 2 is 95 percent. The area under minus 3 and plus 3 is around 99 percent.

Appendix 6B: How to Select and Remove Outliers in SPSS

If a researcher decides on rules for the identification and removal of outliers before looking at the data, and detects outliers using these rules, the following SPSS commands can be used to remove (filter out) outliers.

In the following example, SPSS Select Cases commands are used to retain temperatures that are below 100 degrees Fahrenheit (and temporarily filter out any temperatures higher than this). To do this, make the following menu selections: Data → Select Cases. In the Select Cases dialog box (Figure 6.14), click the radio button for “If condition is satisfied.” Then click the If button immediately below that to open the Select Cases: If dialog box in Figure 6.15.

Type in the logical expression “temp_Fahrenheit < 100.” A logical expression generally includes a variable; operators such as greater than, equal to, or less than; and specific numerical values (see Table 6.2). The full command this creates is “Select cases if temp_Fahrenheit is less than 100.” By implication, cases with values of temp_Fahrenheit greater than or equal to 100 are not selected. Data for the cases that satisfy this condition will be included in later analyses. Cases that do not meet this condition (that is, persons with temperatures of 100 or higher) will be excluded from future analyses. Under the Output heading in the Select Cases dialog box in Figure 6.14, I left the radio button selection as “Filter out unselected cases.” If you choose “Delete unselected cases,” cases will be removed permanently. Permanent deletion is usually not a good idea.

A research report must include information about any cases that are selected out. The number of cases, the score values, and the reason for selecting them out should be stated. Usually scores are removed because they are outliers, but there can be other reasons to remove scores. When you look at the data file in Figure 6.16 you'll see that the row numbers for two excluded cases (with temperature scores of 101.3 and 100.4) are marked out with cross hatches. If the frequencies procedure is run to obtain the sample mean M, those two values will not be included.

Figure 6.14 Select Cases Dialog Box

On the left are variables, namely, sex, hr, temp underscore Fahrenheit, and temp underscore Celsius. The right has check boxes to select cases. There are five choices: all cases, if condition is satisfied, random sample of cases, based on time or case range, and use filter variable. The second choice, If condition is satisfied, has been checked. Below this is the output section. There are three choices in the check boxes here: filter out unselected cases, copy selected cases to a new dataset, and delete unselected cases. The first option has been selected. A statement “Current Status: Do not filter cases” is below this.

At the bottom of the dialog box are options buttons for the following: OK, Paste, Reset, Cancel, and Help.

Figure 6.15 Select Cases: If Dialog Box


On the left is a set of variables, namely, sex, hr, temp underscore Fahrenheit, and temp underscore Celsius. Temp underscore Fahrenheit has been selected. A box on the right shows one more variable, temp underscore Fahrenheit less than 100. Below this is a keyboard with numbers and special characters that is used to input the variable specifications. At the extreme right is a box showing Function groups, of which the following are visible: all, arithmetic, CDF and noncentral CDF, conversion, current date or time, date arithmetic, and date creation.

Table 6.2

Operator      Meaning
~= or NE      Not equal
< or LT       Less than
> or GT       Greater than
>= or GE      Greater than or equal to
& or AND      Evaluate whether both conditions hold
| or OR       Evaluate whether one or both of the conditions hold

Figure 6.16 Temperature Data File With Cases Removed by Select Cases Procedure Marked by Cross Hatches

This could be done with a logical expression such as “temp_Fahrenheit > 97 AND temp_Fahrenheit < 100.” This would include scores only if they are both greater than 97 and less than 100 and exclude scores outside that range. Remember that you must report the number of

From that point on, when you run analyses (such as the frequencies procedure to obtain statistics such as the mean), the few cases with temperatures equal to or greater than 100 will not be included. They have not been deleted from the data file, only temporarily excluded. If you want to stop excluding the cases with outliers, you need to go back to the Select Cases dialog box and select the radio button for “All cases,” as shown in Figure 6.17.

Figure 6.17 “Select Cases” Radio Button to Select All Cases (Stop Excluding Outliers)
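The difference between filtering and deleting can also be sketched outside SPSS. The following Python fragment is only an illustration, not SPSS syntax: the list of scores is hypothetical (only the two excluded values, 101.3 and 100.4, come from the example above), and the function name `select_cases` is an invented helper. It keeps the full data file intact and builds a separate filter, in the spirit of “Filter out unselected cases.”

```python
# Hypothetical temperature scores; only 101.3 and 100.4 come from the text.
temps = [98.2, 97.9, 101.3, 98.6, 100.4, 97.4]

def select_cases(scores, condition):
    """Return one True/False flag per case; True means the case is selected."""
    return [condition(score) for score in scores]

# Logical expression from the example: temp_Fahrenheit < 100.
keep = select_cases(temps, lambda t: t < 100)

# The original list is untouched; the filter only decides what enters an analysis.
selected = [t for t, k in zip(temps, keep) if k]
excluded = [t for t, k in zip(temps, keep) if not k]

mean_selected = sum(selected) / len(selected)
print("Excluded cases to report:", excluded)
print("Mean of selected cases:", round(mean_selected, 3))
```

Because `temps` is never modified, dropping the filter (the analogue of choosing “All cases”) simply means analyzing `temps` again. The `excluded` list is what a report would need in order to state how many scores were removed and what their values were.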
120, the t distribution becomes very close to the standard normal distribution). As df increases, the critical values of t decrease; by the time df > 120, critical values of t converge to those of the standard normal distribution. For df > 120, 2.5% of the distribution lies below −1.96, the middle 95% lies between −1.96 and +1.96, and 2.5% of the distribution lies above +1.96. When N is large, the amount of additional sampling error created by using SD to estimate σ becomes negligible.

Figure 7.7 Lower Tail, Middle 95%, and Upper Tail of Normal Distribution

The image is a diagram of the normal distribution that shows the lower tail, middle 95 percent, and upper tail. The X axis has 0 as the center. Minus 1.96 and plus 1.96 have also been marked. The area between plus 1.96 on the right and minus 1.96 on the left is equal to 95 percent. The area beyond plus 1.96 and minus 1.96 on either side is 2.5 percent each. A statement below the diagram mentions that the most extreme 5 percent is the sum of areas in lower and upper tails beyond z equals 1.96.

Source: Abridged from Fisher and Yates (1974, Table V).

The image is an extract from the critical values for the t distribution and has been adapted from the table by Fisher and Yates. The table lists different confidence intervals, and levels of significance for one-tailed and two-tailed tests. It also shows the df ranges that result in the critical values. Details are below:

Confidence intervals percentage              80      90      95      98      99
Levels of significance for one-tailed test   0.1     0.05    0.025   0.01    0.005
Levels of significance for two-tailed test   0.2     0.1     0.05    0.02    0.01
df
1                                            3.078   6.314   12.706  31.821  63.657
2                                            1.886   2.92    4.303   6.965   9.925
3                                            1.638   2.353   3.182   4.541   5.841
4                                            1.533   2.132   2.776   3.747   4.604
5                                            1.476   2.015   2.571   3.365   4.032
6                                            1.44    1.943   2.447   3.143   3.707
7                                            1.415   1.895   2.365   2.998   3.499
8                                            1.397   1.86    2.306   2.896   3.355
9                                            1.383   1.833   2.262   2.821   3.25
10                                           1.372   1.812   2.228   2.764   3.169

The eighth row of the df and the 95 percent confidence interval column have been circled.

Figure 7.9 Division of Area for t Distribution With 8 df Into Bottom 2.5%, Middle 95%, and Top 2.5%

The image is a diagram of a t distribution that shows the percentage of area under the curve. The area between plus 2.034 on the right and minus 2.034 on the left is equal to the middle 95 percent. The area beyond plus 2.034 is the upper 2.5 percent and beyond minus 2.034 is the lower 2.5 percent.

7.13 Using Sampling Error to Set Up a Confidence Interval

The following pieces of information are needed to set up a confidence interval (CI):

* C, an arbitrarily selected level of confidence. The value of C is usually 95%; it corresponds to the percentage of area we use for the middle of the distribution when we look up cutoff values for t. C can be other values, such as 90% or 99%.
* The values of M, SD, and N from the sample data.

We need to do the following to find the lower and upper limits of a confidence interval on the basis of a sample mean M:

1. Find df = N − 1.
2. Look up the (absolute) critical value from a t distribution that corresponds to the middle C% of the t distribution with N − 1 df. The critical value of t can be obtained from the table in Appendix B at the end of this book and is sometimes denoted t_critical C%.
3. Calculate SE_M (using SD and N from the sample).
4. Find the lower limit of the 95% CI:

(7.8)
M − (t_critical C% × SE_M)

5. Find the upper limit of the 95% CI:

(7.9)
M + (t_critical C% × SE_M)

These equations convert the t values that correspond to the middle 95% of the distribution of t back into the raw-score units in which the mean was given.

An example: Suppose that N = 25. We use the t distribution with 24 df. Suppose we want a 95% level of confidence. We locate the value from the table of the t distribution in Appendix B at the end of the book for a 95% level of confidence and 24 df: t_critical 95% = 2.064. We have values of M = 50, SD = 10, and SE_M = 10/√25 = 10/5 = 2. The confidence interval is calculated as follows:

Lower limit of the 95% CI: M − (t_critical 95% × SE_M) = 50 − 2.064 × 2 = 50 − 4.128 = 45.872.
Upper limit of the 95% CI: M + (t_critical 95% × SE_M) = 50 + 2.064 × 2 = 50 + 4.128 = 54.128.

This can be reported as: 95% CI [45.872, 54.128].

Other factors being equal, these factors make confidence intervals wider:

* A higher level of confidence, for example, use of C = 99% instead of C = 95%
* Smaller N
* A larger value of SD

We prefer to have narrow confidence intervals. To make confidence intervals narrower, we could do any of the following things (other factors being equal):

* Choose a lower level of confidence (such as 90% or 68% instead of 95%). However, researchers are reluctant to make the level of confidence too low; 95% confidence is the most widely used value.
* Increase the sample size N.
* Decrease SD. (Chapter 12, on the independent-samples t test, describes things researchers can do in some situations that may decrease SD. However, in many situations, researchers have little control over SD.)

7.14 How to Interpret a Confidence Interval

The language used to interpret CIs is tricky. It is incorrect to say that a 95% CI computed using data from a single sample has a 95% chance of including μ. (It either does or it doesn’t, and we have no way to be certain which situation we have for an individual sample.)

It is more accurate to think about a CI as a statement about expected outcomes in the long run, across hundreds or thousands of different samples from the same population. For a 95% CI, approximately 95% of the CIs that are set up using the procedures described in this chapter are expected to include the true population mean μ between the lower and the upper limits. Approximately 5% of these CIs will not contain μ.

Cumming and Finch (2005) suggested this as a way to think about CIs: “a range of plausible values

for μ; values outside the CI are relatively implausible ... [the] data are compatible with any value of μ within the CI but relatively incompatible with any value outside it.”

A problem with CIs is that, like M and SD, they vary across samples. Here is a thought experiment that illustrates the problem. If you randomly select 18 samples (each of size 25) from the same population, the values of M and SD will vary across these samples. That implies that the upper and lower boundaries of the 95% CIs will also vary across samples, as in the hypothetical example in Figure 7.10. Each vertical line with whiskers represents the lower and upper bounds of the CI for 1 of the 18 samples. The circle in the middle of each CI represents the mean for that sample; the circle is filled if the CI for that sample includes the true value of μ and open if the CI for that sample does not include the true value of μ. The true value of μ for the population corresponds to the horizontal line.

In this example, 16 of the 18 CIs included μ, while the other 2 CIs did not include μ. If we had CIs for thousands of samples, 95% of them would be expected to include μ; the other 5% would not include μ. The 95% confidence level is a prediction about how many CIs out of thousands would include μ.

Figure 7.10 Hypothetical Outcomes for 18 Confidence Intervals

Source: Adapted from Cumming and Finch (2005).

The image shows a hypothetical outcome for 18 confidence intervals. The X axis is the vertical axis with its center as mu, and the outcomes are indicated by circles in the center with vertical lines on either side. Each vertical line represents the lower and upper bounds of the confidence intervals for each of the 18 samples. Most of the circles and lines for the confidence intervals included the mu, except for two samples, which have been circled. While one lies above the mu, the other falls below the mu.

The image has been adapted from Cumming and Finch.
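The long-run interpretation can be explored with a small simulation. The sketch below is an illustration under stated assumptions rather than a procedure from this book: the population is assumed to be normal with μ = 50 and σ = 10 (values chosen only for the demonstration), each sample has N = 25, and the critical value 2.064 is the tabled 95% value of t for 24 df. The function names are invented for this example.

```python
import random
import statistics

def ci_for_mean(sample, t_critical):
    """Return (lower, upper) limits: M -/+ t_critical * SE_M."""
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # stdev uses N - 1, like SD
    return m - t_critical * se, m + t_critical * se

def coverage(mu, sigma, n, t_critical, n_samples, seed=1):
    """Proportion of simulated CIs that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = ci_for_mean(sample, t_critical)
        if lower <= mu <= upper:
            hits += 1
    return hits / n_samples

# Assumed population values (mu = 50, sigma = 10) are for illustration only.
rate = coverage(mu=50, sigma=10, n=25, t_critical=2.064, n_samples=5000)
print("Proportion of CIs that include mu:", round(rate, 3))
```

With thousands of simulated samples, the printed proportion should be close to .95. With only 18 samples, as in Figure 7.10, an outcome such as 16 of 18 is unsurprising.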

7.15 Empirical Example: Confidence Interval for Body Temperature

Most of us assume that normal or average healthy adult body temperature is 98.6°F. In 1868, Wunderlich (cited in Mackowiak, Wasserman, & Levine, 1992) summarized data from over 1 million temperature measurements for 25,000 patients; he concluded that the “normal healthy” body temperature was 98.6°F or 37°C. Until fairly recently, that value has not been questioned; few studies of normal body temperature have been done. Mackowiak et al. (1992) believed that it would be useful to examine new data because instrumentation for taking body temperature has changed since the 19th century. Shoemaker (1996) created an artificial data set in which the score values led to conclusions like those of Mackowiak et al. Data adapted from Shoemaker are used for the analyses in this section. It might seem that finding average body temperature for human populations would be easy, but it’s a more complex question than it appears. For readable discussions, see Cook (2018) and Maril (2018). A more recent study of temperature data collected through smart phone crowdsourcing is reported by Hausman et al. (2018).

Values of N, M, and SD for the Fahrenheit temperature scores in the file shoemaker.sav were obtained using the SPSS frequencies procedure (menu selections are not repeated from earlier chapters). Results appear in Figure 7.11. The first thing to notice is that the sample mean in Figure 7.11, M = 98.25, is lower than the population mean that people generally believe (98.6). The difference is (98.25 − 98.6) = −.35. This sample mean is about a third of a degree lower than the generally accepted value. (Note that if you look up Shoemaker's article, numerical values differ from the ones I reported; I modified his data slightly.) Is this difference or inconsistency small enough that we can dismiss it, or large enough that we should pay attention to it? Further information is needed.

To evaluate whether the sample mean is consistent with an estimate of μ = 98.6, we will set up a 95% CI and ask whether 98.6 is included in that CI (or not). Notice that the standard error of the mean (SE_M) reported by SPSS is .0667. You can confirm this by hand: SE_M = SD/√N = .7603/√130 = .0667. To obtain the limits of the 95% CI, a new SPSS procedure is introduced (the one-sample t test). This procedure will be used more extensively in Chapter 8. Make the following menu selections: Analyze → Compare Means → One-Sample T Test. When the dialog box for the one-sample t test appears, as in Figure 7.12, move the name of the variable of interest into the list of variables to be analyzed. Leave the box “Test Value” containing the default value of 0. Then click OK.

Figure 7.11 Descriptive Statistics for Temperature in Fahrenheit in shoemaker.sav

The image is a table that shows the following descriptive statistics data for temp underscore Fahrenheit:

* N - Valid: 130
* N - Missing: 0
* Mean: 98.254
* Std error of mean: .0667
* Std Deviation: .7603

Figure 7.12 Use of SPSS One-Sample t Test Procedure to Obtain Mean and 95% Confidence Interval: Initial Menu Selections and Dialog Box for One-Sample t Using 0 as Test Value

The output appears in Figure 7.13. The area enclosed in the ellipse in Figure 7.13 shows the lower and upper limits for the 95% CI for mean temperature in degrees Fahrenheit. Note that this CI does not include the value that most people think of as average body temperature (98.6). The Shoemaker temperature data suggest that the true population mean for body temperature may be lower than the conventionally assumed value of 98.6.

We can create graphs to display confidence intervals. When CIs are graphed, they are called error bars. However, note that lines that are called “error bars” in published graphs do not always represent confidence intervals; sometimes error bars correspond to SD or SE. Titles for the graphs should make it clear what the error bars represent. From the main SPSS menu, open the Error Bar dialog box, which appears in Figure 7.14. Choose “Simple” and “Summaries of separate variables,” then click Define to open the next dialog box, in Figure 7.15. Enter the name of the variable for which you want error bars. There is a pull-down menu, “Bars Represent,” initially set to “Confidence interval for mean,” that allows you to specify whether you want bars to represent the CI (or SD or SE); leave this at the default selection for CI. The output appears in Figure 7.16.
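The SPSS results for the body temperature data can be checked by hand with a few lines of arithmetic. In the Python sketch below, N, M, and SD are the values reported in Figure 7.11. The critical value 1.979 is an assumption: it is an approximate two-tailed 95% critical value of t for df = 129, which abridged tables often do not list (a table would give a value between 1.98 for 120 df and 1.96 for very large df).

```python
import math

# Sample statistics reported by SPSS for temp_Fahrenheit (Figure 7.11).
n = 130
mean = 98.254
sd = 0.7603

# Assumed critical value: approximate 95% two-tailed t for df = n - 1 = 129.
t_critical = 1.979

se_mean = sd / math.sqrt(n)            # SE_M = SD / sqrt(N)
lower = mean - t_critical * se_mean    # Equation 7.8
upper = mean + t_critical * se_mean    # Equation 7.9
contains_98_6 = lower <= 98.6 <= upper

print("SE_M:", round(se_mean, 4))
print("95% CI:", round(lower, 3), "to", round(upper, 3))
print("CI includes 98.6:", contains_98_6)
```

Under these assumptions the computed standard error and limits agree, to rounding, with the values SPSS reports (.0667, and 98.122 to 98.386), and 98.6 falls outside the interval.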

The image is a screenshot of the procedure to use SPSS One-Sample t Test.

At the top of the spreadsheet are the following menu buttons: analyze, graphs, utilities, extensions, window and help. Below these buttons are icon buttons for table editing options. On the clicking of the Analyze button, a dropdown menu with the following options has opened: reports, descriptive statistics, Bayesian statistics, tables, compare means, general linear model, generalized linear models, mixed models, correlate, regression, loglinear, classify, dimension reduction, scale, non-parametric tests, forecasting, survival, multiple response, simulation, quality control, ROC curve, and Spatial and temporal modelling. The compare means menu has been clicked and the following menu options are visible: means, one sample T test, independent samples T test, summary independent samples T test, Paired samples T test, and one-way ANOVA.

The one sample T test dialog box is also open. This has a set of variables on the left: sex, hr, temp underscore Celsius, zscore open bracket temp underscore Fahrenheit close bracket and zscore open bracket temp underscore Celsius close bracket. There is an option to move the required variable to the box on the right for test variables. Temp underscore Fahrenheit is in this box. At the right is a button to control Options. The test value can be changed. At present, it has been set to 0.

At the bottom of the dialog box are options buttons for the following: OK, Paste, Reset, Cancel and Help.

Figure 7.13 Output for One-Sample t Test Procedure for Body Temperature Data (Test Value = 0)

The image is the output for the one sample T test procedure. The statistics are in one table and the output is in another table. Both of them have been provided below:

One sample statistics, temp underscore Fahrenheit

* N: 130
* Mean: 98.254
* Std Deviation: .7603
* Std Error Mean: .0667

One sample test (test value equals 0), temp underscore Fahrenheit

* T: 1473.387
* Df: 129
* Sig 2-tailed: .000
* Mean difference: 98.2542
* 95 percent confidence interval of the difference, Lower: 98.122
* 95 percent confidence interval of the difference, Upper: 98.386

The 95 percent confidence interval of the difference has been circled.

Using the language suggested by Cumming and Finch (2005), a brief interpretation is: “Our 95% CI for body temperature in degrees Fahrenheit [98.12, 98.39] is a range of plausible values for population mean body temperature. Values outside this CI are relatively implausible.” To report and interpret this result more extensively, we can say,

On the basis of this sample of N = 130 temperature measurements, with M = 98.254 and SD = .7603, the 95% CI for body temperature in degrees Fahrenheit was [98.12, 98.39]. The value that people usually assume for population mean body temperature, 98.6, does not fall within this range of plausible values for μ. The results of this (hypothetical) study are inconsistent with the claim that μ = 98.6. However, they do not conclusively disprove that μ = 98.6. The low value of M in this sample might have occurred because of sampling error.

Information from additional studies is needed to evaluate whether the true population mean for body temperature is lower than 98.6. We should always look for replications using large and representative samples before we draw conclusions; data from one small study do not prove that μ = 98.6 is incorrect. However, results reported by Mackowiak et al. (1992) were somewhat inconsistent with that belief. The assumption that mean body temperature in a normal healthy adult population is 98.6°F deserves a second look; that value was based on a kind of instrumentation to measure body temperature that is no longer used.

Figure 7.14 First Dialog Box for Error Bar (Confidence Interval) Graph

The image is a dialog box for the error bar graph that allows for choosing the type of graph as well as how the data is summarized. The dialog box has two chart types: simple and clustered. The Simple option has been chosen. The data in the chart can be arranged in two ways: summaries for groups of cases and summaries of separate variables. The second option has been chosen. At the bottom of the dialog box, there are three buttons: define, cancel and help. The Define button has been selected.

Figure 7.15 Second Dialog Box to Define Error Bar

The image is a second dialog box to define error bar and shows how to represent the error bars in the graph. On the left are a set of variables, which can be chosen and moved to the box on the right. The variables available are sex, hr, temp underscore Celsius, zscore open bracket temp underscore Fahrenheit close bracket, and zscore open bracket temp underscore Celsius close bracket. The variable temp underscore Fahrenheit has been moved to the error bar variable box. There are two buttons on the side: titles and options.

There is a drop-down menu that allows a choice of what the bars represent. Here the bars represent confidence interval for mean. The level can also be chosen and 95 percent is the level currently.

Figure 7.16 Graph of 95% CI for Temperature Data (Degrees Fahrenheit)

The image is an output graph of the 95 percent confidence interval for temperature data. In the image, which is the 95 percent confidence interval for temperature in Fahrenheit, the temperature is the Y axis and ranges from 98.1 to 98.4. The mean has been indicated as 98.25, and the upper and lower limits of the line have also been shown. The upper limit is close to 98.38 and the lower limit is around 98.12.

7.16 Other Applications for Confidence Intervals

7.16.1 CIs Can Be Obtained for Other Sample Statistics (Such as Proportions)

The sample mean, M, is not the only statistic that has a sampling distribution and a known standard error. The sampling distributions for many other statistics are known; thus, it is possible to identify an appropriate sampling distribution and to estimate the standard error and set up CIs for many other sample statistics, such as Pearson’s r. Political polls (and sometimes opinion polls) often report statistics such as percentages, and it is possible to set up CIs for percentage estimates.

7.16.2 Margin of Error in Political Polls

In many political and opinion polls, respondents are asked to state which among two or more alternatives they prefer (for example, pass or reject Proposition 13, which calls for legalization of recreational use of marijuana; intention to vote for Candidate A, B, or C). In these situations, the sample statistic of interest is a percentage (e.g., the proportion of respondents who say that they intend to vote for A, B, or C or who say they don’t know). It is possible to set up a 95% CI for a sample percentage taking N in the sample into account. However, a margin of error reported for polling results usually corresponds to a 68% CI. As N increases, margin of error decreases.

The lower and upper limits of a 68% CI for a sample percentage are

(7.10)
Lower limit = (% − margin of error).

(7.11)
Upper limit = (% + margin of error).

It is possible for margin of error to be related to a different level of confidence. Unfortunately, the definition of margin of error is often not stated specifically in media reports. Often the margin of error reported in media corresponds to a 68% confidence interval.

As an example, suppose that 54% of those polled say that they plan to vote for Candidate A, with a margin of error of ±2%. This implies that the 68% CI ranges from 52% to 56% in favor of Candidate A. Plausible estimates of the population proportion lie within this range. If more than 50% of the vote is required to win the election, a CI from 52% to 56% indicates that it is plausible (but not certain) that Candidate A will win.

Consider a different scenario, in which the proportion of people who plan to vote for B is 33 ± 2%, and the proportion of those who plan to vote for C is 35 ± 3%. This translates into CIs of 31% to 35% (for Candidate B) and 32% to 38% (for Candidate C). Candidate C may be ahead of Candidate B by a small amount, but that small difference could easily be due to sampling error.

7.17 Error Bars in Graphs of Group Means

Error bars can be superimposed on bar graphs in which the heights of bars correspond to group means. Consider the hypothetical example in Figure 7.17. Students are asked to rate their degree of agreement with the statement “I feel guilty when I eat foods I know are unhealthy” on a five-point scale (1 = strongly disagree to 5 = strongly agree). Mean scores are calculated for male and female students. For male students, N = 4, M = 3.25, SD = .957, and SE_M = .479; for female students, N = 3, M = 4.33, SD = .577, and SE_M = .333. (For each group, you should be able to calculate SE_M from N and SD.)

Look at the correspondence between the descriptive statistics and the graph. The height of each bar corresponds to the mean rating for a group. The end points of the 95% CI error bars can be found by multiplying t_critical 95% (using a t distribution with df = 2 for the female and df = 3 for the male group) by the value of SE_M and identifying this distance below and above the group mean. This type of bar graph is very common in research reports in which group means are compared. Keep in mind that the error bars on this type of graph can represent a 95% CI but might represent SD or SE_M; look for that information in the figure title or note.

By now you should be able to understand the nature of the differences between female and male students either by comparing values of M or by examining the bar graph. Which group had higher mean guilt about unhealthy foods?

Figure 7.17 Bar Graph for Group Means With 95% CI Error Bars

The image shows two bars in a graph that represent group means with 95 percent CI error bars. The bars represent student agreement with the statement “I feel guilty when I eat foods I know are unhealthy.”
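The end points of the error bars in Figure 7.17 can be reproduced from the group statistics given in the text. The following Python sketch is one way to do the arithmetic; the function name `error_bar` is invented for this example, and the critical values (3.182 for 3 df, 4.303 for 2 df) are read from the 95% column of the t table shown earlier in the chapter.

```python
import math

# Group statistics from the text; t_critical is the tabled 95% value
# of t for df = N - 1 (3 df for males, 2 df for females).
groups = {
    "male": {"n": 4, "mean": 3.25, "sd": 0.957, "t_critical": 3.182},
    "female": {"n": 3, "mean": 4.33, "sd": 0.577, "t_critical": 4.303},
}

def error_bar(n, mean, sd, t_critical):
    """Return (SE_M, lower end point, upper end point) for one group."""
    se = sd / math.sqrt(n)
    half_width = t_critical * se
    return se, mean - half_width, mean + half_width

bars = {name: error_bar(**stats) for name, stats in groups.items()}
for name, (se, lower, upper) in bars.items():
    print(f"{name}: SE_M = {se:.3f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With samples this small, the critical values of t are large, so the error bars are very wide. In this sketch the upper end point for the female group extends beyond 5, the highest possible rating, which is a reminder that a CI computed this way does not take the limits of the rating scale into account.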

The X axis denotes the sex, male and female. The Y axis denotes the mean height, and ranges from 60 to 70. There are two bars, female and male. The female bar has a height of 64, while the male bar has a height of 69.

One difference you may notice is that, in this chart, the Y axis begins at 60 (instead of 0, which was the recommended value for the Y axis origin when bar charts were used for frequencies). Here's why. For group frequencies, 0 cases per group is a possible value. For means of variables such as adult height, 0 is not a possible value of height. It makes sense to choose a value of Y that is below the minimum height in the sample, but higher than 0, for a bar chart in which bars represent means.

If you read research reports, you are more likely to encounter bar charts that represent group means than bar charts for group sizes or frequencies. You will learn more about setup and interpretation of this type of bar chart in chapters about the independent-samples t test and ANOVA.

5.15 Other Examples

5.15.1 Scatterplots

In some studies, researchers want to evaluate whether scores on one quantitative variable (extraversion) are related to scores on another quantitative variable (physical energy). A preliminary graph called a scatterplot is used to examine the relationship between variables prior to doing statistical analyses such as correlation or regression. An example of a scatterplot appears in Figure 5.29. In this hypothetical study, each person provided self-report scores for extraversion (rated on a scale from 1, not at all extraverted, to 5, highly extraverted) and for energy (1 = very low energy, 6 = very high energy). Each data point in the scatterplot represents the combination of scores on extraversion (on the X axis) and energy (on the Y axis) for one case. For example, the case marked with a circle in Figure 5.29 represents a person with an extraversion score of 4 and an energy score of 3.

The three ellipses in Figure 5.29 identify areas of the graph that can be compared. On the left, an ellipse encloses energy scores for people whose scores on extraversion were low (below 2). On the right, an ellipse encloses the energy scores for persons whose extraversion ratings were high (above 4). You can see that for the people with low scores on extraversion, energy scores also tended to be low. For persons with high scores for extraversion, energy scores tended to be high. People with moderate scores on one variable also had moderate scores on the other variable. This is an example of a positive linear relationship. In a later chapter this kind of relationship between two quantitative variables will be assessed using Pearson correlation.

Figure 5.29 Scatterplot of Physical Energy Scores (Y Axis) with Extraversion Scores (X Axis)

5. How does SE_M differ from σ_M?
6. What is SE_M? What does the value of SE_M tell you about the typical magnitude of sampling error?
   • As SD increases, how does the size of SE_M change (assuming N stays the same)?
   • As N increases, how does the size of SE_M change (assuming SD stays the same)?
7. How is a t distribution like a standard normal distribution? How is it different?
8. Under what circumstances should a t distribution be used rather than the standard normal distribution to look up areas or probabilities associated with distances from the mean?
9. Consider the following questions about CIs: A researcher tests emotional intelligence (EI) for a random sample of children selected from a population of all students who are enrolled in a school for gifted children. The researcher wants to estimate the mean EI for the entire school. Let's suppose that a researcher wants to set up a 95% CI for IQ scores using the following information:

   The sample mean M = 130.
   The sample standard deviation SD = 15.
   The sample size N = 120.
   df = N − 1 = 119.

   For the values given above, the limits of the 95% CI are as follows:

   Lower limit = 130 − 1.96 × 1.37 = 127.31.
   Upper limit = 130 + 1.96 × 1.37 = 132.69.

The following exercises ask you to experiment to see how changing some of the values involved in computing the CI influences the width of the CI. Recalculate the CI for the emotional IQ information in the preceding question to see how the lower and upper limits (and the width of the CI) change as you vary the N in the sample (and leave all the other values the same).

   1. What are the upper and lower limits of the CI and the width of the 95% CI if all the other values remain the same (M = 130, SD = 15), but you change the value of N to 16? Note that when you change N, you need to change two things: the computed value of SE_M and the degrees of freedom used to look up the critical values for t.
   2. What are the upper and lower limits of the CI and the width of the 95% CI if all the other values remain the same, but you change the value of N to 25?
   3. What are the upper and lower limits of the CI and the width of the 95% CI if all the other values remain the same (M = 130, SD = 15), but you change the value of N to 49?
   4. On the basis of the numbers you reported for sample size N of 16, 25, and 49, how does the width of the CI change as N (the number of cases in the sample) increases?
   5. What are the upper and lower limits and the width of this CI if you change the confidence level to 80% (and continue to use M = 130, SD = 15, and N = 49)?
   6. What are the upper and lower limits and the width of the CI if you change the confidence level to 99% (continue to use M = 130, SD = 15, and N = 49)?
   7. How does changing the level of confidence from 80% to 99% affect the width of the CI?
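The arithmetic behind the 95% CI limits quoted in the question above (M = 130, SD = 15, N = 120) can be checked with a few lines of code. This is only a minimal sketch and not part of the text: the function name `ci_for_mean` is made up for illustration, and the critical value (1.96 here, as in the limits quoted above) is typed in by hand rather than looked up.

```python
import math


def ci_for_mean(m, sd, n, critical_value):
    """Return (lower, upper, width) for a CI of the form M +/- critical * SE_M."""
    se_m = sd / math.sqrt(n)               # standard error of the mean
    margin = critical_value * se_m
    lower, upper = m - margin, m + margin
    return lower, upper, upper - lower


if __name__ == "__main__":
    # Values from the question: M = 130, SD = 15, N = 120, critical value 1.96.
    lower, upper, width = ci_for_mean(130, 15, 120, 1.96)
    print(f"SE_M = {15 / math.sqrt(120):.2f}")
    print(f"95% CI: [{lower:.2f}, {upper:.2f}], width = {width:.2f}")
```

Because the question rounds SE_M to 1.37 before multiplying, limits computed this way can differ from the printed ones in the last decimal place. When N changes, the critical value passed to the function also has to be replaced by the t value for the new df.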

Digital Resources

Find free study tools to support your learning, including eFlashcards, data sets, and web resources, on the accompanying website at

You can describe distribution shape by thinking about the answers to these questions. Some of these descriptions are not mutually exclusive. For example, a positively skewed distribution may also have high-end outliers, and it may have a large mode at zero.

In a typical research report, authors would like to be able to say something like this at the beginning of the “Results” section: “All quantitative variables were approximately normally distributed with no extreme outliers.” Real data often do not behave so nicely, of course. An author might have to say something more like this: “Number of doctor visits had a reverse J-shaped distribution with five high-end outliers.”

Comprehension Questions

1. In the bar graphs in most of this chapter (except those in Section 5.14), the height of the Y axis provides what information?
2. Suppose you generate a bar graph using SPSS. You also have a frequency table for the same data. What information from the frequency table might you add to the bar graph to make the information in the bar graph more precise?
3. What is a common practice that can make a bar graph deceptive? Can you think of at least one other way bar graphs can be made deceptive?
4. What can you see in a histogram of quantitative scores that is less easy to see in a frequency table?
5. Consider the histogram in Figure 5.32.
   1. What were the minimum and maximum number of servings of fruits and vegetables people said they ate per day? What was the range?
   2. What was the modal amount of fruit and vegetable consumption?
   3. Diet experts often recommend at least five servings of fruits and vegetables per day. How well are the people in this sample doing at meeting that standard?
   4. What percentage of persons reported eating one serving per day? This is a frustrating question to answer, given this bar chart. If you had access to these data, what other SPSS output would you want to see to answer this question precisely?
6. Briefly describe, in your own words, three things you look for to decide whether a histogram looks like a “reasonably normal” distribution.
7. Describe the shape of each of the histograms in Table 5.3. Sometimes more than one term can be applied; for example, skewed distributions may also have outliers.
8. What type of plot appears in Figure 5.33? What do the values on the Y axis correspond to? (Score values? Frequencies?) What information can you report from this plot? There are omissions in labeling. What labels could be added to this chart?

Figure 5.32 Results From Warner, Frye, Morrell, and Carey (2017): Number of Servings of Fruits and Vegetables Eaten on a Typical Day, N = 1,250

[Bar chart: vertical axis from 0% to 50%; horizontal axis labeled “Number of servings of fruits and vegetables per day.”]

The X axis represents the number of servings

8

In Chapter 7, the body temperature data in the file shoemaker.sav were used to set up a CI for mean body temperature in degrees Fahrenheit. The 95% CI based on that data did not include the value of 98.6°F that most people believe is the mean temperature for healthy human populations. In this chapter, we begin by proposing (or hypothesizing) that μ = 98.6°F; then we examine sample data to decide whether that proposed value of μ is, or is not, plausible.

The procedure for NHST involves familiar operations: computing descriptive statistics such as M, SD, and SE_M and looking up critical or cutoff values of t in a table for t with df = (N − 1). New steps involve setting up null and alternative hypotheses about proposed values of μ. Each individual step is simple; however, it can be difficult for beginning students to keep all these steps in mind. It is important to go through these steps “by hand”; the more you repeat them, the better you may understand the logic.

After you escape into the “real world” and write research reports, you will not have to write out all the logic involved in NHST; most of the logic will be implicit. Research reports rarely provide detailed information about all the steps that are outlined in this chapter. SPSS and similar programs will generate final numerical results for you; you won't need to do arithmetic and table lookups. However, you need to understand the logic so that you can understand the meaning and limitations of p values and NHST.

8.2 Significance Tests as Yes/No Questions About Proposed Values of Population Means

Sometimes real-world data analysts have access to data for an entire population of interest. Statistical significance tests are not needed when information is available for the entire population. Significance tests are used when we want to make inferences (or estimates or guesses) about unknown population characteristics such as μ using only data from a sample.

The following sections describe the steps that are involved in NHST.

8.3 Stating a Null Hypothesis

The term hypothesis can refer to a verbal statement (e.g., “I think my partner is cheating”). For statistical significance tests, hypotheses correspond to equations. To set up a yes/no question about a proposed value for μ, the unknown population mean, we begin by stating a null hypothesis (H0) in this form:

$H_0: \mu = \mu_{hyp}$.    (8.1)

In this equation, μhyp is always replaced by a specific numerical value. Using 98.6°F as the specific numerical value for μhyp, the null hypothesis for the study of Mackowiak, Wasserman, and Levine (1992) is:

$H_0: \mu = 98.6$°F.

In words, this null hypothesis says, “I hypothesize that the true population mean body temperature equals 98.6°F.” Depending on the variable that is examined in the study, the proposed value for the population mean stated in the null hypothesis could have other values, such as a driving speed of

35 mph, a diameter of 10 cm, or an IQ of 100 points. Most books refer to the H0 equality statement in Equation 8.1 as a null hypothesis. It makes more sense to think of this as a hypothesis that can potentially be nullified or rejected on the basis of the obtained sample mean.

On the basis of information from the sample temperature data (N, M, SD) we will be able to make one of two decisions:

• Reject H0. If we reject H0, that is equivalent to saying that we do not believe μhyp (which is 98.6°F in the body temperature example) is a plausible value for μ.
• Do not reject H0. If we do not reject H0, that is equivalent to saying that we cannot rule out μhyp as a plausible value for μ.

We cannot say “Accept H0.” This would be logically equivalent to saying “I have proved that μ exactly equals μhyp (98.6°F).” The logic used in NHST does not provide support for that kind of conclusion. If a research report says “accept H0,” the author has misunderstood statistical significance testing. Never, never say “accept H0”!

Neither decision (reject H0 or do not reject H0) can be made with certainty when we have only sample data. For either decision, reject or do not reject, there is a risk that the decision is wrong. In theory, NHST provides ways to evaluate the risk or probability of a Type I decision error (a decision to reject H0 when H0 is correct). Note that a researcher can make a Type I decision error even if he or she has done everything correctly. Uncertainty about decisions is inherent in the process of using sample data to make inferences about populations.

Note that the logic of NHST differs from the way we reason about evidence in everyday life. In everyday life, a person thinks of a hypothesis (such as “My dating partner is cheating on me”) and then looks for evidence to support that hypothesis. In everyday life, we tend to look for confirmatory evidence, that is, evidence that supports our initial hypotheses (Abelson & Rosenberg, 1958). NHST requires us to look for disconfirmatory evidence. In effect, in many research situations, researchers set up a null hypothesis that they don’t believe and then look for evidence to reject that null hypothesis. This requires us to think in terms of double negatives (e.g., I have evidence against a null hypothesis that I want to believe is wrong). This setup is counterintuitive; it differs from our natural inclinations in everyday reasoning.

The logic of NHST focuses on evidence that is inconsistent with a null hypothesis (or, to be more precise, evidence we would be unlikely to obtain if H0 is correct). The first step in NHST is setting up a nullifiable hypothesis. An example of a nullifiable hypothesis in NHST is H0: μ = 98.6°F. The evidence that would lead us to doubt or reject H0 is a value of M that is “very far” from μhyp (i.e., a sample mean very different from 98.6°F). The computations made in statistical significance tests make it possible to quantify precisely what we mean by “very far.”

Often (but not always) data analysts hope to reject (or “nullify”) H0. In many studies, a researcher specifies a null hypothesis he or she does not believe and then hopes to obtain evidence to reject that null hypothesis. Sometimes rejecting H0 means that, from the researcher’s point of view, the study was a success.
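A small simulation can make the idea of a Type I decision error concrete. The sketch below is not from the text. It assumes scores drawn from a normal population whose true mean really is 98.6 (so H0 is correct), a hypothetical population SD of 0.7, samples of N = 16, and the two-tailed α = .05 critical t value of 2.131 for 15 df. Under those assumptions, roughly 5% of samples should still lead to rejection of H0, even though nothing was done incorrectly.

```python
import math
import random
import statistics

# All numeric settings are illustrative assumptions, not data from the text.
MU_HYP = 98.6      # hypothesized population mean; H0 is TRUE in this simulation
SIGMA = 0.7        # hypothetical population SD
N = 16             # cases per sample, so df = 15
T_CRIT = 2.131     # two-tailed critical t for alpha = .05 with 15 df


def one_sample_t(scores, mu_hyp):
    """Return t = (M - mu_hyp) / SE_M, where SE_M = SD / sqrt(N)."""
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)          # sample SD (N - 1 in the denominator)
    se_m = sd / math.sqrt(len(scores))
    return (m - mu_hyp) / se_m


def type_i_error_rate(n_samples=20000, seed=1):
    """Proportion of simulated samples for which a true H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_samples):
        sample = [rng.gauss(MU_HYP, SIGMA) for _ in range(N)]
        if abs(one_sample_t(sample, MU_HYP)) > T_CRIT:
            rejections += 1
    return rejections / n_samples


if __name__ == "__main__":
    print(f"Proportion of Type I errors: {type_i_error_rate():.3f}")
```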


8.4 Selecting an Alternative Hypothesis

Two hypotheses are needed for NHST: a null hypothesis (denoted H0) and an alternative hypothesis (denoted Halt or sometimes H1). As noted previously, the equation for a null hypothesis is of the form H0: μ = μhyp (where μhyp is a specific value chosen by the data analyst, such as 98.6°F).

Note that a null hypothesis could be incorrect in any of three different ways: μ could be unequal to μhyp, greater than μhyp, or less than μhyp. (True population mean body temperature could be ≠ 98.6°F, > 98.6°F, or < 98.6°F.) Alternative hypotheses are statements of alternative realities: In the body temperature example, if H0 is incorrect, what range of outcomes for sample mean temperature would you expect? Each version of Halt specifies a different range. For a one-sample test, a data analyst selects one of the following three alternative hypotheses.

Alternative Hypothesis 1: The population mean is hypothesized to differ from μhyp, but we do not specify a direction of difference.

The equation for a two-tailed or nondirectional alternative hypothesis is:

$H_{alt1}: \mu \neq \mu_{hyp}$.    (8.2)

This version of Halt is called two-tailed because we will reject H0 for values of M that are either much higher or much lower than μhyp. These values of M correspond to t values in either the lower or upper tail of the t distribution. To evaluate distance from μhyp (such as 98.6°F), values of M are converted into t ratios, and then t ratios are used to assess distance from the mean in unit-free terms. (Use of a t ratio to assess the distance of M from a hypothesized value of μ is analogous to the use of a z score to assess the distance of a single X score from a sample mean.)

The two-tailed version of Halt is also called nondirectional because the direction of difference between M and the specific value of μhyp (such as 98.6°F) is not specified. It is called two-tailed because we reject H0 for values of M or t that correspond to either the lower or upper tail of a t distribution. In practice, the terms two-tailed and nondirectional can be used interchangeably.

Using this nondirectional or two-tailed version of Halt, the researcher collects data, examines sample M, and rejects H0 if the sample mean M is either far above or far below μhyp. In this example, we would reject H0: μ = 98.6°F as implausible if we obtain a sample M that is either much lower or much higher than 98.6°F. Later in the chapter you'll see how we quantify what we mean by “much higher”: Exactly how far away from μhyp does M need to be to reject H0? Can we reject H0: μ = 98.6°F if we obtain M = 99.0°F? M = 94.5°F? M = 101.3°F?

When we can specify the expected direction of difference, we can use one of the following one-tailed or directional alternative hypotheses. These tests are called directional because they specify one of two possible directions in which μ might differ from μhyp. They are called one-tailed because for Halt 2, H0 is rejected only for outcome values of M and t that fall in the upper tail, and for Halt 3, H0 is rejected only for outcome values of M and t that fall in the lower tail. The terms one-tailed and directional can be used interchangeably. The direction of difference should be stated when test results are reported.

Alternative Hypothesis 2: The population mean is hypothesized to be higher than μhyp.

If the researcher thinks that the true population mean may be larger than μhyp (for example, that true population mean for body temperature is higher than 98.6°F), the directional alternative hypothesis is as follows; we would reject H0: μ = 98.6°F only for a sample value of M that is much higher than 98.6°F:

$H_{alt2}: \mu > \mu_{hyp}$ (e.g., $\mu > 98.6$°F).    (8.3)

Alternative Hypothesis 3: The population mean is hypothesized to be lower than μhyp.

If the analyst expects that the true population mean is lower than the hypothesized value (e.g., that population mean body temperature is lower than 98.6°F), the equation for a one-tailed or directional alternative hypothesis takes this form:

$H_{alt3}: \mu < \mu_{hyp}$.    (8.4)

… hypothesis (and often data analysts want to reject the null hypothesis). I suggest that you use Halt 1 (the two-tailed test or nondirectional alternative hypothesis) in most situations. When you learn about other statistics later, such as the F test, you will find that some tests are always one tailed. The choice between one- and two-tailed Halt options is an issue primarily for t tests.

8.5 The One-Sample t Test

We need to quantify the distance of M from μhyp precisely so that we can decide whether M is “very far” from μhyp. We want the distance to be in unit-free terms so that we can evaluate it as a large or small distance by looking at standardized distributions of z or t values.

In earlier chapters, when we wanted to specify the distance of an individual X score from the sample mean M, we computed a z score, z = (X − M)/SD. Because z was unit free, we could look up the value of z in a table of the standard normal distribution (Appendix A at the end of the book) to evaluate areas below or above z, to decide whether the X score was very far away from M. For example, if an X score was in the top 2% of the area of the normal distribution, we could say that it was unusually high. (This works only if the X scores are normally distributed.)

To evaluate the distance of M from μhyp, we do …

We will reject H0 (that μ equals some specific value, such as 98.6°F) if the t ratio tells us that M is very far from μhyp. To do this, we need to define reject and do not reject regions in terms of specific values for t. To define the reject region(s) for values of t, we need to know these three things:

• Choice of Halt. This tells us whether to include only one tail or both tails in the reject region.
• Choice of α. This tells us how much area is included in one or both tails of the t distribution; often α is 5% or 1%.
• Sample df (N − 1). This tells us which t distribution to use to find critical values that cut off tail areas.

Suppose your sample has N = 16 (df = 15); that you use Halt: μ ≠ μhyp; and you choose α = .05. The reject regions correspond to α = .05, two tailed. Thus, you need the critical values that divide a t distribution with 15 df into the bottom 2.5%, middle 95%, and top 2.5% areas. Critical values of t can be found in the t distribution table in Appendix B at the end of the book. An excerpt …

… μ > μhyp), the reject region consists of only the upper tail; we have α = .05, one tailed (upper tail only). Look under “Level of Significance for One-Tailed Test” in the column “.05” to find the critical value for df = 15; this critical value is 1.753. We reject H0 if t > 1.753. This reject region appears in Figure 8.3b. Because the t distribution is symmetrical, once you know that +1.753 identifies the top 5%, you also know that t = −1.753 corresponds to the bottom 5%. If Halt: μ < μhyp, we reject H0 for values of t below −1.753, as shown in Figure 8.3c. The values of t used to label the reject and do not reject regions for the three different versions of Halt appear in Figures 8.3a, 8.3b, and 8.3c. (Reject regions could also be given in terms of values of M, but it is more conventional to think about them in terms of values of t.) The reject regions in Figure 8.3 correspond to values of M that are so far away from μhyp (with distance between M and μhyp expressed in terms of the unit-free t test) that they would be very unlikely to occur if H0 is true.

Later you will see that there is an easier way to decide whether to reject or not reject H0 than comparing an obtained value of t with these reject regions from a t distribution. You can just examine p values in SPSS output instead of t

values. The reject/do not reject decision becomes quite simple when you do this:

• If obtained p < α, reject H0 (the outcome is called “statistically significant”).
• If obtained p > α, do not reject H0 (the outcome is called “not statistically significant,” sometimes abbreviated ns).

The α level is selected by a data analyst before looking at the data. Often α is set at .05. The p value is obtained from your computer output.

SPSS reports a p value as “Sig.” SPSS usually reports two-tailed p values (for tests such as t ratios that can be either one or two tailed). If you use a two-tailed alternative hypothesis, just evaluate whether the SPSS “Sig.” or p value is less than α. If you use a one-tailed alternative hypothesis, you need to convert the two-tailed SPSS “Sig.” or p value into a one-tailed p value.

A one-tailed p value is half of the corresponding two-tailed p value. If SPSS says that (the two-tailed) p = .06, then the corresponding one-tailed p value is .03. For a one-tailed or directional t test, compare the one-tailed p value (in this example, p = .03) with α. You must also check that the direction of difference of M from μhyp is consistent with the direction of difference in your alternative hypothesis. To avoid possible confusion between one- and two-tailed p values, and for other reasons, I recommend that you use nondirectional (two-tailed) tests in most situations.

Figure 8.2 Excerpt From Table of t Distribution (From Appendix B at End of Book)

CRITICAL VALUES FOR t DISTRIBUTION

Confidence interval (%)                80      90      95      98      99      9…
Level of significance, one-tailed      0.1     0.05    0.025   0.01    0.005   0…
Level of significance, two-tailed      0.2     0.1     0.05    0.02    0.01    0…
df = 12                                1.356   1.782   2.179   2.681   3.055   4…
df = 13                                1.35    1.771   2.16    2.65    3.012   4…
df = 14                                1.345   1.761   2.145   2.624   2.977   4…
df = 15                                1.341   1.753   2.131   2.602   2.947   4…
df = 16                                1.337   1.746   2.12    2.583   2.921   4…
df = 17                                1.333   1.74    2.11    2.567   2.898   3…

The image is an extract from the critical values for the t distribution and has been adapted from the table by Fisher and Yates. The table lists different confidence intervals and levels of significance for one-tailed and two-tailed tests. It also shows the df values that result in the critical values. The 95 percent confidence interval has been circled, as has the .05 level of significance for the one-tailed and two-tailed tests. The df = 15 row and its critical values of 1.753 (one-tailed) and 2.131 (two-tailed) are also circled.

Figure 8.3 Reject Regions for Two-Tailed and One-Tailed t Tests (Example: df = 15)

The image shows the reject regions for two-tailed and one-tailed t tests. There are three diagrams, one for each version of Halt. The df is equal to 15.

a) Halt: μ ≠ 35. Two-tailed or nondirectional test: reject H0 for values of t in both the lower and upper tails. The image is of a t distribution. The area between −2.131 on the left and +2.131 on the right is the central region, with the statement “Do not reject H0 if t is between −2.131 and +2.131.” The area beyond +2.131 is the upper 2.5 percent, and the area beyond −2.131 is the lower 2.5 percent. Both of these are reject regions, with the statements “Reject H0 if t < −2.131” and “Reject H0 if t > +2.131.”

b) Halt: μ > 35. One-tailed or directional test: reject H0 only for t values in the upper tail (values of M greater than 35). The image is of a t distribution. The area beyond +1.753 on the right is the reject region of 5 percent, with the statement “Reject H0.” The area to the left of +1.753 has the statement “Do not reject H0.”

c) Halt: μ < 35. One-tailed or directional test: reject H0 only for t values in the lower tail (values of M less than 35). The image is of a t distribution. The area beyond −1.753 on the left is the reject region of 5 percent, with the statement “Reject H0.” The area to the right of −1.753 has the statement “Do not reject H0.”

8.8 Questions for the One-Sample t Test

The question examined by the one-sample t test

can be worded three different ways. For the body temperature example, we use 98.6°F as the value for μhyp.

1. Can we reject H0: μ = μhyp? (The decision can be either to reject or not reject H0.)
2. Is μhyp a plausible value for μ? (The decision can be either yes or no.)
3. Is M significantly different from μhyp? (The decision can be either that M is significantly different from μhyp or that M is not significantly different from μhyp.)

The third version of the question is most consistent with the ways NHST is usually reported for most statistical significance tests discussed later in the book.

8.9 Assumptions for the Use of the One-Sample t Test

• Scores for the X variable must be quantitative. (If they are not, it makes no sense to compute a mean.)
• Scores for the X variable should be independent of one another. (In the following example with driving speeds, the speeds would be nonindependent if there were heavy traffic or if cars were racing one another. The independence assumption was discussed in Chapter 2.) If scores are not independent, the estimate of SD may be too small.
• Some sources state that the distribution of X scores in the sample must be normal. Technically, this is not correct. The assumptions made when this test was developed were that scores are normally distributed in the population, and the scores in the sample were randomly selected from the population. We usually have little information about the distribution of scores in populations. We usually have convenience samples, instead of random samples from the population of interest.
• Pay attention to non-normal features of data that could make the sample mean a poor way to describe central tendency, such as extreme outliers, a mode at zero, and bimodal distributions with modes far apart. If M does not make sense to describe scores in the sample, then the one-sample t test won't make sense either.

Violations of assumptions can lead to p values that underestimate the true risk for Type I error.

8.10 Rules for the Use of NHST

If you want to make yes/no decisions, you should do things in the correct sequence. Before you collect data, decide on N, decide on procedures for the identification and handling of outliers, formulate the null and alternative hypotheses, and select the α level. Do one significance test (or a small number of tests). Do not run dozens or hundreds of tests and then hand-pick a few with small p values to report.

After you have done significance tests, do not go back and rerun tests with variations in procedure to see if you can obtain different results. For example, do not change from a two-tailed to a one-tailed test, do not change the α level, do not drop outliers and rerun the analysis, and do not collect more data and rerun the analysis. Running large numbers of analyses in search of small p values is called p-hacking.
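The warning about running many tests and reporting only the small p values can be illustrated with a simulation. The sketch below is an illustration under stated assumptions, not an analysis from the text: every simulated data set is drawn from a normal population for which H0 is true (mean 35, with a hypothetical SD of 6), each test uses N = 9 and the two-tailed α = .05 critical t value of 2.306 for 8 df, and a simulated “study” is counted whenever at least one of its tests lands in a reject region.

```python
import math
import random
import statistics

# Illustrative assumptions only; none of these simulated scores come from the text.
MU_HYP = 35.0     # H0 is true for every simulated data set
SIGMA = 6.0       # hypothetical population SD
N = 9             # cases per test, so df = 8
T_CRIT = 2.306    # two-tailed critical t for alpha = .05 with 8 df


def rejects_h0(rng):
    """Draw one sample under a true H0 and report whether |t| exceeds T_CRIT."""
    sample = [rng.gauss(MU_HYP, SIGMA) for _ in range(N)]
    se_m = statistics.stdev(sample) / math.sqrt(N)
    t = (statistics.mean(sample) - MU_HYP) / se_m
    return abs(t) > T_CRIT


def share_with_any_rejection(tests_per_study, n_studies=3000, seed=7):
    """Proportion of studies in which at least one test rejects a true H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        if any(rejects_h0(rng) for _ in range(tests_per_study)):
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 20):
        print(f"{k:2d} test(s) per study: {share_with_any_rejection(k):.2f}")
```

With a single preplanned test, the share of studies that reject a true H0 should stay near α. With 20 independent tests per study, the chance of at least one rejection is about 1 − .95^20, or roughly .64, which is why a hand-picked small p value understates the real risk of a Type I error.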

Violations of rules can also lead to p values that underestimate the true risk for Type I decision error.

Unfortunately, in real-world research, violations of rules and assumptions are fairly common. Therefore we should not have too much faith in p values.

8.11 First Analysis of Mean Driving Speed Data (Using a Nondirectional Test)

We are now ready to apply the one-sample t test using a hypothetical example. Suppose that a cranky resident of a college town is upset about students’ driving speeds. The posted speed limit is 35 mph. The citizen plans to gather data on driving speed to evaluate if she can plausibly complain to the police that the actual average driving speed for the population of all student drivers is significantly different from the posted speed limit. (In a later example, a one-tailed test using a directional alternative hypothesis is used.) In the traditional approach to NHST, it is important to decide on N, α, and the nature of the alternative hypothesis before the collection and analysis of data.

Step 1: Decide on N, the number of cases in the sample. For this example, N = 9 cars.

Step 2: Decide on the acceptable risk for Type I error (I use the popular value of α = .05).

Step 3: State the null hypothesis about population driving speed:

$H_0: \mu = \mu_{hyp} = 35$ mph.

Step 4: State the alternative hypothesis:

$H_{alt}: \mu \neq 35$ mph.

If the cranky resident has not specifically stated the direction of difference for the alternative hypothesis, a nondirectional or two-tailed alternative hypothesis is used. (If the resident uses SPSS to run her data analysis, the p value provided by SPSS implicitly assumes this nondirectional or two-tailed alternative hypothesis.)

Step 5: Specify the reject regions. Use the t distribution with (N − 1 =) 8 df. The values of t that correspond to the bottom 2.5% and top 2.5% of the area of a t distribution with 8 df can be found in the table in Appendix B at the end of the book. Values of −2.306 and +2.306 divide a t distribution with 8 df into the lowest 2.5%, middle 95%, and top 2.5%, as shown in Figure 8.4. Drawing a diagram similar to Figure 8.4 can be helpful when you are identifying reject regions.

$H_0$ will be rejected if $t < -2.306$ or if $t > +2.306$.

Step 6: Collect data. To evaluate the null hypothesis that the mean population speed equals the 35-mph speed limit, the resident uses a radar detection device to clock speeds for a sample of nine cars that pass her house and computes the descriptive statistics for this sample (N, M, and SD). Ideally, these cars would be randomly sampled from the population of all passing cars. However, it would be difficult to obtain a true random sample in this situation. For this question, the cars should be driven by students (the

population of interest). Ideally, the sample would not include only red cars or cars driven by intoxicated students coming home from weekend parties. The nature of cases included in the sample can limit the generalizability of findings (Simons, Shoda, & Lindsay, 2017). Data for this hypothetical example are in the file carspeed.sav.

Figure 8.4 Reject Regions for α = .05, Two Tailed, With 8 df, Corresponding to Shaded Areas

The image is a diagram of a t distribution that shows the percentage of rejected area under the curve for α = .05, two tailed, with df = 8. The image shows values of t that correspond to 5 percent of the area in the combined upper and lower tails, or 2.5 percent of the area at the end of each tail. The area between −2.306 on the left and +2.306 on the right is equal to 95 percent. The area beyond +2.306 is the upper 2.5 percent, and the area beyond −2.306 is the lower 2.5 percent. Both of these regions are shaded and carry the statement “Reject H0.” The horizontal axis is labeled “t with 8 df.”

Step 7: Obtain descriptive statistics. For small data sets, this can be done by hand. Descriptive statistics can also be obtained using the SPSS frequencies procedure discussed in Chapter 3. For this hypothetical data set, M = 39, SD = 6.103, N = 9, and SE_M = SD/√N = 6.103/3 = 2.034.

Step 8: Find the t ratio and its df. The one-sample t ratio can be calculated by hand or obtained using the SPSS one-sample t procedure. On the basis of the null hypothesis, μhyp = 35. Given N, for a one-sample t test, df = N − 1 = 8. From the previous step, M = 39 and SE_M = 2.034. Combining this information, we have t = (M − μhyp)/SE_M = (39 − 35)/2.034 = 4/2.034 = 1.966. Screenshots of the output of SPSS’s one-sample t test procedure appear in Figures 8.5 and 8.6. Enter the value for μhyp (which is 35, in this example) into the space for “Test Value.”

Step 9: Find the CI for M that corresponds to the selected α level. Current recommendations for reporting from many sources call for inclusion of CI information when significance tests are reported. To obtain the CI for M, the one-sample t test procedure is run a second time (using test value = 0, as demonstrated in Chapter 7).

For the distribution on the left side of Figure 8.1, the middle area under the distribution corresponds to C (95%); the combined areas in the upper and lower tails correspond to α (2.5% + 2.5% = 5%). Thus, the distribution is divided into the lower 2.5%, the middle 95%, and the top 2.5%. C + α = 1.00, the entire area. The equation to obtain level of confidence C for a CI that corresponds to the α level used for a two-tailed t test is:

Other

Level of confidence = C = 100 × (1 − α) (8.6)

Because α is given as a proportion, we subtract α from 1; to turn this difference into a percentage, we multiply it by 100. For α = .05, two tailed, the corresponding level of confidence = 100 × (1 − α) = 100 × (1 − .05) = 95%. Thus, if your test uses α = .05, two tailed, the corresponding CI is 95%. (CIs do not correspond to one-tailed α values.)

Step 10: Compare the obtained value of t from Step 8 with the reject regions in Step 5. In this example, t = +1.966 (with 8 df) falls in the “do not reject” region. This tells us that the obtained mean speed (M = 39) was not high enough for the citizen to reject the null hypothesis that the population mean driving speed for the entire population of students is 35 mph.

8.12 SPSS Analysis: One-Sample t Test for Mean Driving Speed (Using a Nondirectional or Two-Tailed Test)

The SPSS one-sample t procedure was used in the previous chapter (where it was used to set up a 95% CI for M); screenshots for the menu selections appeared there. You can use the same procedure to perform the one-sample t test for M (using μhyp as the test value). Make the same menu selections, then enter the value of μhyp specified in the null hypothesis into the space for “Test Value”; in this example, μhyp is 35. Output appears in Figure 8.5. (We will ignore the CI information in Figure 8.5 and focus on the t test.)

Figure 8.5 Output From the One-Sample t Test Procedure for Hypothetical Driving Speed Data Using Test Value = 35

In the output in Figure 8.5, the obtained value of t = 1.966. This agrees with the value of t reported above from by-hand computation. This t ratio has 8 df (df = N − 1, where N = 9). The value under the heading “Mean Difference” refers to the numerator of the t ratio, that is, M − μhyp. Using M = 39 and μhyp = 35, the difference between sample mean speed and hypothesized mean speed is (39 − 35) = 4. The sample mean was 4 mph higher than the hypothesized population mean of 35 mph.

The confidence interval in Figure 8.5 is for the difference between M and μhyp (not for M). The 95% CI for (M − μhyp) is [−.69, +8.69].

8.13 “Exact” p Values

A new piece of information appears in the SPSS output in Figure 8.5. In the column headed “Sig. (2-tailed)” we find the “exact” p value that corresponds to the obtained value of t. This p value is the sum of the two tail areas that lie beyond the obtained t value of ±1.966. “Exact” is in quotation marks because many common data analysis practices result in p values that greatly underestimate the true risk for Type I error that p is supposed to estimate. The p value in computer output is exact only in the sense that it corresponds exactly to the tail area(s) using the obtained t value to “cut off” the tails.
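The by-hand values in this section can be retraced with a short script. What follows is a minimal sketch in Python (standard library only), assuming the summary statistics reported for the driving speed example (M = 39, SD = 6.103, N = 9, μhyp = 35) and the critical value of 2.306 for 8 df. The function names are illustrative, and the tail area is found by simple numerical integration of the t density, which is not necessarily the routine SPSS uses.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Area in both tails beyond +/-|t| (Simpson's rule on 0..|t|)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3       # area between 0 and |t|
    return 1 - 2 * central_half

# Summary statistics from the driving speed example
M, SD, N, mu_hyp = 39, 6.103, 9, 35
t_crit = 2.306                         # critical t for alpha = .05, two tailed, 8 df

se_m = SD / math.sqrt(N)               # standard error of the mean
t = (M - mu_hyp) / se_m                # one-sample t ratio
df = N - 1
ci = (M - mu_hyp - t_crit * se_m, M - mu_hyp + t_crit * se_m)
p = two_tailed_p(t, df)
confidence = 100 * (1 - 0.05)          # Equation 8.6 with alpha = .05

print(f"SE_M = {se_m:.3f}, t({df}) = {t:.3f}, p = {p:.4f}, two tailed")
print(f"{confidence:.0f}% CI for (M - mu_hyp): [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The printed t and CI should agree with the values reported above (1.966 and [−.69, +8.69]). The p value should be close to the .0845 that SPSS reports; small differences can arise because the script starts from rounded summary statistics.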


The image is a diagram of a t distribution that shows the percentage of rejected area under the curve for alpha equals .05, one tailed, with df equaling 8. The image shows values of t that correspond to the 5 percent area in the right tail that are to be rejected. The area to the left of plus 1.86 is equal to 95 percent. This region has a statement: Do not reject H subscript 0. The area beyond plus 1.86 on the right is the 5 percent reject area and has the statement: reject H subscript 0.

The p value in SPSS output provides an easier way to make the decision whether to reject H0. You can reject H0 if the exact p value on the SPSS output is less than the α level you selected. (You need to specify whether the test is one tailed or two tailed.)

In this example, the exact two-tailed p = .0845 (.04225 of the area lies below t = −1.966, and .04225 of the area lies above t = +1.966). The α, two tailed, criterion for statistical significance was set at .05. Because p is larger than α, we do not reject H0.

Obviously, it is much easier to make reject/do not reject decisions on the basis of values of p than values of t and reject regions.

Reject rules in terms of obtained p value, using α = .05, can be stated as follows:

• If p < .05, reject the null hypothesis.
• If p > .05, do not reject the null hypothesis. More generally, do not reject H0 if p > α.

Proponents of the New Statistics suggest that we report the exact p value from the SPSS output (e.g., p = .0845, two tailed) and avoid making yes/no decisions about a null hypothesis. In other words, we don’t state that we reject or do not reject the null hypothesis; we don’t say that the result is statistically significant or not statistically significant. Reporting an exact p value makes it possible for readers who still prefer the traditional approach to NHST to make their own decisions whether an outcome is “significant” or not. Reporting an exact p value also avoids the following problem: What can you say if p = .051 or p = .06? For an outcome such as p = .051, you should not say that the outcome was “almost” significant. Reporting exact p values reminds us that values of p represent a continuum and that we do not have to think of .05 as a “cliff.”

8.14 Reporting Results for a Two-Tailed One-Sample t Test

When you report results for significance tests in research papers, much of the logic is implicit. For example, you convey the information that you used H0: μ = 35 by saying that the p value is two-tailed. The example “Results” section below follows the New Statistics guidelines: Report an exact p value; do not state a decision whether the result is “statistically significant.”

Results

A one-sample t test was conducted to assess whether mean speed for a sample of N = 9 cars differed from the posted speed limit of 35 mph. For this sample, M = 39, SD = 6.103, and SEM = 2.034. The one-sample t statistic was t(8) = 1.966, p = .0845, two tailed. Cars in this sample drove an average of 4 mph faster than the posted speed limit. The 95% CI for this difference was [−.69, +8.69].

A person who prefers traditional NHST reasoning could go on to say that, using α = .05, two tailed, as the criterion for statistical significance, this difference was not statistically significant. Proponents of the New Statistics advise against this yes/no kind of thinking.

When scores are given in meaningful units, it is useful to think about differences in terms of those units. In this example, sample mean driving speed exceeded the speed limit by 4 mph. In the United States, police usually do not bother to give speeding tickets unless driving speed is at least 5 mph above the speed limit (and often much higher than that). From a practical or real-world perspective, a sample mean speed only 4 mph above the posted limit is negligible. We could say this outcome has no practical significance, and it is not statistically significant. In Chapter 9, you will learn how to add effect size information when you report significance tests.

Here are several things to notice about “Results” sections.

• For t tests, you must specify whether the reported p value is based on a two-tailed or one-tailed (nondirectional or directional) alternative hypothesis.
• Older textbooks sometimes reported whether p was less than or greater than a chosen α level, for example, p < .05 or p > .05, or sometimes ns as an abbreviation for not significant. Reporting an exact p value from SPSS (e.g., p = .0845, two tailed) is now preferred.
• If you don't specify a choice of α level within the “Results” section or earlier in a research report, readers generally assume α = .05, and they may use that to draw their own yes/no conclusions about the null hypothesis.

8.15 Second Analysis of Driving Speed Data Using a One-Tailed or Directional Test

Let's return to the car speed data. Wait! The cranky resident is really interested only in the possibility that students are driving faster on average than 35 mph (not slower). The resident could decide to do a one-tailed test. These would be the null and alternative hypotheses:

H0: μ = 35.
Halt: μ > 35.

Using a one-tailed or directional alternative hypothesis does not change any of the computations for t or df that were used for a two-tailed test, but it does mean we need to consider a different reject region. For this version of the alternative hypothesis we reject H0 only if the

value of t is in the upper tail (that is, if M is far above 35). For α = .05, one tailed, with the reject region in the upper tail, the reject region appears in Figure 8.6.

Figure 8.6 One-Tailed Reject Region for Halt: μ > μhyp, α = .05, One Tailed, df = 8

For this directional version of Halt, the decision rule becomes: Reject H0 if obtained t > +1.86. If obtained t < +1.86, do not reject H0. We now examine the obtained t value compared with this one-tailed decision rule. From the same SPSS output in Figure 8.5, the obtained t was +1.966. This did not fall into the reject region using the two-tailed test, but for a one-tailed test, t = +1.966 falls in the upper tail reject region.

The SPSS output reports a two-tailed p value, as noted earlier. You can obtain the one-tailed p by taking half of the two-tailed p. For the driving speed example, SPSS reported p = .0845, two tailed. (Some SPSS procedures allow you to request one-tailed p values as an option, but many procedures produce two-tailed p values by default.) The corresponding one-tailed p value = .0845/2 = .04225.

For the one-tailed test (Halt: μ > 35), the decision to reject H0 could be based on either:

• Obtained t of 1.966 falls within the one-tailed reject region at the upper end of the distribution,

and/or

• one-tailed p value (calculated by taking half of the SPSS reported two-tailed p value) is .04225, which is less than the α of .05.

Whether the decision is made on the basis of the t value or the p value, the results are the same.

8.16 Reporting Results for a One-Tailed One-Sample t Test

Using a one-tailed test, we can report the test result as follows:

Results

A one-sample t test was conducted to assess whether mean speed for a sample of N = 9 cars differed from the posted speed limit of 35 mph. The alternative hypothesis was that the mean population speed was greater than 35 mph. For this sample, M = 39, SD = 6.103, and SEM = 2.034. The result was t(8) = 1.966, p = .04225, one tailed. Cars in this sample drove an average of 4 mph faster than the posted speed limit. The 95% CI for this difference was [−.69, +8.69].

Authors who prefer the traditional approach to NHST would go on to say that, using α = .05, one tailed, as the criterion, this difference would be judged statistically significant.

Note that everything in the write-up is the same as for the two-tailed test, except for the reported p value (now one-tailed) and any verbal statement about statistical significance.
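The halving rule and the p-based decision rule can be sketched in a few lines of Python. This is an illustration only, not an SPSS feature: the function name one_tailed_p is hypothetical, and the branch for a t that falls in the direction opposite to the prediction (1 − p/2) is the usual convention rather than something stated in this section.

```python
def one_tailed_p(two_tailed_p, t, predicted_sign=+1):
    """Convert a two-tailed p value to a one-tailed p value.

    predicted_sign is +1 when H_alt predicts M above mu_hyp, -1 when below.
    """
    half = two_tailed_p / 2
    # Half of the two-tailed p applies only when t is in the predicted direction.
    return half if t * predicted_sign > 0 else 1 - half

alpha = 0.05
p_one = one_tailed_p(0.0845, t=1.966)   # values from the driving speed example
decision = "reject H0" if p_one < alpha else "do not reject H0"
print(f"one-tailed p = {p_one:.5f}: {decision}")
```

With the driving speed values, the one-tailed p is .04225, which is below α = .05, so the p-based rule reaches the same decision as the reject region for the one-tailed test.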

8.17 Advantages and Disadvantages of One-Tailed Tests

Figure 6.3 Detail From Standard Normal Distribution Table

There are four diagrams that show the area between 0 and z as well as beyond z for positive and negative values of z.

The first diagram highlights the area between 0 and positive value of z in a normal distribution diagram. The area to the left of 0 has been marked as Area below 0 equals 50 percent.

The second diagram, to the right of the first, highlights the area beyond positive z in a normal distribution diagram. This has been shown as the Area above positive z. The third diagram, below the first, highlights the area between 0 and negative value of z in a normal distribution diagram. The fourth diagram, to the right of the third, highlights the area beyond negative z in a normal distribution diagram.

Textbooks sometimes drill students in the use of the normal distribution table with questions such as “What percentage of area lies between z = −1.00 and z = +2.00?” These artificial examples do not correspond to the kinds of questions that are of real interest to data analysts.

Data analysts usually want to answer a simple question: Is a score or other outcome close to, far from, or extremely far away from the mean?

Data analysts sometimes choose different numerical values to define “far from.” The following z values are common ways of thinking about distance from the mean.

• Values between z = −1.00 and z = +1.00 are “close” to the mean.
• Values between z = −2.00 and z = +2.00 (but outside the range −1.00 and +1.00) are “in between” close and far from the mean.
• Values below z = −2.00 or above z = +2.00 are “far from” the mean.
• Values below z = −3.00 or above z = +3.00 are “very far from” the mean.

A normal curve divided into these areas appears in Figure 6.4. Individual researchers are free to use other values of z as criteria for distances.

Figure 6.4 Areas That Are Close, Far, and Very Far From the Mean (in z Score Units)

6.10 Dividing the Normal Distribution Into Three Regions: Lower Tail, Middle, and Upper Tail

Researchers are often interested in the situation where the areas beyond ±z sum to exactly 5%. A normal distribution can be divided into three areas:

• 2.5% of the area below −z, the “lower tail,”
• 95% of the area in the center, and
• 2.5% of the area above +z, the “upper tail.”
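The verbal labels for distance from the mean, and the z value that splits the normal distribution into 2.5%, 95%, and 2.5%, can be illustrated with a short Python sketch. The function name describe_distance is hypothetical, and the treatment of scores that fall exactly on a cutoff (for example, z = 2.00) is an assumption, since the text gives only approximate ranges.

```python
from statistics import NormalDist

def describe_distance(z):
    """Label a z score using the rough cutoffs of 1, 2, and 3."""
    size = abs(z)
    if size > 3.0:
        return "very far from the mean"
    if size > 2.0:
        return "far from the mean"
    if size > 1.0:
        return "in between close and far from the mean"
    return "close to the mean"

# z value that leaves 2.5% in each tail and 95% in the middle
z_cut = NormalDist().inv_cdf(1 - 0.05 / 2)

for z in (0.4, -1.5, 2.3, -3.2):
    print(f"z = {z:+.2f}: {describe_distance(z)}")
print(f"Middle 95% lies between -{z_cut:.2f} and +{z_cut:.2f}")
```

The cutoff printed on the last line is the familiar value of about 1.96, slightly inside the rounded “far from the mean” boundary of 2.00.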

Although proponents of the New Statistics (e.g., Cumming, 2014) do not necessarily dismiss p values as completely useless, they make the following recommendations.

additional information and avoid using terms such as significant and nonsignificant. Their recommendation is based on concerns about the misuse and misinterpretations of p values (among other things).

I think many of you may find the New Statistics approach attractive. You don’t need to set up reject regions! You don’t have to judge your study a failure if p > .05!

At least one journal (Basic and Applied Social Psychology) no longer accepts reports of p values (Trafimow & Marks, 2015). However, the New Statistics view has not entirely replaced traditional thinking (at least not yet). My current recommendation is to report “exact” p values, but don't place too much faith in them, and always include confidence interval and effect size information. You will learn about effect size in the next chapter.

8.19 Things You Should Not Say About p Values

1. If SPSS shows “Sig. (2-tailed)” as .000, do not say that p = .000. A p value is a risk for Type I error, and theoretically, this risk is never zero. The tails of t distributions are infinite; tail areas are never exactly zero, they just become smaller and smaller as t increases. If SPSS shows “Sig. (2-tailed)” as .000, report this as “p < .001, two tailed.”

2. Given small p values such as

“marginally significant” or “approaches significance” or is “close to significant” or “trends toward significance.” This will make readers and reviewers cringe, whether they advocate traditional use of significance tests or prefer the New Statistics approach. In the minds of traditionalists, p is either less than .05 or it isn’t. It is either significant or not. (To paraphrase the late Groucho Marx, “Close is no cigar.”) From the perspective of the New Statistics, just say that p = .052, without invoking an α = .05 criterion to decide what the p value means.

8.20 Summary

Most of this chapter outlines procedures used in traditional approaches to interpretation of p values. Statistics textbooks prior to 2000 generally presented a traditional approach to significance testing, with a strong focus on yes/no significance tests. (Some books still do.) In recent years, advocates of the New Statistics have urged us to move away from yes/no decisions and to focus more on confidence intervals and effect size information. Effect sizes are discussed in the next chapter.

.05) and “significant” outcomes is discussed.

7. Guidelines for reporting results are provided, along with a list of things you should not say.

values; these verbal labels are only approximate.)

Table 9.1

No effect   Small effect   Medium effect   Large effect

= dsm de 50 (e.9, d'between 20and 79) 02%

9.2 Cohen's d: An Effect Size Index

An effect size provides information about the size of differences between group means, or the impact of treatments, that is independent of sample size and often in unit-free terms that can be compared across studies. The effect size Cohen's d provides an index that assesses the magnitude of the difference between M and μhyp independent of sample size. Its magnitude (like that of other effect size indexes) is not related to N. SPSS does not provide Cohen's d as part of the output of t tests. However, it provides the information you need to compute Cohen's d by hand: M, the test value μhyp, and SD. For the one-sample t test:

d = (M − μhyp)/SD

Cohen’s d effect size can be calculated for the one-sample t test of the driving speed data discussed in Chapter 8. For these data, M = 39, μhyp (test value) = 35, and SD = 6.103. For this example, Cohen’s d = (M − μhyp)/SD = (39 − 35)/6.103 = .655. We can say that M is about .66 or two thirds of a standard deviation above μhyp of 35. Using Cohen's standards, d = .66 for the driving speed study would be called a medium effect size. Mean speed (39 mph) observed in the study was two thirds of a standard deviation higher than the proposed or hypothesized value of mean speed (35 mph). That difference was not statistically significant when a two-tailed test was used; it was statistically significant when a one-tailed test was used.
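Because SPSS does not print Cohen's d for this test, the by-hand step is easy to script. The following is a minimal Python sketch using the driving speed values from the text; the function name cohens_d_one_sample is illustrative.

```python
def cohens_d_one_sample(mean, mu_hyp, sd):
    """Cohen's d for a one-sample design: (M - mu_hyp) / SD."""
    return (mean - mu_hyp) / sd

d = cohens_d_one_sample(mean=39, mu_hyp=35, sd=6.103)
print(f"Cohen's d = {d:.3f}")
```

Note that N does not appear anywhere in the calculation, which is the sense in which d is independent of sample size.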

significant Resultmay or may notbe statistically significant

statistically significant Result is usually statistically significant

identifies the t distribution used to look up critical values also increases. Use α = .05, two tailed, as the criterion for significance.

For N = 9, t = .50 × √9 = .50 × 3 = 1.50 with 8 df; not statistically significant.
For N = 36, t = .50 × √36 = .50 × 6 = 3.00 with 35 df; statistically significant.
For N = 100, t = .50 × √100 = .50 × 10 = 5.00 with 99 df; statistically significant.
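The three cases above can be reproduced with a short Python sketch that holds d fixed at .50 and lets N vary. The helper two_tailed_p is an illustrative function that finds the tail area by numerical integration of the t density; it is an assumption of this sketch, not a procedure described in the text.

```python
import math

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p for a t ratio, by Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    area = pdf(0) + pdf(b) + sum((4 if i % 2 else 2) * pdf(i * h)
                                 for i in range(1, steps))
    return 1 - 2 * area * h / 3        # remove the central area between -|t| and +|t|

d, alpha = 0.50, 0.05
results = {}
for n in (9, 36, 100):
    t = d * math.sqrt(n)               # t = d * sqrt(N), as in the worked cases
    p = two_tailed_p(t, df=n - 1)
    results[n] = (t, p)
    verdict = "significant" if p < alpha else "not significant"
    print(f"N = {n:3d}: t({n - 1}) = {t:.2f}, p = {p:.4f} ({verdict})")
```

The same effect size yields a nonsignificant result at N = 9 and significant results at N = 36 and N = 100, which is the pattern described in the text.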

An effect size of d = .50 would not be judged statistically significant for N = 9 but would be judged statistically significant for higher values of N.

A group of undergraduates got upset when I showed them this. “That’s cheating!” “The researcher can make t come out (almost) any way he or she wants!” That's correct, within certain limits. A t ratio is not a fact of nature. The magnitude of t is at least partly the result of decisions you made when you set up the study (such as the decision about sample size).

The dependence of t on N is useful (when we want to take sampling error into account) but potentially problematic (when we want to evaluate effect size independent of sample size). When values of N are very large, unless effect size information is provided, it can be difficult to evaluate how much of the size of t is due to large

9.4 Statistical Significance Versus Practical Importance

The term significant means something different in statistics than in everyday use. In everyday use, the word significant usually means large, substantial, of practical or clinical value, or worthy of notice. By contrast, statistical significance has a specific technical meaning; outcomes of studies are judged “statistically significant” when results would be unlikely to arise just from sampling error, on the basis of the logic of NHST.

It is useful to distinguish between “statistical significance” and clinical or practical significance (Kirk, 1996). A result that is statistically significant may be too small to have much real-world value. A difference between M and μhyp can be statistically significant and yet be too small in actual units to have much practical or clinical significance, as in the car speed example. Statistical significance alone is not a guarantee of practical significance or usefulness (Vacha-Haase, 2001). We evaluate statistical significance by examining a test statistic (such as a t ratio) and accompanying information such as df and p value.

also need to ask, What kinds of people were included in the study? Were the participants doing additional things, such as exercise and diet modification? How long did they take the drug and in what dose? How long was weight loss maintained after the drug was stopped? Was there a control group that did not receive the drug? And so forth.

Do not use the phrase “highly significant” to describe research outcomes with small p values. That language leads people to believe the results of a study have great practical or clinical importance, when in fact p < .001 can arise when a small effect is combined with a very large sample size. When you see the phrase “highly significant” in media reports, be skeptical. You need more information (such as the actual difference between means, or Cohen's d) to evaluate whether the results of the study indicate that an intervention or treatment had strong, or even noticeable, effects.

9.5 Statistical Power

In most (although not all) applications of NHST, researchers hope to reject H0. Statistical power is defined as the probability of obtaining a value of t that is large enough to reject H0 when H0 is actually false. Refer back to Table 9.2 to see four possible outcomes when decisions are made whether to reject or not reject a null hypothesis. The outcome of interest, at this point, is the one in the upper right-hand corner of the table: the probability of correctly rejecting H0 when H0 is false, which is called statistical power.

Researchers want statistical power to be reasonably high; often, statistical power of .80 is suggested as a reasonable goal.

Recall that we can reject H0 when the obtained value of t is sufficiently large. Equation 9.2 is repeated here to remind you how the magnitude of t in a sample is related to sample effect size and sample size.

t = d × √N (9.2)

In Equation 9.2, d represents Cohen’s d, and N is sample size. This equation suggests that if we want to obtain a large value of t in a future study, in theory, we could do that by examining a large effect size (d) or by using a large N or both.

However, any value of d we guess for population effect size may be incorrect, and even if we did know d, the magnitude of t in a future study will also be affected by sampling error. We cannot simply put values of d and N into Equation 9.2 and solve the equation for t and assume that our study will result in that value of t. In practice, values of t (like values of M) vary because of sampling error. The logic used to estimate statistical power given values of d and N is discussed in Appendix 9A. In practice, tables can be used to look up estimated statistical power for combinations of planned values of N and guessed values of d (Cohen, 1988, 1992a, 1992b). An example of a statistical power table, adapted from Jaccard and Becker (2009), appears in Table 9.3. Given an estimate for the population value of Cohen's d and for planned sample size N, you can look up expected statistical power in the body of the table. Alternatively, you can look down the column for an estimated population effect size, find the cell for power = .80, and look at the N for that row to find the minimum N required. This table applies only to tests that use α = .05, two tailed. Different tables would be needed for other α levels or one-tailed tests.

For example, suppose that a researcher believes that the magnitude of difference she is trying to detect using a one-sample t test corresponds to a population effect size of Cohen’s d = .50 and plans to use α = .05, two tailed. The researcher can read

down the column of values for estimated power under the column headed d = .50 until reaching the table entry of .80. Then, she would look to the left (of this value of .80) for the corresponding value of N. On the basis of the values in Table 9.3, the value of N required to have statistical power of about .80 to detect an effect size of d = .5 in a one-sample test with α = .05, two tailed, is between 30 and 40.

Table 9.3

Source: Reprinted with permission from Dr. Victor Bissonnette (2019).

The true strength of the population effect size we are trying to detect is not known. For example, the degree to which the actual population mean μ differs from the hypothesized value, μhyp, as indexed by the population value of Cohen's d, is not known in advance of the study. If we knew the answer to that question, we would not need to do a study!

The sample size needed for adequate statistical power can be approximated only by making an educated guess about the true magnitude of the effect, as indexed by d. If the guess about the population effect size d is wrong, then the estimate of power based on that guess will also be wrong. Information from past studies can often be used to make at least approximate estimates of population effect size.

Statistical power analysis is useful when planning a future study. It is important to think about whether the expected effect size, alpha level, and sample size provide you with a reasonably large chance (reasonably high power) to obtain a statistically significant outcome. People who write proposals to compete for research funds from government grant agencies are generally required to include a rationale for decisions about planned sample size on the basis of power. There are several places to obtain information for statistical power analysis. Jaccard and Becker (2009) provide power tables for some additional situations. SPSS has an add-on procedure for statistical power, and numerous other computer programs (some free) can do power analyses. Free online power calculators are widely available (for example, at http://powerandsamplesize.com/Calculators/).

Usually researchers rely on computer programs instead of tables for power analysis. A researcher provides program input information about type of analysis (e.g., a one-sample t test), planned α level, whether a one- or two-tailed test is desired, and expected effect size. Programs usually provide either the estimated power for an input value of N or the minimum N needed to achieve a requested level of power.

You should not report a post ho

That is, do not look up your obtained Cohen’s d
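As a rough illustration of the kind of calculation a power program performs when planning a study, the sketch below estimates power for a one-sample, two-tailed test from a guessed d and a planned N. It rests on a normal approximation, which is an assumption of this sketch and not the method used to build Table 9.3 or any particular program. Because of that, it tends to give slightly higher power (and a slightly smaller minimum N) than tables based on the t distribution. The function names are hypothetical.

```python
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5               # expected size of the test statistic, d * sqrt(N)
    # Probability of landing in either reject region when the true effect is d
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def min_n_for_power(d, target=0.80, alpha=0.05):
    """Smallest N whose approximate power reaches the target (assumes d != 0)."""
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n

for n in (30, 40):
    print(f"d = .50, N = {n}: approximate power = {approx_power(0.5, n):.2f}")
print("Approximate minimum N for power .80:", min_n_for_power(0.5))
```

For d = .50 and α = .05, two tailed, the approximate power is below .80 at N = 30 and above .80 at N = 40, consistent with the statement that the required N lies between 30 and 40.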

Number of correct answers on an 8-item quiz

The X axis denotes the items on a quiz and ranges from 0 to 8. The Y axis denotes the number of correct answers and ranges from 0 to 250. There are eight bars, one for each question, drawn as a histogram. The heights of the bars from left to right are: 210, 70, 60, 45, 30, 15, 20, 15, 20. A curve follows the bars; its tail on the right is long and the curve is higher towards the left.

Figure 6.9 shows substantial negative skewness and a possible ceiling effect. In Figure 6.9 most scores are piled up near 100 points (out of 100 possible points). A ceiling effect occurs when an exam is “too easy” for most students. Visual examination is usually sufficient to evaluate skewness. Skewness should be mentioned when data are described in research reports. Positive skewness is common in real data. Sometimes an appearance of skewness is due to a few high-end outliers. An index to describe degree of skewness is available; see Appendix 6A for further discussion. Usually visual examination of a histogram is sufficient to evaluate skewness.

Figure 6.9 Example of Negative Skewness: Hypothetical Exam Scores on a Scale From 0 to 100

The X axis denotes the scores on an exam and ranges from 0 to 90. The Y axis denotes the frequency and ranges from 0 to 30. There are fifteen bars visible on the histogram. Most of the bars on the left are close to 0, and the ones closer to the right are higher. A curve follows the bars; its tail on the left is long and the curve is higher towards the right.

What should you do if you see skewness in your sample data?

• Include mention of skewness in your description of the distribution.
• If skewness is not extreme (as in the examples in Table 5.1), you may not need to do anything to try to get rid of skewness. If skewness is extreme (as in Figures 6.8 and 6.9), you may want to consider options such as outlier removal to reduce skewness.
• Decisions about the identification and removal of outliers should be made before you collect data. If you make these decisions after you peek at your data, you must explain this when you report information about your data.

either committed a Type II error or has reported a correct decision not to reject H0. (The researcher can never be sure which.) We want the probability or risk for both types of error to be low, that is, we want both α and β to be low. When a data analyst selects an α level, such as α = .05, that choice theoretically sets an upper limit for the risk for Type I error. If α is set at .05, then in theory, we have a maximum risk of 5% for Type I error. However, the limit of risk for Type I error works in practice only if the assumptions and rules for NHST are followed, and in many situations, they are not. The actual risk for Type I error in many research situations is often much higher than the nominal (selected) α level.

Actual State of the World

Researcher decides to reject H0 and claims that the weight loss drug works.
  H0 is true (the weight loss drug really does not work): Type I error with risk α. The drug does not work, but the researcher claims that it does. Patients who take the drug will not benefit.
  H0 is false (the weight loss drug really does work): Correct decision. For patients who take the drug, a benefit.

Researcher decides not to reject H0 and does not claim that the drug works.
  H0 is true (the weight loss drug really does not work): Correct decision, although maybe not the decision the researcher hoped for. The drug does not work and the researcher does not claim that it works. Often this type of result does not get published, and that is unfortunate. Other researchers may do studies to see if this drug works, not knowing that there is already evidence suggesting it may not work.
  H0 is false (the weight loss drug really does work): Type II error with unknown risk β. The researcher did not reject H0 when H0 is false. The drug really does work, but the researcher does not claim that it works. The study probably doesn't get published; a missed opportunity. The drug may not be approved for use with patients, even though it works. This is likely to happen when studies are “underpowered,” that is, the N of cases is too small to detect the effect of interest.

The risk for Type II error, β, cannot be exactly known; but we know something about factors that tend to make β larger or smaller. In the previous section we talked about statistical power: the probability of rejecting H0 when it is false. Power is (1 − β), and we want power to be high, usually on the order of .80.

Table 9.5

What does it mean for H0 to be false? H0 is true only if μ is exactly equal to 0 (or exactly equal to the proposed value in the null hypothesis, such as 98.6 or 35 or 100 in previous examples). However, H0 can be false in billions of ways. If we consider H0: μ = 35, H0 is false if μ really equals any number other than 35 (e.g., 45, 12, 35.01, 99, 34.3, and so forth). H0 can be false to varying degrees; in a sense, H0: μ = 35 is “less false” if μ is really 35.2 or 34.9 than if μ is really 30 or 51. Population effect size is the degree to which H0 is false. For example, if Cohen's d (for the difference between the real and hypothesized population means) is d = 1.00, this indicates that the difference between hypothesis and reality is large; if d = .05, this indicates that the difference between hypothesis and reality is small. The values of β and (1 − β) vary depending on the population effect size. We never know the exact population effect size, but we can think about the values of β and (1 − β) that we would expect, in theory, for possible different values of d and for fixed decisions about N and α. Appendix 9A explains this in more detail.

These are the factors that influence β, risk for Type II error (and also 1 − β, statistical power):

• As α increases, β decreases. However, researchers are reluctant to increase α, risk for Type I error. Increasing α is not a common way to try to reduce risk for Type II error.
• As sample size N increases, risk for Type II error β decreases, and statistical power increases. This is consistent with intuitions you probably have by now: You have a higher probability to reject H0 when sample size is large.
• As population effect size such as Cohen's d increases, risk for Type II error β decreases, and statistical power increases. Design decisions that are often under researcher control are related to effect size. This is discussed more extensively in Chapter 12 on the independent-samples t test, a test you are more likely to use and a situation that will be easier for you to think about.

These are the factors that influence risk for Type I error, α:

• The α level that the data analyst chooses as criterion for statistical significance.
• Adherence to the assumptions and rules for NHST. If there are violations of assumptions and rules, the true risk for Type I error is

Cohen's d. They decide on an adequate level of statistical power, 1 − β, often .80. They look up these numbers in a table for statistical power to find the minimum value of N that will provide the desired level of power under those conditions. (Or they input this information into a statistical power calculating program.)

9.7 Meanings of “Error”

Note that the term error has different meanings in everyday life than the term error in statistics. In everyday life, error means mistake. For example, if a student adds a set of numbers incorrectly when calculating a sample mean, that is an error in the everyday sense: a mistake. The assumptions and rules involved in NHST were designed to keep the risks for committing each of these kinds of error low. However, even a researcher who follows all the rules exactly still has risk for decision errors. In statistics, we talk about many kinds of error, and each has a technical definition. So far you have learned about sampling error. Because of sampling error, the values of means vary across samples drawn from the same population. Sampling error is not a “mistake.” This is just the way the world works. Prediction error has also been mentioned: If the mean from a single sample is used to estimate an unknown population mean,

often much higher than a.

it will probably not exactly equal the population mean uy; if we use Mto estimate u, we will make a

If a study has an A'too small to have a reasonable

prediction error. In this chapter you learned about

chance to detect an effect (to reject Zp when Apis

two new kinds of error; these are the two kinds of

false), it is called underpowered. Researchers try

error that can occur when making a reject/do not

to avoid underpowered studies by using the

reject decision about a null hypothesis.

statistical power analysis methods in the previous

(Additional types of error arise later, such as

section. They decide on the type of statistical

measurementerror.)

analysis, the alpha level, and the nature of the test (one vs. two tailed). They make educated guesses about possible population effect size, such as


Of course, people who handle data can make mistakes (errors, in the everyday sense of the word): errors in computation or copying numerical values or interpreting numbers. Mistakes may be surprisingly common in published research reports (Green et al., 2018). The technical types of error that arise in statistics (such as sampling error and prediction error) do not arise because the data analyst has made a mistake. Procedures such as statistical significance tests involve inherent uncertainty. Even when a data analyst has done all the steps correctly, the data analyst can make a decision error, such as rejecting H0 when it is true. This kind of error is unavoidable in inferential statistics. We can’t get rid of it no matter how careful we are, but we can try to reduce the risk for error, and we must take risk for error into account when we report results.

9.8 Use of NHST in Exploratory Versus Confirmatory Research

In a confirmatory study, a researcher usually has a small number of hypotheses. These may have been selected during earlier exploratory research, or specified by a theory, or they may be variations of hypotheses in previous confirmatory studies. Confirmatory studies are often (but not always) experiments. Confirmatory studies often have few variables and a limited number of statistical significance tests. This is the context in which Fisher and colleagues developed the logic for NHST. Researchers may face fewer temptations to violate some of the rules of NHST in confirmatory studies than in exploratory research. However, there still are many ways to violate rules and assumptions for NHST in confirmatory research, for example, by trying out different methods of handling outliers and switching from two-tailed to one-tailed tests.

In exploratory studies, research questions are often open ended. For example, in a nonexperimental survey, an analyst may evaluate many variables to see which one(s) best predict an outcome such as life satisfaction. Fishing for predictors in a large set of “candidate” variables potentially opens up a much wider range of ways to violate rules for NHST.

Some journals seem to accord greater value to confirmatory studies than to exploratory work. Perhaps because of this, there is a temptation for researchers who have done exploratory studies (who have tried out many different combinations of variables, rules for identification, handling of outliers, etc.) to cherry-pick a small set of results and write research reports that make it sound as if the study were confirmatory.

Exploratory and confirmatory studies both have value. In many research areas, truly confirmatory studies are possible only after a period of exploratory work. However, reporting hand-picked p values from large numbers of tests in exploratory studies violates a fundamental rule for the use of NHST: Do only a small number of significance tests. When a small number of selected results from an exploratory study are reported as if they were obtained through a confirmatory study, p values can greatly underestimate the true risk for Type I error.

A specific study may provide information to do both confirmatory and exploratory analyses. When this is the case, the first part of a “Results” section can report a limited number of analyses for which the researcher had specific hypotheses in advance. A later section titled “Exploratory Results” can report additional interesting results that were not predicted in advance. In general, we should not place much faith in p


Preliminary Data Screening and Descriptions of Scores for Quantitative Variables

When you work with quantitative variables, you should do the following things.

* In all research, decide the value of N before you begin to collect data. (Do not collect data, repeatedly analyze it, collect more data because you are not happy with results, and then stop at a point where you have results you like.)
* Choose the method for outlier identification (such as boxplots or z scores) before you collect data. Establish rules for inclusion or exclusion of cases ahead of data collection. (For example, you may want to include a limited range of ages, or only right-handed persons, in your sample.) Decide how you will handle outliers before you collect data. If you anticipate skewness, think about what you might do to reduce skewness ahead of time. In many cases, if skewness is not extreme, you don't need to do anything about it.
* Collect data. Obtain a frequency table; identify impossible or questionable score values and note percentage of missing values. Obtain a histogram and visually examine it to evaluate distribution shape and skewness. Unless skewness is extreme, you probably don’t need to do anything about it. To evaluate outliers, obtain a boxplot and/or z scores for all cases. Either boxplots or z scores can be used to identify outliers. Note the number and locations of outliers.
* Your decision whether to use mean or median (as well as choices among later statistics) may depend on distribution shape and whether outliers are present.
* Document every decision you made.

6.18 Reporting Information About Distribution Shape, Missing Values, Outliers, and Descriptive Statistics for Quantitative Variables

You use all the information discussed in Chapters 3 through 6 to describe the behavior of each quantitative variable early in your research report. Try to communicate the pattern of information as clearly as possible. Information about distribution shape can be summarized in statements such as:

Heart rates were approximately normally distributed, with N = 100, M = 74, and SD = 4.5. There were no missing values. Using z > 3.29 in absolute value as the criterion for identifying outliers, there were no outliers.

The initial data set had N = 340 heart rate scores, with M = 76 and SD = 6.5. There were 20 missing values. Using z > 3.29 in absolute value as the criterion for identifying outliers, there were 10 outliers, all at the upper end of the distribution. On the basis of prior plans for data handling, the 20 missing values and 10 outliers were removed from the data set, leaving N = 310 cases for analysis. For these 310 cases, M = 68 and SD = 5.7.

Number of daily servings of fruit and vegetables had a possible range of scores from 0 to 8. Scores were not normally distributed;


9.11 Interpretation of Statistically Significant Outcomes

Reports of “statistically significant” outcomes should also be viewed with caution. It is important to understand that a “statistically significant” outcome can be obtained even when H0 is correct.

Here are some common reasons why a decision to reject H0 and call a test result “statistically significant” may be incorrect.

9.11.1 Sampling Error

A statistically significant outcome may arise because of sampling error. That is, even when the null hypothesis H0: μ = μhyp is correct, some values of the sample mean that are quite far away from μhyp can arise just because of sampling error or chance. By definition, when the nominal alpha level is set at .05, values of M that are far enough away from μhyp to meet the criterion for the decision to reject H0 occur about 5% of the time when the null hypothesis is actually correct.

9.11.2 Human Error

Human error in computation and reporting of statistics is common (Green et al., 2018). Usually errors are in favor of a researcher's preferred outcome (people rarely recheck their numbers when they have results they like).

9.11.3 Misleading p Values

Obtained p values underestimate true risk for Type I error. The decision to reject H0 is often made by noticing whether obtained p is less than .05. If the obtained p value underestimates the true risk for Type I error, then the decision to reject H0 may be incorrect.

9.12 Understanding Past Research

When you read past research, think about these questions.

Were too many significance tests done for p values to be believable? There is no universally agreed upon rule about the number of tests that is acceptable. I suggest that if you see more than 10 p values in a research report, you should begin to suspect that at least a few of them are due to Type I error. Ideally, authors should acknowledge this problem (inflated risk for Type I error when multiple tests are performed) in the discussion sections of papers. If an author reports that an important variable was measured 12 different ways and then reports statistically significant results for only 1 of these measures, you might suspect that the other 11 measures did not turn out to be significant when they were examined.

Sometimes numerous tests are done but not included in a paper. The use of too many significance tests is problematic whether you see them in the published paper or not.
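The concern about large numbers of significance tests can be made concrete with a small calculation. The sketch below rests on a simplifying assumption that is not stated in the text: all k tests are independent and every null hypothesis is true. Under that assumption, the probability of at least one Type I error across k tests is 1 − (1 − α)^k. The function name familywise_error_rate is hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Probability of at least one Type I error in k tests.

    Assumes (for illustration only) that the k tests are independent
    and that H0 is true for every one of them.
    """
    return 1 - (1 - alpha) ** k


if __name__ == "__main__":
    ALPHA = 0.05  # nominal alpha for each individual test
    for k in (1, 5, 10, 12, 20):
        risk = familywise_error_rate(ALPHA, k)
        print(f"{k:2d} tests: risk of at least one Type I error = {risk:.3f}")
```

Real tests within one study are rarely independent, so these values are only a rough guide, but they show how quickly the overall risk can exceed the nominal α of .05 for a single test.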

Evaluate p values critically. Realize that violations of assumptions and rules (that probably are not explicitly reported in most research reports) can make p values poor estimates of the true risk for Type I error.

Realize that a very small p value does not necessarily imply that the effect is large in practical or clinical terms.

Look for effect size information. If effect size is not reported, there should be sufficient information for you to calculate this by hand. All you need to find Cohen’s d is M, SD, and μhyp (the proposed or hypothesized value of μ). Also evaluate whether the effect size is large enough to have any practical or clinical importance. When variables are measured in meaningful units, M − μhyp is useful information.

Look for confidence intervals.

Ask if it is reasonable to generalize from the types of cases in this study to larger populations in the real world. Ask if the situation in the study is comparable with real-world situations.

9.13 Planning Future Research

Research methods textbooks specific to your field of interest provide much information about planning research. From the perspective of NHST, here are some important issues.

Make decisions ahead of time about significance tests (test statistic, α level, directional or nondirectional test).

Make decisions ahead of time about the identification and handling of outliers.

Estimate the population effect size. Effect sizes from past studies (your own past research or other people’s) may be used to do this. It is better to underestimate population effect size than to overestimate it.

Use your estimated effect size and type of test (e.g., one-sample t test, α = .05, two tailed) to look up estimated power for your effect size and planned N. Or, using .80 for power, figure out the minimum N needed to have 80% power.

9.14 Guidelines for Reporting Results

The information to include in a research report depends on the specific test. For a one-sample t test, include N, M, SD, df, SEM, t, and (exact) p; whether p is one tailed or two tailed; effect size information such as Cohen's d and/or M − μhyp; and a CI for M (or for M − μhyp). The following elements should be included in a written report for a one-sample t test.

* A statement of what test was done, for what variable.
* Sample size (N), M, SD, and SEM.
* The CI for M (or the CI for the M − μhyp difference).
* Obtained t with its df and exact p. State whether p is one tailed or two tailed.
* Traditionally, a statement of whether a test was statistically significant and/or whether the null hypothesis can be rejected has usually been included. Proponents of the New Statistics suggest that we should avoid yes/no thinking and instead focus on confidence intervals and effect sizes.
* Effect size (such as Cohen's d) and, if units of measurement are interpretable, a difference such as M − μhyp may also be useful as information about practical significance.

Here is an example of a complete “Results” section for a one-sample t test that includes all information listed above.


Results

A one-sample t test was conducted to assess whether mean speed for a sample of N = 9 cars differed from the posted speed limit of 35 mph. A two-tailed test was used. For this sample, M = 39, SD = 6.103, and SEM = 2.034. The 95% CI for M was [34.31, 43.69]. The result was t(8) = 1.966, p = .0845, two tailed. Cohen's d effect size was .66; by Cohen's standards, this represents a medium effect. However, the obtained 4 mph difference between the sample mean (M = 39) and the posted speed limit (35 mph) was too small to have much practical importance.

We could add that, using α = .05, two tailed, this difference was not statistically significant.

A discussion section following these results should consider limitations such as the following:

* An accidental sample may not be representative of (similar to) the population of all drivers in this town. If the sample contained mostly male (rather than female) drivers, or was obtained mostly during rush hour, the sample mean may overestimate driving speed for cars more generally.
* This sample size (N = 9) is too small to draw meaningful conclusions.
* This report makes no mention of screening for outliers (was one driver clocked at 90 mph?).

You may be able to think of additional questions.

9.15 What You Cannot Say

A major problem with p values is that they cannot answer the question we really want to answer. We would like to know something about the probability that the null (or the alternative) hypothesis is correct, given the information in our sample data. Instead, a p value tells us (often very inaccurately) about the probability of obtaining the values of M and t we got in our sample, given that the null hypothesis is correct (Cohen, 1994). I don't suggest that you try to say that in a research report (it may confuse your readers). Here are examples of things you should not say.

Never make any of the following statements:

* p = .000
* p was “highly” significant
* p was “almost” significant (or synonymous terms such as “close to” or “marginally” significant)

For “small” p values, such as p = .04, we cannot say:

* Results were not due to chance, or could not be explained by chance (we don’t know that!)
* Results will replicate in future studies
* H0 is false
* We accept (or have proved) the alternative hypothesis
* Because p is small, this is an important difference

We also cannot use (1 − p), for example (1 − .04 = .96), to make probability statements such as:

* There is a 96% chance that results will replicate
* There is a 96% chance that the null hypothesis is false


For p values on the order of p = .37, we cannot say, “Accept the null hypothesis.” The language we use to report results should not overstate the strength of the evidence, imply large effect sizes in the absence of careful evaluation of effect size, overgeneralize the findings, or imply causality when rival explanations cannot be ruled out. We should never say, “This study proves that...” Any one study has limitations. As suggested in Chapter 1, it is better to think in terms of degrees of belief. As we obtain increasing amounts of good-quality evidence, we may become more confident of a belief. We should also pay attention to inconsistent evidence that would reduce our belief.

We can say things such as:

* The evidence in this study is consistent with the hypothesis that ...
* The evidence in this study is not consistent with the hypothesis that ...

Hypothesis can be replaced by similar terms, such as prediction.

9.16 Summary

In practice, many assumptions and rules for the use of NHST are frequently violated. Samples are often not randomly selected from real populations of interest or evaluated for their representativeness relative to real-world populations. Researchers often report large numbers of significance tests. The desire to obtain statistically significant results can tempt researchers to engage in “data fishing”; researchers may “torture” their data until it confesses (Mills, 1993). For example, they may run many different analyses or delete extreme scores until they obtain statistically significant results. When any of these violations of rules and assumptions are present, reported p values do not accurately represent the true risk for incorrectly rejecting H0.

In addition to difficulties and disputes about the logic of statistical significance testing, there are additional reasons why the results of a single study should not be interpreted as conclusive evidence that the null hypothesis is either true or false. A study can be flawed in many ways that make the results uninformative, and even when a study is well designed and carefully conducted, statistically significant outcomes sometimes arise just by chance. Therefore, the results of a single study should never be treated as conclusive evidence. To have enough evidence to be confident that we know how variables are related, it is necessary to have many replications of a result based on methodologically rigorous studies.

Despite logical and practical problems with NHST, most experts do not recommend that NHST and reports of p values should be entirely abandoned. NHST can help researchers evaluate whether chance or sampling error are likely explanations for an observed outcome of a study.

We can’t completely get rid of risk for error, no matter how well we behave. But we should avoid behaviors that we know make our risk for error worse. These behaviors have been given many names (p-hacking, fishing, data torturing, questionable research practices). I will remind you of these problems as you learn additional statistical tests.

Thou shalt not place too much faith in p values.

Appendix 9A: Further Explanation of Statistical Power


We can incorporate sampling error into understanding statistical power by visualizing two sampling distributions. The first describes the sampling distribution for M (and for t) if H0 is correct. The second describes the sampling distribution for M if μ equals a specific value different than the value specified in H0. In the following example, let’s consider testing hypotheses about intelligence scores. Suppose that the null hypothesis is

μhyp = 100,

the sample standard deviation SD = 15, and the sample size is N = 10 (therefore df = 9). This gives us:

SEM = 15/√N = 15/√10 = 15/3.162 = 4.74.

From the table of critical values for the t distribution, which appears in Appendix B at the end of the book, the critical values of t for α = .05, two tailed, and df = 9 are t = +2.262 and t = −2.262. The critical values of M would therefore be

100 − (2.262 × 4.74) = 89.28, and

100 + (2.262 × 4.74) = 110.72.

In other words, we would reject H0: μ = 100 if we obtain a sample mean M that is less than 89.28 or greater than 110.72.

To evaluate statistical power, we need to think about two different possible distributions of outcomes for M, the sample mean. The first is the distribution of outcomes that would be expected if H0 were true; the “reject regions” for the statistical significance test are based on this first distribution. The second is the distribution of outcomes for M that we would expect to see if the effect size d = 1.00, that is, if the real population mean (115) were 1 standard deviation above the hypothesized population mean of 100 points.

For this example, let's work with an effect size of Cohen’s d = 1.00. Now let’s suppose that the actual population mean is 115. This would make the value of Cohen’s d = [μ − μhyp]/SD = [115 − 100]/15 = 1.00.

The upper panel of Figure 9.1 shows the expected distribution of outcome values of t given H0: μ = 100, Ha: μ ≠ 100, df = 9, and α = .05 (two tailed). Using df = 9, we can find the critical values of t from the table in Appendix B; for α = .05 (two tailed) with 9 df, we would reject H0 for values of t > 2.262 and for values of t < −2.262.

The lower panel of Figure 9.1 shows how these critical values of t correspond to values of M. A value of t can be converted back into the original units of measurement. The value of M that corresponds to a critical or cutoff value of t is M = μhyp + (tcritical × SEM). For example, a t value of 2.262 corresponds to a sample mean M of 110.72. The reject regions for H0 can be given in terms of obtained values of M. We would reject H0: μ = 100 for values of M > 110.72 and for values of M < 89.28.

If t = (M − μhyp)/SEM, then M = μhyp + tcritical × SEM.

The preceding discussion shows how the distribution of outcomes for M that is (theoretically) expected when H0 is assumed to be true is used to work out the reject regions for H0 (in terms of values of t or M).
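The arithmetic for the reject regions can be sketched in a few lines of code. This is a minimal illustration using the numbers from the example above; the variable names are hypothetical, the critical value 2.262 is taken from the t table as in the text rather than computed, and SEM is rounded to two decimals to match the hand calculation.

```python
import math

MU_HYP = 100.0      # hypothesized population mean in H0
SD = 15.0           # sample standard deviation
N = 10              # sample size, so df = N - 1 = 9
T_CRITICAL = 2.262  # table value for alpha = .05, two tailed, df = 9

# Standard error of the mean, rounded to 2 decimals as in the text.
se_m = round(SD / math.sqrt(N), 2)

# Convert the critical values of t back into the units of M.
lower_critical_m = MU_HYP - T_CRITICAL * se_m
upper_critical_m = MU_HYP + T_CRITICAL * se_m

if __name__ == "__main__":
    print(f"SE_M = {se_m:.2f}")
    print(f"Reject H0 if M < {lower_critical_m:.2f} or M > {upper_critical_m:.2f}")
```

Without the intermediate rounding of SEM, the cutoffs differ from the text's values only in the second decimal place.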


Figure 9.1 Illustration of Statistical Power and Risk for Type II Error (β)

The image is a combination diagram with two graphs that illustrates the statistical power and risk for Type II error.

1. The first diagram shows the distribution of values of M if H0 is true (H0: μ = 100) and SEM = 4.74. The graph is that of a normal distribution where the X axis ranges from 90 to 110. The critical value of M is 110.72. The tail region that corresponds to α/2 at the tail end of the distribution on either side equals .025. This has been shaded.

2. The second diagram shows the distribution of values of M if μ = 115 and SEM = 4.74. The graph is that of a normal distribution where the X axis ranges from 110 to 125. The critical value of M is 110.72. The region that corresponds to power (1 − β) = .80 of the distribution has been shaded. This is the entire region to the right of 110.72.

Note: The upper distribution shows how values of M are expected to be distributed if H0 is true, that is, μ = 100. The shaded regions in the upper and lower tails of the upper distribution correspond to the reject regions for the test of H0: μ = 100. The lower distribution shows how values of M would be distributed if the population mean μ is actually 115; on the basis of this distribution, we see that if μ is really 115, then about 80% of the outcomes for M would be expected to exceed the critical value of M (110.72).

The next step is to ask what values of M would be expected to occur if H0 is false (one of many ways that H0 can be false is if μ is actually equal to 115). An actual population mean of μ = 115 corresponds to a Cohen’s effect size of 1 (i.e., the actual population mean 115 is 1 standard deviation higher than the value of μhyp = 100 given in the null hypothesis).

The lower panel of Figure 9.1 illustrates the theoretical sampling distribution of M if the population mean is really equal to 115. We would expect most values of M to be fairly close to 115 if the real population mean is 115, and we can use SEM to predict the amount of sampling error that is expected to arise for values of M across many samples.

The final step involves asking this question: On the basis of the distribution of outcomes for M that would be expected if μ is really equal to 115 (as shown in the bottom panel of Figure 9.1), how often would we expect to obtain values of the sample mean M that are larger than the critical value of M = 110.72 (as shown in the upper panel of Figure 9.1)? Note that values of M below the lower critical value of M = 89.28 would occur so rarely when μ really is equal to 115 that we can ignore this set of possible outcomes.

To work out the probability of obtaining a sample mean M greater than 110.72 when actual μ = 115, we find the t ratio that tells us the distance between the “real” population mean, μ = 115, and the critical value of M = 110.72. This value is t = (M − μ)/SEM = (110.72 − 115)/4.74 = −.90. The likelihood that we will obtain a sample value for M that is large enough to be judged statistically significant given the decision rule developed previously (i.e., reject H0 for M > 110.72) can now be evaluated by finding the proportion of the area in a t distribution with 9 df that lies to the right of t = −.90. Tables of the

of M shown in Figure 9.1.

Comprehension Questions

1. What is a Type I error?
2. What factors influence the magnitude of risk for Type I error?
3. What is a Type II error?
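Returning to the Appendix 9A example, the area to the right of t = −.90 in a t distribution with 9 df can be approximated numerically instead of being read from a table. The sketch below is one way to do this under stated assumptions: the helper names t_pdf and t_area_right_of are hypothetical, the area is obtained by Simpson's rule integration of the t density, and, like the text, it uses the ordinary (central) t distribution as an approximation.

```python
import math


def t_pdf(t, df):
    """Density of the (central) t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_area_right_of(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on the interval from 0 to t.

    Works for negative t as well, because the step size h is signed.
    `steps` must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3  # signed area between 0 and t
    return 0.5 - area_0_to_t


MU_ACTUAL = 115.0    # assumed actual population mean
CRITICAL_M = 110.72  # upper critical value of M from the reject region
SE_M = 4.74
DF = 9

t_ratio = (CRITICAL_M - MU_ACTUAL) / SE_M
power = t_area_right_of(t_ratio, DF)

if __name__ == "__main__":
    print(f"t = {t_ratio:.2f}, approximate power = {power:.2f}")
```

The result should be close to the value of about .80 shown in the lower panel of Figure 9.1. A dedicated power program would typically use a noncentral t distribution, so its answer may differ somewhat.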

Greater than or equal to
Evaluate whether both conditions hold
Evaluate whether one or both of the conditions hold

Figure 6.16 Temperature Data File With Cases Removed by Select Cases Procedure Marked by Cross Hatches

The image is a scatterplot for perfect negative correlation where r equals −1.

The X axis denotes hours_study and ranges from 0 to 6. The Y axis denotes errors_exam and ranges from 0 to 10.

There are 8 data points visible: 0, 10; 1, 9; 2, 8; 3, 7; 4, 6; 5, 5; 6, 4; 7, 3.

The line formed by joining the data points is a straight line from the top left to the bottom right.

10.5 Most Associations Are Not Perfect

In behavioral and social science research, data rarely have correlations near −1.00 or +1.00; values of r tend to be below .30 in absolute value. When scores are positively associated (but not perfectly linearly related), they tend to fall within a roughly cigar-shaped ellipse in a scatterplot, as shown in Figure 10.4.

To see how patterns in scatterplots change as the absolute value of r decreases, consider the following scatterplots that show hypothetical data for SAT score (a college entrance exam in the United States, X) as a predictor of first-year college grades (Y).

Figure 10.5 shows a scatterplot that corresponds to a correlation of +.75 between SAT score (X predictor) and college grade point average (GPA) (Y outcome). The association tends to be linear (GPA increases as SAT score increases), but it is not perfectly linear. If we draw a line through the center of the entire cluster of data points, it is a straight line with a positive slope (higher GPAs go with higher SAT scores). However, many of the data points are not very close to the line.

It is useful to think about the way average values of Y differ across selected values of X (such as low, medium, and high values of X). The vertical ellipses in Figure 10.5 identify groups of people with low, medium, and high SAT scores. You can see that the group of people with low SAT scores has a mean GPA of 1.1, while the group of students with high SAT scores has a mean GPA of 3.6. We can’t predict each person’s GPA exactly from SAT score, but we can see that the average GPA is higher when SAT score is high than when SAT score is low.

If the correlation between GPA and SAT score is about +.50, the scatterplot will look like the one in Figure 10.6. In real-world studies, correlations between SAT and GPA tend to be about +.50 (Stricker, 1991). The difference in mean GPA for the low versus high SAT score groups in the graph for r = +.50 is less than it was in the graph for r = +.75. Also, within the low, medium, and high SAT score groups, GPA varies more when r = +.50 than when r = +.75. SAT scores are less closely related to GPA when r = .50 than when r = .75.

Now consider what the scatterplot looks like when correlation is even smaller, for example, r = +.20. Many correlations in behavioral science research reports are about this magnitude. A scatterplot for r = +.20 appears in Figure 10.7. In this scatterplot, points tend to be even farther away from the line than when r = +.50, and mean GPA does not differ much for the low SAT versus high SAT groups. For correlations below about r = .50, it becomes difficult to detect any association by visual examination of the scatterplot.

Figure 10.4 Ellipse Drawn Around Scores in an X, Y Scatterplot With a Strong Positive Correlation

GPA

Y=GPA 40 30 20 10

300

400

500

600

700

800 SAT score

The image is an ellipse drawn around a scatterplot. The scatterplot has strong positive correlation. The axis represents SATscores and ranges from 300 to 800. The Y axis represents GPA and ranges from 1 to 4.

The scatter points mostly lie within an elliptical area within 500 to 700 on the X axis and between 2 and 4 ontheY axis. The points show positive correlation as a rise in X axis levels also seem to indicate rise in Y axis levels. There are a couple of outliers, but most points lie within theellipse. Figure 10.5 Scatterplot: Hypothetical Association Between GPA and SAT Score Corresponding to 7= +.75

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r equals plus .75. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4.

There are three ellipses within which most of the data points are clustered. There are a few outliers, but most points lie within the ellipses. The ellipses are almost vertical. For a mean GPA of 1.1, the first ellipse has 8 data points. They are clustered around the 1 GPA and 400 SAT score levels.

The second ellipse is for a mean GPA of 2.4. The data points are clustered around the 2 to 3 GPA range and the 500 to 600 SAT score levels. There are around 14 such data points.

The third ellipse is for a mean GPA of 3.6. Here, data points are fewer, just around 5, and are clustered around the 4 GPA level and 700 SAT level.

A straight line through the means of the three ellipses shows strong positive correlation.

Figure 10.6 Scatterplot for Hypothetical GPA and SAT Score With r = +.50

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r equals plus .5. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4.

There are three ellipses within which most of the data points are clustered. There are many outliers, but several points lie within the ellipses. The ellipses are vertical.

For a mean GPA of 1.4, the first ellipse has 6 data points. They are clustered around the 2 GPA and 400 SAT score levels.

The second ellipse is for a mean GPA of 2.0. The data points are clustered around the 1 to 3 GPA range and the 500 to 600 SAT score levels. There are around 18 such data points. There are many points close to the ellipse, but not contained within it.

The third ellipse is for a mean GPA of 2.6. Here, data points are fewer, just around 5, and are clustered around the 3 GPA level and 700 SAT level.

A straight line drawn between the means of the ellipses is almost linear.

Figure 10.7 Hypothetical Scatterplot for r = +.20

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r equals plus .2. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4.

There are two ellipses within which many data points are clustered. There are many outliers, and several of these points lie between the ellipses.

For a mean GPA of 2.1, the first ellipse has 8 data points. They are clustered around the 1 to 3 GPA and 400 SAT score levels.

The second ellipse is for a mean GPA of 2.4. The data points are clustered around the 1.5 to 3.5 GPA range and the 650 to 700 SAT score levels. There are around 5 such data points.

Most of the other points lie between the two ellipses and not inside them, while a straight line drawn between the means of both ellipses is almost horizontal.

10.6 Different Situations in Which r = .00

Finally, consider what scatterplots can look like when r is close to 0. An r of 0 tells us that there is no linear relationship between X and Y. However, there are two different ways r close to 0 can happen. If X and Y are completely unrelated, r will be close to 0. If X and Y have a nonlinear or curvilinear relationship, r can also be close to 0.

Figure 10.8 shows a scatterplot for a situation in which X is not related to Y at all. If SAT scores were completely unrelated to GPA, the results would look like Figure 10.8. The two groups (low and high SAT scores) have the same mean GPA, and mean GPA for each of these groups is equal to mean GPA for all persons in the sample. Also note that the overall distribution of points in the scatterplot is approximately circular in shape (instead of elliptical).

However, an r of 0 does not always correspond to a situation where X and Y are completely unrelated. An r close to 0 can be found when there is a strong but not linear association between X and Y. Figure 10.9 shows hypothetical data for an association sometimes found in research on anxiety (X) and task performance (such as exam scores, Y). The plot shows a strong, but not linear, association between anxiety and exam score. An inverse U-shaped curve corresponds closely to the pattern of change in Y. In this example, students very low in anxiety obtain low exam scores (perhaps they are not motivated to study and do not concentrate). Students with medium levels of anxiety have high mean exam scores (they are motivated to study). However, students with the highest level of anxiety also have low exam scores; at high levels of anxiety, panic may set in and students may do poorly on exams. Pearson's r is close to 0 for the data in this plot. Pearson's r does not tell us anything about the strength of this type of association, and it is not an appropriate analysis for this situation.

An example of a different curvilinear function appears in Figure 10.10 (height in feet, Y, and grade in school, X). In this example, r would be large and positive; however, a straight line is not a good description of the pattern. Height increases rapidly from Grades 4 through 7; after that, height increases slowly and levels off. If you flip Figures 10.9 and 10.10 upside down, they correspond to other possible curvilinear patterns.

Figure 10.8 An r of 0 That Represents No Association Between X and Y

The image is that of a circle and two ellipses drawn around a scatterplot that shows no relationship between X and Y. The r equals 0. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4.

There are two ellipses within which many data points are clustered. There are many outliers, and several of these points lie between the ellipses. A larger circle encircles most of the points as well as the ellipses. For a mean GPA of 2.4, the first ellipse has 9 data points. They are clustered around the 1 to 3 GPA and 400 SAT score levels.

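The point that a strong curvilinear association can yield r close to 0 can be checked with a small sketch. The anxiety and exam scores below are invented for illustration (they are not the data behind Figure 10.9), and `pearson_r` is a hypothetical helper written for this example rather than a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented anxiety scores (X), symmetric around a midpoint of 5.
anxiety = list(range(1, 10))
# Invented exam scores (Y): an exact inverted U that peaks at anxiety = 5.
exam_curved = [90 - 4 * (a - 5) ** 2 for a in anxiety]
# For contrast, an exactly linear (also invented) relationship.
exam_linear = [50 + 5 * a for a in anxiety]

r_curved = pearson_r(anxiety, exam_curved)
r_linear = pearson_r(anxiety, exam_linear)

print(f"inverted U: r = {abs(r_curved):.2f} in absolute value")
print(f"linear:     r = {r_linear:.2f}")
```

Exam score is completely determined by anxiety in both cases, yet only the linear pattern produces a large r. This is why the scatterplot, not r alone, is needed to judge whether an association exists.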

cube is positive; when (X - M_X) is negative, its cube is negative. Skewness therefore provides information about the comparative magnitudes of positive versus negative deviations from the mean. A positive value for the skewness index indicates more extreme scores at the upper end of the distribution. SPSS provides a skewness index (along with the standard error of skewness, or SEskewness) that can be used to test whether skewness differs significantly from zero. However, in most research situations, visual examination of a histogram is sufficient to evaluate skewness. To decide whether skewness is severe, you can divide skewness by the standard error of skewness given in SPSS's output for descriptive statistics and evaluate this ratio using standards for z scores. If the z ratio is greater than 3 in absolute value, it indicates "statistically significant" skewness. Consider the fruit and vegetable servings data that appeared in Figure 6.10. For the daily number of servings of fruits and vegetables variable (NCIfv), skewness = 1.273, SEskewness = .110, and 1.273/.110 = 11.57. This distribution is very positively skewed (it has a longer tail on the upper end). When skewness is present, values of the median and mode are usually not close together, and the curve for a normal distribution does not fit well. Descriptive statistics, including the skewness index, appear in Figure 6.18.

6.C.2 Index for Kurtosis

Kurtosis has been widely misunderstood; it is sometimes described as information about "peakedness" of distribution shape. That is incorrect (Westfall, 2014). Thinking about the computational formula helps us see why. A common formula for kurtosis is:

\text{Kurtosis} = \frac{\sum (X - M_X)^4}{N \times SD^4} \qquad (6.6)

When deviations from the mean are taken to the fourth power, greater weight is given to extreme scores. Kurtosis provides more information about extreme scores in the tails than about the shape of the peak. Different distribution shapes can arise for varying degrees of kurtosis. Westfall (2014) offers examples to demonstrate that kurtosis does not provide information about the shape of distribution peaks.

Figure 6.18 Descriptive Statistics Including Skewness for Daily Fruit and Vegetable Consumption Data

Statistics for NCIFV: N valid = 492; N missing = 0; Mean = 1.86; Median = 1.00; Std. Deviation = 2.327; Skewness = 1.273; Std. Error of Skewness = .110.
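The rule of thumb for judging skewness (divide the skewness index by its standard error and read the ratio like a z score) can be sketched in a few lines. The function name `skewness_z_ratio` is a hypothetical helper for this illustration; the two input values are the ones the text reports for the NCIfv variable.

```python
def skewness_z_ratio(skewness, se_skewness):
    """Ratio of the skewness index to its standard error, read like a z score."""
    return skewness / se_skewness

# Values reported in the text for daily fruit and vegetable servings (NCIfv).
skewness = 1.273
se_skewness = 0.110

z = skewness_z_ratio(skewness, se_skewness)
# Rule of thumb from the text: a ratio above 3 in absolute value
# is treated as "statistically significant" skewness.
is_significant = abs(z) > 3

print(f"z ratio = {z:.2f}; exceeds 3 in absolute value: {is_significant}")
```

The ratio is far beyond 3, which agrees with the text's description of this distribution as very positively skewed. In most situations the histogram would already make that clear.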

among observations and the data collection methods that tend to create problems with this assumption, refer to Chapter 2. When people in the sample have not had opportunities to influence one another, this assumption is usually met. When this assumption is violated, values of r and significance tests of r can be incorrect.

10.7.5 X and Y Must Be Appropriate Variable Types

Some textbooks say that both X and Y must be quantitative variables for Pearson's r; that is the most common situation when Pearson's r is reported. However, Pearson's r can also be used if either X or Y, or both, is a dichotomous variable (for example, if X represents membership in just two groups). When one or both variables are dichotomous, Pearson's r can be reported with different names. If we correlate the dichotomous variable sex with the quantitative variable height, this is called a point biserial r (r_pb). If we correlate the dichotomous variable sex with the dichotomous variable political party coded 1 = Republican, 2 = non-Republican, that correlation is called a phi coefficient, denoted φ.

If X and/or Y have three or more categories, Pearson's r cannot be used, because it is possible for the pattern of means on a Y variable to show a nonlinear increase or decrease across groups defined by a categorical X variable. For example, if X is political party membership (coded 1 = Democrat, 2 = Republican, 3 = Socialist, and so forth), and Y is a rating of the president's performance, we cannot expect changes in Y across values of X to be linear.

Consider the scatterplot in Figure 10.11 that represents an association between sex (X) and height (Y). Pearson's r requires that the association between X and Y be linear. When an X predictor variable has only two possible values, the only association that X can have with Y is linear.

In the bivariate regression chapter, you will see that the X independent variable can be either quantitative or dichotomous. However, the Y dependent variable in regression analysis cannot be dichotomous; it must be quantitative. Correlation analysis does not require us to distinguish variables as independent versus dependent; regression does require that distinction.

10.7.6 Assumptions About Distribution Shapes

Textbooks sometimes say that the joint distribution of X and Y must be bivariate normal and/or that X and Y must each be normally distributed. In practice this is often difficult to evaluate. The bivariate normal distribution is not discussed further here. In practice, it is more important that X and Y have similar distribution shapes (Tabachnick & Fidell, 2018). When X and Y have different distribution shapes, values of r in the sample are restricted to a narrower range, for example, -.40 to +.40.

Figure 10.11 Scatterplot for Dichotomous Predictor (Sex) and Quantitative Outcome (Height)

The image is a scatter plot for a dichotomous predictor such as sex and a quantitative outcome like height. The X axis has two points; one for M1 or males and the second for M2 or females. The Y axis shows the height and ranges from 58 to 76.

There are 10 data points for males and an equal number for females. These points appear to be in straight vertical lines for each sex. When the midpoints of the two lines are joined, a downward sloping line emerges.

Note: M1 = mean male height; M2 = mean female height.

When there are problems with assumptions, nonparametric alternative correlations such as Spearman's r or Kendall's tau may be preferred (see Appendix 10A and Kendall, 1962). These also require assumption of linearity, but they are likely to be less influenced by bivariate outliers.

10.8 Preliminary Data Screening for Pearson's r

The following information is needed to evaluate the assumptions. To evaluate representativeness of the sample and independence of observations, you need to know how the sample was obtained and how data were collected (see Chapter 2). In addition:

Examine frequency tables to evaluate problems with missing values and/or outliers and/or implausible values for X or Y (as for all analyses). Document numbers of missing values and outliers and how they are handled.

Obtain histograms for X and Y. Evaluate whether distribution shapes are reasonably normal and X and Y have similar shapes. For a dichotomous variable, the closest to normal shape is a 50/50 split in group membership.

Obtain an X, Y scatterplot. This is the most important part of data screening for correlation. The scatterplot is used to evaluate linearity and identify potential bivariate outliers.

Pearson's r is not robust against violations of most of its assumptions. A statistic such as the median is described as robust if departures from assumptions and/or the presence of outliers do not have much impact on the value of that sample statistic. Partly because r is affected badly by violations of its assumptions and additional problems discussed later, sample sizes for r should be large, ideally at least N = 50 or 100, and data screening should include evaluation of bivariate outliers.
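The contrast between a robust statistic (the median) and Pearson's r can be illustrated with a small sketch. The ten (X, Y) pairs and the added outlier are invented for this example, and `pearson_r` is a hypothetical helper, not part of SPSS or any library.

```python
from math import sqrt
from statistics import median

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented scores with only a weak linear trend.
x = [10, 12, 14, 16, 18, 11, 13, 15, 17, 19]
y = [15, 11, 17, 12, 16, 13, 18, 19, 14, 10]

# The same data plus one invented extreme bivariate outlier.
x_out = x + [60]
y_out = y + [58]

r_without = pearson_r(x, y)
r_with = pearson_r(x_out, y_out)
median_shift_x = median(x_out) - median(x)
median_shift_y = median(y_out) - median(y)

print(f"r without outlier = {r_without:.2f}, r with outlier = {r_with:.2f}")
print(f"median of X moves by {median_shift_x}, median of Y moves by {median_shift_y}")
```

One added case moves the medians by only half a point but changes r from a weak negative value to a very large positive one. This is the sense in which r is not robust, and it is one reason larger samples and scatterplot screening are recommended.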

10.9 Effect of Extreme Bivariate Outliers

Prior chapters discussed methods for the detection of univariate outliers (i.e., outliers in the distribution of a single variable) through examination of histograms and boxplots. In correlation, we also need to consider possible bivariate outliers. These do not necessarily have extreme values on X or on Y (although they may). A bivariate outlier often represents an unusual combination of values of X and Y. For example, if X is height and Y is weight, height of 6 ft and body weight of 120 lb would be a very unusual combination of values, even though these are not extreme scores in histograms for height and weight. If you visualize the location of points in your scatterplot as a cloud, a bivariate outlier is an isolated data point that lies outside that cloud.

Figure 10.12 shows an extreme bivariate outlier (the circled point at the upper right). For this scatterplot, r = +.64. If the outlier is removed, and a new scatterplot is set up for the remaining data, the plot in Figure 10.13 is obtained; when the outlier is excluded, r = -.11 (not significantly different from 0). This example illustrates that the presence of a bivariate outlier can inflate the value of a correlation. It is not desirable to have the result of an analysis depend so much on one outlier score.

The presence of an outlier does not always increase the magnitude of r; consider the example in Figure 10.14. In this example, when the circled outlier at the lower right of the plot is included, r = +.532; when it is excluded, r = +.86. When this bivariate outlier is included, it decreases the value of r. These examples demonstrate that decisions to retain or exclude outliers can have substantial impact on r values.

Figure 10.12 Scatterplot That Includes a Bivariate Outlier

The image is of a scatterplot that includes a bivariate outlier. The X axis ranges from 0 to 60 and the Y axis also ranges from 0 to 60. The bivariate outlier is a data point 60, 58. This lies outside a circle that encloses the other data points. The other data points lie close to the region bounded by 10 and 20 on the X and Y axes.

Figure 10.13 Subset of Data From Figure 10.12 After Outlier Is Removed

Note: With the bivariate outlier included, Pearson's r(48) = +.64, p < .001; with the bivariate outlier removed, Pearson's r(47) = -.10, not significant.

The image is of a scatterplot after a bivariate outlier has been eliminated. The X axis ranges from 0 to 25 and the Y axis ranges from 0 to 20. The data points lie in a circle close to the region bounded by 10 and 15 on the X and Y axes. A note below the graph mentions the following: With the bivariate outlier included, Pearson's r(48) equals plus .64, p less than .001; with the bivariate outlier removed, Pearson's r(47) equals minus .10, not significant.

It is dishonest to run two correlations (one that includes and one that excludes bivariate outliers) and then report only the larger correlation. It can be acceptable to report both correlations so that readers can see the effect of the outlier. Decisions about identification and handling of outliers should be made before data are collected.

Figure 10.14 A Bivariate Outlier That Deflates the Size of r

Note: With the bivariate outlier included, Pearson's r(48) = +.532, p < .001; with the bivariate outlier removed, Pearson's r(47) = +.86, p

.25) is large. Guidelines are

summarized in Table 10.3. Below r of about .10, a correlation represents a relation between variables that we can detect in statistical analyses using large samples, but the relation is so weak that it is not noticeable in everyday life. When r = .30, relations between variables may be strong enough that we can detect them in everyday life. When r is above .50, relations may be easily noticeable in everyday life. These guidelines for effect size labels are well known and generally accepted by researchers in social and behavioral sciences, but they are not set in stone. In other research situations, an effect size index other than r and/or different cutoff values for small, medium, and large effects may be preferable (Fritz, Morris, & Richler, 2012). When findings are used to make important decisions that affect people's lives (such as whether a new medical treatment produces meaningful improvements in patient outcomes), additional information is needed; see further discussion about effect size in Chapter 12, on the independent-samples t test.

Table 10.3

Verbal labels, from largest to smallest effect: Large effect (easily noticeable difference in real life); Between medium and large; Medium effect; Between small and medium effect; Small effect (statistically significant in large samples but not noticeable or detectable in everyday life); Between small and no effect; No effect.

Source: Based on Cohen (1988).

Why did Cohen choose .30 as the criterion for a medium effect? I suspect it was because correlations of approximately r = .30 and below are common in research in areas such as personality and social psychology. For example, Mischel (1968) remarked that values of r greater than .30 are rare in personality research. In some fields, such as psychophysics and behavior analysis research in psychology, proportions of explained variance tend to be much higher (sometimes on the order of 90%). The effect size guidelines in Table 10.3 would not be used in research fields where stronger effects are common.

In practice, you will want to compare your r and r² values with those obtained by other researchers who study similar variables. This will give you some idea of how your effect sizes compare with those in other studies in your research domain.

10.17 Pearson's r and r² as Effect Sizes and Partition of Variance

Both Pearson's r and r² are indexes of effect size. They are standardized (their values do not depend on the original units of measurement of X and Y), and they are independent of sample size N. Sometimes r² is called the coefficient of determination; I prefer to avoid that term because it suggests causality, and as noted earlier, correlation is not sufficient evidence for causality. An r² estimates the proportion of variance in Y that can be predicted from X (or, equivalently, the proportion of variance in X that is predictable from Y). Proportion of predicted variance (r²) can be diagramed by overlapping circles, as shown in Figure 10.22. Each circle represents the total variance of one variable. The area of overlap between circles is proportional to r², the shared or predicted variance. The remaining area of each circle corresponds to 1 - r²; this represents the proportion of variance in Y that is not predictable from X.

Figure 10.22 Overlapping Circles: Proportions of Areas Correspond to r² and (1 - r²)

The image shows two overlapping circles. When both circles intersect, the area of overlap between circles is proportional to r squared. Circle X on the left represents 1 minus r squared. Circle Y on the right also represents 1 minus r squared.

As shown in Figure 10.22, r² and (1 - r²) provide a partition of variance for the scores in the sample. For example, the variance in Y scores can be partitioned into a proportion that is predictable from X (r²) and a proportion that is not predictable from X (1 - r²). An r² is often referred to as "explained" or predicted variance in Y; (1 - r²) is error variance or variance that cannot be predicted from X. In everyday life, error usually means "mistake" (and mistakes sometimes do happen when statistics are calculated and reported). The term error means several different things in statistics. In the context of correlation and many other statistical analyses, error refers to the collective influence of several kinds of factors that include other predictor or causal variables not included in the study, problems with measurements of X and/or Y, and randomness.

Suppose the Y variable is first-year college GPA, and X is SAT score. Correlations between these variables are on the order of .4 to .5 in many studies. If r = .5, then r² = .25 = 25% of the variance in GPA is predictable; and (1 - r²) = 75% of the variance in GPA is error variance, or variance that is not predicted by SAT score. In statistics, error refers to all other variables that are not included in the analysis that may influence GPA. For example, GPA may also depend on variables such as amount of time each student spends partying and drinking, difficulty of the courses taken by the student, amount of time spent on outside jobs, life stress, physical illness, and a potentially endless list of other factors. If we have not measured these other variables and have not included them in our data analysis, we have no way to evaluate their effects.
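The partition of variance into r² and (1 - r²) can be made concrete with a short sketch. The six (X, Y) pairs are invented for illustration, and the sketch uses a least-squares prediction line, which the text takes up later in the bivariate regression chapter; treat that step as an assumption of this example rather than as part of the present section.

```python
from math import sqrt
from statistics import mean, pvariance

# Invented scores for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / sqrt(sxx * syy)

# Least-squares line used to predict Y from X (assumed for this sketch).
slope = sxy / sxx
intercept = my - slope * mx
predicted = [intercept + slope * a for a in x]
residual = [b - p for b, p in zip(y, predicted)]

var_y = pvariance(y)
prop_predicted = pvariance(predicted) / var_y   # should match r squared
prop_error = pvariance(residual) / var_y        # should match 1 - r squared

print(f"r squared = {r ** 2:.3f}, predicted share = {prop_predicted:.3f}")
print(f"1 - r squared = {1 - r ** 2:.3f}, error share = {prop_error:.3f}")

# The text's GPA example: r = .5 gives 25% predictable and 75% error variance.
r_gpa = 0.5
print(f"r = {r_gpa}: predictable = {r_gpa ** 2:.0%}, error = {1 - r_gpa ** 2:.0%}")
```

The two shares add to 1, which is what the overlapping circles in Figure 10.22 depict: the overlap is the predicted share and the rest of the Y circle is the error share.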


By now, you may be thinking, If we could measure these other variables and include them in the analysis, then the percentage of variance in GPA

detecting a population effect size ρ of .50. In statistical power analysis, power of .80 is used as the goal (i.e., you want an 80% chance of rejecting H0 if H0 is false).

Using Table 10.4, it is possible to look up the minimum N of participants required to obtain adequate statistical power for different population correlation values. For example, let α = .05, two tailed; set the desired level of statistical power at .80 or 80%; and assume that the true population value of the correlation is ρ = .5. This implies a population ρ² of .25. From Table 10.4, a minimum of N = 28 subjects would be required to have power of 80% to obtain a significant sample result if the true population correlation is ρ = .50. Note that for smaller effects (e.g., a ρ² value on the order of .05), sample sizes need to be substantially larger; in this case, N = 153 would be needed to have power of .80.

Table 10.4

Source: Adapted from Jaccard and Becker (2009).

Post hoc (or postmortem) power analyses should not be conducted. In other words, if you found a sample r² of .03 using an N of 10, do not say, "The r in my study would have been statistically significant if I had N of 203" (or some other larger value of N).

Even if power analysis suggests that a smaller sample size is adequate, it is generally a good idea to have at least N = 100 cases where correlations are reported, to avoid situations where there is not enough information to evaluate whether assumptions (such as normality and linearity) are satisfied and situations where one or two extreme outliers can have a large effect on the size of the sample correlation. The following is slightly paraphrased from Schönbrodt (2011):

From my experience as a personality psychologist, I do not trust correlations with N < 80. … N of 100-120 is better. In this region, correlations get stable (this is of course only a rule of thumb and certainly depends on the magnitude of the correlation). The p value by itself is bad guidance, as in small samples the CIs are very huge ... the CI for r = .34 (with N = 35) goes from .008 to .60, which is "no association" to "a strong association." Furthermore, r is rather susceptible to outliers, which is even more serious in small samples.

Guidelines about sample size are not chiseled into stone. When data points are difficult to obtain, sometimes researchers have no choice but to use small Ns. Be aware that small Ns are not ideal and that results obtained using small samples will have wide confidence intervals and may not replicate closely in later studies.

10.19 Interpretation of Outcomes for Pearson's r

10.19.1 When r Is Not Statistically Significant

If r does not differ significantly from 0, this does not prove there is no association between X and Y
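The confidence interval quoted from Schönbrodt (2011) in the discussion of sample size (r = .34 with N = 35) can be approximated with the Fisher r-to-z transformation. This is a sketch of one common approach; the source does not state which method produced the quoted interval, and `r_confidence_interval` is a hypothetical helper written for this example.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def r_confidence_interval(r, n, level=0.95):
    """Approximate CI for a correlation using the Fisher r-to-z transformation."""
    z = atanh(r)                 # transform r to the z scale
    se = 1 / sqrt(n - 3)         # approximate standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Build the interval on the z scale, then transform back to the r scale.
    return tanh(z - crit * se), tanh(z + crit * se)

lower, upper = r_confidence_interval(0.34, 35)
print(f"95% CI for r = .34, N = 35: {lower:.3f} to {upper:.2f}")

# The same r with a larger sample gives a much narrower interval.
lower_big, upper_big = r_confidence_interval(0.34, 350)
print(f"95% CI for r = .34, N = 350: {lower_big:.2f} to {upper_big:.2f}")
```

Under these assumptions the interval for N = 35 runs from just above 0 to about .60, in line with the quoted values, which shows how little a single small-sample r pins down the population correlation.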

Chapter 7 Sampling Error and Confidence Intervals

7.1 Descriptive Versus Inferential Uses of Statistics

Up to this point, we have used statistics such as M and SD only to describe scores in small samples. In some real-life situations, such as evaluation of exam scores for a class of students, that is all the data analyst wants to do. For example, a teacher may report summary information such as the mean, median, minimum, maximum, and standard deviation of scores in his or her class. However, teachers typically do not use this information to make inferences about students outside the class. When the use of statistics is limited to description of a sample, that is called a descriptive use of statistics. When instructors engage in descriptive use of statistics, they report their results something like this: "In the sample of students in my classroom, N = 36, M = 85, and SD = 10." Then the instructor stops and makes no statements about larger populations of students beyond the students included in the class.

In scientific studies, however, researchers almost always want to say something about a population of cases beyond the cases included in the study. Here is a simple hypothetical example of an inferential use of statistics. A researcher wants to estimate (make an inference about) population mean length of lizards for the entire population of lizards on an island. Suppose it is not possible to locate every lizard. A biologist captures a sample of N = 25 lizards and finds mean length M = 2 in. The researcher can say, "The mean length in my sample is 2 in." However, the biologist probably wants to say something about the mean length for the population of all lizards on the island.

Two problems must be considered when using information from a sample to make inferences about a population. One issue, discussed earlier, is representativeness of the sample. Is the sample similar to the population of interest? We should be careful not to generalize results from a study to a population that includes many kinds of people that were not included in the study. Now we consider a second problem that arises when using a sample to make inferences about a population: the problem of sampling error. Different samples, drawn from the same population, usually have different sample means. Variation in values of M across different samples from the same population is called sampling error. How much can we believe the mean from any one sample? If M is close to the population mean, it is a good estimate of that population mean; if it is far, then it is not a good estimate. We need to have some idea how far any individual value of M is likely to be from the population mean.

This may appear to be an unanswerable question. How can we say anything about the distance of a sample mean M from the population mean if we don't know the population mean? However, this question can be answered by creating artificial (imaginary) populations of scores for which we do know the population mean, drawing many different samples from those imaginary populations, and examining the distributions of values of M across all of these samples (for example, by setting up a histogram for values of M).

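The idea of drawing many samples from an imaginary population with a known mean can be sketched as a small simulation. Everything numeric here is assumed for illustration: a normally distributed population of lizard lengths with mean 2.0 in. and standard deviation 0.5 in., samples of N = 25 as in the lizard example, and 2,000 repeated samples.

```python
import random
from statistics import mean, pstdev

rng = random.Random(2021)   # fixed seed so the sketch is repeatable

POP_MEAN = 2.0      # assumed population mean length, in inches
POP_SD = 0.5        # assumed population standard deviation, in inches
N = 25              # sample size, as in the lizard example
NUM_SAMPLES = 2000  # number of samples drawn from the imaginary population

def draw_sample_mean():
    """Draw one random sample of N lengths and return its mean M."""
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    return mean(sample)

sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)

print(f"smallest M = {min(sample_means):.2f}, largest M = {max(sample_means):.2f}")
print(f"mean of the M values = {mean_of_means:.2f}")
print(f"SD of the M values   = {sd_of_means:.2f}")
```

Individual sample means differ from one sample to the next (sampling error), but they cluster around the known population mean. The spread of the M values gives a sense of how far any one sample mean is likely to fall from the population mean.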

7.2 Notation for Samples Versus Populations

A spurious correlation may occur because of

more weight, this outcome would be consistent

chance or coincidence. Spurious correlations arise

with the hypothesis that diet drinks cause weight

because a third variable (sometimes referred to as

gain. Further research would then be needed to

a “confounded variable”or “lurking variable”) is

figure out a possible mechanism, for example,

involved. For the example of ice cream sales and

specific ways in which diet drinks might change

homicide, the confounded variableis

metabolism.

temperature. In hotter months, homicide rates increase, and ice cream sales increase (Peters,

As a beginning statistics student, you have not yet

2013).

learned statistical techniques that can be used to

However, correlations that seem silly are not

using these methods can lead to mistaken

always spurious. Some evidence suggests that

judgements whether a correlation is spurious or

consumption of diet drinksis related to weight

not.

assess more subtle forms of spuriousness. Even

gain (Wootson, 2017). At first glance this may seem silly. Diet drinks have few or no calories; how could they influence body weight? In this case, I'm not sure whether the correlation is spurious or not. On one hand, some artificial sweeteners might alter metabolism in ways that promote weight gain. If that is the case, then this correlation is not spurious: the artificial sweeteners may cause weight gain. On the other hand,it is possible that weight gain increases diet drink consumption (i.e., people who worry about their weight may switch to diet drinks) or that consumption of diet drinksis related to confounded or lurking variables, such as exercise. People who don't exercise may gain weight; if people who don't exercise also consume diet drinks, then consumption of diet drinks will have a spurious(not directly causal) correlation with weight gain.

When unexpected or odd or even silly correlations arise, researchers should not go through mental gymnastics trying to explain them. In a study of folk culture, a researcher once reported that nations with high milk production (X) also scored high in ornamentation of folk song style (7). There was a positive correlation between Xand ¥. The author suggested that the additional protein provided by milk provided the energy to generate more elaborate song style. This is a forced and unlikely explanation. For beginning students in statistics, here is my advice: Be skeptical about what you read; be careful what you say. There are many reasons why sample correlations may not be good estimates of population correlations. Spurious correlations can happen; large correlations sometimes turn up in situations where variables

When correlations are puzzling or difficult to explain, more research is needed to decide whether they might indicate real relationships between variables. If a study were done in which all participants had the same daily calorie consumption, and participants were randomly divided into groups that did and did not consume diet drinks, then if the diet drink group gained

43% Page 263 of 624 - Location 6692 of 15772

are not really related to each other in any meaningful way. A decision to call a correlation statistically significant can be a Type I error; a decision to call a correlation not significant can be a Type II error. (Later you will learn that correlation values also depend on which other variables are included in the analysis.)

Researchers often select X and Y variables for correlation analysis because they believe X and Y have a meaningful, perhaps causal, association. However, correlation does not provide a rigorous way to test this belief.

When you report correlations, use language that is consistent with their limitations. Avoid using terms such as proof, and do not say that X causes, influences, or determines Y when your data come from nonexperimental research.

10.20 SPSS Example: Relationship Survey


The file called love.sav is used in the following example. Table 10.1 lists the names and characteristics of variables. To obtain a Pearson correlation, the menu selections (from the menu bar above the data worksheet) are <Analyze> → <Correlate> → <Bivariate>, as shown in Figure 10.23. The term bivariate means that each requested correlation involves two (bi means two) variables. These menu selections open the Bivariate Correlations dialog box, shown in Figure 10.24.

The data analyst uses the cursor to highlight the names of at least two variables in the left-hand pane (which lists all the variables in the active data file) for correlations. Then, the user clicks on the arrow button to move selected variable names into the list of variables to be analyzed. In this example, the variables to be correlated are named commit (commitment) and intimacy. Other boxes can be checked to determine whether significance tests are to be displayed and whether two-tailed or one-tailed p values are desired. To run the analyses, click the OK button.

Figure 10.23 Menu Selections for Bivariate Correlations

The image is a screenshot from SPSS that shows the menu selections for bivariate correlations. The details are below. At the top of the spreadsheet are the following menu buttons: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Extensions, Window, and Help. Below these buttons are icon buttons for table editing options. Clicking the Analyze button has opened a drop-down menu with the following options: Reports, Descriptive Statistics, Bayesian Statistics, Tables, Compare Means, General Linear Model, Generalized Linear Models, Mixed Models, Correlate, Regression, Loglinear, Classify, Dimension Reduction, Scale, Nonparametric Tests, Forecasting, Survival, Multiple Response, Simulation, Quality Control, ROC Curve, and Spatial and Temporal Modeling. An arrow next to Correlate shows that this option has been selected. The following menu options have opened: Bivariate, Partial, Distances, and Canonical Correlation. Bivariate has been indicated by an arrow.

The output from this procedure is displayed in Figure 10.25, which shows the value of the Pearson correlation (r = +.745), the p value (which would be reported as p < .001, two tailed), and the number of data pairs the correlation was based on (N = 118). The degrees of freedom for this correlation are given by N - 2, so in this example, the correlation has 116 df. (A common student mistake is confusion of df with N. You may report either of these as information about sample size, but in this example, note that df = 116 [N - 2] and N = 118.)

Figure 10.24 SPSS Dialog Box for Bivariate Correlations

Figure 10.25 Output for One Pearson Correlation

Correlations
                                 commit    intimacy
commit    Pearson Correlation    1         .745**
          Sig. (2-tailed)                  .000
          N                      118       118
intimacy  Pearson Correlation    .745**    1
          Sig. (2-tailed)        .000
          N                      118       118
**. Correlation is significant at the 0.01 level (2-tailed).

Note that the same correlation appears twice in Figure 10.25. The correlation of intimacy with commit (.745) is the same as the correlation of commit with intimacy (.745). The correlation of a variable with itself is 1 (by definition). Only one of the four cells in Figure 10.25 contains useful information. If you have only one correlation, it makes sense to report it in sentence form (“The correlation between commitment and intimacy was r(116) = +.745, p < .001, two tailed”). The value in parentheses after r is usually assumed to be the df, unless clearly stated otherwise.

It is possible to run correlations among many pairs of variables. The SPSS Bivariate Correlations dialog box that appears in Figure 10.26 includes a list of five variables: intimacy, commit, passion, length (of relationship), and times (the number of times the person has been in love). If the data analyst enters a list of five variables, as shown in this example, SPSS runs the bivariate correlations among all possible pairs of these five variables (as shown in Figure 10.27). If there are k variables, the number of possible different pairs of variables is given by [k x (k - 1)]/2. In this example with k = 5 variables, (5 x 4)/2 = 10 different correlations are reported in Figure 10.27. Note that because correlation is “symmetrical” (i.e., the correlation between X and Y is the same as the correlation between Y and X), the correlations that appear in the upper right-hand corner of the table in Figure 10.27 are the same as those that appear in the lower left-hand corner. When such tables are presented in journal articles, usually only the correlations in the upper right-hand corner are shown.

Figure 10.26 Bivariate Correlations Dialog Box: Correlations Among All Variables in List

The image is a dialog box that shows how to select bivariate correlations among all variables in the list. On the left is the set of remaining variables: gender, genpart, and attach. On the right are the selected variables: intimacy, commit, passion, times, and length. Times has been highlighted. Below this are options to select correlation coefficients, including Pearson, Kendall's tau-b, and Spearman. Pearson has been chosen. The test of significance (one tailed or two tailed) is chosen through radio buttons. Two tailed has been selected. A check box labeled Flag significant correlations has been ticked. On the right is an Options button. At the bottom of the dialog box are buttons for the following: OK, Paste, Reset, Cancel, and Help.

Figure 10.27 Correlation Output: All Variables in List (in Figure 10.26)

Correlations
                                 intimacy      commit        passion       length    times
intimacy  Pearson Correlation    1             .745**        [illegible]   .175      -.008
          Sig. (2-tailed)                      .000          .000          .058      .934
          N                      118           118           114           118       115
commit    Pearson Correlation    .745**        1             [illegible]   .197*     .008
          Sig. (2-tailed)        .000                        .000          .033      .929
          N                      118           118           114           118       115
passion   Pearson Correlation    [illegible]   [illegible]   1             .074      -.041
          Sig. (2-tailed)        .000          .000                        .432      .670
          N                      114           114           114           114       111
length    Pearson Correlation    .175          .197*         .074          1         .090
          Sig. (2-tailed)        .058          .033          .432                    .340
          N                      118           118           114           118       115
times     Pearson Correlation    -.008         .008          -.041         .090      1
          Sig. (2-tailed)        .934          .929          .670          .340
          N                      115           115           111           115       115
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Correlations among long lists of variables can generate huge tables, and often researchers want to obtain smaller tables. Suppose a data analyst wants to obtain summary information about the correlations between a set of three outcome variables Y1, Y2, and Y3 (intimacy, commitment, and passion) and two predictor variables X1 and X2 (length and times). To do this, we need to paste and edit SPSS syntax.

Look again at the Bivariate Correlations dialog box in Figure 10.26; there is a button labeled Paste. Clicking the Paste button opens a new window, called a Syntax window, and pastes the SPSS commands (or syntax) for correlation that were generated by the user’s menu selections into this window. The initial SPSS Syntax window appears in Figure 10.28. Syntax can be saved, printed, or edited. It is useful to save syntax to document what you have done or to rerun analyses later. In this example, we will edit the syntax; in Figure 10.29, the SPSS keyword WITH has been placed within the list of variable names so that the list of variables in the CORRELATIONS command now reads, “intimacy commit passion WITH length times.” It does not matter whether the SPSS commands are in uppercase or lowercase; the word WITH appears in uppercase characters in this example to make it easy to see. In this example, each variable in the first list (intimacy, commit, and passion) is correlated with each variable in the second list (length and times). Variables can be grouped by kinds of variables. Length and times are objective information about relationship history that could be thought of as predictors; intimacy, commitment, and passion are subjective ratings of relationship quality that could be thought of as outcomes. This results in a table of six correlations, as shown in Figure 10.30.

Figure 10.28 Initial SPSS Syntax Generated by Paste Button

The image is an SPSS Syntax window generated by the Paste button. At the top of the sheet are the following menu buttons: File, Edit, View, Data, Transform, Analyze, Graphs, Utilities, Add-ons, Run, Tools, Window, and Help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options. On the left, the statements Dataset activate and Correlations are seen. On the right, the editor shows the SPSS commands for correlation that were generated by the user's menu selections and are now pasted here:
1. DATASET ACTIVATE DataSet6.
2. CORRELATIONS
3. /VARIABLES=intimacy commit passion length times
4. /PRINT=TWOTAIL NOSIG
5. /MISSING=PAIRWISE.

Figure 10.29 Edited SPSS Syntax to Obtain Selected Pairs of Correlations

The image is an SPSS Syntax window with the same menu buttons and icon buttons as in Figure 10.28. On the left, the statements Dataset activate and Correlations are seen. On the right, the editor shows the edited SPSS commands for correlation:
1. DATASET ACTIVATE DataSet6.
2. CORRELATIONS
3. /VARIABLES=intimacy commit passion WITH length times
4. /PRINT=TWOTAIL NOSIG
5. /MISSING=PAIRWISE.

If many variables are included in the list for the bivariate correlation procedure, the resulting table of correlations can be large. It is often useful to set up smaller tables for subsets of correlations that are of interest, using the WITH command to designate which variables should be paired.
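The counting behind these tables can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SPSS syntax; the variable names come from love.sav, and the two list names (`outcomes`, `predictors`) are labels chosen here for the example.

```python
from itertools import combinations, product

variables = ["intimacy", "commit", "passion", "length", "times"]

# A single list of k variables yields every unordered pair:
# k * (k - 1) / 2 different correlations.
k = len(variables)
n_all_pairs = k * (k - 1) // 2
all_pairs = list(combinations(variables, 2))

# Analogue of "intimacy commit passion WITH length times":
# each variable in the first list is paired with each variable
# in the second list, and with nothing else.
outcomes = ["intimacy", "commit", "passion"]   # first list
predictors = ["length", "times"]               # second list
with_pairs = list(product(outcomes, predictors))

print(n_all_pairs, len(all_pairs))  # pairs from the full list
print(len(with_pairs))              # pairs requested with WITH
for pair in with_pairs:
    print(pair)
```

With three variables before WITH and two after it, the table has 3 x 2 = 6 correlations instead of the 10 produced by the full list.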

Note that judgments about the significance of p values indicated by asterisks in SPSS output are not adjusted to correct for the inflated risk for Type I error that arises when large numbers of significance tests are reported. If a researcher wants to control or limit the risk for Type I error, this can be done by using Bonferroni-corrected per comparison alpha levels to decide which, if any, of the p values reported by SPSS can be judged statistically significant. For example, to hold the EWα level to .05, the PCα level used to test the six correlations in Table 10.5 could be set to α = .05/6 = .008. Using this Bonferroni-corrected PCα level, none of the correlations would be judged statistically significant.

As a reader, you can generally assume that statistical significance assessments have not been corrected for inflated risk for Type I error unless the author explicitly says this was done.

Note (Table 10.5): Judgments about statistical significance were based on Bonferroni-corrected per comparison alphas. To achieve EWα = .05 for this set of six correlations, each correlation was evaluated using PCα = .05/6 = .008. By this criterion, none of the six correlations can be judged statistically significant.
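The Bonferroni decision rule described above can be sketched as follows. The six p values are the two-tailed values shown in Figure 10.30; the function name `bonferroni_pc_alpha` is a label invented for this illustration.

```python
def bonferroni_pc_alpha(ew_alpha, n_tests):
    """Per-comparison alpha that holds the experiment-wise alpha at ew_alpha."""
    return ew_alpha / n_tests

# Two-tailed p values as reported in Figure 10.30.
p_values = {
    ("intimacy", "length"): .058,
    ("intimacy", "times"): .934,
    ("commit", "length"): .033,
    ("commit", "times"): .929,
    ("passion", "length"): .432,
    ("passion", "times"): .670,
}

pc_alpha = bonferroni_pc_alpha(0.05, len(p_values))

# A correlation is judged significant only if p < PC alpha.
significant = [pair for pair, p in p_values.items() if p < pc_alpha]

print(round(pc_alpha, 3))
print(significant)
```

Note that the commit/length correlation (p = .033) is flagged with an asterisk by SPSS at the uncorrected .05 level, but it does not meet the corrected criterion.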

Figure 10.30 SPSS Correlation Output From Edited Syntax in Figure 10.29

Correlations
                                 length    times
intimacy  Pearson Correlation    .175      -.008
          Sig. (2-tailed)        .058      .934
          N                      118       115
commit    Pearson Correlation    .197*     .008
          Sig. (2-tailed)        .033      .929
          N                      118       115
passion   Pearson Correlation    .074      -.041
          Sig. (2-tailed)        .432      .670
          N                      114       111
*. Correlation is significant at the 0.05 level (2-tailed).

Note: Variables in the first list (intimacy, commitment, passion) are correlated with variables in the second list (length of present dating relationship, number of times in love). The p values in Figure 10.30 were not corrected for inflated Type I error.

Table 10.5

10.21 Results Sections for One and Several Pearson’s r Values

Following is an example of a “Results” section that presents the results of one correlation analysis.

Results

A Pearson correlation was performed to assess whether levels of intimacy in dating relationships could be predicted from levels of commitment on a self-report survey administered to 118 college students currently involved in dating relationships. Commitment and intimacy scores were obtained by summing items on two of the scales from Sternberg’s (1997) Triangular Love Scale; the range of possible scores was from 15 (low levels of commitment or intimacy) to 75 (high levels of commitment or intimacy). Examination of histograms indicated that both variables had negatively skewed distributions. Scores tended to be high on both variables, possibly because of social desirability response bias and a ceiling effect (most participants reported very positive evaluations of their relationships). Skewness was not judged severe enough to require data transformation or removal of outliers.

The scatterplot of intimacy with commitment showed a positive linear relationship. There was one bivariate outlier with unusually low scores for both intimacy and commitment; this outlier was retained. The correlation between intimacy and commitment was statistically significant, r(116) = +.75, p < .001 (two tailed). The r² was .56; about 56% of the variance in intimacy could be predicted from levels of commitment. This is a strong effect. The 95% CI was [.659, .819]. This relationship remained strong and statistically significant, r(108) = +.64, p < .001, two tailed, even when outliers with scores less than 56 on intimacy and 49 on commitment were removed from the sample.

number of times the participant has been in love, on the basis of a self-report survey administered to 118 college students currently involved in dating relationships.

Intimacy, commitment, and passion scores were obtained by summing items on scales from Sternberg’s (1997) Triangular Love Scale; the range of possible scores was 15 to 75 on each of the three scales. Examination of histograms indicated that the distribution shapes were not close to normal for any of these variables; distributions of scores were negatively skewed for intimacy, commitment, and passion. Most scores were near the high end of the scale, which indicated the existence of ceiling effects, and there were a few isolated outliers at the low ends of the scales. Skewness was not judged severe enough to require data transformation or removal of outliers.

Scatterplots suggested that relationships between pairs of variables were (weakly) linear. The six Pearson correlations are

Note that if SPSS reports p = .000, report this as p

When the dialog box for the one-sample t test appears, as in Figure 7.12, move the name of the variable of interest into the list of variables to be analyzed. Leave the box “Test Value” containing the default value of 0. Then click OK.

Figure 7.11 Descriptive Statistics for Temperature in Fahrenheit in shoemaker.sav

Statistics
temp_Fahrenheit
N    Valid                130
     Missing              0
Mean                      98.254
Std. Error of Mean        .0667
Std. Deviation            .7603

The image is a table that shows the following descriptive statistics data:

temperature data collected through smart phone crowdsourcing is reported by Hausman et al. (2018).

Values of N, M, and SD for the Fahrenheit temperature scores in the file shoemaker.sav were obtained using the SPSS frequencies procedure (menu selections are not repeated from earlier chapters). Results appear in Figure 7.11. The first thing to notice is that the sample mean in Figure 7.11, M = 98.25, is lower than the population mean that people generally believe (98.6). The difference is (98.25 - 98.6) = -.35. This sample mean is about a third of a degree lower than the generally accepted value. (Note that if you look up Shoemaker's article, numerical values

amount of water. However, high-income households may include people who live in small but expensive apartments (who may not use very much water) and people with huge estates (who fill swimming pools and water vast lawns). The vertical arrows indicate the approximate range or variance of scores for low- versus high-income groups. Variance in water use is much greater for high values of income than low values of income. In this situation, Pearson’s r would not be a complete description; information about the differences in variances in amount of water use for different levels of income would also be needed.

Violation of this assumption would result in larger prediction errors in the outcome variable (water use) for people with high incomes than for people with low incomes.

Figure 11.3 Hypothetical Data: Water Use Plotted by Household Income Showing Heteroscedasticity

[Scatterplot. X axis: Household income. Y axis: Gallons of water use. Circled regions: Small variance, Large variance.]

The image is a scatterplot that displays heteroscedasticity. The X axis denotes household income and the Y axis the gallons of water used. There are several scatter dots spread all over the graph area, and the distance between them increases as they move from left to right. Two areas of the scatter dots have been circled. The first is at the bottom left and is labeled small variance; the second is at the top right and is labeled large variance. The small variance section shows smaller gaps between the data points, and the large variance section has large gaps between the data points.

11.7 Formulas for Bivariate Regression Coefficients

We want to find values of b0 and b that give us the equation that falls as close as possible to all the points in the scatterplot. Another way to say this is that we want coefficients that minimize the prediction errors. The prediction error or residual for each case is just the difference between each person's actual value of Y and the value of Y predicted using the regression line:

(11.4) Prediction error for person $i = (Y_i - Y'_i)$.

If Ann's actual salary Y is $40,000, and her predicted salary, Y′, is $43,000, the prediction error is (40,000 - 43,000) = -3,000. Her actual salary is $3,000 lower than the salary predicted for her using the regression line.

The “best” regression equation is the one that provides predicted values (Y′) that are as close as possible to the actual Y. We need to summarize information about the magnitude of prediction errors across all persons in the sample. Can we just add up the prediction errors? No. I will tell you (without proof) that the sum of the prediction errors across all cases, $\sum (Y_i - Y'_i)$, always equals 0. This should not surprise you; you have seen that deviations often sum to zero. Therefore, adding prediction errors won't provide useful information. We encountered the same problem when we wanted to compute a sample variance, because the sum of deviations of X scores from the sample mean was also 0 in that situation. Recall that the “bag of tricks” in statistics is fairly small. The solution that was used to compute the variance of X was to square each deviation and then sum the squared deviations. The same trick is used here. We compute SSE (sum of squared prediction errors) as follows:

(11.5) $SSE = \sum (Y_i - Y'_i)^2$.

Equation 11.5 says that we calculate a prediction error for each individual case, square each prediction error, then sum the squared errors for all cases. We want to obtain the values of b0 and b that minimize SSE. The method used by mathematical statisticians to derive the formulas to compute b0 and b is called ordinary least squares (OLS). Many other statistics you'll learn are also based on OLS estimation. Appendix 11B explains how the formulas for b0 and b that yield the minimum SSE (the smallest prediction errors) were obtained.

Fortunately, because optimal formulas for the best values of b0 and b are known, you don’t have to solve this problem every time you want to do a regression. In practice, here is how to calculate the estimated values of b and b0: First, find the means and standard deviations for X and Y and their bivariate correlation rXY. (By now, these computations should be familiar.) Once you have the values of MX, MY, sX, sY, and rXY, the estimate for the raw-score slope b to predict Y from X is:

(11.6) $b = r \times \dfrac{s_Y}{s_X}$.

See Appendix 11C for an alternative equation to compute the estimate of b.

Note that the ratio of standard deviations in this equation is sY/sX or, more generally, s_dependent/s_independent. To understand this ratio, you can think of it this way. When you want to predict Y scores from X scores:
* First you need to divide X scores by sX to take them out of the X score units of measurement.
* You multiply that result by r; this provides information about the sign and strength of linear association between X and Y scores.
* Then you multiply by sY to convert the result into units of the Y outcome variable.

We also need to adjust predictions to take into account the differences between the means of X and Y. The intercept b0 (the predicted value of Y when X = 0, or the point on the graph where the regression line crosses the Y axis) can be computed from the means of X and Y, and the raw-score slope b, as follows:

(11.7) $b_0 = M_Y - b \times M_X$,

where MY is the mean of the Y scores in the sample and MX is the mean of the X scores in the sample. Including b0 in a regression equation adjusts for differences between the means of X and Y.
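Equations 11.4 through 11.7 can be tried out on a few numbers. The sketch below uses a small made-up data set (the `years` and `salary` values are hypothetical and are not the salary.sav data); the helper names are invented for this illustration. It computes b and b0 from the means, standard deviations, and r, then checks the two properties mentioned above: the prediction errors sum to 0, and moving either coefficient away from its OLS value makes SSE larger.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (c - my) for a, c in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((c - my) ** 2 for c in y)
    return sp / sqrt(ss_x * ss_y)

def fit_line(x, y):
    """Return (b0, b) using Equations 11.6 and 11.7."""
    b = pearson_r(x, y) * sample_sd(y) / sample_sd(x)  # Equation 11.6
    b0 = mean(y) - b * mean(x)                         # Equation 11.7
    return b0, b

def sse(x, y, b0, b):
    """Sum of squared prediction errors, Equation 11.5."""
    return sum((yi - (b0 + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: years employed and salary in thousands of dollars.
years = [1, 3, 5, 7, 9, 12]
salary = [33, 41, 44, 52, 58, 65]

b0, b = fit_line(years, salary)

# Prediction error for each case, Equation 11.4.
residuals = [yi - (b0 + b * xi) for xi, yi in zip(years, salary)]

print("b0 =", round(b0, 3), "b =", round(b, 3))
print("sum of residuals =", round(sum(residuals), 10))
print("SSE at OLS values =", round(sse(years, salary, b0, b), 3))
print("SSE with b0 shifted =", round(sse(years, salary, b0 + 0.5, b), 3))
```

Any other pair of coefficient values can be passed to `sse` to confirm that it gives a larger sum of squared errors than the OLS values.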

Equation 11.6 makes it clear that b is essentially a rescaled version of r. Unlike the unit-free Pearson’s r, which has a range from -1 to +1, the range of b depends on the units in which X and Y are measured.

Notice several implications of Equation 11.6:
* If r equals 0, then b also equals 0.
* The sign of b is determined by the sign of r.
* The magnitude of the raw-score b coefficient depends on the standard deviations of both variables. Values of b can range from extremely small into the thousands and above.
* As sY (the standard deviation of the outcome variable) increases, holding sX constant, b increases.
* On the other hand, as sX (the standard deviation of the predictor variable) decreases, holding sY constant, b also increases.
* Because b is a rescaled version of r, factors that can inflate or deflate r (discussed in Appendix 10D) can also influence the magnitude of b.

The β (beta coefficient) to predict zY from zX is β = r. The standard-score (z-score) version of the regression equation does not require an intercept to adjust for means of the variables, because z scores have means of 0. Because β = r, factors that can inflate or deflate r also influence the magnitude of β. Because the magnitudes of b and β are influenced by many of the same problems that can make sample rs poor estimates of the true population correlation ρ, comparisons of values of b or β (across samples or predictor variables) should only be made with great caution.

11.8 Statistical Significance Tests for Bivariate Regression

The null hypothesis for a regression with one predictor variable can be stated as follows: H0: b = 0. (We can also test the null hypothesis H0: b0 = 0, but this is usually not of interest.)

When you run the SPSS regression procedure, you will obtain a t ratio to test this null hypothesis. As in earlier situations, this t test has the following form:

(11.8) $t = \dfrac{\text{Sample statistic} - \text{Hypothesized parameter}}{SE_{\text{sample statistic}}} = \dfrac{b - 0}{SE_b}$,

with (N - 2) df. An estimate of SEb is needed to set up the t ratio. SPSS provides this estimate; if you need to calculate it by hand, the formula is as follows:

(11.9) $SE_b = \sqrt{\dfrac{\sum (Y - Y')^2 / (N - 2)}{\sum (X - M_X)^2}}$.

This t ratio is evaluated relative to a t distribution with N - 2 df, where N is the number of cases. SPSS provides this t test along with a two-tailed p value. SPSS also provides a test of whether the intercept b0 equals zero; however, this test is rarely of interest and usually is not reported.
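A hand calculation of Equations 11.8 and 11.9 might look like the sketch below. The data are hypothetical (they are not the salary.sav data), and the function name `slope_t_test` is invented for this illustration. The slope is computed as the sum of cross products divided by the sum of squares for X, which is algebraically the same as Equation 11.6.

```python
from math import sqrt

def slope_t_test(x, y):
    """Return b, SE_b, t, and df for a bivariate regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

    b = sp / ss_x       # same value as r * (s_Y / s_X), Equation 11.6
    b0 = my - b * mx    # Equation 11.7

    # Sum of squared prediction errors, Equation 11.5.
    sse = sum((yi - (b0 + b * xi)) ** 2 for xi, yi in zip(x, y))

    se_b = sqrt((sse / (n - 2)) / ss_x)  # Equation 11.9
    t = (b - 0) / se_b                   # Equation 11.8, with H0: b = 0
    return {"b": b, "se_b": se_b, "t": t, "df": n - 2}

# Hypothetical data: years employed and salary in thousands of dollars.
years = [1, 3, 5, 7, 9, 12]
salary = [33, 41, 44, 52, 58, 65]

result = slope_t_test(years, salary)
print(result)
```

The resulting t would then be compared with a critical value from a t distribution with N - 2 df, which is the step SPSS carries out when it reports the two-tailed p value.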

11.9 Confidence Intervals for Regression Coefficients

Using SEb, the upper and lower limits of the 95% CI around b can be calculated using the usual formula:

(11.10) Lower limit $= b - t_{crit} \times SE_b$,

(11.11) Upper limit $= b + t_{crit} \times SE_b$,

where t_crit is the critical value of t that separates the bottom 2.5%, the middle 95%, and the top 2.5% of the area in a t distribution with N - 2 df. SPSS provides the upper and lower limits of the 95% CI for the b coefficient.

11.10 Effect Size and Statistical Power

For a bivariate regression, the assessment of statistical power can be done using the same statistical power tables as those used for Pearson’s r. In general, if the researcher assumes that the strength of the squared correlation between X and Y in the population is weak, the number of cases required to have power of .80 or higher is rather large. Tabachnick and Fidell (2018) suggested that the ratio of cases (N) to number of predictor variables (k) should be on the order of N > 50 + 8k or N > 104 + k (whichever is larger) for regression analysis. This implies that N should be at least 105 when using one predictor variable. This is consistent with sample size suggestions from Schönbrodt (2011), discussed in Chapter 10. Even if statistical power tables may suggest that N < 100 can give adequate statistical power for significance tests of b and r, it is preferable to have N > 100.

11.11 Empirical Example Using SPSS: Salary Data

A small set of hypothetical data for the empirical example is given in the SPSS file salary.sav (with N = 50 cases). The predictor, X, is the number of years employed at a company; the outcome, Y, is the annual salary in dollars. The research question is whether salary changes in a systematic (linear) way as years of job experience or job seniority increases. In other words, how many dollars more salary can an individual expect to earn for each additional year of employment?

To run a bivariate linear regression, make the following menu selections: <Analyze> → <Regression> → <Linear> (as shown in Figure 11.4). This opens the main dialog box for the SPSS Linear Regression procedure, which appears in Figure 11.5. The name of the dependent variable (salary) and the name of the predictor variable (years) were moved into the panes marked “Dependent” and “Independent(s)” in this main dialog box. It is possible at this point to click OK and run a regression analysis; however, for this example, the following additional selections were made. The Statistics button was clicked to open the Linear Regression: Statistics dialog box; a checkbox in this window was marked to request values of the 95% CI for b, the slope to predict raw scores on Y from raw scores on X (Figure 11.6).

Figure 11.4 SPSS Menu Selections for Linear Regression

The Residuals section is below this. Here there are check options for Durbin-Watson and casewise diagnostics. Both have been left unmarked. At the bottom are option buttons for Continue, Cancel, and Help.

11.12 SPSS Output: Salary Data

To see the equivalence between Pearson's r and parts of the bivariate regression results, Pearson's r between years and salary was obtained using the SPSS correlations procedure; results appear in Figure 11.7.

Complete SPSS regression output includes additional information (discussed in Volume II [Warner, 2020]). Figure 11.8 shows the results needed to find the proportion of predicted and unpredicted variance (R² and 1 - R²) and to write out the two versions of the regression equations (raw score and standardized). From the top of Figure 11.8, the proportion of variance in salary that can be predicted from years of experience is r² or R², that is, .688 or about 69%. When regression includes more than one predictor, multiple R tells us how well the entire set of predictor variables can predict Y. In this example, the regression equation has only one predictor. When there is only one predictor variable, Pearson’s r between X and Y is the same as multiple R for the equation that uses X to predict Y. (You can ignore the other information in the top panel of Figure 11.8 for now. The standard error of the estimate is discussed later in the chapter and is not usually included in research reports. The adjusted R² value is only used when a regression has more than one predictor variable.)

Table 11.1 relabels and rearranges the elements of the coefficient table in the SPSS output so that you can relate them to terms in the textbook. The top panel of the SPSS output in Figure 11.8 gives results for R (capital R is called multiple R).

On the basis of information in Table 11.1 we can write the unstandardized regression equation to predict salary in dollars from experience in years, as follows:

$Y' = 31{,}416.72 + 2{,}829.57 \times \text{years}$.

Figure 11.7 Pearson's r for Years and Salary

Correlations
                                                    Unstandardized
                                                    Predicted Value    salary
Unstandardized        Pearson Correlation           1                  .830**
Predicted Value       Sig. (2-tailed)                                  .000
                      N                             50                 50
salary                Pearson Correlation           .830**             1
                      Sig. (2-tailed)               .000
                      N                             50                 50
**. Correlation is significant at the 0.01 level (2-tailed).

The image is a table that depicts Pearson correlations for salary. The correlation between Unstandardized Predicted Value and salary is .830 (double star), with Sig. (2-tailed) = .000 and N = 50. The correlation of each variable with itself is 1. The double star indicates that the correlation is significant at the .01 level (2-tailed).

Figure 11.8 Selected SPSS Linear Regression Output: Prediction of Salary From Years at Work

Model Summary(b)
Model    R        R Square    Adjusted R Square    Std. Error of the Estimate
1        .830a    .688        .682                 10407.343
a. Predictors: (Constant), years
b. Dependent Variable: salary

The image shows two tables that depict prediction of salary from years at work. The first table is the model summary (superscript b): Model 1, R: .830 (superscript a), R Square: .688, Adjusted R Square: .682, Std. Error of the Estimate: 10407.343. Superscript a: Predictors: (Constant), years. Superscript b: Dependent Variable: salary.

The second table is the coefficient table. Here the dependent variable is salary.
* Model 1, (Constant): 95 percent confidence interval, Lower bound: 26528.786, Upper bound: 36304.646
* Model 1, years: Unstandardized coefficients, B: 2829.572, Std. error: 274.838; Standardized coefficients, Beta: .830; t: 10.295; Sig.: .000; 95 percent confidence interval, Lower bound: 2276.972, Upper bound: 3382.171

Table 11.1
Intercept b0 = 31,416.72, 95% CI [26,528.79, 36,304.65]
Slope b = 2,829.57, 95% CI [2,276.97, 3,382.17]

example, 0 is a possible value for years of experience. In this example, $31,416.72 represents starting salary. The slope b tells us the predicted increase in salary (in dollars) for each additional one-unit increase in experience; this is $2,829.57. This corresponds to the average salary raise per year. The value of the beta (β) coefficient also appears in Figure 11.8. SPSS denotes the column for the standardized coefficients with the word Beta. In this example, β = .83. It is not just a coincidence that β = r = R. This always happens when regression includes only one predictor variable.

We can use the value of β to write the following standardized regression equation:

\[ z'_Y = \beta \times z_X = .83 \times z_X \]
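As a rough illustration of how the raw-score and standardized equations are used, the following Python sketch plugs in the coefficients reported for Figure 11.8. The function names and the example inputs (10 years of experience, a z score of 1.0) are hypothetical and chosen only for illustration.

```python
# Coefficients as reported in the text for Figure 11.8.
B0 = 31416.72   # intercept: predicted salary at 0 years of experience
B1 = 2829.57    # raw-score slope: predicted dollars per additional year
BETA = 0.83     # standardized slope (equals r when there is one predictor)


def predict_salary(years):
    """Raw-score regression equation: Y' = b0 + b * X."""
    return B0 + B1 * years


def predict_z_salary(z_years):
    """Standardized regression equation: z'_Y = beta * z_X."""
    return BETA * z_years


# Hypothetical inputs, for illustration only.
print(f"Predicted salary at 0 years:  {predict_salary(0):.2f}")
print(f"Predicted salary at 10 years: {predict_salary(10):.2f}")
print(f"Predicted z score on salary for z_X = 1.0: {predict_z_salary(1.0):.2f}")

# Squaring the rounded beta gives roughly the reported R square (.688);
# the small discrepancy comes from rounding beta to two decimals.
print(f"beta squared: {BETA ** 2:.3f}")
```

Because β is rounded to .83 here, squaring it reproduces the reported R² only approximately.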

.05. (However, keep in mind that some readers and reviewers continue to think that way.)

Comprehension Questions

1. Rerun the analysis for the carspeed.sav data and compare results across analyses (the chapter reports results for 95% CI and α = .05, two tailed).
   1. Use α = .01, two tailed, and a 99% CI.
   2. Use α = .10, two tailed, and a 90% CI.
   As α increases, does it become easier or more difficult to reject H0?
2. Using the data in shoemakertemp.sav, test the null hypothesis H0: μ = 98.6°F using a nondirectional alternative hypothesis. Conduct a one-sample t test using α = .05, two tailed, as the criterion for significance. Also obtain the 95% CI. Can you reject H0: μ = 98.6°F? How does this result compare with the CI obtained for the same data in the previous chapter?
3. What is a null hypothesis?
4. Describe three possible alternative hypotheses.
5. What is an alpha level? What determines the value of α?
6. Sketch reject regions for each of the following situations:
   [Table: Situations 1 through 6; entries not legible]
7. What is the difference between a directional and a nondirectional significance test?
8. Other factors being equal, which type of significance test requires a value of t that is larger (in absolute value) to reject H0: a directional or a nondirectional test?
9. When a researcher reports a p value, p stands for “probability” or risk. What probability or risk does this refer to?
10. Do we typically want p to be large or small?
11. What is the conventional standard for an “acceptably small” p value?

Digital Resources

Find free study tools to support your learning, including eFlashcards, data sets, and web resources, on the accompanying website at

(-2.754)²/[(-2.754)² + 18] = 7.585/(7.585 + 18) = .296 ≈ .30.

About 30% of the variance in heart rate in this study was predictable from caffeine dose.

To obtain rpb:

Take the square root of η².

To obtain Cohen's d:

First find sp from s1 and s2.

When n1 = n2, we can use Equation 12.12 (if ns are not equal, use Equation 12.11):

sp² = (s1² + s2²)/2 = (7.208² + 9.085²)/2 = (51.955 + 82.537)/2 = 134.492/2 = 67.246.

To obtain sp, take the square root of sp²: √67.246 = 8.200. Then d = (M1 - M2)/sp = -10.1/8.200 = -1.23.

This value of d tells us that the mean of the no-caffeine group was 1.23 standard deviations lower than the mean of the caffeine group (and the mean of the caffeine group was 1.23 standard deviations higher than the mean of the no-caffeine group).

Using Cohen's standards to evaluate effect size in Table 12.1, all these values are judged to be large to very large effect sizes.

12.10.6 Summary of Effect Sizes

Table 12.1 summarizes the characteristics of these effect sizes. Effect size values do not depend on N. By comparison, the magnitude of the independent-samples t ratio does depend on N. If other factors are held constant, as N increases, t also increases in absolute magnitude. In a few respects, t is similar to some effect sizes: it is unit free or standardized and not in the original units of measurement; it has a sign that indicates the direction of the relationship (which group mean is higher). By itself, t cannot be interpreted as a proportion of variance; however, t and df can be converted into η², which does provide information about proportion of variance. A t ratio does not have a limited range of possible values. Neither a t ratio nor its accompanying p value provides information about effect size.

• Researchers report t, df, and p as information about statistical significance; these numbers do not tell us anything about effect size. On the basis of t and p, we make judgments only about statistical significance (and not about significance or importance in practical, clinical, or real-world domains).
• Researchers should also report one or more of the effect sizes listed above as information about strength or size of effect (independent of sample size). Kirk (1996) suggested that we can interpret these values in terms of clinical or practical or real-world “significance.” Unfortunately both researchers and research consumers sometimes confuse statistical significance (p < .05) with practical, clinical, or real-world “significance.” I prefer to speak of practical, clinical, or real-world importance (and avoid use of the potentially confusing term significance).

Table 12.1

In original units of measurement?
Standardized/unit free?
Dependent on N?
Sign that indicates direction of difference?
Fixed range of possible values?
Interpret as proportion of variance?
Interpret as information about strength of association, independent of sample size?

fes No No Yes No

No Yes, if meaningful units

EPC Yes No No Otol

Yes Yes

Yes Yes No No scanbe Yes assigned tor Nobutd>t in absolute value is in research No = xs Yes

Note: Most effect sizes can be converted into other effect sizes, but additional information is often required.

a. Unequal ns in the groups restrict the range of possible values for rpb. The greater the inequality of ns, the smaller the possible absolute value of rpb.

We can also ask whether a finding has theoretical value or importance. If variable X accounts for more than 50% of the variance in a Y outcome, we might decide that variable X should be included in our theory about what causes Y. On the other hand, if variable X can account for only 1% of the variance in Y (even if X is a “statistically significant” predictor of Y), we would want to include more useful explanatory variables in a theory that attempts to explain Y. There is no clear cutoff for a minimum proportion of explained variance. Cohen (1988) suggested guidelines for interpretations of effect sizes; Table 12.2 summarizes these labels. You may want to compare this with Table 10.3 in Chapter 10, which includes some additional information about the way effect sizes are related to whether effects are detectable in everyday life. These labels are based on recommendations made by Cohen for the evaluation of effect sizes in social and behavioral research; however, in other research domains, it might make sense to use different labels for effect size (that is, require larger values of rpb and other effect sizes before calling them “medium” or “large” effects). Effect size guidelines suggested by Cohen differ slightly when given in terms of different effect size indexes.

Table 12.2

Verbal labels, in order from the top of the table: Very large effect, Large effect, Medium effect, Small effect, No effect. (The alignment of the labels with individual rows is not legible in the source.)

d      r      r²
2.0    .707   .500
1.5    .600   .360
1.2    .514   .265
1.0    .447   .200
.9     .410   .168
.8     .371   .138
.7     .330   .109
.6     .287   .083
.5     .243   .059
.4     .196   .038
.3     .148   .022
.2     .100   .010
.1     .050   .002
.0     .000   .000

Source: Adapted from Cohen (1988).

Note: The cutoff points for verbal labels are approximate. For r > .50, effects may be detectable in everyday life (for instance, the sex difference in height, with d = 2.00, is something people notice in everyday life).

Effect sizes have three major uses:

1. At least one index of effect size should be reported with every statistical significance test. For the independent-samples t test it is common to report η², rpb, or Cohen's d. When the dependent variable is measured in meaningful units, discussion should also focus on the M1 - M2 difference as a way to

think about the clinical or practical or real-world importance of the finding.
2. When you plan future research, you can use effect sizes from past research to estimate the minimum sample size you need to have adequate statistical power in your planned study. This is called statistical power analysis. Usually people want to have at least 80% power (i.e., approximately 80% chance of obtaining a statistically significant outcome for the guessed value of population effect size, such as η²). When a study has such small ns that there is a very low probability of obtaining a statistically significant outcome given the population effect size, it is called underpowered.
3. When an author summarizes past research, he or she obtains and combines (averages) effect size information for each of dozens or hundreds of studies. This is called meta-analysis. For example, we might want to know whether mean depression after therapy for patients differs across numerous studies that compare client-centered therapy (treatment) with no therapy (control). An effect size such as Cohen's d or rpb provides important information about the direction of difference (there might be a few studies in which mean depression was lower for the no-therapy group). If past studies have not reported effect sizes, effect sizes can almost always be obtained from other numerical results in the papers. In meta-analysis, it is important to include direction of effect.

12.11 Factors that Influence the Size of t

12.11.1 Effect Size and N

Formulas for statistical significance tests such as the independent-samples t test can be written in a way that makes it clear that the t test combines information about effect size and sample size or N or df (Rosenthal & Rosnow, 1991). In words:

Magnitude of t-test ratio = Effect size × Sample size of study. (12.22)

If effect size is held constant, the expected magnitude of t increases as N increases. If N is held constant, the expected magnitude of t increases as effect size increases. With a little bit of thought it should be clear that:

• When effect size and N are both very large, the value of t will almost always be large enough to judge the outcome statistically significant (and values of p will be very small).
• When effect size and N are both extremely small, the value of t will almost always be too small to judge the outcome statistically significant (and values of p will be large).

In practice, when effect size is very small, you need a larger N to have a reasonable chance of obtaining a statistically significant outcome. When the effect size is very large, you may be able to obtain a statistically significant outcome using quite a small sample. A specific formula for the independent-samples t test given by Rosenthal and Rosnow (1991) is:

\[ t = d \times \frac{\sqrt{df}}{2} \]  (12.23)



where d is Cohen's d, calculated as:

\[ d = \frac{M_1 - M_2}{s_p} \]

If we substitute the formula for Cohen's d into Equation 12.23, we have:

\[ t = \frac{M_1 - M_2}{s_p} \times \frac{\sqrt{df}}{2} \]  (12.24)

The specific values of t that occur in studies will vary because of sampling error. This equation tells us that if we hold other terms in the equation constant:

• As df (sample size) goes up, t tends to increase (and p tends to become smaller).
• As (M1 - M2) goes up, t tends to increase (and p tends to become smaller).

On the other hand:

• As sp goes up, t tends to decrease (and p tends to increase).

Notice an important implication of Equation 12.24. Even when effect sizes such as (M1 - M2) or Cohen's d are extremely small, as long as they do not turn out to be exactly zero in your sample, you can judge even very small mean differences statistically significant for larger values of N. You cannot use Equation 12.24 to predict your outcome value of t exactly from sample size and effect size, because this equation doesn't take sampling error into account, and we don't know population effect size. However, you can substitute different values into Equation 12.24 to get a sense of how increase in sample size makes it possible to detect very small effect sizes (i.e., judge them to be statistically significant).

For instance, research that compares mean IQ for single-birth children (Group 1) with mean IQ for identical twins (Group 2) yields sample means of about M1 = 100 and M2 = 99. (A 1-point difference in IQ is not noticeable in everyday life; you might notice IQ score differences of 20 or 30 points.) For most IQ tests, s = 15. Using Equation 12.24, we can compare potential differences in outcomes for a study with df = 100 versus a study with df = 10,000. With df = 100, the 1-point mean IQ difference is unlikely to yield a t value large enough to be statistically significant. When df = 10,000, the obtained t ratio is likely to be large enough to judge this 1-point difference statistically significant. (The t values are not exact; this equation does not take sampling error into account.)

\[ t = \frac{100 - 99}{15} \times \frac{\sqrt{100}}{2} = .0667 \times 5 = .3335 \]

\[ t = \frac{100 - 99}{15} \times \frac{\sqrt{10{,}000}}{2} = .0667 \times 50 = 3.335 \]

In practice, researchers sometimes can control sample size; sometimes they can control the magnitude of the other two elements in Equation 12.24. Decisions about “dosage level” or type of treatment often can increase the M1 - M2 difference. Decisions about the kinds of people to include in the study and the degree of standardization of data collection situations can influence the magnitude of sp, the within-group standard deviation.
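The substitution exercise described for Equation 12.24 can be sketched in a few lines of Python. This is only an illustration under the chapter's stated values for the IQ example (M1 = 100, M2 = 99, s = 15); the function name approx_t is hypothetical, and the result is the approximate t implied by the equation, not a t ratio computed from real data.

```python
import math


def approx_t(m1, m2, sp, df):
    """Equation 12.24: t = ((M1 - M2) / sp) * (sqrt(df) / 2)."""
    d = (m1 - m2) / sp               # Cohen's d
    return d * math.sqrt(df) / 2


# IQ example from the text: a 1-point mean difference with s = 15.
for df in (100, 10_000):
    t = approx_t(100, 99, 15, df)
    print(f"df = {df:>6}: approximate t = {t:.3f}")
```

The printed values differ slightly in the last digit from the hand calculation in the text because the text rounds d to .0667 before multiplying. Changing the arguments (for example, a larger M1 - M2 difference or a smaller sp) shows how each term in Equation 12.24 moves t.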

Researchers do not always have control over sample size. Sometimes researchers do not have


funds to pay participants, treatments or data collection procedures are very costly, or the study has to be completed in a very short period of time. When a researcher knows that the sample cannot be large, he or she needs to think about ways to increase the (M1 - M2) difference and/or decrease sp.

On the other hand, sometimes the results of large-N studies are reported in misleading ways. When N is very large, an effect can be judged statistically significant even when the effect size is too small to be of any real life or clinical or practical importance. Consider the twin versus individual child IQ study again. When the difference between mean IQs is tested in samples of 10,000 or more, it is almost always statistically significant. However, this difference could be deemed too small to be of any practical or clinical importance.

Unfortunately, researchers who conduct large-N studies and obtain p values < .001 sometimes call their results “extremely significant.” (Do not say that!) Here's the problem. In everyday life, when we use the word significant, we mean large or worthy of notice (or at the very least detectable). When we hear the word significant we tend to assume that differences between groups are large enough to matter to people and clinicians. Calling the results of a study “highly significant” can mislead many readers into thinking that the effects are large enough to be valuable or at least noticeable in real life.

Statistical significance and practical or clinical importance do not always go together, particularly when N is extremely large. Here's how to avoid confusion:

• Emphasize effect sizes in reports (instead of statistical significance tests).
• Explain effect sizes clearly and evaluate them honestly.
• Discuss simple information such as M1 - M2 when units of measurement are meaningful.
• Never say “extremely significant.”

12.11.2 Dosage Levels for Treatment, or Magnitudes of Differences for Participant Characteristics, Between Groups

The value of M1 - M2 can be affected by design decisions that involve the types of groups, types of treatment, or dosages of treatment for the two groups. Consider these two hypothetical studies of caffeine effects on heart rate:

Study A: Group 1 receives 0 mg caffeine, Group 2 receives 50 mg caffeine

Study B: Group 1 receives 0 mg caffeine, Group 2 receives 500 mg caffeine

Assuming that caffeine does have an effect on heart rate, we would expect the means for heart rate to be much farther apart in Study B than in Study A. By increasing the difference between treatment dosage amounts, researchers can often increase M1 - M2 and, therefore, other factors being equal, increase t.

Studies of naturally occurring groups can also be thought of in these terms. Suppose you want to study age group (X) differences in mean reaction time (Y).

Study A: Group 1 is ages 20-29, Group 2 is ages 30-39

Study B: Group 1 is ages 20-29, Group 2 is ages 70-79

Other factors being equal, you would expect mean reaction times to differ much more in Study B than in Study A.

Researchers must be very careful about something else that can influence the magnitude of the M1 - M2 difference: confounds of other variables with type or dosage of treatment. In the 0 mg caffeine versus 150 mg caffeine study, if the people in the 0 mg caffeine group have heart rate measured in a very relaxing setting, while those in the 150 mg group are assessed in a stressful setting, there is a complete confound between stress and caffeine dosage. Whether it is statistically significant or not, we cannot interpret a large M1 - M2 difference as information about the effects of caffeine. Some or all heart rate differences might be due to the amount of stress in the situation. In this example, a confound of high stress with high caffeine would make the M1 - M2 difference larger. Some confounds may make an M1 - M2 difference smaller (for example, if heart rate was measured by a nasty and threatening experimenter in the 0 mg caffeine group and by a relaxed and friendly experimenter in the 150 mg caffeine group, the effects of caffeine and the confound might cancel each other out and lead to a small M1 - M2 difference). The presence of one or more confounds makes an M1 - M2 difference, and the t ratio based on that difference, uninterpretable.

12.11.3 Control of Within-Group Error Variance

Researcher decisions can also influence sp, the pooled or averaged within-group standard deviation. The within-group standard deviation sp is often called experimental error. Experimental error tends to be large in drug studies where participants within each treatment group differ from one another on characteristics such as age, anxiety, history of drug use, and so forth. Experimental error is also large if participants within the same treatment groups are tested in different ways in different situations. Consider the caffeine/heart rate study again: Group 1 receives no caffeine, and Group 2 receives 150 mg caffeine. Now consider these different scenarios.

Study A: Participants within both groups are very similar in age, health, and amount of past caffeine consumption; all are nonsmokers; all have average fitness; none are evaluated during midterms or final exams; and none are tested by an anxious experimenter.

Study B: Participants within both groups vary in age, health, and amount of past caffeine consumption; some smoke, some do not; they have varying levels of aerobic fitness; some are tested during midterms and finals, others before spring break; and several different experimenters interact with the participants, some of whom are much more anxious than others.

In Study A, if participant characteristics are very similar or homogeneous, and experimental procedures are standardized and consistent, participants in each group should not show much variation in heart rates. Thus, in Study A, sp should be relatively small. On the other hand, in Study B, people who are in the same treatment group have different health backgrounds and are tested under different circumstances; you would expect wide variation in their heart rates. In Study

B, sp would be relatively large. If other factors (effect size and N) are held constant, there would be a better chance of obtaining a large t value for Study A than for Study B. Recruiting similar participants can help with statistical power, but it also reduces generalizability of findings. The participants in Study A are not diverse.

12.11.4 Summary for Design Decisions

Members of my undergraduate class became upset when I explained the way research design decisions can affect the values of t. They said, “You mean you can make a study turn out any way you want?” The answer is, within some limits, yes. The independent-samples t test is likely to be large for these situations and decisions. (For each factor, such as N, add the condition “other factors being equal.”)

• N is large (a very large N study can yield a statistically significant t ratio even if the population effect is very small).
• Population effect size such as η² is large (this is often related to treatment dosages or types of participants being compared).
• M1 - M2 is large (however, M1 - M2 is not interpretable if confounds are present).
• sp is small (this happens when participant characteristics and assessment situations are homogeneous within groups).

Depending on their research questions and resources, the degree to which researchers can control each of these factors may vary.

12.12 Results Section

Following is an example of a “Results” section for the study of the effect of caffeine consumption on heart rate.

Results

An independent-samples t test was performed to assess whether mean heart rate differed significantly for a group of 10 participants who consumed no caffeine (Group 1) compared with a group of 10 participants who consumed 150 mg of caffeine. Preliminary data screening indicated that scores on heart rate were reasonably normally distributed within groups. There were two high-end outliers in Group 1, but they were not extreme; outliers were retained in the analysis. The mean heart rates differed significantly, t(18) = -2.75, p = .013, two tailed. Mean heart rate for the no-caffeine group (M = 57.8, SD = 7.2) was about 10 beats per minute lower than mean heart rate for the caffeine group (M = 67.9, SD = 9.1). The effect size, as indexed by η², was .30; this is a very large effect. The 95% CI for the difference between sample means, M1 - M2, had a lower bound of -17.81 and an upper bound of -2.39. This study suggests that consuming 150 mg of caffeine may significantly increase heart rate, with an increase on the order of 10 bpm.

The assumption of homogeneity of variance was assessed using the Levene test, F = 1.57, p = .226; this indicated no significant violation of the equal variance assumption. Readers generally assume that the equal variances assumed version of the t test (also called the pooled-variances t test) was used unless otherwise stated. If you see df reported to several decimal places, this tells you that the equal variances not assumed t test was used.
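The main numbers in the Results section above can be checked by hand. The following Python sketch recomputes the pooled-variances t ratio, η², Cohen's d, and the 95% CI for M1 - M2 from the heart rate scores listed in the hrcaffeine.sav description in this chapter. It is a minimal illustration, not SPSS output: the variable names are hypothetical, and the critical value 2.101 is an assumption taken from a standard t table for df = 18 (α = .05, two tailed). The p value is not computed because that requires the t distribution.

```python
import math
from statistics import mean, variance

# Heart rate scores transcribed from the hrcaffeine.sav listing in this chapter.
no_caffeine = [51, 66, 58, 58, 53, 48, 57, 73, 56, 58]    # Group 1
caffeine_150 = [72, 57, 78, 61, 66, 54, 64, 82, 71, 74]   # Group 2

# Assumption: two-tailed .05 critical t for df = 18, from a standard t table.
T_CRIT_DF18 = 2.101

n1, n2 = len(no_caffeine), len(caffeine_150)
df = n1 + n2 - 2
diff = mean(no_caffeine) - mean(caffeine_150)              # M1 - M2

# Pooled variance and standard error of the difference (equal variances assumed).
sp2 = ((n1 - 1) * variance(no_caffeine) + (n2 - 1) * variance(caffeine_150)) / df
sp = math.sqrt(sp2)
se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = diff / se_diff
eta_squared = t ** 2 / (t ** 2 + df)     # proportion of variance
cohen_d = diff / sp                      # standardized mean difference
ci_lower = diff - T_CRIT_DF18 * se_diff
ci_upper = diff + T_CRIT_DF18 * se_diff

print(f"M1 - M2 = {diff:.1f}, sp = {sp:.3f}")
print(f"t({df}) = {t:.2f}")
print(f"eta squared = {eta_squared:.2f}, Cohen's d = {cohen_d:.2f}")
print(f"95% CI for M1 - M2: [{ci_lower:.2f}, {ci_upper:.2f}]")
```

Because the two groups have equal ns, the weighted pooled variance used here gives the same result as the simple average of the two variances.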

12.13 Graphing Results: Means and CIs

Cumming and Finch (2005) suggested that authors should emphasize confidence intervals along with effect sizes. Graphs of CIs help focus reader attention on these. Several types of CI graphs can be presented for the independent-samples t test. We could set up a graph of the CI for the (M1 - M2) difference using either an error bar or a bar chart. The lower and upper limits of this CI are provided in the independent-samples t test output. It is more common to show a CI for each of the group means (M1 and M2). This can be done with either the SPSS error bar or bar chart procedure. To obtain an error bar graph for M1 and M2, make the menu selections shown in Figure 12.15, Figure 12.16, and Figure 12.17.

In Figure 12.18 the separate vertical lines for each group (no caffeine, 150 mg caffeine) have two features. The dot represents the group mean. The T-shaped bars identify the lower and upper limits of the 95% CI for each group. Be careful when you examine error bar plots in journals or conference posters. Error bars that resemble the ones in Figure 12.18 sometimes represent the mean ± 1 standard deviation, or the mean ± 1 SE, instead of a 95% CI. Graphs should be clearly labeled so that viewers know what the error bars represent.
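To see how much the three kinds of error bars can differ, the following Python sketch computes all three sets of limits for each group, using the heart rate scores listed in the hrcaffeine.sav description in this chapter. It is an illustration only: the function name is hypothetical, and the critical value 2.262 is an assumption taken from a standard t table for df = 9 (α = .05, two tailed).

```python
import math
from statistics import mean, stdev

# Heart rate scores transcribed from the hrcaffeine.sav listing in this chapter.
groups = {
    "no caffeine": [51, 66, 58, 58, 53, 48, 57, 73, 56, 58],
    "150 mg caffeine": [72, 57, 78, 61, 66, 54, 64, 82, 71, 74],
}

# Assumption: two-tailed .05 critical t for df = n - 1 = 9, from a standard t table.
T_CRIT_DF9 = 2.262


def error_bar_limits(scores, t_crit):
    """Return (lower, upper) limits for three common kinds of error bars."""
    m = mean(scores)
    sd = stdev(scores)                      # sample standard deviation
    se = sd / math.sqrt(len(scores))        # standard error of the mean
    return {
        "95% CI": (m - t_crit * se, m + t_crit * se),
        "mean +/- 1 SD": (m - sd, m + sd),
        "mean +/- 1 SE": (m - se, m + se),
    }


limits = {name: error_bar_limits(scores, T_CRIT_DF9) for name, scores in groups.items()}

for name, bars in limits.items():
    print(name)
    for label, (lower, upper) in bars.items():
        print(f"  {label}: {lower:.2f} to {upper:.2f}")
```

With n = 10 per group, the ± 1 SE bars are less than half as wide as the 95% CI bars, which is why an unlabeled error bar plot can be misread.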

Figure 12.15 SPSS Menu Selections for Error Bar Procedure

The image is an SPSS menu selection to obtain the error bar procedure for the file hrcaffeine.sav.

At the top of the spreadsheet are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, extensions, window, and help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options.

The graphs menu has been opened and the following selections are visible: chart builder, graphboard template chooser, Weibull plot, compare subgroups, and legacy dialogs. The legacy dialogs menu has been opened to show the following menu options: bar, 2-D bar, line, area, pie, high-low, box plot, error bar, population pyramid, scatter or dot, and histogram.

There is some data visible on the spreadsheet. This has been reproduced below:

Caffeine, hr
1,51 1,66 1,58 1,58 1,53 1,48 1,57 1,73 1,56 1,58
2,72 2,57 2,78 2,61 2,66 2,54 2,64 2,82 2,71 2,74

Figure 12.16 Error Bar Dialog Box

The image is a dialog box in SPSS to create an Error bar. There are two choices available: simple and clustered. The Simple option has been highlighted. The data in the chart area can be shown as a summary in two ways: either as summaries for groups of cases or as summaries of separate variables. The first option has been selected. There are radio buttons for Define, Cancel and Help at the bottom of the dialog box.

Cumming and Finch (2005) pointed out that when two 95% CIs, like the ones in Figure 12.18, do not overlap, you know that the t test for the difference between group means must be statistically significant using α = .05, two tailed. On the other hand, if the CIs do overlap, it is possible that the t test that compares group means may be statistically significant (because the CI for [M1 - M2] has a larger df than the CIs for M1 and for M2).

Figure 12.17 Define Simple Error Bar: Summaries for Groups of Cases Dialog Box

The image is a dialog box to define error bar as summaries for groups of cases. On the left is space for variables, which can be chosen and moved to the box on the right. The variable hr has been moved to the right-side variable section. The category axis can also be defined. Here it has been specified as caffeine. There are two radio buttons on the side: titles and options. There is a drop-down menu that allows a choice of what the bars represent. Here the bars represent confidence interval for mean. The level can also be chosen and 95 percent is the level currently. The Panel by option for rows and columns can be specified although currently they are blank. A check box to nest variables for rows and columns is also present. The Template option has a check box that states: “Use chart specification from” and allows one to select the required file. At the bottom of the dialog box are options buttons for the following: OK, Paste, Reset, Cancel and Help.

Figure 12.18 Error Bars for Mean Heart Rates in Hypothetical Caffeine Experiment

The image shows an error bar graph for the caffeine experiment. The X axis denotes whether the bar represents no caffeine or 150 mg caffeine. The Y axis represents the 95 percent CI for hr and ranges from 50 to 75. There are two separate vertical lines for each group. A dot in each line represents the group mean and this is 58 for the no caffeine bar and 68 for the 150 mg caffeine bar. The upper and lower bounds are also shown as horizontal lines across the top and the bottom of the vertical line. This is 53 and 63 for the no caffeine bar and 63 and 75 for the 150 mg caffeine bar.

A bar chart is another way to represent information about CIs. The menu selections to open the bar chart procedure were shown earlier. In the Define Simple Bar: Summaries for Groups of Cases dialog box, in Figure 12.19, select the radio button for “Other statistic (e.g., mean)” and move the dependent variable name (heart rate) into the box labeled “Variable.” It will appear as MEAN([hr]). The height of each bar will correspond to the mean heart rate for one group. Enter the name of the group or category variable into the box labeled “Category Axis.” Click the Options button. In the Options dialog box, also shown in Figure 12.19, check the box for “Display error bars.” Leave the default radio button selection under “Confidence Intervals” as 95.0 for “Level (%),” unless otherwise desired. This will produce a 95% CI for each group mean.

The resulting bar chart appears in Figure 12.20.

By default, SPSS uses 0 as the starting value for the Y axis. When bar charts were used to represent the frequency of cases for each group earlier, using 0 as the lowest value for Y was recommended; cutting out large portions of the Y axis that represent possible values for Y can yield a graph that exaggerates the magnitude of group sizes.

caffeine bar rises to about 68. Both bars have error bars embedded in them.

Figure 12.21 Edited Bar Chart: 95% CIs for Two Group Means

Samples ¿Test Statistical power analysis provides a more formal way to address this question: How does the probability of obtaining a ¿ratio large enough to

80

reject the null hypothesis (Hp: py = Ha) Vary as a function of sample size and effect size? Statistical power is the probability of obtaining a test statistic large enough to reject Zp when Ap is false. Researchers generally want to have a reasonably high probability of rejecting the null

40

No caffeine

150 mg caffeine

Error bars: 95%CI

Thisis an edited bar chart with 95 percent CI for two group means. There are two bars in the image indicating presence or absenceof caffeine. The X axis denotesthe absenceof caffeine as well as 150 gmscaffeine and the Y axis denotes the mean hr. This ranges from 40 to 80, rising in increments of 10. The no caffeine bar rises to 58. The 150 mg caffeine bar rises to about 68. Both bars have error bars embedded in them.

When bars represent group means, starting the Y axis at 0 often does not makesense. For heart rate,

it would makesense to use the lowest value for heart rate that you could call a normal healthy heart rate as your minimum. In this situation it

wouldbe reasonable to use a value such as 40 as the lowest value marked on the axis This change can be madein the chart editor (commandsare not shown). The edited bar chart appearsin Figure 12.21.

hypothesis; power of 80% is sometimes used as a reasonable guideline. Cohen (1988) provided tables that can be used to look up power as a function of effect size and n or to look up n as a function of effect size and the desired level of power. An example of a power table that can be used to look up the minimum required n per group to obtain adequate statistical power is given in Table 12.3. This table assumes that the researcher will use the conventional α = .05, two tailed, criterion for significance. For other alpha levels, tables can be found in Jaccard and Becker (2009) and Cohen (1988). To use this table, the researcher must first decide on the desired level of power (power of .80 is often taken as a reasonable minimum). Then, the researcher needs to make an educated guess about the population effect size that the study is designed to detect. In an area where similar studies have already been done, the researcher may calculate η² values on the basis of the t or F ratios reported in published studies and then use the average effect size from past research as an estimate of the population effect size. (Recall that η² can be calculated by hand from the values of t

12.14 Decisions About Sample Size for the Independent-Samples t Test

and df using Equation 12.19 if the value of η² is not reported in the journal article.) If no similar past studies have been done, the researcher can

down the column of values for estimated power under the column headed d = .50 until reaching the table entry of .80. Then, she would look to the left (of this value of .80) for the corresponding value of N. On the basis of the values in Table 9.3, the value of N required to have statistical power of about .80 to detect an effect size of d = .5 in a one-sample test with α = .05, two tailed, is between 30 and 40.

The sample size needed for adequate statistical power can be approximated only by making an educated guess about the true magnitude of the effect, as indexed by d. If the guess about the population effect size d is wrong, then the estimate of power based on that guess will also be wrong. Information from past studies can often be used to make at least approximate estimates of population effect size.

Table 9.3

Source: Reprinted with permission from Dr. Victor Bissonnette (2019).

Statistical power analysis is useful when planning a future study. It is important to think about whether the expected effect size, alpha level, and sample size provide you with a reasonably large chance (reasonably high power) to obtain a statistically significant outcome. People who write proposals to compete for research funds from government grant agencies are generally required to include a rationale for decisions about planned sample size on the basis of power. There are several places to obtain information for statistical power analysis. Jaccard and Becker (2009) provide power tables for some additional situations. SPSS has an add-on procedure for statistical power, and numerous other computer programs (some free) can do power analyses. Free online power calculators are widely available (for example, at http://powerandsamplesize.com/Calculators/).
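As a rough illustration of what power programs compute, the sketch below estimates the power of a two-tailed independent-samples t test and searches for the smallest n per group that reaches a target power. It is not the method behind the tables cited above: it assumes a normal approximation to the t distribution, and the function names (approx_power_two_group, min_n_per_group) are hypothetical. Because of the approximation, the n it returns tends to be slightly smaller than values from exact t-based tables or programs.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approx_power_two_group(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power for an independent-samples t test.

    Assumption: the t distribution is replaced by a normal distribution,
    so results are slightly optimistic when n_per_group is small.
    d is the guessed population effect size (Cohen's d).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    shift = abs(d) * sqrt(n_per_group / 2)       # expected size of the test statistic
    # Probability of landing in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def min_n_per_group(d, power=0.80, alpha=0.05, n_max=1_000_000):
    """Smallest n per group whose approximate power reaches the target."""
    for n in range(2, n_max + 1):
        if approx_power_two_group(d, n, alpha) >= power:
            return n
    raise ValueError("target power not reached; check d and n_max")


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {min_n_per_group(d)} participants per group")
```

The required n rises steeply as the guessed effect size shrinks, which is why the educated guess about population effect size matters so much when planning a study.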

The true strength of the population effect size we are trying to detect is not known. For example, the degree to which the actual population mean μ differs from the hypothesized value, μhyp, as indexed by the population value of Cohen's d, is not known in advance of the study. If we knew the answer to that question, we would not need to do a study!

Usually researchers rely on computer programs instead of tables for power analysis. A researcher provides program input information about type of analysis (e.g., a one-sample t test), planned α level, whether a one- or two-tailed test is desired, and expected effect size. Programs usually provide either the estimated power for an input value of N or the minimum N needed to achieve a requested level of power.

You should not report a post hoc … That is, do not look up your obtained Cohen's d


and smaller than it should be in other situations.

Suppose that you want to know whether patients have lower mean anxiety scores after Rogerian therapy (Group 1) or Freudian psychodynamic therapy (Group 2). Suppose that these two types of therapy are given by different therapists (Dr. Goodman does the Rogerian therapy and Dr. Deadwood does the psychodynamic therapy). This would be a perfect confound between therapist personality and ability and type of therapy. If the Group 1 patients do better than those in Group 2, we cannot tell whether this is due to differences in the type of therapy or differences between the two doctors. This is a perfect or complete confound, and it makes the results of this study uninterpretable. The M1 - M2 difference can be due to type of therapy, personality and ability of the therapist, or both. (Even if Dr. Goodman did the therapy in both groups, there could be problems, because she might have greater faith in one type of therapy than the other, and this could produce placebo or expectancy effects.) Confounds do not have to be complete confounds to be problematic. Consider a group of patients in a drug study. If the drug group has 55% women and the placebo group has only 39% women, there is a partial confound between type of drug and sex. M1 might differ from M2 because the M1 group includes more women, while the M2 group includes more men, instead of or in addition to any drug effects.

Confounds can be obvious, but sometimes they are subtle. Random assignment of participants to groups is supposed to make groups equivalent in composition, but sometimes this doesn’t work as well as expected. When background information is available about participants, it's good to compare the groups to see whether they are equivalent.

Self-selection into treatment is problematic. If your study includes a meditation training group and a control group, and participants are allowed to choose their groups, you will probably have different kinds of people in the meditation group than in the no-treatment control group.

12.15.2 Decisions About Type or Dosage of Treatment

Researcher decisions about the types or amounts of treatments (or other group characteristics) can influence the M1 - M2 difference between means. Usually, researchers want to maximize this difference. However, there are limits. We cannot give human beings 10,000 mg of caffeine to maximize the effects of caffeine on heart rate (for ethical as well as practical reasons). It would not be useful to give rats amounts of artificial sweetener that would correspond to human consumption of 50 diet sodas per day, because that dosage would not correspond to any real-world situation.

If naturally occurring groups are compared (for example, older adults vs. younger adults), it will usually be easier to find differences when groups differ substantially. For instance, a study that compares reaction time between a group of persons ages 60 to 70 and a group of persons ages 20 to 30 is more likely to find a difference than a study that compares a group of persons in their 20s with a group in their 30s.

12.15.3 Decisions About Participant Recruitment and Standardization of Procedures

Researcher decisions about types of participants to recruit, and about standardization of procedures, can affect the magnitude of sp, the pooled or averaged within-group standard deviation. Recruiting homogeneous participants such as 18-year-old healthy men helps keep sp low (compared with studies with wider ranges of age and health), but it also limits the potential generalizability of results. It is a good idea to standardize situations and testing procedures to keep sp small, but rigid protocols can result in experiences that make the situation feel even more artificial.

12.15.4 Decisions About Sample Size

Sometimes participants or cases are difficult or costly to obtain. A neuroscience study might involve surgical procedures and lengthy training and testing procedures. In such situations, standardization of procedures and optimal choice of treatment dosage levels is particularly important.

When researchers have access to very large N's (on the order of tens of thousands), there is a different problem. Even effects that are extremely small (when evaluated by looking at M1 - M2, or η², or Cohen’s d) can be statistically significant when N is very large. Researchers should resist the temptation to overemphasize statistical significance in these situations. Clear information about effect size should be provided in terms readers can understand. This is particularly important when important real-life decisions (such as medical decisions) are at stake.

One possible reason why researchers have been slow to adopt the reporting of effect size information and CIs is that effect sizes are often embarrassingly low, while CIs are often embarrassingly wide.

To summarize: Researcher decisions about treatment type and dosage, and the presence of confounds, will affect the magnitude of M1 - M2. Confounds make M1 - M2 differences uninterpretable even if they are statistically significant. Researcher decisions about participant recruitment and procedures can reduce the magnitude of sp, but may also reduce generalizability. Very low ns result in underpowered studies, that is, studies in which a statistically significant t value is unlikely even if the null hypothesis is false. Very large ns can lead to situations in which effects that are too small to have any real-world practical or clinical importance are judged statistically significant. In between these extremes, statistical power tables can help researchers evaluate the sample sizes needed for adequate statistical power.

12.16 Summary

This chapter discussed a simple and widely used statistical test (the independent-samples t test) and provided additional information about effect size, statistical power, and factors that affect the size of t. The t test is sometimes used by itself to report results in relatively simple studies that compare group means on a few outcome variables; it is also used as a follow-up in more complex designs that involve larger numbers of groups or outcome variables.

A t test value (and corresponding effect sizes) is not a fact of nature. Researchers have some control over factors that influence the size of t in

both experimental and nonexperimental research situations. Because the size of t depends to a great extent on our research decisions, we should be cautious about making inferences about the strength of effects in the real world on the basis of the obtained effect sizes in our samples.

For the independent-samples t test researchers often report one of the following effect size measures: Cohen's d, rpb, or η². Eta squared is an effect size commonly used to do power analysis for future similar studies. When researchers want to summarize information across many past studies (as in a meta-analysis), rpb (often just called r) is often the effect size of choice. Past research has not always included effect size information, but readers can usually calculate effect sizes from the information in published journal articles.

Notice that the independent-samples t test, like correlation and regression, provides a partition of the total variance in Y outcome scores into two parts; η² is the proportion of variance in Y that differs between groups (variance that may be due to different types or amounts of treatment). In regression, r² was the proportion of variance in Y that could be linearly predicted from X. Similarly, (1 - r²) was the proportion of variance in Y that could not be linearly predicted from X; for the independent-samples t test, (1 - η²) is the proportion of variance in Y that is not predictable from group membership or from the score on the predictor variable.

The r² and η² are both called proportion of predicted (or sometimes explained) variance. Predicted variance is variance in Y that is related to scores on the X predictor variable. By contrast, (1 - r²) and (1 - η²) are the parts of the variance in Y that are not predictable from the X independent variable. These are interpreted as proportions of error variance.

The term error in everyday life means “mistake.” In statistics, error has many different meanings, depending on context. Errors in prediction don’t happen because the data analyst made a mistake (although mistakes in data analysis can happen, of course). Errors in prediction happen because many other variables, other than the X variable used as a predictor, influence the scores on the Y outcome variable. Error refers collectively to all the variables in the world that are related to Y, but that we did not control in the study or include in the statistical analysis. This may clarify why proportions of error variance are so high in most research! Error also includes any chance or random or unpredictable elements in Y. If you go on to learn about analyses that include multiple predictor variables, you will see that use of multiple predictors sometimes reduces the proportion of error variance.

To describe the problem of error variance another way, consider the tongue-in-cheek Harvard Law of Animal Behavior: “Under carefully controlled experimental circumstances, an animal will behave as it damned well pleases.”

This chapter was long and detailed because it introduces issues that arise when comparing means across groups; many of the following chapters describe analyses that also compare means across groups. This set of analyses is called analysis of variance. The same issues (assumptions, data screening, effect size, and so forth) continue to be important for those analyses, and I'll often refer you back to this chapter for more complete discussion.
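The effect sizes named in this summary can usually be recovered from a published t ratio. The sketch below uses the standard conversions η² = t² / (t² + df), rpb = the square root of η² (given the sign of t), and, for a pooled-variance t, d = t × sqrt(1/n1 + 1/n2). These are assumed to correspond to the chapter's equations, which are not reproduced in this section; the function names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
from math import copysign, sqrt


def eta_squared_from_t(t, df):
    """Proportion of variance in Y associated with group membership."""
    return t ** 2 / (t ** 2 + df)


def r_pb_from_t(t, df):
    """Point-biserial r; carries the sign of t."""
    return copysign(sqrt(eta_squared_from_t(t, df)), t)


def cohens_d_from_t(t, n1, n2):
    """Cohen's d from a pooled-variance (equal variances assumed) t ratio."""
    return t * sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    # Hypothetical published result: t(14) = 2.00 with n1 = n2 = 8.
    t, n1, n2 = 2.0, 8, 8
    df = n1 + n2 - 2
    eta2 = eta_squared_from_t(t, df)
    print("eta squared:", round(eta2, 3))
    print("error variance (1 - eta squared):", round(1 - eta2, 3))
    print("r_pb:", round(r_pb_from_t(t, df), 3))
    print("Cohen's d:", round(cohens_d_from_t(t, n1, n2), 3))
```

Reporting 1 - η² alongside η² makes the proportion of error variance explicit.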

Appendix 12A: A Nonparametric Alternative to the Independent-Samples t Test

either committed a Type II error or has reported a correct decision not to reject H0. (The researcher can never be sure which.) We want the probability or risk for both types of error to be low, that is, we want both α and β to be low. When a data analyst selects an α level, such as α = .05, that choice theoretically sets an upper limit for the risk for Type I error. If α is set at .05, then in theory, we have a maximum risk of 5% for Type I error. However, the limit of risk for Type I error works in practice only if the assumptions and rules for NHST are followed, and in many situations, they are not. The actual risk for Type I error in many research situations is often much higher than the nominal (selected) α level.

Actual State of the World: Weight Loss Drug Really Does Not Work (H0 is true)

Researcher decides to reject H0 and claims that the weight loss drug works: Type I error with risk α. The drug doesn't really work, but the researcher claims that it does. Patients who take the drug will not benefit.

Researcher decides not to reject H0 and does not claim that the drug works: Correct decision, although maybe not the decision the researcher hoped for. The drug does not work and the researcher does not claim that it works. Often this type of result does not get published, and that is unfortunate. Other researchers may do studies to see if this drug works, not knowing that there is already evidence suggesting it may not work.

Actual State of the World: Weight Loss Drug Really Does Work (H0 is false)

Researcher decides to reject H0 and claims that the weight loss drug works: Correct decision. For patients who take the drug, a benefit.

Researcher decides not to reject H0 and does not claim that the drug works: Type II error with unknown risk β. The researcher did not reject H0 when H0 is false. The drug really does work, but the researcher does not claim that it works. The study probably doesn't get published; a missed opportunity. The drug may not be approved for use with patients, even though it works. This is likely to happen when studies are "underpowered," that is, the N of cases is too small to detect the effect of interest.

The risk for Type II error, β, cannot be exactly known; but we know something about factors that tend to make β larger or smaller. In the previous section we talked about statistical power: the probability of rejecting H0 when it is false. Power is (1 - β), and we want power to be high, usually on the order of .80.

Table 9.5

What does it mean for H0 to be false? H0 is true only if μ is exactly equal to 0 (or exactly equal to the proposed value in the null hypothesis, such as 98.6 or 35 or 100 in previous examples). However, H0 can be false in billions of ways. If we consider H0: μ = 35, H0 is false if μ really equals any number other than 35 (e.g., 45, 12, 35.01, 99, 34.3, and so forth). H0 can be false to varying degrees; in a sense, H0: μ = 35 is “less false” if μ is really 35.2 or 34.9 than if μ is really 30 or 51. Population effect size is the degree to which H0 is false. For example, if Cohen's d (for the difference between the real and hypothesized population means) is d = 1.00, this indicates that the difference between hypothesis and reality is large; if d = .05, this indicates that the difference between hypothesis and reality is small. The values of β and (1 - β) vary depending on the population effect size. We never know the exact population effect size, but we can think about the values of β and (1 - β) that we would expect, in theory, for possible different values of d and for fixed decisions about N and α. Appendix 9A explains this in more detail.
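To make the dependence of β on population effect size concrete, the following sketch tabulates approximate β and power for a two-tailed one-sample test across several hypothetical values of d, holding N and α fixed. It assumes a normal approximation to the t distribution, so it is an illustration rather than the computation described in Appendix 9A; the function name approx_beta_one_sample is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_beta_one_sample(d, n, alpha=0.05):
    """Approximate risk of Type II error for a two-tailed one-sample test.

    Assumption: normal approximation to the t distribution.
    d is the population effect size (Cohen's d); n is the sample size.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * sqrt(n)  # expected size of the test statistic
    power = z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
    return 1 - power


if __name__ == "__main__":
    n = 32
    for d in (0.05, 0.2, 0.5, 1.0):
        beta = approx_beta_one_sample(d, n)
        print(f"d = {d:<4}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

When d is exactly 0 the "power" returned equals α, which matches the idea that rejecting a true H0 is a Type I error rather than a correct detection.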


across groups. Compare medians across groups: this option has been checked. Customize analysis. Description: compare medians across groups using the Median Test for k samples.

Figure 12.24 Specification of Variables (Assignment to Fields)

The image shows the Fields tab of nonparametric tests. There are two choices in the check boxes: use predefined roles and use custom field assignment. The second option has been checked. There are two boxes on either side below this. The one on the left depicts Fields from which the selected choice can be moved to the box on the right that is named Test Field. This is currently populated by hr. The Groups option is below this and is populated by caffeine. At the bottom of the dialog box are buttons for the following: Run, paste, reset, cancel and help.

The image is a table that provides the results of the Mann-Whitney U test. The summary states the following:

Null hypothesis: The distribution of hr is the same across categories of caffeine.
Test: Independent samples Mann-Whitney U test
Sig: .023 superscript 1
Decision: Reject the null hypothesis.

There are a couple of notes below the table that state the following: Asymptotic significances are displayed. The significance level is .05. The superscript 1 states: Exact significance is displayed for this test.

Details for computation of Mann-Whitney U are not presented here. When sample sizes are reasonably large (i.e., N > 30 for the entire data set), the Mann-Whitney U test begins by converting the Y scores (ignoring group membership) into ranks. These ranks replace the original Y scores in the two samples or groups. Nonparametric tests vary in the way they handle tied ranks. The null hypothesis is that these distributions of ranks are the same across the two groups.
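The rank conversion described for the Mann-Whitney U test can be sketched in a few lines. The example below pools two groups, assigns ranks while ignoring group membership, and computes the U statistics from the rank sum of the first group. It assumes tied scores receive the average of the ranks they span, which is one common convention and not necessarily the one SPSS applies; the data and function names are hypothetical and are not the hrcaffeine.sav values.

```python
def average_ranks(values):
    """Rank scores from 1 to N; tied scores share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of scores tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2) computed from ranks of the pooled scores."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2


if __name__ == "__main__":
    no_caffeine = [54, 58, 60, 61, 57]  # hypothetical heart rates
    caffeine = [66, 70, 61, 72, 68]     # hypothetical heart rates
    print(mann_whitney_u(no_caffeine, caffeine))
```

A U near 0 for one group means nearly all of its scores rank below the other group's scores; the p value reported by a statistics program is derived from this statistic.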

Figure 12.25 Mann-Whitney U Test Results

SPSS does not display the Mann-Whitney U statistic, only the corresponding p value. When the hrcaffeine.sav data were analyzed, the result was p = .023. The distribution of heart rate ranks in the two samples (no caffeine vs. caffeine) differed significantly. Whether the independent-samples t test (Figure 12.11) or the Mann-Whitney U test (Figure 12.25) is used to compare heart rate across groups, in this data set, the conclusion was the same. That does not always happen. Parametric tests such as the t test may have greater statistical power than corresponding nonparametric tests in some situations, but parametric tests are not always more powerful.

Your decision whether to perform an independent-samples t test or a Mann-Whitney U test will depend on the most common practices in your discipline. Many journals accept the independent-samples t as an appropriate analysis even when some assumptions (such as normality of distribution shapes) are violated. If practitioners in your research area prefer to report nonparametric statistics such as Mann-Whitney U, it is probably better to follow common practice.

Comprehension Questions

1. Suppose you read the following in a journal: “The group means did not differ significantly, t(30.1) = 1.68, p > .05, two tailed.” You notice that the ns in the groups were n1 = 40 and n2 = 55.
   1. What degrees of freedom would you normally expect a t test to have when n1 = 40 and n2 = 55?
   2. How do you explain why the degrees of freedom reported here differ from the value you just calculated?
2. What type of effect can a variable that is confounded with the treatment variable in a two-group experimental study have on the obtained value of the t ratio?
   1. It always makes t larger.
   2. It usually makes t smaller.
   3. It can make t either larger or smaller or leave it unchanged.
   4. It generally has no effect on the size of the t ratio.
3. Which of these pieces of information would be sufficient to calculate an η² effect size from a reported independent-samples t test?
   1. s1², s2² and n1, n2
   2. t and df
   3. The M1 - M2 difference and sp
   4. None of the above
4. Aronson and Mills (1959) conducted an experiment to see whether people’s liking for a group is influenced by the severity of initiation. They reasoned that when people willingly undergo a severe initiation to become members of a group, they are motivated to think that the group membership must be worthwhile. Otherwise, they would experience cognitive dissonance: Why put up with severe initiation for the sake of a group membership that is worthless? In their experiment, participants were randomly assigned to one of three treatment groups: Group 1 (control) had no initiation. Group 2 (mild) had a mildly embarrassing initiation (reading words related to sex out loud). Group 3 (severe) had a severely embarrassing initiation (reading sexually explicit words and obscene words out loud).

After the initiation, each person listened to a standard tape-recorded discussion among the group that they would now supposedly be invited to join; this was made to be as dull and banal as possible. Then, they were asked to rate how interesting they thought the discussion was. The researchers expected that people who had undergone the most embarrassing initiation would evaluate the discussion most positively. In the table below, a higher score represents a more positive evaluation.

Experimental condition comparisons (table values garbled in this copy: 132 a; 818 210 21; 166 21):
Control versus severe
Mild versus severe
Control versus mild

Source: Data from Aronson and Mills (1959).

   1. Were the researchers’ predictions upheld? In simple language, what was found?
   2. Calculate an effect size (η²) for each of these three t ratios and interpret these.
5. True or false: The size of η² tends to increase as n1 and n2 increase.
6. A researcher reports that the η² effect size for her study is very large (η² = .64), but the t value she reports is quite small and not statistically significant. What inference can you make about this situation?
   1. The researcher has made a mistake: If η² is this large, then t must be significant.
   2. The ns in the groups are probably rather large.
   3. The ns in the groups are probably rather small.
   4. None of the above inferences is correct.
7. A researcher collects data on married couples to see whether men and women differ in their mean levels of marital satisfaction. For each married couple, both the husband and wife fill out a scale that measures their level of marital satisfaction; the researcher carries out an independent-samples t test to test the null hypothesis that male and female levels of marital satisfaction do not differ. Is this analysis appropriate? Give reasons for your answer.
8. A researcher plans to do a study to see whether people who eat an all-carbohydrate meal have different scores on a mood scale than people who eat an all-protein meal. Thus, the manipulated independent variable is the type of meal (1 = carbohydrate, 2 = protein). In past research, the effect sizes that have been reported for the effects of food on mood have been on the order of η² = .15. On this basis and assuming that the researcher plans to use α = .05, two tailed, and wants to have power of about .80, what is the minimum group size that is needed (i.e., how many subjects would be needed in each of the two groups)?
9. Which of the following would be sufficient information for you to calculate an independent-samples t?
   1. s1², s2²; M1, M2; and n1, n2
   2. d and df
   3. n1, n2, the M1 - M2 difference, and sp
   4. None of the above
   5. Any of the above (a, b, or c)
10. What changes in research design tend to reduce the magnitude of the within-group variances (s1², s2²)? What advantage does a researcher get from decreasing the magnitude of these variances? What

disadvantage arises when these variances are reduced?
11. The statistic that is most frequently used to describe the relation between a dichotomous group membership variable and scores on a continuous variable is the independent-samples t. Name two other statistics that can be used to describe the relationship between these kinds of variables (these are effect sizes; later you will see that an F ratio can also be reported in this situation).
12. Suppose that a student conducts a study in which the manipulated independent variable is the level of white noise (60 vs. 65 dB). Ten participants are assigned to each level of noise; these participants vary widely in age, hearing acuity, and study habits. The outcome variable is performance on a verbal learning task (how many words on a list of 25 words each participant remembers). The t value obtained is not statistically significant. What advice would you give to this student about ways to redesign this study that might improve the chances of detecting an effect of noise on verbal learning recall?
13. The table below shows data obtained in a small experiment run by one of my research methods classes to evaluate the possible effects of food on mood. This was done as a between-subjects study; each participant was randomly assigned to Group 1 (an all-carbohydrate lunch) or Group 2 (an all-protein lunch). One hour after eating lunch, each participant rated his or her mood, with higher scores indicating more agreement with that mood. Select one of the mood outcome variables, enter the data into SPSS, examine a histogram to see if the scores appear to be normally distributed, and conduct an independent-samples t test to see if mean moods differed significantly between groups. Write up your results in the form of an APA-style “Results” section, including a statement about effect size. Be certain to state the nature of the relationship between food and mood (i.e., did eating carbohydrates make people more or less calm than eating protein)? Also, as an exercise, if you wanted to design a better study to assess the possible impact of food on mood, what would you add to this study? What would you change? (For background about research on the possible effects of food on mood, refer to Spring, Chiodo, & Bowen, 1987.)

Data for Question 13:

5 8 3
4 9 5
4 3 2
4
8
2
3
5
5
5
4
1
2 2 E
o 3 1
9 4 6
0 3 4
2 2
2 3 4 2 3
6 4 9 9 4
2
2 2
3 1 o

Food type was coded 1 = carbohydrate, 2 = protein.

Note: The moods calm, anxious, sleepy, and alert were rated on a 15-point scale: 0 = not at all, 15 = extremely.

14. A test of emotional intelligence was given to 241 women and 89 men. The results were as follows: For women, M = 96.62, SD = 10.34; for men, M = 89.33, SD = 11.61. Was this difference statistically significant (α = .05, two tailed)? How large was the effect, as indexed by Cohen's d and by η²? (For this result, refer to Brackett, Mayer, & Warner, 2004.)
15. What is the null hypothesis for an independent-samples t test?
16. In what situations should the paired-samples t test be used rather than the independent-samples t test?
17. What information does the F ratio in the SPSS output for an independent-samples t test provide? That is, what assumption does it test?
18. Explain briefly why there are two different versions of the t test in the SPSS output and how you decide which one is more appropriate.
19. What is η²? How is it computed, and how is it interpreted?

Notes

1 It would be nonsense to compute means for categorical dependent variables.

2 Recall that whenever SS or s² is computed, each SS term has a corresponding df. When an SS term is computed, deviations from a sample mean are squared and then summed. Because deviations must sum to 0 within a sample, if a sample has n members, only n - 1 of the deviations from the mean are “free to vary.” Once we know any n - 1 deviations from the mean, the last deviation must be whatever value is required to make the sum of all deviations equal 0. For the independent-samples t test there are two SS terms, one for each group. For Sample 1, SS1 has n1 - 1 df. For Sample 2, SS2 has n2 - 1 df. The combined df is the sum of these two df terms: the overall df for the independent-samples t test = (n1 - 1) + (n2 - 1), which can also be written n1 + n2 - 2. For the total N in the study (N = n1 + n2), df = N - 2. We “lose” 1 degree of freedom for each mean that is estimated.

3 The null hypothesis for the Levene test F ratio is that the variances of the populations that correspond to the two samples are equal (H0: σ1² = σ2²). The Levene test F ratio, based on values of SD1 and SD2, is large if the sample data suggest that this assumption of equal variance is violated. If the p value associated with Levene’s F is less than .05, the data analyst may consider use of the “equal variances not assumed” version of the independent-samples t test. If the p value associated with Levene’s F is greater than .05, the data analyst can use the “equal variances assumed” version of the independent-samples t test. You'll learn more about F ratios later. F ratios are reported as one tailed (never two tailed), so it is not necessary to specify one or two tailed when you report F.

4 The “equal variances not assumed” or “separate variances” version of the t test is included in the SPSS output. However, the equal variances not assumed t is rarely reported, because the independent-samples t test is robust against violations of the homogeneity of variance assumption. There are two differences in the computation of this t ratio. First, instead of pooling the two within-group variances, the two within-group variances are kept separate when the standard error term, SE(M1 - M2), is calculated:

$$ SE_{M_1 - M_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad (12.25) $$

Second, a (downwardly) adjusted df term is used to evaluate the significance of t. Usually this df′ will not be an integer value; it will be smaller than n1 + n2 - 2. The larger the difference in the magnitude of the variances and the ns, the greater the downward adjustment of the degrees of freedom. Computation of adjusted degrees of freedom (df′) for the equal variances not assumed test (from the SPSS algorithms webpage at https://www.ibm.com/support/pages/ibm-spss-statistics-25-documentation; scroll down to “Algorithms” to download the PDF file of SPSS algorithms):

[Equation for df′; not legible in this copy.]

where s1² and s2² are the within-group variances of the scores relative to their group means and N is the total number of scores in the two groups combined, N = n1 + n2. I view this test as a relic from a time when some statisticians were much more worried about violations of certain assumptions than most are today. However, someday you may encounter a statistical

for the equal variances not assumed test is smaller than the df for the equal variances assumed test, and the p value for the equal variances not assumed version of the test is larger than the p value for the equal variances assumed test. In other words, you are less likely to be able to reject the null hypothesis using the equal variances not assumed t test.

5 Tables for effect size labels can differ from this one (for example, some tables use lower or higher values of r to correspond to a medium effect). Some statisticians prefer to set the bar higher (that is, to require a higher value of r than in Cohen’s table to judge an effect size “large”). Another reason tables differ: Most tables are developed by starting with a list of easy-to-remember effect size values for one effect size (such as .10, .20, .30, etc., for Cohen's d). Then they convert the d values into other effect size units (such as rpb). The values for rpb will have more decimal places and won't be as easy to remember. A different table might start with easy-to-remember values of rpb and then convert those to Cohen's d. This would make the values that correspond to small, medium, or large effect size slightly different between the two tables. However, these values are only approximate, so the discrepancies are not important.

6 In everyday life people define significance as important, noteworthy, or large enough to be of value. A result can be “statistically significant” even if it has little practical, clinical, or everyday value or importance. When we read that an … not assume that the outcome is large, noteworthy, and valuable in clinical practice. It might be, but

conservative who wants you to use this procedure

we haveto look at effect size, not a statistical

or an exam that has a question about this. Also,

significance test, to evaluate the real-world or

when sample variances differ substantially, the df

clinical value of a research result.

61% Page 372 of 624 - Location 9523 of 15772

outcome is “statistically significant,” we should
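The separate-variances computation described in Note 4 can be checked numerically. The following is a minimal sketch, assuming the usual Welch-Satterthwaite form of the standard error and adjusted df′; the function names and the example variances and sample sizes are hypothetical and do not come from the text.

```python
from math import sqrt


def separate_variances_se(var1: float, n1: int, var2: float, n2: int) -> float:
    """Standard error of M1 - M2 with the two variances kept separate."""
    return sqrt(var1 / n1 + var2 / n2)


def adjusted_df(var1: float, n1: int, var2: float, n2: int) -> float:
    """Adjusted df' for the equal variances not assumed t test
    (Welch-Satterthwaite form; assumed here, see the lead-in)."""
    part1, part2 = var1 / n1, var2 / n2
    z1 = (part1 / (part1 + part2)) ** 2 / (n1 - 1)
    z2 = (part2 / (part1 + part2)) ** 2 / (n2 - 1)
    return 1 / (z1 + z2)


# Hypothetical inputs: equal variances and equal ns give the pooled df.
df_equal = adjusted_df(4.0, 10, 4.0, 10)
# Hypothetical inputs: very unequal variances and ns lower df' sharply.
df_unequal = adjusted_df(1.0, 10, 25.0, 5)
print(round(df_equal, 2), round(df_unequal, 2))
```

With equal variances and equal group sizes the adjustment vanishes (df′ equals n1 + n2 − 2); the more the variances and ns differ, the further df′ falls below that value.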


necessarily imply that the effect is large in practical or clinical terms.

Look for effect size information. If effect size is not reported, there should be sufficient information for you to calculate this by hand. All you need to find Cohen’s d is M, SD, and μhyp (the proposed or hypothesized value of μ). Also evaluate whether the effect size is large enough to have any practical or clinical importance. When variables are measured in meaningful units, M − μhyp is useful information.

Look for confidence intervals.

Ask if it is reasonable to generalize from the types of cases in this study to larger populations in the real world. Ask if the situation in the study is comparable with real-world situations.

9.13 Planning Future Research

Research methods textbooks specific to your field of interest provide much information about planning research. From the perspective of NHST, here are some important issues.

Make decisions ahead of time about significance tests (test statistic, α level, directional or nondirectional test).

Make decisions ahead of time about the identification and handling of outliers.

Estimate the population effect size. Effect sizes from past studies (your own past research or other people’s) may be used to do this. It is better to underestimate population effect size than to overestimate it.

Use your estimated effect size and type of test (e.g., one-sample t test, α = .05, two tailed) to look up estimated power for your effect size and planned N. Or, using .80 for power, figure out the minimum N needed to have 80% power.

9.14 Guidelines for Reporting Results

The information to include in a research report depends on the specific test. For a one-sample t test, include N, M, SD, df, SEM, t, and (exact) p; whether p is one tailed or two tailed; effect size information such as Cohen's d and/or M − μhyp; and a CI for M (or for M − μhyp). The following elements should be included in a written report for a one-sample t test.

* A statement of what test was done, for what variable.
* Sample size (N), M, SD, and SEM.
* The CI for M (or the CI for the M − μhyp difference).
* Obtained t with its df and exact p. State whether p is one tailed or two tailed.
* Traditionally, a statement of whether a test was statistically significant and/or whether the null hypothesis can be rejected has usually been included. Proponents of the New Statistics suggest that we should avoid yes/no thinking and instead focus on confidence intervals and effect sizes.
* Effect size (such as Cohen's d) and, if units of measurement are interpretable, a difference such as M − μhyp may also be useful as information about practical significance.

Here is an example of a complete “Results” section for a one-sample t test that includes all information listed above.


Comparisons among several group means could be made by calculating t tests for each pairwise comparison among the means of these four treatment groups. However, as described earlier, doing numerous significance tests leads to an inflated risk for Type I error. If a study includes k groups, there are k(k − 1)/2 pairs of means; thus, for a set of four groups, the researcher would need to do (4 × 3)/2 = 6 different t tests to make all possible pairwise comparisons. If α = .05 is used as the criterion for significance for each test, and the researcher conducts six significance tests, the probability that this set of six decisions contains at least one instance of Type I error is greater than .05.

One way that ANOVA limits the risk for Type I error is by obtaining a single omnibus test that examines all possible comparisons among means in the study. Researchers often want to examine selected pairwise comparisons of means as a follow-up analysis to obtain more information about the pattern of differences among groups.

13.2 Questions in One-Way Between-S ANOVA

The overall null hypothesis for one-way ANOVA is that the means of the k populations that correspond to the groups in the study are all equal:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \tag{13.1}$$

When each group has been exposed to different types or dosages of a treatment, as in a typical experiment, this null hypothesis corresponds to an assumption that the treatment has no effect on the outcome variable. The alternative hypothesis in this situation is not that all population means are unequal; the alternative hypothesis is that there is at least one inequality between one pair of means in the set.

The best question ever asked by a student in any of my statistics classes was deceptively simple: “Why is there variance?” In the hypothetical experimental study described in the following section, the outcome variable is a self-report measure of anxiety, and the group membership variable is type of stress. We want to know, How much of the variance in anxiety can be predicted from type of stress? Is stress a major reason why anxiety scores differed among persons in this study? Why do some persons report more anxiety than other persons? To what extent are the differences in amount of anxiety systematically associated with the independent variable (type of stress), and to what extent are differences in the amount of self-reported anxiety due to other factors (such as trait levels of anxiety, physiological arousal, drug use, sex, other anxiety-arousing events that each participant has experienced on the day of the study, etc.)?

In statistics, the term error usually does not mean the same thing as in everyday life. In everyday life, we use the word error to mean “mistake.” In ANOVA, the term error refers to the parts of scores that cannot be predicted from type of treatment or group membership. The part of anxiety scores that we cannot predict from type of stress is presumably due to the effects of other variables that we have not included in the study, such as personality, other upsetting events that may have happened to the person just before the study, recent use of drugs such as alcohol, caffeine, and tobacco, and possibly a multitude of other unknown variables.
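The pairwise-comparison arithmetic in this section (k(k − 1)/2 pairs, and a risk of at least one Type I error that exceeds the per-test α) can be sketched as follows. The function names are hypothetical, and the risk formula assumes the six tests are independent, which pairwise t tests on the same groups are not; treat the result as a rough illustration rather than an exact value.

```python
from math import comb


def n_pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k group means: k(k - 1)/2."""
    return comb(k, 2)


def risk_of_any_type_i_error(alpha: float, n_tests: int) -> float:
    """P(at least one Type I error) across n_tests tests,
    ASSUMING independent tests (an idealization)."""
    return 1 - (1 - alpha) ** n_tests


pairs = n_pairwise_comparisons(4)            # four treatment groups
risk = risk_of_any_type_i_error(0.05, pairs)
print(pairs, round(risk, 3))
```

Under the independence assumption, six tests at α = .05 give a risk of roughly .26, well above .05, which is the motivation for a single omnibus test.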

Questions in ANOVA:

1. The first question in one-way ANOVA is this: When all group means are considered as a set, are there any significant differences between means? An overall F ratio will tell us whether there are any significant differences among the group means, but it does not tell us which specific means differ. It is possible that each group mean differs from every other group mean, but it is also possible that only one or a few pairs of means differ.

2. The second question in one-way ANOVA is this: Which specific pairs (or combinations) of group means differ significantly? There are two ways to answer this question. A data analyst either decides which comparisons are of interest ahead of time and sets up planned contrasts or explores data using post hoc follow-up tests to see which means differ significantly. Both approaches are discussed in this chapter.

* Planned contrasts (sometimes just called contrasts and also called a priori comparisons) can be set up to examine a limited number of differences between means that the data analyst has decided ahead of time are of interest. These are called unprotected (that is, not protected against inflated risk for Type I error) because, except for limiting the number of significance tests, there are no other corrections for inflated risk for Type I error.
* Post hoc tests (such as the Tukey honestly significant difference [HSD] test) can be used to examine many or all the possible comparisons among means. These are called protected because most of them use more conservative per comparison criteria for statistical significance (like per comparison alpha [PCα] in the Bonferroni procedure).

Later in your study of statistics, you will discover that many analyses involve a similar approach: first, an omnibus test that includes all groups and/or all variables, then follow-up analyses to evaluate which groups or which variables show significant differences.

13.3 Hypothetical Research Example

Suppose that an experiment is done to compare the effects of four situations: Group 1 is tested in a “no-stress,” baseline situation; Group 2 does a mental arithmetic task; Group 3 does a stressful social role play; and Group 4 does a mock job interview. For this study, the X variable is a categorical variable with codes 1, 2, 3, and 4 that represent which of these four types of stress each participant received. This categorical X predictor variable is called a factor; in this case, the factor is called “type of stress”; the four levels of this factor correspond to no stress, mental arithmetic, stressful role play, and a mock job interview. At the end of each session, the participants self-report their anxiety on a scale that ranges from 0 = no anxiety to 20 = extremely high anxiety. Scores on anxiety are, therefore, scores on a quantitative Y outcome variable. Imagine that there is a convenience sample of N = 28 participants. (Capital N denotes the total number of participants in the study.) Imagine that participants were randomly assigned to one of the four levels of stress. This results in k = 4 groups with n = 7 participants in each group, for a total of N = 28 participants in the entire study. Lowercase n indicates the number of cases per group. The SPSS Data View worksheet that contains data for this imaginary study appears in Figure 13.1, and data are available in the SPSS file stress_anxiety.sav. The goal of data analysis is to find out:

1. Whether mean anxiety levels differed across these four situations.
2. Which situations elicited the highest and lowest anxiety.
3. Which treatment group means differed significantly from the baseline (no-stress) condition.
4. Whether mean anxiety differed among the mental arithmetic, role play, and mock job interview stress situations.

Figure 13.1 Data View Worksheet for Stress and Anxiety Study in stress_anxiety.sav

Case   stress   anxiety
  1      1        10
  2      1        10
  3      1        12
  4      1        11
  5      1         7
  6      1         7
  7      1        12
  8      2        17
  9      2        14
 10      2        14
 11      2        13
 12      2        11
 13      2        17
 14      2        14
 15      3        15
 16      3        11
 17      3        12
 18      3        14
 19      3        16
 20      3        17
 21      3        10
 22      4        16
 23      4        20
 24      4        14
 25      4        16
 26      4        19
 27      4        16
 28      4        18
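The worksheet in Figure 13.1 can also be entered directly as a small data structure. The sketch below is one possible layout, not the SPSS file itself: the variable names are hypothetical, and the scores are the 28 anxiety values listed in the figure, grouped by the stress code.

```python
from statistics import mean

# Anxiety scores from Figure 13.1, keyed by the stress code (1-4).
anxiety_by_group = {
    1: [10, 10, 12, 11, 7, 7, 12],     # no stress (baseline)
    2: [17, 14, 14, 13, 11, 17, 14],   # mental arithmetic
    3: [15, 11, 12, 14, 16, 17, 10],   # stressful role play
    4: [16, 20, 14, 16, 19, 16, 18],   # mock job interview
}

k = len(anxiety_by_group)                                  # number of groups
N = sum(len(scores) for scores in anxiety_by_group.values())  # total cases
group_means = {g: mean(scores) for g, scores in anxiety_by_group.items()}
grand_mean = mean(y for scores in anxiety_by_group.values() for y in scores)

for g, m in group_means.items():
    print(f"Group {g}: n = {len(anxiety_by_group[g])}, M = {m:.2f}")
print(f"k = {k}, N = {N}, grand mean = {grand_mean:.2f}")
```

The group means and grand mean printed here are the descriptive statistics that the by-hand ANOVA computations start from.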

13.4 Assumptions and Data Screening for One-Way ANOVA

The assumptions for one-way ANOVA are the same as those described for the independent-samples t test. The scores on the dependent variable must be quantitative. Observations must be independent of one another, both within and between groups. Ideally, scores should be approximately normally distributed within each group, and variances should be approximately equal across groups. ANOVA, like the t test, is robust against violations of the normality and equal variance assumptions if within-group ns are reasonably large. Finally, there should not be extreme outliers.

Preliminary screening involves the same procedures as for the t test: Histograms can be examined separately for each group to assess normality of distribution shape; boxplots for groups can identify potential outliers within groups. The Levene test (or another test of homogeneity of variance) can be requested as part of the output and used to assess whether the homogeneity of variance assumption is violated. Because preliminary data screening for one-way between-S ANOVA uses the same procedures as those shown in Chapter 12, on the independent-samples t test, these procedures are not repeated here.

13.5 Computations for One-Way Between-S ANOVA

13.5.1 Overview

ANOVA begins with familiar statistics. For each group, we obtain M, s, SS, and n. Recall that a sum of squares or SS is obtained by finding M for the group of interest, computing a (Y − M) deviation for each individual Y score, squaring the deviation for each score, and summing the squared deviations. For the independent-samples t test, we needed to find SS only for Groups 1 and 2. In ANOVA, several different forms of SS are obtained. SStotal is obtained by:

* Finding the grand mean for the entire data set, denoted MY.
* Obtaining the (Y − MY) deviation for every score in the data set.
* Squaring each deviation.
* Summing the squared deviations.

For a batch of data, recall that the sample variance s² = SS/df. For SStotal, df = N − 1, where N is the total number of scores in the entire data set. We could use SStotal to find the total variance s² for all Y scores in the study; however, to find out what proportion of variance in Y is related to group membership, it is more convenient to focus on SS than s². In ANOVA, an SS divided by its df is usually called a mean square (MS).

A one-way ANOVA divides SStotal into two sources of variance, often called SSbetween groups and SSwithin groups. The formulas to obtain the latter two SS terms can appear confusing, so let’s just focus on the information provided by each term.
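The four steps for SStotal, and the relation s² = SS/df, can be sketched in a few lines. This is an illustrative sketch only; the variable names are hypothetical, and the scores are the 28 anxiety values from Figure 13.1.

```python
from statistics import variance

# All 28 anxiety scores from Figure 13.1, ignoring group membership.
scores = [10, 10, 12, 11, 7, 7, 12,
          17, 14, 14, 13, 11, 17, 14,
          15, 11, 12, 14, 16, 17, 10,
          16, 20, 14, 16, 19, 16, 18]

N = len(scores)
grand_mean = sum(scores) / N                      # step 1: grand mean
deviations = [y - grand_mean for y in scores]     # step 2: Y - M_Y
squared = [d ** 2 for d in deviations]            # step 3: square each
ss_total = sum(squared)                           # step 4: sum the squares

df_total = N - 1
sample_variance = ss_total / df_total             # s^2 = SS / df

print(round(ss_total, 2), df_total, round(sample_variance, 2))
# Cross-check against the standard library's sample variance.
print(abs(sample_variance - variance(scores)) < 1e-9)
```

The deviations sum to 0 (up to floating-point error), which is why only N − 1 of them are free to vary.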


SSbetween groups tells us how far the values of M1, M2, ..., Mk are from the grand mean. If the group means are all exactly equal, SSbetween groups will be 0. SS terms can never be negative, and there is no fixed upper limit for values. A “large” value of SSbetween groups (also called SSbetween) tells us that:

* group means are far away from the grand mean, and/or
* group means are far away from one another.

What information do we need to consider to decide whether SSbetween is “large”?

First, we need to divide SS by its df. Deviations of group means from the grand mean, like deviations of individual scores from a sample mean, must sum to 0. If there are k group means, only the first k − 1 deviations of group means from the grand mean are free to vary. Thus, for SSbetween, df = k − 1 (where k is the number of groups). Dividing an SS by its df corrects for the number of independent deviations used to calculate the SS. An SS divided by its df is called a mean square. MSbetween is, in effect, the variance of the group means. For technical reasons, statisticians do not refer to MS as a variance (but essentially, that’s what it is).

Second, we need to compare SSbetween with information about error variance or within-group variance. The error variance term is called SSwithin. There are several ways to compute SSwithin. The easiest way to think about it is this: First, find SS for the set of scores within each treatment group. For Treatment Group 1, find the group mean, M1; compute the deviation of each Y score in that group from M1; square the deviations; and sum the squared deviations. This yields SS1, and this tells us about variation of scores within Group 1. For a study with k = 4 groups and n = 7 cases within each group, you obtain the following: SS1, SS2, SS3, and SS4.

SSwithin is the sum of four within-group SS terms: SS1 + SS2 + SS3 + SS4. The df for MSwithin is the sum of df1, df2, df3, and df4; this can also be written as n1 + n2 + n3 + n4 − k, or N − k, where N is the total number of persons in the study and k is the number of groups.

If SS1 (or the SS for any group) = 0, that tells us that all scores within Group 1 were equal to one another. As the value of SS1 gets larger, we have evidence that a sample of people who received the same treatment have different score values, and these differences are due to other variables that influenced the outcome. In the hypothetical study of stress and anxiety, anxiety scores may be influenced by recent drug use, depression, or events in the lab.

After you calculate SStotal, SSbetween, and SSwithin, you will find that this equality holds (as long as you have not made arithmetic errors):

$$SS_{total} = SS_{between} + SS_{within} \tag{13.2}$$

This equation describes the partition (division) of total variation of Y into two sources of variance: differences among group means (SSbetween) and differences among scores within the same treatment groups (SSwithin). We hope that most of the variation between groups is due to the different types or amounts of treatment received by groups, and we usually hope that SSbetween will be large. We know that SSwithin provides information about response differences among people who received the same type of treatment and that SSwithin tells us about magnitude of experimental error; we want SSwithin to be small.

Recall that one of the effect sizes for t was η² and


that η² was the proportion of variance of Y scores that is predictable from or related to group membership. In one-way ANOVA, η² = SSbetween/SStotal. Thus, the SS terms provide effect size information.

To obtain a statistical significance test, we set up an F ratio:

$$F = \frac{MS_{between}}{MS_{within}} \tag{13.3}$$

Because it is a ratio of MS terms, F cannot be negative. F would be zero if all group means were equal. There is no fixed upper limit for values of F. To decide whether F is large enough to be statistically significant, we need to find a critical value of F from the table in Appendix C at the end of this book. The reject region for F is always one tailed (values in the top 5% of an F distribution, for instance). To locate the critical value that corresponds to the top 5% of the distribution, you need to know about df. The independent-samples t test required only one df term. An F ratio compares two different MS terms, and each of those MS terms has its own df, so we need to specify two different df terms:

$$df_{between} = k - 1, \tag{13.4}$$

$$df_{within} = N - k, \tag{13.5}$$

where k is the number of groups and N is the total number of cases.

To summarize: The by-hand computation for one-way ANOVA (with k groups and a total of N observations) involves the following steps. Complete formulas are provided in the following sections.

1. Compute SSbetween, SSwithin, and SStotal.
2. Find effect size: η² = SSbetween/SStotal.
3. Compute MSbetween by dividing SSbetween by its df, k − 1.
4. Compute MSwithin by dividing SSwithin by its df, N − k.
5. Compute an F ratio: MSbetween/MSwithin.
6. Compare this F value obtained with the critical value of F from a table of the F distribution with (k − 1) and (N − k) df (using the table in Appendix C at the end of the book that corresponds to the desired alpha level; for example, the first table provides critical values for α = .05). If the F value obtained exceeds the tabled critical value of F for the predetermined alpha level and the applicable degrees of freedom, reject the null hypothesis that all the population means are equal.

In practice, these computations are done by programs such as SPSS; you can decide whether the outcome is statistically significant by examining the p value for the F test and evaluate effect size by calculating an η².

13.5.2 SSbetween: Information About Distances Among Group Means

The following notation will be used:

Let k be the number of groups in the study.

Let n1, n2, ..., nk be the number of scores in


Groups 1, 2, ..., k.

Let Yij be the score of subject j in Group i (i = 1, 2, ..., k).

Let M1, M2, ..., Mk be the means of scores in Groups 1, 2, ..., k.

Let N be the total N in the entire study; N = n1 + n2 + ... + nk.

Let MY be the grand mean of all scores in the study (i.e., the total of all the individual scores, divided by N, the total number of scores).

Once we have calculated the means of each individual group (M1, M2, ..., Mk) and the grand mean MY, we can summarize information about the distances of the group means, Mi, from the grand mean, MY, by computing SSbetween as follows:

$$SS_{between} = \sum_{i=1}^{k} n_i (M_i - M_Y)^2 = n_1 (M_1 - M_Y)^2 + n_2 (M_2 - M_Y)^2 + \cdots + n_k (M_k - M_Y)^2 \tag{13.6}$$

For the hypothetical data in Figure 13.1, the mean anxiety scores for Groups 1 through 4 were as follows: M1 = 9.86, M2 = 14.29, M3 = 13.57, and M4 = 17.00. The grand mean on anxiety, MY, is 13.68. Each group had n = 7 scores. Therefore, for this study,

$$SS_{between} = 7 \times (9.86 - 13.68)^2 + 7 \times (14.29 - 13.68)^2 + 7 \times (13.57 - 13.68)^2 + 7 \times (17.00 - 13.68)^2$$

$$= 7 \times (-3.82)^2 + 7 \times (.61)^2 + 7 \times (-.11)^2 + 7 \times (+3.32)^2$$

$$= 7 \times 14.5924 + 7 \times .3721 + 7 \times .0121 + 7 \times 11.0224.$$

SSbetween ≈ 182 (this agrees with the value of SSbetween in the SPSS output presented in Figure 13.8 except for a small amount of rounding error).

13.5.3 SSwithin: Information About Variability of Scores Within Groups

To summarize information about the variability of scores within each group, we compute MSwithin. For each group, for groups numbered i = 1, 2, ..., k, we first find the sum of squared deviations of scores relative to each group mean, SSi. The SS for scores within Group i is found by taking this sum:

$$SS_i = \sum_{j=1}^{n_i} (Y_{ij} - M_i)^2 \tag{13.7}$$

That is, for each of the k groups, find the deviation of each individual score from the group mean; square and sum these deviations for all the scores in the group. These within-group SS terms for Groups 1, 2, ..., k are summed across the groups to obtain the total SSwithin:

$$SS_{within} = \sum_{i=1}^{k} SS_i = SS_1 + SS_2 + \cdots + SS_k \tag{13.8}$$

For this data set, we can find the SS term for Group 1 (for example) by taking the sum of the squared deviations of each individual score in Group 1 from the mean of Group 1, M1. The values


are shown for by-hand computations; it can be instructive to do this as a spreadsheet, entering the value of the group mean for each participant as a new variable and computing the deviation of each score from its group mean and the squared deviation for each participant.

$$SS_1 = (10 - 9.86)^2 + (10 - 9.86)^2 + (12 - 9.86)^2 + (11 - 9.86)^2 + (7 - 9.86)^2 + (7 - 9.86)^2 + (12 - 9.86)^2$$

$$SS_1 = 26.86.$$

For the four groups of scores in the data set in Figure 13.1, these are the values of SS for each group: SS1 = 26.86, SS2 = 27.43, SS3 = 41.71, and SS4 = 26.00.

Thus, the total value of SSwithin for this set of data is SSwithin = SS1 + SS2 + SS3 + SS4 = 26.86 + 27.43 + 41.71 + 26.00 = 122.00.

13.5.4 SStotal: Information About Total Variance in Y Scores

We can also find SStotal; this involves taking the deviation of every individual score from the grand mean, squaring each deviation, and summing the squared deviations across all scores and all groups:

$$SS_{total} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - M_Y)^2 \tag{13.9}$$

The grand mean MY = 13.68. The SStotal term includes 28 squared deviations, one for each participant in the data set, as follows:

$$SS_{total} = (10 - 13.68)^2 + (10 - 13.68)^2 + (12 - 13.68)^2 + \cdots + (18 - 13.68)^2 = 304.$$

As noted earlier, the SSbetween and SSwithin terms will sum to SStotal:

$$SS_{total} = SS_{between} + SS_{within} \tag{13.10}$$

For these data, SStotal = 304, SSbetween = 182, and SSwithin = 122, so the sum of SSbetween and SSwithin equals SStotal (because of rounding error, these values differ slightly from the values that appear in the SPSS output in Section 13.13).

13.5.5 Converting Each SS to a Mean Square and Setting Up an F Ratio

An F ratio is a ratio of two mean squares. A mean square is the ratio of a sum of squares to its degrees of freedom, MS = SS/df. Note that the formula for a sample variance is also SS/df. MS terms in ANOVA are similar to variances, but they are not called variances for technical reasons. The df terms for the two MS terms in a one-way between-S ANOVA are based on k, the number of groups, and N, the total number of scores in the entire study (where N = n1 + n2 + ... + nk). The between-group SS was obtained by summing the deviations of each of the k group means from the grand mean; only the first k − 1 of these deviations are free to vary, so the between-groups df = k − 1, where k is the number of groups.

$$df_{between} = k - 1 \tag{13.11}$$

In ANOVA, the mean square between groups is


calculated by dividing SSbetween by its degrees of freedom:

$$MS_{between} = \frac{SS_{between}}{k - 1} \tag{13.12}$$

For the data in the hypothetical study of stress and anxiety, SSbetween = 182, dfbetween = 4 − 1 = 3, and MSbetween = 182/3 = 60.7.

The df for each SS within-group term is given by n − 1, where n is the number of participants in each group. Thus, in this example, SS1 had n − 1 or df = 6. When we form SSwithin, we add up SS1 + SS2 + ... + SSk. There are (n − 1) df associated with each SS term, and there are k groups, so the total dfwithin = k × (n − 1). This can also be written as

$$df_{within} = N - k, \tag{13.13}$$

where N is the total number of scores (n1 + n2 + ... + nk) and k is the number of groups. We obtain MSwithin by dividing SSwithin by its corresponding df:

$$MS_{within} = \frac{SS_{within}}{N - k} \tag{13.14}$$

For the hypothetical stress and anxiety data in Figure 13.1, MSwithin = 122/24 = 5.083.

Finally, we can set up a test statistic for the null hypothesis H0: μ1 = μ2 = ... = μk by taking the ratio of MSbetween to MSwithin:

$$F = \frac{MS_{between}}{MS_{within}} \tag{13.15}$$

Figure 13.2 Reject Region for F Distribution With 3 and 24 df Using α = .05

[Figure description: The horizontal axis shows the value of F, and the vertical axis shows the height of the F distribution curve. The curve is positively skewed: it peaks near F = 0.5 and then drops gradually, reaching the baseline at about F = 3.5. The area under the curve to the right of a vertical line drawn at F = 3.01 is shaded and labeled “reject region.”]

For the stress and anxiety data, F = 60.702/5.083 = 11.94. This F ratio is evaluated using the F distribution with (k − 1) and (N − k) df. For this data set, k = 4 and N = 28, so df values for the F ratio are 3 and 24.

An F distribution has a shape that differs from the normal or t distribution. Because an F is a ratio of two mean squares and MS cannot be less than 0, the minimum possible value of F is 0. On the other hand, there is no fixed upper limit for the value of F. Therefore, the distribution of F tends to be


positively skewed, with a lower limit of 0, as in Figure 13.2. The reject region for significance tests with F ratios consists of only one tail (at the upper end of the distribution). The first table in Appendix C at the end of the book shows the critical values of F for α = .05. The second and third tables in Appendix C provide critical values of F for α = .01 and α = .001. In the hypothetical study of stress and anxiety, the F ratio has df equal to 3 and 24. Using α = .05, the critical value of F from the first table in Appendix C with df = 3 in the numerator (across the top of the table) and df = 24 in the denominator (along the left-hand side of the table) is 3.01. Thus, in this situation, the α = .05 decision rule for evaluating statistical significance is to reject H0 when values of F > 3.01 are obtained. A value of 3.01 cuts off the top 5% of the area in the right-hand tail of the F distribution with df equal to 3 and 24, as shown in Figure 13.2. The obtained F = 11.94 would therefore be judged statistically significant.

13.6 Patterns of Scores and Magnitudes of SSbetween and SSwithin

It is important to understand what information about pattern in the data is contained in these SS and MS terms. SSbetween is a function of the distances among the group means (M1, M2, ..., Mk); the farther apart these group means are, the larger SSbetween tends to be. Most researchers hope to find significant differences among groups, and therefore, they want SSbetween (and F) to be relatively large. SSwithin is the total of squared within-group deviations of scores from group means. SSwithin would be 0 in the unlikely event that all scores within each group were equal to one another. The greater the variability of scores within each group, the larger the value of SSwithin.

Consider the example shown in Table 13.1, which shows hypothetical data for which SSbetween would be 0 (because all the group means are equal); however, SSwithin is not 0 (because the scores vary within groups). Table 13.2 shows data for which SSbetween is not 0 (group means differ) but SSwithin is 0 (scores do not vary within groups). Table 13.3 shows data for which both SSbetween and SSwithin are nonzero. Finally, Table 13.4 shows a pattern of scores for which both SSbetween and SSwithin are 0.

[Table 13.1: SSbetween = 0, SSwithin not 0. Group means are all equal to 6; individual scores not recoverable.]

[Table 13.2: SSbetween not 0, SSwithin = 0. Individual scores not recoverable.]

[Table 13.4: SSbetween = 0, SSwithin = 0. Individual scores not recoverable.]
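The by-hand steps of Section 13.5 and the patterns described in Section 13.6 can be tried out in code. The sketch below is illustrative only: `one_way_anova` is a hypothetical helper name, the stress scores are those in Figure 13.1, the critical value 3.01 is the tabled value quoted in the text, and the two small "pattern" data sets are invented for illustration (they are not the book's Tables 13.1 and 13.2).

```python
def one_way_anova(groups):
    """By-hand one-way between-S ANOVA for a list of lists of scores."""
    all_scores = [y for g in groups for y in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    means = [sum(g) / len(g) for g in groups]
    # SS_between: n_i * (M_i - M_Y)^2 summed over groups (Eq. 13.6).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # SS_within: squared deviations from each group's own mean (Eqs. 13.7-13.8).
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # SS_total: squared deviations from the grand mean (Eq. 13.9).
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # F is undefined when there is no within-group variability.
    f_ratio = ms_between / ms_within if ms_within > 0 else None
    return {"ss_between": ss_between, "ss_within": ss_within,
            "ss_total": ss_total, "df_between": df_between,
            "df_within": df_within, "ms_between": ms_between,
            "ms_within": ms_within, "f_ratio": f_ratio}


stress = [[10, 10, 12, 11, 7, 7, 12], [17, 14, 14, 13, 11, 17, 14],
          [15, 11, 12, 14, 16, 17, 10], [16, 20, 14, 16, 19, 16, 18]]
result = one_way_anova(stress)
F_CRITICAL = 3.01  # tabled value for df = 3, 24 at alpha = .05 (from the text)
print(round(result["f_ratio"], 2), result["f_ratio"] > F_CRITICAL)

# Invented patterns: equal group means, and no variation within groups.
equal_means = one_way_anova([[4, 6, 8], [5, 6, 7], [6, 6, 6]])
no_within = one_way_anova([[3, 3, 3], [5, 5, 5], [7, 7, 7]])
print(equal_means["ss_between"], no_within["ss_within"])
```

Because the code carries full precision rather than means rounded to two decimals, SSbetween and SStotal come out a little above the rounded by-hand values of 182 and 304, which is the rounding discrepancy the text mentions.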

13.7 Confidence Intervals for Group Means

Once we know the mean, variance, and n for each group, we can set up a confidence interval (CI) around the mean for each group or a CI for any difference between a pair of group means. Procedures for CIs were reviewed in Chapter 12, on the independent-samples t test, and are not repeated here.

13.8 Effect Sizes for One-Way Between-S ANOVA

By comparing the sizes of these SS terms that represent variability of scores between and within groups, we can make a summary statement about the comparative size of the effects of the independent and extraneous variables. The proportion of the total variability (SStotal) that is due to between-group differences is given by

\[ \eta^2 = \frac{SS_{between}}{SS_{total}} \qquad (13.16) \]

In the context of a well-controlled experiment, these between-group differences in scores are, presumably, due primarily to the manipulated independent variable; in a nonexperimental study that compares naturally occurring groups, this proportion of variance is reported only to describe the magnitudes of differences between groups, and it is not interpreted as evidence of causality. An eta squared (η²) is an effect size index given as a proportion of variance; if η² = .50, then 50% of the variance in the Yij scores is related to between-group differences. This is the same eta squared that was introduced in the previous chapter as an effect size index for the independent-samples t test; verbal labels that can be used to describe effect sizes are provided in Table 12.2. If the scores in a two-group t test are partitioned into components using the logic just described here and then summarized by creating sums of squares, the η² value obtained will be identical to the η² that was calculated from the t and df terms.

It is also possible to calculate eta squared from the F ratio and its df; this is useful when reading journal articles that report F tests without providing effect size information:

\[ \eta^2 = \frac{df_{between} \times F}{df_{between} \times F + df_{within}} \qquad (13.17) \]

An eta squared is interpreted as the proportion of variance in scores on the Y outcome variable that is predictable from group membership (i.e., from the score on X, the predictor variable). Suggested verbal labels for eta squared effect sizes were given in Table 12.2.
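The two routes to eta squared (Equations 13.16 and 13.17) can be checked against each other with a few lines of Python. This is only an illustrative sketch: the function names are invented for this example, and the numbers are the sums of squares and F ratio reported for the stress and anxiety example in this chapter.

```python
def eta_squared_from_ss(ss_between, ss_total):
    """Proportion of total variability due to between-group differences (Eq. 13.16)."""
    return ss_between / ss_total


def eta_squared_from_f(f_ratio, df_between, df_within):
    """Recover eta squared from a reported F ratio and its df (Eq. 13.17)."""
    return (df_between * f_ratio) / (df_between * f_ratio + df_within)


# Values from the stress/anxiety example: SSbetween = 182.107, SStotal = 304.107,
# and F(3, 24) = 11.94.
eta_ss = eta_squared_from_ss(ss_between=182.107, ss_total=304.107)
eta_f = eta_squared_from_f(f_ratio=11.94, df_between=3, df_within=24)

print(f"eta squared from SS terms: {eta_ss:.3f}")
print(f"eta squared from F and df: {eta_f:.3f}")
```

Both calculations give an eta squared of about .60; the small discrepancy in later decimal places comes from rounding F to two decimals.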


One alternative effect size measure sometimes used in ANOVA is called omega squared (ω²) (see Hays, 1994). The eta squared index describes the proportion of variance due to between-group differences in the sample, but it is a biased estimate of the proportion of variance that is theoretically due to differences among the populations. The ω² index is essentially a (downwardly) adjusted version of eta squared that provides a more conservative estimate of variance among population means; however, eta squared is more widely used in statistical power analysis and as an effect size measure in the literature. Cohen's f is yet another effect size, often used in statistical power analysis. Cohen's f² = η²/(1 − η²).

13.9 Statistical Power Analysis for One-Way Between-S ANOVA

Table 13.5 is an example of a statistical power table that can be used to make decisions about sample size when planning a one-way between-S ANOVA with k = 3 groups and α = .05. Using Table 13.5, given the number of groups, the number of participants, the predetermined alpha level, and the anticipated population effect size estimated by eta squared, the researcher can look up the minimum n of participants per group that is required to obtain various levels of statistical power. The researcher needs to make an educated guess: How large an effect is expected in the planned study? If similar studies have been conducted in the past, the eta squared values from past research can be used to estimate effect size; if not, the researcher may have to make a guess on the basis of less exact information. The researcher chooses the alpha level (usually .05), calculates dfbetween (which equals k − 1, where k is the number of groups in the study), and decides on the desired level of statistical power (usually .80, or 80%). Using this information, the researcher can use the tables in Cohen (1988) or in Jaccard and Becker (2009) to look up the minimum sample size per group that is needed to achieve the power of 80%. For example, using Table 13.5, for an alpha level of .05, a study with three groups and dfbetween = 2, a population eta squared value of .15, and a desired level of power of .80, the minimum number of participants required per group would be 19.

Table 13.5 (statistical power table: minimum n per group by level of power and population eta squared; table entries not legible)
Source: Adapted from Jaccard and Becker (2009).
Note: Each table entry corresponds to the minimum n required in each group to obtain the level of statistical power shown.

Java applets are available on the web for statistical power analysis; typically, if the user identifies a Java applet that is appropriate for the specific analysis (such as between-S one-way ANOVA) and enters information about alpha, the number of groups, population effect size, and desired level of power, the applet provides the minimum per-group sample size required to achieve the user-specified level of statistical power.

13.10 Planned Contrasts

The idea behind planned contrasts is that the researcher identifies a limited number of comparisons between group means before looking at the data. The test statistic that is used for each comparison is essentially identical to a t ratio, except that the denominator is usually based on the MSwithin for the entire ANOVA, rather than just the variances for the two groups involved in the comparison. Sometimes an F is reported for the significance of each contrast, but F is equivalent to t² in situations where only two group means are compared or where a contrast has only 1 df.

For the means of Groups a and b, the null hypothesis for a simple contrast between Ma and Mb is as follows:

\[ H_0: \mu_a = \mu_b \]

or

\[ H_0: \mu_a - \mu_b = 0. \]

The test statistic can be in the form of a t test:

\[ t = \frac{M_a - M_b}{\sqrt{2\,MS_{within}/n}} \]

where n is the number of cases within each group in the ANOVA. (If the ns are unequal across groups, then an average value of n is used; usually, this is the harmonic mean of the ns.)

Note that this is essentially equivalent to an ordinary t test. In a t test, the measure of within-group variability is s²p; in a one-way ANOVA, information about within-group variability is contained in the term MSwithin. In cases where an F is reported as a significance test for a contrast between a pair of group means, F is equivalent to t². The df for this t test equal N − k, where N is the total number of cases in the entire study and k is the number of groups.

When a researcher uses planned contrasts, it is possible to make other kinds of comparisons that may be more complex in form than a simple pairwise comparison of means. For instance, suppose that the researcher has a study in which there are four groups; Group 1 receives a placebo, and Groups 2 to 4 all receive different antidepressant drugs. One hypothesis that may be of interest is whether the average depression score combined across the three drug groups is significantly lower than the mean depression score in Group 1, the group that received only a placebo.

The null hypothesis that corresponds to this comparison can be written in any of the following ways:

\[ H_0: \mu_1 = \frac{\mu_2 + \mu_3 + \mu_4}{3} \]

which can be stated:

\[ H_0: \mu_1 - \frac{\mu_2 + \mu_3 + \mu_4}{3} = 0 \]

In words, this null hypothesis says that when we combine the means using certain weights (such as +1, −1/3, −1/3, and −1/3), the resulting composite is predicted to have a value of 0. This is equivalent to saying that the mean outcome averaged or combined across Groups 2 to 4 (which received three different types of medication) is equal to the mean outcome in Group 1 (which received no medication). Weights that define a contrast among group means are called contrast coefficients. Usually, contrast coefficients are constrained to sum to 0, and the coefficients themselves are usually given as integers for reasons of simplicity. If we multiply this set of contrast coefficients by 3 (to get rid of the fractions), we obtain the following set of contrast coefficients that can be used to see if the combined mean of Groups 2 to 4 differs from the mean of Group 1: (+3, −1, −1, −1).
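As a rough check on the planned-contrast logic described in this section, the sketch below computes the standard error and t ratio for a contrast when group sizes are equal. The function name and arguments are invented for illustration; the standard error formula, sqrt(MSwithin × Σc² / n), is the usual equal-n form and reduces to sqrt(2 × MSwithin / n) for a simple pairwise contrast with coefficients (+1, −1). The numbers are those reported in this chapter's SPSS example (value of contrast −15.29, MSwithin = 5.083, n = 7 per group).

```python
import math


def contrast_t(contrast_value, coefficients, ms_within, n_per_group):
    """Return (t, standard error) for a planned contrast with equal group sizes.

    contrast_value is the weighted sum of group means, sum(c_i * M_i).
    The error term uses MS_within from the whole ANOVA.
    """
    if abs(sum(coefficients)) > 1e-9:
        raise ValueError("contrast coefficients should sum to 0")
    sum_sq = sum(c ** 2 for c in coefficients)
    std_error = math.sqrt(ms_within * sum_sq / n_per_group)
    return contrast_value / std_error, std_error


# Contrast (+3, -1, -1, -1): no-stress group versus the three stress groups combined.
t_ratio, std_error = contrast_t(
    contrast_value=-15.29,
    coefficients=(3, -1, -1, -1),
    ms_within=5.083,
    n_per_group=7,
)
df_error = 28 - 4  # N - k

print(f"SE = {std_error:.3f}, t({df_error}) = {t_ratio:.2f}")
```

The standard error of about 2.95 and t of about −5.18 agree, within rounding, with the "assume equal variances" row of the SPSS contrast test output.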


itself, does not imply causation.

If X causes Y, we would expect to find a statistical relationship between X and Y using the appropriate bivariate statistic (such as the t test, chi squared, or other analyses). Evidence that X and Y co-occur or are statistically related is a necessary condition for any claim that X might cause or influence Y. Statistical association is a necessary, but not sufficient, condition for causal inference. We need to be able to rule out rival explanations before we claim that X causes Y. The additional evidence needed to make causal inferences was discussed in Chapter 2.

When we interpret correlation results, we must be careful not to use causal-sounding language unless other conditions for causal inference are met. We should not report correlation results using words such as cause, determine, and influence unless data come from a carefully designed study that makes it possible to rule out rival explanations.

10.3 How Sign and Magnitude of r Describe an X, Y Relationship

Before you obtain a correlation, you need to examine an X, Y scatterplot to see if the association between X and Y is approximately linear. Pearson's r provides useful information only about linear relationships. Additional assumptions required for Pearson's r will be discussed later.

Values of r can range from −1.00 through 0 to +1.00. If assumptions for the use of r are satisfied, then the value of Pearson's r tells us two things. The sign of r tells us the direction of the association. For positive values of r, as values of X increase, values of Y also tend to increase. For negative values of r, as values of X increase, values of Y tend to decrease. This tells us the nature or direction of the association. The absolute magnitude of r (without the plus or minus sign) indicates the strength of the association. If r is near 0, there is little or no association between X and Y. As r increases in absolute value, there is a stronger association.

10.4 Setting Up Scatterplots

Initial evaluation of linearity is based on visual examination of scatterplots. Consider the data in the file perfect linear association scatter data.sav in Figure 10.1. In this imaginary data, number of cars sold (X) is the predictor of a salesperson's salary (Y). A scatterplot can be set up by hand. If you already know how to set up a scatterplot and graph a straight line, you may skip to Section 10.5.

To create a scatterplot for the X variable cars_sold and the Y variable salary, set up a graph with values of 0 through 10 marked on the X axis (this corresponds to the range of scores for cars_sold, the predictor), and $10,000 through $25,000 marked on the Y axis (the range of scores for salary, the outcome) as shown in Figure 10.2. If one variable is clearly the predictor or causal variable, that variable is placed on the X axis; in this example, cars_sold predicts salary. To graph one data point, look at one line of data in the file. The ninth line has X = 8 for number of cars sold and Y = $22,000 for salary. Locate the value of X (number of cars sold = 8) on the horizontal axis. Then, move straight up from that value of X to the corresponding value of Y, salary, which is $22,000. Place a dot at the location for that combination of values of X and Y. When pairs of X, Y scores are placed in the graph for all 11 cases,


coefficients. First, you list the coefficients for Contrasts 1 and 2 (make sure that each set of coefficients sums to 0, or this shortcut will not produce valid results).

Contrast 1: (−2, −1, 0, +1, +2)

Contrast 2: (+1, −1, 0, 0, 0)

You cross-multiply each pair of corresponding coefficients (i.e., the coefficients that are applied to the same group) and then sum these cross products. In this example, you get

Contrast 1 (C1) | Contrast 2 (C2) | Cross product of C1 × C2
−2 | +1 | −2
−1 | −1 | +1
0 | 0 | 0
+1 | 0 | 0
+2 | 0 | 0
Sum = 0 | Sum = 0 | Sum of cross products = −1

In this case, the sum of the cross products is −1. This means that the two contrasts above are not independent or orthogonal; some of the information that they contain about differences among means is redundant. Consider a second example that illustrates a situation in which the two contrasts are orthogonal or independent:

Linear (C1) | Curvilinear (C2) | Cross product of C1 × C2
−2 | +1 | −2
−1 | 0 | 0
0 | −2 | 0
+1 | 0 | 0
+2 | +1 | +2
Sum = 0 | Sum = 0 | Sum of cross products = 0

In this second example, the curvilinear contrast is orthogonal to the linear trend contrast.

In a one-way ANOVA with k groups, it is possible to have up to (k − 1) orthogonal contrasts. The preceding discussion of contrast coefficients assumed that the groups in the one-way ANOVA had equal ns. When the ns in the groups are unequal, it is necessary to adjust the values of the contrast coefficients so that they take unequal group size into account; this is done automatically in programs such as SPSS.

13.11 Post Hoc or “Protected” Tests

If the researcher wants to make all possible comparisons among groups or does not have a theoretical basis for choosing a limited number of comparisons before looking at the data, it is possible to use test procedures that limit the risk for Type I error by using “protected” tests. Protected tests use a more stringent criterion than would be used for planned contrasts in judging whether any given pair of means differs significantly. One method for setting a more stringent test criterion is the Bonferroni procedure, described in Chapter 10. The Bonferroni procedure requires that the data analyst use a more conservative (smaller) alpha level to judge whether each individual comparison between group means is statistically significant. For instance, in a one-way ANOVA with k = 5 groups, there are k × (k − 1)/2 = 10 possible pairwise comparisons of group means. If the researcher wants to limit the overall experiment-wise risk for Type I error (EWα) for the entire set of 10 comparisons to .05, one possible way to achieve this is to set the PCα level for each individual significance test between means at EWα/(number of post hoc tests to be performed). For example, if the experimenter wants an experiment-wise α of .05 when doing k = 10 post hoc comparisons between groups, the alpha level for each individual test would be set at EWα/k, or .05/10, or .005 for each individual test. The t test could be calculated using the same formula as for an ordinary t test, but it would be judged significant only if its obtained p value were less than .005. The Bonferroni procedure is extremely conservative, and many researchers prefer less conservative methods of limiting the risk for Type I error. (One way to make the Bonferroni procedure less conservative is to set the experiment-wise alpha to some higher value, such as .10.)

Dozens of post hoc or protected tests have been developed to make comparisons among means in ANOVA that were not predicted in advance. Some of these procedures are intended for use with a limited number of comparisons; other tests are used to make all possible pairwise comparisons among group means. Some of the better known post hoc tests include the Scheffé test, the Newman-Keuls test, and the Tukey HSD test. The Tukey HSD test has become popular because it is moderately conservative and easy to apply; it can be used to perform all possible pairwise comparisons of means and is available as an option in widely used computer programs such as SPSS. The menu for the SPSS one-way ANOVA procedure includes the Tukey HSD test as one of many options for post hoc tests; SPSS calls it the Tukey procedure.
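Two of the hand calculations in this part of the chapter, the cross-product check for orthogonal contrasts and the Bonferroni per-comparison alpha, are easy to verify in code. The following is a minimal sketch with invented function names; it assumes equal group sizes, as the text does, and uses the coefficient sets from the two contrast examples (the curvilinear set is taken to be +1, 0, −2, 0, +1).

```python
def cross_product_sum(contrast_a, contrast_b):
    """Sum of cross products of two sets of contrast coefficients.

    A sum of 0 indicates orthogonal contrasts (equal n assumed).
    """
    if len(contrast_a) != len(contrast_b):
        raise ValueError("contrasts must have one coefficient per group")
    return sum(a * b for a, b in zip(contrast_a, contrast_b))


def number_of_pairwise_comparisons(k_groups):
    """Number of possible pairwise comparisons among k group means."""
    return k_groups * (k_groups - 1) // 2


def bonferroni_per_comparison_alpha(experimentwise_alpha, n_tests):
    """Per-comparison alpha that keeps experiment-wise alpha at the stated level."""
    return experimentwise_alpha / n_tests


contrast_1 = (-2, -1, 0, 1, 2)   # also the linear trend contrast
contrast_2 = (1, -1, 0, 0, 0)
curvilinear = (1, 0, -2, 0, 1)

print(cross_product_sum(contrast_1, contrast_2))   # not orthogonal
print(cross_product_sum(contrast_1, curvilinear))  # orthogonal

n_tests = number_of_pairwise_comparisons(5)
print(n_tests, bonferroni_per_comparison_alpha(0.05, n_tests))
```

The first pair of contrasts yields a nonzero sum (−1), the second pair yields 0, and five groups give 10 pairwise comparisons with a per-comparison alpha of .005, matching the values worked out in the text.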

The Tukey HSD test (and several similar post hoc tests) uses a different method of limiting the risk for Type I error. Essentially, the Tukey HSD test uses the same formula as a t ratio, but the resulting test ratio is labeled q rather than t to remind the user that it should be evaluated using a different sampling distribution. The Tukey HSD test and several related post hoc tests use critical values from a distribution called the “Studentized range statistic,” and the test ratio is often denoted by the letter q:

\[ q = \frac{M_a - M_b}{\sqrt{MS_{within}/n}} \qquad (13.19) \]

where a and b denote any two groups. Values of the q ratio are compared with critical values from tables of the Studentized range statistic (see the table in Appendix F at the end of the book). The Studentized range statistic is essentially a modified version of the t distribution. Like t, its distribution depends on the numbers of subjects within groups, but the shape of this distribution also depends on k, the number of groups. As the number of groups (k) increases, the number of pairwise comparisons also increases. To protect against inflated risk for Type I error, larger differences between group means are required for rejection of the null hypothesis as k increases. The distribution of the Studentized range statistic is broader and flatter than the t distribution and has thicker tails; thus, when it is used to look up critical values of q that cut off the most extreme 5% of the area in the upper and lower tails, the critical values of q are larger than the corresponding critical values of t.

This formula for the Tukey HSD test could be applied by computing a q ratio for each pair of sample means and then checking to see if the obtained q for each comparison exceeded the critical value of q from the table of the Studentized range statistic. However, in practice, a computational shortcut is often preferred. The formula is rearranged so that the cutoff for judging a difference between groups to be statistically significant is given in terms of differences between means rather than in terms of values of a q ratio.
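The q ratio and its rearrangement into a cutoff on the raw difference between means can be sketched as follows. The function names are invented for this illustration. The critical value 3.90 is an assumption: it is the value commonly tabled for the Studentized range with k = 4 groups, 24 error df, and α = .05, and should be confirmed against Appendix F. The two group means (10 and 17) are approximate values read from the bar chart for the no-stress and mock job interview groups, used here only as illustrative inputs.

```python
import math


def tukey_q(mean_a, mean_b, ms_within, n_per_group):
    """q ratio for one pair of group means (equal n per group)."""
    return (mean_a - mean_b) / math.sqrt(ms_within / n_per_group)


def tukey_hsd(q_critical, ms_within, n_per_group):
    """Minimum absolute mean difference judged significant by the Tukey HSD test."""
    return q_critical * math.sqrt(ms_within / n_per_group)


Q_CRITICAL = 3.90          # assumed tabled value for k = 4, df = 24, alpha = .05
MS_WITHIN = 5.083          # from the ANOVA summary table for the anxiety example
N_PER_GROUP = 7

hsd = tukey_hsd(Q_CRITICAL, MS_WITHIN, N_PER_GROUP)
q_obtained = tukey_q(10.0, 17.0, MS_WITHIN, N_PER_GROUP)  # approximate means

# The two decision rules are equivalent: |difference| > HSD  <=>  |q| > q_critical.
significant_by_hsd = abs(10.0 - 17.0) > hsd
significant_by_q = abs(q_obtained) > Q_CRITICAL

print(f"HSD = {hsd:.2f}, q = {q_obtained:.2f}")
print(significant_by_hsd, significant_by_q)
```

Under these assumptions, any pair of means that differs by more than roughly 3.3 points would be judged significantly different, which is why the cutoff form is convenient when many pairs must be checked.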

\[ HSD = q_{critical} \times \sqrt{MS_{within}/n} \qquad (13.20) \]

Then, if the obtained difference between any pair of means (such as Ma − Mb) is greater in absolute value than this HSD, this difference between means is judged statistically significant.

An HSD criterion is computed by looking up the appropriate critical value of q, the Studentized range statistic, from a table of this distribution (see the table in Appendix F). The critical q value is a function of both n, the average number of subjects per group, and k, the number of groups in the overall one-way ANOVA. As in other test situations, most researchers use the critical value of q that corresponds to α = .05, two tailed. This critical q value obtained from the table is multiplied by the error term to yield HSD. This HSD is used as the criterion to judge each obtained difference between sample means. The researcher then computes the absolute value of the difference between each pair of group means (M1 − M2), (M1 − M3), and so forth. If the absolute value of a difference between group means exceeds the HSD value just calculated, then that pair of group means is judged to be significantly different.

When a Tukey HSD test is requested from SPSS, SPSS provides a summary table that shows all possible pairwise comparisons of group means and reports whether each of these comparisons is significant. If the overall F for the one-way ANOVA is statistically significant, it implies that there should be at least one significant contrast among group means. However, it is possible to have situations in which a significant overall F is followed by a set of post hoc tests that do not reveal any significant differences among means. This can happen because protected post hoc tests are somewhat more conservative, and thus require slightly larger between-group differences as a basis for a decision that differences are statistically significant, than the overall one-way ANOVA.

13.12 One-Way Between-S ANOVA in SPSS

To run the one-way between-S ANOVA procedure in SPSS, make the following menu selections from the menu bar at the top of the Data View worksheet, as shown in Figure 13.3: <Analyze> → <Compare Means> → <One-Way ANOVA>. This opens the dialog box in Figure 13.4. Enter the name of one (or several) dependent variables into the pane labeled “Dependent List”; enter the name of the categorical variable that provides group membership information into the box labeled “Factor.” For this example, additional windows were accessed by clicking on the buttons marked Post Hoc, Contrasts, and Options. The screenshots that correspond to this series of dialog boxes appear in Figures 13.4 through 13.7.

Figure 13.3 SPSS Menu Selections for One-Way Between-S ANOVA

The details are as follows.

“Analyze” is the sixth tab from the left on the menu bar on top.

“Compare means” is the fifth option from the top of the drop-down menu.

“One-way ANOVA” is the last option of six given.

Arrows are shown against options “compare means” and “one-way ANOVA.”

Figure 13.4 One-Way ANOVA Dialog Box

The details are as follows.

On the left: An unlabeled pane.

On the top center: A pane labelled “dependent list” with the entry “anxiety.” To its right, three buttons labelled “contrasts,” “post hoc,” and “options.”

Lower center: A pane labelled “factor” with the entry “stress.”

Bottom row: Five buttons labelled “OK,” “paste,” “reset,” “cancel,” and “help.”

Figure 13.5 One-Way ANOVA: Post Hoc Multiple Comparisons Dialog Box

The details are as follows.

The pane on top labelled “equal variances assumed” shows 14 choices in three columns, of which “Tukey” has been checked, the second choice from the top in the central column. The pane below is labelled “equal variances not assumed,” and shows 4 choices. A dialog box below that, labelled “significance level,” has the entry “0.05.”

In the lower margin are three buttons: continue; cancel; and help.

Figure 13.6 Specification of a Planned Contrast

Group | Contrast Coefficient
1 No stress | +3
2 Mental arithmetic | −1
3 Stress role play | −1
4 Mock job interview | −1

The null hypothesis about a weighted linear composite of means that is represented by this set of contrast coefficients:

\[ (+3)\mu_1 + (-1)\mu_2 + (-1)\mu_3 + (-1)\mu_4 = 0 \]

or

\[ \mu_1 - \frac{\mu_2 + \mu_3 + \mu_4}{3} = 0 \]

or

\[ \mu_1 = \frac{\mu_2 + \mu_3 + \mu_4}{3} \]

The details are as follows, from the top downward.

• An option “polynomial” left unchecked.
• A pane labeled “contrast 1 of 1” with a box labelled “coefficients.” Options include 3; minus 1; minus 1; minus 1.
• Coefficient total is shown as 0.000.
• To the right of the pane is a button labelled “next.”
• At the bottom are three buttons: continue; cancel; and help.

From the menu of post hoc tests, this example uses the one SPSS calls “Tukey” (this corresponds to the Tukey HSD test). To define a contrast that compares the mean of Group 1 (no stress) with the mean of the three stress treatment groups combined, these contrast coefficients are entered one at a time: +3, −1, −1, −1. From the list of options, “Descriptive” statistics and “Homogeneity of variance test” were selected by placing checks in the boxes next to the names of these tests.

Figure 13.7 One-Way ANOVA: Options Dialog Box


(Scatterplot: first-year GPA on the Y axis, SAT score on the X axis; correlation = .50)

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r equals plus .5. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4.

There are three ellipses within which most of the data points are clustered. There are many outliers, but several points lie within the ellipses. The ellipses are vertical.

For a mean GPA of 1.4, the first ellipse has 6 data points. They are clustered around the 2 GPA and 400 SAT score levels.

The second ellipse is for a mean GPA of 2.0. The data points are clustered around the 1 to 3 GPA range and the 500 to 600 SAT score levels. There are around 18 such data points. There are many points close to the ellipse, but not contained within it.

The third ellipse is for a mean GPA of 2.6. Here, data points are fewer, just around 5, and are clustered around the 3 GPA level and 700 SAT level.

A straight line drawn between the means of the ellipses is almost linear.

Figure 10.7 Hypothetical Scatterplot for r = +.20

(Scatterplot: GPA on the Y axis, SAT score on the X axis; correlation of about .20)

The image is an ellipse drawn around a scatterplot that shows a relationship between GPA and SAT scores corresponding to r equals plus .2. The X axis represents SAT scores and ranges from 250 to 800. The Y axis represents GPA and ranges from 1 to 4.

There are two ellipses within which many data points are clustered. There are many outliers, and several of these points lie between the ellipses.

For a mean GPA of 2.1, the first ellipse has 8 data points. They are clustered around the 1 to 3 GPA and 400 SAT score levels.

The second ellipse is for a mean GPA of 2.4. The data points are clustered around the 1.5 to 3.5 GPA range and the 650 to 700 SAT score levels. There are around 5 such data points.

Most of the other points lie between the two ellipses and not inside them, while a straight line drawn between the means of both ellipses is almost horizontal.


10.6 Different Situations in Which r = .00

(SPSS output: Descriptives, Test of Homogeneity of Variances, and ANOVA tables for Anxiety)

The details are as follows.

Table 1. Descriptives: Anxiety, by stress group (None, Mental arithmetic, Stress role play, Mock job interview, Total). Columns: N, mean, standard deviation, standard error, 95% confidence interval for mean (lower bound, upper bound), minimum, maximum. (Cell values not legible.)

Table 2. Test of homogeneity of variances: Anxiety

Levene statistic | df1 | df2 | Sig.
0.453 | 3 | 24 | 0.7

Table 3. ANOVA: Anxiety

Source | Sum of squares | df | Mean square
Between groups | 182.107 | 3 | 60.702
Within groups | 122.000 | 24 | 5.083
Total | 304.107 | 27 |

Figure 13.9 SPSS Output for Planned Contrasts

The details are as follows.

Table 1. Contrast coefficients (STRESS)

Contrast | None | Mental arithmetic | Stress role play | Mock job interview
1 | 3 | Minus 1 | Minus 1 | Minus 1

Table 2. Contrast tests (Anxiety, Contrast 1)

Assumption | Value of contrast | Std. error | t | df
Assume equal variances | Minus 15.29 | 2.952 | Minus 5.176 | 24
Does not assume equal variances | Minus 15.29 | 2.832 | Minus 5.397 | 11.058

Figure 13.10 shows the results for the Tukey HSD tests that compared all possible pairs of group means. The table “Multiple Comparisons” gives the difference between means for all possible pairs of means (note that each comparison appears twice; that is, Group a is compared with Group b, and in another row, Group b is compared with Group a). Examination of the “Sig.” or p values indicates that several of the pairwise comparisons were significant at the .05 level. The results are displayed in a more easily readable form in the last panel under the heading “Homogeneous Subsets.” Each subset consists of group means that were not significantly different from one another using the Tukey test. The no-stress group was in a subset by itself; in other words, it had significantly lower mean anxiety than any of the three stress intervention groups. The second subset consisted of the stress role play and mental arithmetic groups, which did not differ significantly in anxiety. The third subset consisted of the mental arithmetic and mock job interview groups.
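The entries in the ANOVA summary table from the SPSS output can be reproduced from the sums of squares alone. The sketch below is illustrative only (the function name and dictionary keys are invented); it uses SSbetween = 182.107 and SSwithin = 122.000 with k = 4 groups and N = 28 cases, as in the anxiety example.

```python
def one_way_anova_summary(ss_between, ss_within, k_groups, n_total):
    """Build the usual one-way ANOVA summary values from SS terms."""
    df_between = k_groups - 1
    df_within = n_total - k_groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
        "ss_total": ss_between + ss_within,
    }


summary = one_way_anova_summary(
    ss_between=182.107, ss_within=122.000, k_groups=4, n_total=28
)
print(
    f"F({summary['df_between']}, {summary['df_within']}) = {summary['F']:.2f}; "
    f"MSbetween = {summary['ms_between']:.3f}, MSwithin = {summary['ms_within']:.3f}"
)
```

The resulting mean squares (about 60.702 and 5.083) and F ratio (about 11.94 on 3 and 24 df) match the values reported for this example.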

Figure 13.10 SPSS Output for Post Hoc Test (Tukey HSD)

The details are as follows.

Table 1. Multiple Comparisons (dependent variable: ANXIETY; Tukey HSD). Columns: mean difference, standard error, Sig., and 95% confidence interval (lower bound, upper bound) for each pair of stress groups. (Cell values not legible.)

Note: Asterisk signifies the mean difference is significant at the 0.05 level.

Table 2. Homogeneous Subsets: Anxiety, Tukey HSD (uses harmonic mean sample size equal to 7.000). Columns: stress group, N, and subset for alpha equal to 0.05. Means for groups in homogeneous subsets are displayed. (Cell values not legible.)

Note that it is possible for a group to belong to more than one subset; the anxiety score for the mental arithmetic group was not significantly different from the stress role play or the mock job interview groups. However, because the stress role play group differed significantly from the mock job interview group, these three groups did not form one subset.

Note also that it is possible for all the Tukey HSD comparisons to be nonsignificant even when the overall F for the one-way ANOVA is statistically significant. This can happen because the Tukey HSD test requires a slightly larger difference between means to achieve significance.

In this imaginary example, as in some research studies, the outcome measure (anxiety) is not a standardized test for which we have norms. The numbers by themselves do not tell us whether the mock job interview participants were moderately anxious or twitching, stuttering wrecks. Studies that use standardized measures can make comparisons with test norms to help readers understand whether the group differences were large enough to be of clinical or practical importance. Alternatively, qualitative data about the behavior of participants can also help readers understand how substantial the group differences were.


Figure 13.11 Bar Chart for Group Means With 95% Confidence Intervals

(Bar chart: mean anxiety, 0 to 20, on the vertical axis; stress group on the horizontal axis. Error bars: 95% CI.)

The horizontal axis of the graph shows the different categories for which stress has been measured and the vertical axis shows the mean anxiety. The error bars show a confidence interval of 95%.

The details are presented here in a table, with all values approximated from the graph.

Stress | None | Mental arithmetic | Stress role play | Mock job interview
Mean | 10 | 14 | 13.5 | 17
Lower CI boundary | 8 | 12.5 | 11 | 15
Upper CI boundary | 12 | 16.5 | 16 | 19

SPSS one-way ANOVA does not provide an effect size measure, but this can easily be calculated by

anxiety and stress.

Results

A one-way between-S ANOVA was done to compare the mean scores on an anxiety scale (0 = not at all anxious, 20 = extremely anxious) for participants who were randomly assigned to one of four groups: Group 1, control group/no stress; Group 2, mental arithmetic; Group 3, stressful role play; and Group 4, mock job interview. Examination of a histogram of anxiety scores indicated that the scores were approximately normally distributed with no extreme outliers. Prior to the analysis, the Levene test for homogeneity of variance was used to examine whether there were serious violations of the homogeneity of variance assumption across groups, but no significant violation was found, F(3, 24) = .718, p = .72. The overall F for the one-way ANOVA was statistically significant, F(3, 24) = 11.94, p
scores on Y from raw scores on X (Figure 11.6).

(salary) and the name of the predictor variable

Figure 11.4 SPSS Menu Selections for Linear Regression

N > 50 + 8k or N > 104 + k (whichever is larger, where k is the number of predictor variables) for regression analysis. This implies that N should be at least 105 when using one predictor variable. This is consistent with sample size suggestions from Schönbrodt (2011), discussed in Chapter 10. Even if statistical power tables may suggest that N < 100 can give adequate statistical power for significance tests of b and r, it is preferable to have N > 100.
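The sample size guideline above can be sketched in a few lines of Python. This is only an illustration: the function name `minimum_n` is hypothetical, and it assumes each bound is treated as the smallest acceptable N, which is how the text arrives at "at least 105" for one predictor.

```python
def minimum_n(k: int) -> int:
    """Larger of the two guideline values, 50 + 8k and 104 + k.

    k is the number of predictor variables. Treating each value as the
    smallest acceptable N is an assumption made for this sketch.
    """
    return max(50 + 8 * k, 104 + k)


# With one predictor, 104 + k is the larger value; with many predictors,
# 50 + 8k takes over.
for k in (1, 5, 8, 10):
    print(k, minimum_n(k))
```

With k = 1 the function returns 105, matching the text; the two bounds cross between k = 7 and k = 8.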


by any overlap is labelled lowercase c.

A note reads: Partition of sum of squares of Y for an orthogonal factorial ANOVA.

A legend for the areas reads as follows:

• $SS_A$ is lowercase a
• $SS_B$ is lowercase b
• $SS_{A \times B}$ is lowercase a,b
• $SS_{within}$ is lowercase c

The total area of the circle that represents Y corresponds to $SS_{total}$, and the overlap between the circles that represent Factor A and Y is equivalent to $SS_A$. The proportion of explained variance (represented by the overlap between Y and A) is equivalent to $SS_A / SS_{total}$. When we have an orthogonal factorial design, there is no confound between the group memberships on the A and B factors; the predictive Factors A and B do not compete to explain the same variance. The A × B interaction is also orthogonal to (uncorrelated with) the main effects of the A and B factors. Therefore, when we diagram the variance-partitioning situation for a factorial ANOVA (as in Figure 16.16), there is no overlap between the circles that represent A, B, and the A × B interaction. Because these group membership variables are independent of each other in an orthogonal design, there is no need to statistically control for any correlation between predictors.

16.C.2 Partition of Variance in Nonorthogonal Factorial ANOVA

When the ns in the cells are not balanced, it implies that group memberships (on the A and B factors) are not independent; in such situations, there is a confound or correlation between the A and B factors, and probably also between the main effects and the A × B interaction. In a nonorthogonal factorial ANOVA, the predictor variables (or factors) are correlated; they compete to explain some of the same variance. This is like a situation you will encounter when you learn to use more than one predictor in regression analysis. Correlated predictors in regression compete to explain some of the same variance in the Y outcome variable, and confounded factors compete to explain the same variance in the dependent variable in factorial ANOVA. The variance-partitioning problem in a nonorthogonal factorial ANOVA is illustrated by the diagram in Figure 16.17. In discussions of multiple regression, a similar problem arises in partition of variance. When we want to predict scores on Y from intercorrelated continuous predictor variables $X_1$ and $X_2$, we need to take into account the overlap in variance that could be explained by these variables. In fact, we can use variance-partitioning strategies similar to those that can be used in multiple regression to compute the SS terms in a nonorthogonal factorial ANOVA. There are several ways to handle the problem of variance partitioning in regression analysis, and the same logic can be used to evaluate variance partitioning in nonorthogonal factorial ANOVA.

Figure 16.17 Partition of Sums of Squares in Nonorthogonal Factorial ANOVA

The Residuals section is below this. Here there are check options for Durbin-Watson and casewise diagnostics. Both have been left unmarked.

At the bottom are option buttons for continue, cancel and help.

11.12 SPSS Output: Salary Data

To see the equivalence between Pearson's r and parts of the results of the bivariate regression, Pearson's r between years and salary was obtained using the SPSS correlations procedure; results appear in Figure 11.7.

Figure 11.7 Pearson's r for Years and Salary

Correlations

 | | Unstandardized Predicted Value | salary
Unstandardized Predicted Value | Pearson Correlation | 1 | .830**
 | Sig. (2-tailed) | | .000
 | N | 50 | 50
salary | Pearson Correlation | .830** | 1
 | Sig. (2-tailed) | .000 |
 | N | 50 | 50

** Correlation is significant at the 0.01 level (2-tailed).

Complete SPSS regression output includes additional information (discussed in Volume II [Warner, 2020]). Figure 11.8 shows the results needed to find the proportion of predicted and unpredicted variance ($R^2$ and $1 - R^2$) and to write out the two versions of the regression equations (raw score and standardized). From the top of Figure 11.8, the proportion of variance in salary that can be predicted from years of experience is $r^2$ or $R^2$, that is, .688 or about 69%. When regression includes more than one predictor, multiple R tells us how well the entire set of predictor variables can predict Y. In this example, the regression equation has only one predictor. When there is only one predictor variable, Pearson's r between X and Y is the same as multiple R for the equation that uses X to predict Y. (You can ignore the other information in the top panel of Figure 11.8 for now. The standard error of the estimate is discussed later in the chapter and is not usually included in research reports. The adjusted $R^2$ value is only used when a regression has more than one predictor variable.)

Table 11.1 relabels and rearranges the elements of the coefficient table in the SPSS output so that you can relate them to terms in the textbook. The top panel of the SPSS output in Figure 11.7 gives results for R (capital R is called multiple R).

On the basis of information in Table 11.1 we can write the unstandardized regression equation to predict salary in dollars from experience in years, as follows:

Y′ = 31,416.72 + 2,829.57 × years.
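As a small illustration of how the unstandardized equation for the salary data is used, the sketch below plugs a value of years into Y′ = 31,416.72 + 2,829.57 × years and squares the reported correlation of .830. The function and constant names are hypothetical, and the 10-year case is an arbitrary example rather than a case from the data file.

```python
# Coefficients as reported in the text for the salary example.
INTERCEPT = 31416.72   # predicted salary (dollars) at 0 years of experience
SLOPE = 2829.57        # predicted change in salary per additional year


def predicted_salary(years: float) -> float:
    """Raw-score prediction: Y' = b0 + b * years."""
    return INTERCEPT + SLOPE * years


# Hypothetical case: a person with 10 years of experience.
print(round(predicted_salary(10), 2))

# Squaring the rounded r of .830 gives the proportion of predicted variance.
r = 0.830
print(round(r ** 2, 3))
```

Squaring the rounded correlation gives about .689, which differs in the third decimal place from the .688 quoted in the text, presumably because SPSS works from the unrounded r.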

methods produce identical results. By default, the SPSS GLM procedure uses the Type III computation; you click the Custom Models button in the main GLM dialog box to select other types of computation methods for sums of squares, such as SS Type I, and to specify the order of entry of predictors and indicate whether some terms, such as interactions, should be included or excluded from the model.

Other optional methods of computing sums of squares can be requested from the SPSS GLM custom model (SS Types II and IV). These are more complicated and are rarely used.

Appendix 16D: Model for Factorial ANOVA

The theoretical model for ANOVA is an equation that represents each individual score as a sum of effects of all the theoretical components in the model. In one-way ANOVA, the model for the score of Person j in Treatment Group $A_i$ can be represented as follows:

$$Y_{ij} = \mu_Y + \alpha_i + \varepsilon_{ij},$$

where $\mu_Y$ is the grand mean of Y, $\alpha_i$ represents the "effect" of the ith level of the A factor on people's scores, and $\varepsilon_{ij}$ represents the residual or error, which captures any unique factors that influenced each person's score. The $\mu_Y$ term is estimated by the sample grand mean $M_Y$; the $\alpha_i$ term is estimated by $M_{A_i} - M_Y$, the distance of each sample group mean from the sample grand mean; and the residual $\varepsilon_{ij}$ is estimated by the difference between the individual $Y_{ij}$ score and the mean for the ith level of Factor A, that is, $\varepsilon_{ij} = Y_{ij} - M_{A_i}$.

In a two-way factorial ANOVA, we need to add a second term to this model to represent the main effect for a second factor (B) and, also, a third term to represent a possible interaction between the A and B factors. The following effect components can be estimated for each participant once we have calculated the grand mean and all of the cell, row, and column means. The following equations show the population parameter (e.g., $\alpha_i$) and the information from the sample used to estimate it (e.g., $M_{A_i} - M_Y$).

Let $\alpha_i$ be the effect of Level i for Factor A:

(16.26)
$$\alpha_i = M_{A_i} - M_Y.$$

Let $\beta_j$ be the effect of Level j for Factor B:

(16.27)
$$\beta_j = M_{B_j} - M_Y.$$

Let $\alpha\beta_{ij}$ be the interaction effect for the ij cell:

(16.28)
$$\alpha\beta_{ij} = M_{AB_{ij}} - M_Y - \alpha_i - \beta_j.$$

Let $\varepsilon_{ijk}$ be the residual, or unexplained part, of each individual score:

(16.29)
$$\varepsilon_{ijk} = Y_{ijk} - M_{AB_{ij}}.$$

For each observation in an A × B factorial model, each individual observed $Y_{ijk}$ score corresponds to an additive combination of these theoretical effects:

(16.30)
$$Y_{ijk} = \mu_Y + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{ijk}.$$

The "no-interaction" null hypothesis is equivalent to the assumption that for all cells, this $\alpha\beta$ term is equal to or close to zero. For a two-way factorial analysis, the null hypothesis (of no interaction) can be written as follows:

(16.31)
$$H_0: \alpha\beta_{11} = \alpha\beta_{12} = \alpha\beta_{21} = \alpha\beta_{22} = 0.$$

If we do not find a statistically significant F for the interaction, scores can be adequately predicted from the reduced (no-interaction) model, also called the "additive model":

(16.32)
$$Y_{ijk} = \mu_Y + \alpha_i + \beta_j + \varepsilon_{ijk}.$$

The equation for this reduced (no-interaction) model says that the Y score for each person and/or the mean of Y for each cell can be predicted from just the additive main effects of the A and B factors. When there is no interaction, we do not need to add an adjustment factor ($\alpha\beta$) to predict the mean for each cell. The $\alpha\beta$ effect represents something "different" that happens for particular combinations of levels of A with levels of B, which cannot be anticipated simply by summing their main effects. Thus, the null hypothesis of "no interaction" can be written algebraically: $H_0$: $\alpha\beta_{ij} = 0$, for all i and j.

The theoretical terms (the $\mu$, $\alpha$, $\beta$, $\alpha\beta$, and $\varepsilon$ effects) may be easier to comprehend when you see that each observed symptom score can be separated into estimates of these components. That is, we can obtain a numerical estimate for each effect for each participant. The following example uses the social support, stress, and symptoms data in the SPSS file socialsupportstress.sav that was discussed earlier in this chapter.

Each individual Y score can be represented as the sum of the following components:

Y = G_MEAN + A_EFF + B_EFF + AB_EFF + RESIDUAL

or

Symptoms = G_MEAN + A_EFF + B_EFF + AB_EFF + RESIDUAL

In the SPSS data set scorecomponents.sav, shown in Figure 16.18, each individual person's score is divided into the components described in Table 16.7.

For all persons in the study, the value of G_MEAN is the same. It is the grand mean for symptom scores in the entire study. For all persons in the same social support group, A_EFF is the same. The A effect is the mean of the A group the person belongs to minus the grand mean. For all persons in the same stress group, B_EFF is the same.

For all persons in the same cell, the AB_EFF is the same. The AB effect is based on the mean for the cell that the person belongs to minus the grand mean, the A effect, and the B effect. The AB cell effect is calculated as follows: AB_EFF = $M_{AB_{ij}}$ − ($M_Y$ + A_EFF + B_EFF). Members of each cell have the same value for AB_EFF.

For each person, there is a unique value of

RESIDUAL (deviation of individual score from cell mean). It is obtained by subtracting the cell mean from the individual score.

Notice that if you sum the terms G_MEAN, A_EFF, B_EFF, AB_EFF, and RESIDUAL for each person, you can exactly reproduce each person's original symptom score. The reproduced scores appear in the last column of Figure 16.18.

Suppose that Subject 10 is Joe. On the basis of Figure 16.18, we can say that Joe's symptom score of 14 is made up of the components summarized in Table 16.8.

Figure 16.18 Score Components for Each Individual Case in the Social Support Data

The spreadsheet shows 10 columns, with the headings as follows.

1. Serial number
2. Socsup_a
3. Stress_b
4. Symptom
5. G_MEAN
6. A_EFF
7. B_EFF
8. AB_EFF
9. RESIDUAL
10. Reproduced symptom

Twenty rows of data have been entered under each column, of which the data in row number 10 has been highlighted.
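The decomposition of each score into G_MEAN, A_EFF, B_EFF, AB_EFF, and RESIDUAL can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure: the eight scores are hypothetical (they are not the values in scorecomponents.sav), the design is assumed to be balanced, and the helper names are invented for the example. The last line checks that the components reported in the text for Joe add up to his score of 14.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical balanced 2 x 2 data: (level of A, level of B, score), n = 2 per cell.
DATA = [
    (1, 1, 2), (1, 1, 4),
    (1, 2, 10), (1, 2, 14),
    (2, 1, 3), (2, 1, 5),
    (2, 2, 6), (2, 2, 8),
]


def group_means(rows, key):
    """Mean score for each group defined by key(a, b)."""
    groups = defaultdict(list)
    for a, b, y in rows:
        groups[key(a, b)].append(y)
    return {g: mean(scores) for g, scores in groups.items()}


def decompose(rows):
    """Split every score into the five components named in the text."""
    g_mean = mean(y for _, _, y in rows)
    m_a = group_means(rows, lambda a, b: a)        # row (A) means
    m_b = group_means(rows, lambda a, b: b)        # column (B) means
    m_ab = group_means(rows, lambda a, b: (a, b))  # cell means
    parts = []
    for a, b, y in rows:
        a_eff = m_a[a] - g_mean
        b_eff = m_b[b] - g_mean
        ab_eff = m_ab[(a, b)] - g_mean - a_eff - b_eff
        residual = y - m_ab[(a, b)]
        parts.append({"Y": y, "G_MEAN": g_mean, "A_EFF": a_eff,
                      "B_EFF": b_eff, "AB_EFF": ab_eff, "RESIDUAL": residual})
    return parts


for p in decompose(DATA):
    reproduced = (p["G_MEAN"] + p["A_EFF"] + p["B_EFF"]
                  + p["AB_EFF"] + p["RESIDUAL"])
    print(p, "reproduced:", reproduced)

# Joe's components as reported in the text sum to his symptom score of 14.
print(round(6.65 + 1.65 + 2.75 + 1.75 + 1.20, 2))
```

Because every component is a difference between means (or between a score and a mean), the five terms always sum back to the original score, which is the point the reproduced-score column in Figure 16.18 makes.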

Table 16.7

Effect | Corresponding Population Parameter | Sample Estimate | Name of Sample Estimate in computationofss.sav File
Effect of Level i of Factor A | $\alpha_i$ | $M_{A_i} - M_Y$ | A_EFF
Effect of Level j of Factor B | $\beta_j$ | $M_{B_j} - M_Y$ | B_EFF
Interaction effect for combination of Level i of A with Level j of B | $\alpha\beta_{ij}$ | $M_{AB_{ij}} - M_{A_i} - M_{B_j} + M_Y$ | AB_EFF
Residual or error for each participant | $\varepsilon_{ijk}$ | $Y_{ijk} - M_{AB_{ij}}$ | RESIDUAL

Table 16.8

Component | Value
G_MEAN: Grand mean for symptom scores in the entire study. | 6.65
A_EFF: Effect of Joe's being a member of the low-social support group. | +1.65
B_EFF: Effect of Joe's being a member of the high-stress group. | +2.75
AB_EFF: Effect of Joe's being in the low-support/high-stress group (interaction effect). | +1.75
RESIDUAL: Joe's individual tendency to report more symptoms than others in the same circumstances. | +1.20
Sum of the grand mean, Joe's effects for A, B, and AB group memberships, and Joe's residual reproduce his symptom score of 14. | 14

Appendix 16E: Computation of Sums of Squares by Hand

Notation and formulas for computation of SS vary across textbooks. Equation 16.33 shows a commonly used formula to calculate SS by hand from data.

(16.33)
$$SS_A = n \times b \times \sum (M_{A_i} - M_Y)^2,$$

where

n is the number of cases in each cell,
b is the number of levels of the B factor,
$M_{A_i}$ is the mean of Group $A_i$, and
$M_Y$ is the grand mean.

In words, Equation 16.33 says:

• Find the difference between the mean of Group A and the grand mean ($M_{A_i}$ and $M_Y$) for each group defined by the A factor.
• Square these differences.
• Sum the squared differences across all A groups (a is the number of groups defined by the A factor).
• Multiply this sum by n × b (that is, by the number of cases in each group defined by the A factor).

A different formula for $SS_A$ is provided in Equation 16.4: $SS_A = \sum (M_{A_i} - M_Y)^2$. In words, Equation 16.4 says:

• For each individual case, subtract the grand mean $M_Y$ from the mean of the A group ($M_{A_i}$) that the person belongs to; this difference is denoted ($M_{A_i} - M_Y$).
• Compute this difference for each person in the data set.
• Square the difference for each person.
• Sum these squared differences across all values of i, j, and k, where i is the level of A, j is the level of B, and k is subject number within each group. In other words, sum these squared deviations across all cases in the entire study.
• The sum is $SS_A$.

Equations 16.33 and 16.4 yield the same values for $SS_A$. (There can be small differences due to rounding error.) If you must do by-hand computation, Equation 16.33 may be preferred because it involves fewer steps than Equation 16.4. However, it is helpful to understand that SS values summarize information about ANOVA model score components such as A_EFF. I believe that Equation 16.4 makes that concept clearer. Figure 16.19 shows that after you obtain score components (A_EFF, B_EFF, etc.) for individual persons in the study, only two more steps are needed to obtain sums of squares. You need to square the effect (such as A_EFF) for each case and then sum these squared effects across all persons in the study.

Figure 16.19 Score Components in Factorial ANOVA (A_EFF), Squared Score Components (A_EFFSQ), and Computation of SS Terms ($SS_A$)

The screenshot shows a spreadsheet with the original data and score components in 10 columns placed sideways on the left (labelled "Original Data"), with the corresponding squared deviations on the right (labelled "Corresponding Squared Deviations"). Notes beside the squared columns indicate that each SS term is the sum of one column, for example "$SS_{A \times B}$ = sum of this column" and "$SS_{within}$ = sum of this column."
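The claim that Equations 16.33 and 16.4 give the same value of $SS_A$ can be checked numerically. The sketch below does this for a small hypothetical balanced 2 × 2 data set (eight invented scores, two per cell); the variable names are illustrative and are not taken from the SPSS files.

```python
from statistics import mean

# Hypothetical balanced 2 x 2 data: (level of A, level of B, score), n = 2 per cell.
data = [
    (1, 1, 2), (1, 1, 4),
    (1, 2, 10), (1, 2, 14),
    (2, 1, 3), (2, 1, 5),
    (2, 2, 6), (2, 2, 8),
]

grand_mean = mean(y for _, _, y in data)
levels_a = sorted({a for a, _, _ in data})
levels_b = sorted({b for _, b, _ in data})
mean_a = {a: mean(y for a2, _, y in data if a2 == a) for a in levels_a}

n_per_cell = len(data) // (len(levels_a) * len(levels_b))
b_levels = len(levels_b)

# Equation 16.33: square each group's deviation from the grand mean,
# sum across the A groups, then multiply by n x b.
ss_a_group_formula = n_per_cell * b_levels * sum(
    (mean_a[a] - grand_mean) ** 2 for a in levels_a
)

# Equation 16.4: square A_EFF for every individual case and sum across all cases.
ss_a_case_formula = sum((mean_a[a] - grand_mean) ** 2 for a, _, _ in data)

print(ss_a_group_formula, ss_a_case_formula)
```

The second computation mirrors Figure 16.19: it squares the A_EFF component attached to each person and adds the squares, so every A group's squared deviation is counted once per case in that group, which is what the n × b multiplier accomplishes in Equation 16.33.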





The SPSS file fullcomputationofss.sav, shown in Figure 16.19, includes the effect components for all 20 individual cases that appear in Figure 16.18. The values in the last four columns (e.g., A_EFFSQ) were obtained by squaring these components. The sums of squares are obtained by adding these squared effects across all 20 persons.

What can we learn by going through all the computations in Figure 16.19? A fundamental concept in ANOVA is that we can break each individual score into components that represent the strength of the effect of each independent variable. The computations in Figure 16.19 demonstrate that SS terms are just sums of squared effects such as A_EFF, B_EFF, and so on. (At least, this is the implicit theory behind the ANOVA model in Appendix 16C. It is possible to think of real-world issues that would make breaking scores up into components associated with individual factors more complicated.) We can also divide or partition $SS_{total}$ into summary information about variability due to effects of each independent variable. This example included effect of A (social support), B (stress), and A × B (their interaction).

Each effect (such as A_EFF) is a deviation of one mean from another mean, or a deviation of an individual score from a group mean. We can summarize information about magnitudes of effects by squaring and summing them. (As in earlier computations of SS terms, we have to square deviations before we sum them, because deviations from means sum to 0.) Sum of squares tells us which factors correspond to large components of individual stress scores and which factors correspond to small components, when we summarize information across all cases in the study.

Usually we hope for at least one independent variable SS term for A, B, and their interaction to be large because this suggests that the corresponding factor is a useful predictor of Y scores or perhaps a cause of Y. We usually hope that $SS_{within}$, also called $SS_{residual}$, will be small. A key concept in statistics is partition of variance.

ANOVA makes it possible to partition, or divide, $SS_{total}$ into SS terms that represent effects of A, B, A × B, and residual or within-group variability. The $\eta^2$ term for each SS (Section 16.8), like $r^2$ in correlation or regression, allows us to say something about the proportion of variance in Y scores that we can predict using the independent variables in our research. A major goal of research is usually to account for a reasonably large proportion of variance in dependent variables.

Comprehension Questions

1. Consider the following actual data from a study by Lyon and Greenberg (1991). The first factor in their factorial ANOVA was family background; female participants were classified into two groups (Group 1: codependent, women with an alcoholic parent; Group 2: non-codependent, women with nonalcoholic parents). Members of these two groups were randomly assigned to one of two conditions; they were asked to donate time to help a man who was described to them as either Mr. Wrong (exploitative, selfish, and dishonest) or Mr. Right (nurturant, helpful). The researchers predicted that women from a non-codependent/nonalcoholic family background would be more helpful to a person described as nurturant and helpful, whereas women from a codependent/alcoholic family background would be more helpful to a person described as needy, exploitative, and selfish. The table of means below represents the amount of time donated in minutes in each of the four cells of this 2 × 2 factorial design. In each cell, the first entry is the mean, and the standard deviation is given in parentheses. The n in each cell was 12.

A, (codependent family background) 133.84 (54.24) 12.502
A, (non-codependent family background) 0000.)

The reported F ratios were as follows:

$F_A$(1, 44) = 9.89, p < .003.
$F_B$(1, 44) = 4.99, p < .03.
$F_{A \times B}$(1, 44) = 43.64, p < .0001.

   1. Calculate an $\eta^2$ effect size for each of these effects (A and B main effects and the A × B interaction). (Recall that $\eta^2 = df_{between} \times F / [df_{between} \times F + df_{within}]$.)
   2. Calculate the row means, column means, and grand mean from these cell means.
   3. Set up a table of cell means, or a bar chart of cell means, or a line plot of cell means.
   4. Write up a "Results" section that presents these findings and provides an interpretation of the results.
   5. What were the values of the 12 individual scores in the A,/B, group? How do you know them? (Scores on the dependent variable cannot be negative numbers in this example.)

2. Do the following analyses using the hypothetical data below. In this imaginary experiment, participants were randomly assigned to receive either no caffeine (1) or 150 mg of caffeine (2) and to a no-exercise condition (1) or half an hour of exercise on a treadmill (2). The dependent variable was heart rate in beats per minute. Data are also in the SPSS file caffeineexercisehr.sav.

   1. Compute the row, column, and grand means by hand. Set up a table that shows the mean and n for each group in this design.
   2. Run a factorial ANOVA using the SPSS GLM procedure. Verify that the values of the SS terms in the SPSS GLM output agree with the SS values you obtained from your spreadsheet. Make sure that you request cell means, a test of homogeneity of variance, and a plot of cell means (as in the example in this chapter).
   3. What null hypothesis is tested by the Levene statistic? What does this test tell you about possible violations of an assumption for ANOVA?
   4. Write up a "Results" section. What conclusions would you reach about the possible effects of caffeine and exercise on heart rate? Is there any indication of an interaction?

3. Consider these tables of cell means. Which one shows a possible interaction, and which one does not show any evidence of an interaction?
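The effect size formula recalled in Question 1 above, $\eta^2 = df_{between} \times F / [df_{between} \times F + df_{within}]$, can be written as a one-line function. The sketch below uses invented values (F = 5.0 with 1 and 20 degrees of freedom) rather than the F ratios from the exercise, and the function name is hypothetical.

```python
def eta_squared_from_f(f_ratio: float, df_between: int, df_within: int) -> float:
    """Effect size recovered from a reported F ratio and its degrees of freedom."""
    explained = df_between * f_ratio
    return explained / (explained + df_within)


# Hypothetical example: F(1, 20) = 5.0 gives 5 / (5 + 20) = .20.
print(round(eta_squared_from_f(5.0, 1, 20), 3))
```

The same call works for any of the reported effects once its F ratio and degrees of freedom are substituted.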

The image is a view of a scatter plot chart after adding a fit line in the SPSS chart editor. At the top are the menu buttons such as file, edit, view, options, elements and help. Below are buttons for editing and other chart functions.

The main chart appears on the screen. The X axis denotes the years and ranges from 0 to 25. The Y axis denotes the salary and ranges from 0 to 100000.

The data points are spread through the chart; however, many are close to the region within 10 on the X axis and between 20000 to 60000 on the Y axis.

A straight line is drawn through the data points on the chart. The equation of the line as shown on the chart is:

Y′ = 31,416.72 + 2,829.57 × X.

On the right of the chart is a dialog box that allows for the properties to be changed.

The fit line tab has been selected, and the following are the options that can be edited:

Checkboxes for Display spikes and suppress intercept. Both of these have been left unchecked. Options for Fit Method shows a series of different types of fit lines. These are mean of Y, Linear, Loess, Quadratic and Cubic. Linear has been checked.

Below this is a space to indicate the percentage of points to fit and Kernel. The confidence intervals choices are None, mean or individual. The first option has been selected.

The percentage has been set to 95 percent.

A ticked check box allows for attaching a label to the line.

At the bottom of the chart are buttons for apply, close and help.

11.15 Using a Regression Equation to Predict Score for Individual (Joe's Heart Rate Data)

In an earlier chapter, this question was raised: What might explain why some people have higher heart rates than average? Bivariate regression provides a way to answer this question (keeping in mind that no one batch of data, and no one analysis, can provide a definitive answer). If mean heart rate for a sample is 81.2 beats per minute, and Joe's heart rate is 88 beats per minute, that tells us Joe's heart rate is (88 − 81.2) = 6.8 beats per minute above average. Can we explain why Joe's heart rate is 6.8 beats per minute higher than average, or predict that his heart rate is this far above average? Bivariate regression provides a way to think about this. Data for this example are in Figure 11.12 and in the file named joeshr.sav. For now, look only at the first two columns. Suppose Joe is a member of a sample, and all 10 members of the sample have scores for anxiety (the X independent variable) and hr (the Y dependent variable). We'll assume

Chi-Square Analysis of Contingency Tables

17.1 Evaluating Association Between Two Categorical Variables

Recall that the choice among bivariate analyses depends upon types of measurement for the X independent and Y dependent variables.

1. When the X predictor variable is categorical and Y is quantitative, t tests or analysis of variance (ANOVA) can be used to evaluate how means for Y differ across groups on the basis of X.
2. When both variables are quantitative, and if X and Y are linearly related, Pearson's r and bivariate regression can be used to evaluate how scores are associated.
3. This chapter discusses situations in which both X and Y are categorical variables. Chi squared ($\chi^2$) is the most widely used statistic for this case. We begin by setting up a contingency table. This can be done using the SPSS crosstabs procedure. A contingency table has one row for each value of X and one column for each value of Y. The cell entries, called observed frequencies, tell us how many people were in each group.

17.2 First Example: Contingency Tables for Titanic Data

Here is an example based on data from the sinking of the Titanic. The X variable is passenger class (1 = first, 2 = second, 3 = third). The Y variable is whether the person did or did not survive the sinking (1 = died, 2 = survived). This table uses data for only female passengers (e.g., sex is held constant). Detailed information about casualties was published after the Titanic sank (Mersey & Gough-Calthorpe, 1912). Probably you have seen at least one of the films that dramatize this disaster, and you have some idea how things turned out for people who were first-class passengers versus those in third class. These data are in the SPSS file Titanic.sav.

A note about table setup: In my examples, if one variable can be viewed as a risk factor or protective factor or predictor or cause, I use that as the row variable in the contingency table.

As in correlation, there are situations where there is no clear reason to call one variable a predictor and the other an outcome. When there is a basis to call one variable a predictor (or to think of it as a potential cause), I use that variable to define rows in the table. That is not an ironclad rule. In the Titanic data, class of passage was established earlier in time than survival, so the table was set up using class of passage as the X row variable and survival status as the column variable. This table has three rows (one row each for first-, second-, and third-class passengers) and two columns (died or survived). Each female passenger could be identified as a member of just one of the six groups (e.g., a passenger in first class who died). The number in each cell, O, is the observed number of persons in one of the six cells of the table (e.g., a woman in first class who died). The total number of passengers in each class is denoted by $n_1$, $n_2$, $n_3$; for example, $n_1$ is the total number of passengers in first class. The total

number of passengers in each column (i.e., the

chart. We can divide each marginal row total (771,

numbers who died vs. survived) are denoted e; (Y

= 1, died) and c₂ (Y = 2, survived). The values of the n's and the c's are called marginal frequencies or marginal totals (because they are in the right and bottom margins of the table).

O denotes the observed number of persons in each cell. Numerical subscripts can be used to identify each cell. In general, Oᵢⱼ is the number of persons in the cell in row i and column j of the table. For example, O₃₂ is the number of persons in the cell in row 3 and column 2. These are persons with scores of X = 3 and Y = 2, that is, third-class passengers who survived.

N is the total number of persons in the table. Note that N can be obtained by summing all values of n, or all values of c, or all values of O.

Contingency tables are described by number of rows and number of columns. Table 17.1 is a 3 × 2 table. The number of cells is given by the number of rows multiplied by the number of columns, so in this table, there are six cells (number of rows × number of columns = 3 × 2).

First we need to know, What was the marginal distribution of scores for the X (row) variable? In other words, how were the passengers distributed on the categorical variable class? Within each row in the table, we can add the two cells to obtain a row total; for example, the total number of women in first class (n₁) is the sum of the number of women in first class who died and the number of women in first class who survived: 4 + 140 = 144 = n₁. These numbers can be expressed as proportions (or percentages) by dividing them by the table total N. In an earlier chapter, you saw that a distribution of scores for a categorical variable (such as class) could be graphed as a bar chart. Divide each row total (n₁, n₂, and n₃) by the table total N to obtain the proportion for each class, as shown in Table 17.2. Cell frequencies are omitted from Table 17.2 to highlight which numbers are the focus. Proportions can be multiplied by 100 to obtain percentages. For example, 36% of the female passengers were in first class.

Similarly, we ask, What was the marginal distribution of scores for the column variable? For the Titanic data, the question is, How many died and how many survived? Within each column in the table, we can add the frequencies in the three cells to obtain a column total. The total number of women who died (c₁) is the sum of the number of women in first class who died plus the number of women in second class who died plus the number of women in third class who died: 4 + 13 + 89 = 106 = c₁. These column totals appear in Table 17.3. To find out what percentage of women died, divide c₁ by N; this is .26 or 26%. To find the percentage of women who survived, divide c₂ by N to obtain .74; 74% of all women survived.
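The marginal totals and proportions described above can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS workflow used in this chapter. The cell counts are those quoted in the text, except the second-class survivor count (80), which is an assumption inferred from the 14% and 86% second-class row percentages. The variable names are hypothetical.

```python
# Observed frequencies for female Titanic passengers.
# Rows: passenger class (X = 1, 2, 3); columns: died (Y = 1), survived (Y = 2).
# The value 80 (second-class survivors) is an inferred assumption; see lead-in.
observed = {
    "first": (4, 140),
    "second": (13, 80),
    "third": (89, 76),
}

# Row totals (n): add the two cells in each row.
row_totals = {cls: died + survived for cls, (died, survived) in observed.items()}

# Column totals (c): add the three cells in each column.
col_totals = (
    sum(died for died, _ in observed.values()),
    sum(survived for _, survived in observed.values()),
)

# N can be obtained from the n's, the c's, or the O's; all three must agree.
N = sum(row_totals.values())
assert N == sum(col_totals) == sum(sum(cells) for cells in observed.values())

# Marginal proportions: divide each marginal total by N.
row_props = {cls: n / N for cls, n in row_totals.items()}
col_props = tuple(c / N for c in col_totals)

for cls, p in row_props.items():
    print(f"{cls:>6} class: n = {row_totals[cls]:3d}, proportion = {p:.2f}")
print(f"died:     c1 = {col_totals[0]}, proportion = {col_props[0]:.2f}")
print(f"survived: c2 = {col_totals[1]}, proportion = {col_props[1]:.2f}")
```

Multiplying any of these proportions by 100 gives the corresponding percentage.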


Table 17.1 Titanic

Passenger class | Died (Y = 1) | Survived (Y = 2) | Row total
First class, X = 1 | O₁₁ = 4 | O₁₂ = 140 | n₁ = number in first class = 144
Second class, X = 2 | O₂₁ = 13 | O₂₂ = 80 | n₂ = number in second class = 93
Third class, X = 3 | O₃₁ = 89 | O₃₂ = 76 | n₃ = number in third class = 165
Column total | c₁ = number who died = 106 | c₂ = number who survived = 296 | N = total number = 402

Table 17.2

First class, X = 1 | Number of first-class women: 144 | Proportion in first class: 144/402 = .36
Second class, X = 2 | Number of second-class women: 93 | Proportion in second class: 93/402 = .23
Third class, X = 3 | Number of third-class women: 165 | Proportion in third class: 165/402 = .41
Total N of women = 402. Sum of percentages: 36% + 23% + 41% = 100%

Table 17.3

c₁ = number who died = 106 | Proportion died = 106/402 = .26
c₂ = number who survived = 296 | Proportion survived = 296/402 = .74
N = total number = 402

On the basis of these marginal distributions, we know that:

• More women were in third class than in first or second class (although the group sizes did not differ greatly).
• If we ignore passenger class, most women (74%) survived.

17.3 What Is Contingency?

If you have seen films about the Titanic sinking, you know that a 74% survival rate did not apply equally to all women. You probably understand that women in third class had a lower chance of survival than women in first class. Comparison of percentages in this table will tell us how much these survival rates differed. If the percentage of persons who survived differs across the three passenger classes, we can say that survival was contingent on passenger class. Contingent means “related to” or “predictable from.” Tables similar to Table 17.1 are called contingency tables because the observed frequencies in the cells can be used to evaluate whether survival status (Y) is contingent upon passenger class (X). We can assess whether there is contingency by examining selected percentages calculated from observed frequencies in this table. Later, we can assess whether contingency can be judged statistically significant. Like bivariate Pearson correlation, contingency is information on whether Y can be predicted from X.

To look for possible contingency between the row and column variables, we examine percentages within each row of the table. (In all examples that follow, the independent variable corresponds to rows in the table.) The row percentages will tell us whether the proportion or percentage of women who survived (those with scores of Y = 2) differs across the three passenger class groups (with scores of X = 1, X = 2, and X = 3). We examine the data for each of the three passenger class groups separately and compute the proportion who died and the proportion who survived separately within each class. Think of each row in Table 17.4 as a separate group and examine the fates of women in each passenger class separately. (The row for Group 1, women in first class, is shaded to highlight that this is one of three separate groups.) What percentage of women in first class survived? What percentage of women in third class survived? To obtain row proportions, the observed frequency in each cell, O, is divided by the corresponding row total n. For example, to find the percentage of all women in first class who survived, we divide the number of survivors in first class (O₁₂ = 140) by the total number of women in first class (n₁ = 144). The proportion of women in first class who survived is 140/144 = .972; if we convert this to a percentage, we can say that 97.2% of women in first class survived.

(17.1)
Row proportion = Observed value/Corresponding row total n.

(17.2)
Row percentage = Row proportion × 100.

Table 17.4 shows the row percentages for each of the three passenger classes. Proportions of death versus survival were calculated separately within each of the three passenger classes. Within each passenger group, percentages of those who died and survived sum to 100% (within rounding error). We can make a comparison at this point. More than 97% of women in first class survived, while only about 46% of women in third class survived.

Table 17.4

First class, X = 1 | 2.8% of women in first class died | 97.2% of women in first class survived | n₁ = 144 | 2.8% + 97.2% = 100%
Second class, X = 2 | 14% of women in second class died | 86% of women in second class survived | n₂ = 93 | 14% + 86% = 100%
Third class, X = 3 | 54% of women in third class died | 46% of women in third class survived | n₃ = 165 | 54% + 46% = 100%

We can interpret row percentages as probabilities. For a woman in first class, probability of survival was about 97%, while for a woman in third class, probability of survival was only 46%. This is a substantial difference in outcomes. Later in this chapter you will learn how to test whether the differences in percentages are large enough to be judged statistically significant. Thinking in terms of effect size, we should also ask whether a difference between percentages is large enough to matter. If the survival rates for first and third class were 97.1% versus 89.9%, this difference could be viewed as small.

We can also obtain column percentages in tables. A column percentage for a cell is obtained by dividing the n of cases in that cell by the total number of cases in that column. A set of column percentages for Table 17.4 would answer the question, Among those who died, what were the percentages of women passengers in first, second, and third classes? For example, among the 106 women who died, only 4 (4/106 = 3.8%) were first-class passengers.

17.4 Conditional and Unconditional Probabilities

The value of .26 or 26% deaths for all women in the Titanic data (the marginal proportion of death in the bottom margin in Table 17.1) is an example of an unconditional probability. That is, if we ignore passenger class, what percentage of all women died? This is a rate or risk for death that is not conditioned on, or limited by, or dependent on, or statistically related to, passenger class membership.

Pr(specific outcome) can be used to denote unconditional probabilities of specific events. For the Titanic data, these are the unconditional probabilities of two outcomes: death and survival.

Pr(death) or Pr(Y = 1) = .26 (or 26%).
Pr(survival) or Pr(Y = 2) = .74 (or 74%).

Because death and survival are the only possible outcomes, these unconditional probabilities must sum to 1.0 (or 100%), within rounding error.

When we look at probabilities within one selected row of the table (for example, the row for women in first class) we find values of .028 (or 2.8%) for death and .972 (97.2%) for survival. These row percentages are called conditional probabilities.
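The row and column percentages described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the second-class survivor count (80) is inferred from the 86% second-class survival rate rather than quoted directly in the text.

```python
# Observed frequencies: rows are first, second, third class;
# columns are died (Y = 1) and survived (Y = 2).
observed = [
    [4, 140],   # first class
    [13, 80],   # second class (80 is an inferred assumption)
    [89, 76],   # third class
]

def row_percentages(table):
    """Divide each cell by its row total and multiply by 100."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

def column_percentages(table):
    """Divide each cell by its column total and multiply by 100."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in table
    ]

row_pct = row_percentages(observed)
col_pct = column_percentages(observed)

for label, (died, survived) in zip(("first", "second", "third"), row_pct):
    print(f"{label:>6} class: {died:.1f}% died, {survived:.1f}% survived")
print(f"Among women who died, {col_pct[0][0]:.1f}% were in first class")
```

Row percentages answer "given the class, what was the outcome?", while column percentages answer "given the outcome, what was the class?"; the two sets of numbers are generally different.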

Given the condition that a woman was in first class (e.g., if we decide to look only at data for first-class passengers), her probability of death was only 2.8%, and her probability of survival was 97.2%. If we specify other conditions (for instance, examine only women in third class), conditional probabilities are different.

Conditional probability is denoted using a vertical line. Before the vertical line, we identify the outcome of interest (for example, woman is dead; in other words, she has a score Y = 1). After the vertical line, we identify the condition that we assume, or the group that we select (for example, woman is in third class, X = 3). The conditional probability of death given that a woman is in third class can be written:

Pr(death | third class) or Pr(Y = 1 | X = 3).

To obtain this conditional probability, divide the number of women who were in third class and died (O₃₁) by the total number of women in third class (n₃): (89/165) = .539 or 54%.

Here is a general formula for conditional probability:

(17.3)
Pr(Y = b | X = a) = (Number of people who have scores of Y = b and X = a)/(Number of people who have scores of X = a),

where a corresponds to any possible score value for X (such as 1, 2, or 3), and b corresponds to any possible score value for Y (1 or 2). The values used to compute conditional probability for each group in the table appear in Table 17.5. (The numbers in Table 17.5 are the same as in Table 17.4, but they are now labeled formally as conditional probabilities.)

Table 17.5 Titanic

First class, X = 1 | Pr(died | first class) = Pr(Y = 1 | X = 1) = O₁₁/n₁ = .028 | Pr(survived | first class) = Pr(Y = 2 | X = 1) = O₁₂/n₁ = .972
Second class, X = 2 | Pr(died | second class) = Pr(Y = 1 | X = 2) = O₂₁/n₂ = .14 | Pr(survived | second class) = Pr(Y = 2 | X = 2) = O₂₂/n₂ = .86
Third class, X = 3 | Pr(died | third class) = Pr(Y = 1 | X = 3) = O₃₁/n₃ = .54 | Pr(survived | third class) = Pr(Y = 2 | X = 3) = O₃₂/n₃ = .46
Unconditional probability of death, Pr(death): Pr(Y = 1) = .26
Unconditional probability of survival, Pr(survived): Pr(Y = 2) = .74

By now you have probably noticed that the conditional probability of death for women in first class (2.8%) is much lower than the conditional probability of death for women in third class (54%). The conditional probability of death for passengers in second class falls in between. Passenger class was related to chance of survival. A higher percentage of women in third class than in first class died, or equivalently, a lower percentage of women in third class survived.

17.5 Null Hypothesis for Contingency Table Analysis

To evaluate possible contingency (or predictive relationship) between the X and Y variables, we compare conditional probabilities across groups. If the probability of death were the same for all women, regardless of whether they had first-, second-, or third-class tickets, then the conditional probabilities of death for first, second, and third class would be equal. All conditional probabilities would also be equal to the unconditional probability. Here is the formal null hypothesis for the Titanic data. If survival were not contingent on or related to class of passage, then the expected result would be:

H₀: Pr(dead | first class) = Pr(dead | second class) = Pr(dead | third class) = Pr(dead) = .26 or 26%.
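Equation 17.3 can also be applied directly to case-level data, one (X, Y) record per person, which is how the scores would appear in a data file. The sketch below is illustrative only. It rebuilds the records from the cell frequencies, assumes 80 second-class survivors (inferred from the 86% survival rate), and uses hypothetical function names. It then compares each conditional probability of death with the unconditional probability named in the null hypothesis.

```python
# One (X, Y) record per passenger, rebuilt from the cell frequencies.
# X: 1, 2, 3 = passenger class; Y: 1 = died, 2 = survived.
cell_counts = {
    (1, 1): 4, (1, 2): 140,
    (2, 1): 13, (2, 2): 80,   # 80 is an inferred assumption
    (3, 1): 89, (3, 2): 76,
}
cases = [xy for xy, count in cell_counts.items() for _ in range(count)]

def conditional_probability(cases, b, a):
    """Pr(Y = b | X = a): cases with Y = b and X = a, over cases with X = a."""
    n_a = sum(1 for x, _ in cases if x == a)
    n_ab = sum(1 for x, y in cases if x == a and y == b)
    return n_ab / n_a

def unconditional_probability(cases, b):
    """Pr(Y = b): cases with Y = b, over all cases."""
    return sum(1 for _, y in cases if y == b) / len(cases)

pr_death = unconditional_probability(cases, b=1)
print(f"Pr(Y = 1) = {pr_death:.2f}")
for a in (1, 2, 3):
    pr = conditional_probability(cases, b=1, a=a)
    print(f"Pr(Y = 1 | X = {a}) = {pr:.3f}  (minus Pr(Y = 1): {pr - pr_death:+.3f})")
```

If the null hypothesis were exactly true in the sample, every difference printed in the last column would be 0.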

Because there are only two possible outcome values for Y in this example (died vs. survived), we could state an equivalent null hypothesis as follows:

H₀: Pr(survived | first class) = Pr(survived | second class) = Pr(survived | third class) = Pr(survived) = .74 or 74%.

Here is a more general statement of the null hypothesis for tables of all sizes. For all possible values of b and a:

(17.4)
H₀: Pr(Y = b | X = a) = Pr(Y = b).

In words, this equation says that the conditional probability Pr(Y = b | X = a) equals the unconditional probability Pr(Y = b).

The alternative hypothesis is that there is at least one difference between conditional probabilities and unconditional probability somewhere in the table.

17.6 Second Empirical Example: Dog Ownership Data

A study by Friedmann, Katcher, Lynch, and Thomas (1980) reported data about 92 men who had a first heart attack. The researchers were interested in variables that would predict survival 1 year after the heart attack. Each man was asked numerous questions about his lifestyle, including whether he owned a dog (X, coded 0 = no, 1 = yes). At the end of a 1-year follow-up period, the researchers recorded whether each man had survived; this was the outcome or dependent variable (Y, coded 0 = dead, 1 = survived). They examined whether dog ownership was related to (or predictive of) survival. This was an early example of research about possible health benefits of pet ownership. Data from the actual study are in the file dog.sav. The values of O in each cell in Table 17.6 are the observed number (or frequency) of cases in each of the four possible groups: non-dog owners who died, non-dog owners who survived, dog owners who died, and dog owners who survived. The values of n₀ and n₁ are the marginal row frequencies (total number of nonowners and owners). The values of c₀ and c₁ are the marginal column frequencies (total number who died, total number who survived).

The survival status codes are different in the dog data than in the Titanic data. It is acceptable to use any numerical values to describe levels of categorical data; survival status was coded 1, 2 for the Titanic data and 0, 1 for the dog owner data. For categorical variables, numbers are only labels for group membership. The choice of numerical labels for group membership makes no difference in the results. Small integer values are most convenient. The dog ownership data set is used to demonstrate complete analysis of a contingency table. Here are the steps included:

1. Examine expected frequencies to evaluate whether data are appropriate for χ² analysis.
2. Obtain the table of observed frequencies from the SPSS crosstabs procedure and examine marginal distribution for X, marginal distribution for Y, and row percentages. The marginal distribution for Y provides information about unconditional probability of each Y outcome. Row percentages provide information about conditional probabilities.


3. Obtain χ² or another significance test to evaluate the null hypothesis that whether a man owns a dog is unrelated to whether he survives. On the basis of the value of χ² (along with df and α level), we can judge the outcome of the study statistically significant or not statistically significant.
4. Obtain effect size information: A φ (phi) coefficient can be obtained for a 2 × 2 table, and Cramer's V can be used for tables with more rows or columns. Value of φ can be interpreted like Pearson's r.
5. Evaluate the nature of the association by comparing row percentages (conditional probabilities) across groups. In this example the question is whether dog owners have a different probability of survival than men who don’t own dogs.

Confidence intervals for proportions and percentages are rarely included in research reports, although these can be obtained. In contrast, opinion poll results (for example, the proportion of voters who favor passing a law) are often reported with a margin of error similar to a confidence interval. See Appendix 17A for discussion.

Table 17.6

Owns dog | Dead (Y = 0) | Survived (Y = 1) | Row total
No (X = 0) | O = 11 | O = 28 | n₀ = number who do not own dogs = 39
Yes (X = 1) | O = 3 | O = 50 | n₁ = number who own dogs = 53
Column total | c₀ = number dead = 14 | c₁ = number survived = 78 | N = 92

Source: Friedmann et al. (1980).

17.7 Preliminary Examination of Dog Ownership Data

Think first about the nature of association you might expect between dog ownership and survival. There are three possibilities. First, dog owners might be more likely to die (perhaps dog walking causes overexertion or increases risk for falling). Second, dog owners might be less likely to die (perhaps the companionship of a dog and/or the beneficial mild exercise provide health benefits). Third, there might be no association of dog ownership with survival. Which outcome do you expect to see? The null hypothesis in this example is:

H₀: Pr(dead | own dog) = Pr(dead | don’t own dog) = Pr(dead).

You should be able to state that null hypothesis in words. Which terms in this null hypothesis are conditional probabilities, and which is an unconditional probability? The outcome of the study appears in Table 17.7; row proportions are included. On the basis of the numbers in this table, were the results of this study consistent with what you expected?

About 28% of men who didn’t own dogs died, while only about 6% of the dog owners were dead by the end of the year. It appears that dog owners had better survival outcomes. This study was not an experiment, so even if it suggests an association between these variables, we can’t say that dog ownership causes better survival outcomes. Observed cell frequencies (like other sample data) will vary because of sampling error. We will need a statistical significance test to evaluate whether, taking sampling error into account, the difference between death rates of 6% and 28% is substantial enough to be judged statistically significant.

Table 17.7
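The row proportions for the dog ownership data can be reproduced with a short sketch like the one below. It is an illustration, not the SPSS procedure used later in the chapter. The cell frequencies are the ones that appear in the χ² computation for these data, and the dictionary keys and variable names are hypothetical.

```python
# Observed frequencies from Friedmann et al. (1980), as used in this chapter.
# Rows: dog ownership (X = 0 no, X = 1 yes); columns: dead (Y = 0), survived (Y = 1).
observed = {"no dog": (11, 28), "owns dog": (3, 50)}

N = sum(sum(cells) for cells in observed.values())
total_dead = sum(dead for dead, _ in observed.values())

# Unconditional probability of death, ignoring dog ownership.
pr_dead = total_dead / N

# Conditional probability of death within each group (the row proportion).
pr_dead_given_group = {
    group: dead / (dead + survived)
    for group, (dead, survived) in observed.items()
}

print(f"Pr(dead) = {pr_dead:.2f}")
for group, p in pr_dead_given_group.items():
    print(f"Pr(dead | {group}) = {p:.2f}")
```

The two conditional probabilities (about .28 and .06) fall on either side of the unconditional probability (about .15), which is the pattern the significance test will evaluate.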

Now let's look at the other parts of Joe's hr score. We know that part of his score was not predicted by the regression equation. How much of his score was predicted? The predicted part of his score corresponds to (Y′ − M_Y). If we knew nothing about other variables that might predict hr, the best predicted hr for all persons in the sample would be M_Y, mean hr. The Y′ predicted value is an adjustment to the prediction; how much higher or lower than M_Y would we expect each person’s score to be when we generate a predicted heart rate using anxiety? We now have three pieces of information:

Joe’s actual hr, Y = 88
Joe’s predicted hr, Y′ = 84.18 (rounded to 84.2)
The mean hr for the sample, M_Y = 81.2

We can use these numbers to divide Joe's total deviation from the sample mean (Y − M_Y) into two parts:

Total deviation of Joe’s heart rate from mean = (Y − M_Y) = (88 − 81.2) = 6.8. Joe's hr is 6.8 beats per minute above the average in the sample.

Difference between Joe’s actual and predicted hr = (Y − Y′) = (88 − 84.2) = 3.8. Joe's actual hr was about 4 beats per minute higher than the value predicted by the regression equation from his anxiety. This was the part of Joe’s hr that the regression did not predict or explain. This is Joe's residual or prediction error.

(Y′ − M_Y) is the difference between Joe’s predicted heart rate (from the regression) and the “default” prediction we would make in the absence of other information. If no other information were available, we would predict his hr to be M_Y, the sample mean. This deviation is (Y′ − M_Y) = (84.2 − 81.2) = 3. In words, the regression equation predicted, on the basis of his anxiety, that Joe's heart rate would be 3 beats per minute faster than the sample mean.

You can check to see that Joe's total deviation (column 1) equals the sum of the residual (column 2) and the predicted part (column 3) of the score:

Column 1 | (Y − M_Y) = (88 − 81.2) = 6.8 | Total deviation of Joe's hr from the sample mean
Column 2 | (Y − Y′) = (88 − 84.2) = 3.8 | The part of Joe's hr that anxiety could not predict; error or residual for Joe
Column 3 | (Y′ − M_Y) = (84.2 − 81.2) = 3 | The part of Joe's hr that anxiety could predict

We can write an equation to summarize the components or pieces of Joe's hr score:

Joe’s hr = M_Y for entire sample + (Y − Y′) for Joe + (Y′ − M_Y) for Joe.
Joe’s hr = 81.2 + 3.8 + 3.

In words, Joe's heart rate can be constructed from the sample mean (81.2), plus the part of Joe's score that could not be predicted from his anxiety (3.8), plus the part of Joe's score that could be predicted from anxiety (3).

We can do this for every individual in the sample (see fully worked example in Appendix 11D). Note that the values of (Y − Y′) and (Y′ − M_Y) will differ for other persons, and in many cases, one or both of these score components will be negative numbers. You may find it helpful to calculate predicted scores and deviations for a few other cases and compare your results with those in Appendix 11D.

We can locate Joe's score in a scatterplot, as shown in Figure 11.16. Joe's actual and predicted scores


column, the sums must equal the original marginal frequencies in the table. You should verify that the E values sum to the row and column totals in the original table. This check appears in Table 17.9. You should also verify that if you calculate row percentages on the basis of the values of E, the row percentages are the same for each row. This check appears in Table 17.10. The percentage of dead persons on the basis of values of E is the same (.15 or 15%) for the non-dog owners as for the dog owners. In other words, values of E represent the imaginary situation in which conditional probabilities are equal across groups as specified in the null hypothesis.

Table 17.8

Owns dog | Dead (Y = 0) | Survived (Y = 1) | Row total
No (X = 0) | E = 39 × 14/92 = 5.9 | E = 39 × 78/92 = 33.1 | 39
Yes (X = 1) | E = 53 × 14/92 = 8.1 | E = 53 × 78/92 = 44.9 | 53
Column total | 14 | 78 | N = 92

Table 17.9

Owns dog | Dead (Y = 0) | Survived (Y = 1) | Row total
No (X = 0) | E = 5.9 | E = 33.1 | 5.9 + 33.1 = 39
Yes (X = 1) | E = 8.1 | E = 44.9 | 8.1 + 44.9 = 53
Column total | 5.9 + 8.1 = 14 | 33.1 + 44.9 = 78 | N = 92

Table 17.10

Owns dog | Dead (Y = 0) | Survived (Y = 1)
No (X = 0) | E/n₀ = 5.9/39 = .15 or 15% | E/n₀ = 33.1/39 = .85 or 85%
Yes (X = 1) | E/n₁ = 8.1/53 = .15 or 15% | E/n₁ = 44.9/53 = .85 or 85%
Total | 14/92 = .15 or 15% | 78/92 = .85 or 85%

17.9 Computation of Chi Squared Significance Test

We want to know how much the observed frequency (O) in each cell differs from the expected frequency (E). The values of Oᵢⱼ for all four cells appear in Table 17.6; the values of Eᵢⱼ for all four cells appear in Table 17.9. Table 17.11 shows the O − E difference within each cell. If H₀ is true, and dog ownership is unrelated to survival, these (O − E) deviations should be close to 0. In simple terms, if H₀ is true, we would expect observed outcomes (values of O) to be close to the hypothesized outcomes (values of E). As with other sample statistics, these deviations will vary because of sampling error.

It is not a coincidence that we obtain the same value (5.1) for all four cells in Table 17.11, except that some have plus and some have minus signs. The sum of (O − E) must be 0 across each row and down each column. Therefore, if we know any one (O − E) deviation in this 2 × 2 table, we can fill in the values of (O − E) for the other three cells. Table 17.12 is a brief exercise to show that, given the constraint that (O − E) must sum to zero for each row and column, knowing just one value of (O − E) is sufficient to fill in the other cell values.

Table 17.11

(O − E) | Dead (Y = 0) | Survived (Y = 1) | Row total
Does not have dog (X = 0) | 11 − 5.9 = +5.1 | 28 − 33.1 = −5.1 | 0
Does have dog (X = 1) | 3 − 8.1 = −5.1 | 50 − 44.9 = +5.1 | 0
Column total | 0 | 0 | 0

Table 17.12

[Exercise table with rows "Does not have dog (X = 0)" and "Does have dog (X = 1)" and a column total row; one (O − E) value is shown and the remaining cells are question marks.]

Fill in the values for each question mark, given that values must sum to 0 in each row and column.

In general, for a table with r rows and c columns:

(17.6)
df = (r − 1) × (c − 1).
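The expected frequencies, the (O − E) deviations, and the df can be checked with a few lines of code. This is a hedged sketch, not the method used by SPSS internally. It assumes the usual expected-frequency formula, E = (row total × column total)/N, and the variable names are hypothetical.

```python
# Observed frequencies for the dog ownership data.
observed = [
    [11, 28],  # does not own a dog: dead, survived
    [3, 50],   # owns a dog: dead, survived
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# Expected frequency under H0 (assumed formula): row total x column total / N.
expected = [[r * c / N for c in col_totals] for r in row_totals]

# Deviations of observed from expected frequencies.
deviations = [
    [o - e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

# Degrees of freedom for a table with r rows and c columns.
df = (len(observed) - 1) * (len(observed[0]) - 1)

for e_row, d_row in zip(expected, deviations):
    print([round(e, 1) for e in e_row], [round(d, 1) for d in d_row])
print("df =", df)
```

Because the deviations sum to 0 in every row and column, the four values differ only in sign, which is why only one of them is free to vary in a 2 × 2 table.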

Because the dog ownership data have only two rows and two columns, the df for χ² for this table is (2 − 1) × (2 − 1) = 1. Only one of the four (O − E) differences is free to vary. This is analogous to a situation you have seen before. When you computed a sum of squares (SS term) in order to obtain a sample variance, you learned that only the first N − 1 deviations of Y scores from the sample mean are free to vary (N is the number of scores in the sample). Once you know any (N − 1) deviations, the last remaining deviation has a fixed value; it does not provide additional independent information about variance. We call N − 1 the df for a sample variance. For a contingency table, the df is based not on the number of cases but on the number of rows and columns in the table.

In addition, note that we cannot summarize information about (O − E) across all four cells in the table simply by summing these deviations, because the sum would be 0. You saw the same problem before when computing an SS term to find a sample variance. As noted earlier, the bag of tricks in statistics is small. For both χ² and SS, we solve the problem that deviations sum to 0 in the same way: We square the deviations before we sum them. Computation of χ² requires one new thing: we need to scale each squared deviation to take numbers of cases into account. This is done by dividing the (O − E)² term for each cell by the value of E for that cell. (E indirectly provides information about number of cases in the sample.)

Combining all these operations gives us the formula to compute the χ² test of association for a contingency table. The values of (O − E) are found in Table 17.11. Note that the number of terms in the sum equals the number of cells in the table.

(17.7)
χ² = Σ[(O − E)²/E].

The df for χ² is:

(17.8)
df = (r − 1) × (c − 1),

where r is the number of rows and c is the number of columns in the table.

For the dog ownership data, we obtain the following value of χ²:

χ² = (50 − 44.9)²/44.9 + (28 − 33.1)²/33.1 + (3 − 8.1)²/8.1 + (11 − 5.9)²/5.9
= (5.1)²/44.9 + (−5.1)²/33.1 + (−5.1)²/8.1 + (5.1)²/5.9
= 8.85.

(The value 8.85 is based on unrounded expected frequencies; using the rounded values shown here gives approximately 8.98.)

Because r (number of rows) = 2 and c (number of columns) = 2,

df = (2 − 1) × (2 − 1) = 1.

17.10 Evaluation of Statistical Significance of χ²

As usual, there are two ways to evaluate statistical significance. Because SPSS output has a “Sig.” or p value, you can compare the obtained p with a preselected α level, often α = .05. If p < .05, chi


squared is large enough to be judged statistically significant. You do not have to state that the test is one-tailed; readers will assume that it is. Alternatively, you could look up critical values for χ², based on your sample df, in the table in Appendix D at the end of this book and evaluate whether your obtained χ² value exceeds the tabled critical value.

It is useful to visualize distribution shapes when you think about reject regions (e.g., values of χ² for which you reject H₀). Up until now the distributions you have considered most often, the normal and t distributions, have been approximately bell shaped. In contrast, χ² is positively skewed. It has a minimum possible value of 0 (and therefore a fixed lower limit). There is no fixed upper limit to possible values. Similar to t distributions, the exact shape of χ² distributions differs depending on df. Figure 17.1 shows the χ² distribution with 1 df. For 1 df the critical value of χ² = 3.84 identifies the boundary of the upper tail that corresponds to 5% of the area. We can reject H₀ using α = .05 if the obtained χ² exceeds 3.84.

Figure 17.1 Chi Squared Distribution With df = 1

The graph shows 0 marked at the origin between the horizontal and vertical axes, and a downward sloping concave curve drawn between the two axes. A line is drawn upward from the horizontal axis to the graph line at a point marked 3.84 toward the right end of the axis. The area under the graph line to the right of this line is shaded, with a label that reads: upper 5% tail.

For the dog owner and survival status data, with an obtained χ² of 8.85 with df = 1, we can reject the null hypothesis and say that there is a statistically significant difference in survival outcomes between nonowners and owners of dogs. Dog owners had a significantly lower proportion of deaths (.06) than nonowners (.28).

17.11 Effect Sizes for Chi Squared

For 2 × 2 tables like the dog owner data example, the most common effect size is the phi coefficient (φ). Phi can be computed from the cell frequencies in a 2 × 2 table. For tables with more than two rows or two columns, a similar statistic, Cramer’s V, is a widely used effect size. There are two ways to compute φ; it can be obtained directly from the observed values in the cells of the table, or it can be calculated from χ². Table 17.13 shows how frequencies of cases in the four cells of a 2 × 2 table are labeled to compute φ. As in discussion of Pearson correlation, cases are called concordant if they have high scores on both X and Y or low scores on both X and Y. Cases are called discordant if they have low scores on one variable and high scores on the other variable. Textbooks and SPSS sometimes differ in the order in which rows and columns are presented. To compute φ, make sure that the values you use for b and c in the formula for φ correspond to concordant values.

Assuming observed cell frequencies a through d are as shown in Table 17.13 (i.e., a and d

(17.10)
χ² = N × φ².

When tables have more than two rows or columns, φ cannot be used; a different effect size is needed. The most widely reported effect size for the chi-square test of association is Cramer's V. Cramer's V can be calculated for contingency tables with any number of rows and columns. Values of Cramer's V range from 0 to 1. Values close to 0 indicate no association; values close to 1 indicate a strong association. Cramer’s V does not provide information about direction of association (i.e., whether higher scores on X go with higher scores for Y).

(17.11)
Cramer's V = √[χ²/(N × m)],

where χ² is computed from Equation 17.7, N is the total number of scores in the sample, and m is the minimum of [(number of rows − 1), (number of columns − 1)]. Like Pearson's r, Cramer's V is a symmetrical index of association; that is, it does not matter whether the row or column variable is the independent variable. Unlike Pearson's r, Cramer’s V does not require a linear association between scores on the X and Y variables, nor does it have a sign. For a 2 × 2 table, Cramer's V equals the absolute value of φ.

17.12 Chi Squared Example Using SPSS

For the dog ownership data from the study by Friedmann et al. (1980), the lowest expected value was E = 5.9, and χ² was a reasonable analysis. The data in the SPSS file dog.sav show you how data appear in an SPSS file when you have the categorical scores for each of the variables for each individual. To enter the Friedmann et al. dog owner and survival data into SPSS, one column was used to represent each person’s score on the variable named dog (coded 0 = did not own dog, 1 = owned dog), and a second column was used to enter each person's score for the variable survived (0 = did not survive for 1 year after heart attack, 1 = survived for at least 1 year). As in earlier examples, each row represents the scores for one person. The complete data set for this SPSS example is in the SPSS data file dog.sav. The number of rows with scores of 1 on both variables in this data set corresponds to the number of survivors who owned dogs.

The SPSS menu selections to run the crosstabs procedure are as follows. From the top-level menu, make these menu selections, as shown in Figure 17.2: Analyze → Descriptive Statistics → Crosstabs.

This opens the SPSS dialog box for the crosstabs procedure, shown in Figure 17.3. The names of the row and column variables were placed in the appropriate windows. All examples in this chapter use the independent variable as the row variable. In this example, the row variable corresponds to the score on the predictor variable (dog), and the column variable corresponds to the score on the outcome variable (survived). The Statistics button was clicked to access the menu of optional statistics to describe the pattern of association in this table, as shown in Figure 17.4. The optional statistics selected included χ², φ, and Cramer's V. (Other available statistics are for different specific situations; e.g., the McNemar test, described in Appendix 17B, is for paired samples; the tau statistics are used when categorical variables provide ordinal information.) In addition, the Cells button in the main Crosstabs dialog box was clicked to open the Crosstabs: Cell Display menu, which appears in Figure 17.5. In addition to the observed frequency for each cell, both the expected frequency for each cell and row percentages were requested.
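For readers who want to check the SPSS results by hand, the same statistics can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a description of how SPSS computes its output. It uses the Pearson χ² with no continuity correction, and the p value shortcut shown is valid only for df = 1. Phi is obtained from χ², so only its magnitude is recovered, not its sign. The variable names are hypothetical.

```python
import math

# Rows: does not own dog, owns dog; columns: dead, survived.
observed = [[11, 28], [3, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)
expected = [[r * c / N for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# For df = 1 only: chi-square is a squared standard normal score,
# so the upper-tail p value is erfc(sqrt(chi_square / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

# Effect sizes: magnitude of phi from chi-square, and Cramer's V.
phi = math.sqrt(chi_square / N)
m = min(len(observed) - 1, len(observed[0]) - 1)
cramers_v = math.sqrt(chi_square / (N * m))

print(f"chi-square = {chi_square:.2f}, p = {p_value:.3f}")
print(f"|phi| = {phi:.2f}, Cramer's V = {cramers_v:.2f}")
```

For a 2 × 2 table m = 1, so Cramer's V and the absolute value of φ coincide, as stated above.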

Figure 17.2 Menu Selections for SPSS Crosstabs Procedure

The details are as follows. A drop-down menu from the task bar shows several options, of which “descriptive statistics” is selected, leading to another drop-down menu. The second drop-down menu has several options, of which “crosstabs,” the fourth from the top, is selected. Arrows are drawn pointing toward “descriptive statistics” and “crosstabs.”

Figure 17.3 SPSS Crosstabs Dialog Box


[7 Dispiay clusterea bar charts

u

LE

0) Suppress tables

(a) Cm) Lei) (care Cena) Thedetails are as follows.

Pane on theleft: blank

Central panes, from the top downward SRow(s): entry shown: “dog” Column(s): entry shown: “survived” Layer 1 of 1: blank Buttons to the right: Statistics; Cells; Format

Figure 17.4 SPSS Crosstabs: Statistics Dialog Box

The details of all the check boxes, of which only two are checked, are as follows.

* Chi-square (checked)
* Correlations
* Nominal:
  o Contingency coefficient
  o Phi and Cramer's V (checked)
  o Lambda
  o Uncertainty coefficient
* Ordinal:
  o Gamma
  o Somers' d
  o Kendall's tau-b
  o Kendall's tau-c
* Nominal by interval:
  o Eta
* Kappa
* Risk
* McNemar
* Cochran's and Mantel-Haenszel statistics

Buttons at the bottom: Continue; Cancel; Help

Figure 17.5 SPSS Crosstabs: Cell Display Dialog Box

The details are as follows.

* Counts: Observed (checked); Expected (checked); Hide small counts
* z-test: Compare column proportions
* Percentages: Row (checked); Column; Total
* Residuals: Unstandardized; Standardized; Adjusted standardized
* Noninteger weights: Round cell counts (selected); Round case weights; Truncate cell counts; Truncate case weights; No adjustments

Buttons at the bottom: Continue; Cancel; Help

17.13 Output From Crosstabs Procedure

The output from the crosstabs procedure for these data appears in Figure 17.6. Notice that SPSS uses different terminology than most textbooks, summarized in Table 17.15.

The top panel in Figure 17.6 shows the contingency table with observed and expected cell frequencies and row percentages. The second panel reports the obtained value of χ² (8.85) and additional tests. The third panel reports symmetrical measures of association (effect sizes), including the values of φ (.310) and Cramer's V (also .310). Like Pearson's r, the values of these effect sizes do not differ depending on which variable is treated as independent.

If you have only observed cell frequencies, and do not have the rows of data for individual participants, you can use one of many online calculators to obtain χ². Figure 17.7 shows an example.

Figure 17.6 SPSS Crosstabs Output From Dog and Survival Status Data

Table 1 (dog by survived; each cell reports Count, Expected count, and % within dog):
* Does not own a dog: did not survive, count 11, expected count 5.9, 28.2%; survived, count 28, expected count 33.1, 71.8%
* Owns a dog: did not survive, count 3, 5.7%; survived, count 50, 94.3%
* Total: did not survive, count 14, 15.2%; survived, count 78

Table 2 (chi-square tests):
* Pearson chi-square: value 8.851, df 1, significance (two-sided) 0.003
* N of valid cases: 92
* Note a: 0 cells (0.0%) have expected count less than 5.
* Note b: Computed only for 2 by 2 table.

Symmetric measures (nominal by nominal):
* Phi: value 0.310, approximate significance 0.003
* Cramer's V: value 0.310, approximate significance 0.003
* N of valid cases: 92

Table 17.15

SPSS output term                                                        Textbook term
Count                                                                   Observed frequency
Expected count                                                          E or expected frequency
Pearson chi-square                                                      Chi square or chi squared or χ²
% within dog (or within other row or value of independent variable)     Row percentage
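The values reported in the SPSS output (χ² = 8.85, φ = .310, smallest expected value 5.93) can be reproduced from the four observed cell frequencies alone. The following is a minimal sketch in plain Python, not part of the SPSS procedure described in the text; the function name `chi_square_from_counts` and the variable names are illustrative assumptions. The p value shortcut using `math.erfc` applies only when df = 1, as in a 2 by 2 table.

```python
import math


def chi_square_from_counts(observed):
    """Return (expected, chi2, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for each cell = row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Pearson chi-square = sum over cells of (O - E)^2 / E
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


# Rows: does not own a dog, owns a dog.
# Columns: did not survive, survived.
observed = [[11, 28], [3, 50]]

expected, chi2, df = chi_square_from_counts(observed)
n = sum(sum(row) for row in observed)
phi = math.sqrt(chi2 / n)                      # effect size for a 2 x 2 table
p_value = math.erfc(math.sqrt(chi2 / 2))       # upper-tail p, valid for df = 1 only
min_expected = min(min(row) for row in expected)

print(f"chi-square({df}, N = {n}) = {chi2:.2f}, p = {p_value:.3f}")
print(f"phi = {phi:.3f}, smallest expected count = {min_expected:.2f}")
```

The same function works for larger tables, but φ and the `erfc` shortcut for p should then be replaced by Cramer's V and a general chi-square distribution function.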

Figure 17.7 Online Calculator for χ² Using Cell Frequencies as Input

Source: https://www.icalcu.com/stat/chissqtest.htm

The details are as follows.

Chi-Square calculator

You can type any rows/columns of numbers separated by space or comma. You can also copy and paste any rows/columns of numbers from a table (excel, word, or other software). An example is listed below:

5786
6875

Within box:
11 space 28
3 space 50

Buttons below: Calculate; Reset

Chi-square value: 8.851085698691
Degrees of freedom: 1
P value: 0.002929146051
Rows X columns: 2 by 2

17.14 Reporting Results

The following kinds of information should be included in “Results” sections. Some of this can be in earlier parts of a report.

* Were assumptions for use of the analysis satisfied? (For χ², we do not want values of E in the table to be less than 5, as discussed in the next section.)
* What kinds of persons were included in the sample?
* What statistical significance test was done, with what df, and what p value was obtained? For contingency tables, the most commonly reported test is χ².
* What was the effect size? For contingency tables this is usually φ or Cramer’s V. Values of φ are sufficiently similar to Pearson’s r that similar values can be called small, medium, and large effects.
* What was the nature of the relationship? This is described by comparing row percentages or conditional probabilities. (In this example, the survival rate for dog owners was higher than for nonowners.)

Results

A survey was done to assess variables that might predict survival for 1 year after a first heart attack. The study included 92 men. The smallest expected value was 5.93; therefore, χ² was judged to be an appropriate analysis. Only one predictor of survival is reported here: dog ownership. Table 17.6 shows the observed cell frequencies for dog ownership and survival status. Of the 53 dog owners, 3 did not survive; of the 39 nonowners of dogs, 11 did not survive. This was a statistically significant association: χ²(1, N = 92) = 8.85, p

other terms used in SPSS output, such as asymptotic, unless you report results to a mathematical statistician or obtain χ² values from more complex methods such as structural equation modeling. (Minimum values for E are discussed in the next section. The values in parentheses following χ² are df and then table total N.)

17.15 Assumptions and Data Screening for Contingency Tables

17.15.1 Independence of Observations

Both the Titanic data and the dog ownership data are completely between-S. That is, each person can be a member of only one group on each of the categorical variables (i.e., each person can be alive or dead, but not both). The design required for the χ² test of contingency must not have repeated measures or paired samples. It is possible to have cross-tabulated data in repeated measures studies. For paired samples or repeated measures, the McNemar (1947) test can be used, provided variables are dichotomous (only two categories). See Appendix 17B for details.

17.15.2 Minimum Requirements for Expected Values in Cells

Most sources say that χ² can be used to analyze

5 and that no cell should have an expected value less than 1. These criteria are related to the need for a reasonably large number of cases in each row and each column. Low values of E tell us indirectly that there are small marginal totals for one or more rows or columns. In these situations, if you change the cell group membership for just one case, the results for χ² can change substantially. It is undesirable to have a research outcome for which results would change dramatically if the group memberships of just one or a few participants were different. Here is a hypothetical example.

17.15.3 Hypothetical Example: Data With One or More Values of
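The sensitivity described in Section 17.15.2 can be illustrated with a minimal sketch. The cell counts below are made up for illustration and do not come from the text; the function name `pearson_chi_square` is also an assumption. The sketch moves a single case from one cell to another in a small table whose smallest expected value is below 5 and compares the resulting χ² values with 3.841, the critical value for df = 1 at α = .05.

```python
def pearson_chi_square(observed):
    """Return (chi2, smallest expected count) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    smallest_expected = float("inf")
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for this cell
            chi2 += (o - e) ** 2 / e
            smallest_expected = min(smallest_expected, e)
    return chi2, smallest_expected


CRITICAL_VALUE_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05

# Hypothetical 2 x 2 table with N = 20 (made-up counts).
before = [[2, 8], [7, 3]]
# Same data after one case in the first row changes column.
after = [[3, 7], [7, 3]]

chi2_before, min_e_before = pearson_chi_square(before)
chi2_after, min_e_after = pearson_chi_square(after)

print(f"before: chi-square = {chi2_before:.2f}, smallest E = {min_e_before:.1f}")
print(f"after:  chi-square = {chi2_after:.2f}, smallest E = {min_e_after:.1f}")
print("significant before:", chi2_before > CRITICAL_VALUE_DF1)
print("significant after: ", chi2_after > CRITICAL_VALUE_DF1)
```

In this made-up table, reclassifying one participant is enough to move χ² from above the critical value to below it, which is the kind of instability the minimum expected value guidelines are meant to guard against.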

procedure was used to obtain a histogram and descriptive statistics for each group.

Figure 12.2 Using the SPSS Command to Obtain Output for Separate Groups

caffeine. To evaluate whether the independence of observations is violated, we need to know the research situation. If we know that each participant is tested under only one treatment condition and that there was no matching or pairing of participants for the samples, then the assumption that scores are independent between groups should be satisfied. If we know that each participant was tested individually and that the participants did not have any chance to influence one another’s levels of physiological arousal or heart rate, then the assumption that observations are independent within groups should be satisfied. Data analysts can evaluate whether scores within each sample have reasonably normal distribution shapes and no extreme outliers and whether the

The image is a screenshot of the SPSS split file command that separates output for different groups based on specific criteria. At the top are the menu buttons such as; view,

variables by a researcher and experimental control over other variables that might influence the outcomes or responses of participants. Often experiments involve comparisons of mean scores on one or more outcome variables across groups that have received different types or amounts of treatments.

Exploratory studies:
Studies that include large numbers of variables and may evaluate large numbers of hypotheses, including hypotheses that arise during the process of data examination.

External validity:
In psychology, the degree to which research results can be generalized to participants, settings, and materials beyond those included in the study. Note that internal validity is related to causal inference; external validity is related to generalizability.

f:
The frequency of cases in a group. See also frequency and n.

F ratio:
In analysis of variance and other analyses, an F ratio is obtained by taking a mean square that represents variability that can be predicted from the independent variable (in the case of analysis of variance, this provides information about differences among group means) and dividing it by another mean square that provides information about the variability that is due to other variables or “error.” If F is much greater than 1 and if it exceeds the tabulated critical values for F, the researcher concludes that the independent variable has a significant predictive relationship with the outcome variable. (It was named the F ratio in honor of Sir Ronald Fisher, one of the major contributors to the development of modern statistics.)

Factor:
In the context of analysis of variance, a categorical predictor variable is usually called a factor. In an experiment, the levels of a factor typically correspond to different types of treatment, or different dosage levels of the same treatment, administered to participants by the researcher. In nonexperimental studies, the levels of a factor can correspond to different naturally occurring groups, such as political party or religious affiliation.

Factorial design:
A design in which there is more than one factor or categorical predictor variable.

Fixed factor:
A factor in an analysis of variance is “fixed” if the levels of the factor that are included in the study include all the possible levels for that factor or if the levels of the factor included in the study are systematically selected to cover the entire range of “dosage levels” that is of interest to the researcher. For example, if we code gender as 1 = male, 2 = female and use these two levels of gender in a factorial study, gender would be treated as a fixed factor. If we select equally spaced dosage levels of caffeine that cover the entire range of interest (e.g., 0, 100, 200, 300, and 400 mg), then caffeine would be treated as a fixed factor.

Floor effect:
When scores have a fixed lower limit, such as 0 on an exam, and when many scores are close to that minimum possible value, there is a floor effect. If this distribution occurs for

examination scores, it suggests that the examination was too difficult. A floor effect is undesirable.

Frequency:
The frequency of cases in a group is the same as the n of cases in a group. Later in the book, n is used instead of f to report group size.

Frequency distribution table:
A list of all possible scores on a variable, along with the number of persons who received each possible score, is called a frequency distribution. For example, a frequency table for the variable type of tobacco used could be as follows:

Type of Tobacco      Number (Frequency) of Persons
None                 43
Cigarette            41
Pipe                 6
Chewing tobacco      11

Galton board:
A physical device that demonstrates the distribution of outcomes for Bernoulli trials (binary chance decisions). Also called a quincunx.

Gaussian distribution:
See normal distribution.

General linear model (GLM):
The most general case of this model is one in which one or several predictor variables (which may be categorical or continuous) are used to predict outcomes on one or several outcome variables (which may be categorical or continuous). Most of the analyses taught in introductory statistics courses are special cases of this more general model; for example, in one-way analysis of variance, scores on one continuous outcome variable are predicted from one categorical variable. All these analyses involve the computation of similar terms (e.g., sums of squares).

Generalizability of results:
The degree to which a researcher can claim that results obtained in a specific sample would be the same for a population of interest. Results from a sample can be generalized to an actual population of interest if the sample is representative of the population; representativeness can often be obtained using random or systematic methods to select the sample. Results from an accidental or a convenience sample may be generalizable to a hypothetical population if the sample resembles that hypothetical population. Results from a biased sample are not generalizable. In experiments, generalizability also depends on similarity of type and dosages of experimental treatment to real-world experiences with the treatment variable, setting, and other factors.

Grand mean:
The mean for all the scores in an entire study, denoted by M_Y or M_grand.

Harmonic mean of n's:
A method of computing an average n across groups that have unequal ns.

Hinges:
The hinges are the 25th and 75th percentile points for a distribution of scores on quantitative variables shown in a boxplot or box and whiskers plot.

Histogram:

A graph that provides information about the number or proportion of people with a particular score, for each possible score or interval of score values on a quantitative variable (X). Typically, X score values are indicated by tick marks on the X axis. For each X score, the height of the vertical bar corresponds to the frequency (or proportion) of people in the sample who have that score value for X (or a score for X that falls within the indicated score interval). The reference scale used to evaluate the information provided by the height of each bar is indicated on the Y axis, which is usually labeled in terms of either frequencies or proportions of cases. Conventionally, the bars in a histogram touch one another.

Homogeneity of variance:
The assumption that variances of the populations being compared (using the t test or analysis of variance) are equal. For a t test or analysis of variance, possible violations of this assumption can be detected using the Levene test or other test statistics that compare the sample variances across groups. In regression or correlation, homogeneity of variance refers to an assumption of uniform variance of Y scores across levels of X.

Hypothetical or imaginary population:
The (often imprecisely defined) population to which generalizations are made when research is based on an accidental or a convenience sample. For instance, a researcher who studies the effect of caffeine on anxiety in a convenience sample of college students may want to generalize the results to all healthy young adults; this broader population is purely hypothetical.

Inferential use of statistics:
We use statistics in an inferential way when we estimate population characteristics (such as μ) from sample statistics (such as M) or when we extrapolate beyond the cases in the study to some larger hypothetical population. When we make inferences about populations on the basis of information in samples, we must take sampling error into account. This is often done by setting up confidence intervals or conducting statistical significance tests. This is in contrast to descriptive uses of statistics, in which we use statistics such as M only to describe the data in the sample. In most published research reports, researchers hope to be able to say something about populations beyond the cases in the study, so they generally use inferential methods.

Inner fences:
The ends of the whiskers in a boxplot mark the inner fences.

Institutional animal care and use committee (IACUC):
Research procedures involving nonhuman animal participants must be approved by an IACUC before data collection.

Institutional review board (IRB):
An IRB reviews and evaluates all proposed research that involves human participants in the United States. Researchers must obtain IRB approval before collecting data from human participants. The corresponding committee that reviews and evaluates research that involves nonhuman animal subjects is the institutional animal care and use committee (IACUC).

Interaction effect:
This is a pattern of cell means in a factorial ANOVA that is different from what would be predicted by summing the grand mean, the row effect, and the column effect. When there is a significant interaction, the lines that connect cell means in a graph are not parallel; in other words, for members of the A1 group, changes in scores on the dependent variable across levels of the B factor are not the same as the changes in the A2 group. Interaction effects correspond to a pattern of cell means that cannot be reproduced just by summing the main effects of the row and column factors. Interaction is equivalent to moderation.

Intercept:
See b0.

Internal validity:
The degree to which results from a study can be used as evidence of a causal connection between variables. Typically, well-controlled experiments can provide stronger support for causal inference than nonexperimental studies.

Interrater reliability:
An assessment of consistency or agreement for two or more raters, coders, or observers. If the ratings involve categorical variables, percentage agreement and Cohen’s kappa (κ) may be used to quantify agreement; if ratings involve dichotomous (yes/no) judgments or quantitative ratings, then Cronbach’s alpha or KR-20 may be used to assess reliability.

Interquartile range (IQR):
The distance between the 25th and 75th percentiles in a boxplot. This range includes the middle 50% of scores.

Latin square:
A Latin square has k rows and k columns (where k is the number of levels for the within-S factor). Each row corresponds to a different order of treatment presentation. Each treatment appears once in each row and once in each column. Type of treatment is not confounded with order of presentation. Ideally, each treatment would follow each other treatment only once (to control for carryover effects).

Level of confidence:
When setting up a confidence interval, the level of confidence, C, is usually arbitrarily set at 95% or 90%. If all assumptions for use of confidence intervals are correct, then in the long run, if we set up thousands of confidence intervals using samples from the same population, C% of the confidence intervals are expected to contain μ, and (1 − C%) are expected not to contain μ.

Levels of a factor:
Each group in an analysis of variance corresponds to a level of the factor. Depending on the nature of the study, levels of a factor may correspond to different amounts of treatment (for instance, if a researcher manipulates the dosage of caffeine, the levels of the caffeine factor could be 0, 100, 200, and 300 mg of caffeine). In other cases, levels of a factor may correspond to qualitatively different types of treatment (for instance, a study might compare Rogerian, cognitive behavioral, and Freudian therapy as the three levels of a factor called “type of psychotherapy”). In some studies where naturally occurring groups are compared, the levels of a factor correspond to naturally occurring group memberships (e.g., gender, political party).

Likert scale:
Rensis Likert, a sociologist, devised this rating scale format. Respondents are asked to report their degree of agreement with a statement about an attitude or a belief using a multiple-point rating scale (usually five points, with labels that range from 1 = strongly disagree to 5 = strongly agree). In practice, rating scales often have more than five points, and points may be labeled for things other than degree of agreement, for instance, reports of behavior frequency.

Linear:
Let b stand for the slope of a line in a scatterplot of X, Y scores. The association between X and Y is perfectly linear if, for each one-unit increase in X score, there is a constant and consistent increase of b units in the Y score, as in Figure 10.1. An association is approximately linear if a one-unit increase in X is associated with an average increase of b units in the Y score.

Literature review:
In science, a review of relevant past scientific literature. Usually focuses on published research, but unpublished results may also be discussed.

MANOVA:
See multivariate analysis of variance.

Margin of error:
Surveys or polls of attitudes or voter intentions often report results in terms of sample proportions. The margin of error reported for percentages in polls is often (but not always) plus or minus one standard error for the proportion estimate. For example, a polling expert might say that 52% of registered voters who were contacted said that they favored passage of Proposition 21, with a margin of error of ±3%.

Marginal frequencies:
In a contingency table, these are the total numbers of cases in each row or each column, obtained by summing cell frequencies within each row or column.

Mauchly's sphericity test:
A test of the sphericity assumption for repeated-measures analysis of variance, required only when there are more than two levels of the repeated-measures factor. The null hypothesis is that the contrasts (C1 − C2), (C2 − C3), and so on, have equal variances. If the sphericity assumption is violated, the F ratio for standard repeated-measures analysis of variance is biased; its p value will underestimate the true risk for Type I error. Possible remedies for violations of this assumption are either (a) use of corrected df based on either the Greenhouse-Geisser or Huynh-Feldt procedure or (b) multivariate analysis of variance (for more advanced students). Violation of this assumption creates more serious problems than violation of the (similar) homogeneity of variance assumption in between-S analysis of variance and should not be ignored.

Mean (M):
A measure of central tendency that is obtained by summing the scores in a sample and dividing by the number of scores.

Median:
A measure of central tendency that is obtained by ranking the scores in a sample from lowest to highest and identifying the score that has 50% of the scores below it and 50% of the scores above it.

Meta-analysis:
Combining information from multiple studies, examining mean effect size, variance of effect size, and sometimes searching for variables that explain why effect sizes are larger in some studies than in others.

Missing value:
A number (or blank) in a cell in an SPSS data sheet that represents a missing response is called a system missing value; such values are excluded from computations.

Mixed models:
In this book the term mixed models refers to analyses of variance that include both within-S and between-S factors. The term can be defined more broadly to include other types of designs.

Mode:
A measure of central tendency that is obtained by finding the score in a sample that has the highest frequency of occurrence. A frequency distribution can have more than one mode.

Multiple-point rating scale:
Rating scale items may have any number of response alternatives, and many different types of labels can be used for response alternatives (such as frequencies of behaviors or intensities of feelings). A rating scale is often called a Likert scale; the original format proposed by Likert involved rating degree of agreement on a five-point scale.

Multivariate analysis of variance (MANOVA):
This is a multivariate generalization of univariate analysis of variance. Like analysis of variance, it involves comparisons of means across groups; however, univariate analysis of variance examines means on just one outcome variable, Y, while multivariate analysis of variance compares a vector or list of means on p outcome variables across groups (Y1, Y2, …, Yp).

N:
The total number of observations in a sample.

n:
The number of cases in a group.

Necessary and sufficient:
X is a necessary and sufficient condition for Y if both of these statements are true: Y occurs only after X has occurred, and Y always occurs after X has occurred.

Necessary but not sufficient:
X is a necessary but not sufficient condition for Y if both of these statements are true: Y can occur only if X has happened, but, when X happens, Y does not always happen.

Negatively skewed:
An asymmetric distribution that has a longer tail at the low (or negative) end of the distribution is said to be negatively skewed.

Nominal variable:
At a nominal level of measurement, numbers serve only as names or labels for group membership and do not convey any information about rank order or quantity. See also categorical variable.

Nondirectional test:
A significance test that uses an alternative hypothesis that does not specify a directional difference. For H0: μ = 100, the nondirectional alternative hypothesis is H1: μ ≠ 100. For a nondirectional test, the rejection regions include both the upper and lower tails of the z or t distribution. See also two-tailed test.

Nonequivalent control group:
When individual participants cannot be randomly assigned to treatment and/or control groups, we often find that these groups are nonequivalent; that is, they are unequal on their scores on many participant characteristics prior to the administration of treatment. Even when a random assignment of participants to groups occurs, sometimes nonequivalence among groups occurs because of “unlucky randomization.” If it is not possible to use experimental controls (such as matching) to ensure equivalence, analysis of covariance (ANCOVA) is often used to try to correct for or remove this type of nonequivalence. However, the statistical control for one or more covariates in ANCOVA is not guaranteed to correct for all sources of nonequivalence; also, if assumptions of ANCOVA are violated, the adjustments it makes for covariates may be incorrect.

Nonexperimental research design:
In nonexperimental research, the investigator does not manipulate an independent variable and does not have experimental control over other variables that might influence the outcome of the study.

Normal distribution:
The mathematical definition of a normal distribution is given in Appendix 6A. Analysts typically call an empirical distribution seen in a histogram “approximately normal” if its shape approximates that of a bell curve. Also called the Gaussian distribution.

Null hypothesis (H0):
An algebraic statement that some population parameter has a specific value. For example, the null hypothesis for the one-sample z test is usually of the form H0: μ = c, where c is a specific numerical value. In other words, H0 is the assumption that the population mean on a variable corresponds to a specific numerical value c. In the evaluation of mean human body temperature, the null hypothesis is μ = 98.6°F.

Null-hypothesis significance testing (NHST):
Null hypothesis significance testing involves the selection of an alpha level to limit the risk of Type I error, statements of null and alternative hypotheses, and the evaluation of obtained research results (such as a t ratio) to decide whether to reject the null hypothesis. In theory, if all the assumptions for NHST are satisfied, the risk for Type I error is (theoretically) equal to alpha. In practice, some assumptions of NHST are often violated, so obtained p values may not accurately estimate the true risk for Type I error in many studies.

Numeracy:
Skills needed to evaluate simple numerical or statistical information.

Odds ratio:
An odds ratio is a ratio of the odds for members of two different groups, often used to summarize information about outcomes when the outcome variable is a true dichotomy. The odds themselves are also a ratio. In the study of survival status among owners and nonowners of dogs, we could set up an odds ratio to describe how much more likely survival is for a dog owner than for a nonowner by taking the ratio of the odds of

survival for dog owners (16.67) to the odds of

participant receives two or more different

survival for a nonowner (2.545). This ratio,

treatments, it is possible that participant

16.67/2.54 = 6.56, tells us that in this sample,

responses depend on the order in which

the odds of survival were more than 6 times

treatments are administered as well as the

as high for dog owners as for nonowners.

type of treatment. Factors such as practice, boredom, fatigue, and sensitization to whatis

Omnibus test: A test of the significance of an overall model (such as a multiple regression) that includes all predictor variables. For example, the © ratio that tests the null hypothesis that multiple 2 = O is the omnibus test for a multiple regression. An F test that tests whether

all

population

means

that

correspond to the groups in a study are equal

to one another is the omnibus test in a one-

outcome measures taken at Times 1, 2, and so forth.

If all participants

experience the

treatments in the same order, there is a

confound

between

order

effect

and

treatment effect. To prevent this confound, in

most

repeated

researchers

vary

measures

the

order

studies, in

which

treatments are administered.

Ordinal variables:

way analysis of variance.

In Stevens's (1946, 1951) description of levels

One-sample¿test: This test uses information from a sample(its mean, D, and M) to decide whether a specific hypothesized

being measured may lead to differences in the

value

for

the

unknown

population mean p appears to be plausible.

of measurement, ordinal variables are those that contain information only about rank.

One-tailed test: A significance test that uses an alternative hypothesis that specifies an alternative value of μ that is either less than or greater than the value of μ stated in the null hypothesis. For example, if H0: μ = 100, then the two possible directional alternative hypotheses are H1: μ < 100 and H1: μ > 100. The rejection region consists of just one tail of the normal or t distribution. The lower tail is used as the rejection region for H1: μ < 100, and the upper tail is used as the rejection region for H1: μ > 100. One-tailed tests are used when there is a directional alternative hypothesis. See also directional test.

Orthogonal: This term means that contrasts are independent or uncorrelated. If you correlate a pair of variables (O1 and O2) that are coded as orthogonal contrasts, the correlation between O1 and O2 should be 0.

Orthogonal factorial ANOVA: This is a factorial design in which the numbers of cases in the cells are equal (or proportional in such a way that the percentage of members in each B group is equal across levels of A). In an orthogonal design, the effects of factors are not confounded.

Outer fences: These are a feature of boxplots that usually do not appear on the graph. The outer fences fall at Mdn + 3 x IQR and Mdn - 3 x IQR (or the

Order effects: In

the

in

repeated-measures

studies,

if

each


actual minimum and maximum if these are closer to the median than these limits). Individual scores that lie beyond the outer fences are marked as extreme outliers (using asterisks).

Outlier: A score that is extreme or unusual relative to the sample distribution. There are many standards that may be used to decide which scores are outliers; for example, a researcher might judge any case with a z score greater than 3.3 to be an outlier. Alternatively, scores that lie outside the outer fences of a boxplot might also be designated as outliers.

p (or p value): The area in one or two tails of a distribution (such as a t distribution); p represents the theoretical probability of obtaining a research result (such as a t value) equal to or greater than the one obtained in the study, when H0 is correct. It thus represents the “surprise value” (Hays, 1973) of the following result: If H0 is correct, how surprising or unlikely is the outcome of the study? When a small p value is obtained (p less than a preselected α value), then the researcher may decide to reject the null hypothesis. Another definition for p would be the answer to the following question: How likely is it that, when H0 is correct, a study would yield a result (such as a t value) equal to or greater than the observed t value just due to sampling error? A p value provides an accurate estimate of the risk for Type I error only when all the assumptions required for null-hypothesis significance testing are satisfied.

Paired-samples t test: A form of the t test that is appropriate when scores come from a repeated-measures study, a pretest-posttest design, matched samples, or other designs where scores are paired in some manner. Also called the correlated-samples t test or direct-difference t test.

Parameter: In the context in this book, a parameter is a quantitative description of distribution shape for a population. Each parameter can be estimated by a sample statistic. For example, μ is the population mean, M is the sample mean, σ is the population standard deviation, and SD is the sample standard deviation. In other contexts the term parameter can have different meanings.

Partition of variance: The variability of scores (as indexed by their sum of squares) can be partitioned or separated into two parts: the variance explained by between-group differences (or treatment) and the variance not predictable from group membership (due to extraneous variables). The ratio of between-group (or explained) variation to total variation, eta squared, is called the proportion of explained variance. Researchers usually hope to explain or predict a reasonably large proportion of variance for the outcome variable. Partition of SS is introduced in discussion of one-way analysis of variance, and estimated partitions of variance are also provided by regression and multivariate analyses.
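The partition of variance described above can be illustrated with a short Python sketch. The scores, group layout, and function name below are hypothetical and chosen only for illustration; the sketch assumes a simple one-way design and shows that the between-group and within-group sums of squares add up to the total sum of squares, with eta squared as their ratio.

```python
from statistics import mean

def partition_ss(groups):
    """Split SS_total into SS_between and SS_within for a list of groups.

    `groups` is a list of lists of scores (hypothetical data layout).
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Total variability of scores around the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Variability explained by group membership
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability not predictable from group membership
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    eta_squared = ss_between / ss_total
    return ss_total, ss_between, ss_within, eta_squared

# Hypothetical scores for three treatment groups
groups = [[2, 4, 6], [5, 7, 9], [8, 10, 12]]
ss_total, ss_between, ss_within, eta_sq = partition_ss(groups)
print(ss_total, ss_between, ss_within, round(eta_sq, 3))
```

In this made-up example most of the total variation is between groups, so eta squared is large; with real data the proportion is usually much smaller.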

Pearson product-moment correlation: See Pearson's r.

Pearson's r (or Pearson product-moment correlation): A parametric correlation statistic that provides information about the strength of a relationship between two quantitative variables; it should be used only when the variables are normally distributed, linearly related, and at least approximately at the interval/ratio level of measurement. When not otherwise specified, the term correlation usually refers to the Pearson product-moment correlation.

Peer review: When scientific research reports are submitted to journals, they are sent to “peers” (other researchers) for review. This is a quality control mechanism that usually prevents the publication of poor-quality research information.

Percentage (%): Percentage is obtained by multiplying proportion by 100.

Percentile rank: The cumulative percentage of scores below an X score in a frequency table (not including scores exactly equal to X) can be reported as percentile rank. The percentage of area below z in a standard normal distribution can also be reported as percentile rank.

Person effects: In paired-samples or repeated-measures data, we can calculate a mean for each person (combining scores across all times or treatment conditions). If these means differ substantially across persons, we have evidence that there are individual differences in the response variable (such as heart rate).

Person × Treatment interaction: If persons show different responses to treatments (e.g., one person's heart rate increases in response to pain, another person's heart rate decreases, and another person's heart rate does not change), it indicates a Person × Treatment interaction. Repeated-measures ANOVA assumes no Person × Treatment interaction.

p-hacking: Searching for smaller p values, often by running numerous different analyses using different decisions about which cases, variables, or groups to include; see Wicherts et al. (2016) for a list of p-hacking practices. The final reported p values obtained after p-hacking are usually not believable; they greatly underestimate the true risk for Type I error.

Phi coefficient (φ): A correlation (also an effect size) that indexes the strength of association between scores on two true dichotomous variables; it is equivalent to Pearson's r.

Plagiarism: When authors present ideas or contributions of other people as if they were the authors’ own new contributions.

Planned contrast: See contrast.

Point biserial correlation (rpb): A correlation that is used to show how a true dichotomous variable is related to a quantitative variable; it is equivalent to Pearson's r.

Pooled-variances t test: When there is no evidence that the homogeneity of variance assumption is violated, this is the version of the t test that is preferred. (This is the version of t that is usually taught in introductory statistics courses.)
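The statement that the phi coefficient is equivalent to Pearson's r can be checked numerically. The following Python sketch uses hypothetical 2 × 2 cell counts and assumes 0/1 coding for both dichotomous variables; the function names and the cell labels a, b, c, d are illustrative and do not come from the text.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's r computed from raw scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def phi_from_table(a, b, c, d):
    """Phi from a 2 x 2 table of counts.

    a = (X=1, Y=1), b = (X=1, Y=0), c = (X=0, Y=1), d = (X=0, Y=0).
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical cell counts (not data from the book)
a, b, c, d = 30, 10, 15, 25

# Expand the table into one 0/1 score per case on each variable
x = [1] * (a + b) + [0] * (c + d)
y = [1] * a + [0] * b + [1] * c + [0] * d

phi = phi_from_table(a, b, c, d)
r = pearson_r(x, y)
print(round(phi, 4), round(r, 4))  # the two values agree
```

The same check applies to the point biserial correlation: running `pearson_r` with one 0/1 variable and one quantitative variable gives the point biserial value directly.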

Population: In ideal descriptions of statistics, a set of scores for the entire population of interest, from which a sample (subset) of scores is selected, often randomly. In the ideal world of statistics, we begin with a population whose members can be identified and then select a sample. In actual research practice, we often begin with an easily accessible convenience sample of cases and then try to make inferences to some broader hypothetical population.

Positively skewed: An asymmetric distribution that has a longer tail at the high (or positive) end of the distribution is said to be positively skewed.

Post hoc, ergo propter hoc: Latin for “After this, therefore because of this (or caused by this).” A common logical fallacy. This fallacy (error) occurs when a person assumes that a prior event caused a later event, in the absence of any other information about how the events might be related.

Post hoc power analysis: Don’t do this. If you conduct a statistical significance test and have a sample value of d, and if you then use the power table to find power as a function of d and N in your study, do not use a post hoc power analysis to say “the results would have been statistically significant if N were larger.” (You can use an effect size from a completed study to evaluate statistical power for a future planned study, of course.)

Post hoc test: See protected test.

Practical significance: A subjective judgment as to whether the value of the M1 − M2 observed difference between group means in a study represents a difference that is large enough to be of any practical or clinical significance; in practice, to make this judgment, a person needs to know something about the possible range of values on the outcome variable and how much these changes are valued by people or clinicians. See also clinical or practical significance.

Prediction error: If a researcher uses a sample value of M to predict or estimate μ, and the value of M is not equal to μ, the difference between M and μ tells us the magnitude and direction of prediction error. In regression analyses, prediction errors are usually called residuals. Other things being equal, prediction errors tend to decrease as N increases.

Preliminary data screening: Examination of frequency tables and graphs to examine data before doing the analysis of primary interest; this makes it possible to see potential problems such as extreme scores, non-normal distribution shape, and nonlinearity.

Primary source: In science, a primary source is a research report written by a researcher who has firsthand knowledge about data collection and analysis.

Proportion (p): A proportion is obtained by dividing the n in a group or category by the total N in the entire sample. Also called relative frequency.

Protected test: A test that reduces the risk for Type I error by using more conservative procedures. Some examples include the Bonferroni procedure and the Tukey honestly significant difference test. Also called a post hoc test.

Protective factor: Something that is associated with lower risk for diseases or problems. For example, hand washing is a protective factor for getting colds and other common diseases. Protective factors are statistically related to disease or problem outcomes. However, the association is not perfect. Engaging in a protective behavior usually does not reduce the risk for disease or problem to zero. Not engaging in a protective behavior does not predict that the disease or problem is certain to occur. For many diseases and problems, there are multiple protective factors.

Proximal similarity model: When a researcher suggests that results from a convenience sample can be generalized to a hypothetical population that has characteristics similar to those of cases in the sample, the researcher is implicitly relying on similarity (rather than random sampling) to justify making generalizations.

p value: See p (or p value).

Quantitative variable: A variable that contains information about the quantity or amount of some underlying characteristic, for example, age in years or salary in dollars. This includes the levels of measurement that Stevens (1946, 1951) called interval and ratio.

Quasi-experimental research design: A research design that involves pretest-posttest comparisons, or comparisons of treatment groups, but lacks one or more of the features of a true, well-controlled experiment. For example, in a true experiment, participants are randomly or systematically assigned to treatment groups in a way that makes the groups as nearly equivalent as possible with respect to participant characteristics; in a quasi-experiment, random assignment of participants to treatments is often not possible, and therefore, the groups are often not equivalent with respect to participant characteristics prior to treatment. In a true experiment, the intervention is under the control of the researcher; a quasi-experiment often assesses the impact of an intervention that is not under the direct control of the researchers. An example of a quasi-experiment is a study that compares mean scores on an outcome variable for two groups that have received different treatments, in a situation where the researcher did not have control over the assignment of participants to groups and/or does not have control over other variables that might influence the outcome of the study. Analysis of covariance is often used to analyze data from quasi-experimental designs that include pretest-posttest comparisons and/or nonequivalent control groups.

Quincunx:

See Galton board.

Random assignment of participants to groups or conditions: A way of assigning members of a sample to two or more treatment groups, such that each member of the sample has an equal chance of being included in each group. Note that this is not the same thing as random sampling of participants from a population.

Random factor: A factor in an analysis of variance is considered random if the levels included in the study represent an extremely small proportion of all the possible levels for that factor. For example, if a researcher randomly selected 10 photographs as stimuli in a perception task, the factor that corresponded to the 10 individual photographs would be a random factor. In a factorial analysis of variance, the F ratio to test the significance for a factor that is crossed with a random factor is based on an error term that involves an interaction between factors.

Random sampling of participants from a population: A random sample is a subset of cases from a population selected in a manner that gives each member of the population an equal chance of being included in the sample. Random sampling from a population should enhance the generalizability of results to that population. Note that this is not the same thing as random assignment of participants to groups or conditions.

Regression slope: In a regression that predicts a raw-score Y from a raw score for X, the regression slope b is the average number of units of increase in predicted Y score for each one-unit increase in X. This b slope is also called a regression coefficient. (When a z score for Y is predicted from the z score for X, the standardized regression slope is denoted β.)

Reliability: A measure is reliable if it provides stable and consistent results across occasions of measurement. For example, if you weigh yourself on a bathroom scale several times, and the weights are all similar, the bathroom scale provides reliable measures. If the weights differ substantially, the bathroom scale provides unreliable measurements.

Repeated measures: A design in which each participant is tested at every point in time or under every treatment condition; because the same participants contribute scores for all the treatments, participant characteristics are held constant across treatments (which avoids confounds),

and variance due to participant characteristics can, in theory, be removed from the error term used to assess the significance of differences of treatment group means.

Range: The difference between the highest and lowest values of a variable.

Range rule: When data are approximately normally distributed, the range is approximately 4 times the standard deviation; the standard deviation is approximately one quarter of the range.
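The range rule can be tried out with a small Python sketch. The function name, the simulated population (mean 100, SD 15), and the sample size of 30 are assumptions made for illustration; the rule is only a rough approximation, and the ratio of range to standard deviation tends to grow as the sample gets larger.

```python
import random
from statistics import stdev

def range_rule_sd(scores):
    """Rough SD estimate from the range rule: SD is about range / 4."""
    return (max(scores) - min(scores)) / 4

# Simulated, approximately normal scores (hypothetical mean and SD)
random.seed(1)
sample = [random.gauss(100, 15) for _ in range(30)]

# Compare the usual sample SD with the range-rule estimate
print(round(stdev(sample), 2), round(range_rule_sd(sample), 2))
```

The two printed values should be in the same general neighborhood for a sample of this size, which is all the rule is meant to provide: a quick plausibility check rather than a substitute for computing SD.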

Regression coefficient: See regression slope.

Replication: We replicate a study by reproducing or repeating it. Scientists can do an exact replication or reproduction of a study, using the same methods, or a conceptual replication with variations in procedure. If results cannot be replicated by other researchers, this does not necessarily prove that the original study was wrong. However,

risk for Type I error. The computations in this section are for the equal variances assumed version of the test. You will probably never use the equal variances not assumed version of the t test.

When the “equal variances assumed” version of the t test is used, the variances within the two groups are pooled or averaged, and this average is called $s^2_{pooled}$ or $s_p^2$. The term pooled just means averaged. To obtain $s_p^2$, the pooled or averaged within-group variance, we average the two within-group variances $s_1^2$ and $s_2^2$. The first version of the formula works whether $n_1 = n_2$ or not. It “weights” the variances by the sample sizes (that is, $s_p^2$ will be closer to $s^2$ for the group with the larger n).

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \tag{12.11}$$

If $n_1 = n_2$, this formula reduces to the following. This version of the formula makes it even clearer that $s_p^2$ is the average of $s^2$ for the two groups:

$$s_p^2 = \frac{s_1^2 + s_2^2}{2} \tag{12.12}$$

The alternative to the pooled-variances or equal variances assumed independent-samples t test procedure is the equal variances not assumed (also called separate variances) t test procedure.* The formula for $SE_{M_1 - M_2}$ for the equal variances not assumed test keeps the two variances separate instead of pooling them; this test also uses a downwardly adjusted df. You probably will never need to use the equal variances not assumed t test, but it appears in your SPSS output whether you request it or not.

Given the values of $s_p^2$, $n_1$, and $n_2$, we can calculate the standard error of the difference between sample means, $SE_{M_1 - M_2}$:

$$SE_{M_1 - M_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} \tag{12.13}$$

A t ratio generally has the following form:

$$t = \frac{\text{Sample statistic} - \text{Hypothesized parameter}}{SE_{\text{sample statistic}}}$$

For the independent-samples t test, the value of $(\mu_1 - \mu_2)$ is usually hypothesized to be 0. If the difference between means is 0, that is equivalent to a null hypothesis that caffeine has no effect on heart rate; that is, mean heart rate is the same whether people receive caffeine or not.

Next we calculate the independent-samples t ratio:

$$t = \frac{M_1 - M_2}{SE_{M_1 - M_2}} \tag{12.14}$$

Then calculate the degrees of freedom for the independent-samples t ratio:


always completely successful in controlling for rival explanatory variables, but in principle, a well-controlled experiment makes it possible to rule out many rival explanatory variables. In nonexperimental research, it is typically the case that for any predictor variable of interest (X1), many other potential predictors of Y are correlated with or confounded with X1. The existence of numerous rival explanations that cannot be completely ruled out is the primary reason why we say “correlation does not indicate causation.” This caveat can be worded more precisely: “Correlational (or nonexperimental) research does not provide a basis for making confident causal inferences, because there is no way to completely rule out all possible rival explanatory variables in nonexperimental studies.” A nonexperimental researcher can identify a few rival, important explanatory variables and attempt to control for their influence by doing partial correlations; however, at best, this can be done for only a few rival variables, whereas in real-world situations, there can potentially be hundreds of rival explanatory variables that would need to be ruled out before we could conclude that X1 influences Y.

Robust: A statistic is considered robust if problems with the data (such as extreme scores) do not make the statistic a poor estimate. The mean is not robust against the effect of extreme scores.

Row percentage: A row percentage for a cell in a contingency table is found by dividing the cell n by the total number of cases in that row.

Sample: In formal or ideal descriptions of research methods, a sample is a subset of cases drawn from the population of interest (often using random sampling). In actual practice, samples often consist of readily available cases (convenience samples) that were not drawn from a well-defined broader population.

Sampling distribution of M: The theoretical distribution of obtained values for M when thousands of samples (all the same size) are randomly selected from the same population. The shape, center, and variability of values for M are predicted by the central limit theorem and can be demonstrated empirically using Monte Carlo simulations.

Sampling error: When hundreds or thousands of random samples are drawn from the same population, and a sample statistic such as M is calculated for each sample, the value of M varies across samples. This variation across samples is called sampling error. It occurs because, just by chance, some samples contain a few unusually high or low scores.

Sampling model: When random (and/or systematic) methods are used to select a sample from a population, such that the sample is representative of the population, the sampling model is the justification for making generalizations from sample to population (Trochim, 2006).

Science journal: A periodical that publishes peer-reviewed scientific research reports. Also called an academic or a professional journal.
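The entries for sampling distribution of M and sampling error mention Monte Carlo simulation. A minimal sketch of such a simulation in Python follows. The population (10,000 uniform scores between 0 and 100), the sample size of 25, and the 2,000 replications are all assumptions chosen for illustration, not values from the text.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(42)

# Hypothetical population: 10,000 scores, deliberately non-normal (uniform)
population = [random.uniform(0, 100) for _ in range(10_000)]
mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation

# Draw many random samples of the same size and record M for each one
n = 25
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

# Center of the sampling distribution versus the population mean
print(round(mean(sample_means), 2), round(mu, 2))
# Variability of M across samples versus sigma / sqrt(n)
print(round(stdev(sample_means), 2), round(sigma / n ** 0.5, 2))
```

The variation among the 2,000 values of M is sampling error; its standard deviation should land close to sigma divided by the square root of n, as the central limit theorem predicts, even though the population itself is not normal.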


$$df = n_1 + n_2 - 2 \tag{12.15}$$

(For the equal variances not assumed t test, df is calculated using a complicated formula, it is smaller than $n_1 + n_2 - 2$, and it is usually given to two or more decimal places.)

12.6 Statistical Significance of Independent-Samples t Test

I recommend that you report the exact p value for the equal variances assumed (or pooled-variances) version of the t test. This is a two-tailed test. For example, if “Sig.” as reported by SPSS is .032, report p = .032, two tailed. Remember that if SPSS gives you a “Sig.” value of .000, you should report this as p < .001. A p value estimates risk for Type I error, and that risk can never be 0.

A two-tailed exact p value corresponds to the combined areas of the upper and lower tails of the t distribution that lie beyond the obtained sample values of ±t.

If you want to report your outcome as a significance test using the conventional α = .05, two tailed, level of significance, an obtained p value less than .05 is interpreted as evidence that the t value is large enough so that it would be unlikely to occur by chance (because of sampling error) if the null hypothesis were true. In other words, if we set α = .05, two tailed, as the criterion for statistical significance and the obtained p value is less than .05, we conclude that the two group means are significantly different.

If an analyst decides to use a one-tailed (directional) test before peeking at the data, a one-tailed p value can be obtained by dividing the two-tailed p in the SPSS output by 2 (e.g., if two-tailed p = .06, then the corresponding one-tailed p = .03). In this situation, the analyst must also check that the direction of difference between the means corresponds to the difference in the alternative hypothesis. If H1: μ1 > μ2, the null hypothesis can be rejected if M1 > M2 but not if M1 < M2.

I have used annoying quotation marks for “exact” p. I do this as a reminder that the “exact” value of p given by programs such as SPSS, often reported to 3 decimal places, is not necessarily correct. When assumptions are violated (and they often are), the p values given by a computer program often greatly underestimate the true risk for Type I decision error.

A judgment about statistical significance can also be made directly from the obtained value of t, its df, and the α level. If t is large enough to exceed the tabled critical values of t for $n_1 + n_2 - 2$ df, the null hypothesis of equal means is rejected, and the researcher concludes that there is a significant difference between the means. In the preceding empirical example of data from an experiment on the effects of caffeine on heart rate, n1 = 10 and n2 = 10, therefore df = n1 + n2 − 2 = 18. If we use α = .05, two tailed, then from the table of critical values of t in Appendix B at the end of this book, the reject regions for this test (given in terms of obtained values of t) would be as follows:

Reject H0 if obtained t < −2.101.
Reject H0 if obtained t > +2.101.

Note that these values of t also correspond to the middle 95% of the area of a t distribution with 18 df. These t values (“critical” values) are also needed to set up a confidence interval (CI) for M1 − M2.
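The computations in Equations 12.11 through 12.15 and the critical-value decision rule can be sketched in Python as follows. The heart rate scores are hypothetical (they are not the data set analyzed in the chapter), the function name `pooled_t` is illustrative, and the critical value 2.101 is the tabled value for 18 df at α = .05, two tailed, quoted in the text.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Equal variances assumed (pooled-variances) independent-samples t."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    s1_sq, s2_sq = variance(group1), variance(group2)  # sample variances (n - 1)
    # Equation 12.11: pooled within-group variance, weighted by sample size
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    # Equation 12.13: standard error of the difference between means
    se_diff = sqrt(sp_sq / n1 + sp_sq / n2)
    # Equation 12.14: t ratio, with hypothesized (mu1 - mu2) = 0
    t = (m1 - m2) / se_diff
    # Equation 12.15: degrees of freedom
    df = n1 + n2 - 2
    return t, df, sp_sq, se_diff

# Hypothetical heart rates (beats per minute), 10 cases per group
no_caffeine = [52, 56, 58, 60, 62, 64, 66, 68, 70, 74]
caffeine = [60, 62, 66, 68, 70, 72, 74, 76, 80, 82]

t, df, sp_sq, se_diff = pooled_t(no_caffeine, caffeine)

# Two-tailed decision rule for df = 18, alpha = .05 (critical t from the text)
CRITICAL_T_18_DF = 2.101
reject_h0 = abs(t) > CRITICAL_T_18_DF
print(round(t, 3), df, reject_h0)
```

The sketch stops at the comparison with the tabled critical value. An exact p value would need the t distribution's tail area, which statistical packages such as SPSS report directly.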


SSE. Sum of squared errors, calculated using Equation 11.5.

Standard error of the estimate: In regression, this corresponds to the standard deviation of the distribution of actual Y scores relative to the predicted Y′ at each individual X score value. SEest provides information about the typical magnitude of the prediction error (difference between actual and predicted Y) in regression. Smaller values of SEest are associated with smaller prediction errors and, thus, with more accurate predictions of Y. Also denoted sY.X.

Standard normal distribution: A normal distribution with mean = 0 and standard deviation and variance = 1. See also normal distribution.

Standard regression: A method of regression in which all predictor variables are entered into the equation at one step, and the proportion of variance uniquely explained by each predictor is assessed controlling for all other predictors. It is also called simultaneous (or sometimes direct) regression.

Standard score: The distance of an individual score from the mean of a distribution expressed in unit-free terms (i.e., in terms of the number of standard deviations from the mean). If μ and σ are known, the z score is given by z = (X − μ)/σ. When μ and σ are not known, a distance from the mean can be computed using the corresponding sample statistics, M and s (or SD). If the distribution of scores has a normal shape, a table of the standard normal distribution can be used to assess how distance from the mean (given in z-score units) corresponds to proportions of area or proportions of cases in the sample that correspond to distances of z units above or below the mean.

Standardization: The term standardization has two different meanings in this book. In data analysis, standardization refers to the conversion of scores in original units of measurement (e.g., pounds, degrees, inches) into unit-free z scores (see Chapter 6). In experimental design and measurement, standardization means keeping data collection procedures as similar as possible across all participants or cases (see Chapter 2). See experimental control over other situational factors or extraneous variables.

Standardized scores: These are scores expressed in z-score units, that is, as unit-free distances from the mean. For example, the standardized score version of X, zX, is obtained as zX = (X − MX)/sX.

Statistical control: When information is available about at least one additional variable (Z), it is possible to evaluate the relationship between an X predictor and a Y outcome variable using statistical methods to partial out or remove variation associated with Z. Sometimes controlling for Z makes the association between X and Y appear stronger, but there are many ways that the inclusion of a control variable can change our understanding of the way X and Y may be related. In paired-samples designs, the control variable is “persons.”

Statistical significance: Statistical significance is evaluated by looking at a p value associated with a test statistic. If p


necessary to click the Define Groups button; this opens the Define Groups dialog box that appears in Figure 12.10. Enter the code numbers that identify the groups that are to be compared (in this case, the codes are 1 for the 0-mg caffeine group and 2 for the 150-mg caffeine group; however, different numbers can be used to identify groups). Click the OK button to run the specified tests. The output for the independent-samples t test appears in Figure 12.11.

Figure 12.8 SPSS Menu Selections to Obtain Independent-Samples t Test

Figure 12.9 Screenshot of SPSS Dialog Box for Independent-Samples t Test

The image is the dialog box for the independent-samples t test for the hr variable. On the left is the space for variables, which is blank. On the right the test variable has been specified as hr. The grouping variable has been specified as caffeine. There is a Define Groups button right below this. There is an Options button on the right. At the bottom of the dialog box are buttons for the following: OK, Paste, Reset, Cancel, and Help.

Figure 12.10 SPSS Define Groups Dialog Box for Independent-Samples t Test

The SPSS dialog box to define groups has been shown in the image. There are two options to select from. The first states “Use specified values” and provides values to the different groups. Here the Group 1 value is 1 and the Group 2 value is 2. The second option is the cut point option. The first option has been selected. At the bottom there are three buttons: Continue, Cancel, and Help.

Figure 12.11 Output From SPSS Independent-Samples t Test Procedure

test.

Type I error: A decision to reject H0 when H0 is correct.

Type I sum of squares: This method of variance partitioning in the SPSS GLM procedure is essentially equivalent to the method of variance partitioning in sequential or hierarchical multiple regression. Each predictor is assessed controlling only for other predictors that are entered at the same step or in earlier steps of the analysis.

Type II error: A decision not to reject H0 when H0 is incorrect.

Unconditional probability: The overall probability of some outcome (such as survival vs. death) for the entire sample, ignoring membership on any other categorical variables. For the dog owner survival data in Chapter 17, the unconditional probability of survival is the total number of survivors in the entire sample (78) divided by the total N in the entire study (92), which is equal to .85.

Underpowered: A study is underpowered if the sample size is too small (relative to the effect size) to have a reasonable chance of rejecting H0 when H0 is false.

Uniform distribution: A distribution where all values of scores for the X variable have equal frequencies or proportions.

Unit free: Scores are unit free (also called standardized) when they have been converted into units that have a mean of 0 and a standard deviation of 1. When individual X scores such as height in inches are converted to z scores, they become unit free.

Unlucky randomization: Sometimes even when random assignment of cases to groups is used, just by chance, the groups end up being different in some way.

Unprotected test: A significance test that does not use more conservative decision rules to decide whether multiple comparisons between group means are statistically significant (i.e., more conservative than the decision rules for a single independent-samples t test). Protection refers to protection against the inflated risk for Type I error that arises when multiple significance tests are performed. For example, if several t tests are done after an analysis of variance, using the same decision rules as for a single independent-samples t test, these are unprotected tests. Post hoc (also called protected) tests use modified and more conservative decision rules to evaluate statistical significance; these decision rules protect against inflated risk for Type I error.

Variability: A set of scores has variability if any individual X scores differ from the mean. (Variability is the same as variation.)

Variable: A characteristic that varies across cases. For example, humans differ on variables such as blood pressure, height, and age.

Variation:


A set of scores has variation if any of the individual X scores differ from the mean. (In

contrast, if each score equals every other

The formulato calculate a standard score or 2

score, there is no variation in score values.)

score is z = (XY — M)/SD. A distribution of 2

Weighted mean: This is a mean that combines information across several groups or cells (such as the

mean for all the scores in one row of a factorial

analysis

variance)

of

is

and

calculated by weighting each cell mean by its corresponding number,

z,

of cases.

For

example, if Group 1 has 7; cases and a mean of M, and Group 2 has 7; cases and a mean of Му,

the

z score:

weighted

mean

Muweighted

is

calculated as follows: Mweighted = (71411) +

(22M32)1/(n + >). Whiskers: In a box and whiskers plot, these are the vertical lines that extend beyond the hinges out to the adjacent values. Any scores that lie beyond the whiskers are labeled as outliers.

Within-s: See repeatedmeasures. *ZPRED: The standardized or z-score version of the predicted value of Y (7) from a multiple regression. This is one of the new variables that can be computed and saved into the SPSS worksheet in SPSS multiple regression. Also

called ZPR_1.

*ZRESID: The standardized or z-score version of the residuals from a multiple regression (7- 7). If any of these lie outside the range that

includes the middle 99% of the standard normal distribution, these cases should be examined as possible multivariate outliers.

Also called *ZRE_1.

95% Раде 606 of 624 - Location 14280 of 15772

scores has M= 0 and SD = 1. See also standard scoreand standardized scores.
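The z score and weighted mean formulas in the glossary can be sketched in Python as follows. All data values are hypothetical and invented for illustration, and the function names are assumptions rather than anything defined in the text. The sketch uses the sample standard deviation (dividing by N − 1), which is also an assumption about the intended SD.

```python
from statistics import mean, stdev


def z_scores(scores):
    """Convert raw X scores to z = (X - M) / SD, using the sample SD (N - 1)."""
    m = mean(scores)
    sd = stdev(scores)
    return [(x - m) / sd for x in scores]


def weighted_mean(cell_means, cell_ns):
    """Weight each cell mean by its number of cases, n, then divide by total n."""
    total_n = sum(cell_ns)
    return sum(n * m for n, m in zip(cell_ns, cell_means)) / total_n


# Hypothetical raw scores, invented for illustration only.
x = [2, 4, 4, 6, 9]
z = z_scores(x)
print(round(mean(z), 6), round(stdev(z), 6))  # M and SD of the z scores

# Hypothetical groups: n1 = 10 with M1 = 20, and n2 = 30 with M2 = 24.
print(weighted_mean([20, 24], [10, 30]))
```

Because Group 2 contributes three times as many cases, the weighted mean lies closer to M2 than the simple average of the two group means would.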

References

Abelson, R. P., & Rosenberg, M. J. (1958). Symbolic psycho-logic: A model of attitudinal cognition. Behavioral Science, 3, 1-13.

Aguinis, H., Gottfredson, R. K., & Joo, H. (2013). Best-practice recommendations for defining, identifying, and handling outliers. Organizational Research Methods, 16(2), 270-301. doi: 10.1177/1094428112470848

American Psychological Association. (2009). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

American Statistical Association. (2015). Ethical guidelines for statistical practice. Retrieved from http://www.amstat.org/asa/files/pdfs/EthicalGuidelines.pdf

Anderson, C. A., & Bushman, B. J. (2001). Effects of violent video games on aggressive behavior, aggressive cognition, aggressive affect, physiological arousal, and prosocial behavior: A meta-analytic review of the scientific literature. Psychological Science, 12, 353-359.

Aronson, E., & Mills, J. (1959). The effect of severity of initiation on liking for a group. Journal of Abnormal and Social Psychology, 59, 177-181.

Bartko, G. G. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19(1), 3-11. doi: 10.2466/pr0.1966.1.3

Baum, A., Gatchel, R. J., & Schaeffer, M. A. (1983). Emotional, behavioral, and physiological effects of chronic stress at Three Mile Island. Journal of Consulting and Clinical Psychology, 51, 565-572.

Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory-II. San Antonio, TX: Psychological Corporation.

Begley, C. G., & Ioannidis, J. P. A. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circulation Research, 116, 116-126. doi: 10.1161/CIRCRESAHA.114.303819

Belluz, J. (2014, December 17). Scientists tallied up all the advice on Dr. Oz's show. Half of it was baseless or wrong. Vox. Retrieved from https://www.vox.com/2014/12/17/7410535/dr-oz-advice

Bewick, V., Cheek, L., & Ball, J. (2004). Statistics Review 8: Qualitative data – Tests of association. Critical Care, 8, 46-53.

Bissonnette, V. (2019). Resources for the learning and teaching of statistics and behavioral science. Retrieved from https://sites.berry.edu/vbissonnette/index/statistical-tables/

Boneau, C. A. (1960). The effects of violations of assumptions underlying the t test. Psychological Bulletin, 57(1), 49-64.

Boston Children's Hospital. (2014, October 5). Number of genes linked to height revealed by study. Science Daily. Retrieved from https://www.sciencedaily.com/releases/2014/10/141005134909.htm

Brackett, M. A., Mayer, J. D., & Warner, R. M. (2004). Emotional intelligence and its relation to everyday behaviour. Personality and Individual Differences, 36, 1387-1402.

Brannon, L., Feist, J., & Updegraff, J. A. (2017). Health psychology: An introduction to behavior and health (9th ed.). Boston: Cengage.

Bump, P. (2013, April 2). 12 million Americans believe lizard people rule our country. The Atlantic. Retrieved from https://www.theatlantic.com/national/archive/2013/04/12-million-americans-believe-lizard-people-run-our-country/316706/

Burish, T. G. (1981). EMG biofeedback in the treatment of stress-related disorders. In C. Prokop & L. Bradley (Eds.), Medical psychology (pp. 395-421). New York: Academic Press.

Campbell, D. T., & Stanley, J. (1963). Experimental and quasi-experimental designs for research. New York: Wadsworth.

Campbell, D. T., & Stanley, J. S. (2001). Experimental and quasi-experimental designs for research (2nd ed.). Boston: Houghton Mifflin.

Campbell, L., Vasquez, M., Behnke, S., & Kinscherff, R. (2009). APA ethics code commentary and case illustrations. Washington, DC: American Psychological Association.

Carifio, J., & Perla, R. (2008). Resolving the 50-year debate around using and misusing Likert scales. Medical Education, 42, 1150-1152.

Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment (Quantitative applications in the social sciences, No. 17). Beverly Hills, CA: Sage.

CNN. (2018, October 11). CNN terms of use. Retrieved from https://www.theatlantic.com/national/archive/2013/04/12-million-americans-believe-lizard-people-run-our-country/316706/

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cohen, J. (1992a). A power primer. Psychological Bulletin, 112(1), 155-159. doi: 10.1037/0033-2909.112.1.155

Cohen, J. (1992b). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98-101. doi: 10.1111/1467-8721.ep10768783

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2013). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Hillsdale, NJ: Lawrence Erlbaum.

for the independent-samples t test; this discussion includes only the most widely reported.

12.10.1 M1 − M2

When the dependent variable Y is measured in meaningful units, the difference between sample means can be useful information (Pek & Flora, 2018), although many authors do not refer to that difference as an effect size. The difference between means can sometimes be interpreted as information about practical, clinical, or everyday importance. In this hypothetical example, people who consumed 150 mg of caffeine (about one cup of coffee) had heart rates about 10 beats per minute higher than those who did not consume caffeine. That is a noticeable difference, but not large enough that people need to be worried about it. To make judgments about clinical or practical significance of differences between means, we need to understand the meanings of different score values; even then, people can have different subjective evaluations.

Imagine a situation in which people who receive chemotherapy for a specific type of cancer live on average 3 weeks longer than people who decline chemotherapy. Apart from the question of whether this difference is statistically significant, we have the question, How much practical value does a 3-week difference have? A medical researcher might be pleased to find a treatment that extends life by 3 weeks. As a patient, however, I might not want to undergo possibly severe negative side effects unless the average extension of life was 2 or 3 months. In situations like this, clinicians and patients should remember that group averages often do not predict individual outcomes well. If median improvement in length of survival is 3 months, half of the patients in the study had shorter, and half had longer, improvements in length of survival. Ability to generalize results from a study to your own personal situation should also take into account how similar you are, and how similar your disease condition is, to persons included in the study.

In the extremes it may be easy to say whether a treatment such as a weight loss pill has practical or real-world significance. Most people would not think that a mean weight loss of 1 lb is enough to be meaningful or valuable. On the other hand, most people might think that a mean weight loss of 30 lb is enough to have practical, clinical, or real-world value. For in-between amounts of weight loss, people may differ in how much they think is sufficient to be of value, relative to costs and risks of the treatment.

When variables are not measured in meaningful units, M1 − M2 may not provide useful real-world information (although it may still be interesting to compare values of M1 − M2 across different studies that use the same measures). For example, suppose you are told that female teachers receive average teaching evaluation scores of 24, while male teachers receive average evaluation scores of 27. You can see that the mean rating is higher for male than female teachers in this example, but you would need much more information to evaluate whether the difference is large. It is usually helpful to know the possible minimum and possible maximum score value and the actual minimum and maximum values found in the sample (this information is sometimes not included, but it should be). Other effect size indexes use standard deviation or variance of scores to evaluate effect size. The value of M1 − M2 is not related to sample size
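The point that M1 − M2 does not depend on sample size can be illustrated with a small Python sketch. The heart rate values below are hypothetical, invented so that the difference matches the roughly 10 beats per minute in the caffeine example; the function name is likewise an assumption.

```python
from statistics import mean


def mean_difference(group1, group2):
    """Raw effect size M1 - M2, expressed in the original units of the outcome."""
    return mean(group1) - mean(group2)


# Hypothetical heart rates in beats per minute, invented for illustration.
caffeine = [82, 78, 85, 80, 75]
no_caffeine = [72, 68, 75, 70, 65]

small = mean_difference(caffeine, no_caffeine)
# Repeating every score ten times increases N tenfold but leaves each mean unchanged.
large = mean_difference(caffeine * 10, no_caffeine * 10)
print(small, large)
```

Both calls return the same difference, even though the second uses ten times as many cases per group.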


Friedmann, E., Katcher, A. H., Lynch, J. J., & Thomas, S. A. (1980). Animal companions and one year survival of patients after discharge from a coronary care unit. Public Health Reports, 95, 307-312.

Frigge, M., Hoaglin, D. C., & Iglewicz, B. (1989). Some implementations of the box plot. American Statistician, 43(1), 50-54.

Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2-18. doi: 10.1037/a0024338

GAISE College Report ASA Revision Committee. (2016). Guidelines for assessment and instruction in statistics education (GAISE): College report 2016. Retrieved from http://www.amstat.org/education/gaise

Gaito, J. (1980). Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87, 564-567.

Gallup. (n.d.). Methodology center: Scientifically proven methodology and rigorous research standards. Retrieved from http://www.gallup.com/178685/methodology-center.aspx

Glen, S. (2013, December 3). Choose bin sizes for histograms in easy steps + Sturge's rule. Retrieved from http://www.statisticshowto.com/choose-bin-sizes-statistics/

Godlee, F., Smith, J., & Marcovitch, H. (2011). Wakefield's article linking MMR vaccine and autism was fraudulent. British Medical Journal, 342, 7452. doi: 10.1136/bmj.c7452

Grande, T. (2015, May 13). “Visual binning” features on SPSS. Retrieved from https://www.youtube.com/watch?v=tAdmnPegsig

Gray, J., & Griffin, B. (2009). Eggs and dietary cholesterol – Dispelling the myth. Nutrition Bulletin, 34(1), 66-70. doi: 10.1111/j.1467-3010.2008.01735.x

Gray, M. (1985). Legal perspectives on sex equity in faculty employment. Journal of Social Issues, 41(4), 121-134.

Green, C. D., Abbas, S., Belliveau, A., Beribisky, N., Davidson, I. J., DiGiovanni, J., ... Wainewright, L. M. (2018). Statcheck in Canada: What proportion of CPA journal articles contain errors in the reporting of p-values? Canadian Psychology, 59(3), 203-210. doi: 10.1037/cap0000139

Greenland, S., Maclure, M., Schlesselman, J. J., Poole, C., & Morgenstern, H. (1991). Standardized regression coefficients: A further critique and review of some alternatives. Epidemiology, 2, 387-392.

Greenland, S., Schlesselman, J. J., & Criqui, M. H. (1986). The fallacy of employing standardized coefficients and correlations as measures of effect. American Journal of Epidemiology, 123, 203-208.


Grimm, K. J., & Ram, N. (2016). Growth modeling: Structural equation and multilevel modeling approaches. Thousand Oaks, CA: Sage.

Guthrie, R. V. (2004). Even the rat was white: A historical view of psychology (2nd ed.). Boston: Allyn & Bacon.

Harker, L., & Keltner, D. (2001). Expressions of positive emotion in women's college yearbook pictures and their relationship to personality and life outcomes across adulthood. Journal of Personality and Social Psychology, 80, 112-124.

Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Hausman, J. S., Berna, R., Gujral, N., Ayubi, S., Hawkins, J., Brownstein, J. S., & Dedeoglu, F. (2018). Using smartphone crowdsourcing to redefine normal and febrile temperatures in adults: Results from the Feverprints study. Journal of General Internal Medicine, 33, 2046-2047. doi: 10.1007/s11606-018-4610-8

Hays, W. (1994). Statistics (5th ed.). Fort Worth, TX: Harcourt Brace.

Hays, W. L. (1973). Statistics for the social sciences (2nd ed.). New York: Holt, Rinehart.

Henry, P. J. (2008). College sophomores in the laboratory redux: Influences of a narrow data base on social psychology's view of the nature of prejudice. Psychological Inquiry, 19(2), 49-71. doi: 10.1080/10478400802049936

Hodges, J. L., Jr., Krech, D., & Crutchfield, R. S. (1975). Statlab: An empirical introduction to statistics. New York: McGraw-Hill.

Hoekstra, R., Kiers, H. A. L., & Johnson, A. (2012, May 14). Are assumptions of well-known statistical tests checked, and why (not)? Frontiers in Psychology, 3, Article 137. doi: 10.3389/fpsy.2012.00137

Hogg, R. V., Tanis, E., & Zimmerman, D. (2014). Probability and statistical inference (9th ed.). Boston: Pearson.

Howell, D. C. (1992). Statistical methods for psychology (3rd ed.). Boston: PWS-Kent.

Huff, D. (1954). How to lie with statistics. New York: Norton.

Huff, D., & Geis, I. (1993). How to lie with statistics (Reissue ed.). New York: W. W. Norton.

Jaccard, J., & Becker, M. A. (2009). Statistics for the behavioral sciences (5th ed.). Pacific Grove, CA: Wadsworth Cengage Learning.

John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532. doi: 10.1177/0956797611430953

Kendall, M. (1962). Rank correlation methods (3rd ed.). New York: Hafner.

Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196-217. doi: 10.1207/s15327957pspr0203_4

Keys, A. B. (1980). Seven countries: A multivariate analysis of death and coronary heart disease. Cambridge, MA: Harvard University Press.

Kiely, E., & Robertson, L. (2016, November 18). How to spot fake news. FactCheck.org. Retrieved from http://www.factcheck.org/2016/11/how-to-spot-fake-news/

Kihlstrom, J. F. (2010). Social neuroscience: The footprints of Phineas Gage. Social Cognition, 28(6), 757-783. doi: 10.1521/soco.2010.28.6.757

Kirk, R. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759.

Kline, R. B. (2013). Beyond significance testing: Reforming data analysis in behavioral research (2nd ed.). Washington, DC: American Psychological Association.

Kling, K. C., Hyde, J. S., Showers, C. J., & Buswell, B. N. (1999). Gender differences in self-esteem: A meta-analysis. Psychological Bulletin, 125, 470-500.

Koch, G. G. (1982). Intraclass correlation coefficient. In S. Kotz & N. L. Johnson, Encyclopedia of statistical sciences (pp. 213-217). New York: John Wiley.

Kopf, D. (2015, October 5). Should you ever use a pie chart? Retrieved from https://priceonomics.com/should-you-ever-use-a-pie-chart/

Kuhn, T. S., & Hacking, I. (2012). The structure of scientific revolutions: 50th anniversary edition (4th ed.). Chicago: University of Chicago Press.

Kumar, G. N. S. (2015, March 15). Visual binning in SPSS. Retrieved from https://www.youtube.com/watch?v=WHuXyVaRPvM

Lenhard, J. (2006). Models and statistical inference: The controversy between Fisher and Neyman-Pearson. British Journal of Philosophy of Science, 57(1), 69-91. doi: 10.1093/bjps/axi152

Lenth, R. V. (2018). Java applets for power and sample size. Retrieved September 3, 2019, from http://www.stat.uiowa.edu/~rlenth/Power

Lienhard, J. (2002). No. 1712: Nightingale's graph. Retrieved from https://www.uh.edu/engines/epi1712.htm

Lindeman, R. H., Merenda, P. F., & Gold, R. Z. (1980). Introduction to bivariate and multivariate analysis. Glenview, IL: Scott, Foresman.

Lowry, R. (2019). The confidence interval of rho. Retrieved from http://vassarstats.net/rho.html

Lyon, D., & Greenberg, J. (1991). Evidence of codependency in women with an alcoholic parent: Helping out Mr. Wrong. Journal of Personality and Social Psychology, 61, 435-439.

Mackowiak, P. A., Wasserman, S. S., & Levine, M. M. (1992). A critical appraisal of 98.6 degrees F, the upper limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich. JAMA, 268, 1578-1580.

Maril, C. C. (2018, August 29). 98.6 degrees is a normal body temperature, right? Not quite. Wired. Retrieved from https://www.wired.com/story/98-degrees-is-a-normal-body-temperature-right-not-quite/

Maronna, R. A., Martin, R. D., Yohai, V. J., & Salibián-Barrera, M. (2019). Robust statistics: Theory and methods (with R). Hoboken, NJ: John Wiley.

McGill, R., Tukey, J. W., & Larsen, W. A. (1978). Variations of box plots. American Statistician, 32(1), 12-16.

McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153-157. doi: 10.1007/BF02295996

Mersey, J. C. B., & Gough-Calthorpe, A. (1912). Report of a formal investigation into the circumstances attending the foundering on the 15th April, 1912, of the British Steamship “Titanic,” of Liverpool, after striking ice in or near latitude 41° 46’ N., longitude 50° 14’ W., North Atlantic Ocean, whereby loss of life ensued. London: His Majesty's Stationery Office.

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156-166.

Mills, J. L. (1993). Data torturing. New England Journal of Medicine, 329, 1196-1199.

Mischel, W. (1968). Personality and assessment. New York: John Wiley.

Montecino, V. (1998). Criteria to evaluate the credibility of WWW resources. Retrieved from https://mason.gmu.edu/~montecin/web-eval-sites.htm

Mooney, K. M. (1990). Assertiveness, family history of hypertension, and other psychological and biophysical variables as predictors of cardiovascular reactivity to social stress. Dissertation Abstracts International, 51(3-B), 1548-1549.

Myers, J. L., & Well, A. D. (1995). Research design and statistical analysis. Mahwah, NJ: Lawrence Erlbaum.

Nightingale, F. (1858). Notes on matters affecting the health, efficiency, and hospital administration of the British Army founded chiefly on the experience of the late war. London: Harrison & Sons.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. doi: 10.1126/science.aac4716

Pearson, E. S., & Hartley, H. O. (Eds.). (1970). Biometrika tables for statisticians (3rd ed., Vol. 1). Cambridge, UK: Cambridge University Press.

Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208-225. doi: 10.1037/met0000126

Peters, J. (2013, July 13). When ice cream sales rise, so do homicides. Coincidence, or will your next cone murder you? Slate. Retrieved from http://www.slate.com/blogs/crime/2013/07/09/warm_weather_homicide_rates_when_ice_cream_sales_rise_homicides_rise_coincidence.html

Pickering, T. G., Gerin, W., & Schwartz, A. R. (2002). What is the white-coat effect and how should it be measured? Blood Pressure Monitoring, 7(6), 293-300.

Pierce, R. (2017, March 29). Quincunx (Galton board). Retrieved May 9, 2019, from http://www.mathisfun.com/data/quincunx.html

Polya, G. (2014). Mathematics and plausible reasoning: Two volumes in one. New York: Martino Fine Books. (Original work published 1954)

Rasco, D. (2020). Companion volume for R. Thousand Oaks, CA: Sage.

Record, R. G., McKeown, T., & Edwards, J. H. (1970). An investigation of the difference in measured intelligence between twins and single births. Annals of Human Genetics, 34, 11-20.

Resnick, B. (2019). Hyped-up science erodes trust. Here's how researchers can fight back. Vox. Retrieved from https://www.vox.com/science-and-health/2019/6/11/18652225/hype-science-press-releases

Rosenthal, R. (1966). Experimenter effects in behavioral research. New York: Appleton-Century-Crofts.

Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: Methods and data analysis (2nd ed.). New York: McGraw-Hill.

Rubin, Z. (1970). Measurement of romantic love. Journal of Personality and Social Psychology, 16, 265-273.

Rubin, Z. (1976). On studying love: Notes on the researcher-subject relationship. In M. P. Golden (Ed.), The research experience (pp. 508-513). Itasca, IL: Peacock.

Sawilowsky, S. S., & Blair, R. C. (1992). A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychological Bulletin, 111(2), 352-360. doi: 10.1037/0033-2909.111.2.352

Schmidt, C. M. (2004). David Hume: Reason in history. Philadelphia: Pennsylvania State University Press.

Schönbrodt, F. (2011). What is a reasonable sample size for correlation analysis? Retrieved from https://stats.stackexchange.com/questions/15842/what-is-a-reasonable-sample-size-for-correlation-analysis-for-both-overall-and-s

Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on psychology's view of human nature. Journal of Personality and Social Psychology, 51, 515-530.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Shoemaker, A. L. (1996). What's normal? Temperature, gender, and heart rate. Journal of Statistics Education, 4. Retrieved June 27, 2006, from https://www.tandfonline.com/doi/full/10.1080/10691898.1996.11910512

Sigall, H., & Ostrove, N. (1975). Beautiful but dangerous: Effects of offender attractiveness and nature of the crime on juridic judgment. Journal of Personality and Social Psychology, 31, 410-414.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life after p-hacking. In S. Botti & A. Labroo (Eds.), Advances in consumer research. Duluth, MN: Association for Consumer Research.

Simons, D. J., Shoda, Y., & Lindsay, S. (2017). Constraints on generalizability (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123-1128. doi: 10.1177/17546911708630

Snedecor, G. W., & Cochran, W. G. (1989). Statistical methods (8th ed.). Ames: Iowa State University Press.

Spring, B., Chiodo, J., & Bowen, D. J. (1987). Carbohydrates, tryptophan, and behavior: A methodological review. Psychological Bulletin, 102, 234-256.

Statistical Consultants Ltd. (2012, April 23). Titanic survival data. Retrieved from http://www.statisticalconsultants.co.nz/blog/titanic-survival-data.html

Sternberg, R. J. (1997). Construct validation of a triangular love scale. European Journal of Social Psychology, 27, 313-335.

Stevens, S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

Stevens, S. (1951). Mathematics, measurement, and psychophysics. In S. Stevens (Ed.), Handbook of experimental psychology (pp. 1-49). New York: John Wiley.

Stricker, L. J. (1991). Current validity of 1975 and 1985 SATs: Implications for validity trends since the mid-1970s. Journal of Educational Measurement, 28(2), 93-98.

Tabachnick, B. G., & Fidell, L. S. (2018). Using multivariate statistics (7th ed.). Boston: Pearson.

Tankard, J. W. (1984). The statistical pioneers. Cambridge, MA: Schenkman.


standard deviations. It helps us visualize how much overlap there is between two distributions of scores. The following examples illustrate small versus large values of Cohen's d. Figure 12.13 shows a small effect size. Data from numerous studies suggest that men tend to have self-esteem scores about .22 (two tenths) SD higher than those of women (i.e., Cohen's d = .22). This is a small effect. Figure 12.13 shows the overlap between these two distributions of scores. The normal distribution on the left represents self-esteem scores for women, with the mean located at d = 0. The distribution on the right represents self-esteem scores for men, with the mean located at d = .22.

Figure 12.13 Small Cohen's d Effect Size and Overlap of Female (Left) Versus Male (Right) Distributions of Self-Esteem Scores

[Figure: two overlapping normal curves plotted against d. Cohen's d = 0.22, Overlap = 83.7%]

Source: Kling, Hyde, Showers, and Buswell (1999).

Note: Across numerous studies, the average difference in self-esteem between male and female samples is estimated to be about .22; mean self-esteem for men is typically about two tenths of a standard deviation higher than mean self-esteem of women.

Figure 12.14 Large Cohen's d Effect Size and Overlap of Female (Left) Versus Male (Right) Distributions of Heights

[Figure: two overlapping normal curves plotted against d. Cohen's d = 2.00, Overlap = 18.9%]

Source: http://en.wikipedia.org/wiki/Effect_size.

Note: From samples of men and women in the United Kingdom, mean height for men = 1,755 mm, and mean height for women = 1,620 mm. The standard deviation for height = 67.5 mm. Therefore Cohen's d = (Mmale − Mfemale)/s = (1,755 − 1,620)/67.5 = 2.00.
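The Cohen's d calculation in the note to Figure 12.14 can be reproduced with a short Python sketch. The text does not say how the overlap percentages in the figures were computed; the sketch assumes that overlap is 1 minus Cohen's U1 (percentage of nonoverlap) for two normal distributions with equal standard deviations. That choice, and all function names, are assumptions made for illustration.

```python
import math


def cohens_d(mean1, mean2, sd):
    """Cohen's d = (M1 - M2) / s, the mean difference in standard deviation units."""
    return (mean1 - mean2) / sd


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def overlap_from_d(d):
    """Overlap taken as 1 - Cohen's U1 (an assumption about how the figures were made)."""
    u2 = normal_cdf(abs(d) / 2.0)
    u1 = (2.0 * u2 - 1.0) / u2
    return 1.0 - u1


# Height example from the note to Figure 12.14 (values in mm).
d_height = cohens_d(1755, 1620, 67.5)
print(d_height)
print(round(100 * overlap_from_d(d_height), 1))

# Self-esteem example from Figure 12.13.
print(round(100 * overlap_from_d(0.22), 1))
```

Under this assumption the height example gives d = 2.00 with an overlap that rounds to the 18.9% shown in Figure 12.14. For d = .22 the sketch gives an overlap of roughly 84%, close to but not identical to the 83.7% in Figure 12.13, possibly because the plotted d was not rounded to exactly .22.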


Wilkinson, L., & Task Force on Statistical Inference, APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed.). New York: McGraw-Hill.

Wootson, C. R., Jr. (2017, July 18). Diet drinks are associated with weight gain, new research suggests. The Washington Post. Retrieved from https://www.washingtonpost.com/news/to-your-health/wp/2017/07/18/diet-drinks-are-associated-with-weight-gain-new-research-suggests/?utm_term=.83b6d025e6b5

Zumbo, B. D., & Zimmerman, D. W. (1993). Is the selection of statistical methods governed by level of measurement? Canadian Psychology, 34, 390-400.


Index

Accidental sample, 30
Aggregated data, 283
Alpha (α) level
  choosing of, 198-200
  reject regions, 200-202
Alternative hypothesis (Ha)
  reject regions, 200-202
  selection of, 195-197
American Statistical Association Guidelines for Assessment and Instruction in Statistics Education, 1, 10
Analysis
  bivariate. See Bivariate analyses
  definition of, 16
  selection of, 28-29
  selective reporting of, 567
  variable type based on, 18
Analysis of covariance (ANCOVA), 507
Analysis of variance (ANOVA)
  assumptions for, 377-378
  computations for, 378-383
  confidence intervals for group means, 385
  contrast coefficients, 388
  data screening for, 377-378
  description of, 306
  division of scores into components, 400-403
  effect sizes for, 385-386
  error in, 375
  expected F value, 403-405
  factorial, 374. See also Factorial ANOVA
  factor in, 374
  hypothetical research example for, 375
  independent-samples t test and, 405-406
  Kruskal-Wallis test versus, 407-408
  nonparametric alternative to, 407-408
  null hypothesis for, 375, 403-405
  planned contrasts, 387-389, 399, 467
  post hoc or protected tests, 390-391, 467
  preliminary data screening for, 377-378
  questions in, 375-376
  repeated-measures. See Repeated-measures ANOVA
  reporting results from, 397-398
  research situations for, 374-375
  source table, 452
  in SPSS, 391-394
  in SPSS, output, 394-397
  SSbetween, 380-381, 383-385
  SStotal, 381-382
  SSwithin, 381, 383-385
  SSwithin groups, 378
  statistical power analysis for, 386-387
  study, planning of, 398-399
  summary of, 399-400
  t test versus, 374, 405-406
Anecdotal evidence, 4
Anecdotes, 4
ANOVA. See Analysis of variance
A priori comparisons, 376
Areas under normal distribution, 138-140
  z scores and, 138
Area under the curve, 153
Arithmetic operations, order of, 94
Artifacts, 253
Associations, 237-239
Asymptotic output, 543
Attenuation (of correlation) due to unreliability, 282
Attrition, 432
Bar charts


  construction of, 101-102
  data screening uses of, 122-124
  deceptive, 11, 102-103
  for frequencies of categorical variables, 100-101
  group means represented with, 125-126
Bar graphs, 11
Bell-shaped distribution
  communication about, 120
  description of, 78-79, 109-111, 138
  illustration of, 103-104
  mean for, 82
  sketching of, 109-111
Bernoulli trials, 161
Beta (β, risk for Type II error), 221, 223, 231, 299
Between-S, 28, 374, 413, 417, 437. See also Independent-samples t test
Between-S factorial ANOVA, 487-489
Bias
  confirmation, 2, 9-10
  in correlation, 278
Bimodal distribution, 79-80, 106
Binned frequency distribution, 48-49
Binning, 48, 113
Binomial distribution, 161, 163
Bivariate analyses
  dependent variables for, 563-564
  independent variables for, 563-564
  nonparametric, 564-565
  parametric, 564-565
  results section of, 569-570
  selection of, 563
  variables added to, 570-572
Bivariate outliers, 242, 244-246, 455
Bivariate Pearson correlation. See Pearson's r
Bivariate regression
  advantages of, 291
  comparing two forms of regression, 295-296
  information provided by, 290-291
  partition of sums of squares in, 312-313
  planning of study, 314-315
  predictor variables in, 314
  preliminary data screening for, 297-298
  questions for answering, 296-297
  regression equations, 291-296
  regression line, 291-292, 296, 304
  research situations using, 290
  statistical significance tests for, 300
Bivariate regression coefficients, 298-299
Bonferroni procedure, 256-257, 390, 468
Boxplot
  definition of, 115
  outer fences in, 117
  setting up, 115-117
  SPSS for obtaining, 117-120, 123
Carryover effects, 431, 474
Case number, 44
Cases, 16


Case study, 4
Categorical variables
bar charts for frequencies of, 100-101
data screening for, 46
dependent, 563-564
description of, 17, 38, 571-572
frequency distribution tables for, 40, 49-50
independent, 563-566
mode for, 45
naturally occurring groups, 50
pie charts for, 99-100
treatment groups, 50

Causal claims
description of, 6
“post hoc, ergo propter hoc” fallacy, 6

Causal inference
conditions for, 20-21
correlation and, 235
description of, 568-569
evidence of, requirements for, 8-9
Ceiling effect, 147
Central limit theorem, 169-173, 176

Central tendency measures

example of, 184-187

description of, 72

graphing of, 357-358

lying with, 83

for group means, 385

See also specific measure

independent-samples t test, 357-358

Change scores, 423
Cherry-picking, 2

interpreting of, 183

Chi-square (χ²) test

for regression coefficients, 300

95%, for Pearson's r, 273-274

computation of, 533-535

sample statistics obtained using, 187

description of, 28, 532

sampling error used to set up, 181-182

effect size indexes for, 552

Confidence levels

effect sizes, 536-538

for correlations, 257

expected cell frequencies if H₀ is true, 532-533

in research reports, 257

Confirmation bias, 2, 9-10

as “goodness of fit” index, 559

Confirmatory evidence, 195

one-way, 559

Confirmatory studies, null-hypothesis

results, reporting, 542-543

significance testing in, 224

SPSS, 538-540

Confounded variables, 21

SPSS, crosstabs procedure, 540-542

Confounds, 21, 23, 363

statistical significance of, 535-536

Consensus, 10

in structural equation modeling, 560

Construct validity, 314

uses of, 558-560

Contingency, defined, 526-528
Contingency tables

Citation, 4

Clinical significance, 217

association, measures of, 552

Close to the mean, 180

assumptions for, 543-551

Coefficient of determination, 258

chi-square analysis of. See Chi-square (χ²) test

Cognitive behavioral therapy, 5

Cohen’s d, 214-215, 217-218, 348-349, 353

conditional probabilities, 528-529

Communicators

data screening for, 543-551

credentials of, 3

description of, 524

skills of, 3

examples of, 524-526, 530-532

Complete counterbalancing, 473

expected cell frequencies if H₀ is true, 532-533

Completely crossed, 481

Computer simulations, 33

expected values in cells, minimum requirements for, 543-544

Conceptual replication, 9

Concordant pairs, 253

Fisher exact test, 556-557

Conditional probabilities, 528-529

groups, combining, 547-551

Confidence intervals

groups, removing, 544-547

around M₁ − M₂, 342

marginal distributions for X and Y constrain maximum value of φ, 557-558

body temperature example of, 184-187

description of, 143, 169

McNemar test, 553-556

null hypothesis for, 529-533


observations, independence of, 543

Cubic trends, 459

with repeated measures, 553-556

Cumulative percentage, 48, 137

2 × 2, 557
unconditional probability, 528
Contrast coefficients, 388, 393
Contrasts
description of, 376
in general linear model, 456-460
polynomial, 458
population variance of, 455
repeated, 457-458
simple, 457
Control variables, 571-572
Convenience sample, 30, 172

Correlation
alpha level for tests of, 256
attenuation of, due to unreliability, 282
bias in, 278
causal inference and, 235
causation and, 6
computation of, 252-253
confidence levels for, 257
cross-validation of, 256
differences between, testing significance of, 274-275
limiting the number of, 256
magnitude of, 283
meaning of, 6
as necessary but not sufficient condition, 7
Pearson product-moment, 6
perfect, 7-8
point biserial, 347
replication of, 256
skepticism about, 270-271
spurious, 263-264, 281

Correlational study, 24
Counterbalancing, 472-474
Covariance, 285-286
Cramer's V, 536, 540

Cronbach’s alpha reliability, 450


Data
aggregated, 283
definition of, 16
repeated-measures, 419-420, 444

Data analysis, 376
Data collection, ethical issues in, 10-11
Data organization
for independent-samples t test, 418-419
for paired-samples t test, 419
Data reporting, 39
Data screening
for ANOVA, 377-378
bar charts, 122-124
for categorical variables, 46
for frequency distribution tables, 39-40
graphs for, 121
preliminary, 149
Dataset, 16

Data torturing, 572

Deceptive bar graphs, 11, 102-103
Degree of belief, 12, 573
Degrees of freedom (df)
description of, 88-89, 179
in factorial ANOVA, 489-493
reject regions, 200-202
Dependent variables, 19, 314, 563-564
Descriptive statistics
in journal articles, 92-93
notation, 73
quantitative variables, 72, 150-151
reporting of, 92
SPSS use of, for obtaining quantitative variable, 83-85
Descriptive use of statistics, 167

Deviation from the mean, 136
Dichotomous variable, 242
Difference (d) scores, 420-421
Directional test, 196, 202

Disconfirmatory evidence, 195

340, 349, 357

Discordant pairs, 253

Equal variances not assumed t test, 333, 372

Distribution

Error

bell-shaped. See Bell-shaped distribution

definition of, 223, 375

bimodal, 79-80, 106

prediction, 171-172, 223

binned frequency, 48-49

sampling. See Sampling error

F, 383

technical types of, 224

frequency. See Frequency distribution

Type I, 220-223

Gaussian, 103, 163

Type II, 220-223

grouped frequency, 48-49

Error bars, in graphs of group means, 188-189, 357-359

J-shaped, 105, 120

normal. See Normal distribution

Errors in interpretation, 16

skewed, 80-82

Error variance

trimodal, 106

description of, 260

uniform, 105

within-group, 356

Distribution shapes, 98, 150-151

Eta squared (η²)
calculation of, from F ratio, 385
description of, 346-347, 364
Ethical issues, in data collection, 10-11

Effect size
for analysis of variance, 385-386
for chi-square test, 536-538
computation of, 349-350
description of, 214-216, 226, 300
in factorial ANOVA, 493-494
for independent-samples t test, 345-353, 429
indexes, 258
interpretation of, 351
N and, 353-355
for paired-samples t test, 429-430
Pearson's r and r² as, 258-261
for repeated-measures ANOVA, 470
summary of, 350-353
unit-free, 570

Effect size indexes
for chi-square test, 552
Cohen's d, 348-349
eta squared, 346-347, 364
for independent-samples t test, 345-353
M₁ − M₂, 345-346
point biserial r, 347-348
Equal variances assumed version of the t test,

Evidence
replication of, 9
selective reporting of, 567
supporting, 4-5
Exact replication, 9
Experimental controls, 21, 23, 569
Experimental error, 356
Experimental research design
control group in, 21
definition of, 21
experimental control in, 22
quasi-, 25-26
Experiments, quasi-, 25-26

Experiment-wise alpha (EW), 256
“Experiment-wise” error rate, 468, 496

Exploratory studies, 224
External validity, 27-28
Extreme bivariate outliers, 244-246
Extreme scores, 77-78

Extreme values, 198
Factor, in ANOVA, 374

Factorial ANOVA

weighted means, 508-510

assumptions, violations, 486

Factorial design, 481

A × B, test of, 484-485

F distribution, 383

between-S, 487-489

Fisher, R. A., 559

components in, 520

Fisher exact test, 556-557

degrees of freedom calculation in, 489-493

Fisher’s Z, 273

Fixed factors, 508

description of, 571

Floor effect, 146

effect size estimates in, 493-494

F ratio

fixed factors, 508

description of, 371, 379, 382, 399, 426

F ratio in, 483, 506

expected value, 403-405

group means, 505-506

in factorial ANOVA, 483, 506

hypothetical research situation, 486-487

Frequency, 40-41

Frequency counts, 40-41

main effect differences, 496

Frequency distribution

main effect for Factor A, null hypothesis for, 484

binned, 48-49

grouped, 48-49

main effect for Factor B, null hypothesis for, 484
model for, 515-518

ungrouped, 46-48
Frequency distribution tables

for categorical variables, 40, 49-50

nonexperimental research situations

cumulative percentage, 48

using, 482

for data screening, 39-40

nonorthogonal, partition of variance in,

elements of, 40-42

513-515

frequency counts, 40-41

null hypotheses in, 484-485

missing values, 41, 44, 63-65

orthogonal, 486, 489, 511-513

overview of, 37-39

questions in, 482-483

percentages in, 41-42

random factors, 508

proportions, 41

research situations using, 481-482

for quantitative variables, 39, 46-50

results, 504-505

SPSS for obtaining, 42-44

SPSS GLM procedure for, 496-499

total number of scores in a sample, 41

SPSS output, 499-504

ungrouped, 46-49

statistical power in, 494-495

variation among scores in, standard

summary of, 507

deviation for describing, 90-91

sum of squares in, 489-493, 505-506, 518-520

Friedman one-way ANOVA test, 476-478

Function

two-way, 515

definition of, 152

two-way interaction, 495-496

linear, 152

2 × 2, 495, 504
unequal cell ns in, 508, 510-515
unweighted means, 508-510

Gallup, 5, 29
Galton board, 161-162

Gaussian distribution, 103, 163

See also Normal distribution

Huynh-Feldt procedure, 466
Hypothesis

Generalizability, 5, 568

alternative, 195-197

Generalization, 16

null. See Null hypothesis (H₀)

General linear model
contrasts in, 456-460
definition of, 449
simple contrasts in, 457
SPSS procedure, for repeated-measures ANOVA, 460-464
variables added to, 474-475

GLM. See General linear model
Goodness of fit index, 559
Graphs

Hypothetical or imaginary population, 30
Imaginary population, 30
Imperfect association, 8

Independent-samples t ratio, 340-341, 350
Independent-samples t test
ANOVA and, 405-406
assumptions for use of, 332-338
computation of, 338-341
confidence interval around M₁ − M₂, 342

bar, 11

confidence intervals, 357-358

lying with, 11

confounds, 363

maps as formats for, 127

data organization for, 418-419

research uses of, 121-122

description of, 28-29, 122,413

Greenhouse-Geisser df, 465-466

effect size indexes for, 345-353, 429

Grouped frequency distribution, 48-49

formula for, 353

Group means

groups in, 414

bar charts used to represent, 125-126

hypothetical research example for, 331

comparisons among, 375

Mann-Whitney U test versus, 365, 367

distances among, information about, 380-381

nonparametric alternative to, 365-367

null hypothesis for, 403, 422

error bars in graphs of, 188-189

outliers within groups, 333-334, 337

factorial ANOVA, 505-506

paired-samples t test versus, 413, 426-429

Harmonic mean of n's, 387, 412

preliminary data screening, 335-338

Higher order polynomials, 459

research situations for, 329-331

Histograms

Results section, 357

examples of, 132-133, 145

sample size for, decisions about, 361-364

for groups, 123

SPSS commands for, 342-344

negatively skewed, 249

SPSS output for, 344-345

for quantitative variables, 103-107

SS partition in, 453-455

setting up, 111-115

statistical power for, 362

SPSS used to obtain, 107-109, 248

statistical significance of, 341-342

Homogeneity of variance, 332, 456

study design, issues in, 363-364

Homoscedasticity, 297

summary of, 364-365

Human error, 226

terms for, values of, 332-338

Hume, David, 12


Independent variables, 19, 314, 563-564, 571-573

description of, 18, 79

Index
effect size. See Effect size indexes

justification for using, 33-34
questionnaire item on, 79

for kurtosis, 158-159

Linear function, 152, 458

for skewness, 157-158

Linear relationships, 235

Inferential use of statistics, 167

Linear trend contrast, 389

Institutional animal care and use committee, 10

Literature reviews, 2

Logistic regression, 3

Interaction effect, 487

Lower tail, 142-143, 180

Intercept (b₀), 291
Internal validity
description of, 27-28
threats to, 432
Interpretation, errors in, 16

Interquartile range, 116
Interval level of measurement, 32

M₁ − M₂, 345-346

Main effects-only model, 486
Mann-Whitney U test, 362, 365, 367, 407, 564
Maps, 127-128

Marginal frequencies, 525
Margin of error

Journal articles, descriptive statistics in, 92-93

description of, 188

for percentages in surveys, 553

J-shaped distribution, 105, 120
Kendall's tau correlations, 272-273, 563
Kolmogorov-Smirnov test, 148, 159

Kruskal-Wallis test, 407-408
Kurtosis
description of, 148
formula for, 158
index for, 158-159
Latin squares, 473-474

Leptokurtic distribution shape, 148
Level of confidence, 181
Levels of a factor, 482
Levels of measurement, 31-33
interval, 32
nominal, 31-32
ordinal, 32
ratio, 32-33
in SPSS, 61-63

Levene F test, 333, 345, 357, 371, 377, 394, 499

Matched pairs, 416-417
Mauchly’s sphericity test, 456, 465-466
Maximum scores, 85-86
McNemar test, 553-556, 563

Mean
advantages of, 76-77
for bell-shaped distributions, 82
definition of, 73
deviation from, 76, 136
of difference scores, 420
disadvantages of, 77-78
obtaining of, 74-75
in real-world situations, 78
sum of deviations from M = 0, 75-77
when to choose, 82-83
See also Group means

Median
definition of, 73
obtaining of, 73-74
in real-world situations, 78
when to choose, 82-83
Meta-analysis, 353

Likert scale

Minimum scores, 85-86

Nonparametric analyses, 564-565

Missing values

No person-by-treatment interaction, 456

frequency distribution tables, 41, 44, 63-65

Normal distribution

areas under, 138-140

SPSS, 61

definition of, 135

Mixed models, 507

description of, 103-105, 120

Mode

development of, 163

for categorical variables, 45

locations of individual scores in, 135

in real-world situations, 78

lower tail of, 142-143, 180

when to choose, 82-83

mathematics of, 152-154

Moderator variables, 572

middle area of, 142, 180

Monte Carlo simulation

negatively skewed, 146-147

sampling distribution of M, 175

outliers relative to, 144-145

sampling error in, 171

positively skewed, 104, 146-147

Multiple-point rating scales, 18

real-world variables, 160-163

Multivariable analyses, 573

skewness of, 146-147

Multivariate analyses, 573

standard, 135, 140-141, 154

Multivariate analysis of variance (MANOVA), 507, 573

upper tail of, 142-143, 180

Normal distribution shape

Naturally occurring groups, 50
Naturally occurring pairs, 415-416
Necessary but not sufficient, 7
Negatively skewed distribution, 105
Negatively skewed histograms, 249
New Statistics approach, 210
Nominal level of measurement, 31-32
Nominal variables, 17
Nonadditivity, 476
Nondirectional test, 196, 202, 204-206
See also Two-tailed test
Nonequivalent control group, 25
Nonexperimental research design, 24-25
Nonlinear relationships, 284-285
Nonorthogonal factorial ANOVA, 513-515
Nonparametric alternatives
to ANOVA, 407-408
to independent-samples t test, 365-367
to paired-samples t test, 438-440
to Pearson's r, 271-273
to repeated-measures ANOVA, 476-478

description of, 138-139
overall departure from, 159-160
Normality
departure from, 157-158
description of, 148-149
Null hypothesis (H₀)
ANOVA, 403-405
contingency table analysis, 529-533
expected F value when true, 403-405
in factorial ANOVA, 484
false, 222
formal, 254
for independent-samples t test, 403, 422
“no-interaction,” 516
for paired-samples t test, 422
planned contrasts, 387-388
rejection of, 218, 220, 225
for repeated contrasts, 457
for repeated-measures ANOVA, 444
rho (ρ₀ = 0), 254-255
stating of, 194-195
Null-hypothesis significance testing (NHST)

alternative hypothesis, 195-197

Repeated-measures ANOVA

in confirmatory studies, 224

Open Science model, 9

definition of, 193

Order effects, 430-431

disconfirmatory evidence, 195

Order of arithmetic operations, 94

in exploratory studies, 224

Ordinal level of measurement, 32

logic of, 194-195, 210, 255

Ordinal variables, 17-18

null hypothesis, 194-195, 218

Orthogonal contrasts, 389

rules for using, 203-204

Orthogonal factorial ANOVA, 486, 489, 511-513

traditional approach to, 210

Type I error in, 221-222

Outcome variables, 314

Type II error in, 221

Outer fences, 117

Null outcomes, 225
Numeracy guidelines, 1
Odds ratios, 3, 552
OLS derivation of equation for regression coefficients, 321-323
Omnibus test, 375
One-sample t test
assumptions for, 203
description of, 193, 197-198, 329

Outliers, 314
bivariate, 242, 244-246, 455
definition of, 144
independent-samples t test, 333-334, 337
normal distribution and, 144-145
Pearson's r and, 242
in SPSS, 154-157

Paired-samples t test

equation for, 215

advantages of, 437

one-tailed, reporting results for, 209

assumptions for, 422-423, 433-437

questions for, 203

data organization for, 419

reporting results, 227

designs, 414

SPSS analysis, 206

difference (d) scores, 420-421

two-tailed, reporting results for, 207-208

effect size for, 429-430

as follow-up, 468-469

One-tailed p values, 201

formulas for, 423-424

One-tailed test

hypothetical study, 417-418

advantages of, 209-210

independent-samples t test versus, 413, 426-429

description of, 196

disadvantages of, 209-210

matched pairs, 416-417

driving speed data analysis using, 208-209

naturally occurring pairs, 415-416

nonparametric alternative to, 438-440

one-sample t test, 209

null hypothesis for, 422

reject region, 208

paired samples, 415-417

two-tailed tests versus, 209-210

repeated-measures ANOVA versus, 443

One-way between-subjects (between-S) ANOVA. See Analysis of variance

results for, 426-429, 433

SPSS procedure, 424-426

One-way repeated-measures ANOVA. See

summary of, 437-438


terms for, values of, 428

Percentages

variance in, 429

cumulative, 48, 137

Wilcoxon signed rank test versus, 438-440

in frequency distribution tables, 41-42

in surveys, margin of error for, 553

Parametric analyses, 564-565

Percentile rank, 48

Parametric statistics, 564

Perfect correlation, 7-8

Partition of sums of squares, 312-313

Perfect negative correlation, 237

Partition of variance

Person effects, 427

definition of, 520

Person × Treatment interaction, 449, 475-476

in nonorthogonal factorial ANOVA, 513-515

p-hacking, 213, 564, 572

Pearson product-moment correlation. See Pearson's r

Phi coefficient (φ), 536

Pie charts

Pearson's r

for categorical variables, 99-100

artifacts that affect, 253

disadvantages of, 100

assumptions for, 242-244

Plagiarism, 4

bivariate outliers and, 242, 244-246

Planned contrasts, 376, 399, 467

computation of, 251-252, 285-286

ANOVA, 387-389, 399, 467

definition of, 251, 290

null hypothesis, 387-388

deflation of, 275-276

Platykurtic distribution shape, 148

description of, 29

Plotting residuals, 315-318

distribution shapes, 243-244

Point biserial r, 347-348

Fisher’s Z conversion of, 273

Political polls, 188

formula for, 285-286

Polling organizations, 5, 29

magnitude of, 235, 275-283

Polynomial contrasts, 458

95% confidence interval for, 273-274

Pooled-variances t test, 340, 349, 357

nonparametric alternatives to, 271-273

Popper, Karl, 10

outcomes for, 262-264

Population

overestimation of, 278-280

definition of, 29

phi coefficient interpreted as, 537

hypothetical, 30

preliminary data screening for, 244

imaginary, 30

r and r² as effect sizes, 258-261

notation for, 168-169

research example of, 246-250

sample and, 16, 29

research situations for, 234

sample versus, 172

results sections for, 269-270

Population effect size, 220, 222

Spearman's r versus, 271-273

Population sampling distribution, 176-177

statistical power for, 261

Population standard deviation (σ), 177-178

statistical significance of, 262

Population standard error (σ_M)

t ratio from, 255

factors that influence, 173

when r = 0.0, 240-241

N effects on value of, 173-176

Peer review, 9

Population variance of contrasts, 455

Positively skewed distribution, 104, 146

frequency distribution tables for, 39, 46-50

“Post hoc, ergo propter hoc” fallacy, 6

Post hoc power analysis, 220, 262, 363

histograms for, 103-107

Post hoc tests, 376, 467

independent, 563-564

Practical significance

questions about, 72

description of, 208, 217

SPSS use of descriptive statistics for, 83-85

statistical significance versus, 567-568

Prediction error, 171-172, 223, 298

Quasi-experimental research design, 25-26

Predictor variables, 314

Quasi-experiments, 25-26

Preliminary data screening

Quincunx, 161-162

for ANOVA, 377-378

for bivariate regression, 297-298
description of, 149
for independent-samples t test, 335-338
for Pearson's r, 244
Primary source, 2

Probability
conditional, 528-529
unconditional, 528
Proportions, in frequency distribution tables, 41
Protected tests, 376, 390-391, 467

“Protective factors,” 8
Proximal similarity model, 30
p values
critical evaluation of, 226
definition of, 198
exact, 206-207
limitations of, 567
misleading, 226
one-tailed, 202
problems with, 213
things not to say about, 211, 228
two-tailed, 201, 209

Quintic trends, 459-460
Random assignment of participants to groups or conditions, 22-23
Random factors, 508
Random sampling of participants from a population, 22
Range, 85-86

Range rule, 90
Rating scales
description of, 18
justification for using, 33-34
Ratio level of measurement, 32-33
Raw-score prediction equation, 309
Raw-score regression equation, 294, 308

Real-world variables, 160-163
Regression coefficients
confidence intervals for, 300
description of, 291
OLS derivation of equation for, 321-323
Regression equations
description of, 291-296, 307
graphing a line from two points obtained from, 320-321
Regression line, 291-292, 296, 304

q ratio, 390-391

Regression slope, 291

Quadratic trends, 458-460

Reject regions

Quality control, 9

definition of, 198

Quantitative variables

specifying of, 199-202

dependent, 563-566

Reliability, 314

description of, 17-18, 38

Repeated contrasts, 457-458


Repeated measures, contingency tables with,

Research designs

553-556

description of, 21

Repeated-measures analyses

experimental. See Experimental research

carryover effects in, 431

design

order effects in, 430-431

Researcher credentials, 3

participants, 431-432

Research questions, 19-20

Repeated-measures ANOVA

Research reports

advantages of, 472, 475

confidence levels in, 257

assumptions for, 455-456

description of, 571

computations for, 446-449

language in, 2-3

counterbalancing in, 472-474

peer review of, 9

data, preliminary assessment of, 444-446

Residuals

definition of, 298

description of, 571

plotting, 315-318

effect size, 470

standardized, 317

Friedman one-way ANOVA test versus, 476-478

Results

ANOVA, 397-398

GLM, 460-464

bivariate analyses, 569-570

GLM, contrasts in, 456-460

chi-square tests, 542-543

GLM, output of, 464-468

factorial ANOVA, 504-505

GLM, variables added to, 474-475

generalizability of, 5

nonparametric alternative to, 476-478

independent-samples t test, 357

null hypothesis for, 444

interpretation of, problems in, 31

overview of, 443-445

one-sample t test, 207-209, 227

paired-samples t test versus, 443

paired-samples t test, 426-429, 433

Person × Treatment interaction, 475-476

Pearson’s r, 269-270

repeated-measures ANOVA, 469-470

results for, 469-470

Reverse J-shaped distribution, 105, 146

SPSS reliability procedure for, 449-453

“Risk factors,” 8

SS partition in, 453-455

Rival explanatory variables, 21

statistical power of, 470-472

Robust analyses, 564

summary of, 475

Rounding, 94-95

Repeated-measures data, 419-420
Replication, 9
Representative sample, 30, 172

Research
analysis of variance, 374-375
bivariate regression uses in, 290
future, planning of, 227
graph uses in, 121-122
past, understanding of, 226

Row percentages, 526-528

Sample
accidental, 30
convenience, 30, 172
definition of, 29
notation for, 168-169
population and, 16, 29
population versus, 172

B, sₚ would be relatively large. If other factors (effect size and N) are held constant, there would be a better chance of obtaining a large t value for Study A than for Study B. Recruiting similar participants can help with statistical power, but it also reduces generalizability of findings. The participants in Study A are not diverse.

12.11.4 Summary for Design Decisions

Members of my undergraduate class became upset when I explained the way research design decisions can affect the values of t. They said, “You mean you can make a study turn out any way you want?” The answer is, within some limits, yes. The independent-samples t ratio is likely to be large for these situations and decisions. (For each factor, such as N, add the condition “other factors being equal.”)

• N is large (a very large N study can yield a statistically significant t ratio even if the population effect is very small).
• Population effect size such as η² is large (this is often related to treatment dosages or types of participants being compared).
• M₁ − M₂ is large (however, M₁ − M₂ is not interpretable if confounds are present).
• sₚ is small (this happens when participant characteristics and assessment situations are homogeneous within groups).

Depending on their research questions and resources, the degree to which researchers can control each of these factors may vary.

12.12 Results Section

Following is an example of a “Results” section for the study of the effect of caffeine consumption on heart rate.

Results

An independent-samples t test was performed to assess whether mean heart rate differed significantly for a group of 10 participants who consumed no caffeine (Group 1) compared with a group of 10 participants who consumed 150 mg of caffeine. Preliminary data screening indicated that scores on heart rate were reasonably normally distributed within groups. There were two high-end outliers in Group 1, but they were not extreme; outliers were retained in the analysis. The mean heart rates differed significantly, t(18) = -2.75, p = .013, two-tailed. Mean heart rate for the no-caffeine group (M = 57.8, SD = 7.2) was about 10 beats per minute lower than mean heart rate for the caffeine group (M = 67.9, SD = 9.1). The effect size, as indexed by η², was .30; this is a very large effect. The 95% CI for the difference between sample means, M₁ − M₂, had a lower bound of -17.81 and an upper bound of -2.39. This study suggests that consuming 150 mg of caffeine may significantly increase heart rate, with an increase on the order of 10 bpm.

The assumption of homogeneity of variance was assessed using the Levene test, F = 1.57, p = .226; this indicated no significant violation of the equal variance assumption. Readers generally assume that the equal variances assumed version of the t test (also called the pooled-variances t test) was used unless otherwise stated. If you see df reported to several decimal places, this tells you that the equal variances not assumed t test was used.
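The figures in this Results example can be cross-checked from the reported summary statistics alone. The sketch below is a minimal illustration, not SPSS output: it assumes the equal variances assumed (pooled-variances) t test with n = 10 per group, takes the two-tailed critical t for df = 18 (about 2.101) from a standard t table, and uses the conversion η² = t²/(t² + df). The function and variable names are hypothetical.

```python
import math

def pooled_t_summary(m1, sd1, n1, m2, sd2, n2, t_crit):
    """Pooled-variances t test computed from group summary statistics.

    t_crit is the two-tailed critical t for df = n1 + n2 - 2, supplied by
    the caller (an assumption here: looked up in a t table).
    """
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two group variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    s_p = math.sqrt(pooled_var)
    # Standard error of the difference between the two sample means.
    se_diff = s_p * math.sqrt(1 / n1 + 1 / n2)
    diff = m1 - m2
    t = diff / se_diff
    return {
        "df": df,
        "s_p": s_p,
        "t": t,
        "eta_squared": t**2 / (t**2 + df),
        "ci_lower": diff - t_crit * se_diff,
        "ci_upper": diff + t_crit * se_diff,
    }

# Summary values reported in the caffeine Results example.
result = pooled_t_summary(57.8, 7.2, 10, 67.9, 9.1, 10, t_crit=2.101)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, this gives t = -2.75, η² = .30, and a 95% CI from -17.81 to -2.39, in line with the reported values. Rerunning the function with a larger n per group while holding the means and SDs fixed yields a larger absolute t, which is the first design factor in the Section 12.11.4 summary.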


paired-samples t test procedure, 424-426, 468

Standardized residuals, 317

Standard normal distribution, 135, 154

Q-Q plot, 160

kurtosis for, 159

relationship survey example, 264-269

reading tables of areas for, 140-141

reliability procedure, for repeated-measures ANOVA, 449-453

Standard regression, 514

Standard score, 135

salary data, 301-305

See also z scores (standard scores)

Save Output As dialog box in, 51, 55

Statistical analysis, 28-29

statistical power, 362

Statistical association, 7

Tukey HSD test, 391, 396

Statistical control, 21, 446, 569

variables, properties, 58-63

Statistically significant outcomes

variable names, defining, 58-63

human error as cause of, 226

Wilcoxon signed rank test output, 440

interpretation of, 225-226

windows, moving between, 54-56

sampling error as cause of, 226

z scores, saving, 164-165

SPSS GLM procedures

Statistical power
assessment of, 300

description of, 449

for correlation studies, 261-262

effect sizes, 493

definition of, 261, 361

factorial ANOVA using, 496-499

description of, 218-220, 229-232

repeated-measures ANOVA, 460-464

in factorial ANOVA, 494-495

source tables, 492-493

repeated-measures ANOVA, 470-472

Spurious correlation, 263-264, 281

Statistical power analysis

SSbetween, 380-381, 383-385

for ANOVA, 386-387

SSerror, 445

description of, 352, 361

SStotal, 381-382, 487
SStreatment, 445

SSwithin, 381, 383-385
SSwithin groups, 378
Standard deviation
computation of, 115
population, 177-178
sample (s), 89
variation among scores in frequency table, 90-91
Standard error of the difference, 340
Standard error of the estimate (SEest), 318-319

Statistical power tables, 218-219, 300, 361
Statistical significance
chi-square test, 535-536
definition of, 217, 372
independent-samples t test, 341-342
Pearson’s r with, 262
practical significance versus, 567-568
Statistical significance tests
avoidance of, 255-256
description of, 193
formulas for, 353
limitations of, 567
logic of, 193

number of, 226

Standard error (SE) for Md, 424, 427

uncertainty in, 224

Standardization, 23, 136

Statisticians, 10-11

Standardized regression equation, 294-295

Statistics


12.13 Graphing Results: Means and CIs

Cumming and Finch (2005) suggested that authors should emphasize confidence intervals along with effect sizes. Graphs of CIs help focus reader attention on these. Several types of CI graphs can be presented for the independent-samples t test. We could set up a graph of the CI for the (M₁ − M₂) difference using either an error bar or a bar chart. The lower and upper limits of this CI are provided in the independent-samples t test output. It is more common to show a CI for each of the group means (M₁ and M₂). This can be done with either the SPSS error bar or bar chart procedure. To obtain an error bar graph for M₁ and M₂, make the menu selections shown in Figure 12.15, Figure 12.16, and Figure 12.17.

In Figure 12.18 the separate vertical lines for each group (no caffeine, 150 mg caffeine) have two features. The dot represents the group mean. The T-shaped bars identify the lower and upper limits of the 95% CI for each group. Be careful when you examine error bar plots in journals or conference posters. Error bars that resemble the ones in Figure 12.18 sometimes represent the mean ± 1 standard deviation, or the mean ± 1 SE, instead of a 95% CI. Graphs should be clearly labeled so that viewers know what the error bars represent.
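To see how much the choice of error bar matters, the three conventions mentioned here can be computed for the caffeine study's group summaries. This is a hedged sketch rather than the SPSS procedure: it assumes each group's 95% CI is built from that group's own SD with df = n - 1 = 9, with the critical t (about 2.262) taken from a standard t table; the function name is hypothetical.

```python
import math

def error_bar_limits(mean, sd, n, t_crit):
    """Return (lower, upper) limits for three common error bar conventions."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return {
        "mean +/- 1 SD": (mean - sd, mean + sd),
        "mean +/- 1 SE": (mean - se, mean + se),
        "95% CI": (mean - t_crit * se, mean + t_crit * se),
    }

# Group summaries reported for the caffeine study (n = 10 per group).
groups = {
    "no caffeine": (57.8, 7.2, 10),
    "150 mg caffeine": (67.9, 9.1, 10),
}
T_CRIT_DF9 = 2.262  # two-tailed .05 critical t for df = 9 (from a t table)

bars = {label: error_bar_limits(m, sd, n, T_CRIT_DF9)
        for label, (m, sd, n) in groups.items()}
for label, limits in bars.items():
    for kind, (lower, upper) in limits.items():
        print(f"{label:16s} {kind:14s} [{lower:.2f}, {upper:.2f}]")
```

For these data the ±1 SE bars are the narrowest and the ±1 SD bars the widest, so an unlabeled graph can make the group difference look more or less clear-cut than a 95% CI would.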

Figure 12.15 SPSS Menu Selections for Error Bar Procedure

The image is an SPSS menu selection to obtain the error bar procedure for the file hrcaffeine.sav.

At the top of the spreadsheet are the following menu buttons: file, edit, view, data, transform, analyze, graphs, utilities, extensions, window and help. Below these buttons are icon buttons to open a file, save, print, go back and forward, and other table editing options.

The graphs menu has been opened and the following selections are visible: chart builder, graphboard template chooser, Weibull plot, compare subgroups and legacy dialogs. The legacy dialogs menu has been opened to show the following menu options: bar, 2-D bar, line, area, pie, high-low, box plot, error bar, population pyramid, scatter or dot and histogram.

......

There is some data visible on the spreadsheet. This has been reproduced below:

Caffeine, hr 1,51 1,66 1,58 1,58 1,53


Variability, 83-84

standardization of, 136

Variables

unit free, 136

analysis based on, 18

values of, 137

categorical. See Categorical variables

X scores converted into, 136

confounded, 21
control, 571-572
definition of, 16
dependent, 19, 314, 563-566
dichotomous, 242
independent, 19, 314, 563-566
moderator, 572
nominal, 17
nonexperimental design with, 24
ordinal, 17-18
outcome, 314
predictor, 314
quantitative. See Quantitative variables
rating scale, 18
real-world, 160-163
types of, 17-19
Variance
homogeneity of, 332
partition of. See Partition of variance
reasons for, 91-92, 313-314
sphericity of, 456
Weighted means, 508-510
Wilcoxon signed rank test, 438-440
Within-group error variance, 356

Within-S, 28, 413, 418, 443. See also Repeated-measures ANOVA
*ZPRED, 315
z ratio, 176-177
*ZRESID, 315

z scores (standard scores)
areas and, 138
computation of, 135
definition of, 135
finding of, 136-137
saving of, 164-165
