Research Methods and Statistics for the Social Sciences: A Brief Introduction [1 ed.] 1516577299, 9781516577293

Research Methods and Statistics for the Social Sciences: A Brief Introduction provides students with an accessible and p


English Pages 146 [142] Year 2020


Table of Contents
Chapter 1: Why Am I Here?
Chapter 2: Laying the Foundation: Quantitative Research and Statistics
Chapter 3: Introduction to Statistics Fundamentals
Chapter 4: Standardized Scores, Correlational Research Design, and Calculating Pearson's r
Chapter 5: Psychological Assessments, Reliability, and Validity
Chapter 6: Experimental Designs
Chapter 7: How Do I Analyze My Results From My Between-Subjects Experiment? Independent T-Tests and Dependent T-Tests
Chapter 8: How Do I Analyze My Results From My Within-Subjects Experiment? Dependent T-Tests
Chapter 9: Complex Experiments Mean More Complex Statistics: One-Way ANOVA
Chapter 10: Experiments With Two Independent Variables: Two-Way ANOVA
Chapter 11: Chi-Square: When My Study Has a Categorical Dependent Variable
Chapter 12: Qualitative Research and Data Analysis
Chapter 13: Writing Your APA Report
Appendix: Cutoff Score Tables for Statistical Tests
Index

RESEARCH METHODS & STATISTICS FOR THE SOCIAL SCIENCES A Brief Introduction

Amber DeBono, Ph.D.


Research Methods and Statistics for the Social Sciences A Brief Introduction

Amber DeBono

WINSTON-SALEM STATE UNIVERSITY


Bassim Hamadeh, CEO and Publisher
Amy Smith, Senior Project Editor
Christian Berk, Production Editor
Emely Villavicencio, Senior Graphic Designer
Stephanie Kohl, Licensing Coordinator
Natalie Piccotts, Director of Marketing
Kassie Graves, Vice President of Editorial
Jamie Giganti, Director of Academic Publishing

Copyright © 2021 by Cognella, Inc. All rights reserved. No part of this publication may be reprinted, reproduced, transmitted, or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information retrieval system without the written permission of Cognella, Inc. For inquiries regarding permissions, translations, foreign rights, audio rights, and any other forms of reproduction, please contact the Cognella Licensing Department at rights@cognella.com.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

All software screenshots are Copyright © by Microsoft or IBM Corporation. Cover image copyright © 2017 iStockphoto LP/SolStock. Design image copyright © 2013 Depositphotos/foxiedelmar.

Printed in the United States of America.

Cognella | 3970 Sorrento Valley Blvd., Ste. 500, San Diego, CA 92121

Brief Contents

Preface  xi
CHAPTER 1  Why Am I Here?  2
CHAPTER 2  Laying the Foundation: Quantitative Research and Statistics  10
CHAPTER 3  Introduction to Statistics Fundamentals  24
CHAPTER 4  Standardized Scores, Correlational Research Design, and Calculating Pearson's r  32
CHAPTER 5  Psychological Assessments, Reliability, and Validity  44
CHAPTER 6  Experimental Designs  50
CHAPTER 7  How Do I Analyze My Results From My Between-Subjects Experiment? Independent T-Tests and Dependent T-Tests  58
CHAPTER 8  How Do I Analyze My Results From My Within-Subjects Experiment? Dependent T-Tests  68
CHAPTER 9  Complex Experiments Mean More Complex Statistics: One-Way ANOVA  74
CHAPTER 10  Experiments With Two Independent Variables: Two-Way ANOVA  82
CHAPTER 11  Chi-Square: When My Study Has a Categorical Dependent Variable  90
CHAPTER 12  Qualitative Research and Data Analysis  98
CHAPTER 13  Writing Your APA Report  106
Appendix: Cutoff Score Tables for Statistical Tests  122
Index  127

Detailed Contents

CHAPTER 1  Why Am I Here?
  Learning Objectives
  Why I Wrote Your Textbook
  Why You Are Here
  Becoming an Ethical Researcher
  Chapter Summary
  Key Terms
  Check-In Questions
  References

CHAPTER 2  Laying the Foundation: Quantitative Research and Statistics
  Learning Objectives
  First Things First: Talking the Talk
  What Is a Variable?
    Independent Versus Dependent Variable
    Categorical Versus Continuous Variables
  Developing Your Hypothesis
    Research Versus Null Hypothesis
    Directional Versus Non-Directional Research Hypothesis
  What Is Quantitative Research and Statistics?
    Quantitative Versus Qualitative
    Quantitative Research Involves Numerical Data
  Let's Start With the Easy Stuff
    Introduction to Frequencies and Percentages
    Big N Versus Little n
    Normal Curve
    Skewed Distributions
  Probability
    Probability and Null Hypothesis Testing: p

[Scatterplot of self-esteem and GPA for the five participants.]

FIGURE 4.4. This scatterplot shows the five participants' data for self-esteem and GPA (one plot represents two participants' data because they scored the same on self-esteem and GPA). Note that the slope is upward from left to right, indicating a positive correlation. Also, the plots are fairly close to the regression line, indicating a strong correlation. Indeed, a correlation of .95 is quite strong, as indicated by our calculations!

Correlation coefficient (r)     Strength             Direction
Between 0 and .10               Zero or near zero    Positive
Between .10 and .30             Weak                 Positive
Between .40 and .60             Moderate             Positive
Between .70 and .90             Strong               Positive
1.00                            Perfect              Positive
Between 0 and -.10              Zero or near zero    Negative
Between -.10 and -.30           Weak                 Negative
Between -.40 and -.60           Moderate             Negative
Between -.70 and -.90           Strong               Negative
-1.00                           Perfect              Negative

Calculating Z-Scores and Correlations in Excel

Let's take a look at the loneliness scores we've been analyzing over the last few chapters. Excel can turn our loneliness scores into z-scores (standardized scores). To do this, we type =STANDARDIZE(score, mean, standard deviation). Here, we moved the scores over a column so that the previous calculations can be properly labeled. To standardize 5, we type in =STANDARDIZE(B2, B7, B11).
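Excel's STANDARDIZE can also be cross-checked by hand. Here is a minimal Python sketch, assuming (as Excel does in this example) the sample standard deviation and the five loneliness scores from the running example:

```python
import statistics

# The five loneliness scores from the running example
scores = [5, 9, 10, 8, 7]

mean = statistics.mean(scores)   # 7.8
sd = statistics.stdev(scores)    # sample standard deviation, about 1.9235

# z = (score - mean) / standard deviation, rounded to the 100th place (APA style)
z_scores = [round((s - mean) / sd, 2) for s in scores]
print(z_scores)  # [-1.46, 0.62, 1.14, 0.1, -0.42]
```

For the score 5 this gives -1.46, the same value the spreadsheet reports after rounding to the 100th place.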

[Excel screenshot: the loneliness scores (5, 9, 10, 8, 7) in one column with a Zlonely column beside them; the summary rows show Mean 7.8, Median 8, Mode #N/A, Variance 3.7, and Standard Deviation 1.923538. The formula bar shows =STANDARDIZE(B2, B7, B11) with the tooltip STANDARDIZE(x, mean, standard_dev).]

You should see -1.45565, which we would round to -1.46. Do the same for the remaining

scores. You should see this:

[Excel screenshot: the completed Zlonely column shows -1.45565 for 5, 0.62385 for 9, 1.143726 for 10, 0.103975 for 8, and -0.4159 for 7, alongside the same summary statistics (Mean 7.8, Median 8, Mode #N/A, Variance 3.7, Standard Deviation 1.923538).]

Again, let's double-check that these z-scores make sense. The score 10 is higher than the mean (7.8), so the z-score should be a positive number. Sure enough, it's positive (Z = 1.14). The score 5 is lower than the mean. Sure enough, it has a negative z-score (Z = -1.46). Again, it's always good to double-check the calculations because Excel isn't perfect, and the people entering these formulas aren't perfect either. And, as a friendly reminder, always round to the 100th place if you're using APA format.

Now, let's add a new variable: aggression. Imagine that the researcher not only measured how lonely participants are, but also measured how aggressive they are. Let's reset our Excel spreadsheet to include only the loneliness scores and our new aggression scores like this:

CHAPTER 4 * STANDARDIZED SCORES, CORRELATIONAL RESEARCH DESIGN, AND CALCULATING PEARSON’S R

Loneliness    Aggression
5             1
9             3
10            4
8             9
7             5

Now, let's get Excel to calculate the correlation for us. We do this by entering =CORREL(first variable range, second variable range); in this case, the first variable's scores are in A2 through A6 and the second variable's scores are in B2 through B6, so we type =CORREL(A2:A6, B2:B6). See image 4.4 to see exactly how to do this:

[Excel screenshot: the loneliness and aggression columns with the =CORREL formula entered beside them.]

When you hit enter, you will see that the Pearson’s r is .32. Behind the scenes, Excel calculated the z-scores and then calculated the formula for r. Unfortunately, Excel will not tell us the p-value for this r. So, we will need to look at the correlation table on page 42. In this case, we had five participants (two scores from each participant). The cutoff score for r with N = 5 (note that you need to subtract 2 from your N) is .878 for a two-tailed test. Is .32 higher than our cutoff of .878? No, it is not, so we do not have evidence to support the research hypothesis (in this case, that loneliness is positively correlated with aggression). This shouldn’t be surprising with only five participants. We need much larger samples in order to represent the population. Note that this r would be significant if we had 42 participants (the cutoff score is .304 for 42 participants and .32 would then be higher than this cutoff). However, it’s also important to realize that just because you get 42 participants doesn’t mean that the correlation will stay the same (it can go higher or lower).
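The CORREL result is easy to verify by hand. Here is a short Python sketch of the standard deviation-score formula for Pearson's r (algebraically equivalent to the z-score method described in this chapter), using the same five pairs of scores:

```python
import math

loneliness = [5, 9, 10, 8, 7]
aggression = [1, 3, 4, 9, 5]

def pearson_r(x, y):
    """Pearson's r: sum of deviation cross-products over the product of deviation magnitudes."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

r = pearson_r(loneliness, aggression)
print(round(r, 2))  # 0.32, matching Excel's =CORREL(A2:A6, B2:B6)
```

The unrounded value is about .324, which will match the SPSS output discussed next.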

When writing our research reports, we need to report exact p-values to the 1000th place. How do researchers do this? Most of us use a statistical program called SPSS. This is a software program that analyzes small and large datasets quickly. Let's take a look at SPSS with this same dataset (to set up this dataset, you need to create variables in the Variable View tab). As you can see, it looks a lot like the Excel spreadsheet:


[SPSS Data View: the Loneliness and Aggression columns contain the same five pairs of scores: 5.00/1.00, 9.00/3.00, 10.00/4.00, 8.00/9.00, 7.00/5.00.]

Now, let’s get SPSS to calculate Pearson’s r (let’s see if it matches what Excel reported) with

its corresponding p-value. To do this, click on “Analyze,” “Correlate,” and “Bivariate.” Next, bring over the two variables (in this case, loneliness and aggression) to the “Variables” section.

Make sure that “Pearson” is checked and that you are using a two-tailed test. Then click “OK.” Here is what you should see:

Correlations [DataSet0]

                                    Loneliness    Aggression
Loneliness   Pearson Correlation    1             .324
             Sig. (2-tailed)                      .595
             N                      5             5
Aggression   Pearson Correlation    .324          1
             Sig. (2-tailed)        .595
             N                      5             5

As you can see, in the row that says Loneliness, Pearson Correlation, we see the value .324 under Aggression. This means that our r = .32 (rounding to the 100th place), but now we have a p-value that is labeled "Sig. (2-tailed)." The Sig. stands for significance. So, if we were to report this correlation in a report, we would write something like this: There was not a significant correlation between loneliness and aggression, r = .32, p = .595.

Note that the correlation is mirrored on the other side of the table. That is because the correlation between aggression and loneliness is the same as the correlation between loneliness and aggression. Also note that correlations along the diagonal are always 1. This is because, in this case, loneliness has an r = 1.00 with loneliness, and aggression has a 1.00 correlation with aggression. Any variable that is correlated with itself is always going to have a perfect positive correlation because it is correlating two sets of the same data (e.g., the scores from loneliness are perfectly and positively correlated with the same scores in loneliness).
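If SPSS isn't at hand, the same two numbers can be reproduced in Python. This sketch assumes the SciPy library is installed; scipy.stats.pearsonr returns both the correlation and its two-tailed p-value, just like the Bivariate output above:

```python
from scipy import stats

loneliness = [5, 9, 10, 8, 7]
aggression = [1, 3, 4, 9, 5]

# pearsonr reports r and the two-tailed significance, like SPSS's "Sig. (2-tailed)"
r, p = stats.pearsonr(loneliness, aggression)
print(f"r = {r:.2f}, p = {p:.3f}")  # matches SPSS: r = .32, p ≈ .595
```

This is the easiest way to get the exact p-value to the 1000th place that APA-style reports require.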

Chapter Summary

Correlational research can help us understand the relationship between different variables. However, it cannot tell us if one variable is causing the other. Pearson's correlation (r) is the most commonly used correlation. r can range between -1 and +1. To calculate Pearson's r, we must first standardize the raw scores into z-scores. The closer r is to either -1 or +1, the stronger the correlation. When data points for two variables are plotted on a scatterplot, we can see how strong a correlation is. The closer the plots fall on a diagonal line (also called a regression line), the stronger the correlation. Strong correlations are good because we will be good at predicting where one person will be on one variable by knowing where they are on the other. While strength is important, the direction of the correlation is also important. Positive correlations indicate that as scores increase on one variable, they increase on the other variable. Negative correlations show an opposite relationship between the variables. We calculate correlations by summing the multiplication of z-scores from two variables and dividing by the total number of participants.

Key Terms

Z-scores: Also known as standardized scores, these scores tell us how far each score is from the mean (in terms of standard deviations)

Z-test: A statistical test to find out if a single score is significantly different from the mean

Correlational research: A type of research study that examines how two or more variables are related to each other but does not determine cause and effect

Scatterplot: A graph that includes plots for participants' data on two variables

Pearson's r: The most frequently reported correlation. It is calculated by summing the multiplication of z-scores on two variables and dividing that sum by the number of participants (N).

Regression line: The best-fitting line in a scatterplot that is closest to the most data points

Positive correlation: A correlation in which scores on one variable increase as scores on the other variable increase

Negative correlation: A correlation in which scores on one variable increase as scores on the other variable decrease

Strong correlation: A type of correlation that is good at predicting how one person will score on one variable, knowing how they scored on another variable

Weak correlation: A type of correlation that is not very good at predicting how one person will score on one variable, knowing how they scored on another variable

Check-In Questions

1. What is a z-score? Why is it important? Calculate the z-score for a raw score of 12 with a mean of 6 and a standard deviation of 2.
2. What does a z-test tell us?
3. Is a z-score of -2.25 statistically significant in a two-tailed test?
4. Imagine you calculated r = -.85. Describe the direction and strength of this correlation.
5. When is it good to use correlational research?

References

Hinkle, D., Wiersma, W., & Jurs, S. (1988). Applied statistics for the behavioral sciences. Houghton Mifflin.

Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13(1), 25-45.

Credits

Fig. 4.1: Copyright © by Qwfp (talk) (CC BY-SA 3.0) at https://commons.wikimedia.org/wiki/File:NormalDist1.96.png
Fig. 4.2: Source: http://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/
Fig. 4.3: Source: https://utw10426.utweb.utexas.edu/Topics/Correlation/Text.html


CHAPTER 5

Psychological Assessments, Reliability, and Validity

In this chapter, you will learn about psychological assessments and what makes an assessment "good." A good assessment is both reliable and valid. You will also learn about reliability and validity, which will be tied into what you learned from the previous chapter regarding correlations. Chapter 6 will refer back to Chapter 5 in regard to reliability and validity in experimental designs.

Learning Objectives

In this chapter, you will learn the following:

1. The key components to creating a "good" psychological assessment: reliability and validity
2. What a reliable questionnaire is and how social science researchers measure reliability
3. The ways social science researchers create valid questionnaires and how we measure them

Often, social scientists want to measure a characteristic in people. Sometimes this may be a personality trait (e.g., narcissism), a social worldview (e.g., explicit racism), or a view of themselves (e.g., self-esteem). For example, I might predict that people who are socially excluded will be especially aggressive if they are highly narcissistic. Therefore, I would need to measure narcissism. One of the most common ways to measure narcissism is with the Narcissism Personality Inventory (Raskin & Hall, 1979). So, how do researchers measure these types of

characteristics in people? Researchers find well-established questionnaires (also called scales and inventories) and administer them to the participants. This chapter will focus primarily on what constitutes a well-established questionnaire. To be well established, there must be ample evidence that the questionnaire is both reliable and valid.

Reliability

Our questionnaires are deemed reliable when we have evidence that they are consistent. We want our questionnaire to consistently find the same results. Similar to a ruler (we want a ruler to always be 12 inches long), we want the scores on a questionnaire to consistently measure the same characteristic. There are three main ways that we measure reliability: internal consistency (Cronbach's alpha), test-retest reliability, and inter-rater reliability.

Internal Consistency: Cronbach’s Alpha

We want the questionnaires that we use in the social sciences to be internally consistent. This means that all the questions seem to be measuring the same concept, the one that we claim to be measuring. If all the questions are measuring the same concept, then the responses to the questions should mostly be positively correlated with one another. However, researchers do not typically examine multiple correlations between item responses; instead, we look to a single number: the Cronbach's alpha. The Cronbach's alpha is like a mega-correlation; it tells us the extent to which responses to all questions correlate with each other. In general, social scientists like to have Cronbach's alphas above .80, although .70 is generally considered acceptable. If you come across a questionnaire with a reported Cronbach's alpha below .70, it would be wise to find another questionnaire with a Cronbach's alpha of at least .70. To calculate this reliability analysis in SPSS, we click "Analyze," "Scale," and then "Reliability Analysis." Let's look at Figure 5.1. Here, we see that Cronbach's alpha was calculated for a three-item questionnaire that measures loyalty. There are two spots I want to highlight here. Under Cronbach's alpha, we see that the alpha is .799. We would report this as α = .80.

Test-Retest Reliability

We also want to receive the same (or very similar) scores on the questionnaires every time we administer them to each of our participants. Rosenberg would want a participant who scored a 40 on the self-esteem scale to score a 40 (or very close to it) the next time the participant responded to the scale. As researchers, this demonstrates that our questionnaires are consistent; we are consistently finding the same (or very similar) scores every time we administer them. To assess test-retest reliability, we simply calculate the correlation between the first time we administer the scale and the second time (usually, these two administrations of these questionnaires are days, if not weeks, apart). As with our Cronbach's alpha, a correlation here above .80 would be considered good, but one above .70 is considered acceptable. A test-retest correlation below .70 would be unacceptable to researchers (and now you!). To calculate this correlation, we use the correlation formula from Chapter 4. More specifically, we will test the relationship between the scores from the first test to the second test.
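As a concrete sketch, here is the test-retest check in Python with hypothetical self-esteem scores from two administrations a few weeks apart (the correlation formula is the same one from Chapter 4; the scores are made up for illustration):

```python
import math

# Hypothetical self-esteem scores for five participants at time 1 and time 2
time1 = [40, 32, 25, 38, 30]
time2 = [38, 33, 27, 37, 31]

def pearson_r(x, y):
    """Pearson's r via deviation scores (Chapter 4's formula)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

r = pearson_r(time1, time2)
print(round(r, 2))
# Anything above .80 would count as good test-retest reliability here
```

With these made-up scores the two administrations track each other very closely, so the resulting r is well above the .80 guideline.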


Scale: ALL VARIABLES

Case Processing Summary
                  N      %
Cases  Valid      15     100.0
       Excluded   0      0
       Total      15     100.0
a. Listwise deletion based on all variables in the procedure.

Reliability Statistics
Cronbach's Alpha    N of Items
.799                3

[Item-Total Statistics for the three loyalty items ("I am always loyal to my friends," "I am always loyal to my family," "I am always loyal to the people in my life"): Scale Mean if Item Deleted, Scale Variance if Item Deleted, Corrected Item-Total Correlation, and Cronbach's Alpha if Item Deleted.]

FIGURE 5.1. This SPSS output demonstrates that the Cronbach's alpha is .80 (rounding to the 100th place).

Inter-Rater Reliability

Inter-rater reliability is most often used with measuring behaviors directly, not with questionnaires. Imagine watching a child playing with toys in a laboratory while you're behind a one-way mirror. It's your job to count how many aggressive behaviors you see. Two other people also have the same job you do; they are also counting how many aggressive behaviors they see. Researchers would consider you (and the other two people counting aggressive behaviors) to have the job of "rater." The researchers in charge of this research project would want you to have very similar counts for how many aggressive behaviors you saw. To ensure that this happens, the raters need to be well trained. The definition of an aggressive behavior must be very clear. It is also important that the raters are paying close attention and do not become distracted, as they may miss an aggressive behavior. Again, researchers want the raters' scores to be highly correlated; above .80 is considered acceptable. If such a correlation is found, it indicates that the raters mostly agreed on the number of aggressive behaviors exhibited. More specifically, an inter-rater reliability of .80 indicates a high degree of agreement among the raters.

Validity

Whereas reliability tells us about the consistency of our measures, validity tells us if we are measuring what we think we are measuring. Rosenberg wanted to be sure that his frequently used self-esteem scale actually measured self-esteem and not some other concept, such as happiness or pride. Therefore, he needed to find evidence that his scale was valid. The primary ways that we establish that our scales are valid are construct and content validity.

Construct Validity: Convergent and Discriminant Validity

Construct validity tells us if our scale is measuring the construct (e.g., the trait) that it's

supposed to. There are two main ways that researchers can establish construct validity: convergent and discriminant validity. To demonstrate that your questionnaire has good convergent validity, you will need to administer your questionnaire and other questionnaires that measure very similar characteristics. For example, Rosenberg might have participants respond to scales closely related to self-esteem such as confidence. Rosenberg would want there to be significant correlations between his self-esteem scale and these other scales. However, he would not want

them to be too highly correlated (e.g., r = .90 or higher) because that would indicate that he is measuring the same characteristic as the other scale (e.g., confidence). Researchers usually

want their questionnaires to measure a unique characteristic or to measure that characteristic

better than already existing questionnaires. To demonstrate discriminant validity, researchers want to ensure that the questionnaire they developed is not related to questionnaires that measure characteristics that are unrelated to the characteristic the researcher is hoping to measure. For example, Rosenberg would hope

that his self-esteem scale would be unrelated to a humor questionnaire or a death anxiety scale. One would expect that self-esteem would be unrelated to humor and death anxiety. Thus, Rosenberg would also administer these types of questionnaires and hope that there would be a zero (or near zero) correlation between his self-esteem scale and these seemingly

unrelated questionnaires.

Content Validity

Content validity tells us if the items in the questionnaire are assessing what they are supposed

to (Haynes et al., 1995). One of the best ways to establish content validity is to establish face validity. If you have face validity, this means that a lay person or expert in the field has reviewed the items in your questionnaire and agrees that your items seem like they would measure what you are trying to measure (Holden, 2010). For example, if you want to design a

humor questionnaire, you might want Dave Chappelle or Chris Rock to review your questionnaire (wouldn't that be a fun study?). If you're unable to get these rock stars of comedy, you can always ask your local comedians or academics who study comedy/humor to review your scale (some universities offer bachelor's degrees in comedy). If they agree that the items in your questionnaire seem to measure humor, then on the face of it you have content validity!


Chapter Summary

When researchers create a questionnaire or other type of assessment, they need to make sure that it is valid (accurate) and reliable (consistent). There are three types of reliability that are important for researchers: internal consistency (Cronbach's alpha), test-retest reliability, and

inter-rater reliability. Cronbach's alpha, a measure of internal consistency, tells researchers how closely questionnaire items are correlated with one another. Test-retest reliability tells researchers how consistently their assessment yields similar scores over time. Inter-rater reliability tells researchers who use behavioral measures if they are consistently measuring a behavior based on people's ratings. To make sure that their assessment is valid, researchers need to find evidence for content and construct validity. Construct validity tells us if our scale is measuring what it is supposed to, whereas content validity tells us if our items are measuring what they are supposed to measure.

Key Terms

Reliability: A characteristic of a measure that demonstrates consistency

Internal consistency: A characteristic of a measure that demonstrates that all questions are measuring the same concept; this is typically reported as Cronbach's alpha (α)

Cronbach's alpha: A statistic that we use to measure the internal consistency of a questionnaire

Test-retest reliability: A characteristic of a measure that tells a researcher how consistent the scale is over time

Inter-rater reliability: A characteristic of a measure (usually behavioral) that demonstrates the consistency of raters

Validity: A characteristic of a measure that tells us how accurate our measure is, if the measure is measuring what it is designed to measure

Construct validity: A characteristic of a measure that tells us if our scale is measuring the construct it's supposed to. This includes convergent and discriminant validity.

Convergent validity: The extent to which a measure correlates with similar measures

Discriminant validity: The extent to which a measure does not correlate with unrelated measures

Content validity: A characteristic of a measure that tells us if our items are measuring the content they are supposed to

Face validity: Tells the researcher if, simply by reading each item, the items seem to measure the characteristic they are supposed to measure

Check-In Questions

1. What is reliability and validity? Why are they important for research measures?
2. What are the three main types of reliability?
3. What are the two main types of validity?
4. What data would make a researcher say that they have a "good" or well-established questionnaire?


References

Haynes, S. N., Richard, D., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238-247.

Holden, R. R. (2010). Face validity. The Corsini Encyclopedia of Psychology, 1-2.

Raskin, R., & Hall, C. S. (1979). A narcissistic personality inventory. Psychological Reports, 45(2), 590.


CHAPTER 6

Experimental Designs

Whereas in the previous chapter you learned what makes a questionnaire "good," in this chapter, you will learn about what makes an experiment "good." We will consider how studies can be "threatened" by problems with validity and how to avoid these problems. The chapter will begin with a refresher on independent and dependent variables and explain how experiments inform us about cause and effect.

Learning Objectives In this chapter, you will learn the following: 1. The key components to the experimental method: independent and dependent variables

2. The threats to researchers’ experimental research 3. How social science researchers fight these threats to design better experiments 4. The importance of generalizability and how social science researchers can make their experiments generalizable

Cause and Effect: The Experimental Method

As social scientists, we want to know what causes the interesting behaviors we see. Researchers know that the best way to determine what causes something else to happen is the experimental method. The experimental method requires two basic components: an independent and a dependent variable. The independent variable is best described as providing people different experiences in a research study. Imagine that you have the following hypothesis: Social exclusion causes aggression. For example, if you want to examine how people react to feeling left out, you would randomly assign some people to be left out of a social experience and other

people to be included. When doing this type of research, researchers should be mindful of the participants' feelings because we never want people to feel bad because of our experiments. In this type of research, a careful and thorough debriefing, an oral or written statement about the true intentions of the study, is needed so that participants realize that the exclusion feedback was only part of the experiment. It is very important that you assign the groups at random (perhaps with the flip of a coin) so that all participants have an equal chance of being assigned to either group. This makes it unlikely that there is some trait that participants in one group share that the other group does not have. For example, it would be unlikely, if you assigned groups at random, to end up with a group of naturally aggressive people in the socially excluded group and naturally passive people in the socially included group.

FIGURE 6.1. Harlow removed infant monkeys from their mothers and monitored how they spent their time with two artificial "mothers": a wire monkey that provided food or a soft monkey that provided comfort. The monkeys overwhelmingly spent more time with the cloth monkey even though it didn't provide the food it needed to live. Some find the treatment of these monkeys to be cruel and believe this study is unethical. In this experiment, the type of mother (cloth or wire) was the independent variable and the time periods spent with each mother were the dependent variables.

Also, after excluding or including participants, you would need to measure the dependent variable—the variable that is measured after the independent variable is administered and is hoped to be caused by the independent variable. Therefore, after including or excluding your participants, you would measure aggression. Although you might initially think of measuring aggression by allowing the participants to punch someone, you would later realize that this is quite unethical. Instead, you might assess aggression by measuring how much hot sauce participants put in a food sample for someone who hates spicy food (Lieberman et al., 1999). To find support for your hypothesis, you would hope that the socially excluded participants would administer more hot sauce than the included participants.
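The coin-flip idea is easy to automate. Here is a hypothetical Python sketch (the participant IDs are made up) that shuffles the sign-up list and splits it into the two conditions, which keeps every participant equally likely to land in either group while also guaranteeing equal group sizes:

```python
import random

# Hypothetical participant pool
participants = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]

random.shuffle(participants)          # the electronic coin flip
half = len(participants) // 2
excluded_group = participants[:half]  # will receive the exclusion experience
included_group = participants[half:]  # will receive the inclusion experience

print(excluded_group, included_group)
```

Shuffling before splitting is one simple way to do random assignment; flipping an actual (or simulated) coin per participant works too, though it can leave the groups unequal in size.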

Experimental Designs

There are two main types of experiments: between-subjects and within-subjects. A between-subjects design is an experiment where groups of participants receive different experiences of the independent variable. The aforementioned social exclusion experiment would be a classic example of a between-subjects experimental design. A within-subjects design is an experiment in which each participant serves as both the experimental and control condition. Often, this means a pre-/post-test design. This means that you measure the dependent variable before and after introducing the experimental condition to the participants. For example, if I turned the social exclusion experiment into a within-subjects design, I would measure the participants' aggression, then make them feel left out, and finally measure aggression again. In this case, the researcher would hope that the aggression after excluding the participants would be higher than prior to the exclusion.


RESEARCH METHODS AND STATISTICS FOR THE SOCIAL SCIENCES

Internal Validity

In addition to being detectives, experimenters are also part-time warriors. They have to protect their studies from multiple problems that threaten the internal validity of their experiments. Internal validity describes how well an experiment demonstrates the cause-and-effect relationship between the independent and dependent variables. If they successfully defend their experiments from these threats, then they can be confident that their independent variable is actually causing changes in the dependent variable. There are many ways our internal validity can be threatened. Like a shield, there are several ways to protect the internal validity of the study. Let's go through the most frequent threats to internal validity.

History

History is what we call one of the threats to an experiment's internal validity. History refers to the events that occur between the measurements of the dependent variable in a within-subjects design. This is unlikely to be a problem in the within-subjects social exclusion experiment previously described. This is because so little time passes between the measurements of the dependent variable that it is unlikely for an event (other than the independent variable) to affect the post-test measure of aggression. However, it might be a problem in experiments where there is a longer period of time between the pre- and post-test measures. Let's say that the experimenter decided to measure participants' aggression and provide the experimental treatment (social exclusion) on Monday and then asked participants to return the next day to measure their aggression. So many events may have occurred between Monday and Tuesday. Perhaps the participants had a fight with a friend or had a road rage incident prior to returning to the experiment. Therefore, if the experimenter finds higher levels of aggression after the exclusion than before, it could be because of the exclusion, but it could also be because participants were angry at their friend or the rage-inducing driver. To protect against this threat, a good experimenter would make sure there is as little time as possible in between the pre- and post-test measures of the dependent variable.

Maturation

Another potential threat to internal validity is called maturation. Maturation refers to changes in participants that occur over time during an experiment. These changes could include actual physical maturation (for studies that last for weeks or years; this is called a longitudinal design), tiredness, boredom, and hunger. When participants are experiencing these states, this could have an effect on the dependent variable. For example, if one of the social exclusion participants got hangry (hungry and angry) during the experiment, the participant may become very aggressive, not because of the social exclusion, but because of the "hanger." This would be a clear threat to the internal validity of the experiment. To combat this threat, it is important for the experimenter to make their studies as short as possible. The shorter the experiment, the less likely participants will become tired, bored, hungry, or mature physically.

Practice Effect

Another threat to internal validity is the practice effect. The practice effect is when the experimenter measures the dependent variable so often that the participants perform better on the dependent variable simply because of practice (and not the independent variable). Imagine a researcher believes that teaching children "new math" will result in higher math scores. Over the semester, the researcher teaches students "new math" and administers 10 math tests. Although the researcher may find that the students performed better on the 10th math test than the first, this may not be due to the independent variable (teaching students new math). It is certainly possible that the students simply improved their math scores because they had practiced over 10 tests. To avoid this validity threat, researchers should keep measurements of the dependent variable to a minimum.

Reactive Measures

Reactive measures are another way that research studies are threatened. Reactive measures are measurements of the dependent variable that provoke the participants and result in imprecisely measuring the dependent variable. For example, questionnaires that measure participants' sexual activity and drug use would be considered reactive measures. Participants may not respond truthfully to these types of provoking questions. One way to combat this threat is to obtain a certificate of confidentiality. Often used in studies investigating drug use, this certificate prevents police or other authorities from obtaining the information participants shared during the study. This added layer of confidentiality is designed to allow participants to feel more comfortable sharing this sensitive information with the researchers.

Selection

Another type of threat to internal validity is selection. Selection refers to choosing participants in a way so that our groups are not equal prior to the experiment. For example, if we assigned men to the social exclusion condition and women to the inclusion condition, we cannot be sure if gender (not the independent variable) or social exclusion (the independent variable) caused changes in aggression. To fight this threat, researchers should always randomly assign participants to condition.

Mortality

Mortality is a threat to internal validity that does not (necessarily) refer to death. Instead, mortality refers to dropout rates. In particular, this is problematic if participants drop out of one of the experimental conditions more often than another experimental condition. This could mean that participants in one experimental group die at a higher rate than another group (this can be common in pharmaceutical research), but often it simply means that participants in one group stop participating at a higher rate than another group. Imagine a researcher examining the effect of therapy on depression. Typically, one group of participants would receive no therapy because they are in the control condition, while the other would receive therapy sessions. Sometimes, the participants not receiving treatment lose interest in the study and cease participating. To combat this threat, researchers may need to emphasize the importance of participation—especially for participants not receiving the experimental treatment—or carefully consider how they incentivize both groups of participants.

Demand Characteristics

One last important threat researchers must keep in mind is demand characteristics. Sometimes, researchers may lead participants to behave a certain way in an experiment and therefore "demand" certain "characteristics" from the participants. For example, a researcher in the social exclusion experiment might be more rude to participants in the social exclusion condition than the inclusion condition. If the experimenter finds that excluded participants are more aggressive than the included ones, this could be due to exclusion, or it could be due to the rude treatment they received. Therefore, the researcher cannot conclude that exclusion caused aggression. Often, demand characteristics are done unknowingly and unintentionally, although not always. To prevent the occurrence of these demand characteristics, experimenters should create scripts that are practiced and strictly followed during the experiment.

Participants, in turn, may respond to questionnaires in a way that is misleading or false. This type of bias is called response bias. Response bias may be due to demand characteristics, but this bias may simply be due to the participants' desire to present themselves favorably. For example, if a researcher wants to understand how often people use illicit drugs, participants may report infrequently using these drugs because they do not want to appear to be a "drug user." In this study, the researcher might conclude that the recruited population very infrequently uses illicit drugs, when the rate of drug use is actually much higher.

Important Steps to Protect Against These Threats

To ensure that your experiment is internally valid, it is critical for you to randomly assign your participants (in a between-group design). Also, be sure to keep the length of your study short and keep measurements of the dependent variable to a minimum. A good researcher will also practice running the experiment multiple times and follow a script so that all participants are treated similarly (except for the experience of the independent variable).

External Validity: Generalizing Your Findings

External validity refers to the extent to which your experimental results apply to populations and situations that are different from those in your experiment. Researchers want to be able to generalize their findings from the sample they recruited to the general population, across time and place. In order to do that, researchers must keep three types of generalization in mind: population, environmental, and temporal generalizability.

Population Generalizability

To be able to apply the results of an experiment to a group of participants that is different and more encompassing than those used in the original experiment is called population generalizability. Researchers do not do their research so that they can say that this effect is true of the small sample they recruited; they do it so that they can say this is an effect you would likely find in the larger population. Unfortunately, much research in the social sciences (particularly psychology) has poor population generalizability. This is because most samples of participants are comprised of undergraduate students. This particular population makes it difficult to generalize as they tend to be primarily White, female, 18-20 years old, and from families that are financially well-off (Nielsen et al., 2017). For these studies, it is very difficult to demonstrate that the results of studies conducted only on this population can generalize to people from other races, genders, ages, and socioeconomic statuses. Social scientists should, to the extent possible, recruit from a broad swath of the population to ensure good population generalizability.

CHAPTER 6 + EXPERIMENTAL DESIGNS

Environmental Generalizability

The ability to find the same (or very similar) results from an experiment in a situation or environment that differs from that of the original experiment is called environmental generalizability. Social sciences that confine their study of social events to a laboratory will have poor environmental generalizability. To improve your environmental generalizability, you should conduct your study in a variety of settings (laboratories, classrooms, on campus, and off campus) to see if you find the same results in all these settings. If so, you have established a case for good environmental generalizability.

Temporal Generalizability

To have good temporal generalizability, you need to conduct your experiment for years and find very similar results every year. One of the most reliable findings in the social sciences is that social exclusion causes aggression. Studies from the 1990s until now, study after study, year after year, find this effect. So, this experimental finding has good temporal generalizability.

The Statistics of Assessing Generalizability

So, how do we figure out if our results are generalizable? To determine generalizability, we will need to find a similar pattern of findings across studies. The best way to assess these patterns is with a meta-analysis. For a meta-analysis, researchers need to find all studies (published and unpublished) that investigate a specific effect of an independent variable on a dependent variable (e.g., the effect of social exclusion on aggression; for an example of a meta-analysis on this effect, see Blackhart et al., 2009). This is a type of analysis that finds the average effect of an independent variable on a dependent variable across different studies.¹ Meta-analyses will include studies that sampled from different populations to address population generalizability, studies from various environments to address environmental generalizability, and studies conducted at different points in history to address temporal generalizability.
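The "average effect across studies" idea can be illustrated numerically. The sketch below is a deliberately simplified illustration, not the procedure used by Blackhart et al. (2009): it weights each study's effect size (e.g., Cohen's d) by its sample size, whereas real meta-analyses typically use inverse-variance weights. The function name and the study numbers are hypothetical.

```python
def average_effect(effect_sizes, sample_sizes):
    """Sample-size-weighted mean of per-study effect sizes (e.g., Cohen's d)."""
    # Larger studies count for more in the overall average
    weighted_sum = sum(d * n for d, n in zip(effect_sizes, sample_sizes))
    return weighted_sum / sum(sample_sizes)

# Hypothetical studies: d = 0.5 with n = 100, and d = 0.2 with n = 50
overall = average_effect([0.5, 0.2], [100, 50])
print(overall)  # 0.4
```

Here the larger study pulls the overall estimate toward its own effect size, which is exactly why meta-analyses weight studies rather than averaging them naively.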

Major Threats to External Validity: Artificial Conditions, White Rats, and College Students

One of the major threats to external validity is artificial conditions. Usually, we are referring to laboratory studies. These are studies that are conducted in very sterile environments in which nearly all aspects are carefully controlled. However, in "real life" we are rarely in these types of environments. Researchers should keep this in mind as they are reviewing their laboratory results: Will I be able to find this effect out in the real world? To find out, researchers must reproduce their findings in participants' natural environments.

In the social sciences, some researchers (predominantly biopsychologists and neuroscientists) study rats to understand human beings. Because you can forgo the informed consent with rats, you can do more invasive procedures (such as providing them with illicit drugs) in a way that you cannot with human beings. Surprisingly, many of the findings from these rats can be generalized to human beings because we share several of the same basic biological structures in our brains. However, these researchers might not even be able to generalize their findings to other rats. As they typically study white rats, we are not sure if their findings could apply to rats of another color.

¹ More specifically, meta-analyses determine an average effect size (Cohen's d) across all studies. See Chapter 7.

Metaphorically, this is also true with our research on human beings. Research is often conducted on undergraduates who are predominantly White. So, not only are our rat subjects white, so are our participants. It is critical for researchers to examine the demographics of their participants to make sure they are as representative of the population they are hoping to generalize to as possible.

Using undergraduates as participants is problematic in the social sciences. They are used because they are a convenient sample (this is also called convenience sampling). They are easy to recruit as they are often required to participate in research as part of their coursework. It is always good to see when researchers keep this in mind and find samples outside of the college setting (or at least colleges with differing demographics) to demonstrate their population generalizability.

The Importance of Replication

Many researchers agree with Mook’s (1983) suggestion that generalizability is critical for research and that the best way to do this is through replication. Replication simply means finding the same results again with a new sample of participants. You will rarely see published experimental studies in the social sciences with a single experiment. Frequently, there will be multiple studies showing the same results again and again (sometimes with additional explanations or findings; this is called replication with extension). This is a good habit for researchers, and many are calling for more replications, as many classic studies in the social sciences appear to not be replicable (cannot find the same or similar results again). However, this is not unique to the social sciences; the medical sciences are going through a replication crisis as well.

Chapter Summary

The experimental method is considered the best scientific method because it can provide evidence for a cause-and-effect relationship between the independent and dependent variables. To make sure that you have a good experiment, the experimenter must thwart several threats to the experiment's internal and external validity. Mook (1983) suggested that it is critical for researchers to replicate their results. By doing this, the experimenter has strong evidence that an experiment's results are generalizable.

Key Terms

Experimental method: A research design that includes a manipulated independent variable (assigning participants to different experiences) and a measured dependent variable

Between-subjects design: An experiment in which groups of participants receive different experiences of the independent variable

Within-subjects design: An experiment in which the participants serve as both the experimental and control conditions

Internal validity: The extent to which an experimenter can demonstrate that the independent variable causes changes to the dependent variable

History: The events that occur between the measurements of the dependent variable in a within-subjects design

Maturation: The changes in participants that occur over time during an experiment

Practice effect: Participants perform better on the dependent variable due to multiple measurements of the dependent variable

Reactive measures: Measurements of the dependent variable that provoke the participants

Selection: Choosing participants in a way so that groups are not equal prior to the experiment

Mortality: Participants' dropout rates, which are particularly problematic if dropout rates differ between experimental conditions

Demand characteristics: The researcher leads participants to behave in a certain way in the experiment

Response bias: Participants in a research study respond in a way that presents themselves more favorably

External validity: The extent to which experimental results apply to different populations and situations

Population generalizability: The ability to apply the results of an experiment to a group of participants that is different and more encompassing than those used in the original experiment

Environmental generalizability: The ability to find the same (or very similar) results from an experiment in a situation or environment that differs from the original experiment

Temporal generalizability: The ability to find the same (or very similar) results from an experiment over time

Meta-analysis: A statistical analysis that examines the combined findings of multiple studies (published and unpublished)

Artificial conditions: A research environment, such as a laboratory, that does not look or feel like the participants' natural environment

Convenience sampling: Recruiting participants who are convenient or easy to find and participate in research

Replication: Repeating an experiment with a new set of participants

Check-In Questions

1. What is the primary difference between a between-subjects and a within-subjects design?
2. Name three threats that researchers worry about and how they can combat these threats.
3. What is generalizability? Why is it important for researchers to have results that are generalizable? What are the three primary ways we can demonstrate generalizability?
4. What does it mean to replicate another researcher's study? What does it mean if the replication failed? What does it mean if the replication duplicates the original researcher's results?

References

Blackhart, G. C., Nelson, B. C., Knowles, M. L., & Baumeister, R. F. (2009). Rejection elicits emotional reactions but neither causes immediate distress nor lowers self-esteem: A meta-analytic review of 192 studies on social exclusion. Personality and Social Psychology Review, 13(4), 269-309.

Lieberman, J. D., Solomon, S., Greenberg, J., & McGregor, H. A. (1999). A hot new way to measure aggression: Hot sauce allocation. Aggressive Behavior: Official Journal of the International Society for Research on Aggression, 25(5), 331-348.

Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38(4), 379-387.

Nielsen, M., Haun, D., Kärtner, J., & Legare, C. H. (2017). The persistent sampling bias in developmental psychology: A call to action. Journal of Experimental Child Psychology, 162, 31-38.

Credits

Fig. 6.1: Source: https://commons.wikimedia.org/wiki/File:Natural_of_Love_Wire_and_cloth_mother_surrogates.jpg

CHAPTER 7

How Do I Analyze My Results From My Between-Subjects Experiment? Independent T-Tests and Dependent T-Tests

Now that we have learned about experimental designs, we need to learn how we analyze our data from these experimental designs. In this chapter, we will learn about how we analyze data from a simple experiment—a two-group, between-subjects experimental design. An independent t-test is the best way to analyze this type of data. We will learn how to calculate this statistical test by hand and in SPSS.

Learning Objectives

In this chapter, you will learn the following:

1. When you should use an independent t-test
2. How to calculate an independent t-test by hand and in SPSS
3. How to report independent t-tests in a scientific report


CHAPTER 7 • HOW DO I ANALYZE MY RESULTS FROM MY BETWEEN-SUBJECTS EXPERIMENT?

A Review of Between- and Within-Subjects Design

In Chapter 6, we learned that the two most common types of experimental designs are between and within subjects. Between-subjects design refers to experiments in which people are assigned to different experimental groups and the researcher determines if there is a difference between these groups. For example, in my research, I assign some people to be left out and others to be included. Each group experiences completely different aspects of the independent variable, in this case social exclusion. This type of research design is often referred to as a classic experimental design because it is a frequently used experimental design. On the other hand, I could easily turn my research into a within-subjects design. This type of research design means that the subjects (although we should say participants: people are participants; animals are subjects) experience all or some of the aspects of the independent variable. To turn my between-subjects experiment into a within-subjects design, I could measure participants' aggression after they signed their consent forms, then make them feel left out, and then measure their aggression again. In this within-subjects design, I would compare participants' aggression before and after being left out. Therefore, participants are serving as both the control and experimental condition. The first aggression measure serves as the outcome from the "not excluded" condition, and the second aggression measure serves as the outcome from the excluded condition. This is also an example of a pre-/post-test design. In this case, the dependent variable (aggression) was measured before and after the experimental intervention, making participants feel left out.

There are several pros and cons for using a between-subjects design. For example, a between-subjects design tends to be shorter than an experiment with a within-subjects design because the dependent variable is only measured once in a between-subjects experiment. In a within-subjects design, the dependent variable might be measured twice or multiple times (which is why within-subjects designs are sometimes called repeated measures designs)—especially if the experiment lasts days or weeks (which is especially typical in pharmaceutical research where researchers are examining the effects of drugs over a longer period of time). Another perk to using a between-subjects design is that order effects (also known as carryover effects) are irrelevant because the order of the independent variable and dependent variable are the same for all participants. However, for a within-subjects design, this would be a major concern.
TABLE 7.1. Between-Subjects Design Versus Within-Subjects Design

Between-subjects design
Pros: Individual differences unlikely to play a role in the differences seen in the dependent variable; shorter sessions (due to only administering one level of the independent variable); simpler design; low mortality rates
Cons: Need more participants; more costly (if paying participants)

Within-subjects design
Pros: Fewer participants needed (two-for-one effect); less costly (if paying participants)
Cons: Individual differences more likely to play a role in the measurements of the DV; longer sessions (due to administering multiple levels of the independent variable); more complex design; higher mortality rates (especially for studies that last multiple days/weeks/months)

Two-Group Between-Subjects Design: Independent T-Test

The simplest research design is a two-group between-subjects design, with one independent variable and one dependent variable. Consider the previous example where participants are randomly assigned to be either included or excluded (the independent variable) and then their aggression is measured (the dependent variable). In this experiment, I would expect the excluded participants to be more aggressive than the included participants.

But as social scientists, we are part researchers and part statisticians. That means that after our research is completed, it's time to put on our statistician hat. Remember, we want to find evidence to support our research hypothesis. In this case, our research hypothesis is that people who are excluded will be more aggressive than people who are included. That indicates we need to compare the means for aggression in each of these groups. You now know how to calculate the means for two groups. Now, we need to use these means to figure out if the groups are significantly different from each other (in this case, how differently they aggressed). Remember, statistical significance is not the same as simple difference; it's a much higher threshold.

So, let's get to it! For a two-group, between-subjects design with a continuous dependent variable (in this case, aggression was measured on a scale), we need to use an independent t-test (only use an independent t-test if you have a categorical independent variable and a continuous dependent variable). The formula is fairly straightforward once we put it in plain English:

t = (M_group1 − M_group2) / s_(Mgroup1 − Mgroup2)

To translate, this means that we need to subtract the mean (for the DV) of Group 2 from the mean of Group 1, and then divide by the standard error for both groups. The standard error is the standard deviation of the means for both groups. Now, it's critical to remember that we want t to be as big as possible. That means we want the differences between the groups to be big and the standard error to be small (which makes sense; error is bad, so we want it to be as small as possible). Now, to calculate the standard error, you have to combine the variance of the means for both groups:

Step 1

s²_(Mgroup1 − Mgroup2) = s²_Mgroup1 + s²_Mgroup2

Then, you take the square root of the variance of the means for both groups to calculate the standard error.

Step 2

s_(Mgroup1 − Mgroup2) = √[s²_(Mgroup1 − Mgroup2)]

Let's do a practice problem using the social exclusion experiment. Let's say that, on average, the excluded participants (N = 20) scored a 10 on an aggression scale ranging from 0 to 50, and the included participants (N = 20) scored a 6 on average. We can tell that these two means are different from each other, and in the direction predicted by the research hypothesis, but are they significantly different from each other? To find out, we'll need to do the independent t-test. We already have what we need for the numerator portion of the t-test:

Step 3

t_indpt = (10 − 6) / s_(Mgroup1 − Mgroup2)

We have the easy part done. Now for the tough part: calculating the standard error. First we need the variance of the means for both groups. Let's say that the variance of the means for each group is 2 (usually the variance of the means is not the same for both groups, but let's start with a simple example). According to the formula, we need to add these together: 2 + 2 = 4. This 4 is our variance of the means for both groups, s²_(Mgroup1 − Mgroup2). But that's not what we want; we want the standard error (or the standard deviation of the means)—that's what goes in the denominator of our independent t-test. All we have to do is take the square root of the variance of the means, which in this case is 4. And the square root of 4 is 2:

Step 4

t = (10 − 6) / 2 = 4 / 2

Now, let's take a look at this fraction. The difference between our groups is 4, but our error is 2. That means the error (which is bad) is half the size of the difference between the groups. That is not a good sign for finding a significant difference between our groups. Still, we need to finish calculating our t and see if it is significant. Obviously, 4 divided by 2 is 2. So, our t is 2.

On page 65, we need to find our cutoff score in the t-test table. We need to make sure that the t we calculated is larger than the cutoff score we find in the table. To find our cutoff score, we need to know the df—the degrees of freedom. Degrees of freedom are a number we use that is always somewhat less than our total number of participants. We cannot use the total number of participants because we are using a sample of participants (typical in the social sciences), not an entire population. Remember, we had 20 people in each group. To calculate the degrees of freedom for an independent t-test (df_indpt), we need to subtract 1 from each group and then add those two numbers together, as indicated in the following formula:

Step 5

df_indpt = (N_group1 − 1) + (N_group2 − 1)

In our case,

df_indpt = (20 − 1) + (20 − 1) = 19 + 19 = 38.
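The five steps of the hand calculation can be collected into one short function. This is only a sketch (the function name is mine): it assumes you already have each group's mean, the variance of each group's mean (the squared standard error), and the group sizes.

```python
import math

def independent_t(m1, m2, var_m1, var_m2, n1, n2):
    """Independent t-test from group means, variances of the means, and group sizes."""
    # Step 1: add the variances of the two group means
    combined_variance = var_m1 + var_m2
    # Step 2: the standard error is the square root of that combined variance
    standard_error = math.sqrt(combined_variance)
    # Steps 3-4: divide the difference between the means by the standard error
    t = (m1 - m2) / standard_error
    # Step 5: degrees of freedom = (N1 - 1) + (N2 - 1)
    df = (n1 - 1) + (n2 - 1)
    return t, df

# The chapter's example: excluded M = 10, included M = 6,
# variance of each group's mean = 2, 20 participants per group
t, df = independent_t(10, 6, 2, 2, 20, 20)
print(t, df)  # 2.0 38
```

Running it reproduces the worked example exactly: t = 2.0 with 38 degrees of freedom, which you would then compare against the cutoff score from the t-test table.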

Remember, in order for a statistic to be significant in the social sciences, the probability of the null hypothesis being true in our dataset must be less than 5% (recall the p < .05 cutoff).

[Screenshot: the SPSS "Analyze" menu, showing options such as Compare Means, Quality Control, Spatial and Temporal Modeling, IBM SPSS Amos, and ROC Curve.]

Once your data is entered, you can do your dependent t-test. Click on "Analyze," "Compare Means," and then "Paired-Samples T Test." Next, you will need to bring your Time 2 variable under Variable 1 (because we calculated the difference scores by subtracting Time 1 from Time 2) and Time 1 under Variable 2. Then click "OK." You will see the image appearing on the following page. SPSS is telling us that the dependent t-test results in t = 4.116, with df of 9, and a p-value of .003. What did we calculate when we did this t-test by hand? We calculated a t = 4.11, with df of 9, p < .05. Typically, our hand-calculated statistics will be a little off from the ones computed by SPSS due to rounding in our hand calculations (SPSS doesn't round until it gets to the final statistic, which it rounds to the thousandths place). Critically, SPSS is providing us with the exact p-value, which is telling us the probability of the null hypothesis being true given this dataset (in this case, there is a .3% chance that the null is true—a pretty small chance—and, critically, lower than a 5% chance, remember the p